[mythtv-users] new drive won't boot

James Linder jam at tigger.ws
Fri Dec 21 03:50:46 UTC 2018


Lots below if you want to go there!

> No, it is a myth that SSDs wear out rapidly.  Modern SSDs have much more
> endurance. 

According to all the info I can find, modern SSD cells wear out FASTER than older ones. What hides this is that smaller disks wear out faster than large disks, because writes are spread over fewer cells. E.g. the Samsung 850 EVO is rated for 600 TB of total writes (the Pro for 1200 TB), but the actual per-cell write life is down to around 3000 cycles!
https://www.compuram.de/blog/en/the-life-span-of-a-ssd-how-long-does-it-last-and-what-can-be-done-to-take-care/
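To put a TBW rating in per-drive terms, divide the rated total writes by the capacity. A minimal sketch using the figures quoted above (illustrative only; real per-cell wear also depends on the controller's write amplification):

```shell
# Full-drive writes implied by a TBW rating (figures from the post).
tbw_tb=600        # rated total writes, TB (850 EVO class, per the post)
capacity_tb=0.25  # drive capacity, TB
awk -v t="$tbw_tb" -v c="$capacity_tb" \
    'BEGIN { printf "%.0f full-drive writes\n", t/c }'
# -> 2400 full-drive writes
```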

> Here is the real-life SMART data from two of my SSDs:

Stephen, that is most interesting, thank you. It directly contradicts my own experience.
I had an 850 EVO 256G as my main disk.
Partitioned as EFI, swap, 20G /, and 200G /store (for myth), with 50G left unprovisioned.

After a year the disk became read-only, SMART said interesting stuff (long since lost), and it took GREAT PAIN to reconfigure another disk.
This SSD had NO writable sectors left (I must conclude that the disk failed).

It is worth trying again. (What I'm trying to fix: if I watch a show WHILE it is recording, playback pauses for a fraction of a second every few seconds; initially not noticeable, but eventually quite irritating.
If I watch a recording while something else is recording, the problem does not occur.) I.e. USB and the tuners are not the problem.

I’m running fixes/29 [v29.1-15-g280138b452], which I compiled myself as I fiddled with lots of debug stuff while trying to resolve the metadata saga.

James 

> On 20 Dec 2018, at 8:00 pm, mythtv-users-request at mythtv.org wrote:
> 
>>> On 19 Dec 2018, at 11:45 pm, mythtv-users-request at mythtv.org wrote:
>>> 
>>>> On 19 Dec 2018, at 4:57 pm, mythtv-users-request at mythtv.org wrote:
>>>> 
>>>> Greetings Mythizens, I received my replacement drive from Amazon, used
>>>> "rsync" to copy the files from my revived (previously "about to fail")
>>>> drive, swapped the two drives, and the box doesn't boot. Unswitch and it boots
>>>> fine... unmount storage2, switch drives, and the FE finds the storage2
>>>> recordings (on the "rsync'd" drive), but then a reboot fails. "fstab" is looking
>>>> to mount storage1, 2 & 3, the storage directories are back to original form, all
>>>> default group (granted, it doesn't take much), but I'm stumped! What do I
>>>> need to do to make this drive acceptable to my box? The first of this type
>>>> of drive worked fine, and the odds of getting lemons back to back I would
>>>> imagine to be quite high... Please help, TIA Daryl
>>> 
>>> Since this is so far OT we may as well have a smile:
>>> This guy says "I prayed for a bicycle, but nothing happened" ... "then I
>>> realized it does not work like this, so I stole a bike and prayed for
>>> forgiveness".
>>> So it is with rsync. (It does not work like this.)
>>> One *can* fiddle with grub and /etc/fstab, but it will probably take long
>>> and be painful. (E.g. do you have an EFI BIOS?)
>>> 
>>> your easiest option is
>>> 
>>> 1) boot the bad disk
>>> 2) run mythconverg_backup.pl; if you've not used it before, 5 minutes of
>>> reading is needed. Save the backup.
>>> 3) install the new disk
>>> 4) install the OS, or a newer version if you want to.
>>> 5) install mythtv, mariadb, and apache2 and php if you want mythweb
>>> 6) mount the old disk and copy all the files you want, especially art and
>>> ts files from myth-place-you-choose
>>> 7) run mythconverg_restore.pl
>>> 8) if you do not have the same hostname and IP addr then more reading is needed.
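For what it's worth, steps 2)-7) above could be sketched roughly as follows. The package names and mount points (/mnt/olddisk, /var/lib/mythtv) are assumptions for a Debian/Ubuntu-style setup; adjust for your distro and storage layout:

```shell
# Hedged sketch of the migration steps above -- not a turnkey script.
mythconverg_backup.pl                    # 2) back up the DB on the old system
# ... install the new disk and a fresh OS, then:
sudo apt install mythtv mariadb-server   # 5) plus apache2/php if you want mythweb
sudo mount /dev/sdb1 /mnt/olddisk        # 6) old disk: copy art and .ts files
rsync -a /mnt/olddisk/var/lib/mythtv/ /var/lib/mythtv/
mythconverg_restore.pl                   # 7) restore the DB backup
```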
>>> 
>>> James
>>> 
>>> 
>>> Except my goal is to have the SSD with my OS as the boot drive and the
>>> three terabyte drives as storage drives, so painful or time consuming as it
>>> may be this time future disk swaps will be better.
>> 
>> Running the OS from an SSD is very non-trivial.
>> Even with all good practices (WRT SSD trim, swappiness etc.), the DB journal and logs will use up the SSD within a year, and GREAT PAIN results.
>> An SSD is fast to boot, and if your ring-buffers are on the SSD you can notice the difference, but otherwise an SSD makes no noticeable difference.
>> 
>> <very gently he says> the level of error in assuming rsync will make a bootable disk says: by all means try, see what you can do, but you are looking at a very steep learning curve ahead.
>> 
>> Newer, cheaper SSDs use two- or three-bit-per-cell (MLC/TLC) technology that drops the write life to 1/2 or 1/3. You can get as few as 20,000 write cycles per cell!
>> 
>> For fixed SATA disks there is no benefit in using UUIDs, so your fstab may have entries like /dev/sda1 etc., which makes copying easier.
>> Modern specs, in a flourish of marketing hype, no longer quote cell life but total write life, i.e. you can write umpty plonk PB to your disk, and for an ordinary desktop machine an SSD will outlast mechanical disks. But not for a mythbackend machine.
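A device-name-style fstab along those lines might look like this. It is a sketch with assumed mount points and filesystems; `blkid` will show the UUID alternatives if you prefer them:

```shell
# /etc/fstab fragment -- device-name style (assumed layout).
# nofail lets the box boot even if a storage drive is absent.
/dev/sda1  /          ext4  defaults         0  1
/dev/sdb1  /storage1  ext4  defaults,nofail  0  2
/dev/sdc1  /storage2  ext4  defaults,nofail  0  2
```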
>> 
>> It is cheap and easy: I use a 1T mechanical disk for all the myth stuff, including two 20G root partitions that are mirrors of each other, and a 5T WD Passport USB3 drive for all storage. You can copy one Passport to another each week or so without noticing.
>> 
>> In any event you have a lot to figure out on the OS front that only briefly has anything to do with mythtv. Have fun.
>> James
> 
> No, it is a myth that SSDs wear out rapidly.  Modern SSDs have much more
> endurance.  It may have been a problem with the first SSD drives, but
> these days it is not a problem to have all normal things on an SSD
> drive.  Obviously you do not want to run a job that is going to write
> gigabytes per second all day, but as long as you avoid obviously
> stupid things like that, SSDs are just fine.
> 
> In any modern Linux, the system will automatically run TRIM for you. I
> did change the Ubuntu fstrim job from /etc/cron.weekly to
> /etc/cron.daily, as my SSDs are fairly busy at times, so I wanted all
> the deleted filesystem space erased and put back on the SSD free list
> a bit more often than weekly.  You can also use the "discard" option
> on the mount in fstab - that causes free space to be sent for erasing
> immediately.  In older kernels that used to slow things down a bit,
> but by now I hope that bug will have been fixed and the free space
> erasing is now done in parallel by a separate lower priority thread.
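On an Ubuntu release that still ships the cron-based job described above, the weekly-to-daily change amounts to something like the following (paths assumed; newer releases schedule TRIM via a systemd fstrim.timer instead):

```shell
# Move the periodic TRIM job from weekly to daily (cron-based setups).
sudo mv /etc/cron.weekly/fstrim /etc/cron.daily/fstrim
# Or run a TRIM by hand and see how much each mounted filesystem trims:
sudo fstrim --all --verbose
```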
> 
> Here is the real-life SMART data from two of my SSDs:
> 
> 1) Samsung 950 Pro M.2 NVMe 256 Gbytes - this is my system drive for
> my main MythTV box, running Ubuntu 18.04.  It is on 24/7:
> 
> root at mypvr:/etc# smartctl -a /dev/nvme0n1
> smartctl 6.7 2018-12-05 r4852 [x86_64-linux-4.15.0-42-generic] (local build)
> Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Number:                       Samsung SSD 950 PRO 256GB
> Serial Number:                      S2GLNCAGA02568R
> Firmware Version:                   1B0QBXX7
> PCI Vendor/Subsystem ID:            0x144d
> IEEE OUI Identifier:                0x002538
> Controller ID:                      1
> Number of Namespaces:               1
> Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
> Namespace 1 Utilization:            125,586,079,744 [125 GB]
> Namespace 1 Formatted LBA Size:     512
> Namespace 1 IEEE EUI-64:            002538 515a160a08
> Local Time is:                      Thu Dec 20 20:44:44 2018 NZDT
> Firmware Updates (0x06):            3 Slots
> Optional Admin Commands (0x0007):   Security Format Frmw_DL
> Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
> Maximum Data Transfer Size:         32 Pages
> 
> Supported Power States
> St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
> 0 +     6.50W       -        -    0  0  0  0        5       5
> 1 +     5.80W       -        -    1  1  1  1       30      30
> 2 +     3.60W       -        -    2  2  2  2      100     100
> 3 -   0.0700W       -        -    3  3  3  3      500    5000
> 4 -   0.0050W       -        -    4  4  4  4     2000   22000
> 
> Supported LBA Sizes (NSID 0x1)
> Id Fmt  Data  Metadt  Rel_Perf
> 0 +     512       0         0
> 
> === START OF SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SMART/Health Information (NVMe Log 0x02, NSID 0x1)
> Critical Warning:                   0x00
> Temperature:                        43 Celsius
> Available Spare:                    100%
> Available Spare Threshold:          10%
> Percentage Used:                    3%
> Data Units Read:                    116,381,714 [59.5 TB]
> Data Units Written:                 79,225,837 [40.5 TB]
> Host Read Commands:                 1,858,335,628
> Host Write Commands:                937,822,226
> Controller Busy Time:               3,812
> Power Cycles:                       220
> Power On Hours:                     22,043
> Unsafe Shutdowns:                   112
> Media and Data Integrity Errors:    0
> Error Information Log Entries:      2,907
> 
> Error Information (NVMe Log 0x01, max 64 entries)
> Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
>  0       2907     0  0x0016  0x4004  0x000            0     0     -
>  1       2906     0  0x000e  0x4004  0x000            0     0     -
>  2       2905     0  0x0017  0x4004  0x000            0     0     -
>  3       2904     0  0x001d  0x4004  0x000            0     0     -
>  4       2903     0  0x0016  0x4004  0x000            0     0     -
>  5       2902     0  0x0014  0x4004  0x000            0     0     -
>  6       2901     0  0x0015  0x4004  0x000            0     0     -
>  7       2900     0  0x0018  0x4004  0x000            0     0     -
>  8       2899     0  0x0007  0x4004  0x000            0     0     -
>  9       2898     0  0x0003  0x4004  0x000            0     0     -
> 10       2897     0  0x0010  0x4004  0x000            0     0     -
> 11       2896     0  0x001d  0x4004  0x000            0     0     -
> 12       2895     0  0x001b  0x4004  0x000            0     0     -
> 13       2894     0  0x0009  0x4004  0x000            0     0     -
> 14       2893     0  0x0011  0x4004  0x000            0     0     -
> 15       2892     0  0x0010  0x4004  0x000            0     0     -
> ... (48 entries not shown)
> 
> So it has been running for over 918 days (2.51 years), and is showing
> no spare blocks used and 3% wear with 40.5 Tbytes written.  The
> specifications for the 950 Pro 256 Gbyte model say it should last for
> 200 Tbytes written, so it is around 20% of its expected lifetime now.
> On that basis, if I keep using it at the same rate, it should last for
> over 12 years.
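Sanity-checking the figures quoted above (22,043 power-on hours, 40.5 TB written against a 200 TBW rating) with a little awk:

```shell
# Power-on hours -> days, fraction of rated writes used, and the
# projected lifetime at the same write rate.
awk 'BEGIN {
  days = 22043 / 24              # ~918 days powered on
  frac = 40.5 / 200              # ~20% of rated TBW used
  printf "%.0f days, %.1f%% of TBW, ~%.1f years at this rate\n",
         days, frac * 100, (days / 365.25) / frac
}'
# -> 918 days, 20.2% of TBW, ~12.4 years at this rate
```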
> 
> 2) Crucial MX300 SATA 525 Gbytes - this is the system drive in my
> test/development box, but since my old OS/2 box died it has been
> running 24/7 with OS/2 running in a virtual machine on Ubuntu 16.04.
> It is not nearly as heavily used as my MythTV box - only the operating
> system and the VM virtual system disk are on the SSD - the bulk of the
> data for my email and the OS/2 VM storage drives are on ordinary hard
> drives.
> 
> root at lith:~# smartctl -a /dev/disk/by-id/ata-Crucial_CT525MX300SSD1_1630135D7E12
> smartctl 6.6 2016-11-12 r4366 [x86_64-linux-4.4.0-140-generic] (local build)
> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Crucial/Micron BX/MX1/2/3/500, M5/600, 1100 SSDs
> Device Model:     Crucial_CT525MX300SSD1
> Serial Number:    1630135D7E12
> LU WWN Device Id: 5 00a075 1135d7e12
> Firmware Version: M0CR060
> User Capacity:    525,112,713,216 bytes [525 GB]
> Sector Size:      512 bytes logical/physical
> Rotation Rate:    Solid State Device
> Form Factor:      2.5 inches
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ACS-3 T13/2161-D revision 5
> SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Thu Dec 20 21:02:13 2018 NZDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> General SMART Values:
> Offline data collection status:  (0x00) Offline data collection activity
>                                         was never started.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                 ( 1703) seconds.
> Offline data collection
> capabilities:                    (0x7b) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        (   9) minutes.
> Conveyance self-test routine
> recommended polling time:        (   3) minutes.
> SCT capabilities:              (0x0035) SCT Status supported.
>                                         SCT Feature Control supported.
>                                         SCT Data Table supported.
> 
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail Always       -       0
>   5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age  Always       -       0
>   9 Power_On_Hours          0x0032   100   100   000    Old_age  Always       -       9420
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age  Always       -       262
> 171 Program_Fail_Count      0x0032   100   100   000    Old_age  Always       -       0
> 172 Erase_Fail_Count        0x0032   100   100   000    Old_age  Always       -       0
> 173 Ave_Block-Erase_Count   0x0032   099   099   000    Old_age  Always       -       19
> 174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age  Always       -       70
> 183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age  Always       -       0
> 184 Error_Correction_Count  0x0032   100   100   000    Old_age  Always       -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age  Always       -       0
> 194 Temperature_Celsius     0x0022   073   054   000    Old_age  Always       -       27 (Min/Max 16/46)
> 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age  Always       -       0
> 197 Current_Pending_Sector  0x0032   100   100   000    Old_age  Always       -       0
> 198 Offline_Uncorrectable   0x0030   100   100   000    Old_age  Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age  Always       -       0
> 202 Percent_Lifetime_Remain 0x0030   099   099   001    Old_age  Offline      -       1
> 206 Write_Error_Rate        0x000e   100   100   000    Old_age  Always       -       0
> 246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age  Always       -       5871831640
> 247 Host_Program_Page_Count 0x0032   100   100   000    Old_age  Always       -       183758941
> 248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age  Always       -       187091403
> 180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail Always       -       1916
> 210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age  Always       -       0
> 
> SMART Error Log Version: 1
> No Errors Logged
> 
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Vendor (0xff)       Completed without error       00%      9368 -
> # 2  Vendor (0xff)       Completed without error       00%      9117 -
> # 3  Vendor (0xff)       Completed without error       00%      8886 -
> # 4  Vendor (0xff)       Completed without error       00%      8802 -
> # 5  Vendor (0xff)       Completed without error       00%      8542 -
> # 6  Vendor (0xff)       Completed without error       00%      8295 -
> # 7  Vendor (0xff)       Completed without error       00%      7998 -
> # 8  Vendor (0xff)       Completed without error       00%      7742 -
> # 9  Vendor (0xff)       Completed without error       00%      7527 -
> #10  Vendor (0xff)       Completed without error       00%      7430 -
> #11  Vendor (0xff)       Completed without error       00%      7175 -
> #12  Vendor (0xff)       Completed without error       00%      6850 -
> #13  Vendor (0xff)       Completed without error       00%      6582 -
> #14  Vendor (0xff)       Completed without error       00%      6334 -
> #15  Vendor (0xff)       Completed without error       00%      6073 -
> #16  Vendor (0xff)       Completed without error       00%      5789 -
> #17  Vendor (0xff)       Completed without error       00%      5507 -
> #18  Vendor (0xff)       Completed without error       00%      5192 -
> #19  Vendor (0xff)       Completed without error       00%      4924 -
> #20  Vendor (0xff)       Completed without error       00%      4690 -
> #21  Vendor (0xff)       Completed without error       00%      4651 -
> 
> SMART Selective self-test log data structure revision number 1
> SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>    1        0        0  Not_testing
>    2        0        0  Completed [00% left] (106848-172383)
>    3        0        0  Not_testing
>    4        0        0  Not_testing
>    5        0        0  Not_testing
> Selective self-test flags (0x0):
>  After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
> 
> It has been running for 392.5 days with 5871831640 sectors written - I
> am not quite sure what the sector size is.  It is reporting
> Percent_Lifetime_Remain as 99%.  That is the number that Crucial says
> is the critical one.  But from that, if it is only losing 1% of
> lifetime per year, then it is going to last a very long time.  Even if
> it is closer to 2% per year, I am very happy that it will last way
> longer than the remaining time before I will be replacing that PC.
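If Total_Host_Sector_Write counts 512-byte sectors (the common unit for this attribute, though an assumption here, as the post notes), the total written works out to:

```shell
# 5,871,831,640 sectors x 512 bytes, expressed in decimal TB.
awk 'BEGIN { printf "%.2f TB\n", 5871831640 * 512 / 1e12 }'
# -> 3.01 TB
```

That is roughly 3 TB over the drive's 392.5 days of operation.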
