[mythtv-users] new drive won't boot

Stephen Worthington stephen_agent at jsw.gen.nz
Thu Dec 20 08:32:43 UTC 2018


On Thu, 20 Dec 2018 10:53:25 +0800, you wrote:

>> On 19 Dec 2018, at 11:45 pm, mythtv-users-request at mythtv.org wrote:
>> 
>>> On 19 Dec 2018, at 4:57 pm, mythtv-users-request at mythtv.org wrote:
>>> 
>>> Greetings Mythizens, I received my replacement drive from Amazon, used
>>> "rsync" to copy the files from my revived (previously "about to fail")
>>> drive, swapped the two drives, and the box doesn't boot. Swap them back and
>>> it boots fine. If I unmount storage2 and switch drives, the FE finds the
>>> storage2 recordings (on the rsync'd drive), but then a reboot fails. fstab
>>> is looking to mount storage1, 2 & 3, and the storage directories are back
>>> to their original form, all default group. Granted, it doesn't take much,
>>> but I'm stumped! What do I need to do to make this drive acceptable to my
>>> box? The first drive of this type worked fine, and I would imagine the odds
>>> against getting lemons back to back to be quite high... Please help, TIA Daryl
>> 
>> Since this is so far OT we may as well have a smile:
>> This guy says "I prayed for a bicycle, but nothing happened. Then I
>> realized it does not work like this, so I stole a bike and prayed for
>> forgiveness."
>> So it is with rsync. (It does not work like this.)
>> One *can* fiddle with grub and /etc/fstab, but it will probably take a long
>> time and be painful (e.g. do you have an EFI BIOS?).
>> 
>> Your easiest option is:
>> 
>> 1) boot the bad disk
>> 2) run mythconverg_backup.pl (if you've not used it before, 5 minutes of
>> reading is needed) and save the backup
>> 3) install the new disk
>> 4) install the OS, or a newer version if you want to
>> 5) install mythtv and mariadb, plus apache2 and php if you want mythweb
>> 6) mount the old disk and copy all the files you want, especially art and
>> .ts files from myth-place-you-choose
>> 7) run mythconverg_restore.pl
>> 8) if you do not have same hostname and IP addr then more reading is needed.
>> 
>> James
>> 
>> 
>> Except my goal is to have the SSD with my OS as the boot drive and the
>> three terabyte drives as storage drives, so however painful or
>> time-consuming it may be this time, future disk swaps will be easier.
>
>Running the OS from an SSD is very non-trivial.
>Even with all good practices (WRT SSD TRIM, swappiness etc.), the DB journal and logs will use up the SSD within a year, and GREAT PAIN results.
>An SSD is fast to boot, and if your ring buffers are on the SSD you can notice the difference, but otherwise an SSD makes no noticeable difference.
>
><very gently he says> The size of the error in assuming rsync will make a bootable disk says: by all means try and see what you can do, but you are looking at a very steep learning curve ahead.
>
>Newer, cheaper SSDs use 2 or 3 bits-per-cell technology that drops the write life to 1/2 or 1/3. You can get as few as 20 000 write cycles / cell!!
>
>For fixed SATA disks there is no benefit in using UUIDs, so your fstab may have entries like /dev/sda1 etc., which makes copying easier.
>Modern specs, in a flourish of marketing hype, no longer specify cell life but total write life, i.e. you can write umpty plonk PB to your disk, and for an ordinary desktop machine an SSD will outlast mechanical disks. But not for a mythbackend machine.
>
>It is cheap and easy: I use a 1T mechanical disk for all the myth stuff, including two 20G root partitions that are mirrors of each other, and a 5T WD Passport USB3 for all storage. You can copy one Passport to another each week or so without noticing.
>
>In any event you have a lot to figure out on the OS front that only briefly has anything to do with mythtv. Have fun.
>James

No, it is a myth that SSDs wear out rapidly.  Modern SSDs have much more
endurance.  It may have been a problem with the first SSD drives, but
these days it is not a problem to have all normal things on an SSD
drive.  Obviously you do not want to run a job that is going to write
gigabytes per second all day, but as long as you avoid obviously
stupid things like that, SSDs are just fine.

In any modern Linux, the system will automatically run TRIM for you. I
did change the Ubuntu fstrim job from /etc/cron.weekly to
/etc/cron.daily, as my SSDs are fairly busy at times, so I wanted all
the deleted filesystem space erased and put back on the SSD free list
a bit more often than weekly.  You can also use the "discard" option
on the mount in fstab - that causes free space to be sent for erasing
immediately.  In older kernels that used to slow things down a bit,
but by now I hope that bug will have been fixed and the free space
erasing is now done in parallel by a separate lower priority thread.
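As a rough illustration of where the "discard" option lives, the Python sketch below scans fstab-format text and reports which mount points request it. The function name `discard_mounts` and the sample entries (the UUIDs and the storage1 path) are hypothetical; the parser assumes the usual six whitespace-separated fstab fields and does not handle octal-escaped spaces in paths.

```python
from typing import List


# Hypothetical helper: list the mount points whose fstab options include "discard".
# Assumes the standard field order: device, mount point, type, options, dump, pass.
def discard_mounts(fstab_text: str) -> List[str]:
    mounts = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # too short to have an options field
        options = fields[3].split(",")
        if "discard" in options:  # exact match, so "nodiscard" is not counted
            mounts.append(fields[1])
    return mounts


# Hypothetical example entries, not taken from a real system.
SAMPLE_FSTAB = """\
# <file system>  <mount point>             <type>  <options>                  <dump> <pass>
UUID=1111-aaaa   /                         ext4    errors=remount-ro,discard  0      1
UUID=2222-bbbb   /var/lib/mythtv/storage1  ext4    defaults                   0      2
"""

if __name__ == "__main__":
    print(discard_mounts(SAMPLE_FSTAB))
```

Running it against a copy of /etc/fstab (e.g. `discard_mounts(open("/etc/fstab").read())`) would list the filesystems that send discards immediately; the others are left to the periodic fstrim job.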

Here is the real-life SMART data from two of my SSDs:

1) Samsung 950 Pro M.2 NVMe 256 Gbytes - this is my system drive for
my main MythTV box, running Ubuntu 18.04.  It is on 24/7:

root@mypvr:/etc# smartctl -a /dev/nvme0n1
smartctl 6.7 2018-12-05 r4852 [x86_64-linux-4.15.0-42-generic] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 950 PRO 256GB
Serial Number:                      S2GLNCAGA02568R
Firmware Version:                   1B0QBXX7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
Namespace 1 Utilization:            125,586,079,744 [125 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 515a160a08
Local Time is:                      Thu Dec 20 20:44:44 2018 NZDT
Firmware Updates (0x06):            3 Slots
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.50W       -        -    0  0  0  0        5       5
 1 +     5.80W       -        -    1  1  1  1       30      30
 2 +     3.60W       -        -    2  2  2  2      100     100
 3 -   0.0700W       -        -    3  3  3  3      500    5000
 4 -   0.0050W       -        -    4  4  4  4     2000   22000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning:                   0x00
Temperature:                        43 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    3%
Data Units Read:                    116,381,714 [59.5 TB]
Data Units Written:                 79,225,837 [40.5 TB]
Host Read Commands:                 1,858,335,628
Host Write Commands:                937,822,226
Controller Busy Time:               3,812
Power Cycles:                       220
Power On Hours:                     22,043
Unsafe Shutdowns:                   112
Media and Data Integrity Errors:    0
Error Information Log Entries:      2,907

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0       2907     0  0x0016  0x4004  0x000            0     0     -
  1       2906     0  0x000e  0x4004  0x000            0     0     -
  2       2905     0  0x0017  0x4004  0x000            0     0     -
  3       2904     0  0x001d  0x4004  0x000            0     0     -
  4       2903     0  0x0016  0x4004  0x000            0     0     -
  5       2902     0  0x0014  0x4004  0x000            0     0     -
  6       2901     0  0x0015  0x4004  0x000            0     0     -
  7       2900     0  0x0018  0x4004  0x000            0     0     -
  8       2899     0  0x0007  0x4004  0x000            0     0     -
  9       2898     0  0x0003  0x4004  0x000            0     0     -
 10       2897     0  0x0010  0x4004  0x000            0     0     -
 11       2896     0  0x001d  0x4004  0x000            0     0     -
 12       2895     0  0x001b  0x4004  0x000            0     0     -
 13       2894     0  0x0009  0x4004  0x000            0     0     -
 14       2893     0  0x0011  0x4004  0x000            0     0     -
 15       2892     0  0x0010  0x4004  0x000            0     0     -
... (48 entries not shown)

So it has been running for over 918 days (2.51 years), and is showing
no spare blocks used and 3% wear with 40.5 Tbytes written.  The
specifications for the 950 Pro 256 Gbyte model say it should last for
200 Tbytes written, so it is around 20% of its expected lifetime now.
On that basis, if I keep using it at the same rate, it should last for
over 12 years in total.
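The arithmetic above can be reproduced from the raw SMART fields. A minimal Python sketch, assuming the NVMe convention that one "data unit" is 1000 blocks of 512 bytes (which matches smartctl's 40.5 TB figure to within rounding), the 200 TB rating quoted above, and a constant write rate:

```python
# Wear arithmetic for the Samsung 950 Pro, using values copied from the smartctl output.
# Assumptions: NVMe data unit = 1000 x 512 bytes; 200 TB rated endurance as quoted;
# the write rate stays constant (simple linear projection).

NVME_DATA_UNIT_BYTES = 1000 * 512

power_on_hours = 22_043
data_units_written = 79_225_837
rated_tbw = 200  # terabytes written, per the quoted specification

days = power_on_hours / 24
years = days / 365.25
tb_written = data_units_written * NVME_DATA_UNIT_BYTES / 1e12
fraction_used = tb_written / rated_tbw          # share of rated endurance consumed
projected_total_years = years / fraction_used   # total life if the rate stays the same

print(f"{days:.1f} days ({years:.2f} years)")
print(f"{tb_written:.2f} TB written, {fraction_used:.0%} of rated endurance")
print(f"projected life at this rate: {projected_total_years:.1f} years in total")
```

The linear projection comes out at roughly 12.4 years in total, which is where the "over 12 years" estimate comes from.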

2) Crucial MX300 SATA 525 Gbytes - this is the system drive in my
test/development box, but since my old OS/2 box died it has been
running 24/7 with OS/2 running in a virtual machine on Ubuntu 16.04.
It is not nearly as heavily used as my MythTV box - only the operating
system and the VM virtual system disk are on the SSD - the bulk of the
data for my email and the OS/2 VM storage drives are on ordinary hard
drives.

root@lith:~# smartctl -a /dev/disk/by-id/ata-Crucial_CT525MX300SSD1_1630135D7E12
smartctl 6.6 2016-11-12 r4366 [x86_64-linux-4.4.0-140-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron BX/MX1/2/3/500, M5/600, 1100 SSDs
Device Model:     Crucial_CT525MX300SSD1
Serial Number:    1630135D7E12
LU WWN Device Id: 5 00a075 1135d7e12
Firmware Version: M0CR060
User Capacity:    525,112,713,216 bytes [525 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Dec 20 21:02:13 2018 NZDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 1703) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (   9) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x0035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       9420
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       262
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   099   099   000    Old_age   Always       -       19
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       70
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   073   054   000    Old_age   Always       -       27 (Min/Max 16/46)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   099   099   001    Old_age   Offline      -       1
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       5871831640
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       183758941
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       187091403
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       1916
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Vendor (0xff)       Completed without error       00%      9368 -
# 2  Vendor (0xff)       Completed without error       00%      9117 -
# 3  Vendor (0xff)       Completed without error       00%      8886 -
# 4  Vendor (0xff)       Completed without error       00%      8802 -
# 5  Vendor (0xff)       Completed without error       00%      8542 -
# 6  Vendor (0xff)       Completed without error       00%      8295 -
# 7  Vendor (0xff)       Completed without error       00%      7998 -
# 8  Vendor (0xff)       Completed without error       00%      7742 -
# 9  Vendor (0xff)       Completed without error       00%      7527 -
#10  Vendor (0xff)       Completed without error       00%      7430 -
#11  Vendor (0xff)       Completed without error       00%      7175 -
#12  Vendor (0xff)       Completed without error       00%      6850 -
#13  Vendor (0xff)       Completed without error       00%      6582 -
#14  Vendor (0xff)       Completed without error       00%      6334 -
#15  Vendor (0xff)       Completed without error       00%      6073 -
#16  Vendor (0xff)       Completed without error       00%      5789 -
#17  Vendor (0xff)       Completed without error       00%      5507 -
#18  Vendor (0xff)       Completed without error       00%      5192 -
#19  Vendor (0xff)       Completed without error       00%      4924 -
#20  Vendor (0xff)       Completed without error       00%      4690 -
#21  Vendor (0xff)       Completed without error       00%      4651 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Completed [00% left] (106848-172383)
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

It has been running for 392.5 days with 5871831640 sectors written - I
am not quite sure what the sector size is.  It is reporting
Percent_Lifetime_Remain as 99%.  That is the number that Crucial says
is the critical one.  But from that, if it is only losing 1% of
lifetime per year, then it is going to last a very long time.  Even if
it is closer to 2% per year, I am very happy that it will last way
longer than the remaining time before I will be replacing that PC.
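The same kind of estimate for the Crucial drive, as a Python sketch. The key assumption, flagged again in the code, is that attribute 246 (Total_Host_Sector_Write) counts 512-byte sectors, matching the drive's reported 512-byte logical sector size; on that basis about 3.0 TB has been written. The raw value 1 of attribute 202 is read here as 1% of lifetime used, consistent with the normalised value of 99.

```python
# Wear arithmetic for the Crucial MX300, using values copied from the smartctl output.
# ASSUMPTION: Total_Host_Sector_Write (attribute 246) counts 512-byte sectors,
# matching the "Sector Size: 512 bytes logical/physical" line; the unit is not stated.

SECTOR_BYTES = 512  # assumed unit of attribute 246

power_on_hours = 9_420
host_sectors_written = 5_871_831_640
percent_lifetime_used = 1  # raw value of attribute 202 (normalised value 99 = 99% left)

days = power_on_hours / 24
years = days / 365.25
tb_written = host_sectors_written * SECTOR_BYTES / 1e12
wear_per_year = percent_lifetime_used / years  # % of rated lifetime consumed per year

print(f"{days:.1f} days, {tb_written:.2f} TB written (if sectors are 512 bytes)")
print(f"about {wear_per_year:.2f}% of rated lifetime used per year")
# Linear projection of the remaining 99% at the measured rate and at a pessimistic 2%/year.
for rate in (wear_per_year, 2.0):
    print(f"at {rate:.2f}%/year, 99% remaining lasts about {99 / rate:.0f} more years")
```

With those assumptions the wear rate works out to a little under 1% per year, so even the pessimistic 2% per year case leaves decades of headroom on a linear projection.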

