[mythtv-users] OT: system disk failure
Stephen Worthington
stephen_agent at jsw.gen.nz
Sun Oct 6 09:38:43 UTC 2013
On Sun, 6 Oct 2013 10:10:35 +0200, you wrote:
>2013/10/6 Stephen Worthington <stephen_agent at jsw.gen.nz>:
>> On Sat, 5 Oct 2013 22:27:37 +0200, you wrote:
>>
>>>Hello all,
>>>
>>>This may be off topic, but I am worried about it.
>>>
>>>My backend system disk is showing this message on boot:
>>>
>>># dmesg | grep error
>>>[ 3.540794] ata3.00: failed to enable AA (error_mask=0x1)
>>>[ 3.541652] ata3.00: failed to enable AA (error_mask=0x1)
>>>
>>>I read that it is related to a disk failure, is this really a problem?
>>>
>>>Thanks.
>>
>> A bit of googling finally found some kernel code that displays this
>> error:
>>
>> http://lxr.free-electrons.com/source/drivers/ata/libata-core.c
>>
>> That tells me that this error seems to be related to a feature flag
>> for SATA II drives. Further googling shows that it is something to do
>> with FIS (Frame Information Structure, the packets SATA uses to carry
>> commands and data between host and drive). The drive apparently
>> advertises that it has the AA (Auto Activate) feature
>> but when a command is sent to set the AA option, it does not work, so
>> the kernel reports an error. Or so the driver thinks - it may be a
>> driver bug. Have you updated your kernel recently?
>>
>> If the drive is not connected via a port multiplier, then it is likely
>> that you can just ignore this message. Even if it is on a port
>> multiplier, as best I can tell, all that will happen is that this
>> feature will not be used, when possibly the drive supports it. It is
>> also possible that the drive has a firmware bug and is advertising a
>> feature that it actually does not support.
>>
>> It would be a good idea to use smartctl to check out the drive and see
>> if there are any other problems, just in case. And check that the
>> cable is OK, in case it might be caused by a signaling error in the
>> cable. But otherwise I would just get smartctl to run a long drive
>> test and if it passed, I would ignore this AA error.
>
>Thanks for your reply.
>
>The disk is the system disk. Do I need to unplug it and test it in another PC with smartctl?
>
>Which is the long drive test?
>
>Thank you very much for your help.
>
>Best regards.

No, SMART monitoring and SMART drive tests can be run on a working
system disk. A SMART long drive test is supposed to be paused
automatically by the drive when it receives a command to do some real
work, and resumed when that work is finished. However, if your system
drive is very busy, the background test will keep moving the drive's
heads away from where the real work needs them, which slows response
times. In the most extreme case, that slowdown can cause errors on a
working system. But normally most drives are just not that busy.

So if it is possible to have the system drive offline for the tests,
that is a good idea, but you can run them with it online and
operational if necessary.

First you need to find the device your system drive is on. Use this
command from a terminal:

mount | grep " on / "

In my case, this gives:

/dev/sda3 on / type ext3 (rw,errors=remount-ro)

so my system drive is on /dev/sda. So then this command will give the
full SMART data for the drive:

sudo smartctl -a /dev/sda
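
Going from the partition that mount reports to the whole-disk device
that smartctl wants can be sketched in shell roughly as below. The
helper names root_partition and disk_of_partition are made up for
illustration, and the sketch assumes plain partition names such as
/dev/sda3 or /dev/nvme0n1p2. It will not resolve LVM, RAID or
/dev/mapper devices.

```shell
#!/bin/sh
# root_partition (hypothetical helper): reads `mount` output on stdin
# and prints the device that is mounted on /.
root_partition() {
    awk '$2 == "on" && $3 == "/" { print $1; exit }'
}

# disk_of_partition (hypothetical helper): strips the partition suffix
# from a partition device name. Assumes simple naming only.
disk_of_partition() {
    case "$1" in
        /dev/nvme*n*p[0-9]*|/dev/mmcblk*p[0-9]*)
            # NVMe/MMC style: /dev/nvme0n1p2 -> /dev/nvme0n1
            printf '%s\n' "$1" | sed 's/p[0-9][0-9]*$//' ;;
        *)
            # sdX/hdX style: /dev/sda3 -> /dev/sda
            printf '%s\n' "$1" | sed 's/[0-9][0-9]*$//' ;;
    esac
}

# Usage:
#   sudo smartctl -a "$(disk_of_partition "$(mount | root_partition)")"
```

If the name it prints looks wrong for your setup, work out the device
by hand from the mount output instead.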
Since you are not familiar with SMART, it might be a good idea to post
the output on pastebin.com and reply with the URL here so that we can
check it out for you and point out any problems. There are plenty of
web pages out there that will help you to understand SMART data, but
it is a fairly steep learning curve.
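
As a rough starting point for reading that output, many people look
first at the raw values of a few sector-related attributes. Which
attributes a drive reports, and what its raw values mean, varies by
vendor, so treat this only as a first filter. The sketch assumes the
usual ten-column attribute table printed by smartctl -A, with the raw
value in the tenth column. key_attributes is a made-up name.

```shell
#!/bin/sh
# key_attributes (hypothetical helper): reads `smartctl -A` output on
# stdin and prints "name raw_value" for three sector-related
# attributes, if the drive reports them.
key_attributes() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
        print $2, $10
    }'
}

# Usage:
#   sudo smartctl -A /dev/sda | key_attributes
```

Non-zero raw values for these are usually worth a closer look, but a
drive that does not list them at all is not necessarily faulty.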

This command will start a short drive test:

sudo smartctl -t short /dev/sda

The smartctl command should tell you an estimated time for the test to
complete, and you can look in the smartctl -a output for the
"self-test routine recommended polling time" values to see the
estimated times for the various tests to run. A short test should
take only a few minutes.
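
The recommended polling times can also be pulled out with a small
filter, sketched below. It assumes the two-line layout I have seen
smartctl use, where a line such as "Extended self-test routine" is
followed by a "recommended polling time: ( NNN) minutes." line. If
your version prints it differently, the filter will print nothing.
polling_minutes is a made-up name.

```shell
#!/bin/sh
# polling_minutes (hypothetical helper): reads `smartctl -c` (or -a)
# output on stdin and prints the recommended polling time in minutes
# for the given test kind, "Short" or "Extended".
polling_minutes() {
    awk -v kind="$1" '
        # Remember that the wanted heading line was just seen.
        $0 ~ ("^" kind " self-test routine") { want = 1; next }
        # The first polling-time line after it holds the number.
        want && /recommended polling time/ {
            gsub(/[^0-9]/, "")   # keep only the digits
            print
            exit
        }
    '
}

# Usage:
#   sudo smartctl -c /dev/sda | polling_minutes Extended
```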

After the time for the test has elapsed, run smartctl -a again to see
the results, which the drive stores in its SMART self-test log. If
there are no results yet, wait a little longer and try again.

If necessary, this command should stop any running SMART self test:

sudo smartctl -X /dev/sda

Here is the result I just got running a short self test on one of my
drives that had never been tested before:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4651         -
(The text may be wrapped by the email software).
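
If you only want the status of the most recent test rather than the
whole smartctl -a output, smartctl -l selftest prints just the
self-test log, and the newest entry is the "# 1" line. A minimal
sketch of picking out its Status column follows. It assumes the
columns are separated by two or more spaces, as in the log above, and
latest_selftest_status is a made-up name. While a test is still
running, the newest entry may show an "in progress" status or may not
be there yet.

```shell
#!/bin/sh
# latest_selftest_status (hypothetical helper): reads the output of
# `smartctl -l selftest` on stdin and prints the Status column of the
# newest entry (the line numbered "# 1").
latest_selftest_status() {
    # Split on runs of two or more spaces so that multi-word columns
    # such as "Short offline" stay in one field.
    awk -F '  +' '/^# *1 / { print $3; exit }'
}

# Usage:
#   sudo smartctl -l selftest /dev/sda | latest_selftest_status
```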

If the short test passes, then you can try a long test:

sudo smartctl -t long /dev/sda

The time a long self test will take depends on the drive speed and
size, but will be hours on any reasonably large drive. On a huge slow
drive, it may take more than one day. Again, once the test's time has
elapsed (plus a little margin for interruptions for real work on the
drive), run smartctl -a again to see the results.

Note - you may need to install smartctl (it is part of the
smartmontools package) as it is not installed by default on lots of
systems.