[mythtv-users] OT: Raid Server troubleshooting

Don Jerman djerman at pobox.com
Thu Oct 11 23:27:09 UTC 2007


On 10/11/07, nasa01 at comcast.net <nasa01 at comcast.net> wrote:
> Hi,
>
> Sorry for the OT post....  But I do see lots of people using raid servers with their myth setup and I haven't been able to figure out what is causing my issues.
>
> Here's my issue:  writes to my raid array often (but not always) end up in a "D -- uninterruptible sleep" state.  This could be a result of trying to scp a large file (like a .vob file) to the array.  Or it could be a result of moving a .vob file from a non-array drive to the array.  The processes that end up in the "D" state tend to be the pdflush and/or sshd processes (although in the mv instance it was the mv process).  Reading from the array *seems* to work fine, although mythmusic is crashing whenever I try to update my music list.
>
>
> What have I done so far....
>
> - Google searches on "uninterruptible sleep" and pdflush/sshd -- mostly it reinforced that the D state can't be killed.
> - scp'd files to the OS drive.  Copy speed starts out around 20MB/s and ends around 10MB/s.  IOSTAT shows cpu utilization at near 100% (actually it showed over 100% at some points).  File was a vob of approx 7G size.  Removed file (took like 5 minutes!)
> - scp'd same file to the raid drive.  Copy speed starts out around 24MB/s....  IOSTAT starts showing cpu utilization at near 100% and read/write numbers that seem ok (don't have it in front of me right now).  This lasts for a couple of minutes before IOSTAT goes to all zeros -- cpu utilization, reads/writes, everything.  However, scp still reports copying in progress (at speeds of about 15MB/s).  Eventually scp shows copying as "--stalled--" and the copy speed drops until it hits zero.  'ps aux' shows the pdflush process has gone into "D" state.
> - Since "D" state forces a reboot, and JFS requires a check after such -- running fsck.jfs /dev/sda1 completes successfully without any errors reported.
>
>
>
> My Setup:
>
> Raid Server:
> HighPoint Technologies, RocketRaid 1740
> DFI Infinity RS482 Motherboard
> CPU AMD|A64 3400+ 939
> 756M generic ram
> 3x320G && 1x500G in Raid 5 (filesystem is JFS)
> 1 60G IDE HD as OS drive (filesystem is ext)
> CoolerMaster ITower 930 case
> Mandriva 2007.1 (stock)
>
> cat /etc/fstab
> /dev/hda1 / ext3 defaults 1 1
> /dev/hda6 /home ext3 defaults 1 2
> /dev/sda1 /mnt/Raid jfs defaults,noatime,rw 1 0
>
> /var/log/messages
> rr174x: module license 'Proprietary' taints kernel.
> rr174x:0: RocketRAID 174x controller driver v1.02 (Apr 7 2007 21:39:11)
> ACPI: PCI Interrupt 0000:02:05.0[A] -> GSI 16 (level, low) -> IRQ 177
> rr174x:0: adapter at PCI 2:5:0, IRQ 177
> rr174x:0: start channel [0,0]
> rr174x:0: start channel [0,1]
> rr174x:0: start channel [0,2]
> rr174x:0: start channel [0,3]
> rr174x:0: channel [0,0] started successfully
> rr174x:0: channel [0,1] started successfully
> rr174x:0: channel [0,2] started successfully
> rr174x:0: channel [0,3] started successfully
> scsi4 : rr174x
> Vendor: HPT Model: DISK_4_0 Rev: 4.00
> Type: Direct-Access ANSI SCSI revision: 00
> SCSI device sda: 1874853888 512-byte hdwr sectors (959925 MB)
> sda: Write Protect is off
> sda: Mode Sense: 2f 00 00 00
> SCSI device sda: drive cache: write back
> SCSI device sda: 1874853888 512-byte hdwr sectors (959925 MB)
> sda: Write Protect is off
> sda: Mode Sense: 2f 00 00 00
> SCSI device sda: drive cache: write back
>  sda: sda1
> sd 4:0:0:0: Attached scsi disk sda
> sd 4:0:0:0: Attached scsi generic sg0 type 0
>
>
> Thanks for looking.
>
> Nasa

Your best bet is to go ahead and scale up that backup solution.
Seriously - this thing is going to go away anyhow, so plan to take it
down on a schedule before it takes itself out.

When you've rebuilt it to present each disk directly, and then built
the array on the kernel md drivers, you'll still be doing all the
heavy lifting with the processor, but if there are any bugs in the
driver's RAID logic you'll be bypassing them.
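
For a sense of what that heavy lifting is: RAID 5 parity is the
byte-wise XOR of the data blocks in a stripe, and a lost block is
recovered by XOR-ing the surviving blocks with the parity.  The Python
sketch below only illustrates that arithmetic; it is not the md
driver's code, and the function names and tiny 4-byte blocks are made
up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length byte strings."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column (one byte from each).
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    # Three data blocks plus one parity block, as on a 4-disk RAID 5 stripe.
    data = [b"myth", b"tv-u", b"sers"]
    parity = xor_blocks(data)
    # Pretend the disk holding data[1] died and rebuild its block.
    recovered = rebuild_missing([data[0], data[2]], parity)
    print(recovered == data[1])
```

Every write has to update parity this way, which is where the cpu
time goes when the host does the RAID work.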

If it's still flaky you can go ahead and get a different controller;
if it's not, then you fixed it!
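
To judge "still flaky" after the rebuild, one option is to watch for
processes stuck in D state while a large copy runs.  A rough Python
sketch, assuming a Linux /proc filesystem; the helper names are
invented here, and it reports the same state letter that ps shows in
its STAT column.

```python
import os


def parse_stat(stat_text):
    """Split one /proc/<pid>/stat line into (pid, command, state).

    The command is wrapped in parentheses and may itself contain spaces
    or parentheses, so split on the LAST closing parenthesis.
    """
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def processes_in_d_state(proc_root="/proc"):
    """Return (pid, command) for every process in uninterruptible sleep."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, IndexError, ValueError):
            continue  # process exited while we were looking, or odd line
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in processes_in_d_state():
        print(pid, comm)
```

A process showing up briefly in D state during disk I/O is normal; the
symptom to look for is pdflush, sshd or mv staying there across
repeated runs.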

Before you go get another controller, download and run SMART tools to
see if the drives report any errors.  If the drive hardware is kicking
up problems you'll see it and you can replace the faulty one.
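
As a sketch of what to look for in the SMART output: smartmontools'
`smartctl -A /dev/sdX` prints an attribute table whose last column is
the raw value.  The Python below assumes that table layout and picks
out three attributes that are commonly treated as warning signs when
non-zero.  The attribute shortlist and the function names are choices
made for this example, not something from the thread.  Drives behind a
RAID controller may not answer SMART queries through the array device,
so this may only work once the disks are presented directly.

```python
import subprocess

# Attributes whose non-zero raw value is commonly read as a bad sign.
# This shortlist is an assumption for the example, not a complete rule.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable")


def parse_attributes(smartctl_output):
    """Map attribute name -> raw value text from `smartctl -A` output."""
    attributes = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attributes[fields[1]] = " ".join(fields[9:])
    return attributes


def suspicious(attributes):
    """Return the watched attributes whose raw value is a non-zero integer."""
    found = {}
    for name in WATCHED:
        raw = attributes.get(name, "0").split()[0]
        if raw.isdigit() and int(raw) != 0:
            found[name] = int(raw)
    return found


def check_drive(device):
    """Run smartctl on one device (needs root and smartmontools installed)."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return suspicious(parse_attributes(result.stdout))
```

Calling `check_drive("/dev/sdb")` for each member disk (the device name
is a placeholder) returns an empty dict when none of the watched
counters have moved.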

If you do go for another controller, a real hardware raid would take
the load off the cpu, but think about a motherboard upgrade instead.
A 3ware controller for 4x sata-2 ports is around $320, which is about
the same as an AMD BE-2350, M2NPV-VM motherboard and 2 sticks of
PC6400 RAM.  Since you didn't give cpu specs I don't know if this
would be an upgrade, but I'm guessing yes, from the bus spec of the
card and your throughput numbers.  It would be a bigger project, but
you should wind up ahead on stability and performance.

