[mythtv-users] OT: Raid Server troubleshooting
Nathan A. Smith
nasa01 at comcast.net
Thu Oct 11 23:35:47 UTC 2007
On Thu, 2007-10-11 at 19:27 -0400, Don Jerman wrote:
> On 10/11/07, nasa01 at comcast.net <nasa01 at comcast.net> wrote:
> > Hi,
> >
> > Sorry for the OT post.... But I do see lots of people using raid servers with their myth setup and I haven't been able to figure out what is causing my issues.
> >
> > Here's my issue: processes writing to my raid array often (but not always) end up in a "D -- uninterruptible sleep" state. This can happen when I scp a large file (like a .vob file) to the array, or when I move a .vob file from a non-array drive to the array. The processes that end up in the "D" state tend to be the pdflush and/or sshd processes (although in the mv case it was the mv process). Reading from the array *seems* to work fine, although mythmusic crashes whenever I try to update my music list.
> >
> >
> > What have I done so far....
> >
> > - Google searches on "uninterruptible sleep" and pdflush/sshd -- mostly it reinforced that a process in the D state can't be killed.
> > - scp'd files to the OS drive. Copy speed starts out around 20MB/s and ends around 10MB/s. IOSTAT shows cpu utilization at near 100% (actually it showed over 100% at some points). File was a vob of approx 7G size. Removed file (took like 5 minutes!)
> > - scp'd the same file to the raid drive. Copy speed starts out around 24MB/s.... IOSTAT starts showing cpu utilization at near 100% and read/write numbers that seem ok (don't have them in front of me right now). This lasts for a couple of minutes before IOSTAT goes to all zeros: cpu utilization, reads/writes, everything. However, scp still reports copying in progress (at speeds of about 15MB/s). Eventually scp shows the copy as "--stalled--" and the copy speed drops until it hits zero. 'ps aux' shows the pdflush process has gone into "D" state.
> > - Since "D" state forces a reboot, and JFS requires a check after such -- running fsck.jfs /dev/sda1 completes successfully without any errors reported.
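For catching the stuck processes without eyeballing 'ps aux', here is a minimal sketch that reads the state field from /proc/<pid>/stat on Linux. The function names are made up for illustration; the only assumption about the kernel is the usual "pid (comm) state ..." layout of that file.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, command, state)."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return (pid, command) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were looking
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Run in a loop during a copy, this would show which process enters the D state first.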
> >
> >
> >
> > My Setup:
> >
> > Raid Server:
> > HighPoint Technologies, RocketRaid 1740
> > DFI Infinity RS482 Motherboard
> > CPU AMD|A64 3400+ 939
> > 756M generic ram
> > 3x320G && 1x500G in Raid 5 (filesystem is JFS)
> > 1 60G IDE HD as OS drive (filesystem is ext)
> > CoolerMaster ITower 930 case
> > Mandriva 2007.1 (stock)
> >
> > cat /etc/fstab
> > /dev/hda1 / ext3 defaults 1 1
> > /dev/hda6 /home ext3 defaults 1 2
> > /dev/sda1 /mnt/Raid jfs defaults,noatime,rw 1 0
> >
> > /var/log/messages
> > rr174x: module license 'Proprietary' taints kernel.
> > rr174x:0: RocketRAID 174x controller driver v1.02 (Apr 7 2007 21:39:11)
> > ACPI: PCI Interrupt 0000:02:05.0[A] -> GSI 16 (level, low) -> IRQ 177
> > rr174x:0: adapter at PCI 2:5:0, IRQ 177
> > rr174x:0: start channel [0,0]
> > rr174x:0: start channel [0,1]
> > rr174x:0: start channel [0,2]
> > rr174x:0: start channel [0,3]
> > rr174x:0: channel [0,0] started successfully
> > rr174x:0: channel [0,1] started successfully
> > rr174x:0: channel [0,2] started successfully
> > rr174x:0: channel [0,3] started successfully
> > scsi4 : rr174x
> > Vendor: HPT Model: DISK_4_0 Rev: 4.00
> > Type: Direct-Access ANSI SCSI revision: 00
> > SCSI device sda: 1874853888 512-byte hdwr sectors (959925 MB)
> > sda: Write Protect is off
> > sda: Mode Sense: 2f 00 00 00
> > SCSI device sda: drive cache: write back
> > SCSI device sda: 1874853888 512-byte hdwr sectors (959925 MB)
> > sda: Write Protect is off
> > sda: Mode Sense: 2f 00 00 00
> > SCSI device sda: drive cache: write back
> > sda: sda1
> > sd 4:0:0:0: Attached scsi disk sda
> > sd 4:0:0:0: Attached scsi generic sg0 type 0
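As a sanity check on the log above, the reported size is what RAID 5 should give with these drives: usable capacity is (number of drives minus one) times the smallest drive, so 180G of the 500G drive goes unused. A small sketch of the arithmetic, with function names of my own choosing and decimal units as in the kernel log:

```python
def raid5_usable_gb(drive_sizes_gb):
    """Usable RAID 5 capacity: one drive's worth goes to parity, and every
    member is truncated to the size of the smallest drive."""
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)


def sectors_to_mb(sector_count, sector_bytes=512):
    """Convert a sector count to whole decimal megabytes (10**6 bytes)."""
    return sector_count * sector_bytes // 10**6


if __name__ == "__main__":
    print(raid5_usable_gb([320, 320, 320, 500]))  # 960
    print(sectors_to_mb(1874853888))              # 959925
```

The two numbers agree (960 GB nominal against 959925 MB reported), so the controller is at least presenting the array at the expected size.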
> >
> >
> > Thanks for looking.
> >
> > Nasa
> > _______________________________________________
> > mythtv-users mailing list
> > mythtv-users at mythtv.org
> > http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-users
> >
>
> Your best bet is to go ahead and scale up that backup solution.
> Seriously - this thing is going to go away anyhow, so plan to take it
> down on a schedule before it takes itself out.
>
> When you've rebuilt it to present each disk directly, and then built
> the array on the kernel md drivers, you'll still be doing all the
> heavy lifting with the processor but if there are any bugs in the driver
> RAID logic you'll be bypassing them.
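Once the array is on the kernel md driver, its health is visible in /proc/mdstat. A minimal sketch for spotting a degraded array, assuming the usual "[4/3] [UUU_]" status notation and md0-style names; the function name and the sample text are illustrative, not taken from this machine:

```python
import re

# Matches the "[total/active] [UU_U]" status at the end of an array's block line.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
HEADER_RE = re.compile(r"^(md\d+)\s*:")


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active, flags)} for arrays missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            flags = status.group(3)
            if active < expected or "_" in flags:
                degraded[current] = (expected, active, flags)
    return degraded


if __name__ == "__main__":
    with open("/proc/mdstat") as handle:
        print(degraded_arrays(handle.read()))
```

An empty result means every array reports all of its members as up.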
>
> If it's still flaky you can go ahead and get a different controller;
> if it's not, then you fixed it!
>
> Before you go get another controller, download and run SMART tools to
> see if the drives report any errors. If the drive hardware is kicking
> up problems you'll see it and you can replace the faulty one.
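The SMART check can be scripted around smartctl from smartmontools. This is a rough sketch, assuming smartctl is installed, the script runs as root, and the drives are reachable as plain devices; disks hidden behind a RAID controller may need smartctl's -d option, so check the man page for your card. The helper names and the device path in the example are hypothetical.

```python
import subprocess


def parse_health(smartctl_output):
    """Return True for PASSED, False for any other result,
    or None if the output has no overall-health line."""
    for line in smartctl_output.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.split(":", 1)[1].strip() == "PASSED"
    return None


def drive_health(device):
    """Run 'smartctl -H' on one device and interpret the result."""
    # smartctl's exit status is a bitmask, so a non-zero code is not
    # treated as a failure to run; only the printed result is parsed.
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return parse_health(result.stdout)


if __name__ == "__main__":
    print(drive_health("/dev/sdb"))  # hypothetical device name
```

A None result usually means the drive was not reachable or does not report SMART status, which is worth knowing before blaming the controller.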
>
> If you do go for another controller, a real hardware raid would take
> the load off the cpu, but think about a motherboard upgrade instead.
> A 3ware controller for 4x sata-2 ports is around $320, which is about
> the same as an AMD BE-2350, M2NPV-VM motherboard and 2 sticks of
> PC6400 RAM. Since you didn't give cpu specs I don't know if this
> would be an upgrade, but I'm guessing yes, from the bus spec of the
> card and your throughput numbers. It would be a bigger project, but
> you should wind up ahead on stability and performance.
Thanks for all the suggestions....
Guess I will be working on that backup solution.
Nasa