[mythtv-users] IOBOUND: Arrrg, will it ever end?

Tue May 17 08:13:31 UTC 2005

Over the past 2 weeks, I have read countless search hits and attempted
several fixes to stop these awful IOBOUND message floods.  I'm in need of some
serious help.  I have implemented many of the suggestions in the
archives (and some of my own crazy ideas) with no luck.

Sometimes when recording two shows at once, mythtv gets locked into a
series of IOBOUND messages:

2005-05-16 23:36:43.145 IOBOUND - blocking in ThreadedFileWriter::Write()
2005-05-16 23:36:47.807 IOBOUND - blocking in ThreadedFileWriter::Write()
2005-05-16 23:36:52.579 IOBOUND - blocking in ThreadedFileWriter::Write()
2005-05-16 23:36:57.729 IOBOUND - blocking in ThreadedFileWriter::Write()

and then one of these "100%" msgs:
2005-05-16 23:51:02.736 /dev/video32 ringbuf avg 0.149767% max 100% samples 249187
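
To line these messages up against recording start times, a rough Python
sketch like the one below could group the IOBOUND lines into bursts. The
function name iobound_bursts and the 30-second gap threshold are my own
assumptions, not anything from MythTV; it relies only on the timestamp
format shown in the log lines above.

```python
import re
from datetime import datetime

# Matches the timestamp at the start of an IOBOUND log line, as in the sample above.
IOBOUND_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) IOBOUND")


def iobound_bursts(lines, gap_seconds=30.0):
    """Group IOBOUND timestamps into bursts.

    A new burst starts whenever the gap since the previous IOBOUND line
    exceeds gap_seconds (an arbitrary, hypothetical threshold).
    Returns a list of (first_timestamp, last_timestamp, message_count).
    """
    bursts = []
    for line in lines:
        match = IOBOUND_RE.match(line)
        if not match:
            continue  # ignore every other kind of log line
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if bursts and (ts - bursts[-1][-1]).total_seconds() <= gap_seconds:
            bursts[-1].append(ts)
        else:
            bursts.append([ts])
    return [(b[0], b[-1], len(b)) for b in bursts]
```

Feeding it the lines of the backend log (wherever mythbackend's output is
captured on your system) gives one (start, end, count) tuple per burst,
which can then be compared against the scheduled start times.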

This is a hdtv/atsc backend with two hd3000 cards.

This seems to happen:
a. When recording two shows at once.
b. When at least one of them is a high bitrate stream (HDTV show).
c. When both recordings start simultaneously. If I manually start one
a few minutes after the other is in-progress, the iobound 'lockup'
usually doesn't happen.
d. Once or twice a day, out of ~15 recordings per day.

When this happens, both recordings will be unwatchable (lots of
pixelation, popping, and skipping).  I also see hd3000 overrun
messages in /var/log/messages.

I have tried:
1. Increasing the buffer in RingBuffer.cpp from 2MB to 4MB to 8MB, and
even to 32MB.
2. Increasing the hdtv buffer in tv_rec.cpp from 4MB to 8MB to 16MB,
and even 64MB.
3. Increasing the pchdtv buffer in the driver (BUF_DEFAULT) to avoid
buffer overruns.
4. Adding the 'noapic' option when the kernel is booted.
5. Adding more cooling and backing off overclocking. My athlon 2100+
was overheating and I was getting segfaults during recompiling.

I have verified:
i. using_dma on all drives is "1" (enabled).
ii. The two pchdtv cards are on different interrupts.
iii. vmstat reports low cpu utilization, except when the IOBOUND
messages are present. "sy" is the biggest consumer when they are present.
iv. No other processes are consuming cpu, i.e. no commercial flagging jobs.

I have thought about doing the following (I could really use advice if
ANY of these is worth trying):

A. Changing from ext3 to xfs. Better performance? I'm not really sure disk
i/o is the problem, except maybe when the two recordings first start
up.  Disk i/o looks modest when recording two shows simultaneously
with the iobound msgs absent.  Using vmstat:
bo ~4000-8000 with lots of bo=0 every other second.

B. Going back to a frontend & backend in ONE pc: 3.4GHz P4. I didn't
have these blasted iobound problems before splitting the FE and BE.
Kinda weird.
C. Changing my LVM volume group to a single drive, because one drive is
reported as UDMA(33) but the other drive is seen as UDMA(100).
D. Staggering recordings so one starts 1 or 2 mins before the other.
Testing this now. Not a great solution.
E. Adding more RAM, cpu?
F. Pulling my hair out  :-O
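
For option A, the "bo" numbers from vmstat could be summarized instead of
eyeballed, so a quiet period and an IOBOUND period can be compared side by
side. Below is a minimal Python sketch that works on saved `vmstat 1`
output. The function names bo_column and summarize are hypothetical, and it
assumes the usual vmstat layout with a header row that contains a "bo"
column.

```python
def bo_column(vmstat_text):
    """Extract the 'bo' (blocks written out) values from saved `vmstat 1` output."""
    values = []
    bo_index = None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "bo" in fields:
            # Column-name header row; vmstat repeats it periodically.
            bo_index = fields.index("bo")
            continue
        if bo_index is None or len(fields) <= bo_index:
            continue  # banner row, blank line, or truncated line
        if not fields[bo_index].isdigit():
            continue
        values.append(int(fields[bo_index]))
    return values


def summarize(values):
    """Return sample count, max, mean, and the fraction of samples with bo == 0."""
    if not values:
        return None
    zeros = sum(1 for v in values if v == 0)
    return {
        "samples": len(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "zero_fraction": zeros / len(values),
    }
```

Running `vmstat 1 > quiet.txt` during a clean dual recording and again
during an IOBOUND flood, then comparing the two summaries, should show
whether write volume actually changes when the messages appear.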

This seems to be a backend-only problem: it happens even when the frontend
is idle. I have suspected, though, that playback i/o exacerbates the
problem. No proof though.

What can I do to isolate this problem?
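
One way to check whether the disks alone can keep up with two streams
starting at the same instant, independent of MythTV and the capture cards,
is a small synthetic write test. The Python sketch below is only that: the
function names, file names, and sizes are hypothetical, and calling fsync
after every chunk is deliberately harsher than a real recorder (it is my
assumption, not how ThreadedFileWriter behaves).

```python
import os
import threading
import time


def write_stream(path, total_bytes, chunk_bytes, latencies):
    """Write total_bytes to path in chunks, timing each write + fsync."""
    chunk = b"\0" * chunk_bytes
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            start = time.monotonic()
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # force the data out to the disk
            latencies.append(time.monotonic() - start)
            written += chunk_bytes


def simultaneous_write_test(directory, streams=2,
                            total_bytes=64 * 1024 * 1024,
                            chunk_bytes=256 * 1024):
    """Start several writers at once; return the worst chunk latency per stream."""
    results = [[] for _ in range(streams)]
    paths = [os.path.join(directory, "iotest_%d.bin" % i) for i in range(streams)]
    threads = [
        threading.Thread(target=write_stream,
                         args=(path, total_bytes, chunk_bytes, result))
        for path, result in zip(paths, results)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for path in paths:
        os.remove(path)  # clean up the scratch files
    return [max(r) for r in results]
```

Pointing simultaneous_write_test at the recordings directory and looking
for multi-second worst-case latencies would suggest the filesystem or
drives stall at startup; consistently small values would point back toward
the capture side.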

Backend:
Athlon 2100+
768MB RAM: 256MB PC2100 & 512MB PC3200
2x pcHDTV HD3000 tuners (v1.5 drivers)
PATA, 80GB WDC, 200GB WDC.
Fedora Core 3
kernel 2.6.9-1.667
mythbackend version: 0.18.20050409-1 (cvs)

I have been working 2 weeks on this: Implement a change, wait for it
to happen, get discouraged, do some more searches, try something else.
Lather, rinse, repeat.

I appreciate any help you can provide.  I think this project is excellent.
Thanks
Donn
ps: I tried a recent sync with cvs around 5/12, and myth would
sometimes segfault when a recording ended. So, I reverted to cvs
from 4/22.