[mythtv-commits] Ticket #1076: mythbackend dies when writing big files (4 GB) - FAT32 filesystem
MythTV
mythtv at cvs.mythtv.org
Fri Jan 20 08:01:28 UTC 2006
#1076: mythbackend dies when writing big files (4 GB) - FAT32 filesystem
---------------------------+------------------------------------------------
Reporter: buzz at oska.com | Owner: ijr
Type: defect | Status: new
Priority: minor | Milestone:
Component: mythtv | Version:
Severity: medium |
---------------------------+------------------------------------------------
Buzz Says:
(I'm making the -dev conversation into a ticket, as it's verifiably a bug
when the backend seg faults)
Scenario:
Backend saves files to FAT32 partition.
Backend tries to exceed 4 GB (or thereabouts) while unattended.
Backend dies with error "File size limit exceeded" emitted by OS.
Backend's last message prior to dying was:
"TFW: safe_swite() funky usleep"
(message comes from ThreadedFileWriter.cpp )
---------------------------
Buzz says:
Wouldn't it be more reasonable if it barfed the recording (partially or
entirely) WITHOUT crashing mythbackend, rather than crashing entirely as it
does now?
---------------------------
Isaac says:
Backtrace? I'm not going to want to add code specifically to handle FAT32,
but certainly dying is bad.
---------------------------
Buzz says:
Last 4 lines of 'mythbackend -v all' followed by backtrace:
2006-01-19 16:35:20.575 MSqlQuery: INSERT INTO recordedmarkup (chanid,
starttime, mark, type, offset) VALUES ( '1007' , '2006-01-19T15:15:00' ,
'119754' , 9 , '4289249140' );
2006-01-19 16:35:20.576 MSqlQuery: INSERT INTO recordedmarkup (chanid,
starttime, mark, type, offset) VALUES ( '1007' , '2006-01-19T15:15:00' ,
'119772' , 9 , '4289895672' );
2006-01-19 16:35:20.577 MSqlQuery: INSERT INTO recordedmarkup (chanid,
starttime, mark, type, offset) VALUES ( '1007' , '2006-01-19T15:15:00' ,
'119790' , 9 , '4290548972' );
2006-01-19 16:35:25.630 TFW: safe_write(): funky usleep
Program received signal SIGXFSZ, File size limit exceeded.
[Switching to Thread -1336443984 (LWP 2872)]
0x00b8e402 in __kernel_vsyscall ()
(gdb)
(gdb) bt
#0 0x00b8e402 in __kernel_vsyscall ()
#1 0x007540bb in __write_nocancel () from /lib/libpthread.so.0
#2 0x00e93412 in safe_write (fd=20, data=0xaee70d90, sz=12920)
at ThreadedFileWriter.cpp:57
#3 0x00e950a1 in ThreadedFileWriter::DiskLoop (this=0x89b69d0)
at ThreadedFileWriter.cpp:367
#4 0x00e951a5 in ThreadedFileWriter::boot_writer (wotsit=0x89b69d0)
at ThreadedFileWriter.cpp:93
#5 0x0074fb80 in start_thread () from /lib/libpthread.so.0
#6 0x02d969ce in clone () from /lib/libc.so.6
---------------------------
Buzz says:
*) The OS is sending a SIGXFSZ to the backend; the backend is taking the
default action, which is "core dump and exit".
Solution:
* capture SIGXFSZ, handle it gracefully.
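The capture step could look something like the following. This is a minimal sketch, not the attached patch: the handler and function names are illustrative (Buzz's patch calls the global "LastSignal"), and all the handler does is record the signal so the interrupted write() fails with an error instead of the process taking the default core-dump-and-exit action.

```cpp
#include <csignal>

// Hypothetical global; the attached patch's equivalent is "LastSignal".
static volatile sig_atomic_t last_signal = 0;

static void sigxfsz_handler(int signum)
{
    // Only async-signal-safe work here: record the signal and return.
    last_signal = signum;
}

static void install_sigxfsz_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = sigxfsz_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; // no SA_RESTART: let the interrupted write() fail
    sigaction(SIGXFSZ, &sa, nullptr);
}
```

With the handler installed, SIGXFSZ no longer kills the backend; the writing code can then poll the global and unwind cleanly.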
---------------------------
Buzz says:
Hi All. I'm working on a solution to this thread and have got the
following steps working in my code (diff attached -
catch_and_handle_SIGXFSZ_diff.txt):
1) OS sends SIGXFSZ to mythbackend
2) backend captures said signal, squirrels it into a global called
"LastSignal", so anyone who wants to can look for it. (yes, I know globals
are bad, but signal handlers are worse.)
3) ThreadedFileWriter.cpp has an existing function called "safe_write"
that I've modified so that it checks for the signal(in the global) before
trying to write to any file.
4) safe_write: if a SIGXFSZ signal was received it "aborts" the in-
progress write (then-and-there, without flushing memory buffers to disk or
anything), returning an error.
5) safe_write is called from inside ThreadedFileWriter::DiskLoop. The
return value of safe_write is tested in DiskLoop, and it causes both
threads (write and sync threads of ThreadedFileWriter) to be torn down,
and the ThreadedFileWriter to enter a state of "write error".
6) next time the caller (RingBuffer.cpp) tries to call tfw->Write(...) it
fails, returning -1 up to the calling function (which is in RingBuffer.cpp
- Write), and the tfw is torn down, closing the open file handle, and
cleaning up.
7) RingBuffer.cpp already had the capability to return -1 or other errors,
so it's been tweaked to look at the return status of the tfw->Write call
too, and pass the error up if it occurs.
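Steps 3 and 4 above can be sketched roughly as below. The function name is hypothetical (the real code is safe_write() in ThreadedFileWriter.cpp), and the global stands in for the one the signal handler sets; it is defined here only so the sketch compiles stand-alone.

```cpp
#include <csignal>
#include <cerrno>
#include <unistd.h>

// Set by the SIGXFSZ handler in the real patch (step 2); defined here
// so this fragment is self-contained.
static volatile sig_atomic_t last_signal = 0;

// Hypothetical version of steps 3-4: check the global before writing,
// and abort the in-progress write with an error if SIGXFSZ arrived.
static ssize_t write_or_abort(int fd, const void *data, size_t sz)
{
    if (last_signal == SIGXFSZ)
        return -1;                 // signal seen: abort, don't write
    ssize_t ret = write(fd, data, sz);
    if (ret < 0 && errno == EFBIG)
        return -1;                 // limit hit during this very write
    return ret;
}
```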
...now, I'm not sure where to take it from here.
The signal is definitely being captured and passed all the way back up to
the RingBuffer, so I know that's working. But nothing else (backend and/or
frontend) seems to recognise that the recording failed.
Should I go that far, or just barf the error message to the log, and leave
it at that?
IE: How do I make everything else recognise that the recording of this
file has aborted/failed?
Am I doing the right thing here... Or is there an easier way?
---------------------------
Mark Weaver says:
Just ignore the signal - write will return EFBIG and the recording
should follow the usual failure path. You should be able to test it
with ulimit -f, that will allow you to generate SIGXFSZ with smaller
files.
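Mark's approach can be sketched as follows. Assumptions worth flagging: the function is illustrative, not MythTV code, and it uses setrlimit(RLIMIT_FSIZE, ...) as the in-process equivalent of `ulimit -f` to shrink the limit enough to reproduce the condition with a tiny file. Note that POSIX only fails with EFBIG once no further bytes can be written; a write straddling the limit is truncated first, hence the two writes.

```cpp
#include <csignal>
#include <cerrno>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

// Sketch of the suggestion: ignore SIGXFSZ so an oversized write()
// fails with errno == EFBIG instead of killing the process.
// Returns the errno observed on the over-limit write (EFBIG expected).
int demo_efbig(const char *path)
{
    signal(SIGXFSZ, SIG_IGN);          // default action would core-dump

    struct rlimit rl;
    rl.rlim_cur = 4096;                // 4 KB file size limit,
    rl.rlim_max = 4096;                // equivalent of a small ulimit -f
    setrlimit(RLIMIT_FSIZE, &rl);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    char buf[8192] = {0};
    write(fd, buf, sizeof(buf));       // partial: fills up to the limit
    ssize_t n = write(fd, buf, sizeof(buf)); // at the limit: must fail
    int err = (n < 0) ? errno : 0;
    close(fd);
    unlink(path);
    return err;
}
```

With the signal ignored, the failed write() surfaces as an ordinary error return, which is exactly the "usual failure path" Mark refers to.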
---------------------------
Buzz says:
The problem is that as it exists now in CVS, ThreadedFileWriter.cpp has
no "usual failure path" from the 'write' command (in safe_write).
safe_write returns a uint to indicate how much was written, and '0' is a
legitimate amount to write, not an error case. I've changed the relevant
places to allow it to return negative (failure), and pass the failure back
up the calling chain to RingBuffer, where it emits an error to the log.
The backend and frontend both still seem oblivious to the error condition
that occurs when RingBuffer->Write() returns -1 during a record.
Other suggestions?
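The signedness change being described might look like this. The name is illustrative, not the actual patch: the point is simply that switching the return type from an unsigned count to ssize_t frees up -1 as a failure value that can propagate up through DiskLoop() to RingBuffer::Write().

```cpp
#include <cerrno>
#include <unistd.h>

// Illustrative signed variant of safe_write(): 0 remains a legitimate
// "wrote nothing" result, while -1 now unambiguously means failure
// (e.g. EFBIG when the FAT32 file size limit is hit).
static ssize_t safe_write_signed(int fd, const void *data, size_t sz)
{
    size_t tot = 0;
    const char *buf = static_cast<const char *>(data);
    while (tot < sz)
    {
        ssize_t ret = write(fd, buf + tot, sz - tot);
        if (ret < 0)
        {
            if (errno == EINTR)
                continue;          // interrupted: just retry
            return -1;             // real error: report to the caller
        }
        tot += static_cast<size_t>(ret);
    }
    return static_cast<ssize_t>(tot);
}
```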
---------------------------
--
Ticket URL: <http://svn.mythtv.org/trac/ticket/1076>
MythTV <http://www.mythtv.org/>