[mythtv-commits] Ticket #1076: mythbackend dies when writing big files (4 GB) - FAT32 filesystem
MythTV
mythtv at cvs.mythtv.org
Fri Jan 20 08:01:28 UTC 2006
#1076: mythbackend dies when writing big files (4 GB) - FAT32 filesystem
---------------------------+------------------------------------------------
Reporter: buzz at oska.com | Owner: ijr
Type: defect | Status: new
Priority: minor | Milestone:
Component: mythtv | Version:
Severity: medium |
---------------------------+------------------------------------------------
Buzz Says:
(I'm making the -dev conversation into a ticket, as it's verifiably a bug
when the backend seg faults)
Scenario:
Backend saves files to FAT32 partition.
Backend tries to exceed 4 GB (or thereabouts) while unattended.
Backend dies with error "File size limit exceeded" emitted by OS.
Backend's last message prior to dying was:
"TFW: safe_swite() funky usleep"
(message comes from ThreadedFileWriter.cpp )
---------------------------
Buzz says:
Wouldn't it be more reasonable if it barfed the recording (partially or
entirely) WITHOUT crashing mythbackend, rather than crashing entirely as it
does now?
---------------------------
Isaac says:
Backtrace? I'm not going to want to add code specifically to handle FAT32,
but certainly dying is bad.
---------------------------
Buzz says:
Last 4 lines of 'mythbackend -v all' followed by backtrace:
2006-01-19 16:35:20.575 MSqlQuery: INSERT INTO recordedmarkup (chanid,
starttime, mark, type, offset) VALUES ( '1007' , '2006-01-19T15:15:00' ,
'119754' , 9 , '4289249140' );
2006-01-19 16:35:20.576 MSqlQuery: INSERT INTO recordedmarkup (chanid,
starttime, mark, type, offset) VALUES ( '1007' , '2006-01-19T15:15:00' ,
'119772' , 9 , '4289895672' );
2006-01-19 16:35:20.577 MSqlQuery: INSERT INTO recordedmarkup (chanid,
starttime, mark, type, offset) VALUES ( '1007' , '2006-01-19T15:15:00' ,
'119790' , 9 , '4290548972' );
2006-01-19 16:35:25.630 TFW: safe_write(): funky usleep
Program received signal SIGXFSZ, File size limit exceeded.
[Switching to Thread -1336443984 (LWP 2872)]
0x00b8e402 in __kernel_vsyscall ()
(gdb)
(gdb) bt
#0 0x00b8e402 in __kernel_vsyscall ()
#1 0x007540bb in __write_nocancel () from /lib/libpthread.so.0
#2 0x00e93412 in safe_write (fd=20, data=0xaee70d90, sz=12920)
at ThreadedFileWriter.cpp:57
#3 0x00e950a1 in ThreadedFileWriter::DiskLoop (this=0x89b69d0)
at ThreadedFileWriter.cpp:367
#4 0x00e951a5 in ThreadedFileWriter::boot_writer (wotsit=0x89b69d0)
at ThreadedFileWriter.cpp:93
#5 0x0074fb80 in start_thread () from /lib/libpthread.so.0
#6 0x02d969ce in clone () from /lib/libc.so.6
---------------------------
Buzz says:
*) The OS is sending a SIGXFSZ to the backend; the backend is taking the
default action, which is "core dump and exit".
Solution:
* capture SIGXFSZ, handle it gracefully.
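The capture step could look something like the following. This is a minimal sketch, not the attached patch: the handler and function names are illustrative (Buzz's patch calls the global "LastSignal"), and all the handler does is record the signal so the interrupted write() fails with an error instead of the process taking the default core-dump-and-exit action.

```cpp
#include <csignal>

// Hypothetical global; the attached patch's equivalent is "LastSignal".
static volatile sig_atomic_t last_signal = 0;

static void sigxfsz_handler(int signum)
{
    // Only async-signal-safe work here: record the signal and return.
    last_signal = signum;
}

static void install_sigxfsz_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = sigxfsz_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; // no SA_RESTART: let the interrupted write() fail
    sigaction(SIGXFSZ, &sa, nullptr);
}
```

With the handler installed, SIGXFSZ no longer kills the backend; the writing code can then poll the global and unwind cleanly.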
---------------------------
Buzz says:
Hi All. I'm working on a solution to this thread and have got the
following steps working in my code (diff attached -
catch_and_handle_SIGXFSZ_diff.txt):
1) OS sends SIGXFSZ to mythbackend
2) backend captures said signal, squirrels it into a global called
"LastSignal", so anyone who wants to can look for it. (yes, I know globals
are bad, but signal handlers are worse.)
3) ThreadedFileWriter.cpp has an existing function called "safe_write"
that I've modified so that it checks for the signal(in the global) before
trying to write to any file.
4) safe_write: if a SIGXFSZ signal was received it "aborts" the in-
progress write (then-and-there, without flushing memory buffers to disk or
anything), returning an error.
5) safe_write is called from inside ThreadedFileWriter::DiskLoop. The
return value of safe_write is tested in DiskLoop, and it causes both
threads (write and sync threads of ThreadedFileWriter) to be torn down,
and the ThreadedFileWriter to enter a state of "write error".
6) next time the caller (RingBuffer.cpp) tries to call tfw->Write(...) it
fails, returning -1 up to the calling function (which is in RingBuffer.cpp
- Write), and the tfw is torn down, closing the open file handle, and
cleaning up.
7) RingBuffer.cpp already had the capability to return -1 or other errors,
so it's been tweaked to look at the return status of the tfw->Write call
too, and pass the error up if it occurs.
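Steps 3 and 4 above can be sketched roughly as below. The function name is hypothetical (the real code is safe_write() in ThreadedFileWriter.cpp), and the global stands in for the one the signal handler sets; it is defined here only so the sketch compiles stand-alone.

```cpp
#include <csignal>
#include <cerrno>
#include <unistd.h>

// Set by the SIGXFSZ handler in the real patch (step 2); defined here
// so this fragment is self-contained.
static volatile sig_atomic_t last_signal = 0;

// Hypothetical version of steps 3-4: check the global before writing,
// and abort the in-progress write with an error if SIGXFSZ arrived.
static ssize_t write_or_abort(int fd, const void *data, size_t sz)
{
    if (last_signal == SIGXFSZ)
        return -1;                 // signal seen: abort, don't write
    ssize_t ret = write(fd, data, sz);
    if (ret < 0 && errno == EFBIG)
        return -1;                 // limit hit during this very write
    return ret;
}
```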
...now, I'm not sure where to take it from here.
The signal is definitely being captured and passed all the way back up to
the RingBuffer, so I know that's working. But nothing else (backend and/or
frontend) seems to recognise that the recording failed.
Should I go that far, or just barf the error message to the log, and leave
it at that?
IE: How do I make everything else recognise that the recording of this
file has aborted/failed?
Am I doing the right thing here... Or is there an easier way?
---------------------------
Mark Weaver says:
Just ignore the signal - write will return EFBIG and the recording
should follow the usual failure path. You should be able to test it
with ulimit -f, that will allow you to generate SIGXFSZ with smaller
files.
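Mark's approach can be sketched as follows. Assumptions worth flagging: the function is illustrative, not MythTV code, and it uses setrlimit(RLIMIT_FSIZE, ...) as the in-process equivalent of `ulimit -f` to shrink the limit enough to reproduce the condition with a tiny file. Note that POSIX only fails with EFBIG once no further bytes can be written; a write straddling the limit is truncated first, hence the two writes.

```cpp
#include <csignal>
#include <cerrno>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

// Sketch of the suggestion: ignore SIGXFSZ so an oversized write()
// fails with errno == EFBIG instead of killing the process.
// Returns the errno observed on the over-limit write (EFBIG expected).
int demo_efbig(const char *path)
{
    signal(SIGXFSZ, SIG_IGN);          // default action would core-dump

    struct rlimit rl;
    rl.rlim_cur = 4096;                // 4 KB file size limit,
    rl.rlim_max = 4096;                // equivalent of a small ulimit -f
    setrlimit(RLIMIT_FSIZE, &rl);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    char buf[8192] = {0};
    write(fd, buf, sizeof(buf));       // partial: fills up to the limit
    ssize_t n = write(fd, buf, sizeof(buf)); // at the limit: must fail
    int err = (n < 0) ? errno : 0;
    close(fd);
    unlink(path);
    return err;
}
```

With the signal ignored, the failed write() surfaces as an ordinary error return, which is exactly the "usual failure path" Mark refers to.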
---------------------------
Buzz says:
The problem is that as it exists now in CVS, ThreadedFileWriter.cpp has
no "usual failure path" from the 'write' command (in safe_write).
safe_write returns a uint to indicate how much was written, and '0' is a
legitimate amount to write, not an error case. I've changed the relevant
places to allow it to return negative (failure), and pass the failure back
up the calling chain to RingBuffer, where it emits an error to the log.
The backend and frontend both still seem oblivious to the error condition
that occurs when RingBuffer->Write() returns -1 during a record.
Other suggestions?
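The signedness change being described might look like this. The name is illustrative, not the actual patch: the point is simply that switching the return type from an unsigned count to ssize_t frees up -1 as a failure value that can propagate up through DiskLoop() to RingBuffer::Write().

```cpp
#include <cerrno>
#include <unistd.h>

// Illustrative signed variant of safe_write(): 0 remains a legitimate
// "wrote nothing" result, while -1 now unambiguously means failure
// (e.g. EFBIG when the FAT32 file size limit is hit).
static ssize_t safe_write_signed(int fd, const void *data, size_t sz)
{
    size_t tot = 0;
    const char *buf = static_cast<const char *>(data);
    while (tot < sz)
    {
        ssize_t ret = write(fd, buf + tot, sz - tot);
        if (ret < 0)
        {
            if (errno == EINTR)
                continue;          // interrupted: just retry
            return -1;             // real error: report to the caller
        }
        tot += static_cast<size_t>(ret);
    }
    return static_cast<ssize_t>(tot);
}
```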
---------------------------
--
Ticket URL: <http://svn.mythtv.org/trac/ticket/1076>
MythTV <http://www.mythtv.org/>