[mythtv-users] Radio Times XMLTV failing

Neil Dunbar neil.dunbar at pobox.com
Tue Oct 3 07:36:26 UTC 2006


On Monday 02 October 2006 23:20, malcolm torrent wrote:
> I'd like to echo Simon's thanks to Neil for the fix.
> I tried to diagnose this myself (unsuccessfully) so if possible I'd be
> interested in a short explanation as to how the problem was
> approached, resolved and why this fix works.
> Mal.

OK. The problem is corruption in the datafile 1961.dat, which corresponds to 
the schedules for ITV4 (running mythfilldatabase from the command line shows 
an error saying that the Unicode wide character \u0000 is not acceptable 
within an XML document). So I wget'ed the offending URL, opened the file in a 
binary editor (bvi), and searched for the sequence of null characters.
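
As a rough sketch of that search step, here is the same idea in Python rather 
than bvi. The helper name find_null_runs and the sample bytes are made up for 
illustration; they are not the real contents of 1961.dat.

```python
import re

def find_null_runs(data: bytes):
    """Return (offset, length) for each run of consecutive NUL bytes."""
    return [(m.start(), len(m.group())) for m in re.finditer(rb"\x00+", data)]

# Hypothetical sample standing in for the downloaded data file.
sample = b"programme title\x00\x00 rest of the line\n"

for offset, length in find_null_runs(sample):
    print(f"{length} NUL byte(s) at offset {offset}")
```

For the real file you would read it in binary mode, e.g. 
open("1961.dat", "rb").read(), and pass the bytes to the same function.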

The first thing I tried was to stop the script dying (by commenting out the 
"croak" instruction in the XMLTV code), but then it just died with an 
"unexpected end-of-file" error. So I had to replace the offending text with 
something else, and I stuck in that line in tv_grab_uk_rt which substitutes 
\u0000 with the text ".." (i.e., something harmless). Now, all of that said, 
there may very well be a legitimate use of a sequence of two nulls in Unicode 
(e.g., for 3 or 4 byte wide characters), so this kludge can't stay in - it 
replaces the nulls without regard for their context in the file.

In the end, I suspect it's just a bit of file corruption from Radio Times. 
It's not happening anywhere else in the data feed, and it'll disappear from 
the schedules on Saturday, and we can say goodbye to ugly kludges.

A longer term fix would be for XMLTV to replace offending Unicode characters 
with harmless ones, just to be a bit more robust when dealing with partially 
corrupted data. I may have a look at this over the weekend.
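
A minimal sketch of what such a clean-up might look like, written in Python 
purely for illustration (XMLTV itself is Perl, and this is not its code). The 
function name sanitize_xml_text and the choice of "." as the replacement are 
assumptions; the allowed ranges follow the Char production in the XML 1.0 
specification.

```python
import re

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def sanitize_xml_text(text: str, replacement: str = ".") -> str:
    """Replace each character that XML 1.0 forbids with `replacement`."""
    return _XML_ILLEGAL.sub(replacement, text)

# Hypothetical corrupted description containing two NUL characters.
print(sanitize_xml_text("Some programme\x00\x00 description"))
```

Because this works on decoded text rather than raw bytes, it only touches 
characters that could never appear in a well-formed XML 1.0 document, which 
avoids the context problem of a blind byte-level substitution.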

Cheers,

Neil

