[mythtv-users] Radio Times XMLTV failing

Tue Oct 3 01:21:26 UTC 2006

On 02/10/06, malcolm torrent <malcolm.torrent at gmail.com> wrote:
> I'd like to echo Simon's thanks to Neil for the fix.
> I tried to diagnose this myself (unsuccessfully) so if possible I'd be
> interested in a short explanation as to how the problem was
> approached, resolved and why this fix works.

The ITV4 data file from the RadioTimes site contained some bad
characters (\000\000 in octal, \u0000 in Unicode) that we do not
want to pass to the XMLTV parser. The fix uses an s///g global
substitution to replace these characters with the text "...". It
runs after the data file has been downloaded but before it is
processed.
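
To illustrate the idea only: the grabber itself is Perl and does this
with s///g, so the following is a rough Python sketch of the same
global substitution, not the actual patch. The function name and the
sample string are made up for the example.

```python
import re

# Two consecutive NUL characters (\000\000 in octal), as described above.
BAD_SEQUENCE = re.compile(r"\x00\x00")


def scrub_bad_characters(raw: str) -> str:
    """Replace every occurrence of the bad sequence with "...".

    Hypothetical helper: re.sub replaces all matches by default, which
    mirrors the /g flag on a Perl s///g substitution.
    """
    return BAD_SEQUENCE.sub("...", raw)


# Made-up sample text, not real RadioTimes data.
sample = "A thriller\x00\x00 continues"
cleaned = scrub_bad_characters(sample)
```

The point is the ordering: the scrub is applied to the raw downloaded
text, so nothing downstream ever sees the NUL characters.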

The relevant comment from the tv_grab_uk_rt grabber:

# Tidy up HTML entities and bad characters.  The site seems to use
# a mixture of Latin-1 and UTF-8, I'm not sure exactly.  We want
# our output to be in Latin-1 but HTML::Entities decides to use
# Unicode so we have to fiddle a few entities manually first.

There are a couple of other substitutions performed at this stage to
replace other characters commonly seen in the data, but I think this
may be the first time this character sequence has been seen.

At the time of writing, the RT data files seem to be empty (I see this
quite a lot), so I can't check whether the characters have been removed
from the source data.
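
For anyone who wants to check this themselves once the files are
populated again, here is a small Python sketch. It assumes you have
already fetched a data file's contents as bytes by whatever means you
prefer; the function name and the three result labels are my own
invention, not anything from the grabber.

```python
def inspect_datafile(data: bytes) -> str:
    """Classify a downloaded data file (hypothetical helper).

    Returns "empty" if there is nothing but whitespace, "contains NUL"
    if any NUL byte is present, and "clean" otherwise.
    """
    if not data.strip():
        return "empty"
    if b"\x00" in data:
        return "contains NUL"
    return "clean"
```

An "empty" result matches what I am seeing at the moment, and
"contains NUL" would mean the source data has not been fixed yet.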

Nick

MythTV Official wiki:
http://mythtv.org/wiki/
MythTV users list archive:
http://www.gossamer-threads.com/lists/mythtv/users