[mythtv-users] IMDB has TV listings.

Daniel Kristjansson danielk at cuymedia.net
Sun Jun 24 02:15:47 UTC 2007


On Sat, 2007-06-23 at 22:05 +0200, Mattias Holmlund wrote:
> On 6/23/07, Daniel Kristjansson <danielk at cuymedia.net> wrote:
> > For those considering competing with TMS and TV Guide, you
> > should look at the Swedish example. There MythTV and XMLTV
> > devs created scrapers for each network and then approached
> > the networks for permission to republish the data when they
> > had a working example with that station.
> 
> Just a small correction: We always secure permission from the network
> before I start writing code for that network. Once we have the
> permissions in place, we get access to the network's press-site or
> their e-mails to the press and we can import the data from there. Most
> of the time, this data is "better" than the data published on their
> public web-sites.

Thanks for the correction.

In the US, I think a better approach might be to have
users create the guide data with the help of some smart
server-side software.

For the smarts:
 * Most things on television are serialized, meaning
   the same show that was on last Tuesday at 8pm will
   be on again this Tuesday at 8pm.
 * Most TV shows are reruns; when a rerun is serialized,
   episode 101 comes on the day or week after episode 100.
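The two rules above could drive a very simple predictor. This is only an illustrative sketch: the `Airing` record and `predict_next` function are hypothetical names, and it assumes a show returns exactly one period later in the same slot and that a serialized rerun advances by one episode number. For a first-run show it leaves the episode unknown, for users to fill in.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Airing:
    """One guide entry (hypothetical record, not an existing MythTV type)."""
    title: str
    start: datetime
    episode: Optional[int] = None  # e.g. 100; None when unknown
    is_rerun: bool = False


def predict_next(airing: Airing, period: timedelta = timedelta(weeks=1)) -> Airing:
    """Guess the next airing of a serialized show.

    Assumes the show airs again exactly one `period` later in the same
    slot. A serialized rerun advances to the next episode number; for a
    first-run show the episode is left unknown for users to correct.
    """
    if airing.is_rerun and airing.episode is not None:
        next_episode: Optional[int] = airing.episode + 1
    else:
        next_episode = None
    return replace(airing, start=airing.start + period, episode=next_episode)


# Last Tuesday (2007-06-19) at 8pm was a rerun of episode 100,
# so the predictor proposes episode 101 this Tuesday at 8pm.
last = Airing("Example Show", datetime(2007, 6, 19, 20, 0), episode=100, is_rerun=True)
print(predict_next(last))
```

For daily strips, passing `timedelta(days=1)` as the period gives the "day after" case.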

For the users:
 * Day -14:
    + 5*X users in the NY Metro area get an e-mail with a
      URL and are asked to click it and fix any errors in
      the listings generated for them from the smarts.
      They are told to reference only Wikipedia for show
      descriptions. For reruns, any pre-existing show
      description is filled in from the predicted episode;
      if the user corrects the subtitle or episode number,
      the matching pre-existing description is pulled from
      our DB. When a user submits data for NY Metro, all
      areas with the same network feed or showing the same
      show are updated for anyone accessing the system. If
      a show has multiple descriptions, a drop-down is
      provided to anyone who later updates that show,
      until one description wins an overwhelming number
      of votes.
 * Day -13:
    + 2*X users are each asked to rate one set of listings
      generated on day -14.
    + 3*X users are asked to choose the best of the several
      options created on day -14.
 * Day -12:
    + Each of 1*X users rates one of the 3*X choosers, picked
      at random.
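The role sizes in this schedule (5*X editors, then 2*X raters and 3*X choosers, then 1*X raters of the choosers) can be checked with a small allocator. This is a hypothetical sketch: `plan_round` and the stage labels are made-up names, and it assumes that no user serves in two roles in the same round and that no chooser is reviewed twice, neither of which the proposal actually requires.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Stage sizes as multiples of X, as listed in the proposal.
STAGES = [
    ("day -14 editors", 5),
    ("day -13 raters", 2),
    ("day -13 choosers", 3),
    ("day -12 raters", 1),
]


def plan_round(users: Sequence[str], x: int,
               rng: random.Random) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Split users into the four roles and pair each day -12 rater
    with one day -13 chooser picked at random.

    Assumption: each user gets at most one role per round.
    """
    needed = sum(mult for _, mult in STAGES) * x
    if len(users) < needed:
        raise ValueError(f"need {needed} users, have {len(users)}")
    pool = rng.sample(list(users), needed)  # random, distinct users
    roles: Dict[str, List[str]] = {}
    start = 0
    for name, mult in STAGES:
        roles[name] = pool[start:start + mult * x]
        start += mult * x
    # Assumption: sampling without replacement, so each rater
    # reviews a different randomly chosen chooser.
    reviewed = rng.sample(roles["day -13 choosers"], len(roles["day -12 raters"]))
    pairing = dict(zip(roles["day -12 raters"], reviewed))
    return roles, pairing
```

Under these assumptions one round needs 11*X distinct users per region, so X = 2 already requires 22 participants.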

Users' ratings and responsiveness to e-mails would be used
to determine whether they can continue to pull the data and
how much data they can pull. How often a user gets e-mail
depends on how many MythTV users share their subset of
channels. Data could be shipped out to users every day; it
would simply improve over time. We could even allow users to
pull data 60 days in advance, though most of that data would
be computer generated. We could also run the same cycle on
days -7, -6 and -5 to update the listings. In practice this
could happen much faster than over three days, and over time
the predictive algorithms could be improved to account for
each network's programming season and other factors.
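The review cycle described above, run once on days -14/-13/-12 and again on days -7/-6/-5, maps each broadcast date to six calendar dates. A minimal sketch, with hypothetical names (`review_schedule`, the stage labels) and assuming each stage takes exactly one calendar day:

```python
from datetime import date, timedelta
from typing import List, Tuple

# Days before broadcast on which each stage runs. The first pass
# (-14, -13, -12) is the main cycle; the second (-7, -6, -5) is the
# optional update pass.
PASSES = [(-14, -13, -12), (-7, -6, -5)]
STAGE_NAMES = ("edit", "rate and choose", "rate choosers")


def review_schedule(air_date: date) -> List[Tuple[date, str]]:
    """Return (calendar date, stage) pairs for every review of one day's listings."""
    schedule = []
    for offsets in PASSES:
        for offset, stage in zip(offsets, STAGE_NAMES):
            schedule.append((air_date + timedelta(days=offset), stage))
    return schedule


for when, stage in review_schedule(date(2007, 7, 10)):
    print(when.isoformat(), stage)
```

For a broadcast on 2007-07-10, editing would start on 2007-06-26 and the update pass on 2007-07-03.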

The Swedish model could be worked in as well: a licensed
feed could improve the data sent on day -14, day -7, or any
other day simply by becoming another user's listing for that
day, to be rated and selected by other users. The Daily Show
and The Colbert Report would certainly have correct data
under this system.
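Treating a licensed feed as "another user's listing" means the selection step never needs to know where an option came from. A hypothetical sketch of that step as a vote tally that settles once one option has an overwhelming share; the class name, the `feed:` / `user:` labels and both thresholds are assumptions, since the proposal only speaks of "an overwhelming number of votes". The same tally could back the description drop-down used on day -14.

```python
from collections import Counter
from typing import List, Optional


class OptionVotes:
    """Votes for competing options (listing sets or show descriptions).

    Hypothetical sketch: an option wins once at least `min_votes` votes
    have been cast and it holds at least `supermajority` of them.
    """

    def __init__(self, min_votes: int = 10, supermajority: float = 0.8):
        self.min_votes = min_votes
        self.supermajority = supermajority
        self.votes: Counter = Counter()

    def add_option(self, option: str) -> None:
        # Licensed feeds and user submissions enter the same way.
        self.votes.setdefault(option, 0)

    def vote(self, option: str) -> None:
        self.votes[option] += 1

    def winner(self) -> Optional[str]:
        total = sum(self.votes.values())
        if total == 0 or total < self.min_votes:
            return None
        option, count = self.votes.most_common(1)[0]
        return option if count / total >= self.supermajority else None

    def choices(self) -> List[str]:
        """What to offer the next user: the winner alone once settled,
        otherwise every option, most popular first."""
        won = self.winner()
        if won is not None:
            return [won]
        return [option for option, _ in self.votes.most_common()]
```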

I still think licensing is better if we can come to a reasonable
deal, if only because implementing this system would take time
away from improving MythTV. But I also think Yeechang Lee is
wrong that it is impossible to replace zap2it ourselves.
Implementing a pure Swedish model would leave too many US
stations out, so some manual entry would be required. But with
the number of people using MythTV, we could manually enter every
program on every channel without any smarts and still have
multiple people responsible for each channel; licensing from the
stations would just reduce the user load to trivial levels. This
even gives the people in a particular geographic area an
incentive to work on licensing their local stations: less
frequent e-mails, and less work to do for each one. A boilerplate
script and license could be drawn up, and real-life pointers on
how to approach and how not to approach a station could be
collected and condensed into informative web pages.
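The claim that every channel could have several people responsible for it is simple arithmetic: with V volunteers and C channels, round-robin assignment gives each channel either floor(V/C) or ceil(V/C) people. A hypothetical sketch (`assign_channels` is a made-up name) that ignores which channels each volunteer can actually receive; a real system would have to group volunteers by lineup first.

```python
from typing import Dict, List, Sequence


def assign_channels(volunteers: Sequence[str], channels: Sequence[str],
                    per_channel: int = 2) -> Dict[str, List[str]]:
    """Round-robin volunteers over channels so each channel gets at
    least `per_channel` people. Ignores reception/lineup constraints."""
    if not channels:
        return {}
    needed = len(channels) * per_channel
    if len(volunteers) < needed:
        raise ValueError(f"need {needed} volunteers, have {len(volunteers)}")
    assignment: Dict[str, List[str]] = {ch: [] for ch in channels}
    for i, person in enumerate(volunteers):
        assignment[channels[i % len(channels)]].append(person)
    return assignment


print(assign_channels([f"v{i}" for i in range(7)], ["ch2", "ch4", "ch7"]))
```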

-- Daniel


