[mythtv-users] Re: Bleb.org down??

Rob Willett rob.mythtv at robertwillett.com
Tue Apr 20 18:00:45 EDT 2004


Right,

I've been playing around with /usr/bin/tv_grab_uk_rt and tv_split to see how it
all fits together.

I downloaded four days of listings from the site, tv_split them into days and
channels, and then tv_cat'ed the result. The size of the output file from
tv_cat is identical to the original, but a diff shows the lines are not in the
same order. I will assume that mythfilldatabase will work OK on this, as I don't
want to mess my DB up and can't work out how to create a new one this late in
the evening.

My first thoughts on this are:

1. The master server(s) need to download every channel (plus additional
information?).
2. They split the XML file up into channels and days. We may have 14x350 files
now (14 days of 350 channels); not a big deal, as we delete anything too old,
i.e. older than fourteen days.
3. Any Tier-2 servers simply replicate the entire XML directory tree. Average
file size is approx 5K, the first pass is 1.5MB, and subsequent daily passes are
100K or so; nothing to bother anybody about. So far, so good, nice and easy.
4. The Tier-3 people, the clients, need to download only the channels they
need. These channels are held in ~/.mythtv/<videosource>.xmltv. We read these
channels into the mirror script and wget only those files. We then assemble all
the XML files into a single file and pass it to mythfilldatabase.
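As a rough sketch of stage 4 (the server URL, directory layout and file names
below are all placeholder assumptions, and the wget commands are only echoed so
the selection logic can be seen without touching the network):

```shell
#!/bin/sh
# Stage 4 sketch: fetch only the channels this client actually uses.
# A sample channel list stands in for ~/.mythtv/<videosource>.xmltv.
cat > channels.xmltv <<'EOF'
channel bbc1.bbc.co.uk
channel sky_one.sky.com
EOF

SERVER=http://tier2.example.org/xmltv    # placeholder Tier-2 server
DAY=$(date +%Y/%m/%d)

# One file per wanted channel for today; strip the leading "channel "
# keyword that xmltv config files use.  wget is echoed, not run.
sed 's/^channel //' channels.xmltv | while read chan; do
    echo wget -q "$SERVER/$DAY/$chan.xml"
done | tee fetch.log

# The fetched per-channel files would then be recombined with tv_cat
# into a single XML file and fed to mythfilldatabase.
```
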

Problems I can see:

Stages 1, 2 and 3 are pretty easy: simple shell scripting, unless I've missed
something important. Stage 4 is a little more awkward, as wget will still mirror
each XML file and only delete it after it has copied it; this seems to be the
default behaviour! That means everybody would download every daily file, which
is not what we are after. Perhaps LWP::Simple is a better idea!
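If wget is kept, one possible workaround (an assumption on my part, not tested
against a real Tier-2 tree; the server URL is a placeholder) is to build wget's
accept list from the channel file, so the mirror never retrieves non-matching
files in the first place. The final command is echoed rather than run:

```shell
#!/bin/sh
# Build a wget accept list from the channel file so "wget --mirror"
# only retrieves the wanted per-channel files.  Sample data stands in
# for ~/.mythtv/<videosource>.xmltv; the URL is a placeholder.
cat > channels.xmltv <<'EOF'
channel bbc1.bbc.co.uk
channel sky_one.sky.com
EOF

# Turn each channel id into "<id>.xml" and join with commas.
ACCEPT=$(sed 's/^channel //; s/$/.xml/' channels.xmltv | paste -s -d, -)

# Echoed only; -A makes wget skip files that don't match the list
# (index pages are still fetched to walk the tree, then discarded).
echo wget --mirror --no-parent -A "$ACCEPT" http://tier2.example.org/xmltv/
```
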

Comments,

Rob.

Quoting David <myth at dgreaves.com>:

> 
> Rob Willett wrote:
> 
> >David,
> >
> >I have looked at the Showshifter site and they have said that Digiguide won't
> >play ball with them at all. They and PureTelly are the only people I know
> >doing decent listings that are free(ish). I did look at trying to scrape
> >PureTelly and then gave up.
> >  
> >
> Yes, I've heard that.
> 
> >In light of this I have now written most of a very simple distribution
> >service.
> >I offer this up as a sacrificial lamb...
> >  
> >
> OK - lets see :)
> (Send me a private copy if you like )
> 
> >It works on a three tier approach. 
> >
> >Somebody, somewhere must create the master XML data files. This is the bit
> >to do... <cough>. I've written software to handle the distribution only. It
> >will work with a single server but clearly the more servers we have the
> >wider the distribution and the less load we each have to bear. It will also
> >work around the world by the user simply choosing a different Tier-2 server
> >list when they start.
> >  
> >
> Have 2 servers (A and B) both of them:
> * grab the data we want (tv_grab_uk_rt)
> * split it into individual files (see post by Henk on tv_* utils - I 
> guess we want tv_split - thanks Henk ;) )
> IF we want to be clever then A and B exchange xml files (wget) and do a 
> diff (tv_sort sounds like it's our friend here) - if they're unequal 
> then another grab takes place and we can implement majority rules or 
> repeat until no errors.
> 
> >The software works as follows. A master server (bleb.org?) is used to seed a
> >set of Tier 2 servers. Seeding is simply copying the XML files off the master
> >server to the Tier 2 servers. I have assumed the XML files are appropriate
> >for mythtv.
> >  
> >
> OK this is mirroring - good - wget has options to do this.
> 
> >All the software does at the moment is distribute. The Tier 2 servers then
> >make all the XML files and the list of Tier 2 servers available to all and
> >sundry. Simple and easy, no databases, no hard work. All a Tier-2 provider
> >need do is daily update their own files from either the master server or
> >from another Tier-2 provider.
> >
> Cron, wget --mirror
> 
> >They will need a web server and preferably a dedicated directory.
> >  
> >
> They're cheap - the directories that is ;)
> 
> >They will not need MySQL or PHP, though they will need Perl, ping and wget
> >(so far).
> >  
> >
> just wget
> 
> >All the software is written in Perl. It consists of a Perl script and a file
> >containing the list of Tier-2 servers and which directory the xml files are
> >in. This software will randomly choose one of the Tier 2 servers (never the
> >master) and try to download the XML software and the server list. If a
> >server is down it will choose another server until it either succeeds or
> >runs out of servers.
> >  
> >
> Only users need this - see my earlier mail.
> 
> >At the moment I have the core software written but am struggling a little
> >with wget when it isn't pulling back web pages, just files. I want the power
> >of rsync but on port 80. I'll crack that hopefully later today and will then
> >send files out.
> >  
> >
> don't spend too much time - perl should be able to pull back easily w/o
> wget.
> See LWP - it's not hard:
> 
>   # Create a user agent object
>   use LWP::UserAgent;
>   my $ua = LWP::UserAgent->new;
>   # Create a request
>   my $req = HTTP::Request->new(GET =>
> 'http://www.hmmm.co.uk/xmltv/2004/04/21/sky_one.sky.com.xml');
>   # Pass request to the user agent and get a response back
>   my $res = $ua->request($req);
>   # The XML itself is in the response body
>   my $xml = $res->is_success ? $res->content : undef;
> and that's it ;O
> 
> >Rather than clogging up the list, if there is enough interest I'll set up a
> >mailing list for anybody interested in helping out. If you are, send me an
> >e-mail to rob.mythtv.devlist.no.spam at robertwillett.com. Clearly remove
> >the .no.spam to get this to work.
> >  
> >
> It's already clogged - I think this is valid Myth stuff at the minute 
> given Henk's and Jeff's unexpected interest ;)
> 
> 




-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/

