[mythtv-users] Possible fix for tv_grab_au 2.11

Shanon Mulley shanonmulleyster at gmail.com
Sat Jul 15 04:17:09 UTC 2006


David,

Can you include the diff file as an attachment, rather than as text in
the email? (I could copy and paste it, I guess, but I think an attachment is
neater.)

Thanks.

Shanon.

On 7/15/06, David Whyte <david.whyte at gmail.com> wrote:
> On 7/15/06, Max Barry <mythtv at maxbarry.com> wrote:
> >
> > Seems to be URLs. Simply refreshing the daily guide page immediately
> > gives you another 21 shots at the detail pages.
>
> I got the following email overnight. Not sure why the list wasn't
> CC'd, but anyway:
>
> I think I have a solution for the new problem with the msn site. I
> suspected they may have been up to something when they changed the
> pids to a hash with a time component. I think what's happening is that
> they are allowing about 20-odd "closeups" with each refresh of the day
> page, and when you refresh the day page, it will recalculate a new pid
> hash for a new batch of 20. You can test this manually by going to the
> site with a web browser and clicking on 20 or so program details.
> Eventually it will say "Please try again later". When you refresh the
> page, it should allow details again. However, there are exceptions:
> occasionally I've had to wait and refresh yet again before it
> worked.
>
> My solution is to refresh the same day page and start grabbing details
> again when one "closeup" fails, after waiting a few seconds. At the
> same time, programs for which we had already retrieved details have
> been cached, so it will "resume" where it left off. On a fresh new
> page where I have to grab details for everything, it can take up to 13
> retries to get it all. But it does get there.
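A minimal, self-contained Perl sketch of the refresh-and-resume loop described above, assuming the site allows a fixed quota of closeup requests per day-page refresh (20 here, an assumption). The subs fetch_day_page, fetch_closeup and grab_day are hypothetical stand-ins that simulate the site; they are not functions from tv_grab_au. Only $maxDayProc, $waitBetweenRetries, $dayProcCount, $completedDay and the DAYPROC label mirror names used in the patch.

```perl
use strict;
use warnings;

# Hypothetical simulation of the "refresh the day page and resume" idea.
# fetch_day_page() and fetch_closeup() are stand-ins, NOT tv_grab_au code.

my $maxDayProc         = 20;  # give up on a day after this many refreshes
my $waitBetweenRetries = 0;   # seconds to sleep before refreshing (patch uses 10)
my $closeupsPerRefresh = 20;  # assumed per-refresh quota on the site

my $remaining = 0;            # simulated quota left on the current page

# A refresh returns the day's show ids and hands out a fresh batch of closeups.
sub fetch_day_page {
    my ($n_shows) = @_;
    $remaining = $closeupsPerRefresh;
    return [ map("show$_", 1 .. $n_shows) ];
}

# Returns undef once the quota is used up ("Please try again later").
sub fetch_closeup {
    my ($pid) = @_;
    return if $remaining <= 0;
    $remaining--;
    return "details for $pid";
}

# Returns (hashref of details keyed by pid, number of page refreshes used).
sub grab_day {
    my ($n_shows) = @_;
    my %cache;
    my $dayProcCount = 0;
    my $completedDay = 0;

    DAYPROC:
    while (!$completedDay && $dayProcCount < $maxDayProc) {
        my $shows = fetch_day_page($n_shows);
        $dayProcCount++;
        for my $pid (@$shows) {
            next if exists $cache{$pid};      # resume: skip already-cached shows
            my $details = fetch_closeup($pid);
            if (!defined $details) {
                sleep $waitBetweenRetries;    # give the site a breather
                next DAYPROC;                 # refresh the day page and retry
            }
            $cache{$pid} = $details;          # cache before moving on
        }
        $completedDay = 1;
    }
    return (\%cache, $dayProcCount);
}

my ($details, $refreshes) = grab_day(50);
printf "%d shows, %d refreshes\n", scalar(keys %$details), $refreshes;
```

With 50 shows and a quota of 20, the first two passes each fetch 20 closeups before hitting the limit, and the third pass picks up the last 10 from the cache, so this should print "50 shows, 3 refreshes". Caching each show as soon as its details arrive is what keeps the retries from starting over.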
>
> The script is working for me, but I've had to make quite extensive
> changes to your v2.12 to get this to work, and I'm a complete Perl
> newbie, so I may have mucked things up. Also, things like the
> statistics reporting are all screwed up now and need to be fixed.
> However, I thought I'd give you what I've done in case it helps.
> (I've left the indentation unchanged to minimize the diff output, so
> you can see where the changes are.) I'm not confident enough in my Perl
> skills to post this on the list :)
>
> Boy, the programming consultants at ninemsn must be doing a good job
> convincing the brass that this scraper is a "real problem", to pad
> their own hours.
>
> Hope this helps. Oh, and as always, use at your own risk!
>
>
> $ diff tv_grab_au.new tv_grab_au.immir.2.12
> 163,164d162
> < my $maxDayProc = 20; # Maximum times to repeat a single day processing
> < my $waitBetweenRetries = 10; # Time to wait between repeats
> 353d350
> <   LOOPDAYS:
> 363,364d359
> <     my $completedDay = 0;
> <     my $dayProcCount = 0;
> 366,370c361
> <     DAYPROC:
> <     while (!$completedDay and $dayProcCount < $maxDayProc)
> <     {
> <
> <     my $guidedata = get_page($url) or next LOOPDAYS;
> ---
> >     my $guidedata = get_page($url) or next;
> 372,373d362
> <     ++$dayProcCount;
> <     print "DAYPROC ITERATION $dayProcCount\n" if $debug;
> 417,420c406
> <           my $url;
> <           # If webwarper used, the link already contains full URL
> <           $url = $NMSN unless $opt_warper;
> <           $url .= $link[0]->[0];
> ---
> >           my $url = $NMSN . $link[0]->[0];
> 428c414
> <           my ($show, $cachedShow, $needDetails, $gotDetails);
> ---
> >           my ($show, $cache_show);
> 434c420
> <             $cachedShow = 1;
> ---
> >             $cache_show = 1;
> 459,465c445,446
> <             $needDetails = want_details($show);
> <             $gotDetails = get_closeup_details($date6am,$show,$pid,$row,$url)
> <               if $needDetails;
> <             # update current cache in case current day needs to be repeated
> <             $cached->{$cache_id} = $show
> <               unless $needDetails and !($gotDetails);
> <           }
> ---
> >             $cache_show = get_closeup_details($date6am,$show,$pid,$row,$url)
> >               if want_details($show);
> 467,470d447
> <           if (!($cachedShow) and ($needDetails and !($gotDetails))) {
> <             # Give the website a breather for better success
> <             sleep $waitBetweenRetries;
> <             next DAYPROC;
> 475,477c452,453
> <           # recreate newcache based on current shows to flush out obsolete
> <           # entries in old cache
> <           $newcache->{$cache_id} = $show;
> ---
> >           push @{ $showlists{$chanid} }, $show;
> >           $newcache->{$cache_id} = $show if $cache_show;
> 479c455
> <           abbr_dump($show, $cachedShow) if $debug==1;
> ---
> >           abbr_dump($show, $cached->{$cache_id}) if $debug==1;
> 486d461
> <     $completedDay = 1;
> 488,495c463
> <
> <     } # Processing one day
> <   } # For days
> < } # For services
> <
> < # add all shows in cache to showlists
> < while (my ($cache_id, $show) = each (%$newcache)) {
> <   push @{ $showlists{$show->{channel}} }, $show;
> ---
> >   }
> 497a466
> >
> 519,522c488
> <   # Make sure shows are in order so that dupe check will work
> <   my @shows =
> <     sort {$a->{start} cmp $b->{start}} @{ $showlists{$channel} };
> <
> ---
> >   my @shows = @{ $showlists{$channel} };
> 1021,1022c987
> <   # Don't prepend warper if it already has it
> <   $url =~ s/^http:\/\//$WW/ if $opt_warper and !($url =~ /^$WW/);
> ---
> >   $url =~ s/^http:\/\//$WW/ if $opt_warper;
> _______________________________________________
> mythtv-users mailing list
> mythtv-users at mythtv.org
> http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-users
>
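The last hunk of the patch stops the webwarper prefix from being added twice. One thing worth checking there: /^$WW/ interpolates $WW as a regular expression, so any "." or "?" in the warper URL acts as a metacharacter rather than a literal. A hedged Perl sketch that matches the prefix literally with \Q...\E; the $WW value and the warp_url name are made up for illustration, and the $opt_warper switch is left out.

```perl
use strict;
use warnings;

# Hypothetical warper prefix; the real value of $WW is not shown in the patch.
my $WW = 'http://warper.example/';

# Route a plain http:// URL through the warper, but only once:
# a link scraped from a warped page may already carry the prefix.
sub warp_url {
    my ($url) = @_;
    # \Q...\E quotes $WW so its '.' characters are matched literally
    $url =~ s/^http:\/\//$WW/ unless $url =~ /^\Q$WW\E/;
    return $url;
}

my $once  = warp_url('http://tv.example/guide');
my $twice = warp_url($once);
print "$once\n";
print $once eq $twice ? "unchanged\n" : "double-prefixed\n";
```

This prints the warped URL followed by "unchanged", showing that a second pass leaves an already-warped link alone.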

