[mythtv-users] Possible fix for tv_grab_au 2.11
Shanon Mulley
shanonmulleyster at gmail.com
Sat Jul 15 04:17:09 UTC 2006
David,
Can you include the diff file as an attachment, rather than as text in
the email? (I could copy and paste it, I guess, but I think an attachment is
neater.)
Thanks.
Shanon.
On 7/15/06, David Whyte <david.whyte at gmail.com> wrote:
> On 7/15/06, Max Barry <mythtv at maxbarry.com> wrote:
> >
> > It seems to be the URLs. Simply refreshing the daily guide page immediately
> > gives you another 21 shots at the detail pages.
>
> I got the following email overnight. Not sure why the list wasn't
> CC'd, but anyway:
>
> I think I have a solution for the new problem with the msn site. I
> suspected they might be up to something when they changed the
> pids to a hash with a time component. I think what's happening is that
> they allow about 20-odd "closeups" per refresh of the day page,
> and when you refresh the day page, it recalculates a new pid
> hash for a new batch of 20. You can test this manually by going to the
> site with a web browser and clicking on 20 or so program details.
> Eventually it will say "Please try again later". When you refresh the
> page, it should allow details again. However, there are exceptions:
> occasionally I've had to wait and refresh yet again before it
> worked.
>
> My solution is to wait a few seconds after a "closeup" fails, then
> refresh the same day page and start grabbing details again. Programs
> whose details we have already retrieved are cached, so it "resumes"
> where it left off. On a fresh page where I have to grab details for
> everything, it can take up to 13 retries to get it all, but it does
> get there.
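The retry-and-resume approach described above can be sketched roughly as follows. This is a minimal, self-contained Perl illustration, not the patch itself: refresh_day_page, get_closeup, the 20-closeup quota and the 50-show day are hypothetical stand-ins for the ninemsn site (no network access), and only the idea behind $maxDayProc and $waitBetweenRetries comes from the diff.

```perl
use strict;
use warnings;

# Hypothetical mock of the site: every refresh of the day page issues new
# pids (tagged with a refresh counter) that are good for $quota closeups.
my $quota   = 20;
my $n_shows = 50;
my ($token, $remaining) = (0, 0);

sub refresh_day_page {
    $token++;
    $remaining = $quota;
    # Each entry is [stable cache id, pid valid only for this refresh].
    return [ map { [ "show$_", "show$_-t$token" ] } 1 .. $n_shows ];
}

sub get_closeup {
    my ($pid) = @_;
    my ($tok) = $pid =~ /-t(\d+)$/;
    return if $tok != $token or $remaining <= 0;   # "Please try again later"
    $remaining--;
    return "details for $pid";
}

# Re-fetch the day page whenever a closeup fails, skipping shows already
# cached, so each pass resumes where the previous one stopped.
sub grab_day {
    my ($maxDayProc, $waitBetweenRetries) = @_;
    my %cache;
    for my $pass (1 .. $maxDayProc) {
        my $failed = 0;
        for my $entry (@{ refresh_day_page() }) {
            my ($cache_id, $pid) = @$entry;
            next if exists $cache{$cache_id};
            my $details = get_closeup($pid);
            if (!defined $details) { $failed = 1; last; }
            $cache{$cache_id} = $details;
        }
        return (\%cache, $pass) unless $failed;
        sleep $waitBetweenRetries;    # give the site a breather
    }
    return (\%cache, undef);          # retry limit reached
}

my ($cache, $passes) = grab_day(20, 0);
printf "%d shows in %d passes\n", scalar(keys %$cache), $passes;
```

With 50 programmes and a 20-closeup allowance per refresh, the sketch finishes on its third pass. The cache is keyed by a stable id rather than by the pid, because the pid changes on every refresh.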
>
> The script is working for me, but I've had to make quite extensive
> changes to your v2.12 to get this to work, and I'm a complete Perl
> newbie, so I may have mucked things up. Also, things like the
> statistics reporting are all screwed up now and need to be fixed.
> However, I thought I'd give you what I've done in case it helps.
> (I've left the indentation unchanged to minimize the diff output, so you
> can see where the changes are.) I'm not confident enough in my Perl
> skills to post this on the list :)
>
> Boy, the programming consultants at ninemsn must be doing a good job of
> convincing the brass that this scraper is a "real problem", to pad
> their own hours.
>
> Hope this helps. Oh, and as always, use at your own risk!
>
>
> $ diff tv_grab_au.new tv_grab_au.immir.2.12
> 163,164d162
> < my $maxDayProc = 20; # Maximum times to repeat a single day processing
> < my $waitBetweenRetries = 10; # Time to wait between repeats
> 353d350
> < LOOPDAYS:
> 363,364d359
> < my $completedDay = 0;
> < my $dayProcCount = 0;
> 366,370c361
> < DAYPROC:
> < while (!$completedDay and $dayProcCount < $maxDayProc)
> < {
> <
> < my $guidedata = get_page($url) or next LOOPDAYS;
> ---
> > my $guidedata = get_page($url) or next;
> 372,373d362
> < ++$dayProcCount;
> < print "DAYPROC ITERATION $dayProcCount\n" if $debug;
> 417,420c406
> < my $url;
> < # If webwarper used, the link already contains full URL
> < $url = $NMSN unless $opt_warper;
> < $url .= $link[0]->[0];
> ---
> > my $url = $NMSN . $link[0]->[0];
> 428c414
> < my ($show, $cachedShow, $needDetails, $gotDetails);
> ---
> > my ($show, $cache_show);
> 434c420
> < $cachedShow = 1;
> ---
> > $cache_show = 1;
> 459,465c445,446
> < $needDetails = want_details($show);
> < $gotDetails = get_closeup_details($date6am,$show,$pid,$row,$url)
> < if $needDetails;
> < # update current cache in case current day needs to be repeated
> < $cached->{$cache_id} = $show
> < unless $needDetails and !($gotDetails);
> < }
> ---
> > $cache_show = get_closeup_details($date6am,$show,$pid,$row,$url)
> > if want_details($show);
> 467,470d447
> < if (!($cachedShow) and ($needDetails and !($gotDetails))) {
> < # Give the website a breather for better success
> < sleep $waitBetweenRetries;
> < next DAYPROC;
> 475,477c452,453
> < # recreate newcache based on current shows to flush out obsolete
> < # entries in old cache
> < $newcache->{$cache_id} = $show;
> ---
> > push @{ $showlists{$chanid} }, $show;
> > $newcache->{$cache_id} = $show if $cache_show;
> 479c455
> < abbr_dump($show, $cachedShow) if $debug==1;
> ---
> > abbr_dump($show, $cached->{$cache_id}) if $debug==1;
> 486d461
> < $completedDay = 1;
> 488,495c463
> <
> < } # Processing one day
> < } # For days
> < } # For services
> <
> < # add all shows in cache to showlists
> < while (my ($cache_id, $show) = each (%$newcache)) {
> < push @{ $showlists{$show->{channel}} }, $show;
> ---
> > }
> 497a466
> >
> 519,522c488
> < # Make sure shows are in order so that dupe check will work
> < my @shows =
> < sort {$a->{start} cmp $b->{start}} @{ $showlists{$channel} };
> <
> ---
> > my @shows = @{ $showlists{$channel} };
> 1021,1022c987
> < # Don't prepend warper if it already has it
> < $url =~ s/^http:\/\//$WW/ if $opt_warper and !($url =~ /^$WW/);
> ---
> > $url =~ s/^http:\/\//$WW/ if $opt_warper;
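The patch builds the per-channel show lists from the hash %$newcache with each(), instead of pushing shows as they are parsed. Perl's hash iteration order is unspecified, which is presumably why it adds the sort before the dupe check. A minimal sketch of that effect, assuming start times are YYYYMMDDHHMMSS strings (so string comparison gives time order); the sample shows and the adjacent-duplicate check are hypothetical, not tv_grab_au's actual dupe-check code:

```perl
use strict;
use warnings;

# Hypothetical cache contents; iterating a hash yields them in no fixed order.
my %newcache = (
    a => { channel => 'ABC', start => '20060715193000', title => 'News'  },
    b => { channel => 'ABC', start => '20060715180000', title => 'Quiz'  },
    c => { channel => 'ABC', start => '20060715193000', title => 'News'  },
    d => { channel => 'ABC', start => '20060715200000', title => 'Drama' },
);

# Same shape as the patch: add all cached shows to the per-channel lists.
my %showlists;
while (my ($cache_id, $show) = each %newcache) {
    push @{ $showlists{ $show->{channel} } }, $show;
}

# Fixed-width timestamps sort correctly as strings, so cmp gives time order.
my @shows = sort { $a->{start} cmp $b->{start} } @{ $showlists{ABC} };

# Hypothetical adjacent-duplicate check: it only works because shows with
# equal start times are now next to each other.
my @deduped;
for my $show (@shows) {
    next if @deduped and $deduped[-1]{start} eq $show->{start};
    push @deduped, $show;
}
print join(', ', map { $_->{title} } @deduped), "\n";
```

Without the sort, the two "News" entries could land apart in the list and an adjacent-only check would miss the duplicate.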
> _______________________________________________
> mythtv-users mailing list
> mythtv-users at mythtv.org
> http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-users
>