<br><br><div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Message: 17<br>Date: Tue, 24 Oct 2006 14:39:14 -0400
<br>From: "Michael T. Dean" <<a href="mailto:mtdean@thirdcontact.com">mtdean@thirdcontact.com</a>><br>Subject: Re: [mythtv-users] Google ate my recordings<br>To: Discussion about mythtv <<a href="mailto:mythtv-users@mythtv.org">
mythtv-users@mythtv.org</a>><br>Message-ID: <<a href="mailto:453E5DD2.30402@thirdcontact.com">453E5DD2.30402@thirdcontact.com</a>><br>Content-Type: text/plain; charset=ISO-8859-1; format=flowed<br><br>On 10/24/06 14:22, Carl Fongheiser wrote:
<br><br>> On 10/24/06, Glenn Chubak <<a href="mailto:glenn@saskatoon.com">glenn@saskatoon.com</a>> wrote:<br>><br>> > I would have thought that mythweb "Are you sure?" dialogs would<br>> > have stopped the googlebot but it doesn't seem so. If anyone is
<br>> > interested I can post the access logs from apache.<br>><br>> Those dialogs only happen if the browser is Javascript enabled.<br>> Needless to say, the robots don't execute the Javascript code. For
<br>> the future, you'll probably want to password-protect MythWeb. It's<br>> also a good idea to put a robots.txt file at the top level of your<br>> web server's document tree. For details, look here:<br>>
<a href="http://www.robotstxt.org/wc/robots.html">http://www.robotstxt.org/wc/robots.html</a><br>><br>> Finally, I strongly recommend not having your MythWeb installation<br>> exposed directly to the Internet. That invites all kinds of trouble.
<br><br>IMHO, the robots.txt is completely useless since no robot should ever be<br>allowed into the web app (as you mentioned in the second point). Once<br>Google gets info about your site--even with a robots.txt in place--that
<br>allows Google users to identify it as a MythWeb site, some sociopathic<br>netizens will follow the Google-bot into your site and do Bad Things<br>(i.e. delete the recordings, delete channels, delete settings, set up<br>
user jobs to do evil things to your Myth box--are you starting to get<br>the idea that deleting recordings is probably the least bad thing that<br>could happen?). So, make sure you keep your Myth box out of the search<br>
engines (with appropriate authentication settings) or you're inviting<br>real trouble.<br><br>Mike<br><br></blockquote></div><br>Actually, I would think that if you've got a robots.txt which denies Googlebot access, then Googlebot won't index the site either. That said, if Googlebot found your site, so too will RandomBot01, which doesn't obey the
robots.txt file.<br>
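<br>For what it's worth, here is a minimal sketch of what "obeying robots.txt" amounts to on the crawler side, using Python's standard urllib.robotparser. The deny-all rules, the host name mythbox.example and the helper is_allowed are assumptions made up for illustration; none of them come from MythWeb or from anyone's real setup.<br>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical deny-all robots.txt, as it might sit at the top level
# of the web server's document tree.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a well-behaved crawler may fetch `url`.

    This is the check a compliant robot runs on its own side;
    nothing on the server enforces it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # mythbox.example is a placeholder host, not a real installation.
    url = "http://mythbox.example/mythweb/"
    for agent in ("Googlebot", "RandomBot01"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

The rules deny both agents, but only a robot that actually runs a check like this stays out. RandomBot01 simply never asks, which is why the password protection suggested above is still needed.<br>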