The Web Design Group

... Making the Web accessible to all.


> Petalbot
Brian Chandler
post Jan 13 2022, 11:47 PM
Post #1


Jocular coder
********

Group: Members
Posts: 2,460
Joined: 31-August 06
Member No.: 43



My error log is full of accesses to the nonexistent https://imaginatorium.com/addbskt.php from something identifying itself as Petalbot. Its user-agent string links to a page here:

https://webmaster.petalsearch.com/site/petalbot

This explains that Petalbot follows the robots.txt protocol, and describes how to block it by (e.g.)

CODE

User-agent: PetalBot
Disallow: /*.php


But https://imaginatorium.com/robots.txt already includes

CODE

User-agent: *
Allow: /*.html
Disallow: /*.php


Unless I misunderstand something, if Petalbot followed the robots.txt protocol it would not attempt to access this page. Or do I have to go around adding in the names of all the robots I want to exclude?
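
As I understand the protocol, a crawler should obey the group that names it and otherwise fall back to the "User-agent: *" group, so listing every robot by name should not be necessary. The sketch below illustrates that reading, using the three-line file quoted above. The function names (parse_groups, pattern_to_regex, is_allowed) are made up for this illustration, and it assumes the common wildcard convention where "*" matches any run of characters, a trailing "$" anchors the end, and the longest matching rule wins. It is not a description of what PetalBot actually does.

```python
import re

ROBOTS_TXT = """\
User-agent: *
Allow: /*.html
Disallow: /*.php
"""

def parse_groups(text):
    """Return {lowercased user-agent token: [(directive, pattern), ...]}."""
    groups = {}
    agents = []              # user-agents of the group currently being read
    reading_agents = False   # True while on consecutive User-agent lines
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                agents = []                   # a new group starts here
            reading_agents = True
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_agents = False
            for agent in agents:
                groups[agent].append((field, value))
    return groups

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(text, user_agent, path):
    """Apply the named group if one exists, else the '*' group."""
    groups = parse_groups(text)
    ua = user_agent.lower()
    rules = next(
        (r for name, r in groups.items() if name and name != "*" and name in ua),
        groups.get("*", []),
    )
    best = ("allow", "")     # no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern:
            continue         # an empty Disallow matches nothing
        if pattern_to_regex(pattern).match(path):
            longer = len(pattern) > len(best[1])
            tie_allow = len(pattern) == len(best[1]) and directive == "allow"
            if longer or tie_allow:
                best = (directive, pattern)
    return best[0] == "allow"

print(is_allowed(ROBOTS_TXT, "PetalBot", "/addbskt.php"))
print(is_allowed(ROBOTS_TXT, "PetalBot", "/index.html"))
```

Under those assumptions the first call reports that /addbskt.php is disallowed for PetalBot and the second that /index.html is allowed, even though PetalBot is never named in the file.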
Replies
Brian Chandler
post Jan 27 2022, 12:08 PM
Post #2





Well, I am still seeing huge numbers of robot accesses to .php files. Not only Petalbot, also DuckDuckWhateveritis, and others. Here is my robots.txt file, as of about two weeks ago; does it look OK?

https://imaginatorium.com/robots.txt

And how long do you think I need to give bots to update their copy of robots.txt? Any ideas?
Christian J
post Jan 27 2022, 05:13 PM
Post #3


.
********

Group: WDG Moderators
Posts: 9,686
Joined: 10-August 06
Member No.: 7



QUOTE(Brian Chandler @ Jan 27 2022, 06:08 PM) *

Well, I am still seeing huge numbers of robot accesses to .php files. Not only Petalbot, also DuckDuckWhateveritis, and others. Here is my robots.txt file, as of about two weeks ago; does it look OK?

https://imaginatorium.com/robots.txt

I wouldn't use ending slashes, unless e.g. "ack.php" is a directory and not a PHP file...
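
To show why the ending slash matters, here is a small check with Python's standard urllib.robotparser. The host example.com and the two rule sets are invented for the illustration; only the name "ack.php" comes from the point above. Note that this stdlib parser does plain prefix matching and does not understand "*" wildcards, which is enough for this particular comparison.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_lines, url, agent="PetalBot"):
    """Parse the given robots.txt lines and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)

# Hypothetical rule sets: identical except for the trailing slash.
with_slash = ["User-agent: *", "Disallow: /ack.php/"]
without_slash = ["User-agent: *", "Disallow: /ack.php"]

url = "https://example.com/ack.php"   # example.com is a placeholder host

print(allowed(with_slash, url))       # True: "/ack.php" does not start with "/ack.php/"
print(allowed(without_slash, url))    # False: the rule is a prefix of the path
```

With the slash, the rule only covers paths below a directory called "ack.php", so the file itself stays crawlable.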

QUOTE
And how long do you think I need to give bots to update their copy of robots.txt? Any ideas?

No idea what they actually do. But since the purpose of a returning bot is to update its database, surely that would include the robots.txt file as well (if they care about it)?
Brian Chandler
post Jan 28 2022, 01:57 AM
Post #4





QUOTE(Christian J @ Jan 28 2022, 07:13 AM) *

QUOTE(Brian Chandler @ Jan 27 2022, 06:08 PM) *

Well, I am still seeing huge numbers of robot accesses to .php files. Not only Petalbot, also DuckDuckWhateveritis, and others. Here is my robots.txt file, as of about two weeks ago; does it look OK?

https://imaginatorium.com/robots.txt

I wouldn't use ending slashes, unless e.g. "ack.php" is a directory and not a PHP file...


Thanks Christian! My blunder somehow. Pandy's links are interesting, but are rather evidence-free claims of Petalbot not complying with robots.txt. I'll see what happens now. I don't think we can expect them to re-read the robots.txt file even every day; something like once a week or month would seem quite reasonable, so I am happy to be patient.
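
One way to "see what happens" is to tally bot requests for .php files per day and watch whether they tail off after the fix. This is only a rough sketch: it assumes an access log in the usual Apache/nginx "combined" format, guesses at bots by looking for "bot", "crawl" or "spider" in the user-agent, and the sample lines (addresses, times, user-agent strings) are invented for the demonstration.

```python
import re
from collections import Counter

# Assumed "combined" log format: host ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
# Crude heuristic for spotting crawlers by their user-agent string.
BOT_HINT = re.compile(r"bot|crawl|spider", re.IGNORECASE)

def php_hits_by_bot(lines):
    """Count requests for *.php paths per (day, user-agent) for likely bots."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue                                  # skip lines in another format
        path = m.group("path").split("?", 1)[0]       # ignore any query string
        if path.endswith(".php") and BOT_HINT.search(m.group("agent")):
            counts[(m.group("day"), m.group("agent"))] += 1
    return counts

# Invented sample lines, for illustration only.
SAMPLE = [
    '192.0.2.1 - - [27/Jan/2022:12:00:01 +0900] "GET /addbskt.php HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; PetalBot)"',
    '192.0.2.1 - - [27/Jan/2022:12:00:05 +0900] "GET /addbskt.php?id=3 HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; PetalBot)"',
    '192.0.2.2 - - [28/Jan/2022:08:30:00 +0900] "GET /index.html HTTP/1.1" 200 2048 "-" "DuckDuckBot/1.1"',
    '192.0.2.3 - - [28/Jan/2022:09:00:00 +0900] "GET /addbskt.php HTTP/1.1" 404 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

for (day, agent), n in sorted(php_hits_by_bot(SAMPLE).items()):
    print(day, n, agent)
```

In real use the lines would come from the server's own log file rather than SAMPLE, and the regular expression may need adjusting to the local log format.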
