The Web Design Group

... Making the Web accessible to all.


Petalbot
Brian Chandler
post Jan 13 2022, 11:47 PM
Post #1


Jocular coder

Group: Members
Posts: 2,460
Joined: 31-August 06
Member No.: 43



My error log is full of requests for the nonexistent https://imaginatorium.com/addbskt.php from something identifying itself as Petalbot. Its user-agent string links to this page:

https://webmaster.petalsearch.com/site/petalbot

That page says Petalbot follows the robots.txt protocol, and gives this example of how to block it:

CODE

User-agent: PetalBot
Disallow: /*.php


But https://imaginatorium.com/robots.txt already includes

CODE

User-agent: *
Allow: /*.html
Disallow: /*.php


Unless I misunderstand something, if Petalbot followed the robots.txt protocol it would not attempt to access this page. Or do I have to go around adding in the names of all the robots I want to exclude?
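For what it's worth, here is a minimal Python sketch of how a crawler is supposed to read rules like these, assuming the matching rules later standardized in RFC 9309: a crawler obeys the group(s) naming its own product token (case-insensitively), falls back to the `User-agent: *` group only when no group names it, treats `*` in a path as "any characters" and a trailing `$` as end-of-URL, and lets the longest matching rule win (Allow wins a tie). The helper names (`parse_groups`, `pattern_to_regex`, `is_allowed`) and sample strings are hypothetical, details such as percent-encoding normalization are ignored, and nothing here describes what Petalbot actually does. Python's own `urllib.robotparser` is not used because it matches rule paths as plain prefixes, without the `*` wildcard.

```python
import re


def parse_groups(robots_txt):
    """Split robots.txt text into (user_agents, rules) groups (hypothetical helper).

    Consecutive User-agent lines share one group; a User-agent line that
    follows a rule line starts a new group. Rules are (is_allow, pattern).
    """
    groups, agents, rules, seen_rule = [], [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # rules already seen, so this line begins a new group
                groups.append((agents, rules))
                agents, rules, seen_rule = [], [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            seen_rule = True
            if value:  # an empty Disallow line disallows nothing
                rules.append((field == "allow", value))
    if agents:
        groups.append((agents, rules))
    return groups


def pattern_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(robots_txt, product_token, path):
    groups = parse_groups(robots_txt)
    token = product_token.lower()
    # Use every group naming this crawler; fall back to '*' only if none does.
    selected = [rules for agents, rules in groups if token in agents]
    if not selected:
        selected = [rules for agents, rules in groups if "*" in agents]
    best = None  # (pattern length, is_allow) of the most specific match so far
    for rules in selected:
        for is_allow, pattern in rules:
            if pattern_to_regex(pattern).match(path):  # match() anchors at start
                # Longer pattern wins; on equal length True > False, so Allow wins.
                candidate = (len(pattern), is_allow)
                if best is None or candidate > best:
                    best = candidate
    return True if best is None else best[1]


SITE_ROBOTS = """\
User-agent: *
Allow: /*.html
Disallow: /*.php
"""

PETAL_EXAMPLE = """\
User-agent: PetalBot
Disallow: /*.php
"""

print(is_allowed(SITE_ROBOTS, "PetalBot", "/addbskt.php"))    # False
print(is_allowed(SITE_ROBOTS, "PetalBot", "/index.html"))     # True
print(is_allowed(PETAL_EXAMPLE, "OtherBot", "/addbskt.php"))  # True
```

Read this way, a bot with no group of its own is bound by the `User-agent: *` group, so listing every robot by name should not be needed; a named group only changes things for the bot it names, which then ignores the `*` group entirely.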
Replies
Christian J
post Jan 15 2022, 08:38 AM
Post #2



Group: WDG Moderators
Posts: 9,679
Joined: 10-August 06
Member No.: 7



According to this page, Petalbot does not respect robots.txt (though no examples are given):
https://www.hypernode.com/blog/hosting/huaw...-online-stores/

This page says it does respect robots.txt, but is too aggressive: https://james-william-fletcher.medium.com/h...er-f17c30e061e7
Brian Chandler
post Jan 15 2022, 09:44 AM
Post #3


Jocular coder

Group: Members
Posts: 2,460
Joined: 31-August 06
Member No.: 43



QUOTE(Christian J @ Jan 15 2022, 10:38 PM)

According to this page, Petalbot does not respect robots.txt (though no examples are given):
https://www.hypernode.com/blog/hosting/huaw...-online-stores/


Their robots.txt is curious: it just blocks /wp-admin/ - not apparently what they are complaining about - but tries to 'allow' /wp-admin/admin-ajax.php which looks very odd for something you would want a bot poking at.
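As to how an `Allow` inside a `Disallow`ed directory is meant to work: assuming the longest-match rule of RFC 9309, the longer `Allow: /wp-admin/admin-ajax.php` beats `Disallow: /wp-admin/` for that one URL, while everything else under `/wp-admin/` stays blocked. Some older parsers instead take the first matching line, in which case the order of the two lines decides the outcome. A minimal Python sketch using plain prefixes (these two rules contain no wildcards); the function name `decide` and the `RULES` list are illustrative assumptions, not that site's actual file.

```python
# Hypothetical rule list mirroring the two lines described above: (is_allow, prefix).
RULES = [
    (False, "/wp-admin/"),
    (True, "/wp-admin/admin-ajax.php"),
]


def decide(rules, path):
    """Return True if path may be crawled: the longest matching prefix wins,
    Allow wins a tie (True > False), and a path matched by no rule is allowed."""
    matches = [(len(prefix), is_allow) for is_allow, prefix in rules
               if path.startswith(prefix)]
    return max(matches)[1] if matches else True


print(decide(RULES, "/wp-admin/admin-ajax.php"))  # True
print(decide(RULES, "/wp-admin/options.php"))     # False
print(decide(RULES, "/blog/"))                    # True
```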

QUOTE

This page says it does respect robots.txt, but is too aggressive: https://james-william-fletcher.medium.com/h...er-f17c30e061e7


Didn't read very carefully, but they seem to have committed a "By default, execute the command" error.

Neither very convincing, frankly... so the muddle continues.
Christian J
post Jan 15 2022, 11:49 AM
Post #4



Group: WDG Moderators
Posts: 9,679
Joined: 10-August 06
Member No.: 7



QUOTE(Brian Chandler @ Jan 15 2022, 03:44 PM)

Their robots.txt is curious: it just blocks /wp-admin/ - not apparently what they are complaining about - but tries to 'allow' /wp-admin/admin-ajax.php which looks very odd for something you would want a bot poking at.

Odd indeed. Maybe they don't care about robots.txt entries for well-behaved bots, while bad bots need to be blocked in other ways (since they ignore robots.txt anyway).