Brian Chandler
Post #1
Jocular coder | Group: Members | Posts: 2,494 | Joined: 31-August 06 | Member No.: 43
I suddenly got a warning from Pair Networks that my "bandwidth" figure was several hundred times normal, with a projected 4-digit (dollar) bill. I managed to track down the problem, and Pair agreed to waive any surcharge, so good for them. But the underlying problem is weird. The problem page was https://imaginatorium.org/sano/tanbo.htm - which I made 20+ years ago.
But to go back to the beginning, the weirdness is this. Try the page: https://imaginatorium.com/ensky.html - should work. Now try https://imaginatorium.com/ensky.html/ship.php - I expect this to fail, since there is no file called ensky.html/ship.php in the relevant directory. But Apache simply serves the same page: it appears to go through the URL until it finds an existing file, ignoring the rest of the string, BUT treating the last slash in the URL as marking the "current directory". So if you click any of the links on this page, for example "Shop front", it goes to https://imaginatorium.com/ensky.html/shop.html - and this of course returns the current page all over again.

My problem page included an iframe (remember that???) including an image; this is the scrolling panorama near the top, which I have just reimplemented using CSS. The old version is commented out; here it is:

QUOTE
    <center>
    <iframe width="80%" height=196 src="pics/b045pano.htm" marginheight=0 marginwidth=0><a href="pics/b045pano.jpg">Panorama</a></iframe>
    <p class=caption>Bare paddy-fields - 250-degree panorama of the Kanto Plain in winter</p>
    </center>

So what happened was a (genuine bot) access to

    "GET /sano/tanbo.htm/art/art/books/art/books/books/pics/art/pics/web/web/guest/pics/guest/art/pics/pics/stuff.htm HTTP/1.1" 200 12311 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"

This served the page, then inside the iframe served the same page again with a different (ignored) bit on the end, and so on, recursing indefinitely.

So what is going on? I cannot believe that this was the intention of the HTTP designers. I tried the Apache documentation, but could not find anything resembling a specification, just a vague statement about the file tree, and lots of "Try this, it may work for you" type stuff. Is this standard Apache behaviour, or could it be some problem with the Pair.com implementation? Can someone try the same trick on their own server?
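Editor's sketch: the recursion described above can be reproduced with nothing but standard relative-URL resolution. The snippet below assumes (as the post reports) that the server answers every such URL with the same page, so the same relative iframe `src` is resolved again against each new URL. The starting URL is a shortened, hypothetical version of the logged request, and `iframe_chain` is a name made up for this illustration.

```python
from urllib.parse import urljoin

# Relative src taken from the quoted iframe markup.
IFRAME_SRC = "pics/b045pano.htm"


def iframe_chain(start_url, iframe_src, depth):
    """Return the URLs fetched if every response is the same page
    containing the same relative iframe src (the assumption here)."""
    urls = [start_url]
    for _ in range(depth):
        # A client resolves the relative src against the URL it just
        # fetched; everything up to the last slash is the "directory".
        urls.append(urljoin(urls[-1], iframe_src))
    return urls


# Hypothetical, shortened version of the bogus request from the log.
chain = iframe_chain(
    "https://imaginatorium.org/sano/tanbo.htm/art/stuff.htm", IFRAME_SRC, 3
)
for url in chain:
    print(url)

# For comparison: resolved against the real page URL, the src points
# into the ordinary pics/ directory and the chain stops there.
normal = urljoin("https://imaginatorium.org/sano/tanbo.htm", IFRAME_SRC)
print(normal)
```

Each step adds one more `pics/` segment after `tanbo.htm/`, so under that assumption the chain never reaches a URL the server would reject.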
(This test will not work on the original paddy-field page, because I put this in .htaccess ...)

QUOTE
    DirectoryIndex sano.htm index.htm
    AddType application/x-httpd-php .php .htm
    #
    RewriteEngine On
    RewriteBase /
    # Block bogus tanbo.htm/art/stuff/... and deliver 403 access denied
    RewriteCond %{REQUEST_URI} \.htm/
    RewriteRule \.htm/ - [F,L]

Grateful for suggestions: if I can work out whether it is specific to pair.com I can go either to them or to Apache support...
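Editor's sketch: to see which request paths the `\.htm/` pattern in the quoted rules would catch, the same regular expression can be tried in Python. This only mirrors the unanchored pattern match of the `RewriteCond`; it does not model the rest of mod_rewrite, and `is_blocked` is a hypothetical helper name.

```python
import re

# Same pattern as in the quoted RewriteCond: a literal ".htm" followed
# directly by a slash, anywhere in the request path.
BLOCK_PATTERN = re.compile(r"\.htm/")


def is_blocked(request_path):
    """True if the path contains '.htm/' (an unanchored search)."""
    return BLOCK_PATTERN.search(request_path) is not None


for path in (
    "/sano/tanbo.htm",                 # the real page
    "/sano/tanbo.htm/art/stuff.htm",   # bogus trailing path
    "/ensky.html/ship.php",            # ".html/" is not ".htm/"
):
    print(path, is_blocked(path))
```

Note that the pattern as written only covers `.htm` files: a path such as `/ensky.html/ship.php` contains `.html/`, not `.htm/`, so it would not be matched.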
pandy
Post #2
🌟Computer says no🌟 | Group: WDG Moderators | Posts: 20,814 | Joined: 9-August 06 | Member No.: 6
QUOTE (Brian Chandler)
But to go back to the beginning, the weirdness is this. Try the page: https://imaginatorium.com/ensky.html - should work. Now try https://imaginatorium.com/ensky.html/ship.php - I expect this to fail, since there is no file called ensky.html/ship.php in the relevant directory. But Apache simply serves the same page: it appears to go through the url until it finds an existing file, ignoring the rest of the string, BUT treating the last slash in the url as marking the "current directory". So if you click any of the links on this page, for example "Shop front" it goes to https://imaginatorium.com/ensky.html/shop.html - and this of course returns the current page all over again.

I don't know. https://imaginatorium.com/ensky.html/ship.php isn't a valid URL (can't do the slash thing after a file name). Could it be the server is configured to just ignore a slash after a file name and any mumbo jumbo that comes after it and just reload the page? Don't know why it would be, but it's all I can think of.

https://imaginatorium.com/ensky.html/
https://imaginatorium.com/ensky.html/qwertyuio
https://imaginatorium.com/ensky.html/qwertyuio.html
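Editor's sketch: the behaviour guessed at here (serve the first existing file found along the path, ignore whatever follows) can be modelled in a few lines. Apache's documentation calls the ignored tail "trailing pathname information" (PATH_INFO) and has an `AcceptPathInfo` directive governing whether it is accepted; whether that is what is configured on this particular server is an assumption. The code is a toy model, not Apache's implementation, and `split_path_info` and `FILES` are hypothetical names.

```python
# Hypothetical set of files that exist on the server.
FILES = {"/ensky.html", "/sano/tanbo.htm"}


def split_path_info(request_path, existing_files):
    """Walk the path segment by segment and stop at the first prefix
    that names an existing file. Returns (file, trailing_part), or
    (None, "") if no prefix of the path is an existing file."""
    parts = request_path.split("/")  # "/a/b" -> ["", "a", "b"]
    for i in range(2, len(parts) + 1):
        candidate = "/".join(parts[:i])
        if candidate in existing_files:
            # Everything after the file name is the ignored tail.
            return candidate, request_path[len(candidate):]
    return None, ""


for path in (
    "/ensky.html",
    "/ensky.html/",
    "/ensky.html/qwertyuio.html",
    "/sano/tanbo.htm/art/stuff.htm",
    "/nothing.html/ship.php",
):
    print(path, split_path_info(path, FILES))
```

In this model all three test URLs listed in the post map to the same file, `/ensky.html`, with a different ignored tail.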
Dag
Post #3
Advanced Member | Group: Members | Posts: 122 | Joined: 24-October 06 | Member No.: 549
QUOTE (pandy)
... Could it be the server is configured to just ignore a slash after a file name and any mumbo jumbo that comes after it and just reload the page? Don't know why it would be, but it's all I can think of.
https://imaginatorium.com/ensky.html/qwertyuio.html

Interesting... it seems that the default Apache (or browser?) behaviour is to ignore the trailing slash. The same happens on my server.

Here too! You should try:

https://forums.htmlhelp.com/index.php?act=idx (idx is variable)
https://forums.htmlhelp.com/index.php?act=idx/ (this one also works, incredible!)
https://forums.htmlhelp.com/index.php?act=idx/abracadabra (and this one too)

In your case, you can't reach https://imaginatorium.com/ship.php through the URI https://imaginatorium.com/ensky.html/ship.php, because 'ensky.html' is a real, valid file, and that is what gets returned.

I am not sure whether it has anything to do with .htaccess content negotiation, which deals with files of various types that share the same name. If these real files exist:

https://imaginatorium.com/ensky.html
https://imaginatorium.com/ensky.jpg
https://imaginatorium.com/ensky.php

then for a request for the URI https://imaginatorium.com/ensky the server will negotiate and return whichever one it decides is the proper choice.

This works (file is 'analize.html'): http://www.laban.rs/r/a/analize
but this doesn't (file is 'ensky.html' - 404 returned): https://imaginatorium.com/ensky

Your content negotiation is off.
Brian Chandler
Post #4
Jocular coder | Group: Members | Posts: 2,494 | Joined: 31-August 06 | Member No.: 43
QUOTE (Dag)
... Could it be the server is configured to just ignore a slash after a file name and any mumbo jumbo that comes after it and just reload the page? Don't know why it would be, but it's all I can think of.
https://imaginatorium.com/ensky.html/qwertyuio.html

Interesting... it seems that the default Apache (or browser?) behaviour is to ignore the trailing slash. The same happens on my server.

Here too! You should try:

https://forums.htmlhelp.com/index.php?act=idx (idx is variable)
https://forums.htmlhelp.com/index.php?act=idx/ (this one also works, incredible!)
https://forums.htmlhelp.com/index.php?act=idx/abracadabra (and this one too)

In your case, you can't reach https://imaginatorium.com/ship.php through the URI https://imaginatorium.com/ensky.html/ship.php, because 'ensky.html' is a real, valid file, and that is what gets returned.

I am not sure whether it has anything to do with .htaccess content negotiation, which deals with files of various types that share the same name. If these real files exist:

https://imaginatorium.com/ensky.html
https://imaginatorium.com/ensky.jpg
https://imaginatorium.com/ensky.php

then for a request for the URI https://imaginatorium.com/ensky the server will negotiate and return whichever one it decides is the proper choice.

This works (file is 'analize.html'): http://www.laban.rs/r/a/analize
but this doesn't (file is 'ensky.html' - 404 returned): https://imaginatorium.com/ensky

Your content negotiation is off.

What does "Your content negotiation is off." mean?