The Web Design Group

... Making the Web accessible to all.


> Apache problem, Weird addresses being served...
Brian Chandler
post Jul 14 2024, 10:28 AM
Post #1


Jocular coder
********

Group: Members
Posts: 2,494
Joined: 31-August 06
Member No.: 43



I suddenly got a warning from Pair Networks that my "bandwidth" figure was several hundred times normal, with a projected 4-digit (dollar) bill. I managed to track down the problem, and Pair agreed to waive any surcharge, so good for them. But the underlying problem is weird. The problem page was https://imaginatorium.org/sano/tanbo.htm - which I made 20+ years ago.

But to go back to the beginning, the weirdness is this. Try the page: https://imaginatorium.com/ensky.html - should work. Now try https://imaginatorium.com/ensky.html/ship.php - I expect this to fail, since there is no file called ensky.html/ship.php in the relevant directory. But Apache simply serves the same page: it appears to go through the url until it finds an existing file, ignoring the rest of the string, BUT treating the last slash in the url as marking the "current directory". So if you click any of the links on this page, for example "Shop front" it goes to https://imaginatorium.com/ensky.html/shop.html - and this of course returns the current page all over again.
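The "current directory" effect comes from ordinary relative-URL resolution in the browser, which only looks at the text of the address and knows nothing about which files exist on the server. A minimal sketch using Python's standard `urljoin`, assuming the "Shop front" link is written as the relative href `shop.html` (the href is a guess based on the address it ends up at):

```python
from urllib.parse import urljoin

def resolve_link(page_url: str, href: str) -> str:
    """Resolve a relative href the way a browser does: everything after the
    last slash of the page URL is dropped and the href is appended."""
    return urljoin(page_url, href)

# Normal address: the link resolves next to ensky.html.
normal = resolve_link("https://imaginatorium.com/ensky.html", "shop.html")

# Bogus address: "ensky.html" is now treated as if it were a directory.
bogus = resolve_link("https://imaginatorium.com/ensky.html/ship.php", "shop.html")

print(normal)  # https://imaginatorium.com/shop.html
print(bogus)   # https://imaginatorium.com/ensky.html/shop.html
```

So once the server answers the bogus address with a page at all, every relative link on that page leads to another bogus address.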

My problem page included an iframe (remember that???) including an image; this is the scrolling panorama near the top, which I have just reimplemented using css. The old version is commented out; here it is:

QUOTE
<center>
<iframe width="80%" height=196 src="pics/b045pano.htm" marginheight=0 marginwidth=0><a href="pics/b045pano.jpg">Panorama</a></iframe>
<p class=caption>Bare paddy-fields - 250-degree panorama of the Kanto Plain in winter</p>
</center>


So what happened was a (genuine bot) access to "GET /sano/tanbo.htm/art/art/books/art/books/books/pics/art/pics/web/web/guest/pics/guest/art/pics/pics/stuff.htm HTTP/1.1" 200 12311 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)". This served the page, then inside the iframe served the same page again with a different (ignored) bit on the end, and so on, recursing indefinitely.
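The runaway growth can be reproduced on paper. The sketch below assumes the server answers every address under `tanbo.htm/` with the same page, and that the client resolves the iframe's `src="pics/b045pano.htm"` relative to whatever address it just fetched. The starting address is a shortened, hypothetical version of the one in the log line.

```python
from urllib.parse import urljoin

IFRAME_SRC = "pics/b045pano.htm"  # taken from the quoted iframe markup

def iframe_chain(start_url: str, depth: int) -> list:
    """Return the addresses fetched when each served page embeds the same
    relative iframe and the server keeps returning the same page."""
    urls = [start_url]
    for _ in range(depth):
        # Each fetch resolves the iframe src against the previous address.
        urls.append(urljoin(urls[-1], IFRAME_SRC))
    return urls

# Hypothetical starting point, shortened from the logged request.
chain = iframe_chain("https://imaginatorium.org/sano/tanbo.htm/art/stuff.htm", 3)
for url in chain:
    print(url)
```

Each step adds one more `pics/` segment, so the address never repeats and nothing in the client stops the loop.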

So what is going on? I cannot believe that this was the intention of the HTTP designers. I tried the Apache documentation, but could not find anything resembling a specification, just a vague statement about the file tree, and lots of "Try this, it may work for you" type stuff. Is this standard Apache behaviour, or could it be some problem with the Pair.com implementation? Can someone try the same trick on their own server? (This test will not work on the original paddy-field page, because I put this in .htaccess ...)

QUOTE
DirectoryIndex sano.htm index.htm
AddType application/x-httpd-php .php .htm
#
RewriteEngine On
RewriteBase /
# Block bogus tanbo.htm/art/stuff/... and deliver 403 access denied
RewriteCond %{REQUEST_URI} \.htm/
RewriteRule \.htm/ - [F,L]
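
To see which requests the quoted rule would refuse, the pattern can be tried against a few sample paths. This is only a sketch: it uses Python's `re` as a stand-in for Apache's regex engine, which should behave the same for a pattern this simple, and the sample paths are taken from this thread.

```python
import re

# Same pattern as in the RewriteCond / RewriteRule lines quoted above.
BLOCK_PATTERN = re.compile(r"\.htm/")

def is_blocked(request_uri: str) -> bool:
    """True if the request path contains '.htm' immediately followed by '/'."""
    return BLOCK_PATTERN.search(request_uri) is not None

samples = [
    "/sano/tanbo.htm",                # the real page
    "/sano/tanbo.htm/art/stuff.htm",  # bogus trailing path after .htm
    "/ensky.html/ship.php",           # bogus trailing path after .html
]
for path in samples:
    print(path, "->", "403" if is_blocked(path) else "served")
```

Note that the pattern covers `.htm/` only. A path such as `/ensky.html/ship.php` does not match, because `.htm` is followed by `l` rather than a slash, so pages ending in `.html` would need their own rule or a wider pattern.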


Grateful for suggestions: if I can work out whether it is specific to pair.com I can go either to them or to Apache support...
Replies
pandy
post Jul 14 2024, 10:43 AM
Post #2


🌟Computer says no🌟
********

Group: WDG Moderators
Posts: 20,814
Joined: 9-August 06
Member No.: 6



QUOTE(Brian Chandler @ Jul 14 2024, 05:28 PM) *



But to go back to the beginning, the weirdness is this. Try the page: https://imaginatorium.com/ensky.html - should work. Now try https://imaginatorium.com/ensky.html/ship.php - I expect this to fail, since there is no file called ensky.html/ship.php in the relevant directory. But Apache simply serves the same page: it appears to go through the url until it finds an existing file, ignoring the rest of the string, BUT treating the last slash in the url as marking the "current directory". So if you click any of the links on this page, for example "Shop front" it goes to https://imaginatorium.com/ensky.html/shop.html - and this of course returns the current page all over again.


I don't know. https://imaginatorium.com/ensky.html/ship.php isn't a valid URL (can't do the slash thing after a file name). Could it be the server is configured to just ignore a slash after a file name and any mumbo jumbo that comes after it and just reload the page? Don't know why it would be, but it's all I can think of.
https://imaginatorium.com/ensky.html/
https://imaginatorium.com/ensky.html/qwertyuio
https://imaginatorium.com/ensky.html/qwertyuio.html
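
The behaviour guessed at here is close to what the Apache documentation calls PATH_INFO: the part of the request path left over after the part that names a real file. Whether a request with such a leftover is accepted is governed by the `AcceptPathInfo` directive. By default, script handlers such as CGI generally accept it, while the handler for plain static files rejects it. Whether that explains this particular server is an assumption. The sketch below is not Apache's code, only an illustration of the split, with a hypothetical set of existing files:

```python
from typing import Optional, Set, Tuple

def split_path_info(url_path: str, existing_files: Set[str]) -> Optional[Tuple[str, str]]:
    """Walk the path left to right until a prefix names an existing file.
    Return (file, leftover), or None if no prefix is a file.
    Assumes url_path starts with '/' and contains no doubled slashes."""
    segments = url_path.strip("/").split("/")
    for i in range(1, len(segments) + 1):
        candidate = "/" + "/".join(segments[:i])
        if candidate in existing_files:
            # Whatever follows the file name is the PATH_INFO-style leftover.
            return candidate, url_path[len(candidate):]
    return None

files = {"/ensky.html"}  # hypothetical document root contents
print(split_path_info("/ensky.html/ship.php", files))
print(split_path_info("/ensky.html/", files))
print(split_path_info("/nothing.html/ship.php", files))
```

If the leftover is accepted, the file is served and the extra text is simply handed to the handler. If it is rejected, the request ends in a 404.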
Dag
post Jul 15 2024, 02:32 AM
Post #3


Advanced Member
****

Group: Members
Posts: 122
Joined: 24-October 06
Member No.: 549



QUOTE(pandy @ Jul 14 2024, 07:43 PM) *


... Could it be the server is configured to just ignore a slash after a file name and any mumbo jumbo that comes after it and just reload the page? Don't know why it would be, but it's all I can think of.
https://imaginatorium.com/ensky.html/qwertyuio.html


Interesting... it seems that the default Apache (or browser?) behaviour is to ignore a trailing slash. The same thing happens on my server. Here too!

You should try:
https://forums.htmlhelp.com/index.php?act=idx
idx is variable
https://forums.htmlhelp.com/index.php?act=idx/
but the above one also works (incredible!). The next one too:
https://forums.htmlhelp.com/index.php?act=idx/abracadabra

In your case, you never get
https://imaginatorium.com/ship.php
from the URI
https://imaginatorium.com/ensky.html/ship.php
because 'ensky.html' is a real, valid file, and that is what gets returned.

I am not sure that it has anything to do with content negotiation (which can be configured in .htaccess), which deals with files of various types that share the same name.

Suppose these real files exist:
https://imaginatorium.com/ensky.html
https://imaginatorium.com/ensky.jpg
https://imaginatorium.com/ensky.php

For a request for the URI
https://imaginatorium.com/ensky
the server will negotiate and return whichever one it decides is the best match.

This works (the file is 'analize.html'):
http://www.laban.rs/r/a/analize
but this doesn't (the file is 'ensky.html', and a 404 is returned):
https://imaginatorium.com/ensky

Your content negotiation is off.
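
For readers unfamiliar with the term: with Apache's MultiViews option enabled, a request for a name that does not exist as a file makes the server look in the directory for files called `name.*` and pick the one that best suits the client. With it disabled, the same request is a plain 404. The sketch below covers only the first step, collecting candidates. The selection step is left out, and the directory listing is hypothetical.

```python
from typing import List

def multiviews_candidates(requested: str, directory_listing: List[str]) -> List[str]:
    """Return the files a MultiViews-style lookup would consider for an
    extensionless request: those named '<requested>.<something>'."""
    prefix = requested + "."
    return sorted(name for name in directory_listing if name.startswith(prefix))

listing = ["ensky.html", "ensky.jpg", "ensky.php", "enskyline.html", "index.htm"]
print(multiviews_candidates("ensky", listing))    # ['ensky.html', 'ensky.jpg', 'ensky.php']
print(multiviews_candidates("missing", listing))  # []
```

An empty candidate list, or negotiation being switched off, both end in a 404 for the extensionless address.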
Brian Chandler
post Aug 7 2024, 08:30 AM
Post #4


Jocular coder
********

Group: Members
Posts: 2,494
Joined: 31-August 06
Member No.: 43



QUOTE(Dag @ Jul 15 2024, 04:32 PM) *

Your content negotiation is off.


What does "Your content negotiation is off." mean?


