The Web Design Group

... Making the Web accessible to all.

> Rewriting URL to https and www with htaccess
Christian J
post Jan 20 2025, 05:54 PM
Post #1


.
********

Group: WDG Moderators
Posts: 9,781
Joined: 10-August 06
Member No.: 7



I'm trying the following .htaccess snippet to force both HTTPS and WWW at the same time:

CODE
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^(.*)$ https://www.mysite.com/$1 [R=301,L]

This seems to rewrite requests containing the string "mysite.com" to https://www.mysite.com/, as intended. But my website also changes requests for www.mysite.com or http://www.mysite.com to https://www.mysite.com/ and I wonder how? Is it because of the above snippet?

I was thinking it all works by the directive discarding everything before "mysite.com", and then prepending "https://www." to it again (even if that part already exists in the original URL). But when I try other (non-existing) sub-domains like https://foo.mysite.com/ they are not rewritten.
Brian Chandler
post Jan 21 2025, 12:25 PM
Post #2


Jocular coder
********

Group: Members
Posts: 2,489
Joined: 31-August 06
Member No.: 43



I find the Apache stuff horrible: the documentation is frustratingly vague, and everything is counterintuitive. You would think "replace x y" meant "change x to y", but it doesn't. The first argument matches only the path information, between the domain name and the argument string, but the second argument is the whole url: so ^(.*)$ means the whole path, and this should put that after https://www.mysite.com/. Which it appears to do, so what is the problem?

Well, perhaps you expect to be able to understand what is going on, but that is oh so 20th century, isn't it? I can't find anything in my .htaccess files about changing to https, so I think that is happening anyway, automagically. Then there is nothing left to explain! If someone requests mysite.com, the .htaccess rule rewrites this to www.mysite.com. In the foo. case, it doesn't. Does that help?
Christian J
post Jan 21 2025, 06:15 PM
Post #3





QUOTE(Brian Chandler @ Jan 21 2025, 06:25 PM) *

I can't find anything in my .htaccess files about changing to https, so I think that is happening anyway, automagically.

I don't think my webhost does, which creates problems with users that force HTTPS (e.g. with browser addons), resulting in some users getting an HTTP version and others HTTPS. In particular, this anti-hotlinking directive I used:

CODE
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mysite\.com/.*$ [NC]

suddenly(?) stopped my images from loading on my own HTML pages, since the directive used HTTP and my browser forced HTTPS. No idea why this suddenly happened now; it has worked for years with the same browser addons. Anyway, that part was easily changed to:

CODE
RewriteCond %{HTTP_REFERER} !^https://(www\.)?mysite\.com/.*$ [NC]

but this in turn means I now have to force HTTPS on all site visitors, hence the directive in my first post.
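
A Referer pattern that accepts either scheme would keep the hotlink check independent of HTTPS. The following is only a sketch, not tested on this host: it assumes mod_rewrite is enabled, keeps mysite.com as the placeholder domain, and the extension list and the empty-Referer exemption are assumptions rather than part of the original rule.

```apache
RewriteEngine On

# Assumption: let requests without a Referer header through
# (direct visits, or browsers/addons that strip the header).
RewriteCond %{HTTP_REFERER} !^$

# "https?" matches both http:// and https://, with or without "www."
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?mysite\.com/ [NC]

# When both conditions hold, refuse the image request with 403 Forbidden.
RewriteRule \.(jpe?g|png|gif)$ - [NC,F]
```

With this form the check no longer depends on which scheme the visitor's browser ended up using, so forcing HTTPS becomes a separate decision.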
Brian Chandler
post Jan 27 2025, 12:06 AM
Post #4





Well, I am not sure what the answer is, since I can't really understand what the question is either, but above you are checking that the refer( r )er [spaces to defeat idiocy] is exactly "http://(www\.)?mysite.com/.*" (the ^ and $ mean beginning and end), whereas all you need to do is check that the referrer *includes* "mysite.com":

CODE
RewriteCond %{HTTP_REFERER} !mysite\.com/ [NC]


I think. I have not tested it.

There really is nothing else to explain, except how everything is getting switched to https. I can't see any point in not serving everything over https, since that is what everyone does, but in any event the way to force it is surely not by a rewrite: if you crawl through the Apache documentation or search stack exchange you will find the separate directive for doing just this. You can then find out how this directive is set either by looking at the Apache configuration file, or writing a little test: just as there is a variable %{HTTP_REFERER}, there will be others including the setting of the preferred protocol.

Christian J
post Jan 27 2025, 12:03 PM
Post #5





QUOTE(Brian Chandler @ Jan 27 2025, 06:06 AM) *

Well, I am not sure what the answer is, since I can't really understand what the question is either,

In my first post I just tried to understand what's going on. I'll get back to it in a separate post after this one.

QUOTE

all you need to do is check that the referrer *includes* "mysite.com":

True, I'll try that.

QUOTE

There really is nothing else to explain, except how everything is getting switched to https. I can't see any point in not serving everything over https, since that is what everyone does, but in any event the way to force it is surely not by a rewrite: if you crawl through the Apache documentation or search stack exchange you will find the separate directive for doing just this.

From what I've read there is no such directive; all that you can do is test if HTTPS is enabled, and if not use rewrite. The following seems to work for now:

CODE
# prevent hotlinking (can NOT be used locally!) -------------------------
RewriteCond %{HTTP_REFERER} !mysite\.com/ [NC]
RewriteRule \.(jpg|jpeg|png|gif|js|css)$ - [NC,F,L]

# force HTTPS  ------------------------------------------
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

# force WWW ------------------------------------------
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^(.*)$ https://www.mysite.com/$1 [R=301,L]
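
The two redirect blocks above can send a visitor through two hops (first to HTTPS on the same host, then to www). A possible single-hop variant, as an untested sketch: it assumes every host served from this directory should end up on www.mysite.com (so a sub-domain like foo.mysite.com would be redirected there too when requested over plain HTTP), and that %{HTTPS} reflects the real connection, which may not hold behind a TLS-terminating proxy.

```apache
RewriteEngine On

# Redirect if the request is not HTTPS, OR the host is the bare domain.
RewriteCond %{HTTPS} !=on [OR]
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]

# %{REQUEST_URI} already starts with "/", so no extra slash is needed.
# The query string is carried over unchanged by default.
RewriteRule ^ https://www.mysite.com%{REQUEST_URI} [R=301,L]
```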


Or maybe it's best to let the webhost set it at some higher level; then maybe it's less likely to break the next time they change server software.
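
If the host does allow changes at that higher level, the Apache documentation suggests a plain Redirect in the port-80 virtual host instead of mod_rewrite. A sketch under assumptions: the ServerName/ServerAlias values are the thread's placeholder domain, and this belongs in the server configuration, not in .htaccess, so it is only an option where the webhost exposes it.

```apache
# Server config (not .htaccess): everything arriving over plain HTTP
# is redirected to the HTTPS www host, keeping the path.
<VirtualHost *:80>
    ServerName mysite.com
    ServerAlias www.mysite.com
    Redirect permanent "/" "https://www.mysite.com/"
</VirtualHost>
```

The bare-domain case over HTTPS (https://mysite.com/) would still need its own handling in the port-443 configuration.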
Christian J
post Jan 27 2025, 12:09 PM
Post #6





From the thread start:

CODE
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^(.*)$ https://www.mysite.com/$1 [R=301,L]


QUOTE(Brian Chandler @ Jan 21 2025, 06:25 PM) *

The first argument matches only the path information, between the domain name and the argument string,

Didn't understand what you meant with "between the domain name and the argument string"?

QUOTE

but the second argument is the whole url: so ^(.*)$ means the whole path, and this should put that after https://www.mysite.com/.

Didn't understand that either.
Brian Chandler
post Feb 10 2025, 03:39 AM
Post #7





Sorry, I didn't explain. Well, here is a URL:

CODE
https://imaginatorium.com/show.php?genre=both&aKayomi=x


This consists of:
CODE
<protocol>//<domain name>/<path>?<argument string>


All I mean by the "path information" is the bit between the domain name and the argument string. Is this unclear? So in the example above, the path is "show.php".

Then the point about the rewrite rule is that it matches against just the path, but rewrites the whole URL (to be honest, I'm not sure about the argument string: whether that automatically gets added unchanged, or whether you need to put it in explicitly). So consider this rewrite rule:

CODE
RewriteRule ^(.*)\.htm$ https://www.mysite.com/$1.html


This (I haven't checked it) replaces any path (filename) with a .htm extension by the same filename with a .html extension. But it is an eccentric way to do things: normally you replace some string (or regular expression) by a different string. Compare a normal function like str_replace() in PHP (see the PHP manual). Coupled with the "Try this and see if it works for you" style the Apache "specification" is written in, this makes things difficult. HTH
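
On the argument string: mod_rewrite passes the original query string through unchanged unless the substitution contains its own "?". A few illustrative rules, as a sketch only: the file names are made up for the example, the rules are assumed to sit in the document root's .htaccess, and R=302 is used so that browsers do not cache the results while experimenting.

```apache
RewriteEngine On

# Default: /old.htm?x=1 redirects to /new.html?x=1
RewriteRule ^old\.htm$ /new.html [R=302,L]

# A query string in the substitution replaces the original:
# /a.htm?x=1 redirects to /b.html?y=2
RewriteRule ^a\.htm$ /b.html?y=2 [R=302,L]

# QSA appends the original query string to the new one:
# /c.htm?x=1 redirects to /d.html?y=2&x=1
RewriteRule ^c\.htm$ /d.html?y=2 [R=302,L,QSA]

# QSD (Apache 2.4+) discards the original query string:
# /e.htm?x=1 redirects to /f.html
RewriteRule ^e\.htm$ /f.html [R=302,L,QSD]
```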
Christian J
post Feb 10 2025, 04:18 PM
Post #8





QUOTE(Brian Chandler @ Feb 10 2025, 09:39 AM) *

All I mean by the "path information" is the bit between the domain name and the argument string. Is this unclear? So in the example above, the path is "show.php".

Then the point about the rewrite rule is that it matches against just the path, but rewrites the whole URL

That's what I thought it did, but why are URLs with a different sub-domain like http://foo.mysite.com/ not rewritten to https://www.mysite.com/?

QUOTE
(to be honest, I'm not sure about the argument string: whether that automatically gets added unchanged, or whether you need to put it in explicitly).

It was added when I tested, FWIW...
Brian Chandler
post Feb 12 2025, 01:19 AM
Post #9





QUOTE(Christian J @ Feb 11 2025, 06:18 AM) *

That's what I thought it did, but why are URLs with a different sub-domain like http://foo.mysite.com/ not rewritten to https://www.mysite.com/?


Because of the first line in this:
CODE
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^(.*)$ https://www.mysite.com/$1 [R=301,L]


... which says "only do it, if the HTTP_HOST matches mysite.com, exactly, because ^ means beginning and $ means end."

This post has been edited by Brian Chandler: Feb 12 2025, 01:33 AM
Christian J
post Feb 12 2025, 07:43 AM
Post #10






Don't both "www.mysite.com" and "foo.mysite.com" match "mysite.com" there?

And if (for unknown reasons) only "www.mysite.com" would match, how come it still works when the URL lacks a subdomain entirely? For example, when my browser requests "http://mysite.com" it's still changed to "https://www.mysite.com".

I'm happy with how it currently works, I just can't understand how.
Brian Chandler
post Feb 13 2025, 04:02 AM
Post #11





QUOTE(Christian J @ Feb 12 2025, 09:43 PM) *

Don't both "www.mysite.com" and "foo.mysite.com" match "mysite.com" there?


No, because (as above!): the regexp "^mysite\.com$" is a pattern which matches:

^ beginning of string followed immediately by
mysite\.com followed immediately by
$ end of string

This does not match "foo.mysite.com" because the beginning is immediately followed by "foo", not by "mysite...".

If you change the matching pattern to "mysite\.com$" it will then match "foo.mysite.com", but will still not match "mysite.com.au" (which I think is a potentially valid domain).
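
The difference between the two patterns, written out as rules. This is an illustrative sketch with the thread's placeholder domain; the second variant is commented out and is not a recommendation, and the extra www exclusion in it is an addition needed to avoid a redirect loop.

```apache
RewriteEngine On

# Variant 1: anchored at both ends. Only the bare host "mysite.com" matches;
# "www.mysite.com" and "foo.mysite.com" are left alone.
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^ https://www.mysite.com%{REQUEST_URI} [R=301,L]

# Variant 2: anchored only at the end. Matches "mysite.com" and
# "foo.mysite.com" (and also "notmysite.com"), but not "mysite.com.au".
# "www.mysite.com" must be excluded, or it would redirect to itself forever.
# RewriteCond %{HTTP_HOST} mysite\.com$ [NC]
# RewriteCond %{HTTP_HOST} !^www\.mysite\.com$ [NC]
# RewriteRule ^ https://www.mysite.com%{REQUEST_URI} [R=301,L]
```

Note that %{HTTP_HOST} is the Host header as sent by the client, so a request that includes a port (for example "mysite.com:8080") would not match a pattern ending in "com$".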
Christian J
post Feb 13 2025, 10:20 AM
Post #12





QUOTE(Brian Chandler @ Feb 13 2025, 10:02 AM) *

This does not match "foo.mysite.com" because the beginning is immediately followed by "foo", not by "mysite...".

True, and for "www.mysite.com" it doesn't have to match, since that already has its "www" part. That doesn't explain how "http://www.mysite.com" could redirect to "https://www.mysite.com" though. Occasionally even "http://foo.mysite.com" has redirected to "https://foo.mysite.com", which doesn't make any sense to me unless the browser itself was forcing HTTPS (independently of my htaccess directives). But I didn't use browser settings or addons that would force HTTPS to my knowledge.

Apparently redirects can be cached by the browser as well: https://stackoverflow.com/questions/4499541...588494#10588494 maybe that could explain the unexpected results.
Brian Chandler
post Feb 15 2025, 01:08 PM
Post #13





I think your stackoverflow link answers the question: try in a private window, and the redirect will disappear...
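
Browsers may cache a 301 (permanent) redirect and replay it without contacting the server, which can make rule changes look like they have no effect. One common way around this while experimenting, sketched here with the rule from the thread start, is a temporary redirect; switching back to 301 once the behaviour is settled is an assumption about the intended end state.

```apache
RewriteEngine On

# R=302 (temporary) while testing; browsers do not cache it by default.
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^(.*)$ https://www.mysite.com/$1 [R=302,L]
```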