The Web Design Group

... Making the Web accessible to all.

> Extended characters in URLs and links
pandy
post Apr 24 2008, 12:08 PM
Post #1


Don't like donuts. Don't do MySpace.
********

Group: WDG Moderators
Posts: 17,967
Joined: 9-August 06
Member No.: 6



How does this work?

Note the "ä" and "å".
http://blogg.passagen.se/matfrisk/entry/rä...rån_ica_uppsala

I understand if the browser does some internal conversion, but is the server involved in this too?

Here's another one I found.
http://www.bogrönt.se

The first one works in IE6, Opera and FF if you click a link. If you paste the URL in the address bar it doesn't work in FF but does in IE. The second one doesn't work at all in IE6 but does in the other browsers I tried (newer browsers, I should add).

One difference seems to be that in the case of the domain the "undotted" version is also covered, http://www.bogront.se , while http://blogg.passagen.se/matfrisk/entry/rä...rån_ica_uppsala is probably produced by poor blog software. http://blogg.passagen.se/matfrisk/entry/ra...ran_ica_uppsala doesn't exist.

I'm totally confused; I haven't heard anything about this. Please enlighten me!
Brian Chandler
post Apr 24 2008, 01:54 PM
Post #2


Jocular coder
********

Group: Members
Posts: 2,298
Joined: 31-August 06
Member No.: 43



It's actually a very bad idea. Of course it sounds wonderful: why should non-Americans not be able to see the name they are looking for directly in the address? But one of the principles of the web's design was that an address is something you can write on a table napkin and take home. Allowing more or less any Unicode character in an address breaks this in a serious way. Swedish letters are no problem, of course, because two of them look the same as German umlauts and look different from the plain letters to (almost) everybody, while the other has a clear ring to it. But there are other characters, such as the Russian (Cyrillic) letter a, or the (actually quite ridiculous) East Asian full-width letter a, and probably more, which are different characters but completely indistinguishable on the screen, let alone on a table napkin. (I think there has already been a PayPal scam with paypal.com, where one of the a's isn't what it looks like.)
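
The look-alike problem described above can be made visible in a few lines. This is only a minimal sketch: the helper names and the spoofed hostname are made up for illustration, and the characters are written as escapes so the difference survives copy and paste.

```python
import unicodedata

# Three characters that can all render as "a". Escapes are used on purpose,
# because the glyphs themselves are hard to tell apart.
LOOKALIKES = {
    "latin": "\u0061",
    "cyrillic": "\u0430",
    "fullwidth": "\uff41",
}

def describe(ch: str) -> str:
    """Return the code point and the official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def same_after_nfkc(a: str, b: str) -> bool:
    """Compare two strings after NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

for label, ch in LOOKALIKES.items():
    print(label, describe(ch))

# Hypothetical spoof: "paypal.com" with a Cyrillic letter as the first "a".
spoof = "p\u0430ypal.com"
print(spoof == "paypal.com")  # the strings differ even if they look alike
```

Compatibility normalization (NFKC) folds the full-width letter into the ordinary Latin one, but it leaves the Cyrillic letter alone, so normalization by itself would not catch the PayPal-style case.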

pandy
post Apr 24 2008, 02:20 PM
Post #3


Don't like donuts. Don't do MySpace.
********

Group: WDG Moderators
Posts: 17,967
Joined: 9-August 06
Member No.: 6



Uhm. I don't want to do it. I want to know how it works. I didn't even know it was possible.
Brian Chandler
post Apr 24 2008, 02:42 PM
Post #4


Jocular coder
********

Group: Members
Posts: 2,298
Joined: 31-August 06
Member No.: 43



http://en.wikipedia.org/wiki/Internationalized_domain_name

Frederiek
post Apr 24 2008, 03:07 PM
Post #5


Programming Fanatic
********

Group: Members
Posts: 5,146
Joined: 23-August 06
From: Europe
Member No.: 9



FWIW, I read somewhere that Sweden had to wait until 2003 to be able to use these accented characters in URLs. No wonder, according to the wiki, as IDNA was defined in 2003.
But I also read that search engines are not happy with these accented characters in URLs and simply resume them to their corresponding non-accented character.
Christian J
post Apr 24 2008, 05:40 PM
Post #6


.
********

Group: WDG Moderators
Posts: 7,867
Joined: 10-August 06
Member No.: 7



QUOTE(pandy @ Apr 24 2008, 07:08 PM) *

How does this work?

Note the "ä" and "å".
http://blogg.passagen.se/matfrisk/entry/rä...rån_ica_uppsala

I remember ads about registering domain names with ÅÄÖ a few years ago. At the time browsers needed a plugin to be able to use them.

Just the other day I saw a link with an ÅÄÖ file name, too. Coincidence?

BTW when I open the link above in Firefox 2.0.0.14 it's encoded to http://blogg.passagen.se/matfrisk/entry/r%...E5n_ica_uppsala , which is a blank page (but not a 404).
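
The %E5 in that encoded address is what you would expect if the "å" was sent as a single ISO-8859-1 byte. A UTF-8 encoder would produce two bytes instead. The sketch below only shows the two encodings side by side; which of them the passagen.se server actually expects is not known from this thread, and the helper name is made up.

```python
from urllib.parse import quote

def percent_encode(segment: str, charset: str) -> str:
    """Percent-encode one path segment using the given character set."""
    return quote(segment, safe="", encoding=charset)

word = "r\u00e5n"  # "rån", the last accented word in the URL above

latin1_form = percent_encode(word, "iso-8859-1")
utf8_form = percent_encode(word, "utf-8")

print(latin1_form)  # r%E5n
print(utf8_form)    # r%C3%A5n
```

If a browser picks one encoding when a link is clicked (for example the encoding of the page containing the link) and another when the address is pasted, the server receives two different byte strings for what looks like the same URL. That may be why one route gives the page and the other a blank one.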
pandy
post Apr 24 2008, 06:00 PM
Post #7


Don't like donuts. Don't do MySpace.
********

Group: WDG Moderators
Posts: 17,967
Joined: 9-August 06
Member No.: 6



Thanks. I remember reading about IDN now, long ago. But I didn't know it was ever implemented.

QUOTE
BTW when I open the link above in Firefox 2.0.0.14 it's encoded to http://blogg.passagen.se/matfrisk/entry/r%...E5n_ica_uppsala , which is a blank page (but not a 404).

Funny. That's what happens when I paste the URL in, but clicking works fine. A setting maybe?

Anyway, IDN doesn't explain why the URL to passagen.se works (sometimes). Is that purely a browser trick then?
Brian Chandler
post Apr 24 2008, 09:29 PM
Post #8


Jocular coder
********

Group: Members
Posts: 2,298
Joined: 31-August 06
Member No.: 43



QUOTE(Frederiek @ Apr 25 2008, 05:07 AM) *

FWIW, I read somewhere that Sweden had to wait until 2003 to be able to use these accented characters in URLs. No wonder, according to the wiki, as IDNA was defined in 2003.
But I also read that search engines are not happy with these accented characters in URLs and simply resume them to their corresponding non-accented character.


Sorry, but that doesn't make sense. (I mean, 'resume' is the wrong word in English, but whatever you change it to it must just be wrong.) There is a punycode (I think that's what it's called) representation of (say) öre.com in url-permitted characters - suppose it's x--_oe_re.com. Then this is a different address from anything else, including ore.com (which _is_ the same as Ore.com). So either the search engine includes links to it or it doesn't, but it's not going to work if it includes links to somewhere completely different.
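
The real ASCII form can be produced with the "idna" codec in Python's standard library (it implements the 2003 version of IDNA). This is a minimal sketch; the two helper names are made up for illustration.

```python
def to_ascii(hostname: str) -> str:
    """Convert a Unicode hostname to the ASCII form that is used in DNS."""
    return hostname.encode("idna").decode("ascii")

def to_unicode(hostname: str) -> str:
    """Convert an ASCII (xn--) hostname back to its Unicode form."""
    return hostname.encode("ascii").decode("idna")

accented = to_ascii("\u00f6re.com")  # "öre.com"
plain = to_ascii("ore.com")

print(accented)              # an "xn--" label followed by ".com"
print(accented == plain)     # False: two unrelated names
print(to_unicode(accented))  # back to the accented spelling

# DNS compares ASCII letters case-insensitively, so these name the same host.
print(to_ascii("Ore.com").lower() == plain)
```

The accented name and the plain one end up as unrelated ASCII strings, which is the point made above: a link to one cannot be silently swapped for the other.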

You might mean that search engines treat "accented" letters the same as unaccented ones in the search terms - but there are problems with this approach. For a start, some languages don't consist of "accented" variations of the 26-letter Latin alphabet.
Brian Chandler
post Apr 24 2008, 09:33 PM
Post #9


Jocular coder
********

Group: Members
Posts: 2,298
Joined: 31-August 06
Member No.: 43



QUOTE(pandy @ Apr 25 2008, 08:00 AM) *

Thanks. I remember reading about IDN now, long ago. But I didn't know it was ever implemented.

QUOTE
BTW when I open the link above in Firefox 2.0.0.14 it's encoded to http://blogg.passagen.se/matfrisk/entry/r%...E5n_ica_uppsala , which is a blank page (but not a 404).

Funny. That's what happens when I paste the URL in, but clicking works fine. A setting maybe?

Anyway, IDN doesn't explain why the URL to passagen.se works (sometimes). Is that purely a browser trick then?


Presumably the site owners have registered all the obvious variants. (I don't know if Swedish uses the 'oe' 'ae' convention, but in German for example you might have muller.de, mueller.de and m-can't-find-a-u-umlaut-to-copy-ller.de)
pandy
post Apr 25 2008, 09:26 AM
Post #10


Don't like donuts. Don't do MySpace.
********

Group: WDG Moderators
Posts: 17,967
Joined: 9-August 06
Member No.: 6



But the fishy characters aren't in the domain name.

No, we don't do that. We use a for ä and å and o for ö.
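
That convention (drop the dots and the ring) can be sketched by decomposing each letter and discarding the combining marks. The function name is made up, and this only derives a plain-ASCII spelling; the result is still a separate name that has to be registered or created on its own.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Replace accented letters with their base letters (ä, å -> a; ö -> o)."""
    # NFD splits "ö" into "o" plus a combining diaeresis, and so on.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_diacritics("www.bogr\u00f6nt.se"))  # www.bogront.se
```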
Brian Chandler
post Apr 25 2008, 12:21 PM
Post #11


Jocular coder
********

Group: Members
Posts: 2,298
Joined: 31-August 06
Member No.: 43



QUOTE(pandy @ Apr 25 2008, 11:26 PM) *

But the fishy characters aren't in the domain name.



Oh, well for filenames you have always been able to use anything you like. (Sort of)

I believe there is/was a recommendation not to, but lots of Japanese sites would have filenames in Shift-JIS. As long as the browser sends the right string of numbers (because, as we all know, all information is passed around as binary bit patterns, isn't it?), you would get the linked page. But of course in general this only works if the page including the link is encoded in the same character set (Shift-JIS).
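
Here is a small sketch of that "right string of numbers" point, using a made-up Japanese file name. The request only round-trips when both ends agree on the character set.

```python
from urllib.parse import quote, unquote

# Hypothetical file name: two kanji followed by ".html".
filename = "\u65e5\u672c.html"

# What a browser would send if it encoded the link as Shift-JIS.
sent = quote(filename, encoding="shift_jis")

# A server that assumes Shift-JIS recovers the original name.
decoded_ok = unquote(sent, encoding="shift_jis")

# A server that assumes UTF-8 gets replacement characters instead.
decoded_wrong = unquote(sent, encoding="utf-8", errors="replace")

print(sent)
print(decoded_ok == filename)     # True
print(decoded_wrong == filename)  # False
```

The same name encoded as UTF-8 gives a different percent-encoded string, so a link copied into a page in another encoding may point at a file that does not exist under those bytes.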

QUOTE

No, we don't do that. We use a for ä and å and o for ö.


Aha! (Or should I say Åhå!)
pandy
post Apr 25 2008, 12:35 PM
Post #12


Don't like donuts. Don't do MySpace.
********

Group: WDG Moderators
Posts: 17,967
Joined: 9-August 06
Member No.: 6



That reminds me of an old linguistic joke. Would you believe this is really a sentence, even if in a local dialect?

I åa ä e öa å i öa ä e å.
