Extended characters in URLs and links |
pandy |
Apr 24 2008, 12:08 PM
Post
#1
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,731 Joined: 9-August 06 Member No.: 6 |
How does this work?
Note the "ä" and "å". http://blogg.passagen.se/matfrisk/entry/rä...rån_ica_uppsala I understand if the browser does some internal conversion, but is the server involved in this too?
Here's another one I found. http://www.bogrönt.se
The first one works in IE6, Opera and FF if you click a link. If you paste the URL into the address bar, it doesn't work in FF but does in IE. The second one doesn't work at all in IE6 but does in the other browsers I tried (newer browsers, I should add).
One difference seems to be that in the case of the domain, the "undotted" version is also covered, http://www.bogront.se , while http://blogg.passagen.se/matfrisk/entry/rä...rån_ica_uppsala is probably produced by poor blog software. http://blogg.passagen.se/matfrisk/entry/ra...ran_ica_uppsala doesn't exist.
I'm totally confused; I haven't heard anything about this. Please enlighten me! |
Brian Chandler |
Apr 24 2008, 01:54 PM
Post
#2
|
Jocular coder Group: Members Posts: 2,460 Joined: 31-August 06 Member No.: 43 |
It's actually a very bad idea. Of course it sounds wonderful: why should non-Americans not be able to see the name they are looking for directly in the address? But one of the principles of the web's design was that an address is something you can write on a table napkin and take home. Allowing more or less any Unicode character in an address breaks this in a serious way. Swedish letters are no problem, of course, because two of them look the same as German umlauts and look distinct to (almost) everybody, while the other has a clear ring to it. But there are other characters, such as the Russian letter a, or the (actually quite ridiculous) Oriental double-width letter a, and probably more, which are different characters but completely indistinguishable on the screen, let alone on a table napkin. (I think there has already been a PayPal scam with paypal.com, where one of the a's isn't what it looks like.)
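A minimal Python sketch of the look-alike problem, for anyone who wants to see it concretely. The `spoofed` host name is made up for illustration (it is not taken from any real incident); it swaps the first Latin "a" for the Cyrillic letter U+0430. The sketch relies on Python's built-in `idna` codec, which implements the 2003 IDNA rules.

```python
import unicodedata

LATIN_A = "a"            # U+0061
CYRILLIC_A = "\u0430"    # U+0430, drawn like a Latin "a" in most fonts
FULLWIDTH_A = "\uff41"   # U+FF41, the double-width form

genuine = "paypal.com"
# Hypothetical spoof, for illustration only: first "a" is the Cyrillic letter.
spoofed = "p" + CYRILLIC_A + "ypal.com"


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in (LATIN_A, CYRILLIC_A, FULLWIDTH_A):
        print(describe(ch))
    # The two names may look the same on screen, but they are different strings.
    print(genuine == spoofed)        # False
    # The ASCII form sent to DNS differs too: the spoof gets an "xn--" label.
    print(genuine.encode("idna"))
    print(spoofed.encode("idna"))
```

The point is only that the two names compare unequal and resolve separately, even though nothing on the napkin would tell them apart.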
|
pandy |
Apr 24 2008, 02:20 PM
Post
#3
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,731 Joined: 9-August 06 Member No.: 6 |
Uhm. I don't want to do it. I want to know how it works. I didn't even know it was possible.
|
Brian Chandler |
Apr 24 2008, 02:42 PM
Post
#4
|
Jocular coder Group: Members Posts: 2,460 Joined: 31-August 06 Member No.: 43 |
|
Frederiek |
Apr 24 2008, 03:07 PM
Post
#5
|
Programming Fanatic Group: Members Posts: 5,146 Joined: 23-August 06 From: Europe Member No.: 9 |
FWIW, I read somewhere that Sweden had to wait until 2003 to be able to use these accented characters in URLs. No wonder, according to the wiki, as the IDNA was defined in 2003.
But I also read that search engines are not happy with these accented characters in URLs and simply resume them to their corresponding non-accented character. |
Christian J |
Apr 24 2008, 05:40 PM
Post
#6
|
. Group: WDG Moderators Posts: 9,658 Joined: 10-August 06 Member No.: 7 |
QUOTE How does this work? Note the "ä" and "å". http://blogg.passagen.se/matfrisk/entry/rä...rån_ica_uppsala
I remember ads about registering domain names with ÅÄÖ a few years ago. At the time, browsers needed a plugin to be able to use them.
Just the other day I saw a link with an ÅÄÖ file name, too. Coincidence?
BTW, when I open the link above in Firefox 2.0.0.14 it's encoded to http://blogg.passagen.se/matfrisk/entry/r%...E5n_ica_uppsala , which is a blank page (but not a 404). |
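A note on the `%E5` in that encoded URL: `E5` is the single byte for "å" in ISO-8859-1 (Latin-1), whereas UTF-8 uses two bytes for the same letter. The Python sketch below only shows that difference; the fragment `rån` is used as an illustrative path segment, and whether a mismatch of this kind is what produces the blank page on that particular server is an assumption, not something confirmed in the thread.

```python
from urllib.parse import quote, unquote

segment = "rån"  # illustrative path fragment containing "å"

# The same characters, percent-encoded from two different byte encodings.
as_latin1 = quote(segment, encoding="latin-1")
as_utf8 = quote(segment, encoding="utf-8")

# What a server sees if it assumes UTF-8 but receives the Latin-1 form:
# the lone 0xE5 byte is not valid UTF-8, so it decodes to U+FFFD.
misread = unquote(as_latin1, encoding="utf-8")

if __name__ == "__main__":
    print(as_latin1)      # r%E5n
    print(as_utf8)        # r%C3%A5n
    print(repr(misread))
```

So the browser and the server have to agree on the byte encoding behind the percent signs; if they do not, the server looks up a name that does not exist.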
pandy |
Apr 24 2008, 06:00 PM
Post
#7
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,731 Joined: 9-August 06 Member No.: 6 |
Thanks. I remember reading about IDN now, long ago. But I didn't know it was ever implemented.
QUOTE BTW when I open the link above in Firefox 2.0.0.14 it's encoded to http://blogg.passagen.se/matfrisk/entry/r%...E5n_ica_uppsala , which is a blank page (but not a 404).
Funny. That's what happens when I paste the URL in, but clicking works fine. A setting maybe?
Anyway, IDN doesn't explain why the URL to passagen.se works (sometimes). Is that purely a browser trick then? |
Brian Chandler |
Apr 24 2008, 09:29 PM
Post
#8
|
Jocular coder Group: Members Posts: 2,460 Joined: 31-August 06 Member No.: 43 |
QUOTE FWIW, I read somewhere that Sweden had to wait until 2003 to be able to use these accented characters in URLs. No wonder, according to the wiki, as the IDNA was defined in 2003. But I also read that search engines are not happy with these accented characters in URLs and simply resume them to their corresponding non-accented character.
Sorry, but that doesn't make sense. (I mean, 'resume' is the wrong word in English, but whatever you change it to, it must just be wrong.) There is a punycode (I think that's what it's called) representation of (say) öre.com in URL-permitted characters; suppose it's x--_oe_re.com. Then this is a different address from anything else, including ore.com (which _is_ the same as Ore.com). So either the search engine includes links to it or it doesn't, but it's not going to work if it includes links to somewhere completely different.
You might mean that search engines treat "accented" letters the same as unaccented ones in the search terms, but there are problems with this approach. For a start, some languages don't consist of "accented" variations of the 26-letter Latin alphabet. |
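For the curious, the ASCII form of a name like öre.com can be checked with Python's built-in `idna` codec, which implements the 2003 IDNA rules and uses Punycode with an `xn--` prefix. This is a small sketch of the point made above (the encoded name is a different address from ore.com), not a description of what any search engine does.

```python
host = "öre.com"

# ASCII-compatible form of the internationalized name, as bytes.
ascii_host = host.encode("idna")

# The unaccented name is plain ASCII already and passes through unchanged.
plain_host = "ore.com".encode("idna")

# Decoding the ASCII form gives back the original Unicode name.
decoded = ascii_host.decode("idna")

if __name__ == "__main__":
    print(ascii_host)                 # an "xn--" label followed by .com
    print(plain_host)                 # b'ore.com'
    print(ascii_host == plain_host)   # False
    print(decoded)
```

Nothing in the encoding maps ö to o, so a link that silently dropped the dots would point at a different site.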
Brian Chandler |
Apr 24 2008, 09:33 PM
Post
#9
|
Jocular coder Group: Members Posts: 2,460 Joined: 31-August 06 Member No.: 43 |
QUOTE Thanks. I remember reading about IDN now, long ago. But I didn't know it was ever implemented. QUOTE BTW when I open the link above in Firefox 2.0.0.14 it's encoded to http://blogg.passagen.se/matfrisk/entry/r%...E5n_ica_uppsala , which is a blank page (but not a 404). Funny. That's what happens when I paste the URL in, but clicking works fine. A setting maybe? Anyway, IDN doesn't explain why the URL to passagen.se works (sometimes). Is that purely a browser trick then?
Presumably the site owners have registered all the obvious variants. (I don't know if Swedish uses the 'oe' 'ae' convention, but in German, for example, you might have muller.de, mueller.de and m-can't-find-a-u-umlaut-to-copy-ller.de) |
pandy |
Apr 25 2008, 09:26 AM
Post
#10
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,731 Joined: 9-August 06 Member No.: 6 |
But the fishy characters aren't in the domain name.
No, we don't do that. We use a for ä and å and o for ö. |
Brian Chandler |
Apr 25 2008, 12:21 PM
Post
#11
|
Jocular coder Group: Members Posts: 2,460 Joined: 31-August 06 Member No.: 43 |
QUOTE But the fishy characters aren't in the domain name.
Oh, well, for filenames you have always been able to use anything you like. (Sort of.) I believe there is/was a recommendation not to, but lots of Japanese sites would have filenames in Shift-JIS, and as long as the browser sends the right string of numbers (because, as we all know, all information is passed around as binary bit patterns, isn't it), then you would get the link. But of course in general this only works if the page including the link is encoded in the same character set (Shift-JIS).
QUOTE No, we don't do that. We use a for ä and å and o for ö.
Aha! (Or should I say Åhå!) |
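The "right string of numbers" point can be sketched in Python: the same file name turns into different bytes, and therefore a different percent-encoded path, depending on the character set used. The file name below is hypothetical, and the sketch assumes the server simply matches the raw bytes it receives against the name on disk.

```python
from urllib.parse import quote, unquote

filename = "日本.html"  # hypothetical file name, for illustration only

# Percent-encoded path as built from a Shift-JIS page versus a UTF-8 page.
from_sjis_page = quote(filename, encoding="shift_jis")
from_utf8_page = quote(filename, encoding="utf-8")

# Decoding with the matching character set recovers the name...
matched = unquote(from_sjis_page, encoding="shift_jis")
# ...while decoding Shift-JIS bytes as UTF-8 does not.
mismatched = unquote(from_sjis_page, encoding="utf-8")

if __name__ == "__main__":
    print(from_sjis_page)
    print(from_utf8_page)
    print(matched == filename)
    print(mismatched == filename)
```

This is why such links generally only work when the linking page and the stored file name use the same encoding.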
pandy |
Apr 25 2008, 12:35 PM
Post
#12
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,731 Joined: 9-August 06 Member No.: 6 |
That reminds me of an old linguistic joke. Would you believe this is really a sentence, even if in a local dialect?
I åa ä e öa å i öa ä e å. |