Incorrect sorting of the Scandinavian alphabets |
Christian J |
Oct 23 2006, 06:11 PM
Post
#1
|
. Group: WDG Moderators Posts: 9,653 Joined: 10-August 06 Member No.: 7 |
PHP is not the only language that sorts the Swedish letters å, ä and ö incorrectly; I just noticed that JavaScript does the same, and it also gets Danish and Norwegian wrong. The arrays below are in the correct order for each language:
CODE
window.onload = function() {
    var se = ['å', 'ä', 'ö']; // Swedish
    var dk = ['æ', 'ø', 'å']; // Danish, apparently same as Norwegian
    alert(se.sort());
    alert(dk.sort());
}
Note that Danish and Norwegian use a different order than Swedish. But in the sorted JavaScript alerts the Swedish letters are incorrectly sorted as "ä,å,ö", while the Danish and Norwegian letters are (again incorrectly) sorted as "å,æ,ø". The same error appears in IE, Opera and Firefox. At least Opera's Norwegian creators should know their own alphabet, so am I correct in assuming that all three browser vendors deliberately follow some flawed convention?
|
Darin McGrew |
Oct 23 2006, 06:55 PM
Post
#2
|
WDG Member Group: Root Admin Posts: 8,365 Joined: 4-August 06 From: Mountain View, CA Member No.: 3 |
Does PHP allow you to specify the locale? The default locale is often "C", which sorts characters according to their numeric encoding. Other locales should sort characters as appropriate for that locale.
|
Liam Quinn |
Oct 23 2006, 08:03 PM
Post
#3
|
WDG Founder Group: Root Admin Posts: 52 Joined: 2-August 06 From: Canada Member No.: 1 |
The default sort algorithm in JavaScript is based purely on the Unicode code point. If you want a locale-sensitive sort order, you can use this:
CODE
function localeSort(string1, string2) {
    return string1.toString().localeCompare(string2.toString());
}
var se = ['å', 'ä', 'ö']; // Swedish
var dk = ['æ', 'ø', 'å']; // Danish, apparently same as Norwegian
alert(se.sort(localeSort));
alert(dk.sort(localeSort));
That should use the locale configured on the user's system. If you want to use a specific locale regardless of the user's locale, I think you're stuck with writing the code for the locale-specific rules yourself in the function you pass to sort().
|
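A minimal sketch of what such hand-written, locale-specific rules might look like, assuming each alphabet can be written out as one string of lowercase letters. The helper name `makeAlphabetComparator` and the sample names are hypothetical, not from the thread, and real collation rules (for example, treating "aa" as "å" in Danish) are not covered.

```javascript
// Hypothetical helper: builds a sort() comparator from an explicit alphabet.
// Letters are compared case-insensitively; characters that are not in the
// alphabet sort after all listed letters, ordered by code point.
function makeAlphabetComparator(alphabet) {
  const letters = Array.from(alphabet.normalize('NFC'));
  const rank = new Map(letters.map((ch, i) => [ch, i]));

  function rankOf(ch) {
    const lower = ch.toLowerCase();
    return rank.has(lower) ? rank.get(lower) : letters.length + ch.codePointAt(0);
  }

  return function compare(a, b) {
    // Normalize so that precomposed and decomposed input compare alike.
    const as = Array.from(String(a).normalize('NFC'));
    const bs = Array.from(String(b).normalize('NFC'));
    const n = Math.min(as.length, bs.length);
    for (let i = 0; i < n; i++) {
      const diff = rankOf(as[i]) - rankOf(bs[i]);
      if (diff !== 0) return diff;
    }
    return as.length - bs.length; // a shorter string sorts before its extensions
  };
}

const swedish = makeAlphabetComparator('abcdefghijklmnopqrstuvwxyzåäö');
const danish = makeAlphabetComparator('abcdefghijklmnopqrstuvwxyzæøå');

const se = ['ö', 'ä', 'å'].sort(swedish); // å, ä, ö
const dk = ['å', 'ø', 'æ'].sort(danish);  // æ, ø, å
const sortedNames = ['Örjan', 'Åsa', 'Zlatan', 'Anna'].sort(swedish);
// Anna, Zlatan, Åsa, Örjan
```

Because the order is spelled out in the script, the result does not depend on the visitor's system locale. Engines released after this thread also accept a locale argument to `localeCompare` (such as `'sv'`), which may be simpler where the engine ships that locale's data.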
Christian J |
Oct 24 2006, 05:48 AM
Post
#4
|
. Group: WDG Moderators Posts: 9,653 Joined: 10-August 06 Member No.: 7 |
QUOTE
Does PHP allow you to specify the locale?

It does, but it seems to be buggy. The entry on http://bugs.php.net/bug.php?id=9671 (10 Mar 2001 1:36pm) suggests something like this, which still sorts in the wrong order (PHP 4.3.3):
CODE
<?php
// Danish letters
$dk = array('ø', 'æ', 'å');
setlocale(LC_COLLATE, "dk_DK");
usort($dk, "strcoll");
print_r($dk); // returns "Array ( [0] => å [1] => æ [2] => ø )"
echo '<br>';

// Norwegian letters
$no = array('ø', 'æ', 'å');
setlocale(LC_COLLATE, "no_NO");
usort($no, "strcoll");
print_r($no); // returns "Array ( [0] => å [1] => æ [2] => ø )"
echo '<br>';

// Swedish letters
$se = array('å', 'ä', 'ö');
setlocale(LC_COLLATE, "sv_SV");
usort($se, "strcoll");
print_r($se); // returns "Array ( [0] => ä [1] => å [2] => ö )"
?>
|
Christian J |
Oct 24 2006, 07:46 AM
Post
#5
|
. Group: WDG Moderators Posts: 9,653 Joined: 10-August 06 Member No.: 7 |
QUOTE
The default sort algorithm in JavaScript is based purely on the Unicode code point.

According to Wikipedia the first 256 code points are identical to ISO 8859-1, and there you can indeed find "ä" before "å".

QUOTE
If you want a locale-sensitive sort order, you can use this:
CODE
function localeSort(string1, string2) {
    return string1.toString().localeCompare(string2.toString());
}
var se = ['å', 'ä', 'ö']; // Swedish
var dk = ['æ', 'ø', 'å']; // Danish, apparently same as Norwegian
alert(se.sort(localeSort));
alert(dk.sort(localeSort));
That should use the locale configured on the user's system.

Do you mean the user's OS or browser language settings? On my Swedish Win XP it seems to work in IE6 and Firefox, but Opera sorts like before (despite claiming to support the localeCompare() method since Opera 7).

QUOTE
If you want to use a specific locale regardless of the user's locale...

Regarding usability: what if a non-Swedish user reads a Swedish web page? Wouldn't they (as I believe) expect letters to be sorted according to their own habits? E.g., wouldn't a typical English-speaking user expect "å" and "ä" to be treated as "a", and "ö" to be treated as "o"?
|
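For comparison, the "English habit" described above (treating "å" and "ä" as "a", and "ö" as "o") could be sketched as follows. This assumes the engine supports `String.prototype.normalize`, which postdates the browsers discussed here; `foldAccents`, `compareFolded` and the sample names are hypothetical.

```javascript
// Hypothetical helper: decomposes each letter into base letter + combining
// marks (NFD), then drops the marks, so "å"/"ä" become "a" and "ö" becomes "o".
function foldAccents(text) {
  return String(text).normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Comparator that ignores both diacritics and letter case.
function compareFolded(a, b) {
  const fa = foldAccents(a).toLowerCase();
  const fb = foldAccents(b).toLowerCase();
  if (fa < fb) return -1;
  if (fa > fb) return 1;
  return 0;
}

const names = ['Örjan', 'Nils', 'Åsa', 'Bo', 'Olof'];
const foldedOrder = names.slice().sort(compareFolded);
// Åsa, Bo, Nils, Olof, Örjan
```

This only works for letters that Unicode decomposes into a base letter plus a mark. The Danish and Norwegian "æ" and "ø" have no such decomposition, so they would need an explicit mapping of their own.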
Darin McGrew |
Oct 24 2006, 01:49 PM
Post
#6
|
WDG Member Group: Root Admin Posts: 8,365 Joined: 4-August 06 From: Mountain View, CA Member No.: 3 |
QUOTE
wouldn't a typical English-speaking user expect "å" and "ä" to be treated as "a", and "ö" to be treated as "o"?

I can't say whether I'm "a typical English-speaking user", but I would expect a Swedish page to sort Swedish names according to the normal Swedish rules for alphabetizing names. I would expect Danish and Norwegian pages to use the Danish and Norwegian alphabetizing rules (respectively). And so on.
|
Christian J |
Oct 24 2006, 02:21 PM
Post
#7
|
. Group: WDG Moderators Posts: 9,653 Joined: 10-August 06 Member No.: 7 |
QUOTE
wouldn't a typical English-speaking user expect "å" and "ä" to be treated as "a", and "ö" to be treated as "o"?
I can't say whether I'm "a typical English-speaking user", but I would expect a Swedish page to sort Swedish names according to the normal Swedish rules for alphabetizing names. I would expect Danish and Norwegian pages to use the Danish and Norwegian alphabetizing rules (respectively). And so on.

But what if you (the English-speaking user) don't know the Swedish rules? Suppose you're looking for a name like "Åsa" or "Örjan" in a very long alphabetically sorted list. Where in the list would you begin to look?
|
Darin McGrew |
Oct 24 2006, 02:36 PM
Post
#8
|
WDG Member Group: Root Admin Posts: 8,365 Joined: 4-August 06 From: Mountain View, CA Member No.: 3 |
QUOTE
But what if you (the English-speaking user) don't know the Swedish rules? Suppose you're looking for a name like "Åsa" or "Örjan" in a very long alphabetically sorted list. Where in the list would you begin to look?

Here's where I know I'm atypical: I'd look for an index of some sort. If the index listed ABCDEFGHIJKLMNOPQRSTUVWXYZ, then I'd look under A or O. But if the index listed ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ, then I'd look under Å or Ö.
|
Liam Quinn |
Oct 24 2006, 07:48 PM
Post
#9
|
WDG Founder Group: Root Admin Posts: 52 Joined: 2-August 06 From: Canada Member No.: 1 |
QUOTE
Does PHP allow you to specify the locale?
It does, but it seems to be buggy. The entry on http://bugs.php.net/bug.php?id=9671 (10 Mar 2001 1:36pm) suggests something like this, which still sorts in the wrong order (PHP 4.3.3):
CODE
<?php
// Danish letters
$dk = array('ø', 'æ', 'å');
setlocale(LC_COLLATE, "dk_DK");
usort($dk, "strcoll");
print_r($dk); // returns "Array ( [0] => å [1] => æ [2] => ø )"
echo '<br>';

// Norwegian letters
$no = array('ø', 'æ', 'å');
setlocale(LC_COLLATE, "no_NO");
usort($no, "strcoll");
print_r($no); // returns "Array ( [0] => å [1] => æ [2] => ø )"
echo '<br>';

// Swedish letters
$se = array('å', 'ä', 'ö');
setlocale(LC_COLLATE, "sv_SV");
usort($se, "strcoll");
print_r($se); // returns "Array ( [0] => ä [1] => å [2] => ö )"
?>

The user comments in http://ca3.php.net/setlocale may help you determine whether your system has the locales installed. One problem is that you have the Danish and Swedish locale codes wrong: they should be "da_DK" and "sv_SE" (language_COUNTRY).
|
Liam Quinn |
Oct 24 2006, 08:05 PM
Post
#10
|
WDG Founder Group: Root Admin Posts: 52 Joined: 2-August 06 From: Canada Member No.: 1 |
QUOTE
If you want a locale-sensitive sort order, you can use this:
CODE
function localeSort(string1, string2) {
    return string1.toString().localeCompare(string2.toString());
}
var se = ['å', 'ä', 'ö']; // Swedish
var dk = ['æ', 'ø', 'å']; // Danish, apparently same as Norwegian
alert(se.sort(localeSort));
alert(dk.sort(localeSort));
That should use the locale configured on the user's system.
Do you mean the user's OS or browser language settings?

I think that's up to the browser implementation.

QUOTE
Regarding usability: what if a non-Swedish user reads a Swedish web page? Wouldn't they (as I believe) expect letters to be sorted according to their own habits? E.g., wouldn't a typical English-speaking user expect "å" and "ä" to be treated as "a", and "ö" to be treated as "o"?

If the page is in Swedish, I think you should assume that the reader knows Swedish and that Swedish sorting rules are appropriate.
|
Christian J |
Oct 25 2006, 07:04 AM
Post
#11
|
. Group: WDG Moderators Posts: 9,653 Joined: 10-August 06 Member No.: 7 |
QUOTE
One problem is that you have the Danish and Swedish locale codes wrong: They should be "da_DK" and "sv_SE" (language_COUNTRY).

The locale codes indeed seem to be the problem. As http://ca3.php.net/setlocale says, different systems have different naming schemes for locales, but you can apparently pass several candidate codes. The following works both on my Apache/Windows test server and on my web host's FreeBSD:
CODE
setlocale(LC_COLLATE, "sve", "sv_SE.ISO8859-1");
But while "nor" and "dan" work for Norwegian and Danish on my Apache/Windows, I haven't been able to make any code work for them on my web host yet. E.g., even though the following echoes "da_DK.ISO8859-1" as the preferred locale, it doesn't sort properly:
CODE
<?php
// Danish letters
$dk = array('a', 'b', 'o', 'æ', 'ø', 'å');
setlocale(LC_COLLATE, "dan", "da_DK.ISO8859-1");
echo setlocale(LC_COLLATE, "dan", "da_DK.ISO8859-1").'<br>'; // "da_DK.ISO8859-1"
usort($dk, "strcoll");
print_r($dk); // "Array ( [0] => a [1] => å [2] => æ [3] => b [4] => o [5] => ø )"
?>
I should add that Norwegian and Danish are not an urgent problem; I'm mostly curious.
|
pandy |
Oct 25 2006, 06:02 PM
Post
#12
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
QUOTE
According to wikipedia the first 256 code points are identical to ISO 8859-1, and there you can indeed find "ä" before "å".

It's only ASCII characters that are encoded the same in Unicode, isn't it? ÅÄÖ are 0197, 0196 and 0214 in Unicode so indeed Ä comes first.
|
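The numbers can be checked directly. The sketch below assumes precomposed characters (written as escapes to avoid any ambiguity) and an engine that supports `codePointAt`; the helper name `codePoints` is hypothetical. The decimal values quoted above are those of the uppercase letters, and they coincide with the ISO 8859-1 byte values because Unicode's first 256 code points match that encoding (the UTF-8 byte sequences differ, which may be the source of the confusion).

```javascript
// Hypothetical helper: returns the numeric code point of each single letter.
function codePoints(letters) {
  return letters.map(function (ch) {
    return ch.codePointAt(0);
  });
}

// Lowercase å, ä, ö, æ, ø
const lower = codePoints(['\u00e5', '\u00e4', '\u00f6', '\u00e6', '\u00f8']);
// 229, 228, 246, 230, 248

// Uppercase Å, Ä, Ö
const upper = codePoints(['\u00c5', '\u00c4', '\u00d6']);
// 197, 196, 214

// With no comparator, sort() orders strings by their numeric code unit values,
// which for these letters equals the code point order shown above.
const defaultSwedish = ['\u00e5', '\u00e4', '\u00f6'].sort(); // ä, å, ö
const defaultDanish = ['\u00e6', '\u00f8', '\u00e5'].sort();  // å, æ, ø
```

This reproduces both of the orders reported at the top of the thread: "ä" (228) precedes "å" (229), and "å" (229) precedes "æ" (230) and "ø" (248).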
Brian Chandler |
Oct 26 2006, 12:32 AM
Post
#13
|
Jocular coder Group: Members Posts: 2,460 Joined: 31-August 06 Member No.: 43 |
QUOTE
One problem is that you have the Danish and Swedish locale codes wrong: They should be "da_DK" and "sv_SE" (language_COUNTRY).
The locale codes indeed seem to be the problem. Like http://ca3.php.net/setlocale says, different systems have different naming schemes for locales, but you can apparently use an array of codes. The following works both on my Apache/Windows test server and on my web host's FreeBSD:
CODE
setlocale(LC_COLLATE, "sve", "sv_SE.ISO8859-1");
But while "nor" and "dan" work for Norwegian and Danish on my Apache/Windows, I haven't been able to make any code work for them on my web host yet. E.g., even though the following echoes "da_DK.ISO8859-1" as the preferred locale, it doesn't sort properly:
CODE
<?php
// Danish letters
$dk = array('a', 'b', 'o', 'æ', 'ø', 'å');
setlocale(LC_COLLATE, "dan", "da_DK.ISO8859-1");
echo setlocale(LC_COLLATE, "dan", "da_DK.ISO8859-1").'<br>'; // "da_DK.ISO8859-1"
usort($dk, "strcoll");
print_r($dk); // "Array ( [0] => a [1] => å [2] => æ [3] => b [4] => o [5] => ø )"
?>
I should add that Norwegian and Danish is not an urgent problem, I'm mostly curious.

I'm still a bit puzzled about this. At what point does a web server actually _sort_ anything?
|