The Web Design Group

... Making the Web accessible to all.

> Incorrect sorting of the Scandinavian alphabets
Christian J
post Oct 23 2006, 06:11 PM
Post #1



Not only does PHP sort the Swedish letters å, ä and ö incorrectly; I have now noticed that JavaScript does the same, and also for Danish and Norwegian. The arrays below are written in the correct order for each language:

CODE
window.onload=function()
{    
    var se=['å','ä','ö']; // Swedish
    var dk=['æ','ø','å']; // Danish, apparently same as Norwegian
    
    alert(se.sort());
    alert(dk.sort());    
}


Note that Danish and Norwegian use a different order than Swedish. But in the sorted JavaScript alerts the Swedish letters are incorrectly sorted as "ä,å,ö", while the Danish and Norwegian ones are (again incorrectly) sorted as "å,æ,ø". The same error appears in IE, Opera and Firefox. At least Opera's Norwegian creators should know their own alphabet, so am I correct in assuming that all three browser vendors deliberately follow some flawed convention?


Darin McGrew
post Oct 23 2006, 06:55 PM
Post #2



Does PHP allow you to specify the locale? The default locale is often "C", which sorts characters according to their numeric encoding. Other locales should sort characters as appropriate for that locale.


--------------------
Darin McGrew
WDG Member since 1998
Liam Quinn
post Oct 23 2006, 08:03 PM
Post #3



The default sort algorithm in JavaScript is based purely on the Unicode code point. If you want a locale-sensitive sort order, you can use this:

CODE

function localeSort(string1, string2) {
  return string1.toString().localeCompare(string2.toString());
}

var se=['å','ä','ö']; // Swedish
var dk=['æ','ø','å']; // Danish, apparently same as Norwegian

alert(se.sort(localeSort));
alert(dk.sort(localeSort));


That should use the locale configured on the user's system. If you want to use a specific locale regardless of the user's locale, I think you're stuck with writing the code for the locale-specific rules yourself in the function you pass to sort().
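A minimal sketch of what such a hand-written comparator could look like, assuming the only rule that matters is the position of each letter in the alphabet (real collation also deals with case, accents and digraphs). The names makeAlphabetComparator, swedishOrder and danishOrder are made up for this illustration:

```javascript
// Hypothetical helper: build a sort() comparator from an explicit alphabet.
// Letters missing from the alphabet sort after all listed letters, by code unit.
function makeAlphabetComparator(alphabet) {
  var rank = {};
  for (var i = 0; i < alphabet.length; i++) {
    rank[alphabet.charAt(i)] = i;
  }
  function rankOf(ch) {
    return rank.hasOwnProperty(ch) ? rank[ch] : alphabet.length + ch.charCodeAt(0);
  }
  return function (a, b) {
    a = String(a).toLowerCase();
    b = String(b).toLowerCase();
    var n = Math.min(a.length, b.length);
    for (var i = 0; i < n; i++) {
      var diff = rankOf(a.charAt(i)) - rankOf(b.charAt(i));
      if (diff !== 0) return diff;
    }
    return a.length - b.length; // a shorter prefix sorts first
  };
}

var swedishOrder = makeAlphabetComparator('abcdefghijklmnopqrstuvwxyzåäö');
var danishOrder = makeAlphabetComparator('abcdefghijklmnopqrstuvwxyzæøå');

var se = ['ö', 'z', 'ä', 'a', 'å'].sort(swedishOrder);
var dk = ['å', 'z', 'ø', 'a', 'æ'].sort(danishOrder);
```

Because the alphabet is passed in as a string, the same helper covers Swedish and Danish/Norwegian without depending on the locale configured on the user's system.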
Christian J
post Oct 24 2006, 05:48 AM
Post #4



QUOTE(Darin McGrew @ Oct 24 2006, 01:55 AM) *

Does PHP allow you to specify the locale?

It does, but it seems to be buggy. The entry on http://bugs.php.net/bug.php?id=9671 (10 Mar 2001 1:36pm) suggests something like this, which still sorts in the wrong order (PHP 4.3.3):

CODE
<?php
// Danish letters
$dk = array('æ', 'ø', 'å');
setlocale(LC_COLLATE, "dk_DK");
usort($dk, "strcoll");
print_r($dk); // returns "Array ( [0] => å [1] => æ [2] => ø )"

echo '<br>';

// Norwegian letters
$no = array('æ', 'ø', 'å');
setlocale(LC_COLLATE, "no_NO");
usort($no, "strcoll");
print_r($no); // returns "Array ( [0] => å [1] => æ [2] => ø )"

echo '<br>';

// Swedish letters
$se = array('å', 'ä', 'ö');
setlocale(LC_COLLATE, "sv_SV");
usort($se, "strcoll");
print_r($se); // returns "Array ( [0] => ä [1] => å [2] => ö )"
?>


Christian J
post Oct 24 2006, 07:46 AM
Post #5



QUOTE(Liam Quinn @ Oct 24 2006, 03:03 AM) *

The default sort algorithm in JavaScript is based purely on the Unicode code point.

According to Wikipedia the first 256 code points are identical to ISO 8859-1, and there you can indeed find "ä" before "å".

QUOTE
If you want a locale-sensitive sort order, you can use this:
CODE

function localeSort(string1, string2) {
  return string1.toString().localeCompare(string2.toString());
}

var se=['å','ä','ö']; // Swedish
var dk=['æ','ø','å']; // Danish, apparently same as Norwegian

alert(se.sort(localeSort));
alert(dk.sort(localeSort));


That should use the locale configured on the user's system.

Do you mean the user's OS or browser language settings? On my Swedish Win XP it seems to work in IE6 and Firefox, but Opera sorts like before (despite claiming to support the localeCompare() method from Op7).

QUOTE
If you want to use a specific locale regardless of the user's locale...

Regarding usability: what if a non-Swedish user reads a Swedish web page, wouldn't they (as I believe) expect letters to be sorted according to their own habit? E.g., wouldn't a typical English-speaking user expect "å" and "ä" to be treated as "a", and "ö" to be treated as "o"?


Darin McGrew
post Oct 24 2006, 01:49 PM
Post #6



QUOTE
wouldn't a typical English-speaking user expect "å" and "ä" to be treated as "a", and "ö" to be treated as "o"?
I can't say whether I'm "a typical English-speaking user", but I would expect a Swedish page to sort Swedish names according to the normal Swedish rules for alphabetizing names. I would expect Danish and Norwegian pages to use the Danish and Norwegian alphabetizing rules (respectively). And so on.


Christian J
post Oct 24 2006, 02:21 PM
Post #7



QUOTE(Darin McGrew @ Oct 24 2006, 08:49 PM) *

QUOTE
wouldn't a typical English-speaking user expect "å" and "ä" to be treated as "a", and "ö" to be treated as "o"?
I can't say whether I'm "a typical English-speaking user", but I would expect a Swedish page to sort Swedish names according to the normal Swedish rules for alphabetizing names. I would expect Danish and Norwegian pages to use the Danish and Norwegian alphabetizing rules (respectively). And so on.


But what if you (the English-speaking user) don't know the Swedish rules? Suppose you're looking for a name like "Åsa" or "Örjan" in a very long alphabetically sorted list, where in the list would you begin to look?
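The two expectations can be compared with Intl.Collator, an API that reached JavaScript engines well after this thread was written, so this is only a sketch that assumes an engine with full locale data. Apart from "Åsa" and "Örjan", the names in the list are invented for illustration:

```javascript
var names = ['Örjan', 'Zlatan', 'Åsa', 'Olof', 'Adam'];

// English rules treat Å and Ö as accented variants of A and O.
var sortedEnglish = names.slice().sort(new Intl.Collator('en').compare);

// Swedish rules treat Å, Ä and Ö as separate letters after Z.
var sortedSwedish = names.slice().sort(new Intl.Collator('sv').compare);

console.log(sortedEnglish.join(', '));
console.log(sortedSwedish.join(', '));
```

With English rules the reader finds Åsa among the A names; with Swedish rules she comes after every Z name.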


Darin McGrew
post Oct 24 2006, 02:36 PM
Post #8



QUOTE(Christian J @ Oct 24 2006, 12:21 PM) *
But what if you (the English-speaking user) don't know the Swedish rules? Suppose you're looking for a name like "Åsa" or "Örjan" in a very long alphabetically sorted list, where in the list would you begin to look?
Here's where I know I'm atypical: I'd look for an index of some sort. If the index listed ABCDEFGHIJKLMNOPQRSTUVWXYZ, then I'd look under A or O. But if the index listed ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ, then I'd look under Å or Ö.


Liam Quinn
post Oct 24 2006, 07:48 PM
Post #9



QUOTE(Christian J @ Oct 24 2006, 06:48 AM) *

QUOTE(Darin McGrew @ Oct 24 2006, 01:55 AM) *

Does PHP allow you to specify the locale?

It does, but it seems to be buggy. The entry on http://bugs.php.net/bug.php?id=9671 (10 Mar 2001 1:36pm) suggests something like this, which still sorts in the wrong order (PHP 4.3.3):

CODE
<?php
// Danish letters
$dk = array('æ', 'ø', 'å');
setlocale(LC_COLLATE, "dk_DK");
usort($dk, "strcoll");
print_r($dk); // returns "Array ( [0] => å [1] => æ [2] => ø )"

echo '<br>';

// Norwegian letters
$no = array('æ', 'ø', 'å');
setlocale(LC_COLLATE, "no_NO");
usort($no, "strcoll");
print_r($no); // returns "Array ( [0] => å [1] => æ [2] => ø )"

echo '<br>';

// Swedish letters
$se = array('å', 'ä', 'ö');
setlocale(LC_COLLATE, "sv_SV");
usort($se, "strcoll");
print_r($se); // returns "Array ( [0] => ä [1] => å [2] => ö )"
?>



The user comments in http://ca3.php.net/setlocale may help you determine whether your system has the locales installed. One problem is that you have the Danish and Swedish locale codes wrong: They should be "da_DK" and "sv_SE" (language_COUNTRY).
Liam Quinn
post Oct 24 2006, 08:05 PM
Post #10



QUOTE(Christian J @ Oct 24 2006, 08:46 AM) *

QUOTE
If you want a locale-sensitive sort order, you can use this:
CODE

function localeSort(string1, string2) {
  return string1.toString().localeCompare(string2.toString());
}

var se=['å','ä','ö']; // Swedish
var dk=['æ','ø','å']; // Danish, apparently same as Norwegian

alert(se.sort(localeSort));
alert(dk.sort(localeSort));


That should use the locale configured on the user's system.

Do you mean the user's OS or browser language settings?


I think that's up to the browser implementation.
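In engines that provide the later Intl API (which the browsers discussed here did not have), one way to see which locale was actually picked is to ask a default Intl.Collator for its resolved options. A small sketch, with variable names chosen only for this example:

```javascript
// A collator created without arguments uses the engine's default locale,
// which a localeCompare() call without a locale argument is expected to follow.
var defaultCollator = new Intl.Collator();
var resolvedLocale = defaultCollator.resolvedOptions().locale;

console.log('Default collation locale: ' + resolvedLocale);
```

Where that default comes from (OS settings or browser language preferences) still depends on the implementation.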

QUOTE

Regarding usability: what if a non-Swedish user reads a Swedish web page, wouldn't they (as I believe) expect letters to be sorted according to their own habit? E.g., wouldn't a typical English-speaking user expect "å" and "ä" to be treated as "a", and "ö" to be treated as "o"?


If the page is in Swedish, I think you should assume that the reader knows Swedish and that Swedish sorting rules are appropriate.
Christian J
post Oct 25 2006, 07:04 AM
Post #11



QUOTE(Liam Quinn @ Oct 25 2006, 02:48 AM) *

One problem is that you have the Danish and Swedish locale codes wrong: They should be "da_DK" and "sv_SE" (language_COUNTRY).

The locale codes indeed seem to be the problem. Like http://ca3.php.net/setlocale says, different systems have different naming schemes for locales, but you can apparently use an array of codes. The following works both on my Apache/Windows test server and on my web host's FreeBSD:

CODE
setlocale(LC_COLLATE, "sve", "sv_SE.ISO8859-1");

But while "nor" and "dan" work for Norwegian and Danish on my Apache/Windows, I haven't been able to make any code work for them on my web host yet. E.g., even though the following echoes "da_DK.ISO8859-1" as the preferred locale, it doesn't sort properly:

CODE
<?php
// Danish letters
$dk = array('a', 'b', 'o', 'æ', 'ø', 'å');
setlocale(LC_COLLATE, "dan", "da_DK.ISO8859-1");
echo setlocale(LC_COLLATE, "dan", "da_DK.ISO8859-1").'<br>';  // "da_DK.ISO8859-1"
usort($dk, "strcoll");  
print_r($dk); // "Array ( [0] => a [1] => å [2] => æ [3] => b [4] => o [5] => ø )"
?>


I should add that Norwegian and Danish is not an urgent problem, I'm mostly curious.


pandy
post Oct 25 2006, 06:02 PM
Post #12



QUOTE(Christian J @ Oct 24 2006, 02:46 PM) *

According to Wikipedia the first 256 code points are identical to ISO 8859-1, and there you can indeed find "ä" before "å".

It's only ASCII characters that are encoded the same in Unicode, isn't it? ÅÄÖ are 0197, 0196 and 0214 in Unicode so Ä indeed comes first.
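The numeric values can be checked directly with charCodeAt(). A small sketch (the letters list and variable names are just for illustration):

```javascript
// Print the code of each letter. For these characters the value is the same
// in ISO 8859-1 and in Unicode.
var letters = ['Å', 'Ä', 'Ö', 'å', 'ä', 'ö', 'Æ', 'Ø', 'æ', 'ø'];
var codes = {};
for (var i = 0; i < letters.length; i++) {
  codes[letters[i]] = letters[i].charCodeAt(0);
  console.log(letters[i] + ' = ' + codes[letters[i]]);
}

// The default sort() compares these numbers, so the letter with the
// lower code always comes first, whatever the language's alphabet says.
var sortedByCode = ['å', 'ä', 'ö'].sort();
```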


--------------------
"Never go to excess, but let moderation be your guide."
- Cicero

Brian Chandler
post Oct 26 2006, 12:32 AM
Post #13



QUOTE(Christian J @ Oct 25 2006, 09:04 PM) *

QUOTE(Liam Quinn @ Oct 25 2006, 02:48 AM) *

One problem is that you have the Danish and Swedish locale codes wrong: They should be "da_DK" and "sv_SE" (language_COUNTRY).

The locale codes indeed seem to be the problem. Like http://ca3.php.net/setlocale says, different systems have different naming schemes for locales, but you can apparently use an array of codes. The following works both on my Apache/Windows test server and on my web host's FreeBSD:

CODE
setlocale(LC_COLLATE, "sve", "sv_SE.ISO8859-1");

But while "nor" and "dan" work for Norwegian and Danish on my Apache/Windows, I haven't been able to make any code work for them on my web host yet. E.g., even though the following echoes "da_DK.ISO8859-1" as the preferred locale, it doesn't sort properly:

CODE
<?php
// Danish letters
$dk = array('a', 'b', 'o', 'æ', 'ø', 'å');
setlocale(LC_COLLATE, "dan", "da_DK.ISO8859-1");
echo setlocale(LC_COLLATE, "dan", "da_DK.ISO8859-1").'<br>';  // "da_DK.ISO8859-1"
usort($dk, "strcoll");  
print_r($dk); // "Array ( [0] => a [1] => å [2] => æ [3] => b [4] => o [5] => ø )"
?>


I should add that Norwegian and Danish is not an urgent problem, I'm mostly curious.


I'm still a bit puzzled about this. At what point does a web server actually _sort_ anything?


--------------------
Brian Chandler
Nothing in this post constitutes "commercial solicitation". PayPal does not solicit residents of Japan. Contents may settle in transit. "Legal mind" may or may not be brain-damaged.