Which Character Set, I always find advice on character sets ambiguous |
MikeC |
Nov 26 2006, 11:32 AM
Post
#1
|
Group: Members Posts: 4 Joined: 26-November 06 Member No.: 1,118 |
Hi
My first post. In the past I've written pages, given up on finding guidance on character sets, and just used ISO-8859-1 because it was the default, or because it worked...
Which is the right character set for my website? That's a very open-ended question, I know, but there doesn't seem to be easily accessible, well-set-out guidance on this choice anywhere. Since I want to do a more serious site now, I'd like it to be right.
I'll help narrow things down: I want my pages to follow the latest web standards and possibly be multilingual / easily translated in the future. So I want to use XHTML. Instinctively I want to use UTF-8. Would this be the right choice? Are there drawbacks I'm unaware of? Should I just use ISO-8859-1?
I think some discussion of the relative merits of character sets could benefit lots of people.
Thanks in advance
Mike |
Darin McGrew |
Nov 26 2006, 12:43 PM
Post
#2
|
WDG Member Group: Root Admin Posts: 8,365 Joined: 4-August 06 From: Mountain View, CA Member No.: 3 |
QUOTE So I want to use XHTML.
Unless you have a particular reason to use XHTML, you're probably better off with plain HTML 4.01.
QUOTE Instinctively I want to use UTF-8. Would this be the right choice?
What character encodings does your authoring software support? What character encodings are used by any third parties that supply your content? Is there some way that ISO-8859-1 is not meeting your needs? Or that you expect it to no longer meet your needs in the future? |
MikeC |
Nov 27 2006, 06:02 AM
Post
#3
|
Group: Members Posts: 4 Joined: 26-November 06 Member No.: 1,118 |
Hi Darin, thanks for responding.
QUOTE Unless you have a particular reason to use XHTML, you're probably better off with plain HTML 4.01.
1. Standards compliance: we're advised there's no point in using older standards in a brand new site.
2. Future-proofing: I'm expecting growth.
3. XHTML is no harder to write than HTML, so why not?
QUOTE What character encodings does your authoring software support? What character encodings are used by any third parties that supply your content? Is there some way that ISO-8859-1 is not meeting your needs? Or that you expect it to no longer meet your needs in the future?
Most of them, including UTF-8. Third-party input will be small and mostly PNG / SVG. ISO-8859-1 meets my needs today, but so does UTF-8. I expect my site, or the descendant of my site, to be multilingual in the future.
I understand that some browsers can't deal very well with XHTML 1.1, but Gecko / WebKit both seem to manage, which covers probably 99% of my future reader-base. |
Brian Chandler |
Nov 27 2006, 06:30 AM
Post
#4
|
Jocular coder Group: Members Posts: 2,460 Joined: 31-August 06 Member No.: 43 |
Use UTF-8. The only thing you would have to do to convert is to change any "high-ASCII" encoded characters (I avoided that by always entering accented letters with the HTML "entity" codes). UTF-8 is a widely used standard, and the only way to enable real multilingual capability; lots of advantages, no disadvantages except a tiny byte overhead on heavily accented languages that can be represented in 8859.
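To put a number on the byte overhead and the conversion step mentioned above, here is a minimal Python sketch. The sample string and the helper name `latin1_to_utf8` are made up for illustration; it assumes the existing bytes really are ISO-8859-1.

```python
# Hypothetical sample text; any string with accented Latin-1 letters works.
text = "café crème"

latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

# Each accented letter takes 1 byte in ISO-8859-1 and 2 bytes in UTF-8;
# plain ASCII letters take 1 byte in both.
latin1_len = len(latin1_bytes)
utf8_len = len(utf8_bytes)


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are assumed to be ISO-8859-1 as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


converted = latin1_to_utf8(latin1_bytes)
print(latin1_len, utf8_len, converted == utf8_bytes)
```

Pages that contain only ASCII, for instance because accented letters were written as HTML entities, come out byte-for-byte identical after the conversion.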
Don't waste time with XHTML: it offers no empirically detectable advantages, many disadvantages (such as no browser currently actually handles it correctly, I believe), and it is not at all clear that it is the "way of the future". It's just as likely to disappear before HTML does (if it ever does, given the amount of stuff in HTML, and the ease of transcribing). My 2 yen |
MikeC |
Nov 27 2006, 11:12 AM
Post
#5
|
Group: Members Posts: 4 Joined: 26-November 06 Member No.: 1,118 |
QUOTE UTF-8 is a widely used standard, and the only way to enable real multilingual capability; lots of advantages, no disadvantages except a tiny byte overhead...
That's what I understood to be the case. I guess as long as the charset code is the right one for the content on a particular page, it doesn't really matter and I can mix and match pages.
QUOTE Don't waste time with XHTML: it offers no empirically detectable advantages, many disadvantages (such as no browser currently actually handles it correctly, I believe)..
Your 2 yen is very much appreciated and I take your point. It could be argued that some current browsers don't handle HTML 4.01 correctly either. I haven't encountered any problems with the techie-geeky websites I visit, most of which are XHTML. I have far more trouble with Flash.
I believe, however, that the Mozilla Foundation and the many (Adobe, Nokia, Apple, KDE?) adopters of WebKit are very much committed to XHTML. I'm pretty confident of it taking hold. IE will become much less relevant over the next few years as the internet landscape embraces handhelds, and I have no doubt the future of handhelds is Linux. |
Peter1968 |
Nov 27 2006, 01:28 PM
Post
#6
|
Serious Coder Group: Members Posts: 448 Joined: 23-September 06 Member No.: 213 |
They've committed to XHTML, as you say, simply because it's an official thing that the W3C have invented/dreamt up.
The W3C have brainstormed and formulated some great ideas over time, but XHTML is not one of them. It's a reinvention of the wheel that's caused more confusion and further muddied already murky waters. And, more to the point, few sites implement it correctly. |
Brian Chandler |
Nov 27 2006, 03:11 PM
Post
#7
|
Jocular coder Group: Members Posts: 2,460 Joined: 31-August 06 Member No.: 43 |
QUOTE UTF-8 is a widely used standard, and the only way to enable real multilingual capability; lots of advantages, no disadvantages except a tiny byte overhead... That's what I understood to be the case. I guess as long as the charset code is the right one for the content on a particular page, it doesn't really matter and I can mix and match pages.
Mix-n-match sounds like a (seriously) Bad Idea. I have some Shift_JIS (Japanese) pages mixed up with my UTF-8 pages - well, not really very mixed, which is why it's not too bad, but it would be much easier just to use UTF-8. What non-ASCII characters are you using from 8859?
QUOTE Don't waste time with XHTML: it offers no empirically detectable advantages, many disadvantages (such as no browser currently actually handles it correctly, I believe).. Your 2 yen is very much appreciated and I take your point. It could be argued that some current browsers don't handle HTML 4.01 correctly either. I haven't encountered any problems with the techie-geeky websites I visit, most of which are XHTML. I have far more trouble with Flash. I believe, however, that the Mozilla Foundation and the many (Adobe, Nokia, Apple, KDE?) adopters of WebKit are very much committed to XHTML. I'm pretty confident of it taking hold. IE will become much less relevant over the next few years as the internet landscape embraces handhelds, and I have no doubt the future of handhelds is Linux.
"Committed"... IBM were so committed to PL/1 that they trademarked PL/2, PL/3, PL/4, PL/5, PL/6, PL/7, PL/8, and PL/9. ("What's PL/1?" "Exactly")
What advantages do you see to XHTML? An advantage is something like: "XHTML makes pages load faster", "XHTML makes it easier to generate pages from a database". Anything like that - empirically detectable. |
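One concrete risk of mixing encodings across pages is that a page's declared charset and its actual bytes drift apart. The sketch below, in Python, shows what a reader would see in that case; the helper name `decode_as` and the one-character sample page are illustrative assumptions, not anything taken from the sites discussed here.

```python
def decode_as(data: bytes, declared: str) -> str:
    """Decode page bytes the way a client would if it trusted the declared charset."""
    return data.decode(declared, errors="replace")


# Hypothetical page content: a single accented letter saved as UTF-8.
page_bytes = "é".encode("utf-8")

right = decode_as(page_bytes, "utf-8")        # label matches the bytes
wrong = decode_as(page_bytes, "iso-8859-1")   # label says 8859-1, bytes are UTF-8

print(repr(right), repr(wrong))
```

With a single encoding site-wide, this kind of mismatch cannot creep in when a page is copied, templated, or edited in a different tool.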
MikeC |
Nov 27 2006, 03:44 PM
Post
#8
|
Group: Members Posts: 4 Joined: 26-November 06 Member No.: 1,118 |
Doh. I just used fast reply and it didn't work.
I take on board the comments re XHTML. I guess I was being a bit wide-eyed "standards are good". As I said, either ISO-8859-1 or UTF-8 would do the job now. Realistically, what I am writing now is relatively small and non-dynamic. I was thinking of saving myself work in the future, but when the site grows it is likely these pages will be replaced anyway. Thanks for the help, folks! |
pandy |
Nov 27 2006, 06:35 PM
Post
#9
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,732 Joined: 9-August 06 Member No.: 6 |
QUOTE I take on board the comments re XHTML. I guess I was being a bit wide-eyed "standards are good".
Standards are good. HTML 4.01 is a standard.
QUOTE I haven't encountered any problems with the techie-geeky websites I visit, most of which are XHTML. I have far more trouble with Flash.
That's because the pages are served as text/html and that's how they are treated by browsers. Their XML parsers never see them. The same old tag soup engines as always handle them. Those pages ARE HTML, even if not totally correct HTML (the slashes break the rules). If they were served with one of the correct content types for XHTML, many of those pages wouldn't even show up.
You can amuse yourself with a little game I played for a while. Download some of those pages, rename them to .xhtml and see what browsers do with them. Or use some XML "validator" and see how they fare. It turns out that, more often than not, they are not well-formed.
Two requirements of XHTML are well-formedness and that UAs should not display a page that isn't well-formed. This alone speaks against XHTML for the web. Can you see a big commercial site taking the risk of a small mistake, maybe something in a program's output, causing their pages not to display before the error is found and corrected? XML is probably a good thing, but not on this side of the server. |
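The well-formedness game described above can also be played with a few lines of Python. This is only a rough sketch using the standard library's `xml.etree.ElementTree` parser; the function name `is_well_formed` and the sample fragments are made up for illustration, and the check covers well-formedness only, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments: the first closes every element, the others do not.
print(is_well_formed("<p>Hello<br />world</p>"))
print(is_well_formed("<p>Hello<br>world</p>"))
print(is_well_formed("<p>unclosed"))
```

Because this parser does not load the XHTML DTD, it will also reject named entities such as `&nbsp;`, so a real page may fail here for that reason as well as for unclosed tags.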