The Web Design Group

... Making the Web accessible to all.

Which character set? I always find advice on character sets ambiguous
MikeC
post Nov 26 2006, 11:32 AM
Post #1





Group: Members
Posts: 4
Joined: 26-November 06
Member No.: 1,118



Hi

My first post.

I've written pages in the past and given up on finding guidance on character sets and just used ISO-8859-1 because it was the default, or because it worked...

Which is the right character set for my website? That's a very open-ended question, I know, but there doesn't seem to be easily accessible, well-set-out guidance on this choice anywhere. Since I want to do a more serious site now, I'd like it to be right.

I'll help narrow things down and say: I want my pages to conform to the latest web standards and possibly be multilingual / easily translated in the future. So I want to use XHTML. Instinctively I want to use UTF-8. Would this be the right choice? Are there drawbacks I'm unaware of? Should I just use ISO-8859-1? I think some discussion of the relative merits of character sets could benefit lots of people.

Thanks in advance

Mike
Darin McGrew
post Nov 26 2006, 12:43 PM
Post #2


WDG Member

Group: Root Admin
Posts: 8,365
Joined: 4-August 06
From: Mountain View, CA
Member No.: 3



QUOTE(MikeC @ Nov 26 2006, 08:32 AM)
So I want to use XHTML.

Unless you have a particular reason to use XHTML, you're probably better off with plain HTML 4.01.

QUOTE(MikeC @ Nov 26 2006, 08:32 AM)
Instinctively I want to use UTF-8. Would this be the right choice?

What character encodings does your authoring software support? What character encodings are used by any third parties that supply your content? Is there some way that ISO-8859-1 is not meeting your needs? Or that you expect it to no longer meet your needs in the future?
MikeC
post Nov 27 2006, 06:02 AM
Post #3








Hi Darin, thanks for responding.

QUOTE(Darin McGrew @ Nov 26 2006, 05:43 PM)
Unless you have a particular reason to use XHTML, you're probably better off with plain HTML 4.01.

1 Standards Compliance - we're advised there's no point in using older standards in a brand new site.
2 Future proofing - I'm expecting growth
3 XHTML is no harder to write than HTML so why not?

QUOTE(Darin McGrew @ Nov 26 2006, 05:43 PM)
What character encodings does your authoring software support? What character encodings are used by any third parties that supply your content? Is there some way that ISO-8859-1 is not meeting your needs? Or that you expect it to no longer meet your needs in the future?

Most of them, including UTF-8.
3rd party input will be small and mostly PNG / SVG.
ISO-8859-1 meets my needs today, but so does UTF-8. I expect my site, or the descendant of my site to be multilingual in the future.

I understand that some browsers can't deal very well with XHTML 1.1 but Gecko / WebKit both seem to manage, which covers probably 99% of my future reader-base.
Brian Chandler
post Nov 27 2006, 06:30 AM
Post #4


Jocular coder

Group: Members
Posts: 2,460
Joined: 31-August 06
Member No.: 43



Use UTF-8. The only thing you would have to do to convert is to change any "high-ASCII" encoded characters (I avoided that by always entering accented letters with the HTML "entity" codes). UTF-8 is a widely used standard, and the only way to enable real multilingual capability; lots of advantages, no disadvantages except a tiny byte overhead on heavily accented languages that can be represented in 8859.
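To make the conversion step and the "tiny byte overhead" concrete, here is a minimal Python sketch. It assumes the old pages are plain ISO-8859-1 byte files; the helper names `latin1_to_utf8` and `byte_sizes` are hypothetical, and the sample strings are illustrative only.

```python
from __future__ import annotations


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8.

    Every byte value 0x00-0xFF is a valid ISO-8859-1 character,
    so the decode step cannot fail; the text is then re-encoded as UTF-8.
    """
    return raw.decode("iso-8859-1").encode("utf-8")


def byte_sizes(text: str) -> tuple[int, int]:
    """Return (ISO-8859-1 size, UTF-8 size) in bytes for the same text."""
    return len(text.encode("iso-8859-1")), len(text.encode("utf-8"))


if __name__ == "__main__":
    old_page = "café crème".encode("iso-8859-1")  # "high-ASCII" bytes 0xE9 and 0xE8
    new_page = latin1_to_utf8(old_page)
    # Each accented letter grows from 1 byte to 2 bytes; ASCII stays at 1 byte.
    print(len(old_page), len(new_page))  # prints: 10 12

    # Entity references are pure ASCII, so they are byte-identical in both encodings.
    entity_text = "caf&eacute;"
    print(entity_text.encode("iso-8859-1") == entity_text.encode("utf-8"))  # prints: True
```

This is why pages that only ever used entity codes for accented letters need no byte-level conversion at all: they are already valid UTF-8.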

Don't waste time with XHTML: it offers no empirically detectable advantages, many disadvantages (such as no browser currently actually handles it correctly, I believe), and it is not at all clear that it is the "way of the future". It's just as likely to disappear before HTML does (if it ever does, given the amount of stuff in HTML, and the ease of transcribing).

My 2 yen
MikeC
post Nov 27 2006, 11:12 AM
Post #5








QUOTE(Brian Chandler @ Nov 27 2006, 11:30 AM)

UTF-8 is a widely used standard, and the only way to enable real multilingual capability; lots of advantages, no disadvantages except a tiny byte overhead...

That's what I understood to be the case. I guess as long as the charset code is the right one for the content on a particular page, it doesn't really matter and I can mix and match pages.
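Getting "the charset code" right means the charset a page declares and the bytes actually written must agree. In HTML 4.01 one common way to declare it is a `meta` element, although an HTTP `Content-Type` header sent by the server takes precedence over it. A minimal Python sketch, assuming pages are generated by a script; `write_page` and `PAGE_TEMPLATE` are hypothetical names:

```python
import tempfile
from pathlib import Path

# Hypothetical HTML 4.01 Strict skeleton; the meta element declares the charset.
PAGE_TEMPLATE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html><head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
    "<title>{title}</title>\n"
    "</head><body><p>{body}</p></body></html>\n"
)


def write_page(path: Path, title: str, body: str, charset: str = "utf-8") -> None:
    """Write a page whose bytes match the charset its meta element declares.

    title and body are inserted as-is (no HTML escaping) to keep the sketch short.
    """
    html = PAGE_TEMPLATE.format(charset=charset, title=title, body=body)
    # Encoding with the declared charset keeps declaration and bytes in sync;
    # a character the charset cannot represent raises UnicodeEncodeError
    # instead of being silently mangled.
    path.write_bytes(html.encode(charset))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "index.html"
        write_page(page, "Café", "Crème brûlée")
        raw = page.read_bytes()
        print(b"charset=utf-8" in raw)          # prints: True
        print("Café".encode("utf-8") in raw)    # prints: True
```

Deriving both the declaration and the encoding from the single `charset` argument is the design point: the two cannot drift apart page by page.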

QUOTE(Brian Chandler @ Nov 27 2006, 11:30 AM)
Don't waste time with XHTML: it offers no empirically detectable advantages, many disadvantages (such as no browser currently actually handles it correctly, I believe)...

Your 2 yen very much appreciated and I take your point. It could be argued that some current browsers don't handle HTML 4.01 correctly. I haven't encountered any problems with the techie-geeky websites I visit, most of which are XHTML. I have far more trouble with Flash.

I believe, however, that the Mozilla Foundation and the many adopters of WebKit (Adobe, Nokia, Apple, KDE?) are very much committed to XHTML. I'm pretty confident of it taking hold. IE will become much less relevant over the next few years as the internet landscape embraces handhelds, and I have no doubt the future of handhelds is Linux.

Peter1968
post Nov 27 2006, 01:28 PM
Post #6


Serious Coder

Group: Members
Posts: 448
Joined: 23-September 06
Member No.: 213



They've committed to XHTML, as you say, simply because it's an official thing that the W3C have invented/dreamt up.

The W3C have brainstormed and formulated some great ideas over time, but XHTML is not one of them.

It's a reinvention of the wheel that's caused more confusion and further muddied already murky waters.

And, more to the point, few sites implement it correctly.
Brian Chandler
post Nov 27 2006, 03:11 PM
Post #7


Jocular coder



QUOTE(MikeC @ Nov 28 2006, 01:12 AM)

QUOTE(Brian Chandler @ Nov 27 2006, 11:30 AM)

UTF-8 is a widely used standard, and the only way to enable real multilingual capability; lots of advantages, no disadvantages except a tiny byte overhead...

That's what I understood to be the case. I guess as long as the charset code is the right one for the content on a particular page, it doesn't really matter and I can mix and match pages.



Mix-n-match sounds like a (seriously) Bad Idea. I have some Shift_JIS (Japanese) pages mixed up with my UTF-8 pages - well, not really very mixed, which is why it's not too bad, but it would be much easier just to use UTF-8. What non-ASCII characters are you using from 8859?

QUOTE

QUOTE(Brian Chandler @ Nov 27 2006, 11:30 AM)
Don't waste time with XHTML: it offers no empirically detectable advantages, many disadvantages (such as no browser currently actually handles it correctly, I believe)...

Your 2 yen very much appreciated and I take your point. It could be argued that some current browsers don't handle HTML 4.01 correctly. I haven't encountered any problems with the techie-geeky websites I visit, most of which are XHTML. I have far more trouble with Flash.

I believe, however, that the Mozilla Foundation and the many adopters of WebKit (Adobe, Nokia, Apple, KDE?) are very much committed to XHTML. I'm pretty confident of it taking hold. IE will become much less relevant over the next few years as the internet landscape embraces handhelds, and I have no doubt the future of handhelds is Linux.


"Committed"... IBM were so committed to PL/1 that they trademarked PL/2, PL/3, PL/4, PL/5, PL/6, PL/7, PL/8, and PL/9. ("What's PL/1?" "Exactly")

What advantages do you see to XHTML? An advantage is something like: "XHTML makes pages load faster", "XHTML makes it easier to generate pages from a database". Anything like that - empirically detectable.
MikeC
post Nov 27 2006, 03:44 PM
Post #8








Doh. I just used fast reply and it didn't work.

I take on board the comments re XHTML. I guess I was being a bit wide-eyed "standards are good".

As I said, either ISO-8859-1 or UTF-8 would do the job now.

Realistically what I am writing now is relatively small and non-dynamic. I was thinking of saving myself work in the future but when the site grows it is likely these pages will be replaced anyway.

Thanks for the help folks!
pandy
post Nov 27 2006, 06:35 PM
Post #9


🌟Computer says no🌟

Group: WDG Moderators
Posts: 20,732
Joined: 9-August 06
Member No.: 6



QUOTE(MikeC @ Nov 27 2006, 09:44 PM)

I take on board the comments re XHTML. I guess I was being a bit wide-eyed "standards are good".


Standards are good. HTML 4.01 is a standard.

QUOTE
I haven't encontered any problems with the techie-geeky websites I visit, most of which are XHTML. I have far more trouble with Flash


That's because the pages are served as text/html, and that's how browsers treat them. Their XML parsers never see them; the same old tag-soup engines as always handle them. Those pages ARE HTML, even if not totally correct HTML (the slashes in tags such as <br /> break the rules).

If they were served with one of the correct content types for XHTML (such as application/xhtml+xml), many of those pages wouldn't even show up. You can amuse yourself with a little game I played for a while: download some of those pages, rename them to .xhtml, and see what browsers do with them. Or run them through an XML "validator" and see how they fare. More often than not, it turns out they are not well-formed. Two requirements of XHTML are that documents be well-formed and that UAs (user agents) not display a page that isn't well-formed. This alone argues against XHTML for the web. Can you see a big commercial site taking the risk that a small mistake, maybe something in a program's output, stops their pages from displaying until the error is found and corrected? XML is probably a good thing, but not on this side of the server.
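The "rename to .xhtml" game boils down to asking whether the markup is well-formed XML. A rough approximation uses Python's standard-library XML parser; this is a minimal sketch assuming the page is already available as a string, `is_well_formed` is a hypothetical helper, and because it does not load the XHTML DTD it checks well-formedness only (it may also reject named entities such as `&nbsp;`), so it is not a validator.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML.

    Checks matched and properly nested tags, quoted attributes and a
    single root element; it does not validate against an XHTML DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = {
        "closed empty element": "<p>Line one<br />Line two</p>",
        "HTML-style <br>": "<p>Line one<br>Line two</p>",
        "unquoted attribute": "<p class=intro>Hello</p>",
        "overlapping tags": "<p><b>bold <i>both</b> italic</i></p>",
    }
    for label, markup in samples.items():
        verdict = "well-formed" if is_well_formed(markup) else "NOT well-formed"
        print(f"{label}: {verdict}")
```

Only the first sample passes. A tag-soup HTML parser would display all four, while an XML parser is required to stop at the first error, which is the risk described above.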