Those of you who use UTF-8, what does your workflow look like when creating new HTML files?
I suspect I'll keep forgetting to save new files as UTF-8 in my text editor. When that happens, how can you easily tell if a file was saved as UTF-8 or ANSI (especially if it was saved as UTF-8 without a BOM)? Do you have to look in your text editor's document properties? What if you forget to check, and the web page doesn't contain many exotic characters that might alert you to your mistake?
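For what it's worth, one way to check without trusting the editor is to look at the raw bytes. Here is a minimal Python sketch; the function name `guess_encoding` and the labels it returns are made up for illustration. A UTF-8 BOM is a giveaway, pure ASCII is the same either way, and anything else either decodes cleanly as UTF-8 or it doesn't. It can't prove a file is UTF-8, only that the bytes are consistent with it.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Rough guess: 'utf-8-sig', 'ascii', 'utf-8' or 'not-utf-8'."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 saved with a BOM
    if all(b < 0x80 for b in data):
        return "ascii"  # identical bytes in ANSI and UTF-8
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"  # probably ANSI (windows-1252 / iso-8859-1)
    return "utf-8"  # valid UTF-8 without a BOM
```

You would feed it the file read in binary mode, e.g. `guess_encoding(open("index.html", "rb").read())`, where `index.html` is just a placeholder name.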
(I'm thinking of switching to UTF-8 just to please the W3C validator: http://forums.htmlhelp.com/index.php?s=&showtopic=25841&view=findpost&p=114090 but otherwise I don't really need it, and fear it will just cause problems.)
I check what my text editor says it is. It isn't Unicode enabled, so it tells me rather loudly. Among other things it doesn't let me edit the document unless I actively OK that. It also says in the status bar, so I don't need to click anything. But many editors have that feature, I think.
Do you write in Swedish? If you write strictly in English it doesn't matter. You can save as ANSI and serve as UTF-8, but you already know that I guess.
I don't know if there is any real advantage to UTF-8 if you seldom use characters outside the ISO Latin charset. Can't think of any. I guess it would be that it's extensible: if in the future you want to add some larger quotes in a language outside ISO Latin, you don't have to change anything. Just remap your keyboard - like anyone would do that.
You'll probably get a better answer from Darin.
My editor can't save as Unicode, so not a problem. I use another editor if I need to use Unicode.
Yeah, the ASCII range of characters is encoded the same in ANSI and UTF-8. Very handy, that. Especially with the editor I use. If I need to edit UTF-8 files I'm alright as long as they are in English.
But regarding your problem, can't you set your editor to default to UTF-8 and use that all the time?
I don't know. I remember reading something about that but it was looong ago. Since I've stuck with iso-latin for files I create myself I haven't given it much thought.
If ISO Latin characters are encoded the same in ANSI and UTF-8, like you wrote, I guess there is literally no difference -- it's only if you want to use non-ISO Latin characters in CSS/JS/PHP (e.g. Hieroglyphs for variable names, or scripts that print Swedish text) that you may need to send UTF-8 headers.
No, not the whole ANSI range. Just the ASCII range, i.e. a-z, A-Z, 0-9 and common punctuation marks. You can't use ÅÄÖ, ê, ñ, ø, € and so on and save as either ANSI or UTF-8 and serve as whichever you choose. That is, English only!
Oops, I meant ASCII, not ANSI.
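To make the ASCII vs. ANSI point concrete, here is a small Python sketch. It assumes "ANSI" means windows-1252 (Python's `cp1252` codec), which is the usual meaning on Western European Windows setups; the sample strings are just examples.

```python
ascii_text = "Hello, world! 0-9 a-z A-Z"
swedish_text = "åäö"

# ASCII characters produce exactly the same bytes in both encodings.
print(ascii_text.encode("cp1252") == ascii_text.encode("utf-8"))  # True

# Non-ASCII characters do not: one byte each in cp1252, two bytes each in UTF-8.
print(swedish_text.encode("cp1252"))  # b'\xe5\xe4\xf6'
print(swedish_text.encode("utf-8"))   # b'\xc3\xa5\xc3\xa4\xc3\xb6'
```

So an English-only file can be saved either way and served as either, but as soon as å, ä or ö appears the bytes on disk have to match what the server or META charset claims.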
Now I tried converting a web site from iso-8859-1 to UTF-8, using TextPad. At first pages with Swedish text displayed correctly in the browser, but when I viewed source the Swedish letters å, ä and ö were changed into Ã¥, Ã¤ and Ã¶, and the file encoding was back to ANSI. How did that happen?
When I saved the above garbled files as UTF-8 a second time, the browser displayed the garbled characters instead of Swedish text. When I changed the garbled characters back to Swedish text and saved as UTF-8 a third time it suddenly worked, so I guess I must have made some mistake, but I'm not sure what.
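The sequence described above looks like the classic round trip where UTF-8 bytes get read as ANSI and then saved again. A minimal Python sketch of that, assuming ANSI is windows-1252 (`cp1252`); whether this is exactly what TextPad did internally is a guess.

```python
original = "åäö"

# Step 1: the text is stored as UTF-8 but read back as ANSI.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # Ã¥Ã¤Ã¶

# Step 2: saving the garbled text as UTF-8 bakes the garbage in,
# so even a correct UTF-8 reader now shows it.
double_encoded = garbled.encode("utf-8")
print(double_encoded.decode("utf-8"))  # Ã¥Ã¤Ã¶

# Repair: undo the wrong step (encode back to cp1252, decode as UTF-8).
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # åäö
```

That would also explain why retyping the Swedish letters and saving as UTF-8 once more fixed it: the third save started from correct text again.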
After some more testing I wonder if it's the pages' META charset (which I forgot to change from iso-8859-1 to UTF-8) that made TextPad act strangely -- but should a text editor pay attention to HTML tags? After batch-correcting the META charsets I batch-saved the files, which now made them UTF-8 by default, but this time åäö were changed into empty "[]" boxes in the source, and � ("?") characters in my browser. It seems things only work correctly if I save each file individually as UTF-8.
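The � characters are what you typically get in the opposite situation: the bytes on disk are still ANSI, but they are being read as UTF-8. A short Python sketch of that case, again assuming ANSI means windows-1252; that this is what happened in the batch save is only my guess.

```python
ansi_bytes = "åäö".encode("cp1252")  # b'\xe5\xe4\xf6'

# A strict UTF-8 decoder rejects these bytes outright.
try:
    ansi_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)  # False

# A lenient decoder (what browsers do) substitutes U+FFFD for the bad bytes.
shown = ansi_bytes.decode("utf-8", errors="replace")
print(shown)  # replacement characters instead of åäö
```

So "Ã¥" in the source means UTF-8 bytes read as ANSI, while "�" means ANSI bytes read as UTF-8.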
TextPad (v8.1.2) isn't very helpful, BTW. Changing its Default encoding for HTML files in the preferences doesn't seem to work, files still get saved as ANSI by default. Batch conversion doesn't seem possible either, instead you must(?) resave each file separately to change its encoding.
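If the editor won't batch convert, a small script can. Below is a rough Python sketch, not a TextPad feature: the function names, the `*.html` pattern and the `cp1252` source encoding are all assumptions you would adjust. It skips files that already decode as UTF-8 so it is safe to run twice, and it does not touch the META charset inside the files.

```python
from pathlib import Path

def convert_to_utf8(path: Path, source_encoding: str = "cp1252") -> bool:
    """Re-encode one file to UTF-8 in place. Returns True if it was rewritten."""
    data = path.read_bytes()
    try:
        data.decode("utf-8")
        return False  # already valid UTF-8 (or plain ASCII): leave it alone
    except UnicodeDecodeError:
        pass
    text = data.decode(source_encoding)  # raises if the bytes don't fit either
    path.write_bytes(text.encode("utf-8"))
    return True

def convert_tree(root: Path, pattern: str = "*.html") -> list[Path]:
    """Convert every matching file under root; return the files that changed."""
    return [p for p in sorted(root.rglob(pattern)) if convert_to_utf8(p)]
```

Usage would be something like `convert_tree(Path("mysite"))`, with `mysite` standing in for the site folder. Work on a copy first, since the files are overwritten.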
I have a growing list of those. Can't be bothered to edit the irrelevant bits out, you get it. Well, the last isn't Unicode but it's there for a reason.
It does appear TextPad analyzes the META Charset tag. Here are some experiments:
1. I created an HTML file containing a <meta charset="UTF-8"> element and Swedish åäö characters. When saved the file became UTF-8 by default, and everything worked. Correction: now that I tested again, it did not work. UTF-8 is not the default encoding, even though I set it to in the preferences.
2. Created a second identical HTML file, but with a <meta charset="iso-8859-1"> element. When saved the file became ANSI by default. When I changed the META charset to UTF-8 and resaved, the file remained ANSI but Swedish letters were garbled. When I resaved explicitly as UTF-8 everything worked.
3. Created a third identical HTML file, but without any META charset. When saved it became ANSI by default. When I resaved as UTF-8, Swedish letters were garbled. When I added a <meta charset="UTF-8"> element and resaved, the file became UTF-8 and everything worked.
I got lost there at some point...
Yes, I think I confused myself too. Maybe I should delete that post to protect the innocent...
Did some more tests. This makes my head spin, so maybe I got it wrong again.
TextPad's Preferences let you specify different default encodings for various document classes, but it seems the default for text documents affects other document classes too. In other words, UTF-8 as default for text documents will also apply to new documents saved as HTML -- TextPad's default for HTML documents has no effect.
Furthermore, if you explicitly specify a non-default encoding when saving an HTML document, TextPad obeys you even if you use the wrong META charset (and characters like "å ä ö" are consequently garbled). Apparently TextPad encodes text differently depending on the META charset. When you open such a document in TextPad, it again seems TextPad lets the META charset decide the encoding:
- A document with <meta charset="UTF-8"> can be explicitly saved as ANSI (which turns "å ä ö" into "? ? ?"), but when you open it again TextPad considers it UTF-8.
- A document with <meta charset="iso-8859-1"> can be explicitly saved as UTF-8 (which turns "å ä ö" into "Ã¥ Ã¤ Ã¶"), but when you open it again TextPad considers it ANSI.
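Since the META charset and the actual bytes can drift apart like this, it may help to check them against each other outside the editor. Here is a hedged Python sketch; the names `declared_charset` and `mismatch` are invented for this example, the regex is a rough approximation rather than a real HTML parser, and it only distinguishes "UTF-8" from "some single-byte encoding".

```python
import re

# Rough match for <meta charset="..."> and the older http-equiv form.
META_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)

def declared_charset(data: bytes):
    """Return the lower-cased charset named in a META tag, or None."""
    m = META_RE.search(data[:1024])  # the declaration belongs near the top
    return m.group(1).decode("ascii").lower() if m else None

def mismatch(data: bytes) -> bool:
    """True if the META charset and the bytes on disk appear to disagree."""
    charset = declared_charset(data)
    if charset is None or all(b < 0x80 for b in data):
        return False  # nothing declared, or pure ASCII that fits either way
    try:
        data.decode("utf-8")
        is_utf8 = True
    except UnicodeDecodeError:
        is_utf8 = False
    return (charset in ("utf-8", "utf8")) != is_utf8

page = '<meta charset="UTF-8"><p>åäö</p>'
print(mismatch(page.encode("utf-8")))   # False
print(mismatch(page.encode("cp1252")))  # True
```

It won't catch the first bullet's case where the letters were already replaced by literal "?" characters, because by then the file is pure ASCII and the original text is gone.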
Never used TextPad, so can't be of much help I'm afraid.