Full Version: Site displays in question marks randomly
cheerfulnut
Hi All,

I'm not really a web designer, but a friend asked me to help with a problem on their site, since I once helped them update a couple of pages. The URL is http://www.intec-comms.com/ and the problem is that the front page intermittently displays as all question marks. I've been able to reproduce it: on the first visit the page shows up as all question marks, a refresh will occasionally bring it back to normal (it's in Japanese), and then another refresh brings the question marks back.

As far as I know, nothing in the code has changed since I last updated it almost a year ago. It looks like an encoding issue, but when I changed the current "charset=utf-8" to "Shift_JIS", that just made things worse.

I know nothing about PHP, so I don't really know where to look, but if you have any suggestions I'd greatly appreciate them!

Thanks,
NJ
Darin McGrew
Actually, it doesn't look like you've specified the charset at all:
CODE
Content-Type: text/html
pandy
It's in a Meta tag (UTF-8). So what encoding did you use when you wrote the page?
Darin McGrew
QUOTE
It's in a Meta tag (UTF-8).
Yeah, but by the time the browser sees the meta tag, it's too late.
pandy
Because it's after TITLE? But if it is UTF-8, shouldn't it work to manually tell the browser to use UTF-8 anyway?

I think we had something like this on the old board, but I've forgotten the details.
Darin McGrew
The browser has to pick a charset before it can interpret the HTML, before it can see the meta tag.
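
As a rough illustration of the header check, here is a minimal Python sketch that reports whether a Content-Type value names a charset at all. The helper name `declared_charset` is made up for this example; the first header value is the one quoted earlier in this thread, the second is what a configured server would be expected to send.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter in lower case,
    # or None when the header does not carry one.
    return msg.get_content_charset()


# The header quoted above: no charset, so the browser is left to guess.
print(declared_charset("text/html"))

# A header that names the encoding before any HTML is read.
print(declared_charset("text/html; charset=utf-8"))
```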
pandy
But why does it work with western charsets, say UTF-8 or iso-latin and extended characters in the title?
cheerfulnut
Thanks for your input so far Pandy, Darin. Sorry my responses will be a bit delayed since I'm in Japan.

I didn't actually create this site - it was set up by someone who is apparently long gone. The owners were just lucky to have the FTP login info, so I was able to go in and modify a page they needed updated. However, I am *extremely* rusty on anything related to HTML or web coding, so I've been very careful about what I tweak.

I'm willing to give anything a try right now, though I'm still puzzled as to why this has suddenly happened - and why it seems sporadic. If I want to specify the charset, is it just a matter of including it in the header? Like: <head Content-Type: text/html; charset=utf-8>

And to answer your question Pandy, the content includes both single (western) and double-byte (Japanese) characters.


Darin McGrew
Proper HTTP Content-Type headers are set by configuring the server. See also:
http://httpd.apache.org/docs/1.3/mod/core....ddefaultcharset
http://httpd.apache.org/docs/1.3/mod/mod_m...html#addcharset

BTW, your server is running Apache 1.3 which is no longer maintained, and has been declared "end of life". You might consider upgrading.
Brian Chandler
QUOTE(pandy @ Feb 10 2011, 06:09 AM)

But why does it work with western charsets, say UTF-8 or iso-latin and extended characters in the title?


Because UTF-8 and all of the "extended ASCII" 256-character sets are by design ASCII-transparent. That is, all of the characters required to specify the stuff you need before you get to the meta tag are "in ASCII" anyway.

Strictly it's slightly more complicated than that. UTF-8 (which was designed in a restaurant) _is_ ASCII-transparent, as (I believe) are 8859-1 ("ISO Latin 1") and its relatives, like 8859-15, whose _name_ is "Latin 9" and which includes the Euro sign. My guess is that browsers/fonts typically hack 8859-1 to pretend it includes the Euro sign, which replaces a strange blob called the "currency sign", described here: http://en.wikipedia.org/wiki/Currency_(typography)
(Essentially it has no real function).

Character sets like Shift_JIS are semi-ASCII-transparent: they match ASCII for the basic characters, so probably enough to get you to the meta tag. But a few of the first 128 positions hold different characters; in Shift_JIS in particular, backslash is replaced by a so-called "narrow-width" yen sign, which is why translations of manuals about MS stuff from Japanese often show filenames separated by yen signs.
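
A small Python sketch of what "ASCII-transparent" means in practice: the markup a browser has to read before it reaches the meta tag comes out as the same bytes in each of these encodings. The function name and the sample markup are made up for illustration, and the sample deliberately avoids backslash and tilde, the two positions where the JIS Roman set underlying Shift_JIS differs from ASCII.

```python
def same_bytes_as_ascii(text, encodings):
    """Map each encoding name to whether it encodes `text` exactly as ASCII does."""
    reference = text.encode("ascii")
    return {name: text.encode(name) == reference for name in encodings}


# Typical markup that precedes (and includes) the meta tag.
markup = (
    '<html><head><title>Test</title>'
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
)

result = same_bytes_as_ascii(
    markup, ["utf-8", "iso-8859-1", "iso-8859-15", "shift_jis"]
)
for name, same in result.items():
    print(name, same)
```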

The stuff on this website looks rather out of date: http://htmlhelp.com/reference/charset/

Brian Chandler
QUOTE(cheerfulnut @ Feb 10 2011, 08:11 AM)

And to answer your question Pandy, the content includes both single (western) and double-byte (Japanese) characters.


That doesn't actually answer the question. (Which I can't see, but guess it's the obvious question)

Strictly speaking there is no such thing in the wide sense as "single" or "double" byte characters. This terminology all grew out of the Shift_JIS kludgery from the early days, when a "single-byte" character appeared as one tile in an 80x25 character display, and a "double-byte" character appeared in the (roughly square) space occupied by two such tiles.

If the Japanese is encoded in UTF-8, for example, all ASCII characters are 1 byte, some other stuff like pound signs are 2 bytes, and most kanji and kana are 3 bytes.
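
The byte counts are easy to check with a few lines of Python. The helper name `utf8_len` and the sample characters are chosen just for this illustration.

```python
def utf8_len(ch):
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# An ASCII letter, a pound sign, a hiragana character and a kanji.
for ch in ("A", "£", "あ", "漢"):
    print(ch, utf8_len(ch))
```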

Anyway, the file must actually be encoded in some character set, and the declared charset should match it. Unfortunately most modern software spends its time trying to second-guess the user, and makes it hard to see what the encoding is. But if you open the text of the HTML file in a browser (rename it to something like html.txt), you can choose different encodings and find which one shows it correctly.
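
The same trial-and-error can be roughed out in Python: try a short list of candidate encodings and see which ones decode the bytes without an error. This is only a sketch; the function name and the candidate list are assumptions, and in real use the bytes would come from reading the file in binary mode rather than from the sample strings used here.

```python
def decodable_as(data, candidates=("utf-8", "shift_jis", "euc_jp")):
    """Return the candidate encodings that decode `data` without an error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# Sample bytes standing in for a file's contents, e.g. open(path, "rb").read().
sjis_bytes = "日本語のページ".encode("shift_jis")
utf8_bytes = "日本語のページ".encode("utf-8")

print(decodable_as(sjis_bytes))
print(decodable_as(utf8_bytes))
```

A clean decode only rules an encoding in as a possibility; you still have to look at the decoded text to confirm it reads as Japanese rather than as a different set of wrong characters.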
Brian Chandler
QUOTE(Darin McGrew @ Feb 10 2011, 09:28 AM)

Proper HTTP Content-Type headers are set by configuring the server.



That's not entirely accurate. There are two issues:

(1) It is better to send the encoding in an HTTP header. True.
But it cannot hurt to include the meta tag as well, or there is no way of knowing the encoding after the file has been downloaded and saved somewhere.

(2) The *default* Content-Type header is set by configuring the server. But if you have different files in different encodings in the same directory, you can use PHP (or another language) to send the appropriate headers. I don't think you can do this in the server settings, can you? (Assuming the files have the same 'extension', like .html)
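
Point (2) can be sketched in Python instead of PHP (in PHP the equivalent would be a `header()` call made before any output). The example below is a minimal WSGI application, not anything taken from the site in question: the paths, the `CHARSETS` table and the placeholder body are all hypothetical.

```python
# Hypothetical table: which encoding each page is saved in.
CHARSETS = {
    "/index.html": "utf-8",
    "/old/news.html": "shift_jis",
}


def content_type_for(path, default="utf-8"):
    """Build a Content-Type value that names the charset for this path."""
    return "text/html; charset=" + CHARSETS.get(path, default)


def app(environ, start_response):
    """Minimal WSGI app that declares the charset in the HTTP header per page."""
    path = environ.get("PATH_INFO", "/")
    start_response("200 OK", [("Content-Type", content_type_for(path))])
    body = "<p>placeholder body</p>"  # stands in for the real page content
    return [body.encode(CHARSETS.get(path, "utf-8"))]


print(content_type_for("/index.html"))
print(content_type_for("/old/news.html"))
```

Because the charset is chosen per request, two files with the same extension in the same directory can be sent with different declared encodings.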




cheerfulnut
Apologies for my late response (been crazy at work), and thanks for the advice Brian and Darin.

Turns out the issue was some feature of their news section. I'm not entirely sure how it works (again, I was not involved with the initial creation of this site, and am rarely asked to look into it), but apparently there's a function that lets a user add "news" sections without needing to know HTML/PHP/CSS, etc. A user modified an older news post (likely by copying and pasting it from an MS Word or email file) instead of creating a new entry, and it was this modified older entry that borked things up.

After deleting the old entry and creating a new one from scratch, things went back to normal.

Again, appreciate the advice - I learned something from it.