Garbled form script output after changing charset to UTF-8 |
Christian J |
Sep 3 2018, 03:42 PM
Post
#1
|
. Group: WDG Moderators Posts: 9,656 Joined: 10-August 06 Member No.: 7 |
This old guestbook script converts Swedish åäö characters to the HTML entities &aring;, &auml; and &ouml;. But when I changed the charset from ISO 8859-1 to UTF-8 in the guestbook's HTML files, åäö characters in new guestbook entries were garbled by the Perl script (old entries in the guestbook still displayed correctly). I eventually gave up and deleted the whole guestbook, but I'm still curious what might have caused the bug. Could form data posted from a UTF-8 web page be to blame?
|
pandy |
Sep 3 2018, 08:20 PM
Post
#2
|
🌟Computer says no🌟 Group: WDG Moderators Posts: 20,730 Joined: 9-August 06 Member No.: 6 |
Yeah, I think so. I don't know the ins and outs of it, but Perl at least used to have a problem with Unicode. Maybe if you had made the script also convert Unicode åäö to entities?
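A minimal sketch of what "also convert Unicode åäö to entities" might look like in Perl, assuming the script receives the raw form value as UTF-8 bytes. The helper name `to_entities` and the sample string are made up for illustration; the guestbook script itself is not shown in this thread, so this is not its actual code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Map of Swedish letters (as Unicode code points) to named HTML entities.
my %entity = (
    "\x{E5}" => '&aring;', "\x{E4}" => '&auml;', "\x{F6}" => '&ouml;',
    "\x{C5}" => '&Aring;', "\x{C4}" => '&Auml;', "\x{D6}" => '&Ouml;',
);

# Hypothetical helper: decode raw form bytes as UTF-8 first, so that each
# letter is one character, then swap the letters for their entities.
sub to_entities {
    my ($raw_bytes) = @_;
    my $text = decode('UTF-8', $raw_bytes);
    $text =~ s/([\x{E5}\x{E4}\x{F6}\x{C5}\x{C4}\x{D6}])/$entity{$1}/g;
    return $text;
}

# "Hej då från Malmö" as the UTF-8 bytes a browser would post.
print to_entities("Hej d\xC3\xA5 fr\xC3\xA5n Malm\xC3\xB6"), "\n";
# prints: Hej d&aring; fr&aring;n Malm&ouml;
```

The decode step is the part an ISO 8859-1 era script would be missing: without it, each letter arrives as two separate bytes and a one-byte search for å, ä or ö never matches.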
|
Christian J |
Sep 4 2018, 04:45 AM
Post
#3
|
QUOTE Yeah, I think so. I don't know the ins and outs of it, but Perl at least used to have a problem with Unicode.
IIRC, the åäö characters looked like this: Ã¥Ã¤Ã¶, which I think usually happens if a document saved as UTF-8 still uses the iso-8859-1 charset. I tried replacing every occurrence of iso-8859-1 META charset tags in the Perl script with UTF-8, and even tried saving the Perl script itself as UTF-8, to no avail.
QUOTE Maybe if you had made the script also convert Unicode åäö to entities?
Alas, I don't know Perl or Unicode well enough. I was thinking of making the form on the UTF-8 page submit its form data as iso-8859-1, so that the Perl script could then handle it. Could a form's ACCEPT-CHARSET attribute be used for that?
"The ACCEPT-CHARSET attribute specifies a list of character encodings that are accepted by the form handler. The value consists of a list of "charsets" separated by commas and/or spaces. The default value is UNKNOWN and is usually considered to be the character encoding used to transmit the document containing the FORM."
|
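The kind of garbling described here can be reproduced in a few lines of Perl. This is only an illustration of the suspected mechanism (UTF-8 bytes read as ISO 8859-1), not a diagnosis of the actual guestbook script, which isn't shown in this thread. The variable names are made up.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The three Swedish letters as Unicode characters: å ä ö
my $text = "\x{E5}\x{E4}\x{F6}";

# What a UTF-8 page would post: two bytes per letter.
my $utf8_bytes = encode('UTF-8', $text);

# What comes out if those bytes are read as ISO 8859-1 instead,
# so that every byte is taken to be a character of its own.
my $misread = decode('ISO-8859-1', $utf8_bytes);

binmode(STDOUT, ':encoding(UTF-8)');
print length($utf8_bytes), " bytes sent\n";    # prints: 6 bytes sent
print "misread as: $misread\n";                # prints: misread as: Ã¥Ã¤Ã¶
```

Three letters turn into six characters, which is why each å, ä or ö shows up as a pair starting with Ã.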
pandy |
Sep 4 2018, 09:34 AM
Post
#4
|
QUOTE QUOTE Yeah, I think so. I don't know the ins and outs of it, but Perl at least used to have a problem with Unicode.
QUOTE IIRC, the åäö characters looked like this: Ã¥Ã¤Ã¶, which I think usually happens if a document saved as UTF-8 still uses the iso-8859-1 charset. I tried replacing every occurrence of iso-8859-1 META charset tags in the Perl script with UTF-8,
Meta tags in the script? How would that work?
QUOTE and even tried saving the Perl script itself as UTF-8, to no avail.
There's more to it. Have you read this? https://perldoc.perl.org/perlunicode.html I haven't more than glanced at it. But I think you may find something there.
QUOTE QUOTE Maybe if you had made the script also convert Unicode åäö to entities?
QUOTE Alas, I don't know Perl or Unicode well enough.
Neither do I.
QUOTE I was thinking of making the form on the UTF-8 page submit its form data as iso-8859-1, so that the Perl script could then handle it. Could a form's ACCEPT-CHARSET attribute be used for that?
But why is it important that the page is UTF-8? Can't you just go back to what you had?
|
Christian J |
Sep 4 2018, 11:55 AM
Post
#5
|
QUOTE Meta tags in the script? How would that work?
Sorry, it was headers, not META tags. Like this one:
CODE print "Content-Type: text/html; charset=iso-8859-1\n\n";
It's done for each confirmation/error page.
QUOTE There's more to it. Have you read this? https://perldoc.perl.org/perlunicode.html I haven't more than glanced at it. But I think you may find something there.
QUOTE But why is it important that the page is UTF-8? Can't you just go back to what you had?
I could have made an ISO 8859-1 exception for the guestbook form page, but the site uses inclusion files like nav menus that need the same encoding on all pages, so I'd have to change the whole site back to ISO 8859-1, which felt like even more work.
|
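For comparison, a hedged sketch of a UTF-8 version of such a header, assuming the page text is held as Unicode characters and encoded just before printing. The `confirmation_page` helper and the Swedish sample message are invented for illustration and are not taken from the guestbook script.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build a confirmation page as UTF-8 bytes, with a
# Content-Type header that declares the same encoding as the body.
sub confirmation_page {
    my ($message) = @_;    # a string of Unicode characters
    my $header = "Content-Type: text/html; charset=UTF-8\n\n";
    my $body   = "<p>$message</p>\n";
    return $header . encode('UTF-8', $body);
}

# "Tack för ditt inlägg!" written with code point escapes.
print confirmation_page("Tack f\x{F6}r ditt inl\x{E4}gg!");
```

Changing the header alone only tells the browser what to expect; the bytes printed after it have to be in that encoding as well, which is what the `encode` call is there for.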
pandy |
Sep 4 2018, 04:27 PM
Post
#6
|
QUOTE QUOTE Meta tags in the script? How would that work?
QUOTE Sorry, it was headers, not META tags. Like this one:
QUOTE CODE print "Content-Type: text/html; charset=iso-8859-1\n\n";
QUOTE It's done for each confirmation/error page.
QUOTE QUOTE There's more to it. Have you read this? https://perldoc.perl.org/perlunicode.html I haven't more than glanced at it. But I think you may find something there.
QUOTE QUOTE But why is it important that the page is UTF-8? Can't you just go back to what you had?
QUOTE I could have made an ISO 8859-1 exception for the guestbook form page, but the site uses inclusion files like nav menus that need the same encoding on all pages, so I'd have to change the whole site back to ISO 8859-1, which felt like even more work.
Ack. Then you have to read the Perl doc page I linked to.
|