Bug 474 : export to web, wrong character encoding in html
Last modified: 2007-02-17 10:49



Assigned To:

Attachment: example export to web showing wrongly encoded umlauts.
Type: application/zip   Created: 2006-12-29 12:28   Size: 156.36 KB

Description:   Opened: 2006-12-29 12:26
when you have umlauts in the comment at the top of your sketch and export-to-web, the
included comment in the html is wrongly encoded. it says it's "iso-8859-1" in the html and the
file itself is ISO Latin 1 (on my machine) but the umlauts end up being garbled. they are ok in
Processing and in the source-pde.

Additional Comment #1 From fjen 2006-12-29 12:28
example export to web showing wrongly encoded umlauts.
Additional Comment #2 From fry 2007-01-03 12:05
actually the encoding will be the default encoding that java uses on that
platform. in the html we should probably say that it's actually us-ascii
(or whatever the html equiv is for that) since people shouldn't be
embedding anything but straight ascii into their pages and using &blah; and
friends to encode things.
Additional Comment #3 From fjen 2007-01-03 12:55
ok. but what i don't get is where the wrong encoding happens since the pde is correct and just
the html is wrong. see what i mean? if my machine uses same encoding for both of them, then
they should both end up working (i'm checking in bbedit and safari).

would it be much trouble to have the html encoded in utf-8? then we could just html-entify
anything (from the comments) that exceeds ascii into hex entities.
see html_utf8() in .../dev/bugs/login.php
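
a minimal sketch of the "html-entify" idea above, in java. the class and method names (HtmlEntities, entify) are hypothetical, and this is not a port of html_utf8() from login.php, whose contents are not shown here. it only rewrites code points above ascii as hexadecimal character references and leaves &, < and > alone.

```java
final class HtmlEntities {
    private HtmlEntities() {}

    /**
     * Hypothetical helper: replaces every code point above 0x7F with a
     * hexadecimal character reference such as &#xFC;. ASCII is copied as-is.
     */
    static String entify(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Read whole code points so characters outside the BMP
            // (stored as surrogate pairs) become a single reference.
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII.
        System.out.println(entify("gr\u00FC\u00DFe")); // prints gr&#xFC;&#xDF;e
    }
}
```

because the output is pure ascii, it reads the same whatever charset the html declares.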


Additional Comment #4 From fjen 2007-01-29 18:56
solution (svn-rev 3025):

in Sketch.java change line 1973 to:
PrintStream ps = new PrintStream(fos, false, "UTF-8");

and line 1988 to:
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));

change both lib/export/applet.html and lib/export/applet-opengl.html to have:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
and make sure they are utf-8 encoded when saving them.

works perfectly for me.
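
a self-contained sketch of the two constructor calls above, showing that text written and read with an explicit "UTF-8" survives regardless of the platform default. the class Utf8RoundTrip and its methods are made up for illustration, and in-memory streams stand in for the fos and is used in Sketch.java.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.UncheckedIOException;

final class Utf8RoundTrip {
    private Utf8RoundTrip() {}

    /** Writes text through a PrintStream that is told to use UTF-8. */
    static byte[] encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintStream ps = new PrintStream(bytes, false, "UTF-8")) {
            ps.print(text);
        } catch (IOException e) {
            // Only reachable if the charset name is unsupported.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the first line back through a reader that is told to use UTF-8. */
    static String decodeFirstLine(byte[] data) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), "UTF-8"))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String umlauts = "\u00E4\u00F6\u00FC"; // a-umlaut, o-umlaut, u-umlaut
        byte[] data = encode(umlauts);
        System.out.println(data.length);                           // prints 6
        System.out.println(umlauts.equals(decodeFirstLine(data))); // prints true
    }
}
```

without the charset argument, both constructors fall back to the platform default, which is where the mismatch with the charset declared in the html can creep in.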
Additional Comment #5 From fry 2007-02-01 14:04
iso-8859-1 is the same thing as iso-latin-1 so i'm not sure what's going
on.. java's default encoding for /reading/ files might actually be
mac-roman for US and western european systems, not sure.

i'm concerned about switching to utf-8, since that requires everyone to use
a utf8 editor (more common on osx than windows, to my knowledge).
Additional Comment #6 From fry 2007-02-01 14:06
aha! that was it:
produces "MacRoman"

so i just need to alter the reader/writer to be 8859-1.

tho i guess that's not great for jp/ko/etc..
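
the snippet that produced "MacRoman" is missing from the comment above. one way to inspect the platform default from java is sketched below. this is an assumption about how such a check could be done, not necessarily what was run, and the class name DefaultEncodingCheck is made up.

```java
import java.nio.charset.Charset;

final class DefaultEncodingCheck {
    private DefaultEncodingCheck() {}

    /** Name of the charset used when no encoding is passed to readers/writers. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    /** The system property the default has traditionally been derived from. */
    static String fileEncodingProperty() {
        return System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // The output depends on the JVM version, OS and locale settings.
        System.out.println("default charset: " + defaultCharsetName());
        System.out.println("file.encoding:   " + fileEncodingProperty());
    }
}
```

the result varies by platform and jvm version, so code that must agree with a charset declared in html is safer passing the encoding explicitly.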
Additional Comment #7 From fry 2007-02-01 14:09
casey, do you have thoughts on this? should we switch to utf-8 for everybody?
Additional Comment #8 From fjen 2007-02-02 02:05
i think it's better to support as many languages as possible from the start since not too many
people actually edit the html-files. even if one of the windows people wants to edit an
exported file, it's easy for them to change it to their favorite encoding while working on it.
Additional Comment #9 From fry 2007-02-17 10:49
k, we're going utf-8 throughout. casey and i decided that since it's a
small number of people who it will affect, we have to go with what's safest
for them. also added a comment to the html files stating that the encoding
must remain utf-8, so that people don't try to get clever and change it.
all set for 0125.