UTF-8 processing

Chris Young chris.young at unsatisfactorysoftware.co.uk
Fri Jul 14 12:39:38 BST 2017



On 14 July 2017 12:10:47 BST, Bernard Boase <b.boase at bcs.org> wrote:
>Looking in detail at some recent HTML email attachments (received in 
>Messenger Pro), Netsurf's rendering of them seems to have a dependence 
>on the coding of the <meta content=""> tag.
>
>When this tag is present and includes:
>
>     content="text/html charset=utf-8"
>
>any non-ASCII characters are shown as the three bytes of their UTF-8 
>encoding, whereas if the two attributes are separated by semicolon:
>
>     content="text/html; charset=utf-8"
>
>the rendering (using Unicode font DejaVu) looks correct.
>
>Is this a known problem?

I don't know if it's a known problem, but certainly the first form is incorrect. The semicolon is required to separate the parameters.

If NetSurf doesn't know the encoding, it will assume ASCII/ISO-8859-1, or perhaps the operating system's default character encoding.
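To illustrate the effect, here is a minimal sketch, not NetSurf's actual parser, that treats the content value the way a strict Content-Type parser would: the media type comes first and each parameter follows a ';'. The function names `charset_from_content` and `render` are hypothetical, and the ISO-8859-1 fallback is an assumption standing in for whatever default the browser really uses. Without the semicolon, "charset=utf-8" is never seen as a parameter, so the UTF-8 bytes get decoded one byte per character.

```python
def charset_from_content(content: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type value, or `default`.

    Hypothetical, simplified parser: parameters are recognised only after ';'.
    The default (ISO-8859-1) is an assumed fallback, not NetSurf's setting.
    """
    # The first ';'-separated field is the media type; the rest are parameters.
    _media_type, *params = content.split(";")
    for param in params:
        name, sep, value = param.strip().partition("=")
        if sep and name.strip().lower() == "charset":
            # Allow an optionally quoted value such as charset="utf-8".
            return value.strip().strip('"').lower()
    return default


def render(raw: bytes, content: str) -> str:
    """Decode document bytes using the charset found in the content value."""
    return raw.decode(charset_from_content(content))


data = "€".encode("utf-8")  # the euro sign is three bytes in UTF-8
print(render(data, "text/html; charset=utf-8"))       # correct form
print(repr(render(data, "text/html charset=utf-8")))  # missing semicolon
```

With the semicolon the euro sign decodes as a single character; without it the fallback decoding yields three characters, one per UTF-8 byte, which matches the symptom described above.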

Chris


