On Tue, 18 Jul 2006, Richard Porter wrote:
> On 18 Jul 2006 John-Mark Bell <jmb202(a)ecs.soton.ac.uk> wrote:
>> On Tue, 18 Jul 2006, Richard Porter wrote:
>>> I'm getting spurious A-circumflex characters on some pages in front of
>>> characters like the pound sign. Specifically, I've filled in a form in
>>> which I used the pound sign twice in a text area. When the updated
>>> page showing my message comes back there's a spurious A-circumflex in
>>> front of the pound sign (encoded Â£ in the source).
>> It's not spurious; the UTF-8 encoding of a pound sign will display like
>> that if sent as Latin1.
> OK, but even if it's legit, its display on the screen is spurious.
No; how does the browser know what encoding the content is in? The server
tells it (and where it doesn't, the user agent should assume
ISO-8859-1[1]). Therefore, the behaviour is well defined and NetSurf is
doing the right thing here with regard to display.
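To make the display behaviour concrete, here is a minimal Python sketch (not NetSurf's own code) of what a client does: read the charset from the server's Content-Type header, fall back to ISO-8859-1 when none is given, and decode the bytes with it. The helper name `charset_from_content_type` is hypothetical.

```python
from email.message import Message


def charset_from_content_type(header, default="iso-8859-1"):
    """Return the (lower-cased) charset parameter of a Content-Type value,
    or the default when the server didn't supply one."""
    msg = Message()
    msg["Content-Type"] = header
    return msg.get_content_charset(default)


# The server sends the UTF-8 bytes for a pound sign...
body = "£".encode("utf-8")  # b'\xc2\xa3'

# ...but labels the page without a charset, so the client falls back to ISO-8859-1.
charset = charset_from_content_type("text/html")
print(charset)               # iso-8859-1
print(body.decode(charset))  # Â£

# With a correct label, the same two bytes decode to a single pound sign.
print(body.decode(charset_from_content_type("text/html; charset=UTF-8")))  # £
```

The A-circumflex is simply byte 0xC2 read as an ISO-8859-1 character; nothing was inserted.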
Additionally, as you pointed out above, the source of the page contains
"Â£" encoded as HTML entities, so NetSurf's rendering is correct
_regardless of the character set of the page_ (HTML entities are defined
in terms of Unicode code points, not the page's encoding).
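The entity point can be illustrated with Python's `html.unescape`. The exact markup on the PlusNet page isn't known, so this sketch assumes the source held the numeric references `&#194;&#163;` (the named forms `&Acirc;&pound;` behave identically).

```python
import html

# Hypothetical page source: assume the text was stored as character references.
source = "Price: &#194;&#163;10"

# References name Unicode code points (U+00C2, U+00A3), so they always
# decode to the same two characters.
text = html.unescape(source)
print(text)  # Price: Â£10

# The markup itself is pure ASCII, so decoding it under either charset
# first makes no difference to the result.
raw = source.encode("ascii")
for charset in ("iso-8859-1", "utf-8"):
    print(charset, html.unescape(raw.decode(charset)))
```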
>>> Is this character being inserted when I submit the form, or is it
>>> being added by the web server concerned?
>> NetSurf goes to great lengths to ensure that it sends text in the correct
>> encoding for the server concerned when submitting a form. It sounds
>> as if it's either getting it wrong here, or the site in question is
>> doing something strange. Without more details about the site in question
>> (a URL to the page, for example) I can't say any more.
> It's PlusNet's ticketing system. You'd need to be a PlusNet customer,
> but if necessary I can save the pages and raise a bug report. I didn't
> want to do that if the problem is outside NetSurf's control.
Well, I've run NetSurf through my test cases for encoded form submission
and can't see anything amiss: for a page encoded in ISO-8859-1, NetSurf
submits pound signs as %A3, which is correct. For a UTF-8 encoded page, it
submits them as %C2%A3, which is also correct.
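The two submission cases can be reproduced with Python's standard library, which converts form text to a chosen charset before percent-encoding it. The last line shows one hypothetical way the reported symptom could arise: a UTF-8 submission decoded by a server that assumes ISO-8859-1. Nothing here establishes that this is what PlusNet's system does.

```python
from urllib.parse import parse_qs, urlencode

# A form field containing a pound sign, as typed by the user.
fields = {"message": "£5"}

# Same character, different bytes on the wire depending on the page's charset.
print(urlencode(fields, encoding="iso-8859-1"))  # message=%A35
print(urlencode(fields, encoding="utf-8"))       # message=%C2%A35

# Hypothetical mismatch: UTF-8 form data decoded as ISO-8859-1 on the server.
print(parse_qs("message=%C2%A35", encoding="iso-8859-1"))  # {'message': ['Â£5']}
```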
What I do need to know is the encoding of the page containing the form,
along with a saved copy of that page (a normal save, not a full save).
I also need the same for the page that displays the submitted content
(including the encoding information). To discover what encoding NetSurf
thinks a page is in, see the page info dialog (Page->Info on the main
menu). Please send this to me off-list.
[1] If you believe the spec, anyway. Some sites assume that the UA defaults
to CP1252 in this situation. NetSurf doesn't, and the only real drawback
to this is that "smart" quotes on some pages don't display.
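The smart-quote drawback comes from the 0x80 to 0x9F byte range: CP1252 assigns printable characters there, while ISO-8859-1 maps those bytes to C1 control characters, which don't display. A short Python check, using the bytes Windows uses for curly double quotes:

```python
# Bytes 0x93 and 0x94: "smart" double quotes in Windows-authored text.
raw = bytes([0x93, 0x94])

cp1252 = raw.decode("cp1252")      # printable curly quotes U+201C, U+201D
latin1 = raw.decode("iso-8859-1")  # invisible C1 controls U+0093, U+0094

print([hex(ord(c)) for c in cp1252])  # ['0x201c', '0x201d']
print([hex(ord(c)) for c in latin1])  # ['0x93', '0x94']
```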