On Thu, 2009-05-28 at 14:40 -0700, WP Blatchley wrote:
>> Could an Iyonix owner tell me if RISC OS 5 can properly display
>> UTF-8 encoded messages in windows and menus?
> The RO5 Wimp speaks Unicode. If you set your system alphabet to UTF8,
> then it will display such strings correctly. The downside, however, is
> that most applications have no knowledge of UTF-8, so their menu text
> will likely come out garbled (invariably, it's the shift arrow which
> confuses matters).
> My understanding is that for the embedded RO boxes shipped in Japan, the
> system alphabet was set to UTF8 by default. I suspect it's likely that
> you will need to softload ROOL International and InternationalKeyboard
> modules to be able to set the system alphabet to UTF8. I suspect that
> it's unwise to attempt to softload the ROOL Wimp on a ROL OS, however.
So if I were to translate the necessary files, someone could at least give
me a screenshot of it running on RO5?
I expect so, yes.
That would be satisfying! I'll have to try to get RO5 running under
emulation on Windows at least, so I can see my translation in action!
There's not yet a complete RO5 ROM image for emulation.
It's a shame that setting the system alphabet to UTF8 will break a lot of
applications' menus (and slightly ironic, seeing as UTF8 was designed to
slot into non-Unicode aware systems without causing problems).
Top-bit-set characters will always be misinterpreted, in the general
case.
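To make that concrete, here is a small Python sketch of both directions of
the misreading. It is an illustration only, not anything RISC OS runs, and
it assumes ISO 8859-1 as a stand-in for Acorn Latin 1 (the two differ in
the 0x80-0x9F range).

```python
# Illustration only: ISO 8859-1 stands in for Acorn Latin 1 here.
text = "caf\u00e9"  # "cafe" with an acute accent on the final letter

utf8_bytes = text.encode("utf-8")      # the accented letter becomes 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # the accented letter becomes 0xE9

# UTF-8 bytes read by a Latin-1 system: every top-bit-set byte is
# treated as a character in its own right, so one letter becomes two.
misread_as_latin1 = utf8_bytes.decode("latin-1")

# Latin-1 bytes read by a UTF-8 system: a lone 0xE9 is the start of an
# incomplete multi-byte sequence, so it cannot be decoded as written.
misread_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(repr(misread_as_latin1))
print(repr(misread_as_utf8))
```

Plain ASCII is unaffected in either direction; only bytes with the top
bit set go wrong.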
Perhaps there could be a hack written that tries to assess whether a
Messages file is UTF8-encoded or Latin-1-encoded, and transcode the
strings accordingly on the fly... Still, that's a discussion for another
mailing list, I suppose.
It's somewhat awkward as the thing doing the translation has no idea
where the strings will be used. Therefore, transcoding messages as
they're loaded from the Messages file will probably break things.
Of course, it's possible to trigger such transcoding by sticking some
magic value in the Messages file, but that then requires application
authors to do whatever work is necessary to cope with it. Additionally,
many applications a) don't use MessageTrans and b) don't have Messages
files, so you won't catch those by extending MessageTrans.
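As a rough sketch of the magic-value idea, in Python for brevity: the
marker line below is invented purely for illustration (no such convention
exists in MessageTrans as far as I know), and ISO 8859-1 again stands in
for Acorn Latin 1. It only relies on the usual Messages layout of "#"
comment lines and Token:Value lines.

```python
# HYPOTHETICAL marker, invented for this sketch; not a real convention.
HYPOTHETICAL_MARKER = b"# encoding: UTF-8"


def load_messages(raw, legacy_encoding="latin-1"):
    """Parse Token:Value lines, choosing the decoding from the first line."""
    lines = raw.splitlines()
    is_utf8 = bool(lines) and lines[0].strip() == HYPOTHETICAL_MARKER
    encoding = "utf-8" if is_utf8 else legacy_encoding

    messages = {}
    for line in lines:
        # Skip blank lines and comments (including the marker itself).
        if not line or line.startswith(b"#"):
            continue
        token, sep, value = line.partition(b":")
        if sep:
            messages[token.decode(encoding)] = value.decode(encoding)
    return messages
```

Note that this only tells the loader how the file is encoded. It does
nothing for the harder problem above, namely that the loader doesn't know
where each string will end up being used.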
For strings drawn in icons and menus by the Wimp, it should be
relatively easy for the Wimp to distinguish between UTF-8 text and
legacy 8-bit text. The difficulty arises with working out which 8-bit
character set the text is in. In most cases, it'll be Acorn Latin 1, but
it's guaranteed that there are edge cases.
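The distinguishing step could look something like the following Python
sketch. It is not how the Wimp does (or would do) it; the function name is
made up, and the fallback to ISO 8859-1 is an assumption standing in for
"whichever 8-bit character set the text is really in".

```python
def decode_label(raw, legacy_encoding="latin-1"):
    """Treat raw as UTF-8 if it is well-formed, else as legacy 8-bit text."""
    try:
        # Strict decoding rejects stray top-bit-set bytes, which is what
        # most legacy 8-bit strings containing accents will look like.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the legacy text is in legacy_encoding. Picking the
        # right one is exactly the difficulty described above.
        return raw.decode(legacy_encoding)
```

The edge cases are legacy strings that happen to be well-formed UTF-8: the
Latin-1 byte pair 0xC3 0xA9 would be taken for a single accented letter
rather than the two characters it was meant to be.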
It occurs to me that it's likely that the Wimp doesn't even inspect
strings to be drawn -- it just throws them at the Font Manager, having
opened the desktop font without an explicit encoding specifier, so it'll
use whatever the system alphabet is set to. Perhaps, therefore, the Font
Manager should attempt to fix this case up.
Finally, setting the system alphabet to UTF-8 changes the way in which
the Font Manager works when an application opens a font without an
explicit encoding. That's going to break many things because far too few
applications deal with character sets and font encodings at all, let
alone properly.
Any pointers as to where to start with the translation files?
Copy !NetSurf.Resources.en.Messages to !NetSurf.Resources.ja.Messages,
translate the strings, and send us the resulting file. That's about it.
The text at the top of the Messages file describes what the format is.
Modern Zap can edit UTF-8 encoded files quite happily -- you may have to
tell it that a file is UTF-8 encoded; I can't remember, off hand.
J.