---------- Forwarded message ----------
Date: Wed, 17 Oct 2007 21:42:38 +0200
From: Franz Korntner <franz(a)digital-connectivity.com>
To: John-Mark Bell <jmb(a)netsurf-browser.org>
Subject: Re: Source scrubbing

John,

I'm not sure if this is a private reply or if it gets posted on the
mailing list.

> I currently don't have time to look at your patch (it'll probably be next
> week when I get time), but 10000 lines is most likely too big to evaluate
> in one go. Is there any possibility that you could break it up into more
> manageable chunks?

Nearly all patch snippets are independent, so you can slice it into chunks
as large as you like. The smaller patch file is an extract containing the
cherry-picked changes.

The patch consists largely of added typecasts and of layout changes that
fix the scope of inner enums. It also contains lots of explicit casts from
float to int to mark the loss of precision. In some places I made
signed/unsigned, int/long and const usage uniform. Other things I found
I'll keep for later, so as not to make the patch too complicated.
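
To give a feel for the flavour of change, a made-up hunk (not lifted from
the actual patch; the variable names are invented):

	-	y1 = y0 + box->height * scale;
	+	y1 = y0 + (int) (box->height * scale);	/* truncation intended */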

What I did bump into:

- Some const nasties with regard to tree nodes. The text field is sometimes
  populated by a const string, sometimes by a 'shared string' and sometimes
  by a free()able one. This makes the code very complex and breaks the
  const attribute. I really suggest that when a node is allocated, you
  allocate a couple of bytes extra and copy the title field into the node
  itself (see the first sketch after this list). This simplifies the memory
  management: you can drop the related bookkeeping code and be stricter
  about const.

- The rendering coordinate system is limping on two legs (ints and floats),
  and it feels like this has hidden side effects. I also believe I found
  some places (in the CSS code) where sizes and counts are getting mixed. I
  am currently looking into getting the two separated.

- In some places enums are used where bitfields are meant, resulting in
  lots of warnings.

- For example, in the CSS code there is the struct css_border_width, which
  contains a css_length and a percent field. A percentage is technically
  not a length, but the two appear to be mutually exclusive. Broadening the
  concept of css_length might optimise both the code and the storage size
  (see the second sketch after this list).
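
To make the tree-node suggestion concrete, here is a minimal sketch (all
names are mine and the real node has more fields; take it as the idea, not
as a patch):

	#include <stddef.h>
	#include <stdlib.h>
	#include <string.h>

	/* The node owns an inline copy of its title, so callers no longer
	 * need to care whether the original string was const, shared or
	 * malloc()ed. */
	struct node {
		struct node *parent, *children, *next;	/* tree links */
		char text[1];	/* title copied inline; the extra bytes are
				 * allocated past the end of the struct */
	};

	struct node *node_create(const char *title)
	{
		size_t len = strlen(title);
		struct node *n = calloc(1, offsetof(struct node, text) + len + 1);

		if (n == NULL)
			return NULL;
		memcpy(n->text, title, len + 1);	/* node owns its copy */
		return n;
	}

Freeing the node then frees the title with it, and every caller can pass
whatever kind of string it happens to hold.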
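
And a sketch of what I mean by broadening css_length (again, the names and
fields are from memory and partly invented; the point is that percent
becomes just another unit, so the extra field disappears):

	typedef enum {
		CSS_UNIT_PX, CSS_UNIT_EM, CSS_UNIT_PT, /* ... */
		CSS_UNIT_PCT		/* new: percentage as a unit */
	} css_unit;

	struct css_length {
		float value;
		css_unit unit;	/* CSS_UNIT_PCT replaces the separate
				 * percent field in css_border_width */
	};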

Finally, I constructed autoconf/automake templates, as I require certain
functionality that the supplied makefile does not deliver. I suggest you
include these files so that package/distribution builders can choose which
build system to use.

> Are you suggesting that the existing build system be replaced? If so,
> that's a non-starter, as it's highly unlikely that autotools will ever
> work on RISC OS.

No, keep the build system! What I suggest is including the
autoconf/automake templates so that they can be activated on request. By
themselves the templates are harmless and do not stand in the way of, or
interfere with, the existing build system. I need them because of my
non-standard environment.

>> For the same reasons I suggest you include precompiled versions of the
>> lemon/re2c generated files.

> Currently, these are available from
> http://netsurf.strcprstskrzkrk.co.uk/developer/ -- they're recreated
> automatically as needed. The CSS parser is due for a major overhaul at
> some point in the relatively near future. I currently have no idea
> whether this will have an impact on the use of lemon/re2c.

I also maintain distributions. In general it is a pain to recreate
intermediate files, and it is annoying when their contents are not affected
by the build/hosting environment anyway. I have even encountered massive
problems reproducing older packages because the required tools were
unavailable or difficult to rebuild. These files do not require build-time
regeneration.

>> Good to know. I have a good feeling that DOM functionality can easily be
>> injected into the current HTML parser.

> The current layout engine cannot cope with the document changing.
>
> That depends; whilst libxml already produces a tree which is fairly close
> to the W3C DOM, its HTML parser is particularly non-robust to real-world
> web content. Additionally, its architecture is not suited to handling
> injection of data into the document source stream (as required by certain
> scripting methods -- namely document.write()).

I formulated this one really poorly, as I actually meant something
different. However, you did answer a question I had not yet asked; I'll
return to this subject later.

>> I am looking for a small-footprint package and get nervous at the
>> prospect of introducing a complete and/or standalone DOM component. This
>> might make things easier.

> I'm not sure I understand this. Are you saying that a standalone DOM
> component is a good thing or not? Note that a standalone HTML parser and
> core DOM implementation is likely to be smaller than a binary of libxml,
> and that both the HTML parser and the DOM implementation will be
> standalone libraries.

I meant that libxml seems suitable enough for an implied DOM model, i.e. it
seems overdone to maintain a DOM tree separate from libxml. I meant using
libxml itself for the DOM.
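
To illustrate: libxml's parse tree already walks like a DOM, with children
playing firstChild and next playing nextSibling. A minimal sketch against
the libxml2 tree API (the walker and the names around it are mine, error
handling omitted):

	#include <stdio.h>
	#include <string.h>
	#include <libxml/HTMLparser.h>
	#include <libxml/tree.h>

	/* Recurse over libxml's tree exactly as one would over a DOM. */
	static void walk(xmlNodePtr node, int depth)
	{
		for (; node != NULL; node = node->next) {
			if (node->type == XML_ELEMENT_NODE)
				printf("%*s%s\n", depth * 2, "",
						(const char *) node->name);
			walk(node->children, depth + 1);
		}
	}

	int main(void)
	{
		const char *buf = "<html><body><p>hi</p></body></html>";
		htmlDocPtr doc = htmlReadMemory(buf, (int) strlen(buf),
				"inline.html", NULL,
				HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING);

		walk(xmlDocGetRootElement(doc), 0);
		xmlFreeDoc(doc);
		return 0;
	}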

> The two libraries I currently favour are SpiderMonkey (which is what I
> think you meant, not SeaMonkey ;) and libsee. However, JavaScript work is
> some way down the line, so a decision on this hasn't happened yet.

I just installed a newer version of SeaMonkey; I guess that was echoing in
my head. I haven't investigated libsee, but I did look at SpiderMonkey and
it felt good. It has a compiler, an interpreter and object handling with
the objects you would expect, plus hooks for a DOM. I was surprised by the
language's capabilities; it seems JS is more mature than I imagined.

I was also looking into the parser/scanner. Having separate parser/scanner
combos for JS and CSS seems silly, but if you are looking for a lemon/re2c
substitute, I don't know how well the SpiderMonkey scanner/parser could
handle CSS. As I said before, this has no high priority, as I expect many
unexpected nasties. If I have time to spare, I would rather get NetSurf
through CSS compliance testing.
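
For reference, the embedding dance as I understood it from the JSAPI
documentation (a minimal sketch written from memory, so details may be off;
error checking omitted):

	#include <string.h>
	#include "jsapi.h"

	/* Class for the global object; the stubs are JSAPI's no-ops. */
	static JSClass global_class = {
		"global", JSCLASS_GLOBAL_FLAGS,
		JS_PropertyStub, JS_PropertyStub, JS_PropertyStub,
		JS_PropertyStub, JS_EnumerateStub, JS_ResolveStub,
		JS_ConvertStub, JS_FinalizeStub,
		JSCLASS_NO_OPTIONAL_MEMBERS
	};

	int main(void)
	{
		JSRuntime *rt = JS_NewRuntime(8L * 1024L * 1024L);
		JSContext *cx = JS_NewContext(rt, 8192);
		JSObject *global = JS_NewObject(cx, &global_class, NULL, NULL);
		const char *src = "6 * 7";
		jsval rval;

		JS_InitStandardClasses(cx, global);	/* Object, Math, ... */
		/* A browser would hang its DOM objects off |global| here. */
		JS_EvaluateScript(cx, global, src, strlen(src),
				"inline", 1, &rval);

		JS_DestroyContext(cx);
		JS_DestroyRuntime(rt);
		JS_ShutDown();
		return 0;
	}
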
Franz.