Style selection, caching, and general architecture

John-Mark Bell jmb at netsurf-browser.org
Wed Feb 2 23:07:08 GMT 2011


Definitions
-----------

1) Document          -- the HTML document being displayed
2) Selection Context -- An ordered collection of stylesheets from which 
                        element styles are selected.
3) DOM               -- Our internal document representation and 
                        management function (which just happens to be
                        accessible through the standardised DOM
                        interface)

Assumptions
-----------

1) There is a single Selection Context associated with a Document
2) A sane DOM & efficient event dispatch/delivery mechanism
3) A rendering engine that expects dynamic document changes

General premises
----------------

1) The most efficient way to do something is not to do it at all
2) If something has to be done, do it as late as possible and cache the 
   result
3) Modifying or replacing the Selection Context is a rare occurrence

A quick style selection refresh
-------------------------------

Selection of styles is performed for us by libcss. It models a Selection
Context for us, into which we can insert pre-parsed stylesheets.

When we want to find a style for a given DOM node (the Target Element),
the following happens:

  style = initial_style

  for stylesheet in selection_context:
    if stylesheet.enabled and is_applicable(stylesheet.media):
      stylesheet.select(style)

  # where stylesheet.select is defined as:
  stylesheet.select(style):
    for imported in stylesheet.imports:
      imported.select(style)
    for selector in stylesheet.selectors:
      if is_applicable(selector):
        selector.styles.apply(style)

That is, process every stylesheet in the selection context, in order,
recursing into any imported stylesheets and considering every selection
rule in every stylesheet.

At least, that's what would happen, were libcss' selection engine a
thoughtless translation of the specification into an implementation.
Fortunately for us, it isn't, and it optimises a number of things.

We note that there are significantly more selectors that don't
match the target element than there are that do match it. Thus, what we
actually want is to be able to rapidly reject entire selector chains
at the earliest opportunity. This is best done by not even needing to
consider those selector chains that obviously don't match the target
element.

Internally, libcss maintains a number of hash tables containing
selectors:

  1) A hash for selectors whose last simple selector contains an 
     element name
  2) A hash for selectors whose last simple selector is universal
     (no element name) but contains a class
  3) A hash for selectors whose last simple selector is universal
     (no element name) but contains an ID
  4) A single chain of selectors that don't fall into the above 
     categories

When selecting, we extract from the target element its node name, a list
of classes it belongs to (if any), and its ID (if any). We then use this
information to find the hash chain(s) which contain selectors that
potentially match the target element. We must always consider all
selectors in the catch-all chain (which, ideally, contains as few
selectors as possible).
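
As a rough illustration of the bucketing described above, here is a
minimal Python sketch. It is not libcss code: SelectorIndex, add() and
candidates() are names invented for this example, and selectors are
plain strings filed by hand rather than parsed. It only shows how the
candidate set is narrowed; each candidate would still need to be matched
in full against the target element.

```python
from collections import defaultdict


class SelectorIndex:
    """Toy index keyed on the last simple selector (hypothetical names)."""

    def __init__(self):
        self.by_element = defaultdict(list)
        self.by_class = defaultdict(list)
        self.by_id = defaultdict(list)
        self.universal = []  # the catch-all chain

    def add(self, selector, element=None, class_name=None, id_name=None):
        # Each selector is filed in exactly one place.
        if element is not None:
            self.by_element[element].append(selector)
        elif class_name is not None:
            self.by_class[class_name].append(selector)
        elif id_name is not None:
            self.by_id[id_name].append(selector)
        else:
            self.universal.append(selector)

    def candidates(self, name, classes=(), id_name=None):
        # Only chains that could possibly match are visited.
        found = list(self.by_element.get(name, []))
        for cls in classes:
            found.extend(self.by_class.get(cls, []))
        if id_name is not None:
            found.extend(self.by_id.get(id_name, []))
        # The catch-all chain must always be considered.
        found.extend(self.universal)
        return found


index = SelectorIndex()
index.add("div p", element="p")
index.add("ul > li", element="li")
index.add(".note", class_name="note")
index.add("#main", id_name="main")
index.add("[lang]")  # neither element, class, nor ID: catch-all

print(index.candidates("p", classes=["note"]))
```

For a <p class="note"> target, the "ul > li" and "#main" selectors are
never looked at.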

Further optimisations are possible but, even so, style selection will
still be a relatively expensive operation. Therefore, we don't want to
be doing it all the time.

Style changed events
--------------------

Obviously, if we don't want to be selecting style information
continually, we must maintain a cache on the target element. For this to
be reliable, we must know when the cached styling has become invalid.

The mechanism we use for notifying the Document of changes is an
extension to the DOM event model. We introduce a new event, Style
Changed, which is fired at the root of the affected DOM subtree.

There are two scenarios in which this event might be generated:

  1) when the underlying styling *has* changed (e.g. if the
     Selection Context for the document is modified or replaced)
  2) when the underlying styling *may have* changed (e.g. if the
     target element's position in the DOM has changed)

The first of these is easy to detect as it can only occur when the
Selection Context has changed in some way. When this happens, we fire a
Style Changed event at the Document and it must invalidate the styling
on all DOM nodes and reselect as necessary. Examples of Selection
Context changes might be the addition/removal of stylesheets, or
script-based modifications to style declarations.

The second is rather more difficult to determine, particularly if we
want to ensure that only the minimum number of nodes are affected (and
thus the minimum number of style recomputations are performed).

Causes of potential style changes
---------------------------------

In general, there are two main reasons why the styling of a node may
change:

  1) Modification of the properties of the node itself (e.g. movement of
     the node within the DOM, changes to attribute values, or whether a 
     form input is enabled or not)
  2) Modification of the properties of other nodes

The first of these is obvious enough: the node has changed, so its
styling may well have done so, too. It's also simple enough to detect
(given the DOM will have been responsible for making the relevant
changes).

The more complex case is detecting the impact of changes to other nodes.
This is probably best explained with an example. 

Given the following DOM tree:

          root
         /  |  \
        a - b - c
        |  / \  |
        d e - f g

The styling for a node can depend upon the properties of itself, its
siblings, its ancestors, and their siblings. Take node 'g', for example;
its styling may depend upon the properties of the nodes: root, a, b, c,
and g.

With the following style rules, this is the case:

  root { background-color: red }
  b + c g { background-color: green !important }
  a ~ c > g { background-color: blue }

If we move node 'b' such that it becomes the last child of 'root', thus:

          root
         /  |  \
        a - c - b
        |   |  /|
        d   g e-f

then the background colour of 'g' will change from green to blue.
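
The example can be checked mechanically. The Python sketch below
hand-codes the two interesting rules as predicates rather than
implementing a general selector matcher; the Node class and helper names
are invented for the illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append(self, child):
        # Appending an already-attached node moves it.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node):
    while node.parent is not None:
        node = node.parent
        yield node


def previous_siblings(node):
    # Nearest sibling first.
    if node.parent is None:
        return []
    siblings = node.parent.children
    return siblings[:siblings.index(node)][::-1]


def matches_b_plus_c_g(node):
    # "b + c g": a g with an ancestor c immediately preceded by a b.
    if node.name != "g":
        return False
    for anc in ancestors(node):
        prev = previous_siblings(anc)
        if anc.name == "c" and prev and prev[0].name == "b":
            return True
    return False


def matches_a_tilde_c_child_g(node):
    # "a ~ c > g": a g whose parent c has some earlier sibling a.
    if node.name != "g" or node.parent is None or node.parent.name != "c":
        return False
    return any(s.name == "a" for s in previous_siblings(node.parent))


def background(node):
    if matches_b_plus_c_g(node):
        return "green"  # the important declaration wins when it matches
    if matches_a_tilde_c_child_g(node):
        return "blue"
    return "transparent"  # background-color is not inherited


root = Node("root")
a, b, c = (root.append(Node(n)) for n in "abc")
d = a.append(Node("d"))
e, f = b.append(Node("e")), b.append(Node("f"))
g = c.append(Node("g"))

before = background(g)
root.append(b)  # move b so that it becomes the last child of root
after = background(g)
print(before, "->", after)
```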

In general, there are four possible collections of nodes that will be
affected by a node moving in the DOM:

  1) The node itself
  2) The node's descendants
  3) The node's new siblings, and their descendants
  4) The node's previous siblings, and their descendants

Effectively, this means that the entire subtree beneath the node's
original parent and the entire subtree beneath the node's new parent may
be affected by the node moving. These subtrees may overlap (as they do
in the example).

An unoptimised implementation could simply mark all of the relevant
nodes' cached styles as invalid and force them all to be reselected.
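
A sketch of that unoptimised approach in Python follows. The tree is a
plain dict from node name to child names, and affected_by_move() is a
name made up for this example. It is called on the tree as it stands
after the move, with the old and new parents supplied by the caller.

```python
def descendants(children, name):
    # Every node strictly beneath `name`.
    out = []
    stack = list(children.get(name, []))
    while stack:
        current = stack.pop()
        out.append(current)
        stack.extend(children.get(current, []))
    return out


def affected_by_move(children, node, old_parent, new_parent):
    # Previous siblings and their descendants.
    affected = set(descendants(children, old_parent))
    # New siblings and their descendants (this also covers the moved
    # node and its descendants, which now live under new_parent).
    affected |= set(descendants(children, new_parent))
    affected.add(node)
    return affected


# The example tree after 'b' has become the last child of 'root'.
moved_within_root = {
    "root": ["a", "c", "b"],
    "a": ["d"],
    "b": ["e", "f"],
    "c": ["g"],
}
same_parent = affected_by_move(moved_within_root, "b", "root", "root")

# A different move on the original tree: 'e' goes from 'b' to 'a'.
moved_e = {
    "root": ["a", "b", "c"],
    "a": ["d", "e"],
    "b": ["f"],
    "c": ["g"],
}
different_parents = affected_by_move(moved_e, "e", "b", "a")

print(sorted(same_parent))
print(sorted(different_parents))
```

In the first case the two subtrees coincide and everything beneath root
is invalidated; in the second, only d, e and f are.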

Style caching
-------------

Given that it's a good idea to cache things, we need to work out what,
exactly, we need to cache. Fortunately, many of the properties that
feed into style selection are relatively static. For example, an
element's name is fairly unlikely to change. In general, if a node has
been affected by a change in the DOM, it's safest to invalidate all of
its cached styles and repopulate the cache when needed.

While we could cache variants of a node's style for every combination of
selector applicability, this is most likely to be extremely wasteful and
unnecessary. Given the static nature of (or, at least, the symbiotic
relationship between) many of the CSS selectors it's only likely to be
worthwhile to cache a few sets of style information for a node.

Firstly, we must cache the base style for a node. That is, the style
selected when none of the pseudo elements or the :active, :focus,
:hover, or :checked pseudo classes are applied. This implies that the
other pseudo classes and all other selector types are considered
reasonable for inclusion in the node's base style.

If there is a change to any of the properties that affect the
computation of the base style, then the cached base style must be
invalidated and reselected. This seems fair as they are either
structural or reflect data that is unlikely to change frequently. 

The styles computed when the pseudo element selectors are involved must
be treated independently of the node's base style as they reflect the
styling of parts of the render tree that are not (directly) backed by 
the DOM. They are, however, closely related to the node's base style and
may be stored as deltas against it and reselected at the same time.

The four pseudo classes exempt from the base style are the ones which
reflect interaction with the user. As with the pseudo elements, the
styles computed when they are applicable may reasonably be stored as
deltas against the base style. Additionally (and unlike the pseudo 
elements) they may be retrieved and cached on demand.
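
One possible shape for such a per-node cache is sketched below in
Python. StyleCache and the select callback are hypothetical, and styles
are plain dicts. The sketch assumes a computed style always carries a
value for every property, so a delta only needs to record values that
differ from the base. Only the on-demand pseudo class case is shown.

```python
class StyleCache:
    def __init__(self, select):
        # select(pseudo_classes) -> dict; stand-in for real selection.
        self._select = select
        self._base = None
        self._deltas = {}

    def base(self):
        if self._base is None:
            self._base = self._select(frozenset())
        return self._base

    def style(self, pseudo_classes=()):
        key = frozenset(pseudo_classes)
        if not key:
            return self.base()
        if key not in self._deltas:
            # Selected on demand, stored as a delta against the base.
            full = self._select(key)
            base = self.base()
            self._deltas[key] = {
                prop: value
                for prop, value in full.items()
                if base.get(prop) != value
            }
        merged = dict(self.base())
        merged.update(self._deltas[key])
        return merged

    def invalidate(self):
        # Deltas are meaningless without the base they refer to.
        self._base = None
        self._deltas.clear()


calls = []


def select(pseudo_classes):
    calls.append(pseudo_classes)
    style = {"color": "black", "text-decoration": "none"}
    if "hover" in pseudo_classes:
        style["text-decoration"] = "underline"
    return style


cache = StyleCache(select)
plain = cache.style()
hovered = cache.style(["hover"])
hovered_again = cache.style(["hover"])  # no further selection needed
print(hovered, len(calls))
```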

Relationship between DOM, styling, layout, and redraw
-----------------------------------------------------

At this point, it is sensible to consider the relationship between the
various components that come together to take a page and display it to
the user.

Effectively, we have this:

       +----------+                +----------+
       | Frontend |<---------------|  Redraw  |
       +----------+        |       +----------+
             |       Plot requests      ^
             |                          |
             |- Input events            |- Redraw requests
             v                          |
        +----------+               +----------+
        |   DOM    |-------------->|  Layout  |
        +----------+       |       +----------+
             ^             \
             |              `- Node changed events
             |
             |- Selection Context modified event
             |
        +----------+
        |  Style   |
        +----------+

That is, the DOM is the core component which is responsible for
detecting and reflecting changes to the document. Styling is an adjunct
to the DOM which acts as a repository from which to pick style
information, which is then cached by the DOM as it wishes.

Layout operates solely upon a stream of events from the DOM, using 
appropriate access methods to inspect DOM components and acquire
computed style information. It is responsible for maintaining its own
data structures for providing a visual representation of the document.

Finally, layout is also responsible for noticing changes within the
render tree that require redrawing or other such reflection in the UI.
On noticing such changes, it must send events to the redraw agent which
is responsible for displaying things.

Dynamic pseudo classes
----------------------

Aside from rendering, which causes events to propagate up from the DOM
to the frontend, we have user input, which requires events to travel
from the frontend down to the DOM for it to act upon (requesting
information about the affected nodes from layout).

The DOM must apply some hysteresis to these events as an anti-spam
measure and then, once satisfied that the event was meaningful, act upon
it. In this way, the dynamic pseudo classes :active, :focus, :hover and,
to a lesser extent, :checked may be supported.
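
What form the hysteresis takes is left open here. As one possibility, a
dwell-time filter for :hover might look like the Python sketch below.
HoverFilter, its 50ms threshold, and the use of strings for nodes are
all assumptions for the example; timestamps are passed in explicitly so
the behaviour is deterministic.

```python
_NO_CANDIDATE = object()


class HoverFilter:
    def __init__(self, dwell_ms=50):
        self.dwell_ms = dwell_ms
        self.hovered = None              # node currently carrying :hover
        self._candidate = _NO_CANDIDATE  # node the pointer is dwelling on
        self._since = 0

    def pointer_at(self, node, now_ms):
        """Return True if the :hover state changed as a result."""
        if node == self.hovered:
            # Pointer is back on the hovered node: forget any candidate.
            self._candidate = _NO_CANDIDATE
            return False
        if node != self._candidate:
            # New candidate: start timing, but do nothing yet.
            self._candidate = node
            self._since = now_ms
            return False
        if now_ms - self._since >= self.dwell_ms:
            # The pointer has stayed put long enough to be meaningful.
            self.hovered = node
            self._candidate = _NO_CANDIDATE
            return True
        return False


hover = HoverFilter(dwell_ms=50)
results = [
    hover.pointer_at("a", 0),    # passing over a
    hover.pointer_at("b", 10),   # arrives at b
    hover.pointer_at("b", 20),   # still too soon
    hover.pointer_at("b", 70),   # dwelt for 60ms: b gains :hover
    hover.pointer_at("b", 80),   # no change
]
print(results, hover.hovered)
```

Node 'a' never gains :hover, so no styles are fetched for it and layout
is never notified about it.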

The DOM will retrieve the appropriate styling for the affected nodes,
and notify layout about the change to the document. Layout acts upon
the notification and eventually effects a redraw by notifying the
redraw engine. Note that there is nothing preventing the layout engine
caching render parts in much the same way as the DOM caches style 
information.

Pseudo elements
---------------

The handling of pseudo elements is mostly left up to the layout engine.
As described above, the DOM will select and cache pseudo element styles
simultaneously with node base styles.





