Thanks for your thorough review and involved comments :)
Comments inline below...
On Mon, 2015-11-02 at 19:23 +0000, Richard Maw wrote:
On Mon, Nov 02, 2015 at 01:07:15PM +0900, Tristan Van Berkom wrote:
> Table of Contents
Wow, I've not seen a proposal that required a ToC before.
> 1 Problem Statement
> There are quite a few problems which are directly related to the current
> model, in the interest of outlining a project that is realistic and
> achievable; I will try to restrict this problem set as much as possible,
> as we cannot solve everything at the same time.
> 1.1 System Bloat
It is currently possible in morph (but not YBD) to reduce this,
as we can define stratum split rules to specify that
only some parts of a chunk get included in the stratum artifact,
then the system artifact declares what stratum artifacts it needs.
The current design is clunky and difficult to understand,
which has resulted in the fact that while it is technically possible,
nobody does it which makes the feature moot.
You are referring to a different level of system bloat, i.e. the
inclusion of what is typically considered 'devel' and 'doc' payload
output from a given module build.
The system bloat I am referring to is strictly at the module/chunk
level. Because we group chunks inside of strata at all levels of the
stack, we end up building and including entire modules in a system build
because of this strata sharing - entire modules which would not need to
be built or distributed in many specific use cases.
This is much more alarming to me than the fact that we currently end up
installing build tooling, header files and pkg-config files on a target
OS. That extra build metadata is only expensive in disk usage; while the
bloat I mean to address actually adds unneeded code to any given use
case specific target operating system. pkg-config files, header files
and documentation do not add complexity to the end result or product,
they do not run the risk of actually having an effect on the
functionality of the system.
That said, the bloat you refer to does exist, however it was very
intentional that I do not address this in the proposal. This is a half
solved problem in Baserock and will not be more, or less solved after
the proposed refactor.
> 2 Proposed Model: Runtime Dependencies and Flavors
> Simply put, the proposed model is to build the entire dependency graph
> by defining each and every chunk separately, where each chunk declares
> its direct dependencies explicitly.
> For convenience, we would keep the concept of "stratum" around, however
> they would not conceptually "contain" chunks. Strata would instead refer
> to existing chunks and strata as dependencies, as such they would be
> allowed to overlap in such a way that a system may include two strata
> which refer to some of the same chunks, but the build mechanics would
> revolve around building an entire dependency tree of chunks, while
> strata would only define which chunks and strata come together as
> logical groups.
Conceptually, this is what they already are,
I think the key distinction with where you want them to go
is that they are no longer mandatory to be able to define a chunk,
which allows chunks to configure things currently only definable in strata.
Sure, I want a clean split between chunks and strata, where the
paramount goal here is to ensure that chunks only ever depend on other
chunks, and entire strata are never implicitly pulled into a build as a
dependency of a chunk.
Granularity and flexibility are what I am after, leaving strata behind
simply as a convenient semantic for grouping logical sets together for
the definition of a system, or, as a convenience specifically for the
singular build-essential.morph, which would be the only acceptable
stratum for a chunk to depend on really.
> 2.1 Stratum, Chunk and System Morphologies
> For the sake of added clarity, let us consider an OOP approach to
> defining the concepts supported by the various Morphology "kinds",
> consider that a morphology "kind" is an object class.
> The class hierarchy looks like this:
>        Stratum
>        /     \
>     Chunk   System
> Which is to say, the Stratum Morphology type captures the common
> properties of Chunk and System types.
> Before continuing, let's just give a quick run down of what properties
> belong to which type in the class hierarchy.
I'm a bit cynical about Object Orientation,
especially since logically they are actually very different beasts,
and I think this hierarchy glosses over the actual details.
I'd rather there be some separate interface concept involved,
which just happens to have the same interface as the Stratum,
than asserting Chunks and Systems are somehow derived from Strata.
/ | \
Chunk Stratum System
Though since python is duck-typed rather than statically typed,
debating class hierarchies isn't massively valuable.
This object oriented approach to the explanation was an afterthought,
however you may be surprised at how much simpler and clearer things
became for the sake of explaining, and even for the sake of
understanding the problem, after looking at it this way.
The implementation language of the build tooling is of zero relevance to
this conversation, as are the precise methods with which the model is
implemented.
That said, I honestly disagree with the hierarchy you propose, for the
simple reason that you allow Stratum to grow properties and
functionality orthogonally to chunks.
Is there some justification for a Stratum to do *anything* more than
what it shares in common with a Chunk ? Which is to say:
o refer to build dependencies
o refer to runtime dependencies
o declare flavors/variants
I think it is in our best interest to disallow the stratum from gaining
any further feature creep, which should probably be handled at the chunk
or deployment levels instead.
> 2.1.2 Stratum
> a stratum may refer to a chunk as a dependency as conceptually speaking, a
> chunk is also a stratum.
Yeah, this is what was worrying me.
Can you elaborate on why this worries you ?
> Note that this structure is already conceptually very close to what we
> have, organizationally we still group Chunks in our defined Strata -
> however a Strata is not said to contain Chunks, instead a Strata depends
> on Chunks in exactly the same way that Chunks depend on other Chunks.
Amusingly, I've had this argument before,
with Paul arguing for the opposite direction,
of having only one type of definition
which logically contains everything it depends on.
> We will get into the difference between "build dependencies" and "run
> dependencies", and define the meaning of "flavors", in the following
> sections.
> In the following sections, we will use the word "Stratum" loosely to
> define any Morphology which is either a Stratum or a derived class.
I prefer Buildable :¬)
> 2.1.3 Chunk
> The Chunk class extends the Stratum class with the remaining usual chunk
> attributes:
> o repo
> o ref
> o unpetrify-ref
> o build-system
> o [ configure, build, install -commands ]
> o system-integration
> o ... other build related attributes I've missed here ...
> It's important to note that a chunk here is the only Morphology class
> which should be adding any payload to the resulting build.
> Further; it should be noted that the 'repo' attribute of the chunk is an
> optional attribute. A Chunk may be defined simply for the purpose of
> adding static payload to the resulting build (configuration files) or
> simply to run system-integration hooks in the resulting system.
I'd argue that instead it should be a list of (repo, ref, unpetrify-ref, path),
which can be empty if you don't need any git repositories to build from,
or it could have multiple if you'd rather do the submodules yourself.
I wrote a proposal for this… last year.
And I would not disagree :)
Thankfully this is another problem which is quite safely orthogonal to
the problems I aim to fix with this proposal.
> 2.1.4 System
> The System class extends the Stratum class with the following attributes
> o arch
> o configure-extensions
> Here I should add a note that it may be possible or desirable to abolish
> the System type entirely. During the implementation of a refactoring
> towards the proposed model, I would leave it up to the implementor to
> decide if we can address this sufficiently with Chunks and "flavors"
Yep, potentially arch could be a flavour/variant.
> The advantage of removing the concept of "System" morphologies is that
> we could potentially use the same system definition for multiple arches,
> via our newly introduced concept of flavors (a system flavor could be
> the arch itself), further the configure-extensions could be collapsed
> into specific Chunk definitions which could be shared across multiple
> systems, and conditionally run depending on the flavor of the system
> (the arch).
Either that, or move configure-extensions to deployment definitions.
True, there are probably some extensions or scripts which will always
need to be defined in the deployment (cluster ?) definitions.
I personally prefer handling at least all payload related material at
the chunk level, although I can see that for deployments, there are
going to be some scripting extensions which are *not* related to the
payload itself, but related to container/disk/medium preparation and
installation of the said payload instead - this does logically belong
somewhere outside of the chunk, the cluster/deployment is the right
place for such scripting to live.
An advantage of restricting all payload related scripting to the chunks
themselves, is that we centralize the scripts and their invocations
inside chunks which can be shared across multiple systems.
> 2.2.1 Syntax
> Expanding on the current syntax, I would propose that we add a
> 'run-depends' keyword to complement the existing 'build-depends'.
> This would mean that instead of listing all dependencies under the
> 'build-depends' group, we would split the runtime dependencies from
> there and add them instead to 'run-depends'.
> This allows software builders to compile the runtime dependencies
> orthogonally to the requiring Stratum, allowing greater parallelism at
> build time, avoiding circular dependency pitfalls and avoiding
> unnecessary rebuilds.
> The syntax in a self contained chunk definition would simply be:
> name: chunkname
> kind: chunk
> build-depends:
> - chunks/build-dependency1.morph
> - chunks/build-dependency2.morph
> run-depends:
> - chunks/run-dependency1.morph
This is not sufficient on its own.
To be able to have a system which doesn't include the whole toolchain you need
to be able to split the files produced by a build into different artifacts.
I have to reiterate that this is a problem which is left unsolved by my
proposal, and quite intentionally so.
However going by file path alone will lead you only as far as the
not the split.
We can either require a pair of morphology and artifact name,
as the previous runtime dependencies proposal did,
or we could define *another* morphology type,
which is responsible for filtering the result of a build somehow,
which is defined in a file of its own,
and refers to the Buildable that it is filtering.
I have been reading through the said proposal here:
And I admit that I find this approach, of "depending on a portion of a
chunk" as a build dependency and "depending on another portion of a
chunk" as a runtime dependency, to be very complex and verbose.
I also think you are using the term "runtime dependency" in a non
traditional sense, or at least in a sense that I am not used to.
Before we get into a debate that is lost in semantics; I should clarify
what is a runtime dependency, at least in the context of this proposal:
A runtime dependency is the dependency on a chunk which need not be
built or added to the staging area in order to build the depending
chunk.
It is nothing more than a distinction between what is required at build
time, and what is required, but not at build time.
With that said, I see that you view the splitting and sorting of
artifacts from their source chunks as somehow intrinsically related to
build planning and runtime vs build time dependencies. As such, I have
no recourse but to enter briefly into this subject, even though I very
intentionally left this problem out as I strongly believe it to be
unrelated to the problems I am trying to address.
<artifact splitting subject>
In order to reduce system bloat by excluding the toolchain and excluding
header files and development related metadata from a target deployment,
I believe there is a much simpler approach.
There is already a semantic for artifact splitting, which is expressed
in the glibc.morph for example, this chunk creates:
glibc-devel, glibc-locale, glibc-libs, glibc-bins and glibc-nss
Ok, maybe it's already a bit overly split up, I would be satisfied with
only glibc-base, glibc-devel and glibc-locale.
I would push glibc-nss directly into glibc-base as it's required for a
base system that uses nss - and I would only include the nss in the
build & artifact if the 'nss' flavor/variant was selected for the
building of that glibc chunk.
All we're really missing here is:
o Standardization on the agreed symbols for artifact splitting,
typically I think 'base' 'devel' 'locale' and 'doc'
o At the deployment level, one simply needs to declare a list
of which artifact types should be included - 'base' is of course
As such, a deployment or "cluster" would simply specify which
artifact types are required:
Yes, this solution is rigid and simple - and yes it resembles very much
the way that typical distributions handle this problem, however this is
of course a distribution related problem.
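
To make the idea concrete, here is a minimal Python sketch of selecting
files by artifact type at deployment time. The split patterns, the
function names and the rule that 'base' is always included are my own
assumptions for illustration, not an existing Baserock interface.

```python
import re

# Hypothetical split rules: artifact type -> path patterns. Anything that
# matches no rule is treated as 'base'. The patterns are illustrative only.
SPLIT_RULES = [
    ("devel", [r"^usr/include/", r"\.pc$", r"\.a$"]),
    ("locale", [r"^usr/share/locale/"]),
    ("doc", [r"^usr/share/(doc|man|info)/"]),
]


def artifact_type(path):
    """Classify one installed file into an artifact type."""
    for kind, patterns in SPLIT_RULES:
        if any(re.search(pattern, path) for pattern in patterns):
            return kind
    return "base"


def select_files(paths, wanted):
    """Keep only the files whose artifact type the deployment asked for.

    Assumption: 'base' is always part of a deployment.
    """
    wanted = set(wanted) | {"base"}
    return [path for path in paths if artifact_type(path) in wanted]
```

A deployment that lists only 'locale' would then receive the 'base' and
'locale' files of every chunk, and nothing from 'devel' or 'doc'.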
Further, the concept of system languages needs to be explored, as it is
not sufficient to simply specify that we want the 'locale' packages, we
want to choose which .mo files are selected for every distributed
library (this again falls into the unsolved problems category).
Is there any justification for the added complexity of having one chunk
depend on a specific sub-artifact of another chunk, during the build
process ? Or, can we safely table this discussion in the knowledge that
we can handle this quite well without mixing these issues together ?
</artifact splitting subject>
> 2.2.3 Implications in Recursion
> When constructing a dependency tree for the purpose of laying out a
> build plan, it's important to note that any dependencies of a runtime
> dependency are classified as runtime dependencies by the referring
> Stratum, while only build dependencies of build dependencies get
> classified as build time dependencies of the referring Stratum.
> To illustrate this more clearly, take the following ASCII art for a
> rough example of what parts of the GNOME stratum might look like;
> dependencies marked with (b) are build time dependencies and those
> marked with (r) are only runtime dependencies.
>                   GNOME
>                   /   \          (g-s-d = gnome-settings-daemon)
>            g-s-d(b)   g-o-a(b)   (g-o-a = gnome-online-accounts)
>                |         |
>                |     WebKitGtk(b)
>                 \       /
>                  \     /
>                   \   /
>                 geoclue(b)
>                   /   \
>                  /     \
>                 /       \
>         libsoup(b)   glib-networking(r)
>                 \       /     \
>                  \     /       \
>                   \   /         \
>                  glib(b)      gnutls(b)
> When preprocessing the dependency graph for a build plan, strict build
> dependencies will be processed as the main build tree, while runtime
> dependencies are to be pruned and processed in a second pass. Runtime
> dependencies can be discarded completely when they appear as build
> dependencies elsewhere in the build tree, otherwise they are added as a
> build dependency of a virtual main target, orthogonal to the specified
> build target; as would be the case with glib-networking and gnutls in
> the above example:
>          Virtual Main Target
>               /        \
>              /          \
>          GNOME           \
>          /   \            |
>      g-s-d   g-o-a        |
>        |       |          |
>        |   WebKitGtk   glib-networking
>         \     /         /     |
>          \   /         /      |
>         geoclue       /     gnutls
>             \        /
>           libsoup   /
>               \    /
>                \  /
>                glib
You might want to require that if you build-depend on something that has run-depends,
then its run-depends should be included,
e.g. glib build-depends on gcc and the glibc-headers.
gcc run-depends on glibc
so since you depended on gcc, you need its glibc run-dependency.
This would be a way to remove the implicit recursive build dependency inclusion,
as with runtime dependencies it's properly declarative,
and operates the same for intermediate builds, as the top-level build.
Yes I may have been a little vague where I say above:
"... while only build dependencies of build dependencies get
classified as build time dependencies of the referring Stratum."
Indeed, run dependencies of build dependencies are *still* selected as
dependencies, however they are pruned from the tree in the initial parse
and not considered as build time dependencies of the build-depending
chunk (I hopefully covered this better in "3.2 Build Planning").
So yes indeed, all dependencies are always selected, but runtime
dependencies are never staged for the building of a runtime dependent
chunk, only build dependencies are ever staged - while runtime
dependencies are built orthogonally in the same build tree.
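
To illustrate the two passes, here is a minimal Python sketch run against
the GNOME example. The graph encoding, the edge list (my reading of the
diagrams) and the names build_closure and plan are assumptions made for
this illustration, not code from ybd or morph.

```python
# Hypothetical in-memory form of the GNOME example. "build" and "run" are
# illustrative key names, and the edges are my reading of the diagrams.
GRAPH = {
    "GNOME":           {"build": ["g-s-d", "g-o-a"], "run": []},
    "g-s-d":           {"build": ["geoclue"], "run": []},
    "g-o-a":           {"build": ["WebKitGtk"], "run": []},
    "WebKitGtk":       {"build": ["geoclue"], "run": []},
    "geoclue":         {"build": ["libsoup"], "run": ["glib-networking"]},
    "libsoup":         {"build": ["glib"], "run": []},
    "glib-networking": {"build": ["glib", "gnutls"], "run": []},
    "glib":            {"build": [], "run": []},
    "gnutls":          {"build": [], "run": []},
}


def build_closure(graph, target):
    """Return target plus everything reachable through build edges only."""
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node]["build"])
    return seen


def plan(graph, target):
    """First pass: the main build tree. Second pass: pruned runtime deps."""
    main_tree = build_closure(graph, target)
    virtual_deps = [target]
    covered = set(main_tree)
    pending = sorted(r for n in main_tree for r in graph[n]["run"])
    while pending:
        dep = pending.pop(0)
        if dep in covered:
            continue  # already built elsewhere in the tree, so discard it
        virtual_deps.append(dep)
        subtree = build_closure(graph, dep)
        # Runtime dependencies found in the new subtree get the same treatment.
        fresh = subtree - covered
        pending.extend(sorted(r for n in fresh for r in graph[n]["run"]))
        covered |= subtree
    return main_tree, virtual_deps


main_tree, virtual_deps = plan(GRAPH, "GNOME")
```

Here virtual_deps comes out as GNOME plus glib-networking, with gnutls
reached through glib-networking and glib shared with the main tree. The
staging area for any single chunk is simply its build_closure minus the
chunk itself.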
> 2.3 Declaring Build Flavors
> Build flavors allow us to express multiple recipes to build the
> Most of the time Strata will not need to declare any flavors. While
> there is a potentially infinite number of flavors which *can* be
> declared, we are only interested in the flavors required to build the
> systems which we support; which can in turn settle on a shared build
> flavor of a given package most of the time.
> The main advantage we achieve with build flavors is that we avoid ever
> declaring a separate Chunk file for the same module. We allow some
> flexibility so that multiple systems may live safely in the same
> definitions repository, brief and concise exceptions can be made to the
> build rules for a given Chunk in cases where multiple Systems cannot
> agree on a single Chunk recipe.
> In this section we discuss only the declaration of a chunk with flavors,
> we will explore the referencing of chunks as dependencies in the
> following section 2.4.
> 2.3.1 Syntax
> In the interest of keeping the Stratum definition brief and concise
> while supporting build flavors, I propose a cascading attributes
> A Stratum which declares flavors will continue to declare all common
> aspects at the root of the Stratum definition, while each flavor
> definition will be allowed to override and extend portions of the
> In the case of "build-depends" and "run-depends" attributes, a flavor's
> entries extend the common dependencies which may be specified at the root of the
> chunk. In all other cases, attributes specified in a Stratum (usually a
> Chunk) override whatever may have been declared at the root. This is to
> say that build-commands of a given flavor are never compounded and
> appended to commands specified at the root level, they replace the
> build-commands at the root entirely.
> The "kind", "name", "description" and "repo" attributes are exceptions,
> they may not be overridden by a build flavor declaration.
> Should a Stratum declare flavors, the first flavor declared in the list
> will be considered the default and that flavor will be selected if no
> specific flavor was mentioned by any depending Strata.
> The syntax of a Chunk which declares flavors would look roughly like so:
> name: chunkname
> kind: chunk
> repo: upstream:chunkname
> ref: [ default branch ]
> common dependencies and overridable attributes
> - flavor1
> - extra-dependency1
> - flavor2
> ref: [ possibly override branch ]
> unpetrify-ref: baserock/special-branch-with-delta
> - extra-dependency2
> ./autogen.sh --without-extra-dependency1 --with-extra-dependency2
you either want a separate flavor-name/variant-name,
or to turn it into a mapping, like:
ref: [ possibly override branch ]
./autogen.sh --without-extra-dependency1 --with-extra-dependency2
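
For what it is worth, the cascading rules are easy to apply once flavors
are a mapping. Below is a minimal Python sketch, assuming a chunk already
loaded as a plain dict with a hypothetical "flavors" key that maps flavor
names to overriding attributes. The key names and the function are
illustrative only, not a settled syntax.

```python
import copy

# Attributes a flavor may not override, per the proposal text.
LOCKED = {"kind", "name", "description", "repo"}
# Attributes where a flavor's value extends the root value instead of
# replacing it.
EXTENDING = {"build-depends", "run-depends"}


def resolve_flavor(chunk, flavor=None):
    """Flatten a chunk definition for one flavor.

    Assumption: chunk["flavors"] is a mapping of flavor name to overrides,
    kept in declaration order, so the first entry is the default.
    """
    resolved = {k: copy.deepcopy(v) for k, v in chunk.items() if k != "flavors"}
    flavors = chunk.get("flavors", {})
    if not flavors:
        if flavor is not None:
            raise ValueError(f"{chunk['name']} declares no flavors")
        return resolved
    if flavor is None:
        flavor = next(iter(flavors))  # first declared flavor is the default
    if flavor not in flavors:
        raise ValueError(f"{chunk['name']} has no flavor {flavor!r}")
    for key, value in flavors[flavor].items():
        if key in LOCKED:
            raise ValueError(f"a flavor may not override {key!r}")
        if key in EXTENDING:
            resolved[key] = resolved.get(key, []) + list(value)
        else:
            resolved[key] = copy.deepcopy(value)  # replaces the root value
    return resolved
```

With this shape, build commands given by a flavor replace the root
commands outright, while its dependencies are appended to the common ones.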
> 2.4 Depending on Build Flavors
> When referring to a Stratum as a dependency, it now becomes possible to
> specify a flavor for the given Stratum.
> There are in fact three ways to refer to a Stratum. One can remain
> ambivalent as to which flavor of the dependency is used, one can specify
> a single acceptable flavor of the given dependency, or, one can list
> multiple flavors of a given dependency in order of preference.
> This last case where multiple flavors can be specified is a corner case
> which can come in handy where multiple depending Strata depend on the
> same Stratum but disagree on the expected flavor; giving a second choice
> to the depending Strata allows us to find some agreement on which flavor
> can satisfy both depending Strata.
> I included the 'all' flavor in the example above (2.3.2) to demonstrate
> this. During development it can be desirable to define a GNOME System
> with both x11 and wayland installed. Some 'x11' specific Strata may
> prefer an 'x11' flavored IBus, and some wayland specific Strata may
> prefer a 'wayland' flavored IBus, but in the case of the IBus Chunk, it
> is possible to satisfy both options simultaneously by providing an
> alternative 'all' flavor.
> As a policy, Strata should remain ambivalent as much as possible and
> references to specific flavors should only be made when it is a hard
> dependency for the referring Strata.
How should the tooling decide which variant to use if by the time you reach the top-level
there are multiple potential variants that could be used?
Reproducibility demands that there be a well defined mechanism for deciding,
and ideally for human readability, it should be obvious which variant is used,
which sounds like ideally we'd find another way to handle this use-case,
without requiring this feature.
Yes, I had thought of this scenario - the document was getting quite
large already and I left these considerations out :)
To address this perfectly valid concern:
o Having a second choice is something I consider to be a rare corner
case already, having 3 choices means that things are getting
seriously out of hand and that some flavors (or variants) should
probably be factored out at that point.
o As you mention it is possible that 2 depending chunks can agree on a
second or third choice, but not in the same order - this is only
possible where 2 depending strata declare 3 acceptable variants of a
common chunk they depend on.
o In this case, as you correctly point out, nondeterminism arises in
o Because they do in fact agree on *something*, the choice made is
unimportant, however the nondeterminism is important to address.
Conclusion: As the case should be very rare, and as the choice itself is
of no relevance at all; the tie breaker can simply be the alphabetical
order of the flavor/variant name.
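
As a minimal Python sketch of that selection: the function name and the
data shapes are assumptions, and ranking candidates by their summed
preference position is my own illustrative choice, since only the
alphabetical tie breaker is stated above.

```python
def choose_flavor(declared, references):
    """Pick one flavor of a chunk for the whole build.

    declared:   flavor names in declaration order (the first is the default).
    references: one list per referring stratum, in order of preference;
                an empty list means that referrer is ambivalent.
    """
    specific = [ref for ref in references if ref]
    if not specific:
        # Everyone is ambivalent: fall back to the first declared flavor.
        return declared[0] if declared else None
    candidates = set(declared)
    for ref in specific:
        candidates &= set(ref)
    if not candidates:
        raise ValueError("referring strata cannot agree on a flavor")

    # Assumption: lower summed preference position wins; any remaining tie
    # is broken by the alphabetical order of the flavor name.
    def score(name):
        return (sum(ref.index(name) for ref in specific), name)

    return min(candidates, key=score)
```

Because the score is a pure function of the references, the result does
not depend on the order in which the definitions were traversed.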
> 2.4.1 Syntax
> The syntax for specifying a dependency remains the same except that
> flavors can now be specified in addition to the chunk's morph as a
> To specify a dependency where one is ambivalent of the flavor:
> - chunks/build-dependency.morph
> To specify a specific flavor:
> - chunks/build-dependency.morph:flavor
> Or using a comma separated list to specify acceptable flavors, starting
> with the preferred flavor:
> - chunks/build-dependency.morph:flavor1,flavor2
> NOTE: I am assuming here that use of the colon separator is acceptable
> so long as it is not followed by whitespace, as we use the same
> semantics to specify the upstream repositories as values for the 'repo'
> attribute.
Eww, please no.
Those are perfectly valid file names, and YAML lets you specify extra fields.
- path: chunks/build-dependency.morph
  variants: [flavor1, flavor2]
Let's avoid string parsing wherever possible,
since it just hurts us.
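
With the mapping form, reading a dependency entry needs no string
splitting at all. A minimal Python sketch, assuming the entries have
already been loaded from YAML into Python values; the function name is
hypothetical, and the 'path' and 'variants' keys follow the example above.

```python
def normalise_dependency(entry):
    """Return (path, variants) for a dependency written as a string or mapping."""
    if isinstance(entry, str):
        return entry, []  # bare path: ambivalent about the variant
    if isinstance(entry, dict):
        path = entry["path"]
        variants = entry.get("variants", [])
        if isinstance(variants, str):
            variants = [variants]  # tolerate a single variant given as a scalar
        return path, list(variants)
    raise TypeError(f"unsupported dependency entry: {entry!r}")
```

A bare string stays valid for the ambivalent case, so existing dependency
lists would not need to change.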
I have no preference here, my thought was only to reduce verbosity of
the morphology syntax, I have no strong feeling about how the semantic
> 2.4.2 Builders
> In addition to the semantics within the Morphology definitions, builders
> need to provide a technique for specifying build flavors.
> For example, the following invocation needs also to be supported:
> ybd.py --flavor x11 chunks/ibus.morph x86_64
I'd have thought you would have proposed:
ybd.py chunks/ibus.morph:x11 x86_64
Which would fit with ybd.py's general philosophy of having few command-line
Again I have no strong preference on the syntax. It was only important
to highlight that the new semantic would have to be handled in some
shape or form.
> 3 Build Planning
> 3.1 Choosing and Validating Build Flavors
> Before managing the dependency tree in any way, we should first decide
> on which flavor of each individual Stratum and Chunk is to be selected.
> To do so, we simply need to construct a flat cache of all chunks which
> have been referred to in any way, and follow the rules to determine
> which flavor (if any) should be selected, or to bail out with an error
> message if the build makes no sense because two or more referring
> strata cannot agree on a given flavor.
> Remember that:
> a.) A reference without flavor is ambivalent; it does not care which
> flavor of Stratum is selected
> b.) A reference to a single flavor is specific, it can only live with
> that specified flavor
> c.) A reference with multiple flavors has a preference for a specific
> flavor, but can find agreement with other referring strata and
> settle on a second choice
> d.) In the case that all referring strata have remained ambivalent,
> the first declared flavor is the default and will be selected for
> the build.
What happens if multiple things depended on something in a different order?
If it depends on which was defined first,
you need to define a traversal/parse order.
You could solve this with some form of vote, <satire> though apparently the
British public is incapable of understanding preference ranked voting, which
implies that this mechanism is too complicated. </satire>
> Provided that all referring Strata have found some agreement on
> flavors, we will keep this cache of preselected flavors around for the
> remainder of the build plan.
What happens if it can't be decided since there's a tie?
This is answered in a previous comment.
> 4 Migration Paths
> The proposed changes to the Morphology format represent a significant
> amount of work.
> Updating the tooling to understand the new format is only the fun part,
> untangling build dependencies so as to regain the lost knowledge of what
> really depends on what is a longer, more arduous process. Of course the latter
> is where most of the added value will start to come to life.
> To perform the migration, we could build a separate definitions
> repository from scratch, which would probably be a much cleaner
> approach, however, we do have the option to perform the migration "in
> tree". As the Stratum concept is not completely abolished, but only
> semantically changed (as highlighted in section 2.1.2); it would be
> conceivable to apply the new format to our existing definitions
> repository using an automated script.
> If the second approach were taken, we would still be left with a tangled
> mess after the initial migration, but we could then approach the
> refactoring work on a Chunk by Chunk basis, moving from the bottom
> towards the higher level Strata.
> Fixing a given chunk would simply involve removing any dependencies on
> Strata that are not build-essential and replacing those dependencies
> with the appropriate individual Chunk dependencies, adding the
> distinction between runtime and build time dependencies. During the
> refactoring, when an occurrence of a legitimately duplicated Chunk is
> identified, it would need to be replaced by a flavor definition in a
> single unified Chunk.
> This refactoring could potentially be done in the same repository
> orthogonally to other ongoing work.
This would not work by itself,
we'd need to incrementally approach the target format.
Ideally we'd have a period where we overlaid extra dependency information on top,
but the build tool had to opt in to using it,
otherwise it would use the old behaviour.
Additionally we'd like to be able to opt into fine-grained dependencies at the
morphology-level so we can migrate the definitions piecemeal and have it still
work, so we'd need to add an extra flag to a definition saying it has
Also, a stepping stone towards allowing chunks to have their own dependencies,
would be to allow strata to in-line the contents of chunks,
since then we'd be able to work out which strata need to remain small.
This is indeed not a backwards compatible change in any way; making it
one would indeed be a very hard problem.
In the very small and limited ybd world in which I live, it is very
possible to simply:
a.) Update a couple thousand lines of ybd python to handle the new
format.
b.) Write a relatively simple script which transforms the existing
definitions repository in a single pass. This would not produce
the most desirable output but it would be a starting point which
safely produces exactly the same output for our existing builds.
c.) Start a slow and ongoing process of eliminating duplicate chunks
and factoring out cases where chunks depend on strata.
I can only presume that it should in theory be just as easy to perform
the same update to Morph atomically, probably with a format revision
increment to ensure the transition goes smoothly for both parties.
In a world where it is impossible for Morph to perform the same
transition, or worse; a world where surrounding tooling wants to have a
choice to "opt in" to the new format selectively while processing the
same definitions repository, I am afraid that we have exited the realm
of realistic entirely.