On 2015-07-30 08:16, Paul Martin wrote:
This follows a quick bit of brainstorming with Sam Thursfield.
This all looks sensible. A couple of comments inline...
# `ybd` artefact splitting
## Current situation
The way `ybd` currently creates artefacts is much simplified compared
with `morph`. It builds each component and packs everything that
component "installs" into a single artefact.
## How `morph` does it
`morph` has a list of default rules, currently hard-coded into its
source. Each rule is a set of regular expressions, and files are
allocated to each generated artefact on a first-found basis.
Individual definitions may override this by using an `artifacts`
It is also possible in a system definition to select individual
artefacts by providing an `artifacts` table within each `strata`
There are plans in `morph` to move the default artefact rules out
into a file within the definitions tree.
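The first-found allocation described above could be sketched roughly as follows. This is only an illustration: the rule names and patterns in `EXAMPLE_RULES` are made up for the example and are not `morph`'s actual defaults, and `allocate()` is a hypothetical helper, not a `morph` function.

```python
import re

# Hypothetical rules for illustration only; NOT morph's real default rules.
# Order matters: the first rule with a matching pattern claims the file.
EXAMPLE_RULES = [
    ("-devel", [r"^usr/include/", r"^usr/lib/.*\.a$"]),
    ("-doc", [r"^usr/share/doc/", r"^usr/share/man/"]),
    ("-misc", [r".*"]),  # catch-all so every file lands somewhere
]


def allocate(path, rules=EXAMPLE_RULES):
    """Return the name of the first rule whose patterns match path."""
    for artefact, patterns in rules:
        if any(re.match(pattern, path) for pattern in patterns):
            return artefact
    return None  # only reachable if the rule list has no catch-all
```

Because matching stops at the first hit, a catch-all rule has to come last; anything placed after it would never be consulted.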
## How `ybd` might do it
`assembly.assemble()` currently invokes the _artifact creation_ phase
by running `cache.cache()`, and creates monolithic artefacts as
The compression phase currently used in artefact generation is
expensive. It is probably not needed for local storage, and time is
wasted compressing the artefacts and then uncompressing them into
individual files for the "unpacked" tree.
Note that the compression phase is primarily to create an artefact
which can be published to a cache server. It may be possible to save
some wall clock time by forking that process, or not doing it at all if
the running instance of ybd can be sure that its artefact will not be
required by anything else.
I suggest a change to `ybd` to allow for artefact splitting by
modifying the metadata associated with an artefact. There is
a list of files contained in that artefact within the metadata. I
propose partitioning that into a list of split tags, each containing
a list of the files within that split.
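A rough sketch of that partitioning, assuming the metadata is a dictionary: the key names `files` and `splits`, the tag names, and the `classify` callback are all hypothetical and chosen only for illustration; the real `ybd` metadata layout may differ.

```python
def split_metadata(metadata, classify):
    """Return a copy of metadata with its flat file list grouped by split tag.

    `metadata["files"]` (hypothetical key) is the flat list of paths.
    `classify` is any callable mapping a path to a split tag.
    """
    splits = {}
    for path in metadata["files"]:
        splits.setdefault(classify(path), []).append(path)
    # Keep every other metadata field untouched; replace only the file list.
    result = {key: value for key, value in metadata.items() if key != "files"}
    result["splits"] = splits
    return result


def example_classify(path):
    """Toy classifier for the example; real rules would be regex-driven."""
    return "devel" if path.startswith("usr/include/") else "runtime"
```

The input dictionary is left unmodified, so the same function could be run over existing metadata without disturbing anything already cached.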
For example, where an artefact's metadata might currently contain:
it might instead contain:
Rather than holding a fully unpacked representation of the artefact
within the ybd cache, the metadata `artifact` table can be used to
control which files `tar` extracts.
The unpack stage in the sandbox module would then select the table(s)
of filenames to unpack and use that with the `-T` option to `tar`, to
restrict the extraction process to just those filenames.
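The unpack step might then look something like the sketch below. It assumes the split tables are available as a dictionary mapping tag to file list (a hypothetical shape), and that a `tar` binary supporting `-T` is on the path; `build_tar_command()` and `extract_splits()` are illustrative names, not existing `ybd` functions.

```python
import os
import subprocess
import tempfile


def build_tar_command(archive, dest, list_path):
    """Build a tar invocation that extracts only the names in list_path."""
    return ["tar", "-x", "-f", archive, "-C", dest, "-T", list_path]


def extract_splits(archive, dest, splits, wanted):
    """Extract only the files belonging to the wanted split tags.

    `splits` maps a tag to its list of member paths (hypothetical layout).
    Returns the list of names that were requested from tar.
    """
    names = [path for tag in wanted for path in splits.get(tag, [])]
    if not names:
        return []  # nothing selected, so skip running tar altogether
    with tempfile.NamedTemporaryFile("w", suffix=".list", delete=False) as handle:
        handle.write("".join(name + "\n" for name in names))
        list_path = handle.name
    try:
        subprocess.run(build_tar_command(archive, dest, list_path), check=True)
    finally:
        os.unlink(list_path)  # remove the temporary name list either way
    return names
```

The names in the list have to match the member names stored in the tarball exactly, so the metadata tables and the archive need to agree on whether paths carry a leading `./`.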
If it is still desired to keep an unpacked representation within the
`ybd` cache directory, the same metadata tables can be used to control
which files are copied into the sandbox. However, extraction from an
uncompressed tarball might be more space-efficient.
The benefits of this scheme are:
* Minimal changes to current code.
* `ybd` continues to generate a single cache object for each
This is a design choice. It may be that some others here prefer the
split artefacts approach taken by morph, but this was one of the things
I expressly wanted to change in doing ybd (I think a monolithic cache
artefact is easier to understand, easier to publish, and easier to deal
* Cache files will remain incompatible with `morph`.
I don't think that making ybd and morph cache artifacts compatible is
going to be achievable in the near future for other reasons (eg
monolithic vs split, and cache key algorithm) so I'd be very happy to
adopt the approach you're describing.