Proposal: gitlab.com/baserock/definitions as definitions canonical source (was Re: RFC: Gitlab Implementation)
by Javier Jardón
Hi all,
On 4 July 2016 at 17:35, Javier Jardón <jjardon(a)gnome.org> wrote:
> On 3 July 2016 at 14:20, Javier Jardón <jjardon(a)gnome.org> wrote:
>> I personally like gitlab for its simplicity, so this morning I went
>> ahead and set up a ci/cd pipeline in a mirror of definitions on
>> gitlab.com [1]
>> To make things easier it would be great if you could review this patch
>> [2] to add .gitlab-ci.yml to the master branch of definitions.
>> That way the ci pipeline will be automatically triggered after any change.
> <snip>
>> [1] https://gitlab.com/jjardon/baserock-definitions/pipelines/3636171
>> [2] https://gerrit.baserock.org/#/c/2194/
>
> This is merged now, thanks for the quick reviews!
>
> Pipelines will be automatically generated here [1] with any change in
> master of definitions.
> If you want to give this ci system a try, the only thing you have
> to do is create a branch from master of definitions in gitlab,
> make your changes and push it. A pipeline will be automatically created.
>
> Cheers,
> Javier
>
> [1] https://gitlab.com/baserock/definitions/pipelines
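(If you haven't seen a GitLab CI configuration before: .gitlab-ci.yml is a
small YAML file at the top of the repository describing the pipeline. The
sketch below is purely illustrative and is not the file we actually merged;
the image, paths and job name are made up, but the shape is the same - a job
that installs ybd's dependencies and builds a system from the definitions
checkout.)
-------------------------------
stages:
- build

# hypothetical job; the real pipeline lives in .gitlab-ci.yml in definitions
build-minimal-system-x86_64:
  stage: build
  image: debian:jessie
  script:
  - git clone https://gitlab.com/baserock/ybd.git
  - ./ybd/install_dependencies.sh
  - ./ybd/ybd.py systems/minimal-system-x86_64-generic.morph x86_64
-------------------------------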
Since I presented this demo in July, I've improved the CI pipeline a
little, so now we have elastic runners; this means that we will
generate build machines on demand, while keeping the free runners
offered by gitlab.com as a backup.
This will save us resources, and we can adjust the capabilities of the
machines as we wish (the machines are created from a DigitalOcean
account).
We also have an ARM runner, but I've had to disable it for the moment
because of some problems cloning git repos [1].
Therefore, now that some people have been playing with the system for
a while, with mostly positive feedback, I'd like to propose
gitlab.com/baserock/definitions as the canonical source for
definitions.
By doing so we will get the most important feature we have wanted for a
long time (and one of the main reasons we switched to gerrit):
- Pre-merge testing of all the systems we are interested in
There are myriad other gitlab features we can use (issue
tracker... ), but I think this is the most important one for this
discussion.
Let me know what you think
Cheers,
Javier
[1] https://gitlab.com/baserock/definitions/commit/e2255bd601a12c9f6924874f2d...
System Integration Commands & Artifact Splits
by Tristan Van Berkom
Hi,
On Friday we had an interesting conversation about system integration
commands (aka "s-i commands"), and I'd like to pick your brains further
on the subject because I think this is an area we need more clarity in,
and I'm not sure I have a proposal that solves the problem once and for
all either.
IRC Log: https://irclogs.baserock.org/%23baserock.2016-10-21.log.html
Emmet Hikory sums up pretty accurately that:
"The s-i commands are intended to represent the results of maintainer
scripts (both pre- and post-) to make system-specific adjustments to
the content. We do them at build time in an attempt to ensure we can
repeat systems, rather than relying on the right balance of cosmic
rays in the target system at package installation time."
The conversation revolved around whether the s-i commands should be run
pre or post artifact splitting. Looking closely at the problem, it
seems that this is a serious design flaw that needs to be addressed in
some way.
Should s-i commands apply to a full system before splitting ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are a few negative side effects of this which would need further
examination, some of which come to mind:
o Split artifact metadata is currently generated before running
the s-i commands, leaving important system caches orphaned.
The s-i commands themselves typically result in the update or
creation of system caches, the prime example of the day being:
http://git.baserock.org/cgit/baserock/baserock/definitions.git/tree/strat...
Once we run the s-i commands which *create* the font cache, and
then attempt to distribute only files which are spoken for
in the existing split metadata, updating the font cache will be
for naught - nobody has claimed cache files which did not
yet exist at build time, so they stand no chance of existing
on a resulting system which cherry-picked only split artifacts.
Should we decide to generate the split metadata after running
the s-i commands and before splitting, we now lack context
about where the additional cache files should be shipped.
This problem could be mitigated by at least enforcing that
any chunk whose s-i commands result in created caches must
claim the generated files explicitly in one of its split
artifacts (so fontconfig.morph should explicitly claim
${localstatedir}/cache/fontconfig/* in its runtime artifact).
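For illustration, that claim would be a split rule in the chunk morph along
these lines (the 'products'/'artifact'/'include' field names are from the
definitions split rules as I remember them, and the artifact name is just an
example, so treat this as a sketch rather than a tested morph):
-------------------------------
name: fontconfig
kind: chunk
products:
- artifact: fontconfig-misc
  include:
  - var/cache/fontconfig/.*
-------------------------------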
o Since we have the entire system staged at the time of running s-i
commands, the caches that are updated are potentially incorrect for
the resulting deployment.
Consider a case where a general "all-fonts" chunk is used, and
its split rules put only a single standard font in the minimal
split, while the rest of the fonts appear in separate split
artifacts (ok, unlikely in this case but clearly *possible*).
In this case we have updated, and hopefully managed to ship,
the generated font cache for the minimal system; however, we
shipped a cache for all possible fonts onto a system with only
one font available.
So this is a highly unlikely case, since font chunks are more
likely to be split in different ways (if at all).
GdkPixbuf loaders are generally split into runtime/dev/debug
artifacts, but not into separate artifacts per image type (again
this would not result in an incorrect loader cache).
GTK+ IM Modules would be split much like GdkPixbuf loaders (again
not causing any issue in the immodule caches).
Is this a problem worth worrying about really ? If so, can we
mitigate this at least by documenting how artifacts should be
split ?
Should s-i commands apply to an assembled collection of split artifacts
before deployment ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here we have a different problem. Sticking with our nifty fontconfig
example, what happens when we want to deploy a runtime or bootable
system which has a fontconfig library which can load fonts and use an
already generated cache, but in that runtime we clearly have no need of
ever updating the font caches ?
I.e. we want to install a runtime that has fonts, has a fontconfig
library, has a generated font cache; but we explicitly don't want those
bothersome extra tools for updating these caches; they serve no purpose
in the resulting runtime.
Now we have a situation where we split the artifacts first, and when we
try to run the fontconfig s-i commands to generate the font cache, the
undesired fc-cache program is nowhere to be found to do so.
RPM mitigates this problem in packages by declaring alternative
dependency types; i.e. in a spec file one might state:
Requires(post): fc-cache
Although, of course, it still means you cannot update the font cache
for a system/runtime which deliberately omits the unneeded fc-cache
program.
This problem extends well beyond cases like fc-cache: we ended
up running s-i commands before artifact splitting because we found that,
when trying to generate only a runtime, system integration commands
would not run at all for lack of a shell.
Cheers,
-Tristan
Cache Keys: Kill the tree shas, use commit shas instead
by Tristan Van Berkom
Hi all,
I'm trying to cut some fat and reduce complexity here, and making (yet
another ?) case for throwing away git tree shas from the cache key
algorithm in favor of the git commit sha instead.
I have looked through the mail archives and found not much conversation
on the matter; I did find this:
Richard Maw writes:
"Frankly, to get all the way out of needing to talk to git during the
build graphing operation, we would need some lookup between commit
sha1 and tree sha1 anyway, since we use the tree in the cache key,
not the commit."
https://listmaster.pepperfish.net/pipermail/baserock-dev-baserock.org/201...
In any case, I am mostly curious if people on this list have first hand
knowledge of just how much disk space on the artifact servers we save
by trading away the simplicity of just using commit shas.
The decision should be simple: if we can maintain a simpler process and
consequently a simpler and more straightforward code base by removing
this, we should. Unless the cost is truly demonstrably high, we should
not be bending over backwards to use these git tree shas.
Below is some more text on the subject, clearly I want to kill the git
tree shas.
Cheers,
-Tristan
The benefits of using the tree sha are:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
o Two commit shas may possibly point to the same tree with identical
sources.
So if definitions are modified to point to a new
commit whose tree is the same, and in that definitions set the
modified chunk has all the same dependencies (which themselves
have not changed), this could result in the caching of
two identical artifacts under differing cache keys.
In other words: It can happen approximately never.
Now I can in fact think of one situation where it could happen a bit
more often than never, but it comes with a counterargument too. Imagine
the following steps:
1.) There is some CI which triggers a build on every push to every
branch in a given definitions git repository
2.) The users are working on a wip/<username>/<project> branch in
an upstream gitlab or trove.
3.) A commit is pushed to a wip branch in definitions, pointing
to a wip ref in an upstream gitlab or trove.
4.) After CI passes, the upstream gitlab gets its wip branch
now merged
5.) A new merge request is created to definitions to point to the new
merged upstream commit.
At this point, when we push the second definitions commit, which points
to the newly merged wip branch in the upstream, it will point to the
same tree.
Using the git tree sha, if definitions have not changed between (3) and
(5), we successfully avoid a duplicate artifact.
That is, unless the upstream also has merge handling like gitlab or
gerrit which adds an additional informational commit telling a story
about why the upstream wip branch was merged, in which case the trees
would differ anyway.
So, this case exists, and it happens a bit more than approximately
never. Do we know how much more ? Is the optimization worthwhile ?
The benefits of using the commit sha are mostly obvious:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
o Ability to make cache key calculations with only the set of
target definitions, without having access to gits or "tree
servers"
o Fewer moving parts; if, for example, we were to build GNOME modules
with ybd/definitions, why would we bother with a trove for the
GNOME modules when they are all hosted at git.gnome.org ? Would
we have to implement a tree server to optimize the builds ?
The same goes for some infrastructure and CI setup with gitlab:
does gitlab provide its own tree server ?
o Reduced complexity in surrounding tooling; the code is more
presentable, readable and maintainable without this operation,
which looks questionable and convoluted.
Not only does it make the surrounding code complex (cache key
calculation code depends on code which manages git mirrors),
but the operation itself is complex.
Mitigation of damage
~~~~~~~~~~~~~~~~~~~~
Now to go a bit deeper, should an artifact server run for years on end
I can expect it to consume loads of disk space. Especially if we're
caching artifacts for every wip branch ever made, instead of only long
term caching the artifacts which are relevant to definitions master and
a set of registered release branches (of definitions).
This disk space consumption might be slightly increased were we to ditch the cumbersome git tree lookups.
On the other hand, if we did not have to use the tree sha and could use the commit sha itself to calculate the cache keys, it would be relatively straightforward to write a pruning or culling algorithm, with the inputs:
o definitions git repo
o list of branches of importance to track
o artifact cache directory
A script could be made to iterate over the commits in the tracked branches of definitions and generate the keys for each artifact that an "important" branch could possibly produce, while removing artifacts from the server which are of no relevance (accumulated cruft from various merge requests throughout 10 years of history).
The above could be made more fancy as well, i.e.: for each release branch, keep artifacts from the latest release tag -> HEAD, and keep artifacts for every other release tag since the creation of that branch.
Yes, we *could* implement the above with access to all of the gits under the sun present at the same time, but refreshing all the git mirrors in order to calculate trees is in itself time consuming, and we would be racing against the next definitions commit to do so.
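To make the pruning idea concrete, here is a rough Python sketch. The
calculate_cache_keys() helper is hypothetical, standing in for whatever key
calculation ybd/defslib would expose; the point is that, with commit shas in
the key, it needs nothing but the definitions repository itself.
-------------------------------
import os
import shutil
import subprocess


def calculate_cache_keys(definitions_repo, commit):
    # Hypothetical: compute the cache key of every artifact that this
    # commit of definitions can produce, using only the definitions.
    raise NotImplementedError('hook this up to ybd/defslib key calculation')


def commits_on_branch(repo, branch):
    # All commit shas reachable from a tracked branch.
    out = subprocess.check_output(['git', '-C', repo, 'rev-list', branch])
    return out.decode().split()


def prune_artifacts(definitions_repo, branches, artifact_dir):
    wanted = set()
    for branch in branches:
        for commit in commits_on_branch(definitions_repo, branch):
            wanted.update(calculate_cache_keys(definitions_repo, commit))

    for entry in os.listdir(artifact_dir):
        key = entry.split('.', 1)[0]  # assumes artifacts are named by key
        if key not in wanted:
            path = os.path.join(artifact_dir, entry)
            if os.path.isdir(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
-------------------------------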
RE: RFC: Gitlab Implementation
by Richard Ipsum
hi i would like to at least try to provide insight
but not exactly insight, oh deer...
i use gitlab for reviewing code every day at my day job,
it is mostly good,
i am especially fond of the :tropical_fish: and :smile_cat: emoticons,
they make life better
and help me stay positive
the review model encourages the force push,
if you don't know what a force push looks like it looks like
https://i.imgur.com/XFQLB.jpg - hahahaha, but the commits from my v1
patchset are in that house! ;.;
so we don't do that on our project,
we manually submit a merge request for each revision of
what i'm suddenly going to call a candidate,
this means we can keep track of what happened,
without fire
gitlab doesn't provide a native way of showing diffs between
merge requests like gerrit does (if it does then i'd like to know!)
but that is actually okay because they are just refs so
you can git diff richardipsum/CHUNKYBACON_v1 richardipsum/CHUNKYBACON_v2
from your terminal, which may also be in colour.
some more notes,
gitlab pros (COMPARED TO GERRIT):
* it's written in ruby
* it's much prettier than gerrit
* it has :tropical_fish: emoticons
* it has simpler ci (actually i've never used this
i just read that it does)
* probably better support for submitting groups of patches
(though gerrit has pretty much sorted this out now)
gitlab cons (COMPARED TO GERRIT):
* it's written in ruby
* it has no native concept of a patchset (as gerrit does)
* having no native concept of a patchset,
it has no native way to diff between different versions of a patch
* CANNOT COMMENT ON NON-DIFF CONTENT (THIS IS SUPER ANNOYING
AND WE MUST BRIBE THEM TO FIX IT)
* No support for "BATCH" comments,
gerrit lets me save comments in a DRAFT and then post them
all when I'm sure they're sane, this is how review works,
(if gitlab has this then I am too dumb to find it)
by the end of a patch series I might realise my comment is wrong,
gerrit's model allows for this in a way that preserves my ego
which as you can see is important to me.
* i *think* it stores all the metadata in a database:
* if it does then that has replication/distribution IMPLICATIONS
which is why NOTEDB was made by the gerrit people at GOOGLE,
btw i even wrote a library for working with this,
USE AT YOUR OWN RISK,
I WONT BE HELD LIABLE FOR ANYTHING,
YOU CAN ALSO BUY ME A SANDWICH,
I LIKE PLOUGHMANS WITH LETTUCE, PLENTY OF PICKLE.
https://bitbucket.org/richardipsum/perl-notedb
okay so i hope that helped!
VLetrmx
YBD 16.42 is released
by Paul Sherwood
Hi all
changes since last time are mainly small fixes and CI configuration.
The key thing to note now is that YBD has moved its upstream to
gitlab.com, so I'm expecting that future improvements will land there,
and the original github repo will be frozen at 16.42.
As you can see from the various discussion threads, Dan Firth, Tristan
Van Berkom and others are deep into rethinking some of our design for
definitions and build tooling, and I'm hopeful that this will lead to
significant steps forward over the coming months. It's not yet clear to
me whether the result will be a new and improved YBD, or an alternative
tool entirely, but I'm happy to support either route.
br
Paul
e74d938 Note for move of ybd upstream
96a1ddd Fix declaration of 'readme.md' in setup.cfg
ef460b2 Merge branch 'tacgomes/ybd-tacgomes/fix-git-mirroring'
75511a1 Avoid running the git gc in the background
c24b2c9 Fix error message
57ae275 Merge branch 'james/chmod_pip' into 'master'
a7a8f41 Make get-pip executable
3cdf9bd Merge branch 'jjardon/get_pip_fix' into 'master'
465e9ce install_dependencies.sh: Fix get-pip.py execution
437749a Merge branch 'jjardon/sudo_fix' into 'master'
7631d2c install_dependencies.sh: Make sudo optional
65bd91f Merge branch 'tiagogomes/improve-docs' into 'master'
4b02337 Improve documentation
d8aba23 Merge branch 'jjardon/gitlab_move' into 'master'
5f49085 ybd is hosted in GitLab now
e7be39b Include max-jobs in cache-key
db7d285 Show download count
e83f8b0 Rejects were causing status page 500 error
86f8e0b Force ref to be a string
766b244 Fixes #241 - reorder tar adds so dirs are last
3e0ec8c Fixes #241
c35ede2 Merge branch 'jjardon/ci' into 'master'
ef4d449 .gitlab-ci.yml: Do not print all the environment variables
43aab08 Merge branch 'ps/simpler-walk' into 'master'
f072299 Simplify logic for walk and parse definitions
969df36 Regularise artifact-version range checks
33aba94 Merge branch 'jjardon/ci' into 'master'
259d1a7 .gitlab-ci.yml: Build minimal system
3f8bd65 .gitlab-ci.yml: keys-only mode and artifact cache:1 only in cache-keys stage
f77415f .gitlab-ci.yml: Remove installation stage
e918103 .gitlab-ci.yml: Fix cache key: path -> paths
b972ca1 Update readme.md: Add gitlab build status
ec37d28 We only need root permissions for actual builds
91daaf0 Pep8
26a3170 Move permissions check after usage check
d2e954a Merge branch 'master' of https://github.com/devcurmudgeon/ybd
76a5758 Spot the deliberate mistake in previous commit :/
960914e Merge pull request #240 from leeming/leeming/missing-cwd-conf
f29fb1c Fix for #238
2c7034d Pep8
0ae83f8 Fixes bug that ignores ybd.conf on cwd
c069aba Check every field in every definition
340be12 Add tristan's 'awful hack' until it lands in released sandboxlib
9ba66ef Merge pull request #239 from jjardon/jjardon/fix_python_dep
b332ab7 install_dependencies.sh: Python2 is python2 in Arch
a39961b Merge pull request #236 from leeming/leeming/redundant-exit-setting
3a7ede8 Simplified setting 'exit' var from config setting
831c19f Revert "Always exit if specified morph file is missing"
fec1a23 Need to include 'devices' in cach-key
afb2477 Add python to dependencies
b4a1be9 Always exit if specified morph file is missing
Definitions New World Order
by Tristan Van Berkom
Hi all,
So some of us had an offline conversation about how we can reorganize
definitions to be more workable in the future. While we can agree
pretty easily on the current set of problems we have to solve, it's
harder to agree on the solution, so let's try to hash it out here on the
list. I really hope we can at least agree on a solution by the end of
the week.
These are the two main problems I see with the current project layout;
these I want to fix by proposing a new layout with minimal or even no
changes to the tooling surrounding definitions.
I've included a TL;DR for each of the following points, for those who
just want to skim this email.
Clutter
~~~~~~~
TL;DR: Creating a system with baserock means that one has to work in a
git repository cluttered with a vast amount of irrelevant "stuff".
Currently if one wants to create a system with baserock using the
current practices, they will clone/branch/fork the upstream definitions
repository and start adding stuff.
Possibly the result of the work done will be merged back into upstream
definitions if desirable, pushing even more "stuff" into there for the
next person to carry.
This has gone on to a point where we actually have dead code present in
the system: extensions which exist but are not used by any existing
system. It would not surprise me if we have orphaned strata lying
around, serving no greater purpose than to clutter the user's workspace
and confuse them as to what is useful and what is not.
Further, even if one considers that if something is somehow referenced
by a system "it's not dead code", this is also wrong, as many of the
existing morphs in the systems directory are long unmaintained and just
add dead weight which we need to carry. While it can be argued that
there is "some value" in having a given system "that builds", the
fact that a given system builds is of negligible value unless that
system has a user who tests that the system actually works in some more
meaningful way.
Some examples of what I cannot do with the current structure:
a.) I cannot peruse the git history of my own project which
uses baserock without being exposed to a huge amount of
work unrelated to my own project.
b.) I cannot easily grep through my own definitions repository
to find a given string in an efficient way because I get
a lot of unrelated results.
But most irritating to me, is that I cannot have a clean workspace
which belongs to my project. I cannot work in an environment where my
coffee table is cluttered with all the stuff my roommate left on the
table: Definitions is currently that cluttered coffee table.
Changes are pushed/forced upon higher level projects/components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TL;DR: Changes in low level strata are forced upon their consumers,
regularly causing breakage and making it impossible for projects
depending on low level strata to ensure their systems actually work as
expected.
This might be perceived as a result of the above point but it is one of
the more damaging aspects.
Currently, as baserock definitions are maintained in one huge directory
tree, changes in lower level components get forced upon higher level
branches, which attempt to reuse these lower level components. This is
just plain wrong.
In C programming, teams who provide stable API/ABI surfaces have the
discipline to push only API/ABI compatible changes to their consumer
base. Changes of a C library which break the API contract are offered
with a clear version bump and consumers of that library must choose and
adapt to the new API or at least recompile for an ABI break.
For C programs, this has been an exact science for a long time,
mistakes are occasionally made but the methodology is sound.
In the case of definitions, changes have been made on a regular basis
which pull in a newer version of something in a lower level stratum,
without much care as to whether there has been any API break or change.
Then master happily moves along and the burden of testing and repairing
resulting breakage falls on the maintainers of the consuming systems.
This cannot continue: until we have an exact science and better
discipline to maintain a stable version/branch of a low level stratum
or collection of strata, in which we know that the collective API
surface will absolutely not change, we cannot allow lower level strata
to force changes onto their consumers.
Instead, the consuming higher level strata and systems must be at
liberty to develop against a specific version of, say, "core" and
"foundation"; the projects which depend on the "core/foundation"
group must be able to pull in fresh changes at will and test their
resulting systems accordingly.
So those two points above are the basic problems IMO, and those are what
I want to address in the immediate future; below is my own proposal for
fixing these issues.
Proposal to use recursive git submodules to address this issue
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Yes I said it, and I know there are plenty of reasons to hate a
submodule, obnoxious things as they are. However I have not yet been
able to come up with a better solution.
So here is how I would propose to address the issues: I would first
propose the following split of definitions, each shared git repo would
be comprised of one or more strata. Systems would be split out into
repositories holding their own system specific strata, system morphs,
deployments and extensions.
o definitions-gnu-linux
This would contain:
- build-essential
- core
- foundation
- the minimal system bsps
Possibly the stray python bits, maybe some things like icu could
live here too.
These seem to be the most widely shared strata and they are the
"root of all evil", so to speak.
It's unimportant if a consumer of this component/repo is going to
use all of the strata provided by it.
When I write a C program, I strictly do not leave any dead code
lying around in my own repository, but I don't mind linking against
Qt even if I do not happen to use every function provided by that
library; this, I think, is a proper analogy.
definitions-gnu-linux would strictly not contain anything that is
not needed for building these base strata and systems. It would not
contain any deployments or extensions that are not strictly needed
to produce the systems in this repository.
o definitions-freedesktop
This would contain everything needed to build X and also
alternatively wayland, or X with wayland support.
This would also contain the graphics common stuff and gstreamer as
well.
This repository itself would include definitions-gnu-linux as a
submodule, which means that this module can remain in charge of
what version of definitions-gnu-linux it wants to use.
o definitions-gnome
This would contain the gnome stratum and whatever else is needed to
assemble a full GNOME system.
It would contain a deployment as well, and _only_ the extensions
that are required to deploy GNOME.
Of course definitions-gnome would also consume
definitions-freedesktop as a submodule much in the same way that
definitions-freedesktop consumes definitions-gnu-linux as a
submodule, allowing a recursive chain of submodules to be used to
compose a system, and allowing the GNOME project maintainer to
choose what version of the definitions-freedesktop API surface it
wants to consume.
o Other systems
The same would be done initially for systems that we maintain, like
the genivi demo platform, which probably wants to additionally pull
in a Qt definitions module; possibly living side by side or on top
of the definitions-freedesktop module.
Systems that we do not currently maintain, but which obnoxiously exist,
will be left to die in the current definitions repository until
such time as an interested maintainer steps up to maintain that
system using the new layout.
Caveats:
As far as I can see, the main caveat in the above proposal is that
sometimes we will want to reorganize things; for instance, something
from GNOME might become desirable in the freedesktop module in order to
be reusable by another system, in which case we have some discontinuity
of history.
I don't think this is horribly bad. By analogy: it has happened before that
custom widgets (GtkOverlay and GtkInfoBar) have been developed as a
part of the gedit program, but were later considered desirable for
other applications and then merged into GTK+ instead.
The above is IMO a much better scenario than if all of the GNOME
Desktop were to be maintained in a single git repository.
Thoughts on this proposal ? Improvements on the given proposal ? Any
counter proposals to address the same issues in a different way ?
Cheers,
-Tristan
Baserock - Yocto
by Paul Sherwood
Hi folks,
I met with Richard Purdie at ELCE, and he graciously discussed some of
the fundamental differences of approach between the Yocto project and
Baserock.
TL;DR: Yocto is aiming for (and achieving) wide adoption, and this
involves compromises which are different from Baserock's simplifying
decisions.
Key points:
- his main awareness of Baserock prior to the conversation was our
choice to aim for only native builds, which he had pegged as 'a bit
crazy'
- I think Richard agrees that YAML is a good idea, but bitbake has its
custom format as a result of its history and popularity
- it's unrealistic for Yocto to expect everything in git, since that's
not the real world situation for most adopters
- given the above, patch files in the recipe tree are the best solution
- the Yocto mirror (and pre-mirror) approach is designed and intended
to deal with use-cases we care about, e.g. upstream moves, or internet
goes down, and CI/release process should be unaffected. When we talked
further Richard did acknowledge that there's been some recent
discussion/bug about a corner-case where upstream is a git repo, but in
general Richard believes that if configuration is correct, the problems
seen in (eg) GDP releases are normally avoided.
- Yocto does care about reproducible builds (in the bit-for-bit sense),
and there has been ongoing work to remove host config pollution. Richard
believes that bitbake does enough to isolate the toolchain from the host,
and does not see any difference vs our toolchain bootstrap approach. Maybe I
failed to explain this properly.
It was a friendly discussion, and I'm grateful for Richard's
consideration.
br
Paul
Baserock Definitions Schema V9, V10 & Defslib
by Daniel Firth
Hi all
In the last couple of weeks I've been working on an improvement to the
definitions format in order to address a few key problems we've had in
integrating system images. I will detail a list of what I've developed,
where and in what state, and some short stories to support the changes.
This work is partially to mostly complete: the morphology resolution
and assemblage manipulation is quite usable, but I do invite everybody to
try and break it as hard as possible, provide patches or even
cannibalize all of it.
Links:
* The V10 schema change (No ontology changes yet):
https://gitlab.com/baserock/spec/merge_requests/2/diffs
* Defslib (will attempt to build naively using `sudo ./quick-check.sh`):
https://gitlab.com/baserock/defslib
* Defslib pypi page: https://pypi.python.org/pypi/defslib
* V10 visualisation of base-system-x86-64-generic:
http://baserock.gitlab.io/defslib/index.html
* Example of pre-migrated definitions:
https://gitlab.com/baserock/definitions/tree/lc/010
Why (V9):
Suppose we develop a system foo-x86_64.morph, using a particular
toolchain stratum that was delivered. We'll call this
"foo-toolchain.morph". A large amount of strata rest atop, specifying
"foo-toolchain.morph" as a stratum build-dependency. After specifying
the definitions for the entire system and strata stack, we then wish to
swap out the foo-toolchain and test the system building against
build-essential. How much do we need to change in order to try this?
The answer to this is "everything". The entire strata stack
ultimately depends on foo-toolchain, necessitating a duplication of the
entire strata stack, which is not something we want to have to do just
to try a provisional compiler.
The answer for this came from observing why it is we don't suffer the
same problem in swapping out an individual chunk in a stratum, and that
is that the build-depends are ultimately the jurisdiction of the
enclosing stratum to declare and modify. V9 tries to homogenise this
slightly by moving control of the build dependencies out of the stratum
files themselves up to the system level, mirroring the same schema
that strata use to manage build-dependencies of their component chunks.
Testing a different toolchain now only involves modifying the filename
of the stratum included in the system.
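To illustrate the V9 idea only (the field names here are illustrative
guesses rather than the actual schema, which lives in the spec repository),
a system would declare the dependencies between its strata itself, mirroring
how a stratum declares dependencies between its chunks:
-------------------------------
name: foo-x86_64
kind: system
arch: x86_64
strata:
- name: foo-toolchain          # swap this one entry to try build-essential
  morph: strata/foo-toolchain.morph
- name: graphics-stack
  morph: strata/graphics-stack.morph
  build-depends:
  - foo-toolchain              # declared here, not inside graphics-stack.morph
-------------------------------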
Why (V10):
Suppose we have several systems all relying on the same collection of
strata for functional reasons. Suppose foo-x86_32, foo-x86_64 and
foo-armv7 include as strata [graphics-stack-banana,
graphics-stack-banana-core, graphics-stack-banana-plugins]. Suppose then
we want to switch to a single new graphics-stack-trampoline in all of
our systems. We want to keep both collections of strata around, as some
system may genuinely need the graphics-stack-banana collection. But
since all systems must explicitly list all of their strata, we must
update every system to cope with the new change. Perhaps what we would
like is a mechanism by which all systems can include a certain
collection of functional components [build-essential, core, foundation,
graphics-stack-something], which can abstract out a particular
graphics-stack provider across all systems that utilise the same
subcollection.
In a more extreme case we may want to replace the functionality of
several strata with a single chunk; however, this will require containing
the chunk in its own stratum, which is bloatful. Other times we may want
to do the reverse, and swap out a single chunk for a collection of
components already defined in a stratum; this requires copying or
rearranging.
V10 attempts to answer this by replacing system and stratum with a new
syntactic type called 'assemblage'. Assemblages are objects which
contain a contents list, which is a heterogeneous list of either
assemblage or chunk. In type theory parlance, where previously systems
held a field strata: List<Stratum>, and strata held a field chunks:
List<Chunk>, assemblages hold contents: List<Either Assemblage Chunk>.
Chunks and assemblages within a contents list can be made to depend on
each other. The visualisation of base-system in the links above
indicates one such factorisation: where previously base-system
duplicated the strata in minimal-system, here it need only include
minimal-system as an element of its contents and have foundation depend
on it. One further factorisation would be to collate the non-bsp parts
of minimal into, say, "minimal-stack", and to put only architecture
specific bsps in the system level, dependent on minimal-stack, allowing
reuse of "minimal-stack".
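As a purely illustrative sketch of the shape of an assemblage (field names
are my guesses; the real schema is in the spec merge request linked above),
the base-system factorisation described here might be written as:
-------------------------------
name: base-system-x86_64-generic
kind: assemblage
contents:
- name: minimal-system              # a whole assemblage nested in the contents
  morph: assemblages/minimal-system.morph
- name: foundation
  morph: assemblages/foundation.morph
  build-depends:
  - minimal-system                  # foundation depends on the sub-assemblage
-------------------------------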
Since the type of contents: is uniform across the board, assemblages can
be programmatically sorted, manipulated, and edited to form new
topologies. In the defslib example, the Actuator.build_assemblage()
function works by flattening the assemblage recursively into a list of
chunks, and sorting them topologically so that iterating superficially
over the contents list is also a sound build order.
System and stratum will still form part of the ontology of baserock, but
with a dependent meaning. A stratum is an assemblage that exists as a
component in another assemblage. A system is an assemblage that boots.
V10 also does not preclude continuing to define everything as has been
done in multiple systems, should the user prefer.
More stuff:
Analysis of the existing json schema indicated that it didn't handle
type checking files with any level of rigor. Due to the fact that
"morph:" fields allow for self-updating dictionaries, very few fields
can actually be guaranteed to exist in, say, a file containing chunk
build instructions. This, as documented in the defslib README, makes type
checking individual files pointless, and also led to ybd being unable to
use the schema itself to type check incoming data, instead relying on
its own field validation. Defslib currently parses V10 using the
MorphologyResolver, which is able to validate *fully resolved assemblages*,
that is assemblages that have had their morph: references inserted and
the morph: field popped out. This allows the entire assemblage to be
type checked, giving some assurance that the resulting structure is
something that can be understood by a build process, or other logic.
Defslib will attempt to build with sandboxlib chroot using "sudo
./quick-check.sh". I have not let this run very far yet, so more to come.
Br,
Dan
ybd-rpm, extensions, sharing ybd code
by Tristan Van Berkom
Hi all,
So today I present this monstrous hack which I came up with last
week; I've put it up on gitlab here:
git@gitlab.com:tristanvb/ybd-rpm.git
The majority of the code is a shameless copy/paste job of ybd, and the
interesting bits are in ybd-rpm/ybd-rpm.py and ybd-rpm/rpm.py (some
minor modification was also made to sandbox.py to allow more elaborate
staging techniques).
As I brought up on the irc channel last week, it has some problems
which I'd like to solve, discussed below.
Code Sharing
~~~~~~~~~~~~
I don't think that building rpms should be a first class citizen in
ybd, so I don't want to clutter ybd codepaths with knowledge of how to
build rpms. Because of this I created a separate tool (or mess).
This currently works by copying more than half of the ybd codebase as
is and making a program which converts chunk artifacts into rpms,
relying on an rpmbuild tool which is expected to exist in the system.
For rpms to be 'a thing' in any way, I need to cut the fat; short of
building some elaborate plugin system into ybd, I think the best way
forward here is to abstract the ybd functionality into some libraries,
hopefully with a well defined and clean API.
Some of this work will be solved by Daniel's work on defslib, but most
probably we need to step it up a notch.
The functionality I need from ybd includes:
o Configuration
It's important that any ybd 'sister tool' such as this one have
access to the same configuration as an adjacently installed ybd
program.
We would want to share things like the tmp directory for sandboxing
and the deployment directory for dumping output, etc. (as opposed
to having to configure a separate tool with essentially the same
data).
o Computing cache keys
To generate rpms from a given system morph & arch, under the
assumption that all the artifacts are available in ybd's configured
artifact directory, we of course need to know the cache keys.
This will probably be taken care of in the up and coming defslib.
o Walking the definitions dependency tree
When building rpms, one wants to build them in order of build
dependency. This is because rpm itself has some capacity for
automatically generating its own package dependencies, assuming
that your rpm database is created in a sane order.
This basically involves using tooling like ldd and such to
check if binaries inside rpms depend on files provided by
rpms you have already generated, so we want to build the rpms
in order of build dependency.
Other than this, it just makes sense to have an API for iteration
over the build dependency tree (see the small sketch after this
list) - ybd's code itself could be a bit
more readable if it was less recursive (even more so if we had some
job server logic handing out chunks to build to various 'instances'
rather than having the entire program race for a lock file, imo).
o Staging chunks and build directories
To build rpms, one needs to stage a sandbox that has rpmbuild
and one has to stage chunks *inside* that sandbox.
o Sandboxing
Most of this is handled by sandboxlib already, but it might be
better to review the API, and maybe adapt it to better suit the
needs of ybd, hopefully abolishing ybd's sandbox.py completely
and making it a better API fit.
Afaik, there are no other users of sandboxlib so we should feel
free I think to just change it (unless I'm mistaken on that).
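The dependency-ordered iteration mentioned above is essentially a
topological sort of the build graph. Here is a tiny self-contained Python
sketch of the kind of API I mean (the dict-of-lists input is hypothetical;
a real version would be fed from the parsed definitions):
-------------------------------
def build_order(components):
    # components maps a name to the list of names it build-depends on;
    # returns the names ordered so that dependencies always come first.
    ordered, visiting, done = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError('dependency cycle at %s' % name)
        visiting.add(name)
        for dep in components.get(name, []):
            visit(dep)
        visiting.discard(name)
        done.add(name)
        ordered.append(name)

    for name in components:
        visit(name)
    return ordered


# e.g. generate rpms in this order, so rpm's automatic dependency
# generation always sees a package's prerequisites already in the database
print(build_order({'glibc': [], 'fontconfig': ['glibc'],
                   'gtk+': ['fontconfig', 'glibc']}))
-------------------------------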
So, once the defslib refactor lands in the immediate future, I would
like to tear apart the remains of ybd and library-ize the majority of
the codebase, leaving ybd.py itself as a tiny python script which just
iterates over the build graph and builds chunks.
Extensions
~~~~~~~~~~
As far as I can see, and I may be wrong... extensions as we know them
do not allow for providing enough context to a 'sister program' to run.
To run ybd, one needs to know:
o The target arch you're building for
o The location of the definitions root (mostly for loading DEFAULTS
and the like).
o The ybd configuration (usually derived from checking where ybd
is invoked from, but that is impossible for a program that is
not ybd).
o The target morph file
I would like there to be some semantic for extensions in definitions to
be allowed to run a program with all the context that ybd is run with.
This would allow for more elaborate plugins/extensions to leverage the
proposed ybd libraries, other than simply running shell scripts.
I'm not sure what this would look like, though. Note that currently the
only way to use ybd-rpm is to run it manually after ybd succeeds in
producing a build; obviously any person or machine which was able to
run ybd in the first place has enough context to run ybd-rpm.
Extended Metadata
~~~~~~~~~~~~~~~~~
Currently rpms can be generated but lack some information which would
be nice to include in the final builds. This kind of metadata is
possibly not a first class citizen to definitions, but for convenience
one should be allowed to add extra data to definitions for special non
first class citizen purposes.
On IRC last week I discussed an approach which would allow extending
the metadata with attributes which could be safely namespaced and
safely ignorable when validating definitions.
The gist of it would be that
A.) Definitions could be extended with user-data-${foo}
So a chunk morph, or any morph, could have entries like:
-------------------------------
name: frobnicator
description: A tool for frobnicating foo's barbazes
kind: chunk
user-data-rpm:
  licence: GPLv2
  package-version: 2.2
  artifact-descriptions:
  - frobnicator-libs: The libraries used for frobnication
  - frobnicator-bins: The main frobnicator binary
-------------------------------
I'm sure the yaml above is not exactly correct, but I think you
get the idea; there is certainly valid yaml to describe a simple
dict like the above.
Also the suggested attributes are just an example of the kind
of stuff that a given plugin might want to extend definitions
with.
B.) A base library could provide a simple API for plugins to get
their data.
So a plugin (or 'sister program') that might run at
deployment time or any plug-in-able phase could simply
say:
rpm_data = morph_get_plugin_data (morph, 'rpm')
Or more elegantly
rpm_data = morph.get_plugin_data ('rpm')
and that would return a dict with the content of the
'user-data-rpm' yaml section depicted in (A) above.
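A minimal sketch of what such a helper could look like, assuming the morph
has already been loaded into a plain dict (the class name, the method name
and the 'user-data-' prefix are just the convention proposed above, nothing
that exists yet):
-------------------------------
import yaml


class Morphology(dict):

    def get_plugin_data(self, plugin):
        # Return the 'user-data-<plugin>' section of this morph, if any.
        # Validation can safely ignore keys matching 'user-data-*', so
        # plugins can extend definitions without touching the core schema.
        return self.get('user-data-%s' % plugin, {})


with open('frobnicator.morph') as f:
    morph = Morphology(yaml.safe_load(f))
rpm_data = morph.get_plugin_data('rpm')
-------------------------------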
Doing the three things above would not be too complex, and would allow
for better extensibility of ybd in the future; the harder part will be
to define a sane API for exposing some of the essential ybd plumbing in
such a way that we minimize API churn in the future.
Thoughts ? Suggestions ? Outrage ;-) ?
Is this a direction we're generally comfortable with ?
Cheers,
-Tristan