Our system artifacts are currently tar archives of all the files that
go into the system. They can be quite big: a base system for x86-64 is
currently almost 900 megabytes, and a devel system is 1.5 gigabytes.
There are several reasons for this. A primary one is that we do not
yet split development tools and libraries out of the base system;
work to fix that will happen soon.
However, another reason is that we store every file in the system in
the tar archive. In principle, a system artifact is a union of stratum
artifacts, each of which is a union of chunk artifacts (so a system is
really a union of chunks). We already implement stratum artifacts as a
list of (versioned!) chunk artifacts, so that a stratum does not need
to store the same files the chunks already store. We should do the
same for systems.
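For illustration only (the real artifact format differs in detail), a
stratum can then be little more than a manifest naming its versioned
chunk artifacts:

    $ cat example.stratum     # hypothetical names and format
    glibc.chunk.a1b2c3d4
    busybox.chunk.59e8f7a6
    zlib.chunk.77c0d1b2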
Unfortunately, a system artifact is not purely a union of stratum
artifacts: it contains some changes on top of that union. Most
notably, we run ldconfig at system construction time, and there are
other tools we need to run then as well, such as building the
gdk-pixbuf loader cache.
Today I prototyped a way to store a system artifact as a list of
stratum artifacts plus a binary delta between the pure union and the
tree after running ldconfig. Roughly (a shell sketch follows the
list):
* Create a temporary directory, A.
* Hardlink all the chunks in the system into A.
  - this is the same procedure we already use for creating staging
    areas
* Copy A into a new directory B.
- a hardlink copy should do, if we can make the next step safe
* Run ldconfig, and other such things, in B.
* Produce a Trebuchet delta (tbdiff-create) between A and B.
* Create a system artifact consisting of the list of stratum
artifacts plus the Trebuchet delta.
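In shell terms, the steps amount to roughly this (the unpacked-chunk
location and the tbdiff-create argument order are my assumptions):

    # Sketch only: assumes unpacked chunk artifacts live under
    # /cache/chunks/<name>/, which is a made-up layout.
    A=$(mktemp -d)
    B=$(mktemp -d)

    # Hardlink every chunk in the system into A, as for staging areas.
    for chunk in /cache/chunks/*; do
        cp -al "$chunk"/. "$A"/
    done

    # A hardlink copy of A into B is enough, provided the tools run
    # below replace files rather than modify them in place.
    cp -al "$A"/. "$B"/

    # Run the system-construction-time tools inside B.
    chroot "$B" ldconfig

    # Binary delta between the pure union and the adjusted tree
    # (assumed argument order: output, source tree, target tree).
    tbdiff-create system.delta "$A" "$B"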
I wrote some very hacky scripts to do this. Result:
916252160 current system artifact
46080 new system artifact
That's about 900 megabytes vs 45 kilobytes. Quite a difference.
When the system artifact is used, it can be reconstructed by unpacking
the stratum artifacts and applying the Trebuchet delta. Again, the
unpacked, cached chunk artifacts can, hopefully, be used to speed up
the process.
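A reconstruction sketch, again with assumed names (I believe
Trebuchet's applying tool is tbdiff-deploy, but the exact invocation
here is a guess):

    # Rebuild the system tree from its parts (illustrative paths).
    mkdir /tmp/system
    for stratum in /cache/strata/*.stratum; do
        # Unpack each stratum's chunks; cached unpacked chunks could
        # be hardlinked into place instead to speed this up.
        tar -xf "$stratum" -C /tmp/system
    done
    # Re-apply the changes ldconfig and friends made.
    tbdiff-deploy system.delta /tmp/system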
If we can make this happen, then:
* during system construction, we don't need to write a gigabyte or two
of data when creating the system directory tree, and then the same
amount again when creating the tar archive of the root filesystem
* during deployment, we don't need to write a gigabyte or two of data
for the configuration phase
There are a lot of details to get right, of course, and some
experimentation and research is needed to make sure the unpacked chunk
caches are safe to hardlink into. (ldconfig, and any other programs
that make changes, will need to make them by writing to a temporary
file and renaming it over the real file, rather than modifying the
real file in place. That's usually the case anyway.)
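The pattern in question is the usual atomic-replace idiom; with
ldconfig, for example, -C chooses the output file, so it can be done
by hand:

    # Write the new cache to a temporary name, then rename it over the
    # old one; the hardlinked copy in the pristine tree keeps pointing
    # at the old inode and is left untouched.
    ldconfig -C /etc/ld.so.cache.new
    mv /etc/ld.so.cache.new /etc/ld.so.cache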
--
http://www.codethink.co.uk/ http://wiki.baserock.org/ http://www.baserock.com/