Introducing a per-commit key/value store for Git

Richard Maw richard.maw at codethink.co.uk
Wed Jan 2 15:27:58 GMT 2013


On Wed, Jan 02, 2013 at 02:37:01AM +0100, Jannis Pohlmann wrote:
> The source code is available at
> 
>   https://github.com/Jannis/gitpercs
> 
> Please have a skim through the code and comments (esp. the main doc
> string for the Store class in gitpercs/store.py). I'd appreciate
> feedback to the current design. I reckon especially Daniel might
> come up with remarks wrt the usage of Git internals here. ;)
> 
>   - Jannis

Validating the key format with a regular expression is overkill unless
the format is expected to change. A simple membership check like the
following is enough.

import string

valid_chars = string.digits + string.ascii_letters + '-_/:'
# True only if every character of the key is allowed
all((c in valid_chars) for c in key)

Interestingly, the '.' character is not valid, though git itself uses it
as a configuration path separator.
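
As a rough sketch, that check could be wrapped up as below. The name
is_valid_key is made up for illustration, not something from gitpercs,
and rejecting the empty key is my own assumption.

```python
import string

# Characters allowed in a key; note that '.' is deliberately absent.
VALID_CHARS = frozenset(string.digits + string.ascii_letters + '-_/:')

def is_valid_key(key):
    """Return True if key is non-empty and uses only allowed characters."""
    # is_valid_key is a hypothetical helper, not part of gitpercs.
    return bool(key) and all(c in VALID_CHARS for c in key)
```

With this, a key such as 'pages/index' passes, while 'core.editor' is
rejected because of the '.'.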

I don't think there's a use case for snapshotting the state of every
annotation together, so I would have multiple percs refs, each named
after the sha1 of the commit it annotates, so that the commits they
point to contain only the property names and values.
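
To make the naming concrete, here is a minimal sketch. The refs/percs/
prefix and the function name percs_ref_for are assumptions for
illustration; I haven't checked what gitpercs actually uses.

```python
HEX_DIGITS = frozenset('0123456789abcdef')

def percs_ref_for(commit_sha1, prefix='refs/percs/'):
    """Return the name of the ref holding annotations for one commit.

    Both the prefix and this helper are hypothetical.
    """
    # Only accept a full, lower-case, 40-character sha1.
    if len(commit_sha1) != 40 or not all(c in HEX_DIGITS for c in commit_sha1):
        raise ValueError('not a full sha1: %r' % (commit_sha1,))
    return prefix + commit_sha1
```

Each annotated commit then gets its own small history, rather than all
annotations sharing one.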

I don't think having a tree for every path component is a good idea: it
creates a lot of trees and complicates the code significantly.
The only benefits I know of are:
  1. it produces a smaller top-level tree
  2. it is less likely to run into problems checking out the tree
     on systems with small directory entry size limits but implausibly
     large numbers of properties
  3. applications don't need to escape or strip / components in property
     names
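
To give a feel for the trade-off in the points above, here is a small
sketch that counts the tree objects each layout would need. It assumes
the nested layout ends every key in a tree of its own (holding the
value blob), and that a flat layout escapes '/' as '%2F'; both function
names and the escaping scheme are made up for illustration.

```python
def count_trees_nested(keys):
    """Count trees when every '/'-separated component gets its own tree."""
    trees = {''}  # '' stands for the top-level tree
    for key in keys:
        parts = key.split('/')
        # Every prefix of the key, including the full key, is a tree.
        for i in range(1, len(parts) + 1):
            trees.add('/'.join(parts[:i]))
    return len(trees)

def flat_entry_name(key):
    """Entry name for a key in a single flat top-level tree.

    '%' is not a valid key character, so '%2F' cannot collide with
    anything already in a key.
    """
    return key.replace('/', '%2F')

keys = ['docs/index', 'docs/api/store', 'docs/api/refs']
nested_trees = count_trees_nested(keys)
flat_names = [flat_entry_name(k) for k in keys]
```

For these three keys the nested layout needs six trees, where the flat
layout needs only the top-level one.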

Since this could be involved in page caches for bottle, I'm guessing
it's there for point 3: you could have the processed page as the
.value and the key as the relative path to the page, in which case a
checkout is a static snapshot of the page.

However, being able to do that requires that the web server redirects to
the .value file, or the format is changed so the last path component is
the blob.
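
The two options can be sketched as a mapping from key to the path of
the value blob in a checkout. The function names are hypothetical, and
the first one is only my reading of the current layout.

```python
import posixpath

def value_path_current(key):
    """Blob path if each key is a tree holding a '.value' blob."""
    return posixpath.join(key, '.value')

def value_path_alternative(key):
    """Blob path if the last path component is itself the blob."""
    return key

def keys_conflict(a, b):
    """True if one key is a path prefix of the other.

    In the alternative layout such keys cannot coexist, because the
    shorter one would have to be both a blob and a tree.
    """
    return a != b and (b.startswith(a + '/') or a.startswith(b + '/'))
```

With the alternative layout the web server can serve the checkout
directly, but keys like 'blog' and 'blog/hello' can no longer both
exist.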

In summary: you could special-case the format so that a checkout of
the tree becomes useful, or so that applications don't need to
substitute '/', but at the cost of creating a lot of potentially
redundant tree objects.




More information about the baserock-dev mailing list