Introducing a per-commit key/value store for Git

Richard Maw richard.maw at
Wed Jan 2 16:57:14 GMT 2013

On Wed, Jan 02, 2013 at 04:56:35PM +0100, Jannis Pohlmann wrote:
> Hey,
> first of all, thanks for the feedback!
> On 13-01-02 15:27:58, Richard Maw wrote:
> > Validating key format with a regular expression is overkill unless the
> > format changes. You just need a set operation like the following.
> > 
> > valid_chars = string.digits + string.ascii_letters + '-_/:'
> > any((c in valid_chars) for c in key)
> I think you mean all()? But yes, I agree with your point.

I did mean all() :)
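
For the record, a minimal sketch of the corrected check. The function
name is_valid_key and the rule that an empty key is rejected are my own
assumptions, not something gitpercs defines.

```python
import string

# Characters permitted in a key, taken from the snippet quoted above.
VALID_CHARS = set(string.digits + string.ascii_letters + '-_/:')


def is_valid_key(key):
    """Return True if key is non-empty and uses only permitted characters.

    Rejecting the empty key is an assumption made for this sketch.
    """
    # all() is the right quantifier: every character has to be valid.
    # any() would accept a key as soon as a single character is valid.
    return bool(key) and all(c in VALID_CHARS for c in key)
```

With any() in place of all(), a key such as 'foo.bar' would pass because
'f' alone satisfies the test.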

> > Interestingly, the '.' character is not valid, though git itself uses it
> > as a configuration path separator.
> There's no good reason for this. All we need is a sensible format that
> works for our use cases. Caching rendered web content is probably the
> most tricky one here as people will likely want to use URL paths
> as keys. But if that involves more than just basic characters, they can
> always convert between URL paths and keys encoded using base64 and a
> basic alphabet that is allowed.
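
A rough sketch of that conversion, assuming the URL-safe base64 alphabet
(letters, digits, '-' and '_') is what is meant by a basic alphabet. The
function names are hypothetical. The '=' padding is stripped because it
is not among the permitted key characters, and restored before decoding.

```python
import base64


def url_path_to_key(path):
    """Encode a URL path using only characters from [A-Za-z0-9_-]."""
    encoded = base64.urlsafe_b64encode(path.encode('utf-8')).decode('ascii')
    # '=' padding is not in the permitted key alphabet, so drop it.
    return encoded.rstrip('=')


def key_to_url_path(key):
    """Invert url_path_to_key, restoring the stripped padding first."""
    padding = '=' * (-len(key) % 4)
    return base64.urlsafe_b64decode(key + padding).decode('utf-8')
```

The cost is that the keys stop being human-readable, so this only makes
sense for keys that the application generates and looks up itself.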

RFC 3986 (URIs) allows pretty much all printable characters apart from
"<>#%\"", and % can appear for escaping.

> > I don't think that there's a use case for needing snapshots of the state
> > of every annotation together, so I would have multiple percs refs named
> > after the sha1 of the commit they annotate, so the commits only have the
> > property names and values.
> That would be an option. In the initial implementation I decided against
> it because it would potentially generate a lot of refs. Overall, I think
> what I'd go for is refs like
>   refs/percs/<sha1>
> rather than
>   refs/heads/percs/<sha1>
> because the latter might conflict with real branches.

Agreed, it also makes `git branch` cluttered.
However it will add complications to fetching, since git defaults to
just refs/heads and refs/tags.
For the caching use-case I don't see a problem though, since you can
always generate the pairs in refs/percs yourself, or change your fetch
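
One possible way to get refs/percs fetched is an extra fetch refspec on
the remote. The sketch below only wraps the git config call; the helper
names, the default remote name 'origin' and the choice to mirror the
refs into the same refs/percs/ namespace locally are all assumptions.

```python
import subprocess


def percs_fetch_config_cmd(remote='origin'):
    """Build the git command that adds a fetch refspec for refs/percs/*."""
    # The leading '+' allows non-fast-forward updates of the fetched refs.
    refspec = '+refs/percs/*:refs/percs/*'
    return ['git', 'config', '--add', 'remote.%s.fetch' % remote, refspec]


def enable_percs_fetch(repo_dir, remote='origin'):
    """Run the command in repo_dir; raises CalledProcessError on failure."""
    subprocess.check_call(percs_fetch_config_cmd(remote), cwd=repo_dir)
```

After that, a plain `git fetch` from that remote would also update
refs/percs/*.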

> > Since this could be involved in page caches for bottle, I'm guessing
> > it's in for point 3, since you could have the processed page as the
> > .value and the key be the relative path to the page, in which case a
> > checkout is a static snapshot of the page.
> > 
> > However, being able to do that requires that the web server redirects to
> > the .value file, or the format is changed so the last path component is
> > the blob.
> I'd expect web applications to load the .value files via libgit2 rather
> than redirecting to checked out versions of these files.
> > In summary; you could special case the format such that a checkout of
> > the tree would become useful, or the application needing to substitute
> > '/', but at the cost of creating a lot of
> > potentially redundant tree objects.
> I think you're right and we should avoid creating 1+ trees for every
> single key/value pair. I'll think about this for a bit. Flat trees with
> keys represented as blobs with names like
>   foo
>   foo:bar:baz
>   bla:1231423:bla
> might be ok as well if gitpercs converts between / and : internally so
> that applications can still use
>   foo
>   foo/bar/baz
>   bla/1231423/bla
> transparently. Does that make sense?

If : and / are both valid separators then that would make foo:bar and
foo/bar synonyms. A clash is unlikely, but it would be confusing.
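
To illustrate the collision, here is a small sketch of the proposed
mapping. None of these functions exist in gitpercs; the names are made
up, and the strict variant is just one way of avoiding the ambiguity, by
reserving the internal separator.

```python
def key_to_blob_name(key):
    """Map an application key to a flat blob name by turning '/' into ':'."""
    return key.replace('/', ':')


def strict_key_to_blob_name(key):
    """Same mapping, but refuse keys that already contain ':'."""
    if ':' in key:
        raise ValueError("':' is reserved as the internal separator: %r" % key)
    return key.replace('/', ':')


def blob_name_to_key(name):
    """Map a flat blob name back to the application-facing key."""
    return name.replace(':', '/')
```

With the first function, 'foo/bar' and 'foo:bar' both end up as the blob
'foo:bar', so one silently overwrites the other. The strict variant
keeps the mapping reversible at the cost of taking ':' away from
applications.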

Control characters aren't allowed, so if it doesn't need to be printable
then how about

If it needs to be printable, it can be one of ' <>#"'.

More information about the baserock-dev mailing list