Warning: This will be a long mail to read.
I was asked if I could make it possible to mirror gcc with shallow history.
After a bit of investigation, I said yes, but it will be difficult.
I stand by that assessment, but I no longer have the time to do the work,
since I hit a problem that pushed the effort required beyond my available time.
The original plan
1. Extend gitano to let you configure it to accept shallow history.
Patches to support this were sent upstream.
2. Make lorry-controller enable shallow when creating repositories.
3. Make lorry support fetching from shallow repositories.
4. Add extra configuration for git lorries to:
1. Allow you to specify which refs to fetch.
This would let you only fetch important tags and refs.
2. Allow you to specify a fetch depth.
This would allow you to make a shallow repository out of a deep one.
The flow for performing a shallow import would then be:
1. lorry-controller reads the config, and if it's new, creates the repository
and sets the
2. lorry runs the equivalent of:
`git fetch $upstream_url --depth=$depth --update-shallow $refspecs`
3. lorry runs the equivalent of:
`git push ssh://git@localhost/delta/$repo $refspecs`
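Steps 2 and 3 of the flow could be sketched roughly as follows. This is only an illustration, not lorry's actual code: the `shallow_import` function and the `GIT` override variable are made up for this example, and `GIT` exists only so the commands can be previewed with `GIT=echo` instead of being run.

```shell
#!/bin/sh
# Sketch only. GIT is a hypothetical override so the commands can be
# previewed (GIT=echo) without touching a real repository.
GIT=${GIT:-git}

# shallow_import UPSTREAM_URL DEPTH REPO REFSPEC...
shallow_import() {
    upstream_url=$1
    depth=$2
    repo=$3
    shift 3
    # Step 2: fetch only the requested refs, truncated to $depth commits.
    "$GIT" fetch "$upstream_url" --depth="$depth" --update-shallow "$@" || return 1
    # Step 3: push the same refspecs to the trove.
    "$GIT" push "ssh://git@localhost/delta/$repo" "$@"
}
```

Setting `GIT=echo` before calling `shallow_import` prints the two command lines, which is a cheap way to check the refspecs before a real import.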
Unfortunately I encountered many issues,
since shallow mirrors are a use-case that git upstream hasn't considered before.
1. We need to push the new history to the trove,
but shallow push breaks if you push more than one branch at a time.
We can work around this by performing multiple pushes,
which would allow lorry to work without requiring a new version of git.
2. We lose the ability to detect non-fast-forward updates.
We can work around this by setting `--force` when fetching.
See later for why, and how it can be fixed.
3. Converting an existing repository to/from a shallow mirror isn't possible.
The problem is that pushing from source to destination isn't the same
as fetching from destination to source.
This would cause us trouble if we wanted to deepen the history of a shallow mirror.
An exceedingly ugly work-around is possible,
and is detailed later in this document.
4. Old downstream troves will be unable to fetch from shallow repositories.
We could hide the repositories from downstream troves,
but gitano upstream doesn't want shallow repositories to be hidden
either by default or by configuration option,
so we'd need to fork or write a plugin.
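The multiple-push workaround from issue 1 might look something like this. It is a minimal sketch under assumptions: `push_each_ref` and the `GIT` override variable are invented for illustration, and the choice to carry on after a failed ref rather than stop is a design guess, not something lorry does today.

```shell
#!/bin/sh
# Sketch only. GIT is a hypothetical override (e.g. GIT=echo for a dry run).
GIT=${GIT:-git}

# push_each_ref REMOTE_URL REFSPEC...
# Pushes one refspec per invocation, since pushing several refs at once
# from a shallow repository is what breaks.
push_each_ref() {
    remote_url=$1
    shift
    failed=0
    for refspec in "$@"; do
        # Keep going after a failure so one bad ref does not block the rest.
        "$GIT" push "$remote_url" "$refspec" || failed=1
    done
    return "$failed"
}
```

The function returns non-zero if any single push failed, so a caller can still treat the import as unsuccessful.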
Fixing shallow fast-forward detection
Discussion with upstream can be found at:
1. Have the sender advertise the "ancestry-check" feature.
2. Have the receiver request ancestry before sending "want"s.
See `find_common()` in `fetch-pack.c`
3. Have the sender reply with the relationships.
See `receive_needs()` in `upload-pack.c`
4. Have the receiver augment its `struct ref`
to add the ancestry result.
5. Continue as normal until `update_local_ref()` in `builtin/fetch.c`,
where the `in_merge_bases()` check should use the ancestry result.
If the server doesn't support "ancestry-check",
then it would be convenient to degrade to forcing the fetch,
which would require adding an extra config option for fetch.
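The decision made in step 5, including the proposed fallback, can be illustrated with a small sketch. Everything here is hypothetical: "ancestry-check" does not exist in git, and the `decide_ref_update` function, its result names (`ancestor`, `diverged`, `unknown`) and the boolean fallback argument are placeholders for whatever the real protocol extension and config option would provide.

```shell
#!/bin/sh
# Sketch only: models the outcome of the proposed check, not git's code.

# decide_ref_update ANCESTRY ALLOW_FORCE_FALLBACK
#   ANCESTRY: ancestor  (old tip is an ancestor of the new tip)
#             diverged  (sender says it is not)
#             unknown   (sender does not support "ancestry-check")
#   ALLOW_FORCE_FALLBACK: true to degrade to a forced fetch when unknown
decide_ref_update() {
    case "$1" in
        ancestor) echo fast-forward ;;
        diverged) echo rejected ;;
        unknown)
            if [ "$2" = true ]; then
                echo forced
            else
                echo rejected
            fi
            ;;
        *)
            echo "unexpected ancestry value: $1" >&2
            return 2
            ;;
    esac
}
```

For example, `decide_ref_update unknown true` prints `forced`, which corresponds to degrading to a forced fetch against a server without the feature.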
Working around the lack of a way to specify depth with push
This is a pretty evil solution, but would work.
It would also work around the bug that allows only one ref per push.
The gist of it is to add a plugin to gitano that adds a `reverse-fetch` command,
which boils down to the following, after ensuring the auth hooks support it.
git fetch fd::0,1 "$@"
Then rather than allowing lorry to push its changes,
lorry controller must do the push with reverse-fetch,
since lorry is explicitly unaware of gitano extensions.
Pushing with reverse-fetch involves expanding the push refspec, reversing it,
and plugging an upload-pack command into the ssh command.
# Note: not quoting or whitespace safe.
fetchcmd="reverse-fetch $fetchrepo --depth=$DEPTH"
socat EXEC:"git upload-pack $(quote "$GIT_REPO")" \
      EXEC:"ssh git@localhost $fetchcmd"
`--depth` is required if the repo is shallow.
If the repo was shallow before lorry fetched but is no longer shallow afterwards,
then `--unshallow` must be passed to the fetch, rather than `--depth`.
`--unshallow` may not be specified at any other time.
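The rules for choosing between `--depth` and `--unshallow` could be captured in a helper like the one below. This is a sketch under assumptions: `reverse_fetch_depth_arg` is a made-up name, and the caller is assumed to have recorded whether the repo was shallow before and after lorry's fetch as the strings `true` or `false`.

```shell
#!/bin/sh
# Sketch only: picks the depth-related argument for the reverse fetch.

# reverse_fetch_depth_arg WAS_SHALLOW IS_SHALLOW DEPTH
#   WAS_SHALLOW: true/false, state before lorry fetched
#   IS_SHALLOW:  true/false, state after lorry fetched
reverse_fetch_depth_arg() {
    was_shallow=$1
    is_shallow=$2
    depth=$3
    if [ "$is_shallow" = true ]; then
        # Still shallow: --depth is required.
        printf '%s\n' "--depth=$depth"
    elif [ "$was_shallow" = true ]; then
        # Was shallow, now complete: the only case where --unshallow is valid.
        printf '%s\n' "--unshallow"
    fi
    # Never shallow: print nothing, so no extra argument is passed.
}
```

The two states could be sampled with `git rev-parse --is-shallow-repository`, which prints `true` or `false` in recent git versions.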
Preventing downstream trove breakage
Old downstream troves can't handle shallow repositories.
Unless we can ensure there are no old downstream troves,
we must hide shallow mirrors from gitano's `ls` output,
so that old troves don't attempt to mirror them.
There is precedent, since gitano supports archiving repositories to hide them.
However gitano upstream won't accept a patch that hides shallow repositories,
either by default or by configuration option.
So instead this would require a plugin to wrap the "ls" command,
so that unless you specify `--show-shallow`, they aren't listed.
Then new versions of lorry-controller, once fixed to support shallow repositories,
could opt in by passing `--show-shallow`.
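The filtering such a wrapper would do can be illustrated in shell, although a real gitano plugin would not be written this way. The sketch assumes a lot: `list_repos` is a made-up name, repository names are assumed to arrive one per line on stdin (which is not necessarily what gitano's `ls` prints), and `SHALLOW_REPOS` is a hypothetical space-separated list standing in for however the plugin would detect shallow repositories.

```shell
#!/bin/sh
# Sketch only: filters a list of repository names read from stdin.
# SHALLOW_REPOS is a hypothetical space-separated list of shallow repos.
SHALLOW_REPOS=${SHALLOW_REPOS:-}

# list_repos [--show-shallow]
list_repos() {
    show_shallow=false
    if [ "${1:-}" = --show-shallow ]; then
        show_shallow=true
    fi
    while IFS= read -r repo; do
        case " $SHALLOW_REPOS " in
            *" $repo "*)
                # Shallow repository: hide it unless explicitly requested.
                [ "$show_shallow" = true ] || continue
                ;;
        esac
        printf '%s\n' "$repo"
    done
}
```

Old troves, which never pass `--show-shallow`, would then only ever see the repositories they are able to mirror.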