We've had a short discussion on IRC about maybe adding ActivityPub support
to Gitano (a git server) and ick (a CI system), instead of having explicit
triggers configured in Gitano to trigger builds in ick. I thought I'd expand
on that a little bit.
The thing I'm grasping at is that currently, each git server instance needs
to notify each CI instance separately, creating a fairly tight coupling.
Alternatively, each CI instance needs to poll each git server, which is
unworkable at scale.
So currently, if there are three users, one using github.com, one using
gitlab.com, and one using the Debian GitLab instance (salsa), and each user
is developing their own software, then each user needs to configure their
own git server to notify their own CI server. So far, so good.
However, if each of the three also wants to build and deploy each other's
software, so that if user A makes a change, it gets put through CI by A's,
B's, and C's CI, and then deployed to A's, B's, and C's production servers,
and likewise for B's and C's software, then each of the three needs to add
a trigger hook on each of the three git repositories. That's three
repositories, three CI servers, and a total of nine trigger hooks.
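To make the counting concrete, here's a small Python sketch. The names are placeholders I made up for the three users; the only point is that every (repository, CI server) pair needs its own hook, so the total is the product of the two counts.

```python
from itertools import product

# Placeholder names for the three users' repositories and CI servers.
repos = ["repo-A", "repo-B", "repo-C"]
ci_servers = ["ci-A", "ci-B", "ci-C"]

# With direct notification, each (repository, CI server) pair needs a hook.
hooks = list(product(repos, ci_servers))
print(len(hooks))  # 9


def hook_count(n_repos, n_ci_servers):
    """Hooks needed when every CI server builds every repository."""
    return n_repos * n_ci_servers
```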
That's still manageable, but there are problems. Not all git servers allow
random strangers to add trigger hooks on each repository. Also, three
repositories and three CI servers is small potatoes. Imagine having fifty
thousand repositories! Debian has about that many binary packages.
Also imagine that those fifty thousand repositories are used by a million
users. Not all of them by all users, but many repositories by each of the
million. Having millions of hooks does not scale. Having that much polling
also doesn't scale.
So here's the solution I'm thinking of: instead of having the git server
notify each CI server, have the git server send a message to an ActivityPub
server, and have each CI server listen for such messages on its own
ActivityPub server. The AP servers federate (send messages to each other),
so each git server and each CI server only needs to post to and listen on
one AP server.
With this design, many-to-lots communication becomes feasible.
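Here's a toy in-memory model of that idea in Python. It is not the real ActivityPub protocol, just a sketch of the shape: the `APServer` class, its method names, and the host names are all my own inventions for illustration. The git side posts once to its own AP server, and CI servers register interest via their own AP server.

```python
from collections import defaultdict


class APServer:
    """Toy stand-in for an ActivityPub server (not the real protocol)."""

    def __init__(self, name):
        self.name = name
        # repository -> AP servers that have at least one local follower
        self.follower_servers = defaultdict(set)
        # repository -> callbacks of local listeners (for example CI servers)
        self.local_listeners = defaultdict(list)

    def follow(self, origin, repo, callback):
        """Register a local listener for a repository announced via origin."""
        self.local_listeners[repo].append(callback)
        origin.follower_servers[repo].add(self)

    def post(self, repo, message):
        """Accept a message from a local git server and federate it out."""
        for server in self.follower_servers[repo]:
            server.deliver(repo, message)

    def deliver(self, repo, message):
        """Hand a federated message to every local listener of the repo."""
        for callback in self.local_listeners[repo]:
            callback(message)


git_ap = APServer("ap.git.example")
ci_ap_b = APServer("ap.b.example")
ci_ap_c = APServer("ap.c.example")

triggered = []
ci_ap_b.follow(git_ap, "repo-a", lambda msg: triggered.append(("ci-b", msg)))
ci_ap_c.follow(git_ap, "repo-a", lambda msg: triggered.append(("ci-c", msg)))

# The git server posts once; it never learns which CI servers exist.
git_ap.post("repo-a", "master changed")
print(sorted(triggered))
```

The follow relationships still exist, of course, but in this sketch they live in the AP servers and are created by the followers, not configured on the git server.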
A makes a change, and pushes to their git server. The git server posts a
message to its own AP server, saying "this branch on this repo on this git
server has changed, new commit is CAFEBEEF." The AP server knows who's
"following" A's repository, and so the message flows out to the AP server
network to every AP server where a CI server is interested in that
repository. Each of those CI servers can then trigger a build and deploy of
A's software.
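As a guess at what such a message could look like, here's a Python sketch that builds one as JSON. The `@context` URL and the `Update` type come from the ActivityStreams vocabulary that ActivityPub uses; everything else is an assumption on my part, including treating the repository URL as the actor, the `branch` and `commit` fields, and the `git.example` host.

```python
import json


def make_push_activity(repo_url, branch, commit):
    """Build a hypothetical 'this branch changed' activity as a dict."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Update",
        # Assumption: the repository itself acts as the actor being followed.
        "actor": repo_url,
        "object": {
            "type": "Object",
            "id": repo_url,
            # 'branch' and 'commit' are made-up extension fields.
            "branch": branch,
            "commit": commit,
        },
    }


def wants_build(raw, followed_repos):
    """CI side: decide whether an incoming message should trigger a build."""
    activity = json.loads(raw)
    return (
        activity.get("type") == "Update"
        and activity.get("actor") in followed_repos
    )


wire = json.dumps(
    make_push_activity("https://git.example/a/project", "master", "CAFEBEEF")
)
print(wants_build(wire, {"https://git.example/a/project"}))  # True
print(wants_build(wire, {"https://git.example/b/other"}))    # False
```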
I don't actually know much about ActivityPub yet. It's the protocol behind
the Mastodon system (https://joinmastodon.org/), and it works fine for a
Twitter clone. I don't know yet if it would work for what I'm talking about
above, but I think so.
Of course, doing distributed CI at this scale might not be something that
anyone actually cares about. We'll see. But even at a small scale, I'd like
to remove the need for the git server to know about each CI server that is
interested in each of its repositories.
Also, this needs more thought for non-public repositories.