On Thu, 19 Oct 2006 10:58:48 -0400, Aaron Bentley wrote:
> >> In bzr development, it's very rare for anyone's revision numbers to change.
> >
> > Which just says to me that the bzr developers really are sticking to a
> > centralized model.
>
> I don't see why you're reaching that conclusion.  I'd like to understand
> that better, because Linus seems to be concluding the same thing, and it
> doesn't make sense to me.

First, I want to point out that I think we're having a delightfully
enlightening conversation here, and I'm glad for that.

Let me provide a couple of hypothetical situations to try to
demonstrate my thinking here. The first is far-fetched but perhaps
easier to understand the implications. But the second is the real,
everyday situation that is much more important.

Far-fetched
-----------
Let's imagine there's a complete fork in the bzr codebase tomorrow. We
need not suppose any acrimony, just an amiable split as two subsets of
the team start taking the code in different directions.

Now, at the time of the fork, all published revision numbers apply
equally well to either team's codebase, (obviously, since they are
identical). But as the projects diverge they each start publishing
revision numbers with respect to their own repositories in their own
bug trackers, etc. Obviously, each project has its own "mainline" so
these new revision numbers are only unique within each project and not
between the two.

Time passes...

Finally the two teams (who had remained good friends after the
breakup) find a unifying theory that will let them work on a single
tool that will meet the needs of both user bases. So they want to
merge their code together.

After the merge, there can be only one mainline, so one team or the
other will have to concede to give up the numbers they had generated
and published during the fork. That is, the numbers will not be usable
within the new, merged repository.

Everyday
--------
Now, the above scenario is just silly. It's not likely to ever happen,
so it's really not worth considering as a motivating case.

But, what does (and should) happen everyday is exactly the same. So
here's a realistic situation that is worth considering:

An individual takes the bzr codebase and starts working on it. It's
experimental stuff, so it's not pushed back into the central
repository yet. But our coder isn't a total recluse, so his friends
help him with the code he's working on. They communicate about their
work, (perhaps on the main bzr mailing list), and make statements such
as "feature F is working perfectly as of version V".

But for these communications, revision numbers will not provide
historically stable values that can be used. It's impossible for our
coder to predict the numbers that will be assigned to his code when
they get merged back into the mainline---since some other unknown
programmer may have branched at exactly the same point and is trying
to make the same determination. Neither programmer can know which code
will land first, so neither can know what numbers will get assigned,
right?

Now, the programmers could get stable numbers by keeping the branch in
the main tree, or by at least pushing out the branching point to
"reserve" a number in the main tree.

So, the only way to get stable numbers is to rely on this central
tree.

Does that make sense?

> That doesn't follow.  Just because something is arguably true doesn't
> make it bad.  And in this case, I'm not arguing that it's true, I'm
> saying that it's true, because that is what my experience tells me is true.

[I'm sorry, but I didn't grasp this sentence. I think I lost the
antecedent of "it" somewhere.]

> > In cairo, for example, we've made a habit of including a revision
> > identifier in our bug tracking system for every commit that resolves a
> > bug.
>
> We do it the other way around: we put a bug number in the commit
> message.

Oh, we do that too. That number is important, (for "what the heck is
this commit trying to do, and why", since (sadly) much of the why ends
up getting stuck off in external bug tracking tools). But the reverse
direction is also important, ("Hey, this bug got fixed in the
development version, but I want to backport it to my distribution
package. Where can I find it?").

>          And I personally have been developing a bugtracker that is
> distributed in the same way bzr is; it stores bug data in the source
> tree of a project, so that bug activities follow branches around.

That kind of thing sounds very useful. As I've been talking about
"numbers" here in bug trackers and mailing lists, it should be obvious
that I consider the information stored in such systems an important
part of the history of a code project. So it would be nice if all of
that history were stored in an equally reliable system in some way.

> On the other hand, I think your revision identifiers are not as
> permanent as you think.
>
> In the first place, it seems fairly common in the Git community to
> rebase.  This process throws away old revisions and creates new
> revisions that are morally equivalent[1].

Yes, rebasing does "destroy history" in one sense, (in actual fact, it
creates new commits and leaves the old ones around, which may or may
not have references to them anymore). But i's definitely not common
for git users to use rebase in a situation where it would change any
published number.

For example, I regularly use git-rebase, (and similar "git-commit
 --amend"), as I'm putting together a new branch that exists only
in a repository on my laptop with nobody having external visibility to
it.

So, if I see a typo in a commit and I've never pushed it anywhere,
I'll just "git commit --amend" to fix it. But if I see that typo only
after I push out the change, then I just make a new commit to fix it,
(and suck up the fact that my mistake will be a permanent part of the
history).

And git helps with this as well. If I ever forget that I've already
pushed a change and then I rebase, then the next time I try to push,
git will complain that I'm attempting to throw away history on the
remote end, and will refuse to cooperate, (unless I force it).

There's a similar safety mechanism on the pull side. If I did force a
history-rewriting push, then users who tried to pull it would also
have to force git's hand before it would rewrite their history.

[By the way, it is sometimes useful to make chaotic, regularly-rebased
branches visible to others, so they can watch what's going on. (Junio
does this with his "proposed updates (pu)" branch in hit repository
for git itself, for example). It's just that such branches should
never be used to start new development if they expect to pull from the
branch again later, nor should the revision numbers of such a branch
ever be considered permanent, nor published anywhere.]

> In the second place, one must consider the "nuclear launch codes"
> scenario.

Sure. And git does provide tools that can do this. Of course, the
"normal" tools strictly add new commits and move branches (which are
no more than references to commits) around. But moving branches can
leave commits unreferenced. And a "prune" command does exist, (which
isn't needed in "normal" use), which will delete unreferenced objects.

-Carl

> [1] This is a process that I find discomforting, because I consider the
> original revisions to be real, historical data, and I don't like the
> idea of throwing it away.

As I mentioned above. They aren't thrown away. I often use rebase when
re-building an ugly series of patches into a nice clean set of
patches. And in that situation, I might rebase from the old to the
new, but still with a reference to the old branch until I'm done with
the entire process. And it's perfectly possible, and legitimate that
such a reference has been published and the old branch will live
"forever" even if I rebased it. So rebase isn't necessarily
destructive.