From: Linus Torvalds <torvalds@osdl.org>
To: Petr Baudis <pasky@suse.cz>
Cc: linux@horizon.com, Git Mailing List <git@vger.kernel.org>
Subject: Re: [ANNOUNCE] Git wiki
Date: Fri, 5 May 2006 10:48:38 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0605050944200.3622@g5.osdl.org>
In-Reply-To: <20060505163629.GZ27689@pasky.or.cz>
On Fri, 5 May 2006, Petr Baudis wrote:
>
> It's a philosophical question here, but I'd say that Git is much closer
> to Monotone than to any other version control system
Some historical background..
Before I dropped BK, I ended up being involved in trying to get Larry and
Tridge to come to some agreement about how to solve the issues Tridge had
with BK not being open-source. That actually went on for maybe two months
or so, and I kept on hoping that we'd find some acceptable middle ground.
I thought we could find something that would actually work for everybody:
to hopefully both make BK technically better, _and_ to make the end result
more palatable to the "free software or bust" contingent.
One of the suggestions that I tried to push as an acceptable middle ground
was to make a "generic" BK repository export format, so that people who
didn't want to use BK could still get all the information, and not in a
broken format like CVS (yes, CVS makes sense as an interchange format,
since _everybody_ speaks CVS, but it's a horrible, horrible, horrible
format from any technical standpoint).
My example export format was really a strange mixture of patches with
parenthood information, where the history information was described with
hashes (MD5 rather than SHA1, but that was just an implementation thing,
and mostly because BK used MD5 sums). Not something really useful as a
real SCM, but it wasn't designed for that - it was just meant to be a
useful and unambiguous interoperability format.
Now, that didn't work out, and I was a little bummed. I thought it would
have made both sides happy, because it would actually have been a better
format than CVS (and yes, I'm somewhat biased: in my opinion, having a
million monkeys throwing crap at the walls and encoding the information in
the patterns on monkey shit is a better format than CVS), so it would
actually have improved BK, while also making it possible to interoperate
if you didn't want to use BK itself.
But Tridge didn't believe that it would actually have exported all the
information in a BK tree, even if both I and Larry told him it would. I'm
not a hundred percent sure that Larry would have gone for the export
format either, but hey, one sign of a good compromise is that neither side
really gets what they really want. Whatever. It didn't work.
So it didn't actually resolve the deadlock, but when it became clear that
I couldn't work with BK any more, I thought I might use something like
that "patch + parenthood" representation as a way to maintain my tree
while looking at other alternatives.
So in many ways, when I started looking around for distributed SCM's, I
came into the game with the background of keeping the history around as
chains of hashes describing it, and then just having patches to describe
the differences between versions.
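The "patch + parenthood" representation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `PatchEntry`, `entry_id` and `append` are hypothetical, the way parents and patch text are fed to the hash is made up for the example, and MD5 is used only because the export format is said to have used it. It is not the actual export format, which the message does not spell out.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchEntry:
    """One history entry: a patch plus the hashes of its parent entries."""

    parents: tuple[str, ...]  # hex digests of parent entries (empty for a root)
    patch: str                # patch text, treated as an opaque string here

    def entry_id(self) -> str:
        # Hash the parent ids together with the patch, so that an entry's
        # name depends on its whole ancestry and not just on its own diff.
        # (Hypothetical layout: one parent id per line, blank line, patch.)
        h = hashlib.md5()
        for parent in self.parents:
            h.update(parent.encode() + b"\n")
        h.update(b"\n" + self.patch.encode())
        return h.hexdigest()


def append(history: dict[str, PatchEntry],
           parents: tuple[str, ...], patch: str) -> str:
    """Add an entry to the history and return its hash."""
    for parent in parents:
        if parent not in history:
            raise KeyError(f"unknown parent {parent}")
    entry = PatchEntry(parents, patch)
    history[entry.entry_id()] = entry
    return entry.entry_id()
```

Because each id covers the parent ids, the same patch applied on top of a different history gets a different name, which is what makes the chain unambiguous as an interchange format.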
So that was really my "fallback" position: if nothing out there worked,
I'd rather go back to lists of patches than use CVS.
Now, if you keep track of just patches, one of the issues is that you
can't afford to re-create the tree every time by walking patches forward
from the beginning, so I also was planning to have a "cache" that
maintained the current state of the tree as a separate state from the
working tree, so that I would always have the "working tree" and the
"result of patches up to this moment" as two separate things (so that I
could do the "bk diff" that I was used to doing to see the difference
between my last state and the current state of the working tree).
In other words, I was already working on the git "index" file. And I was
planning to just have a patch-based system behind it, with a hashed
history. Kind of "quilt with history and an index to speed things up".
The index itself would be backed-up with whole files (all hidden in the
".dircache" directory), and the patch series would thus normally never
actually be _used_. So the inefficiency of working with patches would
never be much of an issue. A "commit" would create a new patch from the
current working directory and the previous shadow tree, and update the
shadow tree and add a new entry to the history list.
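A rough sketch of that planned commit step, assuming trees are modelled as plain dictionaries from path to file contents. The function names `diff_trees` and `commit` are hypothetical, and Python's `difflib` stands in for whatever patch generation would really have been used; a deleted file is not distinguished from an empty one here.

```python
from __future__ import annotations

import difflib


def diff_trees(old: dict[str, str], new: dict[str, str]) -> str:
    """Return a unified diff that turns tree `old` into tree `new`."""
    chunks: list[str] = []
    for path in sorted(set(old) | set(new)):
        before = old.get(path, "").splitlines(keepends=True)
        after = new.get(path, "").splitlines(keepends=True)
        # unified_diff yields nothing when the two sides are identical
        chunks.extend(
            difflib.unified_diff(before, after, f"a/{path}", f"b/{path}")
        )
    return "".join(chunks)


def commit(working: dict[str, str], shadow: dict[str, str],
           history: list[str]) -> str:
    """Create a patch from the working tree and the previous shadow tree,
    append it to the history list, and update the shadow tree."""
    patch = diff_trees(shadow, working)
    history.append(patch)
    shadow.clear()
    shadow.update(working)
    return patch
```

In this model, `diff_trees(shadow, working)` on its own plays the role of the "bk diff" mentioned earlier: it compares the last committed state with the working tree without replaying any patches.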
And then I found Monotone.
Now, monotone was slow. Monotone was so _horrendously_ slow that I had to
do special hacks just to import _one_ version of Linux into it in less
than two hours. It was something stupid like an O(N**3) algorithm in the
number of filenames (and the kernel had 17,291 files at that time:
v2.6.12-rc2), and it was just totally unusable for me.
I also thought (and still think) that the whole signing thing was a waste
of time and misdesigned, and I obviously am not a huge fan of databases.
So in many ways I disliked the monotone implementation decisions (and some
of its design decisions). But at the same time, I immediately liked the
SHA1 object naming concept of Monotone.
It also already matched how I had conceptually planned on doing the
history anyway, and had some ideas for, but it took that whole "history
hashing" all the way.
And thus git was born.
So git really has three parents. In a very real sense, BK (or, perhaps
more appropriately - the way I personally used BK, which is not
necessarily how others have used it) was the biggest thing from the
standpoint of what I wanted my _workflow_ to be like. It was simply how I
had done things for the last few years, so a lot of my mental model for
how things are supposed to _work_ came from BK.
I still don't think people give Larry enough credit for actually pushing
this whole distributed SCM thing as a _usable_ model. Very few of the
open-source distributed SCM's are actually usable even today, and as far
as I've been able to gather, the commercial ones aren't really any closer
either. Larry didn't have the kind of examples of what _can_ work that I
had.
The other parent was the stupid "series of patches" model, which was what
really resulted in the "index" thing. I realize that people don't always
much like the index, but it's really a pretty central part of git history,
and one of the distinguishing marks of git. It may be trivial, and to some
degree it's been overshadowed by all the tree operations we do (the
combination of revision walking and tree diffing), but it was very central
to how git came to be.
The index also ended up being central to how we did merges - even if some
day we may end up doing more of that on a pure tree level (ie the current
git-merge-tree model), I think the way we ended up doing merges owes a lot
to the index as a staging area.
(Historically, the "index" was called the "cache". Exactly because it came
from the notion of "caching" the top commit state in a patch series, and
then working with patches either backwards or forwards from that top
cached state. Similarly, we didn't have a ".git" directory: it was
called ".dircache", exactly because it was all about caching the state
of the previous commit directory layout).
And finally, Monotone for the "everything is an object named by its SHA1"
model, which to some degree is perhaps the central - or at least the most
obvious - part of git. It largely was designed really just to be the
"backing store" for the "cache", and to not be _that_ important. That also
explains why I didn't worry too much about disk usage etc initially: the
object store wasn't even the most important part, and I envisioned just
moving old objects that weren't needed into some "backup storage" kind of
thing.
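As a minimal sketch of the "everything is an object named by its SHA1" model: an in-memory store that names each object by a hash of its type, length and contents. The class `ObjectStore` and its methods are hypothetical. The header layout (`"<type> <size>\0"` in front of the data) is the one present-day git uses when hashing objects; the message itself does not describe it, so treat it as an assumption about the design at the time.

```python
from __future__ import annotations

import hashlib


class ObjectStore:
    """Toy content-addressed store: an object's name is its SHA-1."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    @staticmethod
    def name_of(kind: str, data: bytes) -> str:
        # Hash a small header plus the raw contents, so the name depends
        # on the object type and size as well as on the bytes themselves.
        header = f"{kind} {len(data)}".encode() + b"\0"
        return hashlib.sha1(header + data).hexdigest()

    def put(self, kind: str, data: bytes) -> str:
        """Store an object and return its name; identical content is
        stored only once because it always hashes to the same name."""
        name = self.name_of(kind, data)
        self._objects.setdefault(name, data)
        return name

    def get(self, name: str) -> bytes:
        return self._objects[name]


store = ObjectStore()
blob_id = store.put("blob", b"hello\n")
assert store.get(blob_id) == b"hello\n"
```

Since names are derived purely from content, objects never change once written, which is what makes it plausible to move old ones off to some separate "backup storage" without touching anything that refers to them.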
Linus