From: Linus Torvalds <torvalds@osdl.org>
To: Jeff Garzik <jgarzik@pobox.com>
Cc: Ben Clifford <benc@hawaga.org.uk>,
Martin Langhoff <martin.langhoff@gmail.com>,
Florian Weimer <fw@deneb.enyo.de>,
git@vger.kernel.org
Subject: Re: Handling large files with GIT
Date: Mon, 13 Feb 2006 08:19:10 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0602130806070.3691@g5.osdl.org>
In-Reply-To: <43F01F5A.5020808@pobox.com>

On Mon, 13 Feb 2006, Jeff Garzik wrote:
>
> Linus Torvalds wrote:
> > I've never used maildir layout, but if it is a couple of large _flat_
> > subdirectories,
>
> That's what it is :/ One directory per mail folder, with each email an
> individual file in that dir.
Ok.
Anyway, I double-checked, and I was wrong. While the "static
directories" thing is a huge performance optimization for doing many
things (diffing trees, file history in git-rev-list, etc etc), for merging
it doesn't help. We always end up expanding the whole tree.
Which is kind of sad.
It's inevitable in one sense: we do the merge in the index, after all, and
the index - unlike the tree structures - is a flat file (like the
"manifest" in mercurial or monotone). It's also represented that way in
memory.
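
To make the tree-vs-index distinction concrete, here is a minimal sketch
in Python. It is a toy model, not git's actual data format: a "tree" is
assumed to be a nested dict, blob ids are made-up strings, and flatten()
is a hypothetical helper. It shows how a hierarchical tree expands into
a flat, path-sorted list like the index.

```python
# Toy model (an assumption, not git's real format): a tree is a dict that
# maps a name either to a blob id (str) or to a nested dict (a subtree).
def flatten(tree, prefix=""):
    """Expand a nested tree into a flat, path-sorted list of (path, blob_id)."""
    entries = []
    for name, node in tree.items():
        path = prefix + name
        if isinstance(node, dict):
            # A subdirectory: every entry below it ends up in the flat list.
            entries.extend(flatten(node, path + "/"))
        else:
            entries.append((path, node))
    return sorted(entries)


tree = {"Makefile": "b1", "mail": {"inbox": {"1": "b2", "2": "b3"}}}
index = flatten(tree)
for path, blob in index:
    print(path, blob)
```

Note that the flat form has one entry per file no matter how the
directories were laid out, so any per-subtree shortcut is gone once the
tree has been expanded.
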
However, it is a total and complete waste in other cases.
Thinking more about it, this is also why merging causes all the horrible
index performance: not only do we (unnecessarily) read the same trees over
and over again only to collapse them back to stage0 later when they are
the same, but because we keep the index in a linear format, when we read
the other trees, we'll have to move things around with memmove() (just the
pointers, but still).
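
A rough illustration of the memmove() cost, again as a Python toy rather
than git's C code: the index is modelled as a sorted list of path
strings, and insert_counting_moves() is a hypothetical helper that
reports how many existing entries sit behind the insertion point and
therefore have to shift.

```python
import bisect


def insert_counting_moves(index, path):
    """Insert path into the sorted list; return how many entries had to shift."""
    pos = bisect.bisect_left(index, path)
    moved = len(index) - pos  # everything at or after pos moves up by one slot
    index.insert(pos, path)   # on a flat array this is the memmove()-like step
    return moved


# 1000 made-up paths, already sorted.
index = ["a/%06d" % i for i in range(1000)]
moved = insert_counting_moves(index, "a/000000-new")
print(moved)
```

One insert near the front shifts almost the whole array, so adding
entries one at a time to a linear index of n paths can cost on the order
of n moves each.
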
We'd actually be a _lot_ better off if we split "git-read-tree" up into
two phases: one that did the recursive tree operation (which can optimize
the "same tree everywhere" case), and the second stage that actually
populated the index.
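
A sketch of what that split could look like, in Python and under loud
assumptions: Tree, diff3() and resolve() are hypothetical names, the
hashing only imitates content-addressed tree ids, file/directory
conflicts are ignored, and phase two emits entries only for paths that
differ. It is not how git-read-tree is implemented.

```python
import hashlib


class Tree:
    """Toy tree object: entries maps a name to a Tree or to a blob id (str)."""

    def __init__(self, entries):
        self.entries = entries
        h = hashlib.sha1()
        for name in sorted(entries):
            child = entries[name]
            child_id = child.id if isinstance(child, Tree) else child
            h.update(("%s\0%s\n" % (name, child_id)).encode())
        self.id = h.hexdigest()  # equal content gives equal id


def ident(node):
    if node is None:
        return None
    return node.id if isinstance(node, Tree) else node


def entries_of(node):
    return node.entries if isinstance(node, Tree) else {}


def diff3(base, ours, theirs, path=""):
    """Phase 1: recursive three-way walk that never expands identical subtrees."""
    ids = (ident(base), ident(ours), ident(theirs))
    if ids[0] == ids[1] == ids[2]:
        return []  # "same tree everywhere": skip the whole subtree
    if any(isinstance(n, Tree) for n in (base, ours, theirs)):
        names = set()
        for n in (base, ours, theirs):
            names.update(entries_of(n))
        out = []
        for name in sorted(names):
            child_path = path + "/" + name if path else name
            out.extend(diff3(entries_of(base).get(name),
                             entries_of(ours).get(name),
                             entries_of(theirs).get(name),
                             child_path))
        return out
    return [(path, ids)]


def resolve(changes):
    """Phase 2: turn differing paths into (path, stage, blob_id) index entries."""
    index = []
    for path, (b, o, t) in changes:
        if o == t or t == b:
            winner = o  # both sides agree, or only ours changed
        elif o == b:
            winner = t  # only theirs changed
        else:
            # Real conflict: keep stages 1 (base), 2 (ours), 3 (theirs).
            index.extend((path, stage, blob)
                         for stage, blob in ((1, b), (2, o), (3, t))
                         if blob is not None)
            continue
        if winner is not None:
            index.append((path, 0, winner))
    return index


big = Tree({str(i): "blob%d" % i for i in range(1000)})
base = Tree({"archive": big, "inbox": Tree({"1": "a"})})
ours = Tree({"archive": big, "inbox": Tree({"1": "a", "2": "b"})})
theirs = Tree({"archive": big, "inbox": Tree({"1": "c"})})
changes = diff3(base, ours, theirs)
result = resolve(changes)
print(result)
```

In this toy run the 1000-entry "archive" subtree is never expanded,
because all three sides carry the same id for it; only the two paths
under "inbox" reach the second phase.
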
I'll have to think about this. It would be an absolutely _huge_
optimization for merging in certain patterns, it just doesn't matter for
something like the kernel with "just" 18,000 files and not a lot of
strange merging going on.
In contrast, I can see a mail archive easily having hundreds of thousands
of individual emails. At that point it's horribly stupid to read them all
in three times (for a merge - base, origin, new) and do so in a pretty
inefficient manner.
Ho humm. It doesn't look _hard_ per se, and I think the two-stage
git-read-tree is actually also what the recursive merge strategy wants
anyway (it can't use the index - it really just wants to get a list of
conflict information). So this definitely sounds like the RightThing(tm)
to do anyway, and it fits the git data structures really well.
So no downsides. Except that this is some rather core code, and you can't
afford to get it wrong. And the fact that I'm a lazy bastard, of course.
Linus