git@vger.kernel.org mailing list mirror (one of many)
From: Linus Torvalds <torvalds@osdl.org>
To: Ben Clifford <benc@hawaga.org.uk>
Cc: Martin Langhoff <martin.langhoff@gmail.com>,
	Florian Weimer <fw@deneb.enyo.de>,
	git@vger.kernel.org
Subject: Re: Handling large files with GIT
Date: Sun, 12 Feb 2006 20:57:25 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0602122049010.3691@g5.osdl.org>
In-Reply-To: <Pine.LNX.4.64.0602121939070.3691@g5.osdl.org>



On Sun, 12 Feb 2006, Linus Torvalds wrote:
> 
> If it takes an hour per merge, it _is_ unusable. I consider 15 _seconds_ 
> to be pretty unusable.

Btw, one thing to realize is that git is inherently a lot better at 
handling lots of files in _subdirectories_, especially if those 
subdirectories don't change.

I've never used maildir layout, but if it is a couple of large _flat_ 
subdirectories, git will potentially handle that a lot worse than if you 
have a hierarchy of directories.

I say "potentially", because if the directories are all mutable and 
change, then the flat approach is better. But if they tend to have some 
kind of stability, a lot of git operations (diffing and merging in 
particular) are able to see that two subdirectories are 100% equal, and 
will entirely skip them.

This is a large part of why git performs well on the kernel. Most merges 
don't actually touch all - or even a very large percentage - of the more 
than a thousand subdirectories in the kernel. When a subdirectory is 
untouched, git can see that quickly and ignore it entirely: its tree SHA1 
is exactly the same, so git knows that every file under that subdirectory 
(and under every directory below it, recursively) is the same.
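To make the subdirectory skipping concrete, here is a minimal Python sketch. It models a tree as nested dicts and hashes each directory over its sorted (name, child hash) entries. That is only loosely modeled on git's tree objects: the real object encoding is different, and the names build_node and diff_trees are hypothetical. The diff recurses only into subtrees whose hashes differ, so an untouched subdirectory costs one comparison no matter how many files sit below it.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Node:
    sha: str
    children: Optional[Dict[str, "Node"]] = None  # None means "file"


def build_node(tree: Union[bytes, dict]) -> Node:
    """Hash files and directories bottom-up (simplified, NOT git's encoding)."""
    if isinstance(tree, bytes):
        return Node(hashlib.sha1(b"blob\0" + tree).hexdigest())
    children = {name: build_node(sub) for name, sub in tree.items()}
    h = hashlib.sha1(b"tree\0")
    for name in sorted(children):
        # A directory's hash covers each entry's name and its child's hash.
        h.update(name.encode() + b"\0" + children[name].sha.encode() + b"\n")
    return Node(h.hexdigest(), children)


def diff_trees(a: Node, b: Node, path: str = "",
               stats: Optional[dict] = None) -> List[str]:
    """Return the paths that differ, skipping subtrees with equal hashes."""
    if stats is not None:
        stats["compared"] = stats.get("compared", 0) + 1
    if a.sha == b.sha:
        return []  # identical subtree: everything below it is the same
    if a.children is None or b.children is None:
        return [path or "/"]  # a file changed (or file <-> directory)
    changed: List[str] = []
    for name in sorted(set(a.children) | set(b.children)):
        sub = f"{path}/{name}"
        if name not in a.children or name not in b.children:
            changed.append(sub)  # entry added or removed
        else:
            changed.extend(diff_trees(a.children[name], b.children[name], sub, stats))
    return changed


old = {"drivers": {f"d{i}.c": b"x%d" % i for i in range(100)}, "fs": {"ext2.c": b"old"}}
new = {"drivers": {f"d{i}.c": b"x%d" % i for i in range(100)}, "fs": {"ext2.c": b"new"}}

stats: dict = {}
print(diff_trees(build_node(old), build_node(new), stats=stats))  # ['/fs/ext2.c']
print(stats["compared"])  # 4: root, drivers (skipped), fs, fs/ext2.c
```

The hundred files under drivers/ are never visited: their parent's hash matches, so the whole subdirectory is dismissed with a single comparison.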

In contrast, if you have a million files in one directory, and 10 of them 
changed, git will still have to compare the SHA1s of the other 999,990 
files just to find out that they match. Which is going to be slow.
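A back-of-the-envelope cost model makes the flat-versus-hierarchical contrast concrete. This is only a sketch: the function name entries_compared is hypothetical, it assumes a perfectly balanced layout, and it counts the directory entries whose SHA1s have to be compared, assuming (worst case) that each changed file lands in a different subdirectory.

```python
def entries_compared(fanouts: list, n_changed: int) -> int:
    """Count directory entries whose SHA1s must be compared in a tree diff.

    fanouts[i] is the number of entries in every directory at depth i, so the
    layout holds the product of fanouts files in total. Assumes (worst case)
    the changed files are spread so as to dirty as many directories as possible.
    An unchanged directory costs only the one comparison made in its parent.
    """
    total = 0
    dirty_dirs = 1 if n_changed else 0  # the root is dirty iff anything changed
    dirs_at_depth = 1
    for fanout in fanouts:
        total += dirty_dirs * fanout  # every entry of a dirty directory is compared
        dirs_at_depth *= fanout
        dirty_dirs = min(n_changed, dirs_at_depth)  # dirty entries one level down
    return total


# One million files, ten of them changed:
print(entries_compared([1_000_000], 10))      # flat: 1000000
print(entries_compared([1_000, 1_000], 10))   # 1,000 dirs x 1,000 files: 11000
print(entries_compared([100, 100, 100], 10))  # three levels of 100: 2100
```

The flat case scans every entry: the 999,990 matching SHA1s plus the 10 that differ. The hierarchies only scan the few directories on the paths to a change. The model deliberately ignores the opposite case raised earlier, where most directories change and the extra tree levels become pure overhead.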

That said, I suspect there's space for optimization. 

		Linus

Thread overview: 39+ messages
2006-02-08  9:14 Handling large files with GIT Martin Langhoff
2006-02-08 11:54 ` Johannes Schindelin
2006-02-08 16:34   ` Linus Torvalds
2006-02-08 17:01     ` Linus Torvalds
2006-02-08 20:11       ` Junio C Hamano
2006-02-08 21:20 ` Florian Weimer
2006-02-08 22:35   ` Martin Langhoff
2006-02-13  1:26     ` Ben Clifford
2006-02-13  3:42       ` Linus Torvalds
2006-02-13  4:57         ` Linus Torvalds [this message]
2006-02-13  5:05           ` Linus Torvalds
2006-02-13 23:17             ` Ian Molton
2006-02-13 23:19               ` Martin Langhoff
2006-02-14 18:56               ` Johannes Schindelin
2006-02-14 19:52                 ` Linus Torvalds
2006-02-14 21:21                   ` Sam Vilain
2006-02-14 22:01                     ` Linus Torvalds
2006-02-14 22:30                       ` Junio C Hamano
2006-02-15  0:40                         ` Sam Vilain
2006-02-15  1:39                           ` Junio C Hamano
2006-02-15  4:03                             ` Sam Vilain
2006-02-15  2:07                           ` Martin Langhoff
2006-02-15  2:05                         ` Linus Torvalds
2006-02-15  2:18                           ` Linus Torvalds
2006-02-15  2:33                             ` Linus Torvalds
2006-02-15  3:58                               ` Linus Torvalds
2006-02-15  9:54                                 ` Junio C Hamano
2006-02-15 15:44                                   ` Linus Torvalds
2006-02-15 17:16                                     ` Linus Torvalds
2006-02-16  3:25                                   ` Linus Torvalds
2006-02-16  3:29                                     ` Junio C Hamano
2006-02-16 20:32                                 ` Fredrik Kuivinen
2006-02-13  5:55           ` Jeff Garzik
2006-02-13  6:07             ` Keith Packard
2006-02-14  0:07               ` Martin Langhoff
2006-02-13 16:19             ` Linus Torvalds
2006-02-13  4:40       ` Martin Langhoff
2006-02-09  4:54   ` Greg KH
2006-02-09  5:38     ` Martin Langhoff