git@vger.kernel.org mailing list mirror (one of many)
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Daniel Berlin <dberlin@dberlin.org>
Cc: David Miller <davem@davemloft.net>,
	ismail@pardus.org.tr, gcc@gcc.gnu.org,
	        git@vger.kernel.org
Subject: Re: Git and GCC
Date: Wed, 5 Dec 2007 22:09:12 -0800 (PST)
Message-ID: <alpine.LFD.0.9999.0712052132450.13796@woody.linux-foundation.org>
In-Reply-To: <4aca3dc20712052111o730f6fb6h7a329ee811a70f28@mail.gmail.com>



On Thu, 6 Dec 2007, Daniel Berlin wrote:
> 
> Actually, it turns out that git-gc --aggressive does this dumb thing
> to pack files sometimes regardless of whether you converted from an
> SVN repo or not.

Absolutely. "git gc --aggressive" is mostly dumb. It's really only useful 
for the case of "I know I have a *really* bad pack, and I want to throw 
away all the bad packing decisions I have done".

To explain this, it's worth explaining (you are probably aware of it, but 
let me go through the basics anyway) how git delta-chains work, and how 
they are so different from most other systems.

In other SCMs, a delta-chain is generally fixed. It might be "forwards" 
or "backwards", and it might evolve a bit as you work with the repository, 
but generally it's a chain of changes to a single file represented as some 
kind of single SCM entity. In CVS, it's obviously the *,v file, and a lot 
of other systems do rather similar things.

Git also does delta-chains, but it does them a lot more "loosely". There 
is no fixed entity. Deltas are generated against any random other version 
that git deems to be a good delta candidate (with various fairly 
successful heuristics), and there are absolutely no hard grouping rules.
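
One way to see this looseness in an existing repository is to list every 
deltified object in its packs next to the base object it was deltified 
against. What follows is a minimal sketch, assuming a POSIX shell with awk 
and a non-bare repository; the helper name "list_deltas" is made up for 
illustration and is not a git command.

```shell
# list_deltas REPO: for each deltified object in REPO's packs, print
#   <object> <type> <chain depth> <base object>
# The function name is hypothetical; REPO must be a non-bare repository.
list_deltas() {
    (
        cd "${1:-.}" || exit 1
        for idx in .git/objects/pack/pack-*.idx; do
            [ -e "$idx" ] || continue    # nothing packed yet
            # "git verify-pack -v" prints seven fields for a deltified
            # object: id, type, size, size in pack, offset, depth, base.
            git verify-pack -v "$idx" |
                awk 'NF == 7 && $1 ~ /^[0-9a-f]+$/ { print $1, $2, $6, $7 }'
        done
    )
}
```

Running "list_deltas ." in a packed repository shows which object each 
delta is based on and how deep its chain is.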

This is generally a very good thing. It's good for various conceptual 
reasons (ie git internally never really even needs to care about the whole 
revision chain - it doesn't really think in terms of deltas at all), but 
it's also great because getting rid of the inflexible delta rules means 
that git doesn't have any problems at all with merging two files together, 
for example - there simply are no arbitrary *,v "revision files" that have 
some hidden meaning.

It also means that the choice of deltas is a much more open-ended 
question. If you limit the delta chain to just one file, you really don't 
have a lot of choices on what to do about deltas, but in git, it really 
can be a totally different issue.

And this is where the really badly named "--aggressive" comes in. While 
git generally tries to re-use delta information (because it's a good idea, 
and it doesn't waste CPU time re-finding all the good deltas we found 
earlier), sometimes you want to say "let's start all over, with a blank 
slate, and ignore all the previous delta information, and try to generate 
a new set of deltas".

So "--aggressive" is not really about being aggressive, but about wasting 
CPU time re-doing a decision we already did earlier!

*Sometimes* that is a good thing. Some import tools in particular could 
generate really horribly bad deltas. Anything that uses "git fast-import", 
for example, likely doesn't have much of a great delta layout, so it might 
be worth saying "I want to start from a clean slate".

But almost always, in other cases, it's actually a really bad thing to do. 
It's going to waste CPU time, and especially if you had actually done a 
good job at deltaing earlier, the end result isn't going to re-use all 
those *good* deltas you already found, so you'll actually end up with a 
much worse end result too!

I'll send a patch to Junio to just remove the "git gc --aggressive" 
documentation. It can be useful, but it generally is useful only when you 
really understand at a very deep level what it's doing, and that 
documentation doesn't help you do that.

Generally, doing incremental "git gc" is the right approach, and better 
than doing "git gc --aggressive". It's going to re-use old deltas, and 
when those old deltas can't be found (the reason for doing incremental GC 
in the first place!) it's going to create new ones.
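
The routine case can be sketched as below. This assumes a git new enough 
to support "git -C"; the wrapper name "routine_gc" is hypothetical and 
exists only to keep the two commands together.

```shell
# routine_gc REPO: ordinary incremental housekeeping (hypothetical helper).
routine_gc() {
    repo=${1:-.}
    # Plain "git gc" packs loose objects and re-uses the deltas that
    # already exist in the packs.
    git -C "$repo" gc --quiet
    # Summary: loose objects ("count"), number of packs, pack size in KiB.
    git -C "$repo" count-objects -v
}
```

Calling "routine_gc ." from inside a working tree prints the object counts 
after housekeeping has run.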

On the other hand, it's definitely true that an "initial import of a long 
and involved history" is a point where it can be worth spending a lot of 
time finding the *really*good* deltas. Then, every user ever after (as 
long as they don't use "git gc --aggressive" to undo it!) will get the 
advantage of that one-time event. So especially for big projects with a 
long history, it's probably worth doing some extra work, telling the delta 
finding code to go wild.

So the equivalent of "git gc --aggressive" - but done *properly* - is to 
do (overnight) something like

	git repack -a -d --depth=250 --window=250

where that depth thing is just about how deep the delta chains can be 
(make them longer for old history - it's worth the space overhead), and 
the window thing is about how big an object window we want each delta 
candidate to scan.

And here, you might well want to add the "-f" flag (which is the "drop all 
old deltas" flag), since you now are actually trying to make sure that 
this one actually finds good candidates.
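
Put together, the one-time repack with "-f" might be wrapped as below. This 
is a sketch, assuming a git new enough for "git -C" and "count-objects -H"; 
the helper name "deep_repack" is hypothetical, and the depth and window 
values are simply the ones from the command above.

```shell
# deep_repack REPO: one-time, from-scratch repack (hypothetical helper).
deep_repack() {
    repo=${1:-.}
    echo "before:"
    git -C "$repo" count-objects -vH
    # -a -d : put everything into a single pack, delete the redundant ones
    # -f    : drop all old deltas and compute new ones
    # --depth / --window : the values used in the command above
    git -C "$repo" repack -a -d -f --depth=250 --window=250
    echo "after:"
    git -C "$repo" count-objects -vH
}
```

Comparing the "size-pack" lines before and after shows whether the extra 
CPU time bought anything for that particular repository.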

And then it's going to take forever and a day (ie a "do it overnight" 
thing). But the end result is that everybody downstream from that 
repository will get much better packs, without having to spend any effort 
on it themselves.

			Linus
