git@vger.kernel.org mailing list mirror (one of many)
From: tytso@mit.edu
To: Will Palmer <wmpalmer@gmail.com>
Cc: Jeff King <peff@peff.net>, Avery Pennarun <apenwarr@gmail.com>,
	git@vger.kernel.org
Subject: Re: Why is "git tag --contains" so slow?
Date: Thu, 8 Jul 2010 09:54:42 -0400
Message-ID: <20100708135442.GA7549@thunk.org>
In-Reply-To: <1278595295.2668.10.camel@wpalmer.simply-domain>

On Thu, Jul 08, 2010 at 02:21:35PM +0100, Will Palmer wrote:
> I think these two go hand-in-hand, and would resolve most of my issues
> with it. Auto-tune, starting pessimistically, but then using something
> more-optimized after something like gc has detected that it's okay. On
> pull from an older repository (which I see as happening very frequently,
> I add remotes much more often than I do a straight "clone"), a warning
> and an auto-tune to something which would account for the newly-fetched
> bad data.

Well, keep in mind that we've been living with a one-day "maximum clock
skew or you start getting incorrect results" limit for quite a while.
We just haven't necessarily publicized this fact or added enforcement
to "git commit" and git-receive-pack.  And I've been introducing clock
skews of about 3 months (roughly equal to the time between Linux stable
releases) into the Linux kernel repository for some 1-2 years (because
I just didn't know about the clock skew dependency and there was no
enforcement of it), and the number of times people have noticed, or at
least complained, has been relatively small.  So like it or not, the
default of "one day clock skew or you get incorrect results" is the
status quo.

Your idea of being utterly pessimistic until the next "git gc" is
interesting, since it doesn't slow down the "git clone" step.
Unfortunately, it also means that commands like "git name-rev" will be
several hundred times slower than they currently are (which probably
makes them unusable) until the first "git gc".  The fact is that we've
been depending on the clock skew being less than one day for quite some
time already; again, it _is_ the status quo.
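
To make the dependency concrete, here is a rough sketch of a
timestamp-limited "contains" walk.  This is not git's actual code; the
Commit tuple, the dict-based graph, and the contains() helper are all
made up for illustration.  It only shows why a skew larger than the
allowed window can produce a wrong answer.

```python
from typing import Dict, List, NamedTuple

ONE_DAY = 86400  # assumed allowed skew in seconds (the one-day default)


class Commit(NamedTuple):
    timestamp: int       # commit time, seconds since the epoch
    parents: List[str]   # ids of parent commits


def contains(graph: Dict[str, Commit], tip: str, target: str,
             skew: int = ONE_DAY) -> bool:
    """Return True if `target` is found walking back from `tip`.

    The walk stops descending at any commit older than
    (target timestamp - skew), on the assumption that ancestors are
    never newer than their descendants by more than `skew`.
    """
    cutoff = graph[target].timestamp - skew
    stack, seen = [tip], set()
    while stack:
        cid = stack.pop()
        if cid == target:
            return True
        if cid in seen:
            continue
        seen.add(cid)
        if graph[cid].timestamp < cutoff:
            continue  # too old: assume the target cannot be behind this commit
        stack.extend(graph[cid].parents)
    return False
```

With a commit whose timestamp is three months older than its parent's,
the one-day window prunes the walk too early and reports "not
contained"; a window larger than the real skew gives the right answer,
at the cost of walking more history.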

And, "git gc" takes quite a bit longer than scanning for the maximum
skew after doing a "git clone".  Given that "git clone" generally
takes a good long time anyway, adding 1-3 seconds after a clone seems
to be the better way to go.  I think the biggest issue with it is the
complexity concern which Jeff has raised.
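
For what it's worth, the scan itself can be sketched in a few lines.
This is only an illustration under my own assumptions: the graph is a
plain dict mapping a commit id to (timestamp, parent ids), and
max_clock_skew() is a hypothetical helper, not anything from the patch
series.  It also measures only direct parent/child pairs.

```python
from typing import Dict, List, Tuple

# commit id -> (timestamp in seconds, list of parent ids); assumed layout
Graph = Dict[str, Tuple[int, List[str]]]


def max_clock_skew(graph: Graph) -> int:
    """Return the largest amount (in seconds) by which any parent is
    newer than one of its children; 0 if timestamps never go backwards."""
    worst = 0
    for timestamp, parents in graph.values():
        for parent in parents:
            parent_ts = graph[parent][0]
            worst = max(worst, parent_ts - timestamp)
    return worst
```

It is a single pass over every commit, which is why it is so much
cheaper than a full "git gc".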

Given that, it would seem to me that the three most reasonable options
are (a) auto-tuning when cloning, (b) defaulting to one-day and then
auto-tuning on "git gc", or (c) defaulting to one-day and not
auto-tuning at all (the status quo), listed in order of my preference.

As far as enforcement is concerned, our choices are to (a) add
warnings, which later become hard errors, if the skew is greater than
some configurable window, (b) scan for skews during commits and
receive-packs, and update the auto-tune based on the skews found, or
(c) do nothing (the status quo).  (b) is the most user friendly, but
it adds more complexity, which is why I think (a) edges it out as a
better choice, but fortunately, I'm not the one who's paid the big
bucks to make these sorts of decisions.  :-)
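
Option (a) might look roughly like the following.  Again, this is a
sketch under assumptions: check_commit_time(), its "window" and
"strict" parameters, and the returned strings are invented for
illustration and do not correspond to any existing git option.

```python
from typing import Iterable

ONE_DAY = 86400  # assumed default window in seconds


def check_commit_time(commit_ts: int, parent_ts: Iterable[int],
                      window: int = ONE_DAY, strict: bool = False) -> str:
    """Compare a new commit's timestamp against its parents' timestamps.

    Returns "ok" if no parent is newer than the commit by more than
    `window` seconds, and "warning" otherwise.  With strict=True the
    warning becomes a hard error (ValueError).
    """
    # How far the commit lags behind its newest parent; 0 for root commits.
    skew = max((p - commit_ts for p in parent_ts), default=0)
    if skew <= window:
        return "ok"
    if strict:
        raise ValueError(f"commit is {skew}s older than a parent "
                         f"(allowed window: {window}s)")
    return "warning"
```

Starting with strict=False and flipping it later is the "warnings,
which later become hard errors" path.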

> My only other objection is more wishy-washy and/or lazy: currently a
> "commit" doesn't need to know anything at all about what it references
> in order to be considered a valid object, but saying "the time of commit
> needs to be equal to or greater than the parent commit" means that a
> tool.. and by "tool" I mean "wretched abuse of cat-file and sed", which
> is sometimes just faster to throw-together than filter-branch ..needs to
> be more aware of what it's doing. Yes, it's a horrible abuse, but I was
> always under the impression that low-level abuse of the system is
> something which git supports, by virtue of having such a simple model.

I don't know that it's that much more difficult.  See the patch I
proposed for "guilt"; it was a ten line patch, and if you were willing
to be even more "quick and wretched", I could have made it a 3 or 4
line patch.

						- Ted
