git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: tytso@mit.edu
Cc: Avery Pennarun <apenwarr@gmail.com>, git@vger.kernel.org
Subject: Re: Why is "git tag --contains" so slow?
Date: Mon, 5 Jul 2010 08:27:23 -0400	[thread overview]
Message-ID: <20100705122723.GB21146@sigill.intra.peff.net> (raw)
In-Reply-To: <20100704005543.GB6384@thunk.org>

On Sat, Jul 03, 2010 at 08:55:43PM -0400, tytso@mit.edu wrote:

> > I noticed that my improved time for "git tag --contains" was similar to
> > the total time for "git rev-list --all >/dev/null". Can you try timing
> > that? My suspicion is that it is going to be about 2.9 seconds for you.
> 
> I'm at home, so getting access to my work machine is a bit of a pain,
> so I replicated the experiment at home.

Thanks. Those numbers confirm what I had been thinking.

> Yep, it does blow up in the face of the extreme clock skew in some of
> the ext4 commits in the Linux kernel tree.  (Sorry about that; mea
> culpa, I didn't realize at the time this was a problem, and it was my
> workflow using the guilt program which happened to introduce them.)

Yes, I was thinking specifically of those commits when I warned about
clock skew. :)

> In any case, because of the ext4 commits, I can show a test case which
> doesn't work well with your date cutoff patch:

Not surprising. I think you will find that "git name-rev" (or "git
describe --contains", which simply calls name-rev) will have similar
problems.

> (Or maybe we have git config options that can enable or disable
> optimizations that depend on the lack of clock skews; but I could
> understand people not wanting to maintian the extra code paths.)

I think the best thing we can do is provide a "how much clock skew to
tolerate" variable, and give it a sane default. Then people who know
they have skewed repositories can make the correctness-optimization
tradeoff as they see fit.

The extra code is very minor. It's really only a line or two of code
when calculating the cutoff date to convert "be thorough" into a cutoff
date of 0.

The real question is what that default should be. Name-rev already uses
86400 seconds. The worst skew in git.git is 8 seconds. The worst skew in
linux-2.6.git is 8622098 (about 100 days). For reference, here are my
timings on "git tag --contains HEAD~200" for various allowable clock
skew values:

  0 (don't allow even a second of clock skew): .035s
  86400 (one day of clock skew allowed): .034s
  8622098 (the worst skew in linux-2.6): .252s
  infinite (never cutoff for clock skew): 5.373s

So anything below a day is pointless and lost in the noise. Even 100
days yields quite a satisfactory speedup from the current code, but
obviously that number is totally arbitrary based on one repo.

As you probably guessed from the specificity of the number, I wrote a
short program to actually traverse and find the worst skew. It takes
about 5 seconds to run (unsurprisingly, since it is doing the same full
traversal that we end up doing in the above numbers). So we could
"autoskew" by setting up the configuration on clone, and then
periodically updating it as part of "git gc".

That is perhaps over-engineering (and would add a few seconds to a
clone), but I like that it would Just Work without the user doing
anything.

I'll follow this mail up with a series that implements a cleaned-up
version of the previous patches in this thread, and we'll see what
others think.

-Peff

  reply	other threads:[~2010-07-05 12:27 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-01  0:54 Why is "git tag --contains" so slow? Theodore Ts'o
2010-07-01  0:58 ` Shawn O. Pearce
2010-07-03 23:27   ` Sam Vilain
2010-07-01  1:00 ` Avery Pennarun
2010-07-01 12:17   ` tytso
2010-07-01 15:03     ` Jeff King
2010-07-01 15:38       ` Jeff King
2010-07-02 19:26         ` tytso
2010-07-03  8:06           ` Jeff King
2010-07-04  0:55             ` tytso
2010-07-05 12:27               ` Jeff King [this message]
2010-07-05 12:33                 ` [RFC/PATCH 1/4] tag: speed up --contains calculation Jeff King
2010-10-13 22:07                   ` Jonathan Nieder
2010-10-13 22:56                   ` Clemens Buchacher
2011-02-23 15:51                   ` Ævar Arnfjörð Bjarmason
2011-02-23 16:39                     ` Jeff King
2010-07-05 12:34                 ` [RFC/PATCH 2/4] limit "contains" traversals based on commit timestamp Jeff King
2010-10-13 23:21                   ` Jonathan Nieder
2010-07-05 12:35                 ` [RFC/PATCH 3/4] default core.clockskew variable to one day Jeff King
2010-07-05 12:36                 ` [RFC/PATCH 4/4] name-rev: respect core.clockskew Jeff King
2010-07-05 12:39                 ` Why is "git tag --contains" so slow? Jeff King
2010-10-14 18:59                   ` Jonathan Nieder
2010-10-16 14:32                     ` Clemens Buchacher
2010-10-27 17:11                       ` Jeff King
2010-10-28  8:07                         ` Clemens Buchacher
2010-07-05 14:10                 ` tytso
2010-07-06 11:58                   ` Jeff King
2010-07-06 15:31                     ` Will Palmer
2010-07-06 16:53                       ` tytso
2010-07-08 11:28                         ` Jeff King
2010-07-08 13:21                           ` Will Palmer
2010-07-08 13:54                             ` tytso
2010-07-07 17:45                       ` Jeff King
2010-07-08 10:29                         ` Theodore Tso
2010-07-08 11:12                           ` Jakub Narebski
2010-07-08 19:29                             ` Nicolas Pitre
2010-07-08 19:39                               ` Avery Pennarun
2010-07-08 20:13                                 ` Nicolas Pitre
2010-07-08 21:20                                   ` Jakub Narebski
2010-07-08 21:30                                     ` Sverre Rabbelier
2010-07-08 23:10                                       ` Nicolas Pitre
2010-07-08 23:15                                     ` Nicolas Pitre
2010-07-08 11:31                           ` Jeff King
2010-07-08 14:35                           ` Johan Herland
2010-07-08 19:06                           ` Nicolas Pitre
2010-07-07 17:50                       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100705122723.GB21146@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=apenwarr@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).