From: Jeff King <peff@peff.net>
To: tytso@mit.edu
Cc: Avery Pennarun <apenwarr@gmail.com>, git@vger.kernel.org
Subject: Re: Why is "git tag --contains" so slow?
Date: Mon, 5 Jul 2010 08:27:23 -0400
Message-ID: <20100705122723.GB21146@sigill.intra.peff.net>
In-Reply-To: <20100704005543.GB6384@thunk.org>
On Sat, Jul 03, 2010 at 08:55:43PM -0400, tytso@mit.edu wrote:
> > I noticed that my improved time for "git tag --contains" was similar to
> > the total time for "git rev-list --all >/dev/null". Can you try timing
> > that? My suspicion is that it is going to be about 2.9 seconds for you.
>
> I'm at home, so getting access to my work machine is a bit of a pain,
> so I replicated the experiment at home.
Thanks. Those numbers confirm what I had been thinking.
> Yep, it does blow up in the face of the extreme clock skew in some of
> the ext4 commits in the Linux kernel tree. (Sorry about that; mea
> culpa, I didn't realize at the time this was a problem, and it was my
> workflow using the guilt program which happened to introduce them.)
Yes, I was thinking specifically of those commits when I warned about
clock skew. :)
> In any case, because of the ext4 commits, I can show a test case which
> doesn't work well with your date cutoff patch:
Not surprising. I think you will find that "git name-rev" (or "git
describe --contains", which simply calls name-rev) will have similar
problems.
> (Or maybe we have git config options that can enable or disable
> optimizations that depend on the lack of clock skews; but I could
> understand people not wanting to maintain the extra code paths.)
I think the best thing we can do is provide a "how much clock skew to
tolerate" variable, and give it a sane default. Then people who know
they have skewed repositories can make the correctness-optimization
tradeoff as they see fit.
The extra code is very minor. It's really only a line or two of code
when calculating the cutoff date to convert "be thorough" into a cutoff
date of 0.
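
To make the cutoff idea concrete, here is a rough Python sketch. It is not the actual patch: the in-memory graph layout and the names `cutoff_date`, `contains`, and `allowed_skew` are made up for illustration, and `None` stands in for "be thorough".

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory commit graph: id -> (commit timestamp, parent ids).
Graph = Dict[str, Tuple[int, List[str]]]


def cutoff_date(target_date: int, allowed_skew: Optional[int]) -> int:
    """Oldest commit date worth visiting while searching for the target.

    allowed_skew=None means "be thorough": a cutoff of 0 never prunes.
    """
    if allowed_skew is None:
        return 0
    return max(0, target_date - allowed_skew)


def contains(graph: Graph, tip: str, target: str,
             allowed_skew: Optional[int]) -> bool:
    """Return True if target is reachable from tip, pruning by commit date."""
    cutoff = cutoff_date(graph[target][0], allowed_skew)
    seen = set()
    stack = [tip]
    while stack:
        cid = stack.pop()
        if cid == target:
            return True
        if cid in seen:
            continue
        seen.add(cid)
        date, parents = graph[cid]
        if date < cutoff:
            # Older than the cutoff: assume the target cannot be an
            # ancestor. This is exactly where skewed clocks can fool us.
            continue
        stack.extend(parents)
    return False


# A commit whose clock ran 500 seconds behind its parent's.
graph = {
    "x": (1000, []),
    "skewed": (500, ["x"]),
    "tip": (2000, ["skewed"]),
}
print(contains(graph, "tip", "x", 0))     # pruned too early, wrong answer
print(contains(graph, "tip", "x", 500))   # tolerance covers the skew
print(contains(graph, "tip", "x", None))  # thorough, never prunes
```

The first call shows the correctness side of the tradeoff: with no tolerance the skewed commit is pruned and the search misses a commit that really is an ancestor.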
The real question is what that default should be. Name-rev already uses
86400 seconds. The worst skew in git.git is 8 seconds. The worst skew in
linux-2.6.git is 8622098 (about 100 days). For reference, here are my
timings on "git tag --contains HEAD~200" for various allowable clock
skew values:

  0 (don't allow even a second of clock skew):  0.035s
  86400 (one day of clock skew allowed):        0.034s
  8622098 (the worst skew in linux-2.6):        0.252s
  infinite (never cut off for clock skew):      5.373s

So any tolerance below a day buys nothing; the difference is lost in
the noise. Even 100 days yields quite a satisfactory speedup over the
current code, but obviously that number is totally arbitrary, being
based on one repo.
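
For anyone who wants to check the arithmetic behind "about 100 days" and the relative speedups, a small Python sketch follows. It uses only the numbers quoted above; the helper names are invented for this example.

```python
SECONDS_PER_DAY = 86_400

# Timings quoted above for "git tag --contains HEAD~200", keyed by the
# allowed skew in seconds. None stands for "infinite" (never cut off).
timings = {0: 0.035, 86_400: 0.034, 8_622_098: 0.252, None: 5.373}


def skew_in_days(seconds: int) -> float:
    """Convert a skew in seconds to days."""
    return seconds / SECONDS_PER_DAY


def speedup(allowed_skew: int) -> float:
    """Speedup relative to the no-cutoff (infinite skew) run."""
    return timings[None] / timings[allowed_skew]


print(f"{skew_in_days(8_622_098):.1f} days")  # 99.8 days
print(f"{speedup(8_622_098):.0f}x")           # 21x
print(f"{speedup(86_400):.0f}x")              # 158x
```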
As you probably guessed from the specificity of the number, I wrote a
short program to actually traverse and find the worst skew. It takes
about 5 seconds to run (unsurprisingly, since it is doing the same full
traversal that we end up doing in the above numbers). So we could
"autoskew" by setting up the configuration on clone, and then
periodically updating it as part of "git gc".
That is perhaps over-engineering (and would add a few seconds to a
clone), but I like that it would Just Work without the user doing
anything.
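
The skew-finding traversal mentioned above could look roughly like the Python sketch below. This is not the program that was actually run. It assumes one plausible definition of skew (how much newer the newest ancestor of a commit is than the commit itself), and the graph layout and the name `worst_skew` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory commit graph: id -> (commit timestamp, parent ids).
Graph = Dict[str, Tuple[int, List[str]]]


def worst_skew(graph: Graph) -> int:
    """Largest amount by which any ancestor is newer than a descendant."""
    newest: Dict[str, int] = {}  # id -> newest date among commit and ancestors
    worst = 0
    for start in graph:
        stack = [start]
        while stack:  # iterative post-order walk, so parents finish first
            cid = stack[-1]
            if cid in newest:
                stack.pop()
                continue
            date, parents = graph[cid]
            pending = [p for p in parents if p not in newest]
            if pending:
                stack.extend(pending)
                continue
            newest_parent = max((newest[p] for p in parents), default=date)
            worst = max(worst, newest_parent - date)
            newest[cid] = max(date, newest_parent)
            stack.pop()
    return worst


# "c" is 20 seconds older than its grandparent "a".
graph = {
    "a": (100, []),
    "b": (90, ["a"]),
    "c": (80, ["b"]),
    "d": (200, ["c"]),
}
print(worst_skew(graph))  # 20
```

Measuring against the newest ancestor rather than only the direct parent matters for the cutoff: a chain of small backwards steps can add up to a larger skew than any single parent-child pair shows.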
I'll follow this mail up with a series that implements a cleaned-up
version of the previous patches in this thread, and we'll see what
others think.
-Peff