From: tytso@mit.edu
To: Jeff King <peff@peff.net>
Cc: Avery Pennarun <apenwarr@gmail.com>, git@vger.kernel.org
Subject: Re: Why is "git tag --contains" so slow?
Date: Sat, 3 Jul 2010 20:55:43 -0400
Message-ID: <20100704005543.GB6384@thunk.org>
In-Reply-To: <20100703080618.GA10483@sigill.intra.peff.net>

On Sat, Jul 03, 2010 at 04:06:19AM -0400, Jeff King wrote:
> 
> I noticed that my improved time for "git tag --contains" was similar to
> the total time for "git rev-list --all >/dev/null". Can you try timing
> that? My suspicion is that it is going to be about 2.9 seconds for you.

I'm at home, so getting access to my work machine is a bit of a pain,
so I replicated the experiment at home.

% time /tmp/git.1.7.2-rc1 tag --contains 307ae18 | wc -l
13

real	0m13.706s
user	0m13.542s
sys	0m0.150s

% time /tmp/git.patch.1 tag --contains 307ae18 | wc -l
13

real	0m2.869s
user	0m2.703s
sys	0m0.163s

% time /tmp/git.patch.1 rev-list --all  > /dev/null

real	0m3.074s
user	0m2.920s
sys	0m0.147s


> Try the patch below, which adds a date cutoff similar to the one used in
> name-rev. It's much faster in my tests:

Yep, much faster indeed:

% time /tmp/git.patch.2 tag --contains 307ae18 | wc -l
13

real	0m0.054s
user	0m0.030s
sys	0m0.023s

> The obvious downside is that it stops looking down a path in the face of
> extreme clock skew. We could perhaps allow a "--contains=thorough" to
> spend a little more time to achieve a better answer (i.e., ignore the
> cutoff date).

Yep, it does blow up in the face of the extreme clock skew in some of
the ext4 commits in the Linux kernel tree.  (Sorry about that; mea
culpa, I didn't realize at the time this was a problem, and it was my
workflow using the guilt program which happened to introduce them.)

In any case, because of the ext4 commits, I can show a test case which
doesn't work well with your date cutoff patch:

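#!/bin/sh

# For each ext4/jbd2 commit in the range, print its abbreviated hash
# followed by the first tag that the patched git reports as containing it.
for i in $(git log --reverse --oneline v2.6.32..v2.6.35-rc3 -- fs/ext4 fs/jbd2 |
           awk '{print $1}')
do
    printf '%s ' "$i"
    /tmp/git.patch.2 tag --contains "$i" | head -n 1
done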

You won't see any problems after v2.6.34; I fixed my workflow once I
was told it was causing git problems.

If you replace this with the unpatched git, or with git with your
first patch, it will of course be much slower, but it will print out
all of the tags that would be expected given a topological
examination of the commit graph.  Whether this is a bug in git or a bug
that I introduced into the Linux kernel git tree is of course an open
question.  However, if we do want to allow git operations to work
quickly --- and I agree this is a good thing; the speedups that this
can allow are quite significant --- maybe we should teach "git commit"
to either print a warning or outright refuse to introduce clock skews
in the first place.
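
The comparison described at the start of the paragraph above can be
automated.  What follows is only a minimal sketch: compare_contains is
a hypothetical helper name, and the two binary paths in the usage
comment are simply the builds timed earlier in this message.

```shell
#!/bin/sh
# compare_contains OLD NEW COMMIT...
# Hypothetical helper: runs "tag --contains" with two git builds and
# reports every commit for which the two tag lists differ.
compare_contains() {
    old=$1
    new=$2
    shift 2
    for c in "$@"
    do
        # Sort both lists so that only the set of tags is compared.
        a=$("$old" tag --contains "$c" | sort)
        b=$("$new" tag --contains "$c" | sort)
        [ "$a" = "$b" ] || printf '%s: tag lists differ\n' "$c"
    done
}

# Possible usage inside a kernel tree:
#   compare_contains /tmp/git.1.7.2-rc1 /tmp/git.patch.2 \
#       $(git rev-list v2.6.32..v2.6.35-rc3 -- fs/ext4 fs/jbd2)
```

Any commit it reports is one where the date cutoff stopped the
traversal too early.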

(Or maybe we have git config options that can enable or disable
optimizations that depend on the lack of clock skews; but I could
understand people not wanting to maintain the extra code paths.)
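
As a rough illustration of the kind of check such a warning would
need, the sketch below lists commits whose timestamp is earlier than
that of one of their parents.  It is not how git implements anything:
find_skewed is a hypothetical helper name, and it assumes the
committer timestamp is the date that matters for the cutoff.

```shell
#!/bin/sh
# Hypothetical helper: reads "<commit> <committer-timestamp> <parent>..."
# lines on stdin and prints each commit that is older than one of its
# parents.  Parents missing from the input are silently skipped.
find_skewed() {
    awk '
        { ts[$1] = $2; line[NR] = $0 }   # remember every timestamp first
        END {
            for (n = 1; n <= NR; n++) {
                k = split(line[n], f, " ")
                for (p = 3; p <= k; p++)
                    if ((f[p] in ts) && f[2] + 0 < ts[f[p]] + 0) {
                        print f[1], "is", ts[f[p]] - f[2], "seconds older than parent", f[p]
                        break
                    }
            }
        }'
}

# Possible usage inside a repository:
#   git log --format='%H %ct %P' v2.6.32..v2.6.35-rc3 | find_skewed
```

The timestamps are all collected before any comparison is made, so the
result does not depend on the order in which git log emits the commits.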

						- Ted
