git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: tytso@mit.edu
Cc: Avery Pennarun <apenwarr@gmail.com>, git@vger.kernel.org
Subject: [RFC/PATCH 1/4] tag: speed up --contains calculation
Date: Mon, 5 Jul 2010 08:33:36 -0400	[thread overview]
Message-ID: <20100705123335.GA25699@sigill.intra.peff.net> (raw)
In-Reply-To: <20100705122723.GB21146@sigill.intra.peff.net>

When we want to know if commit A contains commit B (or any
one of a set of commits, B through Z), we generally
calculate the merge bases and see if B is a merge base of A
(or for a set, if any of the commits B through Z have that
property).

When we are going to check a series of commits A1 through An
to see whether each contains B (e.g., because we are
deciding which tags to show with "git tag --contains"), we
do a series of merge base calculations. This can be very
expensive, as we repeat a lot of traversal work.

Instead, let's leverage the fact that we are going to use
the same --contains list for each tag, and mark areas of the
commit graph is definitely containing those commits, or
definitely not containing those commits. Later tags can then
stop traversing as soon as they see a previously calculated
answer.

This sped up "git tag --contains HEAD~200" in the linux-2.6
repository from:

  real    0m15.417s
  user    0m15.197s
  sys     0m0.220s

to:

  real    0m5.329s
  user    0m5.144s
  sys     0m0.184s

Signed-off-by: Jeff King <peff@peff.net>
---
Note that this is a depth first search, whereas we generally traverse in
a more breadth-first way. So it can actually make things slightly slower
than the current merge-base code, if:

  1. You don't have any merge base calculations that involve going very
     far back in history.

  2. We depth-first down the wrong side of a merge.

However, in the usual cases, I think it will perform much better.
Comparing an old tag with a recent commit means we have to look at a lot
of the commit graph, anyway.

If anybody has a clever idea for making it breadth-first, I would be
happy to hear it.

Other caveats:

  1. This can probably be a one-liner change to use in "git branch
     --contains", as well. I didn't measure it, as that already tends to
     be reasonably fast.

  2. It uses commit marks, which means it doesn't behave well with other
     traversals. I have no idea if I should be using alternate marks or
     not.

  3. We never clear the marks, or check them against the "want" list. So
     we just assume that repeated calls to contains will have the same
     "want" list.

 builtin/tag.c |    2 +-
 commit.c      |   42 ++++++++++++++++++++++++++++++++++++++++++
 commit.h      |    2 ++
 3 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index d311491..c200e1e 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -49,7 +49,7 @@ static int show_reference(const char *refname, const unsigned char *sha1,
 			commit = lookup_commit_reference_gently(sha1, 1);
 			if (!commit)
 				return 0;
-			if (!is_descendant_of(commit, filter->with_commit))
+			if (!contains(commit, filter->with_commit))
 				return 0;
 		}
 
diff --git a/commit.c b/commit.c
index e9b0750..20354c6 100644
--- a/commit.c
+++ b/commit.c
@@ -845,3 +845,45 @@ int commit_tree(const char *msg, unsigned char *tree,
 	strbuf_release(&buffer);
 	return result;
 }
+
+static int in_commit_list(const struct commit_list *want, struct commit *c)
+{
+	for (; want; want = want->next)
+		if (!hashcmp(want->item->object.sha1, c->object.sha1))
+			return 1;
+	return 0;
+}
+
+static int contains_recurse(struct commit *candidate,
+			    const struct commit_list *want)
+{
+	struct commit_list *p;
+
+	/* was it previously marked as containing a want commit? */
+	if (candidate->object.flags & TMP_MARK)
+		return 1;
+	/* or marked as not possibly containing a want commit? */
+	if (candidate->object.flags & UNINTERESTING)
+		return 0;
+	/* or are we it? */
+	if (in_commit_list(want, candidate))
+		return 1;
+
+	if (parse_commit(candidate) < 0)
+		return 0;
+
+	/* Otherwise recurse and mark ourselves for future traversals. */
+	for (p = candidate->parents; p; p = p->next) {
+		if (contains_recurse(p->item, want)) {
+			candidate->object.flags |= TMP_MARK;
+			return 1;
+		}
+	}
+	candidate->object.flags |= UNINTERESTING;
+	return 0;
+}
+
+int contains(struct commit *candidate, const struct commit_list *want)
+{
+	return contains_recurse(candidate, want);
+}
diff --git a/commit.h b/commit.h
index eb2b8ac..1a7299e 100644
--- a/commit.h
+++ b/commit.h
@@ -153,6 +153,8 @@ extern struct commit_list *get_shallow_commits(struct object_array *heads,
 int is_descendant_of(struct commit *, struct commit_list *);
 int in_merge_bases(struct commit *, struct commit **, int);
 
+int contains(struct commit *, const struct commit_list *);
+
 extern int interactive_add(int argc, const char **argv, const char *prefix);
 extern int run_add_interactive(const char *revision, const char *patch_mode,
 			       const char **pathspec);
-- 
1.7.2.rc1.209.g2a36c

  reply	other threads:[~2010-07-05 12:33 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-01  0:54 Why is "git tag --contains" so slow? Theodore Ts'o
2010-07-01  0:58 ` Shawn O. Pearce
2010-07-03 23:27   ` Sam Vilain
2010-07-01  1:00 ` Avery Pennarun
2010-07-01 12:17   ` tytso
2010-07-01 15:03     ` Jeff King
2010-07-01 15:38       ` Jeff King
2010-07-02 19:26         ` tytso
2010-07-03  8:06           ` Jeff King
2010-07-04  0:55             ` tytso
2010-07-05 12:27               ` Jeff King
2010-07-05 12:33                 ` Jeff King [this message]
2010-10-13 22:07                   ` [RFC/PATCH 1/4] tag: speed up --contains calculation Jonathan Nieder
2010-10-13 22:56                   ` Clemens Buchacher
2011-02-23 15:51                   ` Ævar Arnfjörð Bjarmason
2011-02-23 16:39                     ` Jeff King
2010-07-05 12:34                 ` [RFC/PATCH 2/4] limit "contains" traversals based on commit timestamp Jeff King
2010-10-13 23:21                   ` Jonathan Nieder
2010-07-05 12:35                 ` [RFC/PATCH 3/4] default core.clockskew variable to one day Jeff King
2010-07-05 12:36                 ` [RFC/PATCH 4/4] name-rev: respect core.clockskew Jeff King
2010-07-05 12:39                 ` Why is "git tag --contains" so slow? Jeff King
2010-10-14 18:59                   ` Jonathan Nieder
2010-10-16 14:32                     ` Clemens Buchacher
2010-10-27 17:11                       ` Jeff King
2010-10-28  8:07                         ` Clemens Buchacher
2010-07-05 14:10                 ` tytso
2010-07-06 11:58                   ` Jeff King
2010-07-06 15:31                     ` Will Palmer
2010-07-06 16:53                       ` tytso
2010-07-08 11:28                         ` Jeff King
2010-07-08 13:21                           ` Will Palmer
2010-07-08 13:54                             ` tytso
2010-07-07 17:45                       ` Jeff King
2010-07-08 10:29                         ` Theodore Tso
2010-07-08 11:12                           ` Jakub Narebski
2010-07-08 19:29                             ` Nicolas Pitre
2010-07-08 19:39                               ` Avery Pennarun
2010-07-08 20:13                                 ` Nicolas Pitre
2010-07-08 21:20                                   ` Jakub Narebski
2010-07-08 21:30                                     ` Sverre Rabbelier
2010-07-08 23:10                                       ` Nicolas Pitre
2010-07-08 23:15                                     ` Nicolas Pitre
2010-07-08 11:31                           ` Jeff King
2010-07-08 14:35                           ` Johan Herland
2010-07-08 19:06                           ` Nicolas Pitre
2010-07-07 17:50                       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100705123335.GA25699@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=apenwarr@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).