git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: tytso@mit.edu
Cc: Avery Pennarun <apenwarr@gmail.com>, git@vger.kernel.org
Subject: [RFC/PATCH 2/4] limit "contains" traversals based on commit timestamp
Date: Mon, 5 Jul 2010 08:34:19 -0400	[thread overview]
Message-ID: <20100705123419.GB25699@sigill.intra.peff.net> (raw)
In-Reply-To: <20100705122723.GB21146@sigill.intra.peff.net>

When looking for commits that contain other commits (e.g.,
via "git tag --contains"), we can end up traversing useless
portions of the graph. For example, if I am looking for a
tag that contains a commit made last week, there is not much
point in traversing portions of the history graph made five
years ago.

This optimization can provide massive speedups. For example,
doing "git tag --contains HEAD~200" in the linux-2.6
repository goes from:

  real    0m5.302s
  user    0m5.116s
  sys     0m0.184s

to:

  real    0m0.030s
  user    0m0.020s
  sys     0m0.008s

The downside is that we will no longer find some answers in
the face of extreme clock skew, as we will stop the
traversal early when seeing commits skewed too far into the
past.

Name-rev already implements a similar optimization, using a
"slop" of one day to allow for a certain amount of clock
skew in commit timestamps. This patch introduces a
"core.clockskew" variable, which allows specifying the
allowable amount of clock skew in seconds.  For safety, it
defaults to "none", causing a full traversal (i.e., no
change in behavior from previous versions).

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h  |    1 +
 commit.c |   27 ++++++++++++++++++++++++---
 config.c |    8 ++++++++
 3 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index c9fa3df..2dfb636 100644
--- a/cache.h
+++ b/cache.h
@@ -551,6 +551,7 @@ extern int read_replace_refs;
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
+extern int core_clock_skew;
 
 enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
diff --git a/commit.c b/commit.c
index 20354c6..c849b50 100644
--- a/commit.c
+++ b/commit.c
@@ -7,6 +7,7 @@
 #include "revision.h"
 #include "notes.h"
 
+int core_clock_skew = -1;
 int save_commit_buffer = 1;
 
 const char *commit_type = "commit";
@@ -855,7 +856,8 @@ static int in_commit_list(const struct commit_list *want, struct commit *c)
 }
 
 static int contains_recurse(struct commit *candidate,
-			    const struct commit_list *want)
+			    const struct commit_list *want,
+			    unsigned long cutoff)
 {
 	struct commit_list *p;
 
@@ -872,9 +874,13 @@ static int contains_recurse(struct commit *candidate,
 	if (parse_commit(candidate) < 0)
 		return 0;
 
+	/* stop searching if we go too far back in time */
+	if (candidate->date < cutoff)
+		return 0;
+
 	/* Otherwise recurse and mark ourselves for future traversals. */
 	for (p = candidate->parents; p; p = p->next) {
-		if (contains_recurse(p->item, want)) {
+		if (contains_recurse(p->item, want, cutoff)) {
 			candidate->object.flags |= TMP_MARK;
 			return 1;
 		}
@@ -885,5 +891,20 @@ static int contains_recurse(struct commit *candidate,
 
 int contains(struct commit *candidate, const struct commit_list *want)
 {
-	return contains_recurse(candidate, want);
+	unsigned long cutoff = 0;
+
+	if (core_clock_skew >= 0) {
+		const struct commit_list *c;
+		unsigned long min_date = ULONG_MAX;
+		for (c = want; c; c = c->next) {
+			if (parse_commit(c->item) < 0)
+				continue;
+			if (c->item->date < min_date)
+				min_date = c->item->date;
+		}
+		if (min_date > core_clock_skew)
+			cutoff = min_date - core_clock_skew;
+	}
+
+	return contains_recurse(candidate, want, cutoff);
 }
diff --git a/config.c b/config.c
index cdcf583..7a18bc9 100644
--- a/config.c
+++ b/config.c
@@ -595,6 +595,14 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.clockskew")) {
+		if (!value || !strcmp(value, "none"))
+			core_clock_skew = -1;
+		else
+			core_clock_skew = git_config_int(var, value);
+		return 0;
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
-- 
1.7.2.rc1.209.g2a36c

  parent reply	other threads:[~2010-07-05 12:34 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-01  0:54 Why is "git tag --contains" so slow? Theodore Ts'o
2010-07-01  0:58 ` Shawn O. Pearce
2010-07-03 23:27   ` Sam Vilain
2010-07-01  1:00 ` Avery Pennarun
2010-07-01 12:17   ` tytso
2010-07-01 15:03     ` Jeff King
2010-07-01 15:38       ` Jeff King
2010-07-02 19:26         ` tytso
2010-07-03  8:06           ` Jeff King
2010-07-04  0:55             ` tytso
2010-07-05 12:27               ` Jeff King
2010-07-05 12:33                 ` [RFC/PATCH 1/4] tag: speed up --contains calculation Jeff King
2010-10-13 22:07                   ` Jonathan Nieder
2010-10-13 22:56                   ` Clemens Buchacher
2011-02-23 15:51                   ` Ævar Arnfjörð Bjarmason
2011-02-23 16:39                     ` Jeff King
2010-07-05 12:34                 ` Jeff King [this message]
2010-10-13 23:21                   ` [RFC/PATCH 2/4] limit "contains" traversals based on commit timestamp Jonathan Nieder
2010-07-05 12:35                 ` [RFC/PATCH 3/4] default core.clockskew variable to one day Jeff King
2010-07-05 12:36                 ` [RFC/PATCH 4/4] name-rev: respect core.clockskew Jeff King
2010-07-05 12:39                 ` Why is "git tag --contains" so slow? Jeff King
2010-10-14 18:59                   ` Jonathan Nieder
2010-10-16 14:32                     ` Clemens Buchacher
2010-10-27 17:11                       ` Jeff King
2010-10-28  8:07                         ` Clemens Buchacher
2010-07-05 14:10                 ` tytso
2010-07-06 11:58                   ` Jeff King
2010-07-06 15:31                     ` Will Palmer
2010-07-06 16:53                       ` tytso
2010-07-08 11:28                         ` Jeff King
2010-07-08 13:21                           ` Will Palmer
2010-07-08 13:54                             ` tytso
2010-07-07 17:45                       ` Jeff King
2010-07-08 10:29                         ` Theodore Tso
2010-07-08 11:12                           ` Jakub Narebski
2010-07-08 19:29                             ` Nicolas Pitre
2010-07-08 19:39                               ` Avery Pennarun
2010-07-08 20:13                                 ` Nicolas Pitre
2010-07-08 21:20                                   ` Jakub Narebski
2010-07-08 21:30                                     ` Sverre Rabbelier
2010-07-08 23:10                                       ` Nicolas Pitre
2010-07-08 23:15                                     ` Nicolas Pitre
2010-07-08 11:31                           ` Jeff King
2010-07-08 14:35                           ` Johan Herland
2010-07-08 19:06                           ` Nicolas Pitre
2010-07-07 17:50                       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100705123419.GB25699@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=apenwarr@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).