From: Jeff King <peff@peff.net>
To: Clemens Buchacher <drizzd@aon.at>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
	tytso@mit.edu, Avery Pennarun <apenwarr@gmail.com>,
	git@vger.kernel.org
Subject: Re: Why is "git tag --contains" so slow?
Date: Wed, 27 Oct 2010 10:11:45 -0700
Message-ID: <20101027171145.GA6921@sigill.intra.peff.net>
In-Reply-To: <20101016143226.GA20515@localhost>

On Sat, Oct 16, 2010 at 04:32:26PM +0200, Clemens Buchacher wrote:

> On Thu, Oct 14, 2010 at 01:59:45PM -0500, Jonathan Nieder wrote:
> > 
> > project # skewed        maximum skew    notes
> > ------- --------        ------------    -----
> > gtk+    13              13 hrs          worst example seems to be tz related
> 
> It really is kind of fun.
> 
> wine                    1       14 days

Thanks to both of you for the extra data points. If you don't mind, would
you consider running my updated git-skew below on your test cases (or
tweaking your skew detectors, since you both seem to be getting a '#
skewed' column that mine didn't output)? Specifically, I am interested
in long runs of skewed commits, since one potential solution would be to
just accept a fixed number of slop commits (rather than accepting
commits within a certain slop time).

For the kernel the longest run looks to be about 80 commits. Using that
run length as the slop should yield much better performance than
handling the worst skew by time (which was 100 days, during which many
more than 80 commits are usually made).
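
To illustrate the difference between the two kinds of slop, here is a
minimal sketch. It is not part of the patch, and it is only one possible
reading of "a fixed number of slop commits". It assumes a toy linear
history stored newest-first as an array of timestamps; the function
names and the reset-on-interesting-commit rule are hypothetical.

```c
#include <stddef.h>

/*
 * Toy model: a linear history stored newest-first, so dates[i + 1] is
 * the parent of dates[i]. Each function returns how many commits the
 * walk visits before it gives up looking for anything dated at or
 * after `cutoff`.
 */

/* Count-based slop: tolerate up to `slop` consecutive too-old commits. */
static size_t walk_with_count_slop(const unsigned long *dates, size_t n,
				   unsigned long cutoff, size_t slop)
{
	size_t i, too_old = 0;

	for (i = 0; i < n; i++) {
		if (dates[i] >= cutoff)
			too_old = 0;	/* still interesting: reset the budget */
		else if (++too_old > slop)
			return i;	/* budget exhausted: stop before this commit */
	}
	return n;
}

/* Time-based slop: tolerate any commit newer than cutoff - slop_secs. */
static size_t walk_with_time_slop(const unsigned long *dates, size_t n,
				  unsigned long cutoff, unsigned long slop_secs)
{
	unsigned long limit = cutoff > slop_secs ? cutoff - slop_secs : 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (dates[i] < limit)
			return i;	/* older than the widened cutoff: stop */
	return n;
}
```

With a time-based slop sized for the worst skew, the walk has to visit
every commit inside that window. With a count-based slop it visits at
most `slop` extra commits after the last interesting one.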

Patch below. NB: thinking on this more, I think my program's results are
slightly inaccurate. There are corner cases of skew that we can miss
because we mark commits SEEN (i.e., skew you would only see if you
reached a commit by another path). However, the problem gets intractably
large if you don't mark commits (you end up traversing every possible
path through the graph).
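
To make that corner case concrete, here is a small standalone sketch. It
is not part of the patch; `struct toy_commit` and the `toy_*` functions
are hypothetical stand-ins that mirror only the date comparison and the
SEEN check of check_skew_recurse. It builds a diamond in which a shared
ancestor (date 70) is newer than one of its children (date 50) but older
than the other (date 90).

```c
#include <stddef.h>

/* Hypothetical stand-in for struct commit: a date, a SEEN bit, parents. */
struct toy_commit {
	unsigned long date;
	int seen;
	struct toy_commit *parents[2];
};

static unsigned long toy_worst_skew;

/* Same shape as check_skew_recurse, minus the run counter. */
static void toy_check_skew(struct toy_commit *c, unsigned long when)
{
	size_t i;

	if (c->seen)
		return;
	c->seen = 1;

	if (c->date > when && c->date - when > toy_worst_skew)
		toy_worst_skew = c->date - when;

	for (i = 0; i < 2 && c->parents[i]; i++)
		toy_check_skew(c->parents[i], c->date < when ? c->date : when);
}

/* Build the diamond and report the worst skew for a given parent order. */
static unsigned long toy_diamond_skew(int skewed_side_first)
{
	struct toy_commit base = { 70, 0, { NULL, NULL } };
	struct toy_commit ok = { 90, 0, { NULL, NULL } };
	struct toy_commit skewed = { 50, 0, { NULL, NULL } };
	struct toy_commit tip = { 100, 0, { NULL, NULL } };

	ok.parents[0] = &base;
	skewed.parents[0] = &base;
	tip.parents[0] = skewed_side_first ? &skewed : &ok;
	tip.parents[1] = skewed_side_first ? &ok : &skewed;

	toy_worst_skew = 0;
	toy_check_skew(&tip, 1000);	/* 1000 plays the role of time(NULL) */
	return toy_worst_skew;
}
```

If the walk reaches the shared ancestor through the well-behaved side
first, it is marked as seen with no skew recorded, and the later visit
through the skewed side returns immediately, so the skew of 20 is never
counted. With the parents in the other order the skew is found.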

---
 .gitignore     |    1 +
 Makefile       |    1 +
 builtin.h      |    1 +
 builtin/skew.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 git.c          |    1 +
 5 files changed, 66 insertions(+), 0 deletions(-)
 create mode 100644 builtin/skew.c

diff --git a/.gitignore b/.gitignore
index 20560b8..d696c78 100644
--- a/.gitignore
+++ b/.gitignore
@@ -132,6 +132,7 @@
 /git-show-branch
 /git-show-index
 /git-show-ref
+/git-skew
 /git-stage
 /git-stash
 /git-status
diff --git a/Makefile b/Makefile
index 1f1ce04..25f94b0 100644
--- a/Makefile
+++ b/Makefile
@@ -739,6 +739,7 @@ BUILTIN_OBJS += builtin/send-pack.o
 BUILTIN_OBJS += builtin/shortlog.o
 BUILTIN_OBJS += builtin/show-branch.o
 BUILTIN_OBJS += builtin/show-ref.o
+BUILTIN_OBJS += builtin/skew.o
 BUILTIN_OBJS += builtin/stripspace.o
 BUILTIN_OBJS += builtin/symbolic-ref.o
 BUILTIN_OBJS += builtin/tag.o
diff --git a/builtin.h b/builtin.h
index f2a25a0..e01ac4c 100644
--- a/builtin.h
+++ b/builtin.h
@@ -140,5 +140,6 @@ extern int cmd_verify_pack(int argc, const char **argv, const char *prefix);
 extern int cmd_show_ref(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_refs(int argc, const char **argv, const char *prefix);
 extern int cmd_replace(int argc, const char **argv, const char *prefix);
+extern int cmd_skew(int argc, const char **argv, const char *prefix);
 
 #endif
diff --git a/builtin/skew.c b/builtin/skew.c
new file mode 100644
index 0000000..169a9f4
--- /dev/null
+++ b/builtin/skew.c
@@ -0,0 +1,62 @@
+#include "cache.h"
+#include "commit.h"
+#include "diff.h"
+#include "revision.h"
+
+unsigned long worst_skew = 0;
+unsigned char worst_skew_sha1[20];
+unsigned long worst_run = 0;
+unsigned char worst_run_sha1[20];
+
+static void check_skew_recurse(struct commit *c, unsigned long when, int counter)
+{
+	struct commit_list *p;
+
+	if (c->object.flags & SEEN)
+		return;
+	c->object.flags |= SEEN;
+
+	if (parse_commit(c) < 0)
+		return;
+
+	if (c->date > when) {
+		unsigned long skew = c->date - when;
+		if (skew > worst_skew) {
+			worst_skew = skew;
+			hashcpy(worst_skew_sha1, c->object.sha1);
+		}
+		if (++counter > worst_run) {
+			worst_run = counter;
+			hashcpy(worst_run_sha1, c->object.sha1);
+		}
+	}
+	else
+		counter = 0;
+
+	for (p = c->parents; p; p = p->next)
+		check_skew_recurse(p->item, c->date < when ? c->date : when, counter);
+}
+
+static void check_skew(struct commit *c)
+{
+	check_skew_recurse(c, time(NULL), 0);
+}
+
+int cmd_skew(int argc, const char **argv, const char *prefix) {
+	struct rev_info revs;
+	int i;
+
+	git_config(git_default_config, NULL);
+	init_revisions(&revs, prefix);
+	argc = setup_revisions(argc, argv, &revs, NULL);
+
+	for (i = 0; i < revs.pending.nr; i++) {
+		struct object *o = revs.pending.objects[i].item;
+		if (o->type == OBJ_COMMIT)
+			check_skew((struct commit *)o);
+	}
+
+	printf("worst skew: %lu (%s)\n", worst_skew, sha1_to_hex(worst_skew_sha1));
+	printf("longest run: %lu (%s)\n", worst_run, sha1_to_hex(worst_run_sha1));
+	return 0;
+}
diff --git a/git.c b/git.c
index 0409ac9..1041858 100644
--- a/git.c
+++ b/git.c
@@ -405,6 +405,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "verify-pack", cmd_verify_pack },
 		{ "show-ref", cmd_show_ref, RUN_SETUP },
 		{ "pack-refs", cmd_pack_refs, RUN_SETUP },
+		{ "skew", cmd_skew, RUN_SETUP },
 	};
 	int i;
 	static const char ext[] = STRIP_EXTENSION;
-- 
1.7.3.2.216.g61ab7.dirty
