git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Jakub Narebski" <jnareb@gmail.com>, "Ted Ts'o" <tytso@mit.edu>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Clemens Buchacher" <drizzd@aon.at>,
	"Shawn O. Pearce" <spearce@spearce.org>
Subject: [RFC/PATCHv2 3/6] commit: add commit_generation function
Date: Wed, 13 Jul 2011 03:05:17 -0400	[thread overview]
Message-ID: <20110713070517.GC18566@sigill.intra.peff.net> (raw)
In-Reply-To: <20110713064709.GA18499@sigill.intra.peff.net>

A commit's generation is its height in the history graph, as
measured from the farthest root. It is defined as:

  - If the commit has no parents, then its generation is 0.

  - Otherwise, its generation is 1 more than the maximum of
    its parents generations.

The following diagram shows a sample history with
generations:

  A(0)--B(1)--C(2)------G(5)--H(6)
         \             /
          D(2)--E(3)--F(4)

Note that C and D have the same generation, as they are both
children of B. Note also that the merge commit G's
generation is 5, not 3, as we take the maximum of its
parents' generations.

Generation numbers can be useful for bounding traversals.
For example, if we have two commits with generations 500 and
600, we know that the second cannot be an ancestor of the
first. The first could be an ancestor of the second, but we
can't know unless we traverse the history graph. However,
when walking backwards from the "600" commit, once we reach
generation "499", we know that the "500" commit cannot be an
ancestor of the "499" commit, and we can stop the traversal
without even looking at the earlier parts of the history.

We already do something similar with commit timestamps in
many traversals. However, timestamps are somewhat
untrustworthy, as we have to deal with clock skew and with
imports from buggy systems.

Generation numbers are easy to calculate recursively, though
you have to go to the roots to do so. This patch calculates
and stores them in a persistent cache.  It uses a simple
recursive algorithm; you could probably drop the recursion
by topologically sorting a list of all commits and filling
in generation numbers left to right. But the recursive
definition coupled with the cache make it very cheap to
calculate generation numbers for new commits at the tip of
history (you only have to traverse back to the last cached
parents).

We could also store generation numbers in the commit header
directly. These would be faster to look at than an external
cache (they would be on par speed-wise with commit
timestamps). But there are a few reasons not to:

  1. The reason to avoid commit timestamps is that they are
     unreliable. Generation numbers would probably be more
     reliable, but they are still redundant with the actual
     graph structure represented by the parent pointers, and
     can therefore be out of sync with the parent
     information. By calculating them ourselves, we know
     they are correct.

  2. With grafts and replacement objects, the graph
     structure (and thus the generation numbers) can be
     changed. So the generation number, while immutable for
     a given commit object, can be changed when we "lie"
     about the graph structure via these mechanisms. Being
     able to simply clear the cache when these things are
     changed is helpful.

  3. There's a lot of existing git history without
     generation headers. So we'd still need to have the same
     cache to handle those cases.

  4. It generally pollutes the header with redundant
     information, which we try to avoid. Putting it in the
     commit header is purely a speedup, and it seems we can
     get similar performance with a generation cache.

Signed-off-by: Jeff King <peff@peff.net>
---
Same as before, but rebased onto the new metadata-cache interface.

 commit.c |   36 ++++++++++++++++++++++++++++++++++++
 commit.h |    2 ++
 2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/commit.c b/commit.c
index ac337c7..fb37aa0 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#include "metadata-cache.h"
 
 int save_commit_buffer = 1;
 
@@ -878,3 +879,38 @@ int commit_tree(const char *msg, unsigned char *tree,
 	strbuf_release(&buffer);
 	return result;
 }
+
+static struct metadata_cache generations =
+	METADATA_CACHE_INIT("generations", sizeof(uint32_t), NULL);
+
+static unsigned long commit_generation_recurse(struct commit *c)
+{
+	struct commit_list *p;
+	uint32_t r;
+
+	if (!metadata_cache_lookup_uint32(&generations, &c->object, &r))
+		return r;
+
+	if (parse_commit(c) < 0)
+		die("unable to parse commit: %s", sha1_to_hex(c->object.sha1));
+
+	if (!c->parents)
+		return 0;
+
+	r = 0;
+	for (p = c->parents; p; p = p->next) {
+		unsigned long pgen = commit_generation_recurse(p->item);
+		if (pgen > r)
+			r = pgen;
+	}
+	r++;
+
+	metadata_cache_add_uint32(&generations, &c->object, r);
+	return r;
+}
+
+unsigned long commit_generation(const struct commit *commit)
+{
+	/* drop const because we may call parse_commit */
+	return commit_generation_recurse((struct commit *)commit);
+}
diff --git a/commit.h b/commit.h
index a2d571b..bff6b36 100644
--- a/commit.h
+++ b/commit.h
@@ -176,4 +176,6 @@ extern int commit_tree(const char *msg, unsigned char *tree,
 		struct commit_list *parents, unsigned char *ret,
 		const char *author);
 
+unsigned long commit_generation(const struct commit *commit);
+
 #endif /* COMMIT_H */
-- 
1.7.6.37.g989c6

  parent reply	other threads:[~2011-07-13  7:05 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-13  6:47 [RFC/PATCHv2 0/6] generation numbers for faster traversals Jeff King
2011-07-13  6:57 ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Jeff King
2011-07-13 17:52   ` Jonathan Nieder
2011-07-13 20:08     ` Jeff King
2011-07-14 17:34       ` Jeff King
2011-07-14 17:51         ` [PATCH 1/3] implement generic key/value map Jeff King
2011-07-14 18:52           ` Bert Wesarg
2011-07-14 18:54             ` Bert Wesarg
2011-07-14 18:55               ` Jeff King
2011-07-14 19:07                 ` Bert Wesarg
2011-07-14 19:14                   ` Jeff King
2011-07-14 19:18                     ` Bert Wesarg
2011-07-14 17:52         ` [PATCH 2/3] fast-export: use object to uint32 map instead of "decorate" Jeff King
2011-07-15  9:40           ` Sverre Rabbelier
2011-07-15 20:00             ` Jeff King
2011-07-14 17:53         ` [PATCH 3/3] decorate: use "map" for the underlying implementation Jeff King
2011-07-14 21:06         ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Junio C Hamano
2011-08-04 22:43           ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King
2011-08-04 22:45             ` [PATCH 1/5] implement generic key/value map Jeff King
2011-08-04 22:46             ` [PATCH 2/5] fast-export: use object to uint32 map instead of "decorate" Jeff King
2011-08-04 22:46             ` [PATCH 3/5] decorate: use "map" for the underlying implementation Jeff King
2011-08-04 22:46             ` [PATCH 4/5] map: implement persistent maps Jeff King
2011-08-04 22:46             ` [PATCH 5/5] implement metadata cache subsystem Jeff King
2011-08-04 22:48             ` [RFC/PATCH 0/2] patch-id caching Jeff King
2011-08-04 22:49               ` [PATCH 1/2] cherry: read default config Jeff King
2011-08-04 22:49               ` [PATCH 2/2] cache patch ids on disk Jeff King
2011-08-04 22:52                 ` Jeff King
2011-08-05 11:03             ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King
2011-08-05 15:31               ` René Scharfe
2011-08-06  6:30                 ` Jeff King
2011-07-13  7:04 ` [RFC/PATCHv2 2/6] add metadata-cache infrastructure Jeff King
2011-07-13  8:18   ` Bert Wesarg
2011-07-13  8:31     ` Jeff King
2011-07-13  8:45       ` Bert Wesarg
2011-07-13 19:18         ` Jeff King
2011-07-13 19:40       ` Junio C Hamano
2011-07-13 19:33   ` Junio C Hamano
2011-07-13 20:25     ` Jeff King
2011-07-13  7:05 ` Jeff King [this message]
2011-07-13 14:26   ` [RFC/PATCHv2 3/6] commit: add commit_generation function Eric Sunshine
2011-07-13  7:05 ` [RFC/PATCHv2 4/6] pretty: support %G to show the generation number of a commit Jeff King
2011-07-13  7:06 ` [RFC/PATCHv2 5/6] check commit generation cache validity against grafts Jeff King
2011-07-13 14:26   ` Eric Sunshine
2011-07-13 19:35     ` Jeff King
2011-07-13  7:06 ` [RFC/PATCHv2 6/6] limit "contains" traversals based on commit generation Jeff King
2011-07-13  7:23   ` Jeff King
2011-07-13 20:33     ` Junio C Hamano
2011-07-13 20:58       ` Jeff King
2011-07-13 21:12         ` Junio C Hamano
2011-07-13 21:18           ` Jeff King
2011-07-15 18:22   ` Junio C Hamano
2011-07-15 20:40     ` Jeff King
2011-07-15 21:04       ` Junio C Hamano
2011-07-15 21:14         ` Jeff King
2011-07-15 21:01 ` Generation numbers and replacement objects Jakub Narebski
2011-07-15 21:10   ` Jeff King
2011-07-16 21:10     ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110713070517.GC18566@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=drizzd@aon.at \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=spearce@spearce.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).