git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Handling renames.
@ 2005-04-14 17:54 David Woodhouse
  2005-04-14 18:11 ` Linus Torvalds
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: David Woodhouse @ 2005-04-14 17:54 UTC (permalink / raw)
  To: git; +Cc: James Bottomley

I've been looking at tracking file revisions. One proposed solution was
to have a separate revision history for individual files, with a new
kind of 'filecommit' object which parallels the existing 'commit',
referencing a blob instead of a tree. Then trees would reference such
objects instead of referencing blobs directly.

I think that introduces a lot of redundancy though, because 99% of the
time, the revision history of the individual file is entirely
reproducible from the revision history of the tree. It's only when files
are renamed that we fall over -- and I think we can handle renames
fairly well if we just log them in the commit object. 

My 'gitfilelog.sh' script is already capable of tracking a given file
back through multiple tree commits, listing those commits where the file
in question was actually changed. It uses my patched version of diff-
tree which supports 'diff-tree <TREE_A> <TREE_B> <filename>' in order to
do this.

By storing rename information in the commit object, the script (or a
reimplementation of a similar algorithm) could know when to change the
filename it's looking for, as it goes back through the tree. That ought
to be perfectly sufficient.

So a commit involving a rename would look something like this...

	tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
	parent bb95843a5a0f397270819462812735ee29796fb4
	rename foo.c bar.c
	author David Woodhouse <dwmw2@hades.cambridge.redhat.com> 1113499881 +0100
	committer David Woodhouse <dwmw2@hades.cambridge.redhat.com> 1113499881 +0100
	Rename foo.c to bar.c and s/foo_/bar_/g

Opinions? Dissent? We'd probably need to escape the filenames in some
way -- handwave over that for now.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Handling renames.
@ 2005-04-15 13:37 linux
  2005-04-15 13:53 ` David Woodhouse
  0 siblings, 1 reply; 28+ messages in thread
From: linux @ 2005-04-15 13:37 UTC (permalink / raw)
  To: dwmw2; +Cc: git

> One option for optimising this, if we really need to, might be to track
> the file back to its _first_ ancestor and use that as an identification.
> The SCM could store that identifier in the blob itself, or we could
> consider it an 'inode number' and store it in git's tree objects.

This suggestion (and this whole discussion about renames) has issues
with file copies, which form a branch in the revision history.  If I
copy foo.c to foo2.c (or fs/ext2/ to fs/ext3/), then the oldest ancestor
isn't a "unique inode number".

I've written a lot of programs by debugging hello.c.

Thinking about this can give you all sorts of exciting merge possibilities.

If branch1 renames a.c to b.c, and branch2 patches a.c, it seems obvious
that the patch should be merged into b.c.  But what if branch1 copies a.c
to b.c?

^ permalink raw reply	[flat|nested] 28+ messages in thread
* git-rev-list: add "--dense" flag
@ 2005-10-21 23:40 Linus Torvalds
  2005-10-22  0:37 ` Petr Baudis
  0 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2005-10-21 23:40 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


This is what the recent git-rev-list changes have all been gearing up for.

When we use a path filter to git-rev-list, the new "--dense" flag asks
git-rev-list to compress the history so that it _only_ contains commits
that change files in the path filter.  It also rewrites the parent
information so that tools like "gitk" will see the result as a dense
history tree.

For example, on the current kernel archive:

	[torvalds@g5 linux]$ git-rev-list HEAD | wc -l
	9904
	[torvalds@g5 linux]$ git-rev-list HEAD -- kernel | wc -l
	5442
	[torvalds@g5 linux]$ git-rev-list --dense HEAD -- kernel | wc -l
	356

which shows that while we have almost ten thousand commits, we can prune
down the work to slightly more than half by only following the merges
that are interesting. But further, we can then compress the history to
just 356 entries that actually make changes to the kernel subdirectory.

To see this in action, try something like

	gitk --dense -- gitk

to see just the history that affects gitk.  Or, to show that true
parallell development still remains parallell, do

	gitk --dense -- daemon.c

which shows some parallell commits in the current git tree.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

I'm really happy with how this turned out. It's a bit expensive to run on 
big archives, but I _really_ think it's quite spectacular. And likely very 
useful indeed.

For example, say you love gitk, but only care about networking changes. 
Easy enough, just do

	gitk --dense -- net/ include/net/

and off you go. It's not free (we do a _lot_ of tree comparisons), but 
dammit, it's fast enough that it's very very useful.  The tree comparisons 
are done very efficiently.

This is _way_ more powerful than annotate. Interested in just a single 
file? Just do

	gitk --dense -- kernel/exit.c

and it will show the 17 or so commits that change kernel/exit.c with the 
right history (it turns out that there is no parallell development at all 
in that file, so in this case it will linearize history entirely).

Damn, I'm good. 

diff --git a/rev-list.c b/rev-list.c
index b5dbb9f..5f125fd 100644
--- a/rev-list.c
+++ b/rev-list.c
@@ -11,6 +11,7 @@
 #define INTERESTING	(1u << 1)
 #define COUNTED		(1u << 2)
 #define SHOWN		(1u << 3)
+#define TREECHANGE	(1u << 4)
 
 static const char rev_list_usage[] =
 	"git-rev-list [OPTION] commit-id <commit-id>\n"
@@ -27,6 +28,7 @@ static const char rev_list_usage[] =
 		      "  --merge-order [ --show-breaks ]\n"
 		      "  --topo-order";
 
+static int dense = 0;
 static int unpacked = 0;
 static int bisect_list = 0;
 static int tag_objects = 0;
@@ -79,6 +81,26 @@ static void show_commit(struct commit *c
 	fflush(stdout);
 }
 
+static void rewrite_one(struct commit **pp)
+{
+	for (;;) {
+		struct commit *p = *pp;
+		if (p->object.flags & (TREECHANGE | UNINTERESTING))
+			return;
+		/* Only single-parent commits don't have TREECHANGE */
+		*pp = p->parents->item;
+	}
+}
+
+static void rewrite_parents(struct commit *commit)
+{
+	struct commit_list *parent = commit->parents;
+	while (parent) {
+		rewrite_one(&parent->item);
+		parent = parent->next;
+	}
+}
+
 static int filter_commit(struct commit * commit)
 {
 	if (stop_traversal && (commit->object.flags & BOUNDARY))
@@ -95,6 +117,11 @@ static int filter_commit(struct commit *
 		return STOP;
 	if (no_merges && (commit->parents && commit->parents->next))
 		return CONTINUE;
+	if (paths && dense) {
+		if (!(commit->object.flags & TREECHANGE))
+			return CONTINUE;
+		rewrite_parents(commit);
+	}
 	return DO;
 }
 
@@ -404,6 +431,14 @@ static struct diff_options diff_opt = {
 	.change = file_change,
 };
 
+static int same_tree(struct tree *t1, struct tree *t2)
+{
+	is_different = 0;
+	if (diff_tree_sha1(t1->object.sha1, t2->object.sha1, "", &diff_opt) < 0)
+		return 0;
+	return !is_different;
+}
+
 static struct commit *try_to_simplify_merge(struct commit *commit, struct commit_list *parent)
 {
 	if (!commit->tree)
@@ -415,11 +450,7 @@ static struct commit *try_to_simplify_me
 		parse_commit(p);
 		if (!p->tree)
 			continue;
-		is_different = 0;
-		if (diff_tree_sha1(commit->tree->object.sha1,
-				   p->tree->object.sha1, "", &diff_opt) < 0)
-			continue;
-		if (!is_different)
+		if (same_tree(commit->tree, p->tree))
 			return p;
 	}
 	return NULL;
@@ -485,6 +516,27 @@ static void add_parents_to_list(struct c
 	}
 }
 
+static void compress_list(struct commit_list *list)
+{
+	while (list) {
+		struct commit *commit = list->item;
+		struct commit_list *parent = commit->parents;
+		list = list->next;
+
+		/*
+		 * Exactly one parent? Check if it leaves the tree
+		 * unchanged
+		 */
+		if (parent && !parent->next) {
+			struct tree *t1 = commit->tree;
+			struct tree *t2 = parent->item->tree;
+			if (!t1 || !t2 || same_tree(t1, t2))
+				continue;
+		}
+		commit->object.flags |= TREECHANGE;
+	}
+}
+
 static struct commit_list *limit_list(struct commit_list *list)
 {
 	struct commit_list *newlist = NULL;
@@ -514,6 +566,8 @@ static struct commit_list *limit_list(st
 	}
 	if (tree_objects)
 		mark_edges_uninteresting(newlist);
+	if (paths && dense)
+		compress_list(newlist);
 	if (bisect_list)
 		newlist = find_bisection(newlist);
 	return newlist;
@@ -700,6 +754,10 @@ int main(int argc, const char **argv)
 		        limited = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--dense")) {
+			dense = 1;
+			continue;
+		}
 		if (!strcmp(arg, "--")) {
 			paths = get_pathspec(prefix, argv + i + 1);
 			if (paths) {

^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2005-10-22  3:23 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-04-14 17:54 Handling renames David Woodhouse
2005-04-14 18:11 ` Linus Torvalds
2005-04-14 19:09   ` David Woodhouse
2005-04-14 18:12 ` Ingo Molnar
2005-04-14 18:32   ` Linus Torvalds
2005-04-14 18:58     ` Ingo Molnar
2005-04-14 19:20       ` David Woodhouse
2005-04-14 19:21     ` David Mansfield
2005-04-14 18:21 ` H. Peter Anvin
2005-04-14 18:48   ` Linus Torvalds
2005-04-14 18:49     ` H. Peter Anvin
2005-04-14 19:22     ` Zach Welch
2005-04-14 19:40       ` Andrew Timberlake-Newell
2005-04-14 20:42         ` Naming the SCM (was Re: Handling renames.) Steven Cole
2005-04-14 20:53           ` Petr Baudis
2005-04-14 20:58             ` H. Peter Anvin
2005-04-14 21:01               ` Petr Baudis
2005-04-14 23:17           ` Peter Williams
2005-04-14 22:23 ` Handling renames Daniel Barkalow
2005-04-14 22:46   ` David Woodhouse
  -- strict thread matches above, loose matches on Subject: below --
2005-04-15 13:37 linux
2005-04-15 13:53 ` David Woodhouse
2005-10-21 23:40 git-rev-list: add "--dense" flag Linus Torvalds
2005-10-22  0:37 ` Petr Baudis
2005-10-22  0:47   ` Handling renames Petr Baudis
2005-10-22  1:28     ` Linus Torvalds
2005-10-22  1:51       ` Petr Baudis
2005-10-22  2:10         ` Junio C Hamano
2005-10-22  2:49           ` Petr Baudis
2005-10-22  3:23         ` Linus Torvalds

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).