From: Thomas Rast <trast@inf.ethz.ch>
To: <git@vger.kernel.org>
Cc: "Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>,
	"Ramsay Jones" <ramsay@ramsay1.demon.co.uk>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: [PATCH v2] log: use true parents for diff even when rewriting
Date: Wed, 31 Jul 2013 22:13:20 +0200
Message-ID: <e7f2ead2267ff78940aab00fe36c378a2ce5d85e.1375301293.git.trast@inf.ethz.ch>
In-Reply-To: <a598aec3e3c90de4d2c08e58ee0a4828edc80ac2.1374527806.git.trast@inf.ethz.ch>

When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.

This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it.  So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.

However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of

  git log --graph --stat ...
  git log --graph --full-diff --stat ...

(--graph internally kicks in parent simplification, much like
--parents).
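
As a rough illustration of the argument above (a toy model, not git
code): take a linear history in which every commit records the content
of one filtered path, `file`, and of one path outside the pathspec.
All names here (toy_commit, simplified_parent, changed_paths) are
hypothetical and exist only for this sketch.  It assumes that
simplification keeps the root and every commit that changes `file`.

```c
#include <assert.h>

/* Toy linear history: commit i has parent i-1; index 0 is the root. */
struct toy_commit {
	int file;   /* content id of the path matched by the pathspec */
	int other;  /* content id of some path outside the pathspec */
};

/* A commit survives simplification if it is the root or changes `file`. */
static int touches_file(const struct toy_commit *h, int i)
{
	return i == 0 || h[i].file != h[i - 1].file;
}

/* Nearest surviving ancestor of commit i, or -1 if i is the root. */
int simplified_parent(const struct toy_commit *h, int i)
{
	int p = i - 1;

	assert(i >= 0);
	while (p > 0 && !touches_file(h, p))
		p--;
	return p;
}

/* Number of paths that differ between parent and child. */
int changed_paths(const struct toy_commit *h, int parent, int child,
		  int full_diff)
{
	int n = 0;

	assert(parent >= 0 && child >= 0);
	if (h[parent].file != h[child].file)
		n++;
	/* Only a full diff looks outside the pathspec. */
	if (full_diff && h[parent].other != h[child].other)
		n++;
	return n;
}
```

With the history { {1,1}, {1,2}, {1,3}, {2,3} }, the simplified parent
of commit 3 is commit 0 while its true parent is commit 2.  The
filtered diff reports one changed path against either of them, but the
full diff reports two changed paths against commit 0 and only one
against commit 2: the extra path comes from the skipped commits.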

To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect.  Then use the stored parents
instead of the simplified ones in the commit display code paths.  The
latter do not check for --full-diff themselves, to avoid duplicated
code; they just fall back to commit->parents if save_parents() has not
been called for this revision walk.
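
The save-then-rewrite idea can be sketched outside of git as follows.
This is a simplified stand-in, not the patch itself: it replaces the
commit slab with a plain array indexed by a made-up commit id, and
every toy_* name is hypothetical.  Like the patch, the lookup falls
back to the commit's current parents only when nothing has been saved
for the whole walk.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for git's commit_list and commit; not the real structs. */
struct toy_list {
	int item;                /* id of a parent commit */
	struct toy_list *next;
};

struct toy_commit {
	int id;                  /* index into the side table */
	struct toy_list *parents;
};

/* Per-walk state: one saved list per commit id, allocated on first save. */
struct toy_walk {
	int nr_commits;
	struct toy_list **saved; /* NULL until something is saved */
};

/* Shallow, order-preserving copy built with a tail pointer. */
struct toy_list *toy_copy_list(const struct toy_list *list)
{
	struct toy_list *head = NULL;
	struct toy_list **tail = &head;

	while (list) {
		struct toy_list *n = malloc(sizeof(*n));
		if (!n)
			abort();
		n->item = list->item;
		n->next = NULL;
		*tail = n;
		tail = &n->next;
		list = list->next;
	}
	return head;
}

/* Record the parents of c before anything rewrites c->parents. */
void toy_save_parents(struct toy_walk *w, const struct toy_commit *c)
{
	assert(w->nr_commits > 0);
	assert(c->id >= 0 && c->id < w->nr_commits);
	if (!w->saved) {
		w->saved = calloc((size_t)w->nr_commits, sizeof(*w->saved));
		if (!w->saved)
			abort();
	}
	assert(!w->saved[c->id]); /* save at most once per commit */
	w->saved[c->id] = toy_copy_list(c->parents);
}

/* Saved parents if the walk saved any, else the current parents. */
const struct toy_list *toy_get_saved_parents(const struct toy_walk *w,
					     const struct toy_commit *c)
{
	if (!w->saved)
		return c->parents;
	return w->saved[c->id];
}
```

A caller would invoke toy_save_parents() right before rewriting
c->parents and use toy_get_saved_parents() wherever the diff is
computed.  The sketch never frees the copies; a real implementation
would have to release them when the walk ends.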

For ordinary commits it should be obvious that this is the right thing
to do.

Merge commits are a bit subtle.  Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.

So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent.  Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place).  This triggers --cc showing
these hunks spuriously.

Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
---

New in this version:

* Junio's and Ramsay's suggestions squashed

* Moved the slab variable from file-static to within the struct
  rev_info.  In practice there is no difference, because you cannot
  run two walks in parallel anyway (they would stomp over each other's
  parent lists!).  But it was easy, feels slightly cleaner, and avoids
  having yet another global variable.

* Wrote an actual commit message.  I hope the logic wrt. merge commits
  is correct.


 combine-diff.c               |  3 ++-
 commit.c                     | 16 ++++++++++++++++
 commit.h                     |  3 +++
 log-tree.c                   |  2 +-
 revision.c                   | 43 ++++++++++++++++++++++++++++++++++++++++++-
 revision.h                   | 20 ++++++++++++++++++++
 t/t6012-rev-list-simplify.sh |  6 ++++++
 7 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 6dc0609..3d2aaf3 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -10,6 +10,7 @@
 #include "refs.h"
 #include "userdiff.h"
 #include "sha1-array.h"
+#include "revision.h"
 
 static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, int n, int num_parent)
 {
@@ -1383,7 +1384,7 @@ void diff_tree_combined(const unsigned char *sha1,
 void diff_tree_combined_merge(const struct commit *commit, int dense,
 			      struct rev_info *rev)
 {
-	struct commit_list *parent = commit->parents;
+	struct commit_list *parent = get_saved_parents(rev, commit);
 	struct sha1_array parents = SHA1_ARRAY_INIT;
 
 	while (parent) {
diff --git a/commit.c b/commit.c
index e5862f6..5ecdb38 100644
--- a/commit.c
+++ b/commit.c
@@ -377,6 +377,22 @@ unsigned commit_list_count(const struct commit_list *l)
 	return c;
 }
 
+struct commit_list *copy_commit_list(struct commit_list *list)
+{
+	struct commit_list *head = NULL;
+	struct commit_list **pp = &head;
+	while (list) {
+		struct commit_list *new;
+		new = xmalloc(sizeof(struct commit_list));
+		new->item = list->item;
+		new->next = NULL;
+		*pp = new;
+		pp = &new->next;
+		list = list->next;
+	}
+	return head;
+}
+
 void free_commit_list(struct commit_list *list)
 {
 	while (list) {
diff --git a/commit.h b/commit.h
index d912a9d..f9504f7 100644
--- a/commit.h
+++ b/commit.h
@@ -62,6 +62,9 @@ struct commit_list *commit_list_insert_by_date(struct commit *item,
 				    struct commit_list **list);
 void commit_list_sort_by_date(struct commit_list **list);
 
+/* Shallow copy of the input list */
+struct commit_list *copy_commit_list(struct commit_list *list);
+
 void free_commit_list(struct commit_list *list);
 
 /* Commit formats */
diff --git a/log-tree.c b/log-tree.c
index a49d8e8..8534d91 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -738,7 +738,7 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log
 	sha1 = commit->tree->object.sha1;
 
 	/* Root commit? */
-	parents = commit->parents;
+	parents = get_saved_parents(opt, commit);
 	if (!parents) {
 		if (opt->show_root_diff) {
 			diff_root_tree_sha1(sha1, "", &opt->diffopt);
diff --git a/revision.c b/revision.c
index 84ccc05..e3ca936 100644
--- a/revision.c
+++ b/revision.c
@@ -15,6 +15,7 @@
 #include "string-list.h"
 #include "line-log.h"
 #include "mailmap.h"
+#include "commit-slab.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -2763,7 +2764,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 	return retval;
 }
 
-static inline int want_ancestry(struct rev_info *revs)
+static inline int want_ancestry(const struct rev_info *revs)
 {
 	return (revs->rewrite_parents || revs->children.name);
 }
@@ -2820,6 +2821,14 @@ enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit)
 	if (action == commit_show &&
 	    !revs->show_all &&
 	    revs->prune && revs->dense && want_ancestry(revs)) {
+		/*
+		 * --full-diff on simplified parents is no good: it
+		 * will show spurious changes from the commits that
+		 * were elided.  So we save the parents on the side
+		 * when --full-diff is in effect.
+		 */
+		if (revs->full_diff)
+			save_parents(revs, commit);
 		if (rewrite_parents(revs, commit, rewrite_one) < 0)
 			return commit_error;
 	}
@@ -3038,6 +3047,8 @@ struct commit *get_revision(struct rev_info *revs)
 	c = get_revision_internal(revs);
 	if (c && revs->graph)
 		graph_update(revs->graph, c);
+	if (!c)
+		free_saved_parents(revs);
 	return c;
 }
 
@@ -3069,3 +3080,33 @@ void put_revision_mark(const struct rev_info *revs, const struct commit *commit)
 	fputs(mark, stdout);
 	putchar(' ');
 }
+
+define_commit_slab(saved_parents, struct commit_list *);
+
+void save_parents(struct rev_info *revs, struct commit *commit)
+{
+	struct commit_list **pp;
+
+	if (!revs->saved_parents_slab) {
+		revs->saved_parents_slab = xmalloc(sizeof(struct saved_parents));
+		init_saved_parents(revs->saved_parents_slab);
+	}
+
+	pp = saved_parents_at(revs->saved_parents_slab, commit);
+	assert(*pp == NULL);
+	*pp = copy_commit_list(commit->parents);
+}
+
+struct commit_list *get_saved_parents(struct rev_info *revs, const struct commit *commit)
+{
+	if (!revs->saved_parents_slab)
+		return commit->parents;
+
+	return *saved_parents_at(revs->saved_parents_slab, commit);
+}
+
+void free_saved_parents(struct rev_info *revs)
+{
+	if (revs->saved_parents_slab)
+		clear_saved_parents(revs->saved_parents_slab);
+}
diff --git a/revision.h b/revision.h
index 95859ba..e7f1d21 100644
--- a/revision.h
+++ b/revision.h
@@ -25,6 +25,7 @@
 struct rev_info;
 struct log_info;
 struct string_list;
+struct saved_parents;
 
 struct rev_cmdline_info {
 	unsigned int nr;
@@ -187,6 +188,9 @@ struct rev_info {
 
 	/* line level range that we are chasing */
 	struct decoration line_log_data;
+
+	/* copies of the parent lists, for --full-diff display */
+	struct saved_parents *saved_parents_slab;
 };
 
 #define REV_TREE_SAME		0
@@ -273,4 +277,20 @@ enum rewrite_result {
 
 extern int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	rewrite_parent_fn_t rewrite_parent);
+
+/*
+ * Save a copy of the parent list; get_saved_parents() returns it.  This
+ * is used by the log machinery to retrieve the original parents when
+ * commit->parents has been modified by history simplification.
+ *
+ * You may only call save_parents() once per commit (this is checked
+ * for non-root commits).
+ *
+ * get_saved_parents() will transparently return commit->parents if
+ * save_parents() was never called on this rev_info.
+ */
+extern void save_parents(struct rev_info *revs, struct commit *commit);
+extern struct commit_list *get_saved_parents(struct rev_info *revs, const struct commit *commit);
+extern void free_saved_parents(struct rev_info *revs);
+
 #endif
diff --git a/t/t6012-rev-list-simplify.sh b/t/t6012-rev-list-simplify.sh
index 57ce239..fde5e71 100755
--- a/t/t6012-rev-list-simplify.sh
+++ b/t/t6012-rev-list-simplify.sh
@@ -127,4 +127,10 @@ test_expect_success 'full history simplification without parent' '
 	}
 '
 
+test_expect_success '--full-diff is not affected by --parents' '
+	git log -p --pretty="%H" --full-diff -- file >expected &&
+	git log -p --pretty="%H" --full-diff --parents -- file >actual &&
+	test_cmp expected actual
+'
+
 test_done
-- 
1.8.4.rc0.408.gad6868d
