git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH V3] git-rev-list: add --first-parent-not flag
@ 2022-01-05 23:27 Jerry Zhang
  2022-01-06 22:10 ` Junio C Hamano
  2022-01-11 21:39 ` [PATCH V4] git-rev-list: add --exclude-first-parent-only flag Jerry Zhang
  0 siblings, 2 replies; 5+ messages in thread
From: Jerry Zhang @ 2022-01-05 23:27 UTC (permalink / raw)
  To: git, gitster; +Cc: Jerry Zhang

Add the --path-first-parent-not flag, which
causes the traversal of any "not" commits
to visit only the first parent upon encountering
a merge commit.

   -A-----E-F-G--main
     \   / /
      B-C-D--topic

In this example, the goal is to return the
set {B, C, D} which represents a topic
branch that has been merged into main branch.
`git rev-list topic ^main` will end up returning
no commits since excluding main will end up
traversing the commits on topic as well.
`git rev-list --first-parent-not topic ^main`
however will return {B, C, D} as desired.

Add docs for the new flag, and clarify the
doc for --first-parent to indicate that it
applies to traversing the set of included
commits only. The semantics of existing flags
however have not changed.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
---
 Documentation/rev-list-options.txt | 21 ++++++++++++++-------
 blame.c                            |  2 +-
 revision.c                         | 30 ++++++++++++++++++++----------
 revision.h                         |  3 ++-
 shallow.c                          |  2 +-
 t/t6012-rev-list-simplify.sh       | 18 ++++++++++++------
 6 files changed, 50 insertions(+), 26 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 43a86fa562..59684d6a4f 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -120,23 +120,30 @@ providing this option will cause it to die.
 `--no-min-parents` and `--no-max-parents` reset these limits (to no limit)
 again.  Equivalent forms are `--min-parents=0` (any commit has 0 or more
 parents) and `--max-parents=-1` (negative numbers denote no upper limit).
 
 --first-parent::
-	Follow only the first parent commit upon seeing a merge
-	commit.  This option can give a better overview when
-	viewing the evolution of a particular topic branch,
-	because merges into a topic branch tend to be only about
-	adjusting to updated upstream from time to time, and
-	this option allows you to ignore the individual commits
-	brought in to your history by such a merge.
+	When finding commits to include, follow only the first
+	parent commit upon seeing a merge commit.  This option
+	can give a better overview when viewing the evolution of
+	a particular topic branch, because merges into a topic
+	branch tend to be only about adjusting to updated upstream
+	from time to time, and this option allows you to ignore
+	the individual commits brought in to your history by such
+	a merge.
 ifdef::git-log[]
 +
 This option also changes default diff format for merge commits
 to `first-parent`, see `--diff-merges=first-parent` for details.
 endif::git-log[]
 
+--first-parent-not::
+	When finding commits to exclude, follow only the first
+	parent commit upon seeing a merge commit.  This causes
+	"not" commits to exclude only commits on that branch itself
+	and not those brought in by a merge.
+
 --not::
 	Reverses the meaning of the '{caret}' prefix (or lack thereof)
 	for all following revision specifiers, up to the next `--not`.
 
 --all::
diff --git a/blame.c b/blame.c
index 206c295660..083d99fdbc 100644
--- a/blame.c
+++ b/blame.c
@@ -2613,11 +2613,11 @@ void assign_blame(struct blame_scoreboard *sb, int opt)
 		     !(revs->max_age != -1 && commit->date < revs->max_age)))
 			pass_blame(sb, suspect, opt);
 		else {
 			commit->object.flags |= UNINTERESTING;
 			if (commit->object.parsed)
-				mark_parents_uninteresting(commit);
+				mark_parents_uninteresting(sb->revs, commit);
 		}
 		/* treat root commit as boundary */
 		if (!commit->parents && !sb->show_root)
 			commit->object.flags |= UNINTERESTING;
 
diff --git a/revision.c b/revision.c
index 250f61e8cf..743e8d9e3c 100644
--- a/revision.c
+++ b/revision.c
@@ -271,11 +271,11 @@ static void commit_stack_clear(struct commit_stack *stack)
 {
 	FREE_AND_NULL(stack->items);
 	stack->nr = stack->alloc = 0;
 }
 
-static void mark_one_parent_uninteresting(struct commit *commit,
+static void mark_one_parent_uninteresting(struct rev_info *revs, struct commit *commit,
 					  struct commit_stack *pending)
 {
 	struct commit_list *l;
 
 	if (commit->object.flags & UNINTERESTING)
@@ -288,24 +288,30 @@ static void mark_one_parent_uninteresting(struct commit *commit,
 	 * here. However, it may turn out that we've
 	 * reached this commit some other way (where it
 	 * wasn't uninteresting), in which case we need
 	 * to mark its parents recursively too..
 	 */
-	for (l = commit->parents; l; l = l->next)
+	for (l = commit->parents; l; l = l->next) {
 		commit_stack_push(pending, l->item);
+		if (revs && revs->first_parent_not)
+			break;
+	}
 }
 
-void mark_parents_uninteresting(struct commit *commit)
+void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit)
 {
 	struct commit_stack pending = COMMIT_STACK_INIT;
 	struct commit_list *l;
 
-	for (l = commit->parents; l; l = l->next)
-		mark_one_parent_uninteresting(l->item, &pending);
+	for (l = commit->parents; l; l = l->next) {
+		mark_one_parent_uninteresting(revs, l->item, &pending);
+		if (revs && revs->first_parent_not)
+			break;
+	}
 
 	while (pending.nr > 0)
-		mark_one_parent_uninteresting(commit_stack_pop(&pending),
+		mark_one_parent_uninteresting(revs, commit_stack_pop(&pending),
 					      &pending);
 
 	commit_stack_clear(&pending);
 }
 
@@ -439,11 +445,11 @@ static struct commit *handle_commit(struct rev_info *revs,
 		struct commit *commit = (struct commit *)object;
 
 		if (repo_parse_commit(revs->repo, commit) < 0)
 			die("unable to parse commit %s", name);
 		if (flags & UNINTERESTING) {
-			mark_parents_uninteresting(commit);
+			mark_parents_uninteresting(revs, commit);
 
 			if (!revs->topo_order || !generation_numbers_enabled(the_repository))
 				revs->limited = 1;
 		}
 		if (revs->sources) {
@@ -1122,18 +1128,20 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 			if (p)
 				p->object.flags |= UNINTERESTING;
 			if (repo_parse_commit_gently(revs->repo, p, 1) < 0)
 				continue;
 			if (p->parents)
-				mark_parents_uninteresting(p);
+				mark_parents_uninteresting(revs, p);
 			if (p->object.flags & SEEN)
 				continue;
 			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
 				commit_list_insert_by_date(p, list);
 			if (queue)
 				prio_queue_put(queue, p);
+			if (revs->first_parent_not)
+				break;
 		}
 		return 0;
 	}
 
 	/*
@@ -1420,11 +1428,11 @@ static int limit_list(struct rev_info *revs)
 		if (revs->max_age != -1 && (commit->date < revs->max_age))
 			obj->flags |= UNINTERESTING;
 		if (process_parents(revs, commit, &original_list, NULL) < 0)
 			return -1;
 		if (obj->flags & UNINTERESTING) {
-			mark_parents_uninteresting(commit);
+			mark_parents_uninteresting(revs, commit);
 			slop = still_interesting(original_list, date, slop, &interesting_cache);
 			if (slop)
 				continue;
 			break;
 		}
@@ -2221,10 +2229,12 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if ((argcount = parse_long_opt("until", argv, &optarg))) {
 		revs->min_age = approxidate(optarg);
 		return argcount;
 	} else if (!strcmp(arg, "--first-parent")) {
 		revs->first_parent_only = 1;
+	} else if (!strcmp(arg, "--first-parent-not")) {
+		revs->first_parent_not = 1;
 	} else if (!strcmp(arg, "--ancestry-path")) {
 		revs->ancestry_path = 1;
 		revs->simplify_history = 0;
 		revs->limited = 1;
 	} else if (!strcmp(arg, "-g") || !strcmp(arg, "--walk-reflogs")) {
@@ -3343,11 +3353,11 @@ static void explore_walk_step(struct rev_info *revs)
 
 	if (process_parents(revs, c, NULL, NULL) < 0)
 		return;
 
 	if (c->object.flags & UNINTERESTING)
-		mark_parents_uninteresting(c);
+		mark_parents_uninteresting(revs, c);
 
 	for (p = c->parents; p; p = p->next)
 		test_flag_and_insert(&info->explore_queue, p->item, TOPO_WALK_EXPLORED);
 }
 
diff --git a/revision.h b/revision.h
index 3f66147bfd..667dd27740 100644
--- a/revision.h
+++ b/revision.h
@@ -156,10 +156,11 @@ struct rev_info {
 			cherry_pick:1,
 			cherry_mark:1,
 			bisect:1,
 			ancestry_path:1,
 			first_parent_only:1,
+			first_parent_not:1,
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
 
 			/*
 			 * Blobs are shown without regard for their existence.
@@ -396,11 +397,11 @@ struct commit *get_revision(struct rev_info *revs);
 const char *get_revision_mark(const struct rev_info *revs,
 			      const struct commit *commit);
 void put_revision_mark(const struct rev_info *revs,
 		       const struct commit *commit);
 
-void mark_parents_uninteresting(struct commit *commit);
+void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit);
 void mark_tree_uninteresting(struct repository *r, struct tree *tree);
 void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees);
 
 void show_object_with_name(FILE *, struct object *, const char *);
 
diff --git a/shallow.c b/shallow.c
index 9ed18eb884..71e5876f37 100644
--- a/shallow.c
+++ b/shallow.c
@@ -601,11 +601,11 @@ static int mark_uninteresting(const char *refname, const struct object_id *oid,
 	struct commit *commit = lookup_commit_reference_gently(the_repository,
 							       oid, 1);
 	if (!commit)
 		return 0;
 	commit->object.flags |= UNINTERESTING;
-	mark_parents_uninteresting(commit);
+	mark_parents_uninteresting(NULL, commit);
 	return 0;
 }
 
 static void post_assign_shallow(struct shallow_info *info,
 				struct ref_bitmap *ref_bitmap,
diff --git a/t/t6012-rev-list-simplify.sh b/t/t6012-rev-list-simplify.sh
index 4f7fa8b6c0..7da8542e58 100755
--- a/t/t6012-rev-list-simplify.sh
+++ b/t/t6012-rev-list-simplify.sh
@@ -14,17 +14,16 @@ note () {
 unnote () {
 	git name-rev --tags --stdin | sed -e "s|$OID_REGEX (tags/\([^)]*\)) |\1 |g"
 }
 
 #
-# Create a test repo with interesting commit graph:
+# Create a test repo with an interesting commit graph:
 #
-# A--B----------G--H--I--K--L
-#  \  \           /     /
-#   \  \         /     /
-#    C------E---F     J
-#        \_/
+# A-----B-----G--H--I--K--L
+#  \     \      /     /
+#   \     \    /     /
+#    C--D--E--F     J
 #
 # The commits are laid out from left-to-right starting with
 # the root commit A and terminating at the tip commit L.
 #
 # There are a few places where we adjust the commit date or
@@ -140,10 +139,17 @@ check_result 'I B A' --topo-order -- file
 check_result 'I B A' --date-order -- file
 check_result 'I B A' --author-date-order -- file
 check_result 'H' --first-parent -- another-file
 check_result 'H' --first-parent --topo-order -- another-file
 
+check_result 'L K I H G B A' --first-parent L
+check_result 'F E D C' --first-parent-not F ^L
+check_result '' F ^L
+check_result 'L K I H G J' L ^F
+check_result 'L K I H G B J' --first-parent-not L ^F
+check_result 'L K I H G B' --first-parent-not --first-parent L ^F
+
 check_result 'E C B A' --full-history E -- lost
 test_expect_success 'full history simplification without parent' '
 	printf "%s\n" E C B A >expect &&
 	git log --pretty="$FMT" --full-history E -- lost |
 	unnote >actual &&
-- 
2.32.0.1314.g6ed4fcc4cc


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V3] git-rev-list: add --first-parent-not flag
  2022-01-05 23:27 [PATCH V3] git-rev-list: add --first-parent-not flag Jerry Zhang
@ 2022-01-06 22:10 ` Junio C Hamano
  2022-01-07  3:51   ` Jerry Zhang
  2022-01-11 21:39 ` [PATCH V4] git-rev-list: add --exclude-first-parent-only flag Jerry Zhang
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2022-01-06 22:10 UTC (permalink / raw)
  To: Jerry Zhang; +Cc: git

Jerry Zhang <jerry@skydio.com> writes:

> Add the --path-first-parent-not flag, which
> causes the traversal of any "not" commits
> to visit only the first parent upon encountering
> a merge commit.
>
>    -A-----E-F-G--main
>      \   / /
>       B-C-D--topic
>
> In this example, the goal is to return the
> set {B, C, D} which represents a topic
> branch that has been merged into main branch.
> `git rev-list topic ^main` will end up returning
> no commits since excluding main will end up
> traversing the commits on topic as well.
> `git rev-list --first-parent-not topic ^main`
> however will return {B, C, D} as desired.

It may be true for _this_ particular topology, but it is unclear
what "the goal" is, and if the "ignore side branches that got
merged" is the right way to achieve that goal.  Perhaps shuffling
the order of explanation to state what you want to achieve first may
help clarify.

Is the goal "I have a commit that I know is at the tip of a topic
branch, but the topic may or may not have been merged to the primary
integration branch.  I want to know what commits were on the topic
branch"?

Even if we disregard a fast-forward merges from side branches, which
will screw up any algorithm that takes advantage of the assumption
that the first-parent chain is special, I am not quite convinced how
"propagate UNINTERESTING only along the first parent chain" is
necessary and sufficient for the purpose of solving that problem.
Care to elaborate on the correctness of the logic a bit more?

Please do not talk back with "give me a topology that the algorithm
would not work on, then".  The onus is on to whoever proposes a
change to show how it produces correct result.

> Add docs for the new flag, and clarify the
> doc for --first-parent to indicate that it
> applies to traversing the set of included
> commits only. The semantics of existing flags
> however have not changed.

This is a tangent.  Even though 45 is certainly less than 80, can we
use a bit wider lines?  What you used in the documentation patch
(around 60-65?) may be more readable.

>  --first-parent::
> -	Follow only the first parent commit upon seeing a merge
> -	commit.  This option can give a better overview when
> -	viewing the evolution of a particular topic branch,
> -	because merges into a topic branch tend to be only about
> -	adjusting to updated upstream from time to time, and
> -	this option allows you to ignore the individual commits
> -	brought in to your history by such a merge.
> +	When finding commits to include, follow only the first
> +	parent commit upon seeing a merge commit.  This option
> +	can give a better overview when viewing the evolution of
> +	a particular topic branch, because merges into a topic
> +	branch tend to be only about adjusting to updated upstream
> +	from time to time, and this option allows you to ignore
> +	the individual commits brought in to your history by such
> +	a merge.

The only change is to clarify that the first-parent traversal is
done only on the positive side; what is implied but probably is lost
to most readers is that propagation of UNINTERESTING bit is not
affected by this option.  I made sure that "This option can ..."
and everything after it are identical to save other reviewers' time,
as the above hunk have unnecessary rewrapping of the text.

> +--first-parent-not::
> +	When finding commits to exclude, follow only the first
> +	parent commit upon seeing a merge commit.  This causes
> +	"not" commits to exclude only commits on that branch itself
> +	and not those brought in by a merge.

Are there places we use a term '"not" commit'?  What you are trying
to refer to is a subset of "UNINTERESTING commits"; it is the
initial set of UNINTERESTING commits the traversal starts with.  I
know we use the word "negative" (or sometimes "bottom") in the
context of discussing revision ranges on this list, but I do not
think we used either in end-user facing documentation pages.

Let's read the beginning of the description of "git log --help" to
see if we can find a good phrase our readers should already be
familiar with.  This is how we describe the command:

    List commits that are reachable by following the `parent` links
    from the given commit(s), but exclude commits that are reachable
    from the one(s) given with a '{caret}' in front of them.  The
    output is given in reverse chronological order by default.

Assuming that propagating the UNINTERESTING bit only along the first
parent chain is a way to achieve some meaningful result (which, as I
said, I am not convinced about), I probably would call this option
"--exclude-first-parent-only" and explain it perhaps like so

	Follow only the first-parent chain from commits given with a
	{caret} in front of them, to find commits to exclude.

        This prevents commits merged from the side branches from
	becoming uninteresting and instead be shown if they are
	reachable from the positive end of the range.

I am debating myself if the second paragraph is necessary, though.
I suspect that the first two-line paragraph may be sufficient.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V3] git-rev-list: add --first-parent-not flag
  2022-01-06 22:10 ` Junio C Hamano
@ 2022-01-07  3:51   ` Jerry Zhang
  2022-01-07 21:02     ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Jerry Zhang @ 2022-01-07  3:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Jan 6, 2022 at 2:10 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jerry Zhang <jerry@skydio.com> writes:
>
> > Add the --path-first-parent-not flag, which
> > causes the traversal of any "not" commits
> > to visit only the first parent upon encountering
> > a merge commit.
> >
> >    -A-----E-F-G--main
> >      \   / /
> >       B-C-D--topic
> >
> > In this example, the goal is to return the
> > set {B, C, D} which represents a topic
> > branch that has been merged into main branch.
> > `git rev-list topic ^main` will end up returning
> > no commits since excluding main will end up
> > traversing the commits on topic as well.
> > `git rev-list --first-parent-not topic ^main`
> > however will return {B, C, D} as desired.
>
> It may be true for _this_ particular topology, but it is unclear
> what "the goal" is, and if the "ignore side branches that got
> merged" is the right way to achieve that goal.  Perhaps shuffling
> the order of explanation to state what you want to achieve first may
> help clarify.
>
> Is the goal "I have a commit that I know is at the tip of a topic
> branch, but the topic may or may not have been merged to the primary
> integration branch.  I want to know what commits were on the topic
> branch"?

Although this is a useful application of the flag, it's not something we use
it for right now (although we might in the future).

As an overview, we're building a code review tool where the
(simplified) workflow is:

1. start off on some local version of an integration branch, tracking the
remote version of that branch
2. make some local changes. valid local changes include
    - making a new commit(s)
    - merging in other branches (could be remote integration branches
or feature branches)
    - merging in the head of the tracking branch to get the latest changes
3. identify which local changes were made and upload them (in our case
to github).

So our goals are:

1. Automatically identify the appropriate integration branch that the user
is intending to target, without needing to add arguments to the command.
2. Automatically identify relevant commits that the user has made on top of
the integration branch, without needing to add arguments.

We accomplish these by defining the "fork point" of two branches, although
note that this is different from what is found by "merge-base --fork-point".
Fork point of branch "dev" and "main" is the first parent of the last element of
"git rev-list --first-parent --exclude-first-parent-only main ^dev"
or equivalently, the first parent of the last element of
"git rev-list --first-parent --exclude-first-parent-only ^main dev".

The proof that those two are equivalent: when both flags are specified, rev-list
can only ever traverse first parents. This makes it as though all non
first parent
edges don't exist, so the git graph becomes basically a directed tree. Since we
are only traversing from 2 leaves, only 2 paths are traversed, and since they
are assumed to share history, the traversal forms a "Y".
       B ---- main
      /
--- A
      \
      C ---- dev
So "git rev-list --first-parent --exclude-first-parent-only main ^dev
--reverse | head -n 1"
finds B and "git rev-list --first-parent --exclude-first-parent-only
^main dev --reverse | head -n 1"
finds C, but both their parents are A. Hopefully this was sufficiently rigorous.

To solve goal #1, we loop the set of possible integration branches and
find the number of
local changes that our algorithm would detect. The branch that our
branch has the smallest
number of local changes (as identified in goal #2) with respect to is
considered to be the target.

To solve goal #2 we find the fork-point of our branch and the
integration branch, then
the following set is our local commits: "git rev-list --first-parent
dev ^forkpoint".

Merges will all work well with our algorithm and the new flag because
they no longer "look like" merges in terms of finding the fork point.
Another way of saying this is that because the traversal that calculates
the fork point can't follow secondary parents, merges are indistinguishable
from normal commits. Our algorithm that finds the fork-point and the set
of local changes already assumes an arbitrary number of commits between
the fork point and heads of the two different branches. Thus its impossible
to affect the correctness of the fork-point or the local commit set by adding
a normal commit. Since merges are indistinguishable from normal commits
it must also be impossible to affect this correctness by adding a merge commit.
This should prove that the flag is sufficient for our goals.

Now here is an example to show that the flag is necessary. It's a bit
complicated
but it's all valid individual and organizational uses of git.
- main branch is an integration branch that people work on
- next is a forward looking integration branch with experimental
features. it needs
occasional merges from main, but can't be merged back into main until
experimental
features have stabilized.
- dev is a local development branch.

 -A--B---C--D---main
   \  \ /    \
    \  E------------F--dev
     \        \    /
      G--------H--I--J--next

1. commit E is made on dev, PR is opened against main.
2. overnight E merges into main at C, and main is merged into next at H.
3. the next day, user fetches and merges in next at F, intending to
merge it into main.

With our algorithm, we find fork-point of dev and main to be B, so
local commits against
main are [E, F]. Fork-point of dev and next is found as A, local
commits would be [B, E, F].
Since main has a smaller set of local commits, it is chosen as the
target branch.

Without the new flag, we don't have a good definition of fork-point since
"git rev-list --first-parent main ^dev --reverse | head -n 1"
does not equal "git rev-list --first-parent ^main dev --reverse | head -n 1".
Nevertheless we can try a few things to see if we can reach the same result.
By excluding the dev branch, we find the fork point of dev and main to be
B, with local set [E, F]. However, we find the fork point of dev and
next to be I,
with local set [E, F] as well, so it becomes ambiguous which branch is the
preferred target.
Switching to excluding the target branch, we find the fork point of dev and main
to be E and working set to be [F]. Fork point of dev and next to be E
and working
set to be [F]. So it is still ambiguous and there's not any way I'm aware of to
get our desired result.

>
> Even if we disregard a fast-forward merges from side branches, which
> will screw up any algorithm that takes advantage of the assumption
> that the first-parent chain is special, I am not quite convinced how
FF merges aren't an issue here because our tool only uploads what the
user wants. If the user wants a real merge instead of an FF, its on them
to make that commit. Another way to look at it is that our tool must be
robust to all sorts of merges in the git history. Fast forwards don't look
like merges, so it isn't an issue if they've happened.

> "propagate UNINTERESTING only along the first parent chain" is
> necessary and sufficient for the purpose of solving that problem.
> Care to elaborate on the correctness of the logic a bit more?
>
> Please do not talk back with "give me a topology that the algorithm
> would not work on, then".  The onus is on to whoever proposes a
> change to show how it produces correct result.
>
> > Add docs for the new flag, and clarify the
> > doc for --first-parent to indicate that it
> > applies to traversing the set of included
> > commits only. The semantics of existing flags
> > however have not changed.
>
> This is a tangent.  Even though 45 is certainly less than 80, can we
> use a bit wider lines?  What you used in the documentation patch
> (around 60-65?) may be more readable.
>
> >  --first-parent::
> > -     Follow only the first parent commit upon seeing a merge
> > -     commit.  This option can give a better overview when
> > -     viewing the evolution of a particular topic branch,
> > -     because merges into a topic branch tend to be only about
> > -     adjusting to updated upstream from time to time, and
> > -     this option allows you to ignore the individual commits
> > -     brought in to your history by such a merge.
> > +     When finding commits to include, follow only the first
> > +     parent commit upon seeing a merge commit.  This option
> > +     can give a better overview when viewing the evolution of
> > +     a particular topic branch, because merges into a topic
> > +     branch tend to be only about adjusting to updated upstream
> > +     from time to time, and this option allows you to ignore
> > +     the individual commits brought in to your history by such
> > +     a merge.
>
> The only change is to clarify that the first-parent traversal is
> done only on the positive side; what is implied but probably is lost
> to most readers is that propagation of UNINTERESTING bit is not
> affected by this option.  I made sure that "This option can ..."
> and everything after it are identical to save other reviewers' time,
> as the above hunk have unnecessary rewrapping of the text.
>
> > +--first-parent-not::
> > +     When finding commits to exclude, follow only the first
> > +     parent commit upon seeing a merge commit.  This causes
> > +     "not" commits to exclude only commits on that branch itself
> > +     and not those brought in by a merge.
>
> Are there places we use a term '"not" commit'?  What you are trying
> to refer to is a subset of "UNINTERESTING commits"; it is the
> initial set of UNINTERESTING commits the traversal starts with.  I
> know we use the word "negative" (or sometimes "bottom") in the
> context of discussing revision ranges on this list, but I do not
> think we used either in end-user facing documentation pages.
>
> Let's read the beginning of the description of "git log --help" to
> see if we can find a good phrase our readers should already be
> familiar with.  This is how we describe the command:
>
>     List commits that are reachable by following the `parent` links
>     from the given commit(s), but exclude commits that are reachable
>     from the one(s) given with a '{caret}' in front of them.  The
>     output is given in reverse chronological order by default.
>
> Assuming that propagating the UNINTERESTING bit only along the first
> parent chain is a way to achieve some meaningful result (which, as I
> said, I am not convinced about), I probably would call this option
> "--exclude-first-parent-only" and explain it perhaps like so
>
>         Follow only the first-parent chain from commits given with a
>         {caret} in front of them, to find commits to exclude.
>
>         This prevents commits merged from the side branches from
>         becoming uninteresting and instead be shown if they are
>         reachable from the positive end of the range.
>
> I am debating myself if the second paragraph is necessary, though.
> I suspect that the first two-line paragraph may be sufficient.
Sure, I'll update the patch and commit text with these changes.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V3] git-rev-list: add --first-parent-not flag
  2022-01-07  3:51   ` Jerry Zhang
@ 2022-01-07 21:02     ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2022-01-07 21:02 UTC (permalink / raw)
  To: Jerry Zhang; +Cc: git

Jerry Zhang <jerry@skydio.com> writes:

>> Assuming that propagating the UNINTERESTING bit only along the first
>> parent chain is a way to achieve some meaningful result (which, as I
>> said, I am not convinced about), I probably would call this option
>> "--exclude-first-parent-only" and explain it perhaps like so
>>
>>         Follow only the first-parent chain from commits given with a
>>         {caret} in front of them, to find commits to exclude.
>>
>>         This prevents commits merged from the side branches from
>>         becoming uninteresting and instead be shown if they are
>>         reachable from the positive end of the range.
>>
>> I am debating myself if the second paragraph is necessary, though.
>> I suspect that the first two-line paragraph may be sufficient.
> Sure, I'll update the patch and commit text with these changes.

OK.  Also the log message for the commit needs to be updated so that
those who read "git log" and find the commit for this change will
not have to ask the same question as I asked.

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH V4] git-rev-list: add --exclude-first-parent-only flag
  2022-01-05 23:27 [PATCH V3] git-rev-list: add --first-parent-not flag Jerry Zhang
  2022-01-06 22:10 ` Junio C Hamano
@ 2022-01-11 21:39 ` Jerry Zhang
  1 sibling, 0 replies; 5+ messages in thread
From: Jerry Zhang @ 2022-01-11 21:39 UTC (permalink / raw)
  To: git, gitster; +Cc: Jerry Zhang

It is useful to know when a branch first diverged in history
from some integration branch in order to be able to enumerate
the user's local changes. However, these local changes can
include arbitrary merges, so it is necessary to ignore this
merge structure when finding the divergence point.

In order to do this, teach the "rev-list" family to accept
"--exclude-first-parent-only", which restricts the traversal
of excluded commits to only follow first parent links.

   -A-----E-F-G--main
     \   / /
      B-C-D--topic

In this example, the goal is to return the set {B, C, D} which
represents a topic branch that has been merged into main branch.
`git rev-list topic ^main` will end up returning no commits
since excluding main will end up traversing the commits on topic
as well. `git rev-list --exclude-first-parent-only topic ^main`
however will return {B, C, D} as desired.

Add docs for the new flag, and clarify the doc for --first-parent
to indicate that it applies to traversing the set of included
commits only.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
---
V3->V4:
- Updated flag name
- Updated doc and commit text to describe the exact use-case

 Documentation/rev-list-options.txt | 22 +++++++++++++++-------
 blame.c                            |  2 +-
 revision.c                         | 30 ++++++++++++++++++++----------
 revision.h                         |  3 ++-
 shallow.c                          |  2 +-
 t/t6012-rev-list-simplify.sh       | 18 ++++++++++++------
 6 files changed, 51 insertions(+), 26 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 43a86fa562..fd4f4e26c9 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -120,23 +120,31 @@ providing this option will cause it to die.
 `--no-min-parents` and `--no-max-parents` reset these limits (to no limit)
 again.  Equivalent forms are `--min-parents=0` (any commit has 0 or more
 parents) and `--max-parents=-1` (negative numbers denote no upper limit).
 
 --first-parent::
-	Follow only the first parent commit upon seeing a merge
-	commit.  This option can give a better overview when
-	viewing the evolution of a particular topic branch,
-	because merges into a topic branch tend to be only about
-	adjusting to updated upstream from time to time, and
-	this option allows you to ignore the individual commits
-	brought in to your history by such a merge.
+	When finding commits to include, follow only the first
+	parent commit upon seeing a merge commit.  This option
+	can give a better overview when viewing the evolution of
+	a particular topic branch, because merges into a topic
+	branch tend to be only about adjusting to updated upstream
+	from time to time, and this option allows you to ignore
+	the individual commits brought in to your history by such
+	a merge.
 ifdef::git-log[]
 +
 This option also changes default diff format for merge commits
 to `first-parent`, see `--diff-merges=first-parent` for details.
 endif::git-log[]
 
+--exclude-first-parent-only::
+	When finding commits to exclude (with a '{caret}'), follow only
+	the first parent commit upon seeing a merge commit.
+	This can be used to find the set of changes in a topic branch
+	from the point where it diverged from the remote branch, given
+	that arbitrary merges can be valid topic branch changes.
+
 --not::
 	Reverses the meaning of the '{caret}' prefix (or lack thereof)
 	for all following revision specifiers, up to the next `--not`.
 
 --all::
diff --git a/blame.c b/blame.c
index 206c295660..083d99fdbc 100644
--- a/blame.c
+++ b/blame.c
@@ -2613,11 +2613,11 @@ void assign_blame(struct blame_scoreboard *sb, int opt)
 		     !(revs->max_age != -1 && commit->date < revs->max_age)))
 			pass_blame(sb, suspect, opt);
 		else {
 			commit->object.flags |= UNINTERESTING;
 			if (commit->object.parsed)
-				mark_parents_uninteresting(commit);
+				mark_parents_uninteresting(sb->revs, commit);
 		}
 		/* treat root commit as boundary */
 		if (!commit->parents && !sb->show_root)
 			commit->object.flags |= UNINTERESTING;
 
diff --git a/revision.c b/revision.c
index ad4286fbdd..d8d326d6b0 100644
--- a/revision.c
+++ b/revision.c
@@ -271,11 +271,11 @@ static void commit_stack_clear(struct commit_stack *stack)
 {
 	FREE_AND_NULL(stack->items);
 	stack->nr = stack->alloc = 0;
 }
 
-static void mark_one_parent_uninteresting(struct commit *commit,
+static void mark_one_parent_uninteresting(struct rev_info *revs, struct commit *commit,
 					  struct commit_stack *pending)
 {
 	struct commit_list *l;
 
 	if (commit->object.flags & UNINTERESTING)
@@ -288,24 +288,30 @@ static void mark_one_parent_uninteresting(struct commit *commit,
 	 * here. However, it may turn out that we've
 	 * reached this commit some other way (where it
 	 * wasn't uninteresting), in which case we need
 	 * to mark its parents recursively too..
 	 */
-	for (l = commit->parents; l; l = l->next)
+	for (l = commit->parents; l; l = l->next) {
 		commit_stack_push(pending, l->item);
+		if (revs && revs->exclude_first_parent_only)
+			break;
+	}
 }
 
-void mark_parents_uninteresting(struct commit *commit)
+void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit)
 {
 	struct commit_stack pending = COMMIT_STACK_INIT;
 	struct commit_list *l;
 
-	for (l = commit->parents; l; l = l->next)
-		mark_one_parent_uninteresting(l->item, &pending);
+	for (l = commit->parents; l; l = l->next) {
+		mark_one_parent_uninteresting(revs, l->item, &pending);
+		if (revs && revs->exclude_first_parent_only)
+			break;
+	}
 
 	while (pending.nr > 0)
-		mark_one_parent_uninteresting(commit_stack_pop(&pending),
+		mark_one_parent_uninteresting(revs, commit_stack_pop(&pending),
 					      &pending);
 
 	commit_stack_clear(&pending);
 }
 
@@ -439,11 +445,11 @@ static struct commit *handle_commit(struct rev_info *revs,
 		struct commit *commit = (struct commit *)object;
 
 		if (repo_parse_commit(revs->repo, commit) < 0)
 			die("unable to parse commit %s", name);
 		if (flags & UNINTERESTING) {
-			mark_parents_uninteresting(commit);
+			mark_parents_uninteresting(revs, commit);
 
 			if (!revs->topo_order || !generation_numbers_enabled(the_repository))
 				revs->limited = 1;
 		}
 		if (revs->sources) {
@@ -1122,18 +1128,20 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 			if (p)
 				p->object.flags |= UNINTERESTING;
 			if (repo_parse_commit_gently(revs->repo, p, 1) < 0)
 				continue;
 			if (p->parents)
-				mark_parents_uninteresting(p);
+				mark_parents_uninteresting(revs, p);
 			if (p->object.flags & SEEN)
 				continue;
 			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
 				commit_list_insert_by_date(p, list);
 			if (queue)
 				prio_queue_put(queue, p);
+			if (revs->exclude_first_parent_only)
+				break;
 		}
 		return 0;
 	}
 
 	/*
@@ -1420,11 +1428,11 @@ static int limit_list(struct rev_info *revs)
 		if (revs->max_age != -1 && (commit->date < revs->max_age))
 			obj->flags |= UNINTERESTING;
 		if (process_parents(revs, commit, &original_list, NULL) < 0)
 			return -1;
 		if (obj->flags & UNINTERESTING) {
-			mark_parents_uninteresting(commit);
+			mark_parents_uninteresting(revs, commit);
 			slop = still_interesting(original_list, date, slop, &interesting_cache);
 			if (slop)
 				continue;
 			break;
 		}
@@ -2221,10 +2229,12 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if ((argcount = parse_long_opt("until", argv, &optarg))) {
 		revs->min_age = approxidate(optarg);
 		return argcount;
 	} else if (!strcmp(arg, "--first-parent")) {
 		revs->first_parent_only = 1;
+	} else if (!strcmp(arg, "--exclude-first-parent-only")) {
+		revs->exclude_first_parent_only = 1;
 	} else if (!strcmp(arg, "--ancestry-path")) {
 		revs->ancestry_path = 1;
 		revs->simplify_history = 0;
 		revs->limited = 1;
 	} else if (!strcmp(arg, "-g") || !strcmp(arg, "--walk-reflogs")) {
@@ -3343,11 +3353,11 @@ static void explore_walk_step(struct rev_info *revs)
 
 	if (process_parents(revs, c, NULL, NULL) < 0)
 		return;
 
 	if (c->object.flags & UNINTERESTING)
-		mark_parents_uninteresting(c);
+		mark_parents_uninteresting(revs, c);
 
 	for (p = c->parents; p; p = p->next)
 		test_flag_and_insert(&info->explore_queue, p->item, TOPO_WALK_EXPLORED);
 }
 
diff --git a/revision.h b/revision.h
index 3f66147bfd..374a4ff468 100644
--- a/revision.h
+++ b/revision.h
@@ -156,10 +156,11 @@ struct rev_info {
 			cherry_pick:1,
 			cherry_mark:1,
 			bisect:1,
 			ancestry_path:1,
 			first_parent_only:1,
+			exclude_first_parent_only:1,
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
 
 			/*
 			 * Blobs are shown without regard for their existence.
@@ -396,11 +397,11 @@ struct commit *get_revision(struct rev_info *revs);
 const char *get_revision_mark(const struct rev_info *revs,
 			      const struct commit *commit);
 void put_revision_mark(const struct rev_info *revs,
 		       const struct commit *commit);
 
-void mark_parents_uninteresting(struct commit *commit);
+void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit);
 void mark_tree_uninteresting(struct repository *r, struct tree *tree);
 void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees);
 
 void show_object_with_name(FILE *, struct object *, const char *);
 
diff --git a/shallow.c b/shallow.c
index 9ed18eb884..71e5876f37 100644
--- a/shallow.c
+++ b/shallow.c
@@ -601,11 +601,11 @@ static int mark_uninteresting(const char *refname, const struct object_id *oid,
 	struct commit *commit = lookup_commit_reference_gently(the_repository,
 							       oid, 1);
 	if (!commit)
 		return 0;
 	commit->object.flags |= UNINTERESTING;
-	mark_parents_uninteresting(commit);
+	mark_parents_uninteresting(NULL, commit);
 	return 0;
 }
 
 static void post_assign_shallow(struct shallow_info *info,
 				struct ref_bitmap *ref_bitmap,
diff --git a/t/t6012-rev-list-simplify.sh b/t/t6012-rev-list-simplify.sh
index 4f7fa8b6c0..e2851fd75d 100755
--- a/t/t6012-rev-list-simplify.sh
+++ b/t/t6012-rev-list-simplify.sh
@@ -14,17 +14,16 @@ note () {
 unnote () {
 	git name-rev --tags --stdin | sed -e "s|$OID_REGEX (tags/\([^)]*\)) |\1 |g"
 }
 
 #
-# Create a test repo with interesting commit graph:
+# Create a test repo with an interesting commit graph:
 #
-# A--B----------G--H--I--K--L
-#  \  \           /     /
-#   \  \         /     /
-#    C------E---F     J
-#        \_/
+# A-----B-----G--H--I--K--L
+#  \     \      /     /
+#   \     \    /     /
+#    C--D--E--F     J
 #
 # The commits are laid out from left-to-right starting with
 # the root commit A and terminating at the tip commit L.
 #
 # There are a few places where we adjust the commit date or
@@ -140,10 +139,17 @@ check_result 'I B A' --topo-order -- file
 check_result 'I B A' --date-order -- file
 check_result 'I B A' --author-date-order -- file
 check_result 'H' --first-parent -- another-file
 check_result 'H' --first-parent --topo-order -- another-file
 
+check_result 'L K I H G B A' --first-parent L
+check_result 'F E D C' --exclude-first-parent-only F ^L
+check_result '' F ^L
+check_result 'L K I H G J' L ^F
+check_result 'L K I H G B J' --exclude-first-parent-only L ^F
+check_result 'L K I H G B' --exclude-first-parent-only --first-parent L ^F
+
 check_result 'E C B A' --full-history E -- lost
 test_expect_success 'full history simplification without parent' '
 	printf "%s\n" E C B A >expect &&
 	git log --pretty="$FMT" --full-history E -- lost |
 	unnote >actual &&
-- 
2.32.0.1314.g6ed4fcc4cc


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-01-11 21:39 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-05 23:27 [PATCH V3] git-rev-list: add --first-parent-not flag Jerry Zhang
2022-01-06 22:10 ` Junio C Hamano
2022-01-07  3:51   ` Jerry Zhang
2022-01-07 21:02     ` Junio C Hamano
2022-01-11 21:39 ` [PATCH V4] git-rev-list: add --exclude-first-parent-only flag Jerry Zhang

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).