git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Antonio Russo <antonio.e.russo@gmail.com>, git-ml <git@vger.kernel.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v2] Teach git-rev-list --simplify-forks
Date: Wed, 29 Apr 2020 09:12:55 -0400	[thread overview]
Message-ID: <08d5460b-d80b-28a3-19dd-d438afbdfba9@gmail.com> (raw)
In-Reply-To: <d3079ba8-33e1-3b68-23d2-ea97b9ca1e97@gmail.com>

On 4/29/2020 4:01 AM, Antonio Russo wrote:
> When used with --graph, instead of displaying the full graph, display a
> spanning subgraph produced by a depth-first search of the graph visiting
> parents from left to right.  Edges to already visited commits are
> discarded.  This process is repeated for every commit to be displayed.
> 
> This is valuable to reduce visual clutter when there are many merges
> that were not rebased onto each other and the user is primarily
> interested in the state of the branch being merged into.

My earlier comment to recommend how to fix the test failures intended
to demonstrate that this area of code requires a bit of precision. I
took some time to review this patch, but I find it needs some significant
updates.

tl;dr: The way you manipulate the commit parents seems incorrect to me,
but perhaps there is enough prior art in the way we simplify parents to
make that acceptable. Someone else will need to chime in on that part.

It may be possible to do this "drop edges" entirely inside of graph.c
(the graph rendering code) instead of revision.c, which would automatically
work with the performance gains from the newer topo-walk algorithm.

There are enough deviations from code and test style that this will
need a significant revision regardless.
 
> This second revision of the patch sets revs->limited.  This forces the
> graph of commits to be loaded, and simplfiy_forks() therefore reliably
> traverses it.  This addresses the test failures mentioned before (see [1]).

This will have a significant performance impact on the option, as you will
not see even the first result until the DFS has completed.

> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index 04ad7dd36e..cbac09028c 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -363,6 +363,14 @@ Default mode::
>  	merges from the resulting history, as there are no selected
>  	commits contributing to this merge.
> 
> +--simplify-forks::
> +	Convert the commit graph into a spanning subgraph produced by a
> +	depth-first-search of the history graph, searching the leftmost
> +	parent first, and discarding edges to commits already visited.
> +	Useful with `--graph` to visualize repositories with many merges
> +	when you are interested in was added to master, and not when the
> +	branch was last rebased.

Describing the option via the algorithm may not be the best way to
inform the user of what they will see. It also will not be informative
if the implementation changes to not perform the algorithm described
here, because that algorithm is not incremental (it is O(N) where N is
the total number of reachable commits, not ~O(n) where n is the number
of commits that will actually show up).

An easy test is to time your command with "-n 1".

I'm also not crazy about "when the branch was last rebased". You
probably mean "..when you are not interested in which merge base
was used for each merge."

Also, this does nothing without "--graph" right? Perhaps it should
enable it?

Here is a suggested alternative:

--simplify-forks::
	Show commits that are introduced by each merge before showing
	the first parent of the merge, as in `--graph`, but remove
	edges from those commits to commits reachable from the first
	parent. This helps to visualize repositories with many merges
	when you are not interested in which merge base was used for
	each merge. It also reduces the width of the graph visualization
	compared to `--graph`.

With this description, perhaps it is worth renaming the option? Perhaps
"--ignore-merge-bases"? The word "simplify" implies something like
dropping commits from history, but you are instead dropping _edges_ from
the graph.

>  --ancestry-path::
>  	When given a range of commits to display (e.g. 'commit1..commit2'
>  	or 'commit2 {caret}commit1'), only display commits that exist
> diff --git a/revision.c b/revision.c
> index 5bc96444b6..51dbe21847 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -2082,6 +2082,9 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
>  		revs->simplify_by_decoration = 1;
>  		revs->limited = 1;
>  		revs->prune = 1;
> +	} else if (!strcmp(arg, "--simplify-forks")) {
> +		revs->limited = 1;
> +		revs->simplify_forks = 1;

I recommend dropping the revs->limited setting here and instead fixing the
performance instead. But maybe that should be a second patch on top of this
one.

>  	} else if (!strcmp(arg, "--date-order")) {
>  		revs->sort_order = REV_SORT_BY_COMMIT_DATE;
>  		revs->topo_order = 1;
> @@ -3095,6 +3098,63 @@ static void simplify_merges(struct rev_info *revs)
>  	}
>  }
> 
> +static void simplify_forks(struct rev_info *revs)
> +{
> +	struct commit_list *stack, *list_lr, *iter_list;
> +	struct commit_list **parents;

You want "struct commit_list *parents" here for simpler use.

> +	struct commit *commit, *parent;
> +
> +	stack = NULL;
> +	list_lr = NULL;
> +
> +	clear_object_flags(SIMP_FORK_VISITED);
> +
> +	for(iter_list = revs->commits; iter_list; iter_list = iter_list->next) {

This method could use a split into at least two. I count three levels of nested
loops here, so please break them up into smaller methods.

> +		/* process every commit to be displayed exactly once */
> +		if(iter_list->item->object.flags & SIMP_FORK_VISITED)
> +			continue;
> +		clear_object_flags(SIMP_FORK_VISITING);

This drops the flag from all commits. This will cause quadratic performance
if combined with "--all". Drop this and instead clear the flag yourself as you
visit commits.

> +		commit_list_insert(iter_list->item, &stack);
> +		iter_list->item->object.flags |= SIMP_FORK_VISITED | SIMP_FORK_VISITING;
> +		while(stack) {
> +			commit = pop_commit(&stack);
> +			/* process the parent nodes: removing links to
> +			 * commits already visited (creating a spanning tree)
> +			 */
> +			parents = &(commit->parents);
> +			while(*parents) {

You have some whitespace issues. Put a space between "while" and "(". Same with "if"s below.

> +				parent = (*parents)->item;
> +				if(parent->object.flags & SIMP_FORK_VISITING) {
> +					/* We have already visited this commit, from the same root.
> +					 * We do not explore it at all.
> +					 */
> +					pop_commit(parents);

I don't think you want pop_commit() here. You want to have "parents = parents->next" at the end.

Now, if this is how you are "simplifying" the edges in the final output, then I think this is
destructive and unsafe! You are literally modifying "commit->parents" so if another operation in
Git tries to read those parents it will see the wrong data.

Think about a different way to achieve your goal here, perhaps in the rendering portion of
graph.c instead of here.

> +				} else if(parent->object.flags & SIMP_FORK_VISITED) {
> +					/* We visited this commit before, but from a different root.
> +					 * Leave it attached, but do not explore it further.
> +					 */
> +					parents = &((*parents)->next);
> +				} else {
> +					/* We have not visited this commit yet. Explore it, as usual.
> +					 */
> +					parent->object.flags |= SIMP_FORK_VISITED | SIMP_FORK_VISITING;
> +					commit_list_insert(parent, &list_lr);
> +					parents = &((*parents)->next);
> +				}

It is very unclear that your loop terminates. Instead, use
"parents = parents->next" at the end of the loop block to
make is extremely clear that the loop behaves as expected.

Of course, this assumes that you are not being destructive
to the commit parents as you explore.

> +			}
> +
> +			/* feed the parents, right to left (reversed) onto the
> +			 * stack to do a depth-first traversal of the commit graph
> +			 */

nit: multi-line comment style [1]

[1] https://github.com/git/git/blob/86ab15cb154862b6fa5cc646dac27532f881e1fb/Documentation/CodingGuidelines#L285-L291

> +			while(list_lr) {
> +				commit_list_insert(pop_commit(&list_lr), &stack);
> +			}

nit: No curly braces around single-line blocks. [2]

[2] https://github.com/git/git/blob/86ab15cb154862b6fa5cc646dac27532f881e1fb/Documentation/CodingGuidelines#L239-L243

> +		}
> +	}
> +
> +	clear_object_flags(SIMP_FORK_VISITED | SIMP_FORK_VISITING);
> +}
> +
>  static void set_children(struct rev_info *revs)
>  {
>  	struct commit_list *l;
> @@ -3392,6 +3452,8 @@ int prepare_revision_walk(struct rev_info *revs)
>  	if (revs->limited) {
>  		if (limit_list(revs) < 0)
>  			return -1;
> +		if (revs->simplify_forks)
> +			simplify_forks(revs);
>  		if (revs->topo_order)
>  			sort_in_topological_order(&revs->commits, revs->sort_order);
>  	} else if (revs->topo_order)
> diff --git a/revision.h b/revision.h
> index c1af164b30..f1abdb26b0 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -51,6 +51,11 @@
>  #define TOPO_WALK_EXPLORED	(1u<<27)
>  #define TOPO_WALK_INDEGREE	(1u<<28)
> 
> +/* Re-use the TOPO_WALK flagspace for simplify_forks
> + */
> +#define SIMP_FORK_VISITED	(1u<<27)
> +#define SIMP_FORK_VISITING	(1u<<28)

I don't like that you are re-using these, as it is dangerous for later
collisions. At the moment, these flags are not used in the same code paths
because you specify "limited = 1" and that code path does not use TOPO_WALK_*
macros.

>  #define DECORATE_SHORT_REFS	1
>  #define DECORATE_FULL_REFS	2
> 
> @@ -132,6 +137,7 @@ struct rev_info {
>  			no_walk:2,
>  			remove_empty_trees:1,
>  			simplify_history:1,
> +			simplify_forks:1,
>  			show_pulls:1,
>  			topo_order:1,
>  			simplify_merges:1,
> diff --git a/t/t6016-rev-list-graph-simplify-history.sh b/t/t6016-rev-list-graph-simplify-history.sh
> index f5e6e92f5b..d99214b6df 100755
> --- a/t/t6016-rev-list-graph-simplify-history.sh
> +++ b/t/t6016-rev-list-graph-simplify-history.sh
> @@ -85,6 +85,28 @@ test_expect_success '--graph --all' '
>  	test_cmp expected actual
>  	'
> 
> +# Make sure that simplify_histpry_forks produces a spanning tree
> +test_expect_success '--graph --simplify-forks --all' '
> +	rm -f expected &&
> +	echo "* $A7" >> expected &&
> +	echo "*   $A6" >> expected &&
> +	echo "|\  " >> expected &&
> +	echo "| * $C4" >> expected &&
> +	echo "| * $C3" >> expected &&
> +	echo "* $A5" >> expected &&
> +	echo "*-.   $A4" >> expected &&
> +	echo "|\ \  " >> expected &&
> +	echo "| | * $C2" >> expected &&
> +	echo "| | * $C1" >> expected &&
> +	echo "| * $B2" >> expected &&
> +	echo "| * $B1" >> expected &&
> +	echo "* $A3" >> expected &&
> +	echo "* $A2" >> expected &&
> +	echo "* $A1" >> expected &&

Do not use too many echos like this. Instead use t4215-log-skewed-merges.sh
as an example [3].

[3] https://github.com/git/git/blob/86ab15cb154862b6fa5cc646dac27532f881e1fb/t/t4215-log-skewed-merges.sh#L24-L37

I also hope that you can find more complicated cases to
test, including:

 1. A merge that brings in a merge (think a feature branch)
 2. A merge that brings in a twisted merge (think a user using "git pull").

Here are some example --graph and --simplify-forks outputs to try. I
have not tested these myself, but I believe they are interesting test
cases that can trip up other algorithms.

Merge bringing a merge (non-twisted):

 --graph	--simplify-forks
 *		*
 |\		|\
 | *		| *
 | |\		| |\
 | | *		| | *
 | * |		| *
 * | |		*
 |/ /		*
 * /		*
 |/
 *

Twisted merge:

 --graph	--simplify-forks
 *		*
 |\		|\
 | *		| *
 | |\		| *
 | * |		*
 | | |		*
 * |/		*
 |/|
 * |
 |/
 *

Thanks,
-Stolee


  parent reply	other threads:[~2020-04-29 13:13 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-27  5:07 [PATCH] Teach git-rev-list --simplify-forks Antonio Russo
2020-04-27 10:55 ` Derrick Stolee
2020-04-28 12:49   ` Antonio Russo
2020-04-28 14:11     ` Derrick Stolee
2020-04-29  8:01 ` [PATCH v2] " Antonio Russo
2020-04-29 10:08   ` Philip Oakley
2020-04-29 13:04     ` Antonio Russo
2020-04-29 13:12   ` Derrick Stolee [this message]
2020-05-01 14:13     ` Antonio Russo
2020-05-01 15:44       ` Derrick Stolee
2020-05-01 14:21   ` [PATCH 1/3 v3] Clean up t6016-rev-list-graph-simplify-history Antonio Russo
2020-05-01 17:10     ` Junio C Hamano
2020-05-02 15:27       ` Antonio Russo
2020-05-01 14:22   ` [PATCH 2/3 v3] Teach git-rev-list --ignore-merge-bases Antonio Russo
2020-05-02 13:25     ` Johannes Sixt
2020-05-02 14:21       ` Antonio Russo
2020-05-01 14:23   ` [PATCH 3/3 v3] Add new tests of ignore-merge-bases Antonio Russo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=08d5460b-d80b-28a3-19dd-d438afbdfba9@gmail.com \
    --to=stolee@gmail.com \
    --cc=antonio.e.russo@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).