git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Antonio Russo <antonio.e.russo@gmail.com>
To: git-ml <git@vger.kernel.org>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Derrick Stolee" <dstolee@microsoft.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"René Scharfe" <l.s.r@web.de>
Subject: [PATCH 0/3] Ignore merge bases graph simplification
Date: Sun, 7 Jun 2020 10:22:34 -0600	[thread overview]
Message-ID: <456a1271-4f17-5503-5d13-d0c97115e2fd@gmail.com> (raw)

This patch series is a second attempt at providing a way to simplify the
graphical representation of forked branches that are merged back in.
Currently, when many branches are forked off the same commit, a
"mountain" of the tails of the merge that connect to the merge base
accumulate, making visual inspection of the --graph output challenging:

% git log --graph --oneline 20514004dd -n20

> * 20514004dd Start the post 2.27 cycle
> *   54041832d7 Merge branch 'en/fast-import-looser-date'
> |\
> | * d42a2fb72f fast-import: add new --date-format=raw-permissive format
> * |   a0ba2bbbdd Merge branch 'mt/zsh-completion-optim'
> |\ \
> | * | a44a0a9fc4 completion: use native ZSH array pattern matching
> * | |   e34df9a6e5 Merge branch 'la/diff-relative-config'
> |\ \ \
> | * | | c28ded83fc diff: add config option relative
> * | | |   de82fb45db Merge branch 'rs/checkout-b-track-error'
> |\ \ \ \
> | * | | | bb2198fb91 checkout: improve error messages for -b with extra argument
> | * | | | 16ab794b82 checkout: add tests for -b and --track
> * | | | |   202a2b8e71 Merge branch 'lo/sparse-universal-zero-init'
> |\ \ \ \ \
> | * | | | | 1c96642326 sparse: allow '{ 0 }' to be used without warnings
> | | |_|_|/
> | |/| | |
> * | | | |   1ab0dfde2c Merge branch 'cb/t5608-cleanup'
> |\ \ \ \ \
> | * | | | | d63ae31962 t5608: avoid say() and use "skip_all" instead for consistency
> | |/ / / /
> * | | | |   70a1e331b0 Merge branch 'jx/pkt-line-doc-count-fix'
> |\ \ \ \ \
> | * | | | | 2c31a7aa44 doc: fix wrong 4-byte length of pkt-line message
> | |/ / / /
> * | | | |   51b4708811 Merge branch 'jn/experimental-opts-into-proto-v2'
> |\ \ \ \ \
> | * | | | | 3697caf4b9 config: let feature.experimental imply protocol.version=2
> * | | | | |   7a8fec908a Merge branch 'bk/p4-prepare-p4-only-fix'
> |\ \ \ \ \ \
> | * | | | | | 2dfdd705ff git-p4.py: fix --prepare-p4-only error with multiple commits

This patch adds a new "--ignore-merge-bases' parameter that, roughly,
hides all of the information about the merge base:

% git log --graph --oneline 20514004dd --ignore-merge-bases -n20

> * 20514004dd Start the post 2.27 cycle
> *   54041832d7 Merge branch 'en/fast-import-looser-date'
> |\
> | * d42a2fb72f fast-import: add new --date-format=raw-permissive format
> *   a0ba2bbbdd Merge branch 'mt/zsh-completion-optim'
> |\
> | * a44a0a9fc4 completion: use native ZSH array pattern matching
> *   e34df9a6e5 Merge branch 'la/diff-relative-config'
> |\
> | * c28ded83fc diff: add config option relative
> *   de82fb45db Merge branch 'rs/checkout-b-track-error'
> |\
> | * bb2198fb91 checkout: improve error messages for -b with extra argument
> | * 16ab794b82 checkout: add tests for -b and --track
> *   202a2b8e71 Merge branch 'lo/sparse-universal-zero-init'
> |\
> | * 1c96642326 sparse: allow '{ 0 }' to be used without warnings
> *   1ab0dfde2c Merge branch 'cb/t5608-cleanup'
> |\
> | * d63ae31962 t5608: avoid say() and use "skip_all" instead for consistency
> *   70a1e331b0 Merge branch 'jx/pkt-line-doc-count-fix'
> |\
> | * 2c31a7aa44 doc: fix wrong 4-byte length of pkt-line message
> *   51b4708811 Merge branch 'jn/experimental-opts-into-proto-v2'
> |\
> | * 3697caf4b9 config: let feature.experimental imply protocol.version=2
> *   7a8fec908a Merge branch 'bk/p4-prepare-p4-only-fix'
> |\
> | * 2dfdd705ff git-p4.py: fix --prepare-p4-only error with multiple commits

The significant changes since the last round of discussion are:

1. A streaming version of the algorithm has been implemented.  No longer
is revs->limited, with all of its very serious drawbacks, required.

2. The commit->parents commit_list is no longer modified (amputated).
Instead, a "principle child" is identified during a (heavily modified)
indegree walk.

3. Per D. Stolee's suggestion, the output of child->parent edges are
suppressed during graph emission (in particular, in
graph_is_interesting).

4. No commit flags are used. All per-commit ancillary data is stored
in a commit slab.

I discarded J. Sixt's suggestion to change the format of the --boundary
output.  While I like the proposed format, I think the change is
orthogonal to this (already large) change.

There are several outstanding issues that I would appreciate help with:

1. I have developed some terminology to think about the concepts used
here.  They are definitely not conventional, and probably should be
changed.  I would appreciate feedback:

 - A "skeleton walk" is the streaming version of a left-first
   DFS-generated spanning tree.  This is achieved using a carefully
   generation-limited traversal of the commit graph.  *Edges* are
   stored in a skel_walk_list structure.  This traversal should be
   linear in the number of edges (but I'll admit I haven't actually
   done this proof).

 - The "guide commits" are all of the commits in revs->commits.

 - A "component" (and this is particularly bad terminology) is the
   collection of commits reachable from a guide commit, but not from
   an earlier ("higher priority") guide commit.

   This approach avoids any ambiguity in the output of
   --ignore-merge-bases as to what commits are included at the merge
   point, by always drawing any edge between components.  Notice that
   the definition of a component means that no edges go from higher-
   priority to lower-priority components (instead, those commits are
   absorbed into the higher-priority component).

 - The "principle child" of a commit c is the first child in the
   leftmost-first DFS that references c.  It is always in the same
   component as c, and it is the only commit in that component whose
   edge is drawn connecting to c.

Presumably, all of these terms should be documented (with visuals)
somewhere.  If there is serious interest in this patch, I'm happy to do
this, but I'm loathe to waste time doing this until I start hearing some
sort of positive feedback on what has become a significant time sink.

2. The meat of the changes are monolithically collected into patch 2/3.
It does not make sense in my mind to implement functions that are not
used anywhere, and then make first use them in another commit.  But, I
can split this up into the skeleton walk implementation and the graph
display, if wanted.

3. A skel_walk_info structure is allocated at parameter parse time, and
currently is never freed.  While this is certainly a memory leak, it is
not clear to me if there is a benefit to performing the cleanup, besides
getting clean valgrind output (and masking more serious memory leaks). I
I didn't try to read everything in Documentation, but, besides an
abundance of commits fixing memory leaks, I could not find a specific
policy regarding freeing memory before program termination.

4. The implementation for revs->limited and !revs->limited is
frustratingly similar, but I cannot see any particular elegant way to
deduplicate the code.  The existing code also seems to suffer similarly.

Is there movement forward on eliminating the whole rev->limited
code-path?  This is obviously out of scope for this patch set, but
maintaining two code-paths was a hassle for this one action, and I can
imagine that it is much worse when multiplied by all of git's features.

5. Similarly, the skel_walk is very similar to the existing indegree
walk.  But again, avoiding duplicated code leads to uncomfortable

> if (skel) {
>         ...
> }

blocks.  It also confuses gcc, which has trouble realizing that
variables were initialized in previous `if (skel)` blocks.  I'm then
forced to resort to underhanded tricks: blanket initializing of
variables that should have been optimized out if !skel.

Duplicating the entire function leads to cleaner and negligibly faster
code.  Frustratingly, but unsurprisingly, the meticulous careful
skeleton walk that I've implemented is slightly slower than the
existing indegree walk, so it's a bad candidate to replace the old code.

Any thoughts?

The entire patch set is also available at [1], which includes a few
debugging assertions enabled, and the excision of the unused
free_skel_info method (see 3, above) which otherwise breaks
-Wunused-function.  It passes the full battery of tests there (as well
as locally for me).

Best,
Antonio Russo

[1] <https://github.com/aerusso/git/tree/mrs/imb-skel>

             reply	other threads:[~2020-06-07 16:22 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-07 16:22 Antonio Russo [this message]
2020-06-07 16:23 ` [PATCH 1/3] Clean up t6016-rev-list-graph-simplify-history Antonio Russo
2020-06-07 17:42   ` Junio C Hamano
2020-06-07 16:24 ` [PATCH 2/3] Teach git-rev-list --ignore-merge-bases Antonio Russo
2020-06-07 16:26 ` [PATCH 3/3] Add new tests of --ignore-merge-bases Antonio Russo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=456a1271-4f17-5503-5d13-d0c97115e2fd@gmail.com \
    --to=antonio.e.russo@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).