git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Christian Couder <chriscool@tuxfamily.org>,
	Taylor Blau <me@ttaylorr.com>,
	Johannes Altmanninger <aclopte@gmail.com>,
	Elijah Newren <newren@gmail.com>,
	Elijah Newren <newren@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH v2 8/8] merge-tree: provide an easy way to access which files have conflicts
Date: Fri, 7 Jan 2022 20:36:18 +0100 (CET)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.2201071908580.339@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <01364bb020ee2836016ec0e8eafa2261fb7800ab.1641403655.git.gitgitgadget@gmail.com>

Hi Elijah,

On Wed, 5 Jan 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> Callers of `git merge-tree --real` might want an easy way to determine
> which files conflicted.  While they could potentially use the --messages
> option and parse the resulting messages written to that file, those
> messages are not meant to be machine readable.  Provide a simpler
> mechanism of having the user specify --unmerged-list=$FILENAME, and then
> write a NUL-separated list of unmerged filenames to the specified file.

This patch does what the commit message says, and it looks quite
plausible. However, in practice it seems that you need either a tree (if
the merge succeeded) or the list of conflicted files (if the merge
succeeded).

So while it looks relatively clean from the implementation's point of
view, the design itself could probably withstand a bit of consideration.

As I hinted earlier (to be precise, in
https://lore.kernel.org/git/nycvar.QRO.7.76.6.2201071602110.339@tvgsbejvaqbjf.bet/),
I had the chance in December to work on the server-side, using `merge-ort`
for a bit. In the following, I will talk about this a bit more than about
this particular patch, but I think it is highly relevant (not a tangent).

One of the things that became clear to me is that we really have an
either/or situation here. Either the merge succeeds, and we _need_ that
tree, or it fails, and we could not care less about the tree at all.

In fact, if the merge fails, we completely ignore the tree, and it would
be better if we would not even write out any Git objects in that case at
all: even just writing the objects would be quite costly at the
server-side scale.

So my (somewhat hacky) patches for a proof-of-concept produced _either_
the hash of the tree on `stdout`, _or_ a header saying that there were
conflicts followed by a NUL-separated list of file names.

Mind you, I did not even get to the point of analyzing things even more
deeply. My partner in crime and I only got to comparing the `merge-ort`
way to the libgit2-based way, trying to get them to compare as much
apples-to-apples as possible [*1*], and we found that even the time to
spawn the Git process (~1-3ms, with all overhead counted in) is _quite_
noticeable, at server-side scale.

Of course, the `merge-ort` performance was _really_ nice when doing
anything remotely complex, then `merge-ort` really blew the libgit2-based
merge out of the water. But that's not the common case. The common case
are merges that involve very few modified files, a single merge base, and
they don't conflict. And those can be processed by the libgit2-based
method within a fraction of the time it takes to even only so much as
spawn `git` (libgit2-based merges can complete in less than a fifth
millisecond, that's at most a fifth of the time it takes to merely run
`git merge-tree`).

The difference between 0.2-0.5ms for libgit2-based merges on the one hand,
and 1-3ms for `merge-ort`-based merges on the other hand, might not seem
like much, but you have to multiply it by the times such a merge is
performed on the server. Which is a _lot_. Way more often than I thought.

In this particular instance, there is a silver-lining: the libgit2-based
merge is not actually recursive. It is a three-way merge. Which means that
we first have to determine a merge base. In our case, this is done by
spawning a Git process anyway, so one of my ideas to move forward is to fold
that merge-base logic into `git merge-tree`, too.

Anyway, the short short is: whenever we can avoid unnecessary work, we
should do so. In the context of this patch, I would say that we should
avoid writing out a tree (and avoid printing its hash to `stdout`) if
there are merge conflicts. And we should avoid writing (and later reading)
a file, if we can get away with avoiding it.

At least in the default case, that is. We still might need a flag to
produce some more information about those merge conflicts. But even in
that case, it would be better to have a list of file names with the three
associated stages than to output the hash of a tree that contains
conflicts (and tons of files _without_ conflicts). The UI needs to
re-generate those conflicts anyway. And remember: a tree can contain
millions of files even if there is but a single conflict. It makes more
sense for `merge-tree --real` to output a concrete list of files that
conflicted, rather than expecting the caller to discern between conflicts
and non-conflicts by processing a tree object.

Maybe you agree with this rationale and re-design the `--real` mode to try
to avoid writing out files in the common case?

About the form of the patch itself: I was tempted to go with the
nitpicking spirit I see on the Git mailing list these days, especially
about the shell script code in the test scripts. But then I realized that
I find such nitpicking pretty unhelpful, myself. The code is good as-is,
even if I would write it differently. It is clear, and it does exactly
what it is supposed to do.

Thank you,
Dscho

Footnote *1*: I did not _quite_ get to the point of comparing the
`merge-ort` merges to the libgit2 ones, unfortunately. I was on my way to
add code to respect `merge.renames = false` so that we could _truly_
compare the `merge-ort` merges to the libgit2 merges (we really will want
to verify that the output is identical, before even considering to enable
recursive merges on the server side, and then only after studying the time
difference), and then had to take off due to the holidays. If you already
have that need to be able to turn off rename-detection on your radar, even
if only for a transitional period, I would be _so_ delighted.

  parent reply	other threads:[~2022-01-07 19:36 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-31  5:03 [PATCH 0/8] RFC: Server side merges (no ref updating, no commit creating, no touching worktree or index) Elijah Newren via GitGitGadget
2021-12-31  5:03 ` [PATCH 1/8] merge-tree: rename merge_trees() to trivial_merge_trees() Elijah Newren via GitGitGadget
2021-12-31  5:03 ` [PATCH 2/8] merge-tree: move logic for existing merge into new function Elijah Newren via GitGitGadget
2022-01-01 20:11   ` Johannes Altmanninger
2022-01-01 20:17     ` Elijah Newren
2021-12-31  5:03 ` [PATCH 3/8] merge-tree: add option parsing and initial shell for real merge function Elijah Newren via GitGitGadget
2021-12-31  5:04 ` [PATCH 4/8] merge-tree: implement real merges Elijah Newren via GitGitGadget
2022-01-01 20:08   ` Johannes Altmanninger
2022-01-01 21:11     ` Elijah Newren
2022-01-03 12:23   ` Fabian Stelzer
2022-01-03 16:37     ` Elijah Newren
2021-12-31  5:04 ` [PATCH 5/8] merge-ort: split out a separate display_update_messages() function Elijah Newren via GitGitGadget
2022-01-03 12:15   ` Fabian Stelzer
2022-01-03 12:25     ` Fabian Stelzer
2021-12-31  5:04 ` [PATCH 6/8] merge-ort: allow update messages to be written to different file stream Elijah Newren via GitGitGadget
2022-01-01 20:08   ` Johannes Altmanninger
2022-01-01 20:19     ` Elijah Newren
2021-12-31  5:04 ` [PATCH 7/8] merge-tree: support saving merge messages to a separate file Elijah Newren via GitGitGadget
2022-01-03 12:31   ` Fabian Stelzer
2022-01-03 16:51     ` Elijah Newren
2022-01-03 17:22       ` Fabian Stelzer
2022-01-03 19:46         ` Elijah Newren
2022-01-04 13:05           ` Fabian Stelzer
2022-01-03 12:35   ` Fabian Stelzer
2022-01-03 16:55     ` Elijah Newren
2021-12-31  5:04 ` [PATCH 8/8] merge-tree: provide an easy way to access which files have conflicts Elijah Newren via GitGitGadget
2022-01-05 17:27 ` [PATCH v2 0/8] RFC: Server side merges (no ref updating, no commit creating, no touching worktree or index) Elijah Newren via GitGitGadget
2022-01-05 17:27   ` [PATCH v2 1/8] merge-tree: rename merge_trees() to trivial_merge_trees() Elijah Newren via GitGitGadget
2022-01-05 17:27   ` [PATCH v2 2/8] merge-tree: move logic for existing merge into new function Elijah Newren via GitGitGadget
2022-01-05 17:27   ` [PATCH v2 3/8] merge-tree: add option parsing and initial shell for real merge function Elijah Newren via GitGitGadget
2022-01-05 17:27   ` [PATCH v2 4/8] merge-tree: implement real merges Elijah Newren via GitGitGadget
2022-01-07 15:30     ` Johannes Schindelin
2022-01-07 17:26       ` Elijah Newren
2022-01-07 18:22         ` Johannes Schindelin
2022-01-07 19:15           ` Elijah Newren
2022-01-07 20:56         ` Junio C Hamano
2022-01-11 13:39           ` Johannes Schindelin
2022-01-07 18:12     ` Christian Couder
2022-01-07 19:09       ` Elijah Newren
2022-01-05 17:27   ` [PATCH v2 5/8] merge-ort: split out a separate display_update_messages() function Elijah Newren via GitGitGadget
2022-01-05 17:27   ` [PATCH v2 6/8] merge-ort: allow update messages to be written to different file stream Elijah Newren via GitGitGadget
2022-01-05 17:27   ` [PATCH v2 7/8] merge-tree: support saving merge messages to a separate file Elijah Newren via GitGitGadget
2022-01-07 18:07     ` Johannes Schindelin
2022-01-08  1:02       ` Elijah Newren
2022-01-05 17:27   ` [PATCH v2 8/8] merge-tree: provide an easy way to access which files have conflicts Elijah Newren via GitGitGadget
2022-01-05 19:09     ` Ramsay Jones
2022-01-05 19:17       ` Elijah Newren
2022-01-07 19:36     ` Johannes Schindelin [this message]
2022-01-07 22:12       ` Elijah Newren
2022-02-22 13:03         ` Johannes Schindelin
2022-01-08  1:28       ` Elijah Newren
2022-02-22 13:05         ` Johannes Schindelin
2022-01-05 20:18   ` [PATCH v2 0/8] RFC: Server side merges (no ref updating, no commit creating, no touching worktree or index) Junio C Hamano
2022-01-05 22:35     ` Elijah Newren
2022-01-07 18:46   ` Christian Couder
2022-01-07 19:59     ` Elijah Newren
2022-01-07 21:26       ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.2201071908580.339@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=aclopte@gmail.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).