From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: ** X-Spam-ASN: AS3215 2.6.0.0/16 X-Spam-Status: No, score=2.5 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,LIST_MIRROR_RECEIVED,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_SBL,URIBL_SBL_A shortcircuit=no autolearn=no autolearn_force=no version=3.4.2 Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by dcvr.yhbt.net (Postfix) with ESMTP id D6D131F852 for ; Mon, 21 Feb 2022 17:51:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377799AbiBUO2b (ORCPT ); Mon, 21 Feb 2022 09:28:31 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:38422 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377781AbiBUO23 (ORCPT ); Mon, 21 Feb 2022 09:28:29 -0500 Received: from mail-ed1-x531.google.com (mail-ed1-x531.google.com [IPv6:2a00:1450:4864:20::531]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 152CB1EC7B for ; Mon, 21 Feb 2022 06:28:05 -0800 (PST) Received: by mail-ed1-x531.google.com with SMTP id m17so29689847edc.13 for ; Mon, 21 Feb 2022 06:28:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:references:user-agent:in-reply-to :message-id:mime-version; bh=xviSCgc/lRfPaYOZZAXb7ddQ1a9reO3GbNcg98AK/OQ=; b=POM7uXi8KupJYfc9q1ZC86Bu4F4kH4mv+hzzRCza6NJWwE8qd+sKo2Yoal/wD5u7FZ 9J/cQIc7m4fsL+sEvm63Sj5cTGKANafqbsz+JcemIBGUcBI75iW+d8BAKwN5Wa7DFfCV kiEgZKzN5eBABXrysJH0lz0ZDQPRjxBcHCva0heH6qG9mX/P0toc/qE8cisVTBysqx3M 2b3N9EgLVnb+duZaMHuXxH9WJDnGMSKki/WrWdQYh8sbGRHMOg5QfytNRlN+ZZTLLjhX jpP1yW/eJ27vJid88jbJ4tceuKtyYJRenO00rgf6Z+sQiTrCQeNBSaH3HRt8ToMHQNKf 2u5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:references:user-agent :in-reply-to:message-id:mime-version; bh=xviSCgc/lRfPaYOZZAXb7ddQ1a9reO3GbNcg98AK/OQ=; b=5XMzZjw7N4zXN7WBwDvbXx5kxn9gDZfyxymp7HRUqQDodOx+xBD/7FHPsC1hNK4evv jTC0adDzA+yET7B5Ug6lqRQuLgOE/jSG5NIBAUZ7Tp0ao8/u0m0bzw6vEuWOP3v9j2rM rT1rTlpWOXdXUWWLkYGcp26zpevNNex1h5ik86mxUmcue5fZ7z34TLVc9nSqgnNlNy62 HP/TeYpb9D2CmcVBCjd4BG8PRG/JrgQddED2wi0h3AywZ0/7n+mheG+f9hYt8xYBVvg/ oPqpIUoc+TN6iIq3PwBFamzc5OTvNd5fDVDtVp7uSOfBJOho2xrdCrSY4ot9Tt8FtFMc /Mow== X-Gm-Message-State: AOAM531VlLrVvladKMuWi2xEXoSQ4fyx64btq9hoIJkjXXET7yeIUTwB olDaizbcH5vpQbcUYVfhLp/xcVqcHq4= X-Google-Smtp-Source: ABdhPJzogkyc7aKDxKTsPT7a/aHdySH9N5n9G/RWGIkuFt8EQKX9mjvnMmmNopuBoYsJYLXfEPQErw== X-Received: by 2002:aa7:c446:0:b0:410:9a8e:9b85 with SMTP id n6-20020aa7c446000000b004109a8e9b85mr21254959edr.319.1645453683312; Mon, 21 Feb 2022 06:28:03 -0800 (PST) Received: from gmgdl (j120189.upc-j.chello.nl. [24.132.120.189]) by smtp.gmail.com with ESMTPSA id i6sm5267354eja.132.2022.02.21.06.28.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Feb 2022 06:28:02 -0800 (PST) Received: from avar by gmgdl with local (Exim 4.95) (envelope-from ) id 1nM9fV-005dbC-NV; Mon, 21 Feb 2022 15:28:01 +0100 From: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason To: Johannes Schindelin Cc: Elijah Newren , Johannes Sixt , Elijah Newren via GitGitGadget , Git Mailing List , Christian Couder , Taylor Blau , Johannes Altmanninger , Ramsay Jones , Christian Couder , =?utf-8?Q?Ren=C3=A9?= Scharfe Subject: Re: [PATCH 08/12] merge-ort: provide a merge_get_conflicted_files() helper function Date: Mon, 21 Feb 2022 15:27:56 +0100 References: <35e0ed9271a0229fe2acd2385a7e4171d4dfe077.1642888562.git.gitgitgadget@gmail.com> <0d7ba76c-9824-9953-b8ce-6abe810e2778@kdbg.org> User-agent: Debian GNU/Linux bookworm/sid; Emacs 27.1; mu4e 1.6.10 In-reply-to: Message-ID: <220221.8635kccn0u.gmgdl@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Mon, Feb 21 2022, Johannes Schindelin wrote: > Hi Elijah, > > On Fri, 4 Feb 2022, Elijah Newren wrote: > >> On Fri, Feb 4, 2022 at 3:10 PM Johannes Schindelin >> wrote: >> > >> > On Sat, 29 Jan 2022, Elijah Newren wrote: >> > >> > > On Sat, Jan 29, 2022 at 12:23 AM Johannes Sixt wrote: >> > > > >> > > > Just a heckling from the peanut gallery... >> > > > >> > > > Am 29.01.22 um 07:08 schrieb Elijah Newren: >> > > > > On Fri, Jan 28, 2022 at 8:55 AM Johannes Schindelin >> > > > > wrote: >> > > > >> Meaning: Even if stage 3 is missing from the first conflict and stage 1 is >> > > > >> missing from the second conflict, in the output we would see stages 1, 2, >> > > > >> 2, 3, i.e. a duplicate stage 2, signifying that we're talking about two >> > > > >> different conflicts. >> > > > > >> > > > > I don't understand why you're fixating on the stage here. Why would >> > > > > you want to group all the stage 2s together, count them up, and then >> > > > > determine there are N conflicting files because there are N stage 2's? >> > > > >> > > > Looks like you are misunderstanding Dscho's point: When you have two >> > > > conflicts, the first with stages 1 and 2, the second with stages 2 and >> > > > 3, then the 2s occur lumped together when the 4 lines are printed in a >> > > > row, and that is the cue to the parser where the new conflict begins. >> > > > Dscho did not mean that all N 2s of should be listed together. >> > > >> > > Ah, so...I didn't understand his misunderstanding? Using stages as a >> > > cue to the parser where the new conflict begins is broken; you should >> > > instead check for when the filename listed on a line does not match >> > > the filename on the previous line. >> > >> > But that would break down in case of rename/rename conflicts, right? >> > >> > > In particular, if one conflict has stages 1 and 2, and the next conflict >> > > has only stage 3, then looking at stages only might cause you to >> > > accidentally lump unrelated conflicts together. >> > >> > Precisely. That's why I would love to have a way to deviate from the >> > output of `ls-files -u`'s format, and have a reliable way to indicate >> > stages that belong to the same merge conflict. >> >> Ah, attempting to somehow identify and present logical separate >> conflicts? That could be awesome, but I'm not sure it's technically >> possible. It certainly isn't with today's merge-ort. >> >> Let me ask some questions first... >> >> If I understand you correctly then in the event of a rename/rename, >> i.e. foo->bar & foo->baz, then you want foo's, bar's, & baz's stages >> all listed together. Right? And in some way that you can identify >> them as related? > > Yes, that was kind of my idea ;-) > >> If we do so, how do we mark the beginning and the end of what you call >> "the same merge conflict"? If you say it's always 3 stages (with the >> possibility of all-zero modes/oids), then what about the rename/rename >> case above modified so that the side that did foo->baz also added a >> different 'bar'? That'd be 4 non-zero modes/oids, all of them >> relevant. Or what if the side that did foo->bar also renamed >> something else to 'baz', giving us even more non-zero stages for these >> three paths? Perhaps you consider these different conflicts and want >> them listed separately -- if so, where does one conflict begin and >> another start and which stages are parts of which conflict? > > Thank you for calling me out on an only half-finished thought. > > To be quite honest, I previously did not have much time to think of the > non-trivial nature of representing merge conflicts in a web UI when > performing merges with rename detection, in particular when performing > _recursive_ merges (where, as you point out so correctly, an arbitrary > number of rename conflicts can happen _for the same file_). > > Of course, this gets even more complicated when we're also thinking about > type changes (file -> symlink, file -> directory, and vice versa). > >> If you are attempting to somehow present the stuff that "belongs to >> the same merge conflict" are you also trying to identify what kind of >> merge conflict it is? If so, do you want each type of merge conflict >> listed? For example, let's switch from the example above of logically >> disjoint paths coming together to result in more than 3 stages, and >> instead pick an example with a single logical path with less than >> three stages. And now let's say that path has multiple conflicts >> associated with it; let's use an example with 3: rename/delete + >> modify/delete + directory/file (one side renames foo->bar while >> modifying the contents, the other side deletes foo and adds the >> directory 'bar/'). In this case, there is a target file 'bar' that >> has two non-zero modes/oids in the ls-files-u output. If all three >> types of conflicts need to be listed, does each need to be listed with >> the two non-zero modes/oids (and perhaps one zero mode/oid), resulting >> in six listings for 'bar'? Or would the duplication be confusing >> enough that we instead decide to list some merge conflicts with no >> stages associated with them? >> >> Thinking about both sets of questions in the last two paragraphs from >> a higher level -- should we focus on and group the higher order stages >> by the individual conflicts that happen, or should we group them by >> the paths that they happen to (which is what `ls-files -u` happens to >> do), or should we not bother grouping them and instead duplicate the >> higher order stages for each logical conflict it is part of? >> >> As an alternative to duplicating higher order stages, do we sometimes >> decide to "lump" separate conflicts together and treat them as one >> conflict? If so, what are the rules on how we decide to lump >> conflicts and when not to? Is there a bright line boundary? And can >> it be done without sucking in arbitrarily more stages for a single >> conflict? >> >> >> Some testcases that might be useful while considering the above >> questions: take a look at the "rad", "rrdd", and "mod6" tests of >> t6422. How many "same merge conflicts" are there for each of those, >> and what's the boundary between them? And can you give the answer in >> the form of rules that generically handle all cases, rather than just >> answering these three specific cases? >> >> >> I've thought about this problem long and hard before (in part because >> of some conversations I had with Edward Thompson about libgit2 and > > Not a big deal for _me_, but I seem to remember that Ed cared a lot about > having no p in their surname ;-) > >> merging at Git Merge 2020). It wasn't at all clear to me that libgit2 >> had considered anything beyond simple rename cases. The only rules I >> ever figured out that made sense to me was "group the stages by target >> filename rather than by logical conflict" (so we get `ls -files -u` >> populated) and print a meant-for-human message for each logical >> conflict (found in the section for >> merge-tree), and make NO attempt to connect stages by conflict type. >> >> I'm sure that's not what you wanted to hear, and maybe doesn't even >> play nicely with your design. But short of ignoring the edge and >> corner cases, I don't see how to solve that problem. If you do just >> want to ignore edge and corner cases, then just ignore the >> rename/rename case you brought up in the first place and just use >> `ls-files -u`-type output as-is within your design. If you don't want >> to ignore edge cases and want something that works with a specific >> design that somehow groups conflicted file stages by conflict type, >> then we're going to have to dig into all these questions above and do >> some big replumbing within merge-ort. > > There is sometimes a big difference between what I want to hear and what I > need to hear. Thank you for providing so many details that I needed to > hear. > > So let's take a step back and look at my goal here, as in: the > over-arching goal: to use merge-ort on the server side. > > From what you said above, it becomes very clear to me that there is very > little chance to resolve such conflicts on the server side. > > For example, if a topic branch renames a file differently than the main > branch, there is a really good chance that the user tasked with merging > the topic branch will have to do a whole lot more than just click a few > buttons to perform that task. There might very well be the need to edit > files that do not contain merge conflict markers (I like to call those > cases "non-semantic merge conflicts"), and almost certainly local testing > will be necessary. > > So I guess the best we can do in those complicated cases is to give a > comprehensive overview of the problems in the web UI, with the note that > this merge conflict has to be resolved on the local side. > > Which brings me to the next concern: since `merge-tree` is a low-level > tool meant to be called by programs rather than humans, we need to make > sure that those messages remain machine-parseable, even if they contain > file names. > > Concretely: while I am not currently aware of any web UI that allows to > resolve simple rename/rename conflicts, it is easily conceivable how to > implement such a thing. When that happens, we will need to be able to > teach the server-side code to discern between the cases that can be > handled in the web UI (trivial merge conflicts, trivial rename/rename > conflicts) as compared to scenarios where the conflicts are just too > complex. > > Here's an excerpt from t4301: > > -- snip -- > Auto-merging greeting > CONFLICT (content): Merge conflict in greeting > Auto-merging numbers > CONFLICT (file/directory): directory in the way of whatever from side1; moving it to whatever~side1 instead. > CONFLICT (modify/delete): whatever~side1 deleted in side2 and modified in side1. Version side1 of whatever~side1 left in tree. > -- snap -- > > This is the complete set of messages provided in the `test conflict > notices and such` test case. > > I immediately notice that every line contains at least one file name. > Looking at https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1899, it > does not seem as if the file names are quoted: > > path_msg(opt, path, 1, _("Auto-merging %s"), path); > > (where `path` is used verbatim in a call to `merge_3way()` before that, > i.e. it must not have been quoted) > > I would like to register a wish to ensure that file names with special > characters (such as most notably line-feed characters) are quoted in these > messages, so that a simple server-side parser can handle messages starting > with `Auto-merging` and with `CONFLICT (content): Merge conflict in `, and > "throw the hands up in the air" if any other message prefix is seen. > > Do you think we can switch to `sq_quote_buf_pretty()` for these messages? > For the `Auto-merging` one, it would be trivial, but I fear that we will > have to work a bit on the `path_msg()` function > (https://github.com/git/git/blob/v2.35.1/merge-ort.c#L630-L649) because it > accepts a variable list of arguments without any clue whether the > arguments refer to paths or not. (And I would be loathe to switch _all_ > callers to do the quoting themselves.) > > I see 28 calls to that function, and at least a couple that pass not only > a path but also an OID (e.g. > https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1611-L1613). > > We could of course be sloppy and pass even OIDs through > `sq_quote_buf_pretty()` in `path_msg()`, knowing that there won't be any > special characters in them, but it gets more complicated e.g. in > https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1648-L1651, where we > pass an `strbuf` that contains a somewhat free-form commit message. > > I guess we could still pass those through `sq_quote_buf_pretty()`, even if > they are not paths, to ensure that there are no special characters in the > machine-parseable lines. > > What do you think? > > Ciao, > Dscho