git@vger.kernel.org mailing list mirror (one of many)
From: Jerry Zhang <jerry@skydio.com>
To: Elijah Newren <newren@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Ross Yeager <ross@skydio.com>, Abraham Bachrach <abe@skydio.com>,
	brian.kubisiask@skydio.com
Subject: Re: [PATCH 0/1] git-apply: Allow simultaneous --cached and --3way options
Date: Mon, 5 Apr 2021 15:05:34 -0700
Message-ID: <CAMKO5CsN+J_30vhJTo5PYj_9SNJVh_y33APUviG2P4bir29RjQ@mail.gmail.com>
In-Reply-To: <CABPp-BGSgyAH0w21Vrv_bdPaLg+rCPViktbUmM6fMbmxaK70qA@mail.gmail.com>

On Fri, Apr 2, 2021 at 8:04 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Fri, Apr 2, 2021 at 6:36 PM Jerry Zhang <jerry@skydio.com> wrote:
> >
> > I'm creating a script/tool that will be able to cherry-pick
> > multiple commits from a single branch, rebase them onto a
> > base commit, and push those references to a remote.
> >
> > Ex. with a branch like "origin/master -> A -> B -> C"
> > The tool will create "master -> A", "master -> B",
> > "master -> C" and either make local branches or
> > push them to a remote. This can be useful since code
> > review tools like github use branches as the basis
> > for pull requests.
>
> Not sure I understand the "master -> A", "master -> B" syntax.  What
> do you mean here?

Ah yeah, my syntax wasn't super clear here.
I mean a branch "dev" pointing to commit "C", which is on top of "B",
which is on top of "A", which is on top of "master".
My tool would fake a "cherry-pick" of each of A, B, and C on top of master,
producing three independent commits that each have master as their parent.
>
> > A key feature here is that the above happens without
> > any changes to the user's working directory or cache.
> > This is important since those operations will add
> > time and generate build churn. We use these steps
> > for synthesizing a "cherry-pick" of B to master.
> >
> > 1. cp .git/index index.temp
> > 2. set GIT_INDEX_FILE=index.temp
> > 3. git reset master -- . (git read-tree also works here, but is a bit slower)
> > 4. git format-patch --full-index B~..B
> > 5. git apply --cached B.patch
> > 6. git write-tree
> > 7. git commit-tree {output of 6} -p master -m "message"
> > 8. either `git symbolic-ref` to make a branch or `git push` to remote
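
The steps above can be sketched as a shell script. This is only a rough
illustration under stated assumptions, not the actual tool: the scratch
repository, the file names, and the branch name "B-on-master" are all
hypothetical. It uses `git read-tree` for step 3 (the alternative mentioned
there) and `git update-ref` for step 8 instead of `git symbolic-ref`, since
update-ref is the plumbing command that points a branch at a commit.

```shell
#!/bin/sh
set -eu

# Throwaway repository: master -> A -> B on a branch called "dev".
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false
echo base > base.txt; git add base.txt; git commit -q -m base
git checkout -q -b dev
echo a > a.txt; git add a.txt; git commit -q -m A
echo b > b.txt; git add b.txt; git commit -q -m B
B=$(git rev-parse HEAD)

# Steps 1-2: work on a copy of the index, kept outside the working tree.
tmp_index=$(mktemp)
patch=$(mktemp)
cp .git/index "$tmp_index"
export GIT_INDEX_FILE="$tmp_index"

# Step 3: make the temporary index match master's tree.
git read-tree master

# Step 4: export B as a patch with full blob ids.
git format-patch --full-index --stdout "$B~..$B" > "$patch"

# Step 5: apply the patch to the temporary index only.
git apply --cached "$patch"

# Steps 6-7: write the tree and create a commit whose parent is master.
tree=$(git write-tree)
new=$(git commit-tree "$tree" -p master -m "B on master")

# Step 8: point a (hypothetical) local branch at the new commit.
git update-ref refs/heads/B-on-master "$new"

unset GIT_INDEX_FILE
echo "created $new"
```

In this sketch the real index and the working tree are never written to, so
`git status` on "dev" stays clean afterwards.
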
>
> Yeah, folks have resorted to various variants of this kind of thing in
> the past.  It is a clever way to handle some basic cases, but it does
> often fall short.  It's unfortunate that cherry-pick and rebase cannot
> yet just provide this functionality (more on that below).
>
> It may also interest you that rebase has two different backends, one
> built on am (which in turn is built on format-patch + apply), and one
> built on the merge machinery (which the am --3way also uses when it
> needs to).  We deprecated the format-patch + apply backend in part
> because it sometimes results in misapplied patches; see the "Context"
> subsection of the "BEHAVIORAL DIFFERENCES" section of the git-rebase
> manpage.  However, the am version would at least handle basic renames,
> which I believe might cause problems for a direct format-patch + apply
> invocation like yours (I'll also discuss this more below).

Thanks. I was able to reproduce a case where the am machinery applied a
patch incorrectly but 3way applied it correctly. This brings up another
point: because am doesn't report an error when it misapplies a patch in
this case, we never end up falling back to 3way. There is also no user flag
to force 3way, so the user can't do anything to ensure correct
application here. Maybe it would be better for --3way to invoke the 3-way
merge directly rather than only as a fallback? (Junio might also
have some input here.)
>
> > I'm looking to improve the git apply step in #5.
> > Currently we can't use --cached in combination with
> > --3way, which limits some of the usefulness of this method.
> > There are many diffs that will block applying a patch
> > that a 3 way merge can resolve without conflicts. Even
> > in the case where there are real conflicts, performing
> > a 3 way merge will allow us to show the user the lines
> > where the conflict occurred.
> >
> > With the above in mind, I've created a small patch that
> > implements the behavior I'd like. Rather than disallow
> > the cached and 3way flags to be combined, we allow them,
> > but write any conflicts directly to the cached file. Since
> > we're unable to modify the working directory, it seems
> > reasonable in this case to not actually present the user
> > with any options to resolve conflicts. Instead, a script
> > or tool using this command can diff the temporary cache
> > to get the source of the conflict.
>
> Looks like you're focusing on content conflicts.  What about path
> conflicts?  For example, apply's --3way just uses a per-file
> ll_merge() call, meaning it won't handle renames, so your method would
> also often get spurious modify/delete conflicts when renames occur.
> How does your plan to just "cache" conflicts work with these
> modify/delete files?  Will users just look for conflict markers and
> ignore the fact that both modified newfile and modified oldfile are
> present?  I'm also curious how e.g. directory/file conflicts would be
> handled by your scheme; these seem somewhat problematic to me from
> your description.
>
> > Happy to address any feedback. After I address any major
> > changes I will add new tests for this path.
>
> Don't know the timeframe you're looking at, but I'm looking to modify
> cherry-pick and rebase to be able to operate on branches that are not
> checked out, or in bare repositories.

That functionality would be great. I initially looked at what it would
take to modify the sequencer to get what I wanted, but I quickly realized
it would be a big refactor.

> The blocker to that traditionally has been that the merge machinery
> required a working
> directory.  The good news is that I wrote a new merge backend that
> doesn't require a working directory.  The bad news is I'm still trying
> to get that new merge backend through the review process, and no
> current release of git has a complete version of that new backend yet.
> Further, the changes to cherry-pick and rebase have not yet been
> started.  There were some decisions to make too, such as how to handle
> the case with conflicts -- just report them and tell the user to retry
> with a checkout?  Provide some kind of basic information about the
> conflicts?  What'd be useful to you?

After thinking some more, I generally agree with the comments to leave
the conflicts at higher stages rather than check in the conflict markers.
This should result in fewer issues with path conflicts as well (or at least
behave similarly to 3way by itself). This is probably ok, because the
conflict markers can always be generated from the higher-stage entries
(git diff or git checkout -m -- .), but the reverse isn't true.
Overall, 90% of the functionality comes from being able to do the 3-way
merge at all, since it handles more cases correctly. Having *any* output
to tell the user why their operation failed would just be a bonus.
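
To illustrate the point about higher stages, here is a minimal sketch that
leaves a conflict at stages 1-3 in a temporary index and then regenerates
conflict markers from those entries without touching the working tree. It
uses `git read-tree -i -m` to create the conflict (not the proposed
apply/cherry-pick behavior), and every file, branch, and label name is
hypothetical.

```shell
#!/bin/sh
set -eu

# Scratch repository with two branches that change the same line.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false
printf 'original\n' > f.txt
git add f.txt
git commit -q -m base
git checkout -q -b ours
printf 'ours\n' > f.txt
git commit -q -a -m ours
git checkout -q -b theirs master
printf 'theirs\n' > f.txt
git commit -q -a -m theirs
git checkout -q master

# Temporary index; the path must not exist yet so git starts from empty.
scratch=$(mktemp -d)
export GIT_INDEX_FILE="$scratch/index"
git read-tree ours

# Three-way merge in the index only; -i skips the working tree check.
git read-tree -i -m master ours theirs

# The conflicted path is now recorded at stages 1 (base), 2 and 3.
git ls-files -u

# Rebuild conflict markers from the staged blobs.
git cat-file blob :1:f.txt > "$scratch/base.txt"
git cat-file blob :2:f.txt > "$scratch/ours.txt"
git cat-file blob :3:f.txt > "$scratch/theirs.txt"
# merge-file exits with the number of conflicts, so tolerate non-zero.
git merge-file -p -L ours -L base -L theirs \
    "$scratch/ours.txt" "$scratch/base.txt" "$scratch/theirs.txt" \
    > "$scratch/merged.txt" || true
cat "$scratch/merged.txt"
```

Going the other way, from a blob that only contains conflict markers back to
the three original versions, is not possible in general, which is the
asymmetry mentioned above.
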

I'd envision flags similar to these for cherry-pick:

--cached : Do not touch the working tree to write conflict markers.
    Instead, conflicts are left at higher stages in the cache.

--cached-parent : Check out the index to the given commit, then
    apply the cherry-pick with the given commit as a parent. Print out
    the new commit. Warning: the index will be left unsynchronized with
    HEAD after this operation. Intended to be used with a temporary index
    rather than the main one.

In the end, I'm not sure how to accomplish the desired functionality
without using a temporary index. Running against the main index would
always leave the user's index out of sync with their working dir
afterwards. Maybe error or warn if the user isn't using a temporary index?
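
As a rough sketch of that last idea, a wrapper script could refuse to run
unless a temporary index has been selected. The function name
`require_temp_index` is hypothetical, and the check is deliberately
simplistic: it only tests whether GIT_INDEX_FILE is set, not whether it
points somewhere other than the repository's default index.

```shell
#!/bin/sh

# Hypothetical guard: succeed only when the caller has set GIT_INDEX_FILE,
# which we take to mean that a temporary index is in use.
require_temp_index() {
    if [ -z "${GIT_INDEX_FILE:-}" ]; then
        echo "warning: GIT_INDEX_FILE is not set;" \
             "this would desync the main index from the working tree" >&2
        return 1
    fi
    return 0
}

# Example use: only proceed with index-modifying steps when guarded.
if require_temp_index; then
    echo "using temporary index: $GIT_INDEX_FILE"
else
    echo "skipping: no temporary index selected"
fi
```
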

