* git add -p breaks after split on change at the top of the file @ 2017-08-16 20:25 Simon Ruderich 2017-08-17 8:41 ` Jeff King 0 siblings, 1 reply; 8+ messages in thread From: Simon Ruderich @ 2017-08-16 20:25 UTC (permalink / raw) To: git Hello, The following session reproduces the issue with Git 2.14.1: $ git init test $ cd test $ echo x >file $ git add file $ git commit -m commit $ printf 'a\nb\nx\nc\n' >file $ git add -p diff --git a/file b/file index 587be6b..74a69a0 100644 --- a/file +++ b/file @@ -1 +1,4 @@ +a +b x +c Stage this hunk [y,n,q,a,d,/,s,e,?]? <-- press s Split into 2 hunks. @@ -1 +1,3 @@ +a +b x Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? <-- press e Now delete the line "+a" in your editor and save. error: patch failed: file:1 error: file: patch does not apply I expected git add -p to stage this change without error. It works fine without splitting the hunk (by deleting the first and last + line in the diff). Regards Simon -- + privacy is necessary + using gnupg http://gnupg.org + public key id: 0x92FEFDB7E44C32F9 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: git add -p breaks after split on change at the top of the file 2017-08-16 20:25 git add -p breaks after split on change at the top of the file Simon Ruderich @ 2017-08-17 8:41 ` Jeff King 2017-08-17 9:03 ` Jeff King 2017-08-17 19:08 ` Junio C Hamano 0 siblings, 2 replies; 8+ messages in thread From: Jeff King @ 2017-08-17 8:41 UTC (permalink / raw) To: Simon Ruderich; +Cc: Junio C Hamano, git [+cc Junio, as this gets deep into git-apply innards] On Wed, Aug 16, 2017 at 10:25:04PM +0200, Simon Ruderich wrote: > $ git add -p > diff --git a/file b/file > index 587be6b..74a69a0 100644 > --- a/file > +++ b/file > @@ -1 +1,4 @@ > +a > +b > x > +c > Stage this hunk [y,n,q,a,d,/,s,e,?]? <-- press s > Split into 2 hunks. > @@ -1 +1,3 @@ > +a > +b > x > Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? <-- press e > > Now delete the line "+a" in your editor and save. > > error: patch failed: file:1 > error: file: patch does not apply > > I expected git add -p to stage this change without error. It > works fine without splitting the hunk (by deleting the first and > last + line in the diff). Interesting case. The problem isn't in add--interactive itself (I don't think), but in git-apply. This is the diff we end up feeding to "git apply --cached --check --recount --allow-overlap" to see if it applies: diff --git a/file b/file index 587be6b..74a69a0 100644 --- a/file +++ b/file @@ -1 +1,3 @@ +b x @@ -1 +3,2 @@ x +c The first hunk (that we edited) applies fine. But the second one does not. Its hunk header says that it starts at line "1", so we expect to find it at the beginning of the file. But of course it _isn't_ at the beginning of the file anymore, because the first hunk added a line before there. So this diff is somewhat bogus; it has two hunks which start at the same spot. But I think that's exactly the sort of thing that "--allow-overlap" should handle. Doing this makes your case work for me: diff --git a/apply.c b/apply.c index 41ee63e1db..606db58218 100644 --- a/apply.c +++ b/apply.c @@ -2966,8 +2966,9 @@ static int apply_one_fragment(struct apply_state *state, * In other words, a hunk that is (frag->oldpos <= 1) with or * without leading context must match at the beginning. */ - match_beginning = (!frag->oldpos || - (frag->oldpos == 1 && !state->unidiff_zero)); + match_beginning = (nth_fragment == 1 && + (!frag->oldpos || + (frag->oldpos == 1 && !state->unidiff_zero))); /* * A hunk without trailing lines must match at the end. But I'm not quite sure if that's right. The rest of the overlap code seems to mark patched lines with a flag. Meaning that instead of giving up and saying "well, this is the second line so we can't ever try matching the beginning", we should be redefining "beginning" in that case to allow 0 or more PATCHED lines starting from the beginning of the file. -Peff ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: git add -p breaks after split on change at the top of the file 2017-08-17 8:41 ` Jeff King @ 2017-08-17 9:03 ` Jeff King 2017-08-17 9:15 ` Jeff King 2017-08-17 19:08 ` Junio C Hamano 1 sibling, 1 reply; 8+ messages in thread From: Jeff King @ 2017-08-17 9:03 UTC (permalink / raw) To: Simon Ruderich; +Cc: Junio C Hamano, git On Thu, Aug 17, 2017 at 04:41:09AM -0400, Jeff King wrote: > diff --git a/apply.c b/apply.c > index 41ee63e1db..606db58218 100644 > --- a/apply.c > +++ b/apply.c > @@ -2966,8 +2966,9 @@ static int apply_one_fragment(struct apply_state *state, > * In other words, a hunk that is (frag->oldpos <= 1) with or > * without leading context must match at the beginning. > */ > - match_beginning = (!frag->oldpos || > - (frag->oldpos == 1 && !state->unidiff_zero)); > + match_beginning = (nth_fragment == 1 && > + (!frag->oldpos || > + (frag->oldpos == 1 && !state->unidiff_zero))); > > /* > * A hunk without trailing lines must match at the end. > > > But I'm not quite sure if that's right. The rest of the overlap code > seems to mark patched lines with a flag. Meaning that instead of giving > up and saying "well, this is the second line so we can't ever try > matching the beginning", we should be redefining "beginning" in that > case to allow 0 or more PATCHED lines starting from the beginning of the > file. Hmm, actually I was reading that part of the code backwards. We record the PATCHED flag _only_ when allow_overlap isn't set, not the other way around. So I had been imagining we'd want something like this: diff --git a/apply.c b/apply.c index 41ee63e1db..4048751807 100644 --- a/apply.c +++ b/apply.c @@ -2489,8 +2489,11 @@ static int match_fragment(struct apply_state *state, return 0; } - if (match_beginning && try_lno) - return 0; + if (match_beginning) { + for (i = 0; i < try_lno; i++) + if (!(img->line[i].flag & LINE_PATCHED)) + return 0; + } /* Quick hash check */ for (i = 0; i < preimage_limit; i++) But that does the opposite of what we want: it makes this case work when --allow-overlap isn't specified. I think my first attempt is probably closer to the right direction (but we'd want to have it kick in only when --allow-overlap is specified; well formed patches shouldn't overlap but we'd want to barf loudly if they do). I'll stop here for now before digging any further. I'm not all that familiar with the apply code, so I have a feeling Junio's comments may stop me from going down an unproductive dead end. :) -Peff ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: git add -p breaks after split on change at the top of the file 2017-08-17 9:03 ` Jeff King @ 2017-08-17 9:15 ` Jeff King 2017-08-17 19:22 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: Jeff King @ 2017-08-17 9:15 UTC (permalink / raw) To: Simon Ruderich; +Cc: Junio C Hamano, git On Thu, Aug 17, 2017 at 05:03:08AM -0400, Jeff King wrote: > But that does the opposite of what we want: it makes this case work when > --allow-overlap isn't specified. I think my first attempt is probably > closer to the right direction (but we'd want to have it kick in only > when --allow-overlap is specified; well formed patches shouldn't overlap > but we'd want to barf loudly if they do). > > I'll stop here for now before digging any further. I'm not all that > familiar with the apply code, so I have a feeling Junio's comments may > stop me from going down an unproductive dead end. :) OK, last email, I promise. :) My first attempt which turns off match_beginning for the overlapping hunk really does feel wrong. It would blindly match a similar hunk later in the file. So if you did something like: 1. The first hunk deletes context line "x". 2. The second overlapping hunk claims to start at the beginning of the file (line 1), and then adds a line after "x". 3. Later in the file, we have another line "x". Then the second hunk should be rejected, and not find the unrelated "x" from (3). But with my patch, I suspect it would be found (because we dropped the requirement to find it at the beginning). Of course this is a pretty ridiculous input in the first place. In theory it _could_ be figured out, but overlapping hunks like this are always going to cause problems (in this case, context from the second hunk became a removal, and the second hunk no longer applies). So maybe it's not worth caring about. But I think the most robust solution would involve checking the lines between try_lno and the start to make sure they were all added. I just don't know that we have an easy way of checking that. Perhaps if we recorded the cumulative number of lines added in previous hunks, and used that as our starting point (instead of "line 0"). I also suspect that match_end has a similar problem, but I couldn't trigger it in practice (perhaps it's impossible, or perhaps my brain is just not up to the challenge today). -Peff ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: git add -p breaks after split on change at the top of the file 2017-08-17 9:15 ` Jeff King @ 2017-08-17 19:22 ` Junio C Hamano 2017-08-20 8:19 ` Jeff King 0 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2017-08-17 19:22 UTC (permalink / raw) To: Jeff King; +Cc: Simon Ruderich, git Jeff King <peff@peff.net> writes: > Of course this is a pretty ridiculous input in the first place. In > theory it _could_ be figured out, but overlapping hunks like this are > always going to cause problems (in this case, context from the second > hunk became a removal, and the second hunk no longer applies). To be honest, I do not think it could be figured out by "git apply", but "git add -p" _should_ be able to. I've said already this earlier in the discussion: https://public-inbox.org/git/7vab5cn7wr.fsf@alter.siamese.dyndns.org/ "add -p" knows how the hunk _before_ it gave the user to edit looked like, and it knows how the hunk after the user edited looks like, so it may have enough information to figure it out. But the "(e)dit" feature was done in a way to discard that information and forced an impossible "guess what the correct shape of the patch would be, out of this broken patch input" with --recount and --allow-overlap. If we want to make "add -p/(e)dit" work in the corner cases and make it trustworthy, "apply" is a wrong tree to back at. A major part of the effort to fix would go to "add -p", not to "apply". ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: git add -p breaks after split on change at the top of the file 2017-08-17 19:22 ` Junio C Hamano @ 2017-08-20 8:19 ` Jeff King 2017-08-20 16:25 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: Jeff King @ 2017-08-20 8:19 UTC (permalink / raw) To: Junio C Hamano; +Cc: Simon Ruderich, git On Thu, Aug 17, 2017 at 12:22:19PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Of course this is a pretty ridiculous input in the first place. In > > theory it _could_ be figured out, but overlapping hunks like this are > > always going to cause problems (in this case, context from the second > > hunk became a removal, and the second hunk no longer applies). > > To be honest, I do not think it could be figured out by "git apply", > but "git add -p" _should_ be able to. I've said already this > earlier in the discussion: > > https://public-inbox.org/git/7vab5cn7wr.fsf@alter.siamese.dyndns.org/ > > "add -p" knows how the hunk _before_ it gave the user to edit looked > like, and it knows how the hunk after the user edited looks like, so > it may have enough information to figure it out. But the "(e)dit" > feature was done in a way to discard that information and forced an > impossible "guess what the correct shape of the patch would be, out > of this broken patch input" with --recount and --allow-overlap. > > If we want to make "add -p/(e)dit" work in the corner cases and make > it trustworthy, "apply" is a wrong tree to back at. A major part of > the effort to fix would go to "add -p", not to "apply". In theory "add -p" could give all the information it has to git-apply in some form, and it could figure it out. That may sound crazy, but I just wonder if it makes things easier because git-apply already knows quite a lot about how to parse and interpret diffs. Whereas "add -p" is largely stupid and just parroting back lines. But I certainly wouldn't make such a decision until figuring out the actual algorithm, which should then make it obvious where is the best place to implement it. :) So just thinking about this particular situation. We have two hunks after the split: @@ -1, +1,3 @@ +a +b x and @@ -1, +3,2 @@ x +c The root of the issue, I think, is that the hunk header for the second one is bogus. It claims to start at the beginning, but of course if the other hunk is applied first, it doesn't. The right header in that case would be (assuming we do not copy the extra context): @@ -3, +3,2 @@ x +c One point of interest is that without an (e)dit, we get this case right, despite the bogus hunk header. I'm not quite sure why that is. After the failing edit we would end up with: @@ -1, +1,3 @@ +b x @@ -1, +3,2 @@ x +c Clearly the count in the first one is now wrong, but we fix that with --recount. But it seems to me that the second hunk is equally wrong as it would be in the non-edit case. I guess in addition to the "-1" being a "-2", the "+3" should now also be a "+2". But I didn't think that would impact us finding the preimage side. So that's one question I'm puzzled by: why does it work without an edit, but fail with one. My second question is how "add -p" could generate the corrected hunk headers. In the non-edit case, we know that our second hunk's leading context must always match the first hunk's trailing context (since that's how we do a split). So we'd know that the preimage "-pos" field of the second hunk header should be offset by the net sum of the added/changed lines for any previous split hunks. In the original case we had net two added lines in the first hunk. So if it's selected, that's how our "-1" becomes "-3". And after the edit, we should be able to apply the same, I think? We have a single added line, so it becomes "-2". That doesn't require knowing what was in the hunk before the user edited at all. We just need to know which other hunks it's split from, so we know which overlapping hunks we need to take into account when recomputing. "add -p" certainly has that information. I _think_ git-apply also has that information if it bothers to find out. Right now --allow-overlap is just "do not complain of overlap". But it could actively recount the offsets when it detects overlap. I'm assuming here that the edit process isn't changing the hunk headers, introducing new hunks, etc. I feel like all bets are off, then. -Peff ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: git add -p breaks after split on change at the top of the file 2017-08-20 8:19 ` Jeff King @ 2017-08-20 16:25 ` Junio C Hamano 0 siblings, 0 replies; 8+ messages in thread From: Junio C Hamano @ 2017-08-20 16:25 UTC (permalink / raw) To: Jeff King; +Cc: Simon Ruderich, git Jeff King <peff@peff.net> writes: > So that's one question I'm puzzled by: why does it work without an edit, > but fail with one. As this is the part of the system I long time ago declared a "works only some of the time" hack, I do not recall the exact details, but a key phrase "DIRTY" in the old discussion thread made me look. Doesn't coalesce_overlapping_hunks in add-interactive.perl play a role in the difference? I _think_ that one tries to do the right thing by enabling the logic before the "(e)dit" hack was introduced; I haven't actually checked out and looked the code before the addition of the option, but "add-p" should back then have actually understood the offsets and hunk length and merged the surviving hunks as needed to avoid feeding overlaps to "git apply"---it may even have fixed the offset before doing so. Edited hunks whose mechanical integrity has become dubious are marked with DIRTY, as the hunk touched by "(e)dit" codepath, by the time it reaches that helper function, no longer has trustable information in the correct offset information there. > My second question is how "add -p" could generate the corrected hunk > headers. > > In the non-edit case, we know that our second hunk's leading context > must always match the first hunk's trailing context (since that's how we > do a split). So we'd know that the preimage "-pos" field of the second > hunk header should be offset by the net sum of the added/changed lines > for any previous split hunks. > > In the original case we had net two added lines in the first hunk. So if > it's selected, that's how our "-1" becomes "-3". And after the edit, we > should be able to apply the same, I think? We have a single added line, > so it becomes "-2". That doesn't require knowing what was in the hunk > before the user edited at all. We just need to know which other hunks > it's split from, so we know which overlapping hunks we need to take into > account when recomputing. > > "add -p" certainly has that information. I _think_ git-apply also has > that information if it bothers to find out. I am not sure about the latter. If we declare that we assume the user never touches the hunk header line (i.e. "@@ ... @@") when editing the patch text (i.e. " /-/+" lines), the receiving end can probably guess by counting how many preimage and postimage lines are described in the hunk and then seeing the change from the numbers on the hunk header. I do not think the "--recount/--overlap" code trusts its input that much in the current code, and there may be a room to tweak that band aid with such an additional assumption. But I dunno. Once we go that route, and then if we want to assume less about random things the end-user may do to make it more robust, "add -p" would need to keep the hunk header before giving the hunk to the user, and then replace the (possibly modified) hunk header in the edited patch with the original to tell the receiving end what the original numbers were to help --recount logic. At that point, I do not think it is too big a stretch for the "(e)dit" to actually correct the hunk headers to not just express the correct offset but also show the right number of lines, so that it does not have to be marked with the "DIRTY" bit. That way, coalesce_overlapping_hunks logic can finish it off just like other "picked but unedited" hunks. And when that happens, we may not even need to run "apply" with the "--recount" and "--allow-overlap" options from this codepath to go back to the state before 8cbd4310 ("git-add--interactive: replace hunk recounting with apply --recount", 2008-07-02), where "add -p" did not fly blind and always knew (or at least "supposed to know") everything that is happening, which IMO is a preferrable endgame. If the language spoken between "add -p" and "apply" is "patch", we should try to stick to that language. Butchering the listener to allow this speaker to speak in loose and ambiguous terms, especially when we can teach the speaker to do so, was a big mistake. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: git add -p breaks after split on change at the top of the file 2017-08-17 8:41 ` Jeff King 2017-08-17 9:03 ` Jeff King @ 2017-08-17 19:08 ` Junio C Hamano 1 sibling, 0 replies; 8+ messages in thread From: Junio C Hamano @ 2017-08-17 19:08 UTC (permalink / raw) To: Jeff King; +Cc: Simon Ruderich, git Jeff King <peff@peff.net> writes: > [+cc Junio, as this gets deep into git-apply innards] I've written off --recount and --allow-overlap as ugly workaround that happens to work some of the time but cannot be trusted long time ago. IIRC, before the "(e)dit" thing was added to "add -p", we counted the line numbers correctly and merged the adjacent hunks before applying and neither of these two kluge was necessary. These threads may give us a bit more background: https://public-inbox.org/git/7viqk1ndlk.fsf@alter.siamese.dyndns.org/ https://public-inbox.org/git/1304117373-592-1-git-send-email-gitster@pobox.com/ The original that introduced the "(e)dit" thing was in this thread: https://public-inbox.org/git/200805232221.45406.trast@student.ethz.ch/ As you can see, I was very much against it, as it cannot fundamentally sanely implemented (which I think is the same conclusion you reached at the end of the current thread). I think there should be a better failure mode, though. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-08-20 16:25 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-08-16 20:25 git add -p breaks after split on change at the top of the file Simon Ruderich 2017-08-17 8:41 ` Jeff King 2017-08-17 9:03 ` Jeff King 2017-08-17 9:15 ` Jeff King 2017-08-17 19:22 ` Junio C Hamano 2017-08-20 8:19 ` Jeff King 2017-08-20 16:25 ` Junio C Hamano 2017-08-17 19:08 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).