git add -p breaks after split on change at the top of the file

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* git add -p breaks after split on change at the top of the file
@ 2017-08-16 20:25 Simon Ruderich
  2017-08-17  8:41 ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Ruderich @ 2017-08-16 20:25 UTC (permalink / raw)
  To: git

Hello,

The following session reproduces the issue with Git 2.14.1:

    $ git init test
    $ cd test
    $ echo x >file
    $ git add file
    $ git commit -m commit
    $ printf 'a\nb\nx\nc\n' >file

    $ git add -p
    diff --git a/file b/file
    index 587be6b..74a69a0 100644
    --- a/file
    +++ b/file
    @@ -1 +1,4 @@
    +a
    +b
     x
    +c
    Stage this hunk [y,n,q,a,d,/,s,e,?]? <-- press s
    Split into 2 hunks.
    @@ -1 +1,3 @@
    +a
    +b
     x
    Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? <-- press e

    Now delete the line "+a" in your editor and save.

    error: patch failed: file:1
    error: file: patch does not apply

I expected git add -p to stage this change without error. It
works fine without splitting the hunk (by deleting the first and
last + line in the diff).

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git add -p breaks after split on change at the top of the file
  2017-08-16 20:25 git add -p breaks after split on change at the top of the file Simon Ruderich
@ 2017-08-17  8:41 ` Jeff King
  2017-08-17  9:03   ` Jeff King
  2017-08-17 19:08   ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Jeff King @ 2017-08-17  8:41 UTC (permalink / raw)
  To: Simon Ruderich; +Cc: Junio C Hamano, git

[+cc Junio, as this gets deep into git-apply innards]

On Wed, Aug 16, 2017 at 10:25:04PM +0200, Simon Ruderich wrote:

>     $ git add -p
>     diff --git a/file b/file
>     index 587be6b..74a69a0 100644
>     --- a/file
>     +++ b/file
>     @@ -1 +1,4 @@
>     +a
>     +b
>      x
>     +c
>     Stage this hunk [y,n,q,a,d,/,s,e,?]? <-- press s
>     Split into 2 hunks.
>     @@ -1 +1,3 @@
>     +a
>     +b
>      x
>     Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? <-- press e
> 
>     Now delete the line "+a" in your editor and save.
> 
>     error: patch failed: file:1
>     error: file: patch does not apply
> 
> I expected git add -p to stage this change without error. It
> works fine without splitting the hunk (by deleting the first and
> last + line in the diff).

Interesting case. The problem isn't in add--interactive itself (I don't
think), but in git-apply. This is the diff we end up feeding to "git
apply --cached --check --recount --allow-overlap" to see if it applies:

  diff --git a/file b/file
  index 587be6b..74a69a0 100644
  --- a/file
  +++ b/file
  @@ -1 +1,3 @@
  +b
   x
  @@ -1 +3,2 @@
   x
  +c

The first hunk (that we edited) applies fine. But the second one does
not. Its hunk header says that it starts at line "1", so we expect to
find it at the beginning of the file. But of course it _isn't_ at the
beginning of the file anymore, because the first hunk added a line
before there.

So this diff is somewhat bogus; it has two hunks which start at the same
spot. But I think that's exactly the sort of thing that
"--allow-overlap" should handle. Doing this makes your case work for me:

diff --git a/apply.c b/apply.c
index 41ee63e1db..606db58218 100644
--- a/apply.c
+++ b/apply.c
@@ -2966,8 +2966,9 @@ static int apply_one_fragment(struct apply_state *state,
 	 * In other words, a hunk that is (frag->oldpos <= 1) with or
 	 * without leading context must match at the beginning.
 	 */
-	match_beginning = (!frag->oldpos ||
-			   (frag->oldpos == 1 && !state->unidiff_zero));
+	match_beginning = (nth_fragment == 1 &&
+			   (!frag->oldpos ||
+			    (frag->oldpos == 1 && !state->unidiff_zero)));

 	/*
 	 * A hunk without trailing lines must match at the end.

But I'm not quite sure if that's right. The rest of the overlap code
seems to mark patched lines with a flag. Meaning that instead of giving
up and saying "well, this is the second line so we can't ever try
matching the beginning", we should be redefining "beginning" in that
case to allow 0 or more PATCHED lines starting from the beginning of the
file.

-Peff

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: git add -p breaks after split on change at the top of the file
  2017-08-17  8:41 ` Jeff King
@ 2017-08-17  9:03   ` Jeff King
  2017-08-17  9:15     ` Jeff King
  2017-08-17 19:08   ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff King @ 2017-08-17  9:03 UTC (permalink / raw)
  To: Simon Ruderich; +Cc: Junio C Hamano, git

On Thu, Aug 17, 2017 at 04:41:09AM -0400, Jeff King wrote:

> diff --git a/apply.c b/apply.c
> index 41ee63e1db..606db58218 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -2966,8 +2966,9 @@ static int apply_one_fragment(struct apply_state *state,
>  	 * In other words, a hunk that is (frag->oldpos <= 1) with or
>  	 * without leading context must match at the beginning.
>  	 */
> -	match_beginning = (!frag->oldpos ||
> -			   (frag->oldpos == 1 && !state->unidiff_zero));
> +	match_beginning = (nth_fragment == 1 &&
> +			   (!frag->oldpos ||
> +			    (frag->oldpos == 1 && !state->unidiff_zero)));
>  
>  	/*
>  	 * A hunk without trailing lines must match at the end.
> 
> 
> But I'm not quite sure if that's right. The rest of the overlap code
> seems to mark patched lines with a flag. Meaning that instead of giving
> up and saying "well, this is the second line so we can't ever try
> matching the beginning", we should be redefining "beginning" in that
> case to allow 0 or more PATCHED lines starting from the beginning of the
> file.

Hmm, actually I was reading that part of the code backwards. We record
the PATCHED flag _only_ when allow_overlap isn't set, not the other way
around. So I had been imagining we'd want something like this:

diff --git a/apply.c b/apply.c
index 41ee63e1db..4048751807 100644
--- a/apply.c
+++ b/apply.c
@@ -2489,8 +2489,11 @@ static int match_fragment(struct apply_state *state,
 		return 0;
 	}
 
-	if (match_beginning && try_lno)
-		return 0;
+	if (match_beginning) {
+		for (i = 0; i < try_lno; i++)
+			if (!(img->line[i].flag & LINE_PATCHED))
+				return 0;
+	}
 
 	/* Quick hash check */
 	for (i = 0; i < preimage_limit; i++)

But that does the opposite of what we want: it makes this case work when
--allow-overlap isn't specified. I think my first attempt is probably
closer to the right direction (but we'd want to have it kick in only
when --allow-overlap is specified; well formed patches shouldn't overlap
but we'd want to barf loudly if they do).

I'll stop here for now before digging any further. I'm not all that
familiar with the apply code, so I have a feeling Junio's comments may
stop me from going down an unproductive dead end. :)

-Peff

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: git add -p breaks after split on change at the top of the file
  2017-08-17  9:03   ` Jeff King
@ 2017-08-17  9:15     ` Jeff King
  2017-08-17 19:22       ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2017-08-17  9:15 UTC (permalink / raw)
  To: Simon Ruderich; +Cc: Junio C Hamano, git

On Thu, Aug 17, 2017 at 05:03:08AM -0400, Jeff King wrote:

> But that does the opposite of what we want: it makes this case work when
> --allow-overlap isn't specified. I think my first attempt is probably
> closer to the right direction (but we'd want to have it kick in only
> when --allow-overlap is specified; well formed patches shouldn't overlap
> but we'd want to barf loudly if they do).
> 
> I'll stop here for now before digging any further. I'm not all that
> familiar with the apply code, so I have a feeling Junio's comments may
> stop me from going down an unproductive dead end. :)

OK, last email, I promise. :)

My first attempt which turns off match_beginning for the overlapping
hunk really does feel wrong. It would blindly match a similar hunk later
in the file. So if you did something like:

  1. The first hunk deletes context line "x".

  2. The second overlapping hunk claims to start at the beginning of the
     file (line 1), and then adds a line after "x".

  3. Later in the file, we have another line "x".

Then the second hunk should be rejected, and not find the unrelated "x"
from (3). But with my patch, I suspect it would be found (because we
dropped the requirement to find it at the beginning).

Of course this is a pretty ridiculous input in the first place. In
theory it _could_ be figured out, but overlapping hunks like this are
always going to cause problems (in this case, context from the second
hunk became a removal, and the second hunk no longer applies).

So maybe it's not worth caring about.  But I think the most robust
solution would involve checking the lines between try_lno and the start
to make sure they were all added. I just don't know that we have an easy
way of checking that. Perhaps if we recorded the cumulative number of
lines added in previous hunks, and used that as our starting point
(instead of "line 0").

I also suspect that match_end has a similar problem, but I couldn't
trigger it in practice (perhaps it's impossible, or perhaps my brain is
just not up to the challenge today).

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git add -p breaks after split on change at the top of the file
  2017-08-17  8:41 ` Jeff King
  2017-08-17  9:03   ` Jeff King
@ 2017-08-17 19:08   ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2017-08-17 19:08 UTC (permalink / raw)
  To: Jeff King; +Cc: Simon Ruderich, git

Jeff King <peff@peff.net> writes:

> [+cc Junio, as this gets deep into git-apply innards]

I've written off --recount and --allow-overlap as ugly workaround
that happens to work some of the time but cannot be trusted long
time ago.

IIRC, before the "(e)dit" thing was added to "add -p", we counted
the line numbers correctly and merged the adjacent hunks before
applying and neither of these two kluge was necessary.

These threads may give us a bit more background:

  https://public-inbox.org/git/7viqk1ndlk.fsf@alter.siamese.dyndns.org/
  https://public-inbox.org/git/1304117373-592-1-git-send-email-gitster@pobox.com/

The original that introduced the "(e)dit" thing was in this thread:

  https://public-inbox.org/git/200805232221.45406.trast@student.ethz.ch/

As you can see, I was very much against it, as it cannot
fundamentally sanely implemented (which I think is the same
conclusion you reached at the end of the current thread).

I think there should be a better failure mode, though.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git add -p breaks after split on change at the top of the file
  2017-08-17  9:15     ` Jeff King
@ 2017-08-17 19:22       ` Junio C Hamano
  2017-08-20  8:19         ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2017-08-17 19:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Simon Ruderich, git

Jeff King <peff@peff.net> writes:

> Of course this is a pretty ridiculous input in the first place. In
> theory it _could_ be figured out, but overlapping hunks like this are
> always going to cause problems (in this case, context from the second
> hunk became a removal, and the second hunk no longer applies).

To be honest, I do not think it could be figured out by "git apply",
but "git add -p" _should_ be able to.  I've said already this
earlier in the discussion:

  https://public-inbox.org/git/7vab5cn7wr.fsf@alter.siamese.dyndns.org/

"add -p" knows how the hunk _before_ it gave the user to edit looked
like, and it knows how the hunk after the user edited looks like, so
it may have enough information to figure it out.  But the "(e)dit"
feature was done in a way to discard that information and forced an
impossible "guess what the correct shape of the patch would be, out
of this broken patch input" with --recount and --allow-overlap.  

If we want to make "add -p/(e)dit" work in the corner cases and make
it trustworthy, "apply" is a wrong tree to back at.  A major part of
the effort to fix would go to "add -p", not to "apply".

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git add -p breaks after split on change at the top of the file
  2017-08-17 19:22       ` Junio C Hamano
@ 2017-08-20  8:19         ` Jeff King
  2017-08-20 16:25           ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2017-08-20  8:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Simon Ruderich, git

On Thu, Aug 17, 2017 at 12:22:19PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Of course this is a pretty ridiculous input in the first place. In
> > theory it _could_ be figured out, but overlapping hunks like this are
> > always going to cause problems (in this case, context from the second
> > hunk became a removal, and the second hunk no longer applies).
> 
> To be honest, I do not think it could be figured out by "git apply",
> but "git add -p" _should_ be able to.  I've said already this
> earlier in the discussion:
> 
>   https://public-inbox.org/git/7vab5cn7wr.fsf@alter.siamese.dyndns.org/
> 
> "add -p" knows how the hunk _before_ it gave the user to edit looked
> like, and it knows how the hunk after the user edited looks like, so
> it may have enough information to figure it out.  But the "(e)dit"
> feature was done in a way to discard that information and forced an
> impossible "guess what the correct shape of the patch would be, out
> of this broken patch input" with --recount and --allow-overlap.
> 
> If we want to make "add -p/(e)dit" work in the corner cases and make
> it trustworthy, "apply" is a wrong tree to back at.  A major part of
> the effort to fix would go to "add -p", not to "apply".

In theory "add -p" could give all the information it has to git-apply in
some form, and it could figure it out. That may sound crazy, but I just
wonder if it makes things easier because git-apply already knows quite a
lot about how to parse and interpret diffs. Whereas "add -p" is largely
stupid and just parroting back lines.

But I certainly wouldn't make such a decision until figuring out the
actual algorithm, which should then make it obvious where is the best
place to implement it. :)

So just thinking about this particular situation. We have two hunks
after the split:

  @@ -1, +1,3 @@
  +a
  +b
   x

and

  @@ -1, +3,2 @@
   x
  +c

The root of the issue, I think, is that the hunk header for the second
one is bogus. It claims to start at the beginning, but of course if the
other hunk is applied first, it doesn't. The right header in that case
would be (assuming we do not copy the extra context):

  @@ -3, +3,2 @@
   x
  +c

One point of interest is that without an (e)dit, we get this case right,
despite the bogus hunk header. I'm not quite sure why that is. After the
failing edit we would end up with:

  @@ -1, +1,3 @@
  +b
   x

  @@ -1, +3,2 @@
   x
  +c

Clearly the count in the first one is now wrong, but we fix that with
--recount. But it seems to me that the second hunk is equally wrong as
it would be in the non-edit case. I guess in addition to the "-1" being
a "-2", the "+3" should now also be a "+2". But I didn't think that
would impact us finding the preimage side.

So that's one question I'm puzzled by: why does it work without an edit,
but fail with one.

My second question is how "add -p" could generate the corrected hunk
headers.

In the non-edit case, we know that our second hunk's leading context
must always match the first hunk's trailing context (since that's how we
do a split). So we'd know that the preimage "-pos" field of the second
hunk header should be offset by the net sum of the added/changed lines
for any previous split hunks.

In the original case we had net two added lines in the first hunk. So if
it's selected, that's how our "-1" becomes "-3". And after the edit, we
should be able to apply the same, I think? We have a single added line,
so it becomes "-2". That doesn't require knowing what was in the hunk
before the user edited at all. We just need to know which other hunks
it's split from, so we know which overlapping hunks we need to take into
account when recomputing.

"add -p" certainly has that information. I _think_ git-apply also has
that information if it bothers to find out. Right now --allow-overlap is
just "do not complain of overlap". But it could actively recount the
offsets when it detects overlap.

I'm assuming here that the edit process isn't changing the hunk headers,
introducing new hunks, etc. I feel like all bets are off, then.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git add -p breaks after split on change at the top of the file
  2017-08-20  8:19         ` Jeff King
@ 2017-08-20 16:25           ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2017-08-20 16:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Simon Ruderich, git

Jeff King <peff@peff.net> writes:

> So that's one question I'm puzzled by: why does it work without an edit,
> but fail with one.

As this is the part of the system I long time ago declared a "works
only some of the time" hack, I do not recall the exact details, but
a key phrase "DIRTY" in the old discussion thread made me look.

Doesn't coalesce_overlapping_hunks in add-interactive.perl play a
role in the difference?  I _think_ that one tries to do the right
thing by enabling the logic before the "(e)dit" hack was introduced;
I haven't actually checked out and looked the code before the
addition of the option, but "add-p" should back then have actually
understood the offsets and hunk length and merged the surviving
hunks as needed to avoid feeding overlaps to "git apply"---it may
even have fixed the offset before doing so.  Edited hunks whose
mechanical integrity has become dubious are marked with DIRTY, as
the hunk touched by "(e)dit" codepath, by the time it reaches that
helper function, no longer has trustable information in the correct
offset information there.

> My second question is how "add -p" could generate the corrected hunk
> headers.
>
> In the non-edit case, we know that our second hunk's leading context
> must always match the first hunk's trailing context (since that's how we
> do a split). So we'd know that the preimage "-pos" field of the second
> hunk header should be offset by the net sum of the added/changed lines
> for any previous split hunks.
>
> In the original case we had net two added lines in the first hunk. So if
> it's selected, that's how our "-1" becomes "-3". And after the edit, we
> should be able to apply the same, I think? We have a single added line,
> so it becomes "-2". That doesn't require knowing what was in the hunk
> before the user edited at all. We just need to know which other hunks
> it's split from, so we know which overlapping hunks we need to take into
> account when recomputing.
>
> "add -p" certainly has that information. I _think_ git-apply also has
> that information if it bothers to find out.

I am not sure about the latter.  If we declare that we assume the
user never touches the hunk header line (i.e. "@@ ... @@") when
editing the patch text (i.e. " /-/+" lines), the receiving end can
probably guess by counting how many preimage and postimage lines are
described in the hunk and then seeing the change from the numbers on
the hunk header.  

I do not think the "--recount/--overlap" code trusts its input that
much in the current code, and there may be a room to tweak that band
aid with such an additional assumption.

But I dunno.  Once we go that route, and then if we want to assume
less about random things the end-user may do to make it more robust,
"add -p" would need to keep the hunk header before giving the hunk
to the user, and then replace the (possibly modified) hunk header in
the edited patch with the original to tell the receiving end what
the original numbers were to help --recount logic.  At that point, I
do not think it is too big a stretch for the "(e)dit" to actually
correct the hunk headers to not just express the correct offset but
also show the right number of lines, so that it does not have to be
marked with the "DIRTY" bit.  That way, coalesce_overlapping_hunks
logic can finish it off just like other "picked but unedited" hunks.

And when that happens, we may not even need to run "apply" with the
"--recount" and "--allow-overlap" options from this codepath to go
back to the state before 8cbd4310 ("git-add--interactive: replace
hunk recounting with apply --recount", 2008-07-02), where "add -p"
did not fly blind and always knew (or at least "supposed to know")
everything that is happening, which IMO is a preferrable endgame.
If the language spoken between "add -p" and "apply" is "patch", we
should try to stick to that language.  Butchering the listener to
allow this speaker to speak in loose and ambiguous terms, especially
when we can teach the speaker to do so, was a big mistake.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-08-20 16:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-16 20:25 git add -p breaks after split on change at the top of the file Simon Ruderich
2017-08-17  8:41 ` Jeff King
2017-08-17  9:03   ` Jeff King
2017-08-17  9:15     ` Jeff King
2017-08-17 19:22       ` Junio C Hamano
2017-08-20  8:19         ` Jeff King
2017-08-20 16:25           ` Junio C Hamano
2017-08-17 19:08   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).