git@vger.kernel.org mailing list mirror (one of many)
* Can I convince the diff algorithm to behave better?
@ 2021-03-03  2:03 Tom Ritter
  2021-03-03 12:41 ` Thomas Braun
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Tom Ritter @ 2021-03-03  2:03 UTC (permalink / raw)
  To: git

(For a specific, nuanced, and personal definition of better...)

I frequently run into the following when copy/pasting chunks
of code, typically in tests.  Here is an example:

My Original code:

def function():
   line 1
   line 2
   line 3
   line 4
   line 5
   line 6

--------------------------------
I add, after it:

def function2():
   line 1
   line 2
   line 3
   line 4
   line 5
   line 6

--------------------------------
My diff is:

+   line 3
+   line 4
+   line 5
+   line 6
+
+def function2():
+   line 1
+   line 2

--------------------------------
I'd like my diff to be

+
+def function2():
+   line 1
+   line 2
+   line 3
+   line 4
+   line 5
+   line 6


Obviously there's nothing incorrect about the former diff; I just wish
it were the latter.
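
To make the "nothing incorrect" point concrete, here is a minimal Python
sketch. It is only an illustration: OLD, NEW and apply_insertion are
hypothetical names, and the file contents are reconstructed from the
example above. It checks that both diffs, applied to the original code,
produce the same file.

```python
# Hypothetical reconstruction of the example file from this message.
BODY = [f"   line {i}" for i in range(1, 7)]
OLD = ["def function():"] + BODY
NEW = OLD + ["", "def function2():"] + BODY


def apply_insertion(old, position, added):
    """Return a copy of `old` with `added` inserted before index `position`."""
    return old[:position] + added + old[position:]


# The reported diff: eight lines inserted before the first "line 3" (index 3).
actual = apply_insertion(OLD, 3, BODY[2:] + ["", "def function2():"] + BODY[:2])

# The preferred diff: eight lines appended after the first "line 6" (index 7).
wanted = apply_insertion(OLD, 7, ["", "def function2():"] + BODY)

# Both insertions reconstruct the same new file.
assert actual == wanted == NEW
```

Both are valid descriptions of the same change, so the choice between
them is purely one of readability.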

I know that git includes four diff algorithms. In my testing, patience
or histogram exacerbated the problem, and none of them improved upon
it.  If anyone has suggestions, I'd be curious to know if there's
anything that could be done...

Thanks,
-tom


* Re: Can I convince the diff algorithm to behave better?
From: Thomas Braun @ 2021-03-03 12:41 UTC (permalink / raw)
  To: Tom Ritter, git

On 3/3/2021 3:03 AM, Tom Ritter wrote:

Hi Tom,

> (For a specific, nuanced, and personal definition of better...)
> 
> I frequently run into the following when copy/pasting chunks
> of code, typically in tests.  Here is an example:
> 
> My Original code:
> 
> def function():
>    line 1
>    line 2
>    line 3
>    line 4
>    line 5
>    line 6
> 
> --------------------------------
> I add, after it:
> 
> def function2():
>    line 1
>    line 2
>    line 3
>    line 4
>    line 5
>    line 6
> 
> --------------------------------
> My diff is:
> 
> +   line 3
> +   line 4
> +   line 5
> +   line 6
> +
> +def function2():
> +   line 1
> +   line 2
> 
> --------------------------------
> I'd like my diff to be
> 
> +
> +def function2():
> +   line 1
> +   line 2
> +   line 3
> +   line 4
> +   line 5
> +   line 6

I tried to reproduce this and got exactly the diff you wanted. I had to
add a blank line after the first "line 4" to get the unwanted diff.

Commit:

+++ b/test.py
@@ -0,0 +1,7 @@
+def function():
+    line 1
+    line 2
+    line 3
+    line 4
+    line 5
+    line 6

and then the following change:

--- a/test.py
+++ b/test.py
@@ -3,5 +3,14 @@ def function():
     line 2
     line 3
     line 4
+
+    line 5
+    line 6
+
+def function2():
+    line 1
+    line 2
+    line 3
+    line 4
     line 5
     line 6

I usually play around with --anchored when I want to solve an issue like
that.

The documentation for --anchored says:

If a line exists in both the source and destination, exists only once,
and starts with this text, this algorithm attempts to prevent it from
appearing as a deletion or addition in the output. It uses the "patience
diff" algorithm internally.

But I can't get it to work here, because the "exists only once" premise
does not hold.
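
As a rough illustration of why that premise fails, here is a minimal
Python sketch. It is not git's implementation: anchor_candidates, OLD and
NEW are hypothetical names, the file contents are reconstructed from the
example quoted above, and the sketch compares whole lines instead of the
"starts with this text" prefix match that --anchored uses.

```python
from collections import Counter

# Hypothetical reconstruction of the example file quoted above.
BODY = [f"   line {i}" for i in range(1, 7)]
OLD = ["def function():"] + BODY
NEW = OLD + ["", "def function2():"] + BODY


def anchor_candidates(old, new):
    """Return lines of `old` that occur exactly once in both `old` and `new`."""
    old_counts, new_counts = Counter(old), Counter(new)
    return [
        line
        for line in old
        if old_counts[line] == 1 and new_counts[line] == 1
    ]


candidates = anchor_candidates(OLD, NEW)
print(candidates)
```

Under these assumptions the only eligible line is "def function():",
which already appears as unchanged context, so anchoring on it would not
move the added lines. Every "line N" appears twice in the new file and is
therefore ruled out.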

Stepping back: It might also make sense to rethink the code as repeating
the same 6 lines in every function might not be the best possible design.

Thomas

[...]


* Re: Can I convince the diff algorithm to behave better?
From: Jonathan Tan @ 2021-03-03 23:45 UTC (permalink / raw)
  To: tom; +Cc: git, Jonathan Tan

> I know that git includes four diff algorithms. In my testing, patience
> or histogram exacerbated the problem, and none of them improved upon
> it.  If anyone has suggestions, I'd be curious to know if there's
> anything that could be done...

In your particular case, I can't think of anything, but in general, if
one of the lines weren't repeated, you might be able to use the
--anchored option.


* Re: Can I convince the diff algorithm to behave better?
From: Christian Couder @ 2021-03-04  9:52 UTC (permalink / raw)
  To: Tom Ritter; +Cc: git

On Thu, Mar 4, 2021 at 8:37 AM Tom Ritter <tom@ritter.vg> wrote:

[...]

> Obviously there's nothing incorrect about the former diff; I just wish
> it were the latter.
>
> I know that git includes four diff algorithms. In my testing, patience
> or histogram exacerbated the problem, and none of them improved upon
> it.  If anyone has suggestions, I'd be curious to know if there's
> anything that could be done...

It's not so easy to implement good diff algorithms. You might want to
take a look at the "v2.11 new diff heuristic?" article in:

https://git.github.io/rev_news/2016/12/14/edition-22/
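
For context, a pure insertion like the one in this thread can usually be
"slid" up or down without changing the resulting file, and, as far as I
understand it, the heuristic discussed in that article is about choosing
among such equivalent positions. The following Python sketch is not git's
implementation; slide_positions and the reconstructed NEW file are
assumptions made for illustration. It lists every position at which the
same eight added lines could be reported.

```python
# Hypothetical reconstruction of the new file from the original message.
BODY = [f"   line {i}" for i in range(1, 7)]
NEW = ["def function():"] + BODY + ["", "def function2():"] + BODY


def slide_positions(new, start, end):
    """Return all start indices to which the added block new[start:end]
    can be slid while still describing the same change."""
    length = end - start
    positions = [start]

    # Slide up: the line just above the block must equal the block's last line.
    s = start
    while s > 0 and new[s - 1] == new[s + length - 1]:
        s -= 1
        positions.append(s)

    # Slide down: the line just below the block must equal the block's first line.
    s = start
    while s + length < len(new) and new[s] == new[s + length]:
        s += 1
        positions.append(s)

    return sorted(positions)


# Start from the preferred diff: the last eight lines of the file are added.
positions = slide_positions(NEW, 7, 15)
print(positions)
```

Under these assumptions, the diff reported at the top of the thread
corresponds to start index 3 and the preferred one to start index 7; both
are in the returned list, which is why neither is wrong and a heuristic
has to pick one.
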

Best,
Christian.

