* Can I convince the diff algorithm to behave better?
@ 2021-03-03 2:03 Tom Ritter
2021-03-03 12:41 ` Thomas Braun
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Tom Ritter @ 2021-03-03 2:03 UTC
To: git
(For a specific, nuanced, and personal definition of better...)
I frequently run into a behavior that arises when I am copy/pasting
chunks of code, typically in tests. Here is an example:
My Original code:
def function():
line 1
line 2
line 3
line 4
line 5
line 6
--------------------------------
I add, after it:
def function2():
line 1
line 2
line 3
line 4
line 5
line 6
--------------------------------
My diff is:
+ line 3
+ line 4
+ line 5
+ line 6
+
+def function2():
+ line 1
+ line 2
--------------------------------
I'd like my diff to be
+
+def function2():
+ line 1
+ line 2
+ line 3
+ line 4
+ line 5
+ line 6
Obviously there's nothing incorrect about the former diff; I just wish
git produced the latter instead.
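To make the ambiguity concrete, here is a minimal Python sketch showing that both diffs describe the same change. The helper name insert_at and the four-space indentation of the function bodies are assumptions made for illustration; they are not part of the original example.

```python
# Assumption: the function bodies are indented by four spaces.
body = [f"    line {i}" for i in range(1, 7)]
old = ["def function():"] + body
new = old + ["", "def function2():"] + body


def insert_at(lines, index, block):
    """Return a copy of lines with block inserted before position index."""
    return lines[:index] + block + lines[index:]


# The diff git reported: eight added lines placed right after the first "line 2".
reported = insert_at(old, 3, new[3:11])

# The diff that was wanted: eight added lines appended after the first "line 6".
wanted = insert_at(old, 7, new[7:15])

print(reported == new, wanted == new)  # prints: True True
```

Because both insertions reproduce the new file exactly, a diff tool is free to report either one; which it picks is a matter of heuristics, not correctness.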
I know that git includes four diff algorithms (myers, minimal, patience,
and histogram); in my testing, patience and histogram exacerbated the
problem, and none of them improved upon it. If anyone has suggestions,
I'd be curious to know if there's anything that could be done...
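For anyone who wants to repeat that comparison, the following Python sketch runs git diff once per algorithm on two files outside a repository. It assumes git is installed and on PATH; the function names and the file paths passed in are hypothetical. The --no-index and --diff-algorithm options are standard git diff options.

```python
import subprocess

# The four values accepted by git diff --diff-algorithm.
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def diff_command(old_path, new_path, algorithm):
    """Build the argv for comparing two plain files with one algorithm."""
    return [
        "git", "diff", "--no-index",
        f"--diff-algorithm={algorithm}",
        old_path, new_path,
    ]


def run_all(old_path, new_path):
    """Return a dict mapping each algorithm name to the diff text it produced."""
    results = {}
    for algorithm in ALGORITHMS:
        # git diff --no-index exits with status 1 when the files differ,
        # so the return code is deliberately not treated as an error.
        proc = subprocess.run(
            diff_command(old_path, new_path, algorithm),
            capture_output=True, text=True,
        )
        results[algorithm] = proc.stdout
    return results
```

Calling run_all("before.py", "after.py") (hypothetical file names) and comparing the four values shows whether any algorithm places the hunk differently.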
Thanks,
-tom
* Re: Can I convince the diff algorithm to behave better?
From: Thomas Braun @ 2021-03-03 12:41 UTC
To: Tom Ritter, git
On 3/3/2021 3:03 AM, Tom Ritter wrote:
Hi Tom,
> (For a specific, nuanced, and personal definition of better...)
>
> I have a frequent behavior that arises when I am copy/pasting chunks
> of code, typically in tests. Here is an example:
>
> My Original code:
>
> def function():
> line 1
> line 2
> line 3
> line 4
> line 5
> line 6
>
> --------------------------------
> I add, after it:
>
> def function2():
> line 1
> line 2
> line 3
> line 4
> line 5
> line 6
>
> --------------------------------
> My diff is:
>
> + line 3
> + line 4
> + line 5
> + line 6
> +
> +def function2():
> + line 1
> + line 2
>
> --------------------------------
> I'd like my diff to be
>
> +
> +def function2():
> + line 1
> + line 2
> + line 3
> + line 4
> + line 5
> + line 6
I tried to reproduce this and got exactly the diff you wanted. I had to
add a blank line after the first "line 4" to get the unwanted diff.
Commit:
+++ b/test.py
@@ -0,0 +1,7 @@
+def function():
+ line 1
+ line 2
+ line 3
+ line 4
+ line 5
+ line 6
and then the following change:
--- a/test.py
+++ b/test.py
@@ -3,5 +3,14 @@ def function():
  line 2
  line 3
  line 4
+
+ line 5
+ line 6
+
+def function2():
+ line 1
+ line 2
+ line 3
+ line 4
  line 5
  line 6
I usually play around with --anchored when I want to solve an issue like
that.
The documentation for --anchored says:
If a line exists in both the source and destination, exists only once,
and starts with this text, this algorithm attempts to prevent it from
appearing as a deletion or addition in the output. It uses the "patience
diff" algorithm internally.
But I can't get it to work here, because the "exists only once" premise does not hold.
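The premise can be checked with a rough Python sketch of the rule as the quoted documentation words it. This is not git's implementation; the function name anchor_candidates and the four-space indentation are assumptions for illustration.

```python
from collections import Counter

# Assumption: the function bodies are indented by four spaces.
body = [f"    line {i}" for i in range(1, 7)]
old = ["def function():"] + body
new = old + ["", "def function2():"] + body


def anchor_candidates(old_lines, new_lines, prefix):
    """List lines that start with prefix and occur exactly once on each side."""
    old_counts = Counter(old_lines)
    new_counts = Counter(new_lines)
    return [
        line for line in old_lines
        if line.startswith(prefix)
        and old_counts[line] == 1
        and new_counts[line] == 1
    ]


# "line 3" occurs twice in the new file, so it cannot serve as an anchor.
print(anchor_candidates(old, new, "    line 3"))      # prints: []
# "def function2():" does not exist in the old file at all.
print(anchor_candidates(old, new, "def function2"))   # prints: []
```

Every body line is duplicated after the paste, and the only new unique line is absent from the source, so no line that would help here meets the documented condition.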
Stepping back: it might also make sense to rethink the code, since repeating
the same 6 lines in every function might not be the best possible design.
Thomas
[...]
* Re: Can I convince the diff algorithm to behave better?
From: Jonathan Tan @ 2021-03-03 23:45 UTC
To: tom; +Cc: git, Jonathan Tan
> I know that git includes four diff algorithms; in my testing patience
> or histogram exacerbated the problem; and none of them improved upon
> it. If anyone has suggestions I'd be curious to know if there's
> anything that could be done...
In your particular case, I can't think of anything, but in general, if
one of the lines weren't repeated, you might be able to use the
--anchored option.
* Re: Can I convince the diff algorithm to behave better?
From: Christian Couder @ 2021-03-04 9:52 UTC
To: Tom Ritter; +Cc: git
On Thu, Mar 4, 2021 at 8:37 AM Tom Ritter <tom@ritter.vg> wrote:
[...]
> Obviously there's nothing incorrect about the former diff, I just wish
> it was the latter rather than the former.
>
> I know that git includes four diff algorithms; in my testing patience
> or histogram exacerbated the problem; and none of them improved upon
> it. If anyone has suggestions I'd be curious to know if there's
> anything that could be done...
It's not so easy to implement good diff algorithms. You might want to
take a look at the "v2.11 new diff heuristic?" article in:
https://git.github.io/rev_news/2016/12/14/edition-22/
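As a toy illustration of why choosing among equivalent hunks is a heuristic problem, the Python sketch below enumerates every position where one contiguous insertion turns the old file into the new one, then picks a position with a made-up score. The scoring rule, the function names, and the four-space indentation are all assumptions; this is not the heuristic git uses.

```python
# Assumption: the function bodies are indented by four spaces.
body = [f"    line {i}" for i in range(1, 7)]
old = ["def function():"] + body
new = old + ["", "def function2():"] + body


def slide_positions(old_lines, new_lines):
    """Yield (index, block) for each single insertion that yields new_lines."""
    size = len(new_lines) - len(old_lines)
    for index in range(len(old_lines) + 1):
        block = new_lines[index:index + size]
        if old_lines[:index] + block + old_lines[index:] == new_lines:
            yield index, block


def toy_score(block):
    """Hypothetical score (lower is better), not git's actual rule."""
    first = block[0]
    if first.strip() == "":
        return 0  # the hunk starts on a blank separator line
    return 1 + len(first) - len(first.lstrip())  # penalise indented starts


def pick(old_lines, new_lines):
    """Choose the candidate insertion with the lowest toy score."""
    return min(slide_positions(old_lines, new_lines),
               key=lambda candidate: toy_score(candidate[1]))


index, block = pick(old, new)
print(index, block[:2])  # prints: 7 ['', 'def function2():']
```

For this example there are seven equally valid insertion points (indices 1 through 7), and the toy score happens to select the one that was wanted. Real inputs are messier, which is part of why a robust heuristic is hard to design.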
Best,
Christian.