git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] Documentation/diff-options: explain different diff algorithms
@ 2018-07-24  0:36 Stefan Beller
  2018-07-24  4:40 ` Jonathan Nieder
  2018-08-10  0:10 ` [PATCH 0/2] Getting data on different diff algorithms WAS: " Stefan Beller
  0 siblings, 2 replies; 13+ messages in thread
From: Stefan Beller @ 2018-07-24  0:36 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

As a user I wondered what the diff algorithms are about. Offer at least
a basic explanation on the differences of the diff algorithms.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f466600972f..0d765482027 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -94,16 +94,34 @@ diff" algorithm internally.
 	Choose a diff algorithm. The variants are as follows:
 +
 --
-`default`, `myers`;;
-	The basic greedy diff algorithm. Currently, this is the default.
 `minimal`;;
-	Spend extra time to make sure the smallest possible diff is
-	produced.
+	A diff as produced by the basic greedy algorithm described in
+	link:http://www.xmailserver.org/diff2.pdf[An O(ND) Difference Algorithm and its Variations]
+`default`, `myers`;;
+	The same algorithm as `minimal`, extended with a heuristic to
+	reduce extensive searches. Currently, this is the default.
 `patience`;;
-	Use "patience diff" algorithm when generating patches.
+	Use "patience diff" algorithm when generating patches. This
+	matches the longest common subsequence of unique lines on
+	both sides, recursively. It obtained its name by the way the
+	longest subsequence is found, as that is a byproduct of the
+	patience sorting algorithm. If there are no unique lines left
+	it falls back to `myers`. Empirically this algorithm produces
+	a more readable output for code, but it does not garantuee
+	the shortest output.
 `histogram`;;
-	This algorithm extends the patience algorithm to "support
-	low-occurrence common elements".
+	This algorithm re-implements the `patience` algorithm with
+	"support of low-occurrence common elements" and only picks
+	one element of the LCS for the recursion. It is often the
+	fastest, but in cornercases (when there are many longest
+	common subsequences of the same length) it produces bad
+	results as seen in:
+
+		seq 1 100 >one
+		echo 99 > two
+		seq 1 2 98 >>two
+		git diff --no-index --histogram one two
+
 --
 +
 For instance, if you configured diff.algorithm variable to a
-- 
2.18.0.345.g5c9ce644c3-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-08-10 22:19 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-24  0:36 [PATCH] Documentation/diff-options: explain different diff algorithms Stefan Beller
2018-07-24  4:40 ` Jonathan Nieder
2018-07-24 17:38   ` Stefan Beller
2018-07-24 20:06     ` Junio C Hamano
2018-08-06 22:25   ` Stefan Beller
2018-08-06 23:18     ` Jonathan Nieder
2018-08-07 15:56       ` Junio C Hamano
2018-08-09 19:26         ` Stefan Beller
2018-08-10 22:18           ` Stefan Beller
2018-08-09 19:51       ` Stefan Beller
2018-08-10  0:10 ` [PATCH 0/2] Getting data on different diff algorithms WAS: " Stefan Beller
2018-08-10  0:10   ` [PATCH 1/2] WIP: range-diff: take extra arguments for different diffs Stefan Beller
2018-08-10  0:10   ` [PATCH 2/2] WIP range-diff: print some statistics about the range Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).