git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <johannes.schindelin@gmx.de>
To: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
Cc: git@vger.kernel.org, Julia Lawall <julia.lawall@lip6.fr>
Subject: Re: GNU diff and git diff - difference on myers algorithm?
Date: Tue, 09 Jun 2015 10:25:44 +0200	[thread overview]
Message-ID: <0add7d95076f5b112af90d8566c29203@www.dscho.org> (raw)
In-Reply-To: <CAB=NE6XRnKAY6t+dxT7vO_4wqngXvULh-CqezEAs2r99FkNCTg@mail.gmail.com>

Hi Luis,

On 2015-06-08 20:34, Luis R. Rodriguez wrote:
> Based on a cursory review of the git code I get the impression that
> GNU diff and git 'diff' do not share any code for the possible diff
> algorithms.

Indeed, Git's diff machinery is based[*1*] ofn libxdiff[*2*], not on GNU diff.

> I'm in particularly curious more about the default "myers"
> algorithm.

Are you looking for a freely available implementation of the Myers algorithm? Or are you interested in understanding it?

Please note that Myers' algorithm is just one first step in most diff implementations (and that other diff algorithms have become popular, in particular because comparing strings can be accelerated by hashing the text lines first, and those hashes can also be used to identify matching pairs of unique lines, giving rise to yet another huge performance boost for typical uses).

The reason why Myers' algorithm is not sufficient for diff implementations is that it only optimizes the "edit distance", i.e. the amount of added/removed lines, while patches should be readable, too, i.e. prefer *consecutive* edits to disjunct ones.

Just to mention one post-processing technique that is so useful that I implemented it for Git[*3*]: the "patience diff" algorithm of Bram Cohen (of BitTorrent fame) finds matching pairs of unique lines -- think of a function from which another function is refactored, for example, intuitively you want the diff to keep the signature of the original function as a context line.

Disclaimer: While it is true that Gene and I shared an office for one month, and that I am once again working in the same institute as he does, all my knowledge about this algorithm stems from my reading his paper and implementing the algorithm in Java for use in JGit[*3*].

> I can take time to do a precise code review of the
> algorithms used on both GNU diff and git but if someone can already
> vet for any differences that'd be appreciated as it would save time.

Again, I am curious what your goal is? I am sure I can support your quest better when I understand what the purpose of this code review should be.

Ciao,
Johannes

Footnote *1*: https://github.com/git/git/commit/3443546f6ef57fe28ea5cca232df8e400bfc3883
Footnote *2*: http://www.xmailserver.org/xdiff-lib.html
Footnote *3*: https://github.com/git/git/blob/master/xdiff/xpatience.c
Footnote *4*: https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/diff/MyersDiff.java

  reply	other threads:[~2015-06-09  8:25 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-08 18:34 GNU diff and git diff - difference on myers algorithm? Luis R. Rodriguez
2015-06-09  8:25 ` Johannes Schindelin [this message]
2015-06-12 18:52   ` Luis R. Rodriguez
     [not found]     ` <CAB=NE6VGX332=CvhQM4sc27AM8ae5S1kdRnm5sMfoqkU=b=ebg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-07-16 19:07       ` Luis R. Rodriguez
     [not found]         ` <CAB=NE6UFMv0qu8fJ1P2-pJCF0tSGKoW+uKhfwt0jV5fj2wZGSQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-07-17  4:22           ` Jacob Keller
     [not found]             ` <CA+P7+xrOPS6NeQhte-ATdm2Nqo0PpmUAxS+XYzWDvZGtwPtWMw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-07-17  4:23               ` Jacob Keller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0add7d95076f5b112af90d8566c29203@www.dscho.org \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=julia.lawall@lip6.fr \
    --cc=mcgrof@do-not-panic.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).