* make git diff output easier to read - use better diff heuristics @ 2018-02-13 15:06 Σπύρος Βαζαίος 2018-02-13 16:08 ` Jeff King 0 siblings, 1 reply; 5+ messages in thread From: Σπύρος Βαζαίος @ 2018-02-13 15:06 UTC (permalink / raw) To: git Hi, I've came across an issue when using the git diff command. In particular the diff is different to what the svn diff produces. While both being correct the output of the svn diff is easier to understand than the git diff one. See the following issue on github where I initially reported the issue: https://github.com/git-for-windows/git/issues/1494 I have Included a picture to better illustrate the problem. What do you think? Is it possible to make git diff output similar to svn diff regarding this issue? ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: make git diff output easier to read - use better diff heuristics 2018-02-13 15:06 make git diff output easier to read - use better diff heuristics Σπύρος Βαζαίος @ 2018-02-13 16:08 ` Jeff King 2018-02-13 18:11 ` Stefan Beller 0 siblings, 1 reply; 5+ messages in thread From: Jeff King @ 2018-02-13 16:08 UTC (permalink / raw) To: Σπύρος Βαζαίος Cc: git, Michael Haggerty On Tue, Feb 13, 2018 at 05:06:15PM +0200, Σπύρος Βαζαίος wrote: > Hi, I've came across an issue when using the git diff command. In > particular the diff is different to what the svn diff produces. While > both being correct the output of the svn diff is easier to understand > than the git diff one. See the following issue on github where I > initially reported the issue: > > https://github.com/git-for-windows/git/issues/1494 > > I have Included a picture to better illustrate the problem. What do > you think? Is it possible to make git diff output similar to svn diff > regarding this issue? Try "git diff --no-indent-heuristic", which makes your example look better. Here's a quick reproduction: -- >8 -- cat >foo.c <<\EOF static struct foo bar[] = { #ifdef SOMETHING { "stats.info", MNU_GBX_FSTAINF, etc }, { "expired.info", MNU_GBX_FSTAINF, etc }, { "info.log", MNU_GBX_INFOLOG, etc }, #endif { NULL, 0, 0 }, }; EOF sed '6a\ #ifdef WITH_EMU\ { "SoftCam.Key", MNU_CFG_FSOFTCAMKEY, etc },\ #endif ' <foo.c >bar.c -- 8< -- Now this looks ugly: git diff --no-index foo.c bar.c but this does not: git diff --no-index --no-indent-heuristic foo.c bar.c That heuristic is described in 433860f3d0 (diff: improve positioning of add/delete blocks in diffs, 2016-09-05). I'm not sure exactly why it does the wrong thing here, or if it would be possible to tweak the weighting factors to make it work. -Peff ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: make git diff output easier to read - use better diff heuristics 2018-02-13 16:08 ` Jeff King @ 2018-02-13 18:11 ` Stefan Beller 2018-02-13 18:25 ` Σπύρος Βαζαίος 0 siblings, 1 reply; 5+ messages in thread From: Stefan Beller @ 2018-02-13 18:11 UTC (permalink / raw) To: Jeff King Cc: Σπύρος Βαζαίος, git, Michael Haggerty On Tue, Feb 13, 2018 at 8:08 AM, Jeff King <peff@peff.net> wrote: > On Tue, Feb 13, 2018 at 05:06:15PM +0200, Σπύρος Βαζαίος wrote: > >> Hi, I've came across an issue when using the git diff command. In >> particular the diff is different to what the svn diff produces. While >> both being correct the output of the svn diff is easier to understand >> than the git diff one. See the following issue on github where I >> initially reported the issue: >> >> https://github.com/git-for-windows/git/issues/1494 >> >> I have Included a picture to better illustrate the problem. What do >> you think? Is it possible to make git diff output similar to svn diff >> regarding this issue? > > Try "git diff --no-indent-heuristic", which makes your example look > better. Here's a quick reproduction: > > -- >8 -- > cat >foo.c <<\EOF > static struct foo bar[] = { > #ifdef SOMETHING > { "stats.info", MNU_GBX_FSTAINF, etc }, > { "expired.info", MNU_GBX_FSTAINF, etc }, > { "info.log", MNU_GBX_INFOLOG, etc }, > #endif > { NULL, 0, 0 }, > }; > EOF > > sed '6a\ > #ifdef WITH_EMU\ > { "SoftCam.Key", MNU_CFG_FSOFTCAMKEY, etc },\ > #endif > ' <foo.c >bar.c > -- 8< -- > > Now this looks ugly: > > git diff --no-index foo.c bar.c > > but this does not: > > git diff --no-index --no-indent-heuristic foo.c bar.c > > That heuristic is described in 433860f3d0 (diff: improve positioning of > add/delete blocks in diffs, 2016-09-05). I'm not sure exactly why it > does the wrong thing here, or if it would be possible to tweak the > weighting factors to make it work. https://github.com/git/git/commit/433860f3d0 I would think that the long lines at the shift boundaries[1] have bad impact on the algorithm as they are of different length and influence the backstepping value. (vague theory) I wonder if we want to add language specifics to the heuristic, such that '}' or 'end' in a given line have a strong signal that the cut should be right after that line? i.e. that would decrease the badness score. While this is not a general solution (but quite language specific), I would think this is still a good idea to look at. (When coming up with the heuristic, most people thought the bad examples would come from exotic languages, but not C, which we are all so familiar with such that we thought we had a good grasp how to make a heuristic that works with C. Note that the example looks like C, though). [1] { "info.log", MNU_GBX_INFOLOG, etc }, vs { "SoftCam.Key", MNU_CFG_FSOFTCAMKEY, etc },\ (slightly unrelated tangent on better diffs in general:) A couple days ago I had another discussion on how diffs are best presented to users. That discussion was focused on a slightly higher layer, the diff algorithm itself, instead of a post-algorithm boost-fixup. It is usually easier to both write as well as review code that adds completely new features or projects instead of squashing bugs or integrating subtle new features in an existing code base. This observation lead to the conclusion that reviewing large locks of adjacent new lines is easier than reviewing blocks where deletions as well as additions are found. So we theorized that deletions should have more impact than additions when computing the diff itself. The case presented here has no deletions so this is not applicable, but most examples that came up once 2477ab2ea8 (diff: support anchoring line(s), 2017-11-27) was discussed would be likely to have better diffs. Stefan ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: make git diff output easier to read - use better diff heuristics 2018-02-13 18:11 ` Stefan Beller @ 2018-02-13 18:25 ` Σπύρος Βαζαίος 2018-02-13 22:41 ` Michael Haggerty 0 siblings, 1 reply; 5+ messages in thread From: Σπύρος Βαζαίος @ 2018-02-13 18:25 UTC (permalink / raw) To: Stefan Beller; +Cc: Jeff King, git, Michael Haggerty Hi guys, While I din't have the experience to express an opinion on this matter, I have to say that the --no-indent-heuristic that Jeff suggested worked great. There were more than a handful of cases that this issue happened in my diff file (all were the same: #endif followed by #ifdef). Oh, and the language is C indeed. Thanks for the great support! On Tue, Feb 13, 2018 at 8:11 PM, Stefan Beller <sbeller@google.com> wrote: > On Tue, Feb 13, 2018 at 8:08 AM, Jeff King <peff@peff.net> wrote: >> On Tue, Feb 13, 2018 at 05:06:15PM +0200, Σπύρος Βαζαίος wrote: >> >>> Hi, I've came across an issue when using the git diff command. In >>> particular the diff is different to what the svn diff produces. While >>> both being correct the output of the svn diff is easier to understand >>> than the git diff one. See the following issue on github where I >>> initially reported the issue: >>> >>> https://github.com/git-for-windows/git/issues/1494 >>> >>> I have Included a picture to better illustrate the problem. What do >>> you think? Is it possible to make git diff output similar to svn diff >>> regarding this issue? >> >> Try "git diff --no-indent-heuristic", which makes your example look >> better. Here's a quick reproduction: >> >> -- >8 -- >> cat >foo.c <<\EOF >> static struct foo bar[] = { >> #ifdef SOMETHING >> { "stats.info", MNU_GBX_FSTAINF, etc }, >> { "expired.info", MNU_GBX_FSTAINF, etc }, >> { "info.log", MNU_GBX_INFOLOG, etc }, >> #endif >> { NULL, 0, 0 }, >> }; >> EOF >> >> sed '6a\ >> #ifdef WITH_EMU\ >> { "SoftCam.Key", MNU_CFG_FSOFTCAMKEY, etc },\ >> #endif >> ' <foo.c >bar.c >> -- 8< -- >> >> Now this looks ugly: >> >> git diff --no-index foo.c bar.c >> >> but this does not: >> >> git diff --no-index --no-indent-heuristic foo.c bar.c >> >> That heuristic is described in 433860f3d0 (diff: improve positioning of >> add/delete blocks in diffs, 2016-09-05). I'm not sure exactly why it >> does the wrong thing here, or if it would be possible to tweak the >> weighting factors to make it work. > > https://github.com/git/git/commit/433860f3d0 > > I would think that the long lines at the shift boundaries[1] have > bad impact on the algorithm as they are of different length > and influence the backstepping value. (vague theory) > > I wonder if we want to add language specifics to the heuristic, > such that '}' or 'end' in a given line have a strong signal that the > cut should be right after that line? i.e. that would decrease the > badness score. > > While this is not a general solution (but quite language specific), > I would think this is still a good idea to look at. > (When coming up with the heuristic, most people thought the bad > examples would come from exotic languages, but not C, which we > are all so familiar with such that we thought we had a good grasp > how to make a heuristic that works with C. Note that the example > looks like C, though). > > [1] > { "info.log", MNU_GBX_INFOLOG, etc }, > vs > { "SoftCam.Key", MNU_CFG_FSOFTCAMKEY, etc },\ > > > (slightly unrelated tangent on better diffs in general:) > A couple days ago I had another discussion on how > diffs are best presented to users. That discussion was > focused on a slightly higher layer, the diff algorithm itself, > instead of a post-algorithm boost-fixup. > > It is usually easier to both write as well as review code that > adds completely new features or projects instead of squashing > bugs or integrating subtle new features in an existing code base. > > This observation lead to the conclusion that reviewing large > locks of adjacent new lines is easier than reviewing blocks > where deletions as well as additions are found. So we theorized > that deletions should have more impact than additions when > computing the diff itself. The case presented here has no deletions > so this is not applicable, but most examples that came up once > 2477ab2ea8 (diff: support anchoring line(s), 2017-11-27) > was discussed would be likely to have better diffs. > > Stefan ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: make git diff output easier to read - use better diff heuristics 2018-02-13 18:25 ` Σπύρος Βαζαίος @ 2018-02-13 22:41 ` Michael Haggerty 0 siblings, 0 replies; 5+ messages in thread From: Michael Haggerty @ 2018-02-13 22:41 UTC (permalink / raw) To: Σπύρος Βαζαίος Cc: Stefan Beller, Jeff King, git On Tue, Feb 13, 2018 at 7:25 PM, Σπύρος Βαζαίος <sbazaios@gmail.com> wrote: > While I din't have the experience to express an opinion on this > matter, I have to say that the --no-indent-heuristic that Jeff > suggested worked great. > There were more than a handful of cases that this issue happened in my > diff file (all were the same: #endif followed by #ifdef). > Oh, and the language is C indeed. The "indent heuristic" algorithm that Git now uses by default is nothing more than that—a heuristic—so it can be fooled. It bases its decision on the locations of blank lines and the indentations of non-blank lines. In the vast majority of cases it gives the same or better results than the old algorithm, but there are some cases, like yours, where it gives aesthetically less pleasing (though still correct) results. The algorithm usually handles C code well, but it tends to be confused by preprocessor directives, because they are not indented like typical code. It might be possible to tweak the weights to get it to handle preprocessor directives better, but that causes it to do worse on other, more common things like Python code (where blocks are preceded but not followed by a line with lesser indentation). Doing significantly better probably would require some amount of language-awareness, but that's a bigger job than I was willing to take on. Michael ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-02-13 22:41 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-02-13 15:06 make git diff output easier to read - use better diff heuristics Σπύρος Βαζαίος 2018-02-13 16:08 ` Jeff King 2018-02-13 18:11 ` Stefan Beller 2018-02-13 18:25 ` Σπύρος Βαζαίος 2018-02-13 22:41 ` Michael Haggerty
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).