git@vger.kernel.org mailing list mirror (one of many)
* Using --word-diff breaks --color-moved
@ 2018-10-31  2:05 james harvey
  2018-10-31  4:27 ` Junio C Hamano
  2018-10-31 17:41 ` Stefan Beller
  0 siblings, 2 replies; 7+ messages in thread
From: james harvey @ 2018-10-31  2:05 UTC (permalink / raw)
  To: git; +Cc: sbeller

If you use both "--word-diff" and "--color-moved", regardless of the
order of arguments, "--word-diff" takes precedence and "--color-moved"
isn't allowed to do anything.

I think "--color-moved" should have precedence over "--word-diff".  I
cannot think of a scenario where a user would supply both options, and
actually want "--word-diff" to take precedence.  If I'm not thinking
of a scenario where this wouldn't be desired, perhaps whichever is
first as an argument could take precedence.

(The same behavior happens if 4+ lines are moved and
"--color-moved{default=zebra}" is used, but below
"--color-moved=plain" is used to be a smaller testcase.)

Given the following setup:

$ cat << EOF > file
> a
> b
> c
> EOF
$ git add file
$ git commit -m "Added file."
$ cat << EOF > file
> b
> c
> a
> EOF

You can have moved lines colorization:
$ git diff --color-moved=plain
...
[oldMovedColor]-a
b
c
[newMovedColor]+a

You can diff based on words:
$ git diff --word-diff
...
[oldColor][-a-]
b
c
[newColor]{+a+}

But, you cannot diff based on words, and have moved lines colorization:
$ git diff --color-moved=plain --word-diff
$ git diff --word-diff
...
[oldColor][-a-]
b
c
[newColor]{+a+}
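
The steps above can be collected into one script for reproduction. This is a minimal sketch, assuming a POSIX shell with git and mktemp on PATH; the throwaway repository, the "demo" identity, and the variable names are illustrative and not part of the original report.

```shell
#!/bin/sh
# Reproduce the report in a throwaway repository (all names are illustrative).
set -eu
# Ignore user and system configuration so local settings do not interfere.
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'a\nb\nc\n' > file
git add file
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "Added file."
printf 'b\nc\na\n' > file

# Line diff, with colors forced on so the output can be captured and compared.
line_plain=$(git diff --color=always --color-moved=no)
line_moved=$(git diff --color=always --color-moved=plain)

# Word diff, with and without move coloring requested.
word_only=$(git diff --color=always --word-diff)
word_moved=$(git diff --color=always --color-moved=plain --word-diff)

if [ "$line_plain" != "$line_moved" ]; then
    echo "line diff: --color-moved=plain changed the coloring"
fi
if [ "$word_only" = "$word_moved" ]; then
    echo "word diff: --color-moved=plain had no visible effect"
fi
```

If both messages are printed, the behavior described in this report is still present in the installed git.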


* Re: Using --word-diff breaks --color-moved
  2018-10-31  2:05 Using --word-diff breaks --color-moved james harvey
@ 2018-10-31  4:27 ` Junio C Hamano
  2018-10-31  7:07   ` james harvey
  2018-10-31 17:41 ` Stefan Beller
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2018-10-31  4:27 UTC (permalink / raw)
  To: james harvey; +Cc: git, sbeller

james harvey <jamespharvey20@gmail.com> writes:

> If you use both "--word-diff" and "--color-moved", regardless of the
> order of arguments, "--word-diff" takes precedence and "--color-moved"
> isn't allowed to do anything.
>
> I think "--color-moved" should have precedence over "--word-diff".  I
> cannot think of a scenario where a user would supply both options, and
> actually want "--word-diff" to take precedence.

I am not sure if I follow.  If these two cannot work well together,
then we should just reject the request as asking for incompatible
combination of options while we are parsing the command line
arguments, rather than arguing which one should trump the other
one---that would simply lead to "in my opinion, word-diff is more
important" vs "in mine, color-moved is more important", no?
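
For illustration only, the "reject while parsing" idea can be approximated outside of git with a shell wrapper. This is a sketch under assumptions: strict_diff is a hypothetical helper name, not a git command, and nothing in this thread says git itself performs such a check.

```shell
#!/bin/sh
# strict_diff: hypothetical wrapper that refuses to run "git diff" when both
# word diffing and move coloring are requested on the command line.
strict_diff() {
    has_word=0
    has_moved=0
    for arg in "$@"; do
        case $arg in
            --color-moved=no|--no-color-moved) has_moved=0 ;;
            --color-moved|--color-moved=*) has_moved=1 ;;
            --word-diff|--word-diff=*) has_word=1 ;;
        esac
    done
    if [ "$has_word" -eq 1 ] && [ "$has_moved" -eq 1 ]; then
        echo "error: --word-diff and --color-moved cannot be combined" >&2
        return 1
    fi
    git diff "$@"
}
```

Calling "strict_diff --word-diff --color-moved=plain" stops with the error before git is run; any other combination is passed through to "git diff" unchanged. The wrapper only looks at these two option families and does not consider configuration such as diff.colorMoved.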



* Re: Using --word-diff breaks --color-moved
  2018-10-31  4:27 ` Junio C Hamano
@ 2018-10-31  7:07   ` james harvey
  2018-10-31 17:43     ` Stefan Beller
  0 siblings, 1 reply; 7+ messages in thread
From: james harvey @ 2018-10-31  7:07 UTC (permalink / raw)
  To: gitster; +Cc: git, sbeller

On Wed, Oct 31, 2018 at 12:27 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> james harvey <jamespharvey20@gmail.com> writes:
>
> > If you use both "--word-diff" and "--color-moved", regardless of the
> > order of arguments, "--word-diff" takes precedence and "--color-moved"
> > isn't allowed to do anything.
> >
> > I think "--color-moved" should have precedence over "--word-diff".  I
> > cannot think of a scenario where a user would supply both options, and
> > actually want "--word-diff" to take precedence.
>
> I am not sure if I follow.  If these two cannot work well together,
> then we should just reject the request as asking for incompatible
> combination of options while we are parsing the command line
> arguments, rather than arguing which one should trump the other
> one---that would simply lead to "in my opinion, word-diff is more
> important" vs "in mine, color-moved is more important", no?

I should have been more clear in my original message.  I don't mean
that if "--color-moved" is given, that the argument "--word-diff"
should be completely ignored as if it weren't given as an option.

I'm not too concerned about my reduced test case scenario.  I'm
concerned about a larger diff, where there's some areas that got
moved, some lines that got deleted, some added, and some lines with
just a word or two changed.

In those larger scenarios, WITHOUT using BOTH "--color-moved" and
"--word-diff", and INSTEAD just using "git diff --color-moved", a
typical full line(s) diff occurs for changed areas that weren't moved,
as if it were given as a hidden/default option.  It's analyzing each
differing area to see if it's going to show each of those differing
areas as a move or a full line(s) diff.  Here, "--color-moved" takes
precedence (in the way I'm trying to use the word) over the typical
full line(s) diff.

I could be wrong, but I don't see why "--color-moved" can't operate
the same way, with "--word-diff" taking the place of the typical full
line(s) diff.  So, if it would be technically accurate to show
something that was moved using either method, that it would show moved
areas as a move rather than as word-diffs.  This would leave areas not
moved to be word-diffed.

I think these options can co-exist.  I could be wrong, but I'm betting
the code for "--color-moved" was only written with the typical full
line(s) diff in mind, and wasn't written with "--word-diff" in mind.


* Re: Using --word-diff breaks --color-moved
  2018-10-31  2:05 Using --word-diff breaks --color-moved james harvey
  2018-10-31  4:27 ` Junio C Hamano
@ 2018-10-31 17:41 ` Stefan Beller
  2018-11-02  1:18   ` james harvey
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Beller @ 2018-10-31 17:41 UTC (permalink / raw)
  To: jamespharvey20; +Cc: git

On Tue, Oct 30, 2018 at 7:06 PM james harvey <jamespharvey20@gmail.com> wrote:
>
> If you use both "--word-diff" and "--color-moved", regardless of the
> order of arguments, "--word-diff" takes precedence and "--color-moved"
> isn't allowed to do anything.

The order of arguments doesn't matter here, as these just set internal
flags at parse time, which determine what later stages do.

Git uses the xdiff library internally for producing diffs[1].
To produce a diff, we have to feed two "streams of symbols"
to the library which then figures out the diff.
Usually a symbol is a whole line. Once we have the diff
we need to make it look nice again (i.e. put file names,
context markers and lines around the diff), which happens
in diff.c.

But when --word-diff is given, each line is broken up
into words and those are used as symbols for finding
the diff[2]. See the function fn_out_consume() [3],
for example 'ecbdata->diff_words' is set on '--word-diff'.

When it is not set we fall down to the switch case that
will call emit_{add, del, context}_line(), which in turn
emits the lines.
The --color-moved step is performed after all diffing
(and nicing up) is done already and solely works on
the add/del lines. The word diff is piecing together lines
for output, which are completely ignored for move
detection.
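
The "words as symbols" view can be inspected directly, since --word-diff=porcelain prints one word-level change per line. A minimal sketch, assuming git and mktemp are available; the file names and contents are made up for illustration.

```shell
#!/bin/sh
# Show the word-level symbols that git diffs when --word-diff is active.
set -eu
dir=$(mktemp -d)
cd "$dir"
printf 'the quick brown fox\n' > old.txt
printf 'the quick red fox\n' > new.txt

# --no-index compares two plain files outside a repository. git exits with
# status 1 when the files differ, so that status is ignored here.
words=$(git diff --no-index --word-diff=porcelain old.txt new.txt || true)
printf '%s\n' "$words"
```

In the printed hunk, lines starting with "-" or "+" carry the removed and added words rather than whole lines, which is why the later move detection, working only on add/del lines, has nothing to match against.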

[1] see the xdiff/ dir in your copy of git. We have some
    substantial changes compared to unmaintained upstream
    http://www.xmailserver.org/xdiff-lib.html
    http://www.xmailserver.org/xdiff.html

[2] https://github.com/git/git/blob/master/diff.c#L1872

[3] https://github.com/git/git/blob/master/diff.c#L2259

> I think "--color-moved" should have precedence over "--word-diff".

I agree for precedence as in "work well together". Now we'd need
to figure out what that means. In its current form, the move
detection can detect moved lines across diff hunks or file
boundaries.

Should that also be the case for word diffing?
I think word diffing is mostly used for free text, which has different
properties compared to code, that the color-moved was originally
intended for.

For example in code we often have few characters on a line
such as "<TAB> }", which is found often in Git's code base.
We added some heuristics that lines showing up often with
few characters would not be detected on their own as a moved
block [4]. I would expect we'd have to figure out a similar heuristic
for word diffing, if we go down that route.
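
The effect of that heuristic can be observed by moving one short line and one long line. This is a sketch, assuming git and mktemp are available. It relies on the git-diff documentation, which says block-based modes such as zebra only detect moved blocks of at least 20 alphanumeric characters, while plain has no such minimum. The helper name moved_detected and the file contents are illustrative.

```shell
#!/bin/sh
set -eu
# Ignore user and system configuration so local settings do not interfere.
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1

# moved_detected MODE LINE: succeed if moving LINE from the top of a file to
# the bottom changes the colored output under --color-moved=MODE.
moved_detected() {
    md_mode=$1
    md_line=$2
    md_repo=$(mktemp -d)
    (
        cd "$md_repo"
        git init -q .
        printf '%s\nb\nc\n' "$md_line" > file
        git add file
        git -c user.name=demo -c user.email=demo@example.invalid \
            commit -q -m "Added file."
        printf 'b\nc\n%s\n' "$md_line" > file
        [ "$(git diff --color=always --color-moved=no)" != \
          "$(git diff --color=always --color-moved="$md_mode")" ]
    )
}

for mode in plain zebra; do
    for line in "}" "return compute_total(items, tax_rate);"; do
        if moved_detected "$mode" "$line"; then
            echo "$mode: move of '$line' is colored"
        else
            echo "$mode: move of '$line' is not colored"
        fi
    done
done
```

A word-based variant of move detection would presumably need its own threshold, since single words are even shorter than lines like "}".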

But that is a detail; we'd first have to figure out how to make the
words work with the move detection.

[4] https://github.com/git/git/commit/f0b8fb6e591b50b72b921f2c4cf120ebd284f510


>   I
> cannot think of a scenario where a user would supply both options, and
> actually want "--word-diff" to take precedence.  If I'm not thinking
> of a scenario where this wouldn't be desired, perhaps whichever is
> first as an argument could take precedence.

word diffing and move detection are completely orthogonal at the moment.
Instead of option order, I'd rather introduce a new option that tells us
how to resolve some corner case. Or in the short term we might just
want to raise an error?

> (The same behavior happens if 4+ lines are moved and
> "--color-moved{default=zebra}" is used, but below
> "--color-moved=plain" is used to be a smaller testcase.)
>
> [...]

This sounds like you are asking for two things:
(1) make color-moved work with words (somehow)
(2) allow the user to fine tune the heuristics for a block,
    such that default=zebra would still work.


* Re: Using --word-diff breaks --color-moved
  2018-10-31  7:07   ` james harvey
@ 2018-10-31 17:43     ` Stefan Beller
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Beller @ 2018-10-31 17:43 UTC (permalink / raw)
  To: jamespharvey20; +Cc: Junio C Hamano, git

On Wed, Oct 31, 2018 at 12:07 AM james harvey <jamespharvey20@gmail.com> wrote:

> I think these options can co-exist.  I could be wrong, but I'm betting
> the code for "--color-moved" was only written with the typical full
> line(s) diff in mind, and wasn't written with "--word-diff" in mind.

I think it was brought up, but neglected at the time.


* Re: Using --word-diff breaks --color-moved
  2018-10-31 17:41 ` Stefan Beller
@ 2018-11-02  1:18   ` james harvey
  2018-11-02 20:46     ` Stefan Beller
  0 siblings, 1 reply; 7+ messages in thread
From: james harvey @ 2018-11-02  1:18 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On Wed, Oct 31, 2018 at 1:42 PM Stefan Beller <sbeller@google.com> wrote:
>
> On Tue, Oct 30, 2018 at 7:06 PM james harvey <jamespharvey20@gmail.com> wrote:
> > I think "--color-moved" should have precedence over "--word-diff".
>
> I agree for precedence as in "work well together". Now we'd need
> to figure out what that means. In its current form, the move
> detection can detect moved lines across diff hunks or file
> boundaries.
>
> Should that also be the case for word diffing?
> I think word diffing is mostly used for free text, which has different
> properties compared to code, that the color-moved was originally
> intended for.

That's how I think of it too.  I think I'd be fine if, with word
diffing, move detection remained unable to detect moves across diff
hunks or file boundaries.

> >   I
> > cannot think of a scenario where a user would supply both options, and
> > actually want "--word-diff" to take precedence.  If I'm not thinking
> > of a scenario where this wouldn't be desired, perhaps whichever is
> > first as an argument could take precedence.
>
> word diffing and move detection are completely orthogonal at the moment.
> Instead of option order, I'd rather introduce a new option that tells us
> how to resolve some corner case. Or in the short term we might just
> want to raise an error?

I'm fine with option order not mattering, as it does now.  Was
assuming it didn't matter now, but mentioned trying it in case it
worked that way.  And, mentioned it as an alternative in case it
turned out the two could conflict in some corner case.  I think
defaulting to resolving one way or the other with an optional option
to go the other way makes sense.

> > (The same behavior happens if 4+ lines are moved and
> > "--color-moved{default=zebra}" is used, but below
> > "--color-moved=plain" is used to be a smaller testcase.)
> >
> > [...]
>
> This sounds like you are asking for two things:
> (1) make color-moved work with words (somehow)
> (2) allow the user to fine tune the heuristics for a block,
>     such that default=zebra would still work.

I was asking for #1.  #2 might be a good idea, but I just tried using
"--color-moved" for the first time the other day, so haven't used it
enough to get that far.  If they worked together, I'm not sure yet if
I'd be using plain or zebra.  I mentioned "4+ lines" because I
remember reading somewhere that zebra only worked with more than 3
lines.  I thought it was the manpage, but I'm not seeing that in there
now.


* Re: Using --word-diff breaks --color-moved
  2018-11-02  1:18   ` james harvey
@ 2018-11-02 20:46     ` Stefan Beller
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Beller @ 2018-11-02 20:46 UTC (permalink / raw)
  To: james harvey; +Cc: git

On Thu, Nov 1, 2018 at 6:19 PM james harvey <jamespharvey20@gmail.com> wrote:

> > This sounds like you are asking for two things:
> > (1) make color-moved work with words (somehow)
> > (2) allow the user to fine tune the heuristics for a block,
> >     such that default=zebra would still work.
>
> I was asking for #1.

I currently have no time to look into that,
but you're welcome to do so. :-)
I'd be happy to review the patches!

Thanks,
Stefan
