git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Potential bug in --color-words output
@ 2022-10-28 21:08 Simeon Krastnikov
       [not found] ` <198be0ca-4768-d3ce-caf3-bce4b1bb1e49@gmail.com>
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Simeon Krastnikov @ 2022-10-28 21:08 UTC (permalink / raw)
  To: git

Hello,

Given an initial file with the contents "not to be", which I then change 
to "to be", the output of 'git diff --color-words', is

   notto be

with the first three letters colored red. To me this seems incorrect as 
it implies, or at least misleadingly suggests, that there was no space 
between "not" and "to" in the original file. (Even though in that case 
the output is actually "nottoto be" with the "notto" in red and "to" in 
green.)

If instead I start with a file with contents "to be", which I then 
change to "not to be", then the output is as expected:

   not to be

(First three letters colored green.)

Am I correct in seeing this as a bug? If so, any tips on what parts of 
diff.c to look at when starting a patch?

Thanks,

Simeon


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Potential bug in --color-words output
       [not found] ` <198be0ca-4768-d3ce-caf3-bce4b1bb1e49@gmail.com>
@ 2022-11-07 23:23   ` Stefan Beller
  0 siblings, 0 replies; 4+ messages in thread
From: Stefan Beller @ 2022-11-07 23:23 UTC (permalink / raw)
  To: Simeon Krastnikov, Git Mailing List; +Cc: gitster, phillip.wood

Hi,

in my copy of git there is:

----
git % git grep --line-number --no-color  color-words

Documentation/RelNotes/1.5.4.txt:377: * "git diff --color-words"
colored context lines in a wrong color.

Documentation/RelNotes/1.6.2.txt:100:* The definition of what
constitutes a word for "git diff --color-words"

Documentation/RelNotes/1.6.5.3.txt:23: * "git diff --color-words -U0"
didn't work correctly.

Documentation/RelNotes/1.7.2.txt:64: * "git diff --word-diff=<mode>"
extends the existing "--color-words"

Documentation/RelNotes/1.7.2.txt:141: * "git diff --graph" works
better with "--color-words" and other options

Documentation/diff-options.txt:475:--color-words[=<regex>]::

contrib/completion/git-completion.bash:1718:
--no-color --color-words --no-renames --check

contrib/diff-highlight/README:9:You can use "--color-words" to
highlight only the changed portions of

diff.c:1914: * '--color-words' algorithm can be described as:

diff.c:1929: * For '--graph' to work with '--color-words', we need to
output the graph prefix

diff.c:2138:/* In "color-words" mode, show word-diff of words
accumulated in the buffer */

diff.c:5566:            OPT_CALLBACK_F(0, "color-words", options, N_("<regex>"),

gitk-git/gitk:207:            "--color-words*" - "--word-diff=color" {

t/t4034-diff-words.sh:70:               word_diff --color-words

t/t4034-diff-words.sh:100:      word_diff --color-words &&

t/t4034-diff-words.sh:180:      word_diff --color-words --unified=0

t/t4034-diff-words.sh:185:      word_diff --color-words="[a-z]+"

t/t4034-diff-words.sh:190:      word_diff --color-words="[a-z${LF}]*"

t/t4034-diff-words.sh:203:      word_diff --color-words="[a-z]+"

t/t4034-diff-words.sh:208:      word_diff --color-words

t/t4034-diff-words.sh:217:      word_diff --color-words="[a-z]+"

t/t4034-diff-words.sh:240:      word_diff --color-words

t/t4034-diff-words.sh:262:      word_diff --color-words

t/t4034-diff-words.sh:278:      word_diff --color-words="a+"

t/t4034-diff-words.sh:294:      word_diff --color-words=.
----


So there are tests in t/t4034-diff-words.sh and probably around
diff.c:1914- diff.c:2138

Best,
Stefan


Am Mo., 7. Nov. 2022 um 12:43 Uhr schrieb Simeon Krastnikov
<skrastnikov@gmail.com>:
>
> Hi Junio, Phillip, and Stefan,
>
> I apologize for forwarding this to you personally, but based on the recent commit history, I figured that you would be the right persons to ask about this.
>
> Sorry again, if you consider this to be spam.
>
> Best,
>
> Simeon
>
>
>
> -------- Forwarded Message --------
> Subject: Potential bug in --color-words output
> Date: Fri, 28 Oct 2022 23:08:09 +0200
> From: Simeon Krastnikov <skrastnikov@gmail.com>
> To: git@vger.kernel.org
>
>
> Hello,
>
> Given an initial file with the contents "not to be", which I then change to "to be", the output of 'git diff --color-words', is
>
>   notto be
>
> with the first three letters colored red. To me this seems incorrect as it implies, or at least misleadingly suggests, that there was no space between "not" and "to" in the original file. (Even though in that case the output is actually "nottoto be" with the "notto" in red and "to" in green.)
>
> If instead I start with a file with contents "to be", which I then change to "not to be", then the output is as expected:
>
>   not to be
>
> (First three letters colored green.)
>
> Am I correct in seeing this as a bug? If so, any tips on what parts of diff.c to look at when starting a patch?
>
> Thanks,
>
> Simeon
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Potential bug in --color-words output
  2022-10-28 21:08 Potential bug in --color-words output Simeon Krastnikov
       [not found] ` <198be0ca-4768-d3ce-caf3-bce4b1bb1e49@gmail.com>
@ 2022-11-08  7:27 ` Johannes Sixt
  2022-11-10 13:51 ` Johannes Schindelin
  2 siblings, 0 replies; 4+ messages in thread
From: Johannes Sixt @ 2022-11-08  7:27 UTC (permalink / raw)
  To: Simeon Krastnikov; +Cc: git

Am 28.10.22 um 23:08 schrieb Simeon Krastnikov:
> Hello,
> 
> Given an initial file with the contents "not to be", which I then change
> to "to be", the output of 'git diff --color-words', is
> 
>   notto be
> 
> with the first three letters colored red. To me this seems incorrect as
> it implies, or at least misleadingly suggests, that there was no space
> between "not" and "to" in the original file. (Even though in that case
> the output is actually "nottoto be" with the "notto" in red and "to" in
> green.)
> 
> If instead I start with a file with contents "to be", which I then
> change to "not to be", then the output is as expected:
> 
>   not to be
> 
> (First three letters colored green.)
> 
> Am I correct in seeing this as a bug? If so, any tips on what parts of
> diff.c to look at when starting a patch?

Well, not really. When you have a file with

   Line one.
   Line two.

then change it to

   Line ONE.
   Line TWO.

then --color-words currently prints it as

   Line one.ONE.
   Line two.TWO.

because it does not print the whitespace after[*] a sequence of deleted
words. But if it were printed, we would see

   Line one.
   ONE.
   Line two.
   TWO.

That is considered inferior; hence, it isn't printed.

The current algorithm produces sensible output in the vast majority of
cases while also being fairly straight-forward. To make it work "better"
(for some definition of that word) in the borderline cases, the
algorithm would have to be made considerably more sophisticated.

[*] It might be whitespace before a sequence of words, but that does not
change the gist of the argument.

-- Hannes


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Potential bug in --color-words output
  2022-10-28 21:08 Potential bug in --color-words output Simeon Krastnikov
       [not found] ` <198be0ca-4768-d3ce-caf3-bce4b1bb1e49@gmail.com>
  2022-11-08  7:27 ` Johannes Sixt
@ 2022-11-10 13:51 ` Johannes Schindelin
  2 siblings, 0 replies; 4+ messages in thread
From: Johannes Schindelin @ 2022-11-10 13:51 UTC (permalink / raw)
  To: Simeon Krastnikov; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1143 bytes --]

Hi Simeon,

On Fri, 28 Oct 2022, Simeon Krastnikov wrote:

> Given an initial file with the contents "not to be", which I then change to
> "to be", the output of 'git diff --color-words', is
>
>   notto be
>
> with the first three letters colored red.

This seems to be a limitation of the original `--color-words` design.

FWIW I find myself using these three aliases all the time now, i.e. using
a different word-coloring mode that works much better for me:

	[alias]
		cwdiff = diff --color-words=\"[A-Za-z0-9_]+|.\"
	        cwlog = log --color-words=\"[A-Za-z0-9_]+|.\"
		cwshow = show --color-words=\"[A-Za-z0-9_]+|.\"

Explanation: the regex `[A-Za-z0-9_]+|.` treats runs of alphanumerical
characters (including the underscore) as words, and also treats any
individual characters that do not fall into that category as "words".

The only downside of this is that the coloring does not help identifying
white-space differences because only the foreground color is changed
(which does not result in any visible difference for white-space
characters).

Maybe these aliases help you, too?

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-11-10 13:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-28 21:08 Potential bug in --color-words output Simeon Krastnikov
     [not found] ` <198be0ca-4768-d3ce-caf3-bce4b1bb1e49@gmail.com>
2022-11-07 23:23   ` Stefan Beller
2022-11-08  7:27 ` Johannes Sixt
2022-11-10 13:51 ` Johannes Schindelin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).