git@vger.kernel.org mailing list mirror (one of many)
* UTF-8-safe way for char-level-diff
@ 2018-01-19 14:13 Danny Lin
From: Danny Lin @ 2018-01-19 14:13 UTC
  To: git develop

Git has a diff.wordRegex config that lets the user specify a regex
that defines a word. Setting diff.wordRegex to "." works well for a
char-level diff of ASCII chars, but not of multi-byte UTF-8 chars.

For example, if a UTF-8-encoded file containing "一人" is changed to
"丁人", "git diff --word-diff=color" shows "<E4><B8><80><81>人" (where
"<80>" is red and "<81>" is green) instead of the desired "一丁人"
(where "一" is red and "丁" is green). The diff splits each character
into its individual bytes, which is very annoying when diffing files
containing CJK chars.
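
The byte-level view explains that output. Here is a minimal Python
sketch (not part of git; the helper name utf8_hex is only illustrative)
that lists the UTF-8 bytes of both strings and the positions at which
they differ:

```python
old = "一人"
new = "丁人"

def utf8_hex(text):
    """Return the UTF-8 bytes of text as upper-case hex strings."""
    return [f"{b:02X}" for b in text.encode("utf-8")]

old_bytes = utf8_hex(old)
new_bytes = utf8_hex(new)
print(old_bytes)
print(new_bytes)

# Indexes at which the two byte sequences differ.
diff_positions = [
    i for i, (a, b) in enumerate(zip(old_bytes, new_bytes)) if a != b
]
print(diff_positions)
```

Only the third byte differs (80 vs. 81), so a diff that treats every
byte as a word marks just that byte and leaves the shared E4 B8 prefix
as unchanged context.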

Git's diff.wordRegex seems to implement a very basic regex that
doesn't support matching chars or char ranges by code, such as "\x41"
for "A". Is there a way to make the char-level diff work correctly?
If not, maybe we should implement one.
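
To illustrate what a byte-range pattern would need to express, here is
a rough Python sketch that groups each UTF-8 lead byte with its
continuation bytes (0x80-0xBF). It uses Python's re module, not git;
whether diff.wordRegex can accept an equivalent pattern is exactly the
open question, and the names UTF8_CHAR and split_chars are
hypothetical.

```python
import re

# One UTF-8 character: a byte that is not a continuation byte
# (0x80-0xBF), followed by zero or more continuation bytes.
UTF8_CHAR = re.compile(rb"[^\x80-\xbf][\x80-\xbf]*")

def split_chars(text):
    """Split text into characters by matching on its UTF-8 bytes."""
    data = text.encode("utf-8")
    return [m.decode("utf-8") for m in UTF8_CHAR.findall(data)]

tokens_old = split_chars("一人")
tokens_new = split_chars("丁人")
print(tokens_old)
print(tokens_new)
```

With tokens like these, a word-level diff would compare "一" against
"丁" as whole units instead of comparing single bytes.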
