From: "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Phillip Wood <phillip.wood@dunelm.org.uk>,
Phillip Wood <phillip.wood@dunelm.org.uk>
Subject: [PATCH] word diff: handle zero length matches
Date: Tue, 04 May 2021 09:27:34 +0000
Message-ID: <pull.947.git.1620120455364.gitgitgadget@gmail.com>
From: Phillip Wood <phillip.wood@dunelm.org.uk>

If find_word_boundaries() encounters a zero-length match (which can be
caused by matching a newline or by using '*' instead of '+' in the
regex), we stop splitting the input into words, which generates an
inaccurate diff. To fix this, increment the start point when there is
a zero-length match and try a new match. This is safe because POSIX
regular expressions always return the longest available match, so a
zero-length match means that no non-empty match starts at that
position.

Commit bf82940dbf1 (color-words: enable REG_NEWLINE to help user,
2009-01-17) prevented negated character classes from matching newlines,
but it is still possible for the user to have an explicit newline match
in the regex, which could cause a zero-length match.

One could argue that explicit newline matches and using '*' rather
than '+' are user errors, but it seems better to work around them than
to produce inaccurate diffs.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-947%2Fphillipwood%2Fwip%2Fword-diff-zero-length-matches-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-947/phillipwood/wip/word-diff-zero-length-matches-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/947
diff.c | 10 +++++++---
t/t4034-diff-words.sh | 5 +++++
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/diff.c b/diff.c
index 4acccd9d7edb..c8b1d724349c 100644
--- a/diff.c
+++ b/diff.c
@@ -2053,7 +2053,7 @@ static void fn_out_diff_words_aux(void *priv,
 static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
 		int *begin, int *end)
 {
-	if (word_regex && *begin < buffer->size) {
+	while (word_regex && *begin < buffer->size) {
 		regmatch_t match[1];
 		if (!regexec_buf(word_regex, buffer->ptr + *begin,
 				 buffer->size - *begin, 1, match, 0)) {
@@ -2061,9 +2061,13 @@ static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
 					'\n', match[0].rm_eo - match[0].rm_so);
 			*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
 			*begin += match[0].rm_so;
-			return *begin >= *end;
+			if (*begin == *end)
+				(*begin)++;
+			else
+				return *begin > *end;
+		} else {
+			return -1;
 		}
-		return -1;
 	}
 
 	/* find the next word */
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index ee7721ab9135..561c582d1615 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -184,6 +184,11 @@ test_expect_success 'word diff with a regular expression' '
 	word_diff --color-words="[a-z]+"
 '
 
+test_expect_success 'word diff with zero length matches' '
+	cp expect.letter-runs-are-words expect &&
+	word_diff --color-words="[a-z${LF}]*"
+'
+
 test_expect_success 'set up a diff driver' '
 	git config diff.testdriver.wordRegex "[^[:space:]]" &&
 	cat <<-\EOF >.gitattributes
base-commit: 7e391989789db82983665667013a46eabc6fc570
--
gitgitgadget