From: Thomas Rast <trast@student.ethz.ch>
To: Scott Johnson <scottj75074@yahoo.com>
Cc: Michael J Gruber <git@drmicha.warpmail.net>,
Matthijs Kooijman <matthijs@stdin.nl>, <git@vger.kernel.org>
Subject: [PATCH 0/4] --word-regex sanity checking and such
Date: Wed, 15 Dec 2010 16:13:24 +0100 [thread overview]
Message-ID: <cover.1292424926.git.trast@student.ethz.ch> (raw)
In-Reply-To: <561247.22837.qm@web110707.mail.gq1.yahoo.com>
[Forgot the list and Matthijs on the first sending. Sorry for the
spam!]
Scott Johnson wrote [trimmed and wraps fixed up]:
> Here's the default `git diff --word-diff`:
> [-<li class="yws-maps"><em></em><a-]{+<li><em></em><a+} href="#">yws-maps</a></li>
> [-<li class="ydn-delicious"><em></em><a-]{+<li><em></em><a+} href="#">ydn-delicious</a></li>
>
> Which is correct, but less than ideal because it highlights much more than the
> actual changes.
>
> So I create a .gitattributes file with one line:
> *.html diff=html
>
> And rerun `git diff --word-diff`:
> <li[-class="yws-maps"-]><em></em><a href="#">yws-maps</a></li>
> <li><em></em><a href="#">ydn-delicious</a></li>
>
> Yikes! What happened to the second line of changes? The removed code is not
> displayed at all.
Michael J Gruber wrote:
> Yep, I just found out myself experimenting with a wordRegex for csv.
> Seems like quite a "Gimme rope" feature...
>
> So, it's the regex.
Well. Yes. No. Maybe.
Thanks for bringing this to my attention. I currently have enough
more serious work to avoid that this actually motivated me to hack up
a sanity check. It's just far too error prone as it is now.
But I cannot reproduce the problem! I put Scott's two offending lines
(taken from his "straight" diff) into t4034/html/{pre,post}, and I
think the output is valid. Also, the word regex for html has long
included the |[^[:space:]] safeguard (actually they all do except for
bibtex, which is even more lenient on what it matches). So you either
found an example that depends on more context (which would be *really*
bad) or there is another source of bad regexes. Anyway, the safeguard
should easily catch the latter case.
This did unearth a bug in the ruby regex, though, so it's been worth
the trouble.
Various small issues with this patch series:
* [4/4] I stole the html test from Scott's mail, and some of the rest
from various Wikibooks sources on "Hello World" in each language,
usually extended by a bit of code that tests the world-splitting
power. I hope this is ok with Scott and the Copyright overlords.
There are only so many ways to spell "Hello World", and only so many
languages I know myself.
* [4/4] Many patterns do not split 1+2, probably because they stick +2
together as a signed integer literal, even though I think they
should. I ran out of time to investigate however.
* [3/4] was actually detected with the help of [4/4], but putting it
after would require heavy special casing.
* [2/4] It's a weird idiosyncrasy of the word-diff code that the exit
status of git-diff does not depend on whether word-diff found any
differences, and in fact the shown hunks do not either. So the
tests are "test_must_fail" regardless of word regex, because the
input files differ at a byte level. Maybe at least hunks without
word differences should be suppressed?
Thomas Rast (4):
diff.c: pass struct diff_words into find_word_boundaries
diff.c: implement a sanity check for word regexes
userdiff: fix typo in ruby word regex
t4034: bulk verify builtin word regex sanity
Documentation/config.txt | 8 ++++
diff.c | 104 +++++++++++++++++++++++++++++++++++++++++----
diff.h | 1 +
t/t4034-diff-words.sh | 85 +++++++++++++++++++++++++++++++++++++-
t/t4034/bibtex/expect | 15 +++++++
t/t4034/bibtex/post | 10 ++++
t/t4034/bibtex/pre | 9 ++++
t/t4034/cpp/expect | 10 ++++
t/t4034/cpp/post | 5 ++
t/t4034/cpp/pre | 4 ++
t/t4034/csharp/expect | 12 +++++
t/t4034/csharp/post | 8 ++++
t/t4034/csharp/pre | 8 ++++
t/t4034/fortran/expect | 12 +++++
t/t4034/fortran/post | 7 +++
t/t4034/fortran/pre | 7 +++
t/t4034/html/expect | 7 +++
t/t4034/html/post | 2 +
t/t4034/html/pre | 2 +
t/t4034/java/expect | 11 +++++
t/t4034/java/post | 6 +++
t/t4034/java/pre | 6 +++
t/t4034/objc/expect | 11 +++++
t/t4034/objc/post | 7 +++
t/t4034/objc/pre | 7 +++
t/t4034/pascal/expect | 12 +++++
t/t4034/pascal/post | 7 +++
t/t4034/pascal/pre | 7 +++
t/t4034/php/expect | 7 +++
t/t4034/php/post | 2 +
t/t4034/php/pre | 2 +
t/t4034/python/expect | 8 ++++
t/t4034/python/post | 3 +
t/t4034/python/pre | 3 +
t/t4034/ruby/expect | 7 +++
t/t4034/ruby/post | 2 +
t/t4034/ruby/pre | 2 +
t/t4034/tex/expect | 9 ++++
t/t4034/tex/post | 4 ++
t/t4034/tex/pre | 4 ++
userdiff.c | 2 +-
41 files changed, 433 insertions(+), 12 deletions(-)
create mode 100644 t/t4034/bibtex/expect
create mode 100644 t/t4034/bibtex/post
create mode 100644 t/t4034/bibtex/pre
create mode 100644 t/t4034/cpp/expect
create mode 100644 t/t4034/cpp/post
create mode 100644 t/t4034/cpp/pre
create mode 100644 t/t4034/csharp/expect
create mode 100644 t/t4034/csharp/post
create mode 100644 t/t4034/csharp/pre
create mode 100644 t/t4034/fortran/expect
create mode 100644 t/t4034/fortran/post
create mode 100644 t/t4034/fortran/pre
create mode 100644 t/t4034/html/expect
create mode 100644 t/t4034/html/post
create mode 100644 t/t4034/html/pre
create mode 100644 t/t4034/java/expect
create mode 100644 t/t4034/java/post
create mode 100644 t/t4034/java/pre
create mode 100644 t/t4034/objc/expect
create mode 100644 t/t4034/objc/post
create mode 100644 t/t4034/objc/pre
create mode 100644 t/t4034/pascal/expect
create mode 100644 t/t4034/pascal/post
create mode 100644 t/t4034/pascal/pre
create mode 100644 t/t4034/php/expect
create mode 100644 t/t4034/php/post
create mode 100644 t/t4034/php/pre
create mode 100644 t/t4034/python/expect
create mode 100644 t/t4034/python/post
create mode 100644 t/t4034/python/pre
create mode 100644 t/t4034/ruby/expect
create mode 100644 t/t4034/ruby/post
create mode 100644 t/t4034/ruby/pre
create mode 100644 t/t4034/tex/expect
create mode 100644 t/t4034/tex/post
create mode 100644 t/t4034/tex/pre
--
1.7.3.3.807.g6ee1f
next prev parent reply other threads:[~2010-12-15 15:13 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-15 3:47 html userdiff is not showing all my changes Scott Johnson
2010-12-15 9:06 ` Michael J Gruber
2010-12-15 9:12 ` Matthijs Kooijman
2010-12-15 9:29 ` Michael J Gruber
2010-12-15 15:13 ` Thomas Rast [this message]
2010-12-15 15:13 ` [PATCH 1/4] diff.c: pass struct diff_words into find_word_boundaries Thomas Rast
2010-12-15 15:13 ` [PATCH 2/4] diff.c: implement a sanity check for word regexes Thomas Rast
2010-12-15 15:13 ` [PATCH 3/4] userdiff: fix typo in ruby word regex Thomas Rast
2010-12-15 15:13 ` [PATCH 4/4] t4034: bulk verify builtin word regex sanity Thomas Rast
[not found] ` <913156.57703.qm@web110711.mail.gq1.yahoo.com>
2010-12-15 19:51 ` [PATCH 0/4] --word-regex sanity checking and such Thomas Rast
2010-12-15 20:48 ` Scott Johnson
2010-12-18 16:17 ` [PATCH v2 " Thomas Rast
2010-12-18 16:17 ` [PATCH v2 1/4] diff.c: pass struct diff_words into find_word_boundaries Thomas Rast
2010-12-18 16:17 ` [PATCH v2 2/4] diff.c: implement a sanity check for word regexes Thomas Rast
2010-12-18 21:00 ` Junio C Hamano
2010-12-19 1:59 ` Thomas Rast
2010-12-18 16:17 ` [PATCH v2 3/4] userdiff: fix typo in ruby and python " Thomas Rast
2010-12-18 21:02 ` Junio C Hamano
2010-12-19 2:10 ` Thomas Rast
2010-12-18 16:17 ` [PATCH v2 4/4] t4034: bulk verify builtin word regex sanity Thomas Rast
2011-01-11 21:47 ` [RFC/PATCH 0/3] " Jonathan Nieder
2011-01-11 21:48 ` [PATCH 1/3] " Jonathan Nieder
2011-01-18 18:00 ` Re*: " Junio C Hamano
2011-01-11 21:48 ` [PATCH 2/3] userdiff: simplify word-diff safeguard Jonathan Nieder
2011-01-11 21:49 ` [PATCH 3/3] t4034 (diff --word-diff): style suggestions Jonathan Nieder
2010-12-18 16:24 ` [PATCH v2 0/4] --word-regex sanity checking and such Thomas Rast
2010-12-18 20:48 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1292424926.git.trast@student.ethz.ch \
--to=trast@student.ethz.ch \
--cc=git@drmicha.warpmail.net \
--cc=git@vger.kernel.org \
--cc=matthijs@stdin.nl \
--cc=scottj75074@yahoo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).