git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Thomas Rast <trast@student.ethz.ch>
To: Scott Johnson <scottj75074@yahoo.com>
Cc: Michael J Gruber <git@drmicha.warpmail.net>,
	Matthijs Kooijman <matthijs@stdin.nl>, <git@vger.kernel.org>
Subject: [PATCH 0/4] --word-regex sanity checking and such
Date: Wed, 15 Dec 2010 16:13:24 +0100	[thread overview]
Message-ID: <cover.1292424926.git.trast@student.ethz.ch> (raw)
In-Reply-To: <561247.22837.qm@web110707.mail.gq1.yahoo.com>

[Forgot the list and Matthijs on the first sending.  Sorry for the
spam!]

Scott Johnson wrote [trimmed and wraps fixed up]:
> Here's the default `git diff --word-diff`:
>         [-<li class="yws-maps"><em></em><a-]{+<li><em></em><a+} href="#">yws-maps</a></li>
>         [-<li class="ydn-delicious"><em></em><a-]{+<li><em></em><a+} href="#">ydn-delicious</a></li>
> 
> Which is correct, but less than ideal because it highlights much more than the 
> actual changes.
> 
> So I create a .gitattributes file with one line:
> *.html diff=html
> 
> And rerun `git diff --word-diff`:
>         <li[-class="yws-maps"-]><em></em><a href="#">yws-maps</a></li>
>         <li><em></em><a href="#">ydn-delicious</a></li>
> 
> Yikes! What happened to the second line of changes? The removed code is not 
> displayed at all.

Michael J Gruber wrote:
> Yep, I just found out myself experimenting with a wordRegex for csv.
> Seems like quite a "Gimme rope" feature...
> 
> So, it's the regex.

Well. Yes. No. Maybe.

Thanks for bringing this to my attention.  I currently have enough
more serious work to avoid that this actually motivated me to hack up
a sanity check.  It's just far too error prone as it is now.

But I cannot reproduce the problem!  I put Scott's two offending lines
(taken from his "straight" diff) into t4034/html/{pre,post}, and I
think the output is valid.  Also, the word regex for html has long
included the |[^[:space:]] safeguard (actually they all do except for
bibtex, which is even more lenient on what it matches).  So you either
found an example that depends on more context (which would be *really*
bad) or there is another source of bad regexes.  Anyway, the safeguard
should easily catch the latter case.

This did unearth a bug in the ruby regex, though, so it's been worth
the trouble.

Various small issues with this patch series:

* [4/4] I stole the html test from Scott's mail, and some of the rest
  from various Wikibooks sources on "Hello World" in each language,
  usually extended by a bit of code that tests the world-splitting
  power.  I hope this is ok with Scott and the Copyright overlords.
  There are only so many ways to spell "Hello World", and only so many
  languages I know myself.

* [4/4] Many patterns do not split 1+2, probably because they stick +2
  together as a signed integer literal, even though I think they
  should.  I ran out of time to investigate however.

* [3/4] was actually detected with the help of [4/4], but putting it
  after would require heavy special casing.

* [2/4] It's a weird idiosyncrasy of the word-diff code that the exit
  status of git-diff does not depend on whether word-diff found any
  differences, and in fact the shown hunks do not either.  So the
  tests are "test_must_fail" regardless of word regex, because the
  input files differ at a byte level.  Maybe at least hunks without
  word differences should be suppressed?


Thomas Rast (4):
  diff.c: pass struct diff_words into find_word_boundaries
  diff.c: implement a sanity check for word regexes
  userdiff: fix typo in ruby word regex
  t4034: bulk verify builtin word regex sanity

 Documentation/config.txt |    8 ++++
 diff.c                   |  104 +++++++++++++++++++++++++++++++++++++++++----
 diff.h                   |    1 +
 t/t4034-diff-words.sh    |   85 +++++++++++++++++++++++++++++++++++++-
 t/t4034/bibtex/expect    |   15 +++++++
 t/t4034/bibtex/post      |   10 ++++
 t/t4034/bibtex/pre       |    9 ++++
 t/t4034/cpp/expect       |   10 ++++
 t/t4034/cpp/post         |    5 ++
 t/t4034/cpp/pre          |    4 ++
 t/t4034/csharp/expect    |   12 +++++
 t/t4034/csharp/post      |    8 ++++
 t/t4034/csharp/pre       |    8 ++++
 t/t4034/fortran/expect   |   12 +++++
 t/t4034/fortran/post     |    7 +++
 t/t4034/fortran/pre      |    7 +++
 t/t4034/html/expect      |    7 +++
 t/t4034/html/post        |    2 +
 t/t4034/html/pre         |    2 +
 t/t4034/java/expect      |   11 +++++
 t/t4034/java/post        |    6 +++
 t/t4034/java/pre         |    6 +++
 t/t4034/objc/expect      |   11 +++++
 t/t4034/objc/post        |    7 +++
 t/t4034/objc/pre         |    7 +++
 t/t4034/pascal/expect    |   12 +++++
 t/t4034/pascal/post      |    7 +++
 t/t4034/pascal/pre       |    7 +++
 t/t4034/php/expect       |    7 +++
 t/t4034/php/post         |    2 +
 t/t4034/php/pre          |    2 +
 t/t4034/python/expect    |    8 ++++
 t/t4034/python/post      |    3 +
 t/t4034/python/pre       |    3 +
 t/t4034/ruby/expect      |    7 +++
 t/t4034/ruby/post        |    2 +
 t/t4034/ruby/pre         |    2 +
 t/t4034/tex/expect       |    9 ++++
 t/t4034/tex/post         |    4 ++
 t/t4034/tex/pre          |    4 ++
 userdiff.c               |    2 +-
 41 files changed, 433 insertions(+), 12 deletions(-)
 create mode 100644 t/t4034/bibtex/expect
 create mode 100644 t/t4034/bibtex/post
 create mode 100644 t/t4034/bibtex/pre
 create mode 100644 t/t4034/cpp/expect
 create mode 100644 t/t4034/cpp/post
 create mode 100644 t/t4034/cpp/pre
 create mode 100644 t/t4034/csharp/expect
 create mode 100644 t/t4034/csharp/post
 create mode 100644 t/t4034/csharp/pre
 create mode 100644 t/t4034/fortran/expect
 create mode 100644 t/t4034/fortran/post
 create mode 100644 t/t4034/fortran/pre
 create mode 100644 t/t4034/html/expect
 create mode 100644 t/t4034/html/post
 create mode 100644 t/t4034/html/pre
 create mode 100644 t/t4034/java/expect
 create mode 100644 t/t4034/java/post
 create mode 100644 t/t4034/java/pre
 create mode 100644 t/t4034/objc/expect
 create mode 100644 t/t4034/objc/post
 create mode 100644 t/t4034/objc/pre
 create mode 100644 t/t4034/pascal/expect
 create mode 100644 t/t4034/pascal/post
 create mode 100644 t/t4034/pascal/pre
 create mode 100644 t/t4034/php/expect
 create mode 100644 t/t4034/php/post
 create mode 100644 t/t4034/php/pre
 create mode 100644 t/t4034/python/expect
 create mode 100644 t/t4034/python/post
 create mode 100644 t/t4034/python/pre
 create mode 100644 t/t4034/ruby/expect
 create mode 100644 t/t4034/ruby/post
 create mode 100644 t/t4034/ruby/pre
 create mode 100644 t/t4034/tex/expect
 create mode 100644 t/t4034/tex/post
 create mode 100644 t/t4034/tex/pre

-- 
1.7.3.3.807.g6ee1f

  parent reply	other threads:[~2010-12-15 15:13 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-15  3:47 html userdiff is not showing all my changes Scott Johnson
2010-12-15  9:06 ` Michael J Gruber
2010-12-15  9:12   ` Matthijs Kooijman
2010-12-15  9:29     ` Michael J Gruber
2010-12-15 15:13 ` Thomas Rast [this message]
2010-12-15 15:13   ` [PATCH 1/4] diff.c: pass struct diff_words into find_word_boundaries Thomas Rast
2010-12-15 15:13   ` [PATCH 2/4] diff.c: implement a sanity check for word regexes Thomas Rast
2010-12-15 15:13   ` [PATCH 3/4] userdiff: fix typo in ruby word regex Thomas Rast
2010-12-15 15:13   ` [PATCH 4/4] t4034: bulk verify builtin word regex sanity Thomas Rast
     [not found]   ` <913156.57703.qm@web110711.mail.gq1.yahoo.com>
2010-12-15 19:51     ` [PATCH 0/4] --word-regex sanity checking and such Thomas Rast
2010-12-15 20:48       ` Scott Johnson
2010-12-18 16:17         ` [PATCH v2 " Thomas Rast
2010-12-18 16:17           ` [PATCH v2 1/4] diff.c: pass struct diff_words into find_word_boundaries Thomas Rast
2010-12-18 16:17           ` [PATCH v2 2/4] diff.c: implement a sanity check for word regexes Thomas Rast
2010-12-18 21:00             ` Junio C Hamano
2010-12-19  1:59               ` Thomas Rast
2010-12-18 16:17           ` [PATCH v2 3/4] userdiff: fix typo in ruby and python " Thomas Rast
2010-12-18 21:02             ` Junio C Hamano
2010-12-19  2:10               ` Thomas Rast
2010-12-18 16:17           ` [PATCH v2 4/4] t4034: bulk verify builtin word regex sanity Thomas Rast
2011-01-11 21:47             ` [RFC/PATCH 0/3] " Jonathan Nieder
2011-01-11 21:48               ` [PATCH 1/3] " Jonathan Nieder
2011-01-18 18:00                 ` Re*: " Junio C Hamano
2011-01-11 21:48               ` [PATCH 2/3] userdiff: simplify word-diff safeguard Jonathan Nieder
2011-01-11 21:49               ` [PATCH 3/3] t4034 (diff --word-diff): style suggestions Jonathan Nieder
2010-12-18 16:24           ` [PATCH v2 0/4] --word-regex sanity checking and such Thomas Rast
2010-12-18 20:48             ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1292424926.git.trast@student.ethz.ch \
    --to=trast@student.ethz.ch \
    --cc=git@drmicha.warpmail.net \
    --cc=git@vger.kernel.org \
    --cc=matthijs@stdin.nl \
    --cc=scottj75074@yahoo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).