git@vger.kernel.org mailing list mirror (one of many)
* html userdiff is not showing all my changes
@ 2010-12-15  3:47 Scott Johnson
  2010-12-15  9:06 ` Michael J Gruber
  2010-12-15 15:13 ` [PATCH 0/4] --word-regex sanity checking and such Thomas Rast
  0 siblings, 2 replies; 27+ messages in thread
From: Scott Johnson @ 2010-12-15  3:47 UTC (permalink / raw)
  To: git, trast

I am attempting to do a word diff of an html source file. Part of the removed 
html is disappearing from the diff when I enable the fancy html word diff.

Here's the output from basic `git diff`:
diff --git a/adv_layout_source.html b/adv_layout_source.html
index 18a81dd..c4ed609 100644
--- a/adv_layout_source.html
+++ b/adv_layout_source.html
@@ -42,8 +42,8 @@
       <ul>
         <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
         <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
-        <li class="yws-maps"><em></em><a href="#">yws-maps</a></li>
-        <li class="ydn-delicious"><em></em><a href="#">ydn-delicious</a></li>
+        <li><em></em><a href="#">yws-maps</a></li>
+        <li><em></em><a href="#">ydn-delicious</a></li>
         <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
         <li class="yws-events"><em></em><a href="#">yws-events</a></li>
       </ul>


Here's the default `git diff --word-diff`:
diff --git a/adv_layout_source.html b/adv_layout_source.html
index 18a81dd..c4ed609 100644
--- a/adv_layout_source.html
+++ b/adv_layout_source.html
@@ -42,8 +42,8 @@
      <ul>
        <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
        <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
        [-<li class="yws-maps"><em></em><a-]{+<li><em></em><a+} href="#">yws-maps</a></li>
        [-<li class="ydn-delicious"><em></em><a-]{+<li><em></em><a+} href="#">ydn-delicious</a></li>
        <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
        <li class="yws-events"><em></em><a href="#">yws-events</a></li>
      </ul>

Which is correct, but less than ideal because it highlights much more than the 
actual changes.

So I create a .gitattributes file with one line:
*.html diff=html

And rerun `git diff --word-diff`:
diff --git a/adv_layout_source.html b/adv_layout_source.html
index 18a81dd..c4ed609 100644
--- a/adv_layout_source.html
+++ b/adv_layout_source.html
@@ -42,8 +42,8 @@
      <ul>
        <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
        <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
        <li[-class="yws-maps"-]><em></em><a href="#">yws-maps</a></li>
        <li><em></em><a href="#">ydn-delicious</a></li>
        <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
        <li class="yws-events"><em></em><a href="#">yws-events</a></li>
      </ul>

Yikes! What happened to the second line of changes? The removed code is not 
displayed at all.

This is running git 1.7.3.3.

I suspect the problem is in the html patterns in userdiff.c, but I don't 
understand the word-diff-regex well enough to fix it.

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: html userdiff is not showing all my changes
  2010-12-15  3:47 html userdiff is not showing all my changes Scott Johnson
@ 2010-12-15  9:06 ` Michael J Gruber
  2010-12-15  9:12   ` Matthijs Kooijman
  2010-12-15 15:13 ` [PATCH 0/4] --word-regex sanity checking and such Thomas Rast
  1 sibling, 1 reply; 27+ messages in thread
From: Michael J Gruber @ 2010-12-15  9:06 UTC (permalink / raw)
  To: Scott Johnson; +Cc: git, trast

Scott Johnson venit, vidit, dixit 15.12.2010 04:47:
> I am attempting to do a word diff of an html source file. Part of the removed 
> html is disappearing from the diff when I enable the fancy html word diff.
> 
> Here's the output from basic `git diff`:
> diff --git a/adv_layout_source.html b/adv_layout_source.html
> index 18a81dd..c4ed609 100644
> --- a/adv_layout_source.html
> +++ b/adv_layout_source.html
> @@ -42,8 +42,8 @@
>        <ul>
>          <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
>          <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
> -        <li class="yws-maps"><em></em><a href="#">yws-maps</a></li>
> -        <li class="ydn-delicious"><em></em><a href="#">ydn-delicious</a></li>
> +        <li><em></em><a href="#">yws-maps</a></li>
> +        <li><em></em><a href="#">ydn-delicious</a></li>
>          <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
>          <li class="yws-events"><em></em><a href="#">yws-events</a></li>
>        </ul>
> 
> 
> Here's the default `git diff --word-diff`:
> diff --git a/adv_layout_source.html b/adv_layout_source.html
> index 18a81dd..c4ed609 100644
> --- a/adv_layout_source.html
> +++ b/adv_layout_source.html
> @@ -42,8 +42,8 @@
>       <ul>
>         <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
>         <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
>         [-<li class="yws-maps"><em></em><a-]{+<li><em></em><a+} href="#">yws-maps</a></li>
>         [-<li class="ydn-delicious"><em></em><a-]{+<li><em></em><a+} href="#">ydn-delicious</a></li>
>         <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
>         <li class="yws-events"><em></em><a href="#">yws-events</a></li>
>       </ul>
> 
> Which is correct, but less than ideal because it highlights much more than the 
> actual changes.
> 
> So I create a .gitattributes file with one line:
> *.html diff=html
> 
> And rerun `git diff --word-diff`:
> diff --git a/adv_layout_source.html b/adv_layout_source.html
> index 18a81dd..c4ed609 100644
> --- a/adv_layout_source.html
> +++ b/adv_layout_source.html
> @@ -42,8 +42,8 @@
>       <ul>
>         <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
>         <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
>         <li[-class="yws-maps"-]><em></em><a href="#">yws-maps</a></li>
>         <li><em></em><a href="#">ydn-delicious</a></li>
>         <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
>         <li class="yws-events"><em></em><a href="#">yws-events</a></li>
>       </ul>
> 
> Yikes! What happened to the second line of changes? The removed code is not 
> displayed at all.
> 
> This is running git 1.7.3.3.
> 
> I suspect the problem is in the html patterns in userdiff.c, but I don't 
> understand the word-diff-regex well enough to fix it.

The wordRegex should really only control what comprises a word, i.e. the
granularity of --word-diff. (Where do we insert additional line-breaks
before running ordinary diff?)

If a wordRegex can make parts of the diff disappear, then there is a
problem deeper in the diff machinery. Can you trim this down to a minimal
example?

Michael


* Re: html userdiff is not showing all my changes
  2010-12-15  9:06 ` Michael J Gruber
@ 2010-12-15  9:12   ` Matthijs Kooijman
  2010-12-15  9:29     ` Michael J Gruber
  0 siblings, 1 reply; 27+ messages in thread
From: Matthijs Kooijman @ 2010-12-15  9:12 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Scott Johnson, git, trast


Hi Michael,

> If a wordRegex can make parts of the diff disappear, then there is a
> problem deeper in the diff machinery.
It can do exactly that. The word regex determines what is a word, but
everything else is counted as "whitespace". The word diff view shows
only differences in words, not in whitespace (which is intentional,
since whitespace changes in things like LaTeX or HTML are not
interesting). Note that it doesn't show whitespace _differences_, but it
does show the whitespace itself (taken from the "new" version of the
file).


So, if the word regex somehow doesn't match the second line at all (or
at least not the different part), the differences could get ignored.
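
To make the point above concrete, here is a minimal sketch in C of the
splitting step, using plain POSIX regcomp()/regexec(). It is not git's
code: the function name split_words and the output-buffer interface are
made up for illustration. It only assumes what is described above, namely
that matches become words and everything between matches is dropped
before the two sides are compared.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): write every match of `pattern`
 * in `text` to `out`, one match per line.  Text between matches is
 * dropped, mirroring how unmatched text counts as ignorable whitespace.
 * Returns 0 on success, -1 if the pattern does not compile or `out`
 * is too small.
 */
int split_words(const char *pattern, const char *text,
                char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    size_t used = 0;

    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED))
        return -1;
    out[0] = '\0';
    while (*text && !regexec(&re, text, 1, &m, 0)) {
        size_t len = (size_t)(m.rm_eo - m.rm_so);

        if (len == 0)           /* an empty match yields no word: stop */
            break;
        if (used + len + 2 > outsz) {   /* word + '\n' + '\0' must fit */
            regfree(&re);
            return -1;
        }
        memcpy(out + used, text + m.rm_so, len);
        used += len;
        out[used++] = '\n';
        out[used] = '\0';
        text += m.rm_eo;        /* continue after this word */
    }
    regfree(&re);
    return 0;
}
```

With the pattern "[a-z]+", both "foo 1 bar" and "foo 2 bar" split into
the same two words, foo and bar, so a comparison of the word lists sees
no change at all even though the inputs differ.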

> Can you trim this down to a minimal example?
That would be useful in any case.

Gr.

Matthijs



* Re: html userdiff is not showing all my changes
  2010-12-15  9:12   ` Matthijs Kooijman
@ 2010-12-15  9:29     ` Michael J Gruber
  0 siblings, 0 replies; 27+ messages in thread
From: Michael J Gruber @ 2010-12-15  9:29 UTC (permalink / raw)
  To: matthijs, Scott Johnson, git, trast

Matthijs Kooijman venit, vidit, dixit 15.12.2010 10:12:
> Hi Michael,
> 
>> If a wordRegex can make parts of the diff disappear, then there is a
>> problem deeper in the diff machinery.
> It can do exactly that. The word regex determines what is a word, but
> everything else is counted as "whitespace". The word diff view shows
> only differences in words, not in whitespace (which is intentional,
> since whitespace changes in things like LaTeX or HTML are not
> interesting). Note that it doesn't show whitespace _differences_, but it
> does show the whitespace itself (taken from the "new" version of the
> file).

Yep, I just found out myself experimenting with a wordRegex for csv.
Seems like quite a "Gimme rope" feature...

So, it's the regex.

> So, if the word regex somehow doesn't match the second line at all (or
> at least not the different part), the differences could get ignored.
> 
>> Can you trim this down to a minimal example?
> That would be useful in any case.

What strikes me is that both lines are semantically identical, yet one
is treated correctly and the other isn't.

Michael


* [PATCH 0/4] --word-regex sanity checking and such
  2010-12-15  3:47 html userdiff is not showing all my changes Scott Johnson
  2010-12-15  9:06 ` Michael J Gruber
@ 2010-12-15 15:13 ` Thomas Rast
  2010-12-15 15:13   ` [PATCH 1/4] diff.c: pass struct diff_words into find_word_boundaries Thomas Rast
                     ` (4 more replies)
  1 sibling, 5 replies; 27+ messages in thread
From: Thomas Rast @ 2010-12-15 15:13 UTC (permalink / raw)
  To: Scott Johnson; +Cc: Michael J Gruber, Matthijs Kooijman, git

[Forgot the list and Matthijs on the first sending.  Sorry for the
spam!]

Scott Johnson wrote [trimmed and wraps fixed up]:
> Here's the default `git diff --word-diff`:
>         [-<li class="yws-maps"><em></em><a-]{+<li><em></em><a+} href="#">yws-maps</a></li>
>         [-<li class="ydn-delicious"><em></em><a-]{+<li><em></em><a+} href="#">ydn-delicious</a></li>
> 
> Which is correct, but less than ideal because it highlights much more than the 
> actual changes.
> 
> So I create a .gitattributes file with one line:
> *.html diff=html
> 
> And rerun `git diff --word-diff`:
>         <li[-class="yws-maps"-]><em></em><a href="#">yws-maps</a></li>
>         <li><em></em><a href="#">ydn-delicious</a></li>
> 
> Yikes! What happened to the second line of changes? The removed code is not 
> displayed at all.

Michael J Gruber wrote:
> Yep, I just found out myself experimenting with a wordRegex for csv.
> Seems like quite a "Gimme rope" feature...
> 
> So, it's the regex.

Well. Yes. No. Maybe.

Thanks for bringing this to my attention.  I currently have enough
more serious work to avoid that this actually motivated me to hack up
a sanity check.  It's just far too error prone as it is now.

But I cannot reproduce the problem!  I put Scott's two offending lines
(taken from his "straight" diff) into t4034/html/{pre,post}, and I
think the output is valid.  Also, the word regex for html has long
included the |[^[:space:]] safeguard (actually they all do except for
bibtex, which is even more lenient on what it matches).  So you either
found an example that depends on more context (which would be *really*
bad) or there is another source of bad regexes.  Anyway, the safeguard
should easily catch the latter case.
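
As a rough illustration of why the |[^[:space:]] alternative works as a
safeguard, the following C sketch counts the non-space bytes that no
match of a pattern covers. The function name count_unmatched_nonspace is
hypothetical and the loop is a simplification, not the code from this
series; like the quoted find_word_boundaries(), it stops at an empty
match.

```c
#include <ctype.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): return the number of non-space
 * bytes of `text` that lie outside every match of `pattern`, or -1 if
 * the pattern does not compile.
 */
long count_unmatched_nonspace(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    long missed = 0;

    if (regcomp(&re, pattern, REG_EXTENDED))
        return -1;
    while (*text) {
        int found = !regexec(&re, text, 1, &m, 0);
        size_t gap, i;

        if (found && m.rm_eo == m.rm_so)
            found = 0;          /* an empty match covers nothing */
        /* the gap runs to the next match, or to the end of the text */
        gap = found ? (size_t)m.rm_so : strlen(text);
        for (i = 0; i < gap; i++)
            if (!isspace((unsigned char)text[i]))
                missed++;
        if (!found)
            break;
        text += m.rm_eo;        /* continue after the match */
    }
    regfree(&re);
    return missed;
}
```

For the text "foo 1 bar", the pattern "[a-z]+" leaves one non-space byte
(the 1) uncovered, while "[a-z]+|[^[:space:]]" leaves none, because the
last alternative picks up any single non-space byte the earlier ones
missed.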

This did unearth a bug in the ruby regex, though, so it's been worth
the trouble.

Various small issues with this patch series:

* [4/4] I stole the html test from Scott's mail, and some of the rest
  from various Wikibooks sources on "Hello World" in each language,
  usually extended by a bit of code that tests the word-splitting
  power.  I hope this is ok with Scott and the Copyright overlords.
  There are only so many ways to spell "Hello World", and only so many
  languages I know myself.

* [4/4] Many patterns do not split 1+2, probably because they stick +2
  together as a signed integer literal, even though I think they
  should.  I ran out of time to investigate however.

* [3/4] was actually detected with the help of [4/4], but putting it
  after would require heavy special casing.

* [2/4] It's a weird idiosyncrasy of the word-diff code that the exit
  status of git-diff does not depend on whether word-diff found any
  differences, and in fact the shown hunks do not either.  So the
  tests are "test_must_fail" regardless of word regex, because the
  input files differ at a byte level.  Maybe at least hunks without
  word differences should be suppressed?


Thomas Rast (4):
  diff.c: pass struct diff_words into find_word_boundaries
  diff.c: implement a sanity check for word regexes
  userdiff: fix typo in ruby word regex
  t4034: bulk verify builtin word regex sanity

 Documentation/config.txt |    8 ++++
 diff.c                   |  104 +++++++++++++++++++++++++++++++++++++++++----
 diff.h                   |    1 +
 t/t4034-diff-words.sh    |   85 +++++++++++++++++++++++++++++++++++++-
 t/t4034/bibtex/expect    |   15 +++++++
 t/t4034/bibtex/post      |   10 ++++
 t/t4034/bibtex/pre       |    9 ++++
 t/t4034/cpp/expect       |   10 ++++
 t/t4034/cpp/post         |    5 ++
 t/t4034/cpp/pre          |    4 ++
 t/t4034/csharp/expect    |   12 +++++
 t/t4034/csharp/post      |    8 ++++
 t/t4034/csharp/pre       |    8 ++++
 t/t4034/fortran/expect   |   12 +++++
 t/t4034/fortran/post     |    7 +++
 t/t4034/fortran/pre      |    7 +++
 t/t4034/html/expect      |    7 +++
 t/t4034/html/post        |    2 +
 t/t4034/html/pre         |    2 +
 t/t4034/java/expect      |   11 +++++
 t/t4034/java/post        |    6 +++
 t/t4034/java/pre         |    6 +++
 t/t4034/objc/expect      |   11 +++++
 t/t4034/objc/post        |    7 +++
 t/t4034/objc/pre         |    7 +++
 t/t4034/pascal/expect    |   12 +++++
 t/t4034/pascal/post      |    7 +++
 t/t4034/pascal/pre       |    7 +++
 t/t4034/php/expect       |    7 +++
 t/t4034/php/post         |    2 +
 t/t4034/php/pre          |    2 +
 t/t4034/python/expect    |    8 ++++
 t/t4034/python/post      |    3 +
 t/t4034/python/pre       |    3 +
 t/t4034/ruby/expect      |    7 +++
 t/t4034/ruby/post        |    2 +
 t/t4034/ruby/pre         |    2 +
 t/t4034/tex/expect       |    9 ++++
 t/t4034/tex/post         |    4 ++
 t/t4034/tex/pre          |    4 ++
 userdiff.c               |    2 +-
 41 files changed, 433 insertions(+), 12 deletions(-)
 create mode 100644 t/t4034/bibtex/expect
 create mode 100644 t/t4034/bibtex/post
 create mode 100644 t/t4034/bibtex/pre
 create mode 100644 t/t4034/cpp/expect
 create mode 100644 t/t4034/cpp/post
 create mode 100644 t/t4034/cpp/pre
 create mode 100644 t/t4034/csharp/expect
 create mode 100644 t/t4034/csharp/post
 create mode 100644 t/t4034/csharp/pre
 create mode 100644 t/t4034/fortran/expect
 create mode 100644 t/t4034/fortran/post
 create mode 100644 t/t4034/fortran/pre
 create mode 100644 t/t4034/html/expect
 create mode 100644 t/t4034/html/post
 create mode 100644 t/t4034/html/pre
 create mode 100644 t/t4034/java/expect
 create mode 100644 t/t4034/java/post
 create mode 100644 t/t4034/java/pre
 create mode 100644 t/t4034/objc/expect
 create mode 100644 t/t4034/objc/post
 create mode 100644 t/t4034/objc/pre
 create mode 100644 t/t4034/pascal/expect
 create mode 100644 t/t4034/pascal/post
 create mode 100644 t/t4034/pascal/pre
 create mode 100644 t/t4034/php/expect
 create mode 100644 t/t4034/php/post
 create mode 100644 t/t4034/php/pre
 create mode 100644 t/t4034/python/expect
 create mode 100644 t/t4034/python/post
 create mode 100644 t/t4034/python/pre
 create mode 100644 t/t4034/ruby/expect
 create mode 100644 t/t4034/ruby/post
 create mode 100644 t/t4034/ruby/pre
 create mode 100644 t/t4034/tex/expect
 create mode 100644 t/t4034/tex/post
 create mode 100644 t/t4034/tex/pre

-- 
1.7.3.3.807.g6ee1f


* [PATCH 1/4] diff.c: pass struct diff_words into find_word_boundaries
  2010-12-15 15:13 ` [PATCH 0/4] --word-regex sanity checking and such Thomas Rast
@ 2010-12-15 15:13   ` Thomas Rast
  2010-12-15 15:13   ` [PATCH 2/4] diff.c: implement a sanity check for word regexes Thomas Rast
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Thomas Rast @ 2010-12-15 15:13 UTC (permalink / raw)
  To: Scott Johnson; +Cc: Michael J Gruber, Matthijs Kooijman, git

We need the word_regex_check member.  Instead of adding another
argument, just pass in the whole struct for future extensibility.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 diff.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index a16ce69..8758a51 100644
--- a/diff.c
+++ b/diff.c
@@ -778,12 +778,13 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 }
 
 /* This function starts looking at *begin, and returns 0 iff a word was found. */
-static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
+static int find_word_boundaries(mmfile_t *buffer, struct diff_words_data *diff_words,
 		int *begin, int *end)
 {
-	if (word_regex && *begin < buffer->size) {
+	if (diff_words->word_regex && *begin < buffer->size) {
 		regmatch_t match[1];
-		if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) {
+		if (!regexec(diff_words->word_regex, buffer->ptr + *begin,
+			     1, match, 0)) {
 			char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
 					'\n', match[0].rm_eo - match[0].rm_so);
 			*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
@@ -813,7 +814,7 @@ static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
  * in buffer->orig.
  */
 static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
-		regex_t *word_regex)
+		struct diff_words_data *diff_words)
 {
 	int i, j;
 	long alloc = 0;
@@ -827,7 +828,7 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
 	buffer->orig_nr = 1;
 
 	for (i = 0; i < buffer->text.size; i++) {
-		if (find_word_boundaries(&buffer->text, word_regex, &i, &j))
+		if (find_word_boundaries(&buffer->text, diff_words, &i, &j))
 			return;
 
 		/* store original boundaries */
-- 
1.7.3.3.807.g6ee1f


* [PATCH 2/4] diff.c: implement a sanity check for word regexes
  2010-12-15 15:13 ` [PATCH 0/4] --word-regex sanity checking and such Thomas Rast
  2010-12-15 15:13   ` [PATCH 1/4] diff.c: pass struct diff_words into find_word_boundaries Thomas Rast
@ 2010-12-15 15:13   ` Thomas Rast
  2010-12-15 15:13   ` [PATCH 3/4] userdiff: fix typo in ruby word regex Thomas Rast
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Thomas Rast @ 2010-12-15 15:13 UTC (permalink / raw)
  To: Scott Johnson; +Cc: Michael J Gruber, Matthijs Kooijman, git

Word regexes are a bit of a dangerous beast, since it is easily
possible to not match a non-space part, which is subsequently ignored
for the purposes of emitting the word diff.  This was clearly stated
in the docs, but users still tripped over it.

Implement a safeguard that verifies two basic sanity assumptions:

* The word regex matches anything that is !isspace().

* The word regex does not match '\n'.  (This case is not very harmful,
  but we used to silently cut off at the '\n' which may go against
  user expectations.)

This is configurable via 'diff.wordRegexCheck', and defaults to
'warn'.
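
The newline case can be sketched in isolation as follows. This is only
an illustration with a made-up function name (first_word), assuming the
behaviour described above: a match that contains a newline is cut at the
first newline, which the new check reports instead of doing it silently.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): find the first match of
 * `pattern` in `text` and store the offsets of the word that would be
 * kept, i.e. the match cut at its first newline.
 * Returns 1 if the match contained a newline and was truncated,
 *         0 if it did not,
 *        -1 if nothing matched or the pattern did not compile.
 */
int first_word(const char *pattern, const char *text,
               size_t *begin, size_t *end)
{
    regex_t re;
    regmatch_t m;
    const char *nl;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED))
        return -1;
    rc = regexec(&re, text, 1, &m, 0);
    regfree(&re);
    if (rc)
        return -1;

    *begin = (size_t)m.rm_so;
    /* look for a newline inside the matched range only */
    nl = memchr(text + m.rm_so, '\n', (size_t)(m.rm_eo - m.rm_so));
    *end = nl ? (size_t)(nl - text) : (size_t)m.rm_eo;
    return nl != NULL;
}
```

For the text "ab\ncd ef", the pattern "[^ ]+" matches across the line
break, so the word is truncated to "ab" and the function returns 1. With
"[^[:space:]]+" the match is "ab" to begin with and nothing is cut.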

Reported-by: Scott Johnson <scottj75074@yahoo.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 Documentation/config.txt |    8 ++++
 diff.c                   |   93 +++++++++++++++++++++++++++++++++++++++++++--
 diff.h                   |    1 +
 t/t4034-diff-words.sh    |   65 +++++++++++++++++++++++++++++++-
 4 files changed, 161 insertions(+), 6 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index bf9479e..2e033ea 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -897,6 +897,14 @@ diff.wordRegex::
 	sequences that match the regular expression are "words", all other
 	characters are *ignorable* whitespace.
 
+diff.wordRegexCheck::
+	Perform a simple sanity check on matches of the word regex.
+	Currently this check ensures that the word regex matches all
+	non-space characters, and that the word regex does not match a
+	newline.  The setting controls what to do when the check
+	fails: 'false'/'off'/'ignore' ignore, 'true'/'on'/'warn' emit
+	a warning, and 'error' abort with an error message.
+
 fetch.recurseSubmodules::
 	A boolean value which changes the behavior for fetch and pull, the
 	default is to not recursively fetch populated sumodules unless
diff --git a/diff.c b/diff.c
index 8758a51..becefcf 100644
--- a/diff.c
+++ b/diff.c
@@ -22,11 +22,17 @@
 #define FAST_WORKING_DIRECTORY 1
 #endif
 
+#define REGEX_CHECK_UNSET -1
+#define REGEX_CHECK_OFF 0
+#define REGEX_CHECK_WARN 1
+#define REGEX_CHECK_ERROR 2
+
 static int diff_detect_rename_default;
 static int diff_rename_limit_default = 200;
 static int diff_suppress_blank_empty;
 int diff_use_color_default = -1;
 static const char *diff_word_regex_cfg;
+static int diff_word_regex_check_cfg = REGEX_CHECK_UNSET;
 static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
@@ -75,6 +81,19 @@ static int git_config_rename(const char *var, const char *value)
 	return git_config_bool(var,value) ? DIFF_DETECT_RENAME : 0;
 }
 
+static int parse_regex_check_level(int *b, const char *k, const char *v)
+{
+	if (v && !strcasecmp(v, "ignore"))
+		*b = REGEX_CHECK_OFF;
+	else if (v && !strcasecmp(v, "warn"))
+		*b = REGEX_CHECK_WARN;
+	else if (v && !strcasecmp(v, "error"))
+		*b = REGEX_CHECK_ERROR;
+	else
+		*b = git_config_bool(k, v);
+	return 1;
+}
+
 /*
  * These are to give UI layer defaults.
  * The core-level commands such as git-diff-files should
@@ -107,6 +126,8 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		return git_config_string(&external_diff_cmd_cfg, var, value);
 	if (!strcmp(var, "diff.wordregex"))
 		return git_config_string(&diff_word_regex_cfg, var, value);
+	if (!strcmp(var, "diff.wordregexcheck"))
+		return parse_regex_check_level(&diff_word_regex_check_cfg, var, value);
 
 	if (!strcmp(var, "diff.ignoresubmodules"))
 		handle_ignore_submodules_arg(&default_diff_options, value);
@@ -777,6 +798,50 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	diff_words->last_minus = minus_first;
 }
 
+
+static void check_word_regex_match(struct diff_words_data *diff_words,
+		char *start, int len, int unmatched)
+{
+	int check = diff_words->opt->word_regex_check;
+	void (*report_fn)(const char *, ...);
+
+	if (check == REGEX_CHECK_OFF)
+		return;
+
+	if (check == REGEX_CHECK_WARN)
+		report_fn = warning;
+	else if (check == REGEX_CHECK_ERROR)
+		report_fn = die;
+	else
+		assert(!"expected REGEX_CHECK_WARN or _ERROR");
+
+	if (unmatched) {
+		int i;
+		char *match_str;
+		for (i = 0; i < len; i++) {
+			if (isspace(start[i]))
+				continue;
+			match_str = xmemdupz(start, len);
+			report_fn("The following snippet contains non-space "
+				  "characters, but was not\nmatched by the "
+				  "word regex:\n'%s'\n"
+				  "They would be ignored for the purposes of "
+				  "the diff, which is\nusually not what you want.",
+				  match_str);
+			free(match_str);
+			break;
+		}
+	} else {
+		if (memchr(start, '\n', len)) {
+			char *match_str = xmemdupz(start, len);
+			report_fn("The following word regex match contains a newline "
+				  "and will be truncated there:\n'%s'",
+				  match_str);
+			free(match_str);
+		}
+	}
+}
+
 /* This function starts looking at *begin, and returns 0 iff a word was found. */
 static int find_word_boundaries(mmfile_t *buffer, struct diff_words_data *diff_words,
 		int *begin, int *end)
@@ -785,8 +850,15 @@ static int find_word_boundaries(mmfile_t *buffer, struct diff_words_data *diff_w
 		regmatch_t match[1];
 		if (!regexec(diff_words->word_regex, buffer->ptr + *begin,
 			     1, match, 0)) {
-			char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
-					'\n', match[0].rm_eo - match[0].rm_so);
+			char *prev_start = buffer->ptr + *begin;
+			char *match_start = prev_start + match[0].rm_so;
+			int match_len = match[0].rm_eo - match[0].rm_so;
+			char *p;
+			check_word_regex_match(diff_words, prev_start,
+					       match_start-prev_start, 1);
+			check_word_regex_match(diff_words, match_start,
+					       match_len, 0);
+			p = memchr(match_start, '\n', match_len);
 			*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
 			*begin += match[0].rm_so;
 			return *begin >= *end;
@@ -829,7 +901,7 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
 
 	for (i = 0; i < buffer->text.size; i++) {
 		if (find_word_boundaries(&buffer->text, diff_words, &i, &j))
-			return;
+			break;
 
 		/* store original boundaries */
 		ALLOC_GROW(buffer->orig, buffer->orig_nr + 1,
@@ -846,6 +918,11 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
 
 		i = j - 1;
 	}
+
+	/* no more boundaries, check any non-matched chunk remaining */
+	if (i < buffer->text.size)
+		check_word_regex_match(diff_words, buffer->text.ptr + i,
+				       buffer->text.size-i, 1);
 }
 
 /* this executes the word diff on the accumulated buffers */
@@ -882,8 +959,8 @@ static void diff_words_show(struct diff_words_data *diff_words)
 
 	memset(&xpp, 0, sizeof(xpp));
 	memset(&xecfg, 0, sizeof(xecfg));
-	diff_words_fill(&diff_words->minus, &minus, diff_words->word_regex);
-	diff_words_fill(&diff_words->plus, &plus, diff_words->word_regex);
+	diff_words_fill(&diff_words->minus, &minus, diff_words);
+	diff_words_fill(&diff_words->plus, &plus, diff_words);
 	xpp.flags = 0;
 	/* as only the hunk header will be parsed, we need a 0-context */
 	xecfg.ctxlen = 0;
@@ -2021,6 +2098,10 @@ static void builtin_diff(const char *name_a,
 				o->word_regex = userdiff_word_regex(two);
 			if (!o->word_regex)
 				o->word_regex = diff_word_regex_cfg;
+			if (o->word_regex_check == REGEX_CHECK_UNSET)
+				o->word_regex_check = diff_word_regex_check_cfg;
+			if (o->word_regex_check == REGEX_CHECK_UNSET)
+				o->word_regex_check = REGEX_CHECK_WARN;
 			if (o->word_regex) {
 				ecbdata.diff_words->word_regex = (regex_t *)
 					xmalloc(sizeof(regex_t));
@@ -2861,6 +2942,8 @@ void diff_setup(struct diff_options *options)
 		options->a_prefix = "a/";
 		options->b_prefix = "b/";
 	}
+
+	options->word_regex_check = REGEX_CHECK_UNSET;
 }
 
 int diff_setup_done(struct diff_options *options)
diff --git a/diff.h b/diff.h
index 165f368..5f6a7be 100644
--- a/diff.h
+++ b/diff.h
@@ -123,6 +123,7 @@ struct diff_options {
 	int stat_width;
 	int stat_name_width;
 	const char *word_regex;
+	int word_regex_check;
 	enum diff_words_type word_diff;
 
 	/* this is set by diffcore for DIFF_FORMAT_PATCH */
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 8096d8a..ebe72ce 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -8,7 +8,8 @@ test_expect_success setup '
 
 	git config diff.color.old red &&
 	git config diff.color.new green &&
-	git config diff.color.func magenta
+	git config diff.color.func magenta &&
+	git config diff.wordRegexCheck off
 
 '
 
@@ -331,4 +332,66 @@ test_expect_success '--word-diff=none' '
 
 '
 
+echo abcd > pre
+echo aXYd > post
+
+test_expect_success 'diff.wordRegexCheck="error" catches nonspaces' '
+
+	git config diff.wordRegexCheck error &&
+	test_must_fail git diff --no-index --word-diff-regex="a|d" pre post 2>out &&
+	grep "fatal.*contains non-space characters" out
+
+'
+
+newline="
+"
+
+test_expect_success 'diff.wordRegexCheck="error" catches newlines' '
+
+	git config diff.wordRegexCheck error &&
+	test_must_fail git diff --no-index --word-diff-regex=".|$newline" pre post 2>out &&
+	grep "fatal.*contains a newline" out
+
+'
+
+test_expect_success 'diff.wordRegexCheck="warn" works' '
+
+	git config diff.wordRegexCheck warn &&
+	test_must_fail git diff --no-index --word-diff-regex="a|d" pre post 2>out &&
+	grep "warning.*contains non-space characters" out
+
+'
+
+test_expect_success 'diff.wordRegexCheck="ignore" works' '
+
+	git config diff.wordRegexCheck ignore &&
+	test_must_fail git diff --no-index --word-diff-regex="a|d" pre post 2>out &&
+	! grep "contains non-space characters" out
+
+'
+
+test_expect_success 'diff.wordRegexCheck="false" is like "ignore"' '
+
+	git config diff.wordRegexCheck false &&
+	test_must_fail git diff --no-index --word-diff-regex="a|d" pre post 2>out &&
+	! grep "contains non-space characters" out
+
+'
+
+test_expect_success 'diff.wordRegexCheck="true" is like "warn"' '
+
+	git config diff.wordRegexCheck true &&
+	test_must_fail git diff --no-index --word-diff-regex="a|d" pre post 2>out &&
+	grep "warning.*contains non-space characters" out
+
+'
+
+test_expect_success 'diff.wordRegexCheck unset is like "warn"' '
+
+	git config --unset diff.wordRegexCheck &&
+	test_must_fail git diff --no-index --word-diff-regex="a|d" pre post 2>out &&
+	grep "warning.*contains non-space characters" out
+
+'
+
 test_done
-- 
1.7.3.3.807.g6ee1f


* [PATCH 3/4] userdiff: fix typo in ruby word regex
  2010-12-15 15:13 ` [PATCH 0/4] --word-regex sanity checking and such Thomas Rast
  2010-12-15 15:13   ` [PATCH 1/4] diff.c: pass struct diff_words into find_word_boundaries Thomas Rast
  2010-12-15 15:13   ` [PATCH 2/4] diff.c: implement a sanity check for word regexes Thomas Rast
@ 2010-12-15 15:13   ` Thomas Rast
  2010-12-15 15:13   ` [PATCH 4/4] t4034: bulk verify builtin word regex sanity Thomas Rast
       [not found]   ` <913156.57703.qm@web110711.mail.gq1.yahoo.com>
  4 siblings, 0 replies; 27+ messages in thread
From: Thomas Rast @ 2010-12-15 15:13 UTC (permalink / raw)
  To: Scott Johnson; +Cc: Michael J Gruber, Matthijs Kooijman, git

The regex had an unclosed ] that pretty much ruined the safeguard
against not matching a non-space char.
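
The effect of the missing ] can be reproduced with a simplified,
ASCII-only stand-in for the tail of the pattern: "[a-z]+" replaces the
earlier alternatives and "~" replaces the \x80-\xff range. Both
substitutions, and the helper name covers_all_nonspace, are assumptions
made for this sketch. Without the closing ], the "|" and the following
"[" become members of a single negated bracket expression, so those two
characters are excluded from it rather than separating alternatives.

```c
#include <ctype.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): return 1 if every non-space
 * byte of `text` lies inside some match of `pattern`, 0 if at least one
 * does not, and -1 if the pattern does not compile.
 */
int covers_all_nonspace(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int ok = 1;

    if (regcomp(&re, pattern, REG_EXTENDED))
        return -1;
    while (ok && *text) {
        int found = !regexec(&re, text, 1, &m, 0);
        size_t gap, i;

        if (found && m.rm_eo == m.rm_so)
            found = 0;          /* an empty match covers nothing */
        /* bytes before the match (or all remaining bytes) are unmatched */
        gap = found ? (size_t)m.rm_so : strlen(text);
        for (i = 0; i < gap; i++)
            if (!isspace((unsigned char)text[i]))
                ok = 0;
        if (!found)
            break;
        text += m.rm_eo;
    }
    regfree(&re);
    return ok;
}
```

On the text "a | b", the typo'd stand-in "[a-z]+|[^[:space:]|[~]+" leaves
the lone "|" unmatched, whereas the corrected
"[a-z]+|[^[:space:]]|[~]+" covers every non-space byte.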

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 userdiff.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/userdiff.c b/userdiff.c
index f9e05b5..4d6433b 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -81,7 +81,7 @@
 	 "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
 	 "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"
-	 "|[^[:space:]|[\x80-\xff]+"),
+	 "|[^[:space:]]|[\x80-\xff]+"),
 PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 "[={}\"]|[^={}\" \t]+"),
 PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
-- 
1.7.3.3.807.g6ee1f


* [PATCH 4/4] t4034: bulk verify builtin word regex sanity
  2010-12-15 15:13 ` [PATCH 0/4] --word-regex sanity checking and such Thomas Rast
                     ` (2 preceding siblings ...)
  2010-12-15 15:13   ` [PATCH 3/4] userdiff: fix typo in ruby word regex Thomas Rast
@ 2010-12-15 15:13   ` Thomas Rast
       [not found]   ` <913156.57703.qm@web110711.mail.gq1.yahoo.com>
  4 siblings, 0 replies; 27+ messages in thread
From: Thomas Rast @ 2010-12-15 15:13 UTC (permalink / raw)
  To: Scott Johnson; +Cc: Michael J Gruber, Matthijs Kooijman, git

The builtin word regexes should be tested with some simple examples
against simple issues, like failing to match a non-space character.

Do this in bulk.  Many of these patterns are a rather ad-hoc
combination of a few simple lines of code, so they can certainly be
improved.  However, they already unearthed a typo in the ruby pattern
(previous commit).

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/t4034-diff-words.sh  |   20 ++++++++++++++++++++
 t/t4034/bibtex/expect  |   15 +++++++++++++++
 t/t4034/bibtex/post    |   10 ++++++++++
 t/t4034/bibtex/pre     |    9 +++++++++
 t/t4034/cpp/expect     |   10 ++++++++++
 t/t4034/cpp/post       |    5 +++++
 t/t4034/cpp/pre        |    4 ++++
 t/t4034/csharp/expect  |   12 ++++++++++++
 t/t4034/csharp/post    |    8 ++++++++
 t/t4034/csharp/pre     |    8 ++++++++
 t/t4034/fortran/expect |   12 ++++++++++++
 t/t4034/fortran/post   |    7 +++++++
 t/t4034/fortran/pre    |    7 +++++++
 t/t4034/html/expect    |    7 +++++++
 t/t4034/html/post      |    2 ++
 t/t4034/html/pre       |    2 ++
 t/t4034/java/expect    |   11 +++++++++++
 t/t4034/java/post      |    6 ++++++
 t/t4034/java/pre       |    6 ++++++
 t/t4034/objc/expect    |   11 +++++++++++
 t/t4034/objc/post      |    7 +++++++
 t/t4034/objc/pre       |    7 +++++++
 t/t4034/pascal/expect  |   12 ++++++++++++
 t/t4034/pascal/post    |    7 +++++++
 t/t4034/pascal/pre     |    7 +++++++
 t/t4034/php/expect     |    7 +++++++
 t/t4034/php/post       |    2 ++
 t/t4034/php/pre        |    2 ++
 t/t4034/python/expect  |    8 ++++++++
 t/t4034/python/post    |    3 +++
 t/t4034/python/pre     |    3 +++
 t/t4034/ruby/expect    |    7 +++++++
 t/t4034/ruby/post      |    2 ++
 t/t4034/ruby/pre       |    2 ++
 t/t4034/tex/expect     |    9 +++++++++
 t/t4034/tex/post       |    4 ++++
 t/t4034/tex/pre        |    4 ++++
 37 files changed, 265 insertions(+), 0 deletions(-)
 create mode 100644 t/t4034/bibtex/expect
 create mode 100644 t/t4034/bibtex/post
 create mode 100644 t/t4034/bibtex/pre
 create mode 100644 t/t4034/cpp/expect
 create mode 100644 t/t4034/cpp/post
 create mode 100644 t/t4034/cpp/pre
 create mode 100644 t/t4034/csharp/expect
 create mode 100644 t/t4034/csharp/post
 create mode 100644 t/t4034/csharp/pre
 create mode 100644 t/t4034/fortran/expect
 create mode 100644 t/t4034/fortran/post
 create mode 100644 t/t4034/fortran/pre
 create mode 100644 t/t4034/html/expect
 create mode 100644 t/t4034/html/post
 create mode 100644 t/t4034/html/pre
 create mode 100644 t/t4034/java/expect
 create mode 100644 t/t4034/java/post
 create mode 100644 t/t4034/java/pre
 create mode 100644 t/t4034/objc/expect
 create mode 100644 t/t4034/objc/post
 create mode 100644 t/t4034/objc/pre
 create mode 100644 t/t4034/pascal/expect
 create mode 100644 t/t4034/pascal/post
 create mode 100644 t/t4034/pascal/pre
 create mode 100644 t/t4034/php/expect
 create mode 100644 t/t4034/php/post
 create mode 100644 t/t4034/php/pre
 create mode 100644 t/t4034/python/expect
 create mode 100644 t/t4034/python/post
 create mode 100644 t/t4034/python/pre
 create mode 100644 t/t4034/ruby/expect
 create mode 100644 t/t4034/ruby/post
 create mode 100644 t/t4034/ruby/pre
 create mode 100644 t/t4034/tex/expect
 create mode 100644 t/t4034/tex/post
 create mode 100644 t/t4034/tex/pre

diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index ebe72ce..b085948 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -394,4 +394,24 @@ test_expect_success 'diff.wordRegexCheck unset is like "warn"' '
 
 '
 
+test_expect_success 'set diff.wordRegexCheck=error for language tests' '
+
+	git config diff.wordRegexCheck error
+
+'
+
+word_diff_for_language () {
+	cp $TEST_DIRECTORY/t4034/$1/pre $TEST_DIRECTORY/t4034/$1/post \
+		$TEST_DIRECTORY/t4034/$1/expect . &&
+	echo "* diff=$1" > .gitattributes &&
+	word_diff --color-words
+}
+
+for lang_dir in $TEST_DIRECTORY/t4034/*; do
+	lang=${lang_dir#$TEST_DIRECTORY/t4034/}
+	test_expect_success "diff driver '$lang' has sane word regex" "
+		word_diff_for_language $lang
+	"
+done
+
 test_done
diff --git a/t/t4034/bibtex/expect b/t/t4034/bibtex/expect
new file mode 100644
index 0000000..a157774
--- /dev/null
+++ b/t/t4034/bibtex/expect
@@ -0,0 +1,15 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 95cd55b..ddcba9b 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,9 +1,10 @@<RESET>
+@article{aldous1987uie,<RESET>
+  title={{Ultimate instability of exponential back-off protocol for acknowledgment-based transmission control of random access communication channels}},<RESET>
+  author={Aldous, <RED>D.<RESET><GREEN>David<RESET>},
+  journal={Information Theory, IEEE Transactions on},<RESET>
+  volume={<RED>33<RESET><GREEN>Bogus.<RESET>},
+  number={<RED>2<RESET><GREEN>4<RESET>},
+  pages={219--223},<RESET>
+  year=<GREEN>1987,<RESET>
+<GREEN>  note={This is in fact a rather funny read since ethernet works well in practice. The<RESET> {<RED>1987<RESET><GREEN>\em pre} reference is the right one, however.<RESET>}<RED>,<RESET>
+}<RESET>
diff --git a/t/t4034/bibtex/post b/t/t4034/bibtex/post
new file mode 100644
index 0000000..ddcba9b
--- /dev/null
+++ b/t/t4034/bibtex/post
@@ -0,0 +1,10 @@
+@article{aldous1987uie,
+  title={{Ultimate instability of exponential back-off protocol for acknowledgment-based transmission control of random access communication channels}},
+  author={Aldous, David},
+  journal={Information Theory, IEEE Transactions on},
+  volume={Bogus.},
+  number={4},
+  pages={219--223},
+  year=1987,
+  note={This is in fact a rather funny read since ethernet works well in practice. The {\em pre} reference is the right one, however.}
+}
diff --git a/t/t4034/bibtex/pre b/t/t4034/bibtex/pre
new file mode 100644
index 0000000..95cd55b
--- /dev/null
+++ b/t/t4034/bibtex/pre
@@ -0,0 +1,9 @@
+@article{aldous1987uie,
+  title={{Ultimate instability of exponential back-off protocol for acknowledgment-based transmission control of random access communication channels}},
+  author={Aldous, D.},
+  journal={Information Theory, IEEE Transactions on},
+  volume={33},
+  number={2},
+  pages={219--223},
+  year={1987},
+}
diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect
new file mode 100644
index 0000000..e529842
--- /dev/null
+++ b/t/t4034/cpp/expect
@@ -0,0 +1,10 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 5517c3c..17aa265 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,4 +1,5 @@<RESET>
+class Foo : public Thing{
+	Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) {
+		<GREEN>bar(x);<RESET>
+	}
+}<GREEN>;<RESET>
diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post
new file mode 100644
index 0000000..17aa265
--- /dev/null
+++ b/t/t4034/cpp/post
@@ -0,0 +1,5 @@
+class Foo : public Thing{
+	Foo() : x(0&42) {
+		bar(x);
+	}
+};
diff --git a/t/t4034/cpp/pre b/t/t4034/cpp/pre
new file mode 100644
index 0000000..5517c3c
--- /dev/null
+++ b/t/t4034/cpp/pre
@@ -0,0 +1,4 @@
+class Foo:public Thing{
+	Foo():x(0&&1){}
+}
+
diff --git a/t/t4034/csharp/expect b/t/t4034/csharp/expect
new file mode 100644
index 0000000..c8b6b8f
--- /dev/null
+++ b/t/t4034/csharp/expect
@@ -0,0 +1,12 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 8ff9319..9869fa9 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -2,7 +2,7 @@<RESET> <RESET><MAGENTA>class Program<RESET>
+{<RESET>
+	static void Main()<RESET>
+	{<RESET>
+		Console.WriteLine("Hello, <GREEN>New<RESET> World!");
+		int i = <RED>5<RESET><GREEN>5+0<RESET>;
+	}<RESET>
+}<RESET>
diff --git a/t/t4034/csharp/post b/t/t4034/csharp/post
new file mode 100644
index 0000000..9869fa9
--- /dev/null
+++ b/t/t4034/csharp/post
@@ -0,0 +1,8 @@
+class Program
+{
+	static void Main()
+	{
+		Console.WriteLine("Hello, New World!");
+		int i = 5+0;
+	}
+}
diff --git a/t/t4034/csharp/pre b/t/t4034/csharp/pre
new file mode 100644
index 0000000..8ff9319
--- /dev/null
+++ b/t/t4034/csharp/pre
@@ -0,0 +1,8 @@
+class Program
+{
+	static void Main()
+	{
+		Console.WriteLine("Hello, World!");
+		int i=5;
+	}
+}
diff --git a/t/t4034/fortran/expect b/t/t4034/fortran/expect
new file mode 100644
index 0000000..5a25663
--- /dev/null
+++ b/t/t4034/fortran/expect
@@ -0,0 +1,12 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 08b4e5a..fb7cb51 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,7 +1,7 @@<RESET>
+program hello<RESET>
+   print *, "Hello World<RED>!<RESET><GREEN>?<RESET>"
+
+   DO10I = 1,10<RESET>
+   <RED>DO10I<RESET><GREEN>DO 10 I<RESET> = 1,10
+   <RED>DO10I<RESET><GREEN>DO 1 0 I<RESET> = 1,10
+end program hello<RESET>
diff --git a/t/t4034/fortran/post b/t/t4034/fortran/post
new file mode 100644
index 0000000..fb7cb51
--- /dev/null
+++ b/t/t4034/fortran/post
@@ -0,0 +1,7 @@
+program hello
+   print *, "Hello World?"
+
+   DO10I = 1,10
+   DO 10 I = 1,10
+   DO 1 0 I = 1,10
+end program hello
diff --git a/t/t4034/fortran/pre b/t/t4034/fortran/pre
new file mode 100644
index 0000000..08b4e5a
--- /dev/null
+++ b/t/t4034/fortran/pre
@@ -0,0 +1,7 @@
+program hello
+   print *, "Hello World!"
+
+   DO10I = 1,10
+   DO10I = 1,10
+   DO10I = 1,10
+end program hello
diff --git a/t/t4034/html/expect b/t/t4034/html/expect
new file mode 100644
index 0000000..78d28d4
--- /dev/null
+++ b/t/t4034/html/expect
@@ -0,0 +1,7 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 8bf936a..125bdd5 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,2 +1,2 @@<RESET>
+<li<RED>class="yws-maps"<RESET>><em></em><a href="#">yws-maps</a></li>
+<li<RED>class="ydn-delicious"<RESET>><em></em><a href="#">ydn-delicious</a></li>
diff --git a/t/t4034/html/post b/t/t4034/html/post
new file mode 100644
index 0000000..125bdd5
--- /dev/null
+++ b/t/t4034/html/post
@@ -0,0 +1,2 @@
+<li><em></em><a href="#">yws-maps</a></li>
+<li><em></em><a href="#">ydn-delicious</a></li>
diff --git a/t/t4034/html/pre b/t/t4034/html/pre
new file mode 100644
index 0000000..8bf936a
--- /dev/null
+++ b/t/t4034/html/pre
@@ -0,0 +1,2 @@
+<li class="yws-maps"><em></em><a href="#">yws-maps</a></li>
+<li class="ydn-delicious"><em></em><a href="#">ydn-delicious</a></li>
diff --git a/t/t4034/java/expect b/t/t4034/java/expect
new file mode 100644
index 0000000..9d99523
--- /dev/null
+++ b/t/t4034/java/expect
@@ -0,0 +1,11 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index ae11cd3..fd61213 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,6 +1,6 @@<RESET>
+public class HelloWorld {<RESET>
+   public static void main(String[] args) {<RESET>
+       System.out.println("Hello <RED>,<RESET><GREEN>--<RESET> world!");
+       int i = <RED>1+2<RESET><GREEN>1 + 2<RESET>;
+   }<RESET>
+}<RESET>
diff --git a/t/t4034/java/post b/t/t4034/java/post
new file mode 100644
index 0000000..fd61213
--- /dev/null
+++ b/t/t4034/java/post
@@ -0,0 +1,6 @@
+public class HelloWorld {
+   public static void main(String[] args) {
+       System.out.println("Hello -- world!");
+       int i = 1 + 2;
+   }
+}
diff --git a/t/t4034/java/pre b/t/t4034/java/pre
new file mode 100644
index 0000000..ae11cd3
--- /dev/null
+++ b/t/t4034/java/pre
@@ -0,0 +1,6 @@
+public class HelloWorld {
+   public static void main(String[] args) {
+       System.out.println("Hello, world!");
+       int i = 1+2;
+   }
+}
diff --git a/t/t4034/objc/expect b/t/t4034/objc/expect
new file mode 100644
index 0000000..a29fec5
--- /dev/null
+++ b/t/t4034/objc/expect
@@ -0,0 +1,11 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 8eb298d..e728a08 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -2,6 +2,6 @@<RESET>
+ <RESET>
+int main (void)<RESET>
+{<RESET>
+	int i = <RED>1+2<RESET><GREEN>1 + 2<RESET>;
+	printf ("Hello<GREEN>, new<RESET> world!\n");
+}<RESET>
diff --git a/t/t4034/objc/post b/t/t4034/objc/post
new file mode 100644
index 0000000..e728a08
--- /dev/null
+++ b/t/t4034/objc/post
@@ -0,0 +1,7 @@
+#import <stdio.h>
+ 
+int main (void)
+{
+	int i = 1 + 2;
+	printf ("Hello, new world!\n");
+}
diff --git a/t/t4034/objc/pre b/t/t4034/objc/pre
new file mode 100644
index 0000000..8eb298d
--- /dev/null
+++ b/t/t4034/objc/pre
@@ -0,0 +1,7 @@
+#import <stdio.h>
+ 
+int main (void)
+{
+	int i = 1+2;
+	printf ("Hello world!\n");
+}
diff --git a/t/t4034/pascal/expect b/t/t4034/pascal/expect
new file mode 100644
index 0000000..10953cf
--- /dev/null
+++ b/t/t4034/pascal/expect
@@ -0,0 +1,12 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 7c5fbef..bdd5df9 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,7 +1,7 @@<RESET>
+program HelloWorld;<RESET>
+var<RESET>
+   i : integer;
+begin<RESET>
+   i = i <RED>+1<RESET><GREEN>+ 1<RESET>;
+   writeln('Hello<GREEN>, new<RESET> world!');
+end.<RESET>
diff --git a/t/t4034/pascal/post b/t/t4034/pascal/post
new file mode 100644
index 0000000..bdd5df9
--- /dev/null
+++ b/t/t4034/pascal/post
@@ -0,0 +1,7 @@
+program HelloWorld;
+var
+   i : integer;
+begin
+   i = i + 1;
+   writeln('Hello, new world!');
+end.
diff --git a/t/t4034/pascal/pre b/t/t4034/pascal/pre
new file mode 100644
index 0000000..7c5fbef
--- /dev/null
+++ b/t/t4034/pascal/pre
@@ -0,0 +1,7 @@
+program HelloWorld;
+var
+   i: integer;
+begin
+   i = i+1;
+   writeln('Hello world!');
+end.
diff --git a/t/t4034/php/expect b/t/t4034/php/expect
new file mode 100644
index 0000000..171cfd5
--- /dev/null
+++ b/t/t4034/php/expect
@@ -0,0 +1,7 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 646a13d..1b1fe22 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,2 +1,2 @@<RESET>
+$i =<RED>$i+1<RESET><GREEN>CONSTANT + 1<RESET>;
+echo "Hello, <GREEN>New<RESET> World!\n";
diff --git a/t/t4034/php/post b/t/t4034/php/post
new file mode 100644
index 0000000..1b1fe22
--- /dev/null
+++ b/t/t4034/php/post
@@ -0,0 +1,2 @@
+$i =CONSTANT + 1;
+echo "Hello, New World!\n";
diff --git a/t/t4034/php/pre b/t/t4034/php/pre
new file mode 100644
index 0000000..646a13d
--- /dev/null
+++ b/t/t4034/php/pre
@@ -0,0 +1,2 @@
+$i=$i+1;
+echo "Hello, World!\n";
diff --git a/t/t4034/python/expect b/t/t4034/python/expect
new file mode 100644
index 0000000..bf6a30b
--- /dev/null
+++ b/t/t4034/python/expect
@@ -0,0 +1,8 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 2261a37..1076af0 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,3 +1,3 @@<RESET>
+<RED>foo.bar()<RESET><GREEN>foo . bar (stuff)<RESET>
+i = <RED>i+1<RESET><GREEN>i + 1<RESET>
+print "Hello, <GREEN>New<RESET> World!\n"
diff --git a/t/t4034/python/post b/t/t4034/python/post
new file mode 100644
index 0000000..1076af0
--- /dev/null
+++ b/t/t4034/python/post
@@ -0,0 +1,3 @@
+foo . bar (stuff)
+i = i + 1
+print "Hello, New World!\n"
diff --git a/t/t4034/python/pre b/t/t4034/python/pre
new file mode 100644
index 0000000..2261a37
--- /dev/null
+++ b/t/t4034/python/pre
@@ -0,0 +1,3 @@
+foo.bar()
+i = i+1
+print "Hello, World!\n"
diff --git a/t/t4034/ruby/expect b/t/t4034/ruby/expect
new file mode 100644
index 0000000..72ff72b
--- /dev/null
+++ b/t/t4034/ruby/expect
@@ -0,0 +1,7 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 1961e79..d954376 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,2 +1,2 @@<RESET>
+10.downto(<RED>1<RESET><GREEN>2<RESET>) { |x| puts x }
+puts 'Hello <GREEN>new<RESET> world'
diff --git a/t/t4034/ruby/post b/t/t4034/ruby/post
new file mode 100644
index 0000000..d954376
--- /dev/null
+++ b/t/t4034/ruby/post
@@ -0,0 +1,2 @@
+10.downto(2) { |x| puts x }
+puts 'Hello new world'
diff --git a/t/t4034/ruby/pre b/t/t4034/ruby/pre
new file mode 100644
index 0000000..1961e79
--- /dev/null
+++ b/t/t4034/ruby/pre
@@ -0,0 +1,2 @@
+10.downto(1) {|x| puts x}
+puts 'Hello world'
diff --git a/t/t4034/tex/expect b/t/t4034/tex/expect
new file mode 100644
index 0000000..604969b
--- /dev/null
+++ b/t/t4034/tex/expect
@@ -0,0 +1,9 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 2b2dfcb..65cab61 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,4 +1,4 @@<RESET>
+\section{Something <GREEN>new<RESET>}
+<RED>\emph<RESET><GREEN>\textbf<RESET>{Macro style}
+{<RED>\em<RESET><GREEN>\bfseries<RESET> State toggle style}
+\\[<RED>1em<RESET><GREEN>1cm<RESET>]
diff --git a/t/t4034/tex/post b/t/t4034/tex/post
new file mode 100644
index 0000000..65cab61
--- /dev/null
+++ b/t/t4034/tex/post
@@ -0,0 +1,4 @@
+\section{Something new}
+\textbf{Macro style}
+{\bfseries State toggle style}
+\\[1cm]
diff --git a/t/t4034/tex/pre b/t/t4034/tex/pre
new file mode 100644
index 0000000..2b2dfcb
--- /dev/null
+++ b/t/t4034/tex/pre
@@ -0,0 +1,4 @@
+\section{Something}
+\emph{Macro style}
+{\em State toggle style}
+\\[1em]
-- 
1.7.3.3.807.g6ee1f

* Re: [PATCH 0/4] --word-regex sanity checking and such
       [not found]   ` <913156.57703.qm@web110711.mail.gq1.yahoo.com>
@ 2010-12-15 19:51     ` Thomas Rast
  2010-12-15 20:48       ` Scott Johnson
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Rast @ 2010-12-15 19:51 UTC (permalink / raw)
  To: Scott Johnson; +Cc: Michael J Gruber, Matthijs Kooijman, git

Scott Johnson wrote:
> I've attached a pre and post with the complete file that is showing this 
> problem. I hope you'll be able to reproduce the issue with this.

I can't reproduce.  I did this:

  $ ls -l
  total 16
  -rw-r--r-- 1 thomas users 2128 2010-12-15 20:42 post.html
  -rw-r--r-- 1 thomas users 2354 2010-12-15 20:42 pre.html
  $ echo '*.html diff=html'  >.gitattributes
  $ git diff --no-index pre.html post.html
  diff --git 1/pre.html 2/post.html
[...]
  -        <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
  -        <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
  -        <li class="yws-maps"><em></em><a href="#">yws-maps</a></li>
  -        <li class="ydn-delicious"><em></em><a href="#">ydn-delicious</a></li>
  -        <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
  -        <li class="yws-events"><em></em><a href="#">yws-events</a></li>
  +        <li><em></em><a href="#">ydn-patterns</a></li>
  +        <li><em></em><a href="#">ydn-mail</a></li>
  +        <li><em></em><a href="#">yws-maps</a></li>
  +        <li><em></em><a href="#">ydn-delicious</a></li>
  +        <li><em></em><a href="#">yws-flickr</a></li>
  +        <li><em></em><a href="#">yws-events</a></li>
         </ul>
       </div><!-- wrap -->
     </div><!-- folder_list -->
  $ git diff --word-diff --no-index pre.html post.html
  diff --git 1/pre.html 2/post.html
[...]
          <li[-class="ydn-patterns"-]><em></em><a href="#">ydn-patterns</a></li>
          <li[-class="ydn-mail"-]><em></em><a href="#">ydn-mail</a></li>
          <li[-class="yws-maps"-]><em></em><a href="#">yws-maps</a></li>
          <li[-class="ydn-delicious"-]><em></em><a href="#">ydn-delicious</a></li>
          <li[-class="yws-flickr"-]><em></em><a href="#">yws-flickr</a></li>
          <li[-class="yws-events"-]><em></em><a href="#">yws-events</a></li>
        </ul>
      </div><!-- wrap -->
    </div><!-- folder_list -->

That's running bleeding-edge git, like I always do, but the userdiff
stuff hasn't changed in ages.

What does

  git config diff.html.wordregex

say on your system?

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: [PATCH 0/4] --word-regex sanity checking and such
  2010-12-15 19:51     ` [PATCH 0/4] --word-regex sanity checking and such Thomas Rast
@ 2010-12-15 20:48       ` Scott Johnson
  2010-12-18 16:17         ` [PATCH v2 " Thomas Rast
  0 siblings, 1 reply; 27+ messages in thread
From: Scott Johnson @ 2010-12-15 20:48 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Michael J Gruber, Matthijs Kooijman, git

Turns out to be system-dependent. I built v1.7.3.3 from source on three 
different boxes and only one of them is broken.


The /etc/redhat-release shows:

Broken:
Fedora Core release 6 (Zod)

Correct:
Red Hat Enterprise Linux WS release 4 (Nahant Update 6)
Fedora release 9 (Sulphur)

So I guess that means the problem is in some library that has most likely been 
fixed since Fedora 6.

* [PATCH v2 0/4] --word-regex sanity checking and such
  2010-12-15 20:48       ` Scott Johnson
@ 2010-12-18 16:17         ` Thomas Rast
  2010-12-18 16:17           ` [PATCH v2 1/4] diff.c: pass struct diff_words into find_word_boundaries Thomas Rast
                             ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Thomas Rast @ 2010-12-18 16:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Scott Johnson, Michael J Gruber, Matthijs Kooijman, git

I wrote:
> * [4/4] I stole the html test from Scott's mail, and some of the rest
>   from various Wikibooks sources on "Hello World" in each language,
>   usually extended by a bit of code that tests the word-splitting
>   power.  I hope this is ok with Scott and the Copyright overlords.
>   There are only so many ways to spell "Hello World", and only so many
>   languages I know myself.

I decided to play it safe and removed these parts.  In addition, I
extended the bulk tests with a C operator table and some forms of
literals for the C-style languages, so as to catch issues with
non-matches.  This showed that the python regex had the same typo as
the ruby one. *blush*

Scott's problem still remains:

Scott Johnson wrote:
> Turns out to be system-dependent. I built v1.7.3.3 from source on three 
> different boxes and only one of them is broken.
[...]
> Fedora Core release 6 (Zod)

I briefly considered installing FC6 on a VM, but my VirtualBox is
broken and I'm having a hard time finding an FC6 installation medium.
Right now the only other systems I have are darwin 10.5 and RHEL5.5,
and the test works on both.

So in the absence of any way of testing this, someone with a breaking
system will have to investigate.  I think it's worth including the
series anyway, since the regexes give wrong results in the case of
match failures, and we would want users to know about this.


Thomas Rast (4):
  diff.c: pass struct diff_words into find_word_boundaries
  diff.c: implement a sanity check for word regexes
  userdiff: fix typo in ruby and python word regexes
  t4034: bulk verify builtin word regex sanity

 Documentation/config.txt |    8 ++++
 diff.c                   |  104 +++++++++++++++++++++++++++++++++++++++++----
 diff.h                   |    1 +
 t/t4034-diff-words.sh    |   85 +++++++++++++++++++++++++++++++++++++-
 t/t4034/bibtex/expect    |   15 +++++++
 t/t4034/bibtex/post      |   10 ++++
 t/t4034/bibtex/pre       |    9 ++++
 t/t4034/cpp/expect       |   36 ++++++++++++++++
 t/t4034/cpp/post         |   19 ++++++++
 t/t4034/cpp/pre          |   19 ++++++++
 t/t4034/csharp/expect    |   35 +++++++++++++++
 t/t4034/csharp/post      |   18 ++++++++
 t/t4034/csharp/pre       |   18 ++++++++
 t/t4034/fortran/expect   |   10 ++++
 t/t4034/fortran/post     |    5 ++
 t/t4034/fortran/pre      |    5 ++
 t/t4034/html/expect      |    8 ++++
 t/t4034/html/post        |    3 +
 t/t4034/html/pre         |    3 +
 t/t4034/java/expect      |   36 ++++++++++++++++
 t/t4034/java/post        |   19 ++++++++
 t/t4034/java/pre         |   19 ++++++++
 t/t4034/objc/expect      |   35 +++++++++++++++
 t/t4034/objc/post        |   18 ++++++++
 t/t4034/objc/pre         |   18 ++++++++
 t/t4034/pascal/expect    |   35 +++++++++++++++
 t/t4034/pascal/post      |   18 ++++++++
 t/t4034/pascal/pre       |   18 ++++++++
 t/t4034/php/expect       |   35 +++++++++++++++
 t/t4034/php/post         |   18 ++++++++
 t/t4034/php/pre          |   18 ++++++++
 t/t4034/python/expect    |   34 +++++++++++++++
 t/t4034/python/post      |   17 +++++++
 t/t4034/python/pre       |   17 +++++++
 t/t4034/ruby/expect      |   34 +++++++++++++++
 t/t4034/ruby/post        |   17 +++++++
 t/t4034/ruby/pre         |   17 +++++++
 t/t4034/tex/expect       |    9 ++++
 t/t4034/tex/post         |    4 ++
 t/t4034/tex/pre          |    4 ++
 userdiff.c               |    4 +-
 41 files changed, 842 insertions(+), 13 deletions(-)
 create mode 100644 t/t4034/bibtex/expect
 create mode 100644 t/t4034/bibtex/post
 create mode 100644 t/t4034/bibtex/pre
 create mode 100644 t/t4034/cpp/expect
 create mode 100644 t/t4034/cpp/post
 create mode 100644 t/t4034/cpp/pre
 create mode 100644 t/t4034/csharp/expect
 create mode 100644 t/t4034/csharp/post
 create mode 100644 t/t4034/csharp/pre
 create mode 100644 t/t4034/fortran/expect
 create mode 100644 t/t4034/fortran/post
 create mode 100644 t/t4034/fortran/pre
 create mode 100644 t/t4034/html/expect
 create mode 100644 t/t4034/html/post
 create mode 100644 t/t4034/html/pre
 create mode 100644 t/t4034/java/expect
 create mode 100644 t/t4034/java/post
 create mode 100644 t/t4034/java/pre
 create mode 100644 t/t4034/objc/expect
 create mode 100644 t/t4034/objc/post
 create mode 100644 t/t4034/objc/pre
 create mode 100644 t/t4034/pascal/expect
 create mode 100644 t/t4034/pascal/post
 create mode 100644 t/t4034/pascal/pre
 create mode 100644 t/t4034/php/expect
 create mode 100644 t/t4034/php/post
 create mode 100644 t/t4034/php/pre
 create mode 100644 t/t4034/python/expect
 create mode 100644 t/t4034/python/post
 create mode 100644 t/t4034/python/pre
 create mode 100644 t/t4034/ruby/expect
 create mode 100644 t/t4034/ruby/post
 create mode 100644 t/t4034/ruby/pre
 create mode 100644 t/t4034/tex/expect
 create mode 100644 t/t4034/tex/post
 create mode 100644 t/t4034/tex/pre

-- 
1.7.3.4.789.g74ad1

* [PATCH v2 1/4] diff.c: pass struct diff_words into find_word_boundaries
  2010-12-18 16:17         ` [PATCH v2 " Thomas Rast
@ 2010-12-18 16:17           ` Thomas Rast
  2010-12-18 16:17           ` [PATCH v2 2/4] diff.c: implement a sanity check for word regexes Thomas Rast
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Thomas Rast @ 2010-12-18 16:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Scott Johnson, Michael J Gruber, Matthijs Kooijman, git

We need the word_regex_check member.  Instead of adding another
argument, just pass in the whole struct for future extensibility.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 diff.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 0a43869..5fdcb15 100644
--- a/diff.c
+++ b/diff.c
@@ -778,12 +778,13 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 }
 
 /* This function starts looking at *begin, and returns 0 iff a word was found. */
-static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
+static int find_word_boundaries(mmfile_t *buffer, struct diff_words_data *diff_words,
 		int *begin, int *end)
 {
-	if (word_regex && *begin < buffer->size) {
+	if (diff_words->word_regex && *begin < buffer->size) {
 		regmatch_t match[1];
-		if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) {
+		if (!regexec(diff_words->word_regex, buffer->ptr + *begin,
+			     1, match, 0)) {
 			char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
 					'\n', match[0].rm_eo - match[0].rm_so);
 			*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
@@ -813,7 +814,7 @@ static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
  * in buffer->orig.
  */
 static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
-		regex_t *word_regex)
+		struct diff_words_data *diff_words)
 {
 	int i, j;
 	long alloc = 0;
@@ -827,7 +828,7 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
 	buffer->orig_nr = 1;
 
 	for (i = 0; i < buffer->text.size; i++) {
-		if (find_word_boundaries(&buffer->text, word_regex, &i, &j))
+		if (find_word_boundaries(&buffer->text, diff_words, &i, &j))
 			return;
 
 		/* store original boundaries */
-- 
1.7.3.4.789.g74ad1

* [PATCH v2 2/4] diff.c: implement a sanity check for word regexes
  2010-12-18 16:17         ` [PATCH v2 " Thomas Rast
  2010-12-18 16:17           ` [PATCH v2 1/4] diff.c: pass struct diff_words into find_word_boundaries Thomas Rast
@ 2010-12-18 16:17           ` Thomas Rast
  2010-12-18 21:00             ` Junio C Hamano
  2010-12-18 16:17           ` [PATCH v2 3/4] userdiff: fix typo in ruby and python " Thomas Rast
                             ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Thomas Rast @ 2010-12-18 16:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Scott Johnson, Michael J Gruber, Matthijs Kooijman, git

Word regexes are a bit of a dangerous beast, since it is easily
possible to not match a non-space part, which is subsequently ignored
for the purposes of emitting the word diff.  This was clearly stated
in the docs, but users still tripped over it.

Implement a safeguard that verifies two basic sanity assumptions:

* The word regex matches anything that is !isspace().

* The word regex does not match '\n'.  (This case is not very harmful,
  but we used to silently cut off at the '\n' which may go against
  user expectations.)

This is configurable via 'diff.wordRegexCheck', and defaults to
'warn'.

Reported-by: Scott Johnson <scottj75074@yahoo.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
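Not part of the patch: a rough way to see the first condition from
outside git is to count the non-space bytes that a regex leaves
unmatched.  This is only a sketch; unmatched_nonspace is a
hypothetical helper, it assumes a grep with the common -o extension,
and git's own matching via regexec() differs in details such as the
cut-off at '\n'.

```sh
# Hypothetical helper: read text on stdin and print how many non-space
# bytes are not covered by any match of the ERE given in $1.
unmatched_nonspace () {
	input=$(cat)
	# All non-space bytes in the input.
	total=$(printf '%s\n' "$input" | tr -d '[:space:]' | wc -c)
	# Non-space bytes covered by successive non-overlapping matches.
	matched=$(printf '%s\n' "$input" | grep -oE -e "$1" | tr -d '[:space:]' | wc -c)
	echo $((total - matched))
}

# A regex that only knows identifiers silently drops '=' and ';' ...
printf 'foo = bar;\n' | unmatched_nonspace '[a-z]+'
# ... while a catch-all for single non-space characters leaves nothing behind.
printf 'foo = bar;\n' | unmatched_nonspace '[a-z]+|[^[:space:]]'
```

A non-zero count means that some non-space text would be treated as
ignorable whitespace in the word diff, which is the situation the
check is meant to flag.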
 Documentation/config.txt |    8 ++++
 diff.c                   |   93 +++++++++++++++++++++++++++++++++++++++++++--
 diff.h                   |    1 +
 t/t4034-diff-words.sh    |   65 +++++++++++++++++++++++++++++++-
 4 files changed, 161 insertions(+), 6 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index bf9479e..2e033ea 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -897,6 +897,14 @@ diff.wordRegex::
 	sequences that match the regular expression are "words", all other
 	characters are *ignorable* whitespace.
 
+diff.wordRegexCheck::
+	Perform a simple sanity check on matches of the word regex.
+	Currently this check ensures that the word regex matches all
+	non-space characters, and that the word regex does not match a
+	newline.  The setting controls what to do when the check
+	fails: 'false'/'off'/'ignore' ignore, 'true'/'on'/'warn' emit
+	a warning, and 'error' abort with an error message.
+
 fetch.recurseSubmodules::
 	A boolean value which changes the behavior for fetch and pull, the
 	default is to not recursively fetch populated sumodules unless
diff --git a/diff.c b/diff.c
index 5fdcb15..7213b2b 100644
--- a/diff.c
+++ b/diff.c
@@ -22,11 +22,17 @@
 #define FAST_WORKING_DIRECTORY 1
 #endif
 
+#define REGEX_CHECK_UNSET -1
+#define REGEX_CHECK_OFF 0
+#define REGEX_CHECK_WARN 1
+#define REGEX_CHECK_ERROR 2
+
 static int diff_detect_rename_default;
 static int diff_rename_limit_default = 200;
 static int diff_suppress_blank_empty;
 int diff_use_color_default = -1;
 static const char *diff_word_regex_cfg;
+static int diff_word_regex_check_cfg = REGEX_CHECK_UNSET;
 static const char *external_diff_cmd_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
@@ -75,6 +81,19 @@ static int git_config_rename(const char *var, const char *value)
 	return git_config_bool(var,value) ? DIFF_DETECT_RENAME : 0;
 }
 
+static int parse_regex_check_level(int *b, const char *k, const char *v)
+{
+	if (v && !strcasecmp(v, "ignore"))
+		*b = REGEX_CHECK_OFF;
+	else if (v && !strcasecmp(v, "warn"))
+		*b = REGEX_CHECK_WARN;
+	else if (v && !strcasecmp(v, "error"))
+		*b = REGEX_CHECK_ERROR;
+	else
+		*b = git_config_bool(k, v);
+	return 1;
+}
+
 /*
  * These are to give UI layer defaults.
  * The core-level commands such as git-diff-files should
@@ -107,6 +126,8 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		return git_config_string(&external_diff_cmd_cfg, var, value);
 	if (!strcmp(var, "diff.wordregex"))
 		return git_config_string(&diff_word_regex_cfg, var, value);
+	if (!strcmp(var, "diff.wordregexcheck"))
+		return parse_regex_check_level(&diff_word_regex_check_cfg, var, value);
 
 	if (!strcmp(var, "diff.ignoresubmodules"))
 		handle_ignore_submodules_arg(&default_diff_options, value);
@@ -777,6 +798,50 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	diff_words->last_minus = minus_first;
 }
 
+
+static void check_word_regex_match(struct diff_words_data *diff_words,
+		char *start, int len, int unmatched)
+{
+	int check = diff_words->opt->word_regex_check;
+	void (*report_fn)(const char *, ...);
+
+	if (check == REGEX_CHECK_OFF)
+		return;
+
+	if (check == REGEX_CHECK_WARN)
+		report_fn = warning;
+	else if (check == REGEX_CHECK_ERROR)
+		report_fn = die;
+	else
+		assert(!"expected REGEX_CHECK_WARN or _ERROR");
+
+	if (unmatched) {
+		int i;
+		char *match_str;
+		for (i = 0; i < len; i++) {
+			if (isspace(start[i]))
+				continue;
+			match_str = xmemdupz(start, len);
+			report_fn("The following snippet contains non-space "
+				  "characters, but was not\nmatched by the "
+				  "word regex:\n'%s'\n"
+				  "They would be ignored for the purposes of "
+				  "the diff, which is\nusually not what you want.",
+				  match_str);
+			free(match_str);
+			break;
+		}
+	} else {
+		if (memchr(start, '\n', len)) {
+			char *match_str = xmemdupz(start, len);
+			report_fn("The following word regex match contains a newline "
+				  "and will be truncated there:\n'%s'",
+				  match_str);
+			free(match_str);
+		}
+	}
+}
+
 /* This function starts looking at *begin, and returns 0 iff a word was found. */
 static int find_word_boundaries(mmfile_t *buffer, struct diff_words_data *diff_words,
 		int *begin, int *end)
@@ -785,8 +850,15 @@ static int find_word_boundaries(mmfile_t *buffer, struct diff_words_data *diff_w
 		regmatch_t match[1];
 		if (!regexec(diff_words->word_regex, buffer->ptr + *begin,
 			     1, match, 0)) {
-			char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
-					'\n', match[0].rm_eo - match[0].rm_so);
+			char *prev_start = buffer->ptr + *begin;
+			char *match_start = prev_start + match[0].rm_so;
+			int match_len = match[0].rm_eo - match[0].rm_so;
+			char *p;
+			check_word_regex_match(diff_words, prev_start,
+					       match_start-prev_start, 1);
+			check_word_regex_match(diff_words, match_start,
+					       match_len, 0);
+			p = memchr(match_start, '\n', match_len);
 			*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
 			*begin += match[0].rm_so;
 			return *begin >= *end;
@@ -829,7 +901,7 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
 
 	for (i = 0; i < buffer->text.size; i++) {
 		if (find_word_boundaries(&buffer->text, diff_words, &i, &j))
-			return;
+			break;
 
 		/* store original boundaries */
 		ALLOC_GROW(buffer->orig, buffer->orig_nr + 1,
@@ -846,6 +918,11 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
 
 		i = j - 1;
 	}
+
+	/* no more boundaries, check any non-matched chunk remaining */
+	if (i < buffer->text.size)
+		check_word_regex_match(diff_words, buffer->text.ptr + i,
+				       buffer->text.size-i, 1);
 }
 
 /* this executes the word diff on the accumulated buffers */
@@ -882,8 +959,8 @@ static void diff_words_show(struct diff_words_data *diff_words)
 
 	memset(&xpp, 0, sizeof(xpp));
 	memset(&xecfg, 0, sizeof(xecfg));
-	diff_words_fill(&diff_words->minus, &minus, diff_words->word_regex);
-	diff_words_fill(&diff_words->plus, &plus, diff_words->word_regex);
+	diff_words_fill(&diff_words->minus, &minus, diff_words);
+	diff_words_fill(&diff_words->plus, &plus, diff_words);
 	xpp.flags = 0;
 	/* as only the hunk header will be parsed, we need a 0-context */
 	xecfg.ctxlen = 0;
@@ -2021,6 +2098,10 @@ static void builtin_diff(const char *name_a,
 				o->word_regex = userdiff_word_regex(two);
 			if (!o->word_regex)
 				o->word_regex = diff_word_regex_cfg;
+			if (o->word_regex_check == REGEX_CHECK_UNSET)
+				o->word_regex_check = diff_word_regex_check_cfg;
+			if (o->word_regex_check == REGEX_CHECK_UNSET)
+				o->word_regex_check = REGEX_CHECK_WARN;
 			if (o->word_regex) {
 				ecbdata.diff_words->word_regex = (regex_t *)
 					xmalloc(sizeof(regex_t));
@@ -2861,6 +2942,8 @@ void diff_setup(struct diff_options *options)
 		options->a_prefix = "a/";
 		options->b_prefix = "b/";
 	}
+
+	options->word_regex_check = REGEX_CHECK_UNSET;
 }
 
 int diff_setup_done(struct diff_options *options)
diff --git a/diff.h b/diff.h
index 0083d92..4d02981 100644
--- a/diff.h
+++ b/diff.h
@@ -122,6 +122,7 @@ struct diff_options {
 	int stat_width;
 	int stat_name_width;
 	const char *word_regex;
+	int word_regex_check;
 	enum diff_words_type word_diff;
 
 	/* this is set by diffcore for DIFF_FORMAT_PATCH */
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 8096d8a..ebe72ce 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -8,7 +8,8 @@ test_expect_success setup '
 
 	git config diff.color.old red &&
 	git config diff.color.new green &&
-	git config diff.color.func magenta
+	git config diff.color.func magenta &&
+	git config diff.wordRegexCheck off
 
 '
 
@@ -331,4 +332,66 @@ test_expect_success '--word-diff=none' '
 
 '
 
+echo abcd > pre
+echo aXYd > post
+
+test_expect_success 'diff.wordRegexCheck="error" catches nonspaces' '
+
+	git config diff.wordRegexCheck error &&
+	test_must_fail git diff --no-index --word-diff-regex="a|d" pre post 2>out &&
+	grep "fatal.*contains non-space characters" out
+
+'
+
+newline="
+"
+
+test_expect_success 'diff.wordRegexCheck="error" catches newlines' '
+
+	git config diff.wordRegexCheck error &&
+	test_must_fail git diff --no-index --word-diff-regex=".|$newline" pre post 2>out &&
+	grep "fatal.*contains a newline" out
+
+'
+
+test_expect_success 'diff.wordRegexCheck="warn" works' '
+
+	git config diff.wordRegexCheck warn &&
+	test_must_fail git diff --no-index --word-diff-regex="a|d" pre post 2>out &&
+	grep "warning.*contains non-space characters" out
+
+'
+
+test_expect_success 'diff.wordRegexCheck="ignore" works' '
+
+	git config diff.wordRegexCheck ignore &&
+	test_must_fail git diff --no-index --word-diff-regex="a|d" pre post 2>out &&
+	! grep "contains non-space characters" out
+
+'
+
+test_expect_success 'diff.wordRegexCheck="false" is like "ignore"' '
+
+	git config diff.wordRegexCheck false &&
+	test_must_fail git diff --no-index --word-diff-regex="a|d" pre post 2>out &&
+	! grep "contains non-space characters" out
+
+'
+
+test_expect_success 'diff.wordRegexCheck="true" is like "warn"' '
+
+	git config diff.wordRegexCheck true &&
+	test_must_fail git diff --no-index --word-diff-regex="a|d" pre post 2>out &&
+	grep "warning.*contains non-space characters" out
+
+'
+
+test_expect_success 'diff.wordRegexCheck unset is like "warn"' '
+
+	git config --unset diff.wordRegexCheck &&
+	test_must_fail git diff --no-index --word-diff-regex="a|d" pre post 2>out &&
+	grep "warning.*contains non-space characters" out
+
+'
+
 test_done
-- 
1.7.3.4.789.g74ad1

* [PATCH v2 3/4] userdiff: fix typo in ruby and python word regexes
  2010-12-18 16:17         ` [PATCH v2 " Thomas Rast
  2010-12-18 16:17           ` [PATCH v2 1/4] diff.c: pass struct diff_words into find_word_boundaries Thomas Rast
  2010-12-18 16:17           ` [PATCH v2 2/4] diff.c: implement a sanity check for word regexes Thomas Rast
@ 2010-12-18 16:17           ` Thomas Rast
  2010-12-18 21:02             ` Junio C Hamano
  2010-12-18 16:17           ` [PATCH v2 4/4] t4034: bulk verify builtin word regex sanity Thomas Rast
  2010-12-18 16:24           ` [PATCH v2 0/4] --word-regex sanity checking and such Thomas Rast
  4 siblings, 1 reply; 27+ messages in thread
From: Thomas Rast @ 2010-12-18 16:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Scott Johnson, Michael J Gruber, Matthijs Kooijman, git

Both were missing a closing ] in the final alternative, which defeated
the catch-all meant to ensure that every non-space character is
matched.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 userdiff.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/userdiff.c b/userdiff.c
index f9e05b5..2d54536 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -74,14 +74,14 @@
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"
-	 "|[^[:space:]|[\x80-\xff]+"),
+	 "|[^[:space:]]|[\x80-\xff]+"),
 	 /* -- */
 PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
 	 /* -- */
 	 "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
 	 "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"
-	 "|[^[:space:]|[\x80-\xff]+"),
+	 "|[^[:space:]]|[\x80-\xff]+"),
 PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 "[={}\"]|[^={}\" \t]+"),
 PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
-- 
1.7.3.4.789.g74ad1

* [PATCH v2 4/4] t4034: bulk verify builtin word regex sanity
  2010-12-18 16:17         ` [PATCH v2 " Thomas Rast
                             ` (2 preceding siblings ...)
  2010-12-18 16:17           ` [PATCH v2 3/4] userdiff: fix typo in ruby and python " Thomas Rast
@ 2010-12-18 16:17           ` Thomas Rast
  2011-01-11 21:47             ` [RFC/PATCH 0/3] " Jonathan Nieder
  2010-12-18 16:24           ` [PATCH v2 0/4] --word-regex sanity checking and such Thomas Rast
  4 siblings, 1 reply; 27+ messages in thread
From: Thomas Rast @ 2010-12-18 16:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Scott Johnson, Michael J Gruber, Matthijs Kooijman, git

The builtin word regexes should be tested with some simple examples
against simple issues, like failing to match a non-space character.
Do this in bulk.

Mainly due to a lack of language knowledge and inspiration, most of
the test cases (cpp, csharp, java, objc, pascal, php, python, ruby)
are directly based off a C operator precedence table to verify that
all operators are split correctly.  This means that they are probably
incomplete or inaccurate except for 'cpp' itself.

Still, they are good enough to already have uncovered a typo in the
python and ruby patterns.

'fortran' is based on my anecdotal knowledge of the DO10I parsing
rules, and thus probably useless.  The rest (bibtex, html, tex) are an
ad-hoc test of what I consider important splits in those languages.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/t4034-diff-words.sh  |   20 ++++++++++++++++++++
 t/t4034/bibtex/expect  |   15 +++++++++++++++
 t/t4034/bibtex/post    |   10 ++++++++++
 t/t4034/bibtex/pre     |    9 +++++++++
 t/t4034/cpp/expect     |   36 ++++++++++++++++++++++++++++++++++++
 t/t4034/cpp/post       |   19 +++++++++++++++++++
 t/t4034/cpp/pre        |   19 +++++++++++++++++++
 t/t4034/csharp/expect  |   35 +++++++++++++++++++++++++++++++++++
 t/t4034/csharp/post    |   18 ++++++++++++++++++
 t/t4034/csharp/pre     |   18 ++++++++++++++++++
 t/t4034/fortran/expect |   10 ++++++++++
 t/t4034/fortran/post   |    5 +++++
 t/t4034/fortran/pre    |    5 +++++
 t/t4034/html/expect    |    8 ++++++++
 t/t4034/html/post      |    3 +++
 t/t4034/html/pre       |    3 +++
 t/t4034/java/expect    |   36 ++++++++++++++++++++++++++++++++++++
 t/t4034/java/post      |   19 +++++++++++++++++++
 t/t4034/java/pre       |   19 +++++++++++++++++++
 t/t4034/objc/expect    |   35 +++++++++++++++++++++++++++++++++++
 t/t4034/objc/post      |   18 ++++++++++++++++++
 t/t4034/objc/pre       |   18 ++++++++++++++++++
 t/t4034/pascal/expect  |   35 +++++++++++++++++++++++++++++++++++
 t/t4034/pascal/post    |   18 ++++++++++++++++++
 t/t4034/pascal/pre     |   18 ++++++++++++++++++
 t/t4034/php/expect     |   35 +++++++++++++++++++++++++++++++++++
 t/t4034/php/post       |   18 ++++++++++++++++++
 t/t4034/php/pre        |   18 ++++++++++++++++++
 t/t4034/python/expect  |   34 ++++++++++++++++++++++++++++++++++
 t/t4034/python/post    |   17 +++++++++++++++++
 t/t4034/python/pre     |   17 +++++++++++++++++
 t/t4034/ruby/expect    |   34 ++++++++++++++++++++++++++++++++++
 t/t4034/ruby/post      |   17 +++++++++++++++++
 t/t4034/ruby/pre       |   17 +++++++++++++++++
 t/t4034/tex/expect     |    9 +++++++++
 t/t4034/tex/post       |    4 ++++
 t/t4034/tex/pre        |    4 ++++
 37 files changed, 673 insertions(+), 0 deletions(-)
 create mode 100644 t/t4034/bibtex/expect
 create mode 100644 t/t4034/bibtex/post
 create mode 100644 t/t4034/bibtex/pre
 create mode 100644 t/t4034/cpp/expect
 create mode 100644 t/t4034/cpp/post
 create mode 100644 t/t4034/cpp/pre
 create mode 100644 t/t4034/csharp/expect
 create mode 100644 t/t4034/csharp/post
 create mode 100644 t/t4034/csharp/pre
 create mode 100644 t/t4034/fortran/expect
 create mode 100644 t/t4034/fortran/post
 create mode 100644 t/t4034/fortran/pre
 create mode 100644 t/t4034/html/expect
 create mode 100644 t/t4034/html/post
 create mode 100644 t/t4034/html/pre
 create mode 100644 t/t4034/java/expect
 create mode 100644 t/t4034/java/post
 create mode 100644 t/t4034/java/pre
 create mode 100644 t/t4034/objc/expect
 create mode 100644 t/t4034/objc/post
 create mode 100644 t/t4034/objc/pre
 create mode 100644 t/t4034/pascal/expect
 create mode 100644 t/t4034/pascal/post
 create mode 100644 t/t4034/pascal/pre
 create mode 100644 t/t4034/php/expect
 create mode 100644 t/t4034/php/post
 create mode 100644 t/t4034/php/pre
 create mode 100644 t/t4034/python/expect
 create mode 100644 t/t4034/python/post
 create mode 100644 t/t4034/python/pre
 create mode 100644 t/t4034/ruby/expect
 create mode 100644 t/t4034/ruby/post
 create mode 100644 t/t4034/ruby/pre
 create mode 100644 t/t4034/tex/expect
 create mode 100644 t/t4034/tex/post
 create mode 100644 t/t4034/tex/pre

diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index ebe72ce..c537116 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -394,4 +394,24 @@ test_expect_success 'diff.wordRegexCheck unset is like "warn"' '
 
 '
 
+test_expect_success 'set diff.wordRegexCheck=error for language tests' '
+
+	git config diff.wordRegexCheck error
+
+'
+
+word_diff_for_language () {
+	cp "$TEST_DIRECTORY/t4034/$1/pre" "$TEST_DIRECTORY/t4034/$1/post" \
+		"$TEST_DIRECTORY/t4034/$1/expect" . &&
+	echo "* diff=$1" > .gitattributes &&
+	word_diff --color-words && cp output output.$1
+}
+
+for lang_dir in "$TEST_DIRECTORY"/t4034/*; do
+	lang=${lang_dir#"$TEST_DIRECTORY"/t4034/}
+	test_expect_success "diff driver '$lang' has sane word regex" "
+		word_diff_for_language $lang
+	"
+done
+
 test_done
diff --git a/t/t4034/bibtex/expect b/t/t4034/bibtex/expect
new file mode 100644
index 0000000..a157774
--- /dev/null
+++ b/t/t4034/bibtex/expect
@@ -0,0 +1,15 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 95cd55b..ddcba9b 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,9 +1,10 @@<RESET>
+@article{aldous1987uie,<RESET>
+  title={{Ultimate instability of exponential back-off protocol for acknowledgment-based transmission control of random access communication channels}},<RESET>
+  author={Aldous, <RED>D.<RESET><GREEN>David<RESET>},
+  journal={Information Theory, IEEE Transactions on},<RESET>
+  volume={<RED>33<RESET><GREEN>Bogus.<RESET>},
+  number={<RED>2<RESET><GREEN>4<RESET>},
+  pages={219--223},<RESET>
+  year=<GREEN>1987,<RESET>
+<GREEN>  note={This is in fact a rather funny read since ethernet works well in practice. The<RESET> {<RED>1987<RESET><GREEN>\em pre} reference is the right one, however.<RESET>}<RED>,<RESET>
+}<RESET>
diff --git a/t/t4034/bibtex/post b/t/t4034/bibtex/post
new file mode 100644
index 0000000..ddcba9b
--- /dev/null
+++ b/t/t4034/bibtex/post
@@ -0,0 +1,10 @@
+@article{aldous1987uie,
+  title={{Ultimate instability of exponential back-off protocol for acknowledgment-based transmission control of random access communication channels}},
+  author={Aldous, David},
+  journal={Information Theory, IEEE Transactions on},
+  volume={Bogus.},
+  number={4},
+  pages={219--223},
+  year=1987,
+  note={This is in fact a rather funny read since ethernet works well in practice. The {\em pre} reference is the right one, however.}
+}
diff --git a/t/t4034/bibtex/pre b/t/t4034/bibtex/pre
new file mode 100644
index 0000000..95cd55b
--- /dev/null
+++ b/t/t4034/bibtex/pre
@@ -0,0 +1,9 @@
+@article{aldous1987uie,
+  title={{Ultimate instability of exponential back-off protocol for acknowledgment-based transmission control of random access communication channels}},
+  author={Aldous, D.},
+  journal={Information Theory, IEEE Transactions on},
+  volume={33},
+  number={2},
+  pages={219--223},
+  year={1987},
+}
diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect
new file mode 100644
index 0000000..37d1ea2
--- /dev/null
+++ b/t/t4034/cpp/expect
@@ -0,0 +1,36 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 23d5c8a..7e8c026 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,19 +1,19 @@<RESET>
+Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <GREEN>bar(x);<RESET> }
+cout<<"Hello World<RED>!<RESET><GREEN>?<RESET>\n"<<endl;
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
+<RED>a<RESET><GREEN>x<RESET>::<RED>b<RESET><GREEN>y<RESET>
diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post
new file mode 100644
index 0000000..7e8c026
--- /dev/null
+++ b/t/t4034/cpp/post
@@ -0,0 +1,19 @@
+Foo() : x(0&42) { bar(x); }
+cout<<"Hello World?\n"<<endl;
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
+x::y
diff --git a/t/t4034/cpp/pre b/t/t4034/cpp/pre
new file mode 100644
index 0000000..23d5c8a
--- /dev/null
+++ b/t/t4034/cpp/pre
@@ -0,0 +1,19 @@
+Foo():x(0&&1){}
+cout<<"Hello World!\n"<<endl;
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
+a::b
diff --git a/t/t4034/csharp/expect b/t/t4034/csharp/expect
new file mode 100644
index 0000000..e5d1dd2
--- /dev/null
+++ b/t/t4034/csharp/expect
@@ -0,0 +1,35 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 9106d63..dd5f421 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,18 +1,18 @@<RESET>
+Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <GREEN>bar(x);<RESET> }
+cout<<"Hello World<RED>!<RESET><GREEN>?<RESET>\n"<<endl;
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
diff --git a/t/t4034/csharp/post b/t/t4034/csharp/post
new file mode 100644
index 0000000..dd5f421
--- /dev/null
+++ b/t/t4034/csharp/post
@@ -0,0 +1,18 @@
+Foo() : x(0&42) { bar(x); }
+cout<<"Hello World?\n"<<endl;
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
diff --git a/t/t4034/csharp/pre b/t/t4034/csharp/pre
new file mode 100644
index 0000000..9106d63
--- /dev/null
+++ b/t/t4034/csharp/pre
@@ -0,0 +1,18 @@
+Foo():x(0&&1){}
+cout<<"Hello World!\n"<<endl;
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
diff --git a/t/t4034/fortran/expect b/t/t4034/fortran/expect
new file mode 100644
index 0000000..b233dbd
--- /dev/null
+++ b/t/t4034/fortran/expect
@@ -0,0 +1,10 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 87f0d0b..d308da2 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,5 +1,5 @@<RESET>
+print *, "Hello World<RED>!<RESET><GREEN>?<RESET>"
+
+DO10I = 1,10<RESET>
+<RED>DO10I<RESET><GREEN>DO 10 I<RESET> = 1,10
+<RED>DO10I<RESET><GREEN>DO 1 0 I<RESET> = 1,10
diff --git a/t/t4034/fortran/post b/t/t4034/fortran/post
new file mode 100644
index 0000000..d308da2
--- /dev/null
+++ b/t/t4034/fortran/post
@@ -0,0 +1,5 @@
+print *, "Hello World?"
+
+DO10I = 1,10
+DO 10 I = 1,10
+DO 1 0 I = 1,10
diff --git a/t/t4034/fortran/pre b/t/t4034/fortran/pre
new file mode 100644
index 0000000..87f0d0b
--- /dev/null
+++ b/t/t4034/fortran/pre
@@ -0,0 +1,5 @@
+print *, "Hello World!"
+
+DO10I = 1,10
+DO10I = 1,10
+DO10I = 1,10
diff --git a/t/t4034/html/expect b/t/t4034/html/expect
new file mode 100644
index 0000000..447b49a
--- /dev/null
+++ b/t/t4034/html/expect
@@ -0,0 +1,8 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 8ca4aea..46921e5 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,3 +1,3 @@<RESET>
+<tag <GREEN>newattr="newvalue"<RESET>><GREEN>added<RESET> content</tag>
+<tag attr=<RED>"value"<RESET><GREEN>"newvalue"<RESET>><RED>content<RESET><GREEN>changed<RESET></tag>
+<<RED>tag<RESET><GREEN>newtag<RESET>>content <RED>&entity;<RESET><GREEN>&newentity;<RESET><<RED>/tag<RESET><GREEN>/newtag<RESET>>
diff --git a/t/t4034/html/post b/t/t4034/html/post
new file mode 100644
index 0000000..46921e5
--- /dev/null
+++ b/t/t4034/html/post
@@ -0,0 +1,3 @@
+<tag newattr="newvalue">added content</tag>
+<tag attr="newvalue">changed</tag>
+<newtag>content &newentity;</newtag>
diff --git a/t/t4034/html/pre b/t/t4034/html/pre
new file mode 100644
index 0000000..8ca4aea
--- /dev/null
+++ b/t/t4034/html/pre
@@ -0,0 +1,3 @@
+<tag>content</tag>
+<tag attr="value">content</tag>
+<tag>content &entity;</tag>
diff --git a/t/t4034/java/expect b/t/t4034/java/expect
new file mode 100644
index 0000000..37d1ea2
--- /dev/null
+++ b/t/t4034/java/expect
@@ -0,0 +1,36 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 23d5c8a..7e8c026 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,19 +1,19 @@<RESET>
+Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <GREEN>bar(x);<RESET> }
+cout<<"Hello World<RED>!<RESET><GREEN>?<RESET>\n"<<endl;
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
+<RED>a<RESET><GREEN>x<RESET>::<RED>b<RESET><GREEN>y<RESET>
diff --git a/t/t4034/java/post b/t/t4034/java/post
new file mode 100644
index 0000000..7e8c026
--- /dev/null
+++ b/t/t4034/java/post
@@ -0,0 +1,19 @@
+Foo() : x(0&42) { bar(x); }
+cout<<"Hello World?\n"<<endl;
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
+x::y
diff --git a/t/t4034/java/pre b/t/t4034/java/pre
new file mode 100644
index 0000000..23d5c8a
--- /dev/null
+++ b/t/t4034/java/pre
@@ -0,0 +1,19 @@
+Foo():x(0&&1){}
+cout<<"Hello World!\n"<<endl;
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
+a::b
diff --git a/t/t4034/objc/expect b/t/t4034/objc/expect
new file mode 100644
index 0000000..e5d1dd2
--- /dev/null
+++ b/t/t4034/objc/expect
@@ -0,0 +1,35 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 9106d63..dd5f421 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,18 +1,18 @@<RESET>
+Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <GREEN>bar(x);<RESET> }
+cout<<"Hello World<RED>!<RESET><GREEN>?<RESET>\n"<<endl;
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
diff --git a/t/t4034/objc/post b/t/t4034/objc/post
new file mode 100644
index 0000000..dd5f421
--- /dev/null
+++ b/t/t4034/objc/post
@@ -0,0 +1,18 @@
+Foo() : x(0&42) { bar(x); }
+cout<<"Hello World?\n"<<endl;
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
diff --git a/t/t4034/objc/pre b/t/t4034/objc/pre
new file mode 100644
index 0000000..9106d63
--- /dev/null
+++ b/t/t4034/objc/pre
@@ -0,0 +1,18 @@
+Foo():x(0&&1){}
+cout<<"Hello World!\n"<<endl;
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
diff --git a/t/t4034/pascal/expect b/t/t4034/pascal/expect
new file mode 100644
index 0000000..2ce4230
--- /dev/null
+++ b/t/t4034/pascal/expect
@@ -0,0 +1,35 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 077046c..8865e6b 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,18 +1,18 @@<RESET>
+writeln("Hello World<RED>!<RESET><GREEN>?<RESET>");
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
+<RED>a<RESET><GREEN>x<RESET>::<RED>b<RESET><GREEN>y<RESET>
diff --git a/t/t4034/pascal/post b/t/t4034/pascal/post
new file mode 100644
index 0000000..8865e6b
--- /dev/null
+++ b/t/t4034/pascal/post
@@ -0,0 +1,18 @@
+writeln("Hello World?");
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
+x::y
diff --git a/t/t4034/pascal/pre b/t/t4034/pascal/pre
new file mode 100644
index 0000000..077046c
--- /dev/null
+++ b/t/t4034/pascal/pre
@@ -0,0 +1,18 @@
+writeln("Hello World!");
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
+a::b
diff --git a/t/t4034/php/expect b/t/t4034/php/expect
new file mode 100644
index 0000000..0404408
--- /dev/null
+++ b/t/t4034/php/expect
@@ -0,0 +1,35 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index cf6e06b..4420a49 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,18 +1,18 @@<RESET>
+<GREEN>(<RESET>$var<GREEN>)<RESET> $ var
+<?="Hello World<RED>!<RESET><GREEN>?<RESET>"?>
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
diff --git a/t/t4034/php/post b/t/t4034/php/post
new file mode 100644
index 0000000..4420a49
--- /dev/null
+++ b/t/t4034/php/post
@@ -0,0 +1,18 @@
+($var) $ var
+<?="Hello World?"?>
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
diff --git a/t/t4034/php/pre b/t/t4034/php/pre
new file mode 100644
index 0000000..cf6e06b
--- /dev/null
+++ b/t/t4034/php/pre
@@ -0,0 +1,18 @@
+$var $var
+<?= "Hello World!" ?>
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
diff --git a/t/t4034/python/expect b/t/t4034/python/expect
new file mode 100644
index 0000000..8abb8a4
--- /dev/null
+++ b/t/t4034/python/expect
@@ -0,0 +1,34 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 438f776..68baf34 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,17 +1,17 @@<RESET>
+print<RED>u<RESET> "Hello World<RED>!<RESET><GREEN>?<RESET>\n"<GREEN>; print<RESET>
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>) u<RESET>'<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
diff --git a/t/t4034/python/post b/t/t4034/python/post
new file mode 100644
index 0000000..68baf34
--- /dev/null
+++ b/t/t4034/python/post
@@ -0,0 +1,17 @@
+print "Hello World?\n"; print
+(1) (-1e10) (0xabcdef) u'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
diff --git a/t/t4034/python/pre b/t/t4034/python/pre
new file mode 100644
index 0000000..438f776
--- /dev/null
+++ b/t/t4034/python/pre
@@ -0,0 +1,17 @@
+print u"Hello World!\n"
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
diff --git a/t/t4034/ruby/expect b/t/t4034/ruby/expect
new file mode 100644
index 0000000..16e1dd5
--- /dev/null
+++ b/t/t4034/ruby/expect
@@ -0,0 +1,34 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 30ed9a1..7678f14 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,17 +1,17 @@<RESET>
+10.downto(1) {|<RED>x<RESET><GREEN>y<RESET>| puts <RED>x<RESET><GREEN>y<RESET>}
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a?b<RESET><GREEN>y<RESET>
+<GREEN>x?y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
diff --git a/t/t4034/ruby/post b/t/t4034/ruby/post
new file mode 100644
index 0000000..7678f14
--- /dev/null
+++ b/t/t4034/ruby/post
@@ -0,0 +1,17 @@
+10.downto(1) {|y| puts y}
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
diff --git a/t/t4034/ruby/pre b/t/t4034/ruby/pre
new file mode 100644
index 0000000..30ed9a1
--- /dev/null
+++ b/t/t4034/ruby/pre
@@ -0,0 +1,17 @@
+10.downto(1) {|x| puts x}
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
diff --git a/t/t4034/tex/expect b/t/t4034/tex/expect
new file mode 100644
index 0000000..604969b
--- /dev/null
+++ b/t/t4034/tex/expect
@@ -0,0 +1,9 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 2b2dfcb..65cab61 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,4 +1,4 @@<RESET>
+\section{Something <GREEN>new<RESET>}
+<RED>\emph<RESET><GREEN>\textbf<RESET>{Macro style}
+{<RED>\em<RESET><GREEN>\bfseries<RESET> State toggle style}
+\\[<RED>1em<RESET><GREEN>1cm<RESET>]
diff --git a/t/t4034/tex/post b/t/t4034/tex/post
new file mode 100644
index 0000000..65cab61
--- /dev/null
+++ b/t/t4034/tex/post
@@ -0,0 +1,4 @@
+\section{Something new}
+\textbf{Macro style}
+{\bfseries State toggle style}
+\\[1cm]
diff --git a/t/t4034/tex/pre b/t/t4034/tex/pre
new file mode 100644
index 0000000..2b2dfcb
--- /dev/null
+++ b/t/t4034/tex/pre
@@ -0,0 +1,4 @@
+\section{Something}
+\emph{Macro style}
+{\em State toggle style}
+\\[1em]
-- 
1.7.3.4.789.g74ad1


* Re: [PATCH v2 0/4] --word-regex sanity checking and such
  2010-12-18 16:17         ` [PATCH v2 " Thomas Rast
                             ` (3 preceding siblings ...)
  2010-12-18 16:17           ` [PATCH v2 4/4] t4034: bulk verify builtin word regex sanity Thomas Rast
@ 2010-12-18 16:24           ` Thomas Rast
  2010-12-18 20:48             ` Junio C Hamano
  4 siblings, 1 reply; 27+ messages in thread
From: Thomas Rast @ 2010-12-18 16:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Scott Johnson, Michael J Gruber, Matthijs Kooijman, git

I wrote:
>   diff.c: pass struct diff_words into find_word_boundaries
>   diff.c: implement a sanity check for word regexes
>   userdiff: fix typo in ruby and python word regexes
>   t4034: bulk verify builtin word regex sanity

BTW, Junio, you could move the third patch to the front and merge it
to maint.  I think it's an obvious fix to a real bug, and it does not
depend on the other patches except that the test in 4/4 will fail
without the fix.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


* Re: [PATCH v2 0/4] --word-regex sanity checking and such
  2010-12-18 16:24           ` [PATCH v2 0/4] --word-regex sanity checking and such Thomas Rast
@ 2010-12-18 20:48             ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2010-12-18 20:48 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Scott Johnson, Michael J Gruber, Matthijs Kooijman, git

Thomas Rast <trast@student.ethz.ch> writes:

> I wrote:
>>   diff.c: pass struct diff_words into find_word_boundaries
>>   diff.c: implement a sanity check for word regexes
>>   userdiff: fix typo in ruby and python word regexes
>>   t4034: bulk verify builtin word regex sanity
>
> BTW, Junio, you could move the third patch to the front and merge it
> to maint.  I think it's an obvious fix to a real bug, and it does not
> depend on the other patches except that the test in 4/4 will fail
> without the fix.

Makes sense; thanks.


* Re: [PATCH v2 2/4] diff.c: implement a sanity check for word regexes
  2010-12-18 16:17           ` [PATCH v2 2/4] diff.c: implement a sanity check for word regexes Thomas Rast
@ 2010-12-18 21:00             ` Junio C Hamano
  2010-12-19  1:59               ` Thomas Rast
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2010-12-18 21:00 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Scott Johnson, Michael J Gruber, Matthijs Kooijman, git

Thomas Rast <trast@student.ethz.ch> writes:

> Word regexes are a bit of a dangerous beast, since it is easily
> possible to not match a non-space part, which is subsequently ignored
> for the purposes of emitting the word diff.  This was clearly stated
> in the docs, but users still tripped over it.
>
> Implement a safeguard that verifies two basic sanity assumptions:
>
> * The word regex matches anything that is !isspace().
>
> * The word regex does not match '\n'.  (This case is not very harmful,
>   but we used to silently cut off at the '\n' which may go against
>   user expectations.)
>
> This is configurable via 'diff.wordRegexCheck', and defaults to
> 'warn'.

How expensive is it to run this check twice, every time word_regex
finds a match?

As this is about making sure that we got a sane regex from the user (or a
builtin pattern), I wonder if we can make it not depend on the payload we
are matching the regex against.  Then before using a word_regex that we
have not checked, we check if that regex is sane, mark it checked, and do
not have to do the check over and over again.


* Re: [PATCH v2 3/4] userdiff: fix typo in ruby and python word regexes
  2010-12-18 16:17           ` [PATCH v2 3/4] userdiff: fix typo in ruby and python " Thomas Rast
@ 2010-12-18 21:02             ` Junio C Hamano
  2010-12-19  2:10               ` Thomas Rast
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2010-12-18 21:02 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Junio C Hamano, Scott Johnson, Michael J Gruber,
	Matthijs Kooijman, git

Thomas Rast <trast@student.ethz.ch> writes:

> Both had an unclosed ] that ruined the safeguard against not matching
> a non-space char.

Thanks.

Couldn't we have found this without your "sanity check" patch?  Are we
ignoring error returns from regcomp() in some codepath, or is it just that
we are catching them but our test suite lacks ruby and python test
vectors?


* Re: [PATCH v2 2/4] diff.c: implement a sanity check for word regexes
  2010-12-18 21:00             ` Junio C Hamano
@ 2010-12-19  1:59               ` Thomas Rast
  0 siblings, 0 replies; 27+ messages in thread
From: Thomas Rast @ 2010-12-19  1:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Scott Johnson, Michael J Gruber, Matthijs Kooijman, git

Junio C Hamano wrote:
> Thomas Rast <trast@student.ethz.ch> writes:
> 
> > * The word regex matches anything that is !isspace().
> >
> > * The word regex does not match '\n'.  (This case is not very harmful,
> >   but we used to silently cut off at the '\n' which may go against
> >   user expectations.)
> 
> How expensive is it to run this check twice, every time word_regex
> finds a match?

It runs the first bullet point for every non-match, and the second
bullet point for every match.  So it looks at every input character
exactly once.
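
To make the "every input character exactly once" claim concrete, here
is a minimal sketch of such a payload-driven check.  It is not the code
from patch 2/4; the names check_word_regex, check_pattern,
SKIPS_NONSPACE and MATCHES_NEWLINE are made up for illustration.  Gaps
between matches are checked against the first bullet point, matches
against the second.

```c
#include <ctype.h>
#include <regex.h>
#include <string.h>

/* Hypothetical result bits, not taken from the patch. */
#define SKIPS_NONSPACE  1	/* a non-space character fell between matches */
#define MATCHES_NEWLINE 2	/* a match spans a '\n' */

/* Scan text once and return a bitmask of the problems found. */
int check_word_regex(const regex_t *re, const char *text)
{
	const char *p = text, *q;
	int problems = 0;

	while (*p) {
		regmatch_t m;
		const char *gap_end, *next;

		if (regexec(re, p, 1, &m, 0)) {
			/* no further match: everything left is a gap */
			gap_end = next = p + strlen(p);
		} else if (m.rm_eo == m.rm_so) {
			/* empty match: count one character as unmatched */
			gap_end = next = p + m.rm_so + (p[m.rm_so] != '\0');
		} else {
			gap_end = p + m.rm_so;
			next = p + m.rm_eo;
			/* second bullet: a word must not contain '\n' */
			if (memchr(gap_end, '\n', (size_t)(next - gap_end)))
				problems |= MATCHES_NEWLINE;
		}
		/* first bullet: only whitespace may go unmatched */
		for (q = p; q < gap_end; q++)
			if (!isspace((unsigned char)*q))
				problems |= SKIPS_NONSPACE;
		p = next;
	}
	return problems;
}

/* Convenience wrapper: compile, check, free.  Returns -1 if regcomp() fails. */
int check_pattern(const char *pattern, int cflags, const char *text)
{
	regex_t re;
	int problems;

	if (regcomp(&re, pattern, cflags))
		return -1;
	problems = check_word_regex(&re, text);
	regfree(&re);
	return problems;
}
```

For example, check_pattern("[a-z]+", REG_EXTENDED | REG_NEWLINE,
"foo + bar") reports SKIPS_NONSPACE because the "+" is never matched.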

> As this is about making sure that we got a sane regex from the user (or a
> builtin pattern), I wonder if we can make it not depend on the payload we
> are matching the regex against.  Then before using a word_regex that we
> have not checked, we check if that regex is sane, mark it checked, and do
> not have to do the check over and over again.

Algorithmically it should be easy once you have the finite state
automaton corresponding to the regex: just verify that for every
possible non-terminal state, there is a transition for every
!isspace() character to a state other than "fail to match" or "match
the empty string".

In the implementation, it might be doable if we switch to compat/regex
on all platforms, since we then have ready access to all internal
structures regcomp() creates, including the DFA.

I'll think about at least using compat/regex for a static check of all
*builtin* patterns, which would be superior to the brute force
approach in 4/4.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


* Re: [PATCH v2 3/4] userdiff: fix typo in ruby and python word regexes
  2010-12-18 21:02             ` Junio C Hamano
@ 2010-12-19  2:10               ` Thomas Rast
  0 siblings, 0 replies; 27+ messages in thread
From: Thomas Rast @ 2010-12-19  2:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Scott Johnson, Michael J Gruber, Matthijs Kooijman, git

Junio C Hamano wrote:
> Thomas Rast <trast@student.ethz.ch> writes:
> 
> > Both had an unclosed ] that ruined the safeguard against not matching
> > a non-space char.
> 
> Thanks.
> 
> Couldn't we have found this without your "sanity check" patch?  Are we
> ignoring error returns from regcomp() in some codepath, or is it just that
> we are catching them but our test suite lacks ruby and python test
> vectors?

We lacked test vectors, but we still couldn't have caught it.  We do
check for errors in regcomp():

	if (o->word_regex) {
		ecbdata.diff_words->word_regex = (regex_t *)
			xmalloc(sizeof(regex_t));
		if (regcomp(ecbdata.diff_words->word_regex,
				o->word_regex,
				REG_EXTENDED | REG_NEWLINE))
			die ("Invalid regular expression: %s",
					o->word_regex);
	}

(Now that I'm seeing this and comparing with regcomp(3), we should
actually report regerror() as part of the error message.)
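
A rough sketch of what reporting regerror() could look like, using only
the POSIX regcomp()/regerror() calls.  The helper name
compile_word_regex and the message layout are assumptions for
illustration; in diff.c the message would go through die() instead of a
caller-supplied buffer.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Compile pattern with the same flags as the snippet above.
 * Returns 0 on success (the caller must regfree(re) later).
 * On failure returns -1 and writes a message into msg that
 * includes the reason reported by regerror().
 */
int compile_word_regex(regex_t *re, const char *pattern,
		       char *msg, size_t msglen)
{
	char reason[256];
	int err = regcomp(re, pattern, REG_EXTENDED | REG_NEWLINE);

	if (!err)
		return 0;
	/* regerror() turns the regcomp() error code into readable text */
	regerror(err, re, reason, sizeof(reason));
	snprintf(msg, msglen, "Invalid regular expression: %s: %s",
		 pattern, reason);
	return -1;
}
```

Note that this would not have caught the ruby/python typo either, since
regcomp() accepts that pattern.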

The problem is that the pattern is still valid.  Consider the fix to
the final two arms of the regex:

-        "|[^[:space:]|[\x80-\xff]+"),
+        "|[^[:space:]]|[\x80-\xff]+"),

In the preimage, it parses like so:

  | [^
      [:space:]|[\x80-\xff
     ]+

That is, the third [ is part of the (negated) character class, along
with the | before it and the \x80-\xff range after it.  So the only
problem is with |, [ or \x80-\xff bytes in the input.  Any other
non-space character is matched by the class.
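
This can be checked in isolation with plain POSIX regcomp()/regexec().
The sketch below is only a demonstration of the parse, not part of any
patch; the macro and function names are made up, and the two patterns
are just the final arms quoted in the diff, without the
language-specific arms that precede them in userdiff.c.

```c
#include <regex.h>

/* Final arms only, before and after the fix. */
#define BROKEN_ARM "[^[:space:]|[\x80-\xff]+"
#define FIXED_ARMS "[^[:space:]]|[\x80-\xff]+"

/*
 * Return the length of the match at the very start of text,
 * 0 if nothing matches there, or -1 if the pattern does not compile.
 */
int match_len_at_start(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	int len = 0;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE))
		return -1;
	if (!regexec(&re, text, 1, &m, 0) && m.rm_so == 0)
		len = (int)(m.rm_eo - m.rm_so);
	regfree(&re);
	return len;
}
```

Both patterns compile.  With BROKEN_ARM, the inputs "|" and "[" yield
no match at all, while "ab" is matched as one two-character word; with
FIXED_ARMS each of those inputs yields a one-character match.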

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


* [RFC/PATCH 0/3] Re: t4034: bulk verify builtin word regex sanity
  2010-12-18 16:17           ` [PATCH v2 4/4] t4034: bulk verify builtin word regex sanity Thomas Rast
@ 2011-01-11 21:47             ` Jonathan Nieder
  2011-01-11 21:48               ` [PATCH 1/3] " Jonathan Nieder
                                 ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Jonathan Nieder @ 2011-01-11 21:47 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Junio C Hamano, Scott Johnson, Michael J Gruber,
	Matthijs Kooijman, git

Thomas Rast wrote:

> The builtin word regexes should be tested with some simple examples
> against simple issues, like failing to match a non-space character.
> Do this in bulk.

The above patch depends on "diff.c: implement a sanity check for word
regexes" but not in any essential way.  Patch 1 below is the part that
is still relevant without it.

Patch 2 changes the UTF-8 catchall to match a single non-ASCII
character[1], and at the same time makes the catchall harder to
forget.  (My motivation is that the UTF-8 catchall is missing in the
new perl support.)
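
To illustrate the difference between matching a run of high bytes and
matching a single non-ASCII character, here is a small sketch.  The
first pattern is the catchall arm quoted earlier in this thread; the
second (a lead byte followed by continuation bytes) is only an assumed
example of a one-character catchall and is not quoted from patch 2.
All names are hypothetical.

```c
#include <regex.h>

/* Catchall arm as quoted in the thread: any run of high bytes. */
#define RUN_OF_HIGH_BYTES "[\x80-\xff]+"
/* Assumed example: one UTF-8 lead byte plus its continuation bytes. */
#define ONE_UTF8_CHAR "[\xc0-\xff][\x80-\xbf]+"

/*
 * Return the length in bytes of the match at the start of text,
 * 0 if nothing matches there, or -1 if the pattern does not compile.
 */
int leading_match_len(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	int len = 0;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE))
		return -1;
	if (!regexec(&re, text, 1, &m, 0) && m.rm_so == 0)
		len = (int)(m.rm_eo - m.rm_so);
	regfree(&re);
	return len;
}
```

Given two adjacent two-byte characters such as "\xc3\xa9\xc3\xbc", the
first pattern treats all four bytes as one word, while the second stops
after the first character, so a change to one character would not drag
its neighbour into the word diff.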

Patch 3 contains some cosmetic tweaks to the tests.  They were meant
as preparations for patch 2 but I chickened out and stopped there.

Thoughts and improvements welcome, as always.

Jonathan Nieder (2):
  userdiff: simplify word-diff safeguard
  t4034 (diff --word-diff): style suggestions

Thomas Rast (1):
  t4034: bulk verify builtin word regex sanity

 t/t4034-diff-words.sh  |  463 +++++++++++++++++++++++-------------------------
 t/t4034/bibtex/expect  |   15 ++
 t/t4034/bibtex/post    |   10 +
 t/t4034/bibtex/pre     |    9 +
 t/t4034/cpp/expect     |   36 ++++
 t/t4034/cpp/post       |   19 ++
 t/t4034/cpp/pre        |   19 ++
 t/t4034/csharp/expect  |   35 ++++
 t/t4034/csharp/post    |   18 ++
 t/t4034/csharp/pre     |   18 ++
 t/t4034/fortran/expect |   10 +
 t/t4034/fortran/post   |    5 +
 t/t4034/fortran/pre    |    5 +
 t/t4034/html/expect    |    8 +
 t/t4034/html/post      |    3 +
 t/t4034/html/pre       |    3 +
 t/t4034/java/expect    |   36 ++++
 t/t4034/java/post      |   19 ++
 t/t4034/java/pre       |   19 ++
 t/t4034/objc/expect    |   35 ++++
 t/t4034/objc/post      |   18 ++
 t/t4034/objc/pre       |   18 ++
 t/t4034/pascal/expect  |   35 ++++
 t/t4034/pascal/post    |   18 ++
 t/t4034/pascal/pre     |   18 ++
 t/t4034/php/expect     |   35 ++++
 t/t4034/php/post       |   18 ++
 t/t4034/php/pre        |   18 ++
 t/t4034/python/expect  |   34 ++++
 t/t4034/python/post    |   17 ++
 t/t4034/python/pre     |   17 ++
 t/t4034/ruby/expect    |   34 ++++
 t/t4034/ruby/post      |   17 ++
 t/t4034/ruby/pre       |   17 ++
 t/t4034/tex/expect     |    9 +
 t/t4034/tex/post       |    4 +
 t/t4034/tex/pre        |    4 +
 userdiff.c             |   37 ++---
 38 files changed, 887 insertions(+), 266 deletions(-)
 create mode 100644 t/t4034/bibtex/expect
 create mode 100644 t/t4034/bibtex/post
 create mode 100644 t/t4034/bibtex/pre
 create mode 100644 t/t4034/cpp/expect
 create mode 100644 t/t4034/cpp/post
 create mode 100644 t/t4034/cpp/pre
 create mode 100644 t/t4034/csharp/expect
 create mode 100644 t/t4034/csharp/post
 create mode 100644 t/t4034/csharp/pre
 create mode 100644 t/t4034/fortran/expect
 create mode 100644 t/t4034/fortran/post
 create mode 100644 t/t4034/fortran/pre
 create mode 100644 t/t4034/html/expect
 create mode 100644 t/t4034/html/post
 create mode 100644 t/t4034/html/pre
 create mode 100644 t/t4034/java/expect
 create mode 100644 t/t4034/java/post
 create mode 100644 t/t4034/java/pre
 create mode 100644 t/t4034/objc/expect
 create mode 100644 t/t4034/objc/post
 create mode 100644 t/t4034/objc/pre
 create mode 100644 t/t4034/pascal/expect
 create mode 100644 t/t4034/pascal/post
 create mode 100644 t/t4034/pascal/pre
 create mode 100644 t/t4034/php/expect
 create mode 100644 t/t4034/php/post
 create mode 100644 t/t4034/php/pre
 create mode 100644 t/t4034/python/expect
 create mode 100644 t/t4034/python/post
 create mode 100644 t/t4034/python/pre
 create mode 100644 t/t4034/ruby/expect
 create mode 100644 t/t4034/ruby/post
 create mode 100644 t/t4034/ruby/pre
 create mode 100644 t/t4034/tex/expect
 create mode 100644 t/t4034/tex/post
 create mode 100644 t/t4034/tex/pre

[1] suggested in <201012261206.11942.trast@student.ethz.ch>, which is
missing from gmane for some reason.


* [PATCH 1/3] t4034: bulk verify builtin word regex sanity
  2011-01-11 21:47             ` [RFC/PATCH 0/3] " Jonathan Nieder
@ 2011-01-11 21:48               ` Jonathan Nieder
  2011-01-18 18:00                 ` Re*: " Junio C Hamano
  2011-01-11 21:48               ` [PATCH 2/3] userdiff: simplify word-diff safeguard Jonathan Nieder
  2011-01-11 21:49               ` [PATCH 3/3] t4034 (diff --word-diff): style suggestions Jonathan Nieder
  2 siblings, 1 reply; 27+ messages in thread
From: Jonathan Nieder @ 2011-01-11 21:48 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Junio C Hamano, Scott Johnson, Michael J Gruber,
	Matthijs Kooijman, git

From: Thomas Rast <trast@student.ethz.ch>
Date: Sat, 18 Dec 2010 17:17:54 +0100

The builtin word regexes should be tested with some simple examples
against simple issues.  Do this in bulk.

Mainly due to a lack of language knowledge and inspiration, most of
the test cases (cpp, csharp, java, objc, pascal, php, python, ruby)
are directly based off a C operator precedence table to verify that
all operators are split correctly.  This means that they are probably
incomplete or inaccurate except for 'cpp' itself.

Still, they are good enough to already have uncovered a typo in the
python and ruby patterns.

'fortran' is based on my anecdotal knowledge of the DO10I parsing
rules, and thus probably useless.  The rest (bibtex, html, tex) are an
ad-hoc test of what I consider important splits in those languages.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t4034-diff-words.sh  |   15 +++++++++++++++
 t/t4034/bibtex/expect  |   15 +++++++++++++++
 t/t4034/bibtex/post    |   10 ++++++++++
 t/t4034/bibtex/pre     |    9 +++++++++
 t/t4034/cpp/expect     |   36 ++++++++++++++++++++++++++++++++++++
 t/t4034/cpp/post       |   19 +++++++++++++++++++
 t/t4034/cpp/pre        |   19 +++++++++++++++++++
 t/t4034/csharp/expect  |   35 +++++++++++++++++++++++++++++++++++
 t/t4034/csharp/post    |   18 ++++++++++++++++++
 t/t4034/csharp/pre     |   18 ++++++++++++++++++
 t/t4034/fortran/expect |   10 ++++++++++
 t/t4034/fortran/post   |    5 +++++
 t/t4034/fortran/pre    |    5 +++++
 t/t4034/html/expect    |    8 ++++++++
 t/t4034/html/post      |    3 +++
 t/t4034/html/pre       |    3 +++
 t/t4034/java/expect    |   36 ++++++++++++++++++++++++++++++++++++
 t/t4034/java/post      |   19 +++++++++++++++++++
 t/t4034/java/pre       |   19 +++++++++++++++++++
 t/t4034/objc/expect    |   35 +++++++++++++++++++++++++++++++++++
 t/t4034/objc/post      |   18 ++++++++++++++++++
 t/t4034/objc/pre       |   18 ++++++++++++++++++
 t/t4034/pascal/expect  |   35 +++++++++++++++++++++++++++++++++++
 t/t4034/pascal/post    |   18 ++++++++++++++++++
 t/t4034/pascal/pre     |   18 ++++++++++++++++++
 t/t4034/php/expect     |   35 +++++++++++++++++++++++++++++++++++
 t/t4034/php/post       |   18 ++++++++++++++++++
 t/t4034/php/pre        |   18 ++++++++++++++++++
 t/t4034/python/expect  |   34 ++++++++++++++++++++++++++++++++++
 t/t4034/python/post    |   17 +++++++++++++++++
 t/t4034/python/pre     |   17 +++++++++++++++++
 t/t4034/ruby/expect    |   34 ++++++++++++++++++++++++++++++++++
 t/t4034/ruby/post      |   17 +++++++++++++++++
 t/t4034/ruby/pre       |   17 +++++++++++++++++
 t/t4034/tex/expect     |    9 +++++++++
 t/t4034/tex/post       |    4 ++++
 t/t4034/tex/pre        |    4 ++++
 37 files changed, 668 insertions(+), 0 deletions(-)
 create mode 100644 t/t4034/bibtex/expect
 create mode 100644 t/t4034/bibtex/post
 create mode 100644 t/t4034/bibtex/pre
 create mode 100644 t/t4034/cpp/expect
 create mode 100644 t/t4034/cpp/post
 create mode 100644 t/t4034/cpp/pre
 create mode 100644 t/t4034/csharp/expect
 create mode 100644 t/t4034/csharp/post
 create mode 100644 t/t4034/csharp/pre
 create mode 100644 t/t4034/fortran/expect
 create mode 100644 t/t4034/fortran/post
 create mode 100644 t/t4034/fortran/pre
 create mode 100644 t/t4034/html/expect
 create mode 100644 t/t4034/html/post
 create mode 100644 t/t4034/html/pre
 create mode 100644 t/t4034/java/expect
 create mode 100644 t/t4034/java/post
 create mode 100644 t/t4034/java/pre
 create mode 100644 t/t4034/objc/expect
 create mode 100644 t/t4034/objc/post
 create mode 100644 t/t4034/objc/pre
 create mode 100644 t/t4034/pascal/expect
 create mode 100644 t/t4034/pascal/post
 create mode 100644 t/t4034/pascal/pre
 create mode 100644 t/t4034/php/expect
 create mode 100644 t/t4034/php/post
 create mode 100644 t/t4034/php/pre
 create mode 100644 t/t4034/python/expect
 create mode 100644 t/t4034/python/post
 create mode 100644 t/t4034/python/pre
 create mode 100644 t/t4034/ruby/expect
 create mode 100644 t/t4034/ruby/post
 create mode 100644 t/t4034/ruby/pre
 create mode 100644 t/t4034/tex/expect
 create mode 100644 t/t4034/tex/post
 create mode 100644 t/t4034/tex/pre

diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 8096d8a..2647191 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -331,4 +331,19 @@ test_expect_success '--word-diff=none' '
 
 '
 
+word_diff_for_language () {
+	cp "$TEST_DIRECTORY/t4034/$1/pre" \
+		"$TEST_DIRECTORY/t4034/$1/post" \
+		"$TEST_DIRECTORY/t4034/$1/expect" . &&
+	echo "* diff=$1" >.gitattributes &&
+	word_diff --color-words && cp output output.$1
+}
+
+for lang_dir in $TEST_DIRECTORY/t4034/*; do
+	lang=${lang_dir#$TEST_DIRECTORY/t4034/}
+	test_expect_success "diff driver '$lang' has sane word regex" "
+		word_diff_for_language $lang
+	"
+done
+
 test_done
diff --git a/t/t4034/bibtex/expect b/t/t4034/bibtex/expect
new file mode 100644
index 0000000..a157774
--- /dev/null
+++ b/t/t4034/bibtex/expect
@@ -0,0 +1,15 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 95cd55b..ddcba9b 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,9 +1,10 @@<RESET>
+@article{aldous1987uie,<RESET>
+  title={{Ultimate instability of exponential back-off protocol for acknowledgment-based transmission control of random access communication channels}},<RESET>
+  author={Aldous, <RED>D.<RESET><GREEN>David<RESET>},
+  journal={Information Theory, IEEE Transactions on},<RESET>
+  volume={<RED>33<RESET><GREEN>Bogus.<RESET>},
+  number={<RED>2<RESET><GREEN>4<RESET>},
+  pages={219--223},<RESET>
+  year=<GREEN>1987,<RESET>
+<GREEN>  note={This is in fact a rather funny read since ethernet works well in practice. The<RESET> {<RED>1987<RESET><GREEN>\em pre} reference is the right one, however.<RESET>}<RED>,<RESET>
+}<RESET>
diff --git a/t/t4034/bibtex/post b/t/t4034/bibtex/post
new file mode 100644
index 0000000..ddcba9b
--- /dev/null
+++ b/t/t4034/bibtex/post
@@ -0,0 +1,10 @@
+@article{aldous1987uie,
+  title={{Ultimate instability of exponential back-off protocol for acknowledgment-based transmission control of random access communication channels}},
+  author={Aldous, David},
+  journal={Information Theory, IEEE Transactions on},
+  volume={Bogus.},
+  number={4},
+  pages={219--223},
+  year=1987,
+  note={This is in fact a rather funny read since ethernet works well in practice. The {\em pre} reference is the right one, however.}
+}
diff --git a/t/t4034/bibtex/pre b/t/t4034/bibtex/pre
new file mode 100644
index 0000000..95cd55b
--- /dev/null
+++ b/t/t4034/bibtex/pre
@@ -0,0 +1,9 @@
+@article{aldous1987uie,
+  title={{Ultimate instability of exponential back-off protocol for acknowledgment-based transmission control of random access communication channels}},
+  author={Aldous, D.},
+  journal={Information Theory, IEEE Transactions on},
+  volume={33},
+  number={2},
+  pages={219--223},
+  year={1987},
+}
diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect
new file mode 100644
index 0000000..37d1ea2
--- /dev/null
+++ b/t/t4034/cpp/expect
@@ -0,0 +1,36 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 23d5c8a..7e8c026 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,19 +1,19 @@<RESET>
+Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <GREEN>bar(x);<RESET> }
+cout<<"Hello World<RED>!<RESET><GREEN>?<RESET>\n"<<endl;
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
+<RED>a<RESET><GREEN>x<RESET>::<RED>b<RESET><GREEN>y<RESET>
diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post
new file mode 100644
index 0000000..7e8c026
--- /dev/null
+++ b/t/t4034/cpp/post
@@ -0,0 +1,19 @@
+Foo() : x(0&42) { bar(x); }
+cout<<"Hello World?\n"<<endl;
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
+x::y
diff --git a/t/t4034/cpp/pre b/t/t4034/cpp/pre
new file mode 100644
index 0000000..23d5c8a
--- /dev/null
+++ b/t/t4034/cpp/pre
@@ -0,0 +1,19 @@
+Foo():x(0&&1){}
+cout<<"Hello World!\n"<<endl;
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
+a::b
diff --git a/t/t4034/csharp/expect b/t/t4034/csharp/expect
new file mode 100644
index 0000000..e5d1dd2
--- /dev/null
+++ b/t/t4034/csharp/expect
@@ -0,0 +1,35 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 9106d63..dd5f421 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,18 +1,18 @@<RESET>
+Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <GREEN>bar(x);<RESET> }
+cout<<"Hello World<RED>!<RESET><GREEN>?<RESET>\n"<<endl;
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
diff --git a/t/t4034/csharp/post b/t/t4034/csharp/post
new file mode 100644
index 0000000..dd5f421
--- /dev/null
+++ b/t/t4034/csharp/post
@@ -0,0 +1,18 @@
+Foo() : x(0&42) { bar(x); }
+cout<<"Hello World?\n"<<endl;
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
diff --git a/t/t4034/csharp/pre b/t/t4034/csharp/pre
new file mode 100644
index 0000000..9106d63
--- /dev/null
+++ b/t/t4034/csharp/pre
@@ -0,0 +1,18 @@
+Foo():x(0&&1){}
+cout<<"Hello World!\n"<<endl;
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
diff --git a/t/t4034/fortran/expect b/t/t4034/fortran/expect
new file mode 100644
index 0000000..b233dbd
--- /dev/null
+++ b/t/t4034/fortran/expect
@@ -0,0 +1,10 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 87f0d0b..d308da2 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,5 +1,5 @@<RESET>
+print *, "Hello World<RED>!<RESET><GREEN>?<RESET>"
+
+DO10I = 1,10<RESET>
+<RED>DO10I<RESET><GREEN>DO 10 I<RESET> = 1,10
+<RED>DO10I<RESET><GREEN>DO 1 0 I<RESET> = 1,10
diff --git a/t/t4034/fortran/post b/t/t4034/fortran/post
new file mode 100644
index 0000000..d308da2
--- /dev/null
+++ b/t/t4034/fortran/post
@@ -0,0 +1,5 @@
+print *, "Hello World?"
+
+DO10I = 1,10
+DO 10 I = 1,10
+DO 1 0 I = 1,10
diff --git a/t/t4034/fortran/pre b/t/t4034/fortran/pre
new file mode 100644
index 0000000..87f0d0b
--- /dev/null
+++ b/t/t4034/fortran/pre
@@ -0,0 +1,5 @@
+print *, "Hello World!"
+
+DO10I = 1,10
+DO10I = 1,10
+DO10I = 1,10
diff --git a/t/t4034/html/expect b/t/t4034/html/expect
new file mode 100644
index 0000000..447b49a
--- /dev/null
+++ b/t/t4034/html/expect
@@ -0,0 +1,8 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 8ca4aea..46921e5 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,3 +1,3 @@<RESET>
+<tag <GREEN>newattr="newvalue"<RESET>><GREEN>added<RESET> content</tag>
+<tag attr=<RED>"value"<RESET><GREEN>"newvalue"<RESET>><RED>content<RESET><GREEN>changed<RESET></tag>
+<<RED>tag<RESET><GREEN>newtag<RESET>>content <RED>&entity;<RESET><GREEN>&newentity;<RESET><<RED>/tag<RESET><GREEN>/newtag<RESET>>
diff --git a/t/t4034/html/post b/t/t4034/html/post
new file mode 100644
index 0000000..46921e5
--- /dev/null
+++ b/t/t4034/html/post
@@ -0,0 +1,3 @@
+<tag newattr="newvalue">added content</tag>
+<tag attr="newvalue">changed</tag>
+<newtag>content &newentity;</newtag>
diff --git a/t/t4034/html/pre b/t/t4034/html/pre
new file mode 100644
index 0000000..8ca4aea
--- /dev/null
+++ b/t/t4034/html/pre
@@ -0,0 +1,3 @@
+<tag>content</tag>
+<tag attr="value">content</tag>
+<tag>content &entity;</tag>
diff --git a/t/t4034/java/expect b/t/t4034/java/expect
new file mode 100644
index 0000000..37d1ea2
--- /dev/null
+++ b/t/t4034/java/expect
@@ -0,0 +1,36 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 23d5c8a..7e8c026 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,19 +1,19 @@<RESET>
+Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <GREEN>bar(x);<RESET> }
+cout<<"Hello World<RED>!<RESET><GREEN>?<RESET>\n"<<endl;
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
+<RED>a<RESET><GREEN>x<RESET>::<RED>b<RESET><GREEN>y<RESET>
diff --git a/t/t4034/java/post b/t/t4034/java/post
new file mode 100644
index 0000000..7e8c026
--- /dev/null
+++ b/t/t4034/java/post
@@ -0,0 +1,19 @@
+Foo() : x(0&42) { bar(x); }
+cout<<"Hello World?\n"<<endl;
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
+x::y
diff --git a/t/t4034/java/pre b/t/t4034/java/pre
new file mode 100644
index 0000000..23d5c8a
--- /dev/null
+++ b/t/t4034/java/pre
@@ -0,0 +1,19 @@
+Foo():x(0&&1){}
+cout<<"Hello World!\n"<<endl;
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
+a::b
diff --git a/t/t4034/objc/expect b/t/t4034/objc/expect
new file mode 100644
index 0000000..e5d1dd2
--- /dev/null
+++ b/t/t4034/objc/expect
@@ -0,0 +1,35 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 9106d63..dd5f421 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,18 +1,18 @@<RESET>
+Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <GREEN>bar(x);<RESET> }
+cout<<"Hello World<RED>!<RESET><GREEN>?<RESET>\n"<<endl;
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
diff --git a/t/t4034/objc/post b/t/t4034/objc/post
new file mode 100644
index 0000000..dd5f421
--- /dev/null
+++ b/t/t4034/objc/post
@@ -0,0 +1,18 @@
+Foo() : x(0&42) { bar(x); }
+cout<<"Hello World?\n"<<endl;
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
diff --git a/t/t4034/objc/pre b/t/t4034/objc/pre
new file mode 100644
index 0000000..9106d63
--- /dev/null
+++ b/t/t4034/objc/pre
@@ -0,0 +1,18 @@
+Foo():x(0&&1){}
+cout<<"Hello World!\n"<<endl;
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
diff --git a/t/t4034/pascal/expect b/t/t4034/pascal/expect
new file mode 100644
index 0000000..2ce4230
--- /dev/null
+++ b/t/t4034/pascal/expect
@@ -0,0 +1,35 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 077046c..8865e6b 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,18 +1,18 @@<RESET>
+writeln("Hello World<RED>!<RESET><GREEN>?<RESET>");
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
+<RED>a<RESET><GREEN>x<RESET>::<RED>b<RESET><GREEN>y<RESET>
diff --git a/t/t4034/pascal/post b/t/t4034/pascal/post
new file mode 100644
index 0000000..8865e6b
--- /dev/null
+++ b/t/t4034/pascal/post
@@ -0,0 +1,18 @@
+writeln("Hello World?");
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
+x::y
diff --git a/t/t4034/pascal/pre b/t/t4034/pascal/pre
new file mode 100644
index 0000000..077046c
--- /dev/null
+++ b/t/t4034/pascal/pre
@@ -0,0 +1,18 @@
+writeln("Hello World!");
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
+a::b
diff --git a/t/t4034/php/expect b/t/t4034/php/expect
new file mode 100644
index 0000000..0404408
--- /dev/null
+++ b/t/t4034/php/expect
@@ -0,0 +1,35 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index cf6e06b..4420a49 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,18 +1,18 @@<RESET>
+<GREEN>(<RESET>$var<GREEN>)<RESET> $ var
+<?="Hello World<RED>!<RESET><GREEN>?<RESET>"?>
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
diff --git a/t/t4034/php/post b/t/t4034/php/post
new file mode 100644
index 0000000..4420a49
--- /dev/null
+++ b/t/t4034/php/post
@@ -0,0 +1,18 @@
+($var) $ var
+<?="Hello World?"?>
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
diff --git a/t/t4034/php/pre b/t/t4034/php/pre
new file mode 100644
index 0000000..cf6e06b
--- /dev/null
+++ b/t/t4034/php/pre
@@ -0,0 +1,18 @@
+$var $var
+<?= "Hello World!" ?>
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
diff --git a/t/t4034/python/expect b/t/t4034/python/expect
new file mode 100644
index 0000000..8abb8a4
--- /dev/null
+++ b/t/t4034/python/expect
@@ -0,0 +1,34 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 438f776..68baf34 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,17 +1,17 @@<RESET>
+print<RED>u<RESET> "Hello World<RED>!<RESET><GREEN>?<RESET>\n"<GREEN>; print<RESET>
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>) u<RESET>'<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>?<RED>b<RESET><GREEN>y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
diff --git a/t/t4034/python/post b/t/t4034/python/post
new file mode 100644
index 0000000..68baf34
--- /dev/null
+++ b/t/t4034/python/post
@@ -0,0 +1,17 @@
+print "Hello World?\n"; print
+(1) (-1e10) (0xabcdef) u'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
diff --git a/t/t4034/python/pre b/t/t4034/python/pre
new file mode 100644
index 0000000..438f776
--- /dev/null
+++ b/t/t4034/python/pre
@@ -0,0 +1,17 @@
+print u"Hello World!\n"
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
diff --git a/t/t4034/ruby/expect b/t/t4034/ruby/expect
new file mode 100644
index 0000000..16e1dd5
--- /dev/null
+++ b/t/t4034/ruby/expect
@@ -0,0 +1,34 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 30ed9a1..7678f14 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,17 +1,17 @@<RESET>
+10.downto(1) {|<RED>x<RESET><GREEN>y<RESET>| puts <RED>x<RESET><GREEN>y<RESET>}
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a<RESET><GREEN>x<RESET> ~<RED>a a<RESET><GREEN>x x<RESET>++ <RED>a<RESET><GREEN>x<RESET>-- <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<<RED>b a<RESET><GREEN>y x<RESET>>><RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>|<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a?b<RESET><GREEN>y<RESET>
+<GREEN>x?y<RESET>:z
+<RED>a<RESET><GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
diff --git a/t/t4034/ruby/post b/t/t4034/ruby/post
new file mode 100644
index 0000000..7678f14
--- /dev/null
+++ b/t/t4034/ruby/post
@@ -0,0 +1,17 @@
+10.downto(1) {|y| puts y}
+(1) (-1e10) (0xabcdef) 'y'
+[x] x->y x.y
+!x ~x x++ x-- x*y x&y
+x*y x/y x%y
+x+y x-y
+x<<y x>>y
+x<y x<=y x>y x>=y
+x==y x!=y
+x&y
+x^y
+x|y
+x&&y
+x||y
+x?y:z
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
diff --git a/t/t4034/ruby/pre b/t/t4034/ruby/pre
new file mode 100644
index 0000000..30ed9a1
--- /dev/null
+++ b/t/t4034/ruby/pre
@@ -0,0 +1,17 @@
+10.downto(1) {|x| puts x}
+1 -1e10 0xabcdef 'x'
+[a] a->b a.b
+!a ~a a++ a-- a*b a&b
+a*b a/b a%b
+a+b a-b
+a<<b a>>b
+a<b a<=b a>b a>=b
+a==b a!=b
+a&b
+a^b
+a|b
+a&&b
+a||b
+a?b:z
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
diff --git a/t/t4034/tex/expect b/t/t4034/tex/expect
new file mode 100644
index 0000000..604969b
--- /dev/null
+++ b/t/t4034/tex/expect
@@ -0,0 +1,9 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 2b2dfcb..65cab61 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,4 +1,4 @@<RESET>
+\section{Something <GREEN>new<RESET>}
+<RED>\emph<RESET><GREEN>\textbf<RESET>{Macro style}
+{<RED>\em<RESET><GREEN>\bfseries<RESET> State toggle style}
+\\[<RED>1em<RESET><GREEN>1cm<RESET>]
diff --git a/t/t4034/tex/post b/t/t4034/tex/post
new file mode 100644
index 0000000..65cab61
--- /dev/null
+++ b/t/t4034/tex/post
@@ -0,0 +1,4 @@
+\section{Something new}
+\textbf{Macro style}
+{\bfseries State toggle style}
+\\[1cm]
diff --git a/t/t4034/tex/pre b/t/t4034/tex/pre
new file mode 100644
index 0000000..2b2dfcb
--- /dev/null
+++ b/t/t4034/tex/pre
@@ -0,0 +1,4 @@
+\section{Something}
+\emph{Macro style}
+{\em State toggle style}
+\\[1em]
-- 
1.7.4.rc1


* [PATCH 2/3] userdiff: simplify word-diff safeguard
  2011-01-11 21:47             ` [RFC/PATCH 0/3] " Jonathan Nieder
  2011-01-11 21:48               ` [PATCH 1/3] " Jonathan Nieder
@ 2011-01-11 21:48               ` Jonathan Nieder
  2011-01-11 21:49               ` [PATCH 3/3] t4034 (diff --word-diff): style suggestions Jonathan Nieder
  2 siblings, 0 replies; 27+ messages in thread
From: Jonathan Nieder @ 2011-01-11 21:48 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Junio C Hamano, Scott Johnson, Michael J Gruber,
	Matthijs Kooijman, git

git's diff-words support has a detail that can be a little dangerous:
any text not matched by a given language's tokenization pattern is
treated as whitespace, so changes in such text go unnoticed.
Therefore each of the built-in regexes allows a special token type
consisting of a single non-whitespace character, [^[:space:]].
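
A rough way to see the effect from a shell, as a sketch only: the
tokens helper and the identifier-only word_regex below are made up for
illustration and are not taken from userdiff.c, and grep -o is a
common extension rather than strict POSIX.  Each printed line is one
token; whatever is not printed is what a word diff would treat as
whitespace.

```shell
# Hypothetical helper: print every token that extended regex $1 finds in $2.
# Text matched by no alternative is dropped, mirroring how the word diff
# treats unmatched text as whitespace.
tokens () {
	printf '%s\n' "$2" | grep -oE -e "$1"
}

# Assumed, deliberately incomplete word regex: identifiers only.
word_regex='[a-zA-Z_][a-zA-Z0-9_]*'

# Without the catch-all, the "+=" between the identifiers vanishes.
tokens "$word_regex" 'foo+=bar'

# With the catch-all, "+" and "=" show up as one-character tokens.
tokens "$word_regex|[^[:space:]]" 'foo+=bar'
```

The first call prints only foo and bar, so a change from "+=" to "-="
would be invisible; the second call also prints + and = on their own
lines.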

To make sure UTF-8 sequences remain human readable, the built-in
regexes also have a special token type for runs of bytes with the high
bit set.  In English, non-ASCII characters are usually isolated, so
this is analogous to the [^[:space:]] pattern, except that it matches a
single _multibyte_ character despite use of the C locale.

Unfortunately it is easy to make typos or forget entirely to include
these catch-all token types when adding support for new languages (see
v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes,
2010-12-18).  Avoid this by including them automatically within the
PATTERNS and IPATTERN macros.

While at it, change the UTF-8 sequence token type to match exactly one
non-ASCII multi-byte character, rather than an arbitrary run of them.
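
To illustrate why the catch-all matters, here is a minimal sketch that
approximates word splitting with `grep -oE` instead of git's own
tokenizer (an assumption made for brevity).  The helper names `tokens`
and `with_catch_all` are hypothetical and not part of git; the UTF-8
alternative is left out so the example stays ASCII-only.

```shell
#!/bin/sh
# Rough stand-in for word-diff tokenization: print each regex match on
# its own line.  Text the regex does not match is silently skipped,
# much as word diff treats unmatched text like whitespace.
tokens () {
    printf '%s\n' "$2" | grep -oE "$1"
}

# Hypothetical helper mirroring the idea of the patched PATTERNS macro:
# append the single-character catch-all to a language's word regex.
with_catch_all () {
    printf '%s' "$1|[^[:space:]]"
}

identifier='[a-zA-Z_][a-zA-Z0-9_]*'

# Without the catch-all, the "+=" operator never becomes a token.
tokens "$identifier" 'a += b'

# With it, "+" and "=" show up as tokens of their own.
tokens "$(with_catch_all "$identifier")" 'a += b'
```

Appending the alternative in one place, rather than in every
per-language regex, is the same design choice the macro change makes.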

Suggested-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 userdiff.c |   37 +++++++++++++++----------------------
 1 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/userdiff.c b/userdiff.c
index 2d54536..91586cf 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -8,9 +8,11 @@ static int ndrivers;
 static int drivers_alloc;
 
 #define PATTERNS(name, pattern, word_regex)			\
-	{ name, NULL, -1, { pattern, REG_EXTENDED }, word_regex }
+	{ name, NULL, -1, { pattern, REG_EXTENDED },		\
+	  word_regex "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" }
 #define IPATTERN(name, pattern, word_regex)			\
-	{ name, NULL, -1, { pattern, REG_EXTENDED | REG_ICASE }, word_regex }
+	{ name, NULL, -1, { pattern, REG_EXTENDED | REG_ICASE }, \
+	  word_regex "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" }
 static struct userdiff_driver builtin_drivers[] = {
 IPATTERN("fortran",
 	 "!^([C*]|[ \t]*!)\n"
@@ -24,10 +26,9 @@ IPATTERN("fortran",
 	  * Don't worry about format statements without leading digits since
 	  * they would have been matched above as a variable anyway. */
 	 "|[-+]?[0-9.]+([AaIiDdEeFfLlTtXx][Ss]?[-+]?[0-9.]*)?(_[a-zA-Z0-9][a-zA-Z0-9_]*)?"
-	 "|//|\\*\\*|::|[/<>=]="
-	 "|[^[:space:]]|[\x80-\xff]+"),
+	 "|//|\\*\\*|::|[/<>=]="),
 PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$",
-	 "[^<>= \t]+|[^[:space:]]|[\x80-\xff]+"),
+	 "[^<>= \t]+"),
 PATTERNS("java",
 	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
 	 "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
@@ -35,8 +36,7 @@ PATTERNS("java",
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]="
-	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"
-	 "|[^[:space:]]|[\x80-\xff]+"),
+	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
 PATTERNS("objc",
 	 /* Negate C statements that can look like functions */
 	 "!^[ \t]*(do|for|if|else|return|switch|while)\n"
@@ -49,8 +49,7 @@ PATTERNS("objc",
 	 /* -- */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
-	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"
-	 "|[^[:space:]]|[\x80-\xff]+"),
+	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
 PATTERNS("pascal",
 	 "^((procedure|function|constructor|destructor|interface|"
 		"implementation|initialization|finalization)[ \t]*.*)$"
@@ -59,33 +58,29 @@ PATTERNS("pascal",
 	 /* -- */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
-	 "|<>|<=|>=|:=|\\.\\."
-	 "|[^[:space:]]|[\x80-\xff]+"),
+	 "|<>|<=|>=|:=|\\.\\."),
 PATTERNS("php",
 	 "^[\t ]*(((public|protected|private|static)[\t ]+)*function.*)$\n"
 	 "^[\t ]*(class.*)$",
 	 /* -- */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
-	 "|[-+*/<>%&^|=!.]=|--|\\+\\+|<<=?|>>=?|===|&&|\\|\\||::|->"
-	 "|[^[:space:]]|[\x80-\xff]+"),
+	 "|[-+*/<>%&^|=!.]=|--|\\+\\+|<<=?|>>=?|===|&&|\\|\\||::|->"),
 PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
 	 /* -- */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
-	 "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"
-	 "|[^[:space:]]|[\x80-\xff]+"),
+	 "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"),
 	 /* -- */
 PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
 	 /* -- */
 	 "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
-	 "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"
-	 "|[^[:space:]]|[\x80-\xff]+"),
+	 "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
 PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 "[={}\"]|[^={}\" \t]+"),
 PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
-	 "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+|[^[:space:]]"),
+	 "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
 PATTERNS("cpp",
 	 /* Jump targets or access declarations */
 	 "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
@@ -96,8 +91,7 @@ PATTERNS("cpp",
 	 /* -- */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
-	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"
-	 "|[^[:space:]]|[\x80-\xff]+"),
+	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
 PATTERNS("csharp",
 	 /* Keywords */
 	 "!^[ \t]*(do|while|for|if|else|instanceof|new|return|switch|case|throw|catch|using)\n"
@@ -112,8 +106,7 @@ PATTERNS("csharp",
 	 /* -- */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
-	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"
-	 "|[^[:space:]]|[\x80-\xff]+"),
+	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS
-- 
1.7.4.rc1


* [PATCH 3/3] t4034 (diff --word-diff): style suggestions
  2011-01-11 21:47             ` [RFC/PATCH 0/3] " Jonathan Nieder
  2011-01-11 21:48               ` [PATCH 1/3] " Jonathan Nieder
  2011-01-11 21:48               ` [PATCH 2/3] userdiff: simplify word-diff safeguard Jonathan Nieder
@ 2011-01-11 21:49               ` Jonathan Nieder
  2 siblings, 0 replies; 27+ messages in thread
From: Jonathan Nieder @ 2011-01-11 21:49 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Junio C Hamano, Scott Johnson, Michael J Gruber,
	Matthijs Kooijman, git

Rearrange code to be easier to browse:

 - first data
 - then functions
 - then test assertions

Mark up inline test vectors as

  cat >vector <<-\EOF
	data
	data
  EOF

for visual scannability.  Use words like "set up" for tests that set
up for other tests, to make it obvious which tests are safe to skip.
Use repeated function calls instead of a loop for the
language-specific tests, so the invocations can be easily tweaked
individually (for example if one starts to fail).

This means if you add a new subdirectory to t4034/, it will not be
automatically used.  I think that's worth it for the added
explicitness.
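
For readers unfamiliar with the `<<-\EOF` form used for the inline
test vectors: the `-` strips leading tabs from the body and from the
terminator, and the backslash keeps the shell from expanding anything
inside.  A small sketch follows; `heredoc_demo` is a hypothetical name,
and the tabs are spelled as \t in a printf format only so that mail
transport cannot damage them.

```shell
#!/bin/sh
# Build a tiny script whose here-document is indented with real tabs,
# then feed it to sh.  The body line contains "$HOME" to show that a
# quoted delimiter (\EOF) disables parameter expansion.
heredoc_demo () {
    printf 'cat <<-\\EOF\n\t$HOME stays literal\n\tEOF\n' | sh
}

heredoc_demo
```

Only tabs are stripped by `<<-`; indenting the body with spaces would
leave the spaces in the output.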

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t4034-diff-words.sh |  476 ++++++++++++++++++++++--------------------------
 1 files changed, 218 insertions(+), 258 deletions(-)

diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 2647191..c3b1c48 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -4,346 +4,306 @@ test_description='word diff colors'
 
 . ./test-lib.sh
 
+cat >pre.simple <<-\EOF
+	h(4)
+
+	a = b + c
+EOF
+cat >post.simple <<-\EOF
+	h(4),hh[44]
+
+	a = b + c
+
+	aa = a
+
+	aeff = aeff * ( aaa )
+EOF
+cat >expect.letter-runs-are-words <<-\EOF
+	<BOLD>diff --git a/pre b/post<RESET>
+	<BOLD>index 330b04f..5ed8eff 100644<RESET>
+	<BOLD>--- a/pre<RESET>
+	<BOLD>+++ b/post<RESET>
+	<CYAN>@@ -1,3 +1,7 @@<RESET>
+	h(4),<GREEN>hh<RESET>[44]
+
+	a = b + c<RESET>
+
+	<GREEN>aa = a<RESET>
+
+	<GREEN>aeff = aeff * ( aaa<RESET> )
+EOF
+cat >expect.non-whitespace-is-word <<-\EOF
+	<BOLD>diff --git a/pre b/post<RESET>
+	<BOLD>index 330b04f..5ed8eff 100644<RESET>
+	<BOLD>--- a/pre<RESET>
+	<BOLD>+++ b/post<RESET>
+	<CYAN>@@ -1,3 +1,7 @@<RESET>
+	h(4)<GREEN>,hh[44]<RESET>
+
+	a = b + c<RESET>
+
+	<GREEN>aa = a<RESET>
+
+	<GREEN>aeff = aeff * ( aaa )<RESET>
+EOF
+
+word_diff () {
+	test_must_fail git diff --no-index "$@" pre post >output &&
+	test_decode_color <output >output.decrypted &&
+	test_cmp expect output.decrypted
+}
+
+test_language_driver () {
+	lang=$1
+	test_expect_success "diff driver '$lang'" '
+		cp "$TEST_DIRECTORY/t4034/'"$lang"'/pre" \
+			"$TEST_DIRECTORY/t4034/'"$lang"'/post" \
+			"$TEST_DIRECTORY/t4034/'"$lang"'/expect" . &&
+		echo "* diff='"$lang"'" >.gitattributes &&
+		word_diff --color-words
+	'
+}
+
 test_expect_success setup '
-
 	git config diff.color.old red &&
 	git config diff.color.new green &&
 	git config diff.color.func magenta
-
 '
 
-word_diff () {
-	test_must_fail git diff --no-index "$@" pre post > output &&
-	test_decode_color <output >output.decrypted &&
-	test_cmp expect output.decrypted
-}
-
-cat > pre <<\EOF
-h(4)
-
-a = b + c
-EOF
-
-cat > post <<\EOF
-h(4),hh[44]
-
-a = b + c
-
-aa = a
-
-aeff = aeff * ( aaa )
-EOF
-
-cat > expect <<\EOF
-<BOLD>diff --git a/pre b/post<RESET>
-<BOLD>index 330b04f..5ed8eff 100644<RESET>
-<BOLD>--- a/pre<RESET>
-<BOLD>+++ b/post<RESET>
-<CYAN>@@ -1,3 +1,7 @@<RESET>
-<RED>h(4)<RESET><GREEN>h(4),hh[44]<RESET>
-
-a = b + c<RESET>
-
-<GREEN>aa = a<RESET>
-
-<GREEN>aeff = aeff * ( aaa )<RESET>
-EOF
+test_expect_success 'set up pre and post with runs of whitespace' '
+	cp pre.simple pre &&
+	cp post.simple post
+'
 
 test_expect_success 'word diff with runs of whitespace' '
+	cat >expect <<-\EOF &&
+		<BOLD>diff --git a/pre b/post<RESET>
+		<BOLD>index 330b04f..5ed8eff 100644<RESET>
+		<BOLD>--- a/pre<RESET>
+		<BOLD>+++ b/post<RESET>
+		<CYAN>@@ -1,3 +1,7 @@<RESET>
+		<RED>h(4)<RESET><GREEN>h(4),hh[44]<RESET>
 
-	word_diff --color-words
+		a = b + c<RESET>
 
-'
-
-test_expect_success '--word-diff=color' '
-
-	word_diff --word-diff=color
-
-'
-
-test_expect_success '--color --word-diff=color' '
+		<GREEN>aa = a<RESET>
 
+		<GREEN>aeff = aeff * ( aaa )<RESET>
+	EOF
+	word_diff --color-words &&
+	word_diff --word-diff=color &&
 	word_diff --color --word-diff=color
-
 '
 
-sed 's/#.*$//' > expect <<EOF
-diff --git a/pre b/post
-index 330b04f..5ed8eff 100644
---- a/pre
-+++ b/post
-@@ -1,3 +1,7 @@
--h(4)
-+h(4),hh[44]
-~
- # significant space
-~
- a = b + c
-~
-~
-+aa = a
-~
-~
-+aeff = aeff * ( aaa )
-~
-EOF
-
 test_expect_success '--word-diff=porcelain' '
-
+	sed 's/#.*$//' >expect <<-\EOF &&
+		diff --git a/pre b/post
+		index 330b04f..5ed8eff 100644
+		--- a/pre
+		+++ b/post
+		@@ -1,3 +1,7 @@
+		-h(4)
+		+h(4),hh[44]
+		~
+		 # significant space
+		~
+		 a = b + c
+		~
+		~
+		+aa = a
+		~
+		~
+		+aeff = aeff * ( aaa )
+		~
+	EOF
 	word_diff --word-diff=porcelain
-
 '
 
-cat > expect <<EOF
-diff --git a/pre b/post
-index 330b04f..5ed8eff 100644
---- a/pre
-+++ b/post
-@@ -1,3 +1,7 @@
-[-h(4)-]{+h(4),hh[44]+}
-
-a = b + c
-
-{+aa = a+}
-
-{+aeff = aeff * ( aaa )+}
-EOF
-
 test_expect_success '--word-diff=plain' '
+	cat >expect <<-\EOF &&
+		diff --git a/pre b/post
+		index 330b04f..5ed8eff 100644
+		--- a/pre
+		+++ b/post
+		@@ -1,3 +1,7 @@
+		[-h(4)-]{+h(4),hh[44]+}
 
-	word_diff --word-diff=plain
+		a = b + c
 
-'
-
-test_expect_success '--word-diff=plain --no-color' '
+		{+aa = a+}
 
+		{+aeff = aeff * ( aaa )+}
+	EOF
+	word_diff --word-diff=plain &&
 	word_diff --word-diff=plain --no-color
-
 '
 
-cat > expect <<EOF
-<BOLD>diff --git a/pre b/post<RESET>
-<BOLD>index 330b04f..5ed8eff 100644<RESET>
-<BOLD>--- a/pre<RESET>
-<BOLD>+++ b/post<RESET>
-<CYAN>@@ -1,3 +1,7 @@<RESET>
-<RED>[-h(4)-]<RESET><GREEN>{+h(4),hh[44]+}<RESET>
-
-a = b + c<RESET>
-
-<GREEN>{+aa = a+}<RESET>
-
-<GREEN>{+aeff = aeff * ( aaa )+}<RESET>
-EOF
-
 test_expect_success '--word-diff=plain --color' '
+	cat >expect <<-\EOF &&
+		<BOLD>diff --git a/pre b/post<RESET>
+		<BOLD>index 330b04f..5ed8eff 100644<RESET>
+		<BOLD>--- a/pre<RESET>
+		<BOLD>+++ b/post<RESET>
+		<CYAN>@@ -1,3 +1,7 @@<RESET>
+		<RED>[-h(4)-]<RESET><GREEN>{+h(4),hh[44]+}<RESET>
 
+		a = b + c<RESET>
+
+		<GREEN>{+aa = a+}<RESET>
+
+		<GREEN>{+aeff = aeff * ( aaa )+}<RESET>
+	EOF
 	word_diff --word-diff=plain --color
-
 '
 
-cat > expect <<\EOF
-<BOLD>diff --git a/pre b/post<RESET>
-<BOLD>index 330b04f..5ed8eff 100644<RESET>
-<BOLD>--- a/pre<RESET>
-<BOLD>+++ b/post<RESET>
-<CYAN>@@ -1 +1 @@<RESET>
-<RED>h(4)<RESET><GREEN>h(4),hh[44]<RESET>
-<CYAN>@@ -3,0 +4,4 @@<RESET> <RESET><MAGENTA>a = b + c<RESET>
-
-<GREEN>aa = a<RESET>
-
-<GREEN>aeff = aeff * ( aaa )<RESET>
-EOF
-
 test_expect_success 'word diff without context' '
+	cat >expect <<-\EOF &&
+		<BOLD>diff --git a/pre b/post<RESET>
+		<BOLD>index 330b04f..5ed8eff 100644<RESET>
+		<BOLD>--- a/pre<RESET>
+		<BOLD>+++ b/post<RESET>
+		<CYAN>@@ -1 +1 @@<RESET>
+		<RED>h(4)<RESET><GREEN>h(4),hh[44]<RESET>
+		<CYAN>@@ -3,0 +4,4 @@<RESET> <RESET><MAGENTA>a = b + c<RESET>
 
+		<GREEN>aa = a<RESET>
+
+		<GREEN>aeff = aeff * ( aaa )<RESET>
+	EOF
 	word_diff --color-words --unified=0
-
 '
 
-cat > expect <<\EOF
-<BOLD>diff --git a/pre b/post<RESET>
-<BOLD>index 330b04f..5ed8eff 100644<RESET>
-<BOLD>--- a/pre<RESET>
-<BOLD>+++ b/post<RESET>
-<CYAN>@@ -1,3 +1,7 @@<RESET>
-h(4),<GREEN>hh<RESET>[44]
-
-a = b + c<RESET>
-
-<GREEN>aa = a<RESET>
-
-<GREEN>aeff = aeff * ( aaa<RESET> )
-EOF
-cp expect expect.letter-runs-are-words
-
 test_expect_success 'word diff with a regular expression' '
-
+	cp expect.letter-runs-are-words expect &&
 	word_diff --color-words="[a-z]+"
-
 '
 
-test_expect_success 'set a diff driver' '
+test_expect_success 'set up a diff driver' '
 	git config diff.testdriver.wordRegex "[^[:space:]]" &&
-	cat <<EOF > .gitattributes
-pre diff=testdriver
-post diff=testdriver
-EOF
+	cat <<-\EOF >.gitattributes
+		pre diff=testdriver
+		post diff=testdriver
+	EOF
 '
 
 test_expect_success 'option overrides .gitattributes' '
-
+	cp expect.letter-runs-are-words expect &&
 	word_diff --color-words="[a-z]+"
-
 '
 
-cat > expect <<\EOF
-<BOLD>diff --git a/pre b/post<RESET>
-<BOLD>index 330b04f..5ed8eff 100644<RESET>
-<BOLD>--- a/pre<RESET>
-<BOLD>+++ b/post<RESET>
-<CYAN>@@ -1,3 +1,7 @@<RESET>
-h(4)<GREEN>,hh[44]<RESET>
-
-a = b + c<RESET>
-
-<GREEN>aa = a<RESET>
-
-<GREEN>aeff = aeff * ( aaa )<RESET>
-EOF
-cp expect expect.non-whitespace-is-word
-
 test_expect_success 'use regex supplied by driver' '
-
+	cp expect.non-whitespace-is-word expect &&
 	word_diff --color-words
-
 '
 
-test_expect_success 'set diff.wordRegex option' '
+test_expect_success 'set up diff.wordRegex option' '
 	git config diff.wordRegex "[[:alnum:]]+"
 '
 
-cp expect.letter-runs-are-words expect
-
 test_expect_success 'command-line overrides config' '
+	cp expect.letter-runs-are-words expect &&
 	word_diff --color-words="[a-z]+"
 '
 
-cat > expect <<\EOF
-<BOLD>diff --git a/pre b/post<RESET>
-<BOLD>index 330b04f..5ed8eff 100644<RESET>
-<BOLD>--- a/pre<RESET>
-<BOLD>+++ b/post<RESET>
-<CYAN>@@ -1,3 +1,7 @@<RESET>
-h(4),<GREEN>{+hh+}<RESET>[44]
-
-a = b + c<RESET>
-
-<GREEN>{+aa = a+}<RESET>
-
-<GREEN>{+aeff = aeff * ( aaa+}<RESET> )
-EOF
-
 test_expect_success 'command-line overrides config: --word-diff-regex' '
+	cat >expect <<-\EOF &&
+		<BOLD>diff --git a/pre b/post<RESET>
+		<BOLD>index 330b04f..5ed8eff 100644<RESET>
+		<BOLD>--- a/pre<RESET>
+		<BOLD>+++ b/post<RESET>
+		<CYAN>@@ -1,3 +1,7 @@<RESET>
+		h(4),<GREEN>{+hh+}<RESET>[44]
+
+		a = b + c<RESET>
+
+		<GREEN>{+aa = a+}<RESET>
+
+		<GREEN>{+aeff = aeff * ( aaa+}<RESET> )
+	EOF
 	word_diff --color --word-diff-regex="[a-z]+"
 '
 
-cp expect.non-whitespace-is-word expect
-
 test_expect_success '.gitattributes override config' '
+	cp expect.non-whitespace-is-word expect &&
 	word_diff --color-words
 '
 
-test_expect_success 'remove diff driver regex' '
-	git config --unset diff.testdriver.wordRegex
+test_expect_success 'setup: remove diff driver regex' '
+	test_might_fail git config --unset diff.testdriver.wordRegex
 '
 
-cat > expect <<\EOF
-<BOLD>diff --git a/pre b/post<RESET>
-<BOLD>index 330b04f..5ed8eff 100644<RESET>
-<BOLD>--- a/pre<RESET>
-<BOLD>+++ b/post<RESET>
-<CYAN>@@ -1,3 +1,7 @@<RESET>
-h(4),<GREEN>hh[44<RESET>]
-
-a = b + c<RESET>
-
-<GREEN>aa = a<RESET>
-
-<GREEN>aeff = aeff * ( aaa<RESET> )
-EOF
-
 test_expect_success 'use configured regex' '
+	cat >expect <<-\EOF &&
+		<BOLD>diff --git a/pre b/post<RESET>
+		<BOLD>index 330b04f..5ed8eff 100644<RESET>
+		<BOLD>--- a/pre<RESET>
+		<BOLD>+++ b/post<RESET>
+		<CYAN>@@ -1,3 +1,7 @@<RESET>
+		h(4),<GREEN>hh[44<RESET>]
+
+		a = b + c<RESET>
+
+		<GREEN>aa = a<RESET>
+
+		<GREEN>aeff = aeff * ( aaa<RESET> )
+	EOF
 	word_diff --color-words
 '
 
-echo 'aaa (aaa)' > pre
-echo 'aaa (aaa) aaa' > post
-
-cat > expect <<\EOF
-<BOLD>diff --git a/pre b/post<RESET>
-<BOLD>index c29453b..be22f37 100644<RESET>
-<BOLD>--- a/pre<RESET>
-<BOLD>+++ b/post<RESET>
-<CYAN>@@ -1 +1 @@<RESET>
-aaa (aaa) <GREEN>aaa<RESET>
-EOF
-
 test_expect_success 'test parsing words for newline' '
-
+	echo "aaa (aaa)" >pre &&
+	echo "aaa (aaa) aaa" >post &&
+	cat >expect <<-\EOF &&
+		<BOLD>diff --git a/pre b/post<RESET>
+		<BOLD>index c29453b..be22f37 100644<RESET>
+		<BOLD>--- a/pre<RESET>
+		<BOLD>+++ b/post<RESET>
+		<CYAN>@@ -1 +1 @@<RESET>
+		aaa (aaa) <GREEN>aaa<RESET>
+	EOF
 	word_diff --color-words="a+"
-
-
 '
 
-echo '(:' > pre
-echo '(' > post
-
-cat > expect <<\EOF
-<BOLD>diff --git a/pre b/post<RESET>
-<BOLD>index 289cb9d..2d06f37 100644<RESET>
-<BOLD>--- a/pre<RESET>
-<BOLD>+++ b/post<RESET>
-<CYAN>@@ -1 +1 @@<RESET>
-(<RED>:<RESET>
-EOF
-
 test_expect_success 'test when words are only removed at the end' '
-
+	echo "(:" >pre &&
+	echo "(" >post &&
+	cat >expect <<-\EOF &&
+		<BOLD>diff --git a/pre b/post<RESET>
+		<BOLD>index 289cb9d..2d06f37 100644<RESET>
+		<BOLD>--- a/pre<RESET>
+		<BOLD>+++ b/post<RESET>
+		<CYAN>@@ -1 +1 @@<RESET>
+		(<RED>:<RESET>
+	EOF
 	word_diff --color-words=.
-
 '
 
-cat > expect <<\EOF
-diff --git a/pre b/post
-index 289cb9d..2d06f37 100644
---- a/pre
-+++ b/post
-@@ -1 +1 @@
--(:
-+(
-EOF
-
 test_expect_success '--word-diff=none' '
-
+	echo "(:" >pre &&
+	echo "(" >post &&
+	cat >expect <<-\EOF &&
+		diff --git a/pre b/post
+		index 289cb9d..2d06f37 100644
+		--- a/pre
+		+++ b/post
+		@@ -1 +1 @@
+		-(:
+		+(
+	EOF
 	word_diff --word-diff=plain --word-diff=none
-
 '
 
-word_diff_for_language () {
-	cp "$TEST_DIRECTORY/t4034/$1/pre" \
-		"$TEST_DIRECTORY/t4034/$1/post" \
-		"$TEST_DIRECTORY/t4034/$1/expect" . &&
-	echo "* diff=$1" >.gitattributes &&
-	word_diff --color-words && cp output output.$1
-}
-
-for lang_dir in $TEST_DIRECTORY/t4034/*; do
-	lang=${lang_dir#$TEST_DIRECTORY/t4034/}
-	test_expect_success "diff driver '$lang' has sane word regex" "
-		word_diff_for_language $lang
-	"
-done
+test_language_driver bibtex
+test_language_driver cpp
+test_language_driver csharp
+test_language_driver fortran
+test_language_driver html
+test_language_driver java
+test_language_driver objc
+test_language_driver pascal
+test_language_driver php
+test_language_driver python
+test_language_driver ruby
+test_language_driver tex
 
 test_done
-- 
1.7.4.rc1


* Re*: [PATCH 1/3] t4034: bulk verify builtin word regex sanity
  2011-01-11 21:48               ` [PATCH 1/3] " Jonathan Nieder
@ 2011-01-18 18:00                 ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2011-01-18 18:00 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Thomas Rast, Scott Johnson, Michael J Gruber, Matthijs Kooijman,
	git

Jonathan Nieder <jrnieder@gmail.com> writes:

> From: Thomas Rast <trast@student.ethz.ch>
> Date: Sat, 18 Dec 2010 17:17:54 +0100
>
> The builtin word regexes should be tested with some simple examples
> against simple issues.  Do this in bulk.

Thanks.

> diff --git a/t/t4034/bibtex/expect b/t/t4034/bibtex/expect
> new file mode 100644
> index 0000000..a157774
> --- /dev/null
> +++ b/t/t4034/bibtex/expect
> @@ -0,0 +1,15 @@
> +<BOLD>diff --git a/pre b/post<RESET>
> +<BOLD>index 95cd55b..ddcba9b 100644<RESET>

Having to change this line every time the test input (or output) has
changed is somewhat unfortunate.

Also I noticed that "word_diff" shell function has this:

       test_must_fail git diff --no-index "$@" pre post >output &&

which solicits two comments:

 - We do not seem to document that --no-index implies --exit-code, ever
   since the latter option was introduced at 41bbf9d (Allow git-diff exit
   with codes similar to diff(1), 2007-03-14).  Probably we should.

 - This assumes that no test vector would have identical pre/post pair
   that expects no output, which feels somewhat limiting.

What we care about in this test is that "git diff --no-index" does not die
an uncontrolled death, so test_might_fail may be more appropriate.
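
The exit-status behaviour behind both comments can be checked by hand
with a short sketch like the one below.  It assumes git is installed;
`no_index_status` is a hypothetical helper, and the file contents are
borrowed from the simple test vector.

```shell
#!/bin/sh
# Report the exit status of "git diff --no-index" for two paths.
# Because --no-index implies --exit-code, the status reflects whether
# the inputs differ, not whether git ran into a problem.
no_index_status () {
    git diff --no-index "$1" "$2" >/dev/null 2>&1
    echo $?
}

dir=$(mktemp -d)
printf 'h(4)\n' >"$dir/pre"
printf 'h(4),hh[44]\n' >"$dir/post"

# Differing inputs: this is the case test_must_fail relies on.
no_index_status "$dir/pre" "$dir/post"

# Identical inputs: a test vector like this would trip test_must_fail.
no_index_status "$dir/pre" "$dir/pre"

rm -rf "$dir"
```

With differing files the status should be 1, and with identical files
0, which is why a pre/post pair that expects no output cannot be
expressed with test_must_fail.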

Here is another that probably should be squashed into this patch together
with 3/3 to add tests for the perl driver (I noticed it only because 2/3
had a trivial conflict due to its recent addition).

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Tue, 18 Jan 2011 09:43:43 -0800
Subject: [PATCH] t4034 (diff --word-diff): add a minimum Perl driver test vector

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t4034-diff-words.sh |    1 +
 t/t4034/perl/expect   |   13 +++++++++++++
 t/t4034/perl/post     |   22 ++++++++++++++++++++++
 t/t4034/perl/pre      |   22 ++++++++++++++++++++++
 4 files changed, 58 insertions(+), 0 deletions(-)
 create mode 100644 t/t4034/perl/expect
 create mode 100644 t/t4034/perl/post
 create mode 100644 t/t4034/perl/pre

diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index c3b1c48..37aeab0 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -301,6 +301,7 @@ test_language_driver html
 test_language_driver java
 test_language_driver objc
 test_language_driver pascal
+test_language_driver perl
 test_language_driver php
 test_language_driver python
 test_language_driver ruby
diff --git a/t/t4034/perl/expect b/t/t4034/perl/expect
new file mode 100644
index 0000000..a1deb6b
--- /dev/null
+++ b/t/t4034/perl/expect
@@ -0,0 +1,13 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index f6610d3..e8b72ef 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -4,8 +4,8 @@<RESET>
+
+package Frotz;<RESET>
+sub new {<RESET>
+	my <GREEN>(<RESET>$class<GREEN>, %opts)<RESET> = <RED>shift<RESET><GREEN>@_<RESET>;
+	return bless { <GREEN>xyzzy => "nitfol", %opts<RESET> }, $class;
+}<RESET>
+
+__END__<RESET>
diff --git a/t/t4034/perl/post b/t/t4034/perl/post
new file mode 100644
index 0000000..e8b72ef
--- /dev/null
+++ b/t/t4034/perl/post
@@ -0,0 +1,22 @@
+#!/usr/bin/perl
+
+use strict;
+
+package Frotz;
+sub new {
+	my ($class, %opts) = @_;
+	return bless { xyzzy => "nitfol", %opts }, $class;
+}
+
+__END__
+=head1 NAME
+
+frotz - Frotz
+
+=head1 SYNOPSIS
+
+  use frotz;
+
+  $nitfol = new Frotz();
+
+=cut
diff --git a/t/t4034/perl/pre b/t/t4034/perl/pre
new file mode 100644
index 0000000..f6610d3
--- /dev/null
+++ b/t/t4034/perl/pre
@@ -0,0 +1,22 @@
+#!/usr/bin/perl
+
+use strict;
+
+package Frotz;
+sub new {
+	my $class = shift;
+	return bless {}, $class;
+}
+
+__END__
+=head1 NAME
+
+frotz - Frotz
+
+=head1 SYNOPSIS
+
+  use frotz;
+
+  $nitfol = new Frotz();
+
+=cut
-- 
1.7.4.rc2.226.g63d9a


Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).