git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Thomas Rast <trast@student.ethz.ch>
To: Junio C Hamano <gitster@pobox.com>
Cc: Scott Johnson <scottj75074@yahoo.com>,
	Michael J Gruber <git@drmicha.warpmail.net>,
	Matthijs Kooijman <matthijs@stdin.nl>, <git@vger.kernel.org>
Subject: Re: [PATCH v2 3/4] userdiff: fix typo in ruby and python word regexes
Date: Sun, 19 Dec 2010 03:10:43 +0100	[thread overview]
Message-ID: <201012190310.43308.trast@student.ethz.ch> (raw)
In-Reply-To: <7vr5deg5fp.fsf@alter.siamese.dyndns.org>

Junio C Hamano wrote:
> Thomas Rast <trast@student.ethz.ch> writes:
> 
> > Both had an unclosed ] that ruined the safeguard against not matching
> > a non-space char.
> 
> Thanks.
> 
> Couldn't we have found this without your "sanity check" patch?  Are we
> ignoring error returns from regcomp() in some codepath, or is it just that
> we are catching them but our test suite lacks ruby and python test
> vectors?

We lacked test vectors, but we still couldn't have caught it.  We do
check for errors in regcomp():

	if (o->word_regex) {
		ecbdata.diff_words->word_regex = (regex_t *)
			xmalloc(sizeof(regex_t));
		if (regcomp(ecbdata.diff_words->word_regex,
				o->word_regex,
				REG_EXTENDED | REG_NEWLINE))
			die ("Invalid regular expression: %s",
					o->word_regex);
	}

(Now that I'm seeing this and comparing with regcomp(3), we should
actually report regerror() as part of the error message.)

The problem is that the pattern is still valid.  Consider that it was
a final two arms to the regex:

-        "|[^[:space:]|[\x80-\xff]+"),
+        "|[^[:space:]]|[\x80-\xff]+"),

In the preimage, it parses like so:

  | [^
      [:space:]|[\x80-\xff
     ]+

That is, the third [ is part of the (negated) character class.  So the
only problem is with | or [ characters in the input.  Any other
non-space character is part of the class.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

  reply	other threads:[~2010-12-19  2:11 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-15  3:47 html userdiff is not showing all my changes Scott Johnson
2010-12-15  9:06 ` Michael J Gruber
2010-12-15  9:12   ` Matthijs Kooijman
2010-12-15  9:29     ` Michael J Gruber
2010-12-15 15:13 ` [PATCH 0/4] --word-regex sanity checking and such Thomas Rast
2010-12-15 15:13   ` [PATCH 1/4] diff.c: pass struct diff_words into find_word_boundaries Thomas Rast
2010-12-15 15:13   ` [PATCH 2/4] diff.c: implement a sanity check for word regexes Thomas Rast
2010-12-15 15:13   ` [PATCH 3/4] userdiff: fix typo in ruby word regex Thomas Rast
2010-12-15 15:13   ` [PATCH 4/4] t4034: bulk verify builtin word regex sanity Thomas Rast
     [not found]   ` <913156.57703.qm@web110711.mail.gq1.yahoo.com>
2010-12-15 19:51     ` [PATCH 0/4] --word-regex sanity checking and such Thomas Rast
2010-12-15 20:48       ` Scott Johnson
2010-12-18 16:17         ` [PATCH v2 " Thomas Rast
2010-12-18 16:17           ` [PATCH v2 1/4] diff.c: pass struct diff_words into find_word_boundaries Thomas Rast
2010-12-18 16:17           ` [PATCH v2 2/4] diff.c: implement a sanity check for word regexes Thomas Rast
2010-12-18 21:00             ` Junio C Hamano
2010-12-19  1:59               ` Thomas Rast
2010-12-18 16:17           ` [PATCH v2 3/4] userdiff: fix typo in ruby and python " Thomas Rast
2010-12-18 21:02             ` Junio C Hamano
2010-12-19  2:10               ` Thomas Rast [this message]
2010-12-18 16:17           ` [PATCH v2 4/4] t4034: bulk verify builtin word regex sanity Thomas Rast
2011-01-11 21:47             ` [RFC/PATCH 0/3] " Jonathan Nieder
2011-01-11 21:48               ` [PATCH 1/3] " Jonathan Nieder
2011-01-18 18:00                 ` Re*: " Junio C Hamano
2011-01-11 21:48               ` [PATCH 2/3] userdiff: simplify word-diff safeguard Jonathan Nieder
2011-01-11 21:49               ` [PATCH 3/3] t4034 (diff --word-diff): style suggestions Jonathan Nieder
2010-12-18 16:24           ` [PATCH v2 0/4] --word-regex sanity checking and such Thomas Rast
2010-12-18 20:48             ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201012190310.43308.trast@student.ethz.ch \
    --to=trast@student.ethz.ch \
    --cc=git@drmicha.warpmail.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=matthijs@stdin.nl \
    --cc=scottj75074@yahoo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).