From: Jeff King <peff@peff.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 0/3] Fix a segfault caused by regexec() being called on mmap()ed data
Date: Tue, 6 Sep 2016 14:29:43 -0400
Message-ID: <20160906182942.s2mlge2vg65f5sy4@sigill.intra.peff.net>
In-Reply-To: <alpine.DEB.2.20.1609061521410.129229@virtualbox>
On Tue, Sep 06, 2016 at 04:06:32PM +0200, Johannes Schindelin wrote:
> > I think re_search() is the correct replacement function but it's been a
> > while since I've looked into it.
>
> The segfault I investigated happened in a call to strlen(). I see many
> calls to strlen() in compat/regex/... The one that triggers the segfault
> is in regexec(), compat/regex/regexec.c:241.
Yes, that is the important one, I think. The others are for patterns,
error msgs, etc. Of course strlen() is not the only function that cares
about NUL delimiters (and there might even be a "while (*p)" somewhere
in the code).
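The NUL dependence is easy to see without crashing anything. Below is a
minimal sketch, assuming only POSIX regcomp()/regexec(); the helper name
matches_cstring() is made up for illustration and is not part of git. Since
regexec() has no length parameter, it stops at the first NUL, so anything
after an embedded NUL is invisible to it. With no NUL at all, the same scan
runs off the end of the buffer instead.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical demo helper (not from git): run plain regexec() on buf.
 * Returns 1 on match, 0 on no match, -1 if the pattern fails to compile.
 */
static int matches_cstring(const char *pattern, const char *buf)
{
	regex_t re;
	int ret;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;
	/* regexec() takes no length; it treats buf as a NUL-terminated string */
	ret = regexec(&re, buf, 0, NULL, 0);
	regfree(&re);
	return ret == 0;
}
```

Given the 10-byte buffer "abc\0needle", this finds "abc" but not "needle".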
I always assumed the _point_ of re_search taking a ptr/len pair was
exactly to handle this case. The documentation[1] says:
  `string` is the string you want to match; it can contain newline and
  null characters. `size` is the length of that string.
Which seems pretty definitive to me (that's for re_match(), but
re_search() is defined in the docs in terms of re_match()).
[1] http://www.delorie.com/gnu/docs/regex/regex_47.html
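To make the ptr/len point concrete, here is a rough sketch of calling
re_search() on a buffer with an embedded NUL. It assumes glibc's GNU regex
interface, which is why it needs _GNU_SOURCE, and the helper name search_buf()
is invented for this example. It is not how git calls the regex code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* assumption: glibc; exposes re_search() and friends */
#endif
#include <assert.h>
#include <limits.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: search buf[0..len) for pattern
 * without requiring a terminating NUL. Returns the offset of the match,
 * -1 if there is no match, or -2 on error.
 */
static int search_buf(const char *pattern, const char *buf, size_t len)
{
	struct re_pattern_buffer pb;
	int pos;

	assert(len <= INT_MAX);		/* the sketch passes lengths as int */
	memset(&pb, 0, sizeof(pb));	/* let the library allocate its buffer */
	re_set_syntax(RE_SYNTAX_POSIX_EXTENDED);	/* note: global state */
	if (re_compile_pattern(pattern, strlen(pattern), &pb))
		return -2;		/* non-NULL return is an error message */

	/* try start offsets from 0 up to len; the length bounds the subject */
	pos = re_search(&pb, buf, (int)len, 0, (int)len, NULL);
	regfree(&pb);
	return pos;
}
```

Called as search_buf("needle", "abc\0needle", 10), this should report the
match past the NUL rather than stopping at it.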
> As to re_search(): I have not been able to reason about its callees in a
> reasonable amount of time. I agree that they *should* not run over the
> buffer, but I cannot easily verify it.
Between the documentation above, and the fact that your new test passes
when we switch to it (see below), I feel pretty good about it.
> The bigger problem is that re_search() is defined in the __USE_GNU section
> of regex.h, and I do not think it is appropriate to universally #define
> said constant before #include'ing regex.h. So it would appear that major
> surgery would be required if we wanted to use regular expressions on
> strings that are not NUL-terminated.
We can contain this to the existing compat/regex/regexec.c, and just
provide a wrapper that is similar to regexec but takes a ptr/len pair.
Like:
diff --git a/compat/regex/regex.h b/compat/regex/regex.h
index 61c9683..b2dd0b7 100644
--- a/compat/regex/regex.h
+++ b/compat/regex/regex.h
@@ -569,6 +569,11 @@ extern int regexec (const regex_t *__restrict __preg,
regmatch_t __pmatch[__restrict_arr],
int __eflags);
+extern int regexecn (const regex_t *__restrict __preg,
+ const char *__restrict __cstring, size_t __length,
+ size_t __nmatch, regmatch_t __pmatch[__restrict_arr],
+ int __eflags);
+
extern size_t regerror (int __errcode, const regex_t *__restrict __preg,
char *__restrict __errbuf, size_t __errbuf_size);
diff --git a/compat/regex/regexec.c b/compat/regex/regexec.c
index eb5e1d4..8afe26b 100644
--- a/compat/regex/regexec.c
+++ b/compat/regex/regexec.c
@@ -217,15 +217,16 @@ static reg_errcode_t extend_buffers (re_match_context_t *mctx)
We return 0 if we find a match and REG_NOMATCH if not. */
int
-regexec (
+regexecn (
const regex_t *__restrict preg,
const char *__restrict string,
+ size_t length,
size_t nmatch,
regmatch_t pmatch[],
int eflags)
{
reg_errcode_t err;
- int start, length;
+ int start;
if (eflags & ~(REG_NOTBOL | REG_NOTEOL | REG_STARTEND))
return REG_BADPAT;
@@ -238,7 +239,7 @@ regexec (
else
{
start = 0;
- length = strlen (string);
+ /* length already passed in */
}
__libc_lock_lock (dfa->lock);
@@ -252,6 +253,17 @@ regexec (
return err != REG_NOERROR;
}
+int
+regexec (
+ const regex_t *__restrict preg,
+ const char *__restrict string,
+ size_t nmatch,
+ regmatch_t pmatch[],
+ int eflags)
+{
+ return regexecn(preg, string, strlen(string), nmatch, pmatch, eflags);
+}
+
#ifdef _LIBC
# include <shlib-compat.h>
versioned_symbol (libc, __regexec, regexec, GLIBC_2_3_4);
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 55067ca..fdd08dd 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -50,9 +50,9 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
xdemitconf_t xecfg;
if (!one)
- return !regexec(regexp, two->ptr, 1, &regmatch, 0);
+ return !regexecn(regexp, two->ptr, two->size, 1, &regmatch, 0);
if (!two)
- return !regexec(regexp, one->ptr, 1, &regmatch, 0);
+ return !regexecn(regexp, one->ptr, one->size, 1, &regmatch, 0);
/*
* We have both sides; need to run textual diff and see if
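For trying the ptr/len calling convention without the compat/regex tree, a
similar interface can be approximated on top of a stock regexec(), assuming it
supports the REG_STARTEND extension (glibc and the BSDs do; it is not in
POSIX). This is a different technique from the patch above, which edits
regexec.c directly, and regexecn_sketch() is a made-up name used only here.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "this sketch assumes a regexec() that supports REG_STARTEND"
#endif

/*
 * Hypothetical stand-in with the same argument order as the regexecn()
 * proposed above. REG_STARTEND tells regexec() to take the subject's
 * bounds from pmatch[0] instead of looking for a terminating NUL.
 */
static int regexecn_sketch(const regex_t *preg, const char *buf, size_t size,
			   size_t nmatch, regmatch_t pmatch[], int eflags)
{
	assert(nmatch > 0 && pmatch);	/* pmatch[0] carries the bounds in */
	pmatch[0].rm_so = 0;
	pmatch[0].rm_eo = (regoff_t)size;
	return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
}
```

The caller side then looks like the diff_grep() hunk: pass the pointer and
size from the mmfile_t, and compare the result against 0 as before.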