git@vger.kernel.org mailing list mirror (one of many)
From: Johannes Schindelin <johannes.schindelin@gmx.de>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Benjamin Kramer" <benny.kra@googlemail.com>,
	"René Scharfe" <l.s.r@web.de>
Subject: [PATCH v4 0/3] Fix a segfault caused by regexec() being called on mmap()ed data
Date: Wed, 21 Sep 2016 20:23:11 +0200 (CEST)
Message-ID: <cover.1474482164.git.johannes.schindelin@gmx.de>
In-Reply-To: <cover.1473321437.git.johannes.schindelin@gmx.de>

[Cc:ing Benjamin Kramer & René Scharfe because they both worked on
the REG_STARTEND code in grep.c that I replace in this iteration of the
patch series]

This patch series addresses a problem where `git diff` is called using
`-G` or `-S --pickaxe-regex` on new-born files that are configured
without user diff drivers, and that hence get mmap()ed into memory.

The problem with that: mmap()ed memory is *not* NUL-terminated, yet the
pickaxe code calls regexec() on it just the same.

This problem has been reported by my colleague Chris Sidi.

We solve this by introducing a helper, regexec_buf(), that takes a
pointer and a length instead of a NUL-terminated string.

This helper relies on REG_STARTEND. Unlike earlier iterations, it no
longer falls back to allocating and constructing a NUL-terminated copy;
platforms whose regex library lacks REG_STARTEND must build with
NO_REGEX instead. Given the widespread support for REG_STARTEND (Linux
has it, Mac OS X has it, Git for Windows has it because it uses
compat/regex/, which has it), I think this is a fair trade-off.
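
To illustrate the idea outside of Git, here is a minimal standalone
sketch (not part of the patch series) of matching inside a buffer that
is described by a pointer and a length. The names match_in_buffer() and
buffer_matches() are hypothetical, and the sketch assumes a regex
library that provides REG_STARTEND:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "this sketch needs a regex library that provides REG_STARTEND"
#endif

/*
 * Hypothetical helper in the spirit of regexec_buf(): match only within
 * buf[0..size), so the buffer does not need a trailing NUL.
 */
static int match_in_buffer(const regex_t *preg, const char *buf, size_t size,
			   regmatch_t *match)
{
	assert(match != NULL);
	/* With REG_STARTEND, pmatch[0] delimits the region to search. */
	match->rm_so = 0;
	match->rm_eo = (regoff_t)size;
	return regexec(preg, buf, 1, match, REG_STARTEND);
}

/*
 * Returns 1 if pattern matches within the first size bytes of buf,
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
static int buffer_matches(const char *pattern, const char *buf, size_t size)
{
	regex_t re;
	regmatch_t m;
	int ret;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	ret = match_in_buffer(&re, buf, size, &m);
	regfree(&re);
	return ret == 0;
}
```

Passing a size smaller than the underlying storage shows the effect:
bytes beyond the given length are not considered part of the subject.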

Changes since v3:

- reworded the commit onelines as per Junio's suggestions.

- removed fallback when REG_STARTEND is not supported, in favor of
  requiring NO_REGEX.

- removed the regmatch() function from grep.c, in favor of using
  regexec_buf().
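
The grep.c change matches one line at a time, with the end of the line
given as a length rather than a NUL. A rough standalone sketch of that
pattern follows; count_matching_lines() is a hypothetical name, this is
not the code in grep.c, and REG_STARTEND is assumed to be available:

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the lines in buf[0..size) that match
 * pattern. Each line is matched in place, bounded by its length, so
 * neither the lines nor the buffer need to be NUL-terminated.
 * Returns -1 if the pattern fails to compile.
 */
static int count_matching_lines(const char *pattern, const char *buf,
				size_t size)
{
	regex_t re;
	const char *line = buf, *end = buf + size;
	int count = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	while (line < end) {
		const char *eol = memchr(line, '\n', (size_t)(end - line));
		regmatch_t m;

		if (!eol)
			eol = end; /* last line without a trailing newline */
		m.rm_so = 0;
		m.rm_eo = (regoff_t)(eol - line);
		if (!regexec(&re, line, 1, &m, REG_STARTEND))
			count++;
		/* Skip the newline, if there was one. */
		line = eol < end ? eol + 1 : end;
	}
	regfree(&re);
	return count;
}
```

Calling it with a length that cuts a line short shows that only the
bytes inside the given range take part in the match.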


Johannes Schindelin (3):
  regex: -G<pattern> feeds a non NUL-terminated string to regexec() and
    fails
  regex: add regexec_buf() that can work on a non NUL-terminated string
  regex: use regexec_buf()

 Makefile                |  3 ++-
 diff.c                  |  3 ++-
 diffcore-pickaxe.c      | 18 ++++++++----------
 git-compat-util.h       | 13 +++++++++++++
 grep.c                  | 14 ++------------
 t/t4061-diff-pickaxe.sh | 22 ++++++++++++++++++++++
 xdiff-interface.c       | 13 ++++---------
 7 files changed, 53 insertions(+), 33 deletions(-)
 create mode 100755 t/t4061-diff-pickaxe.sh

Published-As: https://github.com/dscho/git/releases/tag/mmap-regexec-v4
Fetch-It-Via: git fetch https://github.com/dscho/git mmap-regexec-v4

Interdiff vs v3:

 diff --git a/Makefile b/Makefile
 index df4f86b..c6f7f66 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -301,7 +301,8 @@ all::
  # crashes due to allocation and free working on different 'heaps'.
  # It's defined automatically if USE_NED_ALLOCATOR is set.
  #
 -# Define NO_REGEX if you have no or inferior regex support in your C library.
 +# Define NO_REGEX if your C library lacks regex support with REG_STARTEND
 +# feature.
  #
  # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the
  # user.
 diff --git a/git-compat-util.h b/git-compat-util.h
 index 627ec5f..8aab0c3 100644
 --- a/git-compat-util.h
 +++ b/git-compat-util.h
 @@ -977,25 +977,17 @@ void git_qsort(void *base, size_t nmemb, size_t size,
  #define qsort git_qsort
  #endif
  
 +#ifndef REG_STARTEND
 +#error "Git requires REG_STARTEND support. Compile with NO_REGEX=NeedsStartEnd"
 +#endif
 +
  static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size,
  			      size_t nmatch, regmatch_t pmatch[], int eflags)
  {
 -#ifdef REG_STARTEND
  	assert(nmatch > 0 && pmatch);
  	pmatch[0].rm_so = 0;
  	pmatch[0].rm_eo = size;
  	return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
 -#else
 -	char *buf2 = xmalloc(size + 1);
 -	int ret;
 -
 -	memcpy(buf2, buf, size);
 -	buf2[size] = '\0';
 -	ret = regexec(preg, buf2, nmatch, pmatch, eflags);
 -	free(buf2);
 -
 -	return ret;
 -#endif
  }
  
  #ifndef DIR_HAS_BSD_GROUP_SEMANTICS
 diff --git a/grep.c b/grep.c
 index d7d00b8..1194d35 100644
 --- a/grep.c
 +++ b/grep.c
 @@ -898,17 +898,6 @@ static int fixmatch(struct grep_pat *p, char *line, char *eol,
  	}
  }
  
 -static int regmatch(const regex_t *preg, char *line, char *eol,
 -		    regmatch_t *match, int eflags)
 -{
 -#ifdef REG_STARTEND
 -	match->rm_so = 0;
 -	match->rm_eo = eol - line;
 -	eflags |= REG_STARTEND;
 -#endif
 -	return regexec(preg, line, 1, match, eflags);
 -}
 -
  static int patmatch(struct grep_pat *p, char *line, char *eol,
  		    regmatch_t *match, int eflags)
  {
 @@ -919,7 +908,8 @@ static int patmatch(struct grep_pat *p, char *line, char *eol,
  	else if (p->pcre_regexp)
  		hit = !pcrematch(p, line, eol, match, eflags);
  	else
 -		hit = !regmatch(&p->regexp, line, eol, match, eflags);
 +		hit = !regexec_buf(&p->regexp, line, eol - line, 1, match,
 +				   eflags);
  
  	return hit;
  }

-- 
2.10.0.windows.1.10.g803177d

base-commit: f6727b0509ec3417a5183ba6e658143275a734f5


Thread overview: 66+ messages
2016-09-05 15:44 [PATCH 0/3] Fix a segfault caused by regexec() being called on mmap()ed data Johannes Schindelin
2016-09-05 15:45 ` [PATCH 1/3] Demonstrate a problem: our pickaxe code assumes NUL-terminated buffers Johannes Schindelin
2016-09-06 18:43   ` Jeff King
2016-09-08  7:53     ` Johannes Schindelin
2016-09-05 15:45 ` [PATCH 2/3] diff_populate_filespec: NUL-terminate buffers Johannes Schindelin
2016-09-06  7:06   ` Jeff King
2016-09-06 16:02     ` Johannes Schindelin
2016-09-06 18:41       ` Jeff King
2016-09-07 18:31         ` Junio C Hamano
2016-09-08  7:52           ` Johannes Schindelin
2016-09-08  7:49         ` Johannes Schindelin
2016-09-08  8:22           ` Jeff King
2016-09-08 16:57             ` Junio C Hamano
2016-09-08 18:22               ` Johannes Schindelin
2016-09-08 18:48               ` Jeff King
2016-09-05 15:45 ` [PATCH 3/3] diff_grep: add assertions verifying that the buffers are NUL-terminated Johannes Schindelin
2016-09-06  7:08   ` Jeff King
2016-09-06 16:04     ` Johannes Schindelin
2016-09-05 19:10 ` [PATCH 0/3] Fix a segfault caused by regexec() being called on mmap()ed data Junio C Hamano
2016-09-06  7:12   ` Jeff King
2016-09-06 14:06     ` Johannes Schindelin
2016-09-06 18:29       ` Jeff King
2016-09-08  7:29         ` Johannes Schindelin
2016-09-08  8:00           ` Jeff King
2016-09-09 10:09             ` Johannes Schindelin
2016-09-09 17:46               ` Junio C Hamano
2016-09-06 13:21   ` Johannes Schindelin
2016-09-06  6:58 ` Jeff King
2016-09-06 14:13   ` Johannes Schindelin
2016-09-08  7:31 ` [PATCH v2 " Johannes Schindelin
2016-09-08  7:31   ` [PATCH v2 2/3] Introduce a function to run regexec() on non-NUL-terminated buffers Johannes Schindelin
2016-09-08  8:04     ` Jeff King
2016-09-09  9:45       ` Johannes Schindelin
2016-09-09  9:59         ` Jeff King
2016-09-08  7:31   ` [PATCH v2 1/3] Demonstrate a problem: our pickaxe code assumes NUL-terminated buffers Johannes Schindelin
2016-09-08  7:31   ` [PATCH v2 3/3] Use the newly-introduced regexec_buf() function Johannes Schindelin
2016-09-08  7:54     ` Johannes Schindelin
2016-09-08  8:10       ` Jeff King
2016-09-08  8:14         ` Jeff King
2016-09-08  8:35           ` Jeff King
2016-09-08 19:06             ` Ramsay Jones
2016-09-08 19:53               ` Jeff King
2016-09-08 21:30                 ` Junio C Hamano
2016-09-08  7:33   ` [PATCH v2 0/3] Fix a segfault caused by regexec() being called on mmap()ed data Johannes Schindelin
2016-09-08  8:13     ` Jeff King
2016-09-08  7:57   ` [PATCH v3 " Johannes Schindelin
2016-09-08  7:57     ` [PATCH v3 1/3] Demonstrate a problem: our pickaxe code assumes NUL-terminated buffers Johannes Schindelin
2016-09-08  7:58     ` [PATCH v3 2/3] Introduce a function to run regexec() on non-NUL-terminated buffers Johannes Schindelin
2016-09-08 17:03       ` Junio C Hamano
2016-09-08  7:59     ` [PATCH v3 3/3] Use the newly-introduced regexec_buf() function Johannes Schindelin
2016-09-08 17:09       ` Junio C Hamano
2016-09-09  9:52         ` Johannes Schindelin
2016-09-09  9:57           ` Jeff King
2016-09-09 10:41             ` Johannes Schindelin
2016-09-09 17:49           ` Junio C Hamano
2016-09-21 18:23     ` Johannes Schindelin [this message]
2016-09-21 18:23       ` [PATCH v4 1/3] regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails Johannes Schindelin
2016-09-21 18:24       ` [PATCH v4 2/3] regex: add regexec_buf() that can work on a non NUL-terminated string Johannes Schindelin
2016-09-21 19:17         ` Junio C Hamano
2016-09-22 18:38           ` Johannes Schindelin
2016-09-21 18:24       ` [PATCH v4 3/3] regex: use regexec_buf() Johannes Schindelin
2016-09-21 19:18         ` Junio C Hamano
2016-09-21 20:09           ` Junio C Hamano
2016-09-21 22:03         ` Jeff King
2016-09-25 14:01           ` Johannes Schindelin
2016-09-21 22:04       ` [PATCH v4 0/3] Fix a segfault caused by regexec() being called on mmap()ed data Jeff King
