git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] Makefile: use compat regex with SANITIZE=address
@ 2020-01-16 17:51 Jeff King
  2020-01-16 18:34 ` [PATCH] t4018: drop "debugging" cat from hunk-header tests Jeff King
  2020-01-17 17:37 ` [PATCH] Makefile: use compat regex with SANITIZE=address Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Jeff King @ 2020-01-16 17:51 UTC (permalink / raw)
  To: git

Recent versions of the gcc and clang Address Sanitizer produce test
failures related to regexec(). This triggers with gcc-10 and clang-8
(but not gcc-9 nor clang-7). Running:

  make CC=gcc-10 SANITIZE=address test

results in failures in t4018, t3206, and t4062.

The cause seems to be that when built with ASan, we use a different
version of regexec() than normal. And this version doesn't understand
the REG_STARTEND flag. Here's my evidence supporting that.

The failure in t4062 is an ASan warning:

  expecting success of 4062.2 '-G matches':
  	git diff --name-only -G "^(0{64}){64}$" HEAD^ >out &&
  	test 4096-zeroes.txt = "$(cat out)"

  =================================================================
  ==672994==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa76f672000 at pc 0x7fa7726f75b6 bp 0x7ffe41bdda70 sp 0x7ffe41bdd220
  READ of size 4097 at 0x7fa76f672000 thread T0
      #0 0x7fa7726f75b5  (/lib/x86_64-linux-gnu/libasan.so.6+0x4f5b5)
      #1 0x562ae0c9c40e in regexec_buf /home/peff/compile/git/git-compat-util.h:1117
      #2 0x562ae0c9c40e in diff_grep /home/peff/compile/git/diffcore-pickaxe.c:52
      #3 0x562ae0c9cc28 in pickaxe_match /home/peff/compile/git/diffcore-pickaxe.c:166
      [...]

In this case we're looking in a buffer which was mmap'd via
reuse_worktree_file(), and whose size is 4096 bytes. But libasan's
regex tries to look at byte 4097 anyway! If we tweak Git like this:

  diff --git a/diff.c b/diff.c
  index 8e2914c031..cfae60c120 100644
  --- a/diff.c
  +++ b/diff.c
  @@ -3880,7 +3880,7 @@ static int reuse_worktree_file(struct index_state *istate,
           */
          if (ce_uptodate(ce) ||
              (!lstat(name, &st) && !ie_match_stat(istate, ce, &st, 0)))
  -               return 1;
  +               return 0;

          return 0;
   }

to use a regular buffer (with a trailing NUL) instead of an mmap, then
the complaint goes away.

The other failures are actually diff output with an incorrect funcname
header. If I instrument xdiff to show the funcname matching like so:

  diff --git a/xdiff-interface.c b/xdiff-interface.c
  index 8509f9ea22..f6c3dc1986 100644
  --- a/xdiff-interface.c
  +++ b/xdiff-interface.c
  @@ -197,6 +197,7 @@ struct ff_regs {
   	struct ff_reg {
   		regex_t re;
   		int negate;
  +		char *printable;
   	} *array;
   };

  @@ -218,7 +219,12 @@ static long ff_regexp(const char *line, long len,

   	for (i = 0; i < regs->nr; i++) {
   		struct ff_reg *reg = regs->array + i;
  -		if (!regexec_buf(&reg->re, line, len, 2, pmatch, 0)) {
  +		int ret = regexec_buf(&reg->re, line, len, 2, pmatch, 0);
  +		warning("regexec %s:\n  regex: %s\n  buf: %.*s",
  +			ret == 0 ? "matched" : "did not match",
  +			reg->printable,
  +			(int)len, line);
  +		if (!ret) {
   			if (reg->negate)
   				return -1;
   			break;
  @@ -264,6 +270,7 @@ void xdiff_set_find_func(xdemitconf_t *xecfg, const char *value, int cflags)
   			expression = value;
   		if (regcomp(&reg->re, expression, cflags))
   			die("Invalid regexp to look for hunk header: %s", expression);
  +		reg->printable = xstrdup(expression);
   		free(buffer);
   		value = ep + 1;
   	}

then when compiling with ASan and gcc-10, running the diff from t4018.66
produces this:

  $ git diff -U1 cpp-skip-access-specifiers
  warning: regexec did not match:
    regex: ^[     ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])
    buf: private:
  warning: regexec matched:
    regex: ^((::[[:space:]]*)?[A-Za-z_].*)$
    buf: private:
  diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers
  index 4d4a9db..ebd6f42 100644
  --- a/cpp-skip-access-specifiers
  +++ b/cpp-skip-access-specifiers
  @@ -6,3 +6,3 @@ private:
          void DoSomething();
          int ChangeMe;
  };
          void DoSomething();
  -       int ChangeMe;
  +       int IWasChanged;
   };

That first regex should match (and is negated, so it should be telling
us _not_ to match "private:"). But it wouldn't if regexec() is looking
at the whole buffer, and not just the length-limited line we've fed to
regexec_buf(). So this is consistent again with REG_STARTEND being
ignored.

The correct output (compiling without ASan, or gcc-9 with Asan) looks
like this:

  warning: regexec matched:
    regex: ^[     ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])
    buf: private:
  [...more lines that we end up not using...]
  warning: regexec matched:
    regex: ^((::[[:space:]]*)?[A-Za-z_].*)$
    buf: class RIGHT : public Baseclass
  diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers
  index 4d4a9db..ebd6f42 100644
  --- a/cpp-skip-access-specifiers
  +++ b/cpp-skip-access-specifiers
  @@ -6,3 +6,3 @@ class RIGHT : public Baseclass
          void DoSomething();
  -       int ChangeMe;
  +       int IWasChanged;
   };

So it really does seem like libasan's regex engine is ignoring
REG_STARTEND. We should be able to work around it by compiling with
NO_REGEX, which would use our local regexec(). But to make matters even
more interesting, this isn't enough by itself.

Because ASan has support from the compiler, it doesn't seem to intercept
our call to regexec() at the dynamic library level. It actually
recognizes when we are compiling a call to regexec() and replaces it
with ASan-specific code at that point. And unlike most of our other
compat code, where we might have git_mmap() or similar, the actual
symbol name in the compiled compat/regex code is regexec(). So just
compiling with NO_REGEX isn't enough; we still end up in libasan!

We can work around that by having the preprocessor replace regexec with
git_regexec (both in the callers and in the actual implementation), and
we truly end up with a call to our custom regex code, even when
compiling with ASan. That's probably a good thing to do anyway, as it
means anybody looking at the symbols later (e.g., in a debugger) would
have a better indication of which function is which. So we'll do the
same for the other common regex functions (even though just regexec() is
enough to fix this ASan problem).

Signed-off-by: Jeff King <peff@peff.net>
---
 Makefile             | 3 +++
 compat/regex/regex.h | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/Makefile b/Makefile
index 09f98b777c..9cd9826800 100644
--- a/Makefile
+++ b/Makefile
@@ -1220,6 +1220,9 @@ endif
 ifneq ($(filter leak,$(SANITIZERS)),)
 BASIC_CFLAGS += -DSUPPRESS_ANNOTATED_LEAKS
 endif
+ifneq ($(filter address,$(SANITIZERS)),)
+NO_REGEX = NeededForASAN
+endif
 endif
 
 ifndef sysconfdir
diff --git a/compat/regex/regex.h b/compat/regex/regex.h
index 08a2609663..2d3412860d 100644
--- a/compat/regex/regex.h
+++ b/compat/regex/regex.h
@@ -41,6 +41,11 @@
 extern "C" {
 #endif
 
+#define regcomp git_regcomp
+#define regexec git_regexec
+#define regerror git_regerror
+#define regfree git_regfree
+
 /* The following two types have to be signed and unsigned integer type
    wide enough to hold a value of a pointer.  For most ANSI compilers
    ptrdiff_t and size_t should be likely OK.  Still size of these two
-- 
2.25.0.318.gee4019ba55

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH] t4018: drop "debugging" cat from hunk-header tests
  2020-01-16 17:51 [PATCH] Makefile: use compat regex with SANITIZE=address Jeff King
@ 2020-01-16 18:34 ` Jeff King
  2020-01-16 22:56   ` Jonathan Nieder
  2020-01-17 17:37 ` [PATCH] Makefile: use compat regex with SANITIZE=address Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff King @ 2020-01-16 18:34 UTC (permalink / raw)
  To: git

We run a series of hunk-header tests in a loop, and each one does this:

  test_when_finished 'cat actual' &&      # for debugging only

This is pretty pointless. When the test succeeds, we waste time running
a useless cat process. If you're debugging a failure with "-i", then we
won't run the when-finished part at all. So it helps only if you're
running with something like "--verbose-log".

Since we expect the tests to succeed most of the time, a better way to
do this would be a helper that checks the output and dumps "actual" only
when it fails. But it's probably not even worth the effort, as anyone
debugging a failure could just run with "-i" and investigate the
"actual" file themselves.

Signed-off-by: Jeff King <peff@peff.net>
---
Just noticed this when working with t4018 on an unrelated problem.

I could be convinced otherwise on the final paragraph, but I think it
would only be worth it if we added a general test_grep() helper and used
it in more places.

 t/t4018-diff-funcname.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index c0f4839543..02255a08bf 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -106,7 +106,6 @@ do
 		result=success
 	fi
 	test_expect_$result "hunk header: $i" "
-		test_when_finished 'cat actual' &&	# for debugging only
 		git diff -U1 $i >actual &&
 		grep '@@ .* @@.*RIGHT' actual
 	"
-- 
2.25.0.318.gee4019ba55

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] t4018: drop "debugging" cat from hunk-header tests
  2020-01-16 18:34 ` [PATCH] t4018: drop "debugging" cat from hunk-header tests Jeff King
@ 2020-01-16 22:56   ` Jonathan Nieder
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2020-01-16 22:56 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> We run a series of hunk-header tests in a loop, and each one does this:
>
>   test_when_finished 'cat actual' &&      # for debugging only
>
> This is pretty pointless. When the test succeeds, we waste time running
> a useless cat process. If you're debugging a failure with "-i", then we
> won't run the when-finished part at all. So it helps only if you're
> running with something like "--verbose-log".
>
> Since we expect the tests to succeed most of the time, a better way to
> do this would be a helper that checks the output and dumps "actual" only
> when it fails. But it's probably not even worth the effort, as anyone
> debugging a failure could just run with "-i" and investigate the
> "actual" file themselves.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Just noticed this when working with t4018 on an unrelated problem.
>
> I could be convinced otherwise on the final paragraph, but I think it
> would only be worth it if we added a general test_grep() helper and used
> it in more places.
>
>  t/t4018-diff-funcname.sh | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.

Sure, a test_grep would be nice for CI cases (especially for
Heisenbugs), but in its absence I agree that this patch is the right
thing to do.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Makefile: use compat regex with SANITIZE=address
  2020-01-16 17:51 [PATCH] Makefile: use compat regex with SANITIZE=address Jeff King
  2020-01-16 18:34 ` [PATCH] t4018: drop "debugging" cat from hunk-header tests Jeff King
@ 2020-01-17 17:37 ` Junio C Hamano
  2020-01-17 17:49   ` Jeff King
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2020-01-17 17:37 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> Recent versions of the gcc and clang Address Sanitizer produce test
> failures related to regexec(). This triggers with gcc-10 and clang-8
> (but not gcc-9 nor clang-7). Running:
>
>   make CC=gcc-10 SANITIZE=address test
>
> results in failures in t4018, t3206, and t4062.
> ...
> We can work around that by having the preprocessor replace regexec with
> git_regexec (both in the callers and in the actual implementation), and
> we truly end up with a call to our custom regex code, even when
> compiling with ASan. That's probably a good thing to do anyway, as it
> means anybody looking at the symbols later (e.g., in a debugger) would
> have a better indication of which function is which. So we'll do the
> same for the other common regex functions (even though just regexec() is
> enough to fix this ASan problem).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  Makefile             | 3 +++
>  compat/regex/regex.h | 5 +++++
>  2 files changed, 8 insertions(+)

I guess we should treat this the same way as the recent vcproj "fix"
by Dscho, i.e. fast-track to 'maint' to ensure that all public
integration branches has it soonish?

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Makefile: use compat regex with SANITIZE=address
  2020-01-17 17:37 ` [PATCH] Makefile: use compat regex with SANITIZE=address Junio C Hamano
@ 2020-01-17 17:49   ` Jeff King
  2020-01-17 23:39     ` brian m. carlson
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2020-01-17 17:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin

On Fri, Jan 17, 2020 at 09:37:22AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Recent versions of the gcc and clang Address Sanitizer produce test
> > failures related to regexec(). This triggers with gcc-10 and clang-8
> > (but not gcc-9 nor clang-7). Running:
> >
> >   make CC=gcc-10 SANITIZE=address test
> >
> > results in failures in t4018, t3206, and t4062.
> [...]
> I guess we should treat this the same way as the recent vcproj "fix"
> by Dscho, i.e. fast-track to 'maint' to ensure that all public
> integration branches has it soonish?

I don't think there's a huge rush. It only triggers ASan and recent
compilers, so AFAIK nobody has been hitting this in CI. I occasionally
test with whatever is the most recent compiler in Debian unstable to see
if it turns up any interesting new warnings or errors (but gcc-9 is
still the default there).

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Makefile: use compat regex with SANITIZE=address
  2020-01-17 17:49   ` Jeff King
@ 2020-01-17 23:39     ` brian m. carlson
  2020-01-18  3:34       ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: brian m. carlson @ 2020-01-17 23:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Johannes Schindelin

[-- Attachment #1: Type: text/plain, Size: 1221 bytes --]

On 2020-01-17 at 17:49:31, Jeff King wrote:
> On Fri, Jan 17, 2020 at 09:37:22AM -0800, Junio C Hamano wrote:
> 
> > Jeff King <peff@peff.net> writes:
> > 
> > > Recent versions of the gcc and clang Address Sanitizer produce test
> > > failures related to regexec(). This triggers with gcc-10 and clang-8
> > > (but not gcc-9 nor clang-7). Running:
> > >
> > >   make CC=gcc-10 SANITIZE=address test
> > >
> > > results in failures in t4018, t3206, and t4062.
> > [...]
> > I guess we should treat this the same way as the recent vcproj "fix"
> > by Dscho, i.e. fast-track to 'maint' to ensure that all public
> > integration branches has it soonish?
> 
> I don't think there's a huge rush. It only triggers ASan and recent
> compilers, so AFAIK nobody has been hitting this in CI. I occasionally
> test with whatever is the most recent compiler in Debian unstable to see
> if it turns up any interesting new warnings or errors (but gcc-9 is
> still the default there).

I've reported this as a bug to Debian on the clang-8 and libasan6
packages, along with a trivial test case.  Hopefully this will get fixed
soon.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Makefile: use compat regex with SANITIZE=address
  2020-01-17 23:39     ` brian m. carlson
@ 2020-01-18  3:34       ` Jeff King
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2020-01-18  3:34 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, git, Johannes Schindelin

On Fri, Jan 17, 2020 at 11:39:31PM +0000, brian m. carlson wrote:

> > I don't think there's a huge rush. It only triggers ASan and recent
> > compilers, so AFAIK nobody has been hitting this in CI. I occasionally
> > test with whatever is the most recent compiler in Debian unstable to see
> > if it turns up any interesting new warnings or errors (but gcc-9 is
> > still the default there).
> 
> I've reported this as a bug to Debian on the clang-8 and libasan6
> packages, along with a trivial test case.  Hopefully this will get fixed
> soon.

Thanks. I'm not entirely convinced it's a bug (after all, we are
squatting on a well-known POSIX function name), but we'll see what they
say. :)

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-01-18  3:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-16 17:51 [PATCH] Makefile: use compat regex with SANITIZE=address Jeff King
2020-01-16 18:34 ` [PATCH] t4018: drop "debugging" cat from hunk-header tests Jeff King
2020-01-16 22:56   ` Jonathan Nieder
2020-01-17 17:37 ` [PATCH] Makefile: use compat regex with SANITIZE=address Junio C Hamano
2020-01-17 17:49   ` Jeff King
2020-01-17 23:39     ` brian m. carlson
2020-01-18  3:34       ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).