git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@libc.org>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, musl@lists.openwall.com
Subject: Re: Regression: git no longer works with musl libc's regex impl
Date: Tue, 4 Oct 2016 11:40:45 -0400	[thread overview]
Message-ID: <20161004154045.GT19318@brightrain.aerifal.cx> (raw)
In-Reply-To: <20161004152722.ex2nox43oj5ak4yi@sigill.intra.peff.net>

On Tue, Oct 04, 2016 at 11:27:22AM -0400, Jeff King wrote:
> On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:
> 
> > This commit broke support for using git with musl libc:
> > 
> > https://github.com/git/git/commit/2f8952250a84313b74f96abb7b035874854cf202
> 
> Yep. The idea is that you would compile git with NO_REGEX=1, and it
> would use the included compat routines.

This is really obnoxious to have to do manually. Why can't it simply
be auto-detected based on non-definition of REG_STARDEND?

> Is there something in particular you want to get out of using musl's
> regex that is not supported in the compat library?

It's always nice not to link extra implementations of the same thing.
This comes up all the time with gratuitous gnulib replacement
functions too.

> > Rather than depending on non-portable GNU regex extensions, there is a
> > simple portable fix for the issue this code was added to work around:
> > When a text file is being mmapped for use with string functions which
> > depend on null termination, if the file size:
> > 
> > 1. is nonzero mod page size, it just works; the remainder of the last
> >    page reads as zero bytes when mmapped.
> 
> Is that a portable assumption?

Yes. Per POSIX:

"The system shall always zero-fill any partial page at the end of an
object. Further, the system shall never write out any modified
portions of the last page of an object which are beyond its end.
References within the address range starting at pa and continuing for
len bytes to whole pages following the end of an object shall result
in delivery of a SIGBUS signal."

Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html

The same or similar text appears at least back to SUSv2; I did not
look further back.

> > 2. if an exact multiple of the page size, then instead of directly
> >    mmapping the file, first mmap a mapping 1 byte (thus 1 page) larger
> >    with MAP_ANON, then use MAP_FIXED to map the file over top of all
> >    but the last page. Now the mmapped buffer can safely be used as a C
> >    string.
> 
> I'm not sure whether all of our compat layers for mmap would be happy
> with that (e.g., see compat/win32mmap.c).

The nice thing about this approach is that if the MAP_FIXED fails due
to a broken system you can just read() into the anonymous memory.

> So it seems like any mmap-related solutions would have to be
> conditional, too. And then regexec_buf() would have to become something
> like:
> 
>   int regexec_buf(...)
>   {
>   #if defined(REG_STARTEND)
> 	... set up match ...
> 	return regexec(..., REG_STARTEND);
>   #elif defined(MMAP_ALWAYS_HAS_NUL)
> 	/*
> 	 * We assume that every buffer we see is always NUL-terminated
> 	 * eventually, either because it comes from xmallocz() or our
> 	 * mmap layer always ensures an extra NUL.
> 	 */
> 	 return regexec(...);
>   #else
>   #error "Nope, you need either NO_REGEX or USE_MMAP_NUL"
>   #endif
>   }
> 
> The assumption in the middle case feels pretty hacky, though. It fails

I agree. I would prefer just falling back to read() after MAP_ANON if
the MAP_FIXED fails. This would work for all systems.

> if we get a buffer from somewhere besides those two sources. It fails if
> somebody calls regexec_buf() on a subset of a string.
> 
> It also doesn't handle matching past embedded NULs in the string. That's
> not something we're relying on yet, but it would be nice to support
> consistently in the long run.

This is going to be even more non-portable and
implementation-specific. There's not even a good spec for how embedded
nuls should be treated in regex.

> If there's a compelling reason, it might be worth making that tradeoff.
> But I am not sure what the compelling reason is to use musl's regex
> (aside from the obvious of "less code in the resulting executable").

The compelling reason is just being portable by default rather than
requiring manual overrides to use a replacement library with weird
extensions for something that could have been done portably to begin
with.

Rich

  reply	other threads:[~2016-10-04 15:40 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-04 15:08 Rich Felker
2016-10-04 15:27 ` Jeff King
2016-10-04 15:40   ` Rich Felker [this message]
2016-10-04 16:08     ` Johannes Schindelin
2016-10-04 16:11       ` Rich Felker
2016-10-04 17:16         ` Johannes Schindelin
2016-10-04 18:00           ` Ray Donnelly
2016-10-04 17:39       ` [musl] " Rich Felker
2016-10-05 11:17         ` Johannes Schindelin
2016-10-05 13:01           ` Szabolcs Nagy
2016-10-05 13:15           ` Rich Felker
2016-10-04 22:06       ` James B
2016-10-04 22:33         ` Rich Felker
2016-10-04 22:48           ` Junio C Hamano
2016-10-05 13:11           ` Jakub Narębski
2016-10-05 16:15             ` [musl] " Rich Felker
2016-10-05 10:41         ` Johannes Schindelin
2016-10-05 11:59           ` James B
2016-10-05 16:11             ` Jeff King
2016-10-05 16:27               ` Rich Felker
2016-10-06 10:44             ` Johannes Schindelin
2016-10-06 19:18       ` Ævar Arnfjörð Bjarmason
2016-10-06 19:23         ` Jeff King
2016-10-06 19:25           ` Rich Felker
2016-10-06 19:28             ` Jeff King
2016-10-06 22:42         ` Ramsay Jones
2016-10-07 11:30           ` Jakub Narębski
2016-10-04 16:01   ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161004154045.GT19318@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=git@vger.kernel.org \
    --cc=musl@lists.openwall.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).