git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* Regression: git no longer works with musl libc's regex impl
@ 2016-10-04 15:08 Rich Felker
  2016-10-04 15:27 ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2016-10-04 15:08 UTC (permalink / raw)
  To: git; +Cc: musl

This commit broke support for using git with musl libc:

https://github.com/git/git/commit/2f8952250a84313b74f96abb7b035874854cf202

Rather than depending on non-portable GNU regex extensions, there is a
simple portable fix for the issue this code was added to work around:
When a text file is being mmapped for use with string functions which
depend on null termination, if the file size:

1. is nonzero mod page size, it just works; the remainder of the last
   page reads as zero bytes when mmapped.

2. if an exact multiple of the page size, then instead of directly
   mmapping the file, first mmap a mapping 1 byte (thus 1 page) larger
   with MAP_ANON, then use MAP_FIXED to map the file over top of all
   but the last page. Now the mmapped buffer can safely be used as a C
   string.

If such a solution is acceptable I can try to prepare a patch.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 15:08 Regression: git no longer works with musl libc's regex impl Rich Felker
@ 2016-10-04 15:27 ` Jeff King
  2016-10-04 15:40   ` Rich Felker
  2016-10-04 16:01   ` Johannes Schindelin
  0 siblings, 2 replies; 28+ messages in thread
From: Jeff King @ 2016-10-04 15:27 UTC (permalink / raw)
  To: Rich Felker; +Cc: git, musl

On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:

> This commit broke support for using git with musl libc:
> 
> https://github.com/git/git/commit/2f8952250a84313b74f96abb7b035874854cf202

Yep. The idea is that you would compile git with NO_REGEX=1, and it
would use the included compat routines.

Is there something in particular you want to get out of using musl's
regex that is not supported in the compat library?

> Rather than depending on non-portable GNU regex extensions, there is a
> simple portable fix for the issue this code was added to work around:
> When a text file is being mmapped for use with string functions which
> depend on null termination, if the file size:
> 
> 1. is nonzero mod page size, it just works; the remainder of the last
>    page reads as zero bytes when mmapped.

Is that a portable assumption?

> 2. if an exact multiple of the page size, then instead of directly
>    mmapping the file, first mmap a mapping 1 byte (thus 1 page) larger
>    with MAP_ANON, then use MAP_FIXED to map the file over top of all
>    but the last page. Now the mmapped buffer can safely be used as a C
>    string.

I'm not sure whether all of our compat layers for mmap would be happy
with that (e.g., see compat/win32mmap.c).

So it seems like any mmap-related solutions would have to be
conditional, too. And then regexec_buf() would have to become something
like:

  int regexec_buf(...)
  {
  #if defined(REG_STARTEND)
	... set up match ...
	return regexec(..., REG_STARTEND);
  #elif defined(MMAP_ALWAYS_HAS_NUL)
	/*
	 * We assume that every buffer we see is always NUL-terminated
	 * eventually, either because it comes from xmallocz() or our
	 * mmap layer always ensures an extra NUL.
	 */
	 return regexec(...);
  #else
  #error "Nope, you need either NO_REGEX or USE_MMAP_NUL"
  #endif
  }

The assumption in the middle case feels pretty hacky, though. It fails
if we get a buffer from somewhere besides those two sources. It fails if
somebody calls regexec_buf() on a subset of a string.

It also doesn't handle matching past embedded NULs in the string. That's
not something we're relying on yet, but it would be nice to support
consistently in the long run.

If there's a compelling reason, it might be worth making that tradeoff.
But I am not sure what the compelling reason is to use musl's regex
(aside from the obvious of "less code in the resulting executable").

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 15:27 ` Jeff King
@ 2016-10-04 15:40   ` Rich Felker
  2016-10-04 16:08     ` Johannes Schindelin
  2016-10-04 16:01   ` Johannes Schindelin
  1 sibling, 1 reply; 28+ messages in thread
From: Rich Felker @ 2016-10-04 15:40 UTC (permalink / raw)
  To: Jeff King; +Cc: git, musl

On Tue, Oct 04, 2016 at 11:27:22AM -0400, Jeff King wrote:
> On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:
> 
> > This commit broke support for using git with musl libc:
> > 
> > https://github.com/git/git/commit/2f8952250a84313b74f96abb7b035874854cf202
> 
> Yep. The idea is that you would compile git with NO_REGEX=1, and it
> would use the included compat routines.

This is really obnoxious to have to do manually. Why can't it simply
be auto-detected based on non-definition of REG_STARDEND?

> Is there something in particular you want to get out of using musl's
> regex that is not supported in the compat library?

It's always nice not to link extra implementations of the same thing.
This comes up all the time with gratuitous gnulib replacement
functions too.

> > Rather than depending on non-portable GNU regex extensions, there is a
> > simple portable fix for the issue this code was added to work around:
> > When a text file is being mmapped for use with string functions which
> > depend on null termination, if the file size:
> > 
> > 1. is nonzero mod page size, it just works; the remainder of the last
> >    page reads as zero bytes when mmapped.
> 
> Is that a portable assumption?

Yes. Per POSIX:

"The system shall always zero-fill any partial page at the end of an
object. Further, the system shall never write out any modified
portions of the last page of an object which are beyond its end.
References within the address range starting at pa and continuing for
len bytes to whole pages following the end of an object shall result
in delivery of a SIGBUS signal."

Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html

The same or similar text appears at least back to SUSv2; I did not
look further back.

> > 2. if an exact multiple of the page size, then instead of directly
> >    mmapping the file, first mmap a mapping 1 byte (thus 1 page) larger
> >    with MAP_ANON, then use MAP_FIXED to map the file over top of all
> >    but the last page. Now the mmapped buffer can safely be used as a C
> >    string.
> 
> I'm not sure whether all of our compat layers for mmap would be happy
> with that (e.g., see compat/win32mmap.c).

The nice thing about this approach is that if the MAP_FIXED fails due
to a broken system you can just read() into the anonymous memory.

> So it seems like any mmap-related solutions would have to be
> conditional, too. And then regexec_buf() would have to become something
> like:
> 
>   int regexec_buf(...)
>   {
>   #if defined(REG_STARTEND)
> 	... set up match ...
> 	return regexec(..., REG_STARTEND);
>   #elif defined(MMAP_ALWAYS_HAS_NUL)
> 	/*
> 	 * We assume that every buffer we see is always NUL-terminated
> 	 * eventually, either because it comes from xmallocz() or our
> 	 * mmap layer always ensures an extra NUL.
> 	 */
> 	 return regexec(...);
>   #else
>   #error "Nope, you need either NO_REGEX or USE_MMAP_NUL"
>   #endif
>   }
> 
> The assumption in the middle case feels pretty hacky, though. It fails

I agree. I would prefer just falling back to read() after MAP_ANON if
the MAP_FIXED fails. This would work for all systems.

> if we get a buffer from somewhere besides those two sources. It fails if
> somebody calls regexec_buf() on a subset of a string.
> 
> It also doesn't handle matching past embedded NULs in the string. That's
> not something we're relying on yet, but it would be nice to support
> consistently in the long run.

This is going to be even more non-portable and
implementation-specific. There's not even a good spec for how embedded
nuls should be treated in regex.

> If there's a compelling reason, it might be worth making that tradeoff.
> But I am not sure what the compelling reason is to use musl's regex
> (aside from the obvious of "less code in the resulting executable").

The compelling reason is just being portable by default rather than
requiring manual overrides to use a replacement library with weird
extensions for something that could have been done portably to begin
with.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 15:27 ` Jeff King
  2016-10-04 15:40   ` Rich Felker
@ 2016-10-04 16:01   ` Johannes Schindelin
  1 sibling, 0 replies; 28+ messages in thread
From: Johannes Schindelin @ 2016-10-04 16:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Rich Felker, git, musl

Hi,

On Tue, 4 Oct 2016, Jeff King wrote:

> On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:
> 
> > 1. is nonzero mod page size, it just works; the remainder of the last
> >    page reads as zero bytes when mmapped.
> 
> Is that a portable assumption?

No.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 15:40   ` Rich Felker
@ 2016-10-04 16:08     ` Johannes Schindelin
  2016-10-04 16:11       ` Rich Felker
                         ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Johannes Schindelin @ 2016-10-04 16:08 UTC (permalink / raw)
  To: Rich Felker; +Cc: Jeff King, git, musl

Hi Rich,

On Tue, 4 Oct 2016, Rich Felker wrote:

> On Tue, Oct 04, 2016 at 11:27:22AM -0400, Jeff King wrote:
> > On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:
> > 
> > > 1. is nonzero mod page size, it just works; the remainder of the last
> > >    page reads as zero bytes when mmapped.
> > 
> > Is that a portable assumption?
> 
> Yes.

No, it is not. You quote POSIX, but the matter of the fact is that we use
a subset of POSIX in order to be able to keep things running on Windows.

And quite honestly, there are lots of reasons to keep things running on
Windows, and even to favor Windows support over musl support. Over four
million reasons: the Git for Windows users.

So rather than getting into an ideological discussion about "broken"
systems, it would be good to keep things practical, realizing that those
users make up a very real chunk of all of Git's users.

As to making NO_REGEX conditional on REG_STARTEND: you are talking about
apples and oranges here. NO_REGEX is a Makefile flag, while REG_STARTEND
is a C preprocessor macro.

Unless you can convince the rest of the Git developers (you would not
convince me) to simulate autoconf by compiling an executable every time
`make` is run, to determine whether REG_STARTEND is defined, this is a
no-go.

However, you *can* use autoconf directly, and come up with a patch to our
configure.ac that detects the absence of REG_STARTEND and sets NO_REGEX=1.

Alternatively, you can set NO_REGEX=1 in your config.mak.

Or, if you use one of the auto-detected cases in config.mak.uname, you
could patch it to set NO_REGEX=1.

And lastly, the best alternative would be to teach musl about
REG_STARTEND, as it is rather useful a feature.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 16:08     ` Johannes Schindelin
@ 2016-10-04 16:11       ` Rich Felker
  2016-10-04 17:16         ` Johannes Schindelin
  2016-10-04 17:39       ` [musl] " Rich Felker
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2016-10-04 16:11 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, git, musl

On Tue, Oct 04, 2016 at 06:08:33PM +0200, Johannes Schindelin wrote:
> Hi Rich,
> 
> On Tue, 4 Oct 2016, Rich Felker wrote:
> 
> > On Tue, Oct 04, 2016 at 11:27:22AM -0400, Jeff King wrote:
> > > On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:
> > > 
> > > > 1. is nonzero mod page size, it just works; the remainder of the last
> > > >    page reads as zero bytes when mmapped.
> > > 
> > > Is that a portable assumption?
> > 
> > Yes.
> 
> No, it is not. You quote POSIX, but the matter of the fact is that we use
> a subset of POSIX in order to be able to keep things running on Windows.
> 
> And quite honestly, there are lots of reasons to keep things running on
> Windows, and even to favor Windows support over musl support. Over four
> million reasons: the Git for Windows users.

I would hope that in the future, git-for-windows users will be using
musl, via midipix, rather than the painfully slow and awful version
they're stuck with now...

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 16:11       ` Rich Felker
@ 2016-10-04 17:16         ` Johannes Schindelin
  2016-10-04 18:00           ` Ray Donnelly
  0 siblings, 1 reply; 28+ messages in thread
From: Johannes Schindelin @ 2016-10-04 17:16 UTC (permalink / raw)
  To: Rich Felker; +Cc: Jeff King, git, musl

Hi Rich,

On Tue, 4 Oct 2016, Rich Felker wrote:

> On Tue, Oct 04, 2016 at 06:08:33PM +0200, Johannes Schindelin wrote:
> > Hi Rich,
> > 
> > On Tue, 4 Oct 2016, Rich Felker wrote:
> > 
> > > On Tue, Oct 04, 2016 at 11:27:22AM -0400, Jeff King wrote:
> > > > On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:
> > > > 
> > > > > 1. is nonzero mod page size, it just works; the remainder of the last
> > > > >    page reads as zero bytes when mmapped.
> > > > 
> > > > Is that a portable assumption?
> > > 
> > > Yes.
> > 
> > No, it is not. You quote POSIX, but the matter of the fact is that we use
> > a subset of POSIX in order to be able to keep things running on Windows.
> > 
> > And quite honestly, there are lots of reasons to keep things running on
> > Windows, and even to favor Windows support over musl support. Over four
> > million reasons: the Git for Windows users.
> 
> I would hope that in the future, git-for-windows users will be using
> musl, via midipix, rather than the painfully slow and awful version
> they're stuck with now...

Git for Windows actually uses the MSVC runtime, which is blazing fast.

You are probably confusing Git for Windows with Cygwin Git.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 16:08     ` Johannes Schindelin
  2016-10-04 16:11       ` Rich Felker
@ 2016-10-04 17:39       ` Rich Felker
  2016-10-05 11:17         ` Johannes Schindelin
  2016-10-04 22:06       ` James B
  2016-10-06 19:18       ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2016-10-04 17:39 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, git, musl

On Tue, Oct 04, 2016 at 06:08:33PM +0200, Johannes Schindelin wrote:
> Hi Rich,
> 
> On Tue, 4 Oct 2016, Rich Felker wrote:
> 
> > On Tue, Oct 04, 2016 at 11:27:22AM -0400, Jeff King wrote:
> > > On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:
> > > 
> > > > 1. is nonzero mod page size, it just works; the remainder of the last
> > > >    page reads as zero bytes when mmapped.
> > > 
> > > Is that a portable assumption?
> > 
> > Yes.
> 
> No, it is not. You quote POSIX, but the matter of the fact is that we use
> a subset of POSIX in order to be able to keep things running on Windows.
> 
> And quite honestly, there are lots of reasons to keep things running on
> Windows, and even to favor Windows support over musl support. Over four
> million reasons: the Git for Windows users.
> 
> So rather than getting into an ideological discussion about "broken"
> systems, it would be good to keep things practical, realizing that those
> users make up a very real chunk of all of Git's users.
> 
> As to making NO_REGEX conditional on REG_STARTEND: you are talking about
> apples and oranges here. NO_REGEX is a Makefile flag, while REG_STARTEND
> is a C preprocessor macro.

It seems like you could just always compile the source file, and just
have it all inside #if defined(NO_REGEX) || !defined(REG_STARTEND) or
similar.

> And lastly, the best alternative would be to teach musl about
> REG_STARTEND, as it is rather useful a feature.

Maybe, but it seems fundamentally costly to support -- it's extra
state in the inner loops that imposes costly spill/reload on archs
with too few registers (x86). I'll look at doing this when we
overhaul/replace the regex implementation, and I'm happy to do some
performance-regression tests for adding it now if someone has a simple
patch (as was mentioned on the musl list).

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 17:16         ` Johannes Schindelin
@ 2016-10-04 18:00           ` Ray Donnelly
  0 siblings, 0 replies; 28+ messages in thread
From: Ray Donnelly @ 2016-10-04 18:00 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Rich Felker, Jeff King, git, musl

On Tue, Oct 4, 2016 at 6:16 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Rich,
>
> On Tue, 4 Oct 2016, Rich Felker wrote:
>
>> On Tue, Oct 04, 2016 at 06:08:33PM +0200, Johannes Schindelin wrote:
>> > Hi Rich,
>> >
>> > On Tue, 4 Oct 2016, Rich Felker wrote:
>> >
>> > > On Tue, Oct 04, 2016 at 11:27:22AM -0400, Jeff King wrote:
>> > > > On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:
>> > > >
>> > > > > 1. is nonzero mod page size, it just works; the remainder of the last
>> > > > >    page reads as zero bytes when mmapped.
>> > > >
>> > > > Is that a portable assumption?
>> > >
>> > > Yes.
>> >
>> > No, it is not. You quote POSIX, but the matter of the fact is that we use
>> > a subset of POSIX in order to be able to keep things running on Windows.
>> >
>> > And quite honestly, there are lots of reasons to keep things running on
>> > Windows, and even to favor Windows support over musl support. Over four
>> > million reasons: the Git for Windows users.
>>
>> I would hope that in the future, git-for-windows users will be using
>> musl, via midipix, rather than the painfully slow and awful version
>> they're stuck with now...
>
> Git for Windows actually uses the MSVC runtime, which is blazing fast.
>
> You are probably confusing Git for Windows with Cygwin Git.

To be fair, Cygwin Git isn't *that* slow, though I look forward to the
day when MSYS2 can use the native-Windows/GfW version instead
(including your rebase-in-C changes)

>
> Ciao,
> Johannes

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 16:08     ` Johannes Schindelin
  2016-10-04 16:11       ` Rich Felker
  2016-10-04 17:39       ` [musl] " Rich Felker
@ 2016-10-04 22:06       ` James B
  2016-10-04 22:33         ` Rich Felker
  2016-10-05 10:41         ` Johannes Schindelin
  2016-10-06 19:18       ` Ævar Arnfjörð Bjarmason
  3 siblings, 2 replies; 28+ messages in thread
From: James B @ 2016-10-04 22:06 UTC (permalink / raw)
  To: musl; +Cc: Johannes Schindelin, Rich Felker, Jeff King, git

On Tue, 4 Oct 2016 18:08:33 +0200 (CEST)
Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:

> 
> No, it is not. You quote POSIX, but the matter of the fact is that we use
> a subset of POSIX in order to be able to keep things running on Windows.
> 
> And quite honestly, there are lots of reasons to keep things running on
> Windows, and even to favor Windows support over musl support. Over four
> million reasons: the Git for Windows users.
> 

Wow, I don't know that Windows is a git's first-tier platform now, and Linux/POSIX second. Are we talking about the same git that was originally written in Linus Torvalds, and is used to manage Linux kernel? Are you by any chance employed by Redmond, directly or indirectly?

Sorry - can't help it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 22:06       ` James B
@ 2016-10-04 22:33         ` Rich Felker
  2016-10-04 22:48           ` Junio C Hamano
  2016-10-05 13:11           ` Jakub Narębski
  2016-10-05 10:41         ` Johannes Schindelin
  1 sibling, 2 replies; 28+ messages in thread
From: Rich Felker @ 2016-10-04 22:33 UTC (permalink / raw)
  To: James B; +Cc: musl, Johannes Schindelin, Jeff King, git

On Wed, Oct 05, 2016 at 09:06:25AM +1100, James B wrote:
> On Tue, 4 Oct 2016 18:08:33 +0200 (CEST)
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> 
> > 
> > No, it is not. You quote POSIX, but the matter of the fact is that we use
> > a subset of POSIX in order to be able to keep things running on Windows.
> > 
> > And quite honestly, there are lots of reasons to keep things running on
> > Windows, and even to favor Windows support over musl support. Over four
> > million reasons: the Git for Windows users.
> > 
> 
> Wow, I don't know that Windows is a git's first-tier platform now,
> and Linux/POSIX second. Are we talking about the same git that was
> originally written in Linus Torvalds, and is used to manage Linux
> kernel? Are you by any chance employed by Redmond, directly or
> indirectly?
> 
> Sorry - can't help it.

I don't think the hostility and sarcasm are really needed here. But
what this does speak to is that users don't like feeling like their
platform is being treated as a second-class target, which is what it
feels like when you have to manually flip a switch to make git build.
This is especially unfriendly when the semantics of the switch come
across, at least to some users, as "your system regex is incomplete"
rather than "git can't use it because git depends on nonstandard
extensions".

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 22:33         ` Rich Felker
@ 2016-10-04 22:48           ` Junio C Hamano
  2016-10-05 13:11           ` Jakub Narębski
  1 sibling, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2016-10-04 22:48 UTC (permalink / raw)
  To: Rich Felker; +Cc: James B, musl, Johannes Schindelin, Jeff King, git

Rich Felker <dalias@libc.org> writes:

> This is especially unfriendly when the semantics of the switch come
> across, at least to some users, as "your system regex is incomplete"
> rather than "git can't use it because git depends on nonstandard
> extensions".

The latter is exactly what Makefile patch that brought this change
says, I think.

    # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
    # feature.

Before the series updated the message to the above, we used to say:

    # Define NO_REGEX if you have no or inferior regex support in your C library.

which _was_ unfair to those who needed to set NO_REGEX for whatever
reason.  It was totally unclear "inferior" relative to what standard
the message was passing its harsh judgement on your C library.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 22:06       ` James B
  2016-10-04 22:33         ` Rich Felker
@ 2016-10-05 10:41         ` Johannes Schindelin
  2016-10-05 11:59           ` James B
  1 sibling, 1 reply; 28+ messages in thread
From: Johannes Schindelin @ 2016-10-05 10:41 UTC (permalink / raw)
  To: James B; +Cc: musl, Rich Felker, Jeff King, git

Hi James,

On Wed, 5 Oct 2016, James B wrote:

> On Tue, 4 Oct 2016 18:08:33 +0200 (CEST)
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> 
> > No, it is not. You quote POSIX, but the matter of the fact is that we
> > use a subset of POSIX in order to be able to keep things running on
> > Windows.
> > 
> > And quite honestly, there are lots of reasons to keep things running
> > on Windows, and even to favor Windows support over musl support. Over
> > four million reasons: the Git for Windows users.
> 
> Wow, I don't know that Windows is a git's first-tier platform now,

It is. Git for Windows is maintained by me, and I make as certain as I can
that it works fine. And yes, we have download numbers to support my claim.
The latest release is less than 24h old, but I can point you to Git for
Windows 2.8.1 whose 32-bit installer was downloaded 397,273 times, and
whose 64-bit installer was downloaded 3,780,079 times.

> and Linux/POSIX second.

This is not at all what I said, so please be careful of what you accuse
me.

What I said is that we never exploited the full POSIX standard, but that
we made certain to use a subset of POSIX in Git which would be relatively
easy to emulate using Windows' API.

> Are we talking about the same git that was originally written in Linus
> Torvalds, and is used to manage Linux kernel?

It was originally written by (not in) Linus Torvalds, and yes, the Linux
kernel is one of its many users.

Git is not used only for the Linux kernel, though, and I am certain that
Linus agrees that it should not cater only to the Linux folks. Git is used
very widely in OSS as well as in the industry. So we, the Git developers,
kind of have an obligation to make things work in a much broader
perspective than you suggested.

> Are you by any chance employed by Redmond, directly or indirectly?

I am not exactly employed by Redmond, but by Microsoft (this is what you
meant, I guess).

I maintained Git for Windows in my spare time, next to a very demanding
position in science, for eight years. In 2015, I joined Microsoft and part
of my role is to maintain Git for Windows, allowing me to do a much better
job at it.

Of course, I do not only improve Git's Windows support, but contribute
other patches, too. You might also appreciate the fact that some of my
colleagues started contributing patches to Git that benefit all Git users.

> Sorry - can't help it.

I do not know why you are sorry, and I do not believe that you have to be.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 17:39       ` [musl] " Rich Felker
@ 2016-10-05 11:17         ` Johannes Schindelin
  2016-10-05 13:01           ` Szabolcs Nagy
  2016-10-05 13:15           ` Rich Felker
  0 siblings, 2 replies; 28+ messages in thread
From: Johannes Schindelin @ 2016-10-05 11:17 UTC (permalink / raw)
  To: Rich Felker; +Cc: Jeff King, git, musl

Hi Rich,

On Tue, 4 Oct 2016, Rich Felker wrote:

> On Tue, Oct 04, 2016 at 06:08:33PM +0200, Johannes Schindelin wrote:
>
> > And lastly, the best alternative would be to teach musl about
> > REG_STARTEND, as it is rather useful a feature.
> 
> Maybe, but it seems fundamentally costly to support -- it's extra
> state in the inner loops that imposes costly spill/reload on archs
> with too few registers (x86).

It is true that it could cause that.

I had a brief look at the source code (you use backtracking... hopefully
nobody uses musl to parse regular expressions from untrusted, or
inexperienced, sources [*1*]), and it seems that the regex code might
spill unnecessarily already (I see, for example, that the reg_notbol,
reg_noteol and reg_newline flags all use up complete int registers, not
merely bits of a single one).

It seems, specifically, that the *match_end_ofs parameter of the two
regexec backends is always set to point to eo, which is so far not
initialized. You could initialize it to -1 and set it to pmatch[0].rm_eo
if the REG_STARTEND flag is set. The GET_NEXT_WCHAR() macro would then
need to test something like

	if (str_byte >= string + *match_end_ofs) {
		ret = REG_NOMATCH; goto error_exit;
	}

This does not handle non-zero pmatch[0].rm_so, though. I would probably
try to pass another input parameter for that, but I have not verified yet
that a "^" would be handled properly (if pmatch[0].rm_so > 0 and
REG_STARTEND is set, "^" should *not* match).

> I'll look at doing this when we overhaul/replace the regex
> implementation, and I'm happy to do some performance-regression tests
> for adding it now if someone has a simple patch (as was mentioned on the
> musl list).

I'd be interested to be kept in the loop, if you do not mind Cc:ing me.

Ciao,
Johannes

Footnote *1*:
http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 10:41         ` Johannes Schindelin
@ 2016-10-05 11:59           ` James B
  2016-10-05 16:11             ` Jeff King
  2016-10-06 10:44             ` Johannes Schindelin
  0 siblings, 2 replies; 28+ messages in thread
From: James B @ 2016-10-05 11:59 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: musl, Rich Felker, Jeff King, git

On Wed, 5 Oct 2016 12:41:50 +0200 (CEST)
Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:

> > 
> > Wow, I don't know that Windows is a git's first-tier platform now,
> 
> It is. Git for Windows is maintained by me, and I make as certain as I can
> that it works fine. 
> And yes, we have download numbers to support my claim.
> The latest release is less than 24h old, but I can point you to Git for
> Windows 2.8.1 whose 32-bit installer was downloaded 397,273 times, and
> whose 64-bit installer was downloaded 3,780,079 times.

Number downloads does not make first-tier platform. You know that as well as everyone else.

First-tier support is the decision made by the maintainers that the entire features of the software must be available on those first tier platforms. So if Windows is indeed first-tier platform for git, it means any features that don't work on git version of Windows must not be used/developed or even castrated. That's a scary thought.

Being a maintainer for "Git for Windows" does not make one automatically as the maintainer for "git", although that can happen.

So this decision that "Windows is now a first-tier platform for git" - is your own opinion, or is this the collective opinion of *all* the git maintainers? 


> 
> > and Linux/POSIX second.
> 
> This is not at all what I said, so please be careful of what you accuse
> me.

Yes, you did not say that. I said that. And I will say more. Git has Linux/POSIX roots. Attempting to "not use common POSIX features because they're not available on Windows" *does* make Linux/POSIX feels like second class platform for git. The way I see it, it should be *the other way around*.

It's a very sad day for a tool that was developed originally to maintain Linux kernel, by the Linux kernel author, now is restricted to avoid use/optimise on Linux/POSIX features *because* it has to run on another Windows ...

> 
> What I said is that we never exploited the full POSIX standard, but that
> we made certain to use a subset of POSIX in Git which would be relatively
> easy to emulate using Windows' API.

All this just proves my point above.

And - I notice you use the pronoun "we" - is that a "royal we" (which means the entire point is your own or your cohorts position), or is it the official position of all git maintainers?

> 
> > Are we talking about the same git that was originally written in Linus
> > Torvalds, and is used to manage Linux kernel?
> 
> It was originally written by (not in) Linus Torvalds, and yes, the Linux
> kernel is one of its many users.

That was a rhetorical question.

> 
> > Are you by any chance employed by Redmond, directly or indirectly?
> 
> I am not exactly employed by Redmond, but by Microsoft (this is what you
> meant, I guess).
> 
> I maintained Git for Windows in my spare time, next to a very demanding
> position in science, for eight years. In 2015, I joined Microsoft and part
> of my role is to maintain Git for Windows, allowing me to do a much better
> job at it.


Well thank you for being honest. I can see now why you responded the way you did (and still do). By being employed by Microsoft, and especially paid to work on Git for Windows, you have all the incentives to make it work best on Windows, and to make it as its first-tier platform within the limitation of Windows.

That in itself is not a problem - it only starts to become a problem when you try to cut down support for other platforms or stifle improvements on other platforms because "hey it makes it too hard to do those things in Windows".

> 
> Of course, I do not only improve Git's Windows support, but contribute
> other patches, too. You might also appreciate the fact that some of my
> colleagues started contributing patches to Git that benefit all Git users.
> 

By "colleagues" I assume other Microsoft employees? 
I don't have a problem with that - thank you and your colleagues for making git better.

But that does not give the right to you to control that "if it doesn't work on Windows we shouldn't do it". If you do, and at the same time claim that musl-libc (which is Linux-only) support is unimportant compared to your 4 million Git-for-Windows downloads really don't do well for your or your employer's image. I don't have to say why - everyone outside Microsoft knows why.

In conclusion, I certainly hope that your view is not shared by the other git maintainers.

PS: Rich, sorry for the distraction. I have said what I want to say, so I'll bow out from this thread.

cheers,
James

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 11:17         ` Johannes Schindelin
@ 2016-10-05 13:01           ` Szabolcs Nagy
  2016-10-05 13:15           ` Rich Felker
  1 sibling, 0 replies; 28+ messages in thread
From: Szabolcs Nagy @ 2016-10-05 13:01 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Rich Felker, Jeff King, git, musl

* Johannes Schindelin <Johannes.Schindelin@gmx.de> [2016-10-05 13:17:49 +0200]:
> I had a brief look at the source code (you use backtracking... hopefully
> nobody uses musl to parse regular expressions from untrusted, or
> inexperienced, sources [*1*]), and it seems that the regex code might

does git use BRE?

a conforming BRE implementation has to use back tracking
if the pattern has back references.

usually ERE implementations may also use back tracking
since they support back references as an extension.

musl does not support this extension (and many others) so
it never uses back tracking for ERE matches, note however
that match complexity and memory usage of a conforming
ERE implementation is still exponential in pattern
length because of repetition counts.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 22:33         ` Rich Felker
  2016-10-04 22:48           ` Junio C Hamano
@ 2016-10-05 13:11           ` Jakub Narębski
  2016-10-05 16:15             ` [musl] " Rich Felker
  1 sibling, 1 reply; 28+ messages in thread
From: Jakub Narębski @ 2016-10-05 13:11 UTC (permalink / raw)
  To: musl, James B; +Cc: Johannes Schindelin, Jeff King, git

W dniu 05.10.2016 o 00:33, Rich Felker pisze:
> On Wed, Oct 05, 2016 at 09:06:25AM +1100, James B wrote:
>> On Tue, 4 Oct 2016 18:08:33 +0200 (CEST)
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>>>
>>> No, it is not. You quote POSIX, but the matter of the fact is that we use
>>> a subset of POSIX in order to be able to keep things running on Windows.
>>>
>>> And quite honestly, there are lots of reasons to keep things running on
>>> Windows, and even to favor Windows support over musl support. Over four
>>> million reasons: the Git for Windows users.
>>
>> Wow, I don't know that Windows is a git's first-tier platform now,
>> and Linux/POSIX second. Are we talking about the same git that was
>> originally written in Linus Torvalds, and is used to manage Linux
>> kernel? Are you by any chance employed by Redmond, directly or
>> indirectly?
>>
>> Sorry - can't help it.

Windows is one of the major platforms, yes.  I think there much, much
more people using Git on Windows, than using Git with musl.  More
users = more important.

Also, working with some inconvenience (requiring compilation with
NO_REGEX=1) is better than not working at all.

In CodingGuidelines we say:

 - Most importantly, we never say "It's in POSIX; we'll happily
   ignore your needs should your system not conform to it."
   We live in the real world.

 - However, we often say "Let's stay away from that construct,
   it's not even in POSIX".

 - In spite of the above two rules, we sometimes say "Although
   this is not in POSIX, it (is so convenient | makes the code
   much more readable | has other good characteristics) and
   practically all the platforms we care about support it, so
   let's use it".

The REG_STARTEND is 3rd point, mmap shenningans looks like 1st...

...on the other hand midipix <writeonce@midipix.org> wrote in
http://public-inbox.org/git/20161004200057.dc30d64f61e5ec441c34ffd4f788e58e.efa66ead67.wbe@email15.godaddy.com/
that the proposed fix should work on all Windows version we are
interested in (I think).  Test program included / attached.

The above-mentioned email also explains that the problem was
caught on MS Windows; it triggers if file end falls on the mmapped
page boundary, which is more likely to happen with 4096 mod size
on Windows rather than 65536 mod size on Linux.


On the other hand, while the proposed solution of "add padding as
to not end at page boundary, if necessary" doesn't have the
performance impact of "memcpy into NUL-terminated buffer" that
was originally proposed in patch series, it is still extra code
to maintain.

> 
> I don't think the hostility and sarcasm are really needed here. But
> what this does speak to is that users don't like feeling like their
> platform is being treated as a second-class target, which is what it
> feels like when you have to manually flip a switch to make git build.

You are welcome to send a patch adding to configure.ac detection
of REG_STARTEND support in standard library - setting NO_REGEX if
needed, and/or adding to Makefile uname-based defaults setting
NO_REGEX for compiling with musl.

> This is especially unfriendly when the semantics of the switch come
> across, at least to some users, as "your system regex is incomplete"
> rather than "git can't use it because git depends on nonstandard
> extensions".

Nonstandard but common extension.  As 2f8952250a commit message says
https://github.com/git/git/commit/2f8952250a84313b74f96abb7b035874854cf202

  Happily, there is an extension to regexec() introduced by the NetBSD
  project and present in all major regex implementation including
  Linux', MacOSX' and the one Git includes in compat/regex/: [...]

  [...]

  Since support for REG_STARTEND is so widespread by now, let's just
  introduce a helper function that always uses it, and tell people
  on a platform whose regex library does not support it to use the
  one from our compat/regex/ directory.

Also, as Junio said, the description of NO_REGEX option in the
Makefile now explicitly says:

    # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
    # feature.

Best,
-- 
Jakub Narębski


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 11:17         ` Johannes Schindelin
  2016-10-05 13:01           ` Szabolcs Nagy
@ 2016-10-05 13:15           ` Rich Felker
  1 sibling, 0 replies; 28+ messages in thread
From: Rich Felker @ 2016-10-05 13:15 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, git, musl

On Wed, Oct 05, 2016 at 01:17:49PM +0200, Johannes Schindelin wrote:
> Hi Rich,
> 
> On Tue, 4 Oct 2016, Rich Felker wrote:
> 
> > On Tue, Oct 04, 2016 at 06:08:33PM +0200, Johannes Schindelin wrote:
> >
> > > And lastly, the best alternative would be to teach musl about
> > > REG_STARTEND, as it is rather useful a feature.
> > 
> > Maybe, but it seems fundamentally costly to support -- it's extra
> > state in the inner loops that imposes costly spill/reload on archs
> > with too few registers (x86).
> 
> It is true that it could cause that.
> 
> I had a brief look at the source code (you use backtracking...

Where did you get that idea? Backtracking is the most utterly
incompetent way to implement regex -- it throws away the whole
property that makes regex useful, being regular. Unfortunately, POSIX
BRE is not regular, as it contains backreferences, so any
implementation of regcomp/regexec requires at least a minimal
backtracking code path for BREs that contain backreferences.

> hopefully
> nobody uses musl to parse regular expressions from untrusted, or

On the contrary, musl's is the only system reccomp/regexec I'm aware
of that actually attempts to be safe with untrusted input -- when
using REG_EXTENDED (ERE). Other implementations provide backreferences
in ERE as an extension, making ERE unsafe just like BRE. musl
intentionally disallows them as a feature.

At least until recently, glibc also crashed on malloc failures in
regcomp, making it unsafe on untrusted input for that reason too.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 11:59           ` James B
@ 2016-10-05 16:11             ` Jeff King
  2016-10-05 16:27               ` Rich Felker
  2016-10-06 10:44             ` Johannes Schindelin
  1 sibling, 1 reply; 28+ messages in thread
From: Jeff King @ 2016-10-05 16:11 UTC (permalink / raw)
  To: James B; +Cc: Johannes Schindelin, musl, Rich Felker, git

On Wed, Oct 05, 2016 at 10:59:34PM +1100, James B wrote:

> Number downloads does not make first-tier platform. You know that as
> well as everyone else.
> 
> First-tier support is the decision made by the maintainers that the
> entire features of the software must be available on those first tier
> platforms. So if Windows is indeed first-tier platform for git, it
> means any features that don't work on git version of Windows must not
> be used/developed or even castrated. That's a scary thought.

Prepare to be scared, then, I guess. Ever since the msysgit project
started years ago, we have made concessions in the code to work both
with POSIX-ish systems and with the msys layer. E.g., see how git-daemon
does not fork(), but actually re-spawns itself to handle connections.

When possible we try to put our abstractions at a level where they can
be implemented in a performant way on all platforms (the git-daemon
things is probably the _most_ ugly in that respect; I think nobody has
really cared about the performance enough to add back in a forking code
path for POSIX systems).

> So this decision that "Windows is now a first-tier platform for git" -
> is your own opinion, or is this the collective opinion of *all* the
> git maintainers?

There is only one maintainer of git: Junio. However, you'll note that I
also used "we" in the paragraphs above. And that is because the approach
I am talking about is something that has been done over the course of
many years by many members of the development community.

You may disagree with that approach, but it is nothing new. The msysgit
project started in 2007.

> Well thank you for being honest. I can see now why you responded the
> way you did (and still do). By being employed by Microsoft, and
> especially paid to work on Git for Windows, you have all the
> incentives to make it work best on Windows, and to make it as its
> first-tier platform within the limitation of Windows.

Please don't insinuate that Johannes is a Microsoft shill. He has been
working on the Windows port of Git for over 9 years, and was only
employed by Microsoft this year. Furthermore, his original REG_STARTEND
patch actually did a run-time fallback of NUL-terminating the input
buffers. It was _I_ who suggested that we should simply push people
towards our compat/regex routines instead. So if you want to be mad at
somebody, be mad at me.

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 13:11           ` Jakub Narębski
@ 2016-10-05 16:15             ` Rich Felker
  0 siblings, 0 replies; 28+ messages in thread
From: Rich Felker @ 2016-10-05 16:15 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: musl, James B, Johannes Schindelin, Jeff King, git

On Wed, Oct 05, 2016 at 03:11:05PM +0200, Jakub Narębski wrote:
> W dniu 05.10.2016 o 00:33, Rich Felker pisze:
> > On Wed, Oct 05, 2016 at 09:06:25AM +1100, James B wrote:
> >> On Tue, 4 Oct 2016 18:08:33 +0200 (CEST)
> >> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> >>>
> >>> No, it is not. You quote POSIX, but the matter of the fact is that we use
> >>> a subset of POSIX in order to be able to keep things running on Windows.
> >>>
> >>> And quite honestly, there are lots of reasons to keep things running on
> >>> Windows, and even to favor Windows support over musl support. Over four
> >>> million reasons: the Git for Windows users.
> >>
> >> Wow, I don't know that Windows is a git's first-tier platform now,
> >> and Linux/POSIX second. Are we talking about the same git that was
> >> originally written in Linus Torvalds, and is used to manage Linux
> >> kernel? Are you by any chance employed by Redmond, directly or
> >> indirectly?
> >>
> >> Sorry - can't help it.
> 
> Windows is one of the major platforms, yes.  I think there much, much
> more people using Git on Windows, than using Git with musl.  More
> users = more important.
> 
> Also, working with some inconvenience (requiring compilation with
> NO_REGEX=1) is better than not working at all.
> 
> In CodingGuidelines we say:
> 
>  - Most importantly, we never say "It's in POSIX; we'll happily
>    ignore your needs should your system not conform to it."
>    We live in the real world.
> 
>  - However, we often say "Let's stay away from that construct,
>    it's not even in POSIX".

I agree wholeheartedly with these points.

> 
>  - In spite of the above two rules, we sometimes say "Although
>    this is not in POSIX, it (is so convenient | makes the code
>    much more readable | has other good characteristics) and
>    practically all the platforms we care about support it, so
>    let's use it".
> 
> The REG_STARTEND is 3rd point,

To begin with I wasn't clear that REG_STARDEND being nonstandard was
even noticed or compatibility considered when adding the dependency on
it, but it seems such discussion did take place and most targets have
it. Perhaps this means it should be proposed for standardization in
the next issue of POSIX.

> mmap shenningans looks like 1st...
> 
> ....on the other hand midipix <writeonce@midipix.org> wrote in
> http://public-inbox.org/git/20161004200057.dc30d64f61e5ec441c34ffd4f788e58e.efa66ead67.wbe@email15.godaddy.com/
> that the proposed fix should work on all Windows version we are
> interested in (I think).  Test program included / attached.
> 
> The above-mentioned email also explains that the problem was
> caught on MS Windows; it triggers if file end falls on the mmapped
> page boundary, which is more likely to happen with 4096 mod size
> on Windows rather than 65536 mod size on Linux.

On Linux page-size (mmap granularity) varies by arch but it's 4k on
basically all archs that people care about. I think midipix's author
was talking about real page size on Windows (4k) vs the minimum
logical page size (mmap granularity) that can be used to get
POSIX-matching semantics in midipix (which is 64k due to some
technical reasons I forget, which he could probably remind me of).

> On the other hand, while the proposed solution of "add padding as
> to not end at page boundary, if necessary" doesn't have the
> performance impact of "memcpy into NUL-terminated buffer" that
> was originally proposed in patch series, it is still extra code
> to maintain.

*nod*

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 16:11             ` Jeff King
@ 2016-10-05 16:27               ` Rich Felker
  0 siblings, 0 replies; 28+ messages in thread
From: Rich Felker @ 2016-10-05 16:27 UTC (permalink / raw)
  To: Jeff King; +Cc: James B, Johannes Schindelin, musl, git

On Wed, Oct 05, 2016 at 12:11:58PM -0400, Jeff King wrote:
> On Wed, Oct 05, 2016 at 10:59:34PM +1100, James B wrote:
> 
> > Number downloads does not make first-tier platform. You know that as
> > well as everyone else.
> > 
> > First-tier support is the decision made by the maintainers that the
> > entire features of the software must be available on those first tier
> > platforms. So if Windows is indeed first-tier platform for git, it
> > means any features that don't work on git version of Windows must not
> > be used/developed or even castrated. That's a scary thought.
> 
> Prepare to be scared, then, I guess. Ever since the msysgit project
> started years ago, we have made concessions in the code to work both
> with POSIX-ish systems and with the msys layer. E.g., see how git-daemon
> does not fork(), but actually re-spawns itself to handle connections.
> 
> When possible we try to put our abstractions at a level where they can
> be implemented in a performant way on all platforms (the git-daemon
> things is probably the _most_ ugly in that respect; I think nobody has
> really cared about the performance enough to add back in a forking code
> path for POSIX systems).
> 
> > So this decision that "Windows is now a first-tier platform for git" -
> > is your own opinion, or is this the collective opinion of *all* the
> > git maintainers?
> 
> There is only one maintainer of git: Junio. However, you'll note that I
> also used "we" in the paragraphs above. And that is because the approach
> I am talking about is something that has been done over the course of
> many years by many members of the development community.
> 
> You may disagree with that approach, but it is nothing new. The msysgit
> project started in 2007.

The goal of the midipix project is to make the need for FOSS projects
supporting Windows to do hacks like this obsolete. It still has a
little ways to go to be ready for mainstream use, but it's already
running a lot, and I hope you'll consider it for the future since it
simplifies things A LOT when you can just write to POSIX instead of
having to come up with abstraction layers that cater to Windows'
brokenness.

> > Well thank you for being honest. I can see now why you responded the
> > way you did (and still do). By being employed by Microsoft, and
> > especially paid to work on Git for Windows, you have all the
> > incentives to make it work best on Windows, and to make it as its
> > first-tier platform within the limitation of Windows.
> 
> Please don't insinuate that Johannes is a Microsoft shill. He has been
> working on the Windows port of Git for over 9 years, and was only
> employed by Microsoft this year. Furthermore, his original REG_STARTEND
> patch actually did a run-time fallback of NUL-terminating the input
> buffers. It was _I_ who suggested that we should simply push people
> towards our compat/regex routines instead. So if you want to be mad at
> somebody, be mad at me.

I hope we can get this thread away from accusing and attacking people
and on to doing productive things to make the software better.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 11:59           ` James B
  2016-10-05 16:11             ` Jeff King
@ 2016-10-06 10:44             ` Johannes Schindelin
  1 sibling, 0 replies; 28+ messages in thread
From: Johannes Schindelin @ 2016-10-06 10:44 UTC (permalink / raw)
  To: James B; +Cc: musl, Rich Felker, Jeff King, git

Hi James,

On Wed, 5 Oct 2016, James B wrote:

> On Wed, 5 Oct 2016 12:41:50 +0200 (CEST)
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> 
> It's a very sad day for a tool that was developed originally to maintain
> Linux kernel, by the Linux kernel author, now is restricted to avoid
> use/optimise on Linux/POSIX features *because* it has to run on another
> Windows ...

Please note that this thread started because Git tried to shed the
restrictions of POSIX by using REG_STARTEND. In other words, we tried very
much *not* to be restricted.

And pragmatically, I must add that the REG_STARTEND feature is a very,
very useful one.

If you want to turn your sadness into something productive (you will allow
me that little prod after all you said in your mail), why don't you
dig back through the Git mailing list, fish out my original patch that
fell back to the "malloc();memcpy();/*add NUL*/" strategy, and contribute
that as a patch to the Git project, making a case that musl requires it?

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 16:08     ` Johannes Schindelin
                         ` (2 preceding siblings ...)
  2016-10-04 22:06       ` James B
@ 2016-10-06 19:18       ` Ævar Arnfjörð Bjarmason
  2016-10-06 19:23         ` Jeff King
  2016-10-06 22:42         ` Ramsay Jones
  3 siblings, 2 replies; 28+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2016-10-06 19:18 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Rich Felker, Jeff King, Git, musl

On Tue, Oct 4, 2016 at 6:08 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> As to making NO_REGEX conditional on REG_STARTEND: you are talking about
> apples and oranges here. NO_REGEX is a Makefile flag, while REG_STARTEND
> is a C preprocessor macro.
>
> Unless you can convince the rest of the Git developers (you would not
> convince me) to simulate autoconf by compiling an executable every time
> `make` is run, to determine whether REG_STARTEND is defined, this is a
> no-go.

But just to clarify, does anyone have any objection to making our
configure.ac compile a C program to check for this sort of thing?
Because that seems like the easiest solution to this class of problem.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-06 19:18       ` Ævar Arnfjörð Bjarmason
@ 2016-10-06 19:23         ` Jeff King
  2016-10-06 19:25           ` Rich Felker
  2016-10-06 22:42         ` Ramsay Jones
  1 sibling, 1 reply; 28+ messages in thread
From: Jeff King @ 2016-10-06 19:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, Rich Felker, Git, musl

On Thu, Oct 06, 2016 at 09:18:29PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Oct 4, 2016 at 6:08 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > As to making NO_REGEX conditional on REG_STARTEND: you are talking about
> > apples and oranges here. NO_REGEX is a Makefile flag, while REG_STARTEND
> > is a C preprocessor macro.
> >
> > Unless you can convince the rest of the Git developers (you would not
> > convince me) to simulate autoconf by compiling an executable every time
> > `make` is run, to determine whether REG_STARTEND is defined, this is a
> > no-go.
> 
> But just to clarify, does anyone have any objection to making our
> configure.ac compile a C program to check for this sort of thing?
> Because that seems like the easiest solution to this class of problem.

No, I think that is the exact purpose of configure.ac and autoconf.

It would be neat if we could auto-fallback during the build. Rich
suggested always compiling compat/regex.c, and just having it be a noop
at the preprocessor level. I'm not sure if that would work, though,
because we'd have to include the system "regex.h" to know if we have
REG_STARTEND, at which point it is potentially too late to compile our
own regex routines (we're potentially going to conflict with the system
declarations).

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-06 19:23         ` Jeff King
@ 2016-10-06 19:25           ` Rich Felker
  2016-10-06 19:28             ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2016-10-06 19:25 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Johannes Schindelin, Git, musl

On Thu, Oct 06, 2016 at 03:23:40PM -0400, Jeff King wrote:
> On Thu, Oct 06, 2016 at 09:18:29PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > On Tue, Oct 4, 2016 at 6:08 PM, Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> > > As to making NO_REGEX conditional on REG_STARTEND: you are talking about
> > > apples and oranges here. NO_REGEX is a Makefile flag, while REG_STARTEND
> > > is a C preprocessor macro.
> > >
> > > Unless you can convince the rest of the Git developers (you would not
> > > convince me) to simulate autoconf by compiling an executable every time
> > > `make` is run, to determine whether REG_STARTEND is defined, this is a
> > > no-go.
> > 
> > But just to clarify, does anyone have any objection to making our
> > configure.ac compile a C program to check for this sort of thing?
> > Because that seems like the easiest solution to this class of problem.
> 
> No, I think that is the exact purpose of configure.ac and autoconf.
> 
> It would be neat if we could auto-fallback during the build. Rich
> suggested always compiling compat/regex.c, and just having it be a noop
> at the preprocessor level. I'm not sure if that would work, though,
> because we'd have to include the system "regex.h" to know if we have
> REG_STARTEND, at which point it is potentially too late to compile our
> own regex routines (we're potentially going to conflict with the system
> declarations).

If you have autoconf testing for REG_STARTEND at configure time then
compat/regex.c can #include "config.h" and test for HAVE_REG_STARTEND
rather than for REG_STARTEND, or something like that.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-06 19:25           ` Rich Felker
@ 2016-10-06 19:28             ` Jeff King
  0 siblings, 0 replies; 28+ messages in thread
From: Jeff King @ 2016-10-06 19:28 UTC (permalink / raw)
  To: Rich Felker
  Cc: Ævar Arnfjörð Bjarmason, Johannes Schindelin, Git, musl

On Thu, Oct 06, 2016 at 03:25:00PM -0400, Rich Felker wrote:

> > No, I think that is the exact purpose of configure.ac and autoconf.
> > 
> > It would be neat if we could auto-fallback during the build. Rich
> > suggested always compiling compat/regex.c, and just having it be a noop
> > at the preprocessor level. I'm not sure if that would work, though,
> > because we'd have to include the system "regex.h" to know if we have
> > REG_STARTEND, at which point it is potentially too late to compile our
> > own regex routines (we're potentially going to conflict with the system
> > declarations).
> 
> If you have autoconf testing for REG_STARTEND at configure time then
> compat/regex.c can #include "config.h" and test for HAVE_REG_STARTEND
> rather than for REG_STARTEND, or something like that.

Right, that part is easy; we do not even have to touch compat/regex.c,
because we already have such a knob in the Makefile (NO_REGEX), and
autoconf just needs to tweak that knob.

My question was whether we could do it without running a separate
compile (via autoconf or via the Makefile), and I think the answer is
"no".

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-06 19:18       ` Ævar Arnfjörð Bjarmason
  2016-10-06 19:23         ` Jeff King
@ 2016-10-06 22:42         ` Ramsay Jones
  2016-10-07 11:30           ` Jakub Narębski
  1 sibling, 1 reply; 28+ messages in thread
From: Ramsay Jones @ 2016-10-06 22:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Johannes Schindelin
  Cc: Rich Felker, Jeff King, Git, musl



On 06/10/16 20:18, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Oct 4, 2016 at 6:08 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>> As to making NO_REGEX conditional on REG_STARTEND: you are talking about
>> apples and oranges here. NO_REGEX is a Makefile flag, while REG_STARTEND
>> is a C preprocessor macro.
>>
>> Unless you can convince the rest of the Git developers (you would not
>> convince me) to simulate autoconf by compiling an executable every time
>> `make` is run, to determine whether REG_STARTEND is defined, this is a
>> no-go.
> 
> But just to clarify, does anyone have any objection to making our
> configure.ac compile a C program to check for this sort of thing?
> Because that seems like the easiest solution to this class of problem.

Err, you do know that we already do that, right?

[see commit a1e3b669 ("autoconf: don't use platform regex if it lacks REG_STARTEND", 17-08-2010)]

In fact, if you run the auto tools on cygwin, you get a different setting
for NO_REGEX than via config.mak.uname. Which is why I don't run configure
on cygwin. :-D

[The issue is exposed by t7008-grep-binary.sh, where the cygwin native
regex library matches '.' in a pattern with the NUL character. ie the
test_expect_failure test passes.]

ATB,
Ramsay Jones



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regression: git no longer works with musl libc's regex impl
  2016-10-06 22:42         ` Ramsay Jones
@ 2016-10-07 11:30           ` Jakub Narębski
  0 siblings, 0 replies; 28+ messages in thread
From: Jakub Narębski @ 2016-10-07 11:30 UTC (permalink / raw)
  To: Ramsay Jones, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, git
  Cc: Rich Felker, Jeff King, musl

W dniu 07.10.2016 o 00:42, Ramsay Jones pisze: 
> On 06/10/16 20:18, Ævar Arnfjörð Bjarmason wrote:
[...]
>> But just to clarify, does anyone have any objection to making our
>> configure.ac compile a C program to check for this sort of thing?
>> Because that seems like the easiest solution to this class of problem.
> 
> Err, you do know that we already do that, right?
> 
> [see commit a1e3b669 ("autoconf: don't use platform regex if it lacks REG_STARTEND", 17-08-2010)]
> 
> In fact, if you run the auto tools on cygwin, you get a different setting
> for NO_REGEX than via config.mak.uname. Which is why I don't run configure
> on cygwin. :-D
> 
> [The issue is exposed by t7008-grep-binary.sh, where the cygwin native
> regex library matches '.' in a pattern with the NUL character. ie the
> test_expect_failure test passes.]

Huh.  So we have NO_REGEX support in ./configure, and people using
Git on untypical architectures and systems *can* make use of it.

It was just described wrongly, so in turn to have the more neutral
description, the same as in Makefile, let's do this:

-------- >8 ---------- >8 ------------- >8 ---------- >8 ----------
Subject: [PATCH] configure.ac: Improve description of NO_REGEX test

The commit 2f8952250a changed description of NO_REGEX build config
variable to be more neutral, and actually say that it is about
support for REG_STARTEND.  Change description in configure.ac to
be the same.

Change also the test message and variable name to match.  The test
just checks that REG_STARTEND is #defined.

Issue-found-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Jakub Narębski <jnareb@gmail.com>
---
 configure.ac | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/configure.ac b/configure.ac
index aa9c91d..7f39fd0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -835,9 +835,10 @@ AC_CHECK_TYPE([struct addrinfo],[
 ])
 GIT_CONF_SUBST([NO_IPV6])
 #
-# Define NO_REGEX if you have no or inferior regex support in your C library.
-AC_CACHE_CHECK([whether the platform regex can handle null bytes],
- [ac_cv_c_excellent_regex], [
+# Define NO_REGEX if your C library lacks regex support with REG_STARTEND
+# feature.
+AC_CACHE_CHECK([whether the platform regex supports REG_STARTEND],
+ [ac_cv_c_regex_with_reg_startend], [
 AC_EGREP_CPP(yippeeyeswehaveit,
 	AC_LANG_PROGRAM([AC_INCLUDES_DEFAULT
 #include <regex.h>
@@ -846,10 +847,10 @@ AC_EGREP_CPP(yippeeyeswehaveit,
 yippeeyeswehaveit
 #endif
 ]),
-	[ac_cv_c_excellent_regex=yes],
-	[ac_cv_c_excellent_regex=no])
+	[ac_cv_c_regex_with_reg_startend=yes],
+	[ac_cv_c_regex_with_reg_startend=no])
 ])
-if test $ac_cv_c_excellent_regex = yes; then
+if test $ac_cv_c_regex_with_reg_startend = yes; then
 	NO_REGEX=
 else
 	NO_REGEX=YesPlease
-- 
2.10.0




^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-10-07 11:31 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-04 15:08 Regression: git no longer works with musl libc's regex impl Rich Felker
2016-10-04 15:27 ` Jeff King
2016-10-04 15:40   ` Rich Felker
2016-10-04 16:08     ` Johannes Schindelin
2016-10-04 16:11       ` Rich Felker
2016-10-04 17:16         ` Johannes Schindelin
2016-10-04 18:00           ` Ray Donnelly
2016-10-04 17:39       ` [musl] " Rich Felker
2016-10-05 11:17         ` Johannes Schindelin
2016-10-05 13:01           ` Szabolcs Nagy
2016-10-05 13:15           ` Rich Felker
2016-10-04 22:06       ` James B
2016-10-04 22:33         ` Rich Felker
2016-10-04 22:48           ` Junio C Hamano
2016-10-05 13:11           ` Jakub Narębski
2016-10-05 16:15             ` [musl] " Rich Felker
2016-10-05 10:41         ` Johannes Schindelin
2016-10-05 11:59           ` James B
2016-10-05 16:11             ` Jeff King
2016-10-05 16:27               ` Rich Felker
2016-10-06 10:44             ` Johannes Schindelin
2016-10-06 19:18       ` Ævar Arnfjörð Bjarmason
2016-10-06 19:23         ` Jeff King
2016-10-06 19:25           ` Rich Felker
2016-10-06 19:28             ` Jeff King
2016-10-06 22:42         ` Ramsay Jones
2016-10-07 11:30           ` Jakub Narębski
2016-10-04 16:01   ` Johannes Schindelin

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).