git@vger.kernel.org list mirror (unofficial, one of many)
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Rich Felker <dalias@libc.org>
Cc: Jeff King <peff@peff.net>, git@vger.kernel.org, musl@lists.openwall.com
Subject: Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
Date: Wed, 5 Oct 2016 13:17:49 +0200 (CEST)
Message-ID: <alpine.DEB.2.20.1610051250080.35196@virtualbox>
In-Reply-To: <20161004173926.GA19318@brightrain.aerifal.cx>

Hi Rich,

On Tue, 4 Oct 2016, Rich Felker wrote:

> On Tue, Oct 04, 2016 at 06:08:33PM +0200, Johannes Schindelin wrote:
>
> > And lastly, the best alternative would be to teach musl about
> > REG_STARTEND, as it is rather useful a feature.
> 
> Maybe, but it seems fundamentally costly to support -- it's extra
> state in the inner loops that imposes costly spill/reload on archs
> with too few registers (x86).

It is true that it could cause that.

I had a brief look at the source code (you use backtracking... hopefully
nobody uses musl to parse regular expressions from untrusted, or
inexperienced, sources [*1*]), and it seems that the regex code might
spill unnecessarily already (I see, for example, that the reg_notbol,
reg_noteol and reg_newline flags all use up complete int registers, not
merely bits of a single one).

It seems, specifically, that the *match_end_ofs parameter of the two
regexec backends is always set to point to eo, which is so far not
initialized. You could initialize it to -1 and set it to pmatch[0].rm_eo
if the REG_STARTEND flag is set. The GET_NEXT_WCHAR() macro would then
need to test something like

	if (str_byte >= string + *match_end_ofs) {
		ret = REG_NOMATCH; goto error_exit;
	}

This does not handle a non-zero pmatch[0].rm_so, though. I would probably
try to pass another input parameter for that, but I have not yet verified
that a "^" would be handled properly (if pmatch[0].rm_so > 0 and
REG_STARTEND is set, "^" should *not* match).
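
For context, here is a minimal sketch of how a caller might use
REG_STARTEND where the libc provides it, and fall back to copying the
range where it does not. The helper name regexec_range() is hypothetical
and is taken from neither git nor musl. The fallback assumes the range
contains no embedded NUL bytes, and it makes no attempt to reproduce
anchor semantics for a non-zero start offset.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from git or musl): match `re` against the
 * bytes buf[start..end). On success, m->rm_so and m->rm_eo are offsets
 * relative to buf.
 */
static int regexec_range(const regex_t *re, const char *buf,
			 size_t start, size_t end, regmatch_t *m)
{
	assert(start <= end);
#ifdef REG_STARTEND
	/* Non-POSIX extension: the range is passed in via pmatch[0]. */
	m->rm_so = (regoff_t)start;
	m->rm_eo = (regoff_t)end;
	return regexec(re, buf, 1, m, REG_STARTEND);
#else
	/* Fallback: copy the range into a NUL-terminated scratch buffer. */
	size_t len = end - start;
	char *tmp = malloc(len + 1);
	int ret;

	if (!tmp)
		return REG_ESPACE;
	memcpy(tmp, buf + start, len);
	tmp[len] = '\0';
	ret = regexec(re, tmp, 1, m, 0);
	if (!ret) {
		/* Translate offsets back so they are relative to buf. */
		m->rm_so += (regoff_t)start;
		m->rm_eo += (regoff_t)start;
	}
	free(tmp);
	return ret;
#endif
}
```

A caller would compile the pattern once with regcomp() and then call
regexec_range(&re, buf, start, end, &m) for each range of interest. The
copy in the fallback branch is the cost that native REG_STARTEND support
would avoid.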

> I'll look at doing this when we overhaul/replace the regex
> implementation, and I'm happy to do some performance-regression tests
> for adding it now if someone has a simple patch (as was mentioned on the
> musl list).

I'd be interested to be kept in the loop, if you do not mind Cc:ing me.

Ciao,
Johannes

Footnote *1*:
http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016
