git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Brandon Williams <bmwill@google.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Jeffrey Walton" <noloader@gmail.com>,
	"Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>,
	"J Smith" <dark.panda@gmail.com>,
	"Victor Leschuk" <vleschuk@gmail.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Fredrik Kuivinen" <frekui@gmail.com>
Subject: Re: [PATCH 1/7] grep: don't redundantly compile throwaway patterns under threading
Date: Fri, 12 May 2017 20:17:48 +0200	[thread overview]
Message-ID: <CACBZZX7x=9Pvuhy7Q9=ecnhvVmjt664enZOjk4B8LXErVL4oEg@mail.gmail.com> (raw)
In-Reply-To: <20170512173527.GD98586@google.com>

On Fri, May 12, 2017 at 7:35 PM, Brandon Williams <bmwill@google.com> wrote:
> On 05/11, Ęvar Arnfjörš Bjarmason wrote:
>> Change the pattern compilation logic under threading so that grep
>> doesn't compile a pattern it never ends up using on the non-threaded
>> code path, only to compile it again N times for N threads which will
>> each use their own copy, ignoring the initially compiled pattern.
>
> Is there a reason each thread needs to compile the patterns as opposed
> to them being compiled a single time and being copies N time for N
> threads?

Not really, just simplicity.

I'll amend the commit message to mention that I did this not as an
optimization, but just so the code is easier to reason about and
debug, when debugging this I found that there were two different
stacktraces to get to where I was compiling the pattern, and e.g.
"give each compilation/execution an id" for ad-hoc debugging would
always end up with one unused pattern, confusingly.

The reason it's like this is because it's a side-effect of duplicating
the whole grep_opt structure, which is not thread safe, writable, and
munged during execution, that's where the pattern is stored.

We could re-use the compiled regexp itself for POSIX, for PCRE you can
pre-JIT, post-JIT you can only partially do it, i.e. the pattern is
const, thread safe and can be shared, but you need to also marshal
around mutable per-thread for the JIT stack etc.

I looked into e.g. splitting the API into some "do & alloc threadsafe
stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the
execution time of grep_opt_dup() & pattern compilation is trivial
compared to actually executing the grep, so there was no point. Even
with the more expensive JIT some of the most expensive PCRE patterns
take something like 0.0X milliseconds to compile at most[1].

1. http://sljit.sourceforge.net/pcre.html

  reply	other threads:[~2017-05-12 18:18 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-11 17:01 [PATCH 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 1/7] grep: don't redundantly compile throwaway patterns under threading Ævar Arnfjörð Bjarmason
2017-05-12 17:35   ` Brandon Williams
2017-05-12 18:17     ` Ævar Arnfjörð Bjarmason [this message]
2017-05-11 17:01 ` [PATCH 2/7] grep: skip pthreads overhead when using one thread Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 3/7] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 4/7] grep: add support for the PCRE v1 JIT API Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 5/7] grep: un-break building with PCRE < 8.32 Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 6/7] grep: un-break building with PCRE < 8.20 Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 7/7] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACBZZX7x=9Pvuhy7Q9=ecnhvVmjt664enZOjk4B8LXErVL4oEg@mail.gmail.com' \
    --to=avarab@gmail.com \
    --cc=bmwill@google.com \
    --cc=dark.panda@gmail.com \
    --cc=frekui@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=michal.kiedrowicz@gmail.com \
    --cc=noloader@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=vleschuk@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).