From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "René Scharfe" <l.s.r@web.de>,
git@vger.kernel.org, "Jeff King" <peff@peff.net>,
"Johannes Schindelin" <johannes.schindelin@gmx.de>,
"Carlo Marcelo Arenas Belón" <carenas@gmail.com>
Subject: Re: [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop
Date: Thu, 04 Feb 2021 22:13:09 +0100 [thread overview]
Message-ID: <87pn1fcza2.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <xmqqh7mr90ou.fsf@gitster.c.googlers.com>
On Thu, Feb 04 2021, Junio C Hamano wrote:
> René Scharfe <l.s.r@web.de> writes:
>
>>> - while (sz &&
>>> - !regexec_buf(regexp, data, sz, 1, ®match, flags)) {
>>> + while (!regexec_buf(regexp, data, sz, 1, ®match, flags)) {
>>
>> This will loop forever for regexes that match an empty string. An
>> example would be /$/. Silly, perhaps, but still I understand this check
>> less as an optimization and more as a correctness/robustness thing.
>>
>>> flags |= REG_NOTBOL;
>>> data += regmatch.rm_eo;
>>> sz -= regmatch.rm_eo;
>>> - if (sz && regmatch.rm_so == regmatch.rm_eo) {
>>> + if (regmatch.rm_so == regmatch.rm_eo) {
>>> data++;
>>> sz--;
>>> }
>>
>> Before, if the match was an empty string and there was more data after
>> it, then the code would consume a character anyway, in order to avoid
>> matching the same empty string again. With the patch, that character
>> is consumed even if there is no more data. This leaves 'data'
>> pointing beyond the buffer and 'sz' rolls over to ULONG_MAX. Oops. :(
>
> While I do not care too much about NUL in the haystack, I do not
> mind [13/25] either. But this is bad.
>
> This whole thing reminds me of f53c5de2 (pickaxe: fix segfault with
> '-S<...> --pickaxe-regex', 2017-03-18), by the way.
René: Well spotted, thanks, and Oops.
I've just sent a separate series with 01-10 of this one. I'm sitting on
the diffcore-pickaxe patches for a while. I've got local fixes for a lot
of issues in it, will fix this one too.
I've optimized the PCRE v2 codepath a lot more in my local
version. Current results are:
4209.1: git log -S'int main' <limit-rev>.. 0.38(0.36+0.01) 0.37(0.33+0.04) -2.6%
4209.2: git log -S'æ' <limit-rev>.. 0.51(0.47+0.04) 0.32(0.27+0.05) -37.3%
4209.3: git log --pickaxe-regex -S'(int|void|null)' <limit-rev>.. 0.72(0.68+0.03) 0.57(0.54+0.03) -20.8%
4209.4: git log --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>.. 0.60(0.55+0.02) 0.39(0.34+0.05) -35.0%
4209.5: git log --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>.. 0.43(0.40+0.03) 0.50(0.44+0.06) +16.3%
4209.6: git log -G'(int|void|null)' <limit-rev>.. 0.64(0.55+0.09) 0.63(0.56+0.05) -1.6%
4209.7: git log -G'if *\([^ ]+ & ' <limit-rev>.. 0.64(0.59+0.05) 0.63(0.56+0.06) -1.6%
4209.8: git log -G'[àáâãäåæñøùúûüýþ]' <limit-rev>.. 0.63(0.54+0.08) 0.62(0.55+0.06) -1.6%
4209.9: git log -i -S'int main' <limit-rev>.. 0.39(0.35+0.03) 0.38(0.35+0.02) -2.6%
4209.10: git log -i -S'æ' <limit-rev>.. 0.39(0.33+0.06) 0.32(0.28+0.04) -17.9%
4209.11: git log -i --pickaxe-regex -S'(int|void|null)' <limit-rev>.. 0.90(0.84+0.05) 0.58(0.53+0.04) -35.6%
4209.12: git log -i --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>.. 0.71(0.64+0.06) 0.40(0.37+0.03) -43.7%
4209.13: git log -i --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>.. 0.43(0.40+0.03) 0.50(0.46+0.04) +16.3%
4209.14: git log -i -G'(int|void|null)' <limit-rev>.. 0.64(0.57+0.06) 0.62(0.56+0.05) -3.1%
4209.15: git log -i -G'if *\([^ ]+ & ' <limit-rev>.. 0.65(0.59+0.06) 0.63(0.54+0.08) -3.1%
4209.16: git log -i -G'[àáâãäåæñøùúûüýþ]' <limit-rev>.. 0.63(0.55+0.08) 0.62(0.56+0.05) -1.6%
The main optimization was just moving the compilation of the pattern up
the stack into the diff_options struct, the current version in this
thread re-compiles it every time.
next prev parent reply other threads:[~2021-02-04 21:15 UTC|newest]
Thread overview: 99+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-03 3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 01/25] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 02/25] grep/pcre2: drop needless assignment + assert() on opt->pcre2 Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 03/25] grep/pcre2: drop needless assignment to NULL Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 04/25] grep/pcre2: correct reference to grep_init() in comment Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 05/25] grep/pcre2: prepare to add debugging to pcre2_malloc() Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 06/25] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 07/25] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 08/25] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 09/25] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 10/25] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 11/25] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 12/25] pickaxe tests: refactor to use test_commit --append Ævar Arnfjörð Bjarmason
2021-02-03 3:27 ` [PATCH 13/25] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop Ævar Arnfjörð Bjarmason
2021-02-04 16:16 ` René Scharfe
2021-02-04 17:56 ` Junio C Hamano
2021-02-04 21:13 ` Ævar Arnfjörð Bjarmason [this message]
2021-02-03 3:28 ` [PATCH 15/25] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 16/25] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 17/25] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 18/25] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 19/25] pickaxe -G: set -U0 for diff generation Ævar Arnfjörð Bjarmason
2021-02-03 14:26 ` Ævar Arnfjörð Bjarmason
2021-02-03 19:42 ` Junio C Hamano
2021-02-03 3:28 ` [PATCH 20/25] grep.h: make patmatch() a public function Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 21/25] pickaxe: use PCREv2 for -G and -S Ævar Arnfjörð Bjarmason
2021-02-03 20:44 ` Ævar Arnfjörð Bjarmason
2021-02-04 18:11 ` Junio C Hamano
2021-02-04 18:22 ` Junio C Hamano
2021-02-03 3:28 ` [PATCH 22/25] Remove unused kwset.[ch] Ævar Arnfjörð Bjarmason
[not found] ` <CAPUEspgBmuTBHVZWY9fRtjbHWBRr0zHravLL1Czepc6jmib4HA@mail.gmail.com>
2021-02-03 14:13 ` Ævar Arnfjörð Bjarmason
[not found] ` <CAPUEsphN7QuSVsC1Tr4xE8yQgPTtpF7wL7zbk1crQU3n-5g6JQ@mail.gmail.com>
2021-02-03 16:45 ` Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 23/25] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn Ævar Arnfjörð Bjarmason
2021-02-03 3:28 ` [PATCH 24/25] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
2021-02-04 18:16 ` Junio C Hamano
2021-02-03 3:28 ` [PATCH 25/25] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
2021-02-03 12:38 ` [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 00/22] pickaxe: test and refactoring for follow-up changes Ævar Arnfjörð Bjarmason
2021-02-16 22:23 ` Junio C Hamano
2021-02-17 1:19 ` Junio C Hamano
2021-04-12 17:15 ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 02/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 03/22] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 04/22] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 05/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 06/22] pickaxe tests: add missing test for --no-pickaxe-regex being an error Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 07/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 08/22] pickaxe: die when --find-object and --pickaxe-all " Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 09/22] diff.h: move pickaxe fields together again Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 10/22] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 11/22] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 12/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 13/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 14/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 15/22] pickaxe: rename variables in has_changes() for brevity Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 16/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 17/22] xdiff-interface: prepare for allowing early return Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 18/22] xdiff-interface: allow early return from xdiff_emit_line_fn Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 19/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 20/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 21/22] xdiff users: use designated initializers for out_line Ævar Arnfjörð Bjarmason
2021-04-12 17:15 ` [PATCH v3 22/22] xdiff-interface: replace discard_hunk_line() with a flag Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 02/22] test-lib-functions: document and test test_commit --no-tag Ævar Arnfjörð Bjarmason
2021-03-30 23:14 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 03/22] test-lib-functions: reword "test_commit --append" docs Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 04/22] test-lib functions: add --printf option to test_commit Ævar Arnfjörð Bjarmason
2021-03-30 23:11 ` Junio C Hamano
2021-04-12 13:19 ` Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 05/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
2021-03-30 23:26 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 06/22] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 07/22] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 08/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
2021-03-30 23:32 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 09/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
2021-03-30 23:36 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 10/22] pickaxe: die when --find-object and --pickaxe-all " Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 11/22] diff.h: move pickaxe fields together again Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 12/22] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 13/22] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 14/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
2021-03-30 23:45 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 15/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
2021-03-30 23:50 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 16/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
2021-03-30 23:54 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 17/22] pickaxe: rename variables in has_changes() for brevity Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 18/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
2021-03-30 23:58 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 19/22] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn Ævar Arnfjörð Bjarmason
2021-03-31 0:04 ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 20/22] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
2021-02-16 11:58 ` [PATCH v2 21/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
2021-03-31 0:11 ` Junio C Hamano
2021-02-16 11:58 ` [PATCH v2 22/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
2021-03-31 0:14 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87pn1fcza2.fsf@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=carenas@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=l.s.r@web.de \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).