git@vger.kernel.org mailing list mirror (one of many)
From: Stephane Odul <stephane@clumio.com>
To: Mathias Krause <minipli@grsecurity.net>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	git@vger.kernel.org,
	"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [EXTERNAL SENDER] Suspected git grep regression in git 2.40.0
Date: Tue, 21 Mar 2023 13:46:55 -0700
Message-ID: <51078D7E-C325-4F57-96C1-601B4E102DD9@clumio.com>
In-Reply-To: <b0f4b588-9871-8e59-e5a2-3f8745a7c4cd@grsecurity.net>

Thank you for looking into this so quickly.

I’m unable to reproduce it reliably locally, but I created a custom pipeline to reproduce it more quickly.

Here are the things I found out.

* With the NO_JIT flag and limited to only Python files (in our case we only want to grep .py files anyway):
  - git grep -c -P '(*NO_JIT)^[[:alnum:]_]+ = json.load' -- '*.py'
    This is snappy and works; no more errors.

* Without the flag and without the *.py restriction:
  - git grep -c -P '^[^ #][^#]+sys[.]argv'
    This did not fail but took almost 3 minutes, a big performance regression.
  - git grep -c -P '^[[:alnum:]_]+ = json.load'
    Crashed and returned -11. Stderr was empty, so I have no idea which file it failed on.

* With NO_JIT on all the files:
  - git grep -c -P '(*NO_JIT)^[[:alnum:]_]+ = json.load'
    This worked, and that pattern is snappy, but other patterns are very slow:
  - git grep -c -P '(*NO_JIT)^[^ #][^#]+sys[.]argv'
    Took 8 minutes to complete.

* Without the flag but only on *.py files:
  - git grep -c -P '^[[:alnum:]_]+ = json.load' -- '*.py'
    All the patterns run fast (under 1s) with no errors.
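
Suppose the -11 above is a Python `subprocess` return code. Python reports a child killed by signal N as -N, while a shell would show 128 + N, i.e. 139. In that case git was killed by signal 11 (SIGSEGV), which would also explain the empty stderr. Below is a minimal sketch of a harness that times each run and decodes its exit status. `run_grep` and `describe_exit` are hypothetical helper names; they are not part of git or of the pipeline used above. The sketch assumes it is run from inside the repository.

```python
import signal
import subprocess
import time


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description.

    Python's subprocess reports a child killed by signal N as -N,
    so -11 means signal 11 (SIGSEGV on Linux and macOS).
    """
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:  # signal number not known on this platform
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


def run_grep(pattern: str, *pathspecs: str) -> tuple[int, float]:
    """Run `git grep -c -P <pattern> [-- <pathspecs>]` in the current directory.

    Returns (returncode, elapsed seconds). git grep exits with 1 when
    nothing matches, which is not an error.
    """
    cmd = ["git", "grep", "-c", "-P", pattern]
    if pathspecs:
        cmd += ["--", *pathspecs]
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True)
    return proc.returncode, time.monotonic() - start
```

For example, `rc, secs = run_grep("^[[:alnum:]_]+ = json.load", "*.py")` followed by `print(describe_exit(rc), f"{secs:.1f}s")` covers one of the cases above. Prefixing the pattern with `(*NO_JIT)` covers the no-JIT cases.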


Note that I was trying -E and replaced \w with [[:alnum:]_]; I’ll need to revert that, but I don’t think \w is the issue.

Overall, I suspect the issue is that the patterns are run against a non-ASCII file somewhere in the repo.
Our repo is fairly large, with files in various formats, possibly including some binaries that are definitely not valid UTF-8.
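
To test that hypothesis, one option is to list the tracked files whose contents do not decode as UTF-8. Below is a minimal sketch of that check. The helper names are hypothetical. The sketch only inspects the working tree, so it narrows down candidates but does not show which file, if any, actually triggers the crash.

```python
import os
import subprocess
from pathlib import Path
from typing import Iterable


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def tracked_files() -> list[str]:
    """List the files tracked in the current git repository.

    `git ls-files -z` separates names with NUL bytes and does not quote
    them, so unusual file names survive intact.
    """
    out = subprocess.run(
        ["git", "ls-files", "-z"], capture_output=True, check=True
    ).stdout
    return [os.fsdecode(name) for name in out.split(b"\0") if name]


def non_utf8_files(paths: Iterable[str]) -> list[str]:
    """Return the paths (regular files only) whose contents are not valid UTF-8."""
    bad = []
    for path in paths:
        p = Path(path)
        # Skip missing paths, directories (e.g. submodules) and the like.
        if p.is_file() and not is_valid_utf8(p.read_bytes()):
            bad.append(path)
    return bad
```

Run from the repository root, `print("\n".join(non_utf8_files(tracked_files())))` prints the candidates. Each file is read fully into memory, which is acceptable for a one-off check.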

For now I have a good workaround, which is to only check *.py files, as we should have done in the first place. The NO_JIT flag slows things down significantly, so we will not use it here.

Do you have any recommendations on how to identify which file(s) cause the crash, given that there is nothing in stderr?
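
Because stderr is empty, one generic approach is to bisect the file list. Run the crashing command on half the files, keep whichever half still crashes, and repeat until single files remain. Below is a minimal sketch of that approach. It assumes each crash is caused by one file on its own and reproduces deterministically, which may not hold given the flaky local reproduction described above. `grep_crashes` and `find_crashers` are hypothetical helpers, not git features.

```python
import subprocess
from typing import Callable, Sequence


def grep_crashes(pattern: str, paths: Sequence[str]) -> bool:
    """Return True if `git grep -c -P` on `paths` is killed by a signal.

    Python's subprocess reports a child killed by signal N as -N
    (e.g. -11 for SIGSEGV), so any negative return code counts as a crash.
    --literal-pathspecs stops git from treating file names as globs.
    """
    cmd = ["git", "--literal-pathspecs", "grep", "-c", "-P", pattern, "--", *paths]
    return subprocess.run(cmd, capture_output=True).returncode < 0


def find_crashers(
    paths: Sequence[str], crashes: Callable[[Sequence[str]], bool]
) -> list[str]:
    """Bisect `paths` down to the individual files for which `crashes` is True.

    Assumes each crash is triggered by a single file on its own and is
    reproducible; a crash that needs a combination of files would be missed.
    """
    if not paths or not crashes(paths):
        return []
    if len(paths) == 1:
        return list(paths)
    mid = len(paths) // 2
    return find_crashers(paths[:mid], crashes) + find_crashers(paths[mid:], crashes)
```

Take `paths` from `git ls-files -z`. A call such as `find_crashers(paths, lambda ps: grep_crashes("^[[:alnum:]_]+ = json.load", ps))` then needs roughly k·log2(n) git invocations for k bad files among n, rather than one per file. In a very large repository the full file list may exceed the OS command-line length limit, so starting from top-level directories as pathspecs may be necessary.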

Thanks!

Thread overview: 11+ messages
2023-03-21  8:04 Suspected git grep regression in git 2.40.0 Stephane Odul
2023-03-21 12:33 ` Bagas Sanjaya
2023-03-21 16:33 ` Junio C Hamano
2023-03-21 19:20   ` Mathias Krause
2023-03-21 20:46     ` Stephane Odul [this message]
2023-03-22 19:52       ` Mathias Krause
2023-03-22 20:04         ` [EXTERNAL SENDER] " Stephane Odul
2023-03-23 14:40         ` Suspected git grep regression in git 2.40.0 - proposed fix Mathias Krause
2023-03-23 16:19           ` Junio C Hamano
2023-03-23 16:36             ` Mathias Krause
2023-03-23 17:25           ` [PATCH v2] grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34 Mathias Krause
