git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Carlo Arenas <carenas@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: [PATCH] grep: skip UTF8 checks explicitally
Date: Fri, 26 Jul 2019 08:53:49 -0700	[thread overview]
Message-ID: <CAPUEspgseYvKTNN6EkqjKo1zQXRjUzyiLMe0kJwnNu=F3ATmOA@mail.gmail.com> (raw)
In-Reply-To: <87ef2c7roy.fsf@evledraar.gmail.com>

On Fri, Jul 26, 2019 at 8:15 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> I'm not sure what a real fix for that is. Part of it is probably 8/8 in
> the series I mention below, but more generally we'd need to be more
> encoding aware at a much higher callsite than "grep". So e.g. we'd know
> that we match "binary" data as not-UTF-8. Now we just throw arbitrary
> bytes around and hope something sticks.

I haven't look yet at your proposed changes, but my gut feeling is that
the work to support invalid UTF in the yet unreleased PCRE version would
be needed as part of it, and therefore it might be better to keep PCRE
out of the main path until that gets released and can be relied upon.

kwset is not going away with this series anyway, regardless of the no-kwset
name on the branch.

> > If we're already deciding to paper over things, I'd much rather prefer
> > the simpler patch, i.e. Carlo's.
>
> As I noted upthread PCRE's own docs promise undefined behavior and fire
> and brimstone if that patch is applied. Those last two not
> guaranteed. So we need another solution.

in my original reply I mentioned I explicitly didn't do a test because of this
"undefined behavior", but I think it should be fair to mention that we are
already affected by that because using the JIT fast path does skip any
UTF-8 validations and is currently possible to get git into an infinite loop
or make it segfault when using PCRE.

in that line, I am not sure I understand the pushback against making that
explicit since it only makes both codepaths behave the same (bugs and
risks of burning alike)

Carlo

  reply	other threads:[~2019-07-26 15:54 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-21 18:31 [PATCH] grep: skip UTF8 checks explicitally Carlo Marcelo Arenas Belón
2019-07-21 18:34 ` Eric Sunshine
2019-07-22 11:55 ` Johannes Schindelin
2019-07-22 14:43   ` [PATCH] grep: skip UTF8 checks explicitly Carlo Marcelo Arenas Belón
2019-07-24  2:10     ` Junio C Hamano
2019-07-24 10:50       ` Johannes Schindelin
2019-07-24 16:08         ` Junio C Hamano
2019-08-28 14:54           ` [PATCH v2] " Carlo Marcelo Arenas Belón
2019-08-28 16:57             ` Carlo Arenas
2019-09-09 15:07               ` Carlo Arenas
2019-09-09 18:49                 ` Junio C Hamano
2019-07-22 19:42   ` [PATCH] grep: skip UTF8 checks explicitally Ævar Arnfjörð Bjarmason
2019-07-23  3:50     ` Carlo Arenas
2019-07-23 12:46       ` Johannes Schindelin
2019-07-24  1:47         ` Carlo Arenas
2019-07-24 10:47           ` Johannes Schindelin
2019-07-24 18:22             ` Ævar Arnfjörð Bjarmason
2019-07-24 21:04               ` Junio C Hamano
2019-07-25  9:48                 ` Johannes Schindelin
2019-07-25 13:02                   ` Junio C Hamano
2019-07-25 18:22                     ` Johannes Schindelin
2019-07-26 15:15                       ` Ævar Arnfjörð Bjarmason
2019-07-26 15:53                         ` Carlo Arenas [this message]
2019-07-26 20:05                           ` Ævar Arnfjörð Bjarmason
2019-07-26 16:19                         ` Junio C Hamano
2019-07-26 19:40                           ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPUEspgseYvKTNN6EkqjKo1zQXRjUzyiLMe0kJwnNu=F3ATmOA@mail.gmail.com' \
    --to=carenas@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).