git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org,
	Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH] ci: disallow directional formatting
Date: Tue, 02 Nov 2021 17:03:22 +0100	[thread overview]
Message-ID: <211102.86lf261q2e.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <YYFdu/+j8ba+hjCF@nand.local>


On Tue, Nov 02 2021, Taylor Blau wrote:

> On Tue, Nov 02, 2021 at 04:01:57PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> There's a parallel discussion about doing something to detect this in
>> "git am", which for the git project seems like a better place to put
>> this.
>
> I don't think that one impacts the other necessarily. Having `git am`
> guard against this would probably be sufficient to protect Junio
> accidentally apply something containing directional formatting to his
> tree unknowingly.
>
> But the idea that we rely on the import mechanism to protect against
> this doesn't sit well with me. Ultimately, we should be relying on a
> static check like below to ensure that directional formatting hasn't
> entered the tree by any mechanism (not just 'git am').
>
>> > diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
>> > index 6ed6a9e8076..7b4b4df03c3 100644
>> > --- a/.github/workflows/main.yml
>> > +++ b/.github/workflows/main.yml
>> > @@ -289,6 +289,13 @@ jobs:
>> >      - uses: actions/checkout@v2
>> >      - run: ci/install-dependencies.sh
>> >      - run: ci/run-static-analysis.sh
>> > +    - name: disallow Unicode directional formatting
>> > +      run: |
>> > +        # Use UTF-8-aware `printf` to feed a byte pattern to non-UTF-8-aware `git grep`
>> > +        # (Ubuntu's `git grep` is compiled without support for libpcre, otherwise we
>> > +        # could use `git grep -P` with the `\u` syntax).
>> > +        ! LANG=C git grep -Il "$(LANG=C.UTF-8 printf \
>> > +          '\\(\u202a\\|\u202b\\|\u202c\\|\u202d\\|\u202e\\|\u2066\\|\u2067\\|\u2068\\|\u2069\\)')"
>> >    sparse:
>> >      needs: ci-config
>> >      if: needs.ci-config.outputs.enabled == 'yes'
>> >
>> > base-commit: 0cddd84c9f3e9c3d793ec93034ef679335f35e49
>>
>> It would be easier to maintain this if were added to
>> ci/run-static-analysis.sh instead, where we have some similar tests, and
>> if it lives there we could do away with the double-escaping, then it can
>> also be run manually.
>>
>> Also, can't we just pipe "git ls-files -z" into "perl -0ne" here and use
>> its unconditional support for e.g. unicode properties in regexes.
>
> I agree that the double-escaping is ugly. I think that this would be
> easier to maintain if it lived in ci/run-static-analysis.sh or its own
> script like ci/check-directional-formatting.sh.
>
> And yes, constructing a byte pattern is a little odd as well, but I
> think that it's the best you can do if you're limited to running 'git
> grep' without libpcre. I wondered if we could depend on perl being
> around during CI, but as far as I know we can since install Perl modules
> in ci/install-dependencies.sh and use Perl extensively through the test
> suite.

We can hard depend on anything listed in [1][2], in this case there's a
Perl 5.30 available.

We don't need any Perl modules, it's all Perl-native regex features and
a small for-loop.

On the topic at large I wonder how much we need to worry about this at
all, seems like somewher between case of "the anglosphere discovers
scary foreign characters in Unicode again" and "security researcher
creates scary landing page for well-known Unicode edge-case"[3] :)

In this particular case the test cases involved seem extremely contrived
and unlikely to be something we'd end up with in our code in any case,
even in a case where all the reviewers would be fooled by this method in
itself.

I.e. since you can't use these to "fold" lines in the eyes of a human
viewer it's all rather contrived cases where comments and valid C code
lives on the same line, e.g.:

    /*‮ } ⁦if (isAdmin)⁩ ⁦ begin admins only */
        printf("You are an admin.\n");
    /* end admins only ‮ { ⁦*/

Which at least in Emacs is highlighted the right way, which is the first
major clue, and having tested some of these just now on AIX I've got xlc
whining about some cases that gcc/clang will happily pass by, so our
tooling at large will probably catch these anyway...

1. https://github.com/actions/virtual-environments
2. https://github.com/actions/virtual-environments/blob/main/images/linux/Ubuntu2004-README.md
3. https://www.trojansource.codes/

  reply	other threads:[~2021-11-02 16:36 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-02 12:58 [PATCH] ci: disallow directional formatting Johannes Schindelin via GitGitGadget
2021-11-02 15:01 ` Ævar Arnfjörð Bjarmason
2021-11-02 15:48   ` Taylor Blau
2021-11-02 16:03     ` Ævar Arnfjörð Bjarmason [this message]
2021-11-02 16:12     ` Johannes Schindelin
2021-11-02 16:38       ` Ævar Arnfjörð Bjarmason
2021-11-03 17:20     ` Junio C Hamano
2021-11-03 12:23 ` [PATCH v2] " Johannes Schindelin via GitGitGadget
2021-11-03 16:36   ` Ævar Arnfjörð Bjarmason
2021-11-03 18:00   ` Junio C Hamano
2021-11-03 23:38   ` Junio C Hamano
2021-11-04 10:19     ` Johannes Schindelin
2021-11-04 13:13   ` [PATCH v3] " Johannes Schindelin via GitGitGadget
2021-11-04 13:48     ` Ævar Arnfjörð Bjarmason
2021-11-04 17:19       ` Junio C Hamano
2021-11-08 18:49         ` Ævar Arnfjörð Bjarmason
2021-11-08 20:08           ` Junio C Hamano
2021-11-09 13:34             ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211102.86lf261q2e.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).