From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Michael J Gruber <git@grubix.eu>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH] t5534: fix misleading grep invocation
Date: Fri, 7 Jul 2017 13:13:06 +0200 (CEST)
Message-ID: <alpine.DEB.2.21.1.1707071259550.84669@virtualbox>
In-Reply-To: <22feab0a-ca75-2aea-1ec9-2f71fe40c9d0@grubix.eu>

Hi Michael,

On Thu, 6 Jul 2017, Michael J Gruber wrote:

> Junio C Hamano venit, vidit, dixit 05.07.2017 18:26:
> > Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> > 
> >> It seems to be a little-known feature of `grep` (and it certainly came
> >> as a surprise to this here developer who believed to know the Unix tools
> >> pretty well) that multiple patterns can be passed in the same
> >> command-line argument simply by separating them by newlines. Watch, and
> >> learn:
> >>
> >> 	$ printf '1\n2\n3\n' | grep "$(printf '1\n3\n')"
> >> 	1
> >> 	3
> >>
> >> That behavior also extends to patterns passed via `-e`, and it is not
> >> modified by passing the option `-E` (but trying this with -P issues the
> >> error "grep: the -P option only supports a single pattern").
> >>
> >> It seems that there are more old Unix hands who are surprised by this
> >> behavior, as grep invocations of the form
> >>
> >> 	grep "$(git rev-parse A B) C" file
> >>
> >> were introduced in a85b377d041 (push: the beginning of "git push
> >> --signed", 2014-09-12), and later faithfully copy-edited in b9459019bbb
> >> (push: heed user.signingkey for signed pushes, 2014-10-22).
> >>
> >> Please note that the output of `git rev-parse A B` separates the object
> >> IDs via *newlines*, not via spaces, and those newlines are preserved
> >> because the interpolation is enclosed in double quotes.
> >>
> >> As a consequence, these tests try to validate that the file contains
> >> either A's object ID, or B's object ID followed by C, or both. Clearly,
> >> however, what the test wanted to see is that there is a line that
> >> contains all of them.
> >>
> >> This is clearly unintended, and the grep invocations in question really
> >> match too many lines.
>
> [...]
>
> How did you spot this? Are there grep versions that behave differently?

Yes, there are grep versions that behave differently... how did you guess?

I am in the middle of an extended investigation trying to assess how
feasible it would be to use a native Win32 port of BusyBox (started by
long-time Git contributor Nguyễn Thái Ngọc Duy) in Git for Windows to
execute the many, many remaining Unix shell scripts that are a core part
of Git (including crucial functionality such as bisect, rebase, stash and
submodule, for which we suffer portability and performance problems).

And it is BusyBox's grep that does not split a pattern argument at
newlines into separate alternative patterns.

I first considered patching BusyBox to adhere to the expected behavior,
but then I looked closer and saw that the test's grep invocations actually
matched two lines instead of the single line I expected. An even closer look made me
suspect that the original intention was different from what the script
actually does, and for once I tried to be nice in my commit message.
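
For illustration, here is a minimal sketch of the difference, using
made-up stand-ins for the object IDs of A and B and for the trailing text
C (the values and the temporary file are assumptions for the example, not
copied from t5534). With a grep that splits the pattern at newlines, the
first invocation should count two matching lines even though no single
line contains all three pieces, while the single-line pattern should
count none:

```shell
#!/bin/sh
# Hypothetical stand-ins for A's and B's object IDs and for the text C.
a=1111111
b=2222222
c=trailer

# A file in which no single line contains A, B and C together.
file=$(mktemp)
printf '%s\n' "$a unrelated" "$b $c" >"$file"

# The newline between the two IDs survives inside the double quotes, so a
# grep that splits patterns at newlines sees two alternatives:
# "$a", and "$b $c".
loose=$(grep -c "$(printf '%s\n%s' "$a" "$b") $c" "$file")

# A single-line pattern requires A, B and C on the same line.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
strict=$(grep -c "$a $b $c" "$file" || true)

echo "loose=$loose strict=$strict"
rm -f "$file"
```

A grep that does not split at newlines, as described above for BusyBox,
may report a different count for the first invocation.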

Ciao,
Dscho
