git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* bug: `git log --grep ... --invert-grep --author=...` negates / ignores --author
@ 2017-05-31 15:45 Anthony Sottile
  2017-05-31 18:08 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 7+ messages in thread
From: Anthony Sottile @ 2017-05-31 15:45 UTC (permalink / raw)
  To: git

Given the following commits:

```
asottile@asottile-VirtualBox:/tmp$ git init test
Initialized empty Git repository in /tmp/test/.git/
asottile@asottile-VirtualBox:/tmp$ cd test/
asottile@asottile-VirtualBox:/tmp/test$
GIT_COMMITTER_EMAIL=foo@example.com GIT_AUTHOR_EMAIL=foo@example.com
git commit --allow-empty -m "foo"
[master (root-commit) c9df62b] foo
asottile@asottile-VirtualBox:/tmp/test$ git commit -m "blah" --allow-empty
[master 9e3ee9b] blah
asottile@asottile-VirtualBox:/tmp/test$ git log
commit 9e3ee9bc1adab2ae8eb1884a8f6237da18dfd27b
Author: Anthony Sottile <asottile@umich.edu>
Date:   Wed May 31 08:40:59 2017 -0700

    blah

commit c9df62b93298a247fcfbe24ed4282ccf95448f47
Author: Anthony Sottile <foo@example.com>
Date:   Wed May 31 08:40:49 2017 -0700

    foo
asottile@asottile-VirtualBox:/tmp/test$ git log --grep bar
--invert-grep --author=foo
commit 9e3ee9bc1adab2ae8eb1884a8f6237da18dfd27b
Author: Anthony Sottile <asottile@umich.edu>
Date:   Wed May 31 08:40:59 2017 -0700

    blah

commit c9df62b93298a247fcfbe24ed4282ccf95448f47
Author: Anthony Sottile <foo@example.com>
Date:   Wed May 31 08:40:49 2017 -0700

    foo
asottile@asottile-VirtualBox:/tmp/test$ git log --author=foocommit
c9df62b93298a247fcfbe24ed4282ccf95448f47
Author: Anthony Sottile <foo@example.com>
Date:   Wed May 31 08:40:49 2017 -0700

    foo
```

I expect the same output from the last two commands, but the
`--invert-grep` one seems to match _all_ the commits.

I can try and dig into this if I have time, just trying to get a count
using this as a workaround

git log --grep ... --invert-grep --format=%ce | grep ... | wc -l

Anthony

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: bug: `git log --grep ... --invert-grep --author=...` negates / ignores --author
  2017-05-31 15:45 bug: `git log --grep ... --invert-grep --author=...` negates / ignores --author Anthony Sottile
@ 2017-05-31 18:08 ` Ævar Arnfjörð Bjarmason
  2017-05-31 21:40   ` Jeff King
  2017-06-01  2:20   ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-31 18:08 UTC (permalink / raw)
  To: Anthony Sottile; +Cc: Git Mailing List

On Wed, May 31, 2017 at 5:45 PM, Anthony Sottile <asottile@umich.edu> wrote:
> Given the following commits:
> ```
> asottile@asottile-VirtualBox:/tmp$ git init test
> Initialized empty Git repository in /tmp/test/.git/
> asottile@asottile-VirtualBox:/tmp$ cd test/
> asottile@asottile-VirtualBox:/tmp/test$
> GIT_COMMITTER_EMAIL=foo@example.com GIT_AUTHOR_EMAIL=foo@example.com
> git commit --allow-empty -m "foo"
> [master (root-commit) c9df62b] foo
> asottile@asottile-VirtualBox:/tmp/test$ git commit -m "blah" --allow-empty
> [master 9e3ee9b] blah
> asottile@asottile-VirtualBox:/tmp/test$ git log
> commit 9e3ee9bc1adab2ae8eb1884a8f6237da18dfd27b
> Author: Anthony Sottile <asottile@umich.edu>
> Date:   Wed May 31 08:40:59 2017 -0700
>
>     blah
>
> commit c9df62b93298a247fcfbe24ed4282ccf95448f47
> Author: Anthony Sottile <foo@example.com>
> Date:   Wed May 31 08:40:49 2017 -0700
>
>     foo
> asottile@asottile-VirtualBox:/tmp/test$ git log --grep bar
> --invert-grep --author=foo
> commit 9e3ee9bc1adab2ae8eb1884a8f6237da18dfd27b
> Author: Anthony Sottile <asottile@umich.edu>
> Date:   Wed May 31 08:40:59 2017 -0700
>
>     blah
>
> commit c9df62b93298a247fcfbe24ed4282ccf95448f47
> Author: Anthony Sottile <foo@example.com>
> Date:   Wed May 31 08:40:49 2017 -0700
>
>     foo
> asottile@asottile-VirtualBox:/tmp/test$ git log --author=foocommit
> c9df62b93298a247fcfbe24ed4282ccf95448f47
> Author: Anthony Sottile <foo@example.com>
> Date:   Wed May 31 08:40:49 2017 -0700
>
>     foo
> ```
>
> I expect the same output from the last two commands, but the
> `--invert-grep` one seems to match _all_ the commits.
>
> I can try and dig into this if I have time, just trying to get a count
> using this as a workaround
>
> git log --grep ... --invert-grep --format=%ce | grep ... | wc -l

I had to squint a bit to see what you were getting at here, which is
pretty simple: When you provide --invert-grep the --author filter is
completely discarded. This is a bug.

I.e. on git.git:

OK:

$ git log --grep=bar --author=Ævar --pretty=format:%an -100 origin/pu
|sort|uniq -c|sort -nr
5 Ævar Arnfjörð Bjarmason

$ git log --author=Ævar --pretty=format:%an -100 origin/pu |sort|uniq
-c|sort -nr
100 Ævar Arnfjörð Bjarmason

$ git log --grep=bar --invert-grep --author=Ævar --pretty=format:%an
-100 origin/pu |sort|uniq -c|sort -nr
     78 Junio C Hamano
     14 Jeff King
      2 Andreas Heiduk
      1 Sahil Dua
      1 Rikard Falkeborn
      1 Johannes Sixt
      1 Johannes Schindelin
      1 Ben Peart
      1 Ævar Arnfjörð Bjarmason

That last command should only find my commits, but instead --author is
discarded.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: bug: `git log --grep ... --invert-grep --author=...` negates / ignores --author
  2017-05-31 18:08 ` Ævar Arnfjörð Bjarmason
@ 2017-05-31 21:40   ` Jeff King
  2017-06-01 19:45     ` Ævar Arnfjörð Bjarmason
  2017-06-01  2:20   ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff King @ 2017-05-31 21:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Anthony Sottile, Git Mailing List

On Wed, May 31, 2017 at 08:08:54PM +0200, Ævar Arnfjörð Bjarmason wrote:

> $ git log --grep=bar --author=Ævar --pretty=format:%an -100 origin/pu
> |sort|uniq -c|sort -nr
> 5 Ævar Arnfjörð Bjarmason
> 
> $ git log --author=Ævar --pretty=format:%an -100 origin/pu |sort|uniq
> -c|sort -nr
> 100 Ævar Arnfjörð Bjarmason
> 
> $ git log --grep=bar --invert-grep --author=Ævar --pretty=format:%an
> -100 origin/pu |sort|uniq -c|sort -nr
>      78 Junio C Hamano
>      14 Jeff King
>       2 Andreas Heiduk
>       1 Sahil Dua
>       1 Rikard Falkeborn
>       1 Johannes Sixt
>       1 Johannes Schindelin
>       1 Ben Peart
>       1 Ævar Arnfjörð Bjarmason
> 
> That last command should only find my commits, but instead --author is
> discarded.

I would have thought that the last command wouldn't find _any_ of your
commits. Don't we consider --author a greppable pattern and invert it?

By itself:

  $ git log --author=Ævar --invert-grep --format=%an -100 origin/pu |
    sort | uniq -c | sort -rn
       79 Junio C Hamano
        8 Stefan Beller
        8 Jeff King
        1 Sahil Dua
        1 Rikard Falkeborn
        1 Johannes Sixt
        1 Johannes Schindelin
        1 Andreas Heiduk

So that makes sense from the "--author is invertable" world-view.

But I'm puzzled that you show up at all when --grep=bar is added (and
moreover, that we get _one_ commit from you, not the 5 found in your
initial command). That seems like a bug.

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: bug: `git log --grep ... --invert-grep --author=...` negates / ignores --author
  2017-05-31 18:08 ` Ævar Arnfjörð Bjarmason
  2017-05-31 21:40   ` Jeff King
@ 2017-06-01  2:20   ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2017-06-01  2:20 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Anthony Sottile, Git Mailing List

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> $ git log --grep=bar --invert-grep --author=Ævar --pretty=format:%an
> -100 origin/pu |sort|uniq -c|sort -nr
>      78 Junio C Hamano
>      14 Jeff King
>       2 Andreas Heiduk
>       1 Sahil Dua
>       1 Rikard Falkeborn
>       1 Johannes Sixt
>       1 Johannes Schindelin
>       1 Ben Peart
>       1 Ævar Arnfjörð Bjarmason
>
> That last command should only find my commits, but instead --author is
> discarded.

Hmph, the way I read revision.c::commit_match() is that we check
with "--grep=bar --author=Ævar" and then emit the commit if that
check fails when --invert-grep is in effect (instead of emitting the
commit when the check succeeds, which is the normal case).

So there is one commit that didn't pass "has 'bar' and written by
Ævar" check that was written by you in the recent past (iow,
recently you were writing bar all over the place).  Changes by other
people by definition does not pass "has 'bar' and written by Ævar"
check, on the other hand.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: bug: `git log --grep ... --invert-grep --author=...` negates / ignores --author
  2017-05-31 21:40   ` Jeff King
@ 2017-06-01 19:45     ` Ævar Arnfjörð Bjarmason
  2017-06-01 21:05       ` Anthony Sottile
  2017-06-01 21:38       ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-01 19:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Anthony Sottile, Git Mailing List

On Wed, May 31, 2017 at 11:40 PM, Jeff King <peff@peff.net> wrote:
> On Wed, May 31, 2017 at 08:08:54PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> $ git log --grep=bar --author=Ævar --pretty=format:%an -100 origin/pu
>> |sort|uniq -c|sort -nr
>> 5 Ævar Arnfjörð Bjarmason
>>
>> $ git log --author=Ævar --pretty=format:%an -100 origin/pu |sort|uniq
>> -c|sort -nr
>> 100 Ævar Arnfjörð Bjarmason
>>
>> $ git log --grep=bar --invert-grep --author=Ævar --pretty=format:%an
>> -100 origin/pu |sort|uniq -c|sort -nr
>>      78 Junio C Hamano
>>      14 Jeff King
>>       2 Andreas Heiduk
>>       1 Sahil Dua
>>       1 Rikard Falkeborn
>>       1 Johannes Sixt
>>       1 Johannes Schindelin
>>       1 Ben Peart
>>       1 Ævar Arnfjörð Bjarmason
>>
>> That last command should only find my commits, but instead --author is
>> discarded.
>
> I would have thought that the last command wouldn't find _any_ of your
> commits. Don't we consider --author a greppable pattern and invert it?

There's two unrelated things going on here AFAICT:

* Anthony expected --author to be inverted by --invert-grep, but this
is not how it's documented, it's *only* for inverting the --grep
pattern: "Limit the commits output to ones with log message that do
not match the pattern specified with --grep=<pattern>."

* The --author filter should be applied un-inverted, but isn't, it's
seemingly just ignored in the presence of --grep, actually mostly
ignored, this is bizarre:

$ diff -ru <(git log --grep=bar --invert-grep --pretty=format:%an -100
origin/pu |sort|uniq -c|sort -nr) <(git log --grep=bar --invert-grep
--pretty=format:%an -100 --author=WeMostlyIgnoreThisButNotReally
origin/pu |sort|uniq -c|sort -nr)
--- /dev/fd/63  2017-06-01 21:44:08.952583804 +0200
+++ /dev/fd/62  2017-06-01 21:44:08.952583804 +0200
@@ -1,6 +1,6 @@
-     66 Junio C Hamano
+     65 Junio C Hamano
      10 Jeff King
-      8 Stefan Beller
+      9 Stefan Beller
       5 Lars Schneider
       2 SZEDER Gábor
       1 Tyler Brazier


> By itself:
>
>   $ git log --author=Ævar --invert-grep --format=%an -100 origin/pu |
>     sort | uniq -c | sort -rn
>        79 Junio C Hamano
>         8 Stefan Beller
>         8 Jeff King
>         1 Sahil Dua
>         1 Rikard Falkeborn
>         1 Johannes Sixt
>         1 Johannes Schindelin
>         1 Andreas Heiduk
>
> So that makes sense from the "--author is invertable" world-view.
>
> But I'm puzzled that you show up at all when --grep=bar is added (and
> moreover, that we get _one_ commit from you, not the 5 found in your
> initial command). That seems like a bug.

I think that's the only thing that's not a bug, i.e. we just find 1
match in the first 100 (note -100), sorry for the confusing example.

Anyway, much of the above may be incorrect, I haven't dug deeply
beyond just finding that something's funny going on and we definitely
have *some* bugs here.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: bug: `git log --grep ... --invert-grep --author=...` negates / ignores --author
  2017-06-01 19:45     ` Ævar Arnfjörð Bjarmason
@ 2017-06-01 21:05       ` Anthony Sottile
  2017-06-01 21:38       ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Anthony Sottile @ 2017-06-01 21:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jeff King, Git Mailing List

I actually only expected the --grep to be inverted -- I think I'm on
the same page with what's documented.

I'd be happy to dig into the code and investigate this some more but I
am not familiar with the git codebase, any code hints on where to get
bootstrapped?

Anthony

On Thu, Jun 1, 2017 at 12:45 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Wed, May 31, 2017 at 11:40 PM, Jeff King <peff@peff.net> wrote:
>> On Wed, May 31, 2017 at 08:08:54PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>>> $ git log --grep=bar --author=Ævar --pretty=format:%an -100 origin/pu
>>> |sort|uniq -c|sort -nr
>>> 5 Ævar Arnfjörð Bjarmason
>>>
>>> $ git log --author=Ævar --pretty=format:%an -100 origin/pu |sort|uniq
>>> -c|sort -nr
>>> 100 Ævar Arnfjörð Bjarmason
>>>
>>> $ git log --grep=bar --invert-grep --author=Ævar --pretty=format:%an
>>> -100 origin/pu |sort|uniq -c|sort -nr
>>>      78 Junio C Hamano
>>>      14 Jeff King
>>>       2 Andreas Heiduk
>>>       1 Sahil Dua
>>>       1 Rikard Falkeborn
>>>       1 Johannes Sixt
>>>       1 Johannes Schindelin
>>>       1 Ben Peart
>>>       1 Ævar Arnfjörð Bjarmason
>>>
>>> That last command should only find my commits, but instead --author is
>>> discarded.
>>
>> I would have thought that the last command wouldn't find _any_ of your
>> commits. Don't we consider --author a greppable pattern and invert it?
>
> There's two unrelated things going on here AFAICT:
>
> * Anthony expected --author to be inverted by --invert-grep, but this
> is not how it's documented, it's *only* for inverting the --grep
> pattern: "Limit the commits output to ones with log message that do
> not match the pattern specified with --grep=<pattern>."
>
> * The --author filter should be applied un-inverted, but isn't, it's
> seemingly just ignored in the presence of --grep, actually mostly
> ignored, this is bizarre:
>
> $ diff -ru <(git log --grep=bar --invert-grep --pretty=format:%an -100
> origin/pu |sort|uniq -c|sort -nr) <(git log --grep=bar --invert-grep
> --pretty=format:%an -100 --author=WeMostlyIgnoreThisButNotReally
> origin/pu |sort|uniq -c|sort -nr)
> --- /dev/fd/63  2017-06-01 21:44:08.952583804 +0200
> +++ /dev/fd/62  2017-06-01 21:44:08.952583804 +0200
> @@ -1,6 +1,6 @@
> -     66 Junio C Hamano
> +     65 Junio C Hamano
>       10 Jeff King
> -      8 Stefan Beller
> +      9 Stefan Beller
>        5 Lars Schneider
>        2 SZEDER Gábor
>        1 Tyler Brazier
>
>
>> By itself:
>>
>>   $ git log --author=Ævar --invert-grep --format=%an -100 origin/pu |
>>     sort | uniq -c | sort -rn
>>        79 Junio C Hamano
>>         8 Stefan Beller
>>         8 Jeff King
>>         1 Sahil Dua
>>         1 Rikard Falkeborn
>>         1 Johannes Sixt
>>         1 Johannes Schindelin
>>         1 Andreas Heiduk
>>
>> So that makes sense from the "--author is invertable" world-view.
>>
>> But I'm puzzled that you show up at all when --grep=bar is added (and
>> moreover, that we get _one_ commit from you, not the 5 found in your
>> initial command). That seems like a bug.
>
> I think that's the only thing that's not a bug, i.e. we just find 1
> match in the first 100 (note -100), sorry for the confusing example.
>
> Anyway, much of the above may be incorrect, I haven't dug deeply
> beyond just finding that something's funny going on and we definitely
> have *some* bugs here.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: bug: `git log --grep ... --invert-grep --author=...` negates / ignores --author
  2017-06-01 19:45     ` Ævar Arnfjörð Bjarmason
  2017-06-01 21:05       ` Anthony Sottile
@ 2017-06-01 21:38       ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2017-06-01 21:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Anthony Sottile, Git Mailing List

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Anyway, much of the above may be incorrect, I haven't dug deeply
> beyond just finding that something's funny going on and we definitely
> have *some* bugs here.

One thing that is very correct in what you said is that the
documentation of "--invert-grep" is wrong to mention only "--grep".
It probably is because the author of "--invert-grep" did not realize
that the "--author" thing internally is considered a part of "grep"
processing done in revision.c::commit_match() that chooses which
commit to show.

As you may have noticed, the integration to the grep infrastructure
with "git log" and friends does not expose the power of combination
(e.g. "--and", "--not", etc.), unlike "git grep", to the command
line.  This is purely a technical limitation (e.g. "--not" is taken
to mean a completely different thing in "git log" family) and there
is no fundamental reason why we shouldn't, but the fact that the end
user cannot combine the "--grep" like terms in a flexible way
remains from the beginning to this day.

Within that constraint, --grep and things like --author are defined
to interact in a certain hardcoded way (the former searches
substring in the body part of commit objects while the latter
searches substrings in the "header" part of them), simply because
defining them to be all ORed together does not give us a very useful
semantics.

In general, two or more of these searches always "OR" together,
e.g. "git log --grep=foo --grep=bar" finds commits that mention foo
or commits that mention bar.  However, searches in body and searches
in header "AND" together, e.g. "git log --author=Ævar --grep=foo"
finds commits that mention foo and written by you.  "git log
--author=Ævar --author=gitster@ --grep=foo --grep=bar" finds commits
that mention either foo or bar and written by either you or me.
IIRC, its parse tree would look like

    (AND (OR (author Ævar) (author gitster@))
         (OR (grep foo) (grep bar))

The "--invert-grep" merely adds (NOT ...) around it.

I vaguely recall there was impliations to "all match" when the
header search and the body search is used together.  Without header
search, the usual "git log --grep=foo --grep=bar" becomes

         (OR (grep foo) (grep bar))

and "--all-match" is applied to this "OR" (i.e. by the time the
matcher finishes a single commit's body, all of the terms ORed
together, i.e. (grep foo) and (grep bar), must have matched at least
once for the commit to be considered a match.  Obviously, a commit
can be authored and commited by only a single person, and applying
all-match naively to "git log --author=Ævar --author=gitster@" would
not make any sense (by definition the result would be an empty set),
and IIRC there is a code that copes with this to avoid requiring a
commit that is written by two people (or committed by two people)
when "--all-match" is in effect.

So taken all together, your example and its output makes all sense
to me.  As you mentioned, the examples by the OP was written a bit
too dense for me to grok quickly and I didn't look too carefully at
them to comment.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-06-01 21:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-31 15:45 bug: `git log --grep ... --invert-grep --author=...` negates / ignores --author Anthony Sottile
2017-05-31 18:08 ` Ævar Arnfjörð Bjarmason
2017-05-31 21:40   ` Jeff King
2017-06-01 19:45     ` Ævar Arnfjörð Bjarmason
2017-06-01 21:05       ` Anthony Sottile
2017-06-01 21:38       ` Junio C Hamano
2017-06-01  2:20   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).