git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* git blame --ignore-rev does not work
@ 2020-09-30 21:15 Harrison McCullough
  2020-10-02 21:40 ` René Scharfe
  0 siblings, 1 reply; 5+ messages in thread
From: Harrison McCullough @ 2020-09-30 21:15 UTC (permalink / raw)
  To: git

What did you do before the bug happened?

1. Commit changes to <FILE>
2. Observe that this commit has a hash of <HASH>, e.g. through git rev-parse
   HEAD
3. Run `echo <HASH> > .git-blame-ignore-revs`
4. Run `git config blame.ignoreRevsFile .git-blame-ignore-revs`
5. Run `git blame <FILE>`
6. Run `git blame --ignore-revs-file=.git-blame-ignore-revs <FILE>`
7. Run `git blame --ignore-rev=<HASH> <FILE>`


What did you expect to happen? (Expected behavior)

The three git blame commands should attribute each line of the source file to a
commit, but none of those commits should be the one specified by <HASH>.


What happened instead? (Actual behavior)

All three git blame commands included lines attributed to <HASH>.


What's different between what you expected and what actually happened?

The commit identified by <HASH> was not ignored.


Anything else you want to add:

I tried this in a brand new repository and everything worked as expected. I do
not know why it is only failing in this repository. It is a large repository I
use for work, but I'm using the same version of Git in both places.


[System Info]
git version:
git version 2.28.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Darwin 18.7.0 Darwin Kernel Version 18.7.0: Mon Apr 27 20:09:39
PDT 2020; root:xnu-4903.278.35~1/RELEASE_X86_64 x86_64
compiler info: clang: 11.0.0 (clang-1100.0.33.17)
libc info: no libc information available
$SHELL (typically, interactive shell): /usr/local/bin/bash


[Enabled Hooks]
post-commit
post-checkout
post-merge
pre-push

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: git blame --ignore-rev does not work
  2020-09-30 21:15 git blame --ignore-rev does not work Harrison McCullough
@ 2020-10-02 21:40 ` René Scharfe
  2020-10-02 22:44   ` Barret Rhoden
  0 siblings, 1 reply; 5+ messages in thread
From: René Scharfe @ 2020-10-02 21:40 UTC (permalink / raw)
  To: Harrison McCullough, git; +Cc: Barret Rhoden

Am 30.09.20 um 23:15 schrieb Harrison McCullough:
> What did you do before the bug happened?
>
> 1. Commit changes to <FILE>
> 2. Observe that this commit has a hash of <HASH>, e.g. through git rev-parse
>    HEAD
> 3. Run `echo <HASH> > .git-blame-ignore-revs`
> 4. Run `git config blame.ignoreRevsFile .git-blame-ignore-revs`
> 5. Run `git blame <FILE>`
> 6. Run `git blame --ignore-revs-file=.git-blame-ignore-revs <FILE>`
> 7. Run `git blame --ignore-rev=<HASH> <FILE>`
>
>
> What did you expect to happen? (Expected behavior)
>
> The three git blame commands should attribute each line of the source file to a
> commit, but none of those commits should be the one specified by <HASH>.
>
>
> What happened instead? (Actual behavior)
>
> All three git blame commands included lines attributed to <HASH>.
>
>
> What's different between what you expected and what actually happened?
>
> The commit identified by <HASH> was not ignored.

Well, your expectation sounds reasonable, but apparently it's not that
easy.  Consider this sentence from the description of --ignore-rev on
the manpage of git blame:

    If the `blame.markUnblamableLines` config option is set, then those
    lines touched by an ignored commit that we could not attribute to
    another revision are marked with a '*'.

So some commits just cannot be ignored by the current version of git
blame.  The commit message of ae3f36dea1 (blame: add the ability to
ignore commits and their changes, 2019-05-15), which introduced that
feature, mentions an example.  And this silly script finds that 365 of
the 765 commits blamed for Git's own Makefile are examples as well:

  file=Makefile
  rev=v2.28.0
  hashes=$(git blame "$rev" "$file" | awk '{print $1}' | sort -u)
  echo "$hashes" | wc -l
  for hash in $hashes
  do
     if git blame --ignore-rev=$hash "$rev" "$file" | grep -q "^$hash "
     then
       echo $hash
     fi
  done | wc -l

I don't know if these revisions are not ignored due to bugs or because
the feature just isn't strong enough, yet, but I would expect your
particular case to be represented by at least one of these...

> Anything else you want to add:
>
> I tried this in a brand new repository and everything worked as expected. I do
> not know why it is only failing in this repository. It is a large repository I
> use for work, but I'm using the same version of Git in both places.

... so this might not be a problem, as finding public examples seems
to be easy.

>
>
> [System Info]
> git version:
> git version 2.28.0
> cpu: x86_64
> no commit associated with this build
> sizeof-long: 8
> sizeof-size_t: 8
> shell-path: /bin/sh
> uname: Darwin 18.7.0 Darwin Kernel Version 18.7.0: Mon Apr 27 20:09:39
> PDT 2020; root:xnu-4903.278.35~1/RELEASE_X86_64 x86_64
> compiler info: clang: 11.0.0 (clang-1100.0.33.17)
> libc info: no libc information available
> $SHELL (typically, interactive shell): /usr/local/bin/bash
>
>
> [Enabled Hooks]
> post-commit
> post-checkout
> post-merge
> pre-push
>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: git blame --ignore-rev does not work
  2020-10-02 21:40 ` René Scharfe
@ 2020-10-02 22:44   ` Barret Rhoden
  2020-10-02 22:52     ` Harrison McCullough
  0 siblings, 1 reply; 5+ messages in thread
From: Barret Rhoden @ 2020-10-02 22:44 UTC (permalink / raw)
  To: René Scharfe, Harrison McCullough, git

Hi -

On 10/2/20 5:40 PM, René Scharfe wrote:
[snip]
> I don't know if these revisions are not ignored due to bugs or because
> the feature just isn't strong enough, yet, but I would expect your
> particular case to be represented by at least one of these...

Correct.

When skipping a revision, the algorithm attempts to find another 
revision that could be responsible for the change.  But it might not be 
able to find anything.  Consider a commit that just adds a few lines to 
a file with only 'foo' and 'bar':

commit: "Adding Lines"
-------------
  foo
+No commit
+ever touched
+these lines
  bar

If we ignored that revision, which commit do we assign those lines to? 
If they were "similar" to the existing lines, then the algorithm might 
match.  But in general, we can't find 'correct' (as defined by a user) 
matches for arbitrary changes.

I usually run git with these settings:

[blame]
         ignorerevsfile = .git-blame-ignore-revs
         markIgnoredLines = true
         markUnblamableLines = true

Which points out when --ignore-revs is doing something.

Thanks,

Barret



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: git blame --ignore-rev does not work
  2020-10-02 22:44   ` Barret Rhoden
@ 2020-10-02 22:52     ` Harrison McCullough
  2020-10-03  0:56       ` Barret Rhoden
  0 siblings, 1 reply; 5+ messages in thread
From: Harrison McCullough @ 2020-10-02 22:52 UTC (permalink / raw)
  To: Barret Rhoden; +Cc: René Scharfe, git

Thank you for your feedback! I do have some more information to
provide that is confusing.

I tried running `git blame -w`, and this correctly ignores the
revision I tried to ignore with `--ignore-rev`, etc. So it appears
that the algorithm to attribute lines to commits is capable of
ignoring the commit in question (in the lines I've inspected) but it's
not doing it when I use the "ignore-rev" capability—only the "ignore
whitespace changes" capability.

Does anyone have any ideas about why that may be the case? Does the
"ignore whitespace" and "ignore commit" algorithms use different
logic? I would have assumed that they shared most of the logic.

I would love to provide a concrete example, but the only time I've
been able to reproduce this is with proprietary code. I'll try to
create a new repository with a similar commit and see if I can ignore
it there.

For the information of those interested, the commit I'm trying to
ignore is a "reformat the world" commit. We introduced the tool
"astyle" into our codebase, and as part of that effort I ran astyle
over our entire codebase.

Is it possible that the commit isn't being ignored because it's too
big? It did change over 1300 files....

On Fri, Oct 2, 2020 at 4:44 PM Barret Rhoden <brho@google.com> wrote:
>
> Hi -
>
> On 10/2/20 5:40 PM, René Scharfe wrote:
> [snip]
> > I don't know if these revisions are not ignored due to bugs or because
> > the feature just isn't strong enough, yet, but I would expect your
> > particular case to be represented by at least one of these...
>
> Correct.
>
> When skipping a revision, the algorithm attempts to find another
> revision that could be responsible for the change.  But it might not be
> able to find anything.  Consider a commit that just adds a few lines to
> a file with only 'foo' and 'bar':
>
> commit: "Adding Lines"
> -------------
>   foo
> +No commit
> +ever touched
> +these lines
>   bar
>
> If we ignored that revision, which commit do we assign those lines to?
> If they were "similar" to the existing lines, then the algorithm might
> match.  But in general, we can't find 'correct' (as defined by a user)
> matches for arbitrary changes.
>
> I usually run git with these settings:
>
> [blame]
>          ignorerevsfile = .git-blame-ignore-revs
>          markIgnoredLines = true
>          markUnblamableLines = true
>
> Which points out when --ignore-revs is doing something.
>
> Thanks,
>
> Barret
>
>


-- 
-Harrison McCullough

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: git blame --ignore-rev does not work
  2020-10-02 22:52     ` Harrison McCullough
@ 2020-10-03  0:56       ` Barret Rhoden
  0 siblings, 0 replies; 5+ messages in thread
From: Barret Rhoden @ 2020-10-03  0:56 UTC (permalink / raw)
  To: Harrison McCullough; +Cc: René Scharfe, git

Hi -

On 10/2/20 6:52 PM, Harrison McCullough wrote:
> Thank you for your feedback! I do have some more information to
> provide that is confusing.
> 
> I tried running `git blame -w`, and this correctly ignores the
> revision I tried to ignore with `--ignore-rev`, etc. So it appears
> that the algorithm to attribute lines to commits is capable of
> ignoring the commit in question (in the lines I've inspected) but it's
> not doing it when I use the "ignore-rev" capability—only the "ignore
> whitespace changes" capability.
> 
> Does anyone have any ideas about why that may be the case? Does the
> "ignore whitespace" and "ignore commit" algorithms use different
> logic? I would have assumed that they shared most of the logic.

Yeah, the logic is a little different.  IIRC, the "whitespace ignore" 
affects individual diff_hunks that are generated during the blame 
process, generated by calling into xdiff.  That's when blame tries and 
figure out what lines change from target/child to parent.  So the "blame 
chunk" that gets analyzed is different, depending on whitespace-ignore 
or not.

The "blame ignore" logic happens after that, and it operates on the 
output of xdiff.  If the 'target' is one of the ignored commits, it 
looks at those chunks produced by xdiff (differences from target commit 
to parent) and for each line in the target commit, attempts to figure 
out what line in the parent to match it to.

The matching process is imperfect.  It uses a fingerprinting algorithm 
to find a likely candidate line in the parent's diff hunk.  The 
fingerprinting is based on matching the two-character pairs in each 
string.  (I didn't write the fingerprinting, btw).  It does some ranking 
of the lines, picks the best one, and a few other 'search quality' 
things.  etc.  If it fails to find a matching line in the hunk, it'll do 
a simple O(n) scan in the entire file.  But when it looks in the entire 
file, it has a threshold of similarity, so you don't match arbitrary 
things.  (Threshold is 10 two-letter pairs, I think).

Anyway, my guess is that when you use the ignore-whitespace option, it 
changes the diff hunks enough that blame_chunk() gives a different 
result.  This could be because there are larger diff hunks (more search 
space for the "good" initial scan).  Or maybe because you have a 
different set of two-character pairs or something.

> I would love to provide a concrete example, but the only time I've
> been able to reproduce this is with proprietary code. I'll try to
> create a new repository with a similar commit and see if I can ignore
> it there.

I'd be glad to take a look, though I don't know how fixable it is. 
My guess is you're hitting right on the edge of the "find a similar line 
inside a hunk" and "couldn't find a good change in the entire file" 
threshold, and any change would just be tuning it for your repo.  But 
maybe not, and I'd like for --ignore-rev to work for you.

> For the information of those interested, the commit I'm trying to
> ignore is a "reformat the world" commit. We introduced the tool
> "astyle" into our codebase, and as part of that effort I ran astyle
> over our entire codebase.

Same situation for me - reformatted a lot of code and didn't want to 
break blame.  =)  That was the original motivation.  I've actually used 
it a lot for other things now, such as finding out which commit changed 
a line in a particular way.  git blame, look around.  git show XXX, 
realize i want and older commit, git blame --ignore-rev XXX, etc.

> Is it possible that the commit isn't being ignored because it's too
> big? It did change over 1300 files....

That should be OK.  The algorithm doesn't care about the other files in 
the commit - only the one that you are blaming.

Thanks,

Barret


> 
> On Fri, Oct 2, 2020 at 4:44 PM Barret Rhoden <brho@google.com> wrote:
>>
>> Hi -
>>
>> On 10/2/20 5:40 PM, René Scharfe wrote:
>> [snip]
>>> I don't know if these revisions are not ignored due to bugs or because
>>> the feature just isn't strong enough, yet, but I would expect your
>>> particular case to be represented by at least one of these...
>>
>> Correct.
>>
>> When skipping a revision, the algorithm attempts to find another
>> revision that could be responsible for the change.  But it might not be
>> able to find anything.  Consider a commit that just adds a few lines to
>> a file with only 'foo' and 'bar':
>>
>> commit: "Adding Lines"
>> -------------
>>    foo
>> +No commit
>> +ever touched
>> +these lines
>>    bar
>>
>> If we ignored that revision, which commit do we assign those lines to?
>> If they were "similar" to the existing lines, then the algorithm might
>> match.  But in general, we can't find 'correct' (as defined by a user)
>> matches for arbitrary changes.
>>
>> I usually run git with these settings:
>>
>> [blame]
>>           ignorerevsfile = .git-blame-ignore-revs
>>           markIgnoredLines = true
>>           markUnblamableLines = true
>>
>> Which points out when --ignore-revs is doing something.
>>
>> Thanks,
>>
>> Barret
>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-10-03  0:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-30 21:15 git blame --ignore-rev does not work Harrison McCullough
2020-10-02 21:40 ` René Scharfe
2020-10-02 22:44   ` Barret Rhoden
2020-10-02 22:52     ` Harrison McCullough
2020-10-03  0:56       ` Barret Rhoden

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).