From: Barret Rhoden <brho@google.com>
To: Harrison McCullough <mccullough.harrison@gmail.com>
Cc: "René Scharfe" <l.s.r@web.de>, git@vger.kernel.org
Subject: Re: git blame --ignore-rev does not work
Date: Fri, 2 Oct 2020 20:56:27 -0400 [thread overview]
Message-ID: <c77eb54a-ecfd-508b-ed6a-030e66d8e257@google.com> (raw)
In-Reply-To: <CAHLeu+zSaTwPEDQ=CuFua0NdEppM+OjFaREU+Yiy9udK1OUK4w@mail.gmail.com>
Hi -
On 10/2/20 6:52 PM, Harrison McCullough wrote:
> Thank you for your feedback! I do have some more information to
> provide that is confusing.
>
> I tried running `git blame -w`, and this correctly ignores the
> revision I tried to ignore with `--ignore-rev`, etc. So it appears
> that the algorithm to attribute lines to commits is capable of
> ignoring the commit in question (in the lines I've inspected) but it's
> not doing it when I use the "ignore-rev" capability—only the "ignore
> whitespace changes" capability.
>
> Does anyone have any ideas about why that may be the case? Does the
> "ignore whitespace" and "ignore commit" algorithms use different
> logic? I would have assumed that they shared most of the logic.
Yeah, the logic is a little different. IIRC, the "whitespace ignore"
affects individual diff_hunks that are generated during the blame
process, generated by calling into xdiff. That's when blame tries and
figure out what lines change from target/child to parent. So the "blame
chunk" that gets analyzed is different, depending on whitespace-ignore
or not.
The "blame ignore" logic happens after that, and it operates on the
output of xdiff. If the 'target' is one of the ignored commits, it
looks at those chunks produced by xdiff (differences from target commit
to parent) and for each line in the target commit, attempts to figure
out what line in the parent to match it to.
The matching process is imperfect. It uses a fingerprinting algorithm
to find a likely candidate line in the parent's diff hunk. The
fingerprinting is based on matching the two-character pairs in each
string. (I didn't write the fingerprinting, btw). It does some ranking
of the lines, picks the best one, and a few other 'search quality'
things. etc. If it fails to find a matching line in the hunk, it'll do
a simple O(n) scan in the entire file. But when it looks in the entire
file, it has a threshold of similarity, so you don't match arbitrary
things. (Threshold is 10 two-letter pairs, I think).
Anyway, my guess is that when you use the ignore-whitespace option, it
changes the diff hunks enough that blame_chunk() gives a different
result. This could be because there are larger diff hunks (more search
space for the "good" initial scan). Or maybe because you have a
different set of two-character pairs or something.
> I would love to provide a concrete example, but the only time I've
> been able to reproduce this is with proprietary code. I'll try to
> create a new repository with a similar commit and see if I can ignore
> it there.
I'd be glad to take a look, though I don't know how fixable it is.
My guess is you're hitting right on the edge of the "find a similar line
inside a hunk" and "couldn't find a good change in the entire file"
threshold, and any change would just be tuning it for your repo. But
maybe not, and I'd like for --ignore-rev to work for you.
> For the information of those interested, the commit I'm trying to
> ignore is a "reformat the world" commit. We introduced the tool
> "astyle" into our codebase, and as part of that effort I ran astyle
> over our entire codebase.
Same situation for me - reformatted a lot of code and didn't want to
break blame. =) That was the original motivation. I've actually used
it a lot for other things now, such as finding out which commit changed
a line in a particular way. git blame, look around. git show XXX,
realize i want and older commit, git blame --ignore-rev XXX, etc.
> Is it possible that the commit isn't being ignored because it's too
> big? It did change over 1300 files....
That should be OK. The algorithm doesn't care about the other files in
the commit - only the one that you are blaming.
Thanks,
Barret
>
> On Fri, Oct 2, 2020 at 4:44 PM Barret Rhoden <brho@google.com> wrote:
>>
>> Hi -
>>
>> On 10/2/20 5:40 PM, René Scharfe wrote:
>> [snip]
>>> I don't know if these revisions are not ignored due to bugs or because
>>> the feature just isn't strong enough, yet, but I would expect your
>>> particular case to be represented by at least one of these...
>>
>> Correct.
>>
>> When skipping a revision, the algorithm attempts to find another
>> revision that could be responsible for the change. But it might not be
>> able to find anything. Consider a commit that just adds a few lines to
>> a file with only 'foo' and 'bar':
>>
>> commit: "Adding Lines"
>> -------------
>> foo
>> +No commit
>> +ever touched
>> +these lines
>> bar
>>
>> If we ignored that revision, which commit do we assign those lines to?
>> If they were "similar" to the existing lines, then the algorithm might
>> match. But in general, we can't find 'correct' (as defined by a user)
>> matches for arbitrary changes.
>>
>> I usually run git with these settings:
>>
>> [blame]
>> ignorerevsfile = .git-blame-ignore-revs
>> markIgnoredLines = true
>> markUnblamableLines = true
>>
>> Which points out when --ignore-revs is doing something.
>>
>> Thanks,
>>
>> Barret
>>
>>
>
>
prev parent reply other threads:[~2020-10-03 0:56 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-30 21:15 git blame --ignore-rev does not work Harrison McCullough
2020-10-02 21:40 ` René Scharfe
2020-10-02 22:44 ` Barret Rhoden
2020-10-02 22:52 ` Harrison McCullough
2020-10-03 0:56 ` Barret Rhoden [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c77eb54a-ecfd-508b-ed6a-030e66d8e257@google.com \
--to=brho@google.com \
--cc=git@vger.kernel.org \
--cc=l.s.r@web.de \
--cc=mccullough.harrison@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).