A new idea to extend git-blame

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* A new idea to extend git-blame
@ 2019-11-25 12:41 Chen Bin
  2019-11-25 14:16 ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Chen Bin @ 2019-11-25 12:41 UTC (permalink / raw)
  To: git

I already implemented the idea in Emacs Lisp. See,

http://blog.binchen.org/posts/effective-git-blame-in-emacs.html

I tested at https://github.com/redguardtoo/test-git-blame

Looks it works.

The only issue is Lisp is slow in big code base.

So I'm thinking I could re-implement it in C instead.

My question is, *who can I contact to understand git-blame?

I'm experienced at C but need some expert's guide.

The key algorithm is simple,

The algorithm only works for one line blame and the user must
select text inside the line first.

Step 1, `git blame -L6,1 --porcelain -- hello.js` output,

    4f87408612e0dacfd89a1cd2515944e21cf68561 6 6 1
    skip...
    filename hello.js
     doit({bad: 'destroy world', good: 'hello world', ...});

I got the commit id (1st column), the line number (2nd column),
file name (hello.js) and the code line (last line).

Step 2, if the code line does not contain the selected text, the
  recursive search stops

Step 3, or else use commit id, line number and file name to build
  new git blame cli, like,

`git blame -L line-num,1 --porcelain 4f8740^ file-name`

Step 4, execute new git blame command and start from Step 1

Here is my first commit (added some debug code),

https://github.com/redguardtoo/git/commit/d01d26f2df

-- 
Best Regards,
Chen Bin

--
Help me, help you

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: A new idea to extend git-blame
  2019-11-25 12:41 A new idea to extend git-blame Chen Bin
@ 2019-11-25 14:16 ` Jeff King
  2019-11-26  4:55   ` chen bin
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2019-11-25 14:16 UTC (permalink / raw)
  To: Chen Bin; +Cc: git

On Mon, Nov 25, 2019 at 11:41:55PM +1100, Chen Bin wrote:

> The key algorithm is simple,
> 
> The algorithm only works for one line blame and the user must
> select text inside the line first.
> 
> Step 1, `git blame -L6,1 --porcelain -- hello.js` output,
> 
>     4f87408612e0dacfd89a1cd2515944e21cf68561 6 6 1
>     skip...
>     filename hello.js
>      doit({bad: 'destroy world', good: 'hello world', ...});
> 
> I got the commit id (1st column), the line number (2nd column),
> file name (hello.js) and the code line (last line).
> 
> Step 2, if the code line does not contain the selected text, the
>   recursive search stops
> 
> Step 3, or else use commit id, line number and file name to build
>   new git blame cli, like,
> 
> `git blame -L line-num,1 --porcelain 4f8740^ file-name`
> 
> Step 4, execute new git blame command and start from Step 1

This sounds a lot like how git-log's "-L" option works, which tries to
find the history of a line over many changes.

It's also similar to the "re-blame from parent" feature of many blame
viewers. There we have a human in the loop saying "no, this is not quite
the change I'm looking for; go back further".

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: A new idea to extend git-blame
  2019-11-25 14:16 ` Jeff King
@ 2019-11-26  4:55   ` chen bin
  2019-11-26  5:36     ` chen bin
  0 siblings, 1 reply; 8+ messages in thread
From: chen bin @ 2019-11-26  4:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Git ML

I re-tested `git log -L20,20:README.md` in git's own repo with HEAD
d01d26f2df. Looks git log is not what I expected. The output contains
many unrelated commits. So it will be slow in real project.

A recursive blame with the algorithm I suggest will get correct commit
in short time. So my suggestion still hold.

I could submit a patch to enhance blame.

On Tue, Nov 26, 2019 at 1:16 AM Jeff King <peff@peff.net> wrote:
>
> On Mon, Nov 25, 2019 at 11:41:55PM +1100, Chen Bin wrote:
>
> > The key algorithm is simple,
> >
> > The algorithm only works for one line blame and the user must
> > select text inside the line first.
> >
> > Step 1, `git blame -L6,1 --porcelain -- hello.js` output,
> >
> >     4f87408612e0dacfd89a1cd2515944e21cf68561 6 6 1
> >     skip...
> >     filename hello.js
> >      doit({bad: 'destroy world', good: 'hello world', ...});
> >
> > I got the commit id (1st column), the line number (2nd column),
> > file name (hello.js) and the code line (last line).
> >
> > Step 2, if the code line does not contain the selected text, the
> >   recursive search stops
> >
> > Step 3, or else use commit id, line number and file name to build
> >   new git blame cli, like,
> >
> > `git blame -L line-num,1 --porcelain 4f8740^ file-name`
> >
> > Step 4, execute new git blame command and start from Step 1
>
> This sounds a lot like how git-log's "-L" option works, which tries to
> find the history of a line over many changes.
>
> It's also similar to the "re-blame from parent" feature of many blame
> viewers. There we have a human in the loop saying "no, this is not quite
> the change I'm looking for; go back further".
>
> -Peff



-- 
help me, help you.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: A new idea to extend git-blame
  2019-11-26  4:55   ` chen bin
@ 2019-11-26  5:36     ` chen bin
  2019-11-27  7:32       ` chen bin
  0 siblings, 1 reply; 8+ messages in thread
From: chen bin @ 2019-11-26  5:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Git ML

The problem of recursive blaming with 3rd party script is it's slow.
The script have to restart git process multiple times. Besides, as I
already did in Emacs (see
https://github.com/redguardtoo/vc-msg/commit/1f7775f951fdd9e6571afbb3a767c42d20617f52),
it's still lots of code.

So if it's possible to use a simple cli option like `git blame
-S"text"`to do the same thing, it would be easier to migrate vc-msg's
functionality to other text editors.

On Tue, Nov 26, 2019 at 3:55 PM chen bin <chenbin.sh@gmail.com> wrote:
>
> I re-tested `git log -L20,20:README.md` in git's own repo with HEAD
> d01d26f2df. Looks git log is not what I expected. The output contains
> many unrelated commits. So it will be slow in real project.
>
> A recursive blame with the algorithm I suggest will get correct commit
> in short time. So my suggestion still hold.
>
> I could submit a patch to enhance blame.
>
> On Tue, Nov 26, 2019 at 1:16 AM Jeff King <peff@peff.net> wrote:
> >
> > On Mon, Nov 25, 2019 at 11:41:55PM +1100, Chen Bin wrote:
> >
> > > The key algorithm is simple,
> > >
> > > The algorithm only works for one line blame and the user must
> > > select text inside the line first.
> > >
> > > Step 1, `git blame -L6,1 --porcelain -- hello.js` output,
> > >
> > >     4f87408612e0dacfd89a1cd2515944e21cf68561 6 6 1
> > >     skip...
> > >     filename hello.js
> > >      doit({bad: 'destroy world', good: 'hello world', ...});
> > >
> > > I got the commit id (1st column), the line number (2nd column),
> > > file name (hello.js) and the code line (last line).
> > >
> > > Step 2, if the code line does not contain the selected text, the
> > >   recursive search stops
> > >
> > > Step 3, or else use commit id, line number and file name to build
> > >   new git blame cli, like,
> > >
> > > `git blame -L line-num,1 --porcelain 4f8740^ file-name`
> > >
> > > Step 4, execute new git blame command and start from Step 1
> >
> > This sounds a lot like how git-log's "-L" option works, which tries to
> > find the history of a line over many changes.
> >
> > It's also similar to the "re-blame from parent" feature of many blame
> > viewers. There we have a human in the loop saying "no, this is not quite
> > the change I'm looking for; go back further".
> >
> > -Peff
>
>
>
> --
> help me, help you.



-- 
help me, help you.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: A new idea to extend git-blame
  2019-11-26  5:36     ` chen bin
@ 2019-11-27  7:32       ` chen bin
  2019-11-27 11:30         ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: chen bin @ 2019-11-27  7:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Git ML

HI, Jeff,
Just double checking, the feature I suggested is fine, right? I will
send the patch asap. It may take 2 weeks to implement.

On Tue, Nov 26, 2019 at 4:36 PM chen bin <chenbin.sh@gmail.com> wrote:
>
> The problem of recursive blaming with 3rd party script is it's slow.
> The script have to restart git process multiple times. Besides, as I
> already did in Emacs (see
> https://github.com/redguardtoo/vc-msg/commit/1f7775f951fdd9e6571afbb3a767c42d20617f52),
> it's still lots of code.
>
> So if it's possible to use a simple cli option like `git blame
> -S"text"`to do the same thing, it would be easier to migrate vc-msg's
> functionality to other text editors.
>
> On Tue, Nov 26, 2019 at 3:55 PM chen bin <chenbin.sh@gmail.com> wrote:
> >
> > I re-tested `git log -L20,20:README.md` in git's own repo with HEAD
> > d01d26f2df. Looks git log is not what I expected. The output contains
> > many unrelated commits. So it will be slow in real project.
> >
> > A recursive blame with the algorithm I suggest will get correct commit
> > in short time. So my suggestion still hold.
> >
> > I could submit a patch to enhance blame.
> >
> > On Tue, Nov 26, 2019 at 1:16 AM Jeff King <peff@peff.net> wrote:
> > >
> > > On Mon, Nov 25, 2019 at 11:41:55PM +1100, Chen Bin wrote:
> > >
> > > > The key algorithm is simple,
> > > >
> > > > The algorithm only works for one line blame and the user must
> > > > select text inside the line first.
> > > >
> > > > Step 1, `git blame -L6,1 --porcelain -- hello.js` output,
> > > >
> > > >     4f87408612e0dacfd89a1cd2515944e21cf68561 6 6 1
> > > >     skip...
> > > >     filename hello.js
> > > >      doit({bad: 'destroy world', good: 'hello world', ...});
> > > >
> > > > I got the commit id (1st column), the line number (2nd column),
> > > > file name (hello.js) and the code line (last line).
> > > >
> > > > Step 2, if the code line does not contain the selected text, the
> > > >   recursive search stops
> > > >
> > > > Step 3, or else use commit id, line number and file name to build
> > > >   new git blame cli, like,
> > > >
> > > > `git blame -L line-num,1 --porcelain 4f8740^ file-name`
> > > >
> > > > Step 4, execute new git blame command and start from Step 1
> > >
> > > This sounds a lot like how git-log's "-L" option works, which tries to
> > > find the history of a line over many changes.
> > >
> > > It's also similar to the "re-blame from parent" feature of many blame
> > > viewers. There we have a human in the loop saying "no, this is not quite
> > > the change I'm looking for; go back further".
> > >
> > > -Peff
> >
> >
> >
> > --
> > help me, help you.
>
>
>
> --
> help me, help you.



-- 
help me, help you.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: A new idea to extend git-blame
  2019-11-27  7:32       ` chen bin
@ 2019-11-27 11:30         ` Jeff King
  2019-11-27 12:18           ` chen bin
  2019-11-27 13:38           ` chen bin
  0 siblings, 2 replies; 8+ messages in thread
From: Jeff King @ 2019-11-27 11:30 UTC (permalink / raw)
  To: chen bin; +Cc: Git ML

On Wed, Nov 27, 2019 at 06:32:37PM +1100, chen bin wrote:

> Just double checking, the feature I suggested is fine, right? I will
> send the patch asap. It may take 2 weeks to implement.

To be clear, I can't say whether a patch is fine or not without seeing
the patch. :)

I'm not entirely sure I understand what you're proposing to implement.
But if it's of interest to you, maybe it makes sense to see if you can
make it work to your satisfaction, and then we can all look at the patch
and what it does to see if it makes sense to include in Git?

> > > I re-tested `git log -L20,20:README.md` in git's own repo with HEAD
> > > d01d26f2df. Looks git log is not what I expected. The output contains
> > > many unrelated commits. So it will be slow in real project.

Looking at the output from that command, the issue is that it's
imperfect to decide which lines in the pre- and post-image correspond to
each other. The first commit is:

  diff --git a/README.md b/README.md
  --- a/README.md
  +++ b/README.md
  @@ -26,3 +26,1 @@
  -See Documentation/gittutorial.txt to get started, then see
  -Documentation/giteveryday.txt for a useful minimum set of commands, and
  -Documentation/git-commandname.txt for documentation of each command.
  +See [Documentation/gittutorial.txt][] to get started, then see

at which point we consider all three of the pre-image lines to be
potentially interestnig. And then the next commit:

  diff --git a/README b/README
  --- a/README
  +++ b/README
  @@ -29,3 +29,3 @@
   See Documentation/gittutorial.txt to get started, then see
  -Documentation/everyday.txt for a useful minimum set of commands, and
  +Documentation/giteveryday.txt for a useful minimum set of commands, and
   Documentation/git-commandname.txt for documentation of each command.

touches one of those lines, and so forth. A human might see that in the
first hunk, it was probably the first line of the hunk that was
interesting to keep following backwards. But I don't think it can be
done automatically (which is why manual "reblame from parent" is still a
useful technique).

It sounds like your suggestion is to take some anchor text on the line
to decide which lines to keep following. But then it sounds a lot more
like a "log -L" feature than a git-blame feature.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: A new idea to extend git-blame
  2019-11-27 11:30         ` Jeff King
@ 2019-11-27 12:18           ` chen bin
  2019-11-27 13:38           ` chen bin
  1 sibling, 0 replies; 8+ messages in thread
From: chen bin @ 2019-11-27 12:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Git ML

Got it. I will try to finish the patch first.

On Wed, Nov 27, 2019 at 10:30 PM Jeff King <peff@peff.net> wrote:
>
> On Wed, Nov 27, 2019 at 06:32:37PM +1100, chen bin wrote:
>
> > Just double checking, the feature I suggested is fine, right? I will
> > send the patch asap. It may take 2 weeks to implement.
>
> To be clear, I can't say whether a patch is fine or not without seeing
> the patch. :)
>
> I'm not entirely sure I understand what you're proposing to implement.
> But if it's of interest to you, maybe it makes sense to see if you can
> make it work to your satisfaction, and then we can all look at the patch
> and what it does to see if it makes sense to include in Git?
>
> > > > I re-tested `git log -L20,20:README.md` in git's own repo with HEAD
> > > > d01d26f2df. Looks git log is not what I expected. The output contains
> > > > many unrelated commits. So it will be slow in real project.
>
> Looking at the output from that command, the issue is that it's
> imperfect to decide which lines in the pre- and post-image correspond to
> each other. The first commit is:
>
>   diff --git a/README.md b/README.md
>   --- a/README.md
>   +++ b/README.md
>   @@ -26,3 +26,1 @@
>   -See Documentation/gittutorial.txt to get started, then see
>   -Documentation/giteveryday.txt for a useful minimum set of commands, and
>   -Documentation/git-commandname.txt for documentation of each command.
>   +See [Documentation/gittutorial.txt][] to get started, then see
>
> at which point we consider all three of the pre-image lines to be
> potentially interestnig. And then the next commit:
>
>   diff --git a/README b/README
>   --- a/README
>   +++ b/README
>   @@ -29,3 +29,3 @@
>    See Documentation/gittutorial.txt to get started, then see
>   -Documentation/everyday.txt for a useful minimum set of commands, and
>   +Documentation/giteveryday.txt for a useful minimum set of commands, and
>    Documentation/git-commandname.txt for documentation of each command.
>
> touches one of those lines, and so forth. A human might see that in the
> first hunk, it was probably the first line of the hunk that was
> interesting to keep following backwards. But I don't think it can be
> done automatically (which is why manual "reblame from parent" is still a
> useful technique).
>
> It sounds like your suggestion is to take some anchor text on the line
> to decide which lines to keep following. But then it sounds a lot more
> like a "log -L" feature than a git-blame feature.
>
> -Peff



-- 
help me, help you.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: A new idea to extend git-blame
  2019-11-27 11:30         ` Jeff King
  2019-11-27 12:18           ` chen bin
@ 2019-11-27 13:38           ` chen bin
  1 sibling, 0 replies; 8+ messages in thread
From: chen bin @ 2019-11-27 13:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Git ML

What I did is "partial line blame", select partial line, do recursive blame,
stop when the selected text disappears.

There are screenshots at
http://blog.binchen.org/posts/effective-git-blame-in-emacs.html

I already implemented in Emacs Lisp. For now looks it works.

I understand `git log -L20,20:README.md` can track the change before `README`
is renamed to `README.md`. But my recursive blame code can't.

The problem is, `git log -L20,20:README.md` output too much text.
See https://gist.github.com/redguardtoo/a0c178350414449c45e6b67e16249000

It's difficult to track the correct commit if anchor text is generic
work like "true".

For example, the full text of line 20 in `README.md` (HEAD is d9f6f3b619 ) is,
`See [Documentation/gittutorial.txt][] to get started, then see`
To detect the commit adding the word `see`, the recursive blame algorithm stops
at commit 6164972018 because 6164972018^ rename `README` to `README.md`.

That's acceptable in real world because users can base the manual
blame on 6164972018.

But in the output of `git log command`, there are too many
"occurrence" of word "see" so it's
difficult to find the correct commit.

On Wed, Nov 27, 2019 at 10:30 PM Jeff King <peff@peff.net> wrote:
>
> On Wed, Nov 27, 2019 at 06:32:37PM +1100, chen bin wrote:
>
> > Just double checking, the feature I suggested is fine, right? I will
> > send the patch asap. It may take 2 weeks to implement.
>
> To be clear, I can't say whether a patch is fine or not without seeing
> the patch. :)
>
> I'm not entirely sure I understand what you're proposing to implement.
> But if it's of interest to you, maybe it makes sense to see if you can
> make it work to your satisfaction, and then we can all look at the patch
> and what it does to see if it makes sense to include in Git?
>
> > > > I re-tested `git log -L20,20:README.md` in git's own repo with HEAD
> > > > d01d26f2df. Looks git log is not what I expected. The output contains
> > > > many unrelated commits. So it will be slow in real project.
>
> Looking at the output from that command, the issue is that it's
> imperfect to decide which lines in the pre- and post-image correspond to
> each other. The first commit is:
>
>   diff --git a/README.md b/README.md
>   --- a/README.md
>   +++ b/README.md
>   @@ -26,3 +26,1 @@
>   -See Documentation/gittutorial.txt to get started, then see
>   -Documentation/giteveryday.txt for a useful minimum set of commands, and
>   -Documentation/git-commandname.txt for documentation of each command.
>   +See [Documentation/gittutorial.txt][] to get started, then see
>
> at which point we consider all three of the pre-image lines to be
> potentially interestnig. And then the next commit:
>
>   diff --git a/README b/README
>   --- a/README
>   +++ b/README
>   @@ -29,3 +29,3 @@
>    See Documentation/gittutorial.txt to get started, then see
>   -Documentation/everyday.txt for a useful minimum set of commands, and
>   +Documentation/giteveryday.txt for a useful minimum set of commands, and
>    Documentation/git-commandname.txt for documentation of each command.
>
> touches one of those lines, and so forth. A human might see that in the
> first hunk, it was probably the first line of the hunk that was
> interesting to keep following backwards. But I don't think it can be
> done automatically (which is why manual "reblame from parent" is still a
> useful technique).
>
> It sounds like your suggestion is to take some anchor text on the line
> to decide which lines to keep following. But then it sounds a lot more
> like a "log -L" feature than a git-blame feature.
>
> -Peff



-- 
help me, help you.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-11-27 13:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-25 12:41 A new idea to extend git-blame Chen Bin
2019-11-25 14:16 ` Jeff King
2019-11-26  4:55   ` chen bin
2019-11-26  5:36     ` chen bin
2019-11-27  7:32       ` chen bin
2019-11-27 11:30         ` Jeff King
2019-11-27 12:18           ` chen bin
2019-11-27 13:38           ` chen bin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).