From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] blame-tree: add library and tests via "test-tool blame-tree"
Date: Fri, 17 Feb 2023 15:42:06 -0500 [thread overview]
Message-ID: <Y+/mnnJUz75yfWCN@coredump.intra.peff.net> (raw)
In-Reply-To: <patch-1.1-0ea849d900b-20230205T204104Z-avarab@gmail.com>
On Sun, Feb 05, 2023 at 09:47:03PM +0100, Ævar Arnfjörð Bjarmason wrote:
> From: Jeff King <peff@peff.net>
I appreciate being credited, of course, but at some point I think this
becomes "Based-on-a-patch-by". And we may have crossed the line here.
> The "blame-tree" library allows for finding the most recent
> modification to paths in a tree. It does so by expanding the tree at a
> given commit, taking note of the current state of each path, and then
> walking backwards through history looking for commits where each path
> changed into its final sha1.
>
> The "git blame-tree" command was first noted on the ML 2011[1], and
> over the years there have been mentions of it[2][3], and whether it
> could be upstreamed. The sources have been available at [4]'s
> "jk/blame-tree-wip" and "jk/faster-blame-tree-wip" branches.
Sort of. jk/blame-tree-wip probably matches what I sent to the list in
2011 (and that probably matches what was deployed at GitHub at the
time). And like all of my branches, I continually rebase it on git.git's
master branch. But like all branches in my repo with "-wip" in them, it
is not part of my daily driver, and I generally do not even build it
(and I would not be surprised if it does not build at all, or one of
those rebases introduced a horrible bug).
The jk/faster-blame-tree-wip was an experiment from 2014 to narrow the
pathspec as we found answers. But it was never deployed anywhere, and
likewise may or may not even build now. I think the general idea there
is sound (make pathspec lookups faster by using a trie, and then narrow
it), but I suspect it's not a full solution. In particular, I don't
think it does anything clever with merges. Since we are narrowing the
pathspec as we traverse, we can't rely on the usual pruning of side
branches that happens via limit_list(), as that is all up-front. So it
probably needs to look at each merge as we traverse and cull parents
based on TREESAME.
I believe GitHub later had patches to do that, but they didn't use the
pathspec machinery (because without the tries, it becomes accidentally
quadratic in the number of paths). I didn't work on those patches,
though (Stolee and Taylor did), and they're not anywhere in my
repository (and I no longer have access to the private github repo).
> This change attempts to start upstreaming this command, but rather
> than adding a command whose interface & semantics may be controversial
> starts by exposing & testing the core of its library through a new
> test helper.
>
> An eventual "git blame-tree" command, or e.g. a new format for "git
> ls-tree" to show what a path "blames" to can then be implemented with
> this library.
OK. It's a little weird, as I do think the interface and semantics are
the more interesting part, but this certainly isn't hurting anybody to
go this route.
> * Removing the "--max-depth" changes to the diff code. We'll need
> those eventually, but it's not required for a blame of a given list
> of paths.
>
> As has been noted in previous on-list discussions the semantics of
> the "max-depth" changes might be controversial, so it's worthwhile
> to split those out so that they can be reviewed separately.
That's probably reasonable. The only two interesting "depths" are really
"recurse" and "don't recurse" (where "recurse" is probably what you'd
put in a user-facing tool, and "don't recurse" is what a site like
GitHub uses to do the blame for a single level of tree it's showing).
And that narrows the problem space quite a bit.
> * Made the "blame-tree" helper take "--" before any revision options,
> for clarity. An eventual built-in command (if any) probably doesn't
> want to enforce this, but it makes it clearer in the test helper
> what's an argument for "blame-tree" itself, and what's an argument for
> the revision machinery.
OK. Since this is just a test-helper, we don't care too much either way.
> * Minor updates for using C99 syntax, and "size_t" instead of "int"
> when we're iterating over types whose "nr" is that size.
Reasonable.
> * Avoid sub-shelling in the tests, use "test-tool -C .." instead.
Yeah, the original code probably predates "-C". ;)
> The range-diff here is to peff's jk/blame-tree-wip. As noted above
> this is far from the full thing, but hopefully getting the basic bits
> of the library (sans the max-depth question) will make the review of
> subsequent bits easier.
I doubt this range-diff is useful to anybody. I'm probably the person
most likely to make sense of it, and it means nothing to me. It probably
makes sense conceptually to just treat this as a new topic that happens
to be based on older work.
But if you did want to base it on something, you probably ought to do so
on what GitHub is currently running in production (and again, talk to
Taylor or Stolee for that). Unless the intent is to use this as a base
for showing their changes. Though even then, I'm not sure the
intermediate state is all that interesting.
> [oodles of patch]
TBH, I didn't really look at this closely. It's been a decade, so even
for me reviewing this would basically be looking at it from scratch. And
as my Git time is a bit limited these days, I can't really promise
timely review of a big topic.
-Peff
prev parent reply other threads:[~2023-02-17 20:42 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-05 20:47 [PATCH] blame-tree: add library and tests via "test-tool blame-tree" Ævar Arnfjörð Bjarmason
2023-02-10 15:42 ` Derrick Stolee
2023-03-07 13:56 ` Ævar Arnfjörð Bjarmason
2023-03-08 15:30 ` Derrick Stolee
2023-03-08 15:49 ` Taylor Blau
2023-02-17 20:42 ` Jeff King [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y+/mnnJUz75yfWCN@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).