git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <derrickstolee@github.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH] blame-tree: add library and tests via "test-tool blame-tree"
Date: Wed, 8 Mar 2023 10:30:43 -0500
Message-ID: <d102dd22-778d-add6-faf9-20bf87d107c7@github.com>
In-Reply-To: <230307.86o7p4zm4s.gmgdl@evledraar.gmail.com>

On 3/7/2023 8:56 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Feb 10 2023, Derrick Stolee wrote:

>> All this is to say, that I'd like to see this API start with the smallest
>> possible surface area and with the simplest implementation, and then I'd
>> be happy to contribute those algorithms within the API boundary while the
>> CLI is handled independently.
> 
> I hear your concern about leaving this open for optimization, and in
> general I'd vehemently agree with it, except for needing to eventually
> feed a command-line to setup_revisions().

The most-correct way to build this, with full optimizations, does not
involve revision.c at all, so this "eventually" is incorrect. It's
only something to do for the "first" implementation, as a reference.

In order to do the single-walk approach for every path simultaneously,
we _must_ have full control of the commit walk. There was a time when
we did a single walk by letting the revision machinery walk all commits
that changed the base tree, then looked for changes to the contained
paths. However, that algorithm produced _incorrect results_: commits
that would normally be ignored by the simplified history walk for
"<dir>/<entry>" were not ignored by the simplified history walk for
"<dir>/".
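
As a rough illustration of that failure mode, here is a toy model in
Python. It is not git's implementation: the commit graph, the flat
path-to-content trees, and the helper names (simplified_walk,
per_path_blame, single_walk_blame) are all made up for this sketch, and
the walk only models the default rule of following the first TREESAME
parent.

```python
# Toy model only; every name and the history below are hypothetical.
COMMITS = {
    "A": {"parents": [], "tree": {"dir/a": 1, "dir/b": 1}},
    "B": {"parents": ["A"], "tree": {"dir/a": 2, "dir/b": 1}},  # mainline
    "C": {"parents": ["A"], "tree": {"dir/a": 2, "dir/b": 1}},  # side: same change as B
    "D": {"parents": ["C"], "tree": {"dir/a": 2, "dir/b": 2}},  # side: touches dir/b
    "M": {"parents": ["B", "D"], "tree": {"dir/a": 2, "dir/b": 2}},
}


def restrict(tree, prefix):
    """Keep only the entries of a flat tree that fall under `prefix`."""
    return {p: v for p, v in tree.items() if p.startswith(prefix)}


def simplified_walk(commits, start, prefix):
    """Yield commits that change `prefix`. A commit that is TREESAME to
    some parent is skipped and only its first such parent is followed."""
    queue, seen = [start], set()
    while queue:
        name = queue.pop(0)
        if name in seen:
            continue
        seen.add(name)
        commit = commits[name]
        mine = restrict(commit["tree"], prefix)
        same = [p for p in commit["parents"]
                if restrict(commits[p]["tree"], prefix) == mine]
        if same:
            queue.append(same[0])  # prune every other parent
            continue
        if commit["parents"] or mine:
            yield name
        queue.extend(commit["parents"])


def changes_path(commits, name, path):
    """True if `path` differs from every parent (or exists in a root)."""
    commit = commits[name]
    value = commit["tree"].get(path)
    if not commit["parents"]:
        return value is not None
    return all(commits[p]["tree"].get(path) != value
               for p in commit["parents"])


def per_path_blame(commits, start, path):
    """Reference answer: one simplified walk limited to `path` itself."""
    return next(simplified_walk(commits, start, path), None)


def single_walk_blame(commits, start, prefix, path):
    """Naive shortcut: walk history simplified for the directory, then
    look for the first walked commit that changes `path`."""
    for name in simplified_walk(commits, start, prefix):
        if changes_path(commits, name, path):
            return name
    return None
```

In this history the per-path walk for "dir/a" prunes the side branch at
M and blames B, while the walk simplified for "dir/" follows D instead
(M is TREESAME to D for the whole directory) and so reaches C first.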

For that reason, doing a single walk that outputs the blame-tree
results for each path must have full control over which commits are
walked and which paths could emit a change for those commits. This
means we cannot use revision.c as a base if we want that control.

> Ideally the revision API would make what you're describing easy, but the
> way it's currently implemented (and changing it would be a much larger
> project) someone who'd like to pass structured options in the way you'd
> describe will end up having to re-implement bug-for-bug compatible
> versions of some subset of the option parsing in revision.c.

The subset of option parsing is "a starting revision" and "a base tree"
and _perhaps_ "is the diff recursive or not?" (and this last one isn't
even in revision.c yet). Using revision.c's parsing does not seem
helpful at all for that.
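
To make the size of that surface concrete, here is a minimal sketch in
Python of a parser that accepts only those three inputs and rejects
everything else. The names BlameTreeOptions and parse_blame_tree_args,
the "--recursive" spelling, and the argument order are assumptions for
illustration, not an existing or proposed git interface.

```python
from dataclasses import dataclass


@dataclass
class BlameTreeOptions:
    """Hypothetical structured options; not part of git."""
    start: str               # starting revision
    base: str                # base tree path ("" means the root tree)
    recursive: bool = False  # whether the diff descends into subtrees


def parse_blame_tree_args(argv):
    """Parse `[--recursive] <start> [<base-tree>]`, refusing anything else."""
    recursive = False
    positional = []
    for arg in argv:
        if arg == "--recursive":
            recursive = True
        elif arg.startswith("-"):
            # Unknown options are an error rather than being passed through.
            raise ValueError(f"unsupported option: {arg}")
        else:
            positional.append(arg)
    if not 1 <= len(positional) <= 2:
        raise ValueError("expected <start> [<base-tree>]")
    base = positional[1] if len(positional) == 2 else ""
    return BlameTreeOptions(positional[0], base, recursive)
```

Rejecting unknown options up front is what would let an implementation
behind such a boundary assume that no other walk-affecting option is in
play.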

> Isn't a way to get the best of both worlds to have a small snippet of
> code that inspects the "struct rev_info" before & after
> setup_revisions(), and which would only implement certain optimizations
> if certain known options are provided, but not if any unknown ones are?
> 
> That way those who'd like the faster happy path could use that subset of
> options, while the general API would allow any revision options. We'd
> then error() or BUG() out only if we fail to map our expected paths to
> OIDs.
 
This option requires examining the long and ever-growing list of options
in struct rev_info, which will take much more work than parsing a
starting ref and a path from the command-line.

> I think those are all good ways forward here, and I'd much prefer those
> to having to re-implement or pull out subsets of the current option
> parsing logic in revision.c. What do you think?

I think you are glossing over the difficult part of upstreaming the
blame-tree command, which is the biggest reason we have not done it in
the past. The way it is implemented in our fork started with this "just
parse args using revision.c" because that's the easiest way to implement
the naive implementation, but we were able to make optimizations on top
only because we had full control over the callers not using any other
options. We would not have been able to make the assumptions that allowed
those performance enhancements without that control. Actually building the
interface in a way that guarantees the behavior will be stable and
understood is not easy, but is worth doing well.

Thanks,
-Stolee
