From: Junio C Hamano <gitster@pobox.com>
To: Stefan Beller <sbeller@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
Date: Fri, 24 Nov 2017 16:43:49 +0900
Message-ID: <xmqqpo88m896.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <20171120222529.24995-2-sbeller@google.com> (Stefan Beller's message of "Mon, 20 Nov 2017 14:25:29 -0800")
Stefan Beller <sbeller@google.com> writes:
> Sometimes users are given a hash of an object and they want to
> identify it further (ex.: Use verify-pack to find the largest blobs,
> but what are these? or [1])
>
> One might be tempted to extend git-describe to also work with blobs,
> such that `git describe <blob-id>` gives a description as
> '<commit-ish>:<path>'. This was implemented at [2]; as seen by the sheer
> number of responses (>110), it turns out this is tricky to get right.
> The hard part to get right is picking the correct 'commit-ish' as that
> could be the commit that (re-)introduced the blob or the commit that
> removed the blob; the blob could exist in different branches.
>
> Junio hinted at a different approach of solving this problem, which this
> patch implements. Teach the diff machinery another flag for restricting
> the information to what is shown. For example:
>
> $ ./git log --oneline --blobfind=v2.0.0:Makefile
> b2feb64309 Revert the whole "ask curl-config" topic for now
> 47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>
> we observe that the Makefile as shipped with 2.0 was introduced in
> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
> a different blob.
>
> [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
> [2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>
> On playing around with this, trying to find more interesting cases, I observed:
>
> git log --oneline --blobfind=HEAD:COPYING
> 703601d678 Update COPYING with GPLv2 with new FSF address
>
> git log --oneline --blobfind=703601d678^:COPYING
> 459b8d22e5 tests: do not borrow from COPYING and README from the real source
> 703601d678 Update COPYING with GPLv2 with new FSF address
> 075b845a85 Add a COPYING notice, making it explicit that the license is GPLv2.
>
> t/diff-lib/COPYING may need an update of the address of the FSF,
> # leftoverbits I guess.
I do not think so. See tz/fsf-address-update topic for details.
Please do not contaminate the list archive with careless mention of
"hash-mark plus left over bits", as it will make searching the real
good bits harder. Thanks.
> Another interesting case that I found was
> git log --oneline --blobfind=v2.14.0:Makefile
> 3921a0b3c3 perf: add test for writing the index
> 36f048c5e4 sha1dc: build git plumbing code more explicitly
> 2118805b92 Makefile: add style build rule
>
> all of which were after v2.14, such that the introduction of that blob doesn't
> show up; I suspect it came in via a merge as unrelated series may have updated
> the Makefile in parallel, though git-log should have told me?
If that is the case, shouldn't we make this new mode imply
--full-history to forbid history simplification? "git log" is a
tool to find _an_ explanation of the current state, and the usual
history simplification makes tons of sense there, but blobfind is
run most likely in order to find _all_ mentions of the set of blobs
given.
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index dd0dba5b1d..252a21cc19 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -500,6 +500,10 @@ information.
> --pickaxe-regex::
> Treat the <string> given to `-S` as an extended POSIX regular
> expression to match.
> +--blobfind=<blob-id>::
> + Restrict the output such that one side of the diff
> + matches the given blob-id.
> +
> endif::git-format-patch[]
Can we have a blank line between these enumerations to make the
source easier to read? Thanks.
> diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
> new file mode 100644
> index 0000000000..5d222fc336
> --- /dev/null
> +++ b/diffcore-blobfind.c
> @@ -0,0 +1,51 @@
> +/*
> + * Copyright (c) 2017 Google Inc.
> + */
> +#include "cache.h"
> +#include "diff.h"
> +#include "diffcore.h"
> +
> +static void diffcore_filter_blobs(struct diff_queue_struct *q,
> +				  struct diff_options *options)
> +{
> +	int i, j = 0, c = q->nr;
> +
> +	if (!options->blobfind)
> +		BUG("blobfind oidset not initialized???");
> +
> +	for (i = 0; i < q->nr; i++) {
> +		struct diff_filepair *p = q->queue[i];
> +
> +		if (DIFF_PAIR_UNMERGED(p) ||
> +		    (DIFF_FILE_VALID(p->one) &&
> +		     oidset_contains(options->blobfind, &p->one->oid)) ||
> +		    (DIFF_FILE_VALID(p->two) &&
> +		     oidset_contains(options->blobfind, &p->two->oid)))
> +			continue;
So, we keep an unmerged pair, a pair that mentions a sought-blob on
one side or the other side? I am not sure if we want to keep the
unmerged pair for the purpose of this one.
> +		diff_free_filepair(p);
> +		q->queue[i] = NULL;
> +		c--;
Also, since you have already introduced another counter 'j' that is
initialized to 0, I think it makes more sense to do the shrinking
in-place. 'i' will stay as the source-scan index that runs from 0
up to (but not including) q->nr, while 'j' can be used in this loop
(where you have 'continue') to move the current entry that is
determined to survive from q->queue[i] to q->queue[j++].
Then you do not need 'c'; when the loop ends, 'j' would be the
number of surviving entries and q->nr can be adjusted to it. Unlike
the usual pattern taken by the other diffcore transformations where
a new queue is populated and the old one discarded, this would leave
the q->queue[] over-allocated, but I do not think it is too bad.
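
To illustrate the two-index compaction described above, here is a
minimal sketch. It is not the patch's code: struct pair, struct
pair_queue, the integer blob ids, and the helpers mentions_blob(),
filter_pairs() and new_pair() are hypothetical stand-ins for git's
diff_filepair, diff_queue_struct and oidset lookups, chosen only so
that the example compiles on its own. It also assumes that unmerged
pairs get no special treatment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not git's real structures. */
struct pair {
	int one_id;	/* id of the pre-image blob, 0 if absent */
	int two_id;	/* id of the post-image blob, 0 if absent */
};

struct pair_queue {
	struct pair **queue;
	int nr;
};

/* Keep a pair when either side is the blob we are looking for. */
static int mentions_blob(const struct pair *p, int wanted)
{
	return p->one_id == wanted || p->two_id == wanted;
}

/*
 * Drop every pair that does not mention 'wanted', compacting the
 * survivors towards the front of q->queue[].  'i' scans the source,
 * 'j' indexes the next free slot and ends up as the survivor count.
 */
static void filter_pairs(struct pair_queue *q, int wanted)
{
	int i, j = 0;

	assert(q != NULL);
	for (i = 0; i < q->nr; i++) {
		struct pair *p = q->queue[i];

		if (mentions_blob(p, wanted))
			q->queue[j++] = p;
		else
			free(p);
	}
	q->nr = j;	/* the array itself stays over-allocated */
}

/* Hypothetical helper for building sample data. */
static struct pair *new_pair(int one_id, int two_id)
{
	struct pair *p = malloc(sizeof(*p));

	assert(p != NULL);
	p->one_id = one_id;
	p->two_id = two_id;
	return p;
}
```

Because 'j' never runs ahead of 'i', a surviving entry is only ever
copied onto a slot that has already been scanned, so no separate
counter and no second queue are needed.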