git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/1] Teaching the diff machinery about blobfind [WAS: git describe <blob>]
@ 2017-11-20 22:25 Stefan Beller
  2017-11-20 22:25 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Beller @ 2017-11-20 22:25 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

This seems to be an easier approach; thanks Junio for hinting at it.
This certainly solves our immediate needs, we may want to build
'git describe <blob>' on top of it or defer it until later.

Thanks,
Stefan

previous descussion
https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/

Stefan Beller (1):
  diffcore: add a filter to find a specific blob

 Documentation/diff-options.txt |  4 ++++
 Makefile                       |  1 +
 builtin/log.c                  |  2 +-
 diff.c                         | 20 ++++++++++++++++-
 diff.h                         |  3 +++
 diffcore-blobfind.c            | 51 ++++++++++++++++++++++++++++++++++++++++++
 diffcore.h                     |  1 +
 revision.c                     |  3 ++-
 t/t4064-diff-blobfind.sh       | 35 +++++++++++++++++++++++++++++
 9 files changed, 117 insertions(+), 3 deletions(-)
 create mode 100644 diffcore-blobfind.c
 create mode 100755 t/t4064-diff-blobfind.sh

-- 
2.15.0.128.gcadd42da22


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-11-20 22:25 [PATCH 0/1] Teaching the diff machinery about blobfind [WAS: git describe <blob>] Stefan Beller
@ 2017-11-20 22:25 ` Stefan Beller
  2017-11-24  7:43   ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Beller @ 2017-11-20 22:25 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

One might be tempted to extend git-describe to also work with blobs,
such that `git describe <blob-id>` gives a description as
'<commit-ish>:<path>'.  This was implemented at [2]; as seen by the sheer
number of responses (>110), it turns out this is tricky to get right.
The hard part to get right is picking the correct 'commit-ish' as that
could be the commit that (re-)introduced the blob or the blob that
removed the blob; the blob could exist in different branches.

Junio hinted at a different approach of solving this problem, which this
patch implements. Teach the diff machinery another flag for restricting
the information to what is shown. For example:

  $ ./git log --oneline --blobfind=v2.0.0:Makefile
  b2feb64309 Revert the whole "ask curl-config" topic for now
  47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"

we observe that the Makefile as shipped with 2.0 was introduced in
v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
a different blob.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
[2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/

Signed-off-by: Stefan Beller <sbeller@google.com>
---

On playing around with this, trying to find more interesting cases, I observed:

    git log --oneline --blobfind=HEAD:COPYING
    703601d678 Update COPYING with GPLv2 with new FSF address
    
    git log --oneline --blobfind=703601d678^:COPYING
    459b8d22e5 tests: do not borrow from COPYING and README from the real source
    703601d678 Update COPYING with GPLv2 with new FSF address
    075b845a85 Add a COPYING notice, making it explicit that the license is GPLv2.

    t/diff-lib/COPYING may need an update of the adress of the FSF,
    #leftoverbits I guess.
    
Another interesting case that I found was
   git log --oneline --blobfind=v2.14.0:Makefile
   3921a0b3c3 perf: add test for writing the index
   36f048c5e4 sha1dc: build git plumbing code more explicitly
   2118805b92 Makefile: add style build rule

all of which were after v2.14, such that the introduction of that blob doesn't
show up; I suspect it came in via a merge as unrelated series may have updated
the Makefile in parallel, though git-log should have told me?

Thanks,
Stefan

 Documentation/diff-options.txt |  4 ++++
 Makefile                       |  1 +
 builtin/log.c                  |  2 +-
 diff.c                         | 20 ++++++++++++++++-
 diff.h                         |  3 +++
 diffcore-blobfind.c            | 51 ++++++++++++++++++++++++++++++++++++++++++
 diffcore.h                     |  1 +
 revision.c                     |  3 ++-
 t/t4064-diff-blobfind.sh       | 35 +++++++++++++++++++++++++++++
 9 files changed, 117 insertions(+), 3 deletions(-)
 create mode 100644 diffcore-blobfind.c
 create mode 100755 t/t4064-diff-blobfind.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index dd0dba5b1d..252a21cc19 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -500,6 +500,10 @@ information.
 --pickaxe-regex::
 	Treat the <string> given to `-S` as an extended POSIX regular
 	expression to match.
+--blobfind=<blob-id>::
+	Restrict the output such that one side of the diff
+	matches the given blob-id.
+
 endif::git-format-patch[]
 
 -O<orderfile>::
diff --git a/Makefile b/Makefile
index ee9d5eb11e..fdfa8f38f6 100644
--- a/Makefile
+++ b/Makefile
@@ -775,6 +775,7 @@ LIB_OBJS += date.o
 LIB_OBJS += decorate.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
+LIB_OBJS += diffcore-blobfind.o
 LIB_OBJS += diffcore-order.o
 LIB_OBJS += diffcore-pickaxe.o
 LIB_OBJS += diffcore-rename.o
diff --git a/builtin/log.c b/builtin/log.c
index 6c1fa896ad..7b91f61423 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -181,7 +181,7 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 		init_display_notes(&rev->notes_opt);
 
 	if (rev->diffopt.pickaxe || rev->diffopt.filter ||
-	    rev->diffopt.flags.follow_renames)
+	    rev->diffopt.flags.follow_renames || rev->diffopt.blobfind)
 		rev->always_show_header = 0;
 
 	if (source)
diff --git a/diff.c b/diff.c
index 0763e89263..8861f89ab1 100644
--- a/diff.c
+++ b/diff.c
@@ -4082,6 +4082,7 @@ void diff_setup(struct diff_options *options)
 	options->interhunkcontext = diff_interhunk_context_default;
 	options->ws_error_highlight = ws_error_highlight_default;
 	options->flags.rename_empty = 1;
+	options->blobfind = NULL;
 
 	/* pathchange left =NULL by default */
 	options->change = diff_change;
@@ -4487,6 +4488,19 @@ static int parse_ws_error_highlight_opt(struct diff_options *opt, const char *ar
 	return 1;
 }
 
+static int parse_blobfind_opt(struct diff_options *opt, const char *arg)
+{
+	struct object_id oid;
+
+	if (get_oid_blob(arg, &oid) || sha1_object_info(oid.hash, NULL) != OBJ_BLOB)
+		return error("object '%s' is not a blob", arg);
+
+	if (!opt->blobfind)
+		opt->blobfind = xcalloc(1, sizeof(*opt->blobfind));
+	oidset_insert(opt->blobfind, &oid);
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options,
 		   const char **av, int ac, const char *prefix)
 {
@@ -4736,7 +4750,8 @@ int diff_opt_parse(struct diff_options *options,
 	else if ((argcount = short_opt('O', av, &optarg))) {
 		options->orderfile = prefix_filename(prefix, optarg);
 		return argcount;
-	}
+	} else if (skip_prefix(arg, "--blobfind=", &arg))
+		return parse_blobfind_opt(options, arg);
 	else if ((argcount = parse_long_opt("diff-filter", av, &optarg))) {
 		int offending = parse_diff_filter_opt(optarg, options);
 		if (offending)
@@ -5770,6 +5785,9 @@ void diffcore_std(struct diff_options *options)
 		diffcore_skip_stat_unmatch(options);
 	if (!options->found_follow) {
 		/* See try_to_follow_renames() in tree-diff.c */
+
+		if (options->blobfind)
+			diffcore_blobfind(options);
 		if (options->break_opt != -1)
 			diffcore_break(options->break_opt);
 		if (options->detect_rename)
diff --git a/diff.h b/diff.h
index 0fb18dd735..9178e498fa 100644
--- a/diff.h
+++ b/diff.h
@@ -7,6 +7,7 @@
 #include "tree-walk.h"
 #include "pathspec.h"
 #include "object.h"
+#include "oidset.h"
 
 struct rev_info;
 struct diff_options;
@@ -174,6 +175,8 @@ struct diff_options {
 	enum diff_words_type word_diff;
 	enum diff_submodule_format submodule_format;
 
+	struct oidset *blobfind;
+
 	/* this is set by diffcore for DIFF_FORMAT_PATCH */
 	int found_changes;
 
diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
new file mode 100644
index 0000000000..5d222fc336
--- /dev/null
+++ b/diffcore-blobfind.c
@@ -0,0 +1,51 @@
+/*
+ * Copyright (c) 2017 Google Inc.
+ */
+#include "cache.h"
+#include "diff.h"
+#include "diffcore.h"
+
+static void diffcore_filter_blobs(struct diff_queue_struct *q,
+				  struct diff_options *options)
+{
+	int i, j = 0, c = q->nr;
+
+	if (!options->blobfind)
+		BUG("blobfind oidset not initialized???");
+
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+
+		if (DIFF_PAIR_UNMERGED(p) ||
+		    (DIFF_FILE_VALID(p->one) &&
+		     oidset_contains(options->blobfind, &p->one->oid)) ||
+		    (DIFF_FILE_VALID(p->two) &&
+		     oidset_contains(options->blobfind, &p->two->oid)))
+			continue;
+
+		diff_free_filepair(p);
+		q->queue[i] = NULL;
+		c--;
+	}
+
+	/* Keep it sorted. */
+	i = 0; j = 0;
+	while (i < c) {
+		while (!q->queue[j])
+			j++;
+		q->queue[i] = q->queue[j];
+		i++; j++;
+	}
+
+	q->nr = c;
+
+	if (!c) {
+		free(q->queue);
+		DIFF_QUEUE_CLEAR(q);
+	}
+}
+
+void diffcore_blobfind(struct diff_options *options)
+{
+	diffcore_filter_blobs(&diff_queued_diff, options);
+}
diff --git a/diffcore.h b/diffcore.h
index a30da161da..431917672f 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -107,6 +107,7 @@ extern struct diff_filepair *diff_queue(struct diff_queue_struct *,
 					struct diff_filespec *);
 extern void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
+extern void diffcore_blobfind(struct diff_options *);
 extern void diffcore_break(int);
 extern void diffcore_rename(struct diff_options *);
 extern void diffcore_merge_broken(void);
diff --git a/revision.c b/revision.c
index e2e691dd5a..6449619c0a 100644
--- a/revision.c
+++ b/revision.c
@@ -2409,7 +2409,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 	/* Pickaxe, diff-filter and rename following need diffs */
 	if (revs->diffopt.pickaxe ||
 	    revs->diffopt.filter ||
-	    revs->diffopt.flags.follow_renames)
+	    revs->diffopt.flags.follow_renames ||
+	    revs->diffopt.blobfind)
 		revs->diff = 1;
 
 	if (revs->topo_order)
diff --git a/t/t4064-diff-blobfind.sh b/t/t4064-diff-blobfind.sh
new file mode 100755
index 0000000000..b2c2964d77
--- /dev/null
+++ b/t/t4064-diff-blobfind.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+
+test_description='test finding specific blobs in the revision walking'
+. ./test-lib.sh
+
+test_expect_success 'setup ' '
+	git commit --allow-empty -m "empty initial commit" &&
+
+	echo "Hello, world!" >greeting &&
+	git add greeting &&
+	git commit -m "add the greeting blob" && # borrowed from Git from the Bottom Up
+	git tag -m "the blob" greeting $(git rev-parse HEAD:greeting) &&
+
+	echo asdf >unrelated &&
+	git add unrelated &&
+	git commit -m "unrelated history" &&
+
+	git revert HEAD^ &&
+
+	git commit --allow-empty -m "another unrelated commit"
+'
+
+test_expect_success 'find the greeting blob' '
+	cat >expect <<-EOF &&
+	Revert "add the greeting blob"
+	add the greeting blob
+	EOF
+
+	git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw &&
+	cut -c 14- actual.raw >actual &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.15.0.128.gcadd42da22


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-11-20 22:25 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
@ 2017-11-24  7:43   ` Junio C Hamano
  2017-11-25  4:59     ` Junio C Hamano
  2017-12-07 21:40     ` Junio C Hamano
  0 siblings, 2 replies; 18+ messages in thread
From: Junio C Hamano @ 2017-11-24  7:43 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> Sometimes users are given a hash of an object and they want to
> identify it further (ex.: Use verify-pack to find the largest blobs,
> but what are these? or [1])
>
> One might be tempted to extend git-describe to also work with blobs,
> such that `git describe <blob-id>` gives a description as
> '<commit-ish>:<path>'.  This was implemented at [2]; as seen by the sheer
> number of responses (>110), it turns out this is tricky to get right.
> The hard part to get right is picking the correct 'commit-ish' as that
> could be the commit that (re-)introduced the blob or the blob that
> removed the blob; the blob could exist in different branches.
>
> Junio hinted at a different approach of solving this problem, which this
> patch implements. Teach the diff machinery another flag for restricting
> the information to what is shown. For example:
>
>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>   b2feb64309 Revert the whole "ask curl-config" topic for now
>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>
> we observe that the Makefile as shipped with 2.0 was introduced in
> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
> a different blob.
>
> [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
> [2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>
> On playing around with this, trying to find more interesting cases, I observed:
>
>     git log --oneline --blobfind=HEAD:COPYING
>     703601d678 Update COPYING with GPLv2 with new FSF address
>     
>     git log --oneline --blobfind=703601d678^:COPYING
>     459b8d22e5 tests: do not borrow from COPYING and README from the real source
>     703601d678 Update COPYING with GPLv2 with new FSF address
>     075b845a85 Add a COPYING notice, making it explicit that the license is GPLv2.
>
>     t/diff-lib/COPYING may need an update of the adress of the FSF,
>     # leftoverbits I guess.

I do not think so.  See tz/fsf-address-update topic for details.

Please do not contaminate the list archive with careless mention of 
"hash-mark plus left over bits", as it will make searching the real
good bits harder.  Thanks.

> Another interesting case that I found was
>    git log --oneline --blobfind=v2.14.0:Makefile
>    3921a0b3c3 perf: add test for writing the index
>    36f048c5e4 sha1dc: build git plumbing code more explicitly
>    2118805b92 Makefile: add style build rule
>
> all of which were after v2.14, such that the introduction of that blob doesn't
> show up; I suspect it came in via a merge as unrelated series may have updated
> the Makefile in parallel, though git-log should have told me?

If that is the case, shouldn't we make this new mode imply
--full-history to forbid history simplification?  "git log" is a
tool to find _an_ explanation of the current state, and the usual
history simplification makes tons of sense there, but blobfind is
run most likely in order to find _all_ mention of the set of blobs
given.

> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index dd0dba5b1d..252a21cc19 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -500,6 +500,10 @@ information.
>  --pickaxe-regex::
>  	Treat the <string> given to `-S` as an extended POSIX regular
>  	expression to match.
> +--blobfind=<blob-id>::
> +	Restrict the output such that one side of the diff
> +	matches the given blob-id.
> +
>  endif::git-format-patch[]

Can we have a blank line between these enumerations to make the
source easier to read?  Thanks.

> diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
> new file mode 100644
> index 0000000000..5d222fc336
> --- /dev/null
> +++ b/diffcore-blobfind.c
> @@ -0,0 +1,51 @@
> +/*
> + * Copyright (c) 2017 Google Inc.
> + */
> +#include "cache.h"
> +#include "diff.h"
> +#include "diffcore.h"
> +
> +static void diffcore_filter_blobs(struct diff_queue_struct *q,
> +				  struct diff_options *options)
> +{
> +	int i, j = 0, c = q->nr;
> +
> +	if (!options->blobfind)
> +		BUG("blobfind oidset not initialized???");
> +
> +	for (i = 0; i < q->nr; i++) {
> +		struct diff_filepair *p = q->queue[i];
> +
> +		if (DIFF_PAIR_UNMERGED(p) ||
> +		    (DIFF_FILE_VALID(p->one) &&
> +		     oidset_contains(options->blobfind, &p->one->oid)) ||
> +		    (DIFF_FILE_VALID(p->two) &&
> +		     oidset_contains(options->blobfind, &p->two->oid)))
> +			continue;

So, we keep an unmerged pair, a pair that mentions a sought-blob on
one side or the other side?  I am not sure if we want to keep the
unmerged pair for the purpose of this one.

> +		diff_free_filepair(p);
> +		q->queue[i] = NULL;
> +		c--;

Also, if you are doing the in-place shrinking and have already
introduced another counter 'j' that is initialized to 0, I think it
makes more sense to do the shrinking in-place.  'i' will stay to be
the source-scan pointer that runs 0 thru q->nr, while 'j' can be
used in this loop (where you have 'continue') to move the current
one that is determined to survive from q->queue[i] to q->queue[j++].

Then you do not need 'c'; when the loop ends, 'j' would be the
number of surviving entries and q->nr can be adjusted to it.  Unlike
the usual pattern taken by the other diffcore transformations where
a new queue is populated and the old one discarded, this would leave
the q->queue[] over-allocated, but I do not think it is too bad.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-11-24  7:43   ` Junio C Hamano
@ 2017-11-25  4:59     ` Junio C Hamano
  2017-12-07 21:40     ` Junio C Hamano
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2017-11-25  4:59 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> If that is the case, shouldn't we make this new mode imply
> --full-history to forbid history simplification?  "git log" is a
> tool to find _an_ explanation of the current state, and the usual
> history simplification makes tons of sense there, but blobfind is
> run most likely in order to find _all_ mention of the set of blobs
> given.

One scenario that I think we may want to be careful about is this:

 ---o---*---*---A*--M*--o---X
     \             /
      o---*---o---B

where commits marked with '*' has the same blob M:Makefile you are
looking for at the same path Makefile, and we start traversal at X
with "git log --blobfind=M:Makefile X" (or even with a pathspec, i.e.
"git log --blobfind=M:Makefile X -- Makefile).

The usual merge simplification rules would say "Ah, M and A are
TREESAME so we do not have to look at the side branch that ends at
B".  If the user is interested in finding all the introduction and
the retirement of a specific blob object, we would miss the
transition around the '*' on that side branch and ends up finding
only the transitions after the fork point where the blob is
introduced, and after M where the blob is retired.

Another interesting case we may want to be careful is this:

    ---A*--M*--o---X
          /
      ---B*

for the same reason.  The usual merge simplification rules are
designed to come up with _an_ explanation for the state in X,
and because M is TREESAME with both A and B, it would pick just
one (the first parent) while ignoring the other.  Again, that would
not be appropriate if the reason why the user is running the command
is to find all the introduction and the retirement of an object.

It may be worth covering these in the tests (I didn't try to see
specifically if the patch has these cases already, as I didn't think
of the issue when I responded---sorry about that).

Thanks.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-11-24  7:43   ` Junio C Hamano
  2017-11-25  4:59     ` Junio C Hamano
@ 2017-12-07 21:40     ` Junio C Hamano
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2017-12-07 21:40 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

After saying "Will merge to 'next'" in the recent "What's cooking"
report, I noticed that a few loose ends were never tied on this
topic.

>> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
>> index dd0dba5b1d..252a21cc19 100644
>> --- a/Documentation/diff-options.txt
>> +++ b/Documentation/diff-options.txt
>> @@ -500,6 +500,10 @@ information.
>>  --pickaxe-regex::
>>  	Treat the <string> given to `-S` as an extended POSIX regular
>>  	expression to match.
>> +--blobfind=<blob-id>::
>> +	Restrict the output such that one side of the diff
>> +	matches the given blob-id.
>> +
>>  endif::git-format-patch[]
>
> Can we have a blank line between these enumerations to make the
> source easier to read?  Thanks.
>
> ...
> So, we keep an unmerged pair, a pair that mentions a sought-blob on
> one side or the other side?  I am not sure if we want to keep the
> unmerged pair for the purpose of this one.
>
>> +		diff_free_filepair(p);
>> +		q->queue[i] = NULL;
>> +		c--;
>
> Also, if you are doing the in-place shrinking and have already
> introduced another counter 'j' that is initialized to 0, I think it
> makes more sense to do the shrinking in-place.  'i' will stay to be
> the source-scan pointer that runs 0 thru q->nr, while 'j' can be
> used in this loop (where you have 'continue') to move the current
> one that is determined to survive from q->queue[i] to q->queue[j++].
>
> Then you do not need 'c'; when the loop ends, 'j' would be the
> number of surviving entries and q->nr can be adjusted to it.  Unlike
> the usual pattern taken by the other diffcore transformations where
> a new queue is populated and the old one discarded, this would leave
> the q->queue[] over-allocated, but I do not think it is too bad.

Here is to illustrate the last point.  I still think we should keep
the unmerged entries for the purpose of blobfind but it should be
trivial to fix that.

 diffcore-blobfind.c | 33 ++++++++++++---------------------
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
index 5d222fc336..bf63ba61dc 100644
--- a/diffcore-blobfind.c
+++ b/diffcore-blobfind.c
@@ -8,40 +8,31 @@
 static void diffcore_filter_blobs(struct diff_queue_struct *q,
 				  struct diff_options *options)
 {
-	int i, j = 0, c = q->nr;
+	int src, dst;
 
 	if (!options->blobfind)
 		BUG("blobfind oidset not initialized???");
 
-	for (i = 0; i < q->nr; i++) {
-		struct diff_filepair *p = q->queue[i];
+	for (src = dst = 0; src < q->nr; src++) {
+		struct diff_filepair *p = q->queue[src];
 
 		if (DIFF_PAIR_UNMERGED(p) ||
 		    (DIFF_FILE_VALID(p->one) &&
 		     oidset_contains(options->blobfind, &p->one->oid)) ||
 		    (DIFF_FILE_VALID(p->two) &&
-		     oidset_contains(options->blobfind, &p->two->oid)))
-			continue;
-
-		diff_free_filepair(p);
-		q->queue[i] = NULL;
-		c--;
-	}
-
-	/* Keep it sorted. */
-	i = 0; j = 0;
-	while (i < c) {
-		while (!q->queue[j])
-			j++;
-		q->queue[i] = q->queue[j];
-		i++; j++;
+		     oidset_contains(options->blobfind, &p->two->oid))) {
+			q->queue[dst] = p;
+			dst++;
+		} else {
+			diff_free_filepair(p);
+		}
 	}
 
-	q->nr = c;
-
-	if (!c) {
+	if (!dst) {
 		free(q->queue);
 		DIFF_QUEUE_CLEAR(q);
+	} else {
+		q->nr = dst;
 	}
 }
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08  0:24 [PATCH 0/1] diffcore-blobfind Stefan Beller
@ 2017-12-08  0:24 ` Stefan Beller
  2017-12-08  9:34   ` Jeff King
  2017-12-08 15:04   ` Junio C Hamano
  2017-12-11 19:58 ` [PATCH 0/1] diff-core blobfind Stefan Beller
  1 sibling, 2 replies; 18+ messages in thread
From: Stefan Beller @ 2017-12-08  0:24 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

One might be tempted to extend git-describe to also work with blobs,
such that `git describe <blob-id>` gives a description as
'<commit-ish>:<path>'.  This was implemented at [2]; as seen by the sheer
number of responses (>110), it turns out this is tricky to get right.
The hard part to get right is picking the correct 'commit-ish' as that
could be the commit that (re-)introduced the blob or the blob that
removed the blob; the blob could exist in different branches.

Junio hinted at a different approach of solving this problem, which this
patch implements. Teach the diff machinery another flag for restricting
the information to what is shown. For example:

  $ ./git log --oneline --blobfind=v2.0.0:Makefile
  b2feb64309 Revert the whole "ask curl-config" topic for now
  47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"

we observe that the Makefile as shipped with 2.0 was introduced in
v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
a different blob.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
[2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt |  5 +++++
 Makefile                       |  1 +
 builtin/log.c                  |  2 +-
 diff.c                         | 20 +++++++++++++++++++-
 diff.h                         |  3 +++
 diffcore-blobfind.c            | 41 +++++++++++++++++++++++++++++++++++++++++
 diffcore.h                     |  1 +
 revision.c                     |  5 ++++-
 t/t4064-diff-blobfind.sh       | 35 +++++++++++++++++++++++++++++++++++
 9 files changed, 110 insertions(+), 3 deletions(-)
 create mode 100644 diffcore-blobfind.c
 create mode 100755 t/t4064-diff-blobfind.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index dd0dba5b1d..34d53b95f1 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -500,6 +500,11 @@ information.
 --pickaxe-regex::
 	Treat the <string> given to `-S` as an extended POSIX regular
 	expression to match.
+
+--blobfind=<blob-id>::
+	Restrict the output such that one side of the diff
+	matches the given blob-id.
+
 endif::git-format-patch[]
 
 -O<orderfile>::
diff --git a/Makefile b/Makefile
index ee9d5eb11e..fdfa8f38f6 100644
--- a/Makefile
+++ b/Makefile
@@ -775,6 +775,7 @@ LIB_OBJS += date.o
 LIB_OBJS += decorate.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
+LIB_OBJS += diffcore-blobfind.o
 LIB_OBJS += diffcore-order.o
 LIB_OBJS += diffcore-pickaxe.o
 LIB_OBJS += diffcore-rename.o
diff --git a/builtin/log.c b/builtin/log.c
index 6c1fa896ad..7b91f61423 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -181,7 +181,7 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 		init_display_notes(&rev->notes_opt);
 
 	if (rev->diffopt.pickaxe || rev->diffopt.filter ||
-	    rev->diffopt.flags.follow_renames)
+	    rev->diffopt.flags.follow_renames || rev->diffopt.blobfind)
 		rev->always_show_header = 0;
 
 	if (source)
diff --git a/diff.c b/diff.c
index 0763e89263..8861f89ab1 100644
--- a/diff.c
+++ b/diff.c
@@ -4082,6 +4082,7 @@ void diff_setup(struct diff_options *options)
 	options->interhunkcontext = diff_interhunk_context_default;
 	options->ws_error_highlight = ws_error_highlight_default;
 	options->flags.rename_empty = 1;
+	options->blobfind = NULL;
 
 	/* pathchange left =NULL by default */
 	options->change = diff_change;
@@ -4487,6 +4488,19 @@ static int parse_ws_error_highlight_opt(struct diff_options *opt, const char *ar
 	return 1;
 }
 
+static int parse_blobfind_opt(struct diff_options *opt, const char *arg)
+{
+	struct object_id oid;
+
+	if (get_oid_blob(arg, &oid) || sha1_object_info(oid.hash, NULL) != OBJ_BLOB)
+		return error("object '%s' is not a blob", arg);
+
+	if (!opt->blobfind)
+		opt->blobfind = xcalloc(1, sizeof(*opt->blobfind));
+	oidset_insert(opt->blobfind, &oid);
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options,
 		   const char **av, int ac, const char *prefix)
 {
@@ -4736,7 +4750,8 @@ int diff_opt_parse(struct diff_options *options,
 	else if ((argcount = short_opt('O', av, &optarg))) {
 		options->orderfile = prefix_filename(prefix, optarg);
 		return argcount;
-	}
+	} else if (skip_prefix(arg, "--blobfind=", &arg))
+		return parse_blobfind_opt(options, arg);
 	else if ((argcount = parse_long_opt("diff-filter", av, &optarg))) {
 		int offending = parse_diff_filter_opt(optarg, options);
 		if (offending)
@@ -5770,6 +5785,9 @@ void diffcore_std(struct diff_options *options)
 		diffcore_skip_stat_unmatch(options);
 	if (!options->found_follow) {
 		/* See try_to_follow_renames() in tree-diff.c */
+
+		if (options->blobfind)
+			diffcore_blobfind(options);
 		if (options->break_opt != -1)
 			diffcore_break(options->break_opt);
 		if (options->detect_rename)
diff --git a/diff.h b/diff.h
index 0fb18dd735..9178e498fa 100644
--- a/diff.h
+++ b/diff.h
@@ -7,6 +7,7 @@
 #include "tree-walk.h"
 #include "pathspec.h"
 #include "object.h"
+#include "oidset.h"
 
 struct rev_info;
 struct diff_options;
@@ -174,6 +175,8 @@ struct diff_options {
 	enum diff_words_type word_diff;
 	enum diff_submodule_format submodule_format;
 
+	struct oidset *blobfind;
+
 	/* this is set by diffcore for DIFF_FORMAT_PATCH */
 	int found_changes;
 
diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
new file mode 100644
index 0000000000..e65c7cad6e
--- /dev/null
+++ b/diffcore-blobfind.c
@@ -0,0 +1,41 @@
+/*
+ * Copyright (c) 2017 Google Inc.
+ */
+#include "cache.h"
+#include "diff.h"
+#include "diffcore.h"
+
+static void diffcore_filter_blobs(struct diff_queue_struct *q,
+				  struct diff_options *options)
+{
+	int src, dst;
+
+	if (!options->blobfind)
+		BUG("blobfind oidset not initialized???");
+
+	for (src = dst = 0; src < q->nr; src++) {
+		struct diff_filepair *p = q->queue[src];
+
+		if ((DIFF_FILE_VALID(p->one) &&
+		     oidset_contains(options->blobfind, &p->one->oid)) ||
+		    (DIFF_FILE_VALID(p->two) &&
+		     oidset_contains(options->blobfind, &p->two->oid))) {
+			q->queue[dst] = p;
+			dst++;
+		} else {
+			diff_free_filepair(p);
+		}
+	}
+
+	if (!dst) {
+		free(q->queue);
+		DIFF_QUEUE_CLEAR(q);
+	} else {
+		q->nr = dst;
+	}
+}
+
+void diffcore_blobfind(struct diff_options *options)
+{
+	diffcore_filter_blobs(&diff_queued_diff, options);
+}
diff --git a/diffcore.h b/diffcore.h
index a30da161da..431917672f 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -107,6 +107,7 @@ extern struct diff_filepair *diff_queue(struct diff_queue_struct *,
 					struct diff_filespec *);
 extern void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
+extern void diffcore_blobfind(struct diff_options *);
 extern void diffcore_break(int);
 extern void diffcore_rename(struct diff_options *);
 extern void diffcore_merge_broken(void);
diff --git a/revision.c b/revision.c
index e2e691dd5a..8e1a89f832 100644
--- a/revision.c
+++ b/revision.c
@@ -2409,7 +2409,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 	/* Pickaxe, diff-filter and rename following need diffs */
 	if (revs->diffopt.pickaxe ||
 	    revs->diffopt.filter ||
-	    revs->diffopt.flags.follow_renames)
+	    revs->diffopt.flags.follow_renames ||
+	    revs->diffopt.blobfind)
 		revs->diff = 1;
 
 	if (revs->topo_order)
@@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
 		simplify_merges(revs);
 	if (revs->children.name)
 		set_children(revs);
+	if (revs->diffopt.blobfind)
+		revs->simplify_history = 0;
 	return 0;
 }
 
diff --git a/t/t4064-diff-blobfind.sh b/t/t4064-diff-blobfind.sh
new file mode 100755
index 0000000000..b2c2964d77
--- /dev/null
+++ b/t/t4064-diff-blobfind.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+
+test_description='test finding specific blobs in the revision walking'
+. ./test-lib.sh
+
+test_expect_success 'setup ' '
+	git commit --allow-empty -m "empty initial commit" &&
+
+	echo "Hello, world!" >greeting &&
+	git add greeting &&
+	git commit -m "add the greeting blob" && # borrowed from Git from the Bottom Up
+	git tag -m "the blob" greeting $(git rev-parse HEAD:greeting) &&
+
+	echo asdf >unrelated &&
+	git add unrelated &&
+	git commit -m "unrelated history" &&
+
+	git revert HEAD^ &&
+
+	git commit --allow-empty -m "another unrelated commit"
+'
+
+test_expect_success 'find the greeting blob' '
+	cat >expect <<-EOF &&
+	Revert "add the greeting blob"
+	add the greeting blob
+	EOF
+
+	git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw &&
+	cut -c 14- actual.raw >actual &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08  0:24 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
@ 2017-12-08  9:34   ` Jeff King
  2017-12-08 16:28     ` Ramsay Jones
  2017-12-08 15:04   ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff King @ 2017-12-08  9:34 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:

> Sometimes users are given a hash of an object and they want to
> identify it further (ex.: Use verify-pack to find the largest blobs,
> but what are these? or [1])
> 
> One might be tempted to extend git-describe to also work with blobs,
> such that `git describe <blob-id>` gives a description as
> '<commit-ish>:<path>'.  This was implemented at [2]; as seen by the sheer
> number of responses (>110), it turns out this is tricky to get right.
> The hard part to get right is picking the correct 'commit-ish' as that
> could be the commit that (re-)introduced the blob or the blob that
> removed the blob; the blob could exist in different branches.

Neat. I didn't follow the original thread very closely, but I think this
is a good idea (and is definitely something I've simulated manually with
"git log --raw | grep" before).

Bear with me while I pontificate for a moment.

We do something similar at GitHub to report on too-large objects
during a push (we identify the object during index-pack, but then want
to hand back a human-readable pathname).

We do it with a patch to "rev-list", so that you can use the normal
traversal options to limit the set of visited commits. Which effectively
makes it "traverse and find a place where this object is referenced, and
then tell me the path at which you found it (or tell me if it was not
referenced at all)".

That sidesteps the "describe" problem, because now it is the user's
responsibility to tell you which commits to look in. :)

But the rev-list solution only finds _a_ commit that contains the
object, and which likely has nothing to do with the object at all. Your
solution here finds the introduction and removal of the object, which
are interesting for a wider variety of searches.

So I wondered if this could replace my rev-list patch (which I've been
meaning to upstream for a while). But I think the answer is "no", there
is room for both features. What you're finding here is much more
specific and useful data, but it's also much more expensive to generate.
For my uses cases, it was enough to ask "show me a plausible path
quickly" or even "is this object reachable" (which you can answer out of
bitmaps!).

> Junio hinted at a different approach of solving this problem, which this
> patch implements. Teach the diff machinery another flag for restricting
> the information to what is shown. For example:
> 
>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>   b2feb64309 Revert the whole "ask curl-config" topic for now
>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
> 
> we observe that the Makefile as shipped with 2.0 was introduced in
> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
> a different blob.

So I like this very much as an addition to Git's toolbox. But does it
really need to be limited to blobs? We should be able to ask for trees
(both root trees and sub-trees), too. That's going to be less common, I
think, but I have definitely done

  git log --raw -t --pretty=raw --no-abbrev |
  less +/$tree_sha1

to debug things (corrupted repos, funny merges, etc).

You could even do it for submodule commits. Which demonstrates that this
feature does not even need to actually have a resolvable object; you're
just looking for the textual sha1.

You do your filtering on the diff queue, which I think will have the
entries you need if "-t" is specified (and should always have submodule
commits, I think). So you'd just need to relax the incoming object
check, like:

diff --git a/diff.c b/diff.c
index e4571df3bf..0007bb0a09 100644
--- a/diff.c
+++ b/diff.c
@@ -4490,8 +4490,12 @@ static int parse_blobfind_opt(struct diff_options *opt, const char *arg)
 {
 	struct object_id oid;
 
-	if (get_oid_blob(arg, &oid) || sha1_object_info(oid.hash, NULL) != OBJ_BLOB)
-		return error("object '%s' is not a blob", arg);
+	/*
+	 * We don't even need to have the object, and 40-hex sha1s will
+	 * just resolve here.
+	 */
+	if (get_oid_blob(arg, &oid))
+		return error("unable to resolve '%s'", arg);
 
 	if (!opt->blobfind)
 		opt->blobfind = xcalloc(1, sizeof(*opt->blobfind));

At which point:

  ./git log --oneline -t --blobfind=v2.0.0:Documentation

just works (the results are kind of interesting, too; you see that tree
"go away" in a lot of different commits because many independent
branches touched the directory).

Also interesting:

  ./git log --oneline --blobfind=HEAD:sha1collisiondetection

Well, the output for this particular case isn't that interesting. But it
shows that we can find submodules, too.

-Peff

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08  0:24 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
  2017-12-08  9:34   ` Jeff King
@ 2017-12-08 15:04   ` Junio C Hamano
  2017-12-08 17:21     ` Junio C Hamano
  2017-12-08 21:11     ` Stefan Beller
  1 sibling, 2 replies; 18+ messages in thread
From: Junio C Hamano @ 2017-12-08 15:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
> new file mode 100644
> index 0000000000..e65c7cad6e
> --- /dev/null
> +++ b/diffcore-blobfind.c
> @@ -0,0 +1,41 @@
> +/*
> + * Copyright (c) 2017 Google Inc.
> + */
> +#include "cache.h"
> +#include "diff.h"
> +#include "diffcore.h"
> +
> +static void diffcore_filter_blobs(struct diff_queue_struct *q,
> +				  struct diff_options *options)
> +{
> +	int src, dst;
> +
> +	if (!options->blobfind)
> +		BUG("blobfind oidset not initialized???");
> +
> +	for (src = dst = 0; src < q->nr; src++) {
> +		struct diff_filepair *p = q->queue[src];
> +
> +		if ((DIFF_FILE_VALID(p->one) &&
> +		     oidset_contains(options->blobfind, &p->one->oid)) ||
> +		    (DIFF_FILE_VALID(p->two) &&
> +		     oidset_contains(options->blobfind, &p->two->oid))) {

Shouldn't this make sure that !DIFF_PAIR_UNMERGED(p) before looking
at the sides of the pair?  I think an unmerged pair is queued with
filespecs whose modes are cleared, but we should not be relying on
that---I think we even had bugs we fixed by introducing an explicit
is_unmerged field to the filepair struct.

> @@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
>  		simplify_merges(revs);
>  	if (revs->children.name)
>  		set_children(revs);
> +	if (revs->diffopt.blobfind)
> +		revs->simplify_history = 0;
>  	return 0;
>  }

It makes sense to clear this bit by default, but is this an
unconditonal clearing?  In other words, is there a way for the user
to countermand this default and ask for merge simplification from
the command line, e.g. "diff --simplify-history --blobfind=<blob>"?

> +test_description='test finding specific blobs in the revision walking'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup ' '
> +	git commit --allow-empty -m "empty initial commit" &&
> +
> +	echo "Hello, world!" >greeting &&
> +	git add greeting &&
> +	git commit -m "add the greeting blob" && # borrowed from Git from the Bottom Up
> +	git tag -m "the blob" greeting $(git rev-parse HEAD:greeting) &&
> +
> +	echo asdf >unrelated &&
> +	git add unrelated &&
> +	git commit -m "unrelated history" &&
> +
> +	git revert HEAD^ &&
> +
> +	git commit --allow-empty -m "another unrelated commit"
> +'
> +
> +test_expect_success 'find the greeting blob' '
> +	cat >expect <<-EOF &&
> +	Revert "add the greeting blob"
> +	add the greeting blob
> +	EOF
> +
> +	git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw &&
> +	cut -c 14- actual.raw >actual &&
> +	test_cmp expect actual

The hardcoded numbers look strange and smell like a workaround of
not asking for full object names.

Also, now it has the simplify-history bit in the change, don't we
want to check a mergy history, and also running "diff-files" while
the index has unmerged entries?

Other than that, the changes look quite sane and pleasnt read.

Thanks.


> +'
> +
> +test_done

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08  9:34   ` Jeff King
@ 2017-12-08 16:28     ` Ramsay Jones
  2017-12-08 20:19       ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Ramsay Jones @ 2017-12-08 16:28 UTC (permalink / raw)
  To: Jeff King, Stefan Beller; +Cc: git



On 08/12/17 09:34, Jeff King wrote:
> On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:
[snip]
>> Junio hinted at a different approach of solving this problem, which this
>> patch implements. Teach the diff machinery another flag for restricting
>> the information to what is shown. For example:
>>
>>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>>   b2feb64309 Revert the whole "ask curl-config" topic for now
>>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>>
>> we observe that the Makefile as shipped with 2.0 was introduced in
>> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
>> a different blob.

Sorry, this has nothing to do with this specific thread. :(

However, the above caught my eye. Commit b2feb64309 does not 'replace
the Makefile with a different blob'. That commit was a 'last minute'
revert of a topic _prior_ to v2.0.0, which resulted in the _same_
blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20).

[I haven't been following this topic, so just ignore me if I have
misunderstood what the above was describing! :-D ]

ATB,
Ramsay Jones


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08 15:04   ` Junio C Hamano
@ 2017-12-08 17:21     ` Junio C Hamano
  2017-12-08 21:11     ` Stefan Beller
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2017-12-08 17:21 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Stefan Beller <sbeller@google.com> writes:
> ...
>> @@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
>>  		simplify_merges(revs);
>>  	if (revs->children.name)
>>  		set_children(revs);
>> +	if (revs->diffopt.blobfind)
>> +		revs->simplify_history = 0;
>>  	return 0;
>>  }
>
> It makes sense to clear this bit by default, but is this an
> unconditonal clearing?  In other words, is there a way for the user
> to countermand this default and ask for merge simplification from
> the command line, e.g. "diff --simplify-history --blobfind=<blob>"?

Looking at the places that assign to revs->simplify_history in
revision.c it seems that "this option turns the merge simplification
off unconditionally" is pretty much the norm, and this change just
follows suit.  So perhaps it is OK to let this pass, at least for
now.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08 16:28     ` Ramsay Jones
@ 2017-12-08 20:19       ` Jeff King
  2017-12-08 20:39         ` Stefan Beller
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2017-12-08 20:19 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Stefan Beller, git

On Fri, Dec 08, 2017 at 04:28:23PM +0000, Ramsay Jones wrote:

> On 08/12/17 09:34, Jeff King wrote:
> > On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:
> [snip]
> >> Junio hinted at a different approach of solving this problem, which this
> >> patch implements. Teach the diff machinery another flag for restricting
> >> the information to what is shown. For example:
> >>
> >>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
> >>   b2feb64309 Revert the whole "ask curl-config" topic for now
> >>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
> >>
> >> we observe that the Makefile as shipped with 2.0 was introduced in
> >> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
> >> a different blob.
> 
> Sorry, this has nothing to do with this specific thread. :(
> 
> However, the above caught my eye. Commit b2feb64309 does not 'replace
> the Makefile with a different blob'. That commit was a 'last minute'
> revert of a topic _prior_ to v2.0.0, which resulted in the _same_
> blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20).

Right, I think the patch is working as advertised but the commit message
misinterprets the results. :)

If you add --raw, you can see that both commits introduce that blob, and
it never "goes away". That's because that happened in a merge, which we
don't diff in a default log invocation.

And in fact finding where this "goes away" is hard, because the merge
doesn't trigger "-c" or "--cc". It's 8eaf517835, which presumably was
forked before the revert in b2feb64309, and then the merge was able to
replace that blob completely with the other side of the merge.

Viewing with "-m" turns up a bunch of uninteresting merges. Looking at
"--first-parent" is how I got 8eaf517835.

So I think this one is tricky because of the revert. In the same way
that pathspec-limiting is often tricky in the face of a revert, because
the merges "hide" interesting things happening.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08 20:19       ` Jeff King
@ 2017-12-08 20:39         ` Stefan Beller
  2017-12-08 21:38           ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Beller @ 2017-12-08 20:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Ramsay Jones, git

On Fri, Dec 8, 2017 at 12:19 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Dec 08, 2017 at 04:28:23PM +0000, Ramsay Jones wrote:
>
>> On 08/12/17 09:34, Jeff King wrote:
>> > On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:
>> [snip]
>> >> Junio hinted at a different approach of solving this problem, which this
>> >> patch implements. Teach the diff machinery another flag for restricting
>> >> the information to what is shown. For example:
>> >>
>> >>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>> >>   b2feb64309 Revert the whole "ask curl-config" topic for now
>> >>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>> >>
>> >> we observe that the Makefile as shipped with 2.0 was introduced in
>> >> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
>> >> a different blob.
>>
>> Sorry, this has nothing to do with this specific thread. :(
>>
>> However, the above caught my eye. Commit b2feb64309 does not 'replace
>> the Makefile with a different blob'. That commit was a 'last minute'
>> revert of a topic _prior_ to v2.0.0, which resulted in the _same_
>> blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20).
>
> Right, I think the patch is working as advertised but the commit message
> misinterprets the results. :)

Thanks for digging. I came to a similar realization.

> If you add --raw, you can see that both commits introduce that blob, and
> it never "goes away". That's because that happened in a merge, which we
> don't diff in a default log invocation.

We should when --raw is given.
--raw is documented as  "For each commit, show a summary of changes
using the raw diff format." and I would argue that 'each commit' includes
merges. Though I guess this may have implications for long time users.

> And in fact finding where this "goes away" is hard, because the merge
> doesn't trigger "-c" or "--cc". It's 8eaf517835, which presumably was
> forked before the revert in b2feb64309, and then the merge was able to
> replace that blob completely with the other side of the merge.
>
> Viewing with "-m" turns up a bunch of uninteresting merges. Looking at
> "--first-parent" is how I got 8eaf517835.
>
> So I think this one is tricky because of the revert. In the same way
> that pathspec-limiting is often tricky in the face of a revert, because
> the merges "hide" interesting things happening.

yup, hidden merges are unfortunate. Is there an easy way to find out
about merges? (Junio hints at having tests around merges, which I'll do
next)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08 15:04   ` Junio C Hamano
  2017-12-08 17:21     ` Junio C Hamano
@ 2017-12-08 21:11     ` Stefan Beller
  2017-12-08 21:15       ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: Stefan Beller @ 2017-12-08 21:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

>> +
>> +             if ((DIFF_FILE_VALID(p->one) &&
>> +                  oidset_contains(options->blobfind, &p->one->oid)) ||
>> +                 (DIFF_FILE_VALID(p->two) &&
>> +                  oidset_contains(options->blobfind, &p->two->oid))) {
>
> Shouldn't this make sure that !DIFF_PAIR_UNMERGED(p) before looking
> at the sides of the pair?  I think an unmerged pair is queued with
> filespecs whose modes are cleared, but we should not be relying on
> that---I think we even had bugs we fixed by introducing an explicit
> is_unmerged field to the filepair struct.

I am not sure I follow. IIUC 'unmerged' only ever happens in the
index when there are multiple stages for one path (represented in the
working tree with diff markers). Aren't we supposed to find such
an unmerged blob as well (despite wrong mode, but the oid ought to fit)?

>> +     if (revs->diffopt.blobfind)
>> +             revs->simplify_history = 0;
>>       return 0;
>>  }
>
> It makes sense to clear this bit by default, but is this an
> unconditonal clearing?  In other words, is there a way for the user
> to countermand this default and ask for merge simplification from
> the command line, e.g. "diff --simplify-history --blobfind=<blob>"?

As noted in your reply, this is consistent with other occurrences of
simplify_history, so let's keep it this way.

>> +
>> +     git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw &&
>> +     cut -c 14- actual.raw >actual &&
>> +     test_cmp expect actual
>
> The hardcoded numbers look strange and smell like a workaround of
> not asking for full object names.

You mean the 12 and 14 ? Yeah I can fix that to 40 and 42 if you want.
I wrote it this way to be agnostic of the actual object id, but thought 12
would be enough for this test no matter the future hash.

> Also, now it has the simplify-history bit in the change, don't we
> want to check a mergy history, and also running "diff-files" while
> the index has unmerged entries?

yup, I am working on that.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08 21:11     ` Stefan Beller
@ 2017-12-08 21:15       ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2017-12-08 21:15 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

>>> +
>>> +             if ((DIFF_FILE_VALID(p->one) &&
>>> +                  oidset_contains(options->blobfind, &p->one->oid)) ||
>>> +                 (DIFF_FILE_VALID(p->two) &&
>>> +                  oidset_contains(options->blobfind, &p->two->oid))) {
>>
>> Shouldn't this make sure that !DIFF_PAIR_UNMERGED(p) before looking
>> at the sides of the pair?  I think an unmerged pair is queued with
>> filespecs whose modes are cleared, but we should not be relying on
>> that---I think we even had bugs we fixed by introducing an explicit
>> is_unmerged field to the filepair struct.
>
> I am not sure I follow. IIUC 'unmerged' only ever happens in the
> index when there are multiple stages for one path (represented in the
> working tree with diff markers). Aren't we supposed to find such
> an unmerged blob as well (despite wrong mode, but the oid ought to fit)?

I think diff_unmerged() records "this path is unmerged" without
giving any object name to either side, so you do not get to compare
(or even look at) p->{one,two}->oid with anything.  That is why I
think checking p->one and p->two like the above code, without making
sure that you are looking at a pair that is not unmerged, is wrong.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08 20:39         ` Stefan Beller
@ 2017-12-08 21:38           ` Jeff King
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2017-12-08 21:38 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Ramsay Jones, git

On Fri, Dec 08, 2017 at 12:39:55PM -0800, Stefan Beller wrote:

> > If you add --raw, you can see that both commits introduce that blob, and
> > it never "goes away". That's because that happened in a merge, which we
> > don't diff in a default log invocation.
> 
> We should when --raw is given.
> --raw is documented as  "For each commit, show a summary of changes
> using the raw diff format." and I would argue that 'each commit' includes
> merges. Though I guess this may have implications for long time users.

And "--patch" is documented as "generate patch", but it also does
nothing for merges by default. I think it has little to do with "--raw".
It is simply that the default for "log" is none of "-c", "--cc", or
"-m".

We _could_ change that default ("--cc" is already the default for
git-show), but I would not be surprised if that has fallouts (certainly
it makes git-log much slower).

> > So I think this one is tricky because of the revert. In the same way
> > that pathspec-limiting is often tricky in the face of a revert, because
> > the merges "hide" interesting things happening.
> 
> yup, hidden merges are unfortunate. Is there an easy way to find out
> about merges? (Junio hints at having tests around merges, which I'll do
> next)

If you find such an easy way, let me know. :)

One of the few really manual types of query I remember having done in
recent years is trying to pinpoint a bad merge. I.e., somebody during
merge resolution accidentally does "git checkout --ours foo.c", blowing
away changes which they didn't mean to. And then later you want to
figure out which merge did it.

If you use "-c" or "--cc", that isn't an "interesting" change, because
it resolves to one side of the merge. If you use "-m", you get way too
many changes and have to comb through them manually. I've resorted to
"-m --first-parent", but then you frequently have to dig down several
layers (e.g., the bad merge is a merge from "master" onto a topic
branch, and your first "--first-parent" attempt will just find the bad
topic being merged back into master).

I think the most promising tool I've seen there is to redo the merge and
show the diff between the auto-merge (including conflicts) and the
committed tree. It's just another definition of "is this hunk
interesting" that's different from "--cc".

I'm not sure how that would interact with something like "--blobfind",
though. For that matter, I'm not quite sure how your patch would
interact with "--cc". I think you may need to special-case it.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-11 19:58 ` [PATCH 0/1] diff-core blobfind Stefan Beller
@ 2017-12-11 19:58   ` Stefan Beller
  2017-12-11 23:17     ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Beller @ 2017-12-11 19:58 UTC (permalink / raw)
  To: git, gitster; +Cc: peff, Stefan Beller

Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

One might be tempted to extend git-describe to also work with blobs,
such that `git describe <blob-id>` gives a description as
'<commit-ish>:<path>'.  This was implemented at [2]; as seen by the sheer
number of responses (>110), it turns out this is tricky to get right.
The hard part to get right is picking the correct 'commit-ish' as that
could be the commit that (re-)introduced the blob or the blob that
removed the blob; the blob could exist in different branches.

Junio hinted at a different approach of solving this problem, which this
patch implements. Teach the diff machinery another flag for restricting
the information to what is shown. For example:

  $ ./git log --oneline --blobfind=v2.0.0:Makefile
  b2feb64309 Revert the whole "ask curl-config" topic for now
  47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"

we observe that the Makefile as shipped with 2.0 was introduced in
v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
a different blob.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
[2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt |  6 +++++
 Makefile                       |  1 +
 builtin/log.c                  |  2 +-
 diff.c                         | 21 ++++++++++++++++-
 diff.h                         |  3 +++
 diffcore-oidfind.c             | 42 ++++++++++++++++++++++++++++++++++
 diffcore.h                     |  1 +
 revision.c                     |  5 ++++-
 t/t4064-diff-oidfind.sh        | 51 ++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 129 insertions(+), 3 deletions(-)
 create mode 100644 diffcore-oidfind.c
 create mode 100755 t/t4064-diff-oidfind.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index dd0dba5b1d..67a99e522b 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -500,6 +500,12 @@ information.
 --pickaxe-regex::
 	Treat the <string> given to `-S` as an extended POSIX regular
 	expression to match.
+
+--find-object=<object-id>::
+	Restrict the output such that one side of the diff
+	matches the given object id. The object can be a blob,
+	or gitlink entry.
+
 endif::git-format-patch[]
 
 -O<orderfile>::
diff --git a/Makefile b/Makefile
index ee9d5eb11e..fc2b136694 100644
--- a/Makefile
+++ b/Makefile
@@ -775,6 +775,7 @@ LIB_OBJS += date.o
 LIB_OBJS += decorate.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
+LIB_OBJS += diffcore-oidfind.o
 LIB_OBJS += diffcore-order.o
 LIB_OBJS += diffcore-pickaxe.o
 LIB_OBJS += diffcore-rename.o
diff --git a/builtin/log.c b/builtin/log.c
index 6c1fa896ad..2ab7f338e6 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -181,7 +181,7 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 		init_display_notes(&rev->notes_opt);
 
 	if (rev->diffopt.pickaxe || rev->diffopt.filter ||
-	    rev->diffopt.flags.follow_renames)
+	    rev->diffopt.flags.follow_renames || rev->diffopt.oidfind)
 		rev->always_show_header = 0;
 
 	if (source)
diff --git a/diff.c b/diff.c
index 0763e89263..cb35934634 100644
--- a/diff.c
+++ b/diff.c
@@ -4082,6 +4082,7 @@ void diff_setup(struct diff_options *options)
 	options->interhunkcontext = diff_interhunk_context_default;
 	options->ws_error_highlight = ws_error_highlight_default;
 	options->flags.rename_empty = 1;
+	options->oidfind = NULL;
 
 	/* pathchange left =NULL by default */
 	options->change = diff_change;
@@ -4487,6 +4488,20 @@ static int parse_ws_error_highlight_opt(struct diff_options *opt, const char *ar
 	return 1;
 }
 
+static int parse_oidfind_opt(struct diff_options *opt, const char *arg)
+{
+	struct object_id oid;
+
+	/* We don't even need to have the object, any oid works. */
+	if (get_oid_blob(arg, &oid))
+		return error("unable to resolve '%s'", arg);
+
+	if (!opt->oidfind)
+		opt->oidfind = xcalloc(1, sizeof(*opt->oidfind));
+	oidset_insert(opt->oidfind, &oid);
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options,
 		   const char **av, int ac, const char *prefix)
 {
@@ -4736,7 +4751,8 @@ int diff_opt_parse(struct diff_options *options,
 	else if ((argcount = short_opt('O', av, &optarg))) {
 		options->orderfile = prefix_filename(prefix, optarg);
 		return argcount;
-	}
+	} else if (skip_prefix(arg, "--find-object=", &arg))
+		return parse_oidfind_opt(options, arg);
 	else if ((argcount = parse_long_opt("diff-filter", av, &optarg))) {
 		int offending = parse_diff_filter_opt(optarg, options);
 		if (offending)
@@ -5770,6 +5786,9 @@ void diffcore_std(struct diff_options *options)
 		diffcore_skip_stat_unmatch(options);
 	if (!options->found_follow) {
 		/* See try_to_follow_renames() in tree-diff.c */
+
+		if (options->oidfind)
+			diffcore_oidfind(options);
 		if (options->break_opt != -1)
 			diffcore_break(options->break_opt);
 		if (options->detect_rename)
diff --git a/diff.h b/diff.h
index 0fb18dd735..d2badb29a1 100644
--- a/diff.h
+++ b/diff.h
@@ -7,6 +7,7 @@
 #include "tree-walk.h"
 #include "pathspec.h"
 #include "object.h"
+#include "oidset.h"
 
 struct rev_info;
 struct diff_options;
@@ -174,6 +175,8 @@ struct diff_options {
 	enum diff_words_type word_diff;
 	enum diff_submodule_format submodule_format;
 
+	struct oidset *oidfind;
+
 	/* this is set by diffcore for DIFF_FORMAT_PATCH */
 	int found_changes;
 
diff --git a/diffcore-oidfind.c b/diffcore-oidfind.c
new file mode 100644
index 0000000000..39a0b5390f
--- /dev/null
+++ b/diffcore-oidfind.c
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2017 Google Inc.
+ */
+#include "cache.h"
+#include "diff.h"
+#include "diffcore.h"
+
+static void diffcore_filter_blobs(struct diff_queue_struct *q,
+				  struct diff_options *options)
+{
+	int src, dst;
+
+	if (!options->oidfind)
+		BUG("oidfind oidset not initialized???");
+
+	for (src = dst = 0; src < q->nr; src++) {
+		struct diff_filepair *p = q->queue[src];
+
+		if (!DIFF_PAIR_UNMERGED(p) &&
+		    ((DIFF_FILE_VALID(p->one) &&
+		     oidset_contains(options->oidfind, &p->one->oid)) ||
+		    (DIFF_FILE_VALID(p->two) &&
+		     oidset_contains(options->oidfind, &p->two->oid)))) {
+			q->queue[dst] = p;
+			dst++;
+		} else {
+			diff_free_filepair(p);
+		}
+	}
+
+	if (!dst) {
+		free(q->queue);
+		DIFF_QUEUE_CLEAR(q);
+	} else {
+		q->nr = dst;
+	}
+}
+
+void diffcore_oidfind(struct diff_options *options)
+{
+	diffcore_filter_blobs(&diff_queued_diff, options);
+}
diff --git a/diffcore.h b/diffcore.h
index a30da161da..6bb1d76b1a 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -107,6 +107,7 @@ extern struct diff_filepair *diff_queue(struct diff_queue_struct *,
 					struct diff_filespec *);
 extern void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
+extern void diffcore_oidfind(struct diff_options *);
 extern void diffcore_break(int);
 extern void diffcore_rename(struct diff_options *);
 extern void diffcore_merge_broken(void);
diff --git a/revision.c b/revision.c
index e2e691dd5a..e4611054b9 100644
--- a/revision.c
+++ b/revision.c
@@ -2409,7 +2409,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 	/* Pickaxe, diff-filter and rename following need diffs */
 	if (revs->diffopt.pickaxe ||
 	    revs->diffopt.filter ||
-	    revs->diffopt.flags.follow_renames)
+	    revs->diffopt.flags.follow_renames ||
+	    revs->diffopt.oidfind)
 		revs->diff = 1;
 
 	if (revs->topo_order)
@@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
 		simplify_merges(revs);
 	if (revs->children.name)
 		set_children(revs);
+	if (revs->diffopt.oidfind)
+		revs->simplify_history = 0;
 	return 0;
 }
 
diff --git a/t/t4064-diff-oidfind.sh b/t/t4064-diff-oidfind.sh
new file mode 100755
index 0000000000..7afe62df87
--- /dev/null
+++ b/t/t4064-diff-oidfind.sh
@@ -0,0 +1,51 @@
+#!/bin/sh
+
+test_description='test finding specific blobs in the revision walking'
+. ./test-lib.sh
+
+test_expect_success 'setup ' '
+	git commit --allow-empty -m "empty initial commit" &&
+
+	echo "Hello, world!" >greeting &&
+	git add greeting &&
+	git commit -m "add the greeting blob" && # borrowed from Git from the Bottom Up
+	git tag -m "the blob" greeting $(git rev-parse HEAD:greeting) &&
+
+	echo asdf >unrelated &&
+	git add unrelated &&
+	git commit -m "unrelated history" &&
+
+	git revert HEAD^ &&
+
+	git commit --allow-empty -m "another unrelated commit"
+'
+
+test_expect_success 'find the greeting blob' '
+	cat >expect <<-EOF &&
+	Revert "add the greeting blob"
+	add the greeting blob
+	EOF
+
+	git log --format=%s --find-object=greeting^{blob} >actual &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'setup a submodule' '
+	test_create_repo sub &&
+	test_commit -C sub sub &&
+	git submodule add ./sub sub &&
+	git commit -a -m "add sub"
+'
+
+test_expect_success 'find a submodule' '
+	cat >expect <<-EOF &&
+	add sub
+	EOF
+
+	git log --format=%s --find-object=HEAD:sub >actual &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-11 19:58   ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
@ 2017-12-11 23:17     ` Junio C Hamano
  2017-12-12  0:21       ` Stefan Beller
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2017-12-11 23:17 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, peff

Stefan Beller <sbeller@google.com> writes:

> the information to what is shown. For example:
>
>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>   b2feb64309 Revert the whole "ask curl-config" topic for now
>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"

This part is a bit stale???

> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index dd0dba5b1d..67a99e522b 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -500,6 +500,12 @@ information.
>  --pickaxe-regex::
>  	Treat the <string> given to `-S` as an extended POSIX regular
>  	expression to match.
> +
> +--find-object=<object-id>::
> +	Restrict the output such that one side of the diff
> +	matches the given object id. The object can be a blob,
> +	or gitlink entry.

OK.  In principle you should also be able to find a tree, but I do
not now how useful it would be.  Extending it to gitlink, which is
another kind of leaf node in the reachability DAG, does make tons of
sense---it's a no brainer that I feel ashamed not to have thought of
myself ;-)

> +LIB_OBJS += diffcore-oidfind.o

Just to nitpick, but "blobfind" was to find "blob", and if you are
extending it to find any "object", then that should be "objfind".
"oid" is _A_ way to refer to an object (i.e. the _name_ of it), and
name is *not* the same as the thing the name refers to, so...

> +static int parse_oidfind_opt(struct diff_options *opt, const char *arg)
> +{
> +	struct object_id oid;
> +
> +	/* We don't even need to have the object, any oid works. */
> +	if (get_oid_blob(arg, &oid))
> +		return error("unable to resolve '%s'", arg);

Should this still be get_oid_blob(), or should it be less specific
to blobs?

> +test_expect_success 'find the greeting blob' '
> +	cat >expect <<-EOF &&
> +	Revert "add the greeting blob"
> +	add the greeting blob
> +	EOF
> +
> +	git log --format=%s --find-object=greeting^{blob} >actual &&
> +
> +	test_cmp expect actual
> +'

Makes sense.

> +
> +test_expect_success 'setup a submodule' '
> +	test_create_repo sub &&
> +	test_commit -C sub sub &&
> +	git submodule add ./sub sub &&
> +	git commit -a -m "add sub"
> +'
> +
> +test_expect_success 'find a submodule' '
> +	cat >expect <<-EOF &&
> +	add sub
> +	EOF
> +
> +	git log --format=%s --find-object=HEAD:sub >actual &&
> +
> +	test_cmp expect actual
> +'

Nice (and cute).

> +test_done

Looking good.  Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-11 23:17     ` Junio C Hamano
@ 2017-12-12  0:21       ` Stefan Beller
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Beller @ 2017-12-12  0:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

On Mon, Dec 11, 2017 at 3:17 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> the information to what is shown. For example:
>>
>>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>>   b2feb64309 Revert the whole "ask curl-config" topic for now
>>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>
> This part is a bit stale???

fixed in the next reroll :(

>> +--find-object=<object-id>::
>> +     Restrict the output such that one side of the diff
>> +     matches the given object id. The object can be a blob,
>> +     or gitlink entry.
>
> OK.  In principle you should also be able to find a tree, but I do
> not now how useful it would be.  Extending it to gitlink, which is
> another kind of leaf node in the reachability DAG, does make tons of
> sense---it's a no brainer that I feel ashamed not to have thought of
> myself ;-)

The current patch under discussion doesn't find trees, though.
Hence the documentation is accurate saying that only blobs and
gitlinks work.

>
>> +LIB_OBJS += diffcore-oidfind.o
>
> Just to nitpick, but "blobfind" was to find "blob", and if you are
> extending it to find any "object", then that should be "objfind".
> "oid" is _A_ way to refer to an object (i.e. the _name_ of it), and
> name is *not* the same as the thing the name refers to, so...

obj-find sounds good.

>
>> +static int parse_oidfind_opt(struct diff_options *opt, const char *arg)
>> +{
>> +     struct object_id oid;
>> +
>> +     /* We don't even need to have the object, any oid works. */
>> +     if (get_oid_blob(arg, &oid))
>> +             return error("unable to resolve '%s'", arg);
>
> Should this still be get_oid_blob(), or should it be less specific
> to blobs?

We could check if it is a tree/commit and die as they are not
being handled correctly.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-12-12  0:21 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-20 22:25 [PATCH 0/1] Teaching the diff machinery about blobfind [WAS: git describe <blob>] Stefan Beller
2017-11-20 22:25 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
2017-11-24  7:43   ` Junio C Hamano
2017-11-25  4:59     ` Junio C Hamano
2017-12-07 21:40     ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2017-12-08  0:24 [PATCH 0/1] diffcore-blobfind Stefan Beller
2017-12-08  0:24 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
2017-12-08  9:34   ` Jeff King
2017-12-08 16:28     ` Ramsay Jones
2017-12-08 20:19       ` Jeff King
2017-12-08 20:39         ` Stefan Beller
2017-12-08 21:38           ` Jeff King
2017-12-08 15:04   ` Junio C Hamano
2017-12-08 17:21     ` Junio C Hamano
2017-12-08 21:11     ` Stefan Beller
2017-12-08 21:15       ` Junio C Hamano
2017-12-11 19:58 ` [PATCH 0/1] diff-core blobfind Stefan Beller
2017-12-11 19:58   ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
2017-12-11 23:17     ` Junio C Hamano
2017-12-12  0:21       ` Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).