git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-11-20 22:25 [PATCH 0/1] Teaching the diff machinery about blobfind [WAS: git describe <blob>] Stefan Beller
@ 2017-11-20 22:25 ` Stefan Beller
  2017-11-24  7:43   ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2017-11-20 22:25 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

One might be tempted to extend git-describe to also work with blobs,
such that `git describe <blob-id>` gives a description as
'<commit-ish>:<path>'.  This was implemented at [2]; as seen by the sheer
number of responses (>110), it turns out this is tricky to get right.
The hard part to get right is picking the correct 'commit-ish' as that
could be the commit that (re-)introduced the blob or the blob that
removed the blob; the blob could exist in different branches.

Junio hinted at a different approach of solving this problem, which this
patch implements. Teach the diff machinery another flag for restricting
the information to what is shown. For example:

  $ ./git log --oneline --blobfind=v2.0.0:Makefile
  b2feb64309 Revert the whole "ask curl-config" topic for now
  47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"

we observe that the Makefile as shipped with 2.0 was introduced in
v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
a different blob.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
[2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/

Signed-off-by: Stefan Beller <sbeller@google.com>
---

On playing around with this, trying to find more interesting cases, I observed:

    git log --oneline --blobfind=HEAD:COPYING
    703601d678 Update COPYING with GPLv2 with new FSF address
    
    git log --oneline --blobfind=703601d678^:COPYING
    459b8d22e5 tests: do not borrow from COPYING and README from the real source
    703601d678 Update COPYING with GPLv2 with new FSF address
    075b845a85 Add a COPYING notice, making it explicit that the license is GPLv2.

    t/diff-lib/COPYING may need an update of the adress of the FSF,
    #leftoverbits I guess.
    
Another interesting case that I found was
   git log --oneline --blobfind=v2.14.0:Makefile
   3921a0b3c3 perf: add test for writing the index
   36f048c5e4 sha1dc: build git plumbing code more explicitly
   2118805b92 Makefile: add style build rule

all of which were after v2.14, such that the introduction of that blob doesn't
show up; I suspect it came in via a merge as unrelated series may have updated
the Makefile in parallel, though git-log should have told me?

Thanks,
Stefan

 Documentation/diff-options.txt |  4 ++++
 Makefile                       |  1 +
 builtin/log.c                  |  2 +-
 diff.c                         | 20 ++++++++++++++++-
 diff.h                         |  3 +++
 diffcore-blobfind.c            | 51 ++++++++++++++++++++++++++++++++++++++++++
 diffcore.h                     |  1 +
 revision.c                     |  3 ++-
 t/t4064-diff-blobfind.sh       | 35 +++++++++++++++++++++++++++++
 9 files changed, 117 insertions(+), 3 deletions(-)
 create mode 100644 diffcore-blobfind.c
 create mode 100755 t/t4064-diff-blobfind.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index dd0dba5b1d..252a21cc19 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -500,6 +500,10 @@ information.
 --pickaxe-regex::
 	Treat the <string> given to `-S` as an extended POSIX regular
 	expression to match.
+--blobfind=<blob-id>::
+	Restrict the output such that one side of the diff
+	matches the given blob-id.
+
 endif::git-format-patch[]
 
 -O<orderfile>::
diff --git a/Makefile b/Makefile
index ee9d5eb11e..fdfa8f38f6 100644
--- a/Makefile
+++ b/Makefile
@@ -775,6 +775,7 @@ LIB_OBJS += date.o
 LIB_OBJS += decorate.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
+LIB_OBJS += diffcore-blobfind.o
 LIB_OBJS += diffcore-order.o
 LIB_OBJS += diffcore-pickaxe.o
 LIB_OBJS += diffcore-rename.o
diff --git a/builtin/log.c b/builtin/log.c
index 6c1fa896ad..7b91f61423 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -181,7 +181,7 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 		init_display_notes(&rev->notes_opt);
 
 	if (rev->diffopt.pickaxe || rev->diffopt.filter ||
-	    rev->diffopt.flags.follow_renames)
+	    rev->diffopt.flags.follow_renames || rev->diffopt.blobfind)
 		rev->always_show_header = 0;
 
 	if (source)
diff --git a/diff.c b/diff.c
index 0763e89263..8861f89ab1 100644
--- a/diff.c
+++ b/diff.c
@@ -4082,6 +4082,7 @@ void diff_setup(struct diff_options *options)
 	options->interhunkcontext = diff_interhunk_context_default;
 	options->ws_error_highlight = ws_error_highlight_default;
 	options->flags.rename_empty = 1;
+	options->blobfind = NULL;
 
 	/* pathchange left =NULL by default */
 	options->change = diff_change;
@@ -4487,6 +4488,19 @@ static int parse_ws_error_highlight_opt(struct diff_options *opt, const char *ar
 	return 1;
 }
 
+static int parse_blobfind_opt(struct diff_options *opt, const char *arg)
+{
+	struct object_id oid;
+
+	if (get_oid_blob(arg, &oid) || sha1_object_info(oid.hash, NULL) != OBJ_BLOB)
+		return error("object '%s' is not a blob", arg);
+
+	if (!opt->blobfind)
+		opt->blobfind = xcalloc(1, sizeof(*opt->blobfind));
+	oidset_insert(opt->blobfind, &oid);
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options,
 		   const char **av, int ac, const char *prefix)
 {
@@ -4736,7 +4750,8 @@ int diff_opt_parse(struct diff_options *options,
 	else if ((argcount = short_opt('O', av, &optarg))) {
 		options->orderfile = prefix_filename(prefix, optarg);
 		return argcount;
-	}
+	} else if (skip_prefix(arg, "--blobfind=", &arg))
+		return parse_blobfind_opt(options, arg);
 	else if ((argcount = parse_long_opt("diff-filter", av, &optarg))) {
 		int offending = parse_diff_filter_opt(optarg, options);
 		if (offending)
@@ -5770,6 +5785,9 @@ void diffcore_std(struct diff_options *options)
 		diffcore_skip_stat_unmatch(options);
 	if (!options->found_follow) {
 		/* See try_to_follow_renames() in tree-diff.c */
+
+		if (options->blobfind)
+			diffcore_blobfind(options);
 		if (options->break_opt != -1)
 			diffcore_break(options->break_opt);
 		if (options->detect_rename)
diff --git a/diff.h b/diff.h
index 0fb18dd735..9178e498fa 100644
--- a/diff.h
+++ b/diff.h
@@ -7,6 +7,7 @@
 #include "tree-walk.h"
 #include "pathspec.h"
 #include "object.h"
+#include "oidset.h"
 
 struct rev_info;
 struct diff_options;
@@ -174,6 +175,8 @@ struct diff_options {
 	enum diff_words_type word_diff;
 	enum diff_submodule_format submodule_format;
 
+	struct oidset *blobfind;
+
 	/* this is set by diffcore for DIFF_FORMAT_PATCH */
 	int found_changes;
 
diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
new file mode 100644
index 0000000000..5d222fc336
--- /dev/null
+++ b/diffcore-blobfind.c
@@ -0,0 +1,51 @@
+/*
+ * Copyright (c) 2017 Google Inc.
+ */
+#include "cache.h"
+#include "diff.h"
+#include "diffcore.h"
+
+static void diffcore_filter_blobs(struct diff_queue_struct *q,
+				  struct diff_options *options)
+{
+	int i, j = 0, c = q->nr;
+
+	if (!options->blobfind)
+		BUG("blobfind oidset not initialized???");
+
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+
+		if (DIFF_PAIR_UNMERGED(p) ||
+		    (DIFF_FILE_VALID(p->one) &&
+		     oidset_contains(options->blobfind, &p->one->oid)) ||
+		    (DIFF_FILE_VALID(p->two) &&
+		     oidset_contains(options->blobfind, &p->two->oid)))
+			continue;
+
+		diff_free_filepair(p);
+		q->queue[i] = NULL;
+		c--;
+	}
+
+	/* Keep it sorted. */
+	i = 0; j = 0;
+	while (i < c) {
+		while (!q->queue[j])
+			j++;
+		q->queue[i] = q->queue[j];
+		i++; j++;
+	}
+
+	q->nr = c;
+
+	if (!c) {
+		free(q->queue);
+		DIFF_QUEUE_CLEAR(q);
+	}
+}
+
+void diffcore_blobfind(struct diff_options *options)
+{
+	diffcore_filter_blobs(&diff_queued_diff, options);
+}
diff --git a/diffcore.h b/diffcore.h
index a30da161da..431917672f 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -107,6 +107,7 @@ extern struct diff_filepair *diff_queue(struct diff_queue_struct *,
 					struct diff_filespec *);
 extern void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
+extern void diffcore_blobfind(struct diff_options *);
 extern void diffcore_break(int);
 extern void diffcore_rename(struct diff_options *);
 extern void diffcore_merge_broken(void);
diff --git a/revision.c b/revision.c
index e2e691dd5a..6449619c0a 100644
--- a/revision.c
+++ b/revision.c
@@ -2409,7 +2409,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 	/* Pickaxe, diff-filter and rename following need diffs */
 	if (revs->diffopt.pickaxe ||
 	    revs->diffopt.filter ||
-	    revs->diffopt.flags.follow_renames)
+	    revs->diffopt.flags.follow_renames ||
+	    revs->diffopt.blobfind)
 		revs->diff = 1;
 
 	if (revs->topo_order)
diff --git a/t/t4064-diff-blobfind.sh b/t/t4064-diff-blobfind.sh
new file mode 100755
index 0000000000..b2c2964d77
--- /dev/null
+++ b/t/t4064-diff-blobfind.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+
+test_description='test finding specific blobs in the revision walking'
+. ./test-lib.sh
+
+test_expect_success 'setup ' '
+	git commit --allow-empty -m "empty initial commit" &&
+
+	echo "Hello, world!" >greeting &&
+	git add greeting &&
+	git commit -m "add the greeting blob" && # borrowed from Git from the Bottom Up
+	git tag -m "the blob" greeting $(git rev-parse HEAD:greeting) &&
+
+	echo asdf >unrelated &&
+	git add unrelated &&
+	git commit -m "unrelated history" &&
+
+	git revert HEAD^ &&
+
+	git commit --allow-empty -m "another unrelated commit"
+'
+
+test_expect_success 'find the greeting blob' '
+	cat >expect <<-EOF &&
+	Revert "add the greeting blob"
+	add the greeting blob
+	EOF
+
+	git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw &&
+	cut -c 14- actual.raw >actual &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.15.0.128.gcadd42da22


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-11-20 22:25 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
@ 2017-11-24  7:43   ` Junio C Hamano
  2017-11-25  4:59     ` Junio C Hamano
  2017-12-07 21:40     ` Junio C Hamano
  0 siblings, 2 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-11-24  7:43 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> Sometimes users are given a hash of an object and they want to
> identify it further (ex.: Use verify-pack to find the largest blobs,
> but what are these? or [1])
>
> One might be tempted to extend git-describe to also work with blobs,
> such that `git describe <blob-id>` gives a description as
> '<commit-ish>:<path>'.  This was implemented at [2]; as seen by the sheer
> number of responses (>110), it turns out this is tricky to get right.
> The hard part to get right is picking the correct 'commit-ish' as that
> could be the commit that (re-)introduced the blob or the blob that
> removed the blob; the blob could exist in different branches.
>
> Junio hinted at a different approach of solving this problem, which this
> patch implements. Teach the diff machinery another flag for restricting
> the information to what is shown. For example:
>
>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>   b2feb64309 Revert the whole "ask curl-config" topic for now
>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>
> we observe that the Makefile as shipped with 2.0 was introduced in
> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
> a different blob.
>
> [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
> [2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>
> On playing around with this, trying to find more interesting cases, I observed:
>
>     git log --oneline --blobfind=HEAD:COPYING
>     703601d678 Update COPYING with GPLv2 with new FSF address
>     
>     git log --oneline --blobfind=703601d678^:COPYING
>     459b8d22e5 tests: do not borrow from COPYING and README from the real source
>     703601d678 Update COPYING with GPLv2 with new FSF address
>     075b845a85 Add a COPYING notice, making it explicit that the license is GPLv2.
>
>     t/diff-lib/COPYING may need an update of the adress of the FSF,
>     # leftoverbits I guess.

I do not think so.  See tz/fsf-address-update topic for details.

Please do not contaminate the list archive with careless mention of 
"hash-mark plus left over bits", as it will make searching the real
good bits harder.  Thanks.

> Another interesting case that I found was
>    git log --oneline --blobfind=v2.14.0:Makefile
>    3921a0b3c3 perf: add test for writing the index
>    36f048c5e4 sha1dc: build git plumbing code more explicitly
>    2118805b92 Makefile: add style build rule
>
> all of which were after v2.14, such that the introduction of that blob doesn't
> show up; I suspect it came in via a merge as unrelated series may have updated
> the Makefile in parallel, though git-log should have told me?

If that is the case, shouldn't we make this new mode imply
--full-history to forbid history simplification?  "git log" is a
tool to find _an_ explanation of the current state, and the usual
history simplification makes tons of sense there, but blobfind is
run most likely in order to find _all_ mention of the set of blobs
given.

> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index dd0dba5b1d..252a21cc19 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -500,6 +500,10 @@ information.
>  --pickaxe-regex::
>  	Treat the <string> given to `-S` as an extended POSIX regular
>  	expression to match.
> +--blobfind=<blob-id>::
> +	Restrict the output such that one side of the diff
> +	matches the given blob-id.
> +
>  endif::git-format-patch[]

Can we have a blank line between these enumerations to make the
source easier to read?  Thanks.

> diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
> new file mode 100644
> index 0000000000..5d222fc336
> --- /dev/null
> +++ b/diffcore-blobfind.c
> @@ -0,0 +1,51 @@
> +/*
> + * Copyright (c) 2017 Google Inc.
> + */
> +#include "cache.h"
> +#include "diff.h"
> +#include "diffcore.h"
> +
> +static void diffcore_filter_blobs(struct diff_queue_struct *q,
> +				  struct diff_options *options)
> +{
> +	int i, j = 0, c = q->nr;
> +
> +	if (!options->blobfind)
> +		BUG("blobfind oidset not initialized???");
> +
> +	for (i = 0; i < q->nr; i++) {
> +		struct diff_filepair *p = q->queue[i];
> +
> +		if (DIFF_PAIR_UNMERGED(p) ||
> +		    (DIFF_FILE_VALID(p->one) &&
> +		     oidset_contains(options->blobfind, &p->one->oid)) ||
> +		    (DIFF_FILE_VALID(p->two) &&
> +		     oidset_contains(options->blobfind, &p->two->oid)))
> +			continue;

So, we keep an unmerged pair, a pair that mentions a sought-blob on
one side or the other side?  I am not sure if we want to keep the
unmerged pair for the purpose of this one.

> +		diff_free_filepair(p);
> +		q->queue[i] = NULL;
> +		c--;

Also, if you are doing the in-place shrinking and have already
introduced another counter 'j' that is initialized to 0, I think it
makes more sense to do the shrinking in-place.  'i' will stay to be
the source-scan pointer that runs 0 thru q->nr, while 'j' can be
used in this loop (where you have 'continue') to move the current
one that is determined to survive from q->queue[i] to q->queue[j++].

Then you do not need 'c'; when the loop ends, 'j' would be the
number of surviving entries and q->nr can be adjusted to it.  Unlike
the usual pattern taken by the other diffcore transformations where
a new queue is populated and the old one discarded, this would leave
the q->queue[] over-allocated, but I do not think it is too bad.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-11-24  7:43   ` Junio C Hamano
@ 2017-11-25  4:59     ` Junio C Hamano
  2017-12-07 21:40     ` Junio C Hamano
  1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-11-25  4:59 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> If that is the case, shouldn't we make this new mode imply
> --full-history to forbid history simplification?  "git log" is a
> tool to find _an_ explanation of the current state, and the usual
> history simplification makes tons of sense there, but blobfind is
> run most likely in order to find _all_ mention of the set of blobs
> given.

One scenario that I think we may want to be careful about is this:

 ---o---*---*---A*--M*--o---X
     \             /
      o---*---o---B

where commits marked with '*' has the same blob M:Makefile you are
looking for at the same path Makefile, and we start traversal at X
with "git log --blobfind=M:Makefile X" (or even with a pathspec, i.e.
"git log --blobfind=M:Makefile X -- Makefile).

The usual merge simplification rules would say "Ah, M and A are
TREESAME so we do not have to look at the side branch that ends at
B".  If the user is interested in finding all the introduction and
the retirement of a specific blob object, we would miss the
transition around the '*' on that side branch and ends up finding
only the transitions after the fork point where the blob is
introduced, and after M where the blob is retired.

Another interesting case we may want to be careful is this:

    ---A*--M*--o---X
          /
      ---B*

for the same reason.  The usual merge simplification rules are
designed to come up with _an_ explanation for the state in X,
and because M is TREESAME with both A and B, it would pick just
one (the first parent) while ignoring the other.  Again, that would
not be appropriate if the reason why the user is running the command
is to find all the introduction and the retirement of an object.

It may be worth covering these in the tests (I didn't try to see
specifically if the patch has these cases already, as I didn't think
of the issue when I responded---sorry about that).

Thanks.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-11-24  7:43   ` Junio C Hamano
  2017-11-25  4:59     ` Junio C Hamano
@ 2017-12-07 21:40     ` Junio C Hamano
  1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-12-07 21:40 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

After saying "Will merge to 'next'" in the recent "What's cooking"
report, I noticed that a few loose ends were never tied on this
topic.

>> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
>> index dd0dba5b1d..252a21cc19 100644
>> --- a/Documentation/diff-options.txt
>> +++ b/Documentation/diff-options.txt
>> @@ -500,6 +500,10 @@ information.
>>  --pickaxe-regex::
>>  	Treat the <string> given to `-S` as an extended POSIX regular
>>  	expression to match.
>> +--blobfind=<blob-id>::
>> +	Restrict the output such that one side of the diff
>> +	matches the given blob-id.
>> +
>>  endif::git-format-patch[]
>
> Can we have a blank line between these enumerations to make the
> source easier to read?  Thanks.
>
> ...
> So, we keep an unmerged pair, a pair that mentions a sought-blob on
> one side or the other side?  I am not sure if we want to keep the
> unmerged pair for the purpose of this one.
>
>> +		diff_free_filepair(p);
>> +		q->queue[i] = NULL;
>> +		c--;
>
> Also, if you are doing the in-place shrinking and have already
> introduced another counter 'j' that is initialized to 0, I think it
> makes more sense to do the shrinking in-place.  'i' will stay to be
> the source-scan pointer that runs 0 thru q->nr, while 'j' can be
> used in this loop (where you have 'continue') to move the current
> one that is determined to survive from q->queue[i] to q->queue[j++].
>
> Then you do not need 'c'; when the loop ends, 'j' would be the
> number of surviving entries and q->nr can be adjusted to it.  Unlike
> the usual pattern taken by the other diffcore transformations where
> a new queue is populated and the old one discarded, this would leave
> the q->queue[] over-allocated, but I do not think it is too bad.

Here is to illustrate the last point.  I still think we should keep
the unmerged entries for the purpose of blobfind but it should be
trivial to fix that.

 diffcore-blobfind.c | 33 ++++++++++++---------------------
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
index 5d222fc336..bf63ba61dc 100644
--- a/diffcore-blobfind.c
+++ b/diffcore-blobfind.c
@@ -8,40 +8,31 @@
 static void diffcore_filter_blobs(struct diff_queue_struct *q,
 				  struct diff_options *options)
 {
-	int i, j = 0, c = q->nr;
+	int src, dst;
 
 	if (!options->blobfind)
 		BUG("blobfind oidset not initialized???");
 
-	for (i = 0; i < q->nr; i++) {
-		struct diff_filepair *p = q->queue[i];
+	for (src = dst = 0; src < q->nr; src++) {
+		struct diff_filepair *p = q->queue[src];
 
 		if (DIFF_PAIR_UNMERGED(p) ||
 		    (DIFF_FILE_VALID(p->one) &&
 		     oidset_contains(options->blobfind, &p->one->oid)) ||
 		    (DIFF_FILE_VALID(p->two) &&
-		     oidset_contains(options->blobfind, &p->two->oid)))
-			continue;
-
-		diff_free_filepair(p);
-		q->queue[i] = NULL;
-		c--;
-	}
-
-	/* Keep it sorted. */
-	i = 0; j = 0;
-	while (i < c) {
-		while (!q->queue[j])
-			j++;
-		q->queue[i] = q->queue[j];
-		i++; j++;
+		     oidset_contains(options->blobfind, &p->two->oid))) {
+			q->queue[dst] = p;
+			dst++;
+		} else {
+			diff_free_filepair(p);
+		}
 	}
 
-	q->nr = c;
-
-	if (!c) {
+	if (!dst) {
 		free(q->queue);
 		DIFF_QUEUE_CLEAR(q);
+	} else {
+		q->nr = dst;
 	}
 }
 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 0/1] diffcore-blobfind
@ 2017-12-08  0:24 Stefan Beller
  2017-12-08  0:24 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
  2017-12-11 19:58 ` [PATCH 0/1] diff-core blobfind Stefan Beller
  0 siblings, 2 replies; 29+ messages in thread
From: Stefan Beller @ 2017-12-08  0:24 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

This includes the suggestions by Junio,

Thanks,
Stefan

interdiff to currently queued below.

Stefan Beller (1):
  diffcore: add a filter to find a specific blob

 Documentation/diff-options.txt |  5 +++++
 Makefile                       |  1 +
 builtin/log.c                  |  2 +-
 diff.c                         | 20 +++++++++++++++++++-
 diff.h                         |  3 +++
 diffcore-blobfind.c            | 41 +++++++++++++++++++++++++++++++++++++++++
 diffcore.h                     |  1 +
 revision.c                     |  5 ++++-
 t/t4064-diff-blobfind.sh       | 35 +++++++++++++++++++++++++++++++++++
 9 files changed, 110 insertions(+), 3 deletions(-)
 create mode 100644 diffcore-blobfind.c
 create mode 100755 t/t4064-diff-blobfind.sh

-- 
2.15.1.424.g9478a66081-goog

diff --git c/Documentation/diff-options.txt w/Documentation/diff-options.txt
index 252a21cc19..34d53b95f1 100644
--- c/Documentation/diff-options.txt
+++ w/Documentation/diff-options.txt
@@ -500,6 +500,7 @@ information.
 --pickaxe-regex::
 	Treat the <string> given to `-S` as an extended POSIX regular
 	expression to match.
+
 --blobfind=<blob-id>::
 	Restrict the output such that one side of the diff
 	matches the given blob-id.
diff --git c/diffcore-blobfind.c w/diffcore-blobfind.c
index 5d222fc336..e65c7cad6e 100644
--- c/diffcore-blobfind.c
+++ w/diffcore-blobfind.c
@@ -8,40 +8,30 @@
 static void diffcore_filter_blobs(struct diff_queue_struct *q,
 				  struct diff_options *options)
 {
-	int i, j = 0, c = q->nr;
+	int src, dst;
 
 	if (!options->blobfind)
 		BUG("blobfind oidset not initialized???");
 
-	for (i = 0; i < q->nr; i++) {
-		struct diff_filepair *p = q->queue[i];
+	for (src = dst = 0; src < q->nr; src++) {
+		struct diff_filepair *p = q->queue[src];
 
-		if (DIFF_PAIR_UNMERGED(p) ||
-		    (DIFF_FILE_VALID(p->one) &&
+		if ((DIFF_FILE_VALID(p->one) &&
 		     oidset_contains(options->blobfind, &p->one->oid)) ||
 		    (DIFF_FILE_VALID(p->two) &&
-		     oidset_contains(options->blobfind, &p->two->oid)))
-			continue;
-
-		diff_free_filepair(p);
-		q->queue[i] = NULL;
-		c--;
-	}
-
-	/* Keep it sorted. */
-	i = 0; j = 0;
-	while (i < c) {
-		while (!q->queue[j])
-			j++;
-		q->queue[i] = q->queue[j];
-		i++; j++;
+		     oidset_contains(options->blobfind, &p->two->oid))) {
+			q->queue[dst] = p;
+			dst++;
+		} else {
+			diff_free_filepair(p);
+		}
 	}
 
-	q->nr = c;
-
-	if (!c) {
+	if (!dst) {
 		free(q->queue);
 		DIFF_QUEUE_CLEAR(q);
+	} else {
+		q->nr = dst;
 	}
 }
 
diff --git c/revision.c w/revision.c
index 6449619c0a..8e1a89f832 100644
--- c/revision.c
+++ w/revision.c
@@ -2884,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
 		simplify_merges(revs);
 	if (revs->children.name)
 		set_children(revs);
+	if (revs->diffopt.blobfind)
+		revs->simplify_history = 0;
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08  0:24 [PATCH 0/1] diffcore-blobfind Stefan Beller
@ 2017-12-08  0:24 ` Stefan Beller
  2017-12-08  9:34   ` Jeff King
  2017-12-08 15:04   ` Junio C Hamano
  2017-12-11 19:58 ` [PATCH 0/1] diff-core blobfind Stefan Beller
  1 sibling, 2 replies; 29+ messages in thread
From: Stefan Beller @ 2017-12-08  0:24 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

One might be tempted to extend git-describe to also work with blobs,
such that `git describe <blob-id>` gives a description as
'<commit-ish>:<path>'.  This was implemented at [2]; as seen by the sheer
number of responses (>110), it turns out this is tricky to get right.
The hard part to get right is picking the correct 'commit-ish' as that
could be the commit that (re-)introduced the blob or the blob that
removed the blob; the blob could exist in different branches.

Junio hinted at a different approach of solving this problem, which this
patch implements. Teach the diff machinery another flag for restricting
the information to what is shown. For example:

  $ ./git log --oneline --blobfind=v2.0.0:Makefile
  b2feb64309 Revert the whole "ask curl-config" topic for now
  47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"

we observe that the Makefile as shipped with 2.0 was introduced in
v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
a different blob.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
[2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt |  5 +++++
 Makefile                       |  1 +
 builtin/log.c                  |  2 +-
 diff.c                         | 20 +++++++++++++++++++-
 diff.h                         |  3 +++
 diffcore-blobfind.c            | 41 +++++++++++++++++++++++++++++++++++++++++
 diffcore.h                     |  1 +
 revision.c                     |  5 ++++-
 t/t4064-diff-blobfind.sh       | 35 +++++++++++++++++++++++++++++++++++
 9 files changed, 110 insertions(+), 3 deletions(-)
 create mode 100644 diffcore-blobfind.c
 create mode 100755 t/t4064-diff-blobfind.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index dd0dba5b1d..34d53b95f1 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -500,6 +500,11 @@ information.
 --pickaxe-regex::
 	Treat the <string> given to `-S` as an extended POSIX regular
 	expression to match.
+
+--blobfind=<blob-id>::
+	Restrict the output such that one side of the diff
+	matches the given blob-id.
+
 endif::git-format-patch[]
 
 -O<orderfile>::
diff --git a/Makefile b/Makefile
index ee9d5eb11e..fdfa8f38f6 100644
--- a/Makefile
+++ b/Makefile
@@ -775,6 +775,7 @@ LIB_OBJS += date.o
 LIB_OBJS += decorate.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
+LIB_OBJS += diffcore-blobfind.o
 LIB_OBJS += diffcore-order.o
 LIB_OBJS += diffcore-pickaxe.o
 LIB_OBJS += diffcore-rename.o
diff --git a/builtin/log.c b/builtin/log.c
index 6c1fa896ad..7b91f61423 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -181,7 +181,7 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 		init_display_notes(&rev->notes_opt);
 
 	if (rev->diffopt.pickaxe || rev->diffopt.filter ||
-	    rev->diffopt.flags.follow_renames)
+	    rev->diffopt.flags.follow_renames || rev->diffopt.blobfind)
 		rev->always_show_header = 0;
 
 	if (source)
diff --git a/diff.c b/diff.c
index 0763e89263..8861f89ab1 100644
--- a/diff.c
+++ b/diff.c
@@ -4082,6 +4082,7 @@ void diff_setup(struct diff_options *options)
 	options->interhunkcontext = diff_interhunk_context_default;
 	options->ws_error_highlight = ws_error_highlight_default;
 	options->flags.rename_empty = 1;
+	options->blobfind = NULL;
 
 	/* pathchange left =NULL by default */
 	options->change = diff_change;
@@ -4487,6 +4488,19 @@ static int parse_ws_error_highlight_opt(struct diff_options *opt, const char *ar
 	return 1;
 }
 
+static int parse_blobfind_opt(struct diff_options *opt, const char *arg)
+{
+	struct object_id oid;
+
+	if (get_oid_blob(arg, &oid) || sha1_object_info(oid.hash, NULL) != OBJ_BLOB)
+		return error("object '%s' is not a blob", arg);
+
+	if (!opt->blobfind)
+		opt->blobfind = xcalloc(1, sizeof(*opt->blobfind));
+	oidset_insert(opt->blobfind, &oid);
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options,
 		   const char **av, int ac, const char *prefix)
 {
@@ -4736,7 +4750,8 @@ int diff_opt_parse(struct diff_options *options,
 	else if ((argcount = short_opt('O', av, &optarg))) {
 		options->orderfile = prefix_filename(prefix, optarg);
 		return argcount;
-	}
+	} else if (skip_prefix(arg, "--blobfind=", &arg))
+		return parse_blobfind_opt(options, arg);
 	else if ((argcount = parse_long_opt("diff-filter", av, &optarg))) {
 		int offending = parse_diff_filter_opt(optarg, options);
 		if (offending)
@@ -5770,6 +5785,9 @@ void diffcore_std(struct diff_options *options)
 		diffcore_skip_stat_unmatch(options);
 	if (!options->found_follow) {
 		/* See try_to_follow_renames() in tree-diff.c */
+
+		if (options->blobfind)
+			diffcore_blobfind(options);
 		if (options->break_opt != -1)
 			diffcore_break(options->break_opt);
 		if (options->detect_rename)
diff --git a/diff.h b/diff.h
index 0fb18dd735..9178e498fa 100644
--- a/diff.h
+++ b/diff.h
@@ -7,6 +7,7 @@
 #include "tree-walk.h"
 #include "pathspec.h"
 #include "object.h"
+#include "oidset.h"
 
 struct rev_info;
 struct diff_options;
@@ -174,6 +175,8 @@ struct diff_options {
 	enum diff_words_type word_diff;
 	enum diff_submodule_format submodule_format;
 
+	struct oidset *blobfind;
+
 	/* this is set by diffcore for DIFF_FORMAT_PATCH */
 	int found_changes;
 
diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
new file mode 100644
index 0000000000..e65c7cad6e
--- /dev/null
+++ b/diffcore-blobfind.c
@@ -0,0 +1,41 @@
+/*
+ * Copyright (c) 2017 Google Inc.
+ */
+#include "cache.h"
+#include "diff.h"
+#include "diffcore.h"
+
+static void diffcore_filter_blobs(struct diff_queue_struct *q,
+				  struct diff_options *options)
+{
+	int src, dst;
+
+	if (!options->blobfind)
+		BUG("blobfind oidset not initialized???");
+
+	for (src = dst = 0; src < q->nr; src++) {
+		struct diff_filepair *p = q->queue[src];
+
+		if ((DIFF_FILE_VALID(p->one) &&
+		     oidset_contains(options->blobfind, &p->one->oid)) ||
+		    (DIFF_FILE_VALID(p->two) &&
+		     oidset_contains(options->blobfind, &p->two->oid))) {
+			q->queue[dst] = p;
+			dst++;
+		} else {
+			diff_free_filepair(p);
+		}
+	}
+
+	if (!dst) {
+		free(q->queue);
+		DIFF_QUEUE_CLEAR(q);
+	} else {
+		q->nr = dst;
+	}
+}
+
+void diffcore_blobfind(struct diff_options *options)
+{
+	diffcore_filter_blobs(&diff_queued_diff, options);
+}
diff --git a/diffcore.h b/diffcore.h
index a30da161da..431917672f 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -107,6 +107,7 @@ extern struct diff_filepair *diff_queue(struct diff_queue_struct *,
 					struct diff_filespec *);
 extern void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
+extern void diffcore_blobfind(struct diff_options *);
 extern void diffcore_break(int);
 extern void diffcore_rename(struct diff_options *);
 extern void diffcore_merge_broken(void);
diff --git a/revision.c b/revision.c
index e2e691dd5a..8e1a89f832 100644
--- a/revision.c
+++ b/revision.c
@@ -2409,7 +2409,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 	/* Pickaxe, diff-filter and rename following need diffs */
 	if (revs->diffopt.pickaxe ||
 	    revs->diffopt.filter ||
-	    revs->diffopt.flags.follow_renames)
+	    revs->diffopt.flags.follow_renames ||
+	    revs->diffopt.blobfind)
 		revs->diff = 1;
 
 	if (revs->topo_order)
@@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
 		simplify_merges(revs);
 	if (revs->children.name)
 		set_children(revs);
+	if (revs->diffopt.blobfind)
+		revs->simplify_history = 0;
 	return 0;
 }
 
diff --git a/t/t4064-diff-blobfind.sh b/t/t4064-diff-blobfind.sh
new file mode 100755
index 0000000000..b2c2964d77
--- /dev/null
+++ b/t/t4064-diff-blobfind.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+
+test_description='test finding specific blobs in the revision walking'
+. ./test-lib.sh
+
+test_expect_success 'setup ' '
+	git commit --allow-empty -m "empty initial commit" &&
+
+	echo "Hello, world!" >greeting &&
+	git add greeting &&
+	git commit -m "add the greeting blob" && # borrowed from Git from the Bottom Up
+	git tag -m "the blob" greeting $(git rev-parse HEAD:greeting) &&
+
+	echo asdf >unrelated &&
+	git add unrelated &&
+	git commit -m "unrelated history" &&
+
+	git revert HEAD^ &&
+
+	git commit --allow-empty -m "another unrelated commit"
+'
+
+test_expect_success 'find the greeting blob' '
+	cat >expect <<-EOF &&
+	Revert "add the greeting blob"
+	add the greeting blob
+	EOF
+
+	git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw &&
+	cut -c 14- actual.raw >actual &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08  0:24 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
@ 2017-12-08  9:34   ` Jeff King
  2017-12-08 16:28     ` Ramsay Jones
  2017-12-08 15:04   ` Junio C Hamano
  1 sibling, 1 reply; 29+ messages in thread
From: Jeff King @ 2017-12-08  9:34 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:

> Sometimes users are given a hash of an object and they want to
> identify it further (ex.: Use verify-pack to find the largest blobs,
> but what are these? or [1])
> 
> One might be tempted to extend git-describe to also work with blobs,
> such that `git describe <blob-id>` gives a description as
> '<commit-ish>:<path>'.  This was implemented at [2]; as seen by the sheer
> number of responses (>110), it turns out this is tricky to get right.
> The hard part to get right is picking the correct 'commit-ish' as that
> could be the commit that (re-)introduced the blob or the blob that
> removed the blob; the blob could exist in different branches.

Neat. I didn't follow the original thread very closely, but I think this
is a good idea (and is definitely something I've simulated manually with
"git log --raw | grep" before).

Bear with me while I pontificate for a moment.

We do something similar at GitHub to report on too-large objects
during a push (we identify the object during index-pack, but then want
to hand back a human-readable pathname).

We do it with a patch to "rev-list", so that you can use the normal
traversal options to limit the set of visited commits. Which effectively
makes it "traverse and find a place where this object is referenced, and
then tell me the path at which you found it (or tell me if it was not
referenced at all)".

That sidesteps the "describe" problem, because now it is the user's
responsibility to tell you which commits to look in. :)

But the rev-list solution only finds _a_ commit that contains the
object, and which likely has nothing to do with the object at all. Your
solution here finds the introduction and removal of the object, which
are interesting for a wider variety of searches.

So I wondered if this could replace my rev-list patch (which I've been
meaning to upstream for a while). But I think the answer is "no", there
is room for both features. What you're finding here is much more
specific and useful data, but it's also much more expensive to generate.
For my uses cases, it was enough to ask "show me a plausible path
quickly" or even "is this object reachable" (which you can answer out of
bitmaps!).

> Junio hinted at a different approach of solving this problem, which this
> patch implements. Teach the diff machinery another flag for restricting
> the information to what is shown. For example:
> 
>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>   b2feb64309 Revert the whole "ask curl-config" topic for now
>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
> 
> we observe that the Makefile as shipped with 2.0 was introduced in
> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
> a different blob.

So I like this very much as an addition to Git's toolbox. But does it
really need to be limited to blobs? We should be able to ask for trees
(both root trees and sub-trees), too. That's going to be less common, I
think, but I have definitely done

  git log --raw -t --pretty=raw --no-abbrev |
  less +/$tree_sha1

to debug things (corrupted repos, funny merges, etc).

You could even do it for submodule commits. Which demonstrates that this
feature does not even need to actually have a resolvable object; you're
just looking for the textual sha1.

You do your filtering on the diff queue, which I think will have the
entries you need if "-t" is specified (and should always have submodule
commits, I think). So you'd just need to relax the incoming object
check, like:

diff --git a/diff.c b/diff.c
index e4571df3bf..0007bb0a09 100644
--- a/diff.c
+++ b/diff.c
@@ -4490,8 +4490,12 @@ static int parse_blobfind_opt(struct diff_options *opt, const char *arg)
 {
 	struct object_id oid;
 
-	if (get_oid_blob(arg, &oid) || sha1_object_info(oid.hash, NULL) != OBJ_BLOB)
-		return error("object '%s' is not a blob", arg);
+	/*
+	 * We don't even need to have the object, and 40-hex sha1s will
+	 * just resolve here.
+	 */
+	if (get_oid_blob(arg, &oid))
+		return error("unable to resolve '%s'", arg);
 
 	if (!opt->blobfind)
 		opt->blobfind = xcalloc(1, sizeof(*opt->blobfind));

At which point:

  ./git log --oneline -t --blobfind=v2.0.0:Documentation

just works (the results are kind of interesting, too; you see that tree
"go away" in a lot of different commits because many independent
branches touched the directory).

Also interesting:

  ./git log --oneline --blobfind=HEAD:sha1collisiondetection

Well, the output for this particular case isn't that interesting. But it
shows that we can find submodules, too.

-Peff

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08  0:24 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
  2017-12-08  9:34   ` Jeff King
@ 2017-12-08 15:04   ` Junio C Hamano
  2017-12-08 17:21     ` Junio C Hamano
  2017-12-08 21:11     ` Stefan Beller
  1 sibling, 2 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-12-08 15:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
> new file mode 100644
> index 0000000000..e65c7cad6e
> --- /dev/null
> +++ b/diffcore-blobfind.c
> @@ -0,0 +1,41 @@
> +/*
> + * Copyright (c) 2017 Google Inc.
> + */
> +#include "cache.h"
> +#include "diff.h"
> +#include "diffcore.h"
> +
> +static void diffcore_filter_blobs(struct diff_queue_struct *q,
> +				  struct diff_options *options)
> +{
> +	int src, dst;
> +
> +	if (!options->blobfind)
> +		BUG("blobfind oidset not initialized???");
> +
> +	for (src = dst = 0; src < q->nr; src++) {
> +		struct diff_filepair *p = q->queue[src];
> +
> +		if ((DIFF_FILE_VALID(p->one) &&
> +		     oidset_contains(options->blobfind, &p->one->oid)) ||
> +		    (DIFF_FILE_VALID(p->two) &&
> +		     oidset_contains(options->blobfind, &p->two->oid))) {

Shouldn't this make sure that !DIFF_PAIR_UNMERGED(p) before looking
at the sides of the pair?  I think an unmerged pair is queued with
filespecs whose modes are cleared, but we should not be relying on
that---I think we even had bugs we fixed by introducing an explicit
is_unmerged field to the filepair struct.

> @@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
>  		simplify_merges(revs);
>  	if (revs->children.name)
>  		set_children(revs);
> +	if (revs->diffopt.blobfind)
> +		revs->simplify_history = 0;
>  	return 0;
>  }

It makes sense to clear this bit by default, but is this an
unconditonal clearing?  In other words, is there a way for the user
to countermand this default and ask for merge simplification from
the command line, e.g. "diff --simplify-history --blobfind=<blob>"?

> +test_description='test finding specific blobs in the revision walking'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup ' '
> +	git commit --allow-empty -m "empty initial commit" &&
> +
> +	echo "Hello, world!" >greeting &&
> +	git add greeting &&
> +	git commit -m "add the greeting blob" && # borrowed from Git from the Bottom Up
> +	git tag -m "the blob" greeting $(git rev-parse HEAD:greeting) &&
> +
> +	echo asdf >unrelated &&
> +	git add unrelated &&
> +	git commit -m "unrelated history" &&
> +
> +	git revert HEAD^ &&
> +
> +	git commit --allow-empty -m "another unrelated commit"
> +'
> +
> +test_expect_success 'find the greeting blob' '
> +	cat >expect <<-EOF &&
> +	Revert "add the greeting blob"
> +	add the greeting blob
> +	EOF
> +
> +	git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw &&
> +	cut -c 14- actual.raw >actual &&
> +	test_cmp expect actual

The hardcoded numbers look strange and smell like a workaround of
not asking for full object names.

Also, now it has the simplify-history bit in the change, don't we
want to check a mergy history, and also running "diff-files" while
the index has unmerged entries?

Other than that, the changes look quite sane and pleasnt read.

Thanks.


> +'
> +
> +test_done

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08  9:34   ` Jeff King
@ 2017-12-08 16:28     ` Ramsay Jones
  2017-12-08 20:19       ` Jeff King
  0 siblings, 1 reply; 29+ messages in thread
From: Ramsay Jones @ 2017-12-08 16:28 UTC (permalink / raw)
  To: Jeff King, Stefan Beller; +Cc: git



On 08/12/17 09:34, Jeff King wrote:
> On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:
[snip]
>> Junio hinted at a different approach of solving this problem, which this
>> patch implements. Teach the diff machinery another flag for restricting
>> the information to what is shown. For example:
>>
>>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>>   b2feb64309 Revert the whole "ask curl-config" topic for now
>>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>>
>> we observe that the Makefile as shipped with 2.0 was introduced in
>> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
>> a different blob.

Sorry, this has nothing to do with this specific thread. :(

However, the above caught my eye. Commit b2feb64309 does not 'replace
the Makefile with a different blob'. That commit was a 'last minute'
revert of a topic _prior_ to v2.0.0, which resulted in the _same_
blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20).

[I haven't been following this topic, so just ignore me if I have
misunderstood what the above was describing! :-D ]

ATB,
Ramsay Jones


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08 15:04   ` Junio C Hamano
@ 2017-12-08 17:21     ` Junio C Hamano
  2017-12-08 21:11     ` Stefan Beller
  1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-12-08 17:21 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Stefan Beller <sbeller@google.com> writes:
> ...
>> @@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
>>  		simplify_merges(revs);
>>  	if (revs->children.name)
>>  		set_children(revs);
>> +	if (revs->diffopt.blobfind)
>> +		revs->simplify_history = 0;
>>  	return 0;
>>  }
>
> It makes sense to clear this bit by default, but is this an
> unconditonal clearing?  In other words, is there a way for the user
> to countermand this default and ask for merge simplification from
> the command line, e.g. "diff --simplify-history --blobfind=<blob>"?

Looking at the places that assign to revs->simplify_history in
revision.c it seems that "this option turns the merge simplification
off unconditionally" is pretty much the norm, and this change just
follows suit.  So perhaps it is OK to let this pass, at least for
now.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08 16:28     ` Ramsay Jones
@ 2017-12-08 20:19       ` Jeff King
  2017-12-08 20:39         ` Stefan Beller
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff King @ 2017-12-08 20:19 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Stefan Beller, git

On Fri, Dec 08, 2017 at 04:28:23PM +0000, Ramsay Jones wrote:

> On 08/12/17 09:34, Jeff King wrote:
> > On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:
> [snip]
> >> Junio hinted at a different approach of solving this problem, which this
> >> patch implements. Teach the diff machinery another flag for restricting
> >> the information to what is shown. For example:
> >>
> >>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
> >>   b2feb64309 Revert the whole "ask curl-config" topic for now
> >>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
> >>
> >> we observe that the Makefile as shipped with 2.0 was introduced in
> >> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
> >> a different blob.
> 
> Sorry, this has nothing to do with this specific thread. :(
> 
> However, the above caught my eye. Commit b2feb64309 does not 'replace
> the Makefile with a different blob'. That commit was a 'last minute'
> revert of a topic _prior_ to v2.0.0, which resulted in the _same_
> blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20).

Right, I think the patch is working as advertised but the commit message
misinterprets the results. :)

If you add --raw, you can see that both commits introduce that blob, and
it never "goes away". That's because that happened in a merge, which we
don't diff in a default log invocation.

And in fact finding where this "goes away" is hard, because the merge
doesn't trigger "-c" or "--cc". It's 8eaf517835, which presumably was
forked before the revert in b2feb64309, and then the merge was able to
replace that blob completely with the other side of the merge.

Viewing with "-m" turns up a bunch of uninteresting merges. Looking at
"--first-parent" is how I got 8eaf517835.

So I think this one is tricky because of the revert. In the same way
that pathspec-limiting is often tricky in the face of a revert, because
the merges "hide" interesting things happening.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08 20:19       ` Jeff King
@ 2017-12-08 20:39         ` Stefan Beller
  2017-12-08 21:38           ` Jeff King
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2017-12-08 20:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Ramsay Jones, git

On Fri, Dec 8, 2017 at 12:19 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Dec 08, 2017 at 04:28:23PM +0000, Ramsay Jones wrote:
>
>> On 08/12/17 09:34, Jeff King wrote:
>> > On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:
>> [snip]
>> >> Junio hinted at a different approach of solving this problem, which this
>> >> patch implements. Teach the diff machinery another flag for restricting
>> >> the information to what is shown. For example:
>> >>
>> >>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>> >>   b2feb64309 Revert the whole "ask curl-config" topic for now
>> >>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>> >>
>> >> we observe that the Makefile as shipped with 2.0 was introduced in
>> >> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
>> >> a different blob.
>>
>> Sorry, this has nothing to do with this specific thread. :(
>>
>> However, the above caught my eye. Commit b2feb64309 does not 'replace
>> the Makefile with a different blob'. That commit was a 'last minute'
>> revert of a topic _prior_ to v2.0.0, which resulted in the _same_
>> blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20).
>
> Right, I think the patch is working as advertised but the commit message
> misinterprets the results. :)

Thanks for digging. I came to a similar realization.

> If you add --raw, you can see that both commits introduce that blob, and
> it never "goes away". That's because that happened in a merge, which we
> don't diff in a default log invocation.

We should when --raw is given.
--raw is documented as  "For each commit, show a summary of changes
using the raw diff format." and I would argue that 'each commit' includes
merges. Though I guess this may have implications for long time users.

> And in fact finding where this "goes away" is hard, because the merge
> doesn't trigger "-c" or "--cc". It's 8eaf517835, which presumably was
> forked before the revert in b2feb64309, and then the merge was able to
> replace that blob completely with the other side of the merge.
>
> Viewing with "-m" turns up a bunch of uninteresting merges. Looking at
> "--first-parent" is how I got 8eaf517835.
>
> So I think this one is tricky because of the revert. In the same way
> that pathspec-limiting is often tricky in the face of a revert, because
> the merges "hide" interesting things happening.

yup, hidden merges are unfortunate. Is there an easy way to find out
about merges? (Junio hints at having tests around merges, which I'll do
next)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08 15:04   ` Junio C Hamano
  2017-12-08 17:21     ` Junio C Hamano
@ 2017-12-08 21:11     ` Stefan Beller
  2017-12-08 21:15       ` Junio C Hamano
  1 sibling, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2017-12-08 21:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

>> +
>> +             if ((DIFF_FILE_VALID(p->one) &&
>> +                  oidset_contains(options->blobfind, &p->one->oid)) ||
>> +                 (DIFF_FILE_VALID(p->two) &&
>> +                  oidset_contains(options->blobfind, &p->two->oid))) {
>
> Shouldn't this make sure that !DIFF_PAIR_UNMERGED(p) before looking
> at the sides of the pair?  I think an unmerged pair is queued with
> filespecs whose modes are cleared, but we should not be relying on
> that---I think we even had bugs we fixed by introducing an explicit
> is_unmerged field to the filepair struct.

I am not sure I follow. IIUC 'unmerged' only ever happens in the
index when there are multiple stages for one path (represented in the
working tree with diff markers). Aren't we supposed to find such
an unmerged blob as well (despite wrong mode, but the oid ought to fit)?

>> +     if (revs->diffopt.blobfind)
>> +             revs->simplify_history = 0;
>>       return 0;
>>  }
>
> It makes sense to clear this bit by default, but is this an
> unconditonal clearing?  In other words, is there a way for the user
> to countermand this default and ask for merge simplification from
> the command line, e.g. "diff --simplify-history --blobfind=<blob>"?

As noted in your reply, this is consistent with other occurrences of
simplify_history, so let's keep it this way.

>> +
>> +     git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw &&
>> +     cut -c 14- actual.raw >actual &&
>> +     test_cmp expect actual
>
> The hardcoded numbers look strange and smell like a workaround of
> not asking for full object names.

You mean the 12 and 14 ? Yeah I can fix that to 40 and 42 if you want.
I wrote it this way to be agnostic of the actual object id, but thought 12
would be enough for this test no matter the future hash.

> Also, now it has the simplify-history bit in the change, don't we
> want to check a mergy history, and also running "diff-files" while
> the index has unmerged entries?

yup, I am working on that.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08 21:11     ` Stefan Beller
@ 2017-12-08 21:15       ` Junio C Hamano
  0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-12-08 21:15 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

>>> +
>>> +             if ((DIFF_FILE_VALID(p->one) &&
>>> +                  oidset_contains(options->blobfind, &p->one->oid)) ||
>>> +                 (DIFF_FILE_VALID(p->two) &&
>>> +                  oidset_contains(options->blobfind, &p->two->oid))) {
>>
>> Shouldn't this make sure that !DIFF_PAIR_UNMERGED(p) before looking
>> at the sides of the pair?  I think an unmerged pair is queued with
>> filespecs whose modes are cleared, but we should not be relying on
>> that---I think we even had bugs we fixed by introducing an explicit
>> is_unmerged field to the filepair struct.
>
> I am not sure I follow. IIUC 'unmerged' only ever happens in the
> index when there are multiple stages for one path (represented in the
> working tree with diff markers). Aren't we supposed to find such
> an unmerged blob as well (despite wrong mode, but the oid ought to fit)?

I think diff_unmerged() records "this path is unmerged" without
giving any object name to either side, so you do not get to compare
(or even look at) p->{one,two}->oid with anything.  That is why I
think checking p->one and p->two like the above code, without making
sure that you are looking at a pair that is not unmerged, is wrong.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-08 20:39         ` Stefan Beller
@ 2017-12-08 21:38           ` Jeff King
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2017-12-08 21:38 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Ramsay Jones, git

On Fri, Dec 08, 2017 at 12:39:55PM -0800, Stefan Beller wrote:

> > If you add --raw, you can see that both commits introduce that blob, and
> > it never "goes away". That's because that happened in a merge, which we
> > don't diff in a default log invocation.
> 
> We should when --raw is given.
> --raw is documented as  "For each commit, show a summary of changes
> using the raw diff format." and I would argue that 'each commit' includes
> merges. Though I guess this may have implications for long time users.

And "--patch" is documented as "generate patch", but it also does
nothing for merges by default. I think it has little to do with "--raw".
It is simply that the default for "log" is none of "-c", "--cc", or
"-m".

We _could_ change that default ("--cc" is already the default for
git-show), but I would not be surprised if that has fallouts (certainly
it makes git-log much slower).

> > So I think this one is tricky because of the revert. In the same way
> > that pathspec-limiting is often tricky in the face of a revert, because
> > the merges "hide" interesting things happening.
> 
> yup, hidden merges are unfortunate. Is there an easy way to find out
> about merges? (Junio hints at having tests around merges, which I'll do
> next)

If you find such an easy way, let me know. :)

One of the few really manual types of query I remember having done in
recent years is trying to pinpoint a bad merge. I.e., somebody during
merge resolution accidentally does "git checkout --ours foo.c", blowing
away changes which they didn't mean to. And then later you want to
figure out which merge did it.

If you use "-c" or "--cc", that isn't an "interesting" change, because
it resolves to one side of the merge. If you use "-m", you get way too
many changes and have to comb through them manually. I've resorted to
"-m --first-parent", but then you frequently have to dig down several
layers (e.g., the bad merge is a merge from "master" onto a topic
branch, and your first "--first-parent" attempt will just find the bad
topic being merged back into master).

I think the most promising tool I've seen there is to redo the merge and
show the diff between the auto-merge (including conflicts) and the
committed tree. It's just another definition of "is this hunk
interesting" that's different from "--cc".

I'm not sure how that would interact with something like "--blobfind",
though. For that matter, I'm not quite sure how your patch would
interact with "--cc". I think you may need to special-case it.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 0/1] diff-core blobfind
  2017-12-08  0:24 [PATCH 0/1] diffcore-blobfind Stefan Beller
  2017-12-08  0:24 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
@ 2017-12-11 19:58 ` Stefan Beller
  2017-12-11 19:58   ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
  1 sibling, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2017-12-11 19:58 UTC (permalink / raw)
  To: git, gitster; +Cc: peff, Stefan Beller

* added check for unmerged entries (!DIFF_PAIR_UNMERGED)
* added support to find submodules
* renamed the UI to `--find-object` instead of --blobfind.

diff to currently queued below.

Thanks,
Stefan

Stefan Beller (1):
  diffcore: add a filter to find a specific blob

 Documentation/diff-options.txt |  6 +++++
 Makefile                       |  1 +
 builtin/log.c                  |  2 +-
 diff.c                         | 21 ++++++++++++++++-
 diff.h                         |  3 +++
 diffcore-oidfind.c             | 42 ++++++++++++++++++++++++++++++++++
 diffcore.h                     |  1 +
 revision.c                     |  5 ++++-
 t/t4064-diff-oidfind.sh        | 51 ++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 129 insertions(+), 3 deletions(-)
 create mode 100644 diffcore-oidfind.c
 create mode 100755 t/t4064-diff-oidfind.sh

-- 
2.15.1.424.g9478a66081-goog

diff --git c/Documentation/diff-options.txt w/Documentation/diff-options.txt
index 34d53b95f1..67a99e522b 100644
--- c/Documentation/diff-options.txt
+++ w/Documentation/diff-options.txt
@@ -501,9 +501,10 @@ information.
 	Treat the <string> given to `-S` as an extended POSIX regular
 	expression to match.
 
---blobfind=<blob-id>::
+--find-object=<object-id>::
 	Restrict the output such that one side of the diff
-	matches the given blob-id.
+	matches the given object id. The object can be a blob,
+	or gitlink entry.
 
 endif::git-format-patch[]
 
diff --git c/Makefile w/Makefile
index fdfa8f38f6..fc2b136694 100644
--- c/Makefile
+++ w/Makefile
@@ -775,7 +775,7 @@ LIB_OBJS += date.o
 LIB_OBJS += decorate.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
-LIB_OBJS += diffcore-blobfind.o
+LIB_OBJS += diffcore-oidfind.o
 LIB_OBJS += diffcore-order.o
 LIB_OBJS += diffcore-pickaxe.o
 LIB_OBJS += diffcore-rename.o
diff --git c/builtin/log.c w/builtin/log.c
index 7b91f61423..2ab7f338e6 100644
--- c/builtin/log.c
+++ w/builtin/log.c
@@ -181,7 +181,7 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 		init_display_notes(&rev->notes_opt);
 
 	if (rev->diffopt.pickaxe || rev->diffopt.filter ||
-	    rev->diffopt.flags.follow_renames || rev->diffopt.blobfind)
+	    rev->diffopt.flags.follow_renames || rev->diffopt.oidfind)
 		rev->always_show_header = 0;
 
 	if (source)
diff --git c/diff.c w/diff.c
index 8861f89ab1..cb35934634 100644
--- c/diff.c
+++ w/diff.c
@@ -4082,7 +4082,7 @@ void diff_setup(struct diff_options *options)
 	options->interhunkcontext = diff_interhunk_context_default;
 	options->ws_error_highlight = ws_error_highlight_default;
 	options->flags.rename_empty = 1;
-	options->blobfind = NULL;
+	options->oidfind = NULL;
 
 	/* pathchange left =NULL by default */
 	options->change = diff_change;
@@ -4488,16 +4488,17 @@ static int parse_ws_error_highlight_opt(struct diff_options *opt, const char *ar
 	return 1;
 }
 
-static int parse_blobfind_opt(struct diff_options *opt, const char *arg)
+static int parse_oidfind_opt(struct diff_options *opt, const char *arg)
 {
 	struct object_id oid;
 
-	if (get_oid_blob(arg, &oid) || sha1_object_info(oid.hash, NULL) != OBJ_BLOB)
-		return error("object '%s' is not a blob", arg);
+	/* We don't even need to have the object, any oid works. */
+	if (get_oid_blob(arg, &oid))
+		return error("unable to resolve '%s'", arg);
 
-	if (!opt->blobfind)
-		opt->blobfind = xcalloc(1, sizeof(*opt->blobfind));
-	oidset_insert(opt->blobfind, &oid);
+	if (!opt->oidfind)
+		opt->oidfind = xcalloc(1, sizeof(*opt->oidfind));
+	oidset_insert(opt->oidfind, &oid);
 	return 1;
 }
 
@@ -4750,8 +4751,8 @@ int diff_opt_parse(struct diff_options *options,
 	else if ((argcount = short_opt('O', av, &optarg))) {
 		options->orderfile = prefix_filename(prefix, optarg);
 		return argcount;
-	} else if (skip_prefix(arg, "--blobfind=", &arg))
-		return parse_blobfind_opt(options, arg);
+	} else if (skip_prefix(arg, "--find-object=", &arg))
+		return parse_oidfind_opt(options, arg);
 	else if ((argcount = parse_long_opt("diff-filter", av, &optarg))) {
 		int offending = parse_diff_filter_opt(optarg, options);
 		if (offending)
@@ -5786,8 +5787,8 @@ void diffcore_std(struct diff_options *options)
 	if (!options->found_follow) {
 		/* See try_to_follow_renames() in tree-diff.c */
 
-		if (options->blobfind)
-			diffcore_blobfind(options);
+		if (options->oidfind)
+			diffcore_oidfind(options);
 		if (options->break_opt != -1)
 			diffcore_break(options->break_opt);
 		if (options->detect_rename)
diff --git c/diff.h w/diff.h
index 9178e498fa..d2badb29a1 100644
--- c/diff.h
+++ w/diff.h
@@ -175,7 +175,7 @@ struct diff_options {
 	enum diff_words_type word_diff;
 	enum diff_submodule_format submodule_format;
 
-	struct oidset *blobfind;
+	struct oidset *oidfind;
 
 	/* this is set by diffcore for DIFF_FORMAT_PATCH */
 	int found_changes;
diff --git c/diffcore-blobfind.c w/diffcore-oidfind.c
similarity index 64%
rename from diffcore-blobfind.c
rename to diffcore-oidfind.c
index e65c7cad6e..39a0b5390f 100644
--- c/diffcore-blobfind.c
+++ w/diffcore-oidfind.c
@@ -10,16 +10,17 @@ static void diffcore_filter_blobs(struct diff_queue_struct *q,
 {
 	int src, dst;
 
-	if (!options->blobfind)
-		BUG("blobfind oidset not initialized???");
+	if (!options->oidfind)
+		BUG("oidfind oidset not initialized???");
 
 	for (src = dst = 0; src < q->nr; src++) {
 		struct diff_filepair *p = q->queue[src];
 
-		if ((DIFF_FILE_VALID(p->one) &&
-		     oidset_contains(options->blobfind, &p->one->oid)) ||
+		if (!DIFF_PAIR_UNMERGED(p) &&
+		    ((DIFF_FILE_VALID(p->one) &&
+		     oidset_contains(options->oidfind, &p->one->oid)) ||
 		    (DIFF_FILE_VALID(p->two) &&
-		     oidset_contains(options->blobfind, &p->two->oid))) {
+		     oidset_contains(options->oidfind, &p->two->oid)))) {
 			q->queue[dst] = p;
 			dst++;
 		} else {
@@ -35,7 +36,7 @@ static void diffcore_filter_blobs(struct diff_queue_struct *q,
 	}
 }
 
-void diffcore_blobfind(struct diff_options *options)
+void diffcore_oidfind(struct diff_options *options)
 {
 	diffcore_filter_blobs(&diff_queued_diff, options);
 }
diff --git c/diffcore.h w/diffcore.h
index 431917672f..6bb1d76b1a 100644
--- c/diffcore.h
+++ w/diffcore.h
@@ -107,7 +107,7 @@ extern struct diff_filepair *diff_queue(struct diff_queue_struct *,
 					struct diff_filespec *);
 extern void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
-extern void diffcore_blobfind(struct diff_options *);
+extern void diffcore_oidfind(struct diff_options *);
 extern void diffcore_break(int);
 extern void diffcore_rename(struct diff_options *);
 extern void diffcore_merge_broken(void);
diff --git c/revision.c w/revision.c
index 8e1a89f832..e4611054b9 100644
--- c/revision.c
+++ w/revision.c
@@ -2410,7 +2410,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 	if (revs->diffopt.pickaxe ||
 	    revs->diffopt.filter ||
 	    revs->diffopt.flags.follow_renames ||
-	    revs->diffopt.blobfind)
+	    revs->diffopt.oidfind)
 		revs->diff = 1;
 
 	if (revs->topo_order)
@@ -2884,7 +2884,7 @@ int prepare_revision_walk(struct rev_info *revs)
 		simplify_merges(revs);
 	if (revs->children.name)
 		set_children(revs);
-	if (revs->diffopt.blobfind)
+	if (revs->diffopt.oidfind)
 		revs->simplify_history = 0;
 	return 0;
 }
diff --git c/t/t4064-diff-blobfind.sh w/t/t4064-diff-oidfind.sh
similarity index 65%
rename from t/t4064-diff-blobfind.sh
rename to t/t4064-diff-oidfind.sh
index b2c2964d77..7afe62df87 100755
--- c/t/t4064-diff-blobfind.sh
+++ w/t/t4064-diff-oidfind.sh
@@ -26,8 +26,24 @@ test_expect_success 'find the greeting blob' '
 	add the greeting blob
 	EOF
 
-	git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw &&
-	cut -c 14- actual.raw >actual &&
+	git log --format=%s --find-object=greeting^{blob} >actual &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'setup a submodule' '
+	test_create_repo sub &&
+	test_commit -C sub sub &&
+	git submodule add ./sub sub &&
+	git commit -a -m "add sub"
+'
+
+test_expect_success 'find a submodule' '
+	cat >expect <<-EOF &&
+	add sub
+	EOF
+
+	git log --format=%s --find-object=HEAD:sub >actual &&
 
 	test_cmp expect actual
 '

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-11 19:58 ` [PATCH 0/1] diff-core blobfind Stefan Beller
@ 2017-12-11 19:58   ` Stefan Beller
  2017-12-11 23:17     ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2017-12-11 19:58 UTC (permalink / raw)
  To: git, gitster; +Cc: peff, Stefan Beller

Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

One might be tempted to extend git-describe to also work with blobs,
such that `git describe <blob-id>` gives a description as
'<commit-ish>:<path>'.  This was implemented at [2]; as seen by the sheer
number of responses (>110), it turns out this is tricky to get right.
The hard part to get right is picking the correct 'commit-ish' as that
could be the commit that (re-)introduced the blob or the blob that
removed the blob; the blob could exist in different branches.

Junio hinted at a different approach of solving this problem, which this
patch implements. Teach the diff machinery another flag for restricting
the information to what is shown. For example:

  $ ./git log --oneline --blobfind=v2.0.0:Makefile
  b2feb64309 Revert the whole "ask curl-config" topic for now
  47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"

we observe that the Makefile as shipped with 2.0 was introduced in
v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
a different blob.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
[2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt |  6 +++++
 Makefile                       |  1 +
 builtin/log.c                  |  2 +-
 diff.c                         | 21 ++++++++++++++++-
 diff.h                         |  3 +++
 diffcore-oidfind.c             | 42 ++++++++++++++++++++++++++++++++++
 diffcore.h                     |  1 +
 revision.c                     |  5 ++++-
 t/t4064-diff-oidfind.sh        | 51 ++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 129 insertions(+), 3 deletions(-)
 create mode 100644 diffcore-oidfind.c
 create mode 100755 t/t4064-diff-oidfind.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index dd0dba5b1d..67a99e522b 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -500,6 +500,12 @@ information.
 --pickaxe-regex::
 	Treat the <string> given to `-S` as an extended POSIX regular
 	expression to match.
+
+--find-object=<object-id>::
+	Restrict the output such that one side of the diff
+	matches the given object id. The object can be a blob,
+	or gitlink entry.
+
 endif::git-format-patch[]
 
 -O<orderfile>::
diff --git a/Makefile b/Makefile
index ee9d5eb11e..fc2b136694 100644
--- a/Makefile
+++ b/Makefile
@@ -775,6 +775,7 @@ LIB_OBJS += date.o
 LIB_OBJS += decorate.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
+LIB_OBJS += diffcore-oidfind.o
 LIB_OBJS += diffcore-order.o
 LIB_OBJS += diffcore-pickaxe.o
 LIB_OBJS += diffcore-rename.o
diff --git a/builtin/log.c b/builtin/log.c
index 6c1fa896ad..2ab7f338e6 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -181,7 +181,7 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 		init_display_notes(&rev->notes_opt);
 
 	if (rev->diffopt.pickaxe || rev->diffopt.filter ||
-	    rev->diffopt.flags.follow_renames)
+	    rev->diffopt.flags.follow_renames || rev->diffopt.oidfind)
 		rev->always_show_header = 0;
 
 	if (source)
diff --git a/diff.c b/diff.c
index 0763e89263..cb35934634 100644
--- a/diff.c
+++ b/diff.c
@@ -4082,6 +4082,7 @@ void diff_setup(struct diff_options *options)
 	options->interhunkcontext = diff_interhunk_context_default;
 	options->ws_error_highlight = ws_error_highlight_default;
 	options->flags.rename_empty = 1;
+	options->oidfind = NULL;
 
 	/* pathchange left =NULL by default */
 	options->change = diff_change;
@@ -4487,6 +4488,20 @@ static int parse_ws_error_highlight_opt(struct diff_options *opt, const char *ar
 	return 1;
 }
 
+static int parse_oidfind_opt(struct diff_options *opt, const char *arg)
+{
+	struct object_id oid;
+
+	/* We don't even need to have the object, any oid works. */
+	if (get_oid_blob(arg, &oid))
+		return error("unable to resolve '%s'", arg);
+
+	if (!opt->oidfind)
+		opt->oidfind = xcalloc(1, sizeof(*opt->oidfind));
+	oidset_insert(opt->oidfind, &oid);
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options,
 		   const char **av, int ac, const char *prefix)
 {
@@ -4736,7 +4751,8 @@ int diff_opt_parse(struct diff_options *options,
 	else if ((argcount = short_opt('O', av, &optarg))) {
 		options->orderfile = prefix_filename(prefix, optarg);
 		return argcount;
-	}
+	} else if (skip_prefix(arg, "--find-object=", &arg))
+		return parse_oidfind_opt(options, arg);
 	else if ((argcount = parse_long_opt("diff-filter", av, &optarg))) {
 		int offending = parse_diff_filter_opt(optarg, options);
 		if (offending)
@@ -5770,6 +5786,9 @@ void diffcore_std(struct diff_options *options)
 		diffcore_skip_stat_unmatch(options);
 	if (!options->found_follow) {
 		/* See try_to_follow_renames() in tree-diff.c */
+
+		if (options->oidfind)
+			diffcore_oidfind(options);
 		if (options->break_opt != -1)
 			diffcore_break(options->break_opt);
 		if (options->detect_rename)
diff --git a/diff.h b/diff.h
index 0fb18dd735..d2badb29a1 100644
--- a/diff.h
+++ b/diff.h
@@ -7,6 +7,7 @@
 #include "tree-walk.h"
 #include "pathspec.h"
 #include "object.h"
+#include "oidset.h"
 
 struct rev_info;
 struct diff_options;
@@ -174,6 +175,8 @@ struct diff_options {
 	enum diff_words_type word_diff;
 	enum diff_submodule_format submodule_format;
 
+	struct oidset *oidfind;
+
 	/* this is set by diffcore for DIFF_FORMAT_PATCH */
 	int found_changes;
 
diff --git a/diffcore-oidfind.c b/diffcore-oidfind.c
new file mode 100644
index 0000000000..39a0b5390f
--- /dev/null
+++ b/diffcore-oidfind.c
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2017 Google Inc.
+ */
+#include "cache.h"
+#include "diff.h"
+#include "diffcore.h"
+
+static void diffcore_filter_blobs(struct diff_queue_struct *q,
+				  struct diff_options *options)
+{
+	int src, dst;
+
+	if (!options->oidfind)
+		BUG("oidfind oidset not initialized???");
+
+	for (src = dst = 0; src < q->nr; src++) {
+		struct diff_filepair *p = q->queue[src];
+
+		if (!DIFF_PAIR_UNMERGED(p) &&
+		    ((DIFF_FILE_VALID(p->one) &&
+		     oidset_contains(options->oidfind, &p->one->oid)) ||
+		    (DIFF_FILE_VALID(p->two) &&
+		     oidset_contains(options->oidfind, &p->two->oid)))) {
+			q->queue[dst] = p;
+			dst++;
+		} else {
+			diff_free_filepair(p);
+		}
+	}
+
+	if (!dst) {
+		free(q->queue);
+		DIFF_QUEUE_CLEAR(q);
+	} else {
+		q->nr = dst;
+	}
+}
+
+void diffcore_oidfind(struct diff_options *options)
+{
+	diffcore_filter_blobs(&diff_queued_diff, options);
+}
diff --git a/diffcore.h b/diffcore.h
index a30da161da..6bb1d76b1a 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -107,6 +107,7 @@ extern struct diff_filepair *diff_queue(struct diff_queue_struct *,
 					struct diff_filespec *);
 extern void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
+extern void diffcore_oidfind(struct diff_options *);
 extern void diffcore_break(int);
 extern void diffcore_rename(struct diff_options *);
 extern void diffcore_merge_broken(void);
diff --git a/revision.c b/revision.c
index e2e691dd5a..e4611054b9 100644
--- a/revision.c
+++ b/revision.c
@@ -2409,7 +2409,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 	/* Pickaxe, diff-filter and rename following need diffs */
 	if (revs->diffopt.pickaxe ||
 	    revs->diffopt.filter ||
-	    revs->diffopt.flags.follow_renames)
+	    revs->diffopt.flags.follow_renames ||
+	    revs->diffopt.oidfind)
 		revs->diff = 1;
 
 	if (revs->topo_order)
@@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
 		simplify_merges(revs);
 	if (revs->children.name)
 		set_children(revs);
+	if (revs->diffopt.oidfind)
+		revs->simplify_history = 0;
 	return 0;
 }
 
diff --git a/t/t4064-diff-oidfind.sh b/t/t4064-diff-oidfind.sh
new file mode 100755
index 0000000000..7afe62df87
--- /dev/null
+++ b/t/t4064-diff-oidfind.sh
@@ -0,0 +1,51 @@
+#!/bin/sh
+
+test_description='test finding specific blobs in the revision walking'
+. ./test-lib.sh
+
+test_expect_success 'setup ' '
+	git commit --allow-empty -m "empty initial commit" &&
+
+	echo "Hello, world!" >greeting &&
+	git add greeting &&
+	git commit -m "add the greeting blob" && # borrowed from Git from the Bottom Up
+	git tag -m "the blob" greeting $(git rev-parse HEAD:greeting) &&
+
+	echo asdf >unrelated &&
+	git add unrelated &&
+	git commit -m "unrelated history" &&
+
+	git revert HEAD^ &&
+
+	git commit --allow-empty -m "another unrelated commit"
+'
+
+test_expect_success 'find the greeting blob' '
+	cat >expect <<-EOF &&
+	Revert "add the greeting blob"
+	add the greeting blob
+	EOF
+
+	git log --format=%s --find-object=greeting^{blob} >actual &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'setup a submodule' '
+	test_create_repo sub &&
+	test_commit -C sub sub &&
+	git submodule add ./sub sub &&
+	git commit -a -m "add sub"
+'
+
+test_expect_success 'find a submodule' '
+	cat >expect <<-EOF &&
+	add sub
+	EOF
+
+	git log --format=%s --find-object=HEAD:sub >actual &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-11 19:58   ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
@ 2017-12-11 23:17     ` Junio C Hamano
  2017-12-12  0:21       ` Stefan Beller
  0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2017-12-11 23:17 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, peff

Stefan Beller <sbeller@google.com> writes:

> the information to what is shown. For example:
>
>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>   b2feb64309 Revert the whole "ask curl-config" topic for now
>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"

This part is a bit stale???

> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index dd0dba5b1d..67a99e522b 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -500,6 +500,12 @@ information.
>  --pickaxe-regex::
>  	Treat the <string> given to `-S` as an extended POSIX regular
>  	expression to match.
> +
> +--find-object=<object-id>::
> +	Restrict the output such that one side of the diff
> +	matches the given object id. The object can be a blob,
> +	or gitlink entry.

OK.  In principle you should also be able to find a tree, but I do
not now how useful it would be.  Extending it to gitlink, which is
another kind of leaf node in the reachability DAG, does make tons of
sense---it's a no brainer that I feel ashamed not to have thought of
myself ;-)

> +LIB_OBJS += diffcore-oidfind.o

Just to nitpick, but "blobfind" was to find "blob", and if you are
extending it to find any "object", then that should be "objfind".
"oid" is _A_ way to refer to an object (i.e. the _name_ of it), and
name is *not* the same as the thing the name refers to, so...

> +static int parse_oidfind_opt(struct diff_options *opt, const char *arg)
> +{
> +	struct object_id oid;
> +
> +	/* We don't even need to have the object, any oid works. */
> +	if (get_oid_blob(arg, &oid))
> +		return error("unable to resolve '%s'", arg);

Should this still be get_oid_blob(), or should it be less specific
to blobs?

> +test_expect_success 'find the greeting blob' '
> +	cat >expect <<-EOF &&
> +	Revert "add the greeting blob"
> +	add the greeting blob
> +	EOF
> +
> +	git log --format=%s --find-object=greeting^{blob} >actual &&
> +
> +	test_cmp expect actual
> +'

Makes sense.

> +
> +test_expect_success 'setup a submodule' '
> +	test_create_repo sub &&
> +	test_commit -C sub sub &&
> +	git submodule add ./sub sub &&
> +	git commit -a -m "add sub"
> +'
> +
> +test_expect_success 'find a submodule' '
> +	cat >expect <<-EOF &&
> +	add sub
> +	EOF
> +
> +	git log --format=%s --find-object=HEAD:sub >actual &&
> +
> +	test_cmp expect actual
> +'

Nice (and cute).

> +test_done

Looking good.  Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
  2017-12-11 23:17     ` Junio C Hamano
@ 2017-12-12  0:21       ` Stefan Beller
  2017-12-12  1:24         ` [PATCH] " Stefan Beller
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2017-12-12  0:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

On Mon, Dec 11, 2017 at 3:17 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> the information to what is shown. For example:
>>
>>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>>   b2feb64309 Revert the whole "ask curl-config" topic for now
>>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>
> This part is a bit stale???

fixed in the next reroll :(

>> +--find-object=<object-id>::
>> +     Restrict the output such that one side of the diff
>> +     matches the given object id. The object can be a blob,
>> +     or gitlink entry.
>
> OK.  In principle you should also be able to find a tree, but I do
> not now how useful it would be.  Extending it to gitlink, which is
> another kind of leaf node in the reachability DAG, does make tons of
> sense---it's a no brainer that I feel ashamed not to have thought of
> myself ;-)

The current patch under discussion doesn't find trees, though.
Hence the documentation is accurate saying that only blobs and
gitlinks work.

>
>> +LIB_OBJS += diffcore-oidfind.o
>
> Just to nitpick, but "blobfind" was to find "blob", and if you are
> extending it to find any "object", then that should be "objfind".
> "oid" is _A_ way to refer to an object (i.e. the _name_ of it), and
> name is *not* the same as the thing the name refers to, so...

obj-find sounds good.

>
>> +static int parse_oidfind_opt(struct diff_options *opt, const char *arg)
>> +{
>> +     struct object_id oid;
>> +
>> +     /* We don't even need to have the object, any oid works. */
>> +     if (get_oid_blob(arg, &oid))
>> +             return error("unable to resolve '%s'", arg);
>
> Should this still be get_oid_blob(), or should it be less specific
> to blobs?

We could check if it is a tree/commit and die as they are not
being handled correctly.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] diffcore: add a filter to find a specific blob
  2017-12-12  0:21       ` Stefan Beller
@ 2017-12-12  1:24         ` Stefan Beller
  2017-12-12 18:36           ` Junio C Hamano
  2017-12-14 21:22           ` Jonathan Nieder
  0 siblings, 2 replies; 29+ messages in thread
From: Stefan Beller @ 2017-12-12  1:24 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, peff

Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

One might be tempted to extend git-describe to also work with blobs,
such that `git describe <blob-id>` gives a description as
'<commit-ish>:<path>'.  This was implemented at [2]; as seen by the sheer
number of responses (>110), it turns out this is tricky to get right.
The hard part to get right is picking the correct 'commit-ish' as that
could be the commit that (re-)introduced the blob or the blob that
removed the blob; the blob could exist in different branches.

Junio hinted at a different approach of solving this problem, which this
patch implements. Teach the diff machinery another flag for restricting
the information to what is shown. For example:

  $ ./git log --oneline --find-object=v2.0.0:Makefile
  b2feb64309 Revert the whole "ask curl-config" topic for now
  47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"

we observe that the Makefile as shipped with 2.0 was appeared in
v1.9.2-471-g47fbfded53 and in v2.0.0-rc1-5-gb2feb6430b.  The
reason why these commits both occur prior to v2.0.0 are evil
merges that are not found using this new mechanism.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
[2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/

Signed-off-by: Stefan Beller <sbeller@google.com>

asdf

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt |  6 ++++
 Makefile                       |  1 +
 builtin/log.c                  |  2 +-
 diff.c                         | 20 ++++++++++++-
 diff.h                         |  3 ++
 diffcore-objfind.c             | 42 ++++++++++++++++++++++++++
 diffcore.h                     |  1 +
 revision.c                     |  5 +++-
 t/t4064-diff-oidfind.sh        | 68 ++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 145 insertions(+), 3 deletions(-)
 create mode 100644 diffcore-objfind.c
 create mode 100755 t/t4064-diff-oidfind.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index dd0dba5b1d..21d1776996 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -500,6 +500,12 @@ information.
 --pickaxe-regex::
 	Treat the <string> given to `-S` as an extended POSIX regular
 	expression to match.
+
+--find-object=<object-id>::
+	Restrict the output such that one side of the diff
+	matches the given object id. The object can be a blob,
+	gitlink entry or tree (when `-t` is given).
+
 endif::git-format-patch[]
 
 -O<orderfile>::
diff --git a/Makefile b/Makefile
index ee9d5eb11e..c26596c30a 100644
--- a/Makefile
+++ b/Makefile
@@ -775,6 +775,7 @@ LIB_OBJS += date.o
 LIB_OBJS += decorate.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
+LIB_OBJS += diffcore-objfind.o
 LIB_OBJS += diffcore-order.o
 LIB_OBJS += diffcore-pickaxe.o
 LIB_OBJS += diffcore-rename.o
diff --git a/builtin/log.c b/builtin/log.c
index 6c1fa896ad..08ea82d69f 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -181,7 +181,7 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 		init_display_notes(&rev->notes_opt);
 
 	if (rev->diffopt.pickaxe || rev->diffopt.filter ||
-	    rev->diffopt.flags.follow_renames)
+	    rev->diffopt.flags.follow_renames || rev->diffopt.objfind)
 		rev->always_show_header = 0;
 
 	if (source)
diff --git a/diff.c b/diff.c
index 0763e89263..e13b8229d3 100644
--- a/diff.c
+++ b/diff.c
@@ -4082,6 +4082,7 @@ void diff_setup(struct diff_options *options)
 	options->interhunkcontext = diff_interhunk_context_default;
 	options->ws_error_highlight = ws_error_highlight_default;
 	options->flags.rename_empty = 1;
+	options->objfind = NULL;
 
 	/* pathchange left =NULL by default */
 	options->change = diff_change;
@@ -4487,6 +4488,19 @@ static int parse_ws_error_highlight_opt(struct diff_options *opt, const char *ar
 	return 1;
 }
 
+static int parse_objfind_opt(struct diff_options *opt, const char *arg)
+{
+	struct object_id oid;
+
+	if (get_oid(arg, &oid))
+		return error("unable to resolve '%s'", arg);
+
+	if (!opt->objfind)
+		opt->objfind = xcalloc(1, sizeof(*opt->objfind));
+	oidset_insert(opt->objfind, &oid);
+	return 1;
+}
+
 int diff_opt_parse(struct diff_options *options,
 		   const char **av, int ac, const char *prefix)
 {
@@ -4736,7 +4750,8 @@ int diff_opt_parse(struct diff_options *options,
 	else if ((argcount = short_opt('O', av, &optarg))) {
 		options->orderfile = prefix_filename(prefix, optarg);
 		return argcount;
-	}
+	} else if (skip_prefix(arg, "--find-object=", &arg))
+		return parse_objfind_opt(options, arg);
 	else if ((argcount = parse_long_opt("diff-filter", av, &optarg))) {
 		int offending = parse_diff_filter_opt(optarg, options);
 		if (offending)
@@ -5770,6 +5785,9 @@ void diffcore_std(struct diff_options *options)
 		diffcore_skip_stat_unmatch(options);
 	if (!options->found_follow) {
 		/* See try_to_follow_renames() in tree-diff.c */
+
+		if (options->objfind)
+			diffcore_objfind(options);
 		if (options->break_opt != -1)
 			diffcore_break(options->break_opt);
 		if (options->detect_rename)
diff --git a/diff.h b/diff.h
index 0fb18dd735..8fa2ad8e2d 100644
--- a/diff.h
+++ b/diff.h
@@ -7,6 +7,7 @@
 #include "tree-walk.h"
 #include "pathspec.h"
 #include "object.h"
+#include "oidset.h"
 
 struct rev_info;
 struct diff_options;
@@ -174,6 +175,8 @@ struct diff_options {
 	enum diff_words_type word_diff;
 	enum diff_submodule_format submodule_format;
 
+	struct oidset *objfind;
+
 	/* this is set by diffcore for DIFF_FORMAT_PATCH */
 	int found_changes;
 
diff --git a/diffcore-objfind.c b/diffcore-objfind.c
new file mode 100644
index 0000000000..676bbfff00
--- /dev/null
+++ b/diffcore-objfind.c
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2017 Google Inc.
+ */
+#include "cache.h"
+#include "diff.h"
+#include "diffcore.h"
+
+static void diffcore_filter_blobs(struct diff_queue_struct *q,
+				  struct diff_options *options)
+{
+	int src, dst;
+
+	if (!options->objfind)
+		BUG("objfind oidset not initialized???");
+
+	for (src = dst = 0; src < q->nr; src++) {
+		struct diff_filepair *p = q->queue[src];
+
+		if (!DIFF_PAIR_UNMERGED(p) &&
+		    ((DIFF_FILE_VALID(p->one) &&
+		     oidset_contains(options->objfind, &p->one->oid)) ||
+		    (DIFF_FILE_VALID(p->two) &&
+		     oidset_contains(options->objfind, &p->two->oid)))) {
+			q->queue[dst] = p;
+			dst++;
+		} else {
+			diff_free_filepair(p);
+		}
+	}
+
+	if (!dst) {
+		free(q->queue);
+		DIFF_QUEUE_CLEAR(q);
+	} else {
+		q->nr = dst;
+	}
+}
+
+void diffcore_objfind(struct diff_options *options)
+{
+	diffcore_filter_blobs(&diff_queued_diff, options);
+}
diff --git a/diffcore.h b/diffcore.h
index a30da161da..cbde777bdd 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -107,6 +107,7 @@ extern struct diff_filepair *diff_queue(struct diff_queue_struct *,
 					struct diff_filespec *);
 extern void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
+extern void diffcore_objfind(struct diff_options *);
 extern void diffcore_break(int);
 extern void diffcore_rename(struct diff_options *);
 extern void diffcore_merge_broken(void);
diff --git a/revision.c b/revision.c
index e2e691dd5a..0a797bdfc7 100644
--- a/revision.c
+++ b/revision.c
@@ -2409,7 +2409,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 	/* Pickaxe, diff-filter and rename following need diffs */
 	if (revs->diffopt.pickaxe ||
 	    revs->diffopt.filter ||
-	    revs->diffopt.flags.follow_renames)
+	    revs->diffopt.flags.follow_renames ||
+	    revs->diffopt.objfind)
 		revs->diff = 1;
 
 	if (revs->topo_order)
@@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
 		simplify_merges(revs);
 	if (revs->children.name)
 		set_children(revs);
+	if (revs->diffopt.objfind)
+		revs->simplify_history = 0;
 	return 0;
 }
 
diff --git a/t/t4064-diff-oidfind.sh b/t/t4064-diff-oidfind.sh
new file mode 100755
index 0000000000..3bdf317af8
--- /dev/null
+++ b/t/t4064-diff-oidfind.sh
@@ -0,0 +1,68 @@
+#!/bin/sh
+
+test_description='test finding specific blobs in the revision walking'
+. ./test-lib.sh
+
+test_expect_success 'setup ' '
+	git commit --allow-empty -m "empty initial commit" &&
+
+	echo "Hello, world!" >greeting &&
+	git add greeting &&
+	git commit -m "add the greeting blob" && # borrowed from Git from the Bottom Up
+	git tag -m "the blob" greeting $(git rev-parse HEAD:greeting) &&
+
+	echo asdf >unrelated &&
+	git add unrelated &&
+	git commit -m "unrelated history" &&
+
+	git revert HEAD^ &&
+
+	git commit --allow-empty -m "another unrelated commit"
+'
+
+test_expect_success 'find the greeting blob' '
+	cat >expect <<-EOF &&
+	Revert "add the greeting blob"
+	add the greeting blob
+	EOF
+
+	git log --format=%s --find-object=greeting^{blob} >actual &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'setup a tree' '
+	mkdir a &&
+	echo asdf >a/file &&
+	git add a/file &&
+	git commit -m "add a file in a subdirectory"
+'
+
+test_expect_success 'find a tree' '
+	cat >expect <<-EOF &&
+	add a file in a subdirectory
+	EOF
+
+	git log --format=%s -t --find-object=HEAD:a >actual &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'setup a submodule' '
+	test_create_repo sub &&
+	test_commit -C sub sub &&
+	git submodule add ./sub sub &&
+	git commit -a -m "add sub"
+'
+
+test_expect_success 'find a submodule' '
+	cat >expect <<-EOF &&
+	add sub
+	EOF
+
+	git log --format=%s --find-object=HEAD:sub >actual &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.15.1.504.g5279b80103-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] diffcore: add a filter to find a specific blob
  2017-12-12  1:24         ` [PATCH] " Stefan Beller
@ 2017-12-12 18:36           ` Junio C Hamano
  2017-12-14 21:22           ` Jonathan Nieder
  1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-12-12 18:36 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, peff

Stefan Beller <sbeller@google.com> writes:

> Sometimes users are given a hash of an object and they want to
> identify it further (ex.: Use verify-pack to find the largest blobs,
> but what are these? or [1])
> ...
>  Documentation/diff-options.txt |  6 ++++
>  Makefile                       |  1 +
>  builtin/log.c                  |  2 +-
>  diff.c                         | 20 ++++++++++++-
>  diff.h                         |  3 ++
>  diffcore-objfind.c             | 42 ++++++++++++++++++++++++++
>  diffcore.h                     |  1 +
>  revision.c                     |  5 +++-
>  t/t4064-diff-oidfind.sh        | 68 ++++++++++++++++++++++++++++++++++++++++++
>  9 files changed, 145 insertions(+), 3 deletions(-)
>  create mode 100644 diffcore-objfind.c
>  create mode 100755 t/t4064-diff-oidfind.sh

Thanks.  Will drop an asdf and queue.  

I think this is ready for 'next'; I'll wait for a day or two just to
give us a final chance to spot and save us from minor embarrassments,
but I expect anything we'd spot from here on would be so minor that
we can go incremental.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] diffcore: add a filter to find a specific blob
  2017-12-12  1:24         ` [PATCH] " Stefan Beller
  2017-12-12 18:36           ` Junio C Hamano
@ 2017-12-14 21:22           ` Jonathan Nieder
  2017-12-14 22:30             ` Stefan Beller
  2017-12-14 22:44             ` Junio C Hamano
  1 sibling, 2 replies; 29+ messages in thread
From: Jonathan Nieder @ 2017-12-14 21:22 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, gitster, peff

Hi,

Stefan Beller wrote:

> Junio hinted at a different approach of solving this problem, which this
> patch implements. Teach the diff machinery another flag for restricting
> the information to what is shown. For example:
>
>   $ ./git log --oneline --find-object=v2.0.0:Makefile
>   b2feb64309 Revert the whole "ask curl-config" topic for now
>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>
> we observe that the Makefile as shipped with 2.0 was appeared in
> v1.9.2-471-g47fbfded53 and in v2.0.0-rc1-5-gb2feb6430b.

Neat!

Nit: s/was appeared/appeared/ (not a big deal though, since this is
already in 'next').

From the above it's not clear to me whether this is about commits
where the object was added or where the object was removed.
Fortunately, the docs are a bit clearer:

                                 ... one side of the diff
       matches the given object id. The object can be a blob,
       gitlink entry or tree (when `-t` is given).

So this appears to be about both additions and removals.  Followup
questions:

- are copies and renames shown (if I am passing -M -C)?
- what about mode changes?  If the file became executable but the
  blob content didn't change, does that commit match?

Nit, not related to this change: it would be nice to have a long
option to go along with the short name '-t' --- e.g. --include-trees.

Another nit: s/gitlink entry/submodule commit/, perhaps.  The commit
object is not a gitlink entry: it is pointed to by a gitlink entry.

Another documentation idea: it may be nice to point out that this
is only about the preimage and postimage submodule commit and that
it doesn't look at the history in between.

>                                                          The
> reason why these commits both occur prior to v2.0.0 are evil
> merges that are not found using this new mechanism.

Would it be worth the doc mentioning that this doesn't look at merges
unless you use one of the -m/-c/--cc options, or does that go without
saying?

[...]
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -500,6 +500,12 @@ information.
>  --pickaxe-regex::
>  	Treat the <string> given to `-S` as an extended POSIX regular
>  	expression to match.
> +
> +--find-object=<object-id>::
> +	Restrict the output such that one side of the diff
> +	matches the given object id. The object can be a blob,
> +	gitlink entry or tree (when `-t` is given).

I like this name --find-object more than --blobfind!  I am not sure it
quite matches what the user is looking for, though.  We are not
looking for all occurences of the object; we only care about when the
object appears (was added or removed) in the diff.

Putting it in context, we have:

	pickaxe options:
	-S: detect changes in occurence count of a string
	-G: grep lines in diff for a string

	--pickaxe-all:
		do not filter the diff when the patch matches pickaxe
		conditions.

		kind of like log --full-diff, but restricted to pickaxe
		options.
	--pickaxe-regex: treat -S argument as a regex, not a string

Is this another kind of pickaxe option?  It feels similar to -S, but
at an object level instead of a substring level, so in a way it would
be appealing to call it --pickaxe-object.  Does --pickaxe-all affect
it like it affects -S and -G?

Another context to put it in is:

	--diff-filter:
		limit paths (but not commits?) to those with a change
		matching optarg

If I understand correctly, then --diff-filter does not interact with
--pickaxe-all, or in other words it is a different filtering
condition.  Is this another kind of diff filter?  In that context, it
may be appealing to call it something like --object-filter.

--diff-filter is an example where it seems appealing to have a
--full-diff option to diff-tree that could apply to all filters and
not just pickaxe.

[... implementation snipped ...]

The implementation looks lovely and I'm especially happy about the
tests.  Thanks for writing it.

Thoughts?
Jonathan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] diffcore: add a filter to find a specific blob
  2017-12-14 21:22           ` Jonathan Nieder
@ 2017-12-14 22:30             ` Stefan Beller
  2017-12-14 22:52               ` Jonathan Nieder
  2017-12-14 22:44             ` Junio C Hamano
  1 sibling, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2017-12-14 22:30 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Junio C Hamano, Jeff King

On Thu, Dec 14, 2017 at 1:22 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:

> - what about mode changes?  If the file became executable but the
>   blob content didn't change, does that commit match?

./git log --find-object=$(git rev-parse ba67504f:t/perf/p3400-rebase.sh)

claims it does find the mode change (commit ba67504f is just a mode
change)

> - are copies and renames shown (if I am passing -M -C)?

It restricts the commits shown, not the renamed files. But I assume
you mean it the same way as with mode changes.
I did not find a good commit in gits history to demonstrate, but as
it is orthogonal to the object id restrictions, I would think it works

> Nit, not related to this change: it would be nice to have a long
> option to go along with the short name '-t' --- e.g. --include-trees.

follow up patches welcome. :)

>
> Another nit: s/gitlink entry/submodule commit/, perhaps.  The commit
> object is not a gitlink entry: it is pointed to by a gitlink entry.

Well, what if the user doesn't have a submodule, but uses gitlinks
for other purposes? We do inspect the gitlink, so it is correct IMHO.

> Another documentation idea: it may be nice to point out that this
> is only about the preimage and postimage submodule commit and that
> it doesn't look at the history in between.

That is sensible. One might be tempted to ask: "Which superproject
commit contains a submodule pointer, that has commit $X in the
submodule history?", but this new option is totally not answering this.

>>                                                          The
>> reason why these commits both occur prior to v2.0.0 are evil
>> merges that are not found using this new mechanism.
>
> Would it be worth the doc mentioning that this doesn't look at merges
> unless you use one of the -m/-c/--cc options, or does that go without
> saying?

I assumed it goes without saying, just like the lacking -t could mean
to ignore trees. ;)


>
> [...]
>> --- a/Documentation/diff-options.txt
>> +++ b/Documentation/diff-options.txt
>> @@ -500,6 +500,12 @@ information.
>>  --pickaxe-regex::
>>       Treat the <string> given to `-S` as an extended POSIX regular
>>       expression to match.
>> +
>> +--find-object=<object-id>::
>> +     Restrict the output such that one side of the diff
>> +     matches the given object id. The object can be a blob,
>> +     gitlink entry or tree (when `-t` is given).
>
> I like this name --find-object more than --blobfind!  I am not sure it
> quite matches what the user is looking for, though.  We are not
> looking for all occurences of the object; we only care about when the
> object appears (was added or removed) in the diff.

Thanks! Yes, but the 'edges' are so few commits that a further walk
will reveal all you need to know?


>
> Putting it in context, we have:
>
>         pickaxe options:
>         -S: detect changes in occurence count of a string
>         -G: grep lines in diff for a string
>
>         --pickaxe-all:
>                 do not filter the diff when the patch matches pickaxe
>                 conditions.
>
>                 kind of like log --full-diff, but restricted to pickaxe
>                 options.
>         --pickaxe-regex: treat -S argument as a regex, not a string
>
> Is this another kind of pickaxe option?  It feels similar to -S, but
> at an object level instead of a substring level, so in a way it would
> be appealing to call it --pickaxe-object.  Does --pickaxe-all affect
> it like it affects -S and -G?
>
> Another context to put it in is:
>
>         --diff-filter:
>                 limit paths (but not commits?) to those with a change
>                 matching optarg
>
> If I understand correctly, then --diff-filter does not interact with
> --pickaxe-all, or in other words it is a different filtering
> condition.  Is this another kind of diff filter?  In that context, it
> may be appealing to call it something like --object-filter.
>
> --diff-filter is an example where it seems appealing to have a
> --full-diff option to diff-tree that could apply to all filters and
> not just pickaxe.
>
> [... implementation snipped ...]
>
> The implementation looks lovely and I'm especially happy about the
> tests.  Thanks for writing it.
>
> Thoughts?
> Jonathan

Regarding finding a better name, I would want to hear from others,
I am happy with --find-object, though I can see --pickaxe-object
or --object--filter to be a good narrative as well.

Stefan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] diffcore: add a filter to find a specific blob
  2017-12-14 21:22           ` Jonathan Nieder
  2017-12-14 22:30             ` Stefan Beller
@ 2017-12-14 22:44             ` Junio C Hamano
  1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-12-14 22:44 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Stefan Beller, git, peff

Jonathan Nieder <jrnieder@gmail.com> writes:

> Putting it in context, we have:
>
> 	pickaxe options:
> 	-S: detect changes in occurence count of a string
> 	-G: grep lines in diff for a string
>
> 	--pickaxe-all:
> 		do not filter the diff when the patch matches pickaxe
> 		conditions.
>
> 		kind of like log --full-diff, but restricted to pickaxe
> 		options.
> 	--pickaxe-regex: treat -S argument as a regex, not a string
>
> Is this another kind of pickaxe option?  It feels similar to -S, but
> at an object level instead of a substring level, so in a way it would
> be appealing to call it --pickaxe-object.  Does --pickaxe-all affect
> it like it affects -S and -G?

It does feel quite close to -S, but is slightly different.

Instead of spelling out the exact contents to pass as the argument
to the -S option, you are giving a blob object name instead.

However, if you rename path F to G without any modification, or if
you flip only the mode bits, -S would reject such a filepair, as
both sides have the same number of the sought-after string, I think.
Unlike -S, this new thing should show such a filepair because it is
designed to show any filepair with either (or both) side has the
sought-after object.

We could make this identical to -S, i.e. only when a filepair has
sought-after object on either (or both) sides *and* its pre and post
image sides do not talk about the same object.

    Note.  I am phrasing the above awkwardly because we can ask for
    more than one object, and a filepair that changes a path from
    object A to object B, both of which are what we are looking for,
    want to be caught.  If I phrased the above as "only when only
    one side of a filepair has an object we are looking for and the
    other side does not", such a filepair would not be shown.

I have a suspicion that such a change may be worthwhile.

> Another context to put it in is:
>
> 	--diff-filter:
> 		limit paths (but not commits?) to those with a change
> 		matching optarg

IIUC, this is applied at the output phase after all of the rename,
pickaxe etc. are done (e.g. "Now you have the diff to give me, show
me only the additions in it").  It is a poor match for this new
thing, as "Now you have the diff to give me, show me only the ones
that involve these blobs" does not sound all that useful.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] diffcore: add a filter to find a specific blob
  2017-12-14 22:30             ` Stefan Beller
@ 2017-12-14 22:52               ` Jonathan Nieder
  2017-12-15  2:18                 ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Nieder @ 2017-12-14 22:52 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Junio C Hamano, Jeff King

Hi,

Stefan Beller wrote:
> On Thu, Dec 14, 2017 at 1:22 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:

>> - what about mode changes?  If the file became executable but the
>>   blob content didn't change, does that commit match?
>
> ./git log --find-object=$(git rev-parse ba67504f:t/perf/p3400-rebase.sh)
>
> claims it does find the mode change (commit ba67504f is just a mode
> change)

Thanks.  Reminder to self to add a test + docs about that (as a followup
change; this isn't a complaint about the patch).

>> - are copies and renames shown (if I am passing -M -C)?
>
> It restricts the commits shown, not the renamed files. But I assume
> you mean it the same way as with mode changes.
> I did not find a good commit in gits history to demonstrate, but as
> it is orthogonal to the object id restrictions, I would think it works

Ok, will add test + doc.

>> Nit, not related to this change: it would be nice to have a long
>> option to go along with the short name '-t' --- e.g. --include-trees.
>
> follow up patches welcome. :)

Will think more and try to send a patch if it still seems like a good
idea in a day or so. ;)

>> Another nit: s/gitlink entry/submodule commit/, perhaps.  The commit
>> object is not a gitlink entry: it is pointed to by a gitlink entry.
>
> Well, what if the user doesn't have a submodule, but uses gitlinks
> for other purposes? We do inspect the gitlink, so it is correct IMHO.

It's a language nit.  The argument to --find-object is a commit object
name, not a gitlink entry.  A gitlink entry looks like

 160000 <path> <object>

>> Another documentation idea: it may be nice to point out that this
>> is only about the preimage and postimage submodule commit and that
>> it doesn't look at the history in between.
>
> That is sensible. One might be tempted to ask: "Which superproject
> commit contains a submodule pointer, that has commit $X in the
> submodule history?", but this new option is totally not answering this.

Ok, will try to come up with wording.

>>>                                                          The
>>> reason why these commits both occur prior to v2.0.0 are evil
>>> merges that are not found using this new mechanism.
>>
>> Would it be worth the doc mentioning that this doesn't look at merges
>> unless you use one of the -m/-c/--cc options, or does that go without
>> saying?
>
> I assumed it goes without saying, just like the lacking -t could mean
> to ignore trees. ;)

I suspect it's worth a mention, based on the discussion in this thread
(i.e. without such docs it was non-obvious and took some time to
diagnose).

[...]
>>> +--find-object=<object-id>::
>>> +     Restrict the output such that one side of the diff
>>> +     matches the given object id. The object can be a blob,
>>> +     gitlink entry or tree (when `-t` is given).
>>
>> I like this name --find-object more than --blobfind!  I am not sure it
>> quite matches what the user is looking for, though.  We are not
>> looking for all occurences of the object; we only care about when the
>> object appears (was added or removed) in the diff.
>
> Thanks! Yes, but the 'edges' are so few commits that a further walk
> will reveal all you need to know?

Sorry for the lack of clarity: I actually like this behavior *more*
than a "find trees pointing to object" behavior.  I'm just saying the
name sets an unclear expectation.

[...]
> Regarding finding a better name, I would want to hear from others,
> I am happy with --find-object, though I can see --pickaxe-object
> or --object--filter to be a good narrative as well.

Drat, I was hoping for an opinion.

Based on the answers above about mode changes and renames, at the
moment --object-filter seems clearest to me.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] diffcore: add a filter to find a specific blob
  2017-12-14 22:52               ` Jonathan Nieder
@ 2017-12-15  2:18                 ` Junio C Hamano
  2017-12-27 18:49                   ` Stefan Beller
  0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2017-12-15  2:18 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Stefan Beller, git, Jeff King

Jonathan Nieder <jrnieder@gmail.com> writes:

>> Regarding finding a better name, I would want to hear from others,
>> I am happy with --find-object, though I can see --pickaxe-object
>> or --object--filter to be a good narrative as well.
>
> Drat, I was hoping for an opinion.

I think it would make it a better companion to --pickaxe but we need
to align its behaviour a little bit so that it plays better with the
"--pickaxe-all" option, and also needs to hide mode and name only
changes just like pickaxe.  

After all, the diffcore-blobfind code was written while looking at
the diffcore-pickaxe's code in another window shown in the pager,
and I tend to agree with your earlier message that this is an
extreme case of -S<contents> where the contents happens to be the
whole file.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] diffcore: add a filter to find a specific blob
  2017-12-15  2:18                 ` Junio C Hamano
@ 2017-12-27 18:49                   ` Stefan Beller
  2017-12-27 18:59                     ` Jonathan Nieder
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2017-12-27 18:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, git, Jeff King

On Thu, Dec 14, 2017 at 6:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
>
>>> Regarding finding a better name, I would want to hear from others,
>>> I am happy with --find-object, though I can see --pickaxe-object
>>> or --object--filter to be a good narrative as well.
>>
>> Drat, I was hoping for an opinion.
>
> I think it would make it a better companion to --pickaxe but we need
> to align its behaviour a little bit so that it plays better with the
> "--pickaxe-all" option, and also needs to hide mode and name only
> changes just like pickaxe.

I looked into this, and the small changes needed led me to thinking
it could be integrated into the diffcore-pickaxe code completely,
roughly like (spaces mangled):

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 9476bd2108..46f875a7b4 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -124,13 +124,21 @@ static int pickaxe_match(struct diff_filepair
*p, struct diff_options *o,
        mmfile_t mf1, mf2;
        int ret;

-       if (!o->pickaxe[0])
-               return 0;
-
        /* ignore unmerged */
        if (!DIFF_FILE_VALID(p->one) && !DIFF_FILE_VALID(p->two))
                return 0;

+       if (options->objfind) {
+               if ((DIFF_FILE_VALID(p->one) &&
+                    oidset_contains(options->objfind, &p->one->oid)) ||
+                   (DIFF_FILE_VALID(p->two) &&
+                    oidset_contains(options->objfind, &p->two->oid)))
+                       return 1;
+       }
+
+       if (!o->pickaxe[0])
+               return 0;
+
        if (o->flags.allow_textconv) {
                textconv_one = get_textconv(p->one);
                textconv_two = get_textconv(p->two);
---8<---

But then, it seems as if any pickaxe option is incompatible with
any other, i.e. from reading the code, you cannot combine -S
and -G, or even give one of them twice.

I guess that would be not a big deal for the --pickaxe-object,
but just want to point it out.

> After all, the diffcore-blobfind code was written while looking at
> the diffcore-pickaxe's code in another window shown in the pager,
> and I tend to agree with your earlier message that this is an
> extreme case of -S<contents> where the contents happens to be the
> whole file.

I disagree, as the user doesn't have the content, but the hash
over the content only and wants to know more about it. The new
option cannot be used to find a file whose partial content hashes to
the given sha1, either.

So with these considerations, I would keep the patch as currently\
queued at sb/diff-blobfind.

Thanks,
Stefan

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] diffcore: add a filter to find a specific blob
  2017-12-27 18:49                   ` Stefan Beller
@ 2017-12-27 18:59                     ` Jonathan Nieder
  2017-12-27 19:57                       ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Nieder @ 2017-12-27 18:59 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git, Jeff King

Stefan Beller wrote:
> On Thu, Dec 14, 2017 at 6:18 PM, Junio C Hamano <gitster@pobox.com> wrote:

>> I think it would make it a better companion to --pickaxe but we need
>> to align its behaviour a little bit so that it plays better with the
>> "--pickaxe-all" option, and also needs to hide mode and name only
>> changes just like pickaxe.
>
> I looked into this, and the small changes needed led me to thinking
> it could be integrated into the diffcore-pickaxe code completely,
> roughly like (spaces mangled):

Nice, this looks promising.

[...]
> But then, it seems as if any pickaxe option is incompatible with
> any other, i.e. from reading the code, you cannot combine -S
> and -G, or even give one of them twice.
>
> I guess that would be not a big deal for the --pickaxe-object,
> but just want to point it out.

Agreed that that's not a big deal for --pickaxe-object.

>> After all, the diffcore-blobfind code was written while looking at
>> the diffcore-pickaxe's code in another window shown in the pager,
>> and I tend to agree with your earlier message that this is an
>> extreme case of -S<contents> where the contents happens to be the
>> whole file.
>
> I disagree, as the user doesn't have the content, but the hash
> over the content only and wants to know more about it. The new
> option cannot be used to find a file whose partial content hashes to
> the given sha1, either.
>
> So with these considerations, I would keep the patch as currently\
> queued at sb/diff-blobfind.

Interesting --- I come to the opposite conclusion.

The pickaxe-style behavior seems more consistent and simpler to
explain and better matches the use cases I can think of.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] diffcore: add a filter to find a specific blob
  2017-12-27 18:59                     ` Jonathan Nieder
@ 2017-12-27 19:57                       ` Junio C Hamano
  0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-12-27 19:57 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Stefan Beller, git, Jeff King

Jonathan Nieder <jrnieder@gmail.com> writes:

> Stefan Beller wrote:
>> On Thu, Dec 14, 2017 at 6:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
>>> I think it would make it a better companion to --pickaxe but we need
>>> to align its behaviour a little bit so that it plays better with the
>>> "--pickaxe-all" option, and also needs to hide mode and name only
>>> changes just like pickaxe.
>>
>> I looked into this, and the small changes needed led me to thinking
>> it could be integrated into the diffcore-pickaxe code completely,
>> roughly like (spaces mangled):
>
> Nice, this looks promising.

Indeed it is very simple, straight-forward and nice ;-)

>> I disagree, as the user doesn't have the content, but the hash
>> over the content only and wants to know more about it....

(correcting Stefan)  The user can do "git cat-file blob $oid" to
learn about the contents, so from end user's point of view, the
"pickaxe with object ID" can be seen as merely a short-hand for

    git log -S"$(git cat-file blob $oid)"

almost.

But that is a tangent in this response.

> Interesting --- I come to the opposite conclusion.
>
> The pickaxe-style behavior seems more consistent and simpler to
> explain and better matches the use cases I can think of.

Yeah, I am more leaning toward agreeing with you.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-12-27 19:57 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-08  0:24 [PATCH 0/1] diffcore-blobfind Stefan Beller
2017-12-08  0:24 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
2017-12-08  9:34   ` Jeff King
2017-12-08 16:28     ` Ramsay Jones
2017-12-08 20:19       ` Jeff King
2017-12-08 20:39         ` Stefan Beller
2017-12-08 21:38           ` Jeff King
2017-12-08 15:04   ` Junio C Hamano
2017-12-08 17:21     ` Junio C Hamano
2017-12-08 21:11     ` Stefan Beller
2017-12-08 21:15       ` Junio C Hamano
2017-12-11 19:58 ` [PATCH 0/1] diff-core blobfind Stefan Beller
2017-12-11 19:58   ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
2017-12-11 23:17     ` Junio C Hamano
2017-12-12  0:21       ` Stefan Beller
2017-12-12  1:24         ` [PATCH] " Stefan Beller
2017-12-12 18:36           ` Junio C Hamano
2017-12-14 21:22           ` Jonathan Nieder
2017-12-14 22:30             ` Stefan Beller
2017-12-14 22:52               ` Jonathan Nieder
2017-12-15  2:18                 ` Junio C Hamano
2017-12-27 18:49                   ` Stefan Beller
2017-12-27 18:59                     ` Jonathan Nieder
2017-12-27 19:57                       ` Junio C Hamano
2017-12-14 22:44             ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2017-11-20 22:25 [PATCH 0/1] Teaching the diff machinery about blobfind [WAS: git describe <blob>] Stefan Beller
2017-11-20 22:25 ` [PATCH 1/1] diffcore: add a filter to find a specific blob Stefan Beller
2017-11-24  7:43   ` Junio C Hamano
2017-11-25  4:59     ` Junio C Hamano
2017-12-07 21:40     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).