git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 0/3] git-describe <blob> ?
@ 2017-10-28  0:44 Stefan Beller
  2017-10-28  0:45 ` [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
                   ` (2 more replies)
  0 siblings, 3 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-28  0:44 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Occasionally a user is given an object hash from a blob as an error message
or other output (e.g. [1]).

It would be useful to get a further description of such a blob, such as
the (commit, path) tuple where this blob was introduced.

This implements the answer in builtin/describe, but I am not sure if that
is the right place. (One office mate argued it could be a "reverse-blame"
that tells you all the trees/commits where the blob is referenced from).

This is RFC for other reasons as well: tests, docs missing.

Any feedback welcome,

thanks,
Stefan

[1] https://stackoverflow.com/questions/10622179/how-to-find-identify-large-files-commits-in-git-history

Stefan Beller (3):
  list-objects.c: factor out traverse_trees_and_blobs
  revision.h: introduce blob/tree walking in order of the commits
  builtin/describe: describe blobs

 builtin/describe.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 list-objects.c     | 48 +++++++++++++++++++++++++++-----------------
 revision.h         |  3 ++-
 3 files changed, 86 insertions(+), 23 deletions(-)

-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs
  2017-10-28  0:44 [RFC PATCH 0/3] git-describe <blob> ? Stefan Beller
@ 2017-10-28  0:45 ` Stefan Beller
  2017-10-28  0:45   ` [PATCH 2/3] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
                     ` (2 more replies)
  2017-10-28 16:04 ` [RFC PATCH 0/3] git-describe <blob> ? Johannes Schindelin
  2017-10-31  0:33 ` [PATCH 0/7] git-describe <blob> Stefan Beller
  2 siblings, 3 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-28  0:45 UTC (permalink / raw)
  To: sbeller; +Cc: git

With traverse_trees_and_blobs factored out of the main traverse function,
the next patch can introduce an in-order revision walking with ease.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 list-objects.c | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index b3931fa434..0ee0551604 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -183,25 +183,13 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-void traverse_commit_list(struct rev_info *revs,
-			  show_commit_fn show_commit,
-			  show_object_fn show_object,
-			  void *data)
+static void traverse_trees_and_blobs(struct rev_info *revs,
+				     struct strbuf *base,
+				     show_object_fn show_object,
+				     void *data)
 {
 	int i;
-	struct commit *commit;
-	struct strbuf base;
 
-	strbuf_init(&base, PATH_MAX);
-	while ((commit = get_revision(revs)) != NULL) {
-		/*
-		 * an uninteresting boundary commit may not have its tree
-		 * parsed yet, but we are not going to show them anyway
-		 */
-		if (commit->tree)
-			add_pending_tree(revs, commit->tree);
-		show_commit(commit, data);
-	}
 	for (i = 0; i < revs->pending.nr; i++) {
 		struct object_array_entry *pending = revs->pending.objects + i;
 		struct object *obj = pending->item;
@@ -218,17 +206,39 @@ void traverse_commit_list(struct rev_info *revs,
 			path = "";
 		if (obj->type == OBJ_TREE) {
 			process_tree(revs, (struct tree *)obj, show_object,
-				     &base, path, data);
+				     base, path, data);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
 			process_blob(revs, (struct blob *)obj, show_object,
-				     &base, path, data);
+				     base, path, data);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
 	object_array_clear(&revs->pending);
+}
+
+void traverse_commit_list(struct rev_info *revs,
+			  show_commit_fn show_commit,
+			  show_object_fn show_object,
+			  void *data)
+{
+	struct commit *commit;
+	struct strbuf base;
+	strbuf_init(&base, PATH_MAX);
+
+	while ((commit = get_revision(revs)) != NULL) {
+		/*
+		 * an uninteresting boundary commit may not have its tree
+		 * parsed yet, but we are not going to show them anyway
+		 */
+		if (commit->tree)
+			add_pending_tree(revs, commit->tree);
+		show_commit(commit, data);
+	}
+	traverse_trees_and_blobs(revs, &base, show_object, data);
+
 	strbuf_release(&base);
 }
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 2/3] revision.h: introduce blob/tree walking in order of the commits
  2017-10-28  0:45 ` [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
@ 2017-10-28  0:45   ` Stefan Beller
  2017-10-28 17:20     ` Johannes Schindelin
  2017-10-28  0:45   ` [PATCH 3/3] builtin/describe: describe blobs Stefan Beller
  2017-10-28 17:15   ` [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs Johannes Schindelin
  2 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-28  0:45 UTC (permalink / raw)
  To: sbeller; +Cc: git

This will be useful shortly.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 list-objects.c | 2 ++
 revision.h     | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/list-objects.c b/list-objects.c
index 0ee0551604..ef64a237d3 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -237,6 +237,8 @@ void traverse_commit_list(struct rev_info *revs,
 		if (commit->tree)
 			add_pending_tree(revs, commit->tree);
 		show_commit(commit, data);
+		if (revs->tree_blobs_in_commit_order)
+			traverse_trees_and_blobs(revs, &base, show_object, data);
 	}
 	traverse_trees_and_blobs(revs, &base, show_object, data);
 
diff --git a/revision.h b/revision.h
index 54761200ad..86985d68aa 100644
--- a/revision.h
+++ b/revision.h
@@ -121,7 +121,8 @@ struct rev_info {
 			bisect:1,
 			ancestry_path:1,
 			first_parent_only:1,
-			line_level_traverse:1;
+			line_level_traverse:1,
+			tree_blobs_in_commit_order:1;
 
 	/* Diff flags */
 	unsigned int	diff:1,
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 3/3] builtin/describe: describe blobs
  2017-10-28  0:45 ` [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
  2017-10-28  0:45   ` [PATCH 2/3] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
@ 2017-10-28  0:45   ` Stefan Beller
  2017-10-28 17:32     ` Johannes Schindelin
  2017-10-28 17:15   ` [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs Johannes Schindelin
  2 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-28  0:45 UTC (permalink / raw)
  To: sbeller; +Cc: git

Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

The best identification of a blob hash is done via a its path at a
given commit, which this implements.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/describe.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 29075dbd0f..752de5843b 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -11,8 +11,9 @@
 #include "hashmap.h"
 #include "argv-array.h"
 #include "run-command.h"
+#include "revision.h"
+#include "list-objects.h"
 
-#define SEEN		(1u << 0)
 #define MAX_TAGS	(FLAG_BITS - 1)
 
 static const char * const describe_usage[] = {
@@ -282,6 +283,50 @@ static void show_suffix(int depth, const struct object_id *oid)
 	printf("-%d-g%s", depth, find_unique_abbrev(oid->hash, abbrev));
 }
 
+struct blob_descriptor {
+	struct object_id current_commit;
+	struct object_id looking_for;
+};
+
+static void process_commit(struct commit *commit, void *data)
+{
+	struct blob_descriptor *bd = data;
+
+	bd->current_commit = commit->object.oid;
+}
+
+static void process_object(struct object *obj, const char *name, void *data)
+{
+	struct blob_descriptor *bd = data;
+
+	if (!oidcmp(&bd->looking_for, &obj->oid))
+		printf(_("blob %s present at path %s in commit %s\n"),
+			oid_to_hex(&bd->looking_for), name,
+			oid_to_hex(&bd->current_commit));
+}
+
+static void describe_blob(struct object_id oid)
+{
+	struct rev_info revs;
+	struct argv_array args = ARGV_ARRAY_INIT;
+	struct blob_descriptor bd = { null_oid, oid };
+
+	argv_array_pushl(&args, "internal: The first arg is not parsed",
+		"--all", "--single-worktree", "--objects", NULL);
+
+	revs.tree_blobs_in_commit_order = 1;
+
+	init_revisions(&revs, NULL);
+
+	if (setup_revisions(args.argc, args.argv, &revs, NULL) > 1)
+		BUG("setup_revisions could not handle all args?");
+
+	if (prepare_revision_walk(&revs))
+		die("revision walk setup failed");
+
+	traverse_commit_list(&revs, process_commit, process_object, &bd);
+}
+
 static void describe(const char *arg, int last_one)
 {
 	struct object_id oid;
@@ -295,9 +340,14 @@ static void describe(const char *arg, int last_one)
 
 	if (get_oid(arg, &oid))
 		die(_("Not a valid object name %s"), arg);
-	cmit = lookup_commit_reference(&oid);
-	if (!cmit)
-		die(_("%s is not a valid '%s' object"), arg, commit_type);
+	cmit = lookup_commit_reference_gently(&oid, 1);
+	if (!cmit) {
+		if (lookup_blob(&oid))
+			describe_blob(oid);
+		else
+			die(_("%s is not a commit nor blob"), arg);
+		return;
+	}
 
 	n = find_commit_name(&cmit->object.oid);
 	if (n && (tags || all || n->prio == 2)) {
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [RFC PATCH 0/3] git-describe <blob> ?
  2017-10-28  0:44 [RFC PATCH 0/3] git-describe <blob> ? Stefan Beller
  2017-10-28  0:45 ` [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
@ 2017-10-28 16:04 ` Johannes Schindelin
  2017-10-31  0:33 ` [PATCH 0/7] git-describe <blob> Stefan Beller
  2 siblings, 0 replies; 110+ messages in thread
From: Johannes Schindelin @ 2017-10-28 16:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Hi Stefan,

On Fri, 27 Oct 2017, Stefan Beller wrote:

> Occasionally a user is given an object hash from a blob as an error message
> or other output (e.g. [1]).
> 
> It would be useful to get a further description of such a blob, such as
> the (commit, path) tuple where this blob was introduced.
> 
> This implements the answer in builtin/describe, but I am not sure if that
> is the right place. (One office mate argued it could be a "reverse-blame"
> that tells you all the trees/commits where the blob is referenced from).
> 
> This is RFC for other reasons as well: tests, docs missing.
> 
> Any feedback welcome,

As you ask so nicely, I'll just cheer (I do not have time to review it
right now, but I really, really wanted this, I even started working on the
name-rev side of things some time ago but eventually had to let things
slide).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs
  2017-10-28  0:45 ` [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
  2017-10-28  0:45   ` [PATCH 2/3] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
  2017-10-28  0:45   ` [PATCH 3/3] builtin/describe: describe blobs Stefan Beller
@ 2017-10-28 17:15   ` Johannes Schindelin
  2017-10-29  3:13     ` Stefan Beller
  2 siblings, 1 reply; 110+ messages in thread
From: Johannes Schindelin @ 2017-10-28 17:15 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Hi Stefan,

[I was intrigued enough by your work to postpone to later this coming week
reading the What's cooking email in favor of reviewing your patch series.]

On Fri, 27 Oct 2017, Stefan Beller wrote:

> With traverse_trees_and_blobs factored out of the main traverse function,
> the next patch can introduce an in-order revision walking with ease.

Makes sense.

> diff --git a/list-objects.c b/list-objects.c
> index b3931fa434..0ee0551604 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -183,25 +183,13 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
>  	add_pending_object(revs, &tree->object, "");
>  }
>  
> -void traverse_commit_list(struct rev_info *revs,
> -			  show_commit_fn show_commit,
> -			  show_object_fn show_object,
> -			  void *data)
> +static void traverse_trees_and_blobs(struct rev_info *revs,
> +				     struct strbuf *base,

In the context of one function, it was obvious what `base` meant. Maybe we
can call it `base_path` now?

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/3] revision.h: introduce blob/tree walking in order of the commits
  2017-10-28  0:45   ` [PATCH 2/3] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
@ 2017-10-28 17:20     ` Johannes Schindelin
  2017-10-29  3:22       ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Johannes Schindelin @ 2017-10-28 17:20 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Hi Stefan,

On Fri, 27 Oct 2017, Stefan Beller wrote:

> This will be useful shortly.

Something tells me that I will hate you in the future when I read this
commit message and lack the context (e.g. when blaming, where I cannot see
the child commits let alone the comment in the Merge commit).

How about:

	The functionality to list tree objects in the order they were seen
	while traversing the commits will be used in the next commit,
	where we teach `git describe` to describe not only commits, but
	trees and blobs, too.

The diff itself is amazingly easy to review, and obviously correct.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 3/3] builtin/describe: describe blobs
  2017-10-28  0:45   ` [PATCH 3/3] builtin/describe: describe blobs Stefan Beller
@ 2017-10-28 17:32     ` Johannes Schindelin
  2017-10-28 22:47       ` Jacob Keller
  2017-10-29  3:28       ` Stefan Beller
  0 siblings, 2 replies; 110+ messages in thread
From: Johannes Schindelin @ 2017-10-28 17:32 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Hi Stefan,

On Fri, 27 Oct 2017, Stefan Beller wrote:

> Sometimes users are given a hash of an object and they want to identify
> it further (ex.: Use verify-pack to find the largest blobs, but what are
> these? or [1])
> 
> The best identification of a blob hash is done via a its path at a
> given commit, which this implements.
> 
> [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob

I also came up with a script to do that:
https://github.com/msysgit/msysgit/blob/master/bin/what-made-this-repo-so-large.sh

Your method is much more elegant, of course (it describes the commit in the
same run as it finds the object, and it does not output tons of stuff only
to be filtered).

> @@ -282,6 +283,50 @@ static void show_suffix(int depth, const struct object_id *oid)
>  	printf("-%d-g%s", depth, find_unique_abbrev(oid->hash, abbrev));
>  }
>  
> +struct blob_descriptor {
> +	struct object_id current_commit;
> +	struct object_id looking_for;
> +};

Personally, I would call this process_commit_data, but I do not mind too
much about the name.

> +static void process_object(struct object *obj, const char *name, void *data)
> +{
> +	struct blob_descriptor *bd = data;
> +
> +	if (!oidcmp(&bd->looking_for, &obj->oid))
> +		printf(_("blob %s present at path %s in commit %s\n"),
> +			oid_to_hex(&bd->looking_for), name,
> +			oid_to_hex(&bd->current_commit));
> +}

s/name/path/

> @@ -295,9 +340,14 @@ static void describe(const char *arg, int last_one)
>  
>  	if (get_oid(arg, &oid))
>  		die(_("Not a valid object name %s"), arg);
> -	cmit = lookup_commit_reference(&oid);
> -	if (!cmit)
> -		die(_("%s is not a valid '%s' object"), arg, commit_type);
> +	cmit = lookup_commit_reference_gently(&oid, 1);
> +	if (!cmit) {
> +		if (lookup_blob(&oid))
> +			describe_blob(oid);
> +		else
> +			die(_("%s is not a commit nor blob"), arg);

s/not/neither/

Nicely done, sir!

I wonder whether it would make sense to extend this to tree objects while
we are at it, but maybe that's an easy up-for-grabs.

Thank you very much!
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 3/3] builtin/describe: describe blobs
  2017-10-28 17:32     ` Johannes Schindelin
@ 2017-10-28 22:47       ` Jacob Keller
  2017-10-29  3:28       ` Stefan Beller
  1 sibling, 0 replies; 110+ messages in thread
From: Jacob Keller @ 2017-10-28 22:47 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Stefan Beller, Git mailing list

On Sat, Oct 28, 2017 at 10:32 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Stefan,
>
>
> Nicely done, sir!
>
> I wonder whether it would make sense to extend this to tree objects while
> we are at it, but maybe that's an easy up-for-grabs.
>
> Thank you very much!
> Dscho

I'd very much like the same ability for trees and gitlinks as well!
But it does seem it might be fairly easy.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs
  2017-10-28 17:15   ` [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs Johannes Schindelin
@ 2017-10-29  3:13     ` Stefan Beller
  0 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-29  3:13 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Sat, Oct 28, 2017 at 10:15 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Stefan,
>
> [I was intrigued enough by your work to postpone to later this coming week
> reading the What's cooking email in favor of reviewing your patch series.]
>
> On Fri, 27 Oct 2017, Stefan Beller wrote:
>
>> With traverse_trees_and_blobs factored out of the main traverse function,
>> the next patch can introduce an in-order revision walking with ease.
>
> Makes sense.
>
>> diff --git a/list-objects.c b/list-objects.c
>> index b3931fa434..0ee0551604 100644
>> --- a/list-objects.c
>> +++ b/list-objects.c
>> @@ -183,25 +183,13 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
>>       add_pending_object(revs, &tree->object, "");
>>  }
>>
>> -void traverse_commit_list(struct rev_info *revs,
>> -                       show_commit_fn show_commit,
>> -                       show_object_fn show_object,
>> -                       void *data)
>> +static void traverse_trees_and_blobs(struct rev_info *revs,
>> +                                  struct strbuf *base,
>
> In the context of one function, it was obvious what `base` meant. Maybe we
> can call it `base_path` now?

I was intrigued to keep the base local to the factored out function, but that
would mean, we'd have to allocate memory for it in every call. That I wanted
to avoid, so the only reason to pass it in, is memory management.

base_path sounds good, will rename.

Thanks for the review!
Stefan

>
> Thanks,
> Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/3] revision.h: introduce blob/tree walking in order of the commits
  2017-10-28 17:20     ` Johannes Schindelin
@ 2017-10-29  3:22       ` Stefan Beller
  2017-10-29  3:23         ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-29  3:22 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Sat, Oct 28, 2017 at 10:20 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Stefan,
>
> On Fri, 27 Oct 2017, Stefan Beller wrote:
>
>> This will be useful shortly.
>
> Something tells me that I will hate you in the future when I read this
> commit message and lack the context (e.g. when blaming, where I cannot see
> the child commits let alone the comment in the Merge commit).

Thanks for calling me out.

(Not making excuses, but...) I remembered some other senior members
having such commit messages, so I felt like I had to step up my game.

https://public-inbox.org/git/70fbcd573f5c8a78a19a08ffc255437c36e7f49d.1495014840.git.mhagger@alum.mit.edu/

>
> How about:
>
>         The functionality to list tree objects in the order they were seen
>         while traversing the commits will be used in the next commit,
>         where we teach `git describe` to describe not only commits, but
>         trees and blobs, too.

Sounds good. If only we had git-when-merged[1] as some easy upstream
function, then the original commit message would provoke less hate as
future us would have an easier time finding the next commit.

[1] https://github.com/mhagger/git-when-merged

When writing the patch I wondered if this could be useful for other
use cases as well, i.e. do we want to have a command line argument
to trigger this behavior?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/3] revision.h: introduce blob/tree walking in order of the commits
  2017-10-29  3:22       ` Stefan Beller
@ 2017-10-29  3:23         ` Stefan Beller
  2017-10-29  3:43           ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-29  3:23 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Sat, Oct 28, 2017 at 8:22 PM, Stefan Beller <sbeller@google.com> wrote:

>
> (Not making excuses, but...) I remembered some other senior members
> having such commit messages, so I felt like I had to step up my game.
>
> https://public-inbox.org/git/70fbcd573f5c8a78a19a08ffc255437c36e7f49d.1495014840.git.mhagger@alum.mit.edu/

I forgot to note though, that this never made it into the official tree, though.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 3/3] builtin/describe: describe blobs
  2017-10-28 17:32     ` Johannes Schindelin
  2017-10-28 22:47       ` Jacob Keller
@ 2017-10-29  3:28       ` Stefan Beller
  2017-10-29 12:02         ` Kevin Daudt
  2017-10-29 12:07         ` Johannes Schindelin
  1 sibling, 2 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-29  3:28 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Sat, Oct 28, 2017 at 10:32 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Stefan,
>
> On Fri, 27 Oct 2017, Stefan Beller wrote:
>
>> Sometimes users are given a hash of an object and they want to identify
>> it further (ex.: Use verify-pack to find the largest blobs, but what are
>> these? or [1])
>>
>> The best identification of a blob hash is done via a its path at a
>> given commit, which this implements.
>>
>> [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
>
> I also came up with a script to do that:
> https://github.com/msysgit/msysgit/blob/master/bin/what-made-this-repo-so-large.sh
>
> Your method is much more elegant, of course (it describes the commit in the
> same run as it finds the object, and it does not output tons of stuff only
> to be filtered).

That was the task I was given. Discussion here turned out that users
mostly only ever care about commit object names, trees and blobs are
only useful when specifically given.


>
>> @@ -282,6 +283,50 @@ static void show_suffix(int depth, const struct object_id *oid)
>>       printf("-%d-g%s", depth, find_unique_abbrev(oid->hash, abbrev));
>>  }
>>
>> +struct blob_descriptor {
>> +     struct object_id current_commit;
>> +     struct object_id looking_for;
>> +};
>
> Personally, I would call this process_commit_data, but I do not mind too
> much about the name.

I'll take any naming suggestion, as this code was assembled in a hurry using
copy/paste and trial&error as the high-tech methods used.

>> +static void process_object(struct object *obj, const char *name, void *data)
>> +{
>> +     struct blob_descriptor *bd = data;
>> +
>> +     if (!oidcmp(&bd->looking_for, &obj->oid))
>> +             printf(_("blob %s present at path %s in commit %s\n"),
>> +                     oid_to_hex(&bd->looking_for), name,
>> +                     oid_to_hex(&bd->current_commit));
>> +}
>
> s/name/path/
>
>> @@ -295,9 +340,14 @@ static void describe(const char *arg, int last_one)
>>
>>       if (get_oid(arg, &oid))
>>               die(_("Not a valid object name %s"), arg);
>> -     cmit = lookup_commit_reference(&oid);
>> -     if (!cmit)
>> -             die(_("%s is not a valid '%s' object"), arg, commit_type);
>> +     cmit = lookup_commit_reference_gently(&oid, 1);
>> +     if (!cmit) {
>> +             if (lookup_blob(&oid))
>> +                     describe_blob(oid);
>> +             else
>> +                     die(_("%s is not a commit nor blob"), arg);
>
> s/not/neither/
>
> Nicely done, sir!
>
> I wonder whether it would make sense to extend this to tree objects while
> we are at it, but maybe that's an easy up-for-grabs.

I can look into incorporating that, too. What is the use case though?
(Is there any error message, common enough that users want to
identify trees?)

Thanks for the review,
Stefan

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/3] revision.h: introduce blob/tree walking in order of the commits
  2017-10-29  3:23         ` Stefan Beller
@ 2017-10-29  3:43           ` Junio C Hamano
  0 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2017-10-29  3:43 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes Schindelin, git

Stefan Beller <sbeller@google.com> writes:

> On Sat, Oct 28, 2017 at 8:22 PM, Stefan Beller <sbeller@google.com> wrote:
>
>>
>> (Not making excuses, but...) I remembered some other senior members
>> having such commit messages, so I felt like I had to step up my game.
>>
>> https://public-inbox.org/git/70fbcd573f5c8a78a19a08ffc255437c36e7f49d.1495014840.git.mhagger@alum.mit.edu/
>
> I forgot to note though, that this never made it into the official tree, though.

Besides, leading by setting a better example would be the right
approach to gain seniority (if that is your goal, that is) ;-).


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 3/3] builtin/describe: describe blobs
  2017-10-29  3:28       ` Stefan Beller
@ 2017-10-29 12:02         ` Kevin Daudt
  2017-10-29 12:07         ` Johannes Schindelin
  1 sibling, 0 replies; 110+ messages in thread
From: Kevin Daudt @ 2017-10-29 12:02 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes Schindelin, git

On Sat, Oct 28, 2017 at 08:28:54PM -0700, Stefan Beller wrote:
> On Sat, Oct 28, 2017 at 10:32 AM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > 
> > [..]
> > I wonder whether it would make sense to extend this to tree objects while
> > we are at it, but maybe that's an easy up-for-grabs.
> 
> I can look into incorporating that, too. What is the use case though?
> (Is there any error message, common enough that users want to
> identify trees?)
> 
> Thanks for the review,
> Stefan

Not sure if it's really helpfulp, but sometimes with corrup repos, git
would complain about a certain object missing or corrupt, where it might
be usefull to find out how it's referenced. Not sure though if this
would work in that case, because the repo is already corrupt.

Kevin

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 3/3] builtin/describe: describe blobs
  2017-10-29  3:28       ` Stefan Beller
  2017-10-29 12:02         ` Kevin Daudt
@ 2017-10-29 12:07         ` Johannes Schindelin
  1 sibling, 0 replies; 110+ messages in thread
From: Johannes Schindelin @ 2017-10-29 12:07 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Hi Stefan,


On Sat, 28 Oct 2017, Stefan Beller wrote:

> On Sat, Oct 28, 2017 at 10:32 AM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > I wonder whether it would make sense to extend this to tree objects
> > while we are at it, but maybe that's an easy up-for-grabs.
> 
> I can look into incorporating that, too. What is the use case though?
> (Is there any error message, common enough that users want to identify
> trees?)

I remember that I wanted to know in the past where that tree object came
from that is now missing (back in the days when worktrees gc'ed away
detached worktree HEADs and their reflogs), but on second thought, I guess
`git describe` would go belly up in that case (die()ing due to the very
same missing tree object)...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 0/7] git-describe <blob>
  2017-10-28  0:44 [RFC PATCH 0/3] git-describe <blob> ? Stefan Beller
  2017-10-28  0:45 ` [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
  2017-10-28 16:04 ` [RFC PATCH 0/3] git-describe <blob> ? Johannes Schindelin
@ 2017-10-31  0:33 ` Stefan Beller
  2017-10-31  0:33   ` [PATCH 1/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
                     ` (7 more replies)
  2 siblings, 8 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-31  0:33 UTC (permalink / raw)
  To: sbeller; +Cc: git, me, jacob.keller, Johannes.Schindelin

This is not an RFC any more, but a serious series.

Occasionally a user is given an object hash from a blob as an error message
or other output (e.g. [1]).

It would be useful to get a further description of such a blob, such as
the (commit, path) tuple where this blob was introduced.

This implements the answer in builtin/describe,
however the heuristics are weak. See patch 6 for details.

Any feedback welcome,

Thanks,
Stefan

[1] https://stackoverflow.com/questions/10622179/how-to-find-identify-large-files-commits-in-git-history

Stefan Beller (7):
  list-objects.c: factor out traverse_trees_and_blobs
  revision.h: introduce blob/tree walking in order of the commits
  builtin/describe.c: rename `oid` to avoid variable shadowing
  builtin/describe.c: print debug statements earlier
  builtin/describe.c: factor out describe_commit
  builtin/describe.c: describe a blob
  t6120: fix typo in test name

 Documentation/git-describe.txt |  12 +++-
 builtin/describe.c             | 125 ++++++++++++++++++++++++++++++++---------
 list-objects.c                 |  50 ++++++++++-------
 revision.c                     |   2 +
 revision.h                     |   3 +-
 t/t6100-rev-list-in-order.sh   |  44 +++++++++++++++
 t/t6120-describe.sh            |  17 +++++-
 7 files changed, 203 insertions(+), 50 deletions(-)
 create mode 100755 t/t6100-rev-list-in-order.sh

-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 1/7] list-objects.c: factor out traverse_trees_and_blobs
  2017-10-31  0:33 ` [PATCH 0/7] git-describe <blob> Stefan Beller
@ 2017-10-31  0:33   ` Stefan Beller
  2017-10-31  6:07     ` Junio C Hamano
  2017-10-31  0:33   ` [PATCH 2/7] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-31  0:33 UTC (permalink / raw)
  To: sbeller; +Cc: git, me, jacob.keller, Johannes.Schindelin

With traverse_trees_and_blobs factored out of the main traverse function,
the next patch can introduce an in-order revision walking with ease.

The variable holding the base path is only used in the newly factored out
function `traverse_trees_and_blobs`, however we keep its scope to
`traverse_commit_list` to keep the number of invocations for memory
allocations and release to one per commit traversal.  Rename the variable
to `base_path` for clarity.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 list-objects.c | 48 +++++++++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index b3931fa434..bf46f80dff 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -183,25 +183,13 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-void traverse_commit_list(struct rev_info *revs,
-			  show_commit_fn show_commit,
-			  show_object_fn show_object,
-			  void *data)
+static void traverse_trees_and_blobs(struct rev_info *revs,
+				     struct strbuf *base_path,
+				     show_object_fn show_object,
+				     void *data)
 {
 	int i;
-	struct commit *commit;
-	struct strbuf base;
 
-	strbuf_init(&base, PATH_MAX);
-	while ((commit = get_revision(revs)) != NULL) {
-		/*
-		 * an uninteresting boundary commit may not have its tree
-		 * parsed yet, but we are not going to show them anyway
-		 */
-		if (commit->tree)
-			add_pending_tree(revs, commit->tree);
-		show_commit(commit, data);
-	}
 	for (i = 0; i < revs->pending.nr; i++) {
 		struct object_array_entry *pending = revs->pending.objects + i;
 		struct object *obj = pending->item;
@@ -218,17 +206,39 @@ void traverse_commit_list(struct rev_info *revs,
 			path = "";
 		if (obj->type == OBJ_TREE) {
 			process_tree(revs, (struct tree *)obj, show_object,
-				     &base, path, data);
+				     base_path, path, data);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
 			process_blob(revs, (struct blob *)obj, show_object,
-				     &base, path, data);
+				     base_path, path, data);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
 	object_array_clear(&revs->pending);
-	strbuf_release(&base);
+}
+
+void traverse_commit_list(struct rev_info *revs,
+			  show_commit_fn show_commit,
+			  show_object_fn show_object,
+			  void *data)
+{
+	struct commit *commit;
+	struct strbuf base_path;
+	strbuf_init(&base_path, PATH_MAX);
+
+	while ((commit = get_revision(revs)) != NULL) {
+		/*
+		 * an uninteresting boundary commit may not have its tree
+		 * parsed yet, but we are not going to show them anyway
+		 */
+		if (commit->tree)
+			add_pending_tree(revs, commit->tree);
+		show_commit(commit, data);
+	}
+	traverse_trees_and_blobs(revs, &base_path, show_object, data);
+
+	strbuf_release(&base_path);
 }
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-10-31  0:33 ` [PATCH 0/7] git-describe <blob> Stefan Beller
  2017-10-31  0:33   ` [PATCH 1/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
@ 2017-10-31  0:33   ` Stefan Beller
  2017-10-31  6:57     ` Junio C Hamano
  2017-10-31  0:33   ` [PATCH 3/7] builtin/describe.c: rename `oid` to avoid variable shadowing Stefan Beller
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-31  0:33 UTC (permalink / raw)
  To: sbeller; +Cc: git, me, jacob.keller, Johannes.Schindelin

The functionality to list tree objects in the order they were seen
while traversing the commits will be used in the next commit,
where we teach `git describe` to describe not only commits, but
trees and blobs, too.

Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 list-objects.c               |  2 ++
 revision.c                   |  2 ++
 revision.h                   |  3 ++-
 t/t6100-rev-list-in-order.sh | 44 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 50 insertions(+), 1 deletion(-)
 create mode 100755 t/t6100-rev-list-in-order.sh

diff --git a/list-objects.c b/list-objects.c
index bf46f80dff..5e114c9a8a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -237,6 +237,8 @@ void traverse_commit_list(struct rev_info *revs,
 		if (commit->tree)
 			add_pending_tree(revs, commit->tree);
 		show_commit(commit, data);
+		if (revs->tree_blobs_in_commit_order)
+			traverse_trees_and_blobs(revs, &base_path, show_object, data);
 	}
 	traverse_trees_and_blobs(revs, &base_path, show_object, data);
 
diff --git a/revision.c b/revision.c
index d167223e69..9329d4ebbf 100644
--- a/revision.c
+++ b/revision.c
@@ -1845,6 +1845,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->dense = 0;
 	} else if (!strcmp(arg, "--show-all")) {
 		revs->show_all = 1;
+	} else if (!strcmp(arg, "--in-commit-order")) {
+		revs->tree_blobs_in_commit_order = 1;
 	} else if (!strcmp(arg, "--remove-empty")) {
 		revs->remove_empty_trees = 1;
 	} else if (!strcmp(arg, "--merges")) {
diff --git a/revision.h b/revision.h
index 54761200ad..86985d68aa 100644
--- a/revision.h
+++ b/revision.h
@@ -121,7 +121,8 @@ struct rev_info {
 			bisect:1,
 			ancestry_path:1,
 			first_parent_only:1,
-			line_level_traverse:1;
+			line_level_traverse:1,
+			tree_blobs_in_commit_order:1;
 
 	/* Diff flags */
 	unsigned int	diff:1,
diff --git a/t/t6100-rev-list-in-order.sh b/t/t6100-rev-list-in-order.sh
new file mode 100755
index 0000000000..67ebe815d2
--- /dev/null
+++ b/t/t6100-rev-list-in-order.sh
@@ -0,0 +1,44 @@
+#!/bin/sh
+
+test_description='miscellaneous rev-list tests'
+
+. ./test-lib.sh
+
+
+test_expect_success 'setup' '
+	for x in one two three four
+	do
+		echo $x >$x &&
+		git add $x &&
+		git commit -m "add file $x"
+	done &&
+	for x in four three
+	do
+		git rm $x
+		git commit -m "remove $x"
+	done &&
+	git rev-list --in-commit-order --objects HEAD >actual.raw &&
+	cut -c 1-40 > actual < actual.raw &&
+
+	>expect &&
+	git rev-parse HEAD^{commit}       >>expect &&
+	git rev-parse HEAD^{tree}         >>expect &&
+	git rev-parse HEAD^{tree}:one     >>expect &&
+	git rev-parse HEAD^{tree}:two     >>expect &&
+	git rev-parse HEAD~1^{commit}     >>expect &&
+	git rev-parse HEAD~1^{tree}       >>expect &&
+	git rev-parse HEAD~1^{tree}:three >>expect &&
+	git rev-parse HEAD~2^{commit}     >>expect &&
+	git rev-parse HEAD~2^{tree}       >>expect &&
+	git rev-parse HEAD~2^{tree}:four  >>expect &&
+	git rev-parse HEAD~3^{commit}     >>expect &&
+	# skip HEAD~3^{tree}
+	git rev-parse HEAD~4^{commit}     >>expect &&
+	# skip HEAD~4^{tree}
+	git rev-parse HEAD~5^{commit}     >>expect &&
+	git rev-parse HEAD~5^{tree}       >>expect &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 3/7] builtin/describe.c: rename `oid` to avoid variable shadowing
  2017-10-31  0:33 ` [PATCH 0/7] git-describe <blob> Stefan Beller
  2017-10-31  0:33   ` [PATCH 1/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
  2017-10-31  0:33   ` [PATCH 2/7] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
@ 2017-10-31  0:33   ` Stefan Beller
  2017-10-31  8:15     ` Jacob Keller
  2017-10-31  0:33   ` [PATCH 4/7] builtin/describe.c: print debug statements earlier Stefan Beller
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-31  0:33 UTC (permalink / raw)
  To: sbeller; +Cc: git, me, jacob.keller, Johannes.Schindelin

The function `describe` has already a variable named `oid` declared at
the beginning of the function for an object id.  Do now shadow that
variable with a pointer to an object id.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/describe.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 29075dbd0f..fd61f463cf 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -381,9 +381,9 @@ static void describe(const char *arg, int last_one)
 	}
 
 	if (!match_cnt) {
-		struct object_id *oid = &cmit->object.oid;
+		struct object_id *cmit_oid = &cmit->object.oid;
 		if (always) {
-			printf("%s", find_unique_abbrev(oid->hash, abbrev));
+			printf("%s", find_unique_abbrev(cmit_oid->hash, abbrev));
 			if (suffix)
 				printf("%s", suffix);
 			printf("\n");
@@ -392,11 +392,11 @@ static void describe(const char *arg, int last_one)
 		if (unannotated_cnt)
 			die(_("No annotated tags can describe '%s'.\n"
 			    "However, there were unannotated tags: try --tags."),
-			    oid_to_hex(oid));
+			    oid_to_hex(cmit_oid));
 		else
 			die(_("No tags can describe '%s'.\n"
 			    "Try --always, or create some tags."),
-			    oid_to_hex(oid));
+			    oid_to_hex(cmit_oid));
 	}
 
 	QSORT(all_matches, match_cnt, compare_pt);
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 4/7] builtin/describe.c: print debug statements earlier
  2017-10-31  0:33 ` [PATCH 0/7] git-describe <blob> Stefan Beller
                     ` (2 preceding siblings ...)
  2017-10-31  0:33   ` [PATCH 3/7] builtin/describe.c: rename `oid` to avoid variable shadowing Stefan Beller
@ 2017-10-31  0:33   ` Stefan Beller
  2017-10-31  7:03     ` Junio C Hamano
  2017-10-31  0:33   ` [PATCH 5/7] builtin/describe.c: factor out describe_commit Stefan Beller
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-31  0:33 UTC (permalink / raw)
  To: sbeller; +Cc: git, me, jacob.keller, Johannes.Schindelin

For debuggers aid we'd want to print debug statements early, so
introduce a new line in the debug output that describes the whole
function, and the change the next debug output to describe why we need
to search. Conveniently drop the arg from the second line; which will
be useful in a follow up commit, that refactors the describe function.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/describe.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index fd61f463cf..3136efde31 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -293,6 +293,9 @@ static void describe(const char *arg, int last_one)
 	unsigned long seen_commits = 0;
 	unsigned int unannotated_cnt = 0;
 
+	if (debug)
+		fprintf(stderr, _("describe %s\n"), arg);
+
 	if (get_oid(arg, &oid))
 		die(_("Not a valid object name %s"), arg);
 	cmit = lookup_commit_reference(&oid);
@@ -316,7 +319,7 @@ static void describe(const char *arg, int last_one)
 	if (!max_candidates)
 		die(_("no tag exactly matches '%s'"), oid_to_hex(&cmit->object.oid));
 	if (debug)
-		fprintf(stderr, _("searching to describe %s\n"), arg);
+		fprintf(stderr, _("No exact match on refs or tags, searching to describe\n"));
 
 	if (!have_util) {
 		struct hashmap_iter iter;
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 5/7] builtin/describe.c: factor out describe_commit
  2017-10-31  0:33 ` [PATCH 0/7] git-describe <blob> Stefan Beller
                     ` (3 preceding siblings ...)
  2017-10-31  0:33   ` [PATCH 4/7] builtin/describe.c: print debug statements earlier Stefan Beller
@ 2017-10-31  0:33   ` Stefan Beller
  2017-10-31  0:33   ` [PATCH 6/7] builtin/describe.c: describe a blob Stefan Beller
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-31  0:33 UTC (permalink / raw)
  To: sbeller; +Cc: git, me, jacob.keller, Johannes.Schindelin

In the next patch we'll learn how to describe more than just commits,
so factor out describing commits into its own function.  That will make
the next patches easy as we still need to describe a commit as part of
describing blobs.

While factoring out the functionality to describe_commit, make sure
that any output to stdout is put into a strbuf, which we can print
afterwards, using puts which also adds the line ending.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/describe.c | 63 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 3136efde31..9e9a5ed5d4 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -256,7 +256,7 @@ static unsigned long finish_depth_computation(
 	return seen_commits;
 }
 
-static void display_name(struct commit_name *n)
+static void append_name(struct commit_name *n, struct strbuf *dst)
 {
 	if (n->prio == 2 && !n->tag) {
 		n->tag = lookup_tag(&n->oid);
@@ -272,19 +272,18 @@ static void display_name(struct commit_name *n)
 	}
 
 	if (n->tag)
-		printf("%s", n->tag->tag);
+		strbuf_addstr(dst, n->tag->tag);
 	else
-		printf("%s", n->path);
+		strbuf_addstr(dst, n->path);
 }
 
-static void show_suffix(int depth, const struct object_id *oid)
+static void append_suffix(int depth, const struct object_id *oid, struct strbuf *dst)
 {
-	printf("-%d-g%s", depth, find_unique_abbrev(oid->hash, abbrev));
+	strbuf_addf(dst, "-%d-g%s", depth, find_unique_abbrev(oid->hash, abbrev));
 }
 
-static void describe(const char *arg, int last_one)
+static void describe_commit(struct object_id *oid, struct strbuf *dst)
 {
-	struct object_id oid;
 	struct commit *cmit, *gave_up_on = NULL;
 	struct commit_list *list;
 	struct commit_name *n;
@@ -293,26 +292,18 @@ static void describe(const char *arg, int last_one)
 	unsigned long seen_commits = 0;
 	unsigned int unannotated_cnt = 0;
 
-	if (debug)
-		fprintf(stderr, _("describe %s\n"), arg);
-
-	if (get_oid(arg, &oid))
-		die(_("Not a valid object name %s"), arg);
-	cmit = lookup_commit_reference(&oid);
-	if (!cmit)
-		die(_("%s is not a valid '%s' object"), arg, commit_type);
+	cmit = lookup_commit_reference(oid);
 
 	n = find_commit_name(&cmit->object.oid);
 	if (n && (tags || all || n->prio == 2)) {
 		/*
 		 * Exact match to an existing ref.
 		 */
-		display_name(n);
+		append_name(n, dst);
 		if (longformat)
-			show_suffix(0, n->tag ? &n->tag->tagged->oid : &oid);
+			append_suffix(0, n->tag ? &n->tag->tagged->oid : oid, dst);
 		if (suffix)
-			printf("%s", suffix);
-		printf("\n");
+			strbuf_addstr(dst, suffix);
 		return;
 	}
 
@@ -386,10 +377,9 @@ static void describe(const char *arg, int last_one)
 	if (!match_cnt) {
 		struct object_id *cmit_oid = &cmit->object.oid;
 		if (always) {
-			printf("%s", find_unique_abbrev(cmit_oid->hash, abbrev));
+			strbuf_addstr(dst, find_unique_abbrev(cmit_oid->hash, abbrev));
 			if (suffix)
-				printf("%s", suffix);
-			printf("\n");
+				strbuf_addstr(dst, suffix);
 			return;
 		}
 		if (unannotated_cnt)
@@ -437,15 +427,36 @@ static void describe(const char *arg, int last_one)
 		}
 	}
 
-	display_name(all_matches[0].name);
+	append_name(all_matches[0].name, dst);
 	if (abbrev)
-		show_suffix(all_matches[0].depth, &cmit->object.oid);
+		append_suffix(all_matches[0].depth, &cmit->object.oid, dst);
 	if (suffix)
-		printf("%s", suffix);
-	printf("\n");
+		strbuf_addstr(dst, suffix);
+}
+
+static void describe(const char *arg, int last_one)
+{
+	struct object_id oid;
+	struct commit *cmit;
+	struct strbuf sb = STRBUF_INIT;
+
+	if (debug)
+		fprintf(stderr, _("describe %s\n"), arg);
+
+	if (get_oid(arg, &oid))
+		die(_("Not a valid object name %s"), arg);
+	cmit = lookup_commit_reference(&oid);
+	if (!cmit)
+		die(_("%s is not a valid '%s' object"), arg, commit_type);
+
+	describe_commit(&oid, &sb);
+
+	puts(sb.buf);
 
 	if (!last_one)
 		clear_commit_marks(cmit, -1);
+
+	strbuf_release(&sb);
 }
 
 int cmd_describe(int argc, const char **argv, const char *prefix)
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 6/7] builtin/describe.c: describe a blob
  2017-10-31  0:33 ` [PATCH 0/7] git-describe <blob> Stefan Beller
                     ` (4 preceding siblings ...)
  2017-10-31  0:33   ` [PATCH 5/7] builtin/describe.c: factor out describe_commit Stefan Beller
@ 2017-10-31  0:33   ` Stefan Beller
  2017-10-31  6:25     ` Junio C Hamano
  2017-10-31  0:33   ` [PATCH 7/7] t6120: fix typo in test name Stefan Beller
  2017-10-31 21:18   ` [PATCHv2 0/7] git describe blob Stefan Beller
  7 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-31  0:33 UTC (permalink / raw)
  To: sbeller; +Cc: git, me, jacob.keller, Johannes.Schindelin

Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

"This is an interesting endeavor, because describing things is hard."
  -- me, upon writing this patch.

When describing commits, we try to anchor them to tags or refs, as these
are conceptually on a higher level than the commit. And if there is no ref
or tag that matches exactly, we're out of luck.  So we employ a heuristic
to make up a name for the commit. These names are ambivalent, there might
be different tags or refs to anchor to, and there might be different
path in the DAG to travel to arrive at the commit precisely.

When describing a blob, we want to describe the blob from a higher layer
as well, which is a tuple of (commit, deep/path) as the tree objects
involved are rather uninteresting.  The same blob can be referenced by
multiple commits, so how we decide which commit to use?  This patch
implements a rather naive approach on this: As there are no back pointers
from blobs to commits in which the blob occurs, we'll start walking from
any tips available, listing the blobs in-order of the commit and once we
found the blob, we'll take the first commit that listed the blob.  For
source code this is likely not the first commit that introduced the blob,
but rather the latest commit that contained the blob.  For example:

  git describe v0.99:Makefile
  v0.99-5-gab6625e06a:Makefile

tells us the latest commit that contained the Makefile as it was in tag
v0.99 is commit v0.99-5-gab6625e06a (and at the same path), as the next
commit on top v0.99-6-gb1de9de2b9 ([PATCH] Bootstrap "make dist",
2005-07-11) touches the Makefile.

Let's see how this description turns out, if it is useful in day-to-day
use as I have the intuition that we'd rather want to see the *first*
commit that this blob was introduced to the repository (which can be
achieved easily by giving the `--reverse` flag in the describe_blob rev
walk).

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-describe.txt | 12 +++++++-
 builtin/describe.c             | 65 ++++++++++++++++++++++++++++++++++++++----
 t/t6120-describe.sh            | 15 ++++++++++
 3 files changed, 86 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index c924c945ba..3d618b2445 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -3,7 +3,7 @@ git-describe(1)
 
 NAME
 ----
-git-describe - Describe a commit using the most recent tag reachable from it
+git-describe - Describe a commit or blob using the most recent tag reachable from it
 
 
 SYNOPSIS
@@ -24,6 +24,16 @@ By default (without --all or --tags) `git describe` only shows
 annotated tags.  For more information about creating annotated tags
 see the -a and -s options to linkgit:git-tag[1].
 
+If the given `<commit-ish>` refers to a blob, it will be described
+as `<commit-description>:<path>`, such that the blob can be found
+at `<path>` in the `<commit-ish`. Note, that the commit is likely
+not the commit that introduced the blob, but the one that was found
+first; to find the commit that introduced the blob, you need to find
+the commit that last touched the path, e.g.
+`git log <commit-description> -- <path>`.
+As blobs do not point at the commits they are contained in,
+describing blobs is slow as we have to walk the whole graph.
+
 OPTIONS
 -------
 <commit-ish>...::
diff --git a/builtin/describe.c b/builtin/describe.c
index 9e9a5ed5d4..382cbae908 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -3,6 +3,7 @@
 #include "lockfile.h"
 #include "commit.h"
 #include "tag.h"
+#include "blob.h"
 #include "refs.h"
 #include "builtin.h"
 #include "exec_cmd.h"
@@ -11,8 +12,9 @@
 #include "hashmap.h"
 #include "argv-array.h"
 #include "run-command.h"
+#include "revision.h"
+#include "list-objects.h"
 
-#define SEEN		(1u << 0)
 #define MAX_TAGS	(FLAG_BITS - 1)
 
 static const char * const describe_usage[] = {
@@ -434,6 +436,54 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 		strbuf_addstr(dst, suffix);
 }
 
+struct process_commit_data {
+	struct object_id current_commit;
+	struct object_id looking_for;
+	struct strbuf *dst;
+};
+
+static void process_commit(struct commit *commit, void *data)
+{
+	struct process_commit_data *pcd = data;
+	pcd->current_commit = commit->object.oid;
+}
+
+static void process_object(struct object *obj, const char *path, void *data)
+{
+	struct process_commit_data *pcd = data;
+
+	if (!oidcmp(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
+		reset_revision_walk();
+		describe_commit(&pcd->current_commit, pcd->dst);
+		strbuf_addf(pcd->dst, ":%s", path);
+	}
+}
+
+static void describe_blob(struct object_id oid, struct strbuf *dst)
+{
+	struct rev_info revs;
+	struct argv_array args = ARGV_ARRAY_INIT;
+	struct process_commit_data pcd = { null_oid, oid, dst};
+
+	argv_array_pushl(&args, "internal: The first arg is not parsed",
+		"--all", "--reflog", /* as many starting points as possible */
+		/* NEEDSWORK: --all is incompatible with worktrees for now: */
+		"--single-worktree",
+		"--objects",
+		"--in-commit-order",
+		NULL);
+
+	init_revisions(&revs, NULL);
+	if (setup_revisions(args.argc, args.argv, &revs, NULL) > 1)
+		BUG("setup_revisions could not handle all args?");
+
+	if (prepare_revision_walk(&revs))
+		die("revision walk setup failed");
+
+	traverse_commit_list(&revs, process_commit, process_object, &pcd);
+	reset_revision_walk();
+}
+
 static void describe(const char *arg, int last_one)
 {
 	struct object_id oid;
@@ -445,11 +495,16 @@ static void describe(const char *arg, int last_one)
 
 	if (get_oid(arg, &oid))
 		die(_("Not a valid object name %s"), arg);
-	cmit = lookup_commit_reference(&oid);
-	if (!cmit)
-		die(_("%s is not a valid '%s' object"), arg, commit_type);
+	cmit = lookup_commit_reference_gently(&oid, 1);
 
-	describe_commit(&oid, &sb);
+	if (cmit) {
+		describe_commit(&oid, &sb);
+	} else {
+		if (lookup_blob(&oid))
+			describe_blob(oid, &sb);
+		else
+			die(_("%s is neither a commit nor blob"), arg);
+	}
 
 	puts(sb.buf);
 
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 1c0e8659d9..3be01316e8 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -310,6 +310,21 @@ test_expect_success 'describe ignoring a borken submodule' '
 	grep broken out
 '
 
+test_expect_success 'describe a blob at a tag' '
+	echo "make it a unique blob" >file &&
+	git add file && git commit -m "content in file" &&
+	git tag -a -m "latest annotated tag" unique-file &&
+	git describe HEAD:file >actual &&
+	echo "unique-file:file" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'describe a surviving blob' '
+	git commit --allow-empty -m "empty commit" &&
+	git describe HEAD:file >actual &&
+	grep unique-file-1-g actual
+'
+
 test_expect_failure ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
 	i=1 &&
 	while test $i -lt 8000
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 7/7] t6120: fix typo in test name
  2017-10-31  0:33 ` [PATCH 0/7] git-describe <blob> Stefan Beller
                     ` (5 preceding siblings ...)
  2017-10-31  0:33   ` [PATCH 6/7] builtin/describe.c: describe a blob Stefan Beller
@ 2017-10-31  0:33   ` Stefan Beller
  2017-11-01  1:21     ` Junio C Hamano
  2017-10-31 21:18   ` [PATCHv2 0/7] git describe blob Stefan Beller
  7 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-31  0:33 UTC (permalink / raw)
  To: sbeller; +Cc: git, me, jacob.keller, Johannes.Schindelin

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 t/t6120-describe.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 3be01316e8..fd329f173a 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -304,7 +304,7 @@ test_expect_success 'describe chokes on severely broken submodules' '
 	mv .git/modules/sub1/ .git/modules/sub_moved &&
 	test_must_fail git describe --dirty
 '
-test_expect_success 'describe ignoring a borken submodule' '
+test_expect_success 'describe ignoring a broken submodule' '
 	git describe --broken >out &&
 	test_when_finished "mv .git/modules/sub_moved .git/modules/sub1" &&
 	grep broken out
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/7] list-objects.c: factor out traverse_trees_and_blobs
  2017-10-31  0:33   ` [PATCH 1/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
@ 2017-10-31  6:07     ` Junio C Hamano
  0 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2017-10-31  6:07 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, me, jacob.keller, Johannes.Schindelin

Stefan Beller <sbeller@google.com> writes:

> With traverse_trees_and_blobs factored out of the main traverse function,
> the next patch can introduce an in-order revision walking with ease.
>
> The variable holding the base path is only used in the newly factored out
> function `traverse_trees_and_blobs`, however we keep its scope to
> `traverse_commit_list` to keep the number of invocations for memory
> allocations and release to one per commit traversal.

In this step, the caller is calling traverse_trees_and_blobs() only
once per traversal anyway, so the paragraph does not quite justify
this somewhat strange calling convention, unless the reader knows
more calls will be made to the same function by the caller in later
steps, potentially doubling the chance the buffer can be reused with
this arrangement.

Even after this function gains more callers, wouldn't the same
invariant that "path" recorded for trees and blobs are relative to
the root of tree recorded in the object in revs->pending.objects[]?
IOW, should base_path->len be always 0 upon entry to this function?

We may want to have an explicit _reset() or assert(->len == 0) at
the beginning of the function to ensure that.

> Rename the variable to `base_path` for clarity.

Hmph.  Inside traverse_commit_list(), which never looks at the
contents of the buffer, it could just be called with a name that
does not have any meaning (e.g. "scratch area for the callee to
use"), and such a change would make it clear why it is defined one
level above its use and the caller never looks at it.

In the callee (i.e. inside traverse_trees_and_blobs()), however, I
do not see a reason why base_path is any clearer than base.

Other than that, this looks like a straight-forward code movement
and looks good.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-10-31  0:33   ` [PATCH 6/7] builtin/describe.c: describe a blob Stefan Beller
@ 2017-10-31  6:25     ` Junio C Hamano
  2017-10-31 19:16       ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2017-10-31  6:25 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, me, jacob.keller, Johannes.Schindelin

Stefan Beller <sbeller@google.com> writes:

> diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
> index c924c945ba..3d618b2445 100644
> --- a/Documentation/git-describe.txt
> +++ b/Documentation/git-describe.txt
> @@ -3,7 +3,7 @@ git-describe(1)
>  
>  NAME
>  ----
> -git-describe - Describe a commit using the most recent tag reachable from it
> +git-describe - Describe a commit or blob using the most recent tag reachable from it

If I am not mistaken, this series is about describing a blob as a
tuple of a recent commit-ish and a path in the tree in it.  Blob
never reaches anything, so "desribing blob using X reachable from
it" is a mere nonsense.

The original is not great in that it ignores the "--contains" mode
and referring to "tagged commit" merely as "tag" for brevity, but
at least it made some sense.

> @@ -24,6 +24,16 @@ By default (without --all or --tags) `git describe` only shows
>  annotated tags.  For more information about creating annotated tags
>  see the -a and -s options to linkgit:git-tag[1].
>  
> +If the given `<commit-ish>` refers to a blob, it will be described

Perhaps this step should update the SYNOPSIS so that the command
takes not just commit-ish but a blob too.  Given the difficulty in
coming up with the single-liner description of what it does we saw
above, I suspect that splitting SYNOPSIS out into two very distinct
operating mode might make it easier to read.

    SYNOPSIS
    --------
    [verse]
    'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] [<commit-ish>...]
   +'git describe' [<options>...] <blob>...

Then this additional paragraph can say "When describin a <blob>",
without using a (technically nonsense) phrase "if <commit-ish>
refers to a blob", which is never true.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-10-31  0:33   ` [PATCH 2/7] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
@ 2017-10-31  6:57     ` Junio C Hamano
  2017-10-31 18:12       ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2017-10-31  6:57 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, me, jacob.keller, Johannes.Schindelin

Stefan Beller <sbeller@google.com> writes:

> diff --git a/list-objects.c b/list-objects.c
> index bf46f80dff..5e114c9a8a 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -237,6 +237,8 @@ void traverse_commit_list(struct rev_info *revs,
>  		if (commit->tree)
>  			add_pending_tree(revs, commit->tree);
>  		show_commit(commit, data);
> +		if (revs->tree_blobs_in_commit_order)
> +			traverse_trees_and_blobs(revs, &base_path, show_object, data);
>  	}
>  	traverse_trees_and_blobs(revs, &base_path, show_object, data);

This one is interesting.  While we walk the ancestry chain, we may
add the tree for each commit that we discover to the "pending.  We
used to sweep the pending array at the end after we run out of the
commits, but now we do the sweeping as we process each commit.

Would this make the last call to traverse_trees_and_blobs() always a
no-op?  I am not suggesting to add a new conditional in front of it;
I am just asking for curiosity's sake.

> diff --git a/t/t6100-rev-list-in-order.sh b/t/t6100-rev-list-in-order.sh
> new file mode 100755
> index 0000000000..67ebe815d2
> --- /dev/null
> +++ b/t/t6100-rev-list-in-order.sh
> @@ -0,0 +1,44 @@
> +#!/bin/sh
> +
> +test_description='miscellaneous rev-list tests'
> +
> +. ./test-lib.sh
> +
> +
> +test_expect_success 'setup' '
> +	for x in one two three four
> +	do
> +		echo $x >$x &&
> +		git add $x &&
> +		git commit -m "add file $x"
> +	done &&
> +	for x in four three
> +	do
> +		git rm $x
> +		git commit -m "remove $x"
> +	done &&

There is break in &&-chain in the second loop, but in any case, for
loops and &&-chains do not mix very well unless done carefully.
Imagine what happens if "git commit" in the first loop failed when
x is two; it won't stop and keep going to create three and four and
nobody would noice any error.

> +	git rev-list --in-commit-order --objects HEAD >actual.raw &&
> +	cut -c 1-40 > actual < actual.raw &&
> +
> +	>expect &&
> +	git rev-parse HEAD^{commit}       >>expect &&
> +	git rev-parse HEAD^{tree}         >>expect &&
> +...
> +	git rev-parse HEAD~5^{tree}       >>expect &&

Ooooh, ugly ;-).  I wish we had --stdin interface or something.
Short of that, I'd probably write this more like...

		git rev-parse >expect \
			A B C D \
			E F

Even though we do not have --stdin for rev-parse, you can still do:

	git cat-file --batch-check='%(objectname)' >expect <<-\EOF &&
	HEAD^{commit}
	HEAD^{tree}
	HEAD:one
	HEAD:two
	...
	EOF

> +	test_cmp expect actual
> +'
> +
> +test_done

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/7] builtin/describe.c: print debug statements earlier
  2017-10-31  0:33   ` [PATCH 4/7] builtin/describe.c: print debug statements earlier Stefan Beller
@ 2017-10-31  7:03     ` Junio C Hamano
  2017-10-31 19:05       ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2017-10-31  7:03 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, me, jacob.keller, Johannes.Schindelin

Stefan Beller <sbeller@google.com> writes:

> For debuggers aid we'd want to print debug statements early, so
> introduce a new line in the debug output that describes the whole
> function, and the change the next debug output to describe why we need
> to search. Conveniently drop the arg from the second line; which will
> be useful in a follow up commit, that refactors the describe function.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---

Adding an eary "entry" log may indeed be a good idea.  This is not a
new issue, but I do not think it is a good trade-off to burden i18n
teams to translate debug-only messages, though.

>  builtin/describe.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/describe.c b/builtin/describe.c
> index fd61f463cf..3136efde31 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -293,6 +293,9 @@ static void describe(const char *arg, int last_one)
>  	unsigned long seen_commits = 0;
>  	unsigned int unannotated_cnt = 0;
>  
> +	if (debug)
> +		fprintf(stderr, _("describe %s\n"), arg);
> +
>  	if (get_oid(arg, &oid))
>  		die(_("Not a valid object name %s"), arg);
>  	cmit = lookup_commit_reference(&oid);
> @@ -316,7 +319,7 @@ static void describe(const char *arg, int last_one)
>  	if (!max_candidates)
>  		die(_("no tag exactly matches '%s'"), oid_to_hex(&cmit->object.oid));
>  	if (debug)
> -		fprintf(stderr, _("searching to describe %s\n"), arg);
> +		fprintf(stderr, _("No exact match on refs or tags, searching to describe\n"));
>  
>  	if (!have_util) {
>  		struct hashmap_iter iter;

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 3/7] builtin/describe.c: rename `oid` to avoid variable shadowing
  2017-10-31  0:33   ` [PATCH 3/7] builtin/describe.c: rename `oid` to avoid variable shadowing Stefan Beller
@ 2017-10-31  8:15     ` Jacob Keller
  0 siblings, 0 replies; 110+ messages in thread
From: Jacob Keller @ 2017-10-31  8:15 UTC (permalink / raw)
  To: Stefan Beller, sbeller; +Cc: git, me, Johannes.Schindelin



On October 30, 2017 5:33:47 PM PDT, Stefan Beller <sbeller@google.com> wrote:
>The function `describe` has already a variable named `oid` declared at
>the beginning of the function for an object id.  Do now shadow that

Nit, s/now/not/

>variable with a pointer to an object id.
>
>Signed-off-by: Stefan Beller <sbeller@google.com>
>---
> builtin/describe.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/builtin/describe.c b/builtin/describe.c
>index 29075dbd0f..fd61f463cf 100644
>--- a/builtin/describe.c
>+++ b/builtin/describe.c
>@@ -381,9 +381,9 @@ static void describe(const char *arg, int last_one)
> 	}
> 
> 	if (!match_cnt) {
>-		struct object_id *oid = &cmit->object.oid;
>+		struct object_id *cmit_oid = &cmit->object.oid;
> 		if (always) {
>-			printf("%s", find_unique_abbrev(oid->hash, abbrev));
>+			printf("%s", find_unique_abbrev(cmit_oid->hash, abbrev));
> 			if (suffix)
> 				printf("%s", suffix);
> 			printf("\n");
>@@ -392,11 +392,11 @@ static void describe(const char *arg, int
>last_one)
> 		if (unannotated_cnt)
> 			die(_("No annotated tags can describe '%s'.\n"
> 			    "However, there were unannotated tags: try --tags."),
>-			    oid_to_hex(oid));
>+			    oid_to_hex(cmit_oid));
> 		else
> 			die(_("No tags can describe '%s'.\n"
> 			    "Try --always, or create some tags."),
>-			    oid_to_hex(oid));
>+			    oid_to_hex(cmit_oid));
> 	}
> 
> 	QSORT(all_matches, match_cnt, compare_pt);

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-10-31  6:57     ` Junio C Hamano
@ 2017-10-31 18:12       ` Stefan Beller
  0 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-31 18:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Kevin Daudt, Jacob Keller, Johannes Schindelin

On Mon, Oct 30, 2017 at 11:57 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> diff --git a/list-objects.c b/list-objects.c
>> index bf46f80dff..5e114c9a8a 100644
>> --- a/list-objects.c
>> +++ b/list-objects.c
>> @@ -237,6 +237,8 @@ void traverse_commit_list(struct rev_info *revs,
>>               if (commit->tree)
>>                       add_pending_tree(revs, commit->tree);
>>               show_commit(commit, data);
>> +             if (revs->tree_blobs_in_commit_order)
>> +                     traverse_trees_and_blobs(revs, &base_path, show_object, data);
>>       }
>>       traverse_trees_and_blobs(revs, &base_path, show_object, data);
>
> This one is interesting.  While we walk the ancestry chain, we may
> add the tree for each commit that we discover to the "pending.  We
> used to sweep the pending array at the end after we run out of the
> commits, but now we do the sweeping as we process each commit.
>
> Would this make the last call to traverse_trees_and_blobs() always a
> no-op?

That is my understanding of the code, the important part of the previous patch
was the move of 'object_array_clear(&revs->pending);' into the
`traverse_trees_and_blobs` function.

>  I am not suggesting to add a new conditional in front of it;
> I am just asking for curiosity's sake.

Thanks for being clear on that. I have the same taste for this.

(Though for a split second I wondered if we can pull some esoteric trick
to skip this, but I could not find any that is sane as well as readable.)

>> +test_expect_success 'setup' '
>> +     for x in one two three four
>> +     do
>> +             echo $x >$x &&
>> +             git add $x &&
>> +             git commit -m "add file $x"
>> +     done &&
>> +     for x in four three
>> +     do
>> +             git rm $x
>> +             git commit -m "remove $x"
>> +     done &&
>
> There is break in &&-chain in the second loop, but in any case, for
> loops and &&-chains do not mix very well unless done carefully.
> Imagine what happens if "git commit" in the first loop failed when
> x is two; it won't stop and keep going to create three and four and
> nobody would noice any error.
>

I'll fix that.


> Even though we do not have --stdin for rev-parse, you can still do:
>
>         git cat-file --batch-check='%(objectname)' >expect <<-\EOF &&
>         HEAD^{commit}
>         HEAD^{tree}
>         HEAD:one
>         HEAD:two
>         ...
>         EOF

sounds better than the current implementation.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/7] builtin/describe.c: print debug statements earlier
  2017-10-31  7:03     ` Junio C Hamano
@ 2017-10-31 19:05       ` Stefan Beller
  0 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-31 19:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Kevin Daudt, Jacob Keller, Johannes Schindelin

On Tue, Oct 31, 2017 at 12:03 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> For debuggers aid we'd want to print debug statements early, so
>> introduce a new line in the debug output that describes the whole
>> function, and the change the next debug output to describe why we need
>> to search. Conveniently drop the arg from the second line; which will
>> be useful in a follow up commit, that refactors the describe function.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>
> Adding an eary "entry" log may indeed be a good idea.  This is not a
> new issue, but I do not think it is a good trade-off to burden i18n
> teams to translate debug-only messages, though.

The original motivation here is to use the 'arg' argument so early in the
function that the next patch can omit passing in 'arg' as the lines
using 'arg' are not in the factored out function.

8713ab3079 (Improve git-describe performance by reducing
revision listing., 2007-01-13) added the --debug flag, and also
exposed it to the man page, so it is user visible, which is why we cannot
keep these messages untranslated.

>
>>  builtin/describe.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/builtin/describe.c b/builtin/describe.c
>> index fd61f463cf..3136efde31 100644
>> --- a/builtin/describe.c
>> +++ b/builtin/describe.c
>> @@ -293,6 +293,9 @@ static void describe(const char *arg, int last_one)
>>       unsigned long seen_commits = 0;
>>       unsigned int unannotated_cnt = 0;
>>
>> +     if (debug)
>> +             fprintf(stderr, _("describe %s\n"), arg);
>> +
>>       if (get_oid(arg, &oid))
>>               die(_("Not a valid object name %s"), arg);
>>       cmit = lookup_commit_reference(&oid);
>> @@ -316,7 +319,7 @@ static void describe(const char *arg, int last_one)
>>       if (!max_candidates)
>>               die(_("no tag exactly matches '%s'"), oid_to_hex(&cmit->object.oid));
>>       if (debug)
>> -             fprintf(stderr, _("searching to describe %s\n"), arg);
>> +             fprintf(stderr, _("No exact match on refs or tags, searching to describe\n"));
>>
>>       if (!have_util) {
>>               struct hashmap_iter iter;

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-10-31  6:25     ` Junio C Hamano
@ 2017-10-31 19:16       ` Stefan Beller
  2017-11-01  3:34         ` Junio C Hamano
  2017-11-01  3:44         ` Junio C Hamano
  0 siblings, 2 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-31 19:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Kevin Daudt, Jacob Keller, Johannes Schindelin

On Mon, Oct 30, 2017 at 11:25 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
>> index c924c945ba..3d618b2445 100644
>> --- a/Documentation/git-describe.txt
>> +++ b/Documentation/git-describe.txt
>> @@ -3,7 +3,7 @@ git-describe(1)
>>
>>  NAME
>>  ----
>> -git-describe - Describe a commit using the most recent tag reachable from it
>> +git-describe - Describe a commit or blob using the most recent tag reachable from it
>
> If I am not mistaken, this series is about describing a blob as a
> tuple of a recent commit-ish and a path in the tree in it.  Blob
> never reaches anything, so "desribing blob using X reachable from
> it" is a mere nonsense.

Agreed, though the commitish used for the blob description may refer to a tag.

> The original is not great in that it ignores the "--contains" mode
> and referring to "tagged commit" merely as "tag" for brevity, but
> at least it made some sense.

Maybe

  "git-describe - Describe a blob or commit using graph relations"

though that sounds too generic, but it is accurate as all we do is
a heuristic for graph walk ways.


>> @@ -24,6 +24,16 @@ By default (without --all or --tags) `git describe` only shows
>>  annotated tags.  For more information about creating annotated tags
>>  see the -a and -s options to linkgit:git-tag[1].
>>
>> +If the given `<commit-ish>` refers to a blob, it will be described
>
> Perhaps this step should update the SYNOPSIS so that the command
> takes not just commit-ish but a blob too.

That is a good idea, as the --contains would not work for blobs currently.

>  Given the difficulty in
> coming up with the single-liner description of what it does we saw
> above, I suspect that splitting SYNOPSIS out into two very distinct
> operating mode might make it easier to read.
>
>     SYNOPSIS
>     --------
>     [verse]
>     'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] [<commit-ish>...]
>    +'git describe' [<options>...] <blob>...
>
> Then this additional paragraph can say "When describin a <blob>",
> without using a (technically nonsense) phrase "if <commit-ish>
> refers to a blob", which is never true.
>

ok, do we know about 'blob-ish' as a term?

^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv2 0/7] git describe blob
  2017-10-31  0:33 ` [PATCH 0/7] git-describe <blob> Stefan Beller
                     ` (6 preceding siblings ...)
  2017-10-31  0:33   ` [PATCH 7/7] t6120: fix typo in test name Stefan Beller
@ 2017-10-31 21:18   ` Stefan Beller
  2017-10-31 21:18     ` [PATCHv2 1/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
                       ` (8 more replies)
  7 siblings, 9 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-31 21:18 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me

v2:

* other variable names in patch v1, the commit message explains the
  unusual strategy for the scratch pad variable, + assert
* less ugly test in p2
* typofix in p3 commit msg
* patch 4 (debug printing) unchanged, awaiting discussion to start/settle.
* rephrased the man page in p6.

Thanks,
Stefan

v1:
This is not an RFC any more, but a serious series.

Occasionally a user is given an object hash from a blob as an error message
or other output (e.g. [1]).

It would be useful to get a further description of such a blob, such as
the (commit, path) tuple where this blob was introduced.

This implements the answer in builtin/describe,
however the heuristics are weak. See patch 6 for details.

Any feedback welcome,

Thanks,
Stefan

Stefan Beller (7):
  list-objects.c: factor out traverse_trees_and_blobs
  revision.h: introduce blob/tree walking in order of the commits
  builtin/describe.c: rename `oid` to avoid variable shadowing
  builtin/describe.c: print debug statements earlier
  builtin/describe.c: factor out describe_commit
  builtin/describe.c: describe a blob
  t6120: fix typo in test name

 Documentation/git-describe.txt     |  13 +++-
 Documentation/rev-list-options.txt |   5 ++
 builtin/describe.c                 | 125 ++++++++++++++++++++++++++++---------
 list-objects.c                     |  52 +++++++++------
 revision.c                         |   2 +
 revision.h                         |   3 +-
 t/t6100-rev-list-in-order.sh       |  46 ++++++++++++++
 t/t6120-describe.sh                |  17 ++++-
 8 files changed, 213 insertions(+), 50 deletions(-)
 create mode 100755 t/t6100-rev-list-in-order.sh

-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv2 1/7] list-objects.c: factor out traverse_trees_and_blobs
  2017-10-31 21:18   ` [PATCHv2 0/7] git describe blob Stefan Beller
@ 2017-10-31 21:18     ` Stefan Beller
  2017-11-01  3:46       ` Junio C Hamano
  2017-10-31 21:18     ` [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
                       ` (7 subsequent siblings)
  8 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-31 21:18 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me

With traverse_trees_and_blobs factored out of the main traverse function,
the next patch can introduce an in-order revision walking with ease.

In the next patch we'll call `traverse_trees_and_blobs` from within the
loop walking the commits, such that we'll have one invocation of that
function per commit.  That is why we do not want to have memory allocations
in that function, such as we'd have if we were to use a strbuf locally.
Pass a strbuf from traverse_commit_list into the blob and tree traversing
function as a scratch pad that only needs to be allocated once.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 list-objects.c | 50 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index b3931fa434..ef9dbe8f92 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -183,25 +183,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-void traverse_commit_list(struct rev_info *revs,
-			  show_commit_fn show_commit,
-			  show_object_fn show_object,
-			  void *data)
+static void traverse_trees_and_blobs(struct rev_info *revs,
+				     struct strbuf *base_path,
+				     show_object_fn show_object,
+				     void *data)
 {
 	int i;
-	struct commit *commit;
-	struct strbuf base;
 
-	strbuf_init(&base, PATH_MAX);
-	while ((commit = get_revision(revs)) != NULL) {
-		/*
-		 * an uninteresting boundary commit may not have its tree
-		 * parsed yet, but we are not going to show them anyway
-		 */
-		if (commit->tree)
-			add_pending_tree(revs, commit->tree);
-		show_commit(commit, data);
-	}
+	assert(base_path->len == 0);
+
 	for (i = 0; i < revs->pending.nr; i++) {
 		struct object_array_entry *pending = revs->pending.objects + i;
 		struct object *obj = pending->item;
@@ -218,17 +208,39 @@ void traverse_commit_list(struct rev_info *revs,
 			path = "";
 		if (obj->type == OBJ_TREE) {
 			process_tree(revs, (struct tree *)obj, show_object,
-				     &base, path, data);
+				     base_path, path, data);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
 			process_blob(revs, (struct blob *)obj, show_object,
-				     &base, path, data);
+				     base_path, path, data);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
 	object_array_clear(&revs->pending);
-	strbuf_release(&base);
+}
+
+void traverse_commit_list(struct rev_info *revs,
+			  show_commit_fn show_commit,
+			  show_object_fn show_object,
+			  void *data)
+{
+	struct commit *commit;
+	struct strbuf csp; /* callee's scratch pad */
+	strbuf_init(&csp, PATH_MAX);
+
+	while ((commit = get_revision(revs)) != NULL) {
+		/*
+		 * an uninteresting boundary commit may not have its tree
+		 * parsed yet, but we are not going to show them anyway
+		 */
+		if (commit->tree)
+			add_pending_tree(revs, commit->tree);
+		show_commit(commit, data);
+	}
+	traverse_trees_and_blobs(revs, &csp, show_object, data);
+
+	strbuf_release(&csp);
 }
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-10-31 21:18   ` [PATCHv2 0/7] git describe blob Stefan Beller
  2017-10-31 21:18     ` [PATCHv2 1/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
@ 2017-10-31 21:18     ` Stefan Beller
  2017-11-01  3:50       ` Junio C Hamano
  2017-10-31 21:18     ` [PATCHv2 3/7] builtin/describe.c: rename `oid` to avoid variable shadowing Stefan Beller
                       ` (6 subsequent siblings)
  8 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-31 21:18 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me

The functionality to list tree objects in the order they were seen
while traversing the commits will be used in the next commit,
where we teach `git describe` to describe not only commits, but
trees and blobs, too.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/rev-list-options.txt |  5 +++++
 list-objects.c                     |  2 ++
 revision.c                         |  2 ++
 revision.h                         |  3 ++-
 t/t6100-rev-list-in-order.sh       | 46 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100755 t/t6100-rev-list-in-order.sh

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 13501e1556..9066e0c777 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -686,6 +686,11 @@ ifdef::git-rev-list[]
 	all object IDs which I need to download if I have the commit
 	object _bar_ but not _foo_''.
 
+--in-commit-order::
+	Print tree and blob ids in order of the commits. The tree
+	and blob ids are printed after they are first referenced
+	by a commit.
+
 --objects-edge::
 	Similar to `--objects`, but also print the IDs of excluded
 	commits prefixed with a ``-'' character.  This is used by
diff --git a/list-objects.c b/list-objects.c
index ef9dbe8f92..03438e5a8b 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -239,6 +239,8 @@ void traverse_commit_list(struct rev_info *revs,
 		if (commit->tree)
 			add_pending_tree(revs, commit->tree);
 		show_commit(commit, data);
+		if (revs->tree_blobs_in_commit_order)
+			traverse_trees_and_blobs(revs, &csp, show_object, data);
 	}
 	traverse_trees_and_blobs(revs, &csp, show_object, data);
 
diff --git a/revision.c b/revision.c
index d167223e69..9329d4ebbf 100644
--- a/revision.c
+++ b/revision.c
@@ -1845,6 +1845,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->dense = 0;
 	} else if (!strcmp(arg, "--show-all")) {
 		revs->show_all = 1;
+	} else if (!strcmp(arg, "--in-commit-order")) {
+		revs->tree_blobs_in_commit_order = 1;
 	} else if (!strcmp(arg, "--remove-empty")) {
 		revs->remove_empty_trees = 1;
 	} else if (!strcmp(arg, "--merges")) {
diff --git a/revision.h b/revision.h
index 54761200ad..86985d68aa 100644
--- a/revision.h
+++ b/revision.h
@@ -121,7 +121,8 @@ struct rev_info {
 			bisect:1,
 			ancestry_path:1,
 			first_parent_only:1,
-			line_level_traverse:1;
+			line_level_traverse:1,
+			tree_blobs_in_commit_order:1;
 
 	/* Diff flags */
 	unsigned int	diff:1,
diff --git a/t/t6100-rev-list-in-order.sh b/t/t6100-rev-list-in-order.sh
new file mode 100755
index 0000000000..651666979b
--- /dev/null
+++ b/t/t6100-rev-list-in-order.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='miscellaneous rev-list tests'
+
+. ./test-lib.sh
+
+
+test_expect_success 'setup' '
+	for x in one two three four
+	do
+		echo $x >$x &&
+		git add $x &&
+		git commit -m "add file $x"
+	done &&
+	for x in four three
+	do
+		git rm $x &&
+		git commit -m "remove $x"
+	done &&
+	git rev-list --in-commit-order --objects HEAD >actual.raw &&
+	cut -c 1-40 >actual <actual.raw &&
+
+	git cat-file --batch-check="%(objectname)" >expect.raw <<-\EOF &&
+		HEAD^{commit}
+		HEAD^{tree}
+		HEAD^{tree}:one
+		HEAD^{tree}:two
+		HEAD~1^{commit}
+		HEAD~1^{tree}
+		HEAD~1^{tree}:three
+		HEAD~2^{commit}
+		HEAD~2^{tree}
+		HEAD~2^{tree}:four
+		HEAD~3^{commit}
+		# HEAD~3^{tree} skipped
+		HEAD~4^{commit}
+		# HEAD~4^{tree} skipped
+		HEAD~5^{commit}
+		HEAD~5^{tree}
+	EOF
+	grep -v "#" >expect <expect.raw &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv2 3/7] builtin/describe.c: rename `oid` to avoid variable shadowing
  2017-10-31 21:18   ` [PATCHv2 0/7] git describe blob Stefan Beller
  2017-10-31 21:18     ` [PATCHv2 1/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
  2017-10-31 21:18     ` [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
@ 2017-10-31 21:18     ` Stefan Beller
  2017-10-31 21:18     ` [PATCHv2 4/7] builtin/describe.c: print debug statements earlier Stefan Beller
                       ` (5 subsequent siblings)
  8 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-31 21:18 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me

The function `describe` has already a variable named `oid` declared at
the beginning of the function for an object id.  Do not shadow that
variable with a pointer to an object id.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/describe.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 29075dbd0f..fd61f463cf 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -381,9 +381,9 @@ static void describe(const char *arg, int last_one)
 	}
 
 	if (!match_cnt) {
-		struct object_id *oid = &cmit->object.oid;
+		struct object_id *cmit_oid = &cmit->object.oid;
 		if (always) {
-			printf("%s", find_unique_abbrev(oid->hash, abbrev));
+			printf("%s", find_unique_abbrev(cmit_oid->hash, abbrev));
 			if (suffix)
 				printf("%s", suffix);
 			printf("\n");
@@ -392,11 +392,11 @@ static void describe(const char *arg, int last_one)
 		if (unannotated_cnt)
 			die(_("No annotated tags can describe '%s'.\n"
 			    "However, there were unannotated tags: try --tags."),
-			    oid_to_hex(oid));
+			    oid_to_hex(cmit_oid));
 		else
 			die(_("No tags can describe '%s'.\n"
 			    "Try --always, or create some tags."),
-			    oid_to_hex(oid));
+			    oid_to_hex(cmit_oid));
 	}
 
 	QSORT(all_matches, match_cnt, compare_pt);
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv2 4/7] builtin/describe.c: print debug statements earlier
  2017-10-31 21:18   ` [PATCHv2 0/7] git describe blob Stefan Beller
                       ` (2 preceding siblings ...)
  2017-10-31 21:18     ` [PATCHv2 3/7] builtin/describe.c: rename `oid` to avoid variable shadowing Stefan Beller
@ 2017-10-31 21:18     ` Stefan Beller
  2017-10-31 21:31       ` Eric Sunshine
  2017-10-31 21:18     ` [PATCHv2 5/7] builtin/describe.c: factor out describe_commit Stefan Beller
                       ` (4 subsequent siblings)
  8 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-10-31 21:18 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me

For debuggers aid we'd want to print debug statements early, so
introduce a new line in the debug output that describes the whole
function, and the change the next debug output to describe why we need
to search. Conveniently drop the arg from the second line; which will
be useful in a follow up commit, that refactors the describe function.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/describe.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index fd61f463cf..3136efde31 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -293,6 +293,9 @@ static void describe(const char *arg, int last_one)
 	unsigned long seen_commits = 0;
 	unsigned int unannotated_cnt = 0;
 
+	if (debug)
+		fprintf(stderr, _("describe %s\n"), arg);
+
 	if (get_oid(arg, &oid))
 		die(_("Not a valid object name %s"), arg);
 	cmit = lookup_commit_reference(&oid);
@@ -316,7 +319,7 @@ static void describe(const char *arg, int last_one)
 	if (!max_candidates)
 		die(_("no tag exactly matches '%s'"), oid_to_hex(&cmit->object.oid));
 	if (debug)
-		fprintf(stderr, _("searching to describe %s\n"), arg);
+		fprintf(stderr, _("No exact match on refs or tags, searching to describe\n"));
 
 	if (!have_util) {
 		struct hashmap_iter iter;
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv2 5/7] builtin/describe.c: factor out describe_commit
  2017-10-31 21:18   ` [PATCHv2 0/7] git describe blob Stefan Beller
                       ` (3 preceding siblings ...)
  2017-10-31 21:18     ` [PATCHv2 4/7] builtin/describe.c: print debug statements earlier Stefan Beller
@ 2017-10-31 21:18     ` Stefan Beller
  2017-10-31 21:18     ` [PATCHv2 6/7] builtin/describe.c: describe a blob Stefan Beller
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-31 21:18 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me

In the next patch we'll learn how to describe more than just commits,
so factor out describing commits into its own function.  That will make
the next patches easy as we still need to describe a commit as part of
describing blobs.

While factoring out the functionality to describe_commit, make sure
that any output to stdout is put into a strbuf, which we can print
afterwards, using puts which also adds the line ending.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/describe.c | 63 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 3136efde31..9e9a5ed5d4 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -256,7 +256,7 @@ static unsigned long finish_depth_computation(
 	return seen_commits;
 }
 
-static void display_name(struct commit_name *n)
+static void append_name(struct commit_name *n, struct strbuf *dst)
 {
 	if (n->prio == 2 && !n->tag) {
 		n->tag = lookup_tag(&n->oid);
@@ -272,19 +272,18 @@ static void display_name(struct commit_name *n)
 	}
 
 	if (n->tag)
-		printf("%s", n->tag->tag);
+		strbuf_addstr(dst, n->tag->tag);
 	else
-		printf("%s", n->path);
+		strbuf_addstr(dst, n->path);
 }
 
-static void show_suffix(int depth, const struct object_id *oid)
+static void append_suffix(int depth, const struct object_id *oid, struct strbuf *dst)
 {
-	printf("-%d-g%s", depth, find_unique_abbrev(oid->hash, abbrev));
+	strbuf_addf(dst, "-%d-g%s", depth, find_unique_abbrev(oid->hash, abbrev));
 }
 
-static void describe(const char *arg, int last_one)
+static void describe_commit(struct object_id *oid, struct strbuf *dst)
 {
-	struct object_id oid;
 	struct commit *cmit, *gave_up_on = NULL;
 	struct commit_list *list;
 	struct commit_name *n;
@@ -293,26 +292,18 @@ static void describe(const char *arg, int last_one)
 	unsigned long seen_commits = 0;
 	unsigned int unannotated_cnt = 0;
 
-	if (debug)
-		fprintf(stderr, _("describe %s\n"), arg);
-
-	if (get_oid(arg, &oid))
-		die(_("Not a valid object name %s"), arg);
-	cmit = lookup_commit_reference(&oid);
-	if (!cmit)
-		die(_("%s is not a valid '%s' object"), arg, commit_type);
+	cmit = lookup_commit_reference(oid);
 
 	n = find_commit_name(&cmit->object.oid);
 	if (n && (tags || all || n->prio == 2)) {
 		/*
 		 * Exact match to an existing ref.
 		 */
-		display_name(n);
+		append_name(n, dst);
 		if (longformat)
-			show_suffix(0, n->tag ? &n->tag->tagged->oid : &oid);
+			append_suffix(0, n->tag ? &n->tag->tagged->oid : oid, dst);
 		if (suffix)
-			printf("%s", suffix);
-		printf("\n");
+			strbuf_addstr(dst, suffix);
 		return;
 	}
 
@@ -386,10 +377,9 @@ static void describe(const char *arg, int last_one)
 	if (!match_cnt) {
 		struct object_id *cmit_oid = &cmit->object.oid;
 		if (always) {
-			printf("%s", find_unique_abbrev(cmit_oid->hash, abbrev));
+			strbuf_addstr(dst, find_unique_abbrev(cmit_oid->hash, abbrev));
 			if (suffix)
-				printf("%s", suffix);
-			printf("\n");
+				strbuf_addstr(dst, suffix);
 			return;
 		}
 		if (unannotated_cnt)
@@ -437,15 +427,36 @@ static void describe(const char *arg, int last_one)
 		}
 	}
 
-	display_name(all_matches[0].name);
+	append_name(all_matches[0].name, dst);
 	if (abbrev)
-		show_suffix(all_matches[0].depth, &cmit->object.oid);
+		append_suffix(all_matches[0].depth, &cmit->object.oid, dst);
 	if (suffix)
-		printf("%s", suffix);
-	printf("\n");
+		strbuf_addstr(dst, suffix);
+}
+
+static void describe(const char *arg, int last_one)
+{
+	struct object_id oid;
+	struct commit *cmit;
+	struct strbuf sb = STRBUF_INIT;
+
+	if (debug)
+		fprintf(stderr, _("describe %s\n"), arg);
+
+	if (get_oid(arg, &oid))
+		die(_("Not a valid object name %s"), arg);
+	cmit = lookup_commit_reference(&oid);
+	if (!cmit)
+		die(_("%s is not a valid '%s' object"), arg, commit_type);
+
+	describe_commit(&oid, &sb);
+
+	puts(sb.buf);
 
 	if (!last_one)
 		clear_commit_marks(cmit, -1);
+
+	strbuf_release(&sb);
 }
 
 int cmd_describe(int argc, const char **argv, const char *prefix)
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-10-31 21:18   ` [PATCHv2 0/7] git describe blob Stefan Beller
                       ` (4 preceding siblings ...)
  2017-10-31 21:18     ` [PATCHv2 5/7] builtin/describe.c: factor out describe_commit Stefan Beller
@ 2017-10-31 21:18     ` Stefan Beller
  2017-10-31 21:49       ` Eric Sunshine
  2017-11-01  4:11       ` Junio C Hamano
  2017-10-31 21:18     ` [PATCHv2 7/7] t6120: fix typo in test name Stefan Beller
                       ` (2 subsequent siblings)
  8 siblings, 2 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-31 21:18 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me

Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

"This is an interesting endeavor, because describing things is hard."
  -- me, upon writing this patch.

When describing commits, we try to anchor them to tags or refs, as these
are conceptually on a higher level than the commit. And if there is no ref
or tag that matches exactly, we're out of luck.  So we employ a heuristic
to make up a name for the commit. These names are ambivalent, there might
be different tags or refs to anchor to, and there might be different
path in the DAG to travel to arrive at the commit precisely.

When describing a blob, we want to describe the blob from a higher layer
as well, which is a tuple of (commit, deep/path) as the tree objects
involved are rather uninteresting.  The same blob can be referenced by
multiple commits, so how we decide which commit to use?  This patch
implements a rather naive approach on this: As there are no back pointers
from blobs to commits in which the blob occurs, we'll start walking from
any tips available, listing the blobs in-order of the commit and once we
found the blob, we'll take the first commit that listed the blob.  For
source code this is likely not the first commit that introduced the blob,
but rather the latest commit that contained the blob.  For example:

  git describe v0.99:Makefile
  v0.99-5-gab6625e06a:Makefile

tells us the latest commit that contained the Makefile as it was in tag
v0.99 is commit v0.99-5-gab6625e06a (and at the same path), as the next
commit on top v0.99-6-gb1de9de2b9 ([PATCH] Bootstrap "make dist",
2005-07-11) touches the Makefile.

Let's see how this description turns out, if it is useful in day-to-day
use as I have the intuition that we'd rather want to see the *first*
commit that this blob was introduced to the repository (which can be
achieved easily by giving the `--reverse` flag in the describe_blob rev
walk).

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-describe.txt | 13 ++++++++-
 builtin/describe.c             | 65 ++++++++++++++++++++++++++++++++++++++----
 t/t6120-describe.sh            | 15 ++++++++++
 3 files changed, 87 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index c924c945ba..077c3c2193 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -3,7 +3,7 @@ git-describe(1)
 
 NAME
 ----
-git-describe - Describe a commit using the most recent tag reachable from it
+git-describe - Describe a commit or blob using the graph relations
 
 
 SYNOPSIS
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] [<commit-ish>...]
 'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] --dirty[=<mark>]
+'git describe' [<options>] <blob-ish>
 
 DESCRIPTION
 -----------
@@ -24,6 +25,16 @@ By default (without --all or --tags) `git describe` only shows
 annotated tags.  For more information about creating annotated tags
 see the -a and -s options to linkgit:git-tag[1].
 
+If the given object refers to a blob, it will be described
+as `<commit-ish>:<path>`, such that the blob can be found
+at `<path>` in the `<commit-ish>`. Note, that the commit is likely
+not the commit that introduced the blob, but the one that was found
+first; to find the commit that introduced the blob, you need to find
+the commit that last touched the path, e.g.
+`git log <commit-description> -- <path>`.
+As blobs do not point at the commits they are contained in,
+describing blobs is slow as we have to walk the whole graph.
+
 OPTIONS
 -------
 <commit-ish>...::
diff --git a/builtin/describe.c b/builtin/describe.c
index 9e9a5ed5d4..382cbae908 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -3,6 +3,7 @@
 #include "lockfile.h"
 #include "commit.h"
 #include "tag.h"
+#include "blob.h"
 #include "refs.h"
 #include "builtin.h"
 #include "exec_cmd.h"
@@ -11,8 +12,9 @@
 #include "hashmap.h"
 #include "argv-array.h"
 #include "run-command.h"
+#include "revision.h"
+#include "list-objects.h"
 
-#define SEEN		(1u << 0)
 #define MAX_TAGS	(FLAG_BITS - 1)
 
 static const char * const describe_usage[] = {
@@ -434,6 +436,54 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 		strbuf_addstr(dst, suffix);
 }
 
+struct process_commit_data {
+	struct object_id current_commit;
+	struct object_id looking_for;
+	struct strbuf *dst;
+};
+
+static void process_commit(struct commit *commit, void *data)
+{
+	struct process_commit_data *pcd = data;
+	pcd->current_commit = commit->object.oid;
+}
+
+static void process_object(struct object *obj, const char *path, void *data)
+{
+	struct process_commit_data *pcd = data;
+
+	if (!oidcmp(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
+		reset_revision_walk();
+		describe_commit(&pcd->current_commit, pcd->dst);
+		strbuf_addf(pcd->dst, ":%s", path);
+	}
+}
+
+static void describe_blob(struct object_id oid, struct strbuf *dst)
+{
+	struct rev_info revs;
+	struct argv_array args = ARGV_ARRAY_INIT;
+	struct process_commit_data pcd = { null_oid, oid, dst};
+
+	argv_array_pushl(&args, "internal: The first arg is not parsed",
+		"--all", "--reflog", /* as many starting points as possible */
+		/* NEEDSWORK: --all is incompatible with worktrees for now: */
+		"--single-worktree",
+		"--objects",
+		"--in-commit-order",
+		NULL);
+
+	init_revisions(&revs, NULL);
+	if (setup_revisions(args.argc, args.argv, &revs, NULL) > 1)
+		BUG("setup_revisions could not handle all args?");
+
+	if (prepare_revision_walk(&revs))
+		die("revision walk setup failed");
+
+	traverse_commit_list(&revs, process_commit, process_object, &pcd);
+	reset_revision_walk();
+}
+
 static void describe(const char *arg, int last_one)
 {
 	struct object_id oid;
@@ -445,11 +495,16 @@ static void describe(const char *arg, int last_one)
 
 	if (get_oid(arg, &oid))
 		die(_("Not a valid object name %s"), arg);
-	cmit = lookup_commit_reference(&oid);
-	if (!cmit)
-		die(_("%s is not a valid '%s' object"), arg, commit_type);
+	cmit = lookup_commit_reference_gently(&oid, 1);
 
-	describe_commit(&oid, &sb);
+	if (cmit) {
+		describe_commit(&oid, &sb);
+	} else {
+		if (lookup_blob(&oid))
+			describe_blob(oid, &sb);
+		else
+			die(_("%s is neither a commit nor blob"), arg);
+	}
 
 	puts(sb.buf);
 
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 1c0e8659d9..3be01316e8 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -310,6 +310,21 @@ test_expect_success 'describe ignoring a borken submodule' '
 	grep broken out
 '
 
+test_expect_success 'describe a blob at a tag' '
+	echo "make it a unique blob" >file &&
+	git add file && git commit -m "content in file" &&
+	git tag -a -m "latest annotated tag" unique-file &&
+	git describe HEAD:file >actual &&
+	echo "unique-file:file" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'describe a surviving blob' '
+	git commit --allow-empty -m "empty commit" &&
+	git describe HEAD:file >actual &&
+	grep unique-file-1-g actual
+'
+
 test_expect_failure ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
 	i=1 &&
 	while test $i -lt 8000
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv2 7/7] t6120: fix typo in test name
  2017-10-31 21:18   ` [PATCHv2 0/7] git describe blob Stefan Beller
                       ` (5 preceding siblings ...)
  2017-10-31 21:18     ` [PATCHv2 6/7] builtin/describe.c: describe a blob Stefan Beller
@ 2017-10-31 21:18     ` Stefan Beller
  2017-11-01  5:14     ` [PATCHv2 0/7] git describe blob Junio C Hamano
  2017-11-02 19:41     ` [PATCHv3 " Stefan Beller
  8 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-10-31 21:18 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 t/t6120-describe.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 3be01316e8..fd329f173a 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -304,7 +304,7 @@ test_expect_success 'describe chokes on severely broken submodules' '
 	mv .git/modules/sub1/ .git/modules/sub_moved &&
 	test_must_fail git describe --dirty
 '
-test_expect_success 'describe ignoring a borken submodule' '
+test_expect_success 'describe ignoring a broken submodule' '
 	git describe --broken >out &&
 	test_when_finished "mv .git/modules/sub_moved .git/modules/sub1" &&
 	grep broken out
-- 
2.15.0.rc2.443.gfcc3b81c0a


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 4/7] builtin/describe.c: print debug statements earlier
  2017-10-31 21:18     ` [PATCHv2 4/7] builtin/describe.c: print debug statements earlier Stefan Beller
@ 2017-10-31 21:31       ` Eric Sunshine
  0 siblings, 0 replies; 110+ messages in thread
From: Eric Sunshine @ 2017-10-31 21:31 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes Schindelin, Git List, Jacob Keller, Kevin Daudt

On Tue, Oct 31, 2017 at 5:18 PM, Stefan Beller <sbeller@google.com> wrote:
> For debuggers aid we'd want to print debug statements early, so
> introduce a new line in the debug output that describes the whole
> function, and the change the next debug output to describe why we need

s/and the/and/
...or...
s/and the/and then/

> to search. Conveniently drop the arg from the second line; which will
> be useful in a follow up commit, that refactors the describe function.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-10-31 21:18     ` [PATCHv2 6/7] builtin/describe.c: describe a blob Stefan Beller
@ 2017-10-31 21:49       ` Eric Sunshine
  2017-11-01 19:51         ` Stefan Beller
  2017-11-01  4:11       ` Junio C Hamano
  1 sibling, 1 reply; 110+ messages in thread
From: Eric Sunshine @ 2017-10-31 21:49 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes Schindelin, Git List, Jacob Keller, Kevin Daudt

On Tue, Oct 31, 2017 at 5:18 PM, Stefan Beller <sbeller@google.com> wrote:
> When describing commits, we try to anchor them to tags or refs, as these
> are conceptually on a higher level than the commit. And if there is no ref
> or tag that matches exactly, we're out of luck.  So we employ a heuristic
> to make up a name for the commit. These names are ambivalent, there might

I guess you meant s/ambivalent/ambiguous/ ?

> be different tags or refs to anchor to, and there might be different
> path in the DAG to travel to arrive at the commit precisely.
>
> [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
> Signed-off-by: Stefan Beller <sbeller@google.com>

Blank line before sign-off.

> ---
> diff --git a/builtin/describe.c b/builtin/describe.c
> @@ -445,11 +495,16 @@ static void describe(const char *arg, int last_one)
>
>         if (get_oid(arg, &oid))
>                 die(_("Not a valid object name %s"), arg);
> -       cmit = lookup_commit_reference(&oid);
> -       if (!cmit)
> -               die(_("%s is not a valid '%s' object"), arg, commit_type);
> +       cmit = lookup_commit_reference_gently(&oid, 1);
>
> -       describe_commit(&oid, &sb);
> +       if (cmit) {
> +               describe_commit(&oid, &sb);
> +       } else {
> +               if (lookup_blob(&oid))
> +                       describe_blob(oid, &sb);
> +               else
> +                       die(_("%s is neither a commit nor blob"), arg);
> +       }

Not at all worth a re-roll, but less nesting and a bit less noisy:

    if (cmt)
        describe_commit(...);
    else if (lookup_blob(...))
        describe_blob(...);
    else
        die(...);

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 7/7] t6120: fix typo in test name
  2017-10-31  0:33   ` [PATCH 7/7] t6120: fix typo in test name Stefan Beller
@ 2017-11-01  1:21     ` Junio C Hamano
  2017-11-01 18:13       ` Stefan Beller
  2017-11-02  1:36       ` Junio C Hamano
  0 siblings, 2 replies; 110+ messages in thread
From: Junio C Hamano @ 2017-11-01  1:21 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, me, jacob.keller, Johannes.Schindelin

Stefan Beller <sbeller@google.com> writes:

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  t/t6120-describe.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Good.  I am guessing that you are sending this as the last/optional
one because this was found _after_ you worked on other parts of the
series, but I think it is easier to reason about if this were marked
as a preliminary clean-up and moved to the front of the series.

Thanks.

>
> diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
> index 3be01316e8..fd329f173a 100755
> --- a/t/t6120-describe.sh
> +++ b/t/t6120-describe.sh
> @@ -304,7 +304,7 @@ test_expect_success 'describe chokes on severely broken submodules' '
>  	mv .git/modules/sub1/ .git/modules/sub_moved &&
>  	test_must_fail git describe --dirty
>  '
> -test_expect_success 'describe ignoring a borken submodule' '
> +test_expect_success 'describe ignoring a broken submodule' '
>  	git describe --broken >out &&
>  	test_when_finished "mv .git/modules/sub_moved .git/modules/sub1" &&
>  	grep broken out

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-10-31 19:16       ` Stefan Beller
@ 2017-11-01  3:34         ` Junio C Hamano
  2017-11-01 20:58           ` Stefan Beller
  2017-11-01  3:44         ` Junio C Hamano
  1 sibling, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2017-11-01  3:34 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Kevin Daudt, Jacob Keller, Johannes Schindelin

Stefan Beller <sbeller@google.com> writes:

>>  Given the difficulty in
>> coming up with the single-liner description of what it does we saw
>> above, I suspect that splitting SYNOPSIS out into two very distinct
>> operating mode might make it easier to read.
>>
>>     SYNOPSIS
>>     --------
>>     [verse]
>>     'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] [<commit-ish>...]
>>    +'git describe' [<options>...] <blob>...
>>
>> Then this additional paragraph can say "When describin a <blob>",
>> without using a (technically nonsense) phrase "if <commit-ish>
>> refers to a blob", which is never true.
>
> ok, do we know about 'blob-ish' as a term?

No, and I do not think there is any need to say -ish at all for this
use case.

After all, when we accept a <commit> when a <tree-ish> is called
for, that is only because there is only one way to use the commit in
place of the wanted <tree>; we take the top-level tree contained in
it.  You cannot say you take <blob-ish> and take a <tree>, as it is
unclear which entry in the <tree> can act as the substitute for the
wanted <blob>.

You accept blob object name in this mode, so just saying <blob> is
sufficient.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-10-31 19:16       ` Stefan Beller
  2017-11-01  3:34         ` Junio C Hamano
@ 2017-11-01  3:44         ` Junio C Hamano
  1 sibling, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2017-11-01  3:44 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Kevin Daudt, Jacob Keller, Johannes Schindelin

Stefan Beller <sbeller@google.com> writes:

> Maybe
>
>   "git-describe - Describe a blob or commit using graph relations"
>
> though that sounds too generic, but it is accurate as all we do is
> a heuristic for graph walk ways.

We used to describe commit using commit ancestry (i.e. finding the
place where a "wanted" commit sits in the commit ancestry graph).
Now we are extending it to describe an object by finding the place
where it sits in the graph that shows object reachability relation,
but saying "graph" without saying "graph over what" is probably not
descriptive enough for most readers.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 1/7] list-objects.c: factor out traverse_trees_and_blobs
  2017-10-31 21:18     ` [PATCHv2 1/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
@ 2017-11-01  3:46       ` Junio C Hamano
  0 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2017-11-01  3:46 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes.Schindelin, git, jacob.keller, me

Stefan Beller <sbeller@google.com> writes:

> With traverse_trees_and_blobs factored out of the main traverse function,
> the next patch can introduce an in-order revision walking with ease.
>
> In the next patch we'll call `traverse_trees_and_blobs` from within the
> loop walking the commits, such that we'll have one invocation of that
> function per commit.  That is why we do not want to have memory allocations
> in that function, such as we'd have if we were to use a strbuf locally.
> Pass a strbuf from traverse_commit_list into the blob and tree traversing
> function as a scratch pad that only needs to be allocated once.

Makes sense.

I still don't think base_path is any clearer than base that was used
in the original, though.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-10-31 21:18     ` [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
@ 2017-11-01  3:50       ` Junio C Hamano
  2017-11-01 12:26         ` Johannes Schindelin
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2017-11-01  3:50 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes.Schindelin, git, jacob.keller, me

Stefan Beller <sbeller@google.com> writes:

> diff --git a/t/t6100-rev-list-in-order.sh b/t/t6100-rev-list-in-order.sh
> new file mode 100755
> index 0000000000..651666979b
> --- /dev/null
> +++ b/t/t6100-rev-list-in-order.sh
> @@ -0,0 +1,46 @@
> +#!/bin/sh
> +
> +test_description='miscellaneous rev-list tests'
> +
> +. ./test-lib.sh
> +
> +
> +test_expect_success 'setup' '
> +	for x in one two three four
> +	do
> +		echo $x >$x &&
> +		git add $x &&
> +		git commit -m "add file $x"
> +	done &&
> +	for x in four three
> +	do
> +		git rm $x &&
> +		git commit -m "remove $x"
> +	done &&

When "git commit -m 'remove four'" fails, this loop would still
continue, so the &&-chain in "done &&" would still be rendered
ineffetive.

> +	git rev-list --in-commit-order --objects HEAD >actual.raw &&
> +	cut -c 1-40 >actual <actual.raw &&
> +
> +	git cat-file --batch-check="%(objectname)" >expect.raw <<-\EOF &&
> +		HEAD^{commit}
> +		HEAD^{tree}
> +		HEAD^{tree}:one
> +		HEAD^{tree}:two
> +		HEAD~1^{commit}
> +		HEAD~1^{tree}
> +		HEAD~1^{tree}:three
> +		HEAD~2^{commit}
> +		HEAD~2^{tree}
> +		HEAD~2^{tree}:four
> +		HEAD~3^{commit}
> +		# HEAD~3^{tree} skipped
> +		HEAD~4^{commit}
> +		# HEAD~4^{tree} skipped
> +		HEAD~5^{commit}
> +		HEAD~5^{tree}
> +	EOF
> +	grep -v "#" >expect <expect.raw &&

Hmm, that is unfortunate that we still have to filter them out, but
these have documentation values, so perhaps this is as good as it
would get.

These "skipped" are shorthand for "not shown here, because it was
already shown as X"; it would be good to come up with a phrasing to
force us show what X is for each of them.  

"already shown as X", perhaps?

> +
> +	test_cmp expect actual
> +'
> +
> +test_done

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-10-31 21:18     ` [PATCHv2 6/7] builtin/describe.c: describe a blob Stefan Beller
  2017-10-31 21:49       ` Eric Sunshine
@ 2017-11-01  4:11       ` Junio C Hamano
  2017-11-01 12:32         ` Johannes Schindelin
  2017-11-01 21:28         ` Stefan Beller
  1 sibling, 2 replies; 110+ messages in thread
From: Junio C Hamano @ 2017-11-01  4:11 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes.Schindelin, git, jacob.keller, me

Stefan Beller <sbeller@google.com> writes:

> +If the given object refers to a blob, it will be described
> +as `<commit-ish>:<path>`, such that the blob can be found
> +at `<path>` in the `<commit-ish>`. Note, that the commit is likely

Does the code describe a9dbc3f12c as v2.15.0:GIT-VERSION-GEN, or
would it always be <commit>:<path>?

> +not the commit that introduced the blob, but the one that was found
> +first; to find the commit that introduced the blob, you need to find

Because we do not want to descend into the same tree object we saw
earlier, this "we show the name we happened to find first without
attempting to refine it for a better name" is a rather fundamental
limitation of this approach.  Hopefully we can later improve it with
more thought, but for now it is better than nothing (and much better
than "git log --raw | grep").

> +the commit that last touched the path, e.g.
> +`git log <commit-description> -- <path>`.
> +As blobs do not point at the commits they are contained in,
> +describing blobs is slow as we have to walk the whole graph.

Is it true that we walk the "whole graph", or do we stop immediately
we find a path to the blob?  The former makes it sound like
completely useless.

> -#define SEEN		(1u << 0)

Interesting.  Now we include revision.h this becomes redundant.

> +static void describe_blob(struct object_id oid, struct strbuf *dst)
> +{
> +	struct rev_info revs;
> +	struct argv_array args = ARGV_ARRAY_INIT;
> +	struct process_commit_data pcd = { null_oid, oid, dst};
> +
> +	argv_array_pushl(&args, "internal: The first arg is not parsed",
> +		"--all", "--reflog", /* as many starting points as possible */

Interesting.  

Do we also search in the reflog in the normal "describe" operation?
If not, perhaps we should?  I wonder what's the performance
implications would be.

That tangent aside, as "describe blob" is most likely a "what
reaches and is holding onto this blob?" query, finding something
that can only be reached from a reflog entry would make it more
helpful than without the option.

> +		/* NEEDSWORK: --all is incompatible with worktrees for now: */

What's that last colon about?

> +		"--single-worktree",
> +		"--objects",
> +		"--in-commit-order",
> +		NULL);
> +
> +	init_revisions(&revs, NULL);
> +	if (setup_revisions(args.argc, args.argv, &revs, NULL) > 1)
> +		BUG("setup_revisions could not handle all args?");
> +
> +	if (prepare_revision_walk(&revs))
> +		die("revision walk setup failed");
> +
> +	traverse_commit_list(&revs, process_commit, process_object, &pcd);
> +	reset_revision_walk();
> +}
> +

OK.

> diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
> index 1c0e8659d9..3be01316e8 100755
> --- a/t/t6120-describe.sh
> +++ b/t/t6120-describe.sh
> @@ -310,6 +310,21 @@ test_expect_success 'describe ignoring a borken submodule' '
>  	grep broken out
>  '
>  
> +test_expect_success 'describe a blob at a tag' '
> +	echo "make it a unique blob" >file &&
> +	git add file && git commit -m "content in file" &&
> +	git tag -a -m "latest annotated tag" unique-file &&
> +	git describe HEAD:file >actual &&
> +	echo "unique-file:file" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'describe a surviving blob' '
> +	git commit --allow-empty -m "empty commit" &&
> +	git describe HEAD:file >actual &&
> +	grep unique-file-1-g actual
> +'
> +

I am not sure what "surviving" means in this context.  The word
hinted that the test would be finding a blob that only appears in a
commit that only appears as a reflog entry, but that wasn't the
case, which was a bit disappointing.




^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 0/7] git describe blob
  2017-10-31 21:18   ` [PATCHv2 0/7] git describe blob Stefan Beller
                       ` (6 preceding siblings ...)
  2017-10-31 21:18     ` [PATCHv2 7/7] t6120: fix typo in test name Stefan Beller
@ 2017-11-01  5:14     ` Junio C Hamano
  2017-11-02 19:41     ` [PATCHv3 " Stefan Beller
  8 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2017-11-01  5:14 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes.Schindelin, git, jacob.keller, me

Stefan Beller <sbeller@google.com> writes:

> Occasionally a user is given an object hash from a blob as an error message
> or other output (e.g. [1]).
>
> It would be useful to get a further description of such a blob, such as
> the (commit, path) tuple where this blob was introduced.
>
> This implements the answer in builtin/describe,
> however the heuristics are weak. See patch 6 for details.

Overall this was a relatively pleasant read.  Will queue.

Thanks.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-01  3:50       ` Junio C Hamano
@ 2017-11-01 12:26         ` Johannes Schindelin
  2017-11-01 12:37           ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Johannes Schindelin @ 2017-11-01 12:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git, jacob.keller, me

Hi,

On Wed, 1 Nov 2017, Junio C Hamano wrote:

> Stefan Beller <sbeller@google.com> writes:
> 
> > diff --git a/t/t6100-rev-list-in-order.sh b/t/t6100-rev-list-in-order.sh
> > new file mode 100755
> > index 0000000000..651666979b
> > --- /dev/null
> > +++ b/t/t6100-rev-list-in-order.sh
> > @@ -0,0 +1,46 @@
> > +#!/bin/sh
> > +
> > +test_description='miscellaneous rev-list tests'
> > +
> > +. ./test-lib.sh
> > +
> > +
> > +test_expect_success 'setup' '
> > +	for x in one two three four
> > +	do
> > +		echo $x >$x &&
> > +		git add $x &&
> > +		git commit -m "add file $x"
> > +	done &&
> > +	for x in four three
> > +	do
> > +		git rm $x &&
> > +		git commit -m "remove $x"
> > +	done &&
> 
> When "git commit -m 'remove four'" fails, this loop would still
> continue, so the &&-chain in "done &&" would still be rendered
> ineffetive.

... so the proper fix is to append an "|| break", as in:

	...
	for x in four three
	do
		git rm $x &&
		git commit -m "remote $x" ||
		break
	done &&
	...

But that is still incorrect, as the "break" statement will succeed,
breaking (pun intended) the && chain in a different way. The canonical way
to do it is to wrap the loop in a subshell, *and also* test $? (because
`(exit 1) && echo something` still prints `something`):

	...
	(
		for x in four three
		do
			git rm $x &&
			git commit -m "remote $x" ||
			exit
		done
	) &&
	test 0 -eq $? &&
	...

Ugly? Yes. Correct? Also yes.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-01  4:11       ` Junio C Hamano
@ 2017-11-01 12:32         ` Johannes Schindelin
  2017-11-01 17:59           ` Stefan Beller
  2017-11-01 21:28         ` Stefan Beller
  1 sibling, 1 reply; 110+ messages in thread
From: Johannes Schindelin @ 2017-11-01 12:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git, jacob.keller, me

Hi Junio,

On Wed, 1 Nov 2017, Junio C Hamano wrote:

> Stefan Beller <sbeller@google.com> writes:
> 
> > +If the given object refers to a blob, it will be described
> > +as `<commit-ish>:<path>`, such that the blob can be found
> > +at `<path>` in the `<commit-ish>`. Note, that the commit is likely
> 
> Does the code describe a9dbc3f12c as v2.15.0:GIT-VERSION-GEN, or
> would it always be <commit>:<path>?

As the blob is described using this function:

static void process_object(struct object *obj, const char *path, void *data)
{
       struct process_commit_data *pcd = data;

       if (!oidcmp(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
               reset_revision_walk();
               describe_commit(&pcd->current_commit, pcd->dst);
               strbuf_addf(pcd->dst, ":%s", path);
       }
}

i.e. as `describe_commit()` is used on the commit part, the answer to your
question is: the former. I guess that is why Stefan wrote `commit-ish`
instead of `commit` ;-)

> > +not the commit that introduced the blob, but the one that was found
> > +first; to find the commit that introduced the blob, you need to find
> 
> Because we do not want to descend into the same tree object we saw
> earlier, this "we show the name we happened to find first without
> attempting to refine it for a better name" is a rather fundamental
> limitation of this approach.  Hopefully we can later improve it with
> more thought, but for now it is better than nothing (and much better
> than "git log --raw | grep").

Indeed.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-01 12:26         ` Johannes Schindelin
@ 2017-11-01 12:37           ` Junio C Hamano
  2017-11-01 19:37             ` Stefan Beller
  2017-11-01 21:36             ` Johannes Schindelin
  0 siblings, 2 replies; 110+ messages in thread
From: Junio C Hamano @ 2017-11-01 12:37 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Stefan Beller, Git Mailing List, Jacob Keller, Kevin D

On Wed, Nov 1, 2017 at 9:26 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
>         ...
>         (
>                 for x in four three
>                 do
>                         git rm $x &&
>                         git commit -m "remote $x" ||
>                         exit
>                 done
>         ) &&
>         test 0 -eq $? &&
>         ...
>
> Ugly? Yes. Correct? Also yes.

I think returning non-zero with "return" is how other tests avoid an
extra level of subshell.
Ugly? Yes. Correct? Questionable but it seems to work for those who
wrote them ;-)

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-01 12:32         ` Johannes Schindelin
@ 2017-11-01 17:59           ` Stefan Beller
  2017-11-01 21:05             ` Jacob Keller
  0 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-01 17:59 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git, Jacob Keller, Kevin Daudt

>> Does the code describe a9dbc3f12c as v2.15.0:GIT-VERSION-GEN, or
>> would it always be <commit>:<path>?
>
> As the blob is described using this function:
>
> static void process_object(struct object *obj, const char *path, void *data)
> {
>        struct process_commit_data *pcd = data;
>
>        if (!oidcmp(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
>                reset_revision_walk();
>                describe_commit(&pcd->current_commit, pcd->dst);
>                strbuf_addf(pcd->dst, ":%s", path);
>        }
> }
>
> i.e. as `describe_commit()` is used on the commit part, the answer to your
> question is: the former. I guess that is why Stefan wrote `commit-ish`
> instead of `commit` ;-)

$ ./git describe a9dbc3f12c
warning: reflog of 'HEAD' references pruned commits
v2.15.0-7-g980e40477f:GIT-VERSION-GEN

So as noted below, this output is less than ideal, but technically correct as
v2.15.0-7-g980e40477f contains that blob as well (you don't have these;
it is this very series consisting of 7 patches on top of 2.15, none of them
touching GIT-VERSION-GEN, hence that blob stays intact.)

The way Junio asked, we actually may prefer the commit-ish to give
that commit that introduced the blob for the first time, i.e. add the
--reverse to the graph walking.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 7/7] t6120: fix typo in test name
  2017-11-01  1:21     ` Junio C Hamano
@ 2017-11-01 18:13       ` Stefan Beller
  2017-11-02  1:36       ` Junio C Hamano
  1 sibling, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-01 18:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Kevin Daudt, Jacob Keller, Johannes Schindelin

On Tue, Oct 31, 2017 at 6:21 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  t/t6120-describe.sh | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Good.  I am guessing that you are sending this as the last/optional
> one because this was found _after_ you worked on other parts of the
> series, but I think it is easier to reason about if this were marked
> as a preliminary clean-up and moved to the front of the series.
>
> Thanks.

It can be independent even. I did not bother to put it on its own flight,
but I can shift it to the front of this series.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-01 12:37           ` Junio C Hamano
@ 2017-11-01 19:37             ` Stefan Beller
  2017-11-01 22:08               ` Johannes Schindelin
  2017-11-01 21:36             ` Johannes Schindelin
  1 sibling, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-01 19:37 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Git Mailing List, Jacob Keller, Kevin D

On Wed, Nov 1, 2017 at 5:37 AM, Junio C Hamano <gitster@pobox.com> wrote:
> On Wed, Nov 1, 2017 at 9:26 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>>
>>         ...
>>         (
>>                 for x in four three
>>                 do
>>                         git rm $x &&
>>                         git commit -m "remote $x" ||
>>                         exit
>>                 done
>>         ) &&
>>         test 0 -eq $? &&
>>         ...
>>
>> Ugly? Yes. Correct? Also yes.
>
> I think returning non-zero with "return" is how other tests avoid an
> extra level of subshell.
> Ugly? Yes. Correct? Questionable but it seems to work for those who
> wrote them ;-)

I think I'll go without the extra subshell and with s/return 1/false/ as
the exact value doesn't matter.

Thanks for hinting at correct implementations

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-10-31 21:49       ` Eric Sunshine
@ 2017-11-01 19:51         ` Stefan Beller
  0 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-01 19:51 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Johannes Schindelin, Git List, Jacob Keller, Kevin Daudt

On Tue, Oct 31, 2017 at 2:49 PM, Eric Sunshine <sunshine@sunshineco.com> wrote:

>> to make up a name for the commit. These names are ambivalent, there might
>
> I guess you meant s/ambivalent/ambiguous/ ?

Indeed!

  ambivalent
  early 20th century: from ambivalence (from German Ambivalenz ),
  on the pattern of equivalent.

In German ambivalent (~synonym to mehrdeutig, "multi meaning")
actually means ambiguous, one of the false friends!

Thanks for pointing out.

>> [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>
> Blank line before sign-off.

fixes.


>
> Not at all worth a re-roll, but less nesting and a bit less noisy:

fixed.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-11-01  3:34         ` Junio C Hamano
@ 2017-11-01 20:58           ` Stefan Beller
  2017-11-02  1:53             ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-01 20:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Kevin Daudt, Jacob Keller, Johannes Schindelin

On Tue, Oct 31, 2017 at 8:34 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>>>  Given the difficulty in
>>> coming up with the single-liner description of what it does we saw
>>> above, I suspect that splitting SYNOPSIS out into two very distinct
>>> operating mode might make it easier to read.
>>>
>>>     SYNOPSIS
>>>     --------
>>>     [verse]
>>>     'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] [<commit-ish>...]
>>>    +'git describe' [<options>...] <blob>...
>>>
>>> Then this additional paragraph can say "When describin a <blob>",
>>> without using a (technically nonsense) phrase "if <commit-ish>
>>> refers to a blob", which is never true.
>>
>> ok, do we know about 'blob-ish' as a term?
>
> No, and I do not think there is any need to say -ish at all for this
> use case.
>
> After all, when we accept a <commit> when a <tree-ish> is called
> for, that is only because there is only one way to use the commit in
> place of the wanted <tree>; we take the top-level tree contained in
> it.  You cannot say you take <blob-ish> and take a <tree>, as it is
> unclear which entry in the <tree> can act as the substitute for the
> wanted <blob>.

But now we have a path as well, the notation of
<commit-ish> COLON <path>
is not a unique description of the blob, because
* there can be multiple <commit-ish>s depending on the tags and walking
* in boilerplate code cases, we might even have the blob at different
  places (e.g. pristine copies of a license file in subdirectories)

When calling for a tree-ish, we also accept commits and tags
plus walking directions.

So I find it hard to think we have to take in exact blob names, but
we would also accept blob-ishs (i.e. commit+path)

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-01 17:59           ` Stefan Beller
@ 2017-11-01 21:05             ` Jacob Keller
  2017-11-01 22:12               ` Johannes Schindelin
  0 siblings, 1 reply; 110+ messages in thread
From: Jacob Keller @ 2017-11-01 21:05 UTC (permalink / raw)
  To: Stefan Beller, Johannes Schindelin; +Cc: Junio C Hamano, git, Kevin Daudt



On November 1, 2017 10:59:08 AM PDT, Stefan Beller <sbeller@google.com> wrote:
>>> Does the code describe a9dbc3f12c as v2.15.0:GIT-VERSION-GEN, or
>>> would it always be <commit>:<path>?
>>
>> As the blob is described using this function:
>>
>> static void process_object(struct object *obj, const char *path, void
>*data)
>> {
>>        struct process_commit_data *pcd = data;
>>
>>        if (!oidcmp(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
>>                reset_revision_walk();
>>                describe_commit(&pcd->current_commit, pcd->dst);
>>                strbuf_addf(pcd->dst, ":%s", path);
>>        }
>> }
>>
>> i.e. as `describe_commit()` is used on the commit part, the answer to
>your
>> question is: the former. I guess that is why Stefan wrote
>`commit-ish`
>> instead of `commit` ;-)
>
>$ ./git describe a9dbc3f12c
>warning: reflog of 'HEAD' references pruned commits
>v2.15.0-7-g980e40477f:GIT-VERSION-GEN
>
>So as noted below, this output is less than ideal, but technically
>correct as
>v2.15.0-7-g980e40477f contains that blob as well (you don't have these;
>it is this very series consisting of 7 patches on top of 2.15, none of
>them
>touching GIT-VERSION-GEN, hence that blob stays intact.)
>
>The way Junio asked, we actually may prefer the commit-ish to give
>that commit that introduced the blob for the first time, i.e. add the
>--reverse to the graph walking.

I know id prefer the first commit that introduced the blob. That's what describing a commit does, it finds the oldest tag prior to the commit, while --contains finds the first tag that contains the commit as an ancestor.

Neither of these things is a perfect match for blobs since we want to only find an exact commit that still has that blob. I think finding the first commit to introduce a blob is generally more useful. Finding the last commit from some branch before the blob was changed also might be useful but I don't know if that's exactly what you implemented here...

In either respect, unless it's super easy to implement, going with what you have now is better than nothing and we can improve it in the future should someone wish to take up the work.

-Jake
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-01  4:11       ` Junio C Hamano
  2017-11-01 12:32         ` Johannes Schindelin
@ 2017-11-01 21:28         ` Stefan Beller
  1 sibling, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-01 21:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git, Jacob Keller, Kevin Daudt

On Tue, Oct 31, 2017 at 9:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> +If the given object refers to a blob, it will be described
>> +as `<commit-ish>:<path>`, such that the blob can be found
>> +at `<path>` in the `<commit-ish>`. Note, that the commit is likely
>
> Does the code describe a9dbc3f12c as v2.15.0:GIT-VERSION-GEN, or
> would it always be <commit>:<path>?
>
>> +not the commit that introduced the blob, but the one that was found
>> +first; to find the commit that introduced the blob, you need to find
>
> Because we do not want to descend into the same tree object we saw
> earlier, this "we show the name we happened to find first without
> attempting to refine it for a better name" is a rather fundamental
> limitation of this approach.  Hopefully we can later improve it with
> more thought, but for now it is better than nothing (and much better
> than "git log --raw | grep").

ok.


>
>> +the commit that last touched the path, e.g.
>> +`git log <commit-description> -- <path>`.
>> +As blobs do not point at the commits they are contained in,
>> +describing blobs is slow as we have to walk the whole graph.
>
> Is it true that we walk the "whole graph", or do we stop immediately
> we find a path to the blob?  The former makes it sound like
> completely useless.

Unfortunately we walk the whole graph, as I have not figured out how to
stop the walking from the callback in 'traverse_commit_list'.

I assume I have to modify the struct rev_info that we operate on,
clearing any pending commits?

>
>> -#define SEEN         (1u << 0)
>
> Interesting.  Now we include revision.h this becomes redundant.
>

Correct. In a way this small part is a revert of 8713ab3079
(Improve git-describe performance by reducing revision listing., 2007-01-13)


>> +     argv_array_pushl(&args, "internal: The first arg is not parsed",
>> +             "--all", "--reflog", /* as many starting points as possible */
>
> Interesting.
>
> Do we also search in the reflog in the normal "describe" operation?
> If not, perhaps we should?  I wonder what's the performance
> implications would be.

"normal" git describe doesn't need to walk the whole graph
as we only walk down from the given commit-ish until a land mark
is found.

For --contains, this might be an interesting though, as there we
also have to "walk backwards without pointers to follow".

>
> That tangent aside, as "describe blob" is most likely a "what
> reaches and is holding onto this blob?" query, finding something
> that can only be reached from a reflog entry would make it more
> helpful than without the option.

Yeah that is my reasoning as well.

>
>> +             /* NEEDSWORK: --all is incompatible with worktrees for now: */
>
> What's that last colon about?

will replace with a dot, it ought to hint at the line that follows,
the --single-worktree flag.

>
>> +             "--single-worktree",
>> +             "--objects",
>> +             "--in-commit-order",
>> +             NULL);
>> +
>> +     init_revisions(&revs, NULL);
>> +     if (setup_revisions(args.argc, args.argv, &revs, NULL) > 1)
>> +             BUG("setup_revisions could not handle all args?");
>> +
>> +     if (prepare_revision_walk(&revs))
>> +             die("revision walk setup failed");
>> +
>> +     traverse_commit_list(&revs, process_commit, process_object, &pcd);
>> +     reset_revision_walk();
>> +}
>> +
>
> OK.
>
>> diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
>> index 1c0e8659d9..3be01316e8 100755
>> --- a/t/t6120-describe.sh
>> +++ b/t/t6120-describe.sh
>> @@ -310,6 +310,21 @@ test_expect_success 'describe ignoring a borken submodule' '
>>       grep broken out
>>  '
>>
>> +test_expect_success 'describe a blob at a tag' '
>> +     echo "make it a unique blob" >file &&
>> +     git add file && git commit -m "content in file" &&
>> +     git tag -a -m "latest annotated tag" unique-file &&
>> +     git describe HEAD:file >actual &&
>> +     echo "unique-file:file" >expect &&
>> +     test_cmp expect actual
>> +'
>> +
>> +test_expect_success 'describe a surviving blob' '
>> +     git commit --allow-empty -m "empty commit" &&
>> +     git describe HEAD:file >actual &&
>> +     grep unique-file-1-g actual
>> +'
>> +
>
> I am not sure what "surviving" means in this context.

Maybe "unchanged", "still kept around" ?


> The word
> hinted that the test would be finding a blob that only appears in a
> commit that only appears as a reflog entry, but that wasn't the
> case, which was a bit disappointing.

oh!

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-01 12:37           ` Junio C Hamano
  2017-11-01 19:37             ` Stefan Beller
@ 2017-11-01 21:36             ` Johannes Schindelin
  2017-11-01 21:39               ` Jeff King
  1 sibling, 1 reply; 110+ messages in thread
From: Johannes Schindelin @ 2017-11-01 21:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Git Mailing List, Jacob Keller, Kevin D

Hi Junio,

On Wed, 1 Nov 2017, Junio C Hamano wrote:

> On Wed, Nov 1, 2017 at 9:26 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> >         ...
> >         (
> >                 for x in four three
> >                 do
> >                         git rm $x &&
> >                         git commit -m "remote $x" ||
> >                         exit
> >                 done
> >         ) &&
> >         test 0 -eq $? &&
> >         ...
> >
> > Ugly? Yes. Correct? Also yes.
> 
> I think returning non-zero with "return" is how other tests avoid an
> extra level of subshell.
> Ugly? Yes. Correct? Questionable but it seems to work for those who
> wrote them ;-)

Given that the test_expect_* functions evaluate the code, it makes me
wonder whether those `return` statements really return appropriately, or
one call level too low.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-01 21:36             ` Johannes Schindelin
@ 2017-11-01 21:39               ` Jeff King
  2017-11-01 22:33                 ` Johannes Schindelin
  0 siblings, 1 reply; 110+ messages in thread
From: Jeff King @ 2017-11-01 21:39 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Stefan Beller, Git Mailing List, Jacob Keller, Kevin D

On Wed, Nov 01, 2017 at 10:36:02PM +0100, Johannes Schindelin wrote:

> Hi Junio,
> 
> On Wed, 1 Nov 2017, Junio C Hamano wrote:
> 
> > On Wed, Nov 1, 2017 at 9:26 PM, Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> > >
> > >         ...
> > >         (
> > >                 for x in four three
> > >                 do
> > >                         git rm $x &&
> > >                         git commit -m "remote $x" ||
> > >                         exit
> > >                 done
> > >         ) &&
> > >         test 0 -eq $? &&
> > >         ...
> > >
> > > Ugly? Yes. Correct? Also yes.
> > 
> > I think returning non-zero with "return" is how other tests avoid an
> > extra level of subshell.
> > Ugly? Yes. Correct? Questionable but it seems to work for those who
> > wrote them ;-)
> 
> Given that the test_expect_* functions evaluate the code, it makes me
> wonder whether those `return` statements really return appropriately, or
> one call level too low.

The test_expect functions eval the actual snippets inside a dummy
function. This is intentional exactly to allow them to call "return" at
will.

-Peff

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-01 19:37             ` Stefan Beller
@ 2017-11-01 22:08               ` Johannes Schindelin
  2017-11-01 22:19                 ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Johannes Schindelin @ 2017-11-01 22:08 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Git Mailing List, Jacob Keller, Kevin D

Hi Stefan,

On Wed, 1 Nov 2017, Stefan Beller wrote:

> On Wed, Nov 1, 2017 at 5:37 AM, Junio C Hamano <gitster@pobox.com> wrote:
> > On Wed, Nov 1, 2017 at 9:26 PM, Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> >>
> >>         ...
> >>         (
> >>                 for x in four three
> >>                 do
> >>                         git rm $x &&
> >>                         git commit -m "remote $x" ||
> >>                         exit
> >>                 done
> >>         ) &&
> >>         test 0 -eq $? &&
> >>         ...
> >>
> >> Ugly? Yes. Correct? Also yes.
> >
> > I think returning non-zero with "return" is how other tests avoid an
> > extra level of subshell.
> > Ugly? Yes. Correct? Questionable but it seems to work for those who
> > wrote them ;-)
> 
> I think I'll go without the extra subshell and with s/return 1/false/ as
> the exact value doesn't matter.

You mean

	for ...
	do
		xyz ||
		false
	done

? That does not work. Try this:

	for a in 1 2 3; do echo $a && false; done && echo done

While it does not print `done`, it will print all of 1, 2 and 3.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-01 21:05             ` Jacob Keller
@ 2017-11-01 22:12               ` Johannes Schindelin
  2017-11-01 22:21                 ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Johannes Schindelin @ 2017-11-01 22:12 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Stefan Beller, Junio C Hamano, git, Kevin Daudt

Hi,

On Wed, 1 Nov 2017, Jacob Keller wrote:

> On November 1, 2017 10:59:08 AM PDT, Stefan Beller <sbeller@google.com> wrote:
> >>> Does the code describe a9dbc3f12c as v2.15.0:GIT-VERSION-GEN, or
> >>> would it always be <commit>:<path>?
> >>
> >> As the blob is described using this function:
> >>
> >> static void process_object(struct object *obj, const char *path, void
> >*data)
> >> {
> >>        struct process_commit_data *pcd = data;
> >>
> >>        if (!oidcmp(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
> >>                reset_revision_walk();
> >>                describe_commit(&pcd->current_commit, pcd->dst);
> >>                strbuf_addf(pcd->dst, ":%s", path);
> >>        }
> >> }
> >>
> >> i.e. as `describe_commit()` is used on the commit part, the answer to
> >your
> >> question is: the former. I guess that is why Stefan wrote
> >`commit-ish`
> >> instead of `commit` ;-)
> >
> >$ ./git describe a9dbc3f12c
> >warning: reflog of 'HEAD' references pruned commits
> >v2.15.0-7-g980e40477f:GIT-VERSION-GEN
> >
> >So as noted below, this output is less than ideal, but technically
> >correct as
> >v2.15.0-7-g980e40477f contains that blob as well (you don't have these;
> >it is this very series consisting of 7 patches on top of 2.15, none of
> >them
> >touching GIT-VERSION-GEN, hence that blob stays intact.)
> >
> >The way Junio asked, we actually may prefer the commit-ish to give
> >that commit that introduced the blob for the first time, i.e. add the
> >--reverse to the graph walking.
> 
> I know id prefer the first commit that introduced the blob. That's what
> describing a commit does, it finds the oldest tag prior to the commit,
> while --contains finds the first tag that contains the commit as an
> ancestor.

It is very easy to wish for "the oldest commit introducing a blob". But
since we're in a DAG, this question is not necessarily easy to answer:

- A - B - C
    \   /
      D

Let's assume that all commits except A have the blob, and that B and D
(for some reason) have the same author/committer dates.

Do you identify B or D as the commit introducing the blob?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-01 22:08               ` Johannes Schindelin
@ 2017-11-01 22:19                 ` Stefan Beller
  2017-11-01 22:39                   ` Johannes Schindelin
  0 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-01 22:19 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Git Mailing List, Jacob Keller, Kevin D

>> I think I'll go without the extra subshell and with s/return 1/false/ as
>> the exact value doesn't matter.
>
> You mean
>
>         for ...
>         do
>                 xyz ||
>                 false
>         done

Yes, I do.

> ? That does not work. Try this:
>
>         for a in 1 2 3; do echo $a && false; done && echo done

...   && echo $?
         1

> While it does not print `done`, it will print all of 1, 2 and 3.

We do not care about the internal state, aborting early, we rather
care only if the whole loop body was executed. Running the test

    test_expect_success 'witty title' '
        for a in 1 2 3; do echo $a && false; done && echo done
    '

not ok 1 - witty title

That is all we want to care about here?
Otherwise I may still have a misunderstanding.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-01 22:12               ` Johannes Schindelin
@ 2017-11-01 22:21                 ` Stefan Beller
  2017-11-01 22:41                   ` Johannes Schindelin
  0 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-01 22:21 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jacob Keller, Junio C Hamano, git, Kevin Daudt

>> I know id prefer the first commit that introduced the blob. That's what
>> describing a commit does, it finds the oldest tag prior to the commit,
>> while --contains finds the first tag that contains the commit as an
>> ancestor.
>
> It is very easy to wish for "the oldest commit introducing a blob". But
> since we're in a DAG, this question is not necessarily easy to answer:
>
> - A - B - C
>     \   /
>       D
>
> Let's assume that all commits except A have the blob, and that B and D
> (for some reason) have the same author/committer dates.
>
> Do you identify B or D as the commit introducing the blob?

The current implementation gives C, though.
(assuming C is HEAD, and A is ancient)

With the --reverse flag one of B or D is given (not sure which,
depends on the exact order).

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-01 21:39               ` Jeff King
@ 2017-11-01 22:33                 ` Johannes Schindelin
  2017-11-02  1:20                   ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Johannes Schindelin @ 2017-11-01 22:33 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Stefan Beller, Git Mailing List, Jacob Keller, Kevin D

Hi,

On Wed, 1 Nov 2017, Jeff King wrote:

> On Wed, Nov 01, 2017 at 10:36:02PM +0100, Johannes Schindelin wrote:
> 
> > On Wed, 1 Nov 2017, Junio C Hamano wrote:
> > 
> > > On Wed, Nov 1, 2017 at 9:26 PM, Johannes Schindelin
> > > <Johannes.Schindelin@gmx.de> wrote:
> > > >
> > > >         ...
> > > >         (
> > > >                 for x in four three
> > > >                 do
> > > >                         git rm $x &&
> > > >                         git commit -m "remote $x" ||
> > > >                         exit
> > > >                 done
> > > >         ) &&
> > > >         test 0 -eq $? &&
> > > >         ...
> > > >
> > > > Ugly? Yes. Correct? Also yes.
> > > 
> > > I think returning non-zero with "return" is how other tests avoid an
> > > extra level of subshell.
> > > Ugly? Yes. Correct? Questionable but it seems to work for those who
> > > wrote them ;-)
> > 
> > Given that the test_expect_* functions evaluate the code, it makes me
> > wonder whether those `return` statements really return appropriately, or
> > one call level too low.
> 
> The test_expect functions eval the actual snippets inside a dummy
> function. This is intentional exactly to allow them to call "return" at
> will.

Tricksy. And a bit unintuitive for script kings such as myself, but okay.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-01 22:19                 ` Stefan Beller
@ 2017-11-01 22:39                   ` Johannes Schindelin
  2017-11-01 22:46                     ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Johannes Schindelin @ 2017-11-01 22:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Git Mailing List, Jacob Keller, Kevin D

Hi Stefan,

On Wed, 1 Nov 2017, Stefan Beller wrote:

> We do not care about the internal state, aborting early, we rather
> care only if the whole loop body was executed. Running the test
> 
>     test_expect_success 'witty title' '
>         for a in 1 2 3; do echo $a && false; done && echo done
>     '
> 
> not ok 1 - witty title
> 
> That is all we want to care about here?

We care about the loop body being executed successfully *each time*. A
better counter example:

$ for a in 1 2 3; do echo $a && test 2 != $a || false; done && echo done
1
2
3
done

(even if the second of three runs failed)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-01 22:21                 ` Stefan Beller
@ 2017-11-01 22:41                   ` Johannes Schindelin
  2017-11-01 22:53                     ` Stefan Beller
                                       ` (2 more replies)
  0 siblings, 3 replies; 110+ messages in thread
From: Johannes Schindelin @ 2017-11-01 22:41 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jacob Keller, Junio C Hamano, git, Kevin Daudt

Hi Stefan,

On Wed, 1 Nov 2017, Stefan Beller wrote:

> >> I know id prefer the first commit that introduced the blob. That's
> >> what describing a commit does, it finds the oldest tag prior to the
> >> commit, while --contains finds the first tag that contains the commit
> >> as an ancestor.
> >
> > It is very easy to wish for "the oldest commit introducing a blob". But
> > since we're in a DAG, this question is not necessarily easy to answer:
> >
> > - A - B - C
> >     \   /
> >       D
> >
> > Let's assume that all commits except A have the blob, and that B and D
> > (for some reason) have the same author/committer dates.
> >
> > Do you identify B or D as the commit introducing the blob?
> 
> The current implementation gives C, though.
> (assuming C is HEAD, and A is ancient)
> 
> With the --reverse flag one of B or D is given (not sure which,
> depends on the exact order).

Sure, but it is still a tricky thing. Imagine

- A1 - B1 - A2 - B2 - B3

where all the B* commits have the blob. Do you really want to report B1
rather than B2 as the commit introducing the blob? (I would prefer B2...)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-01 22:39                   ` Johannes Schindelin
@ 2017-11-01 22:46                     ` Stefan Beller
  0 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-01 22:46 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Git Mailing List, Jacob Keller, Kevin D

On Wed, Nov 1, 2017 at 3:39 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>> not ok 1 - witty title
>>
>> That is all we want to care about here?
>
> We care about the loop body being executed successfully *each time*. A
> better counter example:

Good point. I'll use return in that case. Thanks!

Stefan

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-01 22:41                   ` Johannes Schindelin
@ 2017-11-01 22:53                     ` Stefan Beller
  2017-11-02  6:05                     ` Jacob Keller
  2017-11-02  7:23                     ` Andreas Schwab
  2 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-01 22:53 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jacob Keller, Junio C Hamano, git, Kevin Daudt

On Wed, Nov 1, 2017 at 3:41 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:

>> The current implementation gives C, though.
>> (assuming C is HEAD, and A is ancient)
>>
>> With the --reverse flag one of B or D is given (not sure which,
>> depends on the exact order).
>
> Sure, but it is still a tricky thing. Imagine
>
> - A1 - B1 - A2 - B2 - B3
>
> where all the B* commits have the blob. Do you really want to report B1
> rather than B2 as the commit introducing the blob? (I would prefer B2...)

You are correct, that B2 is also important to find.

Ideally you want a list of all adjacent commit groups, but that is too
tricky to implement for now, deferring it to another contribution in the future.

The current proposal would give you B3, such that `git log B3 -- path` will find
you B2.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-01 22:33                 ` Johannes Schindelin
@ 2017-11-02  1:20                   ` Junio C Hamano
  0 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2017-11-02  1:20 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff King, Stefan Beller, Git Mailing List, Jacob Keller, Kevin D

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > Given that the test_expect_* functions evaluate the code, it makes me
>> > wonder whether those `return` statements really return appropriately, or
>> > one call level too low.
>> 
>> The test_expect functions eval the actual snippets inside a dummy
>> function. This is intentional exactly to allow them to call "return" at
>> will.
>
> Tricksy. And a bit unintuitive for script kings such as myself, but okay.

Exactly.

The hidden assumption on the way how "return" interacts with
the way we use "eval" is the reason why I said "Ugly? Yes, Correct?
Questionable" in the first place.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 7/7] t6120: fix typo in test name
  2017-11-01  1:21     ` Junio C Hamano
  2017-11-01 18:13       ` Stefan Beller
@ 2017-11-02  1:36       ` Junio C Hamano
  1 sibling, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2017-11-02  1:36 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, me, jacob.keller, Johannes.Schindelin

Junio C Hamano <gitster@pobox.com> writes:

> Stefan Beller <sbeller@google.com> writes:
>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  t/t6120-describe.sh | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Good.  I am guessing that you are sending this as the last/optional
> one because this was found _after_ you worked on other parts of the
> series, but I think it is easier to reason about if this were marked
> as a preliminary clean-up and moved to the front of the series.
>
> Thanks.

Having said that, if you already have v2 that keeps it at the end,
that is fine.  

It's just leaving such an "By the way I found this unrelated one and
fixed it while at it" at the end when submitting will give an
impression that the series is not as well proof-read as it could be,
because a good proof-reader would notice something out-of-place like
this fairly quickly and would move it to the front or even in a
separate preliminary clean-up series.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-11-01 20:58           ` Stefan Beller
@ 2017-11-02  1:53             ` Junio C Hamano
  2017-11-02  4:23               ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2017-11-02  1:53 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Kevin Daudt, Jacob Keller, Johannes Schindelin

Stefan Beller <sbeller@google.com> writes:

> But now we have a path as well, the notation of
> <commit-ish> COLON <path>
> is not a unique description of the blob, because
> * there can be multiple <commit-ish>s depending on the tags and walking
> * in boilerplate code cases, we might even have the blob at different
>   places (e.g. pristine copies of a license file in subdirectories)
>
> When calling for a tree-ish, we also accept commits and tags
> plus walking directions.

I think you are confused --- that is not what "-ish" suffix is used
in our conversation on objects.

The reason why we say "-ish" is "Yes we know v2.15.0 is *NOT* a
commit object, we very well know it is a tag object, but because we
allow it to be used in a context that calls for a commit object, we
mark that use context as 'this accepts commit-ish, not just
commit'".

But what you call "walking direction", e.g. "HEAD:Makefile", names a
"BLOB".  It is not "this is something like BLOB, no, we very well
know it is not a blob but the context allows such a non-blob to be
fed in place of a blob, so we take that even though it is not a
blob".  Because it is.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-11-02  1:53             ` Junio C Hamano
@ 2017-11-02  4:23               ` Junio C Hamano
  2017-11-04 21:15                 ` Philip Oakley
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2017-11-02  4:23 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Kevin Daudt, Jacob Keller, Johannes Schindelin

Junio C Hamano <gitster@pobox.com> writes:

> The reason why we say "-ish" is "Yes we know v2.15.0 is *NOT* a
> commit object, we very well know it is a tag object, but because we
> allow it to be used in a context that calls for a commit object, we
> mark that use context as 'this accepts commit-ish, not just
> commit'".

Having said all that, there is a valid case in which we might want
to say "blob-ish".

To review, X-ish is the word we use when the command wants to take
an X, but tolerates a lazy user who gives a Y, which is *NOT* X,
without bothering to add ^{X} suffix, i.e. Y^{X}.  In such a case,
the command takes not just X but takes X-ish because it takes a Y
and converts it internally to an X to be extra nice.

When the command wants to take a blob, but tolerates something else
and does "^{blob}" internally, we can say it takes "blob-ish".
Technically that "something else" could be an annotated tag that
points at a blob object, without any intervening commit or tree (I
did not check if the "describe <blob>" code in this thread handles
this, though).

But because it is not usually done to tag a blob directly, it would
probably be not worth to say "blob-ish" in the document and cause
readers to wonder in what situation something that is not a blob can
be treated as if it were a blob.  It does feel like we would be
pursuing technical correctness too much and sacrificing the readability
of the document, at least to me, and a bad trade-off.



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-01 22:41                   ` Johannes Schindelin
  2017-11-01 22:53                     ` Stefan Beller
@ 2017-11-02  6:05                     ` Jacob Keller
  2017-11-03  5:18                       ` Junio C Hamano
  2017-11-02  7:23                     ` Andreas Schwab
  2 siblings, 1 reply; 110+ messages in thread
From: Jacob Keller @ 2017-11-02  6:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Stefan Beller, Junio C Hamano, git, Kevin Daudt

On Wed, Nov 1, 2017 at 3:41 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Stefan,
>
> On Wed, 1 Nov 2017, Stefan Beller wrote:
>
>> >> I know id prefer the first commit that introduced the blob. That's
>> >> what describing a commit does, it finds the oldest tag prior to the
>> >> commit, while --contains finds the first tag that contains the commit
>> >> as an ancestor.
>> >
>> > It is very easy to wish for "the oldest commit introducing a blob". But
>> > since we're in a DAG, this question is not necessarily easy to answer:
>> >
>> > - A - B - C
>> >     \   /
>> >       D
>> >
>> > Let's assume that all commits except A have the blob, and that B and D
>> > (for some reason) have the same author/committer dates.
>> >
>> > Do you identify B or D as the commit introducing the blob?
>>
>> The current implementation gives C, though.
>> (assuming C is HEAD, and A is ancient)
>>
>> With the --reverse flag one of B or D is given (not sure which,
>> depends on the exact order).
>
> Sure, but it is still a tricky thing. Imagine
>
> - A1 - B1 - A2 - B2 - B3
>
> where all the B* commits have the blob. Do you really want to report B1
> rather than B2 as the commit introducing the blob? (I would prefer B2...)
>
> Ciao,
> Dscho

Yes, but I'd rather we find either B1 or B2, rather than B3, or in the
first case, I'd rather we *at least* provide B or D, rather than C.

Or:

A - B1 - C -E - ... - Z
  \ B2 /

In this example, if everything except a had the blob, we'd find z,
which I think isn't as useful as finding one of B or D.

Sure we can't find a perfect answer, but I'd rather find a commit
which introduced as blob as one that adds the blob, rather than one
that happens to point to it because it has history which includes it.

I think both questions are valuable, the first is "which commit last
had this blob", the second  is "which commit first introduced this
blob", neither of which can always give a definitive answer. It really
depends on what question you're asking up front.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-01 22:41                   ` Johannes Schindelin
  2017-11-01 22:53                     ` Stefan Beller
  2017-11-02  6:05                     ` Jacob Keller
@ 2017-11-02  7:23                     ` Andreas Schwab
  2017-11-02 18:18                       ` Stefan Beller
  2 siblings, 1 reply; 110+ messages in thread
From: Andreas Schwab @ 2017-11-02  7:23 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Stefan Beller, Jacob Keller, Junio C Hamano, git, Kevin Daudt

On Nov 01 2017, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:

> Sure, but it is still a tricky thing. Imagine
>
> - A1 - B1 - A2 - B2 - B3
>
> where all the B* commits have the blob. Do you really want to report B1
> rather than B2 as the commit introducing the blob? (I would prefer B2...)

What if B3 renames or copies the blob?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-02  7:23                     ` Andreas Schwab
@ 2017-11-02 18:18                       ` Stefan Beller
  2017-11-03 12:05                         ` Johannes Schindelin
  0 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-02 18:18 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Johannes Schindelin, Jacob Keller, Junio C Hamano, git, Kevin Daudt

On Thu, Nov 2, 2017 at 12:23 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> On Nov 01 2017, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
>> Sure, but it is still a tricky thing. Imagine
>>
>> - A1 - B1 - A2 - B2 - B3
>>
>> where all the B* commits have the blob. Do you really want to report B1
>> rather than B2 as the commit introducing the blob? (I would prefer B2...)
>
> What if B3 renames or copies the blob?
>
> Andreas.

With the current proposed patch you'd find B3, and then use the diff machinery
to digg deeper from there (renames/copies ought to be easy to detect already?)

So with a copy B3 might be a better start than B1, as starting from B1 you
would not find B3 easily.

For a rename, I would think a reverse log/blame on B1:path may help.

With that said, I think I'll just reroll the series with the current logic
fixing the other minor issues that were brought up as B3 seems to
be the most versatile (though not optimal) answer for many use cases.

Thanks for that thought,
Stefan

^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv3 0/7] git describe blob
  2017-10-31 21:18   ` [PATCHv2 0/7] git describe blob Stefan Beller
                       ` (7 preceding siblings ...)
  2017-11-01  5:14     ` [PATCHv2 0/7] git describe blob Junio C Hamano
@ 2017-11-02 19:41     ` Stefan Beller
  2017-11-02 19:41       ` [PATCHv3 1/7] t6120: fix typo in test name Stefan Beller
                         ` (7 more replies)
  8 siblings, 8 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-02 19:41 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me, schwab

Thanks for the discussion on v2[1].

Interdiff is below, just fixing minor things.

We'll keep the original algorithm for now, deferring an improvement on 
the algorithm front towards a future developer.

Thanks,
Stefan

[1] https://public-inbox.org/git/20171031211852.13001-1-sbeller@google.com/

Stefan Beller (7):
  t6120: fix typo in test name
  list-objects.c: factor out traverse_trees_and_blobs
  revision.h: introduce blob/tree walking in order of the commits
  builtin/describe.c: rename `oid` to avoid variable shadowing
  builtin/describe.c: print debug statements earlier
  builtin/describe.c: factor out describe_commit
  builtin/describe.c: describe a blob

 Documentation/git-describe.txt     |  13 +++-
 Documentation/rev-list-options.txt |   5 ++
 builtin/describe.c                 | 125 ++++++++++++++++++++++++++++---------
 list-objects.c                     |  52 +++++++++------
 revision.c                         |   2 +
 revision.h                         |   3 +-
 t/t6100-rev-list-in-order.sh       |  47 ++++++++++++++
 t/t6120-describe.sh                |  17 ++++-
 8 files changed, 214 insertions(+), 50 deletions(-)
 create mode 100755 t/t6100-rev-list-in-order.sh
diff --git c/Documentation/git-describe.txt w/Documentation/git-describe.txt
index 077c3c2193..79ec0be62a 100644
--- c/Documentation/git-describe.txt
+++ w/Documentation/git-describe.txt
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] [<commit-ish>...]
 'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] --dirty[=<mark>]
-'git describe' [<options>] <blob-ish>
+'git describe' [<options>] <blob>
 
 DESCRIPTION
 -----------
diff --git c/builtin/describe.c w/builtin/describe.c
index 382cbae908..cf08bef344 100644
--- c/builtin/describe.c
+++ w/builtin/describe.c
@@ -440,6 +440,7 @@ struct process_commit_data {
 	struct object_id current_commit;
 	struct object_id looking_for;
 	struct strbuf *dst;
+	struct rev_info *revs;
 };
 
 static void process_commit(struct commit *commit, void *data)
@@ -456,6 +457,7 @@ static void process_object(struct object *obj, const char *path, void *data)
 		reset_revision_walk();
 		describe_commit(&pcd->current_commit, pcd->dst);
 		strbuf_addf(pcd->dst, ":%s", path);
+		pcd->revs->max_count = 0;
 	}
 }
 
@@ -463,7 +465,7 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
 {
 	struct rev_info revs;
 	struct argv_array args = ARGV_ARRAY_INIT;
-	struct process_commit_data pcd = { null_oid, oid, dst};
+	struct process_commit_data pcd = { null_oid, oid, dst, &revs};
 
 	argv_array_pushl(&args, "internal: The first arg is not parsed",
 		"--all", "--reflog", /* as many starting points as possible */
@@ -497,14 +499,12 @@ static void describe(const char *arg, int last_one)
 		die(_("Not a valid object name %s"), arg);
 	cmit = lookup_commit_reference_gently(&oid, 1);
 
-	if (cmit) {
+	if (cmit)
 		describe_commit(&oid, &sb);
-	} else {
-		if (lookup_blob(&oid))
-			describe_blob(oid, &sb);
-		else
-			die(_("%s is neither a commit nor blob"), arg);
-	}
+	else if (lookup_blob(&oid))
+		describe_blob(oid, &sb);
+	else
+		die(_("%s is neither a commit nor blob"), arg);
 
 	puts(sb.buf);
 
diff --git c/list-objects.c w/list-objects.c
index 03438e5a8b..07a92f35fe 100644
--- c/list-objects.c
+++ w/list-objects.c
@@ -184,13 +184,13 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 }
 
 static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base_path,
+				     struct strbuf *base,
 				     show_object_fn show_object,
 				     void *data)
 {
 	int i;
 
-	assert(base_path->len == 0);
+	assert(base->len == 0);
 
 	for (i = 0; i < revs->pending.nr; i++) {
 		struct object_array_entry *pending = revs->pending.objects + i;
@@ -208,12 +208,12 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			path = "";
 		if (obj->type == OBJ_TREE) {
 			process_tree(revs, (struct tree *)obj, show_object,
-				     base_path, path, data);
+				     base, path, data);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
 			process_blob(revs, (struct blob *)obj, show_object,
-				     base_path, path, data);
+				     base, path, data);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
diff --git c/t/t6100-rev-list-in-order.sh w/t/t6100-rev-list-in-order.sh
index 651666979b..d4d539b0da 100755
--- c/t/t6100-rev-list-in-order.sh
+++ w/t/t6100-rev-list-in-order.sh
@@ -1,21 +1,22 @@
 #!/bin/sh
 
-test_description='miscellaneous rev-list tests'
+test_description='rev-list testing in-commit-order'
 
 . ./test-lib.sh
 
-
-test_expect_success 'setup' '
+test_expect_success 'rev-list --in-commit-order' '
 	for x in one two three four
 	do
 		echo $x >$x &&
 		git add $x &&
-		git commit -m "add file $x"
+		git commit -m "add file $x" ||
+		return 1
 	done &&
 	for x in four three
 	do
 		git rm $x &&
-		git commit -m "remove $x"
+		git commit -m "remove $x" ||
+		return 1
 	done &&
 	git rev-list --in-commit-order --objects HEAD >actual.raw &&
 	cut -c 1-40 >actual <actual.raw &&
@@ -32,9 +33,9 @@ test_expect_success 'setup' '
 		HEAD~2^{tree}
 		HEAD~2^{tree}:four
 		HEAD~3^{commit}
-		# HEAD~3^{tree} skipped
+		# HEAD~3^{tree} skipped, same as HEAD~1^{tree}
 		HEAD~4^{commit}
-		# HEAD~4^{tree} skipped
+		# HEAD~4^{tree} skipped, same as HEAD^{tree}
 		HEAD~5^{commit}
 		HEAD~5^{tree}
 	EOF
diff --git c/t/t6120-describe.sh w/t/t6120-describe.sh
index fd329f173a..aec6ed192d 100755
--- c/t/t6120-describe.sh
+++ w/t/t6120-describe.sh
@@ -319,7 +319,7 @@ test_expect_success 'describe a blob at a tag' '
 	test_cmp expect actual
 '
 
-test_expect_success 'describe a surviving blob' '
+test_expect_success 'describe a blob with commit-ish' '
 	git commit --allow-empty -m "empty commit" &&
 	git describe HEAD:file >actual &&
 	grep unique-file-1-g actual

^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv3 1/7] t6120: fix typo in test name
  2017-11-02 19:41     ` [PATCHv3 " Stefan Beller
@ 2017-11-02 19:41       ` Stefan Beller
  2017-11-02 19:41       ` [PATCHv3 2/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
                         ` (6 subsequent siblings)
  7 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-02 19:41 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me, schwab

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 t/t6120-describe.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 1c0e8659d9..c8b7ed82d9 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -304,7 +304,7 @@ test_expect_success 'describe chokes on severely broken submodules' '
 	mv .git/modules/sub1/ .git/modules/sub_moved &&
 	test_must_fail git describe --dirty
 '
-test_expect_success 'describe ignoring a borken submodule' '
+test_expect_success 'describe ignoring a broken submodule' '
 	git describe --broken >out &&
 	test_when_finished "mv .git/modules/sub_moved .git/modules/sub1" &&
 	grep broken out
-- 
2.15.0.7.g980e40477f


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv3 2/7] list-objects.c: factor out traverse_trees_and_blobs
  2017-11-02 19:41     ` [PATCHv3 " Stefan Beller
  2017-11-02 19:41       ` [PATCHv3 1/7] t6120: fix typo in test name Stefan Beller
@ 2017-11-02 19:41       ` Stefan Beller
  2017-11-02 19:41       ` [PATCHv3 3/7] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
                         ` (5 subsequent siblings)
  7 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-02 19:41 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me, schwab

With traverse_trees_and_blobs factored out of the main traverse function,
the next patch can introduce an in-order revision walking with ease.

In the next patch we'll call `traverse_trees_and_blobs` from within the
loop walking the commits, such that we'll have one invocation of that
function per commit.  That is why we do not want to have memory allocations
in that function, such as we'd have if we were to use a strbuf locally.
Pass a strbuf from traverse_commit_list into the blob and tree traversing
function as a scratch pad that only needs to be allocated once.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 list-objects.c | 50 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index b3931fa434..7c2ce9c4bd 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -183,25 +183,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-void traverse_commit_list(struct rev_info *revs,
-			  show_commit_fn show_commit,
-			  show_object_fn show_object,
-			  void *data)
+static void traverse_trees_and_blobs(struct rev_info *revs,
+				     struct strbuf *base,
+				     show_object_fn show_object,
+				     void *data)
 {
 	int i;
-	struct commit *commit;
-	struct strbuf base;
 
-	strbuf_init(&base, PATH_MAX);
-	while ((commit = get_revision(revs)) != NULL) {
-		/*
-		 * an uninteresting boundary commit may not have its tree
-		 * parsed yet, but we are not going to show them anyway
-		 */
-		if (commit->tree)
-			add_pending_tree(revs, commit->tree);
-		show_commit(commit, data);
-	}
+	assert(base->len == 0);
+
 	for (i = 0; i < revs->pending.nr; i++) {
 		struct object_array_entry *pending = revs->pending.objects + i;
 		struct object *obj = pending->item;
@@ -218,17 +208,39 @@ void traverse_commit_list(struct rev_info *revs,
 			path = "";
 		if (obj->type == OBJ_TREE) {
 			process_tree(revs, (struct tree *)obj, show_object,
-				     &base, path, data);
+				     base, path, data);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
 			process_blob(revs, (struct blob *)obj, show_object,
-				     &base, path, data);
+				     base, path, data);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
 	object_array_clear(&revs->pending);
-	strbuf_release(&base);
+}
+
+void traverse_commit_list(struct rev_info *revs,
+			  show_commit_fn show_commit,
+			  show_object_fn show_object,
+			  void *data)
+{
+	struct commit *commit;
+	struct strbuf csp; /* callee's scratch pad */
+	strbuf_init(&csp, PATH_MAX);
+
+	while ((commit = get_revision(revs)) != NULL) {
+		/*
+		 * an uninteresting boundary commit may not have its tree
+		 * parsed yet, but we are not going to show them anyway
+		 */
+		if (commit->tree)
+			add_pending_tree(revs, commit->tree);
+		show_commit(commit, data);
+	}
+	traverse_trees_and_blobs(revs, &csp, show_object, data);
+
+	strbuf_release(&csp);
 }
-- 
2.15.0.7.g980e40477f


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv3 3/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-02 19:41     ` [PATCHv3 " Stefan Beller
  2017-11-02 19:41       ` [PATCHv3 1/7] t6120: fix typo in test name Stefan Beller
  2017-11-02 19:41       ` [PATCHv3 2/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
@ 2017-11-02 19:41       ` Stefan Beller
  2017-11-14 19:52         ` Jonathan Tan
  2017-11-02 19:41       ` [PATCHv3 4/7] builtin/describe.c: rename `oid` to avoid variable shadowing Stefan Beller
                         ` (4 subsequent siblings)
  7 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-02 19:41 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me, schwab

The functionality to list tree objects in the order they were seen
while traversing the commits will be used in the next commit,
where we teach `git describe` to describe not only commits, but
trees and blobs, too.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/rev-list-options.txt |  5 ++++
 list-objects.c                     |  2 ++
 revision.c                         |  2 ++
 revision.h                         |  3 ++-
 t/t6100-rev-list-in-order.sh       | 47 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 58 insertions(+), 1 deletion(-)
 create mode 100755 t/t6100-rev-list-in-order.sh

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 13501e1556..9066e0c777 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -686,6 +686,11 @@ ifdef::git-rev-list[]
 	all object IDs which I need to download if I have the commit
 	object _bar_ but not _foo_''.
 
+--in-commit-order::
+	Print tree and blob ids in order of the commits. The tree
+	and blob ids are printed after they are first referenced
+	by a commit.
+
 --objects-edge::
 	Similar to `--objects`, but also print the IDs of excluded
 	commits prefixed with a ``-'' character.  This is used by
diff --git a/list-objects.c b/list-objects.c
index 7c2ce9c4bd..07a92f35fe 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -239,6 +239,8 @@ void traverse_commit_list(struct rev_info *revs,
 		if (commit->tree)
 			add_pending_tree(revs, commit->tree);
 		show_commit(commit, data);
+		if (revs->tree_blobs_in_commit_order)
+			traverse_trees_and_blobs(revs, &csp, show_object, data);
 	}
 	traverse_trees_and_blobs(revs, &csp, show_object, data);
 
diff --git a/revision.c b/revision.c
index d167223e69..9329d4ebbf 100644
--- a/revision.c
+++ b/revision.c
@@ -1845,6 +1845,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->dense = 0;
 	} else if (!strcmp(arg, "--show-all")) {
 		revs->show_all = 1;
+	} else if (!strcmp(arg, "--in-commit-order")) {
+		revs->tree_blobs_in_commit_order = 1;
 	} else if (!strcmp(arg, "--remove-empty")) {
 		revs->remove_empty_trees = 1;
 	} else if (!strcmp(arg, "--merges")) {
diff --git a/revision.h b/revision.h
index 54761200ad..86985d68aa 100644
--- a/revision.h
+++ b/revision.h
@@ -121,7 +121,8 @@ struct rev_info {
 			bisect:1,
 			ancestry_path:1,
 			first_parent_only:1,
-			line_level_traverse:1;
+			line_level_traverse:1,
+			tree_blobs_in_commit_order:1;
 
 	/* Diff flags */
 	unsigned int	diff:1,
diff --git a/t/t6100-rev-list-in-order.sh b/t/t6100-rev-list-in-order.sh
new file mode 100755
index 0000000000..d4d539b0da
--- /dev/null
+++ b/t/t6100-rev-list-in-order.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+
+test_description='rev-list testing in-commit-order'
+
+. ./test-lib.sh
+
+test_expect_success 'rev-list --in-commit-order' '
+	for x in one two three four
+	do
+		echo $x >$x &&
+		git add $x &&
+		git commit -m "add file $x" ||
+		return 1
+	done &&
+	for x in four three
+	do
+		git rm $x &&
+		git commit -m "remove $x" ||
+		return 1
+	done &&
+	git rev-list --in-commit-order --objects HEAD >actual.raw &&
+	cut -c 1-40 >actual <actual.raw &&
+
+	git cat-file --batch-check="%(objectname)" >expect.raw <<-\EOF &&
+		HEAD^{commit}
+		HEAD^{tree}
+		HEAD^{tree}:one
+		HEAD^{tree}:two
+		HEAD~1^{commit}
+		HEAD~1^{tree}
+		HEAD~1^{tree}:three
+		HEAD~2^{commit}
+		HEAD~2^{tree}
+		HEAD~2^{tree}:four
+		HEAD~3^{commit}
+		# HEAD~3^{tree} skipped, same as HEAD~1^{tree}
+		HEAD~4^{commit}
+		# HEAD~4^{tree} skipped, same as HEAD^{tree}
+		HEAD~5^{commit}
+		HEAD~5^{tree}
+	EOF
+	grep -v "#" >expect <expect.raw &&
+
+	test_cmp expect actual
+'
+
+test_done
-- 
2.15.0.7.g980e40477f


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv3 4/7] builtin/describe.c: rename `oid` to avoid variable shadowing
  2017-11-02 19:41     ` [PATCHv3 " Stefan Beller
                         ` (2 preceding siblings ...)
  2017-11-02 19:41       ` [PATCHv3 3/7] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
@ 2017-11-02 19:41       ` Stefan Beller
  2017-11-02 19:41       ` [PATCHv3 5/7] builtin/describe.c: print debug statements earlier Stefan Beller
                         ` (3 subsequent siblings)
  7 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-02 19:41 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me, schwab

The function `describe` has already a variable named `oid` declared at
the beginning of the function for an object id.  Do not shadow that
variable with a pointer to an object id.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/describe.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 29075dbd0f..fd61f463cf 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -381,9 +381,9 @@ static void describe(const char *arg, int last_one)
 	}
 
 	if (!match_cnt) {
-		struct object_id *oid = &cmit->object.oid;
+		struct object_id *cmit_oid = &cmit->object.oid;
 		if (always) {
-			printf("%s", find_unique_abbrev(oid->hash, abbrev));
+			printf("%s", find_unique_abbrev(cmit_oid->hash, abbrev));
 			if (suffix)
 				printf("%s", suffix);
 			printf("\n");
@@ -392,11 +392,11 @@ static void describe(const char *arg, int last_one)
 		if (unannotated_cnt)
 			die(_("No annotated tags can describe '%s'.\n"
 			    "However, there were unannotated tags: try --tags."),
-			    oid_to_hex(oid));
+			    oid_to_hex(cmit_oid));
 		else
 			die(_("No tags can describe '%s'.\n"
 			    "Try --always, or create some tags."),
-			    oid_to_hex(oid));
+			    oid_to_hex(cmit_oid));
 	}
 
 	QSORT(all_matches, match_cnt, compare_pt);
-- 
2.15.0.7.g980e40477f


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv3 5/7] builtin/describe.c: print debug statements earlier
  2017-11-02 19:41     ` [PATCHv3 " Stefan Beller
                         ` (3 preceding siblings ...)
  2017-11-02 19:41       ` [PATCHv3 4/7] builtin/describe.c: rename `oid` to avoid variable shadowing Stefan Beller
@ 2017-11-02 19:41       ` Stefan Beller
  2017-11-14 19:55         ` Jonathan Tan
  2017-11-02 19:41       ` [PATCHv3 6/7] builtin/describe.c: factor out describe_commit Stefan Beller
                         ` (2 subsequent siblings)
  7 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-02 19:41 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me, schwab

For debuggers aid we'd want to print debug statements early, so
introduce a new line in the debug output that describes the whole
function, and then change the next debug output to describe why we
need to search. Conveniently drop the arg from the second line;
which will be useful in a follow up commit, that refactors the\
describe function.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/describe.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index fd61f463cf..3136efde31 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -293,6 +293,9 @@ static void describe(const char *arg, int last_one)
 	unsigned long seen_commits = 0;
 	unsigned int unannotated_cnt = 0;
 
+	if (debug)
+		fprintf(stderr, _("describe %s\n"), arg);
+
 	if (get_oid(arg, &oid))
 		die(_("Not a valid object name %s"), arg);
 	cmit = lookup_commit_reference(&oid);
@@ -316,7 +319,7 @@ static void describe(const char *arg, int last_one)
 	if (!max_candidates)
 		die(_("no tag exactly matches '%s'"), oid_to_hex(&cmit->object.oid));
 	if (debug)
-		fprintf(stderr, _("searching to describe %s\n"), arg);
+		fprintf(stderr, _("No exact match on refs or tags, searching to describe\n"));
 
 	if (!have_util) {
 		struct hashmap_iter iter;
-- 
2.15.0.7.g980e40477f


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv3 6/7] builtin/describe.c: factor out describe_commit
  2017-11-02 19:41     ` [PATCHv3 " Stefan Beller
                         ` (4 preceding siblings ...)
  2017-11-02 19:41       ` [PATCHv3 5/7] builtin/describe.c: print debug statements earlier Stefan Beller
@ 2017-11-02 19:41       ` Stefan Beller
  2017-11-02 19:41       ` [PATCHv3 7/7] builtin/describe.c: describe a blob Stefan Beller
  2017-11-03  0:23       ` [PATCHv3 0/7] git describe blob Jacob Keller
  7 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-02 19:41 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me, schwab

In the next patch we'll learn how to describe more than just commits,
so factor out describing commits into its own function.  That will make
the next patches easy as we still need to describe a commit as part of
describing blobs.

While factoring out the functionality to describe_commit, make sure
that any output to stdout is put into a strbuf, which we can print
afterwards, using puts which also adds the line ending.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/describe.c | 63 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 3136efde31..9e9a5ed5d4 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -256,7 +256,7 @@ static unsigned long finish_depth_computation(
 	return seen_commits;
 }
 
-static void display_name(struct commit_name *n)
+static void append_name(struct commit_name *n, struct strbuf *dst)
 {
 	if (n->prio == 2 && !n->tag) {
 		n->tag = lookup_tag(&n->oid);
@@ -272,19 +272,18 @@ static void display_name(struct commit_name *n)
 	}
 
 	if (n->tag)
-		printf("%s", n->tag->tag);
+		strbuf_addstr(dst, n->tag->tag);
 	else
-		printf("%s", n->path);
+		strbuf_addstr(dst, n->path);
 }
 
-static void show_suffix(int depth, const struct object_id *oid)
+static void append_suffix(int depth, const struct object_id *oid, struct strbuf *dst)
 {
-	printf("-%d-g%s", depth, find_unique_abbrev(oid->hash, abbrev));
+	strbuf_addf(dst, "-%d-g%s", depth, find_unique_abbrev(oid->hash, abbrev));
 }
 
-static void describe(const char *arg, int last_one)
+static void describe_commit(struct object_id *oid, struct strbuf *dst)
 {
-	struct object_id oid;
 	struct commit *cmit, *gave_up_on = NULL;
 	struct commit_list *list;
 	struct commit_name *n;
@@ -293,26 +292,18 @@ static void describe(const char *arg, int last_one)
 	unsigned long seen_commits = 0;
 	unsigned int unannotated_cnt = 0;
 
-	if (debug)
-		fprintf(stderr, _("describe %s\n"), arg);
-
-	if (get_oid(arg, &oid))
-		die(_("Not a valid object name %s"), arg);
-	cmit = lookup_commit_reference(&oid);
-	if (!cmit)
-		die(_("%s is not a valid '%s' object"), arg, commit_type);
+	cmit = lookup_commit_reference(oid);
 
 	n = find_commit_name(&cmit->object.oid);
 	if (n && (tags || all || n->prio == 2)) {
 		/*
 		 * Exact match to an existing ref.
 		 */
-		display_name(n);
+		append_name(n, dst);
 		if (longformat)
-			show_suffix(0, n->tag ? &n->tag->tagged->oid : &oid);
+			append_suffix(0, n->tag ? &n->tag->tagged->oid : oid, dst);
 		if (suffix)
-			printf("%s", suffix);
-		printf("\n");
+			strbuf_addstr(dst, suffix);
 		return;
 	}
 
@@ -386,10 +377,9 @@ static void describe(const char *arg, int last_one)
 	if (!match_cnt) {
 		struct object_id *cmit_oid = &cmit->object.oid;
 		if (always) {
-			printf("%s", find_unique_abbrev(cmit_oid->hash, abbrev));
+			strbuf_addstr(dst, find_unique_abbrev(cmit_oid->hash, abbrev));
 			if (suffix)
-				printf("%s", suffix);
-			printf("\n");
+				strbuf_addstr(dst, suffix);
 			return;
 		}
 		if (unannotated_cnt)
@@ -437,15 +427,36 @@ static void describe(const char *arg, int last_one)
 		}
 	}
 
-	display_name(all_matches[0].name);
+	append_name(all_matches[0].name, dst);
 	if (abbrev)
-		show_suffix(all_matches[0].depth, &cmit->object.oid);
+		append_suffix(all_matches[0].depth, &cmit->object.oid, dst);
 	if (suffix)
-		printf("%s", suffix);
-	printf("\n");
+		strbuf_addstr(dst, suffix);
+}
+
+static void describe(const char *arg, int last_one)
+{
+	struct object_id oid;
+	struct commit *cmit;
+	struct strbuf sb = STRBUF_INIT;
+
+	if (debug)
+		fprintf(stderr, _("describe %s\n"), arg);
+
+	if (get_oid(arg, &oid))
+		die(_("Not a valid object name %s"), arg);
+	cmit = lookup_commit_reference(&oid);
+	if (!cmit)
+		die(_("%s is not a valid '%s' object"), arg, commit_type);
+
+	describe_commit(&oid, &sb);
+
+	puts(sb.buf);
 
 	if (!last_one)
 		clear_commit_marks(cmit, -1);
+
+	strbuf_release(&sb);
 }
 
 int cmd_describe(int argc, const char **argv, const char *prefix)
-- 
2.15.0.7.g980e40477f


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCHv3 7/7] builtin/describe.c: describe a blob
  2017-11-02 19:41     ` [PATCHv3 " Stefan Beller
                         ` (5 preceding siblings ...)
  2017-11-02 19:41       ` [PATCHv3 6/7] builtin/describe.c: factor out describe_commit Stefan Beller
@ 2017-11-02 19:41       ` Stefan Beller
  2017-11-14 20:02         ` Jonathan Tan
  2017-11-03  0:23       ` [PATCHv3 0/7] git describe blob Jacob Keller
  7 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-02 19:41 UTC (permalink / raw)
  To: sbeller; +Cc: Johannes.Schindelin, git, jacob.keller, me, schwab

Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

"This is an interesting endeavor, because describing things is hard."
  -- me, upon writing this patch.

When describing commits, we try to anchor them to tags or refs, as these
are conceptually on a higher level than the commit. And if there is no ref
or tag that matches exactly, we're out of luck.  So we employ a heuristic
to make up a name for the commit. These names are ambiguous, there might
be different tags or refs to anchor to, and there might be different
path in the DAG to travel to arrive at the commit precisely.

When describing a blob, we want to describe the blob from a higher layer
as well, which is a tuple of (commit, deep/path) as the tree objects
involved are rather uninteresting.  The same blob can be referenced by
multiple commits, so how we decide which commit to use?  This patch
implements a rather naive approach on this: As there are no back pointers
from blobs to commits in which the blob occurs, we'll start walking from
any tips available, listing the blobs in-order of the commit and once we
found the blob, we'll take the first commit that listed the blob.  For
source code this is likely not the first commit that introduced the blob,
but rather the latest commit that contained the blob.  For example:

  git describe v0.99:Makefile
  v0.99-5-gab6625e06a:Makefile

tells us the latest commit that contained the Makefile as it was in tag
v0.99 is commit v0.99-5-gab6625e06a (and at the same path), as the next
commit on top v0.99-6-gb1de9de2b9 ([PATCH] Bootstrap "make dist",
2005-07-11) touches the Makefile.

Let's see how this description turns out, if it is useful in day-to-day
use as I have the intuition that we'd rather want to see the *first*
commit that this blob was introduced to the repository (which can be
achieved easily by giving the `--reverse` flag in the describe_blob rev
walk).

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-describe.txt | 13 ++++++++-
 builtin/describe.c             | 65 ++++++++++++++++++++++++++++++++++++++----
 t/t6120-describe.sh            | 15 ++++++++++
 3 files changed, 87 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index c924c945ba..79ec0be62a 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -3,7 +3,7 @@ git-describe(1)
 
 NAME
 ----
-git-describe - Describe a commit using the most recent tag reachable from it
+git-describe - Describe a commit or blob using the graph relations
 
 
 SYNOPSIS
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] [<commit-ish>...]
 'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] --dirty[=<mark>]
+'git describe' [<options>] <blob>
 
 DESCRIPTION
 -----------
@@ -24,6 +25,16 @@ By default (without --all or --tags) `git describe` only shows
 annotated tags.  For more information about creating annotated tags
 see the -a and -s options to linkgit:git-tag[1].
 
+If the given object refers to a blob, it will be described
+as `<commit-ish>:<path>`, such that the blob can be found
+at `<path>` in the `<commit-ish>`. Note, that the commit is likely
+not the commit that introduced the blob, but the one that was found
+first; to find the commit that introduced the blob, you need to find
+the commit that last touched the path, e.g.
+`git log <commit-description> -- <path>`.
+As blobs do not point at the commits they are contained in,
+describing blobs is slow as we have to walk the whole graph.
+
 OPTIONS
 -------
 <commit-ish>...::
diff --git a/builtin/describe.c b/builtin/describe.c
index 9e9a5ed5d4..cf08bef344 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -3,6 +3,7 @@
 #include "lockfile.h"
 #include "commit.h"
 #include "tag.h"
+#include "blob.h"
 #include "refs.h"
 #include "builtin.h"
 #include "exec_cmd.h"
@@ -11,8 +12,9 @@
 #include "hashmap.h"
 #include "argv-array.h"
 #include "run-command.h"
+#include "revision.h"
+#include "list-objects.h"
 
-#define SEEN		(1u << 0)
 #define MAX_TAGS	(FLAG_BITS - 1)
 
 static const char * const describe_usage[] = {
@@ -434,6 +436,56 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 		strbuf_addstr(dst, suffix);
 }
 
+struct process_commit_data {
+	struct object_id current_commit;
+	struct object_id looking_for;
+	struct strbuf *dst;
+	struct rev_info *revs;
+};
+
+static void process_commit(struct commit *commit, void *data)
+{
+	struct process_commit_data *pcd = data;
+	pcd->current_commit = commit->object.oid;
+}
+
+static void process_object(struct object *obj, const char *path, void *data)
+{
+	struct process_commit_data *pcd = data;
+
+	if (!oidcmp(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
+		reset_revision_walk();
+		describe_commit(&pcd->current_commit, pcd->dst);
+		strbuf_addf(pcd->dst, ":%s", path);
+		pcd->revs->max_count = 0;
+	}
+}
+
+static void describe_blob(struct object_id oid, struct strbuf *dst)
+{
+	struct rev_info revs;
+	struct argv_array args = ARGV_ARRAY_INIT;
+	struct process_commit_data pcd = { null_oid, oid, dst, &revs};
+
+	argv_array_pushl(&args, "internal: The first arg is not parsed",
+		"--all", "--reflog", /* as many starting points as possible */
+		/* NEEDSWORK: --all is incompatible with worktrees for now: */
+		"--single-worktree",
+		"--objects",
+		"--in-commit-order",
+		NULL);
+
+	init_revisions(&revs, NULL);
+	if (setup_revisions(args.argc, args.argv, &revs, NULL) > 1)
+		BUG("setup_revisions could not handle all args?");
+
+	if (prepare_revision_walk(&revs))
+		die("revision walk setup failed");
+
+	traverse_commit_list(&revs, process_commit, process_object, &pcd);
+	reset_revision_walk();
+}
+
 static void describe(const char *arg, int last_one)
 {
 	struct object_id oid;
@@ -445,11 +497,14 @@ static void describe(const char *arg, int last_one)
 
 	if (get_oid(arg, &oid))
 		die(_("Not a valid object name %s"), arg);
-	cmit = lookup_commit_reference(&oid);
-	if (!cmit)
-		die(_("%s is not a valid '%s' object"), arg, commit_type);
+	cmit = lookup_commit_reference_gently(&oid, 1);
 
-	describe_commit(&oid, &sb);
+	if (cmit)
+		describe_commit(&oid, &sb);
+	else if (lookup_blob(&oid))
+		describe_blob(oid, &sb);
+	else
+		die(_("%s is neither a commit nor blob"), arg);
 
 	puts(sb.buf);
 
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index c8b7ed82d9..aec6ed192d 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -310,6 +310,21 @@ test_expect_success 'describe ignoring a broken submodule' '
 	grep broken out
 '
 
+test_expect_success 'describe a blob at a tag' '
+	echo "make it a unique blob" >file &&
+	git add file && git commit -m "content in file" &&
+	git tag -a -m "latest annotated tag" unique-file &&
+	git describe HEAD:file >actual &&
+	echo "unique-file:file" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'describe a blob with commit-ish' '
+	git commit --allow-empty -m "empty commit" &&
+	git describe HEAD:file >actual &&
+	grep unique-file-1-g actual
+'
+
 test_expect_failure ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
 	i=1 &&
 	while test $i -lt 8000
-- 
2.15.0.7.g980e40477f


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv3 0/7] git describe blob
  2017-11-02 19:41     ` [PATCHv3 " Stefan Beller
                         ` (6 preceding siblings ...)
  2017-11-02 19:41       ` [PATCHv3 7/7] builtin/describe.c: describe a blob Stefan Beller
@ 2017-11-03  0:23       ` Jacob Keller
  2017-11-03  1:46         ` Junio C Hamano
  7 siblings, 1 reply; 110+ messages in thread
From: Jacob Keller @ 2017-11-03  0:23 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Johannes Schindelin, Git mailing list, Kevin Daudt, Andreas Schwab

On Thu, Nov 2, 2017 at 12:41 PM, Stefan Beller <sbeller@google.com> wrote:
> Thanks for the discussion on v2[1].
>
> Interdiff is below, just fixing minor things.
>
> We'll keep the original algorithm for now, deferring an improvement on
> the algorithm front towards a future developer.
>

I agree, "something" is better than "nothing", and we can work to
improve "something" in the future, especially after we get more real
use and see what people think. Only question would be how much do we
need to document the current behavior might be open for improvement?

> Thanks,
> Stefan
>
> [1] https://public-inbox.org/git/20171031211852.13001-1-sbeller@google.com/
>
> Stefan Beller (7):
>   t6120: fix typo in test name
>   list-objects.c: factor out traverse_trees_and_blobs
>   revision.h: introduce blob/tree walking in order of the commits
>   builtin/describe.c: rename `oid` to avoid variable shadowing
>   builtin/describe.c: print debug statements earlier
>   builtin/describe.c: factor out describe_commit
>   builtin/describe.c: describe a blob
>
>  Documentation/git-describe.txt     |  13 +++-
>  Documentation/rev-list-options.txt |   5 ++
>  builtin/describe.c                 | 125 ++++++++++++++++++++++++++++---------
>  list-objects.c                     |  52 +++++++++------
>  revision.c                         |   2 +
>  revision.h                         |   3 +-
>  t/t6100-rev-list-in-order.sh       |  47 ++++++++++++++
>  t/t6120-describe.sh                |  17 ++++-
>  8 files changed, 214 insertions(+), 50 deletions(-)
>  create mode 100755 t/t6100-rev-list-in-order.sh
> diff --git c/Documentation/git-describe.txt w/Documentation/git-describe.txt
> index 077c3c2193..79ec0be62a 100644
> --- c/Documentation/git-describe.txt
> +++ w/Documentation/git-describe.txt
> @@ -11,7 +11,7 @@ SYNOPSIS
>  [verse]
>  'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] [<commit-ish>...]
>  'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] --dirty[=<mark>]
> -'git describe' [<options>] <blob-ish>
> +'git describe' [<options>] <blob>
>
>  DESCRIPTION
>  -----------
> diff --git c/builtin/describe.c w/builtin/describe.c
> index 382cbae908..cf08bef344 100644
> --- c/builtin/describe.c
> +++ w/builtin/describe.c
> @@ -440,6 +440,7 @@ struct process_commit_data {
>         struct object_id current_commit;
>         struct object_id looking_for;
>         struct strbuf *dst;
> +       struct rev_info *revs;
>  };
>
>  static void process_commit(struct commit *commit, void *data)
> @@ -456,6 +457,7 @@ static void process_object(struct object *obj, const char *path, void *data)
>                 reset_revision_walk();
>                 describe_commit(&pcd->current_commit, pcd->dst);
>                 strbuf_addf(pcd->dst, ":%s", path);
> +               pcd->revs->max_count = 0;
>         }
>  }
>
> @@ -463,7 +465,7 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
>  {
>         struct rev_info revs;
>         struct argv_array args = ARGV_ARRAY_INIT;
> -       struct process_commit_data pcd = { null_oid, oid, dst};
> +       struct process_commit_data pcd = { null_oid, oid, dst, &revs};
>
>         argv_array_pushl(&args, "internal: The first arg is not parsed",
>                 "--all", "--reflog", /* as many starting points as possible */
> @@ -497,14 +499,12 @@ static void describe(const char *arg, int last_one)
>                 die(_("Not a valid object name %s"), arg);
>         cmit = lookup_commit_reference_gently(&oid, 1);
>
> -       if (cmit) {
> +       if (cmit)
>                 describe_commit(&oid, &sb);
> -       } else {
> -               if (lookup_blob(&oid))
> -                       describe_blob(oid, &sb);
> -               else
> -                       die(_("%s is neither a commit nor blob"), arg);
> -       }
> +       else if (lookup_blob(&oid))
> +               describe_blob(oid, &sb);
> +       else
> +               die(_("%s is neither a commit nor blob"), arg);
>
>         puts(sb.buf);
>
> diff --git c/list-objects.c w/list-objects.c
> index 03438e5a8b..07a92f35fe 100644
> --- c/list-objects.c
> +++ w/list-objects.c
> @@ -184,13 +184,13 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
>  }
>
>  static void traverse_trees_and_blobs(struct rev_info *revs,
> -                                    struct strbuf *base_path,
> +                                    struct strbuf *base,
>                                      show_object_fn show_object,
>                                      void *data)
>  {
>         int i;
>
> -       assert(base_path->len == 0);
> +       assert(base->len == 0);
>
>         for (i = 0; i < revs->pending.nr; i++) {
>                 struct object_array_entry *pending = revs->pending.objects + i;
> @@ -208,12 +208,12 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
>                         path = "";
>                 if (obj->type == OBJ_TREE) {
>                         process_tree(revs, (struct tree *)obj, show_object,
> -                                    base_path, path, data);
> +                                    base, path, data);
>                         continue;
>                 }
>                 if (obj->type == OBJ_BLOB) {
>                         process_blob(revs, (struct blob *)obj, show_object,
> -                                    base_path, path, data);
> +                                    base, path, data);
>                         continue;
>                 }
>                 die("unknown pending object %s (%s)",
> diff --git c/t/t6100-rev-list-in-order.sh w/t/t6100-rev-list-in-order.sh
> index 651666979b..d4d539b0da 100755
> --- c/t/t6100-rev-list-in-order.sh
> +++ w/t/t6100-rev-list-in-order.sh
> @@ -1,21 +1,22 @@
>  #!/bin/sh
>
> -test_description='miscellaneous rev-list tests'
> +test_description='rev-list testing in-commit-order'
>
>  . ./test-lib.sh
>
> -
> -test_expect_success 'setup' '
> +test_expect_success 'rev-list --in-commit-order' '
>         for x in one two three four
>         do
>                 echo $x >$x &&
>                 git add $x &&
> -               git commit -m "add file $x"
> +               git commit -m "add file $x" ||
> +               return 1
>         done &&
>         for x in four three
>         do
>                 git rm $x &&
> -               git commit -m "remove $x"
> +               git commit -m "remove $x" ||
> +               return 1
>         done &&
>         git rev-list --in-commit-order --objects HEAD >actual.raw &&
>         cut -c 1-40 >actual <actual.raw &&
> @@ -32,9 +33,9 @@ test_expect_success 'setup' '
>                 HEAD~2^{tree}
>                 HEAD~2^{tree}:four
>                 HEAD~3^{commit}
> -               # HEAD~3^{tree} skipped
> +               # HEAD~3^{tree} skipped, same as HEAD~1^{tree}
>                 HEAD~4^{commit}
> -               # HEAD~4^{tree} skipped
> +               # HEAD~4^{tree} skipped, same as HEAD^{tree}
>                 HEAD~5^{commit}
>                 HEAD~5^{tree}
>         EOF
> diff --git c/t/t6120-describe.sh w/t/t6120-describe.sh
> index fd329f173a..aec6ed192d 100755
> --- c/t/t6120-describe.sh
> +++ w/t/t6120-describe.sh
> @@ -319,7 +319,7 @@ test_expect_success 'describe a blob at a tag' '
>         test_cmp expect actual
>  '
>
> -test_expect_success 'describe a surviving blob' '
> +test_expect_success 'describe a blob with commit-ish' '
>         git commit --allow-empty -m "empty commit" &&
>         git describe HEAD:file >actual &&
>         grep unique-file-1-g actual

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv3 0/7] git describe blob
  2017-11-03  0:23       ` [PATCHv3 0/7] git describe blob Jacob Keller
@ 2017-11-03  1:46         ` Junio C Hamano
  2017-11-03  2:29           ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2017-11-03  1:46 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Stefan Beller, Johannes Schindelin, Git mailing list,
	Kevin Daudt, Andreas Schwab

Jacob Keller <jacob.keller@gmail.com> writes:

> I agree, "something" is better than "nothing", and we can work to
> improve "something" in the future, especially after we get more real
> use and see what people think. Only question would be how much do we
> need to document the current behavior might be open for improvement?

If 

 - it always digs to the root of the history no matter what; and/or
 - it almost always yields correct but suboptimal result

then that fact must be documented in BUGS section, possibly a brief
descrition of why that limitation is there, with a hint to invite
people to look into fixing it.

We should mark it prominently as experimental and advertise it as
such.  "It's too slow in real project to be usable" and "Its output
bases the answer on an irrelevant commit" are not something we want
our users to experience, except for those with inclination to (or
ability and time to) help improve the feature.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv3 0/7] git describe blob
  2017-11-03  1:46         ` Junio C Hamano
@ 2017-11-03  2:29           ` Stefan Beller
  0 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-03  2:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jacob Keller, Johannes Schindelin, Git mailing list, Kevin Daudt,
	Andreas Schwab

On Thu, Nov 2, 2017 at 6:46 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jacob Keller <jacob.keller@gmail.com> writes:
>
>> I agree, "something" is better than "nothing", and we can work to
>> improve "something" in the future, especially after we get more real
>> use and see what people think. Only question would be how much do we
>> need to document the current behavior might be open for improvement?
>
> If
>
>  - it always digs to the root of the history no matter what; and/or

this is fixed.

>  - it almost always yields correct but suboptimal result


this is not, for the lack of knowing what the optimal result is.

>
> then that fact must be documented in BUGS section, possibly a brief
> descrition of why that limitation is there, with a hint to invite
> people to look into fixing it.
>
> We should mark it prominently as experimental and advertise it as
> such.  "It's too slow in real project to be usable"

I found it quite fast after fixing the history walking, but still.

> and "Its output
> bases the answer on an irrelevant commit" are not something we want
> our users to experience, except for those with inclination to (or
> ability and time to) help improve the feature.

I think the current documentation states exactly this.

Thanks.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-02  6:05                     ` Jacob Keller
@ 2017-11-03  5:18                       ` Junio C Hamano
  2017-11-03  6:55                         ` Jacob Keller
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2017-11-03  5:18 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Johannes Schindelin, Stefan Beller, git, Kevin Daudt

Jacob Keller <jacob.keller@gmail.com> writes:

> I think both questions are valuable, the first is "which commit last
> had this blob", the second  is "which commit first introduced this
> blob", neither of which can always give a definitive answer. It really
> depends on what question you're asking up front.

Given that "describe" is about giving an object _a_ name that is
plausibly useful to refer to it, it is not a good match for the
above query that wants to know where it came from, how long it
remains in the tree and where it ceases to be relevant.  In order to
support that use case, a totally different and possibly more useful
avenue would be to think how this can be hooked into "git log"
machinery.

A refresher for how "git log [--options] <pathspec>" works may be
beneficial.  We walk history and compare the tree of the commit we
are looking at with the tree of its parent commits.  If everything
within <pathspec> is the same, we mark the transition between the
parent and our commit TREESAME (other commits, i.e. the ones that
have meaningful change within <pathspec>, are !TREESAME).  Then the
output routine presents the set of commits that includes commits
that are !TREESAME, within the constraints of the --options given
(e.g. with --full-history, sides of a merge that is TREESAME may
still be shown to preserve connectedness of the resulting graph).

It is easy to imagine that we can restrict "git log" traversal with
a "--blobchange=<blob>" option instead of (or in addition to) the
limitation <pathspec> gives us.  Instead of treating a commit whose
diff against its parent commit has any filepair that is different
within <pathspec> as "!TREESAME", we can treat a commit whose diff
against its parent commit has a filepair that has the <blob> on
either side of the filepair as "!TREESAME" (in other words, we
ignore a transition that is not about introducing or forgetting the
<blob> we are looking for as an "interesting change").  That would
give you a commit history graph in which only (and all) such commits
that either adds or removes the <blob> in it.

Hmm?

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-03  5:18                       ` Junio C Hamano
@ 2017-11-03  6:55                         ` Jacob Keller
  2017-11-03 15:02                           ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Jacob Keller @ 2017-11-03  6:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Stefan Beller, git, Kevin Daudt

On Thu, Nov 2, 2017 at 10:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jacob Keller <jacob.keller@gmail.com> writes:
>
>> I think both questions are valuable, the first is "which commit last
>> had this blob", the second  is "which commit first introduced this
>> blob", neither of which can always give a definitive answer. It really
>> depends on what question you're asking up front.
>
> Given that "describe" is about giving an object _a_ name that is
> plausibly useful to refer to it, it is not a good match for the
> above query that wants to know where it came from, how long it
> remains in the tree and where it ceases to be relevant.  In order to
> support that use case, a totally different and possibly more useful
> avenue would be to think how this can be hooked into "git log"
> machinery.
>
> A refresher for how "git log [--options] <pathspec>" works may be
> beneficial.  We walk history and compare the tree of the commit we
> are looking at with the tree of its parent commits.  If everything
> within <pathspec> is the same, we mark the transition between the
> parent and our commit TREESAME (other commits, i.e. the ones that
> have meaningful change within <pathspec>, are !TREESAME).  Then the
> output routine presents the set of commits that includes commits
> that are !TREESAME, within the constraints of the --options given
> (e.g. with --full-history, sides of a merge that is TREESAME may
> still be shown to preserve connectedness of the resulting graph).
>
> It is easy to imagine that we can restrict "git log" traversal with
> a "--blobchange=<blob>" option instead of (or in addition to) the
> limitation <pathspec> gives us.  Instead of treating a commit whose
> diff against its parent commit has any filepair that is different
> within <pathspec> as "!TREESAME", we can treat a commit whose diff
> against its parent commit has a filepair that has the <blob> on
> either side of the filepair as "!TREESAME" (in other words, we
> ignore a transition that is not about introducing or forgetting the
> <blob> we are looking for as an "interesting change").  That would
> give you a commit history graph in which only (and all) such commits
> that either adds or removes the <blob> in it.
>
> Hmm?

This seems quite useful in the context of figuring out how a file got
to such a state. This is useful to me, since if I know the state of a
file (ie: it's exact contents) I can determine the blob name, and then
use that to lookup where it was introduced.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-02 18:18                       ` Stefan Beller
@ 2017-11-03 12:05                         ` Johannes Schindelin
  0 siblings, 0 replies; 110+ messages in thread
From: Johannes Schindelin @ 2017-11-03 12:05 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Andreas Schwab, Jacob Keller, Junio C Hamano, git, Kevin Daudt

Hi Stefan,

On Thu, 2 Nov 2017, Stefan Beller wrote:

> On Thu, Nov 2, 2017 at 12:23 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> > On Nov 01 2017, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> >
> >> Sure, but it is still a tricky thing. Imagine
> >>
> >> - A1 - B1 - A2 - B2 - B3
> >>
> >> where all the B* commits have the blob. Do you really want to report B1
> >> rather than B2 as the commit introducing the blob? (I would prefer B2...)
> >
> > What if B3 renames or copies the blob?
> >
> > Andreas.
> 
> With the current proposed patch you'd find B3, and then use the diff machinery
> to digg deeper from there (renames/copies ought to be easy to detect already?)
> 
> So with a copy B3 might be a better start than B1, as starting from B1 you
> would not find B3 easily.
> 
> For a rename, I would think a reverse log/blame on B1:path may help.
> 
> With that said, I think I'll just reroll the series with the current logic
> fixing the other minor issues that were brought up as B3 seems to
> be the most versatile (though not optimal) answer for many use cases.

I know this is a bit of semantics, but I disagree that B3 is the most
versatile. For my use cases, `git describe` comes in handy relatively
rarely, and usually only when I want to know what version some work was
based on. In that respect, B2 would be the most versatile answer.

However, if you say that B3 is the easiest answer to explain, I
whole-heartedly agree. Any other answer would be necessarily more
complicated to reason about, given that we're operating on a DAG, i.e. not
always on a linear history.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv2 6/7] builtin/describe.c: describe a blob
  2017-11-03  6:55                         ` Jacob Keller
@ 2017-11-03 15:02                           ` Junio C Hamano
  0 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2017-11-03 15:02 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Stefan Beller, Kevin Daudt, Jacob Keller

Jacob Keller <jacob.keller@gmail.com> writes:

> On Thu, Nov 2, 2017 at 10:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> ...
>> It is easy to imagine that we can restrict "git log" traversal with
>> a "--blobchange=<blob>" option instead of (or in addition to) the
>> limitation <pathspec> gives us.  Instead of treating a commit whose
>> diff against its parent commit has any filepair that is different
>> within <pathspec> as "!TREESAME", we can treat a commit whose diff
>> against its parent commit has a filepair that has the <blob> on
>> either side of the filepair as "!TREESAME" (in other words, we
>> ignore a transition that is not about introducing or forgetting the
>> <blob> we are looking for as an "interesting change").  That would
>> give you a commit history graph in which only (and all) such commits
>> that either adds or removes the <blob> in it.
>>
>> Hmm?
>
> This seems quite useful in the context of figuring out how a file got
> to such a state. This is useful to me, since if I know the state of a
> file (ie: it's exact contents) I can determine the blob name, and then
> use that to lookup where it was introduced.

This is probably a bit harder than an average #leftoverbits, but if
somebody wants to get their hands dirty, it should be reasonably
straightforward.  The needed changes would roughly go like so:

 - Add "struct oid *blob_change;" to diff_options, initialized to
   NULL.

 - Teach diff_opt_parse() a new option "--blobchange=<blob>".
   Allocate a struct oid when you parse this option and point at it
   with the blob_change field above.

 - Write diffcore-blobchange.c, modeled after diffcore-pickaxe.c,
   but this should be a lot simpler (as there is no need for an
   equivalent for "pickaxe-all" option).  It would

   - prepare an empty diff_queue_struct "outq";
   - iterate over the given diff_queue "q", and 
     - a filepair p whose p->one is blob_change and p->two is not,
       (or the other way around) is added to outq with diff_q()
     - a filepair whose p->one/p->two do not involve blob_change
       is freed with diff_free_filepair().
   - replace "q" with "outq".

 - Add a call to diffcore_blobchange() early in diffcore_std(),
   probably immediately after skip-stat-unmatch, when blob_change
   field is not NULL.

 - Arrange that blob_change is propagated to revs->pruning in
   setup_revisions().


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-11-02  4:23               ` Junio C Hamano
@ 2017-11-04 21:15                 ` Philip Oakley
  2017-11-05  6:28                   ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Philip Oakley @ 2017-11-04 21:15 UTC (permalink / raw)
  To: Junio C Hamano, Stefan Beller
  Cc: git, Kevin Daudt, Jacob Keller, Johannes Schindelin

From: "Junio C Hamano" <gitster@pobox.com>
Sent: Thursday, November 02, 2017 4:23 AM
> Junio C Hamano <gitster@pobox.com> writes:
>
>> The reason why we say "-ish" is "Yes we know v2.15.0 is *NOT* a
>> commit object, we very well know it is a tag object, but because we
>> allow it to be used in a context that calls for a commit object, we
>> mark that use context as 'this accepts commit-ish, not just
>> commit'".
>
> Having said all that, there is a valid case in which we might want
> to say "blob-ish".

Is this not also an alternative case, relative to the user, for the scenario 
where the user has an oid/sha1 value but does not know what it is, and would 
like to find its source and type relative to the `describe` command.

IIUC the existing `describe` command only accepts <commit-ish> values, and 
here we are extending that to be even more inclusive, but at the same time 
the options become more restricted. Thus the synopsis terminology would be 
more about suggesting the range of options available (search style/start 
points) that are applicable to blobs, than being exactly about the 
'allow-blobs' parameter.

Or have I misunderstood how the fast commit search and the slower 
potentially-a-blob searching are disambiguated?
--
Philip

>
> To review, X-ish is the word we use when the command wants to take
> an X, but tolerates a lazy user who gives a Y, which is *NOT* X,
> without bothering to add ^{X} suffix, i.e. Y^{X}.  In such a case,
> the command takes not just X but takes X-ish because it takes a Y
> and converts it internally to an X to be extra nice.
>
> When the command wants to take a blob, but tolerates something else
> and does "^{blob}" internally, we can say it takes "blob-ish".
> Technically that "something else" could be an annotated tag that
> points at a blob object, without any intervening commit or tree (I
> did not check if the "describe <blob>" code in this thread handles
> this, though).
>
> But because it is not usually done to tag a blob directly, it would
> probably be not worth to say "blob-ish" in the document and cause
> readers to wonder in what situation something that is not a blob can
> be treated as if it were a blob.  It does feel like we would be
> pursuing technical correctness too much and sacrificing the readability
> of the document, at least to me, and a bad trade-off.
>
> 


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-11-04 21:15                 ` Philip Oakley
@ 2017-11-05  6:28                   ` Junio C Hamano
  2017-11-06 23:50                     ` Philip Oakley
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2017-11-05  6:28 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Stefan Beller, git, Kevin Daudt, Jacob Keller, Johannes Schindelin

"Philip Oakley" <philipoakley@iee.org> writes:

> Is this not also an alternative case, relative to the user, for the
> scenario where the user has an oid/sha1 value but does not know what
> it is, and would like to find its source and type relative to the
> `describe` command.

I am not sure what you wanted to say with "source and type RELATIVE TO
the describe command".

The first thing the combination of the user and the describe command
would do when the user has a 40-hex string would be to do the
equivalent of "cat-file -t" to learn if it even exists and what its
type is.  With Stefan's patch, that is what describe command does in
order to choose quite a different codeflow from the traditional mode
when it learns that it was given a blob.

> IIUC the existing `describe` command only accepts <commit-ish> values,
> and here we are extending that to be even more inclusive, but at the
> same time the options become more restricted.

Do you mean that the command should check if it was given an option
that would not be applicable to the "find a commit that has the
blob" mode, once it learns that it was given a blob and needs to go
in that codepath?  I think that would make sense.

> Or have I misunderstood how the fast commit search and the slower
> potentially-a-blob searching are disambiguated?

I do not think so.  We used to barf when we got anything but
commit-ish, but Stefan's new code kicks in if the object turns out
to be a blob---I think that is what you mean by the disambiguation.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-11-05  6:28                   ` Junio C Hamano
@ 2017-11-06 23:50                     ` Philip Oakley
  2017-11-09 20:30                       ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Philip Oakley @ 2017-11-06 23:50 UTC (permalink / raw)
  To: Junio C Hamano, Stefan Beller
  Cc: git, Kevin Daudt, Jacob Keller, Johannes Schindelin

From: "Junio C Hamano" <gitster@pobox.com>
Sent: Sunday, November 05, 2017 6:28 AM
> "Philip Oakley" <philipoakley@iee.org> writes:
>
>> Is this not also an alternative case, relative to the user, for the
>> scenario where the user has an oid/sha1 value but does not know what
>> it is, and would like to find its source and type relative to the
>> `describe` command.
>
> I am not sure what you wanted to say with "source and type RELATIVE TO
> the describe command".

The 'relative to' was meaning the user's expectation about this particular 
command. For a non-expert user, who may not have come across cat-file yet, 
their world view may not extend beyond 'Git describe <this>' for me.

>
> The first thing the combination of the user and the describe command
> would do when the user has a 40-hex string would be to do the
> equivalent of "cat-file -t" to learn if it even exists and what its
> type is.  With Stefan's patch, that is what describe command does in
> order to choose quite a different codeflow from the traditional mode
> when it learns that it was given a blob.

I realised, after sending, that this was probably the method for 
non-ambiguous shortened oid's. Thanks for the reminder.
>
>> IIUC the existing `describe` command only accepts <commit-ish> values,
>> and here we are extending that to be even more inclusive, but at the
>> same time the options become more restricted.
>
> Do you mean that the command should check if it was given an option
> that would not be applicable to the "find a commit that has the
> blob" mode, once it learns that it was given a blob and needs to go
> in that codepath?  I think that would make sense.

Correct, it was the option selection aspect.
>
>> Or have I misunderstood how the fast commit search and the slower
>> potentially-a-blob searching are disambiguated?
>
> I do not think so.  We used to barf when we got anything but
> commit-ish, but Stefan's new code kicks in if the object turns out
> to be a blob---I think that is what you mean by the disambiguation.

Correct. We ask to describe an object, but then the option choices may vary 
by type.

The new [blob] synopys only lists <options>, while the old [commit-ish] 
shows specifics. It wasn't clear if the options are the same for both. I 
quess they are the same once the cat-file -t has done its bit. Its only the 
speed that's affected.

As a side note, the commit message example don't show any pathspec that is 
not in the top level directory.

--
Philip 


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-11-06 23:50                     ` Philip Oakley
@ 2017-11-09 20:30                       ` Stefan Beller
  2017-11-10  0:25                         ` Philip Oakley
  0 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-09 20:30 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Junio C Hamano, git, Kevin Daudt, Jacob Keller, Johannes Schindelin

Rereading this discussion, there is currently no urgent thing to address?
Then the state as announced by the last cooking email, to just cook it, seems
about right and we'll wait for further feedback.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-11-09 20:30                       ` Stefan Beller
@ 2017-11-10  0:25                         ` Philip Oakley
  2017-11-10  1:24                           ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Philip Oakley @ 2017-11-10  0:25 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, git, Kevin Daudt, Jacob Keller, Johannes Schindelin

From: "Stefan Beller" <sbeller@google.com>
> Rereading this discussion, there is currently no urgent thing to address?

True.

> Then the state as announced by the last cooking email, to just cook it, 
> seems
> about right and we'll wait for further feedback.

Possibly only checking the documenation aspects, so folks don't fall into 
the same trap as me.. ;-)
--
Philip 


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-11-10  0:25                         ` Philip Oakley
@ 2017-11-10  1:24                           ` Junio C Hamano
  2017-11-10 22:44                             ` [PATCH 0/1] describe a blob: with better docs Stefan Beller
  2017-11-20 15:22                             ` [PATCH 6/7] " Philip Oakley
  0 siblings, 2 replies; 110+ messages in thread
From: Junio C Hamano @ 2017-11-10  1:24 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Stefan Beller, git, Kevin Daudt, Jacob Keller, Johannes Schindelin

"Philip Oakley" <philipoakley@iee.org> writes:

> From: "Stefan Beller" <sbeller@google.com>
>> Rereading this discussion, there is currently no urgent thing to address?
>
> True.
>
>> Then the state as announced by the last cooking email, to just cook
>> it, seems
>> about right and we'll wait for further feedback.

A shiny new toy that is not a fix for a grave bug is rarely urgent,
so with that criterion, we'd end up with hundreds of topics not in
'next' but in 'pu' waiting for the original contributor to get out
of his or her procrastination, which certainly is not what I want to
see, as I'd have to throw them into the Stalled bin and then
eventually discard them, while having to worry about possible
mismerges with remaining good topics caused by these topics
appearing and disappearing from 'pu'.

I'd rather see any topic that consumed reviewers' time to be
polished enough to get into 'next' while we all recall the issues
raised during previous reviews.  I consider the process to further
incrementally polish it after that happens a true "cooking".

For this topic, aside from "known issues" that we decided to punt
for now, my impression was that the code is in good enough shape,
and we need a bit of documentation polishes before I can mark it
as "Will merge to 'next'".

> Possibly only checking the documenation aspects, so folks don't fall
> into the same trap as me.. ;-)

Yup, so let's resolve that documentation thing while we remember
that the topic has that issue, and what part of the documentation
we find needs improvement.

I am not sure what "trap: you fell into, though.  Are you saying
that giving

	git describe [<option to describe a commit>...] <commit-ish>
	git describe [<option to describe a blob>...] <blob>

in the synopsis is not helpful, because the user may not know what
kind of object s/he has, and cannot decide from which set of options
to pick?  Then an alternative would be to list

	git describe [<option>...] <object>

in the synopsis, say upfront that most options are applicable only
when describing a commit-ish, and when describing a blob, we do
quite different thing and a separate set of options apply, perhaps?



^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH 0/1] describe a blob: with better docs
  2017-11-10  1:24                           ` Junio C Hamano
@ 2017-11-10 22:44                             ` Stefan Beller
  2017-11-10 22:44                               ` [PATCH] builtin/describe.c: describe a blob Stefan Beller
  2017-11-20 15:22                             ` [PATCH 6/7] " Philip Oakley
  1 sibling, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-10 22:44 UTC (permalink / raw)
  To: gitster; +Cc: Johannes.Schindelin, git, jacob.keller, me, philipoakley, sbeller

This replaces the last patch of origin/sb/describe-blob.
Interdiff is below.

I chose to not mention options at all, as currently none are applicable;
Check for options and tell the user by die()ing that we don't know about
the options for blobs.

Thanks,
Stefan

  builtin/describe.c: describe a blob

 Documentation/git-describe.txt | 13 +++++++-
 builtin/describe.c             | 71 ++++++++++++++++++++++++++++++++++++++----
 t/t6120-describe.sh            | 15 +++++++++
 3 files changed, 92 insertions(+), 7 deletions(-)

-- 
2.15.0.128.gcadd42da22
diff --git c/Documentation/git-describe.txt w/Documentation/git-describe.txt
index 79ec0be62a..a25443ca91 100644
--- c/Documentation/git-describe.txt
+++ w/Documentation/git-describe.txt
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] [<commit-ish>...]
 'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] --dirty[=<mark>]
-'git describe' [<options>] <blob>
+'git describe' <blob>
 
 DESCRIPTION
 -----------
diff --git c/builtin/describe.c w/builtin/describe.c
index cf08bef344..acfd853a30 100644
--- c/builtin/describe.c
+++ w/builtin/describe.c
@@ -501,9 +501,13 @@ static void describe(const char *arg, int last_one)
 
 	if (cmit)
 		describe_commit(&oid, &sb);
-	else if (lookup_blob(&oid))
+	else if (lookup_blob(&oid)) {
+		if (all || tags || longformat || first_parent ||
+		    patterns.nr || exclude_patterns.nr ||
+		    always || dirty || broken)
+			die(_("options not available for describing blobs"));
 		describe_blob(oid, &sb);
-	else
+	} else
 		die(_("%s is neither a commit nor blob"), arg);
 
 	puts(sb.buf);

^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH] builtin/describe.c: describe a blob
  2017-11-10 22:44                             ` [PATCH 0/1] describe a blob: with better docs Stefan Beller
@ 2017-11-10 22:44                               ` Stefan Beller
  2017-11-13  1:33                                 ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-10 22:44 UTC (permalink / raw)
  To: gitster; +Cc: Johannes.Schindelin, git, jacob.keller, me, philipoakley, sbeller

Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

When describing commits, we try to anchor them to tags or refs, as these
are conceptually on a higher level than the commit. And if there is no ref
or tag that matches exactly, we're out of luck.  So we employ a heuristic
to make up a name for the commit. These names are ambiguous, there might
be different tags or refs to anchor to, and there might be different
path in the DAG to travel to arrive at the commit precisely.

When describing a blob, we want to describe the blob from a higher layer
as well, which is a tuple of (commit, deep/path) as the tree objects
involved are rather uninteresting.  The same blob can be referenced by
multiple commits, so how we decide which commit to use?  This patch
implements a rather naive approach on this: As there are no back pointers
from blobs to commits in which the blob occurs, we'll start walking from
any tips available, listing the blobs in-order of the commit and once we
found the blob, we'll take the first commit that listed the blob.  For
source code this is likely not the first commit that introduced the blob,
but rather the latest commit that contained the blob.  For example:

  git describe v0.99:Makefile
  v0.99-5-gab6625e06a:Makefile

tells us the latest commit that contained the Makefile as it was in tag
v0.99 is commit v0.99-5-gab6625e06a (and at the same path), as the next
commit on top v0.99-6-gb1de9de2b9 ([PATCH] Bootstrap "make dist",
2005-07-11) touches the Makefile.

Let's see how this description turns out, if it is useful in day-to-day
use as I have the intuition that we'd rather want to see the *first*
commit that this blob was introduced to the repository (which can be
achieved easily by giving the `--reverse` flag in the describe_blob rev
walk).

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-describe.txt | 13 +++++++-
 builtin/describe.c             | 71 ++++++++++++++++++++++++++++++++++++++----
 t/t6120-describe.sh            | 15 +++++++++
 3 files changed, 92 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index c924c945ba..a25443ca91 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -3,7 +3,7 @@ git-describe(1)
 
 NAME
 ----
-git-describe - Describe a commit using the most recent tag reachable from it
+git-describe - Describe a commit or blob using the graph relations
 
 
 SYNOPSIS
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] [<commit-ish>...]
 'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] --dirty[=<mark>]
+'git describe' <blob>
 
 DESCRIPTION
 -----------
@@ -24,6 +25,16 @@ By default (without --all or --tags) `git describe` only shows
 annotated tags.  For more information about creating annotated tags
 see the -a and -s options to linkgit:git-tag[1].
 
+If the given object refers to a blob, it will be described
+as `<commit-ish>:<path>`, such that the blob can be found
+at `<path>` in the `<commit-ish>`. Note, that the commit is likely
+not the commit that introduced the blob, but the one that was found
+first; to find the commit that introduced the blob, you need to find
+the commit that last touched the path, e.g.
+`git log <commit-description> -- <path>`.
+As blobs do not point at the commits they are contained in,
+describing blobs is slow as we have to walk the whole graph.
+
 OPTIONS
 -------
 <commit-ish>...::
diff --git a/builtin/describe.c b/builtin/describe.c
index 9e9a5ed5d4..acfd853a30 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -3,6 +3,7 @@
 #include "lockfile.h"
 #include "commit.h"
 #include "tag.h"
+#include "blob.h"
 #include "refs.h"
 #include "builtin.h"
 #include "exec_cmd.h"
@@ -11,8 +12,9 @@
 #include "hashmap.h"
 #include "argv-array.h"
 #include "run-command.h"
+#include "revision.h"
+#include "list-objects.h"
 
-#define SEEN		(1u << 0)
 #define MAX_TAGS	(FLAG_BITS - 1)
 
 static const char * const describe_usage[] = {
@@ -434,6 +436,56 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 		strbuf_addstr(dst, suffix);
 }
 
+struct process_commit_data {
+	struct object_id current_commit;
+	struct object_id looking_for;
+	struct strbuf *dst;
+	struct rev_info *revs;
+};
+
+static void process_commit(struct commit *commit, void *data)
+{
+	struct process_commit_data *pcd = data;
+	pcd->current_commit = commit->object.oid;
+}
+
+static void process_object(struct object *obj, const char *path, void *data)
+{
+	struct process_commit_data *pcd = data;
+
+	if (!oidcmp(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
+		reset_revision_walk();
+		describe_commit(&pcd->current_commit, pcd->dst);
+		strbuf_addf(pcd->dst, ":%s", path);
+		pcd->revs->max_count = 0;
+	}
+}
+
+static void describe_blob(struct object_id oid, struct strbuf *dst)
+{
+	struct rev_info revs;
+	struct argv_array args = ARGV_ARRAY_INIT;
+	struct process_commit_data pcd = { null_oid, oid, dst, &revs};
+
+	argv_array_pushl(&args, "internal: The first arg is not parsed",
+		"--all", "--reflog", /* as many starting points as possible */
+		/* NEEDSWORK: --all is incompatible with worktrees for now: */
+		"--single-worktree",
+		"--objects",
+		"--in-commit-order",
+		NULL);
+
+	init_revisions(&revs, NULL);
+	if (setup_revisions(args.argc, args.argv, &revs, NULL) > 1)
+		BUG("setup_revisions could not handle all args?");
+
+	if (prepare_revision_walk(&revs))
+		die("revision walk setup failed");
+
+	traverse_commit_list(&revs, process_commit, process_object, &pcd);
+	reset_revision_walk();
+}
+
 static void describe(const char *arg, int last_one)
 {
 	struct object_id oid;
@@ -445,11 +497,18 @@ static void describe(const char *arg, int last_one)
 
 	if (get_oid(arg, &oid))
 		die(_("Not a valid object name %s"), arg);
-	cmit = lookup_commit_reference(&oid);
-	if (!cmit)
-		die(_("%s is not a valid '%s' object"), arg, commit_type);
-
-	describe_commit(&oid, &sb);
+	cmit = lookup_commit_reference_gently(&oid, 1);
+
+	if (cmit)
+		describe_commit(&oid, &sb);
+	else if (lookup_blob(&oid)) {
+		if (all || tags || longformat || first_parent ||
+		    patterns.nr || exclude_patterns.nr ||
+		    always || dirty || broken)
+			die(_("options not available for describing blobs"));
+		describe_blob(oid, &sb);
+	} else
+		die(_("%s is neither a commit nor blob"), arg);
 
 	puts(sb.buf);
 
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index c8b7ed82d9..aec6ed192d 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -310,6 +310,21 @@ test_expect_success 'describe ignoring a broken submodule' '
 	grep broken out
 '
 
+test_expect_success 'describe a blob at a tag' '
+	echo "make it a unique blob" >file &&
+	git add file && git commit -m "content in file" &&
+	git tag -a -m "latest annotated tag" unique-file &&
+	git describe HEAD:file >actual &&
+	echo "unique-file:file" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'describe a blob with commit-ish' '
+	git commit --allow-empty -m "empty commit" &&
+	git describe HEAD:file >actual &&
+	grep unique-file-1-g actual
+'
+
 test_expect_failure ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
 	i=1 &&
 	while test $i -lt 8000
-- 
2.15.0.128.gcadd42da22


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH] builtin/describe.c: describe a blob
  2017-11-10 22:44                               ` [PATCH] builtin/describe.c: describe a blob Stefan Beller
@ 2017-11-13  1:33                                 ` Junio C Hamano
  2017-11-14 23:37                                   ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2017-11-13  1:33 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes.Schindelin, git, jacob.keller, me, philipoakley

Stefan Beller <sbeller@google.com> writes:

> Sometimes users are given a hash of an object and they want to
> identify it further (ex.: Use verify-pack to find the largest blobs,
> but what are these? or [1])

Thanks for finishing the topic you started.

> @@ -11,6 +11,7 @@ SYNOPSIS
>  [verse]
>  'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] [<commit-ish>...]
>  'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] --dirty[=<mark>]
> +'git describe' <blob>

OK.

> diff --git a/builtin/describe.c b/builtin/describe.c
> index 9e9a5ed5d4..acfd853a30 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> ...
>  static void describe(const char *arg, int last_one)
>  {
> ...
> @@ -445,11 +497,18 @@ static void describe(const char *arg, int last_one)
> ...
> +	cmit = lookup_commit_reference_gently(&oid, 1);
> +
> +	if (cmit)
> +		describe_commit(&oid, &sb);
> +	else if (lookup_blob(&oid)) {
> +		if (all || tags || longformat || first_parent ||
> +		    patterns.nr || exclude_patterns.nr ||
> +		    always || dirty || broken)
> +			die(_("options not available for describing blobs"));
> +		describe_blob(oid, &sb);

I am not sure if I agree with some of them.

> +	} else
> +		die(_("%s is neither a commit nor blob"), arg);

This side I would agree with, though.

The caller of the describe() function is either

 * 'git describe<RETURN>' makes a single call to it with "HEAD" and
   exits; or
 * 'git describe A B C...' makes one call to it for each of these
   command line arguments.

And 'git describe <blob>' code is most likely trigger from the latter,
as it is not likely for HEAD to be pointing at a blob.

    $ blob=$(git rev-parse master:Makefile)
    $ git describe --always master $blob

and trigger the above check.  Is the check against "always" useful,
or is it better to simply ignore it while describing $blob, but
still keeping it in effect while describing 'master'?

The 'dirty' and 'broken' check is redundant because we would have
already errored out if either of them is set before calling describe()
on user-supplied object names.

If I understand the way "describe <blob>" works correctly, it
traverses the history with objects, doing a moral equivalent of
"rev-list --objects", stops when it finds the blob object with the
given name, and when it stops, it knows the commit object that
contains the blob and path in that commit to the blob.  Then the
commit is described to be a human-readable string, so that the path
can be concatenated after it.

Aren't these options that affect how a commit object is described in
effect and useful when you do the "internal" describe of the commit
you found the blob object in?  IOW, wouldn't this

    $ git describe --exclude='*-wip' $blob

make sure that in its output $commit:$path, $commit is not described
in terms of any "wip" refs?


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv3 3/7] revision.h: introduce blob/tree walking in order of the commits
  2017-11-02 19:41       ` [PATCHv3 3/7] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
@ 2017-11-14 19:52         ` Jonathan Tan
  0 siblings, 0 replies; 110+ messages in thread
From: Jonathan Tan @ 2017-11-14 19:52 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes.Schindelin, git, jacob.keller, me, schwab

On Thu,  2 Nov 2017 12:41:44 -0700
Stefan Beller <sbeller@google.com> wrote:

> @@ -239,6 +239,8 @@ void traverse_commit_list(struct rev_info *revs,
>  		if (commit->tree)
>  			add_pending_tree(revs, commit->tree);
>  		show_commit(commit, data);
> +		if (revs->tree_blobs_in_commit_order)
> +			traverse_trees_and_blobs(revs, &csp, show_object, data);
>  	}
>  	traverse_trees_and_blobs(revs, &csp, show_object, data);
>  

I would have expected add_pending_tree() above to no longer be invoked.
If it still needs to be invoked, maybe add an explanation in the form of
a comment or commit message.

> +test_expect_success 'rev-list --in-commit-order' '
> +	for x in one two three four
> +	do
> +		echo $x >$x &&
> +		git add $x &&
> +		git commit -m "add file $x" ||
> +		return 1
> +	done &&
> +	for x in four three
> +	do
> +		git rm $x &&
> +		git commit -m "remove $x" ||
> +		return 1
> +	done &&
> +	git rev-list --in-commit-order --objects HEAD >actual.raw &&
> +	cut -c 1-40 >actual <actual.raw &&
> +
> +	git cat-file --batch-check="%(objectname)" >expect.raw <<-\EOF &&
> +		HEAD^{commit}
> +		HEAD^{tree}
> +		HEAD^{tree}:one
> +		HEAD^{tree}:two
> +		HEAD~1^{commit}
> +		HEAD~1^{tree}
> +		HEAD~1^{tree}:three
> +		HEAD~2^{commit}
> +		HEAD~2^{tree}
> +		HEAD~2^{tree}:four
> +		HEAD~3^{commit}
> +		# HEAD~3^{tree} skipped, same as HEAD~1^{tree}
> +		HEAD~4^{commit}
> +		# HEAD~4^{tree} skipped, same as HEAD^{tree}
> +		HEAD~5^{commit}
> +		HEAD~5^{tree}
> +	EOF
> +	grep -v "#" >expect <expect.raw &&
> +
> +	test_cmp expect actual
> +'

Would it be useful to have another test without --in-commit-order, so
that we can see the difference (and ensure that existing behavior is
unchanged)?

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv3 5/7] builtin/describe.c: print debug statements earlier
  2017-11-02 19:41       ` [PATCHv3 5/7] builtin/describe.c: print debug statements earlier Stefan Beller
@ 2017-11-14 19:55         ` Jonathan Tan
  2017-11-14 20:00           ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Jonathan Tan @ 2017-11-14 19:55 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes.Schindelin, git, jacob.keller, me, schwab

On Thu,  2 Nov 2017 12:41:46 -0700
Stefan Beller <sbeller@google.com> wrote:

> For debuggers aid we'd want to print debug statements early, so
> introduce a new line in the debug output that describes the whole
> function, and then change the next debug output to describe why we
> need to search. Conveniently drop the arg from the second line;
> which will be useful in a follow up commit, that refactors the\
> describe function.
> 
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  builtin/describe.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/describe.c b/builtin/describe.c
> index fd61f463cf..3136efde31 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -293,6 +293,9 @@ static void describe(const char *arg, int last_one)
>  	unsigned long seen_commits = 0;
>  	unsigned int unannotated_cnt = 0;
>  
> +	if (debug)
> +		fprintf(stderr, _("describe %s\n"), arg);
> +

Could you explain in the commit message why this wasn't needed before
(if it wasn't needed), and why this is needed now?

>  	if (get_oid(arg, &oid))
>  		die(_("Not a valid object name %s"), arg);
>  	cmit = lookup_commit_reference(&oid);
> @@ -316,7 +319,7 @@ static void describe(const char *arg, int last_one)
>  	if (!max_candidates)
>  		die(_("no tag exactly matches '%s'"), oid_to_hex(&cmit->object.oid));
>  	if (debug)
> -		fprintf(stderr, _("searching to describe %s\n"), arg);
> +		fprintf(stderr, _("No exact match on refs or tags, searching to describe\n"));

What is this arg that can be safely dropped?

You mention that it is for convenience (since the describe() function
will be refactored), but could the arg just be passed to the new
function?

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv3 5/7] builtin/describe.c: print debug statements earlier
  2017-11-14 19:55         ` Jonathan Tan
@ 2017-11-14 20:00           ` Stefan Beller
  0 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-14 20:00 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Johannes Schindelin, git, Jacob Keller, Kevin Daudt, Andreas Schwab

On Tue, Nov 14, 2017 at 11:55 AM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Thu,  2 Nov 2017 12:41:46 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> For debuggers aid we'd want to print debug statements early, so
>> introduce a new line in the debug output that describes the whole
>> function, and then change the next debug output to describe why we
>> need to search. Conveniently drop the arg from the second line;
>> which will be useful in a follow up commit, that refactors the\
>> describe function.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  builtin/describe.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/builtin/describe.c b/builtin/describe.c
>> index fd61f463cf..3136efde31 100644
>> --- a/builtin/describe.c
>> +++ b/builtin/describe.c
>> @@ -293,6 +293,9 @@ static void describe(const char *arg, int last_one)
>>       unsigned long seen_commits = 0;
>>       unsigned int unannotated_cnt = 0;
>>
>> +     if (debug)
>> +             fprintf(stderr, _("describe %s\n"), arg);
>> +
>
> Could you explain in the commit message why this wasn't needed before
> (if it wasn't needed), and why this is needed now?
>
>>       if (get_oid(arg, &oid))
>>               die(_("Not a valid object name %s"), arg);
>>       cmit = lookup_commit_reference(&oid);
>> @@ -316,7 +319,7 @@ static void describe(const char *arg, int last_one)
>>       if (!max_candidates)
>>               die(_("no tag exactly matches '%s'"), oid_to_hex(&cmit->object.oid));
>>       if (debug)
>> -             fprintf(stderr, _("searching to describe %s\n"), arg);
>> +             fprintf(stderr, _("No exact match on refs or tags, searching to describe\n"));
>
> What is this arg that can be safely dropped?
>
> You mention that it is for convenience (since the describe() function
> will be refactored), but could the arg just be passed to the new
> function?

It could, but I want to avoid that just to print a debugging statement
inside the
function. So I factor the debugging printing out of the function
introduced in the
next patch.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv3 7/7] builtin/describe.c: describe a blob
  2017-11-02 19:41       ` [PATCHv3 7/7] builtin/describe.c: describe a blob Stefan Beller
@ 2017-11-14 20:02         ` Jonathan Tan
  2017-11-14 20:40           ` Stefan Beller
  0 siblings, 1 reply; 110+ messages in thread
From: Jonathan Tan @ 2017-11-14 20:02 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes.Schindelin, git, jacob.keller, me, schwab

On Thu,  2 Nov 2017 12:41:48 -0700
Stefan Beller <sbeller@google.com> wrote:

> Sometimes users are given a hash of an object and they want to
> identify it further (ex.: Use verify-pack to find the largest blobs,
> but what are these? or [1])
> 
> "This is an interesting endeavor, because describing things is hard."
>   -- me, upon writing this patch.
> 
> When describing commits, we try to anchor them to tags or refs, as these
> are conceptually on a higher level than the commit. And if there is no ref
> or tag that matches exactly, we're out of luck.  So we employ a heuristic
> to make up a name for the commit. These names are ambiguous, there might
> be different tags or refs to anchor to, and there might be different
> path in the DAG to travel to arrive at the commit precisely.
> 
> When describing a blob, we want to describe the blob from a higher layer
> as well, which is a tuple of (commit, deep/path) as the tree objects
> involved are rather uninteresting.  The same blob can be referenced by
> multiple commits, so how we decide which commit to use?  This patch
> implements a rather naive approach on this: As there are no back pointers
> from blobs to commits in which the blob occurs, we'll start walking from
> any tips available, listing the blobs in-order of the commit and once we
> found the blob, we'll take the first commit that listed the blob.  For
> source code this is likely not the first commit that introduced the blob,
> but rather the latest commit that contained the blob.  For example:
> 
>   git describe v0.99:Makefile
>   v0.99-5-gab6625e06a:Makefile
> 
> tells us the latest commit that contained the Makefile as it was in tag
> v0.99 is commit v0.99-5-gab6625e06a (and at the same path), as the next
> commit on top v0.99-6-gb1de9de2b9 ([PATCH] Bootstrap "make dist",
> 2005-07-11) touches the Makefile.
> 
> Let's see how this description turns out, if it is useful in day-to-day
> use as I have the intuition that we'd rather want to see the *first*
> commit that this blob was introduced to the repository (which can be
> achieved easily by giving the `--reverse` flag in the describe_blob rev
> walk).

The method of your intuition indeed seems better - could we just have
this from the start?

Alternatively, to me, it seems that listing commits that *introduces*
the blob (that is, where it references the blob, but none of its parents
do) would be the best way. That would then be independent of traversal
order (and we would no longer need to find a tag etc. to tie the blob
to).

If we do that, it seems to me that there is a future optimization that
could get the first commit to the user more quickly - once a commit
without the blob and a descendant commit with the blob is found, that
interval can be bisected, so that the first commit is found in O(log
number of commits) instead of O(commits). But this can be done later.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv3 7/7] builtin/describe.c: describe a blob
  2017-11-14 20:02         ` Jonathan Tan
@ 2017-11-14 20:40           ` Stefan Beller
  2017-11-14 21:17             ` Jonathan Tan
  0 siblings, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2017-11-14 20:40 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Johannes Schindelin, git, Jacob Keller, Kevin Daudt, Andreas Schwab

On Tue, Nov 14, 2017 at 12:02 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Thu,  2 Nov 2017 12:41:48 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> Sometimes users are given a hash of an object and they want to
>> identify it further (ex.: Use verify-pack to find the largest blobs,
>> but what are these? or [1])
>>
>> "This is an interesting endeavor, because describing things is hard."
>>   -- me, upon writing this patch.
>>
>> When describing commits, we try to anchor them to tags or refs, as these
>> are conceptually on a higher level than the commit. And if there is no ref
>> or tag that matches exactly, we're out of luck.  So we employ a heuristic
>> to make up a name for the commit. These names are ambiguous, there might
>> be different tags or refs to anchor to, and there might be different
>> path in the DAG to travel to arrive at the commit precisely.
>>
>> When describing a blob, we want to describe the blob from a higher layer
>> as well, which is a tuple of (commit, deep/path) as the tree objects
>> involved are rather uninteresting.  The same blob can be referenced by
>> multiple commits, so how we decide which commit to use?  This patch
>> implements a rather naive approach on this: As there are no back pointers
>> from blobs to commits in which the blob occurs, we'll start walking from
>> any tips available, listing the blobs in-order of the commit and once we
>> found the blob, we'll take the first commit that listed the blob.  For
>> source code this is likely not the first commit that introduced the blob,
>> but rather the latest commit that contained the blob.  For example:
>>
>>   git describe v0.99:Makefile
>>   v0.99-5-gab6625e06a:Makefile
>>
>> tells us the latest commit that contained the Makefile as it was in tag
>> v0.99 is commit v0.99-5-gab6625e06a (and at the same path), as the next
>> commit on top v0.99-6-gb1de9de2b9 ([PATCH] Bootstrap "make dist",
>> 2005-07-11) touches the Makefile.
>>
>> Let's see how this description turns out, if it is useful in day-to-day
>> use as I have the intuition that we'd rather want to see the *first*
>> commit that this blob was introduced to the repository (which can be
>> achieved easily by giving the `--reverse` flag in the describe_blob rev
>> walk).
>
> The method of your intuition indeed seems better - could we just have
> this from the start?

Thanks for the review!

This series was written with the mindset, that a user would only ever
want to describe bad blobs. (bad in terms of file size, unwanted content, etc)

With the --reverse you only see the *first* introduction of said blob,
so finding out if it was re-introduced is still not as easy, whereas "when
was this blob last used" which is what the current algorithm does, covers
that case better.

> Alternatively, to me, it seems that listing commits that *introduces*
> the blob (that is, where it references the blob, but none of its parents
> do) would be the best way. That would then be independent of traversal
> order (and we would no longer need to find a tag etc. to tie the blob
> to).

What if it is introduced multiple times? (either in multiple competing
side branches; or introduced, reverted and re-introduced?)

> If we do that, it seems to me that there is a future optimization that
> could get the first commit to the user more quickly - once a commit
> without the blob and a descendant commit with the blob is found, that
> interval can be bisected, so that the first commit is found in O(log
> number of commits) instead of O(commits). But this can be done later.

bisection assumes that we only have one "event" going from good to
bad, which doesn't hold true here, as the blob can be there at different
occasions of the history.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCHv3 7/7] builtin/describe.c: describe a blob
  2017-11-14 20:40           ` Stefan Beller
@ 2017-11-14 21:17             ` Jonathan Tan
  0 siblings, 0 replies; 110+ messages in thread
From: Jonathan Tan @ 2017-11-14 21:17 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Johannes Schindelin, git, Jacob Keller, Kevin Daudt, Andreas Schwab

On Tue, 14 Nov 2017 12:40:03 -0800
Stefan Beller <sbeller@google.com> wrote:

> Thanks for the review!
> 
> This series was written with the mindset, that a user would only ever
> want to describe bad blobs. (bad in terms of file size, unwanted content, etc)
> 
> With the --reverse you only see the *first* introduction of said blob,
> so finding out if it was re-introduced is still not as easy, whereas "when
> was this blob last used" which is what the current algorithm does, covers
> that case better.

How does "when was this blob last used" cover reintroduction better? If
you want to check all introductions, then you'll need something like
what I describe (quoted below).

> > Alternatively, to me, it seems that listing commits that *introduces*
> > the blob (that is, where it references the blob, but none of its parents
> > do) would be the best way. That would then be independent of traversal
> > order (and we would no longer need to find a tag etc. to tie the blob
> > to).
> 
> What if it is introduced multiple times? (either in multiple competing
> side branches; or introduced, reverted and re-introduced?)

Then all of them should be listed.

> > If we do that, it seems to me that there is a future optimization that
> > could get the first commit to the user more quickly - once a commit
> > without the blob and a descendant commit with the blob is found, that
> > interval can be bisected, so that the first commit is found in O(log
> > number of commits) instead of O(commits). But this can be done later.
> 
> bisection assumes that we only have one "event" going from good to
> bad, which doesn't hold true here, as the blob can be there at different
> occasions of the history.

Yes, bisection would only be useful to output one commit more quickly.
We will still need to exhaustively search for all commits if the user
needs more than one.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH] builtin/describe.c: describe a blob
  2017-11-13  1:33                                 ` Junio C Hamano
@ 2017-11-14 23:37                                   ` Stefan Beller
  0 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2017-11-14 23:37 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, git, Jacob Keller, Kevin Daudt, Philip Oakley

On Sun, Nov 12, 2017 at 5:33 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Sometimes users are given a hash of an object and they want to
>> identify it further (ex.: Use verify-pack to find the largest blobs,
>> but what are these? or [1])
>
> Thanks for finishing the topic you started.
>
>> @@ -11,6 +11,7 @@ SYNOPSIS
>>  [verse]
>>  'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] [<commit-ish>...]
>>  'git describe' [--all] [--tags] [--contains] [--abbrev=<n>] --dirty[=<mark>]
>> +'git describe' <blob>
>
> OK.
>
>> diff --git a/builtin/describe.c b/builtin/describe.c
>> index 9e9a5ed5d4..acfd853a30 100644
>> --- a/builtin/describe.c
>> +++ b/builtin/describe.c
>> ...
>>  static void describe(const char *arg, int last_one)
>>  {
>> ...
>> @@ -445,11 +497,18 @@ static void describe(const char *arg, int last_one)
>> ...
>> +     cmit = lookup_commit_reference_gently(&oid, 1);
>> +
>> +     if (cmit)
>> +             describe_commit(&oid, &sb);
>> +     else if (lookup_blob(&oid)) {
>> +             if (all || tags || longformat || first_parent ||
>> +                 patterns.nr || exclude_patterns.nr ||
>> +                 always || dirty || broken)
>> +                     die(_("options not available for describing blobs"));
>> +             describe_blob(oid, &sb);
>
> I am not sure if I agree with some of them.
>
>> +     } else
>> +             die(_("%s is neither a commit nor blob"), arg);
>
> This side I would agree with, though.
>
> The caller of the describe() function is either
>
>  * 'git describe<RETURN>' makes a single call to it with "HEAD" and
>    exits; or
>  * 'git describe A B C...' makes one call to it for each of these
>    command line arguments.
>
> And 'git describe <blob>' code is most likely trigger from the latter,
> as it is not likely for HEAD to be pointing at a blob.
>
>     $ blob=$(git rev-parse master:Makefile)
>     $ git describe --always master $blob
>
> and trigger the above check.  Is the check against "always" useful,
> or is it better to simply ignore it while describing $blob, but
> still keeping it in effect while describing 'master'?
>
> The 'dirty' and 'broken' check is redundant because we would have
> already errored out if either of them is set before calling describe()
> on user-supplied object names.
>
> If I understand the way "describe <blob>" works correctly, it
> traverses the history with objects, doing a moral equivalent of
> "rev-list --objects", stops when it finds the blob object with the
> given name, and when it stops, it knows the commit object that
> contains the blob and path in that commit to the blob.  Then the
> commit is described to be a human-readable string, so that the path
> can be concatenated after it.
>
> Aren't these options that affect how a commit object is described in
> effect and useful when you do the "internal" describe of the commit
> you found the blob object in?  IOW, wouldn't this
>
>     $ git describe --exclude='*-wip' $blob
>
> make sure that in its output $commit:$path, $commit is not described
> in terms of any "wip" refs?

yes, we would exclude those refs. But the name alone (without reading the docs)
suggests anything given in --exclude is excluded, so what about the blob

    $commit:path-wip/test1.c

which I might want to exclude from the search using the exclude pattern?
After reading the docs, this is silly of course, and the exclusion only applies
to the commit description part.

--all is an interesting case, as we pass --all to the revision walking machinery
for blobs, but that is slightly different than the --all flag given to describe,
which also only relates to the commit name that should be produced.
So, I'll go through all the options and see which we can drop.

Thanks,
Stefan


>

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-11-10  1:24                           ` Junio C Hamano
  2017-11-10 22:44                             ` [PATCH 0/1] describe a blob: with better docs Stefan Beller
@ 2017-11-20 15:22                             ` Philip Oakley
  2017-11-20 18:18                               ` Philip Oakley
  1 sibling, 1 reply; 110+ messages in thread
From: Philip Oakley @ 2017-11-20 15:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, git, Kevin Daudt, Jacob Keller, Johannes Schindelin

From: "Junio C Hamano" <gitster@pobox.com>
: Friday, November 10, 2017 1:24 AM
[catch up]

> "Philip Oakley" <philipoakley@iee.org> writes:
>
>> From: "Stefan Beller" <sbeller@google.com>
>>> Rereading this discussion, there is currently no urgent thing to 
>>> address?
>>
>> True.
>>
>>> Then the state as announced by the last cooking email, to just cook
>>> it, seems
>>> about right and we'll wait for further feedback.
>
> A shiny new toy that is not a fix for a grave bug is rarely urgent,
> so with that criterion, we'd end up with hundreds of topics not in
> 'next' but in 'pu' waiting for the original contributor to get out
> of his or her procrastination, which certainly is not what I want to
> see, as I'd have to throw them into the Stalled bin and then
> eventually discard them, while having to worry about possible
> mismerges with remaining good topics caused by these topics
> appearing and disappearing from 'pu'.
>
> I'd rather see any topic that consumed reviewers' time to be
> polished enough to get into 'next' while we all recall the issues
> raised during previous reviews.  I consider the process to further
> incrementally polish it after that happens a true "cooking".
>
> For this topic, aside from "known issues" that we decided to punt
> for now, my impression was that the code is in good enough shape,
> and we need a bit of documentation polishes before I can mark it
> as "Will merge to 'next'".
>
>> Possibly only checking the documenation aspects, so folks don't fall
>> into the same trap as me.. ;-)
>
> Yup, so let's resolve that documentation thing while we remember
> that the topic has that issue, and what part of the documentation
> we find needs improvement.
>
> I am not sure what "trap: you fell into, though.  Are you saying
> that giving
>
> git describe [<option to describe a commit>...] <commit-ish>
> git describe [<option to describe a blob>...] <blob>
>
> in the synopsis is not helpful, because the user may not know what
> kind of object s/he has, and cannot decide from which set of options
> to pick?  Then an alternative would be to list

(If I remember correctly) My nit pick was roughly along the lines you 
suggest, and that the two option lists (for commit-ish and blob) were shown 
in different ways, which could lead to the scenarion that, with knowing the 
oid object type (or knowing how to get it), the user could give an invalid 
option, and think the command failure was because the oid was invalid, not 
that the option was not appropriate, along with variations on that theme.

The newer synopsis (v5) looks Ok in that it avoids digging the hole by not 
mentioning the blob options. Personally I'm more for manuals that tend 
toward instructional, rather than being expert references. I'd sneak in a 
line saying "The object type can be determined using `git cat-file`.", but 
maybe that's my work environment...

>
> git describe [<option>...] <object>
>
> in the synopsis, say upfront that most options are applicable only
> when describing a commit-ish, and when describing a blob, we do
> quite different thing and a separate set of options apply, perhaps?
>
--
Philip 


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 6/7] builtin/describe.c: describe a blob
  2017-11-20 15:22                             ` [PATCH 6/7] " Philip Oakley
@ 2017-11-20 18:18                               ` Philip Oakley
  0 siblings, 0 replies; 110+ messages in thread
From: Philip Oakley @ 2017-11-20 18:18 UTC (permalink / raw)
  To: Philip Oakley, Junio C Hamano
  Cc: Stefan Beller, git, Kevin Daudt, Jacob Keller, Johannes Schindelin

From: "Philip Oakley" <philipoakley@iee.org>

s/with/without/  ...

> From: "Junio C Hamano" <gitster@pobox.com>
> : Friday, November 10, 2017 1:24 AM
> [catch up]
>
>> "Philip Oakley" <philipoakley@iee.org> writes:
>>
>>> From: "Stefan Beller" <sbeller@google.com>
>>>> Rereading this discussion, there is currently no urgent thing to 
>>>> address?
>>>
>>> True.
>>>
>>>> Then the state as announced by the last cooking email, to just cook
>>>> it, seems
>>>> about right and we'll wait for further feedback.
>>
>> A shiny new toy that is not a fix for a grave bug is rarely urgent,
>> so with that criterion, we'd end up with hundreds of topics not in
>> 'next' but in 'pu' waiting for the original contributor to get out
>> of his or her procrastination, which certainly is not what I want to
>> see, as I'd have to throw them into the Stalled bin and then
>> eventually discard them, while having to worry about possible
>> mismerges with remaining good topics caused by these topics
>> appearing and disappearing from 'pu'.
>>
>> I'd rather see any topic that consumed reviewers' time to be
>> polished enough to get into 'next' while we all recall the issues
>> raised during previous reviews.  I consider the process to further
>> incrementally polish it after that happens a true "cooking".
>>
>> For this topic, aside from "known issues" that we decided to punt
>> for now, my impression was that the code is in good enough shape,
>> and we need a bit of documentation polishes before I can mark it
>> as "Will merge to 'next'".
>>
>>> Possibly only checking the documenation aspects, so folks don't fall
>>> into the same trap as me.. ;-)
>>
>> Yup, so let's resolve that documentation thing while we remember
>> that the topic has that issue, and what part of the documentation
>> we find needs improvement.
>>
>> I am not sure what "trap: you fell into, though.  Are you saying
>> that giving
>>
>> git describe [<option to describe a commit>...] <commit-ish>
>> git describe [<option to describe a blob>...] <blob>
>>
>> in the synopsis is not helpful, because the user may not know what
>> kind of object s/he has, and cannot decide from which set of options
>> to pick?  Then an alternative would be to list
>
> (If I remember correctly) My nit pick was roughly along the lines you 
> suggest, and that the two option lists (for commit-ish and blob) were 
> shown in different ways, which could lead to the scenarion that, with 
> knowing the

s/with/without/  ...

> oid object type (or knowing how to get it), the user could give an invalid 
> option, and think the command failure was because the oid was invalid, not 
> that the option was not appropriate, along with variations on that theme.
>
> The newer synopsis (v5) looks Ok in that it avoids digging the hole by not 
> mentioning the blob options. Personally I'm more for manuals that tend 
> toward instructional, rather than being expert references. I'd sneak in a 
> line saying "The object type can be determined using `git cat-file`.", but 
> maybe that's my work environment...
>
>>
>> git describe [<option>...] <object>
>>
>> in the synopsis, say upfront that most options are applicable only
>> when describing a commit-ish, and when describing a blob, we do
>> quite different thing and a separate set of options apply, perhaps?
>>
> --
> Philip 


^ permalink raw reply	[flat|nested] 110+ messages in thread

end of thread, other threads:[~2017-11-20 18:19 UTC | newest]

Thread overview: 110+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-28  0:44 [RFC PATCH 0/3] git-describe <blob> ? Stefan Beller
2017-10-28  0:45 ` [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
2017-10-28  0:45   ` [PATCH 2/3] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
2017-10-28 17:20     ` Johannes Schindelin
2017-10-29  3:22       ` Stefan Beller
2017-10-29  3:23         ` Stefan Beller
2017-10-29  3:43           ` Junio C Hamano
2017-10-28  0:45   ` [PATCH 3/3] builtin/describe: describe blobs Stefan Beller
2017-10-28 17:32     ` Johannes Schindelin
2017-10-28 22:47       ` Jacob Keller
2017-10-29  3:28       ` Stefan Beller
2017-10-29 12:02         ` Kevin Daudt
2017-10-29 12:07         ` Johannes Schindelin
2017-10-28 17:15   ` [PATCH 1/3] list-objects.c: factor out traverse_trees_and_blobs Johannes Schindelin
2017-10-29  3:13     ` Stefan Beller
2017-10-28 16:04 ` [RFC PATCH 0/3] git-describe <blob> ? Johannes Schindelin
2017-10-31  0:33 ` [PATCH 0/7] git-describe <blob> Stefan Beller
2017-10-31  0:33   ` [PATCH 1/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
2017-10-31  6:07     ` Junio C Hamano
2017-10-31  0:33   ` [PATCH 2/7] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
2017-10-31  6:57     ` Junio C Hamano
2017-10-31 18:12       ` Stefan Beller
2017-10-31  0:33   ` [PATCH 3/7] builtin/describe.c: rename `oid` to avoid variable shadowing Stefan Beller
2017-10-31  8:15     ` Jacob Keller
2017-10-31  0:33   ` [PATCH 4/7] builtin/describe.c: print debug statements earlier Stefan Beller
2017-10-31  7:03     ` Junio C Hamano
2017-10-31 19:05       ` Stefan Beller
2017-10-31  0:33   ` [PATCH 5/7] builtin/describe.c: factor out describe_commit Stefan Beller
2017-10-31  0:33   ` [PATCH 6/7] builtin/describe.c: describe a blob Stefan Beller
2017-10-31  6:25     ` Junio C Hamano
2017-10-31 19:16       ` Stefan Beller
2017-11-01  3:34         ` Junio C Hamano
2017-11-01 20:58           ` Stefan Beller
2017-11-02  1:53             ` Junio C Hamano
2017-11-02  4:23               ` Junio C Hamano
2017-11-04 21:15                 ` Philip Oakley
2017-11-05  6:28                   ` Junio C Hamano
2017-11-06 23:50                     ` Philip Oakley
2017-11-09 20:30                       ` Stefan Beller
2017-11-10  0:25                         ` Philip Oakley
2017-11-10  1:24                           ` Junio C Hamano
2017-11-10 22:44                             ` [PATCH 0/1] describe a blob: with better docs Stefan Beller
2017-11-10 22:44                               ` [PATCH] builtin/describe.c: describe a blob Stefan Beller
2017-11-13  1:33                                 ` Junio C Hamano
2017-11-14 23:37                                   ` Stefan Beller
2017-11-20 15:22                             ` [PATCH 6/7] " Philip Oakley
2017-11-20 18:18                               ` Philip Oakley
2017-11-01  3:44         ` Junio C Hamano
2017-10-31  0:33   ` [PATCH 7/7] t6120: fix typo in test name Stefan Beller
2017-11-01  1:21     ` Junio C Hamano
2017-11-01 18:13       ` Stefan Beller
2017-11-02  1:36       ` Junio C Hamano
2017-10-31 21:18   ` [PATCHv2 0/7] git describe blob Stefan Beller
2017-10-31 21:18     ` [PATCHv2 1/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
2017-11-01  3:46       ` Junio C Hamano
2017-10-31 21:18     ` [PATCHv2 2/7] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
2017-11-01  3:50       ` Junio C Hamano
2017-11-01 12:26         ` Johannes Schindelin
2017-11-01 12:37           ` Junio C Hamano
2017-11-01 19:37             ` Stefan Beller
2017-11-01 22:08               ` Johannes Schindelin
2017-11-01 22:19                 ` Stefan Beller
2017-11-01 22:39                   ` Johannes Schindelin
2017-11-01 22:46                     ` Stefan Beller
2017-11-01 21:36             ` Johannes Schindelin
2017-11-01 21:39               ` Jeff King
2017-11-01 22:33                 ` Johannes Schindelin
2017-11-02  1:20                   ` Junio C Hamano
2017-10-31 21:18     ` [PATCHv2 3/7] builtin/describe.c: rename `oid` to avoid variable shadowing Stefan Beller
2017-10-31 21:18     ` [PATCHv2 4/7] builtin/describe.c: print debug statements earlier Stefan Beller
2017-10-31 21:31       ` Eric Sunshine
2017-10-31 21:18     ` [PATCHv2 5/7] builtin/describe.c: factor out describe_commit Stefan Beller
2017-10-31 21:18     ` [PATCHv2 6/7] builtin/describe.c: describe a blob Stefan Beller
2017-10-31 21:49       ` Eric Sunshine
2017-11-01 19:51         ` Stefan Beller
2017-11-01  4:11       ` Junio C Hamano
2017-11-01 12:32         ` Johannes Schindelin
2017-11-01 17:59           ` Stefan Beller
2017-11-01 21:05             ` Jacob Keller
2017-11-01 22:12               ` Johannes Schindelin
2017-11-01 22:21                 ` Stefan Beller
2017-11-01 22:41                   ` Johannes Schindelin
2017-11-01 22:53                     ` Stefan Beller
2017-11-02  6:05                     ` Jacob Keller
2017-11-03  5:18                       ` Junio C Hamano
2017-11-03  6:55                         ` Jacob Keller
2017-11-03 15:02                           ` Junio C Hamano
2017-11-02  7:23                     ` Andreas Schwab
2017-11-02 18:18                       ` Stefan Beller
2017-11-03 12:05                         ` Johannes Schindelin
2017-11-01 21:28         ` Stefan Beller
2017-10-31 21:18     ` [PATCHv2 7/7] t6120: fix typo in test name Stefan Beller
2017-11-01  5:14     ` [PATCHv2 0/7] git describe blob Junio C Hamano
2017-11-02 19:41     ` [PATCHv3 " Stefan Beller
2017-11-02 19:41       ` [PATCHv3 1/7] t6120: fix typo in test name Stefan Beller
2017-11-02 19:41       ` [PATCHv3 2/7] list-objects.c: factor out traverse_trees_and_blobs Stefan Beller
2017-11-02 19:41       ` [PATCHv3 3/7] revision.h: introduce blob/tree walking in order of the commits Stefan Beller
2017-11-14 19:52         ` Jonathan Tan
2017-11-02 19:41       ` [PATCHv3 4/7] builtin/describe.c: rename `oid` to avoid variable shadowing Stefan Beller
2017-11-02 19:41       ` [PATCHv3 5/7] builtin/describe.c: print debug statements earlier Stefan Beller
2017-11-14 19:55         ` Jonathan Tan
2017-11-14 20:00           ` Stefan Beller
2017-11-02 19:41       ` [PATCHv3 6/7] builtin/describe.c: factor out describe_commit Stefan Beller
2017-11-02 19:41       ` [PATCHv3 7/7] builtin/describe.c: describe a blob Stefan Beller
2017-11-14 20:02         ` Jonathan Tan
2017-11-14 20:40           ` Stefan Beller
2017-11-14 21:17             ` Jonathan Tan
2017-11-03  0:23       ` [PATCHv3 0/7] git describe blob Jacob Keller
2017-11-03  1:46         ` Junio C Hamano
2017-11-03  2:29           ` Stefan Beller

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git