git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* gitattributes export-subst and software versioning
@ 2021-01-25  0:32 Eli Schwartz
  2021-02-08 19:46 ` René Scharfe
  2021-02-14 10:04 ` [PATCH 1/2] pretty: add %(describe) René Scharfe
  0 siblings, 2 replies; 27+ messages in thread
From: Eli Schwartz @ 2021-01-25  0:32 UTC (permalink / raw)
  To: git

Periodically, I wonder if there is some better way for managing tagged 
releases for software in git. Current state of the art seems to be 
"write a custom Makefile that takes a version and seds out the existing 
version, then runs git tag for you". Inelegant solutions also abound; 
people release code that does not build properly unless you build it 
from a git checkout so it can run git describe. (There are half a dozen 
individually popular but mutually exclusive python ecosystems for this, 
in fact, all of them varying degrees of broken.)

git does have a way to automatically insert metadata via the 
export-subst attribute on a file, but it's very awkward to use and you 
cannot get much info out of it.

# get tags into a file, only on exact match:

$ cat VERSION
$Format:%d$
$Format:%D$

$ git archive HEAD | bsdtar -xOf - VERSION
   (HEAD -> master, tag: 1.0)
HEAD -> master, tag: 1.0

With sufficient regex, you can get a release out of this, but it doesn't 
work if you try getting an autogenerated tarball for a commit that isn't 
exactly a release.

$ git commit --allow-empty -m ...
$ git archive HEAD | bsdtar -xOf - VERSION
   (HEAD -> master)
HEAD -> master

I think it would be much, much nicer if there was a format placeholder 
for git describe.

It doesn't even need option support -- the default output in most cases 
could be a replacement for or fall back to existing invocations of the 
"git" program, followed by post-processing with e.g. "sed".

However, the existence of current pretty formats such as %C() or 
%(trailer:options) implies that options could be passed in a 
git-describe format too. e.g. %(describe:--long --tags --match="v*")

Thoughts?

-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gitattributes export-subst and software versioning
  2021-01-25  0:32 gitattributes export-subst and software versioning Eli Schwartz
@ 2021-02-08 19:46 ` René Scharfe
  2021-02-08 22:41   ` Junio C Hamano
  2021-02-09  0:19   ` Eli Schwartz
  2021-02-14 10:04 ` [PATCH 1/2] pretty: add %(describe) René Scharfe
  1 sibling, 2 replies; 27+ messages in thread
From: René Scharfe @ 2021-02-08 19:46 UTC (permalink / raw)
  To: Eli Schwartz, git

Am 25.01.21 um 01:32 schrieb Eli Schwartz:
> Periodically, I wonder if there is some better way for managing
> tagged releases for software in git. Current state of the art seems
> to be "write a custom Makefile that takes a version and seds out the
> existing version, then runs git tag for you". Inelegant solutions
> also abound; people release code that does not build properly unless
> you build it from a git checkout so it can run git describe. (There
> are half a dozen individually popular but mutually exclusive python
> ecosystems for this, in fact, all of them varying degrees of
> broken.)
>
> git does have a way to automatically insert metadata via the
> export-subst attribute on a file, but it's very awkward to use and
> you cannot get much info out of it.
>
> # get tags into a file, only on exact match:
>
> $ cat VERSION $Format:%d$ $Format:%D$
>
> $ git archive HEAD | bsdtar -xOf - VERSION (HEAD -> master, tag:
> 1.0) HEAD -> master, tag: 1.0
>
> With sufficient regex, you can get a release out of this, but it
> doesn't work if you try getting an autogenerated tarball for a commit
> that isn't exactly a release.
>
> $ git commit --allow-empty -m ... $ git archive HEAD | bsdtar -xOf -
> VERSION (HEAD -> master) HEAD -> master
>
> I think it would be much, much nicer if there was a format
> placeholder for git describe.

Totally.

> It doesn't even need option support -- the default output in most
> cases could be a replacement for or fall back to existing invocations
> of the "git" program, followed by post-processing with e.g. "sed".
>
> However, the existence of current pretty formats such as %C() or
> %(trailer:options) implies that options could be passed in a
> git-describe format too. e.g. %(describe:--long --tags --match="v*")
>
> Thoughts?

git archive uses the pretty format code for export-subst.  It is used by
git log and others as well.  git describe uses all object flags to find
the best description.  Simply plugging it into the pretty format code
would clash with the object flag use of git log.

And replacing the flags with a commit slab doesn't seem to be enough,
either -- I get good results lots of commits, but for some git log with
the new placeholder would just show some nonsensical output, as it
seems to get the depth calculation wrong for them somehow.

Anyway, we can of course do something like in the patch below.  It
works, it's easy, it's fast enough for git archive, and it's quite
hideous.  Hopefully it's bad enough to motivate someone to come up with
a cleaner, faster solution.

René


---
 pretty.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/pretty.c b/pretty.c
index 3922f6f9f2..bbfb5ca3e7 100644
--- a/pretty.c
+++ b/pretty.c
@@ -12,6 +12,7 @@
 #include "reflog-walk.h"
 #include "gpg-interface.h"
 #include "trailer.h"
+#include "run-command.h"

 static char *user_format;
 static struct cmt_fmt_map {
@@ -1213,6 +1214,21 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
 		return parse_padding_placeholder(placeholder, c);
 	}

+	if (skip_prefix(placeholder, "(describe)", &arg)) {
+		struct child_process cmd = CHILD_PROCESS_INIT;
+		struct strbuf out = STRBUF_INIT;
+
+		cmd.git_cmd = 1;
+		strvec_push(&cmd.args, "describe");
+		strvec_push(&cmd.args, "--always");
+		strvec_push(&cmd.args, oid_to_hex(&commit->object.oid));
+		pipe_command(&cmd, NULL, 0, &out, 0, NULL, 0);
+		strbuf_rtrim(&out);
+		strbuf_addbuf(sb, &out);
+		strbuf_release(&out);
+		return arg - placeholder;
+	}
+
 	/* these depend on the commit */
 	if (!commit->object.parsed)
 		parse_object(the_repository, &commit->object.oid);
--
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: gitattributes export-subst and software versioning
  2021-02-08 19:46 ` René Scharfe
@ 2021-02-08 22:41   ` Junio C Hamano
  2021-02-09  0:19   ` Eli Schwartz
  1 sibling, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2021-02-08 22:41 UTC (permalink / raw)
  To: René Scharfe; +Cc: Eli Schwartz, git

René Scharfe <l.s.r@web.de> writes:

> Anyway, we can of course do something like in the patch below.  It
> works, it's easy, it's fast enough for git archive, and it's quite
> hideous.  Hopefully it's bad enough to motivate someone to come up with
> a cleaner, faster solution.

Yeah, I can see it would work, it is no-brainer-easy, it is clean,
and I can believe it would be fast enough and overall, the best
implementation for the purpose of "git archive".

Perhaps not for "log", but then I do not know if there is a good way
to have it natively run inside "log" without major surgery, as the
"describe" operation itself is fairly resource intensive to remember
which tagged commit is the closest to compute description for even a
single commit.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gitattributes export-subst and software versioning
  2021-02-08 19:46 ` René Scharfe
  2021-02-08 22:41   ` Junio C Hamano
@ 2021-02-09  0:19   ` Eli Schwartz
  2021-02-09 20:42     ` Junio C Hamano
  2021-02-14 10:04     ` René Scharfe
  1 sibling, 2 replies; 27+ messages in thread
From: Eli Schwartz @ 2021-02-09  0:19 UTC (permalink / raw)
  To: René Scharfe, git


[-- Attachment #1.1: Type: text/plain, Size: 3071 bytes --]

On 2/8/21 2:46 PM, René Scharfe wrote:
> git archive uses the pretty format code for export-subst.  It is used by
> git log and others as well.  git describe uses all object flags to find
> the best description.  Simply plugging it into the pretty format code
> would clash with the object flag use of git log.

Yeah, I was afraid there might be bad interactions with a pretty
archive-centric placeholder and something that isn't git-archive.

On the other hand, with my zero knowledge of the code but having read
lots of man pages... %S documents that it "only works with git log", so
maybe it is possible to add an option that is documented to only work
for git archive?

e.g. if you do use it,

$ cat VERSION
$Format:%d$
$Format:%S$

$ git archive HEAD | bsdtar -xOf - VERSION
 (HEAD -> master, tag: 1.0)
%S

It's apparently completely ignored and treated as raw characters. The
same restriction could theoretically be added in the other direction for
a new placeholder.

This would neatly resolve Junio's concern about the resource-intensive
nature of "describe" not being a good fit for "log".

> And replacing the flags with a commit slab doesn't seem to be enough,
> either -- I get good results lots of commits, but for some git log with
> the new placeholder would just show some nonsensical output, as it
> seems to get the depth calculation wrong for them somehow.

You mean git describe <commit> produces wrong results for those?

> Anyway, we can of course do something like in the patch below.  It
> works, it's easy, it's fast enough for git archive, and it's quite
> hideous.  Hopefully it's bad enough to motivate someone to come up with
> a cleaner, faster solution.

:D :D an important part of the resolution process, to be sure.

> ---
>  pretty.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/pretty.c b/pretty.c
> index 3922f6f9f2..bbfb5ca3e7 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -12,6 +12,7 @@
>  #include "reflog-walk.h"
>  #include "gpg-interface.h"
>  #include "trailer.h"
> +#include "run-command.h"
> 
>  static char *user_format;
>  static struct cmt_fmt_map {
> @@ -1213,6 +1214,21 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
>  		return parse_padding_placeholder(placeholder, c);
>  	}
> 
> +	if (skip_prefix(placeholder, "(describe)", &arg)) {
> +		struct child_process cmd = CHILD_PROCESS_INIT;
> +		struct strbuf out = STRBUF_INIT;
> +
> +		cmd.git_cmd = 1;
> +		strvec_push(&cmd.args, "describe");
> +		strvec_push(&cmd.args, "--always");
> +		strvec_push(&cmd.args, oid_to_hex(&commit->object.oid));
> +		pipe_command(&cmd, NULL, 0, &out, 0, NULL, 0);
> +		strbuf_rtrim(&out);
> +		strbuf_addbuf(sb, &out);
> +		strbuf_release(&out);
> +		return arg - placeholder;
> +	}
> +
>  	/* these depend on the commit */
>  	if (!commit->object.parsed)
>  		parse_object(the_repository, &commit->object.oid);
> --
> 2.30.0
> 


-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gitattributes export-subst and software versioning
  2021-02-09  0:19   ` Eli Schwartz
@ 2021-02-09 20:42     ` Junio C Hamano
  2021-02-14 10:04       ` René Scharfe
  2021-02-14 10:04     ` René Scharfe
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2021-02-09 20:42 UTC (permalink / raw)
  To: Eli Schwartz; +Cc: René Scharfe, git

Eli Schwartz <eschwartz@archlinux.org> writes:

>> And replacing the flags with a commit slab doesn't seem to be enough,
>> either -- I get good results lots of commits, but for some git log with
>> the new placeholder would just show some nonsensical output, as it
>> seems to get the depth calculation wrong for them somehow.
>
> You mean git describe <commit> produces wrong results for those?

Running "describe" as an external command, like the patch you are
responding to does, would not produce any wrong result, but Réne's
point is that it would be very involved to turn "describe" into a
function that can be called as if it is a library function without
disrupting the operation of "log" that is running in the same
address space, and the change he tried had some weird bugs to get
the computation wrong.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gitattributes export-subst and software versioning
  2021-02-09 20:42     ` Junio C Hamano
@ 2021-02-14 10:04       ` René Scharfe
  0 siblings, 0 replies; 27+ messages in thread
From: René Scharfe @ 2021-02-14 10:04 UTC (permalink / raw)
  To: Junio C Hamano, Eli Schwartz; +Cc: git

Am 09.02.21 um 21:42 schrieb Junio C Hamano:
> Eli Schwartz <eschwartz@archlinux.org> writes:
>
>>> And replacing the flags with a commit slab doesn't seem to be enough,
>>> either -- I get good results lots of commits, but for some git log with
>>> the new placeholder would just show some nonsensical output, as it
>>> seems to get the depth calculation wrong for them somehow.
>>
>> You mean git describe <commit> produces wrong results for those?
>
> Running "describe" as an external command, like the patch you are
> responding to does, would not produce any wrong result, but Réne's
> point is that it would be very involved to turn "describe" into a
> function that can be called as if it is a library function without
> disrupting the operation of "log" that is running in the same
> address space, and the change he tried had some weird bugs to get
> the computation wrong.

Yes, indeed.  Let's go with the forking version for now -- it may be
slow, but it's fast enough, and it's better than nothing.

René

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gitattributes export-subst and software versioning
  2021-02-09  0:19   ` Eli Schwartz
  2021-02-09 20:42     ` Junio C Hamano
@ 2021-02-14 10:04     ` René Scharfe
  1 sibling, 0 replies; 27+ messages in thread
From: René Scharfe @ 2021-02-14 10:04 UTC (permalink / raw)
  To: Eli Schwartz, git; +Cc: Issac Trotts

Am 09.02.21 um 01:19 schrieb Eli Schwartz:
> On the other hand, with my zero knowledge of the code but having read
> lots of man pages... %S documents that it "only works with git log", so
> maybe it is possible to add an option that is documented to only work
> for git archive?
>
> e.g. if you do use it,
>
> $ cat VERSION
> $Format:%d$
> $Format:%S$
>
> $ git archive HEAD | bsdtar -xOf - VERSION
>  (HEAD -> master, tag: 1.0)
> %S
>
> It's apparently completely ignored and treated as raw characters. The
> same restriction could theoretically be added in the other direction for
> a new placeholder.

That's a curious case.  Supporting %S in git archive would be easy  --
it's just a matter of recording the name given at the command line for
the pretty format code to find, like in the sloppy patch below (missing
doc update, missing test, leaks memory, adds a static variable to
library code).

The inconsistent support of %S is not ideal and I think that set a bad
precedent, but on the other hand I don't see its usefulness for git
archive in particular.  So I dunno.  Perhaps worth doing once rev-list
gains support, for completeness.

René

---
 archive.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/archive.c b/archive.c
index 5919d9e505..ffe4c35961 100644
--- a/archive.c
+++ b/archive.c
@@ -9,6 +9,7 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "revision.h"

 static char const * const archive_usage[] = {
 	N_("git archive [<options>] <tree-ish> [<path>...]"),
@@ -23,6 +24,8 @@ static int nr_archivers;
 static int alloc_archivers;
 static int remote_allow_unreachable;

+static struct revision_sources revision_sources;
+
 void register_archiver(struct archiver *ar)
 {
 	ALLOC_GROW(archivers, nr_archivers + 1, alloc_archivers);
@@ -41,7 +44,8 @@ static void format_subst(const struct commit *commit,
 {
 	char *to_free = NULL;
 	struct strbuf fmt = STRBUF_INIT;
-	struct pretty_print_context ctx = {0};
+	struct rev_info rev = { .sources = &revision_sources };
+	struct pretty_print_context ctx = { .rev = &rev };
 	ctx.date_mode.type = DATE_NORMAL;
 	ctx.abbrev = DEFAULT_ABBREV;

@@ -461,6 +465,8 @@ static void parse_treeish_arg(const char **argv,

 	commit = lookup_commit_reference_gently(ar_args->repo, &oid, 1);
 	if (commit) {
+		init_revision_sources(&revision_sources);
+		*revision_sources_at(&revision_sources, commit) = xstrdup(name);
 		commit_oid = &commit->object.oid;
 		archive_time = commit->date;
 	} else {
--
2.30.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 1/2] pretty: add %(describe)
  2021-01-25  0:32 gitattributes export-subst and software versioning Eli Schwartz
  2021-02-08 19:46 ` René Scharfe
@ 2021-02-14 10:04 ` René Scharfe
  2021-02-14 10:10   ` [PATCH 2/2] pretty: add merge and exclude options to %(describe) René Scharfe.
                     ` (3 more replies)
  1 sibling, 4 replies; 27+ messages in thread
From: René Scharfe @ 2021-02-14 10:04 UTC (permalink / raw)
  To: Eli Schwartz, git; +Cc: Junio C Hamano

Add a format placeholder for describe output.  Implement it by actually
calling git describe, which is simple and guarantees correctness.  It's
intended to be used with $Format:...$ in files with the attribute
export-subst and git archive.  It can also be used with git log etc.,
even though that's going to be slow due to the fork for each commit.

Suggested-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Documentation/pretty-formats.txt |  2 ++
 pretty.c                         | 17 +++++++++++++++++
 t/t4205-log-pretty-formats.sh    | 10 ++++++++++
 3 files changed, 29 insertions(+)

diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index 6b59e28d44..bb8c05bc21 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -208,6 +208,8 @@ The placeholders are:
 '%cs':: committer date, short format (`YYYY-MM-DD`)
 '%d':: ref names, like the --decorate option of linkgit:git-log[1]
 '%D':: ref names without the " (", ")" wrapping.
+'%(describe)':: human-readable name, like linkgit:git-describe[1];
+		empty string for undescribable commits
 '%S':: ref name given on the command line by which the commit was reached
        (like `git log --source`), only works with `git log`
 '%e':: encoding
diff --git a/pretty.c b/pretty.c
index b4ff3f602f..a0c427fb61 100644
--- a/pretty.c
+++ b/pretty.c
@@ -12,6 +12,7 @@
 #include "reflog-walk.h"
 #include "gpg-interface.h"
 #include "trailer.h"
+#include "run-command.h"

 static char *user_format;
 static struct cmt_fmt_map {
@@ -1214,6 +1215,22 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
 		return parse_padding_placeholder(placeholder, c);
 	}

+	if (skip_prefix(placeholder, "(describe)", &arg)) {
+		struct child_process cmd = CHILD_PROCESS_INIT;
+		struct strbuf out = STRBUF_INIT;
+		struct strbuf err = STRBUF_INIT;
+
+		cmd.git_cmd = 1;
+		strvec_push(&cmd.args, "describe");
+		strvec_push(&cmd.args, oid_to_hex(&commit->object.oid));
+		pipe_command(&cmd, NULL, 0, &out, 0, &err, 0);
+		strbuf_rtrim(&out);
+		strbuf_addbuf(sb, &out);
+		strbuf_release(&out);
+		strbuf_release(&err);
+		return arg - placeholder;
+	}
+
 	/* these depend on the commit */
 	if (!commit->object.parsed)
 		parse_object(the_repository, &commit->object.oid);
diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index 749bc1431a..5a44fa447d 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -962,4 +962,14 @@ test_expect_success 'log --pretty=reference is colored appropriately' '
 	test_cmp expect actual
 '

+test_expect_success '%(describe) vs git describe' '
+	git log --format="%H" | while read hash
+	do
+		echo "$hash $(git describe $hash)"
+	done >expect &&
+	git log --format="%H %(describe)" >actual 2>err &&
+	test_cmp expect actual &&
+	test_must_be_empty err
+'
+
 test_done
--
2.30.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/2] pretty: add merge and exclude options to %(describe)
  2021-02-14 10:04 ` [PATCH 1/2] pretty: add %(describe) René Scharfe
@ 2021-02-14 10:10   ` René Scharfe.
  2021-02-17 18:31     ` Jeff King
       [not found]     ` <xmqqsg5uletz.fsf@gitster.g>
  2021-02-16  5:04   ` [PATCH 1/2] pretty: add %(describe) Eli Schwartz
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 27+ messages in thread
From: René Scharfe. @ 2021-02-14 10:10 UTC (permalink / raw)
  To: Eli Schwartz, git; +Cc: Junio C Hamano

Allow restricting the tags used by the placeholder %(describe) with the
options match and exclude.  E.g. the following command describes the
current commit using official version tags, without those for release
candidates:

   $ git log -1 --format='%(describe:match=v[0-9]*,exclude=*rc*)'

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Documentation/pretty-formats.txt | 13 ++++++++--
 pretty.c                         | 43 ++++++++++++++++++++++++++++++--
 t/t4205-log-pretty-formats.sh    | 16 ++++++++++++
 3 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index bb8c05bc21..231010e6ef 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -208,8 +208,17 @@ The placeholders are:
 '%cs':: committer date, short format (`YYYY-MM-DD`)
 '%d':: ref names, like the --decorate option of linkgit:git-log[1]
 '%D':: ref names without the " (", ")" wrapping.
-'%(describe)':: human-readable name, like linkgit:git-describe[1];
-		empty string for undescribable commits
+'%(describe[:options])':: human-readable name, like
+			  linkgit:git-describe[1]; empty string for
+			  undescribable commits.  The `describe` string
+			  may be followed by a colon and zero or more
+			  comma-separated options.
++
+** 'match=<pattern>': Only consider tags matching the given
+   `glob(7)` pattern, excluding the "refs/tags/" prefix.
+** 'exclude=<pattern>': Do not consider tags matching the given
+   `glob(7)` pattern, excluding the "refs/tags/" prefix.
+
 '%S':: ref name given on the command line by which the commit was reached
        (like `git log --source`), only works with `git log`
 '%e':: encoding
diff --git a/pretty.c b/pretty.c
index a0c427fb61..c612d2ac9b 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1150,6 +1150,34 @@ static int format_trailer_match_cb(const struct strbuf *key, void *ud)
 	return 0;
 }

+static size_t parse_describe_args(const char *start, struct strvec *args)
+{
+	const char *options[] = { "match", "exclude" };
+	const char *arg = start;
+
+	for (;;) {
+		const char *matched = NULL;
+		const char *argval;
+		size_t arglen = 0;
+		int i;
+
+		for (i = 0; i < ARRAY_SIZE(options); i++) {
+			if (match_placeholder_arg_value(arg, options[i], &arg,
+							&argval, &arglen)) {
+				matched = options[i];
+				break;
+			}
+		}
+		if (!matched)
+			break;
+
+		if (!arglen)
+			return 0;
+		strvec_pushf(args, "--%s=%.*s", matched, (int)arglen, argval);
+	}
+	return arg - start;
+}
+
 static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
 				const char *placeholder,
 				void *context)
@@ -1215,20 +1243,31 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
 		return parse_padding_placeholder(placeholder, c);
 	}

-	if (skip_prefix(placeholder, "(describe)", &arg)) {
+	if (skip_prefix(placeholder, "(describe", &arg)) {
 		struct child_process cmd = CHILD_PROCESS_INIT;
 		struct strbuf out = STRBUF_INIT;
 		struct strbuf err = STRBUF_INIT;

 		cmd.git_cmd = 1;
 		strvec_push(&cmd.args, "describe");
+
+		if (*arg == ':') {
+			arg++;
+			arg += parse_describe_args(arg, &cmd.args);
+		}
+
+		if (*arg != ')') {
+			child_process_clear(&cmd);
+			return 0;
+		}
+
 		strvec_push(&cmd.args, oid_to_hex(&commit->object.oid));
 		pipe_command(&cmd, NULL, 0, &out, 0, &err, 0);
 		strbuf_rtrim(&out);
 		strbuf_addbuf(sb, &out);
 		strbuf_release(&out);
 		strbuf_release(&err);
-		return arg - placeholder;
+		return arg - placeholder + 1;
 	}

 	/* these depend on the commit */
diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index 5a44fa447d..7e36706212 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -972,4 +972,20 @@ test_expect_success '%(describe) vs git describe' '
 	test_must_be_empty err
 '

+test_expect_success '%(describe:match=...) vs git describe --match ...' '
+	test_when_finished "git tag -d tag-match" &&
+	git tag -a -m tagged tag-match&&
+	git describe --match "*-match" >expect &&
+	git log -1 --format="%(describe:match=*-match)" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '%(describe:exclude=...) vs git describe --exclude ...' '
+	test_when_finished "git tag -d tag-exclude" &&
+	git tag -a -m tagged tag-exclude &&
+	git describe --exclude "*-exclude" >expect &&
+	git log -1 --format="%(describe:exclude=*-exclude)" >actual &&
+	test_cmp expect actual
+'
+
 test_done
--
2.30.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] pretty: add %(describe)
  2021-02-14 10:04 ` [PATCH 1/2] pretty: add %(describe) René Scharfe
  2021-02-14 10:10   ` [PATCH 2/2] pretty: add merge and exclude options to %(describe) René Scharfe.
@ 2021-02-16  5:04   ` Eli Schwartz
  2021-02-16 13:00   ` Ævar Arnfjörð Bjarmason
  2021-02-17  0:58   ` Ævar Arnfjörð Bjarmason
  3 siblings, 0 replies; 27+ messages in thread
From: Eli Schwartz @ 2021-02-16  5:04 UTC (permalink / raw)
  To: René Scharfe, git; +Cc: Junio C Hamano


[-- Attachment #1.1: Type: text/plain, Size: 3488 bytes --]

On 2/14/21 5:04 AM, René Scharfe wrote:
> Add a format placeholder for describe output.  Implement it by actually
> calling git describe, which is simple and guarantees correctness.  It's
> intended to be used with $Format:...$ in files with the attribute
> export-subst and git archive.  It can also be used with git log etc.,
> even though that's going to be slow due to the fork for each commit.

This patch works great for me.

In fact, it even works fast enough for git log to not noticeably slow
down unless I really stomp on the "Page Down" button. At least on Linux...

Thanks for working on this!

> Suggested-by: Eli Schwartz <eschwartz@archlinux.org>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  Documentation/pretty-formats.txt |  2 ++
>  pretty.c                         | 17 +++++++++++++++++
>  t/t4205-log-pretty-formats.sh    | 10 ++++++++++
>  3 files changed, 29 insertions(+)
> 
> diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
> index 6b59e28d44..bb8c05bc21 100644
> --- a/Documentation/pretty-formats.txt
> +++ b/Documentation/pretty-formats.txt
> @@ -208,6 +208,8 @@ The placeholders are:
>  '%cs':: committer date, short format (`YYYY-MM-DD`)
>  '%d':: ref names, like the --decorate option of linkgit:git-log[1]
>  '%D':: ref names without the " (", ")" wrapping.
> +'%(describe)':: human-readable name, like linkgit:git-describe[1];
> +		empty string for undescribable commits
>  '%S':: ref name given on the command line by which the commit was reached
>         (like `git log --source`), only works with `git log`
>  '%e':: encoding
> diff --git a/pretty.c b/pretty.c
> index b4ff3f602f..a0c427fb61 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -12,6 +12,7 @@
>  #include "reflog-walk.h"
>  #include "gpg-interface.h"
>  #include "trailer.h"
> +#include "run-command.h"
> 
>  static char *user_format;
>  static struct cmt_fmt_map {
> @@ -1214,6 +1215,22 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
>  		return parse_padding_placeholder(placeholder, c);
>  	}
> 
> +	if (skip_prefix(placeholder, "(describe)", &arg)) {
> +		struct child_process cmd = CHILD_PROCESS_INIT;
> +		struct strbuf out = STRBUF_INIT;
> +		struct strbuf err = STRBUF_INIT;
> +
> +		cmd.git_cmd = 1;
> +		strvec_push(&cmd.args, "describe");
> +		strvec_push(&cmd.args, oid_to_hex(&commit->object.oid));
> +		pipe_command(&cmd, NULL, 0, &out, 0, &err, 0);
> +		strbuf_rtrim(&out);
> +		strbuf_addbuf(sb, &out);
> +		strbuf_release(&out);
> +		strbuf_release(&err);
> +		return arg - placeholder;
> +	}
> +
>  	/* these depend on the commit */
>  	if (!commit->object.parsed)
>  		parse_object(the_repository, &commit->object.oid);
> diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
> index 749bc1431a..5a44fa447d 100755
> --- a/t/t4205-log-pretty-formats.sh
> +++ b/t/t4205-log-pretty-formats.sh
> @@ -962,4 +962,14 @@ test_expect_success 'log --pretty=reference is colored appropriately' '
>  	test_cmp expect actual
>  '
> 
> +test_expect_success '%(describe) vs git describe' '
> +	git log --format="%H" | while read hash
> +	do
> +		echo "$hash $(git describe $hash)"
> +	done >expect &&
> +	git log --format="%H %(describe)" >actual 2>err &&
> +	test_cmp expect actual &&
> +	test_must_be_empty err
> +'
> +
>  test_done
> --
> 2.30.1
> 


-- 
Eli Schwartz
Bug Wrangler and Trusted User


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] pretty: add %(describe)
  2021-02-14 10:04 ` [PATCH 1/2] pretty: add %(describe) René Scharfe
  2021-02-14 10:10   ` [PATCH 2/2] pretty: add merge and exclude options to %(describe) René Scharfe.
  2021-02-16  5:04   ` [PATCH 1/2] pretty: add %(describe) Eli Schwartz
@ 2021-02-16 13:00   ` Ævar Arnfjörð Bjarmason
  2021-02-16 17:13     ` René Scharfe.
  2021-02-16 18:44     ` Junio C Hamano
  2021-02-17  0:58   ` Ævar Arnfjörð Bjarmason
  3 siblings, 2 replies; 27+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 13:00 UTC (permalink / raw)
  To: René Scharfe; +Cc: Eli Schwartz, git, Junio C Hamano


On Sun, Feb 14 2021, René Scharfe wrote:

> Add a format placeholder for describe output.  Implement it by actually
> calling git describe, which is simple and guarantees correctness.  It's
> intended to be used with $Format:...$ in files with the attribute
> export-subst and git archive. 

Does it really guarantee correctness though? In "builtin/describe.c" we
first walk over the refs and use that to format all N items we're asked
about.

Under "git log" this is presumably in a race where refs added/deleted
during the run of "git log" will change the describe output to be
inconsistent with earlier lines.

> It can also be used with git log etc., even though that's going to be
> slow due to the fork for each commit.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] pretty: add %(describe)
  2021-02-16 13:00   ` Ævar Arnfjörð Bjarmason
@ 2021-02-16 17:13     ` René Scharfe.
  2021-02-16 18:44     ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: René Scharfe. @ 2021-02-16 17:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Eli Schwartz, git, Junio C Hamano

Am 16.02.21 um 14:00 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sun, Feb 14 2021, René Scharfe wrote:
>
>> Add a format placeholder for describe output.  Implement it by actually
>> calling git describe, which is simple and guarantees correctness.  It's
>> intended to be used with $Format:...$ in files with the attribute
>> export-subst and git archive.
>
> Does it really guarantee correctness though? In "builtin/describe.c" we
> first walk over the refs and use that to format all N items we're asked
> about.
>
> Under "git log" this is presumably in a race where refs added/deleted
> during the run of "git log" will change the describe output to be
> inconsistent with earlier lines.

Right, didn't think of that aspect.  So we'd better warn about the
raciness in the documentation.

We could improve that by keeping a single describe process around and
feeding it object names through a pipe as we go.  The results would
still become outdated after a ref is added or removed, but they'd be
consistent.  This would be faster as well.

René

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] pretty: add %(describe)
  2021-02-16 13:00   ` Ævar Arnfjörð Bjarmason
  2021-02-16 17:13     ` René Scharfe.
@ 2021-02-16 18:44     ` Junio C Hamano
  2021-02-17  0:47       ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2021-02-16 18:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: René Scharfe, Eli Schwartz, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Sun, Feb 14 2021, René Scharfe wrote:
>
>> Add a format placeholder for describe output.  Implement it by actually
>> calling git describe, which is simple and guarantees correctness.  It's
>> intended to be used with $Format:...$ in files with the attribute
>> export-subst and git archive. 
>
> Does it really guarantee correctness though? In "builtin/describe.c" we
> first walk over the refs and use that to format all N items we're asked
> about.
>
> Under "git log" this is presumably in a race where refs added/deleted
> during the run of "git log" will change the describe output to be
> inconsistent with earlier lines.

Yes, but it is not a news that the describe for a given commit will
not stay the constant while you add more objects to the repository
or you change the tags, whether the "describe" is driven internally
by "git log" or by the end-user, so I am not sure how that becomes
an issue.  If the output is inconsistent in a quiescent repository,
that would be a problem, though.

Puzzled.

Thanks.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] pretty: add %(describe)
  2021-02-16 18:44     ` Junio C Hamano
@ 2021-02-17  0:47       ` Ævar Arnfjörð Bjarmason
  2021-02-28 11:22         ` René Scharfe.
  0 siblings, 1 reply; 27+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17  0:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Eli Schwartz, git


On Tue, Feb 16 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Sun, Feb 14 2021, René Scharfe wrote:
>>
>>> Add a format placeholder for describe output.  Implement it by actually
>>> calling git describe, which is simple and guarantees correctness.  It's
>>> intended to be used with $Format:...$ in files with the attribute
>>> export-subst and git archive. 
>>
>> Does it really guarantee correctness though? In "builtin/describe.c" we
>> first walk over the refs and use that to format all N items we're asked
>> about.
>>
>> Under "git log" this is presumably in a race where refs added/deleted
>> during the run of "git log" will change the describe output to be
>> inconsistent with earlier lines.
>
> Yes, but it is not a news that the describe for a given commit will
> not stay the constant while you add more objects to the repository
> or you change the tags, whether the "describe" is driven internally
> by "git log" or by the end-user, so I am not sure how that becomes
> an issue.  If the output is inconsistent in a quiescent repository,
> that would be a problem, though.
>
> Puzzled.

Usually something shelling out has going for it is that even if it's
slow it's at least known to be bug-free, after all we use the command
like that already.

I just wanted to point out this edge case, for "git describe <commits>"
we do the ref list once at the beginning so our N list will be
consistent within itself.

Whereas one might expect "git log" to also format such a list, but due
to the internal implementation the semantics are different.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] pretty: add %(describe)
  2021-02-14 10:04 ` [PATCH 1/2] pretty: add %(describe) René Scharfe
                     ` (2 preceding siblings ...)
  2021-02-16 13:00   ` Ævar Arnfjörð Bjarmason
@ 2021-02-17  0:58   ` Ævar Arnfjörð Bjarmason
  2021-02-17 18:12     ` Junio C Hamano
  2021-02-28 11:22     ` René Scharfe.
  3 siblings, 2 replies; 27+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17  0:58 UTC (permalink / raw)
  To: René Scharfe; +Cc: Eli Schwartz, git, Junio C Hamano


On Sun, Feb 14 2021, René Scharfe wrote:

> +'%(describe)':: human-readable name, like linkgit:git-describe[1];
> +		empty string for undescribable commits

In the case of undescribable we've got the subcommand exiting non-zero
and we ignore it. The right thing in this case given how the rest of
format arguments work, but maybe something to explicitly test for?
>
> +	if (skip_prefix(placeholder, "(describe)", &arg)) {
> +		struct child_process cmd = CHILD_PROCESS_INIT;
> +		struct strbuf out = STRBUF_INIT;
> +		struct strbuf err = STRBUF_INIT;
> +
> +		cmd.git_cmd = 1;
> +		strvec_push(&cmd.args, "describe");
> +		strvec_push(&cmd.args, oid_to_hex(&commit->object.oid));
> +		pipe_command(&cmd, NULL, 0, &out, 0, &err, 0);
> +		strbuf_rtrim(&out);
> +		strbuf_addbuf(sb, &out);
> +		strbuf_release(&out);
> +		strbuf_release(&err);
> +		return arg - placeholder;
> +	}

There's another edge case in this: if you do "%(describe)%(describe)"
it'll be run twice for the rev, 3 times if you add another "%(describe)"
etc. I don't know if pretty.c has an easy way to cache/avoid that.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] pretty: add %(describe)
  2021-02-17  0:58   ` Ævar Arnfjörð Bjarmason
@ 2021-02-17 18:12     ` Junio C Hamano
  2021-02-28 11:22     ` René Scharfe.
  1 sibling, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2021-02-17 18:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: René Scharfe, Eli Schwartz, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Sun, Feb 14 2021, René Scharfe wrote:
>
>> +'%(describe)':: human-readable name, like linkgit:git-describe[1];
>> +		empty string for undescribable commits
>
> In the case of undescribable we've got the subcommand exiting non-zero
> and we ignore it. The right thing in this case given how the rest of
> format arguments work, but maybe something to explicitly test for?
>>
>> +	if (skip_prefix(placeholder, "(describe)", &arg)) {
>> +		struct child_process cmd = CHILD_PROCESS_INIT;
>> +		struct strbuf out = STRBUF_INIT;
>> +		struct strbuf err = STRBUF_INIT;
>> +
>> +		cmd.git_cmd = 1;
>> +		strvec_push(&cmd.args, "describe");
>> +		strvec_push(&cmd.args, oid_to_hex(&commit->object.oid));
>> +		pipe_command(&cmd, NULL, 0, &out, 0, &err, 0);
>> +		strbuf_rtrim(&out);
>> +		strbuf_addbuf(sb, &out);
>> +		strbuf_release(&out);
>> +		strbuf_release(&err);
>> +		return arg - placeholder;
>> +	}
>
> There's another edge case in this: if you do "%(describe)%(describe)"
> it'll be run twice for the rev, 3 times if you add another "%(describe)"
> etc. I don't know if pretty.c has an easy way to cache/avoid that.

Just like for-each-ref that has the "atom" system to lazily parse
and cache a computed result so that multiple invocations of the same
placeholder will have to prepare a string only once, the pretty side
has the "format_commit_context" structure that can be used to cache
values that are expensive to compute in case it is used more than
once, so we could use it.

I however suspect that we may already be leaking some cacheed values
in the current code.  For example, there is "signature_check"
instance that is used to handle %G and we do use it to call
check_signature(), but a call to signature_check_clear() to release
resources is nowhere to be seen as far as I can tell.





^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] pretty: add merge and exclude options to %(describe)
  2021-02-14 10:10   ` [PATCH 2/2] pretty: add merge and exclude options to %(describe) René Scharfe.
@ 2021-02-17 18:31     ` Jeff King
  2021-02-28 11:22       ` René Scharfe.
       [not found]     ` <xmqqsg5uletz.fsf@gitster.g>
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff King @ 2021-02-17 18:31 UTC (permalink / raw)
  To: René Scharfe.; +Cc: Eli Schwartz, git, Junio C Hamano

On Sun, Feb 14, 2021 at 11:10:57AM +0100, René Scharfe. wrote:

> Allow restricting the tags used by the placeholder %(describe) with the
> options match and exclude.  E.g. the following command describes the
> current commit using official version tags, without those for release
> candidates:
> 
>    $ git log -1 --format='%(describe:match=v[0-9]*,exclude=*rc*)'

An interesting side effect of this series is that it allows remote users
asking for archives to fill in this data, too (by using export-subst
placeholders). That includes servers allowing "git archive --remote",
but also services like GitHub that will run git-archive on behalf of
clients.

I wonder what avenues for mischief this provides. Certainly using extra
CPU to run git-describe. But I guess also probing at otherwise hidden
refs using the match/exclude system (though since it's limited to
refs/tags/, that's pretty unlikely).

I present this mostly as an observation, not an objection. Certainly we
already have %D which can look at hidden refs. And I strongly suspect
that the server-side git-upload-archive does not respect hidden refs
when resolving object names in the first place.

Kind of an interesting thought as we extend the formatting language,
though. I generally think of them as something that is always under
control of caller, but export-subst's $Format$ will come from the repo
contents.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] pretty: add %(describe)
  2021-02-17  0:58   ` Ævar Arnfjörð Bjarmason
  2021-02-17 18:12     ` Junio C Hamano
@ 2021-02-28 11:22     ` René Scharfe.
       [not found]       ` <xmqq7dmqsr72.fsf@gitster.c.googlers.com>
  1 sibling, 1 reply; 27+ messages in thread
From: René Scharfe. @ 2021-02-28 11:22 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Eli Schwartz, git, Junio C Hamano

Am 17.02.21 um 01:58 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sun, Feb 14 2021, René Scharfe wrote:
>
>> +'%(describe)':: human-readable name, like linkgit:git-describe[1];
>> +		empty string for undescribable commits
>
> In the case of undescribable we've got the subcommand exiting non-zero
> and we ignore it. The right thing in this case given how the rest of
> format arguments work, but maybe something to explicitly test for?

The test convers it, but we can surely make that easier to see.

-- >8 --
Subject: [PATCH] t4205: assert %(describe) test coverage

Document that the test is covering both describable and
undescribable commits.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/t4205-log-pretty-formats.sh | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index b47a0bd9eb..cabdf7d57a 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -965,8 +965,17 @@ test_expect_success 'log --pretty=reference is colored appropriately' '
 test_expect_success '%(describe) vs git describe' '
 	git log --format="%H" | while read hash
 	do
-		echo "$hash $(git describe $hash)"
+		if desc=$(git describe $hash)
+		then
+			: >expect-contains-good
+		else
+			: >expect-contains-bad
+		fi &&
+		echo "$hash $desc"
 	done >expect &&
+	test_path_exists expect-contains-good &&
+	test_path_exists expect-contains-bad &&
+
 	git log --format="%H %(describe)" >actual 2>err &&
 	test_cmp expect actual &&
 	test_must_be_empty err
--
2.30.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] pretty: add %(describe)
  2021-02-17  0:47       ` Ævar Arnfjörð Bjarmason
@ 2021-02-28 11:22         ` René Scharfe.
       [not found]           ` <xmqq35xesqzk.fsf@gitster.c.googlers.com>
  0 siblings, 1 reply; 27+ messages in thread
From: René Scharfe. @ 2021-02-28 11:22 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Junio C Hamano; +Cc: Eli Schwartz, git

Am 17.02.21 um 01:47 schrieb Ævar Arnfjörð Bjarmason:
> Usually something shelling out has going for it is that even if it's
> slow it's at least known to be bug-free, after all we use the command
> like that already.
>
> I just wanted to point out this edge case, for "git describe <commits>"
> we do the ref list once at the beginning so our N list will be
> consistent within itself.
>
> Whereas one might expect "git log" to also format such a list, but due
> to the internal implementation the semantics are different.

It shouldn't matter for the intended use case (git archive containing a
spec file with a single "$Format:%(describe)$") and I cannot imagine how
consistency in the face of tag changes would be useful (what application
would be OK with consistently outdated output, but break with partly
outdated descriptions).  But it makes sense to mention it in the docs.

-- >8 --
Subject: [PATCH] pretty: document multiple %(describe) being inconsistent

Each %(describe) placeholder is expanded using a separate git describe
call.  Their outputs depend on the tags present at the time, so there's
no consistency guarantee.  Document that fact.

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Documentation/pretty-formats.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index 231010e6ef..45133066e4 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -212,7 +212,9 @@ The placeholders are:
 			  linkgit:git-describe[1]; empty string for
 			  undescribable commits.  The `describe` string
 			  may be followed by a colon and zero or more
-			  comma-separated options.
+			  comma-separated options.  Descriptions can be
+			  inconsistent when tags are added or removed at
+			  the same time.
 +
 ** 'match=<pattern>': Only consider tags matching the given
    `glob(7)` pattern, excluding the "refs/tags/" prefix.
--
2.30.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] pretty: add merge and exclude options to %(describe)
       [not found]     ` <xmqqsg5uletz.fsf@gitster.g>
@ 2021-02-28 11:22       ` René Scharfe.
  0 siblings, 0 replies; 27+ messages in thread
From: René Scharfe. @ 2021-02-28 11:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eli Schwartz, git

Am 17.02.21 um 19:38 schrieb Junio C Hamano:
> I wonder, in addition to "match" and "exclude", if we want to allow
> "always" as well.

I added "match" because that describe option is used by GIT-VERSION-GEN,
so I imagine it's generally useful for version names in spec files.  Not
sure why I misspelled it "merge" in the subject, though. o_O

"exclude" is not in there, but I threw it in anyway, as the example in
77d21f29ea (describe: teach describe negative pattern matches,
2017-01-18) made sense to me.  It complements "match" nicely.

I had "always" in proof-of-concept patch because I hadn't decided what
to do with commits that git describe doesn't find a description for,
and wanted to check the full output of git log --pretty='%(describe)'.
For a release tarball of a repo that lacks tags it would be easier to
use %h instead of %(describe:always) -- or tag the release.

That's why I didn't include "always" in the latest patches, but if it
turns out to be useful for someone then I wouldn't them adding it.

> Also, looking further into the future, I wonder if we should aim to
> eventually unify %h and %H as well as %(describe) into one,
> i.e. various ways to spell a commit object name, given that there is
> a separate effort underway to unify pretty and for-each-ref format
> strings.
>
> E.g. %(objectname) is the same as %H, and %(objectname:short) is the
> same as %h, so this might be %(objectname:describe), or something
> along the line.

According to the glossary and object name is:

        The unique identifier of an object.  The
        object name is usually represented by a 40 character
        hexadecimal string.  Also colloquially called SHA-1.

And that's how I understand it as well.  The object layer with it's
hashes on one hand and descriptions based on refs and commit relations
on the other are separate things in my mind.  %(describe) falling back
to %h when :always is given makes sense to me; %(objectname) "falling
forward" to show describe output feels like a layering violation.

René

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] pretty: add merge and exclude options to %(describe)
  2021-02-17 18:31     ` Jeff King
@ 2021-02-28 11:22       ` René Scharfe.
  2021-02-28 15:41         ` Ævar Arnfjörð Bjarmason
       [not found]         ` <xmqqy2f6rc8f.fsf@gitster.c.googlers.com>
  0 siblings, 2 replies; 27+ messages in thread
From: René Scharfe. @ 2021-02-28 11:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Eli Schwartz, git, Junio C Hamano

Am 17.02.21 um 19:31 schrieb Jeff King:
> On Sun, Feb 14, 2021 at 11:10:57AM +0100, René Scharfe. wrote:
>
>> Allow restricting the tags used by the placeholder %(describe) with the
>> options match and exclude.  E.g. the following command describes the
>> current commit using official version tags, without those for release
>> candidates:
>>
>>    $ git log -1 --format='%(describe:match=v[0-9]*,exclude=*rc*)'
>
> An interesting side effect of this series is that it allows remote users
> asking for archives to fill in this data, too (by using export-subst
> placeholders). That includes servers allowing "git archive --remote",
> but also services like GitHub that will run git-archive on behalf of
> clients.
>
> I wonder what avenues for mischief this provides. Certainly using extra
> CPU to run git-describe.

A repository can contain millions of files, each file can contain
millions of $Format:...$ sequences and each of them can contain millions
of %(describe) placeholders.  Each of them could have different match or
exclude args to prevent caching.  Allowing a single request to cause
trillions of calls of git describe sounds excessive.  Let's limit this.

-- >8 --
Subject: [PATCH] archive: expand only a single %(describe) per archive

Every %(describe) placeholder in $Format:...$ strings in files with the
attribute export-subst is expanded by calling git describe.  This can
potentially result in a lot of such calls per archive.  That's OK for
local repositories under control of the user of git archive, but could
be a problem for hosted repositories.

Expand only a single %(describe) placeholder per archive for now to
avoid denial-of-service attacks.  We can make this limit configurable
later if needed, but let's start out simple.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Documentation/gitattributes.txt |  3 ++-
 archive.c                       | 16 ++++++++++------
 archive.h                       |  2 ++
 pretty.c                        |  8 ++++++++
 pretty.h                        |  5 +++++
 t/t5001-archive-attr.sh         | 14 ++++++++++++++
 6 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e84e104f93..0a60472bb5 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -1174,7 +1174,8 @@ tag then no replacement will be done.  The placeholders are the same
 as those for the option `--pretty=format:` of linkgit:git-log[1],
 except that they need to be wrapped like this: `$Format:PLACEHOLDERS$`
 in the file.  E.g. the string `$Format:%H$` will be replaced by the
-commit hash.
+commit hash.  However, only one `%(describe)` placeholder is expanded
+per archive to avoid denial-of-service attacks.


 Packing objects
diff --git a/archive.c b/archive.c
index 5919d9e505..2dd2236ae0 100644
--- a/archive.c
+++ b/archive.c
@@ -37,13 +37,10 @@ void init_archivers(void)

 static void format_subst(const struct commit *commit,
 			 const char *src, size_t len,
-			 struct strbuf *buf)
+			 struct strbuf *buf, struct pretty_print_context *ctx)
 {
 	char *to_free = NULL;
 	struct strbuf fmt = STRBUF_INIT;
-	struct pretty_print_context ctx = {0};
-	ctx.date_mode.type = DATE_NORMAL;
-	ctx.abbrev = DEFAULT_ABBREV;

 	if (src == buf->buf)
 		to_free = strbuf_detach(buf, NULL);
@@ -61,7 +58,7 @@ static void format_subst(const struct commit *commit,
 		strbuf_add(&fmt, b + 8, c - b - 8);

 		strbuf_add(buf, src, b - src);
-		format_commit_message(commit, fmt.buf, buf, &ctx);
+		format_commit_message(commit, fmt.buf, buf, ctx);
 		len -= c + 1 - src;
 		src  = c + 1;
 	}
@@ -94,7 +91,7 @@ static void *object_file_to_archive(const struct archiver_args *args,
 		strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
 		convert_to_working_tree(args->repo->index, path, buf.buf, buf.len, &buf, &meta);
 		if (commit)
-			format_subst(commit, buf.buf, buf.len, &buf);
+			format_subst(commit, buf.buf, buf.len, &buf, args->pretty_ctx);
 		buffer = strbuf_detach(&buf, &size);
 		*sizep = size;
 	}
@@ -633,12 +630,19 @@ int write_archive(int argc, const char **argv, const char *prefix,
 		  const char *name_hint, int remote)
 {
 	const struct archiver *ar = NULL;
+	struct pretty_print_describe_status describe_status = {0};
+	struct pretty_print_context ctx = {0};
 	struct archiver_args args;
 	int rc;

 	git_config_get_bool("uploadarchive.allowunreachable", &remote_allow_unreachable);
 	git_config(git_default_config, NULL);

+	describe_status.max_invocations = 1;
+	ctx.date_mode.type = DATE_NORMAL;
+	ctx.abbrev = DEFAULT_ABBREV;
+	ctx.describe_status = &describe_status;
+	args.pretty_ctx = &ctx;
 	args.repo = repo;
 	args.prefix = prefix;
 	string_list_init(&args.extra_files, 1);
diff --git a/archive.h b/archive.h
index 33551b7ee1..49fab71aaf 100644
--- a/archive.h
+++ b/archive.h
@@ -5,6 +5,7 @@
 #include "pathspec.h"

 struct repository;
+struct pretty_print_context;

 struct archiver_args {
 	struct repository *repo;
@@ -22,6 +23,7 @@ struct archiver_args {
 	unsigned int convert : 1;
 	int compression_level;
 	struct string_list extra_files;
+	struct pretty_print_context *pretty_ctx;
 };

 /* main api */
diff --git a/pretty.c b/pretty.c
index c612d2ac9b..032e89cd4e 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1247,6 +1247,14 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
 		struct child_process cmd = CHILD_PROCESS_INIT;
 		struct strbuf out = STRBUF_INIT;
 		struct strbuf err = STRBUF_INIT;
+		struct pretty_print_describe_status *describe_status;
+
+		describe_status = c->pretty_ctx->describe_status;
+		if (describe_status) {
+			if (!describe_status->max_invocations)
+				return 0;
+			describe_status->max_invocations--;
+		}

 		cmd.git_cmd = 1;
 		strvec_push(&cmd.args, "describe");
diff --git a/pretty.h b/pretty.h
index 7ce6c0b437..27b15947ff 100644
--- a/pretty.h
+++ b/pretty.h
@@ -23,6 +23,10 @@ enum cmit_fmt {
 	CMIT_FMT_UNSPECIFIED
 };

+struct pretty_print_describe_status {
+	unsigned int max_invocations;
+};
+
 struct pretty_print_context {
 	/*
 	 * Callers should tweak these to change the behavior of pp_* functions.
@@ -44,6 +48,7 @@ struct pretty_print_context {
 	int color;
 	struct ident_split *from_ident;
 	unsigned encode_email_headers:1;
+	struct pretty_print_describe_status *describe_status;

 	/*
 	 * Fields below here are manipulated internally by pp_* functions and
diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
index e9aa97117a..1c9ce3956b 100755
--- a/t/t5001-archive-attr.sh
+++ b/t/t5001-archive-attr.sh
@@ -128,4 +128,18 @@ test_expect_success 'export-subst' '
 	test_cmp substfile2 archive/substfile2
 '

+test_expect_success 'export-subst expands %(describe) once' '
+	echo "\$Format:%(describe)\$" >substfile3 &&
+	echo "\$Format:%(describe)\$" >>substfile3 &&
+	echo "\$Format:%(describe)${LF}%(describe)\$" >substfile4 &&
+	git add substfile[34] &&
+	git commit -m export-subst-describe &&
+	git tag -m export-subst-describe export-subst-describe &&
+	git archive HEAD >archive-describe.tar &&
+	extract_tar_to_dir archive-describe &&
+	desc=$(git describe) &&
+	grep -F "$desc" archive-describe/substfile[34] >substituted &&
+	test_line_count = 1 substituted
+'
+
 test_done
--
2.30.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] pretty: add merge and exclude options to %(describe)
  2021-02-28 11:22       ` René Scharfe.
@ 2021-02-28 15:41         ` Ævar Arnfjörð Bjarmason
  2021-03-02 16:00           ` René Scharfe.
       [not found]         ` <xmqqy2f6rc8f.fsf@gitster.c.googlers.com>
  1 sibling, 1 reply; 27+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-28 15:41 UTC (permalink / raw)
  To: René Scharfe.; +Cc: Jeff King, Eli Schwartz, git, Junio C Hamano


On Sun, Feb 28 2021, René Scharfe. wrote:

> Am 17.02.21 um 19:31 schrieb Jeff King:
>> On Sun, Feb 14, 2021 at 11:10:57AM +0100, René Scharfe. wrote:
>>
>>> Allow restricting the tags used by the placeholder %(describe) with the
>>> options match and exclude.  E.g. the following command describes the
>>> current commit using official version tags, without those for release
>>> candidates:
>>>
>>>    $ git log -1 --format='%(describe:match=v[0-9]*,exclude=*rc*)'
>>
>> An interesting side effect of this series is that it allows remote users
>> asking for archives to fill in this data, too (by using export-subst
>> placeholders). That includes servers allowing "git archive --remote",
>> but also services like GitHub that will run git-archive on behalf of
>> clients.
>>
>> I wonder what avenues for mischief this provides. Certainly using extra
>> CPU to run git-describe.
>
> A repository can contain millions of files, each file can contain
> millions of $Format:...$ sequences and each of them can contain millions
> of %(describe) placeholders.  Each of them could have different match or
> exclude args to prevent caching.  Allowing a single request to cause
> trillions of calls of git describe sounds excessive.  Let's limit this.
>
> -- >8 --
> Subject: [PATCH] archive: expand only a single %(describe) per archive
>
> Every %(describe) placeholder in $Format:...$ strings in files with the
> attribute export-subst is expanded by calling git describe.  This can
> potentially result in a lot of such calls per archive.  That's OK for
> local repositories under control of the user of git archive, but could
> be a problem for hosted repositories.
>
> Expand only a single %(describe) placeholder per archive for now to
> avoid denial-of-service attacks.  We can make this limit configurable
> later if needed, but let's start out simple.
>
> Reported-by: Jeff King <peff@peff.net>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  Documentation/gitattributes.txt |  3 ++-
>  archive.c                       | 16 ++++++++++------
>  archive.h                       |  2 ++
>  pretty.c                        |  8 ++++++++
>  pretty.h                        |  5 +++++
>  t/t5001-archive-attr.sh         | 14 ++++++++++++++
>  6 files changed, 41 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index e84e104f93..0a60472bb5 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -1174,7 +1174,8 @@ tag then no replacement will be done.  The placeholders are the same
>  as those for the option `--pretty=format:` of linkgit:git-log[1],
>  except that they need to be wrapped like this: `$Format:PLACEHOLDERS$`
>  in the file.  E.g. the string `$Format:%H$` will be replaced by the
> -commit hash.
> +commit hash.  However, only one `%(describe)` placeholder is expanded
> +per archive to avoid denial-of-service attacks.
>
>
>  Packing objects
> diff --git a/archive.c b/archive.c
> index 5919d9e505..2dd2236ae0 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -37,13 +37,10 @@ void init_archivers(void)
>
>  static void format_subst(const struct commit *commit,
>  			 const char *src, size_t len,
> -			 struct strbuf *buf)
> +			 struct strbuf *buf, struct pretty_print_context *ctx)
>  {
>  	char *to_free = NULL;
>  	struct strbuf fmt = STRBUF_INIT;
> -	struct pretty_print_context ctx = {0};
> -	ctx.date_mode.type = DATE_NORMAL;
> -	ctx.abbrev = DEFAULT_ABBREV;
>
>  	if (src == buf->buf)
>  		to_free = strbuf_detach(buf, NULL);
> @@ -61,7 +58,7 @@ static void format_subst(const struct commit *commit,
>  		strbuf_add(&fmt, b + 8, c - b - 8);
>
>  		strbuf_add(buf, src, b - src);
> -		format_commit_message(commit, fmt.buf, buf, &ctx);
> +		format_commit_message(commit, fmt.buf, buf, ctx);
>  		len -= c + 1 - src;
>  		src  = c + 1;
>  	}
> @@ -94,7 +91,7 @@ static void *object_file_to_archive(const struct archiver_args *args,
>  		strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
>  		convert_to_working_tree(args->repo->index, path, buf.buf, buf.len, &buf, &meta);
>  		if (commit)
> -			format_subst(commit, buf.buf, buf.len, &buf);
> +			format_subst(commit, buf.buf, buf.len, &buf, args->pretty_ctx);
>  		buffer = strbuf_detach(&buf, &size);
>  		*sizep = size;
>  	}
> @@ -633,12 +630,19 @@ int write_archive(int argc, const char **argv, const char *prefix,
>  		  const char *name_hint, int remote)
>  {
>  	const struct archiver *ar = NULL;
> +	struct pretty_print_describe_status describe_status = {0};
> +	struct pretty_print_context ctx = {0};
>  	struct archiver_args args;
>  	int rc;
>
>  	git_config_get_bool("uploadarchive.allowunreachable", &remote_allow_unreachable);
>  	git_config(git_default_config, NULL);
>
> +	describe_status.max_invocations = 1;
> +	ctx.date_mode.type = DATE_NORMAL;
> +	ctx.abbrev = DEFAULT_ABBREV;
> +	ctx.describe_status = &describe_status;
> +	args.pretty_ctx = &ctx;
>  	args.repo = repo;
>  	args.prefix = prefix;
>  	string_list_init(&args.extra_files, 1);
> diff --git a/archive.h b/archive.h
> index 33551b7ee1..49fab71aaf 100644
> --- a/archive.h
> +++ b/archive.h
> @@ -5,6 +5,7 @@
>  #include "pathspec.h"
>
>  struct repository;
> +struct pretty_print_context;
>
>  struct archiver_args {
>  	struct repository *repo;
> @@ -22,6 +23,7 @@ struct archiver_args {
>  	unsigned int convert : 1;
>  	int compression_level;
>  	struct string_list extra_files;
> +	struct pretty_print_context *pretty_ctx;
>  };
>
>  /* main api */
> diff --git a/pretty.c b/pretty.c
> index c612d2ac9b..032e89cd4e 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1247,6 +1247,14 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
>  		struct child_process cmd = CHILD_PROCESS_INIT;
>  		struct strbuf out = STRBUF_INIT;
>  		struct strbuf err = STRBUF_INIT;
> +		struct pretty_print_describe_status *describe_status;
> +
> +		describe_status = c->pretty_ctx->describe_status;
> +		if (describe_status) {
> +			if (!describe_status->max_invocations)
> +				return 0;
> +			describe_status->max_invocations--;
> +		}
>
>  		cmd.git_cmd = 1;
>  		strvec_push(&cmd.args, "describe");
> diff --git a/pretty.h b/pretty.h
> index 7ce6c0b437..27b15947ff 100644
> --- a/pretty.h
> +++ b/pretty.h
> @@ -23,6 +23,10 @@ enum cmit_fmt {
>  	CMIT_FMT_UNSPECIFIED
>  };
>
> +struct pretty_print_describe_status {
> +	unsigned int max_invocations;
> +};
> +
>  struct pretty_print_context {
>  	/*
>  	 * Callers should tweak these to change the behavior of pp_* functions.
> @@ -44,6 +48,7 @@ struct pretty_print_context {
>  	int color;
>  	struct ident_split *from_ident;
>  	unsigned encode_email_headers:1;
> +	struct pretty_print_describe_status *describe_status;
>
>  	/*
>  	 * Fields below here are manipulated internally by pp_* functions and
> diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
> index e9aa97117a..1c9ce3956b 100755
> --- a/t/t5001-archive-attr.sh
> +++ b/t/t5001-archive-attr.sh
> @@ -128,4 +128,18 @@ test_expect_success 'export-subst' '
>  	test_cmp substfile2 archive/substfile2
>  '
>
> +test_expect_success 'export-subst expands %(describe) once' '
> +	echo "\$Format:%(describe)\$" >substfile3 &&
> +	echo "\$Format:%(describe)\$" >>substfile3 &&
> +	echo "\$Format:%(describe)${LF}%(describe)\$" >substfile4 &&
> +	git add substfile[34] &&
> +	git commit -m export-subst-describe &&
> +	git tag -m export-subst-describe export-subst-describe &&
> +	git archive HEAD >archive-describe.tar &&
> +	extract_tar_to_dir archive-describe &&
> +	desc=$(git describe) &&
> +	grep -F "$desc" archive-describe/substfile[34] >substituted &&
> +	test_line_count = 1 substituted
> +'
> +
>  test_done

This whole thing seems rather backwards as a solution to the "we run it
N times" problem I mentioned in <87k0r7a4sr.fsf@evledraar.gmail.com>.

Instead of taking the trouble of putting a limit in the
pretty_print_context so we don't call it N times for the same commit,
why not just put the strbuf with the result in that same struct?

Then you can have it millions of times, and it won't be any more
expensive than the other existing %(format) specifiers (actually cheaper
than most).

Or is there some edge case I'm missing here where "git archive" can
either be fed N commits and we share the context, or we share the
context across formatting different revisions in some cases?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] pretty: add %(describe)
       [not found]           ` <xmqq35xesqzk.fsf@gitster.c.googlers.com>
@ 2021-03-02 16:00             ` René Scharfe.
  0 siblings, 0 replies; 27+ messages in thread
From: René Scharfe. @ 2021-03-02 16:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Eli Schwartz, git

Am 01.03.21 um 18:50 schrieb Junio C Hamano:
> René Scharfe. <l.s.r@web.de> writes:
>
>> Am 17.02.21 um 01:47 schrieb Ævar Arnfjörð Bjarmason:
>>> Usually something shelling out has going for it is that even if it's
>>> slow it's at least known to be bug-free, after all we use the command
>>> like that already.
>>>
>>> I just wanted to point out this edge case, for "git describe <commits>"
>>> we do the ref list once at the beginning so our N list will be
>>> consistent within itself.
>>>
>>> Whereas one might expect "git log" to also format such a list, but due
>>> to the internal implementation the semantics are different.
>>
>> It shouldn't matter for the intended use case (git archive containing a
>> spec file with a single "$Format:%(describe)$") and I cannot imagine how
>> consistency in the face of tag changes would be useful (what application
>> would be OK with consistently outdated output, but break with partly
>> outdated descriptions).  But it makes sense to mention it in the docs.
>
> Does it?
>
> As long as the reader understands "git describe" is about measuring
> the "location" of given commits relative to the annotated tags (or
> whatever anchor points the invocation chooses to use), I would say
> that it is unlikely that the reader does not to realize the
> ramifications of changing tags in the middle.

I actually agree, but I'm biased.  The thing is: The question came up
and needed answering, so there is a chance that it might help someone
else as well.

> While it may not be incorrect per-se (hence it may "not hurt"),
> making the description longer does hurt the readers' experience.

The added sentence would look better in a smaller font, or in a
footnote. :-|

> So, I am a bit skeptical if this is a good change, but will queue
> for now anyway.
>
> Thanks.
>
>
>
>> -- >8 --
>> Subject: [PATCH] pretty: document multiple %(describe) being inconsistent
>>
>> Each %(describe) placeholder is expanded using a separate git describe
>> call.  Their outputs depend on the tags present at the time, so there's
>> no consistency guarantee.  Document that fact.
>>
>> Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  Documentation/pretty-formats.txt | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
>> index 231010e6ef..45133066e4 100644
>> --- a/Documentation/pretty-formats.txt
>> +++ b/Documentation/pretty-formats.txt
>> @@ -212,7 +212,9 @@ The placeholders are:
>>  			  linkgit:git-describe[1]; empty string for
>>  			  undescribable commits.  The `describe` string
>>  			  may be followed by a colon and zero or more
>> -			  comma-separated options.
>> +			  comma-separated options.  Descriptions can be
>> +			  inconsistent when tags are added or removed at
>> +			  the same time.
>>  +
>>  ** 'match=<pattern>': Only consider tags matching the given
>>     `glob(7)` pattern, excluding the "refs/tags/" prefix.
>> --
>> 2.30.1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] pretty: add %(describe)
       [not found]       ` <xmqq7dmqsr72.fsf@gitster.c.googlers.com>
@ 2021-03-02 16:00         ` René Scharfe.
  0 siblings, 0 replies; 27+ messages in thread
From: René Scharfe. @ 2021-03-02 16:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Eli Schwartz, git

Am 01.03.21 um 18:45 schrieb Junio C Hamano:
> René Scharfe. <l.s.r@web.de> writes:
>
>> Subject: [PATCH] t4205: assert %(describe) test coverage
>>
>> Document that the test is covering both describable and
>> undescribable commits.
>>
>> Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  t/t4205-log-pretty-formats.sh | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
>> index b47a0bd9eb..cabdf7d57a 100755
>> --- a/t/t4205-log-pretty-formats.sh
>> +++ b/t/t4205-log-pretty-formats.sh
>> @@ -965,8 +965,17 @@ test_expect_success 'log --pretty=reference is colored appropriately' '
>>  test_expect_success '%(describe) vs git describe' '
>>  	git log --format="%H" | while read hash
>>  	do
>> -		echo "$hash $(git describe $hash)"
>> +		if desc=$(git describe $hash)
>> +		then
>> +			: >expect-contains-good
>> +		else
>> +			: >expect-contains-bad
>> +		fi &&
>> +		echo "$hash $desc"
>>  	done >expect &&
>> +	test_path_exists expect-contains-good &&
>> +	test_path_exists expect-contains-bad &&
>
> Hmph, I am not sure why we want temporary files for this (and I
> doubt this "documenting" adds that much value to the tests to begin
> with), but OK.  Will queue.

Variables would suffice, but make debugging harder.  test_path_exists
will at least print a suggestive file name.  Perhaps we should add a
test_assert?

The added checks guard against neutering the test accidentally e.g.
by tagging the currently undescribable commit in the setup phase.
That would be hard to detect without it.

>
>>  	git log --format="%H %(describe)" >actual 2>err &&
>>  	test_cmp expect actual &&
>>  	test_must_be_empty err
>> --
>> 2.30.1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] pretty: add merge and exclude options to %(describe)
  2021-02-28 15:41         ` Ævar Arnfjörð Bjarmason
@ 2021-03-02 16:00           ` René Scharfe.
  2021-03-06 16:18             ` René Scharfe.
  0 siblings, 1 reply; 27+ messages in thread
From: René Scharfe. @ 2021-03-02 16:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Eli Schwartz, git, Junio C Hamano

Am 28.02.21 um 16:41 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sun, Feb 28 2021, René Scharfe. wrote:
>
>> Am 17.02.21 um 19:31 schrieb Jeff King:
>>> On Sun, Feb 14, 2021 at 11:10:57AM +0100, René Scharfe. wrote:
>>>
>>>> Allow restricting the tags used by the placeholder %(describe) with the
>>>> options match and exclude.  E.g. the following command describes the
>>>> current commit using official version tags, without those for release
>>>> candidates:
>>>>
>>>>    $ git log -1 --format='%(describe:match=v[0-9]*,exclude=*rc*)'
>>>
>>> An interesting side effect of this series is that it allows remote users
>>> asking for archives to fill in this data, too (by using export-subst
>>> placeholders). That includes servers allowing "git archive --remote",
>>> but also services like GitHub that will run git-archive on behalf of
>>> clients.
>>>
>>> I wonder what avenues for mischief this provides. Certainly using extra
>>> CPU to run git-describe.
>>
>> A repository can contain millions of files, each file can contain
>> millions of $Format:...$ sequences and each of them can contain millions
>> of %(describe) placeholders.  Each of them could have different match or
>> exclude args to prevent caching.  Allowing a single request to cause
>> trillions of calls of git describe sounds excessive.  Let's limit this.
>>
>> -- >8 --
>> Subject: [PATCH] archive: expand only a single %(describe) per archive
>>
>> Every %(describe) placeholder in $Format:...$ strings in files with the
>> attribute export-subst is expanded by calling git describe.  This can
>> potentially result in a lot of such calls per archive.  That's OK for
>> local repositories under control of the user of git archive, but could
>> be a problem for hosted repositories.
>>
>> Expand only a single %(describe) placeholder per archive for now to
>> avoid denial-of-service attacks.  We can make this limit configurable
>> later if needed, but let's start out simple.
>>
>> Reported-by: Jeff King <peff@peff.net>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  Documentation/gitattributes.txt |  3 ++-
>>  archive.c                       | 16 ++++++++++------
>>  archive.h                       |  2 ++
>>  pretty.c                        |  8 ++++++++
>>  pretty.h                        |  5 +++++
>>  t/t5001-archive-attr.sh         | 14 ++++++++++++++
>>  6 files changed, 41 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
>> index e84e104f93..0a60472bb5 100644
>> --- a/Documentation/gitattributes.txt
>> +++ b/Documentation/gitattributes.txt
>> @@ -1174,7 +1174,8 @@ tag then no replacement will be done.  The placeholders are the same
>>  as those for the option `--pretty=format:` of linkgit:git-log[1],
>>  except that they need to be wrapped like this: `$Format:PLACEHOLDERS$`
>>  in the file.  E.g. the string `$Format:%H$` will be replaced by the
>> -commit hash.
>> +commit hash.  However, only one `%(describe)` placeholder is expanded
>> +per archive to avoid denial-of-service attacks.
>>
>>
>>  Packing objects
>> diff --git a/archive.c b/archive.c
>> index 5919d9e505..2dd2236ae0 100644
>> --- a/archive.c
>> +++ b/archive.c
>> @@ -37,13 +37,10 @@ void init_archivers(void)
>>
>>  static void format_subst(const struct commit *commit,
>>  			 const char *src, size_t len,
>> -			 struct strbuf *buf)
>> +			 struct strbuf *buf, struct pretty_print_context *ctx)
>>  {
>>  	char *to_free = NULL;
>>  	struct strbuf fmt = STRBUF_INIT;
>> -	struct pretty_print_context ctx = {0};
>> -	ctx.date_mode.type = DATE_NORMAL;
>> -	ctx.abbrev = DEFAULT_ABBREV;
>>
>>  	if (src == buf->buf)
>>  		to_free = strbuf_detach(buf, NULL);
>> @@ -61,7 +58,7 @@ static void format_subst(const struct commit *commit,
>>  		strbuf_add(&fmt, b + 8, c - b - 8);
>>
>>  		strbuf_add(buf, src, b - src);
>> -		format_commit_message(commit, fmt.buf, buf, &ctx);
>> +		format_commit_message(commit, fmt.buf, buf, ctx);
>>  		len -= c + 1 - src;
>>  		src  = c + 1;
>>  	}
>> @@ -94,7 +91,7 @@ static void *object_file_to_archive(const struct archiver_args *args,
>>  		strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
>>  		convert_to_working_tree(args->repo->index, path, buf.buf, buf.len, &buf, &meta);
>>  		if (commit)
>> -			format_subst(commit, buf.buf, buf.len, &buf);
>> +			format_subst(commit, buf.buf, buf.len, &buf, args->pretty_ctx);
>>  		buffer = strbuf_detach(&buf, &size);
>>  		*sizep = size;
>>  	}
>> @@ -633,12 +630,19 @@ int write_archive(int argc, const char **argv, const char *prefix,
>>  		  const char *name_hint, int remote)
>>  {
>>  	const struct archiver *ar = NULL;
>> +	struct pretty_print_describe_status describe_status = {0};
>> +	struct pretty_print_context ctx = {0};
>>  	struct archiver_args args;
>>  	int rc;
>>
>>  	git_config_get_bool("uploadarchive.allowunreachable", &remote_allow_unreachable);
>>  	git_config(git_default_config, NULL);
>>
>> +	describe_status.max_invocations = 1;
>> +	ctx.date_mode.type = DATE_NORMAL;
>> +	ctx.abbrev = DEFAULT_ABBREV;
>> +	ctx.describe_status = &describe_status;
>> +	args.pretty_ctx = &ctx;
>>  	args.repo = repo;
>>  	args.prefix = prefix;
>>  	string_list_init(&args.extra_files, 1);
>> diff --git a/archive.h b/archive.h
>> index 33551b7ee1..49fab71aaf 100644
>> --- a/archive.h
>> +++ b/archive.h
>> @@ -5,6 +5,7 @@
>>  #include "pathspec.h"
>>
>>  struct repository;
>> +struct pretty_print_context;
>>
>>  struct archiver_args {
>>  	struct repository *repo;
>> @@ -22,6 +23,7 @@ struct archiver_args {
>>  	unsigned int convert : 1;
>>  	int compression_level;
>>  	struct string_list extra_files;
>> +	struct pretty_print_context *pretty_ctx;
>>  };
>>
>>  /* main api */
>> diff --git a/pretty.c b/pretty.c
>> index c612d2ac9b..032e89cd4e 100644
>> --- a/pretty.c
>> +++ b/pretty.c
>> @@ -1247,6 +1247,14 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
>>  		struct child_process cmd = CHILD_PROCESS_INIT;
>>  		struct strbuf out = STRBUF_INIT;
>>  		struct strbuf err = STRBUF_INIT;
>> +		struct pretty_print_describe_status *describe_status;
>> +
>> +		describe_status = c->pretty_ctx->describe_status;
>> +		if (describe_status) {
>> +			if (!describe_status->max_invocations)
>> +				return 0;
>> +			describe_status->max_invocations--;
>> +		}
>>
>>  		cmd.git_cmd = 1;
>>  		strvec_push(&cmd.args, "describe");
>> diff --git a/pretty.h b/pretty.h
>> index 7ce6c0b437..27b15947ff 100644
>> --- a/pretty.h
>> +++ b/pretty.h
>> @@ -23,6 +23,10 @@ enum cmit_fmt {
>>  	CMIT_FMT_UNSPECIFIED
>>  };
>>
>> +struct pretty_print_describe_status {
>> +	unsigned int max_invocations;
>> +};
>> +
>>  struct pretty_print_context {
>>  	/*
>>  	 * Callers should tweak these to change the behavior of pp_* functions.
>> @@ -44,6 +48,7 @@ struct pretty_print_context {
>>  	int color;
>>  	struct ident_split *from_ident;
>>  	unsigned encode_email_headers:1;
>> +	struct pretty_print_describe_status *describe_status;
>>
>>  	/*
>>  	 * Fields below here are manipulated internally by pp_* functions and
>> diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
>> index e9aa97117a..1c9ce3956b 100755
>> --- a/t/t5001-archive-attr.sh
>> +++ b/t/t5001-archive-attr.sh
>> @@ -128,4 +128,18 @@ test_expect_success 'export-subst' '
>>  	test_cmp substfile2 archive/substfile2
>>  '
>>
>> +test_expect_success 'export-subst expands %(describe) once' '
>> +	echo "\$Format:%(describe)\$" >substfile3 &&
>> +	echo "\$Format:%(describe)\$" >>substfile3 &&
>> +	echo "\$Format:%(describe)${LF}%(describe)\$" >substfile4 &&
>> +	git add substfile[34] &&
>> +	git commit -m export-subst-describe &&
>> +	git tag -m export-subst-describe export-subst-describe &&
>> +	git archive HEAD >archive-describe.tar &&
>> +	extract_tar_to_dir archive-describe &&
>> +	desc=$(git describe) &&
>> +	grep -F "$desc" archive-describe/substfile[34] >substituted &&
>> +	test_line_count = 1 substituted
>> +'
>> +
>>  test_done
>
> This whole thing seems rather backwards as a solution to the "we run it
> N times" problem I mentioned in <87k0r7a4sr.fsf@evledraar.gmail.com>.

In the referenced message you pointed out that using %(describe) more
than once incurs the cost of a call to "git describe" each time and
asked for a way to avoid that.

This patch here prevents the second and later calls for "git archive".
It's not intended to speed up expanding repeated placeholders, but
rather to reject maliciously crafted placeholders even if they are not
the same.

So does that mean you have the requirement to use %(describe) more
than once in the same archive, and this DoS protection would be too
strict for you?  In that case we'd need a way to increase the limit,
e.g. a configuration variable or command line option.  Otherwise I
don't see the connection between your message and this patch.

Side note: Keeping the "git describe" running and feeding it commits
one by one through a pipe (with a new --stdin option, I suppose) as
mentioned in <66720982-04d6-3096-9ea2-ea5bc3fcd121@web.de> would
speed up expanding repeated placeholders as well.

> Instead of taking the trouble of putting a limit in the
> pretty_print_context so we don't call it N times for the same commit,
> why not just put the strbuf with the result in that same struct?
>
> Then you can have it millions of times, and it won't be any more
> expensive than the other existing %(format) specifiers (actually cheaper
> than most).

Each %(describe) placeholder can have a unique set of match or exclude
arguments.  Caching them all would increase the strength of a DoS
attack.  Caching a few of them would be OK, but is ineffective in
reducing the strength of the attack.

> Or is there some edge case I'm missing here where "git archive" can
> either be fed N commits and we share the context, or we share the
> context across formatting different revisions in some cases?

git archive currently works only on a single commit.

René

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] pretty: add merge and exclude options to %(describe)
       [not found]         ` <xmqqy2f6rc8f.fsf@gitster.c.googlers.com>
@ 2021-03-02 16:00           ` René Scharfe.
  0 siblings, 0 replies; 27+ messages in thread
From: René Scharfe. @ 2021-03-02 16:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Eli Schwartz, git

Am 01.03.21 um 18:54 schrieb Junio C Hamano:
> René Scharfe. <l.s.r@web.de> writes:
>
>> Am 17.02.21 um 19:31 schrieb Jeff King:
>>> On Sun, Feb 14, 2021 at 11:10:57AM +0100, René Scharfe. wrote:
>>>
>>>> Allow restricting the tags used by the placeholder %(describe) with the
>>>> options match and exclude.  E.g. the following command describes the
>>>> current commit using official version tags, without those for release
>>>> candidates:
>>>>
>>>>    $ git log -1 --format='%(describe:match=v[0-9]*,exclude=*rc*)'
>>>
>>> An interesting side effect of this series is that it allows remote users
>>> asking for archives to fill in this data, too (by using export-subst
>>> placeholders). That includes servers allowing "git archive --remote",
>>> but also services like GitHub that will run git-archive on behalf of
>>> clients.
>>>
>>> I wonder what avenues for mischief this provides. Certainly using extra
>>> CPU to run git-describe.
>>
>> A repository can contain millions of files, each file can contain
>> millions of $Format:...$ sequences and each of them can contain millions
>> of %(describe) placeholders.  Each of them could have different match or
>> exclude args to prevent caching.  Allowing a single request to cause
>> trillions of calls of git describe sounds excessive.  Let's limit this.
>
> An invocation of "git archive" would have to deal with a single
> commit, no?  I wonder if it is a more fruitful direction to go to
> teach format_subst() to "cache" the mapping from <placeholders> to
> <resulting-string> and reuse.

Yes, git archive only works on a single commit.  Caching cannot help
against a DoS attack using describe placeholders with different match
or exclude arguments.

René

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] pretty: add merge and exclude options to %(describe)
  2021-03-02 16:00           ` René Scharfe.
@ 2021-03-06 16:18             ` René Scharfe.
  0 siblings, 0 replies; 27+ messages in thread
From: René Scharfe. @ 2021-03-06 16:18 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Eli Schwartz, git, Junio C Hamano

Am 02.03.21 um 17:00 schrieb René Scharfe.:
> Am 28.02.21 um 16:41 schrieb Ævar Arnfjörð Bjarmason:
>> Instead of taking the trouble of putting a limit in the
>> pretty_print_context so we don't call it N times for the same commit,
>> why not just put the strbuf with the result in that same struct?
>>
>> Then you can have it millions of times, and it won't be any more
>> expensive than the other existing %(format) specifiers (actually cheaper
>> than most).
>
> Each %(describe) placeholder can have a unique set of match or exclude
> arguments.  Caching them all would increase the strength of a DoS
> attack.  Caching a few of them would be OK, but is ineffective in
> reducing the strength of the attack.

The script at the bottom creates archives that illustrate the issue. On
a repo generated with parameter 10 (10 files with 10 $Format:...$ with
10 %(describe) placeholders, so 1000 total), I get the following number
with v2.30.1, which ignores %(describe):

Benchmark #1: git archive HEAD
  Time (mean ± σ):       2.2 ms ±   0.2 ms    [User: 0.9 ms, System: 1.0 ms]
  Range (min … max):     1.8 ms …   2.8 ms    705 runs

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

The version in next expands all placeholders and takes three orders of
magnitude longer:

Benchmark #1: git archive HEAD
  Time (mean ± σ):      2.300 s ±  0.003 s    [User: 819.0 ms, System: 1200.0 ms]
  Range (min … max):    2.293 s …  2.305 s    10 runs

The proposed patch to expand only a single placeholder gets the runtime
back under control:

Benchmark #1: git archive HEAD
  Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 1.8 ms, System: 2.2 ms]
  Range (min … max):     4.2 ms …   7.0 ms    451 runs

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

Using parameter 100 takes about a second to create the repo, but the git
archive version in next already needs longer to tar it up than I'm
willing to wait.


#!/bin/sh
n=$1
mkdir $n
cd $n
git init
for i in $(seq $n)
do
	awk -v i=$i -v n=$n 'END {
		for (j = 0; j < n; j++) {
			print "$Format:"
			for (k = 0; k < n; k++) {
				print "%(describe:exclude=x-" i "-" j "-" k ")"
			}
			print "$"
		}
	}' </dev/null >"file$i"
done
git add file*
echo "file* export-subst" >.gitattributes
git add .gitattributes
git commit -m initial
for tagno in $(seq $n)
do
	git tag -m "$tagno" "tag$tagno"
done

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-03-06 16:20 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-25  0:32 gitattributes export-subst and software versioning Eli Schwartz
2021-02-08 19:46 ` René Scharfe
2021-02-08 22:41   ` Junio C Hamano
2021-02-09  0:19   ` Eli Schwartz
2021-02-09 20:42     ` Junio C Hamano
2021-02-14 10:04       ` René Scharfe
2021-02-14 10:04     ` René Scharfe
2021-02-14 10:04 ` [PATCH 1/2] pretty: add %(describe) René Scharfe
2021-02-14 10:10   ` [PATCH 2/2] pretty: add merge and exclude options to %(describe) René Scharfe.
2021-02-17 18:31     ` Jeff King
2021-02-28 11:22       ` René Scharfe.
2021-02-28 15:41         ` Ævar Arnfjörð Bjarmason
2021-03-02 16:00           ` René Scharfe.
2021-03-06 16:18             ` René Scharfe.
     [not found]         ` <xmqqy2f6rc8f.fsf@gitster.c.googlers.com>
2021-03-02 16:00           ` René Scharfe.
     [not found]     ` <xmqqsg5uletz.fsf@gitster.g>
2021-02-28 11:22       ` René Scharfe.
2021-02-16  5:04   ` [PATCH 1/2] pretty: add %(describe) Eli Schwartz
2021-02-16 13:00   ` Ævar Arnfjörð Bjarmason
2021-02-16 17:13     ` René Scharfe.
2021-02-16 18:44     ` Junio C Hamano
2021-02-17  0:47       ` Ævar Arnfjörð Bjarmason
2021-02-28 11:22         ` René Scharfe.
     [not found]           ` <xmqq35xesqzk.fsf@gitster.c.googlers.com>
2021-03-02 16:00             ` René Scharfe.
2021-02-17  0:58   ` Ævar Arnfjörð Bjarmason
2021-02-17 18:12     ` Junio C Hamano
2021-02-28 11:22     ` René Scharfe.
     [not found]       ` <xmqq7dmqsr72.fsf@gitster.c.googlers.com>
2021-03-02 16:00         ` René Scharfe.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).