git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "René Scharfe." <l.s.r@web.de>
Cc: Jeff King <peff@peff.net>, Eli Schwartz <eschwartz@archlinux.org>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/2] pretty: add merge and exclude options to %(describe)
Date: Sun, 28 Feb 2021 16:41:00 +0100	[thread overview]
Message-ID: <87blc4kxo3.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <b7e1f6c0-6b13-efe4-74b5-ec8249855644@web.de>


On Sun, Feb 28 2021, René Scharfe. wrote:

> Am 17.02.21 um 19:31 schrieb Jeff King:
>> On Sun, Feb 14, 2021 at 11:10:57AM +0100, René Scharfe. wrote:
>>
>>> Allow restricting the tags used by the placeholder %(describe) with the
>>> options match and exclude.  E.g. the following command describes the
>>> current commit using official version tags, without those for release
>>> candidates:
>>>
>>>    $ git log -1 --format='%(describe:match=v[0-9]*,exclude=*rc*)'
>>
>> An interesting side effect of this series is that it allows remote users
>> asking for archives to fill in this data, too (by using export-subst
>> placeholders). That includes servers allowing "git archive --remote",
>> but also services like GitHub that will run git-archive on behalf of
>> clients.
>>
>> I wonder what avenues for mischief this provides. Certainly using extra
>> CPU to run git-describe.
>
> A repository can contain millions of files, each file can contain
> millions of $Format:...$ sequences and each of them can contain millions
> of %(describe) placeholders.  Each of them could have different match or
> exclude args to prevent caching.  Allowing a single request to cause
> trillions of calls of git describe sounds excessive.  Let's limit this.
>
> -- >8 --
> Subject: [PATCH] archive: expand only a single %(describe) per archive
>
> Every %(describe) placeholder in $Format:...$ strings in files with the
> attribute export-subst is expanded by calling git describe.  This can
> potentially result in a lot of such calls per archive.  That's OK for
> local repositories under control of the user of git archive, but could
> be a problem for hosted repositories.
>
> Expand only a single %(describe) placeholder per archive for now to
> avoid denial-of-service attacks.  We can make this limit configurable
> later if needed, but let's start out simple.
>
> Reported-by: Jeff King <peff@peff.net>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  Documentation/gitattributes.txt |  3 ++-
>  archive.c                       | 16 ++++++++++------
>  archive.h                       |  2 ++
>  pretty.c                        |  8 ++++++++
>  pretty.h                        |  5 +++++
>  t/t5001-archive-attr.sh         | 14 ++++++++++++++
>  6 files changed, 41 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index e84e104f93..0a60472bb5 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -1174,7 +1174,8 @@ tag then no replacement will be done.  The placeholders are the same
>  as those for the option `--pretty=format:` of linkgit:git-log[1],
>  except that they need to be wrapped like this: `$Format:PLACEHOLDERS$`
>  in the file.  E.g. the string `$Format:%H$` will be replaced by the
> -commit hash.
> +commit hash.  However, only one `%(describe)` placeholder is expanded
> +per archive to avoid denial-of-service attacks.
>
>
>  Packing objects
> diff --git a/archive.c b/archive.c
> index 5919d9e505..2dd2236ae0 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -37,13 +37,10 @@ void init_archivers(void)
>
>  static void format_subst(const struct commit *commit,
>  			 const char *src, size_t len,
> -			 struct strbuf *buf)
> +			 struct strbuf *buf, struct pretty_print_context *ctx)
>  {
>  	char *to_free = NULL;
>  	struct strbuf fmt = STRBUF_INIT;
> -	struct pretty_print_context ctx = {0};
> -	ctx.date_mode.type = DATE_NORMAL;
> -	ctx.abbrev = DEFAULT_ABBREV;
>
>  	if (src == buf->buf)
>  		to_free = strbuf_detach(buf, NULL);
> @@ -61,7 +58,7 @@ static void format_subst(const struct commit *commit,
>  		strbuf_add(&fmt, b + 8, c - b - 8);
>
>  		strbuf_add(buf, src, b - src);
> -		format_commit_message(commit, fmt.buf, buf, &ctx);
> +		format_commit_message(commit, fmt.buf, buf, ctx);
>  		len -= c + 1 - src;
>  		src  = c + 1;
>  	}
> @@ -94,7 +91,7 @@ static void *object_file_to_archive(const struct archiver_args *args,
>  		strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
>  		convert_to_working_tree(args->repo->index, path, buf.buf, buf.len, &buf, &meta);
>  		if (commit)
> -			format_subst(commit, buf.buf, buf.len, &buf);
> +			format_subst(commit, buf.buf, buf.len, &buf, args->pretty_ctx);
>  		buffer = strbuf_detach(&buf, &size);
>  		*sizep = size;
>  	}
> @@ -633,12 +630,19 @@ int write_archive(int argc, const char **argv, const char *prefix,
>  		  const char *name_hint, int remote)
>  {
>  	const struct archiver *ar = NULL;
> +	struct pretty_print_describe_status describe_status = {0};
> +	struct pretty_print_context ctx = {0};
>  	struct archiver_args args;
>  	int rc;
>
>  	git_config_get_bool("uploadarchive.allowunreachable", &remote_allow_unreachable);
>  	git_config(git_default_config, NULL);
>
> +	describe_status.max_invocations = 1;
> +	ctx.date_mode.type = DATE_NORMAL;
> +	ctx.abbrev = DEFAULT_ABBREV;
> +	ctx.describe_status = &describe_status;
> +	args.pretty_ctx = &ctx;
>  	args.repo = repo;
>  	args.prefix = prefix;
>  	string_list_init(&args.extra_files, 1);
> diff --git a/archive.h b/archive.h
> index 33551b7ee1..49fab71aaf 100644
> --- a/archive.h
> +++ b/archive.h
> @@ -5,6 +5,7 @@
>  #include "pathspec.h"
>
>  struct repository;
> +struct pretty_print_context;
>
>  struct archiver_args {
>  	struct repository *repo;
> @@ -22,6 +23,7 @@ struct archiver_args {
>  	unsigned int convert : 1;
>  	int compression_level;
>  	struct string_list extra_files;
> +	struct pretty_print_context *pretty_ctx;
>  };
>
>  /* main api */
> diff --git a/pretty.c b/pretty.c
> index c612d2ac9b..032e89cd4e 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1247,6 +1247,14 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
>  		struct child_process cmd = CHILD_PROCESS_INIT;
>  		struct strbuf out = STRBUF_INIT;
>  		struct strbuf err = STRBUF_INIT;
> +		struct pretty_print_describe_status *describe_status;
> +
> +		describe_status = c->pretty_ctx->describe_status;
> +		if (describe_status) {
> +			if (!describe_status->max_invocations)
> +				return 0;
> +			describe_status->max_invocations--;
> +		}
>
>  		cmd.git_cmd = 1;
>  		strvec_push(&cmd.args, "describe");
> diff --git a/pretty.h b/pretty.h
> index 7ce6c0b437..27b15947ff 100644
> --- a/pretty.h
> +++ b/pretty.h
> @@ -23,6 +23,10 @@ enum cmit_fmt {
>  	CMIT_FMT_UNSPECIFIED
>  };
>
> +struct pretty_print_describe_status {
> +	unsigned int max_invocations;
> +};
> +
>  struct pretty_print_context {
>  	/*
>  	 * Callers should tweak these to change the behavior of pp_* functions.
> @@ -44,6 +48,7 @@ struct pretty_print_context {
>  	int color;
>  	struct ident_split *from_ident;
>  	unsigned encode_email_headers:1;
> +	struct pretty_print_describe_status *describe_status;
>
>  	/*
>  	 * Fields below here are manipulated internally by pp_* functions and
> diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
> index e9aa97117a..1c9ce3956b 100755
> --- a/t/t5001-archive-attr.sh
> +++ b/t/t5001-archive-attr.sh
> @@ -128,4 +128,18 @@ test_expect_success 'export-subst' '
>  	test_cmp substfile2 archive/substfile2
>  '
>
> +test_expect_success 'export-subst expands %(describe) once' '
> +	echo "\$Format:%(describe)\$" >substfile3 &&
> +	echo "\$Format:%(describe)\$" >>substfile3 &&
> +	echo "\$Format:%(describe)${LF}%(describe)\$" >substfile4 &&
> +	git add substfile[34] &&
> +	git commit -m export-subst-describe &&
> +	git tag -m export-subst-describe export-subst-describe &&
> +	git archive HEAD >archive-describe.tar &&
> +	extract_tar_to_dir archive-describe &&
> +	desc=$(git describe) &&
> +	grep -F "$desc" archive-describe/substfile[34] >substituted &&
> +	test_line_count = 1 substituted
> +'
> +
>  test_done

This whole thing seems rather backwards as a solution to the "we run it
N times" problem I mentioned in <87k0r7a4sr.fsf@evledraar.gmail.com>.

Instead of taking the trouble of putting a limit in the
pretty_print_context so we don't call it N times for the same commit,
why not just put the strbuf with the result in that same struct?

Then you can have it millions of times, and it won't be any more
expensive than the other existing %(format) specifiers (actually cheaper
than most).

Or is there some edge case I'm missing here where "git archive" can
either be fed N commits and we share the context, or we share the
context across formatting different revisions in some cases?

  reply	other threads:[~2021-02-28 15:42 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-25  0:32 gitattributes export-subst and software versioning Eli Schwartz
2021-02-08 19:46 ` René Scharfe
2021-02-08 22:41   ` Junio C Hamano
2021-02-09  0:19   ` Eli Schwartz
2021-02-09 20:42     ` Junio C Hamano
2021-02-14 10:04       ` René Scharfe
2021-02-14 10:04     ` René Scharfe
2021-02-14 10:04 ` [PATCH 1/2] pretty: add %(describe) René Scharfe
2021-02-14 10:10   ` [PATCH 2/2] pretty: add merge and exclude options to %(describe) René Scharfe.
2021-02-17 18:31     ` Jeff King
2021-02-28 11:22       ` René Scharfe.
2021-02-28 15:41         ` Ævar Arnfjörð Bjarmason [this message]
2021-03-02 16:00           ` René Scharfe.
2021-03-06 16:18             ` René Scharfe.
     [not found]         ` <xmqqy2f6rc8f.fsf@gitster.c.googlers.com>
2021-03-02 16:00           ` René Scharfe.
     [not found]     ` <xmqqsg5uletz.fsf@gitster.g>
2021-02-28 11:22       ` René Scharfe.
2021-02-16  5:04   ` [PATCH 1/2] pretty: add %(describe) Eli Schwartz
2021-02-16 13:00   ` Ævar Arnfjörð Bjarmason
2021-02-16 17:13     ` René Scharfe.
2021-02-16 18:44     ` Junio C Hamano
2021-02-17  0:47       ` Ævar Arnfjörð Bjarmason
2021-02-28 11:22         ` René Scharfe.
     [not found]           ` <xmqq35xesqzk.fsf@gitster.c.googlers.com>
2021-03-02 16:00             ` René Scharfe.
2021-02-17  0:58   ` Ævar Arnfjörð Bjarmason
2021-02-17 18:12     ` Junio C Hamano
2021-02-28 11:22     ` René Scharfe.
     [not found]       ` <xmqq7dmqsr72.fsf@gitster.c.googlers.com>
2021-03-02 16:00         ` René Scharfe.

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87blc4kxo3.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=eschwartz@archlinux.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).