git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] pack-objects: fix and simplify --filter handling
@ 2022-11-12 10:42 René Scharfe
  2022-11-12 10:44 ` [PATCH 1/3] pack-objects: fix handling of multiple --filter options René Scharfe
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-12 10:42 UTC (permalink / raw)
  To: Git List
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder

Fix a regression that prevents using multiple --filter options, simplify
the option parsing code and avoid relying on undefined behavior in it.

Effectively almost reverts 5cb28270a1 (pack-objects: lazily set up
"struct rev_info", don't leak, 2022-03-28), which introduced these
shortcomings.

Patch 2 conflicts with cc/filtered-repack in seen, but not semantically.

  pack-objects: fix handling of multiple --filter options
  pack-object: simplify --filter handling
  list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT()

 builtin/pack-objects.c                 | 27 ++++++--------------------
 list-objects-filter-options.c          |  4 ----
 list-objects-filter-options.h          | 18 ++---------------
 t/t5317-pack-objects-filter-objects.sh | 19 ++++++++++++++++++
 4 files changed, 27 insertions(+), 41 deletions(-)

--
2.38.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/3] pack-objects: fix handling of multiple --filter options
  2022-11-12 10:42 [PATCH 0/3] pack-objects: fix and simplify --filter handling René Scharfe
@ 2022-11-12 10:44 ` René Scharfe
  2022-11-12 11:41   ` Ævar Arnfjörð Bjarmason
  2022-11-12 16:58   ` Jeff King
  2022-11-12 10:44 ` [PATCH 2/3] pack-object: simplify --filter handling René Scharfe
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-12 10:44 UTC (permalink / raw)
  To: Git List
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder

Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't
leak, 2022-03-28) --filter options given to git pack-objects overrule
earlier ones, letting only the leftmost win and leaking the memory
allocated for earlier ones.  Fix that by only initializing the rev_info
struct once.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/pack-objects.c                 |  3 ++-
 t/t5317-pack-objects-filter-objects.sh | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 573d0b20b7..c702c09dd4 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value)
 {
 	struct po_filter_data *data = value;

-	repo_init_revisions(the_repository, &data->revs, NULL);
+	if (!data->have_revs)
+		repo_init_revisions(the_repository, &data->revs, NULL);
 	data->have_revs = 1;

 	return &data->revs.filter;
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index bb633c9b09..bd8983bb56 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -178,6 +178,25 @@ test_expect_success 'verify blob:limit=1001' '
 	test_cmp expected observed
 '

+test_expect_success 'verify blob:limit=1001+object:type=blob' '
+	git -C r2 ls-files -s large.1000 |
+	test_parse_ls_files_stage_oids |
+	sort >expected &&
+
+	git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 \
+		--filter=object:type=blob >filter.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r2 index-pack ../filter.pack &&
+
+	git -C r2 verify-pack -v ../filter.pack >verify_result &&
+	grep blob verify_result |
+	parse_verify_pack_blob_oid |
+	sort >observed &&
+
+	test_cmp expected observed
+'
+
 test_expect_success 'verify blob:limit=10001' '
 	git -C r2 ls-files -s large.1000 large.10000 |
 	test_parse_ls_files_stage_oids |
--
2.38.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/3] pack-object: simplify --filter handling
  2022-11-12 10:42 [PATCH 0/3] pack-objects: fix and simplify --filter handling René Scharfe
  2022-11-12 10:44 ` [PATCH 1/3] pack-objects: fix handling of multiple --filter options René Scharfe
@ 2022-11-12 10:44 ` René Scharfe
  2022-11-12 11:45   ` Ævar Arnfjörð Bjarmason
  2022-11-12 17:02   ` Jeff King
  2022-11-12 10:46 ` [PATCH 3/3] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() René Scharfe
  2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe
  3 siblings, 2 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-12 10:44 UTC (permalink / raw)
  To: Git List
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder

pack-objects uses OPT_PARSE_LIST_OBJECTS_FILTER_INIT() to initialize the
a rev_info struct lazily before populating its filter member using the
--filter option values.  It tracks whether the initialization is needed
using the .have_revs member of the callback data.

There is a better way: Use a stand-alone list_objects_filter_options
struct and build a rev_info struct with its .filter member after option
parsing.  This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER()
and getting rid of the extra callback mechanism.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/pack-objects.c | 28 ++++++----------------------
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c702c09dd4..30023bcebb 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4149,22 +4149,6 @@ static int option_parse_cruft_expiration(const struct option *opt,
 	return 0;
 }

-struct po_filter_data {
-	unsigned have_revs:1;
-	struct rev_info revs;
-};
-
-static struct list_objects_filter_options *po_filter_revs_init(void *value)
-{
-	struct po_filter_data *data = value;
-
-	if (!data->have_revs)
-		repo_init_revisions(the_repository, &data->revs, NULL);
-	data->have_revs = 1;
-
-	return &data->revs.filter;
-}
-
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -4175,7 +4159,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	int rev_list_index = 0;
 	int stdin_packs = 0;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
-	struct po_filter_data pfd = { .have_revs = 0 };
+	struct list_objects_filter_options filter_options =
+		LIST_OBJECTS_FILTER_INIT;

 	struct option pack_objects_options[] = {
 		OPT_SET_INT('q', "quiet", &progress,
@@ -4266,7 +4251,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			      &write_bitmap_index,
 			      N_("write a bitmap index if possible"),
 			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
-		OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init),
+		OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
 		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
 		  N_("handling for missing objects"), PARSE_OPT_NONEG,
 		  option_parse_missing_action),
@@ -4386,7 +4371,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;

-	if (pfd.have_revs && pfd.revs.filter.choice) {
+	if (filter_options.choice) {
 		if (!pack_to_stdout)
 			die(_("cannot use --filter without --stdout"));
 		if (stdin_packs)
@@ -4473,16 +4458,15 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
 		read_object_list_from_stdin();
-	} else if (pfd.have_revs) {
-		get_object_list(&pfd.revs, rp.nr, rp.v);
-		release_revisions(&pfd.revs);
 	} else {
 		struct rev_info revs;

 		repo_init_revisions(the_repository, &revs, NULL);
+		list_objects_filter_copy(&revs.filter, &filter_options);
 		get_object_list(&revs, rp.nr, rp.v);
 		release_revisions(&revs);
 	}
+	list_objects_filter_release(&filter_options);
 	cleanup_preferred_base();
 	if (include_tag && nr_result)
 		for_each_tag_ref(add_ref_tag, NULL);
--
2.38.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/3] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT()
  2022-11-12 10:42 [PATCH 0/3] pack-objects: fix and simplify --filter handling René Scharfe
  2022-11-12 10:44 ` [PATCH 1/3] pack-objects: fix handling of multiple --filter options René Scharfe
  2022-11-12 10:44 ` [PATCH 2/3] pack-object: simplify --filter handling René Scharfe
@ 2022-11-12 10:46 ` René Scharfe
  2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe
  3 siblings, 0 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-12 10:46 UTC (permalink / raw)
  To: Git List
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder

OPT_PARSE_LIST_OBJECTS_FILTER_INIT() with a non-NULL second argument
passes a function pointer via an object pointer, which is undefined.  It
may work fine on platforms that implement C99 extension J.5.7 (Function
pointer casts).  Remove the unused macro and avoid the dependency on
that extension.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 list-objects-filter-options.c |  4 ----
 list-objects-filter-options.h | 18 ++----------------
 2 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 5339660238..ee01bcd2cc 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -290,10 +290,6 @@ int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset)
 {
 	struct list_objects_filter_options *filter_options = opt->value;
-	opt_lof_init init = (opt_lof_init)opt->defval;
-
-	if (init)
-		filter_options = init(opt->value);

 	if (unset || !arg)
 		list_objects_filter_set_no_filter(filter_options);
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 7eeadab2dd..1fe393f447 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -111,27 +111,13 @@ void parse_list_objects_filter(
  * The opt->value to opt_parse_list_objects_filter() is either a
  * "struct list_objects_filter_option *" when using
  * OPT_PARSE_LIST_OBJECTS_FILTER().
- *
- * Or, if using no "struct option" field is used by the callback,
- * except the "defval" which is expected to be an "opt_lof_init"
- * function, which is called with the "opt->value" and must return a
- * pointer to the ""struct list_objects_filter_option *" to be used.
- *
- * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the
- * "struct list_objects_filter_option" is embedded in a "struct
- * rev_info", which the "defval" could be tasked with lazily
- * initializing. See cmd_pack_objects() for an example.
  */
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset);
-typedef struct list_objects_filter_options *(*opt_lof_init)(void *);
-#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \
-	{ OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \
-	  N_("object filtering"), 0, opt_parse_list_objects_filter, \
-	  (intptr_t)(init) }

 #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
-	OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL)
+	OPT_CALLBACK(0, "filter", (fo), N_("args"), \
+		     N_("object filtering"), opt_parse_list_objects_filter)

 /*
  * Translates abbreviated numbers in the filter's filter_spec into their
--
2.38.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] pack-objects: fix handling of multiple --filter options
  2022-11-12 10:44 ` [PATCH 1/3] pack-objects: fix handling of multiple --filter options René Scharfe
@ 2022-11-12 11:41   ` Ævar Arnfjörð Bjarmason
  2022-11-13 17:31     ` René Scharfe
  2022-11-12 16:58   ` Jeff King
  1 sibling, 1 reply; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-12 11:41 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano, Taylor Blau, Christian Couder


On Sat, Nov 12 2022, René Scharfe wrote:

> Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't
> leak, 2022-03-28) --filter options given to git pack-objects overrule
> earlier ones, letting only the leftmost win and leaking the memory
> allocated for earlier ones.  Fix that by only initializing the rev_info
> struct once.

If I do e.g. this with SANITIZE=leak:

	echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects --revs --filter=blob:limit=1001 --filter=object:type=blob --stdout  >/dev/null

I see one leak that wasn't there, but two that are gone now. I haven't
looked into it, but I think the commit message should discuss what leaks
are fixed, which remain/are new etc.

> @@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value)
>  {
>  	struct po_filter_data *data = value;
>
> -	repo_init_revisions(the_repository, &data->revs, NULL);
> +	if (!data->have_revs)
> +		repo_init_revisions(the_repository, &data->revs, NULL);
>  	data->have_revs = 1;
>
>  	return &data->revs.filter;

FWIW as this goes away in your 2/3 I think just squashing the two with a
leak fix would be nice, if...

> diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
> index bb633c9b09..bd8983bb56 100755
> --- a/t/t5317-pack-objects-filter-objects.sh
> +++ b/t/t5317-pack-objects-filter-objects.sh
> @@ -178,6 +178,25 @@ test_expect_success 'verify blob:limit=1001' '
>  	test_cmp expected observed
>  '
>
> +test_expect_success 'verify blob:limit=1001+object:type=blob' '
> +	git -C r2 ls-files -s large.1000 |

Aside: Should do "git >tmp && test_parse... <tmp", we lose the exit code
of "ls-files" here.

> +	test_parse_ls_files_stage_oids |
> +	sort >expected &&
> +
> +	git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 \
> +		--filter=object:type=blob >filter.pack <<-EOF &&
> +	HEAD
> +	EOF
> +	git -C r2 index-pack ../filter.pack &&
> +
> +	git -C r2 verify-pack -v ../filter.pack >verify_result &&
> +	grep blob verify_result |
> +	parse_verify_pack_blob_oid |

Whereas this one's not a problem, no "git".

> +	sort >observed &&
> +
> +	test_cmp expected observed

Aside: It would be nice if we had a "test_cmp_sort", but some other day...

> +'
> +
>  test_expect_success 'verify blob:limit=10001' '
>  	git -C r2 ls-files -s large.1000 large.10000 |
>  	test_parse_ls_files_stage_oids |

...we can test it, but this test is in a file that's not part of "linux-leaks".

If that one leak I mentioned above can be fixed (or maybe it's not new?)
this could be tested if we put it in a file of its own with
TEST_PASSES_SANITIZE_LEAK=true.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] pack-object: simplify --filter handling
  2022-11-12 10:44 ` [PATCH 2/3] pack-object: simplify --filter handling René Scharfe
@ 2022-11-12 11:45   ` Ævar Arnfjörð Bjarmason
  2022-11-12 17:02   ` Jeff King
  1 sibling, 0 replies; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-12 11:45 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano, Taylor Blau, Christian Couder


On Sat, Nov 12 2022, René Scharfe wrote:

> pack-objects uses OPT_PARSE_LIST_OBJECTS_FILTER_INIT() to initialize the
> a rev_info struct lazily before populating its filter member using the
> --filter option values.  It tracks whether the initialization is needed
> using the .have_revs member of the callback data.
>
> There is a better way: Use a stand-alone list_objects_filter_options
> struct and build a rev_info struct with its .filter member after option
> parsing.  This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER()
> and getting rid of the extra callback mechanism.

I haven't gone through the history again, but as I recall this used to
be outside of "struct rev_info", then (I think I did, as part of a leak
fix) it was put there, and now it's still there, but we make a copy of
it, then copy it into the "rev_info" struct again.

Maybe we've finally reached the right trade-off here, but it would help
to discuss some of those past commits, why those arrangements were
wrong, and why this is finally the right one...

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] pack-objects: fix handling of multiple --filter options
  2022-11-12 10:44 ` [PATCH 1/3] pack-objects: fix handling of multiple --filter options René Scharfe
  2022-11-12 11:41   ` Ævar Arnfjörð Bjarmason
@ 2022-11-12 16:58   ` Jeff King
  2022-11-13  5:01     ` Taylor Blau
  1 sibling, 1 reply; 36+ messages in thread
From: Jeff King @ 2022-11-12 16:58 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder

On Sat, Nov 12, 2022 at 11:44:00AM +0100, René Scharfe wrote:

> Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't
> leak, 2022-03-28) --filter options given to git pack-objects overrule
> earlier ones, letting only the leftmost win and leaking the memory
> allocated for earlier ones.  Fix that by only initializing the rev_info
> struct once.

Yikes. I wondered how we managed to miss this in the tests. Surely we
have some examples that use multiple filters?

I think the answer is that we do in t5616, but in practice they didn't
hit this bug because of the way filter-specs are treated server-sid3.
Even if we say "clone --filter=foo --filter=bar" on the client, the
server upload-pack coalesces that into a single "combine:foo+bar"
string, and that's what its pack-objects sees.

Your test here triggers it by invoking pack-objects directly, which
makes sense. That's a little different from how pack-objects is invoked
by the rest of Git in practice, but it's definitely something that
_should_ work. It's conceivable that somebody outside of Git could be
driving pack-objects directly, but also it's just a trap waiting to
spring for future developers.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] pack-object: simplify --filter handling
  2022-11-12 10:44 ` [PATCH 2/3] pack-object: simplify --filter handling René Scharfe
  2022-11-12 11:45   ` Ævar Arnfjörð Bjarmason
@ 2022-11-12 17:02   ` Jeff King
  2022-11-13 16:49     ` Jeff King
  2022-11-13 17:31     ` René Scharfe
  1 sibling, 2 replies; 36+ messages in thread
From: Jeff King @ 2022-11-12 17:02 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder

On Sat, Nov 12, 2022 at 11:44:50AM +0100, René Scharfe wrote:

> pack-objects uses OPT_PARSE_LIST_OBJECTS_FILTER_INIT() to initialize the
> a rev_info struct lazily before populating its filter member using the
> --filter option values.  It tracks whether the initialization is needed
> using the .have_revs member of the callback data.
> 
> There is a better way: Use a stand-alone list_objects_filter_options
> struct and build a rev_info struct with its .filter member after option
> parsing.  This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER()
> and getting rid of the extra callback mechanism.

That seems like a reasonable fix (and I think we do it elsewhere). But I
wonder if it wouldn't be simpler to just unconditionally initialize the
rev_info in cmd_pack_objects(), and then unconditionally release it.
True, we _might_ not use it if we are receiving objects directly over
stdin, but I don't think initializing it is particularly costly.

And then we don't have to worry about whether it's valid (it always is),
or about copying between two representations of the filter (there's only
one).

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] pack-objects: fix handling of multiple --filter options
  2022-11-12 16:58   ` Jeff King
@ 2022-11-13  5:01     ` Taylor Blau
  2022-11-13 16:44       ` Jeff King
  2022-11-13 17:31       ` René Scharfe
  0 siblings, 2 replies; 36+ messages in thread
From: Taylor Blau @ 2022-11-13  5:01 UTC (permalink / raw)
  To: Jeff King
  Cc: René Scharfe, Git List,
	Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder, Jonathan Tan

On Sat, Nov 12, 2022 at 11:58:44AM -0500, Jeff King wrote:
> On Sat, Nov 12, 2022 at 11:44:00AM +0100, René Scharfe wrote:
>
> > Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't
> > leak, 2022-03-28) --filter options given to git pack-objects overrule
> > earlier ones, letting only the leftmost win and leaking the memory
> > allocated for earlier ones.  Fix that by only initializing the rev_info
> > struct once.
>
> Yikes. I wondered how we managed to miss this in the tests. Surely we
> have some examples that use multiple filters?
>
> I think the answer is that we do in t5616, but in practice they didn't
> hit this bug because of the way filter-specs are treated server-sid3.
> Even if we say "clone --filter=foo --filter=bar" on the client, the
> server upload-pack coalesces that into a single "combine:foo+bar"
> string, and that's what its pack-objects sees.

...Or in other words, clients aren't broken here because we always send
"combine" filters when multiple `--filter` arguments are passed to `git
clone`, `git fetch`, etc.

Is that always the case? I *think* so, but it would be nice to know if
there was a way that clients would not send the `combine` filter with >1
filter.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] pack-objects: fix handling of multiple --filter options
  2022-11-13  5:01     ` Taylor Blau
@ 2022-11-13 16:44       ` Jeff King
  2022-11-13 17:31       ` René Scharfe
  1 sibling, 0 replies; 36+ messages in thread
From: Jeff King @ 2022-11-13 16:44 UTC (permalink / raw)
  To: Taylor Blau
  Cc: René Scharfe, Git List,
	Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christian Couder, Jonathan Tan

On Sun, Nov 13, 2022 at 12:01:51AM -0500, Taylor Blau wrote:

> > Yikes. I wondered how we managed to miss this in the tests. Surely we
> > have some examples that use multiple filters?
> >
> > I think the answer is that we do in t5616, but in practice they didn't
> > hit this bug because of the way filter-specs are treated server-sid3.
> > Even if we say "clone --filter=foo --filter=bar" on the client, the
> > server upload-pack coalesces that into a single "combine:foo+bar"
> > string, and that's what its pack-objects sees.
> 
> ...Or in other words, clients aren't broken here because we always send
> "combine" filters when multiple `--filter` arguments are passed to `git
> clone`, `git fetch`, etc.

Right.

> Is that always the case? I *think* so, but it would be nice to know if
> there was a way that clients would not send the `combine` filter with >1
> filter.

I think it's always the case at least with Git's code, because the
client-side send_filter() sends from list_objects_filter_spec(), which
takes the single-string spec of the top-level combine filter. It would
have to explicitly iterate over the sub-filters to come up with anything
else, and I don't think we even have a function for that (we do when
evaluating them, of course, but not for generically iterating or
producing a multi-string set of specs).

And on the server side, even if a client did send multiple "filter"
lines, I think we'd hit list_objects_filter_die_if_populated(), so we'd
complain.

I don't know of any other code paths that could hit this (can you filter
on an outgoing push?). But in the end I'm not sure it even matters. We
should be fixing the bug anyway, and this is all just understanding how
far-reaching its effects are (and the answer seems to be: not very).

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] pack-object: simplify --filter handling
  2022-11-12 17:02   ` Jeff King
@ 2022-11-13 16:49     ` Jeff King
  2022-11-13 17:31     ` René Scharfe
  1 sibling, 0 replies; 36+ messages in thread
From: Jeff King @ 2022-11-13 16:49 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder

On Sat, Nov 12, 2022 at 12:02:58PM -0500, Jeff King wrote:

> > There is a better way: Use a stand-alone list_objects_filter_options
> > struct and build a rev_info struct with its .filter member after option
> > parsing.  This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER()
> > and getting rid of the extra callback mechanism.
> 
> That seems like a reasonable fix (and I think we do it elsewhere). But I
> wonder if it wouldn't be simpler to just unconditionally initialize the
> rev_info in cmd_pack_objects(), and then unconditionally release it.
> True, we _might_ not use it if we are receiving objects directly over
> stdin, but I don't think initializing it is particularly costly.
> 
> And then we don't have to worry about whether it's valid (it always is),
> or about copying between two representations of the filter (there's only
> one).

Just to make sure I wasn't telling you absolute garbage, I sketched this
out quickly, so sharing it below (to illustrate, or maybe to save you 5
minutes if you want to go that direction).

If you do stick with copying the filter spec, yet another option is that
the rev_info variable could move back into get_object_list(), and we
could just pass in the filter_options. It was only moved out in
80f6de4f5b (pack-objects: move revs out of get_object_list(),
2022-03-22) so that we could parse --filter directly into it.

That's probably approaching bikeshed territory, and I'm happy enough
either way, but it keeps the outer cmd_pack_objects() to the absolute
minimum of what it needs to see.

-Peff

-- >8 --
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c702c09dd4..cadc2be35f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4149,22 +4149,6 @@ static int option_parse_cruft_expiration(const struct option *opt,
 	return 0;
 }
 
-struct po_filter_data {
-	unsigned have_revs:1;
-	struct rev_info revs;
-};
-
-static struct list_objects_filter_options *po_filter_revs_init(void *value)
-{
-	struct po_filter_data *data = value;
-
-	if (!data->have_revs)
-		repo_init_revisions(the_repository, &data->revs, NULL);
-	data->have_revs = 1;
-
-	return &data->revs.filter;
-}
-
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -4175,7 +4159,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	int rev_list_index = 0;
 	int stdin_packs = 0;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
-	struct po_filter_data pfd = { .have_revs = 0 };
+	struct rev_info revs;
 
 	struct option pack_objects_options[] = {
 		OPT_SET_INT('q', "quiet", &progress,
@@ -4266,7 +4250,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			      &write_bitmap_index,
 			      N_("write a bitmap index if possible"),
 			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
-		OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init),
+		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
 		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
 		  N_("handling for missing objects"), PARSE_OPT_NONEG,
 		  option_parse_missing_action),
@@ -4297,6 +4281,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (git_env_bool(GIT_TEST_WRITE_REV_INDEX, 0))
 		pack_idx_opts.flags |= WRITE_REV;
 
+	repo_init_revisions(the_repository, &revs, NULL);
+
 	progress = isatty(2);
 	argc = parse_options(argc, argv, prefix, pack_objects_options,
 			     pack_usage, 0);
@@ -4386,7 +4372,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (pfd.have_revs && pfd.revs.filter.choice) {
+	if (revs.filter.choice) {
 		if (!pack_to_stdout)
 			die(_("cannot use --filter without --stdout"));
 		if (stdin_packs)
@@ -4473,16 +4459,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
 		read_object_list_from_stdin();
-	} else if (pfd.have_revs) {
-		get_object_list(&pfd.revs, rp.nr, rp.v);
-		release_revisions(&pfd.revs);
 	} else {
-		struct rev_info revs;
-
-		repo_init_revisions(the_repository, &revs, NULL);
 		get_object_list(&revs, rp.nr, rp.v);
-		release_revisions(&revs);
 	}
+	release_revisions(&revs);
 	cleanup_preferred_base();
 	if (include_tag && nr_result)
 		for_each_tag_ref(add_ref_tag, NULL);

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] pack-object: simplify --filter handling
  2022-11-12 17:02   ` Jeff King
  2022-11-13 16:49     ` Jeff King
@ 2022-11-13 17:31     ` René Scharfe
  1 sibling, 0 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-13 17:31 UTC (permalink / raw)
  To: Jeff King
  Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder

Am 12.11.22 um 18:02 schrieb Jeff King:
> On Sat, Nov 12, 2022 at 11:44:50AM +0100, René Scharfe wrote:
>
>> pack-objects uses OPT_PARSE_LIST_OBJECTS_FILTER_INIT() to initialize the
>> a rev_info struct lazily before populating its filter member using the
>> --filter option values.  It tracks whether the initialization is needed
>> using the .have_revs member of the callback data.
>>
>> There is a better way: Use a stand-alone list_objects_filter_options
>> struct and build a rev_info struct with its .filter member after option
>> parsing.  This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER()
>> and getting rid of the extra callback mechanism.
>
> That seems like a reasonable fix (and I think we do it elsewhere). But I
> wonder if it wouldn't be simpler to just unconditionally initialize the
> rev_info in cmd_pack_objects(), and then unconditionally release it.
> True, we _might_ not use it if we are receiving objects directly over
> stdin, but I don't think initializing it is particularly costly.
>
> And then we don't have to worry about whether it's valid (it always is),
> or about copying between two representations of the filter (there's only
> one).

Indeed that's even simpler and what we had before 5cb28270a1
(pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28).

René

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] pack-objects: fix handling of multiple --filter options
  2022-11-13  5:01     ` Taylor Blau
  2022-11-13 16:44       ` Jeff King
@ 2022-11-13 17:31       ` René Scharfe
  1 sibling, 0 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-13 17:31 UTC (permalink / raw)
  To: Taylor Blau, Jeff King
  Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christian Couder, Jonathan Tan

Am 13.11.22 um 06:01 schrieb Taylor Blau:
> On Sat, Nov 12, 2022 at 11:58:44AM -0500, Jeff King wrote:
>> On Sat, Nov 12, 2022 at 11:44:00AM +0100, René Scharfe wrote:
>>
>>> Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't
>>> leak, 2022-03-28) --filter options given to git pack-objects overrule
>>> earlier ones, letting only the leftmost win and leaking the memory
>>> allocated for earlier ones.  Fix that by only initializing the rev_info
>>> struct once.
>>
>> Yikes. I wondered how we managed to miss this in the tests. Surely we
>> have some examples that use multiple filters?
>>
>> I think the answer is that we do in t5616, but in practice they didn't
>> hit this bug because of the way filter-specs are treated server-sid3.
>> Even if we say "clone --filter=foo --filter=bar" on the client, the
>> server upload-pack coalesces that into a single "combine:foo+bar"
>> string, and that's what its pack-objects sees.
>
> ...Or in other words, clients aren't broken here because we always send
> "combine" filters when multiple `--filter` arguments are passed to `git
> clone`, `git fetch`, etc.
>
> Is that always the case? I *think* so, but it would be nice to know if
> there was a way that clients would not send the `combine` filter with >1
> filter.

Searching for pack-objects invocations like this should at least find
all internal cases, right?

   $ git grep -e '"pack-objects"' -e --filter --all-match --name-only
   builtin/pack-objects.c
   bundle.c
   upload-pack.c

git pack-objects handles --filter and doesn't pass it on.  bundle.c and
upload-pack.c only add at most a single --filter option to a strvec.
bundle.c uses list_objects_filter_spec() to get the filter
specification, upload-pack.c uses expand_list_objects_filter_spec().

René

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] pack-objects: fix handling of multiple --filter options
  2022-11-12 11:41   ` Ævar Arnfjörð Bjarmason
@ 2022-11-13 17:31     ` René Scharfe
  0 siblings, 0 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-13 17:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Junio C Hamano, Taylor Blau, Christian Couder

Am 12.11.22 um 12:41 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sat, Nov 12 2022, René Scharfe wrote:
>
>> Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't
>> leak, 2022-03-28) --filter options given to git pack-objects overrule
>> earlier ones, letting only the leftmost win and leaking the memory
>> allocated for earlier ones.  Fix that by only initializing the rev_info
>> struct once.
>
> If I do e.g. this with SANITIZE=leak:
>
> 	echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects --revs --filter=blob:limit=1001 --filter=object:type=blob --stdout  >/dev/null
>
> I see one leak that wasn't there, but two that are gone now. I haven't
> looked into it, but I think the commit message should discuss what leaks
> are fixed, which remain/are new etc.

The leak is insubstantial; I mentioned it just because of the irony.  It
is caused by initializing an already initialized struct rev_info without
releasing it in between, as mentioned in the commit message.

.filter_data allocated by filter_combine__init() is not released by
filter_combine__init() in list-objects-filter.c.  Plugging that leak
allows your example command to run with SANITIZE=leak.  That is a matter
for a separate patch, though.

>
>> @@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value)
>>  {
>>  	struct po_filter_data *data = value;
>>
>> -	repo_init_revisions(the_repository, &data->revs, NULL);
>> +	if (!data->have_revs)
>> +		repo_init_revisions(the_repository, &data->revs, NULL);
>>  	data->have_revs = 1;
>>
>>  	return &data->revs.filter;
>
> FWIW as this goes away in your 2/3 I think just squashing the two with a
> leak fix would be nice, if...
>
>> diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
>> index bb633c9b09..bd8983bb56 100755
>> --- a/t/t5317-pack-objects-filter-objects.sh
>> +++ b/t/t5317-pack-objects-filter-objects.sh
>> @@ -178,6 +178,25 @@ test_expect_success 'verify blob:limit=1001' '
>>  	test_cmp expected observed
>>  '
>>
>> +test_expect_success 'verify blob:limit=1001+object:type=blob' '
>> +	git -C r2 ls-files -s large.1000 |
>
> Aside: Should do "git >tmp && test_parse... <tmp", we lose the exit code
> of "ls-files" here.

OK.  Copied that line from the surrounding tests.  They used temporary
files before fb2d0db502 (test-lib-functions: add parsing helpers for
ls-files and ls-tree, 2022-04-04).

>
>> +	test_parse_ls_files_stage_oids |
>> +	sort >expected &&
>> +
>> +	git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 \
>> +		--filter=object:type=blob >filter.pack <<-EOF &&
>> +	HEAD
>> +	EOF
>> +	git -C r2 index-pack ../filter.pack &&
>> +
>> +	git -C r2 verify-pack -v ../filter.pack >verify_result &&
>> +	grep blob verify_result |
>> +	parse_verify_pack_blob_oid |
>
> Whereas this one's not a problem, no "git".
>
>> +	sort >observed &&
>> +
>> +	test_cmp expected observed
>
> Aside: It would be nice if we had a "test_cmp_sort", but some other day...
>
>> +'
>> +
>>  test_expect_success 'verify blob:limit=10001' '
>>  	git -C r2 ls-files -s large.1000 large.10000 |
>>  	test_parse_ls_files_stage_oids |
>
> ...we can test it, but this test is in a file that's not part of "linux-leaks".
>
> If that one leak I mentioned above can be fixed (or maybe it's not new?)
> this could be tested if we put it in a file of its own with
> TEST_PASSES_SANITIZE_LEAK=true.

Plugging the leak in your example command is not enough to make t5317
leak free.

René

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 0/3] pack-objects: fix and simplify --filter handling
  2022-11-12 10:42 [PATCH 0/3] pack-objects: fix and simplify --filter handling René Scharfe
                   ` (2 preceding siblings ...)
  2022-11-12 10:46 ` [PATCH 3/3] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() René Scharfe
@ 2022-11-20 10:03 ` René Scharfe
  2022-11-20 10:06   ` [PATCH v2 1/3] t5317: stop losing return codes of git ls-files René Scharfe
                     ` (3 more replies)
  3 siblings, 4 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-20 10:03 UTC (permalink / raw)
  To: Git List
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder, Jeff King

Fix a regression that prevents using multiple --filter options, simplify
the option parsing code and avoid relying on undefined behavior in it.

Patch 3 conflicts with cc/filtered-repack in seen, but not semantically.

Changes since v1:
- Added patch 1 to fix an issue with existing tests.
- Separate patch 2 for new tests.
- Test using blob size filters, only, which is a bit simpler.
- Test both combinations to also catch not just the current
  last-one-wins regression, but also a possible future first-one-wins
  issue.
- Actually revert 5cb28270a1 (pack-objects: lazily set up
  "struct rev_info", don't leak, 2022-03-28) instead of having a
  minimal fix and then adding some kind of middle ground by using a
  separate struct list_objects_filter_options.

  t5317: stop losing return codes of git ls-files
  t5317: demonstrate failure to handle multiple --filter options
  Revert "pack-objects: lazily set up "struct rev_info", don't leak"

 builtin/pack-objects.c                 | 31 ++-------
 list-objects-filter-options.c          |  4 --
 list-objects-filter-options.h          | 24 +------
 t/t5317-pack-objects-filter-objects.sh | 90 +++++++++++++++++++-------
 4 files changed, 75 insertions(+), 74 deletions(-)

Range-Diff gegen v1:
1:  2df1185b5f < -:  ---------- pack-objects: fix handling of multiple --filter options
2:  b8e951fb4f < -:  ---------- pack-object: simplify --filter handling
3:  1e4ae7d9f1 < -:  ---------- list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT()
-:  ---------- > 1:  43af432f5c t5317: stop losing return codes of git ls-files
-:  ---------- > 2:  a77b85e71d t5317: demonstrate failure to handle multiple --filter options
-:  ---------- > 3:  63ccb81357 Revert "pack-objects: lazily set up "struct rev_info", don't leak"
--
2.38.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 1/3] t5317: stop losing return codes of git ls-files
  2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe
@ 2022-11-20 10:06   ` René Scharfe
  2022-11-20 10:07   ` [PATCH v2 2/3] t5317: demonstrate failure to handle multiple --filter options René Scharfe
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-20 10:06 UTC (permalink / raw)
  To: Git List
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder, Jeff King

fb2d0db502 (test-lib-functions: add parsing helpers for ls-files and
ls-tree, 2022-04-04) not only started to use helper functions, it also
started to pipe the output of git ls-files into them directly, without
using a temporary file.  No explanation was given.  This causes the
return code of that git command to be ignored.

Revert that part of the change, use temporary files and check the return
code of git ls-files again.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/t5317-pack-objects-filter-objects.sh | 52 ++++++++++++++------------
 1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index bb633c9b09..82a22ecaa5 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -24,8 +24,9 @@ parse_verify_pack_blob_oid () {
 }

 test_expect_success 'verify blob count in normal packfile' '
-	git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 |
-	test_parse_ls_files_stage_oids |
+	git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \
+		>ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&

 	git -C r1 pack-objects --revs --stdout >all.pack <<-EOF &&
@@ -123,8 +124,8 @@ test_expect_success 'setup r2' '
 '

 test_expect_success 'verify blob count in normal packfile' '
-	git -C r2 ls-files -s large.1000 large.10000 |
-	test_parse_ls_files_stage_oids |
+	git -C r2 ls-files -s large.1000 large.10000 >ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&

 	git -C r2 pack-objects --revs --stdout >all.pack <<-EOF &&
@@ -161,8 +162,8 @@ test_expect_success 'verify blob:limit=1000' '
 '

 test_expect_success 'verify blob:limit=1001' '
-	git -C r2 ls-files -s large.1000 |
-	test_parse_ls_files_stage_oids |
+	git -C r2 ls-files -s large.1000 >ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&

 	git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 >filter.pack <<-EOF &&
@@ -179,8 +180,8 @@ test_expect_success 'verify blob:limit=1001' '
 '

 test_expect_success 'verify blob:limit=10001' '
-	git -C r2 ls-files -s large.1000 large.10000 |
-	test_parse_ls_files_stage_oids |
+	git -C r2 ls-files -s large.1000 large.10000 >ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&

 	git -C r2 pack-objects --revs --stdout --filter=blob:limit=10001 >filter.pack <<-EOF &&
@@ -197,8 +198,8 @@ test_expect_success 'verify blob:limit=10001' '
 '

 test_expect_success 'verify blob:limit=1k' '
-	git -C r2 ls-files -s large.1000 |
-	test_parse_ls_files_stage_oids |
+	git -C r2 ls-files -s large.1000 >ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&

 	git -C r2 pack-objects --revs --stdout --filter=blob:limit=1k >filter.pack <<-EOF &&
@@ -215,8 +216,8 @@ test_expect_success 'verify blob:limit=1k' '
 '

 test_expect_success 'verify explicitly specifying oversized blob in input' '
-	git -C r2 ls-files -s large.1000 large.10000 |
-	test_parse_ls_files_stage_oids |
+	git -C r2 ls-files -s large.1000 large.10000 >ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&

 	echo HEAD >objects &&
@@ -233,8 +234,8 @@ test_expect_success 'verify explicitly specifying oversized blob in input' '
 '

 test_expect_success 'verify blob:limit=1m' '
-	git -C r2 ls-files -s large.1000 large.10000 |
-	test_parse_ls_files_stage_oids |
+	git -C r2 ls-files -s large.1000 large.10000 >ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&

 	git -C r2 pack-objects --revs --stdout --filter=blob:limit=1m >filter.pack <<-EOF &&
@@ -289,8 +290,9 @@ test_expect_success 'setup r3' '
 '

 test_expect_success 'verify blob count in normal packfile' '
-	git -C r3 ls-files -s sparse1 sparse2 dir1/sparse1 dir1/sparse2 |
-	test_parse_ls_files_stage_oids |
+	git -C r3 ls-files -s sparse1 sparse2 dir1/sparse1 dir1/sparse2 \
+		>ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&

 	git -C r3 pack-objects --revs --stdout >all.pack <<-EOF &&
@@ -341,8 +343,9 @@ test_expect_success 'setup r4' '
 '

 test_expect_success 'verify blob count in normal packfile' '
-	git -C r4 ls-files -s pattern sparse1 sparse2 dir1/sparse1 dir1/sparse2 |
-	test_parse_ls_files_stage_oids |
+	git -C r4 ls-files -s pattern sparse1 sparse2 dir1/sparse1 dir1/sparse2 \
+		>ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&

 	git -C r4 pack-objects --revs --stdout >all.pack <<-EOF &&
@@ -359,8 +362,8 @@ test_expect_success 'verify blob count in normal packfile' '
 '

 test_expect_success 'verify sparse:oid=OID' '
-	git -C r4 ls-files -s dir1/sparse1 dir1/sparse2 |
-	test_parse_ls_files_stage_oids |
+	git -C r4 ls-files -s dir1/sparse1 dir1/sparse2 >ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&

 	git -C r4 ls-files -s pattern >staged &&
@@ -379,8 +382,8 @@ test_expect_success 'verify sparse:oid=OID' '
 '

 test_expect_success 'verify sparse:oid=oid-ish' '
-	git -C r4 ls-files -s dir1/sparse1 dir1/sparse2 |
-	test_parse_ls_files_stage_oids |
+	git -C r4 ls-files -s dir1/sparse1 dir1/sparse2 >ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&

 	git -C r4 pack-objects --revs --stdout --filter=sparse:oid=main:pattern >filter.pack <<-EOF &&
@@ -400,8 +403,9 @@ test_expect_success 'verify sparse:oid=oid-ish' '
 # This models previously omitted objects that we did not receive.

 test_expect_success 'setup r1 - delete loose blobs' '
-	git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 |
-	test_parse_ls_files_stage_oids |
+	git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \
+		>ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&

 	for id in `cat expected | sed "s|..|&/|"`
--
2.38.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 2/3] t5317: demonstrate failure to handle multiple --filter options
  2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe
  2022-11-20 10:06   ` [PATCH v2 1/3] t5317: stop losing return codes of git ls-files René Scharfe
@ 2022-11-20 10:07   ` René Scharfe
  2022-11-20 10:13   ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe
  2022-11-22 19:02   ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling Jeff King
  3 siblings, 0 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-20 10:07 UTC (permalink / raw)
  To: Git List
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder, Jeff King

git pack-objects should accept multiple --filter options as documented
in Documentation/rev-list-options.txt, but currently the last one wins.
Show that using tests with multiple blob size limits.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/t5317-pack-objects-filter-objects.sh | 38 ++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 82a22ecaa5..25faebaada 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -265,6 +265,44 @@ test_expect_success 'verify normal and blob:limit packfiles have same commits/tr
 	test_cmp expected observed
 '

+test_expect_failure 'verify small limit and big limit results in small limit' '
+	git -C r2 ls-files -s large.1000 >ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
+	sort >expected &&
+
+	git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 \
+		--filter=blob:limit=10001 >filter.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r2 index-pack ../filter.pack &&
+
+	git -C r2 verify-pack -v ../filter.pack >verify_result &&
+	grep blob verify_result |
+	parse_verify_pack_blob_oid |
+	sort >observed &&
+
+	test_cmp expected observed
+'
+
+test_expect_success 'verify big limit and small limit results in small limit' '
+	git -C r2 ls-files -s large.1000 >ls_files_result &&
+	test_parse_ls_files_stage_oids <ls_files_result |
+	sort >expected &&
+
+	git -C r2 pack-objects --revs --stdout --filter=blob:limit=10001 \
+		--filter=blob:limit=1001 >filter.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r2 index-pack ../filter.pack &&
+
+	git -C r2 verify-pack -v ../filter.pack >verify_result &&
+	grep blob verify_result |
+	parse_verify_pack_blob_oid |
+	sort >observed &&
+
+	test_cmp expected observed
+'
+
 # Test sparse:path=<path> filter.
 # !!!!
 # NOTE: sparse:path filter support has been dropped for security reasons,
--
2.38.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe
  2022-11-20 10:06   ` [PATCH v2 1/3] t5317: stop losing return codes of git ls-files René Scharfe
  2022-11-20 10:07   ` [PATCH v2 2/3] t5317: demonstrate failure to handle multiple --filter options René Scharfe
@ 2022-11-20 10:13   ` René Scharfe
  2022-11-28 10:03     ` Junio C Hamano
  2022-11-22 19:02   ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling Jeff King
  3 siblings, 1 reply; 36+ messages in thread
From: René Scharfe @ 2022-11-20 10:13 UTC (permalink / raw)
  To: Git List
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder, Jeff King

This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.

5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
2022-03-28) avoided leaking rev_info allocations in many cases by
calling repo_init_revisions() only when the .filter member was actually
needed, but then still leaking it.  That was fixed later by 2108fe4a19
(revisions API users: add straightforward release_revisions(),
2022-04-13), making the reverted commit unnecessary.

5cb28270a1 broke support for multiple --filter options by calling
repo_init_revisions() for each one, clobbering earlier state and
leaking rev_info allocations.  Reverting it fixes that regression.
Luckily it only affects users that run pack-objects directly.  Internal
callers in bundle.c and upload-pack.c combine multiple filters into a
single option using "+".

5cb28270a1 introduced OPT_PARSE_LIST_OBJECTS_FILTER_INIT.  It relies on
being able to faithfully convert function pointers to object pointers
and back, which is undefined in C99.  The standard mentions this ability
as a common extension in its annex J (J.5.7 Function pointer casts), but
avoiding that dependency is more portable.  The macro hasn't been used
since, OPT_PARSE_LIST_OBJECTS_FILTER is enough so far.

So revert 5cb28270a1 fully to fix support for multiple --filter options
and avoid reliance on undefined behavior.  Its intent is better served
by the release_revisions() call added by 2108fe4a19 alone -- we just
need to do it unconditionally, mirroring the unconditional call of
repo_init_revisions().

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/pack-objects.c                 | 31 +++++---------------------
 list-objects-filter-options.c          |  4 ----
 list-objects-filter-options.h          | 24 +++-----------------
 t/t5317-pack-objects-filter-objects.sh |  2 +-
 4 files changed, 10 insertions(+), 51 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 573d0b20b7..3e74fbb0cd 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4149,21 +4149,6 @@ static int option_parse_cruft_expiration(const struct option *opt,
 	return 0;
 }

-struct po_filter_data {
-	unsigned have_revs:1;
-	struct rev_info revs;
-};
-
-static struct list_objects_filter_options *po_filter_revs_init(void *value)
-{
-	struct po_filter_data *data = value;
-
-	repo_init_revisions(the_repository, &data->revs, NULL);
-	data->have_revs = 1;
-
-	return &data->revs.filter;
-}
-
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -4174,7 +4159,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	int rev_list_index = 0;
 	int stdin_packs = 0;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
-	struct po_filter_data pfd = { .have_revs = 0 };
+	struct rev_info revs;

 	struct option pack_objects_options[] = {
 		OPT_SET_INT('q', "quiet", &progress,
@@ -4265,7 +4250,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			      &write_bitmap_index,
 			      N_("write a bitmap index if possible"),
 			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
-		OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init),
+		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
 		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
 		  N_("handling for missing objects"), PARSE_OPT_NONEG,
 		  option_parse_missing_action),
@@ -4284,6 +4269,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)

 	read_replace_refs = 0;

+	repo_init_revisions(the_repository, &revs, NULL);
+
 	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
 	if (the_repository->gitdir) {
 		prepare_repo_settings(the_repository);
@@ -4385,7 +4372,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;

-	if (pfd.have_revs && pfd.revs.filter.choice) {
+	if (revs.filter.choice) {
 		if (!pack_to_stdout)
 			die(_("cannot use --filter without --stdout"));
 		if (stdin_packs)
@@ -4472,16 +4459,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
 		read_object_list_from_stdin();
-	} else if (pfd.have_revs) {
-		get_object_list(&pfd.revs, rp.nr, rp.v);
-		release_revisions(&pfd.revs);
 	} else {
-		struct rev_info revs;
-
-		repo_init_revisions(the_repository, &revs, NULL);
 		get_object_list(&revs, rp.nr, rp.v);
-		release_revisions(&revs);
 	}
+	release_revisions(&revs);
 	cleanup_preferred_base();
 	if (include_tag && nr_result)
 		for_each_tag_ref(add_ref_tag, NULL);
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 5339660238..ee01bcd2cc 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -290,10 +290,6 @@ int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset)
 {
 	struct list_objects_filter_options *filter_options = opt->value;
-	opt_lof_init init = (opt_lof_init)opt->defval;
-
-	if (init)
-		filter_options = init(opt->value);

 	if (unset || !arg)
 		list_objects_filter_set_no_filter(filter_options);
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 7eeadab2dd..64af2e29e2 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -107,31 +107,13 @@ void parse_list_objects_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg);

-/**
- * The opt->value to opt_parse_list_objects_filter() is either a
- * "struct list_objects_filter_option *" when using
- * OPT_PARSE_LIST_OBJECTS_FILTER().
- *
- * Or, if using no "struct option" field is used by the callback,
- * except the "defval" which is expected to be an "opt_lof_init"
- * function, which is called with the "opt->value" and must return a
- * pointer to the ""struct list_objects_filter_option *" to be used.
- *
- * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the
- * "struct list_objects_filter_option" is embedded in a "struct
- * rev_info", which the "defval" could be tasked with lazily
- * initializing. See cmd_pack_objects() for an example.
- */
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset);
-typedef struct list_objects_filter_options *(*opt_lof_init)(void *);
-#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \
-	{ OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \
-	  N_("object filtering"), 0, opt_parse_list_objects_filter, \
-	  (intptr_t)(init) }

 #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
-	OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL)
+	OPT_CALLBACK(0, "filter", fo, N_("args"), \
+	  N_("object filtering"), \
+	  opt_parse_list_objects_filter)

 /*
  * Translates abbreviated numbers in the filter's filter_spec into their
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 25faebaada..5b707d911b 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -265,7 +265,7 @@ test_expect_success 'verify normal and blob:limit packfiles have same commits/tr
 	test_cmp expected observed
 '

-test_expect_failure 'verify small limit and big limit results in small limit' '
+test_expect_success 'verify small limit and big limit results in small limit' '
 	git -C r2 ls-files -s large.1000 >ls_files_result &&
 	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&
--
2.38.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/3] pack-objects: fix and simplify --filter handling
  2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe
                     ` (2 preceding siblings ...)
  2022-11-20 10:13   ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe
@ 2022-11-22 19:02   ` Jeff King
  3 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2022-11-22 19:02 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Taylor Blau, Christian Couder

On Sun, Nov 20, 2022 at 11:03:35AM +0100, René Scharfe wrote:

> Fix a regression that prevents using multiple --filter options, simplify
> the option parsing code and avoid relying on undefined behavior in it.
> 
> Patch 3 conflicts with cc/filtered-repack in seen, but not semantically.
> 
> Changes since v1:
> - Added patch 1 to fix an issue with existing tests.
> - Separate patch 2 for new tests.
> - Test using blob size filters, only, which is a bit simpler.
> - Test both combinations to also catch not just the current
>   last-one-wins regression, but also a possible future first-one-wins
>   issue.
> - Actually revert 5cb28270a1 (pack-objects: lazily set up
>   "struct rev_info", don't leak, 2022-03-28) instead of having a
>   minimal fix and then adding some kind of middle ground by using a
>   separate struct list_objects_filter_options.

Thanks, this looks good to me. The new test is a little funny in that
it's somewhat-nonsensical input, but I find it unlikely that we'd ever
add code to treat two instances of the same filter type specially.
Compared to something that produces two distinct effects (like a blob
and tree-depth filter combined), it also requires the "test in both
directions" trick to cover everything.

But I do like that it's easier to reason about what the output _should_
be. So I don't feel too strongly about it either way.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-20 10:13   ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe
@ 2022-11-28 10:03     ` Junio C Hamano
  2022-11-28 11:12       ` Ævar Arnfjörð Bjarmason
  2022-11-28 11:26       ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe
  0 siblings, 2 replies; 36+ messages in thread
From: Junio C Hamano @ 2022-11-28 10:03 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git List, Ævar Arnfjörð Bjarmason, Taylor Blau,
	Christian Couder, Jeff King

René Scharfe <l.s.r@web.de> writes:

> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>
> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
> 2022-03-28) avoided leaking rev_info allocations in many cases by
> calling repo_init_revisions() only when the .filter member was actually
> needed, but then still leaking it.  That was fixed later by 2108fe4a19
> (revisions API users: add straightforward release_revisions(),
> 2022-04-13), making the reverted commit unnecessary.

Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
way.

https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917

Does anybody want to help looking into it?

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 10:03     ` Junio C Hamano
@ 2022-11-28 11:12       ` Ævar Arnfjörð Bjarmason
  2022-11-28 12:00         ` [PATCH] t5314: check exit code of "rev-parse" Ævar Arnfjörð Bjarmason
  2022-11-28 11:26       ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe
  1 sibling, 1 reply; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-28 11:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: René Scharfe, Git List, Taylor Blau, Christian Couder, Jeff King


On Mon, Nov 28 2022, Junio C Hamano wrote:

> René Scharfe <l.s.r@web.de> writes:
>
>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>
>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>> calling repo_init_revisions() only when the .filter member was actually
>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>> (revisions API users: add straightforward release_revisions(),
>> 2022-04-13), making the reverted commit unnecessary.
>
> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
> way.
>
> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>
> Does anybody want to help looking into it?

It's because the tip of that branch introduces a new leak, but it's only
revealed in our test suite by a "git" invocation that we don't check the
exit code of. So testing with "GIT_TEST_SANITIZE_LEAK_LOG=true" (which
is there to check exactly those edge cases) catches it.

René, testing locally with e.g.:

	GIT_TEST_PASSING_SANITIZE_LEAK=check \
	GIT_TEST_SANITIZE_LEAK_LOG=true \
	make test

Should catch it, but note that we might still have new leaks in tests
that are already failing due to other leaks. So diff-ing the resulting
t/test-results/*.leak/* would be a good sanity check to see if there's
any other leak we've missed.

As to the bug itself: This exact edge case is why I went for the
lazy-setup part of 5cb28270a1f (pack-objects: lazily set up "struct
rev_info", don't leak, 2022-03-28), and indeed the example mentioned in
its commit message now leaks again, but not on master:

	echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial

Now, maybe I'm missing something, but it the tip of that branch points
out a legitimate bug in my 5cb28270a1f.

But I don't see why the fix for it needs to be a full revert of
5cb28270a1f. If I replace all the C code changes in that commit with
this the test it flipped also passes:
	
	diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
	index 573d0b20b76..0de53c5a2df 100644
	--- a/builtin/pack-objects.c
	+++ b/builtin/pack-objects.c
	@@ -4158,8 +4158,10 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value)
	 {
	 	struct po_filter_data *data = value;
	 
	-	repo_init_revisions(the_repository, &data->revs, NULL);
	-	data->have_revs = 1;
	+	if (!data->have_revs) {
	+		repo_init_revisions(the_repository, &data->revs, NULL);
	+		data->have_revs = 1;
	+	}
	 
	 	return &data->revs.filter;
	 }

I.e. the commit is correct that we shouldn't be clobbering the
"rev_info" when we see --filter again, but then let's just stop doing
that.

Well, I know why, it's because René initially started poking at this to
rid the codebase of the reliance of C99's J.5.7. I think our two
opinions on that are quite clear, so I won't repeat that here (I don't
per-se mind, I just think it's a nothingburger).

But in any case, if we're going to also refactor this code to get rd of
the J.5.7 reliance I'd think we could agree that it really belongs in
its own change. That change could then be a pure refactoring change, and
wouldn't need to also explain how it's fixing a bug while-at-it.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 10:03     ` Junio C Hamano
  2022-11-28 11:12       ` Ævar Arnfjörð Bjarmason
@ 2022-11-28 11:26       ` René Scharfe
  2022-11-28 11:31         ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 36+ messages in thread
From: René Scharfe @ 2022-11-28 11:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git List, Ævar Arnfjörð Bjarmason, Taylor Blau,
	Christian Couder, Jeff King

Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>
>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>> calling repo_init_revisions() only when the .filter member was actually
>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>> (revisions API users: add straightforward release_revisions(),
>> 2022-04-13), making the reverted commit unnecessary.
>
> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
> way.
>
> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>
> Does anybody want to help looking into it?
The patch exposes that release_revisions() leaks the diffopt allocations
as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
a TODO for diff_free(&revs->diffopt), 2022-04-14).

The patch below plugs it locally.

--- >8 ---
Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/pack-objects.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3e74fbb0cd..a47a3f0fba 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	} else {
 		get_object_list(&revs, rp.nr, rp.v);
 	}
+	diff_free(&revs.diffopt);
 	release_revisions(&revs);
 	cleanup_preferred_base();
 	if (include_tag && nr_result)
--
2.30.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 11:26       ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe
@ 2022-11-28 11:31         ` Ævar Arnfjörð Bjarmason
  2022-11-28 12:24           ` Ævar Arnfjörð Bjarmason
  2022-11-28 14:29           ` René Scharfe
  0 siblings, 2 replies; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-28 11:31 UTC (permalink / raw)
  To: René Scharfe
  Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King


On Mon, Nov 28 2022, René Scharfe wrote:

> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>> René Scharfe <l.s.r@web.de> writes:
>>
>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>
>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>> calling repo_init_revisions() only when the .filter member was actually
>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>> (revisions API users: add straightforward release_revisions(),
>>> 2022-04-13), making the reverted commit unnecessary.
>>
>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>> way.
>>
>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>
>> Does anybody want to help looking into it?

[I see we crossed E-Mails]:
https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/

> The patch exposes that release_revisions() leaks the diffopt allocations
> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
> a TODO for diff_free(&revs->diffopt), 2022-04-14).

That's correct, and we have that leak in various places in our codebase,
but per the above side-thread I think this is primarily exposing that
we're setting up the "struct rev_info" with your change when we don't
need to. Why can't we just skip it?

Yeah, if we do set it up we'll run into an outstanding leak, and that
should also be fixed (I have some local patches...), but the other cases
I know of where we'll leak that data is where we're actually using the
"struct rev_info".

I haven't tried tearing your change apart to poke at it myself, and
maybe there's some really good reason for why you can't separate getting
rid of the J.5.7 dependency and removing the lazy-init.

> The patch below plugs it locally.
>
> --- >8 ---
> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  builtin/pack-objects.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 3e74fbb0cd..a47a3f0fba 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	} else {
>  		get_object_list(&revs, rp.nr, rp.v);
>  	}
> +	diff_free(&revs.diffopt);
>  	release_revisions(&revs);
>  	cleanup_preferred_base();
>  	if (include_tag && nr_result)

So, the main motivation for the change was paranoia that a compiler or
platform might show up without J.5.7 support and that would bite us, but
we're now adding a double-free-in-waiting?

I think we're both a bit paranoid, but clearly have different
paranoia-priorities :)

If we do end up with some hack like this instead of fixing the
underlying problem I'd much prefer that such a hack just be an UNLEAK()
here.

I.e. we have a destructor for "revs.*" already, let's not bypass it and
start freeing things from under it, which will result in a double-free
if we forget this callsite once the TODO in 54c8a7c379 is addressed.

As you'd see if you made release_revisions() simply call
diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
cases.

I haven't dug into this one, but offhand I'm not confident in saying
that this isn't exposing us to some aspect of that gnarlyness (maybe
not, it's been a while since I looked).

(IIRC some of the most gnarly edge cases will only show up as CI
failures on Windows, to do with the ordering of when we'll fclose()
files hanging off that "diffopt").


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH] t5314: check exit code of "rev-parse"
  2022-11-28 11:12       ` Ævar Arnfjörð Bjarmason
@ 2022-11-28 12:00         ` Ævar Arnfjörð Bjarmason
  2022-11-28 13:51           ` René Scharfe
  2022-11-28 14:18           ` [PATCH v2] t5314: check exit code of "git" Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-28 12:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Taylor Blau, Jeff King,
	Christian Couder, Ævar Arnfjörð Bjarmason

Amend the test added in [1] to check the exit code of the "rev-parse"
invocations. An in-flight change[2] introduced a memory leak in these
invocations, which went undetected unless we were running under
"GIT_TEST_SANITIZE_LEAK_LOG=true".

Note that the in-flight change made 8 test files fail, but as far as I
can tell only this one would have had its exit code hidden unless
under "GIT_TEST_SANITIZE_LEAK_LOG=true". The rest would be caught
without it.

1. 4cf2143e029 (pack-objects: break delta cycles before delta-search
   phase, 2016-08-11)
2. https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
3. faececa53f9 (test-lib: have the "check" mode for SANITIZE=leak
   consider leak logs, 2022-07-28)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

Aside from rs/multi-filter-args, we should be checking the return code
of "git" in these cases.

 t/t5314-pack-cycle-detection.sh | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/t/t5314-pack-cycle-detection.sh b/t/t5314-pack-cycle-detection.sh
index 73a241743aa..169d8198641 100755
--- a/t/t5314-pack-cycle-detection.sh
+++ b/t/t5314-pack-cycle-detection.sh
@@ -63,13 +63,16 @@ TEST_PASSES_SANITIZE_LEAK=true
 # Note that the two variants of "file" must be similar enough to convince git
 # to create the delta.
 make_pack () {
-	{
-		printf '%s\n' "-$(git rev-parse $2)"
-		printf '%s dummy\n' "$(git rev-parse $1:dummy)"
-		printf '%s file\n' "$(git rev-parse $1:file)"
-	} |
-	git pack-objects --stdout |
-	git index-pack --stdin --fix-thin
+	grp1=$(git rev-parse "$2") &&
+	grp2=$(git rev-parse "$1:dummy") &&
+	grp3=$(git rev-parse "$1:file") &&
+	cat >list <<-EOF
+	-$grp1
+	$grp2 dummy
+	$grp3 file
+	EOF
+	git pack-objects --stdout <list >pack &&
+	git index-pack --stdin --fix-thin <pack
 }
 
 test_expect_success 'setup' '
-- 
2.39.0.rc0.962.g4ca4168c9ac


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 11:31         ` Ævar Arnfjörð Bjarmason
@ 2022-11-28 12:24           ` Ævar Arnfjörð Bjarmason
  2022-11-28 15:16             ` René Scharfe
  2022-11-28 14:29           ` René Scharfe
  1 sibling, 1 reply; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-28 12:24 UTC (permalink / raw)
  To: René Scharfe
  Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King


On Mon, Nov 28 2022, Ævar Arnfjörð Bjarmason wrote:

René:

> On Mon, Nov 28 2022, René Scharfe wrote:
>
>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>> René Scharfe <l.s.r@web.de> writes:
>>>
>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>
>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>> calling repo_init_revisions() only when the .filter member was actually
>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>> (revisions API users: add straightforward release_revisions(),
>>>> 2022-04-13), making the reverted commit unnecessary.
>>>
>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>> way.
>>>
>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>
>>> Does anybody want to help looking into it?
>
> [I see we crossed E-Mails]:
> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>
>> The patch exposes that release_revisions() leaks the diffopt allocations
>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>
> That's correct, and we have that leak in various places in our codebase,
> but per the above side-thread I think this is primarily exposing that
> we're setting up the "struct rev_info" with your change when we don't
> need to. Why can't we just skip it?
>
> Yeah, if we do set it up we'll run into an outstanding leak, and that
> should also be fixed (I have some local patches...), but the other cases
> I know of where we'll leak that data is where we're actually using the
> "struct rev_info".
>
> I haven't tried tearing your change apart to poke at it myself, and
> maybe there's some really good reason for why you can't separate getting
> rid of the J.5.7 dependency and removing the lazy-init.
>
>> The patch below plugs it locally.
>>
>> --- >8 ---
>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  builtin/pack-objects.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index 3e74fbb0cd..a47a3f0fba 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>  	} else {
>>  		get_object_list(&revs, rp.nr, rp.v);
>>  	}
>> +	diff_free(&revs.diffopt);
>>  	release_revisions(&revs);
>>  	cleanup_preferred_base();
>>  	if (include_tag && nr_result)
>
> So, the main motivation for the change was paranoia that a compiler or
> platform might show up without J.5.7 support and that would bite us, but
> we're now adding a double-free-in-waiting?
>
> I think we're both a bit paranoid, but clearly have different
> paranoia-priorities :)
>
> If we do end up with some hack like this instead of fixing the
> underlying problem I'd much prefer that such a hack just be an UNLEAK()
> here.
>
> I.e. we have a destructor for "revs.*" already, let's not bypass it and
> start freeing things from under it, which will result in a double-free
> if we forget this callsite once the TODO in 54c8a7c379 is addressed.
>
> As you'd see if you made release_revisions() simply call
> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
> cases.
>
> I haven't dug into this one, but offhand I'm not confident in saying
> that this isn't exposing us to some aspect of that gnarlyness (maybe
> not, it's been a while since I looked).
>
> (IIRC some of the most gnarly edge cases will only show up as CI
> failures on Windows, to do with the ordering of when we'll fclose()
> files hanging off that "diffopt").

This squashed into 3/3 seems to me to be a proper fix to a change that
wants to refactor the code for non-J.5.7 compatibility. I.e. this just
does the data<->fp casting part of the change, without refactoring the
"lazy init".

But I think you should check this a bit more carefully. Your 3/3 says
that your change "mak[es] the reverted commit unnecessary", but as I
noted if you'd run the command that commit shows, you'd have seen you're
re-introducing the leak it fixed. So I wonder what else has been missed
here.

I vaguely recall that one reason I ended up with that J.5.7 dependency
was because there was an objection to mocking up the "struct option" as
I'm doing here. I.e. here we assume that the
opt_parse_list_objects_filter() is only ever going to care about the
"value" member.

I think that's probably fine, but I may be misrecalling, or missing some
crucial detail. I'll leave digging that up & convincing us that it's
fine to the person pushing for refactoring all of this :)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3e74fbb0cd5..faf210bfe8c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4149,6 +4149,27 @@ static int option_parse_cruft_expiration(const struct option *opt,
 	return 0;
 }
 
+struct po_filter_data {
+	unsigned have_revs:1;
+	struct rev_info revs;
+};
+
+static int opt_parse_list_objects_filter_init(const struct option *opt,
+					      const char *arg, int unset)
+{
+	struct po_filter_data *data = opt->value;
+	struct rev_info *revs = &data->revs;
+	const struct option opt_rev = {
+		.value = (void *)&revs->filter,
+	};
+
+	if (!data->have_revs)
+		repo_init_revisions(the_repository, revs, NULL);
+	data->have_revs = 1;
+
+	return opt_parse_list_objects_filter(&opt_rev, arg, unset);
+}
+
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -4159,7 +4180,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	int rev_list_index = 0;
 	int stdin_packs = 0;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
-	struct rev_info revs;
+	struct po_filter_data pfd = { .have_revs = 0 };
 
 	struct option pack_objects_options[] = {
 		OPT_SET_INT('q', "quiet", &progress,
@@ -4250,7 +4271,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			      &write_bitmap_index,
 			      N_("write a bitmap index if possible"),
 			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
-		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
+		OPT_CALLBACK(0, "filter", &pfd, N_("args"), N_("object filtering"),
+			     opt_parse_list_objects_filter_init),
 		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
 		  N_("handling for missing objects"), PARSE_OPT_NONEG,
 		  option_parse_missing_action),
@@ -4269,8 +4291,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 
-	repo_init_revisions(the_repository, &revs, NULL);
-
 	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
 	if (the_repository->gitdir) {
 		prepare_repo_settings(the_repository);
@@ -4372,7 +4392,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (revs.filter.choice) {
+	if (pfd.have_revs && pfd.revs.filter.choice) {
 		if (!pack_to_stdout)
 			die(_("cannot use --filter without --stdout"));
 		if (stdin_packs)
@@ -4459,10 +4479,16 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
 		read_object_list_from_stdin();
+	} else if (pfd.have_revs) {
+		get_object_list(&pfd.revs, rp.nr, rp.v);
+		release_revisions(&pfd.revs);
 	} else {
+		struct rev_info revs;
+
+		repo_init_revisions(the_repository, &revs, NULL);
 		get_object_list(&revs, rp.nr, rp.v);
+		release_revisions(&revs);
 	}
-	release_revisions(&revs);
 	cleanup_preferred_base();
 	if (include_tag && nr_result)
 		for_each_tag_ref(add_ref_tag, NULL);

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH] t5314: check exit code of "rev-parse"
  2022-11-28 12:00         ` [PATCH] t5314: check exit code of "rev-parse" Ævar Arnfjörð Bjarmason
@ 2022-11-28 13:51           ` René Scharfe
  2022-11-28 14:18           ` [PATCH v2] t5314: check exit code of "git" Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-28 13:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Taylor Blau, Jeff King, Christian Couder

Am 28.11.2022 um 13:00 schrieb Ævar Arnfjörð Bjarmason:
> Amend the test added in [1] to check the exit code of the "rev-parse"
> invocations. An in-flight change[2] introduced a memory leak in these
> invocations, which went undetected unless we were running under
> "GIT_TEST_SANITIZE_LEAK_LOG=true".

Checking the return code of rev-parse might be a good idea, but AFAICS
none of the patches in the thread around [2] add a leak to it; they are
only about pack-objects.  So that's not how this patch works.

> Note that the in-flight change made 8 test files fail, but as far as I
> can tell only this one would have had its exit code hidden unless
> under "GIT_TEST_SANITIZE_LEAK_LOG=true". The rest would be caught
> without it.
>
> 1. 4cf2143e029 (pack-objects: break delta cycles before delta-search
>    phase, 2016-08-11)
> 2. https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
> 3. faececa53f9 (test-lib: have the "check" mode for SANITIZE=leak
>    consider leak logs, 2022-07-28)
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>
> Aside from rs/multi-filter-args, we should be checking the return code
> of "git" in these cases.
>
>  t/t5314-pack-cycle-detection.sh | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/t/t5314-pack-cycle-detection.sh b/t/t5314-pack-cycle-detection.sh
> index 73a241743aa..169d8198641 100755
> --- a/t/t5314-pack-cycle-detection.sh
> +++ b/t/t5314-pack-cycle-detection.sh
> @@ -63,13 +63,16 @@ TEST_PASSES_SANITIZE_LEAK=true
>  # Note that the two variants of "file" must be similar enough to convince git
>  # to create the delta.
>  make_pack () {
> -	{
> -		printf '%s\n' "-$(git rev-parse $2)"
> -		printf '%s dummy\n' "$(git rev-parse $1:dummy)"
> -		printf '%s file\n' "$(git rev-parse $1:file)"
> -	} |
> -	git pack-objects --stdout |

THIS is how it works: pack-objects with the patch in the
grandparent of [2] and built with SANITIZE=leak fails here, but
the test marches on, ignoring its return value.

> -	git index-pack --stdin --fix-thin
> +	grp1=$(git rev-parse "$2") &&
> +	grp2=$(git rev-parse "$1:dummy") &&
> +	grp3=$(git rev-parse "$1:file") &&

Nit: "grp" expands to "group" in my mind.  Here it stands for "git
rev-parse", hmm.  "commit", "dummy_blob" and "file_blob" might be
better names.

> +	cat >list <<-EOF
> +	-$grp1
> +	$grp2 dummy
> +	$grp3 file
> +	EOF
> +	git pack-objects --stdout <list >pack &&

Here's the new return code check that makes the test detect the
diffopt leak in question.

> +	git index-pack --stdin --fix-thin <pack
>  }
>
>  test_expect_success 'setup' '

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2] t5314: check exit code of "git"
  2022-11-28 12:00         ` [PATCH] t5314: check exit code of "rev-parse" Ævar Arnfjörð Bjarmason
  2022-11-28 13:51           ` René Scharfe
@ 2022-11-28 14:18           ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-28 14:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Taylor Blau, Jeff King,
	Christian Couder, Ævar Arnfjörð Bjarmason

Amend the test added in [1] to check the exit code of the "git"
invocations. An in-flight change[2] introduced a memory leak in these
invocations, which went undetected unless we were running under
"GIT_TEST_SANITIZE_LEAK_LOG=true".

Note that the in-flight change made 8 test files fail, but as far as I
can tell only this one would have had its exit code hidden unless
under "GIT_TEST_SANITIZE_LEAK_LOG=true". The rest would be caught
without it.

We could pick other variable names here than "ln%d", e.g. "commit",
"dummy_blob" and "file_blob", but having the "rev-parse" invocations
aligned makes the difference between them more readable, so let's pick
"ln%d".

1. 4cf2143e029 (pack-objects: break delta cycles before delta-search
   phase, 2016-08-11)
2. https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
3. faececa53f9 (test-lib: have the "check" mode for SANITIZE=leak
   consider leak logs, 2022-07-28)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
Range-diff against v1:
1:  45b240740eb ! 1:  ca77a7249e6 t5314: check exit code of "rev-parse"
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    t5314: check exit code of "rev-parse"
    +    t5314: check exit code of "git"
     
    -    Amend the test added in [1] to check the exit code of the "rev-parse"
    +    Amend the test added in [1] to check the exit code of the "git"
         invocations. An in-flight change[2] introduced a memory leak in these
         invocations, which went undetected unless we were running under
         "GIT_TEST_SANITIZE_LEAK_LOG=true".
    @@ Commit message
         under "GIT_TEST_SANITIZE_LEAK_LOG=true". The rest would be caught
         without it.
     
    +    We could pick other variable names here than "ln%d", e.g. "commit",
    +    "dummy_blob" and "file_blob", but having the "rev-parse" invocations
    +    aligned makes the difference between them more readable, so let's pick
    +    "ln%d".
    +
         1. 4cf2143e029 (pack-objects: break delta cycles before delta-search
            phase, 2016-08-11)
         2. https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
    @@ t/t5314-pack-cycle-detection.sh: TEST_PASSES_SANITIZE_LEAK=true
     -	} |
     -	git pack-objects --stdout |
     -	git index-pack --stdin --fix-thin
    -+	grp1=$(git rev-parse "$2") &&
    -+	grp2=$(git rev-parse "$1:dummy") &&
    -+	grp3=$(git rev-parse "$1:file") &&
    ++	ln1=$(git rev-parse "$2") &&
    ++	ln2=$(git rev-parse "$1:dummy") &&
    ++	ln3=$(git rev-parse "$1:file") &&
     +	cat >list <<-EOF
    -+	-$grp1
    -+	$grp2 dummy
    -+	$grp3 file
    ++	-$ln1
    ++	$ln2 dummy
    ++	$ln3 file
     +	EOF
     +	git pack-objects --stdout <list >pack &&
     +	git index-pack --stdin --fix-thin <pack

 t/t5314-pack-cycle-detection.sh | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/t/t5314-pack-cycle-detection.sh b/t/t5314-pack-cycle-detection.sh
index 73a241743aa..82734b9a3c4 100755
--- a/t/t5314-pack-cycle-detection.sh
+++ b/t/t5314-pack-cycle-detection.sh
@@ -63,13 +63,16 @@ TEST_PASSES_SANITIZE_LEAK=true
 # Note that the two variants of "file" must be similar enough to convince git
 # to create the delta.
 make_pack () {
-	{
-		printf '%s\n' "-$(git rev-parse $2)"
-		printf '%s dummy\n' "$(git rev-parse $1:dummy)"
-		printf '%s file\n' "$(git rev-parse $1:file)"
-	} |
-	git pack-objects --stdout |
-	git index-pack --stdin --fix-thin
+	ln1=$(git rev-parse "$2") &&
+	ln2=$(git rev-parse "$1:dummy") &&
+	ln3=$(git rev-parse "$1:file") &&
+	cat >list <<-EOF
+	-$ln1
+	$ln2 dummy
+	$ln3 file
+	EOF
+	git pack-objects --stdout <list >pack &&
+	git index-pack --stdin --fix-thin <pack
 }
 
 test_expect_success 'setup' '
-- 
2.39.0.rc0.962.g4ca4168c9ac


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 11:31         ` Ævar Arnfjörð Bjarmason
  2022-11-28 12:24           ` Ævar Arnfjörð Bjarmason
@ 2022-11-28 14:29           ` René Scharfe
  2022-11-28 14:34             ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 36+ messages in thread
From: René Scharfe @ 2022-11-28 14:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King

Am 28.11.2022 um 12:31 schrieb Ævar Arnfjörð Bjarmason:
>
> On Mon, Nov 28 2022, René Scharfe wrote:
>
>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>> René Scharfe <l.s.r@web.de> writes:
>>>
>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>
>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>> calling repo_init_revisions() only when the .filter member was actually
>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>> (revisions API users: add straightforward release_revisions(),
>>>> 2022-04-13), making the reverted commit unnecessary.
>>>
>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>> way.
>>>
>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>
>>> Does anybody want to help looking into it?
>
> [I see we crossed E-Mails]:
> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>
>> The patch exposes that release_revisions() leaks the diffopt allocations
>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>
> That's correct, and we have that leak in various places in our codebase,
> but per the above side-thread I think this is primarily exposing that
> we're setting up the "struct rev_info" with your change when we don't
> need to. Why can't we just skip it?

I have no idea how to stop get_object_list() from using struct rev_info.
We could let it take a struct list_objects_filter_options pointer
instead and have it build a struct rev_info internally, but that would
just move the problem, not solve it.

> Yeah, if we do set it up we'll run into an outstanding leak, and that
> should also be fixed (I have some local patches...), but the other cases
> I know of where we'll leak that data is where we're actually using the
> "struct rev_info".
>
> I haven't tried tearing your change apart to poke at it myself, and
> maybe there's some really good reason for why you can't separate getting
> rid of the J.5.7 dependency and removing the lazy-init.
>
>> The patch below plugs it locally.
>>
>> --- >8 ---
>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  builtin/pack-objects.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index 3e74fbb0cd..a47a3f0fba 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>  	} else {
>>  		get_object_list(&revs, rp.nr, rp.v);
>>  	}
>> +	diff_free(&revs.diffopt);
>>  	release_revisions(&revs);
>>  	cleanup_preferred_base();
>>  	if (include_tag && nr_result)
>
> So, the main motivation for the change was paranoia that a compiler or
> platform might show up without J.5.7 support and that would bite us, but
> we're now adding a double-free-in-waiting?
>
> I think we're both a bit paranoid, but clearly have different
> paranoia-priorities :)
>
> If we do end up with some hack like this instead of fixing the
> underlying problem I'd much prefer that such a hack just be an UNLEAK()
> here.
>
> I.e. we have a destructor for "revs.*" already, let's not bypass it and
> start freeing things from under it, which will result in a double-free
> if we forget this callsite once the TODO in 54c8a7c379 is addressed.

Well, that TODO fix should remove this new diff_free() call, but I
agree that this is fragile.

Removing the "TEST_PASSES_SANITIZE_LEAK=true" line from affected tests
is probably better.

> As you'd see if you made release_revisions() simply call
> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
> cases.

That was my first attempt; it breaks lots of tests due to double frees.

> I haven't dug into this one, but offhand I'm not confident in saying
> that this isn't exposing us to some aspect of that gnarlyness (maybe
> not, it's been a while since I looked).

I saw it as the way towards a release_revisions() that calls diff_free()
itself: Add such calls to each of them, fix the "gnarlyness"
individually, finally move them all into release_revisions().  The only
problem is that there are 60+ callsites.

> (IIRC some of the most gnarly edge cases will only show up as CI
> failures on Windows, to do with the ordering of when we'll fclose()
> files hanging off that "diffopt").

Fun.

René

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 14:29           ` René Scharfe
@ 2022-11-28 14:34             ` Ævar Arnfjörð Bjarmason
  2022-11-28 15:56               ` René Scharfe
  0 siblings, 1 reply; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-28 14:34 UTC (permalink / raw)
  To: René Scharfe
  Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King


On Mon, Nov 28 2022, René Scharfe wrote:

> Am 28.11.2022 um 12:31 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Mon, Nov 28 2022, René Scharfe wrote:
>>
>>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>>> René Scharfe <l.s.r@web.de> writes:
>>>>
>>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>>
>>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>>> calling repo_init_revisions() only when the .filter member was actually
>>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>>> (revisions API users: add straightforward release_revisions(),
>>>>> 2022-04-13), making the reverted commit unnecessary.
>>>>
>>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>>> way.
>>>>
>>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>>
>>>> Does anybody want to help looking into it?
>>
>> [I see we crossed E-Mails]:
>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>>
>>> The patch exposes that release_revisions() leaks the diffopt allocations
>>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>>
>> That's correct, and we have that leak in various places in our codebase,
>> but per the above side-thread I think this is primarily exposing that
>> we're setting up the "struct rev_info" with your change when we don't
>> need to. Why can't we just skip it?
>
> I have no idea how to stop get_object_list() from using struct rev_info.
> We could let it take a struct list_objects_filter_options pointer
> instead and have it build a struct rev_info internally, but that would
> just move the problem, not solve it.

I mean skip it when it's not needed, it's needed when we call
get_object_list().

But what "problem" is being caused by get_object_list()? That there's
some other case(s) where it'll leak still? I haven't checked, I think we
should leave that for some other time if there's such leaks, and just
not introduce any new leaks in this topic.

>> Yeah, if we do set it up we'll run into an outstanding leak, and that
>> should also be fixed (I have some local patches...), but the other cases
>> I know of where we'll leak that data is where we're actually using the
>> "struct rev_info".
>>
>> I haven't tried tearing your change apart to poke at it myself, and
>> maybe there's some really good reason for why you can't separate getting
>> rid of the J.5.7 dependency and removing the lazy-init.
>>
>>> The patch below plugs it locally.
>>>
>>> --- >8 ---
>>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>>
>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>> ---
>>>  builtin/pack-objects.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>>> index 3e74fbb0cd..a47a3f0fba 100644
>>> --- a/builtin/pack-objects.c
>>> +++ b/builtin/pack-objects.c
>>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>>  	} else {
>>>  		get_object_list(&revs, rp.nr, rp.v);
>>>  	}
>>> +	diff_free(&revs.diffopt);
>>>  	release_revisions(&revs);
>>>  	cleanup_preferred_base();
>>>  	if (include_tag && nr_result)
>>
>> So, the main motivation for the change was paranoia that a compiler or
>> platform might show up without J.5.7 support and that would bite us, but
>> we're now adding a double-free-in-waiting?
>>
>> I think we're both a bit paranoid, but clearly have different
>> paranoia-priorities :)
>>
>> If we do end up with some hack like this instead of fixing the
>> underlying problem I'd much prefer that such a hack just be an UNLEAK()
>> here.
>>
>> I.e. we have a destructor for "revs.*" already, let's not bypass it and
>> start freeing things from under it, which will result in a double-free
>> if we forget this callsite once the TODO in 54c8a7c379 is addressed.
>
> Well, that TODO fix should remove this new diff_free() call, but I
> agree that this is fragile.
>
> Removing the "TEST_PASSES_SANITIZE_LEAK=true" line from affected tests
> is probably better.

Or just not introduce new leaks, per my suggested fix-up at
https://lore.kernel.org/git/221128.86zgcbl0pe.gmgdl@evledraar.gmail.com/
(which it looks like you haven't seen when this E-Mail is composed...).

>> As you'd see if you made release_revisions() simply call
>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
>> cases.
>
> That was my first attempt; it breaks lots of tests due to double frees.

Right, to be clear I'm saying that none of this is needed right now,
i.e. I don't get why we'd want the scope-creep past the hunk I noted in
https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
for the --filter bug fix (plus the tests you're adding).

>> I haven't dug into this one, but offhand I'm not confident in saying
>> that this isn't exposing us to some aspect of that gnarlyness (maybe
>> not, it's been a while since I looked).
>
> I saw it as the way towards a release_revisions() that calls diff_free()
> itself: Add such calls to each of them, fix the "gnarlyness"
> individually, finally move them all into release_revisions().  The only
> problem is that there are 60+ callsites.

I think this is a really bad approach in general.

Yes, it may happen to work to free() some data from under an API, but
it's just as likely that we'll miss that this one caller is screwing
with its internal state, and e.g. when some new revision.c code is used
it'll go boom.

If we wanted to phase in such a free() of "foo" I think the right way
would be to add some "revs.free_foo = 1" flag, giving the API a chance
to treat that sanely, not to start poking at members of the struct, and
assuming that its release() won't be free()-ing them.

But as noted above & in the linked I think we can defer all of that. The
only reason we're discussing this is because you're changing the
lazy-init to be not-lazy, and introducing new leaks as a result.

I've shown a couple of approaches in this thread of fixing the issue(s)
at hand without introducing such leaks, so ...

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 12:24           ` Ævar Arnfjörð Bjarmason
@ 2022-11-28 15:16             ` René Scharfe
  2022-11-28 15:27               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 36+ messages in thread
From: René Scharfe @ 2022-11-28 15:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King

Am 28.11.2022 um 13:24 schrieb Ævar Arnfjörð Bjarmason:
>
> On Mon, Nov 28 2022, Ævar Arnfjörð Bjarmason wrote:
>
> René:
>
>> On Mon, Nov 28 2022, René Scharfe wrote:
>>
>>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>>> René Scharfe <l.s.r@web.de> writes:
>>>>
>>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>>
>>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>>> calling repo_init_revisions() only when the .filter member was actually
>>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>>> (revisions API users: add straightforward release_revisions(),
>>>>> 2022-04-13), making the reverted commit unnecessary.
>>>>
>>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>>> way.
>>>>
>>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>>
>>>> Does anybody want to help looking into it?
>>
>> [I see we crossed E-Mails]:
>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>>
>>> The patch exposes that release_revisions() leaks the diffopt allocations
>>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>>
>> That's correct, and we have that leak in various places in our codebase,
>> but per the above side-thread I think this is primarily exposing that
>> we're setting up the "struct rev_info" with your change when we don't
>> need to. Why can't we just skip it?
>>
>> Yeah, if we do set it up we'll run into an outstanding leak, and that
>> should also be fixed (I have some local patches...), but the other cases
>> I know of where we'll leak that data is where we're actually using the
>> "struct rev_info".
>>
>> I haven't tried tearing your change apart to poke at it myself, and
>> maybe there's some really good reason for why you can't separate getting
>> rid of the J.5.7 dependency and removing the lazy-init.
>>
>>> The patch below plugs it locally.
>>>
>>> --- >8 ---
>>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>>
>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>> ---
>>>  builtin/pack-objects.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>>> index 3e74fbb0cd..a47a3f0fba 100644
>>> --- a/builtin/pack-objects.c
>>> +++ b/builtin/pack-objects.c
>>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>>  	} else {
>>>  		get_object_list(&revs, rp.nr, rp.v);
>>>  	}
>>> +	diff_free(&revs.diffopt);
>>>  	release_revisions(&revs);
>>>  	cleanup_preferred_base();
>>>  	if (include_tag && nr_result)
>>
>> So, the main motivation for the change was paranoia that a compiler or
>> platform might show up without J.5.7 support and that would bite us, but
>> we're now adding a double-free-in-waiting?
>>
>> I think we're both a bit paranoid, but clearly have different
>> paranoia-priorities :)
>>
>> If we do end up with some hack like this instead of fixing the
>> underlying problem I'd much prefer that such a hack just be an UNLEAK()
>> here.
>>
>> I.e. we have a destructor for "revs.*" already, let's not bypass it and
>> start freeing things from under it, which will result in a double-free
>> if we forget this callsite once the TODO in 54c8a7c379 is addressed.
>>
>> As you'd see if you made release_revisions() simply call
>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
>> cases.
>>
>> I haven't dug into this one, but offhand I'm not confident in saying
>> that this isn't exposing us to some aspect of that gnarlyness (maybe
>> not, it's been a while since I looked).
>>
>> (IIRC some of the most gnarly edge cases will only show up as CI
>> failures on Windows, to do with the ordering of when we'll fclose()
>> files hanging off that "diffopt").
>
> This squashed into 3/3 seems to me to be a proper fix to a change that
> wants to refactor the code for non-J.5.7 compatibility. I.e. this just
> does the data<->fp casting part of the change, without refactoring the
> "lazy init".

That works, but lazy code is more complicated and there is no benefit
here -- eager allocations are not noticably slow or big.  Laziness
hides leaks in corners, i.e. requiring invocations with uncommon
options to trigger them.

>
> But I think you should check this a bit more carefully. Your 3/3 says
> that your change "mak[es] the reverted commit unnecessary"

No, it says that _your_ change 2108fe4a19 (revisions API users: add
straightforward release_revisions(), 2022-04-13) made it unnecessary.

> , but as I
> noted if you'd run the command that commit shows, you'd have seen you're
> re-introducing the leak it fixed. So I wonder what else has been missed
> here.

5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
2022-03-28) did not plug the leak.  It only moved it to the corner that
handles the --filter option.

That leak is only interesting to Git developers and harmless for users.
But if the goal is to become free of trivial leaks in order to allow
using tools like LeakSanitizer to find real ones then pushing them into
the shadows not yet reached by our test coverage won't help for long.

> I vaguely recall that one reason I ended up with that J.5.7 dependency
> was because there was an objection to mocking up the "struct option" as
> I'm doing here. I.e. here we assume that the
> opt_parse_list_objects_filter() is only ever going to care about the
> "value" member.

It's probably fine, but unnecessarily complicated compared to calling
repo_init_revisions() eagerly.

>
> I think that's probably fine, but I may be misrecalling, or missing some
> crucial detail. I'll leave digging that up & convincing us that it's
> fine to the person pushing for refactoring all of this :)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 3e74fbb0cd5..faf210bfe8c 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -4149,6 +4149,27 @@ static int option_parse_cruft_expiration(const struct option *opt,
>  	return 0;
>  }
>
> +struct po_filter_data {
> +	unsigned have_revs:1;
> +	struct rev_info revs;
> +};
> +
> +static int opt_parse_list_objects_filter_init(const struct option *opt,
> +					      const char *arg, int unset)
> +{
> +	struct po_filter_data *data = opt->value;
> +	struct rev_info *revs = &data->revs;
> +	const struct option opt_rev = {
> +		.value = (void *)&revs->filter,
> +	};
> +
> +	if (!data->have_revs)
> +		repo_init_revisions(the_repository, revs, NULL);
> +	data->have_revs = 1;
> +
> +	return opt_parse_list_objects_filter(&opt_rev, arg, unset);
> +}
> +
>  int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  {
>  	int use_internal_rev_list = 0;
> @@ -4159,7 +4180,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	int rev_list_index = 0;
>  	int stdin_packs = 0;
>  	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
> -	struct rev_info revs;
> +	struct po_filter_data pfd = { .have_revs = 0 };
>
>  	struct option pack_objects_options[] = {
>  		OPT_SET_INT('q', "quiet", &progress,
> @@ -4250,7 +4271,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  			      &write_bitmap_index,
>  			      N_("write a bitmap index if possible"),
>  			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
> -		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
> +		OPT_CALLBACK(0, "filter", &pfd, N_("args"), N_("object filtering"),
> +			     opt_parse_list_objects_filter_init),
>  		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
>  		  N_("handling for missing objects"), PARSE_OPT_NONEG,
>  		  option_parse_missing_action),
> @@ -4269,8 +4291,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>
>  	read_replace_refs = 0;
>
> -	repo_init_revisions(the_repository, &revs, NULL);
> -
>  	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
>  	if (the_repository->gitdir) {
>  		prepare_repo_settings(the_repository);
> @@ -4372,7 +4392,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
>  		unpack_unreachable_expiration = 0;
>
> -	if (revs.filter.choice) {
> +	if (pfd.have_revs && pfd.revs.filter.choice) {
>  		if (!pack_to_stdout)
>  			die(_("cannot use --filter without --stdout"));
>  		if (stdin_packs)
> @@ -4459,10 +4479,16 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  		read_cruft_objects();
>  	} else if (!use_internal_rev_list) {
>  		read_object_list_from_stdin();
> +	} else if (pfd.have_revs) {
> +		get_object_list(&pfd.revs, rp.nr, rp.v);
> +		release_revisions(&pfd.revs);
>  	} else {
> +		struct rev_info revs;
> +
> +		repo_init_revisions(the_repository, &revs, NULL);
>  		get_object_list(&revs, rp.nr, rp.v);
> +		release_revisions(&revs);
>  	}
> -	release_revisions(&revs);
>  	cleanup_preferred_base();
>  	if (include_tag && nr_result)
>  		for_each_tag_ref(add_ref_tag, NULL);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 15:16             ` René Scharfe
@ 2022-11-28 15:27               ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-28 15:27 UTC (permalink / raw)
  To: René Scharfe
  Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King


On Mon, Nov 28 2022, René Scharfe wrote:

> Am 28.11.2022 um 13:24 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Mon, Nov 28 2022, Ævar Arnfjörð Bjarmason wrote:
>>
>> René:
>>
>>> On Mon, Nov 28 2022, René Scharfe wrote:
>>>
>>>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>>>> René Scharfe <l.s.r@web.de> writes:
>>>>>
>>>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>>>
>>>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>>>> calling repo_init_revisions() only when the .filter member was actually
>>>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>>>> (revisions API users: add straightforward release_revisions(),
>>>>>> 2022-04-13), making the reverted commit unnecessary.
>>>>>
>>>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>>>> way.
>>>>>
>>>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>>>
>>>>> Does anybody want to help looking into it?
>>>
>>> [I see we crossed E-Mails]:
>>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>>>
>>>> The patch exposes that release_revisions() leaks the diffopt allocations
>>>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>>>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>>>
>>> That's correct, and we have that leak in various places in our codebase,
>>> but per the above side-thread I think this is primarily exposing that
>>> we're setting up the "struct rev_info" with your change when we don't
>>> need to. Why can't we just skip it?
>>>
>>> Yeah, if we do set it up we'll run into an outstanding leak, and that
>>> should also be fixed (I have some local patches...), but the other cases
>>> I know of where we'll leak that data is where we're actually using the
>>> "struct rev_info".
>>>
>>> I haven't tried tearing your change apart to poke at it myself, and
>>> maybe there's some really good reason for why you can't separate getting
>>> rid of the J.5.7 dependency and removing the lazy-init.
>>>
>>>> The patch below plugs it locally.
>>>>
>>>> --- >8 ---
>>>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>>>
>>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>>> ---
>>>>  builtin/pack-objects.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>>>> index 3e74fbb0cd..a47a3f0fba 100644
>>>> --- a/builtin/pack-objects.c
>>>> +++ b/builtin/pack-objects.c
>>>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>>>  	} else {
>>>>  		get_object_list(&revs, rp.nr, rp.v);
>>>>  	}
>>>> +	diff_free(&revs.diffopt);
>>>>  	release_revisions(&revs);
>>>>  	cleanup_preferred_base();
>>>>  	if (include_tag && nr_result)
>>>
>>> So, the main motivation for the change was paranoia that a compiler or
>>> platform might show up without J.5.7 support and that would bite us, but
>>> we're now adding a double-free-in-waiting?
>>>
>>> I think we're both a bit paranoid, but clearly have different
>>> paranoia-priorities :)
>>>
>>> If we do end up with some hack like this instead of fixing the
>>> underlying problem I'd much prefer that such a hack just be an UNLEAK()
>>> here.
>>>
>>> I.e. we have a destructor for "revs.*" already, let's not bypass it and
>>> start freeing things from under it, which will result in a double-free
>>> if we forget this callsite once the TODO in 54c8a7c379 is addressed.
>>>
>>> As you'd see if you made release_revisions() simply call
>>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
>>> cases.
>>>
>>> I haven't dug into this one, but offhand I'm not confident in saying
>>> that this isn't exposing us to some aspect of that gnarlyness (maybe
>>> not, it's been a while since I looked).
>>>
>>> (IIRC some of the most gnarly edge cases will only show up as CI
>>> failures on Windows, to do with the ordering of when we'll fclose()
>>> files hanging off that "diffopt").
>>
>> This squashed into 3/3 seems to me to be a proper fix to a change that
>> wants to refactor the code for non-J.5.7 compatibility. I.e. this just
>> does the data<->fp casting part of the change, without refactoring the
>> "lazy init".
>
> That works, but lazy code is more complicated and there is no benefit
> here -- eager allocations are not noticably slow or big.  Laziness
> hides leaks in corners, i.e. requiring invocations with uncommon
> options to trigger them.

Yes, sometimes it's easier to just set everything up at the
beginning. As for hiding leaks I think the empirical data here is going
against that, i.e. your change introduced a leak.

I don't think it's realistic that we'll have the side that assigns to
"have_revs" drift from the corresponding code in cmd_pack_objects().

>> But I think you should check this a bit more carefully. Your 3/3 says
>> that your change "mak[es] the reverted commit unnecessary"
>
> No, it says that _your_ change 2108fe4a19 (revisions API users: add
> straightforward release_revisions(), 2022-04-13) made it unnecessary.

Yes, I'm saying that's not correct, because if you run the command that
5cb28270a1 prominently notes we'll now leak with this revert:

	echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial

But yes with just 5cb28270a1 didn't add release_revisions(), that came
shortly afterwards in 2108fe4a19.

>> , but as I
>> noted if you'd run the command that commit shows, you'd have seen you're
>> re-introducing the leak it fixed. So I wonder what else has been missed
>> here.
>
> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
> 2022-03-28) did not plug the leak.  It only moved it to the corner that
> handles the --filter option.

I think we're using "the leak" here differently. I mean callstacks that
LeakSanitizer emits & tests we have that do & don't pass with
SANITIZE=leak.

But yes, there may be multiple paths through a function, some of which
leak, some of which don't. I'm not saying that the entire set of API
features that builtin/pack-objects.c uses in the revision API is
leak-free.

> That leak is only interesting to Git developers and harmless for users.
> But if the goal is to become free of trivial leaks in order to allow
> using tools like LeakSanitizer to find real ones then pushing them into
> the shadows not yet reached by our test coverage won't help for long.

It's clearly helping in this case, as our CI had multiple failing tests.

>> I vaguely recall that one reason I ended up with that J.5.7 dependency
>> was because there was an objection to mocking up the "struct option" as
>> I'm doing here. I.e. here we assume that the
>> opt_parse_list_objects_filter() is only ever going to care about the
>> "value" member.
>
> It's probably fine, but unnecessarily complicated compared to calling
> repo_init_revisions() eagerly.

I'm leaving aside the question of whether we should go for some version
of the refactoring in your 3/3.

What I am saying is that such refactoring should be split up from the
more narrow bug fix to the existing code. I.e. this as a replacement for
your 3/3 is all that's needed to pass the test you're adding in 2/3.

-- >8 --
From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= <l.s.r@web.de>
Subject: [PATCH] pack-objects: support multiple --filter options again

5cb28270a1f (pack-objects: lazily set up "struct rev_info", don't
leak, 2022-03-28) broke support for multiple --filter options by
calling repo_init_revisions() every time "--filter" was seen. Instead
we should only do so the first time, and subsequently append to the
existing filter data.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/pack-objects.c                 | 3 ++-
 t/t5317-pack-objects-filter-objects.sh | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 573d0b20b76..c702c09dd45 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value)
 {
 	struct po_filter_data *data = value;
 
-	repo_init_revisions(the_repository, &data->revs, NULL);
+	if (!data->have_revs)
+		repo_init_revisions(the_repository, &data->revs, NULL);
 	data->have_revs = 1;
 
 	return &data->revs.filter;
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 25faebaada8..5b707d911b5 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -265,7 +265,7 @@ test_expect_success 'verify normal and blob:limit packfiles have same commits/tr
 	test_cmp expected observed
 '
 
-test_expect_failure 'verify small limit and big limit results in small limit' '
+test_expect_success 'verify small limit and big limit results in small limit' '
 	git -C r2 ls-files -s large.1000 >ls_files_result &&
 	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&
-- 
2.39.0.rc0.993.g0c499e58e3b


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 14:34             ` Ævar Arnfjörð Bjarmason
@ 2022-11-28 15:56               ` René Scharfe
  2022-11-28 17:57                 ` René Scharfe
  2022-11-28 17:57                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-28 15:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King

Am 28.11.2022 um 15:34 schrieb Ævar Arnfjörð Bjarmason:
>
> On Mon, Nov 28 2022, René Scharfe wrote:
>
>> Am 28.11.2022 um 12:31 schrieb Ævar Arnfjörð Bjarmason:
>>>
>>> On Mon, Nov 28 2022, René Scharfe wrote:
>>>
>>>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>>>> René Scharfe <l.s.r@web.de> writes:
>>>>>
>>>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>>>
>>>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>>>> calling repo_init_revisions() only when the .filter member was actually
>>>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>>>> (revisions API users: add straightforward release_revisions(),
>>>>>> 2022-04-13), making the reverted commit unnecessary.
>>>>>
>>>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>>>> way.
>>>>>
>>>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>>>
>>>>> Does anybody want to help looking into it?
>>>
>>> [I see we crossed E-Mails]:
>>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>>>
>>>> The patch exposes that release_revisions() leaks the diffopt allocations
>>>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>>>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>>>
>>> That's correct, and we have that leak in various places in our codebase,
>>> but per the above side-thread I think this is primarily exposing that
>>> we're setting up the "struct rev_info" with your change when we don't
>>> need to. Why can't we just skip it?
>>
>> I have no idea how to stop get_object_list() from using struct rev_info.
>> We could let it take a struct list_objects_filter_options pointer
>> instead and have it build a struct rev_info internally, but that would
>> just move the problem, not solve it.
>
> I mean skip it when it's not needed, it's needed when we call
> get_object_list().
>
> But what "problem" is being caused by get_object_list()? That there's
> some other case(s) where it'll leak still? I haven't checked, I think we
> should leave that for some other time if there's such leaks, and just
> not introduce any new leaks in this topic.

The problem is "How to use struct rev_info without leaks?".  No matter
where you move it, the leak will be present until the TODO in
release_revisions() is done.

>
>>> Yeah, if we do set it up we'll run into an outstanding leak, and that
>>> should also be fixed (I have some local patches...), but the other cases
>>> I know of where we'll leak that data is where we're actually using the
>>> "struct rev_info".
>>>
>>> I haven't tried tearing your change apart to poke at it myself, and
>>> maybe there's some really good reason for why you can't separate getting
>>> rid of the J.5.7 dependency and removing the lazy-init.
>>>
>>>> The patch below plugs it locally.
>>>>
>>>> --- >8 ---
>>>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>>>
>>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>>> ---
>>>>  builtin/pack-objects.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>>>> index 3e74fbb0cd..a47a3f0fba 100644
>>>> --- a/builtin/pack-objects.c
>>>> +++ b/builtin/pack-objects.c
>>>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>>>  	} else {
>>>>  		get_object_list(&revs, rp.nr, rp.v);
>>>>  	}
>>>> +	diff_free(&revs.diffopt);
>>>>  	release_revisions(&revs);
>>>>  	cleanup_preferred_base();
>>>>  	if (include_tag && nr_result)
>>>
>>> So, the main motivation for the change was paranoia that a compiler or
>>> platform might show up without J.5.7 support and that would bite us, but
>>> we're now adding a double-free-in-waiting?
>>>
>>> I think we're both a bit paranoid, but clearly have different
>>> paranoia-priorities :)
>>>
>>> If we do end up with some hack like this instead of fixing the
>>> underlying problem I'd much prefer that such a hack just be an UNLEAK()
>>> here.
>>>
>>> I.e. we have a destructor for "revs.*" already, let's not bypass it and
>>> start freeing things from under it, which will result in a double-free
>>> if we forget this callsite once the TODO in 54c8a7c379 is addressed.
>>
>> Well, that TODO fix should remove this new diff_free() call, but I
>> agree that this is fragile.
>>
>> Removing the "TEST_PASSES_SANITIZE_LEAK=true" line from affected tests
>> is probably better.
>
> Or just not introduce new leaks, per my suggested fix-up at
> https://lore.kernel.org/git/221128.86zgcbl0pe.gmgdl@evledraar.gmail.com/
> (which it looks like you haven't seen when this E-Mail is composed...).

Not adding leaks is a good idea.  AFAICS none of my patches so far add
any.  Patch 3 of v2 exposes an existing one that was only triggered by
the --filter option before.  Which is also not ideal, of course, but
giving it more visibility hopefully will motivate a proper fix.

>>> As you'd see if you made release_revisions() simply call
>>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
>>> cases.
>>
>> That was my first attempt; it breaks lots of tests due to double frees.
>
> Right, to be clear I'm saying that none of this is needed right now,
> i.e. I don't get why we'd want the scope-creep past the hunk I noted in
> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
> for the --filter bug fix (plus the tests you're adding).

Well, you asked to squash the minimal fix into the laziness removal in
https://lore.kernel.org/git/221112.86bkpcmm6i.gmgdl@evledraar.gmail.com/

Reverting 5cb28270a1 (pack-objects: lazily set up "struct rev_info",
don't leak, 2022-03-28) wholesale is the simplest way to reach the goals
of regression fix, simplification and standard compliance.  Except that
the leak check tests have come to depend on the leak being hidden in the
--filter corner.  So going back to v1 sure seems attractive.

>
>>> I haven't dug into this one, but offhand I'm not confident in saying
>>> that this isn't exposing us to some aspect of that gnarlyness (maybe
>>> not, it's been a while since I looked).
>>
>> I saw it as the way towards a release_revisions() that calls diff_free()
>> itself: Add such calls to each of them, fix the "gnarlyness"
>> individually, finally move them all into release_revisions().  The only
>> problem is that there are 60+ callsites.
>
> I think this is a really bad approach in general.
>
> Yes, it may happen to work to free() some data from under an API, but
> it's just as likely that we'll miss that this one caller is screwing
> with its internal state, and e.g. when some new revision.c code is used
> it'll go boom.
>
> If we wanted to phase in such a free() of "foo" I think the right way
> would be to add some "revs.free_foo = 1" flag, giving the API a chance
> to treat that sanely, not to start poking at members of the struct, and
> assuming that its release() won't be free()-ing them.

And that's why you added no_free to struct diff_options.  We could use
it here by setting it in repo_init_revisions() and unsetting in
cmd_pack_objects() and elsewhere, until it is set everywhere.

> But as noted above & in the linked I think we can defer all of that. The
> only reason we're discussing this is because you're changing the
> lazy-init to be not-lazy, and introducing new leaks as a result.>
> I've shown a couple of approaches in this thread of fixing the issue(s)
> at hand without introducing such leaks, so ...

As noted above: These leaks are not new, they are just moved into
test coverage.

René

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 15:56               ` René Scharfe
@ 2022-11-28 17:57                 ` René Scharfe
  2022-11-28 18:32                   ` Ævar Arnfjörð Bjarmason
  2022-11-28 17:57                 ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 36+ messages in thread
From: René Scharfe @ 2022-11-28 17:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King

Am 28.11.2022 um 16:56 schrieb René Scharfe:>
> The problem is "How to use struct rev_info without leaks?".  No matter
> where you move it, the leak will be present until the TODO in
> release_revisions() is done.

Wrong.  The following sequence leaks:

	struct rev_info revs;
	repo_init_revisions(the_repository, &revs, NULL);
	release_revisions(&revs);

... and this here doesn't:

	struct rev_info revs;
	repo_init_revisions(the_repository, &revs, NULL);
	setup_revisions(0, NULL, &revs, NULL);  // leak plugger
	release_revisions(&revs);

That's because setup_revisions() calls diff_setup_done(), which frees
revs->diffopt.parseopts, and release_revisions() doesn't.

And since builtin/pack-objects.c::get_object_list() calls
setup_revisions(), it really frees that memory, as you claimed from the
start.  Sorry, I was somehow assuming that a setup function wouldn't
clean up.  D'oh!

The first sequence is used in some other places. e.g. builtin/prune.c.
But there LeakSanitizer doesn't complain for some reason; Valgrind
reports the parseopts allocation as "possibly lost".

I still think the assumption that "init_x(x); release_x(x);" doesn't
leak is reasonable.  Let's make it true.  How about this?  It's safe
in the sense that we don't risk double frees and it's close to the
TODO comment so we probably won't forget removing it once diff_free()
becomes used.

---
 revision.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/revision.c b/revision.c
index 439e34a7c5..6a51ef9418 100644
--- a/revision.c
+++ b/revision.c
@@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs)
 	release_revisions_mailmap(revs->mailmap);
 	free_grep_patterns(&revs->grep_filter);
 	/* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */
+	FREE_AND_NULL(revs->diffopt.parseopts);
 	diff_free(&revs->pruning);
 	reflog_walk_info_release(revs->reflog_info);
 	release_revisions_topo_walk_info(revs->topo_walk_info);
--
2.30.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 15:56               ` René Scharfe
  2022-11-28 17:57                 ` René Scharfe
@ 2022-11-28 17:57                 ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-28 17:57 UTC (permalink / raw)
  To: René Scharfe
  Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King


On Mon, Nov 28 2022, René Scharfe wrote:

> Am 28.11.2022 um 15:34 schrieb Ævar Arnfjörð Bjarmason:
> [...]
>> I mean skip it when it's not needed, it's needed when we call
>> get_object_list().
>>
>> But what "problem" is being caused by get_object_list()? That there's
>> some other case(s) where it'll leak still? I haven't checked, I think we
>> should leave that for some other time if there's such leaks, and just
>> not introduce any new leaks in this topic.
>
> The problem is "How to use struct rev_info without leaks?".  No matter
> where you move it, the leak will be present until the TODO in
> release_revisions() is done.

No, the problem at hand is that "--filter" doesn't accept N arguments,
which is easily fixed.

Junio then tested the patch, and noted CI failing. Yes we have various
outstanding issues with the revisions API etc. leaking, but the
scope-creep of conflating those questions with a regression fix doesn't
seem necessary.

>>> [...]
>>> Well, that TODO fix should remove this new diff_free() call, but I
>>> agree that this is fragile.
>>>
>>> Removing the "TEST_PASSES_SANITIZE_LEAK=true" line from affected tests
>>> is probably better.
>>
>> Or just not introduce new leaks, per my suggested fix-up at
>> https://lore.kernel.org/git/221128.86zgcbl0pe.gmgdl@evledraar.gmail.com/
>> (which it looks like you haven't seen when this E-Mail is composed...).
>
> Not adding leaks is a good idea.  AFAICS none of my patches so far add
> any.

I don't see where debating what constitutes a "new" or "added" memory
leak is going to get us.

But to clarify I'm not saying you're responsible for the
revisions/filter APIs you're running into, and some of those are
existing leaks.

I'm also saying that if you run:

	make SANITIZE=leak test T=...

For some values of T=... (and the whole test suite with
"GIT_TEST_PASSING_SANITIZE_LEAK=true") your 2/3 passes certain tests,
and your 3/3 doesn't.

> Patch 3 of v2 exposes an existing one that was only triggered by
> the --filter option before.  Which is also not ideal, of course, but
> giving it more visibility hopefully will motivate a proper fix.

We try not to break CI. Only finding out that existing CI breaks once a
patch is queued & Junio & others start testing it to hunt down issues
isn't a good use of anyone's time.

So, maybe you think this area of CI is useless etc., but again, if
you're submitting a change to advocate for that can we peel that away
from a regression we can fix relatively easily?

>>>> As you'd see if you made release_revisions() simply call
>>>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
>>>> cases.
>>>
>>> That was my first attempt; it breaks lots of tests due to double frees.
>>
>> Right, to be clear I'm saying that none of this is needed right now,
>> i.e. I don't get why we'd want the scope-creep past the hunk I noted in
>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>> for the --filter bug fix (plus the tests you're adding).
>
> Well, you asked to squash the minimal fix into the laziness removal in
> https://lore.kernel.org/git/221112.86bkpcmm6i.gmgdl@evledraar.gmail.com/

The second part of the relevant request being so that we could
regression test the memory leak(s).

So, thanks for trying to address that in some way, but proceeding to
break linux-leaks, and then arguing that that's OK seems like an odd way
to go about that...

> Reverting 5cb28270a1 (pack-objects: lazily set up "struct rev_info",
> don't leak, 2022-03-28) wholesale is the simplest way to reach the goals
> of regression fix, simplification and standard compliance.  Except that
> the leak check tests have come to depend on the leak being hidden in the
> --filter corner.  So going back to v1 sure seems attractive.

Yeah.

Honestly I'd mostly forgotten about the v1 review, and only skimmed
things then. So, I've just noticed that the "squash" I'm suggesting is
basically equivalent to your v1 now.

But yeah I think it's fine to just skip testing the regression fix with
SANITIZE=leak for now, we can do it some other time.

I just brought it up in the v1 feedback because that patch prominently
notes the leak fix, so if we can get regression test it relatively
easily that seemed like a good change to make.

>>> [...]
>>> I saw it as the way towards a release_revisions() that calls diff_free()
>>> itself: Add such calls to each of them, fix the "gnarlyness"
>>> individually, finally move them all into release_revisions().  The only
>>> problem is that there are 60+ callsites.
>>
>> I think this is a really bad approach in general.
>>
>> Yes, it may happen to work to free() some data from under an API, but
>> it's just as likely that we'll miss that this one caller is screwing
>> with its internal state, and e.g. when some new revision.c code is used
>> it'll go boom.
>>
>> If we wanted to phase in such a free() of "foo" I think the right way
>> would be to add some "revs.free_foo = 1" flag, giving the API a chance
>> to treat that sanely, not to start poking at members of the struct, and
>> assuming that its release() won't be free()-ing them.
>
> And that's why you added no_free to struct diff_options.  We could use
> it here by setting it in repo_init_revisions() and unsetting in
> cmd_pack_objects() and elsewhere, until it is set everywhere.

FWIW I think the "no_free" isn't all that good of an approach in
retrospect either, but at least it's (mostly) self-contained in the
struct that owns it.

But it's rather messy, because some of the time it's embedded in a
"struct rev_info" that should really own it, and sometimes now. At the
time release_revisions() didn't exist yet, so making some forward
progress with the diff leaks was easiest with the "no_free".

>> But as noted above & in the linked I think we can defer all of that. The
>> only reason we're discussing this is because you're changing the
>> lazy-init to be not-lazy, and introducing new leaks as a result.>
>> I've shown a couple of approaches in this thread of fixing the issue(s)
>> at hand without introducing such leaks, so ...
>
> As noted above: These leaks are not new, they are just moved into
> test coverage.

Right, and in some cases we should definitely just un-mark a test as
being tested under linux-leaks to make forward progress.

But in this case I just don't see how that's a good trade-off. We can
fix the --filter bug without that, and yes we'll have some new/existing
leaks, but those are all contained in tests that are not tested by
linux-leaks now.

Whereas the larger rewrite to make the init non-lazy would lose us test
coverage.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 17:57                 ` René Scharfe
@ 2022-11-28 18:32                   ` Ævar Arnfjörð Bjarmason
  2022-11-28 21:57                     ` René Scharfe
  0 siblings, 1 reply; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-28 18:32 UTC (permalink / raw)
  To: René Scharfe
  Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King


On Mon, Nov 28 2022, René Scharfe wrote:

> Am 28.11.2022 um 16:56 schrieb René Scharfe:>
>> The problem is "How to use struct rev_info without leaks?".  No matter
>> where you move it, the leak will be present until the TODO in
>> release_revisions() is done.
>
> Wrong.  The following sequence leaks:
>
> 	struct rev_info revs;
> 	repo_init_revisions(the_repository, &revs, NULL);
> 	release_revisions(&revs);
>
> ... and this here doesn't:
>
> 	struct rev_info revs;
> 	repo_init_revisions(the_repository, &revs, NULL);
> 	setup_revisions(0, NULL, &revs, NULL);  // leak plugger
> 	release_revisions(&revs);
>
> That's because setup_revisions() calls diff_setup_done(), which frees
> revs->diffopt.parseopts, and release_revisions() doesn't.
>
> And since builtin/pack-objects.c::get_object_list() calls
> setup_revisions(), it really frees that memory, as you claimed from the
> start.  Sorry, I was somehow assuming that a setup function wouldn't
> clean up.  D'oh!
>
> The first sequence is used in some other places. e.g. builtin/prune.c.
> But there LeakSanitizer doesn't complain for some reason; Valgrind
> reports the parseopts allocation as "possibly lost".

Yes, some of the interactions are tricky. It's really useful to run the
tests with GIT_TEST_PASSING_SANITIZE_LEAK=[true|check] (see t/README) to
check these sorts of assumptions for sanity.

> I still think the assumption that "init_x(x); release_x(x);" doesn't
> leak is reasonable.  Let's make it true.  How about this?  It's safe
> in the sense that we don't risk double frees and it's close to the
> TODO comment so we probably won't forget removing it once diff_free()
> becomes used.
>
> ---
>  revision.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/revision.c b/revision.c
> index 439e34a7c5..6a51ef9418 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs)
>  	release_revisions_mailmap(revs->mailmap);
>  	free_grep_patterns(&revs->grep_filter);
>  	/* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */
> +	FREE_AND_NULL(revs->diffopt.parseopts);
>  	diff_free(&revs->pruning);
>  	reflog_walk_info_release(revs->reflog_info);
>  	release_revisions_topo_walk_info(revs->topo_walk_info);

At this point I'm unclear on what & why this is needed? I.e. once we
narrowly fix the >1 "--filter" options what still fails?

But in general: I don't really think this sort of thing is worth
it. Here we're reaching into a member of "revs->diffopt" behind its back
rather than calling diff_free(). I think we should just focus on being
able to do do that safely.

WIP patches I have in that direction, partially based on your previous
"is_dead" suggestion:

	https://github.com/avar/git/commit/e02a15f6206
	https://github.com/avar/git/commit/c718f36566a

I haven't poked at that in a while, I think the only outstanding issue
with it is that fclose() interaction.

I think for this particular thing there aren't going to be any bad
side-effects in practice, but I also think convincing oneself of that
basically means putting the same amount of work in as just fixing some
of these properly.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
  2022-11-28 18:32                   ` Ævar Arnfjörð Bjarmason
@ 2022-11-28 21:57                     ` René Scharfe
  0 siblings, 0 replies; 36+ messages in thread
From: René Scharfe @ 2022-11-28 21:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King

Am 28.11.2022 um 19:32 schrieb Ævar Arnfjörð Bjarmason:
>
> On Mon, Nov 28 2022, René Scharfe wrote:
>
>> Am 28.11.2022 um 16:56 schrieb René Scharfe:>
>>> The problem is "How to use struct rev_info without leaks?".  No matter
>>> where you move it, the leak will be present until the TODO in
>>> release_revisions() is done.
>>
>> Wrong.  The following sequence leaks:
>>
>> 	struct rev_info revs;
>> 	repo_init_revisions(the_repository, &revs, NULL);
>> 	release_revisions(&revs);
>>
>> ... and this here doesn't:
>>
>> 	struct rev_info revs;
>> 	repo_init_revisions(the_repository, &revs, NULL);
>> 	setup_revisions(0, NULL, &revs, NULL);  // leak plugger
>> 	release_revisions(&revs);
>>
>> That's because setup_revisions() calls diff_setup_done(), which frees
>> revs->diffopt.parseopts, and release_revisions() doesn't.
>>
>> And since builtin/pack-objects.c::get_object_list() calls
>> setup_revisions(), it really frees that memory, as you claimed from the
>> start.  Sorry, I was somehow assuming that a setup function wouldn't
>> clean up.  D'oh!
>>
>> The first sequence is used in some other places. e.g. builtin/prune.c.
>> But there LeakSanitizer doesn't complain for some reason; Valgrind
>> reports the parseopts allocation as "possibly lost".
>
> Yes, some of the interactions are tricky. It's really useful to run the
> tests with GIT_TEST_PASSING_SANITIZE_LEAK=[true|check] (see t/README) to
> check these sorts of assumptions for sanity.

That may be true, and looks even useful -- I didn't know the check
value.  I only get a strange error message, though:

   $ GIT_TEST_PASSING_SANITIZE_LEAK=check ./t0001-init.sh
   Bail out! GIT_TEST_PASSING_SANITIZE_LEAK=true has no effect except when compiled with SANITIZE=leak

Same with make test and prove, of course.  And of course I compiled
with SANITIZE=leak beforehand.

But I don't see a connection between my comment and yours.  I was
not running any tests, just the above sequences of function calls,
e.g. in git prune.

>
>> I still think the assumption that "init_x(x); release_x(x);" doesn't
>> leak is reasonable.  Let's make it true.  How about this?  It's safe
>> in the sense that we don't risk double frees and it's close to the
>> TODO comment so we probably won't forget removing it once diff_free()
>> becomes used.
>>
>> ---
>>  revision.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/revision.c b/revision.c
>> index 439e34a7c5..6a51ef9418 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs)
>>  	release_revisions_mailmap(revs->mailmap);
>>  	free_grep_patterns(&revs->grep_filter);
>>  	/* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */
>> +	FREE_AND_NULL(revs->diffopt.parseopts);
>>  	diff_free(&revs->pruning);
>>  	reflog_walk_info_release(revs->reflog_info);
>>  	release_revisions_topo_walk_info(revs->topo_walk_info);
>
> At this point I'm unclear on what & why this is needed? I.e. once we
> narrowly fix the >1 "--filter" options what still fails?

As I wrote: A call to an initialization function followed by a call to a
cleanup function and nothing else shouldn't leak.  There are examples of
repo_init_revisions()+release_revisions() without setup_revisions() or
diff_setup_done() beyond pack-objects.  I mentioned prune, but there are
more, e.g. in sequencer.c.

> But in general: I don't really think this sort of thing is worth
> it. Here we're reaching into a member of "revs->diffopt" behind its back
> rather than calling diff_free(). I think we should just focus on being
> able to do do that safely.

Sure, but the FREE_AND_NULL call is simple and safe, while diff_free()
is complicated and calling it one time too many can hurt.

> WIP patches I have in that direction, partially based on your previous
> "is_dead" suggestion:
>
> 	https://github.com/avar/git/commit/e02a15f6206
> 	https://github.com/avar/git/commit/c718f36566a

Copy-typed the interesting parts of the first patch like a medieval monk
because there doesn't seem to be a download option. :-|

> I haven't poked at that in a while, I think the only outstanding issue
> with it is that fclose() interaction.

You mean the t3702-add-edit.sh failure on Windows mentioned in the
commit message of e02a15f6206?  That's caused by the file being kept
open and thus locked during the call of the editor.  Moving the
release_revisions() call in builtin/add.c::edit_patch() before the
launch_editor() call fixes that by closing the file.

> I think for this particular thing there aren't going to be any bad
> side-effects in practice, but I also think convincing oneself of that
> basically means putting the same amount of work in as just fixing some
> of these properly.

Not to me, but perhaps that TODO is easier solved that I expected.
In any case, with the mentioned edit_patch() change described above
e02a15f6206 passes the test suite on Windows for me.

René

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2022-11-28 21:57 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-12 10:42 [PATCH 0/3] pack-objects: fix and simplify --filter handling René Scharfe
2022-11-12 10:44 ` [PATCH 1/3] pack-objects: fix handling of multiple --filter options René Scharfe
2022-11-12 11:41   ` Ævar Arnfjörð Bjarmason
2022-11-13 17:31     ` René Scharfe
2022-11-12 16:58   ` Jeff King
2022-11-13  5:01     ` Taylor Blau
2022-11-13 16:44       ` Jeff King
2022-11-13 17:31       ` René Scharfe
2022-11-12 10:44 ` [PATCH 2/3] pack-object: simplify --filter handling René Scharfe
2022-11-12 11:45   ` Ævar Arnfjörð Bjarmason
2022-11-12 17:02   ` Jeff King
2022-11-13 16:49     ` Jeff King
2022-11-13 17:31     ` René Scharfe
2022-11-12 10:46 ` [PATCH 3/3] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() René Scharfe
2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe
2022-11-20 10:06   ` [PATCH v2 1/3] t5317: stop losing return codes of git ls-files René Scharfe
2022-11-20 10:07   ` [PATCH v2 2/3] t5317: demonstrate failure to handle multiple --filter options René Scharfe
2022-11-20 10:13   ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe
2022-11-28 10:03     ` Junio C Hamano
2022-11-28 11:12       ` Ævar Arnfjörð Bjarmason
2022-11-28 12:00         ` [PATCH] t5314: check exit code of "rev-parse" Ævar Arnfjörð Bjarmason
2022-11-28 13:51           ` René Scharfe
2022-11-28 14:18           ` [PATCH v2] t5314: check exit code of "git" Ævar Arnfjörð Bjarmason
2022-11-28 11:26       ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe
2022-11-28 11:31         ` Ævar Arnfjörð Bjarmason
2022-11-28 12:24           ` Ævar Arnfjörð Bjarmason
2022-11-28 15:16             ` René Scharfe
2022-11-28 15:27               ` Ævar Arnfjörð Bjarmason
2022-11-28 14:29           ` René Scharfe
2022-11-28 14:34             ` Ævar Arnfjörð Bjarmason
2022-11-28 15:56               ` René Scharfe
2022-11-28 17:57                 ` René Scharfe
2022-11-28 18:32                   ` Ævar Arnfjörð Bjarmason
2022-11-28 21:57                     ` René Scharfe
2022-11-28 17:57                 ` Ævar Arnfjörð Bjarmason
2022-11-22 19:02   ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).