* [PATCH 0/3] pack-objects: fix and simplify --filter handling @ 2022-11-12 10:42 René Scharfe 2022-11-12 10:44 ` [PATCH 1/3] pack-objects: fix handling of multiple --filter options René Scharfe ` (4 more replies) 0 siblings, 5 replies; 52+ messages in thread From: René Scharfe @ 2022-11-12 10:42 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder Fix a regression that prevents using multiple --filter options, simplify the option parsing code and avoid relying on undefined behavior in it. Effectively almost reverts 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28), which introduced these shortcomings. Patch 2 conflicts with cc/filtered-repack in seen, but not semantically. pack-objects: fix handling of multiple --filter options pack-object: simplify --filter handling list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() builtin/pack-objects.c | 27 ++++++-------------------- list-objects-filter-options.c | 4 ---- list-objects-filter-options.h | 18 ++--------------- t/t5317-pack-objects-filter-objects.sh | 19 ++++++++++++++++++ 4 files changed, 27 insertions(+), 41 deletions(-) -- 2.38.1 ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH 1/3] pack-objects: fix handling of multiple --filter options 2022-11-12 10:42 [PATCH 0/3] pack-objects: fix and simplify --filter handling René Scharfe @ 2022-11-12 10:44 ` René Scharfe 2022-11-12 11:41 ` Ævar Arnfjörð Bjarmason 2022-11-12 16:58 ` Jeff King 2022-11-12 10:44 ` [PATCH 2/3] pack-object: simplify --filter handling René Scharfe ` (3 subsequent siblings) 4 siblings, 2 replies; 52+ messages in thread From: René Scharfe @ 2022-11-12 10:44 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28) --filter options given to git pack-objects overrule earlier ones, letting only the leftmost win and leaking the memory allocated for earlier ones. Fix that by only initializing the rev_info struct once. Signed-off-by: René Scharfe <l.s.r@web.de> --- builtin/pack-objects.c | 3 ++- t/t5317-pack-objects-filter-objects.sh | 19 +++++++++++++++++++ 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 573d0b20b7..c702c09dd4 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value) { struct po_filter_data *data = value; - repo_init_revisions(the_repository, &data->revs, NULL); + if (!data->have_revs) + repo_init_revisions(the_repository, &data->revs, NULL); data->have_revs = 1; return &data->revs.filter; diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh index bb633c9b09..bd8983bb56 100755 --- a/t/t5317-pack-objects-filter-objects.sh +++ b/t/t5317-pack-objects-filter-objects.sh @@ -178,6 +178,25 @@ test_expect_success 'verify blob:limit=1001' ' test_cmp expected observed ' +test_expect_success 'verify blob:limit=1001+object:type=blob' ' + git -C r2 ls-files -s large.1000 | + test_parse_ls_files_stage_oids | + sort >expected && + + git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 \ + --filter=object:type=blob >filter.pack <<-EOF && + HEAD + EOF + git -C r2 index-pack ../filter.pack && + + git -C r2 verify-pack -v ../filter.pack >verify_result && + grep blob verify_result | + parse_verify_pack_blob_oid | + sort >observed && + + test_cmp expected observed +' + test_expect_success 'verify blob:limit=10001' ' git -C r2 ls-files -s large.1000 large.10000 | test_parse_ls_files_stage_oids | -- 2.38.1 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH 1/3] pack-objects: fix handling of multiple --filter options 2022-11-12 10:44 ` [PATCH 1/3] pack-objects: fix handling of multiple --filter options René Scharfe @ 2022-11-12 11:41 ` Ævar Arnfjörð Bjarmason 2022-11-13 17:31 ` René Scharfe 2022-11-12 16:58 ` Jeff King 1 sibling, 1 reply; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-12 11:41 UTC (permalink / raw) To: René Scharfe; +Cc: Git List, Junio C Hamano, Taylor Blau, Christian Couder On Sat, Nov 12 2022, René Scharfe wrote: > Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't > leak, 2022-03-28) --filter options given to git pack-objects overrule > earlier ones, letting only the leftmost win and leaking the memory > allocated for earlier ones. Fix that by only initializing the rev_info > struct once. If I do e.g. this with SANITIZE=leak: echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects --revs --filter=blob:limit=1001 --filter=object:type=blob --stdout >/dev/null I see one leak that wasn't there, but two that are gone now. I haven't looked into it, but I think the commit message should discuss what leaks are fixed, which remain/are new etc. > @@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value) > { > struct po_filter_data *data = value; > > - repo_init_revisions(the_repository, &data->revs, NULL); > + if (!data->have_revs) > + repo_init_revisions(the_repository, &data->revs, NULL); > data->have_revs = 1; > > return &data->revs.filter; FWIW as this goes away in your 2/3 I think just squashing the two with a leak fix would be nice, if... > diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh > index bb633c9b09..bd8983bb56 100755 > --- a/t/t5317-pack-objects-filter-objects.sh > +++ b/t/t5317-pack-objects-filter-objects.sh > @@ -178,6 +178,25 @@ test_expect_success 'verify blob:limit=1001' ' > test_cmp expected observed > ' > > +test_expect_success 'verify blob:limit=1001+object:type=blob' ' > + git -C r2 ls-files -s large.1000 | Aside: Should do "git >tmp && test_parse... <tmp", we lose the exit code of "ls-files" here. > + test_parse_ls_files_stage_oids | > + sort >expected && > + > + git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 \ > + --filter=object:type=blob >filter.pack <<-EOF && > + HEAD > + EOF > + git -C r2 index-pack ../filter.pack && > + > + git -C r2 verify-pack -v ../filter.pack >verify_result && > + grep blob verify_result | > + parse_verify_pack_blob_oid | Whereas this one's not a problem, no "git". > + sort >observed && > + > + test_cmp expected observed Aside: It would be nice if we had a "test_cmp_sort", but some other day... > +' > + > test_expect_success 'verify blob:limit=10001' ' > git -C r2 ls-files -s large.1000 large.10000 | > test_parse_ls_files_stage_oids | ...we can test it, but this test is in a file that's not part of "linux-leaks". If that one leak I mentioned above can be fixed (or maybe it's not new?) this could be tested if we put it in a file of its own with TEST_PASSES_SANITIZE_LEAK=true. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/3] pack-objects: fix handling of multiple --filter options 2022-11-12 11:41 ` Ævar Arnfjörð Bjarmason @ 2022-11-13 17:31 ` René Scharfe 0 siblings, 0 replies; 52+ messages in thread From: René Scharfe @ 2022-11-13 17:31 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Git List, Junio C Hamano, Taylor Blau, Christian Couder Am 12.11.22 um 12:41 schrieb Ævar Arnfjörð Bjarmason: > > On Sat, Nov 12 2022, René Scharfe wrote: > >> Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't >> leak, 2022-03-28) --filter options given to git pack-objects overrule >> earlier ones, letting only the leftmost win and leaking the memory >> allocated for earlier ones. Fix that by only initializing the rev_info >> struct once. > > If I do e.g. this with SANITIZE=leak: > > echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects --revs --filter=blob:limit=1001 --filter=object:type=blob --stdout >/dev/null > > I see one leak that wasn't there, but two that are gone now. I haven't > looked into it, but I think the commit message should discuss what leaks > are fixed, which remain/are new etc. The leak is insubstantial; I mentioned it just because of the irony. It is caused by initializing an already initialized struct rev_info without releasing it in between, as mentioned in the commit message. .filter_data allocated by filter_combine__init() is not released by filter_combine__init() in list-objects-filter.c. Plugging that leak allows your example command to run with SANITIZE=leak. That is a matter for a separate patch, though. > >> @@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value) >> { >> struct po_filter_data *data = value; >> >> - repo_init_revisions(the_repository, &data->revs, NULL); >> + if (!data->have_revs) >> + repo_init_revisions(the_repository, &data->revs, NULL); >> data->have_revs = 1; >> >> return &data->revs.filter; > > FWIW as this goes away in your 2/3 I think just squashing the two with a > leak fix would be nice, if... > >> diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh >> index bb633c9b09..bd8983bb56 100755 >> --- a/t/t5317-pack-objects-filter-objects.sh >> +++ b/t/t5317-pack-objects-filter-objects.sh >> @@ -178,6 +178,25 @@ test_expect_success 'verify blob:limit=1001' ' >> test_cmp expected observed >> ' >> >> +test_expect_success 'verify blob:limit=1001+object:type=blob' ' >> + git -C r2 ls-files -s large.1000 | > > Aside: Should do "git >tmp && test_parse... <tmp", we lose the exit code > of "ls-files" here. OK. Copied that line from the surrounding tests. They used temporary files before fb2d0db502 (test-lib-functions: add parsing helpers for ls-files and ls-tree, 2022-04-04). > >> + test_parse_ls_files_stage_oids | >> + sort >expected && >> + >> + git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 \ >> + --filter=object:type=blob >filter.pack <<-EOF && >> + HEAD >> + EOF >> + git -C r2 index-pack ../filter.pack && >> + >> + git -C r2 verify-pack -v ../filter.pack >verify_result && >> + grep blob verify_result | >> + parse_verify_pack_blob_oid | > > Whereas this one's not a problem, no "git". > >> + sort >observed && >> + >> + test_cmp expected observed > > Aside: It would be nice if we had a "test_cmp_sort", but some other day... > >> +' >> + >> test_expect_success 'verify blob:limit=10001' ' >> git -C r2 ls-files -s large.1000 large.10000 | >> test_parse_ls_files_stage_oids | > > ...we can test it, but this test is in a file that's not part of "linux-leaks". > > If that one leak I mentioned above can be fixed (or maybe it's not new?) > this could be tested if we put it in a file of its own with > TEST_PASSES_SANITIZE_LEAK=true. Plugging the leak in your example command is not enough to make t5317 leak free. René ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/3] pack-objects: fix handling of multiple --filter options 2022-11-12 10:44 ` [PATCH 1/3] pack-objects: fix handling of multiple --filter options René Scharfe 2022-11-12 11:41 ` Ævar Arnfjörð Bjarmason @ 2022-11-12 16:58 ` Jeff King 2022-11-13 5:01 ` Taylor Blau 1 sibling, 1 reply; 52+ messages in thread From: Jeff King @ 2022-11-12 16:58 UTC (permalink / raw) To: René Scharfe Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder On Sat, Nov 12, 2022 at 11:44:00AM +0100, René Scharfe wrote: > Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't > leak, 2022-03-28) --filter options given to git pack-objects overrule > earlier ones, letting only the leftmost win and leaking the memory > allocated for earlier ones. Fix that by only initializing the rev_info > struct once. Yikes. I wondered how we managed to miss this in the tests. Surely we have some examples that use multiple filters? I think the answer is that we do in t5616, but in practice they didn't hit this bug because of the way filter-specs are treated server-sid3. Even if we say "clone --filter=foo --filter=bar" on the client, the server upload-pack coalesces that into a single "combine:foo+bar" string, and that's what its pack-objects sees. Your test here triggers it by invoking pack-objects directly, which makes sense. That's a little different from how pack-objects is invoked by the rest of Git in practice, but it's definitely something that _should_ work. It's conceivable that somebody outside of Git could be driving pack-objects directly, but also it's just a trap waiting to spring for future developers. -Peff ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/3] pack-objects: fix handling of multiple --filter options 2022-11-12 16:58 ` Jeff King @ 2022-11-13 5:01 ` Taylor Blau 2022-11-13 16:44 ` Jeff King 2022-11-13 17:31 ` René Scharfe 0 siblings, 2 replies; 52+ messages in thread From: Taylor Blau @ 2022-11-13 5:01 UTC (permalink / raw) To: Jeff King Cc: René Scharfe, Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder, Jonathan Tan On Sat, Nov 12, 2022 at 11:58:44AM -0500, Jeff King wrote: > On Sat, Nov 12, 2022 at 11:44:00AM +0100, René Scharfe wrote: > > > Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't > > leak, 2022-03-28) --filter options given to git pack-objects overrule > > earlier ones, letting only the leftmost win and leaking the memory > > allocated for earlier ones. Fix that by only initializing the rev_info > > struct once. > > Yikes. I wondered how we managed to miss this in the tests. Surely we > have some examples that use multiple filters? > > I think the answer is that we do in t5616, but in practice they didn't > hit this bug because of the way filter-specs are treated server-sid3. > Even if we say "clone --filter=foo --filter=bar" on the client, the > server upload-pack coalesces that into a single "combine:foo+bar" > string, and that's what its pack-objects sees. ...Or in other words, clients aren't broken here because we always send "combine" filters when multiple `--filter` arguments are passed to `git clone`, `git fetch`, etc. Is that always the case? I *think* so, but it would be nice to know if there was a way that clients would not send the `combine` filter with >1 filter. Thanks, Taylor ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/3] pack-objects: fix handling of multiple --filter options 2022-11-13 5:01 ` Taylor Blau @ 2022-11-13 16:44 ` Jeff King 2022-11-13 17:31 ` René Scharfe 1 sibling, 0 replies; 52+ messages in thread From: Jeff King @ 2022-11-13 16:44 UTC (permalink / raw) To: Taylor Blau Cc: René Scharfe, Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano, Christian Couder, Jonathan Tan On Sun, Nov 13, 2022 at 12:01:51AM -0500, Taylor Blau wrote: > > Yikes. I wondered how we managed to miss this in the tests. Surely we > > have some examples that use multiple filters? > > > > I think the answer is that we do in t5616, but in practice they didn't > > hit this bug because of the way filter-specs are treated server-sid3. > > Even if we say "clone --filter=foo --filter=bar" on the client, the > > server upload-pack coalesces that into a single "combine:foo+bar" > > string, and that's what its pack-objects sees. > > ...Or in other words, clients aren't broken here because we always send > "combine" filters when multiple `--filter` arguments are passed to `git > clone`, `git fetch`, etc. Right. > Is that always the case? I *think* so, but it would be nice to know if > there was a way that clients would not send the `combine` filter with >1 > filter. I think it's always the case at least with Git's code, because the client-side send_filter() sends from list_objects_filter_spec(), which takes the single-string spec of the top-level combine filter. It would have to explicitly iterate over the sub-filters to come up with anything else, and I don't think we even have a function for that (we do when evaluating them, of course, but not for generically iterating or producing a multi-string set of specs). And on the server side, even if a client did send multiple "filter" lines, I think we'd hit list_objects_filter_die_if_populated(), so we'd complain. I don't know of any other code paths that could hit this (can you filter on an outgoing push?). But in the end I'm not sure it even matters. We should be fixing the bug anyway, and this is all just understanding how far-reaching its effects are (and the answer seems to be: not very). -Peff ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/3] pack-objects: fix handling of multiple --filter options 2022-11-13 5:01 ` Taylor Blau 2022-11-13 16:44 ` Jeff King @ 2022-11-13 17:31 ` René Scharfe 1 sibling, 0 replies; 52+ messages in thread From: René Scharfe @ 2022-11-13 17:31 UTC (permalink / raw) To: Taylor Blau, Jeff King Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano, Christian Couder, Jonathan Tan Am 13.11.22 um 06:01 schrieb Taylor Blau: > On Sat, Nov 12, 2022 at 11:58:44AM -0500, Jeff King wrote: >> On Sat, Nov 12, 2022 at 11:44:00AM +0100, René Scharfe wrote: >> >>> Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't >>> leak, 2022-03-28) --filter options given to git pack-objects overrule >>> earlier ones, letting only the leftmost win and leaking the memory >>> allocated for earlier ones. Fix that by only initializing the rev_info >>> struct once. >> >> Yikes. I wondered how we managed to miss this in the tests. Surely we >> have some examples that use multiple filters? >> >> I think the answer is that we do in t5616, but in practice they didn't >> hit this bug because of the way filter-specs are treated server-sid3. >> Even if we say "clone --filter=foo --filter=bar" on the client, the >> server upload-pack coalesces that into a single "combine:foo+bar" >> string, and that's what its pack-objects sees. > > ...Or in other words, clients aren't broken here because we always send > "combine" filters when multiple `--filter` arguments are passed to `git > clone`, `git fetch`, etc. > > Is that always the case? I *think* so, but it would be nice to know if > there was a way that clients would not send the `combine` filter with >1 > filter. Searching for pack-objects invocations like this should at least find all internal cases, right? $ git grep -e '"pack-objects"' -e --filter --all-match --name-only builtin/pack-objects.c bundle.c upload-pack.c git pack-objects handles --filter and doesn't pass it on. bundle.c and upload-pack.c only add at most a single --filter option to a strvec. bundle.c uses list_objects_filter_spec() to get the filter specification, upload-pack.c uses expand_list_objects_filter_spec(). René ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH 2/3] pack-object: simplify --filter handling 2022-11-12 10:42 [PATCH 0/3] pack-objects: fix and simplify --filter handling René Scharfe 2022-11-12 10:44 ` [PATCH 1/3] pack-objects: fix handling of multiple --filter options René Scharfe @ 2022-11-12 10:44 ` René Scharfe 2022-11-12 11:45 ` Ævar Arnfjörð Bjarmason 2022-11-12 17:02 ` Jeff King 2022-11-12 10:46 ` [PATCH 3/3] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() René Scharfe ` (2 subsequent siblings) 4 siblings, 2 replies; 52+ messages in thread From: René Scharfe @ 2022-11-12 10:44 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder pack-objects uses OPT_PARSE_LIST_OBJECTS_FILTER_INIT() to initialize the a rev_info struct lazily before populating its filter member using the --filter option values. It tracks whether the initialization is needed using the .have_revs member of the callback data. There is a better way: Use a stand-alone list_objects_filter_options struct and build a rev_info struct with its .filter member after option parsing. This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER() and getting rid of the extra callback mechanism. Signed-off-by: René Scharfe <l.s.r@web.de> --- builtin/pack-objects.c | 28 ++++++---------------------- 1 file changed, 6 insertions(+), 22 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c702c09dd4..30023bcebb 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4149,22 +4149,6 @@ static int option_parse_cruft_expiration(const struct option *opt, return 0; } -struct po_filter_data { - unsigned have_revs:1; - struct rev_info revs; -}; - -static struct list_objects_filter_options *po_filter_revs_init(void *value) -{ - struct po_filter_data *data = value; - - if (!data->have_revs) - repo_init_revisions(the_repository, &data->revs, NULL); - data->have_revs = 1; - - return &data->revs.filter; -} - int cmd_pack_objects(int argc, const char **argv, const char *prefix) { int use_internal_rev_list = 0; @@ -4175,7 +4159,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) int rev_list_index = 0; int stdin_packs = 0; struct string_list keep_pack_list = STRING_LIST_INIT_NODUP; - struct po_filter_data pfd = { .have_revs = 0 }; + struct list_objects_filter_options filter_options = + LIST_OBJECTS_FILTER_INIT; struct option pack_objects_options[] = { OPT_SET_INT('q', "quiet", &progress, @@ -4266,7 +4251,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) &write_bitmap_index, N_("write a bitmap index if possible"), WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN), - OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init), + OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options), OPT_CALLBACK_F(0, "missing", NULL, N_("action"), N_("handling for missing objects"), PARSE_OPT_NONEG, option_parse_missing_action), @@ -4386,7 +4371,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (pfd.have_revs && pfd.revs.filter.choice) { + if (filter_options.choice) { if (!pack_to_stdout) die(_("cannot use --filter without --stdout")); if (stdin_packs) @@ -4473,16 +4458,15 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) read_cruft_objects(); } else if (!use_internal_rev_list) { read_object_list_from_stdin(); - } else if (pfd.have_revs) { - get_object_list(&pfd.revs, rp.nr, rp.v); - release_revisions(&pfd.revs); } else { struct rev_info revs; repo_init_revisions(the_repository, &revs, NULL); + list_objects_filter_copy(&revs.filter, &filter_options); get_object_list(&revs, rp.nr, rp.v); release_revisions(&revs); } + list_objects_filter_release(&filter_options); cleanup_preferred_base(); if (include_tag && nr_result) for_each_tag_ref(add_ref_tag, NULL); -- 2.38.1 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH 2/3] pack-object: simplify --filter handling 2022-11-12 10:44 ` [PATCH 2/3] pack-object: simplify --filter handling René Scharfe @ 2022-11-12 11:45 ` Ævar Arnfjörð Bjarmason 2022-11-12 17:02 ` Jeff King 1 sibling, 0 replies; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-12 11:45 UTC (permalink / raw) To: René Scharfe; +Cc: Git List, Junio C Hamano, Taylor Blau, Christian Couder On Sat, Nov 12 2022, René Scharfe wrote: > pack-objects uses OPT_PARSE_LIST_OBJECTS_FILTER_INIT() to initialize the > a rev_info struct lazily before populating its filter member using the > --filter option values. It tracks whether the initialization is needed > using the .have_revs member of the callback data. > > There is a better way: Use a stand-alone list_objects_filter_options > struct and build a rev_info struct with its .filter member after option > parsing. This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER() > and getting rid of the extra callback mechanism. I haven't gone through the history again, but as I recall this used to be outside of "struct rev_info", then (I think I did, as part of a leak fix) it was put there, and now it's still there, but we make a copy of it, then copy it into the "rev_info" struct again. Maybe we've finally reached the right trade-off here, but it would help to discuss some of those past commits, why those arrangements were wrong, and why this is finally the right one... ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 2/3] pack-object: simplify --filter handling 2022-11-12 10:44 ` [PATCH 2/3] pack-object: simplify --filter handling René Scharfe 2022-11-12 11:45 ` Ævar Arnfjörð Bjarmason @ 2022-11-12 17:02 ` Jeff King 2022-11-13 16:49 ` Jeff King 2022-11-13 17:31 ` René Scharfe 1 sibling, 2 replies; 52+ messages in thread From: Jeff King @ 2022-11-12 17:02 UTC (permalink / raw) To: René Scharfe Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder On Sat, Nov 12, 2022 at 11:44:50AM +0100, René Scharfe wrote: > pack-objects uses OPT_PARSE_LIST_OBJECTS_FILTER_INIT() to initialize the > a rev_info struct lazily before populating its filter member using the > --filter option values. It tracks whether the initialization is needed > using the .have_revs member of the callback data. > > There is a better way: Use a stand-alone list_objects_filter_options > struct and build a rev_info struct with its .filter member after option > parsing. This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER() > and getting rid of the extra callback mechanism. That seems like a reasonable fix (and I think we do it elsewhere). But I wonder if it wouldn't be simpler to just unconditionally initialize the rev_info in cmd_pack_objects(), and then unconditionally release it. True, we _might_ not use it if we are receiving objects directly over stdin, but I don't think initializing it is particularly costly. And then we don't have to worry about whether it's valid (it always is), or about copying between two representations of the filter (there's only one). -Peff ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 2/3] pack-object: simplify --filter handling 2022-11-12 17:02 ` Jeff King @ 2022-11-13 16:49 ` Jeff King 2022-11-13 17:31 ` René Scharfe 1 sibling, 0 replies; 52+ messages in thread From: Jeff King @ 2022-11-13 16:49 UTC (permalink / raw) To: René Scharfe Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder On Sat, Nov 12, 2022 at 12:02:58PM -0500, Jeff King wrote: > > There is a better way: Use a stand-alone list_objects_filter_options > > struct and build a rev_info struct with its .filter member after option > > parsing. This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER() > > and getting rid of the extra callback mechanism. > > That seems like a reasonable fix (and I think we do it elsewhere). But I > wonder if it wouldn't be simpler to just unconditionally initialize the > rev_info in cmd_pack_objects(), and then unconditionally release it. > True, we _might_ not use it if we are receiving objects directly over > stdin, but I don't think initializing it is particularly costly. > > And then we don't have to worry about whether it's valid (it always is), > or about copying between two representations of the filter (there's only > one). Just to make sure I wasn't telling you absolute garbage, I sketched this out quickly, so sharing it below (to illustrate, or maybe to save you 5 minutes if you want to go that direction). If you do stick with copying the filter spec, yet another option is that the rev_info variable could move back into get_object_list(), and we could just pass in the filter_options. It was only moved out in 80f6de4f5b (pack-objects: move revs out of get_object_list(), 2022-03-22) so that we could parse --filter directly into it. That's probably approaching bikeshed territory, and I'm happy enough either way, but it keeps the outer cmd_pack_objects() to the absolute minimum of what it needs to see. -Peff -- >8 -- diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c702c09dd4..cadc2be35f 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4149,22 +4149,6 @@ static int option_parse_cruft_expiration(const struct option *opt, return 0; } -struct po_filter_data { - unsigned have_revs:1; - struct rev_info revs; -}; - -static struct list_objects_filter_options *po_filter_revs_init(void *value) -{ - struct po_filter_data *data = value; - - if (!data->have_revs) - repo_init_revisions(the_repository, &data->revs, NULL); - data->have_revs = 1; - - return &data->revs.filter; -} - int cmd_pack_objects(int argc, const char **argv, const char *prefix) { int use_internal_rev_list = 0; @@ -4175,7 +4159,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) int rev_list_index = 0; int stdin_packs = 0; struct string_list keep_pack_list = STRING_LIST_INIT_NODUP; - struct po_filter_data pfd = { .have_revs = 0 }; + struct rev_info revs; struct option pack_objects_options[] = { OPT_SET_INT('q', "quiet", &progress, @@ -4266,7 +4250,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) &write_bitmap_index, N_("write a bitmap index if possible"), WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN), - OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init), + OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter), OPT_CALLBACK_F(0, "missing", NULL, N_("action"), N_("handling for missing objects"), PARSE_OPT_NONEG, option_parse_missing_action), @@ -4297,6 +4281,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (git_env_bool(GIT_TEST_WRITE_REV_INDEX, 0)) pack_idx_opts.flags |= WRITE_REV; + repo_init_revisions(the_repository, &revs, NULL); + progress = isatty(2); argc = parse_options(argc, argv, prefix, pack_objects_options, pack_usage, 0); @@ -4386,7 +4372,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (pfd.have_revs && pfd.revs.filter.choice) { + if (revs.filter.choice) { if (!pack_to_stdout) die(_("cannot use --filter without --stdout")); if (stdin_packs) @@ -4473,16 +4459,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) read_cruft_objects(); } else if (!use_internal_rev_list) { read_object_list_from_stdin(); - } else if (pfd.have_revs) { - get_object_list(&pfd.revs, rp.nr, rp.v); - release_revisions(&pfd.revs); } else { - struct rev_info revs; - - repo_init_revisions(the_repository, &revs, NULL); get_object_list(&revs, rp.nr, rp.v); - release_revisions(&revs); } + release_revisions(&revs); cleanup_preferred_base(); if (include_tag && nr_result) for_each_tag_ref(add_ref_tag, NULL); ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH 2/3] pack-object: simplify --filter handling 2022-11-12 17:02 ` Jeff King 2022-11-13 16:49 ` Jeff King @ 2022-11-13 17:31 ` René Scharfe 1 sibling, 0 replies; 52+ messages in thread From: René Scharfe @ 2022-11-13 17:31 UTC (permalink / raw) To: Jeff King Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder Am 12.11.22 um 18:02 schrieb Jeff King: > On Sat, Nov 12, 2022 at 11:44:50AM +0100, René Scharfe wrote: > >> pack-objects uses OPT_PARSE_LIST_OBJECTS_FILTER_INIT() to initialize the >> a rev_info struct lazily before populating its filter member using the >> --filter option values. It tracks whether the initialization is needed >> using the .have_revs member of the callback data. >> >> There is a better way: Use a stand-alone list_objects_filter_options >> struct and build a rev_info struct with its .filter member after option >> parsing. This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER() >> and getting rid of the extra callback mechanism. > > That seems like a reasonable fix (and I think we do it elsewhere). But I > wonder if it wouldn't be simpler to just unconditionally initialize the > rev_info in cmd_pack_objects(), and then unconditionally release it. > True, we _might_ not use it if we are receiving objects directly over > stdin, but I don't think initializing it is particularly costly. > > And then we don't have to worry about whether it's valid (it always is), > or about copying between two representations of the filter (there's only > one). Indeed that's even simpler and what we had before 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28). René ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH 3/3] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() 2022-11-12 10:42 [PATCH 0/3] pack-objects: fix and simplify --filter handling René Scharfe 2022-11-12 10:44 ` [PATCH 1/3] pack-objects: fix handling of multiple --filter options René Scharfe 2022-11-12 10:44 ` [PATCH 2/3] pack-object: simplify --filter handling René Scharfe @ 2022-11-12 10:46 ` René Scharfe 2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe 2022-11-29 12:19 ` [PATCH v3 0/5] " René Scharfe 4 siblings, 0 replies; 52+ messages in thread From: René Scharfe @ 2022-11-12 10:46 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder OPT_PARSE_LIST_OBJECTS_FILTER_INIT() with a non-NULL second argument passes a function pointer via an object pointer, which is undefined. It may work fine on platforms that implement C99 extension J.5.7 (Function pointer casts). Remove the unused macro and avoid the dependency on that extension. Signed-off-by: René Scharfe <l.s.r@web.de> --- list-objects-filter-options.c | 4 ---- list-objects-filter-options.h | 18 ++---------------- 2 files changed, 2 insertions(+), 20 deletions(-) diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index 5339660238..ee01bcd2cc 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -290,10 +290,6 @@ int opt_parse_list_objects_filter(const struct option *opt, const char *arg, int unset) { struct list_objects_filter_options *filter_options = opt->value; - opt_lof_init init = (opt_lof_init)opt->defval; - - if (init) - filter_options = init(opt->value); if (unset || !arg) list_objects_filter_set_no_filter(filter_options); diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h index 7eeadab2dd..1fe393f447 100644 --- a/list-objects-filter-options.h +++ b/list-objects-filter-options.h @@ -111,27 +111,13 @@ void parse_list_objects_filter( * The opt->value to opt_parse_list_objects_filter() is either a * "struct list_objects_filter_option *" when using * OPT_PARSE_LIST_OBJECTS_FILTER(). - * - * Or, if using no "struct option" field is used by the callback, - * except the "defval" which is expected to be an "opt_lof_init" - * function, which is called with the "opt->value" and must return a - * pointer to the ""struct list_objects_filter_option *" to be used. - * - * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the - * "struct list_objects_filter_option" is embedded in a "struct - * rev_info", which the "defval" could be tasked with lazily - * initializing. See cmd_pack_objects() for an example. */ int opt_parse_list_objects_filter(const struct option *opt, const char *arg, int unset); -typedef struct list_objects_filter_options *(*opt_lof_init)(void *); -#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \ - { OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \ - N_("object filtering"), 0, opt_parse_list_objects_filter, \ - (intptr_t)(init) } #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \ - OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL) + OPT_CALLBACK(0, "filter", (fo), N_("args"), \ + N_("object filtering"), opt_parse_list_objects_filter) /* * Translates abbreviated numbers in the filter's filter_spec into their -- 2.38.1 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH v2 0/3] pack-objects: fix and simplify --filter handling 2022-11-12 10:42 [PATCH 0/3] pack-objects: fix and simplify --filter handling René Scharfe ` (2 preceding siblings ...) 2022-11-12 10:46 ` [PATCH 3/3] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() René Scharfe @ 2022-11-20 10:03 ` René Scharfe 2022-11-20 10:06 ` [PATCH v2 1/3] t5317: stop losing return codes of git ls-files René Scharfe ` (3 more replies) 2022-11-29 12:19 ` [PATCH v3 0/5] " René Scharfe 4 siblings, 4 replies; 52+ messages in thread From: René Scharfe @ 2022-11-20 10:03 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder, Jeff King Fix a regression that prevents using multiple --filter options, simplify the option parsing code and avoid relying on undefined behavior in it. Patch 3 conflicts with cc/filtered-repack in seen, but not semantically. Changes since v1: - Added patch 1 to fix an issue with existing tests. - Separate patch 2 for new tests. - Test using blob size filters, only, which is a bit simpler. - Test both combinations to also catch not just the current last-one-wins regression, but also a possible future first-one-wins issue. - Actually revert 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28) instead of having a minimal fix and then adding some kind of middle ground by using a separate struct list_objects_filter_options. t5317: stop losing return codes of git ls-files t5317: demonstrate failure to handle multiple --filter options Revert "pack-objects: lazily set up "struct rev_info", don't leak" builtin/pack-objects.c | 31 ++------- list-objects-filter-options.c | 4 -- list-objects-filter-options.h | 24 +------ t/t5317-pack-objects-filter-objects.sh | 90 +++++++++++++++++++------- 4 files changed, 75 insertions(+), 74 deletions(-) Range-Diff gegen v1: 1: 2df1185b5f < -: ---------- pack-objects: fix handling of multiple --filter options 2: b8e951fb4f < -: ---------- pack-object: simplify --filter handling 3: 1e4ae7d9f1 < -: ---------- list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() -: ---------- > 1: 43af432f5c t5317: stop losing return codes of git ls-files -: ---------- > 2: a77b85e71d t5317: demonstrate failure to handle multiple --filter options -: ---------- > 3: 63ccb81357 Revert "pack-objects: lazily set up "struct rev_info", don't leak" -- 2.38.1 ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v2 1/3] t5317: stop losing return codes of git ls-files 2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe @ 2022-11-20 10:06 ` René Scharfe 2022-11-20 10:07 ` [PATCH v2 2/3] t5317: demonstrate failure to handle multiple --filter options René Scharfe ` (2 subsequent siblings) 3 siblings, 0 replies; 52+ messages in thread From: René Scharfe @ 2022-11-20 10:06 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder, Jeff King fb2d0db502 (test-lib-functions: add parsing helpers for ls-files and ls-tree, 2022-04-04) not only started to use helper functions, it also started to pipe the output of git ls-files into them directly, without using a temporary file. No explanation was given. This causes the return code of that git command to be ignored. Revert that part of the change, use temporary files and check the return code of git ls-files again. Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> --- t/t5317-pack-objects-filter-objects.sh | 52 ++++++++++++++------------ 1 file changed, 28 insertions(+), 24 deletions(-) diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh index bb633c9b09..82a22ecaa5 100755 --- a/t/t5317-pack-objects-filter-objects.sh +++ b/t/t5317-pack-objects-filter-objects.sh @@ -24,8 +24,9 @@ parse_verify_pack_blob_oid () { } test_expect_success 'verify blob count in normal packfile' ' - git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 | - test_parse_ls_files_stage_oids | + git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \ + >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r1 pack-objects --revs --stdout >all.pack <<-EOF && @@ -123,8 +124,8 @@ test_expect_success 'setup r2' ' ' test_expect_success 'verify blob count in normal packfile' ' - git -C r2 ls-files -s large.1000 large.10000 | - test_parse_ls_files_stage_oids | + git -C r2 ls-files -s large.1000 large.10000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r2 pack-objects --revs --stdout >all.pack <<-EOF && @@ -161,8 +162,8 @@ test_expect_success 'verify blob:limit=1000' ' ' test_expect_success 'verify blob:limit=1001' ' - git -C r2 ls-files -s large.1000 | - test_parse_ls_files_stage_oids | + git -C r2 ls-files -s large.1000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 >filter.pack <<-EOF && @@ -179,8 +180,8 @@ test_expect_success 'verify blob:limit=1001' ' ' test_expect_success 'verify blob:limit=10001' ' - git -C r2 ls-files -s large.1000 large.10000 | - test_parse_ls_files_stage_oids | + git -C r2 ls-files -s large.1000 large.10000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r2 pack-objects --revs --stdout --filter=blob:limit=10001 >filter.pack <<-EOF && @@ -197,8 +198,8 @@ test_expect_success 'verify blob:limit=10001' ' ' test_expect_success 'verify blob:limit=1k' ' - git -C r2 ls-files -s large.1000 | - test_parse_ls_files_stage_oids | + git -C r2 ls-files -s large.1000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r2 pack-objects --revs --stdout --filter=blob:limit=1k >filter.pack <<-EOF && @@ -215,8 +216,8 @@ test_expect_success 'verify blob:limit=1k' ' ' test_expect_success 'verify explicitly specifying oversized blob in input' ' - git -C r2 ls-files -s large.1000 large.10000 | - test_parse_ls_files_stage_oids | + git -C r2 ls-files -s large.1000 large.10000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && echo HEAD >objects && @@ -233,8 +234,8 @@ test_expect_success 'verify explicitly specifying oversized blob in input' ' ' test_expect_success 'verify blob:limit=1m' ' - git -C r2 ls-files -s large.1000 large.10000 | - test_parse_ls_files_stage_oids | + git -C r2 ls-files -s large.1000 large.10000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r2 pack-objects --revs --stdout --filter=blob:limit=1m >filter.pack <<-EOF && @@ -289,8 +290,9 @@ test_expect_success 'setup r3' ' ' test_expect_success 'verify blob count in normal packfile' ' - git -C r3 ls-files -s sparse1 sparse2 dir1/sparse1 dir1/sparse2 | - test_parse_ls_files_stage_oids | + git -C r3 ls-files -s sparse1 sparse2 dir1/sparse1 dir1/sparse2 \ + >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r3 pack-objects --revs --stdout >all.pack <<-EOF && @@ -341,8 +343,9 @@ test_expect_success 'setup r4' ' ' test_expect_success 'verify blob count in normal packfile' ' - git -C r4 ls-files -s pattern sparse1 sparse2 dir1/sparse1 dir1/sparse2 | - test_parse_ls_files_stage_oids | + git -C r4 ls-files -s pattern sparse1 sparse2 dir1/sparse1 dir1/sparse2 \ + >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r4 pack-objects --revs --stdout >all.pack <<-EOF && @@ -359,8 +362,8 @@ test_expect_success 'verify blob count in normal packfile' ' ' test_expect_success 'verify sparse:oid=OID' ' - git -C r4 ls-files -s dir1/sparse1 dir1/sparse2 | - test_parse_ls_files_stage_oids | + git -C r4 ls-files -s dir1/sparse1 dir1/sparse2 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r4 ls-files -s pattern >staged && @@ -379,8 +382,8 @@ test_expect_success 'verify sparse:oid=OID' ' ' test_expect_success 'verify sparse:oid=oid-ish' ' - git -C r4 ls-files -s dir1/sparse1 dir1/sparse2 | - test_parse_ls_files_stage_oids | + git -C r4 ls-files -s dir1/sparse1 dir1/sparse2 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r4 pack-objects --revs --stdout --filter=sparse:oid=main:pattern >filter.pack <<-EOF && @@ -400,8 +403,9 @@ test_expect_success 'verify sparse:oid=oid-ish' ' # This models previously omitted objects that we did not receive. test_expect_success 'setup r1 - delete loose blobs' ' - git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 | - test_parse_ls_files_stage_oids | + git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \ + >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && for id in `cat expected | sed "s|..|&/|"` -- 2.38.1 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH v2 2/3] t5317: demonstrate failure to handle multiple --filter options 2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe 2022-11-20 10:06 ` [PATCH v2 1/3] t5317: stop losing return codes of git ls-files René Scharfe @ 2022-11-20 10:07 ` René Scharfe 2022-11-20 10:13 ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe 2022-11-22 19:02 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling Jeff King 3 siblings, 0 replies; 52+ messages in thread From: René Scharfe @ 2022-11-20 10:07 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder, Jeff King git pack-objects should accept multiple --filter options as documented in Documentation/rev-list-options.txt, but currently the last one wins. Show that using tests with multiple blob size limits. Signed-off-by: René Scharfe <l.s.r@web.de> --- t/t5317-pack-objects-filter-objects.sh | 38 ++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh index 82a22ecaa5..25faebaada 100755 --- a/t/t5317-pack-objects-filter-objects.sh +++ b/t/t5317-pack-objects-filter-objects.sh @@ -265,6 +265,44 @@ test_expect_success 'verify normal and blob:limit packfiles have same commits/tr test_cmp expected observed ' +test_expect_failure 'verify small limit and big limit results in small limit' ' + git -C r2 ls-files -s large.1000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | + sort >expected && + + git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 \ + --filter=blob:limit=10001 >filter.pack <<-EOF && + HEAD + EOF + git -C r2 index-pack ../filter.pack && + + git -C r2 verify-pack -v ../filter.pack >verify_result && + grep blob verify_result | + parse_verify_pack_blob_oid | + sort >observed && + + test_cmp expected observed +' + +test_expect_success 'verify big limit and small limit results in small limit' ' + git -C r2 ls-files -s large.1000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | + sort >expected && + + git -C r2 pack-objects --revs --stdout --filter=blob:limit=10001 \ + --filter=blob:limit=1001 >filter.pack <<-EOF && + HEAD + EOF + git -C r2 index-pack ../filter.pack && + + git -C r2 verify-pack -v ../filter.pack >verify_result && + grep blob verify_result | + parse_verify_pack_blob_oid | + sort >observed && + + test_cmp expected observed +' + # Test sparse:path=<path> filter. # !!!! # NOTE: sparse:path filter support has been dropped for security reasons, -- 2.38.1 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe 2022-11-20 10:06 ` [PATCH v2 1/3] t5317: stop losing return codes of git ls-files René Scharfe 2022-11-20 10:07 ` [PATCH v2 2/3] t5317: demonstrate failure to handle multiple --filter options René Scharfe @ 2022-11-20 10:13 ` René Scharfe 2022-11-28 10:03 ` Junio C Hamano 2022-11-22 19:02 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling Jeff King 3 siblings, 1 reply; 52+ messages in thread From: René Scharfe @ 2022-11-20 10:13 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder, Jeff King This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518. 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28) avoided leaking rev_info allocations in many cases by calling repo_init_revisions() only when the .filter member was actually needed, but then still leaking it. That was fixed later by 2108fe4a19 (revisions API users: add straightforward release_revisions(), 2022-04-13), making the reverted commit unnecessary. 5cb28270a1 broke support for multiple --filter options by calling repo_init_revisions() for each one, clobbering earlier state and leaking rev_info allocations. Reverting it fixes that regression. Luckily it only affects users that run pack-objects directly. Internal callers in bundle.c and upload-pack.c combine multiple filters into a single option using "+". 5cb28270a1 introduced OPT_PARSE_LIST_OBJECTS_FILTER_INIT. It relies on being able to faithfully convert function pointers to object pointers and back, which is undefined in C99. The standard mentions this ability as a common extension in its annex J (J.5.7 Function pointer casts), but avoiding that dependency is more portable. The macro hasn't been used since, OPT_PARSE_LIST_OBJECTS_FILTER is enough so far. So revert 5cb28270a1 fully to fix support for multiple --filter options and avoid reliance on undefined behavior. Its intent is better served by the release_revisions() call added by 2108fe4a19 alone -- we just need to do it unconditionally, mirroring the unconditional call of repo_init_revisions(). Helped-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> --- builtin/pack-objects.c | 31 +++++--------------------- list-objects-filter-options.c | 4 ---- list-objects-filter-options.h | 24 +++----------------- t/t5317-pack-objects-filter-objects.sh | 2 +- 4 files changed, 10 insertions(+), 51 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 573d0b20b7..3e74fbb0cd 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4149,21 +4149,6 @@ static int option_parse_cruft_expiration(const struct option *opt, return 0; } -struct po_filter_data { - unsigned have_revs:1; - struct rev_info revs; -}; - -static struct list_objects_filter_options *po_filter_revs_init(void *value) -{ - struct po_filter_data *data = value; - - repo_init_revisions(the_repository, &data->revs, NULL); - data->have_revs = 1; - - return &data->revs.filter; -} - int cmd_pack_objects(int argc, const char **argv, const char *prefix) { int use_internal_rev_list = 0; @@ -4174,7 +4159,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) int rev_list_index = 0; int stdin_packs = 0; struct string_list keep_pack_list = STRING_LIST_INIT_NODUP; - struct po_filter_data pfd = { .have_revs = 0 }; + struct rev_info revs; struct option pack_objects_options[] = { OPT_SET_INT('q', "quiet", &progress, @@ -4265,7 +4250,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) &write_bitmap_index, N_("write a bitmap index if possible"), WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN), - OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init), + OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter), OPT_CALLBACK_F(0, "missing", NULL, N_("action"), N_("handling for missing objects"), PARSE_OPT_NONEG, option_parse_missing_action), @@ -4284,6 +4269,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) read_replace_refs = 0; + repo_init_revisions(the_repository, &revs, NULL); + sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1); if (the_repository->gitdir) { prepare_repo_settings(the_repository); @@ -4385,7 +4372,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (pfd.have_revs && pfd.revs.filter.choice) { + if (revs.filter.choice) { if (!pack_to_stdout) die(_("cannot use --filter without --stdout")); if (stdin_packs) @@ -4472,16 +4459,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) read_cruft_objects(); } else if (!use_internal_rev_list) { read_object_list_from_stdin(); - } else if (pfd.have_revs) { - get_object_list(&pfd.revs, rp.nr, rp.v); - release_revisions(&pfd.revs); } else { - struct rev_info revs; - - repo_init_revisions(the_repository, &revs, NULL); get_object_list(&revs, rp.nr, rp.v); - release_revisions(&revs); } + release_revisions(&revs); cleanup_preferred_base(); if (include_tag && nr_result) for_each_tag_ref(add_ref_tag, NULL); diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index 5339660238..ee01bcd2cc 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -290,10 +290,6 @@ int opt_parse_list_objects_filter(const struct option *opt, const char *arg, int unset) { struct list_objects_filter_options *filter_options = opt->value; - opt_lof_init init = (opt_lof_init)opt->defval; - - if (init) - filter_options = init(opt->value); if (unset || !arg) list_objects_filter_set_no_filter(filter_options); diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h index 7eeadab2dd..64af2e29e2 100644 --- a/list-objects-filter-options.h +++ b/list-objects-filter-options.h @@ -107,31 +107,13 @@ void parse_list_objects_filter( struct list_objects_filter_options *filter_options, const char *arg); -/** - * The opt->value to opt_parse_list_objects_filter() is either a - * "struct list_objects_filter_option *" when using - * OPT_PARSE_LIST_OBJECTS_FILTER(). - * - * Or, if using no "struct option" field is used by the callback, - * except the "defval" which is expected to be an "opt_lof_init" - * function, which is called with the "opt->value" and must return a - * pointer to the ""struct list_objects_filter_option *" to be used. - * - * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the - * "struct list_objects_filter_option" is embedded in a "struct - * rev_info", which the "defval" could be tasked with lazily - * initializing. See cmd_pack_objects() for an example. - */ int opt_parse_list_objects_filter(const struct option *opt, const char *arg, int unset); -typedef struct list_objects_filter_options *(*opt_lof_init)(void *); -#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \ - { OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \ - N_("object filtering"), 0, opt_parse_list_objects_filter, \ - (intptr_t)(init) } #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \ - OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL) + OPT_CALLBACK(0, "filter", fo, N_("args"), \ + N_("object filtering"), \ + opt_parse_list_objects_filter) /* * Translates abbreviated numbers in the filter's filter_spec into their diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh index 25faebaada..5b707d911b 100755 --- a/t/t5317-pack-objects-filter-objects.sh +++ b/t/t5317-pack-objects-filter-objects.sh @@ -265,7 +265,7 @@ test_expect_success 'verify normal and blob:limit packfiles have same commits/tr test_cmp expected observed ' -test_expect_failure 'verify small limit and big limit results in small limit' ' +test_expect_success 'verify small limit and big limit results in small limit' ' git -C r2 ls-files -s large.1000 >ls_files_result && test_parse_ls_files_stage_oids <ls_files_result | sort >expected && -- 2.38.1 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-20 10:13 ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe @ 2022-11-28 10:03 ` Junio C Hamano 2022-11-28 11:12 ` Ævar Arnfjörð Bjarmason 2022-11-28 11:26 ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe 0 siblings, 2 replies; 52+ messages in thread From: Junio C Hamano @ 2022-11-28 10:03 UTC (permalink / raw) To: René Scharfe Cc: Git List, Ævar Arnfjörð Bjarmason, Taylor Blau, Christian Couder, Jeff King René Scharfe <l.s.r@web.de> writes: > This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518. > > 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, > 2022-03-28) avoided leaking rev_info allocations in many cases by > calling repo_init_revisions() only when the .filter member was actually > needed, but then still leaking it. That was fixed later by 2108fe4a19 > (revisions API users: add straightforward release_revisions(), > 2022-04-13), making the reverted commit unnecessary. Hmph, with this merged, 'seen' breaks linux-leaks job in a strange way. https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917 Does anybody want to help looking into it? Thanks. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 10:03 ` Junio C Hamano @ 2022-11-28 11:12 ` Ævar Arnfjörð Bjarmason 2022-11-28 12:00 ` [PATCH] t5314: check exit code of "rev-parse" Ævar Arnfjörð Bjarmason 2022-11-28 11:26 ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe 1 sibling, 1 reply; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-28 11:12 UTC (permalink / raw) To: Junio C Hamano Cc: René Scharfe, Git List, Taylor Blau, Christian Couder, Jeff King On Mon, Nov 28 2022, Junio C Hamano wrote: > René Scharfe <l.s.r@web.de> writes: > >> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518. >> >> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, >> 2022-03-28) avoided leaking rev_info allocations in many cases by >> calling repo_init_revisions() only when the .filter member was actually >> needed, but then still leaking it. That was fixed later by 2108fe4a19 >> (revisions API users: add straightforward release_revisions(), >> 2022-04-13), making the reverted commit unnecessary. > > Hmph, with this merged, 'seen' breaks linux-leaks job in a strange > way. > > https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917 > > Does anybody want to help looking into it? It's because the tip of that branch introduces a new leak, but it's only revealed in our test suite by a "git" invocation that we don't check the exit code of. So testing with "GIT_TEST_SANITIZE_LEAK_LOG=true" (which is there to check exactly those edge cases) catches it. René, testing locally with e.g.: GIT_TEST_PASSING_SANITIZE_LEAK=check \ GIT_TEST_SANITIZE_LEAK_LOG=true \ make test Should catch it, but note that we might still have new leaks in tests that are already failing due to other leaks. So diff-ing the resulting t/test-results/*.leak/* would be a good sanity check to see if there's any other leak we've missed. As to the bug itself: This exact edge case is why I went for the lazy-setup part of 5cb28270a1f (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28), and indeed the example mentioned in its commit message now leaks again, but not on master: echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial Now, maybe I'm missing something, but it the tip of that branch points out a legitimate bug in my 5cb28270a1f. But I don't see why the fix for it needs to be a full revert of 5cb28270a1f. If I replace all the C code changes in that commit with this the test it flipped also passes: diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 573d0b20b76..0de53c5a2df 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4158,8 +4158,10 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value) { struct po_filter_data *data = value; - repo_init_revisions(the_repository, &data->revs, NULL); - data->have_revs = 1; + if (!data->have_revs) { + repo_init_revisions(the_repository, &data->revs, NULL); + data->have_revs = 1; + } return &data->revs.filter; } I.e. the commit is correct that we shouldn't be clobbering the "rev_info" when we see --filter again, but then let's just stop doing that. Well, I know why, it's because René initially started poking at this to rid the codebase of the reliance of C99's J.5.7. I think our two opinions on that are quite clear, so I won't repeat that here (I don't per-se mind, I just think it's a nothingburger). But in any case, if we're going to also refactor this code to get rd of the J.5.7 reliance I'd think we could agree that it really belongs in its own change. That change could then be a pure refactoring change, and wouldn't need to also explain how it's fixing a bug while-at-it. ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH] t5314: check exit code of "rev-parse" 2022-11-28 11:12 ` Ævar Arnfjörð Bjarmason @ 2022-11-28 12:00 ` Ævar Arnfjörð Bjarmason 2022-11-28 13:51 ` René Scharfe 2022-11-28 14:18 ` [PATCH v2] t5314: check exit code of "git" Ævar Arnfjörð Bjarmason 0 siblings, 2 replies; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-28 12:00 UTC (permalink / raw) To: git Cc: Junio C Hamano, René Scharfe, Taylor Blau, Jeff King, Christian Couder, Ævar Arnfjörð Bjarmason Amend the test added in [1] to check the exit code of the "rev-parse" invocations. An in-flight change[2] introduced a memory leak in these invocations, which went undetected unless we were running under "GIT_TEST_SANITIZE_LEAK_LOG=true". Note that the in-flight change made 8 test files fail, but as far as I can tell only this one would have had its exit code hidden unless under "GIT_TEST_SANITIZE_LEAK_LOG=true". The rest would be caught without it. 1. 4cf2143e029 (pack-objects: break delta cycles before delta-search phase, 2016-08-11) 2. https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ 3. faececa53f9 (test-lib: have the "check" mode for SANITIZE=leak consider leak logs, 2022-07-28) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- Aside from rs/multi-filter-args, we should be checking the return code of "git" in these cases. t/t5314-pack-cycle-detection.sh | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/t/t5314-pack-cycle-detection.sh b/t/t5314-pack-cycle-detection.sh index 73a241743aa..169d8198641 100755 --- a/t/t5314-pack-cycle-detection.sh +++ b/t/t5314-pack-cycle-detection.sh @@ -63,13 +63,16 @@ TEST_PASSES_SANITIZE_LEAK=true # Note that the two variants of "file" must be similar enough to convince git # to create the delta. make_pack () { - { - printf '%s\n' "-$(git rev-parse $2)" - printf '%s dummy\n' "$(git rev-parse $1:dummy)" - printf '%s file\n' "$(git rev-parse $1:file)" - } | - git pack-objects --stdout | - git index-pack --stdin --fix-thin + grp1=$(git rev-parse "$2") && + grp2=$(git rev-parse "$1:dummy") && + grp3=$(git rev-parse "$1:file") && + cat >list <<-EOF + -$grp1 + $grp2 dummy + $grp3 file + EOF + git pack-objects --stdout <list >pack && + git index-pack --stdin --fix-thin <pack } test_expect_success 'setup' ' -- 2.39.0.rc0.962.g4ca4168c9ac ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH] t5314: check exit code of "rev-parse" 2022-11-28 12:00 ` [PATCH] t5314: check exit code of "rev-parse" Ævar Arnfjörð Bjarmason @ 2022-11-28 13:51 ` René Scharfe 2022-11-28 14:18 ` [PATCH v2] t5314: check exit code of "git" Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 52+ messages in thread From: René Scharfe @ 2022-11-28 13:51 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, git Cc: Junio C Hamano, Taylor Blau, Jeff King, Christian Couder Am 28.11.2022 um 13:00 schrieb Ævar Arnfjörð Bjarmason: > Amend the test added in [1] to check the exit code of the "rev-parse" > invocations. An in-flight change[2] introduced a memory leak in these > invocations, which went undetected unless we were running under > "GIT_TEST_SANITIZE_LEAK_LOG=true". Checking the return code of rev-parse might be a good idea, but AFAICS none of the patches in the thread around [2] add a leak to it; they are only about pack-objects. So that's not how this patch works. > Note that the in-flight change made 8 test files fail, but as far as I > can tell only this one would have had its exit code hidden unless > under "GIT_TEST_SANITIZE_LEAK_LOG=true". The rest would be caught > without it. > > 1. 4cf2143e029 (pack-objects: break delta cycles before delta-search > phase, 2016-08-11) > 2. https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ > 3. faececa53f9 (test-lib: have the "check" mode for SANITIZE=leak > consider leak logs, 2022-07-28) > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > > Aside from rs/multi-filter-args, we should be checking the return code > of "git" in these cases. > > t/t5314-pack-cycle-detection.sh | 17 ++++++++++------- > 1 file changed, 10 insertions(+), 7 deletions(-) > > diff --git a/t/t5314-pack-cycle-detection.sh b/t/t5314-pack-cycle-detection.sh > index 73a241743aa..169d8198641 100755 > --- a/t/t5314-pack-cycle-detection.sh > +++ b/t/t5314-pack-cycle-detection.sh > @@ -63,13 +63,16 @@ TEST_PASSES_SANITIZE_LEAK=true > # Note that the two variants of "file" must be similar enough to convince git > # to create the delta. > make_pack () { > - { > - printf '%s\n' "-$(git rev-parse $2)" > - printf '%s dummy\n' "$(git rev-parse $1:dummy)" > - printf '%s file\n' "$(git rev-parse $1:file)" > - } | > - git pack-objects --stdout | THIS is how it works: pack-objects with the patch in the grandparent of [2] and built with SANITIZE=leak fails here, but the test marches on, ignoring its return value. > - git index-pack --stdin --fix-thin > + grp1=$(git rev-parse "$2") && > + grp2=$(git rev-parse "$1:dummy") && > + grp3=$(git rev-parse "$1:file") && Nit: "grp" expands to "group" in my mind. Here it stands for "git rev-parse", hmm. "commit", "dummy_blob" and "file_blob" might be better names. > + cat >list <<-EOF > + -$grp1 > + $grp2 dummy > + $grp3 file > + EOF > + git pack-objects --stdout <list >pack && Here's the new return code check that makes the test detect the diffopt leak in question. > + git index-pack --stdin --fix-thin <pack > } > > test_expect_success 'setup' ' ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v2] t5314: check exit code of "git" 2022-11-28 12:00 ` [PATCH] t5314: check exit code of "rev-parse" Ævar Arnfjörð Bjarmason 2022-11-28 13:51 ` René Scharfe @ 2022-11-28 14:18 ` Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-28 14:18 UTC (permalink / raw) To: git Cc: Junio C Hamano, René Scharfe, Taylor Blau, Jeff King, Christian Couder, Ævar Arnfjörð Bjarmason Amend the test added in [1] to check the exit code of the "git" invocations. An in-flight change[2] introduced a memory leak in these invocations, which went undetected unless we were running under "GIT_TEST_SANITIZE_LEAK_LOG=true". Note that the in-flight change made 8 test files fail, but as far as I can tell only this one would have had its exit code hidden unless under "GIT_TEST_SANITIZE_LEAK_LOG=true". The rest would be caught without it. We could pick other variable names here than "ln%d", e.g. "commit", "dummy_blob" and "file_blob", but having the "rev-parse" invocations aligned makes the difference between them more readable, so let's pick "ln%d". 1. 4cf2143e029 (pack-objects: break delta cycles before delta-search phase, 2016-08-11) 2. https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ 3. faececa53f9 (test-lib: have the "check" mode for SANITIZE=leak consider leak logs, 2022-07-28) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- Range-diff against v1: 1: 45b240740eb ! 1: ca77a7249e6 t5314: check exit code of "rev-parse" @@ Metadata Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com> ## Commit message ## - t5314: check exit code of "rev-parse" + t5314: check exit code of "git" - Amend the test added in [1] to check the exit code of the "rev-parse" + Amend the test added in [1] to check the exit code of the "git" invocations. An in-flight change[2] introduced a memory leak in these invocations, which went undetected unless we were running under "GIT_TEST_SANITIZE_LEAK_LOG=true". @@ Commit message under "GIT_TEST_SANITIZE_LEAK_LOG=true". The rest would be caught without it. + We could pick other variable names here than "ln%d", e.g. "commit", + "dummy_blob" and "file_blob", but having the "rev-parse" invocations + aligned makes the difference between them more readable, so let's pick + "ln%d". + 1. 4cf2143e029 (pack-objects: break delta cycles before delta-search phase, 2016-08-11) 2. https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ @@ t/t5314-pack-cycle-detection.sh: TEST_PASSES_SANITIZE_LEAK=true - } | - git pack-objects --stdout | - git index-pack --stdin --fix-thin -+ grp1=$(git rev-parse "$2") && -+ grp2=$(git rev-parse "$1:dummy") && -+ grp3=$(git rev-parse "$1:file") && ++ ln1=$(git rev-parse "$2") && ++ ln2=$(git rev-parse "$1:dummy") && ++ ln3=$(git rev-parse "$1:file") && + cat >list <<-EOF -+ -$grp1 -+ $grp2 dummy -+ $grp3 file ++ -$ln1 ++ $ln2 dummy ++ $ln3 file + EOF + git pack-objects --stdout <list >pack && + git index-pack --stdin --fix-thin <pack t/t5314-pack-cycle-detection.sh | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/t/t5314-pack-cycle-detection.sh b/t/t5314-pack-cycle-detection.sh index 73a241743aa..82734b9a3c4 100755 --- a/t/t5314-pack-cycle-detection.sh +++ b/t/t5314-pack-cycle-detection.sh @@ -63,13 +63,16 @@ TEST_PASSES_SANITIZE_LEAK=true # Note that the two variants of "file" must be similar enough to convince git # to create the delta. make_pack () { - { - printf '%s\n' "-$(git rev-parse $2)" - printf '%s dummy\n' "$(git rev-parse $1:dummy)" - printf '%s file\n' "$(git rev-parse $1:file)" - } | - git pack-objects --stdout | - git index-pack --stdin --fix-thin + ln1=$(git rev-parse "$2") && + ln2=$(git rev-parse "$1:dummy") && + ln3=$(git rev-parse "$1:file") && + cat >list <<-EOF + -$ln1 + $ln2 dummy + $ln3 file + EOF + git pack-objects --stdout <list >pack && + git index-pack --stdin --fix-thin <pack } test_expect_success 'setup' ' -- 2.39.0.rc0.962.g4ca4168c9ac ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 10:03 ` Junio C Hamano 2022-11-28 11:12 ` Ævar Arnfjörð Bjarmason @ 2022-11-28 11:26 ` René Scharfe 2022-11-28 11:31 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 52+ messages in thread From: René Scharfe @ 2022-11-28 11:26 UTC (permalink / raw) To: Junio C Hamano Cc: Git List, Ævar Arnfjörð Bjarmason, Taylor Blau, Christian Couder, Jeff King Am 28.11.2022 um 11:03 schrieb Junio C Hamano: > René Scharfe <l.s.r@web.de> writes: > >> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518. >> >> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, >> 2022-03-28) avoided leaking rev_info allocations in many cases by >> calling repo_init_revisions() only when the .filter member was actually >> needed, but then still leaking it. That was fixed later by 2108fe4a19 >> (revisions API users: add straightforward release_revisions(), >> 2022-04-13), making the reverted commit unnecessary. > > Hmph, with this merged, 'seen' breaks linux-leaks job in a strange > way. > > https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917 > > Does anybody want to help looking into it? The patch exposes that release_revisions() leaks the diffopt allocations as we're yet to address the TODO added by 54c8a7c379 (revisions API: add a TODO for diff_free(&revs->diffopt), 2022-04-14). The patch below plugs it locally. --- >8 --- Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions() Signed-off-by: René Scharfe <l.s.r@web.de> --- builtin/pack-objects.c | 1 + 1 file changed, 1 insertion(+) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 3e74fbb0cd..a47a3f0fba 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) } else { get_object_list(&revs, rp.nr, rp.v); } + diff_free(&revs.diffopt); release_revisions(&revs); cleanup_preferred_base(); if (include_tag && nr_result) -- 2.30.2 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 11:26 ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe @ 2022-11-28 11:31 ` Ævar Arnfjörð Bjarmason 2022-11-28 12:24 ` Ævar Arnfjörð Bjarmason 2022-11-28 14:29 ` René Scharfe 0 siblings, 2 replies; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-28 11:31 UTC (permalink / raw) To: René Scharfe Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King On Mon, Nov 28 2022, René Scharfe wrote: > Am 28.11.2022 um 11:03 schrieb Junio C Hamano: >> René Scharfe <l.s.r@web.de> writes: >> >>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518. >>> >>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, >>> 2022-03-28) avoided leaking rev_info allocations in many cases by >>> calling repo_init_revisions() only when the .filter member was actually >>> needed, but then still leaking it. That was fixed later by 2108fe4a19 >>> (revisions API users: add straightforward release_revisions(), >>> 2022-04-13), making the reverted commit unnecessary. >> >> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange >> way. >> >> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917 >> >> Does anybody want to help looking into it? [I see we crossed E-Mails]: https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ > The patch exposes that release_revisions() leaks the diffopt allocations > as we're yet to address the TODO added by 54c8a7c379 (revisions API: add > a TODO for diff_free(&revs->diffopt), 2022-04-14). That's correct, and we have that leak in various places in our codebase, but per the above side-thread I think this is primarily exposing that we're setting up the "struct rev_info" with your change when we don't need to. Why can't we just skip it? Yeah, if we do set it up we'll run into an outstanding leak, and that should also be fixed (I have some local patches...), but the other cases I know of where we'll leak that data is where we're actually using the "struct rev_info". I haven't tried tearing your change apart to poke at it myself, and maybe there's some really good reason for why you can't separate getting rid of the J.5.7 dependency and removing the lazy-init. > The patch below plugs it locally. > > --- >8 --- > Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions() > > Signed-off-by: René Scharfe <l.s.r@web.de> > --- > builtin/pack-objects.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 3e74fbb0cd..a47a3f0fba 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > } else { > get_object_list(&revs, rp.nr, rp.v); > } > + diff_free(&revs.diffopt); > release_revisions(&revs); > cleanup_preferred_base(); > if (include_tag && nr_result) So, the main motivation for the change was paranoia that a compiler or platform might show up without J.5.7 support and that would bite us, but we're now adding a double-free-in-waiting? I think we're both a bit paranoid, but clearly have different paranoia-priorities :) If we do end up with some hack like this instead of fixing the underlying problem I'd much prefer that such a hack just be an UNLEAK() here. I.e. we have a destructor for "revs.*" already, let's not bypass it and start freeing things from under it, which will result in a double-free if we forget this callsite once the TODO in 54c8a7c379 is addressed. As you'd see if you made release_revisions() simply call diff_free(&revs.diffopt) doing so would reveal some really gnarly edge cases. I haven't dug into this one, but offhand I'm not confident in saying that this isn't exposing us to some aspect of that gnarlyness (maybe not, it's been a while since I looked). (IIRC some of the most gnarly edge cases will only show up as CI failures on Windows, to do with the ordering of when we'll fclose() files hanging off that "diffopt"). ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 11:31 ` Ævar Arnfjörð Bjarmason @ 2022-11-28 12:24 ` Ævar Arnfjörð Bjarmason 2022-11-28 15:16 ` René Scharfe 2022-11-28 14:29 ` René Scharfe 1 sibling, 1 reply; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-28 12:24 UTC (permalink / raw) To: René Scharfe Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King On Mon, Nov 28 2022, Ævar Arnfjörð Bjarmason wrote: René: > On Mon, Nov 28 2022, René Scharfe wrote: > >> Am 28.11.2022 um 11:03 schrieb Junio C Hamano: >>> René Scharfe <l.s.r@web.de> writes: >>> >>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518. >>>> >>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, >>>> 2022-03-28) avoided leaking rev_info allocations in many cases by >>>> calling repo_init_revisions() only when the .filter member was actually >>>> needed, but then still leaking it. That was fixed later by 2108fe4a19 >>>> (revisions API users: add straightforward release_revisions(), >>>> 2022-04-13), making the reverted commit unnecessary. >>> >>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange >>> way. >>> >>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917 >>> >>> Does anybody want to help looking into it? > > [I see we crossed E-Mails]: > https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ > >> The patch exposes that release_revisions() leaks the diffopt allocations >> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add >> a TODO for diff_free(&revs->diffopt), 2022-04-14). > > That's correct, and we have that leak in various places in our codebase, > but per the above side-thread I think this is primarily exposing that > we're setting up the "struct rev_info" with your change when we don't > need to. Why can't we just skip it? > > Yeah, if we do set it up we'll run into an outstanding leak, and that > should also be fixed (I have some local patches...), but the other cases > I know of where we'll leak that data is where we're actually using the > "struct rev_info". > > I haven't tried tearing your change apart to poke at it myself, and > maybe there's some really good reason for why you can't separate getting > rid of the J.5.7 dependency and removing the lazy-init. > >> The patch below plugs it locally. >> >> --- >8 --- >> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions() >> >> Signed-off-by: René Scharfe <l.s.r@web.de> >> --- >> builtin/pack-objects.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c >> index 3e74fbb0cd..a47a3f0fba 100644 >> --- a/builtin/pack-objects.c >> +++ b/builtin/pack-objects.c >> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) >> } else { >> get_object_list(&revs, rp.nr, rp.v); >> } >> + diff_free(&revs.diffopt); >> release_revisions(&revs); >> cleanup_preferred_base(); >> if (include_tag && nr_result) > > So, the main motivation for the change was paranoia that a compiler or > platform might show up without J.5.7 support and that would bite us, but > we're now adding a double-free-in-waiting? > > I think we're both a bit paranoid, but clearly have different > paranoia-priorities :) > > If we do end up with some hack like this instead of fixing the > underlying problem I'd much prefer that such a hack just be an UNLEAK() > here. > > I.e. we have a destructor for "revs.*" already, let's not bypass it and > start freeing things from under it, which will result in a double-free > if we forget this callsite once the TODO in 54c8a7c379 is addressed. > > As you'd see if you made release_revisions() simply call > diff_free(&revs.diffopt) doing so would reveal some really gnarly edge > cases. > > I haven't dug into this one, but offhand I'm not confident in saying > that this isn't exposing us to some aspect of that gnarlyness (maybe > not, it's been a while since I looked). > > (IIRC some of the most gnarly edge cases will only show up as CI > failures on Windows, to do with the ordering of when we'll fclose() > files hanging off that "diffopt"). This squashed into 3/3 seems to me to be a proper fix to a change that wants to refactor the code for non-J.5.7 compatibility. I.e. this just does the data<->fp casting part of the change, without refactoring the "lazy init". But I think you should check this a bit more carefully. Your 3/3 says that your change "mak[es] the reverted commit unnecessary", but as I noted if you'd run the command that commit shows, you'd have seen you're re-introducing the leak it fixed. So I wonder what else has been missed here. I vaguely recall that one reason I ended up with that J.5.7 dependency was because there was an objection to mocking up the "struct option" as I'm doing here. I.e. here we assume that the opt_parse_list_objects_filter() is only ever going to care about the "value" member. I think that's probably fine, but I may be misrecalling, or missing some crucial detail. I'll leave digging that up & convincing us that it's fine to the person pushing for refactoring all of this :) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 3e74fbb0cd5..faf210bfe8c 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4149,6 +4149,27 @@ static int option_parse_cruft_expiration(const struct option *opt, return 0; } +struct po_filter_data { + unsigned have_revs:1; + struct rev_info revs; +}; + +static int opt_parse_list_objects_filter_init(const struct option *opt, + const char *arg, int unset) +{ + struct po_filter_data *data = opt->value; + struct rev_info *revs = &data->revs; + const struct option opt_rev = { + .value = (void *)&revs->filter, + }; + + if (!data->have_revs) + repo_init_revisions(the_repository, revs, NULL); + data->have_revs = 1; + + return opt_parse_list_objects_filter(&opt_rev, arg, unset); +} + int cmd_pack_objects(int argc, const char **argv, const char *prefix) { int use_internal_rev_list = 0; @@ -4159,7 +4180,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) int rev_list_index = 0; int stdin_packs = 0; struct string_list keep_pack_list = STRING_LIST_INIT_NODUP; - struct rev_info revs; + struct po_filter_data pfd = { .have_revs = 0 }; struct option pack_objects_options[] = { OPT_SET_INT('q', "quiet", &progress, @@ -4250,7 +4271,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) &write_bitmap_index, N_("write a bitmap index if possible"), WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN), - OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter), + OPT_CALLBACK(0, "filter", &pfd, N_("args"), N_("object filtering"), + opt_parse_list_objects_filter_init), OPT_CALLBACK_F(0, "missing", NULL, N_("action"), N_("handling for missing objects"), PARSE_OPT_NONEG, option_parse_missing_action), @@ -4269,8 +4291,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) read_replace_refs = 0; - repo_init_revisions(the_repository, &revs, NULL); - sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1); if (the_repository->gitdir) { prepare_repo_settings(the_repository); @@ -4372,7 +4392,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (revs.filter.choice) { + if (pfd.have_revs && pfd.revs.filter.choice) { if (!pack_to_stdout) die(_("cannot use --filter without --stdout")); if (stdin_packs) @@ -4459,10 +4479,16 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) read_cruft_objects(); } else if (!use_internal_rev_list) { read_object_list_from_stdin(); + } else if (pfd.have_revs) { + get_object_list(&pfd.revs, rp.nr, rp.v); + release_revisions(&pfd.revs); } else { + struct rev_info revs; + + repo_init_revisions(the_repository, &revs, NULL); get_object_list(&revs, rp.nr, rp.v); + release_revisions(&revs); } - release_revisions(&revs); cleanup_preferred_base(); if (include_tag && nr_result) for_each_tag_ref(add_ref_tag, NULL); ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 12:24 ` Ævar Arnfjörð Bjarmason @ 2022-11-28 15:16 ` René Scharfe 2022-11-28 15:27 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 52+ messages in thread From: René Scharfe @ 2022-11-28 15:16 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King Am 28.11.2022 um 13:24 schrieb Ævar Arnfjörð Bjarmason: > > On Mon, Nov 28 2022, Ævar Arnfjörð Bjarmason wrote: > > René: > >> On Mon, Nov 28 2022, René Scharfe wrote: >> >>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano: >>>> René Scharfe <l.s.r@web.de> writes: >>>> >>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518. >>>>> >>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, >>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by >>>>> calling repo_init_revisions() only when the .filter member was actually >>>>> needed, but then still leaking it. That was fixed later by 2108fe4a19 >>>>> (revisions API users: add straightforward release_revisions(), >>>>> 2022-04-13), making the reverted commit unnecessary. >>>> >>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange >>>> way. >>>> >>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917 >>>> >>>> Does anybody want to help looking into it? >> >> [I see we crossed E-Mails]: >> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ >> >>> The patch exposes that release_revisions() leaks the diffopt allocations >>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add >>> a TODO for diff_free(&revs->diffopt), 2022-04-14). >> >> That's correct, and we have that leak in various places in our codebase, >> but per the above side-thread I think this is primarily exposing that >> we're setting up the "struct rev_info" with your change when we don't >> need to. Why can't we just skip it? >> >> Yeah, if we do set it up we'll run into an outstanding leak, and that >> should also be fixed (I have some local patches...), but the other cases >> I know of where we'll leak that data is where we're actually using the >> "struct rev_info". >> >> I haven't tried tearing your change apart to poke at it myself, and >> maybe there's some really good reason for why you can't separate getting >> rid of the J.5.7 dependency and removing the lazy-init. >> >>> The patch below plugs it locally. >>> >>> --- >8 --- >>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions() >>> >>> Signed-off-by: René Scharfe <l.s.r@web.de> >>> --- >>> builtin/pack-objects.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c >>> index 3e74fbb0cd..a47a3f0fba 100644 >>> --- a/builtin/pack-objects.c >>> +++ b/builtin/pack-objects.c >>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) >>> } else { >>> get_object_list(&revs, rp.nr, rp.v); >>> } >>> + diff_free(&revs.diffopt); >>> release_revisions(&revs); >>> cleanup_preferred_base(); >>> if (include_tag && nr_result) >> >> So, the main motivation for the change was paranoia that a compiler or >> platform might show up without J.5.7 support and that would bite us, but >> we're now adding a double-free-in-waiting? >> >> I think we're both a bit paranoid, but clearly have different >> paranoia-priorities :) >> >> If we do end up with some hack like this instead of fixing the >> underlying problem I'd much prefer that such a hack just be an UNLEAK() >> here. >> >> I.e. we have a destructor for "revs.*" already, let's not bypass it and >> start freeing things from under it, which will result in a double-free >> if we forget this callsite once the TODO in 54c8a7c379 is addressed. >> >> As you'd see if you made release_revisions() simply call >> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge >> cases. >> >> I haven't dug into this one, but offhand I'm not confident in saying >> that this isn't exposing us to some aspect of that gnarlyness (maybe >> not, it's been a while since I looked). >> >> (IIRC some of the most gnarly edge cases will only show up as CI >> failures on Windows, to do with the ordering of when we'll fclose() >> files hanging off that "diffopt"). > > This squashed into 3/3 seems to me to be a proper fix to a change that > wants to refactor the code for non-J.5.7 compatibility. I.e. this just > does the data<->fp casting part of the change, without refactoring the > "lazy init". That works, but lazy code is more complicated and there is no benefit here -- eager allocations are not noticably slow or big. Laziness hides leaks in corners, i.e. requiring invocations with uncommon options to trigger them. > > But I think you should check this a bit more carefully. Your 3/3 says > that your change "mak[es] the reverted commit unnecessary" No, it says that _your_ change 2108fe4a19 (revisions API users: add straightforward release_revisions(), 2022-04-13) made it unnecessary. > , but as I > noted if you'd run the command that commit shows, you'd have seen you're > re-introducing the leak it fixed. So I wonder what else has been missed > here. 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28) did not plug the leak. It only moved it to the corner that handles the --filter option. That leak is only interesting to Git developers and harmless for users. But if the goal is to become free of trivial leaks in order to allow using tools like LeakSanitizer to find real ones then pushing them into the shadows not yet reached by our test coverage won't help for long. > I vaguely recall that one reason I ended up with that J.5.7 dependency > was because there was an objection to mocking up the "struct option" as > I'm doing here. I.e. here we assume that the > opt_parse_list_objects_filter() is only ever going to care about the > "value" member. It's probably fine, but unnecessarily complicated compared to calling repo_init_revisions() eagerly. > > I think that's probably fine, but I may be misrecalling, or missing some > crucial detail. I'll leave digging that up & convincing us that it's > fine to the person pushing for refactoring all of this :) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 3e74fbb0cd5..faf210bfe8c 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -4149,6 +4149,27 @@ static int option_parse_cruft_expiration(const struct option *opt, > return 0; > } > > +struct po_filter_data { > + unsigned have_revs:1; > + struct rev_info revs; > +}; > + > +static int opt_parse_list_objects_filter_init(const struct option *opt, > + const char *arg, int unset) > +{ > + struct po_filter_data *data = opt->value; > + struct rev_info *revs = &data->revs; > + const struct option opt_rev = { > + .value = (void *)&revs->filter, > + }; > + > + if (!data->have_revs) > + repo_init_revisions(the_repository, revs, NULL); > + data->have_revs = 1; > + > + return opt_parse_list_objects_filter(&opt_rev, arg, unset); > +} > + > int cmd_pack_objects(int argc, const char **argv, const char *prefix) > { > int use_internal_rev_list = 0; > @@ -4159,7 +4180,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > int rev_list_index = 0; > int stdin_packs = 0; > struct string_list keep_pack_list = STRING_LIST_INIT_NODUP; > - struct rev_info revs; > + struct po_filter_data pfd = { .have_revs = 0 }; > > struct option pack_objects_options[] = { > OPT_SET_INT('q', "quiet", &progress, > @@ -4250,7 +4271,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > &write_bitmap_index, > N_("write a bitmap index if possible"), > WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN), > - OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter), > + OPT_CALLBACK(0, "filter", &pfd, N_("args"), N_("object filtering"), > + opt_parse_list_objects_filter_init), > OPT_CALLBACK_F(0, "missing", NULL, N_("action"), > N_("handling for missing objects"), PARSE_OPT_NONEG, > option_parse_missing_action), > @@ -4269,8 +4291,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > > read_replace_refs = 0; > > - repo_init_revisions(the_repository, &revs, NULL); > - > sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1); > if (the_repository->gitdir) { > prepare_repo_settings(the_repository); > @@ -4372,7 +4392,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > if (!rev_list_all || !rev_list_reflog || !rev_list_index) > unpack_unreachable_expiration = 0; > > - if (revs.filter.choice) { > + if (pfd.have_revs && pfd.revs.filter.choice) { > if (!pack_to_stdout) > die(_("cannot use --filter without --stdout")); > if (stdin_packs) > @@ -4459,10 +4479,16 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > read_cruft_objects(); > } else if (!use_internal_rev_list) { > read_object_list_from_stdin(); > + } else if (pfd.have_revs) { > + get_object_list(&pfd.revs, rp.nr, rp.v); > + release_revisions(&pfd.revs); > } else { > + struct rev_info revs; > + > + repo_init_revisions(the_repository, &revs, NULL); > get_object_list(&revs, rp.nr, rp.v); > + release_revisions(&revs); > } > - release_revisions(&revs); > cleanup_preferred_base(); > if (include_tag && nr_result) > for_each_tag_ref(add_ref_tag, NULL); ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 15:16 ` René Scharfe @ 2022-11-28 15:27 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-28 15:27 UTC (permalink / raw) To: René Scharfe Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King On Mon, Nov 28 2022, René Scharfe wrote: > Am 28.11.2022 um 13:24 schrieb Ævar Arnfjörð Bjarmason: >> >> On Mon, Nov 28 2022, Ævar Arnfjörð Bjarmason wrote: >> >> René: >> >>> On Mon, Nov 28 2022, René Scharfe wrote: >>> >>>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano: >>>>> René Scharfe <l.s.r@web.de> writes: >>>>> >>>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518. >>>>>> >>>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, >>>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by >>>>>> calling repo_init_revisions() only when the .filter member was actually >>>>>> needed, but then still leaking it. That was fixed later by 2108fe4a19 >>>>>> (revisions API users: add straightforward release_revisions(), >>>>>> 2022-04-13), making the reverted commit unnecessary. >>>>> >>>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange >>>>> way. >>>>> >>>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917 >>>>> >>>>> Does anybody want to help looking into it? >>> >>> [I see we crossed E-Mails]: >>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ >>> >>>> The patch exposes that release_revisions() leaks the diffopt allocations >>>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add >>>> a TODO for diff_free(&revs->diffopt), 2022-04-14). >>> >>> That's correct, and we have that leak in various places in our codebase, >>> but per the above side-thread I think this is primarily exposing that >>> we're setting up the "struct rev_info" with your change when we don't >>> need to. Why can't we just skip it? >>> >>> Yeah, if we do set it up we'll run into an outstanding leak, and that >>> should also be fixed (I have some local patches...), but the other cases >>> I know of where we'll leak that data is where we're actually using the >>> "struct rev_info". >>> >>> I haven't tried tearing your change apart to poke at it myself, and >>> maybe there's some really good reason for why you can't separate getting >>> rid of the J.5.7 dependency and removing the lazy-init. >>> >>>> The patch below plugs it locally. >>>> >>>> --- >8 --- >>>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions() >>>> >>>> Signed-off-by: René Scharfe <l.s.r@web.de> >>>> --- >>>> builtin/pack-objects.c | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c >>>> index 3e74fbb0cd..a47a3f0fba 100644 >>>> --- a/builtin/pack-objects.c >>>> +++ b/builtin/pack-objects.c >>>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) >>>> } else { >>>> get_object_list(&revs, rp.nr, rp.v); >>>> } >>>> + diff_free(&revs.diffopt); >>>> release_revisions(&revs); >>>> cleanup_preferred_base(); >>>> if (include_tag && nr_result) >>> >>> So, the main motivation for the change was paranoia that a compiler or >>> platform might show up without J.5.7 support and that would bite us, but >>> we're now adding a double-free-in-waiting? >>> >>> I think we're both a bit paranoid, but clearly have different >>> paranoia-priorities :) >>> >>> If we do end up with some hack like this instead of fixing the >>> underlying problem I'd much prefer that such a hack just be an UNLEAK() >>> here. >>> >>> I.e. we have a destructor for "revs.*" already, let's not bypass it and >>> start freeing things from under it, which will result in a double-free >>> if we forget this callsite once the TODO in 54c8a7c379 is addressed. >>> >>> As you'd see if you made release_revisions() simply call >>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge >>> cases. >>> >>> I haven't dug into this one, but offhand I'm not confident in saying >>> that this isn't exposing us to some aspect of that gnarlyness (maybe >>> not, it's been a while since I looked). >>> >>> (IIRC some of the most gnarly edge cases will only show up as CI >>> failures on Windows, to do with the ordering of when we'll fclose() >>> files hanging off that "diffopt"). >> >> This squashed into 3/3 seems to me to be a proper fix to a change that >> wants to refactor the code for non-J.5.7 compatibility. I.e. this just >> does the data<->fp casting part of the change, without refactoring the >> "lazy init". > > That works, but lazy code is more complicated and there is no benefit > here -- eager allocations are not noticably slow or big. Laziness > hides leaks in corners, i.e. requiring invocations with uncommon > options to trigger them. Yes, sometimes it's easier to just set everything up at the beginning. As for hiding leaks I think the empirical data here is going against that, i.e. your change introduced a leak. I don't think it's realistic that we'll have the side that assigns to "have_revs" drift from the corresponding code in cmd_pack_objects(). >> But I think you should check this a bit more carefully. Your 3/3 says >> that your change "mak[es] the reverted commit unnecessary" > > No, it says that _your_ change 2108fe4a19 (revisions API users: add > straightforward release_revisions(), 2022-04-13) made it unnecessary. Yes, I'm saying that's not correct, because if you run the command that 5cb28270a1 prominently notes we'll now leak with this revert: echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial But yes with just 5cb28270a1 didn't add release_revisions(), that came shortly afterwards in 2108fe4a19. >> , but as I >> noted if you'd run the command that commit shows, you'd have seen you're >> re-introducing the leak it fixed. So I wonder what else has been missed >> here. > > 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, > 2022-03-28) did not plug the leak. It only moved it to the corner that > handles the --filter option. I think we're using "the leak" here differently. I mean callstacks that LeakSanitizer emits & tests we have that do & don't pass with SANITIZE=leak. But yes, there may be multiple paths through a function, some of which leak, some of which don't. I'm not saying that the entire set of API features that builtin/pack-objects.c uses in the revision API is leak-free. > That leak is only interesting to Git developers and harmless for users. > But if the goal is to become free of trivial leaks in order to allow > using tools like LeakSanitizer to find real ones then pushing them into > the shadows not yet reached by our test coverage won't help for long. It's clearly helping in this case, as our CI had multiple failing tests. >> I vaguely recall that one reason I ended up with that J.5.7 dependency >> was because there was an objection to mocking up the "struct option" as >> I'm doing here. I.e. here we assume that the >> opt_parse_list_objects_filter() is only ever going to care about the >> "value" member. > > It's probably fine, but unnecessarily complicated compared to calling > repo_init_revisions() eagerly. I'm leaving aside the question of whether we should go for some version of the refactoring in your 3/3. What I am saying is that such refactoring should be split up from the more narrow bug fix to the existing code. I.e. this as a replacement for your 3/3 is all that's needed to pass the test you're adding in 2/3. -- >8 -- From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= <l.s.r@web.de> Subject: [PATCH] pack-objects: support multiple --filter options again 5cb28270a1f (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28) broke support for multiple --filter options by calling repo_init_revisions() every time "--filter" was seen. Instead we should only do so the first time, and subsequently append to the existing filter data. Helped-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- builtin/pack-objects.c | 3 ++- t/t5317-pack-objects-filter-objects.sh | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 573d0b20b76..c702c09dd45 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value) { struct po_filter_data *data = value; - repo_init_revisions(the_repository, &data->revs, NULL); + if (!data->have_revs) + repo_init_revisions(the_repository, &data->revs, NULL); data->have_revs = 1; return &data->revs.filter; diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh index 25faebaada8..5b707d911b5 100755 --- a/t/t5317-pack-objects-filter-objects.sh +++ b/t/t5317-pack-objects-filter-objects.sh @@ -265,7 +265,7 @@ test_expect_success 'verify normal and blob:limit packfiles have same commits/tr test_cmp expected observed ' -test_expect_failure 'verify small limit and big limit results in small limit' ' +test_expect_success 'verify small limit and big limit results in small limit' ' git -C r2 ls-files -s large.1000 >ls_files_result && test_parse_ls_files_stage_oids <ls_files_result | sort >expected && -- 2.39.0.rc0.993.g0c499e58e3b ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 11:31 ` Ævar Arnfjörð Bjarmason 2022-11-28 12:24 ` Ævar Arnfjörð Bjarmason @ 2022-11-28 14:29 ` René Scharfe 2022-11-28 14:34 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 52+ messages in thread From: René Scharfe @ 2022-11-28 14:29 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King Am 28.11.2022 um 12:31 schrieb Ævar Arnfjörð Bjarmason: > > On Mon, Nov 28 2022, René Scharfe wrote: > >> Am 28.11.2022 um 11:03 schrieb Junio C Hamano: >>> René Scharfe <l.s.r@web.de> writes: >>> >>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518. >>>> >>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, >>>> 2022-03-28) avoided leaking rev_info allocations in many cases by >>>> calling repo_init_revisions() only when the .filter member was actually >>>> needed, but then still leaking it. That was fixed later by 2108fe4a19 >>>> (revisions API users: add straightforward release_revisions(), >>>> 2022-04-13), making the reverted commit unnecessary. >>> >>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange >>> way. >>> >>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917 >>> >>> Does anybody want to help looking into it? > > [I see we crossed E-Mails]: > https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ > >> The patch exposes that release_revisions() leaks the diffopt allocations >> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add >> a TODO for diff_free(&revs->diffopt), 2022-04-14). > > That's correct, and we have that leak in various places in our codebase, > but per the above side-thread I think this is primarily exposing that > we're setting up the "struct rev_info" with your change when we don't > need to. Why can't we just skip it? I have no idea how to stop get_object_list() from using struct rev_info. We could let it take a struct list_objects_filter_options pointer instead and have it build a struct rev_info internally, but that would just move the problem, not solve it. > Yeah, if we do set it up we'll run into an outstanding leak, and that > should also be fixed (I have some local patches...), but the other cases > I know of where we'll leak that data is where we're actually using the > "struct rev_info". > > I haven't tried tearing your change apart to poke at it myself, and > maybe there's some really good reason for why you can't separate getting > rid of the J.5.7 dependency and removing the lazy-init. > >> The patch below plugs it locally. >> >> --- >8 --- >> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions() >> >> Signed-off-by: René Scharfe <l.s.r@web.de> >> --- >> builtin/pack-objects.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c >> index 3e74fbb0cd..a47a3f0fba 100644 >> --- a/builtin/pack-objects.c >> +++ b/builtin/pack-objects.c >> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) >> } else { >> get_object_list(&revs, rp.nr, rp.v); >> } >> + diff_free(&revs.diffopt); >> release_revisions(&revs); >> cleanup_preferred_base(); >> if (include_tag && nr_result) > > So, the main motivation for the change was paranoia that a compiler or > platform might show up without J.5.7 support and that would bite us, but > we're now adding a double-free-in-waiting? > > I think we're both a bit paranoid, but clearly have different > paranoia-priorities :) > > If we do end up with some hack like this instead of fixing the > underlying problem I'd much prefer that such a hack just be an UNLEAK() > here. > > I.e. we have a destructor for "revs.*" already, let's not bypass it and > start freeing things from under it, which will result in a double-free > if we forget this callsite once the TODO in 54c8a7c379 is addressed. Well, that TODO fix should remove this new diff_free() call, but I agree that this is fragile. Removing the "TEST_PASSES_SANITIZE_LEAK=true" line from affected tests is probably better. > As you'd see if you made release_revisions() simply call > diff_free(&revs.diffopt) doing so would reveal some really gnarly edge > cases. That was my first attempt; it breaks lots of tests due to double frees. > I haven't dug into this one, but offhand I'm not confident in saying > that this isn't exposing us to some aspect of that gnarlyness (maybe > not, it's been a while since I looked). I saw it as the way towards a release_revisions() that calls diff_free() itself: Add such calls to each of them, fix the "gnarlyness" individually, finally move them all into release_revisions(). The only problem is that there are 60+ callsites. > (IIRC some of the most gnarly edge cases will only show up as CI > failures on Windows, to do with the ordering of when we'll fclose() > files hanging off that "diffopt"). Fun. René ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 14:29 ` René Scharfe @ 2022-11-28 14:34 ` Ævar Arnfjörð Bjarmason 2022-11-28 15:56 ` René Scharfe 0 siblings, 1 reply; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-28 14:34 UTC (permalink / raw) To: René Scharfe Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King On Mon, Nov 28 2022, René Scharfe wrote: > Am 28.11.2022 um 12:31 schrieb Ævar Arnfjörð Bjarmason: >> >> On Mon, Nov 28 2022, René Scharfe wrote: >> >>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano: >>>> René Scharfe <l.s.r@web.de> writes: >>>> >>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518. >>>>> >>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, >>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by >>>>> calling repo_init_revisions() only when the .filter member was actually >>>>> needed, but then still leaking it. That was fixed later by 2108fe4a19 >>>>> (revisions API users: add straightforward release_revisions(), >>>>> 2022-04-13), making the reverted commit unnecessary. >>>> >>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange >>>> way. >>>> >>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917 >>>> >>>> Does anybody want to help looking into it? >> >> [I see we crossed E-Mails]: >> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ >> >>> The patch exposes that release_revisions() leaks the diffopt allocations >>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add >>> a TODO for diff_free(&revs->diffopt), 2022-04-14). >> >> That's correct, and we have that leak in various places in our codebase, >> but per the above side-thread I think this is primarily exposing that >> we're setting up the "struct rev_info" with your change when we don't >> need to. Why can't we just skip it? > > I have no idea how to stop get_object_list() from using struct rev_info. > We could let it take a struct list_objects_filter_options pointer > instead and have it build a struct rev_info internally, but that would > just move the problem, not solve it. I mean skip it when it's not needed, it's needed when we call get_object_list(). But what "problem" is being caused by get_object_list()? That there's some other case(s) where it'll leak still? I haven't checked, I think we should leave that for some other time if there's such leaks, and just not introduce any new leaks in this topic. >> Yeah, if we do set it up we'll run into an outstanding leak, and that >> should also be fixed (I have some local patches...), but the other cases >> I know of where we'll leak that data is where we're actually using the >> "struct rev_info". >> >> I haven't tried tearing your change apart to poke at it myself, and >> maybe there's some really good reason for why you can't separate getting >> rid of the J.5.7 dependency and removing the lazy-init. >> >>> The patch below plugs it locally. >>> >>> --- >8 --- >>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions() >>> >>> Signed-off-by: René Scharfe <l.s.r@web.de> >>> --- >>> builtin/pack-objects.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c >>> index 3e74fbb0cd..a47a3f0fba 100644 >>> --- a/builtin/pack-objects.c >>> +++ b/builtin/pack-objects.c >>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) >>> } else { >>> get_object_list(&revs, rp.nr, rp.v); >>> } >>> + diff_free(&revs.diffopt); >>> release_revisions(&revs); >>> cleanup_preferred_base(); >>> if (include_tag && nr_result) >> >> So, the main motivation for the change was paranoia that a compiler or >> platform might show up without J.5.7 support and that would bite us, but >> we're now adding a double-free-in-waiting? >> >> I think we're both a bit paranoid, but clearly have different >> paranoia-priorities :) >> >> If we do end up with some hack like this instead of fixing the >> underlying problem I'd much prefer that such a hack just be an UNLEAK() >> here. >> >> I.e. we have a destructor for "revs.*" already, let's not bypass it and >> start freeing things from under it, which will result in a double-free >> if we forget this callsite once the TODO in 54c8a7c379 is addressed. > > Well, that TODO fix should remove this new diff_free() call, but I > agree that this is fragile. > > Removing the "TEST_PASSES_SANITIZE_LEAK=true" line from affected tests > is probably better. Or just not introduce new leaks, per my suggested fix-up at https://lore.kernel.org/git/221128.86zgcbl0pe.gmgdl@evledraar.gmail.com/ (which it looks like you haven't seen when this E-Mail is composed...). >> As you'd see if you made release_revisions() simply call >> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge >> cases. > > That was my first attempt; it breaks lots of tests due to double frees. Right, to be clear I'm saying that none of this is needed right now, i.e. I don't get why we'd want the scope-creep past the hunk I noted in https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ for the --filter bug fix (plus the tests you're adding). >> I haven't dug into this one, but offhand I'm not confident in saying >> that this isn't exposing us to some aspect of that gnarlyness (maybe >> not, it's been a while since I looked). > > I saw it as the way towards a release_revisions() that calls diff_free() > itself: Add such calls to each of them, fix the "gnarlyness" > individually, finally move them all into release_revisions(). The only > problem is that there are 60+ callsites. I think this is a really bad approach in general. Yes, it may happen to work to free() some data from under an API, but it's just as likely that we'll miss that this one caller is screwing with its internal state, and e.g. when some new revision.c code is used it'll go boom. If we wanted to phase in such a free() of "foo" I think the right way would be to add some "revs.free_foo = 1" flag, giving the API a chance to treat that sanely, not to start poking at members of the struct, and assuming that its release() won't be free()-ing them. But as noted above & in the linked I think we can defer all of that. The only reason we're discussing this is because you're changing the lazy-init to be not-lazy, and introducing new leaks as a result. I've shown a couple of approaches in this thread of fixing the issue(s) at hand without introducing such leaks, so ... ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 14:34 ` Ævar Arnfjörð Bjarmason @ 2022-11-28 15:56 ` René Scharfe 2022-11-28 17:57 ` René Scharfe 2022-11-28 17:57 ` Ævar Arnfjörð Bjarmason 0 siblings, 2 replies; 52+ messages in thread From: René Scharfe @ 2022-11-28 15:56 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King Am 28.11.2022 um 15:34 schrieb Ævar Arnfjörð Bjarmason: > > On Mon, Nov 28 2022, René Scharfe wrote: > >> Am 28.11.2022 um 12:31 schrieb Ævar Arnfjörð Bjarmason: >>> >>> On Mon, Nov 28 2022, René Scharfe wrote: >>> >>>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano: >>>>> René Scharfe <l.s.r@web.de> writes: >>>>> >>>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518. >>>>>> >>>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, >>>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by >>>>>> calling repo_init_revisions() only when the .filter member was actually >>>>>> needed, but then still leaking it. That was fixed later by 2108fe4a19 >>>>>> (revisions API users: add straightforward release_revisions(), >>>>>> 2022-04-13), making the reverted commit unnecessary. >>>>> >>>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange >>>>> way. >>>>> >>>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917 >>>>> >>>>> Does anybody want to help looking into it? >>> >>> [I see we crossed E-Mails]: >>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ >>> >>>> The patch exposes that release_revisions() leaks the diffopt allocations >>>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add >>>> a TODO for diff_free(&revs->diffopt), 2022-04-14). >>> >>> That's correct, and we have that leak in various places in our codebase, >>> but per the above side-thread I think this is primarily exposing that >>> we're setting up the "struct rev_info" with your change when we don't >>> need to. Why can't we just skip it? >> >> I have no idea how to stop get_object_list() from using struct rev_info. >> We could let it take a struct list_objects_filter_options pointer >> instead and have it build a struct rev_info internally, but that would >> just move the problem, not solve it. > > I mean skip it when it's not needed, it's needed when we call > get_object_list(). > > But what "problem" is being caused by get_object_list()? That there's > some other case(s) where it'll leak still? I haven't checked, I think we > should leave that for some other time if there's such leaks, and just > not introduce any new leaks in this topic. The problem is "How to use struct rev_info without leaks?". No matter where you move it, the leak will be present until the TODO in release_revisions() is done. > >>> Yeah, if we do set it up we'll run into an outstanding leak, and that >>> should also be fixed (I have some local patches...), but the other cases >>> I know of where we'll leak that data is where we're actually using the >>> "struct rev_info". >>> >>> I haven't tried tearing your change apart to poke at it myself, and >>> maybe there's some really good reason for why you can't separate getting >>> rid of the J.5.7 dependency and removing the lazy-init. >>> >>>> The patch below plugs it locally. >>>> >>>> --- >8 --- >>>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions() >>>> >>>> Signed-off-by: René Scharfe <l.s.r@web.de> >>>> --- >>>> builtin/pack-objects.c | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c >>>> index 3e74fbb0cd..a47a3f0fba 100644 >>>> --- a/builtin/pack-objects.c >>>> +++ b/builtin/pack-objects.c >>>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) >>>> } else { >>>> get_object_list(&revs, rp.nr, rp.v); >>>> } >>>> + diff_free(&revs.diffopt); >>>> release_revisions(&revs); >>>> cleanup_preferred_base(); >>>> if (include_tag && nr_result) >>> >>> So, the main motivation for the change was paranoia that a compiler or >>> platform might show up without J.5.7 support and that would bite us, but >>> we're now adding a double-free-in-waiting? >>> >>> I think we're both a bit paranoid, but clearly have different >>> paranoia-priorities :) >>> >>> If we do end up with some hack like this instead of fixing the >>> underlying problem I'd much prefer that such a hack just be an UNLEAK() >>> here. >>> >>> I.e. we have a destructor for "revs.*" already, let's not bypass it and >>> start freeing things from under it, which will result in a double-free >>> if we forget this callsite once the TODO in 54c8a7c379 is addressed. >> >> Well, that TODO fix should remove this new diff_free() call, but I >> agree that this is fragile. >> >> Removing the "TEST_PASSES_SANITIZE_LEAK=true" line from affected tests >> is probably better. > > Or just not introduce new leaks, per my suggested fix-up at > https://lore.kernel.org/git/221128.86zgcbl0pe.gmgdl@evledraar.gmail.com/ > (which it looks like you haven't seen when this E-Mail is composed...). Not adding leaks is a good idea. AFAICS none of my patches so far add any. Patch 3 of v2 exposes an existing one that was only triggered by the --filter option before. Which is also not ideal, of course, but giving it more visibility hopefully will motivate a proper fix. >>> As you'd see if you made release_revisions() simply call >>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge >>> cases. >> >> That was my first attempt; it breaks lots of tests due to double frees. > > Right, to be clear I'm saying that none of this is needed right now, > i.e. I don't get why we'd want the scope-creep past the hunk I noted in > https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ > for the --filter bug fix (plus the tests you're adding). Well, you asked to squash the minimal fix into the laziness removal in https://lore.kernel.org/git/221112.86bkpcmm6i.gmgdl@evledraar.gmail.com/ Reverting 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28) wholesale is the simplest way to reach the goals of regression fix, simplification and standard compliance. Except that the leak check tests have come to depend on the leak being hidden in the --filter corner. So going back to v1 sure seems attractive. > >>> I haven't dug into this one, but offhand I'm not confident in saying >>> that this isn't exposing us to some aspect of that gnarlyness (maybe >>> not, it's been a while since I looked). >> >> I saw it as the way towards a release_revisions() that calls diff_free() >> itself: Add such calls to each of them, fix the "gnarlyness" >> individually, finally move them all into release_revisions(). The only >> problem is that there are 60+ callsites. > > I think this is a really bad approach in general. > > Yes, it may happen to work to free() some data from under an API, but > it's just as likely that we'll miss that this one caller is screwing > with its internal state, and e.g. when some new revision.c code is used > it'll go boom. > > If we wanted to phase in such a free() of "foo" I think the right way > would be to add some "revs.free_foo = 1" flag, giving the API a chance > to treat that sanely, not to start poking at members of the struct, and > assuming that its release() won't be free()-ing them. And that's why you added no_free to struct diff_options. We could use it here by setting it in repo_init_revisions() and unsetting in cmd_pack_objects() and elsewhere, until it is set everywhere. > But as noted above & in the linked I think we can defer all of that. The > only reason we're discussing this is because you're changing the > lazy-init to be not-lazy, and introducing new leaks as a result.> > I've shown a couple of approaches in this thread of fixing the issue(s) > at hand without introducing such leaks, so ... As noted above: These leaks are not new, they are just moved into test coverage. René ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 15:56 ` René Scharfe @ 2022-11-28 17:57 ` René Scharfe 2022-11-28 18:32 ` Ævar Arnfjörð Bjarmason 2022-11-28 17:57 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 52+ messages in thread From: René Scharfe @ 2022-11-28 17:57 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King Am 28.11.2022 um 16:56 schrieb René Scharfe:> > The problem is "How to use struct rev_info without leaks?". No matter > where you move it, the leak will be present until the TODO in > release_revisions() is done. Wrong. The following sequence leaks: struct rev_info revs; repo_init_revisions(the_repository, &revs, NULL); release_revisions(&revs); ... and this here doesn't: struct rev_info revs; repo_init_revisions(the_repository, &revs, NULL); setup_revisions(0, NULL, &revs, NULL); // leak plugger release_revisions(&revs); That's because setup_revisions() calls diff_setup_done(), which frees revs->diffopt.parseopts, and release_revisions() doesn't. And since builtin/pack-objects.c::get_object_list() calls setup_revisions(), it really frees that memory, as you claimed from the start. Sorry, I was somehow assuming that a setup function wouldn't clean up. D'oh! The first sequence is used in some other places. e.g. builtin/prune.c. But there LeakSanitizer doesn't complain for some reason; Valgrind reports the parseopts allocation as "possibly lost". I still think the assumption that "init_x(x); release_x(x);" doesn't leak is reasonable. Let's make it true. How about this? It's safe in the sense that we don't risk double frees and it's close to the TODO comment so we probably won't forget removing it once diff_free() becomes used. --- revision.c | 1 + 1 file changed, 1 insertion(+) diff --git a/revision.c b/revision.c index 439e34a7c5..6a51ef9418 100644 --- a/revision.c +++ b/revision.c @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs) release_revisions_mailmap(revs->mailmap); free_grep_patterns(&revs->grep_filter); /* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */ + FREE_AND_NULL(revs->diffopt.parseopts); diff_free(&revs->pruning); reflog_walk_info_release(revs->reflog_info); release_revisions_topo_walk_info(revs->topo_walk_info); -- 2.30.2 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 17:57 ` René Scharfe @ 2022-11-28 18:32 ` Ævar Arnfjörð Bjarmason 2022-11-28 21:57 ` René Scharfe 0 siblings, 1 reply; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-28 18:32 UTC (permalink / raw) To: René Scharfe Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King On Mon, Nov 28 2022, René Scharfe wrote: > Am 28.11.2022 um 16:56 schrieb René Scharfe:> >> The problem is "How to use struct rev_info without leaks?". No matter >> where you move it, the leak will be present until the TODO in >> release_revisions() is done. > > Wrong. The following sequence leaks: > > struct rev_info revs; > repo_init_revisions(the_repository, &revs, NULL); > release_revisions(&revs); > > ... and this here doesn't: > > struct rev_info revs; > repo_init_revisions(the_repository, &revs, NULL); > setup_revisions(0, NULL, &revs, NULL); // leak plugger > release_revisions(&revs); > > That's because setup_revisions() calls diff_setup_done(), which frees > revs->diffopt.parseopts, and release_revisions() doesn't. > > And since builtin/pack-objects.c::get_object_list() calls > setup_revisions(), it really frees that memory, as you claimed from the > start. Sorry, I was somehow assuming that a setup function wouldn't > clean up. D'oh! > > The first sequence is used in some other places. e.g. builtin/prune.c. > But there LeakSanitizer doesn't complain for some reason; Valgrind > reports the parseopts allocation as "possibly lost". Yes, some of the interactions are tricky. It's really useful to run the tests with GIT_TEST_PASSING_SANITIZE_LEAK=[true|check] (see t/README) to check these sorts of assumptions for sanity. > I still think the assumption that "init_x(x); release_x(x);" doesn't > leak is reasonable. Let's make it true. How about this? It's safe > in the sense that we don't risk double frees and it's close to the > TODO comment so we probably won't forget removing it once diff_free() > becomes used. > > --- > revision.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/revision.c b/revision.c > index 439e34a7c5..6a51ef9418 100644 > --- a/revision.c > +++ b/revision.c > @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs) > release_revisions_mailmap(revs->mailmap); > free_grep_patterns(&revs->grep_filter); > /* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */ > + FREE_AND_NULL(revs->diffopt.parseopts); > diff_free(&revs->pruning); > reflog_walk_info_release(revs->reflog_info); > release_revisions_topo_walk_info(revs->topo_walk_info); At this point I'm unclear on what & why this is needed? I.e. once we narrowly fix the >1 "--filter" options what still fails? But in general: I don't really think this sort of thing is worth it. Here we're reaching into a member of "revs->diffopt" behind its back rather than calling diff_free(). I think we should just focus on being able to do do that safely. WIP patches I have in that direction, partially based on your previous "is_dead" suggestion: https://github.com/avar/git/commit/e02a15f6206 https://github.com/avar/git/commit/c718f36566a I haven't poked at that in a while, I think the only outstanding issue with it is that fclose() interaction. I think for this particular thing there aren't going to be any bad side-effects in practice, but I also think convincing oneself of that basically means putting the same amount of work in as just fixing some of these properly. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 18:32 ` Ævar Arnfjörð Bjarmason @ 2022-11-28 21:57 ` René Scharfe 2022-11-29 1:26 ` Jeff King 2022-11-29 7:12 ` Ævar Arnfjörð Bjarmason 0 siblings, 2 replies; 52+ messages in thread From: René Scharfe @ 2022-11-28 21:57 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King Am 28.11.2022 um 19:32 schrieb Ævar Arnfjörð Bjarmason: > > On Mon, Nov 28 2022, René Scharfe wrote: > >> Am 28.11.2022 um 16:56 schrieb René Scharfe:> >>> The problem is "How to use struct rev_info without leaks?". No matter >>> where you move it, the leak will be present until the TODO in >>> release_revisions() is done. >> >> Wrong. The following sequence leaks: >> >> struct rev_info revs; >> repo_init_revisions(the_repository, &revs, NULL); >> release_revisions(&revs); >> >> ... and this here doesn't: >> >> struct rev_info revs; >> repo_init_revisions(the_repository, &revs, NULL); >> setup_revisions(0, NULL, &revs, NULL); // leak plugger >> release_revisions(&revs); >> >> That's because setup_revisions() calls diff_setup_done(), which frees >> revs->diffopt.parseopts, and release_revisions() doesn't. >> >> And since builtin/pack-objects.c::get_object_list() calls >> setup_revisions(), it really frees that memory, as you claimed from the >> start. Sorry, I was somehow assuming that a setup function wouldn't >> clean up. D'oh! >> >> The first sequence is used in some other places. e.g. builtin/prune.c. >> But there LeakSanitizer doesn't complain for some reason; Valgrind >> reports the parseopts allocation as "possibly lost". > > Yes, some of the interactions are tricky. It's really useful to run the > tests with GIT_TEST_PASSING_SANITIZE_LEAK=[true|check] (see t/README) to > check these sorts of assumptions for sanity. That may be true, and looks even useful -- I didn't know the check value. I only get a strange error message, though: $ GIT_TEST_PASSING_SANITIZE_LEAK=check ./t0001-init.sh Bail out! GIT_TEST_PASSING_SANITIZE_LEAK=true has no effect except when compiled with SANITIZE=leak Same with make test and prove, of course. And of course I compiled with SANITIZE=leak beforehand. But I don't see a connection between my comment and yours. I was not running any tests, just the above sequences of function calls, e.g. in git prune. > >> I still think the assumption that "init_x(x); release_x(x);" doesn't >> leak is reasonable. Let's make it true. How about this? It's safe >> in the sense that we don't risk double frees and it's close to the >> TODO comment so we probably won't forget removing it once diff_free() >> becomes used. >> >> --- >> revision.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/revision.c b/revision.c >> index 439e34a7c5..6a51ef9418 100644 >> --- a/revision.c >> +++ b/revision.c >> @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs) >> release_revisions_mailmap(revs->mailmap); >> free_grep_patterns(&revs->grep_filter); >> /* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */ >> + FREE_AND_NULL(revs->diffopt.parseopts); >> diff_free(&revs->pruning); >> reflog_walk_info_release(revs->reflog_info); >> release_revisions_topo_walk_info(revs->topo_walk_info); > > At this point I'm unclear on what & why this is needed? I.e. once we > narrowly fix the >1 "--filter" options what still fails? As I wrote: A call to an initialization function followed by a call to a cleanup function and nothing else shouldn't leak. There are examples of repo_init_revisions()+release_revisions() without setup_revisions() or diff_setup_done() beyond pack-objects. I mentioned prune, but there are more, e.g. in sequencer.c. > But in general: I don't really think this sort of thing is worth > it. Here we're reaching into a member of "revs->diffopt" behind its back > rather than calling diff_free(). I think we should just focus on being > able to do do that safely. Sure, but the FREE_AND_NULL call is simple and safe, while diff_free() is complicated and calling it one time too many can hurt. > WIP patches I have in that direction, partially based on your previous > "is_dead" suggestion: > > https://github.com/avar/git/commit/e02a15f6206 > https://github.com/avar/git/commit/c718f36566a Copy-typed the interesting parts of the first patch like a medieval monk because there doesn't seem to be a download option. :-| > I haven't poked at that in a while, I think the only outstanding issue > with it is that fclose() interaction. You mean the t3702-add-edit.sh failure on Windows mentioned in the commit message of e02a15f6206? That's caused by the file being kept open and thus locked during the call of the editor. Moving the release_revisions() call in builtin/add.c::edit_patch() before the launch_editor() call fixes that by closing the file. > I think for this particular thing there aren't going to be any bad > side-effects in practice, but I also think convincing oneself of that > basically means putting the same amount of work in as just fixing some > of these properly. Not to me, but perhaps that TODO is easier solved that I expected. In any case, with the mentioned edit_patch() change described above e02a15f6206 passes the test suite on Windows for me. René ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 21:57 ` René Scharfe @ 2022-11-29 1:26 ` Jeff King 2022-11-29 1:46 ` Junio C Hamano 2022-11-29 10:25 ` Ævar Arnfjörð Bjarmason 2022-11-29 7:12 ` Ævar Arnfjörð Bjarmason 1 sibling, 2 replies; 52+ messages in thread From: Jeff King @ 2022-11-29 1:26 UTC (permalink / raw) To: René Scharfe Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Git List, Taylor Blau, Christian Couder On Mon, Nov 28, 2022 at 10:57:02PM +0100, René Scharfe wrote: > >> diff --git a/revision.c b/revision.c > >> index 439e34a7c5..6a51ef9418 100644 > >> --- a/revision.c > >> +++ b/revision.c > >> @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs) > >> release_revisions_mailmap(revs->mailmap); > >> free_grep_patterns(&revs->grep_filter); > >> /* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */ > >> + FREE_AND_NULL(revs->diffopt.parseopts); > >> diff_free(&revs->pruning); > >> reflog_walk_info_release(revs->reflog_info); > >> release_revisions_topo_walk_info(revs->topo_walk_info); > > > > At this point I'm unclear on what & why this is needed? I.e. once we > > narrowly fix the >1 "--filter" options what still fails? > > As I wrote: A call to an initialization function followed by a call to a > cleanup function and nothing else shouldn't leak. There are examples of > repo_init_revisions()+release_revisions() without setup_revisions() or > diff_setup_done() beyond pack-objects. I mentioned prune, but there are > more, e.g. in sequencer.c. I tend to agree with you; an init+release combo should release all memory. We _can_ work around it in the caller here, but I think we are better to correct the root cause. I do think what Ævar is saying is: once we have fixed the bug, why are more changes needed? I.e., why would we get rid of the lazy-init. IMHO the answer is that the lazy-init is a more complex pattern, and requires more boilerplate code (which can lead to more bugs, as we saw). So once the bug is fixed, this is purely about cleanup/simplification (if one ignores the C-standard issues), but I tend to think it is one worth doing. (Apologies to Ævar if I'm mis-stating your position). > > But in general: I don't really think this sort of thing is worth > > it. Here we're reaching into a member of "revs->diffopt" behind its back > > rather than calling diff_free(). I think we should just focus on being > > able to do do that safely. > > Sure, but the FREE_AND_NULL call is simple and safe, while diff_free() > is complicated and calling it one time too many can hurt. Again, I'd agree with you here. In the long run, yes, we want diff_free(). But if that is not ready yet, the FREE_AND_NULL() is a stepping stone that takes us in the right direction and which we can do immediately. > > WIP patches I have in that direction, partially based on your previous > > "is_dead" suggestion: > > > > https://github.com/avar/git/commit/e02a15f6206 > > https://github.com/avar/git/commit/c718f36566a > > Copy-typed the interesting parts of the first patch like a medieval monk > because there doesn't seem to be a download option. :-| This is the actual reason I responded to your message. ;) Try: https://github.com/avar/git/commit/e02a15f6206.patch etc. I don't think there's a "raw patch" link or similar in the interface, though. You have to just know about it. -Peff ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-29 1:26 ` Jeff King @ 2022-11-29 1:46 ` Junio C Hamano 2022-11-29 10:25 ` Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 52+ messages in thread From: Junio C Hamano @ 2022-11-29 1:46 UTC (permalink / raw) To: Jeff King Cc: René Scharfe, Ævar Arnfjörð Bjarmason, Git List, Taylor Blau, Christian Couder Jeff King <peff@peff.net> writes: > This is the actual reason I responded to your message. ;) Try: > > https://github.com/avar/git/commit/e02a15f6206.patch > > etc. I don't think there's a "raw patch" link or similar in the > interface, though. You have to just know about it. ;-) ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-29 1:26 ` Jeff King 2022-11-29 1:46 ` Junio C Hamano @ 2022-11-29 10:25 ` Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-29 10:25 UTC (permalink / raw) To: Jeff King Cc: René Scharfe, Junio C Hamano, Git List, Taylor Blau, Christian Couder On Mon, Nov 28 2022, Jeff King wrote: > On Mon, Nov 28, 2022 at 10:57:02PM +0100, René Scharfe wrote: > >> >> diff --git a/revision.c b/revision.c >> >> index 439e34a7c5..6a51ef9418 100644 >> >> --- a/revision.c >> >> +++ b/revision.c >> >> @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs) >> >> release_revisions_mailmap(revs->mailmap); >> >> free_grep_patterns(&revs->grep_filter); >> >> /* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */ >> >> + FREE_AND_NULL(revs->diffopt.parseopts); >> >> diff_free(&revs->pruning); >> >> reflog_walk_info_release(revs->reflog_info); >> >> release_revisions_topo_walk_info(revs->topo_walk_info); >> > >> > At this point I'm unclear on what & why this is needed? I.e. once we >> > narrowly fix the >1 "--filter" options what still fails? >> >> As I wrote: A call to an initialization function followed by a call to a >> cleanup function and nothing else shouldn't leak. There are examples of >> repo_init_revisions()+release_revisions() without setup_revisions() or >> diff_setup_done() beyond pack-objects. I mentioned prune, but there are >> more, e.g. in sequencer.c. > > I tend to agree with you; an init+release combo should release all > memory. We _can_ work around it in the caller here, but I think we are > better to correct the root cause. > > I do think what Ævar is saying is: once we have fixed the bug, why are > more changes needed? I.e., why would we get rid of the lazy-init. IMHO > the answer is that the lazy-init is a more complex pattern, and requires > more boilerplate code (which can lead to more bugs, as we saw). So once > the bug is fixed, this is purely about cleanup/simplification (if one > ignores the C-standard issues), but I tend to think it is one worth > doing. > > (Apologies to Ævar if I'm mis-stating your position). My position isn't that it isn't worth doing, but that we can clearly proceed in smaller steps here, and changing one thing at a time is better. So far in this thread we've had, in relation to 3/3 (and I may have lost track of some points made): 1. It's fixing the bug where you provide --filter N times 2. It's refactoring the "lazy init" to "non-lazy init" 3. It's refactoring the code to avoid relying on the J.5.7 extension in C99. 4. Due to #2 we run into more leaks 5. A dozen or so test files fail due to #2. Are we digressing to fix the root cause of the leak(s), or un-marking those tests as passing leak-free & losing test coverage? As https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ shows we can do #1 as a stand-alone change. I'm not saying the rest of this isn't worth pursuing, except to point out in https://lore.kernel.org/git/221128.86zgcbl0pe.gmgdl@evledraar.gmail.com/ that #2 can be split off from #3. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 21:57 ` René Scharfe 2022-11-29 1:26 ` Jeff King @ 2022-11-29 7:12 ` Ævar Arnfjörð Bjarmason 2022-11-29 19:18 ` René Scharfe 1 sibling, 1 reply; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-29 7:12 UTC (permalink / raw) To: René Scharfe Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King On Mon, Nov 28 2022, René Scharfe wrote: > Am 28.11.2022 um 19:32 schrieb Ævar Arnfjörð Bjarmason: >> >> On Mon, Nov 28 2022, René Scharfe wrote: >> >>> Am 28.11.2022 um 16:56 schrieb René Scharfe:> >>>> The problem is "How to use struct rev_info without leaks?". No matter >>>> where you move it, the leak will be present until the TODO in >>>> release_revisions() is done. >>> >>> Wrong. The following sequence leaks: >>> >>> struct rev_info revs; >>> repo_init_revisions(the_repository, &revs, NULL); >>> release_revisions(&revs); >>> >>> ... and this here doesn't: >>> >>> struct rev_info revs; >>> repo_init_revisions(the_repository, &revs, NULL); >>> setup_revisions(0, NULL, &revs, NULL); // leak plugger >>> release_revisions(&revs); >>> >>> That's because setup_revisions() calls diff_setup_done(), which frees >>> revs->diffopt.parseopts, and release_revisions() doesn't. >>> >>> And since builtin/pack-objects.c::get_object_list() calls >>> setup_revisions(), it really frees that memory, as you claimed from the >>> start. Sorry, I was somehow assuming that a setup function wouldn't >>> clean up. D'oh! >>> >>> The first sequence is used in some other places. e.g. builtin/prune.c. >>> But there LeakSanitizer doesn't complain for some reason; Valgrind >>> reports the parseopts allocation as "possibly lost". >> >> Yes, some of the interactions are tricky. It's really useful to run the >> tests with GIT_TEST_PASSING_SANITIZE_LEAK=[true|check] (see t/README) to >> check these sorts of assumptions for sanity. > > That may be true, and looks even useful -- I didn't know the check > value. I only get a strange error message, though: > > $ GIT_TEST_PASSING_SANITIZE_LEAK=check ./t0001-init.sh > Bail out! GIT_TEST_PASSING_SANITIZE_LEAK=true has no effect except when compiled with SANITIZE=leak > > Same with make test and prove, of course. And of course I compiled > with SANITIZE=leak beforehand. The "=true" part of the message is unfortunately incorrect (it pre-dates "check" being a possible value), but I don't see how you could have compiled with "SANITIZE=leak" and get that message. It's unreachable if 'test -n "$SANITIZE_LEAK"', and that'll be non-empty in GIT-BUILD-OPTIONS if compiled with it. Perhaps you gave SANITIZE=leak to t/Makefile, not the top-level Makefile? Try this at the top-level: GIT_TEST_PASSING_SANITIZE_LEAK=check make SANITIZE= test T=t0001-init.sh > But I don't see a connection between my comment and yours. I was > not running any tests, just the above sequences of function calls, > e.g. in git prune. The connection is that you submitted patches that fail the CI. I don't think you use GitHub, so in particular if you're submitting patches that claim to do something with leak checking it's useful to run those modes against them to check your assumptions. >> >>> I still think the assumption that "init_x(x); release_x(x);" doesn't >>> leak is reasonable. Let's make it true. How about this? It's safe >>> in the sense that we don't risk double frees and it's close to the >>> TODO comment so we probably won't forget removing it once diff_free() >>> becomes used. >>> >>> --- >>> revision.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/revision.c b/revision.c >>> index 439e34a7c5..6a51ef9418 100644 >>> --- a/revision.c >>> +++ b/revision.c >>> @@ -3055,6 +3055,7 @@ void release_revisions(struct rev_info *revs) >>> release_revisions_mailmap(revs->mailmap); >>> free_grep_patterns(&revs->grep_filter); >>> /* TODO (need to handle "no_free"): diff_free(&revs->diffopt) */ >>> + FREE_AND_NULL(revs->diffopt.parseopts); >>> diff_free(&revs->pruning); >>> reflog_walk_info_release(revs->reflog_info); >>> release_revisions_topo_walk_info(revs->topo_walk_info); >> >> At this point I'm unclear on what & why this is needed? I.e. once we >> narrowly fix the >1 "--filter" options what still fails? > > As I wrote: A call to an initialization function followed by a call to a > cleanup function and nothing else shouldn't leak. There are examples of > repo_init_revisions()+release_revisions() without setup_revisions() or > diff_setup_done() beyond pack-objects. I mentioned prune, but there are > more, e.g. in sequencer.c. Yes, I agree it shouldn't leak. And we should definitely fix those leaks. I just don't see why a series fixing bugs in --filter needs to expand the scope to fix those. >> But in general: I don't really think this sort of thing is worth >> it. Here we're reaching into a member of "revs->diffopt" behind its back >> rather than calling diff_free(). I think we should just focus on being >> able to do do that safely. > > Sure, but the FREE_AND_NULL call is simple and safe, while diff_free() > is complicated and calling it one time too many can hurt. It's "safe" because you've read the internals of it, and know that it isn't assuming a non-NULL there once it's past initialization? Or is it like the revisions init()+release() in this thread, where you're assuming it works one way based on the function names etc., only for the CI to fail? In either case, I'm saying that if someone's confident enough to reach into the internals of a structure and tweak it they should be confident enough to just patch diff_free() or the like. >> WIP patches I have in that direction, partially based on your previous >> "is_dead" suggestion: >> >> https://github.com/avar/git/commit/e02a15f6206 >> https://github.com/avar/git/commit/c718f36566a > > Copy-typed the interesting parts of the first patch like a medieval monk > because there doesn't seem to be a download option. :-| Jeff pointed out the ".patch" (there's also ".diff"), but also: Git has this well-known transport protocol it uses, which typically maps to the web URL on public hosting sites ... :) git remote add avar https://github.com/avar/git.git git fetch avar git show <OID> >> I haven't poked at that in a while, I think the only outstanding issue >> with it is that fclose() interaction. > > You mean the t3702-add-edit.sh failure on Windows mentioned in the > commit message of e02a15f6206? That's caused by the file being kept > open and thus locked during the call of the editor. Moving the > release_revisions() call in builtin/add.c::edit_patch() before the > launch_editor() call fixes that by closing the file. Yeah, I haven't had time to poke at it in a while, and I had it queued behind various other leak fixes. I'm sure it's not that complicated in the end, just that that interaction needs to be dealt with. >> I think for this particular thing there aren't going to be any bad >> side-effects in practice, but I also think convincing oneself of that >> basically means putting the same amount of work in as just fixing some >> of these properly. > > Not to me, but perhaps that TODO is easier solved that I expected. > In any case, with the mentioned edit_patch() change described above > e02a15f6206 passes the test suite on Windows for me. Nice! I'll try that out. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-29 7:12 ` Ævar Arnfjörð Bjarmason @ 2022-11-29 19:18 ` René Scharfe 0 siblings, 0 replies; 52+ messages in thread From: René Scharfe @ 2022-11-29 19:18 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King Am 29.11.22 um 08:12 schrieb Ævar Arnfjörð Bjarmason: > > On Mon, Nov 28 2022, René Scharfe wrote: > >> That may be true, and looks even useful -- I didn't know the check >> value. I only get a strange error message, though: >> >> $ GIT_TEST_PASSING_SANITIZE_LEAK=check ./t0001-init.sh >> Bail out! GIT_TEST_PASSING_SANITIZE_LEAK=true has no effect except when compiled with SANITIZE=leak >> >> Same with make test and prove, of course. And of course I compiled >> with SANITIZE=leak beforehand. > > The "=true" part of the message is unfortunately incorrect (it pre-dates > "check" being a possible value), but I don't see how you could have > compiled with "SANITIZE=leak" and get that message. > > It's unreachable if 'test -n "$SANITIZE_LEAK"', and that'll be non-empty > in GIT-BUILD-OPTIONS if compiled with it. Perhaps you gave SANITIZE=leak > to t/Makefile, not the top-level Makefile? > > Try this at the top-level: > > GIT_TEST_PASSING_SANITIZE_LEAK=check make SANITIZE= test T=t0001-init.sh It works today, no idea what I did yesterday. Did reboot and make clean in the meantime, shell history is inconclusive. *shrug* >> As I wrote: A call to an initialization function followed by a call to a >> cleanup function and nothing else shouldn't leak. There are examples of >> repo_init_revisions()+release_revisions() without setup_revisions() or >> diff_setup_done() beyond pack-objects. I mentioned prune, but there are >> more, e.g. in sequencer.c. > > Yes, I agree it shouldn't leak. And we should definitely fix those > leaks. I just don't see why a series fixing bugs in --filter needs to > expand the scope to fix those. The connection is that this is the very leak that 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28) plugged locally. Took me a while to see that. Anyway, I'm also not keen on scope creep. >>> But in general: I don't really think this sort of thing is worth >>> it. Here we're reaching into a member of "revs->diffopt" behind its back >>> rather than calling diff_free(). I think we should just focus on being >>> able to do do that safely. >> >> Sure, but the FREE_AND_NULL call is simple and safe, while diff_free() >> is complicated and calling it one time too many can hurt. > > It's "safe" because you've read the internals of it, and know that it > isn't assuming a non-NULL there once it's past initialization? > > Or is it like the revisions init()+release() in this thread, where > you're assuming it works one way based on the function names etc., only > for the CI to fail? Ouch. > In either case, I'm saying that if someone's confident enough to reach > into the internals of a structure and tweak it they should be confident > enough to just patch diff_free() or the like. diff_free() is more complicated; it does that FREE_AND_NULL plus several things that are not idempotent. >>> WIP patches I have in that direction, partially based on your previous >>> "is_dead" suggestion: >>> >>> https://github.com/avar/git/commit/e02a15f6206 >>> https://github.com/avar/git/commit/c718f36566a >> >> Copy-typed the interesting parts of the first patch like a medieval monk >> because there doesn't seem to be a download option. :-| > > Jeff pointed out the ".patch" (there's also ".diff"), but also: Git has > this well-known transport protocol it uses, which typically maps to the > web URL on public hosting sites ... :) > > git remote add avar https://github.com/avar/git.git > git fetch avar > git show <OID> Yes, I probably should have downloaded everything like that. Just did it for the heck of it and got 43.37 MiB at 598.00 KiB/s. Typing wasn't that much slower. René ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" 2022-11-28 15:56 ` René Scharfe 2022-11-28 17:57 ` René Scharfe @ 2022-11-28 17:57 ` Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-28 17:57 UTC (permalink / raw) To: René Scharfe Cc: Junio C Hamano, Git List, Taylor Blau, Christian Couder, Jeff King On Mon, Nov 28 2022, René Scharfe wrote: > Am 28.11.2022 um 15:34 schrieb Ævar Arnfjörð Bjarmason: > [...] >> I mean skip it when it's not needed, it's needed when we call >> get_object_list(). >> >> But what "problem" is being caused by get_object_list()? That there's >> some other case(s) where it'll leak still? I haven't checked, I think we >> should leave that for some other time if there's such leaks, and just >> not introduce any new leaks in this topic. > > The problem is "How to use struct rev_info without leaks?". No matter > where you move it, the leak will be present until the TODO in > release_revisions() is done. No, the problem at hand is that "--filter" doesn't accept N arguments, which is easily fixed. Junio then tested the patch, and noted CI failing. Yes we have various outstanding issues with the revisions API etc. leaking, but the scope-creep of conflating those questions with a regression fix doesn't seem necessary. >>> [...] >>> Well, that TODO fix should remove this new diff_free() call, but I >>> agree that this is fragile. >>> >>> Removing the "TEST_PASSES_SANITIZE_LEAK=true" line from affected tests >>> is probably better. >> >> Or just not introduce new leaks, per my suggested fix-up at >> https://lore.kernel.org/git/221128.86zgcbl0pe.gmgdl@evledraar.gmail.com/ >> (which it looks like you haven't seen when this E-Mail is composed...). > > Not adding leaks is a good idea. AFAICS none of my patches so far add > any. I don't see where debating what constitutes a "new" or "added" memory leak is going to get us. But to clarify I'm not saying you're responsible for the revisions/filter APIs you're running into, and some of those are existing leaks. I'm also saying that if you run: make SANITIZE=leak test T=... For some values of T=... (and the whole test suite with "GIT_TEST_PASSING_SANITIZE_LEAK=true") your 2/3 passes certain tests, and your 3/3 doesn't. > Patch 3 of v2 exposes an existing one that was only triggered by > the --filter option before. Which is also not ideal, of course, but > giving it more visibility hopefully will motivate a proper fix. We try not to break CI. Only finding out that existing CI breaks once a patch is queued & Junio & others start testing it to hunt down issues isn't a good use of anyone's time. So, maybe you think this area of CI is useless etc., but again, if you're submitting a change to advocate for that can we peel that away from a regression we can fix relatively easily? >>>> As you'd see if you made release_revisions() simply call >>>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge >>>> cases. >>> >>> That was my first attempt; it breaks lots of tests due to double frees. >> >> Right, to be clear I'm saying that none of this is needed right now, >> i.e. I don't get why we'd want the scope-creep past the hunk I noted in >> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/ >> for the --filter bug fix (plus the tests you're adding). > > Well, you asked to squash the minimal fix into the laziness removal in > https://lore.kernel.org/git/221112.86bkpcmm6i.gmgdl@evledraar.gmail.com/ The second part of the relevant request being so that we could regression test the memory leak(s). So, thanks for trying to address that in some way, but proceeding to break linux-leaks, and then arguing that that's OK seems like an odd way to go about that... > Reverting 5cb28270a1 (pack-objects: lazily set up "struct rev_info", > don't leak, 2022-03-28) wholesale is the simplest way to reach the goals > of regression fix, simplification and standard compliance. Except that > the leak check tests have come to depend on the leak being hidden in the > --filter corner. So going back to v1 sure seems attractive. Yeah. Honestly I'd mostly forgotten about the v1 review, and only skimmed things then. So, I've just noticed that the "squash" I'm suggesting is basically equivalent to your v1 now. But yeah I think it's fine to just skip testing the regression fix with SANITIZE=leak for now, we can do it some other time. I just brought it up in the v1 feedback because that patch prominently notes the leak fix, so if we can get regression test it relatively easily that seemed like a good change to make. >>> [...] >>> I saw it as the way towards a release_revisions() that calls diff_free() >>> itself: Add such calls to each of them, fix the "gnarlyness" >>> individually, finally move them all into release_revisions(). The only >>> problem is that there are 60+ callsites. >> >> I think this is a really bad approach in general. >> >> Yes, it may happen to work to free() some data from under an API, but >> it's just as likely that we'll miss that this one caller is screwing >> with its internal state, and e.g. when some new revision.c code is used >> it'll go boom. >> >> If we wanted to phase in such a free() of "foo" I think the right way >> would be to add some "revs.free_foo = 1" flag, giving the API a chance >> to treat that sanely, not to start poking at members of the struct, and >> assuming that its release() won't be free()-ing them. > > And that's why you added no_free to struct diff_options. We could use > it here by setting it in repo_init_revisions() and unsetting in > cmd_pack_objects() and elsewhere, until it is set everywhere. FWIW I think the "no_free" isn't all that good of an approach in retrospect either, but at least it's (mostly) self-contained in the struct that owns it. But it's rather messy, because some of the time it's embedded in a "struct rev_info" that should really own it, and sometimes now. At the time release_revisions() didn't exist yet, so making some forward progress with the diff leaks was easiest with the "no_free". >> But as noted above & in the linked I think we can defer all of that. The >> only reason we're discussing this is because you're changing the >> lazy-init to be not-lazy, and introducing new leaks as a result.> >> I've shown a couple of approaches in this thread of fixing the issue(s) >> at hand without introducing such leaks, so ... > > As noted above: These leaks are not new, they are just moved into > test coverage. Right, and in some cases we should definitely just un-mark a test as being tested under linux-leaks to make forward progress. But in this case I just don't see how that's a good trade-off. We can fix the --filter bug without that, and yes we'll have some new/existing leaks, but those are all contained in tests that are not tested by linux-leaks now. Whereas the larger rewrite to make the init non-lazy would lose us test coverage. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 0/3] pack-objects: fix and simplify --filter handling 2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe ` (2 preceding siblings ...) 2022-11-20 10:13 ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe @ 2022-11-22 19:02 ` Jeff King 3 siblings, 0 replies; 52+ messages in thread From: Jeff King @ 2022-11-22 19:02 UTC (permalink / raw) To: René Scharfe Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder On Sun, Nov 20, 2022 at 11:03:35AM +0100, René Scharfe wrote: > Fix a regression that prevents using multiple --filter options, simplify > the option parsing code and avoid relying on undefined behavior in it. > > Patch 3 conflicts with cc/filtered-repack in seen, but not semantically. > > Changes since v1: > - Added patch 1 to fix an issue with existing tests. > - Separate patch 2 for new tests. > - Test using blob size filters, only, which is a bit simpler. > - Test both combinations to also catch not just the current > last-one-wins regression, but also a possible future first-one-wins > issue. > - Actually revert 5cb28270a1 (pack-objects: lazily set up > "struct rev_info", don't leak, 2022-03-28) instead of having a > minimal fix and then adding some kind of middle ground by using a > separate struct list_objects_filter_options. Thanks, this looks good to me. The new test is a little funny in that it's somewhat-nonsensical input, but I find it unlikely that we'd ever add code to treat two instances of the same filter type specially. Compared to something that produces two distinct effects (like a blob and tree-depth filter combined), it also requires the "test in both directions" trick to cover everything. But I do like that it's easier to reason about what the output _should_ be. So I don't feel too strongly about it either way. -Peff ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v3 0/5] pack-objects: fix and simplify --filter handling 2022-11-12 10:42 [PATCH 0/3] pack-objects: fix and simplify --filter handling René Scharfe ` (3 preceding siblings ...) 2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe @ 2022-11-29 12:19 ` René Scharfe 2022-11-29 12:21 ` [PATCH v3 1/5] t5317: stop losing return codes of git ls-files René Scharfe ` (4 more replies) 4 siblings, 5 replies; 52+ messages in thread From: René Scharfe @ 2022-11-29 12:19 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder Fix a regression that prevents using multiple --filter options, simplify the option parsing code and avoid relying on undefined behavior in it. Patch 4 conflicts with cc/filtered-repack in seen, but not semantically. Changes since v2: - Split the code changes up again like in v1; keep the separate test patches. Reverting 5cb28270a1 wholesale is too cumbersome. - Bring the dedicated list_objects_filter_options struct back. - Move the list_objects_filter_release() call to the cleanup section. Changes since v1: - Added patch 1 to fix an issue with existing tests. - Separate patch 2 for new tests. - Test using blob size filters, only, which is a bit simpler. - Test both combinations to also catch not just the current last-one-wins regression, but also a possible future first-one-wins issue. - Actually revert 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28) instead of having a minimal fix and then adding some kind of middle ground by using a separate struct list_objects_filter_options. t5317: stop losing return codes of git ls-files t5317: demonstrate failure to handle multiple --filter options pack-objects: fix handling of multiple --filter options pack-objects: simplify --filter handling list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() builtin/pack-objects.c | 27 ++------ list-objects-filter-options.c | 4 -- list-objects-filter-options.h | 18 +----- t/t5317-pack-objects-filter-objects.sh | 90 +++++++++++++++++++------- 4 files changed, 74 insertions(+), 65 deletions(-) Range-Diff gegen v2: 1: 955ec33c30 = 1: 955ec33c30 t5317: stop losing return codes of git ls-files 2: 966094ef98 = 2: 966094ef98 t5317: demonstrate failure to handle multiple --filter options 3: f5ba2a2f5e < -: ---------- Revert "pack-objects: lazily set up "struct rev_info", don't leak" -: ---------- > 3: d51424e8d1 pack-objects: fix handling of multiple --filter options -: ---------- > 4: e1fa0fcb1a pack-objects: simplify --filter handling -: ---------- > 5: 5865e24c04 list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() -- 2.38.1 ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v3 1/5] t5317: stop losing return codes of git ls-files 2022-11-29 12:19 ` [PATCH v3 0/5] " René Scharfe @ 2022-11-29 12:21 ` René Scharfe 2022-11-29 12:22 ` [PATCH v3 2/5] t5317: demonstrate failure to handle multiple --filter options René Scharfe ` (3 subsequent siblings) 4 siblings, 0 replies; 52+ messages in thread From: René Scharfe @ 2022-11-29 12:21 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder, Jeff King fb2d0db502 (test-lib-functions: add parsing helpers for ls-files and ls-tree, 2022-04-04) not only started to use helper functions, it also started to pipe the output of git ls-files into them directly, without using a temporary file. No explanation was given. This causes the return code of that git command to be ignored. Revert that part of the change, use temporary files and check the return code of git ls-files again. Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> --- t/t5317-pack-objects-filter-objects.sh | 52 ++++++++++++++------------ 1 file changed, 28 insertions(+), 24 deletions(-) diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh index bb633c9b09..82a22ecaa5 100755 --- a/t/t5317-pack-objects-filter-objects.sh +++ b/t/t5317-pack-objects-filter-objects.sh @@ -24,8 +24,9 @@ parse_verify_pack_blob_oid () { } test_expect_success 'verify blob count in normal packfile' ' - git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 | - test_parse_ls_files_stage_oids | + git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \ + >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r1 pack-objects --revs --stdout >all.pack <<-EOF && @@ -123,8 +124,8 @@ test_expect_success 'setup r2' ' ' test_expect_success 'verify blob count in normal packfile' ' - git -C r2 ls-files -s large.1000 large.10000 | - test_parse_ls_files_stage_oids | + git -C r2 ls-files -s large.1000 large.10000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r2 pack-objects --revs --stdout >all.pack <<-EOF && @@ -161,8 +162,8 @@ test_expect_success 'verify blob:limit=1000' ' ' test_expect_success 'verify blob:limit=1001' ' - git -C r2 ls-files -s large.1000 | - test_parse_ls_files_stage_oids | + git -C r2 ls-files -s large.1000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 >filter.pack <<-EOF && @@ -179,8 +180,8 @@ test_expect_success 'verify blob:limit=1001' ' ' test_expect_success 'verify blob:limit=10001' ' - git -C r2 ls-files -s large.1000 large.10000 | - test_parse_ls_files_stage_oids | + git -C r2 ls-files -s large.1000 large.10000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r2 pack-objects --revs --stdout --filter=blob:limit=10001 >filter.pack <<-EOF && @@ -197,8 +198,8 @@ test_expect_success 'verify blob:limit=10001' ' ' test_expect_success 'verify blob:limit=1k' ' - git -C r2 ls-files -s large.1000 | - test_parse_ls_files_stage_oids | + git -C r2 ls-files -s large.1000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r2 pack-objects --revs --stdout --filter=blob:limit=1k >filter.pack <<-EOF && @@ -215,8 +216,8 @@ test_expect_success 'verify blob:limit=1k' ' ' test_expect_success 'verify explicitly specifying oversized blob in input' ' - git -C r2 ls-files -s large.1000 large.10000 | - test_parse_ls_files_stage_oids | + git -C r2 ls-files -s large.1000 large.10000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && echo HEAD >objects && @@ -233,8 +234,8 @@ test_expect_success 'verify explicitly specifying oversized blob in input' ' ' test_expect_success 'verify blob:limit=1m' ' - git -C r2 ls-files -s large.1000 large.10000 | - test_parse_ls_files_stage_oids | + git -C r2 ls-files -s large.1000 large.10000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r2 pack-objects --revs --stdout --filter=blob:limit=1m >filter.pack <<-EOF && @@ -289,8 +290,9 @@ test_expect_success 'setup r3' ' ' test_expect_success 'verify blob count in normal packfile' ' - git -C r3 ls-files -s sparse1 sparse2 dir1/sparse1 dir1/sparse2 | - test_parse_ls_files_stage_oids | + git -C r3 ls-files -s sparse1 sparse2 dir1/sparse1 dir1/sparse2 \ + >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r3 pack-objects --revs --stdout >all.pack <<-EOF && @@ -341,8 +343,9 @@ test_expect_success 'setup r4' ' ' test_expect_success 'verify blob count in normal packfile' ' - git -C r4 ls-files -s pattern sparse1 sparse2 dir1/sparse1 dir1/sparse2 | - test_parse_ls_files_stage_oids | + git -C r4 ls-files -s pattern sparse1 sparse2 dir1/sparse1 dir1/sparse2 \ + >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r4 pack-objects --revs --stdout >all.pack <<-EOF && @@ -359,8 +362,8 @@ test_expect_success 'verify blob count in normal packfile' ' ' test_expect_success 'verify sparse:oid=OID' ' - git -C r4 ls-files -s dir1/sparse1 dir1/sparse2 | - test_parse_ls_files_stage_oids | + git -C r4 ls-files -s dir1/sparse1 dir1/sparse2 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r4 ls-files -s pattern >staged && @@ -379,8 +382,8 @@ test_expect_success 'verify sparse:oid=OID' ' ' test_expect_success 'verify sparse:oid=oid-ish' ' - git -C r4 ls-files -s dir1/sparse1 dir1/sparse2 | - test_parse_ls_files_stage_oids | + git -C r4 ls-files -s dir1/sparse1 dir1/sparse2 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && git -C r4 pack-objects --revs --stdout --filter=sparse:oid=main:pattern >filter.pack <<-EOF && @@ -400,8 +403,9 @@ test_expect_success 'verify sparse:oid=oid-ish' ' # This models previously omitted objects that we did not receive. test_expect_success 'setup r1 - delete loose blobs' ' - git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 | - test_parse_ls_files_stage_oids | + git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \ + >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | sort >expected && for id in `cat expected | sed "s|..|&/|"` -- 2.38.1 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH v3 2/5] t5317: demonstrate failure to handle multiple --filter options 2022-11-29 12:19 ` [PATCH v3 0/5] " René Scharfe 2022-11-29 12:21 ` [PATCH v3 1/5] t5317: stop losing return codes of git ls-files René Scharfe @ 2022-11-29 12:22 ` René Scharfe 2022-11-29 12:23 ` [PATCH v3 3/5] pack-objects: fix handling of " René Scharfe ` (2 subsequent siblings) 4 siblings, 0 replies; 52+ messages in thread From: René Scharfe @ 2022-11-29 12:22 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder, Jeff King git pack-objects should accept multiple --filter options as documented in Documentation/rev-list-options.txt, but currently the last one wins. Show that using tests with multiple blob size limits Signed-off-by: René Scharfe <l.s.r@web.de> --- t/t5317-pack-objects-filter-objects.sh | 38 ++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh index 82a22ecaa5..25faebaada 100755 --- a/t/t5317-pack-objects-filter-objects.sh +++ b/t/t5317-pack-objects-filter-objects.sh @@ -265,6 +265,44 @@ test_expect_success 'verify normal and blob:limit packfiles have same commits/tr test_cmp expected observed ' +test_expect_failure 'verify small limit and big limit results in small limit' ' + git -C r2 ls-files -s large.1000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | + sort >expected && + + git -C r2 pack-objects --revs --stdout --filter=blob:limit=1001 \ + --filter=blob:limit=10001 >filter.pack <<-EOF && + HEAD + EOF + git -C r2 index-pack ../filter.pack && + + git -C r2 verify-pack -v ../filter.pack >verify_result && + grep blob verify_result | + parse_verify_pack_blob_oid | + sort >observed && + + test_cmp expected observed +' + +test_expect_success 'verify big limit and small limit results in small limit' ' + git -C r2 ls-files -s large.1000 >ls_files_result && + test_parse_ls_files_stage_oids <ls_files_result | + sort >expected && + + git -C r2 pack-objects --revs --stdout --filter=blob:limit=10001 \ + --filter=blob:limit=1001 >filter.pack <<-EOF && + HEAD + EOF + git -C r2 index-pack ../filter.pack && + + git -C r2 verify-pack -v ../filter.pack >verify_result && + grep blob verify_result | + parse_verify_pack_blob_oid | + sort >observed && + + test_cmp expected observed +' + # Test sparse:path=<path> filter. # !!!! # NOTE: sparse:path filter support has been dropped for security reasons, -- 2.38.1 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH v3 3/5] pack-objects: fix handling of multiple --filter options 2022-11-29 12:19 ` [PATCH v3 0/5] " René Scharfe 2022-11-29 12:21 ` [PATCH v3 1/5] t5317: stop losing return codes of git ls-files René Scharfe 2022-11-29 12:22 ` [PATCH v3 2/5] t5317: demonstrate failure to handle multiple --filter options René Scharfe @ 2022-11-29 12:23 ` René Scharfe 2022-11-30 1:09 ` Junio C Hamano 2022-11-29 12:25 ` [PATCH v3 4/5] pack-objects: simplify --filter handling René Scharfe 2022-11-29 12:26 ` [PATCH v3 5/5] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() René Scharfe 4 siblings, 1 reply; 52+ messages in thread From: René Scharfe @ 2022-11-29 12:23 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder, Jeff King Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28) --filter options given to git pack-objects overrule earlier ones, letting only the leftmost win and leaking the memory allocated for earlier ones. Fix that by only initializing the rev_info struct once. Signed-off-by: René Scharfe <l.s.r@web.de> --- builtin/pack-objects.c | 3 ++- t/t5317-pack-objects-filter-objects.sh | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 573d0b20b7..c702c09dd4 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value) { struct po_filter_data *data = value; - repo_init_revisions(the_repository, &data->revs, NULL); + if (!data->have_revs) + repo_init_revisions(the_repository, &data->revs, NULL); data->have_revs = 1; return &data->revs.filter; diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh index 25faebaada..5b707d911b 100755 --- a/t/t5317-pack-objects-filter-objects.sh +++ b/t/t5317-pack-objects-filter-objects.sh @@ -265,7 +265,7 @@ test_expect_success 'verify normal and blob:limit packfiles have same commits/tr test_cmp expected observed ' -test_expect_failure 'verify small limit and big limit results in small limit' ' +test_expect_success 'verify small limit and big limit results in small limit' ' git -C r2 ls-files -s large.1000 >ls_files_result && test_parse_ls_files_stage_oids <ls_files_result | sort >expected && -- 2.38.1 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v3 3/5] pack-objects: fix handling of multiple --filter options 2022-11-29 12:23 ` [PATCH v3 3/5] pack-objects: fix handling of " René Scharfe @ 2022-11-30 1:09 ` Junio C Hamano 2022-11-30 7:11 ` René Scharfe 0 siblings, 1 reply; 52+ messages in thread From: Junio C Hamano @ 2022-11-30 1:09 UTC (permalink / raw) To: René Scharfe Cc: Git List, Ævar Arnfjörð Bjarmason, Taylor Blau, Christian Couder, Jeff King René Scharfe <l.s.r@web.de> writes: > Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't > leak, 2022-03-28) --filter options given to git pack-objects overrule > earlier ones, letting only the leftmost win and leaking the memory > allocated for earlier ones. Fix that by only initializing the rev_info > struct once. I think "leftmost" -> "rightmost", if your command line goes from left to right? Or "leftmost" -> "last" (e.g. "last one wins" over "earlier ones")? > Signed-off-by: René Scharfe <l.s.r@web.de> > --- > builtin/pack-objects.c | 3 ++- > t/t5317-pack-objects-filter-objects.sh | 2 +- > 2 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 573d0b20b7..c702c09dd4 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value) > { > struct po_filter_data *data = value; > > - repo_init_revisions(the_repository, &data->revs, NULL); > + if (!data->have_revs) > + repo_init_revisions(the_repository, &data->revs, NULL); > data->have_revs = 1; > > return &data->revs.filter; > diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh > index 25faebaada..5b707d911b 100755 > --- a/t/t5317-pack-objects-filter-objects.sh > +++ b/t/t5317-pack-objects-filter-objects.sh > @@ -265,7 +265,7 @@ test_expect_success 'verify normal and blob:limit packfiles have same commits/tr > test_cmp expected observed > ' > > -test_expect_failure 'verify small limit and big limit results in small limit' ' > +test_expect_success 'verify small limit and big limit results in small limit' ' > git -C r2 ls-files -s large.1000 >ls_files_result && > test_parse_ls_files_stage_oids <ls_files_result | > sort >expected && > -- > 2.38.1 ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v3 3/5] pack-objects: fix handling of multiple --filter options 2022-11-30 1:09 ` Junio C Hamano @ 2022-11-30 7:11 ` René Scharfe 0 siblings, 0 replies; 52+ messages in thread From: René Scharfe @ 2022-11-30 7:11 UTC (permalink / raw) To: Junio C Hamano Cc: Git List, Ævar Arnfjörð Bjarmason, Taylor Blau, Christian Couder, Jeff King Am 30.11.22 um 02:09 schrieb Junio C Hamano: > René Scharfe <l.s.r@web.de> writes: > >> Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't >> leak, 2022-03-28) --filter options given to git pack-objects overrule >> earlier ones, letting only the leftmost win and leaking the memory >> allocated for earlier ones. Fix that by only initializing the rev_info >> struct once. > > I think "leftmost" -> "rightmost", if your command line goes from > left to right? Or "leftmost" -> "last" (e.g. "last one wins" over > "earlier ones")? Yes, indeed, I mixed up left and right again. René ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v3 4/5] pack-objects: simplify --filter handling 2022-11-29 12:19 ` [PATCH v3 0/5] " René Scharfe ` (2 preceding siblings ...) 2022-11-29 12:23 ` [PATCH v3 3/5] pack-objects: fix handling of " René Scharfe @ 2022-11-29 12:25 ` René Scharfe 2022-11-29 13:27 ` Ævar Arnfjörð Bjarmason 2022-11-29 12:26 ` [PATCH v3 5/5] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() René Scharfe 4 siblings, 1 reply; 52+ messages in thread From: René Scharfe @ 2022-11-29 12:25 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder, Jeff King pack-objects uses OPT_PARSE_LIST_OBJECTS_FILTER_INIT() to initialize the a rev_info struct lazily before populating its filter member using the --filter option values. It tracks whether the initialization is needed using the .have_revs member of the callback data. There is a better way: Use a stand-alone list_objects_filter_options struct and build a rev_info struct with its .filter member after option parsing. This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER() and getting rid of the extra callback mechanism. Even simpler would be using a struct rev_info as before 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28), but that would expose a memory leak caused by repo_init_revisions() followed by release_revisions() without a setup_revisions() call in between. Using list_objects_filter_options also allows pushing the rev_info struct into get_object_list(), where it arguably belongs. Either way, this is all left for later. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> --- builtin/pack-objects.c | 28 ++++++---------------------- 1 file changed, 6 insertions(+), 22 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c702c09dd4..2193f80b89 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4149,22 +4149,6 @@ static int option_parse_cruft_expiration(const struct option *opt, return 0; } -struct po_filter_data { - unsigned have_revs:1; - struct rev_info revs; -}; - -static struct list_objects_filter_options *po_filter_revs_init(void *value) -{ - struct po_filter_data *data = value; - - if (!data->have_revs) - repo_init_revisions(the_repository, &data->revs, NULL); - data->have_revs = 1; - - return &data->revs.filter; -} - int cmd_pack_objects(int argc, const char **argv, const char *prefix) { int use_internal_rev_list = 0; @@ -4175,7 +4159,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) int rev_list_index = 0; int stdin_packs = 0; struct string_list keep_pack_list = STRING_LIST_INIT_NODUP; - struct po_filter_data pfd = { .have_revs = 0 }; + struct list_objects_filter_options filter_options = + LIST_OBJECTS_FILTER_INIT; struct option pack_objects_options[] = { OPT_SET_INT('q', "quiet", &progress, @@ -4266,7 +4251,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) &write_bitmap_index, N_("write a bitmap index if possible"), WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN), - OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init), + OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options), OPT_CALLBACK_F(0, "missing", NULL, N_("action"), N_("handling for missing objects"), PARSE_OPT_NONEG, option_parse_missing_action), @@ -4386,7 +4371,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (pfd.have_revs && pfd.revs.filter.choice) { + if (filter_options.choice) { if (!pack_to_stdout) die(_("cannot use --filter without --stdout")); if (stdin_packs) @@ -4473,13 +4458,11 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) read_cruft_objects(); } else if (!use_internal_rev_list) { read_object_list_from_stdin(); - } else if (pfd.have_revs) { - get_object_list(&pfd.revs, rp.nr, rp.v); - release_revisions(&pfd.revs); } else { struct rev_info revs; repo_init_revisions(the_repository, &revs, NULL); + list_objects_filter_copy(&revs.filter, &filter_options); get_object_list(&revs, rp.nr, rp.v); release_revisions(&revs); } @@ -4514,6 +4497,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) reuse_packfile_objects); cleanup: + list_objects_filter_release(&filter_options); strvec_clear(&rp); return 0; -- 2.38.1 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v3 4/5] pack-objects: simplify --filter handling 2022-11-29 12:25 ` [PATCH v3 4/5] pack-objects: simplify --filter handling René Scharfe @ 2022-11-29 13:27 ` Ævar Arnfjörð Bjarmason 2022-11-30 11:23 ` René Scharfe 0 siblings, 1 reply; 52+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-29 13:27 UTC (permalink / raw) To: René Scharfe Cc: Git List, Junio C Hamano, Taylor Blau, Christian Couder, Jeff King On Tue, Nov 29 2022, René Scharfe wrote: First, I think 3/5 in this series looks really good in this iteration. I think this is also worth cleaning up, but depending on when we expect this to be queued maybe splitting these post-cleanups into their own series would make sense? > pack-objects uses OPT_PARSE_LIST_OBJECTS_FILTER_INIT() to initialize the > a rev_info struct lazily before populating its filter member using the > --filter option values. It tracks whether the initialization is needed > using the .have_revs member of the callback data. > > There is a better way: Use a stand-alone list_objects_filter_options > struct and build a rev_info struct with its .filter member after option > parsing. This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER() > and getting rid of the extra callback mechanism. > > Even simpler would be using a struct rev_info as before 5cb28270a1 > (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28), > but that would expose a memory leak caused by repo_init_revisions() > followed by release_revisions() without a setup_revisions() call in > between. > > Using list_objects_filter_options also allows pushing the rev_info > struct into get_object_list(), where it arguably belongs. Either way, > this is all left for later. I think what you're missing here (or not explaining) is that *the reason* for us not being able to return to a pre-5cb28270a1 state is that if we initialize with REV_INFO_INIT we'll first of all find that the "filter" isn't initialized, because it's not in that macro. And even if it were there, the repo_init_revisions() will wipe away anything we add to it once we call repo_init_revisions(), which we'll need to do any "real" work with it. But if we're really going to refactor this properly let's stop beating around the bush and just connect those dots. So first, the only reason for why "filter" isn't there yet is because a recent topic would have conflicted with it, so if this is cherry-picked after 3/5 we'll have that: https://github.com/avar/git/commit/6b29069d72f.patch Then the below change on top of that could replace this 4/5 and 5/5. I.e. also the 5/5 because the only reason we need the "opt_lof_init" is because of this init-ing edge case. builtin/pack-objects.c | 32 +++++++------------------------- list-objects-filter-options.c | 4 ---- list-objects-filter-options.h | 29 ++++++++++------------------- revision.c | 19 +++++++++++++------ revision.h | 9 +++++++++ 5 files changed, 39 insertions(+), 54 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c702c09dd45..d0c4c26f4e9 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4149,22 +4149,6 @@ static int option_parse_cruft_expiration(const struct option *opt, return 0; } -struct po_filter_data { - unsigned have_revs:1; - struct rev_info revs; -}; - -static struct list_objects_filter_options *po_filter_revs_init(void *value) -{ - struct po_filter_data *data = value; - - if (!data->have_revs) - repo_init_revisions(the_repository, &data->revs, NULL); - data->have_revs = 1; - - return &data->revs.filter; -} - int cmd_pack_objects(int argc, const char **argv, const char *prefix) { int use_internal_rev_list = 0; @@ -4175,7 +4159,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) int rev_list_index = 0; int stdin_packs = 0; struct string_list keep_pack_list = STRING_LIST_INIT_NODUP; - struct po_filter_data pfd = { .have_revs = 0 }; + struct rev_info revs = REV_INFO_INIT; struct option pack_objects_options[] = { OPT_SET_INT('q', "quiet", &progress, @@ -4266,7 +4250,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) &write_bitmap_index, N_("write a bitmap index if possible"), WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN), - OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init), + OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter), OPT_CALLBACK_F(0, "missing", NULL, N_("action"), N_("handling for missing objects"), PARSE_OPT_NONEG, option_parse_missing_action), @@ -4386,7 +4370,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (pfd.have_revs && pfd.revs.filter.choice) { + if (revs.filter.choice) { if (!pack_to_stdout) die(_("cannot use --filter without --stdout")); if (stdin_packs) @@ -4473,16 +4457,14 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) read_cruft_objects(); } else if (!use_internal_rev_list) { read_object_list_from_stdin(); - } else if (pfd.have_revs) { - get_object_list(&pfd.revs, rp.nr, rp.v); - release_revisions(&pfd.revs); + } else if (revs.filter.choice) { + repo_dynamic_init_revisions(the_repository, &revs, NULL); + get_object_list(&revs, rp.nr, rp.v); } else { - struct rev_info revs; - repo_init_revisions(the_repository, &revs, NULL); get_object_list(&revs, rp.nr, rp.v); - release_revisions(&revs); } + release_revisions(&revs); cleanup_preferred_base(); if (include_tag && nr_result) for_each_tag_ref(add_ref_tag, NULL); diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index 53396602380..ee01bcd2cc3 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -290,10 +290,6 @@ int opt_parse_list_objects_filter(const struct option *opt, const char *arg, int unset) { struct list_objects_filter_options *filter_options = opt->value; - opt_lof_init init = (opt_lof_init)opt->defval; - - if (init) - filter_options = init(opt->value); if (unset || !arg) list_objects_filter_set_no_filter(filter_options); diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h index 7eeadab2dd0..470b86113ac 100644 --- a/list-objects-filter-options.h +++ b/list-objects-filter-options.h @@ -108,30 +108,21 @@ void parse_list_objects_filter( const char *arg); /** - * The opt->value to opt_parse_list_objects_filter() is either a - * "struct list_objects_filter_option *" when using - * OPT_PARSE_LIST_OBJECTS_FILTER(). + * The opt->value to opt_parse_list_objects_filter() is a "struct + * list_objects_filter_option *". * - * Or, if using no "struct option" field is used by the callback, - * except the "defval" which is expected to be an "opt_lof_init" - * function, which is called with the "opt->value" and must return a - * pointer to the ""struct list_objects_filter_option *" to be used. - * - * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the - * "struct list_objects_filter_option" is embedded in a "struct - * rev_info", which the "defval" could be tasked with lazily - * initializing. See cmd_pack_objects() for an example. + * When it's the "filter" member of a "struct rev_info" make sure to + * initialize it first with "REV_INFO_INIT" or repo_init_revisions() + * before manipulating the "filter". When using "REV_INFO_INIT" to + * lazily initialize a "struct rev_info" use + * repo_dynamic_init_revisions() before using that "struct rev_info" + * with the revisions API. */ int opt_parse_list_objects_filter(const struct option *opt, const char *arg, int unset); -typedef struct list_objects_filter_options *(*opt_lof_init)(void *); -#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \ - { OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \ - N_("object filtering"), 0, opt_parse_list_objects_filter, \ - (intptr_t)(init) } - #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \ - OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL) + { OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \ + N_("object filtering"), 0, opt_parse_list_objects_filter } /* * Translates abbreviated numbers in the filter's filter_spec into their diff --git a/revision.c b/revision.c index 47817da209a..435c864d7cd 100644 --- a/revision.c +++ b/revision.c @@ -1888,13 +1888,10 @@ static int add_parents_only(struct rev_info *revs, const char *arg_, int flags, return 1; } -void repo_init_revisions(struct repository *r, - struct rev_info *revs, - const char *prefix) +void repo_dynamic_init_revisions(struct repository *r, + struct rev_info *revs, + const char *prefix) { - struct rev_info blank = REV_INFO_INIT; - memcpy(revs, &blank, sizeof(*revs)); - revs->repo = r; revs->pruning.repo = r; revs->pruning.add_remove = file_add_remove; @@ -1914,6 +1911,16 @@ void repo_init_revisions(struct repository *r, init_display_notes(&revs->notes_opt); } +void repo_init_revisions(struct repository *r, + struct rev_info *revs, + const char *prefix) +{ + struct rev_info blank = REV_INFO_INIT; + memcpy(revs, &blank, sizeof(*revs)); + + repo_dynamic_init_revisions(r, revs, prefix); +} + static void add_pending_commit_list(struct rev_info *revs, struct commit_list *commit_list, unsigned int flags) diff --git a/revision.h b/revision.h index 56c6b012abb..75982288db8 100644 --- a/revision.h +++ b/revision.h @@ -417,6 +417,15 @@ struct rev_info { void repo_init_revisions(struct repository *r, struct rev_info *revs, const char *prefix); + +/** + * A subset of repo_init_revisions(), assumes that we've already + * initialized with REV_INFO_INIT, and possibly manipulated the + * members initialized therein. + */ +void repo_dynamic_init_revisions(struct repository *r, + struct rev_info *revs, + const char *prefix); #ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS #define init_revisions(revs, prefix) repo_init_revisions(the_repository, revs, prefix) #endif Now, that repo_dynamic_init_revisions() is rather nasty, but why is it that we need it in the first place? It turns out that this whole "dynamic init" business is only needed because we need to set "revs.repo" to "the_repository". So this on top passes all tests: diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index d0c4c26f4e9..190e2f8ee0e 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4458,10 +4458,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) } else if (!use_internal_rev_list) { read_object_list_from_stdin(); } else if (revs.filter.choice) { + xsetenv("GIT_TEST_RI", "1", 1); repo_dynamic_init_revisions(the_repository, &revs, NULL); get_object_list(&revs, rp.nr, rp.v); } else { - repo_init_revisions(the_repository, &revs, NULL); + xsetenv("GIT_TEST_RI", "1", 1); + repo_dynamic_init_revisions(the_repository, &revs, NULL); get_object_list(&revs, rp.nr, rp.v); } release_revisions(&revs); diff --git a/revision.c b/revision.c index 435c864d7cd..cd832499d22 100644 --- a/revision.c +++ b/revision.c @@ -1893,6 +1893,9 @@ void repo_dynamic_init_revisions(struct repository *r, const char *prefix) { revs->repo = r; + if (getenv("GIT_TEST_RI")) + return; + revs->pruning.repo = r; revs->pruning.add_remove = file_add_remove; revs->pruning.change = file_change; And of course we can in turn simplify that to something like the CHILD_PROCESS_INIT macro we discussed the other day in another thread, i.e. just: struct rev_info = REV_INFO_INIT_VA( .repo = the_repository, }; So then the whole intermediate repo_dynamic_init_revisions() will also go away, and I suspect many current users of repo_init_revisions() can be made to just use the macro. Regarding 5/5: I know I asked in the previous iteration if we couldn't split that J.5.7 change from the rest, but the 4/5 here just leaves it as a loose end. I.e. we've already made the macro unused, so in 5/5 we're just cleaning up already dead code. On reflection, I don't know if splitting it up is worth it after all, but if it's going to be made an atomic change it'll surely have to happen the other way around. E.g. now we take an arbtirary "intptr_t", but once you get rid of the callback mechanism we stop using it, so we've already done the J.5.7 change. Maybe it's not possible to split it up after all. In any case the real valuable thing is to split it from the bug fix in 3/5. If it then goes away as part of some general refactoring... I'd also be fine with just having this series as-is at this point, depending on your appetite to continue hacking on this. I've been slowly working towards making the "struct rev_info" initialization less crappy, so the changes above could also be made in some eventual REV_INFO_INIT series... ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v3 4/5] pack-objects: simplify --filter handling 2022-11-29 13:27 ` Ævar Arnfjörð Bjarmason @ 2022-11-30 11:23 ` René Scharfe 0 siblings, 0 replies; 52+ messages in thread From: René Scharfe @ 2022-11-30 11:23 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Git List, Junio C Hamano, Taylor Blau, Christian Couder, Jeff King Am 29.11.22 um 14:27 schrieb Ævar Arnfjörð Bjarmason: > > On Tue, Nov 29 2022, René Scharfe wrote: > > First, I think 3/5 in this series looks really good in this iteration. > > I think this is also worth cleaning up, but depending on when we expect > this to be queued maybe splitting these post-cleanups into their own > series would make sense? > >> pack-objects uses OPT_PARSE_LIST_OBJECTS_FILTER_INIT() to initialize the >> a rev_info struct lazily before populating its filter member using the >> --filter option values. It tracks whether the initialization is needed >> using the .have_revs member of the callback data. >> >> There is a better way: Use a stand-alone list_objects_filter_options >> struct and build a rev_info struct with its .filter member after option >> parsing. This allows using the simpler OPT_PARSE_LIST_OBJECTS_FILTER() >> and getting rid of the extra callback mechanism. >> >> Even simpler would be using a struct rev_info as before 5cb28270a1 >> (pack-objects: lazily set up "struct rev_info", don't leak, 2022-03-28), >> but that would expose a memory leak caused by repo_init_revisions() >> followed by release_revisions() without a setup_revisions() call in >> between. >> >> Using list_objects_filter_options also allows pushing the rev_info >> struct into get_object_list(), where it arguably belongs. Either way, >> this is all left for later. > > I think what you're missing here (or not explaining) is that *the > reason* for us not being able to return to a pre-5cb28270a1 state is > that if we initialize with REV_INFO_INIT we'll first of all find that > the "filter" isn't initialized, because it's not in that macro. The parent of 5cb28270a1 used repo_init_revisions(), not REV_INFO_INIT directly, so that macro is not a direct reason. But you mean that in a world where struct rev_info can be sufficiently initialized with a macro we could use it without allocations, avoiding leaks? And for pack-objects just the filter member is missing to reach that goal? > And even if it were there, the repo_init_revisions() will wipe away > anything we add to it once we call repo_init_revisions(), which we'll > need to do any "real" work with it. > > But if we're really going to refactor this properly let's stop beating > around the bush and just connect those dots. So first, the only reason > for why "filter" isn't there yet is because a recent topic would have > conflicted with it, so if this is cherry-picked after 3/5 we'll have > that: https://github.com/avar/git/commit/6b29069d72f.patch That patch looks good. > Then the below change on top of that could replace this 4/5 and > 5/5. I.e. also the 5/5 because the only reason we need the > "opt_lof_init" is because of this init-ing edge case. > > builtin/pack-objects.c | 32 +++++++------------------------- > list-objects-filter-options.c | 4 ---- > list-objects-filter-options.h | 29 ++++++++++------------------- > revision.c | 19 +++++++++++++------ > revision.h | 9 +++++++++ > 5 files changed, 39 insertions(+), 54 deletions(-) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index c702c09dd45..d0c4c26f4e9 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -4149,22 +4149,6 @@ static int option_parse_cruft_expiration(const struct option *opt, > return 0; > } > > -struct po_filter_data { > - unsigned have_revs:1; > - struct rev_info revs; > -}; > - > -static struct list_objects_filter_options *po_filter_revs_init(void *value) > -{ > - struct po_filter_data *data = value; > - > - if (!data->have_revs) > - repo_init_revisions(the_repository, &data->revs, NULL); > - data->have_revs = 1; > - > - return &data->revs.filter; > -} > - > int cmd_pack_objects(int argc, const char **argv, const char *prefix) > { > int use_internal_rev_list = 0; > @@ -4175,7 +4159,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > int rev_list_index = 0; > int stdin_packs = 0; > struct string_list keep_pack_list = STRING_LIST_INIT_NODUP; > - struct po_filter_data pfd = { .have_revs = 0 }; > + struct rev_info revs = REV_INFO_INIT; > > struct option pack_objects_options[] = { > OPT_SET_INT('q', "quiet", &progress, > @@ -4266,7 +4250,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > &write_bitmap_index, > N_("write a bitmap index if possible"), > WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN), > - OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init), > + OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter), > OPT_CALLBACK_F(0, "missing", NULL, N_("action"), > N_("handling for missing objects"), PARSE_OPT_NONEG, > option_parse_missing_action), > @@ -4386,7 +4370,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > if (!rev_list_all || !rev_list_reflog || !rev_list_index) > unpack_unreachable_expiration = 0; > > - if (pfd.have_revs && pfd.revs.filter.choice) { > + if (revs.filter.choice) { > if (!pack_to_stdout) > die(_("cannot use --filter without --stdout")); > if (stdin_packs) > @@ -4473,16 +4457,14 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > read_cruft_objects(); > } else if (!use_internal_rev_list) { > read_object_list_from_stdin(); > - } else if (pfd.have_revs) { > - get_object_list(&pfd.revs, rp.nr, rp.v); > - release_revisions(&pfd.revs); > + } else if (revs.filter.choice) { > + repo_dynamic_init_revisions(the_repository, &revs, NULL); > + get_object_list(&revs, rp.nr, rp.v); > } else { > - struct rev_info revs; > - > repo_init_revisions(the_repository, &revs, NULL); > get_object_list(&revs, rp.nr, rp.v); > - release_revisions(&revs); > } > + release_revisions(&revs); > cleanup_preferred_base(); > if (include_tag && nr_result) > for_each_tag_ref(add_ref_tag, NULL); > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c > index 53396602380..ee01bcd2cc3 100644 > --- a/list-objects-filter-options.c > +++ b/list-objects-filter-options.c > @@ -290,10 +290,6 @@ int opt_parse_list_objects_filter(const struct option *opt, > const char *arg, int unset) > { > struct list_objects_filter_options *filter_options = opt->value; > - opt_lof_init init = (opt_lof_init)opt->defval; > - > - if (init) > - filter_options = init(opt->value); > > if (unset || !arg) > list_objects_filter_set_no_filter(filter_options); > diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h > index 7eeadab2dd0..470b86113ac 100644 > --- a/list-objects-filter-options.h > +++ b/list-objects-filter-options.h > @@ -108,30 +108,21 @@ void parse_list_objects_filter( > const char *arg); > > /** > - * The opt->value to opt_parse_list_objects_filter() is either a > - * "struct list_objects_filter_option *" when using > - * OPT_PARSE_LIST_OBJECTS_FILTER(). > + * The opt->value to opt_parse_list_objects_filter() is a "struct > + * list_objects_filter_option *". > * > - * Or, if using no "struct option" field is used by the callback, > - * except the "defval" which is expected to be an "opt_lof_init" > - * function, which is called with the "opt->value" and must return a > - * pointer to the ""struct list_objects_filter_option *" to be used. > - * > - * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the > - * "struct list_objects_filter_option" is embedded in a "struct > - * rev_info", which the "defval" could be tasked with lazily > - * initializing. See cmd_pack_objects() for an example. > + * When it's the "filter" member of a "struct rev_info" make sure to > + * initialize it first with "REV_INFO_INIT" or repo_init_revisions() > + * before manipulating the "filter". When using "REV_INFO_INIT" to > + * lazily initialize a "struct rev_info" use > + * repo_dynamic_init_revisions() before using that "struct rev_info" > + * with the revisions API. > */ > int opt_parse_list_objects_filter(const struct option *opt, > const char *arg, int unset); > -typedef struct list_objects_filter_options *(*opt_lof_init)(void *); > -#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \ > - { OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \ > - N_("object filtering"), 0, opt_parse_list_objects_filter, \ > - (intptr_t)(init) } > - > #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \ > - OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL) > + { OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \ > + N_("object filtering"), 0, opt_parse_list_objects_filter } > > /* > * Translates abbreviated numbers in the filter's filter_spec into their > diff --git a/revision.c b/revision.c > index 47817da209a..435c864d7cd 100644 > --- a/revision.c > +++ b/revision.c > @@ -1888,13 +1888,10 @@ static int add_parents_only(struct rev_info *revs, const char *arg_, int flags, > return 1; > } > > -void repo_init_revisions(struct repository *r, > - struct rev_info *revs, > - const char *prefix) > +void repo_dynamic_init_revisions(struct repository *r, > + struct rev_info *revs, > + const char *prefix) > { > - struct rev_info blank = REV_INFO_INIT; > - memcpy(revs, &blank, sizeof(*revs)); > - > revs->repo = r; > revs->pruning.repo = r; > revs->pruning.add_remove = file_add_remove; > @@ -1914,6 +1911,16 @@ void repo_init_revisions(struct repository *r, > init_display_notes(&revs->notes_opt); > } > > +void repo_init_revisions(struct repository *r, > + struct rev_info *revs, > + const char *prefix) > +{ > + struct rev_info blank = REV_INFO_INIT; > + memcpy(revs, &blank, sizeof(*revs)); > + > + repo_dynamic_init_revisions(r, revs, prefix); > +} > + > static void add_pending_commit_list(struct rev_info *revs, > struct commit_list *commit_list, > unsigned int flags) > diff --git a/revision.h b/revision.h > index 56c6b012abb..75982288db8 100644 > --- a/revision.h > +++ b/revision.h > @@ -417,6 +417,15 @@ struct rev_info { > void repo_init_revisions(struct repository *r, > struct rev_info *revs, > const char *prefix); > + > +/** > + * A subset of repo_init_revisions(), assumes that we've already > + * initialized with REV_INFO_INIT, and possibly manipulated the > + * members initialized therein. > + */ > +void repo_dynamic_init_revisions(struct repository *r, > + struct rev_info *revs, > + const char *prefix); > #ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS > #define init_revisions(revs, prefix) repo_init_revisions(the_repository, revs, prefix) > #endif > > Now, that repo_dynamic_init_revisions() is rather nasty, but why is it > that we need it in the first place? It turns out that this whole > "dynamic init" business is only needed because we need to set > "revs.repo" to "the_repository". So this on top passes all tests: > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index d0c4c26f4e9..190e2f8ee0e 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -4458,10 +4458,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > } else if (!use_internal_rev_list) { > read_object_list_from_stdin(); > } else if (revs.filter.choice) { > + xsetenv("GIT_TEST_RI", "1", 1); > repo_dynamic_init_revisions(the_repository, &revs, NULL); > get_object_list(&revs, rp.nr, rp.v); > } else { > - repo_init_revisions(the_repository, &revs, NULL); > + xsetenv("GIT_TEST_RI", "1", 1); > + repo_dynamic_init_revisions(the_repository, &revs, NULL); > get_object_list(&revs, rp.nr, rp.v); > } > release_revisions(&revs); > diff --git a/revision.c b/revision.c > index 435c864d7cd..cd832499d22 100644 > --- a/revision.c > +++ b/revision.c > @@ -1893,6 +1893,9 @@ void repo_dynamic_init_revisions(struct repository *r, > const char *prefix) > { > revs->repo = r; > + if (getenv("GIT_TEST_RI")) > + return; > + > revs->pruning.repo = r; > revs->pruning.add_remove = file_add_remove; > revs->pruning.change = file_change; > > And of course we can in turn simplify that to something like the > CHILD_PROCESS_INIT macro we discussed the other day in another thread, > i.e. just: > > struct rev_info = REV_INFO_INIT_VA( > .repo = the_repository, > }; > > So then the whole intermediate repo_dynamic_init_revisions() will also > go away, and I suspect many current users of repo_init_revisions() can > be made to just use the macro. Converting existing users of repo_init_revisions() to partial initialization using a macro seems like a lot of work to determine if that subset is sufficient. Testing alone won't be enough. A macro that initializes the struct fully would be easier to use. One obstacle is that repo and prefix are duplicated into embedded structs (.repo, .pruning.repo, .grep_filter.repo, .diffopt.repo; .prefix, .diffopt.prefix). Another is .diffopt.parseopts, which includes pointers to other .diffopt members, whose initialization requires the name of the rev_info variable. Why do we want to get rid of repo_init_revisions()? Because it allocates .diffopt.parseopts and we sometimes forget to release it, especially in cases where we don't actually need it. We can avoid that by leaving the allocations to the code paths that parse diff options. That's actually easy (for now?) because only two places customize the option list and they do it by concatenation. Patch below. > Regarding 5/5: I know I asked in the previous iteration if we couldn't > split that J.5.7 change from the rest, but the 4/5 here just leaves it > as a loose end. I.e. we've already made the macro unused, so in 5/5 > we're just cleaning up already dead code. > > On reflection, I don't know if splitting it up is worth it after all, > but if it's going to be made an atomic change it'll surely have to > happen the other way around. > > E.g. now we take an arbtirary "intptr_t", but once you get rid of the > callback mechanism we stop using it, so we've already done the J.5.7 > change. > > Maybe it's not possible to split it up after all. In any case the real > valuable thing is to split it from the bug fix in 3/5. If it then goes > away as part of some general refactoring... > > I'd also be fine with just having this series as-is at this point, > depending on your appetite to continue hacking on this. > > I've been slowly working towards making the "struct rev_info" > initialization less crappy, so the changes above could also be made in > some eventual REV_INFO_INIT series... Merging the introduction or removal of infrastructure with the first or last use is sometimes nicer, e.g. if one part is small. Here I slightly prefer to keep the different concerns separate, i.e. to not merge patch 4 and 5, because 4 is big enough already and 5 being a trivial dead code removal is a good thing in my eyes. diff --git a/builtin/range-diff.c b/builtin/range-diff.c index e2a74efb42..757010069f 100644 --- a/builtin/range-diff.c +++ b/builtin/range-diff.c @@ -47,7 +47,7 @@ int cmd_range_diff(int argc, const char **argv, const char *prefix) repo_diff_setup(the_repository, &diffopt); - options = parse_options_concat(range_diff_options, diffopt.parseopts); + options = diff_parse_options_concat(range_diff_options, &diffopt); argc = parse_options(argc, argv, prefix, options, builtin_range_diff_usage, PARSE_OPT_KEEP_DASHDASH); diff --git a/diff-no-index.c b/diff-no-index.c index 18edbdf4b5..0a240039a0 100644 --- a/diff-no-index.c +++ b/diff-no-index.c @@ -255,8 +255,7 @@ int diff_no_index(struct rev_info *revs, }; struct option *options; - options = parse_options_concat(no_index_options, - revs->diffopt.parseopts); + options = diff_parse_options_concat(no_index_options, &revs->diffopt); argc = parse_options(argc, argv, revs->prefix, options, diff_no_index_usage, 0); if (argc != 2) { diff --git a/diff.c b/diff.c index 1054a4b732..ebf9154eaf 100644 --- a/diff.c +++ b/diff.c @@ -4615,8 +4615,6 @@ static void run_checkdiff(struct diff_filepair *p, struct diff_options *o) builtin_checkdiff(name, other, attr_path, p->one, p->two, o); } -static void prep_parse_options(struct diff_options *options); - void repo_diff_setup(struct repository *r, struct diff_options *options) { memcpy(options, &default_diff_options, sizeof(*options)); @@ -4662,8 +4660,6 @@ void repo_diff_setup(struct repository *r, struct diff_options *options) options->color_moved = diff_color_moved_default; options->color_moved_ws_handling = diff_color_moved_ws_default; - - prep_parse_options(options); } static const char diff_status_letters[] = { @@ -4821,8 +4817,6 @@ void diff_setup_done(struct diff_options *options) options->filter = ~filter_bit[DIFF_STATUS_FILTER_AON]; options->filter &= ~options->filter_not; } - - FREE_AND_NULL(options->parseopts); } int parse_long_opt(const char *opt, const char **argv, @@ -5419,7 +5413,8 @@ static int diff_opt_rotate_to(const struct option *opt, const char *arg, int uns return 0; } -static void prep_parse_options(struct diff_options *options) +struct option *diff_parse_options_concat(const struct option *extra, + struct diff_options *options) { struct option parseopts[] = { OPT_GROUP(N_("Diff output format options")), @@ -5689,22 +5684,24 @@ static void prep_parse_options(struct diff_options *options) OPT_END() }; - ALLOC_ARRAY(options->parseopts, ARRAY_SIZE(parseopts)); - memcpy(options->parseopts, parseopts, sizeof(parseopts)); + return parse_options_concat(extra, parseopts); } int diff_opt_parse(struct diff_options *options, const char **av, int ac, const char *prefix) { + struct option *opts = diff_parse_options_concat(NULL, options); + if (!prefix) prefix = ""; - ac = parse_options(ac, av, prefix, options->parseopts, NULL, + ac = parse_options(ac, av, prefix, opts, NULL, PARSE_OPT_KEEP_DASHDASH | PARSE_OPT_KEEP_UNKNOWN_OPT | PARSE_OPT_NO_INTERNAL_HELP | PARSE_OPT_ONE_SHOT | PARSE_OPT_STOP_AT_NON_OPTION); + free(opts); return ac; } @@ -6513,7 +6510,6 @@ void diff_free(struct diff_options *options) diff_free_file(options); diff_free_ignore_regex(options); clear_pathspec(&options->pathspec); - FREE_AND_NULL(options->parseopts); } void diff_flush(struct diff_options *options) diff --git a/diff.h b/diff.h index fd33caeb25..f2ab97b8b4 100644 --- a/diff.h +++ b/diff.h @@ -394,7 +394,6 @@ struct diff_options { unsigned color_moved_ws_handling; struct repository *repo; - struct option *parseopts; struct strmap *additional_path_headers; int no_free; @@ -539,6 +538,8 @@ int git_diff_ui_config(const char *var, const char *value, void *cb); #define diff_setup(diffopts) repo_diff_setup(the_repository, diffopts) #endif void repo_diff_setup(struct repository *, struct diff_options *); +struct option *diff_parse_options_concat(const struct option *extra, + struct diff_options *options); int diff_opt_parse(struct diff_options *, const char **, int, const char *); void diff_setup_done(struct diff_options *); int git_config_rename(const char *var, const char *value); ^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH v3 5/5] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() 2022-11-29 12:19 ` [PATCH v3 0/5] " René Scharfe ` (3 preceding siblings ...) 2022-11-29 12:25 ` [PATCH v3 4/5] pack-objects: simplify --filter handling René Scharfe @ 2022-11-29 12:26 ` René Scharfe 2022-11-30 1:20 ` Junio C Hamano 4 siblings, 1 reply; 52+ messages in thread From: René Scharfe @ 2022-11-29 12:26 UTC (permalink / raw) To: Git List Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Taylor Blau, Christian Couder, Jeff King OPT_PARSE_LIST_OBJECTS_FILTER_INIT() with a non-NULL second argument passes a function pointer via an object pointer, which is undefined. It may work fine on platforms that implement C99 extension J.5.7 (Function pointer casts). Remove the unused macro and avoid the dependency on that extension. Signed-off-by: René Scharfe <l.s.r@web.de> --- list-objects-filter-options.c | 4 ---- list-objects-filter-options.h | 18 ++---------------- 2 files changed, 2 insertions(+), 20 deletions(-) diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index 5339660238..ee01bcd2cc 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -290,10 +290,6 @@ int opt_parse_list_objects_filter(const struct option *opt, const char *arg, int unset) { struct list_objects_filter_options *filter_options = opt->value; - opt_lof_init init = (opt_lof_init)opt->defval; - - if (init) - filter_options = init(opt->value); if (unset || !arg) list_objects_filter_set_no_filter(filter_options); diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h index 7eeadab2dd..1fe393f447 100644 --- a/list-objects-filter-options.h +++ b/list-objects-filter-options.h @@ -111,27 +111,13 @@ void parse_list_objects_filter( * The opt->value to opt_parse_list_objects_filter() is either a * "struct list_objects_filter_option *" when using * OPT_PARSE_LIST_OBJECTS_FILTER(). - * - * Or, if using no "struct option" field is used by the callback, - * except the "defval" which is expected to be an "opt_lof_init" - * function, which is called with the "opt->value" and must return a - * pointer to the ""struct list_objects_filter_option *" to be used. - * - * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the - * "struct list_objects_filter_option" is embedded in a "struct - * rev_info", which the "defval" could be tasked with lazily - * initializing. See cmd_pack_objects() for an example. */ int opt_parse_list_objects_filter(const struct option *opt, const char *arg, int unset); -typedef struct list_objects_filter_options *(*opt_lof_init)(void *); -#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \ - { OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \ - N_("object filtering"), 0, opt_parse_list_objects_filter, \ - (intptr_t)(init) } #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \ - OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL) + OPT_CALLBACK(0, "filter", (fo), N_("args"), \ + N_("object filtering"), opt_parse_list_objects_filter) /* * Translates abbreviated numbers in the filter's filter_spec into their -- 2.38.1 ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v3 5/5] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() 2022-11-29 12:26 ` [PATCH v3 5/5] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() René Scharfe @ 2022-11-30 1:20 ` Junio C Hamano 0 siblings, 0 replies; 52+ messages in thread From: Junio C Hamano @ 2022-11-30 1:20 UTC (permalink / raw) To: René Scharfe Cc: Git List, Ævar Arnfjörð Bjarmason, Taylor Blau, Christian Couder, Jeff King René Scharfe <l.s.r@web.de> writes: > OPT_PARSE_LIST_OBJECTS_FILTER_INIT() with a non-NULL second argument > passes a function pointer via an object pointer, which is undefined. It > may work fine on platforms that implement C99 extension J.5.7 (Function > pointer casts). Remove the unused macro and avoid the dependency on > that extension. Makes sense. Some of us may consider that no platform of importance exists in practice that breaks with such a cast, but our opinion does not matter at all, while opinion of those who supply compilers and checkers to us do---they may decide to flag such a use with a warning and that would break our build with the -Werror option. If we do not have to rely on the extension, we should. Thanks. ^ permalink raw reply [flat|nested] 52+ messages in thread
end of thread, other threads:[~2022-11-30 11:26 UTC | newest] Thread overview: 52+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-11-12 10:42 [PATCH 0/3] pack-objects: fix and simplify --filter handling René Scharfe 2022-11-12 10:44 ` [PATCH 1/3] pack-objects: fix handling of multiple --filter options René Scharfe 2022-11-12 11:41 ` Ævar Arnfjörð Bjarmason 2022-11-13 17:31 ` René Scharfe 2022-11-12 16:58 ` Jeff King 2022-11-13 5:01 ` Taylor Blau 2022-11-13 16:44 ` Jeff King 2022-11-13 17:31 ` René Scharfe 2022-11-12 10:44 ` [PATCH 2/3] pack-object: simplify --filter handling René Scharfe 2022-11-12 11:45 ` Ævar Arnfjörð Bjarmason 2022-11-12 17:02 ` Jeff King 2022-11-13 16:49 ` Jeff King 2022-11-13 17:31 ` René Scharfe 2022-11-12 10:46 ` [PATCH 3/3] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() René Scharfe 2022-11-20 10:03 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling René Scharfe 2022-11-20 10:06 ` [PATCH v2 1/3] t5317: stop losing return codes of git ls-files René Scharfe 2022-11-20 10:07 ` [PATCH v2 2/3] t5317: demonstrate failure to handle multiple --filter options René Scharfe 2022-11-20 10:13 ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe 2022-11-28 10:03 ` Junio C Hamano 2022-11-28 11:12 ` Ævar Arnfjörð Bjarmason 2022-11-28 12:00 ` [PATCH] t5314: check exit code of "rev-parse" Ævar Arnfjörð Bjarmason 2022-11-28 13:51 ` René Scharfe 2022-11-28 14:18 ` [PATCH v2] t5314: check exit code of "git" Ævar Arnfjörð Bjarmason 2022-11-28 11:26 ` [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak" René Scharfe 2022-11-28 11:31 ` Ævar Arnfjörð Bjarmason 2022-11-28 12:24 ` Ævar Arnfjörð Bjarmason 2022-11-28 15:16 ` René Scharfe 2022-11-28 15:27 ` Ævar Arnfjörð Bjarmason 2022-11-28 14:29 ` René Scharfe 2022-11-28 14:34 ` Ævar Arnfjörð Bjarmason 2022-11-28 15:56 ` René Scharfe 2022-11-28 17:57 ` René Scharfe 2022-11-28 18:32 ` Ævar Arnfjörð Bjarmason 2022-11-28 21:57 ` René Scharfe 2022-11-29 1:26 ` Jeff King 2022-11-29 1:46 ` Junio C Hamano 2022-11-29 10:25 ` Ævar Arnfjörð Bjarmason 2022-11-29 7:12 ` Ævar Arnfjörð Bjarmason 2022-11-29 19:18 ` René Scharfe 2022-11-28 17:57 ` Ævar Arnfjörð Bjarmason 2022-11-22 19:02 ` [PATCH v2 0/3] pack-objects: fix and simplify --filter handling Jeff King 2022-11-29 12:19 ` [PATCH v3 0/5] " René Scharfe 2022-11-29 12:21 ` [PATCH v3 1/5] t5317: stop losing return codes of git ls-files René Scharfe 2022-11-29 12:22 ` [PATCH v3 2/5] t5317: demonstrate failure to handle multiple --filter options René Scharfe 2022-11-29 12:23 ` [PATCH v3 3/5] pack-objects: fix handling of " René Scharfe 2022-11-30 1:09 ` Junio C Hamano 2022-11-30 7:11 ` René Scharfe 2022-11-29 12:25 ` [PATCH v3 4/5] pack-objects: simplify --filter handling René Scharfe 2022-11-29 13:27 ` Ævar Arnfjörð Bjarmason 2022-11-30 11:23 ` René Scharfe 2022-11-29 12:26 ` [PATCH v3 5/5] list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT() René Scharfe 2022-11-30 1:20 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).