git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: William Sprent <williams@unity3d.com>
To: Eric Sunshine <sunshine@sunshineco.com>,
	William Sprent via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] ls-tree: add --sparse-filter-oid argument
Date: Mon, 16 Jan 2023 13:14:02 +0100	[thread overview]
Message-ID: <26f026f6-6c17-7578-e341-d56fe53c95df@unity3d.com> (raw)
In-Reply-To: <CAPig+cRgZ0CrkqY7mufuWmhf6BC8yXjXXuOTEQjuz+Y0NA+N7Q@mail.gmail.com>

On 13/01/2023 15.17, Eric Sunshine wrote:
> On Wed, Jan 11, 2023 at 12:05 PM William Sprent via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> [...]
>> To fill this gap, add a new argument to ls-tree '--sparse-filter-oid'
>> which takes the object id of a blob containing sparse checkout patterns
>> that filters the output of 'ls-tree'. When filtering with given sparsity
>> patterns, 'ls-tree' only outputs blobs and commit objects that
>> match the given patterns.
>> [...]
>> Signed-off-by: William Sprent <williams@unity3d.com>
> 
> This is not a proper review, but rather just some comments about
> issues I noticed while quickly running my eye over the patch. Many are
> just micro-nits about style; a few regarding the tests are probably
> actionable.
> 
>>      Note that one of the tests only pass when run on top of commit
>>      5842710dc2 (dir: check for single file cone patterns, 2023-01-03), which
>>      was submitted separately and is currently is merged to 'next'.
> 
> Thanks for mentioning this. It's exactly the sort of information the
> maintainer needs when applying your patch to his tree. And it can be
> helpful for reviewers too.
> 
>> diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
>> @@ -48,6 +48,11 @@ OPTIONS
>> +--sparse-filter-oid=<blob-ish>::
>> +       Omit showing tree objects and paths that do not match the
>> +       sparse-checkout specification contained in the blob
>> +       (or blob-expression) <blob-ish>.
> 
> Good to see a documentation update. The SYNOPSIS probably deserves an
> update too.
> 

Nice catch. I'll update it.

>> diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
>> @@ -329,12 +331,79 @@ static struct ls_tree_cmdmode_to_fmt ls_tree_cmdmode_format[] = {
>> +static void sparse_filter_data__init(
>> +       struct sparse_filter_data **d,
>> +       const char *sparse_oid_name, read_tree_fn_t fn)
>> +{
>> +       struct object_id sparse_oid;
>> +       struct object_context oc;
>> +
>> +       (*d) = xcalloc(1, sizeof(**d));
>> +       (*d)->fn = fn;
>> +       (*d)->pl.use_cone_patterns = core_sparse_checkout_cone;
>> +
>> +       strbuf_init(&(*d)->full_path_buf, 0);
>> +
>> +
> 
> Nit: too many blank lines.
> 
>> +       if (get_oid_with_context(the_repository,
>> +                                sparse_oid_name,
>> +                                GET_OID_BLOB, &sparse_oid, &oc))
> 
> Pure nit: somewhat odd choice of wrapping; I'd have expected something
> along the lines of:
> 
>      if (get_oid_with_context(the_repository, sparse_oid_name, GET_OID_BLOB,
>                      &sparse_oid, &oc))
> 
> or:
> 
>      if (get_oid_with_context(the_repository, sparse_oid_name,
>                      GET_OID_BLOB, &sparse_oid, &oc))
> 
>> +static void sparse_filter_data__free(struct sparse_filter_data *d)
>> +{
>> +       clear_pattern_list(&d->pl);
>> +       strbuf_release(&d->full_path_buf);
>> +       free(d);
>> +}
> 
> Is the double-underscore convention in function names imported from
> elsewhere? I don't recall seeing it used in this codebase.
> 

I took inspiration from the similar '--filter:sparse' argument for rev-list
which is implemented in 'list-objects-filter.c'. I've just given it a
search  and it looks like it that convention isn't really used outside the
list-objects files, which makes it a bit awkward here.

I get a bunch more hits for '(init|free)_', so I will change them to that.


>> +static int path_matches_sparse_checkout_patterns(struct strbuf *full_path, struct pattern_list *pl, int dtype)
>> +{
>> +       enum pattern_match_result match = recursively_match_path_with_sparse_patterns(full_path->buf, the_repository->index, dtype, pl);
>> +       return match > 0;
>> +}
>> +
>> +
> 
> Nit: too many blank lines
> 
>> +static int filter_sparse(const struct object_id *oid, struct strbuf *base,
>> +                        const char *pathname, unsigned mode, void *context)
>> +{
>> +
>> +       struct sparse_filter_data *data = context;
> 
> Nit: unnecessary blank line after "{"
> 
>> +       int do_recurse = show_recursive(base->buf, base->len, pathname);
>> +       if (object_type(mode) == OBJ_TREE)
>> +               return do_recurse;
>> +
>> +       strbuf_reset(&data->full_path_buf);
>> +       strbuf_addbuf(&data->full_path_buf, base);
>> +       strbuf_addstr(&data->full_path_buf, pathname);
>> +
>> +       if (!path_matches_sparse_checkout_patterns(&data->full_path_buf, &data->pl, DT_REG))
>> +               return 0;
>> +
>> +       return data->fn(oid, base, pathname, mode, context);	
>> +}
>> +
>> +
> 
> Nit: too many blank lines
> 
>> diff --git a/dir.c b/dir.c
>> @@ -1450,45 +1450,51 @@ int init_sparse_checkout_patterns(struct index_state *istate)
>> +static int path_in_sparse_checkout_1(const char *path,
>> +                                    struct index_state *istate,
>> +                                    int require_cone_mode)
>> +{
>> +
>> +       /*
> 
> Nit: unnecessary blank line after "{"
> 
>> diff --git a/t/t3106-ls-tree-sparse-checkout.sh b/t/t3106-ls-tree-sparse-checkout.sh
>> new file mode 100755
> 
> We often try to avoid introducing a new test script if the tests being
> added can fit well into an existing script. If you didn't find any
> existing script where these tests would fit, then creating a new
> script may be fine.
> 

Well there's a couple of things. The only ls-tree related test file
that isn't too specific is 't3103-ls-tree-misc.sh'. However, the
'sparse-checkout' command leaks memory, so that would require setting
'TEST_PASSES_SANITIZE_LEAK' to false.

For tests related to sparse-checkout, we both have the large
't1092-sparse-checkout-compatibility.sh' file. But there is also a
collection of files like 't3705-add-sparse-checkout.sh'. I guess
both could work fine in this case. I think I went with the latter
because I personally find the smaller test files a bit easier to
parse.

If someone feels strongly the other way, I'll gladly change it.

>> @@ -0,0 +1,103 @@
>> +check_agrees_with_ls_files () {
>> +       REPO=repo
>> +       git -C $REPO submodule deinit -f --all
>> +       git -C $REPO cat-file -p ${filter_oid} >${REPO}/.git/info/sparse-checkout
>> +       git -C $REPO sparse-checkout init --cone 2>err
>> +       git -C $REPO submodule init
>> +       git -C $REPO ls-files -t| grep -v "^S "|cut -d" " -f2 >ls-files
>> +       test_cmp ls-files actual
>> +}
> 
> Several comments:
> 
> Since the return code of this function is significant and callers care
> about it, you should &&-chain all of the code in the function itself
> (just like you do within a test) so that failure of any command in the
> function is noticed and causes the calling test to fail. It's a good
> idea to &&-chain the variable assignments at the top of the function
> too (just in case someone later inserts code above the assignments).
> 

That was an oversight. Thanks for noticing.

> It's odd to see a mixture of $VAR and ${VAR}. It's better to be
> consistent. We typically use the $VAR form (though it's not exclusive
> by any means).
> 

I'll remove the braces.

> Some shells complain when the pathname following ">" redirection
> operator is not quoted, so:
> 
>      git -C $REPO cat-file -p ${filter_oid} >"$REPO/.git/info/sparse-checkout" &&
> 
> Style: add space around "|" pipe operator
> 
> We usually avoid having a Git command upstream of the pipe since its
> exit code gets swallowed by the pipe, so we usually do this instead:
> 
>      git -C $REPO ls-files -t >out &&
>      grep -v "^S " out | cut -d" " -f2 >ls-files &&
> 

Good point. I'll do that.

> Minor: The two-command pipeline `grep -v "^S " | cut -d" " -f2
>> ls-files` could be expressed via a single `sed` invocation:
> 
>      sed -n "/^S /!s/^. //p" out &&
> 
> Nit: The first argument to test_cmp() is typically named "expect".
> 
> Error output is captured to file "err" but that file is never consulted.
> 

I'll remove the redirection.

>> +check_same_result_in_bare_repo () {
>> +       FULL=repo
>> +       BARE=bare
>> +       FILTER=$1
>> +       git -C repo cat-file -p ${filter_oid}| git -C bare hash-object -w --stdin
>> +       git -C bare ls-tree --name-only --filter-sparse-oid=${filter_oid} -r HEAD >bare-result
>> +       test_cmp expect bare-result
>> +}
> 
> Same comments as above, plus:
> 
> What is the purpose of the variables FULL, BARE, FILTER? They don't
> seem to be used in the function.
> 
> I suspect that the function should be using $FILTER internally rather
> than $filter_oid.
> 

I think I probably had plans with them and then forgot. Thanks for noticing.
I'll keep (and use) FILTER, but remove the others.

>> +test_expect_success 'setup' '
>> +       git init submodule &&
>> +       (
>> +               cd submodule &&
>> +               test_commit file
>> +       ) &&
> 
> Minor: test_commit() accepts a -C option, so this could be done
> without the subshell:
> 
>      test_commit -C submodule file
> 
>> +       git init repo &&
>> +       (
>> +               cd repo &&
>> +               mkdir dir &&
>> +               test_commit dir/sub-file &&
>> +               test_commit dir/sub-file2 &&
>> +               mkdir dir2 &&
>> +               test_commit dir2/sub-file1 &&
>> +               test_commit dir2/sub-file2 &&
>> +               test_commit top-file &&
>> +               git clone ../submodule submodule &&
>> +               git submodule add ./submodule &&
>> +               git submodule absorbgitdirs &&
>> +               git commit -m"add submodule" &&
>> +               git sparse-checkout init --cone
>> +       ) &&
> 
> Here the subshell makes sense since so many commands are run in
> directory "repo." Fine.

Thanks for taking the time to give some comments. I think they all
make sense to me. Besides the comments I've explicitly acknowledged above,
I've also applied all of the minor/nit/style comments to my local tree.



  parent reply	other threads:[~2023-01-16 12:14 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-11 17:01 [PATCH] ls-tree: add --sparse-filter-oid argument William Sprent via GitGitGadget
2023-01-13 14:17 ` Eric Sunshine
2023-01-13 20:01   ` Junio C Hamano
2023-01-16 15:13     ` William Sprent
2023-01-16 12:14   ` William Sprent [this message]
2023-01-23 11:46 ` [PATCH v2] " William Sprent via GitGitGadget
2023-01-23 13:00   ` Ævar Arnfjörð Bjarmason
2023-01-24 15:30     ` William Sprent
2023-01-23 13:06   ` Ævar Arnfjörð Bjarmason
2023-01-24 15:30     ` William Sprent
2023-01-24 20:11   ` Victoria Dye
2023-01-25 13:47     ` William Sprent
2023-01-25 18:32       ` Victoria Dye
2023-01-26 14:55         ` William Sprent
2023-01-25  5:11   ` Elijah Newren
2023-01-25 16:16     ` William Sprent
2023-01-26  3:25       ` Elijah Newren
2023-01-27 11:58         ` William Sprent
2023-01-28 16:45           ` Elijah Newren
2023-01-30 15:28             ` William Sprent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=26f026f6-6c17-7578-e341-d56fe53c95df@unity3d.com \
    --to=williams@unity3d.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).