git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <stolee@gmail.com>,
	Matheus Tavares Bernardino <matheus.bernardino@usp.br>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v2 24/25] sparse-index: expand_to_path()
Date: Mon, 5 Apr 2021 12:32:23 -0700	[thread overview]
Message-ID: <CABPp-BE5xLMsVdamAb=MgiBnXKQrYDPJXa1TWutDK+nVYRPyRQ@mail.gmail.com> (raw)
In-Reply-To: <d52c72b4a7b9a3ae10bd2daae85c1bd1bafcc2f3.1617241803.git.gitgitgadget@gmail.com>

On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Some users of the index API have a specific path they are looking for,
> but choose to use index_file_exists() to rely on the name-hash hashtable
> instead of doing binary search with index_name_pos(). These users only
> need to know a yes/no answer, not a position within the cache array.
>
> When the index is sparse, the name-hash hash table does not contain the
> full list of paths within sparse directories. It _does_ contain the
> directory names for the sparse-directory entries.
>
> Create a helper function, expand_to_path(), for intended use with the
> name-hash hashtable functions. The integration with name-hash.c will
> follow in a later change.
>
> The solution here is to use ensure_full_index() when we determine that
> the requested path is within a sparse directory entry. This will
> populate the name-hash hashtable as the index is recomputed from
> scratch.
>
> There may be cases where the caller is trying to find an untracked path
> that is not in the index but also is not within a sparse directory
> entry. We want to minimize the overhead for these requests. If we used
> index_name_pos() to find the insertion order of the path, then we could
> determine from that position if a sparse-directory exists. (In fact,
> just calling index_name_pos() in that case would lead to expanding the
> index to a full index.) However, this takes O(log N) time where N is the
> number of cache entries.
>
> To keep the performance of this call based mostly on the input string,
> use index_file_exists() to look for the ancestors of the path. Using the
> heuristic that a sparse directory is likely to have a small number of
> parent directories, we start from the bottom and build up. Use a string
> buffer to allow mutating the path name to terminate after each slash for
> each hashset test.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  sparse-index.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  sparse-index.h | 13 +++++++++
>  2 files changed, 85 insertions(+)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index 95ea17174da3..8a1223041296 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -283,3 +283,75 @@ void ensure_full_index(struct index_state *istate)
>
>         trace2_region_leave("index", "ensure_full_index", istate->repo);
>  }
> +
> +/*
> + * This static global helps avoid infinite recursion between
> + * expand_to_path() and index_file_exists().
> + */
> +static int in_expand_to_path = 0;
> +
> +void expand_to_path(struct index_state *istate,
> +                   const char *path, size_t pathlen, int icase)
> +{
> +       struct strbuf path_mutable = STRBUF_INIT;
> +       size_t substr_len;
> +
> +       /* prevent extra recursion */
> +       if (in_expand_to_path)
> +               return;
> +
> +       if (!istate || !istate->sparse_index)
> +               return;
> +
> +       if (!istate->repo)
> +               istate->repo = the_repository;
> +
> +       in_expand_to_path = 1;
> +
> +       /*
> +        * We only need to actually expand a region if the
> +        * following are both true:
> +        *
> +        * 1. 'path' is not already in the index.
> +        * 2. Some parent directory of 'path' is a sparse directory.
> +        */
> +
> +       if (index_file_exists(istate, path, pathlen, icase))
> +               goto cleanup;
> +
> +       strbuf_add(&path_mutable, path, pathlen);
> +       strbuf_addch(&path_mutable, '/');
> +
> +       /* Check the name hash for all parent directories */
> +       substr_len = 0;
> +       while (substr_len < pathlen) {
> +               char temp;
> +               char *replace = strchr(path_mutable.buf + substr_len, '/');
> +
> +               if (!replace)
> +                       break;
> +
> +               /* replace the character _after_ the slash */
> +               replace++;
> +               temp = *replace;
> +               *replace = '\0';
> +               if (index_file_exists(istate, path_mutable.buf,
> +                                     path_mutable.len, icase)) {
> +                       /*
> +                        * We found a parent directory in the name-hash
> +                        * hashtable, which means that this entry could
> +                        * exist within a sparse-directory entry. Expand
> +                        * accordingly.

"this entry" is a bit unclear; it might be worth referring to the
"path" variable instead.  I think it'd also help to explain the
reasoning for using the '/' character, because while it's clear when
looking at the series, just running across it in the code it might be
easy to forget or miss.  Perhaps:

                    * We found a parent directory in the name-hash
                    * hashtable, because only sparse directory entries
                    * have a trailing '/' character.  Since "path" wasn't
                    * in the index, perhaps it exists within this
                    * sparse-directory.  Expand accordingly.

> +                        */
> +                       ensure_full_index(istate);
> +                       break;
> +               }
> +
> +               *replace = temp;
> +               substr_len = replace - path_mutable.buf;
> +       }
> +
> +cleanup:
> +       strbuf_release(&path_mutable);
> +       in_expand_to_path = 0;
> +}
> diff --git a/sparse-index.h b/sparse-index.h
> index 0268f38753c0..1115a0d7dd98 100644
> --- a/sparse-index.h
> +++ b/sparse-index.h
> @@ -4,6 +4,19 @@
>  struct index_state;
>  int convert_to_sparse(struct index_state *istate);
>
> +/*
> + * Some places in the codebase expect to search for a specific path.
> + * This path might be outside of the sparse-checkout definition, in
> + * which case a sparse-index may not contain a path for that index.
> + *
> + * Given an index and a path, check to see if a leading directory for
> + * 'path' exists in the index as a sparse directory. In that case,
> + * expand that sparse directory to a full range of cache entries and
> + * populate the index accordingly.
> + */
> +void expand_to_path(struct index_state *istate,
> +                   const char *path, size_t pathlen, int icase);
> +
>  struct repository;
>  int set_sparse_index_config(struct repository *repo, int enable);
>
> --
> gitgitgadget

  reply	other threads:[~2021-04-05 19:32 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 01/27] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
2021-03-19 21:01   ` Junio C Hamano
2021-03-20  1:45     ` Derrick Stolee
2021-03-20  1:52     ` Junio C Hamano
2021-03-30 16:53       ` Derrick Stolee
2021-03-16 21:16 ` [PATCH 02/27] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 03/27] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 04/27] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 05/27] add: ensure full index Derrick Stolee via GitGitGadget
2021-03-17 17:35   ` Elijah Newren
2021-03-17 20:35     ` Matheus Tavares Bernardino
2021-03-17 20:55       ` Derrick Stolee
2021-03-16 21:16 ` [PATCH 06/27] checkout-index: " Derrick Stolee via GitGitGadget
2021-03-17 17:50   ` Elijah Newren
2021-03-17 20:05     ` Derrick Stolee
2021-03-17 21:10       ` Elijah Newren
2021-03-17 21:33         ` Derrick Stolee
2021-03-17 22:36           ` Elijah Newren
2021-03-18  1:17             ` Derrick Stolee
2021-03-16 21:16 ` [PATCH 07/27] checkout: " Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 08/27] commit: " Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 09/27] difftool: " Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 10/27] fsck: " Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 11/27] grep: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 12/27] ls-files: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 13/27] merge-index: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 14/27] rm: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 15/27] sparse-checkout: " Derrick Stolee via GitGitGadget
2021-03-18  5:22   ` Elijah Newren
2021-03-23 13:13     ` Derrick Stolee
2021-03-16 21:17 ` [PATCH 16/27] update-index: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 17/27] diff-lib: " Derrick Stolee via GitGitGadget
2021-03-18  5:24   ` Elijah Newren
2021-03-23 13:15     ` Derrick Stolee
2021-03-16 21:17 ` [PATCH 18/27] dir: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 19/27] entry: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 20/27] merge-ort: " Derrick Stolee via GitGitGadget
2021-03-18  5:31   ` Elijah Newren
2021-03-23 13:26     ` Derrick Stolee
2021-03-16 21:17 ` [PATCH 21/27] merge-recursive: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 22/27] pathspec: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 23/27] read-cache: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 24/27] resolve-undo: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 25/27] revision: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 26/27] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 27/27] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
2021-03-17 18:03 ` [PATCH 00/27] Sparse Index: API protections Elijah Newren
2021-03-18  6:32   ` Elijah Newren
2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 01/25] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 02/25] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 03/25] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 04/25] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 05/25] add: ensure full index Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 06/25] checkout-index: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 07/25] checkout: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 08/25] commit: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 09/25] difftool: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 10/25] fsck: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 11/25] grep: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 12/25] ls-files: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 13/25] merge-index: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 14/25] rm: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 15/25] stash: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 16/25] update-index: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 17/25] dir: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 18/25] entry: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 19/25] merge-recursive: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 20/25] pathspec: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 21/25] read-cache: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 22/25] resolve-undo: " Derrick Stolee via GitGitGadget
2021-04-01  1:50   ` [PATCH v2 23/25] revision: " Derrick Stolee via GitGitGadget
2021-04-01  1:50   ` [PATCH v2 24/25] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
2021-04-05 19:32     ` Elijah Newren [this message]
2021-04-06 11:46       ` Derrick Stolee
2021-04-01  1:50   ` [PATCH v2 25/25] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
2021-04-05 19:53     ` Elijah Newren
2021-04-01  7:07   ` [PATCH v2 00/25] Sparse Index: API protections Junio C Hamano
2021-04-01 13:32     ` Derrick Stolee
2021-04-05 19:55   ` Elijah Newren
2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 01/26] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 02/26] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 03/26] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 04/26] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 05/26] add: ensure full index Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 06/26] checkout-index: " Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 07/26] checkout: " Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 08/26] commit: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 09/26] difftool: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 10/26] fsck: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 11/26] grep: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 12/26] ls-files: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 13/26] merge-index: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 14/26] rm: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 15/26] stash: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 16/26] update-index: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 17/26] dir: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 18/26] entry: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 19/26] merge-recursive: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 20/26] pathspec: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 21/26] read-cache: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 22/26] resolve-undo: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 23/26] revision: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 24/26] name-hash: don't add directories to name_hash Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 25/26] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 26/26] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
2021-04-13 16:02     ` [PATCH v3 00/26] Sparse Index: API protections Elijah Newren
2021-04-14 20:44       ` Junio C Hamano
2021-04-15  2:42         ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BE5xLMsVdamAb=MgiBnXKQrYDPJXa1TWutDK+nVYRPyRQ@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=matheus.bernardino@usp.br \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).