From: Jeff Hostetler <git@jeffhostetler.com>
To: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension
Date: Thu, 23 Mar 2023 11:22:00 -0400 [thread overview]
Message-ID: <aaad07e3-11c9-9bb1-89ec-f9418cb2ef77@jeffhostetler.com> (raw)
In-Reply-To: <f1897b880729b649ab24da14cbc3432d44b7c731.1679500859.git.gitgitgadget@gmail.com>
On 3/22/23 12:00 PM, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> When a split-index is in effect, the `$GIT_DIR/index` file needs to
> contain a "link" extension that contains all the information about the
> split-index, including the information about the shared index.
>
> However, in some cases Git needs to suppress writing that "link"
> extension (i.e. to fall back to writing a full index) even if the
> in-memory index structure _has_ a `split_index` configured. This is the
> case e.g. when "too many not shared" index entries exist.
>
> In such instances, the current code sets the `base_oid` field of said
> `split_index` structure to all-zero to indicate that `do_write_index()`
> should skip writing the "link" extension.
>
> This can lead to problems later on, when the in-memory index is still
> used to perform other operations and eventually wants to write a
> split-index, detects the presence of the `split_index` and reuses that,
> too (under the assumption that it has been initialized correctly and
> still has a non-null `base_oid`).
>
> Let's stop zeroing out the `base_oid` to indicate that the "link"
> extension should not be written.
>
> One might be tempted to simply call `discard_split_index()` instead,
> under the assumption that Git decided to write a non-split index and
> therefore the the `split_index` structure might no longer be wanted.
> However, that is not possible because that would release index entries
> in `split_index->base` that are likely to still be in use. Therefore we
> cannot do that.
>
> The next best thing we _can_ do is to introduce a flag, specifically
> indicating when the "link" extension should be skipped. So that's what
> we do here.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> read-cache.c | 37 ++++++++++++++++++++++--------------
> t/t7527-builtin-fsmonitor.sh | 2 +-
> 2 files changed, 24 insertions(+), 15 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index b09128b1884..8fcb2d54c05 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2868,6 +2868,12 @@ static int record_ieot(void)
> return !git_config_get_index_threads(&val) && val != 1;
> }
>
> +enum strip_extensions {
> + WRITE_ALL_EXTENSIONS = 0,
> + STRIP_ALL_EXTENSIONS = 1,
> + STRIP_LINK_EXTENSION_ONLY = 2
> +};
Earlier (in a response to Junio's response on this commit) I said
that I didn't think we needed to make a bit set here, but I want
to re-think that or at least walk thru the change and talk out loud.
I'll explain in-line below.
> +
> /*
> * On success, `tempfile` is closed. If it is the temporary file
> * of a `struct lock_file`, we will therefore effectively perform
> @@ -2876,7 +2882,7 @@ static int record_ieot(void)
> * rely on it.
> */
> static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
> - int strip_extensions, unsigned flags)
> + enum strip_extensions strip_extensions, unsigned flags)
> {
> uint64_t start = getnanotime();
> struct hashfile *f;
> @@ -3045,7 +3051,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
> return -1;
> }
>
> - if (!strip_extensions && istate->split_index &&
> + if (strip_extensions == WRITE_ALL_EXTENSIONS && istate->split_index &&
> !is_null_oid(&istate->split_index->base_oid)) {
(I hate all of this double negative logic...)
Here we only want the extension if we have WRITE_ALL, so that is
NOT STRIP_ALL and NOT STRIP_LINK_ONLY, so that is OK.
> struct strbuf sb = STRBUF_INIT;
>
> @@ -3060,7 +3066,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
> if (err)
> return -1;
> }
> - if (!strip_extensions && !drop_cache_tree && istate->cache_tree) {
> + if (strip_extensions != STRIP_ALL_EXTENSIONS && !drop_cache_tree && istate->cache_tree) {
Here we only want the extension when NOT STRIP_ALL, so this is
either WRITE_ALL or STRIP_LINK_ONLY, so this is OK. The rest are
the same, so I'll omit them.
[...]
All of this looks correct, but I stumbled over things on my first
or second reading. I wonder if it would it simplify things to define
this as:
enum strip_extensions {
WRITE_ALL_EXTENSIONS = 0,
STRIP_LINK_EXTENSION = (1<0),
STRIP_OTHER_EXTENSIONS = (1<1),
STRIP_ALL_EXTENSIONS = (STRIP_LINK_EXTENSION
| STRIP_OTHER_EXTENSIONS),
};
Then the link test becomes:
if ( ! (strip_extensions & STRIP_LINK_EXTENSION) &&
istate->split_index &&
...) {
and the others become:
if ( ! (strip_extensions & STRIP_OTHER_EXTENSIONS) &&
...) {
If we need to add the ability later to strip an individual,
we can easily add a bit to the enum and update the _ALL_ mask
and the corresponding `if` test.
In a later commit (probably in another series), we can invert
these double negatives to improve readability.
> +/*
> + * Write the Git index into a `.lock` file
> + *
> + * If `strip_link_extension` is non-zero, avoid writing any "link" extension
> + * (used by the split-index feature).
> + */
> static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
> - unsigned flags)
> + unsigned flags, int strip_link_extension)
> {
> int ret;
> int was_full = istate->sparse_index == INDEX_EXPANDED;
> @@ -3185,7 +3197,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
> */
> trace2_region_enter_printf("index", "do_write_index", the_repository,
> "%s", get_lock_file_path(lock));
> - ret = do_write_index(istate, lock->tempfile, 0, flags);
> + ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags);
> trace2_region_leave_printf("index", "do_write_index", the_repository,
> "%s", get_lock_file_path(lock));
>
> @@ -3214,7 +3226,7 @@ static int write_split_index(struct index_state *istate,
> {
> int ret;
> prepare_to_write_split_index(istate);
> - ret = do_write_locked_index(istate, lock, flags);
> + ret = do_write_locked_index(istate, lock, flags, 0);
could we use the enum values here instead of 0 ?
> finish_writing_split_index(istate);
> return ret;
> }
> @@ -3366,9 +3378,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
> if ((!si && !test_split_index_env) ||
> alternate_index_output ||
> (istate->cache_changed & ~EXTMASK)) {
> - if (si)
> - oidclr(&si->base_oid);
> - ret = do_write_locked_index(istate, lock, flags);
> + ret = do_write_locked_index(istate, lock, flags, 1);
and here
> goto out;
> }
>
> @@ -3394,8 +3404,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
> /* Same initial permissions as the main .git/index file */
> temp = mks_tempfile_sm(git_path("sharedindex_XXXXXX"), 0, 0666);
> if (!temp) {
> - oidclr(&si->base_oid);
> - ret = do_write_locked_index(istate, lock, flags);
> + ret = do_write_locked_index(istate, lock, flags, 1);
and here
> goto out;
> }
> ret = write_shared_index(istate, &temp, flags);
> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index cbafdd69602..9fab9a2ab38 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -1003,7 +1003,7 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
> egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
> '
>
> -test_expect_failure 'split-index and FSMonitor work well together' '
> +test_expect_success 'split-index and FSMonitor work well together' '
> git init split-index &&
> test_when_finished "git -C \"$PWD/split-index\" \
> fsmonitor--daemon stop" &&
Thanks
Jeff
next prev parent reply other threads:[~2023-03-23 15:22 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-22 16:00 [PATCH 0/4] Fix a few split-index bugs Johannes Schindelin via GitGitGadget
2023-03-22 16:00 ` [PATCH 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
2023-03-23 13:39 ` Jeff Hostetler
2023-03-22 16:00 ` [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
2023-03-22 21:24 ` Junio C Hamano
2023-03-23 13:59 ` Jeff Hostetler
2023-03-23 16:06 ` Junio C Hamano
2023-03-23 15:22 ` Jeff Hostetler [this message]
2023-03-22 16:00 ` [PATCH 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
2023-03-22 16:00 ` [PATCH 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
2023-03-23 14:32 ` Jeff Hostetler
2023-03-22 16:22 ` [PATCH 0/4] Fix a few split-index bugs Junio C Hamano
2023-03-26 22:45 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
2023-03-26 22:45 ` [PATCH v2 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
2023-03-26 22:45 ` [PATCH v2 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
2023-03-26 22:45 ` [PATCH v2 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
2023-03-26 22:45 ` [PATCH v2 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
2023-03-27 14:48 ` [PATCH v2 0/4] Fix a few split-index bugs Jeff Hostetler
2023-03-27 16:42 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aaad07e3-11c9-9bb1-89ec-f9418cb2ef77@jeffhostetler.com \
--to=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=johannes.schindelin@gmx.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).