git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension
Date: Thu, 23 Mar 2023 11:22:00 -0400	[thread overview]
Message-ID: <aaad07e3-11c9-9bb1-89ec-f9418cb2ef77@jeffhostetler.com> (raw)
In-Reply-To: <f1897b880729b649ab24da14cbc3432d44b7c731.1679500859.git.gitgitgadget@gmail.com>



On 3/22/23 12:00 PM, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> When a split-index is in effect, the `$GIT_DIR/index` file needs to
> contain a "link" extension that contains all the information about the
> split-index, including the information about the shared index.
> 
> However, in some cases Git needs to suppress writing that "link"
> extension (i.e. to fall back to writing a full index) even if the
> in-memory index structure _has_ a `split_index` configured. This is the
> case e.g. when "too many not shared" index entries exist.
> 
> In such instances, the current code sets the `base_oid` field of said
> `split_index` structure to all-zero to indicate that `do_write_index()`
> should skip writing the "link" extension.
> 
> This can lead to problems later on, when the in-memory index is still
> used to perform other operations and eventually wants to write a
> split-index, detects the presence of the `split_index` and reuses that,
> too (under the assumption that it has been initialized correctly and
> still has a non-null `base_oid`).
> 
> Let's stop zeroing out the `base_oid` to indicate that the "link"
> extension should not be written.
> 
> One might be tempted to simply call `discard_split_index()` instead,
> under the assumption that Git decided to write a non-split index and
> therefore the the `split_index` structure might no longer be wanted.
> However, that is not possible because that would release index entries
> in `split_index->base` that are likely to still be in use. Therefore we
> cannot do that.
> 
> The next best thing we _can_ do is to introduce a flag, specifically
> indicating when the "link" extension should be skipped. So that's what
> we do here.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>   read-cache.c                 | 37 ++++++++++++++++++++++--------------
>   t/t7527-builtin-fsmonitor.sh |  2 +-
>   2 files changed, 24 insertions(+), 15 deletions(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index b09128b1884..8fcb2d54c05 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2868,6 +2868,12 @@ static int record_ieot(void)
>   	return !git_config_get_index_threads(&val) && val != 1;
>   }
>   
> +enum strip_extensions {
> +	WRITE_ALL_EXTENSIONS = 0,
> +	STRIP_ALL_EXTENSIONS = 1,
> +	STRIP_LINK_EXTENSION_ONLY = 2
> +};

Earlier (in a response to Junio's response on this commit) I said
that I didn't think we needed to make a bit set here, but I want
to re-think that or at least walk thru the change and talk out loud.

I'll explain in-line below.

> +
>   /*
>    * On success, `tempfile` is closed. If it is the temporary file
>    * of a `struct lock_file`, we will therefore effectively perform
> @@ -2876,7 +2882,7 @@ static int record_ieot(void)
>    * rely on it.
>    */
>   static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
> -			  int strip_extensions, unsigned flags)
> +			  enum strip_extensions strip_extensions, unsigned flags)
>   {
>   	uint64_t start = getnanotime();
>   	struct hashfile *f;
> @@ -3045,7 +3051,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>   			return -1;
>   	}
>   
> -	if (!strip_extensions && istate->split_index &&
> +	if (strip_extensions == WRITE_ALL_EXTENSIONS && istate->split_index &&
>   	    !is_null_oid(&istate->split_index->base_oid)) {

(I hate all of this double negative logic...)

Here we only want the extension if we have WRITE_ALL, so that is
NOT STRIP_ALL and NOT STRIP_LINK_ONLY, so that is OK.

>   		struct strbuf sb = STRBUF_INIT;
>   
> @@ -3060,7 +3066,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>   		if (err)
>   			return -1;
>   	}
> -	if (!strip_extensions && !drop_cache_tree && istate->cache_tree) {
> +	if (strip_extensions != STRIP_ALL_EXTENSIONS && !drop_cache_tree && istate->cache_tree) {

Here we only want the extension when NOT STRIP_ALL, so this is
either WRITE_ALL or STRIP_LINK_ONLY, so this is OK.  The rest are
the same, so I'll omit them.

[...]

All of this looks correct, but I stumbled over things on my first
or second reading.  I wonder if it would it simplify things to define
this as:

enum strip_extensions {
	WRITE_ALL_EXTENSIONS   = 0,
	STRIP_LINK_EXTENSION   = (1<0),
	STRIP_OTHER_EXTENSIONS = (1<1),
	STRIP_ALL_EXTENSIONS   = (STRIP_LINK_EXTENSION
				 | STRIP_OTHER_EXTENSIONS),
};

Then the link test becomes:
	if ( ! (strip_extensions & STRIP_LINK_EXTENSION) &&
	    istate->split_index &&
	    ...) {

and the others become:

	if ( ! (strip_extensions & STRIP_OTHER_EXTENSIONS) &&
	    ...) {

If we need to add the ability later to strip an individual,
we can easily add a bit to the enum and update the _ALL_ mask
and the corresponding `if` test.

In a later commit (probably in another series), we can invert
these double negatives to improve readability.

> +/*
> + * Write the Git index into a `.lock` file
> + *
> + * If `strip_link_extension` is non-zero, avoid writing any "link" extension
> + * (used by the split-index feature).
> + */
>   static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
> -				 unsigned flags)
> +				 unsigned flags, int strip_link_extension)
>   {
>   	int ret;
>   	int was_full = istate->sparse_index == INDEX_EXPANDED;
> @@ -3185,7 +3197,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>   	 */
>   	trace2_region_enter_printf("index", "do_write_index", the_repository,
>   				   "%s", get_lock_file_path(lock));
> -	ret = do_write_index(istate, lock->tempfile, 0, flags);
> +	ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags);
>   	trace2_region_leave_printf("index", "do_write_index", the_repository,
>   				   "%s", get_lock_file_path(lock));
>   
> @@ -3214,7 +3226,7 @@ static int write_split_index(struct index_state *istate,
>   {
>   	int ret;
>   	prepare_to_write_split_index(istate);
> -	ret = do_write_locked_index(istate, lock, flags);
> +	ret = do_write_locked_index(istate, lock, flags, 0);

could we use the enum values here instead of 0 ?

>   	finish_writing_split_index(istate);
>   	return ret;
>   }
> @@ -3366,9 +3378,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
>   	if ((!si && !test_split_index_env) ||
>   	    alternate_index_output ||
>   	    (istate->cache_changed & ~EXTMASK)) {
> -		if (si)
> -			oidclr(&si->base_oid);
> -		ret = do_write_locked_index(istate, lock, flags);
> +		ret = do_write_locked_index(istate, lock, flags, 1);

and here

>   		goto out;
>   	}
>   
> @@ -3394,8 +3404,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
>   		/* Same initial permissions as the main .git/index file */
>   		temp = mks_tempfile_sm(git_path("sharedindex_XXXXXX"), 0, 0666);
>   		if (!temp) {
> -			oidclr(&si->base_oid);
> -			ret = do_write_locked_index(istate, lock, flags);
> +			ret = do_write_locked_index(istate, lock, flags, 1);

and here

>   			goto out;
>   		}
>   		ret = write_shared_index(istate, &temp, flags);
> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index cbafdd69602..9fab9a2ab38 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -1003,7 +1003,7 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
>   	egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
>   '
>   
> -test_expect_failure 'split-index and FSMonitor work well together' '
> +test_expect_success 'split-index and FSMonitor work well together' '
>   	git init split-index &&
>   	test_when_finished "git -C \"$PWD/split-index\" \
>   		fsmonitor--daemon stop" &&

Thanks
Jeff

  parent reply	other threads:[~2023-03-23 15:22 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-22 16:00 [PATCH 0/4] Fix a few split-index bugs Johannes Schindelin via GitGitGadget
2023-03-22 16:00 ` [PATCH 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
2023-03-23 13:39   ` Jeff Hostetler
2023-03-22 16:00 ` [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
2023-03-22 21:24   ` Junio C Hamano
2023-03-23 13:59     ` Jeff Hostetler
2023-03-23 16:06       ` Junio C Hamano
2023-03-23 15:22   ` Jeff Hostetler [this message]
2023-03-22 16:00 ` [PATCH 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
2023-03-22 16:00 ` [PATCH 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
2023-03-23 14:32   ` Jeff Hostetler
2023-03-22 16:22 ` [PATCH 0/4] Fix a few split-index bugs Junio C Hamano
2023-03-26 22:45 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
2023-03-26 22:45   ` [PATCH v2 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
2023-03-26 22:45   ` [PATCH v2 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
2023-03-26 22:45   ` [PATCH v2 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
2023-03-26 22:45   ` [PATCH v2 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
2023-03-27 14:48   ` [PATCH v2 0/4] Fix a few split-index bugs Jeff Hostetler
2023-03-27 16:42     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aaad07e3-11c9-9bb1-89ec-f9418cb2ef77@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johannes.schindelin@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).