git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	git@vger.kernel.org, vdye@github.com, gitster@pobox.com
Subject: Re: [PATCH 2/2] builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
Date: Fri, 13 May 2022 16:06:39 -0700	[thread overview]
Message-ID: <20220513230639.1099955-1-jonathantanmy@google.com> (raw)
In-Reply-To: <bcc48004450769410d7e6d8a6e56f08bfa9a02a3.1652458395.git.me@ttaylorr.com>

Taylor Blau <me@ttaylorr.com> writes:
> An alternative approach to closing this race would have been to call
> `is_pack_valid()` on _all_ packs in a multi-pack bitmap on load. This
> has a couple of problems:
> 
>   - it is unnecessarily expensive in the cases where we don't actually
>     need to open any packs (e.g., in `git rev-list --use-bitmap-index
>     --count`)
> 
>   - more importantly, it means any time we would have hit this race,
>     we'll avoid using bitmaps altogether, leading to significant
>     slowdowns by forcing a full object traversal

This answers a question I had about why we're only opening the preferred
pack instead of all packs. (You mention in [1] that it's also answered
in that patch message, but I didn't see it.) In any case, it might be
clearer to move this part to the 1st commit.

[1] https://lore.kernel.org/git/Yn63nDhSBIEa3%2F+%2F@nand.local/

> Work around this by calling `is_pack_valid()` from within
> `want_found_object()`, matching the behavior in
> `want_object_in_pack_one()` (which has an analogous call). Most calls to
> `is_pack_valid()` should be basically no-ops, since only the first call
> requires us to open a file (subsequent calls realize the file is already
> open, and return immediately).
> 
> This does require us to make a small change in
> `want_object_in_pack_one()`, since `want_found_object()` may return `-1`
> (indicating "keep searching for other packs containing this object")
> when `*found_pack` is non-NULL. Force `want_object_in_pack_one()` to
> call `is_pack_valid()` when `p != *found_pack`, not just when
> `*found_pack` is non-NULL.

It took me a while to realize that the relevant want_found_object()
invocation that may return -1 is not the one in
want_object_in_pack_one(), but in the latter's caller
want_object_in_pack(). But even in this case, couldn't
want_found_object() return -1 (see the very end of the function) even
before this patch?

> @@ -1424,14 +1427,15 @@ static int want_object_in_pack_one(struct packed_git *p,
>  				   off_t *found_offset)
>  {
>  	off_t offset;
> +	int use_found = p == *found_pack;
>  
> -	if (p == *found_pack)
> +	if (use_found)
>  		offset = *found_offset;
>  	else
>  		offset = find_pack_entry_one(oid->hash, p);
>  
>  	if (offset) {
> -		if (!*found_pack) {
> +		if (!use_found) {
>  			if (!is_pack_valid(p))
>  				return -1;
>  			*found_offset = offset;

My understanding of the purpose of this code change is that if we reach
here with a non-NULL *found_pack, it is likely that *found_pack contains
an invalid pack, and this part overwrites *found_pack (and
*found_offset) if it finds a valid pack. This seems like a good change,
but I don't see how this is a result of something that "does require
us" (as far as I can tell, *found_pack could have already been invalid
before this patch, so the downstream code should already be able to
handle it). Maybe it's just that we couldn't tell if the pack is invalid
previously, but now we can; but if so, it would be better to say "use
this added information to overwrite *found_pack when it makes sense" or
something like that.

(An alternative to the change in this patch may be to reset *found_pack
to NULL when it is found that the pack is invalid, but I haven't
investigated all the callers to see if they can tolerate *found_pack
moving changing non-NULL to NULL, so the change in this patch is
probably more practical.)

Other than that (and the "close(fd)" thing in patch 1 that Junio
mentioned [2]), this series looks good to me. Thanks for noticing and
fixing the bug.

[2] https://lore.kernel.org/git/xmqqpmkh9tye.fsf@gitster.g/

  reply	other threads:[~2022-05-13 23:11 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-13 16:23 [PATCH 0/2] pack-objects: fix a pair of MIDX bitmap-related races Taylor Blau
2022-05-13 16:23 ` [PATCH 1/2] pack-bitmap: check preferred pack validity when opening MIDX bitmap Taylor Blau
2022-05-13 18:19   ` Junio C Hamano
2022-05-13 19:55     ` Taylor Blau
2022-05-13 16:23 ` [PATCH 2/2] builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects Taylor Blau
2022-05-13 23:06   ` Jonathan Tan [this message]
2022-05-14 13:17     ` Taylor Blau
2022-05-16  6:07       ` Jonathan Tan
2022-05-14 13:34     ` Taylor Blau
2022-05-16  6:11       ` Jonathan Tan
2022-05-24 18:54 ` [PATCH v2 0/4] pack-objects: fix a pair of MIDX bitmap-related races Taylor Blau
2022-05-24 18:54   ` [PATCH v2 1/4] pack-bitmap.c: check preferred pack validity when opening MIDX bitmap Taylor Blau
2022-05-24 19:36     ` Ævar Arnfjörð Bjarmason
2022-05-24 21:38       ` Taylor Blau
2022-05-24 21:51         ` Ævar Arnfjörð Bjarmason
2022-05-24 18:54   ` [PATCH v2 2/4] builtin/pack-objects.c: avoid redundant NULL check Taylor Blau
2022-05-24 21:44     ` Junio C Hamano
2022-05-25  0:11       ` Taylor Blau
2022-05-24 18:54   ` [PATCH v2 3/4] builtin/pack-objects.c: ensure included `--stdin-packs` exist Taylor Blau
2022-05-24 19:46     ` Ævar Arnfjörð Bjarmason
2022-05-24 21:33       ` Taylor Blau
2022-05-24 21:49         ` Ævar Arnfjörð Bjarmason
2022-05-24 22:03     ` Junio C Hamano
2022-05-25  0:14       ` Taylor Blau
2022-05-26 19:21     ` Victoria Dye
2022-05-26 20:05       ` Taylor Blau
2022-05-24 18:54   ` [PATCH v2 4/4] builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects Taylor Blau
2022-05-24 21:38   ` [PATCH v2 0/4] pack-objects: fix a pair of MIDX bitmap-related races Junio C Hamano
2022-05-25  0:16     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220513230639.1099955-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).