git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, me@ttaylorr.com, vdye@github.com,
	avarab@gmail.com, steadmon@google.com, chooglen@google.com
Subject: Re: [PATCH v2 01/10] bundle: optionally skip reachability walk
Date: Mon, 23 Jan 2023 13:24:26 -0500	[thread overview]
Message-ID: <eae85534-89c9-6eff-69d5-7d4b2be85fb6@github.com> (raw)
In-Reply-To: <xmqqsfg1m8l6.fsf@gitster.g>

On 1/23/2023 1:03 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> When unbundling a bundle, the verify_bundle() method checks two things
>> with regards to the prerequisite commits:
>>
>>  1. Those commits are in the object store, and
>>  2. Those commits are reachable from refs.
>>
>> During testing of the bundle URI feature, where multiple bundles are
>> unbundled in the same process, the ref store did not appear to be
>> refreshing with the new refs/bundles/* references added within that
>> process. This caused the second half -- the reachability walk -- report
>> that some commits were not present, despite actually being present.
>>
>> One way to attempt to fix this would be to create a way to force-refresh
>> the ref state. That would correct this for these cases where the
>> refs/bundles/* references have been updated. However, this still is an
>> expensive operation in a repository with many references.
>>
>> Instead, optionally allow callers to skip this portion by instead just
>> checking for presence within the object store. Use this when unbundling
>> in bundle-uri.c.
> 
> This step is new in this round.
> 
> I am assuming that this approach is to avoid repeated "now we
> unbundled one, let's spend enormous cycles to update the in-core
> view of the ref store before processing the next bundle"---instead
> we unbundle all, assuming the prerequisites for each and every
> bundle are satisfied.

We are specifically removing the requirement that the objects are
reachable from refs, we still check that the objects are in the
object store. Thus, we can only be in a bad state afterwards if
the required objects for a bundle were in the object store,
previously unreachable, and one of these two things happened:

1. Some objects reachable from those required commits were already
   missing in the repository (so the repo's object store was broken
   but only for some unreachable objects).

2. A GC pruned those objects between verifying the bundle and
   writing the refs/bundles/* refs after unbundling.

I think we should trust the repository to not be in the first state,
but the race condition in the second option will create a state
where we have missing objects that are now reachable from refs.
 
> I am OK as long as we check the assumption holds true at the end;
> this looks like a good optimization.
 
So are you recommending that we verify all objects reachable from
the new refs/bundles/* are present after unbundling? That would
prevent the possibility of a GC race, but at some significant run-
time performance costs. Do we do the same as we unpack from a
fetch? Do we apply the same scrutiny to the objects during
unbundling as we do from a fetched pack? They both use 'git
index-pack --stdin --fix-thin', so my guess is that we do the same
amount of checks for both cases.

Since this is only one of multiple ways to add objects that depend
on possibly-unreachable objects, my preferred way to avoid the
GC race is by using care in the GC process itself (such as the new
--expire-to option to recover from these races).

Thanks,
-Stolee

  reply	other threads:[~2023-01-23 18:26 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
2023-01-06 20:36 ` [PATCH 1/8] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-17 18:17   ` Victoria Dye
2023-01-17 21:00     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-09  2:38   ` Junio C Hamano
2023-01-09 14:20     ` Derrick Stolee
2023-01-17 19:13   ` Victoria Dye
2023-01-06 20:36 ` [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-09  3:08   ` Junio C Hamano
2023-01-09 14:41     ` Derrick Stolee
2023-01-17 19:24   ` Victoria Dye
2023-01-06 20:36 ` [PATCH 4/8] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-09  3:22   ` Junio C Hamano
2023-01-09 14:58     ` Derrick Stolee
2023-01-19 18:32   ` Victoria Dye
2023-01-20 14:56     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 5/8] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-19 19:42   ` Victoria Dye
2023-01-20 15:42     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 6/8] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-19 19:44   ` Victoria Dye
2023-01-06 20:36 ` [PATCH 7/8] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-19 20:34   ` Victoria Dye
2023-01-20 15:47     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-19 22:24   ` Victoria Dye
2023-01-20 15:53     ` Derrick Stolee
2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 01/10] bundle: optionally skip reachability walk Derrick Stolee via GitGitGadget
2023-01-23 18:03     ` Junio C Hamano
2023-01-23 18:24       ` Derrick Stolee [this message]
2023-01-23 20:13         ` Junio C Hamano
2023-01-23 22:30           ` Junio C Hamano
2023-01-24 12:27             ` Derrick Stolee
2023-01-24 14:14               ` [PATCH v2.5 01/11] bundle: test unbundling with incomplete history Derrick Stolee
2023-01-24 17:16                 ` Junio C Hamano
2023-01-24 14:16               ` [PATCH v2.5 02/11] bundle: verify using connected() Derrick Stolee
2023-01-24 17:33                 ` Junio C Hamano
2023-01-24 18:46                   ` Derrick Stolee
2023-01-24 20:41                     ` Junio C Hamano
2023-01-24 15:22               ` [PATCH v2 01/10] bundle: optionally skip reachability walk Junio C Hamano
2023-01-23 21:08         ` Junio C Hamano
2023-01-23 15:21   ` [PATCH v2 02/10] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-27 19:15     ` Victoria Dye
2023-01-23 15:21   ` [PATCH v2 03/10] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 04/10] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 05/10] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-27 19:17     ` Victoria Dye
2023-01-27 19:32       ` Junio C Hamano
2023-01-30 18:43         ` Derrick Stolee
2023-01-30 19:02           ` Junio C Hamano
2023-01-30 19:12             ` Derrick Stolee
2023-01-23 15:21   ` [PATCH v2 06/10] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 07/10] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 08/10] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-27 19:18     ` Victoria Dye
2023-01-23 15:21   ` [PATCH v2 09/10] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
2023-01-27 19:21     ` Victoria Dye
2023-01-30 18:47       ` Derrick Stolee
2023-01-27 19:28   ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Victoria Dye
2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 01/11] bundle: test unbundling with incomplete history Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 02/11] bundle: verify using check_connected() Derrick Stolee via GitGitGadget
2023-01-31 17:35       ` Junio C Hamano
2023-01-31 19:31         ` Derrick Stolee
2023-01-31 19:36           ` Junio C Hamano
2023-01-31 13:29     ` [PATCH v3 03/11] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 04/11] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 05/11] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-31 21:22       ` Junio C Hamano
2023-01-31 13:29     ` [PATCH v3 06/11] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 07/11] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 08/11] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 09/11] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 10/11] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 11/11] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
2023-01-31 22:01     ` [PATCH v3 00/11] Bundle URIs V: creationToken heuristic for incremental fetches Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eae85534-89c9-6eff-69d5-7d4b2be85fb6@github.com \
    --to=derrickstolee@github.com \
    --cc=avarab@gmail.com \
    --cc=chooglen@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=steadmon@google.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).