git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Robert Coup <robert@coup.net.nz>
Cc: Robert Coup via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Jonathan Tan <jonathantanmy@google.com>,
	John Cai <johncai86@gmail.com>,
	Jeff Hostetler <git@jeffhostetler.com>,
	Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering")
Date: Mon, 28 Feb 2022 23:20:59 +0100
Message-ID: <220228.86ee3m39jf.gmgdl@evledraar.gmail.com>
In-Reply-To: <CACf-nVcy8xsf+STJoE5vwcsUauHRcR5wmwmCfnUnSW=4jDcgYQ@mail.gmail.com>


On Mon, Feb 28 2022, Robert Coup wrote:

> Hi Ævar,
>
> Thanks for taking the time to look into this,
>
> On Mon, 28 Feb 2022 at 16:54, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> I realize this was probably based on feedback on v1 (I didn't go back
>> and re-read it, sorry).
>
> Yes, `fetch --repair` was from Jonathan Tan's v1 feedback[1], where he
> pointed out it could fill in lost objects from any remote in a more
> generally useful fashion.
>
> My goal here is to refetch with a different filter so that I get the
> outcome of a `clone --filter=` without having to chuck my object
> directory. But the actual implementation doesn't need to know anything
> specific about filters, so the original "refilter" name I had isn't
> really right.

*nod*

>> But I feel strongly that we really should name this something other than
>> --repair. I don't care much if it isn't that :) But maybe
>> --expand-filters, --fleshen-partial or something like that?
>
> fleshen-partial sounds like a horror movie scene to me.
>
> 1. `--refetch`
> 2. `--as-clone`
> 3. `--expand-filter` (though TBC you don't necessarily need a filter)
> 4. `--refilter`
> 5. something else

*nod*

>> So first (and partially as an aside): Is a "noop" negotiator really
>> what we want at all? Don't we instead want to be discovering those parts
>> of our history that are closed under reachability (if any) and say we
>> HAVE those things during negotiation?
>
> At an object level we don't have any means of knowing what has or
> hasn't been obtained via fetch to a partial clone with different
> `--filter` args (via config or cli), dynamic fault-ins, or sourced
> from a different remote. Fetch negotiation only occurs for refs and
> their associated commits/histories, but filtering occurs at the blob
> or tree level, so we often HAVE a commit but not all of its
> trees/blobs, whereupon negotiation skips that commit and all its
> associated objects.

Yes, I'm basically asking whether the negotiation part shouldn't *ideally*
be doing the same "everything is connected" check that receive-pack/fsck
do.

I.e. you've got partial data with promisors locally, but if you walk
your branch histor(y|ies) you'll discover that N commits down you have
all the prerequisite objects locally.
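To make that concrete, here is a toy Python model. It is not Git's negotiation code: the object graph and the `closure`, `pack_for` and `connected_commits` helpers are hypothetical, and it assumes a server sends roughly "everything reachable from the wants, minus everything reachable from the haves". It shows both the problem Robert describes (advertising a commit we HAVE hides its missing blobs) and the alternative of only advertising commits whose whole closure is present locally.

```python
# Toy model, not Git's implementation: each object id maps to the ids it
# references (commit -> parents + tree, tree -> entries, blob -> nothing).
GRAPH = {
    "c1": ["t1"], "t1": ["b1"], "b1": [],
    "c2": ["c1", "t2"], "t2": ["b1", "b2"], "b2": [],
    "c3": ["c2", "t3"], "t3": ["b2", "b3"], "b3": [],
}
COMMITS = ["c1", "c2", "c3"]  # oldest first


def closure(roots, graph):
    """Every object reachable from `roots`, the roots included."""
    seen, stack = set(), list(roots)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(graph[oid])
    return seen


def pack_for(wants, haves, graph):
    """Roughly what a server sends: reachable from wants minus reachable from haves."""
    return closure(wants, graph) - closure(haves, graph)


def connected_commits(commits, graph, local):
    """Commits whose entire closure is present locally, i.e. safe to send as HAVE."""
    return [c for c in commits if closure([c], graph) <= local]


if __name__ == "__main__":
    # A partial clone: all commits and trees, but only blob b1 was faulted in.
    local = {"c1", "c2", "c3", "t1", "t2", "t3", "b1"}
    print(sorted(pack_for(["c3"], ["c3"], GRAPH)))  # naive HAVE of the tip: []
    safe = connected_commits(COMMITS, GRAPH, local)
    print(safe)  # ['c1']
    print(sorted(pack_for(["c3"], safe, GRAPH)))
    # ['b2', 'b3', 'c2', 'c3', 't2', 't3']
```

A noop negotiator is the degenerate case `pack_for(wants, [], graph)`, which resends everything. The per-commit closure check is quadratic as written; a real implementation would presumably mark objects in a single flag-based walk instead.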

As an aside, there's a 1:1 mapping between that and what "git bundle
create" will do/verify to create a bundle without listed
prerequisites.

I.e. I think you'll find what it does with revision.c and PREREQ_* and
other flags INTERESTING (a lame pun on its use of UNINTERESTING :).
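The bundle side of that analogy can be sketched the same way. This is a hypothetical Python model, not the revision.c code, and it assumes a bundle's prerequisites are its boundary commits in the `git rev-list --boundary` sense: commits reachable from the included tips but not from the excluded ones go into the bundle, and the excluded parents of those commits are what the recipient must already have. A bundle with no prerequisites is the case where the whole history has to be present locally.

```python
# Hypothetical history: commit -> parents. D (via C) and E both build on B.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["B"]}


def ancestors(tips, parents):
    """All commits reachable from `tips`, the tips included."""
    seen, stack = set(), list(tips)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def bundle_plan(include, exclude, parents):
    """Return (commits in the bundle, prerequisite commits) for `include --not exclude`."""
    uninteresting = ancestors(exclude, parents)
    interesting = ancestors(include, parents) - uninteresting
    # Boundary commits: uninteresting parents of interesting commits.
    prereqs = {p for c in interesting for p in parents[c] if p in uninteresting}
    return interesting, prereqs


if __name__ == "__main__":
    commits, prereqs = bundle_plan(["D"], ["B"], PARENTS)
    print(sorted(commits), sorted(prereqs))  # ['C', 'D'] ['B']
    commits, prereqs = bundle_plan(["D", "E"], [], PARENTS)
    print(sorted(commits), sorted(prereqs))  # ['A', 'B', 'C', 'D', 'E'] []
```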

Presumably the code needed to drive such a negotiation would be useful
for other neat stuff, e.g. having a partially populated repo locally,
wanting to fetch the PACK from the server to complete it, and then
knowing you have the data to create a fully connected (or incremental)
bundle for that repository, but I digress.

>> But secondly, on the "--repair" name: The reason I mentioned that is
>> that I'd really like us to actually have a "my repo is screwed, please
>> repair it".
>
> Feels like people would look at `fsck` for that over `fetch`? Maybe
> not. Anyway, I get the point about the naming still not being right
> :-)

I think that would definitely be fetch/gc rather than "fsck". I.e. if
you've got corruption, fsck can only tell you that it's screwed.

It's fetch/gc (or "git bundle unbundle") that stand any chance of
actually doing the repair, since we'd need to stitch together the
(partially) corrupted local/remote content with a hopefully good
complement to it.

FWIW I had an ad-hoc implementation of this basically working by
disabling the negotiation + not doing any object existence/collision
checks before writing content to the repository.

That, plus teaching "repack" not to die but to carry on in the face of
object decoding failures (and hopefully discover a "duplicate" but good
copy later), plus "gc", is enough to repair most corruption, e.g.
truncated loose objects etc.
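The "carry on and find a good duplicate" idea can be illustrated with a small Python sketch. It relies only on Git's loose-object format (zlib-compressed `<type> <size>\0<content>`, with the object ID being the SHA-1 of the uncompressed bytes); `first_good_copy` and the candidate list are hypothetical and do not describe anything repack or gc does today. Instead of dying on the first copy that fails to decode, it skips it and keeps the first copy that inflates and hashes back to the expected ID.

```python
import hashlib
import zlib


def encode_loose(obj_type: str, content: bytes) -> bytes:
    """Encode an object like a Git loose object: zlib("<type> <size>\\0<content>")."""
    return zlib.compress(f"{obj_type} {len(content)}".encode() + b"\0" + content)


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 object ID of the uncompressed "<type> <size>\\0<content>" bytes."""
    return hashlib.sha1(f"{obj_type} {len(content)}".encode() + b"\0" + content).hexdigest()


def first_good_copy(oid: str, candidates):
    """Return (type, content) from the first candidate that decodes and hashes
    to `oid`, skipping broken copies instead of aborting; None if none is good."""
    for raw in candidates:
        try:
            data = zlib.decompress(raw)  # truncated input raises zlib.error
            header, sep, content = data.partition(b"\0")
            obj_type, size = header.decode().split(" ")
            if not sep or int(size) != len(content):
                continue  # malformed header or size mismatch
        except (zlib.error, UnicodeDecodeError, ValueError):
            continue  # undecodable copy: carry on to the next one
        if hashlib.sha1(data).hexdigest() == oid:
            return obj_type, content
    return None


if __name__ == "__main__":
    oid = object_id("blob", b"hello\n")
    good = encode_loose("blob", b"hello\n")
    truncated = good[: len(good) // 2]        # like a truncated loose object
    wrong = encode_loose("blob", b"jello\n")  # decodes, but the hash won't match
    print(first_good_copy(oid, [truncated, wrong, good]))  # ('blob', b'hello\n')
```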

>> But (and I haven't tested, but I'm pretty sure) this patch series isn't
>> going to give you that. The reasons are elaborated on in [1]: basically
>> we try really hard to re-use local data, and due to that and the collision
>> detection we'll often just die early in object walking.
>>
>> But maybe I'm wrong, have you actually tested this with *broken* objects
>> as opposed to just missing ones with repo filters + promisors in play?
>> Our t/*fsck* and t/*corrupt* etc. tests have some of those.
>
> Correct: I haven't tested with such objects/broken ODBs. Ideally
> repack/gc/etc would prefer a new-fixed pack over the old-broken
> pack/object but that's not really what I'm aiming to achieve here or
> am interested in.

I think I've only tested loose (bad) + pack (good). I suspect pack (bad) +
pack (good) has some bigger caveats (like the first error aborting the
whole pack read, due to deltas etc.).
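Why a bad pack may be worse than a bad loose object can be shown with a toy model of delta chains. The names here are hypothetical and the model is simplified (real packs store OFS_DELTA/REF_DELTA entries and Git's reader is more involved), but the core constraint holds: an object stored as a delta can only be reconstructed if every base along its chain is readable, so one corrupt base takes its dependents down with it.

```python
def unreadable(corrupt, delta_base):
    """Objects that can't be reconstructed: the corrupt ones plus every object
    whose delta chain passes through a corrupt object.

    `delta_base` maps each object in the pack to its delta base, or None if it
    is stored whole (a hypothetical, simplified view of a pack).
    """
    lost = set()
    for oid in delta_base:
        cur, seen = oid, set()
        # Follow the chain towards its full base; `seen` guards against cycles.
        while cur is not None and cur not in seen:
            if cur in corrupt:
                lost.add(oid)
                break
            seen.add(cur)
            cur = delta_base.get(cur)
    return lost


if __name__ == "__main__":
    # B and C are deltas against A (C via B); E is a delta against D.
    pack = {"A": None, "B": "A", "C": "B", "D": None, "E": "D"}
    print(sorted(unreadable({"A"}, pack)))  # ['A', 'B', 'C']
```

In the pack (bad) + pack (good) case, everything in the returned set is what a repair would have to find in the other pack, not just the one corrupt object.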

But yeah, I'm not saying this should be on your radar at all, other than
the bikeshedding point that a --repair which doesn't really do "repair"
would be unfortunate naming, especially if we'd be locked into behavior
orthogonal to what's needed for an "actual" repair.

> 1. https://lore.kernel.org/git/20220202185957.1928631-1-jonathantanmy@google.com/



Thread overview: 76+ messages
2022-02-01 15:49 [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Robert Coup via GitGitGadget
2022-02-01 15:49 ` [PATCH 1/6] fetch-negotiator: add specific noop initializor Robert Coup via GitGitGadget
2022-02-01 15:49 ` [PATCH 2/6] fetch-pack: add partial clone refiltering Robert Coup via GitGitGadget
2022-02-04 18:02   ` Jonathan Tan
2022-02-11 14:56     ` Robert Coup
2022-02-17  0:05       ` Jonathan Tan
2022-02-01 15:49 ` [PATCH 3/6] builtin/fetch-pack: add --refilter option Robert Coup via GitGitGadget
2022-02-01 15:49 ` [PATCH 4/6] fetch: " Robert Coup via GitGitGadget
2022-02-01 15:49 ` [PATCH 5/6] t5615-partial-clone: add test for --refilter Robert Coup via GitGitGadget
2022-02-01 15:49 ` [PATCH 6/6] doc/partial-clone: mention --refilter option Robert Coup via GitGitGadget
2022-02-01 20:13 ` [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Junio C Hamano
2022-02-02 15:02   ` Robert Coup
2022-02-16 13:24     ` Robert Coup
2022-02-02 18:59 ` Jonathan Tan
2022-02-02 21:58   ` Robert Coup
2022-02-02 21:59     ` Robert Coup
2022-02-07 19:37 ` Jeff Hostetler
2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
2022-02-24 16:13   ` [PATCH v2 1/8] fetch-negotiator: add specific noop initializor Robert Coup via GitGitGadget
2022-02-25  6:19     ` Junio C Hamano
2022-02-28 12:22       ` Robert Coup
2022-02-24 16:13   ` [PATCH v2 2/8] fetch-pack: add repairing Robert Coup via GitGitGadget
2022-02-25  6:46     ` Junio C Hamano
2022-02-28 12:14       ` Robert Coup
2022-02-24 16:13   ` [PATCH v2 3/8] builtin/fetch-pack: add --repair option Robert Coup via GitGitGadget
2022-02-24 16:13   ` [PATCH v2 4/8] fetch: " Robert Coup via GitGitGadget
2022-02-24 16:13   ` [PATCH v2 5/8] t5615-partial-clone: add test for fetch --repair Robert Coup via GitGitGadget
2022-02-24 16:13   ` [PATCH v2 6/8] maintenance: add ability to pass config options Robert Coup via GitGitGadget
2022-02-25  6:57     ` Junio C Hamano
2022-02-28 12:02       ` Robert Coup
2022-02-28 17:07         ` Junio C Hamano
2022-02-25 10:29     ` Ævar Arnfjörð Bjarmason
2022-02-28 11:51       ` Robert Coup
2022-02-24 16:13   ` [PATCH v2 7/8] fetch: after repair, encourage auto gc repacking Robert Coup via GitGitGadget
2022-02-28 16:40     ` Ævar Arnfjörð Bjarmason
2022-02-24 16:13   ` [PATCH v2 8/8] doc/partial-clone: mention --repair fetch option Robert Coup via GitGitGadget
2022-02-28 16:43   ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Ævar Arnfjörð Bjarmason
2022-02-28 17:27     ` Robert Coup
2022-02-28 18:54       ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation Junio C Hamano
2022-02-28 22:20       ` Ævar Arnfjörð Bjarmason [this message]
2022-03-04 15:04   ` [PATCH v3 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
2022-03-04 15:04     ` [PATCH v3 1/7] fetch-negotiator: add specific noop initializer Robert Coup via GitGitGadget
2022-03-04 15:04     ` [PATCH v3 2/7] fetch-pack: add refetch Robert Coup via GitGitGadget
2022-03-04 15:04     ` [PATCH v3 3/7] builtin/fetch-pack: add --refetch option Robert Coup via GitGitGadget
2022-03-04 15:04     ` [PATCH v3 4/7] fetch: " Robert Coup via GitGitGadget
2022-03-04 21:19       ` Junio C Hamano
2022-03-07 11:31         ` Robert Coup
2022-03-07 17:27           ` Junio C Hamano
2022-03-09 10:00             ` Robert Coup
2022-03-04 15:04     ` [PATCH v3 5/7] t5615-partial-clone: add test for fetch --refetch Robert Coup via GitGitGadget
2022-03-04 15:04     ` [PATCH v3 6/7] fetch: after refetch, encourage auto gc repacking Robert Coup via GitGitGadget
2022-03-04 15:04     ` [PATCH v3 7/7] doc/partial-clone: mention --refetch fetch option Robert Coup via GitGitGadget
2022-03-09  0:27     ` [PATCH v3 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Calvin Wan
2022-03-09  9:57       ` Robert Coup
2022-03-09 21:32         ` [PATCH v3 0/7] fetch: add repair: full refetch without negotiation Junio C Hamano
2022-03-10  1:07           ` Calvin Wan
2022-03-10 14:29           ` Robert Coup
2022-03-21 17:58             ` Calvin Wan
2022-03-21 21:34               ` Robert Coup
2022-03-28 14:02     ` [PATCH v4 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
2022-03-28 14:02       ` [PATCH v4 1/7] fetch-negotiator: add specific noop initializer Robert Coup via GitGitGadget
2022-03-28 14:02       ` [PATCH v4 2/7] fetch-pack: add refetch Robert Coup via GitGitGadget
2022-03-31 15:09         ` Ævar Arnfjörð Bjarmason
2022-04-01 10:26           ` Robert Coup
2022-03-28 14:02       ` [PATCH v4 3/7] builtin/fetch-pack: add --refetch option Robert Coup via GitGitGadget
2022-03-28 14:02       ` [PATCH v4 4/7] fetch: " Robert Coup via GitGitGadget
2022-03-31 15:18         ` Ævar Arnfjörð Bjarmason
2022-04-01 10:31           ` Robert Coup
2022-03-28 14:02       ` [PATCH v4 5/7] t5615-partial-clone: add test for fetch --refetch Robert Coup via GitGitGadget
2022-03-31 15:20         ` Ævar Arnfjörð Bjarmason
2022-04-01 10:36           ` Robert Coup
2022-03-28 14:02       ` [PATCH v4 6/7] fetch: after refetch, encourage auto gc repacking Robert Coup via GitGitGadget
2022-03-31 15:22         ` Ævar Arnfjörð Bjarmason
2022-04-01 10:51           ` Robert Coup
2022-03-28 14:02       ` [PATCH v4 7/7] docs: mention --refetch fetch option Robert Coup via GitGitGadget
2022-03-28 17:38         ` Junio C Hamano
