git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Nicolas Pitre <nico@fluxnic.net>
Cc: Duy Nguyen <pclouds@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases
Date: Tue, 21 May 2019 17:20:23 -0400	[thread overview]
Message-ID: <20190521212023.GB14807@sigill.intra.peff.net> (raw)
In-Reply-To: <nycvar.YSQ.7.76.1905201803520.1558@knanqh.ubzr>

On Mon, May 20, 2019 at 07:04:14PM -0400, Nicolas Pitre wrote:

> > That still has some value even if your commit ends up with a question
> > mark. There's not much to dig out of 636171cb80 (make index-pack able
> > to complete thin packs., 2006-10-25). Adding Nico, maybe he still
> > remembers...
> 
> What about this comment in fix_unresolved_deltas():
> 
>         /*
>          * Since many unresolved deltas may well be themselves base objects
>          * for more unresolved deltas, we really want to include the
>          * smallest number of base objects that would cover as much delta
>          * as possible by picking the
>          * trunc deltas first, allowing for other deltas to resolve without
>          * additional base objects.  Since most base objects are to be found
>          * before deltas depending on them, a good heuristic is to start
>          * resolving deltas in the same order as their position in the pack.
>          */
> 
> Doesn't that cover it?

Hmm. It does help, but only because my earlier comments were actually
wrong. I was claiming that index-pack would not be able to resolve "A"
from the delta chain when A is a regular delta, B is a thin delta, and C
is not in the pack.

Because I did not see how we would ever find base "B" in that third pass
of fix_unresolved_deltas(), because we look only for objects we already
have on disk.

But I think the trick that I was missing is that if we see "B" first,
we'll resolve it against C that we already have, but then we'll _also_
look for children that this now enables us to resolve. So we'd resolve
"A" at that point, and then when we later hit "A" in the loop from
fix_unresolved_deltas(), we skip it because of this:

   if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
	continue;

So this situation _can_ happen, and we do handle it properly. I don't
know why the tests I showed in [1] didn't work. Obviously I botched
something.

I'm still not quite sure what would happen if we see "A" first (i.e.,
the delta is physically in the pack before its base, "B") and both are
REF_DELTA. Right now we'd skip over A, knowing that we don't have B. And
then when we get to B, we'd correctly resolve A on top of it (and if we
have no such B, we'd eventually complain "hey, there are still some
unresolved deltas").

But what happens if we _do_ have "B", but we just weren't able to tell
the sender for some reason (e.g., it's unreferenced). We'd add a new
copy of "B" to the pack while resolving "A", as part of --fix-thin. And
then when we resolve "B", we'd get _another_ copy of "B" in the pack,
and our result would have a duplicate.

I don't think this happens in practice because we'd generally use
OFS_DELTA in the first place these days. And even for REF_DELTA, I think
we prefer to put bases before their deltas (i.e., we'd always see "B"
before "A"). But I think if we ever _did_ see it (alternate
implementation? malicious packfile?) we'd generate duplicates. I _think_
that would then cause us to barf due to the duplicate check from
68be2fea50 (receive-pack, fetch-pack: reject bogus pack that records
objects twice, 2011-11-16).

And that's true today, even without Jonathan's on-demand fetching patch.
So I don't think that materially changes the requirements for
correctness. It does mean that we might fetch (but not use!) a thin base
we don't need, but only if the sender uses REF_DELTA for non-thin
deltas, which we wouldn't normally do.

-Peff

[1] https://public-inbox.org/git/20190517043939.GA12063@sigill.intra.peff.net/

  reply	other threads:[~2019-05-21 21:20 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-14 21:10 [PATCH 0/2] Partial clone fix: handling received REF_DELTA Jonathan Tan
2019-05-14 21:10 ` [PATCH 1/2] t5616: refactor packfile replacement Jonathan Tan
2019-05-15  8:36   ` Johannes Schindelin
2019-05-15 18:22     ` Jonathan Tan
2019-05-14 21:10 ` [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases Jonathan Tan
2019-05-15  8:46   ` Johannes Schindelin
2019-05-15 18:28     ` Jonathan Tan
2019-05-17 18:33       ` Johannes Schindelin
2019-05-15 23:16   ` Jeff King
2019-05-16  1:43     ` Junio C Hamano
2019-05-16  4:04       ` Jeff King
2019-05-16 18:26     ` Jonathan Tan
2019-05-16 21:12       ` Jeff King
2019-05-16 21:30         ` Jonathan Tan
2019-05-16 21:42           ` Jeff King
2019-05-16 23:15             ` Jonathan Tan
2019-05-17  1:09               ` Jeff King
2019-05-17  1:22                 ` Jeff King
2019-05-17  4:39                   ` Jeff King
2019-05-17  4:42                     ` Jeff King
2019-05-17  7:20                     ` Duy Nguyen
2019-05-17  8:55                       ` Jeff King
2019-05-18 11:39                         ` Duy Nguyen
2019-05-20 23:04                           ` Nicolas Pitre
2019-05-21 21:20                             ` Jeff King [this message]
2019-06-03 22:23   ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190521212023.GB14807@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=nico@fluxnic.net \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).