mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <>
To: Nicolas Pitre <>
Cc: Duy Nguyen <>,
	Jonathan Tan <>,
	Git Mailing List <>
Subject: Re: [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases
Date: Tue, 21 May 2019 17:20:23 -0400	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <nycvar.YSQ.7.76.1905201803520.1558@knanqh.ubzr>

On Mon, May 20, 2019 at 07:04:14PM -0400, Nicolas Pitre wrote:

> > That still has some value even if your commit ends up with a question
> > mark. There's not much to dig out of 636171cb80 (make index-pack able
> > to complete thin packs., 2006-10-25). Adding Nico, maybe he still
> > remembers...
> What about this comment in fix_unresolved_deltas():
>         /*
>          * Since many unresolved deltas may well be themselves base objects
>          * for more unresolved deltas, we really want to include the
>          * smallest number of base objects that would cover as much delta
>          * as possible by picking the
>          * trunc deltas first, allowing for other deltas to resolve without
>          * additional base objects.  Since most base objects are to be found
>          * before deltas depending on them, a good heuristic is to start
>          * resolving deltas in the same order as their position in the pack.
>          */
> Doesn't that cover it?

Hmm. It does help, but only because my earlier comments were actually
wrong. I was claiming that index-pack would not be able to resolve "A"
from the delta chain when A is a regular delta, B is a thin delta, and C
is not in the pack.

Because I did not see how we would ever find base "B" in that third pass
of fix_unresolved_deltas(), because we look only for objects we already
have on disk.

But I think the trick that I was missing is that if we see "B" first,
we'll resolve it against C that we already have, but then we'll _also_
look for children that this now enables us to resolve. So we'd resolve
"A" at that point, and then when we later hit "A" in the loop from
fix_unresolved_deltas(), we skip it because of this:

   if (objects[d->obj_no].real_type != OBJ_REF_DELTA)

So this situation _can_ happen, and we do handle it properly. I don't
know why the tests I showed in [1] didn't work. Obviously I botched

I'm still not quite sure what would happen if we see "A" first (i.e.,
the delta is physically in the pack before its base, "B") and both are
REF_DELTA. Right now we'd skip over A, knowing that we don't have B. And
then when we get to B, we'd correctly resolve A on top of it (and if we
have no such B, we'd eventually complain "hey, there are still some
unresolved deltas").

But what happens if we _do_ have "B", but we just weren't able to tell
the sender for some reason (e.g., it's unreferenced). We'd add a new
copy of "B" to the pack while resolving "A", as part of --fix-thin. And
then when we resolve "B", we'd get _another_ copy of "B" in the pack,
and our result would have a duplicate.

I don't think this happens in practice because we'd generally use
OFS_DELTA in the first place these days. And even for REF_DELTA, I think
we prefer to put bases before their deltas (i.e., we'd always see "B"
before "A"). But I think if we ever _did_ see it (alternate
implementation? malicious packfile?) we'd generate duplicates. I _think_
that would then cause us to barf due to the duplicate check from
68be2fea50 (receive-pack, fetch-pack: reject bogus pack that records
objects twice, 2011-11-16).

And that's true today, even without Jonathan's on-demand fetching patch.
So I don't think that materially changes the requirements for
correctness. It does mean that we might fetch (but not use!) a thin base
we don't need, but only if the sender uses REF_DELTA for non-thin
deltas, which we wouldn't normally do.



  reply	other threads:[~2019-05-21 21:20 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-14 21:10 [PATCH 0/2] Partial clone fix: handling received REF_DELTA Jonathan Tan
2019-05-14 21:10 ` [PATCH 1/2] t5616: refactor packfile replacement Jonathan Tan
2019-05-15  8:36   ` Johannes Schindelin
2019-05-15 18:22     ` Jonathan Tan
2019-05-14 21:10 ` [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases Jonathan Tan
2019-05-15  8:46   ` Johannes Schindelin
2019-05-15 18:28     ` Jonathan Tan
2019-05-17 18:33       ` Johannes Schindelin
2019-05-15 23:16   ` Jeff King
2019-05-16  1:43     ` Junio C Hamano
2019-05-16  4:04       ` Jeff King
2019-05-16 18:26     ` Jonathan Tan
2019-05-16 21:12       ` Jeff King
2019-05-16 21:30         ` Jonathan Tan
2019-05-16 21:42           ` Jeff King
2019-05-16 23:15             ` Jonathan Tan
2019-05-17  1:09               ` Jeff King
2019-05-17  1:22                 ` Jeff King
2019-05-17  4:39                   ` Jeff King
2019-05-17  4:42                     ` Jeff King
2019-05-17  7:20                     ` Duy Nguyen
2019-05-17  8:55                       ` Jeff King
2019-05-18 11:39                         ` Duy Nguyen
2019-05-20 23:04                           ` Nicolas Pitre
2019-05-21 21:20                             ` Jeff King [this message]
2019-06-03 22:23   ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

  List information:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).