git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>, Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check
Date: Thu, 02 May 2013 23:55:57 -0700	[thread overview]
Message-ID: <7vbo8szh4y.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <7vk3ngzi6o.fsf@alter.siamese.dyndns.org> (Junio C. Hamano's message of "Thu, 02 May 2013 23:33:19 -0700")

Junio C Hamano <gitster@pobox.com> writes:

> Duy Nguyen <pclouds@gmail.com> writes:
>
>> What do you mean by "partial history"? Do we have dangling pointers
>> after doing that commit walker?
>
> "^C" will leave the objects and it is safe because it will not
> update refs.
>
> But your code that does not verify the full connectivity from such
> an object (that is outside the transferred pack) to the rest of the
> history will then make the resulting repository broken, if you
> update your ref to point at such an object, no?  Ading a single
> has_sha1_file() only verifies that single object, not the history
> behind it.

Let's illustrate.  Imagine your project as a whole has this history:

---A---B---C---D---E

When you cloned, it only had up to A, so that is what you have.
Then you try http walker, which slurps E, wants to go to D and dig
down, but after fetching E, some trees and blobs in E, and fetching
D, but before completing D' trees and blobs, your ISP cuts you off.

You have these in your object store:

---A           D---E

but your ref is pointing at A, because we promise that we make sure
full connectivity before we update the ref, and even if you have
commits D and E, the associated trees, blobs, and commits behind
this subpart of the history are missing.

Now you try to fetch from another mirror over the pack transport.
You advertise that you have A (but you do not advertise E, because
your refs do not point at it), and you expect all objects that are
listed in "rev-list --objects A..E" to be gien to you.

Unfortunately, the mirror is broken, in that it only packs the
objects that appear in "rev-list --objects A..B", but still tells
you that it is sending objects to complete history leading to E.

Your object store would have objects like this after this transfer:

---A---B       D---E

You may still have commits D and E unpruned, but are missing C, and
trees and blobs that are needed in D or E.

You have to walk the history from E and list the necessary objects
yourself, running has_sha1_file() on all of them, to prove that you
have everything needed, and the only things you can trust are your
refs (in this case, A).

If you verify that all the objects in the transferred pack are
complete, and also verify that the object the transfer proposes to
update your ref to is _in_ that pack, then you can detect a mirror
that is broken and only sends objects for A..B, but that does not
actually solve the issue.  Imagine another broken mirror that
instead sends objects for E.  E, its trees and all its blobs may be
in the pack and the only outgoing link from that pack may be a
pointer out of E pointing at D.  You happen to _have_ it in your
object store, but that does not mean you can update your ref to E
(remember, you do not have trees and blobs to complete D, or the
history behind it).

The current check is implemented in the way we currently do it,
_not_ because we were stupid and did not realize the optimization
possibility (in other words, what your patch proposes is not new),
but because we rejected this approach because it was not safe.

  reply	other threads:[~2013-05-03  6:56 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-31 11:09 [PATCH 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
2013-03-31 11:09 ` [PATCH 1/4] fetch-pack: save shallow file before fetching the pack Nguyễn Thái Ngọc Duy
2013-04-01 14:53   ` Junio C Hamano
2013-04-05  2:11     ` Duy Nguyen
2013-03-31 11:09 ` [PATCH 2/4] index-pack: remove dead code (it should never happen) Nguyễn Thái Ngọc Duy
2013-03-31 11:09 ` [PATCH 3/4] index-pack, unpack-objects: add --not-so-strict for connectivity check Nguyễn Thái Ngọc Duy
2013-03-31 11:09 ` [PATCH 4/4] Use --not-so-strict on all pack transfer " Nguyễn Thái Ngọc Duy
2013-04-01 14:48 ` [PATCH 0/4] check_everything_connected replacement Junio C Hamano
2013-05-01 10:59 ` [PATCH v2 0/5] " Nguyễn Thái Ngọc Duy
2013-05-01 10:59   ` [PATCH v2 1/5] clone: let the user know when check_everything_connected is run Nguyễn Thái Ngọc Duy
2013-05-01 10:59   ` [PATCH v2 2/5] fetch-pack: prepare updated shallow file before fetching the pack Nguyễn Thái Ngọc Duy
2013-05-01 20:27     ` Junio C Hamano
2013-05-02 10:04       ` Duy Nguyen
2013-05-01 10:59   ` [PATCH v2 3/5] index-pack: remove dead code (it should never happen) Nguyễn Thái Ngọc Duy
2013-05-01 10:59   ` [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check Nguyễn Thái Ngọc Duy
2013-05-01 23:35     ` Junio C Hamano
2013-05-02  9:53       ` Duy Nguyen
2013-05-02 16:27         ` Junio C Hamano
2013-05-03  2:29           ` Duy Nguyen
2013-05-03  6:33             ` Junio C Hamano
2013-05-03  6:55               ` Junio C Hamano [this message]
2013-05-03  7:09                 ` Duy Nguyen
2013-05-03  8:16                   ` Eric Sunshine
2013-05-01 10:59   ` [PATCH v2 5/5] Use --not-so-strict on all pack transfer " Nguyễn Thái Ngọc Duy
2013-05-03 12:35   ` [PATCH v3 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
2013-05-03 12:35     ` [PATCH v3 1/4] clone: let the user know when check_everything_connected is run Nguyễn Thái Ngọc Duy
2013-05-03 12:35     ` [PATCH v3 2/4] fetch-pack: prepare updated shallow file before fetching the pack Nguyễn Thái Ngọc Duy
2013-05-03 12:37       ` Eric Sunshine
2013-05-07 15:59       ` Junio C Hamano
2013-05-26  1:01         ` Duy Nguyen
2013-05-03 12:35     ` [PATCH v3 3/4] index-pack: remove dead code (it should never happen) Nguyễn Thái Ngọc Duy
2013-05-03 12:35     ` [PATCH v3 4/4] clone: open a shortcut for connectivity check Nguyễn Thái Ngọc Duy
2013-05-03 12:41       ` Eric Sunshine
2013-05-03 16:15       ` Junio C Hamano
2013-05-04  1:10         ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vbo8szh4y.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).