git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: [PATCH 2/2] fetch-pack: avoid repeatedly re-scanning pack directory
Date: Sun, 27 Jan 2013 12:09:39 -0800	[thread overview]
Message-ID: <7vvcaiv14s.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <20130127102753.GB4228@elie.Belkin> (Jonathan Nieder's message of "Sun, 27 Jan 2013 02:27:53 -0800")

Jonathan Nieder <jrnieder@gmail.com> writes:

> Jeff King wrote:
>
>> When we look up a sha1 object for reading, we first check
>> packfiles, and then loose objects. If we still haven't found
>> it, we re-scan the list of packfiles in `objects/pack`. This
>> final step ensures that we can co-exist with a simultaneous
>> repack process which creates a new pack and then prunes the
>> old object.
>
> I like the context above and what follows it, but I think you forgot
> to mention what the patch actually does. :)
>
> I guess it is:
>
> 	However, in the first scan over refs in fetch-pack.c::everything_local,
> 	this double-check of packfiles is not necessary since we are only
> 	trying to get a rough estimate of the last time we fetched from this
> 	remote repository in order to find good candidate common commits ---
> 	a missed object would only result in a slightly slower fetch.

It is not about a rough estimate nor common commits, though.  The
"everything local" check in question is interested in only one
thing: are we _clearly_ up to date without fetching anything from
them?

Loosening the check may miss the rare case where we race against a
simultaneous repack and will cause us to go to the network when we
do not have to, and it becomes a trade off between the common unracy
case going faster by allowing the "Are we clearly up to date" check
to cheat, at the expense of rare racy cases suffering unnecessary
object transfer overhead.

> 	Avoid that slow second scan in the common case by guarding the object
> 	lookup with has_sha1_file().

This conclusion is correct.

> I had not read this codepath before.  I'm left with a few questions:
>
>  * Why is 49bb805e ("Do not ask for objects known to be complete",
>    2005-10-19) trying to do?  Are we hurting that in any way?

An earlier fetch may have acquired all the necessary objects but may
not have updated our refs for some reason (e.g. fast-forward check
may have fired).  In such a case, we may already have a history that
is good (i.e. not missing paths down to the common history) in our
repository that is not connected to any of our refs, and we can
update our refs (or write to FETCH_HEAD) without asking the remote
end to do any common ancestor computation or object transfer.

That was the primary thing the patch wanted to do.

As a side-effect, we know more objects than just the objects at the
tips of our refs are complete and that may help the later common
history discovery step, but obviously we do not want to dig the
history down to root.  The cutoff value is merely a heuristics
chosen without any deep thought.

>  * Is has_sha1_file() generally succeptible to the race against repack
>    you mentioned?  How is that normally dealt with?

By failing to find, so that the user will restart.  When the caller
really wants to use the object, parse_objects() => read_sha1_file()
=> read_object() is used and we will see the retry.

>  * Can a slow operation get confused if an object is incorporated into
>    a pack and then expelled again by two repacks in sequence?

If it checks "the object should be there" first, wait for a long
time, and then tries to find that object's data, the later access
will go to the parse_objects() callpath and I think it should do the
right thing.  If that slow opearation stops inside read_object(), it
could find it unable to map the loose object file and then unable to
find it in the pack, either.  Is that what you are worried about?

  reply	other threads:[~2013-01-27 20:10 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-26 22:40 [PATCH 0/2] optimizing pack access on "read only" fetch repos Jeff King
2013-01-26 22:40 ` [PATCH 1/2] fetch: run gc --auto after fetching Jeff King
2013-01-27  1:51   ` Jonathan Nieder
     [not found]   ` <87bmopzbqx.fsf@gmail.com>
2017-07-12 20:00     ` git gc --auto aquires *.lock files that make a subsequent git-fetch error out Jeff King
2017-07-12 20:30       ` Ævar Arnfjörð Bjarmason
2017-07-12 20:43         ` Jeff King
2013-01-26 22:40 ` [PATCH 2/2] fetch-pack: avoid repeatedly re-scanning pack directory Jeff King
2013-01-27 10:27   ` Jonathan Nieder
2013-01-27 20:09     ` Junio C Hamano [this message]
2013-01-27 23:20       ` Jonathan Nieder
2013-01-27  6:32 ` [PATCH 0/2] optimizing pack access on "read only" fetch repos Junio C Hamano
2013-01-29  8:06   ` Shawn Pearce
2013-01-29  8:29   ` Jeff King
2013-01-29 15:25     ` Martin Fick
2013-01-29 15:58     ` Junio C Hamano
2013-01-29 21:19       ` Jeff King
2013-01-29 22:26         ` Junio C Hamano
2013-01-31 16:47         ` Shawn Pearce
2013-02-01  9:14           ` Jeff King
2013-02-02 10:07             ` Shawn Pearce
2013-01-29 11:01   ` Duy Nguyen
  -- strict thread matches above, loose matches on Subject: below --
2012-12-07 13:53 [PATCH 0/2] make repeated fetch faster on "read only" repos Jeff King
2012-12-07 14:04 ` [PATCH 2/2] fetch-pack: avoid repeatedly re-scanning pack directory Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vvcaiv14s.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).