git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Michael Haggerty <mhagger@alum.mit.edu>
Subject: [PATCH v3 0/2] pack-objects mru
Date: Wed, 10 Aug 2016 07:52:06 -0400	[thread overview]
Message-ID: <20160810115206.l57qpehpabthnl6c@sigill.intra.peff.net> (raw)
In-Reply-To: <xmqq7fbp8tki.fsf@gitster.mtv.corp.google.com>

On Tue, Aug 09, 2016 at 03:29:33PM -0700, Junio C Hamano wrote:

> > @@ -1516,6 +1577,13 @@ static void get_object_details(void)
> >  			entry->no_try_delta = 1;
> >  	}
> >  
> > +	/*
> > +	 * This must happen in a second pass, since we rely on the delta
> > +	 * information for the whole list being completed.
> > +	 */
> > +	for (i = 0; i < to_pack.nr_objects; i++)
> > +		break_delta_cycles(&to_pack.objects[i]);
> > +
> >  	free(sorted_by_offset);
> >  }
> 
> A potential cycle can only come from reusing deltas across packs in
> an unstable order, that happens way before we do the find_delta()
> thing, so this is a good place to have the new call.  While reading
> break_delta_cycles(), I was wondering if what it does is safe under
> multi-threading but there is no need to worry.

It definitely is not multi-threaded safe (and I'm not sure it could
easily be made so). But yeah, it is quick and happens as part of the
get_object_details() look at the objects, which is where we make
decisions about delta reuse.

> The recursiveness of break-delta-cycles is not too bad for the same
> reason why it is OK to recurse in check_delta_limit(), I would guess?

Yes, and for the same reason that it is OK in write_one(); the recursion
is limited by the depth of the delta chains.

> This is not new with this change, but I am not quite sure what in
> the current code prevents us from busting the delta limit for reused
> ones, though.

I think in the current code you are limited by the depth you might find
in a single existing pack (since we never reuse cross-pack deltas).
There is a comment in check_object() that claims:

	* Depth value does not matter - find_deltas() will
	* never consider reused delta as the base object to
	* deltify other objects against, in order to avoid
	* circular deltas.

but I do not recall seeing code to enforce that (but presumably that is
just beacuse it is a corner of the code I have not seen yet).

However, I think with cross-pack deltas, you could have a situation
like:

  pack 1: A -> B -> C
  pack 2: C -> D -> E

and pick A and B from the first pack, and C, D, and E from the second.
Then you end up with:

  A -> B -> C -> D -> E

in the output. IOW, I think the absolute worst case chain is the
max_depth times the number of packs. In practice I'd expect it to be
much smaller.

I'm not sure how much we should be worried about it. We could fill in
the depth values when adding a reused delta, but I don't think we have
the number handy; we'd have to actually walk the chain (though with
delta-base-offset, that is reasonably cheap).

> > I think my preference is to clean up the "hacky" bit of this patch, and
> > then apply the earlier MRU patch on top of it (which takes my repack
> > from 44 minutes to 5 minutes for this particular test set).
> 
> Yup, with something like this to break the delta chain _and_ allow
> an object to go through the usual deltify machinery, I'd say the MRU
> patch is a wonderful thing to have.

Here are cleaned-up patches. The cycle-breaking patch is cleaned up and
has tests. I erred on the side of verbosity in the comments there,
because it literally took me hours to come up with a reliable working
set of commands (and a convincing argument that it's correct). Hopefully
it doesn't put anyone to sleep. :)

The second patch is the same as before, though I tweaked the commit
message a bit, so please replace what is at the tip of
jk/pack-objects-optim-mru.

  [1/2]: pack-objects: break delta cycles before delta-search phase
  [2/2]: pack-objects: use mru list when iterating over packs

-Peff

  reply	other threads:[~2016-08-10 18:59 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-29  4:04 [PATCH v2 0/7] speed up pack-objects counting with many packs Jeff King
2016-07-29  4:06 ` [PATCH v2 1/7] t/perf: add tests for many-pack scenarios Jeff King
2016-07-29  4:06 ` [PATCH v2 2/7] sha1_file: drop free_pack_by_name Jeff King
2016-07-29  4:06 ` [PATCH v2 3/7] add generic most-recently-used list Jeff King
2016-07-29  4:09 ` [PATCH v2 4/7] find_pack_entry: replace last_found_pack with MRU cache Jeff King
2016-07-29  4:10 ` [PATCH v2 5/7] pack-objects: break out of want_object loop early Jeff King
2016-07-29  4:11 ` [PATCH v2 6/7] pack-objects: compute local/ignore_pack_keep early Jeff King
2016-07-29  4:15 ` [PATCH v2 7/7] pack-objects: use mru list when iterating over packs Jeff King
2016-07-29  5:45   ` Jeff King
2016-07-29 15:02     ` Junio C Hamano
2016-08-08 14:50       ` Jeff King
2016-08-08 16:28         ` Junio C Hamano
2016-08-08 16:51           ` Jeff King
2016-08-08 17:16             ` Junio C Hamano
2016-08-09 14:04               ` Jeff King
2016-08-09 17:45                 ` Jeff King
2016-08-09 18:06                   ` Junio C Hamano
2016-08-09 22:29                 ` Junio C Hamano
2016-08-10 11:52                   ` Jeff King [this message]
2016-08-10 12:02                     ` [PATCH v3 1/2] pack-objects: break delta cycles before delta-search phase Jeff King
2016-08-10 20:17                       ` Junio C Hamano
2016-08-11  5:02                         ` Jeff King
2016-08-11  5:15                           ` [PATCH v4 " Jeff King
2016-08-11  6:57                           ` [PATCH v3 " Jeff King
2016-08-11  9:20                             ` [PATCH v5] pack-objects mru Jeff King
2016-08-11  9:24                               ` [PATCH v5 1/4] provide an initializer for "struct object_info" Jeff King
2016-08-11  9:25                               ` [PATCH v5 2/4] sha1_file: make packed_object_info public Jeff King
2016-08-11  9:26                               ` [PATCH v5 3/4] pack-objects: break delta cycles before delta-search phase Jeff King
2016-08-11  9:26                               ` [PATCH v5 4/4] pack-objects: use mru list when iterating over packs Jeff King
2016-08-11  9:57                               ` [PATCH v5] pack-objects mru Jeff King
2016-08-11 15:11                                 ` Junio C Hamano
2016-08-11 16:19                                   ` Jeff King
2016-08-10 12:03                     ` [PATCH v3 2/2] pack-objects: use mru list when iterating over packs Jeff King
2016-08-10 16:47                     ` [PATCH v3 0/2] pack-objects mru Junio C Hamano
2016-08-11  4:48                       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160810115206.l57qpehpabthnl6c@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).