git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Michael Haggerty <mhagger@alum.mit.edu>
Subject: Re: [PATCH v3 1/2] pack-objects: break delta cycles before delta-search phase
Date: Thu, 11 Aug 2016 01:02:52 -0400	[thread overview]
Message-ID: <20160811050252.g3iusy7bp3j6tzte@sigill.intra.peff.net> (raw)
In-Reply-To: <xmqqr39w4bvx.fsf@gitster.mtv.corp.google.com>

On Wed, Aug 10, 2016 at 01:17:22PM -0700, Junio C Hamano wrote:

> > Actually, skimming the sha1_file code, I am not 100% sure that we detect
> > cycles in OBJ_REF_DELTA (you cannot have cycles in OBJ_OFS_DELTA since
> > they always point backwards in the pack). But if that is the case, then
> > I think we should fix that, not worry about special-casing it here.
> 
> Yes, but sha1_file.c?  It is the reading side and it is too late if
> we notice a problem, I would think.

We already are covered on the writing side. That is what your code in
write_one() does. The reason to warn is on the reading side ("I fixed
this for you, but by the way, your existing packs were bogus").

But of more concern is whether read_sha1_file() would recurse
infinitely, which would be bad (though I do not think it would be a
feasible attack vector; index-pack already rejects such packs before
they are admitted to the repository).

> > + *   2. Updating our size; check_object() will have filled in the size of our
> > + *      delta, but a non-delta object needs it true size.
> 
> Excellent point.

I was not clever enough to think of it; the pack-objects code is filled
with nice assertions (Thanks, Nico!) that help out when you are stupid. :)

One thing to be careful of is that there are more things this
drop_reused_delta() should be doing. But I looked through the rest of
check_object() and could not find anything else.

> > +	case DFS_ACTIVE:
> > +		/*
> > +		 * We found a cycle that needs broken. It would be correct to
> > +		 * break any link in the chain, but it's convenient to
> > +		 * break this one.
> > +		 */
> > +		drop_reused_delta(entry);
> > +		break;
> > +	}
> > +}
> 
> Do we need to do anything to the DFS state of an entry when
> drop_reused_delta() resets its other fields?

Good catch. It should be marked DONE after we have broken the delta. It
doesn't matter in practice, because...

> If we later find D that is directly based on A, wouldn't we end up
> visiting A and attempt to drop it again?  drop_reused_delta() is
> idempotent so there will be no data structure corruption, I think,
> but we can safely declare that the entry is now DONE after calling
> drop_reused_delta() on it (either in the function or in the caller
> after it calls the function), no?

I think the idempotency of drop_reused_delta() doesn't matter. When we
visit A again later, its "delta" field will be NULL, so we'll hit the
condition at the top of the function: this is a base object, mark DONE
and don't recurse.

So it's correct as-is, but I agree it feels weird that the DFS would end
with some objects potentially marked ACTIVE. Everything should be DONE
at the end.

> > +# Create a pack containing the the tree $1 and blob $1:file, with
> > +# the latter stored as a delta against $2:file.
> > +#
> > +# We convince pack-objects to make the delta in the direction of our choosing
> > +# by marking $2 as a preferred-base edge. That results in $1:file as a thin
> > +# delta, and index-pack completes it by adding $2:file as a base.
> 
> Tricky but clever and correct ;-)

Thanks, it took a long time to think up. ;)

I actually wish we had better tools for making fake packs. Something
where you could say "add A, then add B as a delta of A, then...".
Because you often have to jump through quite a few hoops to convince
pack-objects to generate the pack you want, and even some things are
impossible (for example, I would like to make a chain of 10 deltas; how
do I convince the delta search to put my objects in the right order?).

I tried briefly yesterday to convince pack-objects to just take a list
of objects and their deltas, but it got ugly very quickly.  I think we'd
be better off writing a new tool that happens to reuse some of the
formatting functions from pack-objects. But even then, we've got to
write an .idx, which means going through index-pack (which will complain
if we are writing bogus packs for testing odd situations), or we have to
keep a valid list of "struct object_entry" to feed to the idx writer.

So even that approach is not quite trivial.

> > +make_pack () {
> > +	 {
> > +		 echo "-$(git rev-parse $2)"
> 
> Is everybody's 'echo' happy with dash followed by unknown string?

I'd assume so because it will be "-<sha1>", and I think echoes which
take options are careful about that.

Still, it would not be hard to tweak.

> > +		 echo "$(git rev-parse $1:dummy) dummy"
> > +		 echo "$(git rev-parse $1:file) file"
> > +	 } |
> > +	 git pack-objects --stdout |
> > +	 git index-pack --stdin --fix-thin
> 
> An alternative
> 
> 	git pack-objects --stdout <<-EOF |
> 	-$(git rev-parse $2)
>         $(git rev-parse $1:dummy) dummy
>         $(git rev-parse $1:file) file
> 	EOF
>         git index-pack --stdin --fix-thin
> 
> looks somewhat ugly, though.

Yeah, I think we would be better to just switch to printf if we want to
be careful.

I'll follow-up with a patch.

-Peff

  reply	other threads:[~2016-08-11  5:03 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-29  4:04 [PATCH v2 0/7] speed up pack-objects counting with many packs Jeff King
2016-07-29  4:06 ` [PATCH v2 1/7] t/perf: add tests for many-pack scenarios Jeff King
2016-07-29  4:06 ` [PATCH v2 2/7] sha1_file: drop free_pack_by_name Jeff King
2016-07-29  4:06 ` [PATCH v2 3/7] add generic most-recently-used list Jeff King
2016-07-29  4:09 ` [PATCH v2 4/7] find_pack_entry: replace last_found_pack with MRU cache Jeff King
2016-07-29  4:10 ` [PATCH v2 5/7] pack-objects: break out of want_object loop early Jeff King
2016-07-29  4:11 ` [PATCH v2 6/7] pack-objects: compute local/ignore_pack_keep early Jeff King
2016-07-29  4:15 ` [PATCH v2 7/7] pack-objects: use mru list when iterating over packs Jeff King
2016-07-29  5:45   ` Jeff King
2016-07-29 15:02     ` Junio C Hamano
2016-08-08 14:50       ` Jeff King
2016-08-08 16:28         ` Junio C Hamano
2016-08-08 16:51           ` Jeff King
2016-08-08 17:16             ` Junio C Hamano
2016-08-09 14:04               ` Jeff King
2016-08-09 17:45                 ` Jeff King
2016-08-09 18:06                   ` Junio C Hamano
2016-08-09 22:29                 ` Junio C Hamano
2016-08-10 11:52                   ` [PATCH v3 0/2] pack-objects mru Jeff King
2016-08-10 12:02                     ` [PATCH v3 1/2] pack-objects: break delta cycles before delta-search phase Jeff King
2016-08-10 20:17                       ` Junio C Hamano
2016-08-11  5:02                         ` Jeff King [this message]
2016-08-11  5:15                           ` [PATCH v4 " Jeff King
2016-08-11  6:57                           ` [PATCH v3 " Jeff King
2016-08-11  9:20                             ` [PATCH v5] pack-objects mru Jeff King
2016-08-11  9:24                               ` [PATCH v5 1/4] provide an initializer for "struct object_info" Jeff King
2016-08-11  9:25                               ` [PATCH v5 2/4] sha1_file: make packed_object_info public Jeff King
2016-08-11  9:26                               ` [PATCH v5 3/4] pack-objects: break delta cycles before delta-search phase Jeff King
2016-08-11  9:26                               ` [PATCH v5 4/4] pack-objects: use mru list when iterating over packs Jeff King
2016-08-11  9:57                               ` [PATCH v5] pack-objects mru Jeff King
2016-08-11 15:11                                 ` Junio C Hamano
2016-08-11 16:19                                   ` Jeff King
2016-08-10 12:03                     ` [PATCH v3 2/2] pack-objects: use mru list when iterating over packs Jeff King
2016-08-10 16:47                     ` [PATCH v3 0/2] pack-objects mru Junio C Hamano
2016-08-11  4:48                       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160811050252.g3iusy7bp3j6tzte@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).