From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Duy Nguyen <pclouds@gmail.com>,
"Shawn O. Pearce" <spearce@spearce.org>
Subject: Re: [PATCH 4/6] introduce a commit metapack
Date: Tue, 29 Jan 2013 09:38:10 -0800
Message-ID: <7vy5fbq48t.fsf@alter.siamese.dyndns.org>
In-Reply-To: 20130129091610.GD9999@sigill.intra.peff.net
Jeff King <peff@peff.net> writes:
> +int commit_metapack(unsigned char *sha1,
> +		    uint32_t *timestamp,
> +		    unsigned char **tree,
> +		    unsigned char **parent1,
> +		    unsigned char **parent2)
> +{
> +	struct commit_metapack *p;
> +
> +	prepare_commit_metapacks();
> +	for (p = commit_metapacks; p; p = p->next) {
> +		unsigned char *data;
> +		int pos = sha1_entry_pos(p->index, 20, 0, 0, p->nr, p->nr, sha1);
This is a tangent, but I wonder whether it is about time to rip out
the check for GIT_USE_LOOKUP in find_pack_entry_one().
> +		if (pos < 0)
> +			continue;
> +
> +		/* timestamp(4) + tree(20) + parents(40) */
> +		data = p->data + 64 * pos;
> +		*timestamp = *(uint32_t *)data;
> +		*timestamp = ntohl(*timestamp);
> +		data += 4;
> +		*tree = data;
> +		data += 20;
> +		*parent1 = data;
> +		data += 20;
> +		*parent2 = data;
> +
> +		return 0;
> +	}
> +
> +	return -1;
> +}
I am torn on this one.

These cached properties of a single commit will not change no matter
which pack it appears in, so it feels logically wrong to tie a
"commit metapack" to a pack, especially when you record these object
names in the full SHA-1 form. Logically, only one commit metapack is
needed, describing all the commits known to the repository when the
metapack was created.

On the other hand, to reduce the disk footprint and I/O cost, a
future version of this mechanism may want to point into an existing
store of SHA-1 hashes with a shorter file offset. The .idx file
could be such a store, and to move in that direction you cannot
avoid tying a metapack to a pack.
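
To make the size trade-off concrete, here is a rough sketch and not
anything from the patch. The first struct mirrors the 64-byte record
the patch writes. The second is a hypothetical compact record that
replaces each 20-byte name with a 32-bit position in the pack's
sorted name table. The names compact_record and name_at() are made up
for illustration, and the sketch uses a table position rather than a
byte offset, and assumes that the tree and both parents are present
in the same .idx; a real format would need a fallback when they are
not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Record layout used by the patch: every object name stored in full. */
struct full_record {
	uint32_t timestamp;
	unsigned char tree[20];
	unsigned char parent1[20];
	unsigned char parent2[20];
};

/*
 * Hypothetical compact layout (an assumption, not part of the patch):
 * each name becomes a position in the pack's sorted table of names.
 */
struct compact_record {
	uint32_t timestamp;
	uint32_t tree_pos;
	uint32_t parent1_pos;
	uint32_t parent2_pos;
};

/* Holds on common ABIs; the build fails loudly where it does not. */
static_assert(sizeof(struct full_record) == 64, "4 + 20 + 40 bytes");
static_assert(sizeof(struct compact_record) == 16, "4 * 4 bytes");

/*
 * Resolve a table position back to a 20-byte name. "names" stands in
 * for the sorted name table of an .idx file holding "nr" entries.
 * Returns NULL when the position is out of range.
 */
static const unsigned char *name_at(const unsigned char *names,
				    uint32_t nr, uint32_t pos)
{
	return pos < nr ? names + 20 * (size_t)pos : NULL;
}
```

Under these assumptions the per-commit cost drops from 64 bytes to
16, but the record is only meaningful next to the one .idx file its
positions refer to, which is exactly the coupling described above.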
> +static void get_commits(struct metapack_writer *mw,
> +			const unsigned char *sha1,
> +			void *data)
> +{
> +	struct commit_list ***tail = data;
> +	enum object_type type = sha1_object_info(sha1, NULL);
> +	struct commit *c;
> +
> +	if (type != OBJ_COMMIT)
> +		return;
> +
> +	c = lookup_commit(sha1);
> +	if (!c || parse_commit(c))
> +		die("unable to read commit %s", sha1_to_hex(sha1));
> +
> +	/*
> +	 * Our fixed-size parent list cannot represent root commits, nor
> +	 * octopus merges. Just skip those commits, as we can fallback
> +	 * in those rare cases to reading the actual commit object.
> +	 */
> +	if (!c->parents ||
> +	    (c->parents && c->parents->next && c->parents->next->next))
> +		return;
> +
> +	*tail = &commit_list_insert(c, *tail)->next;
> +}
It feels somewhat wasteful to:

 - use commit_list for this, rather than an array of commit
   objects. If you have a rough estimate of the number of commits
   in the pack, you could just preallocate a single array and use
   ALLOC_GROW() on it, no?

 - iterate over the .idx file and run sha1_object_info() and
   parse_commit() on many objects in the SHA-1 order. Iterating in
   the way builtin/pack-objects.c::get_object_details() does avoids
   jumping around in existing packfiles, which may be more
   efficient, no?
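
Both points can be sketched together in standalone C. This is only an
illustration under stated assumptions: struct entry, entry_array and
the helper names are hypothetical, entry_array_grow() is a stand-in
for git's ALLOC_GROW() macro so that the example compiles on its own,
and the offset field is assumed to hold each object's byte offset in
the packfile.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: one candidate object and where it sits in the pack. */
struct entry {
	unsigned char sha1[20];
	uint64_t offset;	/* byte offset within the packfile */
};

struct entry_array {
	struct entry *items;
	size_t nr, alloc;
};

/* Stand-in for ALLOC_GROW(): ensure there is room for one more item. */
static void entry_array_grow(struct entry_array *a)
{
	size_t alloc;
	struct entry *items;

	if (a->nr < a->alloc)
		return;
	alloc = a->alloc ? a->alloc * 2 : 64;
	items = realloc(a->items, alloc * sizeof(*items));
	if (!items)
		abort();
	a->items = items;
	a->alloc = alloc;
}

/* Append one entry; a single growing array replaces the linked list. */
static void entry_array_push(struct entry_array *a,
			     const unsigned char *sha1, uint64_t offset)
{
	entry_array_grow(a);
	memcpy(a->items[a->nr].sha1, sha1, 20);
	a->items[a->nr].offset = offset;
	a->nr++;
}

static int cmp_offset(const void *va, const void *vb)
{
	const struct entry *a = va, *b = vb;

	if (a->offset < b->offset)
		return -1;
	if (a->offset > b->offset)
		return 1;
	return 0;
}

/* Order entries by pack offset so later reads walk the pack front to back. */
static void entry_array_sort_by_offset(struct entry_array *a)
{
	if (a->nr)
		qsort(a->items, a->nr, sizeof(*a->items), cmp_offset);
}
```

A writer along these lines would push every object from the .idx,
sort by offset, and only then inspect and parse each one. Because the
lookup side searches by name, the surviving commits would presumably
have to be sorted back into SHA-1 order before being written out.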
> +void commit_metapack_write(const char *idx)
> +{
> +	struct metapack_writer mw;
> +	struct commit_list *commits = NULL, *p;
> +	struct commit_list **tail = &commits;
> +	uint32_t nr = 0;
> +
> +	metapack_writer_init(&mw, idx, "commits", 1);
> +
> +	/* Figure out how many eligible commits we've got in this pack. */
> +	metapack_writer_foreach(&mw, get_commits, &tail);
> +	for (p = commits; p; p = p->next)
> +		nr++;