From: David Kastrup <dak@gnu.org>
To: Nicolas Pitre <nico@cam.org>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] diff-delta.c: pack the index structure
Date: Sat, 08 Sep 2007 08:48:41 +0200
Message-ID: <85vealzm7q.fsf@lola.goethe.zz>
In-Reply-To: <alpine.LFD.0.9999.0709072215420.21186@xanadu.home> (Nicolas Pitre's message of "Fri\, 07 Sep 2007 22\:34\:47 -0400 \(EDT\)")

Nicolas Pitre <nico@cam.org> writes:

> On Sat, 8 Sep 2007, David Kastrup wrote:
>
>> In normal use cases, the performance wins are not overly impressive:
>> we get something like 5-10% due to the slightly better locality of
>> memory accesses using the packed structure.
>
> The gain is probably counterbalanced by the fact that you're copying
> the whole index when packing it, which is unfortunate.

It was a design choice (I don't particularly like it myself). An
index is created once and then used a dozen times. Doing the packing
in place implies calling realloc on the finished index. I consider
permanent memory fragmentation more likely in that case than when
allocating a fresh area of the same size.

Also, repacking in place is going to cost more operations and require
considerably more complicated code.

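
To illustrate the "fresh area" approach, here is a minimal sketch of
packing a chained hash table into one newly allocated block. It is not
the code from the patch: the struct layouts and the names packed_entry,
chained_entry, packed_index, pack_index and bucket_len are hypothetical
and chosen only for illustration. It also assumes the caller already
knows the exact total number of entries.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified types; not the structs from diff-delta.c. */
struct packed_entry {
	const unsigned char *ptr;
	unsigned int val;
};

struct chained_entry {
	struct packed_entry entry;
	struct chained_entry *next;
};

struct packed_index {
	unsigned int hsize;
	/* hsize + 1 slots; bucket[i + 1] marks the end of bucket i */
	struct packed_entry *bucket[];
};

/*
 * Copy every chain of `hash` into one freshly allocated block:
 * header, then the bucket pointers, then all entries back to back.
 * `nentries` must be the exact number of chained entries.
 * Returns NULL if the allocation fails.
 */
struct packed_index *pack_index(struct chained_entry **hash,
				unsigned int hsize, size_t nentries)
{
	struct packed_index *idx;
	struct packed_entry *first, *out;
	unsigned int i;

	idx = malloc(sizeof(*idx)
		     + ((size_t)hsize + 1) * sizeof(idx->bucket[0])
		     + nentries * sizeof(*out));
	if (!idx)
		return NULL;

	idx->hsize = hsize;
	/* The entry array starts right after the last bucket pointer. */
	first = out = (struct packed_entry *)(idx->bucket + hsize + 1);

	for (i = 0; i < hsize; i++) {
		struct chained_entry *e;

		idx->bucket[i] = out;
		for (e = hash[i]; e; e = e->next)
			*out++ = e->entry;
	}
	idx->bucket[hsize] = out;	/* sentinel: end of the last bucket */

	assert((size_t)(out - first) == nentries);
	return idx;
}

/* Number of entries stored in bucket i (i must be below hsize). */
size_t bucket_len(const struct packed_index *idx, unsigned int i)
{
	return (size_t)(idx->bucket[i + 1] - idx->bucket[i]);
}
```

In this sketch the chained entries can be released once the copy is
done, and a lookup walks from bucket[i] up to bucket[i + 1] over
adjacent memory, which is where a locality gain would come from. The
cost is the one discussed above: every entry is copied once.
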
> Also, could you provide actual test results backing your performance
> claim? 5-10% is still not negligible.

In my git repository, I ran

for i in .git/objects/[0-9a-f][0-9a-f]/[0-9a-f]*;do echo $i;done|sed 's+.*ts/\(..\)/+\1+' > /tmp/objlist

and then something like

dak@lola:/usr/local/tmp/git$ time ./git-pack-objects </tmp/objlist --stdout|dd of=/dev/null
4099+2 records in
4100+1 records out
2099205 bytes (2.1 MB) copied, 3.99295 seconds, 526 kB/s
real 0m4.023s
user 0m3.812s
sys 0m0.168s
dak@lola:/usr/local/tmp/git$ time git-pack-objects </tmp/objlist --stdout|dd of=/dev/null
4099+2 records in
4100+1 records out
2099205 bytes (2.1 MB) copied, 4.18734 seconds, 501 kB/s
real 0m4.218s
user 0m4.012s
sys 0m0.160s
dak@lola:/usr/local/tmp/git$

I ran this repeatedly on a warm cache. The results were consistent
from run to run: my version was always faster, but never by more
than 10%.

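
For the single run quoted above, the gap can be worked out from the
reported times. This is only a sketch: percent_faster is a hypothetical
helper, and it assumes that ./git-pack-objects is the patched build and
the installed git-pack-objects the unpatched one.

```c
#include <assert.h>

/*
 * Relative improvement of `candidate` over `baseline`, in percent.
 * Both arguments are times in seconds; `baseline` must be positive.
 */
double percent_faster(double baseline, double candidate)
{
	assert(baseline > 0.0);
	return (baseline - candidate) / baseline * 100.0;
}
```

With the user times from that run, percent_faster(4.012, 3.812) comes
out at about 5.0, and with the real times, percent_faster(4.218, 4.023)
comes out at about 4.6.
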
>> - struct index_entry *entry, **hash;
>> + struct unpacked_index_entry *entry, **hash;
>> + struct index_entry *aentry, **ahash;
>
> What does the "a" stand for?

"array" (as opposed to linked list). It was the first thing that came
to mind. Maybe I should have gone for "p" for packed, but I shied away
from it because it is often taken to imply "pointer".

Alternatively, I could rename "entry" to "uentry", but that touches
more lines.

What do you propose?

>> + mem = index+1;
> [...]
>> + for (i=0; i<hsize; i++) {
> [...]
>> + for (entry=hash[i]; entry; entry=entry->next)
>
> Minor style nit: please add spaces around "+", "=", "<", etc. for
> consistency.
Can do.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum