git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: David Kastrup <dak@gnu.org>
To: Nicolas Pitre <nico@cam.org>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] diff-delta.c: pack the index structure
Date: Sat, 08 Sep 2007 08:48:41 +0200	[thread overview]
Message-ID: <85vealzm7q.fsf@lola.goethe.zz> (raw)
In-Reply-To: <alpine.LFD.0.9999.0709072215420.21186@xanadu.home> (Nicolas Pitre's message of "Fri\, 07 Sep 2007 22\:34\:47 -0400 \(EDT\)")

Nicolas Pitre <nico@cam.org> writes:

> On Sat, 8 Sep 2007, David Kastrup wrote:
>
>> In normal use cases, the performance wins are not overly impressive:
>> we get something like 5-10% due to the slightly better locality of
>> memory accesses using the packed structure.
>
> The gain is probably counterbalanced by the fact that you're copying
> the whole index when packing it, which is unfortunate.

It was a design choice (I don't particularly like it myself).  An
index is created once, used a dozen times.  Doing the packing in-place
implies using realloc on the finished index.  I consider the
likelihood of permanent memory fragmentation higher when doing that
rather than when allocating a fresh area of the same size.

Also a repack in-place is going to cost more operations and have quite
more complicated code.

> Also, could you provide actual test results backing your performance 
> claim?  5-10% is still not negligible.

I did in my git repository

for i in .git/objects/[0-9a-f][0-9a-f]/[0-9a-f]*;do echo $i;done|sed 's+.*ts/\(..\)/+\1+' > /tmp/objlist

and then something like

dak@lola:/usr/local/tmp/git$ time ./git-pack-objects </tmp/objlist --stdout|dd of=/dev/null
4099+2 records in
4100+1 records out
2099205 bytes (2.1 MB) copied, 3.99295 seconds, 526 kB/s

real	0m4.023s
user	0m3.812s
sys	0m0.168s
dak@lola:/usr/local/tmp/git$ time git-pack-objects </tmp/objlist --stdout|dd of=/dev/null
4099+2 records in
4100+1 records out
2099205 bytes (2.1 MB) copied, 4.18734 seconds, 501 kB/s

real	0m4.218s
user	0m4.012s
sys	0m0.160s
dak@lola:/usr/local/tmp/git$ 

repeatedly on a warm cache.  Results were pretty much comparable:
consistently my version was faster, but never more than 10%.

>> -	struct index_entry *entry, **hash;
>> +	struct unpacked_index_entry *entry, **hash;
>> +	struct index_entry *aentry, **ahash;
>
> What does the "a" stand for?

array (as opposed to linked list).  Was the first thing coming into my
mind.  Maybe I should have gone for "p" for packed, but I shied from
it because it often is meant to imply "pointer".

Alternatively I could replace "entry" with "uentry", but that affects
more lines.

What do you propose?

>> +	mem = index+1;
> [...]
>> +	for (i=0; i<hsize; i++) {
> [...]
>> +		for (entry=hash[i]; entry; entry=entry->next)
>
> Minor style nit: please add spaces around "+", "=", "<", etc. for 
> consistency.

Can do.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

  reply	other threads:[~2007-09-08  8:59 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-09-07 23:38 [PATCH] diff-delta.c: pack the index structure David Kastrup
2007-09-08  2:34 ` Nicolas Pitre
2007-09-08  6:48   ` David Kastrup [this message]
2007-09-08  8:36   ` David Kastrup
2007-09-08 20:50     ` David Kastrup
  -- strict thread matches above, loose matches on Subject: below --
2007-09-08  9:31 David Kastrup
2007-09-08 21:17 David Kastrup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85vealzm7q.fsf@lola.goethe.zz \
    --to=dak@gnu.org \
    --cc=git@vger.kernel.org \
    --cc=nico@cam.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).