git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: David Kastrup <dak@gnu.org>
To: Nicolas Pitre <nico@cam.org>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] diff-delta.c: pack the index structure
Date: Sat, 08 Sep 2007 10:36:05 +0200	[thread overview]
Message-ID: <854pi5zh8q.fsf@lola.goethe.zz> (raw)
In-Reply-To: <alpine.LFD.0.9999.0709072215420.21186@xanadu.home> (Nicolas Pitre's message of "Fri\, 07 Sep 2007 22\:34\:47 -0400 \(EDT\)")


I answered this post already, but the answer seems to have disappeared
into a black hole.  What a nuisance.

Nicolas Pitre <nico@cam.org> writes:

> On Sat, 8 Sep 2007, David Kastrup wrote:
>
>> In normal use cases, the performance wins are not overly impressive:
>> we get something like 5-10% due to the slightly better locality of
>> memory accesses using the packed structure.
>
> The gain is probably counterbalanced by the fact that you're copying
> the whole index when packing it, which is unfortunate.

The alternative would be packing in-place.  That's actually a rather
contorted operation, and the gains from packing would then have to be
claimed by "realloc".  I think that realloc is more likely to leave a
fragmented address space (giving the reclaimed memory to the heap
rather than back to the system), but I'd have to check glibc for
details at least on GNU/Linux.

Another option avoiding the realloc would be not to use linked lists
at all but just collect all entries sequentually in "packed" form, and
sort them into order afterwards.  That's an O(n lg n) operation with a
large n.  Even if one has a sorting algorithm with good memory
locality, I doubt that the locality would compensate for the lg n
factor, even when taking into account that we save ourselves the
copying.  And that is not even taken the possibility of having to cull
some buckets into account, another O(n) operation (which amounts to
copying everything once, too).

> Also, could you provide actual test results backing your performance 
> claim?  5-10% is still not negligible.


dak@lola:/usr/local/tmp/git$ for i in .git/objects/[0-9a-f][0-9a-f]/[0-9a-f]*;do echo $i;done|sed 's+.*ts/\(..\)/+\1+' > /tmp/objlist
dak@lola:/usr/local/tmp/git$ time ./git-pack-objects </tmp/objlist --stdout|dd of=/dev/null
4099+2 records in
4100+1 records out
2099205 bytes (2.1 MB) copied, 4.01225 seconds, 523 kB/s

real	0m4.043s
user	0m3.836s
sys	0m0.164s
dak@lola:/usr/local/tmp/git$ time git-pack-objects </tmp/objlist --stdout|dd of=/dev/null
4099+2 records in
4100+1 records out
2099205 bytes (2.1 MB) copied, 4.34936 seconds, 483 kB/s

real	0m4.381s
user	0m4.056s
sys	0m0.176s
dak@lola:/usr/local/tmp/git$ 

Of course, on a warm file system.

>> -	struct index_entry *entry, **hash;
>> +	struct unpacked_index_entry *entry, **hash;
>> +	struct index_entry *aentry, **ahash;
>
> What does the "a" stand for?

array (rather than linked list).  Should I rather rename entry and
hash to uentry and uhash (for unpacked)?

>> +	mem = index+1;
> [...]
>> +	for (i=0; i<hsize; i++) {
> [...]
>> +		for (entry=hash[i]; entry; entry=entry->next)
>
> Minor style nit: please add spaces around "+", "=", "<", etc. for 
> consistency.

Will do.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

  parent reply	other threads:[~2007-09-08  8:36 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-09-07 23:38 [PATCH] diff-delta.c: pack the index structure David Kastrup
2007-09-08  2:34 ` Nicolas Pitre
2007-09-08  6:48   ` David Kastrup
2007-09-08  8:36   ` David Kastrup [this message]
2007-09-08 20:50     ` David Kastrup
  -- strict thread matches above, loose matches on Subject: below --
2007-09-08  9:31 David Kastrup
2007-09-08 21:17 David Kastrup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=854pi5zh8q.fsf@lola.goethe.zz \
    --to=dak@gnu.org \
    --cc=git@vger.kernel.org \
    --cc=nico@cam.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).