git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Brandon Casey <drafnel@gmail.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH 4/4] pack-revindex: radix-sort the revindex
Date: Wed, 10 Jul 2013 06:52:44 -0400	[thread overview]
Message-ID: <20130710105244.GA9724@sigill.intra.peff.net> (raw)
In-Reply-To: <CA+sFfMfQcnx+OGNd+v7NJC5zSXg2OR1QiLyRSqDjXD0zb4mvtA@mail.gmail.com>

On Mon, Jul 08, 2013 at 01:50:41PM -0700, Brandon Casey wrote:

> > +static void sort_revindex(struct revindex_entry *entries, int n, off_t max)
> 
> If 'n' is the number of objects in the pack, shouldn't it be unsigned?

Yes. I inherited that bug from the rest of the revindex code.

> The data type for struct packed_git.num_objects is uint32_t.  Looks
> like create_pack_revindex uses the wrong datatype when it captures
> num_objects in the int num_ent and passes it to sort_revindex.  So, it
> looks like that function needs to be updated too.

Yep. And the binary search in find_pack_revindex, too (which even has an
integer overflow!). I'll add a patch to my series to fix it (and switch
mine).

> > +       while (max / (((off_t)1) << digits)) {
> 
> Is there any reason this shouldn't be simplified to just:
> 
>        while (max >> digits) {

No, yours is much more readable. In case you are wondering how I ended
up with that monstrosity, I originally did not keep the "digits" field
as a number of bits, but rather a running total that bit-shifted 16 bits
each time through the loop. I'll change it in the re-roll.

> I glanced briefly at the assembly and it appears that gcc does
> actually emit a divide instruction to accomplish this, which I think
> we can avoid by just rearranging the operation.

Yep, although it almost certainly doesn't matter. We hit that loop
condition check at most 5 times for a 64-bit integer. I'm more concerned
with readability.

> > +       if (a != entries) {
> > +               int i;
> > +               for (i = 0; i < n; i++)
> > +                       entries[i] = tmp[i];
> 
> I think I recall that somebody investigated whether a for loop like
> you have above was faster for copying structures than memcpy.  I
> forget whether it was conclusive.  Did you happen to compare them?

It was me, actually, but the comparison was for memcmp rather than an
open-coded loop. And the conclusion was that memcmp is way faster on
glibc 2.13 and higher.

I think memcpy probably is going to be faster (especially in recent
versions of glibc), given the size of the array (the other memcmp
discussion was for 20-byte hashes, where the function call and setup
time was much more relevant).

But I don't think this was even timed at all in my tests. Since we go
back-and-forth between the original array and the tmp storage, we have a
"50%" chance of not needing to swap back anyway. So for packfiles up to
64K, we do the swap (but they are not that interesting to measure), and
then from 64K to 4G, we do not.

Note that we also use struct assignment in the sort itself to drop
elements into their buckets. That could potentially use memcpy, though I
would expect the compiler to generate pretty decent instructions for
such a small struct.

-Peff

  parent reply	other threads:[~2013-07-10 10:52 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-07 10:01 [RFC/PATCH 0/4] cat-file --batch-disk-sizes Jeff King
2013-07-07 10:03 ` [PATCH 1/4] zero-initialize object_info structs Jeff King
2013-07-07 17:34   ` Junio C Hamano
2013-07-07 10:04 ` [PATCH 2/4] teach sha1_object_info_extended a "disk_size" query Jeff King
2013-07-07 10:09 ` [PATCH 3/4] cat-file: add --batch-disk-sizes option Jeff King
2013-07-07 17:49   ` Junio C Hamano
2013-07-07 18:19     ` Jeff King
2013-07-08 11:04     ` Duy Nguyen
2013-07-08 12:00       ` Ramkumar Ramachandra
2013-07-08 13:13         ` Duy Nguyen
2013-07-08 13:37           ` Ramkumar Ramachandra
2013-07-09  2:55             ` Duy Nguyen
2013-07-09 10:32               ` Ramkumar Ramachandra
2013-07-10 11:16             ` Jeff King
2013-07-08 16:40           ` Junio C Hamano
2013-07-10 11:04     ` Jeff King
2013-07-11 16:35       ` Junio C Hamano
2013-07-07 21:15   ` brian m. carlson
2013-07-10 10:57     ` Jeff King
2013-07-07 10:14 ` [PATCH 4/4] pack-revindex: radix-sort the revindex Jeff King
2013-07-07 23:52   ` Shawn Pearce
2013-07-08  7:57     ` Jeff King
2013-07-08 15:38       ` Shawn Pearce
2013-07-08 20:50   ` Brandon Casey
2013-07-08 21:35     ` Brandon Casey
2013-07-10 10:57       ` Jeff King
2013-07-10 10:52     ` Jeff King [this message]
2013-07-10 11:34 ` [PATCHv2 00/10] cat-file formats/on-disk sizes Jeff King
2013-07-10 11:35   ` [PATCH 01/10] zero-initialize object_info structs Jeff King
2013-07-10 11:35   ` [PATCH 02/10] teach sha1_object_info_extended a "disk_size" query Jeff King
2013-07-10 11:36   ` [PATCH 03/10] t1006: modernize output comparisons Jeff King
2013-07-10 11:38   ` [PATCH 04/10] cat-file: teach --batch to stream blob objects Jeff King
2013-07-10 11:38   ` [PATCH 05/10] cat-file: refactor --batch option parsing Jeff King
2013-07-10 11:45   ` [PATCH 06/10] cat-file: add --batch-check=<format> Jeff King
2013-07-10 11:57     ` Eric Sunshine
2013-07-10 14:51     ` Ramkumar Ramachandra
2013-07-11 11:24       ` Jeff King
2013-07-10 11:46   ` [PATCH 07/10] cat-file: add %(objectsize:disk) format atom Jeff King
2013-07-10 11:48   ` [PATCH 08/10] cat-file: split --batch input lines on whitespace Jeff King
2013-07-10 15:29     ` Ramkumar Ramachandra
2013-07-11 11:36       ` Jeff King
2013-07-11 17:42         ` Junio C Hamano
2013-07-11 20:45         ` [PATCHv3 " Jeff King
2013-07-10 11:50   ` [PATCH 09/10] pack-revindex: use unsigned to store number of objects Jeff King
2013-07-10 11:55   ` [PATCH 10/10] pack-revindex: radix-sort the revindex Jeff King
2013-07-10 12:00     ` Jeff King
2013-07-10 13:17     ` Ramkumar Ramachandra
2013-07-11 11:03       ` Jeff King
2013-07-10 17:10     ` Brandon Casey
2013-07-11 11:17       ` Jeff King
2013-07-11 12:16     ` [PATCHv3 " Jeff King
2013-07-11 21:12       ` Brandon Casey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130710105244.GA9724@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=drafnel@gmail.com \
    --cc=git@vger.kernel.org \
    --subject='Re: [PATCH 4/4] pack-revindex: radix-sort the revindex' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).