From: Jeff King <peff@peff.net>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 4/8] add functions for memory-efficient bitmaps
Date: Thu, 26 Jun 2014 11:51:19 -0400 [thread overview]
Message-ID: <20140626155119.GA10402@sigill.intra.peff.net> (raw)
In-Reply-To: <53AB9039.8040809@web.de>
On Thu, Jun 26, 2014 at 05:15:05AM +0200, Torsten Bögershausen wrote:
> > + */
> > +static inline int bitset_sizeof(int num_bits)
> > +{
> > + return (num_bits + CHAR_BIT - 1) / CHAR_BIT;
> > +}
> Just a general question about the usage of "int" here (and at other places):
> Is there a special reason for new code to allow num_bits to be negative ?
No. I usually choose "int" when the word size is not likely to matter
(i.e., we do not expect it to overflow a 32-bit integer, nor to have so
many that I need to be careful not to waste space).
Probably "unsigned int" would be a more descriptive choice.
It may also help the compiler optimize better. Assuming CHAR_BIT is 8
(i.e., most everywhere), we get:
(num_bits + 7) / 8
Presumably the compiler implements the division with a right-shift.
Marking num_bits as unsigned should let us do just a logical shift,
without worrying about the sign. And indeed, here are the signed and
unsigned versions produced by "gcc -S -O2" (for an equivalent
non-inlined function):
[signed]
leal 14(%rdi), %edx
movl %edi, %eax
addl $7, %eax
cmovs %edx, %eax
sarl $3, %eax
ret
[unsigned]
leal 7(%rdi), %eax
shrl $3, %eax
ret
Much simpler, though see below for practical considerations.
> To my knowledge all the size_t definitions these days are positive,
> because a size can not be negative.
size_t is perhaps a reasonable choice for the return value, given the name
"sizeof". But if you really care about using the whole range of bits there, you
need a data type for num_bits that is CHAR_BIT times larger.
> Should we use
> "unsigned" here ?
> or "unsigned int" ?
Yes, I think so. Both are the same to the compiler. I have a vague
recollection that we prefer one over the other, but grepping seems to
find many examples of each in our code.
I'm squashing in the patch below. I couldn't measure any speed
improvement. I'm guessing because the functions are all inlined, which
means we likely get away with calculating bitset_sizeof once outside of
our loop. I think the result is still more obvious to read, though.
-Peff
---
diff --git a/bitset.h b/bitset.h
index 5fbc956..268545b 100644
--- a/bitset.h
+++ b/bitset.h
@@ -45,7 +45,7 @@
*
* bits = xcalloc(1, bitset_sizeof(nr));
*/
-static inline int bitset_sizeof(int num_bits)
+static inline unsigned bitset_sizeof(unsigned num_bits)
{
return (num_bits + CHAR_BIT - 1) / CHAR_BIT;
}
@@ -54,7 +54,7 @@ static inline int bitset_sizeof(int num_bits)
* Set the bit at position "n". "n" is counted from zero, and must be
* smaller than the num_bits given to bitset_sizeof when allocating the bitset.
*/
-static inline void bitset_set(unsigned char *bits, int n)
+static inline void bitset_set(unsigned char *bits, unsigned n)
{
bits[n / CHAR_BIT] |= 1 << (n % CHAR_BIT);
}
@@ -62,7 +62,7 @@ static inline void bitset_set(unsigned char *bits, int n)
/*
* Return the bit at position "n" (see bitset_set for a description of "n").
*/
-static inline int bitset_get(unsigned char *bits, int n)
+static inline unsigned bitset_get(unsigned char *bits, unsigned n)
{
return !!(bits[n / CHAR_BIT] & (1 << (n % CHAR_BIT)));
}
@@ -75,9 +75,10 @@ static inline int bitset_get(unsigned char *bits, int n)
* "max" (we assume any bits beyond "max" up to the next CHAR_BIT boundary are
* zeroed padding).
*/
-static inline int bitset_equal(unsigned char *a, unsigned char *b, int max)
+static inline int bitset_equal(unsigned char *a, unsigned char *b,
+ unsigned max)
{
- int i;
+ unsigned i;
for (i = bitset_sizeof(max); i > 0; i--)
if (*a++ != *b++)
return 0;
@@ -89,9 +90,10 @@ static inline int bitset_equal(unsigned char *a, unsigned char *b, int max)
*
* See bitset_equal for the definition of "max".
*/
-static inline void bitset_or(unsigned char *dst, const unsigned char *src, int max)
+static inline void bitset_or(unsigned char *dst, const unsigned char *src,
+ unsigned max)
{
- int i;
+ unsigned i;
for (i = bitset_sizeof(max); i > 0; i--)
*dst++ |= *src++;
}
@@ -101,9 +103,9 @@ static inline void bitset_or(unsigned char *dst, const unsigned char *src, int m
*
* See bitset_equal for the definition of "max".
*/
-static inline int bitset_empty(const unsigned char *bits, int max)
+static inline int bitset_empty(const unsigned char *bits, unsigned max)
{
- int i;
+ unsigned i;
for (i = bitset_sizeof(max); i > 0; i--, bits++)
if (*bits)
return 0;
next prev parent reply other threads:[~2014-06-26 15:51 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-25 23:34 [PATCH 0/8] use merge-base for tag --contains Jeff King
2014-06-25 23:35 ` [PATCH 1/8] tag: allow --sort with -n Jeff King
2014-06-25 23:35 ` [PATCH 2/8] tag: factor out decision to stream tags Jeff King
2014-06-25 23:39 ` [PATCH 3/8] paint_down_to_common: use prio_queue Jeff King
2014-07-01 16:23 ` Junio C Hamano
2014-07-01 17:10 ` Jeff King
2014-06-25 23:40 ` [PATCH 4/8] add functions for memory-efficient bitmaps Jeff King
2014-06-26 3:15 ` Torsten Bögershausen
2014-06-26 15:51 ` Jeff King [this message]
2014-06-29 7:41 ` Eric Sunshine
2014-06-30 17:07 ` Jeff King
2014-07-01 16:57 ` Junio C Hamano
2014-07-01 17:18 ` Jeff King
2014-06-25 23:42 ` [PATCH 5/8] string-list: add pos to iterator callback Jeff King
2014-07-01 17:45 ` Junio C Hamano
2014-07-01 19:00 ` Jeff King
2014-06-25 23:47 ` [PATCH 6/8] commit: provide a fast multi-tip contains function Jeff King
2014-06-26 18:55 ` Junio C Hamano
2014-06-26 19:19 ` Junio C Hamano
2014-06-26 19:26 ` Junio C Hamano
2014-07-01 18:16 ` Junio C Hamano
2014-07-01 19:14 ` Junio C Hamano
2014-06-25 23:49 ` [PATCH 7/8] tag: use commit_contains Jeff King
2014-06-25 23:53 ` [PATCH 8/8] perf: add tests for tag --contains Jeff King
2014-06-26 0:01 ` Jeff King
2014-06-26 0:04 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140626155119.GA10402@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=tboegi@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).