git@vger.kernel.org mailing list mirror (one of many)
* [GSoC] Use unsigned integral type for collection of bits
@ 2024-02-18 11:36 eugenio gigante
  2024-02-18 19:09 ` Eric Sunshine
  0 siblings, 1 reply; 4+ messages in thread
From: eugenio gigante @ 2024-02-18 11:36 UTC
  To: git

Hi all,
I was looking around the codebase for some field of a structure that
stores collections of bits with signed int type.
I used this simple grep command to search for it:

$ grep -r -n "\tsigned int" .
> ./diffcore.h:63:     signed int is_binary : 2;

The struct in question is "diff_filespec", and Junio commented the
declaration of the field as follows:

/* data should be considered "binary"; -1 means "don't know yet" */

So, if I understood it correctly, possible values are:
 1 -> 01
 2 -> 10
-1 -> 11
On the other hand, after changing it to unsigned, the values would be:
1 -> 01
2 -> 10
3 -> 11

I read somewhere that one should always prefer unsigned integral types over
signed integral types for a couple of reasons [1].
These involve operations like modulus, shifting, and overflow.
I didn't dig too much into how the field is used or whether there are cases
in which the mentioned operations are involved (I would like the community's
opinion on this topic first).

Moreover, I don’t know if such a change breaks too much code and if
it’s worth it.
Probably it's not that tragic since the header 'diffcore.h' is only
included in two other files, but maybe I'm missing something. For sure,
various `if` conditions used by the function 'diff_filespec_is_binary'
inside 'diff.c' would have to be changed.

Besides, it's possible that my grep command is not enough and maybe
more "signed int" fields can be spotted.
Thanks!
Eugenio Gigante.
P.S. I was unsure about how to send this email since it does not
include a commit.
I decided not to use git-send-email. I hope I didn't mess up the format.

[1]
https://embeddedgurus.com/stack-overflow/2009/05/signed-versus-unsigned-integers/



* Re: [GSoC] Use unsigned integral type for collection of bits
  2024-02-18 11:36 [GSoC] Use unsigned integral type for collection of bits eugenio gigante
@ 2024-02-18 19:09 ` Eric Sunshine
       [not found]   ` <CAFJh0PTgjj=1QAYD+tyqc_35TZE78QJJv4WU-W3aiJiFOWHP=w@mail.gmail.com>
  2024-02-20  0:32   ` Junio C Hamano
  0 siblings, 2 replies; 4+ messages in thread
From: Eric Sunshine @ 2024-02-18 19:09 UTC
  To: eugenio gigante; +Cc: git

On Sun, Feb 18, 2024 at 6:37 AM eugenio gigante
<giganteeugenio2@gmail.com> wrote:
> I was looking around the codebase for some field of a structure that
> stores collections of bits with signed int type.
>
> > ./diffcore.h:63:     signed int is_binary : 2;
>
> The struct in question is "diff_filespec" and Junio commented the
> declaration of the field as following:
>
> /* data should be considered "binary"; -1 means "don't know yet" */
>
> I read somewhere that one should always prefer unsigned integral type over
> signed integral type for a couple of reasons [1].
> These involve operations like Modulus, Shifting and Overflow.

In the context of Git, we want to be using `unsigned` for variables
which are "bags of bits", where each bit indicates some "on" or "off"
property. Very frequently, such variables have "flags" in their names.
So, a possible scenario might be something like this:

    #define OP_FOO 0x01
    #define OP_BAR 0x02
    #define OP_ZAZ 0x04
    ...
    unsigned int flags = OP_FOO | OP_ZAZ;
    ...
    if ((flags & OP_ZAZ))
        do_some_zaz();

> I didn't dig too much into how the field is used and if there are cases in which
> the mentioned operations are involved (I would like the community
> opinion about this topic before).
>
> Moreover, I don’t know if such a change breaks too much code and if
> it’s worth it.
>
> but maybe I'm missing something. For sure, various If conditions used
> by the function
>
> 'diff_filespec_is_binary' inside 'diff.c' would have to be changed.

The code in question is not being used as a "bag of bits". Rather,
it's a tristate binary with values "not-set", "true", and "false".
Whereas a typical binary could be represented by a single bit, this
one needs the extra bit to handle the "not-set" case. Moreover, it is
idiomatic in the Git codebase for -1 to represent "not-set", so I
think this code is fine as-is since its meaning is clear to those
familiar with the codebase, thus does not need any changes made to it.
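
The tristate idiom described above can be sketched roughly as follows.
This is only an illustration: the struct, the function names, and the
NUL-byte heuristic are assumptions made for the example and are not
Git's actual code. The point is that a two-bit signed field has room
for -1 ("not-set"), 0 ("false"), and 1 ("true"), and that the answer
is computed once and then cached in the field.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a struct that carries a tristate bitfield. */
struct filespec_sketch {
	const char *data;
	size_t size;
	signed int is_binary : 2; /* -1: don't know yet, 0: text, 1: binary */
};

static void filespec_sketch_init(struct filespec_sketch *fs,
				 const char *data, size_t size)
{
	fs->data = data;
	fs->size = size;
	fs->is_binary = -1; /* "not-set" until somebody asks */
}

/*
 * Compute the answer on first use and cache it in the bitfield.
 * Assumption: data containing a NUL byte counts as binary; this is a
 * simplified heuristic chosen only for the sketch.
 */
static int filespec_sketch_is_binary(struct filespec_sketch *fs)
{
	if (fs->is_binary == -1)
		fs->is_binary = memchr(fs->data, 0, fs->size) ? 1 : 0;
	return fs->is_binary;
}
```

With an unsigned two-bit field, the `== -1` comparison would never be
true, which is why the conditions would need to change if the type did.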

> Besides, it's possible that my grep command is not enough and maybe
> more "signed int" can be spotted.

There are cases in the codebase in which a signed type is being used
as a "bag of bits" instead of the more desirable unsigned type. If you
are interested in making such a fix, you might find some candidates
using a search such as this:

    git grep -P '(?<!unsigned )int\s+flags'

For example, it finds this instance in `builtin/add.c`:

    static int refresh(int verbose, const struct pathspec *pathspec)
    {
        int flags = REFRESH_IGNORE_SKIP_WORKTREE |
            (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
        ...
        refresh_index(&the_index, flags, pathspec, seen, ...);
    }

Taking a look at `read-cache-ll.h`, we see:

    int refresh_index(struct index_state *, unsigned int flags, ...);

So, refresh_index() is correctly expecting an unsigned value for
`flags` but refresh() in `builtin/add.c` has undesirably declared
`flags` as signed.
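
A possible shape of such a fix, as a self-contained sketch: the
SKETCH_* names and their values are hypothetical stand-ins, not the
real REFRESH_* definitions, and sketch_refresh_index() is only a
placeholder for a callee that takes `unsigned int flags`. The only
change of interest is the type of the local `flags` variable.

```c
/* Hypothetical flag values for illustration; the real REFRESH_* values may differ. */
#define SKETCH_REFRESH_QUIET                (1u << 0)
#define SKETCH_REFRESH_IN_PORCELAIN         (1u << 1)
#define SKETCH_REFRESH_IGNORE_SKIP_WORKTREE (1u << 2)

/* Placeholder callee that, like refresh_index(), takes unsigned flags. */
static int sketch_refresh_index(unsigned int flags)
{
	/* Return 0 when asked to be quiet, 1 otherwise. */
	return (flags & SKETCH_REFRESH_QUIET) ? 0 : 1;
}

/* The caller declares `flags` as unsigned so that it matches the callee. */
static int sketch_refresh(int verbose)
{
	unsigned int flags = SKETCH_REFRESH_IGNORE_SKIP_WORKTREE |
		(verbose ? SKETCH_REFRESH_IN_PORCELAIN : SKETCH_REFRESH_QUIET);

	return sketch_refresh_index(flags);
}
```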

> P.S. I was insecure about how to send this email since it does not
> include a commit.
> I decide not to use git-send-email. Hoping I didn't mess up the format.

Using your preferred email client for general discussion is fine. Most
people only use git-send-email for sending patches.



* Re: [GSoC] Use unsigned integral type for collection of bits
       [not found]     ` <CAFJh0PRJkVBr-A=UtmEcAh4cPgC3w_vdTPg6kkjgHVQXHTYRmA@mail.gmail.com>
@ 2024-02-19 23:43       ` eugenio gigante
  0 siblings, 0 replies; 4+ messages in thread
From: eugenio gigante @ 2024-02-19 23:43 UTC
  To: Eric Sunshine; +Cc: git

On Sun, Feb 18, 2024 at 20:09 Eric Sunshine
<sunshine@sunshineco.com> wrote:
> The code in question is not being used as a "bag of bits". Rather,
> it's a tristate binary with values "not-set", "true", and "false".
> Whereas a typical binary could be represented by a single bit, this
> one needs the extra bit to handle the "not-set" case. Moreover, it is
> idiomatic in the Git codebase for -1 to represent "not-set", so I
> think this code is fine as-is since its meaning is clear to those
> familiar with the codebase, thus does not need any changes made to it.

Thank you for the clarification and sorry for the misunderstanding.

> So, refresh_index() is correctly expecting an unsigned value for
> `flags` but refresh() in `builtin/add.c` has undesirably declared
> `flags` as signed.

So, an unsigned type is preferable since we are dealing
with "bags of bits", and probably only bitwise operators operate
on them. Also, mixing signed and unsigned types is not ideal.
Yes, I'm interested in fixing the one in `builtin/add.c`.





* Re: [GSoC] Use unsigned integral type for collection of bits
  2024-02-18 19:09 ` Eric Sunshine
       [not found]   ` <CAFJh0PTgjj=1QAYD+tyqc_35TZE78QJJv4WU-W3aiJiFOWHP=w@mail.gmail.com>
@ 2024-02-20  0:32   ` Junio C Hamano
  1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2024-02-20  0:32 UTC
  To: Eric Sunshine; +Cc: eugenio gigante, git

Eric Sunshine <sunshine@sunshineco.com> writes:

>> 'diff_filespec_is_binary' inside 'diff.c' would have to be changed.
>
> The code in question is not being used as a "bag of bits". Rather,
> it's a tristate binary with values "not-set", "true", and "false".
> Whereas a typical binary could be represented by a single bit, this
> one needs the extra bit to handle the "not-set" case. Moreover, it is
> idiomatic in the Git codebase for -1 to represent "not-set", so I
> think this code is fine as-is since its meaning is clear to those
> familiar with the codebase, thus does not need any changes made to it.

Correct.  In general, bitfield structure members in our codebase
should be already fine.  Most of them are "unsigned : 1" and there
is not enough room in a single-bit bitfield to go signed.
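
To illustrate the point about single-bit fields, here is a small
sketch with a hypothetical struct (not taken from the codebase). It
assumes a two's-complement platform, where a one-bit signed field can
only represent 0 and -1, so it has no way to hold +1.

```c
/* Hypothetical struct contrasting unsigned and signed one-bit fields. */
struct sketch_bits {
	unsigned int is_set : 1;      /* can hold 0 or 1 */
	signed int   tiny_signed : 1; /* can hold 0 or -1; no room for +1 */
};

static struct sketch_bits sketch_bits_make(void)
{
	struct sketch_bits b;

	b.is_set = 1;       /* stored as 1 */
	b.tiny_signed = -1; /* the only nonzero value this field can hold */
	return b;
}
```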

> There are cases in the codebase in which a signed type is being used
> as a "bag of bits" instead of the more desirable unsigned type.
> ...
> So, refresh_index() is correctly expecting an unsigned value for
> `flags` but refresh() in `builtin/add.c` has undesirably declared
> `flags` as signed.

Thanks for a good example.  In a signed flag word that is used as a
bag of bits, the MSB is special, and unless you take advantage of
that special casing (which happens almost never), you should use an
unsigned word instead, to document that you are not doing anything
funny with the MSB, like "the flag word is negative, so the MSB must
be on", "I want to copy the bit immediately below MSB to MSB", etc.
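
A short sketch of why the unsigned type keeps the MSB unremarkable;
the SKETCH_MSB macro and the helper names are made up for this
example. With an unsigned word, testing the top bit is an ordinary
mask operation and a right shift always brings in zero bits, whereas
right-shifting a negative signed value is implementation-defined in C.

```c
#include <limits.h>

/* The most significant bit of an unsigned int (hypothetical helper macro). */
#define SKETCH_MSB (1u << (sizeof(unsigned int) * CHAR_BIT - 1))

/* Test the top bit with a plain mask; no sign check is involved. */
static int sketch_msb_is_set(unsigned int flags)
{
	return (flags & SKETCH_MSB) != 0;
}

/* Right-shifting an unsigned word always shifts in zero bits. */
static unsigned int sketch_shift_right(unsigned int flags)
{
	return flags >> 1;
}
```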


