git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Philip Oakley" <philipoakley@iee.email>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v2 0/7] strvec: use size_t to store nr and alloc
Date: Sun, 12 Sep 2021 02:15:48 +0200	[thread overview]
Message-ID: <cover-v2-0.7-00000000000-20210912T001420Z-avarab@gmail.com> (raw)
In-Reply-To: <5e5e7fd9-83d7-87f7-b1ef-1292912b6c00@iee.email>

This is a proposed v2 of Jeff King's one-patch change to change
strvec's nr/alloc from "int" to "size_t". As noted below I think it's
worthwhile to not only change that in the struct, but also in code
that directly references the "nr" member.

On Sat, Sep 11 2021, Philip Oakley wrote:

> On 11/09/2021 17:13, Ævar Arnfjörð Bjarmason wrote:
>> On Sat, Sep 11 2021, Jeff King wrote:
>>
>>> We converted argv_array (which later became strvec) to use size_t in
>>> 819f0e76b1 (argv-array: use size_t for count and alloc, 2020-07-28) in
>>> order to avoid the possibility of integer overflow. But later, commit
>>> d70a9eb611 (strvec: rename struct fields, 2020-07-28) accidentally
>>> converted these back to ints!
>>>
>>> Those two commits were part of the same patch series. I'm pretty sure
>>> what happened is that they were originally written in the opposite order
>>> and then cleaned up and re-ordered during an interactive rebase. And
>>> when resolving the inevitable conflict, I mistakenly took the "rename"
>>> patch completely, accidentally dropping the type change.
>>>
>>> We can correct it now; better late than never.
>>>
>>> Signed-off-by: Jeff King <peff@peff.net>
>>> ---
>>> This was posted previously in the midst of another thread, but I don't
>>> think was picked up. There was some positive reaction, but one "do we
>>> really need this?" to which I responded in detail:
>>>
>>>   https://lore.kernel.org/git/YTIBnT8Ue1HZXs82@coredump.intra.peff.net/
>>>
>>> I don't really think any of that needs to go into the commit message,
>>> but if that's a hold-up, I can try to summarize it (though I think
>>> referring to the commit which _already_ did this and was accidentally
>>> reverted would be sufficient).
>> Thanks, I have a WIP version of this outstanding starting with this
>> patch that I was planning to submit sometime, but I'm happy to have you
>> pursue it, especially with the ~100 outstanding patches I have in
>> master..seen.
>>
>> It does feel somewhere between iffy and a landmine waiting to be stepped
>> on to only convert the member itself, and not any of the corresponding
>> "int" variables that track it to "size_t".
>>
>> If you do the change I suggested in
>> https://lore.kernel.org/git/87v93i8svd.fsf@evledraar.gmail.com/ you'll
>> find that there's at least one first-order reference to this that now
>> uses "int" that if converted to "size_t" will result in a wrap-around
>> error, we're lucky that one has a test failure.
>>
>> I can tell you what that bug is, but maybe it's better if you find it
>> yourself :) I.e. I found *that* one, but I'm not sure I found them
>> all. I just s/int nr/size_t *nr/ and eyeballed the wall off compiler
>> errors & the code context (note: pointer, obviously broken, but makes
>> the compiler yell).
>>
>> That particular bug will be caught by the compiler as it involves a >= 0
>> comparison against unsigned, but we may not not have that everywhere...
>
> I'm particularly interested in the int -> size_t change problem as part
> of the wider 4GB limitations for the LLP64 systems [0] such as the
> RaspPi, git-lfs (on windows [1]), and Git-for-Windows[2]. It is a big
> problem.

Okey, fine, no fun excercise for the reader then ;)

This is what I'd been sitting on locally since that recent thread, I
polished it up a bit since Jeff King posted his version.

The potential overflow bug I mentioned is in rebase.c. See
5/7. "Potential" because it's not a bug now, but that code
intentionally considers a strvec, and then iterates it from nr-1 to 0,
and if it reaches 0 intentionally counts down one more to -1 to
indicate that it's visited all elements.

We then check that with i >= 0, except of course if it becomes
unsigned that doesn't become -1, but rather it wraps around.

The rest of this is all changes to have that s/int/size_t/ radiate
outwards, i.e. when we assign that value to a variable somewhere its
now a "size_t" instead of an "int" etc.

> [0]
> http://nickdesaulniers.github.io/blog/2016/05/30/data-models-and-word-size/
> [1] https://github.com/git-lfs/git-lfs/issues/2434  Git on Windows
> client corrupts files > 4Gb
> [2] https://github.com/git-for-windows/git/pull/2179  [DRAFT] for
> testing : Fix 4Gb limit for large files on Git for Windows

Jeff King (1):
  strvec: use size_t to store nr and alloc

Ævar Arnfjörð Bjarmason (6):
  remote-curl: pass "struct strvec *" instead of int/char ** pair
  pack-objects: pass "struct strvec *" instead of int/char ** pair
  sequencer.[ch]: pass "struct strvec *" instead of int/char ** pair
  upload-pack.c: pass "struct strvec *" instead of int/char ** pair
  rebase: don't have loop over "struct strvec" depend on signed "nr"
  strvec API users: change some "int" tracking "nr" to "size_t"

 builtin/pack-objects.c |  6 +++---
 builtin/rebase.c       | 26 ++++++++++++--------------
 connect.c              |  8 ++++----
 fetch-pack.c           |  4 ++--
 ls-refs.c              |  2 +-
 remote-curl.c          | 23 +++++++++++------------
 sequencer.c            |  8 ++++----
 sequencer.h            |  4 ++--
 serve.c                |  2 +-
 shallow.c              |  5 +++--
 shallow.h              |  6 ++++--
 strvec.h               |  4 ++--
 submodule.c            |  2 +-
 upload-pack.c          |  7 +++----
 14 files changed, 53 insertions(+), 54 deletions(-)

Range-diff against v1:
-:  ----------- > 1:  2ef48d734e8 remote-curl: pass "struct strvec *" instead of int/char ** pair
-:  ----------- > 2:  7f59a58ed97 pack-objects: pass "struct strvec *" instead of int/char ** pair
-:  ----------- > 3:  c35cfb9c9c5 sequencer.[ch]: pass "struct strvec *" instead of int/char ** pair
-:  ----------- > 4:  2e0b82d4316 upload-pack.c: pass "struct strvec *" instead of int/char ** pair
-:  ----------- > 5:  be85a0565ef rebase: don't have loop over "struct strvec" depend on signed "nr"
1:  498f5ed80dc ! 6:  ba17290852c strvec: use size_t to store nr and alloc
    @@ Commit message
         We can correct it now; better late than never.
     
         Signed-off-by: Jeff King <peff@peff.net>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## strvec.h ##
     @@ strvec.h: extern const char *empty_strvec[];
-:  ----------- > 7:  2edd9708888 strvec API users: change some "int" tracking "nr" to "size_t"
-- 
2.33.0.998.ga4d44345d43


  reply	other threads:[~2021-09-12  0:16 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-11 15:01 [PATCH] strvec: use size_t to store nr and alloc Jeff King
2021-09-11 16:13 ` Ævar Arnfjörð Bjarmason
2021-09-11 22:48   ` Philip Oakley
2021-09-12  0:15     ` Ævar Arnfjörð Bjarmason [this message]
2021-09-12  0:15       ` [PATCH v2 1/7] remote-curl: pass "struct strvec *" instead of int/char ** pair Ævar Arnfjörð Bjarmason
2021-09-12  0:36         ` Carlo Arenas
2021-09-13  3:56           ` Ævar Arnfjörð Bjarmason
2021-09-12  0:15       ` [PATCH v2 2/7] pack-objects: " Ævar Arnfjörð Bjarmason
2021-09-12  0:15       ` [PATCH v2 3/7] sequencer.[ch]: " Ævar Arnfjörð Bjarmason
2021-09-12  0:15       ` [PATCH v2 4/7] upload-pack.c: " Ævar Arnfjörð Bjarmason
2021-09-12  0:15       ` [PATCH v2 5/7] rebase: don't have loop over "struct strvec" depend on signed "nr" Ævar Arnfjörð Bjarmason
2021-09-12  2:57         ` Eric Sunshine
2021-09-12  0:15       ` [PATCH v2 6/7] strvec: use size_t to store nr and alloc Ævar Arnfjörð Bjarmason
2021-09-12  0:15       ` [PATCH v2 7/7] strvec API users: change some "int" tracking "nr" to "size_t" Ævar Arnfjörð Bjarmason
2021-09-12  3:00         ` Eric Sunshine
2021-09-12 22:19       ` [PATCH v2 0/7] strvec: use size_t to store nr and alloc Jeff King
2021-09-13  5:38         ` Junio C Hamano
2021-09-13 12:29           ` Ævar Arnfjörð Bjarmason
2021-09-13 17:20             ` Jeff King
2021-09-13 10:47       ` Philip Oakley
2021-09-12 22:00     ` [PATCH] " Jeff King
2021-09-13 11:42       ` Philip Oakley
2021-09-12 21:58   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover-v2-0.7-00000000000-20210912T001420Z-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=philipoakley@iee.email \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).