git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Subject: Re: [PATCH 4/6] pack-bitmap.c: factor out manual `map_pos` manipulation
Date: Tue, 21 Mar 2023 13:56:12 -0400	[thread overview]
Message-ID: <20230321175612.GF3119834@coredump.intra.peff.net> (raw)
In-Reply-To: <0decf13869df6216914044a560d94968126836f4.1679342296.git.me@ttaylorr.com>

On Mon, Mar 20, 2023 at 04:02:49PM -0400, Taylor Blau wrote:

> The pack-bitmap internals use a combination of a `map` and `map_pos`
> variable to keep a pointer to a memory mapped region of the `.bitmap`
> file, as well as the position within that file, respectively.
> 
> Reads within the memory mapped region are meant to mimic file reads,
> where each read adjusts the offset of the file descriptor accordingly.
> This functionality is implemented by adjusting the `map_pos` variable
> after each read.
> 
> Factor out a function similar to seek() which adjusts the `map_pos`
> variable for us. Since the bitmap code only needs to adjust relative to
> the beginning of the file as well as the current position, we do not
> need to support any "whence" values outside of SEEK_SET and SEEK_CUR.

I like the idea of centralizing the bounds checks here.

I do think copying lseek() is a little awkward, though. As you note, we
only care about SEEK_SET and SEEK_CUR, and we have to BUG() on anything
else. Which means we have a run-time check, rather than a compile time
check. If we had two functions like:

  void bitmap_index_seek_to(struct bitmap_index *bitmap_git, size_t pos)
  {
	bitmap_git->map_pos = pos;
	if (bitmap_git->map_pos >= bitmap_git->map_size)
	   ...etc...
  }

  void bitmap_index_incr(struct bitmap_index *bitmap_git, size_t offset)
  {
	bitmap_index_seek_to(bitmap_git, st_add(bitmap_git->map_pos, offset));
  }

then the compiler can see all cases directly. I dunno how much it
matters.

The other thing that's interesting from a bounds-checking perspective is
that you're checking the seek _after_ you've read an item. Which has two
implications:

  - I don't think we could ever overflow size_t here. We are working
    with a buffer that we got from mmap(), so it's already probably
    bounded to some much smaller subset of size_t. And even if we
    imagine that you really could get a buffer that stretches for the
    whole of the memory space, and that incrementing it by 4 bytes would
    overflow, we'd by definition have just overflowed the memory space
    itself by reading 4 bytes (and presumably segfaulted). So I really
    doubt this st_add() is doing anything.

  - The more interesting case is that we are not overflowing size_t, but
    simply walking past bitmap_git->map_size. But again, we are reading
    first and then seeking. So if our seek does go past, then by
    definition we just read garbage bytes, which is undefined behavior.

    For a bounds-check, wouldn't we want it the other way around? To
    make sure we have 4 bytes available, and then if so read them and
    increment the offset?

> +	if (bitmap_git->map_pos >= bitmap_git->map_size)
> +		BUG("bitmap position exceeds size (%"PRIuMAX" >= %"PRIuMAX")",
> +		    (uintmax_t)bitmap_git->map_pos,
> +		    (uintmax_t)bitmap_git->map_size);

This uses ">=", which is good, because it is valid to walk the pointer
to one past the end of the map, which is effectively EOF. But as above,
in that condition the callers should be checking for this EOF state
before reading the bytes.

-Peff

  reply	other threads:[~2023-03-21 17:56 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-20 20:02 [PATCH 0/6] pack-bitmap: miscellaneous mmap read hardening Taylor Blau
2023-03-20 20:02 ` [PATCH 1/6] pack-bitmap.c: hide bitmap internals in `read_u8()` Taylor Blau
2023-03-21 17:35   ` Jeff King
2023-03-24 17:52     ` Derrick Stolee
2023-03-20 20:02 ` [PATCH 2/6] pack-bitmap.c: hide bitmap internals in `read_be32()` Taylor Blau
2023-03-20 20:02 ` [PATCH 3/6] pack-bitmap.c: drop unnecessary 'inline's Taylor Blau
2023-03-21 17:40   ` Jeff King
2023-03-20 20:02 ` [PATCH 4/6] pack-bitmap.c: factor out manual `map_pos` manipulation Taylor Blau
2023-03-21 17:56   ` Jeff King [this message]
2023-03-24 18:04     ` Derrick Stolee
2023-03-24 18:29       ` Jeff King
2023-03-24 23:23         ` Taylor Blau
2023-03-25  4:57           ` Jeff King
2023-03-24 23:13       ` Taylor Blau
2023-03-24 23:24         ` Taylor Blau
2023-03-24 23:08     ` Taylor Blau
2023-03-20 20:02 ` [PATCH 5/6] pack-bitmap.c: use `bitmap_index_seek()` where possible Taylor Blau
2023-03-21 18:05   ` Jeff King
2023-03-24 18:06     ` Derrick Stolee
2023-03-24 18:35       ` Jeff King
2023-03-24 19:43         ` Junio C Hamano
2023-03-24 20:37           ` Jeff King
2023-03-24 21:38             ` Junio C Hamano
2023-03-24 22:57               ` Taylor Blau
2023-03-20 20:02 ` [PATCH 6/6] pack-bitmap.c: factor out `bitmap_index_seek_commit()` Taylor Blau
2023-03-21 18:13   ` Jeff King
2023-03-21 18:16     ` Taylor Blau
2023-03-21 18:27       ` Jeff King
2023-03-24 18:09         ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230321175612.GF3119834@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=chakrabortyabhradeep79@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).