git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Philip Oakley <philipoakley@iee.email>
Cc: Christian Couder <christian.couder@gmail.com>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Christian Couder <chriscool@tuxfamily.org>,
	Ramsay Jones <ramsay@ramsayjones.plus.com>,
	Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH v2 5/9] pack-bitmap: introduce bitmap_walk_contains()
Date: Sat, 19 Oct 2019 19:18:36 -0400
Message-ID: <20191019231836.GA32408@sigill.intra.peff.net>
In-Reply-To: <dce8e0b5-c4ea-f4f6-6275-1322f2d7200b@iee.email>

On Sat, Oct 19, 2019 at 04:25:19PM +0100, Philip Oakley wrote:

> > +int bitmap_walk_contains(struct bitmap_index *bitmap_git,
> > +			 struct bitmap *bitmap, const struct object_id *oid)
> > +{
> > +	int idx;
> Excuse my ignorance here...
> 
> For the case on Windows (int/long 32 bit), is this return value guaranteed
> to be less than 2GiB, i.e. not a memory offset?
> 
> I'm just thinking ahead to the resolution of the 4GiB file limit issue on
> Git-for-Windows (https://github.com/git-for-windows/git/pull/2179)

Yes, it's not a memory offset.

This "idx" (like the return value of bitmap_position()) is a position
within an array of objects, not a byte offset. It isn't strictly limited
to the objects in a single pack (a traversal might extend to objects
outside the bitmapped pack), but the pack's object count works as a
general ballpark, and that count is already limited to 4 bytes.

So the "best" type here would be a uint32_t (which is used elsewhere
in the pack code), but we use a signed int so that a negative value can
indicate that the object wasn't found.
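
To illustrate the trade-off, here is a minimal sketch of a lookup that
returns a signed position and uses a negative value for "not found".
find_position() and contains() are hypothetical names for this example,
not git's actual functions, and the linear scan over a plain uint32_t
array is only a stand-in for the real lookup:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper, not git's bitmap_position(): return the position
 * of `needle` in `ids`, or -1 if it is absent. The caller must keep
 * nr <= INT_MAX so that every valid position fits in the signed return.
 */
int find_position(const uint32_t *ids, uint32_t nr, uint32_t needle)
{
	uint32_t i;

	assert(nr <= (uint32_t)INT_MAX);
	for (i = 0; i < nr; i++)
		if (ids[i] == needle)
			return (int)i;
	return -1; /* negative means "not found" */
}

/* Hypothetical wrapper in the style of a "contains" check. */
int contains(const uint32_t *ids, uint32_t nr, uint32_t needle)
{
	return find_position(ids, nr, needle) >= 0;
}
```

The cost of this convention is that only positions below 2^31 can be
represented, which is half of what a uint32_t could index.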

That's probably OK. The biggest repos I've seen have on the order of
10-100M objects. Even at the 100M end, that still gives us roughly a
factor of 20 before we hit 2^31. If we imagine those repos took 10 years
or so to accrue that many objects, then at the same rate we still have
about 200 years of growth left. Of course growth accelerates over time,
but I suspect repos with 2B objects will run into other scaling problems
first. So I don't think it's worth worrying about too much for now.
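
The back-of-the-envelope numbers can be checked with a short sketch.
headroom_factor() and linear_years_left() are hypothetical helpers
written for this example, and the constant linear growth rate is an
assumption taken from the estimate above, not a measurement:

```c
#include <stdint.h>

/* First position that no longer fits in a signed 32-bit int: 2^31. */
#define SIGNED_LIMIT ((uint64_t)1 << 31)

/* How many times can `objects` grow before reaching 2^31? (rounded down) */
uint64_t headroom_factor(uint64_t objects)
{
	return SIGNED_LIMIT / objects;
}

/*
 * Years until 2^31 objects, assuming the repository keeps adding
 * objects at the same average rate it had over `years_so_far`.
 * Rounded down; assumes objects >= years_so_far > 0.
 */
uint64_t linear_years_left(uint64_t objects, uint64_t years_so_far)
{
	uint64_t per_year = objects / years_so_far;

	return (SIGNED_LIMIT - objects) / per_year;
}
```

With 100M objects accrued over 10 years, headroom_factor() gives 21 and
linear_years_left() gives 204, which matches the "factor of 20" and
"200 years" figures. With 10M objects the factor is 214.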

-Peff

