git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Marek Zawirski <marek.zawirski@gmail.com>
Cc: robin.rosenberg@dewire.com, git@vger.kernel.org
Subject: Re: [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex
Date: Mon, 16 Jun 2008 00:06:36 -0400	[thread overview]
Message-ID: <20080616040635.GU11793@spearce.org> (raw)
In-Reply-To: <1213566349-25395-6-git-send-email-marek.zawirski@gmail.com>

Marek Zawirski <marek.zawirski@gmail.com> wrote:
> Let us quickly find ObjectId or next object for specified offset in a
> pack, in O(log n) time.
...
> diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java
...
> +	/**
> +	 * Object ids corresponding to {@link #offsets32} and {@link #offsets64}.
> +	 */
> +	private final int names[];

This could be smaller if it was an array of indexes into the index,
rather than the ObjectId itself.  Thus we need only 1 int per object
and not 5 ints per object.

But I see why you are doing it; PackIndex.MutableEntry exposes the
ObjectId and not the nth position of the object within the index.

> +	PackReverseIndex(final PackIndex index) {
> +		final long count = index.getObjectCount();
> +		if (count > Integer.MAX_VALUE)
> +			throw new IllegalArgumentException(
> +					"Huge indexes (> 2^31 entries) are not supported by jgit, yet");

For what its worth, this limit is actually:

	Integer.MAX_VALUE / Constants.OBJECT_ID_LENGTH / 4

as you store the entire ObjectId data for the index in a single int[]
array called names.  So you'll get an ArrayIndexOutOfBoundsException
or maybe OutOfMemoryError when you try to build names later on, and
never really hit this case here.

> +	ObjectId findObject(final long offset) {
> +		if (offset <= Integer.MAX_VALUE) {
> +			final int i32 = Arrays.binarySearch(offsets32, (int) offset);
> +			if (i32 < 0)
> +				return null;
> +			final int iNames = i32 * Constants.OBJECT_ID_LENGTH / 4;

This probably should be:

	final int iNames = i32 * (Constants.OBJECT_ID_LENGTH / 4);

as then we don't overflow the precision of int when i32 is large.

> +			return ObjectId.fromRaw(names, iNames);
> +		} else {
> +			final int i64 = Arrays.binarySearch(offsets64, offset);
> +			if (i64 < 0)
> +				return null;
> +			final int iNames = (i64 + offsets32.length)
> +					* Constants.OBJECT_ID_LENGTH / 4;

Again, watch out for overflow.

-- 
Shawn.

  parent reply	other threads:[~2008-06-16  4:07 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-15 21:45 [EGIT PATCH 00/20] PackWriter, first usable attempt Marek Zawirski
2008-06-15 21:45 ` [EGIT PATCH 01/20] Fix typo in PackIndexV2 Marek Zawirski
2008-06-15 21:45   ` [EGIT PATCH 02/20] Integer versions of copyRawTo() and fromRaw() in ObjectId Marek Zawirski
2008-06-15 21:45     ` [EGIT PATCH 03/20] Add openObjectInAllPacks() to Repository, exposing packed objects storage Marek Zawirski
2008-06-15 21:45       ` [EGIT PATCH 04/20] WindowedFile fragments copying: copyToStream() Marek Zawirski
2008-06-15 21:45         ` [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex Marek Zawirski
2008-06-15 21:45           ` [EGIT PATCH 06/20] Tests for PackReverseIndex Marek Zawirski
2008-06-15 21:45             ` [EGIT PATCH 07/20] Refactor PackIndexV2 - extract binarySearchLevelTwo() Marek Zawirski
2008-06-15 21:45               ` [EGIT PATCH 08/20] CRC32 support for PackIndex Marek Zawirski
2008-06-15 21:45                 ` [EGIT PATCH 09/20] CRC32 PackIndex tests Marek Zawirski
2008-06-15 21:45                   ` [EGIT PATCH 10/20] Format PackedObjectLoader class Marek Zawirski
2008-06-15 21:45                     ` [EGIT PATCH 11/20] Format UnpackedObjectLoader class Marek Zawirski
2008-06-15 21:45                       ` [EGIT PATCH 12/20] Format DeltaOfsPackedObjectLoader class Marek Zawirski
2008-06-15 21:45                         ` [EGIT PATCH 13/20] Raw-data operations in ObjectLoaders and PackFile Marek Zawirski
2008-06-15 21:45                           ` [EGIT PATCH 14/20] Add hasRevSort() in RevWalk for faster sorting strategy checking Marek Zawirski
2008-06-15 21:45                             ` [EGIT PATCH 15/20] Refactor getRevSort() calls to hasRevSort() Marek Zawirski
2008-06-15 21:45                               ` [EGIT PATCH 16/20] Support for RevSort.BOUNDARY in ObjectWalk Marek Zawirski
2008-06-15 21:45                                 ` [EGIT PATCH 17/20] Rename confusing objects field " Marek Zawirski
2008-06-15 21:45                                   ` [EGIT PATCH 18/20] New CountingOutputStream class - stream decorator Marek Zawirski
2008-06-15 21:45                                     ` [EGIT PATCH 19/20] Simplified implementation of pack creation: PackWriter Marek Zawirski
2008-06-15 21:45                                       ` [EGIT PATCH 20/20] PackWriter test suite Marek Zawirski
2008-06-17 21:28                                       ` [EGIT PATCH 21/20] Make isBetterDeltaReuseLoader() static in PackWriter Marek Zawirski
2008-06-17 22:07                                         ` Robin Rosenberg
2008-06-19 16:26                                           ` Marek Zawirski
2008-06-16  4:06           ` Shawn O. Pearce [this message]
2008-06-16 16:27             ` [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex Marek Zawirski
2008-06-17  2:02               ` Shawn O. Pearce
2008-06-16  5:19 ` [EGIT PATCH 00/20] PackWriter, first usable attempt Shawn O. Pearce
2008-06-16 16:37   ` Marek Zawirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080616040635.GU11793@spearce.org \
    --to=spearce@spearce.org \
    --cc=git@vger.kernel.org \
    --cc=marek.zawirski@gmail.com \
    --cc=robin.rosenberg@dewire.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).