git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [EGIT PATCH 00/20] PackWriter, first usable attempt
@ 2008-06-15 21:45 Marek Zawirski
  2008-06-15 21:45 ` [EGIT PATCH 01/20] Fix typo in PackIndexV2 Marek Zawirski
  2008-06-16  5:19 ` [EGIT PATCH 00/20] PackWriter, first usable attempt Shawn O. Pearce
  0 siblings, 2 replies; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Studying made me busy last week, but I'm back:
with another GSoC series, adding some usable feature this time.

At first, some stuff was still missing to produce packs, mostly
raw-data access related and ObjectWalk related.

Finally, we've got some support for pack writing! It's not that
power that C git version offers, but something usable. Delta
generation is not supported. Although we can reuse deltas and objects,
and support all other (I hope) options of git-pack-objects directly or
indirectly, most importantly --thin.

Pack writing and some other features are tested, seem to work.

This implementation of packing is not a very valuable thing directly
(achieving efficient storage), however it's a base for enhancements
and can be used for sending packs over net (with some assumptions).
It's more a "repacking" than "packing" tool.

So... I'm switching now to push implementation. If time allows,
delta-algorithms will be added later.

Robin,
this series is based on master of egit.git when I saw it last time
before repo.or.cz went down (9354293) ;) I'll add packwriter branch
to my repo when server is up.

Marek Zawirski (20):
  Fix typo in PackIndexV2
  Integer versions of copyRawTo() and fromRaw() in ObjectId
  Add openObjectInAllPacks() to Repository, exposing packed objects
    storage
  WindowedFile fragments copying: copyToStream()
  Reverse pack index implementation: PackReverseIndex
  Tests for PackReverseIndex
  Refactor PackIndexV2 - extract binarySearchLevelTwo()
  CRC32 support for PackIndex
  CRC32 PackIndex tests
  Format PackedObjectLoader class
  Format UnpackedObjectLoader class
  Format DeltaOfsPackedObjectLoader class
  Raw-data operations in ObjectLoaders and PackFile
  Add hasRevSort() in RevWalk for faster sorting strategy checking
  Refactor getRevSort() calls to hasRevSort()
  Support for RevSort.BOUNDARY in ObjectWalk
  Rename confusing objects field in ObjectWalk
  New CountingOutputStream class - stream decorator
  Simplified implementation of pack creation: PackWriter
  PackWriter test suite

 .../tst/org/spearce/jgit/lib/PackIndexTest.java    |   10 +
 .../tst/org/spearce/jgit/lib/PackIndexV1Test.java  |   19 +
 .../tst/org/spearce/jgit/lib/PackIndexV2Test.java  |   30 +
 .../org/spearce/jgit/lib/PackReverseIndexTest.java |  115 +++
 .../tst/org/spearce/jgit/lib/PackWriterTest.java   |  454 ++++++++++
 org.spearce.jgit.test/tst/pack-huge.idx            |  Bin 0 -> 2368 bytes
 .../src/org/spearce/jgit/lib/AnyObjectId.java      |   16 +
 .../jgit/lib/DeltaOfsPackedObjectLoader.java       |   24 +-
 .../spearce/jgit/lib/DeltaPackedObjectLoader.java  |    9 +-
 .../jgit/lib/DeltaRefPackedObjectLoader.java       |   15 +-
 .../src/org/spearce/jgit/lib/ObjectId.java         |   26 +
 .../src/org/spearce/jgit/lib/ObjectLoader.java     |   24 +
 .../src/org/spearce/jgit/lib/PackFile.java         |   85 ++-
 .../src/org/spearce/jgit/lib/PackIndex.java        |   23 +
 .../src/org/spearce/jgit/lib/PackIndexV1.java      |   10 +
 .../src/org/spearce/jgit/lib/PackIndexV2.java      |   73 +-
 .../src/org/spearce/jgit/lib/PackReverseIndex.java |  179 ++++
 .../src/org/spearce/jgit/lib/PackWriter.java       |  882 ++++++++++++++++++++
 .../org/spearce/jgit/lib/PackedObjectLoader.java   |   47 +-
 .../src/org/spearce/jgit/lib/Repository.java       |   42 +
 .../org/spearce/jgit/lib/UnpackedObjectLoader.java |   26 +-
 .../spearce/jgit/lib/WholePackedObjectLoader.java  |   20 +-
 .../src/org/spearce/jgit/lib/WindowedFile.java     |   43 +
 .../src/org/spearce/jgit/revwalk/ObjectWalk.java   |   40 +-
 .../src/org/spearce/jgit/revwalk/RevSort.java      |    5 +-
 .../src/org/spearce/jgit/revwalk/RevWalk.java      |   11 +
 .../org/spearce/jgit/revwalk/StartGenerator.java   |   14 +-
 .../spearce/jgit/util/CountingOutputStream.java    |   89 ++
 28 files changed, 2258 insertions(+), 73 deletions(-)
 create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackReverseIndexTest.java
 create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java
 create mode 100644 org.spearce.jgit.test/tst/pack-huge.idx
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/util/CountingOutputStream.java

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [EGIT PATCH 01/20] Fix typo in PackIndexV2
  2008-06-15 21:45 [EGIT PATCH 00/20] PackWriter, first usable attempt Marek Zawirski
@ 2008-06-15 21:45 ` Marek Zawirski
  2008-06-15 21:45   ` [EGIT PATCH 02/20] Integer versions of copyRawTo() and fromRaw() in ObjectId Marek Zawirski
  2008-06-16  5:19 ` [EGIT PATCH 00/20] PackWriter, first usable attempt Shawn O. Pearce
  1 sibling, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

levelTWo -> levelTwo

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../src/org/spearce/jgit/lib/PackIndexV2.java      |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
index 9a695ef..ae70f11 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
@@ -184,13 +184,13 @@ class PackIndexV2 extends PackIndex {
 	private class EntriesIteratorV2 extends EntriesIterator {
 		private int levelOne;
 
-		private int levelTWo;
+		private int levelTwo;
 
 		public MutableEntry next() {
 			for (; levelOne < names.length; levelOne++) {
-				if (levelTWo < names[levelOne].length) {
-					objectId.fromRaw(names[levelOne], levelTWo);
-					int arrayIdx = levelTWo / (Constants.OBJECT_ID_LENGTH / 4)
+				if (levelTwo < names[levelOne].length) {
+					objectId.fromRaw(names[levelOne], levelTwo);
+					int arrayIdx = levelTwo / (Constants.OBJECT_ID_LENGTH / 4)
 							* 4;
 					long offset = NB.decodeUInt32(offset32[levelOne], arrayIdx);
 					if ((offset & IS_O64) != 0) {
@@ -199,11 +199,11 @@ class PackIndexV2 extends PackIndex {
 					}
 					objectId.setOffset(offset);
 
-					levelTWo += Constants.OBJECT_ID_LENGTH / 4;
+					levelTwo += Constants.OBJECT_ID_LENGTH / 4;
 					returnedNumber++;
 					return objectId;
 				} else {
-					levelTWo = 0;
+					levelTwo = 0;
 				}
 			}
 			throw new NoSuchElementException();
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 02/20] Integer versions of copyRawTo() and fromRaw() in ObjectId
  2008-06-15 21:45 ` [EGIT PATCH 01/20] Fix typo in PackIndexV2 Marek Zawirski
@ 2008-06-15 21:45   ` Marek Zawirski
  2008-06-15 21:45     ` [EGIT PATCH 03/20] Add openObjectInAllPacks() to Repository, exposing packed objects storage Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Helper methods in ObjectId and AnyobjectId for int[], overload existing
byte[] versions.

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../src/org/spearce/jgit/lib/AnyObjectId.java      |   16 ++++++++++++
 .../src/org/spearce/jgit/lib/ObjectId.java         |   26 ++++++++++++++++++++
 2 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/AnyObjectId.java b/org.spearce.jgit/src/org/spearce/jgit/lib/AnyObjectId.java
index c348598..871a76d 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/AnyObjectId.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/AnyObjectId.java
@@ -276,6 +276,22 @@ public abstract class AnyObjectId implements Comparable {
 	}
 
 	/**
+	 * Copy this ObjectId to an int array.
+	 * 
+	 * @param b
+	 *            the buffer to copy to.
+	 * @param o
+	 *            the offset within b to write at.
+	 */
+	public void copyRawTo(final int[] b, final int o) {
+		b[o] = w1;
+		b[o + 1] = w2;
+		b[o + 2] = w3;
+		b[o + 3] = w4;
+		b[o + 4] = w5;
+	}
+
+	/**
 	 * Copy this ObjectId to an output writer in raw binary.
 	 * 
 	 * @param w
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/ObjectId.java b/org.spearce.jgit/src/org/spearce/jgit/lib/ObjectId.java
index 9688a2e..7646a7b 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/ObjectId.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/ObjectId.java
@@ -168,6 +168,32 @@ public class ObjectId extends AnyObjectId {
 	}
 
 	/**
+	 * Convert an ObjectId from raw binary representation.
+	 * 
+	 * @param is
+	 *            the raw integers buffer to read from. At least 5 integers must
+	 *            be available within this int array.
+	 * @return the converted object id.
+	 */
+	public static final ObjectId fromRaw(final int[] is) {
+		return fromRaw(is, 0);
+	}
+
+	/**
+	 * Convert an ObjectId from raw binary representation.
+	 * 
+	 * @param is
+	 *            the raw integers buffer to read from. At least 5 integers
+	 *            after p must be available within this int array.
+	 * @param p
+	 *            position to read the first integer of data from.
+	 * @return the converted object id.
+	 */
+	public static final ObjectId fromRaw(final int[] is, final int p) {
+		return new ObjectId(is[p], is[p + 1], is[p + 2], is[p + 3], is[p + 4]);
+	}
+
+	/**
 	 * Convert an ObjectId from hex characters (US-ASCII).
 	 * 
 	 * @param buf
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 03/20] Add openObjectInAllPacks() to Repository, exposing packed objects storage
  2008-06-15 21:45   ` [EGIT PATCH 02/20] Integer versions of copyRawTo() and fromRaw() in ObjectId Marek Zawirski
@ 2008-06-15 21:45     ` Marek Zawirski
  2008-06-15 21:45       ` [EGIT PATCH 04/20] WindowedFile fragments copying: copyToStream() Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Gives client a way to access specified object from any pack where it is
stored (this may be useful when an object is stored in more than 1 pack).

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../src/org/spearce/jgit/lib/Repository.java       |   42 ++++++++++++++++++++
 1 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/Repository.java b/org.spearce.jgit/src/org/spearce/jgit/lib/Repository.java
index 3efe60b..64f93ff 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/Repository.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/Repository.java
@@ -48,6 +48,7 @@ import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.HashMap;
+import java.util.LinkedList;
 import java.util.Map;
 
 import org.spearce.jgit.errors.IncorrectObjectTypeException;
@@ -292,6 +293,47 @@ public class Repository {
 	}
 
 	/**
+	 * Open object in all packs containing specified object.
+	 * 
+	 * @param objectId
+	 *            id of object to search for
+	 * @param curs
+	 *            temporary working space associated with the calling thread.
+	 * @return collection of loaders for this object, from all packs containing
+	 *         this object
+	 * @throws IOException
+	 */
+	public Collection<PackedObjectLoader> openObjectInAllPacks(
+			final AnyObjectId objectId, final WindowCursor curs)
+			throws IOException {
+		Collection<PackedObjectLoader> result = new LinkedList<PackedObjectLoader>();
+		openObjectInAllPacks(objectId, result, curs);
+		return result;
+	}
+
+	/**
+	 * Open object in all packs containing specified object.
+	 * 
+	 * @param objectId
+	 *            id of object to search for
+	 * @param resultLoaders
+	 *            result collection of loaders for this object, filled with
+	 *            loaders from all packs containing specified object
+	 * @param curs
+	 *            temporary working space associated with the calling thread.
+	 * @throws IOException
+	 */
+	void openObjectInAllPacks(final AnyObjectId objectId,
+			final Collection<PackedObjectLoader> resultLoaders,
+			final WindowCursor curs) throws IOException {
+		for (PackFile pack : packs) {
+			final PackedObjectLoader loader = pack.get(curs, objectId);
+			if (loader != null)
+				resultLoaders.add(loader);
+		}
+	}
+
+	/**
 	 * @param id
 	 *            SHA'1 of a blob
 	 * @return an {@link ObjectLoader} for accessing the data of a named blob
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 04/20] WindowedFile fragments copying: copyToStream()
  2008-06-15 21:45     ` [EGIT PATCH 03/20] Add openObjectInAllPacks() to Repository, exposing packed objects storage Marek Zawirski
@ 2008-06-15 21:45       ` Marek Zawirski
  2008-06-15 21:45         ` [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Method supports direct rewriting of file segment to the specified output
stream.

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../src/org/spearce/jgit/lib/WindowedFile.java     |   43 ++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
index 323b396..fff8990 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
@@ -40,6 +40,7 @@ package org.spearce.jgit.lib;
 import java.io.EOFException;
 import java.io.File;
 import java.io.IOException;
+import java.io.OutputStream;
 import java.io.RandomAccessFile;
 import java.nio.MappedByteBuffer;
 import java.nio.channels.FileChannel.MapMode;
@@ -195,6 +196,48 @@ public class WindowedFile {
 			throw new EOFException();
 	}
 
+	/**
+	 * Copy the requested number of bytes to the provided output stream.
+	 * <p>
+	 * This routine always reads until either the requested number of bytes has
+	 * been copied or EOF has been reached.
+	 * </p>
+	 * 
+	 * @param position
+	 *            the starting offset, as measured in bytes from the beginning
+	 *            of this file, to copy from.
+	 * @param buf
+	 *            temporary buffer to copy bytes into. In case of a big amount
+	 *            of data to copy, size of at least few kB is recommended. It
+	 *            does not need to be of size <code>cnt</code>, however.
+	 * @param cnt
+	 *            number of bytes to copy. Must not exceed
+	 *            <code>file.length - position</code>.
+	 * @param out
+	 *            output stream where read data is written out. No buffering is
+	 *            guaranteed by this method.
+	 * @param curs
+	 *            current cursor for reading data from the file.
+	 * @throws IOException
+	 *             a necessary window was not found in the window cache and
+	 *             trying to load it in from the operating system failed.
+	 * @throws EOFException
+	 *             the file ended before <code>cnt</code> bytes could be read.
+	 */
+	public void copyToStream(long position, final byte[] buf, long cnt,
+			final OutputStream out, final WindowCursor curs)
+			throws IOException, EOFException {
+		while (cnt > 0) {
+			int toRead = (int) Math.min(cnt, buf.length);
+			int read = read(position, buf, 0, toRead, curs);
+			if (read != toRead)
+				throw new EOFException();
+			position += read;
+			cnt -= read;
+			out.write(buf, 0, read);
+		}
+	}
+
 	void readCompressed(final long position, final byte[] dstbuf,
 			final WindowCursor curs) throws IOException, DataFormatException {
 		final Inflater inf = InflaterCache.get();
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex
  2008-06-15 21:45       ` [EGIT PATCH 04/20] WindowedFile fragments copying: copyToStream() Marek Zawirski
@ 2008-06-15 21:45         ` Marek Zawirski
  2008-06-15 21:45           ` [EGIT PATCH 06/20] Tests for PackReverseIndex Marek Zawirski
  2008-06-16  4:06           ` [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex Shawn O. Pearce
  0 siblings, 2 replies; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Let us quickly find ObjectId or next object for specified offset in a
pack, in O(log n) time.

Current implementation uses one level (reverse) indexing by an offset.
However, it tries to take small space as possible, by using 2 distinct
arrays for offsets with value < 2^31 (int[]) and those with value > 2^31
(long[]). Offsets are stored ordered in these 2 arrays. Binary search is
performed during requests.

Reverse index is constructed from an existing PackIndex instance, by
lazy initialization in a PackFile instance.

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../src/org/spearce/jgit/lib/PackFile.java         |   20 +++
 .../src/org/spearce/jgit/lib/PackReverseIndex.java |  179 ++++++++++++++++++++
 2 files changed, 199 insertions(+), 0 deletions(-)
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
index 1b2c167..3880966 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
@@ -55,6 +55,8 @@ public class PackFile implements Iterable<PackIndex.MutableEntry> {
 
 	private final PackIndex idx;
 
+	private PackReverseIndex reverseIdx;
+
 	/**
 	 * Construct a reader for an existing, pre-indexed packfile.
 	 * 
@@ -172,6 +174,18 @@ public class PackFile implements Iterable<PackIndex.MutableEntry> {
 		return idx.getObjectCount();
 	}
 
+	/**
+	 * Search for object id with the specified start offset in associated pack
+	 * (reverse) index.
+	 * 
+	 * @param offset
+	 *            start offset of object to find
+	 * @return object id for this offset, or null if no object was found
+	 */
+	ObjectId findObjectForOffset(final long offset) {
+		return getReverseIdx().findObject(offset);
+	}
+
 	final UnpackedObjectCache.Entry readCache(final long position) {
 		return UnpackedObjectCache.get(pack, position);
 	}
@@ -264,4 +278,10 @@ public class PackFile implements Iterable<PackIndex.MutableEntry> {
 			throw new IOException("Unknown object type " + typeCode + ".");
 		}
 	}
+
+	private PackReverseIndex getReverseIdx() {
+		if (reverseIdx == null)
+			reverseIdx = new PackReverseIndex(idx);
+		return reverseIdx;
+	}
 }
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java
new file mode 100644
index 0000000..3dede88
--- /dev/null
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java
@@ -0,0 +1,179 @@
+/*
+ * Copyright (C) 2008, Marek Zawirski <marek.zawirski@gmail.com>
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.lib;
+
+import java.util.Arrays;
+import java.util.Comparator;
+
+import org.spearce.jgit.errors.CorruptObjectException;
+import org.spearce.jgit.lib.PackIndex.MutableEntry;
+
+/**
+ * <p>
+ * Reverse index for forward pack index. Provides operations based on offset
+ * instead of object id. Such offset-based reverse lookups are performed in
+ * O(log n) time.
+ * </p>
+ * 
+ * @see PackIndex
+ * @see PackFile
+ */
+class PackReverseIndex {
+	/**
+	 * (offset31, truly) Offsets accommodating in 31 bits.
+	 */
+	private final int offsets32[];
+
+	/**
+	 * Offsets not accommodating in 31 bits.
+	 */
+	private final long offsets64[];
+
+	/**
+	 * Object ids corresponding to {@link #offsets32} and {@link #offsets64}.
+	 */
+	private final int names[];
+
+	/**
+	 * Create reverse index from straight/forward pack index, by indexing all
+	 * its entries.
+	 * 
+	 * @param index
+	 *            forward index - entries to (reverse) index.
+	 */
+	PackReverseIndex(final PackIndex index) {
+		final long count = index.getObjectCount();
+		if (count > Integer.MAX_VALUE)
+			throw new IllegalArgumentException(
+					"Huge indexes (> 2^31 entries) are not supported by jgit, yet");
+
+		final MutableEntry entries[] = new MutableEntry[(int) count];
+		int i = 0;
+		int count32 = 0;
+		for (MutableEntry me : index) {
+			entries[i++] = me.cloneEntry();
+			if (me.getOffset() <= Integer.MAX_VALUE)
+				count32++;
+		}
+		Arrays.sort(entries, new Comparator<MutableEntry>() {
+			public int compare(MutableEntry o1, MutableEntry o2) {
+				return Long.signum(o1.getOffset() - o2.getOffset());
+			}
+		});
+
+		names = new int[entries.length * Constants.OBJECT_ID_LENGTH / 4];
+		offsets32 = new int[count32];
+		offsets64 = new long[entries.length - count32];
+		for (int j = 0, j32 = 0; j < entries.length; j++) {
+			final long offset = entries[j].getOffset();
+			if (offset <= Integer.MAX_VALUE)
+				offsets32[j32++] = (int) offset;
+			else
+				offsets64[j - j32] = offset;
+			entries[j].copyRawTo(names, j * Constants.OBJECT_ID_LENGTH / 4);
+		}
+	}
+
+	/**
+	 * Search for object id with the specified start offset in this pack
+	 * (reverse) index.
+	 * 
+	 * @param offset
+	 *            start offset of object to find.
+	 * @return object id for this offset, or null if no object was found.
+	 */
+	ObjectId findObject(final long offset) {
+		if (offset <= Integer.MAX_VALUE) {
+			final int i32 = Arrays.binarySearch(offsets32, (int) offset);
+			if (i32 < 0)
+				return null;
+			final int iNames = i32 * Constants.OBJECT_ID_LENGTH / 4;
+			return ObjectId.fromRaw(names, iNames);
+		} else {
+			final int i64 = Arrays.binarySearch(offsets64, offset);
+			if (i64 < 0)
+				return null;
+			final int iNames = (i64 + offsets32.length)
+					* Constants.OBJECT_ID_LENGTH / 4;
+			return ObjectId.fromRaw(names, iNames);
+		}
+	}
+
+	/**
+	 * Search for the next offset to the specified offset in this pack (reverse)
+	 * index.
+	 * 
+	 * @param offset
+	 *            start offset of previous object (must be valid-existing
+	 *            offset).
+	 * @param maxOffset
+	 *            maximum offset in a pack (returned when there is no next
+	 *            offset).
+	 * @return offset of the next object in a pack or maxOffset if provided
+	 *         offset was the last one.
+	 * @throws CorruptObjectException
+	 *             when there is no object with the provided offset.
+	 */
+	long findNextOffset(final long offset, final long maxOffset)
+			throws CorruptObjectException {
+		if (offset <= Integer.MAX_VALUE) {
+			final int i32 = Arrays.binarySearch(offsets32, (int) offset);
+			if (i32 < 0)
+				throw new CorruptObjectException(
+						"Can't find object in (reverse) pack index for the specified offset "
+								+ offset);
+
+			if (i32 + 1 == offsets32.length) {
+				if (offsets64.length > 0)
+					return offsets64[0];
+				return maxOffset;
+			}
+			return offsets32[i32 + 1];
+		} else {
+			final int i64 = Arrays.binarySearch(offsets64, offset);
+			if (i64 < 0)
+				throw new CorruptObjectException(
+						"Can't find object in (reverse) pack index for the specified offset "
+								+ offset);
+
+			if (i64 + 1 == offsets64.length)
+				return maxOffset;
+			return offsets64[i64 + 1];
+		}
+	}
+}
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 06/20] Tests for PackReverseIndex
  2008-06-15 21:45         ` [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex Marek Zawirski
@ 2008-06-15 21:45           ` Marek Zawirski
  2008-06-15 21:45             ` [EGIT PATCH 07/20] Refactor PackIndexV2 - extract binarySearchLevelTwo() Marek Zawirski
  2008-06-16  4:06           ` [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex Shawn O. Pearce
  1 sibling, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Test cases based on new pack index, with big offsets (> 2^31).
Index was generated from artificially generated pack (repository).

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../org/spearce/jgit/lib/PackReverseIndexTest.java |  115 ++++++++++++++++++++
 org.spearce.jgit.test/tst/pack-huge.idx            |  Bin 0 -> 2368 bytes
 2 files changed, 115 insertions(+), 0 deletions(-)
 create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackReverseIndexTest.java
 create mode 100644 org.spearce.jgit.test/tst/pack-huge.idx

diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackReverseIndexTest.java b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackReverseIndexTest.java
new file mode 100644
index 0000000..a06c613
--- /dev/null
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackReverseIndexTest.java
@@ -0,0 +1,115 @@
+/*
+ * Copyright (C) 2008, Marek Zawirski <marek.zawirski@gmail.com>
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.lib;
+
+import java.io.File;
+
+import org.spearce.jgit.errors.CorruptObjectException;
+import org.spearce.jgit.lib.PackIndex.MutableEntry;
+
+public class PackReverseIndexTest extends RepositoryTestCase {
+
+	private PackIndex idx;
+
+	private PackReverseIndex reverseIdx;
+
+	/**
+	 * Set up tested class instance, test constructor by the way.
+	 */
+	public void setUp() throws Exception {
+		super.setUp();
+		// index with both small (< 2^31) and big offsets 
+		idx = PackIndex.open(new File(new File("tst"),
+				"pack-huge.idx"));
+		reverseIdx = new PackReverseIndex(idx);
+	}
+
+	/**
+	 * Test findObject() for all index entries.
+	 */
+	public void testFindObject() {
+		for (MutableEntry me : idx)
+			assertEquals(me.toObjectId(), reverseIdx.findObject(me.getOffset()));
+	}
+
+	/**
+	 * Test findObject() with illegal argument.
+	 */
+	public void testFindObjectWrongOffset() {
+		assertNull(reverseIdx.findObject(0));
+	}
+
+	/**
+	 * Test findNextOffset() for all index entries.
+	 * 
+	 * @throws CorruptObjectException
+	 */
+	public void testFindNextOffset() throws CorruptObjectException {
+		long offset = findFirstOffset();
+		assertTrue(offset > 0);
+		for (int i = 0; i < idx.getObjectCount(); i++) {
+			long newOffset = reverseIdx.findNextOffset(offset, Long.MAX_VALUE);
+			assertTrue(newOffset > offset);
+			if (i == idx.getObjectCount() - 1)
+				assertEquals(newOffset, Long.MAX_VALUE);
+			else
+				assertEquals(newOffset, idx.findOffset(reverseIdx
+						.findObject(newOffset)));
+			offset = newOffset;
+		}
+	}
+
+	/**
+	 * Test findNextOffset() with wrong illegal argument as offset.
+	 */
+	public void testFindNextOffsetWrongOffset() {
+		try {
+			reverseIdx.findNextOffset(0, Long.MAX_VALUE);
+			fail("findNextOffset() should throw exception");
+		} catch (CorruptObjectException x) {
+			// expected
+		}
+	}
+
+	private long findFirstOffset() {
+		long min = Long.MAX_VALUE;
+		for (MutableEntry me : idx)
+			min = Math.min(min, me.getOffset());
+		return min;
+	}
+}
diff --git a/org.spearce.jgit.test/tst/pack-huge.idx b/org.spearce.jgit.test/tst/pack-huge.idx
new file mode 100644
index 0000000000000000000000000000000000000000..0a5bbfb6a6c1453926f7d1ceaa78af494a8b460d
GIT binary patch
literal 2368
zcmd7Tdo<K(7zgkfni%6YQn^GXGpJCQK`t|=u%X&vbaR(MbA%*Y)9QkCnN)*W#wD`S
zRzxDDVik%Smy!Dz3?WJ`rOrl%eScw(<8(Tmwm<fd&-uR3^S<x%n}6oKv%zF43<eVc
z;EO=@*NB4srIvx-QV7udlI5WPy~M!XuMr3PUxEbvFOdNK#ZaJLN)q%INP+dYkOt=<
zGE2yU8iEGo{wG#|-vN>Tij`pRd#nQY7F!MKA3*`kQJkj)a;Y_-2T@r>6|DcOwP40#
zYM}m+Fks%gZ=ep&EWm>Gx4?mO3pBx6>t9&M^kUw-u`-=-wd|mdRKhc%z3Bd_j3ZUc
zWckHw-5QZ>PpzD%uTgKTG1vcev~<^Nuom1z?blMx=#@?B>gP17cQlPL4MPkyd@=~+
z)|HBnlU&683U;z@2B$a;<e(k6KFa9(dYX|0i^|BN@;#x?`wzkLs2eC6GP~Sg$0NEQ
z*$3Y45zO-0V*0d0O$sunLS`MVol`aF^Ph=-GGhFSZ^f|<y?6ZlNA~DTOZ)0kJt<-f
zEy*|cFn7@AiNgEN6S#|m##^^(WjN=x%4i@Ckn{iMk=L+K+DL5QJeh4W+<%Q{R(Y;<
zFAWp>m(qG$p;5cv<BOMu^0JkDeMuEtidC(5H{-Z<P2`@;+>hO{_Zg2(gPZVs&H8s(
z)cEQS)2k&4x3`wT3Gp6Ioh2V$esJzoU&X+SjMv(5nvQk5RCYO-7~DA<{qUL_r=h{h
z1XoRW6><+q1Zj+SI)zT5GZWpPCdAa9ada={A2yuYT_;EL_G#TZO0X3TbEK`si)C#x
zFPtzCHmm2Id6gu_a3jqUev!BEtrs3fUOe@#F`q+Z+Zl_Om!Wm6j<K_X$_iNhOvcO9
zLUa29oufhR+MgQDj~{At4@ujU%N*99`h8PGAF=l=A+mvqlI(kxhZCezj84o3drqVj
z9&t(Y?hLz;Uuj9Z&4-y{yo$1^n9Xo!FI|J86`pba*X<rXitnP+FKz3fyKX#PR6VnD
zb=kOslE*GxH8Vw)y@zJ}-I)w3FWPPmhBnp4iBh>hS~a10Ki>>ivbmIqh$0`iA9Exi
zH;hv_7GB)wA@R(Kt16Usrp1<eKiAQEp-`5AwA52bbVZi5#HO_5_I!NtX?_0{UuU`U
z3msK`irOX{c2@n`#|wjH2gZ>y_+*O0hc>A}{9Efccj-HOy2nHle%=!;`GTj2jo>23
zQ2u%~`v`ddB5aqX@_@VNknzl<DXBHY#Z<dCk1XH6Rq$KY?OT4+k?>@6$F{h20Xn(c
z)%1rrQTfE_N#7wKNt?zS_7&b8I9BFBjww<YQOWnBqc|n&?i~zE&e|=FpthJ}vr#nM
zQCF+efkqE3(|MTpx+fbwYht?gmWS)anGz-LWc|Ts6uudL{=iIy;ud%v%XE(H7Py9L
z5KO4by+`9}W%c0g19Va?d1F&ptT7#-vHV0eKMk9zoLPdQ+$!`tn2>JgnVBQaz*|R&
zInieX+@zRYX5-~`UN%&(>5`ODHJOLD$~YobkVMZLE1j0+kr=IEbq=I(Q|f`5Q1C|v
z;EM3clORO^Q2@NhBF<i7Em~lW0G0z{0C*ooeQ@Cu0f7vz(hfwgB@70;4EjhwLNatV
z1|$lQ1f&3I0K8*}<b6?mL+}|xebEsQF#9d&%L3pngI9x;16G_uK7;Pb6C(yGAfYz{
zdOKlkROvXBm(SmQ0p$pb4Sg3%>|7>-*-)LBJeMpcl=)J(n&)cvLceIf4uG;b?o!TN
zo!C9!zXRpjn2H%FpUPah2jvJWfjFO^=k%hJ)8=Yc1(YSsoT~Z#?$l*jF@l0Nla?U^
i5q#0RPo(gxfy(+nDH~nU#sqlR)DGh}_SPGh&i(_^HDsp%

literal 0
HcmV?d00001

-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 07/20] Refactor PackIndexV2 - extract binarySearchLevelTwo()
  2008-06-15 21:45           ` [EGIT PATCH 06/20] Tests for PackReverseIndex Marek Zawirski
@ 2008-06-15 21:45             ` Marek Zawirski
  2008-06-15 21:45               ` [EGIT PATCH 08/20] CRC32 support for PackIndex Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../src/org/spearce/jgit/lib/PackIndexV2.java      |   23 +++++++++++++-------
 1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
index ae70f11..a0b9827 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
@@ -153,6 +153,20 @@ class PackIndexV2 extends PackIndex {
 	@Override
 	long findOffset(final AnyObjectId objId) {
 		final int levelOne = objId.getFirstByte();
+		final int levelTwo = binarySearchLevelTwo(objId, levelOne);
+		if (levelTwo == -1)
+			return -1;
+		final long p = NB.decodeUInt32(offset32[levelOne], levelTwo << 2);
+		if ((p & IS_O64) != 0)
+			return NB.decodeUInt64(offset64, (8 * (int) (p & ~IS_O64)));
+		return p;
+	}
+
+	public Iterator<MutableEntry> iterator() {
+		return new EntriesIteratorV2();
+	}
+
+	private int binarySearchLevelTwo(final AnyObjectId objId, final int levelOne) {
 		final int[] data = names[levelOne];
 		int high = offset32[levelOne].length >> 2;
 		if (high == 0)
@@ -167,20 +181,13 @@ class PackIndexV2 extends PackIndex {
 			if (cmp < 0)
 				high = mid;
 			else if (cmp == 0) {
-				final long p = NB.decodeUInt32(offset32[levelOne], mid4);
-				if ((p & IS_O64) != 0)
-					return NB.decodeUInt64(offset64, (8 * (int) (p & ~IS_O64)));
-				return p;
+				return mid;
 			} else
 				low = mid + 1;
 		} while (low < high);
 		return -1;
 	}
 
-	public Iterator<MutableEntry> iterator() {
-		return new EntriesIteratorV2();
-	}
-
 	private class EntriesIteratorV2 extends EntriesIterator {
 		private int levelOne;
 
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 08/20] CRC32 support for PackIndex
  2008-06-15 21:45             ` [EGIT PATCH 07/20] Refactor PackIndexV2 - extract binarySearchLevelTwo() Marek Zawirski
@ 2008-06-15 21:45               ` Marek Zawirski
  2008-06-15 21:45                 ` [EGIT PATCH 09/20] CRC32 PackIndex tests Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Add findCRC32() and hasCRC32Support() methods in PackIndex with
implementation for index v2.

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../src/org/spearce/jgit/lib/PackIndex.java        |   23 ++++++++++++
 .../src/org/spearce/jgit/lib/PackIndexV1.java      |   10 +++++
 .../src/org/spearce/jgit/lib/PackIndexV2.java      |   38 ++++++++++++-------
 3 files changed, 57 insertions(+), 14 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndex.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndex.java
index 3935d4f..e34cd36 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndex.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndex.java
@@ -44,6 +44,7 @@ import java.io.FileNotFoundException;
 import java.io.IOException;
 import java.util.Iterator;
 
+import org.spearce.jgit.errors.MissingObjectException;
 import org.spearce.jgit.util.NB;
 
 /**
@@ -151,6 +152,28 @@ public abstract class PackIndex implements Iterable<PackIndex.MutableEntry> {
 	abstract long findOffset(AnyObjectId objId);
 
 	/**
+	 * Retrieve stored CRC32 checksum of the requested object raw-data
+	 * (including header).
+	 * 
+	 * @param objId
+	 *            id of object to look for
+	 * @return CRC32 checksum of specified object (at 32 less significant bits)
+	 * @throws MissingObjectException
+	 *             when requested ObjectId was not found in this index
+	 * @throws UnsupportedOperationException
+	 *             when this index doesn't support CRC32 checksum
+	 */
+	abstract long findCRC32(AnyObjectId objId) throws MissingObjectException,
+			UnsupportedOperationException;
+
+	/**
+	 * Check whether this index supports (has) CRC32 checksums for objects.
+	 * 
+	 * @return true if CRC32 is stored, false otherwise
+	 */
+	abstract boolean hasCRC32Support();
+
+	/**
 	 * Represent mutable entry of pack index consisting of object id and offset
 	 * in pack (both mutable).
 	 * 
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV1.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV1.java
index b8d9de3..86b939a 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV1.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV1.java
@@ -107,6 +107,16 @@ class PackIndexV1 extends PackIndex {
 		return -1;
 	}
 
+	@Override
+	long findCRC32(AnyObjectId objId) {
+		throw new UnsupportedOperationException();
+	}
+
+	@Override
+	boolean hasCRC32Support() {
+		return false;
+	}
+
 	public Iterator<MutableEntry> iterator() {
 		return new IndexV1Iterator();
 	}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
index a0b9827..fc1f08b 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
@@ -37,12 +37,12 @@
 
 package org.spearce.jgit.lib;
 
-import java.io.EOFException;
 import java.io.IOException;
 import java.io.InputStream;
 import java.util.Iterator;
 import java.util.NoSuchElementException;
 
+import org.spearce.jgit.errors.MissingObjectException;
 import org.spearce.jgit.util.NB;
 
 /** Support for the pack index v2 format. */
@@ -63,6 +63,9 @@ class PackIndexV2 extends PackIndex {
 	/** 256 arrays of the 32 bit offset data, matching {@link #names}. */
 	private byte[][] offset32;
 
+	/** 256 arrays of the CRC-32 of objects, matching {@link #names}. */
+	private byte[][] crc32;
+
 	/** 64 bit offset table. */
 	private byte[] offset64;
 
@@ -76,6 +79,7 @@ class PackIndexV2 extends PackIndex {
 
 		names = new int[FANOUT][];
 		offset32 = new byte[FANOUT][];
+		crc32 = new byte[FANOUT][];
 
 		// Object name table. The size we can permit per fan-out bucket
 		// is limited to Java's 2 GB per byte array limitation. That is
@@ -91,6 +95,7 @@ class PackIndexV2 extends PackIndex {
 			if (bucketCnt == 0) {
 				names[k] = NO_INTS;
 				offset32[k] = NO_BYTES;
+				crc32[k] = NO_BYTES;
 				continue;
 			}
 
@@ -107,11 +112,12 @@ class PackIndexV2 extends PackIndex {
 
 			names[k] = bin;
 			offset32[k] = new byte[(int) (bucketCnt * 4)];
+			crc32[k] = new byte[(int) (bucketCnt * 4)];
 		}
 
-		// CRC32 table. Currently unused.
-		//
-		skipFully(fd, objectCnt * 4);
+		// CRC32 table.
+		for (int k = 0; k < FANOUT; k++)
+			NB.readFully(fd, crc32[k], 0, crc32[k].length);
 
 		// 32 bit offset table. Any entries with the most significant bit
 		// set require a 64 bit offset entry in another table.
@@ -135,16 +141,6 @@ class PackIndexV2 extends PackIndex {
 		}
 	}
 
-	private static void skipFully(final InputStream fd, long toSkip)
-			throws IOException {
-		while (toSkip > 0) {
-			final long r = fd.skip(toSkip);
-			if (r <= 0)
-				throw new EOFException("Cannot skip index section.");
-			toSkip -= r;
-		}
-	}
-
 	@Override
 	long getObjectCount() {
 		return objectCnt;
@@ -162,6 +158,20 @@ class PackIndexV2 extends PackIndex {
 		return p;
 	}
 
+	@Override
+	long findCRC32(AnyObjectId objId) throws MissingObjectException {
+		final int levelOne = objId.getFirstByte();
+		final int levelTwo = binarySearchLevelTwo(objId, levelOne);
+		if (levelTwo == -1)
+			throw new MissingObjectException(objId.copy(), "unknown");
+		return NB.decodeUInt32(crc32[levelOne], levelTwo << 2);
+	}
+
+	@Override
+	boolean hasCRC32Support() {
+		return true;
+	}
+
 	public Iterator<MutableEntry> iterator() {
 		return new EntriesIteratorV2();
 	}
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 09/20] CRC32 PackIndex tests
  2008-06-15 21:45               ` [EGIT PATCH 08/20] CRC32 support for PackIndex Marek Zawirski
@ 2008-06-15 21:45                 ` Marek Zawirski
  2008-06-15 21:45                   ` [EGIT PATCH 10/20] Format PackedObjectLoader class Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../tst/org/spearce/jgit/lib/PackIndexTest.java    |   10 ++++++
 .../tst/org/spearce/jgit/lib/PackIndexV1Test.java  |   19 ++++++++++++
 .../tst/org/spearce/jgit/lib/PackIndexV2Test.java  |   30 ++++++++++++++++++++
 3 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexTest.java b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexTest.java
index c682153..fd7b646 100644
--- a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexTest.java
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexTest.java
@@ -41,6 +41,7 @@ import java.io.File;
 import java.util.Iterator;
 import java.util.NoSuchElementException;
 
+import org.spearce.jgit.errors.MissingObjectException;
 import org.spearce.jgit.lib.PackIndex.MutableEntry;
 
 public abstract class PackIndexTest extends RepositoryTestCase {
@@ -70,6 +71,15 @@ public abstract class PackIndexTest extends RepositoryTestCase {
 	public abstract File getFileForPackdf2982f28();
 
 	/**
+	 * Verify CRC32 support.
+	 * 
+	 * @throws MissingObjectException
+	 * @throws UnsupportedOperationException
+	 */
+	public abstract void testCRC32() throws MissingObjectException,
+			UnsupportedOperationException;
+
+	/**
 	 * Test contracts of Iterator methods and this implementation remove()
 	 * limitations.
 	 */
diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexV1Test.java b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexV1Test.java
index dda3ef4..bb9e83e 100644
--- a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexV1Test.java
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexV1Test.java
@@ -39,6 +39,8 @@ package org.spearce.jgit.lib;
 
 import java.io.File;
 
+import org.spearce.jgit.errors.MissingObjectException;
+
 public class PackIndexV1Test extends PackIndexTest {
 	@Override
 	public File getFileForPack34be9032() {
@@ -51,4 +53,21 @@ public class PackIndexV1Test extends PackIndexTest {
 		return new File(new File("tst"),
 				"pack-df2982f284bbabb6bdb59ee3fcc6eb0983e20371.idx");
 	}
+
+	/**
+	 * Verify CRC32 - V1 should not index anything.
+	 * 
+	 * @throws MissingObjectException
+	 */
+	@Override
+	public void testCRC32() throws MissingObjectException {
+		assertFalse(smallIdx.hasCRC32Support());
+		try {
+			smallIdx.findCRC32(ObjectId
+					.fromString("4b825dc642cb6eb9a060e54bf8d69288fbee4904"));
+			fail("index V1 shouldn't support CRC");
+		} catch (UnsupportedOperationException x) {
+			// expected
+		}
+	}
 }
diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexV2Test.java b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexV2Test.java
index 8267e48..b21a7e9 100644
--- a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexV2Test.java
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexV2Test.java
@@ -39,6 +39,8 @@ package org.spearce.jgit.lib;
 
 import java.io.File;
 
+import org.spearce.jgit.errors.MissingObjectException;
+
 public class PackIndexV2Test extends PackIndexTest {
 	@Override
 	public File getFileForPack34be9032() {
@@ -51,4 +53,32 @@ public class PackIndexV2Test extends PackIndexTest {
 		return new File(new File("tst"),
 				"pack-df2982f284bbabb6bdb59ee3fcc6eb0983e20371.idxV2");
 	}
+
+	/**
+	 * Verify CRC32 indexing.
+	 * 
+	 * @throws UnsupportedOperationException
+	 * @throws MissingObjectException
+	 */
+	@Override
+	public void testCRC32() throws MissingObjectException,
+			UnsupportedOperationException {
+		assertTrue(smallIdx.hasCRC32Support());
+		assertEquals(0x00000000C2B64258l, smallIdx.findCRC32(ObjectId
+				.fromString("4b825dc642cb6eb9a060e54bf8d69288fbee4904")));
+		assertEquals(0x0000000072AD57C2l, smallIdx.findCRC32(ObjectId
+				.fromString("540a36d136cf413e4b064c2b0e0a4db60f77feab")));
+		assertEquals(0x00000000FF10A479l, smallIdx.findCRC32(ObjectId
+				.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259")));
+		assertEquals(0x0000000034B27DDCl, smallIdx.findCRC32(ObjectId
+				.fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3")));
+		assertEquals(0x000000004743F1E4l, smallIdx.findCRC32(ObjectId
+				.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7")));
+		assertEquals(0x00000000640B358Bl, smallIdx.findCRC32(ObjectId
+				.fromString("902d5476fa249b7abc9d84c611577a81381f0327")));
+		assertEquals(0x000000002A17CB5El, smallIdx.findCRC32(ObjectId
+				.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035")));
+		assertEquals(0x000000000B3B5BA6l, smallIdx.findCRC32(ObjectId
+				.fromString("c59759f143fb1fe21c197981df75a7ee00290799")));
+	}
 }
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 10/20] Format PackedObjectLoader class
  2008-06-15 21:45                 ` [EGIT PATCH 09/20] CRC32 PackIndex tests Marek Zawirski
@ 2008-06-15 21:45                   ` Marek Zawirski
  2008-06-15 21:45                     ` [EGIT PATCH 11/20] Format UnpackedObjectLoader class Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../org/spearce/jgit/lib/PackedObjectLoader.java   |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackedObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackedObjectLoader.java
index 743484e..43d43e6 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackedObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackedObjectLoader.java
@@ -41,8 +41,7 @@ package org.spearce.jgit.lib;
 import java.io.IOException;
 
 /**
- * Base class for a set of object loader classes for packed
- * objects.
+ * Base class for a set of object loader classes for packed objects.
  */
 abstract class PackedObjectLoader extends ObjectLoader {
 	protected final PackFile pack;
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 11/20] Format UnpackedObjectLoader class
  2008-06-15 21:45                   ` [EGIT PATCH 10/20] Format PackedObjectLoader class Marek Zawirski
@ 2008-06-15 21:45                     ` Marek Zawirski
  2008-06-15 21:45                       ` [EGIT PATCH 12/20] Format DeltaOfsPackedObjectLoader class Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../org/spearce/jgit/lib/UnpackedObjectLoader.java |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectLoader.java
index 4e95387..a5c484b 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectLoader.java
@@ -49,8 +49,7 @@ import org.spearce.jgit.util.MutableInteger;
 import org.spearce.jgit.util.RawParseUtils;
 
 /**
- * Loose object loader. This class loads an object not
- * stored in a pack.
+ * Loose object loader. This class loads an object not stored in a pack.
  */
 public class UnpackedObjectLoader extends ObjectLoader {
 	private final int objectType;
@@ -61,8 +60,11 @@ public class UnpackedObjectLoader extends ObjectLoader {
 
 	/**
 	 * Construct an ObjectLoader for the specified SHA-1
-	 * @param db repository
-	 * @param id SHA-1
+	 * 
+	 * @param db
+	 *            repository
+	 * @param id
+	 *            SHA-1
 	 * @throws IOException
 	 */
 	public UnpackedObjectLoader(final Repository db, final ObjectId id)
@@ -94,11 +96,13 @@ public class UnpackedObjectLoader extends ObjectLoader {
 	 *             The compressed data supplied does not match the format for a
 	 *             valid loose object.
 	 */
-	public UnpackedObjectLoader(final byte[] compressed) throws CorruptObjectException {
+	public UnpackedObjectLoader(final byte[] compressed)
+			throws CorruptObjectException {
 		this(compressed, null);
 	}
 
-	private UnpackedObjectLoader(final byte[] compressed, final ObjectId id) throws CorruptObjectException {
+	private UnpackedObjectLoader(final byte[] compressed, final ObjectId id)
+			throws CorruptObjectException {
 		setId(id);
 
 		// Try to determine if this is a legacy format loose object or
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 12/20] Format DeltaOfsPackedObjectLoader class
  2008-06-15 21:45                     ` [EGIT PATCH 11/20] Format UnpackedObjectLoader class Marek Zawirski
@ 2008-06-15 21:45                       ` Marek Zawirski
  2008-06-15 21:45                         ` [EGIT PATCH 13/20] Raw-data operations in ObjectLoaders and PackFile Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../jgit/lib/DeltaOfsPackedObjectLoader.java       |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaOfsPackedObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaOfsPackedObjectLoader.java
index 75fda45..edbeef9 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaOfsPackedObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaOfsPackedObjectLoader.java
@@ -44,9 +44,8 @@ import java.io.IOException;
 class DeltaOfsPackedObjectLoader extends DeltaPackedObjectLoader {
 	private final long deltaBase;
 
-	DeltaOfsPackedObjectLoader(final WindowCursor curs,
-			final PackFile pr, final long offset,
-			final int deltaSz, final long base) {
+	DeltaOfsPackedObjectLoader(final WindowCursor curs, final PackFile pr,
+			final long offset, final int deltaSz, final long base) {
 		super(curs, pr, offset, deltaSz);
 		deltaBase = base;
 	}
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 13/20] Raw-data operations in ObjectLoaders and PackFile
  2008-06-15 21:45                       ` [EGIT PATCH 12/20] Format DeltaOfsPackedObjectLoader class Marek Zawirski
@ 2008-06-15 21:45                         ` Marek Zawirski
  2008-06-15 21:45                           ` [EGIT PATCH 14/20] Add hasRevSort() in RevWalk for faster sorting strategy checking Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Expose operations on raw-data (storage specific) in ObjectLoaders and
subclasses:
- getRawType() giving access to the object type at PackFile header level
- getRawSize() giving access to the size of this object at PackFile
  header level
- getDeltaBase() determining delta base if applicable
- copyRawData() allowing direct copying raw (compressed or delitified)
  object data if possible
+ helper fields, methods in ObjectLoaders
+ helper methods/core engine in PackFile

New operations do not introduce any signifficant performance overhead
when not used.

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../jgit/lib/DeltaOfsPackedObjectLoader.java       |   21 ++++++-
 .../spearce/jgit/lib/DeltaPackedObjectLoader.java  |    9 ++-
 .../jgit/lib/DeltaRefPackedObjectLoader.java       |   15 ++++-
 .../src/org/spearce/jgit/lib/ObjectLoader.java     |   24 +++++++
 .../src/org/spearce/jgit/lib/PackFile.java         |   65 ++++++++++++++++++-
 .../org/spearce/jgit/lib/PackedObjectLoader.java   |   44 +++++++++++++-
 .../org/spearce/jgit/lib/UnpackedObjectLoader.java |   10 +++
 .../spearce/jgit/lib/WholePackedObjectLoader.java  |   20 ++++++-
 8 files changed, 194 insertions(+), 14 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaOfsPackedObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaOfsPackedObjectLoader.java
index edbeef9..5c9fb00 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaOfsPackedObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaOfsPackedObjectLoader.java
@@ -40,17 +40,34 @@ package org.spearce.jgit.lib;
 
 import java.io.IOException;
 
+import org.spearce.jgit.errors.CorruptObjectException;
+
 /** Reads a deltified object which uses an offset to find its base. */
 class DeltaOfsPackedObjectLoader extends DeltaPackedObjectLoader {
 	private final long deltaBase;
 
 	DeltaOfsPackedObjectLoader(final WindowCursor curs, final PackFile pr,
-			final long offset, final int deltaSz, final long base) {
-		super(curs, pr, offset, deltaSz);
+			final long dataOffset, final long objectOffset, final int deltaSz,
+			final long base) {
+		super(curs, pr, dataOffset, objectOffset, deltaSz);
 		deltaBase = base;
 	}
 
 	protected PackedObjectLoader getBaseLoader() throws IOException {
 		return pack.resolveBase(curs, deltaBase);
 	}
+
+	@Override
+	public int getRawType() {
+		return Constants.OBJ_OFS_DELTA;
+	}
+
+	@Override
+	public ObjectId getDeltaBase() throws IOException {
+		final ObjectId id = pack.findObjectForOffset(deltaBase);
+		if (id == null)
+			throw new CorruptObjectException(
+					"Offset-written delta base for object not found in a pack");
+		return id;
+	}
 }
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaPackedObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaPackedObjectLoader.java
index 4813572..e73f8e5 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaPackedObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaPackedObjectLoader.java
@@ -50,8 +50,8 @@ abstract class DeltaPackedObjectLoader extends PackedObjectLoader {
 	private final int deltaSize;
 
 	DeltaPackedObjectLoader(final WindowCursor curs, final PackFile pr,
-			final long offset, final int deltaSz) {
-		super(curs, pr, offset);
+			final long dataOffset, final long objectOffset, final int deltaSz) {
+		super(curs, pr, dataOffset, objectOffset);
 		objectType = -1;
 		deltaSize = deltaSz;
 	}
@@ -98,6 +98,11 @@ abstract class DeltaPackedObjectLoader extends PackedObjectLoader {
 		}
 	}
 
+	@Override
+	public long getRawSize() {
+		return deltaSize;
+	}
+
 	/**
 	 * @return the object loader for the base object
 	 * @throws IOException
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaRefPackedObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaRefPackedObjectLoader.java
index fb87abc..042d3a8 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaRefPackedObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaRefPackedObjectLoader.java
@@ -47,8 +47,9 @@ class DeltaRefPackedObjectLoader extends DeltaPackedObjectLoader {
 	private final ObjectId deltaBase;
 
 	DeltaRefPackedObjectLoader(final WindowCursor curs, final PackFile pr,
-			final long offset, final int deltaSz, final ObjectId base) {
-		super(curs, pr, offset, deltaSz);
+			final long dataOffset, final long objectOffset, final int deltaSz,
+			final ObjectId base) {
+		super(curs, pr, dataOffset, objectOffset, deltaSz);
 		deltaBase = base;
 	}
 
@@ -58,4 +59,14 @@ class DeltaRefPackedObjectLoader extends DeltaPackedObjectLoader {
 			throw new MissingObjectException(deltaBase, "delta base");
 		return or;
 	}
+
+	@Override
+	public int getRawType() throws IOException {
+		return Constants.OBJ_REF_DELTA;
+	}
+
+	@Override
+	public ObjectId getDeltaBase() throws IOException {
+		return deltaBase;
+	}
 }
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/ObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/ObjectLoader.java
index 3a96dd1..5282491 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/ObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/ObjectLoader.java
@@ -66,6 +66,13 @@ public abstract class ObjectLoader {
 	}
 
 	/**
+	 * @return true if id of loaded object is already known, false otherwise.
+	 */
+	protected boolean hasComputedId() {
+		return objectId != null;
+	}
+
+	/**
 	 * Set the SHA-1 id of the object handled by this loader
 	 * 
 	 * @param id
@@ -113,4 +120,21 @@ public abstract class ObjectLoader {
 	 *             the object cannot be read.
 	 */
 	public abstract byte[] getCachedBytes() throws IOException;
+
+	/**
+	 * @return raw object type from object header, as stored in storage (pack,
+	 *         loose file). This may be different from {@link #getType()} result
+	 *         for packs (see {@link Constants}).
+	 * @throws IOException
+	 *             when type cannot be read from the object header.
+	 */
+	public abstract int getRawType() throws IOException;
+
+	/**
+	 * @return raw size of object from object header (pack, loose file).
+	 *         Interpretation of this value depends on {@link #getRawType()}.
+	 * @throws IOException
+	 *             when raw size cannot be read from the object header.
+	 */
+	public abstract long getRawSize() throws IOException;
 }
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
index 3880966..9992615 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
@@ -38,11 +38,16 @@
 
 package org.spearce.jgit.lib;
 
+import java.io.EOFException;
 import java.io.File;
 import java.io.IOException;
+import java.io.OutputStream;
 import java.util.Iterator;
+import java.util.zip.CRC32;
+import java.util.zip.CheckedOutputStream;
 import java.util.zip.DataFormatException;
 
+import org.spearce.jgit.errors.CorruptObjectException;
 import org.spearce.jgit.util.NB;
 
 /**
@@ -201,6 +206,52 @@ public class PackFile implements Iterable<PackIndex.MutableEntry> {
 		return dstbuf;
 	}
 
+	final void copyRawData(final PackedObjectLoader loader,
+			final OutputStream out, final byte buf[]) throws IOException {
+		final long objectOffset = loader.objectOffset;
+		final long dataOffset = loader.dataOffset;
+		final int cnt = (int) (findEndOffset(objectOffset) - dataOffset);
+		final WindowCursor curs = loader.curs;
+
+		if (idx.hasCRC32Support()) {
+			final CRC32 crc = new CRC32();
+			int headerCnt = (int) (dataOffset - objectOffset);
+			while (headerCnt > 0) {
+				int toRead = Math.min(headerCnt, buf.length);
+				int read = pack.read(objectOffset, buf, 0, toRead, curs);
+				if (read != toRead)
+					throw new EOFException();
+				crc.update(buf, 0, read);
+				headerCnt -= toRead;
+			}
+			final CheckedOutputStream crcOut = new CheckedOutputStream(out, crc);
+			pack.copyToStream(dataOffset, buf, cnt, crcOut, curs);
+			final long computed = crc.getValue();
+
+			ObjectId id;
+			if (loader.hasComputedId())
+				id = loader.getId();
+			else
+				id = findObjectForOffset(objectOffset);
+			final long expected = idx.findCRC32(id);
+			if (computed != expected)
+				throw new CorruptObjectException(id,
+						"Possible data corruption - CRC32 of raw pack data (object offset "
+								+ objectOffset
+								+ ") mismatch CRC32 from pack index");
+		} else {
+			pack.copyToStream(dataOffset, buf, cnt, out, curs);
+
+			// read to verify against Adler32 zlib checksum
+			loader.getCachedBytes();
+		}
+	}
+	
+	boolean supportsFastCopyRawData() {
+		return idx.hasCRC32Support();
+	}
+
+
 	private void readPackHeader() throws IOException {
 		final WindowCursor curs = new WindowCursor();
 		long position = 0;
@@ -252,8 +303,8 @@ public class PackFile implements Iterable<PackIndex.MutableEntry> {
 		case Constants.OBJ_TREE:
 		case Constants.OBJ_BLOB:
 		case Constants.OBJ_TAG:
-			return new WholePackedObjectLoader(curs, this, pos, typeCode,
-					(int) dataSize);
+			return new WholePackedObjectLoader(curs, this, pos, objOffset,
+					typeCode, (int) dataSize);
 
 		case Constants.OBJ_OFS_DELTA: {
 			pack.readFully(pos, ib, curs);
@@ -267,18 +318,24 @@ public class PackFile implements Iterable<PackIndex.MutableEntry> {
 				ofs += (c & 127);
 			}
 			return new DeltaOfsPackedObjectLoader(curs, this, pos + p,
-					(int) dataSize, objOffset - ofs);
+					objOffset, (int) dataSize, objOffset - ofs);
 		}
 		case Constants.OBJ_REF_DELTA: {
 			pack.readFully(pos, ib, curs);
 			return new DeltaRefPackedObjectLoader(curs, this, pos + ib.length,
-					(int) dataSize, ObjectId.fromRaw(ib));
+					objOffset, (int) dataSize, ObjectId.fromRaw(ib));
 		}
 		default:
 			throw new IOException("Unknown object type " + typeCode + ".");
 		}
 	}
 
+	private long findEndOffset(final long startOffset)
+			throws CorruptObjectException {
+		final long maxOffset = pack.length() - Constants.OBJECT_ID_LENGTH;
+		return getReverseIdx().findNextOffset(startOffset, maxOffset);
+	}
+
 	private PackReverseIndex getReverseIdx() {
 		if (reverseIdx == null)
 			reverseIdx = new PackReverseIndex(idx);
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackedObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackedObjectLoader.java
index 43d43e6..b433609 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackedObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackedObjectLoader.java
@@ -39,6 +39,7 @@
 package org.spearce.jgit.lib;
 
 import java.io.IOException;
+import java.io.OutputStream;
 
 /**
  * Base class for a set of object loader classes for packed objects.
@@ -50,15 +51,18 @@ abstract class PackedObjectLoader extends ObjectLoader {
 
 	protected final long dataOffset;
 
+	protected final long objectOffset;
+
 	protected int objectType;
 
 	protected int objectSize;
 
 	PackedObjectLoader(final WindowCursor c, final PackFile pr,
-			final long offset) {
+			final long dataOffset, final long objectOffset) {
 		curs = c;
 		pack = pr;
-		dataOffset = offset;
+		this.dataOffset = dataOffset;
+		this.objectOffset = objectOffset;
 	}
 
 	public int getType() throws IOException {
@@ -82,4 +86,40 @@ abstract class PackedObjectLoader extends ObjectLoader {
 		System.arraycopy(data, 0, copy, 0, data.length);
 		return data;
 	}
+
+	/**
+	 * Copy raw object representation from storage to provided output stream.
+	 * <p>
+	 * Copied data doesn't include object header. User must provide temporary
+	 * buffer used during copying by underlying I/O layer.
+	 * </p>
+	 * 
+	 * @param out
+	 *            output stream when data is copied. No buffering is guaranteed.
+	 * @param buf
+	 *            temporary buffer used during copying. Recommended size is at
+	 *            least few kB.
+	 * @throws IOException
+	 *             when the object cannot be read.
+	 */
+	public void copyRawData(OutputStream out, byte buf[]) throws IOException {
+		pack.copyRawData(this, out, buf);
+	}
+
+	/**
+	 * @return true if this loader is capable of fast raw-data copying basing on
+	 *         compressed data checksum; false if raw-data copying needs
+	 *         uncompressing and compressing data
+	 */
+	public boolean supportsFastCopyRawData() {
+		return pack.supportsFastCopyRawData();
+	}
+
+	/**
+	 * @return id of delta base object for this object representation. null if
+	 *         object is not stored as delta.
+	 * @throws IOException
+	 *             when delta base cannot read.
+	 */
+	public abstract ObjectId getDeltaBase() throws IOException;
 }
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectLoader.java
index a5c484b..65072f0 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectLoader.java
@@ -208,4 +208,14 @@ public class UnpackedObjectLoader extends ObjectLoader {
 	public byte[] getCachedBytes() throws IOException {
 		return bytes;
 	}
+
+	@Override
+	public int getRawType() {
+		return objectType;
+	}
+
+	@Override
+	public long getRawSize() {
+		return objectSize;
+	}
 }
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WholePackedObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WholePackedObjectLoader.java
index e54fba6..7185df5 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/WholePackedObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WholePackedObjectLoader.java
@@ -47,8 +47,9 @@ class WholePackedObjectLoader extends PackedObjectLoader {
 	private static final int OBJ_COMMIT = Constants.OBJ_COMMIT;
 
 	WholePackedObjectLoader(final WindowCursor curs, final PackFile pr,
-			final long offset, final int type, final int size) {
-		super(curs, pr, offset);
+			final long dataOffset, final long objectOffset, final int type,
+			final int size) {
+		super(curs, pr, dataOffset, objectOffset);
 		objectType = type;
 		objectSize = size;
 	}
@@ -76,4 +77,19 @@ class WholePackedObjectLoader extends PackedObjectLoader {
 			throw coe;
 		}
 	}
+
+	@Override
+	public int getRawType() {
+		return objectType;
+	}
+
+	@Override
+	public long getRawSize() {
+		return objectSize;
+	}
+
+	@Override
+	public ObjectId getDeltaBase() {
+		return null;
+	}
 }
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 14/20] Add hasRevSort() in RevWalk for faster sorting strategy checking
  2008-06-15 21:45                         ` [EGIT PATCH 13/20] Raw-data operations in ObjectLoaders and PackFile Marek Zawirski
@ 2008-06-15 21:45                           ` Marek Zawirski
  2008-06-15 21:45                             ` [EGIT PATCH 15/20] Refactor getRevSort() calls to hasRevSort() Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Direct calls to this method let us avoid unnecessary cloning of
getRevSort() output just for checking existance of some strategy.

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../src/org/spearce/jgit/revwalk/RevWalk.java      |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/revwalk/RevWalk.java b/org.spearce.jgit/src/org/spearce/jgit/revwalk/RevWalk.java
index 4cb75ec..fc757a5 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/revwalk/RevWalk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/revwalk/RevWalk.java
@@ -389,6 +389,17 @@ public class RevWalk implements Iterable<RevCommit> {
 	}
 
 	/**
+	 * Check whether the provided sorting strategy is enabled.
+	 * 
+	 * @param sort
+	 *            a sorting strategy to look for.
+	 * @return true if this strategy is enabled, false otherwise
+	 */
+	public boolean hasRevSort(RevSort sort) {
+		return sorting.contains(sort);
+	}
+
+	/**
 	 * Select a single sorting strategy for the returned commits.
 	 * <p>
 	 * Disables all sorting strategies, then enables only the single strategy
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 15/20] Refactor getRevSort() calls to hasRevSort()
  2008-06-15 21:45                           ` [EGIT PATCH 14/20] Add hasRevSort() in RevWalk for faster sorting strategy checking Marek Zawirski
@ 2008-06-15 21:45                             ` Marek Zawirski
  2008-06-15 21:45                               ` [EGIT PATCH 16/20] Support for RevSort.BOUNDARY in ObjectWalk Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

It is not signifficant performance improvement, just avoids creation of
few unnecessary objects.
However, it improves encapsulation and keeps existing RevSort checking
code consistent with further use of hasRevSort().

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../src/org/spearce/jgit/revwalk/ObjectWalk.java   |    2 +-
 .../org/spearce/jgit/revwalk/StartGenerator.java   |   14 +++++++-------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java b/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
index a36c1cc..68ed861 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
@@ -198,7 +198,7 @@ public class ObjectWalk extends RevWalk {
 				return null;
 			if ((r.flags & UNINTERESTING) != 0) {
 				markTreeUninteresting(r.getTree());
-				if (getRevSort().contains(RevSort.BOUNDARY))
+				if (hasRevSort(RevSort.BOUNDARY))
 					return r;
 				continue;
 			}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/revwalk/StartGenerator.java b/org.spearce.jgit/src/org/spearce/jgit/revwalk/StartGenerator.java
index debd168..7ddcd3c 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/revwalk/StartGenerator.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/revwalk/StartGenerator.java
@@ -38,7 +38,6 @@
 package org.spearce.jgit.revwalk;
 
 import java.io.IOException;
-import java.util.EnumSet;
 
 import org.spearce.jgit.errors.IncorrectObjectTypeException;
 import org.spearce.jgit.errors.MissingObjectException;
@@ -91,8 +90,7 @@ class StartGenerator extends Generator {
 			return mbg.next();
 		}
 
-		final EnumSet<RevSort> sort = w.getRevSort();
-		boolean boundary = sort.contains(RevSort.BOUNDARY);
+		boolean boundary = walker.hasRevSort(RevSort.BOUNDARY);
 
 		if (!boundary && walker instanceof ObjectWalk) {
 			// The object walker requires boundary support to color
@@ -110,9 +108,10 @@ class StartGenerator extends Generator {
 		}
 
 		int pendingOutputType = 0;
-		if (sort.contains(RevSort.START_ORDER) && !(q instanceof FIFORevQueue))
+		if (walker.hasRevSort(RevSort.START_ORDER)
+				&& !(q instanceof FIFORevQueue))
 			q = new FIFORevQueue(q);
-		if (sort.contains(RevSort.COMMIT_TIME_DESC)
+		if (walker.hasRevSort(RevSort.COMMIT_TIME_DESC)
 				&& !(q instanceof DateRevQueue))
 			q = new DateRevQueue(q);
 		if (tf != TreeFilter.ALL) {
@@ -141,9 +140,10 @@ class StartGenerator extends Generator {
 			g = new RewriteGenerator(g);
 		}
 
-		if (sort.contains(RevSort.TOPO) && (g.outputType() & SORT_TOPO) == 0)
+		if (walker.hasRevSort(RevSort.TOPO)
+				&& (g.outputType() & SORT_TOPO) == 0)
 			g = new TopoSortGenerator(g);
-		if (sort.contains(RevSort.REVERSE))
+		if (walker.hasRevSort(RevSort.REVERSE))
 			g = new LIFORevQueue(q);
 		if (boundary)
 			g = new BoundaryGenerator(w, g);
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 16/20] Support for RevSort.BOUNDARY in ObjectWalk
  2008-06-15 21:45                             ` [EGIT PATCH 15/20] Refactor getRevSort() calls to hasRevSort() Marek Zawirski
@ 2008-06-15 21:45                               ` Marek Zawirski
  2008-06-15 21:45                                 ` [EGIT PATCH 17/20] Rename confusing objects field " Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

When RevSort.BOUNDARY strategy in enabled, ObjectWalk now includes in
nextObjects() all objects associated with boundary commits (trees,
blobs) and all other objects explictly marked as  uninteresting
(boundary).

This behavior is something more than original C git-rev-list offers in
this matter - it is impossible to get such a behavior (to include all
boundary objects, not only commits, at output) directly from:
$ git-rev-list --objects-edge
Here, it is added for compactness - callers usually need also boundary
objects (e.g. for preparing thin-pack). If not, they can still easily
filter out such objects from nextObject() by checking for UNINTERESTING
flag or just use next() if interested only in commits.

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../src/org/spearce/jgit/revwalk/ObjectWalk.java   |   26 +++++++++++++++----
 .../src/org/spearce/jgit/revwalk/RevSort.java      |    5 +++-
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java b/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
index 68ed861..81cebbd 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
@@ -66,8 +66,6 @@ import org.spearce.jgit.treewalk.TreeWalk;
  * commits that are returned first.
  */
 public class ObjectWalk extends RevWalk {
-	private static final int SEEN_OR_UNINTERESTING = SEEN | UNINTERESTING;
-
 	private final TreeWalk treeWalk;
 
 	private BlockObjQueue objects;
@@ -177,6 +175,8 @@ public class ObjectWalk extends RevWalk {
 			IncorrectObjectTypeException, IOException {
 		while (o instanceof RevTag) {
 			o.flags |= UNINTERESTING;
+			if (hasRevSort(RevSort.BOUNDARY))
+				addObject(o);
 			o = ((RevTag) o).getObject();
 			parse(o);
 		}
@@ -187,6 +187,10 @@ public class ObjectWalk extends RevWalk {
 			markTreeUninteresting((RevTree) o);
 		else
 			o.flags |= UNINTERESTING;
+
+		if (o.getType() != Constants.OBJ_COMMIT && hasRevSort(RevSort.BOUNDARY)) {
+			addObject(o);
+		}
 	}
 
 	@Override
@@ -198,8 +202,10 @@ public class ObjectWalk extends RevWalk {
 				return null;
 			if ((r.flags & UNINTERESTING) != 0) {
 				markTreeUninteresting(r.getTree());
-				if (hasRevSort(RevSort.BOUNDARY))
+				if (hasRevSort(RevSort.BOUNDARY)) {
+					objects.add(r.getTree());
 					return r;
+				}
 				continue;
 			}
 			objects.add(r.getTree());
@@ -237,17 +243,23 @@ public class ObjectWalk extends RevWalk {
 			switch (sType) {
 			case Constants.OBJ_BLOB: {
 				final RevObject o = lookupAny(treeWalk.getObjectId(0), sType);
-				if ((o.flags & SEEN_OR_UNINTERESTING) != 0)
+				if ((o.flags & SEEN) != 0)
 					continue;
 				o.flags |= SEEN;
+				if ((o.flags & UNINTERESTING) != 0
+						&& !hasRevSort(RevSort.BOUNDARY))
+					continue;
 				fromTreeWalk = true;
 				return o;
 			}
 			case Constants.OBJ_TREE: {
 				final RevObject o = lookupAny(treeWalk.getObjectId(0), sType);
-				if ((o.flags & SEEN_OR_UNINTERESTING) != 0)
+				if ((o.flags & SEEN) != 0)
 					continue;
 				o.flags |= SEEN;
+				if ((o.flags & UNINTERESTING) != 0
+						&& !hasRevSort(RevSort.BOUNDARY))
+					continue;
 				enterSubtree = true;
 				fromTreeWalk = true;
 				return o;
@@ -265,9 +277,11 @@ public class ObjectWalk extends RevWalk {
 			final RevObject o = objects.next();
 			if (o == null)
 				return null;
-			if ((o.flags & SEEN_OR_UNINTERESTING) != 0)
+			if ((o.flags & SEEN) != 0)
 				continue;
 			o.flags |= SEEN;
+			if ((o.flags & UNINTERESTING) != 0 && !hasRevSort(RevSort.BOUNDARY))
+				continue;
 			if (o instanceof RevTree) {
 				currentTree = (RevTree) o;
 				treeWalk.reset(new ObjectId[] { currentTree });
diff --git a/org.spearce.jgit/src/org/spearce/jgit/revwalk/RevSort.java b/org.spearce.jgit/src/org/spearce/jgit/revwalk/RevSort.java
index 8688f7f..b0a03ad 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/revwalk/RevSort.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/revwalk/RevSort.java
@@ -37,7 +37,7 @@
 
 package org.spearce.jgit.revwalk;
 
-/** Sorting strategies supported by {@link RevWalk}. */
+/** Sorting strategies supported by {@link RevWalk} and {@link ObjectWalk}. */
 public enum RevSort {
 	/**
 	 * No specific sorting is requested.
@@ -83,6 +83,9 @@ public enum RevSort {
 
 	/**
 	 * Include {@link RevFlag#UNINTERESTING} boundary commits after all others.
+	 * In {@link ObjectWalk}, objects associated with such commits (trees,
+	 * blobs), and all other objects marked explicitly as UNINTERESTING are also
+	 * included.
 	 * <p>
 	 * A boundary commit is a UNINTERESTING parent of an interesting commit that
 	 * was previously output.
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 17/20] Rename confusing objects field in ObjectWalk
  2008-06-15 21:45                               ` [EGIT PATCH 16/20] Support for RevSort.BOUNDARY in ObjectWalk Marek Zawirski
@ 2008-06-15 21:45                                 ` Marek Zawirski
  2008-06-15 21:45                                   ` [EGIT PATCH 18/20] New CountingOutputStream class - stream decorator Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Rename ObjectWalk#objects field to pendingObjects, as private objects
field already existed in superclass - RevWalk. These 2 fields have
different meaning and just leaded to confusion.

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../src/org/spearce/jgit/revwalk/ObjectWalk.java   |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java b/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
index 81cebbd..6a5b857 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
@@ -68,7 +68,7 @@ import org.spearce.jgit.treewalk.TreeWalk;
 public class ObjectWalk extends RevWalk {
 	private final TreeWalk treeWalk;
 
-	private BlockObjQueue objects;
+	private BlockObjQueue pendingObjects;
 
 	private RevTree currentTree;
 
@@ -85,7 +85,7 @@ public class ObjectWalk extends RevWalk {
 	public ObjectWalk(final Repository repo) {
 		super(repo);
 		treeWalk = new TreeWalk(repo);
-		objects = new BlockObjQueue();
+		pendingObjects = new BlockObjQueue();
 	}
 
 	/**
@@ -203,12 +203,12 @@ public class ObjectWalk extends RevWalk {
 			if ((r.flags & UNINTERESTING) != 0) {
 				markTreeUninteresting(r.getTree());
 				if (hasRevSort(RevSort.BOUNDARY)) {
-					objects.add(r.getTree());
+					pendingObjects.add(r.getTree());
 					return r;
 				}
 				continue;
 			}
-			objects.add(r.getTree());
+			pendingObjects.add(r.getTree());
 			return r;
 		}
 	}
@@ -274,7 +274,7 @@ public class ObjectWalk extends RevWalk {
 		}
 
 		for (;;) {
-			final RevObject o = objects.next();
+			final RevObject o = pendingObjects.next();
 			if (o == null)
 				return null;
 			if ((o.flags & SEEN) != 0)
@@ -348,7 +348,7 @@ public class ObjectWalk extends RevWalk {
 	@Override
 	public void dispose() {
 		super.dispose();
-		objects = new BlockObjQueue();
+		pendingObjects = new BlockObjQueue();
 		enterSubtree = false;
 		currentTree = null;
 	}
@@ -356,14 +356,14 @@ public class ObjectWalk extends RevWalk {
 	@Override
 	protected void reset(final int retainFlags) {
 		super.reset(retainFlags);
-		objects = new BlockObjQueue();
+		pendingObjects = new BlockObjQueue();
 		enterSubtree = false;
 	}
 
 	private void addObject(final RevObject o) {
 		if ((o.flags & SEEN) == 0) {
 			o.flags |= SEEN;
-			objects.add(o);
+			pendingObjects.add(o);
 		}
 	}
 
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 18/20] New CountingOutputStream class - stream decorator
  2008-06-15 21:45                                 ` [EGIT PATCH 17/20] Rename confusing objects field " Marek Zawirski
@ 2008-06-15 21:45                                   ` Marek Zawirski
  2008-06-15 21:45                                     ` [EGIT PATCH 19/20] Simplified implementation of pack creation: PackWriter Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

This decorator provides information about number of already written
bytes.

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../spearce/jgit/util/CountingOutputStream.java    |   89 ++++++++++++++++++++
 1 files changed, 89 insertions(+), 0 deletions(-)
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/util/CountingOutputStream.java

diff --git a/org.spearce.jgit/src/org/spearce/jgit/util/CountingOutputStream.java b/org.spearce.jgit/src/org/spearce/jgit/util/CountingOutputStream.java
new file mode 100644
index 0000000..574bb96
--- /dev/null
+++ b/org.spearce.jgit/src/org/spearce/jgit/util/CountingOutputStream.java
@@ -0,0 +1,89 @@
+/*
+ * Copyright (C) 2008, Marek Zawirski <marek.zawirski@gmail.com>
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.util;
+
+import java.io.FilterOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+
+/**
+ * Counting output stream decoration. Counts bytes written to stream.
+ */
+public class CountingOutputStream extends FilterOutputStream {
+
+	private int count;
+
+	/**
+	 * Create counting stream being decorated to provided real output stream.
+	 * 
+	 * @param out
+	 *            output stream where data should be written
+	 */
+	public CountingOutputStream(OutputStream out) {
+		super(out);
+	}
+
+	@Override
+	public void write(int b) throws IOException {
+		out.write(b);
+		count++;
+	}
+
+	@Override
+	public void write(byte[] b, int off, int len) throws IOException {
+		out.write(b, off, len);
+		count += len;
+	}
+
+	/**
+	 * Return number of already written bytes.
+	 * 
+	 * @return number of written bytes since last reset (object is reset upon
+	 *         creation)
+	 */
+	public int getCount() {
+		return count;
+	}
+
+	/**
+	 * Reset counter to zero value.
+	 */
+	public void reset() {
+		count = 0;
+	}
+}
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 19/20] Simplified implementation of pack creation: PackWriter
  2008-06-15 21:45                                   ` [EGIT PATCH 18/20] New CountingOutputStream class - stream decorator Marek Zawirski
@ 2008-06-15 21:45                                     ` Marek Zawirski
  2008-06-15 21:45                                       ` [EGIT PATCH 20/20] PackWriter test suite Marek Zawirski
  2008-06-17 21:28                                       ` [EGIT PATCH 21/20] Make isBetterDeltaReuseLoader() static in PackWriter Marek Zawirski
  0 siblings, 2 replies; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

This class is able to create a pack basing on provided objects
specification and options.

The core unimplemented feature comparing to the original
git-pack-objects is windowed delta searching algorithm (and needed
binary delta between 2 files).

This implementation can create an always correct pack, with
appropriate objects set and optimized order (as in original git).
Objects are written as whole objects or deltas to another objects (ref
or offset). Existing deltas and objects may be reused if set writer is
set up accordingly. Thin-packs and delta-depth options are also
supported.

Comparing to the original implementation, delta reuse is performed in a
slightly different way - allowing delta-chains longer than 2.
Delta-depth and delta-cycles are checked on-line when writing out
objects. These changes were introduced (possibly temporary) to give us
sensible pack creation implementation without binary delta generation
algorithm which is not yet implemented.

Mentored-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../src/org/spearce/jgit/lib/PackWriter.java       |  882 ++++++++++++++++++++
 1 files changed, 882 insertions(+), 0 deletions(-)
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
new file mode 100644
index 0000000..18d3ec2
--- /dev/null
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
@@ -0,0 +1,882 @@
+/*
+ * Copyright (C) 2008, Marek Zawirski <marek.zawirski@gmail.com>
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.lib;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.security.DigestOutputStream;
+import java.security.MessageDigest;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.zip.Deflater;
+import java.util.zip.DeflaterOutputStream;
+
+import org.spearce.jgit.errors.IncorrectObjectTypeException;
+import org.spearce.jgit.errors.MissingObjectException;
+import org.spearce.jgit.revwalk.ObjectWalk;
+import org.spearce.jgit.revwalk.RevFlag;
+import org.spearce.jgit.revwalk.RevObject;
+import org.spearce.jgit.revwalk.RevSort;
+import org.spearce.jgit.util.CountingOutputStream;
+import org.spearce.jgit.util.NB;
+
+/**
+ * <p>
+ * PackWriter class is responsible for generating pack files from specified set
+ * of objects from repository. This implementation produce pack files in format
+ * version 2.
+ * </p>
+ * <p>
+ * Source of objects may be specified in two ways:
+ * <ul>
+ * <li>(usually) by providing sets of interesting and uninteresting objects in
+ * repository - all interesting objects and their ancestors except uninteresting
+ * objects and their ancestors will be included in pack, or</li>
+ * <li>by providing iterator of {@link RevObject} specifying exact list and
+ * order of objects in pack</li>
+ * </ul>
+ * Typical usage consists of creating instance intended for some pack,
+ * configuring options through accessors methods and finally call
+ * {@link #writePack(Iterator)} or
+ * {@link #writePack(Collection, Collection, boolean)} with objects
+ * specification, to generate a pack stream.
+ * </p>
+ * <p>
+ * Class provide set of configurable options and {@link ProgressMonitor}
+ * support, as operations may take a long time for big repositories. Deltas
+ * searching algorithm is <b>NOT IMPLEMENTED</b> yet - this implementation
+ * relies only on deltas and objects reuse.
+ * </p>
+ * <p>
+ * This class is not thread safe, it is intended to be used in one thread, with
+ * one instance per created pack. Subsequent calls to writePack result in
+ * undefined behavior.
+ * </p>
+ */
+
+public class PackWriter {
+	/**
+	 * Title of {@link ProgressMonitor} task used during counting objects to
+	 * pack.
+	 * 
+	 * @see #writePack(Collection, Collection, boolean)
+	 */
+	public static final String COUNTING_OBJECTS_PROGRESS = "Counting objects to pack";
+
+	/**
+	 * Title of {@link ProgressMonitor} task used during searching for objects
+	 * reuse or delta reuse.
+	 * 
+	 * @see #writePack(Iterator)
+	 * @see #writePack(Collection, Collection, boolean)
+	 */
+	public static final String SEARCHING_REUSE_PROGRESS = "Searching for delta and object reuse";
+
+	/**
+	 * Title of {@link ProgressMonitor} task used during writing out pack
+	 * (objects)
+	 * 
+	 * @see #writePack(Iterator)
+	 * @see #writePack(Collection, Collection, boolean)
+	 */
+	public static final String WRITING_OBJECTS_PROGRESS = "Writing objects";
+
+	/**
+	 * Default value of deltas reuse option.
+	 * 
+	 * @see #setReuseDeltas(boolean)
+	 */
+	public static final boolean DEFAULT_REUSE_DELTAS = true;
+
+	/**
+	 * Default value of objects reuse option.
+	 * 
+	 * @see #setReuseObjects(boolean)
+	 */
+	public static final boolean DEFAULT_REUSE_OBJECTS = true;
+
+	/**
+	 * Default value of delta base as offset option.
+	 * 
+	 * @see #setDeltaBaseAsOffset(boolean)
+	 */
+	public static final boolean DEFAULT_DELTA_BASE_AS_OFFSET = false;
+
+	/**
+	 * Default value of maximum delta chain depth.
+	 * 
+	 * @see #setMaxDeltaDepth(int)
+	 */
+	public static final int DEFAULT_MAX_DELTA_DEPTH = 50;
+
+	private static final int PACK_VERSION_GENERATED = 2;
+
+	@SuppressWarnings("unchecked")
+	private final List<ObjectToPack> objectsLists[] = new List[Constants.OBJ_TAG + 1];
+	{
+		objectsLists[0] = Collections.<ObjectToPack> emptyList();
+		objectsLists[Constants.OBJ_COMMIT] = new ArrayList<ObjectToPack>();
+		objectsLists[Constants.OBJ_TREE] = new ArrayList<ObjectToPack>();
+		objectsLists[Constants.OBJ_BLOB] = new ArrayList<ObjectToPack>();
+		objectsLists[Constants.OBJ_TAG] = new ArrayList<ObjectToPack>();
+	}
+
+	private final ObjectIdSubclassMap<ObjectToPack> objectsMap = new ObjectIdSubclassMap<ObjectToPack>();
+
+	// edge objects for thin packs
+	private final ObjectIdSubclassMap<ObjectId> edgeObjects = new ObjectIdSubclassMap<ObjectId>();
+
+	private final Repository db;
+
+	private final DigestOutputStream out;
+
+	private final CountingOutputStream countingOut;
+
+	private final Deflater deflater;
+
+	private final ProgressMonitor monitor;
+
+	private final byte[] buf = new byte[16384]; // 16 KB
+
+	private final WindowCursor windowCursor = new WindowCursor();
+
+	private boolean reuseDeltas = DEFAULT_REUSE_DELTAS;
+
+	private boolean reuseObjects = DEFAULT_REUSE_OBJECTS;
+
+	private boolean deltaBaseAsOffset = DEFAULT_DELTA_BASE_AS_OFFSET;
+
+	private int maxDeltaDepth = DEFAULT_MAX_DELTA_DEPTH;
+
+	private boolean thin;
+
+	/**
+	 * Create writer for specified repository, that will write a pack to
+	 * provided output stream. Objects for packing are specified in
+	 * {@link #writePack(Iterator)} or
+	 * {@link #writePack(Collection, Collection, boolean)}.
+	 * 
+	 * @param repo
+	 *            repository where objects are stored.
+	 * @param out
+	 *            output stream of pack data; no buffering is guaranteed by
+	 *            writer.
+	 * @param monitor
+	 *            operations progress monitor, used within
+	 *            {@link #writePack(Iterator)} or
+	 *            {@link #writePack(Collection, Collection, boolean)}.
+	 */
+	public PackWriter(final Repository repo, final OutputStream out,
+			final ProgressMonitor monitor) {
+		this.db = repo;
+		this.monitor = monitor;
+		this.countingOut = new CountingOutputStream(out);
+		this.out = new DigestOutputStream(countingOut, Constants
+				.newMessageDigest());
+		this.deflater = new Deflater(db.getConfig().getCore().getCompression());
+	}
+
+	/**
+	 * Check whether object is configured to reuse deltas existing in
+	 * repository.
+	 * <p>
+	 * Default setting: {@value #DEFAULT_REUSE_DELTAS}
+	 * </p>
+	 * 
+	 * @return true if object is configured to reuse deltas; false otherwise.
+	 */
+	public boolean isReuseDeltas() {
+		return reuseDeltas;
+	}
+
+	/**
+	 * Set reuse deltas configuration option for this writer. When enabled,
+	 * writer will search for delta representation of object in repository and
+	 * use it if possible. Normally, only deltas with base to another object
+	 * existing in set of objects to pack will be used. Exception is however
+	 * thin-pack (see {@link #writePack(Collection, Collection, boolean)} and
+	 * {@link #writePack(Iterator)}) where base object must exist on other side
+	 * machine.
+	 * <p>
+	 * When raw delta data is directly copied from a pack file, checksum is
+	 * computed to verify data.
+	 * </p>
+	 * <p>
+	 * Default setting: {@value #DEFAULT_REUSE_DELTAS}
+	 * </p>
+	 * 
+	 * @param reuseDeltas
+	 *            boolean indicating whether or not try to reuse deltas.
+	 */
+	public void setReuseDeltas(boolean reuseDeltas) {
+		this.reuseDeltas = reuseDeltas;
+	}
+
+	/**
+	 * Checks whether object is configured to reuse existing objects
+	 * representation in repository.
+	 * <p>
+	 * Default setting: {@value #DEFAULT_REUSE_OBJECTS}
+	 * </p>
+	 * 
+	 * @return true if writer is configured to reuse objects representation from
+	 *         pack; false otherwise.
+	 */
+	public boolean isReuseObjects() {
+		return reuseObjects;
+	}
+
+	/**
+	 * Set reuse objects configuration option for this writer. If enabled,
+	 * writer searches for representation in a pack file. If possible,
+	 * compressed data is directly copied from such a pack file. Data checksum
+	 * is verified.
+	 * <p>
+	 * Default setting: {@value #DEFAULT_REUSE_OBJECTS}
+	 * </p>
+	 * 
+	 * @param reuseObjects
+	 *            boolean indicating whether or not writer should reuse existing
+	 *            objects representation.
+	 */
+	public void setReuseObjects(boolean reuseObjects) {
+		this.reuseObjects = reuseObjects;
+	}
+
+	/**
+	 * Check whether writer can store delta base as an offset (new style
+	 * reducing pack size) or should store it as an object id (legacy style,
+	 * compatible with old readers).
+	 * <p>
+	 * Default setting: {@value #DEFAULT_DELTA_BASE_AS_OFFSET}
+	 * </p>
+	 * 
+	 * @return true if delta base is stored as an offset; false if it is stored
+	 *         as an object id.
+	 */
+	public boolean isDeltaBaseAsOffset() {
+		return deltaBaseAsOffset;
+	}
+
+	/**
+	 * Set writer delta base format. Delta base can be written as an offset in a
+	 * pack file (new approach reducing file size) or as an object id (legacy
+	 * approach, compatible with old readers).
+	 * <p>
+	 * Default setting: {@value #DEFAULT_DELTA_BASE_AS_OFFSET}
+	 * </p>
+	 * 
+	 * @param deltaBaseAsOffset
+	 *            boolean indicating whether delta base can be stored as an
+	 *            offset.
+	 */
+	public void setDeltaBaseAsOffset(boolean deltaBaseAsOffset) {
+		this.deltaBaseAsOffset = deltaBaseAsOffset;
+	}
+
+	/**
+	 * Get maximum depth of delta chain set up for this writer. Generated chains
+	 * are not longer than this value.
+	 * <p>
+	 * Default setting: {@value #DEFAULT_MAX_DELTA_DEPTH}
+	 * </p>
+	 * 
+	 * @return maximum delta chain depth.
+	 */
+	public int getMaxDeltaDepth() {
+		return maxDeltaDepth;
+	}
+
+	/**
+	 * Set up maximum depth of delta chain for this writer. Generated chains are
+	 * not longer than this value. Too low value causes low compression level,
+	 * while too big makes unpacking (reading) longer.
+	 * <p>
+	 * Default setting: {@value #DEFAULT_MAX_DELTA_DEPTH}
+	 * </p>
+	 * 
+	 * @param maxDeltaDepth
+	 *            maximum delta chain depth.
+	 */
+	public void setMaxDeltaDepth(int maxDeltaDepth) {
+		this.maxDeltaDepth = maxDeltaDepth;
+	}
+
+	/**
+	 * Returns objects number in a pack file that was created by this writer.
+	 * 
+	 * @return number of objects in pack.
+	 */
+	public int getObjectsNumber() {
+		return objectsMap.size();
+	}
+
+	/**
+	 * Write pack to output stream according to current writer configuration for
+	 * provided source iterator of objects.
+	 * <p>
+	 * Iterator <b>exactly</b> determines which objects are included in a pack
+	 * and order they appear in pack (except that objects order by type is not
+	 * needed at input). This order should conform general rules of ordering
+	 * objects in git - by recency and path (type and delta-base first is
+	 * internally secured) and responsibility for guaranteeing this order is on
+	 * a caller side. Iterator must return each id of object to write exactly
+	 * once.
+	 * </p>
+	 * <p>
+	 * When iterator returns object that has {@link RevFlag#UNINTERESTING} flag,
+	 * this object won't be included in an output pack. Instead, it is recorded
+	 * as edge-object (known to remote repository) for thin-pack. In such a case
+	 * writer may pack objects with delta base object not within set of objects
+	 * to pack, but belonging to party repository - those marked with
+	 * {@link RevFlag#UNINTERESTING} flag. This type of pack is used only for
+	 * transport.
+	 * </p>
+	 * <p>
+	 * At first, this method collects and sorts objects to pack, then deltas
+	 * search is performed if set up accordingly, finally pack stream is
+	 * written. {@link ProgressMonitor} tasks {@value #SEARCHING_REUSE_PROGRESS}
+	 * (only if resueDeltas or reuseObjects is enabled) and
+	 * {@value #WRITING_OBJECTS_PROGRESS} are updated during packing.
+	 * </p>
+	 * <p>
+	 * All reused objects data checksum (Adler32/CRC32) is computed and
+	 * validated against existing checksum.
+	 * </p>
+	 * 
+	 * @param objectsSource
+	 *            iterator of object to store in a pack; order of objects within
+	 *            each type is important, ordering by type is not needed;
+	 *            allowed types for objects are {@link Constants#OBJ_COMMIT},
+	 *            {@link Constants#OBJ_TREE}, {@link Constants#OBJ_BLOB} and
+	 *            {@link Constants#OBJ_TAG}; objects returned by iterator may
+	 *            be later reused by caller as object id and type are internally
+	 *            copied in each iteration; if object returned by iterator has
+	 *            {@link RevFlag#UNINTERESTING} flag set, it won't be included
+	 *            in a pack, but is considered as edge-object for thin-pack.
+	 * @throws IOException
+	 *             when some I/O problem occur during reading objects for pack
+	 *             or writing pack stream.
+	 */
+	public void writePack(final Iterator<RevObject> objectsSource)
+			throws IOException {
+		while (objectsSource.hasNext()) {
+			addObject(objectsSource.next());
+		}
+		writePackInternal();
+	}
+
+	/**
+	 * Write pack to output stream according to current writer configuration for
+	 * provided sets of interesting and uninteresting objects.
+	 * <p>
+	 * Basing on these 2 sets, another set of objects to put in a pack file is
+	 * created: this set consists of all objects reachable (ancestors) from
+	 * interesting objects, except uninteresting objects and their ancestors.
+	 * This method uses class {@link ObjectWalk} extensively to find out that
+	 * appropriate set of output objects and their optimal order in output pack.
+	 * Order is consistent with general git in-pack rules: sort by object type,
+	 * recency, path and delta-base first.
+	 * </p>
+	 * <p>
+	 * At first, this method collects and sorts objects to pack, then deltas
+	 * search is performed if set up accordingly, finally pack stream is
+	 * written. {@link ProgressMonitor} tasks
+	 * {@value #COUNTING_OBJECTS_PROGRESS}, {@value #SEARCHING_REUSE_PROGRESS}
+	 * (only if resueDeltas or reuseObjects is enabled) and
+	 * {@value #WRITING_OBJECTS_PROGRESS} are updated during packing.
+	 * </p>
+	 * <p>
+	 * All reused objects data checksum (Adler32/CRC32) is computed and
+	 * validated against existing checksum.
+	 * </p>
+	 * 
+	 * @param interestingObjects
+	 *            collection of objects to be marked as interesting (start
+	 *            points of graph traversal).
+	 * @param uninterestingObjects
+	 *            collection of objects to be marked as uninteresting (end
+	 *            points of graph traversal).
+	 * @param thin
+	 *            a boolean indicating whether writer may pack objects with
+	 *            delta base object not within set of objects to pack, but
+	 *            belonging to party repository (uninteresting/boundary) as
+	 *            determined by set; this kind of pack is used only for
+	 *            transport; true - to produce thin pack, false - otherwise.
+	 * @throws IOException
+	 *             when some I/O problem occur during reading objects for pack
+	 *             or writing pack stream.
+	 */
+	public void writePack(final Collection<ObjectId> interestingObjects,
+			final Collection<ObjectId> uninterestingObjects, boolean thin)
+			throws IOException {
+		ObjectWalk walker = setUpWalker(interestingObjects,
+				uninterestingObjects, thin);
+		findObjectsToPack(walker);
+		writePackInternal();
+	}
+
+	/**
+	 * Computes SHA-1 of lexicographically sorted objects ids written in this
+	 * pack, as used to name a pack file in repository.
+	 * 
+	 * @return ObjectId representing SHA-1 name of a pack that was created.
+	 */
+	public ObjectId computeName() {
+		final ArrayList<ObjectToPack> sorted = new ArrayList<ObjectToPack>(
+				objectsMap.size());
+		for (List<ObjectToPack> list : objectsLists) {
+			for (ObjectToPack otp : list)
+				sorted.add(otp);
+		}
+
+		final MessageDigest md = Constants.newMessageDigest();
+		Collections.sort(sorted);
+		for (ObjectToPack otp : sorted) {
+			otp.copyRawTo(buf, 0);
+			md.update(buf, 0, Constants.OBJECT_ID_LENGTH);
+		}
+		return ObjectId.fromRaw(md.digest());
+	}
+
+	private void writePackInternal() throws IOException {
+		if (reuseDeltas || reuseObjects)
+			searchForReuse();
+
+		monitor.beginTask(WRITING_OBJECTS_PROGRESS, getObjectsNumber());
+		writeHeader();
+		writeObjects();
+		writeChecksum();
+
+		out.flush();
+		windowCursor.release();
+		monitor.endTask();
+	}
+
+	private void searchForReuse() throws IOException {
+		monitor.beginTask(SEARCHING_REUSE_PROGRESS, getObjectsNumber());
+		final Collection<PackedObjectLoader> reuseLoaders = new LinkedList<PackedObjectLoader>();
+
+		for (List<ObjectToPack> list : objectsLists) {
+			for (ObjectToPack otp : list) {
+				if (monitor.isCancelled())
+					throw new IOException(
+							"Packing cancelled during objects writing");
+				reuseLoaders.clear();
+				db.openObjectInAllPacks(otp, reuseLoaders, windowCursor);
+				if (reuseDeltas) {
+					selectDeltaReuseForObject(otp, reuseLoaders);
+				}
+				// delta reuse is preferred over object reuse
+				if (reuseObjects && !otp.hasReuseLoader()) {
+					selectObjectReuseForObject(otp, reuseLoaders);
+				}
+				monitor.update(1);
+			}
+		}
+
+		monitor.endTask();
+	}
+
+	private void selectDeltaReuseForObject(final ObjectToPack otp,
+			final Collection<PackedObjectLoader> loaders) throws IOException {
+		PackedObjectLoader bestLoader = null;
+		ObjectId bestBase = null;
+
+		for (PackedObjectLoader loader : loaders) {
+			ObjectId idBase = loader.getDeltaBase();
+			if (idBase == null)
+				continue;
+			ObjectToPack otpBase = objectsMap.get(idBase);
+
+			// only if base is in set of objects to write or thin-pack's edge
+			if ((otpBase != null || (thin && edgeObjects.get(idBase) != null))
+			// select smallest possible delta if > 1 available
+					&& isBetterDeltaReuseLoader(bestLoader, loader)) {
+				bestLoader = loader;
+				bestBase = (otpBase != null ? otpBase : idBase);
+			}
+		}
+
+		if (bestLoader != null) {
+			otp.setReuseLoader(bestLoader);
+			otp.setDeltaBase(bestBase);
+		}
+	}
+
+	private boolean isBetterDeltaReuseLoader(PackedObjectLoader currentLoader,
+			PackedObjectLoader loader) throws IOException {
+		if (currentLoader == null)
+			return true;
+		if (loader.getRawSize() < currentLoader.getRawSize())
+			return true;
+		return (loader.getRawSize() == currentLoader.getRawSize()
+				&& loader.supportsFastCopyRawData() && !currentLoader
+				.supportsFastCopyRawData());
+	}
+
+	private void selectObjectReuseForObject(final ObjectToPack otp,
+			final Collection<PackedObjectLoader> loaders) {
+		for (final PackedObjectLoader loader : loaders) {
+			if (loader instanceof WholePackedObjectLoader) {
+				otp.setReuseLoader(loader);
+				return;
+			}
+		}
+	}
+
+	private void writeHeader() throws IOException {
+		out.write(Constants.PACK_SIGNATURE);
+
+		NB.encodeInt32(buf, 0, PACK_VERSION_GENERATED);
+		out.write(buf, 0, 4);
+
+		NB.encodeInt32(buf, 0, getObjectsNumber());
+		out.write(buf, 0, 4);
+	}
+
+	private void writeObjects() throws IOException {
+		for (List<ObjectToPack> list : objectsLists) {
+			for (ObjectToPack otp : list) {
+				if (monitor.isCancelled())
+					throw new IOException(
+							"Packing cancelled during objects writing");
+				if (!otp.isWritten())
+					writeObject(otp);
+			}
+		}
+	}
+
+	private void writeObject(final ObjectToPack otp) throws IOException {
+		otp.markWantWrite();
+		if (otp.isDeltaRepresentation()) {
+			ObjectToPack deltaBase = otp.getDeltaBase();
+			assert deltaBase != null || thin;
+			if (deltaBase != null && !deltaBase.isWritten()) {
+				if (deltaBase.wantWrite()) {
+					otp.clearDeltaBase(); // cycle detected
+					otp.disposeLoader();
+				} else {
+					writeObject(deltaBase);
+				}
+			}
+
+			otp.updateDeltaDepth();
+			if (otp.getDeltaDepth() > maxDeltaDepth) {
+				otp.clearDeltaBase();
+				otp.disposeLoader();
+			}
+		}
+
+		assert !otp.isWritten();
+
+		otp.markWritten(countingOut.getCount());
+		if (otp.isDeltaRepresentation())
+			writeDeltaObject(otp);
+		else
+			writeWholeObject(otp);
+
+		monitor.update(1);
+	}
+
+	private void writeWholeObject(final ObjectToPack otp) throws IOException {
+		if (otp.hasReuseLoader()) {
+			final PackedObjectLoader loader = otp.getReuseLoader();
+			writeObjectHeader(loader.getType(), loader.getSize());
+			loader.copyRawData(out, buf);
+			otp.disposeLoader();
+		} else {
+			final ObjectLoader loader = db.openObject(windowCursor, otp);
+			final DeflaterOutputStream deflaterOut = new DeflaterOutputStream(
+					out, deflater);
+			writeObjectHeader(loader.getType(), loader.getSize());
+			deflaterOut.write(loader.getCachedBytes());
+			deflaterOut.finish();
+			deflater.reset();
+		}
+	}
+
+	private void writeDeltaObject(final ObjectToPack otp) throws IOException {
+		final PackedObjectLoader loader = otp.getReuseLoader();
+		if (deltaBaseAsOffset && otp.getDeltaBase() != null) {
+			writeObjectHeader(Constants.OBJ_OFS_DELTA, loader.getRawSize());
+
+			final ObjectToPack deltaBase = otp.getDeltaBase();
+			long offsetDiff = otp.getOffset() - deltaBase.getOffset();
+			int pos = buf.length - 1;
+			buf[pos] = (byte) (offsetDiff & 0x7F);
+			while ((offsetDiff >>= 7) > 0) {
+				buf[--pos] = (byte) (0x80 | (--offsetDiff & 0x7F));
+			}
+
+			out.write(buf, pos, buf.length - pos);
+		} else {
+			writeObjectHeader(Constants.OBJ_REF_DELTA, loader.getRawSize());
+			otp.getDeltaBaseId().copyRawTo(buf, 0);
+			out.write(buf, 0, Constants.OBJECT_ID_LENGTH);
+		}
+		loader.copyRawData(out, buf);
+		otp.disposeLoader();
+	}
+
+	private void writeObjectHeader(final int objectType, long dataLength)
+			throws IOException {
+		long nextLength = dataLength >>> 4;
+		int size = 0;
+		buf[size++] = (byte) ((nextLength > 0 ? 0x80 : 0x00)
+				| (objectType << 4) | (dataLength & 0x0F));
+		dataLength = nextLength;
+		while (dataLength > 0) {
+			nextLength >>>= 7;
+			buf[size++] = (byte) ((nextLength > 0 ? 0x80 : 0x00) | (dataLength & 0x7F));
+			dataLength = nextLength;
+		}
+		out.write(buf, 0, size);
+	}
+
+	private void writeChecksum() throws IOException {
+		out.on(false);
+		final byte checksum[] = out.getMessageDigest().digest();
+		out.write(checksum);
+	}
+
+	private ObjectWalk setUpWalker(
+			final Collection<ObjectId> interestingObjects,
+			final Collection<ObjectId> uninterestingObjects, boolean thin)
+			throws MissingObjectException, IOException,
+			IncorrectObjectTypeException {
+		final ObjectWalk walker = new ObjectWalk(db);
+		walker.sort(RevSort.TOPO, true);
+		walker.sort(RevSort.COMMIT_TIME_DESC, true);
+		if (thin)
+			walker.sort(RevSort.BOUNDARY);
+
+		for (ObjectId id : interestingObjects) {
+			RevObject o = walker.parseAny(id);
+			walker.markStart(o);
+		}
+		for (ObjectId id : uninterestingObjects) {
+			RevObject o = walker.parseAny(id);
+			walker.markUninteresting(o);
+		}
+		return walker;
+	}
+
+	private void findObjectsToPack(final ObjectWalk walker)
+			throws MissingObjectException, IncorrectObjectTypeException,
+			IOException {
+		monitor.beginTask(COUNTING_OBJECTS_PROGRESS, ProgressMonitor.UNKNOWN);
+		RevObject o;
+
+		while ((o = walker.next()) != null) {
+			addObject(o);
+			monitor.update(1);
+		}
+		while ((o = walker.nextObject()) != null) {
+			addObject(o);
+			monitor.update(1);
+		}
+		monitor.endTask();
+	}
+
+	private void addObject(RevObject object)
+			throws IncorrectObjectTypeException {
+		if (object.has(RevFlag.UNINTERESTING)) {
+			edgeObjects.add(object);
+			thin = true;
+			return;
+		}
+
+		final ObjectToPack otp = new ObjectToPack(object);
+		try {
+			objectsLists[object.getType()].add(otp);
+		} catch (ArrayIndexOutOfBoundsException x) {
+			throw new IncorrectObjectTypeException(object,
+					"COMMIT nor TREE nor BLOB nor TAG");
+		} catch (UnsupportedOperationException x) {
+			// index pointing to "dummy" empty list
+			throw new IncorrectObjectTypeException(object,
+					"COMMIT nor TREE nor BLOB nor TAG");
+		}
+		objectsMap.add(otp);
+	}
+
+	/**
+	 * Class holding information about object that is going to be packed by
+	 * {@link PackWriter}. Information include object representation in a
+	 * pack-file and object status.
+	 * 
+	 */
+	static class ObjectToPack extends ObjectId {
+		private ObjectId deltaBase;
+
+		private PackedObjectLoader reuseLoader;
+
+		private long offset = -1;
+
+		private int deltaDepth;
+
+		private boolean wantWrite;
+
+		/**
+		 * Construct object for specified object id. <br/> By default object is
+		 * marked as not written and non-delta packed (as a whole object).
+		 * 
+		 * @param src
+		 *            object id of object for packing
+		 */
+		ObjectToPack(AnyObjectId src) {
+			super(src);
+		}
+
+		/**
+		 * @return delta base object id if object is going to be packed in delta
+		 *         representation; null otherwise - if going to be packed as a
+		 *         whole object.
+		 */
+		ObjectId getDeltaBaseId() {
+			return deltaBase;
+		}
+
+		/**
+		 * @return delta base object to pack if object is going to be packed in
+		 *         delta representation and delta is specified as object to
+		 *         pack; null otherwise - if going to be packed as a whole
+		 *         object or delta base is specified only as id.
+		 */
+		ObjectToPack getDeltaBase() {
+			if (deltaBase instanceof ObjectToPack)
+				return (ObjectToPack) deltaBase;
+			return null;
+		}
+
+		/**
+		 * Set delta base for the object. Delta base set by this method is used
+		 * by {@link PackWriter} to write object - determines its representation
+		 * in a created pack.
+		 * 
+		 * @param deltaBase
+		 *            delta base object or null if object should be packed as a
+		 *            whole object.
+		 * 
+		 */
+		void setDeltaBase(ObjectId deltaBase) {
+			this.deltaBase = deltaBase;
+		}
+
+		void clearDeltaBase() {
+			this.deltaBase = null;
+		}
+
+		/**
+		 * @return true if object is going to be written as delta; false
+		 *         otherwise.
+		 */
+		boolean isDeltaRepresentation() {
+			return deltaBase != null;
+		}
+
+		/**
+		 * Check if object is already written in a pack. This information is
+		 * used to achieve delta-base precedence in a pack file.
+		 * 
+		 * @return true if object is already written; false otherwise.
+		 */
+		boolean isWritten() {
+			return offset != -1;
+		}
+
+		/**
+		 * @return offset in pack when object has been already written, or -1 if
+		 *         it has not been written yet
+		 */
+		long getOffset() {
+			return offset;
+		}
+
+		/**
+		 * Mark object as written. This information is used to achieve
+		 * delta-base precedence in a pack file.
+		 * 
+		 * @param offset
+		 *            offset where written object starts
+		 */
+		void markWritten(long offset) {
+			this.offset = offset;
+		}
+
+		PackedObjectLoader getReuseLoader() {
+			return reuseLoader;
+		}
+
+		boolean hasReuseLoader() {
+			return reuseLoader != null;
+		}
+
+		void setReuseLoader(PackedObjectLoader reuseLoader) {
+			this.reuseLoader = reuseLoader;
+		}
+
+		void disposeLoader() {
+			this.reuseLoader = null;
+		}
+
+		int getDeltaDepth() {
+			return deltaDepth;
+		}
+
+		void updateDeltaDepth() {
+			if (deltaBase instanceof ObjectToPack)
+				deltaDepth = ((ObjectToPack) deltaBase).deltaDepth + 1;
+			else if (deltaBase != null)
+				deltaDepth = 1;
+		}
+
+		boolean wantWrite() {
+			return wantWrite;
+		}
+
+		void markWantWrite() {
+			this.wantWrite = true;
+		}
+	}
+}
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 20/20] PackWriter test suite
  2008-06-15 21:45                                     ` [EGIT PATCH 19/20] Simplified implementation of pack creation: PackWriter Marek Zawirski
@ 2008-06-15 21:45                                       ` Marek Zawirski
  2008-06-17 21:28                                       ` [EGIT PATCH 21/20] Make isBetterDeltaReuseLoader() static in PackWriter Marek Zawirski
  1 sibling, 0 replies; 29+ messages in thread
From: Marek Zawirski @ 2008-06-15 21:45 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Test suite is provided relying on IndexPack and PackIndex to verify
PackWriter output for various configurations.

Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
 .../tst/org/spearce/jgit/lib/PackWriterTest.java   |  454 ++++++++++++++++++++
 1 files changed, 454 insertions(+), 0 deletions(-)
 create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java

diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java
new file mode 100644
index 0000000..9572342
--- /dev/null
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java
@@ -0,0 +1,454 @@
+/*
+ * Copyright (C) 2008, Marek Zawirski <marek.zawirski@gmail.com>
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.lib;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.spearce.jgit.errors.MissingObjectException;
+import org.spearce.jgit.lib.PackIndex.MutableEntry;
+import org.spearce.jgit.revwalk.RevObject;
+import org.spearce.jgit.revwalk.RevWalk;
+import org.spearce.jgit.transport.IndexPack;
+import org.spearce.jgit.util.CountingOutputStream;
+
+public class PackWriterTest extends RepositoryTestCase {
+
+	private static final List<ObjectId> EMPTY_LIST_OBJECT = Collections
+			.<ObjectId> emptyList();
+
+	private static final List<RevObject> EMPTY_LIST_REVS = Collections
+			.<RevObject> emptyList();
+
+	private PackWriter writer;
+
+	private ByteArrayOutputStream os;
+
+	private CountingOutputStream cos;
+
+	private File packBase;
+
+	private File packFile;
+
+	private File indexFile;
+
+	private PackFile pack;
+
+	public void setUp() throws Exception {
+		super.setUp();
+		os = new ByteArrayOutputStream();
+		cos = new CountingOutputStream(os);
+		packBase = new File(trash, "tmp_pack");
+		packFile = new File(trash, "tmp_pack.pack");
+		indexFile = new File(trash, "tmp_pack.idx");
+		writer = new PackWriter(db, cos, new TextProgressMonitor());
+	}
+
+	/**
+	 * Test constructor for exceptions, default settings, initialization.
+	 */
+	public void testContructor() {
+		assertEquals(false, writer.isDeltaBaseAsOffset());
+		assertEquals(true, writer.isReuseDeltas());
+		assertEquals(true, writer.isReuseObjects());
+		assertEquals(0, writer.getObjectsNumber());
+	}
+
+	/**
+	 * Change default settings and verify them.
+	 */
+	public void testModifySettings() {
+		writer.setDeltaBaseAsOffset(true);
+		writer.setReuseDeltas(false);
+		writer.setReuseObjects(false);
+
+		assertEquals(true, writer.isDeltaBaseAsOffset());
+		assertEquals(false, writer.isReuseDeltas());
+		assertEquals(false, writer.isReuseObjects());
+	}
+
+	/**
+	 * Write empty pack by providing empty sets of interesting/uninteresting
+	 * objects and check for correct format.
+	 * 
+	 * @throws IOException
+	 */
+	public void testWriteEmptyPack1() throws IOException {
+		createVerifyOpenPack(EMPTY_LIST_OBJECT, EMPTY_LIST_OBJECT, false);
+
+		assertEquals(0, writer.getObjectsNumber());
+		assertEquals(0, pack.getObjectCount());
+		assertEquals("da39a3ee5e6b4b0d3255bfef95601890afd80709", writer
+				.computeName().toString());
+	}
+
+	/**
+	 * Write empty pack by providing empty iterator of objects to write and
+	 * check for correct format.
+	 * 
+	 * @throws IOException
+	 */
+	public void testWriteEmptyPack2() throws IOException {
+		createVerifyOpenPack(EMPTY_LIST_REVS.iterator());
+
+		assertEquals(0, writer.getObjectsNumber());
+		assertEquals(0, pack.getObjectCount());
+	}
+
+	/**
+	 * Create pack basing on only interesting objects, then precisely verify
+	 * content. No delta reuse here.
+	 * 
+	 * @throws IOException
+	 */
+	public void testWritePack1() throws IOException {
+		writer.setReuseDeltas(false);
+		writeVerifyPack1();
+	}
+
+	/**
+	 * Test writing pack without object reuse. Pack content/preparation as in
+	 * {@link #testWritePack1()}.
+	 * 
+	 * @throws IOException
+	 */
+	public void testWritePack1NoObjectReuse() throws IOException {
+		writer.setReuseDeltas(false);
+		writer.setReuseObjects(false);
+		writeVerifyPack1();
+	}
+
+	/**
+	 * Create pack basing on both interesting and uninteresting objects, then
+	 * precisely verify content. No delta reuse here.
+	 * 
+	 * @throws IOException
+	 */
+	public void testWritePack2() throws IOException {
+		writeVerifyPack2(false);
+	}
+
+	/**
+	 * Test pack writing with deltas reuse, delta-base first rule. Pack
+	 * content/preparation as in {@link #testWritePack2()}.
+	 * 
+	 * @throws IOException
+	 */
+	public void testWritePack2DeltasReuseRefs() throws IOException {
+		writeVerifyPack2(true);
+	}
+
+	/**
+	 * Test pack writing with delta reuse. Delta bases referred as offsets. Pack
+	 * configuration as in {@link #testWritePack2DeltasReuseRefs()}.
+	 * 
+	 * @throws IOException
+	 */
+	public void testWritePack2DeltasReuseOffsets() throws IOException {
+		writer.setDeltaBaseAsOffset(true);
+		writeVerifyPack2(true);
+	}
+
+	/**
+	 * Test pack writing with delta reuse. Raw-data copy (reuse) is made on a
+	 * pack with CRC32 index. Pack configuration as in
+	 * {@link #testWritePack2DeltasReuseRefs()}.
+	 * 
+	 * @throws IOException
+	 */
+	public void testWritePack2DeltasCRC32Copy() throws IOException {
+		final File packDir = new File(db.getObjectsDirectory(), "pack");
+		final File crc32Pack = new File(packDir,
+				"pack-34be9032ac282b11fa9babdc2b2a93ca996c9c2f.pack");
+		final File crc32Idx = new File(packDir,
+				"pack-34be9032ac282b11fa9babdc2b2a93ca996c9c2f.idx");
+		copyFile(new File(new File("tst"),
+				"pack-34be9032ac282b11fa9babdc2b2a93ca996c9c2f.idxV2"),
+				crc32Idx);
+		db.openPack(crc32Pack, crc32Idx);
+
+		writeVerifyPack2(true);
+	}
+
+	/**
+	 * Create pack basing on fixed objects list, then precisely verify content.
+	 * No delta reuse here.
+	 * 
+	 * @throws IOException
+	 * @throws MissingObjectException
+	 * 
+	 */
+	public void testWritePack3() throws MissingObjectException, IOException {
+		writer.setReuseDeltas(false);
+		final ObjectId forcedOrder[] = new ObjectId[] {
+				ObjectId.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"),
+				ObjectId.fromString("c59759f143fb1fe21c197981df75a7ee00290799"),
+				ObjectId.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035"),
+				ObjectId.fromString("902d5476fa249b7abc9d84c611577a81381f0327"),
+				ObjectId.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259"),
+				ObjectId.fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3") };
+		final RevWalk parser = new RevWalk(db);
+		final RevObject forcedOrderRevs[] = new RevObject[forcedOrder.length];
+		for (int i = 0; i < forcedOrder.length; i++)
+			forcedOrderRevs[i] = parser.parseAny(forcedOrder[i]);
+
+		createVerifyOpenPack(Arrays.asList(forcedOrderRevs).iterator());
+
+		assertEquals(forcedOrder.length, writer.getObjectsNumber());
+		verifyObjectsOrder(forcedOrder);
+		assertEquals("ed3f96b8327c7c66b0f8f70056129f0769323d86", writer
+				.computeName().toString());
+	}
+
+	/**
+	 * Another pack creation: basing on both interesting and uninteresting
+	 * objects. No delta reuse possible here, as this is a specific case when we
+	 * write only 1 commit, associated with 1 tree, 1 blob.
+	 * 
+	 * @throws IOException
+	 */
+	public void testWritePack4() throws IOException {
+		writeVerifyPack4(false);
+	}
+
+	/**
+	 * Test thin pack writing: 1 blob delta base is on objects edge. Pack
+	 * configuration as in {@link #testWritePack4()}.
+	 * 
+	 * @throws IOException
+	 */
+	public void testWritePack4ThinPack() throws IOException {
+		writeVerifyPack4(true);
+	}
+
+	/**
+	 * Compare sizes of packs created using {@link #testWritePack2()} and
+	 * {@link #testWritePack2DeltasReuseRefs()}. The pack using deltas should
+	 * be smaller.
+	 * 
+	 * @throws Exception
+	 */
+	public void testWritePack2SizeDeltasVsNoDeltas() throws Exception {
+		testWritePack2();
+		final int sizePack2NoDeltas = cos.getCount();
+		setUp();
+		testWritePack2DeltasReuseRefs();
+		final int sizePack2DeltasRefs = cos.getCount();
+
+		assertTrue(sizePack2NoDeltas > sizePack2DeltasRefs);
+	}
+
+	/**
+	 * Compare sizes of packs created using
+	 * {@link #testWritePack2DeltasReuseRefs()} and
+	 * {@link #testWritePack2DeltasReuseOffsets()}. The pack with delta bases
+	 * written as offsets should be smaller.
+	 * 
+	 * @throws Exception
+	 */
+	public void testWritePack2SizeOffsetsVsRefs() throws Exception {
+		testWritePack2DeltasReuseRefs();
+		final int sizePack2DeltasRefs = cos.getCount();
+		setUp();
+		testWritePack2DeltasReuseOffsets();
+		final int sizePack2DeltasOffsets = cos.getCount();
+
+		assertTrue(sizePack2DeltasRefs > sizePack2DeltasOffsets);
+	}
+
+	/**
+	 * Compare sizes of packs created using {@link #testWritePack4()} and
+	 * {@link #testWritePack4ThinPack()}. Obviously, the thin pack should be
+	 * smaller.
+	 * 
+	 * @throws Exception
+	 */
+	public void testWritePack4SizeThinVsNoThin() throws Exception {
+		testWritePack4();
+		final int sizePack4 = cos.getCount();
+		setUp();
+		testWritePack4ThinPack();
+		final int sizePack4Thin = cos.getCount();
+
+		assertTrue(sizePack4 > sizePack4Thin);
+	}
+
+	// TODO: testWritePackDeltasCycle()
+	// TODO: testWritePackDeltasDepth()
+
+	private void writeVerifyPack1() throws IOException {
+		final LinkedList<ObjectId> interestings = new LinkedList<ObjectId>();
+		interestings.add(ObjectId
+				.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"));
+		createVerifyOpenPack(interestings, EMPTY_LIST_OBJECT, false);
+
+		final ObjectId expectedOrder[] = new ObjectId[] {
+				ObjectId.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"),
+				ObjectId.fromString("c59759f143fb1fe21c197981df75a7ee00290799"),
+				ObjectId.fromString("540a36d136cf413e4b064c2b0e0a4db60f77feab"),
+				ObjectId.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035"),
+				ObjectId.fromString("902d5476fa249b7abc9d84c611577a81381f0327"),
+				ObjectId.fromString("4b825dc642cb6eb9a060e54bf8d69288fbee4904"),
+				ObjectId.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259"),
+				ObjectId.fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3") };
+
+		assertEquals(expectedOrder.length, writer.getObjectsNumber());
+		verifyObjectsOrder(expectedOrder);
+		assertEquals("34be9032ac282b11fa9babdc2b2a93ca996c9c2f", writer
+				.computeName().toString());
+	}
+
+	private void writeVerifyPack2(boolean deltaReuse) throws IOException {
+		writer.setReuseDeltas(deltaReuse);
+		final LinkedList<ObjectId> interestings = new LinkedList<ObjectId>();
+		interestings.add(ObjectId
+				.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"));
+		final LinkedList<ObjectId> uninterestings = new LinkedList<ObjectId>();
+		uninterestings.add(ObjectId
+				.fromString("540a36d136cf413e4b064c2b0e0a4db60f77feab"));
+		createVerifyOpenPack(interestings, uninterestings, false);
+
+		final ObjectId expectedOrder[] = new ObjectId[] {
+				ObjectId.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"),
+				ObjectId.fromString("c59759f143fb1fe21c197981df75a7ee00290799"),
+				ObjectId.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035"),
+				ObjectId.fromString("902d5476fa249b7abc9d84c611577a81381f0327"),
+				ObjectId.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259"),
+				ObjectId.fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3") };
+		if (deltaReuse) {
+			// objects order influenced (swapped) by delta-base first rule
+			ObjectId temp = expectedOrder[4];
+			expectedOrder[4] = expectedOrder[5];
+			expectedOrder[5] = temp;
+		}
+		assertEquals(expectedOrder.length, writer.getObjectsNumber());
+		verifyObjectsOrder(expectedOrder);
+		assertEquals("ed3f96b8327c7c66b0f8f70056129f0769323d86", writer
+				.computeName().toString());
+	}
+
+	private void writeVerifyPack4(final boolean thin) throws IOException {
+		final LinkedList<ObjectId> interestings = new LinkedList<ObjectId>();
+		interestings.add(ObjectId
+				.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"));
+		final LinkedList<ObjectId> uninterestings = new LinkedList<ObjectId>();
+		uninterestings.add(ObjectId
+				.fromString("c59759f143fb1fe21c197981df75a7ee00290799"));
+		createVerifyOpenPack(interestings, uninterestings, thin);
+
+		final ObjectId writtenObjects[] = new ObjectId[] {
+				ObjectId.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"),
+				ObjectId.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035"),
+				ObjectId.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259") };
+		assertEquals(writtenObjects.length, writer.getObjectsNumber());
+		ObjectId expectedObjects[];
+		if (thin) {
+			expectedObjects = new ObjectId[4];
+			System.arraycopy(writtenObjects, 0, expectedObjects, 0,
+					writtenObjects.length);
+			expectedObjects[3] = ObjectId
+					.fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3");
+
+		} else {
+			expectedObjects = writtenObjects;
+		}
+		verifyObjectsOrder(expectedObjects);
+		assertEquals("cded4b74176b4456afa456768b2b5aafb41c44fc", writer
+				.computeName().toString());
+	}
+
+	private void createVerifyOpenPack(final Collection<ObjectId> interestings,
+			final Collection<ObjectId> uninterestings, final boolean thin)
+			throws MissingObjectException, IOException {
+		writer.writePack(interestings, uninterestings, thin);
+		verifyOpenPack(thin);
+	}
+
+	private void createVerifyOpenPack(final Iterator<RevObject> objectSource)
+			throws MissingObjectException, IOException {
+		writer.writePack(objectSource);
+		verifyOpenPack(false);
+	}
+
+	private void verifyOpenPack(final boolean thin) throws IOException {
+		if (thin) {
+			final InputStream is = new ByteArrayInputStream(os.toByteArray());
+			final IndexPack indexer = new IndexPack(db, is, packBase);
+			try {
+				indexer.index(new TextProgressMonitor());
+				fail("indexer should grumble about missing object");
+			} catch (IOException x) {
+				// expected
+			}
+		}
+		final InputStream is = new ByteArrayInputStream(os.toByteArray());
+		final IndexPack indexer = new IndexPack(db, is, packBase);
+		indexer.setFixThin(thin);
+		indexer.index(new TextProgressMonitor());
+		pack = new PackFile(db, indexFile, packFile);
+	}
+
+	private void verifyObjectsOrder(final ObjectId objectsOrder[]) {
+		final List<PackIndex.MutableEntry> entries = new ArrayList<PackIndex.MutableEntry>();
+
+		for (MutableEntry me : pack) {
+			entries.add(me.cloneEntry());
+		}
+		Collections.sort(entries, new Comparator<PackIndex.MutableEntry>() {
+			public int compare(MutableEntry o1, MutableEntry o2) {
+				return Long.signum(o1.getOffset() - o2.getOffset());
+			}
+		});
+
+		int i = 0;
+		for (MutableEntry me : entries) {
+			assertEquals(objectsOrder[i++], me.copy());
+		}
+	}
+}
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex
  2008-06-15 21:45         ` [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex Marek Zawirski
  2008-06-15 21:45           ` [EGIT PATCH 06/20] Tests for PackReverseIndex Marek Zawirski
@ 2008-06-16  4:06           ` Shawn O. Pearce
  2008-06-16 16:27             ` Marek Zawirski
  1 sibling, 1 reply; 29+ messages in thread
From: Shawn O. Pearce @ 2008-06-16  4:06 UTC (permalink / raw)
  To: Marek Zawirski; +Cc: robin.rosenberg, git

Marek Zawirski <marek.zawirski@gmail.com> wrote:
> Let us quickly find ObjectId or next object for specified offset in a
> pack, in O(log n) time.
...
> diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java
...
> +	/**
> +	 * Object ids corresponding to {@link #offsets32} and {@link #offsets64}.
> +	 */
> +	private final int names[];

This could be smaller if it was an array of indexes into the index,
rather than the ObjectId itself.  Thus we need only 1 int per object
and not 5 ints per object.

But I see why you are doing it; PackIndex.MutableEntry exposes the
ObjectId and not the nth position of the object within the index.

> +	PackReverseIndex(final PackIndex index) {
> +		final long count = index.getObjectCount();
> +		if (count > Integer.MAX_VALUE)
> +			throw new IllegalArgumentException(
> +					"Huge indexes (> 2^31 entries) are not supported by jgit, yet");

For what its worth, this limit is actually:

	Integer.MAX_VALUE / Constants.OBJECT_ID_LENGTH / 4

as you store the entire ObjectId data for the index in a single int[]
array called names.  So you'll get an ArrayIndexOutOfBoundsException
or maybe OutOfMemoryError when you try to build names later on, and
never really hit this case here.

> +	ObjectId findObject(final long offset) {
> +		if (offset <= Integer.MAX_VALUE) {
> +			final int i32 = Arrays.binarySearch(offsets32, (int) offset);
> +			if (i32 < 0)
> +				return null;
> +			final int iNames = i32 * Constants.OBJECT_ID_LENGTH / 4;

This probably should be:

	final int iNames = i32 * (Constants.OBJECT_ID_LENGTH / 4);

as then we don't overflow the precision of int when i32 is large.

> +			return ObjectId.fromRaw(names, iNames);
> +		} else {
> +			final int i64 = Arrays.binarySearch(offsets64, offset);
> +			if (i64 < 0)
> +				return null;
> +			final int iNames = (i64 + offsets32.length)
> +					* Constants.OBJECT_ID_LENGTH / 4;

Again, watch out for overflow.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [EGIT PATCH 00/20] PackWriter, first usable attempt
  2008-06-15 21:45 [EGIT PATCH 00/20] PackWriter, first usable attempt Marek Zawirski
  2008-06-15 21:45 ` [EGIT PATCH 01/20] Fix typo in PackIndexV2 Marek Zawirski
@ 2008-06-16  5:19 ` Shawn O. Pearce
  2008-06-16 16:37   ` Marek Zawirski
  1 sibling, 1 reply; 29+ messages in thread
From: Shawn O. Pearce @ 2008-06-16  5:19 UTC (permalink / raw)
  To: Marek Zawirski; +Cc: robin.rosenberg, git

Marek Zawirski <marek.zawirski@gmail.com> wrote:
> At first, some stuff was still missing to produce packs, mostly
> raw-data access related and ObjectWalk related.

I'm glad it turned out to be so little missing actually.  Reusing
ObjectWalk saved a lot of code in the pack writer, and for the most
part our existing data access structures were already well organized.

It is too early to say how the performance is going to work, but
object packing with delta reuse can be difficult and I'm happy to
see that our abstractions more-or-less supported it.  Tuning can
come later, once we better understand the code, and have something
for end-users to complain (or praise) about.
 
> Finally, we've got some support for pack writing! It's not that
> power that C git version offers, but something usable. Delta
> generation is not supported. Although we can reuse deltas and objects,
> and support all other (I hope) options of git-pack-objects directly or
> indirectly, most importantly --thin.
> 
> Pack writing and some other features are tested, seem to work.
> 
> This implementation of packing is not a very valuable thing directly
> (achieving efficient storage), however it's a base for enhancements
> and can be used for sending packs over net (with some assumptions).
> It's more a "repacking" than "packing" tool.

Yup.  The critical part here is jgit can now format a pack file,
which means we can now actually implement native push over the
local pipe (to fork+exec'd git-receive-pack) or over SSH.  That
is one of the major missing features in the Eclipse plugin, so
this is a huge milestone for us.  Thank you Marek.

> So... I'm switching now to push implementation. If time allows,
> delta-algorithms will be added later.

Yay.

Native push protocol support at this point is much more important
than delta generation.  Although delta generation is one of the
key features that makes git so damn efficient it is pointless if we
cannot actually communicate with a remote repository to send them
our changes.  Early adopters of the push support coming from this
plugin can at least use it on local area networks, where bandwidth
is not (usually) a limiting factor.

>  28 files changed, 2258 insertions(+), 73 deletions(-)

Nice to see it didn't take that much code either.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex
  2008-06-16  4:06           ` [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex Shawn O. Pearce
@ 2008-06-16 16:27             ` Marek Zawirski
  2008-06-17  2:02               ` Shawn O. Pearce
  0 siblings, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-16 16:27 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: robin.rosenberg, git

Shawn O. Pearce wrote:
> Marek Zawirski <marek.zawirski@gmail.com> wrote:
>> Let us quickly find ObjectId or next object for specified offset in a
>> pack, in O(log n) time.
> ...
>> diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java
> ...
>> +	/**
>> +	 * Object ids corresponding to {@link #offsets32} and {@link #offsets64}.
>> +	 */
>> +	private final int names[];
> 
> This could be smaller if it was an array of indexes into the index,
> rather than the ObjectId itself.  Thus we need only 1 int per object
> and not 5 ints per object.
> 
> But I see why you are doing it; PackIndex.MutableEntry exposes the
> ObjectId and not the nth position of the object within the index.

Hmm, that's smart.
I can change array of names to second level indices, but I think that in 
such a case PackReverseIndex should be an inner class of PackIndex and 
some refactor/additional assumptions at PackIndex may be needed. What do 
you think?

>> +	PackReverseIndex(final PackIndex index) {
>> +		final long count = index.getObjectCount();
>> +		if (count > Integer.MAX_VALUE)
>> +			throw new IllegalArgumentException(
>> +					"Huge indexes (> 2^31 entries) are not supported by jgit, yet");
> 
> For what its worth, this limit is actually:
> 
> 	Integer.MAX_VALUE / Constants.OBJECT_ID_LENGTH / 4
> 
> as you store the entire ObjectId data for the index in a single int[]
> array called names.  So you'll get an ArrayIndexOutOfBoundsException
> or maybe OutOfMemoryError when you try to build names later on, and
> never really hit this case here.
>> +	ObjectId findObject(final long offset) {
>> +		if (offset <= Integer.MAX_VALUE) {
>> +			final int i32 = Arrays.binarySearch(offsets32, (int) offset);
>> +			if (i32 < 0)
>> +				return null;
>> +			final int iNames = i32 * Constants.OBJECT_ID_LENGTH / 4;
> 
> This probably should be:
> 
> 	final int iNames = i32 * (Constants.OBJECT_ID_LENGTH / 4);
> 
> as then we don't overflow the precision of int when i32 is large.
> 
>> +			return ObjectId.fromRaw(names, iNames);
>> +		} else {
>> +			final int i64 = Arrays.binarySearch(offsets64, offset);
>> +			if (i64 < 0)
>> +				return null;
>> +			final int iNames = (i64 + offsets32.length)
>> +					* Constants.OBJECT_ID_LENGTH / 4;
> 
> Again, watch out for overflow.

Right, my faults.

-- 
Marek Zawirski [zawir]
marek.zawirski@gmail.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [EGIT PATCH 00/20] PackWriter, first usable attempt
  2008-06-16  5:19 ` [EGIT PATCH 00/20] PackWriter, first usable attempt Shawn O. Pearce
@ 2008-06-16 16:37   ` Marek Zawirski
  0 siblings, 0 replies; 29+ messages in thread
From: Marek Zawirski @ 2008-06-16 16:37 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: robin.rosenberg, git

Shawn O. Pearce wrote:
> Marek Zawirski <marek.zawirski@gmail.com> wrote:
>> At first, some stuff was still missing to produce packs, mostly
>> raw-data access related and ObjectWalk related.
> 
> I'm glad it turned out to be so little missing actually.  Reusing
> ObjectWalk saved a lot of code in the pack writer, and for the most
> part our existing data access structures were already well organized.

Yeah, I actually expected that this feature implementation would cause 
more changes. But existence of transport and rev-walking frameworks in 
jgit helped a lot. Jgit code changed significantly between my first look 
at it (march/april) and GSoC start date.

(...)
>> Finally, we've got some support for pack writing! It's not that
>> power that C git version offers, but something usable. Delta
>> generation is not supported. Although we can reuse deltas and objects,
>> and support all other (I hope) options of git-pack-objects directly or
>> indirectly, most importantly --thin.
>>
>> Pack writing and some other features are tested, seem to work.
>>
>> This implementation of packing is not a very valuable thing directly
>> (achieving efficient storage), however it's a base for enhancements
>> and can be used for sending packs over net (with some assumptions).
>> It's more a "repacking" than "packing" tool.
> 
> Yup.  The critical part here is jgit can now format a pack file,
> which means we can now actually implement native push over the
> local pipe (to fork+exec'd git-receive-pack) or over SSH.  That
> is one of the major missing features in the Eclipse plugin, so
> this is a huge milestone for us.  Thank you Marek.

Hey, I'm here for doing this, they even pay me for that fun;)

-- 
Marek Zawirski [zawir]
marek.zawirski@gmail.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex
  2008-06-16 16:27             ` Marek Zawirski
@ 2008-06-17  2:02               ` Shawn O. Pearce
  0 siblings, 0 replies; 29+ messages in thread
From: Shawn O. Pearce @ 2008-06-17  2:02 UTC (permalink / raw)
  To: Marek Zawirski; +Cc: robin.rosenberg, git

Marek Zawirski <marek.zawirski@gmail.com> wrote:
> Shawn O. Pearce wrote:
> >Marek Zawirski <marek.zawirski@gmail.com> wrote:
> >>Let us quickly find ObjectId or next object for specified offset in a
> >>pack, in O(log n) time.
...
> >This could be smaller if it was an array of indexes into the index,
> >rather than the ObjectId itself.  Thus we need only 1 int per object
> >and not 5 ints per object.
> 
> Hmm, that's smart.
> I can change array of names to second level indices, but I think that in 
> such a case PackReverseIndex should be an inner class of PackIndex and 
> some refactor/additional assumptions at PackIndex may be needed. What do 
> you think?

So I coded this up today. It can either squash to this patch I
am replying to, or maybe follow-on in the series.

The advantage of this change is we use the minimum amount of memory
possible to build the reverse index.  The disadvantage is the
construction of the reverse index now runs in O(N + 2 * (N log N)),
where the prior version was O(N + N log N), ignoring all GC costs.

We could be doing worse here, but I suspect the additional running
time is better than the memory usage from cloning every MutableEntry
during traversal.  GC costs are not free.

With this patch ObjectId lookup is still constant time, though
we do perform a binary search on a 256 element array.  log 256 is
still a constant, even though it is not 1. :-)

We support up to 2^32 objects per index, which is the limit of the
file format, however we only support the first 2 billion objects
being in the first 2 GB of the pack file.  Given that each object
needs _at least_ two bytes of data the first 2 billion objects will
easily require the first 4 GB of the pack file, pushing us into the
offsets64 table.  Which is then itself limited to 2 billion objects.
So our real limit here is we cannot have more than 2 billion objects
requiring 64 bit offsets.  But PackIndexV2 only supports (2**31-1)/8
such 64 bit offsets so we'll blow that out long before we blow the
PackReverseIndex limits.

Yes, PackIndexV2 is currently limited by its implementation and
not by what the file format would permit.


--8<--
Improved PackReverseIndex

---
 .../src/org/spearce/jgit/lib/PackIndex.java        |   57 ++++++++++++++
 .../src/org/spearce/jgit/lib/PackIndexV1.java      |   38 +++++++++-
 .../src/org/spearce/jgit/lib/PackIndexV2.java      |   33 ++++++++-
 .../src/org/spearce/jgit/lib/PackReverseIndex.java |   82 +++++++++++---------
 4 files changed, 170 insertions(+), 40 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndex.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndex.java
index 3935d4f..6debf3b 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndex.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndex.java
@@ -140,6 +140,63 @@ public abstract class PackIndex implements Iterable<PackIndex.MutableEntry> {
 	abstract long getObjectCount();
 
 	/**
+	 * Obtain the total number of objects needing 64 bit offsets.
+	 * 
+	 * @return number of objects in this index using a 64 bit offset; that is an
+	 *         object positioned after the 2 GB position within the file.
+	 */
+	abstract long getOffset64Count();
+
+	/**
+	 * Get ObjectId for the n-th object entry returned by {@link #iterator()}.
+	 * <p>
+	 * This method is a constant-time replacement for the following loop:
+	 * 
+	 * <pre>
+	 * Iterator&lt;MutableEntry&gt; eItr = index.iterator();
+	 * int curPosition = 0;
+	 * while (eItr.hasNext() &amp;&amp; curPosition++ &lt; nthPosition)
+	 * 	eItr.next();
+	 * ObjectId result = eItr.next().toObjectId();
+	 * </pre>
+	 * 
+	 * @param nthPosition
+	 *            position within the traversal of {@link #iterator()} that the
+	 *            caller needs the object for. The first returned
+	 *            {@link MutableEntry} is 0, the second is 1, etc.
+	 * @return the ObjectId for the corresponding entry.
+	 */
+	abstract ObjectId getObjectId(long nthPosition);
+
+	/**
+	 * Get ObjectId for the n-th object entry returned by {@link #iterator()}.
+	 * <p>
+	 * This method is a constant-time replacement for the following loop:
+	 * 
+	 * <pre>
+	 * Iterator&lt;MutableEntry&gt; eItr = index.iterator();
+	 * int curPosition = 0;
+	 * while (eItr.hasNext() &amp;&amp; curPosition++ &lt; nthPosition)
+	 * 	eItr.next();
+	 * ObjectId result = eItr.next().toObjectId();
+	 * </pre>
+	 * 
+	 * @param nthPosition
+	 *            unsigned 32 bit position within the traversal of
+	 *            {@link #iterator()} that the caller needs the object for. The
+	 *            first returned {@link MutableEntry} is 0, the second is 1,
+	 *            etc. Positions past 2**31-1 are negative, but still valid.
+	 * @return the ObjectId for the corresponding entry.
+	 */
+	final ObjectId getObjectId(final int nthPosition) {
+		if (nthPosition >= 0)
+			return getObjectId((long) nthPosition);
+		final int u31 = nthPosition >>> 1;
+		final int one = nthPosition & 1;
+		return getObjectId(((long) u31) << 1 | one);
+	}
+
+	/**
 	 * Locate the file offset position for the requested object.
 	 * 
 	 * @param objId
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV1.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV1.java
index b8d9de3..b58dfdf 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV1.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV1.java
@@ -40,6 +40,7 @@ package org.spearce.jgit.lib;
 
 import java.io.IOException;
 import java.io.InputStream;
+import java.util.Arrays;
 import java.util.Iterator;
 import java.util.NoSuchElementException;
 
@@ -49,6 +50,8 @@ import org.spearce.jgit.util.NB;
 class PackIndexV1 extends PackIndex {
 	private static final int IDX_HDR_LEN = 256 * 4;
 
+	private final long[] idxHeader;
+
 	private byte[][] idxdata;
 
 	private long objectCnt;
@@ -59,7 +62,7 @@ class PackIndexV1 extends PackIndex {
 		System.arraycopy(hdr, 0, fanoutTable, 0, hdr.length);
 		NB.readFully(fd, fanoutTable, hdr.length, IDX_HDR_LEN - hdr.length);
 
-		final long[] idxHeader = new long[256]; // really unsigned 32-bit...
+		idxHeader = new long[256]; // really unsigned 32-bit...
 		for (int k = 0; k < idxHeader.length; k++)
 			idxHeader[k] = NB.decodeUInt32(fanoutTable, k * 4);
 		idxdata = new byte[idxHeader.length][];
@@ -82,6 +85,39 @@ class PackIndexV1 extends PackIndex {
 		return objectCnt;
 	}
 
+	@Override
+	long getOffset64Count() {
+		long n64 = 0;
+		for (final MutableEntry e : this) {
+			if (e.getOffset() >= Integer.MAX_VALUE)
+				n64++;
+		}
+		return n64;
+	}
+
+	@Override
+	ObjectId getObjectId(final long nthPosition) {
+		int levelOne = Arrays.binarySearch(idxHeader, nthPosition + 1);
+		long base;
+		if (levelOne >= 0) {
+			// If we hit the bucket exactly the item is in the bucket, or
+			// any bucket before it which has the same object count.
+			//
+			base = idxHeader[levelOne];
+			while (levelOne > 0 && base == idxHeader[levelOne - 1])
+				levelOne--;
+		} else {
+			// The item is in the bucket we would insert it into.
+			//
+			levelOne = -(levelOne + 1);
+		}
+
+		base = levelOne > 0 ? idxHeader[levelOne - 1] : 0;
+		final int p = (int) (nthPosition - base);
+		final int dataIdx = ((4 + Constants.OBJECT_ID_LENGTH) * p) + 4;
+		return ObjectId.fromRaw(idxdata[levelOne], dataIdx);
+	}
+
 	long findOffset(final AnyObjectId objId) {
 		final int levelOne = objId.getFirstByte();
 		byte[] data = idxdata[levelOne];
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
index ae70f11..8b2c6d6 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
@@ -40,6 +40,7 @@ package org.spearce.jgit.lib;
 import java.io.EOFException;
 import java.io.IOException;
 import java.io.InputStream;
+import java.util.Arrays;
 import java.util.Iterator;
 import java.util.NoSuchElementException;
 
@@ -57,6 +58,8 @@ class PackIndexV2 extends PackIndex {
 
 	private long objectCnt;
 
+	private final long[] fanoutTable;
+
 	/** 256 arrays of contiguous object names. */
 	private int[][] names;
 
@@ -69,7 +72,7 @@ class PackIndexV2 extends PackIndex {
 	PackIndexV2(final InputStream fd) throws IOException {
 		final byte[] fanoutRaw = new byte[4 * FANOUT];
 		NB.readFully(fd, fanoutRaw, 0, fanoutRaw.length);
-		final long[] fanoutTable = new long[FANOUT];
+		fanoutTable = new long[FANOUT];
 		for (int k = 0; k < FANOUT; k++)
 			fanoutTable[k] = NB.decodeUInt32(fanoutRaw, k * 4);
 		objectCnt = fanoutTable[FANOUT - 1];
@@ -151,6 +154,34 @@ class PackIndexV2 extends PackIndex {
 	}
 
 	@Override
+	long getOffset64Count() {
+		return offset64.length / 8;
+	}
+
+	@Override
+	ObjectId getObjectId(final long nthPosition) {
+		int levelOne = Arrays.binarySearch(fanoutTable, nthPosition + 1);
+		long base;
+		if (levelOne >= 0) {
+			// If we hit the bucket exactly the item is in the bucket, or
+			// any bucket before it which has the same object count.
+			//
+			base = fanoutTable[levelOne];
+			while (levelOne > 0 && base == fanoutTable[levelOne - 1])
+				levelOne--;
+		} else {
+			// The item is in the bucket we would insert it into.
+			//
+			levelOne = -(levelOne + 1);
+		}
+
+		base = levelOne > 0 ? fanoutTable[levelOne - 1] : 0;
+		final int p = (int) (nthPosition - base);
+		final int p4 = p << 2;
+		return ObjectId.fromRaw(names[levelOne], p4 + p); // p * 5
+	}
+
+	@Override
 	long findOffset(final AnyObjectId objId) {
 		final int levelOne = objId.getFirstByte();
 		final int[] data = names[levelOne];
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java
index 3dede88..bac7477 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java
@@ -38,7 +38,6 @@
 package org.spearce.jgit.lib;
 
 import java.util.Arrays;
-import java.util.Comparator;
 
 import org.spearce.jgit.errors.CorruptObjectException;
 import org.spearce.jgit.lib.PackIndex.MutableEntry;
@@ -54,6 +53,9 @@ import org.spearce.jgit.lib.PackIndex.MutableEntry;
  * @see PackFile
  */
 class PackReverseIndex {
+	/** Index we were created from, and that has our ObjectId data. */
+	private final PackIndex index;
+
 	/**
 	 * (offset31, truly) Offsets accommodating in 31 bits.
 	 */
@@ -64,48 +66,55 @@ class PackReverseIndex {
 	 */
 	private final long offsets64[];
 
-	/**
-	 * Object ids corresponding to {@link #offsets32} and {@link #offsets64}.
-	 */
-	private final int names[];
+	/** Position of the corresponding {@link #offsets32} in {@link #index}. */
+	private final int nth32[];
+
+	/** Position of the corresponding {@link #offsets64} in {@link #index}. */
+	private final int nth64[];
 
 	/**
 	 * Create reverse index from straight/forward pack index, by indexing all
 	 * its entries.
 	 * 
-	 * @param index
+	 * @param packIndex
 	 *            forward index - entries to (reverse) index.
 	 */
-	PackReverseIndex(final PackIndex index) {
-		final long count = index.getObjectCount();
-		if (count > Integer.MAX_VALUE)
+	PackReverseIndex(final PackIndex packIndex) {
+		index = packIndex;
+
+		final long cnt = index.getObjectCount();
+		final long n64 = index.getOffset64Count();
+		final long n32 = cnt - n64;
+		if (n32 > Integer.MAX_VALUE || n64 > Integer.MAX_VALUE
+				|| cnt > 0xffffffffL)
 			throw new IllegalArgumentException(
-					"Huge indexes (> 2^31 entries) are not supported by jgit, yet");
-
-		final MutableEntry entries[] = new MutableEntry[(int) count];
-		int i = 0;
-		int count32 = 0;
-		for (MutableEntry me : index) {
-			entries[i++] = me.cloneEntry();
-			if (me.getOffset() <= Integer.MAX_VALUE)
-				count32++;
+					"Huge indexes are not supported by jgit, yet");
+
+		offsets32 = new int[(int) n32];
+		offsets64 = new long[(int) n64];
+		nth32 = new int[offsets32.length];
+		nth64 = new int[offsets64.length];
+
+		int i32 = 0;
+		int i64 = 0;
+		for (final MutableEntry me : index) {
+			final long o = me.getOffset();
+			if (o < Integer.MAX_VALUE)
+				offsets32[i32++] = (int) o;
+			else
+				offsets64[i64++] = o;
 		}
-		Arrays.sort(entries, new Comparator<MutableEntry>() {
-			public int compare(MutableEntry o1, MutableEntry o2) {
-				return Long.signum(o1.getOffset() - o2.getOffset());
-			}
-		});
-
-		names = new int[entries.length * Constants.OBJECT_ID_LENGTH / 4];
-		offsets32 = new int[count32];
-		offsets64 = new long[entries.length - count32];
-		for (int j = 0, j32 = 0; j < entries.length; j++) {
-			final long offset = entries[j].getOffset();
-			if (offset <= Integer.MAX_VALUE)
-				offsets32[j32++] = (int) offset;
+
+		Arrays.sort(offsets32);
+		Arrays.sort(offsets64);
+
+		int nth = 0;
+		for (final MutableEntry me : index) {
+			final long o = me.getOffset();
+			if (o < Integer.MAX_VALUE)
+				nth32[Arrays.binarySearch(offsets32, (int) o)] = nth++;
 			else
-				offsets64[j - j32] = offset;
-			entries[j].copyRawTo(names, j * Constants.OBJECT_ID_LENGTH / 4);
+				nth64[Arrays.binarySearch(offsets64, o)] = nth++;
 		}
 	}
 
@@ -122,15 +131,12 @@ class PackReverseIndex {
 			final int i32 = Arrays.binarySearch(offsets32, (int) offset);
 			if (i32 < 0)
 				return null;
-			final int iNames = i32 * Constants.OBJECT_ID_LENGTH / 4;
-			return ObjectId.fromRaw(names, iNames);
+			return index.getObjectId(nth32[i32]);
 		} else {
 			final int i64 = Arrays.binarySearch(offsets64, offset);
 			if (i64 < 0)
 				return null;
-			final int iNames = (i64 + offsets32.length)
-					* Constants.OBJECT_ID_LENGTH / 4;
-			return ObjectId.fromRaw(names, iNames);
+			return index.getObjectId(nth64[i64]);
 		}
 	}
 
-- 
1.5.6.rc2.181.gbb495


-- 
Shawn.

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [EGIT PATCH 21/20] Make isBetterDeltaReuseLoader() static in PackWriter
  2008-06-15 21:45                                     ` [EGIT PATCH 19/20] Simplified implementation of pack creation: PackWriter Marek Zawirski
  2008-06-15 21:45                                       ` [EGIT PATCH 20/20] PackWriter test suite Marek Zawirski
@ 2008-06-17 21:28                                       ` Marek Zawirski
  2008-06-17 22:07                                         ` Robin Rosenberg
  1 sibling, 1 reply; 29+ messages in thread
From: Marek Zawirski @ 2008-06-17 21:28 UTC (permalink / raw)
  To: robin.rosenberg, spearce; +Cc: git, Marek Zawirski

Implementation was already static, it's just a fix for clarity and
potential speed-up.

Reported-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
---
It could be squashed with patch 19/20. I can clean up this mess, adding 
also Shawn's patch - just let me know what is preferred way (squash
commits, commits on top?).

 .../src/org/spearce/jgit/lib/PackWriter.java       |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
index 18d3ec2..ba43da5 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
@@ -543,8 +543,9 @@ public class PackWriter {
 		}
 	}
 
-	private boolean isBetterDeltaReuseLoader(PackedObjectLoader currentLoader,
-			PackedObjectLoader loader) throws IOException {
+	private static boolean isBetterDeltaReuseLoader(
+			PackedObjectLoader currentLoader, PackedObjectLoader loader)
+			throws IOException {
 		if (currentLoader == null)
 			return true;
 		if (loader.getRawSize() < currentLoader.getRawSize())
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [EGIT PATCH 21/20] Make isBetterDeltaReuseLoader() static in PackWriter
  2008-06-17 21:28                                       ` [EGIT PATCH 21/20] Make isBetterDeltaReuseLoader() static in PackWriter Marek Zawirski
@ 2008-06-17 22:07                                         ` Robin Rosenberg
  2008-06-19 16:26                                           ` Marek Zawirski
  0 siblings, 1 reply; 29+ messages in thread
From: Robin Rosenberg @ 2008-06-17 22:07 UTC (permalink / raw)
  To: Marek Zawirski; +Cc: spearce, git

tisdagen den 17 juni 2008 23.28.54 skrev Marek Zawirski:
> Implementation was already static, it's just a fix for clarity and
> potential speed-up.
> 
> Reported-by: Shawn O. Pearce <spearce@spearce.org>
> Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
> ---
> It could be squashed with patch 19/20. I can clean up this mess, adding 
> also Shawn's patch - just let me know what is preferred way (squash
> commits, commits on top?).

If the code is already merged then patch on top, else squashing or rebase,
unless you feel there is a reason not to. We can pretend it was right from
the start :)  I see no educational value in having a separate patch in this case.

-- robin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [EGIT PATCH 21/20] Make isBetterDeltaReuseLoader() static in PackWriter
  2008-06-17 22:07                                         ` Robin Rosenberg
@ 2008-06-19 16:26                                           ` Marek Zawirski
  0 siblings, 0 replies; 29+ messages in thread
From: Marek Zawirski @ 2008-06-19 16:26 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: spearce, git

Robin Rosenberg wrote:
> tisdagen den 17 juni 2008 23.28.54 skrev Marek Zawirski:
>> Implementation was already static, it's just a fix for clarity and
>> potential speed-up.
>>
>> Reported-by: Shawn O. Pearce <spearce@spearce.org>
>> Signed-off-by: Marek Zawirski <marek.zawirski@gmail.com>
>> ---
>> It could be squashed with patch 19/20. I can clean up this mess, adding 
>> also Shawn's patch - just let me know what is preferred way (squash
>> commits, commits on top?).
> 
> If the code is already merged then patch on top, else squashing or rebase,
> unless you feel there is a reason not to. We can pretend it was right from
> the start :)  

So let's pretend that...
I have squashed these 2 additional patches (Shawn's improvement for 
reverse index and my minor fix) into appropriate commits. "packwriter" 
branch was updated (non fast-forward):
http://repo.or.cz/w/egit/zawir.git?a=shortlog;h=refs/heads/packwriter

 > I see no educational value in having a separate patch in this case.

The only educational value was to type Reported-by on my own ;)


-- 
Marek Zawirski [zawir]
marek.zawirski@gmail.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2008-06-19 16:29 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-15 21:45 [EGIT PATCH 00/20] PackWriter, first usable attempt Marek Zawirski
2008-06-15 21:45 ` [EGIT PATCH 01/20] Fix typo in PackIndexV2 Marek Zawirski
2008-06-15 21:45   ` [EGIT PATCH 02/20] Integer versions of copyRawTo() and fromRaw() in ObjectId Marek Zawirski
2008-06-15 21:45     ` [EGIT PATCH 03/20] Add openObjectInAllPacks() to Repository, exposing packed objects storage Marek Zawirski
2008-06-15 21:45       ` [EGIT PATCH 04/20] WindowedFile fragments copying: copyToStream() Marek Zawirski
2008-06-15 21:45         ` [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex Marek Zawirski
2008-06-15 21:45           ` [EGIT PATCH 06/20] Tests for PackReverseIndex Marek Zawirski
2008-06-15 21:45             ` [EGIT PATCH 07/20] Refactor PackIndexV2 - extract binarySearchLevelTwo() Marek Zawirski
2008-06-15 21:45               ` [EGIT PATCH 08/20] CRC32 support for PackIndex Marek Zawirski
2008-06-15 21:45                 ` [EGIT PATCH 09/20] CRC32 PackIndex tests Marek Zawirski
2008-06-15 21:45                   ` [EGIT PATCH 10/20] Format PackedObjectLoader class Marek Zawirski
2008-06-15 21:45                     ` [EGIT PATCH 11/20] Format UnpackedObjectLoader class Marek Zawirski
2008-06-15 21:45                       ` [EGIT PATCH 12/20] Format DeltaOfsPackedObjectLoader class Marek Zawirski
2008-06-15 21:45                         ` [EGIT PATCH 13/20] Raw-data operations in ObjectLoaders and PackFile Marek Zawirski
2008-06-15 21:45                           ` [EGIT PATCH 14/20] Add hasRevSort() in RevWalk for faster sorting strategy checking Marek Zawirski
2008-06-15 21:45                             ` [EGIT PATCH 15/20] Refactor getRevSort() calls to hasRevSort() Marek Zawirski
2008-06-15 21:45                               ` [EGIT PATCH 16/20] Support for RevSort.BOUNDARY in ObjectWalk Marek Zawirski
2008-06-15 21:45                                 ` [EGIT PATCH 17/20] Rename confusing objects field " Marek Zawirski
2008-06-15 21:45                                   ` [EGIT PATCH 18/20] New CountingOutputStream class - stream decorator Marek Zawirski
2008-06-15 21:45                                     ` [EGIT PATCH 19/20] Simplified implementation of pack creation: PackWriter Marek Zawirski
2008-06-15 21:45                                       ` [EGIT PATCH 20/20] PackWriter test suite Marek Zawirski
2008-06-17 21:28                                       ` [EGIT PATCH 21/20] Make isBetterDeltaReuseLoader() static in PackWriter Marek Zawirski
2008-06-17 22:07                                         ` Robin Rosenberg
2008-06-19 16:26                                           ` Marek Zawirski
2008-06-16  4:06           ` [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex Shawn O. Pearce
2008-06-16 16:27             ` Marek Zawirski
2008-06-17  2:02               ` Shawn O. Pearce
2008-06-16  5:19 ` [EGIT PATCH 00/20] PackWriter, first usable attempt Shawn O. Pearce
2008-06-16 16:37   ` Marek Zawirski

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).