git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Sam Vilain <sam@vilain.net>
To: git@vger.kernel.org
Cc: Nick Edelen <sirnot@gmail.com>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Andreas Ericsson <exon@op5.se>,
	Christian Couder <christian@couder.net>,
	Jeff King <peff@peff.net>
Subject: [PATCH 7/7] revision cache: be even stricter with sort order
Date: Fri, 5 Jun 2009 02:05:13 +1200	[thread overview]
Message-ID: <00f8798ca56481f207983f6f26fc5fda1f12f337.1244125128.git.sam@vilain.net> (raw)
In-Reply-To: <cover.1244125127.git.sam@vilain.net>

For sliced bundles, there must be absolutely no ambiguity at all about
the sort order.  Define how to break ties, and describe why this
affects mirror-sync.

Signed-off-by: Sam Vilain <sam@vilain.net>
---
 Documentation/technical/revision-cache.txt |   79 +++++++++++++++++++---------
 1 files changed, 54 insertions(+), 25 deletions(-)

diff --git a/Documentation/technical/revision-cache.txt b/Documentation/technical/revision-cache.txt
index 0cd7b08..7aaab38 100644
--- a/Documentation/technical/revision-cache.txt
+++ b/Documentation/technical/revision-cache.txt
@@ -14,7 +14,7 @@ A revision cache contains;
   - A list of objects which are referred to by any 'start' object, but
     not by crossing 'end' objects, including:
 
-    * position when sorted in --date-order
+    * position when sorted in --date-order, with ties broken by SHA1
 
     * object ID
 
@@ -33,9 +33,15 @@ A revision cache contains;
     'end' object, but not reachable from the 'start' objects reachable
     from that 'end' object.
 
-  - a list of 'foreign end' objects, for which not all reachable
-    objects are in the object list, but can have reachability or a
-    'newness' bitmap.
+  - a list of 'foreign start' objects, for which not all reachable
+    objects are in the object list, but can have reachability and
+    'newness' bitmaps.
+
+  - a list of object types for each object, in the order they appear
+    in the object list.
+
+  - a list of object lengths for each object, in the order they appear
+    in the object list.
 
 
 Start Objects and End Objects
@@ -64,7 +70,9 @@ from the offset.
 this still only specifies an ordering for commit objects.  Other
 objects will appear after the object which first refers to them.  Tag
 objects are sorted as if they were commit objects with a single
-parent, the object they tag.
+parent, the object they tag.  No object is allowed to appear before
+another object which refers to it, unless it is a 'start' object.
+Ties in the order are broken by SHA1.
 
 
 Included object hash table
@@ -231,6 +239,23 @@ caches whether the object exists and is reachable from the 'start'
 objects which connect to those stacked caches.
 
 
+Extending Revision Caches
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+  rev_cache_extend( rev_cache, quasi_interesting[] )
+
+The function works on an existing revision cache, and calculates just
+the 'reachability' and 'newness' bitmaps.
+
+These are calculated by marking the quasi_interesting[] commits as
+interesting, the 'end' objects in the revision cache as
+'uninteresting', and walking.
+Objects which are encountered which exist in the revision cache are
+converted to bits in the emitted bitmap, and the 'newness' bitmap is
+built as with the normal case once the reachable 'end' objects are
+known.
+
+
 receive-pack/pack-objects
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -360,28 +385,32 @@ slicing bundles (mirror-sync)
 
 For 'slicing' bundles, a deterministic method is required for chopping
 up the list of objects, which can be calculated by any node which has
-the 'interesting' commits.
-
-To determine the objects which fall within a given slice, the object
-list must be enumerated and then divided evenly.  As the compressed
-size on one node cannot be reproduced on another node, the
-uncompressed size of the object is used instead, with the hope that
-slices will generally end up of roughly even size once compressed.
-
-To calculate the object boundaries, the list of objects in
---date-order must first be tie-broken, eg for commits with the same
-commitdate and topological order.  If slices are allowed to be
-sub-commit level, then objects between commits (ie blobs and trees)
-are also sorted.  Once this deterministic list has been built, then
-all of the objects must be accessed to determine their length.  The
-objects which start within a given range are the ones in the slice.
+the 'interesting' commits.  As topological order is important, bundles
+must only consist of objects from a single revision cache to be sliced
+in this manner.
+
+To determine the objects which fall within a given slice, the new
+object list in the bundle must be enumerated (such as by decompressing
+and masking bitmaps) and then divided evenly, using the (masked)
+object lengths list.  As the compressed size on one node cannot be
+reproduced on another node, the uncompressed size of the object is
+used instead, with the hope that slices will generally end up of
+roughly even size once compressed.
+
+The objects which start within a given range in this list of objects
+are the ones in the slice.  If slices are not allowed at the
+sub-commit level, then the requirement to tie-break ordering of
+non-commit/tag objects is relaxed, however the list of per-object
+types is required to know where the boundaries are.
 
 For re-using deltas in sliced bundles, the delta base is looked up in
-the deterministic list.  If it has an earlier sequence, then the delta
-can be safely re-used.  If it has a later sequence, then the delta
-must be resolved and then the object re-deltified/compressed, using
-only objects with an earlier sequence (or in the returned pack) as a
-base.
+the deterministic list of objects.  If it has an earlier sequence,
+then the delta can be safely re-used.  If it has a later sequence,
+then the delta must be resolved and then the object
+re-deltified/compressed, using only objects with an earlier sequence
+(or in the returned pack) as a base.
 
 This requirement is so that a collection of 'sliced' bundles can
 successfully re-assemble, while still allowing them to be 'thin'.
+Without thin packs, download spreading from multiple mirrors will
+result in a much larger download.
-- 
debian.1.5.6.1

  parent reply	other threads:[~2009-06-05  5:45 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-04 14:18 [PATCH 0/7] [GSoC2009] Revision cache / git-daemon caching plan Sam Vilain
2009-06-04 14:05 ` [PATCH 4/7] rev-cache: allow multiple 'start' objects per index Sam Vilain
2009-06-04 14:05 ` [PATCH 5/7] revision cache: maps of 'new' objects Sam Vilain
2009-06-04 14:05 ` [PATCH 3/7] rev-cache: add 'end' objects for caching 'uninteresting' lookups Sam Vilain
2009-06-04 14:05 ` [PATCH 1/7] revision-cache: define revision cache as simple list of revisions Sam Vilain
2009-06-05 19:28   ` Nicolas Pitre
2009-06-07  3:49     ` Sam Vilain
2009-06-07  5:43       ` Nicolas Pitre
2009-06-04 14:05 ` [PATCH 2/7] rev-cache: add on-disk format for fast reachability lookup Sam Vilain
2009-06-04 14:05 ` [PATCH 6/7] revision cache: allow foreign 'start' commits Sam Vilain
2009-06-04 14:05 ` Sam Vilain [this message]
2009-06-05 16:56 ` [PATCH 0/7] [GSoC2009] Revision cache / git-daemon caching plan Jakub Narebski
2009-06-05 20:58   ` Nicolas Pitre
2009-06-07  4:01   ` Sam Vilain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=00f8798ca56481f207983f6f26fc5fda1f12f337.1244125128.git.sam@vilain.net \
    --to=sam@vilain.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=christian@couder.net \
    --cc=exon@op5.se \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=sirnot@gmail.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).