git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Sam Vilain <sam@vilain.net>
To: git@vger.kernel.org
Cc: Nick Edelen <sirnot@gmail.com>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Andreas Ericsson <exon@op5.se>,
	Christian Couder <christian@couder.net>,
	Jeff King <peff@peff.net>
Subject: [PATCH 2/7] rev-cache: add on-disk format for fast reachability lookup
Date: Fri, 5 Jun 2009 02:05:12 +1200	[thread overview]
Message-ID: <25e92985a657be1d7ea3dd8486cbe404b072a2a2.1244125127.git.sam@vilain.net> (raw)
In-Reply-To: <cover.1244125127.git.sam@vilain.net>

As well as storing the sorted list of objects, store a hash table for
faster lookup.

Signed-off-by: Sam Vilain <sam@vilain.net>
---
 Documentation/technical/revision-cache.txt |   24 ++++++++++++++++++++----
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/revision-cache.txt b/Documentation/technical/revision-cache.txt
index cc18535..8349cfe 100644
--- a/Documentation/technical/revision-cache.txt
+++ b/Documentation/technical/revision-cache.txt
@@ -14,6 +14,9 @@ A revision cache contains;
 
     * object ID
 
+  - A hash table from an (abbreviated) object ID to a position into
+    the above list
+
 
 Start Object
 ------------
@@ -35,6 +38,18 @@ objects are sorted as if they were commit objects with a single
 parent, the object they tag.
 
 
+Included object hash table
+--------------------------
+
+This index is used to quickly determine if an object exists in the
+index without scanning the entire topological list.
+
+Entries in the object hash table can be shortened, eg to 3 or 4 bytes;
+basically they just need to be long enough to avoid collisions within
+the objects which exist in the list.  Any match must be confirmed by
+checking the full SHA1 in the topological list.
+
+
 Use Cases
 ---------
 In this section, the key functions and operations that this index is
@@ -131,7 +146,8 @@ passed, or any 'uninteresting' objects were passed.
 
 This function must revision walk the commit graph, sorting in
 --date-order along the way, and may emit revisions as they are
-discovered to the topological object list.
+discovered to the topological object list.  It must also build a hash
+table of object IDs and emit it at the end.
 
 
 receive-pack/pack-objects
@@ -186,9 +202,9 @@ emitted, the delta from the packfile is re-used.  If a loop is
 detected or the delta base is not in the returned set of objects, then
 the delta is first resolved.
 
-This implies that the list of objects is first loaded into a hash
-table prior to returning any objects; however this is probably
-acceptable as the entire list is in one stream and will load quickly.
+This implies that each delta base must be looked up in the on-disk
+hash table as they are written, which is both low impact and memory
+efficient.
 
 For later fetches, the revision cache is not appropriate as they will
 have 'uninteresting' objects set.
-- 
debian.1.5.6.1

  reply	other threads:[~2009-06-05  5:45 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-04 14:18 [PATCH 0/7] [GSoC2009] Revision cache / git-daemon caching plan Sam Vilain
2009-06-04 14:05 ` Sam Vilain [this message]
2009-06-04 14:05 ` [PATCH 1/7] revision-cache: define revision cache as simple list of revisions Sam Vilain
2009-06-05 19:28   ` Nicolas Pitre
2009-06-07  3:49     ` Sam Vilain
2009-06-07  5:43       ` Nicolas Pitre
2009-06-04 14:05 ` [PATCH 4/7] rev-cache: allow multiple 'start' objects per index Sam Vilain
2009-06-04 14:05 ` [PATCH 3/7] rev-cache: add 'end' objects for caching 'uninteresting' lookups Sam Vilain
2009-06-04 14:05 ` [PATCH 5/7] revision cache: maps of 'new' objects Sam Vilain
2009-06-04 14:05 ` [PATCH 7/7] revision cache: be even stricter with sort order Sam Vilain
2009-06-04 14:05 ` [PATCH 6/7] revision cache: allow foreign 'start' commits Sam Vilain
2009-06-05 16:56 ` [PATCH 0/7] [GSoC2009] Revision cache / git-daemon caching plan Jakub Narebski
2009-06-05 20:58   ` Nicolas Pitre
2009-06-07  4:01   ` Sam Vilain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=25e92985a657be1d7ea3dd8486cbe404b072a2a2.1244125127.git.sam@vilain.net \
    --to=sam@vilain.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=christian@couder.net \
    --cc=exon@op5.se \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=sirnot@gmail.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).