git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Ben Peart <peartben@gmail.com>
Cc: Ben Peart <benpeart@microsoft.com>,
	git@vger.kernel.org, pclouds@gmail.com, chriscool@tuxfamily.org,
	Johannes.Schindelin@gmx.de, alexmv@dropbox.com, peff@peff.net
Subject: Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
Date: Wed, 15 Nov 2017 13:40:59 +0900
Message-ID: <xmqq1sl017dw.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <9ba23d2c-2198-55d7-5a02-69879fbbb3cb@gmail.com> (Ben Peart's message of "Tue, 14 Nov 2017 23:16:39 -0500")

Ben Peart <peartben@gmail.com> writes:

> OK.  I'll call this new extension "EOIE" ("end of index
> entries"). Other than the standard header/footer, it will only contain
> a 32 bit offset to the beginning of the extension entries.  I'll
> always write out that extension unless you would prefer it be behind a
> setting (if so, what do you want it called and what should the default
> be)?  I won't add support in update-index for this extension.

To make it robust, if I were doing this, I would at least add a checksum
of some sort.  

As each extension section consists of 4-byte extension type, 4-byte
size, followed by that many bytes of the "meat" of the section, what
I had in mind when I suggested this backpointer was something like:

    "EOIE" <32-bit size> <32-bit offset> <20-byte hash>

where the size of the extension section is obviously 24-byte to
cover the offset plus hash, and the hash is computed over extension
types and their sizes (but not their contents---this is not about
protecting against file corruption and not worth wasting the cycles
for hashing) for all the extension sections this index file has
(except for "EOIE" at the end, for obvious reasons).  E.g. if we
have "TREE" extension that is N-bytes long, "REUC" extension that is
M-bytes long, followed by "EOIE", then the hash would be

	SHA-1("TREE" + <binary representation of N> +
	      "REUC" + <binary representation of M>)

Then the reader would

 - Seek back 32-byte from the trailer to ensure it sees "EOIE"
   followed by a correct size (24?)

 - Jump to the offset and find 4-bytes that presumably is the type
   of the first extension, followed by its size.  

 - Feed these 8-bytes to the hash, skip that section based on its
   size (while making sure we won't run off the end of the file,
   which is a sign that we thought EOIE exists when there wasn't).
   Repeat this until we hit where we found "EOIE" (or we notice our
   mistake by overrunning it).

 - Check the hash to make sure we got it right.

> Since the goal was to find a way to load the IEOT extension before the
> cache entries, I'll also refactor the extension reading loop into a
> function that takes a function pointer and add a
> preread_index_extension() function that can be passed in when that
> loop is run before the cache entries are loaded.  When the loop is run
> again after the cache entries are loaded, it will pass in the existing
> read_index_extension() function.  Extensions can then choose which
> function they want to be loaded in.
>
> The code to read/write/use the IEOT to parallelize the cache entry
> loading will stay behind a config setting that defaults to false (at
> least for now).  I'll stick with "core.fastindex" until someone can
> (please) propose a better name.

Sounds good.

  reply index

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-09 14:17 [PATCH v1 0/4] Speed " Ben Peart
2017-11-09 14:17 ` [PATCH v1 1/4] fastindex: speed " Ben Peart
2017-11-10  4:46   ` Junio C Hamano
2017-11-13 16:42     ` Ben Peart
2017-11-14  1:10       ` Junio C Hamano
2017-11-14 14:31         ` Ben Peart
2017-11-14 15:04           ` Junio C Hamano
2017-11-14 15:40             ` Ben Peart
2017-11-15  1:12               ` Junio C Hamano
2017-11-15  4:16                 ` Ben Peart
2017-11-15  4:40                   ` Junio C Hamano [this message]
2017-11-20 14:01                     ` Ben Peart
2017-11-20 14:20                       ` Jeff King
2017-11-20 15:38                         ` Jeff King
2017-11-20 23:51                       ` Ramsay Jones
2017-11-21  0:45                         ` Ben Peart
2017-11-09 14:17 ` [PATCH v1 2/4] update-index: add fastindex support to update-index Ben Peart
2017-11-09 14:17 ` [PATCH v1 3/4] fastindex: add test tools and a test script Ben Peart
2017-11-09 14:17 ` [PATCH v1 4/4] fastindex: add documentation for the fastindex extension Ben Peart

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq1sl017dw.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=alexmv@dropbox.com \
    --cc=benpeart@microsoft.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=peartben@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git