git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Ben Peart <peartben@gmail.com>
Cc: Ben Peart <benpeart@microsoft.com>,
	git@vger.kernel.org, pclouds@gmail.com, chriscool@tuxfamily.org,
	Johannes.Schindelin@gmx.de, alexmv@dropbox.com, peff@peff.net
Subject: Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
Date: Wed, 15 Nov 2017 13:40:59 +0900	[thread overview]
Message-ID: <xmqq1sl017dw.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <9ba23d2c-2198-55d7-5a02-69879fbbb3cb@gmail.com> (Ben Peart's message of "Tue, 14 Nov 2017 23:16:39 -0500")

Ben Peart <peartben@gmail.com> writes:

> OK.  I'll call this new extension "EOIE" ("end of index
> entries"). Other than the standard header/footer, it will only contain
> a 32 bit offset to the beginning of the extension entries.  I'll
> always write out that extension unless you would prefer it be behind a
> setting (if so, what do you want it called and what should the default
> be)?  I won't add support in update-index for this extension.

To make it robust, if I were doing this, I would at least add a checksum
of some sort.  

As each extension section consists of 4-byte extension type, 4-byte
size, followed by that many bytes of the "meat" of the section, what
I had in mind when I suggested this backpointer was something like:

    "EOIE" <32-bit size> <32-bit offset> <20-byte hash>

where the size of the extension section is obviously 24-byte to
cover the offset plus hash, and the hash is computed over extension
types and their sizes (but not their contents---this is not about
protecting against file corruption and not worth wasting the cycles
for hashing) for all the extension sections this index file has
(except for "EOIE" at the end, for obvious reasons).  E.g. if we
have "TREE" extension that is N-bytes long, "REUC" extension that is
M-bytes long, followed by "EOIE", then the hash would be

	SHA-1("TREE" + <binary representation of N> +
	      "REUC" + <binary representation of M>)

Then the reader would

 - Seek back 32-byte from the trailer to ensure it sees "EOIE"
   followed by a correct size (24?)

 - Jump to the offset and find 4-bytes that presumably is the type
   of the first extension, followed by its size.  

 - Feed these 8-bytes to the hash, skip that section based on its
   size (while making sure we won't run off the end of the file,
   which is a sign that we thought EOIE exists when there wasn't).
   Repeat this until we hit where we found "EOIE" (or we notice our
   mistake by overrunning it).

 - Check the hash to make sure we got it right.

> Since the goal was to find a way to load the IEOT extension before the
> cache entries, I'll also refactor the extension reading loop into a
> function that takes a function pointer and add a
> preread_index_extension() function that can be passed in when that
> loop is run before the cache entries are loaded.  When the loop is run
> again after the cache entries are loaded, it will pass in the existing
> read_index_extension() function.  Extensions can then choose which
> function they want to be loaded in.
>
> The code to read/write/use the IEOT to parallelize the cache entry
> loading will stay behind a config setting that defaults to false (at
> least for now).  I'll stick with "core.fastindex" until someone can
> (please) propose a better name.

Sounds good.

  reply	other threads:[~2017-11-15  4:41 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-09 14:17 [PATCH v1 0/4] Speed up index load through parallelization Ben Peart
2017-11-09 14:17 ` [PATCH v1 1/4] fastindex: speed " Ben Peart
2017-11-10  4:46   ` Junio C Hamano
2017-11-13 16:42     ` Ben Peart
2017-11-14  1:10       ` Junio C Hamano
2017-11-14 14:31         ` Ben Peart
2017-11-14 15:04           ` Junio C Hamano
2017-11-14 15:40             ` Ben Peart
2017-11-15  1:12               ` Junio C Hamano
2017-11-15  4:16                 ` Ben Peart
2017-11-15  4:40                   ` Junio C Hamano [this message]
2017-11-20 14:01                     ` Ben Peart
2017-11-20 14:20                       ` Jeff King
2017-11-20 15:38                         ` Jeff King
2017-11-20 23:51                       ` Ramsay Jones
2017-11-21  0:45                         ` Ben Peart
2017-11-09 14:17 ` [PATCH v1 2/4] update-index: add fastindex support to update-index Ben Peart
2017-11-09 14:17 ` [PATCH v1 3/4] fastindex: add test tools and a test script Ben Peart
2017-11-09 14:17 ` [PATCH v1 4/4] fastindex: add documentation for the fastindex extension Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq1sl017dw.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=alexmv@dropbox.com \
    --cc=benpeart@microsoft.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=peartben@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).