git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Ben Peart <peartben@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Ben Peart <benpeart@microsoft.com>,
	git@vger.kernel.org, pclouds@gmail.com, chriscool@tuxfamily.org,
	Johannes.Schindelin@gmx.de, alexmv@dropbox.com, peff@peff.net
Subject: Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
Date: Tue, 14 Nov 2017 09:31:03 -0500	[thread overview]
Message-ID: <7428e41e-b705-f377-1951-b11af851c4d5@gmail.com> (raw)
In-Reply-To: <xmqq7eut8y36.fsf@gitster.mtv.corp.google.com>



On 11/13/2017 8:10 PM, Junio C Hamano wrote:
> Ben Peart <peartben@gmail.com> writes:
> 
>> The proposed format for extensions (ie having both a header and a
>> footer with name and size) in the current patch already enables having
>> multiple extensions that can be parsed forward or backward.  Any
>> extensions that would want to be parse-able in reverse would just all
>> need to be written one after the other after right before the trailing
>> SHA1 (and of course, include the proper footer).
> 
> In other words, anything that wants to be scannable from the tail is
> welcome to reimplement the same validation structure used by IEOT to
> check the section specific sanity constraint, and this series is not
> interested in introducing a more general infrastructure to make it
> easy for code that want to find where each extension section in the
> file begins without pasing the body of the index.
> 
> I find it somewhat unsatisfactory in that it is a fundamental change
> to allow jumping to the start of an extension section from the tail
> that can benefit any future codepath, and have expected a feature
> neutral extension whose sole purpose is to do so [*1*].
> 
> That way, extension sections whose names are all-caps can stay to be
> optional, even if they allow locating from the tail of the file.  If
> you require them to implement the same validation struture as IEOT
> to perform section specific sanity constraint and also require them
> to be placed consecutively at the end, the reader MUST know about
> all such extensions---otherwise they cannot scan backwards and find
> ones that appear before an unknown but optional one.  If you keep an
> extension section at the end whose sole purpose is to point at the
> beginning of extension sections, the reader can then scan forward as
> usual, skipping over unknown but optional ones, and reading your
> IEOT can merely be an user (and the first user) of that more generic
> feature that is futureproof, no?
> 
> 

How about I add the logic to write out a special extension right before 
the SHA1 that contains an offset to the beginning of the extensions 
section.  I will also add the logic in do_read_index() to search for and 
load this special extension if it exists.

This will provide a common framework for any future extension to take 
advantage of if it wants to be loaded/processed before or in parallel 
with the cache entries or other extensions.

For all existing extensions that assume they are loaded _after_ the 
cache entries, in do_read_index() I'll add the logic to use the offset 
(if it exists) to adjust the src_offset and then load them normally.

Given the IEOT extension is just another list of offsets into the index 
to enable out of order processing, I'll add those offsets into the same 
extension so that it is a more generic "table of contents" for the 
entire index.  This enables us to have common/reusable way to have 
random access to _all_ sections in the index while maintaining backwards 
comparability with the existing index formats and code.

These additional offsets will initially only be used to parallelize the 
loading of cache entries and only if the user explicitly enables that 
option but I can think of other interesting uses for them in the future.

  reply	other threads:[~2017-11-14 14:31 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-09 14:17 [PATCH v1 0/4] Speed up index load through parallelization Ben Peart
2017-11-09 14:17 ` [PATCH v1 1/4] fastindex: speed " Ben Peart
2017-11-10  4:46   ` Junio C Hamano
2017-11-13 16:42     ` Ben Peart
2017-11-14  1:10       ` Junio C Hamano
2017-11-14 14:31         ` Ben Peart [this message]
2017-11-14 15:04           ` Junio C Hamano
2017-11-14 15:40             ` Ben Peart
2017-11-15  1:12               ` Junio C Hamano
2017-11-15  4:16                 ` Ben Peart
2017-11-15  4:40                   ` Junio C Hamano
2017-11-20 14:01                     ` Ben Peart
2017-11-20 14:20                       ` Jeff King
2017-11-20 15:38                         ` Jeff King
2017-11-20 23:51                       ` Ramsay Jones
2017-11-21  0:45                         ` Ben Peart
2017-11-09 14:17 ` [PATCH v1 2/4] update-index: add fastindex support to update-index Ben Peart
2017-11-09 14:17 ` [PATCH v1 3/4] fastindex: add test tools and a test script Ben Peart
2017-11-09 14:17 ` [PATCH v1 4/4] fastindex: add documentation for the fastindex extension Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7428e41e-b705-f377-1951-b11af851c4d5@gmail.com \
    --to=peartben@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=alexmv@dropbox.com \
    --cc=benpeart@microsoft.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).