From: Junio C Hamano <email@example.com>
To: Ben Peart <firstname.lastname@example.org>
Cc: Ben Peart <email@example.com>, firstname.lastname@example.org,
    email@example.com, firstname.lastname@example.org,
    Johannes.Schindelin@gmx.de, email@example.com,
    firstname.lastname@example.org
Subject: Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
Date: Wed, 15 Nov 2017 13:40:59 +0900
Message-ID: <email@example.com>
In-Reply-To: <firstname.lastname@example.org> (Ben Peart's message of "Tue, 14 Nov 2017 23:16:39 -0500")

Ben Peart <email@example.com> writes:

> OK. I'll call this new extension "EOIE" ("end of index
> entries"). Other than the standard header/footer, it will only contain
> a 32 bit offset to the beginning of the extension entries. I'll
> always write out that extension unless you would prefer it be behind a
> setting (if so, what do you want it called and what should the default
> be)? I won't add support in update-index for this extension.

To make it robust, if I were doing this, I would at least add a
checksum of some sort. As each extension section consists of 4-byte
extension type, 4-byte size, followed by that many bytes of the
"meat" of the section, what I had in mind when I suggested this
backpointer was something like:

    "EOIE" <32-bit size> <32-bit offset> <20-byte hash>

where the size of the extension section is obviously 24-byte to
cover the offset plus hash, and the hash is computed over extension
types and their sizes (but not their contents---this is not about
protecting against file corruption and not worth wasting the cycles
for hashing) for all the extension sections this index file has
(except for "EOIE" at the end, for obvious reasons). E.g. if we
have "TREE" extension that is N-bytes long, "REUC" extension that is
M-bytes long, followed by "EOIE", then the hash would be

    SHA-1("TREE" + <binary representation of N> +
          "REUC" + <binary representation of M>)

Then the reader would

 - Seek back 32-byte from the trailer to ensure it sees "EOIE"
   followed by a correct size (24?)

 - Jump to the offset and find 4-bytes that presumably is the type
   of the first extension, followed by its size.

 - Feed these 8-bytes to the hash, skip that section based on its
   size (while making sure we won't run off the end of the file,
   which is a sign that we thought EOIE exists when there wasn't).
   Repeat this until we hit where we found "EOIE" (or we notice our
   mistake by overrunning it).

 - Check the hash to make sure we got it right.

> Since the goal was to find a way to load the IEOT extension before the
> cache entries, I'll also refactor the extension reading loop into a
> function that takes a function pointer and add a
> preread_index_extension() function that can be passed in when that
> loop is run before the cache entries are loaded. When the loop is run
> again after the cache entries are loaded, it will pass in the existing
> read_index_extension() function. Extensions can then choose which
> function they want to be loaded in.
>
> The code to read/write/use the IEOT to parallelize the cache entry
> loading will stay behind a config setting that defaults to false (at
> least for now). I'll stick with "core.fastindex" until someone can
> (please) propose a better name.

Sounds good.
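
The "EOIE" layout and the reader steps described in the message above can
be sketched roughly as follows. This is only an illustration in Python,
not the code from the patch series. It assumes that sizes and the offset
are stored as big-endian 32-bit integers and that the file ends with a
20-byte SHA-1 trailer, neither of which the message spells out. The helper
names append_extensions() and find_extension_offset() are hypothetical.

```python
import hashlib
import struct

HASH_LEN = 20                            # SHA-1 digest size
EOIE_BODY_LEN = 4 + HASH_LEN             # 32-bit offset + hash = 24 bytes
EOIE_TOTAL_LEN = 4 + 4 + EOIE_BODY_LEN   # type + size + body = 32 bytes


def section(ext_type: bytes, body: bytes) -> bytes:
    """Serialize one extension: 4-byte type, 4-byte size, then the body."""
    # Assumption: sizes are big-endian (network byte order).
    return ext_type + struct.pack(">I", len(body)) + body


def append_extensions(entries: bytes, extensions) -> bytes:
    """Append extensions, an EOIE section, and a trailer to `entries`.

    `entries` stands in for everything that precedes the first extension.
    """
    offset = len(entries)                # where the first extension begins
    hasher = hashlib.sha1()
    out = bytearray(entries)
    for ext_type, body in extensions:
        # Hash only the type and size of each section, not its contents.
        hasher.update(ext_type + struct.pack(">I", len(body)))
        out += section(ext_type, body)
    out += section(b"EOIE", struct.pack(">I", offset) + hasher.digest())
    out += hashlib.sha1(out).digest()    # assumed 20-byte file trailer
    return bytes(out)


def find_extension_offset(data: bytes):
    """Return the offset of the first extension, or None if EOIE is not valid."""
    # Seek back 32 bytes from the trailer and expect "EOIE" plus size 24.
    eoie_pos = len(data) - HASH_LEN - EOIE_TOTAL_LEN
    if eoie_pos < 0 or data[eoie_pos:eoie_pos + 4] != b"EOIE":
        return None
    (size,) = struct.unpack_from(">I", data, eoie_pos + 4)
    if size != EOIE_BODY_LEN:
        return None
    (offset,) = struct.unpack_from(">I", data, eoie_pos + 8)
    expected = data[eoie_pos + 12:eoie_pos + 12 + HASH_LEN]

    # Walk the sections from the offset, hashing each 8-byte header.
    hasher = hashlib.sha1()
    pos = offset
    while pos < eoie_pos:
        if pos + 8 > eoie_pos:           # a header would overlap EOIE
            return None
        hasher.update(data[pos:pos + 8])
        (ext_size,) = struct.unpack_from(">I", data, pos + 4)
        pos += 8 + ext_size              # skip the body of this section
    if pos != eoie_pos:                  # overran, or offset was bogus
        return None
    # Check the hash to confirm the walk followed real section headers.
    return offset if hasher.digest() == expected else None
```

Under these assumptions, find_extension_offset() returns the stored offset
only when the walk lands exactly on the "EOIE" section and the hash
matches. In every other case it returns None, so a caller could fall back
to loading the index the ordinary way.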