From: Jonathan Nieder <jrnieder@gmail.com>
To: Shawn Pearce <spearce@spearce.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Git Mailing List <git@vger.kernel.org>,
Stefan Beller <sbeller@google.com>,
bmwill@google.com, Jonathan Tan <jonathantanmy@google.com>,
Jeff King <peff@peff.net>, David Lang <david@lang.hm>,
"brian m. carlson" <sandals@crustytoothpaste.net>
Subject: Re: RFC v3: Another proposed hash function transition plan
Date: Thu, 9 Mar 2017 12:24:08 -0800 [thread overview]
Message-ID: <20170309202408.GA17847@aiede.mtv.corp.google.com> (raw)
In-Reply-To: <CAJo=hJtoX9=AyLHHpUJS7fueV9ciZ_MNpnEPHUz8Whui6g9F0A@mail.gmail.com>
Hi,
Shawn Pearce wrote:
> On Mon, Mar 6, 2017 at 4:17 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>> Alongside the packfile, a sha3 repository stores a bidirectional
>> mapping between sha3 and sha1 object names. The mapping is generated
>> locally and can be verified using "git fsck". Object lookups use this
>> mapping to allow naming objects using either their sha1 and sha3 names
>> interchangeably.
>
> I saw some discussion about using LevelDB for this mapping table. I
> think any existing database may be overkill.
>
> For packs, you may be able to simplify by having only one file
> (pack-*.msha1) that maps SHA-1 to pack offset; idx v2. The CRC32 table
> in v2 is unnecessary, but you need the 64 bit offset support.
>
> SHA-1 to SHA-3: lookup SHA-1 in .msha1, reverse .idx, find offset to
> read the SHA-3.
> SHA-3 to SHA-1: lookup SHA-3 in .idx, and reverse the .msha1 file to
> translate offset to SHA-1.
Thanks for this suggestion. I was initially vaguely nervous about
lookup times in an idx-style file, but as you say, object reads from a
packfile already have to deal with this kind of lookup and work fine.
> For loose objects, the loose object directories should have only
> O(4000) entries before auto gc is strongly encouraging
> packing/pruning. With 256 shards, each given directory has O(16) loose
> objects in it. When writing a SHA-3 loose object, Git could also
> append a line "$sha3 $sha1\n" to objects/${first_byte}/sha1, which
> GC/prune rewrites to remove entries. With O(16) objects in a
> directory, these files should only have O(16) entries in them.
Insertion time is what worries me. When writing a small number of
objects using a command like "git commit", I don't want to have to
regenerate an entire idx file. I don't want to move the pain to
O(loose objects) work at read time, either --- some people disable
auto gc, and others have a large number of loose objects due to gc
ejecting unreachable objects.
But some kind of simplification along these lines should be possible.
I'll experiment.
Jonathan
next prev parent reply other threads:[~2017-03-09 20:25 UTC|newest]
Thread overview: 113+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-04 1:12 RFC: Another proposed hash function transition plan Jonathan Nieder
2017-03-05 2:35 ` Linus Torvalds
2017-03-06 0:26 ` brian m. carlson
2017-03-06 18:24 ` Brandon Williams
2017-06-15 10:30 ` Which hash function to use, was " Johannes Schindelin
2017-06-15 11:05 ` Mike Hommey
2017-06-15 13:01 ` Jeff King
2017-06-15 16:30 ` Ævar Arnfjörð Bjarmason
2017-06-15 19:34 ` Johannes Schindelin
2017-06-15 21:59 ` Adam Langley
2017-06-15 22:41 ` brian m. carlson
2017-06-15 23:36 ` Ævar Arnfjörð Bjarmason
2017-06-16 0:17 ` brian m. carlson
2017-06-16 6:25 ` Ævar Arnfjörð Bjarmason
2017-06-16 13:24 ` Johannes Schindelin
2017-06-16 17:38 ` Adam Langley
2017-06-16 20:52 ` Junio C Hamano
2017-06-16 21:12 ` Junio C Hamano
2017-06-16 21:24 ` Jonathan Nieder
2017-06-16 21:39 ` Ævar Arnfjörð Bjarmason
2017-06-16 20:42 ` Jeff King
2017-06-19 9:26 ` Johannes Schindelin
2017-06-15 21:10 ` Mike Hommey
2017-06-16 4:30 ` Jeff King
2017-06-15 17:36 ` Brandon Williams
2017-06-15 19:20 ` Junio C Hamano
2017-06-15 19:13 ` Jonathan Nieder
2017-03-07 0:17 ` RFC v3: " Jonathan Nieder
2017-03-09 19:14 ` Shawn Pearce
2017-03-09 20:24 ` Jonathan Nieder [this message]
2017-03-10 19:38 ` Jeff King
2017-03-10 19:55 ` Jonathan Nieder
2017-09-28 4:43 ` [PATCH v4] technical doc: add a design doc for hash function transition Jonathan Nieder
2017-09-29 6:06 ` Junio C Hamano
2017-09-29 8:09 ` Junio C Hamano
2017-09-29 17:34 ` Jonathan Nieder
2017-10-02 8:25 ` Junio C Hamano
2017-10-02 19:41 ` Jason Cooper
2017-10-02 9:02 ` Junio C Hamano
2017-10-02 19:23 ` Jason Cooper
2017-10-03 5:40 ` Junio C Hamano
2017-10-03 13:08 ` Jason Cooper
2017-10-04 1:44 ` Junio C Hamano
2017-09-06 6:28 ` RFC v3: Another proposed hash function transition plan Junio C Hamano
2017-09-08 2:40 ` Junio C Hamano
2017-09-08 3:34 ` Jeff King
2017-09-11 18:59 ` Brandon Williams
2017-09-13 12:05 ` Johannes Schindelin
2017-09-13 13:43 ` demerphq
2017-09-13 22:51 ` Jonathan Nieder
2017-09-14 18:26 ` Johannes Schindelin
2017-09-14 18:40 ` Jonathan Nieder
2017-09-14 22:09 ` Johannes Schindelin
2017-09-13 23:30 ` Linus Torvalds
2017-09-14 18:45 ` Johannes Schindelin
2017-09-18 12:17 ` Gilles Van Assche
2017-09-18 22:16 ` Johannes Schindelin
2017-09-19 16:45 ` Gilles Van Assche
2017-09-29 13:17 ` Johannes Schindelin
2017-09-29 14:54 ` Joan Daemen
2017-09-29 22:33 ` Johannes Schindelin
2017-09-30 22:02 ` Joan Daemen
2017-10-02 14:26 ` Johannes Schindelin
2017-09-18 22:25 ` Jonathan Nieder
2017-09-26 17:05 ` Jason Cooper
2017-09-26 22:11 ` Johannes Schindelin
2017-09-26 22:25 ` [PATCH] technical doc: add a design doc for hash function transition Stefan Beller
2017-09-26 23:38 ` Jonathan Nieder
2017-09-26 23:51 ` RFC v3: Another proposed hash function transition plan Jonathan Nieder
2017-10-02 14:54 ` Jason Cooper
2017-10-02 16:50 ` Brandon Williams
2017-10-02 14:00 ` Jason Cooper
2017-10-02 17:18 ` Linus Torvalds
2017-10-02 19:37 ` Jeff King
2017-09-13 16:30 ` Jonathan Nieder
2017-09-13 21:52 ` Junio C Hamano
2017-09-13 22:07 ` Stefan Beller
2017-09-13 22:18 ` Jonathan Nieder
2017-09-14 2:13 ` Junio C Hamano
2017-09-14 15:23 ` Johannes Schindelin
2017-09-14 15:45 ` demerphq
2017-09-14 22:06 ` Johannes Schindelin
2017-09-13 22:15 ` Junio C Hamano
2017-09-13 22:27 ` Jonathan Nieder
2017-09-14 2:10 ` Junio C Hamano
2017-09-14 12:39 ` Johannes Schindelin
2017-09-14 16:36 ` Brandon Williams
2017-09-14 18:49 ` Jonathan Nieder
2017-09-15 20:42 ` Philip Oakley
2017-03-05 11:02 ` RFC: " David Lang
[not found] ` <CA+dhYEXHbQfJ6KUB1tWS9u1MLEOJL81fTYkbxu4XO-i+379LPw@mail.gmail.com>
2017-03-06 9:43 ` Jeff King
2017-03-06 23:40 ` Jonathan Nieder
2017-03-07 0:03 ` Mike Hommey
2017-03-06 8:43 ` Jeff King
2017-03-06 18:39 ` Jonathan Tan
2017-03-06 19:22 ` Linus Torvalds
2017-03-06 19:59 ` Brandon Williams
2017-03-06 21:53 ` Junio C Hamano
2017-03-07 8:59 ` Jeff King
2017-03-06 18:43 ` Junio C Hamano
2017-03-07 18:57 ` Ian Jackson
2017-03-07 19:15 ` Linus Torvalds
2017-03-08 11:20 ` Ian Jackson
2017-03-08 15:37 ` Johannes Schindelin
2017-03-08 15:40 ` Johannes Schindelin
2017-03-20 5:21 ` Use base32? Jason Hennessey
2017-03-20 5:58 ` Michael Steuer
2017-03-20 8:05 ` Jacob Keller
2017-03-21 3:07 ` Michael Steuer
2017-03-13 9:24 ` RFC: Another proposed hash function transition plan The Keccak Team
2017-03-13 17:48 ` Jonathan Nieder
2017-03-13 18:34 ` ankostis
2017-03-17 11:07 ` Johannes Schindelin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170309202408.GA17847@aiede.mtv.corp.google.com \
--to=jrnieder@gmail.com \
--cc=bmwill@google.com \
--cc=david@lang.hm \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
--cc=peff@peff.net \
--cc=sandals@crustytoothpaste.net \
--cc=sbeller@google.com \
--cc=spearce@spearce.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).