From: Shawn Pearce <email@example.com>
To: Michael Haggerty <firstname.lastname@example.org>
Cc: git <email@example.com>, Jeff King <firstname.lastname@example.org>, Junio C Hamano <email@example.com>, David Borowitz <firstname.lastname@example.org>
Subject: Re: reftable [v4]: new ref storage format
Date: Thu, 3 Aug 2017 19:50:28 -0700
Message-ID: <CAJo=hJvw3UBP7p-5Yxni++_CL8c3JC3etPkYqxSQiaBiKPQWww@mail.gmail.com>
In-Reply-To: <CAMy9T_EU6hPbnnB72ouRAd0yNvWn6_Ef8Bh2iPxChpmDt1qmFw@mail.gmail.com>

On Thu, Aug 3, 2017 at 3:48 PM, Michael Haggerty <email@example.com> wrote:
> I've revised the blockless reftable proposal to address some feedback:

I've been thinking more about your blockless proposal. I experimentally modified my reftable implementation to omit padding between blocks, bringing it a tiny bit closer to your blockless proposal. Unfortunately, this slightly increased lookup latency with a 4k chunk size. I speculate this is because chunks are no longer aligned with filesystem pages, so a single chunk read can force Mac OS X to give me two pages out of the filesystem.

Using padding to align to a 4k block is slightly faster, and the average waste per block is <=20 bytes, too small to fit another ref.

The restart table and binary search within the 4k block are a performance win. Disabling the restart table significantly increased lookup latency.

tl;dr: I think the block alignment and restart table are wins vs. the multi-level index.

A suggested downside of my reftable design is that the ref index at 4k block size for 866k refs is 199 KiB, and it must be paged in for binary search to locate the correct block for any lookup. The pack idx for the two main packs in this repository is 210 MiB. We think fairly little of mmap'ing 210 MiB to perform binary search to find object data, so 199 KiB for ref data seems to be a bargain.

An advantage of the single-level index is that only one page is touched after the index is loaded.
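To make the lookup path above concrete, here is a minimal Python sketch under stated assumptions: it models tables in memory rather than the real reftable byte layout. A single-level index holds the last key of each block, one binary search there picks the only block that can hold the key, and inside that block a binary search over restart points (where keys are stored in full) is followed by a short linear scan over prefix-compressed keys. Block size counts records rather than bytes, and `RECORDS_PER_BLOCK`, `RESTART_INTERVAL`, `build_table`, and `table_lookup` are hypothetical names, not JGit or git-core APIs.

```python
from bisect import bisect_left
from dataclasses import dataclass

# Hypothetical parameters for this sketch; the real format works in bytes.
RECORDS_PER_BLOCK = 64   # stands in for one 4k block
RESTART_INTERVAL = 16    # store a full (uncompressed) key every 16 records


@dataclass
class Block:
    records: list   # (prefix_len, suffix, value); keys are prefix-compressed
    restarts: list  # record indexes whose key is stored in full


def common_prefix_len(a: bytes, b: bytes) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def build_block(items):
    """items: sorted list of (key, value) pairs belonging to one block."""
    records, restarts, prev = [], [], b""
    for i, (key, value) in enumerate(items):
        if i % RESTART_INTERVAL == 0:
            restarts.append(i)
            prefix = 0  # restart point: keep the whole key
        else:
            prefix = common_prefix_len(prev, key)
        records.append((prefix, key[prefix:], value))
        prev = key
    return Block(records, restarts)


def build_table(items):
    """Split sorted items into blocks plus a single-level index of last keys."""
    blocks, index = [], []
    for i in range(0, len(items), RECORDS_PER_BLOCK):
        chunk = items[i:i + RECORDS_PER_BLOCK]
        blocks.append(build_block(chunk))
        index.append(chunk[-1][0])  # last key in each block
    return index, blocks


def block_lookup(block, key):
    # Binary search over restart points for the last restart key <= key.
    lo, hi = 0, len(block.restarts)
    while lo < hi:
        mid = (lo + hi) // 2
        if block.records[block.restarts[mid]][1] <= key:
            lo = mid + 1
        else:
            hi = mid
    if lo == 0:
        return None  # key sorts before the first record in this block
    start = block.restarts[lo - 1]
    end = block.restarts[lo] if lo < len(block.restarts) else len(block.records)
    # Linear scan from the restart point, rebuilding prefix-compressed keys.
    prev = b""
    for prefix, suffix, value in block.records[start:end]:
        cur = prev[:prefix] + suffix
        if cur == key:
            return value
        if cur > key:
            break
        prev = cur
    return None


def table_lookup(index, blocks, key):
    # One binary search in the index selects the only candidate block.
    i = bisect_left(index, key)
    return block_lookup(blocks[i], key) if i < len(blocks) else None


if __name__ == "__main__":
    refs = [(f"refs/heads/topic-{n:05d}".encode(), n) for n in range(1000)]
    index, blocks = build_table(refs)
    print(table_lookup(index, blocks, b"refs/heads/topic-00500"))  # 500
```

The restart interval trades space for speed: a shorter interval stores more full keys but shortens the linear scan after the binary search.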
Hot reftable reads (5.6 usec) are faster than loose ref reads (6.5 usec). Once the ref index is loaded, reftable can read a ref more quickly than the time required to open-read-close a loose ref. Admittedly, a large index slows down a cold read.

tl;dr: I just don't think the size of the index is a concern.

I really favor keeping the reflog data in a separate section from the ref values themselves. Even for smaller transaction files, it improves scan and lookup time by letting readers who only care about the name and SHA-1 value of a ref avoid paging in or skipping over log record payloads. However, I also agree that the aggregates may benefit from ref and log being separate files.

> * Add a way to mark a reflog entry "deleted" without having to rewrite
> everything. This is mostly meant to deal with `refs/stash`.

This is an interesting idea. Given how I implemented reftable in JGit, just inserting a deletion record for the same (ref,update_index) tuple would make it trivial to hide the prior entry.

> * Define an extension mechanism.
> * Define the SHA-1 → object mapping as an extension rather than as
> part of the main spec. My gut feeling is that it will never be
> implemented for git-core.

While the SHA-1 -> object mapping may never be implemented for git-core, I'd still prefer to see it as an optional part of the file specification, rather than as a separately specified extension. IMHO the extension stuff in DIRC has made it unnecessarily complicated, and we've still revved that file through many revisions.

> * Revise how the SHA-1 → object mapping works:
> * Merge the bottommost OBJ_INDEX node with the old OBJ nodes to
> form a new OBJ_LEAF node.
> * Allow the branching factor of each node to be specified
> independently (to allow the node sizes to be matched more closely to
> the preferred read sizes).

I'm not sure objects warrant this kind of complexity. The obj support in reftable is nearly identical to the ref support.
I have a significant amount of code that is common between them. Your approach makes objects different enough from refs that they need their own code, increasing complexity in both the writer and the reader.

> I currently lean towards the opinion that we should store pseudorefs
> (like `FETCH_HEAD`, `MERGE_HEAD` *outside of* reftables, except for
> `HEAD` (which behaves more like a normal reference, which is
> considered for reachability, and for which we want to retain reflogs),
> which we should store *in* reftables.

I'm on the fence and don't really have a strong opinion about where we store the pseudorefs. I'm happy to keep them in $GIT_DIR, and happy to have them supported inside a reftable.
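Returning to the idea of marking a reflog entry "deleted" without rewriting everything: below is a minimal Python sketch of how a deletion record for the same (ref, update_index) tuple could hide an earlier entry when a stack of tables is read. It assumes tables are read oldest first and a newer table's record replaces an older one with the same key; the deletion record is modeled as a hypothetical `LogRecord` with `new_id=None`. This is neither the reftable log record encoding nor a JGit API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LogRecord:
    refname: str
    update_index: int
    new_id: Optional[str]  # None: hypothetical "deleted" marker for this entry


def read_reflog(tables: List[List[LogRecord]], refname: str) -> List[LogRecord]:
    """Merge a stack of tables (oldest first) into one reflog, newest entry first."""
    merged = {}
    for table in tables:  # a newer table's record replaces an older one with the same key
        for rec in table:
            if rec.refname == refname:
                merged[rec.update_index] = rec
    newest_first = sorted(merged.items(), key=lambda kv: kv[0], reverse=True)
    # Entries whose surviving record is a deletion marker are hidden.
    return [rec for _, rec in newest_first if rec.new_id is not None]


if __name__ == "__main__":
    base = [LogRecord("refs/stash", i, f"stash-{i}") for i in (1, 2, 3)]
    drop = [LogRecord("refs/stash", 2, None)]  # hide entry 2 without rewriting base
    print([r.update_index for r in read_reflog([base, drop], "refs/stash")])  # [3, 1]
```

The older table is never modified; the hiding happens entirely at read time, which is what makes the deletion cheap to write.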