From: Shawn Pearce <email@example.com> To: Jeff King <firstname.lastname@example.org> Cc: Junio C Hamano <email@example.com>, git <firstname.lastname@example.org>, Michael Haggerty <email@example.com>, David Borowitz <firstname.lastname@example.org> Subject: Re: reftable [v4]: new ref storage format Date: Thu, 3 Aug 2017 15:17:37 -0700 [thread overview] Message-ID: <CAJo=hJtXaDtouGU0noE0ueHJYr872OTfSt5Y9SmQ5xAA7g_CrA@mail.gmail.com> (raw) In-Reply-To: <email@example.com> On Wed, Aug 2, 2017 at 1:28 PM, Jeff King <firstname.lastname@example.org> wrote: > On Wed, Aug 02, 2017 at 12:50:39PM -0700, Junio C Hamano wrote: > >> With the traditional "packed-refs plus loose" layout, no matter how >> many times a handful of selected busy refs are updated during the >> day, you'd need to open at most two files to find out the current >> value of a single ref (admittedly, the accessing of the second file, >> after we realize that there is no loose one, would be very costly). >> If you make a few commits on a topic branch A, then build a 100 >> commit series on top of another topic branch B, finding the current >> value of A is still one open and read of refs/heads/A. >> >> With the reftable format, we'd need to open and read all 100 >> incremental transactions that touch branch B before realizing that >> none of them talk about A, and read the next transaction file to >> find the current value of A. To keep this number low, we'd need >> quite a frequent compaction. > > I think this is where compaction cleverness can come in. > > One relatively easy optimization would be to note when the most recent > reftable contains a subset of the refs we are currently updating (and > the obvious string-of-updates to a single ref falls under that), and do > a "quick" compaction where we simply drop that reftable in favor of > ours. That compaction is essentially free, because we know those entries > aren't valid anymore anyway. > > I'm actually not sure if this is a strict "drop", though, because of > reflogs. If the reflog is written into the same file as the ref update, > then you'd need to roll its entry into your new update, too. But see > below anyway. > > For more complicated cases, there's some cleverness you can do with > small on-the-fly compactions. Even if there are entries in the last few > reftables that we're not currently updating, it's pretty cheap to roll a > few of them up into our new reftable if it lets us drop some > intermediate reftables. E.g., if we're writing a new reftable with a 4K > block size but only have 100 bytes of new data, we're probably best to > roll up a previous 500-byte reftable. > > That one's an obvious optimization because we know that the filesystem > is going to make us spend 4K either way, so rounding up to that is > generally free-ish. Yes. I was trying to propose exactly this in the first draft of reftable, but someone on list didn't like the idea of an update transaction stalling to perform a compaction, so I took that text out in later drafts. What I had envisioned is exactly what you mention; an update of refs/heads/B that is just going to overlay another small reftable that also recently updated refs/heads/B should just replace that table instead of pushing onto the stack. Or if the combined top of stack + new table is under 4K, they should just combine together instead of pushing a new table onto the stack. > What's less obvious is when we should roll up a bunch of 4K tables into > one (let's say) 32K table. I think there's a formula for doing this > geometrically so that the amortized cost of writing stays under a > certain bound (linear?). But I haven't thought it through (or looked it > up); I was hoping Shawn had already done so and could dump that wisdom > on us. Sorry, I haven't done that. :) >> We can just declare that reftable format is not for desktop clients >> but for server implementations where frequent compaction would not >> be an annoyance to the users, but I'd wish we do not have to. > > Yeah, I agree we should avoid that. I would love to eventually kill off > the loose/packed backend (or at least make it historical and read-only, > so that a reasonable response to people complaining about its races and > lack of atomicity is "please upgrade to reftables"). I'd also really like reftable to be useful for "desktop clients". I'd consider it something of a design failure if the format was horribly unsuitable in some way to that use case. I think the small compactions during updates will be essential to supporting the typical developer's workflow.
next prev parent reply other threads:[~2017-08-03 22:18 UTC|newest] Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-07-31 3:51 Shawn Pearce 2017-07-31 17:41 ` Dave Borowitz 2017-07-31 19:01 ` Stefan Beller 2017-07-31 23:05 ` Shawn Pearce 2017-07-31 19:42 ` Junio C Hamano 2017-07-31 23:43 ` Shawn Pearce 2017-08-01 16:08 ` Shawn Pearce 2017-08-01 6:41 ` Michael Haggerty 2017-08-01 20:23 ` Shawn Pearce 2017-08-02 0:49 ` Michael Haggerty 2017-08-01 23:27 ` Shawn Pearce 2017-08-01 23:54 ` Shawn Pearce 2017-08-02 1:51 ` Michael Haggerty 2017-08-02 2:38 ` Shawn Pearce 2017-08-02 9:28 ` Jeff King 2017-08-02 15:17 ` Shawn Pearce 2017-08-02 16:51 ` Junio C Hamano 2017-08-02 17:28 ` Jeff King 2017-08-02 12:20 ` Dave Borowitz 2017-08-02 17:18 ` Jeff King 2017-08-03 18:38 ` Michael Haggerty 2017-08-03 22:26 ` Shawn Pearce 2017-08-03 22:48 ` Michael Haggerty 2017-08-04 2:50 ` Shawn Pearce 2017-08-05 21:00 ` Shawn Pearce 2017-08-01 13:54 ` Dave Borowitz 2017-08-01 15:27 ` Shawn Pearce 2017-08-02 19:50 ` Junio C Hamano 2017-08-02 20:28 ` Jeff King 2017-08-03 22:17 ` Shawn Pearce [this message] 2017-08-03 1:50 ` Junio C Hamano 2017-08-03 2:21 ` Shawn Pearce 2017-08-03 2:36 ` Junio C Hamano 2017-08-02 19:54 ` Stefan Beller
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CAJo=hJtXaDtouGU0noE0ueHJYr872OTfSt5Y9SmQ5xAA7g_CrA@mail.gmail.com' \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --subject='Re: reftable [v4]: new ref storage format' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Code repositories for project(s) associated with this inbox: https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).