From: Jeff King <firstname.lastname@example.org> To: Junio C Hamano <email@example.com> Cc: Shawn Pearce <firstname.lastname@example.org>, git <email@example.com>, Michael Haggerty <firstname.lastname@example.org>, David Borowitz <email@example.com> Subject: Re: reftable [v4]: new ref storage format Date: Wed, 2 Aug 2017 16:28:50 -0400 [thread overview] Message-ID: <firstname.lastname@example.org> (raw) In-Reply-To: <email@example.com> On Wed, Aug 02, 2017 at 12:50:39PM -0700, Junio C Hamano wrote: > With the traditional "packed-refs plus loose" layout, no matter how > many times a handful of selected busy refs are updated during the > day, you'd need to open at most two files to find out the current > value of a single ref (admittedly, the accessing of the second file, > after we realize that there is no loose one, would be very costly). > If you make a few commits on a topic branch A, then build a 100 > commit series on top of another topic branch B, finding the current > value of A is still one open and read of refs/heads/A. > > With the reftable format, we'd need to open and read all 100 > incremental transactions that touch branch B before realizing that > none of them talk about A, and read the next transaction file to > find the current value of A. To keep this number low, we'd need > quite a frequent compaction. I think this is where compaction cleverness can come in. One relatively easy optimization would be to note when the most recent reftable contains a subset of the refs we are currently updating (and the obvious string-of-updates to a single ref falls under that), and do a "quick" compaction where we simply drop that reftable in favor of ours. That compaction is essentially free, because we know those entries aren't valid anymore anyway. I'm actually not sure if this is a strict "drop", though, because of reflogs. If the reflog is written into the same file as the ref update, then you'd need to roll its entry into your new update, too. But see below anyway. For more complicated cases, there's some cleverness you can do with small on-the-fly compactions. Even if there are entries in the last few reftables that we're not currently updating, it's pretty cheap to roll a few of them up into our new reftable if it lets us drop some intermediate reftables. E.g., if we're writing a new reftable with a 4K block size but only have 100 bytes of new data, we're probably best to roll up a previous 500-byte reftable. That one's an obvious optimization because we know that the filesystem is going to make us spend 4K either way, so rounding up to that is generally free-ish. What's less obvious is when we should roll up a bunch of 4K tables into one (let's say) 32K table. I think there's a formula for doing this geometrically so that the amortized cost of writing stays under a certain bound (linear?). But I haven't thought it through (or looked it up); I was hoping Shawn had already done so and could dump that wisdom on us. > We can just declare that reftable format is not for desktop clients > but for server implementations where frequent compaction would not > be an annoyance to the users, but I'd wish we do not have to. Yeah, I agree we should avoid that. I would love to eventually kill off the loose/packed backend (or at least make it historical and read-only, so that a reasonable response to people complaining about its races and lack of atomicity is "please upgrade to reftables"). -Peff
next prev parent reply other threads:[~2017-08-02 20:28 UTC|newest] Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-07-31 3:51 Shawn Pearce 2017-07-31 17:41 ` Dave Borowitz 2017-07-31 19:01 ` Stefan Beller 2017-07-31 23:05 ` Shawn Pearce 2017-07-31 19:42 ` Junio C Hamano 2017-07-31 23:43 ` Shawn Pearce 2017-08-01 16:08 ` Shawn Pearce 2017-08-01 6:41 ` Michael Haggerty 2017-08-01 20:23 ` Shawn Pearce 2017-08-02 0:49 ` Michael Haggerty 2017-08-01 23:27 ` Shawn Pearce 2017-08-01 23:54 ` Shawn Pearce 2017-08-02 1:51 ` Michael Haggerty 2017-08-02 2:38 ` Shawn Pearce 2017-08-02 9:28 ` Jeff King 2017-08-02 15:17 ` Shawn Pearce 2017-08-02 16:51 ` Junio C Hamano 2017-08-02 17:28 ` Jeff King 2017-08-02 12:20 ` Dave Borowitz 2017-08-02 17:18 ` Jeff King 2017-08-03 18:38 ` Michael Haggerty 2017-08-03 22:26 ` Shawn Pearce 2017-08-03 22:48 ` Michael Haggerty 2017-08-04 2:50 ` Shawn Pearce 2017-08-05 21:00 ` Shawn Pearce 2017-08-01 13:54 ` Dave Borowitz 2017-08-01 15:27 ` Shawn Pearce 2017-08-02 19:50 ` Junio C Hamano 2017-08-02 20:28 ` Jeff King [this message] 2017-08-03 22:17 ` Shawn Pearce 2017-08-03 1:50 ` Junio C Hamano 2017-08-03 2:21 ` Shawn Pearce 2017-08-03 2:36 ` Junio C Hamano 2017-08-02 19:54 ` Stefan Beller
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --subject='Re: reftable [v4]: new ref storage format' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Code repositories for project(s) associated with this inbox: https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).