From: Michael Haggerty <email@example.com> To: Shawn Pearce <firstname.lastname@example.org> Cc: git <email@example.com>, Jeff King <firstname.lastname@example.org>, Junio C Hamano <email@example.com>, David Borowitz <firstname.lastname@example.org> Subject: Re: reftable [v4]: new ref storage format Date: Tue, 1 Aug 2017 17:49:47 -0700 [thread overview] Message-ID: <CAMy9T_FA1RV+NFxaXR65gw7G6OxL7Z8Ve3VNLrF4oyCdqtdahg@mail.gmail.com> (raw) In-Reply-To: <CAJo=hJtrdCOF-RxzXfyLx7R-1f2-7pZVO_UOg28J=wUDNdf3yw@mail.gmail.com> On Tue, Aug 1, 2017 at 1:23 PM, Shawn Pearce <email@example.com> wrote: > On Mon, Jul 31, 2017 at 11:41 PM, Michael Haggerty <firstname.lastname@example.org> wrote: >> On Sun, Jul 30, 2017 at 8:51 PM, Shawn Pearce <email@example.com> wrote: >>> 4th iteration of the reftable storage format. >>> [...] >> >> Before we commit to Shawn's reftable proposal, I wanted to explore >> what a contrasting design that is not block based would look like. So >> I threw together a sketch of such a design. It is not as refined as >> Shawn's, and I haven't had time to program an implementation or >> collect statistics, but maybe it is interesting anyway. > > Thanks for taking the time to write this out. Its constructive to see > other possible approaches. > > I roughly implemented your proposed design in JGit for references > only. I skipped the object lookup and log handling for the sake of > studying the basics of this approach. Wow, that's awesome, thanks! > | size | seek_cold | seek_hot | > mh | 28.3 M | 24.5 usec | 14.5 usec | > sp 4k | 29.2 M | 63.0 usec | 5.8 usec | > sp 64k | 27.7 M | 35.6 usec | 23.3 usec | OK, so it's at least in the right ballpark. >> * Multiple related chunks can be stored in a 64 kiB block (interesting >> for people who prefer to read files in 64k segments). By storing >> related chunks near each other, the number of seeks will probably >> typically be smaller than the number of chunks that have to be >> accessed. > > This is difficult. The most related chunks are actually index chunks > in the multi-level index. You want the next step(s) near the current > step to avoid an actual disk seek. But that is hard to do when the > index is several levels deep. Given the 64k read size that you prefer, it seems to me that it would probably be possible to keep pairs of index layers (i.e., one index node and all of its direct children) in the same 64k range, though this might require adjusting their sizes somewhat. And possibly to keep the lowest index layer close to the reference chunks that it describes (analogous to your format, where the restart records form something like an index that is located close to the references). So I would think that it is plausible to arrange things so that a reference can be read within two 64k reads even if the index has three levels. >> * The file can be written in two possible ways: with cross-references >> between chunks always pointing to later parts of the file (probably >> preferable if the data will be read from a local hard disk) or >> always pointing to earlier parts of the file (to accomodate writers >> who need to write their data in order). See the `offsets_negative` >> field in the header. (I'm not sure that the former variant is >> actually needed.) > > This seems like unnecessary complexity. I'd prefer to just say the > offsets are negative, and reference backwards. Maybe. I just wasn't sure whether a hard disk would perform significantly worse when reading backwards rather than forwards (because of pre-fetch of blocks). It could very well be unnecessary. Michael
next prev parent reply other threads:[~2017-08-02 0:49 UTC|newest] Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-07-31 3:51 Shawn Pearce 2017-07-31 17:41 ` Dave Borowitz 2017-07-31 19:01 ` Stefan Beller 2017-07-31 23:05 ` Shawn Pearce 2017-07-31 19:42 ` Junio C Hamano 2017-07-31 23:43 ` Shawn Pearce 2017-08-01 16:08 ` Shawn Pearce 2017-08-01 6:41 ` Michael Haggerty 2017-08-01 20:23 ` Shawn Pearce 2017-08-02 0:49 ` Michael Haggerty [this message] 2017-08-01 23:27 ` Shawn Pearce 2017-08-01 23:54 ` Shawn Pearce 2017-08-02 1:51 ` Michael Haggerty 2017-08-02 2:38 ` Shawn Pearce 2017-08-02 9:28 ` Jeff King 2017-08-02 15:17 ` Shawn Pearce 2017-08-02 16:51 ` Junio C Hamano 2017-08-02 17:28 ` Jeff King 2017-08-02 12:20 ` Dave Borowitz 2017-08-02 17:18 ` Jeff King 2017-08-03 18:38 ` Michael Haggerty 2017-08-03 22:26 ` Shawn Pearce 2017-08-03 22:48 ` Michael Haggerty 2017-08-04 2:50 ` Shawn Pearce 2017-08-05 21:00 ` Shawn Pearce 2017-08-01 13:54 ` Dave Borowitz 2017-08-01 15:27 ` Shawn Pearce 2017-08-02 19:50 ` Junio C Hamano 2017-08-02 20:28 ` Jeff King 2017-08-03 22:17 ` Shawn Pearce 2017-08-03 1:50 ` Junio C Hamano 2017-08-03 2:21 ` Shawn Pearce 2017-08-03 2:36 ` Junio C Hamano 2017-08-02 19:54 ` Stefan Beller
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=CAMy9T_FA1RV+NFxaXR65gw7G6OxL7Z8Ve3VNLrF4oyCdqtdahg@mail.gmail.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --subject='Re: reftable [v4]: new ref storage format' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Code repositories for project(s) associated with this inbox: https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).