From: Shawn Pearce <email@example.com> To: Michael Haggerty <firstname.lastname@example.org> Cc: git <email@example.com>, Jeff King <firstname.lastname@example.org>, Junio C Hamano <email@example.com>, David Borowitz <firstname.lastname@example.org> Subject: Re: reftable [v4]: new ref storage format Date: Tue, 1 Aug 2017 13:23:18 -0700 [thread overview] Message-ID: <CAJo=hJtrdCOF-RxzXfyLx7R-1f2-7pZVO_UOg28J=wUDNdf3yw@mail.gmail.com> (raw) In-Reply-To: <CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@mail.gmail.com> On Mon, Jul 31, 2017 at 11:41 PM, Michael Haggerty <email@example.com> wrote: > On Sun, Jul 30, 2017 at 8:51 PM, Shawn Pearce <firstname.lastname@example.org> wrote: >> 4th iteration of the reftable storage format. >> [...] > > Before we commit to Shawn's reftable proposal, I wanted to explore > what a contrasting design that is not block based would look like. So > I threw together a sketch of such a design. It is not as refined as > Shawn's, and I haven't had time to program an implementation or > collect statistics, but maybe it is interesting anyway. Thanks for taking the time to write this out. Its constructive to see other possible approaches. I roughly implemented your proposed design in JGit for references only. I skipped the object lookup and log handling for the sake of studying the basics of this approach. | size | seek_cold | seek_hot | mh | 28.3 M | 24.5 usec | 14.5 usec | sp 4k | 29.2 M | 63.0 usec | 5.8 usec | sp 64k | 27.7 M | 35.6 usec | 23.3 usec | All tests were run with a 4K chunk/block size on an 866k ref example. The mh approach does not use padding between chunks, so chunks are variable sized, but not exceeding 4K. The differences between cold and hot is whether or not the file was held open, and the root of the ref index stays in memory. In the mh approach with a 4K chunk size and 866k refs, the index is 2 levels deep. A cold seek needs to perform 3 disk reads to navigate the index, hot seek is 2 disk reads. > It seems to me that the block-orientation of Shawn's spec adds a lot > of complication to the design (e.g., padding, optional blocks, varying > block sizes for different parts of the file). It also seems that the > format has been heavily optimized to be read in 64k blocks, whereas > most users will be storing their references to local hard disks or > (increasingly) to local SSDs. So I tried to come up with a design that > is friendlier to the latter, hopefully without being much worse for > the former. Yes, my design is biased to run well on a 64k block size, where the underlying disk is slow. Ideally reads require only one or two block reads. Torn blocks (where a range of data lies on two different blocks) are expensive, as both 64k blocks have to be loaded to acquire data that lies across the boundary. Given how fast SSDs are, I don't think this causes any problems for SSD users. > One thing that I realized is that, assuming that reference values and > reflogs are usually compacted separately, there will basically be > three types of reftable files: > > 1. Accumulated references and their values (potentially very large) > > 2. Accumulated reflogs (potentially gigantic) > > 3. Recent transaction(s), which include both refs and reflogs (usually > quite small). > > The first two types of file don't benefit from separating reference > data from reflog data in the file (because they only include one or > the other). The third kind of file doesn't care because such files > will be small. In principal, I agree with your simplification. The important thing is keeping the reflog away from the accumulated references, such that scans of references (e.g. current ls-remote advertisement) is efficient. I think I was also aiming to do this even for the transaction log files, as ls-remote operations don't need the log record data. > So let's store all of the data related to a single reference in one > spot. This simplifies the file format and removes the need for two > separate indexes and/or variants that special-case the omission of an > index. A disadvantage is that, it prevents compression of reflog > entries across references. I think that is a reasonable tradeoff, > because most reflog entries (in the repositories I checked; that might > differ in Gerrit-managed repos) correspond to references that have > been updated many times. To partly compensate for the resulting > reduction in compression, I added some features to reduce the storage > of redundant information in reflog entries. I found the encoding of reflog entries below somewhat complicated, but I haven't been able to implement it to compare storage size vs. the deflated block I proposed. > On the other hand, this design does not support binary searching for > refnames. Instead, the refnames are distributed into a multiple-level > tree. When looking for a reference, each tree node on the way to the > reference needs to be parsed serially. But the nodes are small enough > (due to the multilevel indexing scheme) that I hope that isn't a > problem. I'd rather read less data at the expense of parsing a little > bit more. Looking at the numbers above, the binary search within a 4k block does reduce lookup time. E.g. reftable has ~7 binary search restart points within each 4k block, vs. your approach needing to parse up to 119 refs in a 4k chunk. > * Multiple related chunks can be stored in a 64 kiB block (interesting > for people who prefer to read files in 64k segments). By storing > related chunks near each other, the number of seeks will probably > typically be smaller than the number of chunks that have to be > accessed. This is difficult. The most related chunks are actually index chunks in the multi-level index. You want the next step(s) near the current step to avoid an actual disk seek. But that is hard to do when the index is several levels deep. > * The file can be written in two possible ways: with cross-references > between chunks always pointing to later parts of the file (probably > preferable if the data will be read from a local hard disk) or > always pointing to earlier parts of the file (to accomodate writers > who need to write their data in order). See the `offsets_negative` > field in the header. (I'm not sure that the former variant is > actually needed.) This seems like unnecessary complexity. I'd prefer to just say the offsets are negative, and reference backwards.
next prev parent reply other threads:[~2017-08-01 20:23 UTC|newest] Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-07-31 3:51 Shawn Pearce 2017-07-31 17:41 ` Dave Borowitz 2017-07-31 19:01 ` Stefan Beller 2017-07-31 23:05 ` Shawn Pearce 2017-07-31 19:42 ` Junio C Hamano 2017-07-31 23:43 ` Shawn Pearce 2017-08-01 16:08 ` Shawn Pearce 2017-08-01 6:41 ` Michael Haggerty 2017-08-01 20:23 ` Shawn Pearce [this message] 2017-08-02 0:49 ` Michael Haggerty 2017-08-01 23:27 ` Shawn Pearce 2017-08-01 23:54 ` Shawn Pearce 2017-08-02 1:51 ` Michael Haggerty 2017-08-02 2:38 ` Shawn Pearce 2017-08-02 9:28 ` Jeff King 2017-08-02 15:17 ` Shawn Pearce 2017-08-02 16:51 ` Junio C Hamano 2017-08-02 17:28 ` Jeff King 2017-08-02 12:20 ` Dave Borowitz 2017-08-02 17:18 ` Jeff King 2017-08-03 18:38 ` Michael Haggerty 2017-08-03 22:26 ` Shawn Pearce 2017-08-03 22:48 ` Michael Haggerty 2017-08-04 2:50 ` Shawn Pearce 2017-08-05 21:00 ` Shawn Pearce 2017-08-01 13:54 ` Dave Borowitz 2017-08-01 15:27 ` Shawn Pearce 2017-08-02 19:50 ` Junio C Hamano 2017-08-02 20:28 ` Jeff King 2017-08-03 22:17 ` Shawn Pearce 2017-08-03 1:50 ` Junio C Hamano 2017-08-03 2:21 ` Shawn Pearce 2017-08-03 2:36 ` Junio C Hamano 2017-08-02 19:54 ` Stefan Beller
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CAJo=hJtrdCOF-RxzXfyLx7R-1f2-7pZVO_UOg28J=wUDNdf3yw@mail.gmail.com' \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --subject='Re: reftable [v4]: new ref storage format' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Code repositories for project(s) associated with this inbox: https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).