From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-2.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RCVD_IN_SORBS_SPAM, RP_MATCHES_RCVD shortcircuit=no autolearn=no autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id CD1782047F for ; Wed, 2 Aug 2017 00:49:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751554AbdHBAtw (ORCPT ); Tue, 1 Aug 2017 20:49:52 -0400 Received: from alum-mailsec-scanner-4.mit.edu ([18.7.68.15]:51993 "EHLO alum-mailsec-scanner-4.mit.edu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751029AbdHBAtv (ORCPT ); Tue, 1 Aug 2017 20:49:51 -0400 X-AuditID: 1207440f-32bff70000000b50-36-598121aed741 Received: from outgoing-alum.mit.edu (OUTGOING-ALUM.MIT.EDU [18.7.68.33]) (using TLS with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by alum-mailsec-scanner-4.mit.edu (Symantec Messaging Gateway) with SMTP id B5.94.02896.EA121895; Tue, 1 Aug 2017 20:49:50 -0400 (EDT) Received: from mail-io0-f170.google.com (mail-io0-f170.google.com [209.85.223.170]) (authenticated bits=0) (User authenticated as mhagger@ALUM.MIT.EDU) by outgoing-alum.mit.edu (8.13.8/8.12.4) with ESMTP id v720nn7F004582 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT) for ; Tue, 1 Aug 2017 20:49:50 -0400 Received: by mail-io0-f170.google.com with SMTP id c74so14551447iod.4 for ; Tue, 01 Aug 2017 17:49:49 -0700 (PDT) X-Gm-Message-State: AIVw110HbJS8UvKWY+B0jQPUcC6/sU1+fUDYzygktgeipXTHLrYtoS6Z CH5IaRNx4pPjqCL5jOZo5NvOJDMc9w== X-Received: by 10.107.187.129 with SMTP id l123mr23393909iof.122.1501634988951; Tue, 01 Aug 2017 17:49:48 -0700 (PDT) MIME-Version: 1.0 Reply-To: mhagger@alum.mit.edu Received: by 10.107.129.91 with HTTP; Tue, 1 Aug 2017 17:49:47 -0700 (PDT) In-Reply-To: References: From: Michael Haggerty Date: Tue, 1 Aug 2017 17:49:47 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: reftable [v4]: new ref storage format To: Shawn Pearce Cc: git , Jeff King , Junio C Hamano , David Borowitz Content-Type: text/plain; charset="UTF-8" X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFupjleLIzCtJLcpLzFFi42IRYndR1F2n2Bhp0PZK0aLrSjeTA6PH501y AYxRXDYpqTmZZalF+nYJXBn/jl5gKXgiUbH06z+WBsYDwl2MnBwSAiYSU0++Zu9i5OIQEtjB JPFvfj8jhPOESeLX/YUsEE4/o8TOnplsEC35Euefv2aFsMslPn1+zAhi8woISpyc+YQFxBYS kJN4teEGI4TtJbFm4yQmEJtTIFDi0tFPUEPvMEosPPCAGSTBJqArsainGayIRUBFYtXvHVDL EiWaTv0AGsQBtCBAYsELV5CwsICRRMeFfWCtIgJqEtsv9LCCzGQWmMoocWzGMbBeZgFNidbt v9knMArPQnLfLCSpBYxMqxjlEnNKc3VzEzNzilOTdYuTE/PyUot0TfRyM0v0UlNKNzFCApl/ B2PXeplDjAIcjEo8vBztDZFCrIllxZW5hxglOZiURHkVe+ojhfiS8lMqMxKLM+KLSnNSiw8x SnAwK4nwLpVujBTiTUmsrEotyodJSXOwKInzqi9R9xMSSE8sSc1OTS1ILYLJynBwKEnwuigA NQoWpaanVqRl5pQgpJk4OEGG8wANtwGp4S0uSMwtzkyHyJ9iNOa4cmXdFyaOKQe2f2ESYsnL z0uVEucNBSkVACnNKM2DmwZLRq8YxYGeE+Y9C1LFA0xkcPNeAa1iAlolWVoLsqokESEl1cAo eLBX54H8PWklttpYsThhxSrhm5/X7Tx1Zb989pF/P+zff4jX0xJY9aExkW/GWZ/Yu+W6zvwn Hzuncz0PWe34wLV94lYhzvLwtDeHt9b1Lugu+r2y9PwE/r8RH9sswk5KZAiu3//8rwV/sfDy +iusvaZhSW+qbp6xChRkbe0QS2bc8FayRUSJpTgj0VCLuag4EQCSEXaiIQMAAA== Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Tue, Aug 1, 2017 at 1:23 PM, Shawn Pearce wrote: > On Mon, Jul 31, 2017 at 11:41 PM, Michael Haggerty wrote: >> On Sun, Jul 30, 2017 at 8:51 PM, Shawn Pearce wrote: >>> 4th iteration of the reftable storage format. >>> [...] >> >> Before we commit to Shawn's reftable proposal, I wanted to explore >> what a contrasting design that is not block based would look like. So >> I threw together a sketch of such a design. It is not as refined as >> Shawn's, and I haven't had time to program an implementation or >> collect statistics, but maybe it is interesting anyway. > > Thanks for taking the time to write this out. Its constructive to see > other possible approaches. > > I roughly implemented your proposed design in JGit for references > only. I skipped the object lookup and log handling for the sake of > studying the basics of this approach. Wow, that's awesome, thanks! > | size | seek_cold | seek_hot | > mh | 28.3 M | 24.5 usec | 14.5 usec | > sp 4k | 29.2 M | 63.0 usec | 5.8 usec | > sp 64k | 27.7 M | 35.6 usec | 23.3 usec | OK, so it's at least in the right ballpark. >> * Multiple related chunks can be stored in a 64 kiB block (interesting >> for people who prefer to read files in 64k segments). By storing >> related chunks near each other, the number of seeks will probably >> typically be smaller than the number of chunks that have to be >> accessed. > > This is difficult. The most related chunks are actually index chunks > in the multi-level index. You want the next step(s) near the current > step to avoid an actual disk seek. But that is hard to do when the > index is several levels deep. Given the 64k read size that you prefer, it seems to me that it would probably be possible to keep pairs of index layers (i.e., one index node and all of its direct children) in the same 64k range, though this might require adjusting their sizes somewhat. And possibly to keep the lowest index layer close to the reference chunks that it describes (analogous to your format, where the restart records form something like an index that is located close to the references). So I would think that it is plausible to arrange things so that a reference can be read within two 64k reads even if the index has three levels. >> * The file can be written in two possible ways: with cross-references >> between chunks always pointing to later parts of the file (probably >> preferable if the data will be read from a local hard disk) or >> always pointing to earlier parts of the file (to accomodate writers >> who need to write their data in order). See the `offsets_negative` >> field in the header. (I'm not sure that the former variant is >> actually needed.) > > This seems like unnecessary complexity. I'd prefer to just say the > offsets are negative, and reference backwards. Maybe. I just wasn't sure whether a hard disk would perform significantly worse when reading backwards rather than forwards (because of pre-fetch of blocks). It could very well be unnecessary. Michael