From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-2.9 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 4591820357 for ; Mon, 17 Jul 2017 19:51:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751335AbdGQTvq (ORCPT ); Mon, 17 Jul 2017 15:51:46 -0400 Received: from mail-pg0-f49.google.com ([74.125.83.49]:35683 "EHLO mail-pg0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751316AbdGQTvp (ORCPT ); Mon, 17 Jul 2017 15:51:45 -0400 Received: by mail-pg0-f49.google.com with SMTP id v190so26949353pgv.2 for ; Mon, 17 Jul 2017 12:51:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version; bh=DcHGtlHd4M3Qlwf+SQQRZW8UkhDiu/emVrVuduN0qdg=; b=WHJ1oUVUnCg1P827jlzam3wWElkNDBLVnJ5B4UTUOby35jOrLRi3pWTGEmx+gESIXC 2UaSZvG3FAUxQwRxD9kETUcR9M+kdGqhNmYiukM9wHzEP5Te54NhfDQyQ1I7Obpf5KtS MCCq65EiUyFf1GG/oasYg6edSGRdXLtonGdtT/upVMsuC98OPFkFH3xc54ZLHfDKe9nm 4Ds9LK739FtPzfdMGxpYKufW5Dq+EDls6E7b5g+Vp1vh595zl5A6VgBhrIOvF4NXzLRX WOyKHa1eXZvfZNwIjeRV4zhBuBFmEWVXKNpaW2kjDRUwJEINhbwBvBXeyzsbS/o0rSsm FgDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:references:date :in-reply-to:message-id:user-agent:mime-version; bh=DcHGtlHd4M3Qlwf+SQQRZW8UkhDiu/emVrVuduN0qdg=; b=JkO8eRo+VWzrpnFnnQsU0PVGUsKqwMCp7pX6CoybzBtuOzootPsu5jTTNoKn8GIH0B g6W4XvayMUXuyZ/z9Ekt5/SFQanbK4H9WXhyxE1mpS2Q5iD0BsyNdsqpbsyVAhcDCcJZ RfOHJeKIqZbEENVhZnXc7davRgFlDZ+B7EAqb2Sbl0Ovi+nu7nlELvk3pRXF5MOAYSwo MF+fzxF7jVe1FPnuefo6xJ21TyqsrvJRBL4d3yi5nBqojocYvzNsDAUtADkX7Fvh8qCV PpOWnljV6lTXdOWH+MM7tEU6uOMjKsnsGw39/jk1d9WR3HIdgChw+bz6wk9oOtvoJ39q dYKg== X-Gm-Message-State: AIVw112a/oaoVQ6GsdkC2EWgKUtl1jkA2rqKuP+pawYWzFoDxcEipKLV Z1d/P1vP87kPfo8XHB8= X-Received: by 10.84.195.3 with SMTP id i3mr31440832pld.65.1500321104872; Mon, 17 Jul 2017 12:51:44 -0700 (PDT) Received: from localhost ([2620:0:1000:8622:480d:579b:4a3b:27f3]) by smtp.gmail.com with ESMTPSA id g7sm139661pfj.29.2017.07.17.12.51.43 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Mon, 17 Jul 2017 12:51:43 -0700 (PDT) From: Junio C Hamano To: Shawn Pearce Cc: git Subject: Re: reftable [v2]: new ref storage format References: Date: Mon, 17 Jul 2017 12:51:42 -0700 In-Reply-To: (Shawn Pearce's message of "Mon, 17 Jul 2017 08:01:44 -0700") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Shawn Pearce writes: > This is an updated draft after discussion on list with Peff, Michael > Haggerty, and Dave Borowitz. > > You can read a rendered version of this here: > https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md > > Biggest changes from the first proposal are: > > - reflog is now integrated into reftable > - block headers slightly different > - Peff's stack management idea is used > - Michael's compaction idea is used Just a few comments. > A variable number of 4-byte `restart_offset` values follows the > records. Offsets are relative to the start of the block (0 in first > block to include file header) and refer to the first byte of any > `ref_record` whose name has not been prefixed compressed. Readers can > start linear scans from any of these records. It is unclear what "0 in first block to include file header" wants to say. Do I write "8" if I want to express the offset of the first record in the first block, or do I write "0"? > The 2-byte `number_of_restarts + 1` stores the number of entries in > the `restart_offset` list. It is unclear whose responsibility it is to add this "1". Does this mean a reader thinks there is one entry in the restart table when it sees "0" in the number_of_restarts field (hence you can have max 65536 entries in total)? > Readers can use the restart count to binary search between restarts > before starting a linear scan. The `number_of_restarts` field must be > the last 2 bytes of the block as specified by `block_len` from the > block header. Does the new term "restart count" mean the same thing as number_of_restarts? > ### Log block format > > A log block is written as: > > 'g' > uint24( block_len ) > zlib_deflate { > log_record* > int32( restart_offset )* > int16( number_of_restarts ) > } > > Log blocks look similar to ref blocks, except `block_type = 'g'`. Is there a general recommended strategy for writers to choose how many entries to include in a single physical block? I understand that the deflated whole must fit in the physical block whose length is defined in the footer of the whole file, and in general you would not know how small the data deflates down to before compressing, right? > Log record keys are structured as: > > ref_name '\0' reverse_int32( time_sec ) > > where `time_sec` is the update time in seconds since the epoch. The > `reverse_int32` function inverses the value so lexographical ordering > the network byte order time sorts more recent records first: > > reverse_int(int32 t) { > return 0xffffffff - t; > } Is 2038 an issue, or by that time we'd all be retired together with this file format and it won't be our problem? As the file format uses delta compression with restarts, a reader needs to sequencially scan some bounded number of entries to get the contents of a specific entry anyway, so I am wondering if it is worth storing a longer timestamp in varint() for an restart entry and express the timestamp on delta entries as difference to the previous entry. > ### Log index > > The log index stores the log key (`refname \0 reverse_int32(time_sec)`) > for the last log record of every log block in the file, supporting > bounded-time lookup. This assumes that timestamps never wildly rewind in the reflog, doesn't it? Is that a sensible assumption? Or does "the last log record" in the above mean "the log record with largest timestamp? ... ah, not that is still not sufficient. You'd still need to assume that timestamps on entries in one log block must be all older than the ones on entries in later log blocks. Hmph... Also it is not clear to me how reflogs for two refs would be intermixed in the log blocks, and what log keys for the entries must be recorded in the log index, to facilitate efficient lookup. Is it assumed that a consecutive sequence of log blocks record reflogs for the same ref, before the sequence of log blocks switch to record reflogs for another ref, or something? > A log index block must be written if 2 or more log blocks are written > to the file. If present, the log index appears after the first log > block. There is no padding used to align the log index to block > alignment. > > Log index format is identical to ref index, except the keys are 5 > bytes longer to include `'\0'` and the 4-byte `reverse_int32(time)`. > Records use `block_offset` to refer to the start of a log block. I am assuming that we do not care about being able to quickly determine master@{24028}; I would imagine that it wouldn't be too hard to add an index to help such query, but I offhand would not know the details until I figure out how the format handles reflog entries for multiple refs first.