Date: Fri, 3 Jan 2020 15:15:32 -0500
From: Konstantin Ryabitsev
To: meta@public-inbox.org
Subject: Limited-history local archives
Message-ID: <20200103201532.gv4rdotwuiv7ieiy@chatter.i7.local>

Hi, all:

I wonder if it would be useful to have a feature allowing someone to run
a limited-history local copy of a larger remote archive -- for example
if someone only wanted a 3-month copy of LKML instead of the whole
20-year enchilada.

It's possible to accomplish this with git already [^1], e.g. you can use
the following to grab a copy of LKML starting with December 2019:

    $ git clone --bare --shallow-since 2019-12-01 https://lore.kernel.org/lkml/git/7 lkml-since-dec.git
    $ cd lkml-since-dec.git
    $ git config --add remote.origin.fetch '+refs/heads/master:refs/heads/master'

You can now run "git fetch" as usual and perform all the normal
operations, such as "git show {rev}:m" to get the message contents.

Obviously, if we try to get a revision from before December 1, the
operation fails:

    $ git show dae740ca679710fbe8b97b3e704d63e3e7883fd9:m
    fatal: Path 'm' does not exist in 'dae740ca679710fbe8b97b3e704d63e3e7883fd9'

If we enable uploadpack.allowAnySHA1InWant on the server, we can then
fetch this object directly:

    $ git fetch --depth 1 origin dae740ca679710fbe8b97b3e704d63e3e7883fd9
    remote: Counting objects: 3, done.
    remote: Compressing objects: 100% (2/2), done.
    remote: Total 3 (delta 0), reused 3 (delta 0)
    Unpacking objects: 100% (3/3), done.
    From https://lore.kernel.org/lkml/git/7
     * branch            dae740ca679710fbe8b97b3e704d63e3e7883fd9 -> FETCH_HEAD

Now this succeeds:

    $ git show dae740ca679710fbe8b97b3e704d63e3e7883fd9:m

We can then periodically reshallow the archive (e.g. once a day) in
order to get rid of older objects:

    $ git fetch --shallow-since 2019-12-15 --update-shallow origin master
    $ git gc --prune=now

There isn't really an RFC or anything associated with this -- I just
wanted to share this idea as a possibly useful way of reducing local
storage requirements while still being able to operate directly on
public-inbox git repositories -- e.g. with a tool like l2md
(https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/l2md.git/).

-K

[^1]: Theoretically, this will become even easier in the future with
      partial-clone functionality, though I believe that's mostly
      written to support fetching large blobs from CDNs and wouldn't be
      as useful for very linear public-inbox repositories.
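
The clone, on-demand fetch, and daily reshallow steps in the message
above could be wrapped in a small script along these lines. This is only
a sketch, not part of public-inbox or l2md: the variable names
(ARCHIVE_URL, MIRROR_DIR, WINDOW_DAYS) and function names are made up for
illustration, the rolling 90-day window is an assumed default, and the
relative-date arithmetic assumes GNU date. The git invocations are the
ones shown in the message.

```shell
#!/bin/sh
# Sketch: keep a rolling, limited-history mirror of one public-inbox epoch.
# All names below are hypothetical; adjust them to your setup.
set -eu

ARCHIVE_URL=${ARCHIVE_URL:-https://lore.kernel.org/lkml/git/7}
MIRROR_DIR=${MIRROR_DIR:-lkml-recent.git}
WINDOW_DAYS=${WINDOW_DAYS:-90}

# Print the date N days ago as YYYY-MM-DD (UTC).
# Assumes GNU date; BSD/macOS date does not accept "-d '<N> days ago'".
cutoff_date() {
    date -u -d "$1 days ago" +%Y-%m-%d
}

# First run: bare shallow clone containing only commits newer than the cutoff,
# plus the refspec that lets a plain "git fetch" update master.
init_mirror() {
    git clone --bare --shallow-since "$(cutoff_date "$WINDOW_DAYS")" \
        "$ARCHIVE_URL" "$MIRROR_DIR"
    git -C "$MIRROR_DIR" config --add remote.origin.fetch \
        '+refs/heads/master:refs/heads/master'
}

# Periodic run: move the shallow boundary forward, then drop the
# objects that are no longer reachable.
reshallow() {
    git -C "$MIRROR_DIR" fetch --shallow-since "$(cutoff_date "$WINDOW_DAYS")" \
        --update-shallow origin master
    git -C "$MIRROR_DIR" gc --prune=now
}

# Fetch a single older commit on demand. This only works if the server
# has uploadpack.allowAnySHA1InWant enabled.
fetch_commit() {
    git -C "$MIRROR_DIR" fetch --depth 1 origin "$1"
}

# Dispatch on the first argument; with no argument, only define the functions.
case "${1:-}" in
    init)      init_mirror ;;
    reshallow) reshallow ;;
    fetch)     fetch_commit "$2" ;;
esac
```

Saved under a name of your choosing, the script would be run once with
"init" and then with "reshallow" from a daily cron job; "fetch {rev}"
pulls in one older commit so that "git show {rev}:m" can read it.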