From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id B77E31F463;
	Fri, 3 Jan 2020 22:02:50 +0000 (UTC)
Date: Fri, 3 Jan 2020 22:02:50 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: Re: Limited-history local archives
Message-ID: <20200103220250.GA11789@dcvr>
References: <20200103201532.gv4rdotwuiv7ieiy@chatter.i7.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20200103201532.gv4rdotwuiv7ieiy@chatter.i7.local>
List-Id:

Konstantin Ryabitsev wrote:
> Hi, all:
>
> I wonder if it would be useful to have a feature allowing someone to run
> a limited-history local copy of a larger remote archive -- for example
> if someone only wanted a 3-month copy of LKML instead of the whole
> 20-year enchilada.

Yes.

> It's possible to accomplish this with git already [^1], e.g. you can use
> the following to grab a copy of LKML starting with December 2019:
>
> $ git clone --bare --shallow-since 2019-12-01 https://lore.kernel.org/lkml/git/7 lkml-since-dec.git
> $ cd lkml-since-dec.git
> $ git config --add remote.origin.fetch '+refs/heads/master:refs/heads/master'
>
> You can now run "git fetch" as usual and perform all the normal
> operations, such as "git show {rev}:m" to get the message contents.
> Obviously, if we try to get a revision from before December 1, the
> operation fails:
>
> $ git show dae740ca679710fbe8b97b3e704d63e3e7883fd9:m
> fatal: Path 'm' does not exist in 'dae740ca679710fbe8b97b3e704d63e3e7883fd9'
>
> If we enable uploadpack.allowAnySHA1InWant on the server, we can then
> fetch this object directly:

Usability-wise, git itself seems pretty bad at this...
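To make the usability problem concrete: every reader of such an archive
ends up scripting something like the following. This is a minimal sketch
under stated assumptions, not anything shipped by git or public-inbox:
show_msg is a hypothetical helper, it must be run inside a shallow clone
like the one quoted above whose remote is named "origin", and the fetch
fallback only works if the server enables uploadpack.allowAnySHA1InWant.

```shell
# show_msg REV: hypothetical helper (not part of git or public-inbox).
# Prints the raw message (the "m" blob) of a public-inbox commit.
# If the commit is outside the local shallow history, fetch just that
# commit at depth 1 and try again.
# Assumes: cwd is the shallow clone, the remote is "origin", and the
# server sets uploadpack.allowAnySHA1InWant=true.
show_msg() {
	rev=$1
	# fast path: the commit is already inside the shallow window
	if git show "$rev:m" 2>/dev/null; then
		return 0
	fi
	# slow path: fetch only that commit (no history), then retry
	git fetch -q --depth 1 origin "$rev" && git show "$rev:m"
}
```

e.g. inside lkml-since-dec.git:
show_msg dae740ca679710fbe8b97b3e704d63e3e7883fd9 | less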
I haven't looked deeply at this, but could/should public-inbox
enable allowAnySHA1InWant by default?

> $ git fetch --depth 1 origin dae740ca679710fbe8b97b3e704d63e3e7883fd9
> remote: Counting objects: 3, done.
> remote: Compressing objects: 100% (2/2), done.
> remote: Total 3 (delta 0), reused 3 (delta 0)
> Unpacking objects: 100% (3/3), done.
> From https://lore.kernel.org/lkml/git/7
> * branch dae740ca679710fbe8b97b3e704d63e3e7883fd9 -> FETCH_HEAD
>
> Now this succeeds:
>
> $ git show dae740ca679710fbe8b97b3e704d63e3e7883fd9:m
>
> We can then periodically reshallow the archive (e.g. once a day) in
> order to get rid of older objects:
>
> $ git fetch --shallow-since 2019-12-15 --update-shallow origin master
> $ git gc --prune=now
>
> There isn't really an RFC or anything associated with this -- I just
> wanted to share this idea as a possibly useful way of reducing local
> storage requirements while still being able to operate directly on
> public-inbox git repositories -- e.g. with a tool like l2md
> (https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/l2md.git/).

Given allowAnySHA1InWant isn't enabled by default on servers today,
and the number of commands needed on the client, I'm not sure git
is really great for people who want to read mail locally...

POST + "&x=m" search queries are the easiest alternative, I think:

	curl -X POST "$INBOX_URL/?q=d:$YYYYMMDD..&x=m" >mboxrd.gz

(but I wish MUAs could keep track of which messages I've read in
between queries)

And NNTP, which ought to be tunnel-able over HTTPS CONNECT.

> [^1]: Theoretically, this will become even easier in the future with
>       partial-clone functionality, though I believe that's mostly
>       written to support fetching large blobs from CDNs and wouldn't be
>       as useful for very linear public-inbox repositories.

Fwiw, I really wish "git --git-dir=$URL any-read-only-command" could
work one day like it does with SVN.
WebDAV would've been nice, but AFAIK davfs2 doesn't support Range:
(yet...), and having to mount FSes is a drag...
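For the rolling-window case ("only the last 3 months"), the POST query
above just needs a computed start date. A minimal sketch, assuming GNU
date(1) for the "N days ago" arithmetic and that the URL points at a
public-inbox HTTP instance; mbox_query_url and recent_mboxrd are
hypothetical names. The response is gzipped mboxrd (hence ">mboxrd.gz"
above), so the wrapper pipes it through gzip -dc.

```shell
# mbox_query_url URL YYYYMMDD: hypothetical helper that builds the search
# URL for every message dated on or after YYYYMMDD ("d:YYYYMMDD.." is an
# open-ended date range; "x=m" asks for mboxrd). Strips a trailing "/".
mbox_query_url() {
	printf '%s/?q=d:%s..&x=m\n' "${1%/}" "$2"
}

# recent_mboxrd URL DAYS: hypothetical wrapper that writes the last DAYS
# days of messages to stdout as plain (decompressed) mboxrd.
# Assumes GNU date(1): "-d 'N days ago'" is not POSIX.
recent_mboxrd() {
	since=$(date -u -d "$2 days ago" +%Y%m%d) || return
	curl -sSf -X POST "$(mbox_query_url "$1" "$since")" | gzip -dc
}
```

e.g. recent_mboxrd https://public-inbox.org/meta 90 >meta-recent.mbox
Unlike the shallow git clone, nothing local needs reshallowing or gc;
each run is just a fresh query (at the cost of re-downloading).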