user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: Re: Limited-history local archives
Date: Fri, 3 Jan 2020 22:02:50 +0000
Message-ID: <20200103220250.GA11789@dcvr>
In-Reply-To: <20200103201532.gv4rdotwuiv7ieiy@chatter.i7.local>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Hi, all:
> 
> I wonder if it would be useful to have a feature allowing someone to run 
> a limited-history local copy of a larger remote archive -- for example 
> if someone only wanted a 3-month copy of LKML instead of the whole 
> 20-year enchilada.

Yes.

> It's possible to accomplish this with git already [^1], e.g. you can use 
> the following to grab a copy of LKML starting with December 2019:
> 
>   $ git clone --bare --shallow-since 2019-12-01 https://lore.kernel.org/lkml/git/7 lkml-since-dec.git
>   $ cd lkml-since-dec.git
>   $ git config --add remote.origin.fetch '+refs/heads/master:refs/heads/master'
> 
> You can now run "git fetch" as usual and perform all the normal 
> operations, such as "git show {rev}:m" to get the message contents.  
> Obviously, if we try to get a revision from before December 1, the 
> operation fails:
> 
>   $ git show dae740ca679710fbe8b97b3e704d63e3e7883fd9:m
>   fatal: Path 'm' does not exist in 'dae740ca679710fbe8b97b3e704d63e3e7883fd9'
> 
> If we enable uploadpack.allowAnySHA1InWant on the server, we can then 
> fetch this object directly:

Usability-wise, git itself seems pretty bad at this...

I haven't looked deeply at this, but could/should public-inbox
enable allowAnySHA1InWant by default?
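For comparison, an admin can already opt in per repository today. A minimal sketch, assuming a v2 inbox whose epochs live under git/N.git inside the inbox directory; the function name is made up, and whether public-inbox should do this by default is the open question above:

```shell
# Hypothetical helper: turn on uploadpack.allowAnySHA1InWant for every
# epoch repository of a v2 inbox (assumed layout: $INBOX_DIR/git/N.git),
# so clients may fetch object IDs the server does not advertise.
enable_any_sha1_in_want() { # INBOX_DIR
	inbox_dir=$1
	for repo in "$inbox_dir"/git/*.git; do
		# If nothing matches, the glob stays literal; skip it.
		[ -d "$repo" ] || continue
		git --git-dir="$repo" config uploadpack.allowAnySHA1InWant true ||
			return 1
	done
}
```

Usage would be e.g. enable_any_sha1_in_want /path/to/inbox. Note that the option lets clients ask for any object at all, not only ones reachable from advertised refs, which is the trade-off a default would have to weigh.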

>   $ git fetch --depth 1 origin dae740ca679710fbe8b97b3e704d63e3e7883fd9
>   remote: Counting objects: 3, done.
>   remote: Compressing objects: 100% (2/2), done.
>   remote: Total 3 (delta 0), reused 3 (delta 0)
>   Unpacking objects: 100% (3/3), done.
>   From https://lore.kernel.org/lkml/git/7
>    * branch              dae740ca679710fbe8b97b3e704d63e3e7883fd9 -> FETCH_HEAD
> 
> Now this succeeds:
> 
>   $ git show dae740ca679710fbe8b97b3e704d63e3e7883fd9:m
> 
> We can then periodically reshallow the archive (e.g. once a day) in 
> order to get rid of older objects:
> 
>   $ git fetch --shallow-since 2019-12-15 --update-shallow origin master
>   $ git gc --prune=now
> 
> There isn't really an RFC or anything associated with this -- I just 
> wanted to share this idea as a possibly useful way of reducing local 
> storage requirements while still being able to operate directly on 
> public-inbox git repositories -- e.g. with a tool like l2md 
> (https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/l2md.git/).

Given that allowAnySHA1InWant isn't enabled by default on servers
today, and given the number of commands needed on the client,
I'm not sure git is really great for people who want to read
mail locally...
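To cut down on the number of commands, the client-side steps quoted above could be wrapped in a few shell functions. This is only a sketch: the function names (cutoff_date, lh_clone, lh_reshallow, lh_show) are made up, GNU date(1) is assumed for the "N days ago" arithmetic, and lh_show's fallback only works against servers that allow unadvertised object IDs:

```shell
# Hypothetical wrappers around the client-side git commands quoted
# above. Names are made up; GNU date(1) is assumed for "N days ago".

# Print the UTC date (YYYY-MM-DD) that lies DAYS days in the past.
cutoff_date() { # DAYS
	date -u -d "$1 days ago" +%Y-%m-%d
}

# One-time setup: shallow bare clone of a single epoch, then add the
# fetch refspec that bare clones lack so plain "git fetch" updates master.
lh_clone() { # URL DIR DAYS
	git clone --bare --shallow-since "$(cutoff_date "$3")" "$1" "$2" &&
	git --git-dir="$2" config --add remote.origin.fetch \
		'+refs/heads/master:refs/heads/master'
}

# Periodic job (e.g. daily from cron): move the shallow boundary forward
# to the new cutoff, then prune objects that fell out of the window.
lh_reshallow() { # DIR DAYS
	git --git-dir="$1" fetch --shallow-since "$(cutoff_date "$2")" \
		--update-shallow origin master &&
	git --git-dir="$1" gc --prune=now
}

# Print the message (blob "m") of commit REV. If REV is outside the
# local window, fetch just that commit first; this only works when the
# server allows unadvertised object IDs (uploadpack.allowAnySHA1InWant).
lh_show() { # DIR REV
	git --git-dir="$1" show "$2:m" 2>/dev/null && return 0
	git --git-dir="$1" fetch --depth 1 origin "$2" &&
	git --git-dir="$1" show "$2:m"
}
```

For a 3-month LKML window, that would be lh_clone https://lore.kernel.org/lkml/git/7 lkml.git 90 once, then lh_reshallow lkml.git 90 from cron. It still only covers a single epoch, so it doesn't fix the underlying usability problem.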

POST + "&x=m" search queries are the easiest alternative, I think:

	curl -X POST "$INBOX_URL/?q=d:$YYYYMMDD..&x=m" >mboxrd.gz

(But I wish MUAs could keep track of which messages I've read
between queries.)
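As a stopgap, a script could remember when it last ran and only ask for newer mail. A minimal sketch, assuming GNU date(1); search_url, fetch_since_last and the state file are hypothetical, and since d: only has day granularity, consecutive runs can return overlapping messages that the MUA (or a Message-ID dedup step) still has to cope with:

```shell
# Hypothetical helpers around the POST + "&x=m" search download.

# Print the search URL for messages dated on or after YYYYMMDD.
search_url() { # INBOX_URL YYYYMMDD
	printf '%s/?q=d:%s..&x=m\n' "${1%/}" "$2"
}

# Download matches since the date stored in STATE_FILE (default: a week
# ago) into OUT, and record today's date only if curl succeeded.
fetch_since_last() { # INBOX_URL STATE_FILE OUT
	since=$(cat "$2" 2>/dev/null) || since=$(date -u -d '7 days ago' +%Y%m%d)
	today=$(date -u +%Y%m%d)
	curl -sSf -X POST "$(search_url "$1" "$since")" >"$3" &&
	printf '%s\n' "$today" >"$2"
}
```

For example, fetch_since_last https://public-inbox.org/meta ~/.meta.since meta.mboxrd.gz, then gzip -dc meta.mboxrd.gz into the MUA's mbox.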

NNTP is another option, and it ought to be tunnel-able over HTTPS CONNECT.

> [^1]: Theoretically, this will become even easier in the future with 
>       partial-clone functionality, though I believe that's mostly 
>       written to support fetching large blobs from CDNs and wouldn't be 
>       as useful for very linear public-inbox repositories.

FWIW, I really wish "git --git-dir=$URL any-read-only-command"
could work one day, like it does with SVN.

WebDAV would've been nice, but AFAIK davfs2 doesn't support
Range: requests yet, and having to mount filesystems is a drag...

Thread overview: 2+ messages
2020-01-03 20:15 Limited-history local archives Konstantin Ryabitsev
2020-01-03 22:02 ` Eric Wong [this message]

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).