user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: Re: RFC: monthly epochs for v2
Date: Thu, 24 Oct 2019 20:35:03 +0000
Message-ID: <20191024203503.GA31522@dcvr>
In-Reply-To: <20191024195304.5b7zlx7e3vxfxmtg@chatter.i7.local>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Hi, all:
> 
> With public-inbox now providing manifest files, it is easy to communicate to
> mirroring services when an epoch rolls over. What do you think about making
> these roll-overs month-based instead of size-based? So, instead of:
> 
> git/
>  0.git
>  1.git
>  2.git
> 
> it becomes
> 
> git/
>  201908.git
>  201909.git
>  201910.git
> 
> Upsides:
> 
> - if history needs to be rewritten due to GDPR edits, the impact is limited
> to just messages in that month's archive

Epoch size should be configurable, yes.  But I'm against time
periods such as months or years being a factor for rollover.
Many inboxes (including this one) can go idle for weeks or months,
and activity can be unpredictable when there are surges.
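To make the idle/surge point concrete, here is a toy sketch (not public-inbox code) that assigns the same message stream to epochs both ways. The dates, the flat 4,000-byte message size and the 50,000-byte epoch limit are hypothetical numbers picked only for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical inbox: a few messages in January, idle until a June
# surge, then a single message in July.  Every message is 4,000 bytes.
MESSAGES = (
    [(date(2019, 1, day), 4_000) for day in range(1, 4)]
    + [(date(2019, 6, day), 4_000) for day in range(1, 29)]
    + [(date(2019, 7, 2), 4_000)]
)


def monthly_epochs(messages):
    """Count messages per epoch when each calendar month gets its own epoch."""
    return Counter(f"{d.year}{d.month:02d}" for d, _size in messages)


def size_epochs(messages, max_bytes):
    """Count messages per epoch when rolling over at a byte limit."""
    counts = Counter()
    epoch, used = 0, 0
    for _date, size in messages:
        # Start a new epoch if this message would push the current one
        # past the limit (an empty epoch always accepts one message).
        if used and used + size > max_bytes:
            epoch, used = epoch + 1, 0
        counts[str(epoch)] += 1
        used += size
    return counts


print(dict(monthly_epochs(MESSAGES)))
print(dict(size_epochs(MESSAGES, 50_000)))
```

With these made-up numbers the monthly layout yields epochs of 3, 28 and 1 messages, while size-based rollover yields 12, 12 and 8.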

> - if someone is only interested in a few months worth of archives, they
> don't have to clone the entire collection
> - similarly, someone using public-inbox to feed messages to their inbox
> (e.g. using the l2md tool [1]) doesn't need to waste gigabytes storing
> archives they aren't interested in

NNTP or d:YYYYMMDD..YYYYMMDD mboxrd downloads via HTTP search
are better suited for those cases.
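For the "only a few months" case, the date-range term for a calendar month can be built mechanically. This is a small sketch: the inbox URL is only an example, and it assumes the search page takes the query in a `q` parameter. The mboxrd download itself would then be requested from the search results.

```python
import calendar
from datetime import date
from urllib.parse import urlencode


def month_range_term(year: int, month: int) -> str:
    """Return a d:YYYYMMDD..YYYYMMDD term covering one whole calendar month."""
    # monthrange() returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    start = date(year, month, 1)
    end = date(year, month, days_in_month)
    return f"d:{start:%Y%m%d}..{end:%Y%m%d}"


def search_url(inbox_url: str, term: str) -> str:
    """Build a search URL, assuming the inbox accepts the query as ?q=..."""
    return f"{inbox_url.rstrip('/')}/?{urlencode({'q': term})}"


print(month_range_term(2019, 10))  # d:20191001..20191031
print(search_url("https://public-inbox.org/meta/", month_range_term(2020, 2)))
```

Using `calendar.monthrange` avoids hard-coding month lengths, so leap-year Februaries come out right.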

The HTTP search will be better once it can expand threads to
fetch messages in the same thread outside of date ranges
(e.g. "mairix -t").
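Roughly what "expanding threads" means here, as a hypothetical sketch: date-range hits act as seeds, and any message linked to them through In-Reply-To/References is pulled in even if it falls outside the range. Union-find is used only for brevity; this is not how public-inbox or mairix implement it, and the archive below is made up.

```python
def expand_threads(messages, seeds):
    """Return all Message-IDs sharing a thread with any Message-ID in seeds.

    `messages` maps each Message-ID to the Message-IDs it references
    (a simplified stand-in for In-Reply-To/References headers).
    """
    parent = {}

    def find(mid):
        parent.setdefault(mid, mid)
        while parent[mid] != mid:
            parent[mid] = parent[parent[mid]]  # path halving
            mid = parent[mid]
        return mid

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every message to everything it references; a referenced
    # message missing from the archive still joins its replies together.
    for mid, refs in messages.items():
        find(mid)
        for ref in refs:
            union(mid, ref)

    roots = {find(mid) for mid in seeds if mid in messages}
    return {mid for mid in messages if find(mid) in roots}


# Hypothetical archive with two threads; only "c@example" matched the dates.
ARCHIVE = {
    "a@example": [],
    "b@example": ["a@example"],
    "c@example": ["a@example", "b@example"],
    "d@example": [],
    "e@example": ["d@example"],
}
print(sorted(expand_threads(ARCHIVE, {"c@example"})))
```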

The client side could still import into a local v2 inbox, use
it as a cache, and configure its own epoch size and expiration logic.
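As one hypothetical client-side expiration policy (not something public-inbox provides), a cache could keep only the newest N epoch repositories and drop the rest. The sketch assumes epochs are named `N.git` with an integer N:

```python
import re

# Hypothetical naming assumption: epoch repositories are called N.git.
EPOCH_NAME = re.compile(r"^(\d+)\.git$")


def epochs_to_expire(names, keep):
    """Return the epoch names to drop so that only the newest `keep` remain.

    Epochs are ordered numerically, so "10.git" counts as newer than
    "2.git".  Names that don't look like epochs are ignored.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    numbered = sorted(
        (int(m.group(1)), name)
        for name in names
        if (m := EPOCH_NAME.match(name))
    )
    # Everything before the newest `keep` entries is expired.
    cutoff = max(len(numbered) - keep, 0)
    return [name for _num, name in numbered[:cutoff]]


print(epochs_to_expire(["0.git", "1.git", "2.git", "10.git", "README"], keep=2))
```

Because ordering is numeric rather than lexical, a mixed layout such as 0.git, 1.git, 201910.git, 201911.git also keeps the two monthly epochs when `keep=2`.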

> - since the numbers are always auto-incrementing, this change can even be
> done to repos currently using number-based epoch rotation, e.g.:
> 
>  git/
>    0.git
>    1.git
>    201910.git
>    201911.git
> 
> - there shouldn't be severe directory listing penalties with this, as even
> 20 years worth of archives will only have 240 entries

That would still add overhead to cloning and fetching, since it
means installing and running extra tools.

Aside from LKML, most inboxes are pretty small and shouldn't
require more than an initial clone and then fetch via cron.

If people only want a backup via git (and not host HTTP/NNTP),
it's FAR easier for them to run ubiquitous commands such as
"git clone --mirror && git fetch" rather than
"install $TOOL which may be out-of-date-or-missing-on-your-distro"
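For the backup-only case the whole job is one mirror clone per epoch followed by periodic fetches. A minimal sketch of that loop, assuming a hypothetical inbox URL and that epoch N can be cloned from `<base_url>/N` (check the inbox's own mirroring instructions for the real URLs); it needs nothing beyond git itself:

```python
import os
import subprocess


def mirror_commands(base_url, epochs, dest):
    """Plan the git commands that back up each epoch into dest/git/N.git.

    Assumes epoch N can be cloned from f"{base_url}/{N}"; that URL
    layout is an assumption, not taken from any server.
    """
    commands = []
    for n in epochs:
        repo = os.path.join(dest, "git", f"{n}.git")
        if os.path.isdir(repo):
            # Already mirrored: just pick up new messages.
            commands.append(["git", "-C", repo, "fetch"])
        else:
            # First run for this epoch: make a bare mirror clone.
            commands.append(["git", "clone", "--mirror", f"{base_url}/{n}", repo])
    return commands


def run(commands):
    """Execute the planned commands, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run(mirror_commands(...))` from cron clones any epoch not yet mirrored and fetches the rest, which is the same "git clone --mirror && git fetch" pattern with no extra tooling.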


Thread overview: 10+ messages
2019-10-24 19:53 RFC: monthly epochs for v2 Konstantin Ryabitsev
2019-10-24 20:35 ` Eric Wong [this message]
2019-10-24 21:21   ` Konstantin Ryabitsev
2019-10-24 22:34     ` Eric Wong
2019-10-25 12:22       ` Eric Wong
2019-10-25 20:56         ` Konstantin Ryabitsev
2019-10-25 22:57           ` Eric Wong
2019-10-29 15:03             ` Eric W. Biederman
2019-10-29 15:55               ` Konstantin Ryabitsev
2019-10-29 22:46                 ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).