Date: Thu, 24 Oct 2019 20:35:03 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: Re: RFC: monthly epochs for v2
Message-ID: <20191024203503.GA31522@dcvr>
References: <20191024195304.5b7zlx7e3vxfxmtg@chatter.i7.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20191024195304.5b7zlx7e3vxfxmtg@chatter.i7.local>
List-Id:

Konstantin Ryabitsev wrote:
> Hi, all:
>
> With public-inbox now providing manifest files, it is easy to communicate to
> mirroring services when an epoch rolls over. What do you think if we make
> these roll-overs month-based instead of size-based? So, instead of:
>
>   git/
>     0.git
>     1.git
>     2.git
>
> it becomes
>
>   git/
>     201908.git
>     201909.git
>     201910.git
>
> Upsides:
>
> - if history needs to be rewritten due to GDPR edits, the impact is limited
>   to just messages in that month's archive

Epoch size should be configurable, yes. But I'm against time periods
such as months or years being a factor for rollover. Many inboxes
(including this one) can go idle for weeks or months, and activity can
surge unpredictably, so monthly epochs could range from nearly empty
to very large.

> - if someone is only interested in a few months worth of archives, they
>   don't have to clone the entire collection
> - similarly, someone using public-inbox to feed messages to their inbox
>   (e.g. using the l2md tool [1]) doesn't need to waste gigabytes storing
>   archives they aren't interested in

NNTP, or mboxrd downloads of d:YYYYMMDD..YYYYMMDD range searches via
HTTP, are better suited for those cases.
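For the "only a few months" case, a client can ask for just that window with a date-range search instead of cloning whole epochs. A minimal Python sketch of building such a query for whole calendar months follows; the helper name date_range_query is hypothetical, it assumes both endpoints of the d: range are inclusive, and submitting the query to a particular HTTP endpoint is left out.

```python
import calendar
from datetime import date


def date_range_query(start_year: int, start_month: int,
                     end_year: int, end_month: int) -> str:
    """Build a d:YYYYMMDD..YYYYMMDD search covering whole calendar months.

    Hypothetical helper for illustration; assumes the search treats both
    endpoints as inclusive, so the range ends on the last day of end_month.
    """
    if (end_year, end_month) < (start_year, start_month):
        raise ValueError("end month precedes start month")
    # calendar.monthrange() returns (weekday of the 1st, days in the month)
    _, last_day = calendar.monthrange(end_year, end_month)
    first = date(start_year, start_month, 1)
    last = date(end_year, end_month, last_day)
    # date objects accept strftime-style format specs inside f-strings
    return f"d:{first:%Y%m%d}..{last:%Y%m%d}"


if __name__ == "__main__":
    # August through October 2019 only, without cloning the whole archive
    print(date_range_query(2019, 8, 2019, 10))  # d:20190801..20191031
```

The month granularity lives entirely on the client side here, so the server's epochs can keep rolling over by size.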
The HTTP search will be better once it can expand threads to fetch
messages in the same thread that fall outside the date range (similar
to "mairix -t"). Clients could still import the results into a local
v2 inbox, use it as a cache, and configure their own epoch size and
expiration logic.

> - since the numbers are always auto-incrementing, this change can even be
>   done to repos currently using number-based epoch rotation, e.g.:
>
>   git/
>     0.git
>     1.git
>     201910.git
>     201911.git
>
> - there shouldn't be severe directory listing penalties with this, as even
>   20 years worth of archives will only have 240 entries

That would still add overhead to cloning and fetching, since mirrors
would have to install and run extra tools to pick up each new epoch.
Aside from LKML, most inboxes are pretty small and shouldn't require
more than an initial clone followed by periodic fetches via cron.

If people only want a backup via git (and don't want to host HTTP or
NNTP), it's FAR easier for them to run ubiquitous commands such as
"git clone --mirror && git fetch" than to
"install $TOOL which may be out-of-date-or-missing-on-your-distro".
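To make the clone-then-fetch routine concrete, here is a minimal Python sketch of the git commands a cron job would plan for each epoch. It also shows a wrinkle of the mixed layout proposed above: epoch names only stay in order if compared as integers, since a plain string sort puts "201910.git" before "3.git". The function names, base_url, the "<base_url>/git/<N>.git" URL layout and the local "git/" destination are assumptions for illustration, not public-inbox's own tooling; the sketch only builds argument lists and runs nothing.

```python
import re

# Epoch repositories are assumed to be named "<integer>.git"
EPOCH_RE = re.compile(r"^(\d+)\.git$")


def sort_epochs(names):
    """Return epoch directory names ordered by integer value.

    Entries that don't look like "<integer>.git" are skipped.
    """
    epochs = []
    for name in names:
        m = EPOCH_RE.match(name)
        if m:
            epochs.append((int(m.group(1)), name))
    return [name for _, name in sorted(epochs)]


def mirror_commands(base_url, remote_epochs, local_epochs):
    """Plan git argv lists for a cron-driven mirror (hypothetical layout).

    Epochs already present locally are fetched; new ones are cloned
    with --mirror into git/<N>.git.
    """
    cmds = []
    for name in sort_epochs(remote_epochs):
        dest = f"git/{name}"
        if name in local_epochs:
            cmds.append(["git", "-C", dest, "fetch"])
        else:
            cmds.append(["git", "clone", "--mirror",
                         f"{base_url}/git/{name}", dest])
    return cmds


if __name__ == "__main__":
    names = ["201910.git", "3.git", "0.git", "1.git", "README"]
    print(sorted(names))       # ['0.git', '1.git', '201910.git', '3.git', 'README']
    print(sort_epochs(names))  # ['0.git', '1.git', '3.git', '201910.git']
```

Each returned list could be passed to subprocess.run() by the cron job; keeping planning separate from execution makes the ordering easy to check without network access.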