user/dev discussion of public-inbox itself
From: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: lei-managed pseudo mailing lists
Date: Mon, 26 Apr 2021 15:46:59 -0400
Message-ID: <20210426194659.d5w2nkeqvtyni4ay@nitro.local>
In-Reply-To: <20210426184717.GA29112@dcvr>

On Mon, Apr 26, 2021 at 06:47:17PM +0000, Eric Wong wrote:
> > I'm thinking we need the ability to make it a real clonable repository --
> > perhaps without its own xapian index? Actual git repositories aren't large,
> > especially if they are only used for direct git operations. Disk space is
> > cheap, it's the IO that's expensive. :)
> 
> True, though cache overheads hurt a bit.  I also wonder if lei
> can increase traffic to public-inbox-<imapd|nntpd> to reduce
> the need/use of "git clone".
> 
> > If these are real clonable repositories, then it would be easy for people to
> > set up replication for just the curated content people want.
> 
> Understood.  Using --output v2publicinbox:... w/o --shared is
> totally doable.

I'm just worried that if we overuse alternates, then when we repack the
"every blob" shared repository we'll end up with a pack that isn't really
optimized for any of the member repos. When someone clones one of them,
git-upload-pack will have to spend a lot of cycles navigating the monstrous
parent pack just to build and re-compress the small subset of objects it
needs to send.

Git has ways of dealing with this, such as delta islands (the pack.island
setting), but they're finicky and require that each child repo be defined as
refs in the parent repo. We deal with this in grokmirror, but it's messy and
requires properly tracking child repo additions, removals, etc.
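
For illustration only, here is a minimal sketch of what such an island setup
could look like in the config of the shared parent repository. The
refs/virtual/<name>/ layout is an assumption borrowed from the example in
Git's own delta-islands documentation; it is not necessarily how grokmirror
lays out its refs.

```ini
# config of the shared "every blob" parent repository (hypothetical layout)
[pack]
	# Each child repo's refs are assumed to be copied under
	# refs/virtual/<name>/...; the capture group names the island,
	# so deltas are not created across different children.
	island = refs/virtual/([^/]+)/
[repack]
	# Make "git repack" behave as if --delta-islands had been passed.
	useDeltaIslands = true
```

The catch is the one described above: the islands only help while every
child's refs are kept present and current under its own prefix in the parent.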

I think it may be one of those cases where wasting disk space on duplicate
objects is worth the CPU cycle savings.

> > Not really worried about deduping blobs, but I'm wondering how to make it work
> > well when search parameters change (see above). E.g.:
> > 
> > 1. we create the repo with one set of parameters
> > 2. maintainer then broadens it up to include something else
> > 3. maintainer then decides that it's now *way* too much and narrows it down again
> > 
> > We don't really want step 2 to lead to a permanent ballooning of the
> > repository, so perhaps all query changes should force-append a dt: with the
> > open-ended datetime of the change? Or do you already have a way to deal with
> > this situation?
> 
> The aforementioned maxuid prevents stuff that's too old from
> being seen.  Otherwise, there's always "public-inbox-learn rm".

How would it handle the situation where we import a new list into lore with a
10-year-long archive of messages?

-K

Thread overview: 6+ messages
2021-04-26 16:44 Konstantin Ryabitsev
2021-04-26 17:37 ` Eric Wong
2021-04-26 18:20   ` Konstantin Ryabitsev
2021-04-26 18:47     ` Eric Wong
2021-04-26 19:46       ` Konstantin Ryabitsev [this message]
2021-04-26 20:34         ` Eric Wong
