user/dev discussion of public-inbox itself
* lei-managed pseudo mailing lists
@ 2021-04-26 16:44 Konstantin Ryabitsev
  2021-04-26 17:37 ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Ryabitsev @ 2021-04-26 16:44 UTC (permalink / raw)
  To: meta

Hello:

One of the services I think would be interesting to provide is the ability for
people to subscribe to "curated saved searches". For example, a kernel
subsystem maintainer can define a set of query parameters (a thread mentions
these files/functions/terms, etc), and allow others to follow this saved
search either by defining it as a remote source for their own lei command, or
by subscribing to it as they would to any regular mailing list.

The latter is specifically something I think would be of interest to kernel
folks, so I envision that we'd have something like the following:

- a maintainer publishes a configuration file we can pass to lei
- our backend lei process uses all of lore.kernel.org sources to create and
  continuously update a new public-inbox repository with matching search
  results
- we set up a mlmmj list that doesn't receive any direct mail but is only fed
  from saved search results; people can subscribe/unsubscribe as they would
  with any other mlmmj list

Any particular reason this wouldn't work?

-K

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: lei-managed pseudo mailing lists
  2021-04-26 16:44 lei-managed pseudo mailing lists Konstantin Ryabitsev
@ 2021-04-26 17:37 ` Eric Wong
  2021-04-26 18:20   ` Konstantin Ryabitsev
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2021-04-26 17:37 UTC (permalink / raw)
  To: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Hello:
> 
> One of the services I think would be interesting to provide is the ability for
> people to subscribe to "curated saved searches". For example, a kernel
> subsystem maintainer can define a set of query parameters (a thread mentions
> these files/functions/terms, etc), and allow others to follow this saved
> search either by defining it as a remote source for their own lei command, or
> by subscribing to it as they would to any regular mailing list.
> 
> The latter is specifically something I think would be of interest to kernel
> folks, so I envision that we'd have something like the following:
> 
> - a maintainer publishes a configuration file we can pass to lei

The command-line might be enough, the pathname of the current
state/config file is a bit tricky and tied to its output.
I suppose "lei import-search" can be a command, though...

> - our backend lei process uses all of lore.kernel.org sources to create and
>   continuously update a new public-inbox repository with matching search
>   results

There's already some accommodations for that in LeiSavedSearch
which can present itself as a PublicInbox::Inbox-ish object to
PublicInbox::WWW (untested).

Searching within an LSS isn't implemented yet, but I think it's
doable w/o extra Xapian storage.

However, git object storage isn't duplicated, which is nice for
local use (instaweb-like), but supporting clone/fetch isn't as
natural...

Perhaps supporting a v2 inbox as an lei q output destination
is in order:

	lei q --output v2publicinbox:/path/to/v2 --shared SEARCH_TERMS

--shared would be "git clone --shared", the new v2 inbox can
use ~/.cache/lei/all_locals_ever.git/ as an alternate and not
duplicate space for blobs.

> - we set up a mlmmj list that doesn't receive any direct mail but is only fed
>   from saved search results; people can subscribe/unsubscribe as they would
>   with any other mlmmj list
> 
> Any particular reason this wouldn't work?

Nope :)  As long as all the data formats can interoperate
(mostly RFC5322/2822).  "lei convert" is nice, too :)

* Re: lei-managed pseudo mailing lists
  2021-04-26 17:37 ` Eric Wong
@ 2021-04-26 18:20   ` Konstantin Ryabitsev
  2021-04-26 18:47     ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Ryabitsev @ 2021-04-26 18:20 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Mon, Apr 26, 2021 at 05:37:26PM +0000, Eric Wong wrote:
> > The latter is specifically something I think would be of interest to kernel
> > folks, so I envision that we'd have something like the following:
> > 
> > - a maintainer publishes a configuration file we can pass to lei
> 
> The command-line might be enough, the pathname of the current
> state/config file is a bit tricky and tied to its output.
> I suppose "lei import-search" can be a command, though...

Excellent, excellent. How well does it deal with the situation when the search
parameters change?

> > - our backend lei process uses all of lore.kernel.org sources to create and
> >   continuously update a new public-inbox repository with matching search
> >   results
> 
> There's already some accommodations for that in LeiSavedSearch
> which can present itself as a PublicInbox::Inbox-ish object to
> PublicInbox::WWW (untested).
> 
> Searching within an LSS isn't implemented yet, but I think it's
> doable w/o extra Xapian storage.
> 
> However, git object storage isn't duplicated, which is nice for
> local use (instaweb-like), but supporting clone/fetch isn't as
> natural...

I'm thinking we need the ability to make it a real clonable repository --
perhaps without its own xapian index? Actual git repositories aren't large,
especially if they are only used for direct git operations. Disk space is
cheap, it's the IO that's expensive. :)

If these are real clonable repositories, then it would be easy for people to
set up replication for just the curated content people want.

> Perhaps supporting a v2 inbox as an lei q output destination
> is in order:
> 
> 	lei q --output v2publicinbox:/path/to/v2 --shared SEARCH_TERMS
> 
> --shared would be "git clone --shared", the new v2 inbox can
> use ~/.cache/lei/all_locals_ever.git/ as an alternate and not
> duplicate space for blobs.

Not really worried about deduping blobs, but I'm wondering how to make it work
well when search parameters change (see above). E.g.:

1. we create the repo with one set of parameters
2. maintainer then broadens it up to include something else
3. maintainer then decides that it's now *way* too much and narrows it down again

We don't really want step 2 to lead to a permanent ballooning of the
repository, so perhaps all query changes should force-append a dt: with the
open-ended datetime of the change? Or do you already have a way to deal with
this situation?
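
A minimal sketch of the force-append idea above, in Python. The helper name
and the example search terms are hypothetical; it assumes the dt: prefix takes
a YYYYMMDDhhmmss timestamp and that a trailing ".." leaves the range
open-ended, as suggested in the paragraph above. It only shows how a changed
query could be pinned to the time of the change so that a broadened search
does not pull in older history.

```python
from datetime import datetime, timezone


def pin_query_to_change_time(query: str, changed_at: datetime) -> str:
    """Wrap the edited query and append an open-ended dt: range.

    The range starts at the moment the query was changed, so only
    messages dated from then on can match the new parameters.
    """
    stamp = changed_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"({query}) AND dt:{stamp}.."


# Hypothetical broadened query, edited at 2021-04-26 18:20 UTC.
broadened = pin_query_to_change_time(
    "dfn:drivers/net OR s:netdev",
    datetime(2021, 4, 26, 18, 20, tzinfo=timezone.utc),
)
print(broadened)
```

With this approach, step 2 only adds messages dated after the change, and
narrowing the query again in step 3 simply stops further growth.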

> > - we set up a mlmmj list that doesn't receive any direct mail but is only fed
> >   from saved search results; people can subscribe/unsubscribe as they would
> >   with any other mlmmj list
> > 
> > Any particular reason this wouldn't work?
> 
> Nope :)  As long as all the data formats can interoperate
> (mostly RFC5322/2822).  "lei convert" is nice, too :)

Great! I believe this will help untangle the current situation with "where
should I send this kernel patch". 

I want "just send it to linux-kernel@vger.kernel.org" to be a valid option
again. Participating subsystems can then define what patches they want to see
by setting up pseudo-lists and letting participating reviewers/maintainers
subscribe to them via their preferred mail delivery mechanism.

-K

* Re: lei-managed pseudo mailing lists
  2021-04-26 18:20   ` Konstantin Ryabitsev
@ 2021-04-26 18:47     ` Eric Wong
  2021-04-26 19:46       ` Konstantin Ryabitsev
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2021-04-26 18:47 UTC (permalink / raw)
  To: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Mon, Apr 26, 2021 at 05:37:26PM +0000, Eric Wong wrote:
> > > The latter is specifically something I think would be of interest to kernel
> > > folks, so I envision that we'd have something like the following:
> > > 
> > > - a maintainer publishes a configuration file we can pass to lei
> > 
> > The command-line might be enough, the pathname of the current
> > state/config file is a bit tricky and tied to its output.
> > I suppose "lei import-search" can be a command, though...
> 
> Excellent, excellent. How well does it deal with the situation when the search
> parameters change?

"lei edit-search" can be used to zero the maxuid parameters;
and normal v2 deduplication will prevent duplicates from showing up.
It's not automatic, though; it's probably a good idea to keep that
manual anyway, given step 2 below.
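
A conceptual sketch (not lei's actual code) of how a remembered maxuid and
deduplication interact when the search is edited. All names here are
hypothetical, and a SHA-256 of the message body stands in for whatever v2
deduplication really compares.

```python
import hashlib


def run_search(source, state, stored, matches):
    """Return newly stored messages from one search run.

    source:  iterable of (uid, body) pairs
    state:   dict holding the "maxuid" high-water mark
    stored:  set of digests for messages already in the output
    matches: predicate standing in for the search query
    """
    start = state["maxuid"]
    added = []
    for uid, body in source:
        if uid <= start:
            continue  # examined on an earlier run, skip it
        state["maxuid"] = max(state["maxuid"], uid)
        if not matches(body):
            continue
        digest = hashlib.sha256(body.encode()).hexdigest()
        if digest not in stored:  # dedup: never store the same message twice
            stored.add(digest)
            added.append(body)
    return added


inbox = [
    (1, "patch: fix net driver"),
    (2, "unrelated chatter"),
    (3, "patch: fix usb driver"),
]
state = {"maxuid": 0}
stored = set()

first = run_search(inbox, state, stored, lambda b: "net" in b)

# The search is broadened and maxuid is zeroed by hand.
state["maxuid"] = 0
second = run_search(inbox, state, stored, lambda b: "driver" in b)

# Without zeroing maxuid, nothing old is looked at again.
third = run_search(inbox, state, stored, lambda b: True)
```

Zeroing the mark makes the second run rescan everything, but only the message
that was not already stored gets added; leaving the mark alone means older
messages are never reconsidered.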

> > > - our backend lei process uses all of lore.kernel.org sources to create and
> > >   continuously update a new public-inbox repository with matching search
> > >   results
> > 
> > There's already some accommodations for that in LeiSavedSearch
> > which can present itself as a PublicInbox::Inbox-ish object to
> > PublicInbox::WWW (untested).
> > 
> > Searching within an LSS isn't implemented yet, but I think it's
> > doable w/o extra Xapian storage.
> > 
> > However, git object storage isn't duplicated, which is nice for
> > local use (instaweb-like), but supporting clone/fetch isn't as
> > natural...
> 
> I'm thinking we need the ability to make it a real clonable repository --
> perhaps without its own xapian index? Actual git repositories aren't large,
> especially if they are only used for direct git operations. Disk space is
> cheap, it's the IO that's expensive. :)

True, though cache overheads hurt a bit.  I also wonder if lei
can increase traffic to public-inbox-<imapd|nntpd> to reduce
the need/use of "git clone".

> If these are real clonable repositories, then it would be easy for people to
> set up replication for just the curated content people want.

Understood.  Using --output v2publicinbox:... w/o --shared is
totally doable.

> > Perhaps supporting a v2 inbox as an lei q output destination
> > is in order:
> > 
> > 	lei q --output v2publicinbox:/path/to/v2 --shared SEARCH_TERMS
> > 
> > --shared would be "git clone --shared", the new v2 inbox can
> > use ~/.cache/lei/all_locals_ever.git/ as an alternate and not
> > duplicate space for blobs.
> 
> Not really worried about deduping blobs, but I'm wondering how to make it work
> well when search parameters change (see above). E.g.:
> 
> 1. we create the repo with one set of parameters
> 2. maintainer then broadens it up to include something else
> 3. maintainer then decides that it's now *way* too much and narrows it down again
> 
> We don't really want step 2 to lead to a permanent ballooning of the
> repository, so perhaps all query changes should force-append a dt: with the
> open-ended datetime of the change? Or do you already have a way to deal with
> this situation?

The aforementioned maxuid prevents stuff that's too old from
being seen.  Otherwise, there's always "public-inbox-learn rm".

> > > - we set up a mlmmj list that doesn't receive any direct mail but is only fed
> > >   from saved search results; people can subscribe/unsubscribe as they would
> > >   with any other mlmmj list
> > > 
> > > Any particular reason this wouldn't work?
> > 
> > Nope :)  As long as all the data formats can interoperate
> > (mostly RFC5322/2822).  "lei convert" is nice, too :)
> 
> Great! I believe this will help untangle the current situation with "where
> should I send this kernel patch". 
> 
> I want "just send it to linux-kernel@vger.kernel.org" to be a valid option
> again. Participating subsystems can then define what patches they want to see
> by setting up pseudo-lists and letting participating reviewers/maintainers
> subscribe to them via their preferred mail delivery mechanism.

Yup, that seems easiest for new contributors.

* Re: lei-managed pseudo mailing lists
  2021-04-26 18:47     ` Eric Wong
@ 2021-04-26 19:46       ` Konstantin Ryabitsev
  2021-04-26 20:34         ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Ryabitsev @ 2021-04-26 19:46 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Mon, Apr 26, 2021 at 06:47:17PM +0000, Eric Wong wrote:
> > I'm thinking we need the ability to make it a real clonable repository --
> > perhaps without its own xapian index? Actual git repositories aren't large,
> > especially if they are only used for direct git operations. Disk space is
> > cheap, it's the IO that's expensive. :)
> 
> True, though cache overheads hurt a bit.  I also wonder if lei
> can increase traffic to public-inbox-<imapd|nntpd> to reduce
> the need/use of "git clone".
> 
> > If these are real clonable repositories, then it would be easy for people to
> > set up replication for just the curated content people want.
> 
> Understood.  Using --output v2publicinbox:... w/o --shared is
> totally doable.

I'm just worried that if we overuse the alternates, then we may find ourselves
in a situation where when we repack the "every blob" shared repository, we'll
end up with a pack that isn't really optimized to be used by any of the
member repos. So, in a situation where a clone is performed, git-upload-pack
will have to spend a lot of cycles navigating through the monstrous parent
pack just to build and re-compress the small subset of objects it needs to
send.

Git has ways of dealing with this, such as pack islands, but it's finicky and
requires that each child repo is defined as refs in the parent repo. We deal
with this in grokmirror, but it's messy and requires properly tracking child
repo additions/removals/etc.

I think it may be one of those cases where wasting disk space on duplicate
objects is worth the CPU cycle savings.

> > Not really worried about deduping blobs, but I'm wondering how to make it work
> > well when search parameters change (see above). E.g.:
> > 
> > 1. we create the repo with one set of parameters
> > 2. maintainer then broadens it up to include something else
> > 3. maintainer then decides that it's now *way* too much and narrows it down again
> > 
> > We don't really want step 2 to lead to a permanent ballooning of the
> > repository, so perhaps all query changes should force-append a dt: with the
> > open-ended datetime of the change? Or do you already have a way to deal with
> > this situation?
> 
> The aforementioned maxuid prevents stuff that's too old from
> being seen.  Otherwise, there's always "public-inbox-learn rm".

How would it handle the situation where we import a new list into lore with a
10-year-long archive of messages?

-K

* Re: lei-managed pseudo mailing lists
  2021-04-26 19:46       ` Konstantin Ryabitsev
@ 2021-04-26 20:34         ` Eric Wong
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2021-04-26 20:34 UTC (permalink / raw)
  To: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> I'm just worried that if we overuse the alternates, then we may find ourselves
> in a situation where when we repack the "every blob" shared repository, we'll
> end up with a pack that isn't really optimized to be used by any of the
> member repos. So, in a situation where a clone is performed, git-upload-pack
> will have to spend a lot of cycles navigating through the monstrous parent
> pack just to build and re-compress the small subset of objects it needs to
> send.
> 
> Git has ways of dealing with this, such as pack islands, but it's finicky and
> requires that each child repo is defined as refs in the parent repo. We deal
> with this in grokmirror, but it's messy and requires properly tracking child
> repo additions/removals/etc.

At least for personal use, I've been meaning to look into
automatically managing islands.

> I think it may be one of those cases where wasting disk space on duplicate
> objects is worth the CPU cycle savings.

Agreed for serving public inboxes.

> On Mon, Apr 26, 2021 at 06:47:17PM +0000, Eric Wong wrote:
> > The aforementioned maxuid prevents stuff that's too old from
> > being seen.  Otherwise, there's always "public-inbox-learn rm".
> 
> How would it handle the situation where we import a new list into lore with a
> 10-year-long archive of messages?

maxuid is either per-inbox or per-extindex.

If the search is going off of inboxes via --only, then it would
not see the new inbox at all.  If it's on an extindex like
"all", then yes, the newly-imported historical messages would
show up.

So using "rt:" (Received time) is helpful in the [extindex "all"] case.

Also, the approxidate parsing is done every time with "lei up",
so you can have a rolling window with "rt:last.week.." as a
search parameter.
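
A rough illustration of why re-parsing the relative date on every run gives a
rolling window. This is a sketch under assumptions, not how lei implements
approxidate: the function names are hypothetical, "last week" is taken to mean
exactly seven days, and each message is reduced to a (received time, subject)
pair.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def window_start(now, days=7):
    """Resolve a relative "last week" bound against the current run time."""
    return now - timedelta(days=days)


def up_once(messages, now):
    """One update run: keep messages received inside the rolling window.

    Messages received after `now` are treated as not having arrived yet.
    """
    start = window_start(now)  # recomputed on every run
    return [subject for received, subject in messages
            if start <= received <= now]


messages = [
    (datetime(2011, 4, 20, tzinfo=UTC), "imported 10-year-old archive message"),
    (datetime(2021, 4, 22, tzinfo=UTC), "recent patch A"),
    (datetime(2021, 4, 30, tzinfo=UTC), "recent patch B"),
]

run1 = up_once(messages, datetime(2021, 4, 26, tzinfo=UTC))
run2 = up_once(messages, datetime(2021, 5, 3, tzinfo=UTC))
print(run1, run2)
```

Because the bound moves with each run, the old archive message never matches,
even though it was only just imported, and each run sees only the most recent
week of received mail.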

Thread overview: 6+ messages
2021-04-26 16:44 lei-managed pseudo mailing lists Konstantin Ryabitsev
2021-04-26 17:37 ` Eric Wong
2021-04-26 18:20   ` Konstantin Ryabitsev
2021-04-26 18:47     ` Eric Wong
2021-04-26 19:46       ` Konstantin Ryabitsev
2021-04-26 20:34         ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
