user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: "Robin H. Johnson" <robbat2@gentoo.org>
Cc: meta@public-inbox.org
Subject: Re: publicinbox watch path globbing
Date: Mon, 20 Nov 2023 00:10:01 +0000
Message-ID: <20231120001001.M311669@dcvr>
In-Reply-To: <robbat2-20231119T203948-109104131Z@orbis-terrarum.net>

"Robin H. Johnson" <robbat2@gentoo.org> wrote:
> Hi!
> 
> Writing to ask about converting Gentoo's other (now-broken)
> archive web interface over to public-inbox.
> 
> This is the first of a few questions/bumps along the way.
> 
> For historical reasons on the scaling side, the archive maildirs are
> stored by date:
> watch = maildir:$REDACTED/$LISTNAME/.200001/
> watch = maildir:$REDACTED/$LISTNAME/.200102/
> watch = maildir:$REDACTED/$LISTNAME/.YYYYMM/
> watch = maildir:$REDACTED/$LISTNAME/.202311/
> etc.
> (over time, directories are moved to stable read-only storage)

Is there any reason to expect new messages to appear in the /.2000??/
and other old directories?

IOW, if somebody with a broken clock sends a message from a past
year/month in the Date: header, does it end up in an old bucket
or the current one?

If your old buckets are frozen, lei in public-inbox.git should be
able to start them off with:

	for d in "$REDACTED/$LISTNAME"/.??????
	do
		lei convert -o "v2:/path/to/inbox-$LISTNAME" "maildir:$d"
	done
	lei daemon-kill # optional, stops lei-daemon when done

And then you'd only have to watch the latest maildir.
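
A rough sketch of picking that latest maildir, assuming the .YYYYMM
naming shown above (so glob order is also chronological order).
latest_maildir is a hypothetical helper, not part of public-inbox:

```shell
# Print a watch line for the newest .YYYYMM maildir under $1.
# Returns non-zero if no such directory exists.
latest_maildir() {
	latest=
	for d in "$1"/.[0-9][0-9][0-9][0-9][0-9][0-9]
	do
		# an unmatched glob stays literal, so check it is a directory
		[ -d "$d" ] && latest=$d
	done
	[ -n "$latest" ] && printf 'watch = maildir:%s/\n' "$latest"
}
```

Calling `latest_maildir "$REDACTED/$LISTNAME"` would print a single
watch line; that line still has to be refreshed whenever a new month's
directory appears.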

I'll try to get public-inbox 2.0 released soon[1]; but the lei convert
stuff should be ready.

> If a given list is low traffic and does NOT get traffic in a given month,
> the directory does not exist (it's created when the first mail arrives
> during a calendar month).
> 
> Multiply this by ~120 lists, and it gets on the large side for a config
> file: 7500+ lines just for the "watch" entries.

I agree that sucks.

> While I could generate the config file, I'm wondering about a better
> solution, to allow globbing the path.

I wanted to have recursive watches at some point but never got
around to it.  So I guess something like this could work recursively:

	watchglob = maildir:$REDACTED/$LISTNAME/**
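
Until something like that exists, generating the entries is the
fallback. A minimal sketch, assuming the .YYYYMM layout described above;
gen_watch_section is a hypothetical helper and only emits the section
header and watch lines, leaving out the other keys an inbox needs:

```shell
# Print a config fragment with one watch line per existing
# .YYYYMM maildir of a list.
# $1: archive root, $2: list name
gen_watch_section() {
	root=$1 list=$2
	printf '[publicinbox "%s"]\n' "$list"
	for d in "$root/$list"/.[0-9][0-9][0-9][0-9][0-9][0-9]
	do
		# skip the literal pattern left behind by an unmatched glob
		[ -d "$d" ] || continue
		printf '\twatch = maildir:%s/\n' "$d"
	done
}
```

It could be run as `gen_watch_section "$REDACTED" "$LISTNAME"` once per
list; months without a directory produce no line.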

> I tried to locate a single place in the codebase where this would be
> applied, but it's not clear enough to me if there's a single place where
> it can easily be modified.

The `new' sub in lib/PublicInbox/Watch.pm sets up maildirs/imap/nntp.

The glob2re function is better nowadays in public-inbox.git,
and the mdre regexp will probably need to be updated when it sees
a new maildir...
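
To illustrate the check needed when a new directory shows up, here is
a shell stand-in that tests a path against a glob. It is not the Perl
glob2re code, and matches_glob is a hypothetical name:

```shell
# Return 0 if path $1 matches the shell glob $2, non-zero otherwise.
# Note: in a case pattern, "*" and "?" also match "/".
matches_glob() {
	case $1 in
	$2) return 0 ;;
	*) return 1 ;;
	esac
}
```

For example, `matches_glob "$dir" "$REDACTED/$LISTNAME/.20????"` would
succeed for a newly created .202312 directory and fail for .199912.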

> If there's a consistent place, I think the cleanest syntax that doesn't
> break existing consumers would be something like this:
> [publicinbox "$LISTNAME"]
> watch = maildirglob:$REDACTED/$LISTNAME/.19????/
> watch = maildirglob:$REDACTED/$LISTNAME/.20????/

I think `watchglob = maildir:...' is preferable since I don't
want maildirglob: to be confused as a type.

[1] the release is mainly blocked on me trying to wrap my head around -cindex

Thread overview: 4+ messages
2023-11-19 23:13 publicinbox watch path globbing Robin H. Johnson
2023-11-20  0:10 ` Eric Wong [this message]
2023-11-20  0:16   ` Robin H. Johnson
2023-11-20  1:20     ` Eric Wong
