From: "Robin H. Johnson" <robbat2@gentoo.org>
To: Eric Wong <e@80x24.org>
Cc: "Robin H. Johnson" <robbat2@gentoo.org>, meta@public-inbox.org
Subject: Re: publicinbox watch path globbing
Date: Mon, 20 Nov 2023 00:16:57 +0000
Message-ID: <robbat2-20231120T001135-683587565Z@orbis-terrarum.net>
In-Reply-To: <20231120001001.M311669@dcvr>
On Mon, Nov 20, 2023 at 12:10:01AM +0000, Eric Wong wrote:
> "Robin H. Johnson" <robbat2@gentoo.org> wrote:
> > Hi!
> >
> > Writing to ask about converting Gentoo's (now-broken) other
> > archives web interface over to public-inbox.
> >
> > This is the first of a few questions/bumps along the way.
> >
> > For historical reasons on the scaling side, the archive maildirs are
> > stored by date:
> > watch = maildir:$REDACTED/$LISTNAME/.200001/
> > watch = maildir:$REDACTED/$LISTNAME/.200102/
> > watch = maildir:$REDACTED/$LISTNAME/.YYYYMM/
> > watch = maildir:$REDACTED/$LISTNAME/.202311/
> > etc.
> > (over time, directories are moved to stable read-only storage)
>
> Is there any reason to expect new messages to appear in the /.2000??/
> and other old directories?
>
> IOW, if somebody with a broken clock sends a message from a past
> year/month in the Date: header, does it end up in an old bucket
> or the current one?
The bucket is chosen by arrival time at the archive ingest, not by the
Date: header.
For some of the very old lists, we do have a list of message-ids that we
know existed but aren't captured in the archive, and those mails get
added to the old locations whenever they are found (maybe once a year).
>
> If your old buckets are frozen, lei in public-inbox.git should be
> able to start them off with:
>
> for d in $REDACTED/$LISTNAME/.??????
> do
> 	lei convert -o v2:/path/to/inbox-$LISTNAME maildir:$d
> done
> lei daemon-kill # optional, stops lei-daemon when done
>
> And then you'd only have to watch the latest maildir.
Any concerns during the month rollover period?
E.g. making sure that 202310 and 202311 are both watched right as time
rolls over from October to November: the archive ingest will likely be
writing to 202311 already, but public-inbox may still be processing the
last few new messages in 202310.
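To make the rollover case concrete, here is a rough sketch (not anything
public-inbox ships) that works out which two buckets would need a watch
at any given moment: the current month and the one before it. The
function names and the BASE argument are made up for illustration, and
whether the ingest uses UTC for bucket names is an assumption.

```shell
#!/bin/sh
# prev_bucket YYYYMM: print the bucket name for the month before YYYYMM.
prev_bucket() {
	y=${1%??}	# leading four digits, e.g. 2023
	m=${1#????}	# trailing two digits, e.g. 11
	m=${m#0}	# drop a leading zero so 08/09 are not parsed as octal
	if [ "$m" -eq 1 ]; then
		y=$((y - 1)) m=12	# January rolls back to December
	else
		m=$((m - 1))
	fi
	printf '%04d%02d\n' "$y" "$m"
}

# rollover_watches BASE YYYYMM: emit watch lines for the previous and
# the current bucket, so both are covered across the month boundary.
rollover_watches() {
	for b in "$(prev_bucket "$2")" "$2"; do
		printf 'watch = maildir:%s/.%s/\n' "$1" "$b"
	done
}

# Hypothetical usage, assuming bucket names follow UTC:
#   rollover_watches /path/to/archive/listname "$(date -u +%Y%m)"
```

The previous month's watch could then be dropped once that directory is
known to be quiet.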
> > While I could generate the config file, I'm wondering about a better
> > solution, to allow globbing the path.
>
> I wanted to have recursive watches at some point but never got
> around to it. So I guess something like this could work recursively:
> watchglob = maildir:$REDACTED/$LISTNAME/**
>
> > I tried to locate a single place in the codebase where this would be
> > applied, but it's not clear enough to me if there's a single place that
> > it can easily be modified.
>
> The `new' sub in lib/PublicInbox/Watch.pm sets up maildirs/imap/nntp
>
> The glob2re function is better nowadays in public-inbox.git,
> and the mdre regexp will probably need to be updated when it sees
> a new maildir...
Thanks. I'd want to explicitly scope the glob to the dates.
The spam processing has been to move spam to .spam.YYYYMM.
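As an illustration of the scoping I mean, in plain shell patterns only
(this says nothing about how glob2re or a future watchglob would behave):
a digits-only pattern accepts the date buckets and rejects the spam
folders, while a loose `.??????` also accepts any other six-character
name. The function names and the `.Drafts` example are hypothetical.

```shell
#!/bin/sh
# is_date_bucket NAME: succeed only for .19YYMM / .20YYMM style names.
is_date_bucket() {
	case $1 in
	.19[0-9][0-9][0-9][0-9] | .20[0-9][0-9][0-9][0-9]) return 0 ;;
	*) return 1 ;;
	esac
}

# loose_match NAME: succeed for a dot followed by any six characters.
loose_match() {
	case $1 in
	.??????) return 0 ;;
	*) return 1 ;;
	esac
}

# Show how each pattern treats a few directory names.
for n in .202311 .spam.202311 .Drafts; do
	is_date_bucket "$n" && strict=yes || strict=no
	loose_match "$n" && loose=yes || loose=no
	printf '%s strict=%s loose=%s\n' "$n" "$strict" "$loose"
done
```

Neither pattern picks up .spam.YYYYMM, but only the digits-only one
would ignore a stray six-letter folder.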
> > If there's a consistent place, I think the cleanest syntax that doesn't
> > break existing consumers would be something like this:
> > [publicinbox "$LISTNAME"]
> > watch = maildirglob:$REDACTED/$LISTNAME/.19????/
> > watch = maildirglob:$REDACTED/$LISTNAME/.20????/
>
> I think `watchglob = maildir:...' is preferable since I don't
> want maildirglob: to be confused as a type.
Agreed, I see concerns there.
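Until something like watchglob exists, generating the config stays the
fallback. A minimal sketch of what I have in mind, assuming one
`watch = maildir:` line per existing date bucket; gen_watch and its
arguments are hypothetical names, and the output is only the watch part
of the section, not a complete inbox definition.

```shell
#!/bin/sh
# gen_watch ROOT LISTNAME: print a [publicinbox] section header followed
# by one watch line for every .19YYMM / .20YYMM directory that exists.
gen_watch() {
	root=$1 list=$2
	printf '[publicinbox "%s"]\n' "$list"
	for d in "$root/$list"/.19[0-9][0-9][0-9][0-9] \
		"$root/$list"/.20[0-9][0-9][0-9][0-9]; do
		# An unmatched pattern is left as a literal string; skip it.
		[ -d "$d" ] || continue
		printf '\twatch = maildir:%s/\n' "$d"
	done
}

# Hypothetical usage:
#   gen_watch /path/to/archive listname >>/path/to/generated-config
```

Because the pattern is digits-only, the .spam.YYYYMM directories never
make it into the generated list.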
--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136