public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2024-02-01	lei convert: explicitly allow --sort for inputs
	LeiToMail can't sort v2 output, but sorting MH input (and NNTP spool + mlmmj archives) numerically makes sense.
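	MH names each message with a bare integer, so a plain string sort would put "10" before "9". Below is a minimal sketch of numeric ordering for an MH folder. The helper name mh_sorted_names is hypothetical and is not lei's actual reader.

```perl
use strict;
use warnings;

# Hypothetical helper: return the message names of an MH folder in
# numeric order.  Non-numeric entries (".", "..", .mh_sequences, etc.)
# are skipped.
sub mh_sorted_names {
	my ($dir) = @_;
	opendir(my $dh, $dir) or die "opendir $dir: $!";
	my @nums = grep { /\A[0-9]+\z/ } readdir($dh);
	closedir($dh);
	# numeric comparison, so 9 sorts before 10
	return sort { $a <=> $b } @nums;
}
```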
2023-12-30	lei: support reading MH for convert+import+index
	The MH format is widely supported and used by various MUAs such as mutt and sylpheed, and an MH-like format is used by mlmmj for archives as well. Locking implementations for writes are inconsistent, so this commit doesn't support writes yet. inotify|EVFILT_VNODE watches aren't supported yet either, but that'll have to come since MH allows packing unused integers and renaming files.
2023-11-16	lei q|up|convert: common finish_output to detect errors
	We need to consistently check the exit code of pigz|gzip|xz|bzip2 when writing to compressed mboxes (or bad storage).
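	In Perl, close() on a piped open waits for the child and sets $?, which is what makes a failing compressor detectable. Here is a minimal sketch of that check. The write_through_filter name is hypothetical and is not lei's finish_output.

```perl
use strict;
use warnings;

# Hypothetical sketch: feed data to an external filter (e.g. gzip) and
# die if the filter exits non-zero.  $cmd is an arrayref such as
# [ 'gzip', '-c' ].
sub write_through_filter {
	my ($cmd, $data) = @_;
	open(my $fh, '|-', @$cmd) or die "spawn @$cmd: $!";
	print $fh $data or die "write: $!";
	return 1 if close($fh);
	# close() is false if the flush failed ($! set) or the
	# child exited non-zero ($? set, $! is 0)
	die "@$cmd exited with status " . ($? >> 8) . "\n" if $?;
	die "close: $!\n";
}
```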
2023-11-16	lei: avoid extra fork for v2 outputs
	We've always forced LeiToMail to only have one process for v2 outputs anyways since v2 has its own sharding and IPC. Thus we can use the single LeiToMail process directly to avoid extra IPC overhead.
2023-11-16	lei convert: fix repeat and idempotent v2 output
	We should be able to treat v2 outputs just like any other mail format, with the exception that content dedupe is always enforced by the v2 format. This allows users hosting v2 public-inboxes to catch up broken synchronization from alternate archives such as the mbox archives hosted by https://lists.gnu.org/
	Link: https://public-inbox.org/meta/20231114-hypersonic-papaya-starling-e1cfc8@nitro/
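	Enforced content dedupe is what makes a repeated conversion idempotent: the second run finds every message already present and adds nothing. Below is a simplified sketch using a SHA-256 of the raw bytes as the key. This is a stand-in assumption. As far as I know, public-inbox's own content hash normalizes selected headers and MIME parts rather than hashing raw bytes.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Simplified, hypothetical dedupe: $store maps digest => raw message.
# Returns how many messages were newly added.
sub import_messages {
	my ($store, @msgs) = @_;
	my $added = 0;
	for my $raw (@msgs) {
		my $key = sha256_hex($raw);
		next if exists $store->{$key};	# already present: no-op
		$store->{$key} = $raw;
		++$added;
	}
	return $added;
}
```

Running the same input twice adds nothing the second time, which is what lets an archive be "caught up" repeatedly.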
2023-10-04	lei: get rid of l2m_progress PktOp callback
	We already have an ->incr callback we can enhance to support multiple counters with a single request. Furthermore, we can just flatten the object graph by storing counters directly in the $lei object itself to reduce hash lookups.
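	A minimal sketch of one ->incr request updating several counters, with the counters stored directly in the object's own hash instead of a nested one. The package and counter names here are hypothetical.

```perl
use strict;
use warnings;

package CounterDemo;	# hypothetical stand-in for the $lei object
sub new { bless {}, shift }

# one call may update several counters: ->incr(seen => 3, written => 1)
sub incr {
	my ($self, %delta) = @_;
	# "+=" on an undefined slot starts from 0 without a warning
	$self->{$_} += $delta{$_} for keys %delta;
	return $self;
}

package main;
```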
2023-01-18	ipc+lei: switch to awaitpid
	This avoids awkwardly stuffing an arrayref into callbacks which expect multiple arguments. IPC->awaitpid_init now allows pre-registering callbacks before spawning workers.
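	The idea is to register a callback plus its arguments per child pid up front, then dispatch when the child is reaped. Below is a minimal, blocking sketch of that pattern. register_pid and reap_registered are hypothetical names and do not reflect the real awaitpid implementation, which runs inside an event loop.

```perl
use strict;
use warnings;

my %AWAIT;	# pid => [ $callback, @args ]

# remember what to call once $pid exits
sub register_pid {
	my ($pid, $cb, @args) = @_;
	$AWAIT{$pid} = [ $cb, @args ];
}

# block until every registered child has been reaped
sub reap_registered {
	while (%AWAIT) {
		my $pid = waitpid(-1, 0);
		last if $pid <= 0;	# no children left
		my $ent = delete $AWAIT{$pid} or next;	# not ours
		my ($cb, @args) = @$ent;
		$cb->($pid, $?, @args);	# $? holds the wait status
	}
}
```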
2022-07-07	lei: track seen messages to note duplicates
	This may help track down deduplication or other bugs in lei which lead to occasionally missing messages.
	Link: https://public-inbox.org/meta/CAL_JsqJH8xx_2NyZffNsRXbGXiv3kjmCETvKXt3Yfb0uToLm9Q@mail.gmail.com/
2021-10-28	lei convert: remove redundant input_net_cb
	Use the one provided by the LeiInput parent class.
2021-10-28	lei convert: use "--output" in failure message
	The extra dashes should help users find the correct option more easily.
2021-10-28	xt/net_writer_imap: test "lei convert" w/ IMAP source
	I just did a double-take and nearly thought authentication was broken while reading LeiConvert.pm. Add a comment in LeiConvert.pm to clarify things, too.
2021-10-15	lei + ipc: simplify process reaping
	Simplify our APIs and force dwaitpid() to work in async mode for all lei workers. This avoids having lingering zombies for parallel searches if one worker finishes shortly before another. The old distinction between "old" and "new" workers was needlessly complex, error-prone, and embarrassingly bad. We also never handled v2:// writers properly on Ctrl-C/Ctrl-Z (SIGINT/SIGTSTP), so add them to @WQ_KEYS to ensure they get handled by $lei when appropriate.
2021-10-13	lei: use standard warn() in more places
	warn() is easier to augment with context information, and frankly unavoidable in the presence of 3rd-party libraries we don't control.
2021-06-08	lei: generalize auxiliary WQ handling
	op_wait_event is now more lei-specific since we no longer have to care about oneshot and use a synchronous loop. {ikw} (import-keywords) started a trend, but LeiPmdir (parallel Maildir) is an upcoming WQ class that will follow this idea. Eventually, {l2m} usage may be updated to follow this, too.
2021-05-03	lei: simplify workers_start API
	In most cases, we just name the worker process based on the command. The only change is for LeiMirror vs "lei add-external --mirror", but I doubt it matters.
2021-05-01	lei edit-search: support relocating lei.q.output
	The contents of the old lei.q.output will not be removed, but will be converted into the new one.
2021-04-28	lei: simple WQ workers use {wq1} field
	This lets us share more code and reduces cognitive overhead when it comes to picking names (because {lsss} was ridiculous). We'll need to ensure the first error set in lei is the actual error we exit with, otherwise things can get confusing and errors may get lost.
2021-04-28	lei: quiet down Eml-related warnings consistently
	"lei import" is probably the only place where users might care about warnings.
2021-04-27	lei: standardize on _lei_wq_eof callback for workers
	Simplify our internals a little bit.
2021-03-31	lei_input: reduce IPC traffic with multiple inputs
	No point in sending a command for every input when a single one will do. We'll also trigger LeiStore->done sooner in the worker rather than later.
2021-03-29	lei_input: support compressed mboxes
	Since "lei q" and "lei convert" already support writing compressed mboxes, it makes sense for all mbox readers to support them as well. Compression is one reliable way to know an mboxrd or mboxo hasn't been unexpectedly truncated.
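	A minimal sketch of reading a gzip-compressed mbox with the core IO::Uncompress::Gunzip module. The functional gunzip() returns false and sets $GunzipError when the stream can't be decoded, so damage is reported instead of silently yielding a short mbox. This is an illustration only. lei may well spawn external decompressors instead, and count_gz_mbox_msgs is a hypothetical name.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical: count messages in a .mbox.gz by their "From " lines.
sub count_gz_mbox_msgs {
	my ($path) = @_;
	my $raw;
	gunzip($path => \$raw) or die "gunzip $path: $GunzipError\n";
	# /m: "From " at the start of the file or after any newline
	my @from = ($raw =~ /^From /mg);
	return scalar @from;
}
```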
2021-03-29	lei_input: avoid special case sub for --stdin
	We can consistently open /dev/stdin correctly nowadays, so drop the input_stdin sub and just use the normal ->path_to_fd code path.
2021-03-28	lei: simplify PktOp callers
	Provide a consistent ->op_wait_event method instead of forcing callers to loop (or not) at each callsite. This also avoids a possible leak by avoiding circular references.
2021-03-26	lei: support /dev/fd/[0-2] inputs and outputs in daemon
	Since lei-daemon won't have the same FDs as the client, we need to special-case these mappings and won't be able to open arbitrary, non-standard FDs. We also won't attempt to support /proc/self/fd/[0-2] since that's a Linux-ism. /dev/fd/[0-2] and /dev/std{in,out,err} are portable to FreeBSD, at least. mawk(1) also supports /dev/std{out,err}, as does gawk(1) (which supports everything we can support, and arbitrary /dev/fd/$FD).
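	A minimal sketch of the mapping described above. Only the portable standard-stream paths resolve to one of the client's descriptors 0-2. Everything else, including /proc/self/fd/N and /dev/fd/3 and higher, is rejected. std_fd_for_path is a hypothetical name.

```perl
use strict;
use warnings;

my %STD = (stdin => 0, stdout => 1, stderr => 2);

# Hypothetical: return the client fd number (0..2) for a path, or
# undef if the daemon can't map it.
sub std_fd_for_path {
	my ($path) = @_;
	return $1 if $path =~ m!\A/dev/fd/([0-2])\z!;
	return $STD{$1} if $path =~ m!\A/dev/(std(?:in|out|err))\z!;
	return;
}
```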
2021-03-24	lei: improve management around short-lived workers
	Instead of creating a short-lived circular reference, ensure they don't exist in the first place. Note the following changes to hold an extra ref to $sto:

		- $self->_lei_store(1)->write_prepare($self);
		+ my $sto = $self->_lei_store(1);
		+ $sto->write_prepare($self);

	I'm not a perlguts expert, but I actually wanted to switch to the one-line version for LeiImport, but xt/lei-auth-fail.t was getting stuck for some reason. It seems the extra ref to the LeiStore ($sto) object is necessary.
2021-03-24	lei_input: more common code between <mark|convert|import>
	"lei convert" is actually the odd one out, since it uses lei2mail for auth, unlike the others.
2021-03-24	lei: hide *_atfork_child from command-line
	Otherwise we could get nonsensical results if somebody tries running "lei atfork_child" from the command-line.
2021-03-23	lei_input: common filehandle reader for eml + mbox
	This improves code regularity, and will let us deal with the "RFC822" messages with a "From " line that mutt pipes to.
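	A minimal sketch of tolerating such input. If a single message begins with an mbox-style "From " line, drop that line before header parsing. strip_mbox_from_line is a hypothetical name, and the shared reader presumably does more than this.

```perl
use strict;
use warnings;

# Hypothetical: remove a leading mbox "From " separator, if present.
sub strip_mbox_from_line {
	my ($raw) = @_;
	$raw =~ s/\AFrom [^\n]*\n//;
	return $raw;
}
```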
2021-03-23	lei: simplify workers_start and callers
	Since workers_start is in the common PublicInbox::LEI package, we can just use \&METHOD_NAME instead of relying on UNIVERSAL->can to avoid a method dispatch. Most of our worker code can just use lei->dclose, so default to doing that unless it's been overridden.
2021-03-23	lei: share input code between convert and import
	These commands accept mail the same way, and this forces us to maintain consistent input format support between commands. We'll be using this for "lei mark", too.
2021-03-22	lei: simplify lazy-loading
	This makes it slightly easier to implement future commands, since there'll be a couple more relatively self-contained ones.
2021-03-16	mbox: move mbox_keywords to MboxReader
	MboxReader is a more appropriate place for it than LeiStore.
2021-03-04	lei: use maildir_each_eml in more places
	This saves us some code and redundant callsites for eml_from_path. We'll change maildir_each_eml to include the filename to facilitate an upcoming change to "lei q" without --augment.
2021-02-26	lei import|convert: support mbox locking on reads
	In case somebody is writing non-atomically, ensure we take read locks when opening mbox files for reading.
	v2: squash: load MboxLock even for .eml files
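	A minimal sketch of a shared read lock around an mbox read, using flock(2) only. This is an assumption for illustration. Real mbox writers may instead use fcntl(2) locks or dotlock files, and the reader has to match whatever the writers use. read_mbox_locked is a hypothetical name.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical: wait for any exclusive (writer) flock, then slurp.
sub read_mbox_locked {
	my ($path) = @_;
	open(my $fh, '<', $path) or die "open $path: $!";
	flock($fh, LOCK_SH) or die "flock $path: $!";
	local $/;	# slurp mode
	my $buf = <$fh>;
	close($fh);	# closing the handle releases the lock
	return $buf;
}
```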
2021-02-26	lei import: use --in-format/-F for consistency
	Since we recommend $IN_FORMAT:$LOCATION, this is hopefully not intrusive (not that this is released software, yet). This is to be consistent with "lei convert" usage. We'll keep "-f" only for output formats, since that is what "lei q" and "lei convert" use for outputs.
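	A minimal sketch of splitting a $IN_FORMAT:$LOCATION argument. Only a known format name counts as a prefix, so a URL such as imaps://host/INBOX keeps its scheme. Otherwise the -F/--in-format value (or undef) applies. The format list and the split_input name are illustrative assumptions, not lei's full list or code.

```perl
use strict;
use warnings;

my %KNOWN_FMT = map { $_ => 1 } qw(mboxrd mboxo mboxcl mboxcl2 maildir mh eml);

# Hypothetical: returns ($format, $location).
sub split_input {
	my ($arg, $default_fmt) = @_;
	if ($arg =~ /\A([a-z0-9]+):(.+)\z/s && $KNOWN_FMT{lc $1}) {
		return (lc $1, $2);
	}
	return ($default_fmt, $arg);
}
```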
2021-02-26	lei convert: support IMAP output and "-F eml" inputs
	eml ("message/rfc822" MIME type) is supported by "lei import", so it probably makes sense to support it via convert, at least for tests. IMAP output is already supported by "lei q -o $MFOLDER", so this only required renaming {nrd} => {net} and initializing outputs before augment preparation (creating the IMAP folder).
2021-02-24	lei <import|convert>: support NNTP sources
	We can read NNTP in -watch and Net::NNTP is shipped with Perl5, so lei import and convert have no excuse not to support NNTP as a client. Authentication is not tested yet, but it should be close to how IMAP works...
2021-02-22	lei convert: inline convert_start
	Since we stopped using LeiAuth as a WQ worker, keeping this around as a single-use sub makes no sense and wastes several KB of memory.
2021-02-22	lei q: reduce wasted IMAP connection for auth
	We can rework the first lei2mail worker to authenticate, and then share auth info with the rest of the lei2mail workers. As with "lei import", this uses PktOp and lei-daemon to share updated credentials between the first and subsequent l2m workers.
2021-02-22	lei convert: auth directly from worker process
	Since this only has one worker, we can auth directly in the worker since the convert worker now has access to the script/lei {sock} for running "git credential".
2021-02-18	lei: consolidate the bulk of the IPC code
	The backends for "lei add-external --mirror", "lei convert", and "lei import" all share a similar pattern for spawning background workers. Hoist out the common parts to slim down our code base a bit. The LeiXSearch and LeiToMail workers for "lei q" remain the odd ducks due to the deep pipelining and parallelization.
2021-02-18	lei convert: mail format conversion sub-command
	This will make testing IMAP support for other commands easier, as it doesn't write to lei/store at all. Like the pager and MUA, "git credential" is always spawned by script/lei (and not lei-daemon) so it has a controlling terminal for password prompts.
	v2: fix missing requires, correct test ordering
	v3: ensure config exists for IMAP auth
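	"git credential fill" reads a description on stdin as one key=value attribute per line, terminated by a blank line. Below is a minimal sketch of building that description from a mail URL. The credential_description name is hypothetical, and the URL parsing is deliberately naive (no userinfo handling).

```perl
use strict;
use warnings;

# Hypothetical: build input for `git credential fill` from a URL.
sub credential_description {
	my ($url) = @_;
	my ($proto, $host, $path) = $url =~ m!\A([a-z]+)://([^/]+)/?(.*)\z!
		or die "unparseable URL: $url\n";
	my $s = "protocol=$proto\nhost=$host\n";	# host may include :port
	$s .= "path=$path\n" if length $path;
	return "$s\n";	# blank line ends the description
}
```

The result would be written to the stdin of `git credential fill`, which replies with the same attributes plus username= and password=.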