path: root/lib/PublicInbox/LeiConvert.pm
Date  Commit message
2024-02-01 lei convert: explicitly allow --sort for inputs
LeiToMail can't sort v2 output, but sorting MH input (and NNTP spool + mlmmj archives) numerically makes sense.
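Numeric ordering matters for MH because message files are named with integers, so a plain string sort would place 10 before 9. The following is a minimal Python sketch of that idea only; it is not lei's Perl implementation, and the function name `mh_sorted` is hypothetical.

```python
def mh_sorted(names):
    """Return MH message file names in numeric order, skipping other entries."""
    # MH stores each message in a file named by a positive integer;
    # entries such as ".mh_sequences" are not messages and are dropped.
    numeric = [n for n in names if n.isascii() and n.isdigit()]
    return sorted(numeric, key=int)


print(mh_sorted(["10", "9", "1", ".mh_sequences", "100"]))
# prints ['1', '9', '10', '100']
```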
2023-12-30 lei: support reading MH for convert+import+index
The MH format is widely supported and used by various MUAs such as mutt and sylpheed, and an MH-like format is used by mlmmj for archives as well. Locking implementations for writes are inconsistent, so this commit doesn't support writes yet. inotify|EVFILT_VNODE watches aren't supported yet, but that'll have to come since MH allows packing unused integers and renaming files.
2023-11-16 lei q|up|convert: common finish_output to detect errors
We need to consistently check the exit code of pigz|gzip|xz|bzip2 when writing to compressed mboxes (or bad storage).
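The check described here can be sketched generically: pipe the data through the compressor and treat a non-zero exit status as a failed write. This Python sketch is an illustration under assumptions, not the finish_output code itself; `write_compressed` is a hypothetical name and the whole payload is held in memory for brevity.

```python
import subprocess


def write_compressed(argv, data):
    """Pipe data through the compressor command argv; raise if it fails."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(data)  # feeds stdin, collects stdout, then waits
    if proc.returncode != 0:
        # A non-zero status means the compressed output cannot be trusted.
        raise RuntimeError(f"{argv[0]} exited with status {proc.returncode}")
    return out
```

A call such as `write_compressed(["gzip", "-c"], mbox_bytes)` would then raise instead of silently leaving a damaged archive behind.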
2023-11-16 lei: avoid extra fork for v2 outputs
We've always forced LeiToMail to only have one process for v2 outputs anyways since v2 has its own sharding and IPC. Thus we can use the single LeiToMail process directly to avoid extra IPC overhead.
2023-11-16 lei convert: fix repeat and idempotent v2 output
We should be able to treat v2 outputs just like any other mail format, with the exception that content dedupe is always enforced by the v2 format. This allows users hosting v2 public-inboxes to catch up broken synchronization from alternate archives such as the mbox archives hosted by https://lists.gnu.org/
Link: https://public-inbox.org/meta/20231114-hypersonic-papaya-starling-e1cfc8@nitro/
2023-10-04 lei: get rid of l2m_progress PktOp callback
We already have an ->incr callback we can enhance to support multiple counters with a single request. Furthermore, we can just flatten the object graph by storing counters directly in the $lei object itself to reduce hash lookups.
2023-01-18 ipc+lei: switch to awaitpid
This avoids awkwardly stuffing an arrayref into callbacks which expect multiple arguments. IPC->awaitpid_init now allows pre-registering callbacks before spawning workers.
2022-07-07 lei: track seen messages to note duplicates
This may help track down deduplication or other bugs in lei which lead to occasionally missing messages.
Link: https://public-inbox.org/meta/CAL_JsqJH8xx_2NyZffNsRXbGXiv3kjmCETvKXt3Yfb0uToLm9Q@mail.gmail.com/
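The general idea of tracking seen messages can be sketched as follows. This is an assumption-laden Python illustration: the log does not say what key lei uses to recognize a message, so hashing the full raw bytes with SHA-256 is a stand-in, and `note_duplicates` is a hypothetical name.

```python
import hashlib


def note_duplicates(messages):
    """Return (unique_count, duplicate_count) for an iterable of raw messages."""
    seen = set()
    duplicates = 0
    for raw in messages:
        key = hashlib.sha256(raw).digest()  # assumed key: hash of the raw bytes
        if key in seen:
            duplicates += 1  # already seen: note it instead of dropping silently
        else:
            seen.add(key)
    return len(seen), duplicates


print(note_duplicates([b"msg one", b"msg two", b"msg one"]))
# prints (2, 1)
```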
2021-10-28 lei convert: remove redundant input_net_cb
Use the one provided by the LeiInput parent class.
2021-10-28 lei convert: use "--output" in failure message
The extra dashes should help users find the correct option more easily.
2021-10-28 xt/net_writer_imap: test "lei convert" w/ IMAP source
I just did a double-take and nearly thought authentication was broken while reading LeiConvert.pm. Add a comment in LeiConvert.pm to clarify things, too.
2021-10-15 lei + ipc: simplify process reaping
Simplify our APIs and force dwaitpid() to work in async mode for all lei workers. This avoids having lingering zombies for parallel searches if one worker finishes soon before another. The old distinction between "old" and "new" workers was needlessly complex, error-prone, and embarrassingly bad. We also never handled v2:// writers properly before on Ctrl-C/Ctrl-Z (SIGINT/SIGTSTP), so add them to @WQ_KEYS to ensure they get handled by $lei when appropriate.
2021-10-13 lei: use standard warn() in more places
warn() is easier to augment with context information, and frankly unavoidable in the presence of 3rd-party libraries we don't control.
2021-06-08 lei: generalize auxiliary WQ handling
op_wait_event is now more lei-specific since we no longer have to care about oneshot and use a synchronous loop. {ikw} (import-keywords) started a trend, but LeiPmdir (parallel Maildir) is an upcoming WQ class that will follow this idea. Eventually, {l2m} usage may be updated to follow this, too.
2021-05-03 lei: simplify workers_start API
In most cases, we just name the worker process based on the command. The only change is for LeiMirror vs "lei add-external --mirror", but I doubt it matters.
2021-05-01 lei edit-search: support relocating lei.q.output
The contents of the old lei.q.output will not be removed, but will be converted into the new one.
2021-04-28 lei: simple WQ workers use {wq1} field
This lets us share more code and reduces cognitive overhead when it comes to picking names (because {lsss} was ridiculous). We'll need to ensure the first error set in lei is the actual error we exit with, otherwise things can get confusing and errors may get lost.
2021-04-28 lei: quiet down Eml-related warnings consistently
"lei import" is probably the only place where users might care about warnings.
2021-04-27 lei: standardize on _lei_wq_eof callback for workers
Simplify our internals a little bit.
2021-03-31 lei_input: reduce IPC traffic with multiple inputs
No point in sending a command for every input when a single one will do. We'll also trigger LeiStore->done sooner in the worker rather than later.
2021-03-29 lei_input: support compressed mboxes
Since "lei q" and "lei convert" already support writing these compressed mboxes, it makes sense that all mbox readers support them as well. Using compression is one reliable way to know an mboxrd or mboxo hasn't been unexpectedly truncated.
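The truncation argument can be demonstrated with gzip: a compressed stream carries an end-of-stream marker and trailer, so a cut-off file fails to decompress, whereas a cut-off plain mbox just looks shorter. A minimal Python sketch, with `read_gzip_mbox` as a hypothetical helper unrelated to lei's reader:

```python
import gzip


def read_gzip_mbox(blob):
    """Decompress a gzip'd mbox held in memory; return None if truncated."""
    try:
        return gzip.decompress(blob)
    except EOFError:
        # Raised when the stream ends before the end-of-stream marker.
        return None


mbox = b"From a@example.com Thu Jan  1 00:00:00 1970\nSubject: hi\n\nbody\n"
blob = gzip.compress(mbox)
print(read_gzip_mbox(blob) == mbox)            # complete stream round-trips
print(read_gzip_mbox(blob[: len(blob) // 2]))  # truncated stream is detected
```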
2021-03-29 lei_input: avoid special case sub for --stdin
We can consistently open /dev/stdin correctly nowadays, so drop the input_stdin and just use the normal ->path_to_fd code path.
2021-03-28 lei: simplify PktOp callers
Provide a consistent ->op_wait_event method instead of forcing callers to loop (or not) at each callsite. This also avoids a possible leak by avoiding circular references.
2021-03-26 lei: support /dev/fd/[0-2] inputs and outputs in daemon
Since lei-daemon won't have the same FDs as the client, we need to special-case these mappings and won't be able to open arbitrary, non-standard FDs. We also won't attempt to support /proc/self/fd/[0-2] since that's a Linux-ism. /dev/fd/[0-2] and /dev/std{in,out,err} are portable to FreeBSD, at least. mawk(1) also supports /dev/std{out,err}, as does gawk(1) (which supports everything we can support, and arbitrary /dev/fd/$FD).
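The special-casing described above amounts to recognizing a small, fixed set of path names and mapping them to the client's standard descriptors. The Python sketch below illustrates only that mapping; how lei-daemon actually performs it is not shown in this log, and `std_fd_for` is a hypothetical name.

```python
import re

# Names the commit message lists as supported; everything else is rejected.
_STD_NAMES = {"/dev/stdin": 0, "/dev/stdout": 1, "/dev/stderr": 2}


def std_fd_for(path):
    """Return 0, 1 or 2 for a standard-stream path, else None."""
    if path in _STD_NAMES:
        return _STD_NAMES[path]
    m = re.fullmatch(r"/dev/fd/([0-2])", path)
    return int(m.group(1)) if m else None


print(std_fd_for("/dev/fd/1"))          # prints 1
print(std_fd_for("/proc/self/fd/1"))    # prints None (Linux-ism, unsupported)
```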
2021-03-24 lei: improve management around short-lived workers
Instead of creating a short-lived circular reference, ensure one doesn't exist in the first place. Note the following changes to hold an extra ref to $sto:

    -	$self->_lei_store(1)->write_prepare($self);
    +	my $sto = $self->_lei_store(1);
    +	$sto->write_prepare($self);

I'm not a perlguts expert; I actually wanted to switch to the one-line version for LeiImport, but xt/lei-auth-fail.t was getting stuck for some reason. It seems the extra ref to the LeiStore ($sto) object is necessary.
2021-03-24 lei_input: more common code between <mark|convert|import>
"lei convert" is actually a bit of the odd one out, since it uses lei2mail for auth, unlike the others.
2021-03-24 lei: hide *_atfork_child from command-line
Otherwise we could get nonsensical results if somebody tries running "lei atfork_child" from the command-line.
2021-03-23 lei_input: common filehandle reader for eml + mbox
This improves code regularity, and will let us deal with the "RFC822" messages with a "From " line that mutt pipes to.
2021-03-23 lei: simplify workers_start and callers
Since workers_start is in the common PublicInbox::LEI package, we can just use \&METHOD_NAME instead of relying on UNIVERSAL->can to avoid a method dispatch. Most of our worker code can just use lei->dclose, so default to doing that unless it's been overridden.
2021-03-23 lei: share input code between convert and import
These commands accept mail the same way, and this forces us to maintain consistent input format support between commands. We'll be using this for "lei mark", too.
2021-03-22 lei: simplify lazy-loading
This makes it slightly easier to implement future commands, since there'll be a couple more relatively self-contained ones.
2021-03-16 mbox: move mbox_keywords to MboxReader
MboxReader is a more appropriate place for it than LeiStore.
2021-03-04 lei: use maildir_each_eml in more places
This saves us some code and redundant callsites for eml_from_path. We'll change maildir_each_eml to include the filename to facilitate an upcoming change to "lei q" without --augment.
2021-02-26 lei import|convert: support mbox locking on reads
In case somebody is writing non-atomically, ensure we take read locks when opening mbox files for reading.
v2: squash: load MboxLock even for .eml files
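Taking a read lock before reading can be sketched with a shared flock(2) lock. This is an assumption for illustration: mbox locking has several conventions and the log does not say which ones MboxLock uses. The lock is advisory, so it only helps when the writer takes an exclusive lock on the same file. `read_mbox_locked` is a hypothetical name, and the `fcntl` module is Unix-only.

```python
import fcntl


def read_mbox_locked(path):
    """Read an mbox file while holding a shared (read) lock on it."""
    with open(path, "rb") as fh:
        fcntl.flock(fh, fcntl.LOCK_SH)  # blocks while a writer holds LOCK_EX
        try:
            return fh.read()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)  # release before the file is closed
```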
2021-02-26 lei import: use --in-format/-F for consistency
Since we recommend $IN_FORMAT:$LOCATION, this is hopefully not intrusive (not that this is released software, yet). This is to be consistent with "lei convert" usage. We'll keep "-f" only for output formats, since that is used for "lei q" and "lei convert" for outputs.
2021-02-26 lei convert: support IMAP output and "-F eml" inputs
eml ("message/rfc822" MIME type) is supported by "lei import", so it probably makes sense to support it via convert, at least for tests. IMAP output is already supported in "lei q -o $MFOLDER", so this only required renaming {nrd} => {net} and initializing outputs before augment preparation (creating the IMAP folder).
2021-02-24 lei <import|convert>: support NNTP sources
We can read NNTP in -watch and Net::NNTP is shipped with Perl5, so lei import and convert have no excuse not to support NNTP as a client. Authentication is not tested, yet; but should be close to what IMAP is like...
2021-02-22 lei convert: inline convert_start
Since we stopped using LeiAuth as a WQ worker, keeping this around as a single-use sub makes no sense and wastes several KB of memory.
2021-02-22 lei q: reduce wasted IMAP connection for auth
We can rework the first lei2mail worker to authenticate, and then share auth info with the rest of the lei2mail workers. As with "lei import", this uses PktOp and lei-daemon to share updated credentials between the first and subsequent l2m workers.
2021-02-22 lei convert: auth directly from worker process
Since this only has one worker, we can auth directly in the worker since the convert worker now has access to the script/lei {sock} for running "git credential".
2021-02-18 lei: consolidate the bulk of the IPC code
The backends for "lei add-external --mirror", "lei convert", and "lei import" all share a similar pattern for spawning background workers. Hoist out the common parts to slim down our code base a bit. The LeiXSearch and LeiToMail workers for "lei q" remain the odd ducks due to the deep pipelining and parallelization.
2021-02-18 lei convert: mail format conversion sub-command
This will make testing IMAP support for other commands easier, as it doesn't write to lei/store at all. Like the pager and MUA, "git credential" is always spawned by script/lei (and not lei-daemon) so it has a controlling terminal for password prompts.
v2: fix missing requires, correct test ordering
v3: ensure config exists for IMAP auth