Date | Commit message (Collapse) |
|
It's annoying for people running "git fetch && public-inbox-index"
as one user and -httpd/-nntpd as a different user
(where the two users see different config files).
|
|
It doesn't implement progress of batches, yet, but it wires
up the parsing of the command-line while preserving output
compatibility.
This output is NOT meant to be stable.
|
|
In particular, the '--compact' switch is really useful since it
works without holding the inbox-wide lock for minutes at a time
on giant inboxes (inboxes where copies can take dozens, if not
hundreds, of minutes).
|
|
-index documentation avoids redundant v1 information and refers
readers to the appropriate v1/v2 manpages. Search::Xapian can also
be optional, now, as only the PSGI search interface uses it.
Favor "INBOX_DIR" where appropriate, since "REPO_DIR" can be
confused for code repos which we also support.
XAPIAN_FLUSH_THRESHOLD is documented for all relevant
bulk commands.
|
|
Allow users to specify the --blocksize <B>, --no-full, --fuller
options for xapian-compact(1) for fine-tuning compact behavior
for low-traffic/inactive inboxes.
We also won't support --multipass, since it doesn't seem
compatible with our requirement to use --no-renumber.
We also won't support --single-file, since it only seems
intended for totally dead inboxes; and it doesn't seem
worth the support overhead when "totally dead" turns out
to be a misdiagnosis.
|
|
Since -xcpdb is a superset of -compact, we can reuse much of
the code used for driving compact.
For compact (only), this is slightly less memory efficient since
it requires an extra process per-partition, but we get to prefix
the output with the partition name for more readable output.
|
|
Copying an entire Xapian DB is horribly slow whether it's done
via Perl or copydatabase(1). So displaying some progress
indication is good for user experience.
While we're at it, prefix xapian-compact output, too; since
parallel processes end up clobbering each other.
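
One way to picture the prefixing described above is the following minimal sketch. It is not the project's code: `run_prefixed` and the label passed as `prefix` are hypothetical names, and the child command is whatever the caller supplies. The idea is that each line of child output is written as one whole, labelled line, so output from parallel workers stays attributable.

```python
import subprocess
import sys


def run_prefixed(prefix, cmd, out=sys.stdout):
    """Run cmd and copy its output to `out`, one prefixed line at a time.

    `prefix` is a hypothetical label (e.g. a partition name); stderr is
    merged into stdout so both streams carry the same label.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    for line in proc.stdout:
        # Emit the prefix and the line in a single write call.
        out.write(f"{prefix}: {line}")
    return proc.wait()
```

Calling this once per partition from separate workers gives each line of output a label that says which partition produced it.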
|
|
To minimize the delay on active inboxes, it's actually ideal to
run xapian-compact at the end of the per-partition cpdb process,
since the new DB isn't accessible yet and so we don't have to
deal with lock contention with -mda or -watch processes. The
downside is the temporary file overhead required (3x instead of 2x).
|
|
By avoiding copydatabase(1) entirely, we can make further changes
to avoid locking the entire inbox for a long operation and
switch to fine-grained locking.
|
|
We will be reindexing after copydatabase
|
|
copydatabase(1) is an existing Xapian tool which is the
recommended way to upgrade existing DBs to the latest Xapian
database format (currently "glass" for stable/released
versions). Our use of Xapian relies on preserving document IDs,
so we'll wrap it like we do xapian-compact(1) and use the
"--no-renumber" switch.
I could not name the tool "public-inbox-copydatabase" since it
would be ambiguous as to which DB it's actually copying. So, I
abbreviated the suffix to "xcpdb" (Xapian CoPy DataBase), which
I hope is acceptable and unambiguous.
|
|
Both of these index-affecting commands should work similarly
on the command-line.
public-inbox-index no longer complains about unconfigured
~/.public-inbox/config; but often I found myself being
annoyed by that, anyways...
|
|
Port public-inbox-compact(1) over to using it, and we will need
to wrap copydatabase(1) to ease glass migrations, too.
|
|
We're going to need copydatabase, too
|
|
In retrospect, introducing V1Writable was unnecessary and
InboxWritable->importer is in a better position to abstract
away differences between v1 and v2 writers.
So teach InboxWritable to initialize inboxes and get rid
of V1Writable.
|
|
"git config" did not preserve permissions of the config file it
modifies prior to git 2.1.0, so work around that.
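
A workaround of this kind can be sketched as "remember the mode, rewrite, restore the mode". The snippet below is only an illustration under that assumption, not the project's implementation: `rewrite_preserving_mode` and the `rewrite` callable are hypothetical, and in the situation described above the callable would be whatever invokes "git config" on the file.

```python
import os
import stat


def rewrite_preserving_mode(path, rewrite):
    """Run rewrite(path), then restore the permission bits the file had before."""
    mode = stat.S_IMODE(os.stat(path).st_mode)  # remember the permission bits
    rewrite(path)                               # may replace the file with a new inode
    os.chmod(path, mode)                        # put the original bits back
```

Restoring the mode afterwards is harmless when the rewriting tool already preserves permissions, so the same code path can be used regardless of the git version.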
|
|
Since we lazy-load Xapian now, some errors may become
more cryptic or buried. Try to improve that by making
Admin show better errors.
|
|
More tests work without Search::Xapian, now.
Usability issues still need to be fixed
|
|
Deflating responses may be done by the reverse proxy (e.g. varnish
or nginx), so the warning for it could be invalid.
|
|
Avoiding reliance on environment variables is a bit cleaner
for writing tests
|
|
Import initialization is a little strange for historical reasons, but we
also can't change it too much because it's technically a public
API which external code may rely on...
And we may need to support v1 repos indefinitely. This should
make it easier to write tests for both formats.
|
|
This can help users track down the source of warnings
when presented with imperfect emails.
While we're at it, make the __WARN__ callback in t/v2writable.t
a no-op since we don't check for warnings, there.
|
|
* origin/purge:
  implement public-inbox-purge tool
  v2writable: read epoch on purge
  v2writable: cleanup processes when done
  v2writable: purge ignores non-existent git epoch directories
  v2writable: ->purge returns undef on no-op
  import: purge: reap fast-export process
  hoist out resolve_repo_dir from -index
|
|
Maybe we'll default to a dark theme to promote energy savings...
See contrib/css/README for details
|
|
Expose the ->purge functionality of V2Writable for rewriting
git history to permanently purge messages from history. This
may be necessary for legal reasons.
Usage:
	# requires ~/.public-inbox/config
	public-inbox-purge --all </path/to/message-to-purge
	# good for testing with unconfigured inboxes:
	public-inbox-purge $INBOX_DIR </path/to/message-to-purge
|
|
We'll be using it in future admin tools, and this makes it
easier to test.
|
|
Clearly the AltId stuff was never tested for v2. Ensure
this tricky filter (which reuses Msgmap to avoid introducing
new serial numbers) doesn't trigger deadlocks in SQLite due
to opening a DB for writing multiple times.
I went through several iterations of this change before
going with this one, which is the least intrusive I could
find.
|
|
No need to reach into PublicInbox::Config internals and iterate
through the hashref by hand
|
|
This allows archivists to publish incomplete archives with newer
mail while allowing "0.git" (or "1.git" and so on) epochs to be
added-after-the-fact (without affecting "git clone" followers).
A reindex will be necessary for Xapian and SQLite to catch up
once the old epochs are added; but the reindexing code is also
capable of tolerating missing epochs.
|
|
It is redundant to set default values in the public-inbox
config file. Let's not clutter up users' screens when they
view or edit the config file.
|
|
This reuses some of the configuration from -watch, but remains
independent since some configurations will use -watch for some
inboxes and -mda for others.
The default remains "spamc" for -mda users so nothing changes
without explicit configuration.
Per-inbox configurations may also be supported in the future.
|
|
We must not clobber the original message string, as Email::MIME(*)
still needs it for iterating through parts in SearchIdx (but not
when handing it as a raw string to git-fast-import).
I've noticed message bodies (especially dfpre/dpost) were not
getting indexed when going through -mda (no problems with
-watch). This also did not affect v1 repos, since indexing is a
separate process for v1 and requires re-reading the data from
git.
(*) tested Email::MIME 1.937 on Debian stretch
|
|
It's a convenient wrapper nowadays, so get rid of some legacy
code and minimize differences from the -watch code.
|
|
If indexlevel is specified on the command line, prefer that.
If indexlevel is specified in the config file, prefer that.
If indexlevel is not specified anywhere, default to full.
This should make indexlevel somewhat approachable.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
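
The precedence described above (command line, then config file, then "full") can be sketched as below. This is an illustration only: `resolve_indexlevel` and its parameter names are hypothetical, and `None` is assumed to stand for "not specified".

```python
from typing import Optional


def resolve_indexlevel(cli: Optional[str], config: Optional[str]) -> str:
    """Pick the index level: command line first, then config, then "full"."""
    if cli is not None:
        return cli        # an explicit command-line value always wins
    if config is not None:
        return config     # otherwise fall back to the config file
    return "full"         # nothing specified anywhere


# The level names other than "full" are placeholders for illustration.
print(resolve_indexlevel("levelA", "levelB"))
print(resolve_indexlevel(None, "levelB"))
print(resolve_indexlevel(None, None))
```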
|
|
We subtract one from "jobs" to map to "partitions" to account
for the overview index and git fast-import jobs.
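
As a tiny sketch of the mapping just described: the function name is hypothetical, and rejecting values below 1 is an assumption made here, not something the message states.

```python
def partitions_for_jobs(jobs: int) -> int:
    """Map a job count to a partition count by reserving one job."""
    if jobs < 1:
        # Assumption: a job count below 1 is treated as a caller error.
        raise ValueError("jobs must be at least 1")
    return jobs - 1
```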
|
|
Many MTAs understand these and map them to sensible SMTP error messages.
Inability to find an inbox results in "5.1.1 user unknown".
Misformatted messages are rejected with "5.6.0 data format error".
Unsupported inbox versions are reported as "5.3.5 local configuration error".
All of these are interpreted as permanent failures.
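
The mapping above can be written out as a small table. This sketch assumes that "these" refers to sysexits.h-style exit codes, which the message itself does not spell out; the numeric values are the conventional BSD sysexits.h ones, and the helper name `is_permanent` is hypothetical.

```python
# Conventional sysexits.h values (assumed to be what the message refers to).
EX_DATAERR = 65   # input data was malformed
EX_NOUSER = 67    # addressee unknown
EX_CONFIG = 78    # configuration error

# Status text as listed in the message above.
SMTP_STATUS = {
    EX_NOUSER: "5.1.1 user unknown",
    EX_DATAERR: "5.6.0 data format error",
    EX_CONFIG: "5.3.5 local configuration error",
}


def is_permanent(status: str) -> bool:
    """Enhanced status codes in the 5.x.x class denote permanent failures."""
    return status.startswith("5.")
```

All three statuses fall in the 5.x.x class, which matches the statement that they are interpreted as permanent failures.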
|
|
Oops, I mainly rely on public-inbox-watch for spam training
and completely forgot this tool existed :x
|
|
This quiets a warning inside Spawn.pm
|
|
Some users may not have any public-inboxes configured, especially in
tests.
|
|
I noticed I lost a $GIT_DIR/description in a conversion, so we
should preserve it. While we're at it, we ought to copy any
config in the old repo to the new one.
We will need to warn about cloneurl since it's unfortunately
not an automatic process to update. Oh well..
|
|
--no-renumber does not allow merging, and merging is not ideal
for reindexing, either.
|
|
Since we only query the SQLite over DB for OVER/XOVER, we do not
need to waste space storing fields To/Cc/:bytes/:lines or the
XNUM term. We only use From/Subject/References/Message-ID/:blob
in various places of the PSGI code.
For reindexing, we will take advantage of docid stability
in "xapian-compact --no-renumber" to ensure duplicates do not
show up in search results. Since the PSGI interface is the
only consumer of Xapian at the moment, it has no need to
search based on NNTP article number.
|
|
public-inbox-convert ought to be 100% lossless, now
|
|
Not everybody needs multiprocess support.
|
|
Xapian is size-intensive and SQLite is not strictly necessary for v1.
|
|
Some of this jankiness was from early performance problems
and they turned out to be unnecessary measures.
|
|
Let's not scare users when they encounter files that are supposed
to be there. Then, preserve the journal and pipe.lock, even if
they're supposedly unused due to us holding the inbox-wide lock.
|
|
This is important for people running mirrors via "git fetch",
as they need to be kept up-to-date. Purging is also now
supported in mirrors.
The short-lived "--regenerate" option is gone and is now
implicitly enabled as a result. It's still cheap when
article number regeneration is unnecessary, as we track
the range for each git repository.
|
|
|