user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 27/26] doc: various updates to reflect current state
Date: Thu, 23 May 2019 10:37:38 +0000
Message-ID: <20190523103738.GA24435@dcvr>
In-Reply-To: <20190523093704.18367-1-e@80x24.org>

-index documentation avoids redundant v1 information and refers
readers to the appropriate v1/v2 manpages.  Search::Xapian can
also be optional now, as only the PSGI search interface uses it.

Favor "INBOX_DIR" where appropriate, since "REPO_DIR" can be
confused with code repos, which we also support.

XAPIAN_FLUSH_THRESHOLD is documented for all relevant
bulk commands.
---
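XAPIAN_FLUSH_THRESHOLD can be set for a single bulk run rather than
exported globally.  A minimal sh sketch, under stated assumptions: the
helper name with_flush_threshold and the /path/to/inbox placeholder are
hypothetical and not part of public-inbox.  It rejects non-positive
values because the -index manpage text in this patch says only a
positive value switches to the document count-based threshold.

```shell
# with_flush_threshold N CMD [ARGS...]
# Hypothetical helper (not shipped with public-inbox): runs CMD with
# XAPIAN_FLUSH_THRESHOLD=N set only for that one command.
with_flush_threshold() {
	if [ "$#" -lt 2 ]; then
		echo "usage: with_flush_threshold N CMD [ARGS...]" >&2
		return 1
	fi
	n=$1
	shift
	# Accept only positive integers; reject empty, non-numeric and 0.
	case $n in
	''|*[!0-9]*|0)
		echo "threshold must be a positive integer: $n" >&2
		return 1
		;;
	esac
	# The assignment prefix scopes the variable to this one command.
	XAPIAN_FLUSH_THRESHOLD=$n "$@"
}

# Example invocations (commented out; /path/to/inbox is a placeholder):
# with_flush_threshold 10000 public-inbox-index /path/to/inbox
# with_flush_threshold 10000 public-inbox-compact /path/to/inbox
```

Leaving the variable unset keeps the default behavior described in
each command's manpage.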
 Documentation/public-inbox-compact.pod   | 25 ++++----
 Documentation/public-inbox-index.pod     | 80 +++++++++---------------
 Documentation/public-inbox-v1-format.pod | 12 +++-
 Documentation/public-inbox-v2-format.pod |  5 +-
 Documentation/public-inbox-xcpdb.pod     |  5 +-
 lib/PublicInbox/Inbox.pm                 |  2 +-
 script/public-inbox-compact              |  2 +-
 script/public-inbox-index                |  4 +-
 script/public-inbox-init                 |  2 +-
 9 files changed, 64 insertions(+), 73 deletions(-)

diff --git a/Documentation/public-inbox-compact.pod b/Documentation/public-inbox-compact.pod
index 4a519ce..7d37f6f 100644
--- a/Documentation/public-inbox-compact.pod
+++ b/Documentation/public-inbox-compact.pod
@@ -9,15 +9,12 @@ public-inbox-compact - compact Xapian DBs
 =head1 DESCRIPTION
 
 public-inbox-compact is a wrapper for L<xapian-compact(1)>
-designed for "v2" inboxes.  It combines multiple Xapian
-partitions into one to reduce space overhead after an initial
-mass import (using multiple partitions) is done.
+which locks the inbox and prevents other processes such as
+L<public-inbox-watch(1)> or L<public-inbox-mda(1)> from
+writing while it operates.
 
-It locks the inbox and prevents other processes such as
-L<public-inbox-watch(1)> from writing while it operates.
-
-It also supports "v1" (ssoma) inboxes with limited
-usefulness over L<xapian-compact(1)>
+It enforces the use of the C<--no-renumber> option of
+L<xapian-compact(1)>.
 
 =head1 ENVIRONMENT
 
@@ -28,9 +25,15 @@ usefulness over L<xapian-compact(1)>
 The default config file, normally "~/.public-inbox/config".
 See L<public-inbox-config(5)>
 
-=back
+=item XAPIAN_FLUSH_THRESHOLD
+
+The number of documents to update before committing changes to
+disk.  This environment variable is handled directly by Xapian;
+refer to the Xapian API documentation for more details.
 
-=head1 UPGRADING
+Default: 10000
+
+=back
 
 =head1 CONTACT
 
@@ -41,7 +44,7 @@ and L<http://hjrcffqmbrq6wope.onion/meta/>
 
 =head1 COPYRIGHT
 
-Copyright 2018 all contributors L<mailto:meta@public-inbox.org>
+Copyright 2018-2019 all contributors L<mailto:meta@public-inbox.org>
 
 License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
 
diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index acc9039..2e0ff69 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -4,14 +4,15 @@ public-inbox-index - create and update search indices
 
 =head1 SYNOPSIS
 
-public-inbox-index [OPTIONS] REPO_DIR
+public-inbox-index [OPTIONS] INBOX_DIR
 
 =head1 DESCRIPTION
 
-public-inbox-index creates and updates the search and NNTP
-article number database used by the read-only public-inbox HTTP
-and NNTP interfaces.  Currently, this requires L<Search::Xapian>
-and L<DBD::SQlite> and L<DBI> Perl modules.
+public-inbox-index creates and updates the search, overview and
+NNTP article number database used by the read-only public-inbox
+HTTP and NNTP interfaces.  Currently, this requires the
+L<DBD::SQLite> and L<DBI> Perl modules.  L<Search::Xapian>
+is optional and only needed for the PSGI search interface.
 
 Once the initial indices are created by public-inbox-index,
 L<public-inbox-mda(1)> and L<public-inbox-watch(1)> will
@@ -22,10 +23,10 @@ relying on L<git-fetch(1)> to mirror an existing public-inbox;
 or if upgrading to a new version of public-inbox using
 the C<--reindex> option.
 
-Having a search and article number database is essential to
+Having the overview and article number database is essential to
 running the NNTP interface, and strongly recommended for the
-HTTP interface as it provides thread grouping in addition
-to normal search functionality.
+HTTP interface as it provides thread grouping in addition to
+normal search functionality.
 
 =head1 OPTIONS
 
@@ -45,50 +46,11 @@ This does not touch the NNTP article number database.
 
 =head1 FILES
 
+For v1 (ssoma) repositories described in L<public-inbox-v1-format>.
 All public-inbox-specific files are contained within the
-C<$REPO_DIR/public-inbox/> directory.  All files are expected to
-grow in size as more messages are archived, so using compaction
-commands (e.g. L<xapian-compact(1)>) is not recommended unless
-the list is no longer active.
+C<$GIT_DIR/public-inbox/> directory.
 
-=over
-
-=item $REPO_DIR/public-inbox/msgmap.sqlite3
-
-The stable NNTP article number to Message-ID mapping is
-stored in an SQLite3 database.
-
-This is required for users of L<public-inbox-nntpd(1)>, but
-users of the L<PublicInbox::WWW> interface will find it
-useful for attempting recovery from copy-paste truncations of
-URLs containing long Message-IDs.
-
-Avoid removing this file and regenerating it; it may cause
-existing NNTP readers to lose sync and miss (or see duplicate)
-messages.
-
-This file is relatively small, and typically less than 5%
-of the space of the mail stored in a packed git repository.
-
-=item $REPO_DIR/public-inbox/xapian*
-
-The database used by L<Search::Xapian>.  This directory name is
-followed by a number indicating the index schema version this
-installation of public-inbox uses.
-
-These directories may be safely deleted or removed in full
-while the NNTP and HTTP interfaces are no longer accessing
-them.
-
-In addition to providing a search interface for the HTTP
-interface, the Xapian database is used to group and combine
-related messages into threads.  For NNTP servers, it also
-provides a cache of metadata and header information often
-requested by NNTP clients.
-
-This directory is large, often two to three times the size of
-the objects stored in a packed git repository.  Using the
-C<--reindex> option makes it larger, still.
+v2 repositories are described in L<public-inbox-v2-format>.
 
 =back
 
@@ -100,8 +62,24 @@ C<--reindex> option makes it larger, still.
 
 Used to override the default "~/.public-inbox/config" value.
 
+=item XAPIAN_FLUSH_THRESHOLD
+
+The number of documents to update before committing changes to
+disk.  This environment variable is handled directly by Xapian;
+refer to the Xapian API documentation for more details.
+
+Default: our indexing code flushes every megabyte of mail seen
+to keep memory usage low.  Setting this environment variable to
+any positive value will switch to a document count-based
+threshold in Xapian.
+
 =back
 
+=head1 UPGRADING
+
+Occasionally, public-inbox will update its schema version and
+require a full index by running this command.
+
 =head1 CONTACT
 
 Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
@@ -111,7 +89,7 @@ and L<http://hjrcffqmbrq6wope.onion/meta/>
 
 =head1 COPYRIGHT
 
-Copyright 2016-2018 all contributors L<mailto:meta@public-inbox.org>
+Copyright 2016-2019 all contributors L<mailto:meta@public-inbox.org>
 
 License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
 
diff --git a/Documentation/public-inbox-v1-format.pod b/Documentation/public-inbox-v1-format.pod
index 3b0e70e..c960913 100644
--- a/Documentation/public-inbox-v1-format.pod
+++ b/Documentation/public-inbox-v1-format.pod
@@ -104,6 +104,10 @@ SQLite3 database maintaining a stable mapping of Message-IDs to NNTP
 article numbers.  Used by L<public-inbox-nntpd(1)> and created
 and updated by L<public-inbox-index(1)>.
 
+Users of the L<PublicInbox::WWW> interface will find it
+useful for attempting recovery from copy-paste truncations of
+URLs containing long Message-IDs.
+
 Automatically updated by L<public-inbox-mda(1)>,
 L<public-inbox-learn(1)> and L<public-inbox-watch(1)>.
 
@@ -135,8 +139,12 @@ the "overview" DB also exists in the xapian directory for v1
 repositories.  See L<public-inbox-v2-format(5)/OVERVIEW DB>
 
 Our use of the L</OVERVIEW DB> requires Xapian document IDs to
-remain stable.  Thus, use of L<xapian-compact(1)> and
-L<copydatabase(8)> require the use of C<--no-renumber> switch.
+remain stable.  Using the L<public-inbox-compact(1)> and
+L<public-inbox-xcpdb(1)> wrappers is recommended over tools
+provided by Xapian.
+
+This directory is large, often two to three times the size of
+the objects stored in a packed git repository.
 
 =item $GIT_DIR/ssoma.index
 
diff --git a/Documentation/public-inbox-v2-format.pod b/Documentation/public-inbox-v2-format.pod
index bc58074..65a85c1 100644
--- a/Documentation/public-inbox-v2-format.pod
+++ b/Documentation/public-inbox-v2-format.pod
@@ -118,8 +118,9 @@ large mail archives; but are fine for backup and usable for
 small instances.
 
 Our use of the L</OVERVIEW DB> requires Xapian document IDs to
-remain stable.  Thus, use of L<xapian-compact(1)> and
-L<copydatabase(8)> require the use of C<--no-renumber> switch.
+remain stable.  Using the L<public-inbox-compact(1)> and
+L<public-inbox-xcpdb(1)> wrappers is recommended over tools
+provided by Xapian.
 
 =head2 OVERVIEW DB
 
diff --git a/Documentation/public-inbox-xcpdb.pod b/Documentation/public-inbox-xcpdb.pod
index c47500b..5697dcd 100644
--- a/Documentation/public-inbox-xcpdb.pod
+++ b/Documentation/public-inbox-xcpdb.pod
@@ -1,6 +1,6 @@
 =head1 NAME
 
-public-inbox-xcpdb - copy Xapian DBs (for format upgrades)
+public-inbox-xcpdb - upgrade Xapian DB formats
 
 =head1 SYNOPSIS
 
@@ -16,7 +16,8 @@ L<public-inbox-watch(1)> or L<public-inbox-mda(1)>.
 
 This is intended for upgrading the database format used by
 Xapian.  It DOES NOT upgrade the schema used by the
-public-inbox search interface (see L<public-inbox-index(1)>).
+public-inbox PSGI search interface (see
+L<public-inbox-index(1)>).
 
 =head1 ENVIRONMENT
 
diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 0d86771..2771a24 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -225,7 +225,7 @@ sub description {
 	local $/ = "\n";
 	chomp $desc;
 	$desc =~ s/\s+/ /smg;
-	$desc = '($REPO_DIR/description missing)' if $desc eq '';
+	$desc = '($INBOX_DIR/description missing)' if $desc eq '';
 	$self->{description} = $desc;
 }
 
diff --git a/script/public-inbox-compact b/script/public-inbox-compact
index 4bdadfc..e8bf31e 100755
--- a/script/public-inbox-compact
+++ b/script/public-inbox-compact
@@ -8,7 +8,7 @@ use PublicInbox::InboxWritable;
 use PublicInbox::Xapcmd;
 use PublicInbox::Admin;
 PublicInbox::Admin::require_or_die('-index');
-my $usage = "Usage: public-inbox-compact REPO_DIR\n";
+my $usage = "Usage: public-inbox-compact INBOX_DIR\n";
 my $opt = { compact => 1, -coarse_lock => 1 };
 GetOptions($opt, @PublicInbox::Xapcmd::COMPACT_OPT) or
 	die "bad command-line args\n$usage";
diff --git a/script/public-inbox-index b/script/public-inbox-index
index b6e3052..40187b3 100755
--- a/script/public-inbox-index
+++ b/script/public-inbox-index
@@ -4,12 +4,12 @@
 # Basic tool to create a Xapian search index for a git repository
 # configured for public-inbox.
 # Usage with libeatmydata <https://www.flamingspork.com/projects/libeatmydata/>
-# highly recommended: eatmydata public-inbox-index REPO_DIR
+# highly recommended: eatmydata public-inbox-index INBOX_DIR
 
 use strict;
 use warnings;
 use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
-my $usage = "public-inbox-index REPO_DIR";
+my $usage = "public-inbox-index INBOX_DIR";
 use PublicInbox::Admin;
 PublicInbox::Admin::require_or_die('-index');
 
diff --git a/script/public-inbox-init b/script/public-inbox-init
index 5724c52..985a09f 100755
--- a/script/public-inbox-init
+++ b/script/public-inbox-init
@@ -5,7 +5,7 @@
 # Initializes a public-inbox, basically a wrapper for git-init(1)
 use strict;
 use warnings;
-my $usage = "public-inbox-init NAME REPO_DIR HTTP_URL ADDRESS [ADDRESS..]";
+my $usage = "public-inbox-init NAME INBOX_DIR HTTP_URL ADDRESS [ADDRESS..]";
 use Getopt::Long qw/:config gnu_getopt no_ignore_case auto_abbrev/;
 use PublicInbox::Admin;
 PublicInbox::Admin::require_or_die('-base');
-- 
EW

