user/dev discussion of public-inbox itself
Subject: [PATCH 5/5] Fix some typos/grammar/errors in docs and comments
From: Štěpán Němec @ 2023-08-28 10:42 UTC (permalink / raw)
  To: meta

---
Please note the FIXME added in this patch: I lacked the confidence to
repair that paragraph on my own.

 Documentation/RelNotes/v2.0.0.wip           |  2 +-
 Documentation/dc-dlvr-spam-flow.txt         |  2 +-
 Documentation/design_notes.txt              | 10 ++++----
 Documentation/design_www.txt                | 12 ++++-----
 Documentation/lei.pod                       |  2 +-
 Documentation/public-inbox-config.pod       | 10 ++++----
 Documentation/public-inbox-daemon.pod       | 20 ++++++++-------
 Documentation/public-inbox-glossary.pod     |  6 ++---
 Documentation/public-inbox-learn.pod        |  4 +--
 Documentation/public-inbox-purge.pod        |  4 +--
 Documentation/public-inbox-tuning.pod       | 12 ++++-----
 Documentation/public-inbox-v2-format.pod    |  6 ++---
 Documentation/public-inbox-watch.pod        |  4 +--
 Documentation/reproducibility.txt           |  4 +--
 Documentation/standards.perl                |  4 +--
 Documentation/technical/data_structures.txt | 28 ++++++++++-----------
 Documentation/technical/ds.txt              |  6 ++---
 Documentation/technical/memory.txt          |  2 +-
 Documentation/technical/whyperl.txt         | 20 +++++++--------
 HACKING                                     | 14 +++++------
 INSTALL                                     |  4 +--
 README                                      | 16 ++++++------
 TODO                                        |  6 ++---
 ci/README                                   |  2 +-
 ci/profiles.sh                              |  2 +-
 devel/README                                |  2 +-
 examples/varnish-4.vcl                      |  2 +-
 lib/PublicInbox/DS.pm                       |  4 +--
 lib/PublicInbox/Daemon.pm                   |  2 +-
 sa_config/README                            |  4 +--
 script/public-inbox-mda                     |  4 +--
 scripts/README                              |  2 +-
 32 files changed, 111 insertions(+), 111 deletions(-)

diff --git a/Documentation/RelNotes/v2.0.0.wip b/Documentation/RelNotes/v2.0.0.wip
index cccf11ae587d..40c87169ccd9 100644
--- a/Documentation/RelNotes/v2.0.0.wip
+++ b/Documentation/RelNotes/v2.0.0.wip
@@ -60,7 +60,7 @@
   * fix `lei q -tt' on locally-indexed messages (still broken for remotes:
     https://public-inbox.org/meta/20230226170931.M947721@dcvr/ )
 
-  * `lei import' now set labels+keywords consistently on all
+  * `lei import' now sets labels+keywords consistently on all
      already-imported messages
 
 solver (used by lei (rediff|blob), and PublicInbox::WWW)
diff --git a/Documentation/dc-dlvr-spam-flow.txt b/Documentation/dc-dlvr-spam-flow.txt
index d151d272d0ae..6210fc7dcff4 100644
--- a/Documentation/dc-dlvr-spam-flow.txt
+++ b/Documentation/dc-dlvr-spam-flow.txt
@@ -39,7 +39,7 @@ delivery path as well as removing the message from the git tree.
 
 * incron - run commands based on filesystem events: http://incron.aiken.cz/
 
-* sendmail / MTA - we use and recommend use postfix, which includes a
+* sendmail / MTA - we use and recommend postfix, which includes a
                    sendmail-compatible wrapper: http://www.postfix.org/
 
 * spamc / spamd - SpamAssassin: http://spamassassin.apache.org/
diff --git a/Documentation/design_notes.txt b/Documentation/design_notes.txt
index 3df5af3e3cf2..95f025560c9e 100644
--- a/Documentation/design_notes.txt
+++ b/Documentation/design_notes.txt
@@ -52,15 +52,15 @@ Why email?
   There is no need to ask the NSA for backups of your mail archives :)
 
 * git, one of the most widely-used version control systems, includes many
-  tools for for email, including: git-format-patch(1), git-send-email(1),
+  tools for email, including: git-format-patch(1), git-send-email(1),
   git-am(1), git-imap-send(1).  Furthermore, the development of git itself
   is based on the git mailing list: https://public-inbox.org/git/
   (or
   http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/git/
-  for Tor users)
+  for Tor users).
 
 * Email is already the de-facto form of communication in many Free Software
-  communities..
+  communities.
 
 * Fallback/transition to private email and other lists, in case the
   public-inbox host becomes unavailable, users may still directly email
@@ -76,13 +76,13 @@ Why git?
 
 * As of 2016, git is widely used and known to nearly all Free Software
   developers.  For non-developers it is packaged for all major GNU/Linux
-  and *BSD distributions.  NNTP is not as widely-used nowadays, and
+  and *BSD distributions.  NNTP is not as widely used nowadays, and
   most IMAP clients do not have good support for read-only mailboxes.
 
 Why perl 5?
 -----------
 
-* Perl 5 is widely available on modern *nix systems with good a history
+* Perl 5 is widely available on modern *nix systems, with a good history
   of backwards and forward compatibility.
 
 * git and SpamAssassin both use it, so it should be one less thing for
diff --git a/Documentation/design_www.txt b/Documentation/design_www.txt
index b1f916ddb369..68488b1fa253 100644
--- a/Documentation/design_www.txt
+++ b/Documentation/design_www.txt
@@ -102,7 +102,7 @@ We also set <title> to make window management easier.
 
 We favor <pre>-formatted text since public-inbox is intended as a place
 to share and discuss patches and code.  Unfortunately, long paragraphs
-tends to be less readable with fixed-width serif fonts which GUI
+tend to be less readable with fixed-width serif fonts which GUI
 browsers default to.
 
 * No graphics, images, or icons at all.  We tolerate, but do not
@@ -122,12 +122,12 @@ browsers default to.
   avoided as they do not render well with some displays or user-chosen
   fonts.
 
-* No JavaScript. JS is historically too buggy and insecure, and we will
+* No JavaScript.  JS is historically too buggy and insecure, and we will
   never expect our readers to do either of the following:
-  a) read and audit all our code for on every single page load
-  b) trust us and and run code without reading it
+  a) read and audit all our code on every single page load
+  b) trust us and run code without reading it
 
-* We only use CSS for one reason: wrapping pre-formatted text
+* We only use CSS for one reason: wrapping pre-formatted text.
   This is necessary because unfortunate GUI browsers tend to be
   prone to layout widening from unwrapped mailers.
   Do not expect CSS to be enabled, especially with scary things like:
@@ -141,4 +141,4 @@ CSS classes (for user-supplied CSS)
 -----------------------------------
 
 See examples in contrib/css/ and lib/PublicInbox/WwwText.pm
-(or https://public-inbox.org/meta/_/text/color/ soon)
+(or <https://public-inbox.org/meta/_/text/color/>)
diff --git a/Documentation/lei.pod b/Documentation/lei.pod
index f01f506af359..2b10f4906e1a 100644
--- a/Documentation/lei.pod
+++ b/Documentation/lei.pod
@@ -126,7 +126,7 @@ Other subcommands include
 
 =head1 FILES
 
-By default storage is located at C<$XDG_DATA_HOME/lei/store>.  The
+By default, storage is located at C<$XDG_DATA_HOME/lei/store>.  The
 configuration for lei resides at C<$XDG_CONFIG_HOME/lei/config>.
 
 =head1 ERRORS
diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index d175d2d74726..d2389abceb0e 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -191,7 +191,7 @@ Default: :all
 The local path name of a CSS file for the PSGI web interface.
 May contain the attributes "media", "title" and "href" which match
 the associated attributes of the HTML <style> tag.
-"href" may be specified to point to the URL of an remote CSS file
+"href" may be specified to point to the URL of a remote CSS file
 and the path may be "/dev/null" or any empty file.
 Multiple files may be specified and will be included in the
 order specified.
@@ -291,10 +291,10 @@ Default: /var/www/htdocs/cgit/cgit.cgi or /usr/lib/cgit/cgit.cgi
 =item publicinbox.cgitdata
 
 A path to the data directory used by cgit for storing static files.
-Typically guessed based the location of C<cgit.cgi> (from
-C<publicinbox.cgitbin>, but may be overridden.
+Typically guessed based on the location of C<cgit.cgi> (from
+C<publicinbox.cgitbin>), but may be overridden.
 
-Default: basename of C<publicinbox.cgitbin>, /var/www/htdocs/cgit/
+Default: dirname of C<publicinbox.cgitbin>, /var/www/htdocs/cgit/
 or /usr/share/cgit/
 
 =item publicinbox.cgit
@@ -311,7 +311,7 @@ Try using C<cgit> as the first choice, this is the default.
 =item * fallback
 
 Fall back to using C<cgit> only if our native, inbox-aware
-git code repository viewer doesn't recognized the URL.
+git code repository viewer doesn't recognize the URL.
 
 =item * rewrite
 
diff --git a/Documentation/public-inbox-daemon.pod b/Documentation/public-inbox-daemon.pod
index 7121683325c7..c5c88bdd04fa 100644
--- a/Documentation/public-inbox-daemon.pod
+++ b/Documentation/public-inbox-daemon.pod
@@ -101,6 +101,8 @@ Default: 1
 The default TLS certificate for HTTPS, IMAPS, NNTPS, POP3S and/or STARTTLS
 support if the C<cert> option is not given with C<--listen>.
 
+=for comment FIXME this paragraph needs repair
+
 Well-known TCP ports automatically get TLS or STARTTLS support
 If using systemd-compatible socket activation and a TCP listener
 on port well-known ports (563 is inherited, it is automatically
@@ -112,15 +114,15 @@ STARTTLS support.
 
 The default TLS certificate key for the default C<--cert> or
 per-listener C<cert=> option.  The private key may be
-concatenated into the path used by the cert, in which case this
+concatenated into the cert file itself, in which case this
 option is not needed.
 
 =item --multi-accept INTEGER
 
-By default, each worker accepts one connection at-a-time to maximize
+By default, each worker accepts one connection at a time to maximize
 fairness and minimize contention across multiple processes on a
 shared listen socket.  Accepting multiple connections at once may be
-useful in constrained deployments with few, heavily-loaded workers.
+useful in constrained deployments with few, heavily loaded workers.
 Negative values enables a worker to accept all available clients at
 once, possibly starving others in the process.  C<-1> behaves like
 C<multi_accept yes> in nginx; while C<0> (the default) is
@@ -137,7 +139,7 @@ Default: 0
 =head1 SIGNALS
 
 Most of our signal handling behavior is copied from L<nginx(8)>
-and/or L<starman(1)>; so it is possible to reuse common scripts
+and/or L<starman(1)>, so it is possible to reuse common scripts
 for managing them.
 
 =over 8
@@ -158,7 +160,7 @@ Reload config files associated with the process.
 
 =item SIGTTIN
 
-Increase the number of running workers processes by one.
+Increase the number of running worker processes by one.
 
 =item SIGTTOU
 
@@ -166,7 +168,7 @@ Decrease the number of running worker processes by one.
 
 =item SIGWINCH
 
-Stop all running worker processes.   SIGHUP or SIGTTIN
+Stop all running worker processes.  SIGHUP or SIGTTIN
 may be used to restart workers.
 
 =item SIGQUIT
@@ -194,7 +196,7 @@ activation.  See L<systemd.socket(5)> and L<sd_listen_fds(3)>.
 
 =item PERL_INLINE_DIRECTORY
 
-Pointing this to point to a writable directory enables the use
+Pointing this to a writable directory enables the use
 of L<Inline> and L<Inline::C> extensions which may provide
 platform-specific performance improvements.  Currently, this
 enables the use of L<vfork(2)> which speeds up subprocess
@@ -211,8 +213,8 @@ created by a user. See L<Inline> and L<Inline::C> for more details.
 There are two ways to upgrade a running process.
 
 Users of process management systems with socket activation
-(L<systemd(1)> or similar) may rely on multiple instances For
-systemd, this means using two (or more) '@' instances for each
+(L<systemd(1)> or similar) may rely on multiple daemon instances.
+For systemd, this means using two (or more) '@' instances for each
 service (e.g. C<SERVICENAME@INSTANCE>) as documented in
 L<systemd.unit(5)>.
 
diff --git a/Documentation/public-inbox-glossary.pod b/Documentation/public-inbox-glossary.pod
index 3c9e2bd21283..d88539c8b0fb 100644
--- a/Documentation/public-inbox-glossary.pod
+++ b/Documentation/public-inbox-glossary.pod
@@ -25,7 +25,7 @@ C<over.sqlite3>
 
 =item tid, THREADID
 
-A sequentially-assigned positive integer.  These integers are
+A sequentially assigned positive integer.  These integers are
 per-inbox or per-extindex.  In the future, this may be prefixed
 with C<T> for JMAP (RFC 8621) and RFC 8474.  This may not be
 strictly compliant with RFC 8621 since inboxes and extindices
@@ -40,7 +40,7 @@ RFC-(822|2822|5322) email message.
 
 =item IMAP EMAILID, JMAP Email Id
 
-To-be-decided.  This will likely be the git blob ID prefixed with C<g>
+To be decided.  This will likely be the git blob ID prefixed with C<g>
 rather than the numeric UID to accommodate the same blob showing
 up in both an extindex and inbox (or multiple extindices).
 
@@ -87,7 +87,7 @@ but it imports drafts.
 
 For L<lei(1)> users only.  This will allow lei users to place
 the same email into one or more virtual folders for
-ease-of-filtering.  This is NOT tied to public-inbox names, as
+ease of filtering.  This is NOT tied to public-inbox names, as
 messages stored by lei may not be public.
 
 These are similar in spirit to arbitrary freeform "tags"
diff --git a/Documentation/public-inbox-learn.pod b/Documentation/public-inbox-learn.pod
index 3c92b1cc698b..f776df6b2bb0 100644
--- a/Documentation/public-inbox-learn.pod
+++ b/Documentation/public-inbox-learn.pod
@@ -54,7 +54,7 @@ This is similar to the C<spam> command above, but does
 not feed the message to L<spamc(1)> and only removes messages
 which match on any of the C<To:>, C<Cc:>, and C<List-ID:> headers.
 
-The C<--all> option may be used match C<spam> semantics in removing
+The C<--all> option may be used to match C<spam> semantics in removing
 the message from all configured inboxes.  C<--all> is only
 available in public-inbox 1.6.0+.
 
@@ -82,7 +82,7 @@ L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
 
 =head1 COPYRIGHT
 
-Copyright 2019-2021 all contributors L<mailto:meta@public-inbox.org>
+Copyright all contributors L<mailto:meta@public-inbox.org>
 
 License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
 
diff --git a/Documentation/public-inbox-purge.pod b/Documentation/public-inbox-purge.pod
index 945286c69f97..1223b5775828 100644
--- a/Documentation/public-inbox-purge.pod
+++ b/Documentation/public-inbox-purge.pod
@@ -31,7 +31,7 @@ leads to discontiguous git history.
 =item --all
 
 Purge the message in all inboxes configured in ~/.public-inbox/config.
-This is an alternative to specifying individual inboxes directories
+This is an alternative to specifying individual inbox directories
 on the command-line.
 
 =back
@@ -74,7 +74,7 @@ L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
 
 =head1 COPYRIGHT
 
-Copyright 2019-2021 all contributors L<mailto:meta@public-inbox.org>
+Copyright all contributors L<mailto:meta@public-inbox.org>
 
 License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
 
diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index 53668eccb7cb..58a4d9bcbabd 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -79,8 +79,8 @@ RAM.  Attempts to parallelize random I/O on HDDs leads to pathological
 slowdowns as inboxes grow.
 
 While C<-V2> introduced Xapian shards as a parallelization
-mechanism for SSDs; enabling C<publicInbox.indexSequentialShard>
-repurposes sharding as mechanism to reduce the kernel page cache
+mechanism for SSDs, enabling C<publicInbox.indexSequentialShard>
+repurposes sharding as a mechanism to reduce the kernel page cache
 footprint when indexing on HDDs.
 
 Initializing a mirror with a high C<--jobs> count to create more
@@ -108,7 +108,7 @@ indices on btrfs to achieve acceptable performance (even on SSD).
 Disabling copy-on-write also disables checksumming, thus C<raid1>
 (or higher) configurations may be corrupt after unsafe shutdowns.
 
-Fortunately, these SQLite and Xapian indices are designed to
+Fortunately, these SQLite and Xapian indices are designed to be
 recoverable from git if missing.
 
 Disabling CoW does not prevent all fragmentation.  Large values
@@ -125,7 +125,7 @@ C<btrfs filesystem defragment -fr $INBOX_DIR> may be necessary.
 Large filesystems benefit significantly from the C<space_cache=v2>
 mount option documented in L<btrfs(5)>.
 
-Older, non-CoW filesystems are generally work well out-of-the-box
+Older, non-CoW filesystems generally work well out of the box
 for our Xapian and SQLite indices.
 
 =head2 Performance on solid state drives
@@ -152,7 +152,7 @@ C<LimitNOFILE=> in L<systemd.exec(5)>) may need to be raised to
 accommodate many concurrent clients.
 
 Transport Layer Security (IMAPS, NNTPS, or via STARTTLS) significantly
-increases memory use of client sockets, sure to account for that in
+increases memory use of client sockets; be sure to account for that in
 capacity planning.
 
 =head2 Other OS tuning knobs
@@ -168,7 +168,7 @@ Other OSes may have similar tuning knobs (patches appreciated).
 L<public-inbox-extindex(1)> allows any number of public-inboxes
 to share the same Xapian indices.
 
-git 2.33+ startup time is orders-of-magnitude faster and uses
+git 2.33+ startup time is orders of magnitude faster and uses
 less memory when dealing with thousands of alternates required
 for thousands of inboxes with L<public-inbox-extindex(1)>.
 
diff --git a/Documentation/public-inbox-v2-format.pod b/Documentation/public-inbox-v2-format.pod
index e93d7fc701d9..de3b0bfd390f 100644
--- a/Documentation/public-inbox-v2-format.pod
+++ b/Documentation/public-inbox-v2-format.pod
@@ -30,7 +30,7 @@ databases for parallelism by "shards".
   - all.git                         # empty, alternates to $EPOCH.git
   - xap$SCHEMA_VERSION/$SHARD       # per-shard Xapian DB
   - xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP, threading
-  - msgmap.sqlite3                  # same the v1 msgmap
+  - msgmap.sqlite3                  # same as the v1 msgmap
 
 For blob lookups, the reader only needs to open the "all.git"
 repository with $GIT_DIR/objects/info/alternates which references
@@ -89,7 +89,7 @@ After-the-fact invocations of L<public-inbox-index> will ignore
 messages written to 'd' after they are written to 'm'.
 
 Deltafication is not significantly improved over v1, but overall
-storage for trees is made as as small as possible.  Initial
+storage for trees is made as small as possible.  Initial
 statistics and benchmarks showing the benefits of this approach
 are documented at:
 
@@ -97,7 +97,7 @@ L<https://public-inbox.org/meta/20180209205140.GA11047@dcvr/>
 
 =head2 XAPIAN SHARDS
 
-Another second scalability problem in v1 was the inability to
+Another scalability problem in v1 was the inability to
 utilize multiple CPU cores for Xapian indexing.  This is
 addressed by using shards in Xapian to perform import
 indexing in parallel.
diff --git a/Documentation/public-inbox-watch.pod b/Documentation/public-inbox-watch.pod
index e8f97c8088c9..febda0b13df4 100644
--- a/Documentation/public-inbox-watch.pod
+++ b/Documentation/public-inbox-watch.pod
@@ -41,7 +41,7 @@ importing them into public-inbox git repositories and indices.
 public-inbox-watch is useful in situations when a user wishes to
 mirror an existing mailing list, but has no access to run
 L<public-inbox-mda(1)> on a server.  Unlike public-inbox-mda
-which is invoked once per-message, public-inbox-watch is a
+which is invoked once per message, public-inbox-watch is a
 persistent process, making it faster for after-the-fact imports
 of large Maildirs.
 
@@ -62,7 +62,7 @@ public-inbox-watch takes no command-line options.
 =head1 CONFIGURATION
 
 These configuration knobs should be used in the
-L<public-inbox-config(5)> file
+L<public-inbox-config(5)> file.
 
 =over 8
 
diff --git a/Documentation/reproducibility.txt b/Documentation/reproducibility.txt
index 4e56ada48bb2..3336de731a4d 100644
--- a/Documentation/reproducibility.txt
+++ b/Documentation/reproducibility.txt
@@ -12,7 +12,7 @@ reproducible.
 Keeping all communications as email ensures the full history
 of the entire project can be mirrored by anyone with the
 resources to do so.  Compact, low-complexity data requires
-less resources to mirror, so sticking with plain-text
+fewer resources to mirror, so sticking with plain text
 ensures more parties can mirror and potentially fork the
 project with all its data.
 
@@ -26,4 +26,4 @@ If these things make power hungry project leaders and admins
 uncomfortable, good.  That was the point.  It's how checks
 and balances ought to work.
 
-Comments, corrections, etc welcome: meta@public-inbox.org
+Comments, corrections, etc. welcome: meta@public-inbox.org
diff --git a/Documentation/standards.perl b/Documentation/standards.perl
index c36afb5d718b..743cdee1ce24 100755
--- a/Documentation/standards.perl
+++ b/Documentation/standards.perl
@@ -11,11 +11,11 @@ Non-exhaustive list of standards public-inbox software attempts or
 intends to implement.  This list is intended to be a quick reference
 for hackers and users.
 
-Given the goals of interoperability and accessibility; strict
+Given the goals of interoperability and accessibility, strict
 conformance to standards is not always possible, but rather
 best-effort taking into account real-world cases.  In particular,
 "obsolete" standards remain relevant as long as clients and
-data exists.
+data using them exist.
 
 IETF RFCs
 ---------
diff --git a/Documentation/technical/data_structures.txt b/Documentation/technical/data_structures.txt
index 4dcf9ce609be..5ed21882b9f8 100644
--- a/Documentation/technical/data_structures.txt
+++ b/Documentation/technical/data_structures.txt
@@ -32,19 +32,19 @@ Per-message classes
   Common abbreviation: $mime, $eml
   Used by: PublicInbox::WWW, PublicInbox::SearchIdx
 
-  An representation of an entire email, multipart or not.
+  A representation of an entire email, multipart or not.
   An option to use libgmime or libmailutils may be supported
   in the future for performance and memory use.
 
   This can be a memory hog with big messages and giant
   attachments, so our PublicInbox::WWW interface only keeps
-  one object of this class in memory at-a-time.
+  one object of this class in memory at a time.
 
   In other words, this is the "meat" of the message, whereas
   $smsg (below) is just the "skeleton".
 
   Our PublicInbox::V2Writable class may have two objects of this
-  type in memory at-a-time for deduplication.
+  type in memory at a time for deduplication.
 
   In public-inbox 1.4 and earlier, Email::MIME and its subclass,
   PublicInbox::MIME were used.  Despite still slurping,
@@ -61,10 +61,10 @@ Per-message classes
 
   This is loaded from either the overview DB (over.sqlite3) or
   the Xapian DB (docdata.glass), though the Xapian docdata
-  is won't hold NNTP-only fields (Cc:/To:)
+  won't hold NNTP-only fields (Cc:/To:).
 
   There may be hundreds or thousands of these objects in memory
-  at-a-time, so fields are pruned if unneeded.
+  at a time, so fields are pruned if unneeded.
 
 * PublicInbox::SearchThread::Msg - subclass of Smsg
   Common abbreviation: $cont or $node
@@ -75,9 +75,9 @@ Per-message classes
   Nowadays, this is a re-blessed $smsg with additional fields.
 
   As with $smsg objects, there may be hundreds or thousands
-  of these objects in memory at-a-time.
+  of these objects in memory at a time.
 
-  We also do not use a linked-list for storing children as JWZ
+  We also do not use a linked list for storing children as JWZ
   describes, but instead a Perl hashref for {children} which
   becomes an arrayref upon sorting.
 
@@ -88,7 +88,7 @@ Per-inbox classes
 
 * PublicInbox::Inbox - represents a single public-inbox
   Common abbreviation: $ibx
-  Used everywhere
+  Used everywhere.
 
   This represents a "publicinbox" section in the config
   file, see public-inbox-config(5) for details.
@@ -152,7 +152,7 @@ ad-hoc structures shared across packages
   This holds the PSGI $env as well as any internal variables
   used by various modules of PublicInbox::WWW.
 
-  As with the PSGI $env, there is one per-active WWW
+  As with the PSGI $env, there is one per active WWW
   request+response cycle.  It does not exist for idle HTTP
   clients.
 
@@ -174,8 +174,8 @@ daemon classes
   Common abbreviation: $http
   Used by: PublicInbox::DS, public-inbox-httpd
 
-  Unlike PublicInbox::NNTP, this class no knowledge of any of
-  the email or git-specific parts of public-inbox, only PSGI.
+  Unlike PublicInbox::NNTP, this class has no knowledge of any of
+  the email- or git-specific parts of public-inbox, only PSGI.
   However, it supports APIs and behaviors (e.g. streaming large
   responses) which PublicInbox::WWW may take advantage of.
 
@@ -188,7 +188,7 @@ daemon classes
 
   This class calls non-blocking accept(2) or accept4(2) on a
   listen socket to create new PublicInbox::HTTP and
-  PublicInbox::HTTP instances.
+  PublicInbox::NNTP instances.
 
 * PublicInbox::HTTPD
   Common abbreviation: $httpd
@@ -197,9 +197,9 @@ daemon classes
   wrappers around client sockets accepted from
   PublicInbox::Listener.
 
-  Since the SERVER_NAME and SERVER_PORT PSGI variables needs to be
+  Since the SERVER_NAME and SERVER_PORT PSGI variables need to be
   exposed for HTTP/1.0 requests when Host: headers are missing,
-  this is per-Listener socket.
+  this is per Listener socket.
 
 * PublicInbox::HTTPD::Async
   Common abbreviation: $async
diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
index 4cfb62fe44c8..afead2f155e0 100644
--- a/Documentation/technical/ds.txt
+++ b/Documentation/technical/ds.txt
@@ -19,7 +19,7 @@ Most notably:
   triggers a call.
 
   The lack of read/write callback distinction is driven by the
-  fact TLS libraries (e.g. OpenSSL via IO::Socket::SSL) may
+  fact that TLS libraries (e.g. OpenSSL via IO::Socket::SSL) may
   declare SSL_WANT_READ on SSL_write(), and SSL_WANT_READ on
   SSL_read().  So we end up having to let each user object decide
   whether it wants to make read or write calls depending on its
@@ -35,7 +35,7 @@ Most notably:
   Reducing the user-supplied code down to a single callback allows
   subclasses to keep their logic self-contained.  The combination
   of this change and one-shot wakeups (see below) for bidirectional
-  data flows make asynchronous code easier to reason about.
+  data flows makes asynchronous code easier to reason about.
 
 Other divergences:
 
@@ -53,7 +53,7 @@ Other divergences:
 
 Augmented features:
 
-* obj->write(CODEREF) passes the object itself to the CODEREF
+* obj->write(CODEREF) passes the object itself to the CODEREF.
   Being able to enqueue subroutine calls is a powerful feature in
   Danga::Socket for keeping linear logic in an asynchronous environment.
   Unfortunately, each subroutine takes several kilobytes of memory.
diff --git a/Documentation/technical/memory.txt b/Documentation/technical/memory.txt
index a35b2c734409..039694c33441 100644
--- a/Documentation/technical/memory.txt
+++ b/Documentation/technical/memory.txt
@@ -8,7 +8,7 @@ memory-efficient.
 We strive to keep processes small to improve locality, allow
 the kernel to cache more files, and to be a good neighbor to
 other processes running on the machine.  Taking advantage of
-automatic reference counting (ARC) in Perl allows us
+automatic reference counting (ARC) in Perl allows us to
 deterministically release memory back to the heap.
 
 We start with a simple data model with few circular
diff --git a/Documentation/technical/whyperl.txt b/Documentation/technical/whyperl.txt
index fbe2e1b16e06..db1d9793a76a 100644
--- a/Documentation/technical/whyperl.txt
+++ b/Documentation/technical/whyperl.txt
@@ -21,7 +21,7 @@ Good Things
 
   Perl 5 is installed on many, if not most GNU/Linux and
   BSD-based servers and workstations.  It is likely the most
-  widely-installed programming environment that offers a
+  widely installed programming environment that offers a
   significant amount of POSIX functionality.  Users won't
   have to waste bandwidth or space with giant toolchains or
   architecture-specific binaries.
@@ -47,8 +47,8 @@ Good Things
 
 * Predictable performance
 
-  While Perl is neither fast or memory-efficient, its
-  performance and memory use are predictable and does not
+  While Perl is neither fast nor memory-efficient, its
+  performance and memory use are predictable and do not
   require GC tuning by the user.
 
   public-inbox is developed for (and mostly on) old
@@ -56,7 +56,7 @@ Good Things
   late 1990s, and any cheap VPS today has more than enough
   RAM and CPU for handling plain-text email.
 
-  Low hardware requirements increases the reach of our software
+  Low hardware requirements increase the reach of our software
   to more users, improving centralization resistance.
 
 * Compatibility
@@ -86,7 +86,7 @@ Good Things
 
   There should be no need to rely on language-specific
   package managers such as cpan(1), those systems increase
-  the learning curve for users and systems administrators.
+  the learning curve for users and system administrators.
 
 * Compactness and terseness
 
@@ -98,7 +98,7 @@ Good Things
 * Performance ceiling and escape hatch
 
   With optional Inline::C, we can be "as fast as C" in some
-  cases.  Inline::C is widely-packaged by distros and it
+  cases.  Inline::C is widely packaged by distros and it
   gives us an escape hatch for dealing with missing bindings
   or performance problems should they arise.  Inline::C use
   (as opposed to XS) also preserves the software freedom and
@@ -135,7 +135,7 @@ Bad Things
   (m//, substr(), index(), etc.) still require memory copies
   into userspace, negating a benefit of zero-copy.
 
-* The XS/C API make it difficult to improve internals while
+* The XS/C API makes it difficult to improve internals while
   preserving compatibility.
 
 * Lack of optional type checking.  This may be a blessing in
@@ -161,14 +161,14 @@ Red herrings to ignore when evaluating other runtimes
 -----------------------------------------------------
 
 These don't discount a language or runtime from being
-being used, they're just not interesting.
+used, they're just not interesting.
 
 * Lightweight threading
 
   While lightweight threading implementations are
-  convenient, they tend to be significantly heavier than a
+  convenient, they tend to be significantly heavier than
   pure event-loop systems (or multi-threaded event-loop
-  systems)
+  systems).
 
   Lightweight threading implementations have stack overhead
   and growth typically measured in kilobytes.  The userspace
diff --git a/HACKING b/HACKING
index df68b54d0f40..18ec74206c45 100644
--- a/HACKING
+++ b/HACKING
@@ -7,7 +7,7 @@ It is archived at: https://public-inbox.org/meta/
 and http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/ (using Tor)
 
 Contributions are email-driven, just like contributing to git
-itself or the Linux kernel; however anonymous and pseudonymous
+itself or the Linux kernel; nevertheless, anonymous and pseudonymous
 contributions will always be welcome.
 
 Please consider our goals in mind:
@@ -15,17 +15,17 @@ Please consider our goals in mind:
 	Decentralization, Accessibility, Compatibility, Performance
 
 These goals apply to everyone: users viewing over the web or NNTP,
-sysadmins running public-inbox, and other hackers working public-inbox.
+sysadmins running public-inbox, and other hackers working on public-inbox.
 
 We will reject any feature which advocates or contributes to any
-particular instance of a public-inbox becoming a single point of failure.
+particular instance of public-inbox becoming a single point of failure.
 Things we've considered but rejected include:
 
 * exposing article serial numbers outside of NNTP
 * allowing readers to inject metadata (e.g. votes)
 
 We care about being accessible to folks with vision problems and/or
-lack the computing resources to view so-called "modern" websites.
+lacking the computing resources to view so-called "modern" websites.
 This includes folks on slow connections and ancient browsers which
 may be too difficult to upgrade due to resource demands.
 
@@ -45,7 +45,7 @@ Just-Ahead-of-Time-compiled C (via Inline::C)
 Do not recurse on user-supplied data.  Neither Perl or C handle
 deep recursion gracefully.  See lib/PublicInbox/SearchThread.pm
 and lib/PublicInbox/MsgIter.pm for examples of non-recursive
-alternatives to previously-recursive algorithms.
+alternatives to previously recursive algorithms.
 
 Performance should be reasonably good for server administrators, too,
 and we will sacrifice features to achieve predictable performance.
@@ -61,8 +61,6 @@ on specific topics, in particular data_structures.txt
 Optional packages for testing and development
 ---------------------------------------------
 
-Optional packages testing and development:
-
 - Plack::Test                      deb: libplack-test-perl
                                    pkg: p5-Plack
                                    rpm: perl-Plack-Test
@@ -107,6 +105,6 @@ Perl notes
 ----------
 
 * \w, \s, \d character classes all match Unicode characters;
-  so write out class ranges (e.g "[0-9]") if you only intend to
+  so write out class ranges (e.g., "[0-9]") if you only intend to
   match ASCII.  Do not use the "/a" (ASCII) modifier, that requires
   Perl 5.14 and we're only depending on 5.10.1 at the moment.
diff --git a/INSTALL b/INSTALL
index 91e590ce3318..f5e14ebe73d4 100644
--- a/INSTALL
+++ b/INSTALL
@@ -1,7 +1,7 @@
 public-inbox (server-side) installation
 ---------------------------------------
 
-This is for folks who want to setup their own public-inbox instance.
+This is for folks who want to set up their own public-inbox instance.
 Clients should use normal git-clone/git-fetch, IMAP or NNTP clients
 if they want to import mail into their personal inboxes.
 
@@ -135,7 +135,7 @@ Numerous optional modules are likely to be useful as well:
                                     foreground servers)
 
 The following module is typically pulled in by dependencies listed
-above, so there is no need to explicitly install them:
+above, so there is no need to explicitly install it:
 
 - DBI                              deb: libdbi-perl
                                    pkg: p5-DBI
diff --git a/README b/README
index abe8ddc0075f..a9aa0e864ca2 100644
--- a/README
+++ b/README
@@ -17,7 +17,7 @@ public-inbox spawned around three main ideas:
   communication.  Users may have broken graphics drivers, limited
   eyesight, or be unable to afford modern hardware.
 
-public-inbox aims to be easy-to-deploy and manage; encouraging projects
+public-inbox aims to be easy to deploy and manage, encouraging projects
 to run their own instances with minimal overhead.
 
 Implementation
@@ -27,7 +27,7 @@ public-inbox stores mail in git repositories as documented
 in https://public-inbox.org/public-inbox-v2-format.txt and
 https://public-inbox.org/public-inbox-v1-format.txt
 
-By storing (and optionally) exposing an inbox via git, it is
+By storing and (optionally) exposing an inbox via git, it is
 fast and efficient to host and mirror public-inboxes.
 
 Traditional mailing lists use the "push" model.  For readers,
@@ -42,11 +42,11 @@ follow the list via NNTP, IMAP, POP3, Atom feed or HTML archives.
 
 If a reader loses interest, they simply stop following.
 
-Since we use git, mirrors are easy-to-setup, and lists are
-easy-to-relocate to different mail addresses without losing
+Since we use git, mirrors are easy to set up, and lists are
+easy to relocate to different mail addresses without losing
 or splitting archives.
 
-_Anybody_ may also setup a delivery-only mailing list server to
+_Anybody_ may also set up a delivery-only mailing list server to
 replay a public-inbox git archive to subscribers via SMTP.
 
 Features
@@ -111,7 +111,7 @@ and pull requests to our public-inbox address at:
 
 Please Cc: all recipients when replying as we do not require
 subscription.  This also makes it easier to rope in folks of
-tangentially related projects we depend on (e.g. git developers
+tangentially related projects we depend on (e.g., git developers
 on git@vger.kernel.org).
 
 The archives are readable via IMAP, NNTP or HTTP:
@@ -155,8 +155,8 @@ This improves accessibility, and saves bandwidth and storage
 as mail is archived forever.
 
 As of the 2010s, successful online social networks and forums are the
-ones which heavily restrict users formatting options; so public-inbox
-aims to preserve the focus on content, and not presentation.
+ones which heavily restrict users' formatting options; public-inbox
+aims to preserve the focus on content, not presentation.
 
 Copyright
 ---------
diff --git a/TODO b/TODO
index 77453eba27ac..de628e2e310a 100644
--- a/TODO
+++ b/TODO
@@ -1,8 +1,8 @@
 TODO items for public-inbox
 
 (Not in any particular order, and
-performance, ease-of-setup, installation, maintainability, etc
-all need to be considered for everything we introduce)
+performance, ease of setup, installation, maintainability, etc.
+all need to be considered for everything we introduce.)
 
 * general performance improvements, but without relying on
   XS or pre-built modules any more than we currently do.
@@ -32,7 +32,7 @@ all need to be considered for everything we introduce)
   portability to older Linux, free BSDs and maybe Hurd).
 
 * dogfood latest Xapian, Perl5, SQLite, git and various modules to
-  ensure things continue working as they should (or more better)
+  ensure things continue working as they should (or better)
   while retaining compatibility with old versions.
 
 * Support more of RFC 3977 (NNTP)
diff --git a/ci/README b/ci/README
index 4687fbc57059..728d82a0052c 100644
--- a/ci/README
+++ b/ci/README
@@ -27,7 +27,7 @@ run in the top-level source tree, that is, as `./ci/run.sh'.
 	or doing development.  However, it can be convenient to for
 	users to mass-install several packages.
 
-* ci/profiles.sh - prints to-be tested package profile for the current OS
+* ci/profiles.sh - prints to-be-tested package profile for the current OS
 
 	Called automatically by ci/run.sh
 	The output is read by ci/run.sh
diff --git a/ci/profiles.sh b/ci/profiles.sh
index e58b61d50a13..55b998d73633 100755
--- a/ci/profiles.sh
+++ b/ci/profiles.sh
@@ -2,7 +2,7 @@
 # Copyright (C) 2019-2021 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
-# Prints OS-specific package profiles to stdout (one per-newline) to use
+# Prints OS-specific package profiles to stdout (one per line) to use
 # as command-line args for ci/deps.perl.  Called automatically by ci/run.sh
 
 # set by os-release(5) or similar
diff --git a/devel/README b/devel/README
index 8f9a0485ec3f..c4be51415d34 100644
--- a/devel/README
+++ b/devel/README
@@ -1 +1 @@
-scripts use for public-inbox development that don't belong in t/
+scripts used for public-inbox development that don't belong in t/
diff --git a/examples/varnish-4.vcl b/examples/varnish-4.vcl
index 5fc202ed4f36..624f60133599 100644
--- a/examples/varnish-4.vcl
+++ b/examples/varnish-4.vcl
@@ -28,7 +28,7 @@ sub vcl_recv {
 }
 
 sub vcl_pipe {
-	# By default Connection: close is set on all piped requests by varnish,
+	# By default, Connection: close is set on all piped requests by varnish,
 	# but public-inbox-httpd supports persistent connections well :)
 	unset bereq.http.connection;
 	return (pipe);
diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 98084b5c8a0a..e89dc4306c7b 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -209,8 +209,8 @@ sub await_cb ($;@) {
 	warn "E: awaitpid($pid): $@" if $@;
 }
 
-# This relies on our Perl process is single-threaded, or at least
-# no threads are spawning and waiting on processes (``, system(), etc...)
+# This relies on our Perl process being single-threaded, or at least
+# no threads spawning and waiting on processes (``, system(), etc...)
 # Threads are officially discouraged by the Perl5 team, and I expect
 # that to remain the case.
 sub reap_pids {
diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index 30442227bdf8..88b0fa45bbb6 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -155,7 +155,7 @@ options:
 
   -l ADDRESS    address to listen on$dh
   --cert=FILE   default SSL/TLS certificate
-  --key=FILE    default SSL/TLS certificate
+  --key=FILE    default SSL/TLS certificate key
   -W WORKERS    number of worker processes to spawn (default: 1)
 
 See public-inbox-daemon(8) and $prog(1) man pages for more.
diff --git a/sa_config/README b/sa_config/README
index 6703c38fe1ae..3705e1e85d1b 100644
--- a/sa_config/README
+++ b/sa_config/README
@@ -4,9 +4,9 @@ SpamAssassin configs for public-inbox.org
 root/ - files for system-wide use (plugins, rule definitions,
         new rules should have a zero score which should be overridden)
 user/ - per-user config (keep as much in here as possible)
-        These files go into the users home directory
+        These files go into the user's home directory.
 
-All files in these example directory are CC0:
+All files in these example directories are CC0:
 To the extent possible under law, Eric Wong has waived all copyright and
 related or neighboring rights to these examples.
 
diff --git a/script/public-inbox-mda b/script/public-inbox-mda
index 7e2bee92096e..ba4989569e25 100755
--- a/script/public-inbox-mda
+++ b/script/public-inbox-mda
@@ -33,8 +33,8 @@ use PublicInbox::Filter::Base;
 use PublicInbox::InboxWritable;
 use PublicInbox::Spamcheck;
 
-# n.b: hopefully we can setup the emergency path without bailing due to
-# user error, we really want to setup the emergency destination ASAP
+# n.b.: Hopefully we can set up the emergency path without bailing due to
+# user error, we really want to set up the emergency destination ASAP
 # in case there's bugs in our code or user error.
 my $emergency = $ENV{PI_EMERGENCY} || "$ENV{HOME}/.public-inbox/emergency/";
 $ems = PublicInbox::Emergency->new($emergency);
diff --git a/scripts/README b/scripts/README
index 3b9c37da8787..7ffbd93cb994 100644
--- a/scripts/README
+++ b/scripts/README
@@ -1,5 +1,5 @@
 This directory contains informal scripts and random tools used
-in the development of public-inbox.  Some only exist only for
+in the development of public-inbox.  Some only exist for
 historical purposes, and some may not work anymore.
 
 See the "script/" directory (not "scripts/") for supported and
-- 
2.42.0



* [PATCH 2/6] doc: technical/ds: update blurb to note more daemons
  @ 2023-03-09 19:28 99% ` Eric Wong
  0 siblings, 0 replies; 7+ results
From: Eric Wong @ 2023-03-09 19:28 UTC (permalink / raw)
  To: meta

And add a note about the various wakeup modes of kqueue|epoll
while we're at it; we use all of them!
---
 Documentation/technical/ds.txt | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
index 89cc05af..4cfb62fe 100644
--- a/Documentation/technical/ds.txt
+++ b/Documentation/technical/ds.txt
@@ -1,9 +1,14 @@
 PublicInbox::DS - event loop and async I/O base class
 
-Our PublicInbox::DS event loop which powers public-inbox-nntpd
-and public-inbox-httpd diverges significantly from the
-unmaintained Danga::Socket package we forked from.  In fact,
-it's probably different from most other event loops out there.
+Our PublicInbox::DS event loop which powers most of our long-lived
+processes(*) diverges significantly from the unmaintained Danga::Socket
+package we forked from.  In fact, it's probably different from most
+other event loops out there.
+
+Most notably, it uses one-shot, level-trigger, and edge-trigger mode
+modes of kqueue|epoll depending on the situation.
+
+(*) public-inbox-netd,(-httpd,-imapd,-nntpd,-pop3d,-watch) + lei-daemon
 
 Most notably:
 


* [PATCH 12/12] ds: drop dwaitpid, switch to waitpid(-1)
  @ 2023-01-17  7:19 83% ` Eric Wong
  0 siblings, 0 replies; 7+ results
From: Eric Wong @ 2023-01-17  7:19 UTC (permalink / raw)
  To: meta

With no remaining users, we can drop dwaitpid and switch
awaitpid to rely on waitpid(-1) to save syscalls.
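
A minimal standalone sketch of the waitpid(-1) approach follows. It makes
the same assumption as the new reap_pids comment: nothing else in the
process (threads, ``, system()) waits on children. %await, await_pid and
reap_all are hypothetical names loosely modeled on $AWAIT_PIDS, awaitpid
and reap_pids. They are not the PublicInbox::DS API.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical registry in the spirit of $AWAIT_PIDS:
# pid => [ $callback, @args ]
my %await;
my @reaped; # [ tag, exit code ] pairs collected by the callbacks

sub await_pid {
	my ($pid, $cb, @args) = @_;
	$await{$pid} = [ $cb, @args ];
}

# One waitpid(-1, WNOHANG) per exited child, rather than one
# waitpid($pid, WNOHANG) per registered PID on every pass.
# Only safe if nothing else in the process (threads, ``, system())
# waits on children; otherwise we could steal their exit statuses.
sub reap_all {
	while (1) {
		my $pid = waitpid(-1, WNOHANG);
		last if $pid <= 0; # 0: children still running; -1: none left
		if (my $cb_args = delete $await{$pid}) {
			my ($cb, @args) = @$cb_args;
			$cb->($pid, $?, @args);
		} else {
			warn "W: reaped unknown PID=$pid: \$?=$?\n";
		}
	}
}

for my $code (0, 3) {
	my $pid = fork;
	die "fork: $!" unless defined $pid;
	POSIX::_exit($code) if $pid == 0; # child: exit immediately
	await_pid($pid, sub {
		my ($kid, $status, $tag) = @_;
		push @reaped, [ $tag, $status >> 8 ];
	}, "exit-$code");
}

# A real event loop would call reap_all() from its SIGCHLD handling;
# this demo simply polls until every registered child is reaped.
while (%await) {
	reap_all();
	select(undef, undef, undef, 0.01);
}
```

Each waitpid(-1) call reaps whichever child has exited. The cost therefore
scales with the number of dead children, not with the number of PIDs
being tracked.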
---
 Documentation/technical/ds.txt |  2 +-
 lib/PublicInbox/DS.pm          | 68 +++++++---------------------------
 2 files changed, 15 insertions(+), 55 deletions(-)

diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
index 5a1655a1..89cc05af 100644
--- a/Documentation/technical/ds.txt
+++ b/Documentation/technical/ds.txt
@@ -81,7 +81,7 @@ New features
 
 * IO::Socket::SSL support (for NNTPS, STARTTLS+NNTP, HTTPS)
 
-* dwaitpid (waitpid wrapper) support for reaping dead children
+* awaitpid (waitpid wrapper) support for reaping dead children
 
 * reliable signal wakeups are supported via signalfd on Linux,
   EVFILT_SIGNAL on *BSDs via IO::KQueue.
diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 9563a1cb..c849f515 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -32,11 +32,10 @@ use PublicInbox::Syscall qw(:epoll);
 use PublicInbox::Tmpfile;
 use Errno qw(EAGAIN EINVAL);
 use Carp qw(carp croak);
-our @EXPORT_OK = qw(now msg_more dwaitpid awaitpid add_timer add_uniq_timer);
+our @EXPORT_OK = qw(now msg_more awaitpid add_timer add_uniq_timer);
 
 my %Stack;
 my $nextq; # queue for next_tick
-my $wait_pids; # list of [ pid, callback, callback_arg ]
 my $AWAIT_PIDS; # pid => [ $callback, @args ]
 my $reap_armed;
 my $ToClose; # sockets to close when event loop is done
@@ -75,11 +74,11 @@ sub Reset {
 		# we may be iterating inside one of these on our stack
 		my @q = delete @Stack{keys %Stack};
 		for my $q (@q) { @$q = () }
-		$AWAIT_PIDS = $wait_pids = $nextq = $ToClose = undef;
+		$AWAIT_PIDS = $nextq = $ToClose = undef;
 		$ep_io = undef; # closes real $Epoll FD
 		$Epoll = undef; # may call DSKQXS::DESTROY
-	} while (@Timers || keys(%Stack) || $nextq || $wait_pids ||
-		$ToClose || keys(%DescriptorMap) || $AWAIT_PIDS ||
+	} while (@Timers || keys(%Stack) || $nextq || $AWAIT_PIDS ||
+		$ToClose || keys(%DescriptorMap) ||
 		$PostLoopCallback || keys(%UniqTimer));
 
 	$reap_armed = undef;
@@ -209,43 +208,23 @@ sub await_cb ($;@) {
 	warn "E: awaitpid($pid): $@" if $@;
 }
 
-# We can't use waitpid(-1) safely here since it can hit ``, system(),
-# and other things.  So we scan the $wait_pids list, which is hopefully
-# not too big.  We keep $wait_pids small by not calling dwaitpid()
-# until we've hit EOF when reading the stdout of the child.
-
+# This relies on our Perl process is single-threaded, or at least
+# no threads are spawning and waiting on processes (``, system(), etc...)
+# Threads are officially discouraged by the Perl5 team, and I expect
+# that to remain the case.
 sub reap_pids {
 	$reap_armed = undef;
-	my $tmp = $wait_pids // [];
-	$wait_pids = undef;
-	$Stack{reap_runq} = $tmp;
 	my $oldset = block_signals();
-
-	# old API
-	foreach my $ary (@$tmp) {
-		my ($pid, $cb, $arg) = @$ary;
-		my $ret = waitpid($pid, WNOHANG);
-		if ($ret == 0) {
-			push @$wait_pids, $ary; # autovivifies @$wait_pids
-		} elsif ($ret == $pid) {
-			if ($cb) {
-				eval { $cb->($arg, $pid) };
-				warn "E: dwaitpid($pid) in_loop: $@" if $@;
-			}
+	while (1) {
+		my $pid = waitpid(-1, WNOHANG) // last;
+		last if $pid <= 0;
+		if (defined(my $cb_args = delete $AWAIT_PIDS->{$pid})) {
+			await_cb($pid, @$cb_args) if $cb_args;
 		} else {
-			warn "waitpid($pid, WNOHANG) = $ret, \$!=$!, \$?=$?";
+			warn "W: reaped unknown PID=$pid: \$?=$?\n";
 		}
 	}
-
-	# new API TODO: convert to waitpid(-1) in the future as long
-	# as we don't use threads
-	for my $pid (keys %$AWAIT_PIDS) {
-		my $wpid = waitpid($pid, WNOHANG) // next;
-		my $cb_args = delete $AWAIT_PIDS->{$wpid} or next;
-		await_cb($pid, @$cb_args);
-	}
 	sig_setmask($oldset);
-	delete $Stack{reap_runq};
 }
 
 # reentrant SIGCHLD handler (since reap_pids is not reentrant)
@@ -719,25 +698,6 @@ sub long_response ($$;@) {
 	undef;
 }
 
-sub dwaitpid ($;$$) {
-	my ($pid, $cb, $arg) = @_;
-	if ($in_loop) {
-		push @$wait_pids, [ $pid, $cb, $arg ];
-		# We could've just missed our SIGCHLD, cover it, here:
-		enqueue_reap();
-	} else {
-		my $ret = waitpid($pid, 0);
-		if ($ret == $pid) {
-			if ($cb) {
-				eval { $cb->($arg, $pid) };
-				carp "E: dwaitpid($pid) !in_loop: $@" if $@;
-			}
-		} else {
-			carp "waitpid($pid, 0) = $ret, \$!=$!, \$?=$?";
-		}
-	}
-}
-
 sub awaitpid {
 	my ($pid, @cb_args) = @_;
 	$AWAIT_PIDS->{$pid} //= @cb_args ? \@cb_args : 0;


* [PATCH 12/12] httpd/async: switch to level-triggered epoll
  @ 2021-10-16  1:01 85% ` Eric Wong
  0 siblings, 0 replies; 7+ results
From: Eric Wong @ 2021-10-16  1:01 UTC (permalink / raw)
  To: meta

We'll save ourselves some code here and let the kernel do more
work instead.
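
The toy sketch below shows why level-triggering removes the need for
->requeue and for the initial event_step call. It uses core IO::Select,
which wraps select(2). Like epoll without EPOLLET, select(2) is
level-triggered: a pipe with unread data keeps showing up as readable, so
a handler can read one bounded chunk, return, and rely on being called
again. This is not PublicInbox::DS code. The 4096-byte chunk size is
arbitrary.

```perl
use strict;
use warnings;
use IO::Select;

pipe(my $r, my $w) or die "pipe: $!";
print $w "x" x 10_000; # more than one chunk's worth of data
close $w;              # flushes; the reader will eventually see EOF

my $sel = IO::Select->new($r);
my ($wakeups, $total) = (0, 0);
while (my @ready = $sel->can_read(1)) {
	for my $fh (@ready) {
		$wakeups++;
		# read ONE chunk, then yield to the loop for fairness;
		# level-triggering reports $fh again if data remains
		my $n = sysread($fh, my $buf, 4096);
		die "sysread: $!" unless defined $n;
		if ($n == 0) { # EOF: stop watching, then close
			$sel->remove($fh);
			close $fh;
		} else {
			$total += $n;
		}
	}
}
printf "%d bytes in %d wakeups\n", $total, $wakeups;
```

With edge-triggering, the same handler would have to either loop until
EAGAIN or requeue itself. Otherwise, data left in the pipe might never
trigger another wakeup.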
---
 Documentation/technical/ds.txt |  3 +--
 lib/PublicInbox/HTTPD/Async.pm | 16 +++++-----------
 lib/PublicInbox/Qspawn.pm      |  1 -
 3 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
index 7bc1ad79ce0c..5a1655a1450e 100644
--- a/Documentation/technical/ds.txt
+++ b/Documentation/technical/ds.txt
@@ -77,8 +77,7 @@ New features
   which (if any) events it's interested in for the next loop iteration.
 
 * Edge-triggering available via EPOLLET or EV_CLEAR.  These reduce wakeups
-  for unidirectional classes (e.g. PublicInbox::Listener sockets,
-  and pipes via PublicInbox::HTTPD::Async).
+  for unidirectional classes when throughput is more important than fairness.
 
 * IO::Socket::SSL support (for NNTPS, STARTTLS+NNTP, HTTPS)
 
diff --git a/lib/PublicInbox/HTTPD/Async.pm b/lib/PublicInbox/HTTPD/Async.pm
index 7238650aff97..1651da88ac03 100644
--- a/lib/PublicInbox/HTTPD/Async.pm
+++ b/lib/PublicInbox/HTTPD/Async.pm
@@ -17,7 +17,7 @@ package PublicInbox::HTTPD::Async;
 use strict;
 use parent qw(PublicInbox::DS);
 use Errno qw(EAGAIN);
-use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
+use PublicInbox::Syscall qw(EPOLLIN);
 
 # This is called via: $env->{'pi-httpd.async'}->()
 # $io is a read-only pipe ($rpipe) for now, but may be a
@@ -39,7 +39,7 @@ sub new {
 	}, $class;
 	my $pp = tied *$io;
 	$pp->{fh}->blocking(0) // die "$io->blocking(0): $!";
-	$self->SUPER::new($io, EPOLLIN | EPOLLET);
+	$self->SUPER::new($io, EPOLLIN);
 }
 
 sub event_step {
@@ -54,15 +54,12 @@ sub event_step {
 		my $r = sysread($sock, my $buf, 65536);
 		if ($r) {
 			$self->{fh}->write($buf); # may call $http->close
-			if ($http->{sock}) { # !closed
-				$self->requeue;
-				# let other clients get some work done, too
-				return;
-			}
+			# let other clients get some work done, too
+			return if $http->{sock}; # !closed
 
 			# else: fall through to close below...
 		} elsif (!defined $r && $! == EAGAIN) {
-			return; # EPOLLET means we'll be notified
+			return; # EPOLLIN means we'll be notified
 		}
 
 		# Done! Error handling will happen in $self->{fh}->close
@@ -89,9 +86,6 @@ sub async_pass {
 
 	$self->{http} = $http;
 	$self->{fh} = $fh;
-
-	# either hit EAGAIN or ->requeue to keep EPOLLET happy
-	event_step($self);
 }
 
 # may be called as $forward->close in PublicInbox::HTTP or EOF (event_step)
diff --git a/lib/PublicInbox/Qspawn.pm b/lib/PublicInbox/Qspawn.pm
index a1ff65b65324..53d0ad55ee84 100644
--- a/lib/PublicInbox/Qspawn.pm
+++ b/lib/PublicInbox/Qspawn.pm
@@ -192,7 +192,6 @@ sub event_step {
 sub rd_hdr ($) {
 	my ($self) = @_;
 	# typically used for reading CGI headers
-	# we must loop until EAGAIN for EPOLLET in HTTPD/Async.pm
 	# We also need to check EINTR for generic PSGI servers.
 	my $ret;
 	my $total_rd = 0;


* [PATCH] favor git(1) rather than libgit2 for ExtSearch
@ 2021-06-24  5:50 65% Eric Wong
  0 siblings, 0 replies; 7+ results
From: Eric Wong @ 2021-06-24  5:50 UTC (permalink / raw)
  To: meta

While both git and libgit2 take around 16 minutes to load 100K
alternates, there's already a proposed patch to make git faster:

  <https://lore.kernel.org/git/20210624005806.12079-1-e@80x24.org/>

It's also easier to patch and install git locally, since the
git.git build system defaults to prefix=$HOME, and dealing with
dynamic linking with libgit2 is more difficult for end users
relying on Inline::C.

libgit2 remains in use for the non-ALL.git case, but maybe it's
not necessary (libgit2 is significantly slower than git in
Debian 10 due to SHA-1 collision checking).
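
The next sketch is hypothetical. It mirrors the backend choice made in
ibx_async_cat. Inboxes with {topdir} (ExtSearch) always use git(1). The
`//= eval { ... } // 0` idiom caches a failed optional load as 0, so the
load is never retried. load_fast_backend and pick_backend are made-up
names. The stand-in loader always dies, which simulates missing libgit2
or Inline::C.

```perl
use strict;
use warnings;

# Made-up stand-in for loading an optional libgit2-backed client;
# it always dies here to simulate missing libgit2 or Inline::C.
my $load_attempts = 0;
sub load_fast_backend {
	$load_attempts++;
	die "libgit2 unavailable\n";
}

my $FAST; # undef: never tried; 0: tried and failed; true: loaded

sub pick_backend {
	my ($ibx) = @_;
	# {topdir} marks an ExtSearch with potentially huge alternates:
	# always use git(1) there.  Otherwise try the optional backend
	# once; "// 0" turns a failed load (undef from eval) into a
	# defined false value, so "//=" never runs the loader again.
	if (!defined($ibx->{topdir}) &&
			($FAST //= eval { load_fast_backend() } // 0)) {
		return 'libgit2';
	}
	'git-cat-file';
}

my @inboxes = ({}, {}, { topdir => '/path/to/extindex' });
my @got = map { pick_backend($_) } @inboxes;
print "@got (loader ran $load_attempts time)\n";
```

Because `//=` is assignment, it binds more loosely than `//`. The
expression therefore reads as `$FAST //= (eval { ... } // 0)`, which is
why the second inbox does not trigger another load attempt.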
---
 Documentation/technical/ds.txt |  2 +-
 lib/PublicInbox/GitAsyncCat.pm | 21 +++++++++++++--------
 lib/PublicInbox/GzipFilter.pm  |  3 +--
 lib/PublicInbox/HTTPD.pm       |  2 +-
 lib/PublicInbox/IMAP.pm        | 10 +++++-----
 lib/PublicInbox/NNTP.pm        |  4 ++--
 lib/PublicInbox/SolverGit.pm   |  3 +--
 7 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
index a0793ca2..7bc1ad79 100644
--- a/Documentation/technical/ds.txt
+++ b/Documentation/technical/ds.txt
@@ -64,7 +64,7 @@ Augmented features:
 * ->requeue support.  An optimization of the AddTimer(0, ...) idiom
   for immediately dispatching code at the next event loop iteration.
   public-inbox uses this for fairly generating large responses
-  iteratively (see PublicInbox::NNTP::long_response or git_async_cat
+  iteratively (see PublicInbox::NNTP::long_response or ibx_async_cat
   for blob retrievals).
 
 New features
diff --git a/lib/PublicInbox/GitAsyncCat.pm b/lib/PublicInbox/GitAsyncCat.pm
index 7d1a13db..57c194d9 100644
--- a/lib/PublicInbox/GitAsyncCat.pm
+++ b/lib/PublicInbox/GitAsyncCat.pm
@@ -8,7 +8,7 @@ use strict;
 use parent qw(PublicInbox::DS Exporter);
 use POSIX qw(WNOHANG);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
-our @EXPORT = qw(git_async_cat git_async_prefetch);
+our @EXPORT = qw(ibx_async_cat ibx_async_prefetch);
 use PublicInbox::Git ();
 
 our $GCF2C; # singleton PublicInbox::Gcf2Client
@@ -45,12 +45,16 @@ sub event_step {
 	}
 }
 
-sub git_async_cat ($$$$) {
-	my ($git, $oid, $cb, $arg) = @_;
-	if ($GCF2C //= eval {
+sub ibx_async_cat ($$$$) {
+	my ($ibx, $oid, $cb, $arg) = @_;
+	my $git = $ibx->git;
+	# {topdir} means ExtSearch (likely [extindex "all"]) with potentially
+	# 100K alternates.  git(1) has a proposed patch for 100K alternates:
+	# <https://lore.kernel.org/git/20210624005806.12079-1-e@80x24.org/>
+	if (!defined($ibx->{topdir}) && ($GCF2C //= eval {
 		require PublicInbox::Gcf2Client;
 		PublicInbox::Gcf2Client::new();
-	} // 0) { # 0: do not retry if libgit2 or Inline::C are missing
+	} // 0)) { # 0: do not retry if libgit2 or Inline::C are missing
 		$GCF2C->gcf2_async(\"$oid $git->{git_dir}\n", $cb, $arg);
 		\undef;
 	} else { # read-only end of git-cat-file pipe
@@ -66,9 +70,10 @@ sub git_async_cat ($$$$) {
 
 # this is safe to call inside $cb, but not guaranteed to enqueue
 # returns true if successful, undef if not.
-sub git_async_prefetch {
-	my ($git, $oid, $cb, $arg) = @_;
-	if ($GCF2C) {
+sub ibx_async_prefetch {
+	my ($ibx, $oid, $cb, $arg) = @_;
+	my $git = $ibx->git;
+	if (!defined($ibx->{topdir}) && $GCF2C) {
 		if (!$GCF2C->{wbuf}) {
 			$oid .= " $git->{git_dir}\n";
 			return $GCF2C->gcf2_async(\$oid, $cb, $arg); # true
diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index 48ed11a5..334d6581 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -180,8 +180,7 @@ sub async_blob_cb { # git->cat_async callback
 
 sub smsg_blob {
 	my ($self, $smsg) = @_;
-	git_async_cat($self->{ibx}->git, $smsg->{blob},
-			\&async_blob_cb, $self);
+	ibx_async_cat($self->{ibx}, $smsg->{blob}, \&async_blob_cb, $self);
 }
 
 1;
diff --git a/lib/PublicInbox/HTTPD.pm b/lib/PublicInbox/HTTPD.pm
index b193c9ae..fb683f74 100644
--- a/lib/PublicInbox/HTTPD.pm
+++ b/lib/PublicInbox/HTTPD.pm
@@ -37,7 +37,7 @@ sub new {
 		# XXX unstable API!, only GitHTTPBackend needs
 		# this to limit git-http-backend(1) parallelism.
 		# We also check for the truthiness of this to
-		# detect when to use git_async_cat for slow blobs
+		# detect when to use async paths for slow blobs
 		'pi-httpd.async' => \&pi_httpd_async
 	);
 	bless {
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index af8ce72b..9402aa41 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -612,7 +612,7 @@ sub fetch_run_ops {
 	$self->msg_more(")\r\n");
 }
 
-sub fetch_blob_cb { # called by git->cat_async via git_async_cat
+sub fetch_blob_cb { # called by git->cat_async via ibx_async_cat
 	my ($bref, $oid, $type, $size, $fetch_arg) = @_;
 	my ($self, undef, $msgs, $range_info, $ops, $partial) = @$fetch_arg;
 	my $ibx = $self->{ibx} or return $self->close; # client disconnected
@@ -627,8 +627,8 @@ sub fetch_blob_cb { # called by git->cat_async via git_async_cat
 	}
 	my $pre;
 	if (!$self->{wbuf} && (my $nxt = $msgs->[0])) {
-		$pre = git_async_prefetch($ibx->git, $nxt->{blob},
-						\&fetch_blob_cb, $fetch_arg);
+		$pre = ibx_async_prefetch($ibx, $nxt->{blob},
+					\&fetch_blob_cb, $fetch_arg);
 	}
 	fetch_run_ops($self, $smsg, $bref, $ops, $partial);
 	$pre ? $self->zflush : requeue_once($self);
@@ -760,7 +760,7 @@ sub fetch_blob { # long_response
 		}
 	}
 	uo2m_extend($self, $msgs->[-1]->{num});
-	git_async_cat($self->{ibx}->git, $msgs->[0]->{blob},
+	ibx_async_cat($self->{ibx}, $msgs->[0]->{blob},
 			\&fetch_blob_cb, \@_);
 }
 
@@ -1228,7 +1228,7 @@ sub long_step {
 	} elsif ($more) { # $self->{wbuf}:
 		$self->update_idle_time;
 
-		# control passed to git_async_cat if $more == \undef
+		# control passed to ibx_async_cat if $more == \undef
 		requeue_once($self) if !ref($more);
 	} else { # all done!
 		delete $self->{long_cb};
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index f7d99913..9df47133 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -515,7 +515,7 @@ found:
 		$smsg->{nntp_code} = $code;
 		set_art($self, $art);
 		# this dereferences to `undef'
-		${git_async_cat($ibx->git, $smsg->{blob}, \&blob_cb, $smsg)};
+		${ibx_async_cat($ibx, $smsg->{blob}, \&blob_cb, $smsg)};
 	}
 }
 
@@ -549,7 +549,7 @@ sub msg_hdr_write ($$) {
 	$smsg->{nntp}->msg_more($$hdr);
 }
 
-sub blob_cb { # called by git->cat_async via git_async_cat
+sub blob_cb { # called by git->cat_async via ibx_async_cat
 	my ($bref, $oid, $type, $size, $smsg) = @_;
 	my $self = $smsg->{nntp};
 	my $code = $smsg->{nntp_code};
diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index 92106e75..b0cd0f2c 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -593,8 +593,7 @@ sub resolve_patch ($$) {
 	if (my $msgs = $want->{try_smsgs}) {
 		my $smsg = shift @$msgs;
 		if ($self->{psgi_env}->{'pi-httpd.async'}) {
-			return git_async_cat($want->{cur_ibx}->git,
-						$smsg->{blob},
+			return ibx_async_cat($want->{cur_ibx}, $smsg->{blob},
 						\&extract_diff_async,
 						[$self, $want, $smsg]);
 		} else {


* [PATCH 37/43] www: update internal docs
  @ 2020-07-05 23:27 65% ` Eric Wong
  0 siblings, 0 replies; 7+ results
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

We no longer favor getline+close for streaming PSGI responses
when using public-inbox-httpd.  We still support it for other
PSGI servers, though.
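
For reference, the PSGI specification describes a getline+close body, as
still used with generic PSGI servers. It is any object whose ->getline
returns the next chunk (undef at the end) and whose ->close the server
calls afterwards. The following is a minimal sketch. ChunkBody is a
hypothetical class, not PublicInbox::GetlineBody, and the loop at the end
only approximates what a real PSGI server does.

```perl
use strict;
use warnings;

package ChunkBody; # hypothetical example class
sub new {
	my ($class, @chunks) = @_;
	bless { chunks => [ @chunks ], closed => 0 }, $class;
}
# "pull" model: the server asks for more only when it can send it
sub getline {
	my ($self) = @_;
	shift @{$self->{chunks}}; # undef once exhausted: end of body
}
sub close { $_[0]->{closed} = 1 } # server calls this when done

package main;

# A PSGI app returning [ status, headers, body ]
my $app = sub {
	my ($env) = @_;
	[ 200, [ 'Content-Type' => 'text/plain' ],
		ChunkBody->new("hello, ", "world\n") ];
};

# Rough stand-in for a generic PSGI server consuming the body
my $res = $app->({});
my $body = $res->[2];
my $out = '';
while (defined(my $chunk = $body->getline)) { $out .= $chunk }
$body->close;
print $out;
```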
---
 Documentation/technical/ds.txt   |  4 ++--
 lib/PublicInbox/GetlineBody.pm   |  4 +---
 lib/PublicInbox/GzipFilter.pm    | 17 +++++++++++++----
 lib/PublicInbox/HTTPD.pm         |  5 ++---
 lib/PublicInbox/Mbox.pm          |  8 ++------
 lib/PublicInbox/View.pm          |  2 +-
 lib/PublicInbox/WwwAtomStream.pm |  6 ++----
 lib/PublicInbox/WwwStream.pm     |  7 +++----
 8 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
index cbd06cfb4..a0793ca23 100644
--- a/Documentation/technical/ds.txt
+++ b/Documentation/technical/ds.txt
@@ -64,8 +64,8 @@ Augmented features:
 * ->requeue support.  An optimization of the AddTimer(0, ...) idiom
   for immediately dispatching code at the next event loop iteration.
   public-inbox uses this for fairly generating large responses
-  iteratively (see PublicInbox::NNTP::long_response or the use of
-  ->getline callbacks for generating gigantic gzipped mboxes).
+  iteratively (see PublicInbox::NNTP::long_response or git_async_cat
+  for blob retrievals).
 
 New features
 
diff --git a/lib/PublicInbox/GetlineBody.pm b/lib/PublicInbox/GetlineBody.pm
index 6becaaf5f..988bc63f4 100644
--- a/lib/PublicInbox/GetlineBody.pm
+++ b/lib/PublicInbox/GetlineBody.pm
@@ -5,9 +5,7 @@
 # end callback when the object goes out-of-scope.
 # This depends on rpipe being _blocking_ on getline.
 #
-# public-inbox-httpd favors "getline" response bodies to take a
-# "pull"-based approach to feeding slow clients (as opposed to a
-# more common "push" model)
+# This is only used by generic PSGI servers and not public-inbox-httpd
 package PublicInbox::GetlineBody;
 use strict;
 use warnings;
diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index 6380f50e9..d72ad3c88 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -1,7 +1,16 @@
 # Copyright (C) 2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-
-# Qspawn filter
+#
+# In public-inbox <=1.5.0, public-inbox-httpd favored "getline"
+# response bodies to take a "pull"-based approach to feeding
+# slow clients (as opposed to a more common "push" model).
+#
+# In newer versions, public-inbox-httpd supports a backpressure-aware
+# pull/push model which also accounts for slow git blob storage.
+# {async_next} callbacks only run when the DS {wbuf} is drained
+# {async_eml} callbacks only run when a blob arrives from git.
+#
+# We continue to support getline+close for generic PSGI servers.
 package PublicInbox::GzipFilter;
 use strict;
 use parent qw(Exporter);
@@ -14,12 +23,12 @@ our @EXPORT_OK = qw(gzf_maybe);
 my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
 my @GZIP_HDRS = qw(Vary Accept-Encoding Content-Encoding gzip);
 
-sub new { bless {}, shift }
+sub new { bless {}, shift } # qspawn filter
 
 # for Qspawn if using $env->{'pi-httpd.async'}
 sub attach {
 	my ($self, $http_out) = @_;
-	$self->{http_out} = $http_out;
+	$self->{http_out} = $http_out; # PublicInbox::HTTP::{Chunked,Identity}
 	$self
 }
 
diff --git a/lib/PublicInbox/HTTPD.pm b/lib/PublicInbox/HTTPD.pm
index 331939699..a9f55ff61 100644
--- a/lib/PublicInbox/HTTPD.pm
+++ b/lib/PublicInbox/HTTPD.pm
@@ -36,9 +36,8 @@ sub new {
 
 		# XXX unstable API!, only GitHTTPBackend needs
 		# this to limit git-http-backend(1) parallelism.
-		# The rest of our PSGI code is generic, relying
-		# on "pull" model using "getline" to prevent
-		# over-buffering.
+		# We also check for the truthiness of this to
+		# detect when to use git_async_cat for slow blobs
 		'pi-httpd.async' => \&pi_httpd_async
 	);
 	bless {
diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index abdf43c93..8726b9f64 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -1,12 +1,8 @@
 # Copyright (C) 2015-2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
-# Streaming (via getline) interface for formatting messages as an mboxrd.
-# Used by the PSGI web interface.
-#
-# public-inbox-httpd favors "getline" response bodies to take a
-# "pull"-based approach to feeding slow clients (as opposed to a
-# more common "push" model)
+# Streaming interface for mboxrd HTTP responses
+# See PublicInbox::GzipFilter for details.
 package PublicInbox::Mbox;
 use strict;
 use parent 'PublicInbox::GzipFilter';
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 895e4f278..60dad6bac 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -415,7 +415,7 @@ sub stream_thread ($$) {
 	PublicInbox::WwwStream::aresponse($ctx, 200, \&stream_thread_i);
 }
 
-# /$INBOX/$MESSAGE_ID/t/
+# /$INBOX/$MSGID/t/ and /$INBOX/$MSGID/T/
 sub thread_html {
 	my ($ctx) = @_;
 	my $mid = $ctx->{mid};
diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index 073df1dfa..3b5b133a5 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -1,10 +1,8 @@
 # Copyright (C) 2016-2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 #
-# Atom body stream for which yields getline+close methods
-# public-inbox-httpd favors "getline" response bodies to take a
-# "pull"-based approach to feeding slow clients (as opposed to a
-# more common "push" model)
+# Atom body stream for HTTP responses
+# See PublicInbox::GzipFilter for details.
 package PublicInbox::WwwAtomStream;
 use strict;
 use parent 'PublicInbox::GzipFilter';
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index 7d257a191..23b03f0e8 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -1,11 +1,10 @@
 # Copyright (C) 2016-2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 #
-# HTML body stream for which yields getline+close methods
+# HTML body stream which yields getline+close methods for
+# generic PSGI servers and callbacks for public-inbox-httpd.
 #
-# public-inbox-httpd favors "getline" response bodies to take a
-# "pull"-based approach to feeding slow clients (as opposed to a
-# more common "push" model)
+# See PublicInbox::GzipFilter parent class for more info.
 package PublicInbox::WwwStream;
 use strict;
 use parent qw(Exporter PublicInbox::GzipFilter);
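
The GzipFilter comment in the patch above notes that getline+close is still supported for generic PSGI servers. What follows is a minimal, hypothetical sketch of such a pull-style body, not code from public-inbox: the class name Demo::GzipBody is invented, and only the deflate options are copied from %OPT in the patch (15 + 16 window bits asks zlib for gzip framing). It relies on the core Compress::Raw::Zlib and IO::Uncompress::Gunzip modules and does not model public-inbox-httpd's {async_next}/{async_eml} callbacks.

```perl
#!/usr/bin/perl
# Hypothetical sketch: a PSGI-style response body that gzips its
# output lazily, one ->getline call at a time ("pull" model).
use strict;
use warnings;

package Demo::GzipBody; # invented name, not part of public-inbox
use Compress::Raw::Zlib qw(Z_OK);

# Same options as %OPT in GzipFilter.pm: WindowBits 15 + 16 selects
# gzip framing, and AppendOutput makes deflate/flush append to $out.
my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);

sub new {
	my ($class, @chunks) = @_;
	my ($zlib, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
	die "deflate init failed: $err\n" unless $err == Z_OK;
	bless { zlib => $zlib, chunks => \@chunks }, $class;
}

# Called repeatedly by a generic PSGI server; returns undef at EOF.
sub getline {
	my ($self) = @_;
	my $zlib = $self->{zlib} or return;
	my $out = '';
	while (defined(my $chunk = shift @{$self->{chunks}})) {
		$zlib->deflate($chunk, $out) == Z_OK or die "deflate failed\n";
		return $out if length($out); # zlib may buffer small inputs
	}
	# No input left: emit any buffered data plus the gzip trailer.
	$zlib->flush($out) == Z_OK or die "flush failed\n";
	delete $self->{zlib};
	$out;
}

# Called once by the server after the last ->getline.
sub close { delete $_[0]->{zlib}; 1 }

package main;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $body = Demo::GzipBody->new("hello ", "world\n" x 3);
my $gz = '';
while (defined(my $buf = $body->getline)) { $gz .= $buf }
$body->close;

gunzip(\$gz => \my $plain) or die "gunzip failed: $GunzipError\n";
print $plain; # prints "hello world\nworld\nworld\n"
```

Because the server only calls ->getline when it is ready for more output, a slow client naturally limits how much compressed data is produced at once; that is the "pull" approach the comment describes.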

* [PATCH] doc: technical/ds.txt: describe PublicInbox::DS divergences
@ 2020-01-10 20:35 63% Eric Wong
  0 siblings, 0 replies; 7+ results
From: Eric Wong @ 2020-01-10 20:35 UTC (permalink / raw)
  To: meta

Danga::Socket 1.62 was released a few months back and
the maintainer indicated it would be the last release.
We've diverged significantly in incompatible ways...

While most of this should've already been documented in
commit messages, putting it all into one document could
make it easier to digest.

It's also a strange design for anybody used to conventional
event loops.  Maybe this is an unconventional project :P
---
 Documentation/technical/ds.txt | 112 +++++++++++++++++++++++++++++++++
 MANIFEST                       |   1 +
 lib/PublicInbox/DS.pm          |  16 ++---
 3 files changed, 121 insertions(+), 8 deletions(-)
 create mode 100644 Documentation/technical/ds.txt

diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
new file mode 100644
index 00000000..cbd06cfb
--- /dev/null
+++ b/Documentation/technical/ds.txt
@@ -0,0 +1,112 @@
+PublicInbox::DS - event loop and async I/O base class
+
+Our PublicInbox::DS event loop which powers public-inbox-nntpd
+and public-inbox-httpd diverges significantly from the
+unmaintained Danga::Socket package we forked from.  In fact,
+it's probably different from most other event loops out there.
+
+Most notably:
+
+* There is one and only one callback: ->event_step.  Unlike other
+  event loops, there are no separate callbacks for read, write,
+  error or hangup events.  In fact, we never care which kevent
+  filter or poll/epoll event flag (e.g. POLLIN/POLLOUT/POLLHUP)
+  triggers a call.
+
+  The lack of read/write callback distinction is driven by the
+  fact TLS libraries (e.g. OpenSSL via IO::Socket::SSL) may
+  declare SSL_WANT_READ on SSL_write(), and SSL_WANT_WRITE on
+  SSL_read().  So we end up having to let each user object decide
+  whether it wants to make read or write calls depending on its
+  internal state, completely independent of the event loop.
+
+  Error and hangup (POLLERR and POLLHUP) callbacks are redundant and
+  only triggered in rare cases.  They're redundant because the
+  result of every read and write call in ->event_step must be
+  checked anyway.  At best, callbacks for POLLHUP and POLLERR can
+  save one syscall per socket lifetime, which is not worth the extra
+  code they impose.
+
+  Reducing the user-supplied code down to a single callback allows
+  subclasses to keep their logic self-contained.  The combination
+  of this change and one-shot wakeups (see below) for bidirectional
+  data flows makes asynchronous code easier to reason about.
+
+Other divergences:
+
+* ->write buffering uses temporary files whereas Danga::Socket used
+  the heap.  The rationale for this is that the kernel already provides
+  ample (and configurable) space for socket buffers.  Modern kernels
+  also cache FS operations aggressively, so systems with ample RAM
+  are unlikely to notice degradation, while small systems are less
+  likely to suffer unpredictable heap fragmentation, swap and OOM
+  penalties.
+
+  In the future, we may introduce sendfile and mmap+SSL_write to
+  reduce data copies, and use FALLOC_FL_PUNCH_HOLE on Linux to
+  release space after the buffer is partially cleared.
+
+Augmented features:
+
+* obj->write(CODEREF) passes the object itself to the CODEREF.
+  Being able to enqueue subroutine calls is a powerful feature in
+  Danga::Socket for keeping linear logic in an asynchronous environment.
+  Unfortunately, each subroutine takes several kilobytes of memory.
+  One small change to Danga::Socket is to pass the receiver object
+  (aka "$self") to the CODEREF.  $self can store any necessary
+  state it needs for a normal (named) subroutine.  This allows us to
+  put the same sub into multiple queues without paying a large
+  memory penalty for each one.
+
+  This idea is also more easily ported to C or other languages which
+  lack anonymous subroutines (aka "closures").
+
+* ->requeue support.  An optimization of the AddTimer(0, ...) idiom
+  for immediately dispatching code at the next event loop iteration.
+  public-inbox uses this for fairly generating large responses
+  iteratively (see PublicInbox::NNTP::long_response or the use of
+  ->getline callbacks for generating gigantic gzipped mboxes).
+
+New features:
+
+* One-shot wakeups allowed via EPOLLONESHOT or EV_DISPATCH.  These
+  flags allow us to simplify code in ->event_step callbacks for
+  bidirectional sockets (NNTP and HTTP).  Instead of merely reacting
+  to events, control is handed over at ->event_step in one-shot scenarios.
+  The event_step caller (NNTP || HTTP) then becomes proactive in declaring
+  which (if any) events it's interested in for the next loop iteration.
+
+* Edge-triggering available via EPOLLET or EV_CLEAR.  These reduce wakeups
+  for unidirectional classes (e.g. PublicInbox::Listener sockets,
+  and pipes via PublicInbox::HTTPD::Async).
+
+* IO::Socket::SSL support (for NNTPS, STARTTLS+NNTP, HTTPS)
+
+* dwaitpid (waitpid wrapper) support for reaping dead children
+
+* reliable signal wakeups are supported via signalfd on Linux,
+  and EVFILT_SIGNAL on *BSDs via IO::KQueue.
+
+Removed features:
+
+* Many fields removed or moved to subclasses, so the underlying
+  hash is smaller and suitable for FDs other than stream sockets.
+  Some fields we enforce (e.g. wbuf, wbuf_off) are autovivified
+  on an as-needed basis to save memory when they're not needed.
+
+* TCP_CORK support removed; instead, we use MSG_MORE on non-TLS sockets
+  and we may use vectored I/O support via GnuTLS in the future
+  for TLS sockets.
+
+* per-FD PLCMap (post-loop callback) removed; we have ->requeue
+  support instead, where no extra hash lookups or assignments are necessary.
+
+* read push backs removed.  Some subclasses use a read buffer ({rbuf})
+  but they control it, not this event loop.
+
+* Profiling and debug logging removed.  Perl and OS-specific tracers
+  and profilers are sufficient.
+
+* ->AddOtherFds support removed; everything watched is a subclass of
+  PublicInbox::DS, but we've slimmed down the fields to eliminate
+  the memory penalty for objects.
diff --git a/MANIFEST b/MANIFEST
index 914015ad..3736c777 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -34,6 +34,7 @@ Documentation/public-inbox-watch.pod
 Documentation/public-inbox-xcpdb.pod
 Documentation/public-inbox.cgi.pod
 Documentation/standards.perl
+Documentation/technical/ds.txt
 Documentation/txt2pre
 HACKING
 INSTALL
diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 09dc3992..058b1358 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -3,15 +3,15 @@
 #
 # This license differs from the rest of public-inbox
 #
-# This is a fork of the (for now) unmaintained Danga::Socket 1.61.
-# Unused features will be removed, and updates will be made to take
-# advantage of newer kernels.
+# This is a fork of the unmaintained Danga::Socket (1.61) with
+# significant changes.  See Documentation/technical/ds.txt in our
+# source for details.
 #
-# API changes to diverge from Danga::Socket will happen to better
-# accomodate new features and improve scalability.  Do not expect
-# this to be a stable API like Danga::Socket.
-# Bugs encountered (and likely fixed) are reported to
-# bug-Danga-Socket@rt.cpan.org and visible at:
+# Do not expect this to be a stable API like Danga::Socket,
+# but it will evolve to suit our needs and to take advantage of
+# newer Linux and *BSD features.
+# Bugs encountered were reported to bug-Danga-Socket@rt.cpan.org,
+# fixed in Danga::Socket 1.62 and visible at:
 # https://rt.cpan.org/Public/Dist/Display.html?Name=Danga-Socket
 package PublicInbox::DS;
 use strict;
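
The ds.txt document in the patch above describes two augmented features: ->write(CODEREF) passes the receiver object to the CODEREF, and ->requeue defers work to the next event loop iteration. The following toy, I/O-free Perl sketch shows how the two combine to stream a long response fairly. It is not PublicInbox::DS: Demo::Conn, its fields, long_response_step and run_loop are all hypothetical, and a real event loop would only call ->event_step when the descriptor is ready.

```perl
#!/usr/bin/perl
# Toy sketch (no real I/O): ->write(CODEREF) passes $self to the
# CODEREF, and ->requeue defers work to the next loop iteration.
use strict;
use warnings;

package Demo::Conn; # hypothetical stand-in for a PublicInbox::DS subclass

my @NEXT; # objects whose ->event_step runs on the next iteration

sub new {
	my ($class, $nlines) = @_;
	bless { wbuf => [], out => '', left => $nlines }, $class;
}

# Queue a string or a CODEREF; CODEREFs are later called with $self.
sub write { push @{$_[0]->{wbuf}}, $_[1]; 1 }

# Ask for ->event_step on the next iteration (cf. AddTimer(0, ...)).
sub requeue { push @NEXT, $_[0] }

sub event_step {
	my ($self) = @_;
	# Snapshot the queue, so anything queued now waits for a later turn.
	for my $x (splice @{$self->{wbuf}}) {
		if (ref($x) eq 'CODE') {
			$x->($self);
		} else {
			$self->{out} .= $x; # stand-in for writing to a socket
		}
	}
}

# One named sub shared by every connection; all state lives in $self.
sub long_response_step {
	my ($self) = @_;
	$self->{out} .= "line $self->{left}\n";
	if (--$self->{left} > 0) {
		$self->write(\&long_response_step);
		$self->requeue;
	}
}

# Each iteration runs only objects queued before it started.
sub run_loop {
	while (my @ready = splice @NEXT) {
		$_->event_step for @ready;
	}
}

package main;

my @conns = (Demo::Conn->new(2), Demo::Conn->new(3));
for my $c (@conns) {
	$c->write("start\n");
	$c->write(\&Demo::Conn::long_response_step);
	$c->requeue;
}
Demo::Conn::run_loop();
print $_->{out} for @conns;
```

Because each step re-queues its connection instead of looping, the two connections advance one line per iteration in turn rather than one response monopolizing the loop. And since long_response_step is a single named sub whose state lives in $self, no per-connection closure is allocated, which is the memory argument ds.txt makes.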

Results 1-7 of 7
2020-01-10 20:35 63% [PATCH] doc: technical/ds.txt: describe PublicInbox::DS divergences Eric Wong
2020-07-05 23:27     [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
2020-07-05 23:27 65% ` [PATCH 37/43] www: update internal docs Eric Wong
2021-06-24  5:50 65% [PATCH] favor git(1) rather than libgit2 for ExtSearch Eric Wong
2021-10-16  1:00     [PATCH 00/16] some yak-shaving and annoyance fixes Eric Wong
2021-10-16  1:01 85% ` [PATCH 12/12] httpd/async: switch to level-triggered epoll Eric Wong
2023-01-17  7:18     [PATCH 00/12] improve process reaping Eric Wong
2023-01-17  7:19 83% ` [PATCH 12/12] ds: drop dwaitpid, switch to waitpid(-1) Eric Wong
2023-03-09 19:28     [PATCH 0/6] various doc updates Eric Wong
2023-03-09 19:28 99% ` [PATCH 2/6] doc: technical/ds: update blurb to note more daemons Eric Wong
2023-08-28 10:42     [PATCH 1/5] ci/profiles.sh: fix case matching logic Štěpán Němec
2023-08-28 10:42 68% ` [PATCH 5/5] Fix some typos/grammar/errors in docs and comments Štěpán Němec

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).