-rw-r--r--  Documentation/technical/ds.txt  112
-rw-r--r--  MANIFEST                          1
-rw-r--r--  lib/PublicInbox/DS.pm            16
3 files changed, 121 insertions, 8 deletions
diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
new file mode 100644
index 00000000..cbd06cfb
--- /dev/null
+++ b/Documentation/technical/ds.txt
@@ -0,0 +1,112 @@
+PublicInbox::DS - event loop and async I/O base class
+
+Our PublicInbox::DS event loop, which powers public-inbox-nntpd
+and public-inbox-httpd, diverges significantly from the
+unmaintained Danga::Socket package we forked from.  In fact,
+it's probably different from most other event loops out there.
+
+Most notably:
+
+* There is one and only one callback: ->event_step.  Unlike other
+  event loops, there are no separate callbacks for read, write,
+  error or hangup events.  In fact, we never care which kevent
+  filter or poll/epoll event flag (e.g. POLLIN/POLLOUT/POLLHUP)
+  triggers a call.
+
+  The lack of read/write callback distinction is driven by the
+  fact TLS libraries (e.g. OpenSSL via IO::Socket::SSL) may
+  declare SSL_WANT_READ on SSL_write(), and SSL_WANT_WRITE on
+  SSL_read().  So we end up having to let each user object decide
+  whether it wants to make read or write calls depending on its
+  internal state, completely independent of the event loop.
+
+  Error and hangup (POLLERR and POLLHUP) callbacks are redundant and
+  only triggered in rare cases.  They're redundant because the
+  result of every read and write call in ->event_step must be
+  checked, anyways.  At best, callbacks for POLLHUP and POLLERR can
+  save one syscall per socket lifetime, which is not worth the extra
+  code they impose.
+
+  Reducing the user-supplied code down to a single callback allows
+  subclasses to keep their logic self-contained.  The combination
+  of this change and one-shot wakeups (see below) for bidirectional
+  data flows makes asynchronous code easier to reason about.
+
+Other divergences:
+
+* ->write buffering uses temporary files whereas Danga::Socket used
+  the heap.  The rationale is that the kernel already provides
+  ample (and configurable) space for socket buffers.  Modern kernels
+  also cache FS operations aggressively, so systems with ample RAM
+  are unlikely to notice degradation, while small systems are less
+  likely to suffer unpredictable heap fragmentation, swap and OOM
+  penalties.
+
+  In the future, we may introduce sendfile and mmap+SSL_write to
+  reduce data copies, and use FALLOC_FL_PUNCH_HOLE on Linux to
+  release space after the buffer is partially cleared.
+
+Augmented features:
+
+* obj->write(CODEREF) passes the object itself to the CODEREF.
+  Being able to enqueue subroutine calls is a powerful feature in
+  Danga::Socket for keeping linear logic in an asynchronous environment.
+  Unfortunately, each subroutine takes several kilobytes of memory.
+  One small change to Danga::Socket is to pass the receiver object
+  (aka "$self") to the CODEREF.  $self can store any necessary
+  state it needs for a normal (named) subroutine.  This allows us to
+  put the same sub into multiple queues without paying a large
+  memory penalty for each one.
+
+  This idea is also more easily ported to C or other languages which
+  lack anonymous subroutines (aka "closures").
+
+* ->requeue support.  An optimization of the AddTimer(0, ...) idiom
+  for immediately dispatching code at the next event loop iteration.
+  public-inbox uses this for fairly generating large responses
+  iteratively (see PublicInbox::NNTP::long_response or the use of
+  ->getline callbacks for generating gigantic gzipped mboxes).
+
+New features:
+
+* One-shot wakeups allowed via EPOLLONESHOT or EV_DISPATCH.  These
+  flags allow us to simplify code in ->event_step callbacks for
+  bidirectional sockets (NNTP and HTTP).  Instead of merely reacting
+  to events, control is handed over at ->event_step in one-shot scenarios.
+  The event_step caller (NNTP || HTTP) then becomes proactive in declaring
+  which (if any) events it's interested in for the next loop iteration.
+
+* Edge-triggering available via EPOLLET or EV_CLEAR.  These reduce wakeups
+  for unidirectional classes (e.g. PublicInbox::Listener sockets,
+  and pipes via PublicInbox::HTTPD::Async).
+
+* IO::Socket::SSL support (for NNTPS, STARTTLS+NNTP, HTTPS)
+
+* dwaitpid (waitpid wrapper) support for reaping dead children
+
+* Reliable signal wakeups are supported via signalfd on Linux and
+  EVFILT_SIGNAL (through IO::KQueue) on *BSDs.
+
+Removed features:
+
+* Many fields removed or moved to subclasses, so the underlying
+  hash is smaller and suitable for FDs other than stream sockets.
+  Some fields we enforce (e.g. wbuf, wbuf_off) are autovivified
+  on an as-needed basis to save memory when they're not needed.
+
+* TCP_CORK support removed; instead, we use MSG_MORE on non-TLS
+  sockets, and we may use vectored I/O support via GnuTLS in the
+  future for TLS sockets.
+
+* per-FD PLCMap (post-loop callback) removed; we have ->requeue
+  support instead, where no extra hash lookups or assignments are necessary.
+
+* read push backs removed.  Some subclasses use a read buffer ({rbuf})
+  but they control it, not this event loop.
+
+* Profiling and debug logging removed.  Perl and OS-specific tracers
+  and profilers are sufficient.
+
+* ->AddOtherFds support removed; everything watched is a subclass of
+  PublicInbox::DS, but we've slimmed down the fields to eliminate
+  the memory penalty for objects.
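
Note (not part of the patch): the ->write(CODEREF) and ->requeue ideas
described in ds.txt above can be illustrated with a minimal, self-contained
sketch.  Everything in it is hypothetical: Demo::Loop, Demo::Client, the
{queued} and {remaining} fields and long_response_step are invented for
illustration and are not the PublicInbox::DS implementation.  No sockets or
epoll/kqueue are involved; the queue alone stands in for the event loop.

```perl
#!/usr/bin/perl
# Illustrative sketch only: these packages are hypothetical stand-ins,
# not PublicInbox::DS.
use strict;
use warnings;

my @order;  # records the order in which steps ran, for demonstration
my @nextq;  # objects to run at the next loop iteration

package Demo::Loop;

# Schedule $obj for the next iteration; {queued} avoids duplicates.
sub requeue {
    my ($obj) = @_;
    push @nextq, $obj unless $obj->{queued}++;
}

# Run one iteration.  The queue is emptied first, so an object which
# requeues itself runs on the *next* iteration and cannot starve others.
sub run_once {
    my @q = splice(@nextq);
    for my $obj (@q) {
        $obj->{queued} = 0;
        $obj->event_step;
    }
    scalar(@q);
}

package Demo::Client;

sub new {
    my ($class, $name) = @_;
    bless { name => $name, out => '', wbuf => [], queued => 0 }, $class;
}

# Enqueue either a string or a CODEREF.
sub write {
    my ($self, $data) = @_;
    push @{$self->{wbuf}}, $data;
    Demo::Loop::requeue($self);
}

# The one and only callback.
sub event_step {
    my ($self) = @_;
    my $todo = shift @{$self->{wbuf}};
    return unless defined $todo;
    if (ref($todo) eq 'CODE') {
        $todo->($self);  # the receiver is passed in: no closure needed
    } else {
        $self->{out} .= $todo;
    }
    Demo::Loop::requeue($self) if @{$self->{wbuf}};
}

# One named sub shared by every client; per-client state lives in $self.
sub long_response_step {
    my ($self) = @_;
    my $n = $self->{remaining}--;
    $self->{out} .= "$self->{name}:$n\n";
    push @order, "$self->{name}:$n";
    $self->write(\&long_response_step) if $self->{remaining} > 0;
}

package main;

my @clients = map { Demo::Client->new($_) } qw(a b);
for my $c (@clients) {
    $c->{remaining} = 2;
    $c->write(\&Demo::Client::long_response_step);
}
my $iterations = 0;
$iterations++ while Demo::Loop::run_once();
print join(' ', @order), "\n";
```

Both clients enqueue the same named subroutine, so no per-client closure is
allocated, and because each step re-enqueues itself rather than looping, the
two responses interleave one chunk at a time instead of one client finishing
before the other starts.
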
diff --git a/MANIFEST b/MANIFEST
index 914015ad..3736c777 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -34,6 +34,7 @@ Documentation/public-inbox-watch.pod
 Documentation/public-inbox-xcpdb.pod
 Documentation/public-inbox.cgi.pod
 Documentation/standards.perl
+Documentation/technical/ds.txt
 Documentation/txt2pre
 HACKING
 INSTALL
diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 09dc3992..058b1358 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -3,15 +3,15 @@
 #
 # This license differs from the rest of public-inbox
 #
-# This is a fork of the (for now) unmaintained Danga::Socket 1.61.
-# Unused features will be removed, and updates will be made to take
-# advantage of newer kernels.
+# This is a fork of the unmaintained Danga::Socket (1.61) with
+# significant changes.  See Documentation/technical/ds.txt in our
+# source for details.
 #
-# API changes to diverge from Danga::Socket will happen to better
-# accomodate new features and improve scalability.  Do not expect
-# this to be a stable API like Danga::Socket.
-# Bugs encountered (and likely fixed) are reported to
-# bug-Danga-Socket@rt.cpan.org and visible at:
+# Do not expect this to be a stable API like Danga::Socket,
+# but it will evolve to suit our needs and to take advantage of
+# newer Linux and *BSD features.
+# Bugs encountered were reported to bug-Danga-Socket@rt.cpan.org,
+# fixed in Danga::Socket 1.62 and visible at:
 # https://rt.cpan.org/Public/Dist/Display.html?Name=Danga-Socket
 package PublicInbox::DS;
 use strict;