user/dev discussion of public-inbox itself
* [PATCH 2/4] listener: use EPOLLEXCLUSIVE for listen sockets
    2019-05-05  0:52 24% ` [PATCH 1/4] " Eric Wong
@ 2019-05-05  0:52 83% ` Eric Wong
    2 siblings, 0 replies; 51+ results
From: Eric Wong @ 2019-05-05  0:52 UTC
  To: meta

Since our listen sockets are non-blocking and we may run
multiple httpd|nntpd worker processes, we need a way to avoid
thundering herds among them.

EPOLLEXCLUSIVE was added just for that in Linux 4.5.
---
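Not part of the commit: a minimal standalone sketch of the fallback
logic in the DS.pm hunk below, for readers who want to try it without
an epoll fd.  It assumes a kernel that rejects the unknown flag with
EINVAL, as the patch anticipates.  The `add_listener_watch` sub and the
`$add` callback are hypothetical stand-ins for the epoll_ctl() call and
do not exist in public-inbox; the constants mirror those in Syscall.pm.

```perl
use strict;
use warnings;
use Errno qw(EINVAL);

use constant {
    EPOLLIN        => 1,
    EPOLLERR       => 8,
    EPOLLHUP       => 16,
    EPOLLEXCLUSIVE => 1 << 28,
};

# cleared to 0 once we learn the kernel rejects the flag
our $EPOLLEXCLUSIVE = EPOLLEXCLUSIVE;

# $add is a hypothetical stand-in for epoll_ctl(..., EPOLL_CTL_ADD, ...):
# it takes an event mask and returns 0 on success, or -1 with $! set.
# Returns the event mask which was actually registered.
sub add_listener_watch {
    my ($add) = @_;
    my $ev = EPOLLIN|EPOLLERR|EPOLLHUP|$EPOLLEXCLUSIVE;
    while ($add->($ev)) {
        if ($!{EINVAL} && ($ev & $EPOLLEXCLUSIVE)) {
            $EPOLLEXCLUSIVE = 0;     # old kernel, never try the flag again
            $ev &= ~EPOLLEXCLUSIVE;  # retry without it
            next;
        }
        die "couldn't add epoll watch: $!\n";
    }
    $ev;
}

# simulate a kernel which does not know about EPOLLEXCLUSIVE
my $old_kernel = sub {
    my ($ev) = @_;
    if ($ev & EPOLLEXCLUSIVE) { $! = EINVAL; return -1 }
    0;
};

my $ev = add_listener_watch($old_kernel);
printf "registered mask: %d, EPOLLEXCLUSIVE now: %d\n", $ev, $EPOLLEXCLUSIVE;
```

Since the package variable stays at zero after the first failure, later
listeners in the same process skip the doomed attempt entirely.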
 TODO                        |  3 ---
 lib/PublicInbox/DS.pm       | 22 ++++++++++++++++------
 lib/PublicInbox/Listener.pm |  2 +-
 lib/PublicInbox/Syscall.pm  |  7 +++++--
 4 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/TODO b/TODO
index 372f733..ac255b8 100644
--- a/TODO
+++ b/TODO
@@ -56,9 +56,6 @@ all need to be considered for everything we introduce)
   ugh... https://rt.cpan.org/Ticket/Display.html?id=116615
   (IO::KQueue is broken with Danga::Socket / PublicInbox::DS)
 
-* EPOLLEXCLUSIVE for listen socket fairness across -httpd/nntpd
-  worker processes.
-
 * improve documentation
 
 * linkify thread skeletons better
diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 543d3fd..3ccc275 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -78,6 +78,8 @@ our (
      @Timers,                    # timers
      );
 
+# this may be set to zero with old kernels
+our $EPOLLEXCLUSIVE = EPOLLEXCLUSIVE;
 Reset();
 
 #####################################################################
@@ -666,11 +668,9 @@ This is normally (always?) called from your subclass via:
 
 =cut
 sub new {
-    my PublicInbox::DS $self = shift;
+    my ($self, $sock, $exclusive) = @_;
     $self = fields::new($self) unless ref $self;
 
-    my $sock = shift;
-
     $self->{sock}        = $sock;
     my $fd = fileno($sock);
 
@@ -685,13 +685,23 @@ sub new {
     $self->{corked} = 0;
     $self->{read_push_back} = [];
 
-    $self->{event_watch} = POLLERR|POLLHUP|POLLNVAL;
+    my $ev = $self->{event_watch} = POLLERR|POLLHUP|POLLNVAL;
 
     _InitPoller();
 
     if ($HaveEpoll) {
-        epoll_ctl($Epoll, EPOLL_CTL_ADD, $fd, $self->{event_watch})
-            and die "couldn't add epoll watch for $fd\n";
+        if ($exclusive) {
+            $ev = $self->{event_watch} = EPOLLIN|EPOLLERR|EPOLLHUP|$EPOLLEXCLUSIVE;
+        }
+retry:
+        if (epoll_ctl($Epoll, EPOLL_CTL_ADD, $fd, $ev)) {
+            if ($!{EINVAL} && ($ev & $EPOLLEXCLUSIVE)) {
+                $EPOLLEXCLUSIVE = 0; # old kernel
+                $ev = $self->{event_watch} = EPOLLIN|EPOLLERR|EPOLLHUP;
+                goto retry;
+            }
+            die "couldn't add epoll watch for $fd: $!\n";
+        }
     }
     elsif ($HaveKQueue) {
         # Add them to the queue but disabled for now
diff --git a/lib/PublicInbox/Listener.pm b/lib/PublicInbox/Listener.pm
index d1f0d2e..a75a6fd 100644
--- a/lib/PublicInbox/Listener.pm
+++ b/lib/PublicInbox/Listener.pm
@@ -17,7 +17,7 @@ sub new ($$$) {
 	listen($s, 1024);
 	IO::Handle::blocking($s, 0);
 	my $self = fields::new($class);
-	$self->SUPER::new($s); # calls epoll_create for the first socket
+	$self->SUPER::new($s, 1); # calls epoll_create for the first socket
 	$self->watch_read(1);
 	$self->{post_accept} = $cb;
 	$self
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index cf70045..9194364 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -23,10 +23,12 @@ $VERSION     = "0.25";
 @ISA         = qw(Exporter);
 @EXPORT_OK   = qw(sendfile epoll_ctl epoll_create epoll_wait
                   EPOLLIN EPOLLOUT EPOLLERR EPOLLHUP EPOLLRDBAND
-                  EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD);
+                  EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
+                  EPOLLEXCLUSIVE);
 %EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
                              EPOLLIN EPOLLOUT EPOLLERR EPOLLHUP EPOLLRDBAND
-                             EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD)],
+                             EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
+                             EPOLLEXCLUSIVE)],
                 sendfile => [qw(sendfile)],
                 );
 
@@ -35,6 +37,7 @@ use constant EPOLLOUT      => 4;
 use constant EPOLLERR      => 8;
 use constant EPOLLHUP      => 16;
 use constant EPOLLRDBAND   => 128;
+use constant EPOLLEXCLUSIVE => (1 << 28);
 use constant EPOLL_CTL_ADD => 1;
 use constant EPOLL_CTL_DEL => 2;
 use constant EPOLL_CTL_MOD => 3;
-- 
EW



* [PATCH 1/4] bundle Danga::Socket and Sys::Syscall
  @ 2019-05-05  0:52 24% ` Eric Wong
  2019-05-05  0:52 83% ` [PATCH 2/4] listener: use EPOLLEXCLUSIVE for listen sockets Eric Wong
    2 siblings, 0 replies; 51+ results
From: Eric Wong @ 2019-05-05  0:52 UTC
  To: meta

These modules are unmaintained upstream at the moment, but I'll
be able to help the intended maintainer once/if CPAN
ownership is transferred.  OTOH, we've been waiting for that
transfer for several years now...

Changes I intend to make:

* EPOLLEXCLUSIVE for Linux
* remove unused fields wasting memory
* kqueue bugfixes e.g. https://rt.cpan.org/Ticket/Display.html?id=116615
* accept4 support

And some lower priority experiments:

* switch to EV_ONESHOT / EPOLLONESHOT (incompatible changes)
* nginx-style buffering to tmpfile instead of string array
* sendfile off tmpfile buffers
* io_uring maybe?
---
 INSTALL                           |    4 -
 MANIFEST                          |    2 +
 TODO                              |    2 +-
 lib/PublicInbox/DS.pm             | 1334 +++++++++++++++++++++++++++++
 lib/PublicInbox/Daemon.pm         |    8 +-
 lib/PublicInbox/EvCleanup.pm      |   12 +-
 lib/PublicInbox/GitHTTPBackend.pm |    2 +-
 lib/PublicInbox/HTTP.pm           |   12 +-
 lib/PublicInbox/HTTPD/Async.pm    |    4 +-
 lib/PublicInbox/Listener.pm       |    2 +-
 lib/PublicInbox/NNTP.pm           |    6 +-
 lib/PublicInbox/ParentPipe.pm     |    2 +-
 lib/PublicInbox/Qspawn.pm         |    4 +-
 lib/PublicInbox/Syscall.pm        |  326 +++++++
 t/git-http-backend.t              |    2 +-
 t/httpd-corner.t                  |    2 +-
 t/httpd-unix.t                    |    2 +-
 t/httpd.t                         |    2 +-
 t/nntp.t                          |    2 +-
 t/nntpd.t                         |    2 +-
 t/v2mirror.t                      |    2 +-
 t/v2writable.t                    |    4 +-
 22 files changed, 1698 insertions(+), 40 deletions(-)
 create mode 100644 lib/PublicInbox/DS.pm
 create mode 100644 lib/PublicInbox/Syscall.pm

diff --git a/INSTALL b/INSTALL
index 9470d83..3c0b910 100644
--- a/INSTALL
+++ b/INSTALL
@@ -73,10 +73,6 @@ Numerous optional modules are likely to be useful as well:
                                rpm: perl-DBD-SQLite
                                (for NNTP service or gzipped mbox over HTTP)
 
-  - Danga::Socket              deb: libdanga-socket-perl
-                               rpm: perl-Danga-Socket
-                               (for bundled HTTP and NNTP servers)
-
   - Net::Server                deb: libnet-server-perl
                                rpm: perl-Net-Server
                                (for HTTP/NNTP servers as standalone daemons,
diff --git a/MANIFEST b/MANIFEST
index ed8ff49..afe5ae1 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -64,6 +64,7 @@ lib/PublicInbox/AltId.pm
 lib/PublicInbox/Cgit.pm
 lib/PublicInbox/Config.pm
 lib/PublicInbox/ContentId.pm
+lib/PublicInbox/DS.pm
 lib/PublicInbox/Daemon.pm
 lib/PublicInbox/Emergency.pm
 lib/PublicInbox/EvCleanup.pm
@@ -117,6 +118,7 @@ lib/PublicInbox/Spamcheck.pm
 lib/PublicInbox/Spamcheck/Spamc.pm
 lib/PublicInbox/Spawn.pm
 lib/PublicInbox/SpawnPP.pm
+lib/PublicInbox/Syscall.pm
 lib/PublicInbox/Unsubscribe.pm
 lib/PublicInbox/UserContent.pm
 lib/PublicInbox/V2Writable.pm
diff --git a/TODO b/TODO
index 7a3bb6b..372f733 100644
--- a/TODO
+++ b/TODO
@@ -54,7 +54,7 @@ all need to be considered for everything we introduce)
 
 * portability to FreeBSD (and other Free Software *BSDs)
   ugh... https://rt.cpan.org/Ticket/Display.html?id=116615
-  (IO::KQueue is broken with Danga::Socket)
+  (IO::KQueue is broken with Danga::Socket / PublicInbox::DS)
 
 * EPOLLEXCLUSIVE for listen socket fairness across -httpd/nntpd
   worker processes.
diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
new file mode 100644
index 0000000..543d3fd
--- /dev/null
+++ b/lib/PublicInbox/DS.pm
@@ -0,0 +1,1334 @@
+# This library is free software; you can redistribute it and/or modify
+# it under the same terms as Perl itself.
+#
+# This license differs from the rest of public-inbox
+#
+# This is a fork of the (for now) unmaintained Danga::Socket 1.61.
+# Unused features will be removed, and updates will be made to take
+# advantage of newer kernels
+
+package PublicInbox::DS;
+use strict;
+use bytes;
+use POSIX ();
+use Time::HiRes ();
+
+my $opt_bsd_resource = eval "use BSD::Resource; 1;";
+
+use vars qw{$VERSION};
+$VERSION = "1.61";
+
+use warnings;
+no  warnings qw(deprecated);
+
+use PublicInbox::Syscall qw(:epoll);
+
+use fields ('sock',              # underlying socket
+            'fd',                # numeric file descriptor
+            'write_buf',         # arrayref of scalars, scalarrefs, or coderefs to write
+            'write_buf_offset',  # offset into first array of write_buf to start writing at
+            'write_buf_size',    # total length of data in all write_buf items
+            'write_set_watch',   # bool: true if we internally set watch_write rather than by a subclass
+            'read_push_back',    # arrayref of "pushed-back" read data the application didn't want
+            'closed',            # bool: socket is closed
+            'corked',            # bool: socket is corked
+            'event_watch',       # bitmask of events the client is interested in (POLLIN,OUT,etc.)
+            'peer_v6',           # bool: cached; if peer is an IPv6 address
+            'peer_ip',           # cached stringified IP address of $sock
+            'peer_port',         # cached port number of $sock
+            'local_ip',          # cached stringified IP address of local end of $sock
+            'local_port',        # cached port number of local end of $sock
+            'writer_func',       # subref which does writing.  must return bytes written (or undef) and set $! on errors
+            );
+
+use Errno  qw(EINPROGRESS EWOULDBLOCK EISCONN ENOTSOCK
+              EPIPE EAGAIN EBADF ECONNRESET ENOPROTOOPT);
+use Socket qw(IPPROTO_TCP);
+use Carp   qw(croak confess);
+
+use constant TCP_CORK => ($^O eq "linux" ? 3 : 0); # FIXME: not hard-coded (Linux-specific too)
+use constant DebugLevel => 0;
+
+use constant POLLIN        => 1;
+use constant POLLOUT       => 4;
+use constant POLLERR       => 8;
+use constant POLLHUP       => 16;
+use constant POLLNVAL      => 32;
+
+our $HAVE_KQUEUE = eval { require IO::KQueue; 1 };
+
+our (
+     $HaveEpoll,                 # Flag -- is epoll available?  initially undefined.
+     $HaveKQueue,
+     %DescriptorMap,             # fd (num) -> PublicInbox::DS object
+     %PushBackSet,               # fd (num) -> PublicInbox::DS (fds with pushed back read data)
+     $Epoll,                     # Global epoll fd (for epoll mode only)
+     $KQueue,                    # Global kqueue fd (for kqueue mode only)
+     @ToClose,                   # sockets to close when event loop is done
+     %OtherFds,                  # A hash of "other" (non-PublicInbox::DS) file
+                                 # descriptors for the event loop to track.
+
+     $PostLoopCallback,          # subref to call at the end of each loop, if defined (global)
+     %PLCMap,                    # fd (num) -> PostLoopCallback (per-object)
+
+     $LoopTimeout,               # timeout of event loop in milliseconds
+     $DoProfile,                 # if on, enable profiling
+     %Profiling,                 # what => [ utime, stime, calls ]
+     $DoneInit,                  # if we've done the one-time module init yet
+     @Timers,                    # timers
+     );
+
+Reset();
+
+#####################################################################
+### C L A S S   M E T H O D S
+#####################################################################
+
+=head2 C<< CLASS->Reset() >>
+
+Reset all state
+
+=cut
+sub Reset {
+    %DescriptorMap = ();
+    %PushBackSet = ();
+    @ToClose = ();
+    %OtherFds = ();
+    $LoopTimeout = -1;  # no timeout by default
+    $DoProfile = 0;
+    %Profiling = ();
+    @Timers = ();
+
+    $PostLoopCallback = undef;
+    %PLCMap = ();
+    $DoneInit = 0;
+
+    POSIX::close($Epoll)  if defined $Epoll  && $Epoll  >= 0;
+    POSIX::close($KQueue) if defined $KQueue && $KQueue >= 0;
+
+    *EventLoop = *FirstTimeEventLoop;
+}
+
+=head2 C<< CLASS->HaveEpoll() >>
+
+Returns a true value if this class will use IO::Epoll for async IO.
+
+=cut
+sub HaveEpoll {
+    _InitPoller();
+    return $HaveEpoll;
+}
+
+=head2 C<< CLASS->WatchedSockets() >>
+
+Returns the number of file descriptors which are registered with the global
+poll object.
+
+=cut
+sub WatchedSockets {
+    return scalar keys %DescriptorMap;
+}
+*watched_sockets = *WatchedSockets;
+
+=head2 C<< CLASS->EnableProfiling() >>
+
+Turns profiling on, clearing current profiling data.
+
+=cut
+sub EnableProfiling {
+    if ($opt_bsd_resource) {
+        %Profiling = ();
+        $DoProfile = 1;
+        return 1;
+    }
+    return 0;
+}
+
+=head2 C<< CLASS->DisableProfiling() >>
+
+Turns off profiling, but retains data up to this point
+
+=cut
+sub DisableProfiling {
+    $DoProfile = 0;
+}
+
+=head2 C<< CLASS->ProfilingData() >>
+
+Returns reference to a hash of data in format:
+
+  ITEM => [ utime, stime, #calls ]
+
+=cut
+sub ProfilingData {
+    return \%Profiling;
+}
+
+=head2 C<< CLASS->ToClose() >>
+
+Return the list of sockets that are awaiting close() at the end of the
+current event loop.
+
+=cut
+sub ToClose { return @ToClose; }
+
+=head2 C<< CLASS->OtherFds( [%fdmap] ) >>
+
+Get/set the hash of file descriptors that need processing in parallel with
+the registered PublicInbox::DS objects.
+
+=cut
+sub OtherFds {
+    my $class = shift;
+    if ( @_ ) { %OtherFds = @_ }
+    return wantarray ? %OtherFds : \%OtherFds;
+}
+
+=head2 C<< CLASS->AddOtherFds( [%fdmap] ) >>
+
+Add fds to the OtherFds hash for processing.
+
+=cut
+sub AddOtherFds {
+    my $class = shift;
+    %OtherFds = ( %OtherFds, @_ ); # FIXME investigate what happens on dupe fds
+    return wantarray ? %OtherFds : \%OtherFds;
+}
+
+=head2 C<< CLASS->SetLoopTimeout( $timeout ) >>
+
+Set the loop timeout for the event loop to some value in milliseconds.
+
+A timeout of -1 means poll forever. A timeout of 0 (zero) means poll and return
+immediately.
+
+=cut
+sub SetLoopTimeout {
+    return $LoopTimeout = $_[1] + 0;
+}
+
+=head2 C<< CLASS->DebugMsg( $format, @args ) >>
+
+Print the debugging message specified by the C<sprintf>-style I<format> and
+I<args>
+
+=cut
+sub DebugMsg {
+    my ( $class, $fmt, @args ) = @_;
+    chomp $fmt;
+    printf STDERR ">>> $fmt\n", @args;
+}
+
+=head2 C<< CLASS->AddTimer( $seconds, $coderef ) >>
+
+Add a timer to occur $seconds from now. $seconds may be fractional, but timers
+are not guaranteed to fire at the exact time you ask for.
+
+Returns a timer object which you can call C<< $timer->cancel >> on if you need to.
+
+=cut
+sub AddTimer {
+    my $class = shift;
+    my ($secs, $coderef) = @_;
+
+    my $fire_time = Time::HiRes::time() + $secs;
+
+    my $timer = bless [$fire_time, $coderef], "PublicInbox::DS::Timer";
+
+    if (!@Timers || $fire_time >= $Timers[-1][0]) {
+        push @Timers, $timer;
+        return $timer;
+    }
+
+    # Now, where do we insert?  (NOTE: this appears slow, algorithm-wise,
+    # but it was compared against calendar queues, heaps, naive push/sort,
+    # and a bunch of other versions, and found to be fastest with a large
+    # variety of datasets.)
+    for (my $i = 0; $i < @Timers; $i++) {
+        if ($Timers[$i][0] > $fire_time) {
+            splice(@Timers, $i, 0, $timer);
+            return $timer;
+        }
+    }
+
+    die "Shouldn't get here.";
+}
+
+=head2 C<< CLASS->DescriptorMap() >>
+
+Get the hash of PublicInbox::DS objects keyed by the file descriptor (fileno) they
+are wrapping.
+
+Returns a hash in list context or a hashref in scalar context.
+
+=cut
+sub DescriptorMap {
+    return wantarray ? %DescriptorMap : \%DescriptorMap;
+}
+*descriptor_map = *DescriptorMap;
+*get_sock_ref = *DescriptorMap;
+
+sub _InitPoller
+{
+    return if $DoneInit;
+    $DoneInit = 1;
+
+    if ($HAVE_KQUEUE) {
+        $KQueue = IO::KQueue->new();
+        $HaveKQueue = $KQueue >= 0;
+        if ($HaveKQueue) {
+            *EventLoop = *KQueueEventLoop;
+        }
+    }
+    elsif (PublicInbox::Syscall::epoll_defined()) {
+        $Epoll = eval { epoll_create(1024); };
+        $HaveEpoll = defined $Epoll && $Epoll >= 0;
+        if ($HaveEpoll) {
+            *EventLoop = *EpollEventLoop;
+        }
+    }
+
+    if (!$HaveEpoll && !$HaveKQueue) {
+        require IO::Poll;
+        *EventLoop = *PollEventLoop;
+    }
+}
+
+=head2 C<< CLASS->EventLoop() >>
+
+Start processing IO events. In most daemon programs this never exits. See
+C<PostLoopCallback> below for how to exit the loop.
+
+=cut
+sub FirstTimeEventLoop {
+    my $class = shift;
+
+    _InitPoller();
+
+    if ($HaveEpoll) {
+        EpollEventLoop($class);
+    } elsif ($HaveKQueue) {
+        KQueueEventLoop($class);
+    } else {
+        PollEventLoop($class);
+    }
+}
+
+## profiling-related data/functions
+our ($Prof_utime0, $Prof_stime0);
+sub _pre_profile {
+    ($Prof_utime0, $Prof_stime0) = getrusage();
+}
+
+sub _post_profile {
+    # get post information
+    my ($autime, $astime) = getrusage();
+
+    # calculate differences
+    my $utime = $autime - $Prof_utime0;
+    my $stime = $astime - $Prof_stime0;
+
+    foreach my $k (@_) {
+        $Profiling{$k} ||= [ 0.0, 0.0, 0 ];
+        $Profiling{$k}->[0] += $utime;
+        $Profiling{$k}->[1] += $stime;
+        $Profiling{$k}->[2]++;
+    }
+}
+
+# runs timers and returns milliseconds for next one, or next event loop
+sub RunTimers {
+    return $LoopTimeout unless @Timers;
+
+    my $now = Time::HiRes::time();
+
+    # Run expired timers
+    while (@Timers && $Timers[0][0] <= $now) {
+        my $to_run = shift(@Timers);
+        $to_run->[1]->($now) if $to_run->[1];
+    }
+
+    return $LoopTimeout unless @Timers;
+
+    # convert time to an even number of milliseconds, adding 1
+    # extra, otherwise floating point fun can occur and we'll
+    # call RunTimers like 20-30 times, each returning a timeout
+    # of 0.0000212 seconds
+    my $timeout = int(($Timers[0][0] - $now) * 1000) + 1;
+
+    # -1 is an infinite timeout, so prefer a real timeout
+    return $timeout     if $LoopTimeout == -1;
+
+    # otherwise pick the lower of our regular timeout and time until
+    # the next timer
+    return $LoopTimeout if $LoopTimeout < $timeout;
+    return $timeout;
+}
+
+### The epoll-based event loop. Gets installed as EventLoop if IO::Epoll loads
+### okay.
+sub EpollEventLoop {
+    my $class = shift;
+
+    foreach my $fd ( keys %OtherFds ) {
+        if (epoll_ctl($Epoll, EPOLL_CTL_ADD, $fd, EPOLLIN) == -1) {
+            warn "epoll_ctl(): failure adding fd=$fd; $! (", $!+0, ")\n";
+        }
+    }
+
+    while (1) {
+        my @events;
+        my $i;
+        my $timeout = RunTimers();
+
+        # get up to 1000 events
+        my $evcount = epoll_wait($Epoll, 1000, $timeout, \@events);
+      EVENT:
+        for ($i=0; $i<$evcount; $i++) {
+            my $ev = $events[$i];
+
+            # it's possible epoll_wait returned many events, and that handling
+            # the earlier ones unregistered interest in some of the later ones.
+            # if we can't find the %DescriptorMap entry, it's because we're no
+            # longer interested in that event.
+            my PublicInbox::DS $pob = $DescriptorMap{$ev->[0]};
+            my $code;
+            my $state = $ev->[1];
+
+            # if we didn't find a Perlbal::Socket subclass for that fd, try other
+            # pseudo-registered (above) fds.
+            if (! $pob) {
+                if (my $code = $OtherFds{$ev->[0]}) {
+                    $code->($state);
+                } else {
+                    my $fd = $ev->[0];
+                    warn "epoll() returned fd $fd w/ state $state for which we have no mapping.  removing.\n";
+                    POSIX::close($fd);
+                    epoll_ctl($Epoll, EPOLL_CTL_DEL, $fd, 0);
+                }
+                next;
+            }
+
+            DebugLevel >= 1 && $class->DebugMsg("Event: fd=%d (%s), state=%d \@ %s\n",
+                                                $ev->[0], ref($pob), $ev->[1], time);
+
+            if ($DoProfile) {
+                my $class = ref $pob;
+
+                # call profiling action on things that need to be done
+                if ($state & EPOLLIN && ! $pob->{closed}) {
+                    _pre_profile();
+                    $pob->event_read;
+                    _post_profile("$class-read");
+                }
+
+                if ($state & EPOLLOUT && ! $pob->{closed}) {
+                    _pre_profile();
+                    $pob->event_write;
+                    _post_profile("$class-write");
+                }
+
+                if ($state & (EPOLLERR|EPOLLHUP)) {
+                    if ($state & EPOLLERR && ! $pob->{closed}) {
+                        _pre_profile();
+                        $pob->event_err;
+                        _post_profile("$class-err");
+                    }
+                    if ($state & EPOLLHUP && ! $pob->{closed}) {
+                        _pre_profile();
+                        $pob->event_hup;
+                        _post_profile("$class-hup");
+                    }
+                }
+
+                next;
+            }
+
+            # standard non-profiling codepath
+            $pob->event_read   if $state & EPOLLIN && ! $pob->{closed};
+            $pob->event_write  if $state & EPOLLOUT && ! $pob->{closed};
+            if ($state & (EPOLLERR|EPOLLHUP)) {
+                $pob->event_err    if $state & EPOLLERR && ! $pob->{closed};
+                $pob->event_hup    if $state & EPOLLHUP && ! $pob->{closed};
+            }
+        }
+        return unless PostEventLoop();
+    }
+    exit 0;
+}
+
+### The fallback IO::Poll-based event loop. Gets installed as EventLoop if
+### IO::Epoll fails to load.
+sub PollEventLoop {
+    my $class = shift;
+
+    my PublicInbox::DS $pob;
+
+    while (1) {
+        my $timeout = RunTimers();
+
+        # the following sets up @poll as a series of ($fd,$event_mask)
+        # items, then uses IO::Poll::_poll, implemented in XS, which
+        # modifies the array in place with the even elements being
+        # replaced with the event masks that occurred.
+        my @poll;
+        foreach my $fd ( keys %OtherFds ) {
+            push @poll, $fd, POLLIN;
+        }
+        while ( my ($fd, $sock) = each %DescriptorMap ) {
+            push @poll, $fd, $sock->{event_watch};
+        }
+
+        # if nothing to poll, either end immediately (if no timeout)
+        # or just keep calling the callback
+        unless (@poll) {
+            select undef, undef, undef, ($timeout / 1000);
+            return unless PostEventLoop();
+            next;
+        }
+
+        my $count = IO::Poll::_poll($timeout, @poll);
+        unless ($count) {
+            return unless PostEventLoop();
+            next;
+        }
+
+        # Fetch handles with read events
+        while (@poll) {
+            my ($fd, $state) = splice(@poll, 0, 2);
+            next unless $state;
+
+            $pob = $DescriptorMap{$fd};
+
+            if (!$pob) {
+                if (my $code = $OtherFds{$fd}) {
+                    $code->($state);
+                }
+                next;
+            }
+
+            $pob->event_read   if $state & POLLIN && ! $pob->{closed};
+            $pob->event_write  if $state & POLLOUT && ! $pob->{closed};
+            $pob->event_err    if $state & POLLERR && ! $pob->{closed};
+            $pob->event_hup    if $state & POLLHUP && ! $pob->{closed};
+        }
+
+        return unless PostEventLoop();
+    }
+
+    exit 0;
+}
+
+### The kqueue-based event loop. Gets installed as EventLoop if IO::KQueue works
+### okay.
+sub KQueueEventLoop {
+    my $class = shift;
+
+    foreach my $fd (keys %OtherFds) {
+        $KQueue->EV_SET($fd, IO::KQueue::EVFILT_READ(), IO::KQueue::EV_ADD());
+    }
+
+    while (1) {
+        my $timeout = RunTimers();
+        my @ret = $KQueue->kevent($timeout);
+
+        foreach my $kev (@ret) {
+            my ($fd, $filter, $flags, $fflags) = @$kev;
+            my PublicInbox::DS $pob = $DescriptorMap{$fd};
+            if (!$pob) {
+                if (my $code = $OtherFds{$fd}) {
+                    $code->($filter);
+                }  else {
+                    warn "kevent() returned fd $fd for which we have no mapping.  removing.\n";
+                    POSIX::close($fd); # close deletes the kevent entry
+                }
+                next;
+            }
+
+            DebugLevel >= 1 && $class->DebugMsg("Event: fd=%d (%s), flags=%d \@ %s\n",
+                                                        $fd, ref($pob), $flags, time);
+
+            $pob->event_read  if $filter == IO::KQueue::EVFILT_READ()  && !$pob->{closed};
+            $pob->event_write if $filter == IO::KQueue::EVFILT_WRITE() && !$pob->{closed};
+            if ($flags ==  IO::KQueue::EV_EOF() && !$pob->{closed}) {
+                if ($fflags) {
+                    $pob->event_err;
+                } else {
+                    $pob->event_hup;
+                }
+            }
+        }
+        return unless PostEventLoop();
+    }
+
+    exit(0);
+}
+
+=head2 C<< CLASS->SetPostLoopCallback( CODEREF ) >>
+
+Sets post loop callback function.  Pass a subref and it will be
+called every time the event loop finishes.
+
+Return 1 (or any true value) from the sub to make the loop continue, 0 or false
+and it will exit.
+
+The callback function will be passed two parameters: \%DescriptorMap, \%OtherFds.
+
+=cut
+sub SetPostLoopCallback {
+    my ($class, $ref) = @_;
+
+    if (ref $class) {
+        # per-object callback
+        my PublicInbox::DS $self = $class;
+        if (defined $ref && ref $ref eq 'CODE') {
+            $PLCMap{$self->{fd}} = $ref;
+        } else {
+            delete $PLCMap{$self->{fd}};
+        }
+    } else {
+        # global callback
+        $PostLoopCallback = (defined $ref && ref $ref eq 'CODE') ? $ref : undef;
+    }
+}
+
+# Internal function: run the post-event callback, send read events
+# for pushed-back data, and close pending connections.  returns 1
+# if event loop should continue, or 0 to shut it all down.
+sub PostEventLoop {
+    # fire read events for objects with pushed-back read data
+    my $loop = 1;
+    while ($loop) {
+        $loop = 0;
+        foreach my $fd (keys %PushBackSet) {
+            my PublicInbox::DS $pob = $PushBackSet{$fd};
+
+            # a previous event_read invocation could've closed a
+            # connection that we already evaluated in "keys
+            # %PushBackSet", so skip ones that seem to have
+            # disappeared.  this is expected.
+            next unless $pob;
+
+            die "ASSERT: the $pob socket has no read_push_back" unless @{$pob->{read_push_back}};
+            next unless (! $pob->{closed} &&
+                         $pob->{event_watch} & POLLIN);
+            $loop = 1;
+            $pob->event_read;
+        }
+    }
+
+    # now we can close sockets that wanted to close during our event processing.
+    # (we didn't want to close them during the loop, as we didn't want fd numbers
+    #  being reused and confused during the event loop)
+    while (my $sock = shift @ToClose) {
+        my $fd = fileno($sock);
+
+        # close the socket.  (not a PublicInbox::DS close)
+        $sock->close;
+
+        # and now we can finally remove the fd from the map.  see
+        # comment above in _cleanup.
+        delete $DescriptorMap{$fd};
+    }
+
+
+    # by default we keep running, unless a postloop callback (either per-object
+    # or global) cancels it
+    my $keep_running = 1;
+
+    # per-object post-loop-callbacks
+    for my $plc (values %PLCMap) {
+        $keep_running &&= $plc->(\%DescriptorMap, \%OtherFds);
+    }
+
+    # now we're at the very end, call callback if defined
+    if (defined $PostLoopCallback) {
+        $keep_running &&= $PostLoopCallback->(\%DescriptorMap, \%OtherFds);
+    }
+
+    return $keep_running;
+}
+
+#####################################################################
+### PublicInbox::DS-the-object code
+#####################################################################
+
+=head2 OBJECT METHODS
+
+=head2 C<< CLASS->new( $socket ) >>
+
+Create a new PublicInbox::DS subclass object for the given I<socket> which will
+react to events on it during the C<EventLoop>.
+
+This is normally (always?) called from your subclass via:
+
+  $class->SUPER::new($socket);
+
+=cut
+sub new {
+    my PublicInbox::DS $self = shift;
+    $self = fields::new($self) unless ref $self;
+
+    my $sock = shift;
+
+    $self->{sock}        = $sock;
+    my $fd = fileno($sock);
+
+    Carp::cluck("undef sock and/or fd in PublicInbox::DS->new.  sock=" . ($sock || "") . ", fd=" . ($fd || ""))
+        unless $sock && $fd;
+
+    $self->{fd}          = $fd;
+    $self->{write_buf}      = [];
+    $self->{write_buf_offset} = 0;
+    $self->{write_buf_size} = 0;
+    $self->{closed} = 0;
+    $self->{corked} = 0;
+    $self->{read_push_back} = [];
+
+    $self->{event_watch} = POLLERR|POLLHUP|POLLNVAL;
+
+    _InitPoller();
+
+    if ($HaveEpoll) {
+        epoll_ctl($Epoll, EPOLL_CTL_ADD, $fd, $self->{event_watch})
+            and die "couldn't add epoll watch for $fd\n";
+    }
+    elsif ($HaveKQueue) {
+        # Add them to the queue but disabled for now
+        $KQueue->EV_SET($fd, IO::KQueue::EVFILT_READ(),
+                        IO::KQueue::EV_ADD() | IO::KQueue::EV_DISABLE());
+        $KQueue->EV_SET($fd, IO::KQueue::EVFILT_WRITE(),
+                        IO::KQueue::EV_ADD() | IO::KQueue::EV_DISABLE());
+    }
+
+    Carp::cluck("PublicInbox::DS::new blowing away existing descriptor map for fd=$fd ($DescriptorMap{$fd})")
+        if $DescriptorMap{$fd};
+
+    $DescriptorMap{$fd} = $self;
+    return $self;
+}
+
+
+#####################################################################
+### I N S T A N C E   M E T H O D S
+#####################################################################
+
+=head2 C<< $obj->tcp_cork( $boolean ) >>
+
+Turn TCP_CORK on or off depending on the value of I<boolean>.
+
+=cut
+sub tcp_cork {
+    my PublicInbox::DS $self = $_[0];
+    my $val = $_[1];
+
+    # make sure we have a socket
+    return unless $self->{sock};
+    return if $val == $self->{corked};
+
+    my $rv;
+    if (TCP_CORK) {
+        $rv = setsockopt($self->{sock}, IPPROTO_TCP, TCP_CORK,
+                         pack("l", $val ? 1 : 0));
+    } else {
+        # FIXME: implement freebsd *PUSH sockopts
+        $rv = 1;
+    }
+
+    # if we failed, close (if we're not already) and warn about the error
+    if ($rv) {
+        $self->{corked} = $val;
+    } else {
+        if ($! == EBADF || $! == ENOTSOCK) {
+            # internal state is probably corrupted; warn and then close if
+            # we're not closed already
+            warn "setsockopt: $!";
+            $self->close('tcp_cork_failed');
+        } elsif ($! == ENOPROTOOPT || $!{ENOTSOCK} || $!{EOPNOTSUPP}) {
+            # TCP implementation doesn't support corking, so just ignore it
+            # or we're trying to tcp-cork a non-socket (like a socketpair pipe
+            # which is acting like a socket, which Perlbal does for child
+            # processes acting like inetd-like web servers)
+        } else {
+            # some other error; we should never hit here, but if we do, die
+            die "setsockopt: $!";
+        }
+    }
+}
+
+=head2 C<< $obj->steal_socket() >>
+
+Basically returns our socket and makes it so that we don't try to close it,
+but we do remove it from epoll handlers.  THIS CLOSES $self.  It is the same
+thing as calling close, except it gives you the socket to use.
+
+=cut
+sub steal_socket {
+    my PublicInbox::DS $self = $_[0];
+    return if $self->{closed};
+
+    # cleanup does most of the work of closing this socket
+    $self->_cleanup();
+
+    # now undef our internal sock and fd structures so we don't use them
+    my $sock = $self->{sock};
+    $self->{sock} = undef;
+    return $sock;
+}
+
+=head2 C<< $obj->close( [$reason] ) >>
+
+Close the socket. The I<reason> argument will be used in debugging messages.
+
+=cut
+sub close {
+    my PublicInbox::DS $self = $_[0];
+    return if $self->{closed};
+
+    # print out debugging info for this close
+    if (DebugLevel) {
+        my ($pkg, $filename, $line) = caller;
+        my $reason = $_[1] || "";
+        warn "Closing \#$self->{fd} due to $pkg/$filename/$line ($reason)\n";
+    }
+
+    # this does most of the work of closing us
+    $self->_cleanup();
+
+    # defer closing the actual socket until the event loop is done
+    # processing this round of events.  (otherwise we might reuse fds)
+    if ($self->{sock}) {
+        push @ToClose, $self->{sock};
+        $self->{sock} = undef;
+    }
+
+    return 0;
+}
+
+### METHOD: _cleanup()
+### Called by our closers so we can clean internal data structures.
+sub _cleanup {
+    my PublicInbox::DS $self = $_[0];
+
+    # we're effectively closed; we have no fd and sock when we leave here
+    $self->{closed} = 1;
+
+    # we need to flush our write buffer, as there may
+    # be self-referential closures (sub { $client->close })
+    # preventing the object from being destroyed
+    $self->{write_buf} = [];
+
+    # uncork so any final data gets sent.  only matters if the person closing
+    # us forgot to do it, but we do it to be safe.
+    $self->tcp_cork(0);
+
+    # if we're using epoll, we have to remove this from our epoll fd so we stop getting
+    # notifications about it
+    if ($HaveEpoll && $self->{fd}) {
+        if (epoll_ctl($Epoll, EPOLL_CTL_DEL, $self->{fd}, $self->{event_watch}) != 0) {
+            # dump_error prints a backtrace so we can try to figure out why this happened
+            $self->dump_error("epoll_ctl(): failure deleting fd=$self->{fd} during _cleanup(); $! (" . ($!+0) . ")");
+        }
+    }
+
+    # now delete from mappings.  this fd no longer belongs to us, so we don't want
+    # to get alerts for it if it becomes writable/readable/etc.
+    delete $PushBackSet{$self->{fd}};
+    delete $PLCMap{$self->{fd}};
+
+    # we explicitly don't delete from DescriptorMap here until we
+    # actually close the socket, as we might be in the middle of
+    # processing an epoll_wait/etc that returned hundreds of fds, one
+    # of which is not yet processed and is what we're closing.  if we
+    # keep it in DescriptorMap, then the event harnesses can just
+    # look at $pob->{closed} and ignore it.  but if it's an
+    # un-accounted for fd, then they (understandably) freak out a bit
+    # and emit warnings, thinking their state got off.
+
+    # and finally get rid of our fd so we can't use it anywhere else
+    $self->{fd} = undef;
+}
+
+=head2 C<< $obj->sock() >>
+
+Returns the underlying IO::Handle for the object.
+
+=cut
+sub sock {
+    my PublicInbox::DS $self = shift;
+    return $self->{sock};
+}
+
+=head2 C<< $obj->set_writer_func( CODEREF ) >>
+
+Sets a function to use instead of C<syswrite()> when writing data to the socket.
+
+=cut
+sub set_writer_func {
+   my PublicInbox::DS $self = shift;
+   my $wtr = shift;
+   Carp::croak("Not a subref") unless !defined $wtr || UNIVERSAL::isa($wtr, "CODE");
+   $self->{writer_func} = $wtr;
+}
+
+=head2 C<< $obj->write( $data ) >>
+
+Write the specified data to the underlying handle.  I<data> may be scalar,
+scalar ref, code ref (run once it reaches the head of the write queue), or
+undef just to kick-start.  Returns 1 if all writes went through, or 0 if there
+are writes in queue.  If it returns 1, the caller should stop waiting for 'writable' events.
+
+=cut
+sub write {
+    my PublicInbox::DS $self;
+    my $data;
+    ($self, $data) = @_;
+
+    # nobody should be writing to closed sockets, but caller code can
+    # do two writes within an event, have the first fail and
+    # disconnect the other side (whose destructor then closes the
+    # calling object, but it's still in a method), and then the
+    # now-dead object does its second write.  that is this case.  we
+    # just lie and say it worked.  it'll be dead soon and won't be
+    # hurt by this lie.
+    return 1 if $self->{closed};
+
+    my $bref;
+
+    # just queue data if there's already a wait
+    my $need_queue;
+
+    if (defined $data) {
+        $bref = ref $data ? $data : \$data;
+        if ($self->{write_buf_size}) {
+            push @{$self->{write_buf}}, $bref;
+            $self->{write_buf_size} += ref $bref eq "SCALAR" ? length($$bref) : 1;
+            return 0;
+        }
+
+        # this flag says we're bypassing the queue system, knowing we're the
+        # only outstanding write, and hoping we don't ever need to use it.
+        # if so later, though, we'll need to queue
+        $need_queue = 1;
+    }
+
+  WRITE:
+    while (1) {
+        return 1 unless $bref ||= $self->{write_buf}[0];
+
+        my $len;
+        eval {
+            $len = length($$bref); # this will die if $bref is a code ref, caught below
+        };
+        if ($@) {
+            if (UNIVERSAL::isa($bref, "CODE")) {
+                unless ($need_queue) {
+                    $self->{write_buf_size}--; # code refs are worth 1
+                    shift @{$self->{write_buf}};
+                }
+                $bref->();
+
+                # code refs are just run and never get reenqueued
+                # (they're one-shot), so turn off the flag indicating the
+                # outstanding data needs queueing.
+                $need_queue = 0;
+
+                undef $bref;
+                next WRITE;
+            }
+            die "Write error: $@ <$bref>";
+        }
+
+        my $to_write = $len - $self->{write_buf_offset};
+        my $written;
+        if (my $wtr = $self->{writer_func}) {
+            $written = $wtr->($bref, $to_write, $self->{write_buf_offset});
+        } else {
+            $written = syswrite($self->{sock}, $$bref, $to_write, $self->{write_buf_offset});
+        }
+
+        if (! defined $written) {
+            if ($! == EPIPE) {
+                return $self->close("EPIPE");
+            } elsif ($! == EAGAIN) {
+                # since connection has stuff to write, it should now be
+                # interested in pending writes:
+                if ($need_queue) {
+                    push @{$self->{write_buf}}, $bref;
+                    $self->{write_buf_size} += $len;
+                }
+                $self->{write_set_watch} = 1 unless $self->{event_watch} & POLLOUT;
+                $self->watch_write(1);
+                return 0;
+            } elsif ($! == ECONNRESET) {
+                return $self->close("ECONNRESET");
+            }
+
+            DebugLevel >= 1 && $self->debugmsg("Closing connection ($self) due to write error: $!\n");
+
+            return $self->close("write_error");
+        } elsif ($written != $to_write) {
+            DebugLevel >= 2 && $self->debugmsg("Wrote PARTIAL %d bytes to %d",
+                                               $written, $self->{fd});
+            if ($need_queue) {
+                push @{$self->{write_buf}}, $bref;
+                $self->{write_buf_size} += $len;
+            }
+            # since connection has stuff to write, it should now be
+            # interested in pending writes:
+            $self->{write_buf_offset} += $written;
+            $self->{write_buf_size} -= $written;
+            $self->on_incomplete_write;
+            return 0;
+        } elsif ($written == $to_write) {
+            DebugLevel >= 2 && $self->debugmsg("Wrote ALL %d bytes to %d (nq=%d)",
+                                               $written, $self->{fd}, $need_queue);
+            $self->{write_buf_offset} = 0;
+
+            if ($self->{write_set_watch}) {
+                $self->watch_write(0);
+                $self->{write_set_watch} = 0;
+            }
+
+            # this was our only write, so we can return immediately
+            # since we avoided incrementing the buffer size or
+            # putting it in the buffer.  we also know there
+            # can't be anything else to write.
+            return 1 if $need_queue;
+
+            $self->{write_buf_size} -= $written;
+            shift @{$self->{write_buf}};
+            undef $bref;
+            next WRITE;
+        }
+    }
+}
+
+sub on_incomplete_write {
+    my PublicInbox::DS $self = shift;
+    $self->{write_set_watch} = 1 unless $self->{event_watch} & POLLOUT;
+    $self->watch_write(1);
+}
+
+=head2 C<< $obj->push_back_read( $buf ) >>
+
+Push back I<buf> (a scalar or scalarref) into the read stream. Useful if you read
+more than you need to and want to return this data on the next "read".
+
+=cut
+sub push_back_read {
+    my PublicInbox::DS $self = shift;
+    my $buf = shift;
+    push @{$self->{read_push_back}}, ref $buf ? $buf : \$buf;
+    $PushBackSet{$self->{fd}} = $self;
+}
+
+=head2 C<< $obj->read( $bytecount ) >>
+
+Read at most I<bytecount> bytes from the underlying handle; returns scalar
+ref on read, or undef on connection closed.
+
+=cut
+sub read {
+    my PublicInbox::DS $self = shift;
+    return if $self->{closed};
+    my $bytes = shift;
+    my $buf;
+    my $sock = $self->{sock};
+
+    if (@{$self->{read_push_back}}) {
+        $buf = shift @{$self->{read_push_back}};
+        my $len = length($$buf);
+
+        if ($len <= $bytes) {
+            delete $PushBackSet{$self->{fd}} unless @{$self->{read_push_back}};
+            return $buf;
+        } else {
+            # if the pushed back read is too big, we have to split it
+            my $overflow = substr($$buf, $bytes);
+            $buf = substr($$buf, 0, $bytes);
+            unshift @{$self->{read_push_back}}, \$overflow;
+            return \$buf;
+        }
+    }
+
+    # if this is too high, perl quits(!!).  reports on mailing lists
+    # don't seem to point to a universal answer.  5MB worked for some,
+    # crashed for others.  1MB works for more people.  let's go with 1MB
+    # for now.  :/
+    my $req_bytes = $bytes > 1048576 ? 1048576 : $bytes;
+
+    my $res = sysread($sock, $buf, $req_bytes, 0);
+    DebugLevel >= 2 && $self->debugmsg("sysread = %d; \$! = %d", $res, $!);
+
+    if (! $res && $! != EWOULDBLOCK) {
+        # catches 0=conn closed or undef=error
+        DebugLevel >= 2 && $self->debugmsg("Fd \#%d read hit the end of the road.", $self->{fd});
+        return undef;
+    }
+
+    return \$buf;
+}
+
+=head2 (VIRTUAL) C<< $obj->event_read() >>
+
+Readable event handler. Concrete derivatives of PublicInbox::DS should
+provide an implementation of this. The default implementation will die if
+called.
+
+=cut
+sub event_read  { die "Base class event_read called for $_[0]\n"; }
+
+=head2 (VIRTUAL) C<< $obj->event_err() >>
+
+Error event handler. Concrete derivatives of PublicInbox::DS should
+provide an implementation of this. The default implementation will die if
+called.
+
+=cut
+sub event_err   { die "Base class event_err called for $_[0]\n"; }
+
+=head2 (VIRTUAL) C<< $obj->event_hup() >>
+
+'Hangup' event handler. Concrete derivatives of PublicInbox::DS should
+provide an implementation of this. The default implementation will die if
+called.
+
+=cut
+sub event_hup   { die "Base class event_hup called for $_[0]\n"; }
+
+=head2 C<< $obj->event_write() >>
+
+Writable event handler. Concrete derivatives of PublicInbox::DS may wish to
+provide an implementation of this. The default implementation calls
+C<write()> with an C<undef>.
+
+=cut
+sub event_write {
+    my $self = shift;
+    $self->write(undef);
+}
+
+=head2 C<< $obj->watch_read( $boolean ) >>
+
+Turn 'readable' event notification on or off.
+
+=cut
+sub watch_read {
+    my PublicInbox::DS $self = shift;
+    return if $self->{closed} || !$self->{sock};
+
+    my $val = shift;
+    my $event = $self->{event_watch};
+
+    $event &= ~POLLIN if ! $val;
+    $event |=  POLLIN if   $val;
+
+    # If it changed, set it
+    if ($event != $self->{event_watch}) {
+        if ($HaveKQueue) {
+            $KQueue->EV_SET($self->{fd}, IO::KQueue::EVFILT_READ(),
+                            $val ? IO::KQueue::EV_ENABLE() : IO::KQueue::EV_DISABLE());
+        }
+        elsif ($HaveEpoll) {
+            epoll_ctl($Epoll, EPOLL_CTL_MOD, $self->{fd}, $event)
+                and $self->dump_error("couldn't modify epoll settings for $self->{fd} " .
+                                      "from $self->{event_watch} -> $event: $! (" . ($!+0) . ")");
+        }
+        $self->{event_watch} = $event;
+    }
+}
+
+=head2 C<< $obj->watch_write( $boolean ) >>
+
+Turn 'writable' event notification on or off.
+
+=cut
+sub watch_write {
+    my PublicInbox::DS $self = shift;
+    return if $self->{closed} || !$self->{sock};
+
+    my $val = shift;
+    my $event = $self->{event_watch};
+
+    $event &= ~POLLOUT if ! $val;
+    $event |=  POLLOUT if   $val;
+
+    if ($val && caller ne __PACKAGE__) {
+        # A subclass registered interest, it's now responsible for this.
+        $self->{write_set_watch} = 0;
+    }
+
+    # If it changed, set it
+    if ($event != $self->{event_watch}) {
+        if ($HaveKQueue) {
+            $KQueue->EV_SET($self->{fd}, IO::KQueue::EVFILT_WRITE(),
+                            $val ? IO::KQueue::EV_ENABLE() : IO::KQueue::EV_DISABLE());
+        }
+        elsif ($HaveEpoll) {
+            epoll_ctl($Epoll, EPOLL_CTL_MOD, $self->{fd}, $event)
+                and $self->dump_error("couldn't modify epoll settings for $self->{fd} " .
+                                      "from $self->{event_watch} -> $event: $! (" . ($!+0) . ")");
+        }
+        $self->{event_watch} = $event;
+    }
+}
+
+=head2 C<< $obj->dump_error( $message ) >>
+
+Prints to STDERR a backtrace with information about this socket and what led
+up to the dump_error call.
+
+=cut
+sub dump_error {
+    my $i = 0;
+    my @list;
+    while (my ($file, $line, $sub) = (caller($i++))[1..3]) {
+        push @list, "\t$file:$line called $sub\n";
+    }
+
+    warn "ERROR: $_[1]\n" .
+        "\t$_[0] = " . $_[0]->as_string . "\n" .
+        join('', @list);
+}
+
+=head2 C<< $obj->debugmsg( $format, @args ) >>
+
+Print the debugging message specified by the C<sprintf>-style I<format> and
+I<args>.
+
+=cut
+sub debugmsg {
+    my ( $self, $fmt, @args ) = @_;
+    confess "Not an object" unless ref $self;
+
+    chomp $fmt;
+    printf STDERR ">>> $fmt\n", @args;
+}
+
+
+=head2 C<< $obj->peer_ip_string() >>
+
+Returns the string describing the peer's IP
+
+=cut
+sub peer_ip_string {
+    my PublicInbox::DS $self = shift;
+    return _undef("peer_ip_string undef: no sock") unless $self->{sock};
+    return $self->{peer_ip} if defined $self->{peer_ip};
+
+    my $pn = getpeername($self->{sock});
+    return _undef("peer_ip_string undef: getpeername") unless $pn;
+
+    my ($port, $iaddr) = eval {
+        if (length($pn) >= 28) {
+            return Socket6::unpack_sockaddr_in6($pn);
+        } else {
+            return Socket::sockaddr_in($pn);
+        }
+    };
+
+    if ($@) {
+        $self->{peer_port} = "[Unknown peerport '$@']";
+        return "[Unknown peername '$@']";
+    }
+
+    $self->{peer_port} = $port;
+
+    if (length($iaddr) == 4) {
+        return $self->{peer_ip} = Socket::inet_ntoa($iaddr);
+    } else {
+        $self->{peer_v6} = 1;
+        return $self->{peer_ip} = Socket6::inet_ntop(Socket6::AF_INET6(),
+                                                     $iaddr);
+    }
+}
+
+=head2 C<< $obj->peer_addr_string() >>
+
+Returns the string describing the peer for the socket which underlies this
+object in form "ip:port"
+
+=cut
+sub peer_addr_string {
+    my PublicInbox::DS $self = shift;
+    my $ip = $self->peer_ip_string
+        or return undef;
+    return $self->{peer_v6} ?
+        "[$ip]:$self->{peer_port}" :
+        "$ip:$self->{peer_port}";
+}
+
+=head2 C<< $obj->local_ip_string() >>
+
+Returns the string describing the local IP
+
+=cut
+sub local_ip_string {
+    my PublicInbox::DS $self = shift;
+    return _undef("local_ip_string undef: no sock") unless $self->{sock};
+    return $self->{local_ip} if defined $self->{local_ip};
+
+    my $pn = getsockname($self->{sock});
+    return _undef("local_ip_string undef: getsockname") unless $pn;
+
+    my ($port, $iaddr) = Socket::sockaddr_in($pn);
+    $self->{local_port} = $port;
+
+    return $self->{local_ip} = Socket::inet_ntoa($iaddr);
+}
+
+=head2 C<< $obj->local_addr_string() >>
+
+Returns the string describing the local end of the socket which underlies this
+object in form "ip:port"
+
+=cut
+sub local_addr_string {
+    my PublicInbox::DS $self = shift;
+    my $ip = $self->local_ip_string;
+    return $ip ? "$ip:$self->{local_port}" : undef;
+}
+
+
+=head2 C<< $obj->as_string() >>
+
+Returns a string describing this socket.
+
+=cut
+sub as_string {
+    my PublicInbox::DS $self = shift;
+    my $rw = "(" . ($self->{event_watch} & POLLIN ? 'R' : '') .
+                   ($self->{event_watch} & POLLOUT ? 'W' : '') . ")";
+    my $ret = ref($self) . "$rw: " . ($self->{closed} ? "closed" : "open");
+    my $peer = $self->peer_addr_string;
+    if ($peer) {
+        $ret .= " to " . $self->peer_addr_string;
+    }
+    return $ret;
+}
+
+sub _undef {
+    return undef unless $ENV{DS_DEBUG};
+    my $msg = shift || "";
+    warn "PublicInbox::DS: $msg\n";
+    return undef;
+}
+
+package PublicInbox::DS::Timer;
+# [$abs_float_firetime, $coderef];
+sub cancel {
+    $_[0][1] = undef;
+}
+
+1;
+
+=head1 AUTHORS (Danga::Socket)
+
+Brad Fitzpatrick <brad@danga.com> - author
+
+Michael Granger <ged@danga.com> - docs, testing
+
+Mark Smith <junior@danga.com> - contributor, heavy user, testing
+
+Matt Sergeant <matt@sergeant.org> - kqueue support, docs, timers, other bits
diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index 48051f4..68ba987 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -12,7 +12,7 @@ use Cwd qw/abs_path/;
 use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);
 STDOUT->autoflush(1);
 STDERR->autoflush(1);
-require Danga::Socket;
+require PublicInbox::DS;
 require POSIX;
 require PublicInbox::Listener;
 require PublicInbox::ParentPipe;
@@ -172,14 +172,14 @@ sub worker_quit {
 	# killing again terminates immediately:
 	exit unless @listeners;
 
-	$_->close foreach @listeners; # call Danga::Socket::close
+	$_->close foreach @listeners; # call PublicInbox::DS::close
 	@listeners = ();
 	$reason->close if ref($reason) eq 'PublicInbox::ParentPipe';
 
 	my $proc_name;
 	my $warn = 0;
 	# drop idle connections and try to quit gracefully
-	Danga::Socket->SetPostLoopCallback(sub {
+	PublicInbox::DS->SetPostLoopCallback(sub {
 		my ($dmap, undef) = @_;
 		my $n = 0;
 		my $now = clock_gettime(CLOCK_MONOTONIC);
@@ -486,7 +486,7 @@ sub daemon_loop ($$) {
 		PublicInbox::Listener->new($_, $post_accept)
 	} @listeners;
 	PublicInbox::EvCleanup::enable();
-	Danga::Socket->EventLoop;
+	PublicInbox::DS->EventLoop;
 	$parent_pipe = undef;
 }
 
diff --git a/lib/PublicInbox/EvCleanup.pm b/lib/PublicInbox/EvCleanup.pm
index 1a2bdb2..b2f8c08 100644
--- a/lib/PublicInbox/EvCleanup.pm
+++ b/lib/PublicInbox/EvCleanup.pm
@@ -1,11 +1,11 @@
 # Copyright (C) 2016-2018 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
-# event cleanups (currently for Danga::Socket)
+# event cleanups (currently for PublicInbox::DS)
 package PublicInbox::EvCleanup;
 use strict;
 use warnings;
-use base qw(Danga::Socket);
+use base qw(PublicInbox::DS);
 use fields qw(rd);
 
 my $ENABLED;
@@ -38,7 +38,7 @@ sub _run_all ($) {
 	$_->() foreach @$run;
 }
 
-# ensure Danga::Socket::ToClose fires after timers fire
+# ensure PublicInbox::DS::ToClose fires after timers fire
 sub _asap_close () { $asapq->[1] ||= _asap_timer() }
 
 sub _run_asap () { _run_all($asapq) }
@@ -52,7 +52,7 @@ sub _run_later () {
 	_asap_close();
 }
 
-# Called by Danga::Socket
+# Called by PublicInbox::DS
 sub event_write {
 	my ($self) = @_;
 	$self->watch_write(0);
@@ -74,13 +74,13 @@ sub asap ($) {
 sub next_tick ($) {
 	my ($cb) = @_;
 	push @{$nextq->[0]}, $cb;
-	$nextq->[1] ||= Danga::Socket->AddTimer(0, *_run_next);
+	$nextq->[1] ||= PublicInbox::DS->AddTimer(0, *_run_next);
 }
 
 sub later ($) {
 	my ($cb) = @_;
 	push @{$laterq->[0]}, $cb;
-	$laterq->[1] ||= Danga::Socket->AddTimer(60, *_run_later);
+	$laterq->[1] ||= PublicInbox::DS->AddTimer(60, *_run_later);
 }
 
 END {
diff --git a/lib/PublicInbox/GitHTTPBackend.pm b/lib/PublicInbox/GitHTTPBackend.pm
index 57944a0..0941104 100644
--- a/lib/PublicInbox/GitHTTPBackend.pm
+++ b/lib/PublicInbox/GitHTTPBackend.pm
@@ -67,7 +67,7 @@ sub err ($@) {
 
 sub drop_client ($) {
 	if (my $io = $_[0]->{'psgix.io'}) {
-		$io->close; # this is Danga::Socket::close
+		$io->close; # this is PublicInbox::DS::close
 	}
 }
 
diff --git a/lib/PublicInbox/HTTP.pm b/lib/PublicInbox/HTTP.pm
index e73bd81..11bd241 100644
--- a/lib/PublicInbox/HTTP.pm
+++ b/lib/PublicInbox/HTTP.pm
@@ -10,7 +10,7 @@
 package PublicInbox::HTTP;
 use strict;
 use warnings;
-use base qw(Danga::Socket);
+use base qw(PublicInbox::DS);
 use fields qw(httpd env rbuf input_left remote_addr remote_port forward pull);
 use bytes (); # only for bytes::length
 use Fcntl qw(:seek);
@@ -63,7 +63,7 @@ sub new ($$$) {
 	$self;
 }
 
-sub event_read { # called by Danga::Socket
+sub event_read { # called by PublicInbox::DS
 	my ($self) = @_;
 
 	return event_read_input($self) if defined $self->{env};
@@ -148,7 +148,7 @@ sub app_dispatch {
 		sysseek($input, 0, SEEK_SET) or
 			die "BUG: psgi.input seek failed: $!";
 	}
-	# note: NOT $self->{sock}, we want our close (+ Danga::Socket::close),
+	# note: NOT $self->{sock}, we want our close (+ PublicInbox::DS::close),
 	# to do proper cleanup:
 	$env->{'psgix.io'} = $self; # only for ->close
 	my $res = Plack::Util::run_app($self->{httpd}->{app}, $env);
@@ -256,7 +256,7 @@ sub getline_cb ($$$) {
 	if ($forward) {
 		my $buf = eval { $forward->getline };
 		if (defined $buf) {
-			$write->($buf); # may close in Danga::Socket::write
+			$write->($buf); # may close in PublicInbox::DS::write
 			unless ($self->{closed}) {
 				my $next = $self->{pull};
 				if ($self->{write_buf_size}) {
@@ -320,7 +320,7 @@ sub more ($$) {
 			my $nlen = length($_[1]) - $n;
 			return 1 if $nlen == 0; # all done!
 
-			# Danga::Socket::write queues the unwritten substring:
+			# PublicInbox::DS::write queues the unwritten substring:
 			return $self->write(substr($_[1], $n, $nlen));
 		}
 	}
@@ -465,7 +465,7 @@ sub quit {
 	$self->close;
 }
 
-# callbacks for Danga::Socket
+# callbacks for PublicInbox::DS
 
 sub event_hup { $_[0]->close }
 sub event_err { $_[0]->close }
diff --git a/lib/PublicInbox/HTTPD/Async.pm b/lib/PublicInbox/HTTPD/Async.pm
index a647f10..dbe8a84 100644
--- a/lib/PublicInbox/HTTPD/Async.pm
+++ b/lib/PublicInbox/HTTPD/Async.pm
@@ -8,7 +8,7 @@
 package PublicInbox::HTTPD::Async;
 use strict;
 use warnings;
-use base qw(Danga::Socket);
+use base qw(PublicInbox::DS);
 use fields qw(cb cleanup);
 require PublicInbox::EvCleanup;
 
@@ -45,7 +45,7 @@ sub main_cb ($$$) {
 		my $r = sysread($self->{sock}, $$bref, 8192);
 		if ($r) {
 			$fh->write($$bref);
-			unless ($http->{closed}) { # Danga::Socket sets this
+			unless ($http->{closed}) { # PublicInbox::DS sets this
 				if ($http->{write_buf_size}) {
 					$self->watch_read(0);
 					$http->write(restart_read_cb($self));
diff --git a/lib/PublicInbox/Listener.pm b/lib/PublicInbox/Listener.pm
index 52894cb..d1f0d2e 100644
--- a/lib/PublicInbox/Listener.pm
+++ b/lib/PublicInbox/Listener.pm
@@ -5,7 +5,7 @@
 package PublicInbox::Listener;
 use strict;
 use warnings;
-use base 'Danga::Socket';
+use base 'PublicInbox::DS';
 use Socket qw(SOL_SOCKET SO_KEEPALIVE IPPROTO_TCP TCP_NODELAY);
 use fields qw(post_accept);
 require IO::Handle;
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 13591e5..f756e92 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -5,7 +5,7 @@
 package PublicInbox::NNTP;
 use strict;
 use warnings;
-use base qw(Danga::Socket);
+use base qw(PublicInbox::DS);
 use fields qw(nntpd article rbuf ng long_res);
 use PublicInbox::Search;
 use PublicInbox::Msgmap;
@@ -936,7 +936,7 @@ sub do_more ($$) {
 	do_write($self, $data);
 }
 
-# callbacks for Danga::Socket
+# callbacks for PublicInbox::DS
 
 sub event_hup { $_[0]->close }
 sub event_err { $_[0]->close }
@@ -989,7 +989,7 @@ sub check_read {
 	} else {
 		# no pipelined requests available, let the kernel know
 		# to wake us up if there's more
-		$self->watch_read(1); # Danga::Socket::watch_read
+		$self->watch_read(1); # PublicInbox::DS::watch_read
 	}
 }
 
diff --git a/lib/PublicInbox/ParentPipe.pm b/lib/PublicInbox/ParentPipe.pm
index 4f7ee15..25f13a8 100644
--- a/lib/PublicInbox/ParentPipe.pm
+++ b/lib/PublicInbox/ParentPipe.pm
@@ -4,7 +4,7 @@
 package PublicInbox::ParentPipe;
 use strict;
 use warnings;
-use base qw(Danga::Socket);
+use base qw(PublicInbox::DS);
 use fields qw(cb);
 
 sub new ($$$) {
diff --git a/lib/PublicInbox/Qspawn.pm b/lib/PublicInbox/Qspawn.pm
index 79cdae7..9aede10 100644
--- a/lib/PublicInbox/Qspawn.pm
+++ b/lib/PublicInbox/Qspawn.pm
@@ -12,9 +12,9 @@
 # operate in.  This can be useful to ensure smaller inboxes can
 # be cloned while cloning of large inboxes is maxed out.
 #
-# This does not depend on Danga::Socket or any other external
+# This does not depend on PublicInbox::DS or any other external
 # scheduling mechanism, you just need to call start() and finish()
-# appropriately. However, public-inbox-httpd (which uses Danga::Socket)
+# appropriately. However, public-inbox-httpd (which uses PublicInbox::DS)
 # will be able to schedule this based on readability of stdout from
 # the spawned process.  See GitHTTPBackend.pm and SolverGit.pm for
 # usage examples.  It does not depend on any form of threading.
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
new file mode 100644
index 0000000..cf70045
--- /dev/null
+++ b/lib/PublicInbox/Syscall.pm
@@ -0,0 +1,326 @@
+# This is a fork of the (for now) unmaintained Sys::Syscall 0.25,
+# specifically the Debian libsys-syscall-perl 0.25-6 version to
+# fix upstream regressions in 0.25.
+#
+# This license differs from the rest of public-inbox
+#
+# This module is Copyright (c) 2005 Six Apart, Ltd.
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+#
+# All rights reserved.
+#
+# You may distribute under the terms of either the GNU General Public
+# License or the Artistic License, as specified in the Perl README file.
+package PublicInbox::Syscall;
+use strict;
+use POSIX qw(ENOSYS SEEK_CUR);
+use Config;
+
+require Exporter;
+use vars qw(@ISA @EXPORT_OK %EXPORT_TAGS $VERSION);
+
+$VERSION     = "0.25";
+@ISA         = qw(Exporter);
+@EXPORT_OK   = qw(sendfile epoll_ctl epoll_create epoll_wait
+                  EPOLLIN EPOLLOUT EPOLLERR EPOLLHUP EPOLLRDBAND
+                  EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD);
+%EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
+                             EPOLLIN EPOLLOUT EPOLLERR EPOLLHUP EPOLLRDBAND
+                             EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD)],
+                sendfile => [qw(sendfile)],
+                );
+
+use constant EPOLLIN       => 1;
+use constant EPOLLOUT      => 4;
+use constant EPOLLERR      => 8;
+use constant EPOLLHUP      => 16;
+use constant EPOLLRDBAND   => 128;
+use constant EPOLL_CTL_ADD => 1;
+use constant EPOLL_CTL_DEL => 2;
+use constant EPOLL_CTL_MOD => 3;
+
+our $loaded_syscall = 0;
+
+sub _load_syscall {
+    # props to Gaal for this!
+    return if $loaded_syscall++;
+    my $clean = sub {
+        delete @INC{qw<syscall.ph asm/unistd.ph bits/syscall.ph
+                        _h2ph_pre.ph sys/syscall.ph>};
+    };
+    $clean->(); # don't trust modules before us
+    my $rv = eval { require 'syscall.ph'; 1 } || eval { require 'sys/syscall.ph'; 1 };
+    $clean->(); # don't require modules after us to trust us
+    return $rv;
+}
+
+our ($sysname, $nodename, $release, $version, $machine) = POSIX::uname();
+
+our (
+     $SYS_epoll_create,
+     $SYS_epoll_ctl,
+     $SYS_epoll_wait,
+     $SYS_sendfile,
+     $SYS_readahead,
+     );
+
+our $no_deprecated = 0;
+
+if ($^O eq "linux") {
+    # whether the machine requires 64-bit numbers to be on 8-byte
+    # boundaries.
+    my $u64_mod_8 = 0;
+
+    # if we're running on an x86_64 kernel, but a 32-bit process,
+    # we need to use the i386 syscall numbers.
+    if ($machine eq "x86_64" && $Config{ptrsize} == 4) {
+        $machine = "i386";
+    }
+
+    # Similarly for mips64 vs mips
+    if ($machine eq "mips64" && $Config{ptrsize} == 4) {
+        $machine = "mips";
+    }
+
+    if ($machine =~ m/^i[3456]86$/) {
+        $SYS_epoll_create = 254;
+        $SYS_epoll_ctl    = 255;
+        $SYS_epoll_wait   = 256;
+        $SYS_sendfile     = 187;  # or 64: 239
+        $SYS_readahead    = 225;
+    } elsif ($machine eq "x86_64") {
+        $SYS_epoll_create = 213;
+        $SYS_epoll_ctl    = 233;
+        $SYS_epoll_wait   = 232;
+        $SYS_sendfile     =  40;
+        $SYS_readahead    = 187;
+    } elsif ($machine =~ m/^parisc/) {
+        $SYS_epoll_create = 224;
+        $SYS_epoll_ctl    = 225;
+        $SYS_epoll_wait   = 226;
+        $SYS_sendfile     = 122;  # sys_sendfile64=209
+        $SYS_readahead    = 207;
+        $u64_mod_8        = 1;
+    } elsif ($machine =~ m/^ppc64/) {
+        $SYS_epoll_create = 236;
+        $SYS_epoll_ctl    = 237;
+        $SYS_epoll_wait   = 238;
+        $SYS_sendfile     = 186;  # (sys32_sendfile).  sys32_sendfile64=226  (64 bit processes: sys_sendfile64=186)
+        $SYS_readahead    = 191;  # both 32-bit and 64-bit versions
+        $u64_mod_8        = 1;
+    } elsif ($machine eq "ppc") {
+        $SYS_epoll_create = 236;
+        $SYS_epoll_ctl    = 237;
+        $SYS_epoll_wait   = 238;
+        $SYS_sendfile     = 186;  # sys_sendfile64=226
+        $SYS_readahead    = 191;
+        $u64_mod_8        = 1;
+    } elsif ($machine =~ m/^s390/) {
+        $SYS_epoll_create = 249;
+        $SYS_epoll_ctl    = 250;
+        $SYS_epoll_wait   = 251;
+        $SYS_sendfile     = 187;  # sys_sendfile64=223
+        $SYS_readahead    = 222;
+        $u64_mod_8        = 1;
+    } elsif ($machine eq "ia64") {
+        $SYS_epoll_create = 1243;
+        $SYS_epoll_ctl    = 1244;
+        $SYS_epoll_wait   = 1245;
+        $SYS_sendfile     = 1187;
+        $SYS_readahead    = 1216;
+        $u64_mod_8        = 1;
+    } elsif ($machine eq "alpha") {
+        # natural alignment, ints are 32-bits
+        $SYS_sendfile     = 370;  # (sys_sendfile64)
+        $SYS_epoll_create = 407;
+        $SYS_epoll_ctl    = 408;
+        $SYS_epoll_wait   = 409;
+        $SYS_readahead    = 379;
+        $u64_mod_8        = 1;
+    } elsif ($machine eq "aarch64") {
+        $SYS_epoll_create = 20;  # (sys_epoll_create1)
+        $SYS_epoll_ctl    = 21;
+        $SYS_epoll_wait   = 22;  # (sys_epoll_pwait)
+        $SYS_sendfile     = 71;  # (sys_sendfile64)
+        $SYS_readahead    = 213;
+        $u64_mod_8        = 1;
+        $no_deprecated    = 1;
+    } elsif ($machine =~ m/arm(v\d+)?.*l/) {
+        # ARM OABI
+        $SYS_epoll_create = 250;
+        $SYS_epoll_ctl    = 251;
+        $SYS_epoll_wait   = 252;
+        $SYS_sendfile     = 187;
+        $SYS_readahead    = 225;
+        $u64_mod_8        = 1;
+    } elsif ($machine =~ m/^mips64/) {
+        $SYS_sendfile     = 5039;
+        $SYS_epoll_create = 5207;
+        $SYS_epoll_ctl    = 5208;
+        $SYS_epoll_wait   = 5209;
+        $SYS_readahead    = 5179;
+        $u64_mod_8        = 1;
+    } elsif ($machine =~ m/^mips/) {
+        $SYS_sendfile     = 4207;
+        $SYS_epoll_create = 4248;
+        $SYS_epoll_ctl    = 4249;
+        $SYS_epoll_wait   = 4250;
+        $SYS_readahead    = 4223;
+        $u64_mod_8        = 1;
+    } else {
+        # as a last resort, try using the *.ph files which may not
+        # exist or may be wrong
+        _load_syscall();
+        $SYS_epoll_create = eval { &SYS_epoll_create; } || 0;
+        $SYS_epoll_ctl    = eval { &SYS_epoll_ctl;    } || 0;
+        $SYS_epoll_wait   = eval { &SYS_epoll_wait;   } || 0;
+        $SYS_readahead    = eval { &SYS_readahead;    } || 0;
+    }
+
+    if ($u64_mod_8) {
+        *epoll_wait = \&epoll_wait_mod8;
+        *epoll_ctl = \&epoll_ctl_mod8;
+    } else {
+        *epoll_wait = \&epoll_wait_mod4;
+        *epoll_ctl = \&epoll_ctl_mod4;
+    }
+}
+
+elsif ($^O eq "freebsd") {
+    if ($ENV{FREEBSD_SENDFILE}) {
+        # this is still buggy and in development
+        $SYS_sendfile = 393;  # old is 336
+    }
+}
+
+############################################################################
+# sendfile functions
+############################################################################
+
+unless ($SYS_sendfile) {
+    _load_syscall();
+    $SYS_sendfile = eval { &SYS_sendfile; } || 0;
+}
+
+sub sendfile_defined { return $SYS_sendfile ? 1 : 0; }
+
+if ($^O eq "linux" && $SYS_sendfile) {
+    *sendfile = \&sendfile_linux;
+} elsif ($^O eq "freebsd" && $SYS_sendfile) {
+    *sendfile = \&sendfile_freebsd;
+} else {
+    *sendfile = \&sendfile_noimpl;
+}
+
+sub sendfile_noimpl {
+    $! = ENOSYS;
+    return -1;
+}
+
+# C: ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
+# Perl:  sendfile($write_fd, $read_fd, $max_count) --> $actually_sent
+sub sendfile_linux {
+    return syscall(
+                   $SYS_sendfile,
+                   $_[0] + 0,  # fd
+                   $_[1] + 0,  # fd
+                   0,          # don't keep track of offset.  callers can lseek and keep track.
+                   $_[2] + 0   # count
+                   );
+}
+
+sub sendfile_freebsd {
+    my $offset = POSIX::lseek($_[1]+0, 0, SEEK_CUR) + 0;
+    my $ct = $_[2] + 0;
+    my $sbytes_buf = "\0" x 8;
+    my $rv = syscall(
+                     $SYS_sendfile,
+                     $_[1] + 0,   # fd     (from)
+                     $_[0] + 0,   # socket (to)
+                     $offset,
+                     $ct,
+                     0,           # struct sf_hdtr *hdtr
+                     $sbytes_buf, # off_t *sbytes
+                     0);          # flags
+    return $rv if $rv < 0;
+
+
+    my $set = unpack("L", $sbytes_buf);
+    POSIX::lseek($_[1]+0, $set, SEEK_CUR);
+    return $set;
+}
+
+
+############################################################################
+# epoll functions
+############################################################################
+
+sub epoll_defined { return $SYS_epoll_create ? 1 : 0; }
+
+# ARGS: (size) -- but in modern Linux 2.6, the
+# size doesn't even matter (radix tree now, not hash)
+sub epoll_create {
+    return -1 unless defined $SYS_epoll_create;
+    my $epfd = eval { syscall($SYS_epoll_create, $no_deprecated ? 0 : ($_[0]||100)+0) };
+    return -1 if $@;
+    return $epfd;
+}
+
+# epoll_ctl wrapper
+# ARGS: (epfd, op, fd, events_mask)
+sub epoll_ctl_mod4 {
+    syscall($SYS_epoll_ctl, $_[0]+0, $_[1]+0, $_[2]+0, pack("LLL", $_[3], $_[2], 0));
+}
+sub epoll_ctl_mod8 {
+    syscall($SYS_epoll_ctl, $_[0]+0, $_[1]+0, $_[2]+0, pack("LLLL", $_[3], 0, $_[2], 0));
+}
+
+# epoll_wait wrapper
+# ARGS: (epfd, maxevents, timeout (milliseconds), arrayref)
+#  arrayref: values modified to be [$fd, $event]
+our $epoll_wait_events;
+our $epoll_wait_size = 0;
+sub epoll_wait_mod4 {
+    # resize our static buffer if requested size is bigger than we've ever done
+    if ($_[1] > $epoll_wait_size) {
+        $epoll_wait_size = $_[1];
+        $epoll_wait_events = "\0" x 12 x $epoll_wait_size;
+    }
+    my $ct = syscall($SYS_epoll_wait, $_[0]+0, $epoll_wait_events, $_[1]+0, $_[2]+0);
+    for (0..$ct-1) {
+        @{$_[3]->[$_]}[1,0] = unpack("LL", substr($epoll_wait_events, 12*$_, 8));
+    }
+    return $ct;
+}
+
+sub epoll_wait_mod8 {
+    # resize our static buffer if requested size is bigger than we've ever done
+    if ($_[1] > $epoll_wait_size) {
+        $epoll_wait_size = $_[1];
+        $epoll_wait_events = "\0" x 16 x $epoll_wait_size;
+    }
+    my $ct;
+    if ($no_deprecated) {
+        $ct = syscall($SYS_epoll_wait, $_[0]+0, $epoll_wait_events, $_[1]+0, $_[2]+0, undef);
+    } else {
+        $ct = syscall($SYS_epoll_wait, $_[0]+0, $epoll_wait_events, $_[1]+0, $_[2]+0);
+    }
+    for (0..$ct-1) {
+        # 16 byte epoll_event structs, with format:
+        #    4 byte mask [idx 1]
+        #    4 byte padding (we put it into idx 2, useless)
+        #    8 byte data (first 4 bytes are fd, into idx 0)
+        @{$_[3]->[$_]}[1,2,0] = unpack("LLL", substr($epoll_wait_events, 16*$_, 12));
+    }
+    return $ct;
+}
+
+1;
+
+=head1 WARRANTY
+
+This is free software. IT COMES WITHOUT WARRANTY OF ANY KIND.
+
+=head1 AUTHORS
+
+Brad Fitzpatrick <brad@danga.com>
diff --git a/t/git-http-backend.t b/t/git-http-backend.t
index 4e51f2b..b616e82 100644
--- a/t/git-http-backend.t
+++ b/t/git-http-backend.t
@@ -11,7 +11,7 @@ use Cwd qw(getcwd);
 
 my $git_dir = $ENV{GIANT_GIT_DIR};
 plan 'skip_all' => 'GIANT_GIT_DIR not defined' unless $git_dir;
-foreach my $mod (qw(Danga::Socket BSD::Resource
+foreach my $mod (qw(PublicInbox::DS BSD::Resource
 			Plack::Util Plack::Builder
 			HTTP::Date HTTP::Status Net::HTTP)) {
 	eval "require $mod";
diff --git a/t/httpd-corner.t b/t/httpd-corner.t
index aa0698d..49c5d1f 100644
--- a/t/httpd-corner.t
+++ b/t/httpd-corner.t
@@ -7,7 +7,7 @@ use warnings;
 use Test::More;
 use Time::HiRes qw(gettimeofday tv_interval);
 
-foreach my $mod (qw(Plack::Util Plack::Builder Danga::Socket
+foreach my $mod (qw(Plack::Util Plack::Builder PublicInbox::DS
 			HTTP::Date HTTP::Status IPC::Run)) {
 	eval "require $mod";
 	plan skip_all => "$mod missing for httpd-corner.t" if $@;
diff --git a/t/httpd-unix.t b/t/httpd-unix.t
index 0a93f20..627adfa 100644
--- a/t/httpd-unix.t
+++ b/t/httpd-unix.t
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use Test::More;
 
-foreach my $mod (qw(Plack::Util Plack::Builder Danga::Socket
+foreach my $mod (qw(Plack::Util Plack::Builder PublicInbox::DS
 			HTTP::Date HTTP::Status)) {
 	eval "require $mod";
 	plan skip_all => "$mod missing for httpd-unix.t" if $@;
diff --git a/t/httpd.t b/t/httpd.t
index 44df164..45cbcbf 100644
--- a/t/httpd.t
+++ b/t/httpd.t
@@ -4,7 +4,7 @@ use strict;
 use warnings;
 use Test::More;
 
-foreach my $mod (qw(Plack::Util Plack::Builder Danga::Socket
+foreach my $mod (qw(Plack::Util Plack::Builder PublicInbox::DS
 			HTTP::Date HTTP::Status)) {
 	eval "require $mod";
 	plan skip_all => "$mod missing for httpd.t" if $@;
diff --git a/t/nntp.t b/t/nntp.t
index 6df7db8..c39a05f 100644
--- a/t/nntp.t
+++ b/t/nntp.t
@@ -5,7 +5,7 @@ use warnings;
 use Test::More;
 use Data::Dumper;
 
-foreach my $mod (qw(DBD::SQLite Search::Xapian Danga::Socket)) {
+foreach my $mod (qw(DBD::SQLite Search::Xapian PublicInbox::DS)) {
 	eval "require $mod";
 	plan skip_all => "$mod missing for nntp.t" if $@;
 }
diff --git a/t/nntpd.t b/t/nntpd.t
index 6b13f81..ecfd74f 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-foreach my $mod (qw(DBD::SQLite Search::Xapian Danga::Socket)) {
+foreach my $mod (qw(DBD::SQLite Search::Xapian PublicInbox::DS)) {
 	eval "require $mod";
 	plan skip_all => "$mod missing for nntpd.t" if $@;
 }
diff --git a/t/v2mirror.t b/t/v2mirror.t
index ef9a540..eaf9e61 100644
--- a/t/v2mirror.t
+++ b/t/v2mirror.t
@@ -7,7 +7,7 @@ require './t/common.perl';
 require_git(2.6);
 
 # Integration tests for HTTP cloning + mirroring
-foreach my $mod (qw(Plack::Util Plack::Builder Danga::Socket
+foreach my $mod (qw(Plack::Util Plack::Builder PublicInbox::DS
 			HTTP::Date HTTP::Status Search::Xapian DBD::SQLite
 			IPC::Run)) {
 	eval "require $mod";
diff --git a/t/v2writable.t b/t/v2writable.t
index f171417..06b2251 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -134,8 +134,8 @@ SKIP: {
 	use Net::NNTP;
 	use IO::Socket;
 	use Socket qw(SO_KEEPALIVE IPPROTO_TCP TCP_NODELAY);
-	eval { require Danga::Socket };
-	skip "Danga::Socket missing $@", 2 if $@;
+	eval { require PublicInbox::DS };
+	skip "PublicInbox::DS missing $@", 2 if $@;
 	my $err = "$mainrepo/stderr.log";
 	my $out = "$mainrepo/stdout.log";
 	my %opts = (
-- 
EW


^ permalink raw reply related	[relevance 24%]
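
The epoll_ctl_mod4/epoll_ctl_mod8 and epoll_wait_mod4/epoll_wait_mod8
wrappers in the patch above differ only in how they lay out
struct epoll_event.  What follows is a minimal sketch of the two layouts
using pack/unpack alone, so it runs without epoll.  The helper names
(pack_mod4, pack_mod8, unpack_mod4, unpack_mod8) and the descriptor
number are hypothetical and exist only for illustration; the pack
templates and the [$fd, $events] ordering are copied from the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant EPOLLIN => 1; # value as used in Syscall.pm

# struct epoll_event is { u32 events; u64 data; }.
# "mod4": no padding between the members, 12 bytes total
#         (the i386 and x86_64 branches leave $u64_mod_8 unset).
# "mod8": the u64 is aligned to 8 bytes, so 4 bytes of padding
#         follow the mask, 16 bytes total.
sub pack_mod4 { pack("LLL",  $_[0], $_[1], 0) }     # events, fd, upper data
sub pack_mod8 { pack("LLLL", $_[0], 0, $_[1], 0) }  # events, pad, fd, upper data

# Decode one event the way the epoll_wait_* wrappers do: [$fd, $events]
sub unpack_mod4 {
    my ($ev, $fd) = unpack("LL", substr($_[0], 0, 8));
    [ $fd, $ev ];
}

sub unpack_mod8 {
    my ($ev, undef, $fd) = unpack("LLL", substr($_[0], 0, 12));
    [ $fd, $ev ];
}

my $fd = 7; # hypothetical descriptor number
my $m4 = pack_mod4(EPOLLIN, $fd);
my $m8 = pack_mod8(EPOLLIN, $fd);

printf "mod4: %d bytes, fd=%d events=%d\n", length($m4), @{ unpack_mod4($m4) };
printf "mod8: %d bytes, fd=%d events=%d\n", length($m8), @{ unpack_mod8($m8) };
```

The 12 and 16 in the epoll_wait buffers ("\0" x 12, "\0" x 16) are
these two struct sizes multiplied by the number of events requested.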

* [PATCH 2/4] syscall: drop readahead wrapper
  @ 2019-05-08 19:18 85%   ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2019-05-08 19:18 UTC (permalink / raw)
  To: meta

There is no backwards compatibility for us to worry about, and
fadvise is superior anyway.
---
 lib/PublicInbox/Syscall.pm | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 9194364..4ef64cc 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -64,7 +64,6 @@ our (
      $SYS_epoll_ctl,
      $SYS_epoll_wait,
      $SYS_sendfile,
-     $SYS_readahead,
      );
 
 our $no_deprecated = 0;
@@ -90,47 +89,40 @@ if ($^O eq "linux") {
         $SYS_epoll_ctl    = 255;
         $SYS_epoll_wait   = 256;
         $SYS_sendfile     = 187;  # or 64: 239
-        $SYS_readahead    = 225;
     } elsif ($machine eq "x86_64") {
         $SYS_epoll_create = 213;
         $SYS_epoll_ctl    = 233;
         $SYS_epoll_wait   = 232;
         $SYS_sendfile     =  40;
-        $SYS_readahead    = 187;
     } elsif ($machine =~ m/^parisc/) {
         $SYS_epoll_create = 224;
         $SYS_epoll_ctl    = 225;
         $SYS_epoll_wait   = 226;
         $SYS_sendfile     = 122;  # sys_sendfile64=209
-        $SYS_readahead    = 207;
         $u64_mod_8        = 1;
     } elsif ($machine =~ m/^ppc64/) {
         $SYS_epoll_create = 236;
         $SYS_epoll_ctl    = 237;
         $SYS_epoll_wait   = 238;
         $SYS_sendfile     = 186;  # (sys32_sendfile).  sys32_sendfile64=226  (64 bit processes: sys_sendfile64=186)
-        $SYS_readahead    = 191;  # both 32-bit and 64-bit versions
         $u64_mod_8        = 1;
     } elsif ($machine eq "ppc") {
         $SYS_epoll_create = 236;
         $SYS_epoll_ctl    = 237;
         $SYS_epoll_wait   = 238;
         $SYS_sendfile     = 186;  # sys_sendfile64=226
-        $SYS_readahead    = 191;
         $u64_mod_8        = 1;
     } elsif ($machine =~ m/^s390/) {
         $SYS_epoll_create = 249;
         $SYS_epoll_ctl    = 250;
         $SYS_epoll_wait   = 251;
         $SYS_sendfile     = 187;  # sys_sendfile64=223
-        $SYS_readahead    = 222;
         $u64_mod_8        = 1;
     } elsif ($machine eq "ia64") {
         $SYS_epoll_create = 1243;
         $SYS_epoll_ctl    = 1244;
         $SYS_epoll_wait   = 1245;
         $SYS_sendfile     = 1187;
-        $SYS_readahead    = 1216;
         $u64_mod_8        = 1;
     } elsif ($machine eq "alpha") {
         # natural alignment, ints are 32-bits
@@ -138,14 +130,12 @@ if ($^O eq "linux") {
         $SYS_epoll_create = 407;
         $SYS_epoll_ctl    = 408;
         $SYS_epoll_wait   = 409;
-        $SYS_readahead    = 379;
         $u64_mod_8        = 1;
     } elsif ($machine eq "aarch64") {
         $SYS_epoll_create = 20;  # (sys_epoll_create1)
         $SYS_epoll_ctl    = 21;
         $SYS_epoll_wait   = 22;  # (sys_epoll_pwait)
         $SYS_sendfile     = 71;  # (sys_sendfile64)
-        $SYS_readahead    = 213;
         $u64_mod_8        = 1;
         $no_deprecated    = 1;
     } elsif ($machine =~ m/arm(v\d+)?.*l/) {
@@ -154,21 +144,18 @@ if ($^O eq "linux") {
         $SYS_epoll_ctl    = 251;
         $SYS_epoll_wait   = 252;
         $SYS_sendfile     = 187;
-        $SYS_readahead    = 225;
         $u64_mod_8        = 1;
     } elsif ($machine =~ m/^mips64/) {
         $SYS_sendfile     = 5039;
         $SYS_epoll_create = 5207;
         $SYS_epoll_ctl    = 5208;
         $SYS_epoll_wait   = 5209;
-        $SYS_readahead    = 5179;
         $u64_mod_8        = 1;
     } elsif ($machine =~ m/^mips/) {
         $SYS_sendfile     = 4207;
         $SYS_epoll_create = 4248;
         $SYS_epoll_ctl    = 4249;
         $SYS_epoll_wait   = 4250;
-        $SYS_readahead    = 4223;
         $u64_mod_8        = 1;
     } else {
         # as a last resort, try using the *.ph files which may not
@@ -177,7 +164,6 @@ if ($^O eq "linux") {
         $SYS_epoll_create = eval { &SYS_epoll_create; } || 0;
         $SYS_epoll_ctl    = eval { &SYS_epoll_ctl;    } || 0;
         $SYS_epoll_wait   = eval { &SYS_epoll_wait;   } || 0;
-        $SYS_readahead    = eval { &SYS_readahead;    } || 0;
     }
 
     if ($u64_mod_8) {
-- 
EW


^ permalink raw reply related	[relevance 85%]

* [PATCH 15/57] syscall: get rid of unused EPOLL* constants
  @ 2019-06-24  2:52 91% ` Eric Wong
  2019-06-24  2:52 93% ` [PATCH 16/57] syscall: get rid of unnecessary uname local vars Eric Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 51+ results
From: Eric Wong @ 2019-06-24  2:52 UTC (permalink / raw)
  To: meta

EPOLLRDBAND is used for DECnet, and I'm pretty sure I won't be
updating any of our code to work with DECnet.

I've never found a use for EPOLLHUP or EPOLLERR either, so
comment those out for now and add comments for things I might
actually use: EPOLLET and EPOLLONESHOT.
---
 lib/PublicInbox/Syscall.pm | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 4ef64cc3..98110eaf 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -22,11 +22,11 @@ use vars qw(@ISA @EXPORT_OK %EXPORT_TAGS $VERSION);
 $VERSION     = "0.25";
 @ISA         = qw(Exporter);
 @EXPORT_OK   = qw(sendfile epoll_ctl epoll_create epoll_wait
-                  EPOLLIN EPOLLOUT EPOLLERR EPOLLHUP EPOLLRDBAND
+                  EPOLLIN EPOLLOUT
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                   EPOLLEXCLUSIVE);
 %EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
-                             EPOLLIN EPOLLOUT EPOLLERR EPOLLHUP EPOLLRDBAND
+                             EPOLLIN EPOLLOUT
                              EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                              EPOLLEXCLUSIVE)],
                 sendfile => [qw(sendfile)],
@@ -34,10 +34,12 @@ $VERSION     = "0.25";
 
 use constant EPOLLIN       => 1;
 use constant EPOLLOUT      => 4;
-use constant EPOLLERR      => 8;
-use constant EPOLLHUP      => 16;
-use constant EPOLLRDBAND   => 128;
+# use constant EPOLLERR      => 8;
+# use constant EPOLLHUP      => 16;
+# use constant EPOLLRDBAND   => 128;
 use constant EPOLLEXCLUSIVE => (1 << 28);
+# use constant EPOLLONESHOT => (1 << 30);
+# use constant EPOLLET => (1 << 31);
 use constant EPOLL_CTL_ADD => 1;
 use constant EPOLL_CTL_DEL => 2;
 use constant EPOLL_CTL_MOD => 3;
-- 
EW


^ permalink raw reply related	[relevance 91%]
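
The EPOLL* constants touched by this patch are single bits that get
OR-ed into one event mask.  The sketch below only shows how such a mask
is built, tested, and trimmed.  The values are the ones in the diff
(including the two that the patch leaves commented out); the variable
names and the choice of a listen socket as the example are assumptions
made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Values as they appear in lib/PublicInbox/Syscall.pm after the patch.
use constant {
    EPOLLIN        => 1,
    EPOLLOUT       => 4,
    EPOLLEXCLUSIVE => (1 << 28),
    EPOLLONESHOT   => (1 << 30), # only a comment in the patch
    EPOLLET        => (1 << 31), # only a comment in the patch
};

# Build a mask by OR-ing bits together (hypothetical listen socket).
my $ev = EPOLLIN | EPOLLEXCLUSIVE;
printf "mask:                   0x%08x\n", $ev;

# Test individual bits with AND.
print "wants readability\n" if $ev & EPOLLIN;
print "wants writability\n" if $ev & EPOLLOUT;

# Clear one bit while leaving the rest alone.
my $plain = $ev & ~EPOLLEXCLUSIVE;
printf "without EPOLLEXCLUSIVE: 0x%08x\n", $plain;
```

Because every flag occupies its own bit, dropping the unused constants
changes nothing about how the remaining ones combine.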

* [PATCH 16/57] syscall: get rid of unnecessary uname local vars
    2019-06-24  2:52 91% ` [PATCH 15/57] syscall: get rid of unused EPOLL* constants Eric Wong
@ 2019-06-24  2:52 93% ` Eric Wong
  2019-06-24  2:52 57% ` [PATCH 21/57] ds: get rid of event_watch field Eric Wong
  2019-06-24  2:52 60% ` [PATCH 54/57] ds: split out IO::KQueue-specific code Eric Wong
  3 siblings, 0 replies; 51+ results
From: Eric Wong @ 2019-06-24  2:52 UTC (permalink / raw)
  To: meta

We don't need to keep information from uname(2) around outside
of startup.
---
 lib/PublicInbox/Syscall.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 98110eaf..17fd1398 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -59,7 +59,6 @@ sub _load_syscall {
     return $rv;
 }
 
-our ($sysname, $nodename, $release, $version, $machine) = POSIX::uname();
 
 our (
      $SYS_epoll_create,
@@ -71,6 +70,7 @@ our (
 our $no_deprecated = 0;
 
 if ($^O eq "linux") {
+    my $machine = (POSIX::uname())[-1];
     # whether the machine requires 64-bit numbers to be on 8-byte
     # boundaries.
     my $u64_mod_8 = 0;
-- 
EW


^ permalink raw reply related	[relevance 93%]
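
The replacement line relies on a list slice to take only the last
field of POSIX::uname().  A small sketch of that idiom next to the
five-variable form it replaces; the variable name $machine_again is
hypothetical and only there for the comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# POSIX::uname() returns (sysname, nodename, release, version, machine).
# A list slice picks out the last element without naming the others.
my $machine = (POSIX::uname())[-1];

# Equivalent result, but four variables are declared and never used:
my ($sysname, $nodename, $release, $version, $machine_again) = POSIX::uname();

print "machine: $machine\n";
print "both forms agree\n" if $machine eq $machine_again;
```

Declaring the result with "my" inside the Linux-only block, as the
patch does, also keeps the value out of the package namespace once
startup is done.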

* [PATCH 21/57] ds: get rid of event_watch field
    2019-06-24  2:52 91% ` [PATCH 15/57] syscall: get rid of unused EPOLL* constants Eric Wong
  2019-06-24  2:52 93% ` [PATCH 16/57] syscall: get rid of unnecessary uname local vars Eric Wong
@ 2019-06-24  2:52 57% ` Eric Wong
  2019-06-24  2:52 60% ` [PATCH 54/57] ds: split out IO::KQueue-specific code Eric Wong
  3 siblings, 0 replies; 51+ results
From: Eric Wong @ 2019-06-24  2:52 UTC (permalink / raw)
  To: meta

We don't need to keep track of that field since we always
know what events we're interested in when using one-shot
wakeups.
---
 lib/PublicInbox/DS.pm          | 64 ++++++++++++----------------------
 lib/PublicInbox/EvCleanup.pm   |  4 +--
 lib/PublicInbox/HTTP.pm        | 13 +++----
 lib/PublicInbox/HTTPD/Async.pm | 10 +++---
 lib/PublicInbox/NNTP.pm        | 25 +++++--------
 lib/PublicInbox/Syscall.pm     |  6 ++--
 6 files changed, 50 insertions(+), 72 deletions(-)

diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 9c801214..f5986e55 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -29,8 +29,6 @@ use PublicInbox::Syscall qw(:epoll);
 use fields ('sock',              # underlying socket
             'wbuf',              # arrayref of coderefs or GLOB refs
             'wbuf_off',  # offset into first element of wbuf to start writing at
-            'event_watch',       # bitmask of events the client is interested in
-                                 # (EPOLLIN,OUT,etc.)
             );
 
 use Errno  qw(EAGAIN EINVAL);
@@ -318,6 +316,17 @@ sub PostEventLoop {
     return $keep_running;
 }
 
+# map EPOLL* bits to kqueue EV_* flags for EV_SET
+sub kq_flag ($$) {
+    my ($bit, $ev) = @_;
+    if ($ev & $bit) {
+        my $fl = EV_ADD() | EV_ENABLE();
+        ($ev & EPOLLONESHOT) ? ($fl|EV_ONESHOT()) : $fl;
+    } else {
+        EV_DISABLE();
+    }
+}
+
 #####################################################################
 ### PublicInbox::DS-the-object code
 #####################################################################
@@ -344,25 +353,21 @@ sub new {
     Carp::cluck("undef sock and/or fd in PublicInbox::DS->new.  sock=" . ($sock || "") . ", fd=" . ($fd || ""))
         unless $sock && $fd;
 
-    $self->{event_watch} = $ev;
-
     _InitPoller();
 
     if ($HaveEpoll) {
 retry:
         if (epoll_ctl($Epoll, EPOLL_CTL_ADD, $fd, $ev)) {
             if ($! == EINVAL && ($ev & EPOLLEXCLUSIVE)) {
-                $self->{event_watch} = ($ev &= ~EPOLLEXCLUSIVE);
+                $ev &= ~EPOLLEXCLUSIVE;
                 goto retry;
             }
             die "couldn't add epoll watch for $fd: $!\n";
         }
     }
     elsif ($HaveKQueue) {
-        my $f = $ev & EPOLLIN ? EV_ENABLE() : EV_DISABLE();
-        $KQueue->EV_SET($fd, EVFILT_READ(), EV_ADD() | $f);
-        $f = $ev & EPOLLOUT ? EV_ENABLE() : EV_DISABLE();
-        $KQueue->EV_SET($fd, EVFILT_WRITE(), EV_ADD() | $f);
+        $KQueue->EV_SET($fd, EVFILT_READ(), EV_ADD() | kq_flag(EPOLLIN, $ev));
+        $KQueue->EV_SET($fd, EVFILT_WRITE(), EV_ADD() | kq_flag(EPOLLOUT, $ev));
     }
 
     Carp::cluck("PublicInbox::DS::new blowing away existing descriptor map for fd=$fd ($DescriptorMap{$fd})")
@@ -454,7 +459,7 @@ next_buf:
                     }
                 } elsif ($! == EAGAIN) {
                     $self->{wbuf_off} = $off;
-                    watch_write($self, 1);
+                    watch($self, EPOLLOUT|EPOLLONESHOT);
                     return 0;
                 } else {
                     return $self->close;
@@ -467,7 +472,6 @@ next_buf:
     } # while @$wbuf
 
     delete $self->{wbuf};
-    $self->watch_write(0);
     1; # all done
 }
 
@@ -544,7 +548,7 @@ sub write {
             return $self->close;
         }
         $self->{wbuf} = [ tmpbuf($bref, $written) ];
-        watch_write($self, 1);
+        watch($self, EPOLLOUT|EPOLLONESHOT);
         return 0;
     }
 }
@@ -563,49 +567,27 @@ sub msg_more ($$) {
 
             # queue up the unwritten substring:
             $self->{wbuf} = [ tmpbuf(\($_[1]), $n) ];
-            watch_write($self, 1);
+            watch($self, EPOLLOUT|EPOLLONESHOT);
             return 0;
         }
     }
     $self->write(\($_[1]));
 }
 
-sub watch_chg ($$$) {
-    my ($self, $bits, $set) = @_;
+sub watch ($$) {
+    my ($self, $ev) = @_;
     my $sock = $self->{sock} or return;
-    my $cur = $self->{event_watch};
-    my $changes = $cur;
-    if ($set) {
-        $changes |= $bits;
-    } else {
-        $changes &= ~$bits;
-    }
-    return if $changes == $cur;
     my $fd = fileno($sock);
     if ($HaveEpoll) {
-        epoll_ctl($Epoll, EPOLL_CTL_MOD, $fd, $changes) and
+        epoll_ctl($Epoll, EPOLL_CTL_MOD, $fd, $ev) and
             confess("EPOLL_CTL_MOD $!");
     } elsif ($HaveKQueue) {
-        my $flag = $set ? EV_ENABLE() : EV_DISABLE();
-        $KQueue->EV_SET($fd, EVFILT_READ(), $flag) if $bits & EPOLLIN;
-        $KQueue->EV_SET($fd, EVFILT_WRITE(), $flag) if $bits & EPOLLOUT;
+        $KQueue->EV_SET($fd, EVFILT_READ(), kq_flag(EPOLLIN, $ev));
+        $KQueue->EV_SET($fd, EVFILT_WRITE(), kq_flag(EPOLLOUT, $ev));
     }
-    $self->{event_watch} = $changes;
 }
 
-=head2 C<< $obj->watch_read( $boolean ) >>
-
-Turn 'readable' event notification on or off.
-
-=cut
-sub watch_read ($$) { watch_chg($_[0], EPOLLIN, $_[1]) };
-
-=head2 C<< $obj->watch_write( $boolean ) >>
-
-Turn 'writable' event notification on or off.
-
-=cut
-sub watch_write ($$) { watch_chg($_[0], EPOLLOUT, $_[1]) };
+sub watch_in1 ($) { watch($_[0], EPOLLIN | EPOLLONESHOT) }
 
 package PublicInbox::DS::Timer;
 # [$abs_float_firetime, $coderef];
diff --git a/lib/PublicInbox/EvCleanup.pm b/lib/PublicInbox/EvCleanup.pm
index d60ac2cc..a9f6167d 100644
--- a/lib/PublicInbox/EvCleanup.pm
+++ b/lib/PublicInbox/EvCleanup.pm
@@ -6,6 +6,7 @@ package PublicInbox::EvCleanup;
 use strict;
 use warnings;
 use base qw(PublicInbox::DS);
+use PublicInbox::Syscall qw(EPOLLOUT EPOLLONESHOT);
 
 my $ENABLED;
 sub enabled { $ENABLED }
@@ -59,13 +60,12 @@ sub _run_later () {
 # Called by PublicInbox::DS
 sub event_step {
 	my ($self) = @_;
-	$self->watch_write(0);
 	_run_asap();
 }
 
 sub _asap_timer () {
 	$singleton ||= once_init();
-	$singleton->watch_write(1);
+	$singleton->watch(EPOLLOUT|EPOLLONESHOT);
 	1;
 }
 
diff --git a/lib/PublicInbox/HTTP.pm b/lib/PublicInbox/HTTP.pm
index afa71ea5..773d77ba 100644
--- a/lib/PublicInbox/HTTP.pm
+++ b/lib/PublicInbox/HTTP.pm
@@ -20,6 +20,7 @@ use HTTP::Date qw(time2str);
 use IO::Handle;
 require PublicInbox::EvCleanup;
 PublicInbox::DS->import(qw(msg_more write_in_full));
+use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
 use constant {
 	CHUNK_START => -1,   # [a-f0-9]+\r\n
 	CHUNK_END => -2,     # \r\n
@@ -56,7 +57,7 @@ sub http_date () {
 sub new ($$$) {
 	my ($class, $sock, $addr, $httpd) = @_;
 	my $self = fields::new($class);
-	$self->SUPER::new($sock, PublicInbox::DS::EPOLLIN());
+	$self->SUPER::new($sock, EPOLLIN | EPOLLONESHOT);
 	$self->{httpd} = $httpd;
 	$self->{rbuf} = '';
 	($self->{remote_addr}, $self->{remote_port}) =
@@ -80,7 +81,8 @@ sub event_step { # called by PublicInbox::DS
 		return $self->close if $r == 0;
 		return rbuf_process($self);
 	}
-	return if $!{EAGAIN}; # no need to call watch_read(1) again
+
+	return $self->watch_in1 if $!{EAGAIN};
 
 	# common for clients to break connections without warning,
 	# would be too noisy to log here:
@@ -100,7 +102,7 @@ sub rbuf_process {
 			($r == -2 && length($self->{rbuf}) > 0x4000)) {
 		return quit($self, 400);
 	}
-	return $self->watch_read(1) if $r < 0; # incomplete
+	return $self->watch_in1 if $r < 0; # incomplete
 	$self->{rbuf} = substr($self->{rbuf}, $r);
 
 	my $len = input_prepare($self, \%env);
@@ -143,7 +145,6 @@ sub read_input ($) {
 
 sub app_dispatch {
 	my ($self, $input) = @_;
-	$self->watch_read(0);
 	my $env = $self->{env};
 	$env->{REMOTE_ADDR} = $self->{remote_addr};
 	$env->{REMOTE_PORT} = $self->{remote_port};
@@ -236,7 +237,7 @@ sub identity_wcb ($) {
 sub next_request ($) {
 	my ($self) = @_;
 	if ($self->{rbuf} eq '') { # wait for next request
-		$self->watch_read(1);
+		$self->watch_in1;
 	} else { # avoid recursion for pipelined requests
 		push @$pipelineq, $self;
 		$pipet ||= PublicInbox::EvCleanup::asap(*process_pipelineq);
@@ -360,7 +361,7 @@ sub recv_err {
 	return $self->close if (defined $r && $r == 0);
 	if ($!{EAGAIN}) {
 		$self->{input_left} = $len;
-		return;
+		return $self->watch_in1;
 	}
 	err($self, "error reading for input: $! ($len bytes remaining)");
 	quit($self, 500);
diff --git a/lib/PublicInbox/HTTPD/Async.pm b/lib/PublicInbox/HTTPD/Async.pm
index dae62e55..f32ef009 100644
--- a/lib/PublicInbox/HTTPD/Async.pm
+++ b/lib/PublicInbox/HTTPD/Async.pm
@@ -31,10 +31,12 @@ sub new {
 	$self;
 }
 
+sub restart_read ($) { $_[0]->watch(PublicInbox::DS::EPOLLIN()) }
+
 # fires after pending writes are complete:
 sub restart_read_cb ($) {
 	my ($self) = @_;
-	sub { $self->watch_read(1) }
+	sub { restart_read($self) }
 }
 
 sub main_cb ($$$) {
@@ -46,16 +48,16 @@ sub main_cb ($$$) {
 			$fh->write($$bref);
 			if ($http->{sock}) { # !closed
 				if ($http->{wbuf}) {
-					$self->watch_read(0);
+					$self->watch(0);
 					$http->write(restart_read_cb($self));
 				}
-				# stay in watch_read, but let other clients
+				# stay in EPOLLIN, but let other clients
 				# get some work done, too.
 				return;
 			}
 			# fall through to close below...
 		} elsif (!defined $r) {
-			return if $!{EAGAIN} || $!{EINTR};
+			return restart_read($self) if $!{EAGAIN} || $!{EINTR};
 		}
 
 		# Done! Error handling will happen in $fh->close
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index eb1679a7..98f88410 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -24,6 +24,7 @@ use constant {
 	r225 =>	'225 Headers follow (multi-line)',
 	r430 => '430 No article with that message-id',
 };
+use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
 
 my @OVERVIEW = qw(Subject From Date Message-ID References Xref);
 my $OVERVIEW_FMT = join(":\r\n", @OVERVIEW, qw(Bytes Lines)) . ":\r\n";
@@ -52,12 +53,6 @@ sub next_tick () {
 			# pipelined request, we bypassed socket-readiness
 			# checks to get here:
 			event_step($nntp);
-
-			# maybe there's more pipelined data, or we'll have
-			# to register it for socket-readiness notifications
-			if (!$nntp->{long_res} && $nntp->{sock}) {
-				check_read($nntp);
-			}
 		}
 	}
 }
@@ -97,7 +92,7 @@ sub expire_old () {
 sub new ($$$) {
 	my ($class, $sock, $nntpd) = @_;
 	my $self = fields::new($class);
-	$self->SUPER::new($sock, PublicInbox::DS::EPOLLIN());
+	$self->SUPER::new($sock, EPOLLIN | EPOLLONESHOT);
 	$self->{nntpd} = $nntpd;
 	res($self, '201 ' . $nntpd->{servername} . ' ready - post via email');
 	$self->{rbuf} = '';
@@ -624,11 +619,10 @@ sub long_response ($$) {
 	# make sure we disable reading during a long response,
 	# clients should not be sending us stuff and making us do more
 	# work while we are streaming a response to them
-	$self->watch_read(0);
 	my $t0 = now();
 	$self->{long_res} = sub {
 		my $more = eval { $cb->() };
-		if ($@ || !$self->{sock}) {
+		if ($@ || !$self->{sock}) { # something bad happened...
 			$self->{long_res} = undef;
 
 			if ($@) {
@@ -922,10 +916,6 @@ sub do_write ($$) {
 	my $done = $self->write(\($_[1]));
 	return 0 unless $self->{sock};
 
-	# Do not watch for readability if we have data in the queue,
-	# instead re-enable watching for readability when we can
-	$self->watch_read(0) if (!$done || $self->{long_res});
-
 	$done;
 }
 
@@ -943,7 +933,6 @@ sub event_step {
 	my ($self) = @_;
 
 	return unless $self->flush_write && $self->{sock};
-	return if $self->{long_res};
 
 	update_idle_time($self);
 	# only read more requests if we've drained the write buffer,
@@ -957,7 +946,7 @@ sub event_step {
 		my $off = length($$rbuf);
 		$r = sysread($self->{sock}, $$rbuf, LINE_MAX, $off);
 		unless (defined $r) {
-			return if $!{EAGAIN};
+			return $self->watch_in1 if $!{EAGAIN};
 			return $self->close;
 		}
 		return $self->close if $r == 0;
@@ -978,6 +967,10 @@ sub event_step {
 	my $len = length($$rbuf);
 	return $self->close if ($len >= LINE_MAX);
 	update_idle_time($self);
+
+	# maybe there's more pipelined data, or we'll have
+	# to register it for socket-readiness notifications
+	check_read($self) unless ($self->{long_res} || $self->{wbuf});
 }
 
 sub check_read {
@@ -993,7 +986,7 @@ sub check_read {
 	} else {
 		# no pipelined requests available, let the kernel know
 		# to wake us up if there's more
-		$self->watch_read(1); # PublicInbox::DS::watch_read
+		$self->watch_in1; # PublicInbox::DS::watch_in1
 	}
 }
 
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 17fd1398..f1988e61 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -24,11 +24,11 @@ $VERSION     = "0.25";
 @EXPORT_OK   = qw(sendfile epoll_ctl epoll_create epoll_wait
                   EPOLLIN EPOLLOUT
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
-                  EPOLLEXCLUSIVE);
+                  EPOLLONESHOT EPOLLEXCLUSIVE);
 %EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
                              EPOLLIN EPOLLOUT
                              EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
-                             EPOLLEXCLUSIVE)],
+                             EPOLLONESHOT EPOLLEXCLUSIVE)],
                 sendfile => [qw(sendfile)],
                 );
 
@@ -38,7 +38,7 @@ use constant EPOLLOUT      => 4;
 # use constant EPOLLHUP      => 16;
 # use constant EPOLLRDBAND   => 128;
 use constant EPOLLEXCLUSIVE => (1 << 28);
-# use constant EPOLLONESHOT => (1 << 30);
+use constant EPOLLONESHOT => (1 << 30);
 # use constant EPOLLET => (1 << 31);
 use constant EPOLL_CTL_ADD => 1;
 use constant EPOLL_CTL_DEL => 2;
-- 
EW


^ permalink raw reply related	[relevance 57%]
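
The kq_flag() helper added in this patch translates an epoll-style
event mask into flags for one kqueue filter.  The sketch below
reproduces that mapping in a form that runs without IO::KQueue.  The
EPOLL* values come from Syscall.pm; the EV_* values are stand-ins that
are assumed to match FreeBSD <sys/event.h>, and real code should take
them from IO::KQueue as the patch does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# epoll bits, values from lib/PublicInbox/Syscall.pm
use constant {
    EPOLLIN      => 1,
    EPOLLOUT     => 4,
    EPOLLONESHOT => (1 << 30),
};

# ASSUMPTION: stand-ins for the IO::KQueue constants, with values
# believed to match FreeBSD <sys/event.h>.
use constant {
    EV_ADD     => 0x0001,
    EV_ENABLE  => 0x0004,
    EV_DISABLE => 0x0008,
    EV_ONESHOT => 0x0010,
};

# Same logic as kq_flag() in the patch: $bit names the filter being
# configured (EPOLLIN or EPOLLOUT), $ev is the requested event mask.
sub kq_flag {
    my ($bit, $ev) = @_;
    if ($ev & $bit) {
        my $fl = EV_ADD | EV_ENABLE;
        return ($ev & EPOLLONESHOT) ? ($fl | EV_ONESHOT) : $fl;
    }
    return EV_DISABLE;
}

for my $ev (EPOLLIN | EPOLLONESHOT, EPOLLOUT | EPOLLONESHOT, EPOLLIN) {
    printf "ev=0x%08x read=0x%02x write=0x%02x\n", $ev,
        kq_flag(EPOLLIN, $ev), kq_flag(EPOLLOUT, $ev);
}
```

With one-shot wakeups the caller passes the complete mask to watch()
each time it re-arms, which is why the object no longer has to remember
the previous mask in an event_watch field.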

* [PATCH 54/57] ds: split out IO::KQueue-specific code
                     ` (2 preceding siblings ...)
  2019-06-24  2:52 57% ` [PATCH 21/57] ds: get rid of event_watch field Eric Wong
@ 2019-06-24  2:52 60% ` Eric Wong
  2019-06-24  5:24 93%   ` Eric Wong
  3 siblings, 1 reply; 51+ results
From: Eric Wong @ 2019-06-24  2:52 UTC (permalink / raw)
  To: meta

We don't need to code multiple event loops or have branches in
watch() if we can easily make the IO::KQueue-based interface
look like our lower-level epoll_* API.
---
 MANIFEST                   |   1 +
 lib/PublicInbox/DS.pm      | 121 ++++++++-----------------------------
 lib/PublicInbox/DSKQXS.pm  |  73 ++++++++++++++++++++++
 lib/PublicInbox/Syscall.pm |   9 +--
 4 files changed, 99 insertions(+), 105 deletions(-)
 create mode 100644 lib/PublicInbox/DSKQXS.pm

diff --git a/MANIFEST b/MANIFEST
index 26ff0d0d..52c4790e 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -77,6 +77,7 @@ lib/PublicInbox/Cgit.pm
 lib/PublicInbox/Config.pm
 lib/PublicInbox/ContentId.pm
 lib/PublicInbox/DS.pm
+lib/PublicInbox/DSKQXS.pm
 lib/PublicInbox/Daemon.pm
 lib/PublicInbox/Emergency.pm
 lib/PublicInbox/EvCleanup.pm
diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index d38e2d20..d6ef0b8d 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -36,14 +36,9 @@ use Errno  qw(EAGAIN EINVAL);
 use Carp   qw(croak confess carp);
 use File::Temp qw(tempfile);
 
-our $HAVE_KQUEUE = eval { require IO::KQueue; IO::KQueue->import; 1 };
-
 our (
-     $HaveEpoll,                 # Flag -- is epoll available?  initially undefined.
-     $HaveKQueue,
      %DescriptorMap,             # fd (num) -> PublicInbox::DS object
-     $Epoll,                     # Global epoll fd (for epoll mode only)
-     $KQueue,                    # Global kqueue fd ref (for kqueue mode only)
+     $Epoll,                     # Global epoll fd (or DSKQXS ref)
      $_io,                       # IO::Handle for Epoll
      @ToClose,                   # sockets to close when event loop is done
 
@@ -74,13 +69,8 @@ sub Reset {
     $PostLoopCallback = undef;
     $DoneInit = 0;
 
-    # NOTE kqueue is close-on-fork, and we don't account for it, yet
-    # OTOH, we (public-inbox) don't need this sub outside of tests...
-    POSIX::close($$KQueue) if !$_io && $KQueue && $$KQueue >= 0;
-    $KQueue = undef;
-
-    $_io = undef; # close $Epoll
-    $Epoll = undef;
+    $_io = undef; # closes real $Epoll FD
+    $Epoll = undef; # may call DSKQXS::DESTROY
 
     *EventLoop = *FirstTimeEventLoop;
 }
@@ -152,21 +142,17 @@ sub _InitPoller
     return if $DoneInit;
     $DoneInit = 1;
 
-    if ($HAVE_KQUEUE) {
-        $KQueue = IO::KQueue->new();
-        $HaveKQueue = defined $KQueue;
-        if ($HaveKQueue) {
-            *EventLoop = *KQueueEventLoop;
-        }
-    }
-    elsif (PublicInbox::Syscall::epoll_defined()) {
-        $Epoll = eval { epoll_create(1024); };
-        $HaveEpoll = defined $Epoll && $Epoll >= 0;
-        if ($HaveEpoll) {
-            set_cloexec($Epoll);
-            *EventLoop = *EpollEventLoop;
-        }
+    if (!PublicInbox::Syscall::epoll_defined())  {
+        $Epoll = eval {
+            require PublicInbox::DSKQXS;
+            PublicInbox::DSKQXS->import;
+            PublicInbox::DSKQXS->new;
+        };
+    } else {
+        $Epoll = epoll_create();
+        set_cloexec($Epoll) if (defined($Epoll) && $Epoll >= 0);
     }
+    *EventLoop = *EpollEventLoop;
 }
 
 =head2 C<< CLASS->EventLoop() >>
@@ -180,11 +166,7 @@ sub FirstTimeEventLoop {
 
     _InitPoller();
 
-    if ($HaveEpoll) {
-        EpollEventLoop($class);
-    } elsif ($HaveKQueue) {
-        KQueueEventLoop($class);
-    }
+    EventLoop($class);
 }
 
 sub now () { clock_gettime(CLOCK_MONOTONIC) }
@@ -218,11 +200,7 @@ sub RunTimers {
     return $timeout;
 }
 
-### The epoll-based event loop. Gets installed as EventLoop if IO::Epoll loads
-### okay.
 sub EpollEventLoop {
-    my $class = shift;
-
     while (1) {
         my @events;
         my $i;
@@ -241,30 +219,6 @@ sub EpollEventLoop {
     }
 }
 
-### The kqueue-based event loop. Gets installed as EventLoop if IO::KQueue works
-### okay.
-sub KQueueEventLoop {
-    my $class = shift;
-
-    while (1) {
-        my $timeout = RunTimers();
-        my @ret = eval { $KQueue->kevent($timeout) };
-        if (my $err = $@) {
-            # workaround https://rt.cpan.org/Ticket/Display.html?id=116615
-            if ($err =~ /Interrupted system call/) {
-                @ret = ();
-            } else {
-                die $err;
-            }
-        }
-
-        foreach my $kev (@ret) {
-            $DescriptorMap{$kev->[0]}->event_step;
-        }
-        return unless PostEventLoop();
-    }
-}
-
 =head2 C<< CLASS->SetPostLoopCallback( CODEREF ) >>
 
 Sets post loop callback function.  Pass a subref and it will be
@@ -314,17 +268,6 @@ sub PostEventLoop {
     return $keep_running;
 }
 
-# map EPOLL* bits to kqueue EV_* flags for EV_SET
-sub kq_flag ($$) {
-    my ($bit, $ev) = @_;
-    if ($ev & $bit) {
-        my $fl = EV_ADD() | EV_ENABLE();
-        ($ev & EPOLLONESHOT) ? ($fl|EV_ONESHOT()) : $fl;
-    } else {
-        EV_ADD() | EV_DISABLE();
-    }
-}
-
 #####################################################################
 ### PublicInbox::DS-the-object code
 #####################################################################
@@ -353,21 +296,13 @@ sub new {
 
     _InitPoller();
 
-    if ($HaveEpoll) {
-retry:
-        if (epoll_ctl($Epoll, EPOLL_CTL_ADD, $fd, $ev)) {
-            if ($! == EINVAL && ($ev & EPOLLEXCLUSIVE)) {
-                $ev &= ~EPOLLEXCLUSIVE;
-                goto retry;
-            }
-            die "couldn't add epoll watch for $fd: $!\n";
+    if (epoll_ctl($Epoll, EPOLL_CTL_ADD, $fd, $ev)) {
+        if ($! == EINVAL && ($ev & EPOLLEXCLUSIVE)) {
+            $ev &= ~EPOLLEXCLUSIVE;
+            goto retry;
         }
+        die "couldn't add epoll watch for $fd: $!\n";
     }
-    elsif ($HaveKQueue) {
-        $KQueue->EV_SET($fd, EVFILT_READ(), kq_flag(EPOLLIN, $ev));
-        $KQueue->EV_SET($fd, EVFILT_WRITE(), kq_flag(EPOLLOUT, $ev));
-    }
-
     Carp::cluck("PublicInbox::DS::new blowing away existing descriptor map for fd=$fd ($DescriptorMap{$fd})")
         if $DescriptorMap{$fd};
 
@@ -396,11 +331,9 @@ sub close {
 
     # if we're using epoll, we have to remove this from our epoll fd so we stop getting
     # notifications about it
-    if ($HaveEpoll) {
-        my $fd = fileno($sock);
-        epoll_ctl($Epoll, EPOLL_CTL_DEL, $fd, 0) and
-            confess("EPOLL_CTL_DEL: $!");
-    }
+    my $fd = fileno($sock);
+    epoll_ctl($Epoll, EPOLL_CTL_DEL, $fd, 0) and
+        confess("EPOLL_CTL_DEL: $!");
 
     # we explicitly don't delete from DescriptorMap here until we
     # actually close the socket, as we might be in the middle of
@@ -596,14 +529,8 @@ sub msg_more ($$) {
 sub watch ($$) {
     my ($self, $ev) = @_;
     my $sock = $self->{sock} or return;
-    my $fd = fileno($sock);
-    if ($HaveEpoll) {
-        epoll_ctl($Epoll, EPOLL_CTL_MOD, $fd, $ev) and
-            confess("EPOLL_CTL_MOD $!");
-    } elsif ($HaveKQueue) {
-        $KQueue->EV_SET($fd, EVFILT_READ(), kq_flag(EPOLLIN, $ev));
-        $KQueue->EV_SET($fd, EVFILT_WRITE(), kq_flag(EPOLLOUT, $ev));
-    }
+    epoll_ctl($Epoll, EPOLL_CTL_MOD, fileno($sock), $ev) and
+        confess("EPOLL_CTL_MOD $!");
     0;
 }
 
diff --git a/lib/PublicInbox/DSKQXS.pm b/lib/PublicInbox/DSKQXS.pm
new file mode 100644
index 00000000..38e13446
--- /dev/null
+++ b/lib/PublicInbox/DSKQXS.pm
@@ -0,0 +1,73 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# Licensed the same as Danga::Socket (and Perl5)
+# License: GPL-1.0+ or Artistic-1.0-Perl
+#  <https://www.gnu.org/licenses/gpl-1.0.txt>
+#  <https://dev.perl.org/licenses/artistic.html>
+#
+# kqueue support via IO::KQueue XS module.  This makes kqueue look
+# like epoll to simplify the code in DS.pm.  This is NOT meant to be
+# an all encompassing emulation of epoll via IO::KQueue, but just to
+# support cases public-inbox-nntpd/httpd care about.
+# A pure-Perl version using syscall() is planned, and it should be
+# faster due to the lack of syscall overhead.
+package PublicInbox::DSKQXS;
+use strict;
+use warnings;
+use parent qw(IO::KQueue);
+use parent qw(Exporter);
+use IO::KQueue;
+use PublicInbox::Syscall qw(EPOLLONESHOT EPOLLIN EPOLLOUT EPOLL_CTL_DEL);
+our @EXPORT = qw(epoll_ctl epoll_wait);
+my $owner_pid = -1; # kqueue is close-on-fork (yes, fork, not exec)
+
+# map EPOLL* bits to kqueue EV_* flags for EV_SET
+sub kq_flag ($$) {
+	my ($bit, $ev) = @_;
+	if ($ev & $bit) {
+		my $fl = EV_ADD | EV_ENABLE;
+		($ev & EPOLLONESHOT) ? ($fl | EV_ONESHOT) : $fl;
+	} else {
+		EV_ADD | EV_DISABLE;
+	}
+}
+
+sub new {
+	my ($class) = @_;
+	die 'non-singleton use not supported' if $owner_pid == $$;
+	$owner_pid = $$;
+	$class->SUPER::new;
+}
+
+sub epoll_ctl {
+	my ($self, $op, $fd, $ev) = @_;
+	if ($op != EPOLL_CTL_DEL) {
+		$self->EV_SET($fd, EVFILT_READ, kq_flag(EPOLLIN, $ev));
+		$self->EV_SET($fd, EVFILT_WRITE, kq_flag(EPOLLOUT, $ev));
+	}
+	0;
+}
+
+sub epoll_wait {
+	my ($self, $maxevents, $timeout_msec, $events) = @_;
+	@$events = eval { $self->kevent($timeout_msec) };
+	if (my $err = $@) {
+		# workaround https://rt.cpan.org/Ticket/Display.html?id=116615
+		if ($err =~ /Interrupted system call/) {
+			@$events = ();
+		} else {
+			die $err;
+		}
+	}
+	# caller only cares for $events[$i]->[0]
+	scalar(@$events);
+}
+
+sub DESTROY {
+	my ($self) = @_;
+	if ($owner_pid == $$) {
+		POSIX::close($$self);
+		$owner_pid = -1;
+	}
+}
+
+1;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index f1988e61..f53f3c82 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -248,14 +248,7 @@ sub sendfile_freebsd {
 
 sub epoll_defined { return $SYS_epoll_create ? 1 : 0; }
 
-# ARGS: (size) -- but in modern Linux 2.6, the
-# size doesn't even matter (radix tree now, not hash)
-sub epoll_create {
-    return -1 unless defined $SYS_epoll_create;
-    my $epfd = eval { syscall($SYS_epoll_create, $no_deprecated ? 0 : ($_[0]||100)+0) };
-    return -1 if $@;
-    return $epfd;
-}
+sub epoll_create { syscall($SYS_epoll_create, 0) }
 
 # epoll_ctl wrapper
 # ARGS: (epfd, op, fd, events_mask)
-- 
EW


^ permalink raw reply related	[relevance 60%]

* Re: [PATCH 54/57] ds: split out IO::KQueue-specific code
  2019-06-24  2:52 60% ` [PATCH 54/57] ds: split out IO::KQueue-specific code Eric Wong
@ 2019-06-24  5:24 93%   ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2019-06-24  5:24 UTC (permalink / raw)
  To: meta

Eric Wong <e@80x24.org> wrote:
> diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
> index f1988e61..f53f3c82 100644
> --- a/lib/PublicInbox/Syscall.pm
> +++ b/lib/PublicInbox/Syscall.pm
> @@ -248,14 +248,7 @@ sub sendfile_freebsd {
>  
>  sub epoll_defined { return $SYS_epoll_create ? 1 : 0; }
>  
> -# ARGS: (size) -- but in modern Linux 2.6, the
> -# size doesn't even matter (radix tree now, not hash)
> -sub epoll_create {
> -    return -1 unless defined $SYS_epoll_create;
> -    my $epfd = eval { syscall($SYS_epoll_create, $no_deprecated ? 0 : ($_[0]||100)+0) };
> -    return -1 if $@;
> -    return $epfd;
> -}
> +sub epoll_create { syscall($SYS_epoll_create, 0) }

Oops, that wasn't tested on Linux, actually :x  I got so
focused on FreeBSD-related improvements that I forgot to test
on the OS I mainly use :x

Will squash this before pushing

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index f53f3c8..500efa6 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -248,7 +248,9 @@ sub sendfile_freebsd {
 
 sub epoll_defined { return $SYS_epoll_create ? 1 : 0; }
 
-sub epoll_create { syscall($SYS_epoll_create, 0) }
+sub epoll_create {
+	syscall($SYS_epoll_create, $no_deprecated ? 0 : ($_[0]||100)+0);
+}
 
 # epoll_ctl wrapper
 # ARGS: (epfd, op, fd, events_mask)

^ permalink raw reply related	[relevance 93%]

* [PATCH 04/11] listener: use edge-triggered notifications
  @ 2019-06-29 19:59 86% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2019-06-29 19:59 UTC (permalink / raw)
  To: meta

We don't need extra wakeups from the kernel when we know a
listener is already active.
---
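Note: a standalone sketch of how the listener's event mask could map
onto kqueue flags.  The EPOLL* values are the ones defined in
PublicInbox::Syscall; the EV_* values are assumed from FreeBSD
<sys/event.h>, and kq_flag_sketch is a hypothetical name.  It also
assumes the edge-trigger check is meant to look at the epoll mask
($ev), since EV_CLEAR is the kqueue counterpart of EPOLLET.

```perl
use strict;
use warnings;

# epoll bits, as defined in PublicInbox::Syscall
use constant {
	EPOLLIN => 1,
	EPOLLOUT => 4,
	EPOLLEXCLUSIVE => (1 << 28),
	EPOLLONESHOT => (1 << 30),
	EPOLLET => (1 << 31),
};

# kqueue flags; values assumed from FreeBSD <sys/event.h>
use constant {
	EV_ADD => 0x0001,
	EV_ENABLE => 0x0004,
	EV_DISABLE => 0x0008,
	EV_ONESHOT => 0x0010,
	EV_CLEAR => 0x0020,
};

# Translate one epoll direction bit ($bit) of mask $ev into EV_* flags
sub kq_flag_sketch {
	my ($bit, $ev) = @_;
	return EV_ADD | EV_DISABLE unless $ev & $bit;
	my $fl = EV_ADD | EV_ENABLE;
	$fl |= EV_CLEAR if $ev & EPOLLET; # edge-triggered
	$fl |= EV_ONESHOT if $ev & EPOLLONESHOT;
	$fl;
}

# the mask Listener.pm passes to SUPER::new in this patch
my $ev = EPOLLIN | EPOLLET | EPOLLEXCLUSIVE;
printf "mask=%#x read=%#x write=%#x\n", $ev,
	kq_flag_sketch(EPOLLIN, $ev), kq_flag_sketch(EPOLLOUT, $ev);
```

EPOLLEXCLUSIVE has no equivalent in this mapping and is simply ignored
by the sketch.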
 lib/PublicInbox/DSKQXS.pm   | 4 +++-
 lib/PublicInbox/Listener.pm | 7 ++++---
 lib/PublicInbox/Syscall.pm  | 4 ++--
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/DSKQXS.pm b/lib/PublicInbox/DSKQXS.pm
index 364df3d6..278d3f88 100644
--- a/lib/PublicInbox/DSKQXS.pm
+++ b/lib/PublicInbox/DSKQXS.pm
@@ -16,7 +16,8 @@ use warnings;
 use parent qw(IO::KQueue);
 use parent qw(Exporter);
 use IO::KQueue;
-use PublicInbox::Syscall qw(EPOLLONESHOT EPOLLIN EPOLLOUT EPOLL_CTL_DEL);
+use PublicInbox::Syscall qw(EPOLLONESHOT EPOLLIN EPOLLOUT EPOLLET
+	EPOLL_CTL_DEL);
 our @EXPORT_OK = qw(epoll_ctl epoll_wait);
 my $owner_pid = -1; # kqueue is close-on-fork (yes, fork, not exec)
 
@@ -25,6 +26,7 @@ sub kq_flag ($$) {
 	my ($bit, $ev) = @_;
 	if ($ev & $bit) {
 		my $fl = EV_ADD | EV_ENABLE;
+		$fl |= EV_CLEAR if $ev & EPOLLET;
 		($ev & EPOLLONESHOT) ? ($fl | EV_ONESHOT) : $fl;
 	} else {
 		EV_ADD | EV_DISABLE;
diff --git a/lib/PublicInbox/Listener.pm b/lib/PublicInbox/Listener.pm
index 94b2aed4..594dabb8 100644
--- a/lib/PublicInbox/Listener.pm
+++ b/lib/PublicInbox/Listener.pm
@@ -9,6 +9,7 @@ use base 'PublicInbox::DS';
 use Socket qw(SOL_SOCKET SO_KEEPALIVE IPPROTO_TCP TCP_NODELAY);
 use fields qw(post_accept);
 require IO::Handle;
+use PublicInbox::Syscall qw(EPOLLIN EPOLLEXCLUSIVE EPOLLET);
 
 sub new ($$$) {
 	my ($class, $s, $cb) = @_;
@@ -17,15 +18,14 @@ sub new ($$$) {
 	listen($s, 1024);
 	IO::Handle::blocking($s, 0);
 	my $self = fields::new($class);
-	$self->SUPER::new($s, PublicInbox::DS::EPOLLIN()|
-	                      PublicInbox::DS::EPOLLEXCLUSIVE());
+	$self->SUPER::new($s, EPOLLIN|EPOLLET|EPOLLEXCLUSIVE);
 	$self->{post_accept} = $cb;
 	$self
 }
 
 sub event_step {
 	my ($self) = @_;
-	my $sock = $self->{sock};
+	my $sock = $self->{sock} or return;
 
 	# no loop here, we want to fairly distribute clients
 	# between multiple processes sharing the same socket
@@ -35,6 +35,7 @@ sub event_step {
 	if (my $addr = accept(my $c, $sock)) {
 		IO::Handle::blocking($c, 0); # no accept4 :<
 		$self->{post_accept}->($c, $addr, $sock);
+		$self->requeue;
 	}
 }
 
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 500efa67..d7e15c72 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -22,7 +22,7 @@ use vars qw(@ISA @EXPORT_OK %EXPORT_TAGS $VERSION);
 $VERSION     = "0.25";
 @ISA         = qw(Exporter);
 @EXPORT_OK   = qw(sendfile epoll_ctl epoll_create epoll_wait
-                  EPOLLIN EPOLLOUT
+                  EPOLLIN EPOLLOUT EPOLLET
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                   EPOLLONESHOT EPOLLEXCLUSIVE);
 %EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
@@ -39,7 +39,7 @@ use constant EPOLLOUT      => 4;
 # use constant EPOLLRDBAND   => 128;
 use constant EPOLLEXCLUSIVE => (1 << 28);
 use constant EPOLLONESHOT => (1 << 30);
-# use constant EPOLLET => (1 << 31);
+use constant EPOLLET => (1 << 31);
 use constant EPOLL_CTL_ADD => 1;
 use constant EPOLL_CTL_DEL => 2;
 use constant EPOLL_CTL_MOD => 3;
-- 
EW


^ permalink raw reply related	[relevance 86%]

* [PATCH 7/7] syscall: get rid of sendfile wrappers for now
  @ 2019-10-21 11:22 75% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2019-10-21 11:22 UTC (permalink / raw)
  To: meta

I'm not sure they'll make a measurable difference or will
be worth the effort in the future given the prevalence
of HTTPS and giant socket buffers.

Using Inline::C for this may make more sense in the
future, too, especially if we want to be able to use
GnuTLS.
---
 lib/PublicInbox/Syscall.pm | 75 +-------------------------------------
 1 file changed, 1 insertion(+), 74 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index d7e15c72..da8a6c86 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -21,7 +21,7 @@ use vars qw(@ISA @EXPORT_OK %EXPORT_TAGS $VERSION);
 
 $VERSION     = "0.25";
 @ISA         = qw(Exporter);
-@EXPORT_OK   = qw(sendfile epoll_ctl epoll_create epoll_wait
+@EXPORT_OK   = qw(epoll_ctl epoll_create epoll_wait
                   EPOLLIN EPOLLOUT EPOLLET
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                   EPOLLONESHOT EPOLLEXCLUSIVE);
@@ -29,7 +29,6 @@ $VERSION     = "0.25";
                              EPOLLIN EPOLLOUT
                              EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                              EPOLLONESHOT EPOLLEXCLUSIVE)],
-                sendfile => [qw(sendfile)],
                 );
 
 use constant EPOLLIN       => 1;
@@ -64,7 +63,6 @@ our (
      $SYS_epoll_create,
      $SYS_epoll_ctl,
      $SYS_epoll_wait,
-     $SYS_sendfile,
      );
 
 our $no_deprecated = 0;
@@ -90,45 +88,37 @@ if ($^O eq "linux") {
         $SYS_epoll_create = 254;
         $SYS_epoll_ctl    = 255;
         $SYS_epoll_wait   = 256;
-        $SYS_sendfile     = 187;  # or 64: 239
     } elsif ($machine eq "x86_64") {
         $SYS_epoll_create = 213;
         $SYS_epoll_ctl    = 233;
         $SYS_epoll_wait   = 232;
-        $SYS_sendfile     =  40;
     } elsif ($machine =~ m/^parisc/) {
         $SYS_epoll_create = 224;
         $SYS_epoll_ctl    = 225;
         $SYS_epoll_wait   = 226;
-        $SYS_sendfile     = 122;  # sys_sendfile64=209
         $u64_mod_8        = 1;
     } elsif ($machine =~ m/^ppc64/) {
         $SYS_epoll_create = 236;
         $SYS_epoll_ctl    = 237;
         $SYS_epoll_wait   = 238;
-        $SYS_sendfile     = 186;  # (sys32_sendfile).  sys32_sendfile64=226  (64 bit processes: sys_sendfile64=186)
         $u64_mod_8        = 1;
     } elsif ($machine eq "ppc") {
         $SYS_epoll_create = 236;
         $SYS_epoll_ctl    = 237;
         $SYS_epoll_wait   = 238;
-        $SYS_sendfile     = 186;  # sys_sendfile64=226
         $u64_mod_8        = 1;
     } elsif ($machine =~ m/^s390/) {
         $SYS_epoll_create = 249;
         $SYS_epoll_ctl    = 250;
         $SYS_epoll_wait   = 251;
-        $SYS_sendfile     = 187;  # sys_sendfile64=223
         $u64_mod_8        = 1;
     } elsif ($machine eq "ia64") {
         $SYS_epoll_create = 1243;
         $SYS_epoll_ctl    = 1244;
         $SYS_epoll_wait   = 1245;
-        $SYS_sendfile     = 1187;
         $u64_mod_8        = 1;
     } elsif ($machine eq "alpha") {
         # natural alignment, ints are 32-bits
-        $SYS_sendfile     = 370;  # (sys_sendfile64)
         $SYS_epoll_create = 407;
         $SYS_epoll_ctl    = 408;
         $SYS_epoll_wait   = 409;
@@ -137,7 +127,6 @@ if ($^O eq "linux") {
         $SYS_epoll_create = 20;  # (sys_epoll_create1)
         $SYS_epoll_ctl    = 21;
         $SYS_epoll_wait   = 22;  # (sys_epoll_pwait)
-        $SYS_sendfile     = 71;  # (sys_sendfile64)
         $u64_mod_8        = 1;
         $no_deprecated    = 1;
     } elsif ($machine =~ m/arm(v\d+)?.*l/) {
@@ -145,16 +134,13 @@ if ($^O eq "linux") {
         $SYS_epoll_create = 250;
         $SYS_epoll_ctl    = 251;
         $SYS_epoll_wait   = 252;
-        $SYS_sendfile     = 187;
         $u64_mod_8        = 1;
     } elsif ($machine =~ m/^mips64/) {
-        $SYS_sendfile     = 5039;
         $SYS_epoll_create = 5207;
         $SYS_epoll_ctl    = 5208;
         $SYS_epoll_wait   = 5209;
         $u64_mod_8        = 1;
     } elsif ($machine =~ m/^mips/) {
-        $SYS_sendfile     = 4207;
         $SYS_epoll_create = 4248;
         $SYS_epoll_ctl    = 4249;
         $SYS_epoll_wait   = 4250;
@@ -180,68 +166,9 @@ if ($^O eq "linux") {
 elsif ($^O eq "freebsd") {
     if ($ENV{FREEBSD_SENDFILE}) {
         # this is still buggy and in development
-        $SYS_sendfile = 393;  # old is 336
     }
 }
 
-############################################################################
-# sendfile functions
-############################################################################
-
-unless ($SYS_sendfile) {
-    _load_syscall();
-    $SYS_sendfile = eval { &SYS_sendfile; } || 0;
-}
-
-sub sendfile_defined { return $SYS_sendfile ? 1 : 0; }
-
-if ($^O eq "linux" && $SYS_sendfile) {
-    *sendfile = \&sendfile_linux;
-} elsif ($^O eq "freebsd" && $SYS_sendfile) {
-    *sendfile = \&sendfile_freebsd;
-} else {
-    *sendfile = \&sendfile_noimpl;
-}
-
-sub sendfile_noimpl {
-    $! = ENOSYS;
-    return -1;
-}
-
-# C: ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
-# Perl:  sendfile($write_fd, $read_fd, $max_count) --> $actually_sent
-sub sendfile_linux {
-    return syscall(
-                   $SYS_sendfile,
-                   $_[0] + 0,  # fd
-                   $_[1] + 0,  # fd
-                   0,          # don't keep track of offset.  callers can lseek and keep track.
-                   $_[2] + 0   # count
-                   );
-}
-
-sub sendfile_freebsd {
-    my $offset = POSIX::lseek($_[1]+0, 0, SEEK_CUR) + 0;
-    my $ct = $_[2] + 0;
-    my $sbytes_buf = "\0" x 8;
-    my $rv = syscall(
-                     $SYS_sendfile,
-                     $_[1] + 0,   # fd     (from)
-                     $_[0] + 0,   # socket (to)
-                     $offset,
-                     $ct,
-                     0,           # struct sf_hdtr *hdtr
-                     $sbytes_buf, # off_t *sbytes
-                     0);          # flags
-    return $rv if $rv < 0;
-
-
-    my $set = unpack("L", $sbytes_buf);
-    POSIX::lseek($_[1]+0, SEEK_CUR, $set);
-    return $set;
-}
-
-
 ############################################################################
 # epoll functions
 ############################################################################

^ permalink raw reply related	[relevance 75%]

* [PATCH 2/2] httpd|nntpd: avoid missed signal wakeups
  @ 2019-11-27  1:33 37% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2019-11-27  1:33 UTC (permalink / raw)
  To: meta

Our attempt at using a self-pipe in signal handlers was
ineffective, since pure Perl code execution is deferred
and Perl doesn't use an internal self-pipe/eventfd.  In
retrospect, I actually prefer the simplicity of Perl in
this regard...

We can use sigprocmask() from Perl, so we can introduce
signalfd(2) and EVFILT_SIGNAL support on Linux and *BSD-based
systems, respectively.  These OS primitives allow us to avoid a
race where Perl checks for signals right before epoll_wait() or
kevent() puts the process to sleep.

The (few) systems nowadays without signalfd(2) or IO::KQueue
will now see wakeups every second to avoid missed signals.
---
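Note: the masking prerequisite can be shown with the core POSIX module
alone.  The sketch below does not call signalfd(2) or use
EVFILT_SIGNAL; it only demonstrates that a signal blocked via
sigprocmask() stays pending in the kernel, and that the Perl handler
runs only once the signal is unblocked.  blocked_signal_demo is a
hypothetical name used for illustration.

```perl
use strict;
use warnings;
use POSIX qw(:signal_h);

# Returns: (handler runs seen while blocked, 1 if the signal was pending
# while blocked, handler runs seen after unblocking)
sub blocked_signal_demo {
	my $got = 0;
	$SIG{USR1} = sub { $got++ };

	my $block = POSIX::SigSet->new(SIGUSR1);
	sigprocmask(SIG_BLOCK, $block) or die "sigprocmask: $!";

	kill('USR1', $$) or die "kill: $!";

	# the kernel holds the signal for us while it is blocked
	my $pending = POSIX::SigSet->new;
	sigpending($pending) or die "sigpending: $!";
	my $was_pending = $pending->ismember(SIGUSR1) ? 1 : 0;
	my $ran_blocked = $got;

	# unblocking lets the kernel deliver it; Perl then runs the handler
	sigprocmask(SIG_UNBLOCK, $block) or die "sigprocmask: $!";
	($ran_blocked, $was_pending, $got);
}

my ($ran_blocked, $was_pending, $ran_after) = blocked_signal_demo();
print "ran while blocked: $ran_blocked, pending: $was_pending, ",
	"ran after unblock: $ran_after\n";
```

With every signal blocked, the pending signal can be picked up through
a file descriptor in the event loop instead of through %SIG, which is
what closes the window between the signal check and the sleep.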
 MANIFEST                   |   2 +
 lib/PublicInbox/DS.pm      |   6 +-
 lib/PublicInbox/DSKQXS.pm  | 103 +++++++++++++++++----
 lib/PublicInbox/Daemon.pm  | 183 ++++++++++++++++++-------------------
 lib/PublicInbox/Sigfd.pm   |  63 +++++++++++++
 lib/PublicInbox/Syscall.pm |  42 ++++++++-
 t/ds-kqxs.t                |  28 ++++++
 t/sigfd.t                  |  65 +++++++++++++
 8 files changed, 376 insertions(+), 116 deletions(-)
 create mode 100644 lib/PublicInbox/Sigfd.pm
 create mode 100644 t/sigfd.t

diff --git a/MANIFEST b/MANIFEST
index a50c1246..098e656d 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -141,6 +141,7 @@ lib/PublicInbox/SearchIdxShard.pm
 lib/PublicInbox/SearchMsg.pm
 lib/PublicInbox/SearchThread.pm
 lib/PublicInbox/SearchView.pm
+lib/PublicInbox/Sigfd.pm
 lib/PublicInbox/SolverGit.pm
 lib/PublicInbox/Spamcheck.pm
 lib/PublicInbox/Spamcheck/Spamc.pm
@@ -266,6 +267,7 @@ t/replace.t
 t/reply.t
 t/search-thr-index.t
 t/search.t
+t/sigfd.t
 t/solve/0001-simple-mod.patch
 t/solve/0002-rename-with-modifications.patch
 t/solver_git.t
diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 7f7cb85d..17c640f4 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -53,6 +53,7 @@ our (
      $LoopTimeout,               # timeout of event loop in milliseconds
      $DoneInit,                  # if we've done the one-time module init yet
      @Timers,                    # timers
+     $in_loop,
      );
 
 Reset();
@@ -249,7 +250,7 @@ sub reap_pids {
 sub enqueue_reap ($) { push @$nextq, \&reap_pids };
 
 sub EpollEventLoop {
-    local $SIG{CHLD} = \&enqueue_reap;
+    local $in_loop = 1;
     while (1) {
         my @events;
         my $i;
@@ -628,8 +629,7 @@ sub shutdn ($) {
 # must be called with eval, PublicInbox::DS may not be loaded (see t/qspawn.t)
 sub dwaitpid ($$$) {
     my ($pid, $cb, $arg) = @_;
-    my $chld = $SIG{CHLD};
-    if (defined($chld) && $chld eq \&enqueue_reap) {
+    if ($in_loop) {
         push @$WaitPids, [ $pid, $cb, $arg ];
 
         # We could've just missed our SIGCHLD, cover it, here:
diff --git a/lib/PublicInbox/DSKQXS.pm b/lib/PublicInbox/DSKQXS.pm
index 84e146f8..a56079e2 100644
--- a/lib/PublicInbox/DSKQXS.pm
+++ b/lib/PublicInbox/DSKQXS.pm
@@ -8,18 +8,20 @@
 # like epoll to simplify the code in DS.pm.  This is NOT meant to be
 # an all encompassing emulation of epoll via IO::KQueue, but just to
 # support cases public-inbox-nntpd/httpd care about.
-# A pure-Perl version using syscall() is planned, and it should be
-# faster due to the lack of syscall overhead.
+#
+# It also implements signalfd(2) emulation via "tie".
+#
+# A pure-Perl version using syscall() is planned.
 package PublicInbox::DSKQXS;
 use strict;
 use warnings;
-use parent qw(IO::KQueue);
 use parent qw(Exporter);
+use Symbol qw(gensym);
 use IO::KQueue;
+use Errno qw(EAGAIN);
 use PublicInbox::Syscall qw(EPOLLONESHOT EPOLLIN EPOLLOUT EPOLLET
-	EPOLL_CTL_ADD EPOLL_CTL_MOD EPOLL_CTL_DEL);
+	EPOLL_CTL_ADD EPOLL_CTL_MOD EPOLL_CTL_DEL SFD_NONBLOCK);
 our @EXPORT_OK = qw(epoll_ctl epoll_wait);
-my $owner_pid = -1; # kqueue is close-on-fork (yes, fork, not exec)
 
 sub EV_DISPATCH () { 0x0080 }
 
@@ -41,29 +43,90 @@ sub kq_flag ($$) {
 
 sub new {
 	my ($class) = @_;
-	die 'non-singleton use not supported' if $owner_pid == $$;
-	$owner_pid = $$;
-	$class->SUPER::new;
+	bless { kq => IO::KQueue->new, owner_pid => $$ }, $class;
+}
+
+# returns a new instance which behaves like signalfd on Linux.
+# It's wasteful in that it uses another FD, but it simplifies
+# our epoll-oriented code.
+sub signalfd {
+	my ($class, $signo, $flags) = @_;
+	my $sym = gensym;
+	tie *$sym, $class, $signo, $flags; # calls TIEHANDLE
+	$sym
+}
+
+sub TIEHANDLE { # similar to signalfd()
+	my ($class, $signo, $flags) = @_;
+	my $self = $class->new;
+	$self->{timeout} = ($flags & SFD_NONBLOCK) ? 0 : -1;
+	my $kq = $self->{kq};
+	$kq->EV_SET($_, EVFILT_SIGNAL, EV_ADD) for @$signo;
+	$self;
+}
+
+sub READ { # called by sysread() for signalfd compatibility
+	my ($self, undef, $len, $off) = @_; # $_[1] = buf
+	die "bad args for signalfd read" if ($len % 128) // defined($off);
+	my $timeout = $self->{timeout};
+	my $sigbuf = $self->{sigbuf} //= [];
+	my $nr = $len / 128;
+	my $r = 0;
+	$_[1] = '';
+	do {
+		while ($nr--) {
+			my $signo = shift(@$sigbuf) or last;
+			# caller only cares about signalfd_siginfo.ssi_signo:
+			$_[1] .= pack('L', $signo) . ("\0" x 124);
+			$r += 128;
+		}
+		return $r if $r;
+		my @events = eval { $self->{kq}->kevent($timeout) };
+		# workaround https://rt.cpan.org/Ticket/Display.html?id=116615
+		if ($@) {
+			next if $@ =~ /Interrupted system call/;
+			die;
+		}
+		if (!scalar(@events) && $timeout == 0) {
+			$! = EAGAIN;
+			return;
+		}
+
+		# Grab the kevent.ident (signal number).  The kevent.data
+		# field shows coalesced signals, and maybe we'll use it
+		# in the future...
+		@$sigbuf = map { $_->[0] } @events;
+	} while (1);
 }
 
+# for fileno() calls in PublicInbox::DS
+sub FILENO { ${$_[0]->{kq}} }
+
 sub epoll_ctl {
 	my ($self, $op, $fd, $ev) = @_;
+	my $kq = $self->{kq};
 	if ($op == EPOLL_CTL_MOD) {
-		$self->EV_SET($fd, EVFILT_READ, kq_flag(EPOLLIN, $ev));
-		$self->EV_SET($fd, EVFILT_WRITE, kq_flag(EPOLLOUT, $ev));
+		$kq->EV_SET($fd, EVFILT_READ, kq_flag(EPOLLIN, $ev));
+		$kq->EV_SET($fd, EVFILT_WRITE, kq_flag(EPOLLOUT, $ev));
 	} elsif ($op == EPOLL_CTL_DEL) {
-		$self->EV_SET($fd, EVFILT_READ, EV_DISABLE);
-		$self->EV_SET($fd, EVFILT_WRITE, EV_DISABLE);
-	} else {
-		$self->EV_SET($fd, EVFILT_READ, EV_ADD|kq_flag(EPOLLIN, $ev));
-		$self->EV_SET($fd, EVFILT_WRITE, EV_ADD|kq_flag(EPOLLOUT, $ev));
+		$kq->EV_SET($fd, EVFILT_READ, EV_DISABLE);
+		$kq->EV_SET($fd, EVFILT_WRITE, EV_DISABLE);
+	} else { # EPOLL_CTL_ADD
+		$kq->EV_SET($fd, EVFILT_READ, EV_ADD|kq_flag(EPOLLIN, $ev));
+
+		# we call this blindly for read-only FDs such as tied
+		# DSKQXS (signalfd emulation) and Listeners
+		eval {
+			$kq->EV_SET($fd, EVFILT_WRITE, EV_ADD |
+							kq_flag(EPOLLOUT, $ev));
+		};
 	}
 	0;
 }
 
 sub epoll_wait {
 	my ($self, $maxevents, $timeout_msec, $events) = @_;
-	@$events = eval { $self->kevent($timeout_msec) };
+	@$events = eval { $self->{kq}->kevent($timeout_msec) };
 	if (my $err = $@) {
 		# workaround https://rt.cpan.org/Ticket/Display.html?id=116615
 		if ($err =~ /Interrupted system call/) {
@@ -76,11 +139,13 @@ sub epoll_wait {
 	scalar(@$events);
 }
 
+# kqueue is close-on-fork (not exec), so we must not close it
+# in forked processes:
 sub DESTROY {
 	my ($self) = @_;
-	if ($owner_pid == $$) {
-		POSIX::close($$self);
-		$owner_pid = -1;
+	my $kq = delete $self->{kq} or return;
+	if (delete($self->{owner_pid}) == $$) {
+		POSIX::close($$kq);
 	}
 }
 
diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index 0e3b95d2..c7a71ba0 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -15,9 +15,11 @@ use Cwd qw/abs_path/;
 STDOUT->autoflush(1);
 STDERR->autoflush(1);
 use PublicInbox::DS qw(now);
+use PublicInbox::Syscall qw(SFD_NONBLOCK);
 require PublicInbox::EvCleanup;
 require PublicInbox::Listener;
 require PublicInbox::ParentPipe;
+require PublicInbox::Sigfd;
 my @CMD;
 my ($set_user, $oldset, $newset);
 my (@cfg_listen, $stdout, $stderr, $group, $user, $pid_file, $daemonize);
@@ -74,12 +76,14 @@ sub accept_tls_opt ($) {
 	{ SSL_server => 1, SSL_startHandshake => 0, SSL_reuse_ctx => $ctx };
 }
 
+sub sig_setmask { sigprocmask(SIG_SETMASK, @_) or die "sigprocmask: $!" }
+
 sub daemon_prepare ($) {
 	my ($default_listen) = @_;
 	$oldset = POSIX::SigSet->new();
 	$newset = POSIX::SigSet->new();
 	$newset->fillset or die "fillset: $!";
-	sigprocmask(SIG_SETMASK, $newset, $oldset) or die "sigprocmask: $!";
+	sig_setmask($newset, $oldset);
 	@CMD = ($0, @ARGV);
 	my %opts = (
 		'l|listen=s' => \@cfg_listen,
@@ -252,30 +256,12 @@ sub daemonize () {
 	}
 }
 
-sub shrink_pipes {
-	if ($^O eq 'linux') { # 1031: F_SETPIPE_SZ, 4096: page size
-		fcntl($_, 1031, 4096) for @_;
-	}
-}
-
-sub worker_quit {
+sub worker_quit { # $_[0] = signal name or number (unused)
 	# killing again terminates immediately:
 	exit unless @listeners;
 
 	$_->close foreach @listeners; # call PublicInbox::DS::close
 	@listeners = ();
-
-	# create a lazy self-pipe which kicks us out of the EventLoop
-	# so DS::PostEventLoop can fire
-	if (pipe(my ($r, $w))) {
-		shrink_pipes($w);
-
-		# shrink_pipes == noop
-		PublicInbox::ParentPipe->new($r, *shrink_pipes);
-		close $w; # wake up from the event loop
-	} else {
-		warn "E: pipe failed ($!), quit unreliable\n";
-	}
 	my $proc_name;
 	my $warn = 0;
 	# drop idle connections and try to quit gracefully
@@ -398,7 +384,7 @@ processes when multiple service instances start.
 	@rv
 }
 
-sub upgrade () {
+sub upgrade { # $_[0] = signal name or number (unused)
 	if ($reexec_pid) {
 		warn "upgrade in-progress: $reexec_pid\n";
 		return;
@@ -453,7 +439,7 @@ sub upgrade_aborted ($) {
 	warn $@, "\n" if $@;
 }
 
-sub reap_children () {
+sub reap_children { # $_[0] = 'CHLD' or POSIX::SIGCHLD()
 	while (1) {
 		my $p = waitpid(-1, WNOHANG) or return;
 		if (defined $reexec_pid && $p == $reexec_pid) {
@@ -483,60 +469,50 @@ sub unlink_pid_file_safe_ish ($$) {
 
 sub master_loop {
 	pipe(my ($p0, $p1)) or die "failed to create parent-pipe: $!";
-	pipe(my ($r, $w)) or die "failed to create self-pipe: $!";
-	shrink_pipes($w, $p1);
-
-	IO::Handle::blocking($w, 0);
+	# 1031: F_SETPIPE_SZ, 4096: page size
+	fcntl($p1, 1031, 4096) if $^O eq 'linux';
 	my $set_workers = $worker_processes;
-	my @caught;
-	my $master_pid = $$;
-	foreach my $s (qw(HUP CHLD QUIT INT TERM USR1 USR2 TTIN TTOU WINCH)) {
-		$SIG{$s} = sub {
-			return if $$ != $master_pid;
-			push @caught, $s;
-			syswrite($w, '.');
-		};
-	}
-	sigprocmask(SIG_SETMASK, $oldset) or die "sigprocmask: $!";
 	reopen_logs();
-	# main loop
 	my $quit = 0;
-	while (1) {
-		while (my $s = shift @caught) {
-			if ($s eq 'USR1') {
-				reopen_logs();
-				kill_workers($s);
-			} elsif ($s eq 'USR2') {
-				upgrade();
-			} elsif ($s =~ /\A(?:QUIT|TERM|INT)\z/) {
-				exit if $quit++;
-				kill_workers($s);
-			} elsif ($s eq 'WINCH') {
-				if (-t STDIN || -t STDOUT || -t STDERR) {
-					warn
-"ignoring SIGWINCH since we are not daemonized\n";
-					$SIG{WINCH} = 'IGNORE';
-				} else {
-					$worker_processes = 0;
-				}
-			} elsif ($s eq 'HUP') {
-				$worker_processes = $set_workers;
-				kill_workers($s);
-			} elsif ($s eq 'TTIN') {
-				if ($set_workers > $worker_processes) {
-					++$worker_processes;
-				} else {
-					$worker_processes = ++$set_workers;
-				}
-			} elsif ($s eq 'TTOU') {
-				if ($set_workers > 0) {
-					$worker_processes = --$set_workers;
-				}
-			} elsif ($s eq 'CHLD') {
-				reap_children();
+	my $ignore_winch;
+	my $quit_cb = sub { exit if $quit++; kill_workers($_[0]) };
+	my $sig = {
+		USR1 => sub { reopen_logs(); kill_workers($_[0]); },
+		USR2 => \&upgrade,
+		QUIT => $quit_cb,
+		INT => $quit_cb,
+		TERM => $quit_cb,
+		WINCH => sub {
+			return if $ignore_winch;
+			if (-t STDIN || -t STDOUT || -t STDERR) {
+				$ignore_winch = 1;
+				warn <<EOF;
+ignoring SIGWINCH since we are not daemonized
+EOF
+			} else {
+				$worker_processes = 0;
 			}
-		}
-
+		},
+		HUP => sub {
+			$worker_processes = $set_workers;
+			kill_workers($_[0]);
+		},
+		TTIN => sub {
+			if ($set_workers > $worker_processes) {
+				++$worker_processes;
+			} else {
+				$worker_processes = ++$set_workers;
+			}
+		},
+		TTOU => sub {
+			$worker_processes = --$set_workers if $set_workers > 0;
+		},
+		CHLD => \&reap_children,
+	};
+	my $sigfd = PublicInbox::Sigfd->new($sig, 0);
+	local %SIG = (%SIG, %$sig) if !$sigfd;
+	sig_setmask($oldset) if !$sigfd;
+	while (1) { # main loop
 		my $n = scalar keys %pids;
 		if ($quit) {
 			exit if $n == 0;
@@ -549,22 +525,29 @@ sub master_loop {
 			}
 			$n = $worker_processes;
 		}
-		sigprocmask(SIG_SETMASK, $newset) or die "sigprocmask: $!";
-		foreach my $i ($n..($worker_processes - 1)) {
-			my $pid = fork;
-			if (!defined $pid) {
-				warn "failed to fork worker[$i]: $!\n";
-			} elsif ($pid == 0) {
-				$set_user->() if $set_user;
-				return $p0; # run normal work code
-			} else {
-				warn "PID=$pid is worker[$i]\n";
-				$pids{$pid} = $i;
+		my $want = $worker_processes - 1;
+		if ($n <= $want) {
+			sig_setmask($newset) if !$sigfd;
+			for my $i ($n..$want) {
+				my $pid = fork;
+				if (!defined $pid) {
+					warn "failed to fork worker[$i]: $!\n";
+				} elsif ($pid == 0) {
+					$set_user->() if $set_user;
+					return $p0; # run normal work code
+				} else {
+					warn "PID=$pid is worker[$i]\n";
+					$pids{$pid} = $i;
+				}
 			}
+			sig_setmask($oldset) if !$sigfd;
+		}
+
+		if ($sigfd) { # Linux and IO::KQueue users:
+			$sigfd->wait_once;
+		} else { # wake up every second
+			sleep(1);
 		}
-		sigprocmask(SIG_SETMASK, $oldset) or die "sigprocmask: $!";
-		# just wait on signal events here:
-		sysread($r, my $buf, 8);
 	}
 	exit # never gets here, just for documentation
 }
@@ -606,6 +589,18 @@ sub daemon_loop ($$$$) {
 			$nntpd->{accept_tls} = $v;
 		}
 	}
+	my $sig = {
+		HUP => $refresh,
+		INT => \&worker_quit,
+		QUIT => \&worker_quit,
+		TERM => \&worker_quit,
+		TTIN => 'IGNORE',
+		TTOU => 'IGNORE',
+		USR1 => \&reopen_logs,
+		USR2 => 'IGNORE',
+		WINCH => 'IGNORE',
+		CHLD => \&PublicInbox::DS::enqueue_reap,
+	};
 	my $parent_pipe;
 	if ($worker_processes > 0) {
 		$refresh->(); # preload by default
@@ -614,16 +609,11 @@ sub daemon_loop ($$$$) {
 	} else {
 		reopen_logs();
 		$set_user->() if $set_user;
-		$SIG{USR2} = sub { worker_quit() if upgrade() };
+		$sig->{USR2} = sub { worker_quit() if upgrade() };
 		$refresh->();
 	}
 	$uid = $gid = undef;
 	reopen_logs();
-	$SIG{QUIT} = $SIG{INT} = $SIG{TERM} = *worker_quit;
-	$SIG{USR1} = *reopen_logs;
-	$SIG{HUP} = $refresh;
-	$SIG{CHLD} = 'DEFAULT';
-	$SIG{$_} = 'IGNORE' for qw(USR2 TTIN TTOU WINCH);
 	@listeners = map {;
 		my $tls_cb = $post_accept{sockname($_)};
 
@@ -634,7 +624,14 @@ sub daemon_loop ($$$$) {
 		# this calls epoll_create:
 		PublicInbox::Listener->new($_, $tls_cb || $post_accept)
 	} @listeners;
-	sigprocmask(SIG_SETMASK, $oldset) or die "sigprocmask: $!";
+	my $sigfd = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
+	local %SIG = (%SIG, %$sig) if !$sigfd;
+	if (!$sigfd) {
+		# wake up every second to accept signals if we don't
+		# have signalfd or IO::KQueue:
+		sig_setmask($oldset);
+		PublicInbox::DS->SetLoopTimeout(1000);
+	}
 	PublicInbox::DS->EventLoop;
 	$parent_pipe = undef;
 }
diff --git a/lib/PublicInbox/Sigfd.pm b/lib/PublicInbox/Sigfd.pm
new file mode 100644
index 00000000..ec5d7145
--- /dev/null
+++ b/lib/PublicInbox/Sigfd.pm
@@ -0,0 +1,63 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+package PublicInbox::Sigfd;
+use strict;
+use parent qw(PublicInbox::DS);
+use fields qw(sig); # hashref similar to %SIG, but signal numbers as keys
+use PublicInbox::Syscall qw(signalfd EPOLLIN EPOLLET SFD_NONBLOCK);
+use POSIX ();
+use IO::Handle ();
+
+# returns a coderef to unblock signals if neither signalfd or kqueue
+# are available.
+sub new {
+	my ($class, $sig, $flags) = @_;
+	my $self = fields::new($class);
+	my %signo = map {;
+		my $cb = $sig->{$_};
+		my $num = ($_ eq 'WINCH' && $^O =~ /linux|bsd/i) ? 28 : do {
+			my $m = "SIG$_";
+			POSIX->$m;
+		};
+		$num => $cb;
+	} keys %$sig;
+	my $io;
+	my $fd = signalfd(-1, [keys %signo], $flags);
+	if (defined $fd && $fd >= 0) {
+		$io = IO::Handle->new_from_fd($fd, 'r+');
+	} elsif (eval { require PublicInbox::DSKQXS }) {
+		$io = PublicInbox::DSKQXS->signalfd([keys %signo], $flags);
+	} else {
+		return; # wake up every second to check for signals
+	}
+	if ($flags & SFD_NONBLOCK) { # it can go into the event loop
+		$self->SUPER::new($io, EPOLLIN | EPOLLET);
+	} else { # master main loop
+		$self->{sock} = $io;
+	}
+	$self->{sig} = \%signo;
+	$self;
+}
+
+# PublicInbox::Daemon in master main loop (blocking)
+sub wait_once ($) {
+	my ($self) = @_;
+	my $r = sysread($self->{sock}, my $buf, 128 * 64);
+	if (defined($r)) {
+		while (1) {
+			my $sig = unpack('L', $buf);
+			my $cb = $self->{sig}->{$sig};
+			$cb->($sig) if $cb ne 'IGNORE';
+			return $r if length($buf) == 128;
+			$buf = substr($buf, 128);
+		}
+	}
+	$r;
+}
+
+# called by PublicInbox::DS in epoll_wait loop
+sub event_step {
+	while (wait_once($_[0])) {} # non-blocking
+}
+
+1;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index da8a6c86..487013d5 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -24,7 +24,8 @@ $VERSION     = "0.25";
 @EXPORT_OK   = qw(epoll_ctl epoll_create epoll_wait
                   EPOLLIN EPOLLOUT EPOLLET
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
-                  EPOLLONESHOT EPOLLEXCLUSIVE);
+                  EPOLLONESHOT EPOLLEXCLUSIVE
+                  signalfd SFD_NONBLOCK);
 %EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
                              EPOLLIN EPOLLOUT
                              EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
@@ -42,6 +43,11 @@ use constant EPOLLET => (1 << 31);
 use constant EPOLL_CTL_ADD => 1;
 use constant EPOLL_CTL_DEL => 2;
 use constant EPOLL_CTL_MOD => 3;
+use constant {
+	SFD_CLOEXEC => 02000000,
+	SFD_NONBLOCK => 00004000,
+};
+
 
 our $loaded_syscall = 0;
 
@@ -63,6 +69,7 @@ our (
      $SYS_epoll_create,
      $SYS_epoll_ctl,
      $SYS_epoll_wait,
+     $SYS_signalfd4,
      );
 
 our $no_deprecated = 0;
@@ -88,63 +95,75 @@ if ($^O eq "linux") {
         $SYS_epoll_create = 254;
         $SYS_epoll_ctl    = 255;
         $SYS_epoll_wait   = 256;
+        $SYS_signalfd4 = 327;
     } elsif ($machine eq "x86_64") {
         $SYS_epoll_create = 213;
         $SYS_epoll_ctl    = 233;
         $SYS_epoll_wait   = 232;
+        $SYS_signalfd4 = 289;
     } elsif ($machine =~ m/^parisc/) {
         $SYS_epoll_create = 224;
         $SYS_epoll_ctl    = 225;
         $SYS_epoll_wait   = 226;
         $u64_mod_8        = 1;
+        $SYS_signalfd4 = 309;
     } elsif ($machine =~ m/^ppc64/) {
         $SYS_epoll_create = 236;
         $SYS_epoll_ctl    = 237;
         $SYS_epoll_wait   = 238;
         $u64_mod_8        = 1;
+        $SYS_signalfd4 = 313;
     } elsif ($machine eq "ppc") {
         $SYS_epoll_create = 236;
         $SYS_epoll_ctl    = 237;
         $SYS_epoll_wait   = 238;
         $u64_mod_8        = 1;
+        $SYS_signalfd4 = 313;
     } elsif ($machine =~ m/^s390/) {
         $SYS_epoll_create = 249;
         $SYS_epoll_ctl    = 250;
         $SYS_epoll_wait   = 251;
         $u64_mod_8        = 1;
+        $SYS_signalfd4 = 322;
     } elsif ($machine eq "ia64") {
         $SYS_epoll_create = 1243;
         $SYS_epoll_ctl    = 1244;
         $SYS_epoll_wait   = 1245;
         $u64_mod_8        = 1;
+        $SYS_signalfd4 = 289;
     } elsif ($machine eq "alpha") {
         # natural alignment, ints are 32-bits
         $SYS_epoll_create = 407;
         $SYS_epoll_ctl    = 408;
         $SYS_epoll_wait   = 409;
         $u64_mod_8        = 1;
+        $SYS_signalfd4 = 484;
     } elsif ($machine eq "aarch64") {
         $SYS_epoll_create = 20;  # (sys_epoll_create1)
         $SYS_epoll_ctl    = 21;
         $SYS_epoll_wait   = 22;  # (sys_epoll_pwait)
         $u64_mod_8        = 1;
         $no_deprecated    = 1;
+        $SYS_signalfd4 = 74;
     } elsif ($machine =~ m/arm(v\d+)?.*l/) {
         # ARM OABI
         $SYS_epoll_create = 250;
         $SYS_epoll_ctl    = 251;
         $SYS_epoll_wait   = 252;
         $u64_mod_8        = 1;
+        $SYS_signalfd4 = 355;
     } elsif ($machine =~ m/^mips64/) {
         $SYS_epoll_create = 5207;
         $SYS_epoll_ctl    = 5208;
         $SYS_epoll_wait   = 5209;
         $u64_mod_8        = 1;
+        $SYS_signalfd4 = 5283;
     } elsif ($machine =~ m/^mips/) {
         $SYS_epoll_create = 4248;
         $SYS_epoll_ctl    = 4249;
         $SYS_epoll_wait   = 4250;
         $u64_mod_8        = 1;
+        $SYS_signalfd4 = 4324;
     } else {
         # as a last resort, try using the *.ph files which may not
         # exist or may be wrong
@@ -152,6 +171,11 @@ if ($^O eq "linux") {
         $SYS_epoll_create = eval { &SYS_epoll_create; } || 0;
         $SYS_epoll_ctl    = eval { &SYS_epoll_ctl;    } || 0;
         $SYS_epoll_wait   = eval { &SYS_epoll_wait;   } || 0;
+
+	# Note: do NOT add new syscalls to depend on *.ph, here.
+	# Better to miss syscalls (so we can fallback to IO::Poll)
+	# than to use wrong ones, since the names are not stable
+	# (at least not on FreeBSD), if the actual numbers are.
     }
 
     if ($u64_mod_8) {
@@ -228,6 +252,22 @@ sub epoll_wait_mod8 {
     return $ct;
 }
 
+sub signalfd ($$$) {
+	my ($fd, $signos, $flags) = @_;
+	if ($SYS_signalfd4) {
+		# Not sure if there's a way to get pack/unpack to get the
+		# contents of POSIX::SigSet to a buffer, but prepping the
+		# bitmap like one would for select() works:
+		my $buf = "\0" x 8;
+		vec($buf, $_ - 1, 1) = 1 for @$signos;
+
+		syscall($SYS_signalfd4, $fd, $buf, 8, $flags|SFD_CLOEXEC);
+	} else {
+		$! = ENOSYS;
+		undef;
+	}
+}
+
 1;
 
 =head1 WARRANTY
diff --git a/t/ds-kqxs.t b/t/ds-kqxs.t
index 785570c3..43b6333f 100644
--- a/t/ds-kqxs.t
+++ b/t/ds-kqxs.t
@@ -10,5 +10,33 @@ unless (eval { require IO::KQueue }) {
 				: "no IO::KQueue, skipping $0: $@";
 	plan skip_all => $m;
 }
+
+if ('ensure nested kqueue works for signalfd emulation') {
+	require POSIX;
+	my $new = POSIX::SigSet->new(POSIX::SIGHUP());
+	my $old = POSIX::SigSet->new;
+	my $hup = 0;
+	local $SIG{HUP} = sub { $hup++ };
+	POSIX::sigprocmask(POSIX::SIG_SETMASK(), $new, $old) or die;
+	my $kqs = IO::KQueue->new or die;
+	$kqs->EV_SET(POSIX::SIGHUP(), IO::KQueue::EVFILT_SIGNAL(),
+			IO::KQueue::EV_ADD());
+	kill('HUP', $$) or die;
+	my @events = $kqs->kevent(3000);
+	is(scalar(@events), 1, 'got one event');
+	is($events[0]->[0], POSIX::SIGHUP(), 'got SIGHUP');
+	my $parent = IO::KQueue->new or die;
+	my $kqfd = $$kqs;
+	$parent->EV_SET($kqfd, IO::KQueue::EVFILT_READ(), IO::KQueue::EV_ADD());
+	kill('HUP', $$) or die;
+	@events = $parent->kevent(3000);
+	is(scalar(@events), 1, 'got one event');
+	is($events[0]->[0], $kqfd, 'got kqfd');
+	is($hup, 0, '$SIG{HUP} did not fire');
+	POSIX::sigprocmask(POSIX::SIG_SETMASK(), $old) or die;
+	defined(POSIX::close($kqfd)) or die;
+	defined(POSIX::close($$parent)) or die;
+}
+
 local $ENV{TEST_IOPOLLER} = 'PublicInbox::DSKQXS';
 require './t/ds-poll.t';
diff --git a/t/sigfd.t b/t/sigfd.t
new file mode 100644
index 00000000..34f30de8
--- /dev/null
+++ b/t/sigfd.t
@@ -0,0 +1,65 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+use strict;
+use Test::More;
+use IO::Handle;
+use POSIX qw(:signal_h);
+use Errno qw(ENOSYS);
+use PublicInbox::Syscall qw(SFD_NONBLOCK);
+require_ok 'PublicInbox::Sigfd';
+
+SKIP: {
+	if ($^O ne 'linux' && !eval { require IO::KQueue }) {
+		skip 'signalfd requires Linux or IO::KQueue to emulate', 10;
+	}
+	my $new = POSIX::SigSet->new;
+	$new->fillset or die "sigfillset: $!";
+	my $old = POSIX::SigSet->new;
+	sigprocmask(SIG_SETMASK, $new, $old) or die "sigprocmask $!";
+	my $hit = {};
+	my $sig = {};
+	local $SIG{HUP} = sub { $hit->{HUP}->{normal}++ };
+	local $SIG{TERM} = sub { $hit->{TERM}->{normal}++ };
+	local $SIG{INT} = sub { $hit->{INT}->{normal}++ };
+	for my $s (qw(HUP TERM INT)) {
+		$sig->{$s} = sub { $hit->{$s}->{sigfd}++ };
+	}
+	my $sigfd = PublicInbox::Sigfd->new($sig, 0);
+	if ($sigfd) {
+		require PublicInbox::DS;
+		ok($sigfd, 'Sigfd->new works');
+		kill('HUP', $$) or die "kill $!";
+		kill('INT', $$) or die "kill $!";
+		my $fd = fileno($sigfd->{sock});
+		ok($fd >= 0, 'fileno(Sigfd->{sock}) works');
+		my $rvec = '';
+		vec($rvec, $fd, 1) = 1;
+		is(select($rvec, undef, undef, undef), 1, 'select() works');
+		ok($sigfd->wait_once, 'wait_once reported success');
+		for my $s (qw(HUP INT)) {
+			is($hit->{$s}->{sigfd}, 1, "sigfd fired $s");
+			is($hit->{$s}->{normal}, undef,
+				'normal $SIG{$s} not fired');
+		}
+		$sigfd = undef;
+
+		my $nbsig = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
+		ok($nbsig, 'Sigfd->new SFD_NONBLOCK works');
+		is($nbsig->wait_once, undef, 'nonblocking ->wait_once');
+		ok($! == Errno::EAGAIN, 'got EAGAIN');
+		kill('HUP', $$) or die "kill $!";
+		PublicInbox::DS->SetPostLoopCallback(sub {}); # loop once
+		PublicInbox::DS->EventLoop;
+		is($hit->{HUP}->{sigfd}, 2, 'HUP sigfd fired in event loop');
+		kill('TERM', $$) or die "kill $!";
+		kill('HUP', $$) or die "kill $!";
+		PublicInbox::DS->EventLoop;
+		PublicInbox::DS->Reset;
+		is($hit->{TERM}->{sigfd}, 1, 'TERM sigfd fired in event loop');
+		is($hit->{HUP}->{sigfd}, 3, 'HUP sigfd fired in event loop');
+	} else {
+		skip('signalfd disabled?', 10);
+	}
+	sigprocmask(SIG_SETMASK, $old) or die "sigprocmask $!";
+}
+
+done_testing;

^ permalink raw reply related	[relevance 37%]
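
The wait_once loop in the Sigfd.pm hunk above walks a buffer of
fixed-size records read from the signalfd.  A minimal sketch of that
parsing step, run against a hand-built buffer instead of a real
signalfd, might look like the following.  It assumes Linux's 128-byte
struct signalfd_siginfo with ssi_signo in the first 4 bytes;
dispatch_siginfo and the fake buffer are hypothetical names used only
for illustration and are not part of the patch.

```perl
use strict;
use warnings;

# Assumption: sizeof(struct signalfd_siginfo) is 128 on Linux and the
# first 4 bytes hold ssi_signo as a native-endian uint32_t.
my $SIGINFO_SIZE = 128;

# Hypothetical helper: walk every complete record in $buf and invoke
# the matching callback, skipping non-coderef entries like 'IGNORE'.
sub dispatch_siginfo {
	my ($buf, $handlers) = @_;
	my @fired;
	my $off = 0;
	while ($off + $SIGINFO_SIZE <= length($buf)) {
		my $signo = unpack('L', substr($buf, $off, 4));
		$off += $SIGINFO_SIZE;
		my $cb = $handlers->{$signo};
		next unless ref($cb) eq 'CODE';
		$cb->($signo);
		push @fired, $signo;
	}
	@fired;
}

# Two fake records: signal 1 (HUP) and signal 15 (TERM)
my $fake = join('', map {
	pack('L', $_) . ("\0" x ($SIGINFO_SIZE - 4))
} (1, 15));

my %seen;
my @fired = dispatch_siginfo($fake, {
	1 => sub { $seen{$_[0]}++ },
	15 => 'IGNORE',
});
print "fired: @fired\n";
```

Keying the handler table by signal number rather than name mirrors
the %signo hash built in Sigfd->new, so no name lookup is needed per
record.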

* [PATCH 6/6] syscall: modernize away from pre-Perl-5.6 conventions
  @ 2020-01-05 23:23 87% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2020-01-05 23:23 UTC (permalink / raw)
  To: meta

"use vars" was superseded by "our" in Perl 5.6, and we
can "use parent qw(Exporter)" instead of manipulating
@ISA directly (or using the bigger "use base ...").

While we're at it, avoid multiple invocations of constant->import
by passing a hashref as a "use" parameter.
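
As a rough illustration of the conventions described above, here is a
self-contained sketch.  My::Flags and its FLAG_* names are
hypothetical and not part of PublicInbox::Syscall; the package is
defined inline so the snippet runs as a single file, which is why the
import happens at run time and the constants are called with
parentheses.

```perl
use strict;
use warnings;

package My::Flags { # hypothetical module, inlined for the example
	use parent qw(Exporter); # replaces "require Exporter; @ISA = ..."
	our @EXPORT_OK = qw(FLAG_IN FLAG_OUT FLAG_ET); # replaces "use vars"

	# one constant->import call for the whole set
	use constant {
		FLAG_IN => 1,
		FLAG_OUT => 4,
		FLAG_ET => (1 << 31),
	};
}

# A real module would live in its own file and be loaded with
# "use My::Flags qw(...)".  Here @EXPORT_OK is only populated at
# run time, so we import at run time, too.
My::Flags->import(qw(FLAG_IN FLAG_ET));
my $mask = FLAG_IN() | FLAG_ET();
printf "mask=0x%x\n", $mask;
```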
---
 lib/PublicInbox/Syscall.pm | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 487013d5..c66ea51b 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -13,42 +13,39 @@
 # License or the Artistic License, as specified in the Perl README file.
 package PublicInbox::Syscall;
 use strict;
+use parent qw(Exporter);
 use POSIX qw(ENOSYS SEEK_CUR);
 use Config;
 
-require Exporter;
-use vars qw(@ISA @EXPORT_OK %EXPORT_TAGS $VERSION);
-
-$VERSION     = "0.25";
-@ISA         = qw(Exporter);
-@EXPORT_OK   = qw(epoll_ctl epoll_create epoll_wait
+# $VERSION = '0.25'; # Sys::Syscall version
+our @EXPORT_OK = qw(epoll_ctl epoll_create epoll_wait
                   EPOLLIN EPOLLOUT EPOLLET
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                   EPOLLONESHOT EPOLLEXCLUSIVE
                   signalfd SFD_NONBLOCK);
-%EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
+our %EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
                              EPOLLIN EPOLLOUT
                              EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                              EPOLLONESHOT EPOLLEXCLUSIVE)],
                 );
 
-use constant EPOLLIN       => 1;
-use constant EPOLLOUT      => 4;
-# use constant EPOLLERR      => 8;
-# use constant EPOLLHUP      => 16;
-# use constant EPOLLRDBAND   => 128;
-use constant EPOLLEXCLUSIVE => (1 << 28);
-use constant EPOLLONESHOT => (1 << 30);
-use constant EPOLLET => (1 << 31);
-use constant EPOLL_CTL_ADD => 1;
-use constant EPOLL_CTL_DEL => 2;
-use constant EPOLL_CTL_MOD => 3;
 use constant {
+	EPOLLIN => 1,
+	EPOLLOUT => 4,
+	# EPOLLERR => 8,
+	# EPOLLHUP => 16,
+	# EPOLLRDBAND => 128,
+	EPOLLEXCLUSIVE => (1 << 28),
+	EPOLLONESHOT => (1 << 30),
+	EPOLLET => (1 << 31),
+	EPOLL_CTL_ADD => 1,
+	EPOLL_CTL_DEL => 2,
+	EPOLL_CTL_MOD => 3,
+
 	SFD_CLOEXEC => 02000000,
 	SFD_NONBLOCK => 00004000,
 };
 
-
 our $loaded_syscall = 0;
 
 sub _load_syscall {

^ permalink raw reply related	[relevance 87%]

* [PATCH] syscall: support Linux x32 ABI
@ 2020-02-06  8:49 80% Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2020-02-06  8:49 UTC (permalink / raw)
  To: meta

The x32 ABI allows users to take advantage of the extra
registers on x86-64 without the bloat of 64-bit pointers and
longs.

This ought to be significant since Perl was designed when 32-bit
was prevalent; and the common structs for ops, hashes, scalars,
and arrays use longs (SSize_t/Size_t) for things which should
never need 64 bits when processing emails.

Debian's x32 port seems to work quite nicely under a chroot
on an amd64 Linux system.  All tests pass under x32, now.
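
For reference, the four x32 numbers added below are the x86_64
numbers with the kernel's __X32_SYSCALL_BIT (0x40000000) set.  A
quick sketch to check that arithmetic follows; the %x86_64 table is
copied from the existing x86_64 branch.  This holds for these four
syscalls, and it is only an assumption that it holds for any others,
since some x32 syscalls have their own numbers.

```perl
use strict;
use warnings;

# __X32_SYSCALL_BIT from the Linux kernel headers
my $X32_SYSCALL_BIT = 0x40000000;

# numbers from the x86_64 branch of PublicInbox::Syscall
my %x86_64 = (
	epoll_create => 213,
	epoll_ctl => 233,
	epoll_wait => 232,
	signalfd4 => 289,
);

# derive the x32 numbers by setting the x32 bit
my %x32 = map { $_ => ($x86_64{$_} | $X32_SYSCALL_BIT) } keys %x86_64;
printf "%-12s %d\n", $_, $x32{$_} for sort keys %x32;
```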
---
 MANIFEST                   |  1 +
 lib/PublicInbox/Syscall.pm |  9 +++++++--
 t/epoll.t                  | 20 ++++++++++++++++++++
 3 files changed, 28 insertions(+), 2 deletions(-)
 create mode 100644 t/epoll.t

diff --git a/MANIFEST b/MANIFEST
index 5eb5d53a..61747daa 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -217,6 +217,7 @@ t/ds-leak.t
 t/ds-poll.t
 t/edit.t
 t/emergency.t
+t/epoll.t
 t/fail-bin/spamc
 t/feed.t
 t/filter_base.t
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index c66ea51b..29d60ed1 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -78,9 +78,9 @@ if ($^O eq "linux") {
     my $u64_mod_8 = 0;
 
     # if we're running on an x86_64 kernel, but a 32-bit process,
-    # we need to use the i386 syscall numbers.
+    # we need to use the x32 or i386 syscall numbers.
     if ($machine eq "x86_64" && $Config{ptrsize} == 4) {
-        $machine = "i386";
+        $machine = $Config{cppsymbols} =~ /\b__ILP32__=1\b/ ? 'x32' : 'i386';
     }
 
     # Similarly for mips64 vs mips
@@ -98,6 +98,11 @@ if ($^O eq "linux") {
         $SYS_epoll_ctl    = 233;
         $SYS_epoll_wait   = 232;
         $SYS_signalfd4 = 289;
+    } elsif ($machine eq 'x32') {
+        $SYS_epoll_create = 1073742037;
+        $SYS_epoll_ctl = 1073742057;
+        $SYS_epoll_wait = 1073742056;
+        $SYS_signalfd4 = 1073742113;
     } elsif ($machine =~ m/^parisc/) {
         $SYS_epoll_create = 224;
         $SYS_epoll_ctl    = 225;
diff --git a/t/epoll.t b/t/epoll.t
new file mode 100644
index 00000000..56ade672
--- /dev/null
+++ b/t/epoll.t
@@ -0,0 +1,20 @@
+use strict;
+use Test::More;
+use IO::Handle;
+use PublicInbox::Syscall qw(:epoll);
+plan skip_all => 'not Linux' if $^O ne 'linux';
+my $epfd = epoll_create();
+ok($epfd >= 0, 'epoll_create');
+my $hnd = IO::Handle->new_from_fd($epfd, 'r+'); # close on exit
+
+pipe(my ($r, $w)) or die "pipe: $!";
+is(epoll_ctl($epfd, EPOLL_CTL_ADD, fileno($w), EPOLLOUT), 0,
+    'epoll_ctl socket EPOLLOUT');
+
+my @events;
+is(epoll_wait($epfd, 100, 10000, \@events), 1, 'epoll_wait returns');
+is_deeply(\@events, [ [ fileno($w), EPOLLOUT ] ], 'got expected events');
+close $w;
+is(epoll_wait($epfd, 100, 0, \@events), 0, 'epoll_wait timeout');
+
+done_testing;

^ permalink raw reply related	[relevance 80%]

* [PATCH] syscall: support sparc64 (and maybe other big-endian systems)
@ 2020-08-07 10:15 68% Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2020-08-07 10:15 UTC (permalink / raw)
  To: meta

Thanks to the GCC compile farm project, we can wire up syscalls
for sparc64 and set system-specific SFD_* constants properly.

I've FINALLY figured out how to use POSIX::SigSet to generate
a usable buffer for the syscall perlfunc.  This is required
for endian-neutral behavior and relevant to sparc64, at least.

There's no need for signalfd-related stuff to be constants,
either.  signalfd initialization is never a hot path and a stub
subroutine for constants uses several KB of memory in the
interpreter.

We'll drop the needless SEEK_CUR import while we're importing
O_NONBLOCK, too.
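
To illustrate why the old vec()-based bitmap was not endian-neutral,
here is a small sketch.  It assumes the kernel reads the 8-byte mask
as one native-endian 64-bit word; sigmask_vec and sigmask_words are
hypothetical helpers, and the mask is built from two 32-bit halves
so the snippet also runs on perls without 64-bit integers.

```perl
use strict;
use warnings;

# the old approach: a select()-style bitmap, low bits first per byte
sub sigmask_vec {
	my $buf = "\0" x 8;
	vec($buf, $_ - 1, 1) = 1 for @_;
	$buf;
}

# the same mask as (low, high) 32-bit halves of one 64-bit word
sub sigmask_words {
	my ($lo, $hi) = (0, 0);
	for my $signo (@_) {
		my $bit = $signo - 1;
		if ($bit < 32) {
			$lo |= 1 << $bit;
		} else {
			$hi |= 1 << ($bit - 32);
		}
	}
	($lo, $hi);
}

my @signos = (1, 15); # SIGHUP and SIGTERM on Linux
my ($lo, $hi) = sigmask_words(@signos);
my $vec = sigmask_vec(@signos);
my $le = pack('V2', $lo, $hi); # little-endian 64-bit layout
my $be = pack('N2', $hi, $lo); # big-endian 64-bit layout
printf "%-3s %s\n", @$_ for (
	[ vec => unpack('H*', $vec) ],
	[ le => unpack('H*', $le) ],
	[ be => unpack('H*', $be) ],
);
```

The vec() buffer only matches the little-endian layout, which is
consistent with the old code working on x86 but not on sparc64.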
---
 lib/PublicInbox/DSKQXS.pm  |  4 ++--
 lib/PublicInbox/Daemon.pm  |  4 ++--
 lib/PublicInbox/Sigfd.pm   |  4 ++--
 lib/PublicInbox/Syscall.pm | 29 +++++++++++++++++------------
 script/public-inbox-watch  |  4 ++--
 t/sigfd.t                  |  6 +++---
 6 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/lib/PublicInbox/DSKQXS.pm b/lib/PublicInbox/DSKQXS.pm
index 35cdecda8..d1d3fe60d 100644
--- a/lib/PublicInbox/DSKQXS.pm
+++ b/lib/PublicInbox/DSKQXS.pm
@@ -18,7 +18,7 @@ use Symbol qw(gensym);
 use IO::KQueue;
 use Errno qw(EAGAIN);
 use PublicInbox::Syscall qw(EPOLLONESHOT EPOLLIN EPOLLOUT EPOLLET
-	EPOLL_CTL_ADD EPOLL_CTL_MOD EPOLL_CTL_DEL SFD_NONBLOCK);
+	EPOLL_CTL_ADD EPOLL_CTL_MOD EPOLL_CTL_DEL $SFD_NONBLOCK);
 our @EXPORT_OK = qw(epoll_ctl epoll_wait);
 
 sub EV_DISPATCH () { 0x0080 }
@@ -57,7 +57,7 @@ sub signalfd {
 sub TIEHANDLE { # similar to signalfd()
 	my ($class, $signo, $flags) = @_;
 	my $self = $class->new;
-	$self->{timeout} = ($flags & SFD_NONBLOCK) ? 0 : -1;
+	$self->{timeout} = ($flags & $SFD_NONBLOCK) ? 0 : -1;
 	my $kq = $self->{kq};
 	$kq->EV_SET($_, EVFILT_SIGNAL, EV_ADD) for @$signo;
 	$self;
diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index ab0c2226e..454751834 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -15,7 +15,7 @@ use Cwd qw/abs_path/;
 STDOUT->autoflush(1);
 STDERR->autoflush(1);
 use PublicInbox::DS qw(now);
-use PublicInbox::Syscall qw(SFD_NONBLOCK);
+use PublicInbox::Syscall qw($SFD_NONBLOCK);
 require PublicInbox::Listener;
 require PublicInbox::ParentPipe;
 use PublicInbox::Sigfd;
@@ -622,7 +622,7 @@ sub daemon_loop ($$$$) {
 		# this calls epoll_create:
 		PublicInbox::Listener->new($_, $tls_cb || $post_accept)
 	} @listeners;
-	my $sigfd = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
+	my $sigfd = PublicInbox::Sigfd->new($sig, $SFD_NONBLOCK);
 	local %SIG = (%SIG, %$sig) if !$sigfd;
 	if (!$sigfd) {
 		# wake up every second to accept signals if we don't
diff --git a/lib/PublicInbox/Sigfd.pm b/lib/PublicInbox/Sigfd.pm
index bf91bb377..5d61e6308 100644
--- a/lib/PublicInbox/Sigfd.pm
+++ b/lib/PublicInbox/Sigfd.pm
@@ -6,7 +6,7 @@
 package PublicInbox::Sigfd;
 use strict;
 use parent qw(PublicInbox::DS);
-use PublicInbox::Syscall qw(signalfd EPOLLIN EPOLLET SFD_NONBLOCK);
+use PublicInbox::Syscall qw(signalfd EPOLLIN EPOLLET $SFD_NONBLOCK);
 use POSIX qw(:signal_h);
 use IO::Handle ();
 
@@ -33,7 +33,7 @@ sub new {
 	} else {
 		return; # wake up every second to check for signals
 	}
-	if ($flags & SFD_NONBLOCK) { # it can go into the event loop
+	if ($flags & $SFD_NONBLOCK) { # it can go into the event loop
 		$self->SUPER::new($io, EPOLLIN | EPOLLET);
 	} else { # master main loop
 		$self->{sock} = $io;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index ce6b0f3af..e4f00a2a2 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -14,7 +14,7 @@
 package PublicInbox::Syscall;
 use strict;
 use parent qw(Exporter);
-use POSIX qw(ENOSYS SEEK_CUR);
+use POSIX qw(ENOSYS O_NONBLOCK);
 use Config;
 
 # $VERSION = '0.25'; # Sys::Syscall version
@@ -22,7 +22,7 @@ our @EXPORT_OK = qw(epoll_ctl epoll_create epoll_wait
                   EPOLLIN EPOLLOUT EPOLLET
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                   EPOLLONESHOT EPOLLEXCLUSIVE
-                  signalfd SFD_NONBLOCK);
+                  signalfd $SFD_NONBLOCK);
 our %EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
                              EPOLLIN EPOLLOUT
                              EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
@@ -41,9 +41,6 @@ use constant {
 	EPOLL_CTL_ADD => 1,
 	EPOLL_CTL_DEL => 2,
 	EPOLL_CTL_MOD => 3,
-
-	SFD_CLOEXEC => 02000000,
-	SFD_NONBLOCK => 00004000,
 };
 
 our $loaded_syscall = 0;
@@ -69,6 +66,8 @@ our (
      $SYS_signalfd4,
      );
 
+my $SFD_CLOEXEC = 02000000; # Perl does not expose O_CLOEXEC
+our $SFD_NONBLOCK = O_NONBLOCK;
 our $no_deprecated = 0;
 
 if ($^O eq "linux") {
@@ -103,6 +102,13 @@ if ($^O eq "linux") {
         $SYS_epoll_ctl = 1073742057;
         $SYS_epoll_wait = 1073742056;
         $SYS_signalfd4 = 1073742113;
+    } elsif ($machine eq 'sparc64') {
+	$SYS_epoll_create = 193;
+	$SYS_epoll_ctl = 194;
+	$SYS_epoll_wait = 195;
+	$u64_mod_8 = 1;
+	$SYS_signalfd4 = 317;
+	$SFD_CLOEXEC = 020000000;
     } elsif ($machine =~ m/^parisc/) {
         $SYS_epoll_create = 224;
         $SYS_epoll_ctl    = 225;
@@ -140,6 +146,7 @@ if ($^O eq "linux") {
         $SYS_epoll_wait   = 409;
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 484;
+	$SFD_CLOEXEC = 010000000;
     } elsif ($machine eq "aarch64") {
         $SYS_epoll_create = 20;  # (sys_epoll_create1)
         $SYS_epoll_ctl    = 21;
@@ -257,13 +264,11 @@ sub epoll_wait_mod8 {
 sub signalfd ($$$) {
 	my ($fd, $signos, $flags) = @_;
 	if ($SYS_signalfd4) {
-		# Not sure if there's a way to get pack/unpack to get the
-		# contents of POSIX::SigSet to a buffer, but prepping the
-		# bitmap like one would for select() works:
-		my $buf = "\0" x 8;
-		vec($buf, $_ - 1, 1) = 1 for @$signos;
-
-		syscall($SYS_signalfd4, $fd, $buf, 8, $flags|SFD_CLOEXEC);
+		my $set = POSIX::SigSet->new(@$signos);
+		syscall($SYS_signalfd4, $fd, "$$set",
+			# $Config{sig_count} is NSIG, so this is NSIG/8:
+			int($Config{sig_count}/8),
+			$flags|$SFD_CLOEXEC);
 	} else {
 		$! = ENOSYS;
 		undef;
diff --git a/script/public-inbox-watch b/script/public-inbox-watch
index c07d45d74..20534bf2a 100755
--- a/script/public-inbox-watch
+++ b/script/public-inbox-watch
@@ -7,7 +7,7 @@ use PublicInbox::WatchMaildir;
 use PublicInbox::Config;
 use PublicInbox::DS;
 use PublicInbox::Sigfd;
-use PublicInbox::Syscall qw(SFD_NONBLOCK);
+use PublicInbox::Syscall qw($SFD_NONBLOCK);
 my $oldset = PublicInbox::Sigfd::block_signals();
 STDOUT->autoflush(1);
 STDERR->autoflush(1);
@@ -35,7 +35,7 @@ if ($watch_md) {
 	unless (grep(/\A--no-scan\z/, @ARGV)) {
 		PublicInbox::DS::requeue($scan);
 	}
-	my $sigfd = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
+	my $sigfd = PublicInbox::Sigfd->new($sig, $SFD_NONBLOCK);
 	local %SIG = (%SIG, %$sig) if !$sigfd;
 	if (!$sigfd) {
 		PublicInbox::Sigfd::set_sigmask($oldset);
diff --git a/t/sigfd.t b/t/sigfd.t
index 07120b64c..8daf31374 100644
--- a/t/sigfd.t
+++ b/t/sigfd.t
@@ -4,7 +4,7 @@ use Test::More;
 use IO::Handle;
 use POSIX qw(:signal_h);
 use Errno qw(ENOSYS);
-use PublicInbox::Syscall qw(SFD_NONBLOCK);
+use PublicInbox::Syscall qw($SFD_NONBLOCK);
 require_ok 'PublicInbox::Sigfd';
 
 SKIP: {
@@ -42,8 +42,8 @@ SKIP: {
 		}
 		$sigfd = undef;
 
-		my $nbsig = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
-		ok($nbsig, 'Sigfd->new SFD_NONBLOCK works');
+		my $nbsig = PublicInbox::Sigfd->new($sig, $SFD_NONBLOCK);
+		ok($nbsig, 'Sigfd->new $SFD_NONBLOCK works');
 		is($nbsig->wait_once, undef, 'nonblocking ->wait_once');
 		ok($! == Errno::EAGAIN, 'got EAGAIN');
 		kill('HUP', $$) or die "kill $!";

^ permalink raw reply related	[relevance 68%]

* [PATCH 5/5] ds: flatten + reuse @events, epoll_wait style fixes
  @ 2020-12-27  2:53 60% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2020-12-27  2:53 UTC (permalink / raw)
  To: meta

Consistently returning the equivalent of pollfd.revents in a
portable manner was never worth the effort for us, as we use the
same ->event_step callback regardless of POLLIN/POLLOUT/POLLHUP.

Being a Perl array, @events knows its size and we don't have to
return a maximum index for the caller to iterate on.

We can also avoid redundant integer coercion ("+0") since we
ensure everything is an IV in other places.

Finally, vec() is preferable to ("\0" x $size) for resizing
buffers because it only needs to write the extended portion
and not overwrite the entire buffer.
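
A minimal sketch of both ideas, using a hand-filled buffer instead of
a real epoll_wait call, follows.  It assumes the 12-byte packed
struct epoll_event layout (4-byte events mask, then 8-byte data union
whose first 4 bytes hold the fd) handled by epoll_wait_mod4;
grow_buf and decode_fds are hypothetical names for illustration only.

```perl
use strict;
use warnings;

my $EVENT_SIZE = 12; # assumption: packed struct epoll_event
my $buf = '';

# Grow $buf to hold $maxevents records.  Writing the last bit via
# vec() zero-extends the string and leaves existing bytes untouched.
sub grow_buf {
	my ($maxevents) = @_;
	vec($buf, $maxevents * $EVENT_SIZE * 8 - 1, 1) = 0;
	length($buf);
}

# Flatten the first $ct records into a plain list of fds
sub decode_fds {
	my ($ct, $events) = @_;
	@$events = map {
		unpack('L', substr($buf, $EVENT_SIZE * $_ + 4, 4));
	} (0..$ct - 1);
}

my $len2 = grow_buf(2);
# fake two ready events: mask=1 (EPOLLIN) for fds 7 and 9
substr($buf, 0, 24, pack('LLx4', 1, 7) . pack('LLx4', 1, 9));
my $len4 = grow_buf(4); # growing keeps the records written above
decode_fds(2, \my @events);
print "len: $len2 -> $len4, fds: @events\n";
```

Since the caller gets a flat list of fds, it can iterate with a plain
"for my $fd (@events)" and ignore the event count entirely.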
---
 lib/PublicInbox/DS.pm      | 20 ++++--------
 lib/PublicInbox/DSKQXS.pm  |  2 +-
 lib/PublicInbox/DSPoll.pm  |  3 +-
 lib/PublicInbox/Syscall.pm | 66 +++++++++++++++++++++-----------------
 t/ds-poll.t                | 28 ++++++++--------
 t/epoll.t                  |  8 ++---
 6 files changed, 63 insertions(+), 64 deletions(-)

diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 12df5919..97a6f6ef 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -87,9 +87,7 @@ A timeout of 0 (zero) means poll forever. A timeout of -1 means poll and return
 immediately.
 
 =cut
-sub SetLoopTimeout {
-    return $LoopTimeout = $_[1] + 0;
-}
+sub SetLoopTimeout { $LoopTimeout = $_[1] + 0 }
 
 =head2 C<< PublicInbox::DS::add_timer( $seconds, $coderef, $arg) >>
 
@@ -200,12 +198,7 @@ sub RunTimers {
     my $timeout = int(($Timers[0][0] - $now) * 1000) + 1;
 
     # -1 is an infinite timeout, so prefer a real timeout
-    return $timeout     if $LoopTimeout == -1;
-
-    # otherwise pick the lower of our regular timeout and time until
-    # the next timer
-    return $LoopTimeout if $LoopTimeout < $timeout;
-    return $timeout;
+    ($LoopTimeout < 0 || $LoopTimeout >= $timeout) ? $timeout : $LoopTimeout;
 }
 
 # We can't use waitpid(-1) safely here since it can hit ``, system(),
@@ -261,19 +254,18 @@ sub PostEventLoop () {
 sub EventLoop {
     $Epoll //= _InitPoller();
     local $in_loop = 1;
+    my @events;
     do {
-        my @events;
-        my $i;
         my $timeout = RunTimers();
 
         # get up to 1000 events
-        my $evcount = epoll_wait($Epoll, 1000, $timeout, \@events);
-        for ($i=0; $i<$evcount; $i++) {
+        epoll_wait($Epoll, 1000, $timeout, \@events);
+        for my $fd (@events) {
             # it's possible epoll_wait returned many events, including some at the end
             # that ones in the front triggered unregister-interest actions.  if we
             # can't find the %sock entry, it's because we're no longer interested
             # in that event.
-            $DescriptorMap{$events[$i]->[0]}->event_step;
+            $DescriptorMap{$fd}->event_step;
         }
     } while (PostEventLoop());
     _run_later();
diff --git a/lib/PublicInbox/DSKQXS.pm b/lib/PublicInbox/DSKQXS.pm
index d1d3fe60..aa2c9168 100644
--- a/lib/PublicInbox/DSKQXS.pm
+++ b/lib/PublicInbox/DSKQXS.pm
@@ -134,7 +134,7 @@ sub epoll_wait {
 		}
 	}
 	# caller only cares for $events[$i]->[0]
-	scalar(@$events);
+	$_ = $_->[0] for @$events;
 }
 
 # kqueue is close-on-fork (not exec), so we must not close it
diff --git a/lib/PublicInbox/DSPoll.pm b/lib/PublicInbox/DSPoll.pm
index 1d9b51d9..a218f695 100644
--- a/lib/PublicInbox/DSPoll.pm
+++ b/lib/PublicInbox/DSPoll.pm
@@ -45,14 +45,13 @@ sub epoll_wait {
 			my $fd = $pset[$i++];
 			my $revents = $pset[$i++] or next;
 			delete($self->{$fd}) if $self->{$fd} & EPOLLONESHOT;
-			push @$events, [ $fd ];
+			push @$events, $fd;
 		}
 		my $nevents = scalar @$events;
 		if ($n != $nevents) {
 			warn "BUG? poll() returned $n, but got $nevents";
 		}
 	}
-	$n;
 }
 
 1;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index e4f00a2a..c403f78a 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -227,38 +227,46 @@ sub epoll_ctl_mod8 {
 our $epoll_wait_events;
 our $epoll_wait_size = 0;
 sub epoll_wait_mod4 {
-    # resize our static buffer if requested size is bigger than we've ever done
-    if ($_[1] > $epoll_wait_size) {
-        $epoll_wait_size = $_[1];
-        $epoll_wait_events = "\0" x 12 x $epoll_wait_size;
-    }
-    my $ct = syscall($SYS_epoll_wait, $_[0]+0, $epoll_wait_events, $_[1]+0, $_[2]+0);
-    for (0..$ct-1) {
-        @{$_[3]->[$_]}[1,0] = unpack("LL", substr($epoll_wait_events, 12*$_, 8));
-    }
-    return $ct;
+	my ($epfd, $maxevents, $timeout_msec, $events) = @_;
+	# resize our static buffer if maxevents bigger than we've ever done
+	if ($maxevents > $epoll_wait_size) {
+		$epoll_wait_size = $maxevents;
+		vec($epoll_wait_events, $maxevents * 12 * 8 - 1, 1) = 0;
+	}
+	@$events = ();
+	my $ct = syscall($SYS_epoll_wait, $epfd, $epoll_wait_events,
+			$maxevents, $timeout_msec);
+	for (0..$ct - 1) {
+		# 12-byte struct epoll_event
+		# 4 bytes uint32_t events mask (skipped, useless to us)
+		# 8 bytes: epoll_data_t union (first 4 bytes are the fd)
+		# So we skip the first 4 bytes and take the middle 4:
+		$events->[$_] = unpack('L', substr($epoll_wait_events,
+							12 * $_ + 4, 4));
+	}
 }
 
 sub epoll_wait_mod8 {
-    # resize our static buffer if requested size is bigger than we've ever done
-    if ($_[1] > $epoll_wait_size) {
-        $epoll_wait_size = $_[1];
-        $epoll_wait_events = "\0" x 16 x $epoll_wait_size;
-    }
-    my $ct;
-    if ($no_deprecated) {
-        $ct = syscall($SYS_epoll_wait, $_[0]+0, $epoll_wait_events, $_[1]+0, $_[2]+0, undef);
-    } else {
-        $ct = syscall($SYS_epoll_wait, $_[0]+0, $epoll_wait_events, $_[1]+0, $_[2]+0);
-    }
-    for (0..$ct-1) {
-        # 16 byte epoll_event structs, with format:
-        #    4 byte mask [idx 1]
-        #    4 byte padding (we put it into idx 2, useless)
-        #    8 byte data (first 4 bytes are fd, into idx 0)
-        @{$_[3]->[$_]}[1,2,0] = unpack("LLL", substr($epoll_wait_events, 16*$_, 12));
-    }
-    return $ct;
+	my ($epfd, $maxevents, $timeout_msec, $events) = @_;
+
+	# resize our static buffer if maxevents bigger than we've ever done
+	if ($maxevents > $epoll_wait_size) {
+		$epoll_wait_size = $maxevents;
+		vec($epoll_wait_events, $maxevents * 16 * 8 - 1, 1) = 0;
+	}
+	@$events = ();
+	my $ct = syscall($SYS_epoll_wait, $epfd, $epoll_wait_events,
+			$maxevents, $timeout_msec,
+			$no_deprecated ? undef : ());
+	for (0..$ct - 1) {
+		# 16-byte struct epoll_event
+		# 4 bytes uint32_t events mask (skipped, useless to us)
+		# 4 bytes padding (skipped, useless)
+		# 8 bytes epoll_data_t union (first 4 bytes are the fd)
+		# So skip the first 8 bytes, take 4, and ignore the last 4:
+		$events->[$_] = unpack('L', substr($epoll_wait_events,
+							16 * $_ + 8, 4));
+	}
 }
 
 sub signalfd ($$$) {
diff --git a/t/ds-poll.t b/t/ds-poll.t
index 3771059b..0ee57b69 100644
--- a/t/ds-poll.t
+++ b/t/ds-poll.t
@@ -16,35 +16,35 @@ pipe($r, $w) or die;
 pipe($x, $y) or die;
 is($p->epoll_ctl(EPOLL_CTL_ADD, fileno($r), EPOLLIN), 0, 'add EPOLLIN');
 my $events = [];
-my $n = $p->epoll_wait(9, 0, $events);
+$p->epoll_wait(9, 0, $events);
 is_deeply($events, [], 'no events set');
-is($n, 0, 'nothing ready, yet');
 is($p->epoll_ctl(EPOLL_CTL_ADD, fileno($w), EPOLLOUT|EPOLLONESHOT), 0,
 	'add EPOLLOUT|EPOLLONESHOT');
-$n = $p->epoll_wait(9, -1, $events);
-is($n, 1, 'got POLLOUT event');
-is($events->[0]->[0], fileno($w), '$w ready');
+$p->epoll_wait(9, -1, $events);
+is(scalar(@$events), 1, 'got POLLOUT event');
+is($events->[0], fileno($w), '$w ready');
 
-$n = $p->epoll_wait(9, 0, $events);
-is($n, 0, 'nothing ready after oneshot');
+$p->epoll_wait(9, 0, $events);
+is(scalar(@$events), 0, 'nothing ready after oneshot');
 is_deeply($events, [], 'no events set after oneshot');
 
 syswrite($w, '1') == 1 or die;
 for my $t (0..1) {
-	$n = $p->epoll_wait(9, $t, $events);
-	is($events->[0]->[0], fileno($r), "level-trigger POLLIN ready #$t");
-	is($n, 1, "only event ready #$t");
+	$p->epoll_wait(9, $t, $events);
+	is($events->[0], fileno($r), "level-trigger POLLIN ready #$t");
+	is(scalar(@$events), 1, "only event ready #$t");
 }
 syswrite($y, '1') == 1 or die;
 is($p->epoll_ctl(EPOLL_CTL_ADD, fileno($x), EPOLLIN|EPOLLONESHOT), 0,
 	'EPOLLIN|EPOLLONESHOT add');
-is($p->epoll_wait(9, -1, $events), 2, 'epoll_wait has 2 ready');
-my @fds = sort(map { $_->[0] } @$events);
+$p->epoll_wait(9, -1, $events);
+is(scalar @$events, 2, 'epoll_wait has 2 ready');
+my @fds = sort @$events;
 my @exp = sort((fileno($r), fileno($x)));
 is_deeply(\@fds, \@exp, 'got both ready FDs');
 
 is($p->epoll_ctl(EPOLL_CTL_DEL, fileno($r), 0), 0, 'EPOLL_CTL_DEL OK');
-$n = $p->epoll_wait(9, 0, $events);
-is($n, 0, 'nothing ready after EPOLL_CTL_DEL');
+$p->epoll_wait(9, 0, $events);
+is(scalar @$events, 0, 'nothing ready after EPOLL_CTL_DEL');
 
 done_testing;
diff --git a/t/epoll.t b/t/epoll.t
index b47650e3..a1e73e07 100644
--- a/t/epoll.t
+++ b/t/epoll.t
@@ -12,11 +12,11 @@ is(epoll_ctl($epfd, EPOLL_CTL_ADD, fileno($w), EPOLLOUT), 0,
     'epoll_ctl socket EPOLLOUT');
 
 my @events;
-is(epoll_wait($epfd, 100, 10000, \@events), 1, 'epoll_wait returns');
+epoll_wait($epfd, 100, 10000, \@events);
 is(scalar(@events), 1, 'got one event');
-is($events[0]->[0], fileno($w), 'got expected FD');
-is($events[0]->[1], EPOLLOUT, 'got expected event');
+is($events[0], fileno($w), 'got expected FD');
 close $w;
-is(epoll_wait($epfd, 100, 0, \@events), 0, 'epoll_wait timeout');
+epoll_wait($epfd, 100, 0, \@events);
+is(@events, 0, 'epoll_wait timeout');
 
 done_testing;

^ permalink raw reply related	[relevance 60%]
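
The fixed-offset unpacking done by epoll_wait_mod4 and epoll_wait_mod8
in the patch above can be tried without making a syscall.  What follows
is only a sketch: fds_from_buf is a hypothetical helper that is not part
of PublicInbox::Syscall, and the buffers are built by hand with pack()
rather than filled in by the kernel.

```perl
use strict;
use warnings;

# Hypothetical helper (not in PublicInbox::Syscall): pull the fd out of
# each packed epoll_event record in $buf.  $size is 12 for the packed
# struct and 16 for the padded one.
sub fds_from_buf {
	my ($buf, $count, $size) = @_;
	my $off = $size == 12 ? 4 : 8; # skip the events mask (+ padding)
	map { unpack('L', substr($buf, $size * $_ + $off, 4)) } 0 .. $count - 1;
}

# two fake 12-byte records: events mask, fd, 4 unused bytes
my $buf12 = pack('LLL', 1, 7, 0) . pack('LLL', 4, 9, 0);
# two fake 16-byte records: events mask, padding, fd, 4 unused bytes
my $buf16 = pack('LLLL', 1, 0, 7, 0) . pack('LLLL', 4, 0, 9, 0);

print join(',', fds_from_buf($buf12, 2, 12)), "\n";
print join(',', fds_from_buf($buf16, 2, 16)), "\n";
```

Both layouts yield the same list of file descriptors, which is why the
event loop can iterate over plain integers instead of [ $fd, $mask ]
pairs.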

* [PATCH 32/36] syscall: SFD_NONBLOCK can be a constant, again
  @ 2020-12-31 13:51 73% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2020-12-31 13:51 UTC (permalink / raw)
  To: meta

Since Perl exposes O_NONBLOCK as a constant, we can safely make
SFD_NONBLOCK a constant, too.  This is not the case for
SFD_CLOEXEC, since O_CLOEXEC is not exposed by Perl despite
being used internally in the interpreter.
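
As a rough illustration of the constant-sub form, here is a standalone
sketch.  MY_SFD_NONBLOCK is a made-up name used only for this example;
the real constant lives in PublicInbox::Syscall.

```perl
use strict;
use warnings;
use Fcntl qw(O_NONBLOCK);

# Hypothetical stand-in for SFD_NONBLOCK.  The empty prototype lets
# Perl inline the value at compile time.
sub MY_SFD_NONBLOCK () { O_NONBLOCK }

my $flags = O_NONBLOCK;
print "non-blocking requested\n" if $flags & MY_SFD_NONBLOCK;
```

Callers then write the flag without a sigil, as in the hunks below.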
---
 lib/PublicInbox/DSKQXS.pm  | 4 ++--
 lib/PublicInbox/Daemon.pm  | 4 ++--
 lib/PublicInbox/LEI.pm     | 4 ++--
 lib/PublicInbox/Sigfd.pm   | 4 ++--
 lib/PublicInbox/Syscall.pm | 4 ++--
 script/public-inbox-watch  | 4 ++--
 t/sigfd.t                  | 6 +++---
 7 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/DSKQXS.pm b/lib/PublicInbox/DSKQXS.pm
index aa2c9168..9a37e4ce 100644
--- a/lib/PublicInbox/DSKQXS.pm
+++ b/lib/PublicInbox/DSKQXS.pm
@@ -18,7 +18,7 @@ use Symbol qw(gensym);
 use IO::KQueue;
 use Errno qw(EAGAIN);
 use PublicInbox::Syscall qw(EPOLLONESHOT EPOLLIN EPOLLOUT EPOLLET
-	EPOLL_CTL_ADD EPOLL_CTL_MOD EPOLL_CTL_DEL $SFD_NONBLOCK);
+	EPOLL_CTL_ADD EPOLL_CTL_MOD EPOLL_CTL_DEL SFD_NONBLOCK);
 our @EXPORT_OK = qw(epoll_ctl epoll_wait);
 
 sub EV_DISPATCH () { 0x0080 }
@@ -57,7 +57,7 @@ sub signalfd {
 sub TIEHANDLE { # similar to signalfd()
 	my ($class, $signo, $flags) = @_;
 	my $self = $class->new;
-	$self->{timeout} = ($flags & $SFD_NONBLOCK) ? 0 : -1;
+	$self->{timeout} = ($flags & SFD_NONBLOCK) ? 0 : -1;
 	my $kq = $self->{kq};
 	$kq->EV_SET($_, EVFILT_SIGNAL, EV_ADD) for @$signo;
 	$self;
diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index bdf1dc45..f68337a0 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -16,7 +16,7 @@ sub SO_ACCEPTFILTER () { 0x1000 }
 STDOUT->autoflush(1);
 STDERR->autoflush(1);
 use PublicInbox::DS qw(now);
-use PublicInbox::Syscall qw($SFD_NONBLOCK);
+use PublicInbox::Syscall qw(SFD_NONBLOCK);
 require PublicInbox::Listener;
 use PublicInbox::EOFpipe;
 use PublicInbox::Sigfd;
@@ -627,7 +627,7 @@ sub daemon_loop ($$$$) {
 		# this calls epoll_create:
 		PublicInbox::Listener->new($_, $tls_cb || $post_accept)
 	} @listeners;
-	my $sigfd = PublicInbox::Sigfd->new($sig, $SFD_NONBLOCK);
+	my $sigfd = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
 	local %SIG = (%SIG, %$sig) if !$sigfd;
 	if (!$sigfd) {
 		# wake up every second to accept signals if we don't
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 7b7f45de..03302f8a 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -16,7 +16,7 @@ use POSIX ();
 use IO::Handle ();
 use Sys::Syslog qw(syslog openlog);
 use PublicInbox::Config;
-use PublicInbox::Syscall qw($SFD_NONBLOCK EPOLLIN EPOLLONESHOT);
+use PublicInbox::Syscall qw(SFD_NONBLOCK EPOLLIN EPOLLONESHOT);
 use PublicInbox::Sigfd;
 use PublicInbox::DS qw(now dwaitpid);
 use PublicInbox::Spawn qw(spawn run_die);
@@ -704,7 +704,7 @@ sub lazy_start {
 		USR1 => \&noop,
 		USR2 => \&noop,
 	};
-	my $sigfd = PublicInbox::Sigfd->new($sig, $SFD_NONBLOCK);
+	my $sigfd = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
 	local %SIG = (%SIG, %$sig) if !$sigfd;
 	if ($sigfd) { # TODO: use inotify/kqueue to detect unlinked sockets
 		PublicInbox::DS->SetLoopTimeout(5000);
diff --git a/lib/PublicInbox/Sigfd.pm b/lib/PublicInbox/Sigfd.pm
index 5d61e630..bf91bb37 100644
--- a/lib/PublicInbox/Sigfd.pm
+++ b/lib/PublicInbox/Sigfd.pm
@@ -6,7 +6,7 @@
 package PublicInbox::Sigfd;
 use strict;
 use parent qw(PublicInbox::DS);
-use PublicInbox::Syscall qw(signalfd EPOLLIN EPOLLET $SFD_NONBLOCK);
+use PublicInbox::Syscall qw(signalfd EPOLLIN EPOLLET SFD_NONBLOCK);
 use POSIX qw(:signal_h);
 use IO::Handle ();
 
@@ -33,7 +33,7 @@ sub new {
 	} else {
 		return; # wake up every second to check for signals
 	}
-	if ($flags & $SFD_NONBLOCK) { # it can go into the event loop
+	if ($flags & SFD_NONBLOCK) { # it can go into the event loop
 		$self->SUPER::new($io, EPOLLIN | EPOLLET);
 	} else { # master main loop
 		$self->{sock} = $io;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index c403f78a..180ee2cc 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -22,7 +22,7 @@ our @EXPORT_OK = qw(epoll_ctl epoll_create epoll_wait
                   EPOLLIN EPOLLOUT EPOLLET
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                   EPOLLONESHOT EPOLLEXCLUSIVE
-                  signalfd $SFD_NONBLOCK);
+                  signalfd SFD_NONBLOCK);
 our %EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
                              EPOLLIN EPOLLOUT
                              EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
@@ -67,7 +67,7 @@ our (
      );
 
 my $SFD_CLOEXEC = 02000000; # Perl does not expose O_CLOEXEC
-our $SFD_NONBLOCK = O_NONBLOCK;
+sub SFD_NONBLOCK () { O_NONBLOCK }
 our $no_deprecated = 0;
 
 if ($^O eq "linux") {
diff --git a/script/public-inbox-watch b/script/public-inbox-watch
index 55183ef2..4fd6ad49 100755
--- a/script/public-inbox-watch
+++ b/script/public-inbox-watch
@@ -14,7 +14,7 @@ use PublicInbox::Watch;
 use PublicInbox::Config;
 use PublicInbox::DS;
 use PublicInbox::Sigfd;
-use PublicInbox::Syscall qw($SFD_NONBLOCK);
+use PublicInbox::Syscall qw(SFD_NONBLOCK);
 my $do_scan = 1;
 GetOptions('scan!' => \$do_scan, # undocumented, testing only
 	'help|h' => \(my $show_help)) or do { print STDERR $help; exit 1 };
@@ -57,7 +57,7 @@ if ($watch) {
 	# --no-scan is only intended for testing atm, undocumented.
 	PublicInbox::DS::requeue($scan) if $do_scan;
 
-	my $sigfd = PublicInbox::Sigfd->new($sig, $SFD_NONBLOCK);
+	my $sigfd = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
 	local %SIG = (%SIG, %$sig) if !$sigfd;
 	if (!$sigfd) {
 		PublicInbox::Sigfd::sig_setmask($oldset);
diff --git a/t/sigfd.t b/t/sigfd.t
index 8daf3137..07120b64 100644
--- a/t/sigfd.t
+++ b/t/sigfd.t
@@ -4,7 +4,7 @@ use Test::More;
 use IO::Handle;
 use POSIX qw(:signal_h);
 use Errno qw(ENOSYS);
-use PublicInbox::Syscall qw($SFD_NONBLOCK);
+use PublicInbox::Syscall qw(SFD_NONBLOCK);
 require_ok 'PublicInbox::Sigfd';
 
 SKIP: {
@@ -42,8 +42,8 @@ SKIP: {
 		}
 		$sigfd = undef;
 
-		my $nbsig = PublicInbox::Sigfd->new($sig, $SFD_NONBLOCK);
-		ok($nbsig, 'Sigfd->new $SFD_NONBLOCK works');
+		my $nbsig = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
+		ok($nbsig, 'Sigfd->new SFD_NONBLOCK works');
 		is($nbsig->wait_once, undef, 'nonblocking ->wait_once');
 		ok($! == Errno::EAGAIN, 'got EAGAIN');
 		kill('HUP', $$) or die "kill $!";

^ permalink raw reply related	[relevance 73%]

* [PATCH 2/5] initialize scalar for `vec' perlop modification
  @ 2021-01-17  7:09 95% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2021-01-17  7:09 UTC (permalink / raw)
  To: meta

From: Eric Wong <e@yhbt.net>

Older Perls (tested 5.16.3) would warn on uninitialized scalars while
newer (tested 5.28.1) do not.  Just initialize it to an empty string
since it'll be filled in by `vec'.
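
A small standalone sketch of both uses of `vec' touched here, assuming
nothing beyond core Perl.  The fd number 5 and the event count 3 are
arbitrary values picked for the example.

```perl
use strict;
use warnings;

# select()-style bit vector: start from '' so old Perls do not warn
vec(my $rvec = '', 5, 1) = 1;
print unpack('b*', $rvec), "\n"; # bit 5 is set

# grow a buffer to hold 3 12-byte records by touching its last bit
my $buf = '';
vec($buf, 3 * 12 * 8 - 1, 1) = 0;
print length($buf), "\n"; # 36 bytes, all zero
```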
---
 lib/PublicInbox/LEI.pm     | 2 +-
 lib/PublicInbox/Syscall.pm | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 1f4a3082..2784ca6b 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -670,7 +670,7 @@ sub accept_dispatch { # Listener {post_accept} callback
 	my ($sock) = @_; # ignore other
 	$sock->autoflush(1);
 	my $self = bless { sock => $sock }, __PACKAGE__;
-	vec(my $rvec, fileno($sock), 1) = 1;
+	vec(my $rvec = '', fileno($sock), 1) = 1;
 	select($rvec, undef, undef, 1) or
 		return send($sock, 'timed out waiting to recv FDs', MSG_EOR);
 	my @fds = $recv_cmd->($sock, my $buf, 4096 * 33); # >MAX_ARG_STRLEN
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index a1f53235..5ff1d65f 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -224,7 +224,7 @@ sub epoll_ctl_mod8 {
 # epoll_wait wrapper
 # ARGS: (epfd, maxevents, timeout (milliseconds), arrayref)
 #  arrayref: values modified to be [$fd, $event]
-our $epoll_wait_events;
+our $epoll_wait_events = '';
 our $epoll_wait_size = 0;
 sub epoll_wait_mod4 {
 	my ($epfd, $maxevents, $timeout_msec, $events) = @_;

^ permalink raw reply related	[relevance 95%]

* [PATCH] syscall: minor yak-shaving updates
@ 2021-05-06  8:38 88% Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2021-05-06  8:38 UTC (permalink / raw)
  To: meta

FreeBSD (and other *BSDs) do not have stable syscall numbers, so
drop no-op checks for it and add a note to use Inline::C,
instead.  Drop an explicit return for the syscall.ph loading
while we're at it, too.

On Linux, epoll_create(2) ignores the size arg since Linux
2.6.8, so just hard code it to some non-zero value.

On a side note, we can probably drop epoll_create(2) support
soon and just use epoll_create1(2) which appeared in 2.6.27+
(2008-10-09).  Our userspace (Perl and git) requirements are
already further ahead.
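
The `return $rv;' to `$rv;' change in the diff below relies on a Perl
sub returning the value of its last evaluated expression.  A minimal
sketch, where try_load is a hypothetical helper that only mirrors the
shape of _load_syscall:

```perl
use strict;
use warnings;

# Hypothetical helper; the eval stands in for require 'syscall.ph'
sub try_load {
	my ($ok) = @_;
	my $rv = eval { die "missing\n" unless $ok; 1 };
	$rv; # the last expression evaluated is the return value
}

print defined(try_load(1)) ? "loaded\n" : "not loaded\n";
print defined(try_load(0)) ? "loaded\n" : "not loaded\n";
```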
---
 lib/PublicInbox/Syscall.pm | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 5ff1d65f..2599f8a3 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -55,7 +55,7 @@ sub _load_syscall {
     $clean->(); # don't trust modules before us
     my $rv = eval { require 'syscall.ph'; 1 } || eval { require 'sys/syscall.ph'; 1 };
     $clean->(); # don't require modules after us trust us
-    return $rv;
+    $rv;
 }
 
 
@@ -195,21 +195,17 @@ if ($^O eq "linux") {
         *epoll_ctl = \&epoll_ctl_mod4;
     }
 }
-
-elsif ($^O eq "freebsd") {
-    if ($ENV{FREEBSD_SENDFILE}) {
-        # this is still buggy and in development
-    }
-}
+# use Inline::C for *BSD-only or general POSIX stuff.
+# Linux guarantees stable syscall numbering, BSDs only offer a stable libc
 
 ############################################################################
 # epoll functions
 ############################################################################
 
-sub epoll_defined { return $SYS_epoll_create ? 1 : 0; }
+sub epoll_defined { $SYS_epoll_create ? 1 : 0; }
 
 sub epoll_create {
-	syscall($SYS_epoll_create, $no_deprecated ? 0 : ($_[0]||100)+0);
+	syscall($SYS_epoll_create, $no_deprecated ? 0 : 100);
 }
 
 # epoll_ctl wrapper

^ permalink raw reply related	[relevance 88%]

* [PATCH] scripts: add syscall-list tool for development
@ 2021-06-18 21:44 99% Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2021-06-18 21:44 UTC (permalink / raw)
  To: meta

We'll be supporting inotify directly as we do with epoll so
Linux users won't have to deal with XS, extra DSOs or install
Linux::Inotify2 (and common::sense) modules.
---
 MANIFEST                   |  2 ++
 devel/README               |  1 +
 devel/syscall-list         | 49 ++++++++++++++++++++++++++++++++++++++
 lib/PublicInbox/Syscall.pm |  1 +
 4 files changed, 53 insertions(+)
 create mode 100644 devel/README
 create mode 100755 devel/syscall-list

diff --git a/MANIFEST b/MANIFEST
index d4b3e75d..146a32ab 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -97,6 +97,8 @@ contrib/css/216light.css
 contrib/css/README
 contrib/selinux/el7/publicinbox.fc
 contrib/selinux/el7/publicinbox.te
+devel/README
+devel/syscall-list
 examples/README
 examples/README.unsubscribe
 examples/apache2_cgi.conf
diff --git a/devel/README b/devel/README
new file mode 100644
index 00000000..8f9a0485
--- /dev/null
+++ b/devel/README
@@ -0,0 +1 @@
+scripts used for public-inbox development that don't belong in t/
diff --git a/devel/syscall-list b/devel/syscall-list
new file mode 100755
index 00000000..b33401d9
--- /dev/null
+++ b/devel/syscall-list
@@ -0,0 +1,49 @@
+# Copyright 2021 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <http://www.gnu.org/licenses/agpl-3.0.txt>
+# Dump syscall numbers under Linux and any other kernel which
+# promises stable syscall numbers.  This is to maintain
+# PublicInbox::Syscall
+# DO NOT USE this for *BSDs, none of the current BSD kernels
+# we know about promise stable syscall numbers, we'll use
+# Inline::C to support them.
+eval 'exec perl -S $0 ${1+"$@"}' # no shebang
+	if 0; # running under some shell
+use strict;
+use File::Temp 0.19;
+my $cc = $ENV{CC} // 'cc';
+my @cflags = split(/\s+/, $ENV{CFLAGS} // '-Wall');
+my $str = do { local $/; <DATA> };
+my $tmp = File::Temp->newdir('syscall-list-XXXX', TMPDIR => 1);
+my $f = "$tmp/sc.c";
+my $x = "$tmp/sc";
+open my $fh, '>', $f or die "open $f $!";
+print $fh $str or die "print $f $!";
+close $fh or die "close $f $!";
+system($cc, '-o', $x, $f, @cflags) == 0 or die "cc failed \$?=$?";
+exec($x);
+__DATA__
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <stdio.h>
+
+#define D(x) printf("$" #x " = %ld;\n", (long)x)
+
+int main(void)
+{
+#ifdef __linux__
+	D(SYS_epoll_create1);
+	D(SYS_epoll_ctl);
+#ifdef SYS_epoll_wait
+	D(SYS_epoll_wait);
+#endif
+	D(SYS_epoll_pwait);
+	D(SYS_signalfd4);
+	D(SYS_inotify_init1);
+	D(SYS_inotify_add_watch);
+	D(SYS_inotify_rm_watch);
+	D(SYS_prctl);
+#endif /* Linux, any other OSes with stable syscalls? */
+	printf("size_t=%zu off_t=%zu\n", sizeof(size_t), sizeof(off_t));
+	return 0;
+}
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 2599f8a3..a8a6f42a 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -197,6 +197,7 @@ if ($^O eq "linux") {
 }
 # use Inline::C for *BSD-only or general POSIX stuff.
 # Linux guarantees stable syscall numbering, BSDs only offer a stable libc
+# use devel/syscall-list on Linux to detect new syscall numbers
 
 ############################################################################
 # epoll functions

^ permalink raw reply related	[relevance 99%]

* [PATCH 5/9] ds: simplify signalfd use
  @ 2021-10-01  9:54 43% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2021-10-01  9:54 UTC (permalink / raw)
  To: meta

Since signalfd is often combined with our event loop, give it a
convenient API and reduce the code duplication required to use it.

EventLoop is replaced with ::event_loop to allow consistent
parameter passing and avoid needlessly passing the package name
on the stack.

We also avoid exporting SFD_NONBLOCK since it's the only flag we
support.  There's no sense in having the memory overhead of a
constant function when it's in cold code.
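
The fallback path in event_loop installs handlers with a localized hash
slice of %SIG.  A standalone sketch of that idiom, assuming a POSIX
system where a process can signal itself; with_handlers and $got are
names invented for this example.

```perl
use strict;
use warnings;

my $got = '';
my $sig = { USR1 => sub { $got = 'USR1' } };

# Hypothetical helper: install handlers only for the current scope
sub with_handlers {
	my ($sig) = @_;
	local @SIG{keys %$sig} = values %$sig; # undone at scope exit
	kill('USR1', $$) or die "kill: $!";
	# wait briefly (at most ~5s) for Perl to run the handler
	for (1..50) {
		last if $got ne '';
		select(undef, undef, undef, 0.1);
	}
}

with_handlers($sig);
print "handled $got\n";
```

Once with_handlers returns, $SIG{USR1} is back to whatever it was
before the call, so callers need no cleanup code of their own.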
---
 lib/PublicInbox/ConfigIter.pm   |  2 +-
 lib/PublicInbox/DS.pm           | 64 ++++++++++++++++++---------------
 lib/PublicInbox/DSKQXS.pm       | 10 +++---
 lib/PublicInbox/Daemon.pm       | 14 ++------
 lib/PublicInbox/ExtMsg.pm       |  2 +-
 lib/PublicInbox/ExtSearchIdx.pm | 12 ++-----
 lib/PublicInbox/Gcf2Client.pm   |  4 +--
 lib/PublicInbox/IPC.pm          |  3 +-
 lib/PublicInbox/LEI.pm          | 17 ++-------
 lib/PublicInbox/Qspawn.pm       |  2 +-
 lib/PublicInbox/Sigfd.pm        | 10 +++---
 lib/PublicInbox/Syscall.pm      | 12 +++----
 lib/PublicInbox/Watch.pm        |  3 +-
 script/public-inbox-watch       |  9 -----
 t/dir_idle.t                    |  6 ++--
 t/ds-leak.t                     |  4 +--
 t/imapd.t                       |  6 ++--
 t/nntpd.t                       |  2 +-
 t/sigfd.t                       |  7 ++--
 t/watch_maildir.t               |  2 +-
 xt/mem-imapd-tls.t              |  6 ++--
 xt/net_writer-imap.t            |  2 +-
 22 files changed, 82 insertions(+), 117 deletions(-)

diff --git a/lib/PublicInbox/ConfigIter.pm b/lib/PublicInbox/ConfigIter.pm
index 24cb09bfdc44..14fcef83f229 100644
--- a/lib/PublicInbox/ConfigIter.pm
+++ b/lib/PublicInbox/ConfigIter.pm
@@ -1,7 +1,7 @@
 # Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
-# Intended for PublicInbox::DS->EventLoop in read-only daemons
+# Intended for PublicInbox::DS::event_loop in read-only daemons
 # to avoid each_inbox() monopolizing the event loop when hundreds/thousands
 # of inboxes are in play.
 package PublicInbox::ConfigIter;
diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 37cd6087cafb..ba6c74d0ea97 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -155,13 +155,6 @@ sub _InitPoller
     }
 }
 
-=head2 C<< CLASS->EventLoop() >>
-
-Start processing IO events. In most daemon programs this never exits. See
-C<PostLoopCallback> below for how to exit the loop.
-
-=cut
-
 sub now () { clock_gettime(CLOCK_MONOTONIC) }
 
 sub next_tick () {
@@ -277,26 +270,41 @@ sub PostEventLoop () {
 	$PostLoopCallback ? $PostLoopCallback->(\%DescriptorMap) : 1;
 }
 
-sub EventLoop {
-    $Epoll //= _InitPoller();
-    local $in_loop = 1;
-    my @events;
-    do {
-        my $timeout = RunTimers();
-
-        # get up to 1000 events
-        epoll_wait($Epoll, 1000, $timeout, \@events);
-        for my $fd (@events) {
-            # it's possible epoll_wait returned many events, including some at the end
-            # that ones in the front triggered unregister-interest actions.  if we
-            # can't find the %sock entry, it's because we're no longer interested
-            # in that event.
-
-	    # guard stack-not-refcounted w/ Carp + @DB::args
-            my $obj = $DescriptorMap{$fd};
-            $obj->event_step;
-        }
-    } while (PostEventLoop());
+# Start processing IO events. In most daemon programs this never exits. See
+# C<PostLoopCallback> for how to exit the loop.
+sub event_loop (;$$) {
+	my ($sig, $oldset) = @_;
+	$Epoll //= _InitPoller();
+	require PublicInbox::Sigfd if $sig;
+	my $sigfd = PublicInbox::Sigfd->new($sig, 1) if $sig;
+	local @SIG{keys %$sig} = values(%$sig) if $sig && !$sigfd;
+	local $SIG{PIPE} = 'IGNORE';
+	if (!$sigfd && $sig) {
+		# wake up every second to accept signals if we don't
+		# have signalfd or IO::KQueue:
+		sig_setmask($oldset);
+		PublicInbox::DS->SetLoopTimeout(1000);
+	}
+	$_[0] = $sigfd = $sig = undef; # $_[0] == sig
+	local $in_loop = 1;
+	my @events;
+	do {
+		my $timeout = RunTimers();
+
+		# get up to 1000 events
+		epoll_wait($Epoll, 1000, $timeout, \@events);
+		for my $fd (@events) {
+			# it's possible epoll_wait returned many events,
+			# including some at the end that ones in the front
+			# triggered unregister-interest actions.  if we can't
+			# find the %sock entry, it's because we're no longer
+			# interested in that event.
+
+			# guard stack-not-refcounted w/ Carp + @DB::args
+			my $obj = $DescriptorMap{$fd};
+			$obj->event_step;
+		}
+	} while (PostEventLoop());
 }
 
 =head2 C<< CLASS->SetPostLoopCallback( CODEREF ) >>
@@ -326,7 +334,7 @@ sub SetPostLoopCallback {
 =head2 C<< CLASS->new( $socket ) >>
 
 Create a new PublicInbox::DS subclass object for the given I<socket> which will
-react to events on it during the C<EventLoop>.
+react to events on it during the C<event_loop>.
 
 This is normally (always?) called from your subclass via:
 
diff --git a/lib/PublicInbox/DSKQXS.pm b/lib/PublicInbox/DSKQXS.pm
index acc31d9baa22..eccfa56d72cb 100644
--- a/lib/PublicInbox/DSKQXS.pm
+++ b/lib/PublicInbox/DSKQXS.pm
@@ -18,7 +18,7 @@ use Symbol qw(gensym);
 use IO::KQueue;
 use Errno qw(EAGAIN);
 use PublicInbox::Syscall qw(EPOLLONESHOT EPOLLIN EPOLLOUT EPOLLET
-	EPOLL_CTL_ADD EPOLL_CTL_MOD EPOLL_CTL_DEL SFD_NONBLOCK);
+	EPOLL_CTL_ADD EPOLL_CTL_MOD EPOLL_CTL_DEL);
 our @EXPORT_OK = qw(epoll_ctl epoll_wait);
 
 sub EV_DISPATCH () { 0x0080 }
@@ -48,16 +48,16 @@ sub new {
 # It's wasteful in that it uses another FD, but it simplifies
 # our epoll-oriented code.
 sub signalfd {
-	my ($class, $signo, $flags) = @_;
+	my ($class, $signo, $nonblock) = @_;
 	my $sym = gensym;
-	tie *$sym, $class, $signo, $flags; # calls TIEHANDLE
+	tie *$sym, $class, $signo, $nonblock; # calls TIEHANDLE
 	$sym
 }
 
 sub TIEHANDLE { # similar to signalfd()
-	my ($class, $signo, $flags) = @_;
+	my ($class, $signo, $nonblock) = @_;
 	my $self = $class->new;
-	$self->{timeout} = ($flags & SFD_NONBLOCK) ? 0 : -1;
+	$self->{timeout} = $nonblock ? 0 : -1;
 	my $kq = $self->{kq};
 	$kq->EV_SET($_, EVFILT_SIGNAL, EV_ADD) for @$signo;
 	$self;
diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index 24dc7791b43d..5be474fa8754 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -15,7 +15,6 @@ use Socket qw(IPPROTO_TCP SOL_SOCKET);
 STDOUT->autoflush(1);
 STDERR->autoflush(1);
 use PublicInbox::DS qw(now);
-use PublicInbox::Syscall qw(SFD_NONBLOCK);
 require PublicInbox::Listener;
 use PublicInbox::EOFpipe;
 use PublicInbox::Sigfd;
@@ -513,7 +512,7 @@ EOF
 		},
 		CHLD => \&reap_children,
 	};
-	my $sigfd = PublicInbox::Sigfd->new($sig, 0);
+	my $sigfd = PublicInbox::Sigfd->new($sig);
 	local @SIG{keys %$sig} = values(%$sig) unless $sigfd;
 	PublicInbox::DS::sig_setmask($oldset) if !$sigfd;
 	while (1) { # main loop
@@ -630,20 +629,11 @@ sub daemon_loop ($$$$) {
 		# this calls epoll_create:
 		PublicInbox::Listener->new($_, $tls_cb || $post_accept)
 	} @listeners;
-	my $sigfd = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
-	local @SIG{keys %$sig} = values(%$sig) unless $sigfd;
-	if (!$sigfd) {
-		# wake up every second to accept signals if we don't
-		# have signalfd or IO::KQueue:
-		PublicInbox::DS::sig_setmask($oldset);
-		PublicInbox::DS->SetLoopTimeout(1000);
-	}
-	PublicInbox::DS->EventLoop;
+	PublicInbox::DS::event_loop($sig, $oldset);
 }
 
 sub run ($$$;$) {
 	my ($default, $refresh, $post_accept, $tlsd) = @_;
-	local $SIG{PIPE} = 'IGNORE';
 	daemon_prepare($default);
 	my $af_default = $default =~ /:8080\z/ ? 'httpready' : undef;
 	my $for_destroy = daemonize();
diff --git a/lib/PublicInbox/ExtMsg.pm b/lib/PublicInbox/ExtMsg.pm
index c134de55d16c..72cae005da5a 100644
--- a/lib/PublicInbox/ExtMsg.pm
+++ b/lib/PublicInbox/ExtMsg.pm
@@ -150,7 +150,7 @@ sub ext_msg {
 	};
 }
 
-# called via PublicInbox::DS->EventLoop
+# called via PublicInbox::DS::event_loop
 sub event_step {
 	my ($ctx, $sync) = @_;
 	# can't find a partial match in current inbox, try the others:
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 6b29789a2ed8..c34225b29d9a 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -1305,19 +1305,11 @@ sub eidx_watch { # public-inbox-extindex --watch main loop
 	};
 	my $quit = PublicInbox::SearchIdx::quit_cb($sync);
 	$sig->{QUIT} = $sig->{INT} = $sig->{TERM} = $quit;
-	my $sigfd = PublicInbox::Sigfd->new($sig,
-					$PublicInbox::Syscall::SFD_NONBLOCK);
-	@SIG{keys %$sig} = values(%$sig) if !$sigfd;
 	local $self->{-watch_sync} = $sync; # for ->on_inbox_unlock
-	if (!$sigfd) {
-		# wake up every second to accept signals if we don't
-		# have signalfd or IO::KQueue:
-		PublicInbox::DS::sig_setmask($oldset);
-		PublicInbox::DS->SetLoopTimeout(1000);
-	}
 	PublicInbox::DS->SetPostLoopCallback(sub { !$sync->{quit} });
 	$pr->("initial scan complete, entering event loop\n") if $pr;
-	PublicInbox::DS->EventLoop; # calls InboxIdle->event_step
+	# calls InboxIdle->event_step:
+	PublicInbox::DS::event_loop($sig, $oldset);
 	done($self);
 }
 
diff --git a/lib/PublicInbox/Gcf2Client.pm b/lib/PublicInbox/Gcf2Client.pm
index 397774f90bf2..c5695db140cd 100644
--- a/lib/PublicInbox/Gcf2Client.pm
+++ b/lib/PublicInbox/Gcf2Client.pm
@@ -57,7 +57,7 @@ sub gcf2_async ($$$;$) {
 # ensure PublicInbox::Git::cat_async_step never calls cat_async_retry
 sub alternates_changed {}
 
-# DS->EventLoop will call this
+# DS::event_loop will call this
 sub event_step {
 	my ($self) = @_;
 	$self->flush_write;
@@ -74,7 +74,7 @@ sub event_step {
 
 sub DESTROY {
 	my ($self) = @_;
-	delete $self->{sock}; # if outside EventLoop
+	delete $self->{sock}; # if outside event_loop
 	PublicInbox::Git::DESTROY($self);
 }
 
diff --git a/lib/PublicInbox/IPC.pm b/lib/PublicInbox/IPC.pm
index 205b5b92cf71..6c189b6410aa 100644
--- a/lib/PublicInbox/IPC.pm
+++ b/lib/PublicInbox/IPC.pm
@@ -251,7 +251,7 @@ sub wq_worker_loop ($$) {
 	my $wqw = PublicInbox::WQWorker->new($self, $self->{-wq_s2});
 	PublicInbox::WQWorker->new($self, $bcast2) if $bcast2;
 	PublicInbox::DS->SetPostLoopCallback(sub { $wqw->{sock} });
-	PublicInbox::DS->EventLoop;
+	PublicInbox::DS::event_loop();
 	PublicInbox::DS->Reset;
 }
 
@@ -353,7 +353,6 @@ sub _wq_worker_start ($$$$) {
 		delete @$self{qw(-wq_s1 -wq_ppid)};
 		$self->{-wq_worker_nr} =
 				keys %{delete($self->{-wq_workers}) // {}};
-		$SIG{$_} = 'IGNORE' for (qw(PIPE));
 		$SIG{$_} = 'DEFAULT' for (qw(TTOU TTIN TERM QUIT INT CHLD));
 		local $0 = $one ? $self->{-wq_ident} :
 			"$self->{-wq_ident} $self->{-wq_worker_nr}";
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index df0bfab6dfb7..fd59235846ae 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -18,8 +18,7 @@ use POSIX qw(strftime);
 use IO::Handle ();
 use Fcntl qw(SEEK_SET);
 use PublicInbox::Config;
-use PublicInbox::Syscall qw(SFD_NONBLOCK EPOLLIN EPOLLET);
-use PublicInbox::Sigfd;
+use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
 use PublicInbox::DS qw(now dwaitpid);
 use PublicInbox::Spawn qw(spawn popen_rd);
 use PublicInbox::Lock;
@@ -1291,23 +1290,11 @@ sub lazy_start {
 		USR1 => \&noop,
 		USR2 => \&noop,
 	};
-	my $sigfd = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
-	local @SIG{keys %$sig} = values(%$sig) unless $sigfd;
-	undef $sig;
-	local $SIG{PIPE} = 'IGNORE';
 	require PublicInbox::DirIdle;
 	local $dir_idle = PublicInbox::DirIdle->new([$sock_dir], sub {
 		# just rely on wakeup to hit PostLoopCallback set below
 		dir_idle_handler($_[0]) if $_[0]->fullname ne $path;
 	}, 1);
-	if ($sigfd) {
-		undef $sigfd; # unref, already in DS::DescriptorMap
-	} else {
-		# wake up every second to accept signals if we don't
-		# have signalfd or IO::KQueue:
-		PublicInbox::DS::sig_setmask($oldset);
-		PublicInbox::DS->SetLoopTimeout(1000);
-	}
 	PublicInbox::DS->SetPostLoopCallback(sub {
 		my ($dmap, undef) = @_;
 		if (@st = defined($path) ? stat($path) : ()) {
@@ -1344,7 +1331,7 @@ sub lazy_start {
 	open STDERR, '>&STDIN' or die "redirect stderr failed: $!";
 	open STDOUT, '>&STDIN' or die "redirect stdout failed: $!";
 	# $daemon pipe to `lei' closed, main loop begins:
-	eval { PublicInbox::DS->EventLoop };
+	eval { PublicInbox::DS::event_loop($sig, $oldset) };
 	warn "event loop error: $@\n" if $@;
 	# exit() may trigger waitpid via various DESTROY, ensure interruptible
 	PublicInbox::DS::sig_setmask($oldset);
diff --git a/lib/PublicInbox/Qspawn.pm b/lib/PublicInbox/Qspawn.pm
index 7e50a59ae49e..b1285eda4a83 100644
--- a/lib/PublicInbox/Qspawn.pm
+++ b/lib/PublicInbox/Qspawn.pm
@@ -12,7 +12,7 @@
 # operate in.  This can be useful to ensure smaller inboxes can
 # be cloned while cloning of large inboxes is maxed out.
 #
-# This does not depend on the PublicInbox::DS->EventLoop or any
+# This does not depend on the PublicInbox::DS::event_loop or any
 # other external scheduling mechanism, you just need to call
 # start() and finish() appropriately. However, public-inbox-httpd
 # (which uses PublicInbox::DS)  will be able to schedule this
diff --git a/lib/PublicInbox/Sigfd.pm b/lib/PublicInbox/Sigfd.pm
index d91ea0e7ac78..81e5a1b1dd88 100644
--- a/lib/PublicInbox/Sigfd.pm
+++ b/lib/PublicInbox/Sigfd.pm
@@ -6,13 +6,13 @@
 package PublicInbox::Sigfd;
 use strict;
 use parent qw(PublicInbox::DS);
-use PublicInbox::Syscall qw(signalfd EPOLLIN EPOLLET SFD_NONBLOCK);
+use PublicInbox::Syscall qw(signalfd EPOLLIN EPOLLET);
 use POSIX ();
 
 # returns a coderef to unblock signals if neither signalfd or kqueue
 # are available.
 sub new {
-	my ($class, $sig, $flags) = @_;
+	my ($class, $sig, $nonblock) = @_;
 	my %signo = map {;
 		my $cb = $sig->{$_};
 		# SIGWINCH is 28 on FreeBSD, NetBSD, OpenBSD
@@ -24,15 +24,15 @@ sub new {
 	} keys %$sig;
 	my $self = bless { sig => \%signo }, $class;
 	my $io;
-	my $fd = signalfd(-1, [keys %signo], $flags);
+	my $fd = signalfd([keys %signo], $nonblock);
 	if (defined $fd && $fd >= 0) {
 		open($io, '+<&=', $fd) or die "open: $!";
 	} elsif (eval { require PublicInbox::DSKQXS }) {
-		$io = PublicInbox::DSKQXS->signalfd([keys %signo], $flags);
+		$io = PublicInbox::DSKQXS->signalfd([keys %signo], $nonblock);
 	} else {
 		return; # wake up every second to check for signals
 	}
-	if ($flags & SFD_NONBLOCK) { # it can go into the event loop
+	if ($nonblock) { # it can go into the event loop
 		$self->SUPER::new($io, EPOLLIN | EPOLLET);
 	} else { # master main loop
 		$self->{sock} = $io;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index a8a6f42a2e2d..7ab4291119ea 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -22,7 +22,7 @@ our @EXPORT_OK = qw(epoll_ctl epoll_create epoll_wait
                   EPOLLIN EPOLLOUT EPOLLET
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                   EPOLLONESHOT EPOLLEXCLUSIVE
-                  signalfd SFD_NONBLOCK);
+                  signalfd);
 our %EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
                              EPOLLIN EPOLLOUT
                              EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
@@ -67,7 +67,6 @@ our (
      );
 
 my $SFD_CLOEXEC = 02000000; # Perl does not expose O_CLOEXEC
-sub SFD_NONBLOCK () { O_NONBLOCK }
 our $no_deprecated = 0;
 
 if ($^O eq "linux") {
@@ -266,14 +265,15 @@ sub epoll_wait_mod8 {
 	}
 }
 
-sub signalfd ($$$) {
-	my ($fd, $signos, $flags) = @_;
+sub signalfd ($$) {
+	my ($signos, $nonblock) = @_;
 	if ($SYS_signalfd4) {
 		my $set = POSIX::SigSet->new(@$signos);
-		syscall($SYS_signalfd4, $fd, "$$set",
+		syscall($SYS_signalfd4, -1, "$$set",
 			# $Config{sig_count} is NSIG, so this is NSIG/8:
 			int($Config{sig_count}/8),
-			$flags|$SFD_CLOEXEC);
+			# SFD_NONBLOCK == O_NONBLOCK for every architecture
+			($nonblock ? O_NONBLOCK : 0) |$SFD_CLOEXEC);
 	} else {
 		$! = ENOSYS;
 		undef;
diff --git a/lib/PublicInbox/Watch.pm b/lib/PublicInbox/Watch.pm
index 0523ad03f871..c6bebce32edb 100644
--- a/lib/PublicInbox/Watch.pm
+++ b/lib/PublicInbox/Watch.pm
@@ -12,7 +12,6 @@ use PublicInbox::MdirReader;
 use PublicInbox::NetReader;
 use PublicInbox::Filter::Base qw(REJECT);
 use PublicInbox::Spamcheck;
-use PublicInbox::Sigfd;
 use PublicInbox::DS qw(now add_timer);
 use PublicInbox::MID qw(mids);
 use PublicInbox::ContentHash qw(content_hash);
@@ -570,7 +569,7 @@ sub watch { # main entry point
 	}
 	watch_fs_init($self) if $self->{mdre};
 	PublicInbox::DS->SetPostLoopCallback(sub { !$self->quit_done });
-	PublicInbox::DS->EventLoop; # calls ->event_step
+	PublicInbox::DS::event_loop($sig, $oldset); # calls ->event_step
 	_done_for_now($self);
 }
 
diff --git a/script/public-inbox-watch b/script/public-inbox-watch
index 86349d71d415..af02d8f358f7 100755
--- a/script/public-inbox-watch
+++ b/script/public-inbox-watch
@@ -13,8 +13,6 @@ use IO::Handle; # ->autoflush
 use PublicInbox::Watch;
 use PublicInbox::Config;
 use PublicInbox::DS;
-use PublicInbox::Sigfd;
-use PublicInbox::Syscall qw(SFD_NONBLOCK);
 my $do_scan = 1;
 GetOptions('scan!' => \$do_scan, # undocumented, testing only
 	'help|h' => \(my $show_help)) or do { print STDERR $help; exit 1 };
@@ -56,12 +54,5 @@ if ($watch) {
 
 	# --no-scan is only intended for testing atm, undocumented.
 	PublicInbox::DS::requeue($scan) if $do_scan;
-
-	my $sigfd = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
-	local @SIG{keys %$sig} = values(%$sig) unless $sigfd;
-	if (!$sigfd) {
-		PublicInbox::DS::sig_setmask($oldset);
-		PublicInbox::DS->SetLoopTimeout(1000);
-	}
 	$watch->watch($sig, $oldset) while ($watch);
 }
diff --git a/t/dir_idle.t b/t/dir_idle.t
index 0bb3b7585328..8e7f3b70eec4 100644
--- a/t/dir_idle.t
+++ b/t/dir_idle.t
@@ -15,7 +15,7 @@ my $end = 3 + now;
 PublicInbox::DS->SetPostLoopCallback(sub { scalar(@x) == 0 && now < $end });
 tick(0.011);
 rmdir("$tmpdir/a/b") or xbail "rmdir $!";
-PublicInbox::DS->EventLoop;
+PublicInbox::DS::event_loop();
 is(scalar(@x), 1, 'got an event') and
 	is($x[0]->[0]->fullname, "$tmpdir/a/b", 'got expected fullname') and
 	ok($x[0]->[0]->IN_DELETE, 'IN_DELETE set');
@@ -24,7 +24,7 @@ tick(0.011);
 rmdir("$tmpdir/a") or xbail "rmdir $!";
 @x = ();
 $end = 3 + now;
-PublicInbox::DS->EventLoop;
+PublicInbox::DS::event_loop();
 is(scalar(@x), 1, 'got an event') and
 	is($x[0]->[0]->fullname, "$tmpdir/a", 'got expected fullname') and
 	ok($x[0]->[0]->IN_DELETE_SELF, 'IN_DELETE_SELF set');
@@ -33,7 +33,7 @@ tick(0.011);
 rename("$tmpdir/c", "$tmpdir/j") or xbail "rmdir $!";
 @x = ();
 $end = 3 + now;
-PublicInbox::DS->EventLoop;
+PublicInbox::DS::event_loop();
 is(scalar(@x), 1, 'got an event') and
 	is($x[0]->[0]->fullname, "$tmpdir/c", 'got expected fullname') and
 	ok($x[0]->[0]->IN_DELETE_SELF || $x[0]->[0]->IN_MOVE_SELF,
diff --git a/t/ds-leak.t b/t/ds-leak.t
index 4c211639ed16..4e8d76cdf2ea 100644
--- a/t/ds-leak.t
+++ b/t/ds-leak.t
@@ -19,7 +19,7 @@ if ('close-on-exec for epoll and kqueue') {
 	pipe($r, $w) or die "pipe: $!";
 
 	PublicInbox::DS::add_timer(0, sub { $pid = spawn([qw(sleep 10)]) });
-	PublicInbox::DS->EventLoop;
+	PublicInbox::DS::event_loop();
 	ok($pid, 'subprocess spawned');
 
 	# wait for execve, we need to ensure lsof sees sleep(1)
@@ -56,7 +56,7 @@ SKIP: {
 	for my $i (0..$n) {
 		PublicInbox::DS->SetLoopTimeout(0);
 		PublicInbox::DS->SetPostLoopCallback($cb);
-		PublicInbox::DS->EventLoop;
+		PublicInbox::DS::event_loop();
 		PublicInbox::DS->Reset;
 	}
 	ok(1, "Reset works and doesn't hit RLIMIT_NOFILE ($n)");
diff --git a/t/imapd.t b/t/imapd.t
index bd8ad7e5162d..80757a9d4071 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -466,7 +466,7 @@ SKIP: {
 	my $w = start_script(['-watch'], undef, { 2 => $err_wr });
 
 	diag 'waiting for initial fetch...';
-	PublicInbox::DS->EventLoop;
+	PublicInbox::DS::event_loop();
 	diag 'inbox unlocked on initial fetch, waiting for IDLE';
 
 	tick until (grep(/I: \S+ idling/, <$err>));
@@ -477,7 +477,7 @@ SKIP: {
 		diag "mda error \$?=$?";
 	diag 'waiting for IMAP IDLE wakeup';
 	PublicInbox::DS->SetPostLoopCallback(undef);
-	PublicInbox::DS->EventLoop;
+	PublicInbox::DS::event_loop();
 	diag 'inbox unlocked on IDLE wakeup';
 
 	# try again with polling
@@ -494,7 +494,7 @@ SKIP: {
 
 	diag 'waiting for PollInterval wakeup';
 	PublicInbox::DS->SetPostLoopCallback(undef);
-	PublicInbox::DS->EventLoop;
+	PublicInbox::DS::event_loop();
 	diag 'inbox unlocked (poll)';
 	$w->kill;
 	$w->join;
diff --git a/t/nntpd.t b/t/nntpd.t
index 3c171a3b88b9..cf1c44f80b23 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -439,7 +439,7 @@ sub test_watch {
 	my $w = start_script(['-watch'], undef, { 2 => $err_wr });
 
 	diag 'waiting for initial fetch...';
-	PublicInbox::DS->EventLoop;
+	PublicInbox::DS::event_loop();
 	diag 'inbox unlocked on initial fetch';
 	$w->kill;
 	$w->join;
diff --git a/t/sigfd.t b/t/sigfd.t
index a1ab222c0c4d..a68b12a65f01 100644
--- a/t/sigfd.t
+++ b/t/sigfd.t
@@ -4,7 +4,6 @@ use Test::More;
 use IO::Handle;
 use POSIX qw(:signal_h);
 use Errno qw(ENOSYS);
-use PublicInbox::Syscall qw(SFD_NONBLOCK);
 require_ok 'PublicInbox::Sigfd';
 use PublicInbox::DS;
 
@@ -40,18 +39,18 @@ SKIP: {
 		}
 		$sigfd = undef;
 
-		my $nbsig = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
+		my $nbsig = PublicInbox::Sigfd->new($sig, 1);
 		ok($nbsig, 'Sigfd->new SFD_NONBLOCK works');
 		is($nbsig->wait_once, undef, 'nonblocking ->wait_once');
 		ok($! == Errno::EAGAIN, 'got EAGAIN');
 		kill('HUP', $$) or die "kill $!";
 		PublicInbox::DS->SetPostLoopCallback(sub {}); # loop once
-		PublicInbox::DS->EventLoop;
+		PublicInbox::DS::event_loop();
 		is($hit->{HUP}->{sigfd}, 2, 'HUP sigfd fired in event loop') or
 			diag explain($hit); # sometimes fails on FreeBSD 11.x
 		kill('TERM', $$) or die "kill $!";
 		kill('HUP', $$) or die "kill $!";
-		PublicInbox::DS->EventLoop;
+		PublicInbox::DS::event_loop();
 		PublicInbox::DS->Reset;
 		is($hit->{TERM}->{sigfd}, 1, 'TERM sigfd fired in event loop');
 		is($hit->{HUP}->{sigfd}, 3, 'HUP sigfd fired in event loop');
diff --git a/t/watch_maildir.t b/t/watch_maildir.t
index e74b512f2192..6399fb7cc15f 100644
--- a/t/watch_maildir.t
+++ b/t/watch_maildir.t
@@ -199,7 +199,7 @@ More majordomo info at  http://vger.kernel.org/majordomo-info.html\n);
 
 	$em->commit; # wake -watch up
 	diag 'waiting for -watch to import new message';
-	PublicInbox::DS->EventLoop;
+	PublicInbox::DS::event_loop();
 	$wm->kill;
 	$wm->join;
 	$ii->close;
diff --git a/xt/mem-imapd-tls.t b/xt/mem-imapd-tls.t
index bd75ef452984..8992a6fc0d8d 100644
--- a/xt/mem-imapd-tls.t
+++ b/xt/mem-imapd-tls.t
@@ -95,7 +95,7 @@ foreach my $n (1..$nfd) {
 
 	# one step through the event loop
 	# do a little work as we connect:
-	PublicInbox::DS->EventLoop;
+	PublicInbox::DS::event_loop();
 
 	# try not to overflow the listen() backlog:
 	if (!($n % 128) && $DONE != $n) {
@@ -104,7 +104,7 @@ foreach my $n (1..$nfd) {
 		PublicInbox::DS->SetPostLoopCallback(sub { $DONE != $n });
 
 		# clear the backlog:
-		PublicInbox::DS->EventLoop;
+		PublicInbox::DS::event_loop();
 
 		# resume looping
 		PublicInbox::DS->SetLoopTimeout(0);
@@ -117,7 +117,7 @@ diag "done?: @".time." $DONE/$nfd";
 if ($DONE != $nfd) {
 	PublicInbox::DS->SetLoopTimeout(-1);
 	PublicInbox::DS->SetPostLoopCallback(sub { $DONE != $nfd });
-	PublicInbox::DS->EventLoop;
+	PublicInbox::DS::event_loop();
 }
 is($nfd, $DONE, "$nfd/$DONE done");
 if ($^O eq 'linux' && open(my $f, '<', "/proc/$pid/status")) {
diff --git a/xt/net_writer-imap.t b/xt/net_writer-imap.t
index 41438cf79b15..cb2ea61ff8e3 100644
--- a/xt/net_writer-imap.t
+++ b/xt/net_writer-imap.t
@@ -228,7 +228,7 @@ EOM
 	$pub_cfg->each_inbox(sub { $_[0]->subscribe_unlock('ident', $obj) });
 	my $w = start_script(['-watch'], undef, { 2 => $err_wr });
 	diag 'waiting for initial fetch...';
-	PublicInbox::DS->EventLoop;
+	PublicInbox::DS::event_loop();
 	my $ibx = $pub_cfg->lookup_name('wtest');
 	my $mm = $ibx->mm;
 	ok(defined($mm->num_for('Seen@test.example.com')),

^ permalink raw reply related	[relevance 43%]
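
The flag handling in the new two-argument signalfd() wrapper above
can be tried in isolation.  This is only a sketch: sfd_flags() is a
hypothetical helper, not part of PublicInbox::Syscall, and the
SFD_CLOEXEC value is the one the patch uses for most architectures
(sparc64 uses 020000000).  It relies on the patch's own observation
that SFD_NONBLOCK == O_NONBLOCK.

```perl
use strict;
use warnings;
use POSIX qw(O_NONBLOCK);

# value taken from the patch; Perl does not expose O_CLOEXEC
my $SFD_CLOEXEC = 02000000;

# hypothetical helper: turn the boolean $nonblock argument into the
# flags word passed to signalfd4
sub sfd_flags {
	my ($nonblock) = @_;
	($nonblock ? O_NONBLOCK : 0) | $SFD_CLOEXEC;
}

printf("blocking:    0%o\n", sfd_flags(0));
printf("nonblocking: 0%o\n", sfd_flags(1));
```

Callers such as PublicInbox::Sigfd->new then only pass a true or
false value and no longer need to import SFD_NONBLOCK.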

* [PATCH 15/15] lei: use RENAME_NOREPLACE on Linux 3.15+
  @ 2021-10-21 21:10 80% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2021-10-21 21:10 UTC (permalink / raw)
  To: meta

One syscall is better than two for atomicity in Maildirs.  This
means there's no window where another process can see both the
old and new file at the same time (link && unlink), nor a window
where we might inadvertently clobber an existing file if we were
to do `stat && rename'.
---
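The no-clobber behavior described above can be seen with the
link(2) + unlink(2) pair that remains as the fallback.  A minimal
sketch, assuming a temporary directory on a filesystem which
supports hard links; rename_noreplace_fallback() is a hypothetical
name used only for this illustration:

```perl
use strict;
use warnings;
use Errno qw(EEXIST ENOENT);
use File::Temp qw(tempdir);

# Hypothetical stand-in for the non-atomic path: link(2) refuses to
# clobber an existing target, then unlink(2) drops the old name.
sub rename_noreplace_fallback {
	my ($old, $new) = @_;
	link($old, $new) or return; # $! is EEXIST, ENOENT, etc.
	# a parallel invocation may have removed $old already
	warn "unlink $old: $!\n" if !unlink($old) && $! != ENOENT;
	1;
}

my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a b)) {
	open(my $fh, '>', "$dir/$name") or die "open $dir/$name: $!";
	close($fh) or die "close: $!";
}

# "b" already exists, so the rename must fail and leave both alone
my $ok = rename_noreplace_fallback("$dir/a", "$dir/b");
my $eexist = !$ok && $! == EEXIST;
print "refused to clobber: ", ($eexist ? "yes" : "no"), "\n";

# a fresh target name succeeds and removes the old name
$ok = rename_noreplace_fallback("$dir/a", "$dir/c");
my $moved = $ok && !-e "$dir/a" && -e "$dir/c";
print "renamed: ", ($moved ? "yes" : "no"), "\n";
```

Between the link() and unlink() calls both names exist, which is the
window a single renameat2 call with RENAME_NOREPLACE avoids.
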
 MANIFEST                       |  1 +
 devel/syscall-list             |  8 +++++-
 lib/PublicInbox/LeiExportKw.pm | 19 +++++--------
 lib/PublicInbox/LeiStore.pm    |  8 +++---
 lib/PublicInbox/LeiToMail.pm   |  7 +++--
 lib/PublicInbox/Syscall.pm     | 49 +++++++++++++++++++++++++++++++---
 t/rename_noreplace.t           | 26 ++++++++++++++++++
 7 files changed, 92 insertions(+), 26 deletions(-)
 create mode 100644 t/rename_noreplace.t

diff --git a/MANIFEST b/MANIFEST
index af1522d71bd1..9fd979ef02fb 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -528,6 +528,7 @@ t/psgi_v2.t
 t/purge.t
 t/qspawn.t
 t/reindex-time-range.t
+t/rename_noreplace.t
 t/replace.t
 t/reply.t
 t/run.perl
diff --git a/devel/syscall-list b/devel/syscall-list
index b33401d98ce4..3d55df1fc1d7 100755
--- a/devel/syscall-list
+++ b/devel/syscall-list
@@ -1,4 +1,4 @@
-# Copyright 2021 all contributors <meta@public-inbox.org>
+# Copyright all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <http://www.gnu.org/licenses/agpl-3.0.txt>
 # Dump syscall numbers under Linux and any other kernel which
 # promises stable syscall numbers.  This is to maintain
@@ -9,7 +9,10 @@
 eval 'exec perl -S $0 ${1+"$@"}' # no shebang
 	if 0; # running under some shell
 use strict;
+use v5.10.1;
 use File::Temp 0.19;
+use POSIX qw(uname);
+say '$machine='.(POSIX::uname())[-1];
 my $cc = $ENV{CC} // 'cc';
 my @cflags = split(/\s+/, $ENV{CFLAGS} // '-Wall');
 my $str = do { local $/; <DATA> };
@@ -43,6 +46,9 @@ int main(void)
 	D(SYS_inotify_add_watch);
 	D(SYS_inotify_rm_watch);
 	D(SYS_prctl);
+#ifdef SYS_renameat2
+	D(SYS_renameat2);
+#endif
 #endif /* Linux, any other OSes with stable syscalls? */
 	printf("size_t=%zu off_t=%zu\n", sizeof(size_t), sizeof(off_t));
 	return 0;
diff --git a/lib/PublicInbox/LeiExportKw.pm b/lib/PublicInbox/LeiExportKw.pm
index 0b65c2762633..ceeef7f21d54 100644
--- a/lib/PublicInbox/LeiExportKw.pm
+++ b/lib/PublicInbox/LeiExportKw.pm
@@ -7,6 +7,7 @@ use strict;
 use v5.10.1;
 use parent qw(PublicInbox::IPC PublicInbox::LeiInput);
 use Errno qw(EEXIST ENOENT);
+use PublicInbox::Syscall qw(rename_noreplace);
 
 sub export_kw_md { # LeiMailSync->each_src callback
 	my ($oidbin, $id, $self, $mdir) = @_;
@@ -30,30 +31,22 @@ sub export_kw_md { # LeiMailSync->each_src callback
 	my $lei = $self->{lei};
 	for my $d (@try) {
 		my $src = "$mdir/$d/$$id";
-
-		# we use link(2) + unlink(2) since rename(2) may
-		# inadvertently clobber if the "uniquefilename" part wasn't
-		# actually unique.
-		if (link($src, $dst)) { # success
-			# unlink(2) may ENOENT from parallel invocation,
-			# ignore it, but not other serious errors
-			if (!unlink($src) and $! != ENOENT) {
-				$lei->child_error(1, "E: unlink($src): $!");
-			}
+		if (rename_noreplace($src, $dst)) { # success
 			$self->{lms}->mv_src("maildir:$mdir",
 						$oidbin, $id, $bn);
-			return; # success anyways if link(2) worked
+			return; # success
 		} elsif ($! == EEXIST) { # lost race with lei/store?
 			return;
 		} elsif ($! != ENOENT) {
-			$lei->child_error(1, "E: link($src -> $dst): $!");
+			$lei->child_error(1,
+				"E: rename_noreplace($src -> $dst): $!");
 		} # else loop @try
 	}
 	my $e = $!;
 	# both tries failed
 	my $oidhex = unpack('H*', $oidbin);
 	my $src = "$mdir/{".join(',', @try)."}/$$id";
-	$lei->child_error(1, "link($src -> $dst) ($oidhex): $e");
+	$lei->child_error(1, "rename_noreplace($src -> $dst) ($oidhex): $e");
 	for (@try) { return if -e "$mdir/$_/$$id" }
 	$self->{lms}->clear_src("maildir:$mdir", $id);
 }
diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
index 16e7d302dc2f..f1316229bb32 100644
--- a/lib/PublicInbox/LeiStore.pm
+++ b/lib/PublicInbox/LeiStore.pm
@@ -32,6 +32,7 @@ use POSIX ();
 use IO::Handle (); # ->autoflush
 use Sys::Syslog qw(syslog openlog);
 use Errno qw(EEXIST ENOENT);
+use PublicInbox::Syscall qw(rename_noreplace);
 
 sub new {
 	my (undef, $dir, $opt) = @_;
@@ -185,10 +186,7 @@ sub export1_kw_md ($$$$$) {
 	my $dst = "$mdir/cur/$bn";
 	for my $d (@try) {
 		my $src = "$mdir/$d/$orig";
-		if (link($src, $dst)) {
-			if (!unlink($src) and $! != ENOENT) {
-				syslog('warning', "unlink($src): $!");
-			}
+		if (rename_noreplace($src, $dst)) {
 			# TODO: verify oidbin?
 			$self->{lms}->mv_src("maildir:$mdir",
 					$oidbin, \$orig, $bn);
@@ -196,7 +194,7 @@ sub export1_kw_md ($$$$$) {
 		} elsif ($! == EEXIST) { # lost race with "lei export-kw"?
 			return;
 		} elsif ($! != ENOENT) {
-			syslog('warning', "link($src -> $dst): $!");
+			syslog('warning', "rename_noreplace($src -> $dst): $!");
 		}
 	}
 	for (@try) { return if -e "$mdir/$_/$orig" };
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index ca4e92de48b7..d33d27aec006 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -12,6 +12,7 @@ use PublicInbox::Spawn qw(spawn);
 use Symbol qw(gensym);
 use IO::Handle; # ->autoflush
 use Fcntl qw(SEEK_SET SEEK_END O_CREAT O_EXCL O_WRONLY);
+use PublicInbox::Syscall qw(rename_noreplace);
 
 my %kw2char = ( # Maildir characters
 	draft => 'D',
@@ -262,10 +263,8 @@ sub _buf2maildir ($$$$) {
 		$rand = '';
 		do {
 			$base = $rand.$common.':2,'.kw2suffix($kw);
-		} while (!($ok = link($tmp, $dst.$base)) && $!{EEXIST} &&
-			($rand = _rand.','));
-		die "link($tmp, $dst$base): $!" unless $ok;
-		unlink($tmp) or warn "W: failed to unlink $tmp: $!\n";
+		} while (!($ok = rename_noreplace($tmp, $dst.$base)) &&
+			$!{EEXIST} && ($rand = _rand.','));
 		\$base;
 	} else {
 		my $err = "Error writing $smsg->{blob} to $dst: $!\n";
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 7ab4291119ea..c00385b94db8 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -13,8 +13,9 @@
 # License or the Artistic License, as specified in the Perl README file.
 package PublicInbox::Syscall;
 use strict;
+use v5.10.1;
 use parent qw(Exporter);
-use POSIX qw(ENOSYS O_NONBLOCK);
+use POSIX qw(ENOENT EEXIST ENOSYS O_NONBLOCK);
 use Config;
 
 # $VERSION = '0.25'; # Sys::Syscall version
@@ -22,7 +23,7 @@ our @EXPORT_OK = qw(epoll_ctl epoll_create epoll_wait
                   EPOLLIN EPOLLOUT EPOLLET
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                   EPOLLONESHOT EPOLLEXCLUSIVE
-                  signalfd);
+                  signalfd rename_noreplace);
 our %EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
                              EPOLLIN EPOLLOUT
                              EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
@@ -64,13 +65,16 @@ our (
      $SYS_epoll_ctl,
      $SYS_epoll_wait,
      $SYS_signalfd4,
+     $SYS_renameat2,
      );
 
 my $SFD_CLOEXEC = 02000000; # Perl does not expose O_CLOEXEC
 our $no_deprecated = 0;
 
 if ($^O eq "linux") {
-    my $machine = (POSIX::uname())[-1];
+    my (undef, undef, $release, undef, $machine) = POSIX::uname();
+    my ($maj, $min) = ($release =~ /\A([0-9]+)\.([0-9]+)/);
+    $SYS_renameat2 = 0 if "$maj.$min" < 3.15;
     # whether the machine requires 64-bit numbers to be on 8-byte
     # boundaries.
     my $u64_mod_8 = 0;
@@ -91,22 +95,26 @@ if ($^O eq "linux") {
         $SYS_epoll_ctl    = 255;
         $SYS_epoll_wait   = 256;
         $SYS_signalfd4 = 327;
+        $SYS_renameat2 //= 353;
     } elsif ($machine eq "x86_64") {
         $SYS_epoll_create = 213;
         $SYS_epoll_ctl    = 233;
         $SYS_epoll_wait   = 232;
         $SYS_signalfd4 = 289;
+	$SYS_renameat2 //= 316;
     } elsif ($machine eq 'x32') {
         $SYS_epoll_create = 1073742037;
         $SYS_epoll_ctl = 1073742057;
         $SYS_epoll_wait = 1073742056;
         $SYS_signalfd4 = 1073742113;
+	$SYS_renameat2 //= 0x40000000 + 316;
     } elsif ($machine eq 'sparc64') {
 	$SYS_epoll_create = 193;
 	$SYS_epoll_ctl = 194;
 	$SYS_epoll_wait = 195;
 	$u64_mod_8 = 1;
 	$SYS_signalfd4 = 317;
+	$SYS_renameat2 //= 345;
 	$SFD_CLOEXEC = 020000000;
     } elsif ($machine =~ m/^parisc/) {
         $SYS_epoll_create = 224;
@@ -120,18 +128,21 @@ if ($^O eq "linux") {
         $SYS_epoll_wait   = 238;
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 313;
+	$SYS_renameat2 //= 357;
     } elsif ($machine eq "ppc") {
         $SYS_epoll_create = 236;
         $SYS_epoll_ctl    = 237;
         $SYS_epoll_wait   = 238;
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 313;
+	$SYS_renameat2 //= 357;
     } elsif ($machine =~ m/^s390/) {
         $SYS_epoll_create = 249;
         $SYS_epoll_ctl    = 250;
         $SYS_epoll_wait   = 251;
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 322;
+	$SYS_renameat2 //= 347;
     } elsif ($machine eq "ia64") {
         $SYS_epoll_create = 1243;
         $SYS_epoll_ctl    = 1244;
@@ -153,6 +164,7 @@ if ($^O eq "linux") {
         $u64_mod_8        = 1;
         $no_deprecated    = 1;
         $SYS_signalfd4 = 74;
+	$SYS_renameat2 //= 276;
     } elsif ($machine =~ m/arm(v\d+)?.*l/) {
         # ARM OABI
         $SYS_epoll_create = 250;
@@ -160,18 +172,21 @@ if ($^O eq "linux") {
         $SYS_epoll_wait   = 252;
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 355;
+	$SYS_renameat2 //= 382;
     } elsif ($machine =~ m/^mips64/) {
         $SYS_epoll_create = 5207;
         $SYS_epoll_ctl    = 5208;
         $SYS_epoll_wait   = 5209;
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 5283;
+	$SYS_renameat2 //= 5311;
     } elsif ($machine =~ m/^mips/) {
         $SYS_epoll_create = 4248;
         $SYS_epoll_ctl    = 4249;
         $SYS_epoll_wait   = 4250;
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 4324;
+	$SYS_renameat2 //= 4351;
     } else {
         # as a last resort, try using the *.ph files which may not
         # exist or may be wrong
@@ -280,6 +295,34 @@ sub signalfd ($$) {
 	}
 }
 
+sub _rename_noreplace_racy ($$) {
+	my ($old, $new) = @_;
+	if (link($old, $new)) {
+		warn "unlink $old: $!\n" if !unlink($old) && $! != ENOENT;
+		1
+	} else {
+		undef;
+	}
+}
+
+# TODO: support FD args?
+sub rename_noreplace ($$) {
+	my ($old, $new) = @_;
+	if ($SYS_renameat2) { # RENAME_NOREPLACE = 1, AT_FDCWD = -100
+		my $ret = syscall($SYS_renameat2, -100, $old, -100, $new, 1);
+		if ($ret == 0) {
+			1; # like rename() perlop
+		} elsif ($! == ENOSYS) {
+			undef $SYS_renameat2;
+			_rename_noreplace_racy($old, $new);
+		} else {
+			undef
+		}
+	} else {
+		_rename_noreplace_racy($old, $new);
+	}
+}
+
 1;
 
 =head1 WARRANTY
diff --git a/t/rename_noreplace.t b/t/rename_noreplace.t
new file mode 100644
index 000000000000..bd1c4e9236a7
--- /dev/null
+++ b/t/rename_noreplace.t
@@ -0,0 +1,26 @@
+#!perl -w
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use v5.10.1;
+use PublicInbox::TestCommon;
+use_ok 'PublicInbox::Syscall', 'rename_noreplace';
+my ($tmpdir, $for_destroy) = tmpdir;
+
+open my $fh, '>', "$tmpdir/a" or xbail $!;
+my @sa = stat($fh);
+is(rename_noreplace("$tmpdir/a", "$tmpdir/b"), 1, 'rename_noreplace');
+my @sb = stat("$tmpdir/b");
+ok(scalar(@sb), 'new file exists');
+ok(!-e "$tmpdir/a", 'original gone');
+is("@sa[0,1]", "@sb[0,1]", 'same st_dev + st_ino');
+
+is(rename_noreplace("$tmpdir/a", "$tmpdir/c"), undef, 'undef on ENOENT');
+ok($!{ENOENT}, 'ENOENT set when missing');
+
+open $fh, '>', "$tmpdir/a" or xbail $!;
+is(rename_noreplace("$tmpdir/a", "$tmpdir/b"), undef, 'undef on EEXIST');
+ok($!{EEXIST}, 'EEXIST set when target exists');
+is_deeply([stat("$tmpdir/b")], \@sb, 'target unchanged on EEXIST');
+
+done_testing;

^ permalink raw reply related	[relevance 80%]
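
A note on the version check in Syscall.pm above: `"$maj.$min" < 3.15'
compares decimal numbers, so a release such as 3.2 (numerically
3.20) is not treated as older than 3.15.  The ENOSYS fallback in
rename_noreplace still covers that case at the cost of one failed
syscall.  A component-wise comparison could look like the sketch
below; kernel_older_than() is a hypothetical helper, not something
the patch provides.

```perl
use strict;
use warnings;

# hypothetical helper: true if $release (as reported by uname) is
# older than $want_maj.$want_min, false otherwise or if unparsable
sub kernel_older_than {
	my ($release, $want_maj, $want_min) = @_;
	my ($maj, $min) = ($release =~ /\A([0-9]+)\.([0-9]+)/) or return;
	$maj < $want_maj || ($maj == $want_maj && $min < $want_min);
}

for my $rel (qw(3.2.0 3.15.0 5.10.0)) {
	my $old = kernel_older_than($rel, 3, 15) ? 'yes' : 'no';
	print "$rel older than 3.15: $old\n";
}
```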

* Re: Test failures with 1.7.0
  @ 2021-12-09  2:53 79%       ` Dominique Martinet
  0 siblings, 0 replies; 51+ results
From: Dominique Martinet @ 2021-12-09  2:53 UTC (permalink / raw)
  To: Julien Moutinho; +Cc: Eric Wong, meta

Julien Moutinho wrote on Thu, Dec 09, 2021 at 02:37:43AM +0100:
> I can also reproduce Infinisil's test failure with:
> $ (cd public-inbox-1.7.0; TMPDIR=/var/tmp perl -I$out/lib/perl5/site_perl t/lei_to_mail.t )
> > ok 96 - got Maildir callback
> > Use of uninitialized value in open at t/lei_to_mail.t line 263. 
> > Bail out!  No such file or directory 

I got curious on this one.
strace tells me:
----
2813384 renameat2(AT_FDCWD, "/tank/pi-lei_to_mail-2813384-n7sk/maildir/tmp/badc0ffee", AT_FDCWD, "/tank/pi-lei_to_mail-2813384-n7sk/maildir/cur/badc0ffee:2,", RENAME_NOREPLACE) = -1 EINVAL (Invalid argument)
2813384 openat(AT_FDCWD, "/tank/pi-lei_to_mail-2813384-n7sk/maildir/new/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
2813384 newfstatat(4, "", {st_mode=S_IFDIR|0755, st_size=2, ...}, AT_EMPTY_PATH) = 0
2813384 brk(0x44f4000)                  = 0x44f4000
2813384 getdents64(4, 0x44b3e40 /* 2 entries */, 131072) = 48
2813384 getdents64(4, 0x44b3e40 /* 0 entries */, 131072) = 0
2813384 close(4)                        = 0
2813384 openat(AT_FDCWD, "/tank/pi-lei_to_mail-2813384-n7sk/maildir/cur/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
2813384 newfstatat(4, "", {st_mode=S_IFDIR|0755, st_size=2, ...}, AT_EMPTY_PATH) = 0
2813384 getdents64(4, 0x44b3e40 /* 2 entries */, 131072) = 48
2813384 getdents64(4, 0x44b3e40 /* 0 entries */, 131072) = 0
2813384 close(4)                        = 0
2813384 write(2, "Use of uninitialized value in op"..., 64) = 64
2813384 openat(AT_FDCWD, "", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
2813384 getpid()                        = 2813384
2813384 getpid()                        = 2813384
2813384 getpid()                        = 2813384
2813384 write(5, "Bail out!  No such file or direc"..., 37) = 37
----

So this one is a real bug, this appears to fix it:
----
From 50a63628d505ca1c8d36f94ab5703f87a2c5e415 Mon Sep 17 00:00:00 2001
From: Dominique Martinet <asmadeus@codewreck.org>
Date: Thu, 9 Dec 2021 11:50:51 +0900
Subject: [PATCH] syscall: fallback to rename on renameat2 EINVAL

ZFS appears to incorrectly return EINVAL on renameat2 when the operation is not
supported:
renameat2(AT_FDCWD, "...", AT_FDCWD, "...", RENAME_NOREPLACE) = -1 EINVAL

Fall back to the racy rename in this case as well:

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index c00385b94db8..78f926ac38f0 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -15,7 +15,7 @@ package PublicInbox::Syscall;
 use strict;
 use v5.10.1;
 use parent qw(Exporter);
-use POSIX qw(ENOENT EEXIST ENOSYS O_NONBLOCK);
+use POSIX qw(ENOENT EEXIST ENOSYS EINVAL O_NONBLOCK);
 use Config;
 
 # $VERSION = '0.25'; # Sys::Syscall version
@@ -312,7 +312,7 @@ sub rename_noreplace ($$) {
 		my $ret = syscall($SYS_renameat2, -100, $old, -100, $new, 1);
 		if ($ret == 0) {
 			1; # like rename() perlop
-		} elsif ($! == ENOSYS) {
+		} elsif ($! == ENOSYS || $! == EINVAL) {
 			undef $SYS_renameat2;
 			_rename_noreplace_racy($old, $new);
 		} else {
----

> This test does succeed outside Nix's sandbox:
> $ (cd public-inbox-1.7.0; export PERL_INLINE_DIRECTORY=$PWD/inline-c; rm -rf $PERL_INLINE_DIRECTORY; mkdir $PERL_INLINE_DIRECTORY; prove -bvw t/lei-sigpipe.t )
> > t/lei-sigpipe.t ..               
> > ok 1 - lei import $TMPDIR/lei-daemon/big.eml
> > ok 2 - read one byte             
> > ok 3 - signaled                  
> > ok 4 - got SIGPIPE               
> > ok 5 - quiet after sigpipe 
> > ok 6 - read one byte
> > ok 7 - signaled -f mboxcl2       
> > ok 8 - got SIGPIPE -f mboxcl2    
> > ok 9 - quiet after sigpipe -f mboxcl2
> > ok 10 - read one byte
> > ok 11 - signaled -f text         
> > ok 12 - got SIGPIPE -f text
> > ok 13 - quiet after sigpipe -f text
> > ok 14 - lei daemon-pid (daemon-pid after t/lei-sigpipe.t:44)
> > ok 15 - daemon running after t/lei-sigpipe.t:44
> > ok 16 - lei daemon-kill (daemon-kill after t/lei-sigpipe.t:44)
> > ok 17 - t/lei-sigpipe.t:44 daemon stopped
> > ok 18 - t/lei-sigpipe.t:44 daemon XDG_RUNTIME_DIR/lei/errors.log empty
> > 1..18
> > ok
> > All tests successful.
> > Files=1, Tests=18,  7 wallclock secs ( 0.06 usr  0.06 sys +  3.44 cusr  2.73 csys =  6.29 CPU)
> > Result: PASS
> 
> More surprisingly, it even succeeds when run manually
> inside the hanging Nix sandbox:
> $ sudo nsenter --target 3137110 --all -S 1000 -G 100 $(readlink -e $(which bash))
> $ . /build/env-vars
> $ cd /build
> $ export HOME=$(mktemp -d)
> $ mkdir -p $HOME/.cache/public-inbox/inline-c
> $ LANG=C prove -bvw t/lei-sigpipe.t
> > t/lei-sigpipe.t .. 
> > ok 1 - lei import $TMPDIR/lei-daemon/big.eml
> > ok 2 - read one byte
> > ok 3 - signaled 
> > ok 4 - got SIGPIPE 
> > ok 5 - quiet after sigpipe 
> > ok 6 - read one byte
> > ok 7 - signaled -f mboxcl2
> > ok 8 - got SIGPIPE -f mboxcl2
> > ok 9 - quiet after sigpipe -f mboxcl2
> > ok 10 - read one byte
> > ok 11 - signaled -f text
> > ok 12 - got SIGPIPE -f text
> > ok 13 - quiet after sigpipe -f text
> > ok 14 - lei daemon-pid (daemon-pid after t/lei-sigpipe.t:44)
> > ok 15 - daemon running after t/lei-sigpipe.t:44
> > ok 16 - lei daemon-kill (daemon-kill after t/lei-sigpipe.t:44)
> > ok 17 - t/lei-sigpipe.t:44 daemon stopped
> > ok 18 - t/lei-sigpipe.t:44 daemon XDG_RUNTIME_DIR/lei/errors.log empty
> > 1..18
> > ok
> > All tests successful.
> > Files=1, Tests=18,  4 wallclock secs ( 0.06 usr  0.06 sys +  1.23 cusr  1.48 csys =  2.83 CPU)
> > Result: PASS
> 
> Even more strange, Dominique was able to reproduce
> the hang this morning, but no longer tonight..

Yes, I don't get it, it hung once but no longer hangs, so as much as
I'd have liked to investigate I'm a bit stuck.

With this, I can confirm that running with inline-c makes the tests that
failed with the btrfs chattr call pass as well.
So all that's left is to fix the proc mounts parsing there :)

-- 
Dominique

^ permalink raw reply related	[relevance 79%]
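
The fallback decision in the patch above (retry via link + unlink
when the atomic call reports ENOSYS or EINVAL) can be exercised
without ZFS.  This is a sketch under stated assumptions:
rename_noreplace_sketch() and the $try_atomic coderef are
hypothetical stand-ins for rename_noreplace and the renameat2
syscall, and the EINVAL result is simulated rather than coming from
a real filesystem.

```perl
use strict;
use warnings;
use Errno qw(ENOSYS EINVAL);
use File::Temp qw(tempdir);

my $atomic_ok = 1; # cleared after the first ENOSYS/EINVAL

# $try_atomic is a hypothetical coderef standing in for the
# renameat2 syscall: true on success, false with $! set on failure
sub rename_noreplace_sketch {
	my ($try_atomic, $old, $new) = @_;
	if ($atomic_ok) {
		return 1 if $try_atomic->($old, $new);
		# EEXIST, ENOENT, etc. are real errors for the caller
		return unless $! == ENOSYS || $! == EINVAL;
		$atomic_ok = 0; # don't retry the unsupported call
	}
	link($old, $new) or return;
	unlink($old) or warn "unlink $old: $!\n";
	1;
}

# simulate a filesystem which rejects RENAME_NOREPLACE with EINVAL
my $calls = 0;
my $einval = sub { $calls++; $! = EINVAL; 0 };

my $dir = tempdir(CLEANUP => 1);
for my $n (qw(a b)) {
	open(my $fh, '>', "$dir/$n") or die "open $dir/$n: $!";
	close($fh) or die "close: $!";
}
my $r1 = rename_noreplace_sketch($einval, "$dir/a", "$dir/x");
my $r2 = rename_noreplace_sketch($einval, "$dir/b", "$dir/y");
print "both renamed: ", ($r1 && $r2 ? 'yes' : 'no'),
	"; atomic attempts: $calls\n";
```

As in the patch, the first unsupported result disables the atomic
path for the rest of the process, so later renames go straight to
the link + unlink fallback.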

* Re: [PATCH] nodatacow: quiet chattr errors [was: Test failures with 1.7.0]
  @ 2022-01-30 21:49 60%           ` Eric Wong
    0 siblings, 1 reply; 51+ results
From: Eric Wong @ 2022-01-30 21:49 UTC (permalink / raw)
  To: Dominique Martinet; +Cc: Julien Moutinho, meta

Dominique Martinet <asmadeus@codewreck.org> wrote:
> Dominique Martinet wrote on Thu, Dec 09, 2021 at 06:14:36AM +0900:
> > I'll try giving it one, in my opinion it's more representative to test
> > with inline-c working.
> 
> So giving tests a home appears to make another test hang
> (t/lei-refresh-mail-sync.t)
> 
> I've run out of time, will provide more traces tonight
> 
> > > Yes on tests requiring stderr to be empty.  Below is a patch
> > > which should fix it; however it should only be calling chattr on
> > > btrfs mounts.
> > 
> > I'll give this a try as well.
> 
> This patch makes the tests pass as expected.
> 
> > > You can also try:
> > > 
> > >   BTRFS_TESTDIR=/path/to/your/btrfs-mount prove -bvw t/nodatacow.t
> > 
> > I'll try something similar as well.
> 
> I can confirm this one works as well after installing chattr and running
> on btrfs, so there's no problem if mounts parsing is fixed.
> 
> I'd say this hints at a problem so we're probably better off not
> silencing chattr warnings, but would also need to check if the chattr
> binary is present... Probably not worth the hassle, I don't know.

Thanks for testing the previous patch.  Actually, I prefer we
drop previous implementations and instead rely on Linux ABI
stability while shifting maintenance burden to maintainers.

Can you test the patch below?  It supercedes the other one.

Apologies for the delay, I've pretty much lost all motivation for
everything in life :<  Will try to find some energy to look into the
other issues in a bit...

Thanks.

---------8<----------
Subject: [PATCH] rewrite Linux nodatacow use in pure Perl w/o system

btrfs is Linux-only at the moment (and likely to remain that way
for practical purposes).  So rely on Linux ABI stability and use
the `syscall' and `ioctl' perlops rather than relying on Inline::C.
Inline::C (and gcc||clang) are monstrous dependencies which we
can't expect users to have.

This makes supporting new architectures more difficult, but new
architectures come along rarely and this reduces the burden for
the majority of Linux users on popular architectures (while
still avoiding the distribution of pre-built binaries).

Link: https://public-inbox.org/meta/YbCPWGaJEkV6eWfo@codewreck.org/
---
 MANIFEST                       |  1 -
 devel/syscall-list             | 10 ++++-
 lib/PublicInbox/IMAPTracker.pm |  6 +--
 lib/PublicInbox/LeiMailSync.pm |  6 +--
 lib/PublicInbox/MiscIdx.pm     |  6 +--
 lib/PublicInbox/Msgmap.pm      |  6 +--
 lib/PublicInbox/NDC_PP.pm      | 34 ---------------
 lib/PublicInbox/Over.pm        |  7 ++--
 lib/PublicInbox/SearchIdx.pm   |  7 ++--
 lib/PublicInbox/SharedKV.pm    |  6 +--
 lib/PublicInbox/Spawn.pm       | 76 +++-------------------------------
 lib/PublicInbox/Syscall.pm     | 46 +++++++++++++++++++-
 lib/PublicInbox/Xapcmd.pm      | 11 ++---
 t/nodatacow.t                  | 35 +++++++---------
 14 files changed, 101 insertions(+), 156 deletions(-)
 delete mode 100644 lib/PublicInbox/NDC_PP.pm

diff --git a/MANIFEST b/MANIFEST
index 1287182d..ca840210 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -289,7 +289,6 @@ lib/PublicInbox/MsgIter.pm
 lib/PublicInbox/MsgTime.pm
 lib/PublicInbox/Msgmap.pm
 lib/PublicInbox/MultiGit.pm
-lib/PublicInbox/NDC_PP.pm
 lib/PublicInbox/NNTP.pm
 lib/PublicInbox/NNTPD.pm
 lib/PublicInbox/NNTPdeflate.pm
diff --git a/devel/syscall-list b/devel/syscall-list
index 3d55df1f..a6b1bfa7 100755
--- a/devel/syscall-list
+++ b/devel/syscall-list
@@ -26,8 +26,10 @@ system($cc, '-o', $x, $f, @cflags) == 0 or die "cc failed \$?=$?";
 exec($x);
 __DATA__
 #define _GNU_SOURCE
-#include <unistd.h>
 #include <sys/syscall.h>
+#include <sys/ioctl.h>
+#include <linux/fs.h>
+#include <unistd.h>
 #include <stdio.h>
 
 #define D(x) printf("$" #x " = %ld;\n", (long)x)
@@ -46,6 +48,12 @@ int main(void)
 	D(SYS_inotify_add_watch);
 	D(SYS_inotify_rm_watch);
 	D(SYS_prctl);
+	D(SYS_fstatfs);
+#ifdef FS_IOC_GETFLAGS
+	printf("FS_IOC_GETFLAGS=%#lx\nFS_IOC_SETFLAGS=%#lx\n",
+		(unsigned long)FS_IOC_GETFLAGS, (unsigned long)FS_IOC_SETFLAGS);
+#endif
+
 #ifdef SYS_renameat2
 	D(SYS_renameat2);
 #endif
diff --git a/lib/PublicInbox/IMAPTracker.pm b/lib/PublicInbox/IMAPTracker.pm
index 2fd66440..4efa8a7e 100644
--- a/lib/PublicInbox/IMAPTracker.pm
+++ b/lib/PublicInbox/IMAPTracker.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2018-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 package PublicInbox::IMAPTracker;
 use strict;
@@ -75,11 +75,11 @@ sub new {
 	}
 	if (!-f $dbname) {
 		require File::Path;
-		require PublicInbox::Spawn;
+		require PublicInbox::Syscall;
 		my ($dir) = ($dbname =~ m!(.*?/)[^/]+\z!);
 		File::Path::mkpath($dir);
+		PublicInbox::Syscall::nodatacow_dir($dir);
 		open my $fh, '+>>', $dbname or die "failed to open $dbname: $!";
-		PublicInbox::Spawn::nodatacow_fd(fileno($fh));
 	}
 	my $self = bless { lock_path => "$dbname.lock", url => $url }, $class;
 	$self->lock_acquire;
diff --git a/lib/PublicInbox/LeiMailSync.pm b/lib/PublicInbox/LeiMailSync.pm
index 124eb969..182b0c22 100644
--- a/lib/PublicInbox/LeiMailSync.pm
+++ b/lib/PublicInbox/LeiMailSync.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # for maintaining synchronization between lei/store <=> Maildir|MH|IMAP|JMAP
@@ -15,9 +15,9 @@ sub dbh_new {
 	my $f = $self->{filename};
 	my $creat = $rw && !-s $f;
 	if ($creat) {
-		require PublicInbox::Spawn;
+		require PublicInbox::Syscall;
 		open my $fh, '+>>', $f or Carp::croak "open($f): $!";
-		PublicInbox::Spawn::nodatacow_fd(fileno($fh));
+		PublicInbox::Syscall::nodatacow_fh($fh);
 	}
 	my $dbh = DBI->connect("dbi:SQLite:dbname=$f",'','', {
 		AutoCommit => 1,
diff --git a/lib/PublicInbox/MiscIdx.pm b/lib/PublicInbox/MiscIdx.pm
index f5a374b2..dc15442d 100644
--- a/lib/PublicInbox/MiscIdx.pm
+++ b/lib/PublicInbox/MiscIdx.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # like PublicInbox::SearchIdx, but for searching for non-mail messages.
@@ -16,11 +16,11 @@ use v5.10.1;
 use PublicInbox::InboxWritable;
 use PublicInbox::Search; # for SWIG Xapian and Search::Xapian compat
 use PublicInbox::SearchIdx qw(index_text term_generator add_val);
-use PublicInbox::Spawn qw(nodatacow_dir);
 use Carp qw(croak);
 use File::Path ();
 use PublicInbox::MiscSearch;
 use PublicInbox::Config;
+use PublicInbox::Syscall;
 my $json;
 
 sub new {
@@ -28,7 +28,7 @@ sub new {
 	PublicInbox::SearchIdx::load_xapian_writable();
 	my $mi_dir = "$eidx->{xpfx}/misc";
 	File::Path::mkpath($mi_dir);
-	nodatacow_dir($mi_dir);
+	PublicInbox::Syscall::nodatacow_dir($mi_dir);
 	my $flags = $PublicInbox::SearchIdx::DB_CREATE_OR_OPEN;
 	$flags |= $PublicInbox::SearchIdx::DB_NO_SYNC if $eidx->{-no_fsync};
 	$json //= PublicInbox::Config::json();
diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index 699a8bf0..1041cd17 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2015-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # bidirectional Message-ID <-> Article Number mapping for the NNTP
@@ -13,7 +13,6 @@ use v5.10.1;
 use DBI;
 use DBD::SQLite;
 use PublicInbox::Over;
-use PublicInbox::Spawn;
 use Scalar::Util qw(blessed);
 
 sub new_file {
@@ -53,7 +52,8 @@ sub tmp_clone {
 	require File::Temp;
 	my $tmp = "mm_tmp-$$-XXXX";
 	my ($fh, $fn) = File::Temp::tempfile($tmp, EXLOCK => 0, DIR => $dir);
-	PublicInbox::Spawn::nodatacow_fd(fileno($fh));
+	require PublicInbox::Syscall;
+	PublicInbox::Syscall::nodatacow_fh($fh);
 	$self->{dbh}->sqlite_backup_to_file($fn);
 	$tmp = ref($self)->new_file($fn, 2);
 	$tmp->{dbh}->do('PRAGMA journal_mode = MEMORY');
diff --git a/lib/PublicInbox/NDC_PP.pm b/lib/PublicInbox/NDC_PP.pm
deleted file mode 100644
index 57abccbe..00000000
--- a/lib/PublicInbox/NDC_PP.pm
+++ /dev/null
@@ -1,34 +0,0 @@
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
-# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-
-# Pure-perl class for Linux non-Inline::C users to disable COW for btrfs
-package PublicInbox::NDC_PP;
-use strict;
-use v5.10.1;
-
-sub nodatacow_dir ($) {
-	my ($path) = @_;
-	open my $mh, '<', '/proc/self/mounts' or return;
-	for (grep(/ btrfs /, <$mh>)) {
-		my (undef, $mnt_path, $type) = split(/ /);
-		next if $type ne 'btrfs'; # in case of false-positive from grep
-
-		# weird chars are escaped as octal
-		$mnt_path =~ s/\\(0[0-9]{2})/chr(oct($1))/egs;
-		$mnt_path .= '/' unless $mnt_path =~ m!/\z!;
-		if (index($path, $mnt_path) == 0) {
-			# error goes to stderr, but non-fatal for us
-			system('chattr', '+C', $path);
-			last;
-		}
-	}
-}
-
-sub nodatacow_fd ($) {
-	my ($fd) = @_;
-	return if $^O ne 'linux';
-	defined(my $path = readlink("/proc/self/fd/$fd")) or return;
-	nodatacow_dir($path);
-}
-
-1;
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index 30ad949d..786f9d92 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2018-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # for XOVER, OVER in NNTP, and feeds/homepage/threads in PSGI
@@ -18,11 +18,10 @@ sub dbh_new {
 	my $f = delete $self->{filename};
 	if (!-s $f) { # SQLite defaults mode to 0644, we want 0666
 		if ($rw) {
-			require PublicInbox::Spawn;
+			require PublicInbox::Syscall;
 			my ($dir) = ($f =~ m!(.+)/[^/]+\z!);
-			PublicInbox::Spawn::nodatacow_dir($dir);
+			PublicInbox::Syscall::nodatacow_dir($dir);
 			open my $fh, '+>>', $f or die "failed to open $f: $!";
-			PublicInbox::Spawn::nodatacow_fd(fileno($fh));
 		} else {
 			$self->{filename} = $f; # die on stat() below:
 		}
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 4e5d7d44..95b14c3a 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2015-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 # based on notmuch, but with no concept of folders, files
 #
@@ -20,7 +20,7 @@ use Carp qw(croak carp);
 use POSIX qw(strftime);
 use Time::Local qw(timegm);
 use PublicInbox::OverIdx;
-use PublicInbox::Spawn qw(spawn nodatacow_dir);
+use PublicInbox::Spawn qw(spawn);
 use PublicInbox::Git qw(git_unquote);
 use PublicInbox::MsgTime qw(msg_timestamp msg_datestamp);
 use PublicInbox::Address;
@@ -139,7 +139,8 @@ sub idx_acquire {
 		if (!-d $dir && (!$is_shard ||
 				($is_shard && need_xapian($self)))) {
 			File::Path::mkpath($dir);
-			nodatacow_dir($dir);
+			require PublicInbox::Syscall;
+			PublicInbox::Syscall::nodatacow_dir($dir);
 			$self->{-set_has_threadid_once} = 1;
 		}
 	}
diff --git a/lib/PublicInbox/SharedKV.pm b/lib/PublicInbox/SharedKV.pm
index 4297efed..95a3cb14 100644
--- a/lib/PublicInbox/SharedKV.pm
+++ b/lib/PublicInbox/SharedKV.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # fork()-friendly key-value store.  Will be used for making
@@ -49,9 +49,9 @@ sub new {
 	my $f = $self->{filename} = "$dir/$base.sqlite3";
 	$self->{lock_path} = $opt->{lock_path} // "$dir/$base.flock";
 	unless (-s $f) {
-		PublicInbox::Spawn::nodatacow_dir($dir); # for journal/shm/wal
+		require PublicInbox::Syscall;
+		PublicInbox::Syscall::nodatacow_dir($dir); # for journal/shm/wal
 		open my $fh, '+>>', $f or die "failed to open $f: $!";
-		PublicInbox::Spawn::nodatacow_fd(fileno($fh));
 	}
 	$self;
 }
diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm
index e0a51c21..137b8087 100644
--- a/lib/PublicInbox/Spawn.pm
+++ b/lib/PublicInbox/Spawn.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2016-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 #
 # This allows vfork to be used for spawning subprocesses if
@@ -21,7 +21,7 @@ use Symbol qw(gensym);
 use Fcntl qw(LOCK_EX SEEK_SET);
 use IO::Handle ();
 use PublicInbox::ProcessPipe;
-our @EXPORT_OK = qw(which spawn popen_rd run_die nodatacow_dir);
+our @EXPORT_OK = qw(which spawn popen_rd run_die);
 our @RLIMITS = qw(RLIMIT_CPU RLIMIT_CORE RLIMIT_DATA);
 
 BEGIN {
@@ -268,62 +268,12 @@ void recv_cmd4(PerlIO *s, SV *buf, STRLEN n)
 #endif /* defined(CMSG_SPACE) && defined(CMSG_LEN) */
 ALL_LIBC
 
-# btrfs on Linux is copy-on-write (COW) by default.  As of Linux 5.7,
-# this still leads to fragmentation for SQLite and Xapian files where
-# random I/O happens, so we disable COW just for SQLite files and Xapian
-# directories.  Disabling COW disables checksumming, so we only do this
-# for regeneratable files, and not canonical git storage (git doesn't
-# checksum refs, only data under $GIT_DIR/objects).
-	my $set_nodatacow = $^O eq 'linux' ? <<'SET_NODATACOW' : '';
-#include <sys/ioctl.h>
-#include <sys/vfs.h>
-#include <linux/magic.h>
-#include <linux/fs.h>
-#include <dirent.h>
-
-void nodatacow_fd(int fd)
-{
-	struct statfs buf;
-	int val = 0;
-
-	if (fstatfs(fd, &buf) < 0) {
-		fprintf(stderr, "fstatfs: %s\\n", strerror(errno));
-		return;
-	}
-
-	/* only btrfs is known to have this problem, so skip for non-btrfs */
-	if (buf.f_type != BTRFS_SUPER_MAGIC)
-		return;
-
-	if (ioctl(fd, FS_IOC_GETFLAGS, &val) < 0) {
-		fprintf(stderr, "FS_IOC_GET_FLAGS: %s\\n", strerror(errno));
-		return;
-	}
-	val |= FS_NOCOW_FL;
-	if (ioctl(fd, FS_IOC_SETFLAGS, &val) < 0)
-		fprintf(stderr, "FS_IOC_SET_FLAGS: %s\\n", strerror(errno));
-}
-
-void nodatacow_dir(const char *dir)
-{
-	DIR *dh = opendir(dir);
-	int fd;
-
-	if (!dh) croak("opendir(%s): %s", dir, strerror(errno));
-	fd = dirfd(dh);
-	if (fd >= 0)
-		nodatacow_fd(fd);
-	/* ENOTSUP probably won't happen under Linux... */
-	closedir(dh);
-}
-SET_NODATACOW
-
 	my $inline_dir = $ENV{PERL_INLINE_DIRECTORY} //= (
 			$ENV{XDG_CACHE_HOME} //
 			( ($ENV{HOME} // '/nonexistent').'/.cache' )
 		).'/public-inbox/inline-c';
 	warn "$inline_dir exists, not writable\n" if -e $inline_dir && !-w _;
-	$set_nodatacow = $all_libc = undef unless -d _ && -w _;
+	$all_libc = undef unless -d _ && -w _;
 	if (defined $all_libc) {
 		my $f = "$inline_dir/.public-inbox.lock";
 		open my $oldout, '>&', \*STDOUT or die "dup(1): $!";
@@ -337,17 +287,10 @@ SET_NODATACOW
 		# CentOS 7.x ships Inline 0.53, 0.64+ has built-in locking
 		flock($fh, LOCK_EX) or die "LOCK_EX($f): $!";
 		eval <<'EOM';
-use Inline C => $all_libc.$set_nodatacow, BUILD_NOISY => 1;
+use Inline C => $all_libc, BUILD_NOISY => 1;
 EOM
 		my $err = $@;
 		my $ndc_err = '';
-		if ($err && $set_nodatacow) { # missing Linux kernel headers
-			$ndc_err = "with set_nodatacow: <\n$err\n>\n";
-			undef $set_nodatacow;
-			eval <<'EOM';
-use Inline C => $all_libc, BUILD_NOISY => 1;
-EOM
-		};
 		$err = $@;
 		open(STDERR, '>&', $olderr) or warn "restore stderr: $!";
 		open(STDOUT, '>&', $oldout) or warn "restore stdout: $!";
@@ -356,22 +299,13 @@ EOM
 			my @msg = <$fh>;
 			warn "Inline::C build failed:\n",
 				$ndc_err, $err, "\n", @msg;
-			$set_nodatacow = $all_libc = undef;
-		} elsif ($ndc_err) {
-			warn "Inline::C build succeeded w/o set_nodatacow\n",
-				"error $ndc_err";
+			$all_libc = undef;
 		}
 	}
 	unless ($all_libc) {
 		require PublicInbox::SpawnPP;
 		*pi_fork_exec = \&PublicInbox::SpawnPP::pi_fork_exec
 	}
-	unless ($set_nodatacow) {
-		require PublicInbox::NDC_PP;
-		no warnings 'once';
-		*nodatacow_fd = \&PublicInbox::NDC_PP::nodatacow_fd;
-		*nodatacow_dir = \&PublicInbox::NDC_PP::nodatacow_dir;
-	}
 } # /BEGIN
 
 sub which ($) {
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index c00385b9..bcfae2cb 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -5,7 +5,7 @@
 # This license differs from the rest of public-inbox
 #
 # This module is Copyright (c) 2005 Six Apart, Ltd.
-# Copyright (C) 2019-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 #
 # All rights reserved.
 #
@@ -68,6 +68,7 @@ our (
      $SYS_renameat2,
      );
 
+my $SYS_fstatfs; # don't need fstatfs64, just statfs.f_type
 my $SFD_CLOEXEC = 02000000; # Perl does not expose O_CLOEXEC
 our $no_deprecated = 0;
 
@@ -96,18 +97,21 @@ if ($^O eq "linux") {
         $SYS_epoll_wait   = 256;
         $SYS_signalfd4 = 327;
         $SYS_renameat2 //= 353;
+	$SYS_fstatfs = 100;
     } elsif ($machine eq "x86_64") {
         $SYS_epoll_create = 213;
         $SYS_epoll_ctl    = 233;
         $SYS_epoll_wait   = 232;
         $SYS_signalfd4 = 289;
 	$SYS_renameat2 //= 316;
+	$SYS_fstatfs = 138;
     } elsif ($machine eq 'x32') {
         $SYS_epoll_create = 1073742037;
         $SYS_epoll_ctl = 1073742057;
         $SYS_epoll_wait = 1073742056;
         $SYS_signalfd4 = 1073742113;
 	$SYS_renameat2 //= 0x40000000 + 316;
+	$SYS_fstatfs = 138;
     } elsif ($machine eq 'sparc64') {
 	$SYS_epoll_create = 193;
 	$SYS_epoll_ctl = 194;
@@ -116,6 +120,7 @@ if ($^O eq "linux") {
 	$SYS_signalfd4 = 317;
 	$SYS_renameat2 //= 345;
 	$SFD_CLOEXEC = 020000000;
+	$SYS_fstatfs = 158;
     } elsif ($machine =~ m/^parisc/) {
         $SYS_epoll_create = 224;
         $SYS_epoll_ctl    = 225;
@@ -129,6 +134,7 @@ if ($^O eq "linux") {
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 313;
 	$SYS_renameat2 //= 357;
+	$SYS_fstatfs = 100;
     } elsif ($machine eq "ppc") {
         $SYS_epoll_create = 236;
         $SYS_epoll_ctl    = 237;
@@ -136,6 +142,7 @@ if ($^O eq "linux") {
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 313;
 	$SYS_renameat2 //= 357;
+	$SYS_fstatfs = 100;
     } elsif ($machine =~ m/^s390/) {
         $SYS_epoll_create = 249;
         $SYS_epoll_ctl    = 250;
@@ -143,6 +150,7 @@ if ($^O eq "linux") {
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 322;
 	$SYS_renameat2 //= 347;
+	$SYS_fstatfs = 100;
     } elsif ($machine eq "ia64") {
         $SYS_epoll_create = 1243;
         $SYS_epoll_ctl    = 1244;
@@ -165,6 +173,7 @@ if ($^O eq "linux") {
         $no_deprecated    = 1;
         $SYS_signalfd4 = 74;
 	$SYS_renameat2 //= 276;
+	$SYS_fstatfs = 44;
     } elsif ($machine =~ m/arm(v\d+)?.*l/) {
         # ARM OABI
         $SYS_epoll_create = 250;
@@ -173,6 +182,7 @@ if ($^O eq "linux") {
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 355;
 	$SYS_renameat2 //= 382;
+	$SYS_fstatfs = 100;
     } elsif ($machine =~ m/^mips64/) {
         $SYS_epoll_create = 5207;
         $SYS_epoll_ctl    = 5208;
@@ -180,6 +190,7 @@ if ($^O eq "linux") {
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 5283;
 	$SYS_renameat2 //= 5311;
+	$SYS_fstatfs = 5135;
     } elsif ($machine =~ m/^mips/) {
         $SYS_epoll_create = 4248;
         $SYS_epoll_ctl    = 4249;
@@ -187,6 +198,7 @@ if ($^O eq "linux") {
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 4324;
 	$SYS_renameat2 //= 4351;
+	$SYS_fstatfs = 4100;
     } else {
         # as a last resort, try using the *.ph files which may not
         # exist or may be wrong
@@ -323,6 +335,38 @@ sub rename_noreplace ($$) {
 	}
 }
 
+sub nodatacow_fh {
+	return if !defined($SYS_fstatfs);
+	my $buf = '';
+	vec($buf, 120 * 8 - 1, 1) = 0;
+	my ($fh) = @_;
+	syscall($SYS_fstatfs, fileno($fh), $buf) == 0 or
+		return warn("fstatfs: $!\n");
+	my $f_type = unpack('l!', $buf); # statfs.f_type is a signed word
+	return if $f_type != 0x9123683E; # BTRFS_SUPER_MAGIC
+
+	state ($FS_IOC_GETFLAGS, $FS_IOC_SETFLAGS);
+	unless (defined $FS_IOC_GETFLAGS) {
+		if (substr($Config{byteorder}, 0, 4) eq '1234') {
+			$FS_IOC_GETFLAGS = 0x80086601;
+			$FS_IOC_SETFLAGS = 0x40086602;
+		} else { # Big endian
+			$FS_IOC_GETFLAGS = 0x40086601;
+			$FS_IOC_SETFLAGS = 0x80086602;
+		}
+	}
+	ioctl($fh, $FS_IOC_GETFLAGS, $buf) //
+		return warn("FS_IOC_GET_FLAGS: $!\n");
+	my $attr = unpack('l!', $buf);
+	return if ($attr & 0x00800000); # FS_NOCOW_FL;
+	ioctl($fh, $FS_IOC_SETFLAGS, pack('l', $attr | 0x00800000)) //
+		warn("FS_IOC_SET_FLAGS: $!\n");
+}
+
+sub nodatacow_dir {
+	if (open my $fh, '<', $_[0]) { nodatacow_fh($fh) }
+}
+
 1;
 
 =head1 WARRANTY
diff --git a/lib/PublicInbox/Xapcmd.pm b/lib/PublicInbox/Xapcmd.pm
index 44e0f8e5..10685636 100644
--- a/lib/PublicInbox/Xapcmd.pm
+++ b/lib/PublicInbox/Xapcmd.pm
@@ -1,8 +1,9 @@
-# Copyright (C) 2018-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 package PublicInbox::Xapcmd;
 use strict;
-use PublicInbox::Spawn qw(which popen_rd nodatacow_dir);
+use PublicInbox::Spawn qw(which popen_rd);
+use PublicInbox::Syscall;
 use PublicInbox::Admin qw(setup_signals);
 use PublicInbox::Over;
 use PublicInbox::SearchIdx;
@@ -211,7 +212,7 @@ sub prepare_run {
 		my $v = PublicInbox::Search::SCHEMA_VERSION();
 		my $wip = File::Temp->newdir("xapian$v-XXXX", DIR => $dir);
 		$tmp->{$old} = $wip;
-		nodatacow_dir($wip->dirname);
+		PublicInbox::Syscall::nodatacow_dir($wip->dirname);
 		push @queue, [ $old, $wip ];
 	} elsif (defined $old) {
 		opendir my $dh, $old or die "Failed to opendir $old: $!\n";
@@ -242,7 +243,7 @@ sub prepare_run {
 			same_fs_or_die($old, $wip->dirname);
 			my $cur = "$old/$dn";
 			push @queue, [ $src // $cur , $wip ];
-			nodatacow_dir($wip->dirname);
+			PublicInbox::Syscall::nodatacow_dir($wip->dirname);
 			$tmp->{$cur} = $wip;
 		}
 		# mark old shards to be unlinked
@@ -443,7 +444,7 @@ sub cpdb ($$) { # cb_spawn callback
 		$ft = File::Temp->newdir("$new.compact-XXXX", DIR => $dir);
 		setup_signals();
 		$tmp = $ft->dirname;
-		nodatacow_dir($tmp);
+		PublicInbox::Syscall::nodatacow_dir($tmp);
 	} else {
 		$tmp = $new;
 	}
diff --git a/t/nodatacow.t b/t/nodatacow.t
index 19247c10..83aa227f 100644
--- a/t/nodatacow.t
+++ b/t/nodatacow.t
@@ -1,48 +1,41 @@
 #!perl -w
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict; use v5.10.1; use PublicInbox::TestCommon;
 use File::Temp 0.19;
-use_ok 'PublicInbox::NDC_PP';
+use_ok 'PublicInbox::Syscall';
+
+# btrfs on Linux is copy-on-write (COW) by default.  As of Linux 5.7,
+# this still leads to fragmentation for SQLite and Xapian files where
+# random I/O happens, so we disable COW just for SQLite files and Xapian
+# directories.  Disabling COW disables checksumming, so we only do this
+# for regeneratable files, and not canonical git storage (git doesn't
+# checksum refs, only data under $GIT_DIR/objects).
 
 SKIP: {
 	my $nr = 2;
 	skip 'test is Linux-only', $nr if $^O ne 'linux';
 	my $dir = $ENV{BTRFS_TESTDIR};
 	skip 'BTRFS_TESTDIR not defined', $nr unless defined $dir;
-	require_cmd('chattr', 1) or skip 'chattr(1) not installed', $nr;
+
 	my $lsattr = require_cmd('lsattr', 1) or
 		skip 'lsattr(1) not installed', $nr;
+
 	my $tmp = File::Temp->newdir('nodatacow-XXXX', DIR => $dir);
 	my $dn = $tmp->dirname;
 
 	my $name = "$dn/pp.f";
 	open my $fh, '>', $name or BAIL_OUT "open($name): $!";
-	my $pp_sub = \&PublicInbox::NDC_PP::nodatacow_fd;
-	$pp_sub->(fileno($fh));
+	PublicInbox::Syscall::nodatacow_fh($fh);
 	my $res = xqx([$lsattr, $name]);
 	like($res, qr/C.*\Q$name\E/, "`C' attribute set on fd with pure Perl");
 
+
 	$name = "$dn/pp.d";
 	mkdir($name) or BAIL_OUT "mkdir($name) $!";
-	PublicInbox::NDC_PP::nodatacow_dir($name);
+	PublicInbox::Syscall::nodatacow_dir($name);
 	$res = xqx([$lsattr, '-d', $name]);
 	like($res, qr/C.*\Q$name\E/, "`C' attribute set on dir with pure Perl");
-
-	$name = "$dn/ic.f";
-	my $ic_sub = \&PublicInbox::Spawn::nodatacow_fd;
-	$pp_sub == $ic_sub and
-		skip 'Inline::C or Linux kernel headers missing', 2;
-	open $fh, '>', $name or BAIL_OUT "open($name): $!";
-	$ic_sub->(fileno($fh));
-	$res = xqx([$lsattr, $name]);
-	like($res, qr/C.*\Q$name\E/, "`C' attribute set on fd with Inline::C");
-
-	$name = "$dn/ic.d";
-	mkdir($name) or BAIL_OUT "mkdir($name) $!";
-	PublicInbox::Spawn::nodatacow_dir($name);
-	$res = xqx([$lsattr, '-d', $name]);
-	like($res, qr/C.*\Q$name\E/, "`C' attribute set on dir with Inline::C");
 };
 
 done_testing;


* Re: [PATCH] nodatacow: quiet chattr errors [was: Test failures with 1.7.0]
  @ 2022-02-01  1:27 74%                   ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2022-02-01  1:27 UTC (permalink / raw)
  To: Dominique Martinet; +Cc: Julien Moutinho, meta

Dominique Martinet <asmadeus@codewreck.org> wrote:
> Eric Wong wrote on Mon, Jan 31, 2022 at 02:03:11AM +0000:
> > Ah, intentionally setting BTRFS_TESTDIR to something that isn't
> > btrfs will break, yes.
> 
> Ok, so this is specific to the test.
> Checking now nodatacow_fh skips files with non-btrfs magic, so it looks
> good to me!
> 
> I've taken a look at the code now, just one more question: I don't
> understand why you've made the ioctl value depend on endianness ?

Actually, it's not endianness, it's architecture-specific :x
The nasty patch below fixes it.  I really wish there were better
options for scripting languages to use C headers :<
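
As a rough illustration of why the values differ by architecture and
not by byte order, the sketch below rebuilds the request numbers from
the kernel's _IOC bit layout.  It assumes the layout from
asm-generic/ioctl.h (14 size bits, 2 direction bits) and the
powerpc/mips/sparc variant (13 size bits, 3 direction bits, different
read/write flag values).  The names `%LAYOUT', `ioc' and
`fs_ioc_flags' are made up for this example and are not part of
PublicInbox::Syscall.

```perl
use strict;
use warnings;

# Bit layouts of ioctl request numbers (hypothetical table for this sketch).
# "generic" follows asm-generic/ioctl.h (x86 and friends);
# "alt" follows the powerpc/mips/sparc headers.
my %LAYOUT = (
	generic => { size_bits => 14, read => 2, write => 1 },
	alt     => { size_bits => 13, read => 2, write => 4 },
);

# rough equivalent of the C _IOC(dir, type, nr, size) macro
sub ioc {
	my ($layout, $dir, $type, $nr, $size) = @_;
	my $l = $LAYOUT{$layout} or die "unknown layout: $layout";
	my $dir_shift = 8 + 8 + $l->{size_bits};
	($l->{$dir} << $dir_shift) | ($size << 16) | (ord($type) << 8) | $nr;
}

# FS_IOC_GETFLAGS is _IOR('f', 1, long); FS_IOC_SETFLAGS is _IOW('f', 2, long)
sub fs_ioc_flags {
	my ($layout, $sizeof_long) = @_;
	(ioc($layout, 'read', 'f', 1, $sizeof_long),
	 ioc($layout, 'write', 'f', 2, $sizeof_long));
}

for my $c ([ generic => 8 ], [ generic => 4 ], [ alt => 8 ], [ alt => 4 ]) {
	printf("%-7s long=%d GETFLAGS=%#x SETFLAGS=%#x\n",
		@$c, fs_ioc_flags(@$c));
}
```

The generic layout with an 8-byte long gives 0x80086601/0x40086602
and the alternate layout with a 4-byte long gives
0x40046601/0x80046602, so both sizeof(long) and the direction-bit
layout feed into the number, while byte order does not.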

> That aside the code looks good to me, if you do Reviewed-by tags feel
> free to add mine (Dominique Martinet <asmadeus@codewreck.org>) once that
> question is answered.
> If you don't care, I don't care either :)

Thanks, just added the Noticed-by: as an attribution to the
below patch.

> this isn't really a problem, I've only tried because I'm a monkey :)

Yeah.  Setting BTRFS_TESTDIR to a non-btrfs dir isn't going to
be supported :>

> Thanks again for the support, don't hesitate to ask if you need further
> info or tests for the zfs problems.

You're welcome, and will do.

-----8<-----
Subject: [PATCH] syscall: FS_IOC_*FLAGS: define on per-architecture basis

It turns out these Linux ioctls are unfortunately
architecture-dependent, and not endian-dependent.
Fix up some warning messages while we're at it, too.

Fixes: 14fa0abdcc7b6513 ("rewrite Linux nodatacow use in pure Perl w/o system")
Link: https://public-inbox.org/meta/YfdYqLhDVQRQ9NGT@codewreck.org/
Noticed-by: Dominique Martinet <asmadeus@codewreck.org>
---
 lib/PublicInbox/Syscall.pm | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index bcfae2cb..31c67a14 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -69,6 +69,7 @@ our (
      );
 
 my $SYS_fstatfs; # don't need fstatfs64, just statfs.f_type
+my ($FS_IOC_GETFLAGS, $FS_IOC_SETFLAGS);
 my $SFD_CLOEXEC = 02000000; # Perl does not expose O_CLOEXEC
 our $no_deprecated = 0;
 
@@ -98,6 +99,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 327;
         $SYS_renameat2 //= 353;
 	$SYS_fstatfs = 100;
+	$FS_IOC_GETFLAGS = 0x80046601;
+	$FS_IOC_SETFLAGS = 0x40046602;
     } elsif ($machine eq "x86_64") {
         $SYS_epoll_create = 213;
         $SYS_epoll_ctl    = 233;
@@ -105,6 +108,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 289;
 	$SYS_renameat2 //= 316;
 	$SYS_fstatfs = 138;
+	$FS_IOC_GETFLAGS = 0x80086601;
+	$FS_IOC_SETFLAGS = 0x40086602;
     } elsif ($machine eq 'x32') {
         $SYS_epoll_create = 1073742037;
         $SYS_epoll_ctl = 1073742057;
@@ -112,6 +117,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 1073742113;
 	$SYS_renameat2 //= 0x40000000 + 316;
 	$SYS_fstatfs = 138;
+	$FS_IOC_GETFLAGS = 0x80046601;
+	$FS_IOC_SETFLAGS = 0x40046602;
     } elsif ($machine eq 'sparc64') {
 	$SYS_epoll_create = 193;
 	$SYS_epoll_ctl = 194;
@@ -121,6 +128,8 @@ if ($^O eq "linux") {
 	$SYS_renameat2 //= 345;
 	$SFD_CLOEXEC = 020000000;
 	$SYS_fstatfs = 158;
+	$FS_IOC_GETFLAGS = 0x40086601;
+	$FS_IOC_SETFLAGS = 0x80086602;
     } elsif ($machine =~ m/^parisc/) {
         $SYS_epoll_create = 224;
         $SYS_epoll_ctl    = 225;
@@ -135,6 +144,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 313;
 	$SYS_renameat2 //= 357;
 	$SYS_fstatfs = 100;
+	$FS_IOC_GETFLAGS = 0x40086601;
+	$FS_IOC_SETFLAGS = 0x80086602;
     } elsif ($machine eq "ppc") {
         $SYS_epoll_create = 236;
         $SYS_epoll_ctl    = 237;
@@ -143,6 +154,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 313;
 	$SYS_renameat2 //= 357;
 	$SYS_fstatfs = 100;
+	$FS_IOC_GETFLAGS = 0x40086601;
+	$FS_IOC_SETFLAGS = 0x80086602;
     } elsif ($machine =~ m/^s390/) {
         $SYS_epoll_create = 249;
         $SYS_epoll_ctl    = 250;
@@ -174,6 +187,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 74;
 	$SYS_renameat2 //= 276;
 	$SYS_fstatfs = 44;
+	$FS_IOC_GETFLAGS = 0x80086601;
+	$FS_IOC_SETFLAGS = 0x40086602;
     } elsif ($machine =~ m/arm(v\d+)?.*l/) {
         # ARM OABI
         $SYS_epoll_create = 250;
@@ -191,6 +206,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 5283;
 	$SYS_renameat2 //= 5311;
 	$SYS_fstatfs = 5135;
+	$FS_IOC_GETFLAGS = 0x40046601;
+	$FS_IOC_SETFLAGS = 0x80046602;
     } elsif ($machine =~ m/^mips/) {
         $SYS_epoll_create = 4248;
         $SYS_epoll_ctl    = 4249;
@@ -199,6 +216,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 4324;
 	$SYS_renameat2 //= 4351;
 	$SYS_fstatfs = 4100;
+	$FS_IOC_GETFLAGS = 0x40046601;
+	$FS_IOC_SETFLAGS = 0x80046602;
     } else {
         # as a last resort, try using the *.ph files which may not
         # exist or may be wrong
@@ -345,22 +364,14 @@ sub nodatacow_fh {
 	my $f_type = unpack('l!', $buf); # statfs.f_type is a signed word
 	return if $f_type != 0x9123683E; # BTRFS_SUPER_MAGIC
 
-	state ($FS_IOC_GETFLAGS, $FS_IOC_SETFLAGS);
-	unless (defined $FS_IOC_GETFLAGS) {
-		if (substr($Config{byteorder}, 0, 4) eq '1234') {
-			$FS_IOC_GETFLAGS = 0x80086601;
-			$FS_IOC_SETFLAGS = 0x40086602;
-		} else { # Big endian
-			$FS_IOC_GETFLAGS = 0x40086601;
-			$FS_IOC_SETFLAGS = 0x80086602;
-		}
-	}
+	$FS_IOC_GETFLAGS //
+		return warn('FS_IOC_GETFLAGS undefined for platform');
 	ioctl($fh, $FS_IOC_GETFLAGS, $buf) //
-		return warn("FS_IOC_GET_FLAGS: $!\n");
+		return warn("FS_IOC_GETFLAGS: $!\n");
 	my $attr = unpack('l!', $buf);
 	return if ($attr & 0x00800000); # FS_NOCOW_FL;
 	ioctl($fh, $FS_IOC_SETFLAGS, pack('l', $attr | 0x00800000)) //
-		warn("FS_IOC_SET_FLAGS: $!\n");
+		warn("FS_IOC_SETFLAGS: $!\n");
 }
 
 sub nodatacow_dir {


* [PATCH 1/3] syscall: drop unused EEXIST import
  @ 2022-03-23  8:54 93% ` Eric Wong
  2022-03-23  8:54 84% ` [PATCH 3/3] syscall: implement sendmsg+recvmsg in pure Perl Eric Wong
  2022-03-23 21:08 79% ` [PATCH 4/3] syscall: add sendmsg+recvmsg for remaining arches Eric Wong
  2 siblings, 0 replies; 51+ results
From: Eric Wong @ 2022-03-23  8:54 UTC (permalink / raw)
  To: meta

We've never used it, actually.
---
 lib/PublicInbox/Syscall.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index a2b7490a..806c192e 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -15,7 +15,7 @@ package PublicInbox::Syscall;
 use strict;
 use v5.10.1;
 use parent qw(Exporter);
-use POSIX qw(ENOENT EEXIST ENOSYS EINVAL O_NONBLOCK);
+use POSIX qw(ENOENT ENOSYS EINVAL O_NONBLOCK);
 use Config;
 
 # $VERSION = '0.25'; # Sys::Syscall version


* [PATCH 3/3] syscall: implement sendmsg+recvmsg in pure Perl
    2022-03-23  8:54 93% ` [PATCH 1/3] syscall: drop unused EEXIST import Eric Wong
@ 2022-03-23  8:54 84% ` Eric Wong
  2022-03-23 21:08 79% ` [PATCH 4/3] syscall: add sendmsg+recvmsg for remaining arches Eric Wong
  2 siblings, 0 replies; 51+ results
From: Eric Wong @ 2022-03-23  8:54 UTC (permalink / raw)
  To: meta

Socket::MsgHdr is only packaged for Debian and derivatives at
the moment, and Inline::C pulling in gcc/clang is a huge amount
of disk space and bandwidth for some users.

This enables disk space and/or bandwidth-limited users to use lei.

Only Linux guarantees a stable ABI and syscall numbers, but
that's the majority of our userbase.  FreeBSD users will still
have to use Inline::C (or get Socket::MsgHdr packaged).

x86, x32, and x86-64 are all currently supported, more to be added.
---
 devel/syscall-list         |  2 +
 lib/PublicInbox/Syscall.pm | 95 ++++++++++++++++++++++++++++++++++++++
 script/lei                 |  6 ++-
 t/cmd_ipc.t                | 13 +++++-
 4 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/devel/syscall-list b/devel/syscall-list
index a6b1bfa7..d33a8a78 100755
--- a/devel/syscall-list
+++ b/devel/syscall-list
@@ -49,6 +49,8 @@ int main(void)
 	D(SYS_inotify_rm_watch);
 	D(SYS_prctl);
 	D(SYS_fstatfs);
+	D(SYS_sendmsg);
+	D(SYS_recvmsg);
 #ifdef FS_IOC_GETFLAGS
 	printf("FS_IOC_GETFLAGS=%#lx\nFS_IOC_SETFLAGS=%#lx\n",
 		(unsigned long)FS_IOC_GETFLAGS, (unsigned long)FS_IOC_SETFLAGS);
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 806c192e..e9175ceb 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -2,6 +2,9 @@
 # specifically the Debian libsys-syscall-perl 0.25-6 version to
 # fix upstream regressions in 0.25.
 #
+# See devel/syscall-list in the public-inbox source tree for maintenance
+# <https://80x24.org/public-inbox.git>
+#
 # This license differs from the rest of public-inbox
 #
 # This module is Copyright (c) 2005 Six Apart, Ltd.
@@ -16,6 +19,7 @@ use strict;
 use v5.10.1;
 use parent qw(Exporter);
 use POSIX qw(ENOENT ENOSYS EINVAL O_NONBLOCK);
+use Socket qw(SOL_SOCKET SCM_RIGHTS);
 use Config;
 
 # $VERSION = '0.25'; # Sys::Syscall version
@@ -42,8 +46,19 @@ use constant {
 	EPOLL_CTL_ADD => 1,
 	EPOLL_CTL_DEL => 2,
 	EPOLL_CTL_MOD => 3,
+	SIZEOF_int => $Config{intsize},
+	SIZEOF_size_t => $Config{sizesize},
+	NUL => "\0",
+};
+
+use constant {
+	TMPL_size_t => SIZEOF_size_t == 8 ? 'Q' : 'L',
+	BYTES_4_hole => SIZEOF_size_t == 8 ? 'L' : '',
+	# cmsg_len, cmsg_level, cmsg_type
+	SIZEOF_cmsghdr => SIZEOF_int * 2 + SIZEOF_size_t,
 };
 
+my @BYTES_4_hole = BYTES_4_hole ? (0) : ();
 our $loaded_syscall = 0;
 
 sub _load_syscall {
@@ -68,6 +83,7 @@ our (
      $SYS_renameat2,
      );
 
+my ($SYS_sendmsg, $SYS_recvmsg);
 my $SYS_fstatfs; # don't need fstatfs64, just statfs.f_type
 my ($FS_IOC_GETFLAGS, $FS_IOC_SETFLAGS);
 my $SFD_CLOEXEC = 02000000; # Perl does not expose O_CLOEXEC
@@ -99,6 +115,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 327;
         $SYS_renameat2 //= 353;
 	$SYS_fstatfs = 100;
+	$SYS_sendmsg = 370;
+	$SYS_recvmsg = 372;
 	$FS_IOC_GETFLAGS = 0x80046601;
 	$FS_IOC_SETFLAGS = 0x40046602;
     } elsif ($machine eq "x86_64") {
@@ -108,6 +126,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 289;
 	$SYS_renameat2 //= 316;
 	$SYS_fstatfs = 138;
+	$SYS_sendmsg = 46;
+	$SYS_recvmsg = 47;
 	$FS_IOC_GETFLAGS = 0x80086601;
 	$FS_IOC_SETFLAGS = 0x40086602;
     } elsif ($machine eq 'x32') {
@@ -117,6 +137,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 1073742113;
 	$SYS_renameat2 //= 0x40000000 + 316;
 	$SYS_fstatfs = 138;
+	$SYS_sendmsg = 0x40000206;
+	$SYS_recvmsg = 0x40000207;
 	$FS_IOC_GETFLAGS = 0x80046601;
 	$FS_IOC_SETFLAGS = 0x40046602;
     } elsif ($machine eq 'sparc64') {
@@ -378,6 +400,79 @@ sub nodatacow_dir {
 	if (open my $fh, '<', $_[0]) { nodatacow_fh($fh) }
 }
 
+sub CMSG_ALIGN ($) { ($_[0] + SIZEOF_size_t - 1) & ~(SIZEOF_size_t - 1) }
+use constant CMSG_ALIGN_SIZEOF_cmsghdr => CMSG_ALIGN(SIZEOF_cmsghdr);
+sub CMSG_SPACE ($) { CMSG_ALIGN($_[0]) + CMSG_ALIGN_SIZEOF_cmsghdr }
+sub CMSG_LEN ($) { CMSG_ALIGN_SIZEOF_cmsghdr + $_[0] }
+
+if (defined($SYS_sendmsg) && defined($SYS_recvmsg)) {
+no warnings 'once';
+*send_cmd4 = sub ($$$$) {
+	my ($sock, $fds, undef, $flags) = @_;
+	my $iov = pack('P'.TMPL_size_t,
+			$_[2] // NUL, length($_[2] // NUL) || 1);
+	my $cmsghdr = pack(TMPL_size_t . # cmsg_len
+			'LL' .  # cmsg_level, cmsg_type,
+			('i' x scalar(@$fds)),
+			CMSG_LEN(scalar(@$fds) * SIZEOF_int), # cmsg_len
+			SOL_SOCKET, SCM_RIGHTS, # cmsg_{level,type}
+			@$fds); # CMSG_DATA
+	my $mh = pack('PL' . # msg_name, msg_namelen (socklen_t (U32))
+			BYTES_4_hole . # 4-byte padding on 64-bit
+			'P'.TMPL_size_t . # msg_iov, msg_iovlen,
+			'P'.TMPL_size_t . # msg_control, msg_controllen,
+			'i', # msg_flags
+			NUL, 0, # msg_name, msg_namelen (unused)
+			@BYTES_4_hole,
+			$iov, 1, # msg_iov, msg_iovlen
+			$cmsghdr, # msg_control
+			CMSG_SPACE(scalar(@$fds) * SIZEOF_int), # msg_controllen
+			0); # msg_flags
+	my $sent;
+	my $try = 0;
+	do {
+		$sent = syscall($SYS_sendmsg, fileno($sock), $mh, $flags);
+	} while ($sent < 0 &&
+			($!{ENOBUFS} || $!{ENOMEM} || $!{ETOOMANYREFS}) &&
+			(++$try < 50) &&
+			warn "sleeping on sendmsg: $! (#$try)\n" &&
+			select(undef, undef, undef, 0.1) == 0);
+	$sent >= 0 ? $sent : undef;
+};
+
+*recv_cmd4 = sub ($$$) {
+	my ($sock, undef, $len) = @_;
+	vec($_[1], ($len + 1) * 8, 1) = 0;
+	vec(my $cmsghdr = '', 256 * 8 - 1, 1) = 1;
+	my $iov = pack('P'.TMPL_size_t, $_[1], $len);
+	my $mh = pack('PL' . # msg_name, msg_namelen (socklen_t (U32))
+			BYTES_4_hole . # 4-byte padding on 64-bit
+			'P'.TMPL_size_t . # msg_iov, msg_iovlen,
+			'P'.TMPL_size_t . # msg_control, msg_controllen,
+			'i', # msg_flags
+			NUL, 0, # msg_name, msg_namelen (unused)
+			@BYTES_4_hole,
+			$iov, 1, # msg_iov, msg_iovlen
+			$cmsghdr, # msg_control
+			256, # msg_controllen
+			0); # msg_flags
+	my $r = syscall($SYS_recvmsg, fileno($sock), $mh, 0);
+	return (undef) if $r < 0; # $! set
+	substr($_[1], $r, length($_[1]), '');
+	my @ret;
+	if ($r > 0) {
+		my ($len, $lvl, $type, @fds) = unpack(TMPL_size_t . # cmsg_len
+					'LLi*', # cmsg_level, cmsg_type, @fds
+					$cmsghdr);
+		if ($lvl == SOL_SOCKET && $type == SCM_RIGHTS) {
+			$len -= CMSG_ALIGN_SIZEOF_cmsghdr;
+			@ret = @fds[0..(($len / SIZEOF_int) - 1)];
+		}
+	}
+	@ret;
+};
+}
+
 1;
 
 =head1 WARRANTY
diff --git a/script/lei b/script/lei
index 5cad19d7..adef9944 100755
--- a/script/lei
+++ b/script/lei
@@ -1,5 +1,5 @@
 #!perl -w
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
 use v5.10.1;
@@ -9,6 +9,10 @@ my $narg = 5;
 my $sock;
 my $recv_cmd = PublicInbox::CmdIPC4->can('recv_cmd4');
 my $send_cmd = PublicInbox::CmdIPC4->can('send_cmd4') // do {
+	require PublicInbox::Syscall;
+	$recv_cmd = PublicInbox::Syscall->can('recv_cmd4');
+	PublicInbox::Syscall->can('send_cmd4');
+} // do {
 	my $inline_dir = $ENV{PERL_INLINE_DIRECTORY} //= (
 			$ENV{XDG_CACHE_HOME} //
 			( ($ENV{HOME} // '/nonexistent').'/.cache' )
diff --git a/t/cmd_ipc.t b/t/cmd_ipc.t
index dd90fa2a..75697a15 100644
--- a/t/cmd_ipc.t
+++ b/t/cmd_ipc.t
@@ -1,5 +1,5 @@
 #!perl -w
-# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
 use v5.10.1;
@@ -142,4 +142,15 @@ SKIP: {
 	}
 }
 
+SKIP: {
+	skip 'not Linux', 1 if $^O ne 'linux';
+	require_ok 'PublicInbox::Syscall';
+	$send = PublicInbox::Syscall->can('send_cmd4') or
+		skip 'send_cmd4 not defined for arch';
+	$recv = PublicInbox::Syscall->can('recv_cmd4') or
+		skip 'recv_cmd4 not defined for arch';
+	$do_test->(SOCK_STREAM, 0, 'PP Linux stream');
+	$do_test->($SOCK_SEQPACKET, MSG_EOR, 'PP Linux seqpacket');
+}
+
 done_testing;

^ permalink raw reply related	[relevance 84%]

* [PATCH 4/3] syscall: add sendmsg+recvmsg for remaining arches
    2022-03-23  8:54 93% ` [PATCH 1/3] syscall: drop unused EEXIST import Eric Wong
  2022-03-23  8:54 84% ` [PATCH 3/3] syscall: implement sendmsg+recvmsg in pure Perl Eric Wong
@ 2022-03-23 21:08 79% ` Eric Wong
  2 siblings, 0 replies; 51+ results
From: Eric Wong @ 2022-03-23 21:08 UTC (permalink / raw)
  To: meta

aarch64, ppc64le, sparc64, loongarch64, and mips (32-bit userspace)
are all tested via machines from the GCC Farm Project
<https://cfarm.tetaneutral.net/>

Remaining syscall numbers are from musl <https://musl.libc.org/>
---
 lib/PublicInbox/Syscall.pm | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index e9175ceb..e972aa41 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -3,7 +3,8 @@
 # fix upstream regressions in 0.25.
 #
 # See devel/syscall-list in the public-inbox source tree for maintenance
-# <https://80x24.org/public-inbox.git>
+# <https://80x24.org/public-inbox.git>, and machines from the GCC Farm:
+# <https://cfarm.tetaneutral.net/>
 #
 # This license differs from the rest of public-inbox
 #
@@ -150,6 +151,8 @@ if ($^O eq "linux") {
 	$SYS_renameat2 //= 345;
 	$SFD_CLOEXEC = 020000000;
 	$SYS_fstatfs = 158;
+	$SYS_sendmsg = 114;
+	$SYS_recvmsg = 113;
 	$FS_IOC_GETFLAGS = 0x40086601;
 	$FS_IOC_SETFLAGS = 0x80086602;
     } elsif ($machine =~ m/^parisc/) {
@@ -166,6 +169,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 313;
 	$SYS_renameat2 //= 357;
 	$SYS_fstatfs = 100;
+	$SYS_sendmsg = 341;
+	$SYS_recvmsg = 342;
 	$FS_IOC_GETFLAGS = 0x40086601;
 	$FS_IOC_SETFLAGS = 0x80086602;
     } elsif ($machine eq "ppc") {
@@ -178,7 +183,7 @@ if ($^O eq "linux") {
 	$SYS_fstatfs = 100;
 	$FS_IOC_GETFLAGS = 0x40086601;
 	$FS_IOC_SETFLAGS = 0x80086602;
-    } elsif ($machine =~ m/^s390/) {
+    } elsif ($machine =~ m/^s390/) { # untested, no machine on cfarm
         $SYS_epoll_create = 249;
         $SYS_epoll_ctl    = 250;
         $SYS_epoll_wait   = 251;
@@ -186,13 +191,15 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 322;
 	$SYS_renameat2 //= 347;
 	$SYS_fstatfs = 100;
-    } elsif ($machine eq "ia64") {
+	$SYS_sendmsg = 370;
+	$SYS_recvmsg = 372;
+    } elsif ($machine eq 'ia64') { # untested, no machine on cfarm
         $SYS_epoll_create = 1243;
         $SYS_epoll_ctl    = 1244;
         $SYS_epoll_wait   = 1245;
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 289;
-    } elsif ($machine eq "alpha") {
+    } elsif ($machine eq "alpha") { # untested, no machine on cfarm
         # natural alignment, ints are 32-bits
         $SYS_epoll_create = 407;
         $SYS_epoll_ctl    = 408;
@@ -200,7 +207,7 @@ if ($^O eq "linux") {
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 484;
 	$SFD_CLOEXEC = 010000000;
-    } elsif ($machine eq "aarch64") {
+    } elsif ($machine eq 'aarch64' || $machine eq 'loongarch64') {
         $SYS_epoll_create = 20;  # (sys_epoll_create1)
         $SYS_epoll_ctl    = 21;
         $SYS_epoll_wait   = 22;  # (sys_epoll_pwait)
@@ -209,10 +216,11 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 74;
 	$SYS_renameat2 //= 276;
 	$SYS_fstatfs = 44;
+	$SYS_sendmsg = 211;
+	$SYS_recvmsg = 212;
 	$FS_IOC_GETFLAGS = 0x80086601;
 	$FS_IOC_SETFLAGS = 0x40086602;
-    } elsif ($machine =~ m/arm(v\d+)?.*l/) {
-        # ARM OABI
+    } elsif ($machine =~ m/arm(v\d+)?.*l/) { # ARM OABI (untested on cfarm)
         $SYS_epoll_create = 250;
         $SYS_epoll_ctl    = 251;
         $SYS_epoll_wait   = 252;
@@ -220,7 +228,9 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 355;
 	$SYS_renameat2 //= 382;
 	$SYS_fstatfs = 100;
-    } elsif ($machine =~ m/^mips64/) {
+	$SYS_sendmsg = 296;
+	$SYS_recvmsg = 297;
+    } elsif ($machine =~ m/^mips64/) { # cfarm only has 32-bit userspace
         $SYS_epoll_create = 5207;
         $SYS_epoll_ctl    = 5208;
         $SYS_epoll_wait   = 5209;
@@ -228,9 +238,11 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 5283;
 	$SYS_renameat2 //= 5311;
 	$SYS_fstatfs = 5135;
+	$SYS_sendmsg = 5045;
+	$SYS_recvmsg = 5046;
 	$FS_IOC_GETFLAGS = 0x40046601;
 	$FS_IOC_SETFLAGS = 0x80046602;
-    } elsif ($machine =~ m/^mips/) {
+    } elsif ($machine =~ m/^mips/) { # 32-bit, tested on mips64 cfarm machine
         $SYS_epoll_create = 4248;
         $SYS_epoll_ctl    = 4249;
         $SYS_epoll_wait   = 4250;
@@ -238,6 +250,8 @@ if ($^O eq "linux") {
         $SYS_signalfd4 = 4324;
 	$SYS_renameat2 //= 4351;
 	$SYS_fstatfs = 4100;
+	$SYS_sendmsg = 4179;
+	$SYS_recvmsg = 4177;
 	$FS_IOC_GETFLAGS = 0x40046601;
 	$FS_IOC_SETFLAGS = 0x80046602;
     } else {

^ permalink raw reply related	[relevance 79%]

* [PATCH 4/4] syscall: golf + more idiomatic buffer initialization
    2022-04-18  9:50 91% ` [PATCH 2/4] syscall: more idiomatic cmsghdr space allocation Eric Wong
@ 2022-04-18  9:50 93% ` Eric Wong
  1 sibling, 0 replies; 51+ results
From: Eric Wong @ 2022-04-18  9:50 UTC (permalink / raw)
  To: meta

While `vec' is useful for user-supplied buffers to avoid excess
memory traffic, it provides no benefit when we need to allocate
our own buffers as we do in nodatacow_fh, since Perl can't elide
memset(ptr, 0, len).  So just use the idiomatic `"\0" x $LEN' here.
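
A small sketch comparing the two initializations for a freshly
allocated buffer (the variable names are only for this example; 120
is the fstatfs buffer size used in nodatacow_fh):

```perl
use strict;
use warnings;

my $len = 120;

# old style: writing a bit past the end makes Perl zero-extend the string
my $via_vec = '';
vec($via_vec, $len * 8 - 1, 1) = 0;

# idiomatic style: repeat a NUL byte $len times
my $via_x = "\0" x $len;

print length($via_vec), ' ', length($via_x), ' ',
	($via_vec eq $via_x ? 'same' : 'differ'), "\n"; # 120 120 same
```

Both give an identical all-zero buffer, so the shorter form loses
nothing when the buffer is ours to allocate.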
---
 lib/PublicInbox/Syscall.pm | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index cc282f9f..22b779ad 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -390,12 +390,10 @@ sub rename_noreplace ($$) {
 	}
 }
 
-sub nodatacow_fh {
-	return if !defined($SYS_fstatfs);
-	my $buf = '';
-	vec($buf, 120 * 8 - 1, 1) = 0;
+sub nodatacow_fh ($) {
 	my ($fh) = @_;
-	syscall($SYS_fstatfs, fileno($fh), $buf) == 0 or
+	my $buf = "\0" x 120;
+	syscall($SYS_fstatfs // return, fileno($fh), $buf) == 0 or
 		return warn("fstatfs: $!\n");
 	my $f_type = unpack('l!', $buf); # statfs.f_type is a signed word
 	return if $f_type != 0x9123683E; # BTRFS_SUPER_MAGIC

^ permalink raw reply related	[relevance 93%]

* [PATCH 2/4] syscall: more idiomatic cmsghdr space allocation
  @ 2022-04-18  9:50 91% ` Eric Wong
  2022-04-18  9:50 93% ` [PATCH 4/4] syscall: golf + more idiomatic buffer initialization Eric Wong
  1 sibling, 0 replies; 51+ results
From: Eric Wong @ 2022-04-18  9:50 UTC (permalink / raw)
  To: meta

Since we know the space required under Linux, we can use the
same initialization as the Inline::C version instead of
hard-coding 256 as we do for Socket::MsgHdr.
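
As a rough sketch of the arithmetic (lower-case helper names are
local to this example, and the extra 16 bytes are simply copied from
the patch), the control buffer size for 10 FDs can be computed like
this:

```perl
use strict;
use warnings;
use Config;
use constant {
	SIZEOF_int => $Config{intsize},
	SIZEOF_size_t => $Config{sizesize},
};

# round up to the next multiple of sizeof(size_t)
sub cmsg_align { ($_[0] + SIZEOF_size_t - 1) & ~(SIZEOF_size_t - 1) }
# aligned size of cmsg_len + cmsg_level + cmsg_type
sub cmsg_hdr { cmsg_align(SIZEOF_int * 2 + SIZEOF_size_t) }
sub cmsg_space { cmsg_align($_[0]) + cmsg_hdr() }

my $nfds = 10;
my $controllen = cmsg_space($nfds * SIZEOF_int) + 16;
my $cmsghdr = "\0" x $controllen; # zeroed receive buffer
printf "msg_controllen=%d for %d FDs\n", $controllen, $nfds;
```

With 4-byte ints and an 8-byte size_t this works out to 72 bytes,
well under the 256 previously hard-coded.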
---
 lib/PublicInbox/Syscall.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index e972aa41..cc282f9f 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -418,6 +418,7 @@ sub CMSG_ALIGN ($) { ($_[0] + SIZEOF_size_t - 1) & ~(SIZEOF_size_t - 1) }
 use constant CMSG_ALIGN_SIZEOF_cmsghdr => CMSG_ALIGN(SIZEOF_cmsghdr);
 sub CMSG_SPACE ($) { CMSG_ALIGN($_[0]) + CMSG_ALIGN_SIZEOF_cmsghdr }
 sub CMSG_LEN ($) { CMSG_ALIGN_SIZEOF_cmsghdr + $_[0] }
+use constant msg_controllen => CMSG_SPACE(10 * SIZEOF_int) + 16; # 10 FDs
 
 if (defined($SYS_sendmsg) && defined($SYS_recvmsg)) {
 no warnings 'once';
@@ -457,7 +458,7 @@ no warnings 'once';
 *recv_cmd4 = sub ($$$) {
 	my ($sock, undef, $len) = @_;
 	vec($_[1], ($len + 1) * 8, 1) = 0;
-	vec(my $cmsghdr = '', 256 * 8 - 1, 1) = 1;
+	my $cmsghdr = "\0" x msg_controllen; # 10 * sizeof(int)
 	my $iov = pack('P'.TMPL_size_t, $_[1], $len);
 	my $mh = pack('PL' . # msg_name, msg_namelen (socklen_t (U32))
 			BYTES_4_hole . # 4-byte padding on 64-bit
@@ -468,7 +469,7 @@ no warnings 'once';
 			@BYTES_4_hole,
 			$iov, 1, # msg_iov, msg_iovlen
 			$cmsghdr, # msg_control
-			256, # msg_controllen
+			msg_controllen,
 			0); # msg_flags
 	my $r = syscall($SYS_recvmsg, fileno($sock), $mh, 0);
 	return (undef) if $r < 0; # $! set

^ permalink raw reply related	[relevance 91%]

* [PATCH 2/2] lei: move to v5.12 to avoid "use strict"
  @ 2022-04-23 22:03 88% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2022-04-23 22:03 UTC (permalink / raw)
  To: meta

Socket.pm still loads strict.pm, unfortunately, which hurts
startup time; but we'll save some LoC this way.
---
 lib/PublicInbox/CmdIPC4.pm | 3 +--
 lib/PublicInbox/LEI.pm     | 3 +--
 lib/PublicInbox/Spawn.pm   | 2 +-
 lib/PublicInbox/Syscall.pm | 3 +--
 script/lei                 | 3 +--
 5 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/CmdIPC4.pm b/lib/PublicInbox/CmdIPC4.pm
index 76938b6d..e368d032 100644
--- a/lib/PublicInbox/CmdIPC4.pm
+++ b/lib/PublicInbox/CmdIPC4.pm
@@ -5,8 +5,7 @@
 # first choice for script/lei front-end and 2nd choice for lei backend
 # libsocket-msghdr-perl is in Debian but not many other distros as of 2021.
 package PublicInbox::CmdIPC4;
-use strict;
-use v5.10.1;
+use v5.12;
 use Socket qw(SOL_SOCKET SCM_RIGHTS);
 BEGIN { eval {
 require Socket::MsgHdr; # XS
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 93b4ea03..89aa4119 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -6,8 +6,7 @@
 # local clients with read/write access to the FS and use as many
 # system resources as the local user has access to.
 package PublicInbox::LEI;
-use strict;
-use v5.10.1;
+use v5.12;
 use parent qw(PublicInbox::DS PublicInbox::LeiExternal
 	PublicInbox::LeiQuery);
 use Getopt::Long ();
diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm
index 137b8087..3f69108a 100644
--- a/lib/PublicInbox/Spawn.pm
+++ b/lib/PublicInbox/Spawn.pm
@@ -15,7 +15,7 @@
 # We don't want too many DSOs: https://udrepper.livejournal.com/8790.html
 
 package PublicInbox::Spawn;
-use strict;
+use v5.12;
 use parent qw(Exporter);
 use Symbol qw(gensym);
 use Fcntl qw(LOCK_EX SEEK_SET);
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 22b779ad..777c44d0 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -16,8 +16,7 @@
 # You may distribute under the terms of either the GNU General Public
 # License or the Artistic License, as specified in the Perl README file.
 package PublicInbox::Syscall;
-use strict;
-use v5.10.1;
+use v5.12;
 use parent qw(Exporter);
 use POSIX qw(ENOENT ENOSYS EINVAL O_NONBLOCK);
 use Socket qw(SOL_SOCKET SCM_RIGHTS);
diff --git a/script/lei b/script/lei
index adef9944..5feb7751 100755
--- a/script/lei
+++ b/script/lei
@@ -1,8 +1,7 @@
 #!perl -w
 # Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-use strict;
-use v5.10.1;
+use v5.12;
 use Socket qw(AF_UNIX SOCK_SEQPACKET MSG_EOR pack_sockaddr_un);
 use PublicInbox::CmdIPC4;
 my $narg = 5;

^ permalink raw reply related	[relevance 88%]

* [PATCH] syscall: add support for riscv64
@ 2022-08-11 22:33 93% Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2022-08-11 22:33 UTC (permalink / raw)
  To: meta

Tested on gcc92.fsffrance.org from cfarm.
---
 lib/PublicInbox/Syscall.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 777c44d0..46496bca 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -206,7 +206,7 @@ if ($^O eq "linux") {
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 484;
 	$SFD_CLOEXEC = 010000000;
-    } elsif ($machine eq 'aarch64' || $machine eq 'loongarch64') {
+    } elsif ($machine =~ /\A(?:loong)?aarch64\z/ || $machine eq 'riscv64') {
         $SYS_epoll_create = 20;  # (sys_epoll_create1)
         $SYS_epoll_ctl    = 21;
         $SYS_epoll_wait   = 22;  # (sys_epoll_pwait)

^ permalink raw reply related	[relevance 93%]

* [PATCH 1/4] syscall: initialize buffer for vec()
  @ 2022-09-29 17:48 93% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2022-09-29 17:48 UTC (permalink / raw)
  To: meta; +Cc: Konstantin Ryabitsev

This is needed for older Perls (tested perl 5.16.3 on CentOS 7).
---
 lib/PublicInbox/Syscall.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 46496bca..412ca64f 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -454,7 +454,7 @@ no warnings 'once';
 
 *recv_cmd4 = sub ($$$) {
 	my ($sock, undef, $len) = @_;
-	vec($_[1], ($len + 1) * 8, 1) = 0;
+	vec($_[1] //= '', ($len + 1) * 8, 1) = 0;
 	my $cmsghdr = "\0" x msg_controllen; # 10 * sizeof(int)
 	my $iov = pack('P'.TMPL_size_t, $_[1], $len);
 	my $mh = pack('PL' . # msg_name, msg_namelen (socklen_t (U32))

^ permalink raw reply related	[relevance 93%]

* [PATCH 2/2] sigfd: set SIGWINCH for MIPS and PA-RISC on Linux
    2022-10-17  9:30 91% ` [PATCH 1/2] syscall: avoid needless string comparison on x86-64 Eric Wong
@ 2022-10-17  9:30 96% ` Eric Wong
  1 sibling, 0 replies; 51+ results
From: Eric Wong @ 2022-10-17  9:30 UTC (permalink / raw)
  To: Nicolás Ojeda Bär; +Cc: meta

SIGWINCH is actually different for these architectures on Linux
according to the signal(7) man page.

Note: AFAICS there's no parisc machine in the GCC Farm[1],
so it remains untested.  I've only tested mips64 for mips,
but I expect them to both work.

OpenBSD (on gcc231) octeon defines SIGWINCH as the common `28',
so it appears Linux is the only one with arch-dependent signal
numbers.

[1] https://cfarm.tetaneutral.net/machines/list/
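
A minimal sketch of the lookup this patch introduces: consult a
table of overrides first and fall back to the POSIX constant
otherwise.  The `signum' helper and the local %SIGNUM copy are only
for illustration, the real table lives in PublicInbox::Syscall.

```perl
use strict;
use warnings;
use POSIX ();

# POSIX.pm does not provide SIGWINCH, so it needs a table entry;
# 28 is the common value, MIPS and PA-RISC Linux override it
my %SIGNUM = (WINCH => 28);

sub signum {
	my ($name) = @_;
	$SIGNUM{$name} // POSIX->can("SIG$name")->();
}

printf "%s=%d\n", $_, signum($_) for qw(HUP TERM INT WINCH);
```

Only names missing from both the table and POSIX.pm would fail
here, since POSIX->can returns undef for them.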
---
 devel/syscall-list         |  2 ++
 lib/PublicInbox/Sigfd.pm   | 13 ++++---------
 lib/PublicInbox/Syscall.pm |  5 ++++-
 t/sigfd.t                  |  5 ++++-
 4 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/devel/syscall-list b/devel/syscall-list
index adb450da..0b36c0e2 100755
--- a/devel/syscall-list
+++ b/devel/syscall-list
@@ -26,6 +26,7 @@ system($cc, '-o', $x, $f, @cflags) == 0 or die "cc failed \$?=$?";
 exec($x);
 __DATA__
 #define _GNU_SOURCE
+#include <signal.h>
 #include <sys/syscall.h>
 #include <sys/ioctl.h>
 #ifdef __linux__
@@ -65,5 +66,6 @@ int main(void)
 #endif /* Linux, any other OSes with stable syscalls? */
 	printf("size_t=%zu off_t=%zu pid_t=%zu\n",
 		 sizeof(size_t), sizeof(off_t), sizeof(pid_t));
+	D(SIGWINCH);
 	return 0;
 }
diff --git a/lib/PublicInbox/Sigfd.pm b/lib/PublicInbox/Sigfd.pm
index 583f9f14..3d964be3 100644
--- a/lib/PublicInbox/Sigfd.pm
+++ b/lib/PublicInbox/Sigfd.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2019-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # Wraps a signalfd (or similar) for PublicInbox::DS
@@ -6,7 +6,7 @@
 package PublicInbox::Sigfd;
 use strict;
 use parent qw(PublicInbox::DS);
-use PublicInbox::Syscall qw(signalfd EPOLLIN EPOLLET);
+use PublicInbox::Syscall qw(signalfd EPOLLIN EPOLLET %SIGNUM);
 use POSIX ();
 
 # returns a coderef to unblock signals if neither signalfd or kqueue
@@ -14,13 +14,8 @@ use POSIX ();
 sub new {
 	my ($class, $sig, $nonblock) = @_;
 	my %signo = map {;
-		my $cb = $sig->{$_};
-		# SIGWINCH is 28 on FreeBSD, NetBSD, OpenBSD, Darwin
-		my $num = ($_ eq 'WINCH' && $^O =~ /linux|bsd|darwin/i) ? 28 : do {
-			my $m = "SIG$_";
-			POSIX->$m;
-		};
-		$num => $cb;
+		# $num => $cb;
+		($SIGNUM{$_} // POSIX->can("SIG$_")->()) => $sig->{$_}
 	} keys %$sig;
 	my $self = bless { sig => \%signo }, $class;
 	my $io;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 291e0489..ee4c6107 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -21,13 +21,14 @@ use parent qw(Exporter);
 use POSIX qw(ENOENT ENOSYS EINVAL O_NONBLOCK);
 use Socket qw(SOL_SOCKET SCM_RIGHTS);
 use Config;
+our %SIGNUM = (WINCH => 28); # most Linux, {Free,Net,Open}BSD, *Darwin
 
 # $VERSION = '0.25'; # Sys::Syscall version
 our @EXPORT_OK = qw(epoll_ctl epoll_create epoll_wait
                   EPOLLIN EPOLLOUT EPOLLET
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                   EPOLLONESHOT EPOLLEXCLUSIVE
-                  signalfd rename_noreplace);
+                  signalfd rename_noreplace %SIGNUM);
 our %EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
                              EPOLLIN EPOLLOUT
                              EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
@@ -159,6 +160,7 @@ if ($^O eq "linux") {
         $SYS_epoll_wait   = 226;
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 309;
+        $SIGNUM{WINCH} = 23;
     } elsif ($machine =~ m/^ppc64/) {
         $SYS_epoll_create = 236;
         $SYS_epoll_ctl    = 237;
@@ -252,6 +254,7 @@ if ($^O eq "linux") {
 	$SYS_recvmsg = 4177;
 	$FS_IOC_GETFLAGS = 0x40046601;
 	$FS_IOC_SETFLAGS = 0x80046602;
+	$SIGNUM{WINCH} = 20;
     } else {
         # as a last resort, try using the *.ph files which may not
         # exist or may be wrong
diff --git a/t/sigfd.t b/t/sigfd.t
index a68b12a6..7eb6b222 100644
--- a/t/sigfd.t
+++ b/t/sigfd.t
@@ -18,7 +18,8 @@ SKIP: {
 	local $SIG{HUP} = sub { $hit->{HUP}->{normal}++ };
 	local $SIG{TERM} = sub { $hit->{TERM}->{normal}++ };
 	local $SIG{INT} = sub { $hit->{INT}->{normal}++ };
-	for my $s (qw(HUP TERM INT)) {
+	local $SIG{WINCH} = sub { $hit->{WINCH}->{normal}++ };
+	for my $s (qw(HUP TERM INT WINCH)) {
 		$sig->{$s} = sub { $hit->{$s}->{sigfd}++ };
 	}
 	my $sigfd = PublicInbox::Sigfd->new($sig, 0);
@@ -26,6 +27,7 @@ SKIP: {
 		ok($sigfd, 'Sigfd->new works');
 		kill('HUP', $$) or die "kill $!";
 		kill('INT', $$) or die "kill $!";
+		kill('WINCH', $$) or die "kill $!";
 		my $fd = fileno($sigfd->{sock});
 		ok($fd >= 0, 'fileno(Sigfd->{sock}) works');
 		my $rvec = '';
@@ -54,6 +56,7 @@ SKIP: {
 		PublicInbox::DS->Reset;
 		is($hit->{TERM}->{sigfd}, 1, 'TERM sigfd fired in event loop');
 		is($hit->{HUP}->{sigfd}, 3, 'HUP sigfd fired in event loop');
+		is($hit->{WINCH}->{sigfd}, 1, 'WINCH sigfd fired in event loop');
 	} else {
 		skip('signalfd disabled?', 10);
 	}

^ permalink raw reply related	[relevance 96%]

* [PATCH 1/2] syscall: avoid needless string comparison on x86-64
  @ 2022-10-17  9:30 91% ` Eric Wong
  2022-10-17  9:30 96% ` [PATCH 2/2] sigfd: set SIGWINCH for MIPS and PA-RISC on Linux Eric Wong
  1 sibling, 0 replies; 51+ results
From: Eric Wong @ 2022-10-17  9:30 UTC (permalink / raw)
  To: Nicolás Ojeda Bär; +Cc: meta

For common x86-64 systems, we can avoid a needless
string comparison on `mips64' by restructuring the
branches for architecture detection.
---
 lib/PublicInbox/Syscall.pm | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 412ca64f..291e0489 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -97,15 +97,14 @@ if ($^O eq "linux") {
     # boundaries.
     my $u64_mod_8 = 0;
 
-    # if we're running on an x86_64 kernel, but a 32-bit process,
-    # we need to use the x32 or i386 syscall numbers.
-    if ($machine eq "x86_64" && $Config{ptrsize} == 4) {
-        $machine = $Config{cppsymbols} =~ /\b__ILP32__=1\b/ ? 'x32' : 'i386';
-    }
-
-    # Similarly for mips64 vs mips
-    if ($machine eq "mips64" && $Config{ptrsize} == 4) {
-        $machine = "mips";
+    if ($Config{ptrsize} == 4) {
+	# if we're running on an x86_64 kernel, but a 32-bit process,
+	# we need to use the x32 or i386 syscall numbers.
+	if ($machine eq 'x86_64') {
+	    $machine = $Config{cppsymbols} =~ /\b__ILP32__=1\b/ ? 'x32' : 'i386'
+	} elsif ($machine eq 'mips64') { # similarly for mips64 vs mips
+	    $machine = 'mips';
+	}
     }
 
     if ($machine =~ m/^i[3456]86$/) {

^ permalink raw reply related	[relevance 91%]

* [PATCH 1/2] syscall: get rid of epoll_defined() sub
  @ 2022-12-23 12:51 97% ` Eric Wong
  2022-12-23 12:51 82% ` [PATCH 2/2] syscall: drop syscall.ph support Eric Wong
  1 sibling, 0 replies; 51+ results
From: Eric Wong @ 2022-12-23 12:51 UTC (permalink / raw)
  To: meta

We can just check defined() on the `our' var itself and
save the process several kilobytes of memory.
---
 lib/PublicInbox/DS.pm      | 2 +-
 lib/PublicInbox/Syscall.pm | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 26840662..a6c43b22 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -126,7 +126,7 @@ sub add_uniq_timer { # ($name, $secs, $coderef, @args) = @_;
 
 # caller sets return value to $Epoll
 sub _InitPoller () {
-	if (PublicInbox::Syscall::epoll_defined())  {
+	if (defined $PublicInbox::Syscall::SYS_epoll_create)  {
 		my $fd = epoll_create();
 		die "epoll_create: $!" if $fd < 0;
 		open($ep_io, '+<&=', $fd) or return;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index ee4c6107..bda9bbb0 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -285,8 +285,6 @@ if ($^O eq "linux") {
 # epoll functions
 ############################################################################
 
-sub epoll_defined { $SYS_epoll_create ? 1 : 0; }
-
 sub epoll_create {
 	syscall($SYS_epoll_create, $no_deprecated ? 0 : 100);
 }

^ permalink raw reply related	[relevance 97%]

* [PATCH 2/2] syscall: drop syscall.ph support
    2022-12-23 12:51 97% ` [PATCH 1/2] syscall: get rid of epoll_defined() sub Eric Wong
@ 2022-12-23 12:51 82% ` Eric Wong
  1 sibling, 0 replies; 51+ results
From: Eric Wong @ 2022-12-23 12:51 UTC (permalink / raw)
  To: meta

h2ph-generated *.ph files are often wrong or incomplete and IME
they cause more problems than they solve.  Furthermore, we need
knowledge of struct layouts which h2ph-generated files can't get
us.  So trim down some bloat and leave a note for porters.
---
 lib/PublicInbox/Syscall.pm | 34 ++++++----------------------------
 1 file changed, 6 insertions(+), 28 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index bda9bbb0..cecb1247 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -60,21 +60,6 @@ use constant {
 };
 
 my @BYTES_4_hole = BYTES_4_hole ? (0) : ();
-our $loaded_syscall = 0;
-
-sub _load_syscall {
-    # props to Gaal for this!
-    return if $loaded_syscall++;
-    my $clean = sub {
-        delete @INC{qw<syscall.ph asm/unistd.ph bits/syscall.ph
-                        _h2ph_pre.ph sys/syscall.ph>};
-    };
-    $clean->(); # don't trust modules before us
-    my $rv = eval { require 'syscall.ph'; 1 } || eval { require 'sys/syscall.ph'; 1 };
-    $clean->(); # don't require modules after us trust us
-    $rv;
-}
-
 
 our (
      $SYS_epoll_create,
@@ -256,19 +241,12 @@ if ($^O eq "linux") {
 	$FS_IOC_SETFLAGS = 0x80046602;
 	$SIGNUM{WINCH} = 20;
     } else {
-        # as a last resort, try using the *.ph files which may not
-        # exist or may be wrong
-        _load_syscall();
-        $SYS_epoll_create = eval { &SYS_epoll_create; } || 0;
-        $SYS_epoll_ctl    = eval { &SYS_epoll_ctl;    } || 0;
-        $SYS_epoll_wait   = eval { &SYS_epoll_wait;   } || 0;
-
-	# Note: do NOT add new syscalls to depend on *.ph, here.
-	# Better to miss syscalls (so we can fallback to IO::Poll)
-	# than to use wrong ones, since the names are not stable
-	# (at least not on FreeBSD), if the actual numbers are.
+        warn <<EOM;
+machine=$machine ptrsize=$Config{ptrsize} has no syscall definitions
+git clone https://80x24.org/public-inbox.git and
+Send the output of ./devel/syscall-list to meta\@public-inbox.org
+EOM
     }
-
     if ($u64_mod_8) {
         *epoll_wait = \&epoll_wait_mod8;
         *epoll_ctl = \&epoll_ctl_mod8;
@@ -279,7 +257,7 @@ if ($^O eq "linux") {
 }
 # use Inline::C for *BSD-only or general POSIX stuff.
 # Linux guarantees stable syscall numbering, BSDs only offer a stable libc
-# use scripts/syscall-list on Linux to detect new syscall numbers
+# use devel/syscall-list on Linux to detect new syscall numbers
 
 ############################################################################
 # epoll functions

^ permalink raw reply related	[relevance 82%]

* [PATCH] syscall: fix i386/i686 detection
@ 2022-12-25 13:24 93% Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2022-12-25 13:24 UTC (permalink / raw)
  To: meta

Both __ILP32__ and __x86_64__ need to be defined for a system to
be considered x32.  Without this, my 32-bit Debian VM on a
64-bit kernel would fail after upgrading to Perl 5.32.1 on
Debian 11 (bullseye).
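
A small sketch of the check in isolation (the helper name and the
sample cppsymbols strings are made up for this example; real
$Config{cppsymbols} values are much longer):

```perl
use strict;
use warnings;

# classify a 32-bit process running on an x86_64 kernel
sub classify_32bit_x86 {
	my ($s) = @_;
	($s =~ /\b__ILP32__=1\b/ && $s =~ /\b__x86_64__=1\b/) ?
		'x32' : 'i386';
}

# a 32-bit i386 userspace may define __ILP32__ without __x86_64__
print classify_32bit_x86('__ILP32__=1 __i386__=1'), "\n"; # i386
print classify_32bit_x86('__ILP32__=1 __x86_64__=1'), "\n"; # x32
```

Checking __ILP32__ alone would misclassify the first case as x32,
which is the failure described above.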
---
 lib/PublicInbox/Syscall.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index cecb1247..530ee93b 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -87,7 +87,9 @@ if ($^O eq "linux") {
 	# if we're running on an x86_64 kernel, but a 32-bit process,
 	# we need to use the x32 or i386 syscall numbers.
 	if ($machine eq 'x86_64') {
-	    $machine = $Config{cppsymbols} =~ /\b__ILP32__=1\b/ ? 'x32' : 'i386'
+	    my $s = $Config{cppsymbols};
+	    $machine = ($s =~ /\b__ILP32__=1\b/ && $s =~ /\b__x86_64__=1\b/) ?
+				'x32' : 'i386'
 	} elsif ($machine eq 'mips64') { # similarly for mips64 vs mips
 	    $machine = 'mips';
 	}

^ permalink raw reply related	[relevance 93%]

* [PATCH] sendmsg: prefix sleep message with `#'
@ 2023-02-22 17:25 95% Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2023-02-22 17:25 UTC (permalink / raw)
  To: meta

It's an informative message that's harmless, so hopefully
the `#' prefix puts the user's mind at ease.

(I saw it on an `lei import' against an IMAP source)
---
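For readers unfamiliar with the loop being touched, here is a rough
standalone sketch of the same retry pattern.  retry_transient and its
$op/$delay parameters are hypothetical names for illustration; the
errno list, the limit of 50 tries and the 0.1s default are taken from
the diff below.

```perl
use strict;
use warnings;
use Errno (); # populates %!

# Hypothetical helper: $op returns a defined value on success, or undef
# with $! set on failure.  $delay is the sleep (in seconds) between tries.
sub retry_transient {
	my ($op, $delay) = @_;
	$delay //= 0.1;
	my $try = 0;
	my $ret;
	do {
		$ret = $op->();
	} while (!defined($ret) &&
		($!{ENOBUFS} || $!{ENOMEM} || $!{ETOOMANYREFS}) &&
		(++$try < 50) &&
		# parenthesized so `&&' is not absorbed into warn's arguments
		warn("# sleeping on sendmsg: $! (#$try)\n") &&
		select(undef, undef, undef, $delay) == 0);
	$ret;
}
```

The sketch calls warn() with explicit parentheses because warn is a
list operator: without them, the rest of the `&&' chain becomes its
argument.  Any other errno ends the loop at once and undef is
returned to the caller.
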
 lib/PublicInbox/CmdIPC4.pm | 2 +-
 lib/PublicInbox/Spawn.pm   | 2 +-
 lib/PublicInbox/Syscall.pm | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/CmdIPC4.pm b/lib/PublicInbox/CmdIPC4.pm
index e368d032..99890244 100644
--- a/lib/PublicInbox/CmdIPC4.pm
+++ b/lib/PublicInbox/CmdIPC4.pm
@@ -23,7 +23,7 @@ no warnings 'once';
 	} while (!defined($s) &&
 			($!{ENOBUFS} || $!{ENOMEM} || $!{ETOOMANYREFS}) &&
 			(++$try < 50) &&
-			warn "sleeping on sendmsg: $! (#$try)\n" &&
+			warn "# sleeping on sendmsg: $! (#$try)\n" &&
 			select(undef, undef, undef, 0.1) == 0);
 	$s;
 };
diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm
index 826ee508..dc11543a 100644
--- a/lib/PublicInbox/Spawn.pm
+++ b/lib/PublicInbox/Spawn.pm
@@ -171,7 +171,7 @@ static int sleep_wait(unsigned *tries, int err)
 	switch (err) {
 	case ENOBUFS: case ENOMEM: case ETOOMANYREFS:
 		if (++*tries < 50) {
-			fprintf(stderr, "sleeping on sendmsg: %s (#%u)\n",
+			fprintf(stderr, "# sleeping on sendmsg: %s (#%u)\n",
 				strerror(err), *tries);
 			nanosleep(&req, NULL);
 			return 1;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 530ee93b..841a2106 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -427,7 +427,7 @@ no warnings 'once';
 	} while ($sent < 0 &&
 			($!{ENOBUFS} || $!{ENOMEM} || $!{ETOOMANYREFS}) &&
 			(++$try < 50) &&
-			warn "sleeping on sendmsg: $! (#$try)\n" &&
+			warn "# sleeping on sendmsg: $! (#$try)\n" &&
 			select(undef, undef, undef, 0.1) == 0);
 	$sent >= 0 ? $sent : undef;
 };

^ permalink raw reply related	[relevance 95%]

* [PATCH 04/10] update devel/syscall-list to devel/sysdefs-list
  @ 2023-09-04 10:36 90% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2023-09-04 10:36 UTC (permalink / raw)
  To: meta

We use it to dump SIGWINCH and _SC_NPROCESSORS_ONLN, so
"sysdefs" is a more appropriate name for *BSD users.
---
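The MAYBE lines introduced in the diff below are expanded by a regex
before the C source is compiled.  A small sketch of that expansion in
isolation, assuming a hypothetical expand_maybe() wrapper; the
substitution is written with escapes here but is intended to be
equivalent to the one in devel/sysdefs-list:

```perl
use strict;
use warnings;

# Hypothetical wrapper: turn "MAYBE FOO" lines into a guarded D(FOO);
sub expand_maybe {
	my ($c_src) = @_;
	$c_src =~ s{^\s*MAYBE\s*(\w+)\s*$}{\n#ifdef $1\n\tD($1);\n#endif\n}mg;
	$c_src;
}

# the D(...) line passes through, the MAYBE line becomes an #ifdef block
print expand_maybe("\tD(SYS_epoll_ctl);\n\tMAYBE SYS_epoll_wait\n");
```

This keeps the C template short: a constant which may be missing on
some systems costs one line instead of three.
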
 MANIFEST                             |  2 +-
 devel/{syscall-list => sysdefs-list} | 47 +++++++++++++++-------------
 lib/PublicInbox/Syscall.pm           |  7 +++--
 3 files changed, 30 insertions(+), 26 deletions(-)
 rename devel/{syscall-list => sysdefs-list} (60%)

diff --git a/MANIFEST b/MANIFEST
index 918ec2e1..5964794e 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -123,7 +123,7 @@ contrib/selinux/el7/publicinbox.fc
 contrib/selinux/el7/publicinbox.te
 devel/README
 devel/longest-tests
-devel/syscall-list
+devel/sysdefs-list
 examples/README
 examples/README.unsubscribe
 examples/cgit-commit-filter.lua
diff --git a/devel/syscall-list b/devel/sysdefs-list
similarity index 60%
rename from devel/syscall-list
rename to devel/sysdefs-list
index 0b36c0e2..9764cc29 100755
--- a/devel/syscall-list
+++ b/devel/sysdefs-list
@@ -1,31 +1,37 @@
 # Copyright all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <http://www.gnu.org/licenses/agpl-3.0.txt>
-# Dump syscall numbers under Linux and any other kernel which
-# promises stable syscall numbers.  This is to maintain
-# PublicInbox::Syscall
-# DO NOT USE this for *BSDs, none of the current BSD kernels
-# we know about promise stable syscall numbers, we'll use
-# Inline::C to support them.
+# Dump system-specific constant numbers; this is to maintain
+# PublicInbox::Syscall and any other system-specific pieces.
+# DO NOT USE syscall numbers for *BSDs, none of the current BSD kernels
+# we know about promise stable syscall numbers (unlike Linux).
+# However, sysconf(3) constants are stable ABI on all, so safe to dump.
 eval 'exec perl -S $0 ${1+"$@"}' # no shebang
 	if 0; # running under some shell
-use strict;
-use v5.10.1;
+use v5.12;
 use File::Temp 0.19;
 use POSIX qw(uname);
+use Config;
 say '$machine='.(POSIX::uname())[-1];
-my $cc = $ENV{CC} // 'cc';
-my @cflags = split(/\s+/, $ENV{CFLAGS} // '-Wall');
+my $cc = $ENV{CC} // $Config{cc} // 'cc';
+my @cflags = split(/\s+/, $ENV{CFLAGS} // $Config{ccflags} // '-Wall');
 my $str = do { local $/; <DATA> };
-my $tmp = File::Temp->newdir('syscall-list-XXXX', TMPDIR => 1);
-my $f = "$tmp/sc.c";
-my $x = "$tmp/sc";
+$str =~ s/^\s*MAYBE\s*(\w+)\s*$/
+#ifdef $1
+	D($1);
+#endif
+/sgxm;
+my $tmp = File::Temp->newdir('sysdefs-list-XXXX', TMPDIR => 1);
+my $f = "$tmp/sysdefs.c";
+my $x = "$tmp/sysdefs";
 open my $fh, '>', $f or die "open $f $!";
 print $fh $str or die "print $f $!";
 close $fh or die "close $f $!";
-system($cc, '-o', $x, $f, @cflags) == 0 or die "cc failed \$?=$?";
+system($cc, '-o', $x, $f, @cflags) == 0 or die "$cc failed \$?=$?";
 exec($x);
 __DATA__
-#define _GNU_SOURCE
+#ifndef _GNU_SOURCE
+#  define _GNU_SOURCE
+#endif
 #include <signal.h>
 #include <sys/syscall.h>
 #include <sys/ioctl.h>
@@ -43,9 +49,7 @@ int main(void)
 #ifdef __linux__
 	D(SYS_epoll_create1);
 	D(SYS_epoll_ctl);
-#ifdef SYS_epoll_wait
-	D(SYS_epoll_wait);
-#endif
+	MAYBE SYS_epoll_wait
 	D(SYS_epoll_pwait);
 	D(SYS_signalfd4);
 	D(SYS_inotify_init1);
@@ -59,13 +63,12 @@ int main(void)
 	printf("FS_IOC_GETFLAGS=%#lx\nFS_IOC_SETFLAGS=%#lx\n",
 		(unsigned long)FS_IOC_GETFLAGS, (unsigned long)FS_IOC_SETFLAGS);
 #endif
-
-#ifdef SYS_renameat2
-	D(SYS_renameat2);
-#endif
+	MAYBE SYS_renameat2
 #endif /* Linux, any other OSes with stable syscalls? */
 	printf("size_t=%zu off_t=%zu pid_t=%zu\n",
 		 sizeof(size_t), sizeof(off_t), sizeof(pid_t));
 	D(SIGWINCH);
+	MAYBE _SC_NPROCESSORS_ONLN
+
 	return 0;
 }
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 841a2106..4609b32d 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -2,7 +2,7 @@
 # specifically the Debian libsys-syscall-perl 0.25-6 version to
 # fix upstream regressions in 0.25.
 #
-# See devel/syscall-list in the public-inbox source tree for maintenance
+# See devel/sysdefs-list in the public-inbox source tree for maintenance
 # <https://80x24.org/public-inbox.git>, and machines from the GCC Farm:
 # <https://cfarm.tetaneutral.net/>
 #
@@ -246,7 +246,7 @@ if ($^O eq "linux") {
         warn <<EOM;
 machine=$machine ptrsize=$Config{ptrsize} has no syscall definitions
 git clone https://80x24.org/public-inbox.git and
-Send the output of ./devel/syscall-list to meta\@public-inbox.org
+Send the output of ./devel/sysdefs-list to meta\@public-inbox.org
 EOM
     }
     if ($u64_mod_8) {
@@ -259,7 +259,8 @@ EOM
 }
 # use Inline::C for *BSD-only or general POSIX stuff.
 # Linux guarantees stable syscall numbering, BSDs only offer a stable libc
-# use devel/syscall-list on Linux to detect new syscall numbers
+# use devel/sysdefs-list on Linux to detect new syscall numbers and
+# other system constants
 
 ############################################################################
 # epoll functions

^ permalink raw reply related	[relevance 90%]

* [PATCH 3/7] ds: use object-oriented API for epoll
    2023-09-11  9:41 43% ` [PATCH 2/7] daemon: depend on DS event_loop in master process, too Eric Wong
@ 2023-09-11  9:41 50% ` Eric Wong
  1 sibling, 0 replies; 51+ results
From: Eric Wong @ 2023-09-11  9:41 UTC (permalink / raw)
  To: meta

This allows us to cut down on imports and reduce code.
This also makes it easier (in the next commit) to provide an option
to disable epoll/kqueue when saving an FD is valued over scalability.
---
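To illustrate the method set the pollers share after this change
(ep_add, ep_mod, ep_del, ep_wait), here is a toy backend built on
4-argument select().  TinySelect is a hypothetical class written for
this note, not one of the PublicInbox::* modules; it ignores
EPOLLONESHOT and EPOLLEXCLUSIVE, and only borrows the EPOLLIN and
EPOLLOUT values from PublicInbox::Syscall.

```perl
use strict;
use warnings;

package TinySelect; # hypothetical, for illustration only
use constant { EPOLLIN => 1, EPOLLOUT => 4 };

sub new { bless {}, shift } # fd => event mask
sub ep_add { $_[0]->{fileno($_[1])} = $_[2]; 0 }
sub ep_mod { $_[0]->{fileno($_[1])} = $_[2]; 0 }
sub ep_del { delete $_[0]->{fileno($_[1])}; 0 }

sub ep_wait {
	my ($self, $maxevents, $timeout_msec, $events) = @_;
	my ($rin, $win) = ('', '');
	while (my ($fd, $ev) = each %$self) {
		vec($rin, $fd, 1) = 1 if $ev & EPOLLIN;
		vec($win, $fd, 1) = 1 if $ev & EPOLLOUT;
	}
	# a negative timeout means "block forever", as with epoll_wait
	my $timeout = $timeout_msec < 0 ? undef : $timeout_msec / 1000;
	my ($rout, $wout) = ($rin, $win);
	my $n = select($rout, $wout, undef, $timeout);
	@$events = ();
	return if $n <= 0;
	for my $fd (sort { $a <=> $b } keys %$self) {
		last if @$events >= $maxevents;
		push @$events, $fd if vec($rout, $fd, 1) || vec($wout, $fd, 1);
	}
}

package main;
pipe(my $r, my $w) or die "pipe: $!";
my $ep = TinySelect->new;
$ep->ep_add($w, TinySelect::EPOLLOUT());
$ep->ep_wait(100, 0, \my @events); # @events now holds the ready FDs
```

Since callers only use these four methods on $Epoll, a backend like
this could be swapped in without touching the event loop, which is
the point of moving to an object-oriented API.
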
 MANIFEST                   |  1 +
 lib/PublicInbox/DS.pm      | 40 ++++++++++++---------------------
 lib/PublicInbox/DSKQXS.pm  | 46 +++++++++++++++++---------------------
 lib/PublicInbox/DSPoll.pm  | 31 +++++++++----------------
 lib/PublicInbox/Epoll.pm   | 23 +++++++++++++++++++
 lib/PublicInbox/Syscall.pm |  6 -----
 t/ds-kqxs.t                |  4 ++--
 t/ds-poll.t                | 29 +++++++++++-------------
 t/epoll.t                  | 23 +++++++++----------
 9 files changed, 95 insertions(+), 108 deletions(-)
 create mode 100644 lib/PublicInbox/Epoll.pm

diff --git a/MANIFEST b/MANIFEST
index 1fe1c7f7..d7a670b8 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -184,6 +184,7 @@ lib/PublicInbox/EOFpipe.pm
 lib/PublicInbox/Emergency.pm
 lib/PublicInbox/Eml.pm
 lib/PublicInbox/EmlContentFoo.pm
+lib/PublicInbox/Epoll.pm
 lib/PublicInbox/ExtMsg.pm
 lib/PublicInbox/ExtSearch.pm
 lib/PublicInbox/ExtSearchIdx.pm
diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index d6e3d10e..9300ac77 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -28,7 +28,8 @@ use POSIX qw(WNOHANG sigprocmask SIG_SETMASK SIG_UNBLOCK);
 use Fcntl qw(SEEK_SET :DEFAULT O_APPEND);
 use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);
 use Scalar::Util qw(blessed);
-use PublicInbox::Syscall qw(:epoll %SIGNUM);
+use PublicInbox::Syscall qw(%SIGNUM
+	EPOLLIN EPOLLOUT EPOLLONESHOT EPOLLEXCLUSIVE);
 use PublicInbox::Tmpfile;
 use Errno qw(EAGAIN EINVAL ECHILD EINTR);
 use Carp qw(carp croak);
@@ -41,8 +42,7 @@ my $reap_armed;
 my $ToClose; # sockets to close when event loop is done
 our (
      %DescriptorMap,             # fd (num) -> PublicInbox::DS object
-     $Epoll,                     # Global epoll fd (or DSKQXS ref)
-     $ep_io,                     # IO::Handle for Epoll
+     $Epoll,  # global Epoll, DSPoll, or DSKQXS ref
 
      @post_loop_do,              # subref + args to call at the end of each loop
 
@@ -75,7 +75,6 @@ sub Reset {
 		my @q = delete @Stack{keys %Stack};
 		for my $q (@q) { @$q = () }
 		$AWAIT_PIDS = $nextq = $ToClose = undef;
-		$ep_io = undef; # closes real $Epoll FD
 		$Epoll = undef; # may call DSKQXS::DESTROY
 	} while (@Timers || keys(%Stack) || $nextq || $AWAIT_PIDS ||
 		$ToClose || keys(%DescriptorMap) ||
@@ -126,21 +125,13 @@ sub add_uniq_timer { # ($name, $secs, $coderef, @args) = @_;
 
 # caller sets return value to $Epoll
 sub _InitPoller () {
-	if (defined $PublicInbox::Syscall::SYS_epoll_create)  {
-		my $fd = epoll_create();
-		die "epoll_create: $!" if $fd < 0;
-		open($ep_io, '+<&=', $fd) or return;
-		fcntl($ep_io, F_SETFD, FD_CLOEXEC);
-		$fd;
-	} else {
-		my $cls;
-		for (qw(DSKQXS DSPoll)) {
-			$cls = "PublicInbox::$_";
-			last if eval "require $cls";
-		}
-		$cls->import(qw(epoll_ctl epoll_wait));
-		$cls->new;
+	my @try = ($^O eq 'linux' ? 'Epoll' : 'DSKQXS');
+	my $cls;
+	for (@try, 'DSPoll') {
+		$cls = "PublicInbox::$_";
+		last if eval "require $cls";
 	}
+	$cls->new;
 }
 
 sub now () { clock_gettime(CLOCK_MONOTONIC) }
@@ -307,7 +298,7 @@ sub event_loop (;$$) {
 		my $timeout = RunTimers();
 
 		# get up to 1000 events
-		epoll_wait($Epoll, 1000, $timeout, \@events);
+		$Epoll->ep_wait(1000, $timeout, \@events);
 		for my $fd (@events) {
 			# it's possible epoll_wait returned many events,
 			# including some at the end that ones in the front
@@ -345,7 +336,7 @@ sub new {
 
     $Epoll //= _InitPoller();
 retry:
-    if (epoll_ctl($Epoll, EPOLL_CTL_ADD, $fd, $ev)) {
+    if ($Epoll->ep_add($sock, $ev)) {
         if ($! == EINVAL && ($ev & EPOLLEXCLUSIVE)) {
             $ev &= ~EPOLLEXCLUSIVE;
             goto retry;
@@ -399,9 +390,7 @@ sub close {
 
     # if we're using epoll, we have to remove this from our epoll fd so we stop getting
     # notifications about it
-    my $fd = fileno($sock);
-    epoll_ctl($Epoll, EPOLL_CTL_DEL, $fd, 0) and
-        croak("EPOLL_CTL_DEL($self/$sock): $!");
+    $Epoll->ep_del($sock) and croak("EPOLL_CTL_DEL($self/$sock): $!");
 
     # we explicitly don't delete from DescriptorMap here until we
     # actually close the socket, as we might be in the middle of
@@ -619,9 +608,8 @@ sub msg_more ($$) {
 }
 
 sub epwait ($$) {
-    my ($sock, $ev) = @_;
-    epoll_ctl($Epoll, EPOLL_CTL_MOD, fileno($sock), $ev) and
-        croak("EPOLL_CTL_MOD($sock): $!");
+	my ($io, $ev) = @_;
+	$Epoll->ep_mod($io, $ev) and croak("EPOLL_CTL_MOD($io): $!");
 }
 
 # return true if complete, false if incomplete (or failure)
diff --git a/lib/PublicInbox/DSKQXS.pm b/lib/PublicInbox/DSKQXS.pm
index b6e5c4e9..8ef8ffb6 100644
--- a/lib/PublicInbox/DSKQXS.pm
+++ b/lib/PublicInbox/DSKQXS.pm
@@ -12,13 +12,10 @@
 # It also implements signalfd(2) emulation via "tie".
 package PublicInbox::DSKQXS;
 use v5.12;
-use parent qw(Exporter);
 use Symbol qw(gensym);
 use IO::KQueue;
 use Errno qw(EAGAIN);
-use PublicInbox::Syscall qw(EPOLLONESHOT EPOLLIN EPOLLOUT EPOLLET
-	EPOLL_CTL_ADD EPOLL_CTL_MOD EPOLL_CTL_DEL);
-our @EXPORT_OK = qw(epoll_ctl epoll_wait);
+use PublicInbox::Syscall qw(EPOLLONESHOT EPOLLIN EPOLLOUT EPOLLET);
 
 sub EV_DISPATCH () { 0x0080 }
 
@@ -97,30 +94,29 @@ sub READ { # called by sysread() for signalfd compatibility
 # for fileno() calls in PublicInbox::DS
 sub FILENO { ${$_[0]->{kq}} }
 
-sub epoll_ctl {
-	my ($self, $op, $fd, $ev) = @_;
-	my $kq = $self->{kq};
-	if ($op == EPOLL_CTL_MOD) {
-		$kq->EV_SET($fd, EVFILT_READ, kq_flag(EPOLLIN, $ev));
-		eval { $kq->EV_SET($fd, EVFILT_WRITE, kq_flag(EPOLLOUT, $ev)) };
-	} elsif ($op == EPOLL_CTL_DEL) {
-		$kq // return; # called in cleanup
-		$kq->EV_SET($fd, EVFILT_READ, EV_DISABLE);
-		eval { $kq->EV_SET($fd, EVFILT_WRITE, EV_DISABLE) };
-	} else { # EPOLL_CTL_ADD
-		$kq->EV_SET($fd, EVFILT_READ, EV_ADD|kq_flag(EPOLLIN, $ev));
-
-		# we call this blindly for read-only FDs such as tied
-		# DSKQXS (signalfd emulation) and Listeners
-		eval {
-			$kq->EV_SET($fd, EVFILT_WRITE, EV_ADD |
-							kq_flag(EPOLLOUT, $ev));
-		};
-	}
+sub _ep_mod_add ($$$$) {
+	my ($kq, $fd, $ev, $add) = @_;
+	$kq->EV_SET($fd, EVFILT_READ, $add|kq_flag(EPOLLIN, $ev));
+
+	# we call this blindly for read-only FDs such as tied
+	# DSKQXS (signalfd emulation) and Listeners
+	eval { $kq->EV_SET($fd, EVFILT_WRITE, $add|kq_flag(EPOLLOUT, $ev)) };
+	0;
+}
+
+sub ep_add { _ep_mod_add($_[0]->{kq}, fileno($_[1]), $_[2], EV_ADD) };
+sub ep_mod { _ep_mod_add($_[0]->{kq}, fileno($_[1]), $_[2], 0) };
+
+sub ep_del {
+	my ($self, $io, $ev) = @_;
+	my $kq = $_[0]->{kq} // return; # called in cleanup
+	my $fd = fileno($io);
+	$kq->EV_SET($fd, EVFILT_READ, EV_DISABLE);
+	eval { $kq->EV_SET($fd, EVFILT_WRITE, EV_DISABLE) };
 	0;
 }
 
-sub epoll_wait {
+sub ep_wait {
 	my ($self, $maxevents, $timeout_msec, $events) = @_;
 	@$events = eval { $self->{kq}->kevent($timeout_msec) };
 	if (my $err = $@) {
diff --git a/lib/PublicInbox/DSPoll.pm b/lib/PublicInbox/DSPoll.pm
index 56a400c2..fc282de0 100644
--- a/lib/PublicInbox/DSPoll.pm
+++ b/lib/PublicInbox/DSPoll.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2019-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # Licensed the same as Danga::Socket (and Perl5)
 # License: GPL-1.0+ or Artistic-1.0-Perl
 #  <https://www.gnu.org/licenses/gpl-1.0.txt>
@@ -9,28 +9,13 @@
 # an all encompassing emulation of epoll via IO::Poll, but just to
 # support cases public-inbox-nntpd/httpd care about.
 package PublicInbox::DSPoll;
-use strict;
-use warnings;
-use parent qw(Exporter);
+use v5.12;
 use IO::Poll;
-use PublicInbox::Syscall qw(EPOLLONESHOT EPOLLIN EPOLLOUT EPOLL_CTL_DEL);
-our @EXPORT_OK = qw(epoll_ctl epoll_wait);
+use PublicInbox::Syscall qw(EPOLLONESHOT EPOLLIN EPOLLOUT);
 
-sub new { bless {}, $_[0] } # fd => events
+sub new { bless {}, __PACKAGE__ } # fd => events
 
-sub epoll_ctl {
-	my ($self, $op, $fd, $ev) = @_;
-
-	# not wasting time on error checking
-	if ($op != EPOLL_CTL_DEL) {
-		$self->{$fd} = $ev;
-	} else {
-		delete $self->{$fd};
-	}
-	0;
-}
-
-sub epoll_wait {
+sub ep_wait {
 	my ($self, $maxevents, $timeout_msec, $events) = @_;
 	my @pset;
 	while (my ($fd, $events) = each %$self) {
@@ -54,4 +39,10 @@ sub epoll_wait {
 	}
 }
 
+sub ep_del { delete($_[0]->{fileno($_[1])}); 0 }
+sub ep_add { $_[0]->{fileno($_[1])} = $_[2]; 0 }
+
+no warnings 'once';
+*ep_mod = \&ep_add;
+
 1;
diff --git a/lib/PublicInbox/Epoll.pm b/lib/PublicInbox/Epoll.pm
new file mode 100644
index 00000000..d55c8535
--- /dev/null
+++ b/lib/PublicInbox/Epoll.pm
@@ -0,0 +1,23 @@
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# OO API for epoll
+package PublicInbox::Epoll;
+use v5.12;
+use PublicInbox::Syscall qw(epoll_create epoll_ctl epoll_wait
+	EPOLL_CTL_ADD EPOLL_CTL_MOD EPOLL_CTL_DEL);
+use Fcntl qw(F_SETFD FD_CLOEXEC);
+use autodie qw(open fcntl);
+
+sub new {
+	open(my $fh, '+<&=', epoll_create());
+	fcntl($fh, F_SETFD, FD_CLOEXEC);
+	bless \$fh, __PACKAGE__;
+}
+
+sub ep_add { epoll_ctl(fileno(${$_[0]}), EPOLL_CTL_ADD, fileno($_[1]), $_[2]) }
+sub ep_mod { epoll_ctl(fileno(${$_[0]}), EPOLL_CTL_MOD, fileno($_[1]), $_[2]) }
+sub ep_del { epoll_ctl(fileno(${$_[0]}), EPOLL_CTL_DEL, fileno($_[1]), 0) }
+sub ep_wait { epoll_wait(fileno(${$_[0]}), @_[1, 2, 3]) }
+
+1;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 14cd1720..0a0912fb 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -29,12 +29,6 @@ our @EXPORT_OK = qw(epoll_ctl epoll_create epoll_wait
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                   EPOLLONESHOT EPOLLEXCLUSIVE
                   signalfd rename_noreplace %SIGNUM);
-our %EXPORT_TAGS = (epoll => [qw(epoll_ctl epoll_create epoll_wait
-                             EPOLLIN EPOLLOUT
-                             EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
-                             EPOLLONESHOT EPOLLEXCLUSIVE)],
-                );
-
 use constant {
 	EPOLLIN => 1,
 	EPOLLOUT => 4,
diff --git a/t/ds-kqxs.t b/t/ds-kqxs.t
index 43c71fed..57acb53f 100644
--- a/t/ds-kqxs.t
+++ b/t/ds-kqxs.t
@@ -1,9 +1,9 @@
-# Copyright (C) 2019-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # Licensed the same as Danga::Socket (and Perl5)
 # License: GPL-1.0+ or Artistic-1.0-Perl
 #  <https://www.gnu.org/licenses/gpl-1.0.txt>
 #  <https://dev.perl.org/licenses/artistic.html>
-use strict;
+use v5.12;
 use Test::More;
 unless (eval { require IO::KQueue }) {
 	my $m = $^O !~ /bsd/ ? 'DSKQXS is only for *BSD systems'
diff --git a/t/ds-poll.t b/t/ds-poll.t
index d8861369..57fac3ef 100644
--- a/t/ds-poll.t
+++ b/t/ds-poll.t
@@ -1,12 +1,11 @@
-# Copyright (C) 2019-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # Licensed the same as Danga::Socket (and Perl5)
 # License: GPL-1.0+ or Artistic-1.0-Perl
 #  <https://www.gnu.org/licenses/gpl-1.0.txt>
 #  <https://dev.perl.org/licenses/artistic.html>
-use strict;
-use warnings;
+use v5.12;
 use Test::More;
-use PublicInbox::Syscall qw(:epoll);
+use PublicInbox::Syscall qw(EPOLLIN EPOLLOUT EPOLLONESHOT);
 my $cls = $ENV{TEST_IOPOLLER} // 'PublicInbox::DSPoll';
 use_ok $cls;
 my $p = $cls->new;
@@ -14,37 +13,35 @@ my $p = $cls->new;
 my ($r, $w, $x, $y);
 pipe($r, $w) or die;
 pipe($x, $y) or die;
-is($p->epoll_ctl(EPOLL_CTL_ADD, fileno($r), EPOLLIN), 0, 'add EPOLLIN');
+is($p->ep_add($r, EPOLLIN), 0, 'add EPOLLIN');
 my $events = [];
-$p->epoll_wait(9, 0, $events);
+$p->ep_wait(9, 0, $events);
 is_deeply($events, [], 'no events set');
-is($p->epoll_ctl(EPOLL_CTL_ADD, fileno($w), EPOLLOUT|EPOLLONESHOT), 0,
-	'add EPOLLOUT|EPOLLONESHOT');
-$p->epoll_wait(9, -1, $events);
+is($p->ep_add($w, EPOLLOUT|EPOLLONESHOT), 0, 'add EPOLLOUT|EPOLLONESHOT');
+$p->ep_wait(9, -1, $events);
 is(scalar(@$events), 1, 'got POLLOUT event');
 is($events->[0], fileno($w), '$w ready');
 
-$p->epoll_wait(9, 0, $events);
+$p->ep_wait(9, 0, $events);
 is(scalar(@$events), 0, 'nothing ready after oneshot');
 is_deeply($events, [], 'no events set after oneshot');
 
 syswrite($w, '1') == 1 or die;
 for my $t (0..1) {
-	$p->epoll_wait(9, $t, $events);
+	$p->ep_wait(9, $t, $events);
 	is($events->[0], fileno($r), "level-trigger POLLIN ready #$t");
 	is(scalar(@$events), 1, "only event ready #$t");
 }
 syswrite($y, '1') == 1 or die;
-is($p->epoll_ctl(EPOLL_CTL_ADD, fileno($x), EPOLLIN|EPOLLONESHOT), 0,
-	'EPOLLIN|EPOLLONESHOT add');
-$p->epoll_wait(9, -1, $events);
+is($p->ep_add($x, EPOLLIN|EPOLLONESHOT), 0, 'EPOLLIN|EPOLLONESHOT add');
+$p->ep_wait(9, -1, $events);
 is(scalar @$events, 2, 'epoll_wait has 2 ready');
 my @fds = sort @$events;
 my @exp = sort((fileno($r), fileno($x)));
 is_deeply(\@fds, \@exp, 'got both ready FDs');
 
-is($p->epoll_ctl(EPOLL_CTL_DEL, fileno($r), 0), 0, 'EPOLL_CTL_DEL OK');
-$p->epoll_wait(9, 0, $events);
+is($p->ep_del($r, 0), 0, 'EPOLL_CTL_DEL OK');
+$p->ep_wait(9, 0, $events);
 is(scalar @$events, 0, 'nothing ready after EPOLL_CTL_DEL');
 
 done_testing;
diff --git a/t/epoll.t b/t/epoll.t
index f346b387..54dc6f47 100644
--- a/t/epoll.t
+++ b/t/epoll.t
@@ -1,25 +1,22 @@
 #!perl -w
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-use strict;
-use v5.10.1;
+use v5.12;
 use Test::More;
-use PublicInbox::Syscall qw(:epoll);
+use autodie;
+use PublicInbox::Syscall qw(EPOLLOUT);
 plan skip_all => 'not Linux' if $^O ne 'linux';
-my $epfd = epoll_create();
-ok($epfd >= 0, 'epoll_create');
-open(my $hnd, '+<&=', $epfd); # for autoclose
-
-pipe(my ($r, $w)) or die "pipe: $!";
-is(epoll_ctl($epfd, EPOLL_CTL_ADD, fileno($w), EPOLLOUT), 0,
-    'epoll_ctl socket EPOLLOUT');
+require_ok 'PublicInbox::Epoll';
+my $ep = PublicInbox::Epoll->new;
+pipe(my $r, my $w);
+is($ep->ep_add($w, EPOLLOUT), 0, 'epoll_ctl pipe EPOLLOUT');
 
 my @events;
-epoll_wait($epfd, 100, 10000, \@events);
+$ep->ep_wait(100, 10000, \@events);
 is(scalar(@events), 1, 'got one event');
 is($events[0], fileno($w), 'got expected FD');
 close $w;
-epoll_wait($epfd, 100, 0, \@events);
+$ep->ep_wait(100, 0, \@events);
 is(scalar(@events), 0, 'epoll_wait timeout');
 
 done_testing;

^ permalink raw reply related	[relevance 50%]

* [PATCH 2/7] daemon: depend on DS event_loop in master process, too
  @ 2023-09-11  9:41 43% ` Eric Wong
  2023-09-11  9:41 50% ` [PATCH 3/7] ds: use object-oriented API for epoll Eric Wong
  1 sibling, 0 replies; 51+ results
From: Eric Wong @ 2023-09-11  9:41 UTC (permalink / raw)
  To: meta

The awaitpid API turns out to be quite handy for managing
long-lived worker processes.  This allows us to ensure all our
uses of signalfd (and kevent emulation) are non-blocking.
---
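The worker bookkeeping in the diff below keys %WORKERS by slot number
rather than by PID.  A small sketch of the two selections involved,
using hypothetical helper names (slots_to_start, slots_to_trim) and
fake PIDs; the grep expressions mirror start_workers and trim_workers,
but nothing is forked or signalled here:

```perl
use strict;
use warnings;

# Hypothetical helper: slots below $nworker which have no live worker
sub slots_to_start {
	my ($workers, $nworker) = @_;
	grep { !defined $workers->{$_} } 0..($nworker - 1);
}

# Hypothetical helper: occupied slots at or above $nworker
sub slots_to_trim {
	my ($workers, $nworker) = @_;
	sort { $a <=> $b } grep { $_ >= $nworker } keys %$workers;
}

my %workers = (0 => 1001, 1 => 1002, 3 => 1004); # slot => fake PID
my @start = slots_to_start(\%workers, 4); # slot 2 needs a worker
my @trim = slots_to_trim(\%workers, 2);   # slot 3 is over the limit
```

With slots as keys, raising or lowering $nworker (TTIN, TTOU, WINCH)
becomes a comparison against the slot number, and a reaped worker
only has to vacate its slot for start_workers to refill it.
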
 lib/PublicInbox/DS.pm      |   2 +-
 lib/PublicInbox/DSKQXS.pm  |  12 +-
 lib/PublicInbox/Daemon.pm  | 252 +++++++++++++++++--------------------
 lib/PublicInbox/Sigfd.pm   |  12 +-
 lib/PublicInbox/Syscall.pm |   6 +-
 t/httpd-unix.t             |  20 ++-
 t/sigfd.t                  |   4 +-
 7 files changed, 146 insertions(+), 162 deletions(-)

diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index ff10c9c0..d6e3d10e 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -280,7 +280,7 @@ sub event_loop (;$$) {
 	my ($sig, $oldset) = @_;
 	$Epoll //= _InitPoller();
 	require PublicInbox::Sigfd if $sig;
-	my $sigfd = $sig ? PublicInbox::Sigfd->new($sig, 1) : undef;
+	my $sigfd = $sig ? PublicInbox::Sigfd->new($sig) : undef;
 	if ($sigfd && $sigfd->{is_kq}) {
 		my $tmp = allowset($sig);
 		local @SIG{keys %$sig} = values(%$sig);
diff --git a/lib/PublicInbox/DSKQXS.pm b/lib/PublicInbox/DSKQXS.pm
index 3fcb4e40..b6e5c4e9 100644
--- a/lib/PublicInbox/DSKQXS.pm
+++ b/lib/PublicInbox/DSKQXS.pm
@@ -47,16 +47,15 @@ sub new {
 # It's wasteful in that it uses another FD, but it simplifies
 # our epoll-oriented code.
 sub signalfd {
-	my ($class, $signo, $nonblock) = @_;
+	my ($class, $signo) = @_;
 	my $sym = gensym;
-	tie *$sym, $class, $signo, $nonblock; # calls TIEHANDLE
+	tie *$sym, $class, $signo; # calls TIEHANDLE
 	$sym
 }
 
 sub TIEHANDLE { # similar to signalfd()
-	my ($class, $signo, $nonblock) = @_;
+	my ($class, $signo) = @_;
 	my $self = $class->new;
-	$self->{timeout} = $nonblock ? 0 : -1;
 	my $kq = $self->{kq};
 	$kq->EV_SET($_, EVFILT_SIGNAL, EV_ADD) for @$signo;
 	$self;
@@ -65,7 +64,6 @@ sub TIEHANDLE { # similar to signalfd()
 sub READ { # called by sysread() for signalfd compatibility
 	my ($self, undef, $len, $off) = @_; # $_[1] = buf
 	die "bad args for signalfd read" if ($len % 128) // defined($off);
-	my $timeout = $self->{timeout};
 	my $sigbuf = $self->{sigbuf} //= [];
 	my $nr = $len / 128;
 	my $r = 0;
@@ -78,13 +76,13 @@ sub READ { # called by sysread() for signalfd compatibility
 			$r += 128;
 		}
 		return $r if $r;
-		my @events = eval { $self->{kq}->kevent($timeout) };
+		my @events = eval { $self->{kq}->kevent(0) };
 		# workaround https://rt.cpan.org/Ticket/Display.html?id=116615
 		if ($@) {
 			next if $@ =~ /Interrupted system call/;
 			die;
 		}
-		if (!scalar(@events) && $timeout == 0) {
+		if (!scalar(@events)) {
 			$! = EAGAIN;
 			return;
 		}
diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index 88b0fa45..222093bc 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -5,8 +5,7 @@
 # and designed for handling thousands of untrusted clients over slow
 # and/or lossy connections.
 package PublicInbox::Daemon;
-use strict;
-use v5.10.1;
+use v5.12;
 use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
 use IO::Handle; # ->autoflush
 use IO::Socket;
@@ -15,10 +14,9 @@ use POSIX qw(WNOHANG :signal_h F_SETFD);
 use Socket qw(IPPROTO_TCP SOL_SOCKET);
 STDOUT->autoflush(1);
 STDERR->autoflush(1);
-use PublicInbox::DS qw(now);
+use PublicInbox::DS qw(now awaitpid);
 use PublicInbox::Listener;
 use PublicInbox::EOFpipe;
-use PublicInbox::Sigfd;
 use PublicInbox::Git;
 use PublicInbox::GitAsyncCat;
 use PublicInbox::Eml;
@@ -27,9 +25,7 @@ our $SO_ACCEPTFILTER = 0x1000;
 my @CMD;
 my ($set_user, $oldset);
 my (@cfg_listen, $stdout, $stderr, $group, $user, $pid_file, $daemonize);
-my $worker_processes = 1;
-my @listeners;
-my (%pids, %logs);
+my ($nworker, @listeners, %WORKERS, %logs);
 my %tls_opt; # scheme://sockname => args for IO::Socket::SSL::SSL_Context->new
 my $reexec_pid;
 my ($uid, $gid);
@@ -40,6 +36,19 @@ my %SCHEME2PORT = map { $KNOWN_TLS{$_} => $_ + 0 } keys %KNOWN_TLS;
 for (keys %KNOWN_STARTTLS) { $SCHEME2PORT{$KNOWN_STARTTLS{$_}} = $_ + 0 }
 $SCHEME2PORT{http} = 80;
 
+our ($parent_pipe, %POST_ACCEPT, %XNETD);
+our %WORKER_SIG = (
+	INT => \&worker_quit,
+	QUIT => \&worker_quit,
+	TERM => \&worker_quit,
+	TTIN => 'IGNORE',
+	TTOU => 'IGNORE',
+	USR1 => \&reopen_logs,
+	USR2 => 'IGNORE',
+	WINCH => 'IGNORE',
+	CHLD => \&PublicInbox::DS::enqueue_reap,
+);
+
 sub listener_opt ($) {
 	my ($str) = @_; # opt1=val1,opt2=val2 (opt may repeat for multi-value)
 	my $o = {};
@@ -141,8 +150,8 @@ sub load_mod ($;$$) {
 	\%xn;
 }
 
-sub daemon_prepare ($$) {
-	my ($default_listen, $xnetd) = @_;
+sub daemon_prepare ($) {
+	my ($default_listen) = @_;
 	my $listener_names = {}; # sockname => IO::Handle
 	$oldset = PublicInbox::DS::block_signals();
 	@CMD = ($0, @ARGV);
@@ -164,7 +173,7 @@ EOF
 		'l|listen=s' => \@cfg_listen,
 		'1|stdout=s' => \$stdout,
 		'2|stderr=s' => \$stderr,
-		'W|worker-processes=i' => \$worker_processes,
+		'W|worker-processes=i' => \$nworker,
 		'P|pid-file=s' => \$pid_file,
 		'u|user=s' => \$user,
 		'g|group=s' => \$group,
@@ -218,7 +227,7 @@ EOF
 			die "$orig specified w/o cert=\n";
 		}
 		if ($listener_names->{$l}) { # already inherited
-			$xnetd->{$l} = load_mod($scheme, $opt, $l);
+			$XNETD{$l} = load_mod($scheme, $opt, $l);
 			next;
 		}
 		my (%o, $sock_pkg);
@@ -254,7 +263,7 @@ EOF
 		$s->blocking(0);
 		my $sockname = sockname($s);
 		warn "# bound $scheme://$sockname\n";
-		$xnetd->{$sockname} //= load_mod($scheme, $opt);
+		$XNETD{$sockname} //= load_mod($scheme, $opt);
 		$listener_names->{$sockname} = $s;
 		push @listeners, $s;
 	}
@@ -268,10 +277,10 @@ EOF
 		for my $x (@inherited_names) {
 			$x =~ /:([0-9]+)\z/ or next; # no TLS for AF_UNIX
 			if (my $scheme = $KNOWN_TLS{$1}) {
-				$xnetd->{$x} //= load_mod($scheme);
+				$XNETD{$x} //= load_mod($scheme);
 				$tls_opt{"$scheme://$x"} ||= accept_tls_opt('');
 			} elsif (($scheme = $KNOWN_STARTTLS{$1})) {
-				$xnetd->{$x} //= load_mod($scheme);
+				$XNETD{$x} //= load_mod($scheme);
 				$tls_opt{"$scheme://$x"} ||= accept_tls_opt('');
 			} elsif (defined $stls) {
 				$tls_opt{"$stls://$x"} ||= accept_tls_opt('');
@@ -280,7 +289,7 @@ EOF
 	}
 	if (defined $default_scheme) {
 		for my $x (@inherited_names) {
-			$xnetd->{$x} //= load_mod($default_scheme);
+			$XNETD{$x} //= load_mod($default_scheme);
 		}
 	}
 	die "No listeners bound\n" unless @listeners;
@@ -476,11 +485,9 @@ sub upgrade { # $_[0] = signal name or number (unused)
 		write_pid($pid_file);
 	}
 	my $pid = fork;
-	unless (defined $pid) {
+	if (!defined($pid)) {
 		warn "fork failed: $!\n";
-		return;
-	}
-	if ($pid == 0) {
+	} elsif ($pid == 0) {
 		$ENV{LISTEN_FDS} = scalar @listeners;
 		$ENV{LISTEN_PID} = $$;
 		foreach my $s (@listeners) {
@@ -490,18 +497,17 @@ sub upgrade { # $_[0] = signal name or number (unused)
 		}
 		exec @CMD;
 		die "Failed to exec: $!\n";
+	} else {
+		awaitpid($pid, \&upgrade_aborted);
+		$reexec_pid = $pid;
 	}
-	$reexec_pid = $pid;
 }
 
-sub kill_workers ($) {
-	my ($sig) = @_;
-	kill $sig, keys(%pids);
-}
+sub kill_workers ($) { kill $_[0], values(%WORKERS) }
 
-sub upgrade_aborted ($) {
-	my ($p) = @_;
-	warn "reexec PID($p) died with: $?\n";
+sub upgrade_aborted {
+	my ($pid) = @_;
+	warn "reexec PID($pid) died with: $?\n";
 	$reexec_pid = undef;
 	return unless $pid_file;
 
@@ -513,21 +519,6 @@ sub upgrade_aborted ($) {
 	warn $@, "\n" if $@;
 }
 
-sub reap_children { # $_[0] = 'CHLD'
-	while (1) {
-		my $p = waitpid(-1, WNOHANG) or return;
-		if (defined $reexec_pid && $p == $reexec_pid) {
-			upgrade_aborted($p);
-		} elsif (defined(my $id = delete $pids{$p})) {
-			warn "worker[$id] PID($p) died with: $?\n";
-		} elsif ($p > 0) {
-			warn "unknown PID($p) reaped: $?\n";
-		} else {
-			return;
-		}
-	}
-}
-
 sub unlink_pid_file_safe_ish ($$) {
 	my ($unlink_pid, $file) = @_;
 	return unless defined $unlink_pid && $unlink_pid == $$;
@@ -544,92 +535,90 @@ sub unlink_pid_file_safe_ish ($$) {
 sub master_quit ($) {
 	exit unless @listeners;
 	@listeners = ();
-	kill_workers($_[0]);
+	exit unless kill_workers($_[0]);
+}
+
+sub reap_worker { # awaitpid CB
+	my ($pid, $nr) = @_;
+	warn "worker[$nr] died \$?=$?\n" if $?;
+	delete $WORKERS{$nr};
+	exit if !@listeners && !keys(%WORKERS);
+	PublicInbox::DS::requeue(\&start_workers);
+}
+
+sub start_worker ($) {
+	my ($nr) = @_;
+	my $seed = rand(0xffffffff);
+	return unless @listeners;
+	my $pid = fork;
+	if (!defined($pid)) {
+		warn "fork: $!";
+	} elsif ($pid == 0) {
+		undef %WORKERS;
+		PublicInbox::DS::Reset();
+		srand($seed);
+		eval { Net::SSLeay::randomize() };
+		$set_user->() if $set_user;
+		PublicInbox::EOFpipe->new($parent_pipe, \&worker_quit);
+		worker_loop();
+		exit 0;
+	} else {
+		$WORKERS{$nr} = $pid;
+		awaitpid($pid, \&reap_worker, $nr);
+	}
+}
+
+sub start_workers {
+	for my $nr (grep { !defined($WORKERS{$_}) } (0..($nworker - 1))) {
+		start_worker($nr);
+	}
+}
+
+sub trim_workers {
+	my @nr = grep { $_ >= $nworker } keys %WORKERS;
+	kill('TERM', @WORKERS{@nr});
 }
 
 sub master_loop {
-	pipe(my ($p0, $p1)) or die "failed to create parent-pipe: $!";
-	my $set_workers = $worker_processes;
+	local $parent_pipe;
+	pipe($parent_pipe, my $p1) or die "failed to create parent-pipe: $!";
+	my $set_workers = $nworker; # for SIGWINCH
 	reopen_logs();
-	my $ignore_winch;
-	my $sig = {
+	my $msig = {
 		USR1 => sub { reopen_logs(); kill_workers($_[0]); },
 		USR2 => \&upgrade,
 		QUIT => \&master_quit,
 		INT => \&master_quit,
 		TERM => \&master_quit,
 		WINCH => sub {
-			return if $ignore_winch || !@listeners;
-			if (-t STDIN || -t STDOUT || -t STDERR) {
-				$ignore_winch = 1;
-				warn <<EOF;
-ignoring SIGWINCH since we are not daemonized
-EOF
-			} else {
-				$worker_processes = 0;
-			}
+			$nworker = 0;
+			trim_workers();
 		},
 		HUP => sub {
-			return unless @listeners;
-			$worker_processes = $set_workers;
+			$nworker = $set_workers; # undo WINCH
 			kill_workers($_[0]);
+			PublicInbox::DS::requeue(\&start_workers)
 		},
 		TTIN => sub {
-			return unless @listeners;
-			if ($set_workers > $worker_processes) {
-				++$worker_processes;
+			if ($set_workers > $nworker) {
+				++$nworker;
 			} else {
-				$worker_processes = ++$set_workers;
+				$nworker = ++$set_workers;
 			}
+			PublicInbox::DS::requeue(\&start_workers);
 		},
 		TTOU => sub {
-			$worker_processes = --$set_workers if $set_workers > 0;
+			return if $nworker <= 0;
+			--$nworker;
+			trim_workers();
 		},
-		CHLD => \&reap_children,
+		CHLD => \&PublicInbox::DS::enqueue_reap,
 	};
-	my $sigfd = PublicInbox::Sigfd->new($sig);
-	local @SIG{keys %$sig} = values(%$sig) unless $sigfd;
-	PublicInbox::DS::sig_setmask($oldset) if !$sigfd;
-	while (1) { # main loop
-		my $n = scalar keys %pids;
-		unless (@listeners) {
-			exit if $n == 0;
-			$set_workers = $worker_processes = $n = 0;
-		}
-
-		if ($n > $worker_processes) {
-			while (my ($k, $v) = each %pids) {
-				kill('TERM', $k) if $v >= $worker_processes;
-			}
-			$n = $worker_processes;
-		}
-		my $want = $worker_processes - 1;
-		if ($n <= $want) {
-			PublicInbox::DS::block_signals() if !$sigfd;
-			for my $i ($n..$want) {
-				my $seed = rand(0xffffffff);
-				my $pid = fork;
-				if (!defined $pid) {
-					warn "failed to fork worker[$i]: $!\n";
-				} elsif ($pid == 0) {
-					srand($seed);
-					eval { Net::SSLeay::randomize() };
-					$set_user->() if $set_user;
-					return $p0; # run normal work code
-				} else {
-					warn "PID=$pid is worker[$i]\n";
-					$pids{$pid} = $i;
-				}
-			}
-			PublicInbox::DS::sig_setmask($oldset) if !$sigfd;
-		}
-
-		if ($sigfd) { # Linux and IO::KQueue users:
-			$sigfd->wait_once;
-		} else { # wake up every second
-			sleep(1);
-		}
-	}
+	$msig->{WINCH} = sub {
+		warn "ignoring SIGWINCH since we are not daemonized\n";
+	} if -t STDIN || -t STDOUT || -t STDERR;
+	start_workers();
+	PublicInbox::DS::event_loop($msig, $oldset);
 	exit # never gets here, just for documentation
 }
 
@@ -659,56 +648,45 @@ sub defer_accept ($$) {
 	}
 }
 
-sub daemon_loop ($) {
-	my ($xnetd) = @_;
+sub daemon_loop () {
 	local $PublicInbox::Config::DEDUPE = {}; # enable dedupe cache
-	my $refresh = sub {
+	my $refresh = $WORKER_SIG{HUP} = sub {
 		my ($sig) = @_;
 		%$PublicInbox::Config::DEDUPE = (); # clear cache
-		for my $xn (values %$xnetd) {
+		for my $xn (values %XNETD) {
 			delete $xn->{tlsd}->{ssl_ctx}; # PublicInbox::TLS::start
 			eval { $xn->{refresh}->($sig) };
 			warn "refresh $@\n" if $@;
 		}
 	};
-	my %post_accept;
 	while (my ($k, $ctx_opt) = each %tls_opt) {
 		$ctx_opt // next;
 		my ($scheme, $l) = split(m!://!, $k, 2);
-		my $xn = $xnetd->{$l} // die "BUG: no xnetd for $k";
+		my $xn = $XNETD{$l} // die "BUG: no xnetd for $k";
 		$xn->{tlsd}->{ssl_ctx_opt} //= $ctx_opt;
 		$scheme =~ m!\A(?:https|imaps|nntps|pop3s)! and
-			$post_accept{$l} = tls_cb(@$xn{qw(post_accept tlsd)});
+			$POST_ACCEPT{$l} = tls_cb(@$xn{qw(post_accept tlsd)});
 	}
 	undef %tls_opt;
-	my $sig = {
-		HUP => $refresh,
-		INT => \&worker_quit,
-		QUIT => \&worker_quit,
-		TERM => \&worker_quit,
-		TTIN => 'IGNORE',
-		TTOU => 'IGNORE',
-		USR1 => \&reopen_logs,
-		USR2 => 'IGNORE',
-		WINCH => 'IGNORE',
-		CHLD => \&PublicInbox::DS::enqueue_reap,
-	};
-	if ($worker_processes > 0) {
+	if ($nworker > 0) {
 		$refresh->(); # preload by default
-		my $fh = master_loop(); # returns if in child process
-		PublicInbox::EOFpipe->new($fh, \&worker_quit);
+		return master_loop();
 	} else {
 		reopen_logs();
 		$set_user->() if $set_user;
-		$sig->{USR2} = sub { worker_quit() if upgrade() };
+		$WORKER_SIG{USR2} = sub { worker_quit() if upgrade() };
 		$refresh->();
 	}
+	worker_loop();
+}
+
+sub worker_loop {
 	$uid = $gid = undef;
 	reopen_logs();
 	@listeners = map {;
 		my $l = sockname($_);
-		my $tls_cb = $post_accept{$l};
-		my $xn = $xnetd->{$l} // die "BUG: no xnetd for $l";
+		my $tls_cb = $POST_ACCEPT{$l};
+		my $xn = $XNETD{$l} // die "BUG: no xnetd for $l";
 
 		# NNTPS, HTTPS, HTTP, IMAPS and POP3S are client-first traffic
 		# IMAP, NNTP and POP3 are server-first
@@ -718,20 +696,24 @@ sub daemon_loop ($) {
 		PublicInbox::Listener->new($_, $tls_cb || $xn->{post_accept},
 						$xn->{'multi-accept'})
 	} @listeners;
-	PublicInbox::DS::event_loop($sig, $oldset);
+	PublicInbox::DS::event_loop(\%WORKER_SIG, $oldset);
 }
 
 sub run {
 	my ($default_listen) = @_;
-	daemon_prepare($default_listen, my $xnetd = {});
+	$nworker = 1;
+	local (%XNETD, %POST_ACCEPT);
+	daemon_prepare($default_listen);
 	my $for_destroy = daemonize();
 
 	# localize GCF2C for tests:
 	local $PublicInbox::GitAsyncCat::GCF2C;
 	local $PublicInbox::Git::async_warn = 1;
 	local $SIG{__WARN__} = PublicInbox::Eml::warn_ignore_cb();
+	local %WORKER_SIG = %WORKER_SIG;
+	local %POST_ACCEPT;
 
-	daemon_loop($xnetd);
+	daemon_loop();
 	PublicInbox::DS->Reset;
 	# ->DESTROY runs when $for_destroy goes out-of-scope
 }
diff --git a/lib/PublicInbox/Sigfd.pm b/lib/PublicInbox/Sigfd.pm
index 5656baeb..b8a1ddfb 100644
--- a/lib/PublicInbox/Sigfd.pm
+++ b/lib/PublicInbox/Sigfd.pm
@@ -12,26 +12,22 @@ use POSIX ();
 # returns a coderef to unblock signals if neither signalfd or kqueue
 # are available.
 sub new {
-	my ($class, $sig, $nonblock) = @_;
+	my ($class, $sig) = @_;
 	my %signo = map {;
 		# $num => [ $cb, $signame ];
 		($SIGNUM{$_} // POSIX->can("SIG$_")->()) => [ $sig->{$_}, $_ ]
 	} keys %$sig;
 	my $self = bless { sig => \%signo }, $class;
 	my $io;
-	my $fd = signalfd([keys %signo], $nonblock);
+	my $fd = signalfd([keys %signo]);
 	if (defined $fd && $fd >= 0) {
 		open($io, '+<&=', $fd) or die "open: $!";
 	} elsif (eval { require PublicInbox::DSKQXS }) {
-		$io = PublicInbox::DSKQXS->signalfd([keys %signo], $nonblock);
+		$io = PublicInbox::DSKQXS->signalfd([keys %signo]);
 	} else {
 		return; # wake up every second to check for signals
 	}
-	if ($nonblock) { # it can go into the event loop
-		$self->SUPER::new($io, EPOLLIN | EPOLLET);
-	} else { # master main loop
-		$self->{sock} = $io;
-	}
+	$self->SUPER::new($io, EPOLLIN | EPOLLET);
 	$self->{is_kq} = 1 if tied(*$io);
 	$self;
 }
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 4609b32d..14cd1720 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -327,15 +327,15 @@ sub epoll_wait_mod8 {
 	}
 }
 
-sub signalfd ($$) {
-	my ($signos, $nonblock) = @_;
+sub signalfd ($) {
+	my ($signos) = @_;
 	if ($SYS_signalfd4) {
 		my $set = POSIX::SigSet->new(@$signos);
 		syscall($SYS_signalfd4, -1, "$$set",
 			# $Config{sig_count} is NSIG, so this is NSIG/8:
 			int($Config{sig_count}/8),
 			# SFD_NONBLOCK == O_NONBLOCK for every architecture
-			($nonblock ? O_NONBLOCK : 0) |$SFD_CLOEXEC);
+			O_NONBLOCK|$SFD_CLOEXEC);
 	} else {
 		$! = ENOSYS;
 		undef;
diff --git a/t/httpd-unix.t b/t/httpd-unix.t
index 414ca0c8..d90c6c3e 100644
--- a/t/httpd-unix.t
+++ b/t/httpd-unix.t
@@ -2,8 +2,7 @@
 # Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 # Tests for binding Unix domain sockets
-use strict;
-use Test::More;
+use v5.12;
 use PublicInbox::TestCommon;
 use Errno qw(EADDRINUSE);
 use Cwd qw(abs_path);
@@ -12,6 +11,7 @@ use Fcntl qw(FD_CLOEXEC F_SETFD);
 require_mods(qw(Plack::Util Plack::Builder HTTP::Date HTTP::Status));
 use IO::Socket::UNIX;
 use POSIX qw(mkfifo);
+require PublicInbox::Sigfd;
 my ($tmpdir, $for_destroy) = tmpdir();
 my $unix = "$tmpdir/unix.sock";
 my $psgi = './t/httpd-corner.psgi';
@@ -99,16 +99,17 @@ check_sock($unix);
 
 # portable Perl can delay or miss signal dispatches due to races,
 # so disable some tests on systems lacking signalfd(2) or EVFILT_SIGNAL
-my $has_sigfd = PublicInbox::Sigfd->new({}, 0) ? 1 : $ENV{TEST_UNRELIABLE};
+my $has_sigfd = PublicInbox::Sigfd->new({}) ? 1 : $ENV{TEST_UNRELIABLE};
+PublicInbox::DS::Reset() if $has_sigfd;
 
 sub delay_until {
-	my $cond = shift;
+	my ($cond, $msg) = @_;
 	my $end = time + 30;
 	do {
 		return if $cond->();
 		tick(0.012);
 	} until (time > $end);
-	Carp::confess('condition failed');
+	Carp::confess($msg // 'condition failed');
 }
 
 SKIP: {
@@ -140,6 +141,8 @@ SKIP: {
 		is(select($rvec, undef, undef, 1), 1, 'timeout for pipe HUP');
 		is(my $undef = <$p0>, undef, 'process closed pipe writer at exit');
 		ok(!-e $pid_file, "$w pid file unlinked at exit");
+		delay_until(sub { !kill(0, $pid) },
+			"daemonized $w really not running");
 	}
 
 	my $httpd = abs_path('blib/script/public-inbox-httpd');
@@ -181,6 +184,9 @@ SKIP: {
 		delay_until(sub {
 			$pid == (eval { $read_pid->($pid_file) } // 0)
 		});
+
+		delay_until(sub { !kill(0, $new_pid) }, 'new PID really died');
+
 		is($read_pid->($pid_file), $pid, 'old PID file restored');
 		ok(!-f "$pid_file.oldbin", '.oldbin PID file gone');
 
@@ -196,7 +202,7 @@ SKIP: {
 
 		# drop the old parent
 		kill('QUIT', $old_pid) or die "QUIT failed: $!";
-		delay_until(sub { !kill(0, $old_pid) }); # UGH
+		delay_until(sub { !kill(0, $old_pid) }, 'old PID really died');
 
 		ok(!-f "$pid_file.oldbin", '.oldbin PID file gone');
 
@@ -209,6 +215,7 @@ SKIP: {
 		is(my $u = <$p0>, undef, 'process closed pipe writer at exit');
 
 		ok(!-f $pid_file, 'PID file is gone');
+		delay_until(sub { !kill(0, $new_pid) }, 'new PID really died');
 	}
 
 	if ('try USR2 without workers (-W0)') {
@@ -234,6 +241,7 @@ SKIP: {
 		is(select($rvec, undef, undef, 1), 1, 'timeout for pipe HUP');
 		is(my $u = <$p0>, undef, 'process closed pipe writer at exit');
 		ok(!-f $pid_file, 'PID file is gone');
+		delay_until(sub { !kill(0, $pid) }, '-W0 daemon is gone');
 	}
 }
 
diff --git a/t/sigfd.t b/t/sigfd.t
index f6449dab..9a7b947d 100644
--- a/t/sigfd.t
+++ b/t/sigfd.t
@@ -29,7 +29,7 @@ SKIP: {
 	ok(!defined($hit->{USR2}), 'no USR2 yet') or diag explain($hit);
 	PublicInbox::DS->Reset;
 	ok($PublicInbox::Syscall::SIGNUM{WINCH}, 'SIGWINCH number defined');
-	my $sigfd = PublicInbox::Sigfd->new($sig, 0);
+	my $sigfd = PublicInbox::Sigfd->new($sig);
 	if ($sigfd) {
 		$linux_sigfd = 1 if $^O eq 'linux';
 		$has_sigfd = 1;
@@ -57,7 +57,7 @@ SKIP: {
 		PublicInbox::DS->Reset;
 		$sigfd = undef;
 
-		my $nbsig = PublicInbox::Sigfd->new($sig, 1);
+		my $nbsig = PublicInbox::Sigfd->new($sig);
 		ok($nbsig, 'Sigfd->new SFD_NONBLOCK works');
 		is($nbsig->wait_once, undef, 'nonblocking ->wait_once');
 		ok($! == Errno::EAGAIN, 'got EAGAIN');

^ permalink raw reply related	[relevance 43%]

* [PATCH 4/6] ipc: recv_cmd4 clobbers destination buffer on errors
  @ 2023-09-24 20:19 86% ` Eric Wong
  2023-09-24 20:19 86% ` [PATCH 5/6] syscall: have `vec' operate on bytes directly Eric Wong
  2023-09-24 20:19 91% ` [PATCH 6/6] syscall: fix valgrind error in pure Perl send_cmd4 Eric Wong
  2 siblings, 0 replies; 51+ results
From: Eric Wong @ 2023-09-24 20:19 UTC (permalink / raw)
  To: meta

Clearing the destination buffer on errors should be done at the
lowest level possible, away from higher-level lei code.
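
For illustration only, here is a minimal sketch of the idea using
plain sysread instead of recvmsg.  The `read_or_clear' name is
hypothetical and not part of this patch; it relies on the same
$_[1] aliasing that recv_cmd4 uses for its destination buffer.

```perl
use v5.12;
use IO::Handle; # for ->blocking
use Socket qw(AF_UNIX SOCK_STREAM);

# hypothetical wrapper; $_[1] aliases the caller's buffer
sub read_or_clear ($$$) {
	my ($sock, undef, $len) = @_;
	my $r = sysread($sock, $_[1], $len);
	$_[1] = '' unless defined($r); # error: drop stale data, $! stays set
	$r;
}

socketpair(my $s1, my $s2, AF_UNIX, SOCK_STREAM, 0) or die "socketpair: $!";
$s2->blocking(0);
my $buf = 'stale data from an earlier read';
my $r = read_or_clear($s2, $buf, 4096); # nothing was written to $s1
my $err = "$!";
say defined($r) ? "read $r bytes" : "error ($err), buffer is now '$buf'";
```

With the clearing done in the wrapper, callers no longer need
their own `$buf = ''' in each error path.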
---
 lib/PublicInbox/CmdIPC4.pm       | 5 ++++-
 lib/PublicInbox/LEI.pm           | 1 -
 lib/PublicInbox/LeiSelfSocket.pm | 1 -
 lib/PublicInbox/Spawn.pm         | 8 +++++---
 lib/PublicInbox/Syscall.pm       | 5 ++++-
 t/cmd_ipc.t                      | 1 +
 6 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/CmdIPC4.pm b/lib/PublicInbox/CmdIPC4.pm
index 99890244..4bc4c729 100644
--- a/lib/PublicInbox/CmdIPC4.pm
+++ b/lib/PublicInbox/CmdIPC4.pm
@@ -31,7 +31,10 @@ no warnings 'once';
 *recv_cmd4 = sub ($$$) {
 	my ($s, undef, $len) = @_; # $_[1] = destination buffer
 	my $mh = Socket::MsgHdr->new(buflen => $len, controllen => 256);
-	my $r = Socket::MsgHdr::recvmsg($s, $mh, 0) // return (undef);
+	my $r = Socket::MsgHdr::recvmsg($s, $mh, 0) // do {
+		$_[1] = '';
+		return (undef);
+	};
 	$_[1] = $mh->buf;
 	return () if $r == 0;
 	my (undef, undef, $data) = $mh->cmsghdr;
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 8b62def2..1ead9bf6 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -1167,7 +1167,6 @@ sub event_step {
 		if (scalar(@fds) == 1 && !defined($fds[0])) {
 			return if $! == EAGAIN;
 			die "recvmsg: $!" if $! != ECONNRESET;
-			$buf = '';
 			@fds = (); # for open loop below:
 		}
 		for (@fds) { open my $rfh, '+<&=', $_ }
diff --git a/lib/PublicInbox/LeiSelfSocket.pm b/lib/PublicInbox/LeiSelfSocket.pm
index 84367266..b8745252 100644
--- a/lib/PublicInbox/LeiSelfSocket.pm
+++ b/lib/PublicInbox/LeiSelfSocket.pm
@@ -25,7 +25,6 @@ sub event_step {
 	if (scalar(@fds) == 1 && !defined($fds[0])) {
 		return if $!{EAGAIN};
 		die "recvmsg: $!" unless $!{ECONNRESET};
-		$buf = '';
 	} else { # just in case open so perl can auto-close them:
 		for (@fds) { open my $fh, '+<&=', $_ };
 	}
diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm
index ed698afc..2b84e2d5 100644
--- a/lib/PublicInbox/Spawn.pm
+++ b/lib/PublicInbox/Spawn.pm
@@ -259,10 +259,12 @@ void recv_cmd4(PerlIO *s, SV *buf, STRLEN n)
 	msg.msg_controllen = CMSG_SPACE(SEND_FD_SPACE);
 
 	i = recvmsg(PerlIO_fileno(s), &msg, 0);
-	if (i < 0)
-		Inline_Stack_Push(&PL_sv_undef);
-	else
+	if (i >= 0) {
 		SvCUR_set(buf, i);
+	} else {
+		Inline_Stack_Push(&PL_sv_undef);
+		SvCUR_set(buf, 0);
+	}
 	if (i > 0 && cmsg.hdr.cmsg_level == SOL_SOCKET &&
 			cmsg.hdr.cmsg_type == SCM_RIGHTS) {
 		size_t len = cmsg.hdr.cmsg_len;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 0a0912fb..776fbe23 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -444,7 +444,10 @@ no warnings 'once';
 			msg_controllen,
 			0); # msg_flags
 	my $r = syscall($SYS_recvmsg, fileno($sock), $mh, 0);
-	return (undef) if $r < 0; # $! set
+	if ($r < 0) { # $! is set
+		$_[1] = '';
+		return (undef);
+	}
 	substr($_[1], $r, length($_[1]), '');
 	my @ret;
 	if ($r > 0) {
diff --git a/t/cmd_ipc.t b/t/cmd_ipc.t
index 461d2140..7313d13b 100644
--- a/t/cmd_ipc.t
+++ b/t/cmd_ipc.t
@@ -47,6 +47,7 @@ my $do_test = sub { SKIP: {
 		$s2->blocking(0);
 		@fds = $recv->($s2, $buf, length($src) + 1);
 		ok($!{EAGAIN}, "EAGAIN set by ($desc)");
+		is($buf, '', "recv buffer emptied on EAGAIN ($desc)");
 		is_deeply(\@fds, [ undef ], "EAGAIN $desc");
 		$s2->blocking(1);
 

^ permalink raw reply related	[relevance 86%]

* [PATCH 6/6] syscall: fix valgrind error in pure Perl send_cmd4
    2023-09-24 20:19 86% ` [PATCH 4/6] ipc: recv_cmd4 clobbers destination buffer on errors Eric Wong
  2023-09-24 20:19 86% ` [PATCH 5/6] syscall: have `vec' operate on bytes directly Eric Wong
@ 2023-09-24 20:19 91% ` Eric Wong
  2 siblings, 0 replies; 51+ results
From: Eric Wong @ 2023-09-24 20:19 UTC (permalink / raw)
  To: meta

We need to allocate CMSG_SPACE for the `struct cmsghdr', not the
smaller CMSG_LEN.  AFAIK this isn't a real-world problem since
the Linux kernel doesn't care about the uninitialized space as
long as the memory region belongs to the user, but valgrind complains.
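
A rough sketch of the size difference follows.  It assumes a
Linux-like `struct cmsghdr' (size_t cmsg_len, int cmsg_level,
int cmsg_type) aligned to sizeof(size_t); the helper names are
made up for this example, and the padding is done with string
repetition rather than the pack template used in the patch.

```perl
use v5.12;
use Config;
use Socket qw(SOL_SOCKET SCM_RIGHTS);

# assumption: Linux-like layout and alignment
my $SIZEOF_size_t = $Config{sizesize};
my $SIZEOF_int = $Config{intsize};
my $SIZEOF_cmsghdr = $SIZEOF_size_t + 2 * $SIZEOF_int;

# round up to a multiple of sizeof(size_t)
sub cmsg_align {
	my ($n) = @_;
	($n + $SIZEOF_size_t - 1) & ~($SIZEOF_size_t - 1);
}
sub cmsg_len { cmsg_align($SIZEOF_cmsghdr) + $_[0] } # header + data
sub cmsg_space { cmsg_align($SIZEOF_cmsghdr) + cmsg_align($_[0]) }

my @fds = (0); # pass a single FD (stdin)
my $fd_space = scalar(@fds) * $SIZEOF_int;
my $len = cmsg_len($fd_space);
my $space = cmsg_space($fd_space);

my $tmpl_size_t = $SIZEOF_size_t == 8 ? 'Q' : 'L';
my $cmsghdr = pack($tmpl_size_t . 'ii' . ('i' x scalar(@fds)),
		$len, SOL_SOCKET, SCM_RIGHTS, @fds);
# pad the buffer so it covers all of msg_controllen
$cmsghdr .= "\0" x ($space - length($cmsghdr));
say "CMSG_LEN=$len CMSG_SPACE=$space buffer=", length($cmsghdr);
```

The cmsg_len field still carries CMSG_LEN; only the buffer and
msg_controllen use the larger CMSG_SPACE.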
---
 lib/PublicInbox/Syscall.pm | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index b76a9e8a..4cf45d0f 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -398,10 +398,13 @@ no warnings 'once';
 	my ($sock, $fds, undef, $flags) = @_;
 	my $iov = pack('P'.TMPL_size_t,
 			$_[2] // NUL, length($_[2] // NUL) || 1);
+	my $fd_space = scalar(@$fds) * SIZEOF_int;
+	my $msg_controllen = CMSG_SPACE($fd_space);
 	my $cmsghdr = pack(TMPL_size_t . # cmsg_len
 			'LL' .  # cmsg_level, cmsg_type,
-			('i' x scalar(@$fds)),
-			CMSG_LEN(scalar(@$fds) * SIZEOF_int), # cmsg_len
+			('i' x scalar(@$fds)) . # CMSG_DATA
+			'@'.($msg_controllen - 1).'x1', # pad to space, not len
+			CMSG_LEN($fd_space), # cmsg_len
 			SOL_SOCKET, SCM_RIGHTS, # cmsg_{level,type}
 			@$fds); # CMSG_DATA
 	my $mh = pack('PL' . # msg_name, msg_namelen (socklen_t (U32))
@@ -413,7 +416,7 @@ no warnings 'once';
 			@BYTES_4_hole,
 			$iov, 1, # msg_iov, msg_iovlen
 			$cmsghdr, # msg_control
-			CMSG_SPACE(scalar(@$fds) * SIZEOF_int), # msg_controllen
+			$msg_controllen,
 			0); # msg_flags
 	my $sent;
 	my $try = 0;

^ permalink raw reply related	[relevance 91%]

* [PATCH 5/6] syscall: have `vec' operate on bytes directly
    2023-09-24 20:19 86% ` [PATCH 4/6] ipc: recv_cmd4 clobbers destination buffer on errors Eric Wong
@ 2023-09-24 20:19 86% ` Eric Wong
  2023-09-24 20:19 91% ` [PATCH 6/6] syscall: fix valgrind error in pure Perl send_cmd4 Eric Wong
  2 siblings, 0 replies; 51+ results
From: Eric Wong @ 2023-09-24 20:19 UTC (permalink / raw)
  To: meta

Instead of converting bytes to bits and asking `vec' to
operate on single bits, we can just have `vec' work on 8 bits
at-a-time.

This also fixes an overallocation in pure Perl Linux recv_cmd4.
Adding an extra byte ourselves for "\0" isn't necessary: Perl
already does it internally everywhere when creating/resizing
scalars.
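
As a quick sanity check (a standalone sketch, not part of the
patch; the variable names are arbitrary), the two `vec' forms
below grow a scalar to the same length, while the old recv_cmd4
expression allocates more than $len bytes:

```perl
use v5.12;

my $len = 4096;

# bit-wise: touch the last bit of the last wanted byte
vec(my $by_bits = '', $len * 8 - 1, 1) = 0;

# byte-wise: touch the last wanted byte directly
vec(my $by_bytes = '', $len - 1, 8) = 0;

# old recv_cmd4 form: bit ($len + 1) * 8 lives in byte $len + 1
vec(my $old = '', ($len + 1) * 8, 1) = 0;

say 'by bits:  ', length($by_bits);
say 'by bytes: ', length($by_bytes);
say 'old form: ', length($old);
```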
---
 lib/PublicInbox/Syscall.pm | 6 +++---
 t/cmd_ipc.t                | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 776fbe23..b76a9e8a 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -283,7 +283,7 @@ sub epoll_wait_mod4 {
 	# resize our static buffer if maxevents bigger than we've ever done
 	if ($maxevents > $epoll_wait_size) {
 		$epoll_wait_size = $maxevents;
-		vec($epoll_wait_events, $maxevents * 12 * 8 - 1, 1) = 0;
+		vec($epoll_wait_events, $maxevents * 12 - 1, 8) = 0;
 	}
 	@$events = ();
 	my $ct = syscall($SYS_epoll_wait, $epfd, $epoll_wait_events,
@@ -304,7 +304,7 @@ sub epoll_wait_mod8 {
 	# resize our static buffer if maxevents bigger than we've ever done
 	if ($maxevents > $epoll_wait_size) {
 		$epoll_wait_size = $maxevents;
-		vec($epoll_wait_events, $maxevents * 16 * 8 - 1, 1) = 0;
+		vec($epoll_wait_events, $maxevents * 16 - 1, 8) = 0;
 	}
 	@$events = ();
 	my $ct = syscall($SYS_epoll_wait, $epfd, $epoll_wait_events,
@@ -429,7 +429,7 @@ no warnings 'once';
 
 *recv_cmd4 = sub ($$$) {
 	my ($sock, undef, $len) = @_;
-	vec($_[1] //= '', ($len + 1) * 8, 1) = 0;
+	vec($_[1] //= '', $len - 1, 8) = 0;
 	my $cmsghdr = "\0" x msg_controllen; # 10 * sizeof(int)
 	my $iov = pack('P'.TMPL_size_t, $_[1], $len);
 	my $mh = pack('PL' . # msg_name, msg_namelen (socklen_t (U32))
diff --git a/t/cmd_ipc.t b/t/cmd_ipc.t
index 7313d13b..e5d22aab 100644
--- a/t/cmd_ipc.t
+++ b/t/cmd_ipc.t
@@ -97,7 +97,7 @@ my $do_test = sub { SKIP: {
 
 		my $nr = 2 * 1024 * 1024;
 		while (1) {
-			vec(my $vec = '', $nr * 8 - 1, 1) = 1;
+			vec(my $vec = '', $nr - 1, 8) = 1;
 			my $n = $send->($s1, [], $vec, $flag);
 			if (defined($n)) {
 				$n == length($vec) or

^ permalink raw reply related	[relevance 86%]

* [PATCH] ipc: lower-level send_cmd/recv_cmd handle EINTR directly
@ 2023-10-06  1:37 57% Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2023-10-06  1:37 UTC (permalink / raw)
  To: meta

This ensures script/lei $send_cmd usage is EINTR-safe (since
I prefer to avoid loading PublicInbox::IPC for startup time).
Overall, it saves us some code, too.
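
The retry pattern itself is small.  A generic sketch, assuming a
callback which returns undef and sets $! on failure (the
`retry_eintr' helper is hypothetical and does not exist in
public-inbox):

```perl
use v5.12;
use Errno qw(EINTR EAGAIN);

# hypothetical helper: rerun $cb while it fails with EINTR
sub retry_eintr (&) {
	my ($cb) = @_;
	my $r;
	do {
		$r = $cb->();
	} while (!defined($r) && $! == EINTR);
	$r; # undef with $! set for any other error
}

# simulate a syscall interrupted twice before succeeding
my $calls = 0;
my $ok = retry_eintr {
	if (++$calls < 3) { $! = EINTR; return undef }
	42;
};
say "succeeded with $ok after $calls calls";

# other errors are returned to the caller immediately
my $eagain_calls = 0;
my $fail = retry_eintr { ++$eagain_calls; $! = EAGAIN; undef };
say 'EAGAIN attempts: ', $eagain_calls;
```

Doing this inside send_cmd4/recv_cmd4 means callers only ever
see EINTR-free results.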
---
 lib/PublicInbox/CmdIPC4.pm       | 24 +++++++++++++------
 lib/PublicInbox/IPC.pm           | 26 ++++----------------
 lib/PublicInbox/LEI.pm           |  6 ++---
 lib/PublicInbox/LeiSelfSocket.pm |  3 ++-
 lib/PublicInbox/Spawn.pm         | 41 ++++++++++++++++++--------------
 lib/PublicInbox/Syscall.pm       | 21 ++++++++--------
 lib/PublicInbox/XapClient.pm     |  2 +-
 lib/PublicInbox/XapHelper.pm     |  2 +-
 script/lei                       |  5 +---
 t/cmd_ipc.t                      | 12 ++++++----
 t/xap_helper.t                   |  4 ++--
 11 files changed, 72 insertions(+), 74 deletions(-)

diff --git a/lib/PublicInbox/CmdIPC4.pm b/lib/PublicInbox/CmdIPC4.pm
index 4bc4c729..2f102ec6 100644
--- a/lib/PublicInbox/CmdIPC4.pm
+++ b/lib/PublicInbox/CmdIPC4.pm
@@ -7,6 +7,16 @@
 package PublicInbox::CmdIPC4;
 use v5.12;
 use Socket qw(SOL_SOCKET SCM_RIGHTS);
+
+sub sendmsg_retry ($) {
+	return 1 if $!{EINTR};
+	return unless ($!{ENOMEM} || $!{ENOBUFS} || $!{ETOOMANYREFS});
+	return if ++$_[0] >= 50;
+	warn "# sleeping on sendmsg: $! (#$_[0])\n";
+	select(undef, undef, undef, 0.1);
+	1;
+}
+
 BEGIN { eval {
 require Socket::MsgHdr; # XS
 no warnings 'once';
@@ -20,21 +30,21 @@ no warnings 'once';
 	my $try = 0;
 	do {
 		$s = Socket::MsgHdr::sendmsg($sock, $mh, $flags);
-	} while (!defined($s) &&
-			($!{ENOBUFS} || $!{ENOMEM} || $!{ETOOMANYREFS}) &&
-			(++$try < 50) &&
-			warn "# sleeping on sendmsg: $! (#$try)\n" &&
-			select(undef, undef, undef, 0.1) == 0);
+	} while (!defined($s) && sendmsg_retry($try));
 	$s;
 };
 
 *recv_cmd4 = sub ($$$) {
 	my ($s, undef, $len) = @_; # $_[1] = destination buffer
 	my $mh = Socket::MsgHdr->new(buflen => $len, controllen => 256);
-	my $r = Socket::MsgHdr::recvmsg($s, $mh, 0) // do {
+	my $r;
+	do {
+		$r = Socket::MsgHdr::recvmsg($s, $mh, 0);
+	} while (!defined($r) && $!{EINTR});
+	if (!defined($r)) {
 		$_[1] = '';
 		return (undef);
-	};
+	}
 	$_[1] = $mh->buf;
 	return () if $r == 0;
 	my (undef, undef, $data) = $mh->cmsghdr;
diff --git a/lib/PublicInbox/IPC.pm b/lib/PublicInbox/IPC.pm
index 9b4b1508..839281b2 100644
--- a/lib/PublicInbox/IPC.pm
+++ b/lib/PublicInbox/IPC.pm
@@ -204,27 +204,9 @@ sub ipc_sibling_atfork_child {
 	$pid == $$ and die "BUG: $$ ipc_atfork_child called on itself";
 }
 
-sub send_cmd ($$$$) {
-	my ($s, $fds, $buf, $fl) = @_;
-	while (1) {
-		my $n = $send_cmd->($s, $fds, $buf, $fl);
-		next if !defined($n) && $!{EINTR};
-		return $n;
-	}
-}
-
-sub recv_cmd ($$$) {
-	my ($s, undef, $len) = @_; # $_[1] is $buf
-	while (1) {
-		my @fds = $recv_cmd->($s, $_[1], $len);
-		next if scalar(@fds) == 1 && !defined($fds[0]) && $!{EINTR};
-		return @fds;
-	}
-}
-
 sub recv_and_run {
 	my ($self, $s2, $len, $full_stream) = @_;
-	my @fds = recv_cmd($s2, my $buf, $len // $MY_MAX_ARG_STRLEN);
+	my @fds = $recv_cmd->($s2, my $buf, $len // $MY_MAX_ARG_STRLEN);
 	return if scalar(@fds) && !defined($fds[0]);
 	my $n = length($buf) or return 0;
 	my $nfd = 0;
@@ -291,11 +273,11 @@ sub stream_in_full ($$$) {
 	my ($s1, $fds, $buf) = @_;
 	socketpair(my $r, my $w, AF_UNIX, SOCK_STREAM, 0) or
 		croak "socketpair: $!";
-	my $n = send_cmd($s1, [ fileno($r) ],
+	my $n = $send_cmd->($s1, [ fileno($r) ],
 			ipc_freeze(['do_sock_stream', length($buf)]),
 			0) // croak "sendmsg: $!";
 	undef $r;
-	$n = send_cmd($w, $fds, $buf, 0) // croak "sendmsg: $!";
+	$n = $send_cmd->($w, $fds, $buf, 0) // croak "sendmsg: $!";
 	while ($n < length($buf)) {
 		my $x = syswrite($w, $buf, length($buf) - $n, $n);
 		if (!defined($n)) {
@@ -315,7 +297,7 @@ sub wq_io_do { # always async
 		if (length($buf) > $MY_MAX_ARG_STRLEN) {
 			stream_in_full($s1, $fds, $buf);
 		} else {
-			my $n = send_cmd $s1, $fds, $buf, 0;
+			my $n = $send_cmd->($s1, $fds, $buf, 0);
 			return if defined($n); # likely
 			$!{ETOOMANYREFS} and
 				croak "sendmsg: $! (check RLIMIT_NOFILE)";
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index e300f0a4..f8bcd43d 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -1041,7 +1041,7 @@ sub start_mua {
 
 sub send_exec_cmd { # tell script/lei to execute a command
 	my ($self, $io, $cmd, $env) = @_;
-	PublicInbox::IPC::send_cmd(
+	$PublicInbox::IPC::send_cmd->(
 			$self->{sock} // die('lei client gone'),
 			[ map { fileno($_) } @$io ],
 			exec_buf($cmd, $env), 0) //
@@ -1139,7 +1139,7 @@ sub accept_dispatch { # Listener {post_accept} callback
 	select($rvec, undef, undef, 60) or
 		return send($sock, 'timed out waiting to recv FDs', 0);
 	# (4096 * 33) >MAX_ARG_STRLEN
-	my @fds = PublicInbox::IPC::recv_cmd($sock, my $buf, 4096 * 33) or
+	my @fds = $PublicInbox::IPC::recv_cmd->($sock, my $buf, 4096 * 33) or
 		return; # EOF
 	if (!defined($fds[0])) {
 		warn(my $msg = "recv_cmd failed: $!");
@@ -1178,7 +1178,7 @@ sub event_step {
 	local %ENV = %{$self->{env}};
 	local $current_lei = $self;
 	eval {
-		my @fds = PublicInbox::IPC::recv_cmd(
+		my @fds = $PublicInbox::IPC::recv_cmd->(
 			$self->{sock} // return, my $buf, 4096);
 		if (scalar(@fds) == 1 && !defined($fds[0])) {
 			return if $! == EAGAIN;
diff --git a/lib/PublicInbox/LeiSelfSocket.pm b/lib/PublicInbox/LeiSelfSocket.pm
index b8745252..0e15bc7c 100644
--- a/lib/PublicInbox/LeiSelfSocket.pm
+++ b/lib/PublicInbox/LeiSelfSocket.pm
@@ -21,7 +21,8 @@ sub new {
 
 sub event_step {
 	my ($self) = @_;
-	my @fds = PublicInbox::IPC::recv_cmd($self->{sock}, my $buf, 4096 * 33);
+	my ($buf, @fds);
+	@fds = $PublicInbox::IPC::recv_cmd->($self->{sock}, $buf, 4096 * 33);
 	if (scalar(@fds) == 1 && !defined($fds[0])) {
 		return if $!{EAGAIN};
 		die "recvmsg: $!" unless $!{ECONNRESET};
diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm
index bb2abe28..4c7e0f80 100644
--- a/lib/PublicInbox/Spawn.pm
+++ b/lib/PublicInbox/Spawn.pm
@@ -173,19 +173,20 @@ int pi_fork_exec(SV *redirref, SV *file, SV *cmdref, SV *envref, SV *rlimref,
 	return (int)pid;
 }
 
-static int sleep_wait(unsigned *tries, int err)
+static int sendmsg_retry(unsigned *tries)
 {
 	const struct timespec req = { 0, 100000000 }; /* 100ms */
+	int err = errno;
 	switch (err) {
+	case EINTR: PERL_ASYNC_CHECK(); return 1;
 	case ENOBUFS: case ENOMEM: case ETOOMANYREFS:
-		if (++*tries < 50) {
-			fprintf(stderr, "# sleeping on sendmsg: %s (#%u)\n",
-				strerror(err), *tries);
-			nanosleep(&req, NULL);
-			return 1;
-		}
-	default:
-		return 0;
+		if (++*tries >= 50) return 0;
+		fprintf(stderr, "# sleeping on sendmsg: %s (#%u)\n",
+			strerror(err), *tries);
+		nanosleep(&req, NULL);
+		PERL_ASYNC_CHECK();
+		return 1;
+	default: return 0;
 	}
 }
 
@@ -237,7 +238,7 @@ SV *send_cmd4(PerlIO *s, SV *svfds, SV *data, int flags)
 	}
 	do {
 		sent = sendmsg(PerlIO_fileno(s), &msg, flags);
-	} while (sent < 0 && sleep_wait(&tries, errno));
+	} while (sent < 0 && sendmsg_retry(&tries));
 	return sent >= 0 ? newSViv(sent) : &PL_sv_undef;
 }
 
@@ -259,20 +260,24 @@ void recv_cmd4(PerlIO *s, SV *buf, STRLEN n)
 	msg.msg_control = &cmsg.hdr;
 	msg.msg_controllen = CMSG_SPACE(SEND_FD_SPACE);
 
-	i = recvmsg(PerlIO_fileno(s), &msg, 0);
+	for (;;) {
+		i = recvmsg(PerlIO_fileno(s), &msg, 0);
+		if (i >= 0 || errno != EINTR) break;
+		PERL_ASYNC_CHECK();
+	}
 	if (i >= 0) {
 		SvCUR_set(buf, i);
+		if (cmsg.hdr.cmsg_level == SOL_SOCKET &&
+				cmsg.hdr.cmsg_type == SCM_RIGHTS) {
+			size_t len = cmsg.hdr.cmsg_len;
+			int *fdp = (int *)CMSG_DATA(&cmsg.hdr);
+			for (i = 0; CMSG_LEN((i + 1) * sizeof(int)) <= len; i++)
+				Inline_Stack_Push(sv_2mortal(newSViv(*fdp++)));
+		}
 	} else {
 		Inline_Stack_Push(&PL_sv_undef);
 		SvCUR_set(buf, 0);
 	}
-	if (i > 0 && cmsg.hdr.cmsg_level == SOL_SOCKET &&
-			cmsg.hdr.cmsg_type == SCM_RIGHTS) {
-		size_t len = cmsg.hdr.cmsg_len;
-		int *fdp = (int *)CMSG_DATA(&cmsg.hdr);
-		for (i = 0; CMSG_LEN((i + 1) * sizeof(int)) <= len; i++)
-			Inline_Stack_Push(sv_2mortal(newSViv(*fdp++)));
-	}
 	Inline_Stack_Done;
 }
 #endif /* defined(CMSG_SPACE) && defined(CMSG_LEN) */
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 4cf45d0f..e83beb6a 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -394,6 +394,8 @@ use constant msg_controllen => CMSG_SPACE(10 * SIZEOF_int) + 16; # 10 FDs
 
 if (defined($SYS_sendmsg) && defined($SYS_recvmsg)) {
 no warnings 'once';
+require PublicInbox::CmdIPC4;
+
 *send_cmd4 = sub ($$$$) {
 	my ($sock, $fds, undef, $flags) = @_;
 	my $iov = pack('P'.TMPL_size_t,
@@ -418,16 +420,12 @@ no warnings 'once';
 			$cmsghdr, # msg_control
 			$msg_controllen,
 			0); # msg_flags
-	my $sent;
+	my $s;
 	my $try = 0;
 	do {
-		$sent = syscall($SYS_sendmsg, fileno($sock), $mh, $flags);
-	} while ($sent < 0 &&
-			($!{ENOBUFS} || $!{ENOMEM} || $!{ETOOMANYREFS}) &&
-			(++$try < 50) &&
-			warn "# sleeping on sendmsg: $! (#$try)\n" &&
-			select(undef, undef, undef, 0.1) == 0);
-	$sent >= 0 ? $sent : undef;
+		$s = syscall($SYS_sendmsg, fileno($sock), $mh, $flags);
+	} while ($s < 0 && PublicInbox::CmdIPC4::sendmsg_retry($try));
+	$s >= 0 ? $s : undef;
 };
 
 *recv_cmd4 = sub ($$$) {
@@ -446,8 +444,11 @@ no warnings 'once';
 			$cmsghdr, # msg_control
 			msg_controllen,
 			0); # msg_flags
-	my $r = syscall($SYS_recvmsg, fileno($sock), $mh, 0);
-	if ($r < 0) { # $! is set
+	my $r;
+	do {
+		$r = syscall($SYS_recvmsg, fileno($sock), $mh, 0);
+	} while ($r < 0 && $!{EINTR});
+	if ($r < 0) {
 		$_[1] = '';
 		return (undef);
 	}
diff --git a/lib/PublicInbox/XapClient.pm b/lib/PublicInbox/XapClient.pm
index f6c09c3b..9e2d71a0 100644
--- a/lib/PublicInbox/XapClient.pm
+++ b/lib/PublicInbox/XapClient.pm
@@ -21,7 +21,7 @@ sub mkreq {
 	}
 	my @fds = map fileno($_), @$ios;
 	my $buf = join("\0", @arg, '');
-	$n = PublicInbox::IPC::send_cmd($self->{io}, \@fds, $buf, 0) //
+	$n = $PublicInbox::IPC::send_cmd->($self->{io}, \@fds, $buf, 0) //
 		die "send_cmd: $!";
 	$n == length($buf) or die "send_cmd: $n != ".length($buf);
 	$r;
diff --git a/lib/PublicInbox/XapHelper.pm b/lib/PublicInbox/XapHelper.pm
index c98708e3..ae907766 100644
--- a/lib/PublicInbox/XapHelper.pm
+++ b/lib/PublicInbox/XapHelper.pm
@@ -177,7 +177,7 @@ sub recv_loop {
 	my $in = \*STDIN;
 	while (!defined($parent_pid) || getppid == $parent_pid) {
 		PublicInbox::DS::sig_setmask($workerset);
-		my @fds = PublicInbox::IPC::recv_cmd($in, $rbuf, 4096*33);
+		my @fds = $PublicInbox::IPC::recv_cmd->($in, $rbuf, 4096*33);
 		scalar(@fds) or exit(66); # EX_NOINPUT
 		die "recvmsg: $!" if !defined($fds[0]);
 		PublicInbox::DS::block_signals();
diff --git a/script/lei b/script/lei
index 1d90be0a..087afc33 100755
--- a/script/lei
+++ b/script/lei
@@ -116,10 +116,7 @@ $SIG{CONT} = sub { send($sock, 'CONT', 0) };
 my $x_it_code = 0;
 while (1) {
 	my (@fds) = $recv_cmd->($sock, my $buf, 4096 * 33);
-	if (scalar(@fds) == 1 && !defined($fds[0])) {
-		next if $!{EINTR};
-		die "recvmsg: $!";
-	}
+	die "recvmsg: $!" if scalar(@fds) == 1 && !defined($fds[0]);
 	last if $buf eq '';
 	if ($buf =~ /\Aexec (.+)\z/) {
 		$exec_cmd->(\@fds, split(/\0/, $1));
diff --git a/t/cmd_ipc.t b/t/cmd_ipc.t
index e5d22aab..ccf4ca31 100644
--- a/t/cmd_ipc.t
+++ b/t/cmd_ipc.t
@@ -59,18 +59,20 @@ my $do_test = sub { SKIP: {
 			if ($pid == 0) {
 				# need to loop since Perl signals are racy
 				# (the interpreter doesn't self-pipe)
-				CORE::kill('ALRM', $tgt) while (tick(0.05));
+				my $n = 3;
+				while (tick(0.01 * $n) && --$n) {
+					kill('ALRM', $tgt)
+				}
+				close $s1;
 				POSIX::_exit(1);
 			}
+			close $s1;
 			@fds = $recv->($s2, $buf, length($src) + 1);
-			ok($!{EINTR}, "EINTR set by ($desc)");
-			kill('KILL', $pid);
 			waitpid($pid, 0);
-			is_deeply(\@fds, [ undef ], "EINTR $desc");
+			is_deeply(\@fds, [], "EINTR->EOF $desc");
 			ok($alrm, 'SIGALRM hit');
 		}
 
-		close $s1;
 		@fds = $recv->($s2, $buf, length($src) + 1);
 		is_deeply(\@fds, [], "no FDs on EOF $desc");
 		is($buf, '', "buffer cleared on EOF ($desc)");
diff --git a/t/xap_helper.t b/t/xap_helper.t
index 2303301d..27742cad 100644
--- a/t/xap_helper.t
+++ b/t/xap_helper.t
@@ -52,8 +52,8 @@ my $doreq = sub {
 	my $buf = join("\0", @arg, '');
 	my @fds = fileno($y);
 	push @fds, fileno($err) if $err;
-	my $n = PublicInbox::IPC::send_cmd($s, \@fds, $buf, 0);
-	$n // xbail "send: $!";
+	my $n = $PublicInbox::IPC::send_cmd->($s, \@fds, $buf, 0) //
+		xbail "send: $!";
 	my $exp = length($buf);
 	$exp == $n or xbail "req @arg sent short ($n != $exp)";
 	$x;

^ permalink raw reply related	[relevance 57%]

* [PATCH 17/30] syscall: common $F_SETPIPE_SZ definition
  @ 2023-10-17 23:38 71% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2023-10-17 23:38 UTC (permalink / raw)
  To: meta

We use this in various places to minimize or maximize pipe
size on Linux.  So keep it all in one place.
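
A minimal standalone sketch of the guarded call pattern this patch
converges on.  The lexical $F_SETPIPE_SZ below is only a stand-in for
the variable exported by PublicInbox::Syscall (1031 on Linux, undef
elsewhere, per the Syscall.pm hunk); the 65536 fallback mirrors the
assumption in t/lei-sigpipe.t and is not queried from the kernel:

```perl
use strict;
use warnings;

# stand-in for `use PublicInbox::Syscall qw($F_SETPIPE_SZ)'
my $F_SETPIPE_SZ = $^O eq 'linux' ? 1031 : undef;

pipe(my ($r, $w)) or die "pipe: $!";

# shrink the pipe to one page if the platform (and kernel) allow it;
# otherwise assume the common 64K default
my $size = $F_SETPIPE_SZ && fcntl($w, $F_SETPIPE_SZ, 4096) ?
	4096 : 65536;
print "assumed pipe size: $size\n";
```

Callers only test the variable for truthiness, so non-Linux platforms
skip the fcntl entirely without needing their own $^O checks.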
---
 lib/PublicInbox/CidxLogP.pm       |  4 ++--
 lib/PublicInbox/EOFpipe.pm        |  6 +++---
 lib/PublicInbox/LeiXSearch.pm     |  2 +-
 lib/PublicInbox/SearchIdxShard.pm | 14 +++++++-------
 lib/PublicInbox/Syscall.pm        | 16 ++++++++--------
 t/gcf2.t                          |  5 +++--
 t/lei-sigpipe.t                   |  7 +++----
 7 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/lib/PublicInbox/CidxLogP.pm b/lib/PublicInbox/CidxLogP.pm
index 7877d5ac..34f7201d 100644
--- a/lib/PublicInbox/CidxLogP.pm
+++ b/lib/PublicInbox/CidxLogP.pm
@@ -10,12 +10,12 @@
 package PublicInbox::CidxLogP;
 use v5.12;
 use parent qw(PublicInbox::DS);
-use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
+use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT $F_SETPIPE_SZ);
 
 sub new {
 	my ($cls, $rd, $cidx, $git, $roots) = @_;
 	my $self = bless { cidx => $cidx, git => $git, roots => $roots }, $cls;
-	fcntl($rd, 1031, 1048576) if $^O eq 'linux'; # fatter pipes
+	fcntl($rd, $F_SETPIPE_SZ, 1048576) if $F_SETPIPE_SZ;
 	$self->SUPER::new($rd, EPOLLIN|EPOLLONESHOT);
 }
 
diff --git a/lib/PublicInbox/EOFpipe.pm b/lib/PublicInbox/EOFpipe.pm
index 628e9366..3474874f 100644
--- a/lib/PublicInbox/EOFpipe.pm
+++ b/lib/PublicInbox/EOFpipe.pm
@@ -4,13 +4,13 @@
 package PublicInbox::EOFpipe;
 use v5.12;
 use parent qw(PublicInbox::DS);
-use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
+use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT $F_SETPIPE_SZ);
 
 sub new {
 	my (undef, $rd, $cb) = @_;
 	my $self = bless { cb => $cb }, __PACKAGE__;
-	# 1031: F_SETPIPE_SZ, 4096: page size
-	fcntl($rd, 1031, 4096) if $^O eq 'linux';
+	# 4096: page size
+	fcntl($rd, $F_SETPIPE_SZ, 4096) if $F_SETPIPE_SZ;
 	$self->SUPER::new($rd, EPOLLIN|EPOLLONESHOT);
 }
 
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index d83a403c..25b66b3b 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -21,6 +21,7 @@ use Fcntl qw(SEEK_SET F_SETFL O_APPEND O_RDWR);
 use PublicInbox::ContentHash qw(git_sha);
 use POSIX qw(strftime);
 use autodie qw(open read seek truncate);
+use PublicInbox::Syscall qw($F_SETPIPE_SZ);
 
 sub new {
 	my ($class) = @_;
@@ -536,7 +537,6 @@ sub do_query {
 		if ($lei->{opt}->{augment} && delete $lei->{early_mua}) {
 			$lei->start_mua;
 		}
-		my $F_SETPIPE_SZ = $^O eq 'linux' ? 1031 : undef;
 		if ($l2m->{-wq_nr_workers} > 1 &&
 				$l2m->{base_type} =~ /\A(?:maildir|mbox)\z/) {
 			# setup two barriers to coordinate ->has_entries
diff --git a/lib/PublicInbox/SearchIdxShard.pm b/lib/PublicInbox/SearchIdxShard.pm
index 21bd56c2..1630eb4a 100644
--- a/lib/PublicInbox/SearchIdxShard.pm
+++ b/lib/PublicInbox/SearchIdxShard.pm
@@ -7,6 +7,7 @@ package PublicInbox::SearchIdxShard;
 use v5.12;
 use parent qw(PublicInbox::SearchIdx PublicInbox::IPC);
 use PublicInbox::OnDestroy;
+use PublicInbox::Syscall qw($F_SETPIPE_SZ);
 
 sub new {
 	my ($class, $v2w, $shard) = @_; # v2w may be ExtSearchIdx
@@ -20,13 +21,12 @@ sub new {
 	if ($v2w->{parallel}) {
 		local $self->{-v2w_afc} = $v2w;
 		$self->ipc_worker_spawn("shard[$shard]");
-		# F_SETPIPE_SZ = 1031 on Linux; increasing the pipe size for
-		# inputs speeds V2Writable batch imports across 8 cores by
-		# nearly 20%.  Since any of our responses are small, make
-		# the response pipe as small as possible
-		if ($^O eq 'linux') {
-			fcntl($self->{-ipc_req}, 1031, 1048576);
-			fcntl($self->{-ipc_res}, 1031, 4096);
+		# Increasing the pipe size for requests speeds V2 batch imports
+		# across 8 cores by nearly 20%.  Since many of our responses
+		# are small, make the response pipe as small as possible
+		if ($F_SETPIPE_SZ) {
+			fcntl($self->{-ipc_req}, $F_SETPIPE_SZ, 1048576);
+			fcntl($self->{-ipc_res}, $F_SETPIPE_SZ, 4096);
 		}
 	}
 	$self;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index e83beb6a..78181bb6 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -28,7 +28,7 @@ our @EXPORT_OK = qw(epoll_ctl epoll_create epoll_wait
                   EPOLLIN EPOLLOUT EPOLLET
                   EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
                   EPOLLONESHOT EPOLLEXCLUSIVE
-                  signalfd rename_noreplace %SIGNUM);
+                  signalfd rename_noreplace %SIGNUM $F_SETPIPE_SZ);
 use constant {
 	EPOLLIN => 1,
 	EPOLLOUT => 4,
@@ -55,13 +55,12 @@ use constant {
 
 my @BYTES_4_hole = BYTES_4_hole ? (0) : ();
 
-our (
-     $SYS_epoll_create,
-     $SYS_epoll_ctl,
-     $SYS_epoll_wait,
-     $SYS_signalfd4,
-     $SYS_renameat2,
-     );
+our ($SYS_epoll_create,
+	$SYS_epoll_ctl,
+	$SYS_epoll_wait,
+	$SYS_signalfd4,
+	$SYS_renameat2,
+	$F_SETPIPE_SZ);
 
 my ($SYS_sendmsg, $SYS_recvmsg);
 my $SYS_fstatfs; # don't need fstatfs64, just statfs.f_type
@@ -70,6 +69,7 @@ my $SFD_CLOEXEC = 02000000; # Perl does not expose O_CLOEXEC
 our $no_deprecated = 0;
 
 if ($^O eq "linux") {
+	$F_SETPIPE_SZ = 1031;
     my (undef, undef, $release, undef, $machine) = POSIX::uname();
     my ($maj, $min) = ($release =~ /\A([0-9]+)\.([0-9]+)/);
     $SYS_renameat2 = 0 if "$maj.$min" < 3.15;
diff --git a/t/gcf2.t b/t/gcf2.t
index d12a4420..33f3bbca 100644
--- a/t/gcf2.t
+++ b/t/gcf2.t
@@ -10,6 +10,7 @@ use POSIX qw(_exit);
 use Cwd qw(abs_path);
 require_mods('PublicInbox::Gcf2');
 use_ok 'PublicInbox::Gcf2';
+use PublicInbox::Syscall qw($F_SETPIPE_SZ);
 use PublicInbox::Import;
 my ($tmpdir, $for_destroy) = tmpdir();
 
@@ -109,7 +110,7 @@ SKIP: {
 	for my $blk (1, 0) {
 		my ($r, $w);
 		pipe($r, $w) or BAIL_OUT $!;
-		fcntl($w, 1031, 4096) or
+		fcntl($w, $F_SETPIPE_SZ, 4096) or
 			skip('Linux too old for F_SETPIPE_SZ', 14);
 		$w->blocking($blk);
 		seek($fh, 0, SEEK_SET) or BAIL_OUT "seek: $!";
@@ -129,7 +130,7 @@ SKIP: {
 		$ck_copying->("pipe blocking($blk)");
 
 		pipe($r, $w) or BAIL_OUT $!;
-		fcntl($w, 1031, 4096) or BAIL_OUT $!;
+		fcntl($w, $F_SETPIPE_SZ, 4096) or BAIL_OUT $!;
 		$w->blocking($blk);
 		close $r;
 		local $SIG{PIPE} = 'IGNORE';
diff --git a/t/lei-sigpipe.t b/t/lei-sigpipe.t
index 55c208e2..622598a4 100644
--- a/t/lei-sigpipe.t
+++ b/t/lei-sigpipe.t
@@ -6,6 +6,7 @@ use v5.10.1;
 use PublicInbox::TestCommon;
 use POSIX qw(WTERMSIG WIFSIGNALED SIGPIPE);
 use PublicInbox::OnDestroy;
+use PublicInbox::Syscall qw($F_SETPIPE_SZ);
 
 # undo systemd (and similar) ignoring SIGPIPE, since lei expects to be run
 # from an interactive terminal:
@@ -21,10 +22,8 @@ test_lei(sub {
 	my $imported;
 	for my $out ([], [qw(-f mboxcl2)], [qw(-f text)]) {
 		pipe(my ($r, $w)) or BAIL_OUT $!;
-		my $size = 65536;
-		if ($^O eq 'linux' && fcntl($w, 1031, 4096)) {
-			$size = 4096;
-		}
+		my $size = $F_SETPIPE_SZ && fcntl($w, $F_SETPIPE_SZ, 4096) ?
+			4096 : 65536;
 		unless (-f $f) {
 			open my $fh, '>', $f or xbail "open $f: $!";
 			print $fh <<'EOM' or xbail;

^ permalink raw reply related	[relevance 71%]

* [PATCH] pure Perl inotify support
@ 2023-12-28  4:23 47% Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2023-12-28  4:23 UTC (permalink / raw)
  To: meta

This is a step towards improving the out-of-the-box experience
in achieving notifications without XS, extra downloads, and .so
loading + runtime mmap overhead.
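
To illustrate why no XS is required: the kernel hands back a packed
`struct inotify_event' (wd, mask, cookie, len, then a NUL-padded
name), which unpack() can decode directly.  The sketch below parses a
synthetic buffer instead of a real inotify FD; fake_event() and
parse_events() are hypothetical helpers for illustration only, and the
padding used here is simplified and may not match the kernel exactly:

```perl
use strict;
use warnings;
use constant { IN_CREATE => 0x100, IN_DELETE_SELF => 0x400 };

# build one fake event: 16-byte header + NUL-padded name (illustrative)
sub fake_event {
	my ($wd, $mask, $name) = @_;
	my $len = length($name) ? (length($name) + 8) & ~7 : 0;
	pack('lLLL', $wd, $mask, 0, $len) . pack("a$len", $name);
}

# decode every event in the buffer into [ wd, mask, name ]
sub parse_events {
	my ($rbuf) = @_;
	my @ret;
	while (length($rbuf) >= 16) { # 16: sizeof(struct inotify_event)
		my ($wd, $mask, undef, $len) =
			unpack('lLLL', substr($rbuf, 0, 16, ''));
		my $name = $len ?
			unpack('Z*', substr($rbuf, 0, $len, '')) : undef;
		push @ret, [ $wd, $mask, $name ];
	}
	@ret;
}

my $buf = fake_event(1, IN_CREATE, 'new') .
	fake_event(2, IN_DELETE_SELF, '');
for my $ev (parse_events($buf)) {
	my ($wd, $mask, $name) = @$ev;
	printf "wd=%d mask=0x%x name=%s\n", $wd, $mask, $name // '(none)';
}
```

The real ->read in this patch additionally maps each wd back to its
watch object and auto-cancels watches on IN_IGNORED and similar masks.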

This also fixes loongarch support for all Linux syscalls, which
was broken by a bad regexp :x
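
The two patterns below are copied from the Syscall.pm hunk in this
patch; the loop around them is just a quick standalone check of which
`uname -m' strings each one accepts:

```perl
use strict;
use warnings;

my $old = qr/\A(?:loong)?aarch64\z/;  # before this patch
my $new = qr/\A(?:loong|a)arch64\z/;  # after this patch

for my $machine (qw(aarch64 loongarch64)) {
	printf "%-12s old=%s new=%s\n", $machine,
		($machine =~ $old ? 'match' : 'miss'),
		($machine =~ $new ? 'match' : 'miss');
}
# "aarch64" matches both patterns; "loongarch64" only matches $new,
# since $old could only ever accept "aarch64" or "loongaarch64"
```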

All the reachable Linux architectures listed at
<https://portal.cfarm.net/machines/list/> should be supported.
At the moment, there appear to be no reachable sparc* Linux
machines available to cfarm users.

Fixes: b0e5093aa3572a86 (syscall: add support for riscv64, 2022-08-11)
---
 MANIFEST                      |   4 ++
 devel/sysdefs-list            |  28 +++++++++
 lib/PublicInbox/DirIdle.pm    |  16 ++---
 lib/PublicInbox/In3Event.pm   |  24 +++++++
 lib/PublicInbox/In3Watch.pm   |  20 ++++++
 lib/PublicInbox/InboxIdle.pm  |   6 +-
 lib/PublicInbox/Inotify.pm    |  27 ++++++--
 lib/PublicInbox/Inotify3.pm   | 115 ++++++++++++++++++++++++++++++++++
 lib/PublicInbox/Syscall.pm    |  37 ++++++++++-
 lib/PublicInbox/TailNotify.pm |   6 +-
 t/imapd.t                     |   2 +-
 t/inotify3.t                  |  17 +++++
 t/lei-auto-watch.t            |   4 +-
 t/lei-watch.t                 |   4 +-
 t/nntpd.t                     |   2 +-
 t/watch_maildir.t             |   2 +-
 16 files changed, 287 insertions(+), 27 deletions(-)
 create mode 100644 lib/PublicInbox/In3Event.pm
 create mode 100644 lib/PublicInbox/In3Watch.pm
 create mode 100644 lib/PublicInbox/Inotify3.pm
 create mode 100644 t/inotify3.t

diff --git a/MANIFEST b/MANIFEST
index e22674b7..109ce88a 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -223,10 +223,13 @@ lib/PublicInbox/IPC.pm
 lib/PublicInbox/IdxStack.pm
 lib/PublicInbox/Import.pm
 lib/PublicInbox/In2Tie.pm
+lib/PublicInbox/In3Event.pm
+lib/PublicInbox/In3Watch.pm
 lib/PublicInbox/Inbox.pm
 lib/PublicInbox/InboxIdle.pm
 lib/PublicInbox/InboxWritable.pm
 lib/PublicInbox/Inotify.pm
+lib/PublicInbox/Inotify3.pm
 lib/PublicInbox/InputPipe.pm
 lib/PublicInbox/Isearch.pm
 lib/PublicInbox/KQNotify.pm
@@ -493,6 +496,7 @@ t/index-git-times.t
 t/indexlevels-mirror-v1.t
 t/indexlevels-mirror.t
 t/init.t
+t/inotify3.t
 t/io.t
 t/ipc.t
 t/iso-2202-jp.eml
diff --git a/devel/sysdefs-list b/devel/sysdefs-list
index d0166461..61532cf2 100755
--- a/devel/sysdefs-list
+++ b/devel/sysdefs-list
@@ -88,9 +88,37 @@ int main(void)
 	MAYBE D(SYS_epoll_wait);
 	D(SYS_epoll_pwait);
 	D(SYS_signalfd4);
+
+	X(IN_CLOEXEC);
+	X(IN_ACCESS);
+	X(IN_ALL_EVENTS);
+	X(IN_ATTRIB);
+	X(IN_CLOSE);
+	X(IN_CLOSE_NOWRITE);
+	X(IN_CLOSE_WRITE);
+	X(IN_CREATE);
+	X(IN_DELETE);
+	X(IN_DELETE_SELF);
+	X(IN_DONT_FOLLOW);
+	X(IN_EXCL_UNLINK);
+	X(IN_IGNORED);
+	X(IN_ISDIR);
+	X(IN_MASK_ADD);
+	X(IN_MODIFY);
+	X(IN_MOVE);
+	X(IN_MOVED_FROM);
+	X(IN_MOVED_TO);
+	X(IN_MOVE_SELF);
+	X(IN_ONESHOT);
+	X(IN_ONLYDIR);
+	X(IN_OPEN);
+	X(IN_Q_OVERFLOW);
+	X(IN_UNMOUNT);
+
 	D(SYS_inotify_init1);
 	D(SYS_inotify_add_watch);
 	D(SYS_inotify_rm_watch);
+
 	D(SYS_prctl);
 	D(SYS_fstatfs);
 
diff --git a/lib/PublicInbox/DirIdle.pm b/lib/PublicInbox/DirIdle.pm
index e6a326ab..230df166 100644
--- a/lib/PublicInbox/DirIdle.pm
+++ b/lib/PublicInbox/DirIdle.pm
@@ -10,12 +10,12 @@ use PublicInbox::In2Tie;
 
 my ($MAIL_IN, $MAIL_GONE, $ino_cls);
 if ($^O eq 'linux' && eval { require PublicInbox::Inotify; 1 }) {
-	$MAIL_IN = Linux::Inotify2::IN_MOVED_TO() |
-		Linux::Inotify2::IN_CREATE();
-	$MAIL_GONE = Linux::Inotify2::IN_DELETE() |
-			Linux::Inotify2::IN_DELETE_SELF() |
-			Linux::Inotify2::IN_MOVE_SELF() |
-			Linux::Inotify2::IN_MOVED_FROM();
+	$MAIL_IN = PublicInbox::Inotify::IN_MOVED_TO() |
+		PublicInbox::Inotify::IN_CREATE();
+	$MAIL_GONE = PublicInbox::Inotify::IN_DELETE() |
+			PublicInbox::Inotify::IN_DELETE_SELF() |
+			PublicInbox::Inotify::IN_MOVE_SELF() |
+			PublicInbox::Inotify::IN_MOVED_FROM();
 	$ino_cls = 'PublicInbox::Inotify';
 # Perl 5.22+ is needed for fileno(DIRHANDLE) support:
 } elsif ($^V ge v5.22 && eval { require PublicInbox::KQNotify }) {
@@ -79,7 +79,7 @@ sub event_step {
 	my $cb = $self->{cb} or return;
 	local $PublicInbox::DS::in_loop = 0; # waitpid() synchronously (FIXME)
 	eval {
-		my @events = $self->{inot}->read; # Linux::Inotify2->read
+		my @events = $self->{inot}->read; # Inotify3->read
 		$cb->($_) for @events;
 	};
 	warn "$self->{inot}->read err: $@\n" if $@;
@@ -88,7 +88,7 @@ sub event_step {
 sub force_close {
 	my ($self) = @_;
 	my $inot = delete $self->{inot} // return;
-	if ($inot->can('fh')) { # Linux::Inotify2 2.3+
+	if ($inot->can('fh')) { # Inotify3 or Linux::Inotify2 2.3+
 		$inot->fh->close or warn "CLOSE ERROR: $!";
 	} elsif ($inot->isa('Linux::Inotify2')) {
 		require PublicInbox::LI2Wrap;
diff --git a/lib/PublicInbox/In3Event.pm b/lib/PublicInbox/In3Event.pm
new file mode 100644
index 00000000..f93dc0da
--- /dev/null
+++ b/lib/PublicInbox/In3Event.pm
@@ -0,0 +1,24 @@
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# duck-type compatible with Linux::Inotify2::Event for pure Perl
+# PublicInbox::Inotify3 w/o callback support
+package PublicInbox::In3Event;
+use v5.12;
+
+sub w { $_[0]->[2] } # PublicInbox::In3Watch
+sub mask { $_[0]->[0] }
+sub name { $_[0]->[1] }
+
+sub fullname {
+	my ($name, $wname) = ($_[0]->[1], $_[0]->[2]->name);
+	length($name) ? "$wname/$name" : $wname;
+}
+
+my $buf = '';
+while (my ($sym, $mask) = each %PublicInbox::Inotify3::events) {
+	$buf .= "sub $sym { \$_[0]->[0] & $mask }\n";
+}
+eval $buf;
+
+1;
diff --git a/lib/PublicInbox/In3Watch.pm b/lib/PublicInbox/In3Watch.pm
new file mode 100644
index 00000000..bdb91869
--- /dev/null
+++ b/lib/PublicInbox/In3Watch.pm
@@ -0,0 +1,20 @@
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# duck-type compatible with Linux::Inotify2::Watch for pure Perl
+# PublicInbox::Inotify3 for our needs, only
+package PublicInbox::In3Watch;
+use v5.12;
+
+sub mask { $_[0]->[1] }
+sub name { $_[0]->[2] }
+
+sub cancel {
+	my ($self) = @_;
+	my ($wd, $in3) = @$self[0, 3];
+	$in3 or return 1; # already canceled
+	pop @$self;
+	$in3->rm_watch($wd);
+}
+
+1;
diff --git a/lib/PublicInbox/InboxIdle.pm b/lib/PublicInbox/InboxIdle.pm
index 4231c0a0..3c4d4a68 100644
--- a/lib/PublicInbox/InboxIdle.pm
+++ b/lib/PublicInbox/InboxIdle.pm
@@ -11,7 +11,7 @@ use PublicInbox::Syscall qw(EPOLLIN);
 my $IN_MODIFY = 0x02; # match Linux inotify
 my $ino_cls;
 if ($^O eq 'linux' && eval { require PublicInbox::Inotify }) {
-	$IN_MODIFY = Linux::Inotify2::IN_MODIFY();
+	$IN_MODIFY = PublicInbox::Inotify::IN_MODIFY();
 	$ino_cls = 'PublicInbox::Inotify';
 } elsif (eval { require PublicInbox::KQNotify }) {
 	$IN_MODIFY = PublicInbox::KQNotify::NOTE_WRITE();
@@ -34,7 +34,7 @@ sub in2_arm ($$) { # PublicInbox::Config::each_inbox callback
 		$ibx->{unlock_subs} = $old_ibx->{unlock_subs};
 		%{$ibx->{unlock_subs}} = (%$u, %{$ibx->{unlock_subs}}) if $u;
 
-		# Linux::Inotify2::Watch::name matches if watches are the
+		# *::Inotify*::Watch::name matches if watches are the
 		# same, no point in replacing a watch of the same name
 		if ($cur->[1]->name eq $lock) {
 			$self->{on_unlock}->{$lock} = $ibx;
@@ -87,7 +87,7 @@ sub new {
 sub event_step {
 	my ($self) = @_;
 	eval {
-		my @events = $self->{inot}->read; # Linux::Inotify2::read
+		my @events = $self->{inot}->read; # PublicInbox::Inotify3::read
 		my $on_unlock = $self->{on_unlock};
 		for my $ev (@events) {
 			my $fn = $ev->fullname // next; # cancelled
diff --git a/lib/PublicInbox/Inotify.pm b/lib/PublicInbox/Inotify.pm
index 3ef271c8..c4f1ae84 100644
--- a/lib/PublicInbox/Inotify.pm
+++ b/lib/PublicInbox/Inotify.pm
@@ -5,12 +5,29 @@
 package PublicInbox::Inotify;
 use v5.12;
 our @ISA;
-BEGIN {
-	eval { require Linux::Inotify2 };
-	if ($@) { # TODO: get rid of XS dependency
-		die "W: Linux::Inotify2 missing: $@\n";
+BEGIN { # prefer pure Perl since it works out-of-the-box
+	my $isa;
+	for my $m (qw(PublicInbox::Inotify3 Linux::Inotify2)) {
+		eval "require $m";
+		next if $@;
+		$isa = $m;
+	}
+	if ($isa) {
+		push @ISA, $isa;
+		my $buf = '';
+		for (qw(IN_MOVED_TO IN_CREATE IN_DELETE IN_DELETE_SELF
+				IN_MOVE_SELF IN_MOVED_FROM IN_MODIFY)) {
+			$buf .= "*$_ = \\&PublicInbox::Inotify3::$_;\n";
+		}
+		eval $buf;
+		die $@ if $@;
 	} else {
-		push @ISA, 'Linux::Inotify2';
+		die <<EOM;
+W: inotify syscall numbers unknown on your platform and
+W: Linux::Inotify2 missing: $@
+W: public-inbox hackers welcome the plain-text output of ./devel/sysdefs-list
+W: at meta\@public-inbox.org
+EOM
 	}
 };
 
diff --git a/lib/PublicInbox/Inotify3.pm b/lib/PublicInbox/Inotify3.pm
new file mode 100644
index 00000000..4f337a7a
--- /dev/null
+++ b/lib/PublicInbox/Inotify3.pm
@@ -0,0 +1,115 @@
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# Implements most Linux::Inotify2 functionality we need in pure Perl
+# Anonymous subs aren't supported since it's expensive in the
+# best case and likely leaky in older Perls (e.g. 5.16.3)
+package PublicInbox::Inotify3;
+use v5.12;
+use autodie qw(open);
+use PublicInbox::Syscall ();
+use Carp;
+use Scalar::Util ();
+
+# this fails if undefined on unsupported platforms
+use constant $PublicInbox::Syscall::INOTIFY;
+our %events;
+
+# extracted from devel/sysdefs-list output, these should be arch-independent
+BEGIN {
+%events = (
+	IN_ACCESS => 0x1,
+	IN_ALL_EVENTS => 0xfff,
+	IN_ATTRIB => 0x4,
+	IN_CLOSE => 0x18,
+	IN_CLOSE_NOWRITE => 0x10,
+	IN_CLOSE_WRITE => 0x8,
+	IN_CREATE => 0x100,
+	IN_DELETE => 0x200,
+	IN_DELETE_SELF => 0x400,
+	IN_DONT_FOLLOW => 0x2000000,
+	IN_EXCL_UNLINK => 0x4000000,
+	IN_IGNORED => 0x8000,
+	IN_ISDIR => 0x40000000,
+	IN_MASK_ADD => 0x20000000,
+	IN_MODIFY => 0x2,
+	IN_MOVE => 0xc0,
+	IN_MOVED_FROM => 0x40,
+	IN_MOVED_TO => 0x80,
+	IN_MOVE_SELF => 0x800,
+	IN_ONESHOT => 0x80000000,
+	IN_ONLYDIR => 0x1000000,
+	IN_OPEN => 0x20,
+	IN_Q_OVERFLOW => 0x4000,
+	IN_UNMOUNT => 0x2000,
+);
+} # /BEGIN
+use constant \%events;
+require PublicInbox::In3Event; # uses %events
+require PublicInbox::In3Watch; # uses SYS_inotify_rm_watch
+
+use constant autocancel =>
+	(IN_IGNORED|IN_UNMOUNT|IN_ONESHOT|IN_DELETE_SELF);
+
+sub new {
+	open my $fh, '+<&=', syscall(SYS_inotify_init1, IN_CLOEXEC);
+	bless { fh => $fh }, __PACKAGE__;
+}
+
+sub read {
+	my ($self) = @_;
+	my (@ret, $wd, $mask, $len, $name, $size, $buf);
+	my $r = sysread($self->{fh}, my $rbuf, 8192);
+	if ($r) {
+		while ($r) {
+			($wd, $mask, undef, $len) = unpack('lLLL', $rbuf);
+			$size = 16 + $len; # 16: sizeof(struct inotify_event)
+			substr($rbuf, 0, 16, '');
+			$name = $len ? unpack('Z*', substr($rbuf, 0, $len, ''))
+					: undef;
+			$r -= $size;
+			next if $self->{ignore}->{$wd};
+			my $ev = bless [$mask, $name], 'PublicInbox::In3Event';
+			push @ret, $ev;
+			if (my $w = $self->{w}->{$wd}) {
+				$ev->[2] = $w;
+				$w->cancel if $ev->mask & autocancel;
+			} elsif ($mask & IN_Q_OVERFLOW) {
+				carp 'E: IN_Q_OVERFLOW, too busy? (non-fatal)'
+			} else {
+				carp "BUG? wd:$wd unknown (non-fatal)";
+			}
+		}
+	} elsif (defined($r) || ($!{EAGAIN} || $!{EINTR})) {
+	} else {
+		croak "inotify read: $!";
+	}
+	delete $self->{ignore};
+	@ret;
+}
+
+sub fileno { CORE::fileno($_[0]->{fh}) }
+
+sub fh { $_[0]->{fh} }
+
+sub blocking { shift->{fh}->blocking(@_) }
+
+sub watch {
+	my ($self, $name, $mask, $cb) = @_;
+	croak "E: $cb not supported" if $cb; # too much memory
+	my $wd = syscall(SYS_inotify_add_watch, $self->fileno, $name, $mask);
+	return if $wd < 0;
+	my $w = bless [ $wd, $mask, $name, $self ], 'PublicInbox::In3Watch';
+	$self->{w}->{$wd} = $w;
+	Scalar::Util::weaken($w->[3]); # ugh
+	$w;
+}
+
+sub rm_watch {
+	my ($self, $wd) = @_;
+	delete $self->{w}->{$wd};
+	$self->{ignore}->{$wd} = 1; # is this needed?
+	syscall(SYS_inotify_rm_watch, $self->fileno, $wd) < 0 ? undef : 1;
+}
+
+1;
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 78181bb6..96af2b22 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -22,6 +22,7 @@ use POSIX qw(ENOENT ENOSYS EINVAL O_NONBLOCK);
 use Socket qw(SOL_SOCKET SCM_RIGHTS);
 use Config;
 our %SIGNUM = (WINCH => 28); # most Linux, {Free,Net,Open}BSD, *Darwin
+our $INOTIFY;
 
 # $VERSION = '0.25'; # Sys::Syscall version
 our @EXPORT_OK = qw(epoll_ctl epoll_create epoll_wait
@@ -98,6 +99,11 @@ if ($^O eq "linux") {
 	$SYS_fstatfs = 100;
 	$SYS_sendmsg = 370;
 	$SYS_recvmsg = 372;
+	$INOTIFY = { # usage: `use constant $PublicInbox::Syscall::INOTIFY'
+		SYS_inotify_init1 => 332,
+		SYS_inotify_add_watch => 292,
+		SYS_inotify_rm_watch => 293,
+	};
 	$FS_IOC_GETFLAGS = 0x80046601;
 	$FS_IOC_SETFLAGS = 0x40046602;
     } elsif ($machine eq "x86_64") {
@@ -109,6 +115,11 @@ if ($^O eq "linux") {
 	$SYS_fstatfs = 138;
 	$SYS_sendmsg = 46;
 	$SYS_recvmsg = 47;
+	$INOTIFY = {
+		SYS_inotify_init1 => 294,
+		SYS_inotify_add_watch => 254,
+		SYS_inotify_rm_watch => 255,
+	};
 	$FS_IOC_GETFLAGS = 0x80086601;
 	$FS_IOC_SETFLAGS = 0x40086602;
     } elsif ($machine eq 'x32') {
@@ -122,6 +133,11 @@ if ($^O eq "linux") {
 	$SYS_recvmsg = 0x40000207;
 	$FS_IOC_GETFLAGS = 0x80046601;
 	$FS_IOC_SETFLAGS = 0x40046602;
+	$INOTIFY = {
+		SYS_inotify_init1 => 1073742118,
+		SYS_inotify_add_watch => 1073742078,
+		SYS_inotify_rm_watch => 1073742079,
+	};
     } elsif ($machine eq 'sparc64') {
 	$SYS_epoll_create = 193;
 	$SYS_epoll_ctl = 194;
@@ -154,6 +170,11 @@ if ($^O eq "linux") {
 	$SYS_recvmsg = 342;
 	$FS_IOC_GETFLAGS = 0x40086601;
 	$FS_IOC_SETFLAGS = 0x80086602;
+	$INOTIFY = {
+		SYS_inotify_init1 => 318,
+		SYS_inotify_add_watch => 276,
+		SYS_inotify_rm_watch => 277,
+	};
     } elsif ($machine eq "ppc") {
         $SYS_epoll_create = 236;
         $SYS_epoll_ctl    = 237;
@@ -188,7 +209,7 @@ if ($^O eq "linux") {
         $u64_mod_8        = 1;
         $SYS_signalfd4 = 484;
 	$SFD_CLOEXEC = 010000000;
-    } elsif ($machine =~ /\A(?:loong)?aarch64\z/ || $machine eq 'riscv64') {
+    } elsif ($machine =~ /\A(?:loong|a)arch64\z/ || $machine eq 'riscv64') {
         $SYS_epoll_create = 20;  # (sys_epoll_create1)
         $SYS_epoll_ctl    = 21;
         $SYS_epoll_wait   = 22;  # (sys_epoll_pwait)
@@ -199,6 +220,11 @@ if ($^O eq "linux") {
 	$SYS_fstatfs = 44;
 	$SYS_sendmsg = 211;
 	$SYS_recvmsg = 212;
+	$INOTIFY = {
+		SYS_inotify_init1 => 26,
+		SYS_inotify_add_watch => 27,
+		SYS_inotify_rm_watch => 28,
+	};
 	$FS_IOC_GETFLAGS = 0x80086601;
 	$FS_IOC_SETFLAGS = 0x40086602;
     } elsif ($machine =~ m/arm(v\d+)?.*l/) { # ARM OABI (untested on cfarm)
@@ -236,6 +262,11 @@ if ($^O eq "linux") {
 	$FS_IOC_GETFLAGS = 0x40046601;
 	$FS_IOC_SETFLAGS = 0x80046602;
 	$SIGNUM{WINCH} = 20;
+	$INOTIFY = {
+		SYS_inotify_init1 => 4329,
+		SYS_inotify_add_watch => 4285,
+		SYS_inotify_rm_watch => 4286,
+	};
     } else {
         warn <<EOM;
 machine=$machine ptrsize=$Config{ptrsize} has no syscall definitions
@@ -251,6 +282,10 @@ EOM
         *epoll_ctl = \&epoll_ctl_mod4;
     }
 }
+
+# SFD_CLOEXEC is arch-dependent, so IN_CLOEXEC may be, too
+$INOTIFY->{IN_CLOEXEC} //= 0x80000 if $INOTIFY;
+
 # use Inline::C for *BSD-only or general POSIX stuff.
 # Linux guarantees stable syscall numbering, BSDs only offer a stable libc
 # use devel/sysdefs-list on Linux to detect new syscall numbers and
diff --git a/lib/PublicInbox/TailNotify.pm b/lib/PublicInbox/TailNotify.pm
index bdb92d54..84340a35 100644
--- a/lib/PublicInbox/TailNotify.pm
+++ b/lib/PublicInbox/TailNotify.pm
@@ -9,9 +9,9 @@ use PublicInbox::DS qw(now);
 
 my ($TAIL_MOD, $ino_cls);
 if ($^O eq 'linux' && eval { require PublicInbox::Inotify; 1 }) {
-	$TAIL_MOD = Linux::Inotify2::IN_MOVED_TO() |
-		Linux::Inotify2::IN_CREATE() |
-		Linux::Inotify2::IN_MODIFY();
+	$TAIL_MOD = PublicInbox::Inotify::IN_MOVED_TO() |
+		PublicInbox::Inotify::IN_CREATE() |
+		PublicInbox::Inotify::IN_MODIFY();
 	$ino_cls = 'PublicInbox::Inotify';
 } elsif (eval { require PublicInbox::KQNotify }) {
 	$TAIL_MOD = PublicInbox::KQNotify::MOVED_TO_OR_CREATE() |
diff --git a/t/imapd.t b/t/imapd.t
index 9606291e..549b8766 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -250,7 +250,7 @@ SKIP: {
 
 ok($mic->logout, 'logout works');
 
-my $have_inotify = eval { require Linux::Inotify2; 1 };
+my $have_inotify = eval { require PublicInbox::Inotify; 1 };
 
 for my $ibx (@ibx) {
 	my $name = $ibx->{name};
diff --git a/t/inotify3.t b/t/inotify3.t
new file mode 100644
index 00000000..c25c0f42
--- /dev/null
+++ b/t/inotify3.t
@@ -0,0 +1,17 @@
+#!perl -w
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use v5.12; use PublicInbox::TestCommon;
+plan skip_all => 'inotify is Linux-only' if $^O ne 'linux';
+use_ok 'PublicInbox::Inotify3';
+my $in = PublicInbox::Inotify3->new;
+my $tmpdir = tmpdir;
+my $w = $in->watch("$tmpdir", PublicInbox::Inotify3::IN_ALL_EVENTS());
+$in->blocking(0);
+is_xdeeply [ $in->read ], [], 'non-blocking has no events, yet';
+undef $tmpdir;
+my @list = $in->read;
+ok scalar(@list), 'got events';
+ok $w->cancel, 'watch canceled';
+
+done_testing;
diff --git a/t/lei-auto-watch.t b/t/lei-auto-watch.t
index f871188d..1e190316 100644
--- a/t/lei-auto-watch.t
+++ b/t/lei-auto-watch.t
@@ -4,10 +4,10 @@
 use strict; use v5.10.1; use PublicInbox::TestCommon;
 use File::Basename qw(basename);
 plan skip_all => "TEST_FLAKY not enabled for $0" if !$ENV{TEST_FLAKY};
-my $have_fast_inotify = eval { require Linux::Inotify2 } ||
+my $have_fast_inotify = eval { require PublicInbox::Inotify } ||
 	eval { require IO::KQueue };
 $have_fast_inotify or
-	diag("$0 IO::KQueue or Linux::Inotify2 missing, test will be slow");
+	diag("$0 IO::KQueue or inotify missing, test will be slow");
 
 test_lei(sub {
 	my ($ro_home, $cfg_path) = setup_public_inboxes;
diff --git a/t/lei-watch.t b/t/lei-watch.t
index 24d9f5c8..7b357ee0 100644
--- a/t/lei-watch.t
+++ b/t/lei-watch.t
@@ -5,11 +5,11 @@ use strict; use v5.10.1; use PublicInbox::TestCommon;
 use File::Path qw(make_path remove_tree);
 plan skip_all => "TEST_FLAKY not enabled for $0" if !$ENV{TEST_FLAKY};
 require_mods('lei');
-my $have_fast_inotify = eval { require Linux::Inotify2 } ||
+my $have_fast_inotify = eval { require PublicInbox::Inotify } ||
 	eval { require IO::KQueue };
 
 $have_fast_inotify or
-	diag("$0 IO::KQueue or Linux::Inotify2 missing, test will be slow");
+	diag("$0 IO::KQueue or inotify missing, test will be slow");
 
 my ($ro_home, $cfg_path) = setup_public_inboxes;
 test_lei(sub {
diff --git a/t/nntpd.t b/t/nntpd.t
index 0f3ef596..7052cb6a 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -14,7 +14,7 @@ use PublicInbox::DS;
 my $version = $ENV{PI_TEST_VERSION} || 1;
 require_git('2.6') if $version == 2;
 use_ok 'PublicInbox::Msgmap';
-my $fast_idle = eval { require Linux::Inotify2; 1 } //
+my $fast_idle = eval { require PublicInbox::Inotify; 1 } //
 		eval { require IO::KQueue; 1 };
 
 my ($tmpdir, $for_destroy) = tmpdir();
diff --git a/t/watch_maildir.t b/t/watch_maildir.t
index 07ebeef6..d7f01b1a 100644
--- a/t/watch_maildir.t
+++ b/t/watch_maildir.t
@@ -182,7 +182,7 @@ EOM
 
 	# wait for -watch to setup inotify watches
 	my $sleep = 1;
-	if (eval { require Linux::Inotify2 } && -d "/proc/$wm->{pid}/fd") {
+	if (eval { require PublicInbox::Inotify } && -d "/proc/$wm->{pid}/fd") {
 		my $end = time + 2;
 		my (@ino, @ino_info);
 		do {

^ permalink raw reply related	[relevance 47%]

* [PATCH 1/2] syscall: update formatting to match our codebase
  @ 2024-01-29 21:23 50% ` Eric Wong
  2024-01-29 21:23 54% ` [PATCH 2/2] syscall: use pure Perl sendmsg/recvmsg on *BSD Eric Wong
  1 sibling, 0 replies; 51+ results
From: Eric Wong @ 2024-01-29 21:23 UTC (permalink / raw)
  To: meta

Sys::Syscall needs separate patches anyways (if it ever gets
updated), and having a mix of indentation styles in our codebase
gets confusing.  We'll also update cfarm-related comments for
the current URL.
---
 lib/PublicInbox/Syscall.pm | 427 ++++++++++++++++++-------------------
 1 file changed, 213 insertions(+), 214 deletions(-)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 96af2b22..9071e6b1 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -4,7 +4,7 @@
 #
 # See devel/sysdefs-list in the public-inbox source tree for maintenance
 # <https://80x24.org/public-inbox.git>, and machines from the GCC Farm:
-# <https://cfarm.tetaneutral.net/>
+# <https://portal.cfarm.net/>
 #
 # This license differs from the rest of public-inbox
 #
@@ -26,10 +26,10 @@ our $INOTIFY;
 
 # $VERSION = '0.25'; # Sys::Syscall version
 our @EXPORT_OK = qw(epoll_ctl epoll_create epoll_wait
-                  EPOLLIN EPOLLOUT EPOLLET
-                  EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
-                  EPOLLONESHOT EPOLLEXCLUSIVE
-                  signalfd rename_noreplace %SIGNUM $F_SETPIPE_SZ);
+		EPOLLIN EPOLLOUT EPOLLET
+		EPOLL_CTL_ADD EPOLL_CTL_DEL EPOLL_CTL_MOD
+		EPOLLONESHOT EPOLLEXCLUSIVE
+		signalfd rename_noreplace %SIGNUM $F_SETPIPE_SZ);
 use constant {
 	EPOLLIN => 1,
 	EPOLLOUT => 4,
@@ -71,216 +71,216 @@ our $no_deprecated = 0;
 
 if ($^O eq "linux") {
 	$F_SETPIPE_SZ = 1031;
-    my (undef, undef, $release, undef, $machine) = POSIX::uname();
-    my ($maj, $min) = ($release =~ /\A([0-9]+)\.([0-9]+)/);
-    $SYS_renameat2 = 0 if "$maj.$min" < 3.15;
-    # whether the machine requires 64-bit numbers to be on 8-byte
-    # boundaries.
-    my $u64_mod_8 = 0;
+	my (undef, undef, $release, undef, $machine) = POSIX::uname();
+	my ($maj, $min) = ($release =~ /\A([0-9]+)\.([0-9]+)/);
+	$SYS_renameat2 = 0 if "$maj.$min" < 3.15;
+	# whether the machine requires 64-bit numbers to be on 8-byte
+	# boundaries.
+	my $u64_mod_8 = 0;
 
-    if ($Config{ptrsize} == 4) {
-	# if we're running on an x86_64 kernel, but a 32-bit process,
-	# we need to use the x32 or i386 syscall numbers.
-	if ($machine eq 'x86_64') {
-	    my $s = $Config{cppsymbols};
-	    $machine = ($s =~ /\b__ILP32__=1\b/ && $s =~ /\b__x86_64__=1\b/) ?
+	if ($Config{ptrsize} == 4) {
+		# if we're running on an x86_64 kernel, but a 32-bit process,
+		# we need to use the x32 or i386 syscall numbers.
+		if ($machine eq 'x86_64') {
+			my $s = $Config{cppsymbols};
+			$machine = ($s =~ /\b__ILP32__=1\b/ &&
+					$s =~ /\b__x86_64__=1\b/) ?
 				'x32' : 'i386'
-	} elsif ($machine eq 'mips64') { # similarly for mips64 vs mips
-	    $machine = 'mips';
+		} elsif ($machine eq 'mips64') { # similarly for mips64 vs mips
+			$machine = 'mips';
+		}
 	}
-    }
-
-    if ($machine =~ m/^i[3456]86$/) {
-        $SYS_epoll_create = 254;
-        $SYS_epoll_ctl    = 255;
-        $SYS_epoll_wait   = 256;
-        $SYS_signalfd4 = 327;
-        $SYS_renameat2 //= 353;
-	$SYS_fstatfs = 100;
-	$SYS_sendmsg = 370;
-	$SYS_recvmsg = 372;
-	$INOTIFY = { # usage: `use constant $PublicInbox::Syscall::INOTIFY'
-		SYS_inotify_init1 => 332,
-		SYS_inotify_add_watch => 292,
-		SYS_inotify_rm_watch => 293,
-	};
-	$FS_IOC_GETFLAGS = 0x80046601;
-	$FS_IOC_SETFLAGS = 0x40046602;
-    } elsif ($machine eq "x86_64") {
-        $SYS_epoll_create = 213;
-        $SYS_epoll_ctl    = 233;
-        $SYS_epoll_wait   = 232;
-        $SYS_signalfd4 = 289;
-	$SYS_renameat2 //= 316;
-	$SYS_fstatfs = 138;
-	$SYS_sendmsg = 46;
-	$SYS_recvmsg = 47;
-	$INOTIFY = {
-		SYS_inotify_init1 => 294,
-		SYS_inotify_add_watch => 254,
-		SYS_inotify_rm_watch => 255,
-	};
-	$FS_IOC_GETFLAGS = 0x80086601;
-	$FS_IOC_SETFLAGS = 0x40086602;
-    } elsif ($machine eq 'x32') {
-        $SYS_epoll_create = 1073742037;
-        $SYS_epoll_ctl = 1073742057;
-        $SYS_epoll_wait = 1073742056;
-        $SYS_signalfd4 = 1073742113;
-	$SYS_renameat2 //= 0x40000000 + 316;
-	$SYS_fstatfs = 138;
-	$SYS_sendmsg = 0x40000206;
-	$SYS_recvmsg = 0x40000207;
-	$FS_IOC_GETFLAGS = 0x80046601;
-	$FS_IOC_SETFLAGS = 0x40046602;
-	$INOTIFY = {
-		SYS_inotify_init1 => 1073742118,
-		SYS_inotify_add_watch => 1073742078,
-		SYS_inotify_rm_watch => 1073742079,
-	};
-    } elsif ($machine eq 'sparc64') {
-	$SYS_epoll_create = 193;
-	$SYS_epoll_ctl = 194;
-	$SYS_epoll_wait = 195;
-	$u64_mod_8 = 1;
-	$SYS_signalfd4 = 317;
-	$SYS_renameat2 //= 345;
-	$SFD_CLOEXEC = 020000000;
-	$SYS_fstatfs = 158;
-	$SYS_sendmsg = 114;
-	$SYS_recvmsg = 113;
-	$FS_IOC_GETFLAGS = 0x40086601;
-	$FS_IOC_SETFLAGS = 0x80086602;
-    } elsif ($machine =~ m/^parisc/) {
-        $SYS_epoll_create = 224;
-        $SYS_epoll_ctl    = 225;
-        $SYS_epoll_wait   = 226;
-        $u64_mod_8        = 1;
-        $SYS_signalfd4 = 309;
-        $SIGNUM{WINCH} = 23;
-    } elsif ($machine =~ m/^ppc64/) {
-        $SYS_epoll_create = 236;
-        $SYS_epoll_ctl    = 237;
-        $SYS_epoll_wait   = 238;
-        $u64_mod_8        = 1;
-        $SYS_signalfd4 = 313;
-	$SYS_renameat2 //= 357;
-	$SYS_fstatfs = 100;
-	$SYS_sendmsg = 341;
-	$SYS_recvmsg = 342;
-	$FS_IOC_GETFLAGS = 0x40086601;
-	$FS_IOC_SETFLAGS = 0x80086602;
-	$INOTIFY = {
-		SYS_inotify_init1 => 318,
-		SYS_inotify_add_watch => 276,
-		SYS_inotify_rm_watch => 277,
-	};
-    } elsif ($machine eq "ppc") {
-        $SYS_epoll_create = 236;
-        $SYS_epoll_ctl    = 237;
-        $SYS_epoll_wait   = 238;
-        $u64_mod_8        = 1;
-        $SYS_signalfd4 = 313;
-	$SYS_renameat2 //= 357;
-	$SYS_fstatfs = 100;
-	$FS_IOC_GETFLAGS = 0x40086601;
-	$FS_IOC_SETFLAGS = 0x80086602;
-    } elsif ($machine =~ m/^s390/) { # untested, no machine on cfarm
-        $SYS_epoll_create = 249;
-        $SYS_epoll_ctl    = 250;
-        $SYS_epoll_wait   = 251;
-        $u64_mod_8        = 1;
-        $SYS_signalfd4 = 322;
-	$SYS_renameat2 //= 347;
-	$SYS_fstatfs = 100;
-	$SYS_sendmsg = 370;
-	$SYS_recvmsg = 372;
-    } elsif ($machine eq 'ia64') { # untested, no machine on cfarm
-        $SYS_epoll_create = 1243;
-        $SYS_epoll_ctl    = 1244;
-        $SYS_epoll_wait   = 1245;
-        $u64_mod_8        = 1;
-        $SYS_signalfd4 = 289;
-    } elsif ($machine eq "alpha") { # untested, no machine on cfarm
-        # natural alignment, ints are 32-bits
-        $SYS_epoll_create = 407;
-        $SYS_epoll_ctl    = 408;
-        $SYS_epoll_wait   = 409;
-        $u64_mod_8        = 1;
-        $SYS_signalfd4 = 484;
-	$SFD_CLOEXEC = 010000000;
-    } elsif ($machine =~ /\A(?:loong|a)arch64\z/ || $machine eq 'riscv64') {
-        $SYS_epoll_create = 20;  # (sys_epoll_create1)
-        $SYS_epoll_ctl    = 21;
-        $SYS_epoll_wait   = 22;  # (sys_epoll_pwait)
-        $u64_mod_8        = 1;
-        $no_deprecated    = 1;
-        $SYS_signalfd4 = 74;
-	$SYS_renameat2 //= 276;
-	$SYS_fstatfs = 44;
-	$SYS_sendmsg = 211;
-	$SYS_recvmsg = 212;
-	$INOTIFY = {
-		SYS_inotify_init1 => 26,
-		SYS_inotify_add_watch => 27,
-		SYS_inotify_rm_watch => 28,
-	};
-	$FS_IOC_GETFLAGS = 0x80086601;
-	$FS_IOC_SETFLAGS = 0x40086602;
-    } elsif ($machine =~ m/arm(v\d+)?.*l/) { # ARM OABI (untested on cfarm)
-        $SYS_epoll_create = 250;
-        $SYS_epoll_ctl    = 251;
-        $SYS_epoll_wait   = 252;
-        $u64_mod_8        = 1;
-        $SYS_signalfd4 = 355;
-	$SYS_renameat2 //= 382;
-	$SYS_fstatfs = 100;
-	$SYS_sendmsg = 296;
-	$SYS_recvmsg = 297;
-    } elsif ($machine =~ m/^mips64/) { # cfarm only has 32-bit userspace
-        $SYS_epoll_create = 5207;
-        $SYS_epoll_ctl    = 5208;
-        $SYS_epoll_wait   = 5209;
-        $u64_mod_8        = 1;
-        $SYS_signalfd4 = 5283;
-	$SYS_renameat2 //= 5311;
-	$SYS_fstatfs = 5135;
-	$SYS_sendmsg = 5045;
-	$SYS_recvmsg = 5046;
-	$FS_IOC_GETFLAGS = 0x40046601;
-	$FS_IOC_SETFLAGS = 0x80046602;
-    } elsif ($machine =~ m/^mips/) { # 32-bit, tested on mips64 cfarm machine
-        $SYS_epoll_create = 4248;
-        $SYS_epoll_ctl    = 4249;
-        $SYS_epoll_wait   = 4250;
-        $u64_mod_8        = 1;
-        $SYS_signalfd4 = 4324;
-	$SYS_renameat2 //= 4351;
-	$SYS_fstatfs = 4100;
-	$SYS_sendmsg = 4179;
-	$SYS_recvmsg = 4177;
-	$FS_IOC_GETFLAGS = 0x40046601;
-	$FS_IOC_SETFLAGS = 0x80046602;
-	$SIGNUM{WINCH} = 20;
-	$INOTIFY = {
-		SYS_inotify_init1 => 4329,
-		SYS_inotify_add_watch => 4285,
-		SYS_inotify_rm_watch => 4286,
-	};
-    } else {
-        warn <<EOM;
+	if ($machine =~ m/^i[3456]86$/) {
+		$SYS_epoll_create = 254;
+		$SYS_epoll_ctl = 255;
+		$SYS_epoll_wait = 256;
+		$SYS_signalfd4 = 327;
+		$SYS_renameat2 //= 353;
+		$SYS_fstatfs = 100;
+		$SYS_sendmsg = 370;
+		$SYS_recvmsg = 372;
+		$INOTIFY = { # usage: `use constant $PublicInbox::Syscall::INOTIFY'
+			SYS_inotify_init1 => 332,
+			SYS_inotify_add_watch => 292,
+			SYS_inotify_rm_watch => 293,
+		};
+		$FS_IOC_GETFLAGS = 0x80046601;
+		$FS_IOC_SETFLAGS = 0x40046602;
+	} elsif ($machine eq "x86_64") {
+		$SYS_epoll_create = 213;
+		$SYS_epoll_ctl = 233;
+		$SYS_epoll_wait = 232;
+		$SYS_signalfd4 = 289;
+		$SYS_renameat2 //= 316;
+		$SYS_fstatfs = 138;
+		$SYS_sendmsg = 46;
+		$SYS_recvmsg = 47;
+		$INOTIFY = {
+			SYS_inotify_init1 => 294,
+			SYS_inotify_add_watch => 254,
+			SYS_inotify_rm_watch => 255,
+		};
+		$FS_IOC_GETFLAGS = 0x80086601;
+		$FS_IOC_SETFLAGS = 0x40086602;
+	} elsif ($machine eq 'x32') {
+		$SYS_epoll_create = 1073742037;
+		$SYS_epoll_ctl = 1073742057;
+		$SYS_epoll_wait = 1073742056;
+		$SYS_signalfd4 = 1073742113;
+		$SYS_renameat2 //= 0x40000000 + 316;
+		$SYS_fstatfs = 138;
+		$SYS_sendmsg = 0x40000206;
+		$SYS_recvmsg = 0x40000207;
+		$FS_IOC_GETFLAGS = 0x80046601;
+		$FS_IOC_SETFLAGS = 0x40046602;
+		$INOTIFY = {
+			SYS_inotify_init1 => 1073742118,
+			SYS_inotify_add_watch => 1073742078,
+			SYS_inotify_rm_watch => 1073742079,
+		};
+	} elsif ($machine eq 'sparc64') {
+		$SYS_epoll_create = 193;
+		$SYS_epoll_ctl = 194;
+		$SYS_epoll_wait = 195;
+		$u64_mod_8 = 1;
+		$SYS_signalfd4 = 317;
+		$SYS_renameat2 //= 345;
+		$SFD_CLOEXEC = 020000000;
+		$SYS_fstatfs = 158;
+		$SYS_sendmsg = 114;
+		$SYS_recvmsg = 113;
+		$FS_IOC_GETFLAGS = 0x40086601;
+		$FS_IOC_SETFLAGS = 0x80086602;
+	} elsif ($machine =~ m/^parisc/) { # untested, no machine on cfarm
+		$SYS_epoll_create = 224;
+		$SYS_epoll_ctl = 225;
+		$SYS_epoll_wait = 226;
+		$u64_mod_8 = 1;
+		$SYS_signalfd4 = 309;
+		$SIGNUM{WINCH} = 23;
+	} elsif ($machine =~ m/^ppc64/) {
+		$SYS_epoll_create = 236;
+		$SYS_epoll_ctl = 237;
+		$SYS_epoll_wait = 238;
+		$u64_mod_8 = 1;
+		$SYS_signalfd4 = 313;
+		$SYS_renameat2 //= 357;
+		$SYS_fstatfs = 100;
+		$SYS_sendmsg = 341;
+		$SYS_recvmsg = 342;
+		$FS_IOC_GETFLAGS = 0x40086601;
+		$FS_IOC_SETFLAGS = 0x80086602;
+		$INOTIFY = {
+			SYS_inotify_init1 => 318,
+			SYS_inotify_add_watch => 276,
+			SYS_inotify_rm_watch => 277,
+		};
+	} elsif ($machine eq "ppc") { # untested, no machine on cfarm
+		$SYS_epoll_create = 236;
+		$SYS_epoll_ctl = 237;
+		$SYS_epoll_wait = 238;
+		$u64_mod_8 = 1;
+		$SYS_signalfd4 = 313;
+		$SYS_renameat2 //= 357;
+		$SYS_fstatfs = 100;
+		$FS_IOC_GETFLAGS = 0x40086601;
+		$FS_IOC_SETFLAGS = 0x80086602;
+	} elsif ($machine =~ m/^s390/) { # untested, no machine on cfarm
+		$SYS_epoll_create = 249;
+		$SYS_epoll_ctl = 250;
+		$SYS_epoll_wait = 251;
+		$u64_mod_8 = 1;
+		$SYS_signalfd4 = 322;
+		$SYS_renameat2 //= 347;
+		$SYS_fstatfs = 100;
+		$SYS_sendmsg = 370;
+		$SYS_recvmsg = 372;
+	} elsif ($machine eq 'ia64') { # untested, no machine on cfarm
+		$SYS_epoll_create = 1243;
+		$SYS_epoll_ctl = 1244;
+		$SYS_epoll_wait = 1245;
+		$u64_mod_8 = 1;
+		$SYS_signalfd4 = 289;
+	} elsif ($machine eq "alpha") { # untested, no machine on cfarm
+		# natural alignment, ints are 32-bits
+		$SYS_epoll_create = 407;
+		$SYS_epoll_ctl = 408;
+		$SYS_epoll_wait = 409;
+		$u64_mod_8 = 1;
+		$SYS_signalfd4 = 484;
+		$SFD_CLOEXEC = 010000000;
+	} elsif ($machine =~ /\A(?:loong|a)arch64\z/ || $machine eq 'riscv64') {
+		$SYS_epoll_create = 20; # (sys_epoll_create1)
+		$SYS_epoll_ctl = 21;
+		$SYS_epoll_wait = 22; # (sys_epoll_pwait)
+		$u64_mod_8 = 1;
+		$no_deprecated = 1;
+		$SYS_signalfd4 = 74;
+		$SYS_renameat2 //= 276;
+		$SYS_fstatfs = 44;
+		$SYS_sendmsg = 211;
+		$SYS_recvmsg = 212;
+		$INOTIFY = {
+			SYS_inotify_init1 => 26,
+			SYS_inotify_add_watch => 27,
+			SYS_inotify_rm_watch => 28,
+		};
+		$FS_IOC_GETFLAGS = 0x80086601;
+		$FS_IOC_SETFLAGS = 0x40086602;
+	} elsif ($machine =~ m/arm(v\d+)?.*l/) { # ARM OABI (untested on cfarm)
+		$SYS_epoll_create = 250;
+		$SYS_epoll_ctl = 251;
+		$SYS_epoll_wait = 252;
+		$u64_mod_8 = 1;
+		$SYS_signalfd4 = 355;
+		$SYS_renameat2 //= 382;
+		$SYS_fstatfs = 100;
+		$SYS_sendmsg = 296;
+		$SYS_recvmsg = 297;
+	} elsif ($machine =~ m/^mips64/) { # cfarm only has 32-bit userspace
+		$SYS_epoll_create = 5207;
+		$SYS_epoll_ctl = 5208;
+		$SYS_epoll_wait = 5209;
+		$u64_mod_8 = 1;
+		$SYS_signalfd4 = 5283;
+		$SYS_renameat2 //= 5311;
+		$SYS_fstatfs = 5135;
+		$SYS_sendmsg = 5045;
+		$SYS_recvmsg = 5046;
+		$FS_IOC_GETFLAGS = 0x40046601;
+		$FS_IOC_SETFLAGS = 0x80046602;
+	} elsif ($machine =~ m/^mips/) { # 32-bit, tested on mips64 cfarm host
+		$SYS_epoll_create = 4248;
+		$SYS_epoll_ctl = 4249;
+		$SYS_epoll_wait = 4250;
+		$u64_mod_8 = 1;
+		$SYS_signalfd4 = 4324;
+		$SYS_renameat2 //= 4351;
+		$SYS_fstatfs = 4100;
+		$SYS_sendmsg = 4179;
+		$SYS_recvmsg = 4177;
+		$FS_IOC_GETFLAGS = 0x40046601;
+		$FS_IOC_SETFLAGS = 0x80046602;
+		$SIGNUM{WINCH} = 20;
+		$INOTIFY = {
+			SYS_inotify_init1 => 4329,
+			SYS_inotify_add_watch => 4285,
+			SYS_inotify_rm_watch => 4286,
+		};
+	} else {
+		warn <<EOM;
 machine=$machine ptrsize=$Config{ptrsize} has no syscall definitions
 git clone https://80x24.org/public-inbox.git and
 Send the output of ./devel/sysdefs-list to meta\@public-inbox.org
 EOM
-    }
-    if ($u64_mod_8) {
-        *epoll_wait = \&epoll_wait_mod8;
-        *epoll_ctl = \&epoll_ctl_mod8;
-    } else {
-        *epoll_wait = \&epoll_wait_mod4;
-        *epoll_ctl = \&epoll_ctl_mod4;
-    }
+	}
+	if ($u64_mod_8) {
+		*epoll_wait = \&epoll_wait_mod8;
+		*epoll_ctl = \&epoll_ctl_mod8;
+	} else {
+		*epoll_wait = \&epoll_wait_mod4;
+		*epoll_ctl = \&epoll_ctl_mod4;
+	}
 }
 
 # SFD_CLOEXEC is arch-dependent, so IN_CLOEXEC may be, too
@@ -291,10 +291,6 @@ $INOTIFY->{IN_CLOEXEC} //= 0x80000 if $INOTIFY;
 # use devel/sysdefs-list on Linux to detect new syscall numbers and
 # other system constants
 
-############################################################################
-# epoll functions
-############################################################################
-
 sub epoll_create {
 	syscall($SYS_epoll_create, $no_deprecated ? 0 : 100);
 }
@@ -302,10 +298,13 @@ sub epoll_create {
 # epoll_ctl wrapper
 # ARGS: (epfd, op, fd, events_mask)
 sub epoll_ctl_mod4 {
-    syscall($SYS_epoll_ctl, $_[0]+0, $_[1]+0, $_[2]+0, pack("LLL", $_[3], $_[2], 0));
+	syscall($SYS_epoll_ctl, $_[0]+0, $_[1]+0, $_[2]+0,
+		pack("LLL", $_[3], $_[2], 0));
 }
+
 sub epoll_ctl_mod8 {
-    syscall($SYS_epoll_ctl, $_[0]+0, $_[1]+0, $_[2]+0, pack("LLLL", $_[3], 0, $_[2], 0));
+	syscall($SYS_epoll_ctl, $_[0]+0, $_[1]+0, $_[2]+0,
+		pack("LLLL", $_[3], 0, $_[2], 0));
 }
 
 # epoll_wait wrapper

^ permalink raw reply related	[relevance 50%]
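The epoll_ctl_mod4 and epoll_ctl_mod8 wrappers in the patch above differ only in whether four bytes of padding sit between the event mask and the 64-bit data member. A minimal C sketch of how that layout could be checked on a given Linux machine follows. The function names are hypothetical and do not appear in public-inbox; the sketch assumes a Linux libc that provides <sys/epoll.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Byte offset of the 64-bit `data' member inside struct epoll_event. */
static size_t epoll_data_offset(void)
{
	return offsetof(struct epoll_event, data);
}

/* Total size of one struct epoll_event as the kernel expects it. */
static size_t epoll_event_size(void)
{
	return sizeof(struct epoll_event);
}

/*
 * Hypothetical equivalent of the $u64_mod_8 flag:
 * 1 when `data' starts on an 8-byte boundary (the pack("LLLL") form),
 * 0 when it directly follows the 32-bit mask (the pack("LLL") form).
 */
static int epoll_u64_mod_8(void)
{
	return epoll_data_offset() == 8;
}
```

On x86 and x86_64 this is expected to report the unpadded form, which matches the branches above that leave $u64_mod_8 at zero.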

* [PATCH 2/2] syscall: use pure Perl sendmsg/recvmsg on *BSD
    2024-01-29 21:23 50% ` [PATCH 1/2] syscall: update formatting to match our codebase Eric Wong
@ 2024-01-29 21:23 54% ` Eric Wong
    1 sibling, 1 reply; 51+ results
From: Eric Wong @ 2024-01-29 21:23 UTC (permalink / raw)
  To: meta

While syscall symbols (e.g. SYS_*) have changed on us in FreeBSD
during the history of Sys::Syscall and this project, and did bite
us in some cases, the actual numbers don't get recycled for new
syscalls.  We're also fortunate that the sendmsg and recvmsg syscalls
and the associated msghdr and cmsg structs predate the BSD forks and
are compatible across all the BSDs I've tried.

OpenBSD routes Perl `syscall' through libc [1], while NetBSD [2] and
FreeBSD [3] document procedures for maintaining backwards compatibility.
It looks like Dragonfly follows FreeBSD here.

Tested on i386 OpenBSD, and amd64 {Free,Net,Open,Dragonfly}BSD

This enables *BSD users to use lei, -cindex and future SCM_RIGHTS-only
features without needing Inline::C.

[1] https://cvsweb.openbsd.org/src/gnu/usr.bin/perl/gen_syscall_emulator.pl
[2] https://www.netbsd.org/docs/internals/en/chap-processes.html#syscall_versioning
[3] https://wiki.freebsd.org/AddingSyscalls#Backward_compatibily
---
 devel/sysdefs-list         |   9 +++-
 lib/PublicInbox/Syscall.pm | 102 +++++++++++++++++++++++--------------
 t/cmd_ipc.t                |   9 ++--
 3 files changed, 74 insertions(+), 46 deletions(-)

diff --git a/devel/sysdefs-list b/devel/sysdefs-list
index 61532cf2..ba51de6c 100755
--- a/devel/sysdefs-list
+++ b/devel/sysdefs-list
@@ -2,8 +2,6 @@
 # License: AGPL-3.0+ <http://www.gnu.org/licenses/agpl-3.0.txt>
 # Dump system-specific constant numbers this is to maintain
 # PublicInbox::Syscall and any other system-specific pieces.
-# DO NOT USE syscall numbers for *BSDs, none of the current BSD kernels
-# we know about promise stable syscall numbers (unlike Linux).
 # However, sysconf(3) constants are stable ABI on all safe to dump.
 eval 'exec perl -S $0 ${1+"$@"}' # no shebang
 	if 0; # running under some shell
@@ -179,5 +177,12 @@ int main(void)
 		PR_NUM(cmsg_type);
 	STRUCT_END;
 
+	{
+		struct cmsghdr cmsg;
+		uintptr_t cmsg_data_off;
+		cmsg_data_off = (uintptr_t)CMSG_DATA(&cmsg) - (uintptr_t)&cmsg;
+		D(cmsg_data_off);
+	}
+
 	return 0;
 }
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 9071e6b1..829cfa3c 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -22,7 +22,7 @@ use POSIX qw(ENOENT ENOSYS EINVAL O_NONBLOCK);
 use Socket qw(SOL_SOCKET SCM_RIGHTS);
 use Config;
 our %SIGNUM = (WINCH => 28); # most Linux, {Free,Net,Open}BSD, *Darwin
-our $INOTIFY;
+our ($INOTIFY, %PACK);
 
 # $VERSION = '0.25'; # Sys::Syscall version
 our @EXPORT_OK = qw(epoll_ctl epoll_create epoll_wait
@@ -44,26 +44,21 @@ use constant {
 	EPOLL_CTL_MOD => 3,
 	SIZEOF_int => $Config{intsize},
 	SIZEOF_size_t => $Config{sizesize},
+	SIZEOF_ptr => $Config{ptrsize},
 	NUL => "\0",
 };
 
-use constant {
-	TMPL_size_t => SIZEOF_size_t == 8 ? 'Q' : 'L',
-	BYTES_4_hole => SIZEOF_size_t == 8 ? 'L' : '',
-	# cmsg_len, cmsg_level, cmsg_type
-	SIZEOF_cmsghdr => SIZEOF_int * 2 + SIZEOF_size_t,
-};
-
-my @BYTES_4_hole = BYTES_4_hole ? (0) : ();
+use constant TMPL_size_t => SIZEOF_size_t == 8 ? 'Q' : 'L';
 
 our ($SYS_epoll_create,
 	$SYS_epoll_ctl,
 	$SYS_epoll_wait,
 	$SYS_signalfd4,
 	$SYS_renameat2,
-	$F_SETPIPE_SZ);
+	$F_SETPIPE_SZ,
+	$SYS_sendmsg,
+	$SYS_recvmsg);
 
-my ($SYS_sendmsg, $SYS_recvmsg);
 my $SYS_fstatfs; # don't need fstatfs64, just statfs.f_type
 my ($FS_IOC_GETFLAGS, $FS_IOC_SETFLAGS);
 my $SFD_CLOEXEC = 02000000; # Perl does not expose O_CLOEXEC
@@ -78,7 +73,7 @@ if ($^O eq "linux") {
 	# boundaries.
 	my $u64_mod_8 = 0;
 
-	if ($Config{ptrsize} == 4) {
+	if (SIZEOF_ptr == 4) {
 		# if we're running on an x86_64 kernel, but a 32-bit process,
 		# we need to use the x32 or i386 syscall numbers.
 		if ($machine eq 'x86_64') {
@@ -281,16 +276,52 @@ EOM
 		*epoll_wait = \&epoll_wait_mod4;
 		*epoll_ctl = \&epoll_ctl_mod4;
 	}
+} elsif ($^O =~ /\A(?:freebsd|openbsd|netbsd|dragonfly)\z/) {
+# don't use syscall.ph here, name => number mappings are not stable on *BSD
+# but the actual numbers are.
+# OpenBSD perl redirects syscall perlop to libc functions
+# https://cvsweb.openbsd.org/src/gnu/usr.bin/perl/gen_syscall_emulator.pl
+# https://www.netbsd.org/docs/internals/en/chap-processes.html#syscall_versioning
+# https://wiki.freebsd.org/AddingSyscalls#Backward_compatibily
+# (I'm assuming Dragonfly copies FreeBSD, here, too)
+	$SYS_recvmsg = 27;
+	$SYS_sendmsg = 28;
+}
+
+BEGIN {
+	if ($^O eq 'linux') {
+		%PACK = (
+			TMPL_cmsg_len => TMPL_size_t,
+			# cmsg_len, cmsg_level, cmsg_type
+			SIZEOF_cmsghdr => SIZEOF_int * 2 + SIZEOF_size_t,
+			CMSG_DATA_off => '',
+			TMPL_msghdr => 'PL' . # msg_name, msg_namelen
+				'@'.(2 * SIZEOF_ptr).'P'. # msg_iov
+				'i'. # msg_iovlen
+				'@'.(4 * SIZEOF_ptr).'P'. # msg_control
+				'L'. # msg_controllen (socklen_t)
+				'i', # msg_flags
+		);
+	} elsif ($^O =~ /\A(?:freebsd|openbsd|netbsd|dragonfly)\z/) {
+		%PACK = (
+			TMPL_cmsg_len => 'L', # socklen_t
+			SIZEOF_cmsghdr => SIZEOF_int * 3,
+			CMSG_DATA_off => SIZEOF_ptr == 8 ? '@16' : '',
+			TMPL_msghdr => 'PL' . # msg_name, msg_namelen
+				'@'.(2 * SIZEOF_ptr).'P'. # msg_iov
+				TMPL_size_t. # msg_iovlen
+				'@'.(4 * SIZEOF_ptr).'P'. # msg_control
+				TMPL_size_t. # msg_controllen
+				'i', # msg_flags
+
+		)
+	}
+	$PACK{CMSG_ALIGN_size} = SIZEOF_size_t;
 }
 
 # SFD_CLOEXEC is arch-dependent, so IN_CLOEXEC may be, too
 $INOTIFY->{IN_CLOEXEC} //= 0x80000 if $INOTIFY;
 
-# use Inline::C for *BSD-only or general POSIX stuff.
-# Linux guarantees stable syscall numbering, BSDs only offer a stable libc
-# use devel/sysdefs-list on Linux to detect new syscall numbers and
-# other system constants
-
 sub epoll_create {
 	syscall($SYS_epoll_create, $no_deprecated ? 0 : 100);
 }
@@ -420,11 +451,13 @@ sub nodatacow_dir {
 	if (open my $fh, '<', $_[0]) { nodatacow_fh($fh) }
 }
 
-sub CMSG_ALIGN ($) { ($_[0] + SIZEOF_size_t - 1) & ~(SIZEOF_size_t - 1) }
+use constant \%PACK;
+sub CMSG_ALIGN ($) { ($_[0] + CMSG_ALIGN_size - 1) & ~(CMSG_ALIGN_size - 1) }
 use constant CMSG_ALIGN_SIZEOF_cmsghdr => CMSG_ALIGN(SIZEOF_cmsghdr);
 sub CMSG_SPACE ($) { CMSG_ALIGN($_[0]) + CMSG_ALIGN_SIZEOF_cmsghdr }
 sub CMSG_LEN ($) { CMSG_ALIGN_SIZEOF_cmsghdr + $_[0] }
-use constant msg_controllen => CMSG_SPACE(10 * SIZEOF_int) + 16; # 10 FDs
+use constant msg_controllen_max =>
+	CMSG_SPACE(10 * SIZEOF_int) + SIZEOF_cmsghdr; # space for 10 FDs
 
 if (defined($SYS_sendmsg) && defined($SYS_recvmsg)) {
 no warnings 'once';
@@ -436,20 +469,15 @@ require PublicInbox::CmdIPC4;
 			$_[2] // NUL, length($_[2] // NUL) || 1);
 	my $fd_space = scalar(@$fds) * SIZEOF_int;
 	my $msg_controllen = CMSG_SPACE($fd_space);
-	my $cmsghdr = pack(TMPL_size_t . # cmsg_len
+	my $cmsghdr = pack(TMPL_cmsg_len .
 			'LL' .  # cmsg_level, cmsg_type,
-			('i' x scalar(@$fds)) . # CMSG_DATA
+			CMSG_DATA_off.('i' x scalar(@$fds)). # CMSG_DATA
 			'@'.($msg_controllen - 1).'x1', # pad to space, not len
 			CMSG_LEN($fd_space), # cmsg_len
 			SOL_SOCKET, SCM_RIGHTS, # cmsg_{level,type}
 			@$fds); # CMSG_DATA
-	my $mh = pack('PL' . # msg_name, msg_namelen (socklen_t (U32))
-			BYTES_4_hole . # 4-byte padding on 64-bit
-			'P'.TMPL_size_t . # msg_iov, msg_iovlen,
-			'P'.TMPL_size_t . # msg_control, msg_controllen,
-			'i', # msg_flags
-			NUL, 0, # msg_name, msg_namelen (unused)
-			@BYTES_4_hole,
+	my $mh = pack(TMPL_msghdr,
+			undef, 0, # msg_name, msg_namelen (unused)
 			$iov, 1, # msg_iov, msg_iovlen
 			$cmsghdr, # msg_control
 			$msg_controllen,
@@ -465,18 +493,13 @@ require PublicInbox::CmdIPC4;
 *recv_cmd4 = sub ($$$) {
 	my ($sock, undef, $len) = @_;
 	vec($_[1] //= '', $len - 1, 8) = 0;
-	my $cmsghdr = "\0" x msg_controllen; # 10 * sizeof(int)
+	my $cmsghdr = "\0" x msg_controllen_max; # 10 * sizeof(int)
 	my $iov = pack('P'.TMPL_size_t, $_[1], $len);
-	my $mh = pack('PL' . # msg_name, msg_namelen (socklen_t (U32))
-			BYTES_4_hole . # 4-byte padding on 64-bit
-			'P'.TMPL_size_t . # msg_iov, msg_iovlen,
-			'P'.TMPL_size_t . # msg_control, msg_controllen,
-			'i', # msg_flags
-			NUL, 0, # msg_name, msg_namelen (unused)
-			@BYTES_4_hole,
+	my $mh = pack(TMPL_msghdr,
+			undef, 0, # msg_name, msg_namelen (unused)
 			$iov, 1, # msg_iov, msg_iovlen
 			$cmsghdr, # msg_control
-			msg_controllen,
+			msg_controllen_max,
 			0); # msg_flags
 	my $r;
 	do {
@@ -489,8 +512,9 @@ require PublicInbox::CmdIPC4;
 	substr($_[1], $r, length($_[1]), '');
 	my @ret;
 	if ($r > 0) {
-		my ($len, $lvl, $type, @fds) = unpack(TMPL_size_t . # cmsg_len
-					'LLi*', # cmsg_level, cmsg_type, @fds
+		my ($len, $lvl, $type, @fds) = unpack(TMPL_cmsg_len.
+					'LL'. # cmsg_level, cmsg_type
+					CMSG_DATA_off.'i*', # @fds
 					$cmsghdr);
 		if ($lvl == SOL_SOCKET && $type == SCM_RIGHTS) {
 			$len -= CMSG_ALIGN_SIZEOF_cmsghdr;
diff --git a/t/cmd_ipc.t b/t/cmd_ipc.t
index 08a4dcc3..c973c6f0 100644
--- a/t/cmd_ipc.t
+++ b/t/cmd_ipc.t
@@ -143,14 +143,13 @@ SKIP: {
 }
 
 SKIP: {
-	skip 'not Linux', 1 if $^O ne 'linux';
 	require_ok 'PublicInbox::Syscall';
 	$send = PublicInbox::Syscall->can('send_cmd4') or
-		skip 'send_cmd4 not defined for arch', 1;
+		skip "send_cmd4 not defined for $^O arch", 1;
 	$recv = PublicInbox::Syscall->can('recv_cmd4') or
-		skip 'recv_cmd4 not defined for arch', 1;
-	$do_test->(SOCK_STREAM, 0, 'PP Linux stream');
-	$do_test->(SOCK_SEQPACKET, 0, 'PP Linux seqpacket');
+		skip "recv_cmd4 not defined for $^O arch", 1;
+	$do_test->(SOCK_STREAM, 0, 'pure Perl stream');
+	$do_test->(SOCK_SEQPACKET, 0, 'pure Perl seqpacket');
 }
 
 done_testing;

^ permalink raw reply related	[relevance 54%]
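The pack templates in the patch above reproduce, in pure Perl, what the CMSG_* macros do in C. As a point of comparison, here is a minimal C sketch of passing one file descriptor with SCM_RIGHTS, plus the same data-offset measurement that the devel/sysdefs-list hunk adds. The helper names send_one_fd and recv_one_fd are hypothetical and are not part of public-inbox. The sketch assumes a libc where CMSG_SPACE is a compile-time constant (true for glibc; not guaranteed everywhere).

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Bytes from the start of a cmsghdr to its data area (cf. sysdefs-list). */
static size_t cmsg_data_off(void)
{
	struct cmsghdr cmsg;

	/* CMSG_DATA only computes an address; it does not read `cmsg' */
	return (size_t)((uintptr_t)CMSG_DATA(&cmsg) - (uintptr_t)&cmsg);
}

/* The union keeps the control buffer aligned for struct cmsghdr. */
union one_fd_ctl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send one data byte and one FD over a Unix socket. */
static ssize_t send_one_fd(int sock, int fd)
{
	union one_fd_ctl ctl;
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf); /* space, not len */

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0);
}

/* Hypothetical helper: returns the received FD, or -1 on any failure. */
static int recv_one_fd(int sock)
{
	union one_fd_ctl ctl;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
	    cmsg->cmsg_type == SCM_RIGHTS &&
	    cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
		memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The measured data offset should equal CMSG_LEN(0), which is the value the Perl side has to encode through its SIZEOF_cmsghdr and CMSG_DATA_off constants.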

* [RFT] syscall: set default constants for Inline::C platforms
  @ 2024-04-08  9:48 91%     ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2024-04-08  9:48 UTC (permalink / raw)
  To: Gaelan Steele; +Cc: meta

Gaelan Steele <gaelans@icloud.com> wrote:
> Unfortunately this patch broke public-inbox on Darwin:
> 
> Bareword "SIZEOF_cmsghdr" not allowed while "strict subs" in use at /tmp/public-inbox/lib/PublicInbox/Syscall.pm line 456.
> BEGIN not safe after errors--compilation aborted at /tmp/public-inbox/lib/PublicInbox/Syscall.pm line 460.
> Compilation failed in require at /tmp/public-inbox/lib/PublicInbox/DS.pm line 31.
> BEGIN failed--compilation aborted at /tmp/public-inbox/lib/PublicInbox/DS.pm line 32.
> Compilation failed in require at /tmp/public-inbox/lib/PublicInbox/Daemon.pm line 17.
> BEGIN failed--compilation aborted at /tmp/public-inbox/lib/PublicInbox/Daemon.pm line 17.
> Compilation failed in require at /tmp/public-inbox/script/public-inbox-httpd line 7.
> BEGIN failed--compilation aborted at /tmp/public-inbox/script/public-inbox-httpd line 7.
> 
> I’m not enough of a Perl person to fully untangle this. As
> best I can tell, the intent is that non-Linux/BSD OSes should
> still work with Inline::C, but this doesn’t work in practice
> due to a bug?

Right.  Patch below should fix it, test feedback appreciated.

> It may also be possible to use the BSD approach on Darwin -
> Darwin subscribes to the BSD school of thought where libc is the
> only Officially Stable interface, but if you can get away with
> it on the real BSDs maybe you can get away with it on fake BSD
> too.

NetBSD and FreeBSD both document that the underlying syscall
numbers remain stable (but not the name => number mapping).
OpenBSD has no stable numbering, but goes as far as to patch Perl
to route the `syscall' perlop through their libc to avoid breaking
Perl scripts.

I have no idea if Darwin maintains any stability guarantees at
all like the above OSes, so Inline::C may be safer, here.

-------8<-------
Subject: [PATCH] syscall: set default constants for Inline::C platforms

This ought to fix compile errors on platforms we don't
explicitly support.

Reported-by: Gaelan Steele <gaelans@icloud.com>
---
 lib/PublicInbox/Syscall.pm | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 829cfa3c..99af5bf5 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -317,6 +317,10 @@ BEGIN {
 		)
 	}
 	$PACK{CMSG_ALIGN_size} = SIZEOF_size_t;
+	$PACK{SIZEOF_cmsghdr} //= 0;
+	$PACK{TMPL_cmsg_len} //= undef;
+	$PACK{CMSG_DATA_off} //= undef;
+	$PACK{TMPL_msghdr} //= undef;
 }
 
 # SFD_CLOEXEC is arch-dependent, so IN_CLOEXEC may be, too

^ permalink raw reply related	[relevance 91%]
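The exchange above turns on the difference between invoking a syscall by number and going through libc. A minimal C sketch of the two paths follows, assuming Linux, where <sys/syscall.h> supplies the SYS_* name to number mapping and syscall(2) is available. The wrapper names are hypothetical; nothing here is taken from public-inbox.

```c
#define _DEFAULT_SOURCE /* for syscall() under strict -std= modes */
#include <sys/syscall.h> /* SYS_getpid: name => number mapping */
#include <sys/types.h>
#include <unistd.h>

/* Raw path: depends on the kernel keeping the syscall number stable. */
static pid_t raw_getpid(void)
{
	return (pid_t)syscall(SYS_getpid);
}

/* libc path: depends only on the libc function staying compatible. */
static pid_t libc_getpid(void)
{
	return getpid();
}
```

Both calls are expected to return the same PID on Linux. On platforms that only promise a stable libc, the second form is the one that is covered by that promise, which is the reasoning behind preferring Inline::C there.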

* [PATCH 3/5] send_cmd4: make `tries' a per-call parameter
  @ 2024-04-25 21:31 80% ` Eric Wong
  0 siblings, 0 replies; 51+ results
From: Eric Wong @ 2024-04-25 21:31 UTC (permalink / raw)
  To: meta

While existing callers (lei, *-index, -watch) are private, we
should not be blocking the event loop in public-facing servers
when we hit ETOOMANYREFS, ENOMEM, or ENOBUFS.
---
 lib/PublicInbox/CmdIPC4.pm | 12 ++++++------
 lib/PublicInbox/Spawn.pm   | 12 +++++++-----
 lib/PublicInbox/Syscall.pm |  8 ++++----
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/CmdIPC4.pm b/lib/PublicInbox/CmdIPC4.pm
index 2f102ec6..fc77bd03 100644
--- a/lib/PublicInbox/CmdIPC4.pm
+++ b/lib/PublicInbox/CmdIPC4.pm
@@ -11,8 +11,8 @@ use Socket qw(SOL_SOCKET SCM_RIGHTS);
 sub sendmsg_retry ($) {
 	return 1 if $!{EINTR};
 	return unless ($!{ENOMEM} || $!{ENOBUFS} || $!{ETOOMANYREFS});
-	return if ++$_[0] >= 50;
-	warn "# sleeping on sendmsg: $! (#$_[0])\n";
+	return if --$_[0] < 0;
+	warn "# sleeping on sendmsg: $! ($_[0] tries left)\n";
 	select(undef, undef, undef, 0.1);
 	1;
 }
@@ -22,15 +22,15 @@ require Socket::MsgHdr; # XS
 no warnings 'once';
 
 # any number of FDs per-sendmsg(2) + buffer
-*send_cmd4 = sub ($$$$) { # (sock, fds, buf, flags) = @_;
-	my ($sock, $fds, undef, $flags) = @_;
+*send_cmd4 = sub ($$$$;$) { # (sock, fds, buf, flags) = @_;
+	my ($sock, $fds, undef, $flags, $tries) = @_;
+	$tries //= 50;
 	my $mh = Socket::MsgHdr->new(buf => $_[2]);
 	$mh->cmsghdr(SOL_SOCKET, SCM_RIGHTS, pack('i' x scalar(@$fds), @$fds));
 	my $s;
-	my $try = 0;
 	do {
 		$s = Socket::MsgHdr::sendmsg($sock, $mh, $flags);
-	} while (!defined($s) && sendmsg_retry($try));
+	} while (!defined($s) && sendmsg_retry($tries));
 	$s;
 };
 
diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm
index e36659ce..e9e81e88 100644
--- a/lib/PublicInbox/Spawn.pm
+++ b/lib/PublicInbox/Spawn.pm
@@ -176,15 +176,15 @@ out:
 	return (int)pid;
 }
 
-static int sendmsg_retry(unsigned *tries)
+static int sendmsg_retry(int *tries)
 {
 	const struct timespec req = { 0, 100000000 }; /* 100ms */
 	int err = errno;
 	switch (err) {
 	case EINTR: PERL_ASYNC_CHECK(); return 1;
 	case ENOBUFS: case ENOMEM: case ETOOMANYREFS:
-		if (++*tries >= 50) return 0;
-		fprintf(stderr, "# sleeping on sendmsg: %s (#%u)\n",
+		if (--*tries < 0) return 0;
+		fprintf(stderr, "# sleeping on sendmsg: %s (%d tries left)\n",
 			strerror(err), *tries);
 		nanosleep(&req, NULL);
 		PERL_ASYNC_CHECK();
@@ -201,7 +201,7 @@ union my_cmsg {
 	char pad[sizeof(struct cmsghdr) + 16 + SEND_FD_SPACE];
 };
 
-SV *send_cmd4(PerlIO *s, SV *svfds, SV *data, int flags)
+SV *send_cmd4_(PerlIO *s, SV *svfds, SV *data, int flags, int tries)
 {
 	struct msghdr msg = { 0 };
 	union my_cmsg cmsg = { 0 };
@@ -211,7 +211,6 @@ SV *send_cmd4(PerlIO *s, SV *svfds, SV *data, int flags)
 	AV *fds = (AV *)SvRV(svfds);
 	I32 i, nfds = av_len(fds) + 1;
 	int *fdp;
-	unsigned tries = 0;
 
 	if (SvOK(data)) {
 		iov.iov_base = SvPV(data, dlen);
@@ -332,6 +331,9 @@ EOM
 	if (defined $all_libc) { # set for Gcf2
 		$ENV{PERL_INLINE_DIRECTORY} = $inline_dir;
 		%RLIMITS = rlimit_map();
+		*send_cmd4 = sub ($$$$;$) {
+			send_cmd4_($_[0], $_[1], $_[2], $_[3], 50);
+		}
 	} else {
 		require PublicInbox::SpawnPP;
 		*pi_fork_exec = \&PublicInbox::SpawnPP::pi_fork_exec
diff --git a/lib/PublicInbox/Syscall.pm b/lib/PublicInbox/Syscall.pm
index 99af5bf5..4cbe9623 100644
--- a/lib/PublicInbox/Syscall.pm
+++ b/lib/PublicInbox/Syscall.pm
@@ -467,8 +467,8 @@ if (defined($SYS_sendmsg) && defined($SYS_recvmsg)) {
 no warnings 'once';
 require PublicInbox::CmdIPC4;
 
-*send_cmd4 = sub ($$$$) {
-	my ($sock, $fds, undef, $flags) = @_;
+*send_cmd4 = sub ($$$$;$) {
+	my ($sock, $fds, undef, $flags, $tries) = @_;
 	my $iov = pack('P'.TMPL_size_t,
 			$_[2] // NUL, length($_[2] // NUL) || 1);
 	my $fd_space = scalar(@$fds) * SIZEOF_int;
@@ -487,10 +487,10 @@ require PublicInbox::CmdIPC4;
 			$msg_controllen,
 			0); # msg_flags
 	my $s;
-	my $try = 0;
+	$tries //= 50;
 	do {
 		$s = syscall($SYS_sendmsg, fileno($sock), $mh, $flags);
-	} while ($s < 0 && PublicInbox::CmdIPC4::sendmsg_retry($try));
+	} while ($s < 0 && PublicInbox::CmdIPC4::sendmsg_retry($tries));
 	$s >= 0 ? $s : undef;
 };
 

^ permalink raw reply related	[relevance 80%]
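The patch above changes the retry counter from counting up to a fixed limit of 50 into a caller-supplied budget that counts down. A minimal C sketch of that countdown logic follows. The function retry_step and its sleep_ns parameter are hypothetical; the real sendmsg_retry() in Spawn.pm sleeps a fixed 100ms and also runs PERL_ASYNC_CHECK(), which is omitted here.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper modeled on the patched sendmsg_retry():
 * returns 1 if the caller should retry sendmsg, 0 if it should give up.
 * `tries' is the remaining budget, owned and initialized by the caller.
 * `sleep_ns' must be less than one second's worth of nanoseconds.
 */
static int retry_step(int err, int *tries, long sleep_ns)
{
	struct timespec req = { 0, sleep_ns };

	switch (err) {
	case EINTR:
		return 1; /* interrupted: retry without using the budget */
	case ENOBUFS: case ENOMEM: case ETOOMANYREFS:
		if (--*tries < 0)
			return 0; /* budget exhausted, do not sleep */
		fprintf(stderr, "# sleeping on sendmsg: %s (%d tries left)\n",
			strerror(err), *tries);
		nanosleep(&req, NULL);
		return 1;
	default:
		return 0; /* any other error is not retried */
	}
}
```

With this shape, a caller that passes a budget of zero never sleeps, which is what a public-facing event loop would want, while private callers can keep the old behavior by passing 50.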

2019-05-05  0:52     [PATCH 0/4] bundle Danga::Socket and Sys::Syscall Eric Wong
2019-05-05  0:52 24% ` [PATCH 1/4] " Eric Wong
2019-05-05  0:52 83% ` [PATCH 2/4] listener: use EPOLLEXCLUSIVE for listen sockets Eric Wong
2019-05-08 19:18     ` [PATCH 0/4] Danga::Socket bundling cleanups Eric Wong
2019-05-08 19:18 85%   ` [PATCH 2/4] syscall: drop readahead wrapper Eric Wong
2019-06-24  2:52     [PATCH 00/57] ds: shrink, TLS support, buffer writes to FS Eric Wong
2019-06-24  2:52 91% ` [PATCH 15/57] syscall: get rid of unused EPOLL* constants Eric Wong
2019-06-24  2:52 93% ` [PATCH 16/57] syscall: get rid of unnecessary uname local vars Eric Wong
2019-06-24  2:52 57% ` [PATCH 21/57] ds: get rid of event_watch field Eric Wong
2019-06-24  2:52 60% ` [PATCH 54/57] ds: split out IO::KQueue-specific code Eric Wong
2019-06-24  5:24 93%   ` Eric Wong
2019-06-29 19:59     [PATCH 00/11] ds: more updates Eric Wong
2019-06-29 19:59 86% ` [PATCH 04/11] listener: use edge-triggered notifications Eric Wong
2019-10-21 11:22     [PATCH 0/7] dead code elimination Eric Wong
2019-10-21 11:22 75% ` [PATCH 7/7] syscall: get rid of sendfile wrappers for now Eric Wong
2019-11-27  1:33     [PATCH 0/2] fix kqueue support and missed signal wakeups Eric Wong
2019-11-27  1:33 37% ` [PATCH 2/2] httpd|nntpd: avoid " Eric Wong
2020-01-05 23:23     [PATCH 0/6] various cleanups around use/require Eric Wong
2020-01-05 23:23 87% ` [PATCH 6/6] syscall: modernize away from pre-Perl-5.6 conventions Eric Wong
2020-02-06  8:49 80% [PATCH] syscall: support Linux x32 ABI Eric Wong
2020-08-07 10:15 68% [PATCH] syscall: support sparc64 (and maybe other big-endian systems) Eric Wong
2020-12-27  2:53     [PATCH 0/5] some yak shaving Eric Wong
2020-12-27  2:53 60% ` [PATCH 5/5] ds: flatten + reuse @events, epoll_wait style fixes Eric Wong
2020-12-31 13:51     [PATCH 00/36] another round of lei stuff Eric Wong
2020-12-31 13:51 73% ` [PATCH 32/36] syscall: SFD_NONBLOCK can be a constant, again Eric Wong
2021-01-17  7:09     [PATCH 0/5] fixes for older Perls and Xapian Eric Wong
2021-01-17  7:09 95% ` [PATCH 2/5] initialize scalar for `vec' perlop modification Eric Wong
2021-05-06  8:38 88% [PATCH] syscall: minor yak-shaving updates Eric Wong
2021-06-18 21:44 99% [PATCH] scripts: add syscall-list tool for development Eric Wong
2021-10-01  9:54     [PATCH 0/9] daemon-related things Eric Wong
2021-10-01  9:54 43% ` [PATCH 5/9] ds: simplify signalfd use Eric Wong
2021-10-21 21:10     [PATCH 00/15] use RENAME_NOREPLACE on Linux 3.15+ Eric Wong
2021-10-21 21:10 80% ` [PATCH 15/15] lei: " Eric Wong
2021-12-08  1:07     Test failures with 1.7.0 Julien Moutinho
2021-12-08  4:08     ` Eric Wong
2021-12-08 10:56       ` Dominique Martinet
2021-12-08 18:22         ` [PATCH] nodatacow: quiet chattr errors [was: Test failures with 1.7.0] Eric Wong
2021-12-08 21:14           ` Dominique Martinet
2021-12-08 22:01             ` Dominique Martinet
2022-01-30 21:49 60%           ` Eric Wong
2022-01-30 23:18                 ` Dominique Martinet
2022-01-31  2:03                   ` Eric Wong
2022-01-31  3:34                     ` Dominique Martinet
2022-02-01  1:27 74%                   ` Eric Wong
2021-12-09  1:37         ` Test failures with 1.7.0 Julien Moutinho
2021-12-09  2:53 79%       ` Dominique Martinet
2022-03-23  8:54     [PATCH 0/3] support sendmsg+recvmsg in pure Perl under Linux Eric Wong
2022-03-23  8:54 93% ` [PATCH 1/3] syscall: drop unused EEXIST import Eric Wong
2022-03-23  8:54 84% ` [PATCH 3/3] syscall: implement sendmsg+recvmsg in pure Perl Eric Wong
2022-03-23 21:08 79% ` [PATCH 4/3] syscall: add sendmsg+recvmsg for remaining arches Eric Wong
2022-04-18  9:50     [PATCH 0/4] lei: finish wiring up pure-Perl stuff for Linux Eric Wong
2022-04-18  9:50 91% ` [PATCH 2/4] syscall: more idiomatic cmsghdr space allocation Eric Wong
2022-04-18  9:50 93% ` [PATCH 4/4] syscall: golf + more idiomatic buffer initialization Eric Wong
2022-04-23 22:03     [PATCH 0/2] version bumps, Perl 5.12 in _some_ places Eric Wong
2022-04-23 22:03 88% ` [PATCH 2/2] lei: move to v5.12 to avoid "use strict" Eric Wong
2022-08-11 22:33 93% [PATCH] syscall: add support for riscv64 Eric Wong
2022-09-29 17:48     [PATCH 0/4] CentOS 7 fixes + fix Gcf2 everywhere Eric Wong
2022-09-29 17:48 93% ` [PATCH 1/4] syscall: initialize buffer for vec() Eric Wong
2022-10-15  8:12     SIGWINCH not recognized under macOS Nicolás Ojeda Bär
2022-10-17  9:30 91% ` [PATCH 1/2] syscall: avoid needless string comparison on x86-64 Eric Wong
2022-10-17  9:30 96% ` [PATCH 2/2] sigfd: set SIGWINCH for MIPS and PA-RISC on Linux Eric Wong
2022-12-23 12:51     [PATCH 0/2] syscall debloating Eric Wong
2022-12-23 12:51 97% ` [PATCH 1/2] syscall: get rid of epoll_defined() sub Eric Wong
2022-12-23 12:51 82% ` [PATCH 2/2] syscall: drop syscall.ph support Eric Wong
2022-12-25 13:24 93% [PATCH] syscall: fix i386/i686 detection Eric Wong
2023-02-22 17:25 95% [PATCH] sendmsg: prefix sleep message with `#' Eric Wong
2023-09-04 10:35     [PATCH 00/10] signal-handling and *BSD fixes Eric Wong
2023-09-04 10:36 90% ` [PATCH 04/10] update devel/syscall-list to devel/sysdefs-list Eric Wong
2023-09-11  9:41     [PATCH 0/7] system-related updates and cleanups Eric Wong
2023-09-11  9:41 43% ` [PATCH 2/7] daemon: depend on DS event_loop in master process, too Eric Wong
2023-09-11  9:41 50% ` [PATCH 3/7] ds: use object-oriented API for epoll Eric Wong
2023-09-24 20:19     [PATCH 0/6] various test and syscall-related fixes Eric Wong
2023-09-24 20:19 86% ` [PATCH 4/6] ipc: recv_cmd4 clobbers destination buffer on errors Eric Wong
2023-09-24 20:19 86% ` [PATCH 5/6] syscall: have `vec' operate on bytes directly Eric Wong
2023-09-24 20:19 91% ` [PATCH 6/6] syscall: fix valgrind error in pure Perl send_cmd4 Eric Wong
2023-10-06  1:37 57% [PATCH] ipc: lower-level send_cmd/recv_cmd handle EINTR directly Eric Wong
2023-10-17 23:37     [PATCH 00/30] autodie-ification and code simplifications Eric Wong
2023-10-17 23:38 71% ` [PATCH 17/30] syscall: common $F_SETPIPE_SZ definition Eric Wong
2023-12-28  4:23 47% [PATCH] pure Perl inotify support Eric Wong
2024-01-29 21:23     [PATCH 0/2] pure Perl sendmsg/recvmsg on *BSD Eric Wong
2024-01-29 21:23 50% ` [PATCH 1/2] syscall: update formatting to match our codebase Eric Wong
2024-01-29 21:23 54% ` [PATCH 2/2] syscall: use pure Perl sendmsg/recvmsg on *BSD Eric Wong
2024-04-06  0:43       ` Gaelan Steele
2024-04-08  9:48 91%     ` [RFT] syscall: set default constants for Inline::C platforms Eric Wong
2024-04-25 21:31     [PATCH 0/5] xap_helper stuff for public daemons Eric Wong
2024-04-25 21:31 80% ` [PATCH 3/5] send_cmd4: make `tries' a per-call parameter Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox; see the mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).