* Gemini protocol view?
From: Konstantin Ryabitsev @ 2025-03-04 19:44 UTC
To: meta

Hi:

Just a wild idea -- how hard would it be to present a gemini:// protocol view
in addition to the web view? The AI scraper bots are killing us and I'm
looking for ways to present a lighter view of the entire lore database that
developers can still easily use. Gemini seems like the right set of features
for public-inbox, since it allows searching.

Just thinking out loud. :)

-K

* Re: Gemini protocol view?
From: Eric Wong @ 2025-03-04 22:14 UTC
To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Hi:
>
> Just a wild idea -- how hard would it be to present a gemini:// protocol view
> in addition to the web view?

Probably not hard at all, but I haven't looked at it at all and
am still trying to figure out how to make a 2.0 release soon with
codesearch support...

I'm not a fan of forced TLS since I favor Tor; and I generally
favor older, more established things...

> The AI scraper bots are killing us and I'm
> looking for ways to present a lighter view of the entire lore database that
> developers can still easily use. Gemini seems like the right set of features
> for public-inbox, since it allows searching.

Yeah, I don't disagree. I've got the design of a new,
independent Perl transpile-terpreter largely worked out in my
head which ought to give good speedups; but it remains a massive
effort to actually implement.

I wonder if prioritizing certain User-Agents (e.g. curl, w3m,
anything with non-Android "Linux" in it) would be a quick fix in
haproxy||nginx.

> Just thinking out loud. :)

Thanks :>
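
The User-Agent idea in the message above could be sketched roughly as follows. This is only an illustration of the matching rule (curl, w3m, or "Linux" without "Android"); the function name, the tier labels, and the choice of Python are assumptions, and a real deployment would express the rule in haproxy or nginx configuration instead.

```python
import re

# Hypothetical priority tiers; the labels are illustrative only.
HIGH, LOW = "high", "low"

# Command-line and text-mode clients named in the message above.
_CLI_AGENT = re.compile(r"^(curl|w3m)/", re.IGNORECASE)


def classify_user_agent(user_agent: str) -> str:
    """Return HIGH for curl, w3m, or a non-Android "Linux" agent, else LOW."""
    ua = user_agent or ""
    if _CLI_AGENT.match(ua):
        return HIGH
    # Android browsers also advertise "Linux", so exclude them explicitly.
    if "Linux" in ua and "Android" not in ua:
        return HIGH
    return LOW
```

User-Agent strings are trivially spoofed, so a rule like this only helps against clients that identify themselves honestly.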

* [PATCH] listener: don't set listen backlog on inherited sockets
From: Eric Wong @ 2025-03-05 0:45 UTC
To: Konstantin Ryabitsev; +Cc: meta

Eric Wong <e@80x24.org> wrote:
> I wonder if prioritizing certain User-Agents (e.g. curl, w3m,
> anything with non-Android "Linux" in it) would be a quick fix in
> haproxy||nginx.

This seems like a long overdue change to allow certain listen
sockets to handle more traffic than others.

---------8<-------
Subject: [PATCH] listener: don't set listen backlog on inherited sockets

By using the listen(2) backlog as-is when inheriting (from
systemd or similar), we can give the sysadmin more control over
handling overload on a per-listener basis.

For systemd users, this means the `Backlog=' parameter in
systemd.socket(5) can be respected and configured to give certain
sockets a smaller backlog (perhaps combined with the per-listener
`multi-accept' parameter on sockets with the standard (huge)
backlog).

For sockets we create, continue to use INT_MAX and let the
kernel clamp it to whatever system-wide limit there is
(e.g. `net.core.somaxconn' sysctl on Linux).
---
 lib/PublicInbox/Daemon.pm   | 2 +-
 lib/PublicInbox/LEI.pm      | 3 ++-
 lib/PublicInbox/Listener.pm | 1 -
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index 5d93f81f..8fe93acd 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -264,7 +264,7 @@ EOF
 		die $@ if $@;
 		%o = (LocalAddr => $l, ReuseAddr => 1, Proto => 'tcp');
 	}
-	$o{Listen} = 1024;
+	$o{Listen} = 2**31 - 1; # kernel will clamp
 	my $prev = umask 0000;
 	my $s = eval { $sock_pkg->new(%o) } or
 		warn "error binding $l: $! ($@)\n";
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 94bac688..0a779c4f 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -9,7 +9,7 @@ package PublicInbox::LEI;
 use v5.12;
 use parent qw(PublicInbox::DS PublicInbox::LeiExternal
 	PublicInbox::LeiQuery);
-use autodie qw(bind chdir open pipe socket socketpair syswrite unlink);
+use autodie qw(bind chdir listen open pipe socket socketpair syswrite unlink);
 use Getopt::Long ();
 use Socket qw(AF_UNIX SOCK_SEQPACKET pack_sockaddr_un);
 use Errno qw(EPIPE EAGAIN ECONNREFUSED ENOENT ECONNRESET EINTR);
@@ -1371,6 +1371,7 @@ sub lazy_start {
 	local (%PATH2CFG, $MDIR2CFGPATH);
 	local $daemon_pid = $$;
 	$listener->blocking(0);
+	listen $listener, 2**31 - 1; # kernel will clamp
 	my $exit_code;
 	my $pil = PublicInbox::Listener->new($listener, \&accept_dispatch);
 	local $quit = do {
diff --git a/lib/PublicInbox/Listener.pm b/lib/PublicInbox/Listener.pm
index c83901b2..62475600 100644
--- a/lib/PublicInbox/Listener.pm
+++ b/lib/PublicInbox/Listener.pm
@@ -21,7 +21,6 @@ sub new {
 	my ($class, $s, $cb, $multi_accept) = @_;
 	setsockopt($s, SOL_SOCKET, SO_KEEPALIVE, 1);
 	setsockopt($s, IPPROTO_TCP, TCP_NODELAY, 1); # ignore errors on non-TCP
-	listen($s, 2**31 - 1); # kernel will clamp
 	my $self = bless { post_accept => $cb }, $class;
 	$self->{multi_accept} = $multi_accept //= $MULTI_ACCEPT;
 	$self->SUPER::new($s, EPOLLIN|EPOLLEXCLUSIVE);
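
The split the patch makes (set the backlog only where the socket is created, never on one that was inherited) can be illustrated outside Perl. The sketch below is not public-inbox code: the function names are hypothetical and Python is used purely for illustration, assuming a Linux-like kernel that clamps oversized backlogs.

```python
import socket

INT_MAX = 2**31 - 1  # same value the patch passes; the kernel clamps it


def create_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Create, bind, and listen on a TCP socket we own, with a huge backlog."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(INT_MAX)  # clamped to e.g. net.core.somaxconn on Linux
    return s


def adopt_listener(sock: socket.socket) -> socket.socket:
    """Prepare an inherited, already-listening socket.

    listen() is deliberately NOT called here, so whatever backlog the
    parent (systemd or similar) configured stays in effect.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setblocking(False)
    return sock
```

Calling listen(2) again on an inherited socket would overwrite the backlog the parent chose, which is why the adopting path leaves it alone.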

* Re: [PATCH] listener: don't set listen backlog on inherited sockets
From: Konstantin Ryabitsev @ 2025-03-05 16:54 UTC
To: Eric Wong; +Cc: meta

On Wed, Mar 05, 2025 at 12:45:36AM +0000, Eric Wong wrote:
> Eric Wong <e@80x24.org> wrote:
> > I wonder if prioritizing certain User-Agents (e.g. curl, w3m,
> > anything with non-Android "Linux" in it) would be a quick fix in
> > haproxy||nginx.
>
> This seems like a long overdue change to allow certain listen
> sockets to handle more traffic than others.

That's a neat trick, but not quite what I was looking for. It's not that we're
not able to handle the number of connections -- public-inbox-httpd is actually
really good at it. In fact, when I migrated things to EL9, I managed to mess
up my configuration and ran lore for a few months without -W8, and even then
everything mostly worked amazingly well. :)

However, we do generate a lot of traffic and unnecessary CPU churn just to
train someone's LLM. I don't care about the training part (well, I do, but I
can't do anything about it), but doing it over the web when they can just
clone the underlying repositories is the stupidest way to do it.

So, when musing about the gemini view, I was really just thinking of ways to
reduce the dumb AI bots while still giving unrestricted access to anyone else
with a gemini client.

That said, I will play with this change, as this will for sure let me
prioritize b4/curl traffic over anything that identifies itself as a browser.

Thanks!

-K

* Re: Gemini protocol view?
From: Eric Wong @ 2025-03-07 21:25 UTC
To: Konstantin Ryabitsev; +Cc: meta

Eric Wong <e@80x24.org> wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > Hi:
> >
> > Just a wild idea -- how hard would it be to present a gemini:// protocol view
> > in addition to the web view?
>
> Probably not hard at all, but I haven't looked at it at all and
> am still trying to figure out how to make a 2.0 release soon with
> codesearch support...

Yeah, definitely doable and a mostly mindless distraction while
hitting mental blocks with codesearch UI.

Not sure about clients, I'm just using something like:

	echo $URL | openssl s_client -crlf -connect ...

right now; but I know https://git.sr.ht/~rkta/w3m has a gemini
branch which I'll have to look at. I loathe having to program
new TUI keybindings into my muscle memory.

notes:

* The ``` use for preformatted blocks doesn't have an established
  way to escape it, but I suppose prefixing those with
  "\N{ZERO WIDTH SPACE}" is fine for escaping.

* the specifications are developed on GitLab so not accessible
  to non-JS users (which seems like a chunk of the target
  audience for gemini)

* text/gemini is better than the mess that is Markdown (or similar)

* lack of TLS connection reuse sucks on high-latency networks
  if trying to browse through a bunch of messages quickly
  (something that an NNTP/IMAP client would do)

* lack of compression would suck for /T/ and /t/ endpoints
  for trying to browse a thread w/ a single request...

* titan:// is an extension that can be used like POST/PUT
  if we ever support non-SMTP inputs; especially with
  TOFU + client certs (`git send-titan' anyone?)

* URLs being limited to 1024 bytes shouldn't be a problem for
  non-spam Message-IDs (Xapian has a lower limit (244) for terms)

more to come...
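
Two of the notes above (the 1024-byte URL limit and the zero-width-space escape for ``` lines) are easy to show in a few lines. This is a minimal sketch, not anything from public-inbox: the function names are hypothetical, Python is used only for illustration, and the request framing (absolute URL followed by CRLF) mirrors what the `openssl s_client -crlf` one-liner sends.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, the escape suggested in the notes above
MAX_URL_BYTES = 1024  # request URL limit mentioned in the notes


def build_request(url: str) -> bytes:
    """Return the bytes of a Gemini request: the absolute URL plus CRLF."""
    raw = url.encode("utf-8")
    if len(raw) > MAX_URL_BYTES:
        raise ValueError(f"URL is {len(raw)} bytes; limit is {MAX_URL_BYTES}")
    return raw + b"\r\n"


def escape_preformatted(body: str) -> str:
    """Prefix any line starting with ``` so it cannot toggle preformatted mode."""
    out = []
    for line in body.split("\n"):
        if line.startswith("```"):
            line = ZWSP + line
        out.append(line)
    return "\n".join(out)


def as_preformatted(body: str) -> str:
    """Wrap a message body in a text/gemini preformatted block."""
    return "```\n" + escape_preformatted(body) + "\n```\n"
```

For example, `build_request("gemini://example.com/")` (a placeholder host) yields the URL bytes followed by `\r\n`, and a mail body that itself contains a ``` line stays inside a single preformatted block after `as_preformatted()`. The escaped line is no longer byte-identical to the original, which matters if readers copy text such as patches out of the rendered view.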

* Re: Gemini protocol view?
From: Eric Wong @ 2025-03-11 19:26 UTC
To: Konstantin Ryabitsev; +Cc: meta

Eric Wong <e@80x24.org> wrote:
> * lack of compression would suck for /T/ and /t/ endpoints
>   for trying to browse a thread w/ a single request...

The protocol specification mentions URL fragments, but the
text/gemini specification doesn't seem to support linking within
the same document. So I'm wondering how useful the /T/ and /t/
views would be...

> more to come...