* Gemini protocol view?
@ 2025-03-04 19:44 Konstantin Ryabitsev
2025-03-04 22:14 ` Eric Wong
0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Ryabitsev @ 2025-03-04 19:44 UTC
To: meta
Hi:
Just a wild idea -- how hard would it be to present a gemini:// protocol view
in addition to the web view? The AI scraper bots are killing us and I'm
looking for ways to present a lighter view of the entire lore database that
developers can still easily use. Gemini seems like the right set of features
for public-inbox, since it allows searching.
Just thinking out loud. :)
-K
* Re: Gemini protocol view?
2025-03-04 19:44 Gemini protocol view? Konstantin Ryabitsev
@ 2025-03-04 22:14 ` Eric Wong
2025-03-05 0:45 ` [PATCH] listener: don't set listen backlog on inherited sockets Eric Wong
2025-03-07 21:25 ` Gemini protocol view? Eric Wong
0 siblings, 2 replies; 6+ messages in thread
From: Eric Wong @ 2025-03-04 22:14 UTC
To: Konstantin Ryabitsev; +Cc: meta
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Hi:
>
> Just a wild idea -- how hard would it be to present a gemini:// protocol view
> in addition to the web view?
Probably not hard at all, but I haven't looked at it at all and
am still trying to figure out how to make a 2.0 release soon with
codesearch support...
I'm not a fan of forced TLS since I favor Tor; and I generally
favor older, more established things...
> The AI scraper bots are killing us and I'm
> looking for ways to present a lighter view of the entire lore database that
> developers can still easily use. Gemini seems like the right set of features
> for public-inbox, since it allows searching.
Yeah, I don't disagree.
I've got the design of a new, independent Perl transpile-terpreter
largely worked out in my head which ought to give good speedups;
but it remains a massive effort to actually implement.
I wonder if prioritizing certain User-Agents (e.g. curl, w3m,
anything with non-Android "Linux" in it) would be a quick fix in
haproxy||nginx.
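To make the idea concrete, here is a rough sketch of the matching rule
in Python (not haproxy or nginx syntax, and not part of public-inbox).
The function name and the token list are hypothetical, and User-Agent
strings are trivially spoofed, so this only helps against bots that
don't bother to lie:

```python
import re

# Hypothetical allow-list: the agents named above (curl, w3m)
_PRIORITY_TOKENS = re.compile(r"\b(curl|w3m)\b", re.IGNORECASE)


def is_priority_ua(user_agent: str) -> bool:
    """Return True if this User-Agent should get the favored listener."""
    ua = user_agent or ""
    if _PRIORITY_TOKENS.search(ua):
        return True
    # "Linux" only counts when the agent does not also claim Android
    return "Linux" in ua and "Android" not in ua
```

A proxy rule would route requests for which this returns True to the
less-contended backend and everything else to the default one.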
> Just thinking out loud. :)
Thanks :>
* [PATCH] listener: don't set listen backlog on inherited sockets
2025-03-04 22:14 ` Eric Wong
@ 2025-03-05 0:45 ` Eric Wong
2025-03-05 16:54 ` Konstantin Ryabitsev
2025-03-07 21:25 ` Gemini protocol view? Eric Wong
1 sibling, 1 reply; 6+ messages in thread
From: Eric Wong @ 2025-03-05 0:45 UTC
To: Konstantin Ryabitsev; +Cc: meta
Eric Wong <e@80x24.org> wrote:
> I wonder if prioritizing certain User-Agents (e.g. curl, w3m,
> anything with non-Android "Linux" in it) would be a quick fix in
> haproxy||nginx.
This seems like a long overdue change to allow certain listen
sockets to handle more traffic than others.
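As a rough illustration of the behavior (a Python sketch, not the Perl
in the patch below; `get_listener' and its parameters are made-up
names): a socket inherited from a supervisor keeps whatever backlog it
was given, while a socket we create ourselves asks for the maximum and
lets the kernel clamp it.

```python
import socket

INT_MAX = 2**31 - 1


def get_listener(inherited_fd=None, addr=("127.0.0.1", 0)):
    """Return a listening TCP socket.

    inherited_fd is a hypothetical fd handed over by a supervisor
    (systemd or similar); it is assumed to be bound and listening.
    """
    if inherited_fd is not None:
        # Do NOT call listen() again: that would overwrite the
        # backlog the sysadmin configured for this socket
        return socket.socket(fileno=inherited_fd)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(addr)
    s.listen(INT_MAX)  # kernel will clamp (net.core.somaxconn on Linux)
    return s
```

With this split, a small backlog on one inherited socket makes the
kernel shed excess connections there, without touching the others.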
---------8<-------
Subject: [PATCH] listener: don't set listen backlog on inherited sockets
By using the listen(2) backlog as-is when inheriting (from
systemd or similar), we can give the sysadmin more control
over overload on a per-listener basis.  For systemd
users, this means the `Backlog=' parameter in systemd.socket(5)
can be respected and configured to give certain sockets a
smaller backlog (perhaps combined with the per-listener
`multi-accept' parameter on sockets with the standard (huge)
backlog).
For sockets we create, continue to use INT_MAX and let the
kernel clamp it to whatever system-wide limit there is
(e.g. `net.core.somaxconn' sysctl on Linux).
---
lib/PublicInbox/Daemon.pm | 2 +-
lib/PublicInbox/LEI.pm | 3 ++-
lib/PublicInbox/Listener.pm | 1 -
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index 5d93f81f..8fe93acd 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -264,7 +264,7 @@ EOF
die $@ if $@;
%o = (LocalAddr => $l, ReuseAddr => 1, Proto => 'tcp');
}
- $o{Listen} = 1024;
+ $o{Listen} = 2**31 - 1; # kernel will clamp
my $prev = umask 0000;
my $s = eval { $sock_pkg->new(%o) } or
warn "error binding $l: $! ($@)\n";
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 94bac688..0a779c4f 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -9,7 +9,7 @@ package PublicInbox::LEI;
use v5.12;
use parent qw(PublicInbox::DS PublicInbox::LeiExternal
PublicInbox::LeiQuery);
-use autodie qw(bind chdir open pipe socket socketpair syswrite unlink);
+use autodie qw(bind chdir listen open pipe socket socketpair syswrite unlink);
use Getopt::Long ();
use Socket qw(AF_UNIX SOCK_SEQPACKET pack_sockaddr_un);
use Errno qw(EPIPE EAGAIN ECONNREFUSED ENOENT ECONNRESET EINTR);
@@ -1371,6 +1371,7 @@ sub lazy_start {
local (%PATH2CFG, $MDIR2CFGPATH);
local $daemon_pid = $$;
$listener->blocking(0);
+ listen $listener, 2**31 - 1; # kernel will clamp
my $exit_code;
my $pil = PublicInbox::Listener->new($listener, \&accept_dispatch);
local $quit = do {
diff --git a/lib/PublicInbox/Listener.pm b/lib/PublicInbox/Listener.pm
index c83901b2..62475600 100644
--- a/lib/PublicInbox/Listener.pm
+++ b/lib/PublicInbox/Listener.pm
@@ -21,7 +21,6 @@ sub new {
my ($class, $s, $cb, $multi_accept) = @_;
setsockopt($s, SOL_SOCKET, SO_KEEPALIVE, 1);
setsockopt($s, IPPROTO_TCP, TCP_NODELAY, 1); # ignore errors on non-TCP
- listen($s, 2**31 - 1); # kernel will clamp
my $self = bless { post_accept => $cb }, $class;
$self->{multi_accept} = $multi_accept //= $MULTI_ACCEPT;
$self->SUPER::new($s, EPOLLIN|EPOLLEXCLUSIVE);
* Re: [PATCH] listener: don't set listen backlog on inherited sockets
2025-03-05 0:45 ` [PATCH] listener: don't set listen backlog on inherited sockets Eric Wong
@ 2025-03-05 16:54 ` Konstantin Ryabitsev
0 siblings, 0 replies; 6+ messages in thread
From: Konstantin Ryabitsev @ 2025-03-05 16:54 UTC
To: Eric Wong; +Cc: meta
On Wed, Mar 05, 2025 at 12:45:36AM +0000, Eric Wong wrote:
> Eric Wong <e@80x24.org> wrote:
> > I wonder if prioritizing certain User-Agents (e.g. curl, w3m,
> > anything with non-Android "Linux" in it) would be a quick fix in
> > haproxy||nginx.
>
> This seems like a long overdue change to allow certain listen
> sockets to handle more traffic than others.
That's a neat trick, but not quite what I was looking for. It's not that we're
not able to handle the number of connections -- public-inbox-httpd is actually
really good at it. In fact, when I migrated things to EL9, I managed to mess
up my configuration and ran lore for a few months without -W8, and even then
everything mostly worked amazingly well. :)
However, we do generate a lot of traffic and unnecessary cpu churn just to
train someone's LLM. I don't care about the training part (well, I do, but I
can't do anything about it), but doing it over the web when they can just
clone the underlying repositories is the stupidest way to do it.
So, when musing about the gemini view, I was really just thinking of ways to
reduce the dumb AI bots while still giving unrestricted access to anyone else
with a gemini client.
That said, I will play with this change, as this will for sure let me
prioritize b4/curl traffic over anything that identifies itself as a browser.
Thanks!
-K
* Re: Gemini protocol view?
2025-03-04 22:14 ` Eric Wong
2025-03-05 0:45 ` [PATCH] listener: don't set listen backlog on inherited sockets Eric Wong
@ 2025-03-07 21:25 ` Eric Wong
2025-03-11 19:26 ` Eric Wong
1 sibling, 1 reply; 6+ messages in thread
From: Eric Wong @ 2025-03-07 21:25 UTC
To: Konstantin Ryabitsev; +Cc: meta
Eric Wong <e@80x24.org> wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > Hi:
> >
> > Just a wild idea -- how hard would it be to present a gemini:// protocol view
> > in addition to the web view?
>
> Probably not hard at all, but I haven't looked at it at all and
> am still trying to figure out how to make a 2.0 release soon with
> codesearch support...
Yeah, definitely doable and a mostly mindless distraction
while hitting mental blocks with codesearch UI.
Not sure about clients, I'm just using something like:
echo $URL | openssl s_client -crlf -connect ...
right now; but I know https://git.sr.ht/~rkta/w3m has a gemini
branch which I'll have to look at. I loathe having to program
new TUI keybindings into my muscle memory.
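For anyone without openssl handy, roughly the same request can be
sketched in Python.  This is only an illustration under assumptions:
the function names are made up, certificate validation is disabled
because many Gemini servers use self-signed certs (a real client
would pin them TOFU-style), and the server is expected to close the
connection after the response.

```python
import socket
import ssl

MAX_URL_BYTES = 1024  # request URL limit in the Gemini protocol


def build_request(url: str) -> bytes:
    """A Gemini request is just the absolute URL followed by CRLF."""
    raw = url.encode("utf-8")
    if len(raw) > MAX_URL_BYTES:
        raise ValueError("URL exceeds 1024 bytes")
    return raw + b"\r\n"


def parse_header(line: bytes):
    """Split '<2-digit status> <meta>CRLF' into (status, meta)."""
    text = line.decode("utf-8").rstrip("\r\n")
    status, _, meta = text.partition(" ")
    if len(status) != 2 or not (status.isascii() and status.isdigit()):
        raise ValueError("malformed status: %r" % status)
    return int(status), meta


def fetch(host: str, url: str, port: int = 1965, timeout: float = 10.0):
    """Fetch one URL over a fresh TLS connection: (status, meta, body)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # assumption: no CA validation
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        with ctx.wrap_socket(tcp, server_hostname=host) as tls:
            tls.sendall(build_request(url))
            with tls.makefile("rb") as f:
                status, meta = parse_header(f.readline())
                return status, meta, f.read()  # body runs until EOF
```

Each fetch() call opens a new TLS connection, since the protocol
serves one response per connection.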
notes:
* The use of ``` for preformatted blocks has no established
escape, but I suppose prefixing such lines with
"\N{ZERO WIDTH SPACE}" is fine for escaping.
* the specifications are developed on GitLab so not accessible
to non-JS users (which seems like a chunk of the target
audience for gemini)
* text/gemini is better than the mess that is Markdown (or similar)
* lack of TLS connection reuse sucks on high-latency networks
if trying to browse through a bunch of messages quickly
(something that an NNTP/IMAP client would do)
* lack of compression would suck for /T/ and /t/ endpoints
for trying to browse a thread w/ a single request...
* titan:// is an extension that can be used like POST/PUT if we
ever support non-SMTP inputs; especially with TOFU + client
certs (`git send-titan' anyone?)
* URLs being limited to 1024 bytes shouldn't be a problem for
non-spam Message-IDs (Xapian has a lower limit (244) for terms)
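The zero-width-space escaping from the first note could look roughly
like this (a Python sketch, not public-inbox code; the function name
is hypothetical, and it assumes bodies are split on "\n" only):

```python
ZWSP = "\u200b"   # ZERO WIDTH SPACE
FENCE = "`" * 3   # the text/gemini preformatting toggle


def preformatted(body: str) -> str:
    """Wrap body in a text/gemini preformatted block."""
    out = [FENCE]
    for line in body.split("\n"):
        # a body line starting with the fence would end the block
        # early, so push it off column 0 with an invisible character
        out.append(ZWSP + line if line.startswith(FENCE) else line)
    out.append(FENCE)
    return "\n".join(out) + "\n"
```

The trade-off is that copy-pasting from the rendered block carries
the invisible character along with it.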
more to come...
* Re: Gemini protocol view?
2025-03-07 21:25 ` Gemini protocol view? Eric Wong
@ 2025-03-11 19:26 ` Eric Wong
0 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2025-03-11 19:26 UTC
To: Konstantin Ryabitsev; +Cc: meta
Eric Wong <e@80x24.org> wrote:
> * lack of compression would suck for /T/ and /t/ endpoints
> for trying to browse a thread w/ a single request...
The protocol specification mentions URL fragments, but the
text/gemini specification doesn't seem to support linking
within the same document. So I'm wondering how useful the
/T/ and /t/ views would be...
> more to come...
Thread overview: 6+ messages
2025-03-04 19:44 Gemini protocol view? Konstantin Ryabitsev
2025-03-04 22:14 ` Eric Wong
2025-03-05 0:45 ` [PATCH] listener: don't set listen backlog on inherited sockets Eric Wong
2025-03-05 16:54 ` Konstantin Ryabitsev
2025-03-07 21:25 ` Gemini protocol view? Eric Wong
2025-03-11 19:26 ` Eric Wong
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git