From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id DAFD81F4C0;
	Fri, 25 Oct 2019 12:22:14 +0000 (UTC)
Date: Fri, 25 Oct 2019 12:22:14 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: Re: RFC: monthly epochs for v2
Message-ID: <20191025122214.GA6947@dcvr>
References: <20191024195304.5b7zlx7e3vxfxmtg@chatter.i7.local>
	<20191024203503.GA31522@dcvr>
	<20191024212108.zfbwh7bmfbo3cgu5@chatter.i7.local>
	<20191024223451.GA17949@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20191024223451.GA17949@dcvr>
List-Id:

Eric Wong wrote:
> Konstantin Ryabitsev wrote:
> > On Thu, Oct 24, 2019 at 08:35:03PM +0000, Eric Wong wrote:
> > > > - if someone is only interested in a few months worth of archives, they
> > > >   don't have to clone the entire collection
> > > > - similarly, someone using public-inbox to feed messages to their inbox
> > > >   (e.g. using the l2md tool [1]) doesn't need to waste gigabytes storing
> > > >   archives they aren't interested in
> > >
> > > NNTP or d:YYYYMMDD..YYYYMMDD mboxrd downloads via HTTP search
> > > are better suited for those cases.
> >
> > I know you really like nntp, but I'm worried that with Big Corp's love of
> > deep packet inspection and filtering, NNTP ports aren't going to be usable
> > by a large subset of developers. We already have enough problems with port
> > 9418 not being reachable (and sometimes not even port 22). Since usenet's
> > descent into mostly illegal content, many corporate environments probably
> > have ports 119 and 563 blocked off entirely and changing that would be
> > futile.
>
> I would consider the possibility of an HTTP API which looks like
> NNTP commands, too. But it wouldn't work with existing NNTP
> clients... Maybe websockets can be used *shrug*

I forget, HTTP CONNECT exists for tunneling anything off
HTTP/HTTPS, too; so a generic LD_PRELOAD wrapper like proxychains
should work with most NNTP clients/libraries.

Net::NNTP is in every Perl5 install I know of, and nearly every
hacker's *nix system has Perl5, but proxychains isn't
ubiquitous (Debian packages it, at least).

> NNTP can also run off 80/443 if somebody has an extra IP. Not
> sure if supporting HTTP and NNTP off the same port is a
> possibility since some HTTP clients pre-connect TCP and NNTP is
> server-talks-first whereas HTTP is client-talks-first.
> >
> > > If people only want a backup via git (and not host HTTP/NNTP),
> > > it's FAR easier for them to run ubiquitous commands such as
> > > "git clone --mirror && git fetch" rather than
> > > "install $TOOL which may be out-of-date-or-missing-on-your-distro"
> >
> > I think that anyone who is likely to use public-inbox repositories for more
> > than just a copy of archives is likely to be using some kind of tool. I
> > mean, SMTP can be used with "telnet" but nobody really does. :) If we
> > provide a convenient library that supports things like intelligent selective
> > cloning, indexing, fetching messages, etc, then that would avoid everyone
> > doing it badly. In fact, libpublicinbox and bindings to most common
> > languages is probably something that should happen early on.
>
> I'm not sure about a libpublicinbox... I have been really
> hesitant to depend on shared C/C++ libraries whenever I use Perl
> or Ruby because of build and install complexity; especially for
> stuff that's not-yet-available on distros.
>
> Well-defined and stable protocols + data formats?
> Yes. 100 times yes.
>
> What would be nice is to have a local server so they could
> access everything via HTTP using curl or whatever HTTP library
> users want.
> On shared systems, it could be HTTP over a UNIX
> socket. I don't think libcurl supports Unix domain sockets,
> yet, but HTTP/1.1 parsers are pretty common.
>
> JSON is a possibility, too; but I'm not sure if JSON is even
> necessary if all that's exchanged are git blob OIDs and URLs for
> mboxes. Parsing MIME + RFC822(-ish) are already sunk costs.

More on that. As much as I may be in favor of "software freedom",
I'm even more in favor of "freedom _from_ software".

Reusing existing data formats as much as possible to minimize
the bug and attack surface is something I've been trying to do.
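
To illustrate the point about reusing existing data formats, here is a
minimal sketch of reading an mboxrd download with nothing but the Python
standard library. The function name split_mboxrd and the two sample
messages are hypothetical and not part of public-inbox. The only format
rule assumed is the mboxrd convention that body lines matching /^>*From /
get one extra ">" prepended, which the reader strips again.

```python
import re
from email import message_from_string

# mboxrd escapes body lines matching /^>*From / by prepending one ">".
# Undoing it means removing exactly one leading ">" from such lines.
_FROM_ESCAPED = re.compile(r"^>(>*From )", re.MULTILINE)

# Every message begins with a "From " separator line.
_SEPARATOR = re.compile(r"^From .*\n", re.MULTILINE)


def split_mboxrd(text):
    """Yield email.message.Message objects from an mboxrd-formatted string.

    Hypothetical helper for illustration only.
    """
    for chunk in _SEPARATOR.split(text):
        if not chunk.strip():
            continue  # nothing precedes the first separator line
        yield message_from_string(_FROM_ESCAPED.sub(r"\1", chunk))


# Made-up sample data: two tiny messages with escaped body lines.
SAMPLE = (
    "From mboxrd@z Thu Jan  1 00:00:00 1970\n"
    "Message-ID: <a@example>\n"
    "Subject: first\n"
    "\n"
    ">From here on, the body line was escaped\n"
    "\n"
    "From mboxrd@z Thu Jan  1 00:00:00 1970\n"
    "Message-ID: <b@example>\n"
    "Subject: second\n"
    "\n"
    ">>From a quoted line\n"
    "\n"
)

msgs = list(split_mboxrd(SAMPLE))
for msg in msgs:
    print(msg["Message-ID"], msg["Subject"])
```

No JSON layer is involved here: the headers and body come straight out of
the RFC822-ish text that mail tooling already has to parse.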