Date | Commit message |
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
These headers can conflict with headers in the DKIM signature;
and parsing the DKIM-Signature header to determine whether or
not we can safely add a header would be more code and CPU
cycles.
Since IMAP seems fine without these headers (and JMAP will
likely be, too), there's likely no need to continue appending
these to every message. Nowadays, developers seem sufficiently
trained to use URLs with Message-IDs in them. So drop the
headers and save some cycles and bandwidth all around.
|
|
Since extindex is an amalgamation of several inboxes, discerning
an appropriate address for List-Post: would be expensive and
most likely unnecessary. Some legacy/historical inboxes may
have no active address, either, so don't attempt to set the
List-Post header if no addresses are configured.
|
|
{pi_config} may be confused with the documented `PI_CONFIG'
environment variable, and we'll favor vowel-removal to be
consistent with our usage of object references.
The `pi_' prefix may stay in some places for now, since a
separate namespace may come into this codebase for local/private
client-tooling.
For InboxIdle, we'll also remove an invalid comment about
holding a reference to the PublicInbox::Config object.
|
|
They're PublicInbox::Inbox objects just like the rest of
the non-NNTP code. So rename the NNTP code for consistency
with the rest of the codebase. Furthermore, {ng} and $ng
may be confused with the `--ng' switch for -init, and that's
a non-ref scalar string.
|
|
Using a non-zero-length separator for `join' requires extra work
inside Perl. We can shove the cost of appending "\r\n" into the
`map' loop, instead. This speeds up the `join' operation.
The "deferred" log entry for a "LISTGROUP org.kernel.vger.linux-kernel"
command (with nearly 3.8 million messages) goes from ~3.96s to 3.86s
on my workstation.
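
A minimal sketch of the two approaches, assuming a small list of
article numbers in place of a real LISTGROUP response (the variable
names are illustrative and not taken from the NNTP code):

```perl
use strict;
use warnings;

# hypothetical article numbers standing in for a LISTGROUP response
my @nums = (1 .. 5);

# before: non-empty separator, plus a trailing CRLF for the last line
my $old = join("\r\n", @nums) . "\r\n";

# after: append CRLF inside `map', then join with an empty separator
my $new = join('', map { "$_\r\n" } @nums);

print $old eq $new ? "identical\n" : "different\n";
```

Both forms produce the same bytes; only where the "\r\n" gets
appended differs.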
|
|
We can invalidate ibx->{newsgroup} at config load-time to avoid
having to check ibx->{newsgroup} validity in To/Cc: matching.
This saves us some hash lookups in all cases.
|
|
For ->ALL users, this mitigates the regression introduced
by commit 811b8d3cbaa790f59b7b107140b86248da16499b
("nntp: xref: use ->ALL extindex if available"), since
it's common to cross post messages to some mailing
lists with per-list trailers for unsubscribe information.
We won't bother dealing with Bcc-ed messages since those
are nearly all spam when it comes to public mailing lists.
Fixes: 811b8d3cbaa790f5 ("nntp: xref: use ->ALL extindex if available")
Link: https://public-inbox.org/meta/20201130194201.GA6687@dcvr/
|
|
It's not worth confusing hackers reading the source to have
two ways to access the same (large) hash table. So just
go through PublicInbox::Config objects for now since the
extra hash lookup isn't going to be noticeable.
I've also started favoring "for" instead of "foreach"
since they're the same perlop and "for" is less wear on
my fingers + keyboard.
|
|
Another 30-40% speedup when testing against a local
lore.kernel.org mirror. In either case, we'll consistently sort
the response for ease-of-testing and client-side
cache-friendliness.
|
|
This lets us take advantage of mid_lookup speedup from the
previous commit.
While we're at it, start moving towards using `$ibx' as the
abbreviation for PublicInbox::Inbox objects even in the NNTP
code, since they've been shared with the WWW code for several
years, now.
|
|
We can reuse "xref3" information in extindex to quickly match
messages matching a given Message-ID across hundreds or
thousands of newsgroups with a few SQL statements.
"XHDR Xref $MESSAGE_ID" is around 40% faster, on top of
previous speedups.
|
|
We can amortize the cost of NEWGROUPS time filtering using the
long_response API. This lets us handle hundreds/thousands of
inboxes without monopolizing the event loop for this command.
Further speedup is possible using MiscSearch, but that requires
not-yet-done indexing changes to MiscIdx.
|
|
Getting Xref for cross-posted messages is an O(n) operation
where `n' is the number of newsgroups on the server. This works
acceptably when there are dozens of groups, but would be
unacceptable when there are tens of thousands of newsgroups.
With ~140 newsgroups, a lore.kernel.org mirror already handles
"XHDR Xref $MESSAGE_ID" requests around 30% faster after
creating the xref3.idx_nntp index.
The SQL additions to ExtSearch.pm may be a bit strange and
seem more appropriate for Over.pm; however it currently makes
sense to me since those bits of over.sqlite3 access are
exclusive to ExtSearch and can't be used by traditional
v1/v2 inboxes...
|
|
We'll be using the `xref3' table in extindex to speed up xref(),
and that'll require comparisons against $smsg->{blob}. So pass
the entire $smsg through.
|
|
Reduce screen real estate usage to reduce human attention span
requirements.
|
|
Iterating through many newsgroups can hog the event loop if many
random seeks are required. Avoid monopolizing the event loop in
that case by using the long_response API.
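
The idea of yielding between bounded chunks of work can be sketched
as below. This is a toy scheduler under stated assumptions, not the
real long_response API: `long_response_sketch', the chunk size of 3
and the group names are all hypothetical.

```perl
use strict;
use warnings;

my @queue; # pending long-running responses (closures)

# hypothetical helper: register a callback which does a bounded
# amount of work and returns true if it needs to run again
sub long_response_sketch { push @queue, $_[0] }

my @groups = map { "inbox.test.$_" } (1 .. 10);
my @out;
my $i = 0;
long_response_sketch(sub {
	my $end = $i + 3; # at most 3 groups per turn
	$end = @groups if $end > @groups;
	push @out, $groups[$i++] while $i < $end;
	$i < @groups; # true if more work is left
});

# toy scheduler: each callback gets one turn, then goes to the back
# of the queue so other clients are not starved
my $turns = 0;
while (my $cb = shift @queue) {
	$turns++;
	push @queue, $cb if $cb->();
}
print scalar(@out), " groups in $turns turns\n";
```

With several callbacks queued, each would advance by one chunk per
pass instead of one client running to completion first.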
For now, we can still rely on grep() since it seems to work
reasonably well with 50K test newsgroup names.
|
|
This matches the example shown in RFC 3977, section 7.6.1.3
|
|
With 50K newsgroups, the filtering phase goes from ~2000 seconds
to ~90 MILLISECONDS by relying on the grep perlop. This moves
->over checking out of the main dispatch and amortizes the cost
via long_response. (Fairly scheduled) long_response time in
newnews_i now takes ~360 seconds as opposed to ~30 seconds
before this change; however, the initial filtering speedup
eliminating 2000s is more than worth it.
|
|
Based on experiences with the IMAP server, this ought to be
significantly faster (as to be demonstrated in the next
commit).
|
|
This simplifies callers and allows empty newsgroups to be
represented (the WWW UI may be insufficient there, too).
|
|
This is memoized, and may allow us some future flexibility w.r.t
PublicInbox::Inbox-like objects. While we're at it, use
defined-or ("//") in case somebody really set a public-inbox
creation time to the Unix epoch.
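
A short illustration of why "//" matters here, assuming a
hypothetical inbox hash with a {ctime} field set to the epoch (the
field name and fallback value are made up for this sketch):

```perl
use strict;
use warnings;

# hypothetical inbox whose creation time is exactly the Unix epoch
my $ibx = { ctime => 0 };
my $fallback = 1234567890; # arbitrary stand-in value

my $with_or  = $ibx->{ctime} || $fallback; # 0 is false, fallback wins
my $with_dor = $ibx->{ctime} // $fallback; # 0 is defined, 0 is kept

print "||: $with_or\n//: $with_dor\n";
```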
|
|
Perhaps some NNTP clients would be unhappy with the old value
"y". So use a bit more bandwidth+space to use the server-name
and historical "!not-for-mail" tail-entry to better conform to
a published RFC.
Reported-by: Andrey Melnikov <temnota.am@gmail.com>
|
|
...instead of spaces. This is specified in RFC 5536 3.1.4.
Include references to RFC 1036, 5536 and 5537 in our docs while
we're at it.
Reported-by: Andrey Melnikov <temnota.am@gmail.com>
Link: https://public-inbox.org/meta/CA+PODjpUN5Q4gBFQhAzUNuMasVEdmp9f=8Uo0Ej0mFumdSwi4w@mail.gmail.com/
|
|
Apparently they happen (triggered by my -imapd instance), so
bail out by closing the underlying socket rather than stopping
the event loop and daemon process.
|
|
This prepares us for future changes to improve scalability to
many inboxes.
|
|
We cannot blindly use the selected newsgroup for
HEAD/ARTICLE/BODY requests using Message-ID, since
those commands look across all newsgroups, not just
the selected one (if any).
So stuff a reference to the Inbox object into $smsg.
We can reduce args passed into set_nntp_headers() and
msg_hdr_write(), too.
Fixes: 0e6ceff37fc38f28 ("nntp: support slow blob retrievals")
|
|
The return value of art_lookup changed but this command wasn't
updated since it wasn't tested.
Fixes: 0e6ceff37fc38f28 ("nntp: support slow blob retrievals")
|
|
Since -edit and -purge should be rare and TOCTOU around them
rarer still, missing {blobs} could be indicative of a real bug
elsewhere. Warn on them.
And I somehow ended up with 3 different field names for Inbox
objects. Perhaps they'll be made consistent in the future.
|
|
Since the removal of pseudo-hash support in Perl 5.10, the
"fields" module no longer provides the space or speed benefits
it did in 5.8. It also does not allow for compile-time checks,
only run-time checks.
To me, the extra developer overhead in maintaining "use fields"
args has become a hassle. None of our non-DS-related code uses
fields.pm, nor do any of our current dependencies. In fact,
Danga::Socket (which DS was originally forked from) and its
subclasses are the only fields.pm users I've ever encountered in
the wild. Removing fields may make our code more approachable
to other Perl hackers.
So stop using fields.pm and locked hashes, but continue to
document what fields do for non-trivial classes.
|
|
While this circular reference was carefully managed to not leak
memory, it was still triggering a warning at -imapd/-nntpd
shutdown due to the EPOLL_CTL_DEL op failing after the $Epoll FD
gets closed.
So remove the circular reference by providing a ref to `undef',
instead.
|
|
Having `git cat-file' as a separate process naturally lends
itself to asynchronous dispatch. Our event loop for -nntpd no
longer blocks on slow git storage.
Pipelining in -imapd was tricky and bugs were exposed by
mbsync(1). Update t/nntpd.t to support pipelining ARTICLE
requests to ensure we don't have the same problems -imapd
did during development.
|
|
This matches PublicInbox::IMAP::event_step and will allow us to
handle blob retrievals from git asynchronously without falling
over on pipelined requests.
|
|
Doing a ref($obj) string comparison ties us to IO::Socket::SSL
(and OpenSSL). In the future, we may support GnuTLS or other TLS
implementations. This was already done in the IMAP code.
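
One way to avoid the class-name comparison is a capability check
via ->can. The sketch below is an assumption about the general
technique, not necessarily the check the NNTP/IMAP code uses; the
two packages are stand-ins and not real TLS libraries.

```perl
use strict;
use warnings;

# stand-in classes for this sketch; neither is a real TLS library
package My::FakeTLS { sub new { bless {}, shift } sub accept_SSL { 1 } }
package My::Plain   { sub new { bless {}, shift } }
package main;

my %seen;
for my $sock (My::FakeTLS->new, My::Plain->new) {
	# brittle alternative: ref($sock) eq 'IO::Socket::SSL'
	# capability check: works for any class providing the method
	my $tls = $sock->can('accept_SSL') ? 'yes' : 'no';
	$seen{ref($sock)} = $tls;
	print ref($sock), " TLS: $tls\n";
}
```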
|
|
For v1 inboxes (and possibly v2 in the future, for VACUUM),
public-inbox-compact replaces over.sqlite3 with a new file.
This doesn't need an extra inotify watch descriptor
(or FD for kevent) at the moment, so it can coexist nicely on
systems w/o IO::KQueue or Linux::Inotify2.
|
|
We'll continue to favor simpler data models that can be
used directly rather than wasting time and memory with
accessor APIs.
The ->from, ->to, ->cc, ->mid, ->subject, ->references methods can
all be trivially replaced by hash lookups since all their values
are stored in doc_data. Most remaining callers of those methods
were test cases, anyways.
->from_name is only used in the PSGI code, so we can just
use ->psgi_cull to take care of populating the {from_name}
field.
|
|
PublicInbox::Smsg::date remains the only exception which
requires any subroutine calls, here, so we'll have
a branch just for that.
|
|
Since PublicInbox::Eml doesn't parse MIME subparts
up front, it can replace most uses of Email::Simple
without performance penalty.
This will eventually allow us to lower overall internal
API footprint by not having to keep the MIME vs Simple
distinction.
|
|
This allows us to simplify some of our existing code and make
future changes easier.
I doubt anybody goes through the trouble to have a Perl
installation without zlib support. The zlib source code is even
bundled with Perl since 5.9.3 for systems without existing zlib
development headers and libraries.
Of course, zlib is a requirement of git, too; and we're not
going to stop using git :)
[squashed: "wwwaltid: use gzipfilter up front"]
|
|
It's unnecessary overhead for anything which does Email::MIME
parsing. It was never done for v2 indexing, even though v1->v2
conversions did NOT remove those From_ lines. There was never a
need to remove From_ lines in the v1 SearchIdx paths, either.
Hitting a /$INBOX_URL/$MSGID/T/ endpoint with an 18 message
thread reveals a ~0.5% speed improvement. This will become
more apparent when we have a faster MIME parser.
|
|
While this is not a known problem in practice,
RFC 3977 section 3.1 states:
Keywords and arguments MUST each be separated by one
or more space or TAB characters.
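
A minimal sketch of tokenizing a command line under that rule,
assuming the line has already had its trailing CRLF removed (the
sample command is illustrative only):

```perl
use strict;
use warnings;

# a command line using TAB and repeated spaces between tokens
my $line = "XHDR\tSubject  1-5";

# split on one or more space or TAB characters
my ($keyword, @args) = split(/[ \t]+/, $line);
print lc($keyword), ": @args\n";
```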
|
|
This allows us to consistently enforce the same Message-ID
extraction rules everywhere and makes it easier for us to
make changes in the future.
Update scripts/ssoma-replay, as well, but don't rely on
PublicInbox::* modules in that since it's legacy and
public-inbox was never a dependency of ssoma.
|
|
Since the introduction of over.sqlite3, SearchMsg is not tied to
our search functionality in any way, so stop confusing ourselves
and future hackers by just calling it "PublicInbox::Smsg".
Add a missing "use" in ExtMsg while we're at it.
|
|
I didn't wait until September to do it, this year!
|
|
We can cut down on the number of operations required
using "grep" instead of "foreach".
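
For illustration, the same filter written both ways; the group
names and the pattern are hypothetical:

```perl
use strict;
use warnings;

my @groups = qw(inbox.comp.lang.perl inbox.comp.mail inbox.test);

# foreach version: explicit loop, test and push
my @loop;
foreach my $g (@groups) {
	push @loop, $g if $g =~ /\Ainbox\.comp\./;
}

# grep version: one op does the iteration and the filtering
my @filtered = grep { /\Ainbox\.comp\./ } @groups;

print "@filtered\n";
```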
|
|
We can rely on autovivification to turn the `undef' value of {wbuf}
into an arrayref.
Furthermore, "push" returns the (new) size of the array since at
least Perl 5.0 (I didn't look further back), so we can use that
return value instead of calling "scalar" again.
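
Both behaviors can be seen in a few lines, using a plain hashref
as a stand-in for the client object:

```perl
use strict;
use warnings;

my $self = {}; # {wbuf} does not exist yet

# push autovivifies $self->{wbuf} into an arrayref and
# returns the new number of elements
my $n1 = push @{$self->{wbuf}}, 'first';
print ref($self->{wbuf}), " $n1\n";

my $n2 = push @{$self->{wbuf}}, 'second', 'third';
print "$n2\n";
```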
|
|
We cannot safely call "fileno(undef)" without bringing down the
entire -nntpd process :x. To ensure no logging regression, we
now stash the FD for the duration of the long response to ensure
the error can be matched to the original command in logs.
Fixes: 207b89615a1a0c06 ("nntp: remove cyclic refs from long_response")
|
|
Time::Local has the concept of a "rolling century" which is
defined at 50 years on either side of the current year. Since
it's now 2020 and >50 years since the Unix epoch, the year "70"
gets interpreted by Time::Local as 2070-01-01 instead of
1970-01-01.
Since NNTP servers are unlikely to store messages from the
future, we'll feed 4-digit year to Time::Local::{timegm,timelocal}
and hopefully not have to worry about things until Y10K.
This fixes test failures on t/v2writable.t and t/nntpd.t since
2020-01-01.
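
A sketch of expanding a 2-digit year before calling timegm. The
expansion rule below follows the RFC 3977 wording (current century
if yy is not greater than the current year, previous century
otherwise); whether the actual fix uses exactly this logic is an
assumption, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# NNTP-style date with a 2-digit year (yymmdd)
my ($yy, $mm, $dd) = ('70', '01', '01');

# expand to 4 digits ourselves instead of relying on the
# "rolling century" guess inside Time::Local
my $cur_year = (gmtime)[5] + 1900;
my $century = $cur_year - ($cur_year % 100);
my $yyyy = $yy <= ($cur_year % 100)
	? $century + $yy
	: $century - 100 + $yy;

# months are 0-based for timegm
my $t = timegm(0, 0, 0, $dd, $mm - 1, $yyyy);
print "$yyyy => $t\n";
```

Passing the 4-digit year means Time::Local takes it literally, so
"70" maps to 1970 for as long as the current year is before 2070.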
|
|
Introduce xover_i, which does the same thing as the anonymous
sub it replaces.
|
|
Introduce hdr_msgid_range_i, which does the same thing as the
anonymous sub it replaces.
|