Date | Commit message (Collapse) |
|
Gnus seems to start Message-IDs with 10 random characters
followed by ".fsf@$DOMAIN". In case of mis-linkification or
mis-selection from stopping at the `@', ensuring the first 14
characters are accepted as a search parameter for the truncated
Message-ID improves usability.
|
|
Since signalfd is often combined with our event loop, give it a
convenient API and reduce the code duplication required to use it.
EventLoop is replaced with ::event_loop to allow consistent
parameter passing and avoid needlessly passing the package name
on stack.
We also avoid exporting SFD_NONBLOCK since it's the only flag we
support. There's no sense in having the memory overhead of a
constant function when it's in cold code.
|
|
This allows us to avoid creating ibx->{search}->{xdb} at this
spot by using an `undef' value. This is a step towards
eliminating the innocuous "/path/to/inboxdir/xap15 has no shards"
messages introduced in commit 63d7b8ce (daemons: revamp
periodic cleanup task, 2021-09-23)
|
|
Apparently URLs can be configured relatively for HTTP(S) setups,
attempt to support them when linking to cross-posted messages.
This also fixes the top-row (mirror/help/color/Atom feed) links
in /$INBOX_URL/$EXTMSG_MSGID/T/ (and /t/) URLs.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210902191239.cmbxlmjqcsmdzmqp@meerkat.local/
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
The ->mset method always returns a Xapian mset nowadays, so
naming a parameter {mset} is too confusing. As it does with
MiscSearch, setting the {relevance} parameter to -1 now sorts by
ascending docid order. -2 is now supported for descending
docid order, too, since it may be useful for lei users.
|
|
If a message can't be found in ->ALL, we shouldn't attempt to
enter code paths which iterate normal inboxes or attempt to
access non-existent fields (e.g. {name}, {newsgroup},
{inboxdir}) in the ExtSearch object.
|
|
{pi_config} may be confused with the documented `PI_CONFIG'
environment variable, and we'll favor vowel-removal to be
consistent with our usage of object references.
The `pi_' prefix may stay in some places, for now; since a
separate namespace may come into this codebase for local/private
client-tooling.
For InboxIdle, we'll also remove an invalid comment about
holding a reference to the PublicInbox::Config object, too.
|
|
{ibx} is shorter and is the most prevalent abbreviation
in indexing and IMAP code, and the `$ibx' local variable
is already prevalent throughout.
In general, the codebase favors removal of vowels in variable
and field names to denote non-references (because references are
"lighter" than non-references).
So update WWW and Filter users to use the same code since
it reduces confusion and may allow easier code sharing.
|
|
Using "eidx_key:" boolean prefix to limit results to a given
inbox, we can use ->ALL to emulate and replace per-Inbox
xap15/[0-9] search indices.
With this change, the presence of "extindex.all.topdir" in the
$PI_CONFIG will cause the WWW code to use that extindex and
ignore per-inbox Xapian DBs in xap15/[0-9].
Unfortunately IMAP search still requires old per-inbox indices,
for now. Mapping extindex Xapian docids to per-Inbox UIDs and
vice-versa is proving tricky. Fortunately, IMAP search is
rarely used and optional. The RFCs don't specify expensive
phrase search, either, so `indexlevel=medium' can be used in
per-inbox Xapian indices to save space.
For primarily WWW (and future JMAP) users; this should result in
significant disk space, FD, and page cache footprint savings for
large instances with many inboxes and many cross-posted
messages.
|
|
As with NewsWWW and NNTP, we can use ->ALL to completely
avoid trying SQLite/Xapian lookups across hundreds/thousands
of inboxes.
|
|
While Perl implements tail recursion via `goto' which allows
avoiding warnings on deep recursion. It doesn't (as of 5.28)
optimize the speed of such dispatches, though it may reduce
ephemeral memory usage.
Make the code less alien to hackers coming from other languages
by using normal subroutine dispatch. It's actually slightly
faster in micro benchmarks due to the complexity of `goto &NAME'.
|
|
With many inboxes, checking multiple SQLite repos will be slow
and time-consuming, so ensure we can schedule it fairly between
multiple inboxes.
|
|
In Perl, we can simplify callers by passing a single array
all the way down the stack instead of a single array ref which
needs to be expanded every call.
|
|
Nearly all of the search uses in the production code rely on
a Xapian mset iterator being returned (instead of an array
of $smsg objects). So default to returning the mset and move
the burden of smsg array conversion into the test cases.
|
|
We can avoid importing mdocid() in several places by using
this method, simplifying callers.
|
|
Once again, over.sqlite3 contains everything necessary for
Message-ID resolution. Also, Xapian may be completely
unnecessary with the advent of over.sqlite3, but that's for
another time.
|
|
We'll continue to favor simpler data models that can be
used directly rather than wasting time and memory with
accessor APIs.
The ->from, ->to, -cc, ->mid, ->subject, >references methods can
all be trivially replaced by hash lookups since all their values
are stored in doc_data. Most remaining callers of those methods
were test cases, anyways.
->from_name is only used in the PSGI code, so we can just
use ->psgi_cull to take care of populating the {from_name}
field.
|
|
And use Exporter to make our life easier, since WwwAltId was
using a non-existent PublicInbox::WwwResponse namespace in error
paths which doesn't get noticed by `perl -c' or exercised by
tests on normal systems.
Fixes: 6512b1245ebc6fe3 ("www: add endpoint to retrieve altid dumps")
|
|
No reason to use the ->getline interface for small responses.
|
|
Since the introduction of over.sqlite3, SearchMsg is not tied to
our search functionality in any way, so stop confusing ourselves
and future hackers by just calling it "PublicInbox::Smsg".
Add a missing "use" in ExtMsg while we're at it.
|
|
We need to escape ampersands (and some other characters for href
attributes), so introduce a `mid_href' sub to do just that.
'<', '>' and '"' were always escaped, so there's no risk of tag
or attribute injection, but creative Message-IDs could cause
confusion for some parsers and generate invalid URLs.
Start getting rid of the bloated, over-engineered OO Hval API
while we're at it, I only noticed this bug because I started
killing off Hval->new* callers.
|
|
I didn't wait until September to do it, this year!
|
|
gmane still has a NNTP server, so update links to point to it.
cf. https://lars.ingebrigtsen.no/2020/01/06/whatever-happened-to-news-gmane-org/
|
|
There's a bunch of leftover "require" and "use" statements we no
longer need and can get rid of, along with some excessive
imports via "use".
IO::Handle usage isn't always obvious, so add comments
describing why a package loads it. Along the same lines,
document the tmpdir support as the reason we depend on
File::Temp 0.19, even though every Perl 5.10.1+ user has it.
While we're at it, favor "use" over "require", since it it gives
us extra compile-time checking.
|
|
This allows to do some compile-time checking and fills in a
missing "use" in PublicInbox::NewsWWW, allowing it to be used
standalone and independently of PublicInbox::WWW
|
|
This allows callers to pass named (not anonymous) subs.
Update all retry_reopen callers to use this feature, and
fix some places where we failed to use retry_reopen :x
|
|
Another place where we can replace anonymous subs with named
subs by passing a user-supplied arg.
|
|
We rely on Inbox::mm nowadays.
|
|
|
|
We already escape the user-provided Message-IDs (so there's no
security problem AFAIK), but the URL templates which exist in
our source code were not escaped properly.
This quiets down tidy(1).
|
|
It's not worth it, and attempts to wildcard off
single-character Message-IDs(*) causes Xapian to error
out in unpredictable ways:
something terrible happened at /usr/lib/x86_64-linux-gnu/perl5/5.24/Search/Xapian/Enquire.pm line 54.
...propagated at lib/PublicInbox/Search.pm line 209.
So don't bother.
(*) because people blindly hit 'y' or 'n' when git-send-email
prompted them for In-Reply-To.
|
|
"LIKE" in SQLite (and other SQL implementations I've seen) is
expensive with nearly 3 million messages in the archives.
This caused some partial Message-ID lookups to take over 600ms
on my workstation (~300ms on a faster Xeon). Cut that to below
under 30ms on average on my workstation by relying exclusively
on Xapian for partial Message-ID lookups as we have in the past.
Unlike in the past when we tried using Xapian to match partial
Message-IDs; we now optimize our indexing of Message-IDs to
break apart "words" in Message-IDs for searching, yielding
(hopefully) "good enough" accuracy for folks who get long URLs
broken across lines when copy+pasting.
We'll also drop the (in retrospect) pointless stripping of
"/[tTf]" suffixes for the partial match, since anybody who
hits that codepath would be hitting an invalid message ID.
Finally, limit wildcard expansion to prevent easy DoS vectors
on short terms.
And blame Pine and alpine for generating Message-IDs with
low-entropy prefixes :P
|
|
* origin/master:
nntp: allow and ignore empty commands
mbox: do not barf on queries which return no results
nntp: fix NEWNEWS command
searchview: fix non-numeric comparison
Allow specification of the number of search results to return
githttpbackend: avoid infinite loop on generic PSGI servers
http: fix modification of read-only value
extmsg: use news.gmane.org for Message-ID lookups
extmsg: rework partial MID matching to favor current inbox
Update the installation instructions with Fedora package names
nntp: do not drain rbuf if there is a command pending
nntp: improve fairness during XOVER and similar commands
searchidx: do not modify Xapian DB while iterating
Don't use LIMIT in UPDATE statements
|
|
Searching across different inboxes is expensive without
SQLite (or Xapian) installed, so avoid doing expensive tree
lookups in git. Since SQLite is required for Xapian
support anyways, we won't need to check Xapian, either.
Sites without SQLite installed will simply 404 if somebody
requests a message which isn't in the current inbox.
|
|
http://mid.gmane.org/ has not worked for a while, but their NNTP
server continues to work. Use that and perhaps give NNTP more
exposure.
Reported-by: Jonathan Corbet <corbet@lwn.net>
|
|
The current inbox is more important for partial Message-ID
matching, so we try harder on that to fix common errors before
moving onto other inboxes. Then, prevent expensive scanning of
other inboxes by requiring a Message-ID length of at least 16
bytes.
Finally, we limit the overall partial responses to 200 when
scanning other inboxes to avoid excessive memory usage.
|
|
The current inbox is more important for partial Message-ID
matching, so we try harder on that to fix common errors before
moving onto other inboxes. Then, prevent expensive scanning of
other inboxes by requiring a Message-ID length of at least 16
bytes.
Finally, we limit the overall partial responses to 200 when
scanning other inboxes to avoid excessive memory usage.
|
|
'Q' is merely a convention in the Xapian world, and is close
enough to unique for practical purposes, so stop using XMID
and gain a little more term length as a result.
|
|
In general, they are, but there's no way for or general purpose
mail server to enforce that. This is a step in allowing us
to handle more corner cases which existing lists throw at us.
|
|
This likely has no real world implications, though, as we
fall back to Msgmap lookups anyways.
Broken since commit 7eeadcb62729b0efbcb53cd9b7b181897c92cf9a
("search: remove unnecessary abstractions and functionality")
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
Apparently mid.mail-archive.com does not support HTTPS,
and the HTTP version redirects to the search query, anyways.
|
|
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&',
"'", '; '(', ')', '*', '+', ',', ';', '=' are all allowed
in path-absolute where we have the Message-ID.
In any case, it seems '@' is fairly common in path components
nowadays and too common in Message-IDs.
|
|
gmane is down at the moment, so lower that in priority
(hopefully it will be brought back up, again). Wikipedia also
lists a few more project-specific list providers, so include
those as well: https://en.wikipedia.org/wiki/Message-ID
|
|
While an inbox may have multiple URLs, we will favor
the existing URL for the current inbox on partial matches
to avoid confusing users or slowing them down by requiring
a new TCP connection.
|
|
Reduce the size of hashes a bit and drops some unneeded hash
lookups for uncommon paths.
|
|
Another step towards a consistent WWW UI...
|
|
Automatic inbox switching was a potentially deceptive pattern
and surprises readers who do not check the URL bar closely.
Furthermore, a message could be cross-posted to multiple lists,
too.
|
|
Exposing compressed Message-IDs in URLs was a mistake,
remove a remnant of it.
|