Date | Commit message (Collapse) |
|
The only place where we could return wide characters with -httpd
was the raw $INBOX_DIR/description text, which is now converted
to octets.
All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode,
so length() and bytes::length() are equivalent on reads. For
socket writes, any non-octet data would warn about wide characters
and we are strict in warnings with test_httpd.
All gzipped buffers are also octets, as is PublicInbox::Eml->body,
and anything from PerlIO objects ("git cat-file --batch" output,
filesystems), so bytes::length was unnecessary in all those places.
|
|
Sometimes users (or bots) may lead queries with '&' and
trigger uninitialized variable warnings, just ignore them
and give consumers a $ctx->{qp}->{''} entry.
While we're in the area, pass a regexp rather than scalar string
to the `split' perlop to prevent Perl from recompiling the
regexp on every call.
|
|
|
|
git approxidate won't actually return times in the future,
so "1.{hour,day,year}.from.now" all return the current epoch
time.
So just use "now" and ensure we have a predictable time zone for
testing.
|
|
This greatly improves the usability of d:, dt:, and rt: search
prefixes for users already familiar git's "approxidate" feature.
That is, users familiar with the --(since|after|until|before)=
options in git-log(1) and similar commands will be able to use
those dates in the WWW UI.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
{pi_config} may be confused with the documented `PI_CONFIG'
environment variable, and we'll favor vowel-removal to be
consistent with our usage of object references.
The `pi_' prefix may stay in some places, for now; since a
separate namespace may come into this codebase for local/private
client-tooling.
For InboxIdle, we'll also remove an invalid comment about
holding a reference to the PublicInbox::Config object, too.
|
|
Expanding threads via over.sqlite3 for mbox.gz downloads without
Xapian effectively collapsing on the THREADID column leads to
repeated messages getting downloaded.
To avoid that situation, use a "has_threadid" Xapian metadata
flag that's only set on --reindex (and brand new Xapian DBs).
This allows admins to upgrade WWW or do --reindex in any order;
without worrying about users eating up bandwidth and CPU cycles.
|
|
There may be messages in the wild with wide characters in
headers which aren't non-RFC2047 encoded. Assume UTF-8 so
those fields can round trip through over.sqlite3.
This doesn't affect docdata.glass in Xapian, but it does
affect how over.sqlite3 stores the same deflated info.
|
|
v?fork failures seems to be the cause of locks not getting
released in -watch. Ensure lock release doesn't get skipped
in ->done for both v1 and v2 inboxes. We also need to do
everything we can to ensure DB handles, pipes and processes
get released even in the face of failure.
While we're at it, make failures around `git update-server-info'
non-fatal, since smart HTTP seems more popular anyways.
v2 changes:
- spawn: show failing command
- ensure waitpid is synchronous for inotify events
- teardown all fast-import processes on exception,
not just the failing one
- beef up lock_release error handling
- release lock on fast-import spawn failure
|
|
We no longer load or use Email::MIME outside of comparison
tests.
|
|
Instead, favor PublicInbox::MIME->new for non-attachment emails.
We may support alternatives to Email::MIME down the line.
We'll still keep Email::MIME->create to deal with attachments,
for now, but there's also a fair amount of test duplication
we should eliminate, later.
|
|
I didn't wait until September to do it, this year!
|
|
Lets always have Content-Disposition for files intended
to be downloaded for consumption by non-browsers, such
as pigz, zcat, "git am".
This is also to be consistent with the non-gzipped mbox
$MESSAGE_ID/raw endpoint.
|
|
Apparently I fixed this bug a while back in commit
f94c3a195a25a31d0215cd175938008fca473378 but did
not write tests.
|
|
Some users just want to run -mda, -watch, and/or -nntpd.
Let them run just those without forcing them to pull in a
bunch of dependencies.
|
|
Xapian upstream is slowly phasing out the XS-based Search::Xapian
in favor of the SWIG-generated "Xapian" package. While Debian and
both FreeBSD have Search::Xapian, OpenBSD only includes the "Xapian"
binding.
More information about the status of the "Xapian" Perl module here:
https://trac.xapian.org/ticket/523
|
|
This cuts down on lines of code in individual test cases and
fixes some misnamed error messages by using "$0" consistently.
This will also provide us with a method of swapping out
dependencies which provide equivalent functionality (e.g
"Xapian" SWIG can replace "Search::Xapian" XS bindings).
|
|
We want to be able to use run_script with *.t files, so
t/common.perl putting subs into the top-level "main" namespace
won't work. Instead, make it a module which uses Exporter
like other libraries.
|
|
We'll also introduce a tmpdir() API to give tempdirs
consistent names.
|
|
"mainrepo" ws a bad name and artifact from the early days when I
intended for there to be a "spamrepo" (now just the
ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be
especially confusing, since v2 needs at least two git
repositories (epoch + all.git) to function and we shouldn't
confuse users by having them point to a git repository for v2.
Much of our documentation already references "INBOX_DIR" for
command-line arguments, so use "inboxdir" as the
git-config(1)-friendly variant for that.
"mainrepo" remains supported indefinitely for compatibility.
Users may need to revert to old versions, or may be referring
to old documentation and must not be forced to change config
files to account for this change.
So if you're using "mainrepo" today, I do NOT recommend changing
it right away because other bugs can lurk.
Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
|
|
Rewrite a bunch of tests to use ordered input (emulating
"git config -l" output) so we can always walk sections in
the order they were given in the config file.
|
|
|
|
No point in using lower-level APIs for a PSGI test.
|
|
PublicInbox::Inbox objects have minimal dependencies, so
drop code to support old tests which existed before the
PublicInbox::Inbox object came into existence.
|
|
In case we encounter an odd system which has Search::Xapian
but not DBD::SQLite.
|
|
More tests work without Search::Xapian, now.
Usability issues still need to be fixed
|
|
We were relying on Danga::Socket using the "bytes" pragma,
previously. Nowadays, the "bytes" pragma is not recommended in
general, but bytes::length remains acceptable for getting the
byte-size of a scalar.
|
|
"LIKE" in SQLite (and other SQL implementations I've seen) is
expensive with nearly 3 million messages in the archives.
This caused some partial Message-ID lookups to take over 600ms
on my workstation (~300ms on a faster Xeon). Cut that to below
under 30ms on average on my workstation by relying exclusively
on Xapian for partial Message-ID lookups as we have in the past.
Unlike in the past when we tried using Xapian to match partial
Message-IDs; we now optimize our indexing of Message-IDs to
break apart "words" in Message-IDs for searching, yielding
(hopefully) "good enough" accuracy for folks who get long URLs
broken across lines when copy+pasting.
We'll also drop the (in retrospect) pointless stripping of
"/[tTf]" suffixes for the partial match, since anybody who
hits that codepath would be hitting an invalid message ID.
Finally, limit wildcard expansion to prevent easy DoS vectors
on short terms.
And blame Pine and alpine for generating Message-IDs with
low-entropy prefixes :P
|
|
* origin/master:
nntp: allow and ignore empty commands
mbox: do not barf on queries which return no results
nntp: fix NEWNEWS command
searchview: fix non-numeric comparison
Allow specification of the number of search results to return
githttpbackend: avoid infinite loop on generic PSGI servers
http: fix modification of read-only value
extmsg: use news.gmane.org for Message-ID lookups
extmsg: rework partial MID matching to favor current inbox
Update the installation instructions with Fedora package names
nntp: do not drain rbuf if there is a command pending
nntp: improve fairness during XOVER and similar commands
searchidx: do not modify Xapian DB while iterating
Don't use LIMIT in UPDATE statements
|
|
Having zero search results means we never get a chance
to populate the Content-Disposition header for mbox
downloads.
|
|
This ought to provide better performance and scalability
which is less dependent on inbox size. Xapian does not
seem optimized for some queries used by the WWW homepage,
Atom feeds, XOVER and NEWNEWS NNTP commands.
This can actually make Xapian optional for NNTP usage,
and allow more functionality to work without Xapian
installed.
Indexing performance was extremely bad at first, but
DBI::Profile helped me optimize away problematic queries.
|
|
We don't want non-fully-numeric limits being compared and
tripping warnings. While we're at it, avoid hard-coding
'200' and reuse $LIM as the default.
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
https://public-inbox.org/meta/CACBZZX5Gnow08r=0A1J_kt3a=zpGyMfvsqu8nAN7kacNnDm+dg@mail.gmail.com/
|