Date | Commit message (Collapse) |
|
This simplifies the code a bit and reduces the translation
overhead for looking directly at data from tools shipped
with Xapian.
While we're at it, fix thread-all.t :)
|
|
This is a high indicator of spam (but out-of-scope for this
particular module) but sometimes it is not, and people
legitimately forget to set a Subject: header at all.
|
|
This was necessary for the presence of the 0xa0 byte(*)
in the Subject: of the message at:
http://blade.nagaokaut.ac.jp/ruby/ruby-core/3220
(*) That is 0xa0, not 0x0a ("\n"), so I wonder if the
nibbles got swapped somehow.
|
|
It is possible to have double-escaped queries when copy and
pasting into browsers, so try to help users work around this
common error by automatically retrying after unescaping once.
Of course, we must inform the user when doing this results in
success, in case they really meant to search for a
double-escaped term which resulted in nothing.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
https://public-inbox.org/meta/CACBZZX5Gnow08r=0A1J_kt3a=zpGyMfvsqu8nAN7kacNnDm+dg@mail.gmail.com/
|
|
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
https://public-inbox.org/meta/CACBZZX5Gnow08r=0A1J_kt3a=zpGyMfvsqu8nAN7kacNnDm+dg@mail.gmail.com/
|
|
Sometimes bots generate malformed queries with sequential
"&" and ";" characters.
|
|
It should be helpful to know what error happened.
|
|
umask should never fail and set $@, but use the cached local
to be more explicit just in case.
|
|
Due to the asynchronous nature of SMTP, it is possible for the
root message of a thread (with no References/In-Reply-To)
to arrive last in a series. We must preserve the thread_id
of the ghost message in this case, as we do when vivifiying
non-root ghosts.
Otherwise, this causes threads to be broken when the root
arrives last.
|
|
I'm not sure if people use either and it's not in mairix
(where we base our abbreviations off of). Lets go
with the shorter prefix since it's easier-to-type.
|
|
Dovecot uses 'a'..'z' (lowercase) to designate keywords
in Maildir flags. This was preventing certain messages
from being marked as spam.
https://wiki2.dovecot.org/MailboxFormat/Maildir
|
|
When displaying search results with full messages, it makes
more sense to show them in ascending chronological order when
going by date. Reverse chronological order makes more sense
for search results which only show the subject.
|
|
At least for the thread view (&x=t); this will make it
easy to link to the overview.
|
|
Apparently mid.mail-archive.com does not support HTTPS,
and the HTTP version redirects to the search query, anyways.
|
|
Otherwise funky filenames can cause HTML injection
vulnerabilities (hope you have JavaScript disabled!)
|
|
PSGI specs already require PATH_INFO to be unescaped;
so our tests were wrong, too.
|
|
It seems possible for git-send-email(1) to generate repeated
repeated instances of References and In-Reply-To headers,
as evidenced in:
https://public-inbox.org/git/20161111124541.8216-17-vascomalmeida@sapo.pt/raw
This causes a mismatch between how our search indexer threads
and how our HTML view handles threading. In the future, View.pm
will use the smsg-parsed {references} field and avoid redoing
Email::MIME header parsing.
We will still need to figure out a way to deal with messages
with repeated Message-IDs, at some point, too.
|
|
There's no need to hold everything in memory, here,
since apparently "foreach" will read everything at
once in array context
(for some reason, I thought Perl5 was smart enough
to avoid creating a temporary array, here...)
|
|
We cannot distinguish between legitimate ghosts and mis-threaded
messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers")
so we must rebuild the index in parallel to fix it.
|
|
Oops, that's broken, too. I guess the only way to reindex
after fixing the thread detection is to start from scratch.
This reverts commit 5d91adedf5f33ef1cb87df2a86306ddf370b4f8d.
|
|
We cannot always reuse thread IDs since our threading
logic may change as bugs are fixed.
|
|
In some messages, these headers exist, but have empty values.
Do not let empty values throw off our search indexer to tie
threads together, as it can make non-sensical threads grouped
to a Message-Id of "" (empty string).
See
<https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw>
for an example of such a message.
Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de>
<https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>
|
|
We are in no danger of excessive buffering or OOM-ing,
the main page for every inbox already loads 200 results;
and thread page views even load 1000! Increase this to
200 for now.
|
|
Xapian can only give estimated results when a result limit is
given to it, so make clear it is an estimate to avoid showing
non-sensical ranges when no results are returned.
|
|
Some mailing lists add annoying tags into the Subject line which
discourages readers from doing proper mail organization on the
client side. They also waste precious screen space and
attention span.
Remove them from our archives to reduce clutter.
|
|
We'll want to allow some degree of configuration for
various mailing lists.
|
|
We don't want to be triggering OOM or swapping on weaker
systems when we have dozens of inboxes as potential targets.
|
|
We must call Email::Simple methods directly in our monkey patch
for Email::MIME to call the intended method. Using SUPER in our
subclass would instead hit a different, unintended method in
Email::MIME.
Reported-by: Junio C Hamano <gitster@pobox.com>
<xmqq4m0wb43w.fsf@gitster.mtv.corp.google.com>
|
|
We may need to do this even more aggressively, since the
Xapian database does not always give the latest results.
This time, we'll do it without relying on weak references,
and instead check refcounts.
|
|
This should fix problems with multipart messages where
text/plain parts lack a header.
cf. git clone --mirror https://github.com/rjbs/Email-MIME.git
refs/pull/28/head
In the future, we may still introduce as streaming
interface to reduce memory usage on large emails.
|
|
We still need to cleanup git processes occasionally, since
"git cat-file --batch" does not release old packs (and
git processes are fairly expensive).
For SQLite and Xapian file handles, they should be capable
of managing themselves without too much trouble, so lets
try keeping them for the lifetime of a process.
|
|
Apparently it never actually got used, and the world seems
fine without it, so we can drop it.
While we're at it, consider removing our subject_path
usage from existence, too. We are not using fancy subject-line
based URLs, here.
|
|
This is faster, smaller, and more straighforward to me with
fewer layers of indirection.
|
|
We only need strftime to be locale-independent when generating
dates for email and HTTP headers. Purely numeric dates can
use strftime for ease-of-readability.
|
|
This allows certain inboxes to override the global nntpserver
(perhaps under a different domain).
|
|
We can do a better job initializing the data structure
so we no longer need to rely on weak references to cleanup
when we ditch the config on reload.
|
|
Hopefully make this easier for future generations to understand.
|
|
This seems like an unnecessary abstraction, or an abstraction
on the wrong level.
|
|
I'm not sure if we'll ever support sharing a config file
with other tools, but maybe we will, and "limiter" is
too generic.
|
|
We may allow the {max} value of a limiter to be changed
in the future, so lets start accounting for it before we
spawn followup processes.
|
|
Avoiding weaken here is no more dangerous than the existing
circular refs (e.g. psgix.io) we create and manage throughout
the lifetime of the connection. So, trust ourselves to maintain
the data structure properly and avoid triggering extra memory
usage.
While we're at it, avoid having anonymous subroutines capture
more variables than necessary to simplify reference auditing.
|
|
We do not need to use weaken() here, so avoid it to simplify our
interactions with Perl; as weaken requires additional storage
and (it seems) time complexity.
|
|
Oops. And we'll be fixing circular references from now...
|
|
If a message is spam in one mailbox, it is spam in all others a
particular user/group will care about.
|
|
Danga::Socket defers close() syscalls until the end of the event
loop to avoid FD recycling. Unfortunately, this is dependent on
IO events firing and waking the process up from
poll/kevent/epoll_wait.
Without any I/O activity, a socket could remain in the
@Danga::Socket::ToClose array indefinitely. Thus, we will
trigger a fake IO event after running all timers to trigger
the deferred close in Danga::Socket::PostEventLoop.
|
|
We only refer to PublicInbox::HTTP objects here, so '$io'
was a bad name.
|
|
Fewer returns improves readability and the diffstat agrees.
|
|
Fewer conditionals means theres fewer code paths to test
and makes things easier-to-read.
|
|
Use a more meaningful variable name for the Qspawn
object, since this module is the reference for its
use.
|
|
Oops, this would be disatrous if we started handling
bigger request bodies or slow clients.
Fixes: c008654229a9 ("avoid IO::File for anonymous temporary files")
|