path: root/lib
Date Commit message
2017-04-04 watchmaildir: do not reject lowercase flags on Maildir files
Dovecot uses 'a'..'z' (lowercase) to designate keywords in Maildir flags. This was preventing certain messages from being marked as spam. https://wiki2.dovecot.org/MailboxFormat/Maildir
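As a rough illustration of the fix described above, here is a minimal Python sketch (the project itself is Perl; `maildir_flags` and `is_trashed` are hypothetical names, not public-inbox functions). It assumes the common Maildir layout where the flags follow a `:2,` suffix, and accepts both uppercase standard flags and Dovecot's lowercase keyword letters.

```python
import re

# Maildir "info" suffix: ":2," followed by zero or more flag characters.
# Uppercase letters are the standard flags (e.g. S = seen, T = trashed);
# Dovecot additionally uses lowercase 'a'..'z' for keywords.
_INFO_RE = re.compile(r":2,([A-Za-z]*)\Z")


def maildir_flags(filename):
    """Return the flag string of a Maildir filename, or None if there is
    no valid ":2," info suffix (hypothetical helper)."""
    m = _INFO_RE.search(filename)
    return m.group(1) if m else None


def is_trashed(filename):
    """True if the file carries the standard 'T' (trashed) flag."""
    flags = maildir_flags(filename)
    return flags is not None and "T" in flags
```

A stricter pattern such as `[A-Z]*` would return no flags at all for a file named `...:2,Sab`, which is the kind of rejection the commit describes.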
2017-03-24 searchview: show full (&x=t) messages in ascending chronological order
When displaying search results with full messages, it makes more sense to show them in ascending chronological order when going by date. Reverse chronological order makes more sense for search results which only show the subject.
2017-03-24 searchview: add "t" id to link to thread overview
At least for the thread view (&x=t); this will make it easy to link to the overview.
2017-03-22 extmsg: use updated mail-archive.com URL
Apparently mid.mail-archive.com does not support HTTPS, and the HTTP version redirects to the search query, anyways.
2017-03-14 view: escape HTML description name
Otherwise funky filenames can cause HTML injection vulnerabilities (hope you have JavaScript disabled!)
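The class of bug described above can be sketched in Python (not the project's Perl; `describe_attachment` and its output format are hypothetical). The point is only that any sender-controlled name must be escaped before it is placed into HTML.

```python
import html


def describe_attachment(filename):
    """Build an HTML-safe label for an attachment.

    The filename comes from the message and is attacker-controlled,
    so it is escaped before being embedded in markup.
    """
    return "[attachment: " + html.escape(filename, quote=True) + "]"
```

With this in place, a filename such as `<script>alert(1)</script>.txt` is rendered as inert text instead of being interpreted by the browser.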
2017-02-14 www: do not unescape PATH_INFO twice
PSGI specs already require PATH_INFO to be unescaped; so our tests were wrong, too.
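To see why a second unescape is harmful, consider this small Python sketch (the request path is invented for illustration; the project itself is Perl/PSGI). A literal `%` in a Message-ID is sent as `%25`, so unescaping an already-unescaped path can decode it a second time.

```python
from urllib.parse import unquote

# Hypothetical raw request path: the Message-ID contains a literal "%25",
# which the client percent-encodes as "%2525".
raw_request_path = "/inbox/100%2525-done@example.com/"

# The server unescapes once before handing PATH_INFO to the application.
path_info = unquote(raw_request_path)

# Unescaping again inside the application corrupts the identifier.
unescaped_twice = unquote(path_info)
```

Here `path_info` keeps the intended `100%25-done@example.com`, while `unescaped_twice` yields `100%-done@example.com`, which no longer names the same message.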
2017-02-11 handle repeated References and In-Reply-To headers
It seems possible for git-send-email(1) to generate repeated instances of References and In-Reply-To headers, as evidenced in: https://public-inbox.org/git/20161111124541.8216-17-vascomalmeida@sapo.pt/raw This causes a mismatch between how our search indexer threads and how our HTML view handles threading. In the future, View.pm will use the smsg-parsed {references} field and avoid redoing Email::MIME header parsing. We will still need to figure out a way to deal with messages with repeated Message-IDs, at some point, too.
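One way to handle repeated headers, sketched in Python with the standard `email` package (this is not the project's Perl implementation; `referenced_ids` is a hypothetical name), is to read every instance of each header instead of only the first, and de-duplicate the Message-IDs while preserving order.

```python
import re
from email import message_from_string

# Message-IDs appear inside angle brackets in these headers.
_MID_RE = re.compile(r"<([^<>\s]+)>")


def referenced_ids(raw_message):
    """Collect Message-IDs from ALL References and In-Reply-To headers,
    in order of appearance and without duplicates."""
    msg = message_from_string(raw_message)
    ids = []
    for name in ("References", "In-Reply-To"):
        # get_all() returns every instance of the header, not just the first
        for value in msg.get_all(name, []):
            for mid in _MID_RE.findall(str(value)):
                if mid not in ids:
                    ids.append(mid)
    return ids
```

If the indexer and the HTML view both derive their parent lists from one such routine, they cannot disagree about which headers were seen.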
2017-02-09 config: do not slurp lines into memory
There's no need to hold everything in memory, here, since apparently "foreach" will read everything at once in array context (for some reason, I thought Perl5 was smart enough to avoid creating a temporary array, here...)
2017-02-06 search: schema version bump for empty References/In-Reply-To
We cannot distinguish between legitimate ghosts and mis-threaded messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0 ("searchidx: deal with empty In-Reply-To and References headers") so we must rebuild the index in parallel to fix it.
2017-02-06 Revert "searchidx: reindex clobbers old thread IDs"
Oops, that's broken, too. I guess the only way to reindex after fixing the thread detection is to start from scratch. This reverts commit 5d91adedf5f33ef1cb87df2a86306ddf370b4f8d.
2017-02-06 searchidx: reindex clobbers old thread IDs
We cannot always reuse thread IDs since our threading logic may change as bugs are fixed.
2017-02-06 searchidx: deal with empty In-Reply-To and References headers
In some messages, these headers exist, but have empty values. Do not let empty values throw off how our search indexer ties threads together, as they can produce nonsensical threads grouped under a Message-Id of "" (empty string). See <https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw> for an example of such a message. Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de> <https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>
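The failure mode can be shown with a deliberately simplified Python sketch (one level of threading only, hypothetical function names, not the project's Perl indexer): a header that is present but empty is treated the same as a missing header, so unrelated messages never end up grouped under the empty string.

```python
def parent_mid(in_reply_to):
    """Return the parent Message-ID, or None when the header is missing
    or present but empty."""
    if in_reply_to is None:
        return None
    mid = in_reply_to.strip().strip("<>").strip()
    return mid or None


def group_by_parent(messages):
    """Group (mid, in_reply_to) pairs under their parent; messages without
    a usable parent start their own thread."""
    threads = {}
    for mid, in_reply_to in messages:
        parent = parent_mid(in_reply_to)
        root = mid if parent is None else parent
        threads.setdefault(root, []).append(mid)
    return threads
```

Without the `or None` normalization, the messages `c` and `d` in an input like `[("c", ""), ("d", "  ")]` would both be filed under the same bogus parent.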
2017-02-06 searchview: increase limit for displaying search results
We are in no danger of excessive buffering or OOM-ing: the main page for every inbox already loads 200 results, and thread page views even load 1000! Increase this to 200 for now.
2017-02-06 searchview: clarify numeric summary at bottom
Xapian can only give estimated results when a result limit is given to it, so make clear it is an estimate to avoid showing nonsensical ranges when no results are returned.
2017-01-26 add filter for Subject: tags
Some mailing lists add annoying tags into the Subject line which discourages readers from doing proper mail organization on the client side. They also waste precious screen space and attention span. Remove them from our archives to reduce clutter.
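A filter of this kind might look like the following Python sketch (the project's filter is written in Perl; `strip_subject_tag` and its `tag` argument are assumptions made for illustration). It removes every occurrence of one configured bracketed tag and leaves other bracketed markers such as `[PATCH]` alone.

```python
import re


def strip_subject_tag(subject, tag):
    """Remove every "[tag]" (plus trailing whitespace) from a Subject line.

    `tag` is the list-specific text without brackets; it is escaped so
    that characters like "." or "+" are matched literally.
    """
    pattern = re.compile(r"\[" + re.escape(tag) + r"\]\s*")
    return pattern.sub("", subject).strip()
```

Taking the tag as a parameter keeps the filter reusable across lists that use different tags.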
2017-01-26 watchmaildir: allow arguments for filters
We'll want to allow some degree of configuration for various mailing lists.
2017-01-19 watchmaildir: limit live importer processes
We don't want to be triggering OOM or swapping on weaker systems when we have dozens of inboxes as potential targets.
2017-01-18 mime: avoid SUPER usage in Email::MIME subclass
We must call Email::Simple methods directly in our monkey patch for Email::MIME to call the intended method. Using SUPER in our subclass would instead hit a different, unintended method in Email::MIME. Reported-by: Junio C Hamano <gitster@pobox.com> <xmqq4m0wb43w.fsf@gitster.mtv.corp.google.com>
2017-01-11 inbox: reinstate periodic cleanup of Xapian and SQLite objects
We may need to do this even more aggressively, since the Xapian database does not always give the latest results. This time, we'll do it without relying on weak references, and instead check refcounts.
2017-01-10 introduce PublicInbox::MIME wrapper class
This should fix problems with multipart messages where text/plain parts lack a header. cf. git clone --mirror https://github.com/rjbs/Email-MIME.git refs/pull/28/head In the future, we may still introduce a streaming interface to reduce memory usage on large emails.
2017-01-07 inbox: properly register cleanup timer for git processes
We still need to clean up git processes occasionally, since "git cat-file --batch" does not release old packs (and git processes are fairly expensive). SQLite and Xapian file handles should be capable of managing themselves without too much trouble, so let's try keeping them for the lifetime of a process.
2017-01-07 search: remove subject_summary
Apparently it never actually got used, and the world seems fine without it, so we can drop it. While we're at it, consider removing our subject_path usage from existence, too. We are not using fancy subject-line based URLs, here.
2017-01-07 searchmsg: favor direct hash access over accessor methods
This is faster, smaller, and more straightforward to me with fewer layers of indirection.
2017-01-07 remove incorrect comment about strftime + locales
We only need strftime to be locale-independent when generating dates for email and HTTP headers. Purely numeric dates can use strftime for ease-of-readability.
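The distinction drawn above can be illustrated in Python (not the project's Perl; both helper names are hypothetical). Header dates contain day and month names, which locale-aware formatting could translate, whereas purely numeric formats carry no locale-dependent text.

```python
import time
from email.utils import formatdate


def http_date(epoch):
    """Date for HTTP or email headers: uses fixed English day and month
    names regardless of the process locale."""
    return formatdate(epoch, usegmt=True)


def numeric_date(epoch):
    """Purely numeric date for display: safe to build with strftime,
    since none of these fields are locale-dependent names."""
    return time.strftime("%Y-%m-%d %H:%M", time.gmtime(epoch))
```

For the epoch value 0, the first helper produces a header-style date with `Thu` and `Jan` spelled out, and the second produces only digits and punctuation.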
2017-01-07 config: allow per-inbox nntpserver
This allows certain inboxes to override the global nntpserver (perhaps under a different domain).
2017-01-07 inbox: eliminate weaken usage entirely
We can do a better job initializing the data structure so we no longer need to rely on weak references to clean up when we ditch the config on reload.
2017-01-07 inbox: describe the full key name
Hopefully make this easier for future generations to understand.
2017-01-07 config: remove unused get() method
This seems like an unnecessary abstraction, or an abstraction on the wrong level.
2017-01-07 config: always use namespaced "publicinboxlimiter"
I'm not sure if we'll ever support sharing a config file with other tools, but maybe we will, and "limiter" is too generic.
2017-01-07 qspawn: prepare to support runtime reloading of Limiter
We may allow the {max} value of a limiter to be changed in the future, so let's start accounting for it before we spawn followup processes.
2017-01-04 http: remove weaken usage, reduce anonsub capture scope
Avoiding weaken here is no more dangerous than the existing circular refs (e.g. psgix.io) we create and manage throughout the lifetime of the connection. So, trust ourselves to maintain the data structure properly and avoid triggering extra memory usage. While we're at it, avoid having anonymous subroutines capture more variables than necessary to simplify reference auditing.
2017-01-04 httpd/async: remove weaken usage
We do not need to use weaken() here, so avoid it to simplify our interactions with Perl; as weaken requires additional storage and (it seems) time complexity.
2017-01-04 http: fix spelling error
Oops. And we'll be fixing circular references from now...
2017-01-02 watch: watchspam affects all configured inboxes
If a message is spam in one mailbox, it is spam in all others a particular user/group will care about.
2016-12-26 evcleanup: ensure deferred close from timers are handled ASAP
Danga::Socket defers close() syscalls until the end of the event loop to avoid FD recycling. Unfortunately, this is dependent on IO events firing and waking the process up from poll/kevent/epoll_wait. Without any I/O activity, a socket could remain in the @Danga::Socket::ToClose array indefinitely. Thus, we will trigger a fake IO event after running all timers to trigger the deferred close in Danga::Socket::PostEventLoop.
2016-12-25 httpd/async: improve variable naming
We only refer to PublicInbox::HTTP objects here, so '$io' was a bad name.
2016-12-25 githttpbackend: minor cleanups to improve readability
Fewer returns improves readability and the diffstat agrees.
2016-12-25 githttpbackend: simplify compatibility code
Fewer conditionals means there are fewer code paths to test and makes things easier to read.
2016-12-25 githttpbackend: minor readability improvement
Use a more meaningful variable name for the Qspawn object, since this module is the reference for its use.
2016-12-25 http: fix clobbering of $null_io
Oops, this would be disastrous if we started handling bigger request bodies or slow clients. Fixes: c008654229a9 ("avoid IO::File for anonymous temporary files")
2016-12-24 linkify: modify argument in place
This results in over 1% speedup doing $MESSAGE_ID/T/ HTML generation for a 368-message thread.
2016-12-24 view: do not modify array during iteration
This results in a half percent speedup or so doing $MESSAGE_ID/T/ HTML generation for a 368 message thread.
2016-12-24 view: stop chomping off whitespace at ends of messages
This allows a 3-4% speedup in $MESSAGE_ID/T/ page generation speed for a 368+ message thread. It also more faithfully preserves the message as intended, even if it makes the sender look like a space-wasting slob :P
2016-12-24 view: remove unused parameter
And add a comment about it to remind our future selves.
2016-12-22 search: lookup_mail handles modified DBs
We call lookup_mail all over the place, be sure we can handle database modifications in those cases.
2016-12-22 doc: various comments on async handling
Notes for future developers (myself included) since we can't assume people can read my mind.
2016-12-21 searchthread: simplify API and remove needless OO
This simplifies callers to prevent errors and avoids needless object-orientation in favor of a single procedure call to handle threading and ordering.
2016-12-21 searchthread: update comment about loop prevention
It definitely is necessary to prevent looping with the %seen hash.
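Why a "seen" set is needed can be shown with a small Python sketch (the project uses a Perl `%seen` hash; `flatten_thread` and the `children` mapping are hypothetical). Reference headers come from untrusted mail, so the parent/child graph may contain cycles, and a traversal without this guard would never terminate.

```python
def flatten_thread(root, children):
    """Walk a thread depth-first and return Message-IDs in visit order.

    `children` maps a Message-ID to the list of IDs that claim it as a
    parent. The `seen` set guarantees termination even if the mapping
    contains cycles.
    """
    seen = set()
    order = []
    stack = [root]
    while stack:
        mid = stack.pop()
        if mid in seen:
            continue  # already visited: a repeat or a cycle
        seen.add(mid)
        order.append(mid)
        # reversed() so the first child is visited first
        stack.extend(reversed(children.get(mid, [])))
    return order
```

With a cyclic input such as `{"a": ["b", "c"], "b": ["a"], "c": ["b"]}`, each message is still emitted exactly once.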
2016-12-20 searchmsg: remove ensure_metadata
Instead, only preload the ->mid field for threading, as we only need ->thread and ->path once in Search->get_thread (but we will need the ->mid field repeatedly). This more than doubles View->load_results performance according to thread-all on an inbox with over 300K messages.
2016-12-17 searchmsg: do not memoize {date} field
We only generate the ->date once in NNTP, so creating the hash entry is a waste.