Date | Commit message |
|
This should be adequate warning for folks who may be
uncomfortable or uncertain about even possessing AGPL
source code due to employer agreements and such.
Disclaimer: I remain completely in favor of AGPL and strong
copyleft, and am more than willing to risk my own future on it.
However, I refuse to even nudge people into downloading AGPL
source code if it presents any legal risk to them.
|
|
We do not need to import IO::File into the main programs
since Perl 5.8+ supports literal "undef" for generating
anonymous temporary file handles.
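
For readers outside Perl, the same idea (an anonymous temporary
file handle that needs no name and no explicit cleanup) can be
sketched in Python. This is an analogy only, not the project's
code; `scratch_roundtrip` is a hypothetical helper.

```python
import tempfile


def scratch_roundtrip(data: bytes) -> bytes:
    """Write data to an anonymous temporary file and read it back."""
    # TemporaryFile() hands back an open read/write handle; the file
    # is removed automatically when the handle is closed.
    with tempfile.TemporaryFile() as fh:
        fh.write(data)
        fh.seek(0)
        return fh.read()
```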
|
|
This was sloppy code; all calls need to be checked
for failure.
|
|
Some mail clients do not seem to handle '+' as a space in query
parameters for the mail subject, so use the more common '%20'
for compatibility.
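
The difference between the two encodings can be sketched in
Python (not the project's language). `mailto_reply_link` and the
example address are hypothetical; only the '+' vs '%20'
distinction comes from the message above.

```python
from urllib.parse import quote, quote_plus


def mailto_reply_link(address: str, subject: str) -> str:
    """Build a mailto: URL whose subject encodes spaces as %20."""
    # quote() with safe="" percent-encodes every reserved character,
    # so a space becomes "%20" rather than "+".
    return "mailto:" + quote(address, safe="@") + "?subject=" + quote(subject, safe="")


# quote_plus() is the form-style encoding that some mail clients
# reportedly fail to decode in mailto: links.
form_style = quote_plus("Re: hello world")
link = mailto_reply_link("list@example.com", "Re: hello world")
```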
|
|
One may build the initial index on a powerful host and transfer
it to a weaker one for incremental indexing. Thus there is
no requirement to have a configured public-inbox for building
the index unless a user needs altid support or some such.
|
|
We should not completely kill a process if "git gc --auto"
errors out due to a warning or whatnot.
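
A minimal sketch of the "warn, do not die" policy, written in
Python rather than the project's Perl. `run_best_effort` is a
hypothetical name, and how the real code reports the failure is
not stated above.

```python
import subprocess
import sys


def run_best_effort(cmd):
    """Run cmd; report failure on stderr but never raise for it."""
    try:
        proc = subprocess.run(cmd, check=False)
    except OSError as e:
        # e.g. the executable is missing or not runnable
        print(f"warning: could not run {cmd!r}: {e}", file=sys.stderr)
        return None
    if proc.returncode != 0:
        print(f"warning: {cmd!r} exited with {proc.returncode}", file=sys.stderr)
    return proc.returncode
```

A caller would invoke it as `run_best_effort(["git", "gc", "--auto"])`
and carry on regardless of the result.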
|
|
This reverts commit 3c9dd6619f825f0515e7e4afa1bd55c99c1a68d3
("thread: fix sorting without topmost")
and reinstates the "topmost" routine for sorting purposes.
|
|
The ordering change in add_child is critical if $self == $parent
as the {children} hash was lost before this change.
has_descendent can be simplified by walking upwards from the child
instead of downwards from the parent.
This fixes threading regressions introduced in
commit 30100c46326e2eac275e0af13116636701d2537e
("thread: use hash + array instead of hand-rolled linked list")
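
Both points can be illustrated with a small Python sketch. It is
one reading of the message above, not a port of the actual Perl;
the `Node` class and its field names are assumptions.

```python
class Node:
    """Hypothetical thread container with a parent link and child map."""

    def __init__(self, mid):
        self.mid = mid
        self.parent = None
        self.children = {}  # Message-ID -> Node

    def has_descendent(self, child):
        # Walk upwards from the child via parent links instead of
        # searching every subtree below self.
        seen = set()
        node = child
        while node is not None and id(node) not in seen:
            if node is self:
                return True
            seen.add(id(node))  # guards against an accidental loop
            node = node.parent
        return False

    def add_child(self, child):
        if child is self:
            raise ValueError("refusing to become my own parent")
        # Detach from the old parent first, then attach to self.
        # In the opposite order, re-adding a child to its current
        # parent would insert the entry and then delete it again.
        old = child.parent
        if old is not None:
            old.children.pop(child.mid, None)
        self.children[child.mid] = child
        child.parent = self
```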
|
|
This should reduce differences from the original Mail::Thread
code and hopefully make things easier-to-follow.
|
|
We have to walk through all the messages after threading
anyways to build the rootset, so we can just delete all
the parent references at that point.
|
|
Some broken (or malicious) mailers may include a generated
Message-ID in its References header, so be prepared for it.
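
One defensive approach, sketched in Python with a hypothetical
helper name: drop the message's own Message-ID from its
References before linking parents. Dropping repeated entries as
well is an extra assumption, not something stated above.

```python
def usable_references(message_id, references):
    """Return references in order, minus self-references and repeats."""
    seen = {message_id}  # pre-seed so the message never references itself
    out = []
    for ref in references:
        if ref in seen:
            continue
        seen.add(ref)
        out.append(ref)
    return out
```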
|
|
Each node has an entire arrayref of its children nowadays, so
there's no need to waste time and memory creating another one.
|
|
This starts to show noticeable performance improvements when
attempting to thread over 400 messages; but the improvement
may not be measurable with fewer.
However, the resulting code is much shorter and (IMHO)
much easier to understand.
|
|
This bug was hidden, and we may not be able to efficiently
implement a topmost subroutine with the hash-based (vs
linked-list) container for threading in the next
commit.
|
|
We no longer recurse, and it's too hard to come up with
a new name for a sub we will only use once.
|
|
We never use the depth anywhere in this sub.
|
|
It is pointless to increment when setting a true value is
simpler as there is no need to read before writing.
|
|
Unnecessary subs and complexity. This was hiding the fact
that $before is never used.
|
|
Single use subroutines actually make the code more complex in
this case, and there's never a {seen} field in $self.
|
|
It doesn't buy us much and copying to a new array is slower;
but probably not measurable in real-world use.
|
|
This roughly doubles performance due to the reduction in
object creation and abstraction layers.
|
|
smsg will be undef for ghost messages in a subsequent commit
|
|
This improves top-level index generation performance by 3-4%.
|
|
Copying large arrays is expensive, so avoid it.
This reduces /$INBOX/ time by around 1%.
|
|
Introduce our own SearchThread class for threading messages.
This should allow us to specialize and optimize away objects
in future commits.
|
|
We will not care for inexact threading by subject or pruning.
|
|
Support (and document) 'a:' after all, as "mairix -h" uses it,
so this should reduce the learning curve for mairix users.
|
|
This clarifies the code somewhat, and we don't care to lazy-load
in NNTP.pm anyways since this is only used for a long-lived
daemon.
|
|
The existing string -> number date range Xapian query is good
enough, and having too much flexibility is probably bad for
caching (as well as increasing our attack surface, because
parsing queries is tricky).
Tags-as-skiplists are probably not worth the effort given
Xapian, and we may have to import old messages after-the-fact,
anyways, and message delivery for mirrors is never orderly.
Other items are all done and need to be maintained (like the
search engine docs for the mairix-compatibility features that
just got pushed out)
|
|
Output $! for diagnostic purposes since I've noticed this on
two slow machines today (and seemingly never before).
|
|
And while we're at it, ensure searching inside displayable
attachment bodies works.
|
|
The basic rule is that if it is displayable via our WWW
interface, it should be indexable text for Xapian search.
|
|
It's not worth entering a complex codepath in Email::MIME to
save some (probably immeasurable amount of) memory, here. We've
already stopped doing this in our WWW code a while back, too.
If we really cared enough about it, we'd prioritize work on a
streaming replacement for Email::MIME.
|
|
Specifying the "d:" field only worked for
NumberValueRangeProcessor in older versions of Xapian, such
as the one in Debian wheezy (libsearch-xapian-perl=1.2.10.0-1).
This slipped through since I rarely use wheezy anymore, and
perhaps nobody else does, either. Perhaps wheezy support may be
dropped soon.
Unfortunately, this requires a schema version bump.
|
|
We pay a storage cost for storing positional information
in Xapian, so make good use of it by attempting to preserve
it for (hopefully) better search results.
|
|
This is stricter than the mutt quote_regexp default
("^([ \t]*[|>:}#])+" on Debian jessie),
but matches what we have in View.pm.
I prefer the stricter quote detection since it is less ambiguous
and less likely to hide/obscure important details.
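
The ambiguity is easy to demonstrate in Python. The mutt pattern
is copied from the message above; the strict pattern (a line
starting with '>') is an assumption about what View.pm uses, and
`is_quoted` is a hypothetical helper.

```python
import re

# mutt default on Debian jessie, as quoted in the message above
MUTT_QUOTE = re.compile(r"^([ \t]*[|>:}#])+")
# assumed stricter rule: only a leading '>' marks a quoted line
STRICT_QUOTE = re.compile(r"^>")


def is_quoted(line: str, pattern) -> bool:
    """True if the pattern treats the line as quoted text."""
    return pattern.search(line) is not None


samples = ["> quoted reply", "# shell comment", ":wq", "  > indented", "plain text"]
mutt_hits = [s for s in samples if is_quoted(s, MUTT_QUOTE)]
strict_hits = [s for s in samples if is_quoted(s, STRICT_QUOTE)]
```

With the looser pattern, a shell comment or an editor command in
a message body would be treated as quoted text.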
|
|
As of Xapian 1.0.4 (from 2007), it is possible to use
Search::Xapian::QueryParser::add_prefix multiple times with the
same user field name but different term prefixes.
This brings my current git@vger mirror from 6.5GB to 2.1GB
(both sizes are after xapian-compact).
|
|
"bs:" and "b:" are adapted from mairix(1)
We will also support searching explicitly for quoted vs
non-quoted text via "q:" and "nq:" prefixes since sometimes
readers will not care for quoted text.
In the future, we will support parsing diffs (perhaps when
repobrowse integration is complete).
Note: this roughly doubles the size of the Xapian database due
to the additional information; so this change may not be worth
it.
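
As a rough Python sketch of the split behind "q:" and "nq:",
assuming a line counts as quoted only when it starts with '>'
(the actual indexing code and its term prefixes are not shown
here; `split_quoted` is a hypothetical helper):

```python
def split_quoted(body: str):
    """Return (quoted_text, non_quoted_text) for a message body."""
    quoted, unquoted = [], []
    for line in body.splitlines():
        # Assumption: a leading '>' is the only quote marker.
        (quoted if line.startswith(">") else unquoted).append(line)
    return "\n".join(quoted), "\n".join(unquoted)
```

Each half would then be indexed under its own prefix, which is
consistent with the roughly doubled database size noted above.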
|
|
We only document the "s:" anyways. While the long name is more
descriptive, the ambiguity makes agnostic caching (by Varnish or
similar) slightly harder and longer URLs are more likely to be
accidentally truncated when shared.
|
|
Sometimes it can be useful to search based on who the
message was sent to, sent by, or Cc:-ed. Of course,
headers can be faked, but they usually are not...
Anyways this mostly matches the behavior of mairix(1).
|
|
We need to prevent excessive repository growth for
public-inbox-watch and public-inbox-mda users.
|
|
We will be reusing this in the next commit, too.
|
|
For now, we will document this since it allows better
performance without the burden of extensions. Perhaps one day
far in the future Perl can natively support vfork(2) AND that
version of Perl will be widely available, but I suspect that day
is at least a decade away, if not two:
https://rt.perl.org/Ticket/Display.html?id=128227
|
|
This reduces duplication, slightly. We may be using it
yet again in a to-be-introduced function (or we may not
introduce it).
|
|
Email::MIME internally assumes "text/plain" for messages
missing a Content-Type, but does not expose that in the
Email::MIME::content_type API method. We must assume it
ourselves to avoid uninitialized value warnings for the
rare (nowadays) MUAs which do not set it.
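
The fallback amounts to the following, sketched in Python rather
than Perl. `effective_content_type` is a hypothetical helper and
the parameter stripping is a simplification, not Email::MIME's
behavior.

```python
def effective_content_type(header_value):
    """Return a lowercased type/subtype, assuming text/plain if unset."""
    if header_value is None or not header_value.strip():
        # No Content-Type header: assume text/plain ourselves.
        return "text/plain"
    # Keep only "type/subtype", dropping parameters such as charset.
    return header_value.split(";", 1)[0].strip().lower()
```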
|
|
And include it in the build + website.
|
|
Hopefully more folks can download and run public-inbox,
nowadays.
|
|
Just having "limiter" in the prefix may confuse
it with something else. Use the full prefix to
avoid this confusion.
|
|
We want to encourage users to serve repositories. So enable
bitmaps by default so performance suffers less with smart HTTP.
|
|
We'll keep supporting "publicinboxlearn" indefinitely,
but "publicinboxwatch" is probably more appropriate
at the moment.
Noticed while writing documentation.
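
Supporting both section names can be sketched as a lookup that
prefers the new name and falls back to the old one. This is
Python, not the project's Perl; the flat dict, the helper name,
and the "watchspam" key used in the usage line are illustrative
assumptions.

```python
def lookup_with_fallback(cfg, key, sections=("publicinboxwatch", "publicinboxlearn")):
    """Return the first value found, trying sections in order."""
    for section in sections:
        name = f"{section}.{key}"
        if name in cfg:
            return cfg[name]
    return None


old_style = {"publicinboxlearn.watchspam": "maildir:/path/to/spam"}
value = lookup_with_fallback(old_style, "watchspam")
```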
|