Date | Commit message |
|
We want transactions to be the responsibility of the
caller when possible; this fixes the potential for
the msgmap to internally become inconsistent when
using it from inside searchidx.
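
A minimal sketch of caller-owned transactions, written in Python with the
standard sqlite3 module rather than the project's own code. The table layout
and the names mid_insert and index_batch are hypothetical, chosen only for
illustration: the low-level helper never commits, so the caller decides where
the transaction begins and ends.

```python
import sqlite3

def mid_insert(conn, mid):
    """Insert one Message-ID and return its number. Never commits by itself."""
    cur = conn.execute("INSERT INTO msgmap (mid) VALUES (?)", (mid,))
    return cur.lastrowid

def index_batch(conn, mids):
    """The caller owns the transaction: every insert lands, or none do."""
    with conn:  # commits on success, rolls back if an exception escapes
        return [mid_insert(conn, m) for m in mids]

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE msgmap ("
    "num INTEGER PRIMARY KEY AUTOINCREMENT, mid TEXT UNIQUE NOT NULL)"
)

nums = index_batch(conn, ["a@example.com", "b@example.com"])

try:
    # The duplicate Message-ID fails, so the whole batch is rolled back,
    # including the otherwise valid first row.
    index_batch(conn, ["c@example.com", "a@example.com"])
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM msgmap").fetchone()[0]
```

Because the helper has no commit of its own, a failure partway through a batch
cannot leave a half-written mapping behind.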
|
|
Some messages were misimported due to an old bug;
clean them up and ensure we do not propagate the mistake.
Followup-to: a0c07cba0e5d ("mda: drop leading "From " lines again")
|
|
We failed to discard old thread IDs when vivifying ghosts
due to out-of-order message arrival. This rectifies the
failure and will trigger a re-index.
|
|
Remove some worthless parameters and redundant no-ops
to make the next (important) patch easier to review.
|
|
Disable this since we handle imperfect data from
an imperfect world.
|
|
This hopefully makes the intent of the code clearer, too.
The HTTP use of the numeric reference for getline
has already caused problems in Git.pm.
|
|
msg_iter lets us know the index of the attachment,
allowing us to make more sensible labels and, in a future
commit, hyperlinks to download attachments.
|
|
Message-IDs should not be MIME encoded, but in case they are,
use the raw form for compatibility with ssoma and possibly
other tools. This prevents a potential problem where a
malicious client could confuse our storage layer into indexing
incorrect contents.
|
|
We can rely on timely auto-destruction based on reference
counting, reducing the chance of redundant close(2) calls
which may hit the wrong FD.
We do care about certain close calls (e.g. writing to a buffered
IO handle) if we require error-checking for write-integrity. In
other cases, let things go out of scope so they can be freed
automatically after use.
|
|
While empty or "0" should never appear, this allows the
reviewer to think and know less about the context in which
this check is done.
|
|
We'll be using it for more than just cat-file.
Adding a `popen' API for internal use allows us to save a bunch
of code in other places.
|
|
Hopefully this gives new hackers a better overview of
how the components relate to each other.
|
|
Xapian has this limit for terms, and there are likely no
legitimate Message-IDs (or single header lines) this long; so
there's no need to work around this limit.
|
|
We use it as a general compressor for identifiers such as
subject paths, so using the "mid_" prefix probably is not
appropriate.
|
|
Sometimes subjects are excessively long and hit Xapian's 245-byte
term limit. We can still perform subject-only searches with
a probabilistic prefix.
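
A rough sketch of the length check this implies, in Python and without the
Xapian bindings. The helper name exact_term_or_none and the "XPATH" prefix are
hypothetical, and the sketch assumes the 245-byte limit applies to the whole
encoded term, prefix included. A caller would skip the exact term when this
returns None and rely on the probabilistic terms alone.

```python
MAX_TERM_BYTES = 245  # limit named in the commit message above

def exact_term_or_none(prefix, value):
    """Return the encoded exact-match term, or None if it would be too long.

    The length is measured in bytes after UTF-8 encoding, not in characters,
    so non-ASCII subjects reach the limit sooner.
    """
    term = (prefix + value).encode("utf-8")
    if len(term) > MAX_TERM_BYTES:
        return None
    return term

short = exact_term_or_none("XPATH", "hello")
too_long = exact_term_or_none("XPATH", "x" * 300)
```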
|
|
The document data of a search message already contains a good chunk
of the information needed to respond to OVER/XOVER commands.
Expand on that and use the document data to implement OVER/XOVER
quickly.
This adds a dependency on Xapian being available for nntpd usage,
but is probably alright since nntpd is esoteric enough that anybody
willing to run nntpd will also want search functionality offered
by Xapian.
This also speeds up XHDR/HDR with the To: and Cc: headers and
:bytes/:lines article metadata used by some clients for header
displays and marking messages as read/unread.
|
|
It seems like it was never used.
|
|
We can avoid duplicating work of extracting messages from git if we
tie this to Xapian. Of course, this ties the two features together,
but it's probably reasonable to expect that anybody who wants to use
public-inbox to serve messages to front-end users will have both.
|
|
We'll be reusing this for loading msgmap.
|
|
In the future, it should be possible to use this:

  git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \
    UPDATE_COPYRIGHT_USE_INTERVALS=2 \
    xargs /path/to/gnulib/build-aux/update-copyright
|
|
We'll continue to compress long Message-IDs in URLs (which we know
about), but we will store entire Message-IDs in the Xapian database
to ease lookups in external databases.
|
|
Redundant document data increases our database size. Pull the
smsg->mid off the unique term, the smsg->ts off the value, and
only generate the formatted display date off smsg->ts.
|
|
We no longer need them, as we can rely on index-time thread
resolution and thread merging. This allows us to index less
data and hopefully increase efficiency.
|
|
Perl does not currently optimize for this.
ref (from p5p):
http://mid.gmane.org/D5C27970-9176-4C7A-8B99-7D78360E67A2@pobox.com
|
|
Consistently name mid_* functions as verbs.
|
|
Dereference header_obj only once when performance may be
critical, or simplify our code by calling "header" directly on
the Email::{Simple,MIME} object if not.
|
|
We must preserve the umask for the entirety of the indexing
operation, as Xapian transactions replace entire files
atomically instead of writing them in place.
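
To illustrate why the umask has to cover the whole operation, here is a small
Python sketch for POSIX systems. It is not the project's code: the names
with_umask, atomic_write and index_all, and the 0o007 mask, are assumptions.
Each atomic replacement creates a brand-new file, and a new file takes its
permissions from the umask in effect at that moment, so a mask restored too
early would leave later files with the wrong mode.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def with_umask(mask):
    """Set the process umask for the duration of the block, then restore it."""
    old = os.umask(mask)
    try:
        yield
    finally:
        os.umask(old)

def atomic_write(path, data):
    """Write a new file and rename it over the old one in a single step."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    os.replace(tmp, path)

def index_all(path, batches, mask=0o007):
    """Keep the umask in place across every replacement, not just the first."""
    with with_umask(mask):
        for data in batches:
            atomic_write(path, data)

workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "db")
index_all(db_path, [b"first", b"second"])
mode = os.stat(db_path).st_mode & 0o777
```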
|
|
Extend the purpose of core.sharedRepository to apply to
the $GIT_DIR/public-inbox/xapian* directory.
|
|
This makes organization easier and reduces the amount of code
loaded for a PSGI, mod_perl or CGI instance.
|