Date | Commit message (Collapse) |
|
I mainly focus on -watch for mirroring busy mailing lists, but
using -mda should remain an option.
|
|
And we do not want to start making confused repos if somebody
leaves out "-V2" the second time around.
|
|
We'll be using it in more future tests and scripts.
|
|
This was causing errors while attempting to load messages via
the WWW interface while mass-importing LKML. While we're at it,
remove unnecessary eval from lookup_article.
|
|
We no longer need some of these old subroutines which
assumed a single Message-ID for each message.
|
|
Back in the day, we compressed long Message-IDs to SHA-1
hexdigests for the URL. This now redirects to a 301 in
the hopes we can remove these checks some day to reduce
overhead.
|
|
We can avoid a small amount of overhead and use the "preferred"
Message-ID based on what is in the SearchMsg object.
|
|
The layout of this structure ended up being a bit different
and the read-only access is handled through the ::Inbox class,
instead.
|
|
We do not need this subroutine for read-only use in Search.pm
|
|
Too many similar functions doing the same basic thing was
redundant and misleading, especially since Message-ID is
no longer treated as a truly unique identifier.
For displaying threads in the HTML, this makes it clear
that we favor the primary Message-ID mapped to an NNTP
article number if a message cannot be found.
|
|
The only Xapian term which should be unique is the NNTP article
number; so we no longer need find_unique_doc_id.
|
|
Purging existing messages is fairly straightforward since we can
take advantage of Xapian and lookup the git object_id with it.
Unfortunately, purging an already "removed" message (which is
no longer in Xapian) is not as easy and we'll need to expose
->purge_oids to purge by the git object_id (currently SHA-1).
Furthermore, we expire reflogs and prune in hopes a dumb HTTP
client won't get the object.
|
|
Otherwise I would forget and be tempted to remove them.
|
|
By using the "primary" Message-ID in WwwAttach, we can avoid
conflicts in the links we use for downloading attachments.
|
|
The Message-ID mapped to an NNTP article number is stronger,
so we will favor that for attachment lookups.
|
|
The original Message-ID is still the most important when
discussing with other recipients who do not rely on a message
flowing through public-inbox. So whatever Message-ID we use
to deduplicate internally will be secondary and less important.
All of our front-end v2 code is order-independent, so we won't
let the message count against us, that way.
|
|
We do not need to care about ghosts at multiple call sites; they
cannot have a {blob} field and we've stored the blob field in
Xapian since SCHEMA_VERSION=13.
|
|
This will require multiple client invocations, but should reduce
load on the server and make it easier for readers to only clone
the latest data.
Unfortunately, supporting a cloneurl file for externally-hosted
repos will be more difficult as we cannot easily know if the
clones use v1 or v2 repositories, or how many git partitions
they have.
|
|
We must detect EOF when reading a POST body with standard PSGI servers.
This does not affect deployments using the standard public-inbox-httpd;
but most smaller inboxes should be able to get away using a generic
PSGI server.
|
|
This fails in the rare case we get a partial send() on "\r\n"
when writing chunked HTTP responses out.
|
|
Since we need to handle messages with multiple and duplicate
Message-ID headers, our thread skeleton display must account
for that.
Since we have a "preferred" Message-ID in case of conflicts,
use it as the UUID in an Atom feed so readers do not get
confused by conflicts.
|
|
We do not need many of these, anymore.
|
|
We use the actual Inbox object everywhere else and don't
need the name of the inbox separated from the object.
|
|
It would be a bug to have deleted files marked but not
seen in our histories.
|
|
This should help us detect bugs sooner in case we have
space waste problems.
|
|
This needs tests and further refinement, but current tests pass.
|
|
I forget this endpoint is still accessible (even if not linked).
This also simplifies new.html all around and removes some unused
clutter from the old days while we're at it.
|
|
This gives more-up-to-date data in case and allows us
to avoid reopening in more places ourselves.
|
|
Since v2 supports duplicate messages, we need to support
looking up different messages with the same Message-Id.
Fortunately, our "raw" endpoint has always been mboxrd,
so users won't need to change their parsing tools.
|
|
This also quiets down warnings from -watch when spam training
happens on messages without Message-Id.
|
|
We can no longer rely on tree name lookups for v2. This also
optimizes v1 by relying on git blob object_id lookups while
avoiding process spawning overhead for "git log".
|
|
The File::Temp API is a bit tricky and needs TMPDIR explicitly
enabled if a template is given.
|
|
We want to make it clear to the code and DEBUG_DIFF users
that we do not introduce messages with unsuitable headers
into public archives.
|
|
Allow best-effort regeneration of NNTP article numbers from
cloned git repositories in addition to indexing Xapian Article
numbers will not remain consistent when we add purge support,
though.
|
|
This still requires a msgmap.sqlite3 file to exist, but
it allows us to tweak Xapian indexing rules and reindex
the Xapian database online while -watch is running.
|
|
I keep forgetting to run "make syntax"
|
|
This will be used to keep track of Message-ID <-> NNTP Article
numbers to prevent article number reuse when reindexing.
|
|
We want to rely on Date: to sort messages within individual
threads since it keeps messages from git-send-email(1) sorted.
However, since developers occasionally have the clock set
wrong on their machines, sort overall messages by the newest
date in a Received: header so the landing page isn't forever
polluted by messages from the future.
This also gives us determinism for commit times in most cases,
as we'll used the Received: timestamp there, as well.
|
|
This will make it easier to as well as supporting future
Filter API users. It allows simplifying our ad-hoc
import_vger_from_mbox script.
|
|
Reduce the places where we have duplicate logic for discarding
unwanted headers.
|
|
This code will be shared with future mass-import tools.
|
|
If we need to use content_id, we've already lost hope
in relying on Message-Id as a differentiator. This
prevents duplicates from showing up repeatedly with
-watch when Message-Ids are reused and we generate
new Message-Ids to disambiguate.
|
|
public-inbox-watch gets restarted on reboots and whatnot, so
it could get pointlessly noisy. This message was only useful
during initial development and imports.
|
|
This can help us track down some differences during import,
if needed.
|
|
While parallel processes improves import speed for initial
imports; they are probably not necessary for daily mail imports
via WatchMaildir and certainly not for public-inbox-init. Save
some memory for daily use and even helps improve readability of
some subroutines by showing which methods they call remotely.
|
|
Be consistent with our "remote_" prefix for other IPC subs
|
|
Unfortunately this gives up some minor performance tweaks we
made to avoid reforking import processes.
|
|
This matches Import::done behavior
|
|
This reduces code duplication needed for locking and
and hopefully makes things easier to understand.
|
|
Instead of using ssoma-based locking, enable locking via Import
for now.
|