Date | Commit message |
|
Otherwise I would forget and be tempted to remove them.
|
|
By using the "primary" Message-ID in WwwAttach, we can avoid
conflicts in the links we use for downloading attachments.
|
|
The Message-ID mapped to an NNTP article number is stronger,
so we will favor that for attachment lookups.
|
|
The original Message-ID is still the most important when
discussing with other recipients who do not rely on a message
flowing through public-inbox. So whatever Message-ID we use
to deduplicate internally will be secondary and less important.
All of our front-end v2 code is order-independent, so we won't
let the message count against us, that way.
|
|
We do not need to care about ghosts at multiple call sites; they
cannot have a {blob} field and we've stored the blob field in
Xapian since SCHEMA_VERSION=13.
|
|
This will require multiple client invocations, but should reduce
load on the server and make it easier for readers to only clone
the latest data.
Unfortunately, supporting a cloneurl file for externally-hosted
repos will be more difficult as we cannot easily know if the
clones use v1 or v2 repositories, or how many git partitions
they have.
|
|
We must detect EOF when reading a POST body with standard PSGI servers.
This does not affect deployments using the standard public-inbox-httpd;
but most smaller inboxes should be able to get away with using a generic
PSGI server.
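
The idea can be sketched in Python rather than the project's Perl. A generic server's input stream may return fewer bytes than the client promised, so the reader loops until it has Content-Length bytes or a read returns nothing (EOF). The name `read_body` and its parameters are hypothetical, chosen for illustration; this is not public-inbox code.

```python
import io


def read_body(stream, content_length, bufsize=8192):
    """Read up to content_length bytes, stopping early on EOF."""
    chunks = []
    remaining = content_length
    while remaining > 0:
        buf = stream.read(min(bufsize, remaining))
        if not buf:
            # An empty read means EOF: the client sent less than it promised.
            break
        chunks.append(buf)
        remaining -= len(buf)
    return b"".join(chunks)
```

Without the EOF check, a short body would make the loop spin forever waiting for bytes that never arrive.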
|
|
This fails in the rare case we get a partial send() on "\r\n"
when writing chunked HTTP responses out.
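
A rough Python illustration of the failure mode, not the actual fix: send() may accept only part of a buffer, including only half of a trailing "\r\n", so the writer has to resume from the unsent tail. `encode_chunk`, `send_all` and `OneByteSocket` are hypothetical names. A non-blocking server would queue the tail instead of looping.

```python
def encode_chunk(data):
    """Frame data as one HTTP/1.1 chunk: hex length, CRLF, payload, CRLF."""
    return b"%x\r\n" % len(data) + data + b"\r\n"


def send_all(sock, buf):
    """Call send() until every byte is written, resuming after partial sends."""
    view = memoryview(buf)
    while len(view) > 0:
        n = sock.send(view)
        view = view[n:]  # keep only the part that was not accepted


class OneByteSocket:
    """Hypothetical stand-in for a socket that accepts one byte per send()."""

    def __init__(self):
        self.written = bytearray()

    def send(self, data):
        self.written += bytes(data[:1])
        return 1
```

Feeding `encode_chunk(b"hi")` through `send_all` with a `OneByteSocket` forces a partial send on every call, including in the middle of each "\r\n".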
|
|
Since we need to handle messages with multiple and duplicate
Message-ID headers, our thread skeleton display must account
for that.
Since we have a "preferred" Message-ID in case of conflicts,
use it as the UUID in an Atom feed so readers do not get
confused by conflicts.
|
|
We do not need many of these, anymore.
|
|
We use the actual Inbox object everywhere else and don't
need the name of the inbox separated from the object.
|
|
It would be a bug to have deleted files marked but not
seen in our histories.
|
|
This should help us detect bugs sooner in case we have
space waste problems.
|
|
This needs tests and further refinement, but current tests pass.
|
|
I forget this endpoint is still accessible (even if not linked).
This also simplifies new.html all around and removes some unused
clutter from the old days while we're at it.
|
|
Some test coverage is better than none, here.
|
|
This gives more up-to-date data and allows us
to avoid reopening in more places ourselves.
|
|
Since v2 supports duplicate messages, we need to support
looking up different messages with the same Message-Id.
Fortunately, our "raw" endpoint has always been mboxrd,
so users won't need to change their parsing tools.
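
For readers unfamiliar with mboxrd, here is a small Python sketch of its quoting rule: any body line matching `^>*From ` gains one extra leading ">", which makes the escaping reversible. The function names are hypothetical; this is not the code behind the "raw" endpoint.

```python
import re

# A line that is "From " preceded by zero or more ">" characters.
_FROM_RE = re.compile(rb"^(>*From )", re.M)
# The same line after escaping: at least one leading ">".
_QUOTED_RE = re.compile(rb"^>(>*From )", re.M)


def mboxrd_escape(body):
    """Prefix one ">" to every line matching ^>*From ."""
    return _FROM_RE.sub(rb">\1", body)


def mboxrd_unescape(body):
    """Strip exactly one ">" from every line matching ^>+From ."""
    return _QUOTED_RE.sub(rb"\1", body)
```

Because already-quoted lines get another ">" rather than being left alone, several messages can be concatenated into one mbox and still be split and restored exactly.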
|
|
This also quiets down warnings from -watch when spam training
happens on messages without Message-Id.
|
|
We can no longer rely on tree name lookups for v2. This also
optimizes v1 by relying on git blob object_id lookups while
avoiding process spawning overhead for "git log".
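
As background, a git blob object_id depends only on the content, not on the tree path it is stored under. A short Python sketch for SHA-1 repositories follows; `git_blob_oid` is a hypothetical helper name.

```python
import hashlib


def git_blob_oid(data):
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()
```

The result for a given message should match `git hash-object` on the same bytes, so a stored object_id can address the message without walking history.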
|
|
The File::Temp API is a bit tricky and needs TMPDIR explicitly
enabled if a template is given.
|
|
We want to make it clear to the code and DEBUG_DIFF users
that we do not introduce messages with unsuitable headers
into public archives.
|
|
Allow best-effort regeneration of NNTP article numbers from
cloned git repositories in addition to indexing Xapian. Article
numbers will not remain consistent when we add purge support,
though.
|
|
I'll be relying on some of this behavior for regenerating NNTP
article numbers off fresh clones.
|
|
This still requires a msgmap.sqlite3 file to exist, but
it allows us to tweak Xapian indexing rules and reindex
the Xapian database online while -watch is running.
|
|
I keep forgetting to run "make syntax"
|
|
This will be used to keep track of Message-ID <-> NNTP Article
numbers to prevent article number reuse when reindexing.
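
One way to picture such a mapping is a SQLite table whose AUTOINCREMENT key never hands out a number twice, even after rows are deleted. The Python sketch below uses an assumed schema (table `msgmap`, columns `num` and `mid`) and hypothetical helper names; the schema public-inbox actually uses may differ.

```python
import sqlite3


def open_msgmap(path=":memory:"):
    """Open (or create) a Message-ID <-> article number table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS msgmap ("
        " num INTEGER PRIMARY KEY AUTOINCREMENT,"
        " mid TEXT NOT NULL UNIQUE)"
    )
    return db


def num_for(db, mid):
    """Return the article number for mid, allocating one if it is new."""
    row = db.execute("SELECT num FROM msgmap WHERE mid = ?", (mid,)).fetchone()
    if row:
        return row[0]
    cur = db.execute("INSERT INTO msgmap (mid) VALUES (?)", (mid,))
    db.commit()
    return cur.lastrowid
```

With a plain INTEGER PRIMARY KEY, SQLite could reuse the highest number after a delete. AUTOINCREMENT is what rules that out.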
|
|
We want to rely on Date: to sort messages within individual
threads since it keeps messages from git-send-email(1) sorted.
However, since developers occasionally have the clock set
wrong on their machines, sort overall messages by the newest
date in a Received: header so the landing page isn't forever
polluted by messages from the future.
This also gives us determinism for commit times in most cases,
as we'll use the Received: timestamp there, as well.
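
The two sort keys can be sketched in Python with the standard email package. `date_of` and `newest_received` are hypothetical names, and falling back to Date: when no Received: header exists is an assumption of this sketch. It also assumes every timestamp carries an explicit UTC offset.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def date_of(msg):
    """Sender-supplied time, used to order messages within a thread."""
    return parsedate_to_datetime(msg["Date"])


def newest_received(msg):
    """Newest Received: time, used for the overall ordering."""
    stamps = [
        # The timestamp of a Received: header follows its last ";".
        parsedate_to_datetime(h.rsplit(";", 1)[1].strip())
        for h in msg.get_all("Received", [])
        if ";" in h
    ]
    # Assumption: fall back to Date: when no Received: header is usable.
    return max(stamps) if stamps else date_of(msg)
```

A message whose Date: claims the year 2038 but which was received in 2018 sorts by its 2018 timestamp overall, so it does not stay pinned to the top of the landing page.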
|
|
This will make it easier to as well as supporting future
Filter API users. It allows simplifying our ad-hoc
import_vger_from_mbox script.
|
|
Reduce the places where we have duplicate logic for discarding
unwanted headers.
|
|
This code will be shared with future mass-import tools.
|
|
If we need to use content_id, we've already lost hope
in relying on Message-Id as a differentiator. This
prevents duplicates from showing up repeatedly with
-watch when Message-Ids are reused and we generate
new Message-Ids to disambiguate.
|
|
public-inbox-watch gets restarted on reboots and whatnot, so
it could get pointlessly noisy. This message was only useful
during initial development and imports.
|
|
This can help us track down some differences during import,
if needed.
|
|
Perhaps we should filter these headers out in Import
|
|
While parallel processes improve import speed for initial
imports, they are probably not necessary for daily mail imports
via WatchMaildir and certainly not for public-inbox-init. This saves
some memory for daily use and even helps improve readability of
some subroutines by showing which methods they call remotely.
|
|
Be consistent with our "remote_" prefix for other IPC subs
|
|
Unfortunately this gives up some minor performance tweaks we
made to avoid reforking import processes.
|
|
This matches Import::done behavior
|
|
I had to dig through commit history for this and we should
better document our tests (along with everything else).
|
|
This reduces code duplication needed for locking
and hopefully makes things easier to understand.
|
|
No functional changes, yet, but this makes future changes
easier to read.
|
|
Instead of using ssoma-based locking, enable locking via Import
for now.
|
|
This will make reindexing easier.
|
|
Hexdigests are too long and shorter Message-IDs are easier
to deal with.
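
To make the length difference concrete, here is a Python comparison of a hex-encoded SHA-1 against an unpadded URL-safe base64 encoding of the same digest. The choice of base64url, the `example.invalid` domain and both function names are assumptions for illustration; the encoding public-inbox actually uses may differ.

```python
import base64
import hashlib


def hex_mid(content, domain="example.invalid"):
    """Message-ID built from a 40-character SHA-1 hexdigest."""
    return "<%s@%s>" % (hashlib.sha1(content).hexdigest(), domain)


def short_mid(content, domain="example.invalid"):
    """Message-ID built from the same digest as 27 base64url characters."""
    digest = hashlib.sha1(content).digest()
    token = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "<%s@%s>" % (token, domain)
```

Both forms are deterministic for a given content, so either can serve as a stable generated identifier. The shorter one is simply easier to read and paste.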
|
|
This allows us to share code for generating Message-IDs
between v1 and v2 repos.
For v1, this introduces a slight incompatibility in message
removal iff the original message lacked a Message-ID AND
the training request came from a message which did not
pass through the public-inbox:
The workaround for this would be to reuse the bad message from
the archive itself.
|
|
This can probably be moved to Import for code reuse.
|
|
This allows us to be more consistent in dealing with completely
empty Message-Ids.
|
|
This will allow WatchMaildir to use ->barrier operations instead
of reaching inside for nchg. This also ensures dumb HTTP
clients can see changes to V2 repos immediately.
|
|
In the future, we may store "purged" content IDs or other
uncommon stuff under "_/" of the git tree. This keeps the
top-level tree small and more amenable to deltafication.
This helps the common case where "m" is the most commonly
changed file at the top level.
Also, use 'D' instead of 'd' since it matches git's '--raw'
output format.
|