Date | Commit message (Collapse) |
|
I was too aggressively disabling parallelization to speed up
the test suite and broke this :x Re-enable parallelization
for the v2reindex test so we can catch it later.
|
|
We need to ensure there is only one file in the top-level tree
at any commit so the "add; remove; add;" sequence on the same
message is detected properly.
Otherwise, git will not detect the second "add" unless
a second message is added to history.
Deletes are now stored in "d" (and not "D" or "_/D") at the
top-level, now. There's no need to have a "_" to reduce churn
as "m" and "d" should never co-exist. It's now lowercased to
make it easier-to-distinguish from "D" in git-log output.
|
|
We will vivify multiple ghosts if a message has multiple
Message-IDs.
|
|
Ensure -convert and -compact do not make repositories
unreadable on live servers.
|
|
We have Git::qx nowadays.
|
|
We'll be making sure V2Writable uses this.
|
|
This bug was hidden due to timing problems with eatmydata or
running with tmpfs for TMPDIR.
|
|
Some folks had bad mail clients which generated 3-digit years
around Y2K...
|
|
This is a smaller improvement than the landing /$INBOX/ page
because full message bodies are shown; but still saves around
100ms for my system with LKML.
|
|
It's no longer necessary to have this since load_expand
now populates $smsg->mid with the "preferred" Message-ID.
This saves around 10ms on the homepage for me.
|
|
This saves over 400ms on my system with the full LKML
with over 2.8 million messages.
|
|
This is consistent with how we internally generate new
Message-IDs to break conflicts and allows ->reindex to
succeed while walking backwards through history
|
|
Relying solely on git for v2 repos is probably not
so useful, so add pointers to public-inbox-init/index
commands.
|
|
By supporting purge and allowing users to delete git partitions,
we can open up ourselves to gaps and un-reindexible data. Let
that be.
|
|
Somebody may only care about the most recent history,
so allow -init and -index to operate quietly on missing
partitions.
|
|
-watch on a busy/giant Maildir caused too many Xapian
errors while attempting to browse.
|
|
I mainly focus on -watch for mirroring busy mailing lists, but
using -mda should remain an option.
|
|
Having multiple Xapian partitions is mostly pointless after
the initial import. We can compact all the partitions into
one while keeping the skeleton separate.
|
|
And we do not want to start making confused repos if somebody
leaves out "-V2" the second time around.
|
|
We'll be using it in more future tests and scripts.
|
|
This was causing errors while attempting to load messages via
the WWW interface while mass-importing LKML. While we're at it,
remove unnecessary eval from lookup_article.
|
|
We no longer need some of these old subroutines which
assumed a single Message-ID for each message.
|
|
Back in the day, we compressed long Message-IDs to SHA-1
hexdigests for the URL. This now redirects to a 301 in
the hopes we can remove these checks some day to reduce
overhead.
|
|
We can avoid a small amount of overhead and use the "preferred"
Message-ID based on what is in the SearchMsg object.
|
|
The layout of this structure ended up being a bit different
and the read-only access is handled through the ::Inbox class,
instead.
|
|
We do not need this subroutine for read-only use in Search.pm
|
|
Too many similar functions doing the same basic thing was
redundant and misleading, especially since Message-ID is
no longer treated as a truly unique identifier.
For displaying threads in the HTML, this makes it clear
that we favor the primary Message-ID mapped to an NNTP
article number if a message cannot be found.
|
|
The only Xapian term which should be unique is the NNTP article
number; so we no longer need find_unique_doc_id.
|
|
Purging existing messages is fairly straightforward since we can
take advantage of Xapian and lookup the git object_id with it.
Unfortunately, purging an already "removed" message (which is
no longer in Xapian) is not as easy and we'll need to expose
->purge_oids to purge by the git object_id (currently SHA-1).
Furthermore, we expire reflogs and prune in hopes a dumb HTTP
client won't get the object.
|
|
This should make it easier to let users perform comparisons and
migrate to v2 if needed.
|
|
Otherwise I would forget and be tempted to remove them.
|
|
By using the "primary" Message-ID in WwwAttach, we can avoid
conflicts in the links we use for downloading attachments.
|
|
The Message-ID mapped to an NNTP article number is stronger,
so we will favor that for attachment lookups.
|
|
The original Message-ID is still the most important when
discussing with other recipients who do not rely on a message
flowing through public-inbox. So whatever Message-ID we use
to deduplicate internally will be secondary and less important.
All of our front-end v2 code is order-independent, so we won't
let the message count against us, that way.
|
|
We do not need to care about ghosts at multiple call sites; they
cannot have a {blob} field and we've stored the blob field in
Xapian since SCHEMA_VERSION=13.
|
|
This will require multiple client invocations, but should reduce
load on the server and make it easier for readers to only clone
the latest data.
Unfortunately, supporting a cloneurl file for externally-hosted
repos will be more difficult as we cannot easily know if the
clones use v1 or v2 repositories, or how many git partitions
they have.
|
|
We must detect EOF when reading a POST body with standard PSGI servers.
This does not affect deployments using the standard public-inbox-httpd;
but most smaller inboxes should be able to get away using a generic
PSGI server.
|
|
This fails in the rare case we get a partial send() on "\r\n"
when writing chunked HTTP responses out.
|
|
Since we need to handle messages with multiple and duplicate
Message-ID headers, our thread skeleton display must account
for that.
Since we have a "preferred" Message-ID in case of conflicts,
use it as the UUID in an Atom feed so readers do not get
confused by conflicts.
|
|
We do not need many of these, anymore.
|
|
We use the actual Inbox object everywhere else and don't
need the name of the inbox separated from the object.
|
|
It would be a bug to have deleted files marked but not
seen in our histories.
|
|
This should help us detect bugs sooner in case we have
space waste problems.
|
|
This needs tests and further refinement, but current tests pass.
|
|
I forget this endpoint is still accessible (even if not linked).
This also simplifies new.html all around and removes some unused
clutter from the old days while we're at it.
|
|
Some test coverage is better than none, here.
|
|
This gives more-up-to-date data in case and allows us
to avoid reopening in more places ourselves.
|
|
Since v2 supports duplicate messages, we need to support
looking up different messages with the same Message-Id.
Fortunately, our "raw" endpoint has always been mboxrd,
so users won't need to change their parsing tools.
|
|
This also quiets down warnings from -watch when spam training
happens on messages without Message-Id.
|
|
We can no longer rely on tree name lookups for v2. This also
optimizes v1 by relying on git blob object_id lookups while
avoiding process spawning overhead for "git log".
|