Date Commit message
2019-01-02update and add documentation for repository formats
Remove confusing documentation around ssoma now that we have NNTP and downloadable mbox support. Only lightly-checked for grammar and speling, and not yet formatting. Edits, corrections and addendums expected :>
2019-01-02t/feed.t: remove ssoma use
No need to waste cycles with this anymore.
2019-01-02t/v2reindex: use the larger text to increase test reliability
libxapian30:amd64 1.4.9-1 on Debian sid seems to give an 8KB position.glass database with "hello world" as the document regardless of our indexlevel. Use the text of the AGPL-3.0 for a more realistic Xapian database size. And perhaps tying our tests to the AGPL will make life more difficult for would-be copyright violators :>
2019-01-02INSTALL: note Plack and URI::Escape are required at the moment
They really shouldn't be... Also, it seems like eliminating IPC::Run is not going to be worth the effort.
2019-01-02inbox: keep Danga::Socket optional
We can't run cleanup stuff without Danga::Socket.
2019-01-01hval: set font-size:100% for all elements
GUI browsers have a tendency to use a larger (though sometimes smaller) font than the rest of the page for some reason I could not find... So set everything to 100% to give uniformity to the page; which benefits visually-challenged users who want to use gigantic fonts for the entire page.
2019-01-01TODO: avoid mentioning untrustworthy browser extensions
Old and new versions of Mozilla-based browsers seem to support userContent.css just fine. cf. https://www-archive.mozilla.org/unix/customizing.html#usercss http://kb.mozillazine.org/index.php?title=UserContent.css
2019-01-01TODO: support integration with cgit/gitweb/etc...
We support searching on blob identifiers for a reason :>
2018-12-30TODO: add a note for exposing a targeted reindexing API
2018-12-30handle "multipart/mixed" messages which are not multipart
I've found two examples on https://lore.kernel.org/lkml/ where the messages declared themselves to be "multipart/mixed" but were actually plain text: <87llgalspt.fsf@free.fr> <200308111450.h7BEoOu20077@mail.osdl.org> With the mboxrd downloaded, mutt is able to view them without difficulty. Note: this change would require reindexing of Xapian to pick up the changes. But it's only two ancient messages, the first was resent by the original sender and the second is too old to be relevant.
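The fallback described above can be sketched as follows. This is a minimal illustration using Python's standard `email` package, not public-inbox's Perl code; the sample message and the `displayable_text` helper are hypothetical.

```python
from email import message_from_string

# Hypothetical message: declares multipart/mixed but carries plain text
# and no MIME boundary at all.
MISLABELED = (
    "From: a@example.com\n"
    "Subject: test\n"
    "Content-Type: multipart/mixed\n"
    "\n"
    "just plain text\n"
)


def displayable_text(raw):
    """Return the text a reader would expect to see for this message."""
    msg = message_from_string(raw)
    if not msg.is_multipart():
        # Covers ordinary single-part mail and mail that only *claims*
        # to be multipart: the parser found no parts, so the payload is
        # still a plain string.
        return msg.get_payload()
    # Genuinely multipart: concatenate the text/plain leaf parts.
    return "".join(
        part.get_payload()
        for part in msg.walk()
        if part.get_content_type() == "text/plain"
    )
```

The point of the sketch is that the decision is based on what the parser actually found, not on what the Content-Type header declares.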
2018-12-30examples/cgit-commit-filter.lua: escape '&' properly in URL
2018-12-29t/git.t: reorder IPC::Run check
We can't skip tests after "use_ok"
2018-12-29t/cgi.t: shorten %ENV setting
No need to write our own loop when an assignment will do.
2018-12-29tests: consolidate process spawning code.
IPC::Run provides a nice simplification in several places; and we already use it (optionally) on a lot of tests. For the non-test code, we still rely on our vfork-capable Inline::C stuff since real-world server processes can get large enough to where vfork is an advantage. Maybe Perl5 can use CLONE_VFORK somehow, one day: https://rt.perl.org/Ticket/Display.html?id=128227 Ohg V'q engure cbeg choyvp-vaobk gb Ehol :C
2018-12-29examples/cgit-commit-filter.lua: update URLs
Let's Encrypt is working out nicely, so we can rely on HTTPS, now. Use 80x24.org instead of bogomips.org while we're at it, since I don't think the latter will remain.
2018-12-29TODO: add note for "IMAP IDLE"-like long-polling "git fetch"
2018-12-28wwwstream: always show multi-line cloning instructions
Unfortunately, long inbox names and URLs don't really display well with my gigantic fonts...
2018-12-28add filter for gmane archives
Extracted from import_slrnspool, since some spools get converted to mbox or what not.
2018-12-28init: allow --skip of old epochs for -V2 repos
This allows archivists to publish incomplete archives with newer mail while allowing "0.git" (or "1.git" and so on) epochs to be added-after-the-fact (without affecting "git clone" followers). A reindex will be necessary for Xapian and SQLite to catch up once the old epochs are added; but the reindexing code is also capable of tolerating missing epochs.
2018-12-28reply: allow ":none=$REASON" in "replyto" config
This can be useful for configuring archives of lists which are no longer active.
2018-12-27t/git-http-backend.t: remove TEST_CHUNK env setting
TEST_CHUNK has not been relevant since 2016: (commit bb38f0fcce73904e "http: chunk in the server, not middleware")
2018-12-27t/perf-nntpd.t: update for RFC 5536 sec 3.2.14 compliance
This performance test doesn't normally get run... Fixes: dd7049951c052c54 ("Put the NNTP server name into Xref lines")
2018-12-27init: do not set publicinbox.$NAME.indexlevel by default
It is redundant to set default values in the public-inbox config file. Let's not clutter up users' screens when they view or edit the config file.
2018-12-18TODO: add a note for davfs2 Range: support
And maybe I or somebody else interested will implement it, since fusedav is abandoned upstream and removed from Debian testing: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=840388 Yes, I have fusedav patches at https://bogomips.org/fusedav.git as noted in the above bug report, but I think davfs2 has more momentum at the moment.
2018-12-12doc/hosted: add glibc and bug-gnulib mirrors
These have existed for a while, actually, so, we might as well publicize them. While we're at it, add a disclaimer to discourage reliance on single points of failure.
2018-12-06nntp: prevent event_read from firing twice in a row
When a client starts pipelining requests to us which trigger long responses, we need to keep socket readiness checks disabled and only enable them when our socket rbuf is drained. Failure to do this caused aborted clients with "BUG: nested long response" when Danga::Socket calls event_read for read-readiness after our "next_tick" sub fires in the same event loop iteration. Reported-by: Jonathan Corbet <corbet@lwn.net> cf. https://public-inbox.org/meta/20181013124658.23b9f9d2@lwn.net/
2018-10-16Add Xrefs to over/xover lines
Putting the Xref field into xover lines allows newsreaders to mark cross-posted messages read when catching up a group. That, in turn, massively improves the life of crazy people who try to follow dozens of kernel lists, where emails are often heavily cross-posted.
2018-10-16Put the NNTP server name into Xref lines
RFC 5536 sec 3.2.14 says that the server-name in an Xref line is "which news server generated the header field"; indeed, that is necessary for newsreaders like gnus to handle references properly. So pick up the server name from the config if available (the first name if there's more than one), from the host name otherwise, and use it rather than the domain name of the list server. Tests have been adjusted to match the new behavior.
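The selection order described above (first configured name, otherwise the host name) might look like the sketch below. It is written in Python for illustration only; the config key `publicinbox.nntpserver`, the dict-based config, and both function names are assumptions, not the project's actual code.

```python
import socket


def xref_server_name(config):
    """Pick the server-name for Xref lines.

    `config` is a hypothetical dict; the key name is an assumption.
    """
    names = config.get("publicinbox.nntpserver") or []
    if isinstance(names, str):
        names = [names]
    # First configured name wins; fall back to the local host name.
    return names[0] if names else socket.gethostname()


def format_xref(server_name, locations):
    """Build an Xref line from (newsgroup, article number) pairs."""
    fields = [server_name] + ["%s:%d" % (group, num) for group, num in locations]
    return "Xref: " + " ".join(fields)
```

For example, `format_xref("news.example.org", [("inbox.a", 3), ("inbox.b", 7)])` yields a line with the server name first, followed by one `group:number` location per cross-posted group.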
2018-08-10Import.pm: When purging replace a purged file with a zero length file
This ensures that the number of added files remains the same and thus the article numbers derived from a repository will remain the same. I think this is the last place in public-inbox that has to be tweaked to guarantee the generated article number will remain the same in a public-inbox archive. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-05overidx: preserve `tid' column on re-indexing
Otherwise, walking backwards through history could mean the root message in a thread forgets its `tid' and it prevents messages from being looked up by it. This bug was hidden by the fact that `sid' matches were often good enough to link threads together.
2018-08-05view: distinguish strict and loose thread matches
The "loose" (Subject:-based) thread matching yields too many hits for some common subjects (e.g. "[GIT] Networking" on LKML) and causes thread skeletons to not show the current messages. Favor strict matches in the query and only add loose matches if there's space. While working on this, I noticed the backwards --reindex walk breaks `tid' on v1 repositories, at least. That bug was hidden by the Subject: match logic and not discovered until now. It will be fixed separately. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
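A rough sketch of "favor strict matches and only add loose matches if there's space", in Python for illustration. The function name, the list-of-Message-IDs inputs, and the `limit` parameter are hypothetical; the real change works on search queries rather than on lists.

```python
def pick_thread_matches(strict, loose, limit):
    """Fill up to `limit` slots, strict matches first.

    `strict` and `loose` are hypothetical lists of Message-IDs:
    strict matches share a thread ID, loose ones share a Subject.
    """
    picked = list(strict[:limit])
    seen = set(picked)
    for mid in loose:
        if len(picked) >= limit:
            break  # no space left for Subject:-based hits
        if mid not in seen:
            picked.append(mid)
            seen.add(mid)
    return picked
```

With a common subject, the loose list can be very long, but it can no longer crowd out the messages that actually belong to the thread.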
2018-08-03Merge branch 'eb/index-incremental'
Incremental indexing fixes from Eric W. Biederman. These prevent the highest message number in msgmap from being reassigned after deletes in rare cases and ensure messages are deleted from msgmap in v2.
* eb/index-incremental:
  V2Writeable.pm: In unindex_oid delete the message from msgmap
  V2Writeable.pm: Ensure that a found message number is in the msgmap
  SearchIdx,V2Writeable: Update num_highwater on optimized deletes
  t/v[12]reindex.t: Verify the num highwater is as expected
  t/v[12]reindex.t Verify num_highwater
  Msgmap.pm: Track the largest value of num ever assigned
  SearchIdx.pm: Always assign numbers backwards during incremental indexing
  t/v[12]reindex.t: Test incremental indexing works
  t/v[12]reindex.t: Test that the resulting msgmap is as expected
  t/v[12]reindex.t: Place expected second in Xapian tests
  t/v2reindex.t: Isolate the test cases more
  t/v1reindex.t: Isolate the test cases
  Import.pm: Don't assume {in} and {out} always exist
2018-08-03V2Writeable.pm: In unindex_oid delete the message from msgmap
Now that we track the num highwater mark it is safe to remove messages from msgmap that have been previously allocated. Removing even the highest numbered article will no longer cause new message numbers to move backwards. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-03V2Writeable.pm: Ensure that a found message number is in the msgmap
The lookup to see if a num has already been assigned to a message happens in a temporary copy of message map. It is possible that the number has been removed from the current message map. The unindex/reindex after a history rewrite triggered by a purge should be one such case. Therefore add the number to the msgmap in case it is not currently present. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-03SearchIdx,V2Writeable: Update num_highwater on optimized deletes
When performing an incremental index update with index_sync, if a message is seen to be both added and deleted, update the num_highwater mark even though the message is not otherwise indexed. This ensures index_sync generates the same msgmap no matter which commit it stops at during incremental syncs. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-03t/v[12]reindex.t: Verify the num highwater is as expected
Instrument the tests to verify the num highwater mark is where it is expected. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-03t/v[12]reindex.t Verify num_highwater
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-03Msgmap.pm: Track the largest value of num ever assigned
Today the only thing that prevents public-inbox from reusing the message numbers of deleted messages is the sqlite autoincrement magic, and that only works part of the time. The new incremental indexing test has revealed areas where public-inbox does try to reuse numbers of deleted messages today. Reusing the message numbers of existing messages is a problem because if a client ever sees messages that are subsequently deleted, the client will not see the new messages with their old numbers. In practice this is difficult to trigger because it requires the most recently added message to be removed and have the removal show up in a separate pull request. Still, it can happen and it should be handled. Instead of inferring the highest number ever used by finding the maximum number in the message map, track the largest number ever assigned directly. Update Msgmap to track this value and update the indexers to use this value. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
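The difference between inferring the next number from the current maximum and tracking a high-water mark can be shown with a small in-memory sketch. This is Python for illustration only; the class and function names are hypothetical, and the real Msgmap is backed by SQLite.

```python
class MsgMapSketch:
    """Hypothetical in-memory stand-in for a message-number map."""

    def __init__(self):
        self.by_num = {}          # article number -> Message-ID
        self.num_highwater = 0    # largest number ever assigned

    def add(self, mid):
        # Always allocate past the high-water mark, never past max(by_num).
        self.num_highwater += 1
        self.by_num[self.num_highwater] = mid
        return self.num_highwater

    def delete(self, num):
        # Deleting never lowers the high-water mark.
        self.by_num.pop(num, None)


def naive_next_num(msgmap):
    """The inference this commit replaces: max existing number + 1."""
    return max(msgmap.by_num, default=0) + 1
```

After adding three messages and deleting number 3, `naive_next_num` would hand out 3 again, while `add` hands out 4, so a client that already saw article 3 is not confused by a different message appearing under the same number.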
2018-08-03search: (really) match the behavior of WWW for indexing text
Not sure what was going through my mind when I made my first attempt at this, but we really want to make sure we index all the text we display in the web view (and presumably anything a reasonable mail client can display). Followup-to: 0cf6196025d4e4880cd1ed859257ce21dd3cdcf6 ("search: match the behavior of WWW for indexing text")
2018-08-02SearchIdx.pm: Always assign numbers backwards during incremental indexing
When walking messages newest to oldest, assigning the larger numbers before smaller numbers ensures older messages get smaller numbers. This leads to the possibility of a msgmap that can be regenerated when needed. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
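As an illustration of the numbering order, the sketch below assigns numbers while walking newest to oldest. It is Python, not the project's Perl; the function name and inputs are hypothetical, and it assumes the count of new messages is known up front and ignores deletes, which the real indexer has to handle.

```python
def assign_backwards(newest_first, highwater):
    """Number new messages while walking from newest to oldest.

    `newest_first` is a hypothetical list of Message-IDs in walk order;
    `highwater` is the largest number assigned before this sync.
    """
    num = highwater + len(newest_first)
    assigned = {}
    for mid in newest_first:
        assigned[mid] = num   # newest message gets the largest number
        num -= 1
    return assigned
```

Walking `["c", "b", "a"]` with a high-water mark of 5 gives `a` the number 6 and `c` the number 8, the same result an oldest-first walk would produce.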
2018-08-02t/v[12]reindex.t: Test incremental indexing works
Capture interesting commits of the test repository in mark variables. Use those marks to build interesting scenarios where index_sync proceeds as if those marks are the heads of the repository. Use this capability to test what happens when adds and deletes are mixed within a repository. Be sad because things don't yet work as they should. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-02t/v[12]reindex.t: Test that the resulting msgmap is as expected
Deeply inspect the entire message map in the reindexing tests as the actual message order is significant and can result in surprises. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-02t/v[12]reindex.t: Place expected second in Xapian tests
Place the expected value second in is and isnt tests because when these tests fail they report the second value as the expected value. A report saying got 0 expected 8 'no Xapian search results' can be confusing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-02t/v2reindex.t: Isolate the test cases more
While inspecting the tests I realized that because we have been reusing variables there can be a memory between one test case and another. Add scopes and local variables to prevent an unintended memory between one test case and another. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-02t/v1reindex.t: Isolate the test cases
While inspecting the tests I realized that because we have been reusing variables there can be a memory between one test case and another. Add scopes and local variables to prevent an unintended memory between one test and another. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-08-02Import.pm: Don't assume {in} and {out} always exist
While working on one of the tests I did:
    my $im = PublicInbox::V2Writable->new($ibx, 1);
    my $im0 = $im->importer();
    $im->add($mime);
Which resulted in a warning of the use of an undefined value from atfork_child, and the test failing nastily. Inspection of the code reveals this can happen anytime gfi_start has not been called. So just fix atfork_child to skip closing file descriptors that have not yet been set up. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
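The shape of the fix can be sketched as a guard around each close. This is a Python illustration, not the actual Perl change; the dict-based `state` and its `in`/`out` keys merely mirror the `{in}` and `{out}` fields named in the subject line.

```python
import io


def atfork_child(state):
    """Close the importer's pipes in a child, if they were ever opened.

    `state` is a hypothetical dict standing in for the importer object.
    """
    for key in ("in", "out"):
        handle = state.get(key)
        if handle is not None:   # skip handles that were never set up
            handle.close()
```

Calling `atfork_child({})` is then a no-op instead of an error, which corresponds to the case where gfi_start has not been called.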
2018-07-30ProcessPipe.pm: Use read not sysread
While playing with git fast export I discovered that mixing <> and read would give inconsistent results. I tracked the issue down to using sysread in ProcessPipe instead of plain read. If it is desirable to use readline, I can't see how using sysread can work, since readline needs to use buffered I/O to be efficient. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
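The same hazard exists in other languages whenever buffered and unbuffered reads are mixed on one descriptor. The Python sketch below is an analogy, not the Perl code in ProcessPipe: `readline()` on a buffered file object pulls more than one line into its buffer, so a following raw `os.read()` on the same descriptor does not see that data.

```python
import os


def demo_mixed_reads():
    """Mix a buffered readline with a raw read on the same pipe."""
    r, w = os.pipe()
    os.write(w, b"line1\nline2\n")
    os.close(w)

    buffered = os.fdopen(r, "rb")
    first = buffered.readline()   # buffered: also slurps "line2\n"
    raw = os.read(r, 100)         # unbuffered: the pipe is already drained
    rest = buffered.read()        # the "missing" line is in the buffer
    buffered.close()
    return first, raw, rest
```

Using only buffered reads (or only raw reads) on a given handle avoids the inconsistency, which is the direction this commit takes.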
2018-07-29mda: allow configuring globally without spamc support
This reuses some of the configuration from -watch, but remains independent since some configurations will use -watch for some inboxes and -mda for others. The default remains "spamc" for -mda users so nothing changes without explicit configuration. Per-inbox configurations may also be supported in the future.
2018-07-29mda: v2: ensure message bodies are indexed
We must not clobber the original message string, as Email::MIME(*) still needs it for iterating through parts in SearchIdx (but not when handing it as a raw string to git-fast-import). I've noticed message bodies (especially dfpre/dpost) were not getting indexed when going through -mda (no problems with -watch). This also did not affect v1 repos, since indexing is a separate process for v1 and requires re-reading the data from git. (*) tested Email::MIME 1.937 on Debian stretch
2018-07-29t/v2mda: make it easy to test v1 repos here, too
It will help track down a bug which only seems to happen in v2 repos.