public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2023-11-10	www: add topics_(new|active).(html|atom) endpoints
	This seems like an easy (but WWW-specific) way to get recently created and recently active topics as suggested by Konstantin. To do this with Xapian will require new columns and reindexing; and I'm not sure if the current lei handling of search results by dumping results to a format readable by common MUAs would work well with this. A new TUI may be required... Suggested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20231107-skilled-cobra-of-swiftness-a6ff26@meerkat/
2023-01-30	use Net::SSLeay (OpenSSL) for SHA-(1|256) if installed
	On my x86-64 machine, OpenSSL SHA-256 is nearly twice as fast as the Digest::SHA implementation from Perl, most likely due to an optimized assembly implementation. SHA-1 is a few percent faster, too.
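The "use it if installed" pattern can be sketched roughly as below. This is an illustration only, not the code from this commit: the helper name sha256_hex_sketch is hypothetical, and it assumes the one-shot Net::SSLeay::SHA256() function is present in the installed Net::SSLeay build, falling back to Digest::SHA otherwise.

```perl
use strict;
use warnings;
use Digest::SHA ();

# Probe once at startup: is Net::SSLeay installed, and does this build
# expose the one-shot SHA256() function?
my $have_ssleay = eval {
    require Net::SSLeay;
    defined &Net::SSLeay::SHA256;
};

# Hypothetical helper: hex SHA-256 of a byte string, using OpenSSL when
# available and Digest::SHA otherwise.
sub sha256_hex_sketch {
    my ($data) = @_;
    my $raw = $have_ssleay ? Net::SSLeay::SHA256($data)
                           : Digest::SHA::sha256($data);
    unpack('H*', $raw);
}

print sha256_hex_sketch('abc'), "\n";
```

Both code paths return the same digest, so callers do not need to know which implementation was picked.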
2022-09-11	view: fix solver links with multiple messages
	For redundant messages sharing Message-IDs, the link to solver (/$INBOX/$OID/s/) was going up too many levels for /$INBOX/$MSGID/ when there were multiple messages sharing the same $MSGID. Unfortunately, redundant messages are common with /all/ due to signature trailers. So dynamically assigning {-spfx} is tricky and error prone from counting `/'. So simplify the code a bit by setting {-spfx} once per HTTP request, instead of every single message.
2022-09-10	www: use PerlIO::scalar (zfh) for buffering
	Calling Compress::Raw::Zlib::deflate is fairly expensive. Relying on the `.=' (concat) operator inside the ->zadd method is faster, but the method dispatch overhead is noticeable compared to the original code where we had bare `.=' littered throughout. Fortunately, `print' and `say' with the PerlIO::scalar IO layer appear to offer better performance without high method dispatch overhead. This doesn't allow us to save as much memory as I originally hoped, but does allow us to rely less on concat operators in other places and just pass a list of args to `print' and `say' as appropriate. This does reduce scratchpad use, however, allowing for large memory savings, and we still ->deflate every single $eml.
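For readers unfamiliar with PerlIO::scalar, a minimal sketch of buffering through an in-memory filehandle follows. The variable names ($zfh, $buf) and the sample strings are illustrative assumptions, not taken from the public-inbox source.

```perl
use strict;
use warnings;

# Open a write-only filehandle backed by an in-memory scalar
# (this is what the PerlIO::scalar layer provides).
my $buf = '';
open(my $zfh, '>', \$buf) or die "open in-memory handle: $!";

# Pass a list of arguments to print instead of chaining `.=' by hand.
my ($subject, $from) = ('hello world', 'a@example.com');
print {$zfh} '<b>', $subject, '</b>', "\n";
print {$zfh} 'From: ', $from, "\n";
close($zfh) or die "close: $!";

# $buf now holds everything printed above and could be handed to
# a compressor in one call.
print $buf;
```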
2022-09-10	www: switch to zadd for the majority of buffering
	This allows us to focus string concatenations in one place to allow Perl internal scratchpad optimizations to reuse memory. Calling Compress::Raw::Zlib::deflate repeatedly proves too expensive in terms of CPU cycles.
2022-09-10	www: drop {obuf} use entirely, for now
	This may help us identify hot spots and reduce pad space as needed.
2022-09-10	view: remove multipart_text_as_html
	It seems like a pointless wrapper function that's not saving us a whole lot. Drop some direct {obuf} manipulation while we're at it.
2022-09-10	www_atom_stream: require 200 response
	This simplifies parameter passing at the moment. I can't imagine an Atom feed reader would be parsing XML for 404s or other error codes.
2022-08-29	www: atom: fix "changed" href to nowhere
	The HTML generated for the Atom feed doesn't have the footer of /T/ and /t/ HTML-only views, so just make "changed" in the diffstat go directly to the permalink #related anchor. Fixes: 66512e177390 ("view: generate query in single-message and commit views")
2022-08-26	www: fix unindexed v1 inboxes w/ public-inbox-httpd
	Unindexed v1 inboxes were leaving $smsg objects unpopulated when using public-inbox-httpd (but not generic PSGI servers) and causing missing HTML content and uninitialized value warnings. Our existing tests for unindexed v1 inboxes only assumed generic PSGI servers and synchronous blob retrieval. Due to changes several years ago to make git blob retrieval async for slow storage using public-inbox-httpd, our tests were insufficient to detect this regression. So ensure $smsg->populate runs in a few places and rewrite t/plack.t to test against both generic PSGI and -httpd implementations. Fortunately, unindexed v1 inboxes are uncommon, and this bug was only (finally) discovered while developing other features. For ensuring we can test (and not blindly follow) redirects with -httpd, we now provide our own LWP::UserAgent (used internally by Plack::Test::ExternalServer) with redirect following disabled to P:T:ES::test_psgi.
2021-10-25	gzip_filter: delay async wcb call
	This will let us modify the response header later to set a proper charset for Content-Type when displaying raw messages. Cc: Thomas Weißschuh <thomas@t-8ch.de>
2021-10-22	wwwatomstream: call gmtime with scalar
	When the gmtime() calls were moved from feed_entry() and atom_header() into feed_updated() in c447bbbd, @_ rather than a scalar was passed to gmtime(). As a result, feed <updated> values end up as "1970-01-01T00:00:00Z". Switch back to using a scalar argument to restore the correct timestamps. Fixes: c447bbbddb4ac8e1 ("wwwatomstream: simplify feed_update callers")
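A minimal sketch of the corrected pattern is shown below: the epoch is unpacked from @_ into a scalar, and only that scalar is handed to gmtime(). The helper name atom_time is hypothetical and is not the project's feed_updated.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format seconds-since-the-epoch as an Atom
# <updated> value, always in UTC ("Z").
sub atom_time {
    my ($epoch) = @_;    # unpack the scalar first
    # pass the scalar, never @_ itself, to gmtime()
    strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($epoch));
}

print atom_time(86400), "\n";    # prints 1970-01-02T00:00:00Z
```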
2021-03-17	extindex: add some validation and config knobs for WWW
	We'll try to share a bit more configuration with extindex entries for WWW PSGI usage.
2021-01-01	update copyrights for 2021
	Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-09	treewide: replace {-inbox} with {ibx} for consistency
	{ibx} is shorter and is the most prevalent abbreviation in indexing and IMAP code, and the `$ibx' local variable is already prevalent throughout. In general, the codebase favors removal of vowels in variable and field names to denote non-references (because references are "lighter" than non-references). So update WWW and Filter users to use the same code since it reduces confusion and may allow easier code sharing.
2020-08-02	remove unnecessary ->header_obj calls
	We used ->header_obj in the past as an optimization with Email::MIME. That optimization is no longer necessary with PublicInbox::Eml. This doesn't make any functional difference even if we were to go back to Email::MIME. However, it reduces the amount of code we have and slightly reduces allocations with PublicInbox::Eml.
2020-08-01	www: rework async_* to use method table
	Although the ->async_next method does not take $self as a receiver, but rather a PublicInbox::HTTP object, we may still retrieve it to be called with the HTTP object via UNIVERSAL->can.
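The lookup described above can be illustrated with a small, self-contained sketch. The Sketch::Response package and the hash standing in for the HTTP object are assumptions made for the example; only the use of ->can to fetch a code reference from the method table reflects the commit message.

```perl
use strict;
use warnings;

package Sketch::Response;
sub new { bless {}, shift }

# Written to receive an HTTP-like object as its first argument,
# rather than an instance of this class.
sub async_next {
    my ($http) = @_;
    "next for $http->{name}";
}

package main;

my $res  = Sketch::Response->new;
my $http = { name => 'http-object' };   # stand-in for the real HTTP object

# ->can returns a plain code reference found via the method table,
# which we are then free to call with any first argument we like.
my $cb = $res->can('async_next');
print $cb->($http), "\n";               # prints "next for http-object"
```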
2020-07-10	wwwatomstream: avoid uninitialized warnings for $email
	As in Import, we'll fall back to Sender: if From: is missing, and use the primary_address of the inboxes to indicate the total absence of those fields.
2020-07-06	www: update internal docs
	We no longer favor getline+close for streaming PSGI responses when using public-inbox-httpd. We still support it for other PSGI servers, though.
2020-07-06	www: start making gzipfilter the parent response class
	Virtually all of our responses are going to be gzipped, anyways. This will allow us to utilize zlib as a buffering layer and share common code for async blob retrieval responses. To streamline this and allow GzipFilter to be a parent class, we'll replace the NoopFilter with a similar CompressNoop class which emulates the two Compress::Raw::Zlib::Deflate methods we use. This drops a bunch of redundant code and will hopefully make upcoming WwwStream changes easier to reason about.
2020-07-06	wwwatomstream: support async blob fetch
	This allows -httpd to handle other requests while waiting for git to retrieve and decode blobs. We'll also break apart t/psgi_v2.t further to ensure tests run against -httpd in addition to generic PSGI testing. Using xt/httpd-async-stream.t to test against clones of meta@public-inbox.org shows a 10-12% performance improvement with the following env: TEST_JOBS=1000 TEST_CURL_OPT=--compressed TEST_ENDPOINT=new.atom
2020-07-06	wwwatomstream: reuse $ctx as $self
	No need to deepen our object graph, here.
2020-07-06	wwwatomstream: use PublicInbox::Inbox->modified for feed_updated
	stat(2) on the inboxdir is unlikely to be correct, now that msgmap truncates its journal (rather than unlinking it).
2020-07-06	wwwatomstream: simplify feed_update callers
	We always return Z (UTC) times, anyways, so we'll always use gmtime() on the seconds-after-the-epoch.
2020-07-06	www*stream: gzip ->getline responses
	Our most common endpoints deserve to be gzipped.
2020-06-03	wwwatomstream: drop smsg->{mid} fallback for non-SQLite
	It's no longer necessary to populate the smsg->{mid} field now that ->smsg_eml calls smsg->populate in rare cases where the smsg did not originate from SQLite.
2020-06-03	wwwatomstream: convert callers to use smsg_eml
	We can simplify WwwAtomStream callbacks by performing ->smsg_eml calls in the `feed_entry' sub itself. This simplifies callers, by reducing the number of places which can load an Eml object into memory.
2020-04-19	wwwatomstream: move {emit_header} field to $self
	There's no need to pollute the cross-package $ctx with it.
2020-02-16	view: escape ampersand in Message-IDs
	We need to escape ampersands (and some other characters for href attributes), so introduce a `mid_href' sub to do just that. '<', '>' and '"' were always escaped, so there's no risk of tag or attribute injection, but creative Message-IDs could cause confusion for some parsers and generate invalid URLs. Start getting rid of the bloated, over-engineered OO Hval API while we're at it, I only noticed this bug because I started killing off Hval->new* callers.
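As an illustration of the kind of escaping involved, here is a rough sketch of a `mid_href'-style helper. The name mid_href_sketch and the exact set of characters left unescaped are assumptions for this example and are not copied from public-inbox.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a `mid_href'-style helper: make a Message-ID
# safe to embed in the path of an href attribute.
sub mid_href_sketch {
    my ($mid) = @_;
    utf8::encode($mid);    # operate on bytes, not characters
    # Percent-encode everything outside a conservative URL-safe set;
    # this covers '&', '/', '<', '>' and '"' among others.
    $mid =~ s/([^A-Za-z0-9\-_.~\@])/sprintf('%%%02X', ord($1))/ge;
    $mid;
}

print mid_href_sketch('a&b/c@example.com'), "\n";   # a%26b%2Fc@example.com
```

Because no '&', '<', '>' or '"' survives the percent-encoding, the result needs no further HTML escaping before it is placed in an attribute.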
2020-02-16	view: remove mhref arg from multipart_text_as_html
	No point in passing something on stack only to stash it into the $ctx which holds most other parameters used for rendering the HTML.
2020-02-06	treewide: run update-copyrights from gnulib for 2019
	I didn't wait until September to do it, this year!
2020-01-27	view: start performing buffering into {obuf}
	Get rid of the confusingly named {rv} and {tip} fields and unify them into {obuf} for readability. {obuf} usage may be expanded to more areas in the future. This will eventually make it easier for us to experiment with alternative buffering schemes.
2020-01-27	wwwstream: favor \&close instead of close
	Be explicit that we're making a code reference, and not a reference to a scalar, array, hash, or IO...
2020-01-12	www: discard multipart parent on iteration
	We're often iterating through messages while writing to another buffer in our WWW interface, causing memory usage to multiply. Since we know we won't need to keep the MIME object around in some cases, we can tell msg_iter to clobber the on-stack variable while it operates on subparts of multipart messages. With xt/mem-msgview.t switched to multipart from the previous commit, this shows a 13 MB memory reduction on that test.
2020-01-06	treewide: "require" + "use" cleanup and docs
	There's a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it gives us extra compile-time checking.
2019-12-27	feed: avoid anonymous subs
	WwwStream already passes the WWW $ctx to the user-supplied callback, and it's a trivial change for WwwAtomStream to do the same. Callers in Feed.pm can now take advantage of that to save a few kilobytes of memory on every response.
2019-10-16	config: support "inboxdir" in addition to "mainrepo"
	"mainrepo" was a bad name and an artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-09-20	wwwatomstream: fix per-feed <id>
	We were emitting the same "<id>mailto:name@domain</id>" tag for every feed (but not per-feed entry). This could cause feed readers to mistake the top (news.atom) feed for other feeds (search results, or per-thread feeds). This is technically a breaking change for people relying on per-thread or per-query feeds, but the only alternative is to remain broken for anybody trying to follow multiple feeds off the same inbox.
2019-09-09	run update-copyrights from gnulib for 2019

2019-02-01	view: diffstat anchors for multi-message/attachment views
	diffstat <-> ^diff anchors work within the same attachment or message while in HTML views which display multiple messages.
2019-01-09	doc: various overview-level module comments
	Hopefully this helps people familiarize themselves with the source code.
2018-03-27	view: depend on SearchMsg for Message-ID
	Since we need to handle messages with multiple and duplicate Message-ID headers, our thread skeleton display must account for that. Since we have a "preferred" Message-ID in case of conflicts, use it as the UUID in an Atom feed so readers do not get confused by conflicts.
2018-03-06	favor Received: date over Date: header globally
	The first Received: header is believable since it typically hits the user's mail server and can be treated as relatively trustworthy. We still show the Date: in per-message (permalink) views, which may expose users for having incorrect Date: headers, but all the ISO YYYY-MM-DD dates we display will match what we see.
2018-02-07	update copyrights for 2018
	Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2018-01-26	atom: show metadata before message body
	This can allow streaming parsers (SAX) to work a little more efficiently as they can handle/discard all the metadata before the big content.
2017-07-13	www: Atom stream respects timezone
	Oops, we must not discard the timezone when parsing dates for the Atom stream.
2017-01-07	remove incorrect comment about strftime + locales
	We only need strftime to be locale-independent when generating dates for email and HTTP headers. Purely numeric dates can use strftime for ease-of-readability.
2016-12-17	atom: implement message threading per RFC 4685
	This will allow certain feed readers to render a message thread as described in <https://www.jwz.org/doc/threading.html>. Feed readers with knowledge of RFC 4685 are unknown to us at this time, but perhaps this will encourage future implementations. Existing feed readers I've tested (newsbeuter, feed2imap) seem to ignore these tags gracefully without degradation.
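For reference, RFC 4685 expresses a reply relationship with a thr:in-reply-to element in the http://purl.org/syndication/thread/1.0 namespace, which the enclosing feed element must declare as xmlns:thr. A minimal sketch of emitting such an element follows; the helper names, the urn:uuid value and the example.com URL are all hypothetical.

```perl
use strict;
use warnings;

# Minimal XML attribute escaping for this sketch
sub xml_attr {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s;
}

# Build the element linking an entry to its parent: `ref' is the
# parent's Atom <id>, `href' is where the parent can be fetched.
sub in_reply_to_tag {
    my ($ref, $href) = @_;
    '<thr:in-reply-to ref="' . xml_attr($ref) .
        '" href="' . xml_attr($href) . '"/>';
}

print in_reply_to_tag('urn:uuid:parent-id',
    'https://example.com/inbox/parent@example.com/'), "\n";
```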
2016-12-03	atom: switch to getline/close for response bodies
	This will let us stream larger Atom document bodies without wasting too much memory and reduce the amount of round-trip requests needed to get necessary information. Hopefully clients are using streaming (SAX) parsers, too. This is the final transition in the core public-inbox code to allow migrating to a "pull"-based body streaming scheme which allows an HTTP server to respond appropriately to backpressure from slow clients.
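The "pull"-based scheme relies on the PSGI convention that a response body may be an object implementing getline and close. The sketch below shows the general shape under stated assumptions: Sketch::Body and its fixed chunk list are hypothetical, and the loop at the end only emulates what a PSGI server would do.

```perl
use strict;
use warnings;

package Sketch::Body;

sub new {
    my ($class, @chunks) = @_;
    bless { chunks => [@chunks] }, $class;
}

# The server calls ->getline whenever it is ready to write more,
# so a slow client naturally slows down body generation.
sub getline { shift @{ $_[0]->{chunks} } }    # undef signals end of body

# Called once by the server when it is done with the body
sub close { @{ $_[0]->{chunks} } = (); 1 }

package main;

my $app = sub {
    my ($env) = @_;
    my $body = Sketch::Body->new("<feed>\n", "<entry/>\n", "</feed>\n");
    [ 200, [ 'Content-Type' => 'application/atom+xml' ], $body ];
};

# Emulate the server side: pull chunks until getline returns undef
my $res = $app->({});
my $out = '';
while (defined(my $chunk = $res->[2]->getline)) {
    $out .= $chunk;
}
$res->[2]->close;
print $out;
```

A real body object would generate each chunk on demand inside getline instead of holding them all in memory up front.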