public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2021-10-24	thread: avoid Perl5 internal scratchpad target cache
	The use of array-returning built-ins such as `grep' inside arrayref declarations appears to result in permanently allocated scratchpad space for caching according to my malloc inspector. Thread skeletons get discarded every response, but multiple skeletons can exist in memory at once, so do what we can to prevent long-lived allocations from being made, here. In other words, replacing constructs such as: my $foo = [ grep(...) ]; with: my @foo = grep(...); Seems to ensure the mortality of the underlying array.
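	The two constructs named above can be put side by side in a minimal sketch. The message records and the sub names (`visible_ref', `visible_arr') are hypothetical, not taken from the thread code, and the sketch only shows that both forms yield the same result; it does not measure the allocation behaviour described in the commit.

```perl
use strict;
use warnings;

# Hypothetical message records; only the shape matters here.
my @msgs = map { +{ num => $_, ghost => $_ % 2 } } 1..6;

# Form the commit moves away from: a list-returning built-in
# inside an anonymous arrayref constructor.
sub visible_ref {
    my $kept = [ grep { !$_->{ghost} } @_ ];
    scalar(@$kept);
}

# Form the commit moves to: assign the list to a lexical array.
sub visible_arr {
    my @kept = grep { !$_->{ghost} } @_;
    scalar(@kept);
}

print visible_ref(@msgs), ' ', visible_arr(@msgs), "\n";
```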
2021-10-09	view: save memory by dropping smsg->{from_name} on use
	We'll also save a few LoC when generating it. $smsg objects can linger a while when rendering large threads, so saving a few bytes here can add up to several hundred KB saved. I noticed this while chasing the ref cycle leak in commit b28e74c9dc0a (www: fix ref cycle from threading w/ extindex, 2021-10-03). While there's no longer a leak, releasing memory earlier can allow it to be reused sooner and reduce both memory traffic and memory pressure.
2021-10-09	view: discard Eml->{bdy} when done using
	We can release the raw body buffer once we've obtained a copy of the decoded buffer. This reduces memory pressure ahead of some expensive diff processing.
2021-10-06	msg_iter: split_quotes adds trailing "\n"
	The regexp in split_quotes relies on the presence of a final "\n", so add it wherever we need to instead of making it the responsibility of every caller. This probably doesn't matter in practice since every email seems to have a "\n" as the final byte (due to the way SMTP works), but maybe there's some odd ones that'll get imported via lei.
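	As an illustration of why a line-oriented regexp wants a final "\n", here is a hypothetical splitter. It is not the actual split_quotes implementation: the regexp, the sub name and the choice to append "\n" only when it is missing are all assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical splitter: returns non-quoted and quoted ("> "-prefixed)
# chunks in order of appearance.
sub split_quotes_sketch {
    my ($text) = @_;
    # The regexp below only matches whole lines, so make sure the
    # last line is terminated (assumption: append only if missing).
    $text .= "\n" unless $text =~ /\n\z/;
    grep { length } split(/((?:^>[^\n]*\n)+)/m, $text);
}

my @parts = split_quotes_sketch("hi\n> quoted");
print scalar(@parts), " chunks\n";
```

	Without the appended newline, the final "> quoted" line would not match the quote pattern and would stay glued to the preceding text.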
2021-09-29	www: do not bump {over} refcnt on long responses
	SQLite files may be replaced or removed by admins while generating large thread or mailbox responses. Ensure we don't hold onto DBI handles and associated file descriptors past their cleanup.
2021-09-02	www: handle name-only publicinbox.*.url entries
	Apparently URLs can be configured relatively for HTTP(S) setups, so attempt to support them when linking to cross-posted messages. This also fixes the top-row (mirror/help/color/Atom feed) links in /$INBOX_URL/$EXTMSG_MSGID/T/ (and /t/) URLs. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210902191239.cmbxlmjqcsmdzmqp@meerkat.local/
2021-08-28	get rid of unnecessary bytes::length usage
	The only place where we could return wide characters with -httpd was the raw $INBOX_DIR/description text, which is now converted to octets. All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode, so length() and bytes::length() are equivalent on reads. For socket writes, any non-octet data would warn about wide characters and we are strict in warnings with test_httpd. All gzipped buffers are also octets, as is PublicInbox::Eml->body, and anything from PerlIO objects ("git cat-file --batch" output, filesystems), so bytes::length was unnecessary in all those places.
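	The equivalence claimed above can be checked with a small sketch. The description string is made up; the point is only that length() and bytes::length() disagree on a character string containing wide characters and agree once the string has been converted to octets.

```perl
use strict;
use warnings;
use bytes (); # load bytes::length without enabling the pragma

# Hypothetical description text containing a wide character (U+2022)
my $desc = "caf\x{e9} \x{2022} list";
my $chars = length($desc);                # counts characters
my $octets_before = bytes::length($desc); # counts internal UTF-8 bytes

utf8::encode($desc); # in-place conversion to octets
my $len_after = length($desc);
my $octets_after = bytes::length($desc);

print "$chars $octets_before $len_after $octets_after\n";
```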
2021-08-17	view: remove mbox.gz and Atom from topic view
	This declutters the topic view since these links seem rarely used. Atom and mbox.gz links probably make most sense when users have read the HTML and decide the topic is worth following or downloading. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210816154444.sj3ks2sikq3x2ywx@nitro.local/
2021-08-14	www: avoid uninitialized vars from shadowed Message-IDs
	For /all/ (extindex) and the like, Message-ID reuse from client errors or list-injected footers can cause threading weirdness. Avoid auto-vivification in the mapping table and dereferencing of unknown messages.
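	A minimal sketch of the auto-vivification hazard, assuming a hypothetical Message-ID table (`%by_mid') and helper (`children_of'); neither name comes from the WWW code. A nested lookup on an unknown key creates that key as a side effect, while a plain single-level lookup does not.

```perl
use strict;
use warnings;

# Hypothetical Message-ID => node table
my %by_mid = ('a@example' => { children => ['b@example'] });

# Hazard: a nested rvalue lookup vivifies the missing outer key.
my %copy = %by_mid; # shallow copy, so %by_mid stays untouched
my @kids = @{ $copy{'missing@example'}{children} || [] };

# Safer: do a single-level lookup first and bail out on unknown IDs.
sub children_of {
    my ($map, $mid) = @_;
    my $node = $map->{$mid} or return; # plain lookup does not vivify
    @{ $node->{children} // [] };
}

my @none = children_of(\%by_mid, 'missing@example');
print exists $copy{'missing@example'} ? "vivified\n" : "clean\n";
```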
2021-06-20	view: extra check for redundant messages in HTML view
	There appear to be some cases of duplicates appearing due to -extindex. I haven't nailed down the cause of it, yet, but this should make things easier for readers using the PSGI HTML interface in the meantime. The raw mboxrd remains undeduplicated for now, and the correct fix/workaround would be some fsck-like mode for public-inbox-extindex.
2021-04-28	view: add [thread overview] anchor next to Date:
	The existing Subject: anchor to #r may not be 100% obvious, and we can't stick the phrase "[thread overview]" into the same line as the Subject without introducing ambiguity. Fortunately, we have the Date: header directly under it. Adding "[thread overview]" after the Date: is unambiguous and won't make the line too long for valid emails. This hopefully improves navigation ever-so-slightly thanks to comments by Son Luong Ngoc. Reported-by: Son Luong Ngoc <sluongng@gmail.com> Link: https://public-inbox.org/git/YHhfsqfTJ9NzRwS1@C02YX140LVDN.corpad.adbkng.com/
2021-03-17	config: lazy-load coderepos, support extindex
	Extsearch objects are duck-types of Inbox objects, and are capable of supporting code repos all the same.
2021-01-01	update copyrights for 2021
	Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-09	treewide: replace {-inbox} with {ibx} for consistency
	{ibx} is shorter and is the most prevalent abbreviation in indexing and IMAP code, and the `$ibx' local variable is already prevalent throughout. In general, the codebase favors removal of vowels in variable and field names to denote non-references (because references are "lighter" than non-references). So update WWW and Filter users to use the same code since it reduces confusion and may allow easier code sharing.
2020-09-12	treewide: avoid `goto &NAME' for tail recursion
	Perl implements tail recursion via `goto', which allows avoiding warnings on deep recursion. However, it doesn't (as of 5.28) optimize the speed of such dispatches, though it may reduce ephemeral memory usage. Make the code less alien to hackers coming from other languages by using normal subroutine dispatch. It's actually slightly faster in micro benchmarks due to the complexity of `goto &NAME'.
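	For readers unfamiliar with the idiom, the two dispatch styles look roughly like this. The subs are hypothetical toy examples, not code from the tree, and the recursion depth is kept small so no "Deep recursion" warning is triggered by the plain call.

```perl
use strict;
use warnings;

# Tail call via `goto &NAME': the current frame is replaced, and the
# callee receives whatever is in @_ at that point.
sub sum_goto {
    my ($n, $acc) = @_;
    return $acc if $n == 0;
    @_ = ($n - 1, $acc + $n);
    goto &sum_goto;
}

# Normal subroutine dispatch, as favored by the commit.
sub sum_call {
    my ($n, $acc) = @_;
    return $acc if $n == 0;
    return sum_call($n - 1, $acc + $n);
}

print sum_goto(50, 0), ' ', sum_call(50, 0), "\n";
```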
2020-08-28	www: more descriptive pagination
	Being an easily confused person, I find "next" and "prev" ambiguous as to whether messages on the next or previous page will be newer or older than the current page. Clarify that for the threaded /$INBOX/ view and search results. For search results sorted by relevance, we'll use "[>= $SCORE]" or "[<= $SCORE]" to indicate directionality. This also fixes $INBOX/new.html for unindexed v1 inboxes.
2020-08-28	www: improve navigation around contemporary threads
	Sometimes it's useful to quickly get to threads and messages which are contemporaries of the current thread/message being focused on. This hopefully improves navigation by: a) making the top line (where $INBOX_DIR/description is shown) a link to the latest topics in search results and per-thread/per-message views. b) providing a link to contemporaries ("~YYYY-MM-DD") at around the thread overview skeleton area for per-thread and per-message views.
2020-08-07	www: avoid warnings on YYYYMMDD-only t= query parameter
	While we always generate YYYYMMDDhhmmss query parameters ourselves, the regexps in paginate_recent allow YYYYMMDD-only (no hhmmss) timestamps, so don't trigger Time::Local::timegm warnings about empty numeric comparisons on empty strings when a client starts making up their own URLs.
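	One way to accept both timestamp forms without feeding undef to Time::Local::timegm is sketched below. The sub name `parse_t' and its regexp are assumptions for illustration; the actual paginate_recent code may differ.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical parser for a t= value: YYYYMMDD with optional hhmmss.
sub parse_t {
    my ($t) = @_;
    my ($Y, $m, $d, $H, $M, $S) =
        $t =~ /\A(\d{4})(\d\d)(\d\d)(?:(\d\d)(\d\d)(\d\d))?\z/
        or return;
    # hhmmss may be absent; default each field to 0 rather than
    # passing undef into timegm.
    timegm($S // 0, $M // 0, $H // 0, $d, $m - 1, $Y);
}

print parse_t('20200807'), "\n";
```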
2020-08-02	remove unnecessary ->header_obj calls
	We used ->header_obj in the past as an optimization with Email::MIME. That optimization is no longer necessary with PublicInbox::Eml. This doesn't make any functional difference even if we were to go back to Email::MIME. However, it reduces the amount of code we have and slightly reduces allocations with PublicInbox::Eml.
2020-07-06	view: simplify eml_entry callers further
	This simplifies the primary callers of eml_entry while only making mknews.perl worse.
2020-07-06	www: update internal docs
	We no longer favor getline+close for streaming PSGI responses when using public-inbox-httpd. We still support it for other PSGI servers, though.
2020-07-06	wwwstream: eliminate ::response, use html_oneshot
	All of our streaming responses use ::aresponse, now, and our synchronous responses use html_oneshot. So there's no need for the old WwwStream::response.
2020-07-06	view: /$INBOX/$MSGID/t/: avoid extra hash lookup in eml case
	We can build and buffer the HTML <head> section once the first non-ghost message in a thread is loaded, so there's no need to perform an extra check on $ctx->{nr} once the $eml is ready.
2020-07-06	view: eml_entry: reduce parameters
	We can save stack space and simplify subroutine calls, here.
2020-07-06	view: update /$INBOX/$MSGID/T/ to be async
	Another 10% or so speedup in a frequently-hit endpoint.
2020-07-06	view: /$INBOX/$MSGID/t/ reads blobs asynchronously
	Once again, this shows a ~10% speedup with multi-message threads in xt/httpd-async-stream.t regardless of whether TEST_JOBS is 1 or 100.
2020-07-06	view: make /$INBOX/$MSGID/ permalink async
	This will allow -httpd to handle other requests if waiting on an HDD seek or git to decode a blob.
2020-07-06	wwwstream: reduce blob fetch paths for ->getline
	This will make it easier to support asynchronous blob retrievals. The `$ctx->{nr}' counter is no longer implicitly supplied since many users didn't care for it, so stack overhead is slightly reduced.
2020-07-06	wwwstream: reduce object graph depth
	Like with WwwAtomStream and MboxGz, we can bless the existing $ctx object directly to avoid allocating a new hashref. We'll also switch from "->" to "::" to reduce stack utilization.
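	The difference between wrapping $ctx and blessing it directly can be sketched as follows. Both package names are hypothetical and exist only for this example; the last line shows a method call next to the equivalent fully-qualified sub call.

```perl
use strict;
use warnings;

{
    package Sketch::Wrapped; # hypothetical: one extra hashref per response
    sub new { my ($class, $ctx) = @_; bless { ctx => $ctx }, $class }
    sub title { $_[0]->{ctx}->{title} }
}
{
    package Sketch::Direct; # hypothetical: bless the existing $ctx itself
    sub new { my ($class, $ctx) = @_; bless $ctx, $class }
    sub title { $_[0]->{title} }
}

my $w = Sketch::Wrapped->new({ title => 'hi' });
my $ctx = { title => 'hi' };
my $d = Sketch::Direct->new($ctx); # same hash as $ctx, no new allocation

# Method dispatch ("->") next to a direct sub call ("::")
print $d->title, ' ', Sketch::Direct::title($d), "\n";
```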
2020-06-03	www: remove smsg_mime API and adjust callers
	To further simplify callers and avoid embarrassing memory explosions[1], we can finally eliminate this method in favor of smsg_eml. [1] commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5 ("view: stop storing all MIME objects on large threads") fixed a huge memory blowup.
2020-05-26	view: do not offer links to 0-byte multipart attachments
	Offering links to download 0-byte files is useless. We could waste memory by preserving $eml->{bdy} during iteration, but offering attachments of type "multipart" is not very useful, as users are usually interested in decoded attachments or the entire raw message. Fixes: e60231148eb604a3 ("descend into message/(rfc822\|news\|global) parts")
2020-05-17	descend into message/(rfc822\|news\|global) parts
	Email::MIME never supported this properly, but there's real instances of forwarded messages as message/rfc822 attachments. message/news is a legacy thing which we'll see in archives, and message/global appears to be the new thing. gmime also supports message/rfc2822, so we'll support it anyways despite lacking other evidence of its existence. Existing attachments remain downloadable as a whole message, but individual attachments of subparts are now downloadable and can be displayed in HTML, too. Furthermore, ensure Xapian can now search for common headers inside those messages as well as the message bodies.
2020-05-16	view: drop a newline before first attachment link
	However, we'll always have a newline before subsequent attachment links after the first. For the initial part of a multipart message, this regression appeared in 1.5.0, but the display was overly clumped in prior releases, too. Fixes: 453dee4881a9c764 ("msg_iter: pass $idx as a scalar, not array")
2020-05-09	replace most uses of PublicInbox::MIME with Eml
	PublicInbox::Eml has enough functionality to replace the Email::MIME-based PublicInbox::MIME.
2020-05-09	msg_iter: pass $idx as a scalar, not array
	This doesn't make any difference for most multipart messages (or any single part messages). However, this starts having space savings when parts start nesting. It also slightly simplifies callers.
2020-05-09	msg_iter: make ->each_part method for PublicInbox::MIME
	The reliance on Email::MIME->subparts is a tad inefficient with a work-in-progress module to replace Email::MIME. So move towards using ->each_part as a class-specific iterator which can take advantage of more class-specific optimizations in the yet-to-be-revealed PublicInbox::Eml and PublicInbox::Gmime classes. The msg_iter() sub remains for compatibility with existing 3rd-party scripts/modules which use our small public Perl API and Email::MIME.
2020-05-07	viewdiff: stricter highlighting and linkification check
	Sometimes senders draw ASCII tables and such which we get fooled into attempting highlighting and diffstat anchoring. We now require 3 consecutive diff header lines: /^--- /, /^\Q+++\E /, and /^@@ / to enable diff highlighting (whether generated with git or not). The presence of a line matching /^diff / is not sufficient or even useful to us for highlighting diffs, since that could just be part of a line-wrapped sentence. However, we'll now check for the presence of a line matching /^diff --git / before enabling diffstat anchors. Otherwise cover letters for a patch series may fool us into creating anchors for diffstats.
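	The two checks described above could be expressed roughly as below. This is a sketch built only from the patterns quoted in the commit message; the sub names are hypothetical, and treating the two checks as independent predicates is an assumption made here for illustration.

```perl
use strict;
use warnings;

# Three consecutive header lines: "--- ", "+++ ", then "@@ "
my $DIFF_HDR = qr/^--- [^\n]*\n\+\+\+ [^\n]*\n\@\@ /m;

# Hypothetical predicates, one per check
sub looks_like_diff { $_[0] =~ $DIFF_HDR ? 1 : 0 }
sub wants_diffstat_anchors { $_[0] =~ /^diff --git /m ? 1 : 0 }

my $patch = "diff --git a/f b/f\n--- a/f\n+++ b/f\n" .
            "\@\@ -1 +1 \@\@\n-a\n+b\n";
my $table = "--- name ---\n+++ total\nnot a hunk\n";
my $prose = "this is only the\ndiff between two drafts\n";

print looks_like_diff($patch), looks_like_diff($table),
      wants_diffstat_anchors($prose), "\n";
```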
2020-04-22	view: actually omit subject text when dumping topics
	Despite dump_topics() calling dedupe_subject() on the subject, the index shows partly duplicated subjects, for example ` [PATCH 2/2] t/www_listing: avoid 'once' warnings ` [PATCH v2] t/www_listing: avoid 'once' warnings " In the second line, the omission character " is appended, but the entire subject is shown. To display the subject with duplicated parts omitted, regenerate it from the array that is modified by dedupe_subject().
2020-04-22	view: strip omission character from current message in thread view
	In the thread view shown at the top of a message, the subject for the current message is dropped, leaving just the sender's name. However, if skel_dump() omitted part of the subject because it was duplicated, the omission character is still displayed: * [PATCH v2] t/www_listing: avoid 'once' warnings 2020-03-21 1:10 ` [PATCH 2/2] t/www_listing: avoid 'once' warnings Eric Wong @ 2020-03-21 5:24 ` " Eric Wong Note the " on the last line. Adjust the regular expression in _th_index_lite() to account for the omission character. [ew: avoid capturing $1, keep under 80 cols]
2020-04-17	searchthread: reduce indirection by removing container
	We can rid ourselves of a layer of indirection by subclassing PublicInbox::Smsg instead of using a container object to hold each $smsg. Furthermore, the `{id}' vs. `{mid}' field name confusion is eliminated. This reduces the size of the $rootset passed to walk_thread by around 15%, that is over 50K memory when rendering a /$INBOX/ landing page.
2020-04-09	triewyde: ficks soem speling errrors
	Dikshunarees R gude!
2020-04-07	view: do not redundantly obfuscate addresses
	We shouldn't rerun the address obfuscator on data we've already run through. Instead, run through the unescaped text part and substitute the UTF-8 "\x{2022}" substitution before it hits HTML escaping. Fixes: 9bdd81dc16ba6511 ("view: msg_iter calls add_body_text directly")
2020-04-05	release large (non ref) scalars using `undef $sv'
	Using `undef EXPR' like a function call actually frees the heap memory associated with the scalar, whereas `$sv = undef' or `$sv = ""' will hold the buffer around until $sv goes out of scope. The `sv_set_undef' documentation in the perlapi(1) manpage explicitly states this: The perl equivalent is "$sv = undef;". Note that it doesn't free any string buffer, unlike "undef $sv". And I've confirmed by reading Dump() output from Devel::Peek. We'll also inline the old index_body sub in SearchIdx.pm to make the scope of the scalar more obvious. This change saves several hundred kB RSS on both -index and -httpd when hitting large emails with thousands of lines.
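	The difference can be observed without Devel::Peek by asking the core B module for the allocated length of the string buffer. This is a sketch, not the method used in the commit: the helper `buf_len' is hypothetical, and the strings are built by appending so that each scalar owns its buffer.

```perl
use strict;
use warnings;
use B ();

# Allocated size of the scalar's string buffer (SvLEN), via B
sub buf_len { B::svref_2object($_[0])->LEN }

# Build two 64 KiB strings in place
my ($kept, $freed) = ('', '');
$kept  .= 'x' x 1024 for 1..64;
$freed .= 'x' x 1024 for 1..64;

$kept = undef; # value is gone, but the buffer stays attached
undef $freed;  # buffer is released right away

printf "kept=%d freed=%d\n", buf_len(\$kept), buf_len(\$freed);
```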
2020-04-04	view: inline flush_quote sub
	No point in having an extra sub for a short, commonly called function in the same file.
2020-04-04	viewdiff: reduce sub parameter count
	We're slowly moving towards doing all of our output buffering into a single buffer, so passing that around on the stack as a dedicated parameter is confusing.
2020-04-04	view: dedupe_subject: allow "0" as a valid Subject
	While rare in practice (even by spammers), a single "0" could theoretically be the entire contents of a Subject line. So use the Perl 5.10+ defined-or operator to improve correctness of subject deduplication.
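	A short sketch of why defined-or matters for a "0" subject. The fallback string and the sub names are hypothetical and only serve to make the difference between `||' and `//' visible.

```perl
use strict;
use warnings;

# `||' treats "0" (and "") as false and replaces it
sub subj_or  { my ($s) = @_; $s || '(no subject)' }

# `//' only replaces undef, so "0" survives
sub subj_dor { my ($s) = @_; $s // '(no subject)' }

print subj_or('0'), ' | ', subj_dor('0'), "\n";
```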
2020-04-04	view: use defined-or operator to simplify checks
	We depend on Perl 5.10 features in other places. Shorten the lifetime of the `$desc' scalar while we're at it.
2020-04-04	view: note we assume UTF-8 on unknown encodings
	Clarify that we're assuming the text is UTF-8, since users may have no idea how it's mangled.
2020-04-03	quiet "Complex regular subexpression recursion limit" warnings
	These seem mostly harmless since Perl will just truncate the match and start a new one on a newline boundary in our case. The only downside is we'd end up with redundant <span> tags in HTML. Limiting the number of lines matched ourselves with `{1,$NUM}' doesn't seem prudent since lines vary in length, so we continue to defer the job of limiting matches to the Perl regexp engine. I've noticed this warning in practice on 100K+ line patches to locale data.
2020-04-03	view: handle the topic-free case properly
	There may be no topics for a given timestamp range, so don't attempt to treat `undef' as an arrayref.