public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2020-04-02	mid: add $MID_EXTRACT regexp for export
	This allows us to consistently enforce the same Message-ID extraction rules everywhere and makes it easier for us to make changes in the future. Update scripts/ssoma-replay, as well, but don't rely on PublicInbox::* modules in that since it's legacy and public-inbox was never a dependency of ssoma.
2020-03-22	rename PublicInbox::SearchMsg => PublicInbox::Smsg
	Since the introduction of over.sqlite3, SearchMsg is not tied to our search functionality in any way, so stop confusing ourselves and future hackers by just calling it "PublicInbox::Smsg". Add a missing "use" in ExtMsg while we're at it.
2020-02-17	view: shorten life of MIME object for permalink
	We don't need to hold onto the Email::MIME object across multiple WwwResponse->getline calls; instead, we can stuff the rendered HTML of the first (and hopefully only) message of the buffer into ctx->{-html_tip}.
2020-02-16	view: remove last Hval->new caller
	The object-oriented Hval API turned out to be less useful and more clunky than I envisioned years ago, so get rid of it. We'll no longer strip trailing whitespace from From: headers in the HTML display, but I doubt anybody cares.
2020-02-16	view: escape ampersand in Message-IDs
	We need to escape ampersands (and some other characters for href attributes), so introduce a `mid_href' sub to do just that. '<', '>' and '"' were always escaped, so there's no risk of tag or attribute injection, but creative Message-IDs could cause confusion for some parsers and generate invalid URLs. Start getting rid of the bloated, over-engineered OO Hval API while we're at it; I only noticed this bug because I started killing off Hval->new* callers.
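The two-step escaping described in this commit can be sketched in Python as a rough analogue. This is not the project's Perl code: `mid_href_sketch` is a hypothetical name, and the set of characters left unescaped is an assumption chosen for illustration.

```python
from html import escape
from urllib.parse import quote


def mid_href_sketch(mid: str) -> str:
    """Hypothetical helper: make a Message-ID safe inside href="..."."""
    # Step 1: percent-encode characters that are unsafe in a URL path
    # segment. Keeping '@' readable is an assumption, not project policy.
    url_safe = quote(mid, safe="@")
    # Step 2: HTML-escape the result so it can sit in a quoted attribute.
    return escape(url_safe, quote=True)


print(mid_href_sketch("a&b@example.com"))  # a%26b@example.com
```

Percent-encoding first means the ampersand never reaches the HTML layer as a bare `&`, so the generated URL stays valid for stricter parsers.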
2020-02-16	view: escape Subject HTML directly
	No need to use the over-engineered Hval OO API when the subject is already normalized and there are no trailing spaces because of normalization.
2020-02-16	view,searchview: avoid smsg method calls when using SQLite/Xapian
	We already pre-populate the hashref when loading $smsg (PublicInbox::SearchMsg) objects out of over.sqlite3 or Xapian, so making expensive method calls isn't necessary in those cases. We only need to use the method calls when SQLite or Xapian are not available or are being populated (such as during indexing).
2020-02-16	view: cleanup topic accumulation and dumping
	Avoid needlessly normalizing the subject when dumping, since it's pushed into the @$topic array during accumulation in normalized form. We can also safely treat $smsg as a hashref and avoid calling "->ds" as a method since we know we've got that loaded via Over||Search and won't have to use Email::MIME header lookup methods.
2020-02-16	view: dump_topics: better naming of top Subject
	We use `$top' in other places, so name it to `$top_subj' consistently for `$subj' and `$prev_subj' comparisons down the function.
2020-02-16	view: single id="t" for multi-Subject messages
	While multi-Subject messages are unfortunate, try not to generate confusing/invalid HTML with multiple elements having the same HTML id attribute.
2020-02-16	view: remove mhref arg from multipart_text_as_html
	No point in passing something on stack only to stash it into the $ctx which holds most other parameters used for rendering the HTML.
2020-02-06	treewide: run update-copyrights from gnulib for 2019
	I didn't wait until September to do it, this year!
2020-01-27	viewdiff: rewrite and simplify
	Instead of going line-by-line, use split() with a giant regexp to capture groups of contiguous lines. This offloads state management to the regexp itself and makes it FAR easier to keep track of <span> and </span> pairings. Performance seems roughly on par after this change for the meta@public-inbox archives. It seems a tiny bit faster for git@vger with xt/perf-msgview.t, likely due to the longer messages and larger contiguous groups of lines having the same prefix (or no prefix at all), which drastically reduces the number of subroutine calls and Perl ops executed.
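The split-with-capture idea can be illustrated with Python's `re.split`, which, like Perl's split, returns captured delimiters along with the text between them. This is a minimal sketch, not the ViewDiff regexp: the pattern below only groups runs of `+` and `-` lines, and `split_diff_blocks` is a hypothetical name.

```python
import re

# One alternation branch per kind of block; each branch consumes a whole
# run of consecutive lines that start with the same character.
BLOCKS = re.compile(r"((?:^\+[^\n]*\n)+|(?:^-[^\n]*\n)+)", re.MULTILINE)


def split_diff_blocks(text):
    # Because the pattern has a capturing group, re.split() returns the
    # matched runs too, interleaved in order with the context around them.
    return [chunk for chunk in BLOCKS.split(text) if chunk]


sample = " ctx\n-old1\n-old2\n+new1\n ctx2\n"
for chunk in split_diff_blocks(sample):
    kind = {"+": "add", "-": "del"}.get(chunk[0], "ctx")
    print(kind, repr(chunk))
```

Each chunk can then be wrapped in a single opening and closing tag, which is what keeps the <span> pairing simple. A real diff renderer would also have to treat `---` and `+++` file headers separately, which this sketch ignores.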
2020-01-27	linkify: move to_html over from ViewDiff
	We use the same idiom in many places for doing two-step linkification and HTML escaping. Get rid of an outdated comment in flush_quote while we're at it.
2020-01-27	view: inline and eliminate msg_html
	No need to keep the old sub around, anymore. Rename auxiliary subs to "msg_page_*" instead of the "html" version.
2020-01-27	view: start performing buffering into {obuf}
	Get rid of the confusingly named {rv} and {tip} fields and unify them into {obuf} for readability. {obuf} usage may be expanded to more areas in the future. This will eventually make it easier for us to experiment with alternative buffering schemes.
2020-01-27	view: simplify duplicate Message-ID handling
	It's an uncommon code path, no need to make it more complex than it needs to be by having extra sub parameters.
2020-01-27	view: thread_skel: drop constant tpfx parameter
	It hasn't changed in a few years. Now we can rely on constant folding to avoid extraneous ops to the $skel buffer.
2020-01-27	view: reduce parameters for html_footer
	Put more logic into html_footer and less in its only caller so we can control the buffering and string creation.
2020-01-27	view: improve readability around walk_thread
	Pass \&coderefs explicitly to walk_thread, and add some prototypes + comments to describe what goes on.
2020-01-27	www: use "skel" terminology consistently
	This saves us a few comments and confusion. Yes, it's a destination so "dst" can be appropriate, but we may be using that term elsewhere.
2020-01-25	spelling: favor `publicly' over `publically'
	While both can be correct, the former seems more common, is shorter, and is also consistent with the spelling found in the AGPL-3.0 text.
2020-01-12	www: discard multipart parent on iteration
	We're often iterating through messages while writing to another buffer in our WWW interface, causing memory usage to multiply. Since we know we won't need to keep the MIME object around in some cases, we can tell msg_iter to clobber the on-stack variable while it operates on subparts of multipart messages. With xt/mem-msgview.t switched to multipart from the previous commit, this shows a 13 MB memory reduction on that test.
2020-01-06	view: update POSIX::strftime usage
	The POSIX module is always loaded, so import `strftime' into the namespace so we can use it and take advantage of compile-time arg checking. While we're at it, update and reorder caller functions to use prototypes, too.
2020-01-06	hval: export prurl and add prototype
	This allows us to do some compile-time checking and fills in a missing "use" in PublicInbox::NewsWWW, allowing it to be used standalone and independently of PublicInbox::WWW.
2020-01-05	view: msg_html: reduce memory use on reused MIDs
	In rare cases where Message-IDs get reused, we do not want to hold onto the large Email::MIME objects in memory after showing the first message. So discard each message as soon as we're done using it so we can save memory for the next message. The new and expensive xt/mem-msgview.t test shows a nearly 14MB reduction for two ~7MB messages. run_script() also gets upgraded to make it easier to pass large inputs via IO GLOBs.
2019-12-27	view: msg_iter calls add_body_text directly
	No need to waste several kilobytes creating an anonymous sub for every invocation of msg_iter.
2019-12-27	searchview: remove anonymous sub when sorting threads by relevance
	We don't need to return a closure or have a separate hash for sorting threads by relevance. Instead, we can stuff the relevance {pct} into the SearchMsg object itself and use that. Note: upon reviewing this code, the sort-by-relevance seems bogus as it only considers the relevance of the topmost message. Instead, it would make more sense to the user to sort by the highest relevance of all messages in that particular thread.
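The alternative suggested in the note above (ranking a thread by its best-scoring message) might look like the following Python sketch. It illustrates the suggestion only, not what the code did at this commit; the data layout (a thread as a list of dicts with a `pct` key) and the function name are assumptions.

```python
def sort_threads_by_best_pct(threads):
    """Hypothetical: order threads by the highest relevance in each thread."""
    # Use the maximum "pct" over all messages of a thread as its sort key,
    # instead of only looking at the topmost message.
    return sorted(
        threads,
        key=lambda msgs: max(m["pct"] for m in msgs),
        reverse=True,
    )


threads = [
    [{"mid": "a", "pct": 40}, {"mid": "b", "pct": 95}],
    [{"mid": "c", "pct": 80}],
]
print([t[0]["mid"] for t in sort_threads_by_best_pct(threads)])  # ['a', 'c']
```

Sorting by the topmost message alone would put the second thread first here, even though the first thread contains the most relevant reply.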
2019-12-27	view: thread_html: pass named sub to WwwStream
	We can pass everything we need into the WWW $ctx to avoid allocating kilobytes of memory for an anonymous sub for every $MESSAGE_ID/t/ request.
2019-12-27	view: msg_html: stop using an anonymous sub
	Stash 5 local variables into the WWW $ctx hash table instead of allocating several kilobytes for an anonymous sub.
2019-12-27	view: avoid anon sub in stream_thread
	WwwStream already passes the WWW $ctx to the callback sub, so we don't need to create a new sub every call to capture local variables for the callback.
2019-12-21	searchview: save a column in &x=t thread skeleton
	Displaying "100%" wastes a precious column. Show "99%" instead since there's little practical difference and <xapian/mset.h> states: Note that these generally aren't percentages of anything meaningful (unless you use a custom weighting formula where they are!) And we're not using a custom weighting formula.
2019-12-20	view: show percentage in search results thread skeleton
	This displays the Xapian ->get_percent value in the skeleton to improve scanning of relevancy; irrelevant results do not display that. This fixes broken #anchor links introduced in the previous commit, irrelevant messages now link to the /$INBOX/$MESSAGE_ID page.
2019-12-20	searchthread: fix usage of user-supplied parameter
	Instead of only passing an Inbox object, we'll pass the $ctx reference as PublicInbox::SearchView::mset_thread did. So although mset_thread was wrong, we now make its usage of SearchThread::thread correct and update other callers to favor the new style of passing the entire $ctx (with ->{-inbox}) instead of just the Inbox object. This makes the thread skeleton at the bottom of the search page show subjects of messages, but unfortunately links to non-existent #anchors. The next commit will fix that. While we're at it, favor "\&foo" over "*foo" since the former makes the code reference (aka "function pointer") obvious so it won't be confused for other things named "foo" in that scope (e.g. $foo/@foo/%foo).
2019-10-28	view: show X-Alt-Message-ID in permalink view, too
	Since we index X-Alt-Message-ID (because we need to placate some NNTP clients), we now display it as well, since that Message-ID could be the X-Alt-Message-ID that the reader is actually interested in.
2019-10-28	linkify: support adding "(raw)" link for Message-IDs
	And use it for the per-message permalink display.
2019-10-28	view: improve warning for multiple Message-IDs
	"refer" is not the correct term here, since that would mean multiple messages have the current message in the "References:" header, and that's a normal occurrence. Instead, we need to warn the reader that the given message itself has multiple Message-IDs.
2019-10-28	view: move '<' and '>' outside <a>
	Browsers may underline '<' and '>' in links, which may be confused with '≤' and '≥'. So have the Message-ID header display follow what we do with In-Reply-To headers and move the "<" and ">" outside of <a> in the HTML.
2019-10-28	view: display redundant headers in permalink
	Mail headers can contain multiple headers of any type, so ensure we don't hide any information we're getting in the per-message permalink views. This means it's possible to have multiple From, Date, To, Cc, Subject, and In-Reply-To headers displayed. The thread indices are a special case, I guess, since we run out of space on the line if the headers are too long and tools like mutt only show the first one.
2019-09-09	run update-copyrights from gnulib for 2019

2019-06-04	view: require YYYYmmDD(HHMMSS) timestamps to be ASCII
	Passing digits to `timegm' which it does not understand would be a waste of time.
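Why an ASCII-only check matters can be shown in Python, where `\d` in a text pattern also matches non-ASCII decimal digits by default. This is an analogous sketch, not the project's Perl; `is_ascii_timestamp` is a hypothetical name.

```python
import re

# YYYYmmDD with an optional HHMMSS suffix, ASCII digits only.
ASCII_TS = re.compile(r"[0-9]{8}(?:[0-9]{6})?")


def is_ascii_timestamp(s):
    # fullmatch() rejects any leading or trailing characters.
    return ASCII_TS.fullmatch(s) is not None


# Eight Arabic-Indic digits: Unicode decimal digits, but not ASCII.
arabic_indic = "\u0662\u0660\u0661\u0669\u0660\u0666\u0660\u0664"

print(is_ascii_timestamp("20190604"))            # True
print(is_ascii_timestamp(arabic_indic))          # False
print(bool(re.fullmatch(r"\d{8}", arabic_indic)))  # True: \d is Unicode-aware
```

Spelling the class out as `[0-9]` (or passing `re.ASCII`) rejects such input before it reaches any date conversion routine.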
2019-06-04	www: only emit ASCII chars in attachment filenames
	We don't want to emit funky URLs which can be lost in translation or cause problems with non-Unicode-aware clients. Then, don't accept non-ASCII filenames in URLs, since a manually-generated URL/filename in attachment downloads could be used for Unicode homographs to confuse folks who download the attachment.
2019-05-15	www: use Inbox->over where appropriate
	We don't need to rely on Xapian search functionality for the majority of the WWW code, even. subject_normalized is moved to SearchMsg, where it (probably) makes more sense, anyways.
2019-04-23	view: avoid "1+ messages" in per-message footer of /t/ and /T/
	Try to appear grammatically correct and state: "only message in thread" when there's only one known (to us) message in the thread.
2019-04-18	view: show "(no subject)" consistently in HTML
	Empty subjects ("") and undefined Subjects: are now both displayed as "(no subject)" for now.
2019-04-16	cleanup: use '$ibx' consistently when referring to Inbox refs
	'$inbox' is more human-readable, so it is kept for the more human-readable name in most cases. Making our variable naming more consistent should make the code easier to review and harder to screw up.
2019-02-13	ensure bytes::length is available to callers
	We were relying on Danga::Socket using the "bytes" pragma, previously. Nowadays, the "bytes" pragma is not recommended in general, but bytes::length remains acceptable for getting the byte-size of a scalar.
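The distinction between character count and byte size that bytes::length addresses can be shown with a small Python analogue. This is not Perl; `byte_size` is a hypothetical helper, and UTF-8 is assumed as the encoding.

```python
def byte_size(s: str, encoding: str = "utf-8") -> int:
    # Encode first, then count: this is the size on the wire or on disk,
    # which is what headers such as Content-Length need.
    return len(s.encode(encoding))


text = "caf\u00e9"      # four characters, the last one non-ASCII
print(len(text))        # 4 characters
print(byte_size(text))  # 5 bytes in UTF-8
```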
2019-02-01	viewdiff: support renames and long paths in diffstat anchors
	This is best-effort, but works well-enough in practice for projects which use shell-friendly filenames as well as the long path names for some Linux kernel selftests.
2019-02-01	view: simplify quote splitting
	Perl "split" can capture and group in the regexp itself, so rely on that to shorten our code. Comparing the /T/ HTML output of a thread from hell (on LKML with 1356 messages) reveals no difference in the rendered result. Only the HTML source differs in newline placement before/after the closing </span>. This allows a minor speedup on my X32 Thinkpad @ 1.6GHz with the aforementioned LKML thread from hell: before: 3.67s, after: 3.55s
2019-02-01	view: fix broken hunk header hrefs in Atom feeds
	We use absolute URLs in the Atom feeds (to ease syndication/mirroring), so hunk headers need to point to the solver URLs.