public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2020-04-09	triewyde: ficks soem speling errrors
	Dikshunarees R gude!
2020-04-07	view: do not redundantly obfuscate addresses
	We shouldn't rerun the address obfuscator on data we've already run it through. Instead, run through the unescaped text part and apply the UTF-8 "\x{2022}" substitution before it hits HTML escaping. Fixes: 9bdd81dc16ba6511 ("view: msg_iter calls add_body_text directly")
2020-04-05	release large (non ref) scalars using `undef $sv'
	Using `undef EXPR' like a function call actually frees the heap memory associated with the scalar, whereas `$sv = undef' or `$sv = ""' will hold the buffer around until $sv goes out of scope. The `sv_set_undef' documentation in the perlapi(1) manpage explicitly states this: `The perl equivalent is "$sv = undef;". Note that it doesn't free any string buffer, unlike "undef $sv".' And I've confirmed by reading Dump() output from Devel::Peek. We'll also inline the old index_body sub in SearchIdx.pm to make the scope of the scalar more obvious. This change saves several hundred kB RSS on both -index and -httpd when hitting large emails with thousands of lines.
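	A minimal sketch of the difference described above, using the core B module to read the allocated buffer size (SvLEN) instead of reading Devel::Peek output. The helper names `buf_len' and `fill' are made up for this illustration and are not part of public-inbox.

```perl
use strict;
use warnings;
use B ();

# Allocated size in bytes (SvLEN) of the string buffer behind a scalar.
sub buf_len { B::svref_2object($_[0])->LEN }

# Build a 1 MiB string by appending at runtime, so the scalar owns
# its buffer rather than sharing a copy-on-write constant.
sub fill { ${$_[0]} .= 'x' x 1024 for 1 .. 1024 }

my ($assigned, $freed) = ('', '');
fill(\$assigned);
fill(\$freed);

$assigned = undef;  # value is gone, but the buffer stays allocated
undef $freed;       # buffer is released immediately

my $len_assigned = buf_len(\$assigned);
my $len_freed = buf_len(\$freed);
print "after `\$sv = undef': $len_assigned bytes still allocated\n";
print "after `undef \$sv':   $len_freed bytes still allocated\n";
```

	The scalars are filled with `.=' on purpose: a scalar that merely shares a copy-on-write buffer may drop it on assignment, which would hide the effect.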
2020-04-04	view: inline flush_quote sub
	No point in having an extra sub for a short, commonly called function in the same file.
2020-04-04	viewdiff: reduce sub parameter count
	We're slowly moving towards doing all of our output buffering into a single buffer, so passing that around on the stack as a dedicated parameter is confusing.
2020-04-04	view: dedupe_subject: allow "0" as a valid Subject
	While rare in practice (even by spammers), a single "0" could theoretically be the entire contents of a Subject line. So use the Perl 5.10+ defined-or operator to improve correctness of subject deduplication.
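	To illustrate why `||' mishandles a "0" Subject while `//' does not, here is a small sketch. The two subs and the placeholder text are hypothetical and only demonstrate the operators; they are not the actual dedupe_subject code.

```perl
use strict;
use warnings;

# Truthiness check: "0" is false in Perl, so it is treated as missing.
sub subject_truthy { my ($s) = @_; $s || '(no subject)' }

# Defined-or (Perl 5.10+): only undef falls through to the placeholder.
sub subject_defined_or { my ($s) = @_; $s // '(no subject)' }

my $wrong   = subject_truthy('0');
my $right   = subject_defined_or('0');
my $missing = subject_defined_or(undef);
print "||: $wrong\n//: $right\nundef: $missing\n";
```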
2020-04-04	view: use defined-or operator to simplify checks
	We depend on Perl 5.10 features in other places. Shorten the lifetime of the `$desc' scalar while we're at it.
2020-04-04	view: note we assume UTF-8 on unknown encodings
	Clarify that we're assuming the text is UTF-8, since users may have no idea how it's mangled.
2020-04-03	quiet "Complex regular subexpression recursion limit" warnings
	These seem mostly harmless since Perl will just truncate the match and start a new one on a newline boundary in our case. The only downside is we'd end up with redundant <span> tags in HTML. Limiting the number of lines matched ourselves with `{1,$NUM}' doesn't seem prudent since lines vary in length, so we continue to defer the job of limiting matches to the Perl regexp engine. I've noticed this warning in practice on 100K+ line patches to locale data.
2020-04-03	view: handle the topic-free case properly
	There may be no topics for a given timestamp range, so don't attempt to treat `undef' as an arrayref.
2020-04-02	mid: add $MID_EXTRACT regexp for export
	This allows us to consistently enforce the same Message-ID extraction rules everywhere and makes it easier for us to make changes in the future. Update scripts/ssoma-replay, as well, but don't rely on PublicInbox::* modules in that since it's legacy and public-inbox was never a dependency of ssoma.
2020-03-22	rename PublicInbox::SearchMsg => PublicInbox::Smsg
	Since the introduction of over.sqlite3, SearchMsg is not tied to our search functionality in any way, so stop confusing ourselves and future hackers by just calling it "PublicInbox::Smsg". Add a missing "use" in ExtMsg while we're at it.
2020-02-17	view: shorten life of MIME object for permalink
	We don't need to hold onto the Email::MIME object across multiple WwwResponse->getline calls; instead, we can stuff the rendered HTML of the first (and hopefully only) message of the buffer into ctx->{-html_tip}.
2020-02-16	view: remove last Hval->new caller
	The object-oriented Hval API turned out to be less useful and more clunky than I envisioned years ago, so get rid of it. We'll no longer strip trailing whitespace from From: headers in the HTML display, but I doubt anybody cares.
2020-02-16	view: escape ampersand in Message-IDs
	We need to escape ampersands (and some other characters for href attributes), so introduce a `mid_href' sub to do just that. '<', '>' and '"' were always escaped, so there's no risk of tag or attribute injection, but creative Message-IDs could cause confusion for some parsers and generate invalid URLs. Start getting rid of the bloated, over-engineered OO Hval API while we're at it; I only noticed this bug because I started killing off Hval->new* callers.
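	As a rough idea of what escaping a Message-ID for an href can look like, the sketch below percent-encodes every byte outside a conservative safe set. `mid_href_sketch' and its safe set are assumptions made for this example; the real `mid_href' may choose a different set of characters.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a `mid_href'-style helper (not the
# project's code): percent-encode every byte outside a conservative
# safe set so the result can be placed inside an href attribute.
sub mid_href_sketch {
	my ($mid) = @_;
	utf8::encode($mid); # work on UTF-8 bytes, not characters
	$mid =~ s/([^A-Za-z0-9_.~\@-])/sprintf('%%%02X', ord($1))/ge;
	$mid;
}

my $href = mid_href_sketch('a&b@example.com');
print "$href\n";
```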
2020-02-16	view: escape Subject HTML directly
	No need to use the over-engineered Hval OO API when the subject is already normalized and there are no trailing spaces because of normalization.
2020-02-16	view,searchview: avoid smsg method calls when using SQLite/Xapian
	We already pre-populate the hashref when loading $smsg (PublicInbox::SearchMsg) objects out of over.sqlite3 or Xapian, so making expensive method calls isn't necessary in those cases. We only need to use the method calls when SQLite or Xapian are not available or are being populated (such as during indexing).
2020-02-16	view: cleanup topic accumulation and dumping
	Avoid needlessly normalizing the subject when dumping, since it's pushed into the @$topic array during accumulation in normalized form. We can also safely treat $smsg as a hashref and avoid calling "->ds" as a method since we know we've got that loaded via Over||Search and won't have to use Email::MIME header lookup methods.
2020-02-16	view: dump_topics: better naming of top Subject
	We use `$top' in other places, so name it to `$top_subj' consistently for `$subj' and `$prev_subj' comparisons down the function.
2020-02-16	view: single id="t" for multi-Subject messages
	While multi-Subject messages are unfortunate, try not to generate confusing/invalid HTML with multiple elements having the same HTML id attribute.
2020-02-16	view: remove mhref arg from multipart_text_as_html
	No point in passing something on stack only to stash it into the $ctx which holds most other parameters used for rendering the HTML.
2020-02-06	treewide: run update-copyrights from gnulib for 2019
	I didn't wait until September to do it, this year!
2020-01-27	viewdiff: rewrite and simplify
	Instead of going line-by-line, use split() with a giant regexp to capture groups of contiguous lines. This offloads state management to the regexp itself and makes it FAR easier to keep track of <span> and </span> pairings. Performance seems roughly on par after this change for the meta@public-inbox archives. It seems a tiny bit faster for git@vger with xt/perf-msgview.t, likely due to the longer messages and larger contiguous groups of lines having the same prefix (or no prefix at all) and drastically reduces the number of subroutine calls and Perl ops executed.
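	The split()-with-captures idea can be sketched on a toy diff as follows. The regexp here only distinguishes runs of `+' and `-' lines and is far simpler than whatever ViewDiff actually uses; `group_lines' is a name invented for this example.

```perl
use strict;
use warnings;

# Group contiguous lines that share a prefix. Because the pattern is
# wrapped in a capture group, split() returns the matched runs
# themselves in between the unmatched (context) text.
sub group_lines {
	my ($text) = @_;
	grep { length } split(/((?:^\+[^\n]*\n)+|(?:^-[^\n]*\n)+)/m, $text);
}

my @groups = group_lines(" ctx\n-old\n-old2\n+new\n ctx2\n");
for my $g (@groups) {
	my $n = ($g =~ tr/\n//); # number of lines in this run
	printf "%d line(s) starting with '%s'\n", $n, substr($g, 0, 1);
}
```

	Each element can then be wrapped in a single <span> pair, so opening and closing tags are emitted together rather than tracked across a line-by-line loop.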
2020-01-27	linkify: move to_html over from ViewDiff
	We use the same idiom in many places for doing two-step linkification and HTML escaping. Get rid of an outdated comment in flush_quote while we're at it.
2020-01-27	view: inline and eliminate msg_html
	No need to keep the old sub around, anymore. Rename auxiliary subs to "msg_page_*" instead of the "html" version.
2020-01-27	view: start performing buffering into {obuf}
	Get rid of the confusingly named {rv} and {tip} fields and unify them into {obuf} for readability. {obuf} usage may be expanded to more areas in the future. This will eventually make it easier for us to experiment with alternative buffering schemes.
2020-01-27	view: simplify duplicate Message-ID handling
	It's an uncommon code path, no need to make it more complex than it needs to be by having extra sub parameters.
2020-01-27	view: thread_skel: drop constant tpfx parameter
	It hasn't changed in a few years. Now we can rely on constant folding to avoid extraneous ops to the $skel buffer.
2020-01-27	view: reduce parameters for html_footer
	Put more logic into html_footer and less in its only caller so we can control the buffering and string creation.
2020-01-27	view: improve readability around walk_thread
	Pass \&coderefs explicitly to walk_thread, and add some prototypes + comments to describe what goes on.
2020-01-27	www: use "skel" terminology consistently
	This saves us a few comments and confusion. Yes, it's a destination so "dst" can be appropriate, but we may be using that term elsewhere.
2020-01-25	spelling: favor `publicly' over `publically'
	While both can be correct, the former seems more common, is shorter, and is also consistent with the spelling found in the AGPL-3.0 text.
2020-01-12	www: discard multipart parent on iteration
	We're often iterating through messages while writing to another buffer in our WWW interface, causing memory usage to multiply. Since we know we won't need to keep the MIME object around in some cases, we can tell msg_iter to clobber the on-stack variable while it operates on subparts of multipart messages. With xt/mem-msgview.t switched to multipart from the previous commit, this shows a 13 MB memory reduction on that test.
2020-01-06	view: update POSIX::strftime usage
	The POSIX module is always loaded, so import `strftime' into the namespace so we can use it and take advantage of compile-time arg checking. While we're at it, update and reorder caller functions to use prototypes, too.
2020-01-06	hval: export prurl and add prototype
	This allows us to do some compile-time checking and fills in a missing "use" in PublicInbox::NewsWWW, allowing it to be used standalone and independently of PublicInbox::WWW.
2020-01-05	view: msg_html: reduce memory use on reused MIDs
	In rare cases where Message-IDs get reused, we do not want to hold onto the large Email::MIME objects in memory after showing the first message. So discard each message as soon as we're done using it so we can save memory for the next message. The new and expensive xt/mem-msgview.t test shows a nearly 14MB reduction for two ~7MB messages. run_script() also gets upgraded to make it easier to pass large inputs via IO GLOBs.
2019-12-27	view: msg_iter calls add_body_text directly
	No need to waste several kilobytes creating an anonymous sub for every invocation of msg_iter.
2019-12-27	searchview: remove anonymous sub when sorting threads by relevance
	We don't need to return a closure or have a separate hash for sorting threads by relevance. Instead, we can stuff the relevance {pct} into the SearchMsg object itself and use that. Note: upon reviewing this code, the sort-by-relevance seems bogus as it only considers the relevance of the topmost message. Instead, it would make more sense to the user to sort by the highest relevance of all messages in that particular thread.
2019-12-27	view: thread_html: pass named sub to WwwStream
	We can pass everything we need into the WWW $ctx to avoid allocating kilobytes of memory for an anonymous sub for every $MESSAGE_ID/t/ request.
2019-12-27	view: msg_html: stop using an anonymous sub
	Stash 5 local variables into the WWW $ctx hash table instead of allocating several kilobytes for an anonymous sub.
2019-12-27	view: avoid anon sub in stream_thread
	WwwStream already passes the WWW $ctx to the callback sub, so we don't need to create a new sub every call to capture local variables for the callback.
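	The pattern shared by the 2019-12-27 commits above can be sketched as follows: keep per-request state in a context hash and hand the streamer a reference to a named sub, instead of building a closure on every request. `stream', `next_msg' and the hash keys are hypothetical names for this example, not the WwwStream API.

```perl
use strict;
use warnings;

# Hypothetical streamer: calls $cb->($ctx) until it returns undef.
sub stream {
	my ($ctx, $cb) = @_;
	my $out = '';
	while (defined(my $chunk = $cb->($ctx))) { $out .= $chunk }
	$out;
}

# Named sub: compiled once; everything it needs lives in $ctx, so no
# closure has to be allocated per request.
sub next_msg {
	my ($ctx) = @_;
	my $msg = shift(@{$ctx->{msgs}}) // return;
	"$ctx->{prefix}$msg\n";
}

my $ctx = { prefix => '> ', msgs => [ qw(a b) ] };
my $html = stream($ctx, \&next_msg);
print $html;
```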
2019-12-21	searchview: save a column in &x=t thread skeleton
	Displaying "100%" wastes a precious column. Show "99%" instead since there's little practical difference and <xapian/mset.h> states: "Note that these generally aren't percentages of anything meaningful (unless you use a custom weighting formula where they are!)" And we're not using a custom weighting formula.
2019-12-20	view: show percentage in search results thread skeleton
	This displays the Xapian ->get_percent value in the skeleton to improve scanning of relevancy; irrelevant results do not display that. This fixes broken #anchor links introduced in the previous commit; irrelevant messages now link to the /$INBOX/$MESSAGE_ID page.
2019-12-20	searchthread: fix usage of user-supplied parameter
	Instead of only passing an Inbox object, we'll pass the $ctx reference as PublicInbox::SearchView::mset_thread did. So although mset_thread was wrong, we now make its usage of SearchThread::thread correct and update other callers to favor the new style of passing the entire $ctx (with ->{-inbox}) instead of just the Inbox object. This makes the thread skeleton at the bottom of the search page show subjects of messages, but unfortunately links to non-existent #anchors. The next commit will fix that. While we're at it, favor "\&foo" over "*foo" since the former makes the code reference (aka "function pointer") obvious so it won't be confused for other things named "foo" in that scope (e.g. $foo/@foo/%foo).
2019-10-28	view: show X-Alt-Message-ID in permalink view, too
	Since we index X-Alt-Message-ID (because we need to placate some NNTP clients), we now display it as well, since that Message-ID could be the X-Alt-Message-ID that the reader is actually interested in.
2019-10-28	linkify: support adding "(raw)" link for Message-IDs
	And use it for the per-message permalink display.
2019-10-28	view: improve warning for multiple Message-IDs
	"refer" is not the correct term here, since that would mean multiple messages have the current message in the "References:" header, and that's a normal occurrence. Instead, we need to warn the reader that the given message itself has multiple Message-IDs.
2019-10-28	view: move '<' and '>' outside <a>
	Browsers may underline '<' and '>' in links, which may be confused with '≤' and '≥'. So have the Message-ID header display follow what we do with In-Reply-To headers and move the "<" and ">" outside of <a> in the HTML.
2019-10-28	view: display redundant headers in permalink
	Mail headers can contain multiple headers of any type, so ensure we don't hide any information we're getting in the per-message permalink views. This means it's possible to have multiple From, Date, To, Cc, Subject, and In-Reply-To headers displayed. The thread indices are a special case, I guess, since we run out of space on the line if the headers are too long and tools like mutt only show the first one.
2019-09-09	run update-copyrights from gnulib for 2019