Date | Commit message |
|
The internal help text links to the Xapian query parser
documentation anyways, but also provides information
on which prefixes exist.
|
|
Begin documenting some basic help functionality.
I may tweak the anchor names of the various HTML endpoints
to be more consistent with each other (old ones will be
supported for a short while), so I'm not documenting
those, for now.
This may become part of a builtin key-value store for
basic texts, but this probably shouldn't become a wiki
engine, either.
|
|
We're not to-the-letter about percent-encoding, but
we should allow all the characters. This is mainly
so we can effectively use the link to some Wikipedia
pages with parentheses in them:
https://en.wikipedia.org/wiki/Atom_(standard)
https://en.wikipedia.org/wiki/Git_(software)
|
|
For some reason, Alpine will set X-UNKNOWN for valid UTF-8.
Since we favor UTF-8 HTML anyways, try forcing Email::MIME to
handle text/plain as UTF-8 which might show up better.
At least this change renders
<alpine.DEB.2.20.1608131214070.4924@virtualbox>
properly by showing "•" (U+2022) instead of
"⠢" (U+2822)
Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
|
|
Alpine seems to set charset=X-UNKNOWN for valid UTF-8 text,
which causes Email::MIME::body_str to fail as X-UNKNOWN
is not a valid encoding. So, blindly display the body
as plain-text but warn users about possibly mangled text.
Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
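
The fallback described above can be sketched roughly as follows. This is
a Python illustration only, not the Email::MIME code path; the function
name and the returned warning flag are made up for the example.

```python
from typing import Optional, Tuple

def decode_text_part(raw: bytes, charset: Optional[str]) -> Tuple[str, bool]:
    """Return (text, warn); warn is True when the declared charset was unusable.

    Hypothetical helper: an unknown charset such as X-UNKNOWN is treated
    as UTF-8, and the caller is told the text may be mangled.
    """
    try:
        return raw.decode(charset or "us-ascii"), False
    except (LookupError, UnicodeDecodeError):
        # LookupError: the codec name is not known (e.g. "X-UNKNOWN").
        # UnicodeDecodeError: the bytes do not match the declared charset.
        return raw.decode("utf-8", errors="replace"), True
```

A caller would show the decoded text either way and attach a warning to
the message view when the flag is set.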
|
|
There is no point in using an array to join on an
empty string (my original intention was probably to
join on "\n").
This is only preparation for the next change to show
a warning in the attachment link.
|
|
This is similar to mairix in that it uses a "d:" prefix; but
only takes YYYYMMDD, for now. Using custom date/time parsers
via Perl will be much more work:
nntp://news.gmane.org/20151005222157.GE5880@survex.com
Anyhow, this ought to be more human-friendly than searching by
Unix timestamps, but it requires reindexing to take advantage of.
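
A fixed-width YYYYMMDD stamp works for range searches because string
order and chronological order agree. A minimal Python sketch of that
idea follows; the helper names are hypothetical and UTC is assumed,
neither of which comes from the indexing code itself.

```python
from datetime import datetime, timezone

def yyyymmdd(unix_ts):
    """Format a Unix timestamp as a fixed-width YYYYMMDD string (UTC assumed)."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime("%Y%m%d")

def in_date_range(stamp, begin, end):
    """Inclusive range check using plain string comparison.

    This is valid only because every stamp has the same width, so
    lexicographic order matches date order.
    """
    return begin <= stamp <= end
```

Existing documents carry only the Unix timestamp, which is why a
reindex is needed before the new stamp can be searched.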
|
|
The Unix timestamp isn't meaningful for users searching, so
we will start indexing the YYYYMMDD date stamp, which may
use StringValueRangeProcessor, instead.
|
|
Not sure why or how I missed this before; but the common address
parsing routine we have should be more correct.
Add a test to ensure excessively quoted names don't make it
through, either.
|
|
Ensure we usually strip one level of '<>' from Message-IDs,
since our internal SQLite, Xapian, and SHA-1 storage all
assume that.
Realistically, we screw up if somebody has '<<' or '>>',
but those are screwed up mail clients and we can deal with
it another time. Currently, this means some messages with
'>>' in References or Message-Id are not handled correctly,
yet, but we match the behavior of Mail::Thread in keeping
the extra '>'.
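
Stripping exactly one level of '<>' might look like the sketch below.
It is a simplified Python illustration with a hypothetical function
name, not the actual Perl routine, and it deliberately leaves doubled
brackets partly intact as described above.

```python
def mid_clean(mid):
    """Strip surrounding whitespace and at most one level of '<>'."""
    mid = mid.strip()
    if mid.startswith("<") and mid.endswith(">"):
        return mid[1:-1]
    return mid
```

With this, "<<a@b>>" becomes "<a@b>" rather than "a@b", so the storage
key still carries the extra brackets from the broken client.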
|
|
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&',
"'", '(', ')', '*', '+', ',', ';', '=' are all allowed
in path-absolute where we have the Message-ID.
In any case, it seems '@' is fairly common in path components
nowadays and too common in Message-IDs.
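
As a rough illustration of the escaping rule, the Python sketch below
leaves the characters listed above unescaped and percent-encodes the
rest. The constant and function names are hypothetical, and encoding
'/' is an assumption made here so the Message-ID stays a single path
segment.

```python
from urllib.parse import quote

# sub-delims plus ':' and '@', which RFC 3986 allows in a path segment
PCHAR_SAFE = "@:!$&'()*+,;="

def mid_to_path_segment(mid):
    """Percent-encode a Message-ID for use as one URL path segment."""
    # Passing safe= replaces the default ('/'), so '/' is encoded as %2F.
    return quote(mid, safe=PCHAR_SAFE)
```

A typical Message-ID such as
20160805093544.scvl4yshkfg2l26p@sigill.intra.peff.net passes through
unchanged, which keeps the resulting URLs readable.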
|
|
I've seen 0x1b (\e) in at least one message and some other
possibly non-printable chars. In any case, make sure they're
valid XML with us-ascii encoding as far as xmlstarlet(1) thinks
so.
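
One way to get output that is valid XML in us-ascii is sketched below
in Python. The replacement with U+FFFD and the function name are
choices made for this example, not necessarily what the feed code
does.

```python
import re
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 "Char" production, e.g. 0x1b (\e).
_XML_INVALID = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def xml_safe_ascii(text):
    """Return text that is safe to embed in a us-ascii XML document."""
    cleaned = _XML_INVALID.sub("\ufffd", text)  # drop illegal control chars
    escaped = escape(cleaned)                   # &, <, > become entities
    # Non-ASCII characters become numeric character references.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

Escaping happens before the ASCII conversion so the '&' in the
generated character references is not escaped a second time.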
|
|
Apparently there are some really screwed up In-Reply-To
fields out there.
|
|
We can't blindly assume a ghost even exists in the DB, as the
rules can change internally for some corner-case Message-IDs.
|
|
|
|
Because buggy mail clients exist and generate invalid
In-Reply-To headers we cannot handle across the board...
|
|
SQLite might index quickly, so we hold the lock used by Xapian
for the duration. This probably needs to be reworked entirely,
actually.
|
|
Some browsers do not give any indication of the HTTP error
code on errors, so show the error text to the user like we
do in the top-level WWW module.
|
|
gmane is down at the moment, so lower that in priority
(hopefully it will be brought back up, again). Wikipedia also
lists a few more project-specific list providers, so include
those as well: https://en.wikipedia.org/wiki/Message-ID
|
|
We need to pass the Inbox object to SearchIdx to get altid
mappings properly for incremental imports.
TODO: use the Inbox object in more places where it makes sense
to do so.
|
|
Improve the discoverability of NNTP endpoints for users
who still know what NNTP is.
==> ~/.public-inbox/config <==
; aliases for the locally-run nntpd can be specified in
; the "publicinbox" section:
[publicinbox]
nntpserver = nntp://ou63pmih66umazou.onion/
nntpserver = news.public-inbox.org
; NNTPS is not supported natively, yet,
; but one can use haproxy or similar
; nntpserver = nntps://news.public-inbox.invalid/
; mirrors for specific inboxes may be specified either as full
; NNTP (or NNTPS) URLs, or with the server name only if the
; newsgroup name is specified for a local NNTP server
[publicinbox "git"]
...
newsgroup = inbox.a.b.c
nntpmirror = nntp://czquwvybam4bgbro.onion/
nntpmirror = hjrcffqmbrq6wope.onion
; there may be a mirror on a different server with a
; different name:
nntpmirror = nntp://news.example.com/differently.named.group
; (And I really need to write manpages for all this...)
|
|
Oops. We will inevitably need to support multiple altids for a
public-inbox one day.
|
|
For some existing mailing list archives, messages are identified
by serial number (such as NNTP article numbers in gmane). Those
links may become inaccessible (as is the current case for
gmane), so ensure users can still search based on old serial
numbers.
Now, I run the following periodically to get article numbers
from gmane (while news.gmane.org remains):
NNTPSERVER=news.gmane.org
export NNTPSERVER
GROUP=gmane.comp.version-control.git
perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3
(I might integrate this further with public-inbox-* scripts one day).
My ~/.public-inbox/config has an added "altid" snippet which now
looks like this:
[publicinbox "git"]
address = git@vger.kernel.org
mainrepo = /path/to/git.vger.git
newsgroup = inbox.comp.version-control.git
; relative pathnames expand to $mainrepo/public-inbox/$file
altid = serial:gmane:file=gmane.sqlite3
And run "public-inbox-index --reindex /path/to/git.vger.git"
periodically.
This ought to allow searching for "gmane:12345" to work for
Xapian-enabled instances.
Disclaimer: while public-inbox supports NNTP and stable article
serial numbers, use of those for public links is discouraged
since it encourages centralization.
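
The idea behind the serial altid can be sketched as a lookup done at
index time: find the old serial number for a Message-ID and index it
as an extra searchable term. The Python below is only an
illustration; the table layout, the function names, and the
"gmane:NUM" term format are assumptions, not the real msgmap schema or
Xapian term encoding.

```python
import sqlite3

def open_msgmap(path=":memory:"):
    """Open (or create) a hypothetical serial-number to Message-ID map."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS msgmap "
        "(num INTEGER PRIMARY KEY, mid TEXT NOT NULL UNIQUE)"
    )
    return db

def altid_terms(db, mid, prefix="gmane"):
    """Return extra search terms for a message, e.g. ['gmane:12345']."""
    row = db.execute("SELECT num FROM msgmap WHERE mid = ?", (mid,)).fetchone()
    return ["%s:%d" % (prefix, row[0])] if row else []
```

Messages missing from the map simply get no extra term, so they stay
searchable by everything else.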
|
|
It is not unheard of for users to attempt finding messages by
entering Message-IDs into the "Search" box instead of using the
existing URL structure. So make it possible for them.
Fwiw, I've definitely encountered users who enter entire URLs
into generic search engines.
|
|
Oops, we must unescape each key=value pair in a QUERY_STRING
individually; otherwise we cannot interpret '&' or ';' in
query parameter values.
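
The ordering matters: split on the separators first, then unescape
each key and value. A minimal Python sketch (with a hypothetical
function name) of that order of operations:

```python
import re
from urllib.parse import unquote_plus

def parse_query_string(qs):
    """Split a QUERY_STRING on '&' or ';', then unescape each pair."""
    params = {}
    for pair in re.split("[&;]", qs):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Unescaping after the split keeps an encoded '&' (%26) or
        # ';' (%3B) inside the value instead of splitting on it.
        params[unquote_plus(key)] = unquote_plus(value)
    return params
```

Unescaping the whole string up front would turn "q=a%26b" into
"q=a&b" and then split it into two bogus parameters.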
|
|
We must ensure cat-file process is launched before Xapian
grabs lock, too. Our use of "git cat-file --batch" has
the same problem as "git log" did (which was fixed in
commit 3713c727cda431a0dc2865a7878c13ecf9f21851)
"searchidx: release Xapian FDs before spawning git log"
|
|
This will allow us to release and re-acquire Xapian locks
due to the lack of FD_CLOEXEC on some FDs.
|
|
We can cheaply keep the object around nowadays since it
spawns expensive processes only on an as-needed basis.
|
|
We do not need to pass the PublicInbox::Git object to
various callbacks.
|
|
This is necessary to delimit messages when viewed without
threading.
|
|
At least for public-inbox-httpd, this allows us to avoid having
a client monopolize one event loop tick of the server for too
long. It hurts throughput for the /all.mbox.gz endpoint, but I
doubt anybody cares and the latency improvement for other
clients would be appreciated.
We already do the same fairness thing for HTML pages.
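
The fairness trade-off can be illustrated with a small cooperative
loop. This sketch uses Python's asyncio purely as a stand-in for the
daemon's own event loop; the function name and the write callback are
hypothetical.

```python
import asyncio

async def stream_messages(messages, write):
    """Write one message per event loop tick instead of all at once."""
    for msg in messages:
        write(msg)
        # Hand control back to the loop so other clients get a turn.
        await asyncio.sleep(0)
```

Two such streams running together interleave their writes, which costs
throughput for each stream but keeps latency down for everyone else.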
|
|
When using <ul><li>..., we already set up <pre> tags
in thread_index_entry, so having an extra </pre> tag
causes validation errors.
Fixes: 6ef9b216156c ("view: use <hr> to delineate in /$MID/T/ view")
|
|
This warrants further investigation, but it appears we cannot
release Xapian reliably after forking "git log" due to the
lack of a close-on-exec flag on the Xapian flintlock FD.
|
|
The sacrifice in vertical space might be worth it to improve
ease-of-reading, as it's unreasonable to expect an entire
message thread to be able to fit into a single window.
https://public-inbox.org/git/20160805093544.scvl4yshkfg2l26p@sigill.intra.peff.net/
Cc: Jeff King <peff@peff.net>
|
|
PSGI applications (like our WWW :P) can fail unpredictably,
but let's try to avoid bringing the entire process down when this
happens.
|
|
Yet another monkey patch to fix a problem encountered in upstream
Mail::Thread.
ref:
- https://rt.cpan.org/Ticket/Display.html?id=116727
- http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833479
|
|
Sometimes messages have an empty In-Reply-To header which throws
threaders off. This actually causes public-inbox-httpd to die,
which is probably bad and will be fixed elsewhere.
|
|
Doing git tree lookups based on the SHA-1 of the Message-ID
is expensive as trees get larger; instead, use the SHA-1
object ID directly. This drastically reduces the amount
of time spent in the "git cat-file --batch" process for
fetching the /$INBOX/all.mbox.gz endpoint on the ~800MB
git@vger.kernel.org mirror.
This retains backwards compatibility and allows existing
indices to be transparently upgraded without performance
degradation.
|
|
For reindexing, fresh Xapian DBs do not count as a reindex,
allowing users to blindly use --reindex on the first
run on a clean repo.
While we're at it, allow indexing to override HEAD ref for
multi-head git repos.
|
|
Search is probably more useful, so users should be able to select
it sooner. Put it on its own line so it won't get scrolled off
the edge for non-CSS users.
Fix a minor spacing bug in the input tag while we're at it, too.
|
|
As far as most process managers are concerned (e.g. systemd),
they should already start in '/'. So avoid making our daemon
more complex to run by requiring absolute paths during
development.
|
|
This should make tweaking the way we search more efficient
by allowing us to avoid doubling destroying the index every
time we want to change something.
We also give priority to incremental indexing via
public-inbox-{watch,mda} and have manual invocations of
public-inbox-index perform batch updates while releasing
ssoma.lock.
|
|
We want transactions to be the responsibility of the
caller when possible; this fixes the potential for
the msgmap to internally become inconsistent when
using it from inside searchidx.
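
A caller-owned transaction can be sketched as below, using Python's
sqlite3 module as a stand-in. The table layout and function names are
assumptions for illustration, not the real msgmap code.

```python
import sqlite3

def create_msgmap(db):
    """Create a hypothetical Message-ID map table."""
    db.execute(
        "CREATE TABLE msgmap "
        "(num INTEGER PRIMARY KEY AUTOINCREMENT, mid TEXT NOT NULL UNIQUE)"
    )

def mid_insert(db, mid):
    """Insert a Message-ID and return its number; never commits by itself."""
    cur = db.execute("INSERT OR IGNORE INTO msgmap (mid) VALUES (?)", (mid,))
    return cur.lastrowid if cur.rowcount else None

def index_batch(db, mids):
    """The caller owns the transaction around the whole batch."""
    with db:  # commits on success, rolls back if anything raises
        return [mid_insert(db, m) for m in mids]
```

If indexing fails halfway through a batch, the map rolls back together
with the rest of the work instead of keeping a partial set of rows.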
|
|
This allows systemd users to use SIGWINCH to temporarily
(and gracefully) stop an instance of a service without
doing a code reload to bring it back up:
# start temporary new service code
systemctl start public-inbox-nntpd@2.service
# momentarily paralyze original service
systemctl kill -s WINCH public-inbox-nntpd@1.service
if new_code_at_2_sucks
then
# restart original workers
systemctl kill -s HUP public-inbox-nntpd@1.service
else # new is better than old, replace original instance
systemctl restart public-inbox-nntpd@1.service
fi
# cleanup the temporary service
systemctl stop public-inbox-nntpd@2.service
This partially reverts commit 73d274e83b7d300f31e0cc1ceeacbf73c6c2a1e4
("daemon: disable SIGWINCH unless explicitly daemonized")
|
|
Callers may have localized $/ to something else, so make sure
we chomp the expected character(s) when calling chomp.
|
|
This is common when multiple participants are in a thread.
|
|
This should make it easier for folks to run their own forks.
|
|
Not everybody knows what .onion URLs are, so refer them to Tor.
|
|
Having long Cc: lines is inevitable for large threads
with many participants, and git-send-email only gained
the ability to recognize ',' in the "--cc" arg recently
with the release of git v2.6.0 in September 2015.
|
|
Clearly label "Thread overview" and "Reply instructions"
so users can quickly skip stuff they're not interested in.
Additionally, note the fact the thread view allows quick
navigation within the thread to avoid extra network requests
and improve the display for single-message threads.
Finally, use <hr> to better-delineate sections of each page.
|