Date        Commit message
2016-08-14 import_slrnspool: reimplement using fast-import
I needed to use this to resurrect some messages missing from my initial downloads from gmane...
2016-08-14 newswww: include body text in 404 response
Some browsers do not give any indication of the HTTP error code on errors, so show the error text to the user like we do in the top-level WWW module.
2016-08-13 extmsg: reorder and add more Message-ID lookup services
gmane is down at the moment, so lower it in priority (hopefully it will be brought back up again). Wikipedia also lists a few more project-specific list providers, so include those as well: https://en.wikipedia.org/wiki/Message-ID
2016-08-12 watch: respect altid for incremental watch changes
We need to pass the Inbox object to SearchIdx to get altid mappings properly for incremental imports. TODO: use the Inbox object in more places where it makes sense to do so.
2016-08-12 www: allow including links to NNTP sites in HTML footer
Improve the discoverability of NNTP endpoints for users who still know what NNTP is.

==> ~/.public-inbox/config <==
; aliases for the locally-run nntpd can be specified in
; the "publicinbox" section:
[publicinbox]
	nntpserver = nntp://ou63pmih66umazou.onion/
	nntpserver = news.public-inbox.org

	; NNTPS is not supported natively, yet,
	; but one can use haproxy or similar
	; nntpserver = nntps://news.public-inbox.invalid/

; mirrors for specific inboxes may be specified either as full
; NNTP (or NNTPS) URLs, or with the server name only if the
; newsgroup name is specified for a local NNTP server
[publicinbox "git"]
	...
	newsgroup = inbox.a.b.c
	nntpmirror = nntp://czquwvybam4bgbro.onion/
	nntpmirror = hjrcffqmbrq6wope.onion

	; there may be a mirror on a different server with a
	; different name:
	nntpmirror = nntp://news.example.com/differently.named.group

; (And I really need to write manpages for all this...)
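A minimal Python sketch of how one nntpmirror value might be expanded into a full URL, following the comments in the config above. The helper name `mirror_url` is hypothetical, and appending the local newsgroup to a full URL that names no group is an assumption, not confirmed behavior.

```python
from urllib.parse import urlsplit


def mirror_url(value: str, newsgroup: str) -> str:
    """Expand an nntpmirror value into a full URL (hypothetical helper)."""
    parts = urlsplit(value)
    if parts.scheme in ("nntp", "nntps"):
        if parts.path.strip("/"):
            # Full URL naming its own group (e.g. a differently named
            # group on another server): use it as-is.
            return value
        # Assumption: a full URL without a group carries the local newsgroup.
        return value.rstrip("/") + "/" + newsgroup
    # Bare server name: only valid when the local newsgroup is configured.
    return "nntp://" + value.rstrip("/") + "/" + newsgroup
```

With `newsgroup = inbox.a.b.c`, the bare `hjrcffqmbrq6wope.onion` entry would become `nntp://hjrcffqmbrq6wope.onion/inbox.a.b.c`.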
2016-08-12 public-inbox-watch: support reloading config with SIGHUP
This can be useful for adding new lists, as restarting is expensive (but still non-lossy).
2016-08-12 config: do not nest multi-value altid arrays
Oops. We will inevitably need to support multiple altids for a public-inbox one day.
2016-08-11 search: support alt-ID for mapping legacy serial numbers
For some existing mailing list archives, messages are identified by serial number (such as NNTP article numbers in gmane). Those links may become inaccessible (as is the current case for gmane), so ensure users can still search based on old serial numbers.

Now, I run the following periodically to get article numbers from gmane (while news.gmane.org remains):

	NNTPSERVER=news.gmane.org
	export NNTPSERVER
	GROUP=gmane.comp.version-control.git
	perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3

(I might integrate this further with public-inbox-* scripts one day.)

My ~/.public-inbox/config has an added "altid" snippet which now looks like this:

	[publicinbox "git"]
		address = git@vger.kernel.org
		mainrepo = /path/to/git.vger.git
		newsgroup = inbox.comp.version-control.git
		; relative pathnames expand to $mainrepo/public-inbox/$file
		altid = serial:gmane:file=gmane.sqlite3

And I run "public-inbox-index --reindex /path/to/git.vger.git" periodically. This ought to allow searching for "gmane:12345" to work for Xapian-enabled instances.

Disclaimer: while public-inbox supports NNTP and stable article serial numbers, use of those for public links is discouraged since it encourages centralization.
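The commit wires alt-IDs into Xapian indexing via --reindex. The Python sketch below skips Xapian entirely and only illustrates the serial-number-to-Message-ID lookup behind a "gmane:12345" query; the `msgmap` table layout and the helper names are assumptions for illustration.

```python
import re
import sqlite3

# Hypothetical "prefix:number" search term, e.g. "gmane:12345".
ALTID_TERM = re.compile(r"^(?P<prefix>[a-z]+):(?P<num>\d+)$")


def open_altid(path=":memory:"):
    db = sqlite3.connect(path)
    # Assumed schema: one row per legacy serial number.
    db.execute(
        "CREATE TABLE IF NOT EXISTS msgmap ("
        "num INTEGER PRIMARY KEY, mid TEXT NOT NULL UNIQUE)"
    )
    return db


def resolve(db, prefix, term):
    """Return the Message-ID for an alt-ID search term, or None."""
    m = ALTID_TERM.match(term)
    if not m or m.group("prefix") != prefix:
        return None
    row = db.execute(
        "SELECT mid FROM msgmap WHERE num = ?", (int(m.group("num")),)
    ).fetchone()
    return row[0] if row else None
```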
2016-08-10 searchidx: allow searching Message-IDs in free-form text
It is not unheard of for users to try finding messages by entering Message-IDs into the "Search" box instead of using the existing URL structure, so make that possible for them. FWIW, I've definitely encountered users who enter entire URLs into generic search engines.
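A rough Python sketch of one way free-form search text could be scanned for Message-ID-like tokens. The regex heuristic (a whitespace-free token containing a single '@', optionally wrapped in angle brackets) is an assumption; note that it would also match plain email addresses.

```python
import re

# Heuristic (assumption): non-space, non-bracket text on both sides of
# exactly one '@', optionally enclosed in angle brackets.
MID_RE = re.compile(r"<?([^\s<>@]+@[^\s<>@]+)>?")


def extract_mids(query):
    """Return Message-ID-like tokens found in a free-form query."""
    return [m.group(1) for m in MID_RE.finditer(query)]
```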
2016-08-09 www: avoid misinterpreting '&' and ';' in query parameters
Oops, we must unescape each key=value pair in a QUERY_STRING individually; otherwise we cannot interpret '&' or ';' in query parameter values.
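A minimal Python sketch of the fix, assuming both '&' and ';' act as pair separators: split the raw QUERY_STRING first, then unescape each key and value on its own. The function name is hypothetical; this is not the public-inbox (Perl) code.

```python
import re
from urllib.parse import unquote_plus


def parse_query(qs):
    """Parse a raw QUERY_STRING into a dict of unescaped values."""
    params = {}
    # Split on the *raw* separators first...
    for pair in re.split(r"[&;]", qs):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # ...then unescape each piece, so %26 ('&') and %3B (';')
        # stay inside the value instead of splitting it.
        params[unquote_plus(key)] = unquote_plus(value)
    return params
```

Unescaping the whole string before splitting would turn `q=a%26b` into `q=a&b` and wrongly split it into two pairs.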
2016-08-09 searchidx: avoid holding Xapian lock in cat-file
We must ensure the cat-file process is launched before Xapian grabs its lock, too. Our use of "git cat-file --batch" has the same problem "git log" did, which was fixed in commit 3713c727cda431a0dc2865a7878c13ecf9f21851 ("searchidx: release Xapian FDs before spawning git log").
2016-08-09 searchidx: release Xapian FDs before spawning git log
Due to the lack of FD_CLOEXEC on some Xapian FDs, spawned processes would inherit them; releasing the FDs before spawning "git log" will allow us to release and re-acquire Xapian locks.
2016-08-09 searchidx: persist the PublicInbox::Git object
We can cheaply keep the object around nowadays since it spawns expensive processes only on an as-needed basis.
2016-08-09 searchidx: remove unused $git parameters
We do not need to pass the PublicInbox::Git object to various callbacks.
2016-08-06 www: use <hr> to delimit messages in /new.html view, too
This is necessary to delimit messages when viewed without threading.
2016-08-06 mbox: be fair to other HTTP clients
At least for public-inbox-httpd, this allows us to avoid having a client monopolize one event loop tick of the server for too long. It hurts throughput for the /all.mbox.gz endpoint, but I doubt anybody cares and the latency improvement for other clients would be appreciated. We already do the same fairness thing for HTML pages.
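A simplified Python simulation of the fairness idea: each client gets at most one chunk written per tick before the next client is served, so a long /all.mbox.gz response cannot starve a short HTML one. The real server is built on an event loop (Danga::Socket) rather than this loop, and `round_robin` is a hypothetical name.

```python
from collections import deque


def round_robin(clients):
    """clients: mapping of name -> iterator of response chunks.

    Returns the order in which chunks would be written.
    """
    queue = deque(clients.items())
    order = []
    while queue:
        name, chunks = queue.popleft()
        chunk = next(chunks, None)
        if chunk is None:
            continue  # this client's response is finished
        order.append((name, chunk))   # write one chunk this tick
        queue.append((name, chunks))  # then yield to the next client
    return order
```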
2016-08-06 view: do not introduce excessive </pre> in $MID/t/ view
When using <ul><li>..., we already set up <pre> tags in thread_index_entry, so an extra </pre> tag causes validation errors. Fixes: 6ef9b216156c ("view: use <hr> to delineate in /$MID/T/ view")
2016-08-05 search: disable batching in newer versions of Xapian, for now
This warrants further investigation, but it appears we cannot reliably release the Xapian lock after forking "git log" due to the lack of a close-on-exec flag on the Xapian flintlock FD.
2016-08-05 view: use <hr> to delineate in /$MID/T/ view
The sacrifice in vertical space might be worth it to improve ease-of-reading, as it's unreasonable to expect an entire message thread to be able to fit into a single window. https://public-inbox.org/git/20160805093544.scvl4yshkfg2l26p@sigill.intra.peff.net/ Cc: Jeff King <peff@peff.net>
2016-08-05 http: do not allow bad getline+close responses to kill us
PSGI applications (like our WWW :P) can fail unpredictably, but let's try to avoid bringing the entire process down when this happens.
2016-08-05 thread: avoid recursion in Mail::Thread::recurse_down
Yet another monkey patch to fix a problem encountered in upstream Mail::Thread.

ref:
- https://rt.cpan.org/Ticket/Display.html?id=116727
- http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833479
2016-08-04 view: do not fail on empty In-Reply-To
Sometimes messages have an empty In-Reply-To header which throws threaders off. This actually causes public-inbox-httpd to die, which is probably bad and will be fixed elsewhere.
2016-08-04 searchmsg: add git object ID to doc_data
Doing git tree lookups based on the SHA-1 of the Message-ID becomes expensive as trees grow larger; instead, use the SHA-1 object ID directly. This drastically reduces the amount of time spent in the "git cat-file --batch" process when fetching the /$INBOX/all.mbox.gz endpoint on the ~800MB git@vger.kernel.org mirror. This retains backwards compatibility and allows existing indices to be transparently upgraded without performance degradation.
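A Python sketch contrasting the two kinds of `git cat-file --batch` request lines. The path layout (hex SHA-1 of the Message-ID split into a two-character directory and a 38-character file name) is an assumption about the repository format, and both helper names are hypothetical; the functions only build request lines and do not spawn git.

```python
import hashlib


def tree_lookup_request(mid, ref="HEAD"):
    """Old way: git must walk ref's trees to resolve "<ref>:<path>"."""
    # Assumed layout: sha1(Message-ID) stored as "ab/cdef..."
    h = hashlib.sha1(mid.encode()).hexdigest()
    return f"{ref}:{h[:2]}/{h[2:]}\n"


def direct_request(blob_oid):
    """New way: the blob's object ID is stored in doc_data, so git
    can look the object up without any tree traversal."""
    return f"{blob_oid}\n"
```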
2016-08-02 search: improve reindexing behavior
For reindexing, fresh Xapian DBs do not count as a reindex, allowing users to blindly use --reindex on the first run on a clean repo. While we're at it, allow indexing to override HEAD ref for multi-head git repos.
2016-08-02 wwwstream: prioritize search in top title bar
Search is probably more useful, so users should be able to select it sooner. Put it on its own line so it won't get scrolled off the edge for non-CSS users. Fix a minor spacing bug in the input tag while we're at it.
2016-08-02 daemon: do not chdir unless daemonizing
Most process managers (e.g. systemd) should already start us in '/', so avoid making our daemon more complex to run by requiring absolute paths during development.
2016-07-31 search: support reindexing existing search indices
This should make tweaking the way we search more efficient by allowing us to avoid destroying the index every time we want to change something. We also give priority to incremental indexing via public-inbox-{watch,mda} and have manual invocations of public-inbox-index perform batch updates while releasing ssoma.lock.
2016-07-31 msgmap: fix use of transactions
We want transactions to be the responsibility of the caller when possible; this fixes the potential for the msgmap to internally become inconsistent when using it from inside searchidx.
2016-07-30 t/config_limiter: fix check for identical Git object
If we completely undef an object, a newly created object may end up at the same scalar address as the original even though the two are different. So keep the same object around and only force creation of the same reference. Tested on Perl 5.14.2 on Debian 7.x wheezy.
2016-07-29 daemon: re-enable SIGWINCH without setsid
This allows systemd users to use SIGWINCH to temporarily (and gracefully) stop an instance of a service without doing a code reload to bring it back up:

	# start temporary new service code
	systemctl start public-inbox-nntpd@2.service

	# momentarily paralyze original service
	systemctl kill -s WINCH public-inbox-nntpd@1.service

	if new_code_at_2_sucks
	then
		# restart original workers
		systemctl kill -s HUP public-inbox-nntpd@1.service
	else
		# new is better than old, replace original instance
		systemctl restart public-inbox-nntpd@1.service
	fi

	# cleanup the temporary service
	systemctl stop public-inbox-nntpd@2.service

This partially reverts commit 73d274e83b7d300f31e0cc1ceeacbf73c6c2a1e4 ("daemon: disable SIGWINCH unless explicitly daemonized").
2016-07-28 add scripts/xhdr-num2mid example
This is used to quickly generate an article number to Message-ID mapping. Usage:

	NNTPSERVER=news.example.org ./scripts/xhdr-num2mid GROUP >file
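XHDR responses (RFC 2980) list one "article-number header-value" pair per line and end with a "." line. A minimal Python sketch of turning such output into a number-to-Message-ID map; `parse_xhdr` is hypothetical, and stripping the angle brackets is an assumption about what the script stores.

```python
def parse_xhdr(lines):
    """Map article numbers to Message-IDs from "XHDR Message-ID" output."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line == ".":
            continue  # skip blanks and the end-of-response marker
        num, _, mid = line.partition(" ")
        mid = mid.strip()
        if num.isdigit() and mid.startswith("<") and mid.endswith(">"):
            mapping[int(num)] = mid[1:-1]  # drop the angle brackets
    return mapping
```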
2016-07-28 fix manifest
2016-07-28 add script used for importing git from download.gmane.org
In case others want to use it...
2016-07-27 localize $/ when using chomp
Callers may have localized $/ to something else, so make sure we chomp the expected character(s) when calling chomp.
2016-07-26 mda: always call Import::done, even on dupes
We don't want to leave fast_import_crash_* dumps around on duplicates.
2016-07-26 learn: fix uninitialized variable
Oops :x
2016-07-26 mda: fix address matching in address lists
This is common when multiple participants are in a thread.
2016-07-21 www: redefinable project name and URL
This should make it easier for folks to run their own forks.
2016-07-21 www: note that .onion URLs require the usage of Tor
Not everybody knows what .onion URLs are, so refer them to Tor.
2016-07-21 view: split up --cc args for git-send-email
Having long Cc: lines is inevitable for large threads with many participants, and git-send-email only gained the ability to recognize ',' in the "--cc" arg recently with the release of git v2.6.0 in September 2015.
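A small Python sketch of building a git-send-email command line with one --cc argument per address, so versions older than git v2.6.0 never need to split on ','. The function name, the deduplication, and the `/path/to/YOUR_REPLY` placeholder are assumptions for illustration; --in-reply-to, --to and --cc are standard git-send-email options.

```python
import shlex


def send_email_cmd(to, ccs, in_reply_to):
    """Build a shell-quoted git-send-email command (hypothetical helper)."""
    args = ["git", "send-email", f"--in-reply-to={in_reply_to}", f"--to={to}"]
    seen = set()
    for addr in ccs:
        if addr not in seen:  # avoid repeating the same participant
            seen.add(addr)
            args.append(f"--cc={addr}")  # one --cc per address, no commas
    args.append("/path/to/YOUR_REPLY")  # placeholder for the reply file
    return " ".join(shlex.quote(a) for a in args)
```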
2016-07-21 www: label sections and hopefully improve navigation
Clearly label "Thread overview" and "Reply instructions" so users can quickly skip stuff they're not interested in. Additionally, note the fact that the thread view allows quick navigation within the thread to avoid extra network requests, and improve the display for single-message threads. Finally, use <hr> to better delineate sections of each page.
2016-07-17 extmsg: favor user-provided URL on partial matches
While an inbox may have multiple URLs, we will favor the existing URL for the current inbox on partial matches to avoid confusing users or slowing them down by requiring a new TCP connection.
2016-07-10 view: conditionally anchor to thread skeleton
We only care about the thread skeleton if we have multiple messages in a thread; single-message threads can just go to the top of the message.
2016-07-10 INSTALL: postfix and spamassassin are optional for HTTP mirrors
Not everybody needs to run -mda or -watch.
2016-07-09 view: add "infourl" for reply information
2016-07-09 view: show most recently updated topics, first
This probably makes the most sense as it's structured like a changelog.
2016-07-09 view: improve grouping for topic view
This reduces the number of mbox/Atom links while keeping better track of overall thread count. We no longer loop to fill up slots, which simplifies the code a bit and hopefully gives better grouping.
2016-07-09 httpd/async: reinstate D::S timer usage for cleanup
EvCleanup::asap events are not guaranteed to run after Danga::Socket closes sockets in the event loop. Thus we must use slower Danga::Socket timers, which are guaranteed to run at the end of the event loop.
2016-07-09 httpd/async: do not attempt future writes on closed sockets
Danga::Socket::close does not clear the write_buf_size field, so it's conceivable we could attempt to queue up data and callbacks we can never flush out.
2016-07-09 www: add configurable limiters
Currently only for git-http-backend use, this allows limiting the number of spawned processes per-inbox or by group, if there are multiple large inboxes amidst a sea of small ones.

For example, a "big" repo limiter could be used for big inboxes and shared between multiple repos:

	[limiter "big"]
		max = 4
	[publicinbox "git"]
		address = git@vger.kernel.org
		mainrepo = /path/to/git.git
		; shared limiter with giant:
		httpbackendmax = big
	[publicinbox "giant"]
		address = giant@project.org
		mainrepo = /path/to/giant.git
		; shared limiter with git:
		httpbackendmax = big

	; This is a tiny inbox, use the default limiter with 32 slots:
	[publicinbox "meta"]
		address = meta@public-inbox.org
		mainrepo = /path/to/meta.git
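A Python sketch of the sharing semantics described above: inboxes naming the same limiter share one pool of slots, and inboxes without httpbackendmax fall back to a 32-slot default. The class name is hypothetical, a semaphore stands in for whatever queueing the server really does, and treating the default as a single pool shared by all such inboxes is an assumption.

```python
import threading

DEFAULT_MAX = 32  # default slot count from the config comment


class LimiterRegistry:
    """Hypothetical sketch: one semaphore per named [limiter] section."""

    def __init__(self, limiter_max):
        self._max = dict(limiter_max)  # e.g. {"big": 4} from [limiter "big"]
        self._limiters = {}
        # Assumption: inboxes without httpbackendmax share one default pool.
        self._default = threading.BoundedSemaphore(DEFAULT_MAX)

    def for_inbox(self, httpbackendmax=None):
        if httpbackendmax is None:
            return self._default
        if httpbackendmax not in self._limiters:
            # Created once, then handed to every inbox naming this limiter.
            self._limiters[httpbackendmax] = threading.BoundedSemaphore(
                self._max[httpbackendmax]
            )
        return self._limiters[httpbackendmax]
```

With the config above, "git" and "giant" would receive the same 4-slot limiter, so at most four git-http-backend processes run for both inboxes combined.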