path: root/script
Date Commit message
2019-09-26 httpd: disable Deflater middleware by default on Perl <5.18 p516-leak
Testing with perl-5.16.3-294.el7_6 RPM package on RHEL/CentOS 7, the Deflater middleware triggers a leak when used in conjunction with our push-based responses from PublicInbox::Qspawn.
I could not find another solution to work around the memory leak in this case, and I could not find a specific leak fix in the perl5180delta manpage[1] which looked like it would solve our problem.
Attempting to work around the issue proved futile. Using internal Deflater-specific keys to prevent deflating in GitHTTPBackend and Qspawn did not solve the problem:
	$env->{"plack.skip-deflater"} = 1;
	$env->{"psgix.no-compress"} = 1;
Nor did forcing an invalid encoding via "git fetch":
	git -c http.extraheader=Accept-Encoding:gzap fetch
So this appears to be a problem with Plack::Util::response_cb somewhere.
This does NOT appear to be a problem with ref() leaking as in DS::next_tick[2], since I couldn't find where Plack::Middleware::Deflater or Plack::Util::response_cb would be calling ref() on a blessed reference to trigger a leak.
Also, oddly enough, the ref() use for backwards compatibility at the top of PublicInbox::GitHTTPBackend::serve does NOT seem to trigger a leak on 5.16.3 due to [2]:
	# XXX compatibility... ugh, can we stop supporting this?
	$git = PublicInbox::Git->new($git) unless ref($git);
[1] https://perldoc.perl.org/perl5180delta.html
[2] https://rt.perl.org/Public/Bug/Display.html?id=114340
2019-09-09 run update-copyrights from gnulib for 2019
2019-07-06 nntp: support COMPRESS DEFLATE per RFC 8054
This is only tested so far with my patches to Net::NNTP at: https://rt.cpan.org/Ticket/Display.html?id=129967 Memory use in C10K situations is disappointing, but that's the nature of compression. gzip compression over HTTPS does have the advantage of not keeping zlib streams open when clients are idle, at the cost of worse compression.
2019-06-24 nntp: NNTPS and NNTP+STARTTLS working
It kinda, barely works, and I'm most happy I got it working without any modifications to the main NNTP::event_step callback thanks to the DS->write(CODE) support we inherited from Danga::Socket.
2019-06-14 edit: fix portability of editor invocation
The eval was unnecessary, and $0 can't be "--". Tested with /bin/sh on FreeBSD 11.2.
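The general pattern of running an editor command through /bin/sh can be sketched as follows. This is an illustration only, not the code from this commit: `run_editor` is a hypothetical helper, and it assumes $EDITOR may contain arguments of its own (e.g. "vi -u NONE"). The editor command is used as the `sh -c` script text with `"$@"` appended, so no eval is needed, and $0 is set to an ordinary word rather than "--".

```shell
# Hypothetical helper (not from public-inbox): run "$EDITOR" on the given files.
run_editor() {
	# The script text is: <editor command> "$@"
	# The word after the script text becomes $0 of the inner shell; the
	# remaining words become "$@", so file names are never re-parsed.
	sh -c "$EDITOR"' "$@"' editor "$@"
}
```

Because the file names travel as positional parameters, a name containing spaces reaches the editor as a single argument.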
2019-06-14 Merge remote-tracking branch 'origin/reshard' into next
* origin/reshard:
  xcpdb: support resharding v2 repos
  xcpdb: use destination shard as progress prefix
  xapcmd: preserve indexlevel based on the destination
  v2writable: use a smaller default for Xapian partitions
2019-06-14 xcpdb: support resharding v2 repos
v2 repos are sometimes created on machines where CPU parallelization exceeds the capability of the storage devices. In that case, users may reshard the Xapian DB to any smaller, positive integer to avoid excessive overhead and contention when bottlenecked by slow storage. Resharding can also be used to increase shard count after hardware upgrades.
2019-06-12 edit: unlink temporary file when done
We don't need to leave temporary files lying around.
2019-06-10 edit: drop unwanted headers before noop check
mutt will set Content-Length, Lines, and Status headers unconditionally, so we need to account for that before doing header comparisons to avoid making expensive changes when noop edits are made.
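The idea of ignoring MUA-added headers before comparing can be sketched in shell. This is a minimal illustration, not the Perl code from this commit: `strip_mua_headers` and `is_noop_edit` are hypothetical names, and the sketch assumes each message is a plain file whose header block ends at the first blank line. The three header names come from the commit message above.

```shell
# Hypothetical helper: print a message without the headers mutt adds
# unconditionally (Content-Length, Lines, Status).
strip_mua_headers() {
	awk '
		BEGIN { in_hdr = 1 }
		# the first blank line ends the header block
		in_hdr && /^$/ { in_hdr = 0 }
		# drop the unwanted headers, but only inside the header block
		in_hdr && tolower($0) ~ /^(content-length|lines|status):/ { skip = 1; next }
		# also drop folded continuation lines of a dropped header
		in_hdr && skip && /^[ \t]/ { next }
		{ skip = 0; print }
	' "$1"
}

# Hypothetical helper: succeed if two messages are equal once the
# MUA-added headers are ignored.
is_noop_edit() {
	[ "$(strip_mua_headers "$1")" = "$(strip_mua_headers "$2")" ]
}
```

A caller would skip the history rewrite whenever `is_noop_edit "$before" "$after"` succeeds.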
2019-06-10 edit|purge: improve output on rewrites
Fill in undef as "(unchanged)" when displaying commits and prefix the epoch name.
2019-06-09 edit: new tool to perform edits
This wrapper around V2Writable->replace provides a user-interface for editing messages as single-message mboxes (or the raw text via $EDITOR).
2019-06-09 AdminEdit: move editability checks from -purge
We'll be reusing the same logic for -edit.
2019-06-09 admin: beef up resolve_inboxes to handle purge options
We'll be using this in -edit, and maybe other admin-oriented tools for UI-consistency.
2019-06-09 purge: start moving common options to AdminEdit module
Editing and purging are similar operations involving history rewrites, so there'll be common options and code between them.
2019-06-05 tighten up digit matches to ASCII for git output
While I don't expect git to suddenly start spewing non-ASCII digits in places I'd expect ASCII, this would make things easier for future hackers and reviewers.
2019-06-04 require ASCII digits for local FS items
In case some BOFH decides to randomly create directories using non-ASCII digits all over the place.
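In Perl, \d can match non-ASCII Unicode digits unless the pattern is restricted (for example with [0-9]). The same concern can be illustrated in shell; this is a sketch with a hypothetical helper name, `is_ascii_digits`, and it is not taken from the public-inbox source. The digits are listed explicitly instead of using a 0-9 range, since range expressions may behave differently outside the C locale.

```shell
# Hypothetical helper: succeed only if "$1" is a non-empty run of ASCII digits.
is_ascii_digits() {
	case $1 in
	'' | *[!0123456789]*) return 1 ;;
	esac
	return 0
}
```

A directory name such as "0.git" could then be checked with `is_ascii_digits "${name%.git}"`.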
2019-05-29 searchidx: store indexlevel=medium as metadata
And use it from Admin. It's easy to tell what indexlevel=basic is from unconfigured inboxes, but distinguishing between 'medium' and 'full' would require stat()-ing position.* files which is fragile and Xapian-implementation-dependent. So use the metadata facility of Xapian and store it in the main partition so Admin tools can deal better with unconfigured inboxes copied using generic tools like cp(1) or rsync(1).
2019-05-29 index: remove warning on unconfigured inboxes
It's annoying for people using "git fetch && public-inbox-index" as one user; and running -httpd/-nntpd as a different user (where users see different config files).
2019-05-29 index: support --verbose option
It doesn't implement progress of batches, yet, but it wires up the parsing of the command-line while preserving output compatibility. This output is NOT meant to be stable.
2019-05-24 doc: xcpdb: add switch documentation
In particular, the '--compact' switch is really useful since it works without holding the inbox-wide lock for minutes at a time on giant inboxes (inboxes where copies can take dozens, if not hundreds of minutes).
2019-05-23 doc: various updates to reflect current state
-index documentation avoids redundant v1 information and refers readers to appropriate v1/v2 manpages. Search::Xapian can also be optional, now, as only the PSGI search interface uses it. Favor "INBOX_DIR" where appropriate, since "REPO_DIR" can be confused for code repos which we also support. XAPIAN_FLUSH_THRESHOLD is documented for all relevant bulk commands.
2019-05-23 xcpdb|compact: support some xapian-compact switches
Allow users to specify the --blocksize <B>, --no-full, --fuller options for xapian-compact(1) for fine-tuning compact behavior for low-traffic/inactive inboxes. We also won't support --multipass, since it doesn't seem compatible with our requirement to use --no-renumber. We also won't support --single-file, since it only seems intended for totally dead inboxes; and it doesn't seem worth the support overhead when "totally dead" turns out to be a misdiagnosis.
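The switch policy described above can be sketched as a small filter. This is an illustration under stated assumptions, not public-inbox's actual option parser: `compact_args` is a hypothetical function, and it assumes the block size is given in the --blocksize=B form. The switch names themselves are the ones named in the commit message.

```shell
# Hypothetical helper: build an argument list for xapian-compact(1),
# always including --no-renumber and rejecting unsupported switches.
compact_args() {
	out=--no-renumber
	for arg in "$@"; do
		case $arg in
		--blocksize=* | --no-full | --fuller)
			out="$out $arg" ;;
		--multipass | --single-file)
			echo "unsupported switch: $arg" >&2
			return 1 ;;
		*)
			echo "unknown switch: $arg" >&2
			return 1 ;;
		esac
	done
	printf '%s\n' "$out"
}
```

For example, `compact_args --fuller --blocksize=8K` prints the accepted switches after --no-renumber, while `compact_args --multipass` fails with a message on stderr.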
2019-05-23 compact: reuse infrastructure from xcpdb
Since -xcpdb is a superset of -compact, we can reuse much of that code used for driving compact. For compact (only), this is slightly less memory efficient since it requires an extra process per-partition, but we get to prefix the output with the partition name for more readable output.
2019-05-23 xcpdb: implement progress reporting
Copying an entire Xapian DB is horribly slow whether it's done via Perl or copydatabase(1). So displaying some progress indication is good for user experience. While we're at it, prefix xapian-compact output, too; since parallel processes end up clobbering each other.
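Prefixing the output of parallel processes can be sketched in shell. This is only an illustration of the idea; the real tool is written in Perl, and `prefix_output` is a hypothetical helper name. Each line of a command's stdout and stderr is labeled so that interleaved lines remain attributable to their source.

```shell
# Hypothetical helper: run a command and prefix every output line with a label.
prefix_output() {
	label=$1
	shift
	# "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline
	"$@" 2>&1 | while IFS= read -r line || [ -n "$line" ]; do
		printf '%s: %s\n' "$label" "$line"
	done
}
```

Several labeled commands can then be started in the background, one `prefix_output LABEL COMMAND &` per partition, followed by `wait`.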
2019-05-23 xapcmd: xcpdb supports compaction
To minimize the delay on active inboxes, it's actually ideal to run xapian-compact at the end of the per-partition cpdb process; since the new DB isn't accessible yet and so we don't have to deal with lock contention with -mda or -watch processes. The downside is temporary file overhead (3x instead of 2x) required.
2019-05-23 xcpdb: implement using Perl bindings
By avoiding copydatabase(1) entirely, we can make further changes to avoid locking the entire inbox for a long operation and switch to fine-grained locking.
2019-05-23 admin: move index_inbox over
We will be reindexing after copydatabase.
2019-05-23 xcpdb: new tool which wraps Xapian's copydatabase(1)
copydatabase(1) is an existing Xapian tool which is the recommended way to upgrade existing DBs to the latest Xapian database format (currently "glass" for stable/released versions). Our use of Xapian relies on preserving document IDs, so we'll wrap it like we do xapian-compact(1) and use the "--no-renumber" switch. I could not name the tool "public-inbox-copydatabase" since it would be ambiguous as to which DB it's actually copying. So, I abbreviated the suffix to "xcpdb" (Xapian CoPy DataBase), which I hope is acceptable and unambiguous.
2019-05-23 admin: hoist out resolve_inboxes for -compact and -index
Both of these index-affecting commands should work similarly on the command-line. public-inbox-index no longer complains about unconfigured ~/.public-inbox/config; but often I found myself being annoyed by that, anyways...
2019-05-23 xapcmd: new module for wrapping Xapian commands
Port public-inbox-compact(1) over to using it, and we will need to wrap copydatabase(1) to ease glass migrations, too.
2019-05-23 doc: document the reason for --no-renumber
We're going to need copydatabase, too.
2019-05-23 v1writable: retire in favor of InboxWritable
In retrospect, introducing V1Writable was unnecessary and InboxWritable->importer is in a better position to abstract away differences between v1 and v2 writers. So teach InboxWritable to initialize inboxes and get rid of V1Writable.
2019-05-22 init: preserve permissions for git prior to 2.1.0
"git config" did not preserve permissions of the config file it modifies prior to git 2.1.0, so work around that.
2019-05-15 admin: improve warnings and errors for missing modules
Since we lazy-load Xapian now, some errors may become more cryptic or buried. Try to improve that by making Admin show better errors.
2019-05-15 lazy load Xapian and make it optional for v2
More tests work without Search::Xapian, now. Usability issues still need to be fixed.
2019-05-14 httpd: get rid of Deflater warning
Deflating responses may be done by the reverse proxy (e.g. varnish or nginx), so the warning for it could be invalid.
2019-05-14 v2writable: allow setting nproc via creat options
Avoiding reliance on environment variables is a bit cleaner for writing tests.
2019-05-14 v1writable: new wrapper which is closer to v2writable
Import initialization is a little strange from history, but we also can't change it too much because it's technically a public API which external code may rely on... And we may need to support v1 repos indefinitely. This should make it easier to write tests for both formats.
2019-05-06 index: warn with info about the message as context
This can help users track down the source of warnings when presented with imperfect emails. While we're at it, make the __WARN__ callback in t/v2writable.t a no-op since we don't check for warnings, there.
2019-01-31 Merge remote-tracking branch 'origin/purge'
* origin/purge:
  implement public-inbox-purge tool
  v2writable: read epoch on purge
  v2writable: cleanup processes when done
  v2writable: purge ignores non-existent git epoch directories
  v2writable: ->purge returns undef on no-op
  import: purge: reap fast-export process
  hoist out resolve_repo_dir from -index
2019-01-20 www: admin-configurable CSS via "publicinbox.css"
Maybe we'll default to a dark theme to promote energy savings... See contrib/css/README for details.
2019-01-15 index: allow working on unconfigured inboxes, again
2019-01-11 implement public-inbox-purge tool
Expose the ->purge functionality of V2Writable for rewriting git history to permanently purge messages from history. This may be necessary for legal reasons.
Usage:
	# requires ~/.public-inbox/config
	public-inbox-purge --all </path/to/message-to-purge
	# good for testing with unconfigured inboxes:
	public-inbox-purge $INBOX_DIR </path/to/message-to-purge
2019-01-11 hoist out resolve_repo_dir from -index
We'll be using it in future admin tools, and making this easier-to-test.
2019-01-05 filter/rubylang: fix SQLite DB lifetime problems
Clearly the AltId stuff was never tested for v2. Ensure this tricky filter (which reuses Msgmap to avoid introducing new serial numbers) doesn't trigger deadlocks in SQLite due to opening a DB for writing multiple times. I went through several iterations of this change before going with this one, which is the least intrusive I could find.
2019-01-02 use PublicInbox::Config::each_inbox where appropriate
No need to reach into PublicInbox::Config internals and iterate through the hashref by hand.
2018-12-28 init: allow --skip of old epochs for -V2 repos
This allows archivists to publish incomplete archives with newer mail while allowing "0.git" (or "1.git" and so on) epochs to be added-after-the-fact (without affecting "git clone" followers). A reindex will be necessary for Xapian and SQLite to catch up once the old epochs are added; but the reindexing code is also capable of tolerating missing epochs.
2018-12-27 init: do not set publicinbox.$NAME.indexlevel by default
It is redundant to set default values in the public-inbox config file. Let's not clutter up users' screens when they view or edit the config file.
2018-07-29 mda: allow configuring globally without spamc support
This reuses some of the configuration from -watch, but remains independent since some configurations will use -watch for some inboxes and -mda for others. The default remains "spamc" for -mda users so nothing changes without explicit configuration. Per-inbox configurations may also be supported in the future.
2018-07-29 mda: v2: ensure message bodies are indexed
We must not clobber the original message string, as Email::MIME(*) still needs it for iterating through parts in SearchIdx (but not when handing it as a raw string to git-fast-import). I've noticed message bodies (especially dfpre/dpost) were not getting indexed when going through -mda (no problems with -watch). This also did not affect v1 repos, since indexing is a separate process for v1 and requires re-reading the data from git. (*) tested Email::MIME 1.937 on Debian stretch