user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download mbox.gz: |
* sample robots.txt to reduce WWW load
@ 2024-04-01 13:21  6% Eric Wong
  0 siblings, 0 replies; 14+ results
From: Eric Wong @ 2024-04-01 13:21 UTC (permalink / raw)
  To: meta

Performance is still slow, and crawler traffic patterns tend to
do bad things with caches at all levels, so I've regretfully had
to experiment with robots.txt to mitigate performance problems.

The /s/ solver endpoint remains expensive but commit
8d6a50ff2a44 (www: use a dedicated limiter for blob solver, 2024-03-11)
seems to have helped significantly.

All the multi-message endpoints (/[Tt]*) are of course expensive
and have always been.  git blob access over SATA 2 SSD isn't too
fast, and HTML rendering is quite expensive in Perl.  Keeping
multiple zlib contexts for HTTP gzip also hurts memory usage,
so we want to minimize the amount of time clients keep
longer-lived allocations.

Anyways, this robots.txt is what I've been experimenting with
and (after a few days when bots pick it up) it seems to have
significantly cut load on my system so I can actually work on
performance problems[1] which show up.

==> robots.txt <==
User-Agent: *
Disallow: /*/s/
Disallow: /*/T/
Disallow: /*/t/
Disallow: /*/t.atom
Disallow: /*/t.mbox.gz
Allow: /

I also disable git-archive snapshots for cgit || WwwCoderepo:

Disallow: /*/snapshot/*


[1] I'm testing a glibc patch which hopefully reduces fragmentation.
    I've gotten rid of many of the Disallow: entries temporarily
   since

^ permalink raw reply	[relevance 6%]

* [PATCH 1/4] www: use a dedicated limiter for blob solver
  @ 2024-03-11 19:40 16% ` Eric Wong
  0 siblings, 0 replies; 14+ results
From: Eric Wong @ 2024-03-11 19:40 UTC (permalink / raw)
  To: meta

Wrap the entire solver command chain with a dedicated limiter.
The normal limiter is designed for longer-lived commands or ones
which serve a single HTTP request (e.g. git-http-backend or
cgit) and not effective for short memory + CPU intensive commands
used for solver.

Each overall solver request is both memory + CPU intensive: it
spawns several short-lived git processes(*) in addition to a
longer-lived `git cat-file --batch' process.

Thus running parallel solvers from a single -netd/-httpd worker
(which have their own parallelization) results in excessive
parallelism that is both memory and CPU-bound (not network-bound)
and cascade into slowdowns for handling simpler memory/CPU-bound
requests.  Parallel solvers were also responsible for the
increased lifetime and frequency of zombies since the event loop
was too saturated to reap them.

We'll also return 503 on excessive solver queueing, since these
require an FD for the client HTTP(S) socket to be held onto.

(*) git (update-index|apply|ls-files) are all run by solver and
    short-lived
---
 lib/PublicInbox/SolverGit.pm | 15 ++++++-----
 lib/PublicInbox/ViewVCS.pm   | 48 +++++++++++++++++++++++++++++-------
 2 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index 4e79f750..296e7d17 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -256,6 +256,12 @@ sub update_index_result ($$) {
 	next_step($self); # onto do_git_apply
 }
 
+sub qsp_qx ($$$) {
+	my ($self, $qsp, $cb) = @_;
+	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
+	$qsp->psgi_qx($self->{psgi_env}, $self->{limiter}, $cb, $self);
+}
+
 sub prepare_index ($) {
 	my ($self) = @_;
 	my $patches = $self->{patches};
@@ -284,9 +290,8 @@ sub prepare_index ($) {
 	my $cmd = [ qw(git update-index -z --index-info) ];
 	my $qsp = PublicInbox::Qspawn->new($cmd, $self->{git_env}, $rdr);
 	$path_a = git_quote($path_a);
-	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
 	$self->{-msg} = "index prepared:\n$mode_a $oid_full\t$path_a";
-	$qsp->psgi_qx($self->{psgi_env}, undef, \&update_index_result, $self);
+	qsp_qx $self, $qsp, \&update_index_result;
 }
 
 # pure Perl "git init"
@@ -465,8 +470,7 @@ sub apply_result ($$) { # qx_cb
 	my @cmd = qw(git ls-files -s -z);
 	my $qsp = PublicInbox::Qspawn->new(\@cmd, $self->{git_env});
 	$self->{-cur_di} = $di;
-	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
-	$qsp->psgi_qx($self->{psgi_env}, undef, \&ls_files_result, $self);
+	qsp_qx $self, $qsp, \&ls_files_result;
 }
 
 sub do_git_apply ($) {
@@ -495,8 +499,7 @@ sub do_git_apply ($) {
 	my $opt = { 2 => 1, -C => _tmp($self)->dirname, quiet => 1 };
 	my $qsp = PublicInbox::Qspawn->new(\@cmd, $self->{git_env}, $opt);
 	$self->{-cur_di} = $di;
-	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
-	$qsp->psgi_qx($self->{psgi_env}, undef, \&apply_result, $self);
+	qsp_qx $self, $qsp, \&apply_result;
 }
 
 sub di_url ($$) {
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index 61329db6..790b9a2c 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -49,6 +49,10 @@ my %GIT_MODE = (
 	'160000' => 'g', # commit (gitlink)
 );
 
+# TODO: not fork safe, but we don't fork w/o exec in PublicInbox::WWW
+my (@solver_q, $solver_lim);
+my $solver_nr = 0;
+
 sub html_page ($$;@) {
 	my ($ctx, $code) = @_[0, 1];
 	my $wcb = delete $ctx->{-wcb};
@@ -614,26 +618,52 @@ sub show_blob { # git->cat_async callback
 		'</code></pre></td></tr></table>'.dbg_log($ctx), @def);
 }
 
-# GET /$INBOX/$GIT_OBJECT_ID/s/
-# GET /$INBOX/$GIT_OBJECT_ID/s/$FILENAME
-sub show ($$;$) {
-	my ($ctx, $oid_b, $fn) = @_;
-	my $hints = $ctx->{hints} = {};
+sub start_solver ($) {
+	my ($ctx) = @_;
 	while (my ($from, $to) = each %QP_MAP) {
 		my $v = $ctx->{qp}->{$from} // next;
-		$hints->{$to} = $v if $v ne '';
+		$ctx->{hints}->{$to} = $v if $v ne '';
 	}
-	$ctx->{fn} = $fn;
-	$ctx->{-tmp} = File::Temp->newdir("solver.$oid_b-XXXX", TMPDIR => 1);
+	$ctx->{-next_solver} = PublicInbox::OnDestroy->new($$, \&next_solver);
+	++$solver_nr;
+	$ctx->{-tmp} = File::Temp->newdir("solver.$ctx->{oid_b}-XXXX",
+						TMPDIR => 1);
 	$ctx->{lh} or open $ctx->{lh}, '+>>', "$ctx->{-tmp}/solve.log";
 	my $solver = PublicInbox::SolverGit->new($ctx->{ibx},
 						\&solve_result, $ctx);
+	$solver->{limiter} = $solver_lim;
 	$solver->{gits} //= [ $ctx->{git} ];
 	$solver->{tmp} = $ctx->{-tmp}; # share tmpdir
 	# PSGI server will call this immediately and give us a callback (-wcb)
+	$solver->solve(@$ctx{qw(env lh oid_b hints)});
+}
+
+# run the next solver job when done and DESTROY-ed
+sub next_solver {
+	--$solver_nr;
+	# XXX FIXME: client may've disconnected if it waited a long while
+	start_solver(shift(@solver_q) // return);
+}
+
+sub may_start_solver ($) {
+	my ($ctx) = @_;
+	$solver_lim //= $ctx->{www}->{pi_cfg}->limiter('codeblob');
+	if ($solver_nr >= $solver_lim->{max}) {
+		@solver_q > 128 ? html_page($ctx, 503, 'too busy')
+				: push(@solver_q, $ctx);
+	} else {
+		start_solver($ctx);
+	}
+}
+
+# GET /$INBOX/$GIT_OBJECT_ID/s/
+# GET /$INBOX/$GIT_OBJECT_ID/s/$FILENAME
+sub show ($$;$) {
+	my ($ctx, $oid_b, $fn) = @_;
+	@$ctx{qw(oid_b fn)} = ($oid_b, $fn);
 	sub {
 		$ctx->{-wcb} = $_[0]; # HTTP write callback
-		$solver->solve($ctx->{env}, $ctx->{lh}, $oid_b, $hints);
+		may_start_solver $ctx;
 	};
 }
 

^ permalink raw reply related	[relevance 16%]

* [PATCH 5/5] Fix some typos/grammar/errors in docs and comments
  @ 2023-08-28 10:42  5% ` Štěpán Němec
  0 siblings, 0 replies; 14+ results
From: Štěpán Němec @ 2023-08-28 10:42 UTC (permalink / raw)
  To: meta

---
Please note the FIXME added in this patch: I lacked the confidence to
repair that paragraph on my own.

 Documentation/RelNotes/v2.0.0.wip           |  2 +-
 Documentation/dc-dlvr-spam-flow.txt         |  2 +-
 Documentation/design_notes.txt              | 10 ++++----
 Documentation/design_www.txt                | 12 ++++-----
 Documentation/lei.pod                       |  2 +-
 Documentation/public-inbox-config.pod       | 10 ++++----
 Documentation/public-inbox-daemon.pod       | 20 ++++++++-------
 Documentation/public-inbox-glossary.pod     |  6 ++---
 Documentation/public-inbox-learn.pod        |  4 +--
 Documentation/public-inbox-purge.pod        |  4 +--
 Documentation/public-inbox-tuning.pod       | 12 ++++-----
 Documentation/public-inbox-v2-format.pod    |  6 ++---
 Documentation/public-inbox-watch.pod        |  4 +--
 Documentation/reproducibility.txt           |  4 +--
 Documentation/standards.perl                |  4 +--
 Documentation/technical/data_structures.txt | 28 ++++++++++-----------
 Documentation/technical/ds.txt              |  6 ++---
 Documentation/technical/memory.txt          |  2 +-
 Documentation/technical/whyperl.txt         | 20 +++++++--------
 HACKING                                     | 14 +++++------
 INSTALL                                     |  4 +--
 README                                      | 16 ++++++------
 TODO                                        |  6 ++---
 ci/README                                   |  2 +-
 ci/profiles.sh                              |  2 +-
 devel/README                                |  2 +-
 examples/varnish-4.vcl                      |  2 +-
 lib/PublicInbox/DS.pm                       |  4 +--
 lib/PublicInbox/Daemon.pm                   |  2 +-
 sa_config/README                            |  4 +--
 script/public-inbox-mda                     |  4 +--
 scripts/README                              |  2 +-
 32 files changed, 111 insertions(+), 111 deletions(-)

diff --git a/Documentation/RelNotes/v2.0.0.wip b/Documentation/RelNotes/v2.0.0.wip
index cccf11ae587d..40c87169ccd9 100644
--- a/Documentation/RelNotes/v2.0.0.wip
+++ b/Documentation/RelNotes/v2.0.0.wip
@@ -60,7 +60,7 @@
   * fix `lei q -tt' on locally-indexed messages (still broken for remotes:
     https://public-inbox.org/meta/20230226170931.M947721@dcvr/ )
 
-  * `lei import' now set labels+keywords consistently on all
+  * `lei import' now sets labels+keywords consistently on all
      already-imported messages
 
 solver (used by lei (rediff|blob), and PublicInbox::WWW)
diff --git a/Documentation/dc-dlvr-spam-flow.txt b/Documentation/dc-dlvr-spam-flow.txt
index d151d272d0ae..6210fc7dcff4 100644
--- a/Documentation/dc-dlvr-spam-flow.txt
+++ b/Documentation/dc-dlvr-spam-flow.txt
@@ -39,7 +39,7 @@ delivery path as well as removing the message from the git tree.
 
 * incron - run commands based on filesystem events: http://incron.aiken.cz/
 
-* sendmail / MTA - we use and recommend use postfix, which includes a
+* sendmail / MTA - we use and recommend postfix, which includes a
                    sendmail-compatible wrapper: http://www.postfix.org/
 
 * spamc / spamd - SpamAssassin: http://spamassassin.apache.org/
diff --git a/Documentation/design_notes.txt b/Documentation/design_notes.txt
index 3df5af3e3cf2..95f025560c9e 100644
--- a/Documentation/design_notes.txt
+++ b/Documentation/design_notes.txt
@@ -52,15 +52,15 @@ Why email?
   There is no need to ask the NSA for backups of your mail archives :)
 
 * git, one of the most widely-used version control systems, includes many
-  tools for for email, including: git-format-patch(1), git-send-email(1),
+  tools for email, including: git-format-patch(1), git-send-email(1),
   git-am(1), git-imap-send(1).  Furthermore, the development of git itself
   is based on the git mailing list: https://public-inbox.org/git/
   (or
   http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/git/
-  for Tor users)
+  for Tor users).
 
 * Email is already the de-facto form of communication in many Free Software
-  communities..
+  communities.
 
 * Fallback/transition to private email and other lists, in case the
   public-inbox host becomes unavailable, users may still directly email
@@ -76,13 +76,13 @@ Why git?
 
 * As of 2016, git is widely used and known to nearly all Free Software
   developers.  For non-developers it is packaged for all major GNU/Linux
-  and *BSD distributions.  NNTP is not as widely-used nowadays, and
+  and *BSD distributions.  NNTP is not as widely used nowadays, and
   most IMAP clients do not have good support for read-only mailboxes.
 
 Why perl 5?
 -----------
 
-* Perl 5 is widely available on modern *nix systems with good a history
+* Perl 5 is widely available on modern *nix systems, with a good history
   of backwards and forward compatibility.
 
 * git and SpamAssassin both use it, so it should be one less thing for
diff --git a/Documentation/design_www.txt b/Documentation/design_www.txt
index b1f916ddb369..68488b1fa253 100644
--- a/Documentation/design_www.txt
+++ b/Documentation/design_www.txt
@@ -102,7 +102,7 @@ We also set <title> to make window management easier.
 
 We favor <pre>-formatted text since public-inbox is intended as a place
 to share and discuss patches and code.  Unfortunately, long paragraphs
-tends to be less readable with fixed-width serif fonts which GUI
+tend to be less readable with fixed-width serif fonts which GUI
 browsers default to.
 
 * No graphics, images, or icons at all.  We tolerate, but do not
@@ -122,12 +122,12 @@ browsers default to.
   avoided as they do not render well with some displays or user-chosen
   fonts.
 
-* No JavaScript. JS is historically too buggy and insecure, and we will
+* No JavaScript.  JS is historically too buggy and insecure, and we will
   never expect our readers to do either of the following:
-  a) read and audit all our code for on every single page load
-  b) trust us and and run code without reading it
+  a) read and audit all our code on every single page load
+  b) trust us and run code without reading it
 
-* We only use CSS for one reason: wrapping pre-formatted text
+* We only use CSS for one reason: wrapping pre-formatted text.
   This is necessary because unfortunate GUI browsers tend to be
   prone to layout widening from unwrapped mailers.
   Do not expect CSS to be enabled, especially with scary things like:
@@ -141,4 +141,4 @@ CSS classes (for user-supplied CSS)
 -----------------------------------
 
 See examples in contrib/css/ and lib/PublicInbox/WwwText.pm
-(or https://public-inbox.org/meta/_/text/color/ soon)
+(or <https://public-inbox.org/meta/_/text/color/>)
diff --git a/Documentation/lei.pod b/Documentation/lei.pod
index f01f506af359..2b10f4906e1a 100644
--- a/Documentation/lei.pod
+++ b/Documentation/lei.pod
@@ -126,7 +126,7 @@ Other subcommands include
 
 =head1 FILES
 
-By default storage is located at C<$XDG_DATA_HOME/lei/store>.  The
+By default, storage is located at C<$XDG_DATA_HOME/lei/store>.  The
 configuration for lei resides at C<$XDG_CONFIG_HOME/lei/config>.
 
 =head1 ERRORS
diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index d175d2d74726..d2389abceb0e 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -191,7 +191,7 @@ Default: :all
 The local path name of a CSS file for the PSGI web interface.
 May contain the attributes "media", "title" and "href" which match
 the associated attributes of the HTML <style> tag.
-"href" may be specified to point to the URL of an remote CSS file
+"href" may be specified to point to the URL of a remote CSS file
 and the path may be "/dev/null" or any empty file.
 Multiple files may be specified and will be included in the
 order specified.
@@ -291,10 +291,10 @@ Default: /var/www/htdocs/cgit/cgit.cgi or /usr/lib/cgit/cgit.cgi
 =item publicinbox.cgitdata
 
 A path to the data directory used by cgit for storing static files.
-Typically guessed based the location of C<cgit.cgi> (from
-C<publicinbox.cgitbin>, but may be overridden.
+Typically guessed based on the location of C<cgit.cgi> (from
+C<publicinbox.cgitbin>), but may be overridden.
 
-Default: basename of C<publicinbox.cgitbin>, /var/www/htdocs/cgit/
+Default: dirname of C<publicinbox.cgitbin>, /var/www/htdocs/cgit/
 or /usr/share/cgit/
 
 =item publicinbox.cgit
@@ -311,7 +311,7 @@ Try using C<cgit> as the first choice, this is the default.
 =item * fallback
 
 Fall back to using C<cgit> only if our native, inbox-aware
-git code repository viewer doesn't recognized the URL.
+git code repository viewer doesn't recognize the URL.
 
 =item * rewrite
 
diff --git a/Documentation/public-inbox-daemon.pod b/Documentation/public-inbox-daemon.pod
index 7121683325c7..c5c88bdd04fa 100644
--- a/Documentation/public-inbox-daemon.pod
+++ b/Documentation/public-inbox-daemon.pod
@@ -101,6 +101,8 @@ Default: 1
 The default TLS certificate for HTTPS, IMAPS, NNTPS, POP3S and/or STARTTLS
 support if the C<cert> option is not given with C<--listen>.
 
+=for comment FIXME this paragraph needs repair
+
 Well-known TCP ports automatically get TLS or STARTTLS support
 If using systemd-compatible socket activation and a TCP listener
 on port well-known ports (563 is inherited, it is automatically
@@ -112,15 +114,15 @@ STARTTLS support.
 
 The default TLS certificate key for the default C<--cert> or
 per-listener C<cert=> option.  The private key may be
-concatenated into the path used by the cert, in which case this
+concatenated into the cert file itself, in which case this
 option is not needed.
 
 =item --multi-accept INTEGER
 
-By default, each worker accepts one connection at-a-time to maximize
+By default, each worker accepts one connection at a time to maximize
 fairness and minimize contention across multiple processes on a
 shared listen socket.  Accepting multiple connections at once may be
-useful in constrained deployments with few, heavily-loaded workers.
+useful in constrained deployments with few, heavily loaded workers.
 Negative values enables a worker to accept all available clients at
 once, possibly starving others in the process.  C<-1> behaves like
 C<multi_accept yes> in nginx; while C<0> (the default) is
@@ -137,7 +139,7 @@ Default: 0
 =head1 SIGNALS
 
 Most of our signal handling behavior is copied from L<nginx(8)>
-and/or L<starman(1)>; so it is possible to reuse common scripts
+and/or L<starman(1)>, so it is possible to reuse common scripts
 for managing them.
 
 =over 8
@@ -158,7 +160,7 @@ Reload config files associated with the process.
 
 =item SIGTTIN
 
-Increase the number of running workers processes by one.
+Increase the number of running worker processes by one.
 
 =item SIGTTOU
 
@@ -166,7 +168,7 @@ Decrease the number of running worker processes by one.
 
 =item SIGWINCH
 
-Stop all running worker processes.   SIGHUP or SIGTTIN
+Stop all running worker processes.  SIGHUP or SIGTTIN
 may be used to restart workers.
 
 =item SIGQUIT
@@ -194,7 +196,7 @@ activation.  See L<systemd.socket(5)> and L<sd_listen_fds(3)>.
 
 =item PERL_INLINE_DIRECTORY
 
-Pointing this to point to a writable directory enables the use
+Pointing this to a writable directory enables the use
 of L<Inline> and L<Inline::C> extensions which may provide
 platform-specific performance improvements.  Currently, this
 enables the use of L<vfork(2)> which speeds up subprocess
@@ -211,8 +213,8 @@ created by a user. See L<Inline> and L<Inline::C> for more details.
 There are two ways to upgrade a running process.
 
 Users of process management systems with socket activation
-(L<systemd(1)> or similar) may rely on multiple instances For
-systemd, this means using two (or more) '@' instances for each
+(L<systemd(1)> or similar) may rely on multiple daemon instances.
+For systemd, this means using two (or more) '@' instances for each
 service (e.g. C<SERVICENAME@INSTANCE>) as documented in
 L<systemd.unit(5)>.
 
diff --git a/Documentation/public-inbox-glossary.pod b/Documentation/public-inbox-glossary.pod
index 3c9e2bd21283..d88539c8b0fb 100644
--- a/Documentation/public-inbox-glossary.pod
+++ b/Documentation/public-inbox-glossary.pod
@@ -25,7 +25,7 @@ C<over.sqlite3>
 
 =item tid, THREADID
 
-A sequentially-assigned positive integer.  These integers are
+A sequentially assigned positive integer.  These integers are
 per-inbox or per-extindex.  In the future, this may be prefixed
 with C<T> for JMAP (RFC 8621) and RFC 8474.  This may not be
 strictly compliant with RFC 8621 since inboxes and extindices
@@ -40,7 +40,7 @@ RFC-(822|2822|5322) email message.
 
 =item IMAP EMAILID, JMAP Email Id
 
-To-be-decided.  This will likely be the git blob ID prefixed with C<g>
+To be decided.  This will likely be the git blob ID prefixed with C<g>
 rather than the numeric UID to accommodate the same blob showing
 up in both an extindex and inbox (or multiple extindices).
 
@@ -87,7 +87,7 @@ but it imports drafts.
 
 For L<lei(1)> users only.  This will allow lei users to place
 the same email into one or more virtual folders for
-ease-of-filtering.  This is NOT tied to public-inbox names, as
+ease of filtering.  This is NOT tied to public-inbox names, as
 messages stored by lei may not be public.
 
 These are similar in spirit to arbitrary freeform "tags"
diff --git a/Documentation/public-inbox-learn.pod b/Documentation/public-inbox-learn.pod
index 3c92b1cc698b..f776df6b2bb0 100644
--- a/Documentation/public-inbox-learn.pod
+++ b/Documentation/public-inbox-learn.pod
@@ -54,7 +54,7 @@ This is similar to the C<spam> command above, but does
 not feed the message to L<spamc(1)> and only removes messages
 which match on any of the C<To:>, C<Cc:>, and C<List-ID:> headers.
 
-The C<--all> option may be used match C<spam> semantics in removing
+The C<--all> option may be used to match C<spam> semantics in removing
 the message from all configured inboxes.  C<--all> is only
 available in public-inbox 1.6.0+.
 
@@ -82,7 +82,7 @@ L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
 
 =head1 COPYRIGHT
 
-Copyright 2019-2021 all contributors L<mailto:meta@public-inbox.org>
+Copyright all contributors L<mailto:meta@public-inbox.org>
 
 License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
 
diff --git a/Documentation/public-inbox-purge.pod b/Documentation/public-inbox-purge.pod
index 945286c69f97..1223b5775828 100644
--- a/Documentation/public-inbox-purge.pod
+++ b/Documentation/public-inbox-purge.pod
@@ -31,7 +31,7 @@ leads to discontiguous git history.
 =item --all
 
 Purge the message in all inboxes configured in ~/.public-inbox/config.
-This is an alternative to specifying individual inboxes directories
+This is an alternative to specifying individual inbox directories
 on the command-line.
 
 =back
@@ -74,7 +74,7 @@ L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
 
 =head1 COPYRIGHT
 
-Copyright 2019-2021 all contributors L<mailto:meta@public-inbox.org>
+Copyright all contributors L<mailto:meta@public-inbox.org>
 
 License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
 
diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index 53668eccb7cb..58a4d9bcbabd 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -79,8 +79,8 @@ RAM.  Attempts to parallelize random I/O on HDDs leads to pathological
 slowdowns as inboxes grow.
 
 While C<-V2> introduced Xapian shards as a parallelization
-mechanism for SSDs; enabling C<publicInbox.indexSequentialShard>
-repurposes sharding as mechanism to reduce the kernel page cache
+mechanism for SSDs, enabling C<publicInbox.indexSequentialShard>
+repurposes sharding as a mechanism to reduce the kernel page cache
 footprint when indexing on HDDs.
 
 Initializing a mirror with a high C<--jobs> count to create more
@@ -108,7 +108,7 @@ indices on btrfs to achieve acceptable performance (even on SSD).
 Disabling copy-on-write also disables checksumming, thus C<raid1>
 (or higher) configurations may be corrupt after unsafe shutdowns.
 
-Fortunately, these SQLite and Xapian indices are designed to
+Fortunately, these SQLite and Xapian indices are designed to be
 recoverable from git if missing.
 
 Disabling CoW does not prevent all fragmentation.  Large values
@@ -125,7 +125,7 @@ C<btrfs filesystem defragment -fr $INBOX_DIR> may be necessary.
 Large filesystems benefit significantly from the C<space_cache=v2>
 mount option documented in L<btrfs(5)>.
 
-Older, non-CoW filesystems are generally work well out-of-the-box
+Older, non-CoW filesystems generally work well out of the box
 for our Xapian and SQLite indices.
 
 =head2 Performance on solid state drives
@@ -152,7 +152,7 @@ C<LimitNOFILE=> in L<systemd.exec(5)>) may need to be raised to
 accommodate many concurrent clients.
 
 Transport Layer Security (IMAPS, NNTPS, or via STARTTLS) significantly
-increases memory use of client sockets, sure to account for that in
+increases memory use of client sockets, be sure to account for that in
 capacity planning.
 
 =head2 Other OS tuning knobs
@@ -168,7 +168,7 @@ Other OSes may have similar tuning knobs (patches appreciated).
 L<public-inbox-extindex(1)> allows any number of public-inboxes
 to share the same Xapian indices.
 
-git 2.33+ startup time is orders-of-magnitude faster and uses
+git 2.33+ startup time is orders of magnitude faster and uses
 less memory when dealing with thousands of alternates required
 for thousands of inboxes with L<public-inbox-extindex(1)>.
 
diff --git a/Documentation/public-inbox-v2-format.pod b/Documentation/public-inbox-v2-format.pod
index e93d7fc701d9..de3b0bfd390f 100644
--- a/Documentation/public-inbox-v2-format.pod
+++ b/Documentation/public-inbox-v2-format.pod
@@ -30,7 +30,7 @@ databases for parallelism by "shards".
   - all.git                         # empty, alternates to $EPOCH.git
   - xap$SCHEMA_VERSION/$SHARD       # per-shard Xapian DB
   - xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP, threading
-  - msgmap.sqlite3                  # same the v1 msgmap
+  - msgmap.sqlite3                  # same as the v1 msgmap
 
 For blob lookups, the reader only needs to open the "all.git"
 repository with $GIT_DIR/objects/info/alternates which references
@@ -89,7 +89,7 @@ After-the-fact invocations of L<public-inbox-index> will ignore
 messages written to 'd' after they are written to 'm'.
 
 Deltafication is not significantly improved over v1, but overall
-storage for trees is made as as small as possible.  Initial
+storage for trees is made as small as possible.  Initial
 statistics and benchmarks showing the benefits of this approach
 are documented at:
 
@@ -97,7 +97,7 @@ L<https://public-inbox.org/meta/20180209205140.GA11047@dcvr/>
 
 =head2 XAPIAN SHARDS
 
-Another second scalability problem in v1 was the inability to
+Another scalability problem in v1 was the inability to
 utilize multiple CPU cores for Xapian indexing.  This is
 addressed by using shards in Xapian to perform import
 indexing in parallel.
diff --git a/Documentation/public-inbox-watch.pod b/Documentation/public-inbox-watch.pod
index e8f97c8088c9..febda0b13df4 100644
--- a/Documentation/public-inbox-watch.pod
+++ b/Documentation/public-inbox-watch.pod
@@ -41,7 +41,7 @@ importing them into public-inbox git repositories and indices.
 public-inbox-watch is useful in situations when a user wishes to
 mirror an existing mailing list, but has no access to run
 L<public-inbox-mda(1)> on a server.  Unlike public-inbox-mda
-which is invoked once per-message, public-inbox-watch is a
+which is invoked once per message, public-inbox-watch is a
 persistent process, making it faster for after-the-fact imports
 of large Maildirs.
 
@@ -62,7 +62,7 @@ public-inbox-watch takes no command-line options.
 =head1 CONFIGURATION
 
 These configuration knobs should be used in the
-L<public-inbox-config(5)> file
+L<public-inbox-config(5)> file.
 
 =over 8
 
diff --git a/Documentation/reproducibility.txt b/Documentation/reproducibility.txt
index 4e56ada48bb2..3336de731a4d 100644
--- a/Documentation/reproducibility.txt
+++ b/Documentation/reproducibility.txt
@@ -12,7 +12,7 @@ reproducible.
 Keeping all communications as email ensures the full history
 of the entire project can be mirrored by anyone with the
 resources to do so.  Compact, low-complexity data requires
-less resources to mirror, so sticking with plain-text
+less resources to mirror, so sticking with plain text
 ensures more parties can mirror and potentially fork the
 project with all its data.
 
@@ -26,4 +26,4 @@ If these things make power hungry project leaders and admins
 uncomfortable, good.  That was the point.  It's how checks
 and balances ought to work.
 
-Comments, corrections, etc welcome: meta@public-inbox.org
+Comments, corrections, etc. welcome: meta@public-inbox.org
diff --git a/Documentation/standards.perl b/Documentation/standards.perl
index c36afb5d718b..743cdee1ce24 100755
--- a/Documentation/standards.perl
+++ b/Documentation/standards.perl
@@ -11,11 +11,11 @@ Non-exhaustive list of standards public-inbox software attempts or
 intends to implement.  This list is intended to be a quick reference
 for hackers and users.
 
-Given the goals of interoperability and accessibility; strict
+Given the goals of interoperability and accessibility, strict
 conformance to standards is not always possible, but rather
 best-effort taking into account real-world cases.  In particular,
 "obsolete" standards remain relevant as long as clients and
-data exists.
+data using them exist.
 
 IETF RFCs
 ---------
diff --git a/Documentation/technical/data_structures.txt b/Documentation/technical/data_structures.txt
index 4dcf9ce609be..5ed21882b9f8 100644
--- a/Documentation/technical/data_structures.txt
+++ b/Documentation/technical/data_structures.txt
@@ -32,19 +32,19 @@ Per-message classes
   Common abbreviation: $mime, $eml
   Used by: PublicInbox::WWW, PublicInbox::SearchIdx
 
-  An representation of an entire email, multipart or not.
+  A representation of an entire email, multipart or not.
   An option to use libgmime or libmailutils may be supported
   in the future for performance and memory use.
 
   This can be a memory hog with big messages and giant
   attachments, so our PublicInbox::WWW interface only keeps
-  one object of this class in memory at-a-time.
+  one object of this class in memory at a time.
 
   In other words, this is the "meat" of the message, whereas
   $smsg (below) is just the "skeleton".
 
   Our PublicInbox::V2Writable class may have two objects of this
-  type in memory at-a-time for deduplication.
+  type in memory at a time for deduplication.
 
   In public-inbox 1.4 and earlier, Email::MIME and its subclass,
   PublicInbox::MIME were used.  Despite still slurping,
@@ -61,10 +61,10 @@ Per-message classes
 
   This is loaded from either the overview DB (over.sqlite3) or
   the Xapian DB (docdata.glass), though the Xapian docdata
-  is won't hold NNTP-only fields (Cc:/To:)
+  won't hold NNTP-only fields (Cc:/To:).
 
   There may be hundreds or thousands of these objects in memory
-  at-a-time, so fields are pruned if unneeded.
+  at a time, so fields are pruned if unneeded.
 
 * PublicInbox::SearchThread::Msg - subclass of Smsg
   Common abbreviation: $cont or $node
@@ -75,9 +75,9 @@ Per-message classes
   Nowadays, this is a re-blessed $smsg with additional fields.
 
   As with $smsg objects, there may be hundreds or thousands
-  of these objects in memory at-a-time.
+  of these objects in memory at a time.
 
-  We also do not use a linked-list for storing children as JWZ
+  We also do not use a linked list for storing children as JWZ
   describes, but instead a Perl hashref for {children} which
   becomes an arrayref upon sorting.
 
@@ -88,7 +88,7 @@ Per-inbox classes
 
 * PublicInbox::Inbox - represents a single public-inbox
   Common abbreviation: $ibx
-  Used everywhere
+  Used everywhere.
 
   This represents a "publicinbox" section in the config
   file, see public-inbox-config(5) for details.
@@ -152,7 +152,7 @@ ad-hoc structures shared across packages
   This holds the PSGI $env as well as any internal variables
   used by various modules of PublicInbox::WWW.
 
-  As with the PSGI $env, there is one per-active WWW
+  As with the PSGI $env, there is one per active WWW
   request+response cycle.  It does not exist for idle HTTP
   clients.
 
@@ -174,8 +174,8 @@ daemon classes
   Common abbreviation: $http
   Used by: PublicInbox::DS, public-inbox-httpd
 
-  Unlike PublicInbox::NNTP, this class no knowledge of any of
-  the email or git-specific parts of public-inbox, only PSGI.
+  Unlike PublicInbox::NNTP, this class has no knowledge of any of
+  the email- or git-specific parts of public-inbox, only PSGI.
   However, it supports APIs and behaviors (e.g. streaming large
   responses) which PublicInbox::WWW may take advantage of.
 
@@ -188,7 +188,7 @@ daemon classes
 
   This class calls non-blocking accept(2) or accept4(2) on a
   listen socket to create new PublicInbox::HTTP and
-  PublicInbox::HTTP instances.
+  PublicInbox::NNTP instances.
 
 * PublicInbox::HTTPD
   Common abbreviation: $httpd
@@ -197,9 +197,9 @@ daemon classes
   wrappers around client sockets accepted from
   PublicInbox::Listener.
 
-  Since the SERVER_NAME and SERVER_PORT PSGI variables needs to be
+  Since the SERVER_NAME and SERVER_PORT PSGI variables need to be
   exposed for HTTP/1.0 requests when Host: headers are missing,
-  this is per-Listener socket.
+  this is per Listener socket.
 
 * PublicInbox::HTTPD::Async
   Common abbreviation: $async
diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
index 4cfb62fe44c8..afead2f155e0 100644
--- a/Documentation/technical/ds.txt
+++ b/Documentation/technical/ds.txt
@@ -19,7 +19,7 @@ Most notably:
   triggers a call.
 
   The lack of read/write callback distinction is driven by the
-  fact TLS libraries (e.g. OpenSSL via IO::Socket::SSL) may
+  fact that TLS libraries (e.g. OpenSSL via IO::Socket::SSL) may
   declare SSL_WANT_READ on SSL_write(), and SSL_WANT_READ on
   SSL_read().  So we end up having to let each user object decide
   whether it wants to make read or write calls depending on its
@@ -35,7 +35,7 @@ Most notably:
   Reducing the user-supplied code down to a single callback allows
   subclasses to keep their logic self-contained.  The combination
   of this change and one-shot wakeups (see below) for bidirectional
-  data flows make asynchronous code easier to reason about.
+  data flows makes asynchronous code easier to reason about.
 
 Other divergences:
 
@@ -53,7 +53,7 @@ Other divergences:
 
 Augmented features:
 
-* obj->write(CODEREF) passes the object itself to the CODEREF
+* obj->write(CODEREF) passes the object itself to the CODEREF.
   Being able to enqueue subroutine calls is a powerful feature in
   Danga::Socket for keeping linear logic in an asynchronous environment.
   Unfortunately, each subroutine takes several kilobytes of memory.
diff --git a/Documentation/technical/memory.txt b/Documentation/technical/memory.txt
index a35b2c734409..039694c33441 100644
--- a/Documentation/technical/memory.txt
+++ b/Documentation/technical/memory.txt
@@ -8,7 +8,7 @@ memory-efficient.
 We strive to keep processes small to improve locality, allow
 the kernel to cache more files, and to be a good neighbor to
 other processes running on the machine.  Taking advantage of
-automatic reference counting (ARC) in Perl allows us
+automatic reference counting (ARC) in Perl allows us to
 deterministically release memory back to the heap.
 
 We start with a simple data model with few circular
diff --git a/Documentation/technical/whyperl.txt b/Documentation/technical/whyperl.txt
index fbe2e1b16e06..db1d9793a76a 100644
--- a/Documentation/technical/whyperl.txt
+++ b/Documentation/technical/whyperl.txt
@@ -21,7 +21,7 @@ Good Things
 
   Perl 5 is installed on many, if not most GNU/Linux and
   BSD-based servers and workstations.  It is likely the most
-  widely-installed programming environment that offers a
+  widely installed programming environment that offers a
   significant amount of POSIX functionality.  Users won't
   have to waste bandwidth or space with giant toolchains or
   architecture-specific binaries.
@@ -47,8 +47,8 @@ Good Things
 
 * Predictable performance
 
-  While Perl is neither fast or memory-efficient, its
-  performance and memory use are predictable and does not
+  While Perl is neither fast nor memory-efficient, its
+  performance and memory use are predictable and do not
   require GC tuning by the user.
 
   public-inbox is developed for (and mostly on) old
@@ -56,7 +56,7 @@ Good Things
   late 1990s, and any cheap VPS today has more than enough
   RAM and CPU for handling plain-text email.
 
-  Low hardware requirements increases the reach of our software
+  Low hardware requirements increase the reach of our software
   to more users, improving centralization resistance.
 
 * Compatibility
@@ -86,7 +86,7 @@ Good Things
 
   There should be no need to rely on language-specific
   package managers such as cpan(1), those systems increase
-  the learning curve for users and systems administrators.
+  the learning curve for users and system administrators.
 
 * Compactness and terseness
 
@@ -98,7 +98,7 @@ Good Things
 * Performance ceiling and escape hatch
 
   With optional Inline::C, we can be "as fast as C" in some
-  cases.  Inline::C is widely-packaged by distros and it
+  cases.  Inline::C is widely packaged by distros and it
   gives us an escape hatch for dealing with missing bindings
   or performance problems should they arise.  Inline::C use
   (as opposed to XS) also preserves the software freedom and
@@ -135,7 +135,7 @@ Bad Things
   (m//, substr(), index(), etc.) still require memory copies
   into userspace, negating a benefit of zero-copy.
 
-* The XS/C API make it difficult to improve internals while
+* The XS/C API makes it difficult to improve internals while
   preserving compatibility.
 
 * Lack of optional type checking.  This may be a blessing in
@@ -161,14 +161,14 @@ Red herrings to ignore when evaluating other runtimes
 -----------------------------------------------------
 
 These don't discount a language or runtime from being
-being used, they're just not interesting.
+used, they're just not interesting.
 
 * Lightweight threading
 
   While lightweight threading implementations are
-  convenient, they tend to be significantly heavier than a
+  convenient, they tend to be significantly heavier than
   pure event-loop systems (or multi-threaded event-loop
-  systems)
+  systems).
 
   Lightweight threading implementations have stack overhead
   and growth typically measured in kilobytes.  The userspace
diff --git a/HACKING b/HACKING
index df68b54d0f40..18ec74206c45 100644
--- a/HACKING
+++ b/HACKING
@@ -7,7 +7,7 @@ It is archived at: https://public-inbox.org/meta/
 and http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/ (using Tor)
 
 Contributions are email-driven, just like contributing to git
-itself or the Linux kernel; however anonymous and pseudonymous
+itself or the Linux kernel; nevertheless, anonymous and pseudonymous
 contributions will always be welcome.
 
 Please consider our goals in mind:
@@ -15,17 +15,17 @@ Please consider our goals in mind:
 	Decentralization, Accessibility, Compatibility, Performance
 
 These goals apply to everyone: users viewing over the web or NNTP,
-sysadmins running public-inbox, and other hackers working public-inbox.
+sysadmins running public-inbox, and other hackers working on public-inbox.
 
 We will reject any feature which advocates or contributes to any
-particular instance of a public-inbox becoming a single point of failure.
+particular instance of public-inbox becoming a single point of failure.
 Things we've considered but rejected include:
 
 * exposing article serial numbers outside of NNTP
 * allowing readers to inject metadata (e.g. votes)
 
 We care about being accessible to folks with vision problems and/or
-lack the computing resources to view so-called "modern" websites.
+lacking the computing resources to view so-called "modern" websites.
 This includes folks on slow connections and ancient browsers which
 may be too difficult to upgrade due to resource demands.
 
@@ -45,7 +45,7 @@ Just-Ahead-of-Time-compiled C (via Inline::C)
 Do not recurse on user-supplied data.  Neither Perl or C handle
 deep recursion gracefully.  See lib/PublicInbox/SearchThread.pm
 and lib/PublicInbox/MsgIter.pm for examples of non-recursive
-alternatives to previously-recursive algorithms.
+alternatives to previously recursive algorithms.
 
 Performance should be reasonably good for server administrators, too,
 and we will sacrifice features to achieve predictable performance.
@@ -61,8 +61,6 @@ on specific topics, in particular data_structures.txt
 Optional packages for testing and development
 ---------------------------------------------
 
-Optional packages testing and development:
-
 - Plack::Test                      deb: libplack-test-perl
                                    pkg: p5-Plack
                                    rpm: perl-Plack-Test
@@ -107,6 +105,6 @@ Perl notes
 ----------
 
 * \w, \s, \d character classes all match Unicode characters;
-  so write out class ranges (e.g "[0-9]") if you only intend to
+  so write out class ranges (e.g., "[0-9]") if you only intend to
   match ASCII.  Do not use the "/a" (ASCII) modifier, that requires
   Perl 5.14 and we're only depending on 5.10.1 at the moment.
diff --git a/INSTALL b/INSTALL
index 91e590ce3318..f5e14ebe73d4 100644
--- a/INSTALL
+++ b/INSTALL
@@ -1,7 +1,7 @@
 public-inbox (server-side) installation
 ---------------------------------------
 
-This is for folks who want to setup their own public-inbox instance.
+This is for folks who want to set up their own public-inbox instance.
 Clients should use normal git-clone/git-fetch, IMAP or NNTP clients
 if they want to import mail into their personal inboxes.
 
@@ -135,7 +135,7 @@ Numerous optional modules are likely to be useful as well:
                                     foreground servers)
 
 The following module is typically pulled in by dependencies listed
-above, so there is no need to explicitly install them:
+above, so there is no need to explicitly install it:
 
 - DBI                              deb: libdbi-perl
                                    pkg: p5-DBI
diff --git a/README b/README
index abe8ddc0075f..a9aa0e864ca2 100644
--- a/README
+++ b/README
@@ -17,7 +17,7 @@ public-inbox spawned around three main ideas:
   communication.  Users may have broken graphics drivers, limited
   eyesight, or be unable to afford modern hardware.
 
-public-inbox aims to be easy-to-deploy and manage; encouraging projects
+public-inbox aims to be easy to deploy and manage, encouraging projects
 to run their own instances with minimal overhead.
 
 Implementation
@@ -27,7 +27,7 @@ public-inbox stores mail in git repositories as documented
 in https://public-inbox.org/public-inbox-v2-format.txt and
 https://public-inbox.org/public-inbox-v1-format.txt
 
-By storing (and optionally) exposing an inbox via git, it is
+By storing and (optionally) exposing an inbox via git, it is
 fast and efficient to host and mirror public-inboxes.
 
 Traditional mailing lists use the "push" model.  For readers,
@@ -42,11 +42,11 @@ follow the list via NNTP, IMAP, POP3, Atom feed or HTML archives.
 
 If a reader loses interest, they simply stop following.
 
-Since we use git, mirrors are easy-to-setup, and lists are
-easy-to-relocate to different mail addresses without losing
+Since we use git, mirrors are easy to set up, and lists are
+easy to relocate to different mail addresses without losing
 or splitting archives.
 
-_Anybody_ may also setup a delivery-only mailing list server to
+_Anybody_ may also set up a delivery-only mailing list server to
 replay a public-inbox git archive to subscribers via SMTP.
 
 Features
@@ -111,7 +111,7 @@ and pull requests to our public-inbox address at:
 
 Please Cc: all recipients when replying as we do not require
 subscription.  This also makes it easier to rope in folks of
-tangentially related projects we depend on (e.g. git developers
+tangentially related projects we depend on (e.g., git developers
 on git@vger.kernel.org).
 
 The archives are readable via IMAP, NNTP or HTTP:
@@ -155,8 +155,8 @@ This improves accessibility, and saves bandwidth and storage
 as mail is archived forever.
 
 As of the 2010s, successful online social networks and forums are the
-ones which heavily restrict users formatting options; so public-inbox
-aims to preserve the focus on content, and not presentation.
+ones which heavily restrict users' formatting options; public-inbox
+aims to preserve the focus on content, not presentation.
 
 Copyright
 ---------
diff --git a/TODO b/TODO
index 77453eba27ac..de628e2e310a 100644
--- a/TODO
+++ b/TODO
@@ -1,8 +1,8 @@
 TODO items for public-inbox
 
 (Not in any particular order, and
-performance, ease-of-setup, installation, maintainability, etc
-all need to be considered for everything we introduce)
+performance, ease of setup, installation, maintainability, etc.
+all need to be considered for everything we introduce.)
 
 * general performance improvements, but without relying on
   XS or pre-built modules any more than we currently do.
@@ -32,7 +32,7 @@ all need to be considered for everything we introduce)
   portability to older Linux, free BSDs and maybe Hurd).
 
 * dogfood latest Xapian, Perl5, SQLite, git and various modules to
-  ensure things continue working as they should (or more better)
+  ensure things continue working as they should (or better)
   while retaining compatibility with old versions.
 
 * Support more of RFC 3977 (NNTP)
diff --git a/ci/README b/ci/README
index 4687fbc57059..728d82a0052c 100644
--- a/ci/README
+++ b/ci/README
@@ -27,7 +27,7 @@ run in the top-level source tree, that is, as `./ci/run.sh'.
 	or doing development.  However, it can be convenient to for
 	users to mass-install several packages.
 
-* ci/profiles.sh - prints to-be tested package profile for the current OS
+* ci/profiles.sh - prints to-be-tested package profile for the current OS
 
 	Called automatically by ci/run.sh
 	The output is read by ci/run.sh
diff --git a/ci/profiles.sh b/ci/profiles.sh
index e58b61d50a13..55b998d73633 100755
--- a/ci/profiles.sh
+++ b/ci/profiles.sh
@@ -2,7 +2,7 @@
 # Copyright (C) 2019-2021 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
-# Prints OS-specific package profiles to stdout (one per-newline) to use
+# Prints OS-specific package profiles to stdout (one per line) to use
 # as command-line args for ci/deps.perl.  Called automatically by ci/run.sh
 
 # set by os-release(5) or similar
diff --git a/devel/README b/devel/README
index 8f9a0485ec3f..c4be51415d34 100644
--- a/devel/README
+++ b/devel/README
@@ -1 +1 @@
-scripts use for public-inbox development that don't belong in t/
+scripts used for public-inbox development that don't belong in t/
diff --git a/examples/varnish-4.vcl b/examples/varnish-4.vcl
index 5fc202ed4f36..624f60133599 100644
--- a/examples/varnish-4.vcl
+++ b/examples/varnish-4.vcl
@@ -28,7 +28,7 @@ sub vcl_recv {
 }
 
 sub vcl_pipe {
-	# By default Connection: close is set on all piped requests by varnish,
+	# By default, Connection: close is set on all piped requests by varnish,
 	# but public-inbox-httpd supports persistent connections well :)
 	unset bereq.http.connection;
 	return (pipe);
diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 98084b5c8a0a..e89dc4306c7b 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -209,8 +209,8 @@ sub await_cb ($;@) {
 	warn "E: awaitpid($pid): $@" if $@;
 }
 
-# This relies on our Perl process is single-threaded, or at least
-# no threads are spawning and waiting on processes (``, system(), etc...)
+# This relies on our Perl process being single-threaded, or at least
+# no threads spawning and waiting on processes (``, system(), etc...)
 # Threads are officially discouraged by the Perl5 team, and I expect
 # that to remain the case.
 sub reap_pids {
diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index 30442227bdf8..88b0fa45bbb6 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -155,7 +155,7 @@ options:
 
   -l ADDRESS    address to listen on$dh
   --cert=FILE   default SSL/TLS certificate
-  --key=FILE    default SSL/TLS certificate
+  --key=FILE    default SSL/TLS certificate key
   -W WORKERS    number of worker processes to spawn (default: 1)
 
 See public-inbox-daemon(8) and $prog(1) man pages for more.
diff --git a/sa_config/README b/sa_config/README
index 6703c38fe1ae..3705e1e85d1b 100644
--- a/sa_config/README
+++ b/sa_config/README
@@ -4,9 +4,9 @@ SpamAssassin configs for public-inbox.org
 root/ - files for system-wide use (plugins, rule definitions,
         new rules should have a zero score which should be overridden)
 user/ - per-user config (keep as much in here as possible)
-        These files go into the users home directory
+        These files go into the user's home directory.
 
-All files in these example directory are CC0:
+All files in these example directories are CC0:
 To the extent possible under law, Eric Wong has waived all copyright and
 related or neighboring rights to these examples.
 
diff --git a/script/public-inbox-mda b/script/public-inbox-mda
index 7e2bee92096e..ba4989569e25 100755
--- a/script/public-inbox-mda
+++ b/script/public-inbox-mda
@@ -33,8 +33,8 @@ use PublicInbox::Filter::Base;
 use PublicInbox::InboxWritable;
 use PublicInbox::Spamcheck;
 
-# n.b: hopefully we can setup the emergency path without bailing due to
-# user error, we really want to setup the emergency destination ASAP
+# n.b.: Hopefully we can set up the emergency path without bailing due to
+# user error, we really want to set up the emergency destination ASAP
 # in case there's bugs in our code or user error.
 my $emergency = $ENV{PI_EMERGENCY} || "$ENV{HOME}/.public-inbox/emergency/";
 $ems = PublicInbox::Emergency->new($emergency);
diff --git a/scripts/README b/scripts/README
index 3b9c37da8787..7ffbd93cb994 100644
--- a/scripts/README
+++ b/scripts/README
@@ -1,5 +1,5 @@
 This directory contains informal scripts and random tools used
-in the development of public-inbox.  Some only exist only for
+in the development of public-inbox.  Some only exist for
 historical purposes, and some may not work anymore.
 
 See the "script/" directory (not "scripts/") for supported and
-- 
2.42.0


^ permalink raw reply related	[relevance 5%]

* [PATCH] relnotes: 2.0.0 work-in-progress
@ 2022-12-15 19:34  5% Eric Wong
  0 siblings, 0 replies; 14+ results
From: Eric Wong @ 2022-12-15 19:34 UTC (permalink / raw)
  To: meta

I'm thinking the -nntpd regression fix will push this release
out sooner rather than later...
---
 Documentation/RelNotes/v2.0.0.wip | 87 +++++++++++++++++++++++++++++++
 MANIFEST                          |  1 +
 2 files changed, 88 insertions(+)
 create mode 100644 Documentation/RelNotes/v2.0.0.wip

diff --git a/Documentation/RelNotes/v2.0.0.wip b/Documentation/RelNotes/v2.0.0.wip
new file mode 100644
index 00000000..a5468f8c
--- /dev/null
+++ b/Documentation/RelNotes/v2.0.0.wip
@@ -0,0 +1,87 @@
+To: meta@public-inbox.org
+Subject: [WIP] public-inbox 2.2.0
+MIME-Version: 1.0
+Content-Type: text/plain; charset=utf-8
+Content-Disposition: inline
+
+This release is mainly to fix a regression for -nntpd affecting
+neomutt and possibly other NNTP clients.
+
+There is also ongoing work to integrate coderepo handling into
+the codebase and the idea of `lei p2q' is integrated into the
+WWW UI.
+
+Upgrading:
+
+  lei users need to "lei daemon-kill" after installation to load
+  new code.  Normal daemons (read-only, and public-inbox-watch)
+  will also need restarts, of course, but there's no
+  backwards-incompatible data format changes so rolling back to
+  older versions is harmless.
+
+treewide
+
+  * support raw UTF-8 headers from SMTPUTF8 hosts
+
+  * standardize on `#' prefix for stderr diagnostics (previously `I:')
+
+PublicInbox::WWW
+
+  * support `+' in inbox names
+
+  * support coderepo displays for systems without cgit
+
+  * improve display of git tags, commits and trees in $INBOX/$OID/s/ endpoint
+
+  * numerous memory usage reductions by avoiding Perl scratchpads
+
+  * add #related anchor and search form to find related patches
+    based on blob OIDs (IOW, exposing `lei p2q' to the web)
+
+  * fix footer in listing of >200 inboxes
+
+lei
+
+  * use http.proxy / http.<remote>.proxy from system-wide git-config if
+    unconfigured for lei
+
+  * improve IMAP error reporting
+
+  * reduce default IMAP connections to avoid overloading servers
+
+  * compatibility with SQLite <3.8.3 on CentOS 7.x
+
+solver (used by lei (rediff|blob), and PublicInbox::WWW)
+
+  * handle copies in patches properly
+
+portability
+
+  * SIGWINCH is handled properly on less common architectures and OSes
+
+  * fix EINTR handling for kqueue users
+
+public-inbox-nntpd
+
+  * fix LISTGROUP with range (affects neomutt)
+
+public-inbox-clone / public-inbox-fetch / `lei add-external --mirror'
+
+  * mtime of downloaded manifest preserved
+
+public-inbox-clone:
+
+  * parallel mirroring of multiple inboxes/coderepos via manifest
+
+  * new flags to support this include:
+    --dry-run, --inbox-config=, --project-list=, --prune,
+    --keep-going, --jobs, --include=, --exclude=, --objstore=, ...
+
+PublicInbox::SaPlugin::ListMirror
+
+  * List-ID handling special-cased according to RFC 2919 rules
+
+Please report bugs via plain-text mail to: meta@public-inbox.org
+
+See archives at https://public-inbox.org/meta/ for all history.
+See https://public-inbox.org/TODO for what the future holds.
diff --git a/MANIFEST b/MANIFEST
index 29f368de..8d60d9dc 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -15,6 +15,7 @@ Documentation/RelNotes/v1.6.1.eml
 Documentation/RelNotes/v1.7.0.eml
 Documentation/RelNotes/v1.8.0.eml
 Documentation/RelNotes/v1.9.0.eml
+Documentation/RelNotes/v2.0.0.wip
 Documentation/clients.txt
 Documentation/common.perl
 Documentation/dc-dlvr-spam-flow.txt

^ permalink raw reply related	[relevance 5%]

* [PATCH 03/18] viewvcs: delay stringification of solver debug log
  @ 2022-08-29  9:26  7% ` Eric Wong
  0 siblings, 0 replies; 14+ results
From: Eric Wong @ 2022-08-29  9:26 UTC (permalink / raw)
  To: meta

This will make future changes easier to work on as we pass more
stuff through $ctx and reduce parameter passing on the Perl stack.
---
 lib/PublicInbox/ViewVCS.pm | 130 +++++++++++++++++--------------------
 1 file changed, 61 insertions(+), 69 deletions(-)

diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index c5d16478..b04a5672 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -37,20 +37,35 @@ my $SHOW_FMT = '--pretty=format:'.join('%n', '%P', '%p', '%H', '%T', '%s',
 	'%an <%ae>  %ai', '%cn <%ce>  %ci', '%b%x00');
 
 sub html_page ($$$) {
-	my ($ctx, $code, $strref) = @_;
+	my ($ctx, $code, $str) = @_;
 	my $wcb = delete $ctx->{-wcb};
 	$ctx->{-upfx} = '../../'; # from "/$INBOX/$OID/s/"
-	my $res = html_oneshot($ctx, $code, $strref);
+	my $res = html_oneshot($ctx, $code, \$str);
 	$wcb ? $wcb->($res) : $res;
 }
 
+sub dbg_log ($) {
+	my ($ctx) = @_;
+	my $log = delete $ctx->{lh} // die 'BUG: already captured debug log';
+	if (!seek($log, 0, 0)) {
+		warn "seek(log): $!";
+		return '<pre>debug log seek error</pre>';
+	}
+	$log = do { local $/; <$log> } // do {
+		warn "readline(log): $!";
+		return '<pre>debug log read error</pre>';
+	};
+	$ctx->{-linkify} //= PublicInbox::Linkify->new;
+	'<pre>debug log:</pre><hr /><pre>'.
+		$ctx->{-linkify}->to_html($log).'</pre>';
+}
+
 sub stream_blob_parse_hdr { # {parse_hdr} for Qspawn
 	my ($r, $bref, $ctx) = @_;
-	my ($res, $logref) = delete @$ctx{qw(-res -logref)};
-	my ($git, $oid, $type, $size, $di) = @$res;
+	my ($git, $oid, $type, $size, $di) = @{$ctx->{-res}};
 	my @cl = ('Content-Length', $size);
-	if (!defined $r) { # error
-		html_page($ctx, 500, $logref);
+	if (!defined $r) { # sysread error
+		html_page($ctx, 500, dbg_log($ctx));
 	} elsif (index($$bref, "\0") >= 0) {
 		[200, [qw(Content-Type application/octet-stream), @cl] ];
 	} else {
@@ -60,17 +75,16 @@ sub stream_blob_parse_hdr { # {parse_hdr} for Qspawn
 				'text/plain; charset=UTF-8', @cl ] ];
 		}
 		if ($r == 0) {
-			warn "premature EOF on $oid $$logref";
-			return html_page($ctx, 500, $logref);
+			my $log = dbg_log($ctx);
+			warn "premature EOF on $oid $log";
+			return html_page($ctx, 500, $log);
 		}
-		@$ctx{qw(-res -logref)} = ($res, $logref);
 		undef; # bref keeps growing
 	}
 }
 
-sub stream_large_blob ($$$$) {
-	my ($ctx, $res, $logref, $fn) = @_;
-	$ctx->{-logref} = $logref;
+sub stream_large_blob ($$) {
+	my ($ctx, $res) = @_;
 	$ctx->{-res} = $res;
 	my ($git, $oid, $type, $size, $di) = @$res;
 	my $cmd = ['git', "--git-dir=$git->{git_dir}", 'cat-file', $type, $oid];
@@ -80,18 +94,16 @@ sub stream_large_blob ($$$$) {
 	$qsp->psgi_return($env, undef, \&stream_blob_parse_hdr, $ctx);
 }
 
-sub show_other_result ($$) {
+sub show_other_result ($$) { # tag, tree, ...
 	my ($bref, $ctx) = @_;
-	my ($qsp_err, $logref) = delete @$ctx{qw(-qsp_err -logref)};
-	if ($qsp_err) {
-		$$logref .= "git show error:$qsp_err";
-		return html_page($ctx, 500, $logref);
+	if (my $qsp_err = delete $ctx->{-qsp_err}) {
+		return html_page($ctx, 500, dbg_log($ctx) .
+				"git show error:$qsp_err");
 	}
 	my $l = PublicInbox::Linkify->new;
 	utf8::decode($$bref);
-	$$bref = '<pre>'. $l->to_html($$bref);
-	$$bref .= '</pre><hr>' . $$logref;
-	html_page($ctx, 200, $bref);
+	html_page($ctx, 200, '<pre>', $l->to_html($$bref), '</pre><hr>',
+		dbg_log($ctx));
 }
 
 sub cmt_title { # git->cat_async callback
@@ -104,10 +116,9 @@ sub cmt_title { # git->cat_async callback
 
 sub show_commit_start { # ->psgi_qx callback
 	my ($bref, $ctx) = @_;
-	my ($qsp_err, $logref) = delete @$ctx{qw(-qsp_err -logref)};
-	if ($qsp_err) {
-		$$logref .= "git show/patch-id error:$qsp_err";
-		return html_page($ctx, 500, $logref);
+	if (my $qsp_err = delete $ctx->{-qsp_err}) {
+		return html_page($ctx, 500, dbg_log($ctx) .
+				"git show/patch-id error:$qsp_err");
 	}
 	my $patchid = (split(/ /, $$bref))[0]; # ignore commit
 	$ctx->{-q_value_html} = "patchid:$patchid" if defined $patchid;
@@ -135,7 +146,7 @@ sub show_commit_start { # ->psgi_qx callback
 
 sub cmt_finalize {
 	my ($ctx) = @_;
-	$ctx->{-linkify} = PublicInbox::Linkify->new;
+	$ctx->{-linkify} //= PublicInbox::Linkify->new;
 	# try to keep author and committer dates lined up
 	my ($au, $co) = delete @$ctx{qw(cmt_au cmt_co)};
 	my $x = length($au) - length($co);
@@ -219,8 +230,8 @@ EOM
 	delete($ctx->{env}->{'qspawn.wcb'})->([200, $res_hdr, [$x]]);
 }
 
-sub show_commit ($$$$) {
-	my ($ctx, $res, $logref, $fn) = @_;
+sub show_commit ($$) {
+	my ($ctx, $res) = @_;
 	my ($git, $oid) = @$res;
 	# patch-id needs two passes, and we use the initial show to ensure
 	# a patch embedded inside the commit message body doesn't get fed
@@ -234,84 +245,67 @@ sub show_commit ($$$$) {
 	my $e = { GIT_DIR => $git->{git_dir} };
 	my $qsp = PublicInbox::Qspawn->new($cmd, $e, { -C => "$ctx->{-tmp}" });
 	$qsp->{qsp_err} = \($ctx->{-qsp_err} = '');
-	$ctx->{-logref} = $logref;
 	$ctx->{env}->{'qspawn.wcb'} = delete $ctx->{-wcb};
 	$ctx->{git} = $git;
 	$qsp->psgi_qx($ctx->{env}, undef, \&show_commit_start, $ctx);
 }
 
-sub show_other ($$$$) {
-	my ($ctx, $res, $logref, $fn) = @_;
+sub show_other ($$) {
+	my ($ctx, $res) = @_;
 	my ($git, $oid, $type, $size) = @$res;
-	if ($size > $MAX_SIZE) {
-		$$logref = "$oid is too big to show\n" . $$logref;
-		return html_page($ctx, 200, $logref);
-	}
+	$size > $MAX_SIZE and return html_page($ctx, 200,
+				"$oid is too big to show\n". dbg_log($ctx));
 	my $cmd = ['git', "--git-dir=$git->{git_dir}",
 		qw(show --encoding=UTF-8 --no-color --no-abbrev), $oid ];
 	my $qsp = PublicInbox::Qspawn->new($cmd);
 	$qsp->{qsp_err} = \($ctx->{-qsp_err} = '');
-	$ctx->{-logref} = $logref;
 	$qsp->psgi_qx($ctx->{env}, undef, \&show_other_result, $ctx);
 }
 
 # user_cb for SolverGit, called as: user_cb->($result_or_error, $uarg)
 sub solve_result {
 	my ($res, $ctx) = @_;
-	my ($log, $hints, $fn) = delete @$ctx{qw(lh hints fn)};
-
-	unless (seek($log, 0, 0)) {
-		warn "seek(log): $!";
-		return html_page($ctx, 500, \'seek error');
-	}
-	$log = do { local $/; <$log> };
-
-	my $l = PublicInbox::Linkify->new;
-	$log = '<pre>debug log:</pre><hr /><pre>' .
-		$l->to_html($log) . '</pre>';
-
-	$res or return html_page($ctx, 404, \$log);
-	ref($res) eq 'ARRAY' or return html_page($ctx, 500, \$log);
+	my $hints = delete $ctx->{hints};
+	$res or return html_page($ctx, 404, dbg_log($ctx));
+	ref($res) eq 'ARRAY' or return html_page($ctx, 500, dbg_log($ctx));
 
 	my ($git, $oid, $type, $size, $di) = @$res;
-	return show_commit($ctx, $res, \$log, $fn) if $type eq 'commit';
-	return show_other($ctx, $res, \$log, $fn) if $type ne 'blob';
+	return show_commit($ctx, $res) if $type eq 'commit';
+	return show_other($ctx, $res) if $type ne 'blob';
 	my $path = to_filename($di->{path_b} // $hints->{path_b} // 'blob');
 	my $raw_link = "(<a\nhref=$path>raw</a>)";
 	if ($size > $MAX_SIZE) {
-		return stream_large_blob($ctx, $res, \$log, $fn) if defined $fn;
-		$log = "<pre><b>Too big to show, download available</b>\n" .
-			"$oid $type $size bytes $raw_link</pre>" . $log;
-		return html_page($ctx, 200, \$log);
+		return stream_large_blob($ctx, $res) if defined $ctx->{fn};
+		return html_page($ctx, 200, <<EOM . dbg_log($ctx));
+<pre><b>Too big to show, download available</b>
+"$oid $type $size bytes $raw_link</pre>
+EOM
 	}
 
 	my $blob = $git->cat_file($oid);
 	if (!$blob) { # WTF?
 		my $e = "Failed to retrieve generated blob ($oid)";
 		warn "$e ($git->{git_dir})";
-		$log = "<pre><b>$e</b></pre>" . $log;
-		return html_page($ctx, 500, \$log);
+		return html_page($ctx, 500, "<pre><b>$e</b></pre>".dbg_log($ctx))
 	}
 
 	my $bin = index(substr($$blob, 0, $BIN_DETECT), "\0") >= 0;
-	if (defined $fn) {
+	if (defined $ctx->{fn}) {
 		my $h = [ 'Content-Length', $size, 'Content-Type' ];
 		push(@$h, ($bin ? 'application/octet-stream' : 'text/plain'));
 		return delete($ctx->{-wcb})->([200, $h, [ $$blob ]]);
 	}
 
-	if ($bin) {
-		$log = "<pre>$oid $type $size bytes (binary)" .
-			" $raw_link</pre>" . $log;
-		return html_page($ctx, 200, \$log);
-	}
+	$bin and return html_page($ctx, 200,
+				"<pre>$oid $type $size bytes (binary)" .
+				" $raw_link</pre>".dbg_log($ctx));
 
 	# TODO: detect + convert to ensure validity
 	utf8::decode($$blob);
 	my $nl = ($$blob =~ s/\r?\n/\n/sg);
 	my $pad = length($nl);
 
-	$l->linkify_1($$blob);
+	($ctx->{-linkify} //= PublicInbox::Linkify->new)->linkify_1($$blob);
 	my $ok = $hl->do_hl($blob, $path) if $hl;
 	if ($ok) {
 		$blob = $ok;
@@ -320,17 +314,15 @@ sub solve_result {
 	}
 
 	# using some of the same CSS class names and ids as cgit
-	$log = "<pre>$oid $type $size bytes $raw_link</pre>" .
+	html_page($ctx, 200, "<pre>$oid $type $size bytes $raw_link</pre>" .
 		"<hr /><table\nclass=blob>".
 		"<tr><td\nclass=linenumbers><pre>" . join('', map {
 			sprintf("<a id=n$_ href=#n$_>% ${pad}u</a>\n", $_)
 		} (1..$nl)) . '</pre></td>' .
 		'<td><pre> </pre></td>'. # pad for non-CSS users
 		"<td\nclass=lines><pre\nstyle='white-space:pre'><code>" .
-		$l->linkify_2($$blob) .
-		'</code></pre></td></tr></table>' . $log;
-
-	html_page($ctx, 200, \$log);
+		$ctx->{-linkify}->linkify_2($$blob) .
+		'</code></pre></td></tr></table>'.dbg_log($ctx));
 }
 
 # GET /$INBOX/$GIT_OBJECT_ID/s/

^ permalink raw reply related	[relevance 7%]

* [PATCH 03/11] use "\&" where possible when referring to subroutines
  2020-09-09  6:26  6% [PATCH 00/11] httpd: further reduce event loop monopolization Eric Wong
@ 2020-09-09  6:26 10% ` Eric Wong
  0 siblings, 0 replies; 14+ results
From: Eric Wong @ 2020-09-09  6:26 UTC (permalink / raw)
  To: meta

"*foo" is ambiguous in that it may refer to a bareword file handle;
so we'll use it where we can without triggering warnings.

PublicInbox::TestCommon::run_script_exit required dropping the
prototype, however.  We'll also future-proof by dropping "use
warnings" in Cgit.pm and use the less-ambiguous "//=" in Inbox.pm
while we're in the area.
---
 lib/PublicInbox/Cgit.pm       | 5 ++---
 lib/PublicInbox/Inbox.pm      | 2 +-
 lib/PublicInbox/TestCommon.pm | 4 ++--
 lib/PublicInbox/WwwListing.pm | 6 +++---
 t/replace.t                   | 8 ++++----
 t/solver_git.t                | 2 +-
 xt/msgtime_cmp.t              | 2 +-
 7 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/Cgit.pm b/lib/PublicInbox/Cgit.pm
index 9a51b451..fb0d0e60 100644
--- a/lib/PublicInbox/Cgit.pm
+++ b/lib/PublicInbox/Cgit.pm
@@ -10,9 +10,8 @@ use strict;
 use PublicInbox::GitHTTPBackend;
 use PublicInbox::Git;
 # not bothering with Exporter for a one-off
-*input_prepare = *PublicInbox::GitHTTPBackend::input_prepare;
-*serve = *PublicInbox::GitHTTPBackend::serve;
-use warnings;
+*input_prepare = \&PublicInbox::GitHTTPBackend::input_prepare;
+*serve = \&PublicInbox::GitHTTPBackend::serve;
 use PublicInbox::Qspawn;
 use PublicInbox::WwwStatic qw(r);
 
diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 3b5ac970..b0894a7d 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -70,7 +70,7 @@ sub _cleanup_later ($) {
 	my ($self) = @_;
 	$cleanup_avail = cleanup_possible() if $cleanup_avail < 0;
 	return if $cleanup_avail != 1;
-	$cleanup_timer ||= PublicInbox::DS::later(*cleanup_task);
+	$cleanup_timer //= PublicInbox::DS::later(\&cleanup_task);
 	$CLEANUP->{"$self"} = $self;
 }
 
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index b03e93e0..42819179 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -158,7 +158,7 @@ sub _undo_redirects ($) {
 # The default is 2.
 our $run_script_exit_code;
 sub RUN_SCRIPT_EXIT () { "RUN_SCRIPT_EXIT\n" };
-sub run_script_exit (;$) {
+sub run_script_exit {
 	$run_script_exit_code = $_[0] // 0;
 	die RUN_SCRIPT_EXIT;
 }
@@ -180,7 +180,7 @@ package $pkg;
 use strict;
 use subs qw(exit);
 
-*exit = *PublicInbox::TestCommon::run_script_exit;
+*exit = \\&PublicInbox::TestCommon::run_script_exit;
 sub main {
 # the below "line" directive is a magic comment, see perlsyn(1) manpage
 # line 1 "$f"
diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index 365743cf..0be3764c 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -60,9 +60,9 @@ sub list_404 ($$) { [] }
 
 # TODO: +cgit
 my %VALID = (
-	all => *list_all,
-	'match=domain' => *list_match_domain,
-	404 => *list_404,
+	all => \&list_all,
+	'match=domain' => \&list_match_domain,
+	404 => \&list_404,
 );
 
 sub set_cb ($$$) {
diff --git a/t/replace.t b/t/replace.t
index a1e2d63b..95241adf 100644
--- a/t/replace.t
+++ b/t/replace.t
@@ -179,10 +179,10 @@ EOF
 	}
 }
 
-my $opt = { pre => *pad_msgs };
+my $opt = { pre => \&pad_msgs };
 test_replace(2, 'basic', {});
 test_replace(2, 'basic', $opt);
-test_replace(2, 'basic', $opt = { %$opt, post => *pad_msgs });
+test_replace(2, 'basic', $opt = { %$opt, post => \&pad_msgs });
 test_replace(2, 'basic', $opt = { %$opt, rotate_bytes => 1 });
 
 SKIP: {
@@ -190,9 +190,9 @@ SKIP: {
 	PublicInbox::Search::load_xapian() or skip 'Search::Xapian missing', 8;
 	for my $l (qw(medium)) {
 		test_replace(2, $l, {});
-		$opt = { pre => *pad_msgs };
+		$opt = { pre => \&pad_msgs };
 		test_replace(2, $l, $opt);
-		test_replace(2, $l, $opt = { %$opt, post => *pad_msgs });
+		test_replace(2, $l, $opt = { %$opt, post => \&pad_msgs });
 		test_replace(2, $l, $opt = { %$opt, rotate_bytes => 1 });
 	}
 };
diff --git a/t/solver_git.t b/t/solver_git.t
index 78cc0edd..c162b605 100644
--- a/t/solver_git.t
+++ b/t/solver_git.t
@@ -41,7 +41,7 @@ $ibx->{-repo_objs} = [ $git ];
 my $res;
 my $solver = PublicInbox::SolverGit->new($ibx, sub { $res = $_[0] });
 open my $log, '+>>', "$inboxdir/solve.log" or die "open: $!";
-my $psgi_env = { 'psgi.errors' => *STDERR, 'psgi.url_scheme' => 'http',
+my $psgi_env = { 'psgi.errors' => \*STDERR, 'psgi.url_scheme' => 'http',
 		'HTTP_HOST' => 'example.com' };
 $solver->solve($psgi_env, $log, '69df7d5', {});
 ok($res, 'solved a blob!');
diff --git a/xt/msgtime_cmp.t b/xt/msgtime_cmp.t
index 0ce3c042..aa96be4d 100644
--- a/xt/msgtime_cmp.t
+++ b/xt/msgtime_cmp.t
@@ -62,7 +62,7 @@ my $fh = $git->popen(@cat);
 while (<$fh>) {
 	my ($oid, $type) = split / /;
 	next if $type ne 'blob';
-	$git->cat_async($oid, *compare);
+	$git->cat_async($oid, \&compare);
 }
 $git->cat_async_wait;
 ok(1);

^ permalink raw reply related	[relevance 10%]

* [PATCH 00/11] httpd: further reduce event loop monopolization
@ 2020-09-09  6:26  6% Eric Wong
  2020-09-09  6:26 10% ` [PATCH 03/11] use "\&" where possible when referring to subroutines Eric Wong
  0 siblings, 1 reply; 14+ results
From: Eric Wong @ 2020-09-09  6:26 UTC (permalink / raw)
  To: meta

A couple more things to mitigate the effects of slow storage
with many inboxes.  Mostly solver-related, and still more to
come...  (Hoping the electrical grid stays up and dust bunny
removal solved overheating problems).

Eric Wong (11):
  xt/solver: test with public-inbox-httpd, too
  solver: drop warnings, modernize use v5.10.1, use SEEK_SET
  use "\&" where possible when referring to subroutines
  www: manifest.js.gz generation no longer hogs event loop
  config: flatten each_inbox and iterate_start args
  config: split out iterator into separate object
  t/cgi.t: show stderr on failures
  extmsg: prevent cross-inbox matches from hogging event loop
  wwwlisting: avoid hogging event loop
  solver: check one git coderepo and inbox at a time
  solver: break apart inbox blob retrieval

 MANIFEST                        |   2 +
 lib/PublicInbox/Cgit.pm         |   5 +-
 lib/PublicInbox/Config.pm       |  22 +--
 lib/PublicInbox/ConfigIter.pm   |  40 +++++
 lib/PublicInbox/ExtMsg.pm       | 102 ++++++++----
 lib/PublicInbox/IMAPD.pm        |   6 +-
 lib/PublicInbox/Inbox.pm        |   2 +-
 lib/PublicInbox/ManifestJsGz.pm | 135 ++++++++++++++++
 lib/PublicInbox/SolverGit.pm    | 190 +++++++++++++---------
 lib/PublicInbox/TestCommon.pm   |   4 +-
 lib/PublicInbox/WWW.pm          |  21 +--
 lib/PublicInbox/Watch.pm        |  13 +-
 lib/PublicInbox/WwwListing.pm   | 279 ++++++++------------------------
 t/cgi.t                         |   2 +-
 t/replace.t                     |   8 +-
 t/solver_git.t                  |   7 +-
 t/www_listing.t                 |   7 +-
 xt/msgtime_cmp.t                |   2 +-
 xt/solver.t                     |  31 +++-
 19 files changed, 499 insertions(+), 379 deletions(-)
 create mode 100644 lib/PublicInbox/ConfigIter.pm
 create mode 100644 lib/PublicInbox/ManifestJsGz.pm

^ permalink raw reply	[relevance 6%]

* [PATCH 2/3] use "\&" where possible when referring to subroutines
  @ 2020-09-01 20:36 10% ` Eric Wong
  0 siblings, 0 replies; 14+ results
From: Eric Wong @ 2020-09-01 20:36 UTC (permalink / raw)
  To: meta

"*foo" is ambiguous in that it may refer to a bareword file handle;
so we'll use it where we can without triggering warnings.

PublicInbox::TestCommon::run_script_exit required dropping the
prototype, however.  We'll also future-proof by dropping "use
warnings" in Cgit.pm and use the less-ambiguous "//=" in Inbox.pm
while we're in the area.
---
 lib/PublicInbox/Cgit.pm       | 5 ++---
 lib/PublicInbox/Inbox.pm      | 2 +-
 lib/PublicInbox/TestCommon.pm | 4 ++--
 lib/PublicInbox/WwwListing.pm | 6 +++---
 t/replace.t                   | 8 ++++----
 t/solver_git.t                | 2 +-
 xt/msgtime_cmp.t              | 2 +-
 7 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/Cgit.pm b/lib/PublicInbox/Cgit.pm
index 9a51b451..fb0d0e60 100644
--- a/lib/PublicInbox/Cgit.pm
+++ b/lib/PublicInbox/Cgit.pm
@@ -10,9 +10,8 @@ use strict;
 use PublicInbox::GitHTTPBackend;
 use PublicInbox::Git;
 # not bothering with Exporter for a one-off
-*input_prepare = *PublicInbox::GitHTTPBackend::input_prepare;
-*serve = *PublicInbox::GitHTTPBackend::serve;
-use warnings;
+*input_prepare = \&PublicInbox::GitHTTPBackend::input_prepare;
+*serve = \&PublicInbox::GitHTTPBackend::serve;
 use PublicInbox::Qspawn;
 use PublicInbox::WwwStatic qw(r);
 
diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 241001d3..4005954e 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -70,7 +70,7 @@ sub _cleanup_later ($) {
 	my ($self) = @_;
 	$cleanup_avail = cleanup_possible() if $cleanup_avail < 0;
 	return if $cleanup_avail != 1;
-	$cleanup_timer ||= PublicInbox::DS::later(*cleanup_task);
+	$cleanup_timer //= PublicInbox::DS::later(\&cleanup_task);
 	$CLEANUP->{"$self"} = $self;
 }
 
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index b03e93e0..42819179 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -158,7 +158,7 @@ sub _undo_redirects ($) {
 # The default is 2.
 our $run_script_exit_code;
 sub RUN_SCRIPT_EXIT () { "RUN_SCRIPT_EXIT\n" };
-sub run_script_exit (;$) {
+sub run_script_exit {
 	$run_script_exit_code = $_[0] // 0;
 	die RUN_SCRIPT_EXIT;
 }
@@ -180,7 +180,7 @@ package $pkg;
 use strict;
 use subs qw(exit);
 
-*exit = *PublicInbox::TestCommon::run_script_exit;
+*exit = \\&PublicInbox::TestCommon::run_script_exit;
 sub main {
 # the below "line" directive is a magic comment, see perlsyn(1) manpage
 # line 1 "$f"
diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index 365743cf..0be3764c 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -60,9 +60,9 @@ sub list_404 ($$) { [] }
 
 # TODO: +cgit
 my %VALID = (
-	all => *list_all,
-	'match=domain' => *list_match_domain,
-	404 => *list_404,
+	all => \&list_all,
+	'match=domain' => \&list_match_domain,
+	404 => \&list_404,
 );
 
 sub set_cb ($$$) {
diff --git a/t/replace.t b/t/replace.t
index c4dcb89d..490e3b7b 100644
--- a/t/replace.t
+++ b/t/replace.t
@@ -179,10 +179,10 @@ EOF
 	}
 }
 
-my $opt = { pre => *pad_msgs };
+my $opt = { pre => \&pad_msgs };
 test_replace(2, 'basic', {});
 test_replace(2, 'basic', $opt);
-test_replace(2, 'basic', $opt = { %$opt, post => *pad_msgs });
+test_replace(2, 'basic', $opt = { %$opt, post => \&pad_msgs });
 test_replace(2, 'basic', $opt = { %$opt, rotate_bytes => 1 });
 
 SKIP: {
@@ -190,9 +190,9 @@ SKIP: {
 	PublicInbox::Search::load_xapian() or skip 'Search::Xapian missing', 8;
 	for my $l (qw(medium)) {
 		test_replace(2, $l, {});
-		$opt = { pre => *pad_msgs };
+		$opt = { pre => \&pad_msgs };
 		test_replace(2, $l, $opt);
-		test_replace(2, $l, $opt = { %$opt, post => *pad_msgs });
+		test_replace(2, $l, $opt = { %$opt, post => \&pad_msgs });
 		test_replace(2, $l, $opt = { %$opt, rotate_bytes => 1 });
 	}
 };
diff --git a/t/solver_git.t b/t/solver_git.t
index 78cc0edd..c162b605 100644
--- a/t/solver_git.t
+++ b/t/solver_git.t
@@ -41,7 +41,7 @@ $ibx->{-repo_objs} = [ $git ];
 my $res;
 my $solver = PublicInbox::SolverGit->new($ibx, sub { $res = $_[0] });
 open my $log, '+>>', "$inboxdir/solve.log" or die "open: $!";
-my $psgi_env = { 'psgi.errors' => *STDERR, 'psgi.url_scheme' => 'http',
+my $psgi_env = { 'psgi.errors' => \*STDERR, 'psgi.url_scheme' => 'http',
 		'HTTP_HOST' => 'example.com' };
 $solver->solve($psgi_env, $log, '69df7d5', {});
 ok($res, 'solved a blob!');
diff --git a/xt/msgtime_cmp.t b/xt/msgtime_cmp.t
index 0ce3c042..aa96be4d 100644
--- a/xt/msgtime_cmp.t
+++ b/xt/msgtime_cmp.t
@@ -62,7 +62,7 @@ my $fh = $git->popen(@cat);
 while (<$fh>) {
 	my ($oid, $type) = split / /;
 	next if $type ne 'blob';
-	$git->cat_async($oid, *compare);
+	$git->cat_async($oid, \&compare);
 }
 $git->cat_async_wait;
 ok(1);

^ permalink raw reply related	[relevance 10%]

* [PATCH 5/7] git: remove src_blob_url
  @ 2019-10-21 11:22  7% ` Eric Wong
  0 siblings, 0 replies; 14+ results
From: Eric Wong @ 2019-10-21 11:22 UTC (permalink / raw)
  To: meta

This was intended for solver, but it's unused since
commit 915cd090798069a4
("solver: switch patch application to use a callback")
---
 lib/PublicInbox/Git.pm | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index ff3838b3..218846f3 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -261,17 +261,6 @@ sub local_nick ($) {
 	wantarray ? ($ret) : $ret;
 }
 
-# show the blob URL for cgit/gitweb/whatever
-sub src_blob_url {
-	my ($self, $oid) = @_;
-	# blob_url_format = "https://example.com/foo.git/blob/%s"
-	if (my $bfu = $self->{blob_url_format}) {
-		return map { sprintf($_, $oid) } @$bfu if wantarray;
-		return sprintf($bfu->[0], $oid);
-	}
-	local_nick($self);
-}
-
 sub host_prefix_url ($$) {
 	my ($env, $url) = @_;
 	return $url if index($url, '//') >= 0;

^ permalink raw reply related	[relevance 7%]

* repobrowse history and notes
@ 2019-04-04  9:55  6% Eric Wong
  0 siblings, 0 replies; 14+ results
From: Eric Wong @ 2019-04-04  9:55 UTC (permalink / raw)
  To: meta

I've always intended public-inbox for software development.
It's useful for other types of communication, too, but
software development was always the driving force.

A web-based git viewing counterpart, known as "repobrowse" was
envisioned and started several times over the years in different
languages; one of which is in the historical "repobrowse" branch
in public-inbox.git.

I finally realized a few months ago that the world doesn't need
another web-based git repository viewer; at least not in the
traditional sense...

Currently, I believe there are two types of repository viewers:

1. standalone repository viewers with no messaging:

    gitweb, cgit, etc...

2. repository viewers with centralized messaging:

    gogs/gitea, gitlab, and proprietary stuff

public-inbox is a bit different:

  A centralization-resistant messaging system with git-awareness


What public-inbox can do with code repositories, today:

* reconstruct unmerged blobs using patches (SolverGit.pm)

* show SolverGit-reconstructed blobs with syntax highlighting

* diff-highlighting emails; hunk headers link to SolverGit endpoints

* spawn/wrap cgit(1), including parsing the config

* handle smart HTTP requests of coderepos with our git-http-backend(1)
  wrapper (since cgit doesn't handle smart clone/fetch)

* search for blob object_ids (done YEARS ago :)
  SolverGit would not have been possible without this.


Works-in-progress:

* show diffs with different options (contexts, algorithms),
  most importantly to be able to diff against SolverGit-reconstructed
  blobs which aren't merged into a permanent+public code repo, yet.

TODO:

* show/link commits (like git-show(1), and link to
  emails/threads discussing such commits)

* built-in/configurable search queries for common patterns
  git-request-pull(1) templates, patches for certain paths.

  Perhaps this could support generic reporting for building
  tables off CI emails, even.

* show trees...

* maybe: blame/annotate Solver-reconstructed blobs

* spawn/wrap gitweb similar to what was done for cgit

* display config information for mirroring/reproducing
  the site.

* more tests

One more general thing to keep in mind:

public-inbox tries to be educational in its fight against
centralization.  For example, it includes instructions/examples
for using git-send-email(1) to nudge users towards proper
threading and reply-to-all behavior.

Continuing that tradition, the git repository viewer section
should try to be educational for users unfamiliar with git,
patches, or emails in order to prepare users for offline use.

Thus having links to git manpages, examples, etc could all be
useful in the right places.

^ permalink raw reply	[relevance 6%]

* [PATCH 12/37] view: enable naming hints for raw blob downloads
    2019-01-21 20:52  7% ` [PATCH 04/37] solver: initial Perl implementation Eric Wong
  2019-01-21 20:52  7% ` [PATCH 09/37] view: wire up diff and vcs viewers with solver Eric Wong
@ 2019-01-21 20:52  9% ` Eric Wong
  2 siblings, 0 replies; 14+ results
From: Eric Wong @ 2019-01-21 20:52 UTC (permalink / raw)
  To: meta

Meaningful names in URLs are nice, and it can make
life easier for supporting syntax-highlighting
---
 lib/PublicInbox/ViewDiff.pm | 27 +++++++++++++++++++--------
 lib/PublicInbox/ViewVCS.pm  | 32 +++++++++++++++++++++++++++-----
 2 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index ee450fa..94f015f 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -2,12 +2,16 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 #
 # used by PublicInbox::View
+# This adds CSS spans for diff highlighting.
+# It also generates links for ViewVCS + SolverGit to show
+# (or reconstruct) blobs.
+
 package PublicInbox::ViewDiff;
 use strict;
 use warnings;
 use base qw(Exporter);
 our @EXPORT_OK = qw(flush_diff);
-
+use URI::Escape qw(uri_escape_utf8);
 use PublicInbox::Hval qw(ascii_html);
 use PublicInbox::Git qw(git_unquote);
 
@@ -18,6 +22,7 @@ sub DSTATE_HUNK () { 3 } # /^@@ /
 sub DSTATE_CTX () { 4 } # /^ /
 sub DSTATE_ADD () { 5 } # /^\+/
 sub DSTATE_DEL () { 6 } # /^\-/
+sub UNSAFE () { "^A-Za-z0-9\-\._~/" }
 
 my $OID_NULL = '0{7,40}';
 my $OID_BLOB = '[a-f0-9]{7,40}';
@@ -40,18 +45,18 @@ sub diff_hunk ($$$$) {
 	my ($n) = ($ca =~ /^-(\d+)/);
 	$n = defined($n) ? do { ++$n; "#n$n" } : '';
 
-	my $rv = qq(@@ <a\nhref=$spfx$oid_a/s$n>$ca</a>);
+	my $rv = qq(@@ <a\nhref=$spfx$oid_a/s$dctx->{Q}$n>$ca</a>);
 
 	($n) = ($cb =~ /^\+(\d+)/);
 	$n = defined($n) ? do { ++$n; "#n$n" } : '';
 
-	$rv .= qq( <a\nhref=$spfx$oid_b/s$n>$cb</a> @@);
+	$rv .= qq( <a\nhref=$spfx$oid_b/s$dctx->{Q}$n>$cb</a> @@);
 }
 
 sub flush_diff ($$$$) {
 	my ($dst, $spfx, $linkify, $diff) = @_;
 	my $state = DSTATE_INIT;
-	my $dctx; # {}, keys: oid_a, oid_b, path_a, path_b
+	my $dctx = { Q => '' }; # {}, keys: oid_a, oid_b, path_a, path_b
 
 	foreach my $s (@$diff) {
 		if ($s =~ /^ /) {
@@ -67,7 +72,7 @@ sub flush_diff ($$$$) {
 				$$dst .= '</span>';
 			}
 			$$dst .= $s;
-		} elsif ($s =~ m!^diff --git ($PATH_A) ($PATH_B)$!x) {
+		} elsif ($s =~ m!^diff --git ($PATH_A) ($PATH_B)$!) {
 			if ($state != DSTATE_HEAD) {
 				my ($pa, $pb) = ($1, $2);
 				$$dst .= '</span>' if $state != DSTATE_INIT;
@@ -75,15 +80,21 @@ sub flush_diff ($$$$) {
 				$state = DSTATE_HEAD;
 				$pa = (split('/', git_unquote($pa), 2))[1];
 				$pb = (split('/', git_unquote($pb), 2))[1];
-				$dctx = { path_a => $pa, path_b => $pb };
+				$dctx = {
+					Q => "?b=".uri_escape_utf8($pb, UNSAFE),
+				};
+				if ($pa ne $pb) {
+					$dctx->{Q} .=
+					     "&a=".uri_escape_utf8($pa, UNSAFE);
+				}
 			}
 			$$dst .= to_html($linkify, $s);
 		} elsif ($s =~ s/^(index $OID_NULL\.\.)($OID_BLOB)\b//o) {
-			$$dst .= qq($1<a\nhref=$spfx$2/s>$2</a>);
+			$$dst .= qq($1<a\nhref=$spfx$2/s$dctx->{Q}>$2</a>);
 			$$dst .= to_html($linkify, $s) ;
 		} elsif ($s =~ s/^index ($OID_NULL)(\.\.$OID_BLOB)\b//o) {
 			$$dst .= 'index ';
-			$$dst .= qq(<a\nhref=$spfx$1/s>$1</a>$2);
+			$$dst .= qq(<a\nhref=$spfx$1/s$dctx->{Q}>$1</a>$2);
 			$$dst .= to_html($linkify, $s);
 		} elsif ($s =~ /^index ($OID_BLOB)\.\.($OID_BLOB)/o) {
 			$dctx->{oid_a} = $1;
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index 49fb1c5..90c0907 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -2,6 +2,17 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # show any VCS object, similar to "git show"
+# FIXME: we only show blobs for now
+#
+# This can use a "solver" to reconstruct blobs based on git
+# patches (with abbreviated OIDs in the header).  However, the
+# abbreviated OIDs must match exactly what's in the original
+# email (unless a normal code repo already has the blob).
+#
+# In other words, we can only reliably reconstruct blobs based
+# on links generated by ViewDiff (and only if the emailed
+# patches apply 100% cleanly to published blobs).
+
 package PublicInbox::ViewVCS;
 use strict;
 use warnings;
@@ -9,7 +20,7 @@ use Encode qw(find_encoding);
 use PublicInbox::SolverGit;
 use PublicInbox::WwwStream;
 use PublicInbox::Linkify;
-use PublicInbox::Hval qw(ascii_html);
+use PublicInbox::Hval qw(ascii_html to_filename);
 my %QP_MAP = ( A => 'oid_a', B => 'oid_b', a => 'path_a', b => 'path_b' );
 my $max_size = 1024 * 1024; # TODO: configurable
 my $enc_utf8 = find_encoding('UTF-8');
@@ -63,8 +74,18 @@ sub show ($$;$) {
 		return html_page($ctx, 500, \$log);
 	}
 
-	if (index($$blob, "\0") >= 0) {
-		$log = "<pre>$oid $type $size bytes (binary)</pre>" . $log;
+	my $binary = index($$blob, "\0") >= 0;
+	if ($fn) {
+		my $h = [ 'Content-Length', $size, 'Content-Type' ];
+		push(@$h, ($binary ? 'application/octet-stream' : 'text/plain'));
+		return [ 200, $h, [ $$blob ]];
+	}
+
+	my $path = to_filename($di->{path_b} || $hints->{path_b} || 'blob');
+	my $raw_link = "(<a\nhref=_$path>raw</a>)";
+	if ($binary) {
+		$log = "<pre>$oid $type $size bytes (binary)" .
+			" $raw_link</pre>" . $log;
 		return html_page($ctx, 200, \$log);
 	}
 
@@ -73,13 +94,14 @@ sub show ($$;$) {
 	my $pad = length($nl);
 
 	# using some of the same CSS class names and ids as cgit
-	$log = "<pre>$oid $type $size bytes</pre><hr /><table\nclass=blob>".
+	$log = "<pre>$oid $type $size bytes $raw_link</pre>" .
+		"<hr /><table\nclass=blob>".
 		"<tr><td\nclass=linenumbers><pre>" . join('', map {
 			sprintf("<a id=n$_ href=#n$_>% ${pad}u</a>\n", $_)
 		} (1..$nl)) . '</pre></td>' .
 		'<td><pre> </pre></td>'. # pad for non-CSS users
 		"<td\nclass=lines><pre><code>" .  ascii_html($$blob) .
-		'</pre></td></tr></table>' . $log;
+		'</code></pre></td></tr></table>' . $log;
 
 	html_page($ctx, 200, \$log);
 }
-- 
EW


^ permalink raw reply related	[relevance 9%]

* [PATCH 09/37] view: wire up diff and vcs viewers with solver
    2019-01-21 20:52  7% ` [PATCH 04/37] solver: initial Perl implementation Eric Wong
@ 2019-01-21 20:52  7% ` Eric Wong
  2019-01-21 20:52  9% ` [PATCH 12/37] view: enable naming hints for raw blob downloads Eric Wong
  2 siblings, 0 replies; 14+ results
From: Eric Wong @ 2019-01-21 20:52 UTC (permalink / raw)
  To: meta

---
 MANIFEST                    |   2 +
 lib/PublicInbox/Config.pm   |  59 ++++++++++++++-
 lib/PublicInbox/View.pm     |  47 +++++++++---
 lib/PublicInbox/ViewDiff.pm | 147 ++++++++++++++++++++++++++++++++++++
 lib/PublicInbox/ViewVCS.pm  |  87 +++++++++++++++++++++
 lib/PublicInbox/WWW.pm      |  18 ++++-
 6 files changed, 345 insertions(+), 15 deletions(-)
 create mode 100644 lib/PublicInbox/ViewDiff.pm
 create mode 100644 lib/PublicInbox/ViewVCS.pm

diff --git a/MANIFEST b/MANIFEST
index 95ad0c6..5e980fe 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -109,6 +109,8 @@ lib/PublicInbox/SpawnPP.pm
 lib/PublicInbox/Unsubscribe.pm
 lib/PublicInbox/V2Writable.pm
 lib/PublicInbox/View.pm
+lib/PublicInbox/ViewDiff.pm
+lib/PublicInbox/ViewVCS.pm
 lib/PublicInbox/WWW.pm
 lib/PublicInbox/WWW.pod
 lib/PublicInbox/WatchMaildir.pm
diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index bea2617..355e64b 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -2,12 +2,19 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 #
 # Used throughout the project for reading configuration
+#
+# Note: I hate camelCase; but git-config(1) uses it, but it's better
+# than alllowercasewithoutunderscores, so use lc('configKey') where
+# applicable for readability
+
 package PublicInbox::Config;
 use strict;
 use warnings;
 require PublicInbox::Inbox;
 use PublicInbox::Spawn qw(popen_rd);
 
+sub _array ($) { ref($_[0]) eq 'ARRAY' ? $_[0] : [ $_[0] ] }
+
 # returns key-value pairs of config directives in a hash
 # if keys may be multi-value, the value is an array ref containing all values
 sub new {
@@ -22,6 +29,7 @@ sub new {
 	$self->{-by_newsgroup} ||= {};
 	$self->{-no_obfuscate} ||= {};
 	$self->{-limiters} ||= {};
+	$self->{-code_repos} ||= {}; # nick => PublicInbox::Git object
 
 	if (my $no = delete $self->{'publicinbox.noobfuscate'}) {
 		$no = [ $no ] if ref($no) ne 'ARRAY';
@@ -169,6 +177,41 @@ sub valid_inbox_name ($) {
 	1;
 }
 
+# parse a code repo
+# Only git is supported at the moment, but SVN and Hg are possibilities
+sub _fill_code_repo {
+	my ($self, $nick) = @_;
+	my $pfx = "coderepo.$nick";
+
+	my $dir = $self->{"$pfx.dir"}; # aka "GIT_DIR"
+	unless (defined $dir) {
+		warn "$pfx.repodir unset";
+		return;
+	}
+
+	my $git = PublicInbox::Git->new($dir);
+	foreach my $t (qw(blob commit tree tag)) {
+		$git->{$t.'_url_format'} =
+				_array($self->{lc("$pfx.${t}UrlFormat")});
+	}
+
+	if (my $cgits = $self->{lc("$pfx.cgitUrl")}) {
+		$git->{cgit_url} = $cgits = _array($cgits);
+
+		# cgit supports "/blob/?id=%s", but it's only a plain-text
+		# display and requires an unabbreviated id=
+		foreach my $t (qw(blob commit tag)) {
+			$git->{$t.'_url_format'} ||= map {
+				"$_/$t/?id=%s"
+			} @$cgits;
+		}
+	}
+	# TODO: support gitweb and other repository viewers?
+	# TODO: parse cgitrc
+
+	$git;
+}
+
 sub _fill {
 	my ($self, $pfx) = @_;
 	my $rv = {};
@@ -192,9 +235,9 @@ sub _fill {
 	}
 	# TODO: more arrays, we should support multi-value for
 	# more things to encourage decentralization
-	foreach my $k (qw(address altid nntpmirror)) {
+	foreach my $k (qw(address altid nntpmirror coderepo)) {
 		if (defined(my $v = $self->{"$pfx.$k"})) {
-			$rv->{$k} = ref($v) eq 'ARRAY' ? $v : [ $v ];
+			$rv->{$k} = _array($v);
 		}
 	}
 
@@ -224,6 +267,18 @@ sub _fill {
 		$rv->{-no_obfuscate_re} = $self->{-no_obfuscate_re};
 		each_inbox($self, sub {}); # noop to populate -no_obfuscate
 	}
+
+	if (my $ibx_code_repos = $rv->{coderepo}) {
+		my $code_repos = $self->{-code_repos};
+		my $repo_objs = $rv->{-repo_objs} = [];
+		foreach my $nick (@$ibx_code_repos) {
+			valid_inbox_name($nick) or next;
+			my $repo = $code_repos->{$nick} ||=
+						_fill_code_repo($self, $nick);
+			push @$repo_objs, $repo if $repo;
+		}
+	}
+
 	$rv
 }
 
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 470e3ab..0187ec3 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -14,6 +14,7 @@ use PublicInbox::MsgIter;
 use PublicInbox::Address;
 use PublicInbox::WwwStream;
 use PublicInbox::Reply;
+use PublicInbox::ViewDiff qw(flush_diff);
 require POSIX;
 use Time::Local qw(timegm);
 
@@ -28,7 +29,7 @@ sub msg_html {
 	my ($ctx, $mime, $more, $smsg) = @_;
 	my $hdr = $mime->header_obj;
 	my $ibx = $ctx->{-inbox};
-	my $obfs_ibx = $ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
+	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
 	my $tip = _msg_html_prepare($hdr, $ctx, $more, 0);
 	my $end = 2;
 	PublicInbox::WwwStream->response($ctx, 200, sub {
@@ -36,7 +37,7 @@ sub msg_html {
 		if ($nr == 1) {
 			# $more cannot be true w/o $smsg being defined:
 			my $upfx = $more ? '../'.mid_escape($smsg->mid).'/' : '';
-			$tip . multipart_text_as_html($mime, $upfx, $obfs_ibx) .
+			$tip . multipart_text_as_html($mime, $upfx, $ibx) .
 				'</pre><hr>'
 		} elsif ($more && @$more) {
 			++$end;
@@ -81,15 +82,15 @@ sub msg_html_more {
 	my $str = eval {
 		my ($id, $prev, $smsg) = @$more;
 		my $mid = $ctx->{mid};
-		$smsg = $ctx->{-inbox}->smsg_mime($smsg);
+		my $ibx = $ctx->{-inbox};
+		$smsg = $ibx->smsg_mime($smsg);
 		my $next = $ctx->{srch}->next_by_mid($mid, \$id, \$prev);
 		@$more = $next ? ($id, $prev, $next) : ();
 		if ($smsg) {
 			my $mime = $smsg->{mime};
 			my $upfx = '../' . mid_escape($smsg->mid) . '/';
 			_msg_html_prepare($mime->header_obj, $ctx, $more, $nr) .
-				multipart_text_as_html($mime, $upfx,
-							$ctx->{-obfs_ibx}) .
+				multipart_text_as_html($mime, $upfx, $ibx) .
 				'</pre><hr>'
 		} else {
 			'';
@@ -260,7 +261,8 @@ sub index_entry {
 	$rv .= "\n";
 
 	# scan through all parts, looking for displayable text
-	msg_iter($mime, sub { $rv .= add_text_body($mhref, $obfs_ibx, $_[0]) });
+	my $ibx = $ctx->{-inbox};
+	msg_iter($mime, sub { $rv .= add_text_body($mhref, $ibx, $_[0]) });
 
 	# add the footer
 	$rv .= "\n<a\nhref=#$id_m\nid=e$id>^</a> ".
@@ -488,11 +490,11 @@ sub thread_html {
 }
 
 sub multipart_text_as_html {
-	my ($mime, $upfx, $obfs_ibx) = @_;
+	my ($mime, $upfx, $ibx) = @_;
 	my $rv = "";
 
 	# scan through all parts, looking for displayable text
-	msg_iter($mime, sub { $rv .= add_text_body($upfx, $obfs_ibx, $_[0]) });
+	msg_iter($mime, sub { $rv .= add_text_body($upfx, $ibx, $_[0]) });
 	$rv;
 }
 
@@ -545,7 +547,8 @@ sub attach_link ($$$$;$) {
 }
 
 sub add_text_body {
-	my ($upfx, $obfs_ibx, $p) = @_;
+	my ($upfx, $ibx, $p) = @_;
+	my $obfs_ibx = $ibx->{obfuscate} ? $ibx : undef;
 	# $p - from msg_iter: [ Email::MIME, depth, @idx ]
 	my ($part, $depth) = @$p; # attachment @idx is unused
 	my $ct = $part->content_type || 'text/plain';
@@ -554,6 +557,19 @@ sub add_text_body {
 
 	return attach_link($upfx, $ct, $p, $fn) unless defined $s;
 
+	my ($diff, $spfx);
+	if ($ibx->{-repo_objs} && $s =~ /^(?:diff|---|\+{3}) /ms) {
+		$diff = [];
+		my $n_slash = $upfx =~ tr!/!/!;
+		if ($n_slash == 0) {
+			$spfx = '../';
+		} elsif ($n_slash == 1) {
+			$spfx = '';
+		} else { # nslash == 2
+			$spfx = '../../';
+		}
+	};
+
 	my @lines = split(/^/m, $s);
 	$s = '';
 	if (defined($fn) || $depth > 0 || $err) {
@@ -568,19 +584,26 @@ sub add_text_body {
 			# show the previously buffered quote inline
 			flush_quote(\$s, $l, \@quot) if @quot;
 
-			# regular line, OK
-			$l->linkify_1($cur);
-			$s .= $l->linkify_2(ascii_html($cur));
+			if ($diff) {
+				push @$diff, $cur;
+			} else {
+				# regular line, OK
+				$l->linkify_1($cur);
+				$s .= $l->linkify_2(ascii_html($cur));
+			}
 		} else {
+			flush_diff(\$s, $spfx, $l, $diff) if $diff && @$diff;
 			push @quot, $cur;
 		}
 	}
 
 	if (@quot) { # ugh, top posted
 		flush_quote(\$s, $l, \@quot);
+		flush_diff(\$s, $spfx, $l, $diff) if $diff && @$diff;
 		obfuscate_addrs($obfs_ibx, $s) if $obfs_ibx;
 		$s;
 	} else {
+		flush_diff(\$s, $spfx, $l, $diff) if $diff && @$diff;
 		obfuscate_addrs($obfs_ibx, $s) if $obfs_ibx;
 		if ($s =~ /\n\z/s) { # common, last line ends with a newline
 			$s;
diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
new file mode 100644
index 0000000..ee450fa
--- /dev/null
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -0,0 +1,147 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# used by PublicInbox::View
+package PublicInbox::ViewDiff;
+use strict;
+use warnings;
+use base qw(Exporter);
+our @EXPORT_OK = qw(flush_diff);
+
+use PublicInbox::Hval qw(ascii_html);
+use PublicInbox::Git qw(git_unquote);
+
+sub DSTATE_INIT () { 0 }
+sub DSTATE_STAT () { 1 } # TODO
+sub DSTATE_HEAD () { 2 } # /^diff --git /, /^index /, /^--- /, /^\+\+\+ /
+sub DSTATE_HUNK () { 3 } # /^@@ /
+sub DSTATE_CTX () { 4 } # /^ /
+sub DSTATE_ADD () { 5 } # /^\+/
+sub DSTATE_DEL () { 6 } # /^\-/
+
+my $OID_NULL = '0{7,40}';
+my $OID_BLOB = '[a-f0-9]{7,40}';
+my $PATH_A = '"?a/.+|/dev/null';
+my $PATH_B = '"?b/.+|/dev/null';
+
+sub to_html ($$) {
+	$_[0]->linkify_1($_[1]);
+	$_[0]->linkify_2(ascii_html($_[1]));
+}
+
+# link to line numbers in blobs
+sub diff_hunk ($$$$) {
+	my ($dctx, $spfx, $ca, $cb) = @_;
+	my $oid_a = $dctx->{oid_a};
+	my $oid_b = $dctx->{oid_b};
+
+	(defined($oid_a) && defined($oid_b)) or return "@@ $ca $cb @@";
+
+	my ($n) = ($ca =~ /^-(\d+)/);
+	$n = defined($n) ? do { ++$n; "#n$n" } : '';
+
+	my $rv = qq(@@ <a\nhref=$spfx$oid_a/s$n>$ca</a>);
+
+	($n) = ($cb =~ /^\+(\d+)/);
+	$n = defined($n) ? do { ++$n; "#n$n" } : '';
+
+	$rv .= qq( <a\nhref=$spfx$oid_b/s$n>$cb</a> @@);
+}
+
+sub flush_diff ($$$$) {
+	my ($dst, $spfx, $linkify, $diff) = @_;
+	my $state = DSTATE_INIT;
+	my $dctx; # {}, keys: oid_a, oid_b, path_a, path_b
+
+	foreach my $s (@$diff) {
+		if ($s =~ /^ /) {
+			if ($state == DSTATE_HUNK || $state == DSTATE_ADD ||
+			    $state == DSTATE_DEL || $state == DSTATE_HEAD) {
+				$$dst .= "</span><span\nclass=ctx>";
+				$state = DSTATE_CTX;
+			}
+			$$dst .= to_html($linkify, $s);
+		} elsif ($s =~ /^-- $/) { # email signature begins
+			if ($state != DSTATE_INIT) {
+				$state = DSTATE_INIT;
+				$$dst .= '</span>';
+			}
+			$$dst .= $s;
+		} elsif ($s =~ m!^diff --git ($PATH_A) ($PATH_B)$!x) {
+			if ($state != DSTATE_HEAD) {
+				my ($pa, $pb) = ($1, $2);
+				$$dst .= '</span>' if $state != DSTATE_INIT;
+				$$dst .= "<span\nclass=head>";
+				$state = DSTATE_HEAD;
+				$pa = (split('/', git_unquote($pa), 2))[1];
+				$pb = (split('/', git_unquote($pb), 2))[1];
+				$dctx = { path_a => $pa, path_b => $pb };
+			}
+			$$dst .= to_html($linkify, $s);
+		} elsif ($s =~ s/^(index $OID_NULL\.\.)($OID_BLOB)\b//o) {
+			$$dst .= qq($1<a\nhref=$spfx$2/s>$2</a>);
+			$$dst .= to_html($linkify, $s) ;
+		} elsif ($s =~ s/^index ($OID_NULL)(\.\.$OID_BLOB)\b//o) {
+			$$dst .= 'index ';
+			$$dst .= qq(<a\nhref=$spfx$1/s>$1</a>$2);
+			$$dst .= to_html($linkify, $s);
+		} elsif ($s =~ /^index ($OID_BLOB)\.\.($OID_BLOB)/o) {
+			$dctx->{oid_a} = $1;
+			$dctx->{oid_b} = $2;
+			$$dst .= to_html($linkify, $s);
+		} elsif ($s =~ s/^@@ (\S+) (\S+) @@//) {
+			my ($ca, $cb) = ($1, $2);
+			if ($state == DSTATE_HEAD || $state == DSTATE_CTX ||
+			    $state == DSTATE_ADD || $state == DSTATE_DEL) {
+				$$dst .= "</span><span\nclass=hunk>";
+				$state = DSTATE_HUNK;
+				$$dst .= diff_hunk($dctx, $spfx, $ca, $cb);
+			} else {
+				$$dst .= to_html($linkify, "@@ $ca $cb @@");
+			}
+			$$dst .= to_html($linkify, $s);
+		} elsif ($s =~ m!^--- $PATH_A!) {
+			if ($state == DSTATE_INIT) { # color only (no oid link)
+				$state = DSTATE_HEAD;
+				$$dst .= "<span\nclass=head>";
+			}
+			$$dst .= to_html($linkify, $s);
+		} elsif ($s =~ m!^\+{3} $PATH_B!)  {
+			if ($state == DSTATE_INIT) { # color only (no oid link)
+				$state = DSTATE_HEAD;
+				$$dst .= "<span\nclass=head>";
+			}
+			$$dst .= to_html($linkify, $s);
+		} elsif ($s =~ /^\+/) {
+			if ($state != DSTATE_ADD && $state != DSTATE_INIT) {
+				$$dst .= "</span><span\nclass=add>";
+				$state = DSTATE_ADD;
+			}
+			$$dst .= to_html($linkify, $s);
+		} elsif ($s =~ /^-/) {
+			if ($state != DSTATE_DEL && $state != DSTATE_INIT) {
+				$$dst .= "</span><span\nclass=del>";
+				$state = DSTATE_DEL;
+			}
+			$$dst .= to_html($linkify, $s);
+		# ignore the following lines in headers:
+		} elsif ($s =~ /^(?:dis)similarity index/ ||
+			 $s =~ /^(?:old|new) mode/ ||
+			 $s =~ /^(?:deleted|new) file mode/ ||
+			 $s =~ /^(?:copy|rename) (?:from|to) / ||
+			 $s =~ /^(?:dis)?similarity index /) {
+			$$dst .= to_html($linkify, $s);
+		} else {
+			if ($state != DSTATE_INIT) {
+				$$dst .= '</span>';
+				$state = DSTATE_INIT;
+			}
+			$$dst .= to_html($linkify, $s);
+		}
+	}
+	@$diff = ();
+	$$dst .= '</span>' if $state != DSTATE_INIT;
+	undef;
+}
+
+1;
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
new file mode 100644
index 0000000..49fb1c5
--- /dev/null
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -0,0 +1,87 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# show any VCS object, similar to "git show"
+package PublicInbox::ViewVCS;
+use strict;
+use warnings;
+use Encode qw(find_encoding);
+use PublicInbox::SolverGit;
+use PublicInbox::WwwStream;
+use PublicInbox::Linkify;
+use PublicInbox::Hval qw(ascii_html);
+my %QP_MAP = ( A => 'oid_a', B => 'oid_b', a => 'path_a', b => 'path_b' );
+my $max_size = 1024 * 1024; # TODO: configurable
+my $enc_utf8 = find_encoding('UTF-8');
+
+sub html_page ($$$) {
+	my ($ctx, $code, $strref) = @_;
+	$ctx->{-upfx} = '../'; # from "/$INBOX/$OID/s"
+	PublicInbox::WwwStream->response($ctx, $code, sub {
+		my ($nr, undef) =  @_;
+		$nr == 1 ? $$strref : undef;
+	});
+}
+
+sub show ($$;$) {
+	my ($ctx, $oid_b, $fn) = @_;
+	my $ibx = $ctx->{-inbox};
+	my $inboxes = [ $ibx ];
+	my $solver = PublicInbox::SolverGit->new($ibx->{-repo_objs}, $inboxes);
+	my $qp = $ctx->{qp};
+	my $hints = {};
+	while (my ($from, $to) = each %QP_MAP) {
+		defined(my $v = $qp->{$from}) or next;
+		$hints->{$to} = $v;
+	}
+
+	open my $log, '+>', undef or die "open: $!";
+	my $res = $solver->solve($log, $oid_b, $hints);
+
+	seek($log, 0, 0) or die "seek: $!";
+	$log = do { local $/; <$log> };
+
+	my $l = PublicInbox::Linkify->new;
+	$l->linkify_1($log);
+	$log = '<pre>debug log:</pre><hr /><pre>' .
+		$l->linkify_2(ascii_html($log)) . '</pre>';
+
+	$res or return html_page($ctx, 404, \$log);
+
+	my ($git, $oid, $type, $size, $di) = @$res;
+	if ($size > $max_size) {
+		# TODO: stream the raw file if it's gigantic, at least
+		$log = '<pre><b>Too big to show</b></pre>' . $log;
+		return html_page($ctx, 500, \$log);
+	}
+
+	my $blob = $git->cat_file($oid);
+	if (!$blob) { # WTF?
+		my $e = "Failed to retrieve generated blob ($oid)";
+		$ctx->{env}->{'psgi.errors'}->print("$e ($git->{git_dir})\n");
+		$log = "<pre><b>$e</b></pre>" . $log;
+		return html_page($ctx, 500, \$log);
+	}
+
+	if (index($$blob, "\0") >= 0) {
+		$log = "<pre>$oid $type $size bytes (binary)</pre>" . $log;
+		return html_page($ctx, 200, \$log);
+	}
+
+	$$blob = $enc_utf8->decode($$blob);
+	my $nl = ($$blob =~ tr/\n/\n/);
+	my $pad = length($nl);
+
+	# using some of the same CSS class names and ids as cgit
+	$log = "<pre>$oid $type $size bytes</pre><hr /><table\nclass=blob>".
+		"<tr><td\nclass=linenumbers><pre>" . join('', map {
+			sprintf("<a id=n$_ href=#n$_>% ${pad}u</a>\n", $_)
+		} (1..$nl)) . '</pre></td>' .
+		'<td><pre> </pre></td>'. # pad for non-CSS users
+		"<td\nclass=lines><pre><code>" .  ascii_html($$blob) .
+		'</pre></td></tr></table>' . $log;
+
+	html_page($ctx, 200, \$log);
+}
+
+1;
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 3562e46..c73370f 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -25,6 +25,7 @@ our $INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;
 our $MID_RE = qr!([^/]+)!;
 our $END_RE = qr!(T/|t/|t\.mbox(?:\.gz)?|t\.atom|raw|)!;
 our $ATTACH_RE = qr!(\d[\.\d]*)-([[:alnum:]][\w\.-]+[[:alnum:]])!i;
+our $OID_RE = qr![a-f0-9]{7,40}!;
 
 sub new {
 	my ($class, $pi_config) = @_;
@@ -117,7 +118,10 @@ sub call {
 		r301($ctx, $1, $2);
 	} elsif ($path_info =~ m!$INBOX_RE/_/text(?:/(.*))?\z!o) {
 		get_text($ctx, $1, $2);
-
+	} elsif ($path_info =~ m!$INBOX_RE/($OID_RE)/s\z!o) {
+		get_vcs_object($ctx, $1, $2);
+	} elsif ($path_info =~ m!$INBOX_RE/($OID_RE)/_([\w\.\-]+)\z!o) {
+		get_vcs_object($ctx, $1, $2, $3);
 	# convenience redirects order matters
 	} elsif ($path_info =~ m!$INBOX_RE/([^/]{2,})\z!o) {
 		r301($ctx, $1, $2);
@@ -259,6 +263,18 @@ sub get_text {
 	PublicInbox::WwwText::get_text($ctx, $key);
 }
 
+# show git objects (blobs and commits)
+# /$INBOX/_/$OBJECT_ID/show
+# /$INBOX/_/${OBJECT_ID}_${FILENAME}
+# KEY may contain slashes
+sub get_vcs_object ($$$;$) {
+	my ($ctx, $inbox, $oid, $filename) = @_;
+	my $r404 = invalid_inbox($ctx, $inbox);
+	return $r404 if $r404;
+	require PublicInbox::ViewVCS;
+	PublicInbox::ViewVCS::show($ctx, $oid, $filename);
+}
+
 sub ctx_get {
 	my ($ctx, $key) = @_;
 	my $val = $ctx->{$key};
-- 
EW


^ permalink raw reply related	[relevance 7%]

* [PATCH 04/37] solver: initial Perl implementation
  @ 2019-01-21 20:52  7% ` Eric Wong
  2019-01-21 20:52  7% ` [PATCH 09/37] view: wire up diff and vcs viewers with solver Eric Wong
  2019-01-21 20:52  9% ` [PATCH 12/37] view: enable naming hints for raw blob downloads Eric Wong
  2 siblings, 0 replies; 14+ results
From: Eric Wong @ 2019-01-21 20:52 UTC (permalink / raw)
  To: meta

This will lookup git blobs from associated git source code
repositories.  If the blobs can't be found, an attempt to
"solve" them via patch application will be performed.

Eventually, this may become the basis of a type-agnostic
frontend similar to "git show"
---
 MANIFEST                                     |   4 +
 lib/PublicInbox/Git.pm                       |  16 +
 lib/PublicInbox/SolverGit.pm                 | 400 +++++++++++++++++++
 t/solve/0001-simple-mod.patch                |  20 +
 t/solve/0002-rename-with-modifications.patch |  37 ++
 t/solver_git.t                               |  91 +++++
 6 files changed, 568 insertions(+)
 create mode 100644 lib/PublicInbox/SolverGit.pm
 create mode 100644 t/solve/0001-simple-mod.patch
 create mode 100644 t/solve/0002-rename-with-modifications.patch
 create mode 100644 t/solver_git.t

diff --git a/MANIFEST b/MANIFEST
index dfd9e27..95ad0c6 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -101,6 +101,7 @@ lib/PublicInbox/SearchIdxPart.pm
 lib/PublicInbox/SearchMsg.pm
 lib/PublicInbox/SearchThread.pm
 lib/PublicInbox/SearchView.pm
+lib/PublicInbox/SolverGit.pm
 lib/PublicInbox/Spamcheck.pm
 lib/PublicInbox/Spamcheck/Spamc.pm
 lib/PublicInbox/Spawn.pm
@@ -201,6 +202,9 @@ t/qspawn.t
 t/reply.t
 t/search-thr-index.t
 t/search.t
+t/solve/0001-simple-mod.patch
+t/solve/0002-rename-with-modifications.patch
+t/solver_git.t
 t/spamcheck_spamc.t
 t/spawn.t
 t/thread-cycle.t
diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 90b9214..9676086 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -40,6 +40,7 @@ sub new {
 	my ($class, $git_dir) = @_;
 	my @st;
 	$st[7] = $st[10] = 0;
+	# may contain {-wt} field (working-tree (File::Temp::Dir))
 	bless { git_dir => $git_dir, st => \@st }, $class
 }
 
@@ -201,6 +202,21 @@ sub packed_bytes {
 
 sub DESTROY { cleanup(@_) }
 
+# show the blob URL for cgit/gitweb/whatever
+sub src_blob_url {
+	my ($self, $oid) = @_;
+	# blob_fmt = "https://example.com/foo.git/blob/%s"
+	if (my $bfu = $self->{blob_fmt_url}) {
+		return sprintf($bfu, $oid);
+	}
+
+	# don't show full FS path, basename should be OK:
+	if ($self->{git_dir} =~ m!/([^/]+)\z!) {
+		return "/path/to/$1";
+	}
+	'???';
+}
+
 1;
 __END__
 =pod
diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
new file mode 100644
index 0000000..f28768a
--- /dev/null
+++ b/lib/PublicInbox/SolverGit.pm
@@ -0,0 +1,400 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# "Solve" blobs which don't exist in git code repositories by
+# searching inboxes for post-image blobs.
+
+# this emits a lot of debugging/tracing information which may be
+# publically viewed over HTTP(S).  Be careful not to expose
+# local filesystem layouts in the process.
+package PublicInbox::SolverGit;
+use strict;
+use warnings;
+use File::Temp qw();
+use Fcntl qw(SEEK_SET);
+use File::Path qw(make_path);
+use PublicInbox::Git qw(git_unquote);
+use PublicInbox::Spawn qw(spawn popen_rd);
+use PublicInbox::MsgIter qw(msg_iter msg_part_text);
+use URI::Escape qw(uri_escape_utf8);
+
+# don't bother if somebody sends us a patch with these path components,
+# it's junk at best, an attack attempt at worse:
+my %bad_component = map { $_ => 1 } ('', '.', '..');
+
+sub new {
+	my ($class, $gits, $inboxes) = @_;
+	bless {
+		gits => $gits,
+		inboxes => $inboxes,
+	}, $class;
+}
+
+# look for existing blobs already in git repos
+sub solve_existing ($$) {
+	my ($self, $want) = @_;
+	foreach my $git (@{$self->{gits}}) {
+		my ($oid_full, $type, $size) = $git->check($want->{oid_b});
+		if (defined($type) && $type eq 'blob') {
+			return [ $git, $oid_full, $type, int($size) ];
+		}
+	}
+	undef;
+}
+
+# returns a hashref with information about a diff:
+# {
+#	oid_a => abbreviated pre-image oid,
+#	oid_b => abbreviated post-image oid,
+#	tmp => anonymous file handle with the diff,
+#	hdr_lines => arrayref of various header lines for mode information
+#	mode_a => original mode of oid_a (string, not integer),
+#	ibx => PublicInbox::Inbox object containing the diff
+#	smsg => PublicInbox::SearchMsg object containing diff
+#	path_a => pre-image path
+#	path_b => post-image path
+# }
+sub extract_diff ($$$$) {
+	my ($p, $re, $ibx, $smsg) = @_;
+	my ($part) = @$p; # ignore $depth and @idx;
+	my $hdr_lines; # diff --git a/... b/...
+	my $tmp;
+	my $ct = $part->content_type || 'text/plain';
+	my ($s, undef) = msg_part_text($part, $ct);
+	defined $s or return;
+	my $di = {};
+	foreach my $l (split(/^/m, $s)) {
+		if ($l =~ /$re/) {
+			$di->{oid_a} = $1;
+			$di->{oid_b} = $2;
+			my $mode_a = $3;
+			if ($mode_a =~ /\A(?:100644|120000|100755)\z/) {
+				$di->{mode_a} = $mode_a;
+			}
+
+			# start writing the diff out to a tempfile
+			open($tmp, '+>', undef) or die "open(tmp): $!";
+			$di->{tmp} = $tmp;
+			$di->{hdr_lines} = $hdr_lines;
+
+			print $tmp @$hdr_lines, $l or die "print(tmp): $!";
+
+			# for debugging/diagnostics:
+			$di->{ibx} = $ibx;
+			$di->{smsg} = $smsg;
+		} elsif ($l =~ m!\Adiff --git ("?a/.+) ("?b/.+)$!) {
+			return $di if $tmp; # got our blob, done!
+
+			my ($path_a, $path_b) = ($1, $2);
+
+			# don't care for leading 'a/' and 'b/'
+			my (undef, @a) = split(m{/}, git_unquote($path_a));
+			my (undef, @b) = split(m{/}, git_unquote($path_b));
+
+			# get rid of path-traversal attempts and junk patches:
+			foreach (@a, @b) {
+				return if $bad_component{$_};
+			}
+
+			$di->{path_a} = join('/', @a);
+			$di->{path_b} = join('/', @b);
+			$hdr_lines = [ $l ];
+		} elsif ($tmp) {
+			print $tmp $l or die "print(tmp): $!";
+		} elsif ($hdr_lines) {
+			push @$hdr_lines, $l;
+		}
+	}
+	$tmp ? $di : undef;
+}
+
+sub path_searchable ($) { defined($_[0]) && $_[0] =~ m!\A[\w/\. \-]+\z! }
+
+sub find_extract_diff ($$$) {
+	my ($self, $ibx, $want) = @_;
+	my $srch = $ibx->search or return;
+
+	my $post = $want->{oid_b} or die 'BUG: no {oid_b}';
+	$post =~ /\A[a-f0-9]+\z/ or die "BUG: oid_b not hex: $post";
+
+	my $q = "dfpost:$post";
+	my $pre = $want->{oid_a};
+	if (defined $pre && $pre =~ /\A[a-f0-9]+\z/) {
+		$q .= " dfpre:$pre";
+	} else {
+		$pre = '[a-f0-9]{7}'; # for $re below
+	}
+
+	my $path_b = $want->{path_b};
+	if (path_searchable($path_b)) {
+		$q .= qq{ dfn:"$path_b"};
+
+		my $path_a = $want->{path_a};
+		if (path_searchable($path_a) && $path_a ne $path_b) {
+			$q .= qq{ dfn:"$path_a"};
+		}
+	}
+
+	my $msgs = $srch->query($q, { relevance => 1 });
+	my $re = qr/\Aindex ($pre[a-f0-9]*)\.\.($post[a-f0-9]*)(?: (\d+))?/;
+
+	my $di;
+	foreach my $smsg (@$msgs) {
+		$ibx->smsg_mime($smsg) or next;
+		msg_iter(delete($smsg->{mime}), sub {
+			$di ||= extract_diff($_[0], $re, $ibx, $smsg);
+		});
+		return $di if $di;
+	}
+}
+
+# pure Perl "git init"
+sub do_git_init_wt ($) {
+	my ($self) = @_;
+	my $wt = File::Temp->newdir('solver.wt-XXXXXXXX', TMPDIR => 1);
+	my $dir = $wt->dirname;
+
+	foreach (qw(objects/info refs/heads)) {
+		make_path("$dir/.git/$_") or die "make_path $_: $!";
+	}
+	open my $fh, '>', "$dir/.git/config" or die "open .git/config: $!";
+	print $fh <<'EOF' or die "print .git/config $!";
+[core]
+	repositoryFormatVersion = 0
+	filemode = true
+	bare = false
+	fsyncObjectfiles = false
+	logAllRefUpdates = false
+EOF
+	close $fh or die "close .git/config: $!";
+
+	open $fh, '>', "$dir/.git/HEAD" or die "open .git/HEAD: $!";
+	print $fh "ref: refs/heads/master\n" or die "print .git/HEAD: $!";
+	close $fh or die "close .git/HEAD: $!";
+
+	my $f = '.git/objects/info/alternates';
+	open $fh, '>', "$dir/$f" or die "open: $f: $!";
+	foreach my $git (@{$self->{gits}}) {
+		print $fh "$git->{git_dir}/objects\n" or die "print $f: $!";
+	}
+	close $fh or die "close: $f: $!";
+	$wt;
+}
+
+sub extract_old_mode ($) {
+	my ($di) = @_;
+	if (grep(/\Aold mode (100644|100755|120000)$/, @{$di->{hdr_lines}})) {
+		return $1;
+	}
+	'100644';
+}
+
+sub reap ($$) {
+	my ($pid, $msg) = @_;
+	waitpid($pid, 0) == $pid or die "waitpid($msg): $!";
+	$? == 0 or die "$msg failed: $?";
+}
+
+sub prepare_wt ($$$) {
+	my ($wt_dir, $existing, $di) = @_;
+	my $oid_full = $existing->[1];
+	my ($r, $w);
+	my $path_a = $di->{path_a} or die "BUG: path_a missing for $oid_full";
+	my $mode_a = $di->{mode_a} || extract_old_mode($di);
+	my @git = (qw(git -C), $wt_dir);
+
+	pipe($r, $w) or die "pipe: $!";
+	my $rdr = { 0 => fileno($r) };
+	my $pid = spawn([@git, qw(update-index -z --index-info)], {}, $rdr);
+	close $r or die "close pipe(r): $!";
+	print $w "$mode_a $oid_full\t$path_a\0" or die "print update-index: $!";
+	close $w or die "close update-index: $!";
+	reap($pid, 'update-index -z --index-info');
+
+	$pid = spawn([@git, qw(checkout-index -a -f -u)]);
+	reap($pid, 'checkout-index -a -f -u');
+}
+
+sub do_apply ($$$$) {
+	my ($out, $wt_git, $wt_dir, $di) = @_;
+
+	my $tmp = delete $di->{tmp} or die "BUG: no tmp ", di_info($di);
+	$tmp->flush or die "tmp->flush failed: $!";
+	$out->flush or die "err->flush failed: $!";
+	sysseek($tmp, 0, SEEK_SET) or die "sysseek(tmp) failed: $!";
+
+	defined(my $err_fd = fileno($out)) or die "fileno(out): $!";
+	my $rdr = { 0 => fileno($tmp), 1 => $err_fd, 2 => $err_fd };
+	my $cmd = [ qw(git -C), $wt_dir,
+	            qw(apply --whitespace=warn -3 --verbose) ];
+	reap(spawn($cmd, undef, $rdr), 'apply');
+
+	local $/ = "\0";
+	my $rd = popen_rd([qw(git -C), $wt_dir, qw(ls-files -s -z)]);
+
+	defined(my $line = <$rd>) or die "failed to read ls-files: $!";
+	chomp $line or die "no trailing \\0 in [$line] from ls-files";
+
+	my ($info, $file) = split(/\t/, $line, 2);
+	my ($mode_b, $oid_b_full, $stage) = split(/ /, $info);
+
+	defined($line = <$rd>) and die "extra files in index: $line";
+	close $rd or die "close ls-files: $?";
+
+	$file eq $di->{path_b} or
+		die "index mismatch: file=$file != path_b=$di->{path_b}";
+	my $abs_path = "$wt_dir/$file";
+	-r $abs_path or die "WT_DIR/$file not readable";
+	my $size = -s _;
+
+	print $out "OK $mode_b $oid_b_full $stage\t$file\n";
+	[ $wt_git, $oid_b_full, 'blob', $size, $di ];
+}
+
+sub di_url ($) {
+	my ($di) = @_;
+	# note: we don't pass the PSGI env here, different inboxes
+	# can have different HTTP_HOST on the same instance.
+	my $url = $di->{ibx}->base_url;
+	my $mid = $di->{smsg}->{mid};
+	defined($url) ? "<$url/$mid/>" : "<$mid>";
+}
+
+sub apply_patches ($$$$$) {
+	my ($self, $out, $wt, $found, $patches) = @_;
+	my $wt_dir = $wt->dirname;
+	my $wt_git = PublicInbox::Git->new("$wt_dir/.git");
+	$wt_git->{-wt} = $wt;
+
+	my $cur = 0;
+	my $tot = scalar @$patches;
+
+	foreach my $di (@$patches) {
+		my $i = ++$cur;
+		my $oid_a = $di->{oid_a};
+		my $existing = $found->{$oid_a};
+		my $empty_oid = $oid_a =~ /\A0+\z/;
+
+		if ($empty_oid && $i != 0) {
+			die "empty oid at [$i/$tot] ", di_url($di);
+		}
+		if (!$existing && !$empty_oid) {
+			die "missing $oid_a at [$i/$tot] ", di_url($di);
+		}
+
+		# prepare the worktree for patch application:
+		if ($i == 1 && $existing) {
+			prepare_wt($wt_dir, $existing, $di);
+		}
+		unless (-f "$wt_dir/$di->{path_a}") {
+			die "missing $di->{path_a} at [$i/$tot] ", di_url($di);
+		}
+
+		print $out "applying [$i/$tot] ", di_url($di), "\n",
+			   join('', @{$di->{hdr_lines}}), "\n"
+			or die "print \$out failed: $!";
+
+		# apply the patch!
+		$found->{$di->{oid_b}} = do_apply($out, $wt_git, $wt_dir, $di);
+	}
+}
+
+sub dump_found ($$) {
+	my ($out, $found) = @_;
+	foreach my $oid (sort keys %$found) {
+		my ($git, $oid, $di) = @{$found->{$oid}};
+		my $loc = $di ? di_info($di) : $git->src_blob_url($oid);
+		print $out "$oid from $loc\n";
+	}
+}
+
+sub dump_patches ($$) {
+	my ($out, $patches) = @_;
+	my $tot = scalar(@$patches);
+	my $i = 0;
+	foreach my $di (@$patches) {
+		++$i;
+		print $out "[$i/$tot] ", di_url($di), "\n";
+	}
+}
+
+# recreate $oid_b
+# Returns a 2-element array ref: [ PublicInbox::Git object, oid_full ]
+# or undef if nothing was found.
+sub solve ($$$$) {
+	my ($self, $out, $oid_b, $hints) = @_;
+
+	# should we even get here? Probably not, but somebody
+	# could be manually typing URLs:
+	return if $oid_b =~ /\A0+\z/;
+
+	my $req = { %$hints, oid_b => $oid_b };
+	my @todo = ($req);
+	my $found = {}; # { oid_abbrev => [ PublicInbox::Git, oid_full, $di ] }
+	my $patches = []; # [ array of $di hashes ]
+
+	my $max = $self->{max_steps} || 200;
+	my $steps = 0;
+
+	while (defined(my $want = pop @todo)) {
+		# see if we can find the blob in an existing git repo:
+		if (my $existing = solve_existing($self, $want)) {
+			my $want_oid = $want->{oid_b};
+			return $existing if $want_oid eq $oid_b; # DONE!
+
+			$found->{$want_oid} = $existing;
+			next; # ok, one blob resolved, more to go?
+		}
+
+		# scan through inboxes to look for emails which results in
+		# the oid we want:
+		foreach my $ibx (@{$self->{inboxes}}) {
+			my $di = find_extract_diff($self, $ibx, $want) or next;
+
+			unshift @$patches, $di;
+
+			# good, we can find a path to the oid we $want, now
+			# lets see if we need to apply more patches:
+			my $src = $di->{oid_a};
+			if ($src !~ /\A0+\z/) {
+				if (++$steps > $max) {
+					print $out
+"Aborting, too many steps to $oid_b\n";
+
+					return;
+				}
+
+				# we have to solve it using another oid, fine:
+				my $job = {
+					oid_b => $src,
+					path_b => $di->{path_a},
+				};
+				push @todo, $job;
+			}
+			last; # onto the next @todo item
+		}
+	}
+
+	unless (scalar(@$patches)) {
+		print $out "no patch(es) for $oid_b\n";
+		dump_found($out, $found);
+		return;
+	}
+
+	# reconstruct the oid_b blob using patches we found:
+	eval {
+		my $wt = do_git_init_wt($self);
+		apply_patches($self, $out, $wt, $found, $patches);
+	};
+	if ($@) {
+		print $out "E: $@\nfound: ";
+		dump_found($out, $found);
+		print $out "patches: ";
+		dump_patches($out, $patches);
+		return;
+	}
+
+	$found->{$oid_b};
+}
+
+1;
diff --git a/t/solve/0001-simple-mod.patch b/t/solve/0001-simple-mod.patch
new file mode 100644
index 0000000..c6bb157
--- /dev/null
+++ b/t/solve/0001-simple-mod.patch
@@ -0,0 +1,20 @@
+From: WEB DESIGN EXPERT <BOFH@YHBT.net>
+To: meta@public-inbox.org
+Subject: [PATCH] TODO: take expert web design advice
+Date: Mon, 1 Apr 2019 08:15:20 +0000
+Message-Id: <20190401081523.16213-1-BOFH@YHBT.net>
+
+---
+ TODO | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/TODO b/TODO
+index 605013e..69df7d5 100644
+--- a/TODO
++++ b/TODO
+@@ -109,3 +109,5 @@ all need to be considered for everything we introduce)
+ 
+   * Optional history squashing to reduce commit and intermediate
+     tree objects
++
++  * Make use of <blink> and <marquee> tags
diff --git a/t/solve/0002-rename-with-modifications.patch b/t/solve/0002-rename-with-modifications.patch
new file mode 100644
index 0000000..aa415e0
--- /dev/null
+++ b/t/solve/0002-rename-with-modifications.patch
@@ -0,0 +1,37 @@
+From: POLITICAL CORRECTNESS EXPERT <BOFH@YHBT.net>
+To: meta@public-inbox.org
+Subject: [PATCH] POLITICALLY CORRECT FILE NAMING
+Date: Mon, 1 Apr 2019 08:15:20 +0000
+Message-Id: <20190401081523.16213-2-BOFH@YHBT.net>
+
+HACKING MIGHT GET US REPORTED TO EFF-BEE-EYE
+AND USE MARKDOWN CUZ MOAR FLAVORS == BETTER
+---
+ HACKING => CONTRIBUTING.md | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+ rename HACKING => CONTRIBUTING.md (94%)
+
+diff --git a/HACKING b/CONTRIBUTING.md
+similarity index 94%
+rename from HACKING
+rename to CONTRIBUTING.md
+index 3435775..0a92431 100644
+--- a/HACKING
++++ b/CONTRIBUTING.md
+@@ -1,5 +1,5 @@
+-hacking public-inbox
+---------------------
++contributing to public-inbox
++----------------------------
+ 
+ Send all patches and "git request-pull"-formatted emails to our
+ self-hosting inbox at meta@public-inbox.org
+@@ -15,7 +15,7 @@ Please consider our goals in mind:
+ 	Decentralization, Accessibility, Compatibility, Performance
+ 
+ These goals apply to everyone: users viewing over the web or NNTP,
+-sysadmins running public-inbox, and other hackers working public-inbox.
++sysadmins running public-inbox, and other contributors working public-inbox.
+ 
+ We will reject any feature which advocates or contributes to any
+ particular instance of a public-inbox becoming a single point of failure.
diff --git a/t/solver_git.t b/t/solver_git.t
new file mode 100644
index 0000000..fe322ea
--- /dev/null
+++ b/t/solver_git.t
@@ -0,0 +1,91 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Test::More;
+use File::Temp qw(tempdir);
+use Cwd qw(abs_path);
+require './t/common.perl';
+require_git(2.6);
+
+my @mods = qw(DBD::SQLite Search::Xapian HTTP::Request::Common Plack::Test
+		URI::Escape Plack::Builder);
+foreach my $mod (@mods) {
+	eval "require $mod";
+	plan skip_all => "$mod missing for $0" if $@;
+}
+chomp(my $git_dir = `git rev-parse --git-dir 2>/dev/null`);
+plan skip_all => "$0 must be run from a git working tree" if $?;
+$git_dir = abs_path($git_dir);
+
+use_ok "PublicInbox::$_" for (qw(Inbox V2Writable MIME Git SolverGit));
+
+my $mainrepo = tempdir('pi-solver-XXXXXX', TMPDIR => 1, CLEANUP => 1);
+my $opts = {
+	mainrepo => $mainrepo,
+	name => 'test-v2writable',
+	version => 2,
+	-primary_address => 'test@example.com',
+};
+my $ibx = PublicInbox::Inbox->new($opts);
+my $im = PublicInbox::V2Writable->new($ibx, 1);
+$im->{parallel} = 0;
+
+sub deliver_patch ($) {
+	open my $fh, '<', $_[0] or die "open: $!";
+	my $mime = PublicInbox::MIME->new(do { local $/; <$fh> });
+	$im->add($mime);
+	$im->done;
+}
+
+deliver_patch('t/solve/0001-simple-mod.patch');
+
+my $gits = [ PublicInbox::Git->new($git_dir) ];
+my $solver = PublicInbox::SolverGit->new($gits, [ $ibx ]);
+open my $log, '+>>', "$mainrepo/solve.log" or die "open: $!";
+my $res = $solver->solve($log, '69df7d5', {});
+ok($res, 'solved a blob!');
+my $wt_git = $res->[0];
+is(ref($wt_git), 'PublicInbox::Git', 'got a git object for the blob');
+my $expect = '69df7d565d49fbaaeb0a067910f03dc22cd52bd0';
+is($res->[1], $expect, 'resolved blob to unabbreviated identifier');
+is($res->[2], 'blob', 'type specified');
+is($res->[3], 4405, 'size returned');
+
+is(ref($wt_git->cat_file($res->[1])), 'SCALAR', 'wt cat-file works');
+is_deeply([$expect, 'blob', 4405],
+	  [$wt_git->check($res->[1])], 'wt check works');
+
+if (0) { # TODO: check this?
+	seek($log, 0, 0);
+	my $z = do { local $/; <$log> };
+	diag $z;
+}
+
+$res = undef;
+my $wt_git_dir = $wt_git->{git_dir};
+$wt_git = undef;
+ok(!-d $wt_git_dir, 'no references to WT held');
+
+$res = $solver->solve($log, '0'x40, {});
+is($res, undef, 'no error on z40');
+
+my $git_v2_20_1_tag = '7a95a1cd084cb665c5c2586a415e42df0213af74';
+$res = $solver->solve($log, $git_v2_20_1_tag, {});
+is($res, undef, 'no error on a tag not in our repo');
+
+deliver_patch('t/solve/0002-rename-with-modifications.patch');
+$res = $solver->solve($log, '0a92431', {});
+ok($res, 'resolved without hints');
+
+my $hints = {
+	oid_a => '3435775',
+	path_a => 'HACKING',
+	path_b => 'CONTRIBUTING'
+};
+my $hinted = $solver->solve($log, '0a92431', $hints);
+# don't compare ::Git objects:
+shift @$res; shift @$hinted;
+is_deeply($res, $hinted, 'hints work (or did not hurt :P');
+
+done_testing();
-- 
EW


^ permalink raw reply related	[relevance 7%]

* [PATCH] solver: initial Perl implementation
@ 2019-01-15  8:46  7% Eric Wong
  0 siblings, 0 replies; 14+ results
From: Eric Wong @ 2019-01-15  8:46 UTC (permalink / raw)
  To: meta

This will lookup git blobs from associated git source code
repositories.  If the blobs can't be found, an attempt to
"solve" them via patch application will be performed.

Eventually, this may become the basis of a type-agnostic
frontend similar to "git show"
---
 MANIFEST                                     |   4 +
 lib/PublicInbox/Git.pm                       |  16 +
 lib/PublicInbox/SolverGit.pm                 | 400 +++++++++++++++++++
 t/solve/0001-simple-mod.patch                |  20 +
 t/solve/0002-rename-with-modifications.patch |  37 ++
 t/solver_git.t                               |  91 +++++
 6 files changed, 568 insertions(+)
 create mode 100644 lib/PublicInbox/SolverGit.pm
 create mode 100644 t/solve/0001-simple-mod.patch
 create mode 100644 t/solve/0002-rename-with-modifications.patch
 create mode 100644 t/solver_git.t

diff --git a/MANIFEST b/MANIFEST
index dfd9e27..95ad0c6 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -101,6 +101,7 @@ lib/PublicInbox/SearchIdxPart.pm
 lib/PublicInbox/SearchMsg.pm
 lib/PublicInbox/SearchThread.pm
 lib/PublicInbox/SearchView.pm
+lib/PublicInbox/SolverGit.pm
 lib/PublicInbox/Spamcheck.pm
 lib/PublicInbox/Spamcheck/Spamc.pm
 lib/PublicInbox/Spawn.pm
@@ -201,6 +202,9 @@ t/qspawn.t
 t/reply.t
 t/search-thr-index.t
 t/search.t
+t/solve/0001-simple-mod.patch
+t/solve/0002-rename-with-modifications.patch
+t/solver_git.t
 t/spamcheck_spamc.t
 t/spawn.t
 t/thread-cycle.t
diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 4601f25..111c4f3 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -38,6 +38,7 @@ sub new {
 	my ($class, $git_dir) = @_;
 	my @st;
 	$st[7] = $st[10] = 0;
+	# may contain {-wt} field (working-tree (File::Temp::Dir))
 	bless { git_dir => $git_dir, st => \@st }, $class
 }
 
@@ -199,6 +200,21 @@ sub packed_bytes {
 
 sub DESTROY { cleanup(@_) }
 
+# show the blob URL for cgit/gitweb/whatever
+sub src_blob_url {
+	my ($self, $oid) = @_;
+	# blob_fmt = "https://example.com/foo.git/blob/%s"
+	if (my $bfu = $self->{blob_fmt_url}) {
+		return sprintf($bfu, $oid);
+	}
+
+	# don't show full FS path, basename should be OK:
+	if ($self->{git_dir} =~ m!/([^/]+)\z!) {
+		return "/path/to/$1";
+	}
+	'???';
+}
+
 1;
 __END__
 =pod
diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
new file mode 100644
index 0000000..f28768a
--- /dev/null
+++ b/lib/PublicInbox/SolverGit.pm
@@ -0,0 +1,400 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# "Solve" blobs which don't exist in git code repositories by
+# searching inboxes for post-image blobs.
+
+# this emits a lot of debugging/tracing information which may be
+# publically viewed over HTTP(S).  Be careful not to expose
+# local filesystem layouts in the process.
+package PublicInbox::SolverGit;
+use strict;
+use warnings;
+use File::Temp qw();
+use Fcntl qw(SEEK_SET);
+use File::Path qw(make_path);
+use PublicInbox::Git qw(git_unquote);
+use PublicInbox::Spawn qw(spawn popen_rd);
+use PublicInbox::MsgIter qw(msg_iter msg_part_text);
+use URI::Escape qw(uri_escape_utf8);
+
+# don't bother if somebody sends us a patch with these path components,
+# it's junk at best, an attack attempt at worse:
+my %bad_component = map { $_ => 1 } ('', '.', '..');
+
+sub new {
+	my ($class, $gits, $inboxes) = @_;
+	bless {
+		gits => $gits,
+		inboxes => $inboxes,
+	}, $class;
+}
+
+# look for existing blobs already in git repos
+sub solve_existing ($$) {
+	my ($self, $want) = @_;
+	foreach my $git (@{$self->{gits}}) {
+		my ($oid_full, $type, $size) = $git->check($want->{oid_b});
+		if (defined($type) && $type eq 'blob') {
+			return [ $git, $oid_full, $type, int($size) ];
+		}
+	}
+	undef;
+}
+
+# returns a hashref with information about a diff:
+# {
+#	oid_a => abbreviated pre-image oid,
+#	oid_b => abbreviated post-image oid,
+#	tmp => anonymous file handle with the diff,
+#	hdr_lines => arrayref of various header lines for mode information
+#	mode_a => original mode of oid_a (string, not integer),
+#	ibx => PublicInbox::Inbox object containing the diff
+#	smsg => PublicInbox::SearchMsg object containing diff
+#	path_a => pre-image path
+#	path_b => post-image path
+# }
+sub extract_diff ($$$$) {
+	my ($p, $re, $ibx, $smsg) = @_;
+	my ($part) = @$p; # ignore $depth and @idx;
+	my $hdr_lines; # diff --git a/... b/...
+	my $tmp;
+	my $ct = $part->content_type || 'text/plain';
+	my ($s, undef) = msg_part_text($part, $ct);
+	defined $s or return;
+	my $di = {};
+	foreach my $l (split(/^/m, $s)) {
+		if ($l =~ /$re/) {
+			$di->{oid_a} = $1;
+			$di->{oid_b} = $2;
+			my $mode_a = $3;
+			if ($mode_a =~ /\A(?:100644|120000|100755)\z/) {
+				$di->{mode_a} = $mode_a;
+			}
+
+			# start writing the diff out to a tempfile
+			open($tmp, '+>', undef) or die "open(tmp): $!";
+			$di->{tmp} = $tmp;
+			$di->{hdr_lines} = $hdr_lines;
+
+			print $tmp @$hdr_lines, $l or die "print(tmp): $!";
+
+			# for debugging/diagnostics:
+			$di->{ibx} = $ibx;
+			$di->{smsg} = $smsg;
+		} elsif ($l =~ m!\Adiff --git ("?a/.+) ("?b/.+)$!) {
+			return $di if $tmp; # got our blob, done!
+
+			my ($path_a, $path_b) = ($1, $2);
+
+			# don't care for leading 'a/' and 'b/'
+			my (undef, @a) = split(m{/}, git_unquote($path_a));
+			my (undef, @b) = split(m{/}, git_unquote($path_b));
+
+			# get rid of path-traversal attempts and junk patches:
+			foreach (@a, @b) {
+				return if $bad_component{$_};
+			}
+
+			$di->{path_a} = join('/', @a);
+			$di->{path_b} = join('/', @b);
+			$hdr_lines = [ $l ];
+		} elsif ($tmp) {
+			print $tmp $l or die "print(tmp): $!";
+		} elsif ($hdr_lines) {
+			push @$hdr_lines, $l;
+		}
+	}
+	$tmp ? $di : undef;
+}
+
+sub path_searchable ($) { defined($_[0]) && $_[0] =~ m!\A[\w/\. \-]+\z! }
+
+sub find_extract_diff ($$$) {
+	my ($self, $ibx, $want) = @_;
+	my $srch = $ibx->search or return;
+
+	my $post = $want->{oid_b} or die 'BUG: no {oid_b}';
+	$post =~ /\A[a-f0-9]+\z/ or die "BUG: oid_b not hex: $post";
+
+	my $q = "dfpost:$post";
+	my $pre = $want->{oid_a};
+	if (defined $pre && $pre =~ /\A[a-f0-9]+\z/) {
+		$q .= " dfpre:$pre";
+	} else {
+		$pre = '[a-f0-9]{7}'; # for $re below
+	}
+
+	my $path_b = $want->{path_b};
+	if (path_searchable($path_b)) {
+		$q .= qq{ dfn:"$path_b"};
+
+		my $path_a = $want->{path_a};
+		if (path_searchable($path_a) && $path_a ne $path_b) {
+			$q .= qq{ dfn:"$path_a"};
+		}
+	}
+
+	my $msgs = $srch->query($q, { relevance => 1 });
+	my $re = qr/\Aindex ($pre[a-f0-9]*)\.\.($post[a-f0-9]*)(?: (\d+))?/;
+
+	my $di;
+	foreach my $smsg (@$msgs) {
+		$ibx->smsg_mime($smsg) or next;
+		msg_iter(delete($smsg->{mime}), sub {
+			$di ||= extract_diff($_[0], $re, $ibx, $smsg);
+		});
+		return $di if $di;
+	}
+}
+
+# pure Perl "git init"
+sub do_git_init_wt ($) {
+	my ($self) = @_;
+	my $wt = File::Temp->newdir('solver.wt-XXXXXXXX', TMPDIR => 1);
+	my $dir = $wt->dirname;
+
+	foreach (qw(objects/info refs/heads)) {
+		make_path("$dir/.git/$_") or die "make_path $_: $!";
+	}
+	open my $fh, '>', "$dir/.git/config" or die "open .git/config: $!";
+	print $fh <<'EOF' or die "print .git/config $!";
+[core]
+	repositoryFormatVersion = 0
+	filemode = true
+	bare = false
+	fsyncObjectfiles = false
+	logAllRefUpdates = false
+EOF
+	close $fh or die "close .git/config: $!";
+
+	open $fh, '>', "$dir/.git/HEAD" or die "open .git/HEAD: $!";
+	print $fh "ref: refs/heads/master\n" or die "print .git/HEAD: $!";
+	close $fh or die "close .git/HEAD: $!";
+
+	my $f = '.git/objects/info/alternates';
+	open $fh, '>', "$dir/$f" or die "open: $f: $!";
+	foreach my $git (@{$self->{gits}}) {
+		print $fh "$git->{git_dir}/objects\n" or die "print $f: $!";
+	}
+	close $fh or die "close: $f: $!";
+	$wt;
+}
+
+sub extract_old_mode ($) {
+	my ($di) = @_;
+	if (grep(/\Aold mode (100644|100755|120000)$/, @{$di->{hdr_lines}})) {
+		return $1;
+	}
+	'100644';
+}
+
+sub reap ($$) {
+	my ($pid, $msg) = @_;
+	waitpid($pid, 0) == $pid or die "waitpid($msg): $!";
+	$? == 0 or die "$msg failed: $?";
+}
+
+sub prepare_wt ($$$) {
+	my ($wt_dir, $existing, $di) = @_;
+	my $oid_full = $existing->[1];
+	my ($r, $w);
+	my $path_a = $di->{path_a} or die "BUG: path_a missing for $oid_full";
+	my $mode_a = $di->{mode_a} || extract_old_mode($di);
+	my @git = (qw(git -C), $wt_dir);
+
+	pipe($r, $w) or die "pipe: $!";
+	my $rdr = { 0 => fileno($r) };
+	my $pid = spawn([@git, qw(update-index -z --index-info)], {}, $rdr);
+	close $r or die "close pipe(r): $!";
+	print $w "$mode_a $oid_full\t$path_a\0" or die "print update-index: $!";
+	close $w or die "close update-index: $!";
+	reap($pid, 'update-index -z --index-info');
+
+	$pid = spawn([@git, qw(checkout-index -a -f -u)]);
+	reap($pid, 'checkout-index -a -f -u');
+}
+
+sub do_apply ($$$$) {
+	my ($out, $wt_git, $wt_dir, $di) = @_;
+
+	my $tmp = delete $di->{tmp} or die "BUG: no tmp ", di_info($di);
+	$tmp->flush or die "tmp->flush failed: $!";
+	$out->flush or die "err->flush failed: $!";
+	sysseek($tmp, 0, SEEK_SET) or die "sysseek(tmp) failed: $!";
+
+	defined(my $err_fd = fileno($out)) or die "fileno(out): $!";
+	my $rdr = { 0 => fileno($tmp), 1 => $err_fd, 2 => $err_fd };
+	my $cmd = [ qw(git -C), $wt_dir,
+	            qw(apply --whitespace=warn -3 --verbose) ];
+	reap(spawn($cmd, undef, $rdr), 'apply');
+
+	local $/ = "\0";
+	my $rd = popen_rd([qw(git -C), $wt_dir, qw(ls-files -s -z)]);
+
+	defined(my $line = <$rd>) or die "failed to read ls-files: $!";
+	chomp $line or die "no trailing \\0 in [$line] from ls-files";
+
+	my ($info, $file) = split(/\t/, $line, 2);
+	my ($mode_b, $oid_b_full, $stage) = split(/ /, $info);
+
+	defined($line = <$rd>) and die "extra files in index: $line";
+	close $rd or die "close ls-files: $?";
+
+	$file eq $di->{path_b} or
+		die "index mismatch: file=$file != path_b=$di->{path_b}";
+	my $abs_path = "$wt_dir/$file";
+	-r $abs_path or die "WT_DIR/$file not readable";
+	my $size = -s _;
+
+	print $out "OK $mode_b $oid_b_full $stage\t$file\n";
+	[ $wt_git, $oid_b_full, 'blob', $size, $di ];
+}
+
+sub di_url ($) {
+	my ($di) = @_;
+	# note: we don't pass the PSGI env here, different inboxes
+	# can have different HTTP_HOST on the same instance.
+	my $url = $di->{ibx}->base_url;
+	my $mid = $di->{smsg}->{mid};
+	defined($url) ? "<$url/$mid/>" : "<$mid>";
+}
+
+sub apply_patches ($$$$$) {
+	my ($self, $out, $wt, $found, $patches) = @_;
+	my $wt_dir = $wt->dirname;
+	my $wt_git = PublicInbox::Git->new("$wt_dir/.git");
+	$wt_git->{-wt} = $wt;
+
+	my $cur = 0;
+	my $tot = scalar @$patches;
+
+	foreach my $di (@$patches) {
+		my $i = ++$cur;
+		my $oid_a = $di->{oid_a};
+		my $existing = $found->{$oid_a};
+		my $empty_oid = $oid_a =~ /\A0+\z/;
+
+		if ($empty_oid && $i != 0) {
+			die "empty oid at [$i/$tot] ", di_url($di);
+		}
+		if (!$existing && !$empty_oid) {
+			die "missing $oid_a at [$i/$tot] ", di_url($di);
+		}
+
+		# prepare the worktree for patch application:
+		if ($i == 1 && $existing) {
+			prepare_wt($wt_dir, $existing, $di);
+		}
+		unless (-f "$wt_dir/$di->{path_a}") {
+			die "missing $di->{path_a} at [$i/$tot] ", di_url($di);
+		}
+
+		print $out "applying [$i/$tot] ", di_url($di), "\n",
+			   join('', @{$di->{hdr_lines}}), "\n"
+			or die "print \$out failed: $!";
+
+		# apply the patch!
+		$found->{$di->{oid_b}} = do_apply($out, $wt_git, $wt_dir, $di);
+	}
+}
+
+sub dump_found ($$) {
+	my ($out, $found) = @_;
+	foreach my $oid (sort keys %$found) {
+		my ($git, $oid, $di) = @{$found->{$oid}};
+		my $loc = $di ? di_info($di) : $git->src_blob_url($oid);
+		print $out "$oid from $loc\n";
+	}
+}
+
+sub dump_patches ($$) {
+	my ($out, $patches) = @_;
+	my $tot = scalar(@$patches);
+	my $i = 0;
+	foreach my $di (@$patches) {
+		++$i;
+		print $out "[$i/$tot] ", di_url($di), "\n";
+	}
+}
+
+# recreate $oid_b
+# Returns a 2-element array ref: [ PublicInbox::Git object, oid_full ]
+# or undef if nothing was found.
+sub solve ($$$$) {
+	my ($self, $out, $oid_b, $hints) = @_;
+
+	# should we even get here? Probably not, but somebody
+	# could be manually typing URLs:
+	return if $oid_b =~ /\A0+\z/;
+
+	my $req = { %$hints, oid_b => $oid_b };
+	my @todo = ($req);
+	my $found = {}; # { oid_abbrev => [ PublicInbox::Git, oid_full, $di ] }
+	my $patches = []; # [ array of $di hashes ]
+
+	my $max = $self->{max_steps} || 200;
+	my $steps = 0;
+
+	while (defined(my $want = pop @todo)) {
+		# see if we can find the blob in an existing git repo:
+		if (my $existing = solve_existing($self, $want)) {
+			my $want_oid = $want->{oid_b};
+			return $existing if $want_oid eq $oid_b; # DONE!
+
+			$found->{$want_oid} = $existing;
+			next; # ok, one blob resolved, more to go?
+		}
+
+		# scan through inboxes to look for emails which results in
+		# the oid we want:
+		foreach my $ibx (@{$self->{inboxes}}) {
+			my $di = find_extract_diff($self, $ibx, $want) or next;
+
+			unshift @$patches, $di;
+
+			# good, we can find a path to the oid we $want, now
+			# lets see if we need to apply more patches:
+			my $src = $di->{oid_a};
+			if ($src !~ /\A0+\z/) {
+				if (++$steps > $max) {
+					print $out
+"Aborting, too many steps to $oid_b\n";
+
+					return;
+				}
+
+				# we have to solve it using another oid, fine:
+				my $job = {
+					oid_b => $src,
+					path_b => $di->{path_a},
+				};
+				push @todo, $job;
+			}
+			last; # onto the next @todo item
+		}
+	}
+
+	unless (scalar(@$patches)) {
+		print $out "no patch(es) for $oid_b\n";
+		dump_found($out, $found);
+		return;
+	}
+
+	# reconstruct the oid_b blob using patches we found:
+	eval {
+		my $wt = do_git_init_wt($self);
+		apply_patches($self, $out, $wt, $found, $patches);
+	};
+	if ($@) {
+		print $out "E: $@\nfound: ";
+		dump_found($out, $found);
+		print $out "patches: ";
+		dump_patches($out, $patches);
+		return;
+	}
+
+	$found->{$oid_b};
+}
+
+1;
diff --git a/t/solve/0001-simple-mod.patch b/t/solve/0001-simple-mod.patch
new file mode 100644
index 0000000..c6bb157
--- /dev/null
+++ b/t/solve/0001-simple-mod.patch
@@ -0,0 +1,20 @@
+From: WEB DESIGN EXPERT <BOFH@YHBT.net>
+To: meta@public-inbox.org
+Subject: [PATCH] TODO: take expert web design advice
+Date: Mon, 1 Apr 2019 08:15:20 +0000
+Message-Id: <20190401081523.16213-1-BOFH@YHBT.net>
+
+---
+ TODO | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/TODO b/TODO
+index 605013e..69df7d5 100644
+--- a/TODO
++++ b/TODO
+@@ -109,3 +109,5 @@ all need to be considered for everything we introduce)
+ 
+   * Optional history squashing to reduce commit and intermediate
+     tree objects
++
++  * Make use of <blink> and <marquee> tags
diff --git a/t/solve/0002-rename-with-modifications.patch b/t/solve/0002-rename-with-modifications.patch
new file mode 100644
index 0000000..aa415e0
--- /dev/null
+++ b/t/solve/0002-rename-with-modifications.patch
@@ -0,0 +1,37 @@
+From: POLITICAL CORRECTNESS EXPERT <BOFH@YHBT.net>
+To: meta@public-inbox.org
+Subject: [PATCH] POLITICALLY CORRECT FILE NAMING
+Date: Mon, 1 Apr 2019 08:15:20 +0000
+Message-Id: <20190401081523.16213-2-BOFH@YHBT.net>
+
+HACKING MIGHT GET US REPORTED TO EFF-BEE-EYE
+AND USE MARKDOWN CUZ MOAR FLAVORS == BETTER
+---
+ HACKING => CONTRIBUTING.md | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+ rename HACKING => CONTRIBUTING.md (94%)
+
+diff --git a/HACKING b/CONTRIBUTING.md
+similarity index 94%
+rename from HACKING
+rename to CONTRIBUTING.md
+index 3435775..0a92431 100644
+--- a/HACKING
++++ b/CONTRIBUTING.md
+@@ -1,5 +1,5 @@
+-hacking public-inbox
+---------------------
++contributing to public-inbox
++----------------------------
+ 
+ Send all patches and "git request-pull"-formatted emails to our
+ self-hosting inbox at meta@public-inbox.org
+@@ -15,7 +15,7 @@ Please consider our goals in mind:
+ 	Decentralization, Accessibility, Compatibility, Performance
+ 
+ These goals apply to everyone: users viewing over the web or NNTP,
+-sysadmins running public-inbox, and other hackers working public-inbox.
++sysadmins running public-inbox, and other contributors working public-inbox.
+ 
+ We will reject any feature which advocates or contributes to any
+ particular instance of a public-inbox becoming a single point of failure.
diff --git a/t/solver_git.t b/t/solver_git.t
new file mode 100644
index 0000000..fe322ea
--- /dev/null
+++ b/t/solver_git.t
@@ -0,0 +1,91 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Test::More;
+use File::Temp qw(tempdir);
+use Cwd qw(abs_path);
+require './t/common.perl';
+require_git(2.6);
+
+my @mods = qw(DBD::SQLite Search::Xapian HTTP::Request::Common Plack::Test
+		URI::Escape Plack::Builder);
+foreach my $mod (@mods) {
+	eval "require $mod";
+	plan skip_all => "$mod missing for $0" if $@;
+}
+chomp(my $git_dir = `git rev-parse --git-dir 2>/dev/null`);
+plan skip_all => "$0 must be run from a git working tree" if $?;
+$git_dir = abs_path($git_dir);
+
+use_ok "PublicInbox::$_" for (qw(Inbox V2Writable MIME Git SolverGit));
+
+my $mainrepo = tempdir('pi-solver-XXXXXX', TMPDIR => 1, CLEANUP => 1);
+my $opts = {
+	mainrepo => $mainrepo,
+	name => 'test-v2writable',
+	version => 2,
+	-primary_address => 'test@example.com',
+};
+my $ibx = PublicInbox::Inbox->new($opts);
+my $im = PublicInbox::V2Writable->new($ibx, 1);
+$im->{parallel} = 0;
+
+sub deliver_patch ($) {
+	open my $fh, '<', $_[0] or die "open: $!";
+	my $mime = PublicInbox::MIME->new(do { local $/; <$fh> });
+	$im->add($mime);
+	$im->done;
+}
+
+deliver_patch('t/solve/0001-simple-mod.patch');
+
+my $gits = [ PublicInbox::Git->new($git_dir) ];
+my $solver = PublicInbox::SolverGit->new($gits, [ $ibx ]);
+open my $log, '+>>', "$mainrepo/solve.log" or die "open: $!";
+my $res = $solver->solve($log, '69df7d5', {});
+ok($res, 'solved a blob!');
+my $wt_git = $res->[0];
+is(ref($wt_git), 'PublicInbox::Git', 'got a git object for the blob');
+my $expect = '69df7d565d49fbaaeb0a067910f03dc22cd52bd0';
+is($res->[1], $expect, 'resolved blob to unabbreviated identifier');
+is($res->[2], 'blob', 'type specified');
+is($res->[3], 4405, 'size returned');
+
+is(ref($wt_git->cat_file($res->[1])), 'SCALAR', 'wt cat-file works');
+is_deeply([$expect, 'blob', 4405],
+	  [$wt_git->check($res->[1])], 'wt check works');
+
+if (0) { # TODO: check this?
+	seek($log, 0, 0);
+	my $z = do { local $/; <$log> };
+	diag $z;
+}
+
+$res = undef;
+my $wt_git_dir = $wt_git->{git_dir};
+$wt_git = undef;
+ok(!-d $wt_git_dir, 'no references to WT held');
+
+$res = $solver->solve($log, '0'x40, {});
+is($res, undef, 'no error on z40');
+
+my $git_v2_20_1_tag = '7a95a1cd084cb665c5c2586a415e42df0213af74';
+$res = $solver->solve($log, $git_v2_20_1_tag, {});
+is($res, undef, 'no error on a tag not in our repo');
+
+deliver_patch('t/solve/0002-rename-with-modifications.patch');
+$res = $solver->solve($log, '0a92431', {});
+ok($res, 'resolved without hints');
+
+my $hints = {
+	oid_a => '3435775',
+	path_a => 'HACKING',
+	path_b => 'CONTRIBUTING'
+};
+my $hinted = $solver->solve($log, '0a92431', $hints);
+# don't compare ::Git objects:
+shift @$res; shift @$hinted;
+is_deeply($res, $hinted, 'hints work (or did not hurt :P');
+
+done_testing();
-- 
EW


^ permalink raw reply related	[relevance 7%]

Results 1-14 of 14 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2019-01-15  8:46  7% [PATCH] solver: initial Perl implementation Eric Wong
2019-01-21 20:52     [PATCH 00/37] viewvcs: diff highlighting and more Eric Wong
2019-01-21 20:52  7% ` [PATCH 04/37] solver: initial Perl implementation Eric Wong
2019-01-21 20:52  7% ` [PATCH 09/37] view: wire up diff and vcs viewers with solver Eric Wong
2019-01-21 20:52  9% ` [PATCH 12/37] view: enable naming hints for raw blob downloads Eric Wong
2019-04-04  9:55  6% repobrowse history and notes Eric Wong
2019-10-21 11:22     [PATCH 0/7] dead code elimination Eric Wong
2019-10-21 11:22  7% ` [PATCH 5/7] git: remove src_blob_url Eric Wong
2020-09-01 20:36     [PATCH 0/3] www: cleanups + scheduling improvements Eric Wong
2020-09-01 20:36 10% ` [PATCH 2/3] use "\&" where possible when referring to subroutines Eric Wong
2020-09-09  6:26  6% [PATCH 00/11] httpd: further reduce event loop monopolization Eric Wong
2020-09-09  6:26 10% ` [PATCH 03/11] use "\&" where possible when referring to subroutines Eric Wong
2022-08-29  9:26     [PATCH 00/18] WWW: patch, tree, git glossary Eric Wong
2022-08-29  9:26  7% ` [PATCH 03/18] viewvcs: delay stringification of solver debug log Eric Wong
2022-12-15 19:34  5% [PATCH] relnotes: 2.0.0 work-in-progress Eric Wong
2023-08-28 10:42     [PATCH 1/5] ci/profiles.sh: fix case matching logic Štěpán Němec
2023-08-28 10:42  5% ` [PATCH 5/5] Fix some typos/grammar/errors in docs and comments Štěpán Němec
2024-03-11 19:40     [PATCH 0/4] memory reductions for WWW + solver Eric Wong
2024-03-11 19:40 16% ` [PATCH 1/4] www: use a dedicated limiter for blob solver Eric Wong
2024-04-01 13:21  6% sample robots.txt to reduce WWW load Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).