user/dev discussion of public-inbox itself
* [PATCH 0/2] "lei q" remote memoization
@ 2021-02-24 23:37 Eric Wong
  2021-02-24 23:37 ` [PATCH 1/2] lei_external: don't treat IPv6 URLs as globs Eric Wong
  2021-02-24 23:37 ` [PATCH 2/2] lei q: auto-memoize remote messages into lei/store Eric Wong
  0 siblings, 2 replies; 3+ messages in thread
From: Eric Wong @ 2021-02-24 23:37 UTC
  To: meta

1/2 only happened because I made IPv6 tests the default.
2/2 is a feature I've always wanted.

Eric Wong (2):
  lei_external: don't treat IPv6 URLs as globs
  lei q: auto-memoize remote messages into lei/store

 MANIFEST                       |  1 +
 lib/PublicInbox/LEI.pm         |  2 ++
 lib/PublicInbox/LeiExternal.pm |  8 +++++-
 lib/PublicInbox/LeiQuery.pm    |  1 +
 lib/PublicInbox/LeiXSearch.pm  | 10 +++++--
 t/lei-q-remote-import.t        | 50 ++++++++++++++++++++++++++++++++++
 t/lei_external.t               |  1 +
 7 files changed, 69 insertions(+), 4 deletions(-)
 create mode 100644 t/lei-q-remote-import.t

* [PATCH 1/2] lei_external: don't treat IPv6 URLs as globs
  2021-02-24 23:37 [PATCH 0/2] "lei q" remote memoization Eric Wong
@ 2021-02-24 23:37 ` Eric Wong
  2021-02-24 23:37 ` [PATCH 2/2] lei q: auto-memoize remote messages into lei/store Eric Wong
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Wong @ 2021-02-24 23:37 UTC
  To: meta

IPv6 addresses in URLs consist of hexadecimal digits and colons
inside brackets, so add some DWIM-ery to ensure we don't attempt
to treat URLs like "http://[dead:beef]/foo/" as globs.
---
 lib/PublicInbox/LeiExternal.pm | 8 +++++++-
 t/lei_external.t               | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)
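
A minimal standalone sketch of the idea, reusing the pattern from the
diff below.  split_ipv6_prefix is a hypothetical name used only for
illustration; it is not part of LeiExternal.pm.  It peels off a
leading "scheme://[IPv6]:port/" so that only the remainder would be
considered for glob conversion:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of public-inbox: split a leading
# "scheme://[IPv6]:port/" prefix off a location string.
sub split_ipv6_prefix {
	my ($loc) = @_;
	my $prefix = '';
	# same pattern as the patch: scheme, bracketed hex digits and
	# colons, optional port, then the first slash of the path
	if ($loc =~ s!\A([a-z0-9\+]+://\[[a-f0-9\:]+\](?::[0-9]+)?/)!!i) {
		$prefix = $1;
	}
	($prefix, $loc); # the brackets in $prefix are no longer glob input
}

for my $loc ('http://[::1]:1234/foo/', 'http://[dead:beef]/f*o/', '[f-o]') {
	my ($prefix, $rest) = split_ipv6_prefix($loc);
	print "$loc => prefix='$prefix' rest='$rest'\n";
}
```

In this sketch, only the third location keeps its brackets in the
part that remains a glob candidate; the first two have them moved
into the prefix.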

diff --git a/lib/PublicInbox/LeiExternal.pm b/lib/PublicInbox/LeiExternal.pm
index 0cc84cca..47791d4e 100644
--- a/lib/PublicInbox/LeiExternal.pm
+++ b/lib/PublicInbox/LeiExternal.pm
@@ -54,6 +54,12 @@ sub glob2re {
 	my $p = '';
 	my $in_bracket = 0;
 	my $qm = 0;
+	my $schema_host_port = '';
+
+	# don't glob URL-looking things that look like IPv6
+	if ($re =~ s!\A([a-z0-9\+]+://\[[a-f0-9\:]+\](?::[0-9]+)?/)!!i) {
+		$schema_host_port = quotemeta $1; # "http://[::1]:1234/"
+	}
 	my $changes = ($re =~ s!(.)!
 		$re_map{$p eq '\\' ? '' : do {
 			if ($1 eq '[') { ++$in_bracket }
@@ -69,7 +75,7 @@ sub glob2re {
 			(my $in_braces = $2) =~ tr!,!|!;
 			$1."($in_braces)";
 			/sge);
-	($changes - $qm) ? $re : undef;
+	($changes - $qm) ? $schema_host_port.$re : undef;
 }
 
 # get canonicalized externals list matching $loc
diff --git a/t/lei_external.t b/t/lei_external.t
index 78f71658..51d0af5c 100644
--- a/t/lei_external.t
+++ b/t/lei_external.t
@@ -17,6 +17,7 @@ is($canon->('/this//path/'), '/this/path', 'extra slashes gone');
 is($canon->('/ALL/CAPS'), '/ALL/CAPS', 'caps preserved');
 
 my $glob2re = $cls->can('glob2re');
+is($glob2re->('http://[::1]:1234/foo/'), undef, 'IPv6 URL not globbed');
 is($glob2re->('foo'), undef, 'plain string unchanged');
 is_deeply($glob2re->('[f-o]'), '[f-o]' , 'range accepted');
 is_deeply($glob2re->('*'), '[^/]*?' , 'wildcard accepted');

* [PATCH 2/2] lei q: auto-memoize remote messages into lei/store
  2021-02-24 23:37 [PATCH 0/2] "lei q" remote memoization Eric Wong
  2021-02-24 23:37 ` [PATCH 1/2] lei_external: don't treat IPv6 URLs as globs Eric Wong
@ 2021-02-24 23:37 ` Eric Wong
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Wong @ 2021-02-24 23:37 UTC
  To: meta

This lets users avoid network traffic on subsequent searches at
the expense of local disk space.  --no-import-remote may be
specified to reverse this trade-off for users with little
storage.
---
 MANIFEST                      |  1 +
 lib/PublicInbox/LEI.pm        |  2 ++
 lib/PublicInbox/LeiQuery.pm   |  1 +
 lib/PublicInbox/LeiXSearch.pm | 10 ++++---
 t/lei-q-remote-import.t       | 50 +++++++++++++++++++++++++++++++++++
 5 files changed, 61 insertions(+), 3 deletions(-)
 create mode 100644 t/lei-q-remote-import.t
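
A minimal sketch of how a negatable switch defaults to "on", using
only core Getopt::Long.  want_import_remote is a hypothetical name
for illustration; lei parses its options through its own %CMD table
rather than like this:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper, not part of public-inbox: returns true when
# remote results should be memoized into the local store.
sub want_import_remote {
	my (@argv) = @_;
	my %opt;
	# the trailing "!" also accepts the --no-import-remote form
	GetOptionsFromArray(\@argv, \%opt, 'import-remote!')
		or die "bad options\n";
	# still undefined means neither form was given: default to on
	$opt{'import-remote'} //= 1;
}

for my $args ([], ['--import-remote'], ['--no-import-remote']) {
	my $on = want_import_remote(@$args) ? 'memoize' : 'do not memoize';
	print "args=(@$args): $on\n";
}
```

The defined-or assignment only fills in a value when the user gave
neither form, so an explicit --no-import-remote is preserved as 0.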

diff --git a/MANIFEST b/MANIFEST
index 4c04eec8..adbd108f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -372,6 +372,7 @@ t/lei-import-maildir.t
 t/lei-import-nntp.t
 t/lei-import.t
 t/lei-mirror.t
+t/lei-q-remote-import.t
 t/lei.t
 t/lei_dedupe.t
 t/lei_external.t
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 8cd95ac2..50665b3e 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -112,6 +112,7 @@ our %CMD = ( # sorted in order of importance/use:
 	save-as=s output|mfolder|o=s format|f=s dedupe|d=s threads|t augment|a
 	sort|s=s reverse|r offset=i remote! local! external! pretty
 	include|I=s@ exclude=s@ only=s@ jobs|j=s globoff|g stdin|
+	import-remote!
 	alert=s@ mua=s no-torsocks torsocks=s verbose|v+ quiet|q C=s@),
 	PublicInbox::LeiQuery::curl_opt(), opt_dash('limit|n=i', '[0-9]+') ],
 
@@ -225,6 +226,7 @@ my %OPTDESC = (
 		'whether or not to wrap git and curl commands with torsocks'],
 'no-torsocks' => 'alias for --torsocks=no',
 'save-as=s' => ['NAME', 'save a search terms by given name'],
+'import-remote!' => 'do not memoize remote messages into local store',
 
 'type=s' => [ 'any|mid|git', 'disambiguate type' ],
 
diff --git a/lib/PublicInbox/LeiQuery.pm b/lib/PublicInbox/LeiQuery.pm
index 743fa3f7..b57d1cc5 100644
--- a/lib/PublicInbox/LeiQuery.pm
+++ b/lib/PublicInbox/LeiQuery.pm
@@ -51,6 +51,7 @@ sub lei_q {
 	# we'll allow "--only $LOCATION --local"
 	my $sto = $self->_lei_store(1);
 	my $lse = $sto->search;
+	$sto->write_prepare($self) if $opt->{'import-remote'} //= 1;
 	if ($opt->{'local'} //= scalar(@only) ? 0 : 1) {
 		$lxs->prepare_external($lse);
 	}
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index c46aba3b..2d399653 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -189,8 +189,9 @@ sub query_mset { # non-parallel for non-"--threads" users
 	$lei->{ovv}->ovv_atexit_child($lei);
 }
 
-sub each_eml { # callback for MboxReader->mboxrd
+sub each_remote_eml { # callback for MboxReader->mboxrd
 	my ($eml, $self, $lei, $each_smsg) = @_;
+	$lei->{sto}->ipc_do('set_eml', $eml) if $lei->{sto}; # --import-remote
 	my $smsg = bless {}, 'PublicInbox::Smsg';
 	$smsg->populate($eml);
 	$smsg->parse_references($eml, mids($eml));
@@ -244,14 +245,17 @@ sub query_remote_mboxrd {
 		my ($fh, $pid) = popen_rd($cmd, undef, $rdr);
 		$reap_curl = PublicInbox::OnDestroy->new($sigint_reap, $pid);
 		$fh = IO::Uncompress::Gunzip->new($fh);
-		PublicInbox::MboxReader->mboxrd($fh, \&each_eml, $self,
+		PublicInbox::MboxReader->mboxrd($fh, \&each_remote_eml, $self,
 						$lei, $each_smsg);
 		my $err = waitpid($pid, 0) == $pid ? undef
 						: "BUG: waitpid($cmd): $!";
 		@$reap_curl = (); # cancel OnDestroy
 		die $err if $err;
+		my $nr = $lei->{-nr_remote_eml};
+		if ($nr && $lei->{sto}) {
+			my $wait = $lei->{sto}->ipc_do('done');
+		}
 		if ($? == 0) {
-			my $nr = $lei->{-nr_remote_eml};
 			mset_progress($lei, $lei->{-current_url}, $nr, $nr);
 			next;
 		}
diff --git a/t/lei-q-remote-import.t b/t/lei-q-remote-import.t
new file mode 100644
index 00000000..f73524cf
--- /dev/null
+++ b/t/lei-q-remote-import.t
@@ -0,0 +1,50 @@
+#!perl -w
+# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict; use v5.10.1; use PublicInbox::TestCommon;
+require_git 2.6;
+require_mods(qw(json DBD::SQLite Search::Xapian));
+use PublicInbox::MboxReader;
+my ($ro_home, $cfg_path) = setup_public_inboxes;
+my $sock = tcp_server;
+my ($tmpdir, $for_destroy) = tmpdir;
+my $cmd = [ '-httpd', '-W0', "--stdout=$tmpdir/1", "--stderr=$tmpdir/2" ];
+my $env = { PI_CONFIG => $cfg_path };
+my $td = start_script($cmd, $env, { 3 => $sock }) or BAIL_OUT("-httpd: $?");
+my $host_port = tcp_host_port($sock);
+my $url = "http://$host_port/t2/";
+my $exp1 = [ eml_load('t/plack-qp.eml') ];
+my $exp2 = [ eml_load('t/iso-2202-jp.eml') ];
+my $slurp_emls = sub {
+	open my $fh, '<', $_[0] or BAIL_OUT "open: $!";
+	my @eml;
+	PublicInbox::MboxReader->mboxrd($fh, sub {
+		my $eml = shift;
+		$eml->header_set('Status');
+		push @eml, $eml;
+	});
+	\@eml;
+};
+
+test_lei({ tmpdir => $tmpdir }, sub {
+	my $o = "$ENV{HOME}/o.mboxrd";
+	my @cmd = ('q', '-o', "mboxrd:$o", 'm:qp@example.com');
+	lei_ok(@cmd);
+	ok(-f $o && !-s _, 'output exists but is empty');
+	unlink $o or BAIL_OUT $!;
+	lei_ok(@cmd, '-I', $url);
+	is_deeply($slurp_emls->($o), $exp1, 'got results after remote search');
+	unlink $o or BAIL_OUT $!;
+	lei_ok(@cmd);
+	ok(-f $o && -s _, 'output exists after import but is not empty');
+	is_deeply($slurp_emls->($o), $exp1, 'got results w/o remote search');
+	unlink $o or BAIL_OUT $!;
+
+	$cmd[-1] = 'm:199707281508.AAA24167@hoyogw.example';
+	lei_ok(@cmd, '-I', $url, '--no-import-remote');
+	is_deeply($slurp_emls->($o), $exp2, 'got another after remote search');
+	unlink $o or BAIL_OUT $!;
+	lei_ok(@cmd);
+	ok(-f $o && !-s _, '--no-import-remote did not memoize');
+});
+done_testing;

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 meta meta/ https://public-inbox.org/meta \
		meta@public-inbox.org
	public-inbox-index meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.mail.public-inbox.meta
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.mail.public-inbox.meta
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/public-inbox.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git