user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 06/11] lei q: -I/--exclude/--only support globs and basenames
Date: Tue,  2 Feb 2021 22:11:38 -1000	[thread overview]
Message-ID: <20210203081143.24424-7-e@80x24.org> (raw)
In-Reply-To: <20210203081143.24424-1-e@80x24.org>

We can do basename matching when it's unambiguous.  Since '*?[]'
characters are rare in URLs and pathnames, we'll do glob
matching by default to support a (curl-inspired) --globoff/-g
option to disable globbing.

And fix --exclude while we're at it
---
 lib/PublicInbox/LEI.pm         |  3 ++-
 lib/PublicInbox/LeiExternal.pm | 38 +++++++++++++++++++++++++++++++++-
 lib/PublicInbox/LeiQuery.pm    | 14 ++++++++-----
 3 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 05a39cad..3cb7a327 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -104,7 +104,7 @@ our %CMD = ( # sorted in order of importance/use:
 'q' => [ 'SEARCH_TERMS...', 'search for messages matching terms', qw(
 	save-as=s output|mfolder|o=s format|f=s dedupe|d=s thread|t augment|a
 	sort|s=s reverse|r offset=i remote! local! external! pretty
-	include|I=s@ exclude=s@ only=s@ jobs|j=s
+	include|I=s@ exclude=s@ only=s@ jobs|j=s globoff|g
 	mua-cmd|mua=s no-torsocks torsocks=s verbose|v quiet|q
 	received-after=s received-before=s sent-after=s sent-since=s),
 	PublicInbox::LeiQuery::curl_opt(), opt_dash('limit|n=i', '[0-9]+') ],
@@ -201,6 +201,7 @@ my $ls_format = [ 'OUT|plain|json|null', 'listing output format' ];
 my %OPTDESC = (
 'help|h' => 'show this built-in help',
 'quiet|q' => 'be quiet',
+'globoff|g' => "do not match locations using '*?' wildcards and '[]' ranges",
 'verbose|v' => 'be more verbose',
 'solve!' => 'do not attempt to reconstruct blobs from emails',
 'torsocks=s' => ['auto|no|yes',
diff --git a/lib/PublicInbox/LeiExternal.pm b/lib/PublicInbox/LeiExternal.pm
index 3853cfc1..6b4c7fb0 100644
--- a/lib/PublicInbox/LeiExternal.pm
+++ b/lib/PublicInbox/LeiExternal.pm
@@ -39,7 +39,7 @@ sub lei_ls_external {
 }
 
 sub ext_canonicalize {
-	my ($location) = $_[-1];
+	my ($location) = @_;
 	if ($location !~ m!\Ahttps?://!) {
 		PublicInbox::Config::rel2abs_collapsed($location);
 	} else {
@@ -52,6 +52,42 @@ sub ext_canonicalize {
 	}
 }
 
+my %patmap = ('*' => '[^/]*?', '?' => '[^/]', '[' => '[', ']' => ']');
+sub glob2pat {
+	my ($glob) = @_;
+        $glob =~ s!(.)!$patmap{$1} || "\Q$1"!ge;
+        $glob;
+}
+
+sub get_externals {
+	my ($self, $loc, $exclude) = @_;
+	return (ext_canonicalize($loc)) if -e $loc;
+
+	my @m;
+	my @cur = externals_each($self);
+	my $do_glob = !$self->{opt}->{globoff}; # glob by default
+	if ($do_glob && ($loc =~ /[\*\?]/s || $loc =~ /\[.*\]/s)) {
+		my $re = glob2pat($loc);
+		@m = grep(m!$re!, @cur);
+		return @m if scalar(@m);
+	} elsif (index($loc, '/') < 0) { # exact basename match:
+		@m = grep(m!/\Q$loc\E/?\z!, @cur);
+		return @m if scalar(@m) == 1;
+	} elsif ($exclude) { # URL, maybe:
+		my $canon = ext_canonicalize($loc);
+		@m = grep(m!\A\Q$canon\E\z!, @cur);
+		return @m if scalar(@m) == 1;
+	} else { # URL:
+		return (ext_canonicalize($loc));
+	}
+	if (scalar(@m) == 0) {
+		$self->fail("`$loc' is unknown");
+	} else {
+		$self->fail("`$loc' is ambiguous:\n", map { "\t$_\n" } @m);
+	}
+	();
+}
+
 sub lei_add_external {
 	my ($self, $location) = @_;
 	my $cfg = $self->_lei_cfg(1);
diff --git a/lib/PublicInbox/LeiQuery.pm b/lib/PublicInbox/LeiQuery.pm
index 72a67c24..10b8d6fa 100644
--- a/lib/PublicInbox/LeiQuery.pm
+++ b/lib/PublicInbox/LeiQuery.pm
@@ -31,17 +31,21 @@ sub lei_q {
 	}
 	if (@only) {
 		for my $loc (@only) {
-			$lxs->prepare_external($self->ext_canonicalize($loc));
+			my @loc = $self->get_externals($loc) or return;
+			$lxs->prepare_external($_) for @loc;
 		}
 	} else {
 		for my $loc (@{$opt->{include} // []}) {
-			$lxs->prepare_external($self->ext_canonicalize($loc));
+			my @loc = $self->get_externals($loc) or return;
+			$lxs->prepare_external($_) for @loc;
 		}
 		# --external is enabled by default, but allow --no-external
 		if ($opt->{external} //= 1) {
-			my %x = map {;
-				($self->ext_canonicalize($_), 1)
-			} @{$self->{exclude} // []};
+			my %x;
+			for my $loc (@{$opt->{exclude} // []}) {
+				my @l = $self->get_externals($loc, 1) or return;
+				$x{$_} = 1 for @l;
+			}
 			my $ne = $self->externals_each(\&prep_ext, $lxs, \%x);
 			$opt->{remote} //= !($lxs->locals - $opt->{'local'});
 			if ($opt->{'local'}) {

  parent reply	other threads:[~2021-02-03  8:11 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-03  8:11 [PATCH 00/11] lei q --stdin, shortcut names, etc Eric Wong
2021-02-03  8:11 ` [PATCH 01/11] lei: reduce FD pressure from lei2mail worker Eric Wong
2021-02-03  8:11 ` [PATCH 02/11] lei: further reduce lei2mail FD pressure Eric Wong
2021-02-03  8:11 ` [PATCH 03/11] pkt_op: rely on DS::in_loop global Eric Wong
2021-02-03  8:11 ` [PATCH 04/11] lei: err: avoid uninitialized variable warnings Eric Wong
2021-02-03  8:11 ` [PATCH 05/11] lei: propagate curl errors, improve internal consistency Eric Wong
2021-02-03  8:11 ` Eric Wong [this message]
2021-02-03  8:11 ` [PATCH 07/11] lei: complete basenames for include|exclude|only Eric Wong
2021-02-03  8:11 ` [PATCH 08/11] lei: help starts pager Eric Wong
2021-02-03  8:11 ` [PATCH 09/11] lei add-external: completion for existing URL basenames Eric Wong
2021-02-03  8:11 ` [PATCH 10/11] lei: use sleep(1) loop for infinite sleep Eric Wong
2021-02-03  8:11 ` [PATCH 11/11] lei q: support reading queries from stdin Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210203081143.24424-7-e@80x24.org \
    --to=e@80x24.org \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).