From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 10D391F625
	for ; Sat, 21 Mar 2020 02:03:56 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 10/11] altid: warn about non-word prefixes
Date: Sat, 21 Mar 2020 02:03:53 +0000
Message-Id: <20200321020354.9056-11-e@yhbt.net>
In-Reply-To: <20200321020354.9056-1-e@yhbt.net>
References: <20200321020354.9056-1-e@yhbt.net>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

We only support searching on prefixes matching /\A\w+\z/
because Xapian requires ':' to delimit the prefix and splits
on spaces without quotes.

I've also verified Xapian supports multibyte UTF-8 characters,
underscores, and bare numbers as search prefixes, so there's no
need to restrict it beyond what Perl's UTF-8 aware \w character
class offers.
---
 lib/PublicInbox/AltId.pm  | 2 +-
 lib/PublicInbox/Search.pm | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/AltId.pm b/lib/PublicInbox/AltId.pm
index 8ce70e46..3be6c73c 100644
--- a/lib/PublicInbox/AltId.pm
+++ b/lib/PublicInbox/AltId.pm
@@ -22,7 +22,7 @@ sub new {
 	my ($class, $ibx, $spec, $writable) = @_;
 	my ($type, $prefix, $query) = split(/:/, $spec, 3);
 	$type eq 'serial' or die "non-serial not supported, yet\n";
-
+	$prefix =~ /\A\w+\z/ or warn "non-word prefix not searchable\n";
 	my %params = map {
 		my ($k, $v) = split(/=/, uri_unescape($_), 2);
 		$v = '' unless defined $v;
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index 372dc5a7..00dddc6b 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -316,6 +316,8 @@ sub qp {
 	my $user_pfx = $self->{-user_pfx} = [];
 	for (@$altid) {
 		# $_ = 'serial:gmane:/path/to/gmane.msgmap.sqlite3'
+		# note: Xapian supports multibyte UTF-8, /^[0-9]+$/,
+		# and '_' with prefixes matching \w+
 		/\Aserial:(\w+):/ or next;
 		my $pfx = $1;
 		push @$user_pfx, "$pfx:", <