user/dev discussion of public-inbox itself
From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Cc: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
Subject: [PATCH 08/14] search: cleanup uniqueness checking
Date: Thu, 29 Mar 2018 10:28:13 +0000
Message-ID: <20180329102819.15234-9-e@80x24.org>
In-Reply-To: <20180329102819.15234-1-e@80x24.org>

The only Xapian term which should be unique is the NNTP article
number, so we no longer need find_unique_doc_id.  Instead,
lookup_article now warns if more than one document matches an
article number.
---
 lib/PublicInbox/Search.pm | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index a4e2498..584a508 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -396,9 +396,16 @@ sub lookup_article {
 		retry_reopen($self, sub {
 			my $db = $self->{skel} || $self->{xdb};
 			my $head = $db->postlist_begin($term);
-			return if $head == $db->postlist_end($term);
+			my $tail = $db->postlist_end($term);
+			return if $head->equal($tail);
 			my $doc_id = $head->get_docid;
 			return unless defined $doc_id;
+			$head->inc;
+			if ($head->nequal($tail)) {
+				my $loc= $self->{mainrepo} .
+					($self->{skel} ? 'skel' : 'xdb');
+				warn "article #$num is not unique in $loc\n";
+			}
 			# raises on error:
 			my $doc = $db->get_document($doc_id);
 			$smsg = PublicInbox::SearchMsg->wrap($doc);
@@ -432,21 +439,6 @@ sub each_smsg_by_mid {
 	}
 }
 
-sub find_unique_doc_id {
-	my ($self, $termval) = @_;
-
-	my ($begin, $end) = $self->find_doc_ids($termval);
-
-	return undef if $begin->equal($end); # not found
-
-	my $rv = $begin->get_docid;
-
-	# sanity check
-	$begin->inc;
-	$begin->equal($end) or die "Term '$termval' is not unique\n";
-	$rv;
-}
-
 # returns begin and end PostingIterator
 sub find_doc_ids {
 	my ($self, $termval) = @_;
-- 
EW



Thread overview: 15+ messages
2018-03-29 10:28 [PATCH 00/14] purging support, v1 conversions, cleanups + more Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 01/14] www: remove unnecessary ghost checks Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 02/14] v2writable: append, instead of prepending generated Message-ID Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 03/14] lookup by Message-ID favors the "primary" one Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 04/14] www: fix attachment downloads for conflicted Message-IDs Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 05/14] searchmsg: document why we store To: and Cc: for NNTP Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 06/14] public-inbox-convert: tool for converting old to new inboxes Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 07/14] v2writable: support purging messages from git entirely Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` Eric Wong (Contractor, The Linux Foundation) [this message]
2018-03-29 10:28 ` [PATCH 09/14] search: get rid of most lookup_* subroutines Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 10/14] search: move find_doc_ids to searchidx Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 11/14] v2writable: cleanup: get rid of unused fields Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 12/14] mbox: avoid extracting Message-ID for linkification Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 13/14] www: cleanup expensive fallback for legacy URLs Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 14/14] view: get rid of some unnecessary imports Eric Wong (Contractor, The Linux Foundation)

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
