user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 01/21] v2writable: warn on duplicate Message-IDs
Date: Wed, 28 Feb 2018 23:41:42 +0000	[thread overview]
Message-ID: <20180228234202.8839-2-e@80x24.org> (raw)
In-Reply-To: <20180228234202.8839-1-e@80x24.org>

This should give us an idea of how much a problem deduplication
will be.
---
 lib/PublicInbox/SearchIdx.pm  | 6 ++++--
 lib/PublicInbox/V2Writable.pm | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index cc7e7ec..f9207e9 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -515,13 +515,15 @@ sub unindex_blob {
 }
 
 sub index_mm {
-	my ($self, $mime) = @_;
+	my ($self, $mime, $warn_existing) = @_;
 	my $mid = mid_clean(mid_mime($mime));
 	my $mm = $self->{mm};
 	my $num = $mm->mid_insert($mid);
+	return $num if defined $num;
 
+	warn "<$mid> reused\n" if $warn_existing;
 	# fallback to num_for since filters like RubyLang set the number
-	defined $num ? $num : $mm->num_for($mid);
+	$mm->num_for($mid);
 }
 
 sub unindex_mm {
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index cf19c76..29ed23c 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -63,7 +63,7 @@ sub add {
 	my ($len, $msgref) = @{$im->{last_object}};
 
 	$self->idx_init;
-	my $num = $self->{all}->index_mm($mime);
+	my $num = $self->{all}->index_mm($mime, 1);
 	my $nparts = $self->{partitions};
 	my $part = $num % $nparts;
 	my $idx = $self->idx_part($part);
-- 
EW


  reply	other threads:[~2018-02-28 23:42 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-28 23:41 [PATCH v2 0/21] UI bits and v2 import fixes Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` Eric Wong (Contractor, The Linux Foundation) [this message]
2018-02-28 23:41 ` [PATCH 02/21] v2/ui: some hacky things to get the PSGI UI to show up Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 03/21] v2/ui: retry DB reopens in a few more places Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 04/21] v2writable: cleanup unused pipes in partitions Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 05/21] searchidxpart: binmode Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 06/21] use PublicInbox::MIME consistently Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 07/21] searchidxpart: chomp line before splitting Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 08/21] searchidx*: name child subprocesses Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 09/21] searchidx: get rid of pointless index_blob wrapper Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 10/21] view: remove X-PI-TS reference Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 11/21] searchidxthread: load doc data for references Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 12/21] searchidxpart: force integers into add_message Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 13/21] search: reopen skeleton DB as well Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 14/21] searchidx: index values in the threader Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 15/21] search: use different Enquire object for skeleton queries Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 16/21] rename SearchIdxThread to SearchIdxSkeleton Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 17/21] v2writable: commit to skeleton via remote partitions Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:41 ` [PATCH 18/21] searchidxskeleton: extra error checking Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:42 ` [PATCH 19/21] searchidx: do not modify Xapian DB while iterating Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:42 ` [PATCH 20/21] search: query_xover uses skeleton DB iff available Eric Wong (Contractor, The Linux Foundation)
2018-02-28 23:42 ` [PATCH 21/21] v2/ui: get nntpd and init tests running on v2 Eric Wong (Contractor, The Linux Foundation)
2018-03-01 23:40 ` [PATCH v2 0/21] UI bits and v2 import fixes Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180228234202.8839-2-e@80x24.org \
    --to=e@80x24.org \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).