user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: meta@public-inbox.org
Subject: [WIP] v2writable: support INBOX_DEBUG=replace
Date: Mon, 10 Jun 2019 22:03:20 +0000
Message-ID: <20190610220320.nssvqjseswo2ujl2@dcvr> (raw)
In-Reply-To: <20190610194039.GD16418@chatter.i7.local>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Mon, Jun 10, 2019 at 07:29:05PM +0000, Eric Wong wrote:
> > > I did a few successful tests on small trial lists, but I'm running
> > > into a problem when I try to actually edit something in (a copy of)
> > > LKML:
> > > 
> > > $ perl5lib/bin/public-inbox-edit -m messageid /mnt/fastio/lkml
> > > (mutt opens here)
> > > 1 kept, 0 deleted.
> > > Exception: Expected block 102325 to be level 2, not 0
> > 
> > That's an exception from Xapian; I haven't seen that in years.
> > Which version of Xapian, and are you using chert or glass?
> 
> EL7 has 1.2.25.

Oh, I just realized EL7 doesn't use OFD locks at all because
it's on Linux 3.10 and OFD locks appeared in 3.15 (unless RH
backported them).
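
A rough way to check the kernel side is sketched below. It only
compares the release string against 3.15, so it cannot see vendor
backports, and kernel_has_ofd_locks is a hypothetical helper name,
not something shipped with public-inbox or Xapian:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel release string is >= 3.15,
# the first mainline release with OFD locks.  This is a plain version
# comparison; vendor backports (e.g. by RH) are NOT detected.
kernel_has_ofd_locks() {
	release=${1:-$(uname -r)}
	major=${release%%.*}		# "3.10.0-957.el7" -> "3"
	rest=${release#*.}		# "3.10.0-957.el7" -> "10.0-957.el7"
	minor=${rest%%[!0-9]*}		# "10.0-957.el7"   -> "10"
	[ "$major" -gt 3 ] ||
		{ [ "$major" -eq 3 ] && [ "${minor:-0}" -ge 15 ]; }
}

if kernel_has_ofd_locks; then
	echo "kernel $(uname -r): OFD locks expected"
else
	echo "kernel $(uname -r): OFD locks unlikely, unless backported"
fi
```

Passing a release string explicitly (e.g. the output of "uname -r"
copied from another host) checks that host instead of the local one.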

Maybe PATCH 14/11 fixes it:

  https://public-inbox.org/meta/20190610215811.untkksidetf3erf6@dcvr/

> > > I can provide more debug info if that helps.
> > 
> > Yes, I could probably add some debug messages to make it easier
> 
> Sounds good -- I had suspected it was coming from Xapian and I know EL7 is
> lagging behind quite a bit. If this is too much divergence to work with, I
> can probably build a xapian14 package, but I'm afraid of the rabbit hole I
> may have to go down to make that work.

Xapian 1.4 has huge performance improvements in worst-case
scenarios with the glass backend; so it might be worth trying
anyways.

But that won't get you Linux >=3.15 for OFD locks; so Xapian
is probably still using the nasty fork()-based lock in older
releases.

Maybe this dirty patch can dump more info:
---------8<--------
Subject: [WIP] v2writable: support INBOX_DEBUG=replace

Dirty patch to enable ->replace debugging via
INBOX_DEBUG=replace environment.

cf. <20190610192905.55xb737jl7qnbh23@dcvr>
---
 lib/PublicInbox/V2Writable.pm | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
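
Usage note (ignored by git-am): DBG_REPLACE is on whenever "replace"
appears as a whole word in INBOX_DEBUG, so it can share the variable
with other words.  The sketch below mimics that match in plain shell;
inbox_debug_has_replace is a hypothetical stand-in for the Perl regex
and the log file name is an assumption:

```shell
#!/bin/sh
# Hypothetical stand-in for the Perl check /\breplace\b/ on
# $INBOX_DEBUG: split the value on non-word characters and succeed
# if one of the resulting words is exactly "replace".
inbox_debug_has_replace() {
	printf '%s' "${1-}" | tr -c 'A-Za-z0-9_' '\n' | grep -qx 'replace'
}

inbox_debug_has_replace 'replace' && echo "replace: on"
inbox_debug_has_replace 'foo,replace' && echo "foo,replace: on"
inbox_debug_has_replace 'replaced' || echo "replaced: off"

# With the patch applied, the failing edit could be re-run roughly
# like this (the stderr redirect and log name are assumptions):
#   INBOX_DEBUG=replace perl5lib/bin/public-inbox-edit -m messageid \
#       /mnt/fastio/lkml 2>replace-debug.log
```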

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 3484807..164a032 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -20,6 +20,9 @@ use PublicInbox::Spawn qw(spawn);
 use PublicInbox::SearchIdx;
 use IO::Handle;
 
+# unstable interface
+use constant DBG_REPLACE => !!(($ENV{INBOX_DEBUG}//'') =~ /\breplace\b/);
+
 # an estimate of the post-packed size to the raw uncompressed size
 my $PACKING_FACTOR = 0.4;
 
@@ -312,14 +315,23 @@ sub _replace_oids ($$$) {
 		$self->{epoch_max} = $max;
 	}
 
+	if (DBG_REPLACE) {
+		warn "Replacing OIDs\n";
+		warn "\t", $_, "\n" for (keys %$replace_map);
+	}
+
 	foreach my $i (0..$max) {
 		my $git_dir = "$pfx/$i.git";
 		-d $git_dir or next;
+
+		warn "In $git_dir ... " if DBG_REPLACE;
 		my $git = PublicInbox::Git->new($git_dir);
 		my $im = $self->import_init($git, 0, 1);
 		$rewrites->[$i] = $im->replace_oids($mime, $replace_map);
 		$im->done;
+		warn "Done: $git_dir" if DBG_REPLACE;
 	}
+	warn "done replacing in git repos" if DBG_REPLACE;
 	$rewrites;
 }
 
@@ -369,6 +381,7 @@ sub rewrite_internal ($$;$$$) {
 	foreach my $mid (@$mids) {
 		my %gone; # num => [ smsg, raw ]
 		my ($id, $prev);
+		warn "looking for <$mid>" if DBG_REPLACE;
 		while (my $smsg = $over->next_by_mid($mid, \$id, \$prev)) {
 			my $msg = get_blob($self, $smsg);
 			if (!defined($msg)) {
@@ -380,6 +393,10 @@ sub rewrite_internal ($$;$$$) {
 			if (content_matches($cids, $cur)) {
 				$smsg->{mime} = $cur;
 				$gone{$smsg->{num}} = [ $smsg, \$orig ];
+				DBG_REPLACE and
+					warn "matched <$mid> => $smsg->{num}";
+			} else {
+				DBG_REPLACE and warn "no match <$mid>";
 			}
 		}
 		my $n = scalar keys %gone;
@@ -387,6 +404,7 @@ sub rewrite_internal ($$;$$$) {
 		if ($n > 1) {
 			warn "BUG: multiple articles linked to <$mid>\n",
 				join(',', sort keys %gone), "\n";
+			warn "Replacing all of them\n";
 		}
 		foreach my $num (keys %gone) {
 			my ($smsg, $orig) = @{$gone{$num}};
@@ -513,6 +531,7 @@ sub replace ($$$) {
 
 	my $raw = $new_mime->as_string;
 	my $expect_oid = git_hash_raw($self, \$raw);
+	warn "expect_oid: $expect_oid" if DBG_REPLACE;
 	my $rewritten = _replace($self, $old_mime, $new_mime, \$raw) or return;
 	my $need_reindex = $rewritten->{need_reindex};
 
@@ -537,6 +556,7 @@ W: $list
 	for my $smsg (@$need_reindex) {
 		my $num = $smsg->{num};
 		my $mid0 = $smsg->{mid};
+		warn "Reindexing article $num <$mid0>" if DBG_REPLACE;
 		do_idx($self, \$raw, $new_mime, $len, $num, $oid, $mid0);
 	}
 	$rewritten->{rewrites};
-- 
EW

