user/dev discussion of public-inbox itself
* [PATCH 00/11] v2: implement message editing
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-09  2:51 UTC
  To: meta

Some organizations are legally responsible for removing certain
content but prefer to edit out sensitive parts of a message
instead of purging it completely from history.

We can build on the existing purge functionality: instead of
replacing a message with an empty file, we replace it with the
desired content.

This ->replace method reindexes the modified message and
updates the corresponding git commit in case the subject
or authorship ident changes.

A new tool, public-inbox-edit(1), wraps the new ->replace
functionality by providing an editable mboxrd (suitable for
setting publicinbox.mailEditor to "mutt -f").

GIT_EDITOR/VISUAL/EDITOR can be used if publicinbox.mailEditor
is not configured, but those are generally not ideal
for editing base64 or QP-encoded messages.
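A minimal sketch of that editor fallback, assuming the variables
are consulted in the order listed above. The helper pick_editor
and the plain hash standing in for the parsed config (keyed the
way `git config -l` prints names, lowercased) are hypothetical,
not public-inbox-edit's actual code:

```perl
use strict;
use warnings;

# Hypothetical helper: choose the command used to edit the mboxrd.
# $cfg is a plain hash standing in for the parsed public-inbox config;
# $env is normally \%ENV.  The precedence order is an assumption.
sub pick_editor {
	my ($cfg, $env) = @_;
	my $ed = $cfg->{'publicinbox.maileditor'};
	return $ed if defined $ed && $ed ne '';
	for my $var (qw(GIT_EDITOR VISUAL EDITOR)) {
		my $v = $env->{$var};
		return $v if defined $v && $v ne '';
	}
	return undef; # nothing configured
}

my $cmd = pick_editor({ 'publicinbox.maileditor' => 'mutt -f' },
		{ EDITOR => 'vi' });
print "$cmd\n"; # prints "mutt -f"
```

A mail-aware editor is preferred because plain text editors see
the raw transfer encoding rather than the decoded body.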

Eric Wong (Contractor, The Linux Foundation) (11):
  v2writable: consolidate overview and indexing call
  import: extract_author_info becomes extract_commit_info
  import: switch to "replace_oids" interface for purge
  v2writable: implement ->replace call
  admin: remove warning arg for unconfigured inboxes
  purge: start moving common options to AdminEdit module
  admin: beef up resolve_inboxes to handle purge options
  AdminEdit: move editability checks from -purge
  admin: expose ->config
  doc: document the --prune option for -index
  edit: new tool to perform edits

 Documentation/include.mk              |   1 +
 Documentation/public-inbox-config.pod |   4 +
 Documentation/public-inbox-edit.pod   | 109 ++++++++++++
 Documentation/public-inbox-index.pod  |   7 +
 MANIFEST                              |   5 +
 lib/PublicInbox/Admin.pm              |  75 ++++++---
 lib/PublicInbox/AdminEdit.pm          |  50 ++++++
 lib/PublicInbox/Import.pm             | 101 ++++++-----
 lib/PublicInbox/V2Writable.pm         | 200 +++++++++++++++++-----
 script/public-inbox-edit              | 233 ++++++++++++++++++++++++++
 script/public-inbox-purge             | 103 ++----------
 t/edit.t                              | 178 ++++++++++++++++++++
 t/replace.t                           | 199 ++++++++++++++++++++++
 13 files changed, 1065 insertions(+), 200 deletions(-)
 create mode 100644 Documentation/public-inbox-edit.pod
 create mode 100644 lib/PublicInbox/AdminEdit.pm
 create mode 100755 script/public-inbox-edit
 create mode 100644 t/edit.t
 create mode 100644 t/replace.t

-- 
EW



* [PATCH 01/11] v2writable: consolidate overview and indexing call
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-09  2:51 UTC
  To: meta

It's one ugly sub with lots of parameters, but it's better
than calling a bunch of ugly subs with lots of parameters,
especially since we'll need to call it again when reindexing
replaced messages.
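The consolidated sub picks an index partition from the article
number and reports when enough bytes have accumulated to warrant
a checkpoint. A self-contained sketch of just that bookkeeping;
BATCH_BYTES here is a stand-in constant (the real value lives in
PublicInbox::SearchIdx) and needs_checkpoint is hypothetical. Note
that do_idx compares with >= where the old inline code used >:

```perl
use strict;
use warnings;

use constant BATCH_BYTES => 1_000_000; # stand-in, not SearchIdx's value

# Hypothetical mirror of the do_idx bookkeeping: returns the partition
# chosen for article $num and whether a checkpoint is now due.
sub needs_checkpoint {
	my ($state, $num, $len) = @_;
	my $npart = $state->{partitions};
	my $part = $num % $npart; # same round-robin as do_idx
	my $n = $state->{transact_bytes} += $len;
	return ($part, $n >= BATCH_BYTES * $npart);
}

my $st = { partitions => 2, transact_bytes => 0 };
my ($part, $due) = needs_checkpoint($st, 5, 1_500_000);
print "part=$part due=", ($due ? 1 : 0), "\n"; # part=1 due=0
```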
---
 lib/PublicInbox/V2Writable.pm | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index a8c33ef..a435814 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -116,6 +116,18 @@ sub add {
 	});
 }
 
+# indexes a message, returns true if checkpointing is needed
+sub do_idx ($$$$$$$) {
+	my ($self, $msgref, $mime, $len, $num, $oid, $mid0) = @_;
+	$self->{over}->add_overview($mime, $len, $num, $oid, $mid0);
+	my $npart = $self->{partitions};
+	my $part = $num % $npart;
+	my $idx = idx_part($self, $part);
+	$idx->index_raw($len, $msgref, $num, $oid, $mid0, $mime);
+	my $n = $self->{transact_bytes} += $len;
+	$n >= (PublicInbox::SearchIdx::BATCH_BYTES * $npart);
+}
+
 sub _add {
 	my ($self, $mime, $check_cb) = @_;
 
@@ -141,13 +153,7 @@ sub _add {
 	$self->{last_commit}->[$self->{epoch_max}] = $cmt;
 
 	my ($oid, $len, $msgref) = @{$im->{last_object}};
-	$self->{over}->add_overview($mime, $len, $num, $oid, $mid0);
-	my $nparts = $self->{partitions};
-	my $part = $num % $nparts;
-	my $idx = $self->idx_part($part);
-	$idx->index_raw($len, $msgref, $num, $oid, $mid0, $mime);
-	my $n = $self->{transact_bytes} += $len;
-	if ($n > (PublicInbox::SearchIdx::BATCH_BYTES * $nparts)) {
+	if (do_idx($self, $msgref, $mime, $len, $num, $oid, $mid0)) {
 		$self->checkpoint;
 	}
 
@@ -772,15 +778,8 @@ sub reindex_oid ($$$$) {
 	}
 	$sync->{mm_tmp}->mid_delete($mid0) or
 		die "failed to delete <$mid0> for article #$num\n";
-
-	$self->{over}->add_overview($mime, $len, $num, $oid, $mid0);
-	my $nparts = $self->{partitions};
-	my $part = $num % $nparts;
-	my $idx = $self->idx_part($part);
-	$idx->index_raw($len, $msgref, $num, $oid, $mid0, $mime);
-	my $n = $self->{transact_bytes} += $len;
 	$sync->{nr}++;
-	if ($n > (PublicInbox::SearchIdx::BATCH_BYTES * $nparts)) {
+	if (do_idx($self, $msgref, $mime, $len, $num, $oid, $mid0)) {
 		$git->cleanup;
 		$sync->{mm_tmp}->atfork_prepare;
 		$self->done; # release lock
-- 
EW



* [PATCH 02/11] import: extract_author_info becomes extract_commit_info
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-09  2:51 UTC
  To: meta

We will reuse the same logic for extracting authorship
information and the commit title when performing edits, so
put it all into one sub: extract_cmt_info.
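Besides the author name and address, the combined sub returns the
author/committer timestamps and the sanitized subject used as the
commit title. The subject handling can be sketched on its own;
commit_title is a hypothetical name and, unlike the real sub, it
takes an already-decoded string rather than an Email::MIME object:

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the Subject: handling in
# extract_cmt_info: default a missing subject, replace NULs (which
# MIME decoding can produce) with spaces so they never reach git,
# then encode to UTF-8 octets.
sub commit_title {
	my ($subject) = @_;
	$subject = '(no subject)' unless defined $subject;
	$subject =~ tr/\0/ /;
	utf8::encode($subject);
	$subject;
}

my $t = commit_title("hello\0world");
# fast-import wants the byte length of the message, plus the newline
print "data ", length($t) + 1, "\n$t\n";
```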
---
 lib/PublicInbox/Import.pm | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 2c4bad9..6ee1935 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -277,7 +277,7 @@ sub git_timestamp {
 	"$ts $zone";
 }
 
-sub extract_author_info ($) {
+sub extract_cmt_info ($) {
 	my ($mime) = @_;
 
 	my $sender = '';
@@ -314,7 +314,17 @@ sub extract_author_info ($) {
 		$name = '';
 		warn "no name in From: $from or Sender: $sender\n";
 	}
-	($name, $email);
+
+	my $hdr = $mime->header_obj;
+
+	my $subject = $hdr->header('Subject');
+	$subject = '(no subject)' unless defined $subject;
+	# Mime decoding can create nulls replace them with spaces to protect git
+	$subject =~ tr/\0/ /;
+	utf8::encode($subject);
+	my $at = git_timestamp(my @at = msg_datestamp($hdr));
+	my $ct = git_timestamp(my @ct = msg_timestamp($hdr));
+	($name, $email, $at, $ct, $subject);
 }
 
 # kill potentially confusing/misleading headers
@@ -361,19 +371,7 @@ sub clean_tree_v2 ($$$) {
 sub add {
 	my ($self, $mime, $check_cb) = @_; # mime = Email::MIME
 
-	my ($name, $email) = extract_author_info($mime);
-	my $hdr = $mime->header_obj;
-	my @at = msg_datestamp($hdr);
-	my @ct = msg_timestamp($hdr);
-	my $author_time_raw = git_timestamp(@at);
-	my $commit_time_raw = git_timestamp(@ct);
-
-	my $subject = $mime->header('Subject');
-	$subject = '(no subject)' unless defined $subject;
-	# Mime decoding can create nulls replace them with spaces to protect git
-	$subject =~ tr/\0/ /;
-	utf8::encode($subject);
-
+	my ($name, $email, $at, $ct, $subject) = extract_cmt_info($mime);
 	my $path_type = $self->{path_type};
 	my $path;
 	if ($path_type eq '2/38') {
@@ -416,8 +414,8 @@ sub add {
 	}
 
 	print $w "commit $ref\nmark :$commit\n",
-		"author $name <$email> $author_time_raw\n",
-		"committer $self->{ident} $commit_time_raw\n" or wfail;
+		"author $name <$email> $at\n",
+		"committer $self->{ident} $ct\n" or wfail;
 	print $w "data ", (length($subject) + 1), "\n",
 		$subject, "\n\n" or wfail;
 	if ($tip ne '') {
-- 
EW



* [PATCH 03/11] import: switch to "replace_oids" interface for purge
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-09  2:51 UTC
  To: meta

Continuing the work by Eric Biederman in commit a118d58a402bd31b
("Import.pm: When purging replace a purged file with a zero length file"),
we can use a generic OID replacement mechanism to implement
purge.
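The core of the generic mechanism is a map from blob OID to a
reference to the replacement content: a purge maps each OID to a
reference to an empty string, while an edit maps it to the new raw
message. A sketch of how one "M" line from
`git fast-export --no-data` might be turned into an inline
fast-import filemodify command; rewrite_m_line is hypothetical and
the OIDs below are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: given one "M 100644 <oid> <path>" line from
# `git fast-export --no-data`, return the text to feed fast-import.
sub rewrite_m_line {
	my ($line, $replace_map) = @_; # oid => \$raw (octets)
	return $line unless $line =~ /^M 100644 ([a-f0-9]+) (\w+)/;
	my ($oid, $path) = ($1, $2);
	my $sref = $replace_map->{$oid};
	return $line unless defined $sref; # untouched blob, keep as-is
	# fast-import reads exactly $n bytes after the "data" line;
	# length() counts characters, so $$sref must already be octets
	my $n = length($$sref);
	return "M 100644 inline $path\ndata $n\n$$sref\n";
}

my $old = 'a' x 40;
my %purge = ($old => \'');                   # purge: empty blob
my %edit = ($old => \"Subject: redacted\n"); # edit: new content
print rewrite_m_line("M 100644 $old m\n", \%purge);
print rewrite_m_line("M 100644 $old m\n", \%edit);
```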
---
 lib/PublicInbox/Import.pm     | 33 +++++++++++++++++++--------------
 lib/PublicInbox/V2Writable.pm |  6 +++---
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 6ee1935..2c8fe84 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -501,16 +501,16 @@ sub clean_purge_buffer {
 	}
 }
 
-sub purge_oids {
-	my ($self, $purge) = @_;
-	my $tmp = "refs/heads/purge-".((keys %$purge)[0]);
+sub replace_oids {
+	my ($self, $replace) = @_; # oid => raw string
+	my $tmp = "refs/heads/replace-".((keys %$replace)[0]);
 	my $old = $self->{'ref'};
 	my $git = $self->{git};
 	my @export = (qw(fast-export --no-data --use-done-feature), $old);
 	my $rd = $git->popen(@export);
 	my ($r, $w) = $self->gfi_start;
 	my @buf;
-	my $npurge = 0;
+	my $nreplace = 0;
 	my @oids;
 	my ($done, $mark);
 	my $tree = $self->{-tree};
@@ -533,10 +533,13 @@ sub purge_oids {
 		} elsif (/^M 100644 ([a-f0-9]+) (\w+)/) {
 			my ($oid, $path) = ($1, $2);
 			$tree->{$path} = 1;
-			if ($purge->{$oid}) {
+			my $sref = $replace->{$oid};
+			if (defined $sref) {
 				push @oids, $oid;
-				my $cmd = "M 100644 inline $path\ndata 0\n\n";
-				push @buf, $cmd;
+				my $n = length($$sref);
+				push @buf, "M 100644 inline $path\ndata $n\n";
+				push @buf, $$sref; # hope CoW works...
+				push @buf, "\n";
 			} else {
 				push @buf, $_;
 			}
@@ -549,7 +552,7 @@ sub purge_oids {
 				$out =~ s/^/# /sgm;
 				warn "purge rewriting\n", $out, "\n";
 				clean_purge_buffer(\@oids, \@buf);
-				$npurge++;
+				$nreplace++;
 			}
 			$w->print(@buf, "\n") or wfail;
 			@buf = ();
@@ -567,28 +570,30 @@ sub purge_oids {
 		$w->print(@buf) or wfail;
 	}
 	die 'done\n not seen from fast-export' unless $done;
-	chomp(my $cmt = $self->get_mark(":$mark")) if $npurge;
+	chomp(my $cmt = $self->get_mark(":$mark")) if $nreplace;
 	$self->{nchg} = 0; # prevent _update_git_info until update-ref:
 	$self->done;
 	my @git = ('git', "--git-dir=$git->{git_dir}");
 
-	run_die([@git, qw(update-ref), $old, $tmp]) if $npurge;
+	run_die([@git, qw(update-ref), $old, $tmp]) if $nreplace;
 
 	run_die([@git, qw(update-ref -d), $tmp]);
 
-	return if $npurge == 0;
+	return if $nreplace == 0;
 
 	run_die([@git, qw(-c gc.reflogExpire=now gc --prune=all)]);
+
+	# check that old OIDs are gone
 	my $err = 0;
-	foreach my $oid (keys %$purge) {
+	foreach my $oid (keys %$replace) {
 		my @info = $git->check($oid);
 		if (@info) {
-			warn "$oid not purged\n";
+			warn "$oid not replaced\n";
 			$err++;
 		}
 	}
 	_update_git_info($self, 0);
-	die "Failed to purge $err object(s)\n" if $err;
+	die "Failed to replace $err object(s)\n" if $err;
 	$cmt;
 }
 
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index a435814..d6f72b0 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -298,7 +298,7 @@ sub idx_init {
 }
 
 sub purge_oids ($$) {
-	my ($self, $purge) = @_; # $purge = { $object_id => 1, ... }
+	my ($self, $purge) = @_; # $purge = { $object_id => \'', ... }
 	$self->done;
 	my $pfx = "$self->{-inbox}->{mainrepo}/git";
 	my $purges = [];
@@ -313,7 +313,7 @@ sub purge_oids ($$) {
 		-d $git_dir or next;
 		my $git = PublicInbox::Git->new($git_dir);
 		my $im = $self->import_init($git, 0, 1);
-		$purges->[$i] = $im->purge_oids($purge);
+		$purges->[$i] = $im->replace_oids($purge);
 		$im->done;
 	}
 	$purges;
@@ -386,7 +386,7 @@ sub remove_internal ($$$$) {
 			$removed = $smsg;
 			my $oid = $smsg->{blob};
 			if ($purge) {
-				$purge->{$oid} = 1;
+				$purge->{$oid} = \'';
 			} else {
 				($mark, undef) = $im->remove($orig, $cmt_msg);
 			}
-- 
EW



* [PATCH 04/11] v2writable: implement ->replace call
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-09  2:51 UTC
  To: meta

Much of the existing purge code is repurposed into a general
"replace" mechanism.

->purge is simpler because it can just drop the information.
Unlike ->purge, ->replace needs to edit existing git commits (in
case the From: or Subject: headers change) and reindex the
modified message.

We currently disallow editing of the References:, In-Reply-To:
and Message-ID: headers because doing so can cause bad side
effects with our threading (and our lack of rethreading support
to deal with excessive matching from incorrect/invalid
References).
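The restriction can be enforced by comparing the old and new
Message-ID (or References) lists as sets, ignoring order and
duplicates. A stand-alone sketch in the spirit of
_check_mids_match below; same_set is a hypothetical name and it
returns a boolean instead of dying:

```perl
use strict;
use warnings;

# Hypothetical helper: true if both lists contain the same IDs,
# ignoring order and duplicates.
sub same_set {
	my ($old_list, $new_list) = @_;
	my %old = map { $_ => 1 } @$old_list;
	my %new = map { $_ => 1 } @$new_list;
	return 0 if keys(%old) != keys(%new);
	delete @new{keys %old}; # whatever remains was added
	return !%new;
}

my @old = ('replace@example.com');
print same_set(\@old, ['replace@example.com']) ? "ok\n" : "reject\n";
print same_set(\@old, ['replace@example.com', 'extra@example.com'])
	? "ok\n" : "reject\n";
```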
---
 MANIFEST                      |   1 +
 lib/PublicInbox/Import.pm     |  44 +++++---
 lib/PublicInbox/V2Writable.pm | 171 ++++++++++++++++++++++++-----
 t/replace.t                   | 199 ++++++++++++++++++++++++++++++++++
 4 files changed, 372 insertions(+), 43 deletions(-)
 create mode 100644 t/replace.t

diff --git a/MANIFEST b/MANIFEST
index 5085bff..321da03 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -236,6 +236,7 @@ t/psgi_text.t
 t/psgi_v2.t
 t/purge.t
 t/qspawn.t
+t/replace.t
 t/reply.t
 t/search-thr-index.t
 t/search.t
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 2c8fe84..137b2b7 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -484,26 +484,38 @@ sub digest2mid ($$) {
 	"$dt.$b64" . '@z';
 }
 
-sub clean_purge_buffer {
-	my ($oids, $buf) = @_;
-	my $cmt_msg = 'purged '.join(' ',@$oids)."\n";
+sub rewrite_commit ($$$$) {
+	my ($self, $oids, $buf, $mime) = @_;
+	my ($name, $email, $at, $ct, $subject);
+	if ($mime) {
+		($name, $email, $at, $ct, $subject) = extract_cmt_info($mime);
+	} else {
+		$name = $email = '';
+		$subject = 'purged '.join(' ', @$oids);
+	}
 	@$oids = ();
-
+	$subject .= "\n";
 	foreach my $i (0..$#$buf) {
 		my $l = $buf->[$i];
 		if ($l =~ /^author .* ([0-9]+ [\+-]?[0-9]+)$/) {
-			$buf->[$i] = "author <> $1\n";
+			$at //= $1;
+			$buf->[$i] = "author $name <$email> $at\n";
+		} elsif ($l =~ /^committer .* ([0-9]+ [\+-]?[0-9]+)$/) {
+			$ct //= $1;
+			$buf->[$i] = "committer $self->{ident} $ct\n";
 		} elsif ($l =~ /^data ([0-9]+)/) {
-			$buf->[$i++] = "data " . length($cmt_msg) . "\n";
-			$buf->[$i] = $cmt_msg;
+			$buf->[$i++] = "data " . length($subject) . "\n";
+			$buf->[$i] = $subject;
 			last;
 		}
 	}
 }
 
+# returns the new commit OID if a replacement was done
+# returns undef if nothing was done
 sub replace_oids {
-	my ($self, $replace) = @_; # oid => raw string
-	my $tmp = "refs/heads/replace-".((keys %$replace)[0]);
+	my ($self, $mime, $replace_map) = @_; # oid => raw string
+	my $tmp = "refs/heads/replace-".((keys %$replace_map)[0]);
 	my $old = $self->{'ref'};
 	my $git = $self->{git};
 	my @export = (qw(fast-export --no-data --use-done-feature), $old);
@@ -533,7 +545,7 @@ sub replace_oids {
 		} elsif (/^M 100644 ([a-f0-9]+) (\w+)/) {
 			my ($oid, $path) = ($1, $2);
 			$tree->{$path} = 1;
-			my $sref = $replace->{$oid};
+			my $sref = $replace_map->{$oid};
 			if (defined $sref) {
 				push @oids, $oid;
 				my $n = length($$sref);
@@ -548,10 +560,12 @@ sub replace_oids {
 			push @buf, $_ if $tree->{$path};
 		} elsif ($_ eq "\n") {
 			if (@oids) {
-				my $out = join('', @buf);
-				$out =~ s/^/# /sgm;
-				warn "purge rewriting\n", $out, "\n";
-				clean_purge_buffer(\@oids, \@buf);
+				if (!$mime) {
+					my $out = join('', @buf);
+					$out =~ s/^/# /sgm;
+					warn "purge rewriting\n", $out, "\n";
+				}
+				rewrite_commit($self, \@oids, \@buf, $mime);
 				$nreplace++;
 			}
 			$w->print(@buf, "\n") or wfail;
@@ -585,7 +599,7 @@ sub replace_oids {
 
 	# check that old OIDs are gone
 	my $err = 0;
-	foreach my $oid (keys %$replace) {
+	foreach my $oid (keys %$replace_map) {
 		my @info = $git->check($oid);
 		if (@info) {
 			warn "$oid not replaced\n";
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index d6f72b0..3484807 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -11,7 +11,7 @@ use PublicInbox::SearchIdxPart;
 use PublicInbox::MIME;
 use PublicInbox::Git;
 use PublicInbox::Import;
-use PublicInbox::MID qw(mids);
+use PublicInbox::MID qw(mids references);
 use PublicInbox::ContentId qw(content_id content_digest);
 use PublicInbox::Inbox;
 use PublicInbox::OverIdx;
@@ -297,26 +297,30 @@ sub idx_init {
 	});
 }
 
-sub purge_oids ($$) {
-	my ($self, $purge) = @_; # $purge = { $object_id => \'', ... }
+# returns an array mapping [ epoch => latest_commit ]
+# latest_commit may be undef if nothing was done to that epoch
+# $replace_map = { $object_id => $strref, ... }
+sub _replace_oids ($$$) {
+	my ($self, $mime, $replace_map) = @_;
 	$self->done;
 	my $pfx = "$self->{-inbox}->{mainrepo}/git";
-	my $purges = [];
+	my $rewrites = []; # epoch => commit
 	my $max = $self->{epoch_max};
 
 	unless (defined($max)) {
 		defined(my $latest = git_dir_latest($self, \$max)) or return;
 		$self->{epoch_max} = $max;
 	}
+
 	foreach my $i (0..$max) {
 		my $git_dir = "$pfx/$i.git";
 		-d $git_dir or next;
 		my $git = PublicInbox::Git->new($git_dir);
 		my $im = $self->import_init($git, 0, 1);
-		$purges->[$i] = $im->replace_oids($purge);
+		$rewrites->[$i] = $im->replace_oids($mime, $replace_map);
 		$im->done;
 	}
-	$purges;
+	$rewrites;
 }
 
 sub content_ids ($) {
@@ -339,25 +343,31 @@ sub content_matches ($$) {
 	0
 }
 
-sub remove_internal ($$$$) {
-	my ($self, $mime, $cmt_msg, $purge) = @_;
+# used for removing or replacing (purging)
+sub rewrite_internal ($$;$$$) {
+	my ($self, $old_mime, $cmt_msg, $new_mime, $sref) = @_;
 	$self->idx_init;
-	my $im = $self->importer unless $purge;
+	my ($im, $need_reindex, $replace_map);
+	if ($sref) {
+		$replace_map = {}; # oid => sref
+		$need_reindex = [] if $new_mime;
+	} else {
+		$im = $self->importer;
+	}
 	my $over = $self->{over};
-	my $cids = content_ids($mime);
+	my $cids = content_ids($old_mime);
 	my $parts = $self->{idx_parts};
-	my $mm = $self->{mm};
 	my $removed;
-	my $mids = mids($mime->header_obj);
+	my $mids = mids($old_mime->header_obj);
 
 	# We avoid introducing new blobs into git since the raw content
 	# can be slightly different, so we do not need the user-supplied
 	# message now that we have the mids and content_id
-	$mime = undef;
+	$old_mime = undef;
 	my $mark;
 
 	foreach my $mid (@$mids) {
-		my %gone;
+		my %gone; # num => [ smsg, raw ]
 		my ($id, $prev);
 		while (my $smsg = $over->next_by_mid($mid, \$id, \$prev)) {
 			my $msg = get_blob($self, $smsg);
@@ -380,17 +390,21 @@ sub remove_internal ($$$$) {
 		}
 		foreach my $num (keys %gone) {
 			my ($smsg, $orig) = @{$gone{$num}};
-			$mm->num_delete($num);
 			# $removed should only be set once assuming
 			# no bugs in our deduplication code:
 			$removed = $smsg;
 			my $oid = $smsg->{blob};
-			if ($purge) {
-				$purge->{$oid} = \'';
+			if ($replace_map) {
+				$replace_map->{$oid} = $sref;
 			} else {
 				($mark, undef) = $im->remove($orig, $cmt_msg);
 			}
 			$orig = undef;
+			if ($need_reindex) { # ->replace
+				push @$need_reindex, $smsg;
+			} else { # ->purge or ->remove
+				$self->{mm}->num_delete($num);
+			}
 			unindex_oid_remote($self, $oid, $mid);
 		}
 	}
@@ -399,8 +413,9 @@ sub remove_internal ($$$$) {
 		my $cmt = $im->get_mark($mark);
 		$self->{last_commit}->[$self->{epoch_max}] = $cmt;
 	}
-	if ($purge && scalar keys %$purge) {
-		return purge_oids($self, $purge);
+	if ($replace_map && scalar keys %$replace_map) {
+		my $rewrites = _replace_oids($self, $new_mime, $replace_map);
+		return { rewrites => $rewrites, need_reindex => $need_reindex };
 	}
 	$removed;
 }
@@ -409,22 +424,122 @@ sub remove_internal ($$$$) {
 sub remove {
 	my ($self, $mime, $cmt_msg) = @_;
 	$self->{-inbox}->with_umask(sub {
-		remove_internal($self, $mime, $cmt_msg, undef);
+		rewrite_internal($self, $mime, $cmt_msg);
 	});
 }
 
+sub _replace ($$;$$) {
+	my ($self, $old_mime, $new_mime, $sref) = @_;
+	my $rewritten = $self->{-inbox}->with_umask(sub {
+		rewrite_internal($self, $old_mime, undef, $new_mime, $sref);
+	}) or return;
+
+	my $rewrites = $rewritten->{rewrites};
+	# ->done is called if there are rewrites since we gc+prune from git
+	$self->idx_init if @$rewrites;
+
+	for my $i (0..$#$rewrites) {
+		defined(my $cmt = $rewrites->[$i]) or next;
+		$self->{last_commit}->[$i] = $cmt;
+	}
+	$rewritten;
+}
+
 # public
 sub purge {
 	my ($self, $mime) = @_;
-	my $purges = $self->{-inbox}->with_umask(sub {
-		remove_internal($self, $mime, undef, {});
-	}) or return;
-	$self->idx_init if @$purges; # ->done is called on purges
-	for my $i (0..$#$purges) {
-		defined(my $cmt = $purges->[$i]) or next;
-		$self->{last_commit}->[$i] = $cmt;
+	my $rewritten = _replace($self, $mime, undef, \'') or return;
+	$rewritten->{rewrites}
+}
+
+# returns the git object_id of $$raw, does not write the object to FS
+sub git_hash_raw ($$) {
+	my ($self, $raw) = @_;
+	# grab the expected OID we have to reindex:
+	open my $tmp_fh, '+>', undef or die "failed to open tmp: $!";
+	$tmp_fh->autoflush(1);
+	print $tmp_fh $$raw or die "print \$tmp_fh: $!";
+	sysseek($tmp_fh, 0, 0) or die "seek failed: $!";
+
+	my ($r, $w);
+	pipe($r, $w) or die "failed to create pipe: $!";
+	my $rdr = { 0 => fileno($tmp_fh), 1 => fileno($w) };
+	my $git_dir = $self->{-inbox}->git->{git_dir};
+	my $cmd = ['git', "--git-dir=$git_dir", qw(hash-object --stdin)];
+	my $pid = spawn($cmd, undef, $rdr);
+	close $w;
+	local $/ = "\n";
+	chomp(my $oid = <$r>);
+	waitpid($pid, 0) == $pid or die "git hash-object did not finish";
+	die "git hash-object failed: $?" if $?;
+	$oid =~ /\A[a-f0-9]{40}\z/ or die "OID not expected: $oid";
+	$oid;
+}
+
+sub _check_mids_match ($$$) {
+	my ($old_list, $new_list, $hdrs) = @_;
+	my %old_mids = map { $_ => 1 } @$old_list;
+	my %new_mids = map { $_ => 1 } @$new_list;
+	my @old = keys %old_mids;
+	my @new = keys %new_mids;
+	my $err = "$hdrs may not be changed when replacing\n";
+	die $err if scalar(@old) != scalar(@new);
+	delete @new_mids{@old};
+	delete @old_mids{@new};
+	die $err if (scalar(keys %old_mids) || scalar(keys %new_mids));
+}
+
+# Changing Message-IDs or References with ->replace isn't supported.
+# The rules for dealing with messages with multiple or conflicting
+# Message-IDs are pretty complex and rethreading hasn't been fully
+# implemented, yet.
+sub check_mids_match ($$) {
+	my ($old_mime, $new_mime) = @_;
+	my $old = $old_mime->header_obj;
+	my $new = $new_mime->header_obj;
+	_check_mids_match(mids($old), mids($new), 'Message-ID(s)');
+	_check_mids_match(references($old), references($new),
+			'References/In-Reply-To');
+}
+
+# public
+sub replace ($$$) {
+	my ($self, $old_mime, $new_mime) = @_;
+
+	check_mids_match($old_mime, $new_mime);
+
+	# mutt will always add Content-Length:, Status:, Lines: when editing
+	PublicInbox::Import::drop_unwanted_headers($new_mime);
+
+	my $raw = $new_mime->as_string;
+	my $expect_oid = git_hash_raw($self, \$raw);
+	my $rewritten = _replace($self, $old_mime, $new_mime, \$raw) or return;
+	my $need_reindex = $rewritten->{need_reindex};
+
+	# just in case we have bugs in deduplication code:
+	my $n = scalar(@$need_reindex);
+	if ($n > 1) {
+		my $list = join(', ', map {
+					"$_->{num}: <$_->{mid}>"
+				} @$need_reindex);
+		warn <<"";
+W: rewritten $n messages matching content of original message (expected: 1).
+W: possible bug in public-inbox, NNTP article IDs and Message-IDs follow:
+W: $list
+
+	}
+
+	# make sure we really got the OID:
+	my ($oid, $type, $len) = $self->{-inbox}->git->check($expect_oid);
+	$oid eq $expect_oid or die "BUG: $expect_oid not found after replace";
+
+	# reindex modified messages:
+	for my $smsg (@$need_reindex) {
+		my $num = $smsg->{num};
+		my $mid0 = $smsg->{mid};
+		do_idx($self, \$raw, $new_mime, $len, $num, $oid, $mid0);
 	}
-	$purges;
+	$rewritten->{rewrites};
 }
 
 sub last_commit_part ($$;$) {
diff --git a/t/replace.t b/t/replace.t
new file mode 100644
index 0000000..6fae551
--- /dev/null
+++ b/t/replace.t
@@ -0,0 +1,199 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Test::More;
+use PublicInbox::MIME;
+use PublicInbox::InboxWritable;
+use File::Temp qw/tempdir/;
+require './t/common.perl';
+require_git(2.6); # replace is v2 only, for now...
+foreach my $mod (qw(DBD::SQLite)) {
+	eval "require $mod";
+	plan skip_all => "$mod missing for $0" if $@;
+}
+
+sub test_replace ($$$) {
+	my ($v, $level, $opt) = @_;
+	diag "v$v $level replace";
+	my $this = "pi-$v-$level-replace";
+	my $tmpdir = tempdir("$this-tmp-XXXXXX", TMPDIR => 1, CLEANUP => 1);
+	my $ibx = PublicInbox::Inbox->new({
+		mainrepo => "$tmpdir/testbox",
+		name => $this,
+		version => $v,
+		-primary_address => 'test@example.com',
+		indexlevel => $level,
+	});
+
+	my $orig = PublicInbox::MIME->new(<<'EOF');
+From: Barbra Streisand <effect@example.com>
+To: test@example.com
+Subject: confidential
+Message-ID: <replace@example.com>
+Date: Fri, 02 Oct 1993 00:00:00 +0000
+
+Top secret info about my house in Malibu...
+EOF
+	my $im = PublicInbox::InboxWritable->new($ibx, {nproc=>1})->importer;
+	# fake a bunch of epochs
+	$im->{rotate_bytes} = $opt->{rotate_bytes} if $opt->{rotate_bytes};
+
+	if ($opt->{pre}) {
+		$opt->{pre}->($im, 1, 2);
+		$orig->header_set('References', '<1@example.com>');
+	}
+	ok($im->add($orig), 'add message to be replaced');
+	if ($opt->{post}) {
+		$opt->{post}->($im, 3, { 4 => 'replace@example.com' });
+	}
+	$im->done;
+	my $thread_a = $ibx->over->get_thread('replace@example.com');
+
+	my %before = map {; delete($_->{blob}) => $_ } @{$ibx->recent};
+	my $reject = PublicInbox::MIME->new($orig->as_string);
+	foreach my $mid (['<replace@example.com>', '<extra@example.com>'],
+				[], ['<replaced@example.com>']) {
+		$reject->header_set('Message-ID', @$mid);
+		my $ok = eval { $im->replace($orig, $reject) };
+		like($@, qr/Message-ID.*may not be changed/,
+			'->replace died on Message-ID change');
+		ok(!$ok, 'no replacement happened');
+	}
+
+	# prepare the replacement
+	my $expect = "Move along, nothing to see here\n";
+	my $repl = PublicInbox::MIME->new($orig->as_string);
+	$repl->header_set('From', '<redactor@example.com>');
+	$repl->header_set('Subject', 'redacted');
+	$repl->header_set('Date', 'Sat, 02 Oct 2010 00:00:00 +0000');
+	$repl->body_str_set($expect);
+
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	ok(my $cmts = $im->replace($orig, $repl), 'replaced message');
+	my $changed_epochs = 0;
+	for my $tip (@$cmts) {
+		next if !defined $tip;
+		$changed_epochs++;
+		like($tip, qr/\A[a-f0-9]{40}\z/,
+			'replace returned current commit');
+	}
+	is($changed_epochs, 1, 'only one epoch changed');
+
+	$im->done;
+	my $m = PublicInbox::MIME->new($ibx->msg_by_mid('replace@example.com'));
+	is($m->body, $expect, 'replaced message');
+	is_deeply(\@warn, [], 'no warnings on noop');
+
+	my @cat = qw(cat-file --buffer --batch --batch-all-objects);
+	my $git = $ibx->git;
+	my @all = $git->qx(@cat);
+	is_deeply([grep(/confidential/, @all)], [], 'nothing confidential');
+	is_deeply([grep(/Streisand/, @all)], [], 'Streisand who?');
+	is_deeply([grep(/\bOct 1993\b/, @all)], [], 'nothing from Oct 1993');
+	my $t19931002 = qr/ 749520000 /;
+	is_deeply([grep(/$t19931002/, @all)], [], "nothing matches $t19931002");
+
+	for my $dir (glob("$ibx->{mainrepo}/git/*.git")) {
+		my ($bn) = ($dir =~ m!([^/]+)\z!);
+		is(system(qw(git --git-dir), $dir, qw(fsck --strict)), 0,
+			"git fsck is clean in epoch $bn");
+	}
+
+	my $thread_b = $ibx->over->get_thread('replace@example.com');
+	is_deeply([sort map { $_->{mid} } @$thread_b],
+		[sort map { $_->{mid} } @$thread_a], 'threading preserved');
+
+	if (my $srch = $ibx->search) {
+		for my $q ('f:streisand', 's:confidential', 'malibu') {
+			my $msgs = $srch->query($q);
+			is_deeply($msgs, [], "no match for $q");
+		}
+		my @ok = ('f:redactor', 's:redacted', 'nothing to see');
+		if ($opt->{pre}) {
+			push @ok, 'm:1@example.com', 'm:2@example.com',
+				's:message2', 's:message1';
+		}
+		if ($opt->{post}) {
+			push @ok, 'm:3@example.com', 'm:4@example.com',
+				's:message3', 's:message4';
+		}
+		for my $q (@ok) {
+			my $msgs = $srch->query($q);
+			ok($msgs->[0], "got match for $q");
+		}
+	}
+
+	# check overview matches:
+	my %after = map {; delete($_->{blob}) => $_ } @{$ibx->recent};
+	my @before_blobs = keys %before;
+	foreach my $blob (@before_blobs) {
+		delete $before{$blob} if delete $after{$blob};
+	}
+
+	is(scalar keys %before, 1, 'one unique blob from before left');
+	is(scalar keys %after, 1, 'one unique blob from after left');
+	foreach my $blob (keys %before) {
+		is($git->check($blob), undef, 'old blob not found');
+		my $smsg = $before{$blob};
+		is($smsg->{subject}, 'confidential', 'before subject');
+		is($smsg->{mid}, 'replace@example.com', 'before MID');
+	}
+	foreach my $blob (keys %after) {
+		ok($git->check($blob), 'new blob found');
+		my $smsg = $after{$blob};
+		is($smsg->{subject}, 'redacted', 'after subject');
+		is($smsg->{mid}, 'replace@example.com', 'after MID');
+	}
+	@warn = ();
+	is($im->replace($orig, $repl), undef, 'no-op replace returns undef');
+	is($im->purge($orig), undef, 'no-op purge returns undef');
+	is_deeply(\@warn, [], 'no warnings on noop');
+}
+
+sub pad_msgs {
+	my ($im, @range) = @_;
+	for my $i (@range) {
+		my $irt;
+		if (ref($i) eq 'HASH') {
+			($i, $irt) = each %$i;
+		}
+		my $sec = sprintf('%02d', $i);
+		my $mime = PublicInbox::MIME->new(<<EOF);
+From: foo\@example.com
+To: test\@example.com
+Message-ID: <$i\@example.com>
+Date: Fri, 02, Jan 1970 00:00:$sec +0000
+Subject: message$i
+
+message number$i
+EOF
+
+		if (defined($irt)) {
+			$mime->header_set('References', "<$irt>");
+		}
+
+		$im->add($mime);
+	}
+}
+
+my $opt = { pre => *pad_msgs };
+test_replace(2, 'basic', {});
+test_replace(2, 'basic', $opt);
+test_replace(2, 'basic', $opt = { %$opt, post => *pad_msgs });
+test_replace(2, 'basic', $opt = { %$opt, rotate_bytes => 1 });
+
+SKIP: if ('test xapian') {
+	require PublicInbox::Search;
+	PublicInbox::Search::load_xapian() or skip 'Search::Xapian missing', 8;
+	for my $l (qw(medium)) {
+		test_replace(2, $l, {});
+		$opt = { pre => *pad_msgs };
+		test_replace(2, $l, $opt);
+		test_replace(2, $l, $opt = { %$opt, post => *pad_msgs });
+		test_replace(2, $l, $opt = { %$opt, rotate_bytes => 1 });
+	}
+};
+
+done_testing();
-- 
EW

* [PATCH 05/11] admin: remove warning arg for unconfigured inboxes
  2019-06-09  2:51 [PATCH 00/11] v2: implement message editing Eric Wong (Contractor, The Linux Foundation)
                   ` (3 preceding siblings ...)
  2019-06-09  2:51 ` [PATCH 04/11] v2writable: implement ->replace call Eric Wong (Contractor, The Linux Foundation)
@ 2019-06-09  2:51 ` Eric Wong (Contractor, The Linux Foundation)
  2019-06-09  2:51 ` [PATCH 06/11] purge: start moving common options to AdminEdit module Eric Wong (Contractor, The Linux Foundation)
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-09  2:51 UTC (permalink / raw)
  To: meta

We no longer make -index warn about unconfigured inboxes, and no
other code uses this argument.  Working on unconfigured inboxes is
totally reasonable for admins who are still setting things up.
---
 lib/PublicInbox/Admin.pm | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index 4a862c6..419cb35 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -67,7 +67,7 @@ $ibx->{mainrepo} has unexpected indexlevel in Xapian: $m
 }
 
 sub resolve_inboxes {
-	my ($argv, $warn_on_unconfigured) = @_;
+	my ($argv) = @_;
 	require PublicInbox::Config;
 	require PublicInbox::Inbox;
 
@@ -80,9 +80,6 @@ sub resolve_inboxes {
 			my ($ibx) = @_;
 			$dir2ibx{abs_path($ibx->{mainrepo})} = $ibx;
 		});
-	} elsif ($warn_on_unconfigured) {
-		# do we really care about this?  It's annoying...
-		warn $warn_on_unconfigured, "\n";
 	}
 	for my $i (0..$#ibxs) {
 		my $dir = $ibxs[$i];
-- 
EW

* [PATCH 06/11] purge: start moving common options to AdminEdit module
  2019-06-09  2:51 [PATCH 00/11] v2: implement message editing Eric Wong (Contractor, The Linux Foundation)
                   ` (4 preceding siblings ...)
  2019-06-09  2:51 ` [PATCH 05/11] admin: remove warning arg for unconfigured inboxes Eric Wong (Contractor, The Linux Foundation)
@ 2019-06-09  2:51 ` Eric Wong (Contractor, The Linux Foundation)
  2019-06-09  2:51 ` [PATCH 07/11] admin: beef up resolve_inboxes to handle purge options Eric Wong (Contractor, The Linux Foundation)
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-09  2:51 UTC (permalink / raw)
  To: meta

Editing and purging are similar operations involving history
rewrites, so there'll be common options and code between them.
---
 MANIFEST                     |  1 +
 lib/PublicInbox/AdminEdit.pm | 11 +++++++++++
 script/public-inbox-purge    | 25 ++++++++++---------------
 3 files changed, 22 insertions(+), 15 deletions(-)
 create mode 100644 lib/PublicInbox/AdminEdit.pm
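A minimal sketch of the option-handling pattern this patch adopts: the option specs live in a shared package array, and Getopt::Long stores results in a hashref pre-seeded with defaults. It uses `GetOptionsFromArray` instead of `GetOptions` only so a fixed argument list can be parsed; `@OPT` here is a local copy of the specs and `parse_admin_opts` is a hypothetical helper, not part of public-inbox.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Same parser configuration that public-inbox-purge uses.
Getopt::Long::Configure(qw(gnu_getopt no_ignore_case auto_abbrev));

# Local copy of the shared specs (cf. @PublicInbox::AdminEdit::OPT).
our @OPT = qw(all force|f verbose|v!);

# Hypothetical helper: parse a list into a hashref seeded with defaults.
sub parse_admin_opts {
	my (@argv) = @_;
	my $opt = { verbose => 1 };    # on unless --no-verbose is given
	GetOptionsFromArray(\@argv, $opt, @OPT)
		or die "bad command-line args\n";
	return ($opt, \@argv);         # non-option args (inbox dirs) remain
}

my ($opt, $dirs) = parse_admin_opts(qw(-f --no-verbose /path/to/inbox));
print "force=$opt->{force} verbose=$opt->{verbose} dirs=@$dirs\n";
```

Keys are stored under the primary option name (`force`, not `f`), which is why the script can test `$opt->{force}` regardless of which spelling was used.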

diff --git a/MANIFEST b/MANIFEST
index 321da03..dcf1a60 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -68,6 +68,7 @@ examples/unsubscribe.psgi
 examples/varnish-4.vcl
 lib/PublicInbox/Address.pm
 lib/PublicInbox/Admin.pm
+lib/PublicInbox/AdminEdit.pm
 lib/PublicInbox/AltId.pm
 lib/PublicInbox/Cgit.pm
 lib/PublicInbox/Config.pm
diff --git a/lib/PublicInbox/AdminEdit.pm b/lib/PublicInbox/AdminEdit.pm
new file mode 100644
index 0000000..109a99f
--- /dev/null
+++ b/lib/PublicInbox/AdminEdit.pm
@@ -0,0 +1,11 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# common stuff between -edit, -purge (and maybe -learn in the future)
+package PublicInbox::AdminEdit;
+use strict;
+use warnings;
+use PublicInbox::Admin;
+our @OPT = qw(all force|f verbose|v!);
+
+1;
diff --git a/script/public-inbox-purge b/script/public-inbox-purge
index 25e6cc9..d58a9ba 100755
--- a/script/public-inbox-purge
+++ b/script/public-inbox-purge
@@ -7,7 +7,7 @@
 use strict;
 use warnings;
 use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
-use PublicInbox::Admin qw(resolve_repo_dir);
+use PublicInbox::AdminEdit;
 PublicInbox::Admin::check_require('-index');
 require PublicInbox::Filter::Base;
 require PublicInbox::Config;
@@ -19,25 +19,20 @@ require PublicInbox::V2Writable;
 my $usage = "$0 [--all] [INBOX_DIRS] </path/to/message";
 my $config = eval { PublicInbox::Config->new };
 my $cfgfile = PublicInbox::Config::default_file();
-my ($all, $force);
-my $verbose = 1;
-my %opts = (
-	'all' => \$all,
-	'force|f' => \$force,
-	'verbose|v!' => \$verbose,
-);
-GetOptions(%opts) or die "bad command-line args\n", $usage, "\n";
+my $opt = { verbose => 1 };
+GetOptions($opt, @PublicInbox::AdminEdit::OPT) or
+	die "bad command-line args\n$usage\n";
 
 # TODO: clean this up and share code with -index via ::Admin
 my %dir2ibx; # ( path => Inbox object )
 my @inboxes;
 $config and $config->each_inbox(sub {
 	my ($ibx) = @_;
-	push @inboxes, $ibx if $all && $ibx->{version} != 1;
+	push @inboxes, $ibx if $opt->{all} && $ibx->{version} != 1;
 	$dir2ibx{$ibx->{mainrepo}} = $ibx;
 });
 
-if ($all) {
+if ($opt->{all}) {
 	$config or die "--all specified, but $cfgfile not readable\n";
 	@ARGV and die "--all specified, but directories specified\n";
 } else {
@@ -47,7 +42,7 @@ if ($all) {
 
 	foreach my $dir (@dirs) {
 		my $v;
-		my $dir = resolve_repo_dir($dir, \$v);
+		my $dir = PublicInbox::Admin::resolve_repo_dir($dir, \$v);
 		if ($v == 1) {
 			push @err, $dir;
 			next;
@@ -127,7 +122,7 @@ foreach my $ibx (@inboxes) {
 
 	$v2w->done;
 
-	if ($verbose) { # should we consider this machine-parseable?
+	if ($opt->{verbose}) { # should we consider this machine-parseable?
 		print "$ibx->{mainrepo}:";
 		if (scalar @$commits) {
 			print join("\n\t", '', @$commits), "\n";
@@ -139,7 +134,7 @@ foreach my $ibx (@inboxes) {
 }
 
 # behave like "rm -f"
-exit(0) if ($force || $n_purged);
+exit(0) if ($opt->{force} || $n_purged);
 
-warn "Not found\n" if $verbose;
+warn "Not found\n" if $opt->{verbose};
 exit(1);
-- 
EW

* [PATCH 07/11] admin: beef up resolve_inboxes to handle purge options
  2019-06-09  2:51 [PATCH 00/11] v2: implement message editing Eric Wong (Contractor, The Linux Foundation)
                   ` (5 preceding siblings ...)
  2019-06-09  2:51 ` [PATCH 06/11] purge: start moving common options to AdminEdit module Eric Wong (Contractor, The Linux Foundation)
@ 2019-06-09  2:51 ` Eric Wong (Contractor, The Linux Foundation)
  2019-06-09  2:51 ` [PATCH 08/11] AdminEdit: move editability checks from -purge Eric Wong (Contractor, The Linux Foundation)
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-09  2:51 UTC (permalink / raw)
  To: meta

We'll be using this in -edit, and maybe in other admin-oriented
tools, for UI consistency.
---
 lib/PublicInbox/Admin.pm  | 65 +++++++++++++++++++++++++++++----------
 script/public-inbox-purge | 53 +++----------------------------
 2 files changed, 52 insertions(+), 66 deletions(-)
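A stripped-down sketch of the version filter that `resolve_inboxes` now applies via `-min_inbox_version`: too-old directories are collected and reported in one error instead of failing on the first. The `%version_of` table is a hypothetical stand-in for what `resolve_repo_dir($dir, \$v)` reports, and the error text is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for resolve_repo_dir(): dir => inbox version.
my %version_of = ('/srv/a' => 2, '/srv/b' => 1, '/srv/c' => 2);

# Return the dirs meeting $min_ver; die listing every too-old dir.
sub filter_by_version {
	my ($min_ver, @dirs) = @_;
	my (@old, @ok);
	for my $dir (@dirs) {
		# assumption: unknown dirs are treated as v1
		my $v = $version_of{$dir} // 1;
		if ($v < $min_ver) {
			push @old, $dir;
			next;
		}
		push @ok, $dir;
	}
	die "inboxes older than v$min_ver not supported\n\t",
	    join("\n\t", @old), "\n" if @old;
	return @ok;
}

print join(' ', filter_by_version(1, sort keys %version_of)), "\n";
eval { filter_by_version(2, sort keys %version_of) };
print "error: $@";
```

With a minimum of 1 every directory passes; with a minimum of 2 only `/srv/b` is listed in the error, mirroring how -purge rejects v1 inboxes.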

diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index 419cb35..cebd144 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -66,33 +66,64 @@ $ibx->{mainrepo} has unexpected indexlevel in Xapian: $m
 	$l;
 }
 
-sub resolve_inboxes {
-	my ($argv) = @_;
+sub unconfigured_ibx ($$) {
+	my ($dir, $i) = @_;
+	my $name = "unconfigured-$i";
+	PublicInbox::Inbox->new({
+		name => $name,
+		address => [ "$name\@example.com" ],
+		mainrepo => $dir,
+		# TODO: consumers may want to warn on this:
+		#-unconfigured => 1,
+	});
+}
+
+sub resolve_inboxes ($;$) {
+	my ($argv, $opt) = @_;
 	require PublicInbox::Config;
 	require PublicInbox::Inbox;
+	$opt ||= {};
 
-	my @ibxs = map { resolve_repo_dir($_) } @$argv;
-	push(@ibxs, resolve_repo_dir()) unless @ibxs;
+	my $config = eval { PublicInbox::Config->new };
+	if ($opt->{all}) {
+		my $cfgfile = PublicInbox::Config::default_file();
+		$config or die "--all specified, but $cfgfile not readable\n";
+		@$argv and die "--all specified, but directories specified\n";
+	}
 
+	my $min_ver = $opt->{-min_inbox_version} || 0;
+	my (@old, @ibxs);
 	my %dir2ibx;
-	if (my $config = eval { PublicInbox::Config->new }) {
+	if ($config) {
 		$config->each_inbox(sub {
 			my ($ibx) = @_;
+			$ibx->{version} ||= 1;
 			$dir2ibx{abs_path($ibx->{mainrepo})} = $ibx;
 		});
 	}
-	for my $i (0..$#ibxs) {
-		my $dir = $ibxs[$i];
-		$ibxs[$i] = $dir2ibx{$dir} ||= do {
-			my $name = "unconfigured-$i";
-			PublicInbox::Inbox->new({
-				name => $name,
-				address => [ "$name\@example.com" ],
-				mainrepo => $dir,
-				# TODO: consumers may want to warn on this:
-				#-unconfigured => 1,
-			});
-		};
+	if ($opt->{all}) {
+		my @all = values %dir2ibx;
+		@all = grep { $_->{version} >= $min_ver } @all;
+		push @ibxs, @all;
+	} else { # directories specified on the command-line
+		my $i = 0;
+		my @dirs = @$argv;
+		push @dirs, '.' unless @dirs;
+		foreach (@dirs) {
+			my $v;
+			my $dir = resolve_repo_dir($_, \$v);
+			if ($v < $min_ver) {
+				push @old, $dir;
+				next;
+			}
+			my $ibx = $dir2ibx{$dir} ||= unconfigured_ibx($dir, $i);
+			$i++;
+			push @ibxs, $ibx;
+		}
+	}
+	if (@old) {
+		die "inboxes older than v$min_ver not supported by $0\n\t",
+		    join("\n\t", @old), "\n";
 	}
 	@ibxs;
 }
diff --git a/script/public-inbox-purge b/script/public-inbox-purge
index d58a9ba..dc7f89d 100755
--- a/script/public-inbox-purge
+++ b/script/public-inbox-purge
@@ -10,64 +10,19 @@ use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
 use PublicInbox::AdminEdit;
 PublicInbox::Admin::check_require('-index');
 require PublicInbox::Filter::Base;
-require PublicInbox::Config;
 require PublicInbox::MIME;
 require PublicInbox::V2Writable;
 
 { no warnings 'once'; *REJECT = *PublicInbox::Filter::Base::REJECT }
 
 my $usage = "$0 [--all] [INBOX_DIRS] </path/to/message";
-my $config = eval { PublicInbox::Config->new };
-my $cfgfile = PublicInbox::Config::default_file();
-my $opt = { verbose => 1 };
+my $opt = { verbose => 1, all => 0, -min_inbox_version => 2 };
 GetOptions($opt, @PublicInbox::AdminEdit::OPT) or
 	die "bad command-line args\n$usage\n";
 
-# TODO: clean this up and share code with -index via ::Admin
-my %dir2ibx; # ( path => Inbox object )
-my @inboxes;
-$config and $config->each_inbox(sub {
-	my ($ibx) = @_;
-	push @inboxes, $ibx if $opt->{all} && $ibx->{version} != 1;
-	$dir2ibx{$ibx->{mainrepo}} = $ibx;
-});
-
-if ($opt->{all}) {
-	$config or die "--all specified, but $cfgfile not readable\n";
-	@ARGV and die "--all specified, but directories specified\n";
-} else {
-	my @err;
-	my @dirs = scalar(@ARGV) ? @ARGV : ('.');
-	my $u = 0;
-
-	foreach my $dir (@dirs) {
-		my $v;
-		my $dir = PublicInbox::Admin::resolve_repo_dir($dir, \$v);
-		if ($v == 1) {
-			push @err, $dir;
-			next;
-		}
-		my $ibx = $dir2ibx{$dir} ||= do {
-			warn "$dir not configured in $cfgfile\n";
-			$u++;
-			my $name = "unconfigured-$u";
-			PublicInbox::Inbox->new({
-				version => 2,
-				name => $name,
-				-primary_address => "$name\@example.com",
-				mainrepo => $dir,
-			});
-		};
-		push @inboxes, $ibx;
-	}
-
-	if (@err) {
-		die "v1 inboxes currently not supported by -purge\n\t",
-		    join("\n\t", @err), "\n";
-	}
-}
+my @ibxs = PublicInbox::Admin::resolve_inboxes(\@ARGV, $opt);
 
-foreach my $ibx (@inboxes) {
+foreach my $ibx (@ibxs) {
 	my $lvl = $ibx->{indexlevel};
 	if (defined $lvl) {
 		PublicInbox::Admin::indexlevel_ok_or_die($lvl);
@@ -105,7 +60,7 @@ my $data = do { local $/; scalar <STDIN> };
 $data =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
 my $n_purged = 0;
 
-foreach my $ibx (@inboxes) {
+foreach my $ibx (@ibxs) {
 	my $mime = PublicInbox::MIME->new($data);
 	my $v2w = PublicInbox::V2Writable->new($ibx, 0);
 
-- 
EW

* [PATCH 08/11] AdminEdit: move editability checks from -purge
  2019-06-09  2:51 [PATCH 00/11] v2: implement message editing Eric Wong (Contractor, The Linux Foundation)
                   ` (6 preceding siblings ...)
  2019-06-09  2:51 ` [PATCH 07/11] admin: beef up resolve_inboxes to handle purge options Eric Wong (Contractor, The Linux Foundation)
@ 2019-06-09  2:51 ` Eric Wong (Contractor, The Linux Foundation)
  2019-06-09  2:51 ` [PATCH 09/11] admin: expose ->config Eric Wong (Contractor, The Linux Foundation)
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-09  2:51 UTC (permalink / raw)
  To: meta

We'll be reusing the same logic for -edit.
---
 lib/PublicInbox/AdminEdit.pm | 39 ++++++++++++++++++++++++++++++++++++
 script/public-inbox-purge    | 35 +-------------------------------
 2 files changed, 40 insertions(+), 34 deletions(-)
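The fallback branch of `check_editable` decides whether Xapian support is still required by counting numbered partition directories that hold at least one non-empty file. A self-contained sketch of just that scan, run against a throwaway directory; the `0`, `1` and `skip` subdirectories and the `data` file names are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Count numbered partition dirs (xdir/0, xdir/1, ...) that contain data,
# mirroring the glob/-s loop in check_editable.
sub count_nonempty_parts {
	my ($xdir) = @_;
	my $npart = 0;
	foreach my $part (glob("$xdir/*")) {
		next unless -d $part && $part =~ m!/[0-9]+\z!;
		my $bytes = 0;
		$bytes += (-s $_ // 0) foreach glob("$part/*");
		$npart++ if $bytes;
	}
	return $npart;
}

my $xdir = tempdir(CLEANUP => 1);
make_path("$xdir/0", "$xdir/1", "$xdir/skip");
# partition 0 gets data; "skip" has data but a non-numeric name
for my $f ("$xdir/0/data", "$xdir/skip/data") {
	open(my $fh, '>', $f) or die "open($f): $!";
	print $fh "data" or die "print($f): $!";
	close($fh) or die "close($f): $!";
}
print count_nonempty_parts($xdir), "\n";
```

Only partition `0` counts: `1` is empty and `skip` fails the numeric-name check. A count of zero is what lets the real code fall back to `indexlevel = basic`.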

diff --git a/lib/PublicInbox/AdminEdit.pm b/lib/PublicInbox/AdminEdit.pm
index 109a99f..b27c831 100644
--- a/lib/PublicInbox/AdminEdit.pm
+++ b/lib/PublicInbox/AdminEdit.pm
@@ -8,4 +8,43 @@ use warnings;
 use PublicInbox::Admin;
 our @OPT = qw(all force|f verbose|v!);
 
+sub check_editable ($) {
+	my ($ibxs) = @_;
+
+	foreach my $ibx (@$ibxs) {
+		my $lvl = $ibx->{indexlevel};
+		if (defined $lvl) {
+			PublicInbox::Admin::indexlevel_ok_or_die($lvl);
+			next;
+		}
+
+		# Undefined indexlevel, so `full'...
+		# Search::Xapian exists and the DB can be read, at least, fine
+		$ibx->search and next;
+
+		# it's possible for a Xapian directory to exist,
+		# but Search::Xapian to go missing/broken.
+		# Make sure it's purged in that case:
+		$ibx->over or die "no over.sqlite3 in $ibx->{mainrepo}\n";
+
+		# $ibx->{search} is populated by $ibx->over call
+		my $xdir_ro = $ibx->{search}->xdir(1);
+		my $npart = 0;
+		foreach my $part (<$xdir_ro/*>) {
+			if (-d $part && $part =~ m!/[0-9]+\z!) {
+				my $bytes = 0;
+				$bytes += -s $_ foreach glob("$part/*");
+				$npart++ if $bytes;
+			}
+		}
+		if ($npart) {
+			PublicInbox::Admin::require_or_die('-search');
+		} else {
+			# somebody could "rm -r" all the Xapian directories;
+			# let them purge the overview, at least
+			$ibx->{indexlevel} ||= 'basic';
+		}
+	}
+}
+
 1;
diff --git a/script/public-inbox-purge b/script/public-inbox-purge
index dc7f89d..846557c 100755
--- a/script/public-inbox-purge
+++ b/script/public-inbox-purge
@@ -21,40 +21,7 @@ GetOptions($opt, @PublicInbox::AdminEdit::OPT) or
 	die "bad command-line args\n$usage\n";
 
 my @ibxs = PublicInbox::Admin::resolve_inboxes(\@ARGV, $opt);
-
-foreach my $ibx (@ibxs) {
-	my $lvl = $ibx->{indexlevel};
-	if (defined $lvl) {
-		PublicInbox::Admin::indexlevel_ok_or_die($lvl);
-		next;
-	}
-
-	# Undefined indexlevel, so `full'...
-	# Search::Xapian exists and the DB can be read, at least, fine
-	$ibx->search and next;
-
-	# it's possible for a Xapian directory to exist, but Search::Xapian
-	# to go missing/broken.  Make sure it's purged in that case:
-	$ibx->over or die "no over.sqlite3 in $ibx->{mainrepo}\n";
-
-	# $ibx->{search} is populated by $ibx->over call
-	my $xdir_ro = $ibx->{search}->xdir(1);
-	my $npart = 0;
-	foreach my $part (<$xdir_ro/*>) {
-		if (-d $part && $part =~ m!/[0-9]+\z!) {
-			my $bytes = 0;
-			$bytes += -s $_ foreach glob("$part/*");
-			$npart++ if $bytes;
-		}
-	}
-	if ($npart) {
-		PublicInbox::Admin::require_or_die('-search');
-	} else {
-		# somebody could "rm -r" all the Xapian directories;
-		# let them purge the overview, at least
-		$ibx->{indexlevel} ||= 'basic';
-	}
-}
+PublicInbox::AdminEdit::check_editable(\@ibxs);
 
 my $data = do { local $/; scalar <STDIN> };
 $data =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
-- 
EW

* [PATCH 09/11] admin: expose ->config
  2019-06-09  2:51 [PATCH 00/11] v2: implement message editing Eric Wong (Contractor, The Linux Foundation)
                   ` (7 preceding siblings ...)
  2019-06-09  2:51 ` [PATCH 08/11] AdminEdit: move editability checks from -purge Eric Wong (Contractor, The Linux Foundation)
@ 2019-06-09  2:51 ` Eric Wong (Contractor, The Linux Foundation)
  2019-06-09  2:51 ` [PATCH 10/11] doc: document the --prune option for -index Eric Wong (Contractor, The Linux Foundation)
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-09  2:51 UTC (permalink / raw)
  To: meta

There's no point in forcing admin programs to reparse the config
themselves, and unlike the WWW code, we won't support multiple
config instances here.
---
 lib/PublicInbox/Admin.pm | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
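The new `config()` accessor caches one PublicInbox::Config object per process using `//=`. Because a failed `eval` yields undef, an unreadable config is retried on the next call, while a successful load is reused. A minimal sketch of the same idiom, with a hypothetical `load_config` standing in for `PublicInbox::Config->new`:

```perl
use strict;
use warnings;

my $CFG;          # process-wide singleton, as in PublicInbox::Admin
my $loads = 0;    # counts how often the loader actually runs
my $fail = 1;     # hypothetical switch: the first load attempt fails

# Hypothetical loader standing in for PublicInbox::Config->new.
sub load_config {
	$loads++;
	die "config not readable\n" if $fail;
	return { 'publicinbox.maileditor' => 'mutt -f' };
}

# //= only assigns when $CFG is undef, so a failed (undef) load is
# retried on the next call, but a successful load happens only once.
sub config { $CFG //= eval { load_config() } }

my $first = config();     # loader dies; eval returns undef
$fail = 0;
my $second = config();    # retried, succeeds
my $third = config();     # cached; loader not called again
printf "first=%s loads=%d same=%d\n",
	defined($first) ? 'ok' : 'undef', $loads, $second == $third ? 1 : 0;
```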

diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index cebd144..8a2f204 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -9,6 +9,8 @@ use warnings;
 use Cwd 'abs_path';
 use base qw(Exporter);
 our @EXPORT_OK = qw(resolve_repo_dir);
+my $CFG; # all the admin stuff is a singleton
+require PublicInbox::Config;
 
 sub resolve_repo_dir {
 	my ($cd, $ver) = @_;
@@ -78,24 +80,25 @@ sub unconfigured_ibx ($$) {
 	});
 }
 
+sub config () { $CFG //= eval { PublicInbox::Config->new } }
+
 sub resolve_inboxes ($;$) {
 	my ($argv, $opt) = @_;
-	require PublicInbox::Config;
 	require PublicInbox::Inbox;
 	$opt ||= {};
 
-	my $config = eval { PublicInbox::Config->new };
+	my $cfg = config();
 	if ($opt->{all}) {
 		my $cfgfile = PublicInbox::Config::default_file();
-		$config or die "--all specified, but $cfgfile not readable\n";
+		$cfg or die "--all specified, but $cfgfile not readable\n";
 		@$argv and die "--all specified, but directories specified\n";
 	}
 
 	my $min_ver = $opt->{-min_inbox_version} || 0;
 	my (@old, @ibxs);
 	my %dir2ibx;
-	if ($config) {
-		$config->each_inbox(sub {
+	if ($cfg) {
+		$cfg->each_inbox(sub {
 			my ($ibx) = @_;
 			$ibx->{version} ||= 1;
 			$dir2ibx{abs_path($ibx->{mainrepo})} = $ibx;
-- 
EW

* [PATCH 10/11] doc: document the --prune option for -index
  2019-06-09  2:51 [PATCH 00/11] v2: implement message editing Eric Wong (Contractor, The Linux Foundation)
                   ` (8 preceding siblings ...)
  2019-06-09  2:51 ` [PATCH 09/11] admin: expose ->config Eric Wong (Contractor, The Linux Foundation)
@ 2019-06-09  2:51 ` Eric Wong (Contractor, The Linux Foundation)
  2019-06-09  2:51 ` [PATCH 11/11] edit: new tool to perform edits Eric Wong (Contractor, The Linux Foundation)
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-09  2:51 UTC (permalink / raw)
  To: meta

We've had it around for a while, but I forgot to document it :x
---
 Documentation/public-inbox-index.pod | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index 6d2a420..7679376 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -42,6 +42,13 @@ Xapian database.
 
 This does not touch the NNTP article number database.
 
+=item --prune
+
+Run L<git-gc(1)> to prune and expire reflogs if discontiguous history
+is detected.  This is intended for use on mirrors after
+L<public-inbox-edit(1)> or L<public-inbox-purge(1)> has been run,
+to ensure edited-out or purged data is expunged from mirrors.
+
 =back
 
 =head1 FILES
-- 
EW

* [PATCH 11/11] edit: new tool to perform edits
  2019-06-09  2:51 [PATCH 00/11] v2: implement message editing Eric Wong (Contractor, The Linux Foundation)
                   ` (9 preceding siblings ...)
  2019-06-09  2:51 ` [PATCH 10/11] doc: document the --prune option for -index Eric Wong (Contractor, The Linux Foundation)
@ 2019-06-09  2:51 ` Eric Wong (Contractor, The Linux Foundation)
  2019-06-10 16:06   ` Konstantin Ryabitsev
  2019-06-10 15:06 ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-09  2:51 UTC (permalink / raw)
  To: meta

This wrapper around V2Writable->replace provides a user interface
for editing messages as single-message mboxes (or as raw text
via $EDITOR).
---
 Documentation/include.mk              |   1 +
 Documentation/public-inbox-config.pod |   4 +
 Documentation/public-inbox-edit.pod   | 109 ++++++++++++
 MANIFEST                              |   3 +
 script/public-inbox-edit              | 233 ++++++++++++++++++++++++++
 t/edit.t                              | 178 ++++++++++++++++++++
 6 files changed, 528 insertions(+)
 create mode 100644 Documentation/public-inbox-edit.pod
 create mode 100755 script/public-inbox-edit
 create mode 100644 t/edit.t
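Unless `--raw` is given, public-inbox-edit hands the editor a one-message mboxrd. It prepends a `From mboxrd@<blob>` separator and adds one `>` to every line matching `^>*From `. After the edit it reverses both steps and checks for a leftover bare `From ` line. A minimal round-trip sketch of those substitutions, using the same regexes as the script; the `deadbeef` blob ID and the sample message are placeholders.

```perl
use strict;
use warnings;

# mboxrd-style quoting: add one '>' to any line that is
# zero or more '>' followed by "From ".
sub mboxrd_escape {
	my ($raw, $oid) = @_;
	(my $body = $raw) =~ s/^(>*From )/>$1/gm;
	return "From mboxrd\@$oid Thu Jan  1 00:00:00 1970\n" . $body;
}

# Reverse: drop the leading separator line, then remove one '>'.
sub mboxrd_unescape {
	my ($mbox) = @_;
	$mbox =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
	# a bare "From " line here suggests an unpurged second message
	warn "extra From line detected\n" if $mbox =~ /^From /m;
	$mbox =~ s/^>(>*From )/$1/gm;
	return $mbox;
}

my $raw = "Subject: hi\n\nFrom here\n>From there\nbye\n";
my $mbox = mboxrd_escape($raw, 'deadbeef');
print $mbox;
print mboxrd_unescape($mbox) eq $raw ? "round-trip ok\n" : "mismatch\n";
```

Because every `^>*From ` line gains exactly one `>`, no body line can be mistaken for a message separator, and stripping one `>` restores the original bytes.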

diff --git a/Documentation/include.mk b/Documentation/include.mk
index b064f29..f5f46d0 100644
--- a/Documentation/include.mk
+++ b/Documentation/include.mk
@@ -32,6 +32,7 @@ podtext = $(PODTEXT) $(PODTEXT_OPTS)
 # MakeMaker only seems to support manpage sections 1 and 3...
 m1 =
 m1 += public-inbox-compact
+m1 += public-inbox-edit
 m1 += public-inbox-httpd
 m1 += public-inbox-index
 m1 += public-inbox-mda
diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index db81bf1..a86132b 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -234,6 +234,10 @@ C<publicinbox.cgitbin>, but may be overridden.
 Default: basename of C<publicinbox.cgitbin>, /var/www/htdocs/cgit/
 or /usr/share/cgit/
 
+=item publicinbox.mailEditor
+
+See L<public-inbox-edit(1)>
+
 =item publicinbox.wwwlisting
 
 Enable a HTML listing style when the root path of the URL '/' is accessed.
diff --git a/Documentation/public-inbox-edit.pod b/Documentation/public-inbox-edit.pod
new file mode 100644
index 0000000..97c7c92
--- /dev/null
+++ b/Documentation/public-inbox-edit.pod
@@ -0,0 +1,109 @@
+=head1 NAME
+
+public-inbox-edit - edit messages in a public inbox
+
+=head1 SYNOPSIS
+
+	public-inbox-edit -m MESSAGE-ID --all|INBOX_DIR
+
+	public-inbox-edit -F RAW_FILE --all|INBOX_DIR [.. INBOX_DIR]
+
+=head1 DESCRIPTION
+
+public-inbox-edit allows editing messages in a given inbox
+to remove sensitive information.  It is only intended as a
+last resort, as it will cause discontiguous git history and
+draw more attention to the sensitive data in mirrors.
+
+=head1 OPTIONS
+
+=over
+
+=item --all
+
+Edit the message in all inboxes configured in ~/.public-inbox/config.
+This is an alternative to specifying individual inbox directories
+on the command-line.
+
+=item -m MESSAGE-ID
+
+Edits the message corresponding to the given C<MESSAGE-ID>.
+If the C<MESSAGE-ID> is ambiguous, either C<--force> or C<-F>
+with the original raw message will be required.
+
+=item -F FILE
+
+Edits the message corresponding to the Message-ID: header
+and content given in C<FILE>.  This requires the unmodified
+raw message, and the contents of C<FILE> will not itself
+be modified.  This is useful if a Message-ID is ambiguous
+due to filtering/munging rules or other edits.
+
+=item --force
+
+Forcibly perform the edit even if the Message-ID is ambiguous.
+
+=item --raw
+
+Do not perform "From " line escaping.  By default, the message
+is written as an mboxrd variant so that unpurged messages can
+be detected in the edited mbox.  This option makes sense if your
+configured C<publicinbox.mailEditor> is a regular editor and not
+something like C<mutt -f>.
+
+=back
+
+=head1 CONFIGURATION
+
+=over 8
+
+=item publicinbox.mailEditor
+
+The command to perform the edit with.  An example of this would be
+C<mutt -f>, and the user would then use the facilities in L<mutt(1)>
+to edit the mail.  This is useful for editing attachments or
+Base64-encoded emails which are more difficult to edit with a
+normal editor (configured via C<GIT_EDITOR>, C<VISUAL> or C<EDITOR>).
+
+Default: none
+
+=back
+
+=head1 ENVIRONMENT
+
+=over 8
+
+=for comment MAIL_EDITOR is undocumented (unstable, don't want naming conflicts)
+
+=item GIT_EDITOR / VISUAL / EDITOR
+
+public-inbox-edit will fall back to using one of these variables
+(in that order) if C<publicinbox.mailEditor> is unset.
+
+=item PI_CONFIG
+
+The default config file, normally "~/.public-inbox/config".
+See L<public-inbox-config(5)>
+
+=back
+
+=head1 LIMITATIONS
+
+Only L<v2|public-inbox-v2-format(5)> repositories are supported.
+
+=head1 CONTACT
+
+Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
+
+The mail archives are hosted at L<https://public-inbox.org/meta/>
+and L<http://hjrcffqmbrq6wope.onion/meta/>
+
+=head1 COPYRIGHT
+
+Copyright 2019 all contributors L<mailto:meta@public-inbox.org>
+
+License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
+
+=head1 SEE ALSO
+
+L<public-inbox-purge(1)|https://public-inbox.org/public-inbox-purge.html>
diff --git a/MANIFEST b/MANIFEST
index dcf1a60..a44632a 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -13,6 +13,7 @@ Documentation/public-inbox-compact.pod
 Documentation/public-inbox-config.pod
 Documentation/public-inbox-convert.pod
 Documentation/public-inbox-daemon.pod
+Documentation/public-inbox-edit.pod
 Documentation/public-inbox-httpd.pod
 Documentation/public-inbox-index.pod
 Documentation/public-inbox-mda.pod
@@ -150,6 +151,7 @@ sa_config/root/etc/spamassassin/public-inbox.pre
 sa_config/user/.spamassassin/user_prefs
 script/public-inbox-compact
 script/public-inbox-convert
+script/public-inbox-edit
 script/public-inbox-httpd
 script/public-inbox-index
 script/public-inbox-init
@@ -185,6 +187,7 @@ t/content_id.t
 t/convert-compact.t
 t/data/0001.patch
 t/ds-leak.t
+t/edit.t
 t/emergency.t
 t/fail-bin/spamc
 t/feed.t
diff --git a/script/public-inbox-edit b/script/public-inbox-edit
new file mode 100755
index 0000000..ff0351a
--- /dev/null
+++ b/script/public-inbox-edit
@@ -0,0 +1,233 @@
+#!/usr/bin/perl -w
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# Used for editing messages in a public-inbox.
+# Supports v2 inboxes only, for now.
+use strict;
+use warnings;
+use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
+use PublicInbox::AdminEdit;
+use File::Temp qw(tempfile);
+use PublicInbox::ContentId qw(content_id);
+use PublicInbox::MID qw(mid_clean mids);
+PublicInbox::Admin::check_require('-index');
+require PublicInbox::MIME;
+require PublicInbox::InboxWritable;
+
+my $usage = "$0 -m MESSAGE_ID [--all] [INBOX_DIRS]";
+my $opt = { verbose => 1, all => 0, -min_inbox_version => 2, raw => 0 };
+my @opt = qw(mid|m=s file|F=s raw);
+GetOptions($opt, @PublicInbox::AdminEdit::OPT, @opt) or
+	die "bad command-line args\n$usage\n";
+
+my $editor = $ENV{MAIL_EDITOR}; # e.g. "mutt -f"
+unless (defined $editor) {
+	my $k = 'publicinbox.mailEditor';
+	if (my $cfg = PublicInbox::Admin::config()) {
+		$editor = $cfg->{lc($k)};
+	}
+	unless (defined $editor) {
+		warn "\`$k' not configured, trying \`git var GIT_EDITOR'\n";
+		chomp($editor = `git var GIT_EDITOR`);
+		warn "Will use $editor to edit mail\n";
+	}
+}
+
+my $mid = $opt->{mid};
+my $file = $opt->{file};
+if (defined $mid && defined $file) {
+	die "the --mid and --file options are mutually exclusive\n";
+}
+
+my @ibxs = PublicInbox::Admin::resolve_inboxes(\@ARGV, $opt);
+PublicInbox::AdminEdit::check_editable(\@ibxs);
+
+my $found = {}; # cid => [ [ibx, smsg] [, [ibx, smsg] ] ]
+
+sub find_mid ($) {
+	my ($mid) = @_;
+	foreach my $ibx (@ibxs) {
+		my $over = $ibx->over;
+		my ($id, $prev);
+		while (my $smsg = $over->next_by_mid($mid, \$id, \$prev)) {
+			my $ref = $ibx->msg_by_smsg($smsg);
+			my $mime = PublicInbox::MIME->new($ref);
+			my $cid = content_id($mime);
+			my $tuple = [ $ibx, $smsg ];
+			push @{$found->{$cid} ||= []}, $tuple
+		}
+		delete @$ibx{qw(over mm git search)}; # cleanup
+	}
+	$found;
+}
+
+sub show_cmd ($$) {
+	my ($ibx, $smsg) = @_;
+	" GIT_DIR=$ibx->{mainrepo}/all.git \\\n    git show $smsg->{blob}\n";
+}
+
+sub show_found () {
+	foreach my $to_edit (values %$found) {
+		foreach my $tuple (@$to_edit) {
+			my ($ibx, $smsg) = @$tuple;
+			warn show_cmd($ibx, $smsg);
+		}
+	}
+}
+
+if (defined($mid)) {
+	$mid = mid_clean($mid);
+	$found = find_mid($mid);
+	my $nr = scalar(keys %$found);
+	die "No message found for <$mid>\n" unless $nr;
+	if ($nr > 1) {
+		warn <<"";
+Multiple messages with different content found matching
+<$mid>:
+
+		show_found();
+		die "Use --force to edit all of them\n" if !$opt->{force};
+		warn "Will edit all of them\n";
+	}
+} else {
+	open my $fh, '<', $file or die "open($file) failed: $!";
+	my $orig = do { local $/; <$fh> };
+	my $mime = PublicInbox::MIME->new(\$orig);
+	my $mids = mids($mime->header_obj);
+	find_mid($_) for (@$mids); # populates $found
+	my $cid = content_id($mime);
+	my $to_edit = $found->{$cid};
+	unless ($to_edit) {
+		my $nr = scalar(keys %$found);
+		if ($nr > 0) {
+			warn <<"";
+$nr matches to Message-ID(s) in $file, but none matched content
+Partial matches below:
+
+			show_found();
+		} elsif ($nr == 0) {
+			$mids = join('', map { "  <$_>\n" } @$mids);
+			warn <<"";
+No matching messages found matching Message-ID(s) in $file
+$mids
+
+		}
+		exit 1;
+	}
+	$found = { $cid => $to_edit };
+}
+
+my $tmpl = 'public-inbox-edit-XXXXXX';
+foreach my $to_edit (values %$found) {
+	my ($edit_fh, $edit_fn) = tempfile($tmpl, TMPDIR => 1);
+	$edit_fh->autoflush(1);
+	my ($ibx, $smsg) = @{$to_edit->[0]};
+	my $old_raw = $ibx->msg_by_smsg($smsg);
+	delete @$ibx{qw(over mm git search)}; # cleanup
+
+	my $tmp = $$old_raw;
+	if (!$opt->{raw}) {
+		my $oid = $smsg->{blob};
+		print $edit_fh "From mboxrd\@$oid Thu Jan  1 00:00:00 1970\n";
+		$tmp =~ s/^(>*From )/>$1/gm;
+	}
+	print $edit_fh $tmp or
+		die "failed to write tempfile for editing: $!";
+
+	# run the editor, respecting spaces/quotes
+retry_edit:
+	if (system(qw(sh -c), qq(eval "$editor" '"\$@"'), '--', $edit_fn)) {
+		if (!(-t STDIN) && !$opt->{force}) {
+			die "E: $editor failed: $?\n";
+		}
+		print STDERR "$editor failed, ";
+		print STDERR "continuing as forced\n" if $opt->{force};
+		while (!$opt->{force}) {
+			print STDERR "(r)etry, (c)ontinue, (q)uit?\n";
+			chomp(my $op = <STDIN> || '');
+			$op = lc($op);
+			goto retry_edit if $op eq 'r';
+			exit 1 if $op eq 'q';
+			last if $op eq 'c'; # continuing
+			print STDERR "\`$op' not recognized\n";
+		}
+	}
+
+	# reread the edited file, not using $edit_fh since $EDITOR may
+	# rename/relink $edit_fn
+	open my $new_fh, '<', $edit_fn or
+		die "can't read edited file ($edit_fn): $!\n";
+	my $new_raw = do { local $/; <$new_fh> };
+
+	if (!$opt->{raw}) {
+		# get rid of the From we added
+		$new_raw =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
+
+		# check if user forgot to purge (in mutt) after editing
+		if ($new_raw =~ /^From /sm) {
+			if (-t STDIN) {
+				print STDERR <<'';
+Extra "From " lines detected in new mbox.
+Did you forget to purge the original message from the mbox after editing?
+
+				while (1) {
+					print STDERR <<"";
+(y)es to re-edit, (n)o to continue
+
+					chomp(my $op = <STDIN> || '');
+					$op = lc($op);
+					goto retry_edit if $op eq 'y';
+					last if $op eq 'n'; # continuing
+					print STDERR "\`$op' not recognized\n";
+				}
+			} else { # non-interactive path
+				# unlikely to happen, as extra From lines are
+				# only a common mistake (for me) with
+				# interactive use
+				warn <<"";
+W: possible message boundary splitting error
+
+			}
+		}
+		# unescape what we escaped:
+		$new_raw =~ s/^>(>*From )/$1/gm;
+	}
+
+	my $new_mime = PublicInbox::MIME->new(\$new_raw);
+	my $old_mime = PublicInbox::MIME->new($old_raw);
+
+	# allow changing Received: and maybe other headers which can
+	# contain sensitive info.
+	my $nhdr = $new_mime->header_obj;
+	my $ohdr = $old_mime->header_obj;
+	if (($nhdr->as_string eq $ohdr->as_string) &&
+	    (content_id($new_mime) eq content_id($old_mime))) {
+		warn "No change detected to:\n", show_cmd($ibx, $smsg);
+
+		next unless $opt->{verbose};
+		# should we consider this machine-parseable?
+		print "$ibx->{mainrepo}:\n\tNONE\n";
+		next;
+	}
+
+	foreach my $tuple (@$to_edit) {
+		$ibx = PublicInbox::InboxWritable->new($tuple->[0]);
+		$smsg = $tuple->[1];
+		my $im = $ibx->importer(0);
+		my $commits = $im->replace($old_mime, $new_mime);
+		$im->done;
+		unless ($commits) {
+			warn "Failed to replace:\n", show_cmd($ibx, $smsg);
+			next;
+		}
+		next unless $opt->{verbose};
+		# should we consider this machine-parseable?
+		print "$ibx->{mainrepo}:";
+		if (scalar @$commits) {
+			print join("\n\t", '', @$commits), "\n";
+		} else {
+			print "\tNONE\n";
+		}
+	}
+}
diff --git a/t/edit.t b/t/edit.t
new file mode 100644
index 0000000..61e90f2
--- /dev/null
+++ b/t/edit.t
@@ -0,0 +1,178 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# edit frontend behavior test (t/replace.t for backend)
+use strict;
+use warnings;
+use Test::More;
+use File::Temp qw/tempdir/;
+require './t/common.perl';
+require_git(2.6);
+require PublicInbox::Inbox;
+require PublicInbox::InboxWritable;
+require PublicInbox::Config;
+use PublicInbox::MID qw(mid_clean);
+
+my @mods = qw(IPC::Run DBI DBD::SQLite);
+foreach my $mod (@mods) {
+	eval "require $mod";
+	plan skip_all => "missing $mod for $0" if $@;
+};
+IPC::Run->import(qw(run));
+
+my $cmd_pfx = 'blib/script/public-inbox';
+my $tmpdir = tempdir('pi-edit-XXXXXX', TMPDIR => 1, CLEANUP => 1);
+my $mainrepo = "$tmpdir/v2";
+my $ibx = PublicInbox::Inbox->new({
+	mainrepo => $mainrepo,
+	name => 'test-v2edit',
+	version => 2,
+	-primary_address => 'test@example.com',
+	indexlevel => 'basic',
+});
+$ibx = PublicInbox::InboxWritable->new($ibx, {nproc=>1});
+my $cfgfile = "$tmpdir/config";
+local $ENV{PI_CONFIG} = $cfgfile;
+my $file = 't/data/0001.patch';
+open my $fh, '<', $file or die "open: $!";
+my $raw = do { local $/; <$fh> };
+my $im = $ibx->importer(0);
+my $mime = PublicInbox::MIME->new($raw);
+my $mid = mid_clean($mime->header('Message-Id'));
+ok($im->add($mime), 'add message to be edited');
+$im->done;
+my ($in, $out, $err, $cmd, $cur, $t);
+my $__git_dir = "--git-dir=$ibx->{mainrepo}/git/0.git";
+
+$t = '-F FILE'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/boolean prefix/bool pfx/'";
+	$cmd = [ "$cmd_pfx-edit", "-F$file", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t edit OK");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	like($cur->header('Subject'), qr/bool pfx/, "$t message edited");
+	like($out, qr/[a-f0-9]{40}/, "$t shows commit on success");
+}
+
+$t = '-m MESSAGE_ID'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/bool pfx/boolean prefix/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t edit OK");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	like($cur->header('Subject'), qr/boolean prefix/, "$t message edited");
+	like($out, qr/[a-f0-9]{40}/, "$t shows commit on success");
+}
+
+$t = 'no-op -m MESSAGE_ID'; {
+	$in = $out = $err = '';
+	my $before = `git $__git_dir rev-parse HEAD`;
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/bool pfx/boolean prefix/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds");
+	my $prev = $cur;
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	is_deeply($cur, $prev, "$t makes no change");
+	like($cur->header('Subject'), qr/boolean prefix/,
+		"$t does not change message");
+	like($out, qr/NONE/, 'noop shows NONE');
+	my $after = `git $__git_dir rev-parse HEAD`;
+	is($after, $before, 'git head unchanged');
+}
+
+$t = '-m MESSAGE_ID can change Received: headers'; {
+	$in = $out = $err = '';
+	my $before = `git $__git_dir rev-parse HEAD`;
+	local $ENV{MAIL_EDITOR} =
+			"$^X -i -p -e 's/^Subject:.*/Received: x\\n\$&/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	like($cur->header('Subject'), qr/boolean prefix/,
+		"$t does not change Subject");
+	is($cur->header('Received'), 'x', 'added Received header');
+}
+
+$t = '-m miss'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/boolean/FAIL/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid-miss", $mainrepo ];
+	ok(!run($cmd, \$in, \$out, \$err), "$t fails on invalid MID");
+	like($err, qr/No message found/, "$t shows error");
+}
+
+$t = 'non-interactive editor failure'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 'END { exit 1 }'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(!run($cmd, \$in, \$out, \$err), "$t detected");
+	like($err, qr/END \{ exit 1 \}' failed:/, "$t shows error");
+}
+
+$t = 'mailEditor set in config'; {
+	$in = $out = $err = '';
+	my $rc = system(qw(git config), "--file=$cfgfile",
+			'publicinbox.maileditor',
+			"$^X -i -p -e 's/boolean prefix/bool pfx/'");
+	is($rc, 0, 'set publicinbox.mailEditor');
+	local $ENV{MAIL_EDITOR};
+	local $ENV{GIT_EDITOR} = 'echo should not run';
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t edited message");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	like($cur->header('Subject'), qr/bool pfx/, "$t message edited");
+	unlike($out, qr/should not run/, 'did not run GIT_EDITOR');
+}
+
+$t = '--raw and mbox escaping'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^\$/\\nFrom not mbox\\n/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", '--raw', $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	like($cur->body, qr/^From not mbox/sm, 'put "From " line into body');
+
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^>From not/\$& an/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds with mbox escaping");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	like($cur->body, qr/^From not an mbox/sm,
+		'changed "From " line unescaped');
+
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^From not an mbox\\n//s'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", '--raw', $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds again");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	unlike($cur->body, qr/^From not an mbox/sm, "$t restored body");
+}
+
+$t = 'reuse Message-ID'; {
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	ok($im->add($mime), "$t and re-add");
+	$im->done;
+	like($warn[0], qr/reused for mismatched content/, "$t got warning");
+}
+
+$t = 'edit ambiguous Message-ID with -m'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/bool pfx/boolean prefix/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(!run($cmd, \$in, \$out, \$err), "$t fails w/o --force");
+	like($err, qr/Multiple messages with different content found matching/,
+		"$t shows matches");
+	like($err, qr/GIT_DIR=.*git show/is, "$t shows git commands");
+}
+
+$t .= ' and --force'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^Subject:.*/Subject:x/i'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", '--force', $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds");
+	like($err, qr/Will edit all of them/, "$t notes all will be edited");
+	my @dump = `git $__git_dir cat-file --batch --batch-all-objects`;
+	chomp @dump;
+	is_deeply([grep(/^Subject:/i, @dump)], [qw(Subject:x Subject:x)],
+		"$t edited both messages");
+}
+
+done_testing();
-- 
EW

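The editor invocation in public-inbox-edit (`eval "$editor" '"$@"'` under `sh -c`) is what lets MAIL_EDITOR values with quoted arguments, like the ones throughout t/edit.t, work. Below is a minimal standalone sketch of the same idiom. It assumes only core Perl and a POSIX sh. `run_editor` is a hypothetical helper name and is not part of public-inbox.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Run $editor (a shell-style command string, possibly with quoted
# arguments) on @files the same way public-inbox-edit does:
# `eval` makes sh re-parse the editor string so its quoting is honored,
# while "$@" passes the file names through verbatim.
# Caveat: $editor is spliced between sh double quotes, so a literal
# `"`, `$` or backquote in it is interpreted by sh before eval runs.
sub run_editor {
	my ($editor, @files) = @_;
	# '--' becomes $0 for `sh -c`, so @files end up in "$@"
	return system('sh', '-c', qq(eval "$editor" '"\$@"'), '--', @files);
}

my $dir = tempdir(CLEANUP => 1);
my $file = "$dir/msg with spaces.eml"; # spaces survive via "$@"
open my $fh, '>', $file or die "open: $!";
print $fh "Subject: boolean prefix\n" or die "print: $!";
close $fh or die "close: $!";

# same style of editor string as t/edit.t uses
my $rc = run_editor(qq{$^X -i -p -e 's/boolean prefix/bool pfx/'}, $file);

open $fh, '<', $file or die "open: $!";
my $edited = do { local $/; <$fh> };
close $fh;
```

As in the script, `system` returns the raw wait status, so a failing editor shows up as a non-zero `$rc` (exit code in the high byte).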


* Re: [PATCH 00/11] v2: implement message editing
  2019-06-09  2:51 [PATCH 00/11] v2: implement message editing Eric Wong (Contractor, The Linux Foundation)
                   ` (10 preceding siblings ...)
  2019-06-09  2:51 ` [PATCH 11/11] edit: new tool to perform edits Eric Wong (Contractor, The Linux Foundation)
@ 2019-06-10 15:06 ` Konstantin Ryabitsev
  2019-06-10 15:40   ` Eric Wong
  2019-06-10 18:17 ` [PATCH 13/11] edit: drop unwanted headers before noop check Eric Wong (Contractor, The Linux Foundation)
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Konstantin Ryabitsev @ 2019-06-10 15:06 UTC
  To: Eric Wong (Contractor, The Linux Foundation); +Cc: meta

On Sun, Jun 09, 2019 at 02:51:36AM +0000, Eric Wong (Contractor, The Linux Foundation) wrote:
>Some organizations are legally responsible for removing certain
>content but prefer to edit out sensitive parts of a message
>instead of purging it completely from history.
>
>We can build off existing purge functionality.  Instead of
>replacing a message with an empty file; we instead replace
>it with the desired content.
>
>This ->replace method reindexes the modified message and
>updates the corresponding git commit in case the subject
>or authorship ident changes.
>
>A new tool, public-inbox-edit(1) wraps the new ->replace
>functionality by providing an editable mboxrd (suitable
>for publicinbox.mailEditor "mutt -f").

Thanks, Eric. I'm testing this out now. Quick question -- I'm assuming 
this can't be done online, while new messages are arriving, correct?  
Should the procedure be to stop incoming mail, perform the edits, then 
start the mail again?

Best,
-K


* Re: [PATCH 00/11] v2: implement message editing
  2019-06-10 15:06 ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
@ 2019-06-10 15:40   ` Eric Wong
  2019-06-10 17:56     ` [PATCH 12/11] edit|purge: improve output on rewrites Eric Wong
  2019-06-10 18:57     ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
  0 siblings, 2 replies; 31+ messages in thread
From: Eric Wong @ 2019-06-10 15:40 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Thanks, Eric. I'm testing this out now. Quick question -- I'm assuming this
> can't be done online, while new messages are arriving, correct?  Should the
> procedure be to stop incoming mail, perform the edits, then start the mail
> again?

It is designed to be done online by flock-ing inbox.lock.  All
write operations are protected by that lock; I use -learn all
the time to remove spam, and I also used to run --reindex
frequently.
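
Below is a minimal sketch of the writer-side locking described above. It assumes only core Perl. Every writer takes an exclusive flock on a shared inbox.lock file for the duration of its write. `with_inbox_lock` is a hypothetical helper and is not public-inbox's actual lock code. On Linux, flock(2) locks belong to the open file description, so a second open() of the same file contends for the lock even within one process.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Hypothetical helper: run $code while holding an exclusive lock on
# $lock_path, so concurrent writers (-mda, -watch, -learn, edits)
# take turns instead of interleaving.
sub with_inbox_lock {
	my ($lock_path, $code) = @_;
	open my $lk, '>>', $lock_path or die "open $lock_path: $!";
	flock($lk, LOCK_EX) or die "LOCK_EX $lock_path: $!"; # waits for others
	my @ret = eval { $code->() };
	my $err = $@;
	flock($lk, LOCK_UN) or die "LOCK_UN $lock_path: $!";
	close $lk;
	die $err if $err;
	return @ret;
}

my $dir = tempdir(CLEANUP => 1);
my $lock = "$dir/inbox.lock";

# While the callback runs, a non-blocking attempt through a separate
# open() fails; a blocking writer would simply wait here instead.
my ($contended) = with_inbox_lock($lock, sub {
	open my $other, '>>', $lock or die "open: $!";
	my $got = flock($other, LOCK_EX | LOCK_NB);
	close $other;
	return $got ? 0 : 1;
});

# After release, the lock can be taken again without waiting.
open my $after, '>>', $lock or die "open: $!";
my $free = flock($after, LOCK_EX | LOCK_NB) ? 1 : 0;
close $after;
```

The lock is only held while a write is in progress, which is why a long replace (see below) delays other writers without requiring them to be stopped.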

For messages deeper in history, it takes longer to replay the
subsequent history, so -mda/-watch deliveries are blocked for longer.

I just noticed, the status message triggers a perl uninitialized
warning with multiple epochs, but it's harmless.   Will fix in a
bit.


* Re: [PATCH 11/11] edit: new tool to perform edits
  2019-06-09  2:51 ` [PATCH 11/11] edit: new tool to perform edits Eric Wong (Contractor, The Linux Foundation)
@ 2019-06-10 16:06   ` Konstantin Ryabitsev
  2019-06-10 18:02     ` Eric Wong
  2019-06-13  8:07     ` Eric Wong
  0 siblings, 2 replies; 31+ messages in thread
From: Konstantin Ryabitsev @ 2019-06-10 16:06 UTC
  To: Eric Wong (Contractor, The Linux Foundation); +Cc: meta

On Sun, Jun 09, 2019 at 02:51:47AM +0000, Eric Wong (Contractor, The Linux Foundation) wrote:
>+public-inbox-edit - edit messages in a public inbox
>+
>+=head1 SYNOPSIS
>+
>+	public-inbox-edit -m MESSAGE-ID --all|INBOX_DIR
>+
>+	public-inbox-edit -F RAW_FILE --all|INBOX_DIR [.. INBOX_DIR]

A quick RFE that's beyond the scope of this work, but would be handy 
from the usability perspective -- pass a search term in case multiple 
messages need to be edited.  E.g.:

public-inbox-edit -s "johndoe@example.com" INBOX_DIR

The way I see it working, that would:

1. find all matching messages and put them into an mbox file
2. fire off "mutt -f" to start the editing session
3. do a batch replace of all messages in the edited mbox file
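
Steps 1 and 3 depend on round-tripping messages through mboxrd. public-inbox-edit already applies this escaping to a single message: `s/^(>*From )/>$1/gm` on the way out and `s/^>(>*From )/$1/gm` on the way back. Below is a minimal sketch for a batch. It assumes messages are newline-terminated raw strings. The search in step 1 and the mutt session in step 2 are left out, and `to_mboxrd`/`from_mboxrd` are hypothetical helpers.

```perl
use strict;
use warnings;

# Step 1 (sketch): serialize raw messages into one mboxrd string.
# Lines matching /^>*From / gain one more '>' so they cannot be
# mistaken for the "From " separator lines.
sub to_mboxrd {
	my (@raws) = @_;
	my $mbox = '';
	for my $raw (@raws) {
		my $tmp = $raw;
		$tmp =~ s/^(>*From )/>$1/gm;
		$tmp .= "\n" unless $tmp =~ /\n\z/;
		# separator, message, then a blank line before the next one
		$mbox .= "From mboxrd\@z Thu Jan  1 00:00:00 1970\n$tmp\n";
	}
	return $mbox;
}

# Step 3 (sketch): split an edited mboxrd back into raw messages.
# Only unescaped "From " lines at the start of a line are separators.
sub from_mboxrd {
	my ($mbox) = @_;
	my @msgs;
	for my $chunk (split(/^From [^\n]*\n/m, $mbox)) {
		next if $chunk eq ''; # text before the first separator
		$chunk =~ s/\n\z//; # drop the blank line appended above
		$chunk =~ s/^>(>*From )/$1/gm; # undo the escaping
		push @msgs, $chunk;
	}
	return @msgs;
}

my @batch = (
	"Subject: a\n\nhello\nFrom the start\n>From quoted\n",
	"Subject: b\n\nbye\n",
);
my $mbox = to_mboxrd(@batch);
# step 2 would hand $mbox to an editor such as "mutt -f" here
my @edited = from_mboxrd($mbox);
```

If the user leaves an unpurged original in the mbox, it comes back here as an extra message. public-inbox-edit's "Extra "From " lines" check guards against the same mistake for a single message.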

Best,
-K


* [PATCH 12/11] edit|purge: improve output on rewrites
  2019-06-10 15:40   ` Eric Wong
@ 2019-06-10 17:56     ` Eric Wong
  2019-06-10 18:57     ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
  1 sibling, 0 replies; 31+ messages in thread
From: Eric Wong @ 2019-06-10 17:56 UTC
  To: meta; +Cc: Konstantin Ryabitsev

Eric Wong <e@80x24.org> wrote:
> I just noticed, the status message triggers a perl uninitialized
> warning with multiple epochs, but it's harmless.   Will fix in a
> bit.

Pushed as 6e507c8cb41b0d48963503a88034348d74506211
------------------8<-------------
Subject: [PATCH] edit|purge: improve output on rewrites

Fill in undef as "(unchanged)" when displaying commits
and prefix the epoch name.
---
 lib/PublicInbox/AdminEdit.pm | 17 +++++++++++++++++
 script/public-inbox-edit     |  9 ++-------
 script/public-inbox-purge    |  7 +------
 t/purge.t                    |  4 ++--
 4 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/AdminEdit.pm b/lib/PublicInbox/AdminEdit.pm
index b27c831..169feba 100644
--- a/lib/PublicInbox/AdminEdit.pm
+++ b/lib/PublicInbox/AdminEdit.pm
@@ -47,4 +47,21 @@ sub check_editable ($) {
 	}
 }
 
+# takes the output of V2Writable::purge and V2Writable::replace
+# $rewrites = [ array commits keyed by epoch ]
+sub show_rewrites ($$$) {
+	my ($fh, $ibx, $rewrites) = @_;
+	print $fh "$ibx->{mainrepo}:";
+	if (scalar @$rewrites) {
+		my $epoch = -1;
+		my @out = map {;
+			++$epoch;
+			"$epoch.git: ".(defined($_) ? $_ : '(unchanged)')
+		} @$rewrites;
+		print $fh join("\n\t", '', @out), "\n";
+	} else {
+		print $fh " NONE\n";
+	}
+}
+
 1;
diff --git a/script/public-inbox-edit b/script/public-inbox-edit
index ff0351a..7a534cc 100755
--- a/script/public-inbox-edit
+++ b/script/public-inbox-edit
@@ -207,7 +207,7 @@ W: possible message boundary splitting error
 
 		next unless $opt->{verbose};
 		# should we consider this machine-parseable?
-		print "$ibx->{mainrepo}:\n\tNONE\n";
+		PublicInbox::AdminEdit::show_rewrites(\*STDOUT, $ibx, []);
 		next;
 	}
 
@@ -223,11 +223,6 @@ W: possible message boundary splitting error
 		}
 		next unless $opt->{verbose};
 		# should we consider this machine-parseable?
-		print "$ibx->{mainrepo}:";
-		if (scalar @$commits) {
-			print join("\n\t", '', @$commits), "\n";
-		} else {
-			print "\tNONE\n";
-		}
+		PublicInbox::AdminEdit::show_rewrites(\*STDOUT, $ibx, $commits);
 	}
 }
diff --git a/script/public-inbox-purge b/script/public-inbox-purge
index 846557c..0705d17 100755
--- a/script/public-inbox-purge
+++ b/script/public-inbox-purge
@@ -45,12 +45,7 @@ foreach my $ibx (@ibxs) {
 	$v2w->done;
 
 	if ($opt->{verbose}) { # should we consider this machine-parseable?
-		print "$ibx->{mainrepo}:";
-		if (scalar @$commits) {
-			print join("\n\t", '', @$commits), "\n";
-		} else {
-			print " NONE\n";
-		}
+		PublicInbox::AdminEdit::show_rewrites(\*STDOUT, $ibx, $commits);
 	}
 	$n_purged += scalar @$commits;
 }
diff --git a/t/purge.t b/t/purge.t
index c1e0e9a..384f32a 100644
--- a/t/purge.t
+++ b/t/purge.t
@@ -57,7 +57,7 @@ is($? >> 8, 1, 'missed purge exits with 1');
 
 # a successful case:
 ok(IPC::Run::run([$purge, $mainrepo], \$raw, \$out, \$err), 'match OK');
-like($out, qr/^\t[a-f0-9]{40,}/m, 'removed commit noted');
+like($out, qr/\b[a-f0-9]{40,}/m, 'removed commit noted');
 
 # add (old) vger filter to config file
 print $cfg_fh <<EOF or die "print $!";
@@ -85,7 +85,7 @@ $out = $err = '';
 ok(chdir('/'), "chdir / OK for --all test");
 ok(IPC::Run::run([$purge, '--all'], \$pre_scrub, \$out, \$err),
 	'scrub purge OK');
-like($out, qr/^\t[a-f0-9]{40,}/m, 'removed commit noted');
+like($out, qr/\b[a-f0-9]{40,}/m, 'removed commit noted');
 # diag "out: $out"; diag "err: $err";
 
 $out = $err = '';
-- 
EW

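The following standalone copy of the show_rewrites() logic above shows what the new output looks like. It takes the inbox path directly instead of an `$ibx` object, an assumption made so it runs without public-inbox installed. The path and commit ID are made up.

```perl
use strict;
use warnings;

# Copy of show_rewrites() from the patch, with $ibx->{mainrepo}
# replaced by a plain path argument.  $rewrites holds one entry per
# epoch; undef means that epoch's git repo was not rewritten.
sub show_rewrites {
	my ($fh, $mainrepo, $rewrites) = @_;
	print $fh "$mainrepo:";
	if (scalar @$rewrites) {
		my $epoch = -1;
		my @out = map {;
			++$epoch;
			"$epoch.git: " . (defined($_) ? $_ : '(unchanged)')
		} @$rewrites;
		print $fh join("\n\t", '', @out), "\n";
	} else {
		print $fh " NONE\n";
	}
}

# hypothetical inbox path and commit ID, for illustration only
my $commit = 'a' x 40;
open my $fh, '>', \my $buf or die "open: $!";
show_rewrites($fh, '/srv/lkml', [ undef, $commit ]);
close $fh;

open my $fh2, '>', \my $none or die "open: $!";
show_rewrites($fh2, '/srv/lkml', []);
close $fh2;
```

With two epochs where only 1.git was rewritten, the first call prints `/srv/lkml:` followed by tab-indented `0.git: (unchanged)` and `1.git: <commit>` lines. The empty case prints `/srv/lkml: NONE`. Commit IDs no longer directly follow a tab, which is why t/purge.t changes its pattern from `^\t[a-f0-9]{40,}` to `\b[a-f0-9]{40,}`.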

* Re: [PATCH 11/11] edit: new tool to perform edits
  2019-06-10 16:06   ` Konstantin Ryabitsev
@ 2019-06-10 18:02     ` Eric Wong
  2019-06-13  8:07     ` Eric Wong
  1 sibling, 0 replies; 31+ messages in thread
From: Eric Wong @ 2019-06-10 18:02 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> A quick RFE that's beyond the scope of this work, but would be handy from
> the usability perspective -- pass a search term in case multiple messages
> need to be edited.  E.g.:
> 
> public-inbox-edit -s "johndoe@example.com" INBOX_DIR
> 
> The way I see it working, that would:
> 
> 1. find all matching messages and put them into an mbox file
> 2. fire off "mutt -f" to start the editing session
> 3. do a batch replace of all messages in the edited mbox file

The UI could get a little hairy, especially with mutt not
purging messages from mboxes by default.  Or at least, I was
missing its prompts while testing this.

I noticed one more small bug w.r.t. Status/Content-Length/Lines
header changes from mutt causing no-op edits to go undetected,
which I can fix quickly; but I need to deal with a bug problem
of the insect variety today :<


* [PATCH 13/11] edit: drop unwanted headers before noop check
  2019-06-09  2:51 [PATCH 00/11] v2: implement message editing Eric Wong (Contractor, The Linux Foundation)
                   ` (11 preceding siblings ...)
  2019-06-10 15:06 ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
@ 2019-06-10 18:17 ` Eric Wong (Contractor, The Linux Foundation)
  2019-06-10 21:58 ` [PATCH 14/11] v2writable: replace: kill git processes before reindexing Eric Wong (Contractor, The Linux Foundation)
  2019-06-12  0:25 ` [PATCH 15/11] edit: unlink temporary file when done Eric Wong (Contractor, The Linux Foundation)
  14 siblings, 0 replies; 31+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-10 18:17 UTC
  To: meta

mutt will set Content-Length, Lines, and Status headers
unconditionally, so we need to account for that before
doing header comparisons to avoid making expensive changes
when noop edits are made.
---
 script/public-inbox-edit |  6 ++++++
 t/edit.t                 | 18 ++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/script/public-inbox-edit b/script/public-inbox-edit
index 7a534cc..16d7852 100755
--- a/script/public-inbox-edit
+++ b/script/public-inbox-edit
@@ -14,6 +14,7 @@ use PublicInbox::MID qw(mid_clean mids);
 PublicInbox::Admin::check_require('-index');
 require PublicInbox::MIME;
 require PublicInbox::InboxWritable;
+require PublicInbox::Import;
 
 my $usage = "$0 -m MESSAGE_ID [--all] [INBOX_DIRS]";
 my $opt = { verbose => 1, all => 0, -min_inbox_version => 2, raw => 0 };
@@ -197,6 +198,11 @@ W: possible message boundary splitting error
 	my $new_mime = PublicInbox::MIME->new(\$new_raw);
 	my $old_mime = PublicInbox::MIME->new($old_raw);
 
+	# make sure we don't compare unwanted headers, since mutt adds
+	# Content-Length, Status, and Lines headers:
+	PublicInbox::Import::drop_unwanted_headers($new_mime);
+	PublicInbox::Import::drop_unwanted_headers($old_mime);
+
 	# allow changing Received: and maybe other headers which can
 	# contain sensitive info.
 	my $nhdr = $new_mime->header_obj;
diff --git a/t/edit.t b/t/edit.t
index 61e90f2..6b4e35c 100644
--- a/t/edit.t
+++ b/t/edit.t
@@ -79,6 +79,24 @@ $t = 'no-op -m MESSAGE_ID'; {
 	is($after, $before, 'git head unchanged');
 }
 
+$t = 'no-op -m MESSAGE_ID w/Status: header'; { # because mutt does it
+	$in = $out = $err = '';
+	my $before = `git $__git_dir rev-parse HEAD`;
+	local $ENV{MAIL_EDITOR} =
+			"$^X -i -p -e 's/^Subject:.*/Status: RO\\n\$&/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds");
+	my $prev = $cur;
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	is_deeply($cur, $prev, "$t makes no change");
+	like($cur->header('Subject'), qr/boolean prefix/,
+		"$t does not change message");
+	is($cur->header('Status'), undef, 'Status header not added');
+	like($out, qr/NONE/, 'noop shows NONE');
+	my $after = `git $__git_dir rev-parse HEAD`;
+	is($after, $before, 'git head unchanged');
+}
+
 $t = '-m MESSAGE_ID can change Received: headers'; {
 	$in = $out = $err = '';
 	my $before = `git $__git_dir rev-parse HEAD`;
-- 
EW

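The idea can be sketched without PublicInbox::MIME: strip the mutt-added headers from both versions before deciding whether anything changed. This is a simplification. It compares raw header lines and raw bodies, whereas the patch compares `header_obj->as_string` and `content_id`. It also uses only the three headers named in the commit message; the real `drop_unwanted_headers` list may differ. `wanted_headers` and `is_noop_edit` are hypothetical names.

```perl
use strict;
use warnings;

# Headers mutt sets on its own, per the commit message above (assumed
# list; PublicInbox::Import::drop_unwanted_headers may drop more).
my @UNWANTED = qw(Content-Length Lines Status);
my $unwanted_re = join('|', map { quotemeta($_) } @UNWANTED);

# Header block of a raw message, minus unwanted headers and any
# folded continuation lines belonging to them.
sub wanted_headers {
	my ($raw) = @_;
	my ($hdr) = split(/\r?\n\r?\n/, $raw, 2);
	my @keep;
	my $skipping = 0;
	for my $line (split(/\r?\n/, $hdr)) {
		if ($line =~ /^[ \t]/) { # continuation of the previous header
			push @keep, $line unless $skipping;
			next;
		}
		$skipping = $line =~ /^(?:$unwanted_re):/i ? 1 : 0;
		push @keep, $line unless $skipping;
	}
	return join("\n", @keep);
}

# True when only unwanted headers differ, i.e. no commit is needed.
sub is_noop_edit {
	my ($old_raw, $new_raw) = @_;
	my (undef, $old_body) = split(/\r?\n\r?\n/, $old_raw, 2);
	my (undef, $new_body) = split(/\r?\n\r?\n/, $new_raw, 2);
	return wanted_headers($old_raw) eq wanted_headers($new_raw) &&
		($old_body // '') eq ($new_body // '');
}

my $old = "From: a\@example.com\nSubject: boolean prefix\n\nbody\n";
my $mutt = "From: a\@example.com\nStatus: RO\nContent-Length: 5\n" .
	"Lines: 1\nSubject: boolean prefix\n\nbody\n";
my $recv = "From: a\@example.com\nReceived: x\n" .
	"Subject: boolean prefix\n\nbody\n";
my $mutt_is_noop = is_noop_edit($old, $mutt) ? 1 : 0;
my $recv_is_noop = is_noop_edit($old, $recv) ? 1 : 0;
```

Headers mutt adds on its own do not count as an edit, while an added Received: header (as in t/edit.t) still does.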


* Re: [PATCH 00/11] v2: implement message editing
  2019-06-10 15:40   ` Eric Wong
  2019-06-10 17:56     ` [PATCH 12/11] edit|purge: improve output on rewrites Eric Wong
@ 2019-06-10 18:57     ` Konstantin Ryabitsev
  2019-06-10 19:29       ` Eric Wong
  2019-06-11 21:06       ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
  1 sibling, 2 replies; 31+ messages in thread
From: Konstantin Ryabitsev @ 2019-06-10 18:57 UTC
  To: Eric Wong; +Cc: meta

On Mon, Jun 10, 2019 at 03:40:58PM +0000, Eric Wong wrote:
>I just noticed, the status message triggers a perl uninitialized
>warning with multiple epochs, but it's harmless.   Will fix in a
>bit.

I did a few successful tests on small trial lists, but I'm running into 
a problem when I try to actually edit something in (a copy of) LKML:

$ perl5lib/bin/public-inbox-edit -m messageid /mnt/fastio/lkml
(mutt opens here)
1 kept, 0 deleted.
Exception: Expected block 102325 to be level 2, not 0

The above exception pops up immediately after exiting mutt.

I can provide more debug info if that helps.


-K


* Re: [PATCH 00/11] v2: implement message editing
  2019-06-10 18:57     ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
@ 2019-06-10 19:29       ` Eric Wong
  2019-06-10 19:40         ` Konstantin Ryabitsev
  2019-06-11 21:06       ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
  1 sibling, 1 reply; 31+ messages in thread
From: Eric Wong @ 2019-06-10 19:29 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Mon, Jun 10, 2019 at 03:40:58PM +0000, Eric Wong wrote:
> > I just noticed, the status message triggers a perl uninitialized
> > warning with multiple epochs, but it's harmless.   Will fix in a
> > bit.
> 
> I did a few successful tests on small trial lists, but I'm running into a
> problem when I try to actually edit something in (a copy of) LKML:
> 
> $ perl5lib/bin/public-inbox-edit -m messageid /mnt/fastio/lkml
> (mutt opens here)
> 1 kept, 0 deleted.
> Exception: Expected block 102325 to be level 2, not 0

That's an exception from Xapian; I haven't seen that in years.
Which version of Xapian are you using, and is it chert or glass?

Could be open files from OFD locks in Xapian >= 1.2.20 && <1.2.24

> The above exception pops up immediately after exiting mutt.

Hmm... that would seem quick, but I have a slow computer.

> I can provide more debug info if that helps.

Yes, I could probably add some debug messages to make it easier
to track down.


* Re: [PATCH 00/11] v2: implement message editing
  2019-06-10 19:29       ` Eric Wong
@ 2019-06-10 19:40         ` Konstantin Ryabitsev
  2019-06-10 22:03           ` [WIP] v2writable: support INBOX_DEBUG=replace Eric Wong
  0 siblings, 1 reply; 31+ messages in thread
From: Konstantin Ryabitsev @ 2019-06-10 19:40 UTC
  To: Eric Wong; +Cc: meta

On Mon, Jun 10, 2019 at 07:29:05PM +0000, Eric Wong wrote:
>> I did a few successful tests on small trial lists, but I'm running 
>> into a
>> problem when I try to actually edit something in (a copy of) LKML:
>>
>> $ perl5lib/bin/public-inbox-edit -m messageid /mnt/fastio/lkml
>> (mutt opens here)
>> 1 kept, 0 deleted.
>> Exception: Expected block 102325 to be level 2, not 0
>
>That's an exception from Xapian I haven't seen that in years.
>Which version of Xapian and are you using chert or glass?

EL7 has 1.2.25.

>> I can provide more debug info if that helps.
>
>Yes, I could probably add some debug messages to make it easier

Sounds good -- I had suspected it was coming from Xapian and I know EL7 
is lagging behind quite a bit. If this is too much divergence to work 
with, I can probably build a xapian14 package, but I'm afraid of the 
rabbit hole I may have to go down to make that work.

-K


* [PATCH 14/11] v2writable: replace: kill git processes before reindexing
  2019-06-09  2:51 [PATCH 00/11] v2: implement message editing Eric Wong (Contractor, The Linux Foundation)
                   ` (12 preceding siblings ...)
  2019-06-10 18:17 ` [PATCH 13/11] edit: drop unwanted headers before noop check Eric Wong (Contractor, The Linux Foundation)
@ 2019-06-10 21:58 ` Eric Wong (Contractor, The Linux Foundation)
  2019-06-12  0:25 ` [PATCH 15/11] edit: unlink temporary file when done Eric Wong (Contractor, The Linux Foundation)
  14 siblings, 0 replies; 31+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-10 21:58 UTC
  To: meta

Xapian on Linux <3.15 has trouble with coprocesses, since it
uses fork() for locking, and the forked lock process would
unnecessarily hold onto pipes used for git.
---
 I'm not sure if this fixes a problem, actually; but it's a
 general cleanliness thing and we already have convoluted logic
 in the SearchIdx.pm code for v1 to deal with the same thing.

 lib/PublicInbox/V2Writable.pm | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 3484807..09ed4e7 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -533,6 +533,9 @@ W: $list
 	my ($oid, $type, $len) = $self->{-inbox}->git->check($expect_oid);
 	$oid eq $expect_oid or die "BUG: $expect_oid not found after replace";
 
+	# don't leak FDs to Xapian:
+	$self->{-inbox}->git->cleanup;
+
 	# reindex modified messages:
 	for my $smsg (@$need_reindex) {
 		my $num = $smsg->{num};
-- 
EW

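The following is a rough illustration of the FD-leak hazard this patch guards against. It is not public-inbox or Xapian code. A forked child inherits every open descriptor, so if a pipe's write end is still open at fork() time, the reader only sees EOF once the child exits. Here a plain pipe stands in for the git coprocess pipes, and a sleeping child stands in for a fork()-based lock holder; both are assumptions for the demo. Closing the pipe first, as `->git->cleanup` does before reindexing, avoids the wait. As noted above, it is not certain this was the cause of the reported Xapian exception.

```perl
use strict;
use warnings;
use POSIX ();
use Time::HiRes qw(time);

# Seconds until the reader sees EOF, depending on whether the write
# end was closed before or after fork()ing a long-lived child.
sub seconds_until_eof {
	my ($cleanup_first) = @_;
	pipe(my $r, my $w) or die "pipe: $!";
	close($w) if $cleanup_first; # analogous to ->git->cleanup
	my $pid = fork() // die "fork: $!";
	if ($pid == 0) { # stand-in for a fork()-based lock holder
		close($r);
		sleep 2; # keeps any inherited copy of $w open
		POSIX::_exit(0);
	}
	close($w) unless $cleanup_first; # parent closes too late
	my $t0 = time;
	sysread($r, my $buf, 1); # returns 0 once no writer is left
	my $elapsed = time - $t0;
	waitpid($pid, 0);
	return $elapsed;
}

my $early = seconds_until_eof(1);
my $late = seconds_until_eof(0);
printf "cleanup before fork: %.2fs, after fork: %.2fs\n", $early, $late;
```

In the "after fork" case, the child's inherited copy of the write end keeps the pipe alive for the full two seconds, even though the parent already closed its own copy.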

* [WIP] v2writable: support INBOX_DEBUG=replace
  2019-06-10 19:40         ` Konstantin Ryabitsev
@ 2019-06-10 22:03           ` Eric Wong
  2019-06-10 22:13             ` Konstantin Ryabitsev
  2019-06-11 18:43             ` [WIP] v2writable: support INBOX_DEBUG=replace Eric Wong
  0 siblings, 2 replies; 31+ messages in thread
From: Eric Wong @ 2019-06-10 22:03 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Mon, Jun 10, 2019 at 07:29:05PM +0000, Eric Wong wrote:
> > > I did a few successful tests on small trial lists, but I'm running
> > > into a
> > > problem when I try to actually edit something in (a copy of) LKML:
> > > 
> > > $ perl5lib/bin/public-inbox-edit -m messageid /mnt/fastio/lkml
> > > (mutt opens here)
> > > 1 kept, 0 deleted.
> > > Exception: Expected block 102325 to be level 2, not 0
> > 
> > That's an exception from Xapian I haven't seen that in years.
> > Which version of Xapian and are you using chert or glass?
> 
> EL7 has 1.2.25.

Oh, I just realized that doesn't use OFD locks at all, because
it's Linux 3.10 and OFD locks appeared in 3.15 (unless Red Hat
backported them).

Maybe PATCH 14/11 fixes it:

  https://public-inbox.org/meta/20190610215811.untkksidetf3erf6@dcvr/

> > > I can provide more debug info if that helps.
> > 
> > Yes, I could probably add some debug messages to make it easier
> 
> Sounds good -- I had suspected it was coming from Xapian and I know EL7 is
> lagging behind quite a bit. If this is too much divergence to work with, I
> can probably build a xapian14 package, but I'm afraid of the rabbit hole I
> may have to go down to make that work.

Xapian 1.4 has huge performance improvements in worst-case
scenarios with the glass backend; so it might be worth trying
anyways.

But that won't get you Linux >=3.15 for OFD locks; so Xapian
is probably still using the nasty fork()-based lock in older
releases.

Maybe this dirty patch can dump more info:
---------8<--------
Subject: [WIP] v2writable: support INBOX_DEBUG=replace

Dirty patch to enable ->replace debugging via
INBOX_DEBUG=replace environment.

cf. <20190610192905.55xb737jl7qnbh23@dcvr>
---
 lib/PublicInbox/V2Writable.pm | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 3484807..164a032 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -20,6 +20,9 @@ use PublicInbox::Spawn qw(spawn);
 use PublicInbox::SearchIdx;
 use IO::Handle;
 
+# unstable interface
+use constant DBG_REPLACE => !!(($ENV{INBOX_DEBUG}//'') =~ /\breplace\b/);
+
 # an estimate of the post-packed size to the raw uncompressed size
 my $PACKING_FACTOR = 0.4;
 
@@ -312,14 +315,23 @@ sub _replace_oids ($$$) {
 		$self->{epoch_max} = $max;
 	}
 
+	if (DBG_REPLACE) {
+		warn "Replacing OIDs\n";
+		warn "\t", $_, "\n" for (keys %$replace_map);
+	}
+
 	foreach my $i (0..$max) {
 		my $git_dir = "$pfx/$i.git";
 		-d $git_dir or next;
+
+		warn "In $git_dir ... " if DBG_REPLACE;
 		my $git = PublicInbox::Git->new($git_dir);
 		my $im = $self->import_init($git, 0, 1);
 		$rewrites->[$i] = $im->replace_oids($mime, $replace_map);
 		$im->done;
+		warn "Done: $git_dir" if DBG_REPLACE;
 	}
+	warn "done replacing in git repos" if DBG_REPLACE;
 	$rewrites;
 }
 
@@ -369,6 +381,7 @@ sub rewrite_internal ($$;$$$) {
 	foreach my $mid (@$mids) {
 		my %gone; # num => [ smsg, raw ]
 		my ($id, $prev);
+		warn "looking for <$mid>" if DBG_REPLACE;
 		while (my $smsg = $over->next_by_mid($mid, \$id, \$prev)) {
 			my $msg = get_blob($self, $smsg);
 			if (!defined($msg)) {
@@ -380,6 +393,10 @@ sub rewrite_internal ($$;$$$) {
 			if (content_matches($cids, $cur)) {
 				$smsg->{mime} = $cur;
 				$gone{$smsg->{num}} = [ $smsg, \$orig ];
+				DBG_REPLACE and
+					warn "matched <$mid> => $smsg->{num}";
+			} else {
+				DBG_REPLACE and warn "no match <$mid>";
 			}
 		}
 		my $n = scalar keys %gone;
@@ -387,6 +404,7 @@ sub rewrite_internal ($$;$$$) {
 		if ($n > 1) {
 			warn "BUG: multiple articles linked to <$mid>\n",
 				join(',', sort keys %gone), "\n";
+			warn "Replacing all of them\n";
 		}
 		foreach my $num (keys %gone) {
 			my ($smsg, $orig) = @{$gone{$num}};
@@ -513,6 +531,7 @@ sub replace ($$$) {
 
 	my $raw = $new_mime->as_string;
 	my $expect_oid = git_hash_raw($self, \$raw);
+	warn "expect_oid: $expect_oid" if DBG_REPLACE;
 	my $rewritten = _replace($self, $old_mime, $new_mime, \$raw) or return;
 	my $need_reindex = $rewritten->{need_reindex};
 
@@ -537,6 +556,7 @@ W: $list
 	for my $smsg (@$need_reindex) {
 		my $num = $smsg->{num};
 		my $mid0 = $smsg->{mid};
+		warn "Reindexing article $num <$mid0>" if DBG_REPLACE;
 		do_idx($self, \$raw, $new_mime, $len, $num, $oid, $mid0);
 	}
 	$rewritten->{rewrites};
-- 
EW

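The DBG_REPLACE constant above turns on only when `replace` appears as a whole word in INBOX_DEBUG. Below is a small sketch of the same check written as a function, so a few values can be tried. `debug_topic_enabled` is a hypothetical name. The patch itself marks INBOX_DEBUG as an unstable interface.

```perl
use strict;
use warnings;

# Same test as the patch's DBG_REPLACE: $topic must appear as a whole
# word in $spec (e.g. "replace" or "index,replace").
sub debug_topic_enabled {
	my ($spec, $topic) = @_;
	return !!(($spec // '') =~ /\b\Q$topic\E\b/);
}

# Evaluated once at compile time, like the patch's constant.  When it
# is false, perl folds away statements such as `warn ... if DBG_REPLACE`.
use constant DBG_REPLACE => debug_topic_enabled($ENV{INBOX_DEBUG}, 'replace');

warn "replace debugging enabled\n" if DBG_REPLACE;
```

For example, running `INBOX_DEBUG=replace public-inbox-edit -m MESSAGE_ID INBOX_DIR` would enable the extra warnings. The variable has to be set before the process starts, since the constant is fixed at compile time.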

* Re: [WIP] v2writable: support INBOX_DEBUG=replace
  2019-06-10 22:03           ` [WIP] v2writable: support INBOX_DEBUG=replace Eric Wong
@ 2019-06-10 22:13             ` Konstantin Ryabitsev
  2019-06-10 23:12               ` [WIP] add more debug tracing around idx_init Eric Wong
  2019-06-11 18:43             ` [WIP] v2writable: support INBOX_DEBUG=replace Eric Wong
  1 sibling, 1 reply; 31+ messages in thread
From: Konstantin Ryabitsev @ 2019-06-10 22:13 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Mon, Jun 10, 2019 at 10:03:20PM +0000, Eric Wong wrote:
>Maybe PATCH 14/11 fixes it:
>
>  https://public-inbox.org/meta/20190610215811.untkksidetf3erf6@dcvr/

It didn't, unfortunately.

>But that won't get you Linux >=3.15 for OFD locks; so Xapian
>is probably still using the nasty fork()-based lock in older
>releases.
>
>Maybe this dirty patch can dump more info:
>---------8<--------
>Subject: [WIP] v2writable: support INBOX_DEBUG=replace

This is what I got right before the "Expected block" error:
expect_oid: 19af5df050275fb91f4104180e0c86d4b155c23e at /home/mricon/perl5lib//PublicInbox/V2Writable.pm line 534.

Hope this helps!

-K

* [WIP] add more debug tracing around idx_init
  2019-06-10 22:13             ` Konstantin Ryabitsev
@ 2019-06-10 23:12               ` Eric Wong
  2019-06-11 15:33                 ` Konstantin Ryabitsev
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Wong @ 2019-06-10 23:12 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Mon, Jun 10, 2019 at 10:03:20PM +0000, Eric Wong wrote:
> > Maybe PATCH 14/11 fixes it:
> > 
> >  https://public-inbox.org/meta/20190610215811.untkksidetf3erf6@dcvr/
> 
> It didn't, unfortunately.
> 
> > But that won't get you Linux >=3.15 for OFD locks; so Xapian
> > is probably still using the nasty fork()-based lock in older
> > releases.
> > 
> > Maybe this dirty patch can dump more info:
> > ---------8<--------
> > Subject: [WIP] v2writable: support INBOX_DEBUG=replace
> 
> This is what I got right before the "Expected block" error:
> expect_oid: 19af5df050275fb91f4104180e0c86d4b155c23e at /home/mricon/perl5lib//PublicInbox/V2Writable.pm line 534.
> 
> Hope this helps!

It could be hitting something during idx_init.  Either that
or that message has no Message-IDs after editing.

You can probably sprinkle a:

	system("strace -f -o /tmp/strace.out -p $$ &");

somewhere to dig deeper without tracing mutt.
I use the MAIL_EDITOR env var in tests, but am somewhat hesitant to
document and support it forever; for now, though, you can use
something like:

  MAIL_EDITOR="perl -i -p -e 's/^Subject:/Foo:/'"

to try non-interactive editing and see whether some mutt config
is doing something strange, such as clobbering Message-IDs.
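The one-liner works because `-p` wraps the expression in a loop that reads each line, applies the substitution and prints the line back out. `-i` writes that output back to the file being edited. The sketch below emulates the loop on an in-memory message to show the side effects. `fake_mail_editor` is a made-up name for illustration. In the result, the Subject header is renamed and the Message-ID is left alone. Body lines that happen to begin with "Subject:" are rewritten too.

```perl
use strict;
use warnings;

# Emulates `perl -i -p -e 's/^Subject:/Foo:/'` on an in-memory
# message: -p applies the substitution to every line (headers and
# body alike) and prints each line back out.
sub fake_mail_editor {
    my ($raw) = @_;
    my $out = '';
    for my $line (split /^/, $raw) {    # split /^/ keeps line endings
        $line =~ s/^Subject:/Foo:/;
        $out .= $line;
    }
    return $out;
}

my $msg = "From: a\@example.com\nSubject: hello\nMessage-ID: <x\@y>\n\n" .
          "Subject: lines in the body match too\n";
print fake_mail_editor($msg);
```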

-----8<------
Subject: [WIP] add more debug tracing around idx_init

---
 lib/PublicInbox/V2Writable.pm | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 7d6d618..bd156c5 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -45,6 +45,7 @@ sub count_partitions ($) {
 	my ($self) = @_;
 	my $nparts = 0;
 	my $xpfx = $self->{xpfx};
+	warn 'counting partitions' if DBG_REPLACE;
 
 	# always load existing partitions in case core count changes:
 	# Also, partition count may change while -watch is running
@@ -58,6 +59,7 @@ sub count_partitions ($) {
 			};
 		}
 	}
+	warn "nparts=$nparts" if DBG_REPLACE;
 	$nparts;
 }
 
@@ -255,6 +257,7 @@ sub idx_init {
 	# do not leak read-only FDs to child processes, we only have these
 	# FDs for duplicate detection so they should not be
 	# frequently activated.
+	warn 'cleaning read-only elements' if DBG_REPLACE;
 	delete $ibx->{$_} foreach (qw(git mm search));
 
 	my $indexlevel = $ibx->{indexlevel};
@@ -263,6 +266,7 @@ sub idx_init {
 	}
 
 	if ($self->{parallel}) {
+		warn 'preparing {bnote}' if DBG_REPLACE;
 		pipe(my ($r, $w)) or die "pipe failed: $!";
 		# pipe for barrier notifications doesn't need to be big,
 		# 1031: F_SETPIPE_SZ
@@ -271,10 +275,12 @@ sub idx_init {
 		$w->autoflush(1);
 	}
 
+	warn 'preparing lock' if DBG_REPLACE;
 	my $over = $self->{over};
 	$ibx->umask_prepare;
 	$ibx->with_umask(sub {
 		$self->lock_acquire unless ($opt && $opt->{-skip_lock});
+		warn 'creating ->over' if DBG_REPLACE;
 		$over->create;
 
 		# -compact can change partition count while -watch is idle
@@ -287,10 +293,12 @@ sub idx_init {
 		my $max = $self->{partitions} - 1;
 
 		# idx_parts must be visible to all forked processes
+		warn 'preparing SearchIdxParts' if DBG_REPLACE;
 		my $idx = $self->{idx_parts} = [];
 		for my $i (0..$max) {
 			push @$idx, PublicInbox::SearchIdxPart->new($self, $i);
 		}
+		warn 'SearchIdxParts created' if DBG_REPLACE;
 
 		# Now that all subprocesses are up, we can open the FDs
 		# for SQLite:
@@ -358,6 +366,7 @@ sub content_matches ($$) {
 # used for removing or replacing (purging)
 sub rewrite_internal ($$;$$$) {
 	my ($self, $old_mime, $cmt_msg, $new_mime, $sref) = @_;
+	warn "idx_init" if DBG_REPLACE;
 	$self->idx_init;
 	my ($im, $need_reindex, $replace_map);
 	if ($sref) {
-- 
EW


* Re: [WIP] add more debug tracing around idx_init
  2019-06-10 23:12               ` [WIP] add more debug tracing around idx_init Eric Wong
@ 2019-06-11 15:33                 ` Konstantin Ryabitsev
  0 siblings, 0 replies; 31+ messages in thread
From: Konstantin Ryabitsev @ 2019-06-11 15:33 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Mon, Jun 10, 2019 at 11:12:20PM +0000, Eric Wong wrote:
>Somewhere and get more deep into it w/o tracing mutt.
>I use the MAIL_EDITOR env, in tests, but am kinda hesitant to
>document/support it forever; but for now, you can use something
>like:
>
>  MAIL_EDITOR="perl -i -p -e 's/^Subject:/Foo:/'"

I just change a single letter in vi, which is sufficient for the tests. 

>Subject: [WIP] add more debug tracing around idx_init

Here are a few more lines of output:

Will use vi to edit mail
counting partitions at /home/mricon/perl5lib//PublicInbox/V2Writable.pm line 48.
nparts=14 at /home/mricon/perl5lib//PublicInbox/V2Writable.pm line 62.
expect_oid: 8f5ea3f1210f998f26a8c67c75b3a9042a338e11 at /home/mricon/perl5lib//PublicInbox/V2Writable.pm line 543.
idx_init at /home/mricon/perl5lib//PublicInbox/V2Writable.pm line 369.
cleaning read-only elements at /home/mricon/perl5lib//PublicInbox/V2Writable.pm line 260.
preparing lock at /home/mricon/perl5lib//PublicInbox/V2Writable.pm line 278.
creating ->over at /home/mricon/perl5lib//PublicInbox/V2Writable.pm line 283.
counting partitions at /home/mricon/perl5lib//PublicInbox/V2Writable.pm line 48.
nparts=14 at /home/mricon/perl5lib//PublicInbox/V2Writable.pm line 62.
preparing SearchIdxParts at /home/mricon/perl5lib//PublicInbox/V2Writable.pm line 296.
Exception: Expected block 102325 to be level 2, not 0

Hope this helps. Let me know if you do want me to run strace.

Best,
-K

* Re: [WIP] v2writable: support INBOX_DEBUG=replace
  2019-06-10 22:03           ` [WIP] v2writable: support INBOX_DEBUG=replace Eric Wong
  2019-06-10 22:13             ` Konstantin Ryabitsev
@ 2019-06-11 18:43             ` Eric Wong
  1 sibling, 0 replies; 31+ messages in thread
From: Eric Wong @ 2019-06-11 18:43 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Eric Wong <e@80x24.org> wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > On Mon, Jun 10, 2019 at 07:29:05PM +0000, Eric Wong wrote:
> > > > I did a few successful tests on small trial lists, but I'm running
> > > > into a
> > > > problem when I try to actually edit something in (a copy of) LKML:
> > > > 
> > > > $ perl5lib/bin/public-inbox-edit -m messageid /mnt/fastio/lkml
> > > > (mutt opens here)
> > > > 1 kept, 0 deleted.
> > > > Exception: Expected block 102325 to be level 2, not 0
> > > 
> > > That's an exception from Xapian I haven't seen that in years.
> > > Which version of Xapian and are you using chert or glass?
> > 
> > EL7 has 1.2.25.

Also, is that the Search::Xapian version or the
xapian-core-libs/libxapianXX version?  It's OK if they mismatch;
Search::Xapian (the XS bindings) can even work with libxapianXX
1.4.x.  It's mainly the libxapianXX version that matters.

> Oh, I just realized that doesn't use OFD locks at all because
> it's Linux 3.10 and OFD locks appeared in 3.15 (unless RH backported).

Yes, it appears RH backported OFD locks to 3.10; so more variables
to consider...

Just wondering, do t/edit.t and t/replace.t tests pass for you?

* Re: [PATCH 00/11] v2: implement message editing
  2019-06-10 18:57     ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
  2019-06-10 19:29       ` Eric Wong
@ 2019-06-11 21:06       ` Konstantin Ryabitsev
  2019-06-12  0:18         ` [PATCH] searchidx: improve error message when Xapian fails Eric Wong
  1 sibling, 1 reply; 31+ messages in thread
From: Konstantin Ryabitsev @ 2019-06-11 21:06 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Mon, 10 Jun 2019 at 14:57, Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
> I did a few successful tests on small trial lists, but I'm running into
> a problem when I try to actually edit something in (a copy of) LKML:
>
> $ perl5lib/bin/public-inbox-edit -m messageid /mnt/fastio/lkml
> (mutt opens here)
> 1 kept, 0 deleted.
> Exception: Expected block 102325 to be level 2, not 0

In the process of helping me debug this, Eric pointed out that the
xapian-db was corrupted. Apparently, this happened while I was copying
the lkml dir to a safe location for testing purposes. Once I had a
non-corrupted version of the database, I was able to successfully edit
the necessary message.

My sincere apologies for the noise, and huge thanks to Eric for going
the extra mile for me.

Regards,
-- 
Konstantin Ryabitsev
Director, Projects IT
The Linux Foundation
Montréal, Québec

* [PATCH] searchidx: improve error message when Xapian fails
  2019-06-11 21:06       ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
@ 2019-06-12  0:18         ` Eric Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Wong @ 2019-06-12  0:18 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > Exception: Expected block 102325 to be level 2, not 0
> 
> In the process of helping me debug this, Eric pointed out that the
> xapian-db was corrupted. Apparently, this happened while I was copying
> the lkml dir to a safe location for testing purposes. Once I had a
> non-corrupted version of the database, I was able to successfully edit
> the necessary message.

Also, for public reference, the flock(1) command (or similar)
can be used to copy an inbox safely while -mda or -watch
runs:

	flock /path/to/inbox.lock cp -a ....

To minimize lock time, I suppose this is also possible:

	# unlocked copy step
	rsync -a $SRC $DST

	# locked copy using a remote destination to force delta-transmission:
	flock /path/to/inbox.lock rsync -a $SRC $REMOTE_DST
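Here is a rough Perl equivalent of the flock(1) invocation, offered as a sketch. It assumes -mda and -watch hold flock(2) locks on the same inbox.lock file, which is what makes the flock(1) approach above work. `with_inbox_lock` is a hypothetical name and is not part of public-inbox.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper mirroring `flock /path/to/inbox.lock CMD...`:
# take an exclusive lock (flock(1)'s default), run the command, then
# release the lock by closing the handle.
sub with_inbox_lock {
    my ($lock_path, @cmd) = @_;
    # '>>' creates the lock file if needed without truncating it
    open(my $lock_fh, '>>', $lock_path) or die "open $lock_path: $!";
    flock($lock_fh, LOCK_EX) or die "flock $lock_path: $!";
    my $status = system(@cmd);
    close($lock_fh) or die "close $lock_path: $!";
    return $status == 0;
}

# Example (paths are placeholders):
# with_inbox_lock('/path/to/inbox.lock', 'cp', '-a', $src, $dst)
#     or die "locked copy failed\n";
```

On the rsync variant: when both source and destination are local paths, rsync defaults to --whole-file and skips its delta-transfer algorithm. That is why the locked pass above targets a remote destination.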

But to make Xapian corruption more apparent in the future:

---------------8<-------------
Subject: [PATCH] searchidx: improve error message when Xapian fails

Make it easier to detect if a partition is corrupt.
---
 lib/PublicInbox/SearchIdx.pm | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 9985628..7cd67f1 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -117,7 +117,11 @@ sub _xdb_acquire {
 		}
 	}
 	return unless defined $flag;
-	$self->{xdb} = Search::Xapian::WritableDatabase->new($dir, $flag);
+	my $xdb = eval { Search::Xapian::WritableDatabase->new($dir, $flag) };
+	if ($@) {
+		die "Failed opening $dir: ", $@;
+	}
+	$self->{xdb} = $xdb;
 }
 
 sub add_val ($$$) {
-- 
EW
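The patch uses a common Perl idiom. It traps the exception with `eval` and rethrows it with the directory prepended, so the error names the corrupt partition. Below is a self-contained sketch of the same shape. `open_db` is a hypothetical stand-in for Search::Xapian::WritableDatabase->new, which lets the example run without Xapian installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Search::Xapian::WritableDatabase->new:
# dies with a Xapian-like message when the directory is "corrupt".
sub open_db {
    my ($dir) = @_;
    die "Expected block 102325 to be level 2, not 0\n" if $dir =~ /corrupt/;
    return { dir => $dir };
}

# Same shape as the patched _xdb_acquire: trap the exception and
# rethrow it with the directory name so the bad partition is obvious.
sub open_db_or_die {
    my ($dir) = @_;
    my $db = eval { open_db($dir) };
    if ($@) {
        die "Failed opening $dir: ", $@;
    }
    return $db;
}
```

Some code checks eval's return value instead (`eval { ...; 1 } or die ...`) because that form does not rely on `$@` surviving intact. Either form names the failing directory.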

* [PATCH 15/11] edit: unlink temporary file when done
  2019-06-09  2:51 [PATCH 00/11] v2: implement message editing Eric Wong (Contractor, The Linux Foundation)
                   ` (13 preceding siblings ...)
  2019-06-10 21:58 ` [PATCH 14/11] v2writable: replace: kill git processes before reindexing Eric Wong (Contractor, The Linux Foundation)
@ 2019-06-12  0:25 ` Eric Wong (Contractor, The Linux Foundation)
  14 siblings, 0 replies; 31+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2019-06-12  0:25 UTC (permalink / raw)
  To: meta

We don't need to leave temporary files lying around.
---
 script/public-inbox-edit | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/script/public-inbox-edit b/script/public-inbox-edit
index 16d7852..2e2c761 100755
--- a/script/public-inbox-edit
+++ b/script/public-inbox-edit
@@ -121,7 +121,7 @@ $mids
 
 my $tmpl = 'public-inbox-edit-XXXXXX';
 foreach my $to_edit (values %$found) {
-	my ($edit_fh, $edit_fn) = tempfile($tmpl, TMPDIR => 1);
+	my ($edit_fh, $edit_fn) = tempfile($tmpl, TMPDIR => 1, UNLINK => 1);
 	$edit_fh->autoflush(1);
 	my ($ibx, $smsg) = @{$to_edit->[0]};
 	my $old_raw = $ibx->msg_by_smsg($smsg);
-- 
EW
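For anyone unfamiliar with File::Temp's options, here is a standalone version of the changed call. `TMPDIR => 1` places the template-named file in the system temporary directory. `UNLINK => 1` asks File::Temp to remove the file when the program exits. The message written here is only a placeholder.

```perl
use strict;
use warnings;
use IO::Handle;                  # for ->autoflush on the plain handle
use File::Temp qw(tempfile);

my $tmpl = 'public-inbox-edit-XXXXXX';

# TMPDIR => 1: create the file (named from $tmpl) in the system temp
#   directory rather than the current directory.
# UNLINK => 1: have File::Temp delete the file when the program exits.
my ($edit_fh, $edit_fn) = tempfile($tmpl, TMPDIR => 1, UNLINK => 1);
$edit_fh->autoflush(1);
print $edit_fh "Subject: example\n\nbody\n" or die "write $edit_fn: $!";
```

The editor runs as a child process and needs the path to stay valid while it runs. Deferring the unlink to exit, rather than unlinking immediately, satisfies that.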

* Re: [PATCH 11/11] edit: new tool to perform edits
  2019-06-10 16:06   ` Konstantin Ryabitsev
  2019-06-10 18:02     ` Eric Wong
@ 2019-06-13  8:07     ` Eric Wong
  1 sibling, 0 replies; 31+ messages in thread
From: Eric Wong @ 2019-06-13  8:07 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Sun, Jun 09, 2019 at 02:51:47AM +0000, Eric Wong (Contractor, The Linux Foundation) wrote:
> > +public-inbox-edit - edit messages in a public inbox
> > +
> > +=head1 SYNOPSIS
> > +
> > +	public-inbox-edit -m MESSAGE-ID --all|INBOX_DIR
> > +
> > +	public-inbox-edit -F RAW_FILE --all|INBOX_DIR [.. INBOX_DIR]
> 
> A quick RFE that's beyond the scope of this work, but would be handy from
> the usability perspective -- pass a search term in case multiple messages
> need to be edited.  E.g.:

Fwiw, all this editing stuff gives me the heebie-jeebies.  Not
the code or tests, but users potentially shooting themselves
(or other readers) in the foot, and having to deal with that on
the user support side...

> public-inbox-edit -s "johndoe@example.com" INBOX_DIR
> 
> The way I see it working, that would:
> 
> 1. find all matching messages and put them into an mbox file

So 1) is something I've been wanting to do for a long time,
anyway.  And I've got some optimizations in the pipeline
(literally) which make sense for throughput-focused local
usage.

> 2. fire off "mutt -f" to start the editing session

In my usage of mutt, I have to remind myself to purge the old
message.  That seems like a bad usability hiccup; I'm not sure if
it'll be a problem for others...

> 3. do a batch replace of all messages in the edited mbox file

The backend part shouldn't be too hard.  Mapping the mbox entries
back to the correct messages gets a little tricky with support for
ambiguous Message-IDs, but I think it'll be workable as long as
Message-IDs remain non-editable (and non-reorderable).
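Step 3 of the RFE depends on mapping each edited mbox entry back to its original message. Below is a rough sketch of the check implied above. It assumes an mboxrd file in which body lines starting with "From " are escaped as ">From ", so splitting on "^From " is safe. It ignores folded Message-ID headers and messages carrying several Message-IDs. `mbox_mids` and `mids_unchanged` are hypothetical helpers, not public-inbox APIs.

```perl
use strict;
use warnings;

# Hypothetical helper: the Message-ID of each message in an mboxrd
# string, in order.  Each message starts at a line beginning "From ".
sub mbox_mids {
    my ($mbox) = @_;
    my @mids;
    for my $msg (split /^(?=From )/m, $mbox) {
        my ($hdr) = split /\n\n/, $msg, 2;   # header block only
        next unless defined $hdr;
        # folded (multi-line) headers are not handled here
        my ($mid) = $hdr =~ /^Message-ID:\s*<([^>]+)>/mi;
        push @mids, $mid if defined $mid;
    }
    return @mids;
}

# A batch replace is only safe if the edited mbox keeps the same
# Message-IDs in the same order, so entry N maps back to message N.
sub mids_unchanged {
    my ($old, $new) = @_;
    my @a = mbox_mids($old);
    my @b = mbox_mids($new);
    return 0 unless @a == @b;
    for my $i (0 .. $#a) {
        return 0 unless $a[$i] eq $b[$i];
    }
    return 1;
}
```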

end of thread, other threads:[~2019-06-13  8:07 UTC | newest]

Thread overview: 31+ messages
2019-06-09  2:51 [PATCH 00/11] v2: implement message editing Eric Wong (Contractor, The Linux Foundation)
2019-06-09  2:51 ` [PATCH 01/11] v2writable: consolidate overview and indexing call Eric Wong (Contractor, The Linux Foundation)
2019-06-09  2:51 ` [PATCH 02/11] import: extract_author_info becomes extract_commit_info Eric Wong (Contractor, The Linux Foundation)
2019-06-09  2:51 ` [PATCH 03/11] import: switch to "replace_oids" interface for purge Eric Wong (Contractor, The Linux Foundation)
2019-06-09  2:51 ` [PATCH 04/11] v2writable: implement ->replace call Eric Wong (Contractor, The Linux Foundation)
2019-06-09  2:51 ` [PATCH 05/11] admin: remove warning arg for unconfigured inboxes Eric Wong (Contractor, The Linux Foundation)
2019-06-09  2:51 ` [PATCH 06/11] purge: start moving common options to AdminEdit module Eric Wong (Contractor, The Linux Foundation)
2019-06-09  2:51 ` [PATCH 07/11] admin: beef up resolve_inboxes to handle purge options Eric Wong (Contractor, The Linux Foundation)
2019-06-09  2:51 ` [PATCH 08/11] AdminEdit: move editability checks from -purge Eric Wong (Contractor, The Linux Foundation)
2019-06-09  2:51 ` [PATCH 09/11] admin: expose ->config Eric Wong (Contractor, The Linux Foundation)
2019-06-09  2:51 ` [PATCH 10/11] doc: document the --prune option for -index Eric Wong (Contractor, The Linux Foundation)
2019-06-09  2:51 ` [PATCH 11/11] edit: new tool to perform edits Eric Wong (Contractor, The Linux Foundation)
2019-06-10 16:06   ` Konstantin Ryabitsev
2019-06-10 18:02     ` Eric Wong
2019-06-13  8:07     ` Eric Wong
2019-06-10 15:06 ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
2019-06-10 15:40   ` Eric Wong
2019-06-10 17:56     ` [PATCH 12/11] edit|purge: improve output on rewrites Eric Wong
2019-06-10 18:57     ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
2019-06-10 19:29       ` Eric Wong
2019-06-10 19:40         ` Konstantin Ryabitsev
2019-06-10 22:03           ` [WIP] v2writable: support INBOX_DEBUG=replace Eric Wong
2019-06-10 22:13             ` Konstantin Ryabitsev
2019-06-10 23:12               ` [WIP] add more debug tracing around idx_init Eric Wong
2019-06-11 15:33                 ` Konstantin Ryabitsev
2019-06-11 18:43             ` [WIP] v2writable: support INBOX_DEBUG=replace Eric Wong
2019-06-11 21:06       ` [PATCH 00/11] v2: implement message editing Konstantin Ryabitsev
2019-06-12  0:18         ` [PATCH] searchidx: improve error message when Xapian fails Eric Wong
2019-06-10 18:17 ` [PATCH 13/11] edit: drop unwanted headers before noop check Eric Wong (Contractor, The Linux Foundation)
2019-06-10 21:58 ` [PATCH 14/11] v2writable: replace: kill git processes before reindexing Eric Wong (Contractor, The Linux Foundation)
2019-06-12  0:25 ` [PATCH 15/11] edit: unlink temporary file when done Eric Wong (Contractor, The Linux Foundation)

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
