From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id BCFB41F9FB
	for ; Sun, 9 Jun 2019 02:51:49 +0000 (UTC)
From: "Eric Wong (Contractor, The Linux Foundation)"
To: meta@public-inbox.org
Subject: [PATCH 11/11] edit: new tool to perform edits
Date: Sun, 9 Jun 2019 02:51:47 +0000
Message-Id: <20190609025147.24966-12-e@80x24.org>
In-Reply-To: <20190609025147.24966-1-e@80x24.org>
References: <20190609025147.24966-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

This wrapper around V2Writable->replace provides a
user-interface for editing messages as single-message mboxes
(or the raw text via $EDITOR).
---
 Documentation/include.mk              |   1 +
 Documentation/public-inbox-config.pod |   4 +
 Documentation/public-inbox-edit.pod   | 109 ++++++++++++
 MANIFEST                              |   3 +
 script/public-inbox-edit              | 233 ++++++++++++++++++++++++++
 t/edit.t                              | 178 ++++++++++++++++++++
 6 files changed, 528 insertions(+)
 create mode 100644 Documentation/public-inbox-edit.pod
 create mode 100755 script/public-inbox-edit
 create mode 100644 t/edit.t

diff --git a/Documentation/include.mk b/Documentation/include.mk
index b064f29..f5f46d0 100644
--- a/Documentation/include.mk
+++ b/Documentation/include.mk
@@ -32,6 +32,7 @@ podtext = $(PODTEXT) $(PODTEXT_OPTS)
 # MakeMaker only seems to support manpage sections 1 and 3...
 m1 =
 m1 += public-inbox-compact
+m1 += public-inbox-edit
 m1 += public-inbox-httpd
 m1 += public-inbox-index
 m1 += public-inbox-mda
diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index db81bf1..a86132b 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -234,6 +234,10 @@ C, but may be overridden.
 
 Default: basename of C, /var/www/htdocs/cgit/ or /usr/share/cgit/
 
+=item publicinbox.mailEditor
+
+See L
+
 =item publicinbox.wwwlisting
 
 Enable a HTML listing style when the root path of the URL '/' is accessed.
diff --git a/Documentation/public-inbox-edit.pod b/Documentation/public-inbox-edit.pod
new file mode 100644
index 0000000..97c7c92
--- /dev/null
+++ b/Documentation/public-inbox-edit.pod
@@ -0,0 +1,109 @@
+=head1 NAME
+
+public-inbox-edit - edit messages in a public inbox
+
+=head1 SYNOPSIS
+
+  public-inbox-edit -m MESSAGE-ID --all|INBOX_DIR
+
+  public-inbox-edit -F RAW_FILE --all|INBOX_DIR [.. INBOX_DIR]
+
+=head1 DESCRIPTION
+
+public-inbox-edit allows editing messages in a given inbox
+to remove sensitive information. It is only intended as a
+last resort, as it will cause discontiguous git history and
+draw more attention to the sensitive data in mirrors.
+
+=head1 OPTIONS
+
+=over
+
+=item --all
+
+Edit the message in all inboxes configured in ~/.public-inbox/config.
+This is an alternative to specifying individual inbox directories
+on the command-line.
+
+=item -m MESSAGE-ID
+
+Edits the message corresponding to the given C.
+If the C is ambiguous, C<--force> or using the
+C<--file> of the original will be required.
+
+=item -F FILE
+
+Edits the message corresponding to the Message-ID: header
+and content given in C. This requires the unmodified
+raw message, and the contents of C will not itself
+be modified. This is useful if a Message-ID is ambiguous
+due to filtering/munging rules or other edits.
+
+=item --force
+
+Forcibly perform the edit even if Message-ID is ambiguous.
+
+=item --raw
+
+Do not perform "From " line escaping. By default, this
+generates an mboxrd variant file to detect unpurged messages
+in the new mbox. This makes sense if your configured
+C is a regular editor and not
+something like C
+
+=back
+
+=head1 CONFIGURATION
+
+=over 8
+
+=item publicinbox.mailEditor
+
+The command used to perform the edit. An example of this would be
+C, and the user would then use the facilities in L
+to edit the mail. This is useful for editing attachments or
+Base64-encoded emails which are more difficult to edit with a
+normal editor (configured via C, C or C).
+
+Default: none
+
+=back
+
+=head1 ENVIRONMENT
+
+=over 8
+
+=for comment MAIL_EDITOR is undocumented (unstable, don't want naming conflicts)
+
+=item GIT_EDITOR / VISUAL / EDITOR
+
+public-inbox-edit will fall back to using one of these variables
+(in that order) if C is unset.
+
+=item PI_CONFIG
+
+The default config file, normally "~/.public-inbox/config".
+See L
+
+=back
+
+=head1 LIMITATIONS
+
+Only L repositories are supported.
+
+=head1 CONTACT
+
+Feedback welcome via plain-text mail to L
+
+The mail archives are hosted at L
+and L
+
+=head1 COPYRIGHT
+
+Copyright 2019 all contributors L
+
+License: AGPL-3.0+ L
+
+=head1 SEE ALSO
+
+L
diff --git a/MANIFEST b/MANIFEST
index dcf1a60..a44632a 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -13,6 +13,7 @@ Documentation/public-inbox-compact.pod
 Documentation/public-inbox-config.pod
 Documentation/public-inbox-convert.pod
 Documentation/public-inbox-daemon.pod
+Documentation/public-inbox-edit.pod
 Documentation/public-inbox-httpd.pod
 Documentation/public-inbox-index.pod
 Documentation/public-inbox-mda.pod
@@ -150,6 +151,7 @@ sa_config/root/etc/spamassassin/public-inbox.pre
 sa_config/user/.spamassassin/user_prefs
 script/public-inbox-compact
 script/public-inbox-convert
+script/public-inbox-edit
 script/public-inbox-httpd
 script/public-inbox-index
 script/public-inbox-init
@@ -185,6 +187,7 @@ t/content_id.t
 t/convert-compact.t
 t/data/0001.patch
 t/ds-leak.t
+t/edit.t
 t/emergency.t
 t/fail-bin/spamc
 t/feed.t
diff --git a/script/public-inbox-edit b/script/public-inbox-edit
new file mode 100755
index 0000000..ff0351a
--- /dev/null
+++ b/script/public-inbox-edit
@@ -0,0 +1,233 @@
+#!/usr/bin/perl -w
+# Copyright (C) 2019 all contributors
+# License: AGPL-3.0+
+#
+# Used for editing messages in a public-inbox.
+# Supports v2 inboxes only, for now.
+use strict;
+use warnings;
+use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
+use PublicInbox::AdminEdit;
+use File::Temp qw(tempfile);
+use PublicInbox::ContentId qw(content_id);
+use PublicInbox::MID qw(mid_clean mids);
+PublicInbox::Admin::check_require('-index');
+require PublicInbox::MIME;
+require PublicInbox::InboxWritable;
+
+my $usage = "$0 -m MESSAGE_ID [--all] [INBOX_DIRS]";
+my $opt = { verbose => 1, all => 0, -min_inbox_version => 2, raw => 0 };
+my @opt = qw(mid|m=s file|F=s raw);
+GetOptions($opt, @PublicInbox::AdminEdit::OPT, @opt) or
+	die "bad command-line args\n$usage\n";
+
+my $editor = $ENV{MAIL_EDITOR}; # e.g. "mutt -f"
+unless (defined $editor) {
+	my $k = 'publicinbox.mailEditor';
+	if (my $cfg = PublicInbox::Admin::config()) {
+		$editor = $cfg->{lc($k)};
+	}
+	unless (defined $editor) {
+		warn "\`$k' not configured, trying \`git var GIT_EDITOR'\n";
+		chomp($editor = `git var GIT_EDITOR`);
+		warn "Will use $editor to edit mail\n";
+	}
+}
+
+my $mid = $opt->{mid};
+my $file = $opt->{file};
+if (defined $mid && defined $file) {
+	die "the --mid and --file options are mutually exclusive\n";
+}
+
+my @ibxs = PublicInbox::Admin::resolve_inboxes(\@ARGV, $opt);
+PublicInbox::AdminEdit::check_editable(\@ibxs);
+
+my $found = {}; # cid => [ [ibx, smsg] [, [ibx, smsg] ] ]
+
+sub find_mid ($) {
+	my ($mid) = @_;
+	foreach my $ibx (@ibxs) {
+		my $over = $ibx->over;
+		my ($id, $prev);
+		while (my $smsg = $over->next_by_mid($mid, \$id, \$prev)) {
+			my $ref = $ibx->msg_by_smsg($smsg);
+			my $mime = PublicInbox::MIME->new($ref);
+			my $cid = content_id($mime);
+			my $tuple = [ $ibx, $smsg ];
+			push @{$found->{$cid} ||= []}, $tuple
+		}
+		delete @$ibx{qw(over mm git search)}; # cleanup
+	}
+	$found;
+}
+
+sub show_cmd ($$) {
+	my ($ibx, $smsg) = @_;
+	" GIT_DIR=$ibx->{mainrepo}/all.git \\\n git show $smsg->{blob}\n";
+}
+
+sub show_found () {
+	foreach my $to_edit (values %$found) {
+		foreach my $tuple (@$to_edit) {
+			my ($ibx, $smsg) = @$tuple;
+			warn show_cmd($ibx, $smsg);
+		}
+	}
+}
+
+if (defined($mid)) {
+	$mid = mid_clean($mid);
+	$found = find_mid($mid);
+	my $nr = scalar(keys %$found);
+	die "No message found for <$mid>\n" unless $nr;
+	if ($nr > 1) {
+		warn <<"";
+Multiple messages with different content found matching
+<$mid>:
+
+		show_found();
+		die "Use --force to edit all of them\n" if !$opt->{force};
+		warn "Will edit all of them\n";
+	}
+} else {
+	open my $fh, '<', $file or die "open($file) failed: $!";
+	my $orig = do { local $/; <$fh> };
+	my $mime = PublicInbox::MIME->new(\$orig);
+	my $mids = mids($mime->header_obj);
+	find_mid($_) for (@$mids); # populates $found
+	my $cid = content_id($mime);
+	my $to_edit = $found->{$cid};
+	unless ($to_edit) {
+		my $nr = scalar(keys %$found);
+		if ($nr > 0) {
+			warn <<"";
+$nr matches to Message-ID(s) in $file, but none matched content
+Partial matches below:
+
+			show_found();
+		} elsif ($nr == 0) {
+			$mids = join('', map { " <$_>\n" } @$mids);
+			warn <<"";
+No messages found matching Message-ID(s) in $file
+$mids
+
+		}
+		exit 1;
+	}
+	$found = { $cid => $to_edit };
+}
+
+my $tmpl = 'public-inbox-edit-XXXXXX';
+foreach my $to_edit (values %$found) {
+	my ($edit_fh, $edit_fn) = tempfile($tmpl, TMPDIR => 1);
+	$edit_fh->autoflush(1);
+	my ($ibx, $smsg) = @{$to_edit->[0]};
+	my $old_raw = $ibx->msg_by_smsg($smsg);
+	delete @$ibx{qw(over mm git search)}; # cleanup
+
+	my $tmp = $$old_raw;
+	if (!$opt->{raw}) {
+		my $oid = $smsg->{blob};
+		print $edit_fh "From mboxrd\@$oid Thu Jan 1 00:00:00 1970\n";
+		$tmp =~ s/^(>*From )/>$1/gm;
+	}
+	print $edit_fh $tmp or
+		die "failed to write tempfile for editing: $!";
+
+	# run the editor, respecting spaces/quote
+retry_edit:
+	if (system(qw(sh -c), qq(eval "$editor" '"\$@"'), '--', $edit_fn)) {
+		if (!(-t STDIN) && !$opt->{force}) {
+			die "E: $editor failed: $?\n";
+		}
+		print STDERR "$editor failed, ";
+		print STDERR "continuing as forced\n" if $opt->{force};
+		while (!$opt->{force}) {
+			print STDERR "(r)etry, (c)ontinue, (q)uit?\n";
+			chomp(my $op = <STDIN> || '');
+			$op = lc($op);
+			goto retry_edit if $op eq 'r';
+			exit $? if $op eq 'q';
+			last if $op eq 'c'; # continuing
+			print STDERR "\`$op' not recognized\n";
+		}
+	}
+
+	# reread the edited file, not using $edit_fh since $EDITOR may
+	# rename/relink $edit_fn
+	open my $new_fh, '<', $edit_fn or
+		die "can't read edited file ($edit_fn): $!\n";
+	my $new_raw = do { local $/; <$new_fh> };
+
+	if (!$opt->{raw}) {
+		# get rid of the From we added
+		$new_raw =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
+
+		# check if user forgot to purge (in mutt) after editing
+		if ($new_raw =~ /^From /sm) {
+			if (-t STDIN) {
+				print STDERR <<'';
+Extra "From " lines detected in new mbox.
+Did you forget to purge the original message from the mbox after editing?
+
+				while (1) {
+					print STDERR <<"";
+(y)es to re-edit, (n)o to continue
+
+					chomp(my $op = <STDIN> || '');
+					$op = lc($op);
+					goto retry_edit if $op eq 'y';
+					last if $op eq 'n'; # continuing
+					print STDERR "\`$op' not recognized\n";
+				}
+			} else { # non-interactive path
+				# unlikely to happen, as extra From lines are
+				# only a common mistake (for me) with
+				# interactive use
+				warn <<"";
+W: possible message boundary splitting error
+
+			}
+		}
+		# unescape what we escaped:
+		$new_raw =~ s/^>(>*From )/$1/gm;
+	}
+
+	my $new_mime = PublicInbox::MIME->new(\$new_raw);
+	my $old_mime = PublicInbox::MIME->new($old_raw);
+
+	# allow changing Received: and maybe other headers which can
+	# contain sensitive info.
+	my $nhdr = $new_mime->header_obj;
+	my $ohdr = $old_mime->header_obj;
+	if (($nhdr->as_string eq $ohdr->as_string) &&
+	    (content_id($new_mime) eq content_id($old_mime))) {
+		warn "No change detected to:\n", show_cmd($ibx, $smsg);
+
+		next unless $opt->{verbose};
+		# should we consider this machine-parseable?
+		print "$ibx->{mainrepo}:\n\tNONE\n";
+		next;
+	}
+
+	foreach my $tuple (@$to_edit) {
+		$ibx = PublicInbox::InboxWritable->new($tuple->[0]);
+		$smsg = $tuple->[1];
+		my $im = $ibx->importer(0);
+		my $commits = $im->replace($old_mime, $new_mime);
+		$im->done;
+		unless ($commits) {
+			warn "Failed to replace:\n", show_cmd($ibx, $smsg);
+			next;
+		}
+		next unless $opt->{verbose};
+		# should we consider this machine-parseable?
+		print "$ibx->{mainrepo}:";
+		if (scalar @$commits) {
+			print join("\n\t", '', @$commits), "\n";
+		} else {
+			print "\tNONE\n";
+		}
+	}
+}
diff --git a/t/edit.t b/t/edit.t
new file mode 100644
index 0000000..61e90f2
--- /dev/null
+++ b/t/edit.t
@@ -0,0 +1,178 @@
+# Copyright (C) 2019 all contributors
+# License: AGPL-3.0+
+# edit frontend behavior test (t/replace.t for backend)
+use strict;
+use warnings;
+use Test::More;
+use File::Temp qw/tempdir/;
+require './t/common.perl';
+require_git(2.6);
+require PublicInbox::Inbox;
+require PublicInbox::InboxWritable;
+require PublicInbox::Config;
+use PublicInbox::MID qw(mid_clean);
+
+my @mods = qw(IPC::Run DBI DBD::SQLite);
+foreach my $mod (@mods) {
+	eval "require $mod";
+	plan skip_all => "missing $mod for $0" if $@;
+};
+IPC::Run->import(qw(run));
+
+my $cmd_pfx = 'blib/script/public-inbox';
+my $tmpdir = tempdir('pi-edit-XXXXXX', TMPDIR => 1, CLEANUP => 1);
+my $mainrepo = "$tmpdir/v2";
+my $ibx = PublicInbox::Inbox->new({
+	mainrepo => $mainrepo,
+	name => 'test-v2edit',
+	version => 2,
+	-primary_address => 'test@example.com',
+	indexlevel => 'basic',
+});
+$ibx = PublicInbox::InboxWritable->new($ibx, {nproc=>1});
+my $cfgfile = "$tmpdir/config";
+local $ENV{PI_CONFIG} = $cfgfile;
+my $file = 't/data/0001.patch';
+open my $fh, '<', $file or die "open: $!";
+my $raw = do { local $/; <$fh> };
+my $im = $ibx->importer(0);
+my $mime = PublicInbox::MIME->new($raw);
+my $mid = mid_clean($mime->header('Message-Id'));
+ok($im->add($mime), 'add message to be edited');
+$im->done;
+my ($in, $out, $err, $cmd, $cur, $t);
+my $__git_dir = "--git-dir=$ibx->{mainrepo}/git/0.git";
+
+$t = '-F FILE'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/boolean prefix/bool pfx/'";
+	$cmd = [ "$cmd_pfx-edit", "-F$file", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t edit OK");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	like($cur->header('Subject'), qr/bool pfx/, "$t message edited");
+	like($out, qr/[a-f0-9]{40}/, "$t shows commit on success");
+}
+
+$t = '-m MESSAGE_ID'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/bool pfx/boolean prefix/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t edit OK");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	like($cur->header('Subject'), qr/boolean prefix/, "$t message edited");
+	like($out, qr/[a-f0-9]{40}/, "$t shows commit on success");
+}
+
+$t = 'no-op -m MESSAGE_ID'; {
+	$in = $out = $err = '';
+	my $before = `git $__git_dir rev-parse HEAD`;
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/bool pfx/boolean prefix/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds");
+	my $prev = $cur;
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	is_deeply($cur, $prev, "$t makes no change");
+	like($cur->header('Subject'), qr/boolean prefix/,
+		"$t does not change message");
+	like($out, qr/NONE/, 'noop shows NONE');
+	my $after = `git $__git_dir rev-parse HEAD`;
+	is($after, $before, 'git head unchanged');
+}
+
+$t = '-m MESSAGE_ID can change Received: headers'; {
+	$in = $out = $err = '';
+	my $before = `git $__git_dir rev-parse HEAD`;
+	local $ENV{MAIL_EDITOR} =
+		"$^X -i -p -e 's/^Subject:.*/Received: x\\n\$&/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	like($cur->header('Subject'), qr/boolean prefix/,
+		"$t does not change Subject");
+	is($cur->header('Received'), 'x', 'added Received header');
+}
+
+$t = '-m miss'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/boolean/FAIL/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid-miss", $mainrepo ];
+	ok(!run($cmd, \$in, \$out, \$err), "$t fails on invalid MID");
+	like($err, qr/No message found/, "$t shows error");
+}
+
+$t = 'non-interactive editor failure'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 'END { exit 1 }'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(!run($cmd, \$in, \$out, \$err), "$t detected");
+	like($err, qr/END \{ exit 1 \}' failed:/, "$t shows error");
+}
+
+$t = 'mailEditor set in config'; {
+	$in = $out = $err = '';
+	my $rc = system(qw(git config), "--file=$cfgfile",
+			'publicinbox.maileditor',
+			"$^X -i -p -e 's/boolean prefix/bool pfx/'");
+	is($rc, 0, 'set publicinbox.mailEditor');
+	local $ENV{MAIL_EDITOR};
+	local $ENV{GIT_EDITOR} = 'echo should not run';
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t edited message");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	like($cur->header('Subject'), qr/bool pfx/, "$t message edited");
+	unlike($out, qr/should not run/, 'did not run GIT_EDITOR');
+}
+
+$t = '--raw and mbox escaping'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^\$/\\nFrom not mbox\\n/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", '--raw', $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	like($cur->body, qr/^From not mbox/sm, 'put "From " line into body');
+
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^>From not/\$& an/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds with mbox escaping");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	like($cur->body, qr/^From not an mbox/sm,
+		'changed "From " line unescaped');
+
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^From not an mbox\\n//s'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", '--raw', $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds again");
+	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	unlike($cur->body, qr/^From not an mbox/sm, "$t restored body");
+}
+
+$t = 'reuse Message-ID'; {
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	ok($im->add($mime), "$t and re-add");
+	$im->done;
+	like($warn[0], qr/reused for mismatched content/, "$t got warning");
+}
+
+$t = 'edit ambiguous Message-ID with -m'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/bool pfx/boolean prefix/'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", $mainrepo ];
+	ok(!run($cmd, \$in, \$out, \$err), "$t fails w/o --force");
+	like($err, qr/Multiple messages with different content found matching/,
+		"$t shows matches");
+	like($err, qr/GIT_DIR=.*git show/is, "$t shows git commands");
+}
+
+$t .= ' and --force'; {
+	$in = $out = $err = '';
+	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^Subject:.*/Subject:x/i'";
+	$cmd = [ "$cmd_pfx-edit", "-m$mid", '--force', $mainrepo ];
+	ok(run($cmd, \$in, \$out, \$err), "$t succeeds");
+	like($err, qr/Will edit all of them/, "$t notes all will be edited");
+	my @dump = `git $__git_dir cat-file --batch --batch-all-objects`;
+	chomp @dump;
+	is_deeply([grep(/^Subject:/i, @dump)], [qw(Subject:x Subject:x)],
+		"$t edited both messages");
+}
+
+done_testing();
-- 
EW
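
When --raw is not given, public-inbox-edit applies mboxrd escaping. That escaping can be shown in isolation with a minimal standalone Perl sketch. This is not code from the patch. The helper names mboxrd_escape() and mboxrd_unescape() are hypothetical, and only the three substitutions/matches are taken from script/public-inbox-edit. The sketch shows how a body line beginning with "From " survives the editing round trip. It also shows how a leftover unescaped "From " line signals that a second message is still in the mbox.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrap a raw message as a one-message mboxrd file.
# Every line matching /^>*From / gains one extra '>', so no body line
# can be mistaken for a message boundary.
sub mboxrd_escape {
	my ($raw, $oid) = @_;
	(my $body = $raw) =~ s/^(>*From )/>$1/gm;
	return "From mboxrd\@$oid Thu Jan  1 00:00:00 1970\n" . $body;
}

# Hypothetical helper: undo mboxrd_escape() after the editor exits.
sub mboxrd_unescape {
	my ($mbox) = @_;
	# drop the leading "From " separator line added above
	$mbox =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
	# a remaining unescaped "From " line means a second message is
	# still in the mbox (e.g. the original was not purged in mutt)
	warn "W: possible message boundary splitting error\n"
		if $mbox =~ /^From /m;
	# remove exactly one '>' from each escaped line
	$mbox =~ s/^>(>*From )/$1/gm;
	return $mbox;
}

my $orig = "Subject: test\n\nFrom here\n>From there\nplain\n";
my $mbox = mboxrd_escape($orig, 'deadbeef');
print $mbox;
print((mboxrd_unescape($mbox) eq $orig) ? "round trip OK\n" : "MISMATCH\n");
```

Escaping adds exactly one '>' and unescaping removes exactly one. As a result, lines that already began with ">From " in the original message are preserved. This reversibility is what distinguishes mboxrd from the older mboxo quoting.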