user/dev discussion of public-inbox itself
From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 11/13] v2writable: clarify header cleanups
Date: Thu, 22 Mar 2018 09:40:13 +0000
Message-ID: <20180322094015.14422-12-e@80x24.org>
In-Reply-To: <20180322094015.14422-1-e@80x24.org>

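Make it clear, both in the code and to DEBUG_DIFF users, that
we do not introduce messages with unsuitable headers into
public archives.  The header stripping in Import::add moves
into drop_unwanted_headers() so V2Writable::diff can apply it
to the new message before diffing.  V2Writable::remove also
drops the user-supplied message once its Message-IDs have
been extracted.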
---
 lib/PublicInbox/Import.pm     | 12 +++++++++---
 lib/PublicInbox/V2Writable.pm |  7 +++++++
 2 files changed, 16 insertions(+), 3 deletions(-)
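
Not part of the patch: a minimal standalone sketch of the header
cleanup that drop_unwanted_headers performs.  It relies on
header_set being called with no values, which removes the named
header entirely.  @bad_headers and its single entry are hypothetical
stand-ins for @PublicInbox::MDA::BAD_HEADERS, whose real contents
are not shown in this patch; the sample message is made up.

```perl
use strict;
use warnings;
use Email::MIME;

# Hypothetical stand-in for @PublicInbox::MDA::BAD_HEADERS; the real
# list lives in PublicInbox::MDA and is not shown in this patch.
my @bad_headers = qw(x-example-unwanted);

# Same shape as the helper in the patch: header_set with no values
# removes every instance of the named header (names are matched
# case-insensitively).
sub drop_unwanted_headers {
	my ($mime) = @_;
	$mime->header_set($_) for qw(bytes lines content-length status);
	$mime->header_set($_) for @bad_headers;
}

# Made-up message carrying headers an MUA or mbox might leave behind
my $mime = Email::MIME->new(<<'EOF');
From: a@example.com
Subject: hello
Status: RO
Content-Length: 3
Lines: 1
X-Example-Unwanted: 1

hi
EOF

drop_unwanted_headers($mime);

# Print what is left of the header block
print $mime->header_obj->as_string;
```

With this input, only the From: and Subject: headers should remain
in the printed header block; the body is untouched.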

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index d69934b..5d116a1 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -288,6 +288,14 @@ sub extract_author_info ($) {
 	($name, $email);
 }
 
+# kill potentially confusing/misleading headers
+sub drop_unwanted_headers ($) {
+	my ($mime) = @_;
+
+	$mime->header_set($_) for qw(bytes lines content-length status);
+	$mime->header_set($_) for @PublicInbox::MDA::BAD_HEADERS;
+}
+
 # returns undef on duplicate
 # returns the :MARK of the most recent commit
 sub add {
@@ -321,9 +329,7 @@ sub add {
 		_check_path($r, $w, $tip, $path) and return;
 	}
 
-	# kill potentially confusing/misleading headers
-	$mime->header_set($_) for qw(bytes lines content-length status);
-	$mime->header_set($_) for @PublicInbox::MDA::BAD_HEADERS;
+	drop_unwanted_headers($mime);
 
 	# spam check:
 	if ($check_cb) {
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 605f688..44b5528 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -223,6 +223,12 @@ sub remove {
 	my $mm = $skel->{mm};
 	my $removed;
 	my $mids = mids($mime->header_obj);
+
+	# We avoid introducing new blobs into git since the raw content
+	# can be slightly different, so we do not need the user-supplied
+	# message now that we have the mids and content_id
+	$mime = undef;
+
 	foreach my $mid (@$mids) {
 		$srch->reopen->each_smsg_by_mid($mid, sub {
 			my ($smsg) = @_;
@@ -430,6 +436,7 @@ sub diff ($$$) {
 	print $ah $cur->as_string or die "print: $!";
 	close $ah or die "close: $!";
 	my ($bh, $bn) = tempfile('email-new-XXXXXXXX');
+	PublicInbox::Import::drop_unwanted_headers($new);
 	print $bh $new->as_string or die "print: $!";
 	close $bh or die "close: $!";
 	my $cmd = [ qw(diff -u), $an, $bn ];
-- 
EW


Thread overview: 15+ messages
2018-03-22  9:40 [PATCH 00/13] reindexing, feeds, date fixes Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 01/13] content_id: do not take Message-Id into account Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 02/13] introduce InboxWritable class Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 03/13] import: discard all the same headers as MDA Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 04/13] InboxWritable: add mbox/maildir parsing + import logic Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 05/13] use both Date: and Received: times Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 06/13] msgmap: add tmp_clone to create an anonymous copy Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 07/13] fix syntax warnings Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 08/13] v2writable: support reindexing Xapian Eric Wong (Contractor, The Linux Foundation)
2018-03-26 20:08   ` Eric Wong
2018-03-22  9:40 ` [PATCH 09/13] t/altid.t: extra tests for mid_set Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 10/13] v2writable: add NNTP article number regeneration support Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 11/13] v2writable: clarify header cleanups Eric Wong (Contractor, The Linux Foundation) [this message]
2018-03-22  9:40 ` [PATCH 12/13] v2writable: DEBUG_DIFF respects $TMPDIR Eric Wong (Contractor, The Linux Foundation)
2018-03-22  9:40 ` [PATCH 13/13] feed: $INBOX/new.atom endpoint supports v2 inboxes Eric Wong (Contractor, The Linux Foundation)
