user/dev discussion of public-inbox itself
From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Cc: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
Subject: [PATCH 06/14] public-inbox-convert: tool for converting old to new inboxes
Date: Thu, 29 Mar 2018 10:28:11 +0000
Message-ID: <20180329102819.15234-7-e@80x24.org>
In-Reply-To: <20180329102819.15234-1-e@80x24.org>

This should make it easier for users to compare a v1 inbox with its
v2 counterpart side by side, and to migrate to v2 if needed.
---
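The fast-export rewriting loop in script/public-inbox-convert only
recognizes v1 blob paths made of two hex digits, a slash, then 38 hex
digits.  The sketch below assumes (based on my understanding of the v1
layout, not on anything in this patch) that such a path is the SHA-1
hex digest of the Message-ID, with angle brackets stripped, split 2/38.
The helper name v1_path is hypothetical and not a public-inbox API; only
core Digest::SHA is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: compute the assumed v1 blob path for a Message-ID.
sub v1_path {
	my ($mid) = @_;
	$mid =~ s/\A\s*<//;	# drop a leading '<' if present
	$mid =~ s/>\s*\z//;	# drop a trailing '>' if present
	my $hex = sha1_hex($mid);
	return substr($hex, 0, 2) . '/' . substr($hex, 2);
}

# Both spellings of a Message-ID map to the same path, and that path
# has the shape the converter's regex matches.
for my $mid ('<20180329102819.15234-7-e@80x24.org>',
		'20180329102819.15234-7-e@80x24.org') {
	my $path = v1_path($mid);
	printf "%s => %s (%s)\n", $mid, $path,
		$path =~ m{\A[0-9a-f]{2}/[0-9a-f]{38}\z} ? 'matches' : 'no match';
}
```
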
 Documentation/public-inbox-config.pod  |   2 +-
 Documentation/public-inbox-convert.pod |  45 ++++++++++
 MANIFEST                               |   2 +
 script/public-inbox-convert            | 109 +++++++++++++++++++++++++
 4 files changed, 157 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/public-inbox-convert.pod
 create mode 100755 script/public-inbox-convert
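
The heart of the conversion is a line-by-line rewrite of commit
commands in the "git fast-export" stream: each "M 100644" of a v1 path
becomes a write of the same blob to the single path "m", and each "D"
becomes a write of the last blob recorded for that path to "_/D".  A
simplified sketch of just that step follows.  It assumes the caller
only passes lines from inside a commit (the script tracks this with
$state) and never passes "data" payloads; rewrite_commit_line and
%last_mark are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A v1 blob path: 2 hex digits, '/', 38 hex digits (captured).
my $V1_PATH = qr{([0-9a-f]{2}/[0-9a-f]{38})};

# Hypothetical sketch: rewrite one fast-export commit command from the
# v1 layout to the v2 layout.  %$last_mark maps each v1 path to the
# mark of the blob most recently written there.
sub rewrite_commit_line {
	my ($line, $last_mark) = @_;
	if ($line =~ m{\AM 100644 :(\d+) $V1_PATH}) {
		my ($mark, $path) = ($1, $2);
		$last_mark->{$path} = $mark;	# remember it for a later "D"
		return "M 100644 :$mark m\n";	# write the blob to "m" instead
	}
	if ($line =~ m{\AD $V1_PATH}) {
		my $path = $1;
		my $mark = delete $last_mark->{$path};
		defined $mark or die "undeleted path: $path\n";
		return "M 100644 :$mark _/D\n";	# record the removal in "_/D"
	}
	return $line;				# other commands pass through
}

my %last_mark;
my $path = 'ab/' . ('c' x 38);
print rewrite_commit_line("M 100644 :1 $path\n", \%last_mark); # M 100644 :1 m
print rewrite_commit_line("D $path\n", \%last_mark);           # M 100644 :1 _/D
print rewrite_commit_line("committer A <a\@example.com> 0 +0000\n", \%last_mark);
```

Keeping only the most recent mark per path mirrors the script's %D
hash: a "D" can only be turned into a "_/D" write while the deleted
blob's mark is still known, which is why an unknown path is fatal.
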

diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index 8250b45..22ee909 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -40,7 +40,7 @@ Default: none, required
 
 =item publicinbox.<name>.mainrepo
 
-The absolute path to the git repository which hosts the
+The absolute path to the directory which hosts the
 public-inbox.  This must be specified once.
 
 Default: none, required
diff --git a/Documentation/public-inbox-convert.pod b/Documentation/public-inbox-convert.pod
new file mode 100644
index 0000000..1e16ea4
--- /dev/null
+++ b/Documentation/public-inbox-convert.pod
@@ -0,0 +1,45 @@
+=head1 NAME
+
+public-inbox-convert - convert v1 inboxes to v2
+
+=head1 SYNOPSIS
+
+	public-inbox-convert OLD_DIR NEW_DIR
+
+=head1 DESCRIPTION
+
+public-inbox-convert copies the contents of an old "v1" inbox
+into a new "v2" inbox.  It makes no changes to the old inbox
+and users are expected to update the "mainrepo" path in
+L<public-inbox-config(5)> to point to the path of NEW_DIR
+once they are satisfied with the conversion.
+
+=head1 ENVIRONMENT
+
+=over 8
+
+=item PI_CONFIG
+
+The default config file, normally "~/.public-inbox/config".
+See L<public-inbox-config(5)>.
+
+=back
+
+=head1 UPGRADING
+
+=head1 CONTACT
+
+Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
+
+The mail archives are hosted at L<https://public-inbox.org/meta/>
+and L<http://hjrcffqmbrq6wope.onion/meta/>
+
+=head1 COPYRIGHT
+
+Copyright 2013-2018 all contributors L<mailto:meta@public-inbox.org>
+
+License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
+
+=head1 SEE ALSO
+
+L<public-inbox-init(1)>, L<public-inbox-index(1)>
diff --git a/MANIFEST b/MANIFEST
index 8b2b10b..1e48d3a 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -8,6 +8,7 @@ Documentation/design_www.txt
 Documentation/hosted.txt
 Documentation/include.mk
 Documentation/public-inbox-config.pod
+Documentation/public-inbox-convert.pod
 Documentation/public-inbox-daemon.pod
 Documentation/public-inbox-httpd.pod
 Documentation/public-inbox-index.pod
@@ -109,6 +110,7 @@ sa_config/Makefile
 sa_config/README
 sa_config/root/etc/spamassassin/public-inbox.pre
 sa_config/user/.spamassassin/user_prefs
+script/public-inbox-convert
 script/public-inbox-httpd
 script/public-inbox-index
 script/public-inbox-init
diff --git a/script/public-inbox-convert b/script/public-inbox-convert
new file mode 100755
index 0000000..2b0a385
--- /dev/null
+++ b/script/public-inbox-convert
@@ -0,0 +1,109 @@
+#!/usr/bin/perl -w
+# Copyright (C) 2018 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <http://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
+use PublicInbox::MIME;
+use PublicInbox::Inbox;
+use PublicInbox::Config;
+use PublicInbox::V2Writable;
+use PublicInbox::Spawn qw(spawn);
+use Cwd 'abs_path';
+my $usage = "Usage: public-inbox-convert OLD NEW\n";
+my $jobs;
+my $index = 1;
+my %opts = (
+	'--jobs|j=i' => \$jobs,
+	'--index!' => \$index,
+);
+GetOptions(%opts) or die "bad command-line args\n$usage";
+
+my $old_dir = shift or die $usage;
+my $new_dir = shift or die $usage;
+die "$new_dir exists\n" if -d $new_dir;
+die "$old_dir not a directory\n" unless -d $old_dir;
+my $config = PublicInbox::Config->new;
+$old_dir = abs_path($old_dir);
+my $old;
+$config->each_inbox(sub {
+	$old = $_[0] if abs_path($_[0]->{mainrepo}) eq $old_dir;
+});
+unless ($old) {
+	warn "W: $old_dir not configured in " .
+		PublicInbox::Config::default_file() . "\n";
+	$old = {
+		mainrepo => $old_dir,
+		name => 'ignored',
+		address => [ 'old@example.com' ],
+	};
+	$old = PublicInbox::Inbox->new($old);
+}
+if (($old->{version} || 1) >= 2) {
+	die "Only conversion from v1 inboxes is supported\n";
+}
+my $new = { %$old };
+delete $new->{altid}; # TODO: support altid for v2
+$new->{mainrepo} = $new_dir;
+$new->{version} = 2;
+$new = PublicInbox::Inbox->new($new);
+my $v2w = PublicInbox::V2Writable->new($new, 1);
+$v2w->init_inbox($jobs);
+my $state = '';
+my ($prev, $from);
+my $head = $old->{ref_head} || 'HEAD';
+my ($rd, $pid) = $old->git->popen(qw(fast-export --use-done-feature), $head);
+$v2w->idx_init;
+my $im = $v2w->importer;
+my ($r, $w) = $im->gfi_start;
+my $h = '[0-9a-f]';
+my %D;
+while (<$rd>) {
+	if ($_ eq "blob\n") {
+		$state = 'blob';
+	} elsif (/^commit /) {
+		$state = 'commit';
+	} elsif (/^data (\d+)/) {
+		my $len = $1;
+		$w->print($_) or $im->wfail;
+		while ($len) {
+			my $n = read($rd, my $tmp, $len) or die "read: $!";
+			warn "$n != $len\n" if $n != $len;
+			$len -= $n;
+			$w->print($tmp) or $im->wfail;
+		}
+		next;
+	} elsif ($state eq 'commit') {
+		if (m{^M 100644 :(\d+) (${h}{2}/${h}{38})}o) {
+			my ($mark, $path) = ($1, $2);
+			$D{$path} = $mark;
+			$w->print("M 100644 :$mark m\n") or $im->wfail;
+			next;
+		}
+		if (m{^D (${h}{2}/${h}{38})}o) {
+			my $mark = delete $D{$1};
+			defined $mark or die "undeleted path: $1\n";
+			$w->print("M 100644 :$mark _/D\n") or $im->wfail;
+			next;
+		}
+		if (m{^from (:\d+)}) {
+			$prev = $from;
+			$from = $1;
+			# no next
+		}
+	} elsif ($_ eq "done\n") {
+		last;
+	}
+	$w->print($_) or $im->wfail;
+}
+$w = $r = undef;
+close $rd or die "close fast-export: $!\n";
+waitpid($pid, 0) == $pid or die "waitpid failed: $!\n";
+$? == 0 or die "fast-export failed: $?\n";
+my $mm = $old->mm;
+$mm->{dbh}->sqlite_backup_to_file("$new_dir/msgmap.sqlite3") if $mm;
+$v2w->done;
+if ($index) {
+	$v2w->reindex;
+	$v2w->done;
+}
-- 
EW
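
One detail of the main loop worth spelling out: after a "data <n>"
header, exactly n bytes of payload follow, and read() may return fewer
bytes than requested, so the script keeps reading until the remaining
count reaches zero.  Below is a standalone sketch of that copy using
in-memory filehandles for illustration.  copy_data is a hypothetical
name, and unlike the script (whose "read(...) or die" reports only $!,
which is not set at EOF) it names a truncated payload explicitly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy exactly $len payload bytes from $in to $out,
# looping because read() may return fewer bytes than requested.
sub copy_data {
	my ($in, $out, $len) = @_;
	while ($len > 0) {
		my $n = read($in, my $buf, $len);
		defined $n or die "read: $!\n";
		$n > 0 or die "unexpected EOF, $len bytes short\n";
		print $out $buf or die "write: $!\n";
		$len -= $n;
	}
}

# Pass a tiny fast-export-like stream through unchanged, copying each
# "data" payload verbatim instead of treating it as command lines.
my $stream = "blob\nmark :1\ndata 6\nhello\ndone\n";
open my $in, '<', \$stream or die "open: $!\n";
my $copy = '';
open my $out, '>', \$copy or die "open: $!\n";
while (my $line = <$in>) {
	print $out $line;
	if ($line =~ /\Adata (\d+)/) {
		copy_data($in, $out, $1);
	}
}
close $out;
print $copy;
```
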

