user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 3/5] searchidx: sync Msgmap database along with Xapian
Date: Tue, 15 Sep 2015 01:08:02 +0000
Message-ID: <20150915010804.20084-4-e@80x24.org>
In-Reply-To: <20150915010804.20084-1-e@80x24.org>

We can avoid duplicating the work of extracting messages from git if
we update Msgmap in the same pass that updates Xapian.  Of course,
this ties the two features together, but it's probably reasonable to
expect that anybody who wants to use public-inbox to serve messages
to front-end users will have both.
---
 lib/PublicInbox/SearchIdx.pm | 84 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 66 insertions(+), 18 deletions(-)
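
The idea in the commit message (read each message out of git once,
then hand the parsed result to every indexer) can be sketched roughly
as below.  This is only an illustration: fake_cat_mail, fake_index_blob,
fake_index_mm, fake_index_both and walk are hypothetical stand-ins for
do_cat_mail, index_blob, index_mm, index_both and rlog, and plain
arrays stand in for the Xapian and SQLite stores.

```perl
use strict;
use warnings;

# Hypothetical stand-in for do_cat_mail(): counts how often a blob is read.
my $extractions = 0;
sub fake_cat_mail {
	my ($blob) = @_;
	$extractions++;
	return { blob => $blob, mid => "<$blob\@example.com>" };
}

my (@xapian, @msgmap); # arrays standing in for the two indexes

sub fake_index_blob { my ($msg) = @_; push @xapian, $msg->{mid} }
sub fake_index_mm   { my ($msg) = @_; push @msgmap, $msg->{mid} }

# Same shape as index_both in the patch: one message, two indexers.
sub fake_index_both {
	my ($msg) = @_;
	fake_index_blob($msg);
	fake_index_mm($msg);
}

# The walker extracts each message once, then hands it to the callback.
sub walk {
	my ($blobs, $add_cb) = @_;
	for my $blob (@$blobs) {
		my $msg = fake_cat_mail($blob) or next;
		$add_cb->($msg);
	}
}

walk([qw(a1 b2 c3)], \&fake_index_both);
printf "extractions=%d xapian=%d msgmap=%d\n",
	$extractions, scalar(@xapian), scalar(@msgmap);
```

With three blobs, each is extracted once and lands in both stores,
where two independent walks would have extracted each blob twice.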

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 44f6bc1..351450c 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -247,18 +247,36 @@ sub link_message_to_parents {
 }
 
 sub index_blob {
-	my ($self, $git, $blob) = @_;
-	my $mime = do_cat_mail($git, $blob) or return;
-	eval { $self->add_message($mime) };
-	warn "W: index_blob $blob: $@\n" if $@;
+	my ($self, $git, $mime) = @_;
+	$self->add_message($mime);
 }
 
 sub unindex_blob {
-	my ($self, $git, $blob) = @_;
-	my $mime = do_cat_mail($git, $blob) or return;
-	my $mid = $mime->header('Message-ID');
-	eval { $self->remove_message($mid) } if defined $mid;
-	warn "W: unindex_blob $blob: $@\n" if $@;
+	my ($self, $git, $mime) = @_;
+	my $mid = mid_clean($mime->header('Message-ID'));
+	$self->remove_message($mid) if defined $mid;
+}
+
+sub index_mm {
+	my ($self, $git, $mime) = @_;
+	$self->{mm}->mid_insert(mid_clean($mime->header('Message-ID')));
+}
+
+sub unindex_mm {
+	my ($self, $git, $mime) = @_;
+	$self->{mm}->mid_delete(mid_clean($mime->header('Message-ID')));
+}
+
+sub index_both {
+	my ($self, $git, $mime) = @_;
+	index_blob($self, $git, $mime);
+	index_mm($self, $git, $mime);
+}
+
+sub unindex_both {
+	my ($self, $git, $mime) = @_;
+	unindex_blob($self, $git, $mime);
+	unindex_mm($self, $git, $mime);
 }
 
 sub do_cat_mail {
@@ -292,9 +310,11 @@ sub rlog {
 		die('open` '.join(' ', @cmd) . " pipe failed: $!\n");
 	while (my $line = <$log>) {
 		if ($line =~ /$addmsg/o) {
-			$add_cb->($self, $git, $1);
+			my $mime = do_cat_mail($git, $1) or next;
+			$add_cb->($self, $git, $mime);
 		} elsif ($line =~ /$delmsg/o) {
-			$del_cb->($self, $git, $1);
+			my $mime = do_cat_mail($git, $1) or next;
+			$del_cb->($self, $git, $mime);
 		} elsif ($line =~ /^commit ($h40)/o) {
 			$latest = $1;
 		}
@@ -308,17 +328,45 @@ sub _index_sync {
 	my ($self, $head) = @_;
 	my $db = $self->{xdb};
 	$head ||= 'HEAD';
+	my $mm = $self->{mm} = eval {
+		require PublicInbox::Msgmap;
+		PublicInbox::Msgmap->new($self->{git_dir}, 1);
+	};
 
 	$db->begin_transaction;
-	eval {
-		my $latest = $db->get_metadata('last_commit');
-		my $range = $latest eq '' ? $head : "$latest..$head";
-		$latest = $self->rlog($range, *index_blob, *unindex_blob);
-		$db->set_metadata('last_commit', $latest) if defined $latest;
-	};
+	my $lx = $db->get_metadata('last_commit');
+	my $range = $lx eq '' ? $head : "$lx..$head";
+	if ($mm) {
+		$mm->{dbh}->begin_work;
+		my $lm = $mm->last_commit || '';
+		if ($lm eq $lx) {
+			# Common case is the indexes are synced,
+			# we only need to run git-log once:
+			$lx = $self->rlog($range, *index_both, *unindex_both);
+			$mm->{dbh}->commit;
+			if (defined $lx) {
+				$db->set_metadata('last_commit', $lx);
+				$mm->last_commit($lx);
+			}
+		} else {
+			# dumb case, msgmap and xapian are out-of-sync
+			# do not care for performance:
+			my $r = $lm eq '' ? $head : "$lm..$head";
+			$lm = $self->rlog($r, *index_mm, *unindex_mm);
+			$mm->{dbh}->commit;
+			$mm->last_commit($lm) if defined $lm;
+
+			goto xapian_only;
+		}
+	} else {
+		# user didn't install DBD::SQLite and DBI
+xapian_only:
+		$lx = $self->rlog($range, *index_blob, *unindex_blob);
+		$db->set_metadata('last_commit', $lx) if defined $lx;
+	}
 	if ($@) {
-		warn "indexing failed: $@\n";
 		$db->cancel_transaction;
+		$mm->{dbh}->rollback if $mm;
 	} else {
 		$db->commit_transaction;
 	}
-- 
EW
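
The branch structure of _index_sync above boils down to choosing which
git-log passes to run, based on the last commit recorded by each index.
A minimal sketch of that decision follows.  plan_sync is a hypothetical
helper that does not exist in the patch; it touches neither git, Xapian
nor SQLite and only returns the ranges and targets.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the patch: returns the git-log passes
# _index_sync would run, as a list of [ range, target ] pairs.
#   $lx      - last commit recorded in Xapian ('' if none)
#   $lm      - last commit recorded in Msgmap ('' or undef if none)
#   $have_mm - false when Msgmap could not be loaded
sub plan_sync {
	my ($lx, $lm, $head, $have_mm) = @_;
	$head ||= 'HEAD';
	my $range = $lx eq '' ? $head : "$lx..$head";

	# No Msgmap available: Xapian is the only index to update.
	return ([ $range, 'xapian' ]) unless $have_mm;

	$lm = '' unless defined $lm;
	# Common case: both indexes stopped at the same commit, one pass.
	return ([ $range, 'both' ]) if $lm eq $lx;

	# Out of sync: bring Msgmap up to date first, then Xapian.
	my $r = $lm eq '' ? $head : "$lm..$head";
	return ([ $r, 'msgmap' ], [ $range, 'xapian' ]);
}

for my $case (
	[ 'abc', 'abc', 'HEAD', 1 ],
	[ 'abc', '',    'HEAD', 1 ],
	[ '',    undef, 'HEAD', 0 ],
) {
	my @plan = plan_sync(@$case);
	print join(', ', map { "$_->[1]: $_->[0]" } @plan), "\n";
}
```

The three cases correspond to the synced path (a single combined
pass), the out-of-sync path (a Msgmap pass followed by a Xapian pass),
and the Xapian-only path taken when Msgmap cannot be loaded.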


Thread overview: 6+ messages
2015-09-15  1:07 [PATCH 0/5] introduce SQLite message map Eric Wong
2015-09-15  1:08 ` [PATCH 1/5] msgmap: add message mapping via SQLite Eric Wong
2015-09-15  1:08 ` [PATCH 2/5] searchidx: hoist out rlog code Eric Wong
2015-09-15  1:08 ` [PATCH 3/5] searchidx: sync Msgmap database along with Xapian Eric Wong [this message]
2015-09-15  1:08 ` [PATCH 4/5] extmsg: wire up to use msgmap for prefixes Eric Wong
2015-09-15  1:08 ` [PATCH 5/5] INSTALL: document DBD::SQLite and DBI dependencies Eric Wong
