From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 3/4] v2writable: avoid mm_tmp creation without regen
Date: Thu, 30 May 2019 06:52:26 +0000
Message-ID: <20190530065227.17641-4-e@80x24.org>
In-Reply-To: <20190530065227.17641-1-e@80x24.org>

Creating mm_tmp (a temporary clone of the {mm} SQLite DB) is
expensive with large inboxes and can be avoided if there are
no new messages to process.

Since git-fetch(1) currently lacks an --exit-code option(*),
mirrors will run `public-inbox-index' unconditionally after
fetch, which is expensive if it needs to duplicate a large
SQLite DB, even when the fetch brought in nothing new.

This speeds up the mirror case of:

	git --git-dir=git/$EPOCH.git fetch && public-inbox-index

On my system, the no-op `public-inbox-index' time drops from
over 8s to ~0.5s on a (currently) 7-epoch clone of
https://lore.kernel.org/lkml/

(*) WIP --exit-code for git-fetch:
    https://public-inbox.org/git/87ftphw7mv.fsf@evledraar.gmail.com/
---
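Not part of this patch: until git-fetch grows something like
--exit-code, a mirror could compare ref state around the fetch and
skip `public-inbox-index' when nothing moved.  A minimal sketch in
POSIX sh; `ref_state' and `fetch_and_maybe_index' are hypothetical
helper names (not part of git or public-inbox), and it assumes every
ref that matters lives in the given git dir:

```shell
#!/bin/sh
# Hypothetical helper: print one "<oid> <refname>" line per ref in a git dir
ref_state() {
	git --git-dir="$1" for-each-ref --format='%(objectname) %(refname)'
}

# Hypothetical helper: fetch into a git dir, then run the remaining
# arguments as a command only if any ref changed during the fetch.
# Plain POSIX sh has no `local', so these variables are global.
fetch_and_maybe_index() {
	git_dir=$1
	shift
	before=$(ref_state "$git_dir") || return 1
	git --git-dir="$git_dir" fetch || return 1
	after=$(ref_state "$git_dir") || return 1
	if test "$before" != "$after"
	then
		"$@"
	fi
}
```

Usage would follow the command above:

	fetch_and_maybe_index git/$EPOCH.git public-inbox-index

With this patch applied, the unconditional form costs about half a
second on the clone measured above, so the extra check is optional.
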
 lib/PublicInbox/V2Writable.pm | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 375f12f..fd93ac2 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -900,6 +900,9 @@ sub sync_prepare ($$$) {
 		$pr->("$n\n") if $pr;
 		$regen_max += $n;
 	}
+
+	return 0 if (!$regen_max && !keys(%{$self->{unindex_range}}));
+
 	# reindex should NOT see new commits anymore, if we do,
 	# it's a problem and we need to notice it via die()
 	my $pad = length($regen_max) + 1;
@@ -1027,7 +1030,6 @@ sub index_sync {
 	return unless defined $latest;
 	$self->idx_init($opt); # acquire lock
 	my $sync = {
-		mm_tmp => $self->{mm}->tmp_clone,
 		D => {}, # "$mid\0$cid" => $oid
 		unindex_range => {}, # EPOCH => oid_old..oid_new
 		reindex => $opt->{reindex},
@@ -1036,6 +1038,16 @@ sub index_sync {
 	$sync->{ranges} = sync_ranges($self, $sync, $epoch_max);
 	$sync->{regen} = sync_prepare($self, $sync, $epoch_max);
 
+	if ($sync->{regen}) {
+		# tmp_clone seems to fail if inside a transaction, so
+		# we rollback here (because we opened {mm} for reading)
+		# Note: we do NOT rely on DBI transactions for atomicity;
+		# only for batch performance.
+		$self->{mm}->{dbh}->rollback;
+		$self->{mm}->{dbh}->begin_work;
+		$sync->{mm_tmp} = $self->{mm}->tmp_clone;
+	}
+
 	# work backwards through history
 	for (my $i = $epoch_max; $i >= 0; $i--) {
 		index_epoch($self, $sync, $i);
@@ -1049,8 +1061,10 @@ sub index_sync {
 		$git->cleanup;
 	}
 	$self->done;
-	if (my $pr = $sync->{-opt}->{-progress}) {
-		$pr->('all.git '.sprintf($sync->{-regen_fmt}, $sync->{nr}));
+
+	if (my $nr = $sync->{nr}) {
+		my $pr = $sync->{-opt}->{-progress};
+		$pr->('all.git '.sprintf($sync->{-regen_fmt}, $nr)) if $pr;
 	}
 
 	# reindex does not pick up new changes, so we rerun w/o it:
-- 
EW


Thread overview: 5+ messages
2019-05-30  6:52 [PATCH 0/4] v2writable: speedup no-op -index invocation Eric Wong
2019-05-30  6:52 ` [PATCH 1/4] v2writable: split off unindex_range mapping Eric Wong
2019-05-30  6:52 ` [PATCH 2/4] v2writable: hoist out index_epoch sub Eric Wong
2019-05-30  6:52 ` Eric Wong [this message]
2019-05-30  6:52 ` [PATCH 4/4] v2writable: short-circuit is_ancestor check on equality Eric Wong
