user/dev discussion of public-inbox itself
* [PATCH 2/3] searchidx: switch to accounting by message bytes
  2017-06-14  0:14  5% [PATCH 0/3] search improvements Eric Wong
@ 2017-06-14  0:14  7% ` Eric Wong
  0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2017-06-14  0:14 UTC (permalink / raw)
  To: meta

Xapian memory usage is tied to the size of the indexed
text, so take the raw message size into account when
deciding when to flush Xapian data.

More importantly, we now flush Xapian before it buffers
beyond our maximum, and we do so unconditionally to
prevent even high-priority processes from OOM-ing.
---
 lib/PublicInbox/SearchIdx.pm | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)
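
A minimal standalone sketch of the byte-budget accounting described in the
commit message, for readers who want to try the idea in isolation. It is not
the code from SearchIdx.pm: the flush callback here is a simplified stand-in
for the real $batch_cb (which also receives the latest commit), and the
message sizes are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same budget as the patch: flush roughly every 1 MB of raw message data.
use constant BATCH_BYTES => 1_000_000;

# Subtract $bytes from the remaining budget in $$max. When the budget is
# used up, reset it and invoke the flush callback (hypothetical stand-in
# for committing the Xapian transaction).
sub batch_adjust {
    my ($max, $bytes, $flush_cb) = @_;
    $$max -= $bytes;
    if ($$max <= 0) {
        $$max = BATCH_BYTES;
        $flush_cb->();
    }
}

my $max = BATCH_BYTES;
my $flushes = 0;

# Hypothetical raw message sizes in bytes, including one oversized message.
my @sizes = (400_000, 400_000, 400_000, 2_000_000, 10);

for my $bytes (@sizes) {
    # As in the patch, adjust the budget before the message would be
    # indexed, so the flush happens ahead of the add/delete callback.
    batch_adjust(\$max, $bytes, sub { $flushes++ });
}

print "flushes=$flushes remaining=$max\n";
```

With these sizes the third message exhausts the first budget and the single
2 MB message exhausts the next one on its own, so a run of small messages and
one large message each trigger a flush. A per-commit counter would treat both
cases the same regardless of size.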

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 316111b..30d3fe9 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -20,13 +20,14 @@ use Carp qw(croak);
 use POSIX qw(strftime);
 require PublicInbox::Git;
 
-use constant MAX_MID_SIZE => 244; # max term size - 1 in Xapian
 use constant {
+	MAX_MID_SIZE => 244, # max term size - 1 in Xapian
 	PERM_UMASK => 0,
 	OLD_PERM_GROUP => 1,
 	OLD_PERM_EVERYBODY => 2,
 	PERM_GROUP => 0660,
 	PERM_EVERYBODY => 0664,
+	BATCH_BYTES => 1_000_000,
 };
 
 sub new {
@@ -71,7 +72,6 @@ sub _xdb_acquire {
 		require File::Path;
 		_lock_acquire($self);
 		File::Path::mkpath($dir);
-		$self->{batch_size} = 100;
 		$flag = Search::Xapian::DB_CREATE_OR_OPEN;
 	}
 	$self->{xdb} = Search::Xapian::WritableDatabase->new($dir, $flag);
@@ -395,6 +395,15 @@ sub index_sync {
 	with_umask($self, sub { $self->_index_sync($opts) });
 }
 
+sub batch_adjust ($$$$) {
+	my ($max, $bytes, $batch_cb, $latest) = @_;
+	$$max -= $bytes;
+	if ($$max <= 0) {
+		$$max = BATCH_BYTES;
+		$batch_cb->($latest, 1);
+	}
+}
+
 sub rlog {
 	my ($self, $log, $add_cb, $del_cb, $batch_cb) = @_;
 	my $hex = '[a-f0-9]';
@@ -404,23 +413,21 @@ sub rlog {
 	my $git = $self->{git};
 	my $latest;
 	my $bytes;
-	my $max = $self->{batch_size}; # may be undef
+	my $max = BATCH_BYTES;
 	local $/ = "\n";
 	my $line;
 	while (defined($line = <$log>)) {
 		if ($line =~ /$addmsg/o) {
 			my $blob = $1;
 			my $mime = do_cat_mail($git, $blob, \$bytes) or next;
+			batch_adjust(\$max, $bytes, $batch_cb, $latest);
 			$add_cb->($self, $mime, $bytes, $blob);
 		} elsif ($line =~ /$delmsg/o) {
 			my $blob = $1;
-			my $mime = do_cat_mail($git, $blob) or next;
+			my $mime = do_cat_mail($git, $blob, \$bytes) or next;
+			batch_adjust(\$max, $bytes, $batch_cb, $latest);
 			$del_cb->($self, $mime);
 		} elsif ($line =~ /^commit ($h40)/o) {
-			if (defined $max && --$max <= 0) {
-				$max = $self->{batch_size};
-				$batch_cb->($latest, 1);
-			}
 			$latest = $1;
 		}
 	}
-- 
EW



* [PATCH 0/3] search improvements
@ 2017-06-14  0:14  5% Eric Wong
  2017-06-14  0:14  7% ` [PATCH 2/3] searchidx: switch to accounting by message bytes Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2017-06-14  0:14 UTC (permalink / raw)
  To: meta

These have been sitting in the stalled "repobrowse" branch for
a bit.  I think they can go into "master" first, since I'm
leaning towards splitting repobrowse into a separate project
at the moment.

Eric Wong (3):
      search: remove unnecessary abstractions and functionality
      searchidx: switch to accounting by message bytes
      search: allow searching within mail diffs

 lib/PublicInbox/Search.pm    |  51 ++++++------
 lib/PublicInbox/SearchIdx.pm | 180 +++++++++++++++++++++++++++++++++++++------
 lib/PublicInbox/SearchMsg.pm |   2 +-
 t/search.t                   |   9 +--
 4 files changed, 183 insertions(+), 59 deletions(-)




Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
