user/dev discussion of public-inbox itself
* [PATCH v2] codesearch: use retry_reopen for WWW
  2023-11-30 11:41  7% ` [PATCH 15/15] codesearch: use retry_reopen for WWW Eric Wong
@ 2023-11-30 21:40  7%   ` Eric Wong
  0 siblings, 0 replies; 3+ results
From: Eric Wong @ 2023-11-30 21:40 UTC (permalink / raw)
  To: meta

As with mail search, a cindex may be updated while WWW is
serving requests.  Thus we must reopen the Xapian DB when
the revision we're using becomes stale.
---
v2: avoid reintroducing load_ct as noted in
  https://public-inbox.org/meta/20231130213641.M35664@dcvr/
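
For readers unfamiliar with the pattern, here is a minimal,
self-contained sketch of reopen-and-retry.  FakeDB, read_ct,
writer_commit and retry_reopen_sketch are hypothetical names used
only for illustration; this is neither the retry_reopen
implementation in public-inbox nor the Xapian API, and the
string-matched error merely stands in for however a stale
revision is actually reported.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a read-only DB handle pinned to a revision.
package FakeDB {
	sub new {
		my ($class) = @_;
		return bless { rev => 1, opened => 1 }, $class;
	}
	sub writer_commit { $_[0]->{rev}++ } # simulates a concurrent index update
	sub reopen { $_[0]->{opened} = $_[0]->{rev} }
	sub read_ct {
		my ($self) = @_;
		# a reader still on an old revision fails, as a stale handle might
		die "DatabaseModifiedError\n" if $self->{opened} != $self->{rev};
		return 1701380400; # made-up commit time
	}
}

# Run $cb->($db, @arg); on a stale-revision error, reopen and try again.
sub retry_reopen_sketch {
	my ($db, $cb, @arg) = @_;
	for my $try (1..10) {
		my @ret = eval { $cb->($db, @arg) };
		return wantarray ? @ret : $ret[0] unless $@;
		die $@ if $@ !~ /\ADatabaseModifiedError/; # unrelated error: rethrow
		$db->reopen;
	}
	die "too many reopen retries\n";
}

my $db = FakeDB->new;
$db->writer_commit; # the index changes after the reader opened it
my $ct = retry_reopen_sketch($db, sub { $_[0]->read_ct });
print "commit time: $ct\n";
```

The callback only reads, so running it a second time after the
reopen is harmless; that property is what makes a blind retry
safe in this sketch.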

 lib/PublicInbox/CodeSearch.pm | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/CodeSearch.pm b/lib/PublicInbox/CodeSearch.pm
index f4694686..3092718d 100644
--- a/lib/PublicInbox/CodeSearch.pm
+++ b/lib/PublicInbox/CodeSearch.pm
@@ -258,15 +258,11 @@ sub load_ct { # retry_reopen cb
 
 sub load_commit_times { # each_cindex callback
 	my ($self, $todo) = @_; # todo = [ [ time, git ], [ time, git ] ...]
-	my (@pending, $rec, $dir, @ids, $doc);
+	my (@pending, $rec, $ct);
 	while ($rec = shift @$todo) {
-		@ids = docids_of_git_dir $self, $rec->[1]->{git_dir};
-		if (@ids) {
-			for (@ids) {
-				$doc = $self->get_doc($_) // next;
-				$rec->[0] = int_val($doc, CT);
-				last;
-			}
+		$ct = $self->retry_reopen(\&load_ct, $rec->[1]->{git_dir});
+		if (defined $ct) {
+			$rec->[0] = $ct;
 		} else { # may be in another cindex:
 			push @pending, $rec;
 		}
@@ -295,7 +291,7 @@ EOM
 			$git;
 		};
 	}
-	my $jd = join_data($self) or return warn <<EOM;
+	my $jd = $self->retry_reopen(\&join_data, $self) or return warn <<EOM;
 W: cindex.$name.topdir=$self->{topdir} has no usable join data for $cfg_f
 EOM
 	my ($ekeys, $roots, $ibx2root) = @$jd{qw(ekeys roots ibx2root)};
@@ -366,7 +362,7 @@ sub repos_sorted {
 	my @recs = map { [ 0, $_ ] } @_; # PublicInbox::Git objects
 	my @todo = @recs;
 	$pi_cfg->each_cindex(\&load_commit_times, \@todo);
-	@recs = sort { $b->[0] <=> $a->[0] } @recs;
+	@recs = sort { $b->[0] <=> $a->[0] } @recs; # sort by commit time
 }
 
 1;


* [PATCH 15/15] codesearch: use retry_reopen for WWW
  2023-11-30 11:40  6% [PATCH 00/15] various cindex fixes + speedups Eric Wong
@ 2023-11-30 11:41  7% ` Eric Wong
  2023-11-30 21:40  7%   ` [PATCH v2] " Eric Wong
  0 siblings, 1 reply; 3+ results
From: Eric Wong @ 2023-11-30 11:41 UTC (permalink / raw)
  To: meta

As with mail search, a cindex may be updated while WWW is
serving requests.  Thus we must reopen the Xapian DB when
the revision we're using becomes stale.
---
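A note on the todo/pending flow visible in the diff below: each
record is a [ time, git ] pair, and a record that one cindex
cannot resolve is set aside since it "may be in another cindex".
The sketch below imitates that flow with plain hashes.  The
@cindexes data, the paths, load_times_sketch and the final
"@$todo = @pending" hand-off are assumptions for illustration;
the hand-off in particular lies outside the hunk shown, so it may
differ from the real code.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: each "cindex" is a hash of git_dir => commit time.
my @cindexes = (
	{ '/srv/a.git' => 300 },
	{ '/srv/b.git' => 100, '/srv/c.git' => 200 },
);

# Resolve what this cindex knows; leave the rest in @$todo for the next one.
sub load_times_sketch {
	my ($cidx, $todo) = @_; # todo = [ [ time, git_dir ], ... ]
	my @pending;
	while (my $rec = shift @$todo) {
		my $ct = $cidx->{$rec->[1]};
		if (defined $ct) {
			$rec->[0] = $ct;
		} else { # may be in another cindex
			push @pending, $rec;
		}
	}
	@$todo = @pending; # assumption: unresolved records carry over
}

my @recs = map { [ 0, $_ ] } qw(/srv/b.git /srv/a.git /srv/c.git /srv/d.git);
my @todo = @recs; # same array refs as @recs, so updates are shared
load_times_sketch($_, \@todo) for @cindexes;
@recs = sort { $b->[0] <=> $a->[0] } @recs; # newest commit time first
print join(' ', map { "$_->[1]=$_->[0]" } @recs), "\n";
```

Because @todo and @recs hold the same array references, filling
in $rec->[0] through one list updates the other; records never
resolved keep time 0 and sort last.
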
 lib/PublicInbox/CodeSearch.pm | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/CodeSearch.pm b/lib/PublicInbox/CodeSearch.pm
index f4694686..a2f4bae8 100644
--- a/lib/PublicInbox/CodeSearch.pm
+++ b/lib/PublicInbox/CodeSearch.pm
@@ -256,17 +256,22 @@ sub load_ct { # retry_reopen cb
 	}
 }
 
+sub load_ct { # retry_reopen cb
+	my ($self, $git_dir) = @_;
+	my @ids = docids_of_git_dir $self, $git_dir or return;
+	for (@ids) {
+		my $doc = $self->get_doc($_) // next;
+		return int_val($doc, CT);
+	}
+}
+
 sub load_commit_times { # each_cindex callback
 	my ($self, $todo) = @_; # todo = [ [ time, git ], [ time, git ] ...]
-	my (@pending, $rec, $dir, @ids, $doc);
+	my (@pending, $rec, $ct);
 	while ($rec = shift @$todo) {
-		@ids = docids_of_git_dir $self, $rec->[1]->{git_dir};
-		if (@ids) {
-			for (@ids) {
-				$doc = $self->get_doc($_) // next;
-				$rec->[0] = int_val($doc, CT);
-				last;
-			}
+		$ct = $self->retry_reopen(\&load_ct, $rec->[1]->{git_dir});
+		if (defined $ct) {
+			$rec->[0] = $ct;
 		} else { # may be in another cindex:
 			push @pending, $rec;
 		}
@@ -295,7 +300,7 @@ EOM
 			$git;
 		};
 	}
-	my $jd = join_data($self) or return warn <<EOM;
+	my $jd = $self->retry_reopen(\&join_data, $self) or return warn <<EOM;
 W: cindex.$name.topdir=$self->{topdir} has no usable join data for $cfg_f
 EOM
 	my ($ekeys, $roots, $ibx2root) = @$jd{qw(ekeys roots ibx2root)};
@@ -366,7 +371,7 @@ sub repos_sorted {
 	my @recs = map { [ 0, $_ ] } @_; # PublicInbox::Git objects
 	my @todo = @recs;
 	$pi_cfg->each_cindex(\&load_commit_times, \@todo);
-	@recs = sort { $b->[0] <=> $a->[0] } @recs;
+	@recs = sort { $b->[0] <=> $a->[0] } @recs; # sort by commit time
 }
 
 1;


* [PATCH 00/15] various cindex fixes + speedups
@ 2023-11-30 11:40  6% Eric Wong
  2023-11-30 11:41  7% ` [PATCH 15/15] codesearch: use retry_reopen for WWW Eric Wong
  0 siblings, 1 reply; 3+ results
From: Eric Wong @ 2023-11-30 11:40 UTC (permalink / raw)
  To: meta

Notable changes:

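10/15 (cindex: speed up initial scan setup phase) provides a huge
speedup which will hopefully make future development faster.

12/15 (git_async_cat: use git from "all" extindex if possible)
probably obsoletes libgit2 for extindex "all" users.

13/15 (www_listing: support publicInbox.nameIsUrl) can save some
memory with many inboxes while making configuration easier.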

Eric Wong (15):
  cindex: fix store_repo+repo_stored on no-op
  codesearch: allow inbox count to exceed matches
  config: reject newlines consistently in dir names
  cindex: only create {-cidx_err} field on failures
  cindex: keep batch pipe for pruning SHA-256 repos
  cindex: store extensions.objectFormat with repo data
  git: share unlinked pack checking code with gcf2
  cindex: skip getpid guard for most OnDestroy use
  spawn: drop IO layer support from redirects
  cindex: speed up initial scan setup phase
  inbox: expire resources more aggressively
  git_async_cat: use git from "all" extindex if possible
  www_listing: support publicInbox.nameIsUrl
  inbox: shrink data structures for publicinbox.*.hide
  codesearch: use retry_reopen for WWW

 Documentation/public-inbox-config.pod |  19 +-
 lib/PublicInbox/CodeSearch.pm         |  54 +++--
 lib/PublicInbox/CodeSearchIdx.pm      | 286 ++++++++++++++++----------
 lib/PublicInbox/Config.pm             |  32 ++-
 lib/PublicInbox/Gcf2.pm               |  16 +-
 lib/PublicInbox/Git.pm                |  27 +--
 lib/PublicInbox/GitAsyncCat.pm        |   8 +-
 lib/PublicInbox/Inbox.pm              |  32 +--
 lib/PublicInbox/MailDiff.pm           |   3 +-
 lib/PublicInbox/SearchIdx.pm          |   5 +-
 lib/PublicInbox/Spawn.pm              |  32 +--
 lib/PublicInbox/WwwListing.pm         |  21 +-
 12 files changed, 303 insertions(+), 232 deletions(-)




Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
