user/dev discussion of public-inbox itself
* [PATCH 09/15] spawn: drop IO layer support from redirects
  2023-11-30 11:40  7% [PATCH 00/15] various cindex fixes + speedups Eric Wong
@ 2023-11-30 11:41  7% ` Eric Wong
  0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2023-11-30 11:41 UTC (permalink / raw)
  To: meta

When setting up stdin for commands, the write_file API is
convenient enough nowadays to not be worth having special
support with process spawning.
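
As a rough illustration of the approach described above, the sketch below
feeds a string to a child's stdin through an anonymous temporary file
using only core Perl. `run_with_stdin` is a hypothetical helper written for
this example; it is not the public-inbox `write_file` or `spawn` API, and
it assumes a Unix-like system with fork/exec.

```perl
use strict;
use warnings;
use IO::Handle; # for $fh->flush
use Fcntl qw(SEEK_SET);
use POSIX ();

# Hypothetical helper (not part of public-inbox): run @$cmd with $input
# on its stdin and return everything the command wrote to stdout.
sub run_with_stdin {
	my ($cmd, $input) = @_;

	# anonymous temp file, opened read/write, no IO layer applied
	open(my $in, '+>', undef) or die "open(tmp): $!";
	print $in $input or die "print: $!";
	$in->flush or die "flush: $!";
	sysseek($in, 0, SEEK_SET) // die "sysseek: $!"; # rewind for the child

	pipe(my $r, my $w) or die "pipe: $!";
	my $pid = fork() // die "fork: $!";
	if ($pid == 0) { # child: wire up stdin/stdout, then exec
		close $r;
		open(STDIN, '<&', $in) or POSIX::_exit(126);
		open(STDOUT, '>&', $w) or POSIX::_exit(126);
		exec { $cmd->[0] } @$cmd;
		POSIX::_exit(127); # only reached if exec failed
	}
	close $w;
	my $out = do { local $/; <$r> }; # slurp raw bytes from the child
	waitpid($pid, 0);
	die "@$cmd failed: \$?=$?\n" if $?;
	$out;
}
```

Because the caller writes, flushes and rewinds the file before the child
starts, the spawning code only needs to accept a plain file handle and
does not have to know about IO layers.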

When reading stdout of commands, we should probably be using
utf8_maybe everywhere since there'll always be legacy encodings
in git repos.
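
The idea of decoding command output only when it is valid UTF-8 can be
sketched with the core `utf8::decode` function. `decode_if_utf8` below is a
hypothetical stand-in written for this example; it is not claimed to match
how `PublicInbox::Hval::utf8_maybe` is actually implemented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a utf8_maybe-style helper: try to decode the
# caller's buffer in place.  utf8::decode returns false and leaves the
# buffer untouched when the bytes are not well-formed UTF-8.
sub decode_if_utf8 {
	utf8::decode($_[0]);
}

my $valid = "caf\xc3\xa9"; # UTF-8 encoded bytes
my $legacy = "caf\xe9";    # Latin-1 bytes, not valid UTF-8

my $valid_ok = decode_if_utf8($valid);   # decoded to characters
my $legacy_ok = decode_if_utf8($legacy); # left as raw bytes
```

Reading raw bytes first and decoding afterwards means output in a legacy
encoding is passed through as-is rather than being handled by a `:utf8`
layer at read time.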

Reading regular files with :utf8 also results in worse memory
management since the file size cannot be used as a hint.
---
 lib/PublicInbox/MailDiff.pm  |  3 ++-
 lib/PublicInbox/SearchIdx.pm |  5 ++++-
 lib/PublicInbox/Spawn.pm     | 32 +++++++++++---------------------
 3 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/lib/PublicInbox/MailDiff.pm b/lib/PublicInbox/MailDiff.pm
index e4e262ef..125360fe 100644
--- a/lib/PublicInbox/MailDiff.pm
+++ b/lib/PublicInbox/MailDiff.pm
@@ -65,6 +65,7 @@ sub next_smsg ($) {
 sub emit_msg_diff {
 	my ($bref, $self) = @_; # bref is `git diff' output
 	require PublicInbox::Hval;
+	PublicInbox::Hval::utf8_maybe($$bref);
 
 	# will be escaped to `•' in HTML
 	$self->{ctx}->{ibx}->{obfuscate} and
@@ -81,7 +82,7 @@ sub do_diff {
 	my $dir = "$self->{tmp}/$n";
 	$self->dump_eml($dir, $eml);
 	my $cmd = [ qw(git diff --no-index --no-color -- a), $n ];
-	my $opt = { -C => "$self->{tmp}", quiet => 1, 1 => [':utf8', \my $o] };
+	my $opt = { -C => "$self->{tmp}", quiet => 1 };
 	my $qsp = PublicInbox::Qspawn->new($cmd, undef, $opt);
 	$qsp->psgi_qx($self->{ctx}->{env}, undef, \&emit_msg_diff, $self);
 }
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 17538027..86c435fd 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -355,8 +355,11 @@ sub index_body_text {
 	my $rd;
 	if ($$sref =~ /^(?:diff|---|\+\+\+) /ms) { # start patch-id in parallel
 		my $git = ($self->{ibx} // $self->{eidx} // $self)->git;
+		my $fh = PublicInbox::IO::write_file '+>:utf8', undef, $$sref;
+		$fh->flush or die "flush: $!";
+		sysseek($fh, 0, SEEK_SET);
 		$rd = popen_rd($git->cmd(qw(patch-id --stable)), undef,
-				{ 0 => [ ':utf8', $sref ] });
+				{ 0 => $fh });
 	}
 
 	# split off quoted and unquoted blocks:
diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm
index 9c680690..e6b12994 100644
--- a/lib/PublicInbox/Spawn.pm
+++ b/lib/PublicInbox/Spawn.pm
@@ -332,18 +332,6 @@ sub which ($) {
 	undef;
 }
 
-sub scalar_redirect {
-	my ($layer, $opt, $child_fd, $bref) = @_;
-	open my $fh, '+>'.$layer, undef;
-	$opt->{"fh.$child_fd"} = $fh;
-	if ($child_fd == 0) {
-		print $fh $$bref;
-		$fh->flush or die "flush: $!";
-		sysseek($fh, 0, SEEK_SET);
-	}
-	fileno($fh);
-}
-
 sub spawn ($;$$) {
 	my ($cmd, $env, $opt) = @_;
 	my $f = which($cmd->[0]) // die "$cmd->[0]: command not found\n";
@@ -354,14 +342,18 @@ sub spawn ($;$$) {
 	}
 	for my $child_fd (0..2) {
 		my $pfd = $opt->{$child_fd};
-		if ('ARRAY' eq ref($pfd)) {
-			my ($layer, $bref) = @$pfd;
-			$pfd = scalar_redirect($layer, $opt, $child_fd, $bref)
-		} elsif ('SCALAR' eq ref($pfd)) {
-			$pfd = scalar_redirect('', $opt, $child_fd, $pfd);
+		if ('SCALAR' eq ref($pfd)) {
+			open my $fh, '+>', undef;
+			$opt->{"fh.$child_fd"} = $fh; # for read_out_err
+			if ($child_fd == 0) {
+				print $fh $$pfd;
+				$fh->flush or die "flush: $!";
+				sysseek($fh, 0, SEEK_SET);
+			}
+			$pfd = fileno($fh);
 		} elsif (defined($pfd) && $pfd !~ /\A[0-9]+\z/) {
 			my $fd = fileno($pfd) //
-					die "$pfd not an IO GLOB? $!";
+					croak "BUG: $pfd not an IO GLOB? $!";
 			$pfd = $fd;
 		}
 		$rdr[$child_fd] = $pfd // $child_fd;
@@ -399,9 +391,7 @@ sub read_out_err ($) {
 	for my $fd (1, 2) { # read stdout/stderr
 		my $fh = delete($opt->{"fh.$fd"}) // next;
 		seek($fh, 0, SEEK_SET);
-		my $dst = $opt->{$fd};
-		$dst = $opt->{$fd} = $dst->[1] if ref($dst) eq 'ARRAY';
-		PublicInbox::IO::read_all $fh, 0, $dst
+		PublicInbox::IO::read_all $fh, undef, $opt->{$fd};
 	}
 }
 


* [PATCH 00/15] various cindex fixes + speedups
@ 2023-11-30 11:40  7% Eric Wong
  2023-11-30 11:41  7% ` [PATCH 09/15] spawn: drop IO layer support from redirects Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2023-11-30 11:40 UTC (permalink / raw)
  To: meta

Notable changes:

10/15 provides a huge speedup which will hopefully make
future developments faster.

12/15 probably obsoletes libgit2 for extindex "all" users.

13/15 can save some memory with many inboxes while making
configuration easier.

Eric Wong (15):
  cindex: fix store_repo+repo_stored on no-op
  codesearch: allow inbox count to exceed matches
  config: reject newlines consistently in dir names
  cindex: only create {-cidx_err} field on failures
  cindex: keep batch pipe for pruning SHA-256 repos
  cindex: store extensions.objectFormat with repo data
  git: share unlinked pack checking code with gcf2
  cindex: skip getpid guard for most OnDestroy use
  spawn: drop IO layer support from redirects
  cindex: speed up initial scan setup phase
  inbox: expire resources more aggressively
  git_async_cat: use git from "all" extindex if possible
  www_listing: support publicInbox.nameIsUrl
  inbox: shrink data structures for publicinbox.*.hide
  codesearch: use retry_reopen for WWW

 Documentation/public-inbox-config.pod |  19 +-
 lib/PublicInbox/CodeSearch.pm         |  54 +++--
 lib/PublicInbox/CodeSearchIdx.pm      | 286 ++++++++++++++++----------
 lib/PublicInbox/Config.pm             |  32 ++-
 lib/PublicInbox/Gcf2.pm               |  16 +-
 lib/PublicInbox/Git.pm                |  27 +--
 lib/PublicInbox/GitAsyncCat.pm        |   8 +-
 lib/PublicInbox/Inbox.pm              |  32 +--
 lib/PublicInbox/MailDiff.pm           |   3 +-
 lib/PublicInbox/SearchIdx.pm          |   5 +-
 lib/PublicInbox/Spawn.pm              |  32 +--
 lib/PublicInbox/WwwListing.pm         |  21 +-
 12 files changed, 303 insertions(+), 232 deletions(-)




Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
