user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 5/5] viewvcs: support streaming large blobs
Date: Thu, 31 Jan 2019 04:27:24 +0000
Message-ID: <20190131042724.2675-6-e@80x24.org>
In-Reply-To: <20190131042724.2675-1-e@80x24.org>

Forking off git-cat-file to stream large blobs is reasonably
efficient: at least no worse than using git-http-backend to
serve clones.  So let our limiter framework (PublicInbox::Qspawn)
deal with it.

git isn't great for large files, and AFAIK there are no
stable, widely-available mechanisms in git for reading smaller
chunks of giant blobs.

Tested with some giant GPU headers in the Linux kernel.
---
 lib/PublicInbox/ViewVCS.pm | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

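A minimal standalone sketch of the Content-Type sniffing done in the
psgi_return callback of stream_large_blob below.  It assumes the
callback is re-invoked with a growing buffer each time git-cat-file
produces more output; sniff_content_type and sniff_stream are
hypothetical names used only for illustration, not part of
public-inbox.  The 8000-byte threshold matches $BIN_DETECT in the
patch.

```perl
use strict;
use warnings;

my $BIN_DETECT = 8000; # same threshold as $BIN_DETECT in the patch

# Hypothetical helper mirroring the psgi_return callback: given the
# bytes buffered so far and the total blob size, return a
# Content-Type, or undef (in scalar context) to keep buffering.
sub sniff_content_type {
	my ($buf, $size) = @_;
	# any NUL byte seen so far => treat the blob as binary
	return 'application/octet-stream' if index($buf, "\0") >= 0;
	my $n = length($buf);
	# enough bytes (or the whole blob) seen without a NUL => text
	return 'text/plain; charset=UTF-8' if $n >= $BIN_DETECT || $n == $size;
	return; # buffer keeps growing
}

# Hypothetical driver: feed chunks as they might arrive from
# `git cat-file`, stopping as soon as a Content-Type is decided.
# Returns the Content-Type (or undef) and the bytes buffered.
sub sniff_stream {
	my ($size, @chunks) = @_;
	my $buf = '';
	for my $chunk (@chunks) {
		$buf .= $chunk;
		my $ct = sniff_content_type($buf, $size);
		return ($ct, length($buf)) if defined $ct;
	}
	return (undef, length($buf));
}
```

For example, sniff_stream(20000, ('a' x 4096) x 5) settles on
text/plain after buffering 8192 bytes, while a NUL anywhere in the
buffered prefix selects application/octet-stream immediately.  Only
the headers wait on this decision; the body can then be streamed
without holding the whole blob in memory.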
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index 85edf22..63731e9 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -34,6 +34,7 @@ END { $hl = undef };
 my %QP_MAP = ( A => 'oid_a', B => 'oid_b', a => 'path_a', b => 'path_b' );
 my $max_size = 1024 * 1024; # TODO: configurable
 my $enc_utf8 = find_encoding('UTF-8');
+my $BIN_DETECT = 8000; # same as git
 
 sub html_page ($$$) {
 	my ($ctx, $code, $strref) = @_;
@@ -43,7 +44,33 @@ sub html_page ($$$) {
 		my ($nr, undef) =  @_;
 		$nr == 1 ? $$strref : undef;
 	});
-	$wcb->($res);
+	$wcb ? $wcb->($res) : $res;
+}
+
+sub stream_large_blob ($$$$) {
+	my ($ctx, $res, $logref, $fn) = @_;
+	my ($git, $oid, $type, $size, $di) = @$res;
+	my $cmd = ['git', "--git-dir=$git->{git_dir}", 'cat-file', $type, $oid];
+	my $qsp = PublicInbox::Qspawn->new($cmd);
+	my @cl = ('Content-Length', $size);
+	my $env = $ctx->{env};
+	$env->{'qspawn.response'} = delete $ctx->{-wcb};
+	$qsp->psgi_return($env, undef, sub {
+		my ($r, $bref) = @_;
+		if (!defined $r) { # error
+			html_page($ctx, 500, $logref);
+		} elsif (index($$bref, "\0") >= 0) {
+			my $ct = 'application/octet-stream';
+			[200, ['Content-Type', $ct, @cl ] ];
+		} else {
+			my $n = bytes::length($$bref);
+			if ($n >= $BIN_DETECT || $n == $size) {
+				my $ct = 'text/plain; charset=UTF-8';
+				return [200, ['Content-Type', $ct, @cl] ];
+			}
+			undef; # bref keeps growing
+		}
+	});
 }
 
 sub solve_result {
@@ -65,9 +92,13 @@ sub solve_result {
 	$ref eq 'ARRAY' or return html_page($ctx, 500, \$log);
 
 	my ($git, $oid, $type, $size, $di) = @$res;
+	my $path = to_filename($di->{path_b} || $hints->{path_b} || 'blob');
+	my $raw_link = "(<a\nhref=$path>raw</a>)";
 	if ($size > $max_size) {
+		return stream_large_blob($ctx, $res, \$log, $fn) if defined $fn;
 		# TODO: stream the raw file if it's gigantic, at least
-		$log = '<pre><b>Too big to show</b></pre>' . $log;
+		$log = "<pre><b>Too big to show, download available</b>\n" .
+			"$oid $type $size bytes $raw_link</pre>" . $log;
 		return html_page($ctx, 500, \$log);
 	}
 
@@ -86,8 +117,6 @@ sub solve_result {
 		return delete($ctx->{-wcb})->([200, $h, [ $$blob ]]);
 	}
 
-	my $path = to_filename($di->{path_b} || $hints->{path_b} || 'blob');
-	my $raw_link = "(<a\nhref=$path>raw</a>)";
 	if ($binary) {
 		$log = "<pre>$oid $type $size bytes (binary)" .
 			" $raw_link</pre>" . $log;
-- 
EW


Thread overview: 6+ messages
2019-01-31  4:27 [PATCH 0/5] a few more solver fixups and improvements Eric Wong
2019-01-31  4:27 ` [PATCH 1/5] t/config.t: test PublicInbox::Git sharing between inboxes Eric Wong
2019-01-31  4:27 ` [PATCH 2/5] inbox: perform cleanup of Git objects for coderepos Eric Wong
2019-01-31  4:27 ` [PATCH 3/5] solvergit: allow searching on longer-than-needed OIDs Eric Wong
2019-01-31  4:27 ` [PATCH 4/5] solvergit: allow shorter-than-necessary OIDs from user Eric Wong
2019-01-31  4:27 ` Eric Wong [this message]

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
