Date | Commit message (Collapse) |
|
The stat() array is a whopping 480 bytes (on x86-64, Perl 5.28),
while the new packed representation of two 64-bit doubles as a
scalar is "only" 56 bytes. This can add up when there are many
inboxes. Just use a string comparison on the packed
representation.
Some 32-bit Perl builds (IIRC OpenBSD) lack quad support, so
doubles were chosen for pack() portability.
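
A minimal sketch of the packed comparison described above. The message does not say which stat() fields are kept, so using ctime and mtime is an assumption, and `stat_sig` is a hypothetical helper name:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: keep only two stat() fields, packed as native doubles.
# Which fields to keep is an assumption; ctime and mtime are used here.
sub stat_sig {
	my ($path) = @_;
	my @st = stat($path) or return;
	pack('dd', $st[10], $st[9]); # ctime, mtime
}

my ($fh, $path) = tempfile(UNLINK => 1);
my $old = stat_sig($path);
my $new = stat_sig($path);

# An unchanged file yields byte-identical strings, so "eq" is enough.
print "unchanged\n" if $old eq $new;

# 16 bytes of packed data, before Perl's per-scalar overhead
print length($old), " bytes of packed data\n";
```

Doubles ("d") are used instead of quads ("q") so the same pack() template works on Perl builds without 64-bit integer support.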
|
|
We rely on spawn/popen_rd for redirects, nowadays.
|
|
I didn't wait until September to do it, this year!
|
|
popen_rd accepts arbitrary redirects, so we can reuse its
code to set up the pipe end we want to read, saving each
caller a few lines of code compared to calling pipe+spawn.
|
|
File::Glob is loaded by perl for the "glob()" op, anyways,
so call bsd_glob with GLOB_NOSORT to avoid needless sorting
of the output.
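
For illustration, a small self-contained sketch of an unsorted glob. The temporary directory and the file names are made up for the example:

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_NOSORT);
use File::Temp qw(tempdir);

# Made-up directory contents, only to have something to match
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(b.pack a.pack c.idx)) {
	open(my $fh, '>', "$dir/$name") or die "open $dir/$name: $!";
	close($fh) or die "close $dir/$name: $!";
}

# GLOB_NOSORT returns matches in directory order instead of sorting them
my @packs = bsd_glob("$dir/*.pack", GLOB_NOSORT);
print scalar(@packs), " pack files\n";
```

Callers which only iterate over the matches, or sort by some other key later, lose nothing by skipping the sort.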
|
|
While v1 inboxes typically only have one branch, code repositories
may have dozens or even hundreds. Slurping those into memory is
a waste.
|
|
Most spawn and popen_rd callers die on failure to spawn,
anyways, and some are missing checks entirely. This saves
us a bunch of verbose error-checking code in callers.
This also makes popen_rd more consistent, since it already
dies on pipe creation failures.
|
|
We haven't used it in SolverGit, yet, and I'll be reworking it
to work with ->cat_async, instead.
|
|
While v1 inboxes are typically only a single branch, coderepos
will have many branches and being able to pipeline requests
to "git cat-file --batch" can help us mask seek times.
|
|
'0' is a valid value for HTTP_HOST, and maybe some folks
will want to hit that as port 80 where the HTTP client won't
send the ":$PORT" suffix.
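
A sketch of the pitfall, assuming a PSGI-style environment hash. The `host_for` helper and the fallback to SERVER_NAME are hypothetical and not taken from the actual change:

```perl
use strict;
use warnings;

# Hypothetical helper: pick the host from a PSGI-style env hash.
sub host_for {
	my ($env) = @_;
	my $host = $env->{HTTP_HOST};
	# Test definedness and length, not truthiness: '0' is false in Perl,
	# so "$host || $fallback" would wrongly discard it.
	(defined($host) && length($host)) ? $host : $env->{SERVER_NAME};
}

print host_for({ HTTP_HOST => '0', SERVER_NAME => 'example.com' }), "\n";
```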
|
|
There's a bunch of leftover "require" and "use" statements we no
longer need and can get rid of, along with some excessive
imports via "use".
IO::Handle usage isn't always obvious, so add comments
describing why a package loads it. Along the same lines,
document the tmpdir support as the reason we depend on
File::Temp 0.19, even though every Perl 5.10.1+ user has it.
While we're at it, favor "use" over "require", since it gives
us extra compile-time checking.
|
|
We can save callers the trouble of {-hold} and {-dev_null}
refs as well as the trouble of calling fileno().
|
|
This allows callers to avoid allocating several KB for every
call to ->async_cat.
|
|
This is a transitionary interface which does NOT require an
event loop. It can be plugged into current synchronous code
without major surgery.
It allows HTTP/1.1 pipelining-like functionality by taking
advantage of predictable and well-specified POSIX pipe semantics
by stuffing multiple git cat-file requests into the --batch pipe.
With xt/git_async_cmp.t and GIANT_GIT_DIR=git.git, the async
interface is 10-25% faster than the synchronous interface since
it can keep the "git cat-file" process busier.
This is expected to improve performance on systems with slower
storage (but multiple cores).
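
The pipelining idea can be sketched without git. Here a child perl process which answers one line per request line stands in for "git cat-file --batch"; that stand-in is an assumption made to keep the example runnable anywhere, and this is not the ->cat_async code itself:

```perl
use strict;
use warnings;
use IPC::Open2 qw(open2);

# Stand-in for "git cat-file --batch": one response line per request line.
my @cmd = ($^X, '-e', '$| = 1; print uc($_) while <STDIN>');
my $pid = open2(my $out, my $in, @cmd);

# Queue several small requests before reading any response. Small writes
# fit in the pipe buffer, so the parent does not block here and the child
# always has work waiting.
my @req = qw(abc def ghi);
print $in "$_\n" for @req;

# Responses arrive in the same order the requests were written.
my @res = map { my $l = <$out>; chomp($l); $l } @req;
close($in);
close($out);
waitpid($pid, 0);
print "@res\n";
```

The ordering guarantee is what makes this safe without an event loop: a pipe is a FIFO, so the Nth response always belongs to the Nth request.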
|
|
This was intended for solver, but it's unused since
commit 915cd090798069a4
("solver: switch patch application to use a callback")
|
|
Although we always unlink temporary files, give them a
meaningful name so that we can still make sense
of the pre-unlink name when using lsof(8) or similar
tools on Linux.
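
A rough sketch of the naming idea using File::Temp. The "err-cat-file-" prefix is a hypothetical name chosen for illustration, not necessarily one used in the code:

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempfile);

# The template prefix is hypothetical; pick one that names the purpose.
my ($fh, $path) = tempfile('err-cat-file-XXXXXX', TMPDIR => 1);
unlink($path) or die "unlink $path: $!";

# The handle stays usable after unlink; tools such as lsof(8) typically
# still show the old path for the open descriptor on Linux.
print $fh "still writable\n" or die "print: $!";
print basename($path), "\n";
```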
|
|
|
|
While we're usually not stuck waiting on waitpid after
seeing a pipe EOF or even triggering SIGPIPE in the process
(e.g. git-http-backend) we're reading from, it MAY happen
and we should be careful to never hang the daemon process
on waitpid calls.
v2: use "eq" for string comparison against 'DEFAULT'
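
One general way to keep a daemon from blocking on waitpid is to poll with WNOHANG. The sketch below shows only that technique; `try_reap` is a hypothetical helper and this is not necessarily how the change itself is implemented:

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper: poll for a child without blocking the caller.
# Returns the wait status once reaped, undef while the child still runs.
sub try_reap {
	my ($pid) = @_;
	my $ret = waitpid($pid, WNOHANG);
	return $? if $ret == $pid;
	die "waitpid($pid): $!" if $ret < 0;
	undef;
}

defined(my $pid = fork) or die "fork: $!";
exit(0) if $pid == 0; # child exits right away

my $status;
until (defined($status = try_reap($pid))) {
	# a real daemon would return to its event loop instead of sleeping
	select(undef, undef, undef, 0.01);
}
print "child exit status: ", $status >> 8, "\n";
```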
|
|
* origin/manifest:
git: ensure ->modified returns an integer
www: support $INBOX/git/$EPOCH.git for v2 cloning
www: wire up /$INBOX/manifest.js.gz, too
wwwlisting: generate grokmirror-compatible manifest.js.gz
wwwlisting: allow hiding entries from manifest
|
|
We weren't using it, and in retrospect, it makes no sense to use
the cat_file API for giant responses which can't be read quickly
with minimal context-switching (or sanely fit into memory for
Email::Simple/Email::MIME).
For giant blobs which we don't want slurped in memory, we'll
spawn a short-lived git-cat-file process like we do in ViewVCS.
Otherwise, monopolizing a git-cat-file process for a giant
blob is harmful to other PSGI/NNTP users.
A better interface is coming which will be more suitable for
batch processing of "small" objects such as commits and
email blobs.
|
|
We don't want to serialize timestamps as strings to JSON.
I only noticed this bug on a 32-bit system.
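
The message does not show the fix. A common way to get a JSON number rather than a JSON string is to numify the value before encoding, sketched here with the core JSON::PP module (the "modified" key and the sample timestamp are illustrative):

```perl
use strict;
use warnings;
use JSON::PP ();

my $mtime = "1500000000"; # e.g. parsed from command output, so a string
my $json = JSON::PP->new->canonical;

my $as_string = $json->encode({ modified => $mtime });     # value is quoted
my $as_number = $json->encode({ modified => $mtime + 0 }); # value is a number
print "$as_string\n$as_number\n";
```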
|
|
While I don't expect git to suddenly start spewing non-ASCII
digits in places I'd expect ASCII, this would make things easier
for future hackers and reviewers.
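
To illustrate the concern: on character strings, `\d` matches any Unicode decimal digit, while `[0-9]` matches ASCII digits only. That the change swaps one for the other is an assumption; the "blob 1234" input below is made up:

```perl
use strict;
use warnings;

my $arabic_three = "\x{0663}"; # ARABIC-INDIC DIGIT THREE
my $d_matches = ($arabic_three =~ /\A\d\z/) ? 1 : 0;
my $ascii_matches = ($arabic_three =~ /\A[0-9]\z/) ? 1 : 0;
print "\\d: $d_matches, [0-9]: $ascii_matches\n";

# Parsing a size field while accepting ASCII digits only
my ($size) = ("blob 1234" =~ /\Ablob ([0-9]+)\z/);
print "$size\n";
```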
|
|
No reason to leave that (usually) empty file open after killing off
"cat-file --batch-check". This wasn't an unbounded leak, though,
as respawning the --batch-check process would've clobbered the
old err_c file.
|
|
A constant stream of traffic to either httpd/nntpd would mean
git-cat-file processes never expire. Things can go bad after a
full repack, as a full repack will unlink old pack indices and
git-cat-file does not currently detect unlinked files.
We could do something complicated by recursively stat-ing
objects/pack of every git directory and alternate;
but that's probably not worth the trouble compared to
occasionally restarting the cat-file process.
So simplify the code and let httpd/nntpd expire them
periodically, since spawning a "git-cat-file --batch" process
isn't too expensive. We already spawn for every request which
hits git-http-backend, cgit, and git-apply.
In the future, we may optionally support the Git::Raw module
to avoid IPC; but we must remain careful to not leave lingering
FDs open to unlinked files after repack.
|
|
git < 2.5.0 was missing --git-path support. This means any
users relying on some rare environment variables will need git
2.5.0+.
|
|
This will be used for generating an HTML listing for v1 inboxes,
at least. The logic for this follows that of grokmirror,
and we may dynamically generate manifest.js.gz natively...
|
|
We can save admins the trouble of declaring [coderepo "..."]
sections in the public-inbox config by parsing the cgitrc
directly.
Macro expansion (e.g. $HTTP_HOST) is not supported yet,
but may be in the future.
|
|
This will be useful for extracting titles/subjects from
commit objects when displaying commits.
|
|
Otherwise, long-running but idle git processes may keep unlinked
packs around indefinitely and waste disk space.
|
|
Using git worktrees was causing t/solver_git.t to fail on me.
|
|
We can avoid bumping up RLIMIT_NOFILE too much by storing
patches in a temporary directory. And we can share this
top-level directory with our temporary git repository.
Since we no longer rely on a working-tree for git, we are free
to rearrange the layout and avoid relying on the ".git"
convention and relying on "git -C" for chdir.
This may also ease porting public-inbox to older systems
where git does not support "-C" for chdir.
|
|
David Turner's patch to return "ambiguous" seems like a reasonable
patch for future versions of git:
https://public-inbox.org/git/672a6fb9e480becbfcb5df23ae37193784811b6b.camel@novalis.org/
|
|
Ambiguity is not worth it for internal usage with the
solver.
|
|
This will be useful for disambiguating short OIDs in older
emails when abbreviations were shorter.
Tested against the following script with /path/to/git.git
==> t.perl <==
use strict;
use PublicInbox::Git;
use Data::Dumper;
my $dir = shift or die "Usage: $0 GIT_DIR # (of git.git)";
my $git = PublicInbox::Git->new($dir);
my @res = $git->check('dead');
print Dumper({res => \@res, err=> $git->last_check_err});
@res = $git->check('5335669531d83d7d6c905bcfca9b5f8e182dc4d4');
print Dumper({res => \@res, err=> $git->last_check_err});
|
|
It'll be helpful for displaying progress in SolverGit
output.
|
|
For redundancy and centralization resistance.
|
|
This will look up git blobs from associated git source code
repositories. If the blobs can't be found, an attempt to
"solve" them via patch application will be performed.
Eventually, this may become the basis of a type-agnostic
frontend similar to "git show".
|
|
We need to work with 0x22 (double-quote) and 0x5c (backslash);
even if they're oddball characters in filenames which wouldn't
be used by projects I'd want to work on.
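
A sketch of unquoting which also handles `\"` and `\\`. This is not the committed implementation: `git_unquote_sketch` is a hypothetical name, and decoding every escape in a single pass is a design choice made here so that an escaped backslash is never re-read as the start of another escape:

```perl
use strict;
use warnings;

my %GIT_ESC = (
	a => "\a", b => "\b", f => "\f", n => "\n",
	r => "\r", t => "\t", v => "\013",
	'"' => '"', '\\' => '\\',
);

# Sketch only: one left-to-right pass over all escape sequences.
sub git_unquote_sketch {
	my ($s) = @_;
	return $s unless $s =~ /\A"(.*)"\z/s;
	$s = $1;
	$s =~ s/\\([\\"abfnrtv]|[0-7]{1,3})/
		exists $GIT_ESC{$1} ? $GIT_ESC{$1} : chr(oct($1))/ge;
	$s;
}

# Input is: "a\"b\\c" (including the outer quotes)
print git_unquote_sketch(q{"a\"b\\\\c"}), "\n";
```

With two separate passes (named escapes first, octal second), the input `\\n` could be misread as a backslash followed by a newline escape; the single alternation avoids that.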
|
|
This function doesn't have a lot of callers at the moment so
none of them are affected by this change. But the plan is to
use this in our WWW code for things, so do it now before we
call it in more places.
Results from a Thinkpad X200 with a Core2Duo P8600 @ 2.4GHz:
Benchmark: timing 10 iterations of cp, ip...
cp: 12.868 wallclock secs (12.86 usr + 0.00 sys = 12.86 CPU) @ 0.78/s (n=10)
ip: 10.9137 wallclock secs (10.91 usr + 0.00 sys = 10.91 CPU) @ 0.92/s (n=10)
Note: I mainly care about unquoted performance because
that's the common case for the target audience of public-inbox.
Script used to get benchmark results against the Linux source tree:
==> bench_unquote.perl <==
use strict;
use warnings;
use Benchmark ':hireswallclock';
my $nr = 50;
my %GIT_ESC = (
	a => "\a",
	b => "\b",
	f => "\f",
	n => "\n",
	r => "\r",
	t => "\t",
	v => "\013",
);
sub git_unquote_ip ($) {
	return $_[0] unless ($_[0] =~ /\A"(.*)"\z/);
	$_[0] = $1;
	$_[0] =~ s/\\([abfnrtv])/$GIT_ESC{$1}/g;
	$_[0] =~ s/\\([0-7]{1,3})/chr(oct($1))/ge;
	$_[0];
}
sub git_unquote_cp ($) {
	my ($s) = @_;
	return $s unless ($s =~ /\A"(.*)"\z/);
	$s = $1;
	$s =~ s/\\([abfnrtv])/$GIT_ESC{$1}/g;
	$s =~ s/\\([0-7]{1,3})/chr(oct($1))/ge;
	$s;
}
chomp(my @files = `git -C ~/linux ls-tree --name-only -r v4.19.13`);
timethese(10, {
	cp => sub { for (0..$nr) { git_unquote_cp($_) for @files } },
	ip => sub { for (0..$nr) { git_unquote_ip($_) for @files } },
});
|
|
We'll be using it outside of searchidx...
|
|
I've hit /proc/sys/fs/pipe-user-pages-* limits on some systems.
So stop hogging resources on pipes which don't benefit from
giant sizes.
Some of these can use eventfd in the future to further reduce
resource use.
|
|
Since we'll be adding new repositories to the `alternates' file
in git, we must restart the `git cat-file --batch' process as
git currently does not detect changes to the alternates file
in long-running cat-file processes.
Don't bother with the `--batch-check' process since we won't be
using it with v2.
|
|
Wrap the old Import package to enable creating new repos based
on size thresholds. This is better than relying on time-based
rotation as LKML traffic seems to be increasing.
|
|
Using update-copyrights from gnulib.
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
We must ensure the cat-file process is launched before Xapian
grabs its lock, too. Our use of "git cat-file --batch" has
the same problem as "git log" did, which was fixed in
commit 3713c727cda431a0dc2865a7878c13ecf9f21851
("searchidx: release Xapian FDs before spawning git log").
|
|
fork failures are unfortunately common when Xapian has
gigabytes and gigabytes mmapped.
|
|
This hopefully makes the intent of the code clearer, too.
The HTTP use of the numeric reference for getline
caused problems in Git.pm, already.
|
|
This allows us to easily provide gigantic inboxes
with proper backpressure handling for slow clients.
It also eliminates public-inbox-httpd and Danga::Socket-specific
knowledge from this class, making it easier to follow for
those used to generic PSGI applications.
|
|
This is probably trivial enough to be final?
|
|
This lets us run one-line git commands as easily as with
backticks (``), but without having to remember --git-dir or
escape arguments.
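
The general technique can be sketched with a list-form pipe open, which bypasses the shell. Both `qx_list` and `git_qx` are hypothetical names for illustration, not the method this commit adds:

```perl
use strict;
use warnings;

# Hypothetical helper: run a command from an argument list and return its
# output, like backticks but with no shell involved, so nothing needs escaping.
sub qx_list {
	my (@cmd) = @_;
	open(my $fh, '-|', @cmd) or die "exec @cmd: $!";
	my $out = do { local $/; <$fh> };
	close($fh) or die "@cmd failed: \$?=$?";
	$out;
}

# Hypothetical wrapper which always supplies --git-dir for the caller.
sub git_qx {
	my ($git_dir, @args) = @_;
	qx_list('git', "--git-dir=$git_dir", @args);
}

# Shell metacharacters pass through untouched in list form:
print qx_list($^X, '-e', 'print "$ARGV[0]\n"', q{it's $HOME; echo *});
```

Given a repository path in `$dir`, a call such as `git_qx($dir, qw(rev-parse HEAD))` would then read like a one-liner while still avoiding the shell.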
|