git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/18] Gitweb caching v8
@ 2010-12-09 21:57 John 'Warthog9' Hawley
  2010-12-09 21:57 ` [PATCH 01/18] gitweb: Prepare for splitting gitweb John 'Warthog9' Hawley
                   ` (19 more replies)
  0 siblings, 20 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

Afternoon everyone,

(Afternoon is like morning, right?)
 
This is the latest incarnation of gitweb w/ caching.  Per the general
consensus and requests from the recent Gittogether I'm re-submitting 
my patches.

Bunch of re-works in the code, and several requested features.  Sadly the
patch series has balloned as I've been adding things.  It was 3-4 patches,
it's now 18.  This is based on top of Jakub's v7.2 patch series, but
it should be more or less clean now.

As such there was a bunch of changes that I needed to do to Jakub's tree
which are indicated in the series.  Why did I do them up as separate things?
Mainly there's a bunch of history that's getting lost right now between
going back and forth, and I wanted to have clear patches to discuss
should further discussion be needed.

This still differs, by two patches, from whats in production on kernel.org.
It's missing the index page git:// link, and kernel.org and kernel.org also
has the forced version matching.  As a note I'll probably let this stew
another day or so on kernel.org and then I'll push it into the Fedora update
stream, as there's a couple of things in this patch series that would be 
good for them to have.

There is one additional script I've written that the Fedora folks are using,
and that might be useful to include, which is an 'offline' cache file generator.
It basically wraps gitweb.cgi and at the end moves the cache file into the right
place.  The Fedora folks were finding it took hours to generate their front
page, and that doing a background generation almost never completed (due to 
process death).  This was a simple way to handle that.  If people would like
I can add it in as an additional patch.

v8:
	- Reverting several changes from Jakub's change set that make no sense
                - is_cacheable changed to always return true - nothing special about
                  blame or blame_incremental as far as the caching engine is concerned
                - Reverted config file change "caching_enabled" back to "cache_enable" as this
                  config file option is already in the wild in production code, as are all
                  current gitweb-caching configuration variables.
                - Reverted change to reset_output as
                        open STDOUT, ">&", \*STDOUT_REAL;
                  causes assertion failures:
                  Assertion !((((s->var)->sv_flags & (0x00004000|0x00008000)) == 0x00008000) && (((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVGV || ((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVLV)) failed: file "scalar.xs", line 49 at gitweb.cgi line 1221.
                  if we encounter an error *BEFORE* we've ever changed the output.
        - Cleanups there were indirectly mentioned by Jakub
                - Elimination of anything even remotely looking like duplicate code
                        - Creation of isBinaryAction() and isFeedAction()
        - Adding in blacklist of "dumb" clients for purposes of downloading content
        - Added more explicit disablement of "Generating..." page
        - Added better error handling
                - Creation of .err file in the cache directory
                - Trap STDERR output into $output_err as this was spewing data prior to any header information being sent
        - Added hidden field in footer for url & hash of url, which is extremely useful for debugging

v7:
	- Rework output system, now central STDOUT redirect
	- Various fixes to caching brought in from existing
	  running system

v6:
	- Never saw the light of day
	- Various testing, and reworks.

v5:
	- Missed a couple of things that were in my local tree, and
	  added them back in.
	- Split up the die_error and the version matching patch
	- Set version matching to be on by default - otherwise this
	  really is code that will never get checked, or at best
	  enabled by default by distributions
	- Added a minor code cleanup with respect to $site_header
	  that was already in my tree
	- Applied against a more recent git tree vs. 1.6.6-rc2
	- Removed breakout patch for now (did that in v4 actually)
	  and will deal with that separately 

	http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v5

v4:
	- major re-working of the caching layer to use file handle
	  redirection instead of buffering output
	- other minor improvements

	http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v4
v3:
	- various minor re-works based on mailing list feedback,
	  this series was not sent to the mailing list.
v2:
	- Better breakout
	- You can actually disable the cache now

- John 'Warthog9' Hawley 


Jakub Narebski (2):
  gitweb: Prepare for splitting gitweb
  gitweb: Minimal testing of gitweb caching

John 'Warthog9' Hawley (16):
  gitweb: add output buffering and associated functions
  gitweb: File based caching layer (from git.kernel.org)
  gitweb: Regression fix concerning binary output of files
  gitweb: Add more explicit means of disabling 'Generating...' page
  gitweb: Revert back to $cache_enable vs. $caching_enabled
  gitweb: Change is_cacheable() to return true always
  gitweb: Revert reset_output() back to original code
  gitweb: Adding isBinaryAction() and isFeedAction() to determine the
    action type
  gitweb: add isDumbClient() check
  gitweb: Change file handles (in caching) to lexical variables as
    opposed     to globs
  gitweb: Add commented url & url hash to page footer
  gitweb: add print_transient_header() function for central header
    printing
  gitweb: Add show_warning() to display an immediate warning, with
    refresh
  gitweb: When changing output (STDOUT) change STDERR as well
  gitweb: Prepare for cached error pages & better error page handling
  gitweb: Add better error handling for gitweb caching

 gitweb/Makefile                           |   20 +-
 gitweb/gitweb.perl                        |  176 ++++++++++-
 gitweb/lib/cache.pl                       |  488 +++++++++++++++++++++++++++++
 gitweb/static/gitweb.css                  |    6 +
 t/gitweb-lib.sh                           |   16 +
 t/t9500-gitweb-standalone-no-errors.sh    |   20 ++
 t/t9501-gitweb-standalone-http-status.sh  |   13 +
 t/t9502-gitweb-standalone-parse-output.sh |   33 ++
 8 files changed, 762 insertions(+), 10 deletions(-)
 create mode 100644 gitweb/lib/cache.pl
 mode change 100644 => 100755 t/gitweb-lib.sh

-- 
1.7.2.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 01/18] gitweb: Prepare for splitting gitweb
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-09 23:30   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 02/18] gitweb: add output buffering and associated functions John 'Warthog9' Hawley
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

From: Jakub Narebski <jnareb@gmail.com>

Prepare gitweb for having been split into modules that are to be
installed alongside gitweb in 'lib/' subdirectory, by adding

  use lib __DIR__.'/lib';

to gitweb.perl (to main gitweb script), and preparing for putting
modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile.

This preparatory work allows to add new module to gitweb by simply
adding

  GITWEB_MODULES += <module>

to gitweb/Makefile (assuming that the module is in 'gitweb/lib/'
directory).

While at it pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED
to test instaleed version of gitweb and installed version of modules
(for tests which check individual (sub)modules).


Using __DIR__ from Dir::Self module (not in core, that's why currently
gitweb includes excerpt of code from Dir::Self defining __DIR__) was
chosen over using FindBin-based solution (in core since perl 5.00307,
while gitweb itself requires at least perl 5.8.0) because FindBin uses
BEGIN block, which is a problem under mod_perl and other persistent
Perl environments (thought there are workarounds).

At Pavan Kumar Sankara suggestion gitweb/Makefile uses

  install [OPTION]... SOURCE... DIRECTORY

format (2nd format) with single SOURCE rather than

  install [OPTION]... SOURCE DEST

format (1st format) because of security reasons (race conditions).
Modern GNU install has `-T' / `--no-target-directory' option, but we
cannot rely that the $(INSTALL) we are using supports this option.

The install-modules target in gitweb/Makefile uses shell 'for' loop,
instead of make's $(foreach) function, to avoid possible problem with
generating a command line that exceeded the maximum argument list
length.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/Makefile    |   17 +++++++++++++++--
 gitweb/gitweb.perl |    8 ++++++++
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gitweb/Makefile b/gitweb/Makefile
index 0a6ac00..f9e32eb 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -57,6 +57,7 @@ PERL_PATH  ?= /usr/bin/perl
 bindir_SQ = $(subst ','\'',$(bindir))#'
 gitwebdir_SQ = $(subst ','\'',$(gitwebdir))#'
 gitwebstaticdir_SQ = $(subst ','\'',$(gitwebdir)/static)#'
+gitweblibdir_SQ = $(subst ','\'',$(gitwebdir)/lib)#'
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))#'
 PERL_PATH_SQ  = $(subst ','\'',$(PERL_PATH))#'
 DESTDIR_SQ    = $(subst ','\'',$(DESTDIR))#'
@@ -153,20 +154,32 @@ test:
 
 test-installed:
 	GITWEB_TEST_INSTALLED='$(DESTDIR_SQ)$(gitwebdir_SQ)' \
+	GITWEBLIBDIR='$(DESTDIR_SQ)$(gitweblibdir_SQ)' \
 		$(MAKE) -C ../t gitweb-test
 
 ### Installation rules
 
-install: all
+install: all install-modules
 	$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitwebdir_SQ)'
 	$(INSTALL) -m 755 $(GITWEB_PROGRAMS) '$(DESTDIR_SQ)$(gitwebdir_SQ)'
 	$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitwebstaticdir_SQ)'
 	$(INSTALL) -m 644 $(GITWEB_FILES) '$(DESTDIR_SQ)$(gitwebstaticdir_SQ)'
 
+install-modules:
+	install_dirs="$(sort $(dir $(GITWEB_MODULES)))" && \
+	for dir in $$install_dirs; do \
+		test -d '$(DESTDIR_SQ)$(gitweblibdir_SQ)/$$dir' || \
+		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitweblibdir_SQ)/$$dir'; \
+	done
+	gitweb_modules="$(GITWEB_MODULES)" && \
+	for mod in $$gitweb_modules; do \
+		$(INSTALL) -m 644 lib/$$mod '$(DESTDIR_SQ)$(gitweblibdir_SQ)/$$(dirname $$mod)'; \
+	done
+
 ### Cleaning rules
 
 clean:
 	$(RM) gitweb.cgi static/gitweb.min.js static/gitweb.min.css GITWEB-BUILD-OPTIONS
 
-.PHONY: all clean install test test-installed .FORCE-GIT-VERSION-FILE FORCE
+.PHONY: all clean install install-modules test test-installed .FORCE-GIT-VERSION-FILE FORCE
 
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 679f2da..cfa511c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -10,6 +10,14 @@
 use 5.008;
 use strict;
 use warnings;
+
+use File::Spec;
+# __DIR__ is taken from Dir::Self __DIR__ fragment
+sub __DIR__ () {
+	File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
+}
+use lib __DIR__ . '/lib';
+
 use CGI qw(:standard :escapeHTML -nosticky);
 use CGI::Util qw(unescape);
 use CGI::Carp qw(fatalsToBrowser set_message);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 02/18] gitweb: add output buffering and associated functions
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
  2010-12-09 21:57 ` [PATCH 01/18] gitweb: Prepare for splitting gitweb John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-09 21:57 ` [PATCH 03/18] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

This adds output buffering for gitweb, mainly in preparation for
caching support.  This is a dramatic change to how caching was being
done, mainly in passing around the variable manually and such.

This centrally flips the entire STDOUT to a variable, which after the
completion of the run, flips it back and does a print on the resulting
data.

This should save on the previous 10K line patch (or so) that adds more
explicit output passing.

[jn: modified reset_output to silence 'gitweb.perl: Name "main::STDOUT_REAL"
 used only once: possible typo at ../gitweb/gitweb.perl line 1130.' warning]

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 gitweb/gitweb.perl |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cfa511c..cae0e34 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -39,6 +39,9 @@ BEGIN {
 
 our $version = "++GIT_VERSION++";
 
+# Output buffer variable
+our $output = "";
+
 our ($my_url, $my_uri, $base_url, $path_info, $home_link);
 sub evaluate_uri {
 	our $cgi;
@@ -1134,6 +1137,25 @@ sub evaluate_argv {
 	);
 }
 
+sub change_output {
+	our $output;
+
+	# Trap the 'proper' STDOUT to STDOUT_REAL for things like error messages and such
+	open(STDOUT_REAL,">&STDOUT") or die "Unable to capture STDOUT $!\n";
+
+	# Close STDOUT, so that it isn't being used anymore.
+	close STDOUT;
+
+	# Trap STDOUT to the $output variable, which is what I was using in the original
+	# patch anyway.
+	open(STDOUT,">", \$output) || die "Unable to open STDOUT: $!"; #open STDOUT handle to use $var
+}
+
+sub reset_output {
+	# This basically takes STDOUT_REAL and puts it back as STDOUT
+	open STDOUT, ">&", \*STDOUT_REAL;
+}
+
 sub run {
 	evaluate_argv();
 
@@ -1145,7 +1167,10 @@ sub run {
 		$pre_dispatch_hook->()
 			if $pre_dispatch_hook;
 
+		change_output();
 		run_request();
+		reset_output();
+		print $output;
 
 		$post_dispatch_hook->()
 			if $post_dispatch_hook;
@@ -3655,6 +3680,10 @@ sub die_error {
 		500 => '500 Internal Server Error',
 		503 => '503 Service Unavailable',
 	);
+	# Reset the output so that we are actually going to STDOUT as opposed
+	# to buffering the output.
+	reset_output();
+
 	git_header_html($http_responses{$status}, undef, %opts);
 	print <<EOF;
 <div class="page_body">
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 03/18] gitweb: File based caching layer (from git.kernel.org)
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
  2010-12-09 21:57 ` [PATCH 01/18] gitweb: Prepare for splitting gitweb John 'Warthog9' Hawley
  2010-12-09 21:57 ` [PATCH 02/18] gitweb: add output buffering and associated functions John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-09 21:57 ` [PATCH 04/18] gitweb: Minimal testing of gitweb caching John 'Warthog9' Hawley
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

This is a relatively large patch that implements the file based
caching layer that is quite similar to the one used  on such large
sites as kernel.org and soon git.fedoraproject.org.  This provides
a simple, and straight forward caching mechanism that scales
dramatically better than Gitweb by itself.

The caching layer basically buffers the output that Gitweb would
normally return, and saves that output to a cache file on the local
disk.  When the file is requested it attempts to gain a shared lock
on the cache file and cat it out to the client.  Should an exclusive
lock be on a file (it's being updated) the code has a choice to either
update in the background and go ahead and show the stale page while
update is being performed, or stall the client(s) until the page
is generated.

There are two forms of stalling involved here, background building
and non-background building, both of which are discussed in the
configuration page.

There are still a few known "issues" with respect to this:
- Code needs to be added to be "browser" aware so
  that clients like wget that are trying to get a
  binary blob don't obtain a "Generating..." page

Caching is disabled by default.  You can turn it on by setting
$caching_enabled variable to true to enable file based caching.

[jn: added error checking to loading 'cache.pl'; moved check
 for $caching_enabled outside out of cache_fetch, which required
 update to die_error()]

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 gitweb/Makefile          |    3 +
 gitweb/gitweb.perl       |  105 ++++++++++++--
 gitweb/lib/cache.pl      |  348 ++++++++++++++++++++++++++++++++++++++++++++++
 gitweb/static/gitweb.css |    6 +
 4 files changed, 450 insertions(+), 12 deletions(-)
 create mode 100644 gitweb/lib/cache.pl

diff --git a/gitweb/Makefile b/gitweb/Makefile
index f9e32eb..6ddd4f1 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -113,6 +113,9 @@ endif
 
 GITWEB_FILES += static/git-logo.png static/git-favicon.png
 
+# Gitweb caching
+GITWEB_MODULES += cache.pl
+
 GITWEB_REPLACE = \
 	-e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \
 	-e 's|++GIT_BINDIR++|$(bindir)|g' \
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cae0e34..3c3ff08 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -250,6 +250,53 @@ our %avatar_size = (
 # Leave it undefined (or set to 'undef') to turn off load checking.
 our $maxload = 300;
 
+# This enables/disables the caching layer in gitweb.  This currently only supports the
+# 'dumb' file based caching layer, primarily used on git.kernel.org.  this is reasonably
+# effective but it has the downside of requiring a huge amount of disk space if there
+# are a number of repositories involved.  It is not uncommon for git.kernel.org to have
+# on the order of 80G - 120G accumulate over the course of a few months.  It is recommended
+# that the cache directory be periodically completely deleted, and this is safe to perform.
+# Suggested mechanism
+# mv $cacheidr $cachedir.flush;mkdir $cachedir;rm -rf $cachedir.flush
+our $caching_enabled = 0;
+
+# Used to set the minimum cache timeout for the dynamic caching algorithm.  Basically
+# if we calculate the cache to be under this number of seconds we set the cache timeout
+# to this minimum.
+# Value is in seconds.  1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour
+our $minCacheTime = 20;
+
+# Used to set the maximum cache timeout for the dynamic caching algorithm.  Basically
+# if we calculate the cache to exceed this number of seconds we set the cache timeout
+# to this maximum.
+# Value is in seconds.  1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour
+our $maxCacheTime = 1200;
+
+# If you need to change the location of the caching directory, override this
+# otherwise this will probably do fine for you
+our $cachedir = 'cache';
+
+# If this is set (to 1) cache will do it's best to always display something instead
+# of making someone wait for the cache to update.  This will launch the cacheUpdate
+# into the background and it will lock a <file>.bg file and will only lock the
+# actual cache file when it needs to write into it.  In theory this will make
+# gitweb seem more responsive at the price of possibly stale data.
+our $backgroundCache = 1;
+
+# Used to set the maximum cache file life.  If a cache files last modify time exceeds
+# this value, it will assume that the data is just too old, and HAS to be regenerated
+# instead of trying to display the existing cache data.
+# Value is in seconds.  1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour
+# 18000 = 5 hours
+our $maxCacheLife = 18000;
+
+# Used to enable or disable background forking of the gitweb caching.  Mainly here for debugging purposes
+our $cacheDoFork = 1;
+
+our $fullhashpath = *STDOUT;
+our $fullhashbinpath = *STDOUT;
+our $fullhashbinpathfinal = *STDOUT;
+
 # configuration for 'highlight' (http://www.andre-simon.de/)
 # match by basename
 our %highlight_basename = (
@@ -506,6 +553,15 @@ our %feature = (
 		'default' => [0]},
 );
 
+#
+# Includes
+#
+if (!exists $INC{'cache.pl'}) {
+	my $return = do 'cache.pl';
+	die $@ if $@;
+	die "Couldn't read 'cache.pl': $!" if (!defined $return);
+}
+
 sub gitweb_get_feature {
 	my ($name) = @_;
 	return unless exists $feature{$name};
@@ -734,6 +790,10 @@ our %actions = (
 	"project_list" => \&git_project_list,
 	"project_index" => \&git_project_index,
 );
+sub is_cacheable {
+	my $action = shift;
+	return !($action eq 'blame_data' || $action eq 'blame_incremental');
+}
 
 # finally, we have the hash of allowed extra_options for the commands that
 # allow them
@@ -1072,7 +1132,11 @@ sub dispatch {
 	    !$project) {
 		die_error(400, "Project needed");
 	}
-	$actions{$action}->();
+	if ($caching_enabled && is_cacheable($action)) {
+		cache_fetch($action);
+	} else {
+		$actions{$action}->();
+	}
 }
 
 sub reset_timer {
@@ -1142,6 +1206,7 @@ sub change_output {
 
 	# Trap the 'proper' STDOUT to STDOUT_REAL for things like error messages and such
 	open(STDOUT_REAL,">&STDOUT") or die "Unable to capture STDOUT $!\n";
+	print STDOUT_REAL "";
 
 	# Close STDOUT, so that it isn't being used anymore.
 	close STDOUT;
@@ -1167,10 +1232,7 @@ sub run {
 		$pre_dispatch_hook->()
 			if $pre_dispatch_hook;
 
-		change_output();
 		run_request();
-		reset_output();
-		print $output;
 
 		$post_dispatch_hook->()
 			if $post_dispatch_hook;
@@ -3447,7 +3509,8 @@ sub git_header_html {
 	# support xhtml+xml but choking when it gets what it asked for.
 	if (defined $cgi->http('HTTP_ACCEPT') &&
 	    $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
-	    $cgi->Accept('application/xhtml+xml') != 0) {
+	    $cgi->Accept('application/xhtml+xml') != 0 &&
+	    !$caching_enabled) {
 		$content_type = 'application/xhtml+xml';
 	} else {
 		$content_type = 'text/html';
@@ -3592,6 +3655,7 @@ sub git_footer_html {
 	my $feed_class = 'rss_logo';
 
 	print "<div class=\"page_footer\">\n";
+	print "<div class=\"cachetime\">Cache Last Updated: ". gmtime( time ) ." GMT</div>\n";
 	if (defined $project) {
 		my $descr = git_get_project_description($project);
 		if (defined $descr) {
@@ -3680,9 +3744,14 @@ sub die_error {
 		500 => '500 Internal Server Error',
 		503 => '503 Service Unavailable',
 	);
+	# The output handlers for die_error need to be reset to STDOUT
+	# so that half the message isn't being output to random and
+	# half to STDOUT as expected.  This is mainly for the benefit
+	# of using git_header_html() and git_footer_html() since
+	#
 	# Reset the output so that we are actually going to STDOUT as opposed
 	# to buffering the output.
-	reset_output();
+	reset_output() if ($caching_enabled);
 
 	git_header_html($http_responses{$status}, undef, %opts);
 	print <<EOF;
@@ -5592,9 +5661,15 @@ sub git_blob_plain {
 			($sandbox ? 'attachment' : 'inline')
 			. '; filename="' . $save_as . '"');
 	local $/ = undef;
-	binmode STDOUT, ':raw';
-	print <$fd>;
-	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+	if ($caching_enabled) {
+		open BINOUT, '>', $fullhashbinpath or die_error(500, "Could not open bin dump file");
+	}else{
+		open BINOUT, '>', \$fullhashbinpath or die_error(500, "Could not open bin dump file");
+	}
+	binmode BINOUT, ':raw';
+	print BINOUT <$fd>;
+	binmode BINOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+	close BINOUT;
 	close $fd;
 }
 
@@ -5879,9 +5954,15 @@ sub git_snapshot {
 
 	open my $fd, "-|", $cmd
 		or die_error(500, "Execute git-archive failed");
-	binmode STDOUT, ':raw';
-	print <$fd>;
-	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+	if ($caching_enabled) {
+		open BINOUT, '>', $fullhashbinpath or die_error(500, "Could not open bin dump file");
+	}else{
+		open BINOUT, '>', \$fullhashbinpath or die_error(500, "Could not open bin dump file");
+	}
+	binmode BINOUT, ':raw';
+	print BINOUT <$fd>;
+	binmode BINOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+	close BINOUT;
 	close $fd;
 }
 
diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
new file mode 100644
index 0000000..dd14bfb
--- /dev/null
+++ b/gitweb/lib/cache.pl
@@ -0,0 +1,348 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+#
+# This program is licensed under the GPLv2
+
+#
+# Gitweb caching engine
+#
+
+#use File::Path qw(make_path remove_tree);
+use File::Path qw(mkpath rmtree); # Used for compatability reasons
+use Digest::MD5 qw(md5 md5_hex md5_base64);
+use Fcntl ':flock';
+use File::Copy;
+
+sub cache_fetch {
+	my ($action) = @_;
+	my $cacheTime = 0;
+
+	if(! -d $cachedir){
+		print "*** Warning ***: Caching enabled but cache directory does not exsist.  ($cachedir)\n";
+		mkdir ("cache", 0755) || die "Cannot create cache dir - you will need to manually create";
+		print "Cache directory created successfully\n";
+	}
+
+	our $full_url = "$my_url?". $ENV{'QUERY_STRING'};
+	our $urlhash = md5_hex($full_url);
+	our $fullhashdir = "$cachedir/". substr( $urlhash, 0, 2) ."/";
+
+	eval { mkpath( $fullhashdir, 0, 0777 ) };
+	if ($@) {
+		die_error(500, "Internal Server Error", "Could not create cache directory: $@");
+	}
+	$fullhashpath = "$fullhashdir/". substr( $urlhash, 2 );
+	$fullhashbinpath = "$fullhashpath.bin.wt";
+	$fullhashbinpathfinal = "$fullhashpath.bin";
+
+	if(! -e "$fullhashpath" ){
+		if(! $cacheDoFork || ! defined(my $childPid = fork()) ){
+			cacheUpdate($action,0);
+			cacheDisplay($action);
+		} elsif ( $childPid == 0 ){
+			#run the updater
+			cacheUpdate($action,1);
+		}else{
+			cacheWaitForUpdate($action);
+		}
+	}else{
+		#if cache is out dated, update
+		#else displayCache();
+		open(cacheFile, '<', "$fullhashpath");
+		stat(cacheFile);
+		close(cacheFile);
+		my $stat_time = (stat(_))[9];
+		my $stat_size = (stat(_))[7];
+
+		$cacheTime = get_loadavg() * 60;
+		if( $cacheTime > $maxCacheTime ){
+			$cacheTime = $maxCacheTime;
+		}
+		if( $cacheTime < $minCacheTime ){
+			$cacheTime = $minCacheTime;
+		}
+		if( $stat_time < (time - $cacheTime) || $stat_size == 0 ){
+			if( ! $cacheDoFork || ! defined(my $childPid = fork()) ){
+				cacheUpdate($action,0);
+				cacheDisplay($action);
+			} elsif ( $childPid == 0 ){
+				#run the updater
+				#print "Running updater\n";
+				cacheUpdate($action,1);
+			}else{
+				#print "Waiting for update\n";
+				cacheWaitForUpdate($action);
+			}
+		} else {
+			cacheDisplay($action);
+		}
+
+
+	}
+
+	#
+	# If all of the caching failes - lets go ahead and press on without it and fall back to 'default'
+	# non-caching behavior.  This is the softest of the failure conditions.
+	#
+	#$actions{$action}->();
+}
+
+sub cacheUpdate {
+	my ($action,$areForked) = @_;
+	my $lockingStatus;
+	my $fileData = "";
+
+	if($backgroundCache){
+		open(cacheFileBG, '>:utf8', "$fullhashpath.bg");
+		my $lockStatBG = flock(cacheFileBG,LOCK_EX|LOCK_NB);
+
+		$lockStatus = $lockStatBG;
+	}else{
+		open(cacheFile, '>:utf8', \$fullhashpath);
+		my $lockStat = flock(cacheFile,LOCK_EX|LOCK_NB);
+
+		$lockStatus = $lockStat;
+	}
+	#print "lock status: $lockStat\n";
+
+
+	if (! $lockStatus ){
+		if ( $areForked ){
+			exit(0);
+		}else{
+			return;
+		}
+	}
+
+	if(
+		$action eq "snapshot"
+		||
+		$action eq "blob_plain"
+	){
+		my $openstat = open(cacheFileBinWT, '>>:utf8', "$fullhashbinpath");
+		my $lockStatBin = flock(cacheFileBinWT,LOCK_EX|LOCK_NB);
+	}
+
+	# Trap all output from the action
+	change_output();
+
+	$actions{$action}->();
+
+	# Reset the outputs as we should be fine now
+	reset_output();
+
+
+	if($backgroundCache){
+		open(cacheFile, '>:utf8', "$fullhashpath");
+		$lockStat = flock(cacheFile,LOCK_EX);
+
+		if (! $lockStat ){
+			if ( $areForked ){
+				exit(0);
+			}else{
+				return;
+			}
+		}
+	}
+
+	if(
+		$action eq "snapshot"
+		||
+		$action eq "blob_plain"
+	){
+		my $openstat = open(cacheFileBinFINAL, '>:utf8', "$fullhashbinpathfinal");
+		$lockStatBIN = flock(cacheFileBinFINAL,LOCK_EX);
+
+		if (! $lockStatBIN ){
+			if ( $areForked ){
+				exit(0);
+			}else{
+				return;
+			}
+		}
+	}
+
+	# Actually dump the output to the proper file handler
+	local $/ = undef;
+	$|++;
+	print cacheFile "$output";
+	$|--;
+	if(
+		$action eq "snapshot"
+		||
+		$action eq "blob_plain"
+	){
+		move("$fullhashbinpath", "$fullhashbinpathfinal") or die "Binary Cache file could not be updated: $!";
+
+		flock(cacheFileBinFINAL,LOCK_UN);
+		close(cacheFileBinFINAL);
+
+		flock(cacheFileBinWT,LOCK_UN);
+		close(cacheFileBinWT);
+	}
+
+	flock(cacheFile,LOCK_UN);
+	close(cacheFile);
+
+	if($backgroundCache){
+		flock(cacheFileBG,LOCK_UN);
+		close(cacheFileBG);
+	}
+
+	if ( $areForked ){
+		exit(0);
+	} else {
+		return;
+	}
+}
+
+
+sub cacheWaitForUpdate {
+	my ($action) = @_;
+	my $x = 0;
+	my $max = 10;
+	my $lockStat = 0;
+
+	if( $backgroundCache ){
+		if( -e "$fullhashpath" ){
+			open(cacheFile, '<:utf8', "$fullhashpath");
+			$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
+			stat(cacheFile);
+			close(cacheFile);
+
+			if( $lockStat && ( (stat(_))[9] > (time - $maxCacheLife) ) ){
+				cacheDisplay($action);
+				return;
+			}
+		}
+	}
+
+	if(
+		$action eq "atom"
+		||
+		$action eq "rss"
+		||
+		$action eq "opml"
+	){
+		do {
+			sleep 2 if $x > 0;
+			open(cacheFile, '<:utf8', "$fullhashpath");
+			$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
+			close(cacheFile);
+			$x++;
+			$combinedLockStat = $lockStat;
+		} while ((! $combinedLockStat) && ($x < $max));
+
+		if( $x != $max ){
+			cacheDisplay($action);
+		}
+		return;
+	}
+
+	$| = 1;
+
+	print $::cgi->header(
+				-type=>'text/html',
+				-charset => 'utf-8',
+				-status=> 200,
+				-expires => 'now',
+				# HTTP/1.0
+				-Pragma => 'no-cache',
+				# HTTP/1.1
+				-Cache_Control => join(
+							', ',
+							qw(
+								private
+								no-cache
+								no-store
+								must-revalidate
+								max-age=0
+								pre-check=0
+								post-check=0
+							)
+						)
+				);
+
+	print <<EOF;
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www/w3.porg/TR/html4/strict.dtd">
+<!-- git web w/caching interface version $version, (C) 2006-2010, John 'Warthog9' Hawley <warthog9\@kernel.org> -->
+<!-- git core binaries version $git_version -->
+<head>
+<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>
+<meta name="generator" content="gitweb/$version git/$git_version"/>
+<meta name="robots" content="index, nofollow"/>
+<meta http-equiv="refresh" content="0"/>
+<title>$title</title>
+</head>
+<body>
+EOF
+
+	print "Generating..";
+	do {
+		print ".";
+		sleep 2 if $x > 0;
+		open(cacheFile, '<:utf8', "$fullhashpath");
+		$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
+		close(cacheFile);
+		$x++;
+		$combinedLockStat = $lockStat;
+	} while ((! $combinedLockStat) && ($x < $max));
+	print <<EOF;
+</body>
+</html>
+EOF
+	return;
+}
+
+sub cacheDisplay {
+	local $/ = undef;
+	$|++;
+
+	my ($action) = @_;
+	open(cacheFile, '<:utf8', "$fullhashpath");
+	$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
+
+	if (! $lockStat ){
+		close(cacheFile);
+		cacheWaitForUpdate($action);
+	}
+
+	if(
+		(
+			$action eq "snapshot"
+			||
+			$action eq "blob_plain"
+		)
+	){
+		my $openstat = open(cacheFileBin, '<', "$fullhashbinpathfinal");
+		$lockStatBIN = flock(cacheFileBin,LOCK_SH|LOCK_NB);
+		if (! $lockStatBIN ){
+			system ("echo 'cacheDisplay - bailing due to binary lock failure' >> /tmp/gitweb.log");
+			close(cacheFile);
+			close(cacheFileBin);
+			cacheWaitForUpdate($action);
+		}
+
+		my $binfilesize = -s "$fullhashbinpathfinal";
+		print "Content-Length: $binfilesize";
+	}
+	while( <cacheFile> ){
+		print $_;
+	}
+	if(
+		$action eq "snapshot"
+		||
+		$action eq "blob_plain"
+	){
+		binmode STDOUT, ':raw';
+		print <cacheFileBin>;
+		binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+		close(cacheFileBin);
+	}
+	close(cacheFile);
+	$|--;
+}
+
+1;
+__END__
diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
index 4132aab..972d32e 100644
--- a/gitweb/static/gitweb.css
+++ b/gitweb/static/gitweb.css
@@ -67,6 +67,12 @@ div.page_path {
 	border-width: 0px 0px 1px;
 }
 
+div.cachetime {
+	float: left;
+	margin-right: 10px;
+	color: #555555;
+}
+
 div.page_footer {
 	height: 17px;
 	padding: 4px 8px;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 04/18] gitweb: Minimal testing of gitweb caching
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (2 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 03/18] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-09 21:57 ` [PATCH 05/18] gitweb: Regression fix concerning binary output of files John 'Warthog9' Hawley
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

From: Jakub Narebski <jnareb@gmail.com>

Add basic tests of caching support to t9500-gitweb-standalone-no-errors
test: set $caching_enabled to true and check for errors for first time
run (generating cache) and second time run (retrieving from cache) for a
single view - summary view for a project.  Check also that request for
non-existent object (which results in die_error() codepath to be called)
doesn't produce errors.

Check in t9501-gitweb-standalone-http-status that request for
non-existent object produces correct output (HTTP headers and HTML
output) also when caching is enabled.

Check in the t9502-gitweb-standalone-parse-output test that gitweb
produces the same output with and without caching, for first and
second run, with binary (raw) or plain text (utf8) output.

The common routine that enables cache, gitweb_enable_caching, is
defined in t/gitweb-lib.sh

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 t/gitweb-lib.sh                           |   15 +++++++++++++
 t/t9500-gitweb-standalone-no-errors.sh    |   20 +++++++++++++++++
 t/t9501-gitweb-standalone-http-status.sh  |   13 +++++++++++
 t/t9502-gitweb-standalone-parse-output.sh |   33 +++++++++++++++++++++++++++++
 4 files changed, 81 insertions(+), 0 deletions(-)
 mode change 100644 => 100755 t/gitweb-lib.sh

diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
old mode 100644
new mode 100755
index b9bb95f..16ce811
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -52,6 +52,21 @@ EOF
 	export SCRIPT_NAME
 }
 
+gitweb_enable_caching () {
+	test_expect_success 'enable caching' '
+		cat >>gitweb_config.perl <<-\EOF &&
+		our $caching_enabled = 1;
+		our $minCacheTime = 60*60*24*7*30;     # very long expiration time for tests (a month)
+		our $maxCacheTime = 60*60*24*7*30*365; # upper bound for dynamic (adaptive) caching
+		our $cachedir = "cache";               # for testsuite to clear the right thing
+		# required, because otherwise some tests might intermittently not pass
+		our $backgroundCache = 0; # should turn off cacheWaitForUpdate() / "Generating..."
+		#our $cacheDoFork = 0;
+		EOF
+		rm -rf cache/
+	'
+}
+
 gitweb_run () {
 	GATEWAY_INTERFACE='CGI/1.1'
 	HTTP_ACCEPT='*/*'
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 21cd286..86c1b50 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -677,4 +677,24 @@ test_expect_success HIGHLIGHT \
 	 gitweb_run "p=.git;a=blob;f=test.sh"'
 test_debug 'cat gitweb.log'
 
+# ----------------------------------------------------------------------
+# caching
+
+gitweb_enable_caching
+
+test_expect_success \
+	'caching enabled (project summary, first run, generating cache)' \
+	'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
+	'caching enabled (project summary, second run, cached version)' \
+	'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
+	'caching enabled (non-existent commit, non-cache error page)' \
+	'gitweb_run "p=.git;a=commit;h=non-existent"'
+test_debug 'cat gitweb.log'
+
 test_done
diff --git a/t/t9501-gitweb-standalone-http-status.sh b/t/t9501-gitweb-standalone-http-status.sh
index 2487da1..5b1df3f 100755
--- a/t/t9501-gitweb-standalone-http-status.sh
+++ b/t/t9501-gitweb-standalone-http-status.sh
@@ -134,5 +134,18 @@ cat >>gitweb_config.perl <<\EOF
 our $maxload = undef;
 EOF
 
+# ----------------------------------------------------------------------
+# output caching
+
+gitweb_enable_caching
+
+test_expect_success 'caching enabled (non-existent commit, 404 error)' '
+	gitweb_run "p=.git;a=commit;h=non-existent" &&
+	grep "Status: 404 Not Found" gitweb.headers &&
+	grep "404 - Unknown commit object" gitweb.body
+'
+test_debug 'echo "log"     && cat gitweb.log'
+test_debug 'echo "headers" && cat gitweb.headers'
+test_debug 'echo "body"    && cat gitweb.body'
 
 test_done
diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
index dd83890..bc8eb01 100755
--- a/t/t9502-gitweb-standalone-parse-output.sh
+++ b/t/t9502-gitweb-standalone-parse-output.sh
@@ -112,4 +112,37 @@ test_expect_success 'snapshot: hierarchical branch name (xx/test)' '
 '
 test_debug 'cat gitweb.headers'
 
+
+# ----------------------------------------------------------------------
+# whether gitweb with caching enabled produces the same output
+
+test_expect_success 'setup for caching tests (utf8 patch, binary file)' '
+	. "$TEST_DIRECTORY"/t3901-utf8.txt &&
+	cp "$TEST_DIRECTORY"/test9200a.png image.png &&
+	git add image.png &&
+	git commit -F "$TEST_DIRECTORY"/t3900/1-UTF-8.txt &&
+	gitweb_run "p=.git;a=patch" &&
+	mv gitweb.body no_cache.txt &&
+	gitweb_run "p=.git;a=blob_plain;f=image.png" &&
+	mv gitweb.body no_cache.png
+'
+
+gitweb_enable_caching
+
+for desc in 'generating cache' 'cached version'; do
+	test_expect_success "caching enabled, plain text (utf8) output, $desc" '
+		gitweb_run "p=.git;a=patch" &&
+		mv gitweb.body cache.txt &&
+		test_cmp no_cache.txt cache.txt
+	'
+done
+
+for desc in 'generating cache' 'cached version'; do
+	test_expect_success "caching enabled, binary output (raw), $desc" '
+		gitweb_run "p=.git;a=blob_plain;f=image.png" &&
+		mv gitweb.body cache.png &&
+		cmp no_cache.png cache.png
+	'
+done
+
 test_done
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 05/18] gitweb: Regression fix concerning binary output of files
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (3 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 04/18] gitweb: Minimal testing of gitweb caching John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-09 23:33   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 06/18] gitweb: Add more explicit means of disabling 'Generating...' page John 'Warthog9' Hawley
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

This solves the regression introduced with v7.2 of the gitweb-caching code,
fix proposed by Jakub in his e-mail.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/gitweb.perl |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 3c3ff08..f2ef3da 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5664,7 +5664,7 @@ sub git_blob_plain {
 	if ($caching_enabled) {
 		open BINOUT, '>', $fullhashbinpath or die_error(500, "Could not open bin dump file");
 	}else{
-		open BINOUT, '>', \$fullhashbinpath or die_error(500, "Could not open bin dump file");
+		open BINOUT, '>&', \$fullhashbinpath or die_error(500, "Could not open bin dump file");
 	}
 	binmode BINOUT, ':raw';
 	print BINOUT <$fd>;
@@ -5957,7 +5957,7 @@ sub git_snapshot {
 	if ($caching_enabled) {
 		open BINOUT, '>', $fullhashbinpath or die_error(500, "Could not open bin dump file");
 	}else{
-		open BINOUT, '>', \$fullhashbinpath or die_error(500, "Could not open bin dump file");
+		open BINOUT, '>&', \$fullhashbinpath or die_error(500, "Could not open bin dump file");
 	}
 	binmode BINOUT, ':raw';
 	print BINOUT <$fd>;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 06/18] gitweb: Add more explicit means of disabling 'Generating...' page
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (4 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 05/18] gitweb: Regression fix concerning binary output of files John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-09 21:57 ` [PATCH 07/18] gitweb: Revert back to $cache_enable vs. $caching_enabled John 'Warthog9' Hawley
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

As requested this adds $cacheGenStatus variable, default 1 (on).
If caching is enabled it will explicitly disble the display of the
'Generating...' page and just force the user to stall indefinately.

Also adding it to gitweb's test code as I'm sure the 'Generating...'
page isn't that useful there.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/gitweb.perl  |    6 ++++++
 gitweb/lib/cache.pl |    2 ++
 t/gitweb-lib.sh     |    1 +
 3 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f2ef3da..05e7ba6 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -293,6 +293,12 @@ our $maxCacheLife = 18000;
 # Used to enable or disable background forking of the gitweb caching.  Mainly here for debugging purposes
 our $cacheDoFork = 1;
 
+# Used to enable or disable the foreground "Generating..." page.  This is here to be more explicit should
+# people want to disable it.
+# Default: 1 (True - Enabled)
+# To disable set to 0
+our $cacheGenStatus = 1;
+
 our $fullhashpath = *STDOUT;
 our $fullhashbinpath = *STDOUT;
 our $fullhashbinpathfinal = *STDOUT;
diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
index dd14bfb..a8ee99e 100644
--- a/gitweb/lib/cache.pl
+++ b/gitweb/lib/cache.pl
@@ -224,6 +224,8 @@ sub cacheWaitForUpdate {
 		$action eq "rss"
 		||
 		$action eq "opml"
+		||
+		! $cacheGenStatus
 	){
 		do {
 			sleep 2 if $x > 0;
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index 16ce811..10c4a3d 100755
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -26,6 +26,7 @@ our \$projects_list = '';
 our \$export_ok = '';
 our \$strict_export = '';
 our \$maxload = undef;
+our \$cacheGenStatus = 0;
 
 EOF
 
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 07/18] gitweb: Revert back to $cache_enable vs. $caching_enabled
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (5 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 06/18] gitweb: Add more explicit means of disabling 'Generating...' page John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-09 23:38   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 08/18] gitweb: Change is_cacheable() to return true always John 'Warthog9' Hawley
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

Simple enough, $cache_enable (along with all caching variables) are
already in production in multiple places and doing a small semantic
change without backwards compatibility is pointless breakage.

This reverts back to the previous variable to enable / disable caching

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/gitweb.perl |   12 ++++++------
 t/gitweb-lib.sh    |    2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 05e7ba6..5eb0309 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -258,7 +258,7 @@ our $maxload = 300;
 # that the cache directory be periodically completely deleted, and this is safe to perform.
 # Suggested mechanism
 # mv $cacheidr $cachedir.flush;mkdir $cachedir;rm -rf $cachedir.flush
-our $caching_enabled = 0;
+our $cache_enable = 0;
 
 # Used to set the minimum cache timeout for the dynamic caching algorithm.  Basically
 # if we calculate the cache to be under this number of seconds we set the cache timeout
@@ -1138,7 +1138,7 @@ sub dispatch {
 	    !$project) {
 		die_error(400, "Project needed");
 	}
-	if ($caching_enabled && is_cacheable($action)) {
+	if ($cache_enable && is_cacheable($action)) {
 		cache_fetch($action);
 	} else {
 		$actions{$action}->();
@@ -3516,7 +3516,7 @@ sub git_header_html {
 	if (defined $cgi->http('HTTP_ACCEPT') &&
 	    $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
 	    $cgi->Accept('application/xhtml+xml') != 0 &&
-	    !$caching_enabled) {
+	    !$cache_enable) {
 		$content_type = 'application/xhtml+xml';
 	} else {
 		$content_type = 'text/html';
@@ -3757,7 +3757,7 @@ sub die_error {
 	#
 	# Reset the output so that we are actually going to STDOUT as opposed
 	# to buffering the output.
-	reset_output() if ($caching_enabled);
+	reset_output() if ($cache_enable && ! $cacheErrorCache);
 
 	git_header_html($http_responses{$status}, undef, %opts);
 	print <<EOF;
@@ -5667,7 +5667,7 @@ sub git_blob_plain {
 			($sandbox ? 'attachment' : 'inline')
 			. '; filename="' . $save_as . '"');
 	local $/ = undef;
-	if ($caching_enabled) {
+	if ($cache_enable) {
 		open BINOUT, '>', $fullhashbinpath or die_error(500, "Could not open bin dump file");
 	}else{
 		open BINOUT, '>&', \$fullhashbinpath or die_error(500, "Could not open bin dump file");
@@ -5960,7 +5960,7 @@ sub git_snapshot {
 
 	open my $fd, "-|", $cmd
 		or die_error(500, "Execute git-archive failed");
-	if ($caching_enabled) {
+	if ($cache_enable) {
 		open BINOUT, '>', $fullhashbinpath or die_error(500, "Could not open bin dump file");
 	}else{
 		open BINOUT, '>&', \$fullhashbinpath or die_error(500, "Could not open bin dump file");
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index 10c4a3d..a0f7696 100755
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -56,7 +56,7 @@ EOF
 gitweb_enable_caching () {
 	test_expect_success 'enable caching' '
 		cat >>gitweb_config.perl <<-\EOF &&
-		our $caching_enabled = 1;
+		our $cache_enable = 1;
 		our $minCacheTime = 60*60*24*7*30;     # very long expiration time for tests (a month)
 		our $maxCacheTime = 60*60*24*7*30*365; # upper bound for dynamic (adaptive) caching
 		our $cachedir = "cache";               # for testsuite to clear the right thing
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 08/18] gitweb: Change is_cacheable() to return true always
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (6 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 07/18] gitweb: Revert back to $cache_enable vs. $caching_enabled John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-09 23:46   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 09/18] gitweb: Revert reset_output() back to original code John 'Warthog9' Hawley
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

is_cacheable() was set to return false for blame or blame_incremental
which both use unique urls so there's no reason this shouldn't pass
through the caching engine.

Leaving the function in place for now should something actually arrise
that we can't use caching for (think ajaxy kinda things likely).

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/gitweb.perl |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 5eb0309..1d8bc74 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -798,7 +798,8 @@ our %actions = (
 );
 sub is_cacheable {
 	my $action = shift;
-	return !($action eq 'blame_data' || $action eq 'blame_incremental');
+	# There are no known actions that do no involve a unique URL that shouldn't be cached.
+	return 1;
 }
 
 # finally, we have the hash of allowed extra_options for the commands that
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 09/18] gitweb: Revert reset_output() back to original code
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (7 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 08/18] gitweb: Change is_cacheable() to return true always John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-09 23:58   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type John 'Warthog9' Hawley
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

Reverted change to reset_output as

	open STDOUT, ">&", \*STDOUT_REAL;

causes assertion failures:

	Assertion !((((s->var)->sv_flags & (0x00004000|0x00008000)) == 0x00008000) && (((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVGV || ((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVLV)) failed: file "scalar.xs", line 49 at gitweb.cgi line 1221.

if we encounter an error *BEFORE* we've ever changed the output.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/gitweb.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 1d8bc74..e8c028b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1225,7 +1225,7 @@ sub change_output {
 
 sub reset_output {
 	# This basically takes STDOUT_REAL and puts it back as STDOUT
-	open STDOUT, ">&", \*STDOUT_REAL;
+	open(STDOUT,">&STDOUT_REAL");
 }
 
 sub run {
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (8 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 09/18] gitweb: Revert reset_output() back to original code John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-10  0:06   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 11/18] gitweb: add isDumbClient() check John 'Warthog9' Hawley
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

This is fairly self explanitory, these are here just to centralize the checking
for these types of actions, as special things need to be done with regards to
them inside the caching engine.

isBinaryAction() returns true if the action deals with creating binary files
(this needing :raw output)

isFeedAction() returns true if the action deals with a news feed of some sort,
basically used to bypass the 'Generating...' message should it be a news reader
as those will explode badly on that page.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/lib/cache.pl |   69 ++++++++++++++++++++++++++-------------------------
 1 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
index a8ee99e..d55b572 100644
--- a/gitweb/lib/cache.pl
+++ b/gitweb/lib/cache.pl
@@ -88,6 +88,34 @@ sub cache_fetch {
 	#$actions{$action}->();
 }
 
+sub isBinaryAction {
+	my ($action) = @_;
+
+	if(
+		$action eq "snapshot"
+		||
+		$action eq "blob_plain"
+	){
+		return 1;	# True
+	}
+
+	return 0;		# False
+}
+
+sub isFeedAction {
+	if(
+		$action eq "atom"
+		||
+		$action eq "rss"
+		||
+		$action eq "opml"
+	){
+		return 1;	# True
+	}
+
+	return 0;		# False
+}
+
 sub cacheUpdate {
 	my ($action,$areForked) = @_;
 	my $lockingStatus;
@@ -115,11 +143,7 @@ sub cacheUpdate {
 		}
 	}
 
-	if(
-		$action eq "snapshot"
-		||
-		$action eq "blob_plain"
-	){
+	if( isBinaryAction($action) ){
 		my $openstat = open(cacheFileBinWT, '>>:utf8', "$fullhashbinpath");
 		my $lockStatBin = flock(cacheFileBinWT,LOCK_EX|LOCK_NB);
 	}
@@ -146,11 +170,7 @@ sub cacheUpdate {
 		}
 	}
 
-	if(
-		$action eq "snapshot"
-		||
-		$action eq "blob_plain"
-	){
+	if( isBinaryAction($action) ){
 		my $openstat = open(cacheFileBinFINAL, '>:utf8', "$fullhashbinpathfinal");
 		$lockStatBIN = flock(cacheFileBinFINAL,LOCK_EX);
 
@@ -168,11 +188,7 @@ sub cacheUpdate {
 	$|++;
 	print cacheFile "$output";
 	$|--;
-	if(
-		$action eq "snapshot"
-		||
-		$action eq "blob_plain"
-	){
+	if( isBinaryAction($action) ){
 		move("$fullhashbinpath", "$fullhashbinpathfinal") or die "Binary Cache file could not be updated: $!";
 
 		flock(cacheFileBinFINAL,LOCK_UN);
@@ -219,14 +235,10 @@ sub cacheWaitForUpdate {
 	}
 
 	if(
-		$action eq "atom"
-		||
-		$action eq "rss"
-		||
-		$action eq "opml"
+		isFeedAction($action)
 		||
 		! $cacheGenStatus
-	){
+	  ){
 		do {
 			sleep 2 if $x > 0;
 			open(cacheFile, '<:utf8', "$fullhashpath");
@@ -310,17 +322,10 @@ sub cacheDisplay {
 		cacheWaitForUpdate($action);
 	}
 
-	if(
-		(
-			$action eq "snapshot"
-			||
-			$action eq "blob_plain"
-		)
-	){
+	if( isBinaryAction($action) ){
 		my $openstat = open(cacheFileBin, '<', "$fullhashbinpathfinal");
 		$lockStatBIN = flock(cacheFileBin,LOCK_SH|LOCK_NB);
 		if (! $lockStatBIN ){
-			system ("echo 'cacheDisplay - bailing due to binary lock failure' >> /tmp/gitweb.log");
 			close(cacheFile);
 			close(cacheFileBin);
 			cacheWaitForUpdate($action);
@@ -332,11 +337,7 @@ sub cacheDisplay {
 	while( <cacheFile> ){
 		print $_;
 	}
-	if(
-		$action eq "snapshot"
-		||
-		$action eq "blob_plain"
-	){
+	if( isBinaryAction($action) ){
 		binmode STDOUT, ':raw';
 		print <cacheFileBin>;
 		binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 11/18] gitweb: add isDumbClient() check
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (9 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-10  0:12   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs John 'Warthog9' Hawley
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

Basic check for the claimed Agent string, if it matches a known
blacklist (wget and curl currently) don't display the 'Generating...'
page.

Jakub has mentioned a couple of other possible ways to handle
this, so if a better way comes along this should be used as a
wrapper to any better way we can find to deal with this.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/lib/cache.pl |   30 ++++++++++++++++++++++++++++++
 1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
index d55b572..5182a94 100644
--- a/gitweb/lib/cache.pl
+++ b/gitweb/lib/cache.pl
@@ -116,6 +116,34 @@ sub isFeedAction {
 	return 0;		# False
 }
 
+# There have been a number of requests that things like "dumb" clients, I.E. wget
+# lynx, links, etc (things that just download, but don't parse the html) actually
+# work without getting the wonkiness that is the "Generating..." page.
+#
+# There's only one good way to deal with this, and that's to read the browser User
+# Agent string and do matching based on that.  This has a whole slew of error cases
+# and mess, but there's no other way to determine if the "Generating..." page
+# will break things.
+#
+# This assumes the client is not dumb, thus the default behavior is to return
+# "false" (0) (and eventually the "Generating..." page).  If it is a dumb client
+# return "true" (1)
+sub isDumbClient {
+	my($user_agent) = $ENV{'HTTP_USER_AGENT'};
+	
+	if(
+		# wget case
+		$user_agent =~ /^Wget/i
+		||
+		# curl should be excluded I think, probably better safe than sorry
+		$user_agent =~ /^curl/i
+	  ){
+		return 1;	# True
+	}
+
+	return 0;
+}
+
 sub cacheUpdate {
 	my ($action,$areForked) = @_;
 	my $lockingStatus;
@@ -237,6 +265,8 @@ sub cacheWaitForUpdate {
 	if(
 		isFeedAction($action)
 		||
+		isDumbClient()
+		||
 		! $cacheGenStatus
 	  ){
 		do {
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (10 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 11/18] gitweb: add isDumbClient() check John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-10  0:16   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 13/18] gitweb: Add commented url & url hash to page footer John 'Warthog9' Hawley
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

This isn't a huge change, it just adds global variables for the file handles,
an additional cleanup to localize the variable a bit more which should alleviate
the issues that Jakub had with my original approach.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/lib/cache.pl |  114 +++++++++++++++++++++++++++++++-------------------
 1 files changed, 71 insertions(+), 43 deletions(-)

diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
index 5182a94..fafc028 100644
--- a/gitweb/lib/cache.pl
+++ b/gitweb/lib/cache.pl
@@ -14,6 +14,12 @@ use Digest::MD5 qw(md5 md5_hex md5_base64);
 use Fcntl ':flock';
 use File::Copy;
 
+# Global declarations
+our $cacheFile;
+our $cacheFileBG;
+our $cacheFileBinWT;
+our $cacheFileBin;
+
 sub cache_fetch {
 	my ($action) = @_;
 	my $cacheTime = 0;
@@ -49,9 +55,9 @@ sub cache_fetch {
 	}else{
 		#if cache is out dated, update
 		#else displayCache();
-		open(cacheFile, '<', "$fullhashpath");
-		stat(cacheFile);
-		close(cacheFile);
+		open($cacheFile, '<', "$fullhashpath");
+		stat($cacheFile);
+		close($cacheFile);
 		my $stat_time = (stat(_))[9];
 		my $stat_size = (stat(_))[7];
 
@@ -150,13 +156,13 @@ sub cacheUpdate {
 	my $fileData = "";
 
 	if($backgroundCache){
-		open(cacheFileBG, '>:utf8', "$fullhashpath.bg");
-		my $lockStatBG = flock(cacheFileBG,LOCK_EX|LOCK_NB);
+		open($cacheFileBG, '>:utf8', "$fullhashpath.bg");
+		my $lockStatBG = flock($cacheFileBG,LOCK_EX|LOCK_NB);
 
 		$lockStatus = $lockStatBG;
 	}else{
-		open(cacheFile, '>:utf8', \$fullhashpath);
-		my $lockStat = flock(cacheFile,LOCK_EX|LOCK_NB);
+		open($cacheFile, '>:utf8', \$fullhashpath);
+		my $lockStat = flock($cacheFile,LOCK_EX|LOCK_NB);
 
 		$lockStatus = $lockStat;
 	}
@@ -172,8 +178,8 @@ sub cacheUpdate {
 	}
 
 	if( isBinaryAction($action) ){
-		my $openstat = open(cacheFileBinWT, '>>:utf8', "$fullhashbinpath");
-		my $lockStatBin = flock(cacheFileBinWT,LOCK_EX|LOCK_NB);
+		my $openstat = open($cacheFileBinWT, '>>:utf8', "$fullhashbinpath");
+		my $lockStatBin = flock($cacheFileBinWT,LOCK_EX|LOCK_NB);
 	}
 
 	# Trap all output from the action
@@ -186,8 +192,8 @@ sub cacheUpdate {
 
 
 	if($backgroundCache){
-		open(cacheFile, '>:utf8', "$fullhashpath");
-		$lockStat = flock(cacheFile,LOCK_EX);
+		open($cacheFile, '>:utf8', "$fullhashpath");
+		$lockStat = flock($cacheFile,LOCK_EX);
 
 		if (! $lockStat ){
 			if ( $areForked ){
@@ -199,8 +205,8 @@ sub cacheUpdate {
 	}
 
 	if( isBinaryAction($action) ){
-		my $openstat = open(cacheFileBinFINAL, '>:utf8', "$fullhashbinpathfinal");
-		$lockStatBIN = flock(cacheFileBinFINAL,LOCK_EX);
+		my $openstat = open($cacheFileBinFINAL, '>:utf8', "$fullhashbinpathfinal");
+		$lockStatBIN = flock($cacheFileBinFINAL,LOCK_EX);
 
 		if (! $lockStatBIN ){
 			if ( $areForked ){
@@ -214,24 +220,24 @@ sub cacheUpdate {
 	# Actually dump the output to the proper file handler
 	local $/ = undef;
 	$|++;
-	print cacheFile "$output";
+	print $cacheFile "$output";
 	$|--;
 	if( isBinaryAction($action) ){
 		move("$fullhashbinpath", "$fullhashbinpathfinal") or die "Binary Cache file could not be updated: $!";
 
-		flock(cacheFileBinFINAL,LOCK_UN);
-		close(cacheFileBinFINAL);
+		flock($cacheFileBinFINAL,LOCK_UN);
+		close($cacheFileBinFINAL);
 
-		flock(cacheFileBinWT,LOCK_UN);
-		close(cacheFileBinWT);
+		flock($cacheFileBinWT,LOCK_UN);
+		close($cacheFileBinWT);
 	}
 
-	flock(cacheFile,LOCK_UN);
-	close(cacheFile);
+	flock($cacheFile,LOCK_UN);
+	close($cacheFile);
 
 	if($backgroundCache){
-		flock(cacheFileBG,LOCK_UN);
-		close(cacheFileBG);
+		flock($cacheFileBG,LOCK_UN);
+		close($cacheFileBG);
 	}
 
 	if ( $areForked ){
@@ -250,10 +256,10 @@ sub cacheWaitForUpdate {
 
 	if( $backgroundCache ){
 		if( -e "$fullhashpath" ){
-			open(cacheFile, '<:utf8', "$fullhashpath");
-			$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
-			stat(cacheFile);
-			close(cacheFile);
+			open($cacheFile, '<:utf8', "$fullhashpath");
+			$lockStat = flock($cacheFile,LOCK_SH|LOCK_NB);
+			stat($cacheFile);
+			close($cacheFile);
 
 			if( $lockStat && ( (stat(_))[9] > (time - $maxCacheLife) ) ){
 				cacheDisplay($action);
@@ -271,9 +277,9 @@ sub cacheWaitForUpdate {
 	  ){
 		do {
 			sleep 2 if $x > 0;
-			open(cacheFile, '<:utf8', "$fullhashpath");
-			$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
-			close(cacheFile);
+			open($cacheFile, '<:utf8', "$fullhashpath");
+			$lockStat = flock($cacheFile,LOCK_SH|LOCK_NB);
+			close($cacheFile);
 			$x++;
 			$combinedLockStat = $lockStat;
 		} while ((! $combinedLockStat) && ($x < $max));
@@ -326,9 +332,9 @@ EOF
 	do {
 		print ".";
 		sleep 2 if $x > 0;
-		open(cacheFile, '<:utf8', "$fullhashpath");
-		$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
-		close(cacheFile);
+		open($cacheFile, '<:utf8', "$fullhashpath");
+		$lockStat = flock($cacheFile,LOCK_SH|LOCK_NB);
+		close($cacheFile);
 		$x++;
 		$combinedLockStat = $lockStat;
 	} while ((! $combinedLockStat) && ($x < $max));
@@ -339,41 +345,63 @@ EOF
 	return;
 }
 
+sub cacheDisplayErr {
+
+	return if ( ! -e "$fullhashpath.err" );
+
+	open($cacheFileErr, '<:utf8', "$fullhashpath.err");
+	$lockStatus = flock($cacheFileErr,LOCK_SH|LOCK_NB);
+
+	if (! $lockStatus ){
+		show_warning(
+				"<p>".
+				"<strong>*** Warning ***:</strong> Locking error when trying to lock error cache page, file $fullhashpath.err<br/>/\n".
+				"This is about as screwed up as it gets folks - see your systems administrator for more help with this.".
+				"<p>"
+				);
+	}
+
+	while( <$cacheFileErr> ){
+		print $_;
+	}
+	exit(0);
+}
+
 sub cacheDisplay {
 	local $/ = undef;
 	$|++;
 
 	my ($action) = @_;
-	open(cacheFile, '<:utf8', "$fullhashpath");
-	$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
+	open($cacheFile, '<:utf8', "$fullhashpath");
+	$lockStat = flock($cacheFile,LOCK_SH|LOCK_NB);
 
 	if (! $lockStat ){
-		close(cacheFile);
+		close($cacheFile);
 		cacheWaitForUpdate($action);
 	}
 
 	if( isBinaryAction($action) ){
-		my $openstat = open(cacheFileBin, '<', "$fullhashbinpathfinal");
-		$lockStatBIN = flock(cacheFileBin,LOCK_SH|LOCK_NB);
+		my $openstat = open($cacheFileBin, '<', "$fullhashbinpathfinal");
+		$lockStatBIN = flock($cacheFileBin,LOCK_SH|LOCK_NB);
 		if (! $lockStatBIN ){
-			close(cacheFile);
-			close(cacheFileBin);
+			close($cacheFile);
+			close($cacheFileBin);
 			cacheWaitForUpdate($action);
 		}
 
 		my $binfilesize = -s "$fullhashbinpathfinal";
 		print "Content-Length: $binfilesize";
 	}
-	while( <cacheFile> ){
+	while( <$cacheFile> ){
 		print $_;
 	}
 	if( isBinaryAction($action) ){
 		binmode STDOUT, ':raw';
-		print <cacheFileBin>;
+		print <$cacheFileBin>;
 		binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
-		close(cacheFileBin);
+		close($cacheFileBin);
 	}
-	close(cacheFile);
+	close($cacheFile);
 	$|--;
 }
 
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 13/18] gitweb: Add commented url & url hash to page footer
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (11 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-10  0:26   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 14/18] gitweb: add print_transient_header() function for central header printing John 'Warthog9' Hawley
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

This is mostly a debugging tool, but it adds a small bit of information
to the footer:

<!--
	Full URL: |http://localhost/gitweb-caching/gitweb.cgi?p=/project.git;a=summary|
	URL Hash: |7a31cfb8a43f5643679eec88aa9d7981|
-->

The first bit tells you what the url that generated the page actually was, the second is
the hash used to store the file with the first two characters being used as the directory:

<cachedir>/7a/31cfb8a43f5643679eec88aa9d7981

Also useful for greping through the existing cache and finding files with unique paths that
you may want to explicitly flush.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/gitweb.perl  |    7 +++++++
 gitweb/lib/cache.pl |    4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e8c028b..7f8292e 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -303,6 +303,9 @@ our $fullhashpath = *STDOUT;
 our $fullhashbinpath = *STDOUT;
 our $fullhashbinpathfinal = *STDOUT;
 
+our $full_url;
+our $urlhash;
+
 # configuration for 'highlight' (http://www.andre-simon.de/)
 # match by basename
 our %highlight_basename = (
@@ -3663,6 +3666,10 @@ sub git_footer_html {
 
 	print "<div class=\"page_footer\">\n";
 	print "<div class=\"cachetime\">Cache Last Updated: ". gmtime( time ) ." GMT</div>\n";
+	print	"<!--\n".
+		"	Full URL: |$full_url|\n".
+		"	URL Hash: |$urlhash|\n".
+		"-->\n" if ($cache_enable);
 	if (defined $project) {
 		my $descr = git_get_project_description($project);
 		if (defined $descr) {
diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
index fafc028..63dbe9e 100644
--- a/gitweb/lib/cache.pl
+++ b/gitweb/lib/cache.pl
@@ -30,8 +30,8 @@ sub cache_fetch {
 		print "Cache directory created successfully\n";
 	}
 
-	our $full_url = "$my_url?". $ENV{'QUERY_STRING'};
-	our $urlhash = md5_hex($full_url);
+	$full_url = "$my_url?". $ENV{'QUERY_STRING'};
+	$urlhash = md5_hex($full_url);
 	our $fullhashdir = "$cachedir/". substr( $urlhash, 0, 2) ."/";
 
 	eval { mkpath( $fullhashdir, 0, 0777 ) };
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 14/18] gitweb: add print_transient_header() function for central header printing
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (12 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 13/18] gitweb: Add commented url & url hash to page footer John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-10  0:36   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 15/18] gitweb: Add show_warning() to display an immediate warning, with refresh John 'Warthog9' Hawley
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

There are a few things I would like to reuse the transient header
information I'm using, currently this is only the 'Generating...'
page, but there is at least one additional warning page I would
like to use this on.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/lib/cache.pl |   47 ++++++++++++++++++++++++++---------------------
 1 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
index 63dbe9e..723ae9b 100644
--- a/gitweb/lib/cache.pl
+++ b/gitweb/lib/cache.pl
@@ -94,6 +94,31 @@ sub cache_fetch {
 	#$actions{$action}->();
 }
 
+sub print_transient_header {
+	print $::cgi->header(
+				-type=>'text/html',
+				-charset => 'utf-8',
+				-status=> 200,
+				-expires => 'now',
+				# HTTP/1.0
+				-Pragma => 'no-cache',
+				# HTTP/1.1
+				-Cache_Control => join(
+							', ',
+							qw(
+								private
+								no-cache
+								no-store
+								must-revalidate
+								max-age=0
+								pre-check=0
+								post-check=0
+							)
+						)
+				);
+	return;
+}
+
 sub isBinaryAction {
 	my ($action) = @_;
 
@@ -292,27 +317,7 @@ sub cacheWaitForUpdate {
 
 	$| = 1;
 
-	print $::cgi->header(
-				-type=>'text/html',
-				-charset => 'utf-8',
-				-status=> 200,
-				-expires => 'now',
-				# HTTP/1.0
-				-Pragma => 'no-cache',
-				# HTTP/1.1
-				-Cache_Control => join(
-							', ',
-							qw(
-								private
-								no-cache
-								no-store
-								must-revalidate
-								max-age=0
-								pre-check=0
-								post-check=0
-							)
-						)
-				);
+	print_transient_header();
 
 	print <<EOF;
 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www/w3.porg/TR/html4/strict.dtd">
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 15/18] gitweb: Add show_warning() to display an immediate warning, with refresh
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (13 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 14/18] gitweb: add print_transient_header() function for central header printing John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-10  1:01   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 16/18] gitweb: When changing output (STDOUT) change STDERR as well John 'Warthog9' Hawley
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

die_error() is an immediate and abrupt action.  show_warning() more or less
functions identically, except that the page generated doesn't use the
gitweb header or footer (in case they are broken) and has an auto-refresh
(10 seconds) built into it.

This makes use of print_transient_header() which is also used in the
'Generating...' page.  Currently the only warning it throws is about
the cache needing to be created.  If that fails it's a fatal error
and we call die_error()

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/lib/cache.pl |   36 +++++++++++++++++++++++++++++++++---
 1 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
index 723ae9b..28e4240 100644
--- a/gitweb/lib/cache.pl
+++ b/gitweb/lib/cache.pl
@@ -25,9 +25,13 @@ sub cache_fetch {
 	my $cacheTime = 0;
 
 	if(! -d $cachedir){
-		print "*** Warning ***: Caching enabled but cache directory does not exsist.  ($cachedir)\n";
-		mkdir ("cache", 0755) || die "Cannot create cache dir - you will need to manually create";
-		print "Cache directory created successfully\n";
+		mkdir ("cache", 0755) || die_error(500, "Internal Server Error", "Cannot create cache dir () - you will need to manually create");
+		show_warning(
+				"<p>".
+				"<strong>*** Warning ***:</strong> Caching enabled but cache directory did not exsist.  ($cachedir)<br/>/\n".
+				"Cache directory created successfully\n".
+				"<p>"
+				);
 	}
 
 	$full_url = "$my_url?". $ENV{'QUERY_STRING'};
@@ -119,6 +123,32 @@ sub print_transient_header {
 	return;
 }
 
+sub show_warning {
+	$| = 1;
+
+	my $warning = esc_html(shift) || "Unknown Warning";
+
+	print_transient_header();
+
+	print <<EOF;
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www/w3.porg/TR/html4/strict.dtd">
+<!-- git web w/caching interface version $version, (C) 2006-2010, John 'Warthog9' Hawley <warthog9\@kernel.org> -->
+<!-- git core binaries version $git_version -->
+<head>
+<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>
+<meta name="generator" content="gitweb/$version git/$git_version"/>
+<meta name="robots" content="index, nofollow"/>
+<meta http-equiv="refresh" content="10"/>
+<title>$title</title>
+</head>
+<body>
+$warning
+</body>
+</html>
+EOF
+	exit(0);
+}
+
 sub isBinaryAction {
 	my ($action) = @_;
 
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 16/18] gitweb: When changing output (STDOUT) change STDERR as well
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (14 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 15/18] gitweb: Add show_warning() to display an immediate warning, with refresh John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-10  1:36   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 17/18] gitweb: Prepare for cached error pages & better error page handling John 'Warthog9' Hawley
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

This sets up a trap for STDERR as well as STDOUT.  This should
prevent any transient error messages from git itself percolating
up to gitweb and outputting errant information before the HTTP
header has been sent.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/gitweb.perl  |   22 +++++++++++++++++++++-
 gitweb/lib/cache.pl |   22 ----------------------
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7f8292e..d39982a 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1214,6 +1214,10 @@ sub evaluate_argv {
 sub change_output {
 	our $output;
 
+	#
+	# STDOUT
+	#
+
 	# Trap the 'proper' STDOUT to STDOUT_REAL for things like error messages and such
 	open(STDOUT_REAL,">&STDOUT") or die "Unable to capture STDOUT $!\n";
 	print STDOUT_REAL "";
@@ -1223,12 +1227,28 @@ sub change_output {
 
 	# Trap STDOUT to the $output variable, which is what I was using in the original
 	# patch anyway.
-	open(STDOUT,">", \$output) || die "Unable to open STDOUT: $!"; #open STDOUT handle to use $var
+	open(STDOUT,">", \$output) || die "Unable to open STDOUT: $!"; #open STDOUT handle to use $output
+
+	#
+	# STDERR
+	#
+
+	# Trap the 'proper' STDOUT to STDOUT_REAL for things like error messages and such
+	open(STDERR_REAL,">&STDERR") or die "Unable to capture STDERR $!\n";
+	print STDERR_REAL "";
+
+	# Close STDOUT, so that it isn't being used anymore.
+	close STDERR;
+
+	# Trap STDOUT to the $output variable, which is what I was using in the original
+	# patch anyway.
+	open(STDERR,">", \$output_err) || die "Unable to open STDERR: $!"; #open STDERR handle to use $output_err
 }
 
 sub reset_output {
 	# This basically takes STDOUT_REAL and puts it back as STDOUT
 	open(STDOUT,">&STDOUT_REAL");
+	open(STDERR,">&STDERR_REAL");
 }
 
 sub run {
diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
index 28e4240..a8c902d 100644
--- a/gitweb/lib/cache.pl
+++ b/gitweb/lib/cache.pl
@@ -380,28 +380,6 @@ EOF
 	return;
 }
 
-sub cacheDisplayErr {
-
-	return if ( ! -e "$fullhashpath.err" );
-
-	open($cacheFileErr, '<:utf8', "$fullhashpath.err");
-	$lockStatus = flock($cacheFileErr,LOCK_SH|LOCK_NB);
-
-	if (! $lockStatus ){
-		show_warning(
-				"<p>".
-				"<strong>*** Warning ***:</strong> Locking error when trying to lock error cache page, file $fullhashpath.err<br/>/\n".
-				"This is about as screwed up as it gets folks - see your systems administrator for more help with this.".
-				"<p>"
-				);
-	}
-
-	while( <$cacheFileErr> ){
-		print $_;
-	}
-	exit(0);
-}
-
 sub cacheDisplay {
 	local $/ = undef;
 	$|++;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 17/18] gitweb: Prepare for cached error pages & better error page handling
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (15 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 16/18] gitweb: When changing output (STDOUT) change STDERR as well John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-10  1:49   ` Jakub Narebski
  2010-12-09 21:57 ` [PATCH 18/18] gitweb: Add better error handling for gitweb caching John 'Warthog9' Hawley
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

To quote myself from an e-mail of mine:

	I've got a hammer, it clearly solves all problems!

This is the prepatory work to set up a mechanism inside the
caching engine to cache the error pages instead of throwing
them straight out to the client.

This adds two functions:

die_error_cache() - this gets back called from die_error() so
that the error message generated can be cached.

cacheDisplayErr() - this is a simplified version of cacheDisplay()
that does an initial check, if the error page exists - display it
and exit.  If not, return.

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/lib/cache.pl |   52 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
index a8c902d..6cb82c8 100644
--- a/gitweb/lib/cache.pl
+++ b/gitweb/lib/cache.pl
@@ -302,6 +302,36 @@ sub cacheUpdate {
 	}
 }
 
+sub die_error_cache {
+	my ($output) = @_;
+
+	open(my $cacheFileErr, '>:utf8', "$fullhashpath.err");
+	my $lockStatus = flock($cacheFileErr,LOCK_EX|LOCK_NB);
+
+	if (! $lockStatus ){
+		if ( $areForked ){
+			exit(0);
+		}else{
+			return;
+		}
+	}
+
+	# Actually dump the output to the proper file handler
+	local $/ = undef;
+	$|++;
+	print $cacheFileErr "$output";
+	$|--;
+
+	flock($cacheFileErr,LOCK_UN);
+	close($cacheFileErr);
+
+	if ( $areForked ){
+		exit(0);
+	}else{
+		return;
+	}
+}
+
 
 sub cacheWaitForUpdate {
 	my ($action) = @_;
@@ -380,6 +410,28 @@ EOF
 	return;
 }
 
+sub cacheDisplayErr {
+
+	return if ( ! -e "$fullhashpath.err" );
+
+	open($cacheFileErr, '<:utf8', "$fullhashpath.err");
+	$lockStatus = flock($cacheFileErr,LOCK_SH|LOCK_NB);
+
+	if (! $lockStatus ){
+		show_warning(
+				"<p>".
+				"<strong>*** Warning ***:</strong> Locking error when trying to lock error cache page, file $fullhashpath.err<br/>/\n".
+				"This is about as screwed up as it gets folks - see your systems administrator for more help with this.".
+				"<p>"
+				);
+	}
+
+	while( <$cacheFileErr> ){
+		print $_;
+	}
+	exit(0);
+}
+
 sub cacheDisplay {
 	local $/ = undef;
 	$|++;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 18/18] gitweb: Add better error handling for gitweb caching
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (16 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 17/18] gitweb: Prepare for cached error pages & better error page handling John 'Warthog9' Hawley
@ 2010-12-09 21:57 ` John 'Warthog9' Hawley
  2010-12-10  1:56   ` Jakub Narebski
  2010-12-09 23:26 ` [PATCH 00/18] Gitweb caching v8 Jakub Narebski
  2010-12-10  0:39 ` Junio C Hamano
  19 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-12-09 21:57 UTC (permalink / raw)
  To: git

This basically finishes the plumbing for caching the error pages
as the are generated.

If an error is hit, create a <hash>.err file with the error.  This
will interrupt all currently waiting processes and they will display
the error, without any additional refreshing.

On a new request a generation will be attempted, should it succed the
<hash.err> file is removed (if it exists).

Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
---
 gitweb/gitweb.perl  |    8 ++++++++
 gitweb/lib/cache.pl |   14 ++++++++++++++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index d39982a..5a9660a 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -41,6 +41,7 @@ our $version = "++GIT_VERSION++";
 
 # Output buffer variable
 our $output = "";
+our $output_err = "";
 
 our ($my_url, $my_uri, $base_url, $path_info, $home_link);
 sub evaluate_uri {
@@ -303,6 +304,9 @@ our $fullhashpath = *STDOUT;
 our $fullhashbinpath = *STDOUT;
 our $fullhashbinpathfinal = *STDOUT;
 
+our $cacheErrorCache = 0; # false
+our $cacheErrorCount = 0;
+
 our $full_url;
 our $urlhash;
 
@@ -3786,6 +3790,7 @@ sub die_error {
 	# Reset the output so that we are actually going to STDOUT as opposed
 	# to buffering the output.
 	reset_output() if ($cache_enable && ! $cacheErrorCache);
+	$cacheErrorCount++ if( $cacheErrorCache );
 
 	git_header_html($http_responses{$status}, undef, %opts);
 	print <<EOF;
@@ -3801,6 +3806,9 @@ EOF
 	print "</div>\n";
 
 	git_footer_html();
+
+	die_error_cache($output) if ( $cacheErrorCache );
+
 	goto DONE_GITWEB
 		unless ($opts{'-error_handler'});
 }
diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
index 6cb82c8..2e7ca69 100644
--- a/gitweb/lib/cache.pl
+++ b/gitweb/lib/cache.pl
@@ -240,8 +240,14 @@ sub cacheUpdate {
 	# Trap all output from the action
 	change_output();
 
+	# Set the error handler so we cache
+	$cacheErrorCache = 1; # true
+
 	$actions{$action}->();
 
+	# Reset Error Handler to not cache
+	$cacheErrorCache = 0; # false
+
 	# Reset the outputs as we should be fine now
 	reset_output();
 
@@ -295,6 +301,8 @@ sub cacheUpdate {
 		close($cacheFileBG);
 	}
 
+	unlink("$fullhashpath.err") if (-e "$fullhashpath.err");
+
 	if ( $areForked ){
 		exit(0);
 	} else {
@@ -339,6 +347,9 @@ sub cacheWaitForUpdate {
 	my $max = 10;
 	my $lockStat = 0;
 
+	# Call cacheDisplayErr - if an error exists it will display and die.  If not it will just return
+	cacheDisplayErr($action);
+
 	if( $backgroundCache ){
 		if( -e "$fullhashpath" ){
 			open($cacheFile, '<:utf8', "$fullhashpath");
@@ -402,6 +413,9 @@ EOF
 		close($cacheFile);
 		$x++;
 		$combinedLockStat = $lockStat;
+
+		# Call cacheDisplayErr - if an error exists it will display and die.  If not it will just return
+		cacheDisplayErr($action);
 	} while ((! $combinedLockStat) && ($x < $max));
 	print <<EOF;
 </body>
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/18] Gitweb caching v8
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (17 preceding siblings ...)
  2010-12-09 21:57 ` [PATCH 18/18] gitweb: Add better error handling for gitweb caching John 'Warthog9' Hawley
@ 2010-12-09 23:26 ` Jakub Narebski
  2010-12-10  0:43   ` J.H.
  2010-12-10  0:39 ` Junio C Hamano
  19 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-09 23:26 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

John, could you please in the future Cc me?  I am interested in gitweb
output caching development.  Thanks in advance.

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> Afternoon everyone,
> 
> (Afternoon is like morning, right?)
>  
> This is the latest incarnation of gitweb w/ caching.  Per the general
> consensus and requests from the recent GitTogether I'm re-submitting 
> my patches.
> 
> Bunch of re-works in the code, and several requested features.  Sadly the
> patch series has balloned as I've been adding things.  It was 3-4 patches,
> it's now 18.  This is based on top of Jakub's v7.2 patch series, but
> it should be more or less clean now.

Could you please rebase it on top of v7.2 version?  The v7.2 patch
series contained a few bugs that needs to be corrected.

> 
> As such there was a bunch of changes that I needed to do to Jakub's tree
> which are indicated in the series.  Why did I do them up as separate things?
> Mainly there's a bunch of history that's getting lost right now between
> going back and forth, and I wanted to have clear patches to discuss
> should further discussion be needed.

I guess that in the final submission (i.e. the one that is to be
merged in into git.git repository) those changes would be squashed in,
isn't it?

> 
> This still differs, by two patches, from whats in production on kernel.org.
> It's missing the index page git:// link, and kernel.org and kernel.org also
> has the forced version matching.  As a note I'll probably let this stew
> another day or so on kernel.org and then I'll push it into the Fedora update
> stream, as there's a couple of things in this patch series that would be 
> good for them to have.

There was some discussion about git:// link in the past; nevertheless
this issue is independent on gitweb caching and can (and should) be
sent as a aeparate patch.

IIRC we agreed that because of backward compatibility forced versions
match is quite useless (in general)...

> 
> There is one additional script I've written that the Fedora folks are using,
> and that might be useful to include, which is an 'offline' cache file generator.
> It basically wraps gitweb.cgi and at the end moves the cache file into the right
> place.  The Fedora folks were finding it took hours to generate their front
> page, and that doing a background generation almost never completed (due to 
> process death).  This was a simple way to handle that.  If people would like
> I can add it in as an additional patch.

Are you detaching the background process?

It would be nice to have it as separate patch.

> 
> v8:
> 	- Reverting several changes from Jakub's change set that make no sense
>                 - is_cacheable changed to always return true - nothing special about
>                   blame or blame_incremental as far as the caching engine is concerned

'blame_incremental' is just another version of 'blame' view.  I have
disabled it when caching is enabled in my rewrite (you instead disabled
caching for 'blame_incremental' in your v7 and mine v7.x) because I
couldn't get it to work together with caching.  Did you check that it
works?

Besides, withou "tee"-ing, i.e. printing output as it is captured,
cached 'blame_data' means that 'blame_incremental' is not incremental,
and therefore it vanishes its advantage over 'blame'.

In the case data is in cache, then 'blame_inremental' doesn't have
advantage over 'blame' either.

>                 - Reverted config file change "caching_enabled" back to "cache_enable" as this
>                   config file option is already in the wild in production code, as are all
>                   current gitweb-caching configuration variables.

[Explitive deleted.]  I dislike strongly this $cache_enable.  I think it
would be better for backward compatibility (should we keep backward
compatibility with out-of-tree patches?) to use the same mechanism as
provided in 

  [PATCHv6/RFC 22/24] gitweb: Support legacy options used by kernel.org caching engine
  http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163058
  http://repo.or.cz/w/git/jnareb-git.git/commitdiff/27ec67ad90ecd56ac3d05f6a9ea49b6faabf7d0a

in my rewrite.  Just set $caching_enabled to true if $cache_enable is
defined and true.

>                 - Reverted change to reset_output as
>                         open STDOUT, ">&", \*STDOUT_REAL;
>                   causes assertion failures:
>                   Assertion !((((s->var)->sv_flags & (0x00004000|0x00008000)) == 0x00008000) && (((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVGV || ((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVLV)) failed: file "scalar.xs", line 49 at gitweb.cgi line 1221.
>                   if we encounter an error *BEFORE* we've ever changed the output.

Which Perl version are you using?  Because I think you found error in Perl.
Well, at least I have not happen on this bug.

I have nothing againts using

  open STDOUT, ">&STDOUT_REAL";

though I really prefer that you used lexical filehandles, instead of
"globs" which are global variables.

The following works:

  open STDOUT, '>&', fileno($fh);

Note that fileno(Symbol::qualify_to_ref($fh)) might be needed...

>         - Cleanups there were indirectly mentioned by Jakub
>                 - Elimination of anything even remotely looking like duplicate code
>                         - Creation of isBinaryAction() and isFeedAction()

Could you please do not use mixedCase names?


First, that is what %actions_info from

  [PATCH 16/24] gitweb: Introduce %actions_info, gathering information about actions
  http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163038
  http://repo.or.cz/w/git/jnareb-git.git/commitdiff/305a10339b33d56b4a50708d71e8f42453c8cb1f

I have invented for.

Second, why 'isBinaryAction()'?  there isn't something inherently
different between binary (':raw') and text (':utf8') output, as I have
repeatedly said before.  See my rewrite: there is no special case for
binary output (or perhaps binary output as in the case of 'blob_plain'
action).

>         - Adding in blacklist of "dumb" clients for purposes of downloading content
>         - Added more explicit disablement of "Generating..." page

Good, I'll check this.

>         - Added better error handling
>                 - Creation of .err file in the cache directory
>                 - Trap STDERR output into $output_err as this was spewing data prior
>                   to any header information being sent

Why it is needed?  We capture output of "die" via CGI::Util::set_message,
and "warn" output is captured to web server logs... unless you explicitely
use "print STDERR <sth>" -- don't do that instead.

>         - Added hidden field in footer for url & hash of url, which is extremely useful
>           for debugging

Nice idea, I'll see it.  Can it be disabled (information leakage)?

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 01/18] gitweb: Prepare for splitting gitweb
  2010-12-09 21:57 ` [PATCH 01/18] gitweb: Prepare for splitting gitweb John 'Warthog9' Hawley
@ 2010-12-09 23:30   ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-12-09 23:30 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> From: Jakub Narebski <jnareb@gmail.com>
> +install-modules:
> +	install_dirs="$(sort $(dir $(GITWEB_MODULES)))" && \
> +	for dir in $$install_dirs; do \
> +		test -d '$(DESTDIR_SQ)$(gitweblibdir_SQ)/$$dir' || \
> +		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitweblibdir_SQ)/$$dir'; \

This should be

  +		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitweblibdir_SQ)'/$$dir; \

or even

  +		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitweblibdir_SQ)'/"$$dir"; \

Shell variables should be not inside single quotes (as oposed to make
variables, where it does not matter).

Please rebase on top of v7.4, where it was fixed.
-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 05/18] gitweb: Regression fix concerning binary output of files
  2010-12-09 21:57 ` [PATCH 05/18] gitweb: Regression fix concerning binary output of files John 'Warthog9' Hawley
@ 2010-12-09 23:33   ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-12-09 23:33 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> This solves the regression introduced with v7.2 of the gitweb-caching code,
> fix proposed by Jakub in his e-mail.
> 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
>  gitweb/gitweb.perl |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 3c3ff08..f2ef3da 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -5664,7 +5664,7 @@ sub git_blob_plain {
>  	if ($caching_enabled) {
>  		open BINOUT, '>', $fullhashbinpath or die_error(500, "Could not open bin dump file");
>  	}else{
> -		open BINOUT, '>', \$fullhashbinpath or die_error(500, "Could not open bin dump file");
> +		open BINOUT, '>&', \$fullhashbinpath or die_error(500, "Could not open bin dump file");
>  	}
>  	binmode BINOUT, ':raw';
>  	print BINOUT <$fd>;

I'd rather you rebase on top of v7.4, where this issue was fixed in
different way... well, at least in easier to undertstand way (in the
solution used above one must know that if caching is disabled,
$fullhashbinpath is *STDOUT - and has nothing to do with any _path_).

This probably should be squashed, if using v7.4 is not chosen.
-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 07/18] gitweb: Revert back to $cache_enable vs. $caching_enabled
  2010-12-09 21:57 ` [PATCH 07/18] gitweb: Revert back to $cache_enable vs. $caching_enabled John 'Warthog9' Hawley
@ 2010-12-09 23:38   ` Jakub Narebski
  2010-12-10  2:38     ` J.H.
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-09 23:38 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> Simple enough, $cache_enable (along with all caching variables) are
> already in production in multiple places and doing a small semantic
> change without backwards compatibility is pointless breakage.

Formally, there is no backward compatibility with any released code.
Using out-of-tree patches is on one's own risk.

But even discarding that, I'd rather use the same solution as in

  [PATCHv6/RFC 22/24] gitweb: Support legacy options used by kernel.org caching engine
  http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163058
  http://repo.or.cz/w/git/jnareb-git.git/commitdiff/27ec67ad90ecd56ac3d05f6a9ea49b6faabf7d0a

i.e.

  our $cache_enable;

  [...]

  # somewhere just before call to cache_fetch()
  $caching_enabled = !!$cache_enable if defined $cache_enable;

> 
> This reverts back to the previous variable to enable / disable caching

[...]
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -258,7 +258,7 @@ our $maxload = 300;
>  # that the cache directory be periodically completely deleted, and this is safe to perform.
>  # Suggested mechanism
>  # mv $cacheidr $cachedir.flush;mkdir $cachedir;rm -rf $cachedir.flush
> -our $caching_enabled = 0;
> +our $cache_enable = 0;
>  
>  # Used to set the minimum cache timeout for the dynamic caching algorithm.  Basically
>  # if we calculate the cache to be under this number of seconds we set the cache timeout
> @@ -1138,7 +1138,7 @@ sub dispatch {
>  	    !$project) {
>  		die_error(400, "Project needed");
>  	}
> -	if ($caching_enabled && is_cacheable($action)) {
> +	if ($cache_enable && is_cacheable($action)) {
>  		cache_fetch($action);
>  	} else {
>  		$actions{$action}->();

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/18] gitweb: Change is_cacheable() to return true always
  2010-12-09 21:57 ` [PATCH 08/18] gitweb: Change is_cacheable() to return true always John 'Warthog9' Hawley
@ 2010-12-09 23:46   ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-12-09 23:46 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> is_cacheable() was set to return false for blame or blame_incremental
> which both use unique urls so there's no reason this shouldn't pass
> through the caching engine.

I have disabled caching 'blame_incremental' (and its workhorse
'blame_data'), in slightly different way (by disabling these views
rather than making them un-cacheable), because last time when I was
chaing this it simply didn't work with caching.  Did you check that it
works?

Besides with caching (without "tee"-ing captre) 'blame_incremental'
view doesn't offer any advantage over 'blame' view, so it should be
IMHO disabled.
 
> Leaving the function in place for now should something actually arrise
> that we can't use caching for (think ajaxy kinda things likely).

Sidenote: I use it for 'cache' and for 'cache_clear' action in

  "[RFC PATCHv6 24/24] gitweb: Add beginnings of cache administration page (proof of concept)"
  http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163051
  http://repo.or.cz/w/git/jnareb-git.git/commitdiff/aa9fd77ff206eae8838fdde626d2afea563f9f75

> 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
>  gitweb/gitweb.perl |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 5eb0309..1d8bc74 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -798,7 +798,8 @@ our %actions = (
>  );
>  sub is_cacheable {
>  	my $action = shift;
> -	return !($action eq 'blame_data' || $action eq 'blame_incremental');
> +	# There are no known actions that do no involve a unique URL that shouldn't be cached.
> +	return 1;
>  }
>  
>  # finally, we have the hash of allowed extra_options for the commands that
> -- 
> 1.7.2.3
> 

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 09/18] gitweb: Revert reset_output() back to original code
  2010-12-09 21:57 ` [PATCH 09/18] gitweb: Revert reset_output() back to original code John 'Warthog9' Hawley
@ 2010-12-09 23:58   ` Jakub Narebski
  2010-12-10  2:43     ` J.H.
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-09 23:58 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> Reverted change to reset_output as
> 
> 	open STDOUT, ">&", \*STDOUT_REAL;

For somebody not following our discussion the above would be very,
very cryptic... though I suppose this would be squashed in final
(ready to be merged in) version of the code.
 
> causes assertion failures:
> 
> 	Assertion !((((s->var)->sv_flags & (0x00004000|0x00008000)) == 0x00008000) && (((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVGV || ((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVLV)) failed: file "scalar.xs", line 49 at gitweb.cgi line 1221.

It looks like bug in Perl, because it should give some kind of Perl
error, not failed assertion from within guts of Perl C code.

Which Perl version are you using?

> if we encounter an error *BEFORE* we've ever changed the output.

And how to reproduce this error (i.e. how did you found it)?
 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 1d8bc74..e8c028b 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1225,7 +1225,7 @@ sub change_output {
>  
>  sub reset_output {
>  	# This basically takes STDOUT_REAL and puts it back as STDOUT
> -	open STDOUT, ">&", \*STDOUT_REAL;
> +	open(STDOUT,">&STDOUT_REAL");

Hmmm... how to silence spurious warning then:

  gitweb.perl: Name "main::STDOUT_REAL" used only once: possible typo
  at ../gitweb/gitweb.perl line 1130.

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type
  2010-12-09 21:57 ` [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type John 'Warthog9' Hawley
@ 2010-12-10  0:06   ` Jakub Narebski
  2010-12-10  3:39     ` J.H.
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10  0:06 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> This is fairly self explanitory, these are here just to centralize the checking
> for these types of actions, as special things need to be done with regards to
> them inside the caching engine.
> 
> isBinaryAction() returns true if the action deals with creating binary files
> (this needing :raw output)

Why do you need special case binary / :raw output?  It is not really
necessary if it is done in right way, as shown in my rewrite.

> isFeedAction() returns true if the action deals with a news feed of some sort,
> basically used to bypass the 'Generating...' message should it be a news reader
> as those will explode badly on that page.

Why blacklisting 'feed', instead of whitelisting HTML-output?


BTW., please don't use mixedCase names, but underline_separated.

> 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
>  gitweb/lib/cache.pl |   69 ++++++++++++++++++++++++++-------------------------
>  1 files changed, 35 insertions(+), 34 deletions(-)
> 
> diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
> index a8ee99e..d55b572 100644
> --- a/gitweb/lib/cache.pl
> +++ b/gitweb/lib/cache.pl
> @@ -88,6 +88,34 @@ sub cache_fetch {
>  	#$actions{$action}->();
>  }
>  
> +sub isBinaryAction {
> +	my ($action) = @_;
> +
> +	if(
> +		$action eq "snapshot"
> +		||
> +		$action eq "blob_plain"
> +	){
> +		return 1;	# True
> +	}
> +
> +	return 0;		# False
> +}
> +
> +sub isFeedAction {
> +	if(
> +		$action eq "atom"
> +		||
> +		$action eq "rss"
> +		||
> +		$action eq "opml"
> +	){
> +		return 1;	# True
> +	}
> +
> +	return 0;		# False
> +}

Compare to:

+our %actions_info = ();
+sub evaluate_actions_info {
+       our %actions_info;
+       our (%actions);
+
+       # unless explicitely stated otherwise, default output format is html
+       foreach my $action (keys %actions) {
+               $actions_info{$action}{'output_format'} = 'html';
+       }
+       # list all exceptions; undef means variable (no definite format)
+       map { $actions_info{$_}{'output_format'} = 'text' }
+               qw(commitdiff_plain patch patches project_index blame_data);
+       map { $actions_info{$_}{'output_format'} = 'xml' }
+               qw(rss atom opml); # there are different types (document formats) of XML
+       map { $actions_info{$_}{'output_format'} = undef }
+               qw(blob_plain object);
+       $actions_info{'snapshot'}{'output_format'} = 'binary';
+}

Instead of 'xml' you can use 'feed'.

Then e.g.:

+sub action_outputs_html {
+       my $action = shift;
+       return $actions_info{$action}{'output_format'} eq 'html';
+}


See 
  "gitweb: Introduce %actions_info, gathering information about actions"
  "gitweb: Show appropriate "Generating..." page when regenerating cache"
  http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/origin..refs/heads/gitweb/cache-kernel-v6

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/18] gitweb: add isDumbClient() check
  2010-12-09 21:57 ` [PATCH 11/18] gitweb: add isDumbClient() check John 'Warthog9' Hawley
@ 2010-12-10  0:12   ` Jakub Narebski
  2010-12-10  4:00     ` J.H.
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10  0:12 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> Basic check for the claimed Agent string, if it matches a known
> blacklist (wget and curl currently) don't display the 'Generating...'
> page.
> 
> Jakub has mentioned a couple of other possible ways to handle
> this, so if a better way comes along this should be used as a
> wrapper to any better way we can find to deal with this.
> 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
>  gitweb/lib/cache.pl |   30 ++++++++++++++++++++++++++++++
>  1 files changed, 30 insertions(+), 0 deletions(-)
> 
> diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
> index d55b572..5182a94 100644
> --- a/gitweb/lib/cache.pl
> +++ b/gitweb/lib/cache.pl
> @@ -116,6 +116,34 @@ sub isFeedAction {
>  	return 0;		# False
>  }
>  
> +# There have been a number of requests that things like "dumb" clients, I.E. wget
> +# lynx, links, etc (things that just download, but don't parse the html) actually
> +# work without getting the wonkiness that is the "Generating..." page.
> +#
> +# There's only one good way to deal with this, and that's to read the browser User
> +# Agent string and do matching based on that.  This has a whole slew of error cases
> +# and mess, but there's no other way to determine if the "Generating..." page
> +# will break things.
> +#
> +# This assumes the client is not dumb, thus the default behavior is to return
> +# "false" (0) (and eventually the "Generating..." page).  If it is a dumb client
> +# return "true" (1)
> +sub isDumbClient {

Please don't use mixedCase, but underline_separated words,
e.g. browser_is_robot(), or client_is_dumb().

> +	my($user_agent) = $ENV{'HTTP_USER_AGENT'};

What if $ENV{'HTTP_USER_AGENT'} is unset / undef, e.g. because we are
runing gitweb as a script... which includes running gitweb tests?

> +	
> +	if(
> +		# wget case
> +		$user_agent =~ /^Wget/i
> +		||
> +		# curl should be excluded I think, probably better safe than sorry
> +		$user_agent =~ /^curl/i
> +	  ){
> +		return 1;	# True
> +	}
> +
> +	return 0;
> +}

Compare (note: handcrafted solution is to whitelist, not blacklist):

+sub browser_is_robot {
+       return 1 if !exists $ENV{'HTTP_USER_AGENT'}; # gitweb run as script
+       if (eval { require HTTP::BrowserDetect; }) {
+               my $browser = HTTP::BrowserDetect->new();
+               return $browser->robot();
+       }
+       # fallback on detecting known web browsers
+       return 0 if ($ENV{'HTTP_USER_AGENT'} =~ /\b(?:Mozilla|Opera|Safari|IE)\b/);
+       # be conservative; if not sure, assume non-interactive
+       return 1;
+}

from

  "[PATCHv6 17/24] gitweb: Show appropriate "Generating..." page when regenerating cache"
  http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163040
  http://repo.or.cz/w/git/jnareb-git.git/commitdiff/48679f7985ccda16dc54fda97790841bab4a0ba2

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs
  2010-12-09 21:57 ` [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs John 'Warthog9' Hawley
@ 2010-12-10  0:16   ` Jakub Narebski
  2010-12-10  0:32     ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10  0:16 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> This isn't a huge change, it just adds global variables for the file handles,
> an additional cleanup to localize the variable a bit more which should alleviate
> the issues that Jakub had with my original approach.
> 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
>  gitweb/lib/cache.pl |  114 +++++++++++++++++++++++++++++++-------------------
>  1 files changed, 71 insertions(+), 43 deletions(-)
> 
> diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
> index 5182a94..fafc028 100644
> --- a/gitweb/lib/cache.pl
> +++ b/gitweb/lib/cache.pl
> @@ -14,6 +14,12 @@ use Digest::MD5 qw(md5 md5_hex md5_base64);
>  use Fcntl ':flock';
>  use File::Copy;
>  
> +# Global declarations
> +our $cacheFile;
> +our $cacheFileBG;
> +our $cacheFileBinWT;
> +our $cacheFileBin;

You are trading globs for global (well, package) variables.  They are
not lexical filehandles... though I'm not sure if it would be possible
without restructuring code; note that if variable holding filehandle
falls out of scope, then file would be automatically closed.

BTW. Do you really need all those types/variables?

> +
>  sub cache_fetch {
>  	my ($action) = @_;
>  	my $cacheTime = 0;
> @@ -49,9 +55,9 @@ sub cache_fetch {
>  	}else{
>  		#if cache is out dated, update
>  		#else displayCache();
> -		open(cacheFile, '<', "$fullhashpath");
> -		stat(cacheFile);
> -		close(cacheFile);
> +		open($cacheFile, '<', "$fullhashpath");
> +		stat($cacheFile);
> +		close($cacheFile);
[...]

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 13/18] gitweb: Add commented url & url hash to page footer
  2010-12-09 21:57 ` [PATCH 13/18] gitweb: Add commented url & url hash to page footer John 'Warthog9' Hawley
@ 2010-12-10  0:26   ` Jakub Narebski
  2010-12-10  6:10     ` J.H.
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10  0:26 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> This is mostly a debugging tool, but it adds a small bit of information
> to the footer:
> 
> <!--
> 	Full URL: |http://localhost/gitweb-caching/gitweb.cgi?p=/project.git;a=summary|
> 	URL Hash: |7a31cfb8a43f5643679eec88aa9d7981|
> -->

Nice idea.  It helps with debugging and doesn't introduce information
leakage.

Note that in my rewrite there would be *three* pieces of information,
not two.  Namely:

  Full URL: |http://localhost/gitweb-caching/gitweb.cgi/project.git|
  Key:      |http://localhost/gitweb-caching/gitweb.cgi?p=/project.git;a=summary|
  Key hash: |7a31cfb8a43f5643679eec88aa9d7981|

> 
> The first bit tells you what the url that generated the page actually was, the second is
> the hash used to store the file with the first two characters being used as the directory:
> 
> <cachedir>/7a/31cfb8a43f5643679eec88aa9d7981

Isn't it

  <cachedir>/7a/7a31cfb8a43f5643679eec88aa9d7981

in your series?

> 
> Also useful for greping through the existing cache and finding files with unique paths that
> you may want to explicitly flush.

Though probably better 'cache_admin' page would be ultimately best
solution, see proof of concept in

  [RFC PATCHv6 24/24] gitweb: Add beginnings of cache administration page (proof of concept)
  http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163051
  http://repo.or.cz/w/git/jnareb-git.git/commitdiff/aa9fd77ff206eae8838fdde626d2afea563f9f75

> 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
>  gitweb/gitweb.perl  |    7 +++++++
>  gitweb/lib/cache.pl |    4 ++--
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index e8c028b..7f8292e 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -303,6 +303,9 @@ our $fullhashpath = *STDOUT;
>  our $fullhashbinpath = *STDOUT;
>  our $fullhashbinpathfinal = *STDOUT;
>  
> +our $full_url;
> +our $urlhash;
> +
>  # configuration for 'highlight' (http://www.andre-simon.de/)
>  # match by basename
>  our %highlight_basename = (
> @@ -3663,6 +3666,10 @@ sub git_footer_html {
>  
>  	print "<div class=\"page_footer\">\n";
>  	print "<div class=\"cachetime\">Cache Last Updated: ". gmtime( time ) ." GMT</div>\n";
> +	print	"<!--\n".
> +		"	Full URL: |$full_url|\n".
> +		"	URL Hash: |$urlhash|\n".
> +		"-->\n" if ($cache_enable);

Don't you need to esc_html on it?  $full_url can contain ' -->', and
what you would do then?

>  	if (defined $project) {
>  		my $descr = git_get_project_description($project);
>  		if (defined $descr) {
> diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
> index fafc028..63dbe9e 100644
> --- a/gitweb/lib/cache.pl
> +++ b/gitweb/lib/cache.pl
> @@ -30,8 +30,8 @@ sub cache_fetch {
>  		print "Cache directory created successfully\n";
>  	}
>  
> -	our $full_url = "$my_url?". $ENV{'QUERY_STRING'};

Note that $my_url is $cgi->url(), which does not include path_info.

> -	our $urlhash = md5_hex($full_url);
> +	$full_url = "$my_url?". $ENV{'QUERY_STRING'};
> +	$urlhash = md5_hex($full_url);
>  	our $fullhashdir = "$cachedir/". substr( $urlhash, 0, 2) ."/";
>  
>  	eval { mkpath( $fullhashdir, 0, 0777 ) };
> -- 
> 1.7.2.3

Looks quite nice. 

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs
  2010-12-10  0:16   ` Jakub Narebski
@ 2010-12-10  0:32     ` Junio C Hamano
  2010-12-10  0:47       ` Jakub Narebski
  2010-12-10  5:56       ` J.H.
  0 siblings, 2 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-12-10  0:32 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: John 'Warthog9' Hawley, git

Jakub Narebski <jnareb@gmail.com> writes:

>> +# Global declarations
>> +our $cacheFile;
>> +our $cacheFileBG;
>> +our $cacheFileBinWT;
>> +our $cacheFileBin;
>
> You are trading globs for global (well, package) variables.  They are
> not lexical filehandles... though I'm not sure if it would be possible
> without restructuring code; note that if variable holding filehandle
> falls out of scope, then file would be automatically closed.

Hmm. why is it a bad idea, when you need to access these from practically
everywhere, to use global variables to begin with?  To a certain degree,
it sounds like an unnecessary burden without much gain to me.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 14/18] gitweb: add print_transient_header() function for central header printing
  2010-12-09 21:57 ` [PATCH 14/18] gitweb: add print_transient_header() function for central header printing John 'Warthog9' Hawley
@ 2010-12-10  0:36   ` Jakub Narebski
  2010-12-10  6:18     ` J.H.
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10  0:36 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> There are a few things I would like to reuse the transient header
> information I'm using, currently this is only the 'Generating...'
> page, but there is at least one additional warning page I would
> like to use this on.
> 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
>  gitweb/lib/cache.pl |   47 ++++++++++++++++++++++++++---------------------
>  1 files changed, 26 insertions(+), 21 deletions(-)
> 
> diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
> index 63dbe9e..723ae9b 100644
> --- a/gitweb/lib/cache.pl
> +++ b/gitweb/lib/cache.pl
> @@ -94,6 +94,31 @@ sub cache_fetch {
>  	#$actions{$action}->();
>  }
>  
> +sub print_transient_header {
> +	print $::cgi->header(

Why you use $::cgi->header() instead of equivalent $cgi->header()?
Note that $::cgi->header() is $main::cgi->header(), and is not
CGI::header().

> +				-type=>'text/html',
> +				-charset => 'utf-8',
> +				-status=> 200,
> +				-expires => 'now',
> +				# HTTP/1.0
> +				-Pragma => 'no-cache',
> +				# HTTP/1.1
> +				-Cache_Control => join(
> +							', ',
> +							qw(
> +								private
> +								no-cache
> +								no-store
> +								must-revalidate
> +								max-age=0
> +								pre-check=0
> +								post-check=0
> +							)
> +						)
> +				);
> +	return;
> +}

Why not use

	our %no_cache = (
		# HTTP/1.0
		-Pragma => 'no-cache',
		# HTTP/1.1
		-Cache_Control => join(', ', qw(private no-cache no-store must-revalidate
		                                max-age=0 pre-check=0 post-check=0)),
	);

(or something like that).  This way you can reuse it even if content
type is different (e.g. 'text/plain').

But that is just a proposal.

> +
>  sub isBinaryAction {
>  	my ($action) = @_;
>  
> @@ -292,27 +317,7 @@ sub cacheWaitForUpdate {
>  
>  	$| = 1;
>  
> -	print $::cgi->header(
> -				-type=>'text/html',
> -				-charset => 'utf-8',
> -				-status=> 200,
> -				-expires => 'now',
> -				# HTTP/1.0
> -				-Pragma => 'no-cache',
> -				# HTTP/1.1
> -				-Cache_Control => join(
> -							', ',
> -							qw(
> -								private
> -								no-cache
> -								no-store
> -								must-revalidate
> -								max-age=0
> -								pre-check=0
> -								post-check=0
> -							)
> -						)
> -				);
> +	print_transient_header();
>  
>  	print <<EOF;
>  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www/w3.porg/TR/html4/strict.dtd">
> -- 
> 1.7.2.3
> 

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/18] Gitweb caching v8
  2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
                   ` (18 preceding siblings ...)
  2010-12-09 23:26 ` [PATCH 00/18] Gitweb caching v8 Jakub Narebski
@ 2010-12-10  0:39 ` Junio C Hamano
  2010-12-10  0:45   ` J.H.
  19 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2010-12-10  0:39 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git

It seems that t950X tests do not like me.  I am getting these in gitweb.body:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www/w3.porg/TR/html4/strict.dtd">
    <!-- git web w/caching interface version current, (C) 2006-2010, John 'Warthog9' Hawley <warthog9@kernel.org> -->
    <!-- git core binaries version 1.7.3.3.494.gcf92e -->
    <head>
    <meta http-equiv="content-type" content="; charset=utf-8"/>
    <meta name="generator" content="gitweb/current git/1.7.3.3.494.gcf92e"/>
    <meta name="robots" content="index, nofollow"/>
    <meta http-equiv="refresh" content="10"/>
    <title></title>
    </head>
    <body>
    &lt;p&gt;&lt;strong&gt;*** Warning ***:&lt;/strong&gt; Caching enabled but cache directory did not exsist.  (cache)&lt;br/&gt;/<span class="cntrl">\n</span>Cache directory created successfully<span class="cntrl">\n</span>&lt;p&gt;
    </body>
    </html>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/18] Gitweb caching v8
  2010-12-09 23:26 ` [PATCH 00/18] Gitweb caching v8 Jakub Narebski
@ 2010-12-10  0:43   ` J.H.
  2010-12-10  1:27     ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: J.H. @ 2010-12-10  0:43 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On 12/09/2010 03:26 PM, Jakub Narebski wrote:
> John, could you please in the future Cc me?  I am interested in gitweb
> output caching development.  Thanks in advance.

Apologies, apparently screwed up on my git send-email line.  I'll get
that right one of these eons.

> Could you please rebase it on top of v7.2 version?  The v7.2 patch
> series contained a few bugs that needs to be corrected.

I assume you mean 7.4, as opposed to 7.2... otherwise already done!

> I guess that in the final submission (i.e. the one that is to be
> merged in into git.git repository) those changes would be squashed in,
> isn't it?

I have no objections to squashing the reversions into a single patch,
just figured it was easier to break them out for the time being.

>> This still differs, by two patches, from whats in production on kernel.org.
>> It's missing the index page git:// link, and kernel.org and kernel.org also
> 
>>
>> has the forced version matching.  As a note I'll probably let this stew
>> another day or so on kernel.org and then I'll push it into the Fedora update
>> stream, as there's a couple of things in this patch series that would be 
>> good for them to have.
> 
> There was some discussion about git:// link in the past; nevertheless
> this issue is independent on gitweb caching and can (and should) be
> sent as a aeparate patch.
> 
> IIRC we agreed that because of backward compatibility forced versions
> match is quite useless (in general)...

The former wasn't submitted as that is a separate issue, the later was
not agreed on really but mostly me retracting the patches as they
weren't making any headway.

I mention the patches at all as clarification of what's actually running
on kernel.org, and eventually what will be in the gitweb-caching
packages that are part of Fedora and EPEL.

>> There is one additional script I've written that the Fedora folks are using,
>> and that might be useful to include, which is an 'offline' cache file generator.
>> It basically wraps gitweb.cgi and at the end moves the cache file into the right
>> place.  The Fedora folks were finding it took hours to generate their front
>> page, and that doing a background generation almost never completed (due to 
>> process death).  This was a simple way to handle that.  If people would like
>> I can add it in as an additional patch.
> 
> Are you detaching the background process?

No, in fact I completely turn off forking (using the $cacheDoFork variable.)

> It would be nice to have it as separate patch.

I can add it easily enough.

>> v8:
>> 	- Reverting several changes from Jakub's change set that make no sense
>>                 - is_cacheable changed to always return true - nothing special about
>>                   blame or blame_incremental as far as the caching engine is concerned
> 
> 'blame_incremental' is just another version of 'blame' view.  I have
> disabled it when caching is enabled in my rewrite (you instead disabled
> caching for 'blame_incremental' in your v7 and mine v7.x) because I
> couldn't get it to work together with caching.  Did you check that it
> works?

blame works fine, blame_incremental generates but doesn't..... ohhhh
someone added ajaxy kinda stuff and doesn't mention it anywhere.

Exciting.

blame_data needs to not get a 'generating...' page in all likelihood,
generating a blame_incremental page, letting it load and then refreshing
the whole thing gets me what I'm expecting.

Is enough to mask.

Guess I'm looking at a v9 now.

> Besides, withou "tee"-ing, i.e. printing output as it is captured,
> cached 'blame_data' means that 'blame_incremental' is not incremental,
> and therefore it vanishes its advantage over 'blame'.

There are only 2 ways to get to a blame_incremental page

1) By going to a blame page and clicking on the incremental link in the nav

2) By enabling it by default so when you click 'blame' it goes to
incremental first.

> In the case data is in cache, then 'blame_inremental' doesn't have
> advantage over 'blame' either.

Agreed, though it's easy enough to support in the caching engine,
basically don't return 'Generating...' and wait for that data to cache.
 Not really an advantage except that your not waiting for the whole
generation to get a page back at all.

>>                 - Reverted change to reset_output as
>>                         open STDOUT, ">&", \*STDOUT_REAL;
>>                   causes assertion failures:
>>                   Assertion !((((s->var)->sv_flags & (0x00004000|0x00008000)) == 0x00008000) && (((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVGV || ((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVLV)) failed: file "scalar.xs", line 49 at gitweb.cgi line 1221.
>>                   if we encounter an error *BEFORE* we've ever changed the output.
> 
> Which Perl version are you using?  Because I think you found error in Perl.
> Well, at least I have not happen on this bug.

This is perl, v5.10.0 built for x86_64-linux-thread-multi

> I have nothing againts using
> 
>   open STDOUT, ">&STDOUT_REAL";
> 
> though I really prefer that you used lexical filehandles, instead of
> "globs" which are global variables.
> 
> The following works:
> 
>   open STDOUT, '>&', fileno($fh);
> 
> Note that fileno(Symbol::qualify_to_ref($fh)) might be needed...

I see 0 advantage to shifting around STDOUT and STDERR to a lexical
filehandle vs. a glob in this case.  STDOUT_REAL retains all the
properties of STDOUT should it be needed elsewhere, including what it
was going and what it was doing.

I have no objection to shifting the file handles I'm using to lexical
variables, if nothing else the argument about them closing when falling
out of scope is worth it, but for STDOUT, STDERR, etc I don't think
switching to lexicals makes a lot of sense

>>         - Cleanups there were indirectly mentioned by Jakub
>>                 - Elimination of anything even remotely looking like duplicate code
>>                         - Creation of isBinaryAction() and isFeedAction()
> 
> Could you please do not use mixedCase names?

I'm fine with renaming those if you wish.

> First, that is what %actions_info from
> 
>   [PATCH 16/24] gitweb: Introduce %actions_info, gathering information about actions
>   http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163038
>   http://repo.or.cz/w/git/jnareb-git.git/commitdiff/305a10339b33d56b4a50708d71e8f42453c8cb1f
> 
> I have invented for.

I have not based any of my caching engine, right now, on anything you've
done for your rewrite.

> Second, why 'isBinaryAction()'?  there isn't something inherently
> different between binary (':raw') and text (':utf8') output, as I have
> repeatedly said before.

It's a binary action in that you are shoving something down the pipe
with the intention of sending the bits completely raw.  You read the
data raw, and write the data raw.  There is no interpretation of the
data as being anything but straight raw.

Right now, in gitweb already, there are two places that treat output
completely differently:

	- snapshot
	- blob_plain

The only reason isBinaryAction() (or any other function name or process
you want to grant it) exists is so that I can figure out if it's one of
those actions so I can deal with the cache and output handling
differently for each.

Yes, I could flip the entire caching engine over to following the same
mantra for everything and thus there is no need to care, but gitweb
itself isn't really setup to handle that separation cleanly right now,
and I'm trying to make as few bigger changes right now as is.


>>         - Added better error handling
>>                 - Creation of .err file in the cache directory
>>                 - Trap STDERR output into $output_err as this was spewing data prior
>>                   to any header information being sent
> 
> Why it is needed?  We capture output of "die" via CGI::Util::set_message,
> and "warn" output is captured to web server logs... unless you explicitely
> use "print STDERR <sth>" -- don't do that instead.

I have seen, in several instances, a case where git itself will generate
an error, it shoves it to STDERR which makes it to the client before
anything else, thus causing 500 level errors.

Added this so that STDERR got trapped and those messages didn't make it out.

>>         - Added hidden field in footer for url & hash of url, which is extremely useful
>>           for debugging
> 
> Nice idea, I'll see it.  Can it be disabled (information leakage)?

There's not really any information leakage per-se, unless you call
md5suming the url information leakage.

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/18] Gitweb caching v8
  2010-12-10  0:39 ` Junio C Hamano
@ 2010-12-10  0:45   ` J.H.
  0 siblings, 0 replies; 60+ messages in thread
From: J.H. @ 2010-12-10  0:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 12/09/2010 04:39 PM, Junio C Hamano wrote:
> It seems that t950X tests do not like me.  I am getting these in gitweb.body:
> 
>     <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www/w3.porg/TR/html4/strict.dtd">
>     <!-- git web w/caching interface version current, (C) 2006-2010, John 'Warthog9' Hawley <warthog9@kernel.org> -->
>     <!-- git core binaries version 1.7.3.3.494.gcf92e -->
>     <head>
>     <meta http-equiv="content-type" content="; charset=utf-8"/>
>     <meta name="generator" content="gitweb/current git/1.7.3.3.494.gcf92e"/>
>     <meta name="robots" content="index, nofollow"/>
>     <meta http-equiv="refresh" content="10"/>
>     <title></title>
>     </head>
>     <body>
>     &lt;p&gt;&lt;strong&gt;*** Warning ***:&lt;/strong&gt; Caching enabled but cache directory did not exsist.  (cache)&lt;br/&gt;/<span class="cntrl">\n</span>Cache directory created successfully<span class="cntrl">\n</span>&lt;p&gt;
>     </body>
>     </html>

caching directory didn't exist prior to running the test, and so it
throws the warning that it needed to create it.  The cache directory
should likely be created before caching tests.

(Sorry, didn't catch that one as I've already got my caching directory
created, and it doesn't really get deleted)

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs
  2010-12-10  0:32     ` Junio C Hamano
@ 2010-12-10  0:47       ` Jakub Narebski
  2010-12-10  5:56       ` J.H.
  1 sibling, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10  0:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: John 'Warthog9' Hawley, git

On Fri, 10 Dec 2010, Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>> "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:
>>> 
>>> +# Global declarations
>>> +our $cacheFile;
>>> +our $cacheFileBG;
>>> +our $cacheFileBinWT;
>>> +our $cacheFileBin;
>>
>> You are trading globs for global (well, package) variables.  They are
>> not lexical filehandles... though I'm not sure if it would be possible
>> without restructuring code; note that if variable holding filehandle
>> falls out of scope, then file would be automatically closed.
> 
> Hmm. why is it a bad idea, when you need to access these from practically
> everywhere, to use global variables to begin with?  To a certain degree,
> it sounds like an unnecessary burden without much gain to me.

If you check my rewrite of gitweb output caching:

  "[PATCHv6/RFC 00/24] gitweb: Simple file based output caching"
  

http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/origin..refs/heads/gitweb/cache-kernel-v6
https://github.com/jnareb/git/compare/origin...gitweb/cache-kernel-v6

you would see that I always use lexical filehandles, and I never need
to use global variables / glob filehandles.

http://en.wikipedia.org/wiki/Global_variables
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/18] gitweb: Add show_warning() to display an immediate warning, with refresh
  2010-12-09 21:57 ` [PATCH 15/18] gitweb: Add show_warning() to display an immediate warning, with refresh John 'Warthog9' Hawley
@ 2010-12-10  1:01   ` Jakub Narebski
  2010-12-10  7:38     ` J.H.
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10  1:01 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> die_error() is an immediate and abrupt action.  show_warning() more or less
> functions identically, except that the page generated doesn't use the
> gitweb header or footer (in case they are broken) and has an auto-refresh
> (10 seconds) built into it.

Why not use gitweb header/footer?  If they are broken, it should be
caught in git development.  If we don't se them, the show_warning()
output would look out of place.

> 
> This makes use of print_transient_header() which is also used in the
> 'Generating...' page.  Currently the only warning it throws is about
> the cache needing to be created.  If that fails it's a fatal error
> and we call die_error()

Why do you feel the need to single out this case giving it warning,
and single out this warning by showing warning page?

Nevertheless show_warning() _might_ be a good idea.

> 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
>  gitweb/lib/cache.pl |   36 +++++++++++++++++++++++++++++++++---
>  1 files changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
> index 723ae9b..28e4240 100644
> --- a/gitweb/lib/cache.pl
> +++ b/gitweb/lib/cache.pl
> @@ -25,9 +25,13 @@ sub cache_fetch {
>  	my $cacheTime = 0;
>  
>  	if(! -d $cachedir){
> -		print "*** Warning ***: Caching enabled but cache directory does not exsist.  ($cachedir)\n";
> -		mkdir ("cache", 0755) || die "Cannot create cache dir - you will need to manually create";
> -		print "Cache directory created successfully\n";
> +		mkdir ("cache", 0755) || die_error(500, "Internal Server Error", "Cannot create cache dir () - you will need to manually create");
> +		show_warning(
> +				"<p>".
> +				"<strong>*** Warning ***:</strong> Caching enabled but cache directory did not exsist.  ($cachedir)<br/>/\n".

Minor nit: s/exsist/exist/

Don't you need to use esc_path() on $cachedir, 
using either

  ...did not exist.  (".esc_path($cachedir).")<br/>\n";

or using this trick

  ...did not exist.  (@{[esc_path($cachedir)]})<br/>\n";

> +				"Cache directory created successfully\n".
> +				"<p>"
> +				);
>  	}
>  
>  	$full_url = "$my_url?". $ENV{'QUERY_STRING'};
> @@ -119,6 +123,32 @@ sub print_transient_header {
>  	return;
>  }
>  
> +sub show_warning {
> +	$| = 1;

  +	local $| = 1;

$| is global variable, and otherwise you would turn autoflush for all
code, which would matter e.g. for FastCGI.

> +
> +	my $warning = esc_html(shift) || "Unknown Warning";
> +
> +	print_transient_header();
> +
> +	print <<EOF;
> +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www/w3.porg/TR/html4/strict.dtd">
> +<!-- git web w/caching interface version $version, (C) 2006-2010, John 'Warthog9' Hawley <warthog9\@kernel.org> -->
> +<!-- git core binaries version $git_version -->
> +<head>
> +<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>

$content_type is not defined here.

> +<meta name="generator" content="gitweb/$version git/$git_version"/>
> +<meta name="robots" content="index, nofollow"/>

It is "noindex, nofollow", isn't it?

> +<meta http-equiv="refresh" content="10"/>

Why 10 seconds?

> +<title>$title</title>

$title is not defined here.

> +</head>
> +<body>
> +$warning
> +</body>
> +</html>
> +EOF
> +	exit(0);

"exit(0)" and not "goto DONE_GITWEB", or "goto DONE_REQUEST"?

> +}
> +
>  sub isBinaryAction {
>  	my ($action) = @_;

Didn't you ran gitweb tests?

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 00/18] Gitweb caching v8
  2010-12-10  0:43   ` J.H.
@ 2010-12-10  1:27     ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10  1:27 UTC (permalink / raw)
  To: J.H.; +Cc: git

On Fri, 10 Dec 2010, J.H. wrote:
> On 12/09/2010 03:26 PM, Jakub Narebski wrote:

>> John, could you please in the future Cc me?  I am interested in gitweb
>> output caching development.  Thanks in advance.
> 
> Apologies, apparently screwed up on my git send-email line.  I'll get
> that right one of these eons.

Ah, I can understand this.

>> I guess that in the final submission (i.e. the one that is to be
>> merged in into git.git repository) those changes would be squashed in,
>> isn't it?
> 
> I have no objections to squashing the reversions into a single patch,
> just figured it was easier to break them out for the time being.

I guess that interdiff in comments would work as well, or almost as well...
 
>>> There is one additional script I've written that the Fedora folks are using,
>>> and that might be useful to include, which is an 'offline' cache file generator.
>>> It basically wraps gitweb.cgi and at the end moves the cache file into the right
>>> place.  The Fedora folks were finding it took hours to generate their front
>>> page, and that doing a background generation almost never completed (due to 
>>> process death).  This was a simple way to handle that.  If people would like
>>> I can add it in as an additional patch.
>> 
>> Are you detaching the background process?

Errr... what I meant here is that perhaps detaching background process
would make it not die, but I am guessing here.
 
> No, in fact I completely turn off forking (using the $cacheDoFork variable.)

BTW. what I don't like is your code forking indiscriminately even if it
is not needed (e.g. background cache generation is turned off).

> 
>> It would be nice to have it as separate patch.
> 
> I can add it easily enough.

It is only about caching most IO intensive page, i.e. projects_list page,
isn't it?  Why doesn't _it_ die, like background process?

> 
>>> v8:
>>> 	- Reverting several changes from Jakub's change set that make no sense
>>>                 - is_cacheable changed to always return true - nothing special about
>>>                   blame or blame_incremental as far as the caching engine is concerned
>> 
>> 'blame_incremental' is just another version of 'blame' view.  I have
>> disabled it when caching is enabled in my rewrite (you instead disabled
>> caching for 'blame_incremental' in your v7 and mine v7.x) because I
>> couldn't get it to work together with caching.  Did you check that it
>> works?
> 
> blame works fine, blame_incremental generates but doesn't..... ohhhh
> someone added ajaxy kinda stuff and doesn't mention it anywhere.

Errr... I thought that the 'incremental' part is self-explaining that
it is Ajax-y stuff.  Well, while commit is 4af819d (gitweb: Incremental
blame (using JavaScript), 2009-09-01), perhaps I should have added some
comment in the code.

> 
> Exciting.
> 
> blame_data needs to not get a 'generating...' page in all likelihood,
> generating a blame_incremental page, letting it load and then refreshing
> the whole thing gets me what I'm expecting.

Hmmm... I wonder why it didn't work for me at that time...

> 
> Is enough to mask.
> 
> Guess I'm looking at a v9 now.
> 
>> Besides, withou "tee"-ing, i.e. printing output as it is captured,
>> cached 'blame_data' means that 'blame_incremental' is not incremental,
>> and therefore it vanishes its advantage over 'blame'.

I mean here that with current state of caching 'blame_incremental' stops
to be incremental...
 
> There are only 2 ways to get to a blame_incremental page
> 
> 1) By going to a blame page and clicking on the incremental link in the nav
> 
> 2) By enabling it by default so when you click 'blame' it goes to
> incremental first.

  3) By having JavaScript add ';js=1' to all links, so clicking on
  'blame' link (with action set to 'blame') would result in 
  'blame_incremental' view.

> 
>> In the case data is in cache, then 'blame_inremental' doesn't have
>> advantage over 'blame' either.
> 
> Agreed, though it's easy enough to support in the caching engine,
> basically don't return 'Generating...' and wait for that data to cache.
> Not really an advantage except that your not waiting for the whole
> generation to get a page back at all.
> 
>>>                 - Reverted change to reset_output as
>>>                         open STDOUT, ">&", \*STDOUT_REAL;
>>>                   causes assertion failures:
>>>                   Assertion !((((s->var)->sv_flags & (0x00004000|0x00008000)) == 0x00008000) && (((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVGV || ((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVLV)) failed: file "scalar.xs", line 49 at gitweb.cgi line 1221.
>>>                   if we encounter an error *BEFORE* we've ever changed the output.
>> 
>> Which Perl version are you using?  Because I think you found error in Perl.
>> Well, at least I have not happen on this bug.
> 
> This is perl, v5.10.0 built for x86_64-linux-thread-multi

Could you check with newer perl?  I don't get this error.

>> I have nothing againts using
>> 
>>   open STDOUT, ">&STDOUT_REAL";
>> 
>> though I really prefer that you used lexical filehandles, instead of
>> "globs" which are global variables.

And using 'print STDOUT_REAL "";' protects against spurious warning
(the warning is really wrong in this case).
 
>> The following works:
>> 
>>   open STDOUT, '>&', fileno($fh);
>> 
>> Note that fileno(Symbol::qualify_to_ref($fh)) might be needed...
> 
> I see 0 advantage to shifting around STDOUT and STDERR to a lexical
> filehandle vs. a glob in this case.  STDOUT_REAL retains all the
> properties of STDOUT should it be needed elsewhere, including what it
> was going and what it was doing.
> 
> I have no objection to shifting the file handles I'm using to lexical
> variables, if nothing else the argument about them closing when falling
> out of scope is worth it, but for STDOUT, STDERR, etc I don't think
> switching to lexicals makes a lot of sense

Well... I'd have to agree that in current case (capturing engine embedded
in gitweb, and gitweb-specific; no need for recursive capture) it would
be enough to use such globs.

> 
>>>         - Cleanups there were indirectly mentioned by Jakub
>>>                 - Elimination of anything even remotely looking like duplicate code
>>>                         - Creation of isBinaryAction() and isFeedAction()
>> 
>> Could you please do not use mixedCase names?
> 
> I'm fine with renaming those if you wish.
> 
>> First, that is what %actions_info from
>> 
>>   [PATCH 16/24] gitweb: Introduce %actions_info, gathering information about actions
>>   http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163038
>>   http://repo.or.cz/w/git/jnareb-git.git/commitdiff/305a10339b33d56b4a50708d71e8f42453c8cb1f
>> 
>> I have invented for.
> 
> I have not based any of my caching engine, right now, on anything you've
> done for your rewrite.

What I meant here that if you will be doing yet another version, you
can take a look at it as a way to avoiding not very clear and nice
long alternatives in condition, or in regexp matched.

> 
>> Second, why 'isBinaryAction()'?  there isn't something inherently
>> different between binary (':raw') and text (':utf8') output, as I have
>> repeatedly said before.
> 
> It's a binary action in that you are shoving something down the pipe
> with the intention of sending the bits completely raw.  You read the
> data raw, and write the data raw.  There is no interpretation of the
> data as being anything but straight raw.
> 
> Right now, in gitweb already, there are two places that treat output
> completely differently:
> 
> 	- snapshot
> 	- blob_plain
> 
> The only reason isBinaryAction() (or any other function name or process
> you want to grant it) exists is so that I can figure out if it's one of
> those actions so I can deal with the cache and output handling
> differently for each.
> 
> Yes, I could flip the entire caching engine over to following the same
> mantra for everything and thus there is no need to care, but gitweb
> itself isn't really setup to handle that separation cleanly right now,
> and I'm trying to make as few bigger changes right now as is.

Always reading from cache in ':raw' mode and always printing from cache
in ':raw' mode (i.e. setting STDOUT to ':raw' before printing / copying
cache entry) would be in gitweb case enough to not special-case binary
files.

In gitweb you always do "binmode STDOUT, ':raw';" _after_ starting capture,
which means that it gets applied to cache file; and gitweb always do
"binmode STDOUT, ':utf8';" before stopping capture.

If you print text data to file using ':utf8' layer (applied at beginning
to cache file) it is in this file as correct sequence of bytes.  Therefore
you can dump said cache file to STDOUT in ':raw' mode (or in ':utf8' mode)
- both STDOUT and read cache file has to have the same mode.

>>>         - Added better error handling
>>>                 - Creation of .err file in the cache directory
>>>                 - Trap STDERR output into $output_err as this was spewing data prior
>>>                   to any header information being sent
>> 
>> Why it is needed?  We capture output of "die" via CGI::Util::set_message,
>> and "warn" output is captured to web server logs... unless you explicitely
>> use "print STDERR <sth>" -- don't do that instead.
> 
> I have seen, in several instances, a case where git itself will generate
> an error, it shoves it to STDERR which makes it to the client before
> anything else, thus causing 500 level errors.
> 
> Added this so that STDERR got trapped and those messages didn't make it out.

Could you give examples when it happens?  Anything that happens after
"use CGI::Carp" is parsed should have STDERR redirected to web server
errors log.

I'll read the actual patch and comment on it.

> 
>>>         - Added hidden field in footer for url & hash of url, which is extremely useful
>>>           for debugging
>> 
>> Nice idea, I'll see it.  Can it be disabled (information leakage)?
> 
> There's not really any information leakage per-se, unless you call
> md5suming the url information leakage.

Ah, sorry, I send this comment before actually reading patch in question.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 16/18] gitweb: When changing output (STDOUT) change STDERR as well
  2010-12-09 21:57 ` [PATCH 16/18] gitweb: When changing output (STDOUT) change STDERR as well John 'Warthog9' Hawley
@ 2010-12-10  1:36   ` Jakub Narebski
  2010-12-12  5:25     ` J.H.
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10  1:36 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> This sets up a trap for STDERR as well as STDOUT.  This should
> prevent any transient error messages from git itself percolating
> up to gitweb and outputting errant information before the HTTP
> header has been sent.

Hmm... anuthing that happens after 'use CGI::Carp;' is parsed should
have STDERR redirected to web server logs, see CGI::Carp manpage

    [...]
 
       use CGI::Carp

    And the standard warn(), die (), croak(), confess() and carp() calls will
    automagically be replaced with functions that write out nicely time-stamped
    messages to the HTTP server error log.

    [...]

    REDIRECTING ERROR MESSAGES

       By default, error messages are sent to STDERR.  Most HTTPD servers direct
       STDERR to the server's error log.

    [...]

Especially the second part.


Could you give us example which causes described misbehaviour?

I have nothing against this patch: if you have to have it, then you
have to have it.  I oly try to understand what might be core cause
behind the issue that this patch is to solve...

> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
>  gitweb/gitweb.perl  |   22 +++++++++++++++++++++-
>  gitweb/lib/cache.pl |   22 ----------------------
>  2 files changed, 21 insertions(+), 23 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 7f8292e..d39982a 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1214,6 +1214,10 @@ sub evaluate_argv {
>  sub change_output {
>  	our $output;
>  
> +	#
> +	# STDOUT
> +	#
> +
>  	# Trap the 'proper' STDOUT to STDOUT_REAL for things like error messages and such
>  	open(STDOUT_REAL,">&STDOUT") or die "Unable to capture STDOUT $!\n";
>  	print STDOUT_REAL "";
> @@ -1223,12 +1227,28 @@ sub change_output {
>  
>  	# Trap STDOUT to the $output variable, which is what I was using in the original
>  	# patch anyway.
> -	open(STDOUT,">", \$output) || die "Unable to open STDOUT: $!"; #open STDOUT handle to use $var
> +	open(STDOUT,">", \$output) || die "Unable to open STDOUT: $!"; #open STDOUT handle to use $output
> +
> +	#
> +	# STDERR
> +	#
> +
> +	# Trap the 'proper' STDOUT to STDOUT_REAL for things like error messages and such
> +	open(STDERR_REAL,">&STDERR") or die "Unable to capture STDERR $!\n";
> +	print STDERR_REAL "";

'print STDERR_REAL "";' nicely solves the spurious warning problem.
Nice.

> +
> +	# Close STDOUT, so that it isn't being used anymore.
> +	close STDERR;
> +
> +	# Trap STDOUT to the $output variable, which is what I was using in the original
> +	# patch anyway.
> +	open(STDERR,">", \$output_err) || die "Unable to open STDERR: $!"; #open STDERR handle to use $output_err

Err... where $output_err is defined?

>  }
>  
>  sub reset_output {
>  	# This basically takes STDOUT_REAL and puts it back as STDOUT
>  	open(STDOUT,">&STDOUT_REAL");
> +	open(STDERR,">&STDERR_REAL");
>  }
>  
>  sub run {
> diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
> index 28e4240..a8c902d 100644
> --- a/gitweb/lib/cache.pl
> +++ b/gitweb/lib/cache.pl
> @@ -380,28 +380,6 @@ EOF
>  	return;
>  }
>  
> -sub cacheDisplayErr {
> -
> -	return if ( ! -e "$fullhashpath.err" );
> -
> -	open($cacheFileErr, '<:utf8', "$fullhashpath.err");
> -	$lockStatus = flock($cacheFileErr,LOCK_SH|LOCK_NB);
> -
> -	if (! $lockStatus ){
> -		show_warning(
> -				"<p>".
> -				"<strong>*** Warning ***:</strong> Locking error when trying to lock error cache page, file $fullhashpath.err<br/>/\n".
> -				"This is about as screwed up as it gets folks - see your systems administrator for more help with this.".
> -				"<p>"
> -				);
> -	}
> -
> -	while( <$cacheFileErr> ){
> -		print $_;
> -	}
> -	exit(0);
> -}

Errr... in which patch it was added?


-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/18] gitweb: Prepare for cached error pages & better error page handling
  2010-12-09 21:57 ` [PATCH 17/18] gitweb: Prepare for cached error pages & better error page handling John 'Warthog9' Hawley
@ 2010-12-10  1:49   ` Jakub Narebski
  2010-12-10  8:33     ` J.H.
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10  1:49 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git, Jakub Narebski

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> To quote myself from an e-mail of mine:
> 
> 	I've got a hammer, it clearly solves all problems!
> 
> This is the prepatory work to set up a mechanism inside the
> caching engine to cache the error pages instead of throwing
> them straight out to the client.

There is no problem with capturing output of die_error, nor there is a
problem with caching error pages (perhaps transiently in memory).

The problem is that subroutines calling die_error assum that it would
exit ending subroutine that is responsible for generating current
action; see "goto DONE_GITWEB" which should be "goto DONE_REQUEST",
and which was "exit 0" some time ago at the end of die_error().

With caching error pages you want die_error to exit $actions{$action}->(),
but not exit cache_fetch().  How do you intend to do it?

> 
> This adds two functions:
> 
> die_error_cache() - this gets back called from die_error() so
> that the error message generated can be cached.

*How* die_error_cache() gets called back from die_error()?  I don't
see any changes to die_error(), or actually any calling sites for
die_error_cache() in the patch below.
 
> cacheDisplayErr() - this is a simplified version of cacheDisplay()
> that does an initial check, if the error page exists - display it
> and exit.  If not, return.

Errr... isn't it removed in _preceding_ patch?  WTF???

> 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
> ---
>  gitweb/lib/cache.pl |   52 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 52 insertions(+), 0 deletions(-)
> 
> diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
> index a8c902d..6cb82c8 100644
> --- a/gitweb/lib/cache.pl
> +++ b/gitweb/lib/cache.pl
> @@ -302,6 +302,36 @@ sub cacheUpdate {
>  	}
>  }
>  
> +sub die_error_cache {
> +	my ($output) = @_;
> +
> +	open(my $cacheFileErr, '>:utf8', "$fullhashpath.err");
> +	my $lockStatus = flock($cacheFileErr,LOCK_EX|LOCK_NB);

Why do you need to lock here?  A comment would be nice.

> +
> +	if (! $lockStatus ){
> +		if ( $areForked ){

Grrrr...

But if it is here to stay, a comment if you please.

> +			exit(0);
> +		}else{
> +			return;
> +		}
> +	}
> +
> +	# Actually dump the output to the proper file handler
> +	local $/ = undef;
> +	$|++;

Why not

  +	local $| = 1;


> +	print $cacheFileErr "$output";
> +	$|--;
> +
> +	flock($cacheFileErr,LOCK_UN);
> +	close($cacheFileErr);

Closing file will unlock it.

> +
> +	if ( $areForked ){
> +		exit(0);
> +	}else{
> +		return;

So die_error_cache would not actually work like "die" here and like
die_error(), isn't it?

> +	}
> +}
> +
>  
>  sub cacheWaitForUpdate {
>  	my ($action) = @_;
> @@ -380,6 +410,28 @@ EOF
>  	return;
>  }
>  
> +sub cacheDisplayErr {
> +
> +	return if ( ! -e "$fullhashpath.err" );
> +
> +	open($cacheFileErr, '<:utf8', "$fullhashpath.err");
> +	$lockStatus = flock($cacheFileErr,LOCK_SH|LOCK_NB);
> +
> +	if (! $lockStatus ){
> +		show_warning(
> +				"<p>".
> +				"<strong>*** Warning ***:</strong> Locking error when trying to lock error cache page, file $fullhashpath.err<br/>/\n".

esc_path

> +				"This is about as screwed up as it gets folks - see your systems administrator for more help with this.".
> +				"<p>"
> +				);
> +	}
> +
> +	while( <$cacheFileErr> ){
> +		print $_;
> +	}

Why not 'print <$cacheFileErr>' (list context), like in insert_file()
subroutine?

> +	exit(0);
> +}

Callsites?

Note: I have't read next commit yet.

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/18] gitweb: Add better error handling for gitweb caching
  2010-12-09 21:57 ` [PATCH 18/18] gitweb: Add better error handling for gitweb caching John 'Warthog9' Hawley
@ 2010-12-10  1:56   ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10  1:56 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> This basically finishes the plumbing for caching the error pages
> as the are generated.
> 
> If an error is hit, create a <hash>.err file with the error.  This
> will interrupt all currently waiting processes and they will display
> the error, without any additional refreshing.
> 
> On a new request a generation will be attempted, should it succed the
> <hash.err> file is removed (if it exists).

Could you split 17 and 18 patches slightly differently, at least not
using variables which were not declared first?
 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>

Hmmm... I certainly hope that this complication is not really needed.
I have trouble following code flow (no comments), so I'd try to do
fresh review again tomorrow.

> ---
>  gitweb/gitweb.perl  |    8 ++++++++
>  gitweb/lib/cache.pl |   14 ++++++++++++++
>  2 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index d39982a..5a9660a 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -41,6 +41,7 @@ our $version = "++GIT_VERSION++";
>  
>  # Output buffer variable
>  our $output = "";
> +our $output_err = "";

It is used by earlier patches, and declared only there.

>  
>  our ($my_url, $my_uri, $base_url, $path_info, $home_link);
>  sub evaluate_uri {
> @@ -303,6 +304,9 @@ our $fullhashpath = *STDOUT;
>  our $fullhashbinpath = *STDOUT;
>  our $fullhashbinpathfinal = *STDOUT;
>  
> +our $cacheErrorCache = 0; # false

$capture_error_output, isn't it?

> +our $cacheErrorCount = 0;

$cached_error_count, or something like that, isn't it?

> +
>  our $full_url;
>  our $urlhash;
>  
> @@ -3786,6 +3790,7 @@ sub die_error {
>  	# Reset the output so that we are actually going to STDOUT as opposed
>  	# to buffering the output.
>  	reset_output() if ($cache_enable && ! $cacheErrorCache);
> +	$cacheErrorCount++ if( $cacheErrorCache );

Where it is decremented?  A comment, if you please.

>  
>  	git_header_html($http_responses{$status}, undef, %opts);
>  	print <<EOF;
> @@ -3801,6 +3806,9 @@ EOF
>  	print "</div>\n";
>  
>  	git_footer_html();
> +
> +	die_error_cache($output) if ( $cacheErrorCache );
> +

That's cache_die_error_output, or something like that, isn't it?

It's hard to review this patch when die_error_cache is defined in
separate (previous) patch.

>  	goto DONE_GITWEB
>  		unless ($opts{'-error_handler'});
>  }
> diff --git a/gitweb/lib/cache.pl b/gitweb/lib/cache.pl
> index 6cb82c8..2e7ca69 100644
> --- a/gitweb/lib/cache.pl
> +++ b/gitweb/lib/cache.pl
> @@ -240,8 +240,14 @@ sub cacheUpdate {
>  	# Trap all output from the action
>  	change_output();
>  
> +	# Set the error handler so we cache
> +	$cacheErrorCache = 1; # true
> +
>  	$actions{$action}->();
>  
> +	# Reset Error Handler to not cache
> +	$cacheErrorCache = 0; # false
> +
>  	# Reset the outputs as we should be fine now
>  	reset_output();
>  
> @@ -295,6 +301,8 @@ sub cacheUpdate {
>  		close($cacheFileBG);
>  	}
>  
> +	unlink("$fullhashpath.err") if (-e "$fullhashpath.err");
> +
>  	if ( $areForked ){
>  		exit(0);
>  	} else {
> @@ -339,6 +347,9 @@ sub cacheWaitForUpdate {
>  	my $max = 10;
>  	my $lockStat = 0;
>  
> +	# Call cacheDisplayErr - if an error exists it will display and die.  If not it will just return
> +	cacheDisplayErr($action);
> +
>  	if( $backgroundCache ){
>  		if( -e "$fullhashpath" ){
>  			open($cacheFile, '<:utf8', "$fullhashpath");
> @@ -402,6 +413,9 @@ EOF
>  		close($cacheFile);
>  		$x++;
>  		$combinedLockStat = $lockStat;
> +
> +		# Call cacheDisplayErr - if an error exists it will display and die.  If not it will just return
> +		cacheDisplayErr($action);
>  	} while ((! $combinedLockStat) && ($x < $max));
>  	print <<EOF;
>  </body>
> -- 
> 1.7.2.3
> 

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 07/18] gitweb: Revert back to $cache_enable vs. $caching_enabled
  2010-12-09 23:38   ` Jakub Narebski
@ 2010-12-10  2:38     ` J.H.
  2010-12-10 13:48       ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: J.H. @ 2010-12-10  2:38 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

> Formally, there is no backward compatibility with any released code.
> Using out-of-tree patches is on one's own risk.

I will have to beg to differ with you on this, the entirety of the
existing caching engine has been released code for a number of years,
there are rpm packages available for it, at the very least, in Fedora
and in EPEL.

The caching engine *IS* released code, and this patchset is as much a
new feature as an attempt to merge a fork.  Kernel.org isn't the only
one running this code, and that has been the case for several years now
already.

Claiming that this isn't released code is doing me a disservice to me,
and those who have submitted patches to it independent of git and the
mainline gitweb.

Thinking about the patch series outside of that context will lead to me
putting my foot down and arguing on those other users behalf.  I'm not
keen on breaking them for no good reason, and I'm not seeing your change
here as one that's particularly worthwhile, while causing external
breakage for no reason.

> But even discarding that, I'd rather use the same solution as in
> 
>   [PATCHv6/RFC 22/24] gitweb: Support legacy options used by kernel.org caching engine
>   http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163058
>   http://repo.or.cz/w/git/jnareb-git.git/commitdiff/27ec67ad90ecd56ac3d05f6a9ea49b6faabf7d0a
> 
> i.e.
> 
>   our $cache_enable;
> 
>   [...]
> 
>   # somewhere just before call to cache_fetch()
>   $caching_enabled = !!$cache_enable if defined $cache_enable;
> 
>>
>> This reverts back to the previous variable to enable / disable caching

Is there really any point in changing the name at all?  The intention of
cache_enable, at one point, was to allow for other caching engines and
while there aren't any other caching engines that use it, it's already
treated identically to cache_enable.

If it really adds enough to the readability to the code, then I'm fine
with adding:

	$caching_enabled = $cache_enable if defined $cache_enable;

But now you are setting up two variables that control the same thing,
adding the possibility for conflicts and confusion to end users.

I just want that stated.

Also, why the double negative in your original snippet - that doesn't
entirely make sense....

          |  cache_enable     |    caching_enabled
----------+-------------------+---------------------
enabled:  |        1          |            1
disabled: |        0          |            0

doing a double negative like that doesn't really buy you much except
turning 0 into NULL or '' which is equivalent to 0...

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 09/18] gitweb: Revert reset_output() back to original code
  2010-12-09 23:58   ` Jakub Narebski
@ 2010-12-10  2:43     ` J.H.
  0 siblings, 0 replies; 60+ messages in thread
From: J.H. @ 2010-12-10  2:43 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

>> Reverted change to reset_output as
>>
>> 	open STDOUT, ">&", \*STDOUT_REAL;
> 
> For somebody not following our discussion the above would be very,
> very cryptic... though I suppose this would be squashed in final
> (ready to be merged in) version of the code.
>  
>> causes assertion failures:
>>
>> 	Assertion !((((s->var)->sv_flags & (0x00004000|0x00008000)) == 0x00008000) && (((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVGV || ((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVLV)) failed: file "scalar.xs", line 49 at gitweb.cgi line 1221.
> 
> It looks like bug in Perl, because it should give some kind of Perl
> error, not failed assertion from within guts of Perl C code.
> 
> Which Perl version are you using?

This is perl, v5.10.0 built for x86_64-linux-thread-multi

>> if we encounter an error *BEFORE* we've ever changed the output.
> 
> And how to reproduce this error (i.e. how did you found it)?

Cause an error to occur before the caching engine switches output, for
instance fail on creating the cache dir, or disable caching all together
and generate an error.  I think the former is where I noticed it, it was
consistent though.

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type
  2010-12-10  0:06   ` Jakub Narebski
@ 2010-12-10  3:39     ` J.H.
  2010-12-10 12:10       ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: J.H. @ 2010-12-10  3:39 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

>> This is fairly self explanitory, these are here just to centralize the checking
>> for these types of actions, as special things need to be done with regards to
>> them inside the caching engine.
>>
>> isBinaryAction() returns true if the action deals with creating binary files
>> (this needing :raw output)
> 
> Why do you need special case binary / :raw output?  It is not really
> necessary if it is done in right way, as shown in my rewrite.

Because that's not how my caching engine does it, and the reason for
that is I am mimicking how the rest of gitweb does it.

I attempted at one point to do as you were suggesting, and it became too
cumbersome.  I eventually broke out the 'binary' packages into a special
case (thus mimicking how gitweb is already doing things), which also
gives me the advantage of being able to checksum the resulting binary
out of band, as well as being able to more trivially calculate the file
size being sent.

>> isFeedAction() returns true if the action deals with a news feed of some sort,
>> basically used to bypass the 'Generating...' message should it be a news reader
>> as those will explode badly on that page.
> 
> Why blacklisting 'feed', instead of whitelisting HTML-output?

There are a limited number of feed types and their ilk (standard xml
formatted feed and atom), there are lots of html-output like things.
Easier to default and have things work, generally, than to have things
not work the way you would expect.

> BTW., please don't use mixedCase names, but underline_separated.

fixed in v9

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/18] gitweb: add isDumbClient() check
  2010-12-10  0:12   ` Jakub Narebski
@ 2010-12-10  4:00     ` J.H.
  2010-12-11  0:07       ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: J.H. @ 2010-12-10  4:00 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

>> +	my($user_agent) = $ENV{'HTTP_USER_AGENT'};
> 
> What if $ENV{'HTTP_USER_AGENT'} is unset / undef, e.g. because we are
> runing gitweb as a script... which includes running gitweb tests?

It can be disabled for the running of tests, but the default is to show
'Generating...' vs. not.  I'd rather assume there's an intelligent
client on the other end and give users a reason why they aren't staring
at their initial content immediately (and thus thinking something is
broken).

>> +	
>> +	if(
>> +		# wget case
>> +		$user_agent =~ /^Wget/i
>> +		||
>> +		# curl should be excluded I think, probably better safe than sorry
>> +		$user_agent =~ /^curl/i
>> +	  ){
>> +		return 1;	# True
>> +	}
>> +
>> +	return 0;
>> +}
> 
> Compare (note: handcrafted solution is to whitelist, not blacklist):
> 
> +sub browser_is_robot {
> +       return 1 if !exists $ENV{'HTTP_USER_AGENT'}; # gitweb run as script
> +       if (eval { require HTTP::BrowserDetect; }) {
> +               my $browser = HTTP::BrowserDetect->new();
> +               return $browser->robot();
> +       }
> +       # fallback on detecting known web browsers
> +       return 0 if ($ENV{'HTTP_USER_AGENT'} =~ /\b(?:Mozilla|Opera|Safari|IE)\b/);
> +       # be conservative; if not sure, assume non-interactive
> +       return 1;
> +}

My initial look indicated that perl-http-browserdetect wasn't available
for RHEL / CentOS 5 - it is however available in EPEL.

However there are a couple of things to note about User Agents at all:
	- They lie... a lot
	- Robots lie even more

Blacklisting is still the better option, by a lot.  I'll re-work this
some in v9, as I'm fine with the added dependency.

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs
  2010-12-10  0:32     ` Junio C Hamano
  2010-12-10  0:47       ` Jakub Narebski
@ 2010-12-10  5:56       ` J.H.
  1 sibling, 0 replies; 60+ messages in thread
From: J.H. @ 2010-12-10  5:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narebski, git

On 12/09/2010 04:32 PM, Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
>>> +# Global declarations
>>> +our $cacheFile;
>>> +our $cacheFileBG;
>>> +our $cacheFileBinWT;
>>> +our $cacheFileBin;
>>
>> You are trading globs for global (well, package) variables.  They are
>> not lexical filehandles... though I'm not sure if it would be possible
>> without restructuring code; note that if variable holding filehandle
>> falls out of scope, then file would be automatically closed.
> 
> Hmm. why is it a bad idea, when you need to access these from practically
> everywhere, to use global variables to begin with?  To a certain degree,
> it sounds like an unnecessary burden without much gain to me.

This was why I used globs in the first place.

To answer Jakub's question first: yes, some of those are holding locks
open while things are happening.  Since there's locks open, through a
lot of functions, it's

Generally speaking I'm pretty good about opening and closing the files
when it's needed.  I *THINK* they can be made local variables, as I
don't think anything is kept open (within the caching engine) across
functions.

I've made a couple of changes, I'm going to have to test them, this
might be fixed in v9.

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 13/18] gitweb: Add commented url & url hash to page footer
  2010-12-10  0:26   ` Jakub Narebski
@ 2010-12-10  6:10     ` J.H.
  0 siblings, 0 replies; 60+ messages in thread
From: J.H. @ 2010-12-10  6:10 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

>> <!--
>> 	Full URL: |http://localhost/gitweb-caching/gitweb.cgi?p=/project.git;a=summary|
>> 	URL Hash: |7a31cfb8a43f5643679eec88aa9d7981|
>> -->
> 
> Nice idea.  It helps with debugging and doesn't introduce information
> leakage.

That was the plan, and I've already got a scenario where it would be useful.

>> The first bit tells you what the url that generated the page actually was, the second is
>> the hash used to store the file with the first two characters being used as the directory:
>>
>> <cachedir>/7a/31cfb8a43f5643679eec88aa9d7981
> 
> Isn't it
> 
>   <cachedir>/7a/7a31cfb8a43f5643679eec88aa9d7981
> 
> in your series?

Nope

	our $fullhashdir = "$cachedir/". substr( $urlhash, 0, 2) ."/";

and then a couple of lines later:

	$fullhashpath = "$fullhashdir/". substr( $urlhash, 2 );

right at the top of cache_fetch()

>> Also useful for greping through the existing cache and finding files with unique paths that
>> you may want to explicitly flush.
> 
> Though probably better 'cache_admin' page would be ultimately best
> solution, see proof of concept in

The biggest problem with the cache admin page you've got there, is that
gitweb itself doesn't have a framework for user administration,
privileges, etc.  Limiting it to the local machine is also useless,
there are very few people who are going to have access, from 127.0.0.1
to their web server, and this also breaks anything even remotely
resembling virtual hosts.

The fact that it's unusable from virtual hosts makes this pretty much DOA.

Like I've said in the past, we need to at least look at web frameworks
for gitweb, and if we want to provide things like the admin page than we
need to consider that we are going to need user management.  That, in
particular, starts drifting towards needing a database to store things
in and I for one am *NOT* in favor of that.

I like the idea of a framework helping deal with things like page
layout, separating data access from content, etc.  I do not like the
idea of gitweb having a full blown setup with a database and all behind it.

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 14/18] gitweb: add print_transient_header() function for central header printing
  2010-12-10  0:36   ` Jakub Narebski
@ 2010-12-10  6:18     ` J.H.
  0 siblings, 0 replies; 60+ messages in thread
From: J.H. @ 2010-12-10  6:18 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

>> +sub print_transient_header {
>> +	print $::cgi->header(
> 
> Why you use $::cgi->header() instead of equivalent $cgi->header()?
> Note that $::cgi->header() is $main::cgi->header(), and is not
> CGI::header().

Because $main::cgi already was setup.  Since I'm not redefining $cgi
anywhere they evaluate to the same thing since cgi is already a global
variable coming from gitweb itself.

The way I have it now is it's more explicit to being the parent (main).
 It doesn't really matter either way, but I can change it if you like.

>> +				-type=>'text/html',
>> +				-charset => 'utf-8',
>> +				-status=> 200,
>> +				-expires => 'now',
>> +				# HTTP/1.0
>> +				-Pragma => 'no-cache',
>> +				# HTTP/1.1
>> +				-Cache_Control => join(
>> +							', ',
>> +							qw(
>> +								private
>> +								no-cache
>> +								no-store
>> +								must-revalidate
>> +								max-age=0
>> +								pre-check=0
>> +								post-check=0
>> +							)
>> +						)
>> +				);
>> +	return;
>> +}
> 
> Why not use
> 
> 	our %no_cache = (
> 		# HTTP/1.0
> 		-Pragma => 'no-cache',
> 		# HTTP/1.1
> 		-Cache_Control => join(', ', qw(private no-cache no-store must-revalidate
> 		                                max-age=0 pre-check=0 post-check=0)),
> 	);
> 
> (or something like that).  This way you can reuse it even if content
> type is different (e.g. 'text/plain').
> 
> But that is just a proposal.

Finer grained control, though they have the same basic setup.  Probably
will add that, though it's not that big of a deal.

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/18] gitweb: Add show_warning() to display an immediate warning, with refresh
  2010-12-10  1:01   ` Jakub Narebski
@ 2010-12-10  7:38     ` J.H.
  2010-12-10 14:10       ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: J.H. @ 2010-12-10  7:38 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On 12/09/2010 05:01 PM, Jakub Narebski wrote:
> "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:
> 
>> die_error() is an immediate and abrupt action.  show_warning() more or less
>> functions identically, except that the page generated doesn't use the
>> gitweb header or footer (in case they are broken) and has an auto-refresh
>> (10 seconds) built into it.
> 
> Why not use gitweb header/footer?  If they are broken, it should be
> caught in git development.  If we don't se them, the show_warning()
> output would look out of place.

The only other 'transient' style page, the 'Generating...' page doesn't
use it, and I felt that since this was also transient, and only (likely)
to be seen once it wasn't worth the header & footer.

That said I've added it back in, in v9.

>> +sub show_warning {
>> +	$| = 1;
> 
>   +	local $| = 1;
> 
> $| is global variable, and otherwise you would turn autoflush for all
> code, which would matter e.g. for FastCGI.

Since the execution exits immediately after, wouldn't FastCGI reset at
that point, since execution of that thread has stopped?  Or does FastCGI
retain everything as is across subsequent executions of a process?

>> +<meta http-equiv="refresh" content="10"/>
> 
> Why 10 seconds?

Long enough to see the error, but not too long to be a nuisance.  Mainly
just there to warn the admin that it did something automatic they may
not have been expecting.

>> +</head>
>> +<body>
>> +$warning
>> +</body>
>> +</html>
>> +EOF
>> +	exit(0);
> 
> "exit(0)" and not "goto DONE_GITWEB", or "goto DONE_REQUEST"?

DONE_REQUEST doesn't actually exist as a label, the exit was used
partially for my lack of love for goto's, but mostly out of not
realizing what that was calling back to (mainly for the excitement of
things like PSGI and their ilk)

I will change that that, but considering there are other locations where
I do explicit exit's and those are actually inherent to the way the
caching engine currently works, I might need to go take a look at what's
going on with respect to multi-threaded items inside of PSGI and their
like.  It's possible the caching engine doesn't actually work on those...

>> +}
>> +
>>  sub isBinaryAction {
>>  	my ($action) = @_;
> 
> Didn't you ran gitweb tests?

I did, they passed for me - for whatever reason my cache dir wasn't
cleaned up, and stayed resident once it was created.

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/18] gitweb: Prepare for cached error pages & better error page handling
  2010-12-10  1:49   ` Jakub Narebski
@ 2010-12-10  8:33     ` J.H.
  2010-12-10 20:33       ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: J.H. @ 2010-12-10  8:33 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

> There is no problem with capturing output of die_error, nor there is a
> problem with caching error pages (perhaps transiently in memory).
> 
> The problem is that subroutines calling die_error assum that it would
> exit ending subroutine that is responsible for generating current
> action; see "goto DONE_GITWEB" which should be "goto DONE_REQUEST",
> and which was "exit 0" some time ago at the end of die_error().
> 
> With caching error pages you want die_error to exit $actions{$action}->(),
> but not exit cache_fetch().  How do you intend to do it?

Well there's one bug in how that function ends in looking at it again,
basically the return case shouldn't happen, and that function should
end, like your suggesting in the first part of your question (with
respect to DONE_GITWEB)

In the second part, your not thinking with the fork() going (though in
thinking sans the fork this might not work right).

It's the background process that will call die_error in such a way that
die_error_cache will get invoked.  die_error_cache will write the .err
file out, and the whole thing should just exit.

Though now that I say that there's an obvious bug in the case where
forking didn't work at all, in that case you would get a blank page as
the connection would just be closed.  If you refreshed (say hitting F5)
you'd get the error at that point.

Need to fix that non-forked problem though.

>> This adds two functions:
>>
>> die_error_cache() - this gets back called from die_error() so
>> that the error message generated can be cached.
> 
> *How* die_error_cache() gets called back from die_error()?  I don't
> see any changes to die_error(), or actually any calling sites for
> die_error_cache() in the patch below.
>  
>> cacheDisplayErr() - this is a simplified version of cacheDisplay()
>> that does an initial check, if the error page exists - display it
>> and exit.  If not, return.
> 
> Errr... isn't it removed in _preceding_ patch?  WTF???

in breaking up the series it got included in the wrong spot, and
apparently removed and re-added correctly, should be fixed in v9

>> +sub die_error_cache {
>> +	my ($output) = @_;
>> +
>> +	open(my $cacheFileErr, '>:utf8', "$fullhashpath.err");
>> +	my $lockStatus = flock($cacheFileErr,LOCK_EX|LOCK_NB);
> 
> Why do you need to lock here?  A comment would be nice.

At any point when a write happens there's the potential for multiple
simultaneous writes.  Locking becomes obvious, when your trying to
prevent multiple processes from writing to the same thing at the same
time...

>> +
>> +	if (! $lockStatus ){
>> +		if ( $areForked ){
> 
> Grrrr...
> 
> But if it is here to stay, a comment if you please.
> 
>> +			exit(0);
>> +		}else{
>> +			return;
>> +		}
>> +	}

The exit(0) or return have been removed in favor of DONE_GITWEB, as
we've already errored if we are broken here we should just die.

>> +
>> +	# Actually dump the output to the proper file handler
>> +	local $/ = undef;
>> +	$|++;
> 
> Why not
> 
>   +	local $| = 1;
> 

Done.

> 
>> +	print $cacheFileErr "$output";
>> +	$|--;
>> +
>> +	flock($cacheFileErr,LOCK_UN);
>> +	close($cacheFileErr);
> 
> Closing file will unlock it.

Doesn't really hurt to be explicit though.

>> +
>> +	if ( $areForked ){
>> +		exit(0);
>> +	}else{
>> +		return;
> 
> So die_error_cache would not actually work like "die" here and like
> die_error(), isn't it?

that was ejected, it was a bug.  DONE_GITWEB is more correct, though I
might need to add a hook to display the error message in the case that
the process didn't fork.

>> +	}
>> +}
>> +
>>  
>>  sub cacheWaitForUpdate {
>>  	my ($action) = @_;
>> @@ -380,6 +410,28 @@ EOF
>>  	return;
>>  }
>>  
>> +sub cacheDisplayErr {
>> +
>> +	return if ( ! -e "$fullhashpath.err" );
>> +
>> +	open($cacheFileErr, '<:utf8', "$fullhashpath.err");
>> +	$lockStatus = flock($cacheFileErr,LOCK_SH|LOCK_NB);
>> +
>> +	if (! $lockStatus ){
>> +		show_warning(
>> +				"<p>".
>> +				"<strong>*** Warning ***:</strong> Locking error when trying to lock error cache page, file $fullhashpath.err<br/>/\n".
> 
> esc_path
> 
>> +				"This is about as screwed up as it gets folks - see your systems administrator for more help with this.".
>> +				"<p>"
>> +				);
>> +	}
>> +
>> +	while( <$cacheFileErr> ){
>> +		print $_;
>> +	}
> 
> Why not 'print <$cacheFileErr>' (list context), like in insert_file()
> subroutine?

I've had buffer problems with 'print <$cacheFileErr>' in some cases.
This is small enough it shouldn't happen, but I've gotten into the habit
of doing it this way.  I can change it if you like.

> 
>> +	exit(0);
>> +}
> 
> Callsites?
> 
> Note: I have't read next commit yet.

Next patch.

If you'd rather I can squash 17 & 18 into a single commit.

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type
  2010-12-10  3:39     ` J.H.
@ 2010-12-10 12:10       ` Jakub Narebski
  2010-12-10 12:25         ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10 12:10 UTC (permalink / raw)
  To: J.H.; +Cc: git

On Fri, 10 Dec 2010, J.H. wrote:

>>> This is fairly self explanatory, these are here just to centralize the checking
>>> for these types of actions, as special things need to be done with regards to
>>> them inside the caching engine.
>>>
>>> isBinaryAction() returns true if the action deals with creating binary files
>>> (this needing :raw output)
>> 
>> Why do you need special case binary / :raw output?  It is not really
>> necessary if it is done in right way, as shown in my rewrite.
> 
> Because that's not how my caching engine does it, and the reason for
> that is I am mimicking how the rest of gitweb does it.

To shorten the explanation why treating binary (needing :raw) output in
a special way is not necessary: with the way gitweb code is structured
(with "binmode STDOUT, ':raw'" inside action subroutine), with the way
capturing output is done (by redirecting STDOUT), and even with the way
kernel.org caching code is structured the only thing that needs to be
done to support both text (:utf8, as set at beginning of gitweb) and
binary (:raw) output is to *dump cache to STDOUT in binary mode*:

	binmode $cache_fh, ':raw';
	binmode STDOUT, ':raw';
	File::Copy::copy($fh, \*STDOUT);

Nothing more.

Just dump cache file to STDOUT in binary mode.
 
> I attempted at one point to do as you were suggesting, and it became too
> cumbersome.  I eventually broke out the 'binary' packages into a special
> case (thus mimicking how gitweb is already doing things), which also
> gives me the advantage of being able to checksum the resulting binary
> out of band, as well as being able to more trivially calculate the file
> size being sent.

I don't see how it needs to be special-cased: the ordinary output would
also take advantage of this.  Note that plain 'blob' action can also
be quite large.

If there is to be done smarter, i.e. HTTP-aware, parsing and dumping of
cache entry file, e.g. by reading the HTTP header part to memory and
fiddling with HTTP headers (e.g. adding Content-Length header), it can be
done in a contents-agnostic way.

Note that with the way I do it in my rewrite, namely saving cached output
to temporary file to rename it to final destination later (atomic update),
we can do mungling of HTTP headers before/during this final copying to
final file, e.g. calculating Content-Length and perhaps Content-MD5 
headers.

> 
>>> isFeedAction() returns true if the action deals with a news feed of some sort,
>>> basically used to bypass the 'Generating...' message should it be a news reader
>>> as those will explode badly on that page.
>> 
>> Why blacklisting 'feed', instead of whitelisting HTML-output?
> 
> There are a limited number of feed types and their ilk (standard xml
> formatted feed and atom), there are lots of html-output like things.
> Easier to default and have things work, generally, than to have things
> not work the way you would expect.

Ah, I see from what you written in other subthreads of this thread that
you prefer to have "Generating..." page where it is not wanted that not
have it where it could be useful (i.e. blacklist approach), while I took
the opposite side (i.e. whitelist approach).

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type
  2010-12-10 12:10       ` Jakub Narebski
@ 2010-12-10 12:25         ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10 12:25 UTC (permalink / raw)
  To: J.H.; +Cc: git

On Fri, 10 Dec 2010, Jakub Narebski wrote:
> On Fri, 10 Dec 2010, J.H. wrote:
> 
>>>> This is fairly self explanatory, these are here just to centralize the checking
>>>> for these types of actions, as special things need to be done with regards to
>>>> them inside the caching engine.
>>>>
>>>> isBinaryAction() returns true if the action deals with creating binary files
>>>> (this needing :raw output)
>>> 
>>> Why do you need special case binary / :raw output?  It is not really
>>> necessary if it is done in right way, as shown in my rewrite.
>> 
>> Because that's not how my caching engine does it, and the reason for
>> that is I am mimicking how the rest of gitweb does it.
> 
> To shorten the explanation why treating binary (needing :raw) output in
> a special way is not necessary: with the way gitweb code is structured
> (with "binmode STDOUT, ':raw'" inside action subroutine), with the way
> capturing output is done (by redirecting STDOUT), and even with the way
> kernel.org caching code is structured the only thing that needs to be
> done to support both text (:utf8, as set at beginning of gitweb) and
> binary (:raw) output is to *dump cache to STDOUT in binary mode*:
> 
> 	binmode $cache_fh, ':raw';
> 	binmode STDOUT, ':raw';
> 	File::Copy::copy($fh, \*STDOUT);
> 
> Nothing more.
> 
> Just dump cache file to STDOUT in binary mode.

Note that special-casing binary output means that you would never be able
to replace custom caching engine with e.g. CHI with Memcached backend,
because that treating some actions in a special way interleaves gitweb
code with guts of caching code.

And memcached might be a way that kernel.org would have to go...
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 07/18] gitweb: Revert back to $cache_enable vs. $caching_enabled
  2010-12-10  2:38     ` J.H.
@ 2010-12-10 13:48       ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10 13:48 UTC (permalink / raw)
  To: J.H.; +Cc: git

On Fri, 10 Dec 2010, J.H. wrote:

> > Formally, there is no backward compatibility with any released code.
> > Using out-of-tree patches is on one's own risk.
> 
> I will have to beg to differ with you on this, the entirety of the
> existing caching engine has been released code for a number of years,
> there are rpm packages available for it, at the very least, in Fedora
> and in EPEL.
> 
> The caching engine *IS* released code, and this patchset is as much a
> new feature as an attempt to merge a fork.  Kernel.org isn't the only
> one running this code, and that has been the case for several years now
> already.
> 
> Claiming that this isn't released code is doing me a disservice to me,
> and those who have submitted patches to it independent of git and the
> mainline gitweb.
> 
> Thinking about the patch series outside of that context will lead to me
> putting my foot down and arguing on those other users behalf.  I'm not
> keen on breaking them for no good reason, and I'm not seeing your change
> here as one that's particularly worthwhile, while causing external
> breakage for no reason.

I am so very sorry.  Please excuse me.  I didn't intent this to be arguing
against backwards compatibility with what amounts to gitweb fork, but rather
grumbling about maintaining our mistakes due to backwards compatibility
requirement.  I see now that it reads as arguing for breaking backwards
compatibility: the "Formally" qualifier is too weak.

That said I would rather there was no need for forking, or at least for
the caching patches to be peer-reviewed on git mailing list, even if they
wouldn't be accepted / merged in, or merged in soon enough to avoid need
for fork.

> 
> > But even discarding that, I'd rather use the same solution as in
> > 
> >   [PATCHv6/RFC 22/24] gitweb: Support legacy options used by kernel.org caching engine
> >   http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163058
> >   http://repo.or.cz/w/git/jnareb-git.git/commitdiff/27ec67ad90ecd56ac3d05f6a9ea49b6faabf7d0a
> > 
> > i.e.
> > 
> >   our $cache_enable;
> > 
> >   [...]
> > 
> >   # somewhere just before call to cache_fetch()
> >   $caching_enabled = !!$cache_enable if defined $cache_enable;
> > 
> >>
> >> This reverts back to the previous variable to enable / disable caching
> 
> Is there really any point in changing the name at all?  The intention of
> cache_enable, at one point, was to allow for other caching engines and
> while there aren't any other caching engines that use it, it's already
> treated identically to cache_enable.
> 
> If it really adds enough to the readability to the code, then I'm fine
> with adding:
> 
> 	$caching_enabled = $cache_enable if defined $cache_enable;
> 
> But now you are setting up two variables that control the same thing,
> adding the possibility for conflicts and confusion to end users.
> 
> I just want that stated.

I guess I can live (I'd have to live) with $cache_enable instead of
$caching_enabled as name of *boolean* variable controlling whether
caching is turned on or off.  Though I'd argue that $caching_enabled
is better name:

  if ($caching_enabled) {

reads naturally as "if caching [is] enabled"; not so with $cache_enable.
$cache_enable as enum is just a bad, bad idea, as is conflating enabling
caching with selecting caching engine (c.f. http://lwn.net/Articles/412131/
though only very peripherally - it is about other "conflated designs").


BTW. when leaving $cache_enable from "[PATCHv6/RFC 22/24] gitweb: Support
legacy options used by kernel.org caching engine" I forgot that it is
actually $cache_enable (which is 0 by default) that needs to be set up
if one wants caching.  All the rest of cache config variables can be left
at their default values... though, J.H., are they?

> 
> Also, why the double negative in your original snippet - that doesn't
> entirely make sense....

I don't know why I felt that I needed to convert it to bool...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/18] gitweb: Add show_warning() to display an immediate warning, with refresh
  2010-12-10  7:38     ` J.H.
@ 2010-12-10 14:10       ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10 14:10 UTC (permalink / raw)
  To: J.H.; +Cc: git

On Fri, 10 Dec 2010, J.H. wrote:
> On 12/09/2010 05:01 PM, Jakub Narebski wrote:
>> "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:
>> 
>>> die_error() is an immediate and abrupt action.  show_warning() more or less
>>> functions identically, except that the page generated doesn't use the
>>> gitweb header or footer (in case they are broken) and has an auto-refresh
>>> (10 seconds) built into it.
>> 
>> Why not use gitweb header/footer?  If they are broken, it should be
>> caught in git development.  If we don't se them, the show_warning()
>> output would look out of place.
> 
> The only other 'transient' style page, the 'Generating...' page doesn't
> use it, and I felt that since this was also transient, and only (likely)
> to be seen once it wasn't worth the header & footer.
> 
> That said I've added it back in, in v9.

Well, the contents and feel of show_warning() is more like die_error()
rather than "Generating..." page, so I feel that if die_error() conforms
to style of rest of gitweb pages, then show_warning() should too.
 
>>> +sub show_warning {
>>> +	$| = 1;
>> 
>>   +	local $| = 1;
>> 
>> $| is global variable, and otherwise you would turn autoflush for all
>> code, which would matter e.g. for FastCGI.
> 
> Since the execution exits immediately after, wouldn't FastCGI reset at
> that point, since execution of that thread has stopped?  Or does FastCGI
> retain everything as is across subsequent executions of a process?

Well, with exit(0) it is a moot point... but it is good habit to localize
punctation variables ($|, $/,)
 
>>> +<meta http-equiv="refresh" content="10"/>
>> 
>> Why 10 seconds?
> 
> Long enough to see the error, but not too long to be a nuisance.  Mainly
> just there to warn the admin that it did something automatic they may
> not have been expecting.

A comment if you please, then?
 
Hmmm... I guess there is no ned to make it configurable.

>>> +</head>
>>> +<body>
>>> +$warning
>>> +</body>
>>> +</html>
>>> +EOF
>>> +	exit(0);
>> 
>> "exit(0)" and not "goto DONE_GITWEB", or "goto DONE_REQUEST"?
> 
> DONE_REQUEST doesn't actually exist as a label,

Errr... DONE_REQUEST was introduced in

  [PATCH/RFC] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error
  Message-ID: <1290723308-21685-1-git-send-email-jnareb@gmail.com>
  http://permalink.gmane.org/gmane.comp.version-control.git/162156


> the exit was used 
> partially for my lack of love for goto's, but mostly out of not
> realizing what that was calling back to (mainly for the excitement of
> things like PSGI and their ilk)


You would have to do more than that.  ModPerl::Registry that is used
for mod_perl support (which as deployment is I guess more widespread
than PSGI via wrapper using Plack::App::WrapCGI, or FastCGI deployment)
redefines 'exit' so that CGI scripts that use 'exit' to end request
keep working without need to restart worker at each request; for real
exit, for example from background process, you need to use CORE::exit.
See e.g. http://repo.or.cz/w/git/jnareb-git.git/commitdiff/8bd99a6d37cc
the ->_set_maybe_background() method.

> 
> I will change that that, but considering there are other locations where
> I do explicit exit's and those are actually inherent to the way the
> caching engine currently works, I might need to go take a look at what's
> going on with respect to multi-threaded items inside of PSGI and their
> like.  It's possible the caching engine doesn't actually work on those...

That would be a pity.  In my rewrite I tried to take into acount both
non-persistent (plain CGI, running as script) and persistent (mod_perl,
FastCGI, PSGI) web environments.

>>> +}
>>> +
>>>  sub isBinaryAction {
>>>  	my ($action) = @_;
>> 
>> Didn't you ran gitweb tests?
> 
> I did, they passed for me - for whatever reason my cache dir wasn't
> cleaned up, and stayed resident once it was created.

Hmmm... I wonder why new tests in t9502 and t9503 didn't pass for me...


P.S. I'll write separate email about problems with die_error, die-ing
and output caching.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/18] gitweb: Prepare for cached error pages & better error page handling
  2010-12-10  8:33     ` J.H.
@ 2010-12-10 20:33       ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-12-10 20:33 UTC (permalink / raw)
  To: J.H.; +Cc: git

On Fri, 10 Dec 2010, J.H. wrote:

> > There is no problem with capturing output of die_error, nor there is a
> > problem with caching error pages (perhaps transiently in memory).
> > 
> > The problem is that subroutines calling die_error assum that it would
> > exit ending subroutine that is responsible for generating current
> > action; see "goto DONE_GITWEB" which should be "goto DONE_REQUEST",
> > and which was "exit 0" some time ago at the end of die_error().
> > 
> > With caching error pages you want die_error to exit $actions{$action}->(),
> > but not exit cache_fetch().  How do you intend to do it?
> 
> Well there's one bug in how that function ends in looking at it again,
> basically the return case shouldn't happen, and that function should
> end, like your suggesting in the first part of your question (with
> respect to DONE_GITWEB)
> 
> In the second part, your not thinking with the fork() going (though in
> thinking sans the fork this might not work right).
> 
> It's the background process that will call die_error in such a way that
> die_error_cache will get invoked.  die_error_cache will write the .err
> file out, and the whole thing should just exit.
> 
> Though now that I say that there's an obvious bug in the case where
> forking didn't work at all, in that case you would get a blank page as
> the connection would just be closed.  If you refreshed (say hitting F5)
> you'd get the error at that point.
> 
> Need to fix that non-forked problem though.

Well, if you, the author, cannot follow code flow of your own code, what
does it matter for being sure that this code is bug free?  What does this
matter for maintability of this code?
 

That rant aside, error / exception handling in gitweb is currently not
fitting well with output caching, at least the locking one.

die_error() functions as a kind of exception handling; we rely that on
the fact that calling die_error() would end request, independent on how
deep in the stack we are.  Originally die_error() ended with 'exit',
which ModPerl::Registry redefined for it to end request and not exit
worker.  Then 'exit' was replaced by 'goto DONE_GITWEB' to jump out of
several levels of calls; I didn't know then about ModPerl::Registry
redefining 'exit'... and actually it should be 'goto DONE_REQUEST', like
in "[PATCH/RFC] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in
die_error"

  http://permalink.gmane.org/gmane.comp.version-control.git/162156

It is because die_error is exception mechanism, and in current incarnation
always ends request, that is why error pages (generated by die_error) were
not cached: we jump out of capturing and out of caching.  The additional
reasoning is that we don't want to bloat cache with error pages, which 
IMHO usually gets requested only once (assumption: different clients makes
different errors).

Now in most cases the approach taken to modify die_error for caching only
by adding explicit turning off capturing at the beginning of die_error is
enough.

1. Single client, no generating in background (note that if given URL is
   never cached, we would not invoke background generation to refresh shown
   stale data - there wouldn't be stale data).

   In this case if there is an expected error, die_error() gets explicitely
   invoked, turns off capturing, prints error page to client, and ends
   request.

   In the case of uncaught "die", it would be caught by CGI::Carp error
   handler, and passed to handle_errors_html() subroutine (thanks to gitweb
   using set_message(\&handle_errors_html)), which runs die_error() with
   options making it not print HTTP header (which was already printed by
   CGI::Carp error handler), and not exit - the CGI::Carp error handler
   would end request instead.  die_error() turns of capturing, prints
   error page, and CGI::Carp error handler ends request.

2. Two clients arriving at exactly the same error (same link), at the
   same time.  This is quite unlikely.

   In my rewrite there is loop in ->compute method in rewritten caching
   engine, which reads:

      do {
          ...
      } until (<received data to show to user> || <tried to generate data ourself>);

   This means that one client acquires writers lock, die_error prints error
   page and exists, second client notices that it didn't get anything but
   didn't try it itself yet, and dies itself on die_error()

   Dealing with "die"-ing works the same as in the case described above,
   so there is no problem from this area neither.

   Alternate solution would be to treat it as the case described below.

3. Gitweb runs generating cache entry in background.  Note that if error
   pages are not cached, there would be no stale pages to serve while 
   regenerating data in background - so entering background process can
   be done only thanks to "Generating..." page.
   
   We can try _prevent this from happening_, as I did in my rewrite by
   introducing initial/startup delay in "Generating..." (which has also
   other reasons to use), or via 'generating_info_is_safe' protection.

   Otherwise we need to pass error page from background process to
   foreground proces that is showing "Generating..." page; well, to be
   more exact, with current workings of "Generating..." it would be its
   successor (next request, after reload / refresh).

   Note: the fact that it is *next request* that needs an error page
   (otherwise we would show "Generating..." page yet again).

   So what die_error needs to do if it finds itself in the background
   process (perhaps explicit $background boolean variable, perhaps
   comparing $$ with $my_pid, perhaps checking if STDOUT is closed)
   it needs to somehow write cache entry, perhaps in a special way
   marking it as error page.  The problem is to do it in generic way,
   that would not make it impossible to use other caching engine, or
   other capturing engine, in the future.

   Note also that at the end of background process (perhaps at the
   end of die_error) we need to exit process, and not just end request,
   so we should use 'CORE::exit(0);'.

The problem with 3rd case makes me think that it is high time that
die_error use Perl 5 exception throwing and handling mechanism, namely
"die" (for throwing errors, to be used in die_error), and "eval BLOCK"
(to catch errors).

As proposed on #perl channel when asking about this situation, die_error
would use 'die \$DIE_ERROR' to throw reference, or throw an object, to
easy distinguish between handled error from die_error, and unhandled
error from Perl (where we assume that all errors are strings).

run_request() or run() would then use 'eval { ... }', which has the
additional advantage that we can get rid of CGI::Carp::set_message,
which doesn't allow to use custom HTTP status, and supposedly doesn't
work with mod_perl 2.0.  Instead of adding capture_stop() to die_error(),
the capture mechanism should use 'eval { ... }', and just print response
if there was exception (like Capture::Tiny does)... or return captured
error page to be cached in the case of being in background process.


Well, any way we choose to handle it, the code should be very clear,
and handle all cases (other caching engines, perhaps also other capture
engines, non-persistent and persistent environments, redefined 'exit'
like in ModPerl::Registry case, not redefined 'exit' like I think in
FastCGI case, etc., etc.).

>>> This adds two functions:
>>>
>>> die_error_cache() - this gets back called from die_error() so
>>> that the error message generated can be cached.
>> 
>> *How* die_error_cache() gets called back from die_error()?  I don't
>> see any changes to die_error(), or actually any calling sites for
>> die_error_cache() in the patch below.
>>  
>>> cacheDisplayErr() - this is a simplified version of cacheDisplay()
>>> that does an initial check, if the error page exists - display it
>>> and exit.  If not, return.
>> 
>> Errr... isn't it removed in _preceding_ patch?  WTF???
> 
> in breaking up the series it got included in the wrong spot, and
> apparently removed and re-added correctly, should be fixed in v9

[...]
> 
> If you'd rather I can squash 17 & 18 into a single commit.

Yes, please.  Splitting those changes into 17 & 18 didn't make it more
clear (usually smaller commit == easier to review), but rather less
transparent.
 
>>> +sub die_error_cache {
>>> +	my ($output) = @_;
>>> +
>>> +	open(my $cacheFileErr, '>:utf8', "$fullhashpath.err");
>>> +	my $lockStatus = flock($cacheFileErr,LOCK_EX|LOCK_NB);
>> 
>> Why do you need to lock here?  A comment would be nice.
> 
> At any point when a write happens there's the potential for multiple
> simultaneous writes.  Locking becomes obvious, when your trying to
> prevent multiple processes from writing to the same thing at the same
> time...

Or you can use 'write to File::Temp::tempfile, rename file' trick for
atomic update to file.  I use it in early commits in my rewrite of gitweb
caching, see e.g.:

  http://repo.or.cz/w/git/jnareb-git.git/blob/refs/heads/gitweb/cache-kernel-v6:/gitweb/lib/GitwebCache/SimpleFileCache.pm#l362
 
>>> +
>>> +	if (! $lockStatus ){
>>> +		if ( $areForked ){
>> 
>> Grrrr...

Global variables, action at considerable distance.
 
>> But if it is here to stay, a comment if you please.
>> 
>>> +			exit(0);
>>> +		}else{
>>> +			return;
>>> +		}
>>> +	}
> 
> The exit(0) or return have been removed in favor of DONE_GITWEB, as
> we've already errored if we are broken here we should just die.

Note the difference between exit and Core::exit, when running gitweb
from mod_perl using ModPerl::Registry handler.
 

>>> +
>>> +	flock($cacheFileErr,LOCK_UN);
>>> +	close($cacheFileErr);
>> 
>> Closing file will unlock it.
> 
> Doesn't really hurt to be explicit though.

O.K., but please note that I have found LOCK_UN to be unreliable.
 
>>> +
>>> +	if ( $areForked ){
>>> +		exit(0);
>>> +	}else{
>>> +		return;
>> 
>> So die_error_cache would not actually work like "die" here and like
>> die_error(), isn't it?
> 
> that was ejected, it was a bug.  DONE_GITWEB is more correct, though I
> might need to add a hook to display the error message in the case that
> the process didn't fork.

By the way, why do you fork indiscriminately (remember that forking
is not without performance cost), even when background generation is
turned off, or you don't need background generation?
 
Wouldn't fallback on non-background generation if fork() fails, as in
my rewrite of gitweb caching series be a better solution?

>>> +	while( <$cacheFileErr> ){
>>> +		print $_;
>>> +	}
>> 
>> Why not 'print <$cacheFileErr>' (list context), like in insert_file()
>> subroutine?
> 
> I've had buffer problems with 'print <$cacheFileErr>' in some cases.
> This is small enough it shouldn't happen, but I've gotten into the habit
> of doing it this way.  I can change it if you like.

Perhaps

  print while <$cacheFileErr>;

(we use it in already in "print while <$fd>;" in git_blame_common())?

Or, if we use File::Copy, perhaps File::Copy::copy($cacheFileErr, \*STDOUT);
or something like that.
 
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/18] gitweb: add isDumbClient() check
  2010-12-10  4:00     ` J.H.
@ 2010-12-11  0:07       ` Junio C Hamano
  2010-12-11  0:15         ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2010-12-11  0:07 UTC (permalink / raw)
  To: J.H.; +Cc: Jakub Narebski, git

"J.H." <warthog9@eaglescrag.net> writes:

> My initial look indicated that perl-http-browserdetect wasn't available
> for RHEL / CentOS 5 - it is however available in EPEL.
>
> However there are a couple of things to note about User Agents at all:
> 	- They lie... a lot
> 	- Robots lie even more
>
> Blacklisting is still the better option, by a lot.  I'll re-work this
> some in v9, as I'm fine with the added dependency.

Thanks, both.  I sense that we finally are going to get a single version
of gitweb that can be used at larger sites ;-)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/18] gitweb: add isDumbClient() check
  2010-12-11  0:07       ` Junio C Hamano
@ 2010-12-11  0:15         ` Jakub Narebski
  2010-12-11  1:15           ` J.H.
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-12-11  0:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: J.H., git

Junio C Hamano wrote:
> "J.H." <warthog9@eaglescrag.net> writes:
> 
> > My initial look indicated that perl-http-browserdetect wasn't available
> > for RHEL / CentOS 5 - it is however available in EPEL.
> >
> > However there are a couple of things to note about User Agents at all:
> > 	- They lie... a lot
> > 	- Robots lie even more
> >
> > Blacklisting is still the better option, by a lot.  I'll re-work this
> > some in v9, as I'm fine with the added dependency.
> 
> Thanks, both.  I sense that we finally are going to get a single version
> of gitweb that can be used at larger sites ;-)

I wouldn't be so optimistic.  While we borrow features and ideas from
each other, the difference still remains that J.H. patches are bit hacky
but are tested, while my rewrite is IMHO cleaner but untested (well, 
untested on real life load).

Anyway the main issue that was discovered by PATCHv6 by me, and v8 by J.H.
is that die_error sucks... well, at least if background caching is enabled.

Anyway, J.H. plans v9, I plan shortened rewrite.
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/18] gitweb: add isDumbClient() check
  2010-12-11  0:15         ` Jakub Narebski
@ 2010-12-11  1:15           ` J.H.
  2010-12-11  1:40             ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: J.H. @ 2010-12-11  1:15 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Junio C Hamano, git

On 12/10/2010 04:15 PM, Jakub Narebski wrote:
> Junio C Hamano wrote:
>> "J.H." <warthog9@eaglescrag.net> writes:
>>
>>> My initial look indicated that perl-http-browserdetect wasn't available
>>> for RHEL / CentOS 5 - it is however available in EPEL.
>>>
>>> However there are a couple of things to note about User Agents at all:
>>> 	- They lie... a lot
>>> 	- Robots lie even more
>>>
>>> Blacklisting is still the better option, by a lot.  I'll re-work this
>>> some in v9, as I'm fine with the added dependency.
>>
>> Thanks, both.  I sense that we finally are going to get a single version
>> of gitweb that can be used at larger sites ;-)
> 
> I wouldn't be so optimistic.  While we borrow features and ideas from
> each other, the difference still remains that J.H. patches are bit hacky
> but are tested, while my rewrite is IMHO cleaner but untested (well, 
> untested on real life load).

At this point I'm not sure there is a way to rectify the two patch
series, and while we may borrow ideas from each other it's becoming
clear that we are both, generally speaking, heading in different
directions for what we want and need out of gitweb.  Jakub's patches for
the admin page are indicative of that.

> Anyway the main issue that was discovered by PATCHv6 by me, and v8 by J.H.
> is that die_error sucks... well, at least if background caching is enabled.

I'd agree with that, and as such I'm working on a complete re-work of
error handling in gitweb for v9.  Things are looking pretty good so far,
but to claim that it's a non-invasive patch would be akin to selling
someone the Brooklyn bridge.

That said, the way Gitweb handles it's errors and things like exit are
appalling and this has been something that's needed doing for a while
anyway.  Guess now's the time to do it.  Might be a few days for me to
get far enough for any of it to be worthwhile sharing, late next week
maybe.  That said I hit vacation starting on the 20th so it might be
next year before that is finalized.

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/18] gitweb: add isDumbClient() check
  2010-12-11  1:15           ` J.H.
@ 2010-12-11  1:40             ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-12-11  1:40 UTC (permalink / raw)
  To: J.H.; +Cc: Junio C Hamano, git

On Sat, 11 Dec 2010, J.H. wrote:
> On 12/10/2010 04:15 PM, Jakub Narebski wrote:
>> Junio C Hamano wrote:
>>> "J.H." <warthog9@eaglescrag.net> writes:
>>>
>>>> My initial look indicated that perl-http-browserdetect wasn't available
>>>> for RHEL / CentOS 5 - it is however available in EPEL.
>>>>
>>>> However there are a couple of things to note about User Agents at all:
>>>> 	- They lie... a lot
>>>> 	- Robots lie even more
>>>>
>>>> Blacklisting is still the better option, by a lot.  I'll re-work this
>>>> some in v9, as I'm fine with the added dependency.
>>>
>>> Thanks, both.  I sense that we finally are going to get a single version
>>> of gitweb that can be used at larger sites ;-)
>> 
>> I wouldn't be so optimistic.  While we borrow features and ideas from
>> each other, the difference still remains that J.H. patches are bit hacky
>> but are tested, while my rewrite is IMHO cleaner but untested (well, 
>> untested on real life load).
> 
> At this point I'm not sure there is a way to rectify the two patch
> series, and while we may borrow ideas from each other it's becoming
> clear that we are both, generally speaking, heading in different
> directions for what we want and need out of gitweb.  Jakub's patches for
> the admin page are indicative of that.

Actually the cache administration page was just proof of concept.  Perhaps
a better solution would be to provide script that can be run to safely
clean cache (or just heavily outdated entries).


What I want from caching series is a clean separation between capturing
(so it can be replaced in the future e.g. by Capture::Tiny, or capturing
to mmapped fragment for Cache::FastMmap-like cache, or simple capturing
to memory for Memcached), caching engine (so it can be replaced by some
good and tested caching engine, like Cache::Cache, Cache, Cache::Memcached,
Cache::FastMmap, CHI and its drivers and options like cache levels), and
caching output module.  Modular build makes it easier to catch errors
and allows for unit testing each component separately.  And you can simply
use 'require <Package>' instead of doing manual error handling and 
protecting against redefine errors / multiple include via 'do <file>'.

What I don't like is caching engine guts strewn all over the gitweb.
I'd rather capturing engine was not tied too tightly with gitweb.  The
least controversial is "output caching" part...

Anyway I'd try to keep my rewrite feature-compatibile with J.H. series,
including (from v7) also backward compatibility with cache config option
names, including $cache_enable.  (Grrr... API/ABI backwards compatibility).

> 
>> Anyway the main issue that was discovered by PATCHv6 by me, and v8 by J.H.
>> is that die_error sucks... well, at least if background caching is enabled.
> 
> I'd agree with that, and as such I'm working on a complete re-work of
> error handling in gitweb for v9.  Things are looking pretty good so far,
> but to claim that it's a non-invasive patch would be akin to selling
> someone the Brooklyn bridge.

Hmmm... I am also thinking about changing the way error handling is done
in gitweb, but I don't think it would be very invasive: for a non-cached
case it would be simply one "eval" in run_request() or run(), and "die"
instead of "goto DONE_XXX" in die_error().
 
Now if only there were HTTP::Server::Simple::FCGI so I would be able to
test fastCGI support without need to install mod_fcgi / mod_fastcgi for
Apache... (local::lib and cpanm for the win!).

> That said, the way Gitweb handles it's errors and things like exit are
> appalling and this has been something that's needed doing for a while
> anyway.  Guess now's the time to do it.  Might be a few days for me to
> get far enough for any of it to be worthwhile sharing, late next week
> maybe.  That said I hit vacation starting on the 20th so it might be
> next year before that is finalized.

I also don't think that output caching can be done before end of this year,
sorry.

Hmmm... I guess that in shortened minimal version of my rewrite of output
caching for gitweb (without zero-size check, adaptive cache lifetime, 
perhaps even without support for alternate caching engines) I should also
include minimal improvement to die_error-handling.  Just like there is
"gitweb: Prepare for splitting gitweb" there.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 16/18] gitweb: When changing output (STDOUT) change STDERR as well
  2010-12-10  1:36   ` Jakub Narebski
@ 2010-12-12  5:25     ` J.H.
  2010-12-12 15:17       ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: J.H. @ 2010-12-12  5:25 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

> Hmm... anuthing that happens after 'use CGI::Carp;' is parsed should
> have STDERR redirected to web server logs, see CGI::Carp manpage
> 
>     [...]
>  
>        use CGI::Carp
> 
>     And the standard warn(), die (), croak(), confess() and carp() calls will
>     automagically be replaced with functions that write out nicely time-stamped
>     messages to the HTTP server error log.
> 
>     [...]
> 
>     REDIRECTING ERROR MESSAGES
> 
>        By default, error messages are sent to STDERR.  Most HTTPD servers direct
>        STDERR to the server's error log.
> 
>     [...]
> 
> Especially the second part.

That was not what I was seeing, so either something I was doing was
horking how CGI::Carp works, or their claim that "most HTTPD server
direct STDERR to the server's error log" is false.

> Could you give us example which causes described misbehaviour?

While I was working on the trapping of the error pages I started getting
500 errors when going to a non-existent sha1.  Running the command from
the cli revealed that a message from a git command was making it out to
the console.  Redirecting STDERR masked the error from git, and stopped
premature data being sent out before the headers were sent.

> I have nothing against this patch: if you have to have it, then you
> have to have it.  I oly try to understand what might be core cause
> behind the issue that this patch is to solve...

I've re-tried this, if you remove this patch and attempt to visit a
non-exist sha1, *boom*

I can only speculate that CGI::Carp only redirects the output inside of
perl, and does not handle the case when called programs (like git) write
more directly to STDERR.

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 16/18] gitweb: When changing output (STDOUT) change STDERR as well
  2010-12-12  5:25     ` J.H.
@ 2010-12-12 15:17       ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-12-12 15:17 UTC (permalink / raw)
  To: J.H.; +Cc: git

On Sun, 12 Dec 2010, J.H. wrote:

> > Hmm... anuthing that happens after 'use CGI::Carp;' is parsed should
> > have STDERR redirected to web server logs, see CGI::Carp manpage
> > 
> >     [...]
> >  
> >        use CGI::Carp
> > 
> >     And the standard warn(), die (), croak(), confess() and carp() calls will
> >     automagically be replaced with functions that write out nicely time-stamped
> >     messages to the HTTP server error log.
> > 
> >     [...]
> > 
> >     REDIRECTING ERROR MESSAGES
> > 
> >        By default, error messages are sent to STDERR.  Most HTTPD servers direct
> >        STDERR to the server's error log.
> > 
> >     [...]
> > 
> > Especially the second part.
> 
> That was not what I was seeing, so either something I was doing was
> horking how CGI::Carp works, or their claim that "most HTTPD server
> direct STDERR to the server's error log" is false.
> 
> > Could you give us example which causes described misbehaviour?
> 
> While I was working on the trapping of the error pages I started getting
> 500 errors when going to a non-existent sha1.  Running the command from
> the cli revealed that a message from a git command was making it out to
> the console.  Redirecting STDERR masked the error from git, and stopped
> premature data being sent out before the headers were sent.

Generally if something worked, and stopped working, don't you think
that you should concentrate on fixing your code, and not papering
over the issue?


The fact that "Running the command from the cli revealed that a message
from a git command was making it out to the console." doesn't mean
anything, because when running gitweb from commandline both stdout
and stderr are redirected to terminal, by default.  So you should
worry only if there is premature data being sent to standard output,
with standard error redirected to /dev/null (2>/dev/null).

What CGI::Carp does is (re)define 'die' and 'warn' to support
fatalsToBrowser and warningsToBrowser, and to add timestamp and other
auxiliary information: in the end 'die' calls 'CORE::die', and 'warn'
calls 'CORE::warn' - both of which write to STDERR.  This means that
warnings from git commands sent to standard error do not get timestamp
appended.  Note that standard output from git commands run by gitweb
is always captured.
 
> > I have nothing against this patch: if you have to have it, then you
> > have to have it.  I oly try to understand what might be core cause
> > behind the issue that this patch is to solve...
> 
> I've re-tried this, if you remove this patch and attempt to visit a
> non-exist sha1, *boom*
> 
> I can only speculate that CGI::Carp only redirects the output inside of
> perl, and does not handle the case when called programs (like git) write
> more directly to STDERR.

CGI::Carp doesn't redirect output: it adds timestamp and prints it to
STDERR (unless one use 'carpout') to the result of 'die' and 'warn' calls.

*Without your series* when I visit non-existing sha1, or non-existing
file I get correctly 404 error from gitweb.  So you have borked something.

The CGI standard (http://tools.ietf.org/html/rfc3875) doesn't talk about
'standard error' stream at all; on the other hand it talks only about
'standard input' and 'standard output'.  I have checked with simple CGI
script in Perl, that neither using die or warn (both before any HTTP 
headers are send), neither with plain CGI or with mod_perl 
(ModPerl::Registry), with CGI::Carp I never get the error you see.
Without CGI::Carp I get '500 Internal Server Error' instead of nicer
one formatted by CGI::Carp, but I don't get it even without CGI::Carp
with 'warn' and printing to STDERR directly.

The standard error stream either gets discarded (mod_cgid), or is
written to /var/log/httpd/error_log (mod_perl).

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2010-12-12 15:17 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 01/18] gitweb: Prepare for splitting gitweb John 'Warthog9' Hawley
2010-12-09 23:30   ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 02/18] gitweb: add output buffering and associated functions John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 03/18] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 04/18] gitweb: Minimal testing of gitweb caching John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 05/18] gitweb: Regression fix concerning binary output of files John 'Warthog9' Hawley
2010-12-09 23:33   ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 06/18] gitweb: Add more explicit means of disabling 'Generating...' page John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 07/18] gitweb: Revert back to $cache_enable vs. $caching_enabled John 'Warthog9' Hawley
2010-12-09 23:38   ` Jakub Narebski
2010-12-10  2:38     ` J.H.
2010-12-10 13:48       ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 08/18] gitweb: Change is_cacheable() to return true always John 'Warthog9' Hawley
2010-12-09 23:46   ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 09/18] gitweb: Revert reset_output() back to original code John 'Warthog9' Hawley
2010-12-09 23:58   ` Jakub Narebski
2010-12-10  2:43     ` J.H.
2010-12-09 21:57 ` [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type John 'Warthog9' Hawley
2010-12-10  0:06   ` Jakub Narebski
2010-12-10  3:39     ` J.H.
2010-12-10 12:10       ` Jakub Narebski
2010-12-10 12:25         ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 11/18] gitweb: add isDumbClient() check John 'Warthog9' Hawley
2010-12-10  0:12   ` Jakub Narebski
2010-12-10  4:00     ` J.H.
2010-12-11  0:07       ` Junio C Hamano
2010-12-11  0:15         ` Jakub Narebski
2010-12-11  1:15           ` J.H.
2010-12-11  1:40             ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs John 'Warthog9' Hawley
2010-12-10  0:16   ` Jakub Narebski
2010-12-10  0:32     ` Junio C Hamano
2010-12-10  0:47       ` Jakub Narebski
2010-12-10  5:56       ` J.H.
2010-12-09 21:57 ` [PATCH 13/18] gitweb: Add commented url & url hash to page footer John 'Warthog9' Hawley
2010-12-10  0:26   ` Jakub Narebski
2010-12-10  6:10     ` J.H.
2010-12-09 21:57 ` [PATCH 14/18] gitweb: add print_transient_header() function for central header printing John 'Warthog9' Hawley
2010-12-10  0:36   ` Jakub Narebski
2010-12-10  6:18     ` J.H.
2010-12-09 21:57 ` [PATCH 15/18] gitweb: Add show_warning() to display an immediate warning, with refresh John 'Warthog9' Hawley
2010-12-10  1:01   ` Jakub Narebski
2010-12-10  7:38     ` J.H.
2010-12-10 14:10       ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 16/18] gitweb: When changing output (STDOUT) change STDERR as well John 'Warthog9' Hawley
2010-12-10  1:36   ` Jakub Narebski
2010-12-12  5:25     ` J.H.
2010-12-12 15:17       ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 17/18] gitweb: Prepare for cached error pages & better error page handling John 'Warthog9' Hawley
2010-12-10  1:49   ` Jakub Narebski
2010-12-10  8:33     ` J.H.
2010-12-10 20:33       ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 18/18] gitweb: Add better error handling for gitweb caching John 'Warthog9' Hawley
2010-12-10  1:56   ` Jakub Narebski
2010-12-09 23:26 ` [PATCH 00/18] Gitweb caching v8 Jakub Narebski
2010-12-10  0:43   ` J.H.
2010-12-10  1:27     ` Jakub Narebski
2010-12-10  0:39 ` Junio C Hamano
2010-12-10  0:45   ` J.H.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).