git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCHv6/RFC 00/24] gitweb: Simple file based output caching
@ 2010-12-06 23:10 Jakub Narebski
  2010-12-06 23:10 ` [PATCH 01/24] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external Jakub Narebski
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

[I Cc-ed everybody who *might* be interested in this series.  I am
 sorry if it included somebody by mistake]

This 22+ patches long series (2 last patches are proof of concept) is
intended as replacement (rewrite) of "Gitweb caching v7" series from
John 'Warthog9' Hawley (J.H.):

  http://thread.gmane.org/gmane.comp.version-control.git/160147

This is sixth version (6th release) of this series, and is available
in the following repositories (links are to web interface):

  http://repo.or.cz/w/git/jnareb-git.git
  http://github.com/jnareb/git

as 'gitweb/cache-kernel-v6' branch.  Earlier versions are available at
http://repo.or.cz/w/git/jnareb-git.git as 'gitweb/cache-kernel-v5'
(previous version) to 'gitweb/cache-kernel' (first version).

Previous version of this series was sent to git mailing list as:

  [PATCHv5 00/17] gitweb: Simple file based output caching
  Message-Id: <1286402526-13143-1-git-send-email-jnareb@gmail.com>
  http://thread.gmane.org/gmane.comp.version-control.git/158313

You can find link to next to previous version, et cetera.


The main ideas in lifted from J.H. patches are the following
(features in common with "Gitweb caching v7" series by John Hawley):

* caching captured output of gitweb in flat files, without any
  serialization (caching raw data)

* using global (per-cache, not per-entry) expiration time, and
  using difference between mtime of cache file and current time
  for expiration

* using file locking (flock) to prevent 'cache miss stampede'
  problem, i.e. to ensure that only one process is (re)generating
  cache entry

* serving stale but not too old version, and regenerating data
  in background, to avoid waiting for data to be regenerated

* progress info indicator based on http-equiv refresh trick
  (described in more detail how it works in the commit message)

* capturing gitweb output by redirecting STDOUT to cache entry file


The main differences between this patch series and "Gitweb caching v7"
(and my minimal fixups in "Gitweb caching v7.[1-3]") are the following:

* features are added piece by piece in multiple patches (22 patches
  covering v7 features vs 3-4 patches in v7/v7.x series), making it
  hopefully easier to review, as patches are smaller.  OTOH this series
  is much longer...

* In J.H. series subroutines responsible for capturing gitweb output are
  in gitweb.perl, and subroutines responsible for caching are in lib/cache.pl
  (cache.pm in original patch).  cache.pl/cache.pm uses variables and
  subroutines from gitweb script, so it couldn't be made into Perl module;
  therefore we have to use 'do' rather than 'require' to load it.

  In this series GitwebCache::Capture::Simple module is responsible for
  capturing [gitweb] output, GitwebCache::SimpleFileCache and
  GitwebCache::FileCacheWithLocking are responsible for caching, and
  GitwebCache::CacheOutput is about caching captured output (ties them
  together).  This allowed "unit" testing, i.e. testing each module
  in isolation (tests t9503 - t9505).

* GitwebCache::CacheOutput::cache_output (equivalent of cache_fetch from
  cache.pm in J.H. patch) supports any cache supporting ->get / ->set or
  ->compute interface (e.g. Cache::FileCache from Cache::Cache, or CHI
  with 'File' driver, or Cache::FastMmap) - it is described in gitweb/README
  in "Gitweb caching" section.

  For this capturing engine (GitwebCache::Capture::Simple) supports returning
  captured output (via capturing to in-memory file).

  Tested once upon a time with Cache::FileCache $cache.

* There is no difference between treating actions with binary output or
  possibly binary output like 'snapshot' or 'blob_plain' (which use binary
  or ':raw' mode) and other actions (which use text or ':utf8' mode).
  GitwebCache::Capture::Simple captures transformed output i.e. raw bytes,
  so data from cache is dumped to STDOUT (to web browser) in ':raw' mode.

* Instead of disabling caching of 'blame_incremental' action (so it is
  used without caching), this alternate to plain 'blame' action is
  disabled if caching is turned off.

  In the future 'blame_interactive' would use cache for caching its
  initial output and for caching 'blame_data' it uses.

* Configuring cache is done via %cache_options (and %generating_options)
  instead of via gitweb config variables.  For example instead of 
  $minCacheTime there is $cache_options{'expires_min'}.

  It is also more configurable than in J.H. patch; more parameters can be
  changed (like e.g. factor multiplying get_loadavg() in adaptive cache
  lifetime; 'check_load', 'generating_info', 'on_error' are configurable
  callbacks).

  "gitweb: Support legacy options used by kernel.org caching engine"
  patch in this series makes this rewrite support configuration variables
  used by "Gitweb caching v7" series.

* This rewrite uses lexical filehandles, i.e.

    open my $fh, '>', $filename

  instead of globals that J.H. patch uses

    open FH, '>', $filename

  (though it hides it in "open(cacheFile, '<', $filename)").  J.H. is
  working on "Gitweb caching v8" and I think he would address that issue
  there.

* When generating cache in background process, the background process
  daemonizes itself.  Therefore it should be safe to enable / use
  'background_cache' also for persistent environments, like mod_perl via
  ModPerl::Registry, FastCGI when run as gitweb.fcgi, PSGI via gitweb.psgi
  wrapper that git-instaweb generates.

- Other changes might be mentioned in comments to individual patches

Two last patches in this series introduce proof of concept cache
administration page, where you can currently check how much file space is
used by cache, and where you can also safely clean cache (remove all
entries).  Those two patches are slightly outside scope of "gitweb output
caching", and that is why I refer to this series as 22+ patches long
(there are 24 patches in total).

Previous version of this series had
  gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/)
  gitweb/lib - Alternate ways of capturing output
as two last patche in the series.  They are missing in this release.


The following changes since commit 0b0cd0e0a29a139f418991dd769ea4266ffec370:

  Merge branch 'jn/ignore-doc' (2010-12-03 16:13:06 -0800)

are available in the git repository at:

  git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel-v6

Jakub Narebski (24):
  t/test-lib.sh: Export also GIT_BUILD_DIR in test_external
  gitweb: Prepare for splitting gitweb
  gitweb/lib - Very simple file based cache
  gitweb/lib - Stat-based cache expiration
  gitweb/lib - Regenerate entry if the cache file has size of 0
  gitweb/lib - Simple output capture by redirecting STDOUT
  gitweb/lib - Cache captured output (using get/set)
  gitweb: Add optional output caching
  gitweb/lib - Adaptive cache expiration time
  gitweb/lib - Use CHI compatibile (compute method) caching interface
  gitweb/lib - capture output directly to cache entry file
  gitweb/lib - Use locking to avoid 'cache miss stampede' problem
  gitweb/lib - No need for File::Temp when locking
  gitweb/lib - Serve stale data when waiting for filling cache
  gitweb/lib - Regenerate (refresh) cache in background
  gitweb: Introduce %actions_info, gathering information about actions
  gitweb: Show appropriate "Generating..." page when regenerating cache
  gitweb/lib - Configure running 'generating_info' when generating data
  gitweb: Add startup delay to activity indicator for cache
  gitweb/lib - Add support for setting error handler in cache
  gitweb: Wrap die_error to use as error handler for caching engine
  gitweb: Support legacy options used by kernel.org caching engine
  gitweb/lib - Add clear() and size() methods to caching interface
  gitweb: Add beginnings of cache administration page (proof of
    concept)

 gitweb/Makefile                                |   23 +-
 gitweb/README                                  |   62 +++
 gitweb/gitweb.perl                             |  544 +++++++++++++++++++-
 gitweb/lib/GitwebCache/CacheOutput.pm          |  131 +++++
 gitweb/lib/GitwebCache/Capture/Simple.pm       |  110 ++++
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |  376 ++++++++++++++
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |  592 ++++++++++++++++++++++
 t/gitweb-lib.sh                                |   12 +
 t/t9500-gitweb-standalone-no-errors.sh         |   20 +
 t/t9501-gitweb-standalone-http-status.sh       |   21 +
 t/t9502-gitweb-standalone-parse-output.sh      |   33 ++
 t/t9503-gitweb-caching-interface.sh            |   34 ++
 t/t9503/test_cache_interface.pl                |  647 ++++++++++++++++++++++++
 t/t9504-gitweb-capture-interface.sh            |   34 ++
 t/t9504/test_capture_interface.pl              |  108 ++++
 t/t9505-gitweb-cache.sh                        |   39 ++
 t/t9505/test_cache_output.pl                   |   86 ++++
 t/test-lib.sh                                  |    4 +-
 18 files changed, 2850 insertions(+), 26 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/CacheOutput.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture/Simple.pm
 create mode 100644 gitweb/lib/GitwebCache/FileCacheWithLocking.pm
 create mode 100644 gitweb/lib/GitwebCache/SimpleFileCache.pm
 create mode 100755 t/t9503-gitweb-caching-interface.sh
 create mode 100755 t/t9503/test_cache_interface.pl
 create mode 100755 t/t9504-gitweb-capture-interface.sh
 create mode 100755 t/t9504/test_capture_interface.pl
 create mode 100755 t/t9505-gitweb-cache.sh
 create mode 100755 t/t9505/test_cache_output.pl

-- 
1.7.3

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/24] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 02/24] gitweb: Prepare for splitting gitweb Jakub Narebski
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

This way we can use it in test scripts written in other languages
(e.g. in Perl), and not rely on "$TEST_DIRECTORY/.."

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is not strictly required, but if we modified t/gitweb-lib.sh to
use GIT_BUILD_DIR, we would want to do the same in the test scripts
for caching interface etc. included in this series.

This patch is the same as in previous version of this series.  It has
no equivalent in "Gitweb caching v7" by J.H.; it was not needed there
because that series doesn't have separate "unit" tests for individual
components of gitweb output caching.

 t/test-lib.sh |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 48fa516..c077fa4 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -552,9 +552,9 @@ test_external () {
 		# Announce the script to reduce confusion about the
 		# test output that follows.
 		say_color "" "# run $test_count: $descr ($*)"
-		# Export TEST_DIRECTORY, TRASH_DIRECTORY and GIT_TEST_LONG
+		# Export TEST_DIRECTORY, GIT_BUILD_DIR, TRASH_DIRECTORY and GIT_TEST_LONG
 		# to be able to use them in script
-		export TEST_DIRECTORY TRASH_DIRECTORY GIT_TEST_LONG
+		export TEST_DIRECTORY GIT_BUILD_DIR TRASH_DIRECTORY GIT_TEST_LONG
 		# Run command; redirect its stderr to &4 as in
 		# test_run_, but keep its stdout on our stdout even in
 		# non-verbose mode.
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/24] gitweb: Prepare for splitting gitweb
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
  2010-12-06 23:10 ` [PATCH 01/24] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 03/24] gitweb/lib - Very simple file based cache Jakub Narebski
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

Prepare gitweb for having been split into modules that are to be
installed alongside gitweb in 'lib/' subdirectory, by adding

  use lib __DIR__.'/lib';

to gitweb.perl (to main gitweb script), and preparing for putting
modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile.

This preparatory work allows to add new module to gitweb by simply
adding

  GITWEB_MODULES += <module>

to gitweb/Makefile (assuming that the module is in 'gitweb/lib/'
directory).

While at it pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED to
allow testing installed version of gitweb and installed version of
modules (for future tests which would check individual (sub)modules).


Using __DIR__ from Dir::Self module (not in core, that's why currently
gitweb includes excerpt of code from Dir::Self defining __DIR__) was
chosen over using FindBin-based solution (in core since perl 5.00307,
while gitweb itself requires at least perl 5.8.0) because FindBin uses
BEGIN block, which is a problem under mod_perl and other persistent
Perl environments (thought there are workarounds).

At Pavan Kumar Sankara suggestion gitweb/Makefile uses

  install [OPTION]... SOURCE... DIRECTORY

format (2nd format) with single SOURCE rather than

  install [OPTION]... SOURCE DEST

format (1st format) because of security reasons (race conditions).
Modern GNU install has `-T' / `--no-target-directory' option, but we
cannot rely that the $(INSTALL) we are using supports this option.

The install-modules target in gitweb/Makefile uses shell 'for' loop,
instead of make's $(foreach) function, to avoid possible problem with
generating a command line that exceeded the maximum argument list
length.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
In previous version there was a BUG: shell variables should not be in
single quotes.

This is fixed version, the same as in

  [PATCHv7.4 0/4] Gitweb caching v7.4
  Message-ID: <1291404335-25541-1-git-send-email-jnareb@gmail.com>
  http://thread.gmane.org/gmane.comp.version-control.git/162830

series, which are a bit fixed up and a tiny bit cleaned up version of
"Gitweb caching v7" series from John 'Warthog9' Hawley:

  http://thread.gmane.org/gmane.comp.version-control.git/160147


Sidenote: this patch can serve as a way to introduce infrastructure
required for splitting gitweb into modules for better maintability.
This issue was to be first part (first milestone) for "Integrated web
client for git" GSoC2010 project by Pavan Kumar Sankara (with Christian
Couder as mentor, and Petr 'Pasky' Baudis as co-mentor)... which
unfortunately failed middterm evaluations.

As you can see later in series it makes it very easy to add new gitweb
modules which are to be installed.  Just add

  GITWEB_MODULES += Module/Foo.pm

to gitweb/Makefile for Module::Foo (present in gitweb/lib/Module/Foo.pm)
and 'install-gitweb' target would automatically install it along with
gitweb.



 gitweb/Makefile    |   17 +++++++++++++++--
 gitweb/gitweb.perl |    8 ++++++++
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gitweb/Makefile b/gitweb/Makefile
index 0a6ac00..e6029e1 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -57,6 +57,7 @@ PERL_PATH  ?= /usr/bin/perl
 bindir_SQ = $(subst ','\'',$(bindir))#'
 gitwebdir_SQ = $(subst ','\'',$(gitwebdir))#'
 gitwebstaticdir_SQ = $(subst ','\'',$(gitwebdir)/static)#'
+gitweblibdir_SQ = $(subst ','\'',$(gitwebdir)/lib)#'
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))#'
 PERL_PATH_SQ  = $(subst ','\'',$(PERL_PATH))#'
 DESTDIR_SQ    = $(subst ','\'',$(DESTDIR))#'
@@ -153,20 +154,32 @@ test:
 
 test-installed:
 	GITWEB_TEST_INSTALLED='$(DESTDIR_SQ)$(gitwebdir_SQ)' \
+	GITWEBLIBDIR='$(DESTDIR_SQ)$(gitweblibdir_SQ)' \
 		$(MAKE) -C ../t gitweb-test
 
 ### Installation rules
 
-install: all
+install: all install-modules
 	$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitwebdir_SQ)'
 	$(INSTALL) -m 755 $(GITWEB_PROGRAMS) '$(DESTDIR_SQ)$(gitwebdir_SQ)'
 	$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitwebstaticdir_SQ)'
 	$(INSTALL) -m 644 $(GITWEB_FILES) '$(DESTDIR_SQ)$(gitwebstaticdir_SQ)'
 
+install-modules:
+	install_dirs="$(sort $(dir $(GITWEB_MODULES)))" && \
+	for dir in $$install_dirs; do \
+		test -d '$(DESTDIR_SQ)$(gitweblibdir_SQ)'/"$$dir" || \
+		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitweblibdir_SQ)'/"$$dir"; \
+	done
+	gitweb_modules="$(GITWEB_MODULES)" && \
+	for mod in $$gitweb_modules; do \
+		$(INSTALL) -m 644 "lib/$$mod" '$(DESTDIR_SQ)$(gitweblibdir_SQ)'/"$$(dirname $$mod)"; \
+	done
+
 ### Cleaning rules
 
 clean:
 	$(RM) gitweb.cgi static/gitweb.min.js static/gitweb.min.css GITWEB-BUILD-OPTIONS
 
-.PHONY: all clean install test test-installed .FORCE-GIT-VERSION-FILE FORCE
+.PHONY: all clean install install-modules test test-installed .FORCE-GIT-VERSION-FILE FORCE
 
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 679f2da..cfa511c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -10,6 +10,14 @@
 use 5.008;
 use strict;
 use warnings;
+
+use File::Spec;
+# __DIR__ is taken from Dir::Self __DIR__ fragment
+sub __DIR__ () {
+	File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
+}
+use lib __DIR__ . '/lib';
+
 use CGI qw(:standard :escapeHTML -nosticky);
 use CGI::Util qw(unescape);
 use CGI::Carp qw(fatalsToBrowser set_message);
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/24] gitweb/lib - Very simple file based cache
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
  2010-12-06 23:10 ` [PATCH 01/24] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external Jakub Narebski
  2010-12-06 23:10 ` [PATCH 02/24] gitweb: Prepare for splitting gitweb Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 04/24] gitweb/lib - Stat-based cache expiration Jakub Narebski
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

This is first step towards implementing file based output (response)
caching layer that is used on such large sites as kernel.org.

This patch introduces GitwebCaching::SimpleFileCache package, which
follows Cache::Cache / CHI interface, although do not implement it
fully.  The intent of following established convention for cache
interface is to be able to replace our simple file based cache,
e.g. by the one using memcached.

The data is stored in the cache as-is, without adding metadata (like
expiration date), and without serialization (which means that one can
store only scalar data).  At this point there is no support for
expiring cache entries.


The code of GitwebCaching::SimpleFileCache package in gitweb/lib
was heavily based on file-based cache in Cache::Cache package, i.e.
on Cache::FileCache, Cache::FileBackend and Cache::BaseCache, and on
file-based cache in CHI, i.e. on CHI::Driver::File and CHI::Driver
(including implementing atomic write, something that original patch
lacks).  It tries to follow more modern CHI architecture, but without
requiring Moose.  It is much simplified compared to both interfaces
and their file-based drivers.

This patch does not yet enable output caching in gitweb (it doesn't
have all required features yet); on the other hand it includes tests
of cache Perl API in t/t9503-gitweb-caching-interface.sh.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This version is cleaned up from remainders of old-style code inspired
by code in Cach::Cache distribution, and of cargo-culted code from
examples in Perl documentation; it uses more modern Perl coding
conventions.  Therefore it has significantly less warnings from
Perl::Critic (though they are not eliminated entirely).

Contrary to some earlier versions we do not try to be operating system
agnostic by using File::Spec; concatenating paths with '/' taken as
directory separator is faster, and that is what gitweb uses anyway.

Note that writing to cache entry file is done atomically via File::Temp
and renaming file to final version.  This way we can ensure that gitweb
wouldn't get halfway written file.  Use of File::Temp (which gives
safety but unfortunately has negative impact on performance) will be
made not necessary with introduction of locking.

Because GitwebCache::SimpleFileCache (and its derivatives) use standard
->new, ->get, ->set, ->compute methods / API / interface, then other
caching engines such as Cache::Cache, Cache, CHI, Cache::Memcached or
Cache::FastMmap can be used instead of this one.

Fast spew / fast slurp might be interesting, but they are obsoleted
(unused) when redirecting output directly to cache entry file in later
patch.

 gitweb/lib/GitwebCache/SimpleFileCache.pm |  336 +++++++++++++++++++++++++++++
 t/t9503-gitweb-caching-interface.sh       |   34 +++
 t/t9503/test_cache_interface.pl           |   85 ++++++++
 3 files changed, 455 insertions(+), 0 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/SimpleFileCache.pm
 create mode 100755 t/t9503-gitweb-caching-interface.sh
 create mode 100755 t/t9503/test_cache_interface.pl

diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
new file mode 100644
index 0000000..2f26a6c
--- /dev/null
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -0,0 +1,336 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Gitweb caching engine, simple file-based cache
+#
+
+# Minimalistic cache that stores data in the filesystem, without serialization
+# and currently without any kind of cache expiration (all keys last forever till
+# they got explicitely removed).
+#
+# It follows Cache::Cache and CHI interfaces (but does not implement it fully)
+
+package GitwebCache::SimpleFileCache;
+
+use strict;
+use warnings;
+
+use File::Path qw(mkpath);
+use File::Temp qw(tempfile);
+use Digest::MD5 qw(md5_hex);
+
+# by default, the cache nests all entries on the filesystem single
+# directory deep, i.e. '60/b725f10c9c85c70d97880dfe8191b3' for
+# key name (key digest) 60b725f10c9c85c70d97880dfe8191b3.
+#
+our $DEFAULT_CACHE_DEPTH = 1;
+
+# by default, the root of the cache is located in 'cache'.
+#
+our $DEFAULT_CACHE_ROOT = "cache";
+
+# by default we don't use cache namespace (empty namespace);
+# empty namespace does not allow for simple implementation of clear() method.
+#
+our $DEFAULT_NAMESPACE = '';
+
+# ......................................................................
+# constructor
+
+# The options are set by passing in a reference to a hash containing
+# any of the following keys:
+#  * 'namespace'
+#    The namespace associated with this cache.  This allows easy separation of
+#    multiple, distinct caches without worrying about key collision.  Defaults
+#    to $DEFAULT_NAMESPACE.
+#  * 'cache_root' (Cache::FileCache compatibile),
+#    'root_dir' (CHI::Driver::File compatibile),
+#    The location in the filesystem that will hold the root of the cache.
+#    Defaults to $DEFAULT_CACHE_ROOT.
+#  * 'cache_depth' (Cache::FileCache compatibile),
+#    'depth' (CHI::Driver::File compatibile),
+#    The number of subdirectories deep to cache object item.  This should be
+#    large enough that no cache directory has more than a few hundred objects.
+#    Defaults to $DEFAULT_CACHE_DEPTH unless explicitly set.
+sub new {
+	my $class = shift;
+	my %opts = ref $_[0] ? %{ $_[0] } : @_;
+
+	my $self = {};
+	$self = bless($self, $class);
+
+	my ($root, $depth, $ns);
+	if (%opts) {
+		$root =
+			$opts{'cache_root'} ||
+			$opts{'root_dir'};
+		$depth =
+			$opts{'cache_depth'} ||
+			$opts{'depth'};
+		$ns = $opts{'namespace'};
+	}
+	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
+	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
+	$ns    = $DEFAULT_NAMESPACE   unless defined($ns);
+
+	$self->set_root($root);
+	$self->set_depth($depth);
+	$self->set_namespace($ns);
+
+	return $self;
+}
+
+
+# ......................................................................
+# accessors
+
+# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
+
+# creates get_depth() and set_depth($depth) etc. methods
+foreach my $i (qw(depth root namespace)) {
+	my $field = $i;
+	no strict 'refs';
+	*{"get_$field"} = sub {
+		my $self = shift;
+		return $self->{$field};
+	};
+	*{"set_$field"} = sub {
+		my ($self, $value) = @_;
+		$self->{$field} = $value;
+	};
+}
+
+
+# ----------------------------------------------------------------------
+# utility functions and methods
+
+# Return root dir for namespace (lazily built, cached)
+sub path_to_namespace {
+	my ($self) = @_;
+
+	if (!exists $self->{'path_to_namespace'}) {
+		if (defined $self->{'namespace'} &&
+		    $self->{'namespace'} ne '') {
+			$self->{'path_to_namespace'} = "$self->{'root'}/$self->{'namespace'}";
+		} else {
+			$self->{'path_to_namespace'} =  $self->{'root'};
+		}
+	}
+	return $self->{'path_to_namespace'};
+}
+
+# $path = $cache->path_to_key($key);
+# $path = $cache->path_to_key($key, \$dir);
+#
+# Take an human readable key, and return file path.
+# Puts dirname of file path in second argument, if it is provided.
+sub path_to_key {
+	my ($self, $key, $dir_ref) = @_;
+
+	my @paths = ( $self->path_to_namespace() );
+
+	# Create a unique (hashed) key from human readable key
+	my $filename = md5_hex($key); # or $digester->add($key)->hexdigest();
+
+	# Split filename so that it have DEPTH subdirectories,
+	# where each subdirectory has a two-letter name
+	push @paths, unpack("(a2)[$self->{'depth'}] a*", $filename);
+	$filename = pop @paths;
+
+	# Join paths together, computing dir separately if $dir_ref was passed.
+	my $filepath;
+	if (defined $dir_ref && ref($dir_ref)) {
+		my $dir = join('/', @paths);
+		$filepath = "$dir/$filename";
+		$$dir_ref = $dir;
+	} else {
+		$filepath = join('/', @paths, $filename);
+	}
+
+	return $filepath;
+}
+
+sub read_file {
+	my $filename = shift;
+
+	# Fast slurp, adapted from File::Slurp::read, with unnecessary options removed
+	# via CHI::Driver::File (from CHI-0.33)
+	my $buf = '';
+	open my $read_fh, '<', $filename
+		or return;
+	binmode $read_fh, ':raw';
+
+	my $size_left = -s $read_fh;
+
+	while ($size_left > 0) {
+		my $read_cnt = sysread($read_fh, $buf, $size_left, length($buf));
+		return unless defined $read_cnt;
+
+		last if $read_cnt == 0;
+		$size_left -= $read_cnt;
+		#last if $size_left <= 0;
+	}
+
+	close $read_fh
+		or die "Couldn't close file '$filename' opened for reading: $!";
+	return $buf;
+}
+
+sub write_fh {
+	my ($write_fh, $filename, $data) = @_;
+
+	# Fast spew, adapted from File::Slurp::write, with unnecessary options removed
+	# via CHI::Driver::File (from CHI-0.33)
+	binmode $write_fh, ':raw';
+
+	my $size_left = length($data);
+	my $offset = 0;
+
+	while ($size_left > 0) {
+		my $write_cnt = syswrite($write_fh, $data, $size_left, $offset);
+		return unless defined $write_cnt;
+
+		$size_left -= $write_cnt;
+		$offset += $write_cnt; # == length($data);
+	}
+
+	close $write_fh
+		or die "Couldn't close file '$filename' opened for writing: $!";
+}
+
+# ----------------------------------------------------------------------
+# "private" utility functions and methods
+
+# take a file path to cache entry, and its directory
+# return filehandle and filename of open temporary file,
+# like File::Temp::tempfile
+sub _tempfile_to_path {
+	my ($file, $dir) = @_;
+
+	# tempfile will croak() if there is an error
+	return tempfile("${file}_XXXXX",
+		#DIR => $dir,
+		'UNLINK' => 0, # ensure that we don't unlink on close; file is renamed
+		'SUFFIX' => '.tmp');
+}
+
+
+# ----------------------------------------------------------------------
+# worker methods
+
+sub fetch {
+	my ($self, $key) = @_;
+
+	my $file = $self->path_to_key($key);
+	return unless (defined $file && -f $file);
+
+	return read_file($file);
+}
+
+sub store {
+	my ($self, $key, $data) = @_;
+
+	my $dir;
+	my $file = $self->path_to_key($key, \$dir);
+	return unless (defined $file && defined $dir);
+
+	# ensure that directory leading to cache file exists
+	if (!-d $dir) {
+		# mkpath will croak()/die() if there is an error
+		mkpath($dir, 0, 0777);
+	}
+
+	# generate a temporary file
+	my ($temp_fh, $tempname) = _tempfile_to_path($file, $dir);
+	chmod 0666, $tempname
+		or warn "Couldn't change permissions to 0666 / -rw-rw-rw- for '$tempname': $!";
+
+	write_fh($temp_fh, $tempname, $data);
+
+	rename($tempname, $file)
+		or die "Couldn't rename temporary file '$tempname' to '$file': $!";
+}
+
+# get size of an element associated with the $key (not the size of whole cache)
+sub get_size {
+	my ($self, $key) = @_;
+
+	my $path = $self->path_to_key($key)
+		or return undef;
+	if (-f $path) {
+		return -s $path;
+	}
+	return 0;
+}
+
+
+# ......................................................................
+# interface methods
+
+# Removing and expiring
+
+# $cache->remove($key)
+#
+# Remove the data associated with the $key from the cache.
+sub remove {
+	my ($self, $key) = @_;
+
+	my $file = $self->path_to_key($key)
+		or return;
+	return unless -f $file;
+	unlink($file)
+		or die "Couldn't remove file '$file': $!";
+}
+
+# Getting and setting
+
+# $cache->set($key, $data);
+#
+# Associates $data with $key in the cache, overwriting any existing entry.
+# Returns $data.
+sub set {
+	my ($self, $key, $data) = @_;
+
+	return unless (defined $key && defined $data);
+
+	$self->store($key, $data);
+
+	return $data;
+}
+
+# $data = $cache->get($key);
+#
+# Returns the data associated with $key.  If $key does not exist
+# or has expired, returns undef.
+sub get {
+	my ($self, $key) = @_;
+
+	return $self->fetch($key);;
+}
+
+# $data = $cache->compute($key, $code);
+#
+# Combines the get and set operations in a single call.  Attempts to
+# get $key; if successful, returns the value.  Otherwise, calls $code
+# and uses the return value as the new value for $key, which is then
+# returned.
+sub compute {
+	my ($self, $key, $code) = @_;
+
+	my $data = $self->get($key);
+	if (!defined $data) {
+		$data = $code->();
+		$self->set($key, $data);
+	}
+
+	return $data;
+}
+
+1;
+__END__
+# end of package GitwebCache::SimpleFileCache;
diff --git a/t/t9503-gitweb-caching-interface.sh b/t/t9503-gitweb-caching-interface.sh
new file mode 100755
index 0000000..819da1d
--- /dev/null
+++ b/t/t9503-gitweb-caching-interface.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Jakub Narebski
+#
+
+test_description='gitweb caching interface
+
+This test checks caching interface used in gitweb caching, and caching
+infrastructure (GitwebCache::* modules).'
+
+# for now we are running only cache interface tests
+. ./test-lib.sh
+
+# this test is present in gitweb-lib.sh
+if ! test_have_prereq PERL; then
+	skip_all='perl not available, skipping test'
+	test_done
+fi
+
+"$PERL_PATH" -MTest::More -e 0 >/dev/null 2>&1 || {
+	skip_all='perl module Test::More unavailable, skipping test'
+	test_done
+}
+
+# ----------------------------------------------------------------------
+
+# The external test will outputs its own plan
+test_external_has_tap=1
+
+test_external \
+	'GitwebCache::* Perl API (in gitweb/lib/)' \
+	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_cache_interface.pl
+
+test_done
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
new file mode 100755
index 0000000..1a3f374
--- /dev/null
+++ b/t/t9503/test_cache_interface.pl
@@ -0,0 +1,85 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use Test::More;
+
+# test source version
+use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
+
+
+# Test creating a cache
+#
+BEGIN { use_ok('GitwebCache::SimpleFileCache'); }
+diag("Using lib '$INC[0]'");
+diag("Testing '$INC{'GitwebCache/SimpleFileCache.pm'}'");
+
+my $cache = new_ok('GitwebCache::SimpleFileCache');
+
+# Test that default values are defined
+#
+ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
+	'$DEFAULT_CACHE_ROOT defined');
+ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
+	'$DEFAULT_CACHE_DEPTH defined');
+
+# Test accessors and default values for cache
+#
+SKIP: {
+	skip 'default values not defined', 3
+		unless ($GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT &&
+		        $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH);
+
+	is($cache->get_namespace(), '', "default namespace is ''");
+	cmp_ok($cache->get_root(),  'eq', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
+		"default cache root is '$GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT'");
+	cmp_ok($cache->get_depth(), '==', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
+		"default cache depth is $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH");
+}
+
+# Test the getting, setting, and removal of a cached value
+# (Cache::Cache interface)
+#
+my $key = 'Test Key';
+my $value = 'Test Value';
+
+subtest 'Cache::Cache interface' => sub {
+	foreach my $method (qw(get set remove)) {
+		can_ok($cache, $method);
+	}
+
+	$cache->set($key, $value);
+	cmp_ok($cache->get_size($key), '>', 0, 'get_size after set, is greater than 0');
+	is($cache->get($key), $value,          'get after set, returns cached value');
+	$cache->remove($key);
+	ok(!defined($cache->get($key)),        'get after remove, is undefined');
+
+	eval { $cache->remove('Not-Existent Key'); };
+	ok(!$@,                                'remove on non-existent key doesn\'t die');
+	diag($@) if $@;
+
+	done_testing();
+};
+
+# Test the getting and setting of a cached value
+# (CHI interface)
+#
+my $call_count = 0;
+sub get_value {
+	$call_count++;
+	return $value;
+}
+subtest 'CHI interface' => sub {
+	can_ok($cache, qw(compute));
+
+	is($cache->compute($key, \&get_value), $value, "compute 1st time (set) returns '$value'");
+	is($cache->compute($key, \&get_value), $value, "compute 2nd time (get) returns '$value'");
+	is($cache->compute($key, \&get_value), $value, "compute 3rd time (get) returns '$value'");
+	cmp_ok($call_count, '==', 1, 'get_value() is called once from compute');
+
+	done_testing();
+};
+
+done_testing();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/24] gitweb/lib - Stat-based cache expiration
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (2 preceding siblings ...)
  2010-12-06 23:10 ` [PATCH 03/24] gitweb/lib - Very simple file based cache Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 05/24] gitweb/lib - Regenerate entry if the cache file has size of 0 Jakub Narebski
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

Add stat-based cache expiration to file-based GitwebCache::SimpleFileCache.
Contrary to the way other caching interfaces such as Cache::Cache and CHI
do it, the time when cache element expires in is _global_ value associated
with cache instance, and is not local property of cache entry.  (Currently
cache entry does not store any metadata associated with entry... which
means that there is no need for serialization / marshalling / freezing and
thawing.)  Default expire time is -1, which means never expire.

To check if cache entry is expired, GitwebCache::SimpleFileCache compares
difference between mtime (last modify time) of a cache file and current time
with (global) time to expire.  It is done using CHI-compatible is_valid()
method.

Includes some tests checking that expiring works correctly (on the level of
API) in t9503.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Compared to "Gitweb caching v7" series, checking if cache entry is
expired (based on mtime of cache file) is simpler: we reuse stat
structure from test if file even exists (via stat(_)).

 gitweb/lib/GitwebCache/SimpleFileCache.pm |   40 +++++++++++++++++++++++++++-
 t/t9503/test_cache_interface.pl           |   19 +++++++++++++
 2 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 2f26a6c..790383d 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -57,6 +57,10 @@ our $DEFAULT_NAMESPACE = '';
 #    The number of subdirectories deep to cache object item.  This should be
 #    large enough that no cache directory has more than a few hundred objects.
 #    Defaults to $DEFAULT_CACHE_DEPTH unless explicitly set.
+#  * 'default_expires_in' (Cache::Cache compatibile),
+#    'expires_in' (CHI compatibile) [seconds]
+#    The expiration time for objects place in the cache.
+#    Defaults to -1 (never expire) if not explicitly set.
 sub new {
 	my $class = shift;
 	my %opts = ref $_[0] ? %{ $_[0] } : @_;
@@ -64,7 +68,7 @@ sub new {
 	my $self = {};
 	$self = bless($self, $class);
 
-	my ($root, $depth, $ns);
+	my ($root, $depth, $ns, $expires_in);
 	if (%opts) {
 		$root =
 			$opts{'cache_root'} ||
@@ -73,14 +77,19 @@ sub new {
 			$opts{'cache_depth'} ||
 			$opts{'depth'};
 		$ns = $opts{'namespace'};
+		$expires_in =
+			$opts{'default_expires_in'} ||
+			$opts{'expires_in'};
 	}
 	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
 	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
 	$ns    = $DEFAULT_NAMESPACE   unless defined($ns);
+	$expires_in = -1 unless defined($expires_in); # <0 means never
 
 	$self->set_root($root);
 	$self->set_depth($depth);
 	$self->set_namespace($ns);
+	$self->set_expires_in($expires_in);
 
 	return $self;
 }
@@ -92,7 +101,7 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace)) {
+foreach my $i (qw(depth root namespace expires_in)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -287,6 +296,31 @@ sub remove {
 		or die "Couldn't remove file '$file': $!";
 }
 
+# $cache->is_valid($key)
+#
+# Returns a boolean indicating whether $key exists in the cache
+# and has not expired (global per-cache 'expires_in').
+sub is_valid {
+	my ($self, $key) = @_;
+
+	my $path = $self->path_to_key($key);
+
+	# does file exists in cache?
+	return 0 unless -f $path;
+	# get its modification time
+	my $mtime = (stat(_))[9] # _ to reuse stat structure used in -f test
+		or die "Couldn't stat file '$path': $!";
+
+	# expire time can be set to never
+	my $expires_in = $self->get_expires_in();
+	return 1 unless (defined $expires_in && $expires_in >= 0);
+
+	# is file expired?
+	my $now = time();
+
+	return (($now - $mtime) < $expires_in);
+}
+
 # Getting and setting
 
 # $cache->set($key, $data);
@@ -310,6 +344,8 @@ sub set {
 sub get {
 	my ($self, $key) = @_;
 
+	return unless $self->is_valid($key);
+
 	return $self->fetch($key);;
 }
 
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 1a3f374..1517fb6 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -82,4 +82,23 @@ subtest 'CHI interface' => sub {
 	done_testing();
 };
 
+# Test cache expiration
+#
+subtest 'cache expiration' => sub {
+	$cache->set_expires_in(60*60*24); # set expire time to 1 day
+	cmp_ok($cache->get_expires_in(), '>', 0, '"expires in" is greater than 0');
+	is($cache->get($key), $value,            'get returns cached value (not expired in 1d)');
+
+	$cache->set_expires_in(-1); # set expire time to never expire
+	is($cache->get_expires_in(), -1,         '"expires in" is set to never (-1)');
+	is($cache->get($key), $value,            'get returns cached value (not expired)');
+
+	$cache->set_expires_in(0);
+	is($cache->get_expires_in(),  0,         '"expires in" is set to now (0)');
+	$cache->set($key, $value);
+	ok(!defined($cache->get($key)),          'cache is expired');
+
+	done_testing();
+};
+
 done_testing();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/24] gitweb/lib - Regenerate entry if the cache file has size of 0
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (3 preceding siblings ...)
  2010-12-06 23:10 ` [PATCH 04/24] gitweb/lib - Stat-based cache expiration Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 06/24] gitweb/lib - Simple output capture by redirecting STDOUT Jakub Narebski
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

If the file representing cache entry has size 0 (in bytes), the cache
entry is considered invalid, regardless of its freshness, even if cache
expiration is turned off.

[jh: This is a quick, and thankfully easy, check to regenerate the
cache if the resulting file is of size 0.  This *SHOULDN'T* happen,
but it is possible that it could and thus adding the check.]

Based-on-commit-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch is unchanged from previos version of this series.

It is inspired by commit 22eb7af (Gitweb - Regenerate the cache if the
resulting file has size of 0, 2010-09-23) on 'master' branch of
git/warthog9/gitweb.git repository on git.kernel.org

The "gitweb: File based caching layer (from git.kernel.org)" in
"Gitweb caching v7" by J.H. incudes the same test.

 gitweb/lib/GitwebCache/SimpleFileCache.pm |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 790383d..bf74f7c 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -310,6 +310,8 @@ sub is_valid {
 	# get its modification time
 	my $mtime = (stat(_))[9] # _ to reuse stat structure used in -f test
 		or die "Couldn't stat file '$path': $!";
+	# cache entry is invalid if it is size 0 (in bytes)
+	return 0 unless ((stat(_))[7] > 0);
 
 	# expire time can be set to never
 	my $expires_in = $self->get_expires_in();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/24] gitweb/lib - Simple output capture by redirecting STDOUT
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (4 preceding siblings ...)
  2010-12-06 23:10 ` [PATCH 05/24] gitweb/lib - Regenerate entry if the cache file has size of 0 Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 07/24] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

Add GitwebCache::Capture::Simple package, which captures output by
redirecting STDOUT to in-memory file (saving what is printed to
scalar), earlier saving original STDOUT to restore it when finished
capturing.

GitwebCache::Capture::Simple preserves PerlIO layers, both those set
before started capturing output, and those set during capture.  The
exceptions is the 'scalar' layer, which needs additional parameter,
and which for proper handling needs non-core module PerlIO::Util.

No care was taken to handle the following special cases (prior to
starting capture): closed STDOUT, STDOUT reopened to scalar reference,
tied STDOUT.  You shouldn't modify STDOUT during capture.

Includes separate tests for capturing output in
t9504/test_capture_interface.pl which is run as external test from
t9504-gitweb-capture-interface.sh.  It tests capturing of utf8 data
printed in :utf8 mode, and of binary data (containing invalid utf8) in
:raw mode.

Note that nested capturing doesn't work (and probably couldn't be made
to work when capturing to in-memory file), but this feature wouldn't
be needed for capturing gitweb output (to cache it).


This patch was based on "gitweb: add output buffering and associated
functions" patch by John 'Warthog9' Hawley (J.H.) in "Gitweb caching v7"
series, and on code of Capture::Tiny by David Golden (Apache License 2.0).

Based-on-work-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
In previous version of this series the basic (default) capturing
engine was GitwebCache::Capture::SelectFH, which made use of
select(FH) to change default filehandle used for print and printf
without filehandle argument (print LIST).  This required changing
  binmode STDOUT, <mode>
to
  binmode select(), <mode>
in gitweb.perl, which is now not necessary.

To simplify code this time we don't use GitwebCache::Capture as base
class providing common features, but we reserve space to add it if we
feel it necessary (e.g. when adding more capturing engines).

Even re-layering is probably not necessary in the case of gitweb, as
we set ':utf8' mode at beginning, and if we change it to ':raw' it is
always after we started capture.

Original "Gitweb caching v7" used capturing to in-memory files in the
case when caching was disabled; in my minimal fixup i.e. in the
"Gitweb caching v7.x" threads capturing to in-memory file is not done
at all; output is redirected straight to cache file (or equivalent).
The same would be done later in this series.

 gitweb/lib/GitwebCache/Capture/Simple.pm           |   96 ++++++++++++++++++++
 ...erface.sh => t9504-gitweb-capture-interface.sh} |   10 +-
 t/t9504/test_capture_interface.pl                  |   91 +++++++++++++++++++
 3 files changed, 192 insertions(+), 5 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/Capture/Simple.pm
 copy t/{t9503-gitweb-caching-interface.sh => t9504-gitweb-capture-interface.sh} (66%)
 create mode 100755 t/t9504/test_capture_interface.pl

diff --git a/gitweb/lib/GitwebCache/Capture/Simple.pm b/gitweb/lib/GitwebCache/Capture/Simple.pm
new file mode 100644
index 0000000..3585e58
--- /dev/null
+++ b/gitweb/lib/GitwebCache/Capture/Simple.pm
@@ -0,0 +1,96 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Simple output capturing via redirecting STDOUT to in-memory file.
+#
+
+# This is the same mechanism that Capture::Tiny uses, only simpler;
+# we don't capture STDERR at all, we don't tee, we don't support
+# capturing output of external commands.
+
+package GitwebCache::Capture::Simple;
+
+use strict;
+use warnings;
+
+use PerlIO;
+
+# Constructor
+sub new {
+	my $class = shift;
+
+	my $self = {};
+	$self = bless($self, $class);
+
+	return $self;
+}
+
+sub capture {
+	my ($self, $code) = @_;
+
+	$self->capture_start();
+	$code->();
+	return $self->capture_stop();
+}
+
+# ----------------------------------------------------------------------
+
+# Start capturing data (STDOUT)
+sub capture_start {
+	my $self = shift;
+
+	# save copy of real STDOUT via duplicating it
+	my @layers = PerlIO::get_layers(\*STDOUT);
+	open $self->{'orig_stdout'}, ">&", \*STDOUT
+		or die "Couldn't dup STDOUT for capture: $!";
+
+	# close STDOUT, so that it isn't used anymode (to have it fd0)
+	close STDOUT;
+
+	# reopen STDOUT as in-memory file
+	$self->{'data'} = '';
+	unless (open STDOUT, '>', \$self->{'data'}) {
+		open STDOUT, '>&', fileno($self->{'orig_stdout'});
+		die "Couldn't reopen STDOUT as in-memory file for capture: $!";
+	}
+	_relayer(\*STDOUT, \@layers);
+
+	# started capturing
+	$self->{'capturing'} = 1;
+}
+
+# Stop capturing data (required for die_error)
+sub capture_stop {
+	my $self = shift;
+
+	# return if we didn't start capturing
+	return unless delete $self->{'capturing'};
+
+	# close in-memory file, and restore original STDOUT
+	my @layers = PerlIO::get_layers(\*STDOUT);
+	close STDOUT;
+	open STDOUT, '>&', fileno($self->{'orig_stdout'});
+	_relayer(\*STDOUT, \@layers);
+
+	return $self->{'data'};
+}
+
+# taken from Capture::Tiny by David Golden, Apache License 2.0
+# with debugging stripped out, and added filtering out 'scalar' layer
+sub _relayer {
+	my ($fh, $layers) = @_;
+
+	my %seen = ( unix => 1, perlio => 1, scalar => 1 ); # filter these out
+	my @unique = grep { !$seen{$_}++ } @$layers;
+
+	binmode($fh, join(":", ":raw", @unique));
+}
+
+
+1;
+__END__
+# end of package GitwebCache::Capture::Simple
diff --git a/t/t9503-gitweb-caching-interface.sh b/t/t9504-gitweb-capture-interface.sh
similarity index 66%
copy from t/t9503-gitweb-caching-interface.sh
copy to t/t9504-gitweb-capture-interface.sh
index 819da1d..82623f1 100755
--- a/t/t9503-gitweb-caching-interface.sh
+++ b/t/t9504-gitweb-capture-interface.sh
@@ -3,10 +3,10 @@
 # Copyright (c) 2010 Jakub Narebski
 #
 
-test_description='gitweb caching interface
+test_description='gitweb capturing interface
 
-This test checks caching interface used in gitweb caching, and caching
-infrastructure (GitwebCache::* modules).'
+This test checks capturing interface used for capturing gitweb output
+in gitweb caching (GitwebCache::Capture* modules).'
 
 # for now we are running only cache interface tests
 . ./test-lib.sh
@@ -28,7 +28,7 @@ fi
 test_external_has_tap=1
 
 test_external \
-	'GitwebCache::* Perl API (in gitweb/lib/)' \
-	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_cache_interface.pl
+	'GitwebCache::Capture Perl API (in gitweb/lib/)' \
+	"$PERL_PATH" "$TEST_DIRECTORY"/t9504/test_capture_interface.pl
 
 test_done
diff --git a/t/t9504/test_capture_interface.pl b/t/t9504/test_capture_interface.pl
new file mode 100755
index 0000000..47ab804
--- /dev/null
+++ b/t/t9504/test_capture_interface.pl
@@ -0,0 +1,91 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+use utf8;
+
+use Test::More;
+
+# test source version
+use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
+
+# ....................................................................
+
+use_ok('GitwebCache::Capture::Simple');
+diag("Using lib '$INC[0]'");
+diag("Testing '$INC{'GitwebCache/Capture/Simple.pm'}'");
+
+# Test setting up capture
+#
+my $capture = new_ok('GitwebCache::Capture::Simple' => [], 'The $capture');
+
+# Test capturing
+#
+sub capture_block (&) {
+	return $capture->capture(shift);
+}
+
+diag('Should not print anything except test results and diagnostic');
+my $test_data = 'Capture this';
+my $captured = capture_block {
+	print $test_data;
+};
+is($captured, $test_data, 'capture simple data');
+
+binmode STDOUT, ':utf8';
+$test_data = <<'EOF';
+Zażółć gęsią jaźń
+EOF
+utf8::decode($test_data);
+$captured = capture_block {
+	binmode STDOUT, ':utf8';
+
+	print $test_data;
+};
+utf8::decode($captured);
+is($captured, $test_data, 'capture utf8 data');
+
+$test_data = '|\x{fe}\x{ff}|\x{9F}|\000|'; # invalid utf-8
+$captured = capture_block {
+	binmode STDOUT, ':raw';
+
+	print $test_data;
+};
+is($captured, $test_data, 'capture raw data');
+
+# Test nested capturing
+#
+TODO: {
+	local $TODO = "not required for capturing gitweb output";
+	no warnings;
+
+	my $outer_capture = GitwebCache::Capture::Simple->new();
+	$captured = $outer_capture->capture(sub {
+		print "pre|";
+		my $captured = $capture->capture(sub {
+			print "INNER";
+		});
+		print lc($captured);
+		print "|post";
+	});
+	is($captured, "pre|inner|post", 'nested capture');
+}
+
+SKIP: {
+	skip "Capture::Tiny not available", 1
+		unless eval { require Capture::Tiny; };
+
+	$captured = Capture::Tiny::capture(sub {
+		my $inner = $capture->capture(sub {
+			print "INNER";
+		});
+	});
+	is($captured, '', "doesn't print while capturing");
+}
+
+done_testing();
+
+# Local Variables:
+# coding: utf-8
+# End:
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/24] gitweb/lib - Cache captured output (using get/set)
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (5 preceding siblings ...)
  2010-12-06 23:10 ` [PATCH 06/24] gitweb/lib - Simple output capture by redirecting STDOUT Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 08/24] gitweb: Add optional output caching Jakub Narebski
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

Add GitwebCache::CacheOutput package, which contains cache_output
subroutine.  The cache_output gets data from cache and prints it, or
captures output of provided subroutine (code reference), saves it to
cache and prints it.  It currently uses Cache::Cache compatibile (get,
set) interface to cache.  The capture_stop currently simply runs
$capture->capture_stop().

Capturing output is done using GitwebCache::Capture::Simple compatibile
capture (->new(), ->capture($code)) passed as one of parameters; the
default solution is to use GitwebCache::Capture::Simple.

Gitweb would use cache_output to get page from cache, or to generate
page and save it to cache.  The capture_stop would be used in die_error
subroutine, because error pages would not be cached.

It is assumed that data is saved to cache _converted_, and should
therefore be read from cache and printed to STDOUT in ':raw' (binary)
mode.


Add t9505/test_cache_output.pl test, run as external test in
t9505-gitweb-cache.  It checks that cache_output behaves correctly,
namely that it saves and restores action output in cache, and that it
prints generated output or cached output, depending on whether there
exist data in cache.  This test requires Capture::Tiny to be installed
(because GitwebCache::Capture::Simple dosn't support recursive
capturing).

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
In this version capture_stop is defined in GitwebCache::CacheOutput,
instead of being re-exported from GitwebCache::Capture (which no
longer exists).  Compared to previous version cache_output now expects
$capture (selecting capturing engine) as second argument, and 
capture_stop expect $cache and $capture, while it was paramless before.
t9505 now requires Capture::Tiny, because it needs reliable nested
capture.

capture_stop is here only in the case in the future we would accept
wider range of possible $capture values, e.g. \&Capture::Tiny::capture
(reference to capturing subroutine).


cache_output($cache, $capture, $key, $code) is roughly equivalent to
cache_fetch($action) from cache.pm file in "Gitweb caching v7" series
by J.H.... but nothing in GitwebCache::CacheOutput is gitweb specific.
This module is simply about caching captured output, using given caching
engine, given capturing engine, given key to cache entry, and given 
coderef output of which is to be captured and cached, and then printing
generated data or data from cache.

Similarly to J.H. patch, and differently from CGI::Cache or
Plack::Middleware::Cache accessing cache is/would be quite explicit.

 gitweb/lib/GitwebCache/CacheOutput.pm              |   90 ++++++++++++++++++++
 ...-caching-interface.sh => t9505-gitweb-cache.sh} |   15 ++-
 t/t9505/test_cache_output.pl                       |   86 +++++++++++++++++++
 3 files changed, 186 insertions(+), 5 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/CacheOutput.pm
 copy t/{t9503-gitweb-caching-interface.sh => t9505-gitweb-cache.sh} (58%)
 create mode 100755 t/t9505/test_cache_output.pl

diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
new file mode 100644
index 0000000..458e314
--- /dev/null
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -0,0 +1,90 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+#
+# This program is licensed under the GPLv2
+
+#
+# Capturing and caching (gitweb) output
+#
+
+# Capture output, save it in cache and print it, or retrieve it from
+# cache and print it.
+
+package GitwebCache::CacheOutput;
+
+use strict;
+use warnings;
+
+use Exporter qw(import);
+our @EXPORT      = qw(cache_output capture_stop);
+our %EXPORT_TAGS = (all => [ @EXPORT ]);
+
+# cache_output($cache, $capture, $key, $action_code);
+#
+# Attempts to get $key from $cache; if successful, prints the value.
+# Otherwise, calls $action_code, capture its output using $capture,
+# and use the captured output as the new value for $key in $cache,
+# then print captured output.
+#
+# It is assumed that captured data is already converted and it is
+# in ':raw' format (and thus restored in ':raw' from cache)
+
+# default capture class (engine), if none provided
+our $DEFAULT_CAPTURE_CLASS = 'GitwebCache::Capture::Simple';
+sub cache_output {
+	my ($cache, $capture, $key, $code) = @_;
+
+	$capture = setup_capture($capture);
+
+	# check if data is in the cache
+	my $data = $cache->get($key);
+
+	# capture and cache output, if there was nothing in the cache
+	if (!defined $data) {
+		$data = $capture->capture($code);
+		$cache->set($key, $data) if defined $data;
+	}
+
+	# print cached data
+	if (defined $data) {
+		binmode STDOUT, ':raw';
+		print $data;
+	}
+
+	return $data;
+}
+
+# capture_stop($cache, $capture);
+#
+# Stops capturing output; to be used in die_error, so that error pages
+# are not cached (not captured and cached).
+sub capture_stop {
+	my ($cache, $capture) = @_;
+
+	if (defined $capture) {
+		return $capture->capture_stop();
+	}
+	return;
+}
+
+# ......................................................................
+# helper subroutines
+
+# setup capture engine
+sub setup_capture {
+	my $capture = shift;
+
+	$capture ||= $DEFAULT_CAPTURE_CLASS;
+	if (!ref($capture)) {
+		eval "require $capture;" or die $@;
+		$capture = $capture->new();
+	}
+
+	return $capture;
+}
+
+1;
+__END__
+# end of package GitwebCache::CacheOutput
diff --git a/t/t9503-gitweb-caching-interface.sh b/t/t9505-gitweb-cache.sh
similarity index 58%
copy from t/t9503-gitweb-caching-interface.sh
copy to t/t9505-gitweb-cache.sh
index 819da1d..181577f 100755
--- a/t/t9503-gitweb-caching-interface.sh
+++ b/t/t9505-gitweb-cache.sh
@@ -3,10 +3,10 @@
 # Copyright (c) 2010 Jakub Narebski
 #
 
-test_description='gitweb caching interface
+test_description='gitweb cache
 
-This test checks caching interface used in gitweb caching, and caching
-infrastructure (GitwebCache::* modules).'
+This test checks GitwebCache::CacheOutput Perl module that is
+responsible for capturing and caching gitweb output.'
 
 # for now we are running only cache interface tests
 . ./test-lib.sh
@@ -22,13 +22,18 @@ fi
 	test_done
 }
 
+"$PERL_PATH" -MCapture::Tiny -e 0 >/dev/null 2>&1 || {
+	skip_all='perl module Capture::Tiny unavailable, skipping test'
+	test_done
+}
+
 # ----------------------------------------------------------------------
 
 # The external test will outputs its own plan
 test_external_has_tap=1
 
 test_external \
-	'GitwebCache::* Perl API (in gitweb/lib/)' \
-	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_cache_interface.pl
+	'GitwebCache::CacheOutput Perl API (in gitweb/lib/)' \
+	"$PERL_PATH" "$TEST_DIRECTORY"/t9505/test_cache_output.pl
 
 test_done
diff --git a/t/t9505/test_cache_output.pl b/t/t9505/test_cache_output.pl
new file mode 100755
index 0000000..167bb36
--- /dev/null
+++ b/t/t9505/test_cache_output.pl
@@ -0,0 +1,86 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use Test::More;
+use Capture::Tiny qw(capture);
+
+# test source version
+use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
+
+# ....................................................................
+
+# prototypes must be known at compile time, otherwise they do not work
+BEGIN { use_ok('GitwebCache::CacheOutput'); }
+
+require_ok('GitwebCache::SimpleFileCache');
+require_ok('GitwebCache::Capture::Simple');
+
+diag("Using lib '$INC[0]'");
+diag("Testing '$INC{'GitwebCache/CacheOutput.pm'}'");
+diag("Testing '$INC{'GitwebCache/SimpleFileCache.pm'}'");
+diag("Testing '$INC{'GitwebCache/Capture/Simple.pm'}'");
+
+
+# Test setting up $cache and $capture
+my $cache   = new_ok('GitwebCache::SimpleFileCache' => [], 'The $cache  ');
+my $capture = new_ok('GitwebCache::Capture::Simple' => [], 'The $capture');
+
+# ......................................................................
+
+# Prepare for testing cache_output
+my $key = 'Key';
+my $action_output = <<'EOF';
+# This is data to be cached and shown
+EOF
+my $cached_output = <<"EOF";
+$action_output# (version recovered from cache)
+EOF
+sub action {
+	print $action_output;
+}
+
+my $no_capture_output = <<"EOF";
+$action_output# (no capture)
+EOF
+sub no_capture {
+	capture_stop($cache, $capture);
+	print $no_capture_output;
+}
+
+# Catch output printed by cache_fetch
+# (only for 'print <sth>' and 'printf <sth>')
+sub capture_output_of_cache_output {
+	my $code = shift;
+
+	my ($stdout, $stderr) = capture {
+		cache_output($cache, $capture, $key, $code);
+	};
+	print STDERR $stderr;
+	return $stdout;
+}
+
+# clean state
+$cache->set_expires_in(-1);
+$cache->remove($key);
+my $test_data;
+
+# first time (if there is no cache) generates cache entry
+$test_data = capture_output_of_cache_output(\&action);
+is($test_data, $action_output,        'action output is printed (generated)');
+is($cache->get($key), $action_output, 'action output is saved in cache (generated)');
+
+# second time (if cache is set/valid) reads from cache
+$cache->set($key, $cached_output);
+$test_data = capture_output_of_cache_output(\&action);
+is($test_data, $cached_output,        'action output is printed (from cache)');
+
+# test using capture_stop
+$cache->remove($key);
+$test_data = capture_output_of_cache_output(\&no_capture);
+is($test_data, $no_capture_output,    'no_capture output is printed (generated)');
+ok(! $cache->get($key),               'no_capture output is not captured and not cached');
+
+done_testing();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/24] gitweb: Add optional output caching
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (6 preceding siblings ...)
  2010-12-06 23:10 ` [PATCH 07/24] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 09/24] gitweb/lib - Adaptive cache expiration time Jakub Narebski
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

This commit actually adds output caching to gitweb, as we have now
minimal features required for it in GitwebCache::SimpleFileCache
(a 'dumb' but fast file-based cache engine).  To enable cache you need
(at least) set $caching_enabled to true in gitweb config, and copy
required modules alongside generated gitweb.cgi - this is described
in more detail in the new "Gitweb caching" section in gitweb/README.
"make install-gitweb" would install all modules alongside gitweb
itself.

Caching in theory can be done using any Perl module that implements
Cache::Cache compatibile get/set (method) interface.  The default is
to use GitwebCache::SimpleFileCache.  Capturing and caching output
is done via cache_output subroutine from GitwebCache::CacheOutput.

The cache_output subroutine in GitwebCache::CacheOutput currently uses
GitwebCache::Capture::Simple compatibile capturing engine passed as
one of parameters to cache_output subroutine.  The default is to use
GitwebCache::Capture::Simple package.

Capturing and caching is designed in such way that there is no
behaviour change if $caching_enabled is false.  If caching is not
enabled, then capturing is also turned off.

Enabling caching causes the following additional changes to gitweb
output:
* Disables content-type negotiation (choosing between 'text/html'
  mimetype and 'application/xhtml+xml') when caching, as there is no
  content-type negotiation done when retrieving page from cache.
  Use lowest common denominator of 'text/html' mimetype which can
  be used by all browsers.  This may change in the future.
* Disable optional timing info (how much time it took to generate the
  original page, and how many git commands it took), and in its place show
  unconditionally when page was originally generated (in GMT / UTC
  timezone).
* Disable 'blame_incremental' view, as it doesn't make sense without
  printing data as soon as it is generated (which would require tee-ing
  when capturing output for caching)... and it doesn't work currently
  anyway.  Alternate solution would be to run 'blame_incremental' view
  with caching disabled.


Add basic tests of caching support to t9500-gitweb-standalone-no-errors
test: set $caching_enabled to true and check for errors for first time
run (generating cache) and second time run (retrieving from cache) for a
single view - summary view for a project.

Check in the t9501-gitweb-standalone-http-status test that gitweb at
least correctly handles "404 Not Found" error pages also in the case
when gitweb caching is enabled.

Check in the t9502-gitweb-standalone-parse-output test that gitweb
produces the same output with and without caching, for first and
second run, with binary or text output.

All those tests make use of new gitweb_enable_caching subroutine added
to gitweb-lib.sh

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Gitweb caching tests are (almost) the same as in

  [PATCHv7.4 4/4] gitweb: Minimal testing of gitweb caching
  Message-Id: <1291404335-25541-4-git-send-email-jnareb@gmail.com>
  http://thread.gmane.org/gmane.comp.version-control.git/162830/focus=162834

The difference is of course in setting up tests, i.e. in the contents
of gitweb_enable_caching function in t/gitweb-lib.sh.

Only the t9500 tests were present in previous version of this series.


One of differences from "Gitweb caching v7" by J.H. (and minimal fixup of
thereof) is that in "gitweb: File based caching layer (from git.kernel.org)"
hashed (internal) key is composed of $my_url ($cgi->url()) and
$ENV{'QUERY_STRING'}... which meant that it ignores path_info (which is
WRONG, as you can have different views that differ only in path_info part;
different views which shouldn't be lumped together under one cache key), and
that links which differ only in ordering of parameters (e.g. handcrafted
URL) would result in different cache keys.

I have chosen here to use href(-replay => 1, -full => 1, -path_info => 0) as
human-readable cache key, which means that if different link gives use the
same action then it gives us the same cache key.


NOTE: I had to put quick fix about base url, so sha1 of gitweb.perl preimage
might not match.

 gitweb/Makefile                           |    5 +
 gitweb/README                             |   58 ++++++++++++
 gitweb/gitweb.perl                        |  144 +++++++++++++++++++++++++----
 t/gitweb-lib.sh                           |   11 ++
 t/t9500-gitweb-standalone-no-errors.sh    |   20 ++++
 t/t9501-gitweb-standalone-http-status.sh  |   13 +++
 t/t9502-gitweb-standalone-parse-output.sh |   33 +++++++
 7 files changed, 265 insertions(+), 19 deletions(-)

diff --git a/gitweb/Makefile b/gitweb/Makefile
index e6029e1..d67c138 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -113,6 +113,11 @@ endif
 
 GITWEB_FILES += static/git-logo.png static/git-favicon.png
 
+# gitweb output caching
+GITWEB_MODULES += GitwebCache/CacheOutput.pm
+GITWEB_MODULES += GitwebCache/SimpleFileCache.pm
+GITWEB_MODULES += GitwebCache/Capture/Simple.pm
+
 GITWEB_REPLACE = \
 	-e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \
 	-e 's|++GIT_BINDIR++|$(bindir)|g' \
diff --git a/gitweb/README b/gitweb/README
index bf3664f..3dc01bd 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -246,6 +246,12 @@ not include variables usually directly set during build):
    http://www.andre-simon.de due to assumptions about parameters and output).
    Useful if highlight is not installed on your webserver's PATH.
    [Default: highlight]
+ * $caching_enabled
+   If true, gitweb would use caching to speed up generating response.
+   Currently supported is only output (response) caching.  See "Gitweb caching"
+   section below for details on how to configure and customize caching.
+   The default is false (caching is disabled).
+
 
 Projects list file format
 ~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -317,6 +323,58 @@ You can use the following files in repository:
    descriptions.
 
 
+Gitweb caching
+~~~~~~~~~~~~~~
+
+Currently gitweb supports only output (HTTP response) caching, similar
+to the one used on http://git.kernel.org.  To turn it on, set 
+$caching_enabled variable to true value in gitweb config file, i.e.:
+
+   our $caching_enabled = 1;
+
+You can choose which caching engine should gitweb use by setting $cache
+variable to _initialized_ instance of cache interface, e.g.:
+
+   use CHI;
+   our $cache = CHI->new( driver => 'Memcached',
+   	servers => [ "10.0.0.15:11211", "10.0.0.15:11212" ],
+   	l1_cache => { driver => 'FastMmap', root_dir => '/var/cache/gitweb' }
+   );
+
+Alternatively you can set $cache variable to the name of cache class,
+e.g.:
+
+   our $cache = 'Cache::FileCache';
+
+In this case caching engine should support Cache::Cache or CHI names for
+cache config (see below), and ignore unrecognized options.  Such caching
+engine should also implement (at least) ->get($key) and ->set($key, $data)
+methods (Cache::Cache and CHI compatible interface).
+
+If $cache is left unset (if it is left undefined), then gitweb would use
+GitwebCache::SimpleFileCache as caching engine.  This engine is 'dumb' (but
+fast) file based caching layer, currently without any support for cache size
+limiting, or even removing expired / grossly expired entries.  It has
+therefore the downside of requiring a huge amount of disk space if there are
+a number of repositories involved.  It is not uncommon for git.kernel.org to
+have on the order of 80G - 120G accumulate over the course of a few months.
+It is therefore recommended that the cache directory be periodically
+completely deleted; this operation is safe to perform.  Suggested mechanism
+(substitute $cachedir for actual path to gitweb cache):
+
+   # mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush
+
+Site-wide cache options are defined in %cache_options hash.  Those options
+apply only when $cache is unset (GitwebCache::SimpleFileCache is used), or
+if $cache is name of cache class (e.g. $cache = 'Cache::FileCache').  You
+can override cache options in gitweb config, e.g.:
+
+   $cache_options{'expires_in'} = 60; # 60 seconds = 1 minute
+
+Please read comments for %cache_options entries in gitweb/gitweb.perl for
+description of available cache options.
+
+
 Webserver configuration
 -----------------------
 
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cfa511c..3286709 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -270,6 +270,48 @@ our %highlight_ext = (
 	map { $_ => 'xml' } qw(xhtml html htm),
 );
 
+
+# This enables/disables the caching layer in gitweb.  Currently supported
+# is only output (response) caching, similar to the one used on git.kernel.org.
+our $caching_enabled = 0;
+# Set to _initialized_ instance of cache interface implementing (at least)
+# get($key) and set($key, $data) methods (Cache::Cache and CHI interfaces),
+# or to name of class of cache interface implementing said methods.
+# If unset, GitwebCache::SimpleFileCache would be used, which is 'dumb'
+# (but fast) file based caching layer, currently without any support for
+# cache size limiting.  It is therefore recommended that the cache directory
+# be periodically completely deleted; this operation is safe to perform.
+# Suggested mechanism:
+# mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush
+our $cache;
+# You define site-wide cache options defaults here; override them with
+# $GITWEB_CONFIG as necessary.
+our %cache_options = (
+	# The location in the filesystem that will hold the root of the cache.
+	# This directory will be created as needed (if possible) on the first
+	# cache set.  Note that either this directory must exists and web server
+	# has to have write permissions to it, or web server must be able to
+	# create this directory.
+	# Possible values:
+	# * 'cache' (relative to gitweb),
+	# * File::Spec->catdir(File::Spec->tmpdir(), 'gitweb-cache'),
+	# * '/var/cache/gitweb' (FHS compliant, requires being set up),
+	'cache_root' => 'cache',
+
+	# The number of subdirectories deep to cache object item.  This should be
+	# large enough that no cache directory has more than a few hundred
+	# objects.  Each non-leaf directory contains up to 256 subdirectories
+	# (00-ff).  Must be larger than 0.
+	'cache_depth' => 1,
+
+	# The (global) expiration time for objects placed in the cache, in seconds.
+	'expires_in' => 20,
+);
+# Set to _initialized_ instance of GitwebCache::Capture compatibile capturing
+# engine, i.e. one implementing ->new() constructor, and ->capture($code)
+# method.  If unset (default), the GitwebCache::Capture::Simple would be used.
+our $capture;
+
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
 our %feature = (
@@ -1069,7 +1111,15 @@ sub dispatch {
 	    !$project) {
 		die_error(400, "Project needed");
 	}
-	$actions{$action}->();
+
+	if ($caching_enabled) {
+		# human readable key identifying gitweb output
+		my $output_key = href(-replay => 1, -full => 1, -path_info => 0);
+
+		cache_output($cache, $capture, $output_key, $actions{$action});
+	} else {
+		$actions{$action}->();
+	}
 }
 
 sub reset_timer {
@@ -1085,6 +1135,8 @@ sub run_request {
 	evaluate_gitweb_config();
 	evaluate_git_version();
 	check_loadavg();
+	configure_caching()
+		if ($caching_enabled);
 
 	# $projectroot and $projects_list might be set in gitweb config file
 	$projects_list ||= $projectroot;
@@ -1157,6 +1209,42 @@ sub run {
 	1;
 }
 
+sub configure_caching {
+	if (!eval { require GitwebCache::CacheOutput; 1; }) {
+		# cache is configured _before_ handling request, so $cgi is not defined,
+		# so we can't just "die" with sending error message to web browser
+		#die_error(500, "Caching enabled and GitwebCache::CacheOutput not found");
+
+		# turn off caching and warn instead
+		$caching_enabled = 0;
+		warn "Caching enabled and GitwebCache::CacheOutput not found";
+	}
+	GitwebCache::CacheOutput->import();
+
+	# $cache might be initialized (instantiated) cache, i.e. cache object,
+	# or it might be name of class, or it might be undefined
+	unless (defined $cache && ref($cache)) {
+		$cache ||= 'GitwebCache::SimpleFileCache';
+		eval "require $cache";
+		die $@ if $@;
+		$cache = $cache->new({
+			%cache_options,
+			#'cache_root' => '/tmp/cache',
+			#'cache_depth' => 2,
+			#'expires_in' => 20, # in seconds (CHI compatibile)
+			# (Cache::Cache compatibile initialization)
+			'default_expires_in' => $cache_options{'expires_in'},
+			# (CHI compatibile initialization)
+			'root_dir' => $cache_options{'cache_root'},
+			'depth' => $cache_options{'cache_depth'},
+		});
+	}
+	unless (defined $capture && ref($capture)) {
+		require GitwebCache::Capture::Simple;
+		$capture = GitwebCache::Capture::Simple->new();
+	}
+}
+
 run();
 
 if (defined caller) {
@@ -3420,7 +3508,9 @@ sub git_header_html {
 	# 'application/xhtml+xml', otherwise send it as plain old 'text/html'.
 	# we have to do this because MSIE sometimes globs '*/*', pretending to
 	# support xhtml+xml but choking when it gets what it asked for.
-	if (defined $cgi->http('HTTP_ACCEPT') &&
+	# Disable content-type negotiation when caching (use mimetype good for all).
+	if (!$caching_enabled &&
+	    defined $cgi->http('HTTP_ACCEPT') &&
 	    $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
 	    $cgi->Accept('application/xhtml+xml') != 0) {
 		$content_type = 'application/xhtml+xml';
@@ -3445,7 +3535,9 @@ sub git_header_html {
 EOF
 	# the stylesheet, favicon etc urls won't work correctly with path_info
 	# unless we set the appropriate base URL
-	if ($ENV{'PATH_INFO'}) {
+	# if caching is enabled we can get it from cache for path_info when it
+	# is generated without path_info
+	if ($ENV{'PATH_INFO'} || $caching_enabled) {
 		print "<base href=\"".esc_url($base_url)."\" />\n";
 	}
 	# print out each stylesheet that exist, providing backwards capability
@@ -3594,17 +3686,25 @@ sub git_footer_html {
 	}
 	print "</div>\n"; # class="page_footer"
 
-	if (defined $t0 && gitweb_check_feature('timed')) {
+	# timing info doesn't make much sense with output (response) caching,
+	# so when caching is enabled gitweb prints the time of page generation
+	if ((defined $t0 || $caching_enabled) &&
+	    gitweb_check_feature('timed')) {
 		print "<div id=\"generating_info\">\n";
-		print 'This page took '.
-		      '<span id="generating_time" class="time_span">'.
-		      Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
-		      ' seconds </span>'.
-		      ' and '.
-		      '<span id="generating_cmd">'.
-		      $number_of_git_cmds.
-		      '</span> git commands '.
-		      " to generate.\n";
+		if ($caching_enabled) {
+			print 'This page was generated at '.
+			      gmtime( time() )." GMT\n";
+		} else {
+			print 'This page took '.
+			      '<span id="generating_time" class="time_span">'.
+			      Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
+			      ' seconds </span>'.
+			      ' and '.
+			      '<span id="generating_cmd">'.
+			      $number_of_git_cmds.
+			      '</span> git commands '.
+			      " to generate.\n";
+		}
 		print "</div>\n"; # class="page_footer"
 	}
 
@@ -3613,8 +3713,8 @@ sub git_footer_html {
 	}
 
 	print qq!<script type="text/javascript" src="$javascript"></script>\n!;
-	if (defined $action &&
-	    $action eq 'blame_incremental') {
+	if (!$caching_enabled &&
+	    defined $action && $action eq 'blame_incremental') {
 		print qq!<script type="text/javascript">\n!.
 		      qq!startBlame("!. href(action=>"blame_data", -replay=>1) .qq!",\n!.
 		      qq!           "!. href() .qq!");\n!.
@@ -3655,6 +3755,10 @@ sub die_error {
 		500 => '500 Internal Server Error',
 		503 => '503 Service Unavailable',
 	);
+
+	# Do not cache error pages
+	capture_stop($cache, $capture) if ($capture && $caching_enabled);
+
 	git_header_html($http_responses{$status}, undef, %opts);
 	print <<EOF;
 <div class="page_body">
@@ -5250,7 +5354,8 @@ sub git_tag {
 
 sub git_blame_common {
 	my $format = shift || 'porcelain';
-	if ($format eq 'porcelain' && $cgi->param('js')) {
+	if ($format eq 'porcelain' && $cgi->param('js') &&
+	    !$caching_enabled) {
 		$format = 'incremental';
 		$action = 'blame_incremental'; # for page title etc
 	}
@@ -5304,7 +5409,8 @@ sub git_blame_common {
 			or print "ERROR $!\n";
 
 		print 'END';
-		if (defined $t0 && gitweb_check_feature('timed')) {
+		if (!$caching_enabled &&
+		    defined $t0 && gitweb_check_feature('timed')) {
 			print ' '.
 			      Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
 			      ' '.$number_of_git_cmds;
@@ -5324,7 +5430,7 @@ sub git_blame_common {
 		$formats_nav .=
 			$cgi->a({-href => href(action=>"blame", javascript=>0, -replay=>1)},
 			        "blame") . " (non-incremental)";
-	} else {
+	} elsif (!$caching_enabled) {
 		$formats_nav .=
 			$cgi->a({-href => href(action=>"blame_incremental", -replay=>1)},
 			        "blame") . " (incremental)";
@@ -5483,7 +5589,7 @@ sub git_blame {
 }
 
 sub git_blame_incremental {
-	git_blame_common('incremental');
+	git_blame_common(!$caching_enabled ? 'incremental' : undef);
 }
 
 sub git_blame_data {
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index b9bb95f..4ce067f 100644
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -52,6 +52,17 @@ EOF
 	export SCRIPT_NAME
 }
 
+gitweb_enable_caching () {
+	test_expect_success 'enable caching' '
+		cat >>gitweb_config.perl <<-\EOF &&
+		$caching_enabled = 1;
+		$cache_options{"expires_in"} = -1;      # never expire cache for tests
+		$cache_options{"cache_root"} = "cache"; # to clear the right thing
+		EOF
+		rm -rf cache/
+	'
+}
+
 gitweb_run () {
 	GATEWAY_INTERFACE='CGI/1.1'
 	HTTP_ACCEPT='*/*'
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 21cd286..cc9cee5 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -677,4 +677,24 @@ test_expect_success HIGHLIGHT \
 	 gitweb_run "p=.git;a=blob;f=test.sh"'
 test_debug 'cat gitweb.log'
 
+# ----------------------------------------------------------------------
+# caching
+
+gitweb_enable_caching
+
+test_expect_success \
+	'caching enabled (project summary, first run, generating cache)' \
+	'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
+	'caching enabled (project summary, second run, cached version)' \
+	'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
+	'caching enabled (non-existent commit, not cached error page)' \
+	'gitweb_run "p=.git;a=commit;h=non-existent"'
+test_debug 'cat gitweb.log'
+
 test_done
diff --git a/t/t9501-gitweb-standalone-http-status.sh b/t/t9501-gitweb-standalone-http-status.sh
index 2487da1..168e494 100755
--- a/t/t9501-gitweb-standalone-http-status.sh
+++ b/t/t9501-gitweb-standalone-http-status.sh
@@ -134,5 +134,18 @@ cat >>gitweb_config.perl <<\EOF
 our $maxload = undef;
 EOF
 
+# ----------------------------------------------------------------------
+# output caching
+
+gitweb_enable_caching
+
+test_expect_success 'caching enabled (non-existent commit, 404 error)' '
+	gitweb_run "p=.git;a=commit;h=non-existent" &&
+	grep "Status: 404 Not Found" gitweb.headers &&
+	grep "404 - Unknown commit object" gitweb.body
+'
+test_debug 'echo "headers" && cat gitweb.headers'
+test_debug 'echo "body"    && cat gitweb.body'
+
 
 test_done
diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
index dd83890..bc8cb92 100755
--- a/t/t9502-gitweb-standalone-parse-output.sh
+++ b/t/t9502-gitweb-standalone-parse-output.sh
@@ -112,4 +112,37 @@ test_expect_success 'snapshot: hierarchical branch name (xx/test)' '
 '
 test_debug 'cat gitweb.headers'
 
+
+# ----------------------------------------------------------------------
+# whether gitweb with caching enabled produces the same output
+
+test_expect_success 'setup for caching tests (utf8 commit, binary file)' '
+	. "$TEST_DIRECTORY"/t3901-utf8.txt &&
+	cp "$TEST_DIRECTORY"/test9200a.png image.png &&
+	git add image.png &&
+	git commit -F "$TEST_DIRECTORY"/t3900/1-UTF-8.txt &&
+	gitweb_run "p=.git;a=patch" &&
+	mv gitweb.body no_cache.html &&
+	gitweb_run "p=.git;a=blob_plain;f=image.png" &&
+	mv gitweb.body no_cache.png
+'
+
+gitweb_enable_caching
+
+for desc in 'generating cache' 'cached version'; do
+	test_expect_success "caching enabled, HTML output, $desc" '
+		gitweb_run "p=.git;a=patch" &&
+		mv gitweb.body cache.html &&
+		test_cmp no_cache.html cache.html
+	'
+done
+
+for desc in 'generating cache' 'cached version'; do
+	test_expect_success "caching enabled, binary output, $desc" '
+		gitweb_run "p=.git;a=blob_plain;f=image.png" &&
+		mv gitweb.body cache.png &&
+		cmp no_cache.png cache.png
+	'
+done
+
 test_done
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/24] gitweb/lib - Adaptive cache expiration time
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (7 preceding siblings ...)
  2010-12-06 23:10 ` [PATCH 08/24] gitweb: Add optional output caching Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 10/24] gitweb/lib - Use CHI compatibile (compute method) caching interface Jakub Narebski
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

Add to GitwebCache::SimpleFileCache support for adaptive lifetime
(cache expiration) control.  Cache lifetime can be increased or
decreased by any factor, e.g. load average, through the definition
of the 'check_load' callback.

Note that using ->set_expires_in, or unsetting 'check_load' via
->set_check_load(undef) turns off adaptive caching.

Make gitweb automatically adjust cache lifetime by load, using
get_loadavg() function.  Define and describe default parameters for
dynamic (adaptive) cache expiration time control.


There are some very basic tests of dynamic expiration time in t9503,
namely checking if dynamic expire time is within given upper and lower
bounds.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from relevant parts of J.H. patch:
* 'increase_factor' is configurable rather than hardcoded 60.

* 'check_load' is passed via conctructor parameter; gitweb by default
  sets it to \&get_loadavg.  This means that the caching engine is not
  entangled with gitweb (and for example can be tested separately).

 gitweb/gitweb.perl                        |   25 ++++++++-
 gitweb/lib/GitwebCache/SimpleFileCache.pm |   79 +++++++++++++++++++++++++++--
 t/t9503/test_cache_interface.pl           |   33 ++++++++++++
 3 files changed, 130 insertions(+), 7 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 4322b28..12e04a1 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -304,8 +304,29 @@ our %cache_options = (
 	# (00-ff).  Must be larger than 0.
 	'cache_depth' => 1,
 
-	# The (global) expiration time for objects placed in the cache, in seconds.
-	'expires_in' => 20,
+	# The (global) minimum expiration time for objects placed in the cache,
+	# in seconds.  If the dynamic adaptive cache exporation time is lower
+	# than this number, we set cache timeout to this minimum.
+	'expires_min' => 20,   # 20 seconds
+
+	# The (global) maximum expiration time for dynamic (adaptive) caching
+	# algorithm, in seconds.  If the adaptive cache lifetime exceeds this
+	# number, we set cache timeout to this maximum.
+	# (If 'expires_min' >= 'expires_max', there is no adaptive cache timeout,
+	# and 'expires_min' is used as expiration time for objects in cache.)
+	'expires_max' => 1200, # 20 minutes
+
+	# Cache lifetime will be increased by applying this factor to the result
+	# from 'check_load' callback (see below).
+	'expires_factor' => 60, # expire time in seconds for 1.0 (100% CPU) load
+
+	# User supplied callback for deciding the cache policy, usually system
+	# load.  Multiplied by 'expires_factor' gives adaptive expiration time,
+	# in seconds, subject to the limits imposed by 'expires_min' and
+	# 'expires_max' bounds.  Set to undef (or delete) to turn off dynamic
+	# lifetime control.
+	# (Compatibile with Cache::Adaptive.)
+	'check_load' => \&get_loadavg,
 );
 # Set to _initialized_ instance of GitwebCache::Capture compatibile capturing
 # engine, i.e. one implementing ->new() constructor, and ->capture($code)
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index bf74f7c..581a574 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -61,6 +61,22 @@ our $DEFAULT_NAMESPACE = '';
 #    'expires_in' (CHI compatibile) [seconds]
 #    The expiration time for objects place in the cache.
 #    Defaults to -1 (never expire) if not explicitly set.
+#    Sets 'expires_min' to given value.
+#  * 'expires_min' [seconds]
+#    The minimum expiration time for objects in cache (e.g. with 0% CPU load).
+#    Used as lower bound in adaptive cache lifetime / expiration.
+#    Defaults to 20 seconds; 'expires_in' sets it also.
+#  * 'expires_max' [seconds]
+#    The maximum expiration time for objects in cache.
+#    Used as upper bound in adaptive cache lifetime / expiration.
+#    Defaults to 1200 seconds, if not set; 
+#    defaults to 'expires_min' if 'expires_in' is used.
+#  * 'check_load'
+#    Subroutine (code) used for adaptive cache lifetime / expiration.
+#    If unset, adaptive caching is turned off; defaults to unset.
+#  * 'increase_factor' [seconds / 100% CPU load]
+#    Factor multiplying 'check_load' result when calculating cache lietime.
+#    Defaults to 60 seconds for 100% SPU load ('check_load' returning 1.0).
 sub new {
 	my $class = shift;
 	my %opts = ref $_[0] ? %{ $_[0] } : @_;
@@ -68,7 +84,8 @@ sub new {
 	my $self = {};
 	$self = bless($self, $class);
 
-	my ($root, $depth, $ns, $expires_in);
+	my ($root, $depth, $ns);
+	my ($expires_min, $expires_max, $increase_factor, $check_load);
 	if (%opts) {
 		$root =
 			$opts{'cache_root'} ||
@@ -77,19 +94,31 @@ sub new {
 			$opts{'cache_depth'} ||
 			$opts{'depth'};
 		$ns = $opts{'namespace'};
-		$expires_in =
+		$expires_min =
+			$opts{'expires_min'} ||
 			$opts{'default_expires_in'} ||
 			$opts{'expires_in'};
+		$expires_max =
+			$opts{'expires_max'};
+		$increase_factor = $opts{'expires_factor'};
+		$check_load      = $opts{'check_load'};
 	}
 	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
 	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
 	$ns    = $DEFAULT_NAMESPACE   unless defined($ns);
-	$expires_in = -1 unless defined($expires_in); # <0 means never
+	$expires_min = -1 unless defined($expires_min);
+	$expires_max = $expires_min
+		if (!defined($expires_max) && exists $opts{'expires_in'});
+	$expires_max = -1 unless (defined($expires_max));
+	$increase_factor = 60 unless defined($increase_factor);
 
 	$self->set_root($root);
 	$self->set_depth($depth);
 	$self->set_namespace($ns);
-	$self->set_expires_in($expires_in);
+	$self->set_expires_min($expires_min);
+	$self->set_expires_max($expires_max);
+	$self->set_increase_factor($increase_factor);
+	$self->set_check_load($check_load);
 
 	return $self;
 }
@@ -101,7 +130,8 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace expires_in)) {
+foreach my $i (qw(depth root namespace
+                  expires_min expires_max increase_factor check_load)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -114,6 +144,45 @@ foreach my $i (qw(depth root namespace expires_in)) {
 	};
 }
 
+# ......................................................................
+# pseudo-accessors
+
+# returns adaptive lifetime of cache entry for given $key [seconds]
+sub get_expires_in {
+	my ($self) = @_;
+
+	# short-circuit
+	if (!defined $self->{'check_load'} ||
+	    $self->{'expires_max'} <= $self->{'expires_min'}) {
+		return $self->{'expires_min'};
+	}
+
+	my $expires_in =
+		#$self->{'expires_min'} +
+		$self->{'increase_factor'} * $self->check_load();
+
+	if ($expires_in < $self->{'expires_min'}) {
+		return $self->{'expires_min'};
+	} elsif ($expires_in > $self->{'expires_max'}) {
+		return $self->{'expires_max'};
+	}
+
+	return $expires_in;
+}
+
+# sets expiration time to $duration, turns off adaptive cache lifetime
+sub set_expires_in {
+	my ($self, $duration) = @_;
+
+	$self->{'expires_min'} = $self->{'expires_max'} = $duration;
+}
+
+# runs 'check_load' subroutine, for adaptive cache lifetime.
+# Note: check in caller that 'check_load' exists.
+sub check_load {
+	my $self = shift;
+	return $self->{'check_load'}->();
+}
 
 # ----------------------------------------------------------------------
 # utility functions and methods
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 1517fb6..9513043 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -101,4 +101,37 @@ subtest 'cache expiration' => sub {
 	done_testing();
 };
 
+# Test assertions for adaptive cache expiration
+#
+my $load = 0.0;
+sub load { return $load; }
+my $expires_min = 10;
+my $expires_max = 30;
+$cache->set_expires_in(-1);
+$cache->set_expires_min($expires_min);
+$cache->set_expires_max($expires_max);
+$cache->set_check_load(\&load);
+subtest 'adaptive cache expiration' => sub {
+	cmp_ok($cache->get_expires_min(), '==', $expires_min,
+	       '"expires min" set correctly');
+	cmp_ok($cache->get_expires_max(), '==', $expires_max,
+	       '"expires max" set correctly');
+
+	$load = 0.0;
+	cmp_ok($cache->get_expires_in(), '>=', $expires_min,
+	       '"expires in" bound from down for load=0');
+	cmp_ok($cache->get_expires_in(), '<=', $expires_max,
+	       '"expires in" bound from up   for load=0');
+	
+	$load = 1_000;
+	cmp_ok($cache->get_expires_in(), '>=', $expires_min,
+	       '"expires in" bound from down for heavy load');
+	cmp_ok($cache->get_expires_in(), '<=', $expires_max,
+	       '"expires in" bound from up   for heavy load');
+
+	done_testing();
+};
+
+$cache->set_expires_in(-1);
+
 done_testing();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/24] gitweb/lib - Use CHI compatibile (compute method) caching interface
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (8 preceding siblings ...)
  2010-12-06 23:10 ` [PATCH 09/24] gitweb/lib - Adaptive cache expiration time Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 11/24] gitweb/lib - capture output directly to cache entry file Jakub Narebski
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

If $cache provides CHI compatible ->compute($key, $code) method, use it
instead of Cache::Cache compatible ->get($key) and ->set($key, $data).
In the future other compatibile but differently named methods, like
Cache::FastMmap's ->get_and_set($key, $code) method, could also be
supported; though CHI which has ->compute() includes CHI::Driver::FastMmap
wrapper.

GitwebCache::SimpleFileCache provides 'compute' method, which currently
simply use 'get' and 'set' methods in proscribed manner.  Nevertheless
'compute' method can be more flexible in choosing when to refresh cache,
and which process is to refresh/(re)generate cache entry.  This method
would use (advisory) locking to prevent 'cache miss stampede' (aka
'stampeding herd') problem in the next commit.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch has less importance that it had before, because when using
GitwebCache::FileCacheWithLocking it is ->compute_fh interface that would
be used, not ->compute.  As far as I understand CHI ->compute looks the
same simple wrapper around ->get/->set that GitwebCache::SimpleFileCache
uses.

On the other hand (as you would see later) ->compute and ->compute_fh have
lot of code in common, and it is easier to test behavior of ->compute in
t9503 tests.

 gitweb/lib/GitwebCache/CacheOutput.pm |   36 +++++++++++++++++++++++++++-----
 1 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
index 458e314..4a96ac9 100644
--- a/gitweb/lib/GitwebCache/CacheOutput.pm
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -38,6 +38,36 @@ sub cache_output {
 
 	$capture = setup_capture($capture);
 
+	my $data;
+	if ($cache->can('compute')) {
+		$data = cache_output_compute($cache, $capture, $key, $code);
+	} else {
+		$data = cache_output_get_set($cache, $capture, $key, $code);
+	}
+
+	if (defined $data) {
+		binmode STDOUT, ':raw';
+		print $data;
+	}
+
+	return $data;
+}
+
+# for $cache which can ->compute($key, $code)
+sub cache_output_compute {
+	my ($cache, $capture, $key, $code) = @_;
+
+	my $data = $cache->compute($key, sub {
+		$capture->capture($code);
+	});
+
+	return $data;
+}
+
+# for $cache which can ->get($key) and ->set($key, $data)
+sub cache_output_get_set {
+	my ($cache, $capture, $key, $code) = @_;
+
 	# check if data is in the cache
 	my $data = $cache->get($key);
 
@@ -47,12 +77,6 @@ sub cache_output {
 		$cache->set($key, $data) if defined $data;
 	}
 
-	# print cached data
-	if (defined $data) {
-		binmode STDOUT, ':raw';
-		print $data;
-	}
-
 	return $data;
 }
 
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 11/24] gitweb/lib - capture output directly to cache entry file
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (9 preceding siblings ...)
  2010-12-06 23:10 ` [PATCH 10/24] gitweb/lib - Use CHI compatibile (compute method) caching interface Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 12/24] gitweb/lib - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

This commit makes it possible to capture output redirecting STDOUT to
cache entry file; this way we don't need to store whole output in
memory in order to cache it.

The ->capture() method in GitwebCache::Capture::Simple is extended so
it signature is now ->capture($code[, $to]).  This means that it
accepts optional $to parameter, which specifies either filehandle open
for writing, or a file name, which would be used to capture output
instead of in-memory file.  The ->capture_start() and ->capture_stop()
methods that are used by ->capture() also got updated.

GitwebCache::SimpleFileCache acquired new ->compute_fh($key, $code_fh)
method.  Instead of passing and returning data like ->compute(), it
passes and returns filehandles and filenames.  While $code parameter
in ->compute($key, $code) call is expected to generate and return data
to be cached, the $code_fh parameter in ->compute_fh($key, $code_fh)
call is expected to print generated data to filehandle provided as its
parameter.  Instead of returning data (either from cache or generated)
like ->compute(), the new ->compute_fh() method returns filehandle and
filename to read data from.

The cache_output subroutine in GitwebCache::CacheOutput now uses
->compute_fh method if cache supports it.  The contents of file that
is returned by ->compute_fh is dumped to STDOUT using File::Copy::copy.

Note that ->compute_fh interface is non-standard, and is not supported
by any caching module known to author.  Closest to it is ->handle($key)
method in the Cache[1] interface (proxied to ->entry($key)->handle()).

[1]: http://p3rl.org/Cache


Added tests for capturing to file in t9503, and of ->compute_fh method
in t9504.

Inspired-by-idea-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch was inspired of behavior of cache and capturing output in
"Gitweb caching v7" series (or at least my v7.4 minimal rewrite).  It
was not present in previous version of this series... this means that
it might be less well thought out.


The major difference from the way it is done in "Gitweb caching v7" by
J.H. is that it tries to keep those three areas: caching, capturing
and integration with gitweb (caching captured output) separate.  As it
was said earlier, it allows to "unit test" those modules/packages
separately.  I think it also makes this solution easier to maintain.

As to why I didn't follow Cache interface, i.e. ->entry($key)->handle():
it would require to create at least two additional classes, a class for
cache entry to avoid race condition between checking availability and
getting ->handle(), and a IO::Handle / IO::File derivative which allows to
specify code ran when closing file - for locking, and for atomic write
(unlock and rename file to final destination).  That's why I went with
->compute_fh interface.

Note that dumping generated output or output from cache to STDOUT is done
using File::Copy: this too was inspired by code in "Gitweb caching v7" by
J.H.  (File::Copy was first released with perl 5.002, and we require for
gitweb at least 5.008).

 gitweb/lib/GitwebCache/CacheOutput.pm     |   17 ++++++++
 gitweb/lib/GitwebCache/Capture/Simple.pm  |   28 ++++++++++---
 gitweb/lib/GitwebCache/SimpleFileCache.pm |   59 +++++++++++++++++++++++++++++
 t/t9503/test_cache_interface.pl           |   32 +++++++++++++++
 t/t9504/test_capture_interface.pl         |   17 ++++++++
 5 files changed, 146 insertions(+), 7 deletions(-)

diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
index 4a96ac9..7aeb895 100644
--- a/gitweb/lib/GitwebCache/CacheOutput.pm
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -17,6 +17,8 @@ package GitwebCache::CacheOutput;
 use strict;
 use warnings;
 
+use File::Copy;
+
 use Exporter qw(import);
 our @EXPORT      = qw(cache_output capture_stop);
 our %EXPORT_TAGS = (all => [ @EXPORT ]);
@@ -38,6 +40,21 @@ sub cache_output {
 
 	$capture = setup_capture($capture);
 
+	if ($cache->can('compute_fh')) {
+		my ($fh, $filename) = $cache->compute_fh($key, sub {
+			my $fh = shift;
+			$capture->capture($code, $fh);
+		});
+
+		if (defined $fh) {
+			binmode $fh, ':raw';
+			binmode STDOUT, ':raw';
+			copy($fh, \*STDOUT);
+		}
+
+		return;
+	}
+
 	my $data;
 	if ($cache->can('compute')) {
 		$data = cache_output_compute($cache, $capture, $key, $code);
diff --git a/gitweb/lib/GitwebCache/Capture/Simple.pm b/gitweb/lib/GitwebCache/Capture/Simple.pm
index 3585e58..74dd240 100644
--- a/gitweb/lib/GitwebCache/Capture/Simple.pm
+++ b/gitweb/lib/GitwebCache/Capture/Simple.pm
@@ -18,6 +18,7 @@ use strict;
 use warnings;
 
 use PerlIO;
+use Symbol qw(qualify_to_ref);
 
 # Constructor
 sub new {
@@ -32,7 +33,7 @@ sub new {
 sub capture {
 	my ($self, $code) = @_;
 
-	$self->capture_start();
+	$self->capture_start(@_[2..$#_]); # pass rest of params
 	$code->();
 	return $self->capture_stop();
 }
@@ -42,6 +43,7 @@ sub capture {
 # Start capturing data (STDOUT)
 sub capture_start {
 	my $self = shift;
+	my $to = shift;
 
 	# save copy of real STDOUT via duplicating it
 	my @layers = PerlIO::get_layers(\*STDOUT);
@@ -51,11 +53,23 @@ sub capture_start {
 	# close STDOUT, so that it isn't used anymode (to have it fd0)
 	close STDOUT;
 
-	# reopen STDOUT as in-memory file
-	$self->{'data'} = '';
-	unless (open STDOUT, '>', \$self->{'data'}) {
-		open STDOUT, '>&', fileno($self->{'orig_stdout'});
-		die "Couldn't reopen STDOUT as in-memory file for capture: $!";
+	if (defined $to) {
+		$self->{'to'} = $to;
+		if (defined fileno(qualify_to_ref($to))) {
+			# if $to is filehandle, redirect
+			open STDOUT, '>&', fileno($to);
+		} elsif (! ref($to)) {
+			# if $to is name of file, open it
+			open STDOUT, '>', $to;
+		}
+
+	} else {
+		# reopen STDOUT as in-memory file
+		$self->{'data'} = '';
+		unless (open STDOUT, '>', \$self->{'data'}) {
+			open STDOUT, '>&', fileno($self->{'orig_stdout'});
+			die "Couldn't reopen STDOUT as in-memory file for capture: $!";
+		}
 	}
 	_relayer(\*STDOUT, \@layers);
 
@@ -76,7 +90,7 @@ sub capture_stop {
 	open STDOUT, '>&', fileno($self->{'orig_stdout'});
 	_relayer(\*STDOUT, \@layers);
 
-	return $self->{'data'};
+	return exists $self->{'to'} ? $self->{'to'} : $self->{'data'};
 }
 
 # taken from Capture::Tiny by David Golden, Apache License 2.0
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 581a574..12af44f 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -438,6 +438,65 @@ sub compute {
 	return $data;
 }
 
+# ......................................................................
+# nonstandard interface methods
+
+sub get_fh {
+	my ($self, $key) = @_;
+
+	my $path = $self->path_to_key($key);
+	return unless (defined $path);
+
+	return unless ($self->is_valid($key));
+
+	open my $fh, '<', $path or return;
+	return ($fh, $path);
+}
+
+sub set_coderef_fh {
+	my ($self, $key, $code) = @_;
+
+	my $path = $self->path_to_key($key, \my $dir);
+	return unless (defined $path && defined $dir);
+
+	# ensure that directory leading to cache file exists
+	if (!-d $dir) {
+		# mkpath will croak()/die() if there is an error
+		mkpath($dir, 0, 0777);
+	}
+
+	# generate a temporary file
+	my ($fh, $tempfile) = _tempfile_to_path($path, $dir);
+
+	# code writes to filehandle or file
+	$code->($fh, $tempfile);
+
+	close $fh;
+	rename($tempfile, $path)
+		or die "Couldn't rename temporary file '$tempfile' to '$path': $!";
+
+	open $fh, '<', $path or return;
+	return ($fh, $path);
+}
+
+# ($fh, $filename) = $cache->compute_fh($key, $code);
+#
+# Combines the get and set operations in a single call.  Attempts to
+# get $key; if successful, returns the filehandle it can be read from.
+# Otherwise, calls $code passing filehandle to write to as a
+# parameter; contents of this file is then used as the new value for
+# $key; returns filehandle from which one can read newly generated data.
+sub compute_fh {
+	my ($self, $key, $code) = @_;
+
+	my ($fh, $filename) = $self->get_fh($key);
+	if (!defined $fh) {
+		($fh, $filename) = $self->set_coderef_fh($key, $code);
+	}
+
+	return ($fh, $filename);
+}
+
 1;
 __END__
 # end of package GitwebCache::SimpleFileCache;
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 9513043..0b4b09f 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -82,6 +82,38 @@ subtest 'CHI interface' => sub {
 	done_testing();
 };
 
+# Test the getting and setting of a cached value
+# (compute_fh interface)
+#
+$call_count = 0;
+sub write_value {
+	my $fh = shift;
+	$call_count++;
+	print {$fh} $value;
+}
+sub compute_fh_output {
+	my ($cache, $key, $code_fh) = @_;
+
+	my ($fh, $filename) = $cache->compute_fh($key, $code_fh);
+
+	local $/ = undef;
+	return <$fh>;
+}
+subtest 'compute_fh interface' => sub {
+	can_ok($cache, qw(compute_fh));
+
+	$cache->remove($key);
+	is(compute_fh_output($cache, $key, \&write_value), $value,
+	   "compute_fh 1st time (set) returns '$value'");
+	is(compute_fh_output($cache, $key, \&write_value), $value,
+	   "compute_fh 2nd time (get) returns '$value'");
+	is(compute_fh_output($cache, $key, \&write_value), $value,
+	   "compute_fh 3rd time (get) returns '$value'");
+	cmp_ok($call_count, '==', 1, 'write_value() is called once from compute_fh');
+
+	done_testing();
+};
+
 # Test cache expiration
 #
 subtest 'cache expiration' => sub {
diff --git a/t/t9504/test_capture_interface.pl b/t/t9504/test_capture_interface.pl
index 47ab804..26c9303 100755
--- a/t/t9504/test_capture_interface.pl
+++ b/t/t9504/test_capture_interface.pl
@@ -6,6 +6,7 @@ use strict;
 use utf8;
 
 use Test::More;
+use File::Compare;
 
 # test source version
 use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
@@ -84,6 +85,22 @@ SKIP: {
 	is($captured, '', "doesn't print while capturing");
 }
 
+# Test capturing to file
+#
+my $test_data = 'Capture this';
+open my $fh, '>', 'expected' or die "Couldn't open file for writing: $!";
+print {$fh} $test_data;
+close $fh;
+
+$capture->capture(sub { print $test_data; }, 'actual');
+cmp_ok(compare('expected', 'actual'), '==', 0, 'capturing to file via filename');
+
+open my $fh, '>', 'actual' or die "Couldn't open file for writing: $!";
+$capture->capture(sub { print $test_data; }, $fh);
+close $fh;
+cmp_ok(compare('expected', 'actual'), '==', 0, 'capturing to file via filehandle');
+
+
 done_testing();
 
 # Local Variables:
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 12/24] gitweb/lib - Use locking to avoid 'cache miss stampede' problem
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (10 preceding siblings ...)
  2010-12-06 23:10 ` [PATCH 11/24] gitweb/lib - capture output directly to cache entry file Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 13/24] gitweb/lib - No need for File::Temp when locking Jakub Narebski
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

Create GitwebCache::FileCacheWithLocking module (class), derived from
GitwebCache::SimpleFileCache, where the ->compute($key, $code) method
use locking (via flock) to ensure that only one process would generate
data to update/fill-in cache; the rest would wait for the cache to
be (re)generated and would read data from cache.  If process that was
to (re)generate data dies or exits, one of the readers would take its
role.  This also means that if some data is not cached (like e.g. error
pages in gitweb), then it would work correctly even in rare case of
simultaneous access to the same non-cached entry.

Currently this feature can not be disabled via %cache_options,
although you can set $cache to 'GitwebCache::SimpleFileCache' instead.
Future new features (like: serving stale data while cache is being
regenerated, (re)generating cache in background, activity indicator)
all depend on locking.


A test in t9503 shows that in the case where there are two clients
trying to simultaneously access non-existent or stale cache entry,
(and generating data takes (artifically) a bit of time), if they are
using ->compute method the data is (re)generated once, as opposed to
situation when those clients are just using ->get/->set methods.

The situation where subroutine used to generate data dies is also
tested in t9503.

Note that there is slight inefficiency, in that filename for lockfile
depends on the filename for cache entry (it just adds '.lock' suffix),
but is recalculated from $key for both paths.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
The idea of using locking to ensure that only one precess is updating
cache entry was inspired by original patch by J.H. about output caching
in gitweb (and perhaps also, to lesser extent, locking in Cache module).

The way file locking is implemented, with a separate *.lock file, was
based in

  "File Locking Tricks and Traps" (http://perl.plover.com/yak/flock/)

In "Gitweb caching v7" locking is done usually on the cache entry file,
i.e. one read from/written to.  It looks like there are in some cases
multiple locks for the same entry...

This code does not use LOCK_UN: closing lockfile would unlock.  LOCK_UN
doesn't always work as intended (see mentioned article; also tested that
LOCK_UN does not work found by tests provided, either this or in the
future commit).

Note that flock(..., LOCK_EX | LOCK_NB) is used to select the process
that would be (re)generating cache entry, and it is used to create
critical section - this means that it is unlocked when cache entry file
is written in full.  flock(..., LOCK_SH) on the other hand is used to
wait for aid process to finish; it is used only for synchronization.
In sane situation, i.e. when cache expiration time is longer than it
takes to read from cache, shared (readers) lock can be released just
after acquiring, but just in case opening file is in critical section
protected by this lock.

->compute and ->compute_fh ensure that data is generated even if process
generating data fails... unless no process can generate data.  It is
done thank to loop that waits either for data, or for a chance to
regenerate it themselves.  This shouldn't matter in most situations.


Compared to previous version of this series there was no ->compute_fh
(and was no common ->_compute_generic).


Alternate solution to locking to prevent 'cache miss stampede' problem,
is "expires_variance" (float) parameter used by CHI, which allows items
to expire little earlier than stated expiration time.  CHI also has
deprecated "busy_lock" (duration) which bumps expiration time, but it
doubles number of writes.

Comparing (benchmarking) those different solutions is yet to be done...

 gitweb/Makefile                                |    1 +
 gitweb/README                                  |    6 +-
 gitweb/gitweb.perl                             |    4 +-
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |  147 ++++++++++++
 t/t9503/test_cache_interface.pl                |  283 ++++++++++++++++++++++--
 5 files changed, 424 insertions(+), 17 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/FileCacheWithLocking.pm

diff --git a/gitweb/Makefile b/gitweb/Makefile
index d67c138..acc4f5c 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -116,6 +116,7 @@ GITWEB_FILES += static/git-logo.png static/git-favicon.png
 # gitweb output caching
 GITWEB_MODULES += GitwebCache/CacheOutput.pm
 GITWEB_MODULES += GitwebCache/SimpleFileCache.pm
+GITWEB_MODULES += GitwebCache/FileCacheWithLocking.pm
 GITWEB_MODULES += GitwebCache/Capture/Simple.pm
 
 GITWEB_REPLACE = \
diff --git a/gitweb/README b/gitweb/README
index 3dc01bd..98e8101 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -351,8 +351,12 @@ cache config (see below), and ignore unrecognized options.  Such caching
 engine should also implement (at least) ->get($key) and ->set($key, $data)
 methods (Cache::Cache and CHI compatible interface).
 
+You can set $cache to 'GitwebCache::SimpleFileCache' if you don't want
+to use locking, but then some advanced features, like generating data in
+background, wouldn't work because they require locking.
+
 If $cache is left unset (if it is left undefined), then gitweb would use
-GitwebCache::SimpleFileCache as caching engine.  This engine is 'dumb' (but
+GitwebCache::FileCacheWithLocking as caching engine.  This engine is 'dumb' (but
 fast) file based caching layer, currently without any support for cache size
 limiting, or even removing expired / grossly expired entries.  It has
 therefore the downside of requiring a huge amount of disk space if there are
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 12e04a1..72683be 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -277,7 +277,7 @@ our $caching_enabled = 0;
 # Set to _initialized_ instance of cache interface implementing (at least)
 # get($key) and set($key, $data) methods (Cache::Cache and CHI interfaces),
 # or to name of class of cache interface implementing said methods.
-# If unset, GitwebCache::SimpleFileCache would be used, which is 'dumb'
+# If unset, GitwebCache::FileCacheWithLocking would be used, which is 'dumb'
 # (but fast) file based caching layer, currently without any support for
 # cache size limiting.  It is therefore recommended that the cache directory
 # be periodically completely deleted; this operation is safe to perform.
@@ -1245,7 +1245,7 @@ sub configure_caching {
 	# $cache might be initialized (instantiated) cache, i.e. cache object,
 	# or it might be name of class, or it might be undefined
 	unless (defined $cache && ref($cache)) {
-		$cache ||= 'GitwebCache::SimpleFileCache';
+		$cache ||= 'GitwebCache::FileCacheWithLocking';
 		eval "require $cache";
 		die $@ if $@;
 		$cache = $cache->new({
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
new file mode 100644
index 0000000..1ea0e60
--- /dev/null
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -0,0 +1,147 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Gitweb caching engine, simple file-based cache, with locking
+#
+
+# Based on GitwebCache::SimpleFileCache, minimalistic cache that
+# stores data in the filesystem, without serialization.
+#
+# It uses file locks (flock) to have only one process generating data
+# and writing to cache, when using CHI interface ->compute() method.
+
+package GitwebCache::FileCacheWithLocking;
+use base qw(GitwebCache::SimpleFileCache);
+
+use strict;
+use warnings;
+
+use File::Path qw(mkpath);
+use Fcntl qw(:flock);
+
+# ......................................................................
+# constructor is inherited from GitwebCache::SimpleFileCache
+
+# ----------------------------------------------------------------------
+# utility functions and methods
+
+# Take an human readable key, and return path to be used for lockfile
+# Ensures that file can be created, if needed.
+sub get_lockname {
+	my ($self, $key) = @_;
+
+	my $lockfile = $self->path_to_key($key, \my $dir) . '.lock';
+
+	# ensure that directory leading to lockfile exists
+	if (!-d $dir) {
+		eval { mkpath($dir, 0, 0777); 1 }
+			or die "Couldn't mkpath '$dir' for lockfile: $!";
+	}
+
+	return $lockfile;
+}
+
+# ......................................................................
+# interface methods
+
+sub _compute_generic {
+	my ($self, $key,
+	    $get_code, $set_code, $get_locked) = @_;
+
+	my @result = $get_code->();
+	return @result if @result;
+
+	my $lockfile = $self->get_lockname($key);
+
+	# this loop is to protect against situation where process that
+	# acquired exclusive lock (writer) dies or exits (die_error)
+	# before writing data to cache
+	my $lock_state; # needed for loop condition
+	do {
+		open my $lock_fh, '+>', $lockfile
+			or die "Could't open lockfile '$lockfile': $!";
+		$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
+		if ($lock_state) {
+			# acquired writers lock
+			@result = $set_code->();
+
+			# closing lockfile releases lock
+			close $lock_fh
+				or die "Could't close lockfile '$lockfile': $!";
+
+		} else {
+			# get readers lock (wait for writer)
+			flock($lock_fh, LOCK_SH);
+			# closing lockfile releases lock
+			if ($get_locked) {
+				@result = $get_code->();
+				close $lock_fh
+					or die "Could't close lockfile '$lockfile': $!";
+			} else {
+				close $lock_fh
+					or die "Could't close lockfile '$lockfile': $!";
+				@result = $get_code->();
+			}
+		}
+	} until (@result || $lock_state);
+	# repeat until we have data, or we tried generating data oneself and failed
+	return @result;
+}
+
+# $data = $cache->compute($key, $code);
+#
+# Combines the get and set operations in a single call.  Attempts to
+# get $key; if successful, returns the value.  Otherwise, calls $code
+# and uses the return value as the new value for $key, which is then
+# returned.
+#
+# Uses file locking to have only one process updating value for $key
+# to avoid 'cache miss stampede' (aka 'stampeding herd') problem.
+sub compute {
+	my ($self, $key, $code) = @_;
+
+	return ($self->_compute_generic($key,
+		sub {
+			return $self->get($key);
+		},
+		sub {
+			my $data = $code->();
+			$self->set($key, $data);
+			return $data;
+		},
+		0 # $self->get($key); is outside LOCK_SH critical section
+	))[0]; # return single value: $data
+}
+
+# ($fh, $filename) = $cache->compute_fh($key, $code);
+#
+# Combines the get and set operations in a single call.  Attempts to
+# get $key; if successful, returns the filehandle it can be read from.
+# Otherwise, calls $code passing filehandle to write to as a
+# parameter; contents of this file is then used as the new value for
+# $key; returns filehandle from which one can read newly generated data.
+#
+# Uses file locking to have only one process updating value for $key
+# to avoid 'cache miss stampede' (aka 'stampeding herd') problem.
+sub compute_fh {
+	my ($self, $key, $code_fh) = @_;
+
+	return $self->_compute_generic($key,
+		sub {
+			return $self->get_fh($key);
+		},
+		sub {
+			return $self->set_coderef_fh($key, $code_fh);
+		},
+		1 # $self->get_fh($key); just opens file
+	);
+}
+
+1;
+__END__
+# end of package GitwebCache::FileCacheWithLocking
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 0b4b09f..c6a28f8 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -4,6 +4,12 @@ use lib (split(/:/, $ENV{GITPERLLIB}));
 use warnings;
 use strict;
 
+use POSIX qw(dup2);
+use Fcntl qw(:DEFAULT);
+use IO::Handle;
+use IO::Select;
+use IO::Pipe;
+
 use Test::More;
 
 # test source version
@@ -12,11 +18,12 @@ use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
 
 # Test creating a cache
 #
-BEGIN { use_ok('GitwebCache::SimpleFileCache'); }
+BEGIN { use_ok('GitwebCache::FileCacheWithLocking'); }
 diag("Using lib '$INC[0]'");
-diag("Testing '$INC{'GitwebCache/SimpleFileCache.pm'}'");
+diag("Testing '$INC{'GitwebCache/FileCacheWithLocking.pm'}'");
 
-my $cache = new_ok('GitwebCache::SimpleFileCache');
+my $cache = new_ok('GitwebCache::FileCacheWithLocking');
+isa_ok($cache, 'GitwebCache::SimpleFileCache');
 
 # Test that default values are defined
 #
@@ -39,6 +46,9 @@ SKIP: {
 		"default cache depth is $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH");
 }
 
+# ----------------------------------------------------------------------
+# CACHE API
+
 # Test the getting, setting, and removal of a cached value
 # (Cache::Cache interface)
 #
@@ -91,23 +101,15 @@ sub write_value {
 	$call_count++;
 	print {$fh} $value;
 }
-sub compute_fh_output {
-	my ($cache, $key, $code_fh) = @_;
-
-	my ($fh, $filename) = $cache->compute_fh($key, $code_fh);
-
-	local $/ = undef;
-	return <$fh>;
-}
 subtest 'compute_fh interface' => sub {
 	can_ok($cache, qw(compute_fh));
 
 	$cache->remove($key);
-	is(compute_fh_output($cache, $key, \&write_value), $value,
+	is(cache_compute_fh($cache, $key, \&write_value), $value,
 	   "compute_fh 1st time (set) returns '$value'");
-	is(compute_fh_output($cache, $key, \&write_value), $value,
+	is(cache_compute_fh($cache, $key, \&write_value), $value,
 	   "compute_fh 2nd time (get) returns '$value'");
-	is(compute_fh_output($cache, $key, \&write_value), $value,
+	is(cache_compute_fh($cache, $key, \&write_value), $value,
 	   "compute_fh 3rd time (get) returns '$value'");
 	cmp_ok($call_count, '==', 1, 'write_value() is called once from compute_fh');
 
@@ -166,4 +168,257 @@ subtest 'adaptive cache expiration' => sub {
 
 $cache->set_expires_in(-1);
 
+# ----------------------------------------------------------------------
+# CONCURRENT ACCESS
+sub parallel_run (&); # forward declaration of prototype
+
+# Test 'stampeding herd' / 'cache miss stampede' problem
+#
+
+my $slow_time = 1; # how many seconds to sleep in mockup of slow generation
+sub get_value_slow {
+	$call_count++;
+	sleep $slow_time;
+	return $value;
+}
+sub get_value_slow_fh {
+	my $fh = shift;
+
+	$call_count++;
+	sleep $slow_time;
+	print {$fh} $value;
+}
+
+sub get_value_die {
+	$call_count++;
+	die "get_value_die\n";
+}
+
+my $lock_file = "$0.$$.lock";
+sub get_value_die_once {
+	if (sysopen my $fh, $lock_file, (O_WRONLY | O_CREAT | O_EXCL)) {
+		close $fh;
+		die "get_value_die_once\n";
+	} else {
+		sleep $slow_time;
+		return $value;
+	}
+}
+
+my @output;
+my $sep = '|';
+my $total_count = 0;
+
+note("Following tests contain artifical delay of $slow_time seconds");
+subtest 'parallel access' => sub {
+	$cache->remove($key);
+	@output = parallel_run {
+		$call_count = 0;
+		my $data = cache_get_set($cache, $key, \&get_value_slow);
+		print "$data$sep$call_count";
+	};
+	$total_count = 0;
+	foreach (@output) {
+		my ($child_out, $child_count) = split(quotemeta $sep, $_);
+		#is($child_out, $value, "get/set (parallel) returns '$value'");
+		$total_count += $child_count;
+	}
+	cmp_ok($total_count, '==', 2, 'parallel get/set: get_value_slow() called twice');
+
+	$cache->remove($key);
+	@output = parallel_run {
+		$call_count = 0;
+		my $data = cache_compute($cache, $key, \&get_value_slow);
+		print "$data$sep$call_count";
+	};
+	$total_count = 0;
+	foreach (@output) {
+		my ($child_out, $child_count) = split(quotemeta $sep, $_);
+		#is($child_out, $value, "compute (parallel) returns '$value'");
+		$total_count += $child_count;
+	}
+	cmp_ok($total_count, '==', 1, 'parallel compute: get_value_slow() called once');
+
+	$cache->remove($key);
+	@output = parallel_run {
+		$call_count = 0;
+		my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+		print "$data$sep$call_count";
+	};
+	$total_count = 0;
+	foreach (@output) {
+		my ($child_out, $child_count) = split(quotemeta $sep, $_);
+		#is($child_out, $value, "compute_fh (parallel) returns '$value'");
+		$total_count += $child_count;
+	}
+	cmp_ok($total_count, '==', 1, 'parallel compute_fh: get_value_slow_fh() called once');
+
+	eval {
+		local $SIG{ALRM} = sub { die "alarm\n"; };
+		alarm 4*$slow_time;
+
+		@output = parallel_run {
+			$call_count = 0;
+			my $data = eval { cache_compute($cache, 'No Key', \&get_value_die); };
+			my $eval_error = $@;
+			print "$data" if defined $data;
+			print "$sep";
+			print "$eval_error" if defined $eval_error;
+		};
+		is_deeply(
+			\@output,
+			[ ( "${sep}get_value_die\n" ) x 2 ],
+			'parallel compute: get_value_die() died in both'
+		);
+
+		alarm 0;
+	};
+	ok(!$@, 'parallel compute: no alarm call (neither process hung)');
+	diag($@) if $@;
+
+	$cache->remove($key);
+	unlink($lock_file);
+	@output = parallel_run {
+		my $data = eval { cache_compute($cache, $key, \&get_value_die_once); };
+		my $eval_error = $@;
+		print "$data" if defined $data;
+		print "$sep";
+		print "$eval_error" if defined $eval_error;
+	};
+	is_deeply(
+		[sort @output],
+		[sort ("$value$sep", "${sep}get_value_die_once\n")],
+		'parallel compute: return correct value even if other process died'
+	);
+	unlink($lock_file);
+
+	done_testing();
+};
+
 done_testing();
+
+
+#######################################################################
+#######################################################################
+#######################################################################
+
+# use ->get($key) and ->set($key, $data) interface
+sub cache_get_set {
+	my ($cache, $key, $code) = @_;
+
+	my $data = $cache->get($key);
+	if (!defined $data) {
+		$data = $code->();
+		$cache->set($key, $data);
+	}
+
+	return $data;
+}
+
+# use ->compute($key, $code) interface
+sub cache_compute {
+	my ($cache, $key, $code) = @_;
+
+	my $data = $cache->compute($key, $code);
+	return $data;
+}
+# use ->compute_fh($key, $code_fh) interface
+sub cache_compute_fh {
+	my ($cache, $key, $code_fh) = @_;
+
+	my ($fh, $filename) = $cache->compute_fh($key, $code_fh);
+
+	local $/ = undef;
+	return <$fh>;
+}
+
+# from http://aaroncrane.co.uk/talks/pipes_and_processes/
+sub fork_child (&) {
+	my ($child_process_code) = @_;
+
+	my $pid = fork();
+	die "Failed to fork: $!\n" if !defined $pid;
+
+	return $pid if $pid != 0;
+
+	# Now we're in the new child process
+	$child_process_code->();
+	exit;
+}
+
+sub parallel_run (&) {
+	my $child_code = shift;
+	my $nchildren = 2;
+
+	my %children;
+	my (%pid_for_child, %fd_for_child);
+	my $sel = IO::Select->new();
+	foreach my $child_idx (1..$nchildren) {
+		my $pipe = IO::Pipe->new()
+			or die "Failed to create pipe: $!\n";
+
+		my $pid = fork_child {
+			$pipe->writer()
+				or die "$$: Child \$pipe->writer(): $!\n";
+			dup2(fileno($pipe), fileno(STDOUT))
+				or die "$$: Child $child_idx failed to reopen stdout to pipe: $!\n";
+			close $pipe
+				or die "$$: Child $child_idx failed to close pipe: $!\n";
+
+			# From Test-Simple-0.96/t/subtest/fork.t
+			#
+			# Force all T::B output into the pipe (redirected to STDOUT),
+			# for the parent builder as well as the current subtest builder.
+			{
+				no warnings 'redefine';
+				*Test::Builder::output         = sub { *STDOUT };
+				*Test::Builder::failure_output = sub { *STDOUT };
+				*Test::Builder::todo_output    = sub { *STDOUT };
+			}
+
+			$child_code->();
+
+			*STDOUT->flush();
+			close(STDOUT);
+		};
+
+		$pid_for_child{$pid} = $child_idx;
+		$pipe->reader()
+			or die "Failed to \$pipe->reader(): $!\n";
+		$fd_for_child{$pipe} = $child_idx;
+		$sel->add($pipe);
+
+		$children{$child_idx} = {
+			'pid'    => $pid,
+			'stdout' => $pipe,
+			'output' => '',
+		};
+	}
+
+	while (my @ready = $sel->can_read()) {
+		foreach my $fh (@ready) {
+			my $buf = '';
+			my $nread = sysread($fh, $buf, 1024);
+
+			exists $fd_for_child{$fh}
+				or die "Cannot find child for fd: $fh\n";
+
+			if ($nread > 0) {
+				$children{$fd_for_child{$fh}}{'output'} .= $buf;
+			} else {
+				$sel->remove($fh);
+			}
+		}
+	}
+
+	while (%pid_for_child) {
+		my $pid = waitpid -1, 0;
+		warn "Child $pid_for_child{$pid} ($pid) failed with status: $?\n"
+			if $? != 0;
+		delete $pid_for_child{$pid};
+	}
+
+	return map { $children{$_}{'output'} } keys %children;
+}
+
+__END__
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 13/24] gitweb/lib - No need for File::Temp when locking
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (11 preceding siblings ...)
  2010-12-06 23:10 ` [PATCH 12/24] gitweb/lib - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:10 ` [PATCH 14/24] gitweb/lib - Serve stale data when waiting for filling cache Jakub Narebski
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

When using locking to ensure that only one process is generating data
and updating cache, there is no need to use File::Temp for temporary
file.  This should improve performance.

The _tempfile_to_path subroutine got promoted to _tempfile_to_path
method, because we want to choose correct one dynamically, based on
the type of object (polymorphism).

Idea-inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
In previous version of this patch _tempfile_to_path was subroutine
rather than method... which mans that GitwebCache::SimpleFileCache
version was being picked up because that is the package where methods
that use it are.

"Gitweb caching v7" either writes directly to cache entry file, or
uses file with name based on original cache entry as temporary output
file; the latter is just like GitwebCache::FileCacheWithLocking does
it after this patch.

 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   16 ++++++++++++++++
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |    6 +++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 1ea0e60..4d8114d 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -46,6 +46,22 @@ sub get_lockname {
 	return $lockfile;
 }
 
+# ----------------------------------------------------------------------
+# "private" utility functions and methods
+
+# take a file path to cache entry, and its directory
+# return filehandle and filename of open temporary file,
+# like File::Temp::tempfile
+sub _tempfile_to_path {
+	my ($self, $file, $dir) = @_;
+
+	my $tempname = "$file.tmp";
+	open my $temp_fh, '>', $tempname
+		or die "Couldn't open temporary file '$tempname' for writing: $!";
+
+	return ($temp_fh, $tempname);
+}
+
 # ......................................................................
 # interface methods
 
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 12af44f..aeb91d4 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -288,7 +288,7 @@ sub write_fh {
 # return filehandle and filename of open temporary file,
 # like File::Temp::tempfile
 sub _tempfile_to_path {
-	my ($file, $dir) = @_;
+	my ($self, $file, $dir) = @_;
 
 	# tempfile will croak() if there is an error
 	return tempfile("${file}_XXXXX",
@@ -324,7 +324,7 @@ sub store {
 	}
 
 	# generate a temporary file
-	my ($temp_fh, $tempname) = _tempfile_to_path($file, $dir);
+	my ($temp_fh, $tempname) = $self->_tempfile_to_path($file, $dir);
 	chmod 0666, $tempname
 		or warn "Couldn't change permissions to 0666 / -rw-rw-rw- for '$tempname': $!";
 
@@ -466,7 +466,7 @@ sub set_coderef_fh {
 	}
 
 	# generate a temporary file
-	my ($fh, $tempfile) = _tempfile_to_path($path, $dir);
+	my ($fh, $tempfile) = $self->_tempfile_to_path($path, $dir);
 
 	# code writes to filehandle or file
 	$code->($fh, $tempfile);
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 14/24] gitweb/lib - Serve stale data when waiting for filling cache
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (12 preceding siblings ...)
  2010-12-06 23:10 ` [PATCH 13/24] gitweb/lib - No need for File::Temp when locking Jakub Narebski
@ 2010-12-06 23:10 ` Jakub Narebski
  2010-12-06 23:11 ` [PATCH 15/24] gitweb/lib - Regenerate (refresh) cache in background Jakub Narebski
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:10 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

When process fails to acquire exclusive (writers) lock, then instead
of waiting for the other process to (re)generate and fill cache, serve
stale (expired) data from cache.  This is of course possible only if
there is some stale data in cache for given key.

This feature of GitwebCache::FileCacheWithLocking is used only for
->update($key, $code) and ->update_fh($key, $code_fh) methods.  It is
controlled by 'max_lifetime' cache parameter; you can set it to -1 to
always serve stale data if it exists, and you can set it to 0 (or any
value smaller than 'expires_min') to turn this feature off.

This feature, as it is implemented currently, makes ->update() method a
bit asymmetric with respect to process that acquired writers lock and
those processes that didn't, which can be seen in the new test in t9503.
The process that is to regenerate (refresh) data in cache must wait for
the data to be generated in full before showing anything to client, while
the other processes can show stale (expired) data immediately.  In order
to remove or reduce this asymmetry gitweb would need to employ one of
two alternate solutions.  Either data should be (re)generated in background,
so that process that acquired writers lock would generate data in
background while serving stale data, or alternatively the process that
generates data should pass output to original STDOUT while capturing it
("tee" output).

Note that process that got stale data serves it immediately, therefore
it wouldn't be available to regenerate data if process regenerating
data died; see commented-out TODO test in t9503.  Otherwise it would
have to wait to check if data got regenerated, which would negate the
idea of serving stale data for a fast return.

When developing this feature, ->is_valid() method in base class
GitwebCache::SimpleFileCache acquired additional extra optional
parameter, where one can pass expire time instead of using whole-cache
global (adaptive) expire time.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Compared to version in previous version of this series the parallel
access test got much improved (this actually started in earlier
commit).

This is the part that is possible _without_ regenerating cache in
background.

Note that here it is explicit that serving stale data when some
process is regenerating cache is possible only with locking enabled,
i.e. when using GitwebCache::FileCacheWithLocking.

 gitweb/gitweb.perl                             |    8 ++
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |  105 ++++++++++++++++++++++--
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |   22 ++++--
 t/t9503/test_cache_interface.pl                |   68 +++++++++++++++-
 4 files changed, 189 insertions(+), 14 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 72683be..454766c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -327,6 +327,14 @@ our %cache_options = (
 	# lifetime control.
 	# (Compatibile with Cache::Adaptive.)
 	'check_load' => \&get_loadavg,
+
+	# Maximum cache file life, in seconds.  If cache entry lifetime exceeds
+	# this value, it wouldn't be served as being too stale when waiting for
+	# cache to be regenerated/refreshed, instead of trying to display
+	# existing cache date.
+	# Set it to -1 to always serve existing data if it exists,
+	# set it to 0 to turn off serving stale data - always wait.
+	'max_lifetime' => 5*60*60, # 5 hours
 );
 # Set to _initialized_ instance of GitwebCache::Capture compatibile capturing
 # engine, i.e. one implementing ->new() constructor, and ->capture($code)
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 4d8114d..1d32810 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -25,7 +25,88 @@ use File::Path qw(mkpath);
 use Fcntl qw(:flock);
 
 # ......................................................................
-# constructor is inherited from GitwebCache::SimpleFileCache
+# constructor
+
+# The options are set by passing in a reference to a hash containing
+# any of the following keys:
+#  * 'namespace'
+#    The namespace associated with this cache.  This allows easy separation of
+#    multiple, distinct caches without worrying about key collision.  Defaults
+#    to $DEFAULT_NAMESPACE.
+#  * 'cache_root' (Cache::FileCache compatibile),
+#    'root_dir' (CHI::Driver::File compatibile),
+#    The location in the filesystem that will hold the root of the cache.
+#    Defaults to $DEFAULT_CACHE_ROOT.
+#  * 'cache_depth' (Cache::FileCache compatibile),
+#    'depth' (CHI::Driver::File compatibile),
+#    The number of subdirectories deep to cache object item.  This should be
+#    large enough that no cache directory has more than a few hundred objects.
+#    Defaults to $DEFAULT_CACHE_DEPTH unless explicitly set.
+#  * 'default_expires_in' (Cache::Cache compatibile),
+#    'expires_in' (CHI compatibile) [seconds]
+#    The expiration time for objects place in the cache.
+#    Defaults to -1 (never expire) if not explicitly set.
+#    Sets 'expires_min' to given value.
+#  * 'expires_min' [seconds]
+#    The minimum expiration time for objects in cache (e.g. with 0% CPU load).
+#    Used as lower bound in adaptive cache lifetime / expiration.
+#    Defaults to 20 seconds; 'expires_in' sets it also.
+#  * 'expires_max' [seconds]
+#    The maximum expiration time for objects in cache.
+#    Used as upper bound in adaptive cache lifetime / expiration.
+#    Defaults to 1200 seconds, if not set; 
+#    defaults to 'expires_min' if 'expires_in' is used.
+#  * 'check_load'
+#    Subroutine (code) used for adaptive cache lifetime / expiration.
+#    If unset, adaptive caching is turned off; defaults to unset.
+#  * 'increase_factor' [seconds / 100% CPU load]
+#    Factor multiplying 'check_load' result when calculating cache lietime.
+#    Defaults to 60 seconds for 100% SPU load ('check_load' returning 1.0).
+#
+# (all the above are inherited from GitwebCache::SimpleFileCache)
+#
+#  * 'max_lifetime' [seconds]
+#    If it is greater than 0, and cache entry is expired but not older
+#    than it, serve stale data when waiting for cache entry to be 
+#    regenerated (refreshed).  Non-adaptive.
+#    Defaults to -1 (never expire / always serve stale).
+sub new {
+	my $class = shift;
+	my %opts = ref $_[0] ? %{ $_[0] } : @_;
+
+	my $self = $class->SUPER::new(\%opts);
+
+	my ($max_lifetime);
+	if (%opts) {
+		$max_lifetime =
+			$opts{'max_lifetime'} ||
+			$opts{'max_cache_lifetime'};
+	}
+	$max_lifetime = -1 unless defined($max_lifetime);
+
+	$self->set_max_lifetime($max_lifetime);
+
+	return $self;
+}
+
+# ......................................................................
+# accessors
+
+# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
+
+# creates get_depth() and set_depth($depth) etc. methods
+foreach my $i (qw(max_lifetime)) {
+	my $field = $i;
+	no strict 'refs';
+	*{"get_$field"} = sub {
+		my $self = shift;
+		return $self->{$field};
+	};
+	*{"set_$field"} = sub {
+		my ($self, $value) = @_;
+		$self->{$field} = $value;
+	};
+}
 
 # ----------------------------------------------------------------------
 # utility functions and methods
@@ -67,7 +148,7 @@ sub _tempfile_to_path {
 
 sub _compute_generic {
 	my ($self, $key,
-	    $get_code, $set_code, $get_locked) = @_;
+	    $get_code, $fetch_code, $set_code, $fetch_locked) = @_;
 
 	my @result = $get_code->();
 	return @result if @result;
@@ -91,17 +172,23 @@ sub _compute_generic {
 				or die "Could't close lockfile '$lockfile': $!";
 
 		} else {
+			# try to retrieve stale data
+			@result = $fetch_code->()
+				if $self->is_valid($key, $self->get_max_lifetime());
+			return @result if @result;
+
 			# get readers lock (wait for writer)
+			# if there is no stale data to serve
 			flock($lock_fh, LOCK_SH);
 			# closing lockfile releases lock
-			if ($get_locked) {
-				@result = $get_code->();
+			if ($fetch_locked) {
+				@result = $fetch_code->();
 				close $lock_fh
 					or die "Could't close lockfile '$lockfile': $!";
 			} else {
 				close $lock_fh
 					or die "Could't close lockfile '$lockfile': $!";
-				@result = $get_code->();
+				@result = $fetch_code->();
 			}
 		}
 	} until (@result || $lock_state);
@@ -126,6 +213,9 @@ sub compute {
 			return $self->get($key);
 		},
 		sub {
+			return $self->fetch($key);
+		},
+		sub {
 			my $data = $code->();
 			$self->set($key, $data);
 			return $data;
@@ -152,9 +242,12 @@ sub compute_fh {
 			return $self->get_fh($key);
 		},
 		sub {
+			return $self->fetch_fh($key);
+		},
+		sub {
 			return $self->set_coderef_fh($key, $code_fh);
 		},
-		1 # $self->get_fh($key); just opens file
+		1 # $self->fetch_fh($key); just opens file
 	);
 }
 
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index aeb91d4..21ec434 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -365,12 +365,13 @@ sub remove {
 		or die "Couldn't remove file '$file': $!";
 }
 
-# $cache->is_valid($key)
+# $cache->is_valid($key[, $expires_in])
 #
 # Returns a boolean indicating whether $key exists in the cache
-# and has not expired (global per-cache 'expires_in').
+# and has not expired.  Uses global per-cache expires time, unless
+# passed optional $expires_in argument.
 sub is_valid {
-	my ($self, $key) = @_;
+	my ($self, $key, $expires_in) = @_;
 
 	my $path = $self->path_to_key($key);
 
@@ -383,7 +384,7 @@ sub is_valid {
 	return 0 unless ((stat(_))[7] > 0);
 
 	# expire time can be set to never
-	my $expires_in = $self->get_expires_in();
+	$expires_in = defined $expires_in ? $expires_in : $self->get_expires_in();
 	return 1 unless (defined $expires_in && $expires_in >= 0);
 
 	# is file expired?
@@ -441,18 +442,25 @@ sub compute {
 # ......................................................................
 # nonstandard interface methods
 
-sub get_fh {
+sub fetch_fh {
 	my ($self, $key) = @_;
 
 	my $path = $self->path_to_key($key);
 	return unless (defined $path);
 
-	return unless ($self->is_valid($key));
-
 	open my $fh, '<', $path or return;
 	return ($fh, $path);
 }
 
+
+sub get_fh {
+	my ($self, $key) = @_;
+
+	return unless ($self->is_valid($key));
+
+	return $self->fetch_fh($key);
+}
+
 sub set_coderef_fh {
 	my ($self, $key, $code) = @_;
 
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index c6a28f8..8a52261 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -22,7 +22,9 @@ BEGIN { use_ok('GitwebCache::FileCacheWithLocking'); }
 diag("Using lib '$INC[0]'");
 diag("Testing '$INC{'GitwebCache/FileCacheWithLocking.pm'}'");
 
-my $cache = new_ok('GitwebCache::FileCacheWithLocking');
+my $cache = new_ok('GitwebCache::FileCacheWithLocking', [ {
+	'max_lifetime' => 0, # turn it off
+} ]);
 isa_ok($cache, 'GitwebCache::SimpleFileCache');
 
 # Test that default values are defined
@@ -295,6 +297,70 @@ subtest 'parallel access' => sub {
 	done_testing();
 };
 
+# Test that cache returns stale data in existing but expired cache situation
+# (probably should be run only if GIT_TEST_LONG)
+#
+my $stale_value = 'Stale Value';
+
+subtest 'serving stale data when (re)generating' => sub {
+	$cache->set($key, $stale_value);
+	$call_count = 0;
+	$cache->set_expires_in(0);    # expire now
+	$cache->set_max_lifetime(-1); # forever (always serve stale data)
+
+	@output = parallel_run {
+		my $data = cache_compute($cache, $key, \&get_value_slow);
+		print $data;
+	};
+	ok(scalar(grep { $_ eq $stale_value } @output),
+	   'stale data in at least one process when expired');
+
+	$cache->set_expires_in(-1); # never expire for next ->get
+	is($cache->get($key), $value,
+	   'value got set correctly, even if stale data returned');
+
+
+# 	$cache->set($key, $stale_value);
+# 	unlink($lock_file);
+# 	@output = parallel_run {
+# 		my $data = eval { cache_compute($cache, $key, \&get_value_die_once); };
+# 		my $eval_error = $@;
+# 		print "$data" if defined $data;
+# 		print "$sep";
+# 		print "$eval_error" if defined $eval_error;
+# 	};
+#  TODO: {
+# 		local $TODO = 'not implemented';
+#
+# 		is_deeply(
+# 			[sort @output],
+# 			[sort ("$value${sep}", "${sep}get_value_die_once\n")],
+# 			'return non-stale value, even if process regenerating it died'
+# 		);
+#
+# 		$cache->set_expires_in(-1); # never expire for next ->get
+# 		is($cache->get($key), $value,
+# 		   'value got regenerated, even if process regenerating it died');
+# 	};
+# 	unlink($lock_file);
+
+	$cache->set($key, $stale_value);
+	$cache->set_expires_in(0);   # expire now
+	$cache->set_max_lifetime(0); # don't serve stale data
+
+	@output = parallel_run {
+		my $data = cache_compute($cache, $key, \&get_value_slow);
+		print $data;
+	};
+	# no returning stale data
+	ok(!scalar(grep { $_ eq $stale_value } @output),
+	   'no stale data if configured');
+
+
+	done_testing();
+};
+$cache->set_expires_in(-1);
+
 done_testing();
 
 
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 15/24] gitweb/lib - Regenerate (refresh) cache in background
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (13 preceding siblings ...)
  2010-12-06 23:10 ` [PATCH 14/24] gitweb/lib - Serve stale data when waiting for filling cache Jakub Narebski
@ 2010-12-06 23:11 ` Jakub Narebski
  2010-12-06 23:11 ` [PATCH 16/24] gitweb: Introduce %actions_info, gathering information about actions Jakub Narebski
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:11 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

This commit removes asymmetry in serving stale data (if stale data exists)
when regenerating cache in GitwebCache::FileCacheWithLocking.  The process
that acquired exclusive (writers) lock, and is therefore selected to
be the one that (re)generates data to fill the cache, can now generate
data in background, while serving stale data.

Those background processes are daemonized, i.e. detached from the main
process (the one returning data or stale data).  Otherwise there might be a
problem when gitweb is running as (part of) long-lived process, for example
from mod_perl or from FastCGI: it would leave unreaped children as zombies
(entries in process table).  We don't want to wait for background process,
and we can't set $SIG{CHLD} to 'IGNORE' in gitweb to automatically reap
child processes, because this interferes with using
  open my $fd, '-|', git_cmd(), 'param', ...
    or die_error(...)
  # read from <$fd>
  close $fd
    or die_error(...)
In the above code "close" for magic "-|" open calls waitpid...  and we
would would die with "No child processes".  Removing 'or die' would
possibly remove ability to react to other errors.

This feature can be enabled or disabled on demand via 'background_cache'
cache parameter.  It is turned on by default.


The t9503 test got updated to test both case with background generation
enabled and case with background generation disabled.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Compared to previous version of this series ->_set_maybe_background
was extracted (refactored) from ->_compute_generic (in earlier version
it would be just ->compute).  This hopefully makes code easier to
understand.

Differences to approach taken in "Gitweb caching v7" by J.H.
* It is made explicit that background generation depends on using
  locking.  It doesn't matter for J.H. series, as you canot turn off
  using locking there.

* Forking (running generating process in background) is done only if
  there is stale data to serve (and if background cache is turned on).
  In J.H. series forking was done unconditionally, only generation or
  exit depended on $backgroundCache (and technical/for debugging
  $cacheDoFork).

* Locking is done before forking, as forking background process is done
  only for the process regenerating cache.

* Daemonizes background process, detaching it from parent (using
  setsid).  This way whether main process is short-lived (gitweb as CGI)
  or long-lived (mod_perl, PSGI or FastCGI), there would be no need to
  wait and no zombies.

 gitweb/gitweb.perl                             |    9 +++
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   64 ++++++++++++++++++++++--
 t/t9503/test_cache_interface.pl                |   40 ++++++++++++++-
 3 files changed, 106 insertions(+), 7 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 454766c..f202d6b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -335,6 +335,15 @@ our %cache_options = (
 	# Set it to -1 to always serve existing data if it exists,
 	# set it to 0 to turn off serving stale data - always wait.
 	'max_lifetime' => 5*60*60, # 5 hours
+
+	# This enables/disables background caching.  If it is set to true value,
+	# caching engine would return stale data (if it is not older than
+	# 'max_lifetime' seconds) if it exists, and launch process if regenerating
+	# (refreshing) cache into the background.  If it is set to false value,
+	# the process that fills cache must always wait for data to be generated.
+	# In theory this will make gitweb seem more responsive at the price of
+	# serving possibly stale data.
+	'background_cache' => 1,
 );
 # Set to _initialized_ instance of GitwebCache::Capture compatibile capturing
 # engine, i.e. one implementing ->new() constructor, and ->capture($code)
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 1d32810..82e88f1 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -23,6 +23,7 @@ use warnings;
 
 use File::Path qw(mkpath);
 use Fcntl qw(:flock);
+use POSIX qw(setsid);
 
 # ......................................................................
 # constructor
@@ -70,21 +71,27 @@ use Fcntl qw(:flock);
 #    than it, serve stale data when waiting for cache entry to be 
 #    regenerated (refreshed).  Non-adaptive.
 #    Defaults to -1 (never expire / always serve stale).
+#  * 'background_cache' (boolean)
+#    This enables/disables regenerating cache in background process.
+#    Defaults to true.
 sub new {
 	my $class = shift;
 	my %opts = ref $_[0] ? %{ $_[0] } : @_;
 
 	my $self = $class->SUPER::new(\%opts);
 
-	my ($max_lifetime);
+	my ($max_lifetime, $background_cache);
 	if (%opts) {
 		$max_lifetime =
 			$opts{'max_lifetime'} ||
 			$opts{'max_cache_lifetime'};
+		$background_cache = $opts{'background_cache'};
 	}
 	$max_lifetime = -1 unless defined($max_lifetime);
+	$background_cache = 1 unless defined($background_cache);
 
 	$self->set_max_lifetime($max_lifetime);
+	$self->set_background_cache($background_cache);
 
 	return $self;
 }
@@ -95,7 +102,7 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(max_lifetime)) {
+foreach my $i (qw(max_lifetime background_cache)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -146,6 +153,52 @@ sub _tempfile_to_path {
 # ......................................................................
 # interface methods
 
+sub _set_maybe_background {
+	my ($self, $key, $fetch_code, $set_code) = @_;
+
+	my $pid;
+	my (@result, @stale_result);
+
+	if ($self->{'background_cache'}) {
+		# try to retrieve stale data
+		@stale_result = $fetch_code->()
+			if $self->is_valid($key, $self->get_max_lifetime());
+
+		# fork if there is stale data, for background process
+		# to regenerate/refresh the cache (generate data)
+		$pid = fork() if (@stale_result);
+	}
+
+	if ($pid) {
+		## forked and are in parent process
+		# reap child, which spawned grandchild process (detaching it)
+		waitpid $pid, 0;
+
+	}	else {
+		## didn't fork, or are in background process
+
+		# daemonize background process, detaching it from parent
+		# see also Proc::Daemonize, Apache2::SubProcess
+		if (defined $pid) {
+			## in background process
+			POSIX::setsid(); # or setpgrp(0, 0);
+			fork() && CORE::exit(0);
+		}
+
+		@result = $set_code->();
+
+		if (defined $pid) {
+			## in background process; parent will serve stale data
+
+			# lockfile will be automatically closed on exit,
+			# and therefore lockfile would be unlocked
+			CORE::exit(0);
+		}
+	}
+
+	return @result > 0 ? @result : @stale_result;
+}
+
 sub _compute_generic {
 	my ($self, $key,
 	    $get_code, $fetch_code, $set_code, $fetch_locked) = @_;
@@ -162,16 +215,19 @@ sub _compute_generic {
 	do {
 		open my $lock_fh, '+>', $lockfile
 			or die "Could't open lockfile '$lockfile': $!";
+
 		$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
 		if ($lock_state) {
-			# acquired writers lock
-			@result = $set_code->();
+			## acquired writers lock, have to generate data
+			@result = $self->_set_maybe_background($key, $fetch_code, $set_code);
 
 			# closing lockfile releases lock
 			close $lock_fh
 				or die "Could't close lockfile '$lockfile': $!";
 
 		} else {
+			## didn't acquire writers lock, get stale data or wait for regeneration
+
 			# try to retrieve stale data
 			@result = $fetch_code->()
 				if $self->is_valid($key, $self->get_max_lifetime());
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 8a52261..7f08863 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -24,9 +24,13 @@ diag("Testing '$INC{'GitwebCache/FileCacheWithLocking.pm'}'");
 
 my $cache = new_ok('GitwebCache::FileCacheWithLocking', [ {
 	'max_lifetime' => 0, # turn it off
+	'background_cache' => 0,
 } ]);
 isa_ok($cache, 'GitwebCache::SimpleFileCache');
 
+# compute can fork, don't generate zombies
+#local $SIG{CHLD} = 'IGNORE';
+
 # Test that default values are defined
 #
 ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
@@ -303,6 +307,9 @@ subtest 'parallel access' => sub {
 my $stale_value = 'Stale Value';
 
 subtest 'serving stale data when (re)generating' => sub {
+	# without background generation
+	$cache->set_background_cache(0);
+
 	$cache->set($key, $stale_value);
 	$call_count = 0;
 	$cache->set_expires_in(0);    # expire now
@@ -312,12 +319,39 @@ subtest 'serving stale data when (re)generating' => sub {
 		my $data = cache_compute($cache, $key, \&get_value_slow);
 		print $data;
 	};
-	ok(scalar(grep { $_ eq $stale_value } @output),
-	   'stale data in at least one process when expired');
+	# returning stale data works
+	is_deeply(
+		[sort @output],
+		[sort ($value, $stale_value)],
+		'no background: stale data returned by one process'
+	);
+
+	$cache->set_expires_in(-1); # never expire for next ->get
+	is($cache->get($key), $value,
+	   'no background: value got set correctly, even if stale data returned');
+
+
+	# with background generation
+	$cache->set_background_cache(1);
+
+	$cache->set($key, $stale_value);
+	$cache->set_expires_in(0);    # set value is now expired
+	@output = parallel_run {
+		my $data = cache_compute($cache, $key, \&get_value_slow);
+		print $data;
+	};
+	# returning stale data works
+	is_deeply(
+		\@output,
+		[ ($stale_value) x 2 ],
+		'background: stale data returned by both process when expired'
+	);
 
 	$cache->set_expires_in(-1); # never expire for next ->get
+	note('waiting for background process to have time to set data');
+	sleep $slow_time; # wait for background process to have chance to set data
 	is($cache->get($key), $value,
-	   'value got set correctly, even if stale data returned');
+	   'background: value got set correctly by background process');
 
 
 # 	$cache->set($key, $stale_value);
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 16/24] gitweb: Introduce %actions_info, gathering information about actions
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (14 preceding siblings ...)
  2010-12-06 23:11 ` [PATCH 15/24] gitweb/lib - Regenerate (refresh) cache in background Jakub Narebski
@ 2010-12-06 23:11 ` Jakub Narebski
  2010-12-06 23:11 ` [PATCH 17/24] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:11 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

Currently it only contains information about output format, and is not
used anywhere.  It will be used to check whether current action
produces HTML output, and therefore is displaying HTML-based progress
info about (re)generating cache makes sense.

It can contain information about allowed extra options, whether to
display link to feed (Atom or RSS), etc. in easier and faster way than
listing all matching or all non-matching actions at appropriate place.


Currently not used; will be used in next commit, to check if action
produces HTML output and therefore we can use HTML-specific progress
indicator.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
While it is patch preparing infrastructure to be used by progress
indicator (generating info), it is IMHO a good idea in itself, and
that is why it is not marked as RFC.

This patch is the same as in the previous version.

It has no equivalent in "Gitweb cachning v7" series by J.H.

 gitweb/gitweb.perl |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f202d6b..da95388 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -818,6 +818,25 @@ our %allowed_options = (
 	"--no-merges" => [ qw(rss atom log shortlog history) ],
 );
 
+our %actions_info = ();
+sub evaluate_actions_info {
+	our %actions_info;
+	our (%actions);
+
+	# unless explicitely stated otherwise, default output format is html
+	foreach my $action (keys %actions) {
+		$actions_info{$action}{'output_format'} = 'html';
+	}
+	# list all exceptions; undef means variable (no definite format)
+	map { $actions_info{$_}{'output_format'} = 'text' }
+		qw(commitdiff_plain patch patches project_index blame_data);
+	map { $actions_info{$_}{'output_format'} = 'xml' }
+		qw(rss atom opml); # there are different types (document formats) of XML
+	map { $actions_info{$_}{'output_format'} = undef }
+		qw(blob_plain object);
+	$actions_info{'snapshot'}{'output_format'} = 'binary';
+}
+
 # fill %input_params with the CGI parameters. All values except for 'opt'
 # should be single values, but opt can be an array. We should probably
 # build an array of parameters that can be multi-valued, but since for the time
@@ -1226,6 +1245,7 @@ sub evaluate_argv {
 
 sub run {
 	evaluate_argv();
+	evaluate_actions_info();
 
 	$pre_listen_hook->()
 		if $pre_listen_hook;
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 17/24] gitweb: Show appropriate "Generating..." page when regenerating cache
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (15 preceding siblings ...)
  2010-12-06 23:11 ` [PATCH 16/24] gitweb: Introduce %actions_info, gathering information about actions Jakub Narebski
@ 2010-12-06 23:11 ` Jakub Narebski
  2010-12-06 23:11 ` [PATCH 18/24] gitweb/lib - Configure running 'generating_info' when generating data Jakub Narebski
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:11 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

When there exist stale/expired (but not too stale) version of
(re)generated page in cache, gitweb returns stale version (and updates
cache in background, assuming 'background_cache' is set to true value).
When there is no stale version suitable to serve the client, currently
we have to wait for the data to be generated in full before showing it.
Add to GitwebCache::FileCacheWithLocking, via 'generating_info' callback,
the ability to show user some activity indicator / progress bar, to
show that we are working on generating data.

Note that without generating data in background, process generating
data wouldn't print progress info, because 'generating_info' can exit
(and in the case of gitweb's git_generating_data_html does exit).


Gitweb itself uses "Generating..." page as activity indicator, which
redirects (via <meta http-equiv="Refresh" ...>) to refreshed version
of the page after the cache is filled (via trick of not closing page
and therefore not closing connection till data is available in cache,
checked by getting shared/readers lock on lockfile for cache entry).
The git_generating_data_html() subroutine, which is used by gitweb
to implement this feature, is highly configurable: you can choose
frequency of writing some data so that connection won't get closed,
and maximum time to wait for data in "Generating..." page (see comments
in %generating_options hash definition).

The git_generating_data_html() subroutine would return early (not showing
HTML-base progress indicator) if action does not return HTML output, or
if web browser / user agent is a robot / web crawler (or gitweb is run as
standalone script).  In such cases HTML "Generating..." page does not make
much sense.

For this purpose new subrotuines action_outputs_html($action) (which uses
%actions_info introduced in earlier commit) and browser_is_robot() (which
uses HTTP::BrowserDetect if possible, and fall backs on simple check of
User-Agent string) were added.


WARNING: If 'generating_info' subroutine always exits, like
git_generating_data_html() currently does, then there would be problems
with error pages, which are not cached... unless the process generating
data does not use 'generating_info', see the next commit.  The initial
delay introduced in 2nd next commit migitates this issue somewhat.

Alternate solution would be to print output when generating data
('tee'-ing it).


Add test for simple (not exiting) 'generating_info' subroutine, for both
case with background generation disabled and enabled to t9503.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
NOTE that after patch introducing DONE_REQUEST, it is this label that
should be used as target label at the end of git_generating_data_html()
rather than DONE_GITWEB.


Previous version of this patch lacked browser_is_robot() check in
git_generating_data_html().  We also now assume that Time::HiRes is
always available (it is in core for Perl 5.8.x that is requirement for
running gitweb).

Differences from relevant parts of "Gitweb caching v7" series by J.H.:
* When providing "Generating..." page for process which acquired writers
  lock, and spawned background process to (re)generate data in cache, we
  have to drop writers lock for 'generating_info' subroutine to try to get
  readers lock to know when data is finished generating.  This is done by
  closing lockfile, and then reopening; I have found that LOCK_UN doesn't
  work reliably enough.

  Note that there might be problem with error pages (die_error exists,
  and its output does not enter cache); it is addressed in next commit
  in this series.  I am not sure if in J.H. patch process generating
  data also shows "Generating..." page; on the other hand if I
  understand it correctly it has initial delay of 2 seconds, which also
  solves this issue as described in 2nd next commit.

* The subroutine that is responsible for doing "Generating..." progress
  info / activity indicator (cacheWaitForUpdate() subroutine in
  J.H. patch, git_generating_data_html() in this patch) resides in
  gitweb.perl, and not in caching module.  It is passed as callback (as
  code reference) to $cache constructor, as 'generating_info' parameter.

* The idea how git_generating_data_html() works for web browsers without
  Ajax, using only plain HTML and server-side scripting is somewhat
  described in comments.

* gitweb prints generating info in more restricted set of situations; the
  set of actions where gitweb does not generate activity indicator is
  larger.  We could probably provide activity indicator also for
  (possibly) non-HTML output, like 'blob_plain' or 'patches', provided
  that 'User-Agent' denotes that we are using web browser (is not a
  robot... which we now check).

* "Generating..." info behavior can be configured (at least the timings)
  via %generating_options hash, instead of having those options hardcoded.

* Waiting is done using blocking flock + alarm, rather than "busy wait"
  loop with non-blocking flock + sleep.

* The page title shows that it is progress info, and for what action
  (for git_generating_data_html()).

* Ability to use sub-second precision.

* Basic test for "Generating..." feature, though not for the
  git_generating_data_html(); see commit message.

 gitweb/gitweb.perl                             |  131 +++++++++++++++++++++++-
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   84 ++++++++++++----
 t/gitweb-lib.sh                                |    1 +
 t/t9503/test_cache_interface.pl                |   79 ++++++++++++++
 4 files changed, 276 insertions(+), 19 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index da95388..181b85d 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -22,7 +22,7 @@ use CGI qw(:standard :escapeHTML -nosticky);
 use CGI::Util qw(unescape);
 use CGI::Carp qw(fatalsToBrowser set_message);
 use Encode;
-use Fcntl ':mode';
+use Fcntl qw(:mode :flock);
 use File::Find qw();
 use File::Basename qw(basename);
 binmode STDOUT, ':utf8';
@@ -344,6 +344,31 @@ our %cache_options = (
 	# In theory this will make gitweb seem more responsive at the price of
 	# serving possibly stale data.
 	'background_cache' => 1,
+
+	# Subroutine which would be called when gitweb has to wait for data to
+	# be generated (it can't serve stale data because there isn't any,
+	# or if it exists it is older than 'max_lifetime').  The default
+	# is to use git_generating_data_html(), which creates "Generating..."
+	# page, which would then redirect or redraw/rewrite the page when
+	# data is ready.
+	# Set it to `undef' to disable this feature.
+	#
+	# Such subroutine (if invoked from GitwebCache::SimpleFileCache)
+	# is passed the following parameters: $cache instance, human-readable
+	# $key to current page, and filehandle $lock_fh to lockfile.
+	'generating_info' => \&git_generating_data_html,
+);
+# You define site-wide options for "Generating..." page (if enabled) here
+# (which means that $cache_options{'generating_info'} is set to coderef);
+# override them with $GITWEB_CONFIG as necessary.
+our %generating_options = (
+	# The time between generating new piece of output to prevent from
+	# redirection before data is ready, i.e. time between printing each
+	# dot in activity indicator / progress info, in seconds.
+	'print_interval' => 2,
+	# Maximum time "Generating..." page would be present, waiting for data,
+	# before unconditional redirect, in seconds.
+	'timeout' => $cache_options{'expires_min'},
 );
 # Set to _initialized_ instance of GitwebCache::Capture compatibile capturing
 # engine, i.e. one implementing ->new() constructor, and ->capture($code)
@@ -837,6 +862,23 @@ sub evaluate_actions_info {
 	$actions_info{'snapshot'}{'output_format'} = 'binary';
 }
 
+sub action_outputs_html {
+	my $action = shift;
+	return $actions_info{$action}{'output_format'} eq 'html';
+}
+
+sub browser_is_robot {
+	return 1 if !exists $ENV{'HTTP_USER_AGENT'}; # gitweb run as script
+	if (eval { require HTTP::BrowserDetect; }) {
+		my $browser = HTTP::BrowserDetect->new();
+		return $browser->robot();
+	}
+	# fallback on detecting known web browsers
+	return 0 if ($ENV{'HTTP_USER_AGENT'} =~ /\b(?:Mozilla|Opera|Safari|IE)\b/);
+	# be conservative; if not sure, assume non-interactive
+	return 1;
+}
+
 # fill %input_params with the CGI parameters. All values except for 'opt'
 # should be single values, but opt can be an array. We should probably
 # build an array of parameters that can be multi-valued, but since for the time
@@ -3555,6 +3597,93 @@ sub get_page_title {
 	return $title;
 }
 
+# creates "Generating..." page when caching enabled and not in cache
+sub git_generating_data_html {
+	my ($cache, $key, $lock_fh) = @_;
+
+	# whitelist of actions that should get "Generating..." page
+	if (!action_outputs_html($action) ||
+	    browser_is_robot()) {
+		return;
+	}
+
+	my $title = "[Generating...] " . get_page_title();
+	# TODO: the following line of code duplicates the one
+	# in git_header_html, and it should probably be refactored.
+	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
+
+	# Use the trick that 'refresh' HTTP header equivalent (set via http-equiv)
+	# with timeout of 0 seconds would redirect as soon as page is finished.
+	# It assumes that browser would display partially received page.
+	# This "Generating..." redirect page should not be cached (externally).
+	my %no_cache = (
+		# HTTP/1.0
+		-Pragma => 'no-cache',
+		# HTTP/1.1
+		-Cache_Control => join(', ', qw(private no-cache no-store must-revalidate
+		                                max-age=0 pre-check=0 post-check=0)),
+	);
+	print STDOUT $cgi->header(-type => 'text/html', -charset => 'utf-8',
+	                          -status=> '200 OK', -expires => 'now',
+	                          %no_cache);
+	print STDOUT <<"EOF";
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
+<!-- git web interface version $version -->
+<!-- git core binaries version $git_version -->
+<head>
+<meta http-equiv="content-type" content="text/html; charset=utf-8" />
+<meta http-equiv="refresh" content="0" />
+<meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version" />
+<meta name="robots" content="noindex, nofollow" />
+<title>$title</title>
+</head>
+<body>
+EOF
+
+	local $| = 1; # autoflush
+	print STDOUT 'Generating...';
+
+	my $total_time = 0;
+	my $interval = $generating_options{'print_interval'} || 1;
+	my $timeout  = $generating_options{'timeout'};
+	my $alarm_handler = sub {
+		local $! = 1;
+		print STDOUT '.';
+		$total_time += $interval;
+		if ($total_time > $timeout) {
+			die "timeout\n";
+		}
+	};
+	eval {
+		local $SIG{ALRM} = $alarm_handler;
+		Time::HiRes::alarm($interval, $interval);
+		my $lock_acquired;
+		do {
+			# loop is needed here because SIGALRM (from 'alarm')
+			# can interrupt process of acquiring lock
+			$lock_acquired = flock($lock_fh, LOCK_SH); # blocking readers lock
+		} until ($lock_acquired);
+		alarm 0;
+	};
+	# It doesn't really matter if we got lock, or timed-out
+	# but we should re-throw unknown (unexpected) errors
+	die $@ if ($@ and $@ !~ /timeout/);
+
+	print STDOUT <<"EOF";
+
+</body>
+</html>
+EOF
+
+	# after refresh web browser would reload page and send new request
+	goto DONE_GITWEB;
+	#exit 0;
+	#return;
+}
+
 sub git_header_html {
 	my $status = shift || "200 OK";
 	my $expires = shift;
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 82e88f1..694c318 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -74,24 +74,32 @@ use POSIX qw(setsid);
 #  * 'background_cache' (boolean)
 #    This enables/disables regenerating cache in background process.
 #    Defaults to true.
+#  * 'generating_info'
+#    Subroutine (code) called when process has to wait for cache entry
+#    to be (re)generated (when there is no not-too-stale data to serve
+#    instead), for other process (or bacground process).  It is passed
+#    $cache instance, $key, and opened $lock_fh filehandle to lockfile.
+#    Unset by default (which means no activity indicator).
 sub new {
 	my $class = shift;
 	my %opts = ref $_[0] ? %{ $_[0] } : @_;
 
 	my $self = $class->SUPER::new(\%opts);
 
-	my ($max_lifetime, $background_cache);
+	my ($max_lifetime, $background_cache, $generating_info);
 	if (%opts) {
 		$max_lifetime =
 			$opts{'max_lifetime'} ||
 			$opts{'max_cache_lifetime'};
 		$background_cache = $opts{'background_cache'};
+		$generating_info  = $opts{'generating_info'};
 	}
 	$max_lifetime = -1 unless defined($max_lifetime);
 	$background_cache = 1 unless defined($background_cache);
 
 	$self->set_max_lifetime($max_lifetime);
 	$self->set_background_cache($background_cache);
+	$self->set_generating_info($generating_info);
 
 	return $self;
 }
@@ -102,7 +110,7 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(max_lifetime background_cache)) {
+foreach my $i (qw(max_lifetime background_cache generating_info)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -115,6 +123,17 @@ foreach my $i (qw(max_lifetime background_cache)) {
 	};
 }
 
+# $cache->generating_info($key, $lock);
+# runs 'generating_info' subroutine, for activity indicator,
+# checking if it is defined first.
+sub generating_info {
+	my $self = shift;
+
+	if (defined $self->{'generating_info'}) {
+		$self->{'generating_info'}->($self, @_);
+	}
+}
+
 # ----------------------------------------------------------------------
 # utility functions and methods
 
@@ -153,6 +172,33 @@ sub _tempfile_to_path {
 # ......................................................................
 # interface methods
 
+sub _wait_for_data {
+	my ($self, $key,
+	    $lock_fh, $lockfile,
+	    $fetch_code, $fetch_locked) = @_;
+	my @result;
+
+	# provide "generating page..." info, if exists
+	$self->generating_info($key, $lock_fh);
+	# generating info may exit, so we can not get there
+
+	# get readers lock, i.e. wait for writer,
+	# which might be background process
+	flock($lock_fh, LOCK_SH);
+	# closing lockfile releases lock
+	if ($fetch_locked) {
+		@result = $fetch_code->();
+		close $lock_fh
+			or die "Could't close lockfile '$lockfile': $!";
+	} else {
+		close $lock_fh
+			or die "Could't close lockfile '$lockfile': $!";
+		@result = $fetch_code->();
+	}
+
+	return @result;
+}
+
 sub _set_maybe_background {
 	my ($self, $key, $fetch_code, $set_code) = @_;
 
@@ -165,8 +211,10 @@ sub _set_maybe_background {
 			if $self->is_valid($key, $self->get_max_lifetime());
 
 		# fork if there is stale data, for background process
-		# to regenerate/refresh the cache (generate data)
-		$pid = fork() if (@stale_result);
+		# to regenerate/refresh the cache (generate data),
+		# or if main process would show progress indicator
+		$pid = fork()
+			if (@stale_result || $self->{'generating_info'});
 	}
 
 	if ($pid) {
@@ -221,10 +269,19 @@ sub _compute_generic {
 			## acquired writers lock, have to generate data
 			@result = $self->_set_maybe_background($key, $fetch_code, $set_code);
 
-			# closing lockfile releases lock
+			# closing lockfile releases writer lock
 			close $lock_fh
 				or die "Could't close lockfile '$lockfile': $!";
 
+			if (!@result) {
+				# wait for background process to finish generating data
+				open $lock_fh, '<', $lockfile
+					or die "Couldn't reopen (for reading) lockfile '$lockfile': $!";
+
+				@result = $self->_wait_for_data($key, $lock_fh, $lockfile,
+				                                $fetch_code, $fetch_locked);
+			}
+
 		} else {
 			## didn't acquire writers lock, get stale data or wait for regeneration
 
@@ -233,19 +290,10 @@ sub _compute_generic {
 				if $self->is_valid($key, $self->get_max_lifetime());
 			return @result if @result;
 
-			# get readers lock (wait for writer)
-			# if there is no stale data to serve
-			flock($lock_fh, LOCK_SH);
-			# closing lockfile releases lock
-			if ($fetch_locked) {
-				@result = $fetch_code->();
-				close $lock_fh
-					or die "Could't close lockfile '$lockfile': $!";
-			} else {
-				close $lock_fh
-					or die "Could't close lockfile '$lockfile': $!";
-				@result = $fetch_code->();
-			}
+			# wait for regeneration
+			@result = $self->_wait_for_data($key, $lock_fh, $lockfile,
+			                                $fetch_code, $fetch_locked);
+
 		}
 	} until (@result || $lock_state);
 	# repeat until we have data, or we tried generating data oneself and failed
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index 4ce067f..d191f0e 100755
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -58,6 +58,7 @@ gitweb_enable_caching () {
 		$caching_enabled = 1;
 		$cache_options{"expires_in"} = -1;      # never expire cache for tests
 		$cache_options{"cache_root"} = "cache"; # to clear the right thing
+		$cache_options{"generating_info"} = undef;
 		EOF
 		rm -rf cache/
 	'
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 7f08863..81f0b76 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -395,6 +395,85 @@ subtest 'serving stale data when (re)generating' => sub {
 };
 $cache->set_expires_in(-1);
 
+
+# Test 'generating_info' feature
+#
+$cache->remove($key);
+my $progress_info = "Generating...";
+sub test_generating_info {
+	local $| = 1;
+	print "$progress_info";
+}
+$cache->set_generating_info(\&test_generating_info);
+
+subtest 'generating progress info' => sub {
+	my @progress;
+
+	# without background generation
+	$cache->set_background_cache(0);
+	$cache->remove($key);
+
+	@output = parallel_run {
+		my $data = cache_compute($cache, $key, \&get_value_slow);
+		print "$sep$data";
+	};
+	@progress = map { s/^(.*)\Q${sep}\E//o && $1 } @output;
+	is_deeply(
+		[sort @progress],
+		[sort ('', $progress_info)],
+		'no background: one process waiting for data prints progress info'
+	);
+	is_deeply(
+		\@output,
+		[ ($value) x 2 ],
+		'no background: both processes return correct value'
+	);
+
+
+	# without background generation, with stale value
+	$cache->set($key, $stale_value);
+	$cache->set_expires_in(0);    # set value is now expired
+	$cache->set_max_lifetime(-1); # stale data never expire
+	@output = parallel_run {
+		my $data = cache_compute($cache, $key, \&get_value_slow);
+		print "$sep$data";
+	};
+	is_deeply(
+		[sort @output],
+	## no progress for generating process without background generation;
+	#	[sort ("$progress_info$sep$value", "$sep$stale_value")],
+		[sort ("$sep$value", "$sep$stale_value")],
+		'no background, stale data: generating gets data, other gets stale data'
+	) or diag('@output is ', join ", ", sort @output);
+	$cache->set_expires_in(-1);
+
+
+	# with background generation
+	$cache->set_background_cache(1);
+	$cache->remove($key);
+
+	@output = parallel_run {
+		my $data = cache_compute($cache, $key, \&get_value_slow);
+		print "$sep$data";
+	};
+	@progress = map { s/^(.*)\Q${sep}\E//o && $1 } @output;
+	is_deeply(
+		\@progress,
+		[ ($progress_info) x 2],
+		'background: both process print progress info'
+	);
+	is_deeply(
+		\@output,
+		[ ($value) x 2 ],
+		'background: both processes return correct value'
+	);
+
+
+	done_testing();
+};
+$cache->set_expires_in(-1);
+
+
 done_testing();
 
 
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 18/24] gitweb/lib - Configure running 'generating_info' when generating data
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (16 preceding siblings ...)
  2010-12-06 23:11 ` [PATCH 17/24] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
@ 2010-12-06 23:11 ` Jakub Narebski
  2010-12-06 23:11 ` [PATCH 19/24] gitweb: Add startup delay to activity indicator for cache Jakub Narebski
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:11 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

Add a new 'generating_info_is_safe' cache option; if true, then
process generating data (one that acquired exclusive writer's lock)
would run 'generating_info' if there is no stale data and background
cache generation is enabled.

If function generating (or printing) exits, which leads to cache entry
not being generated (for gitweb this means that there are pages which
are not cached, i.e. error pages), and 'generating_info' also exits,
this could result in bad behavior.  Therefore this new option is false
by default.


Updates t9503 test appropriately.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch was not present in previous version of this series.

This is one solution to the problem mentioned in commit message;
another is to use initial delay in "Generating..." page, as it is done
in next patch, and as it was done in previous version of this series.

 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   17 ++++++++++++++---
 t/t9503/test_cache_interface.pl                |    1 +
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 694c318..0823c55 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -80,26 +80,35 @@ use POSIX qw(setsid);
 #    instead), for other process (or bacground process).  It is passed
 #    $cache instance, $key, and opened $lock_fh filehandle to lockfile.
 #    Unset by default (which means no activity indicator).
+#  * 'generating_info_is_safe' (boolean)
+#    If true, run 'generating_info' subroutine also in the project that
+#    is generating data.  This has effect only when 'background_cache'
+#    is true (both 'background_cache' and 'generating_info_is_safe' must
+#    be true for project generating data to run 'generating_info').
+#    Defaults to false for safety.
 sub new {
 	my $class = shift;
 	my %opts = ref $_[0] ? %{ $_[0] } : @_;
 
 	my $self = $class->SUPER::new(\%opts);
 
-	my ($max_lifetime, $background_cache, $generating_info);
+	my ($max_lifetime, $background_cache, $generating_info, $gen_info_is_safe);
 	if (%opts) {
 		$max_lifetime =
 			$opts{'max_lifetime'} ||
 			$opts{'max_cache_lifetime'};
 		$background_cache = $opts{'background_cache'};
 		$generating_info  = $opts{'generating_info'};
+		$gen_info_is_safe = $opts{'generating_info_is_safe'};
 	}
 	$max_lifetime = -1 unless defined($max_lifetime);
 	$background_cache = 1 unless defined($background_cache);
+	$gen_info_is_safe = 0 unless defined($gen_info_is_safe);
 
 	$self->set_max_lifetime($max_lifetime);
 	$self->set_background_cache($background_cache);
 	$self->set_generating_info($generating_info);
+	$self->set_generating_info_is_safe($gen_info_is_safe);
 
 	return $self;
 }
@@ -110,7 +119,8 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(max_lifetime background_cache generating_info)) {
+foreach my $i (qw(max_lifetime background_cache
+                  generating_info generating_info_is_safe)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -214,7 +224,8 @@ sub _set_maybe_background {
 		# to regenerate/refresh the cache (generate data),
 		# or if main process would show progress indicator
 		$pid = fork()
-			if (@stale_result || $self->{'generating_info'});
+			if (@stale_result ||
+			    ($self->{'generating_info'} && $self->{'generating_info_is_safe'}));
 	}
 
 	if ($pid) {
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 81f0b76..480cfbc 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -333,6 +333,7 @@ subtest 'serving stale data when (re)generating' => sub {
 
 	# with background generation
 	$cache->set_background_cache(1);
+	$cache->set_generating_info_is_safe(1);
 
 	$cache->set($key, $stale_value);
 	$cache->set_expires_in(0);    # set value is now expired
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 19/24] gitweb: Add startup delay to activity indicator for cache
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (17 preceding siblings ...)
  2010-12-06 23:11 ` [PATCH 18/24] gitweb/lib - Configure running 'generating_info' when generating data Jakub Narebski
@ 2010-12-06 23:11 ` Jakub Narebski
  2010-12-06 23:11 ` [PATCH/RFC 20/24] gitweb/lib - Add support for setting error handler in cache Jakub Narebski
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:11 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

This adds support for [optional] startup delay to git_generating_data_html()
subroutine, which is used to provide "Generating..." page as activity
indicator when waiting for page to be generated if caching.  If the data
(page contents) gets generated within $generating_options{'staryp_delay'}
seconds, the "Generating..." page won't get displayed.

This feature was created in response to complaint by Petr 'Pasky' Baudis'
about "Generating..." feature.


NOTE: This startup delay allows to use git_generating_data_html() also
for process generating data, assuming that die_error(), which turns of
capturing and which output (error page) is not cache, finishes within
startup delay.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch doesn't have equivalent in J.H. series, and is rather
extension of idea of progress indicator for (re)generating [cache]
data.  It was inspired by comment by Petr 'Pasky' Baudis on #git IRC
channel that one doesn't want to have "Generating..." page to be shown
only to immediately vanish.

The additional advantage is that it solves problem with link that
leads to error page: die_error exits / ends request without generating
cache entry, so if "Generating..." is done also for process generating
data in background (perhaps it isn't true in J.H. series; I am not
sure about code flow there), then gitweb would endlessly show
"Generating..." page if it didn't wait and exited / ended request also
for die_error case.

The only difference compared to previous version of this series is that
git_generating_data_html is now marked as 'generating_info_is_safe'.

 gitweb/gitweb.perl |   28 ++++++++++++++++++++++++++++
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 181b85d..c974e79 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -357,11 +357,22 @@ our %cache_options = (
 	# is passed the following parameters: $cache instance, human-readable
 	# $key to current page, and filehandle $lock_fh to lockfile.
 	'generating_info' => \&git_generating_data_html,
+
+	# This enables/disables using 'generating_info' subroutine by process
+	# generating data, when not too stale data is not available (data is then
+	# generated in background).  Because git_generating_data_html() includes
+	# initial delay (of 1 second by default), and we can assume that die_error
+	# finishes within this time, then generating error pages should be safe
+	# from infinite "Generating page..." loop.
+	'generating_info_is_safe' => 1,
 );
 # You define site-wide options for "Generating..." page (if enabled) here
 # (which means that $cache_options{'generating_info'} is set to coderef);
 # override them with $GITWEB_CONFIG as necessary.
 our %generating_options = (
+	# The delay before displaying "Generating..." page, in seconds.  It is
+	# intended for "Generating..." page to be shown only when really needed.
+	'startup_delay' => 1,
 	# The time between generating new piece of output to prevent from
 	# redirection before data is ready, i.e. time between printing each
 	# dot in activity indicator / progress info, in seconds.
@@ -3607,6 +3618,23 @@ sub git_generating_data_html {
 		return;
 	}
 
+	# Initial delay
+	if ($generating_options{'startup_delay'} > 0) {
+		eval {
+			local $SIG{ALRM} = sub { die "alarm clock restart\n" }; # NB: \n required
+			alarm $generating_options{'startup_delay'};
+			flock($lock_fh, LOCK_SH); # blocking readers lock
+			alarm 0;
+		};
+		if ($@) {
+			# propagate unexpected errors
+			die $@ if $@ !~ /alarm clock restart/;
+		} else {
+			# we got response within 'startup_delay' timeout
+			return;
+		}
+	}
+
 	my $title = "[Generating...] " . get_page_title();
 	# TODO: the following line of code duplicates the one
 	# in git_header_html, and it should probably be refactored.
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH/RFC 20/24] gitweb/lib - Add support for setting error handler in cache
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (18 preceding siblings ...)
  2010-12-06 23:11 ` [PATCH 19/24] gitweb: Add startup delay to activity indicator for cache Jakub Narebski
@ 2010-12-06 23:11 ` Jakub Narebski
  2010-12-06 23:11 ` [PATCH/RFC 21/24] gitweb: Wrap die_error to use as error handler for caching engine Jakub Narebski
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:11 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

GitwebCache::SimpleFileCache and GitwebCache::FileCacheWithLocking
acquired support for 'on_error'/'error_handler' parameter, which
accepts the same values as 'on_get_error' and 'on_set_error' option in
CHI, with exception of support for "log".  The default is "die" (as it
was before), though it now uses "croak" from Carp, rather than plain
"die".

Added basic test for 'on_error' in t9503: setting it to error handler
that dies, and setting it to 'ignore'.


The way error handler coderef is invoked is slightly different from
the way CHI does it: the error handler doesn't pass key and raw error
message as subsequent parameters.

Because '$self->_handle_error($msg)' is used in place of 'die $msg',
read_file and write_fh subroutines had to be converted to methods, to
have access to $self.  Alternate solution would be to catch errors
using "eval BLOCK" in ->get() and ->set() etc., like in CHI::Driver.

Note that external subroutines that croak()/die() on error (like
'mkpath' or 'tempfile') are now wrapped in eval block (this would be
not necessary if alternate solution described above was used).


While at it refactor ensuring that directory exists (for opening file
for writing, possibly creating it) into ->ensure_path($dir) method.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This and the next patch are new and were not present in previous version
of this series.  That is why this and next patch are marked as RFC.

The way CHI does it, by wrapping commands in eval { ... } (or using
Try::Tiny) in ->get, ->set (and in our case also ->compute and
->compute_fh, as those no longer are defined in term of ->get and ->set,
or ->get/->set-like operations) could be a better solution.  Some
dicsussion / thinking on it is required.


In "Gitweb caching v7" series cache.pl / cache.pm used die_error
directly, and that is why it couldn't be a proper Perl module, and why
it had to be loaded using 'do <file>' rather than 'require <package>'.

Note however that compared to "Gitweb caching v7" in this patch we
leak more, possibly sensitive, information like exact file that was
attempted to open and location in source file.  OTOH it helps
debugging errors in gitweb.


Note that we might want to treat on_set_error and on_get_error
differently, especially ENOSPC (No space left on device) on set.
This is left for (optionally) the future commit.

 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   25 ++++---
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |   88 +++++++++++++++++------
 t/t9503/test_cache_interface.pl                |   44 ++++++++++++
 3 files changed, 124 insertions(+), 33 deletions(-)

diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 0823c55..d994b3a 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -63,6 +63,14 @@ use POSIX qw(setsid);
 #  * 'increase_factor' [seconds / 100% CPU load]
 #    Factor multiplying 'check_load' result when calculating cache lietime.
 #    Defaults to 60 seconds for 100% SPU load ('check_load' returning 1.0).
+#  * 'on_error' (similar to CHI 'on_get_error'/'on_set_error')
+#    How to handle runtime errors occurring during cache gets and cache
+#    sets, which may or may not be considered fatal in your application.
+#    Options are:
+#    * "die" (the default) - call die() with an appropriate message
+#    * "warn" - call warn() with an appropriate message
+#    * "ignore" - do nothing
+#    * <coderef> - call this code reference with an appropriate message
 #
 # (all the above are inherited from GitwebCache::SimpleFileCache)
 #
@@ -155,10 +163,7 @@ sub get_lockname {
 	my $lockfile = $self->path_to_key($key, \my $dir) . '.lock';
 
 	# ensure that directory leading to lockfile exists
-	if (!-d $dir) {
-		eval { mkpath($dir, 0, 0777); 1 }
-			or die "Couldn't mkpath '$dir' for lockfile: $!";
-	}
+	$self->ensure_path($dir);
 
 	return $lockfile;
 }
@@ -174,7 +179,7 @@ sub _tempfile_to_path {
 
 	my $tempname = "$file.tmp";
 	open my $temp_fh, '>', $tempname
-		or die "Couldn't open temporary file '$tempname' for writing: $!";
+		or $self->_handle_error("Couldn't open temporary file '$tempname' for writing: $!");
 
 	return ($temp_fh, $tempname);
 }
@@ -199,10 +204,10 @@ sub _wait_for_data {
 	if ($fetch_locked) {
 		@result = $fetch_code->();
 		close $lock_fh
-			or die "Could't close lockfile '$lockfile': $!";
+			or $self->_handle_error("Could't close lockfile '$lockfile': $!");
 	} else {
 		close $lock_fh
-			or die "Could't close lockfile '$lockfile': $!";
+			or $self->_handle_error("Could't close lockfile '$lockfile': $!");
 		@result = $fetch_code->();
 	}
 
@@ -273,7 +278,7 @@ sub _compute_generic {
 	my $lock_state; # needed for loop condition
 	do {
 		open my $lock_fh, '+>', $lockfile
-			or die "Could't open lockfile '$lockfile': $!";
+			or $self->_handle_error("Could't open lockfile '$lockfile': $!");
 
 		$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
 		if ($lock_state) {
@@ -282,12 +287,12 @@ sub _compute_generic {
 
 			# closing lockfile releases writer lock
 			close $lock_fh
-				or die "Could't close lockfile '$lockfile': $!";
+				or $self->_handle_error("Could't close lockfile '$lockfile': $!");
 
 			if (!@result) {
 				# wait for background process to finish generating data
 				open $lock_fh, '<', $lockfile
-					or die "Couldn't reopen (for reading) lockfile '$lockfile': $!";
+					or $self->_handle_error("Couldn't reopen (for reading) lockfile '$lockfile': $!");
 
 				@result = $self->_wait_for_data($key, $lock_fh, $lockfile,
 				                                $fetch_code, $fetch_locked);
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 21ec434..8d0a6d9 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -20,6 +20,7 @@ package GitwebCache::SimpleFileCache;
 use strict;
 use warnings;
 
+use Carp;
 use File::Path qw(mkpath);
 use File::Temp qw(tempfile);
 use Digest::MD5 qw(md5_hex);
@@ -77,6 +78,14 @@ our $DEFAULT_NAMESPACE = '';
 #  * 'increase_factor' [seconds / 100% CPU load]
 #    Factor multiplying 'check_load' result when calculating cache lietime.
 #    Defaults to 60 seconds for 100% SPU load ('check_load' returning 1.0).
+#  * 'on_error' (similar to CHI 'on_get_error'/'on_set_error')
+#    How to handle runtime errors occurring during cache gets and cache
+#    sets, which may or may not be considered fatal in your application.
+#    Options are:
+#    * "die" (the default) - call die() with an appropriate message
+#    * "warn" - call warn() with an appropriate message
+#    * "ignore" - do nothing
+#    * <coderef> - call this code reference with an appropriate message
 sub new {
 	my $class = shift;
 	my %opts = ref $_[0] ? %{ $_[0] } : @_;
@@ -86,6 +95,7 @@ sub new {
 
 	my ($root, $depth, $ns);
 	my ($expires_min, $expires_max, $increase_factor, $check_load);
+	my ($on_error);
 	if (%opts) {
 		$root =
 			$opts{'cache_root'} ||
@@ -102,6 +112,11 @@ sub new {
 			$opts{'expires_max'};
 		$increase_factor = $opts{'expires_factor'};
 		$check_load      = $opts{'check_load'};
+		$on_error =
+			$opts{'on_error'} ||
+			$opts{'on_get_error'} ||
+			$opts{'on_set_error'} ||
+			$opts{'error_handler'};
 	}
 	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
 	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
@@ -111,6 +126,9 @@ sub new {
 		if (!defined($expires_max) && exists $opts{'expires_in'});
 	$expires_max = -1 unless (defined($expires_max));
 	$increase_factor = 60 unless defined($increase_factor);
+	$on_error = "die"
+		unless (defined $on_error &&
+		        (ref($on_error) eq 'CODE' || $on_error =~ /^die|warn|ignore$/));
 
 	$self->set_root($root);
 	$self->set_depth($depth);
@@ -119,6 +137,7 @@ sub new {
 	$self->set_expires_max($expires_max);
 	$self->set_increase_factor($increase_factor);
 	$self->set_check_load($check_load);
+	$self->set_on_error($on_error);
 
 	return $self;
 }
@@ -131,7 +150,8 @@ sub new {
 
 # creates get_depth() and set_depth($depth) etc. methods
 foreach my $i (qw(depth root namespace
-                  expires_min expires_max increase_factor check_load)) {
+                  expires_min expires_max increase_factor check_load
+                  on_error)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -234,7 +254,7 @@ sub path_to_key {
 }
 
 sub read_file {
-	my $filename = shift;
+	my ($self, $filename) = @_;
 
 	# Fast slurp, adapted from File::Slurp::read, with unnecessary options removed
 	# via CHI::Driver::File (from CHI-0.33)
@@ -255,12 +275,12 @@ sub read_file {
 	}
 
 	close $read_fh
-		or die "Couldn't close file '$filename' opened for reading: $!";
+		or $self->_handle_error("Couldn't close file '$filename' opened for reading: $!");
 	return $buf;
 }
 
 sub write_fh {
-	my ($write_fh, $filename, $data) = @_;
+	my ($self, $write_fh, $filename, $data) = @_;
 
 	# Fast spew, adapted from File::Slurp::write, with unnecessary options removed
 	# via CHI::Driver::File (from CHI-0.33)
@@ -278,7 +298,20 @@ sub write_fh {
 	}
 
 	close $write_fh
-		or die "Couldn't close file '$filename' opened for writing: $!";
+		or $self->_handle_error("Couldn't close file '$filename' opened for writing: $!");
+}
+
+sub ensure_path {
+	my $self = shift;
+	my $dir = shift || return;
+
+	if (!-d $dir) {
+		# mkpath will croak()/die() if there is an error
+		eval {
+			mkpath($dir, 0, 0777);
+			1;
+		} or $self->_handle_error($@);
+	}
 }
 
 # ----------------------------------------------------------------------
@@ -291,12 +324,27 @@ sub _tempfile_to_path {
 	my ($self, $file, $dir) = @_;
 
 	# tempfile will croak() if there is an error
-	return tempfile("${file}_XXXXX",
-		#DIR => $dir,
-		'UNLINK' => 0, # ensure that we don't unlink on close; file is renamed
-		'SUFFIX' => '.tmp');
+	my ($temp_fh, $tempname);
+	eval {
+		($temp_fh, $tempname) = tempfile("${file}_XXXXX",
+			#DIR => $dir,
+			'UNLINK' => 0, # ensure that we don't unlink on close; file is renamed
+			'SUFFIX' => '.tmp');
+	} or $self->_handle_error($@);
+	return ($temp_fh, $tempname);
 }
 
+# based on _handle_get_error and _dispatch_error_msg from CHI::Driver
+sub _handle_error {
+	my ($self, $error) = @_;
+
+	for ($self->get_on_error()) {
+		(ref($_) eq 'CODE') && do { $_->($error) };
+		/^ignore$/ && do { };
+		/^warn$/   && do { carp $error };
+		/^die$/    && do { croak $error };
+	}
+}
 
 # ----------------------------------------------------------------------
 # worker methods
@@ -307,7 +355,7 @@ sub fetch {
 	my $file = $self->path_to_key($key);
 	return unless (defined $file && -f $file);
 
-	return read_file($file);
+	return $self->read_file($file);
 }
 
 sub store {
@@ -318,20 +366,17 @@ sub store {
 	return unless (defined $file && defined $dir);
 
 	# ensure that directory leading to cache file exists
-	if (!-d $dir) {
-		# mkpath will croak()/die() if there is an error
-		mkpath($dir, 0, 0777);
-	}
+	$self->ensure_path($dir);
 
 	# generate a temporary file
 	my ($temp_fh, $tempname) = $self->_tempfile_to_path($file, $dir);
 	chmod 0666, $tempname
 		or warn "Couldn't change permissions to 0666 / -rw-rw-rw- for '$tempname': $!";
 
-	write_fh($temp_fh, $tempname, $data);
+	$self->write_fh($temp_fh, $tempname, $data);
 
 	rename($tempname, $file)
-		or die "Couldn't rename temporary file '$tempname' to '$file': $!";
+		or $self->_handle_error("Couldn't rename temporary file '$tempname' to '$file': $!");
 }
 
 # get size of an element associated with the $key (not the size of whole cache)
@@ -362,7 +407,7 @@ sub remove {
 		or return;
 	return unless -f $file;
 	unlink($file)
-		or die "Couldn't remove file '$file': $!";
+		or $self->_handle_error("Couldn't remove file '$file': $!");
 }
 
 # $cache->is_valid($key[, $expires_in])
@@ -379,7 +424,7 @@ sub is_valid {
 	return 0 unless -f $path;
 	# get its modification time
 	my $mtime = (stat(_))[9] # _ to reuse stat structure used in -f test
-		or die "Couldn't stat file '$path': $!";
+		or $self->_handle_error("Couldn't stat file '$path': $!");
 	# cache entry is invalid if it is size 0 (in bytes)
 	return 0 unless ((stat(_))[7] > 0);
 
@@ -468,10 +513,7 @@ sub set_coderef_fh {
 	return unless (defined $path && defined $dir);
 
 	# ensure that directory leading to cache file exists
-	if (!-d $dir) {
-		# mkpath will croak()/die() if there is an error
-		mkpath($dir, 0, 0777);
-	}
+	$self->ensure_path($dir);
 
 	# generate a temporary file
 	my ($fh, $tempfile) = $self->_tempfile_to_path($path, $dir);
@@ -481,7 +523,7 @@ sub set_coderef_fh {
 
 	close $fh;
 	rename($tempfile, $path)
-		or die "Couldn't rename temporary file '$tempfile' to '$path': $!";
+		or $self->_handle_error("Couldn't rename temporary file '$tempfile' to '$path': $!");
 
 	open $fh, '<', $path or return;
 	return ($fh, $path);
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 480cfbc..28a5c5e 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -9,6 +9,7 @@ use Fcntl qw(:DEFAULT);
 use IO::Handle;
 use IO::Select;
 use IO::Pipe;
+use File::Basename;
 
 use Test::More;
 
@@ -475,6 +476,49 @@ subtest 'generating progress info' => sub {
 $cache->set_expires_in(-1);
 
 
+# ----------------------------------------------------------------------
+# ERROR HANDLING
+
+# Test 'on_error' handler
+#
+sub test_handler {
+	die "test_handler\n"; # newline needed
+}
+SKIP: {
+	# prepare error condition
+	my $is_prepared = 1;
+	$is_prepared &&= $cache->set($key, $value);
+	$is_prepared &&= chmod 0555, dirname($cache->path_to_key($key));
+
+	my $ntests = 1; # in subtest
+	skip "could't prepare error condition for 'on_error' tests", $ntests
+		unless $is_prepared;
+	skip "cannot test reliably 'on_error' as root (id=$>)", $ntests
+		unless $> != 0;
+
+	subtest 'error handler' => sub {
+		my ($result, $error);
+
+		# check that error handler works
+		$cache->set_on_error(\&test_handler);
+		$result = eval {
+			$cache->remove($key);
+		} or $error = $@;
+		ok(!defined $result,         'on_error: died on error (via handler)');
+		diag("result is $result") if defined $result;
+		is($error, "test_handler\n", 'on_error: test_handler was used');
+
+		# check that "ignore" works
+		$cache->set_on_error('ignore');
+		$result = eval {
+			$cache->remove($key);
+		} or $error = $@;
+		ok(defined $result,          'on_error: error ignored if requested');
+	};
+}
+chmod 0777, dirname($cache->path_to_key($key));
+
+
 done_testing();
 
 
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH/RFC 21/24] gitweb: Wrap die_error to use as error handler for caching engine
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (19 preceding siblings ...)
  2010-12-06 23:11 ` [PATCH/RFC 20/24] gitweb/lib - Add support for setting error handler in cache Jakub Narebski
@ 2010-12-06 23:11 ` Jakub Narebski
  2010-12-06 23:11 ` [PATCH/RFC 22/24] gitweb: Support legacy options used by kernel.org " Jakub Narebski
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:11 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

Use cache_error_handler subroutine, wrapping die_error (and
HTML-escaping error message), as 'on_error' handler for
GitwebCache::SimpleFileCache and its derivatives, and as both
'on_get_error' and 'on_set_error' handler for CHI based caching
engine.

Added single test in t9501 that checks if error in caching layer
produces "500 Internal Server Error".

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch and previous one this depends on were not present in previous
version of this series.  That is why this patch was marked as an
RFC... well that and we also need to decide how error in caching layer
should be shown.

The error message is slightly more fancy than what "Gitweb caching v7"
series by J.H. would show on error in the caching layer.

 gitweb/gitweb.perl                       |   27 +++++++++++++++++++++++++++
 t/t9501-gitweb-standalone-http-status.sh |    8 ++++++++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c974e79..5904d27 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -365,6 +365,20 @@ our %cache_options = (
 	# finishes within this time, then generating error pages should be safe
 	# from infinite "Generating page..." loop.
 	'generating_info_is_safe' => 1,
+
+	# How to handle runtime errors occurring during cache gets and cache
+	# sets.  Options are:
+	#  * "die" (the default) - call die() with an appropriate message
+	#  * "warn" - call warn() with an appropriate message
+	#  * "ignore" - do nothing
+	#  * <coderef> - call this code reference with an appropriate message
+	# Note that gitweb catches 'die <message>' via custom handle_errors_html
+	# handler, set via set_message() from CGI::Carp.  'warn <message>' are
+	# written to web server logs.
+	#
+	# The default is to use cache_error_handler, which wraps die_error.
+	# Only first argument passed to cache_error_handler is used (c.f. CHI)
+	'on_error' => \&cache_error_handler,
 );
 # You define site-wide options for "Generating..." page (if enabled) here
 # (which means that $cache_options{'generating_info'} is set to coderef);
@@ -1186,6 +1200,17 @@ sub configure_gitweb_features {
 	}
 }
 
+# custom error handler for caching engine (Internal Server Error)
+sub cache_error_handler {
+	my $error = shift;
+
+	$error = to_utf8($error);
+	$error =
+		"Error in caching layer: <i>".ref($cache)."</i><br>\n".
+		CGI::escapeHTML($error);
+	# die_error() would exit
+	die_error(undef, undef, $error);
+}
 # custom error handler: 'die <message>' is Internal Server Error
 sub handle_errors_html {
 	my $msg = shift; # it is already HTML escaped
@@ -1348,6 +1373,8 @@ sub configure_caching {
 			# (CHI compatibile initialization)
 			'root_dir' => $cache_options{'cache_root'},
 			'depth' => $cache_options{'cache_depth'},
+			'on_get_error' => $cache_options{'on_error'},
+			'on_set_error' => $cache_options{'on_error'},
 		});
 	}
 	unless (defined $capture && ref($capture)) {
diff --git a/t/t9501-gitweb-standalone-http-status.sh b/t/t9501-gitweb-standalone-http-status.sh
index 168e494..af89422 100755
--- a/t/t9501-gitweb-standalone-http-status.sh
+++ b/t/t9501-gitweb-standalone-http-status.sh
@@ -147,5 +147,13 @@ test_expect_success 'caching enabled (non-existent commit, 404 error)' '
 test_debug 'echo "headers" && cat gitweb.headers'
 test_debug 'echo "body"    && cat gitweb.body'
 
+test_expect_success 'caching errors are 500 Internal Server Error' '
+	chmod 0000 cache/ &&
+	test_when_finished "chmod 0777 cache/" &&
+	gitweb_run "p=.git" &&
+	grep "Status: 500 Internal Server Error" gitweb.headers &&
+	grep "500 - Internal Server Error" gitweb.body
+'
+test_debug 'echo "headers" && cat gitweb.headers'
 
 test_done
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH/RFC 22/24] gitweb: Support legacy options used by kernel.org caching engine
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (20 preceding siblings ...)
  2010-12-06 23:11 ` [PATCH/RFC 21/24] gitweb: Wrap die_error to use as error handler for caching engine Jakub Narebski
@ 2010-12-06 23:11 ` Jakub Narebski
  2010-12-06 23:11 ` [RFC/PATCH 23/24] gitweb/lib - Add clear() and size() methods to caching interface Jakub Narebski
  2010-12-06 23:11 ` [RFC PATCH 24/24] gitweb: Add beginnings of cache administration page (proof of concept) Jakub Narebski
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:11 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

Try to translate between config variables used by gitweb caching
patches[1] by John 'Warthog9' Hawley (J.H.) used, among others, by
git.kernel.org, and new %cache_options options, but only if the legacy
config variables were set in the config file.

Note that $cache_enable is *not* translated to $caching_enabled.

Footnotes:
~~~~~~~~~~
[1] See for example "Gitweb caching v7" thread on git mailing list:
    http://thread.gmane.org/gmane.comp.version-control.git/160147

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch was not present in previous version of this series.

The drawback of this patch is that if any legacy config variable is set
in config file, then any change to %cache_options covering the same area
as legacy config variable would be ignored.  I don't see a better
solution, unless we can somehow check if value of %cache_options was
changed via gitweb config file.

I wonder how useful this patch would be; do people using caching from
"Gitweb caching v7" (caching feom git.kernel.org) configure it, or do
they use default configuration?

 gitweb/gitweb.perl |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 5904d27..1521bf2 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -284,6 +284,9 @@ our $caching_enabled = 0;
 # Suggested mechanism:
 # mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush
 our $cache;
+
+# Legacy options configuring behavior of git.kernel.org caching
+our ($minCacheTime, $maxCacheTime, $cachedir, $backgroundCache, $maxCacheLife);
 # You define site-wide cache options defaults here; override them with
 # $GITWEB_CONFIG as necessary.
 our %cache_options = (
@@ -1363,6 +1366,19 @@ sub configure_caching {
 		$cache ||= 'GitwebCache::FileCacheWithLocking';
 		eval "require $cache";
 		die $@ if $@;
+
+		# support for legacy config variables configuring cache behavior
+		# (those variables are/were used by caching engine by John Hawley,
+		# used among others by custom gitweb at http://git.kernel.org);
+		# it assumes that if those variables are defined, then we should
+		# use them - no provision is made for having both legacy variables
+		# and new %cache_options set in config file(s).
+		$cache_options{'cache_root'} = $cachedir if defined $cachedir;
+		$cache_options{'expires_min'} = $minCacheTime if defined $minCacheTime;
+		$cache_options{'expires_max'} = $maxCacheTime if defined $maxCacheTime;
+		$cache_options{'background_cache'} = $backgroundCache if defined $backgroundCache;
+		$cache_options{'max_lifetime'} = $maxCacheLife if defined $maxCacheLife;
+
 		$cache = $cache->new({
 			%cache_options,
 			#'cache_root' => '/tmp/cache',
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC/PATCH 23/24] gitweb/lib - Add clear() and size() methods to caching interface
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (21 preceding siblings ...)
  2010-12-06 23:11 ` [PATCH/RFC 22/24] gitweb: Support legacy options used by kernel.org " Jakub Narebski
@ 2010-12-06 23:11 ` Jakub Narebski
  2010-12-06 23:11 ` [RFC PATCH 24/24] gitweb: Add beginnings of cache administration page (proof of concept) Jakub Narebski
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:11 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

Add ->size() method, which following Cache::Cache interface returns
estimated total size of all entries in whole cache (in the namsepace
assiciated with give cache instance).  Note that ->get_size($key)
returns size of a single entry!

Add ->clear() method, which removes all entries from the namespace
associated with given cache instance.  For safety it requires
namespace to be set to true value, which means that it cannot be
empty; therefore default namespace is changed to 'gitweb'.

The ->clear() method should be fairly safe, because it first renames
directory (which should be atomic), and only then removes it
(following code from CGI::Driver::File).

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch is unchanged from previous version of this series.

This patch (implementing "backend") together with next patch
(implementing "interface") are meant to provide nice web interface to
safely cleaning (pruning) cache, as described in comments and
documentation (how to do this manually).

This is less an RFC than next patch, but without next patch it is of
very limited use.

 gitweb/lib/GitwebCache/SimpleFileCache.pm |   48 ++++++++++++++++++++++++++--
 t/t9503/test_cache_interface.pl           |    1 -
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 8d0a6d9..cd489f8 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -21,8 +21,9 @@ use strict;
 use warnings;
 
 use Carp;
-use File::Path qw(mkpath);
-use File::Temp qw(tempfile);
+use File::Path qw(mkpath rmtree);
+use File::Temp qw(tempfile mktemp);
+use File::Find qw(find);
 use Digest::MD5 qw(md5_hex);
 
 # by default, the cache nests all entries on the filesystem single
@@ -38,7 +39,7 @@ our $DEFAULT_CACHE_ROOT = "cache";
 # by default we don't use cache namespace (empty namespace);
 # empty namespace does not allow for simple implementation of clear() method.
 #
-our $DEFAULT_NAMESPACE = '';
+our $DEFAULT_NAMESPACE = "gitweb";
 
 # ......................................................................
 # constructor
@@ -393,7 +394,7 @@ sub get_size {
 
 
 # ......................................................................
-# interface methods
+# interface methods dealing with single item
 
 # Removing and expiring
 
@@ -547,6 +548,45 @@ sub compute_fh {
 	return ($fh, $filename);
 }
 
+# ......................................................................
+# interface methods dealing with whole namespace
+
+# $cache->clear();
+#
+# Remove all entries from the namespace.
+# Namespace must be defined and not empty.
+sub clear {
+	my $self = shift;
+
+	return unless $self->get_namespace();
+
+	my $namespace_dir = $self->path_to_namespace();
+	return if !-d $namespace_dir;
+
+	my $renamed_dir = mktemp($namespace_dir . '.XXXX');
+	rename($namespace_dir, $renamed_dir);
+	rmtree($renamed_dir);
+	die "Couldn't remove '$renamed_dir' directory"
+		if -d $renamed_dir;
+}
+
+# $size = $cache->size();
+#
+# Size of whole names (or whole cache if namespace empty)
+sub size {
+	my $self = shift;
+
+	my $namespace_dir = $self->path_to_namespace();
+	return if !-d $namespace_dir;
+
+	my $total_size = 0;
+	my $add_size = sub { $total_size += -s $File::Find::name };
+
+	File::Find::find({ wanted => $add_size, no_chdir => 1 }, $namespace_dir);
+
+	return $total_size;
+}
+
 1;
 __END__
 # end of package GitwebCache::SimpleFileCache;
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 28a5c5e..031c895 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -46,7 +46,6 @@ SKIP: {
 		unless ($GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT &&
 		        $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH);
 
-	is($cache->get_namespace(), '', "default namespace is ''");
 	cmp_ok($cache->get_root(),  'eq', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
 		"default cache root is '$GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT'");
 	cmp_ok($cache->get_depth(), '==', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 24/24] gitweb: Add beginnings of cache administration page (proof of concept)
  2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
                   ` (22 preceding siblings ...)
  2010-12-06 23:11 ` [RFC/PATCH 23/24] gitweb/lib - Add clear() and size() methods to caching interface Jakub Narebski
@ 2010-12-06 23:11 ` Jakub Narebski
  23 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2010-12-06 23:11 UTC (permalink / raw
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Junio C Hamano, demerphq, Aevar Arnfjord Bjarmason, Thomas Adam,
	Jakub Narebski

Currently cache administration page ('cache' action) shows estimated
size of cache, and provides link to clearing cache.

Cache administration page is visible only on local computer; the same
is true with respect to ability to clear cache.  Those are bare
beginnings of autorization framework.

If you can use cache administration page, you will see 'admin' link at
the botom of the page to it.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is "frontend" (interface) patch for gitweb cache management.

This is very much work in progress, sent to git mailing list as a
proof of concept.  Would something like that (perhaps with more
advanced authorization for admin) be useful?

The 'cache' page looks like this:

  Cache location      Size	 
  ------------------+---------+--------------
  cache/gitweb        14 KiB	[Clear cache]
  ------------------+---------+--------------

where '[Clear cache]' is a submit button.


Compared to previous series there is now 'admin' link that leads to
cache administration page, which is present only if you have
permissions to access said page.

Also for empty cache we show '-'.  

Code was made slightly safer, though still I wouldn't use it in
production, as there might be a chance that remote attacker could
possibly clear cache (meaning I didn't prove that it couldn't happen).


In the final version this patch should probably be split into a few
smaller commits, for example:
* adding support for $opts{'-title'} to git_header_html
* action_is_cacheable subroutine
* human_readable_size (which could also be used in 'tree' view at request)
* xxx_admin_auth_ok, i.e. authorization and authentication for admin and
  write actions in gitweb
* 'cache' (i.e. cache admin) and 'clear_cache' (worker) methods

 gitweb/gitweb.perl |  138 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 134 insertions(+), 4 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 1521bf2..66e240f 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -25,6 +25,8 @@ use Encode;
 use Fcntl qw(:mode :flock);
 use File::Find qw();
 use File::Basename qw(basename);
+use POSIX; # for POSIX::ceil($x)
+
 binmode STDOUT, ':utf8';
 
 our $t0;
@@ -888,6 +890,10 @@ sub evaluate_actions_info {
 	map { $actions_info{$_}{'output_format'} = undef }
 		qw(blob_plain object);
 	$actions_info{'snapshot'}{'output_format'} = 'binary';
+
+	# specify uncacheable actions
+	map { $actions_info{$_}{'uncacheable'} = 1 }
+		qw(cache clear_cache);
 }
 
 sub action_outputs_html {
@@ -895,6 +901,11 @@ sub action_outputs_html {
 	return $actions_info{$action}{'output_format'} eq 'html';
 }
 
+sub action_is_cacheable {
+	my $action = shift;
+	return !$actions_info{$action}{'uncacheable'};
+}
+
 sub browser_is_robot {
 	return 1 if !exists $ENV{'HTTP_USER_AGENT'}; # gitweb run as script
 	if (eval { require HTTP::BrowserDetect; }) {
@@ -1245,12 +1256,13 @@ sub dispatch {
 	if (!defined($actions{$action})) {
 		die_error(400, "Unknown action");
 	}
-	if ($action !~ m/^(?:opml|project_list|project_index)$/ &&
+	if ($action !~ m/^(?:opml|project_list|project_index|cache|clear_cache)$/ &&
 	    !$project) {
 		die_error(400, "Project needed");
 	}
 
-	if ($caching_enabled) {
+	if ($caching_enabled &&
+	    action_is_cacheable($action)) {
 		# human readable key identifying gitweb output
 		my $output_key = href(-replay => 1, -full => 1, -path_info => 0);
 
@@ -1397,6 +1409,10 @@ sub configure_caching {
 		require GitwebCache::Capture::Simple;
 		$capture = GitwebCache::Capture::Simple->new();
 	}
+
+	# some actions are available only if cache is turned on
+	$actions{'cache'} = \&git_cache_admin;
+	$actions{'clear_cache'} = \&git_cache_clear;
 }
 
 run();
@@ -3760,7 +3776,7 @@ sub git_header_html {
 	my $expires = shift;
 	my %opts = @_;
 
-	my $title = get_page_title();
+	my $title = $opts{'-title'} || get_page_title();
 	my $content_type;
 	# require explicit support from the UA if we are to send the page as
 	# 'application/xhtml+xml', otherwise send it as plain old 'text/html'.
@@ -3936,10 +3952,18 @@ sub git_footer_html {
 
 	} else {
 		print $cgi->a({-href => href(project=>undef, action=>"opml"),
-		              -class => $feed_class}, "OPML") . " ";
+		              -class => $feed_class}, "OPML") . "\n";
 		print $cgi->a({-href => href(project=>undef, action=>"project_index"),
 		              -class => $feed_class}, "TXT") . "\n";
+
 	}
+
+	if ($actions{'cache'} &&
+	    cache_admin_auth_ok()) {
+		print $cgi->a({-href => href(project=>undef, action=>"cache"),
+		              -class => $feed_class}, "<i>admin</i>") . "\n";
+	}
+
 	print "</div>\n"; # class="page_footer"
 
 	# timing info doesn't make much sense with output (response) caching,
@@ -7412,3 +7436,109 @@ XML
 </opml>
 XML
 }
+
+# see Number::Bytes::Human
+sub human_readable_size {
+	my $bytes = shift || return;
+
+	my @units = ('', 'KiB', 'MiB', 'GiB', 'TiB');
+	my $block = 1024;
+
+	my $x = $bytes;
+	my $unit;
+	foreach (@units) {
+		$unit = $_, last if POSIX::ceil($x) < $block;
+		$x /= $block;
+	}
+
+	my $num;
+	if ($x < 10.0) {
+		$num = sprintf("%.1f", POSIX::ceil($x*10)/10); 
+	} else {
+		$num = sprintf("%d", POSIX::ceil($x));
+	}
+
+	return "$num $unit";
+}
+
+sub cache_admin_auth_ok {
+	if (defined $ENV{'REMOTE_ADDR'}) {
+		if (defined $ENV{'SERVER_ADDR'}) {
+			# SERVER_ADDR is not in RFC 3875
+			return $ENV{'SERVER_ADDR'} eq $ENV{'REMOTE_ADDR'};
+		} elsif ($ENV{'REMOTE_ADDR'} =~ m!^(?:127\.0\.0\.1|::1/128)$!) {
+			# localhost in IPv4 or IPv6
+			return 1;
+		}
+	} else {
+		# REMOTE_ADDR not defined, probably calling gitweb as script
+		return 1;
+	}
+
+	# restrict all but specified cases
+	return 0;
+}
+
+sub git_cache_admin {
+	$caching_enabled
+		or die_error(403, "Caching disabled");
+	cache_admin_auth_ok()
+		or die_error(403, "Cache administration not allowed");
+	$cache && ref($cache)
+		or die_error(500, "Cache is not present");
+
+	git_header_html(undef, undef,
+		-title => to_utf8($site_name) . " - Gitweb cache");
+
+	print <<'EOF_HTML';
+<table class="cache_admin">
+<tr><th>Cache location</th><th>Size</th><th>&nbsp;</th></tr>
+EOF_HTML
+	print '<tr class="light">' .
+	      '<td class="path">' . esc_path($cache->path_to_namespace()) . '</td>' .
+	      '<td>';
+	my $size;
+	if ($cache->can('size')) {
+		$size = $cache->size();
+	} elsif ($cache->can('get_size')) {
+		$size = $cache->get_size();
+	}
+	if (defined $size) {
+		print human_readable_size($size);
+	} else {
+		print '-';
+	}
+	print '</td><td>';
+	if ($cache->can('clear')) {
+		print $cgi->start_form({-method => "POST",
+		                        -action => $my_uri,
+		                        -enctype => CGI::URL_ENCODED}) .
+		      $cgi->input({-name=>"a", -value=>"clear_cache", -type=>"hidden"}) .
+		      $cgi->submit({-label => 'Clear cache'}) .
+		      $cgi->end_form();
+	}
+	print <<'EOF_HTML';
+</td></tr>
+</table>
+EOF_HTML
+
+	git_footer_html();
+}
+
+sub git_cache_clear {
+	$caching_enabled
+		or die_error(403, "Caching disabled");
+	cache_admin_auth_ok()
+		or die_error(403, "Clearing cache not allowed");
+	$cache && ref($cache)
+		or die_error(500, "Cache is not present");
+
+	if ($cgi->request_method() eq 'POST') {
+
+		$cache->clear();
+	}
+
+	#print "cleared";
+	print $cgi->redirect(-uri => href(action=>'cache', -full=>1),
+	                     -status => '303 See Other');
+}
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2010-12-06 23:20 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
2010-12-06 23:10 ` [PATCH 01/24] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external Jakub Narebski
2010-12-06 23:10 ` [PATCH 02/24] gitweb: Prepare for splitting gitweb Jakub Narebski
2010-12-06 23:10 ` [PATCH 03/24] gitweb/lib - Very simple file based cache Jakub Narebski
2010-12-06 23:10 ` [PATCH 04/24] gitweb/lib - Stat-based cache expiration Jakub Narebski
2010-12-06 23:10 ` [PATCH 05/24] gitweb/lib - Regenerate entry if the cache file has size of 0 Jakub Narebski
2010-12-06 23:10 ` [PATCH 06/24] gitweb/lib - Simple output capture by redirecting STDOUT Jakub Narebski
2010-12-06 23:10 ` [PATCH 07/24] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
2010-12-06 23:10 ` [PATCH 08/24] gitweb: Add optional output caching Jakub Narebski
2010-12-06 23:10 ` [PATCH 09/24] gitweb/lib - Adaptive cache expiration time Jakub Narebski
2010-12-06 23:10 ` [PATCH 10/24] gitweb/lib - Use CHI compatibile (compute method) caching interface Jakub Narebski
2010-12-06 23:10 ` [PATCH 11/24] gitweb/lib - capture output directly to cache entry file Jakub Narebski
2010-12-06 23:10 ` [PATCH 12/24] gitweb/lib - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
2010-12-06 23:10 ` [PATCH 13/24] gitweb/lib - No need for File::Temp when locking Jakub Narebski
2010-12-06 23:10 ` [PATCH 14/24] gitweb/lib - Serve stale data when waiting for filling cache Jakub Narebski
2010-12-06 23:11 ` [PATCH 15/24] gitweb/lib - Regenerate (refresh) cache in background Jakub Narebski
2010-12-06 23:11 ` [PATCH 16/24] gitweb: Introduce %actions_info, gathering information about actions Jakub Narebski
2010-12-06 23:11 ` [PATCH 17/24] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
2010-12-06 23:11 ` [PATCH 18/24] gitweb/lib - Configure running 'generating_info' when generating data Jakub Narebski
2010-12-06 23:11 ` [PATCH 19/24] gitweb: Add startup delay to activity indicator for cache Jakub Narebski
2010-12-06 23:11 ` [PATCH/RFC 20/24] gitweb/lib - Add support for setting error handler in cache Jakub Narebski
2010-12-06 23:11 ` [PATCH/RFC 21/24] gitweb: Wrap die_error to use as error handler for caching engine Jakub Narebski
2010-12-06 23:11 ` [PATCH/RFC 22/24] gitweb: Support legacy options used by kernel.org " Jakub Narebski
2010-12-06 23:11 ` [RFC/PATCH 23/24] gitweb/lib - Add clear() and size() methods to caching interface Jakub Narebski
2010-12-06 23:11 ` [RFC PATCH 24/24] gitweb: Add beginnings of cache administration page (proof of concept) Jakub Narebski

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).