git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCHv5 00/17] gitweb: Simple file based output caching
@ 2010-10-06 22:01 Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 01/17] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external Jakub Narebski
                   ` (22 more replies)
  0 siblings, 23 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

This 17+ patches long patch series is intended as a replacement for
large 'gitweb: File based caching layer (from git.kernel.org)'
mega-patch by John 'Warthog9' Hawley aka J.H., by starting small and
adding features piece by piece.

This is fourthfifth version (5th release) of this series, and is
available on http://repo.or.cz/w/git/jnareb-git.git as
'gitweb/cache-kernel-v5' branch.  Earlier versions are available there
as branches 'gitweb/cache-kernel', 'gitweb/cache-kernel-v2',
'gitweb/cache-kernel-v3' and 'gitweb/cache-kernel-v4'.  Previous version
of this series was sent to git mailing list as:

* [RFC PATCHv4 00/17] gitweb: Simple file based output caching
  Message-Id: <1276531710-22945-1-git-send-email-jnareb@gmail.com>
  http://thread.gmane.org/gmane.comp.version-control.git/149101

This version is based on top of aad24ad (gitweb: Move call to
evaluate_git_version after evaluate_gitweb_config, 2010-10-05) in
'gitweb/web' branch of http://repo.or.cz/w/git/jnareb-git.git
repository.

The 'gitweb/web' branch consist of the following commits on top of
1e63341 (Merge branch 'maint', 2010-09-30) in 'master' branch of main
git repository:
  gitweb/Makefile: Include gitweb/config.mak
  t/gitweb-lib.sh: Add support for GITWEB_TEST_INSTALLED
  gitweb/Makefile: Add 'test' and 'test-installed' targets
  gitweb: Move call to evaluate_git_version after evaluate_gitweb_config

The first three patches were send as 3-part series:
* [PATCHv2 0/3] Testing installed gitweb
  Message-Id: <1285506146-8009-1-git-send-email-jnareb@gmail.com>
  http://thread.gmane.org/gmane.comp.version-control.git/157222

Those are required to be able to test installed version of gitweb and
modules, to check if there are any problems with process of installing
gitweb.

The last patch in 'gitweb/web' branch was send as:
* [PATCH] gitweb: Move call to evaluate_git_version after evaluate_gitweb_config 
  Message-ID: <20100926113431.28918.53550.stgit@localhost.localdomain>
  http://article.gmane.org/gmane.comp.version-control.git/157220

This was needed during development of this series, to avoid spurious
output of gitweb tests when using `--debug' option.


The MAIN DIFFERENCE from previous version is that it is based on never
version of gitweb; in particular putting all code in subroutines: gitweb
calls run() subroutine on startup, and run_request() subroutine on each
request (per connection) from the 'jn/gitweb-fastcgi' branch.

The patches
  gitweb: Return or exit after done serving request
  gitweb: Fix typo in hash key name in %opts in git_header_html

present at beginning of previous (fourth) version of this series are
already merged in.

The patches
  gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/)
  gitweb/lib - Alternate ways of capturing output

which were interleaved with other patches in previous version of this
series got moved to the end of series, and marked proof of concept
("PoC").


The following changes are available in the git repository at:

  git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel-v5


All comments welcome!

J.H., could you comment on how this series relates to the gitweb code
*currently* running on git.kernel.org wrt. code?  If possible, could you
try to compare performance of those two implementations, the one
presented here in this series, and the one used by git.kernel.org.

Pasky or other repo.or.cz admins, could you tell us if and what kind of
caching gitweb used by repo.or.cz uses, and how it compares (wrt code
and if possible also performance) to solution implemented in this
series?


TODO list and areas of possible improvements would be send in separate
email.


Jakub Narebski (21):
  t/test-lib.sh: Export also GIT_BUILD_DIR in test_external
  gitweb: Prepare for splitting gitweb
  gitweb/lib - Very simple file based cache
  gitweb/lib - Stat-based cache expiration
  gitweb/lib - Regenerate entry if the cache file has size of 0
  gitweb/lib - Simple select(FH) based output capture
  gitweb/lib - Cache captured output (using get/set)
  gitweb: Add optional output caching (from gitweb/cache.pm)
  gitweb/lib - Adaptive cache expiration time
  gitweb/lib - Use CHI compatibile (compute method) caching interface
  gitweb/lib - Use locking to avoid 'cache miss stampede' problem
  gitweb/lib - No need for File::Temp when locking
  gitweb/lib - Serve stale data when waiting for filling cache
  gitweb/lib - Regenerate (refresh) cache in background
  gitweb: Introduce %actions_info, gathering information about actions
  gitweb: Show appropriate "Generating..." page when regenerating cache
  gitweb: Add startup delay to activity indicator for cache
  gitweb/lib - Add clear() and size() methods to caching interface
  gitweb: Add beginnings of cache administration page
  gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/)
  gitweb/lib - Alternate ways of capturing output

 gitweb/Makefile                                |   24 ++-
 gitweb/README                                  |   61 +++
 gitweb/gitweb.perl                             |  459 ++++++++++++++++++++++--
 gitweb/lib/GitwebCache/CacheOutput.pm          |   95 +++++
 gitweb/lib/GitwebCache/Capture.pm              |   66 ++++
 gitweb/lib/GitwebCache/Capture/PerlIO.pm       |   79 ++++
 gitweb/lib/GitwebCache/Capture/SelectFH.pm     |   82 ++++
 gitweb/lib/GitwebCache/Capture/TiedCapture.pm  |   76 ++++
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |  280 ++++++++++++++
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |  472 ++++++++++++++++++++++++
 gitweb/lib/Tie/Restore.pm                      |   24 ++
 gitweb/lib/TiedCapture/PerlIO.pm               |   56 +++
 gitweb/lib/TiedCapture/String.pm               |   53 +++
 t/t9500-gitweb-standalone-no-errors.sh         |   20 +
 t/t9503-gitweb-caching-interface.sh            |   34 ++
 t/t9503/benchmark_caching_interface.pl         |  209 +++++++++++
 t/t9503/test_cache_interface.pl                |  407 ++++++++++++++++++++
 t/t9504-gitweb-capture-interface.sh            |   34 ++
 t/t9504/benchmark_capture_implementations.pl   |  226 +++++++++++
 t/t9504/test_capture_implementations.pl        |   85 +++++
 t/t9504/test_capture_interface.pl              |   76 ++++
 t/t9505-gitweb-cache.sh                        |   34 ++
 t/t9505/test_cache_output.pl                   |   70 ++++
 t/test-lib.sh                                  |    4 +-
 24 files changed, 2997 insertions(+), 29 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/CacheOutput.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture/PerlIO.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture/SelectFH.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture/TiedCapture.pm
 create mode 100644 gitweb/lib/GitwebCache/FileCacheWithLocking.pm
 create mode 100644 gitweb/lib/GitwebCache/SimpleFileCache.pm
 create mode 100644 gitweb/lib/Tie/Restore.pm
 create mode 100644 gitweb/lib/TiedCapture/PerlIO.pm
 create mode 100644 gitweb/lib/TiedCapture/String.pm
 create mode 100755 t/t9503-gitweb-caching-interface.sh
 create mode 100755 t/t9503/benchmark_caching_interface.pl
 create mode 100755 t/t9503/test_cache_interface.pl
 create mode 100755 t/t9504-gitweb-capture-interface.sh
 create mode 100755 t/t9504/benchmark_capture_implementations.pl
 create mode 100755 t/t9504/test_capture_implementations.pl
 create mode 100755 t/t9504/test_capture_interface.pl
 create mode 100755 t/t9505-gitweb-cache.sh
 create mode 100755 t/t9505/test_cache_output.pl

-- 
1.7.3

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCHv5 01/17] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 02/17] gitweb: Prepare for splitting gitweb Jakub Narebski
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

This way we can use it in test scripts written in other languages
(e.g. in Perl), and not rely on "$TEST_DIRECTORY/.."

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is not strictly required, but if we modified t/gitweb-lib.sh to
use GIT_BUILD_DIR, we would want to do the same in the test scripts
for caching interface etc. included in this series.

This patch was not present in previous version of this series, and it
has no equivalent in original patch by J.H.

 t/test-lib.sh |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)
 mode change 100644 => 100755 t/test-lib.sh

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 830e5e7..968d240 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -510,9 +510,9 @@ test_external () {
 		# Announce the script to reduce confusion about the
 		# test output that follows.
 		say_color "" "# run $test_count: $descr ($*)"
-		# Export TEST_DIRECTORY, TRASH_DIRECTORY and GIT_TEST_LONG
+		# Export TEST_DIRECTORY, GIT_BUILD_DIR, TRASH_DIRECTORY and GIT_TEST_LONG
 		# to be able to use them in script
-		export TEST_DIRECTORY TRASH_DIRECTORY GIT_TEST_LONG
+		export TEST_DIRECTORY GIT_BUILD_DIR TRASH_DIRECTORY GIT_TEST_LONG
 		# Run command; redirect its stderr to &4 as in
 		# test_run_, but keep its stdout on our stdout even in
 		# non-verbose mode.
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 02/17] gitweb: Prepare for splitting gitweb
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 01/17] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 03/17] gitweb/lib - Very simple file based cache Jakub Narebski
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

Prepare gitweb for having been split into modules that are to be
installed alongside gitweb in 'lib/' subdirectory, by adding

  use lib __DIR__.'/lib';

to gitweb.perl (to main gitweb script), and preparing for putting
modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile.

This preparatory work allows to add new module to gitweb by simply
adding

  GITWEB_MODULES += <module>

to gitweb/Makefile (assuming that the module is in 'gitweb/lib/'
directory).

While at it pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED
to test instaleed version of gitweb and installed version of modules
(for tests which check individual (sub)modules).


Using __DIR__ from Dir::Self module (not in core, that's why currently
gitweb includes excerpt of code from Dir::Self defining __DIR__) was
chosen over using FindBin-based solution (in core since perl 5.00307,
while gitweb itself requires at least perl 5.8.0) because FindBin uses
BEGIN block, which is a problem under mod_perl and other persistent
Perl environments (thought there are workarounds).

At Pavan Kumar Sankara suggestion gitweb/Makefile uses

  install [OPTION]... SOURCE... DIRECTORY

format (2nd format) with single SOURCE rather than

  install [OPTION]... SOURCE DEST

format (1st format) because of security reasons (race conditions).
Modern GNU install has `-T' / `--no-target-directory' option, but we
cannot rely that the $(INSTALL) we are using supports this option.

The install-modules target in gitweb/Makefile uses shell 'for' loop,
instead of make's $(foreach) function, to avoid possible problem with
generating a command line that exceeded the maximum argument list
length.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch can serve as a way to introduce infrastructure required for
splitting gitweb into modules for better maintability.  This issue was
to be first part (first milestone) for "Integrated web client for git"
GSoC2010 project by Pavan Kumar Sankara (with Christian Couder as
mentor, and Petr 'Pasky' Baudis as co-mentor)... which unfortunately
failed middterm evaluations.

As you can see later in series it makes it very easy to add new gitweb
modules which are to be installed.  Just add

  GITWEB_MODULES += Module/Foo.pm

to gitweb/Makefile for Module::Foo (present in gitweb/lib/Module/Foo.pm)
and 'install-gitweb' target would automatically install it along with
gitweb.

Differences from v4:
* The 'install-modules' target in gitweb/Makefile uses shell 'for' loop,
  instead of make's $(foreach) function; see commit message

* Modules in GITWEB_MODULES would not include the 'lib/' prefix, so
  for module in gitweb/lib/Module/Foo.pm it would be
    GITWEB_MODULES += Module/Foo.pm
  and not
    GITWEB_MODULES += lib/Module/Foo.pm

* Installing modules is left for separate 'install-modules' target,
  which is prerequisite for 'install' target

* Includes hunk about 'test-installed' target, making it possible to
  test installed modules and not only installed gitweb.

* 'install' and 'install-modules' targets are marked as .PHONY

* Does not contain spurious vertical whitespace change (adding empty
  line before 'binmode' invocation).

* Commit message is much longer and more detailed.


Differences from relevant parts of J.H. patch (probably apply also to
the gitweb version running git.kernel.org):
* It uses standard Perl way of naming and installing modules; the main
  caching module is named GitwebCache::CacheOutput and it is in the
  gitweb/lib/GitwebCache/CacheOutput.pm and is loaded using 'require'
  and found thanks to 'use lib', rather than being in gitweb/cache.pm
  and loaded using "do 'cache.pm'", and found by path.

  This has the advantage of automatically detecting errors in the
  caching module.  (Note that this patch series up to and including v2
  also used 'do $cache_pm', but implemented error handling, while v3
  used 'require $cache_pm' i.e. also finding cache module by path.)

* Instead of one 'cache.pm' file, there are separate Perl modules for
  handling different parts of work: file-based cache, capturing gitweb
  output, and caching captured output are all in separate modules.

  The code of those modules are made generic and disentangled from
  gitweb.perl code, which makes it possible to test those modules in
  isolation.

* 'make install-gitweb' is newer than original patch by J.H.; this
  patch series makes it possible to install gitweb, its dependencies
  (static files) _and modules_, using this one simple command.

 gitweb/Makefile    |   17 +++++++++++++++--
 gitweb/gitweb.perl |    8 ++++++++
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gitweb/Makefile b/gitweb/Makefile
index df908a1..c2d72e4 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -56,6 +56,7 @@ PERL_PATH  ?= /usr/bin/perl
 bindir_SQ = $(subst ','\'',$(bindir))#'
 gitwebdir_SQ = $(subst ','\'',$(gitwebdir))#'
 gitwebstaticdir_SQ = $(subst ','\'',$(gitwebdir)/static)#'
+gitweblibdir_SQ = $(subst ','\'',$(gitwebdir)/lib)#'
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))#'
 PERL_PATH_SQ  = $(subst ','\'',$(PERL_PATH))#'
 DESTDIR_SQ    = $(subst ','\'',$(DESTDIR))#'
@@ -151,20 +152,32 @@ test:
 
 test-installed:
 	GITWEB_TEST_INSTALLED='$(DESTDIR_SQ)$(gitwebdir_SQ)' \
+	GITWEBLIBDIR='$(DESTDIR_SQ)$(gitweblibdir_SQ)' \
 		$(MAKE) -C ../t gitweb-test
 
 ### Installation rules
 
-install: all
+install: all install-modules
 	$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitwebdir_SQ)'
 	$(INSTALL) -m 755 $(GITWEB_PROGRAMS) '$(DESTDIR_SQ)$(gitwebdir_SQ)'
 	$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitwebstaticdir_SQ)'
 	$(INSTALL) -m 644 $(GITWEB_FILES) '$(DESTDIR_SQ)$(gitwebstaticdir_SQ)'
 
+install-modules:
+	install_dirs="$(sort $(dir $(GITWEB_MODULES)))" && \
+	for dir in $$install_dirs; do \
+		test -d '$(DESTDIR_SQ)$(gitweblibdir_SQ)/$$dir' || \
+		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitweblibdir_SQ)/$$dir'; \
+	done
+	gitweb_modules="$(GITWEB_MODULES)" && \
+	for mod in $$gitweb_modules; do \
+		$(INSTALL) -m 644 lib/$$mod '$(DESTDIR_SQ)$(gitweblibdir_SQ)/$$(dirname $$mod)'; \
+	done
+
 ### Cleaning rules
 
 clean:
 	$(RM) gitweb.cgi static/gitweb.min.js static/gitweb.min.css GITWEB-BUILD-OPTIONS
 
-.PHONY: all clean install test test-installed .FORCE-GIT-VERSION-FILE FORCE
+.PHONY: all clean install install-modules test test-installed .FORCE-GIT-VERSION-FILE FORCE
 
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index d521b4c..e4c08ba 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -9,6 +9,14 @@
 
 use strict;
 use warnings;
+
+use File::Spec;
+# __DIR__ is taken from Dir::Self __DIR__ fragment
+sub __DIR__ () {
+	File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
+}
+use lib __DIR__ . '/lib';
+
 use CGI qw(:standard :escapeHTML -nosticky);
 use CGI::Util qw(unescape);
 use CGI::Carp qw(fatalsToBrowser set_message);
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 03/17] gitweb/lib - Very simple file based cache
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 01/17] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 02/17] gitweb: Prepare for splitting gitweb Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:41   ` Thomas Adam
  2010-10-06 22:57   ` Ævar Arnfjörð Bjarmason
  2010-10-06 22:01 ` [PATCHv5 04/17] gitweb/lib - Stat-based cache expiration Jakub Narebski
                   ` (19 subsequent siblings)
  22 siblings, 2 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

This is first step towards implementing file based output (response)
caching layer that is used on such large sites as kernel.org.

This patch introduces GitwebCaching::SimpleFileCache package, which
follows Cache::Cache / CHI interface, although do not implement it
fully.  The intent of following established convention for cache
interface is to be able to replace our simple file based cache,
e.g. by the one using memcached.

Like in original patch by John 'Warthog9' Hawley (J.H.) (the one this
commit intends to be incremental step to), the data is stored in the
case as-is, without adding metadata (like expiration date), and
without serialization (which means that one can store only scalar
data).

To be implemented (from original patch by J.H.):
* cache expiration (based on file stats, current time and global
  expiration time); currently elements in cache do not expire at all
* actually using this cache in gitweb, with exception of error pages
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Possible extensions (beyound what was in original patch):
* (optionally) show information about cache utilization
* AJAX (JavaScript-based) progress indicator
* JavaScript code to update relative dates in cached output
* make cache size-aware (try to not exceed specified maximum size)
* utilize X-Sendfile header (or equivalent) to show cached data
  (optional, as it makes sense only if web server supports sendfile
  feature and have it enabled)
* variable expiration feature from CHI, allowing items to expire a bit
  earlier than the stated expiration time to prevent cache miss
  stampedes (although locking, if available, should take care of
  this)

The code of GitwebCaching::SimpleFileCache package in gitweb/lib
was heavily based on file-based cache in Cache::Cache package, i.e.
on Cache::FileCache, Cache::FileBackend and Cache::BaseCache, and on
file-based cache in CHI, i.e. on CHI::Driver::File and CHI::Driver
(including implementing atomic write, something that original patch
lacks).  It tries to follow more modern CHI architecture, but without
requiring Moose.  It is much simplified compared to both interfaces
and their file-based drivers.

This patch does not yet enable output caching in gitweb (it doesn't
have all required features yet); on the other hand it includes tests
of cache Perl API in t/t9503-gitweb-caching-interface.sh.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v4:
* The test got renamed a bit, because separate external tests would be
  invoked by separate test scripts, for 'prove' to work correctly
  (as test_external doesn't yet work as subtest).

  The t/t9503-gitweb-caching-interface.sh gets a long description,
  and is updated to follow conventions used post-TAPification.

* The t/t9503/test_cache_interface.pl uses GIT_BUILD_DIR rather than
  TEST_DIRECTORY, and respect GITWEBLIBDIR to make it possible to test
  installed version of module.

* Generating a temporary file to write to is refactored into "private"
  method _tempfile_to_path

* The call to 'mkdir' is enveloped in eval, as it dies on error.
  make_path would be better here, but is not available for File::Path
  before v2.00, which is in core only since 5.10


Differences from relevant parts of J.H. patch:
* Does not use make_path, but older mkpath interface from File::Path;
  this way we do not require File::Path version >= 2.0.

* Writes to cache file atomically, by writing to unique temporary file
  (from File::Temp at this point of series) and then renaming/moving
  this file to the cache file.  This is to make caching more robust,
  by preventing readers from getting partial data.  The issue this
  prevents should be very rare, as we write whole data (whole page) at
  once, after it is generated in full, so this situation can
  theoretically happen only in the case of heavy load, many clients,
  and very large page (over the size of (file)system cache).

* Error reporting is done via "die", not via die_error from gitweb.
  This is to avoid entangling cache.pm and gitweb.perl code (making
  cache.pm dependent on gitweb).  This can be improved further by
  passing die handler (wrapper around die_error in case of gitweb) to
  cache (see CHI source code for how it can be solved).

* The depth of cache hierarchy is configurable, while J.H. patch uses
  hardcoded depth of 1 subdirectory.  It uses unpack rather than substr.
  For 06b90e786e... key digest it uses 06/b90e786e... (with cache_depth = 1)
  rather than 06/06b90e786e... in J.H. patch.  

* GitwebCache::SimpleFileCache uses OO (object-oriented) interface,
  with constructor that takes param names that are compatibile with
  other cache engines (like CHI or Cache::Cache).

  GitwebCache::SimpleFileCache is a proper Perl module.

* GitwebCache::SimpleFileCache uses standard ->get, ->set, ->compute
  methods, which should allow easy replacing it with CHI or Cache::Cache
  cache (see later commit, adding caching support to gitweb).

* Support for namespaces (though default is to have no namespace, at
  least at this point of series).

* ->fetch($key) and ->store($key, $data) use fast slurp / spew,
  adapted from File::Slurp read and write subroutines, via
  CHI::Driver::File

* Tests of caching API in t/t9503-gitweb-caching-interface.sh

* Better Perl style, more consistent with what gitweb itself uses.


 gitweb/lib/GitwebCache/SimpleFileCache.pm |  324 +++++++++++++++++++++++++++++
 t/t9503-gitweb-caching-interface.sh       |   34 +++
 t/t9503/test_cache_interface.pl           |   81 +++++++
 3 files changed, 439 insertions(+), 0 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/SimpleFileCache.pm
 create mode 100755 t/t9503-gitweb-caching-interface.sh
 create mode 100755 t/t9503/test_cache_interface.pl

diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
new file mode 100644
index 0000000..a54f78f
--- /dev/null
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -0,0 +1,324 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Gitweb caching engine, simple file-based cache
+#
+
+# Minimalistic cache that stores data in the filesystem, without serialization
+# and currently without any kind of cache expiration (all keys last forever till
+# they got explicitely removed).
+#
+# It follows Cache::Cache and CHI interfaces (but does not implement it fully)
+
+package GitwebCache::SimpleFileCache;
+
+use strict;
+use warnings;
+
+use File::Path qw(mkpath);
+use File::Temp qw(tempfile);
+use Digest::MD5 qw(md5_hex);
+
+# by default, the cache nests all entries on the filesystem single
+# directory deep, i.e. '60/b725f10c9c85c70d97880dfe8191b3' for
+# key name (key digest) 60b725f10c9c85c70d97880dfe8191b3.
+#
+our $DEFAULT_CACHE_DEPTH = 1;
+
+# by default, the root of the cache is located in 'cache'.
+#
+our $DEFAULT_CACHE_ROOT = "cache";
+
+# by default we don't use cache namespace (empty namespace);
+# empty namespace does not allow for simple implementation of clear() method.
+#
+our $DEFAULT_NAMESPACE = '';
+
+# ......................................................................
+# constructor
+
+# The options are set by passing in a reference to a hash containing
+# any of the following keys:
+#  * 'namespace'
+#    The namespace associated with this cache.  This allows easy separation of
+#    multiple, distinct caches without worrying about key collision.  Defaults
+#    to $DEFAULT_NAMESPACE.
+#  * 'cache_root' (Cache::FileCache compatibile),
+#    'root_dir' (CHI::Driver::File compatibile),
+#    The location in the filesystem that will hold the root of the cache.
+#    Defaults to $DEFAULT_CACHE_ROOT.
+#  * 'cache_depth' (Cache::FileCache compatibile),
+#    'depth' (CHI::Driver::File compatibile),
+#    The number of subdirectories deep to cache object item.  This should be
+#    large enough that no cache directory has more than a few hundred objects.
+#    Defaults to $DEFAULT_CACHE_DEPTH unless explicitly set.
+sub new {
+	my ($proto, $p_options_hash_ref) = @_;
+
+	my $class = ref($proto) || $proto;
+	my $self  = {};
+	$self = bless($self, $class);
+
+	my ($root, $depth, $ns);
+	if (defined $p_options_hash_ref) {
+		$root  =
+			$p_options_hash_ref->{'cache_root'} ||
+			$p_options_hash_ref->{'root_dir'};
+		$depth =
+			$p_options_hash_ref->{'cache_depth'} ||
+			$p_options_hash_ref->{'depth'};
+		$ns    = $p_options_hash_ref->{'namespace'};
+	}
+	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
+	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
+	$ns    = $DEFAULT_NAMESPACE   unless defined($ns);
+
+	$self->set_root($root);
+	$self->set_depth($depth);
+	$self->set_namespace($ns);
+
+	return $self;
+}
+
+# ......................................................................
+# accessors
+
+# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
+
+# creates get_depth() and set_depth($depth) etc. methods
+foreach my $i (qw(depth root namespace)) {
+	my $field = $i;
+	no strict 'refs';
+	*{"get_$field"} = sub {
+		my $self = shift;
+		return $self->{$field};
+	};
+	*{"set_$field"} = sub {
+		my ($self, $value) = @_;
+		$self->{$field} = $value;
+	};
+}
+
+# ----------------------------------------------------------------------
+# utility functions and methods
+
+# Return root dir for namespace (lazily built, cached)
+sub path_to_namespace {
+	my ($self) = @_;
+
+	if (!exists $self->{'path_to_namespace'}) {
+		if (defined $self->{'namespace'} &&
+		    $self->{'namespace'} ne '') {
+			$self->{'path_to_namespace'} = "$self->{'root'}/$self->{'namespace'}";
+		} else {
+			$self->{'path_to_namespace'} =  $self->{'root'};
+		}
+	}
+	return $self->{'path_to_namespace'};
+}
+
+# $path = $cache->path_to_key($key);
+# $path = $cache->path_to_key($key, \$dir);
+#
+# Take an human readable key, and return file path.
+# Puts dirname of file path in second argument, if it is provided.
+sub path_to_key {
+	my ($self, $key, $dir_ref) = @_;
+
+	my @paths = ( $self->path_to_namespace() );
+
+	# Create a unique (hashed) key from human readable key
+	my $filename = md5_hex($key); # or $digester->add($key)->hexdigest();
+
+	# Split filename so that it have DEPTH subdirectories,
+	# where each subdirectory has a two-letter name
+	push @paths, unpack("(a2)".($self->{'depth'} || 1)." a*", $filename);
+	$filename = pop @paths;
+
+	# Join paths together, computing dir separately if $dir_ref was passed.
+	my $filepath;
+	if (defined $dir_ref && ref($dir_ref)) {
+		my $dir = join('/', @paths);
+		$filepath = "$dir/$filename";
+		$$dir_ref = $dir;
+	} else {
+		$filepath = join('/', @paths, $filename);
+	}
+
+	return $filepath;
+}
+
+# ----------------------------------------------------------------------
+# "private" utility functions and methods
+
+# take a file path to cache entry, and its directory
+# return filehandle and filename of open temporary file,
+# like File::Temp::tempfile
+sub _tempfile_to_path {
+	my ($file, $dir) = @_;
+
+	# tempfile will croak() if there is an error
+	return tempfile("${file}_XXXXX",
+		#DIR => $dir,
+		'UNLINK' => 0, # ensure that we don't unlink on close; file is renamed
+		'SUFFIX' => '.tmp');
+}
+
+
+# ----------------------------------------------------------------------
+# worker methods
+
+sub fetch {
+	my ($self, $key) = @_;
+
+	my $file = $self->path_to_key($key);
+	return undef unless (defined $file && -f $file);
+
+	# Fast slurp, adapted from File::Slurp::read, with unnecessary options removed
+	# via CHI::Driver::File (from CHI-0.33)
+	my $buf = '';
+	open my $read_fh, '<', $file
+		or return undef;
+	binmode $read_fh, ':raw';
+
+	my $size_left = -s $read_fh;
+
+	while ($size_left > 0) {
+		my $read_cnt = sysread($read_fh, $buf, $size_left, length($buf));
+		return undef unless defined $read_cnt;
+
+		last if $read_cnt == 0;
+		$size_left -= $read_cnt;
+		#last if $size_left <= 0;
+	}
+
+	close $read_fh
+		or die "Couldn't close file '$file' opened for reading: $!";
+	return $buf;
+}
+
+sub store {
+	my ($self, $key, $data) = @_;
+
+	my $dir;
+	my $file = $self->path_to_key($key, \$dir);
+	return undef unless (defined $file && defined $dir);
+
+	# ensure that directory leading to cache file exists
+	if (!-d $dir) {
+		eval { mkpath($dir, 0, 0777); 1 }
+			or die "Couldn't create leading directory '$dir' (mkpath): $!";
+	}
+
+	# generate a temporary file
+	my ($temp_fh, $tempname) = $self->_tempfile_to_path($file, $dir);
+	chmod 0666, $tempname
+		or warn "Couldn't change permissions to 0666 / -rw-rw-rw- for '$tempname': $!";
+
+	# Fast spew, adapted from File::Slurp::write, with unnecessary options removed
+	# via CHI::Driver::File (from CHI-0.33)
+	my $write_fh = $temp_fh;
+	binmode $write_fh, ':raw';
+
+	my $size_left = length($data);
+	my $offset = 0;
+
+	while ($size_left > 0) {
+		my $write_cnt = syswrite($write_fh, $data, $size_left, $offset);
+		return undef unless defined $write_cnt;
+		
+		$size_left -= $write_cnt;
+		$offset += $write_cnt; # == length($data);
+	}
+
+	close $temp_fh
+		or die "Couldn't close temporary file '$tempname' opened for writing: $!";
+	rename($tempname, $file)
+		or die "Couldn't rename temporary file '$tempname' to '$file': $!";
+}
+
+# get size of an element associated with the $key (not the size of whole cache)
+sub get_size {
+	my ($self, $key) = @_;
+
+	my $path = $self->path_to_key($key)
+		or return undef;
+	if (-f $path) {
+		return -s $path;
+	}
+	return 0;
+}
+
+# ......................................................................
+# interface methods
+
+# Removing and expiring
+
+# $cache->remove($key)
+#
+# Remove the data associated with the $key from the cache.
+sub remove {
+	my ($self, $key) = @_;
+
+	my $file = $self->path_to_key($key)
+		or return undef;
+	return undef unless -f $file;
+	unlink($file)
+		or die "Couldn't remove file '$file': $!";
+}
+
+# Getting and setting
+
+# $cache->set($key, $data);
+#
+# Associates $data with $key in the cache, overwriting any existing entry.
+# Returns $data.
+sub set {
+	my ($self, $key, $data) = @_;
+
+	return unless (defined $key && defined $data);
+
+	$self->store($key, $data);
+
+	return $data;
+}
+
+# $data = $cache->get($key);
+#
+# Returns the data associated with $key.  If $key does not exist
+# or has expired, returns undef.
+sub get {
+	my ($self, $key) = @_;
+
+	my $data = $self->fetch($key)
+		or return undef;
+
+	return $data;
+}
+
+# $data = $cache->compute($key, $code);
+#
+# Combines the get and set operations in a single call.  Attempts to
+# get $key; if successful, returns the value.  Otherwise, calls $code
+# and uses the return value as the new value for $key, which is then
+# returned.
+sub compute {
+	my ($self, $key, $code) = @_;
+
+	my $data = $self->get($key);
+	if (!defined $data) {
+		$data = $code->($self, $key);
+		$self->set($key, $data);
+	}
+
+	return $data;
+}
+
+1;
+__END__
+# end of package GitwebCache::SimpleFileCache;
diff --git a/t/t9503-gitweb-caching-interface.sh b/t/t9503-gitweb-caching-interface.sh
new file mode 100755
index 0000000..819da1d
--- /dev/null
+++ b/t/t9503-gitweb-caching-interface.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Jakub Narebski
+#
+
+test_description='gitweb caching interface
+
+This test checks caching interface used in gitweb caching, and caching
+infrastructure (GitwebCache::* modules).'
+
+# for now we are running only cache interface tests
+. ./test-lib.sh
+
+# this test is present in gitweb-lib.sh
+if ! test_have_prereq PERL; then
+	skip_all='perl not available, skipping test'
+	test_done
+fi
+
+"$PERL_PATH" -MTest::More -e 0 >/dev/null 2>&1 || {
+	skip_all='perl module Test::More unavailable, skipping test'
+	test_done
+}
+
+# ----------------------------------------------------------------------
+
+# The external test will outputs its own plan
+test_external_has_tap=1
+
+test_external \
+	'GitwebCache::* Perl API (in gitweb/lib/)' \
+	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_cache_interface.pl
+
+test_done
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
new file mode 100755
index 0000000..6a7b715
--- /dev/null
+++ b/t/t9503/test_cache_interface.pl
@@ -0,0 +1,81 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use Test::More;
+
+# test source version
+use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
+
+# Test creating a cache
+#
+BEGIN { use_ok('GitwebCache::SimpleFileCache'); }
+my $cache = new_ok('GitwebCache::SimpleFileCache');
+
+# Test that default values are defined
+#
+ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
+	'$DEFAULT_CACHE_ROOT defined');
+ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
+	'$DEFAULT_CACHE_DEPTH defined');
+
+# Test accessors and default values for cache
+#
+SKIP: {
+	skip 'default values not defined', 3
+		unless ($GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT &&
+		        $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH);
+
+	is($cache->get_namespace(), '', "default namespace is ''");
+	cmp_ok($cache->get_root(),  'eq', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
+		"default cache root is '$GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT'");
+	cmp_ok($cache->get_depth(), '==', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
+		"default cache depth is $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH");
+}
+
+# Test the getting, setting, and removal of a cached value
+# (Cache::Cache interface)
+#
+my $key = 'Test Key';
+my $value = 'Test Value';
+
+subtest 'Cache::Cache interface' => sub {
+	foreach my $method (qw(get set remove)) {
+		can_ok($cache, $method);
+	}
+
+	$cache->set($key, $value);
+	cmp_ok($cache->get_size($key), '>', 0, 'get_size after set, is greater than 0');
+	is($cache->get($key), $value,          'get after set, returns cached value');
+	$cache->remove($key);
+	ok(!defined($cache->get($key)),        'get after remove, is undefined');
+
+	eval { $cache->remove('Not-Existent Key'); };
+	ok(!$@,                                'remove on non-existent key doesn\'t die');
+	diag($@) if $@;
+
+	done_testing();
+};
+
+# Test the getting and setting of a cached value
+# (CHI interface)
+#
+my $call_count = 0;
+sub get_value {
+	$call_count++;
+	return $value;
+}
+subtest 'CHI interface' => sub {
+	can_ok($cache, qw(compute));
+
+	is($cache->compute($key, \&get_value), $value, "compute 1st time (set) returns '$value'");
+	is($cache->compute($key, \&get_value), $value, "compute 2nd time (get) returns '$value'");
+	is($cache->compute($key, \&get_value), $value, "compute 3rd time (get) returns '$value'");
+	cmp_ok($call_count, '==', 1, 'get_value() is called once from compute');
+
+	done_testing();
+};
+
+done_testing();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 04/17] gitweb/lib - Stat-based cache expiration
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (2 preceding siblings ...)
  2010-10-06 22:01 ` [PATCHv5 03/17] gitweb/lib - Very simple file based cache Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 05/17] gitweb/lib - Regenerate entry if the cache file has size of 0 Jakub Narebski
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

Add stat-based cache expiration to file-based GitwebCache::SimpleFileCache.
Contrary to the way other caching interfaces such as Cache::Cache and CHI
do it, the time cache element expires in is _global_ value associated with
cache instance, and is not local property of cache entry.  (Currently cache
entry does not store any metadata associated with entry... which means that
there is no need for serialization / marshalling / freezing and thawing.)
Default expire time is -1, which means never expire.

To check if cache entry is expired, GitwebCache::SimpleFileCache compares
difference between mtime (last modify time) of a cache file and current time
with (global) time to expire.  It is done using CHI-compatibile is_valid()
method.

Add some tests checking that expiring works correctly (on the level of API).

To be implemented (from original patch by J.H.):
* actually using this cache in gitweb, except error pages
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Now that caching engine supports cache expiration, we can add caching
support to gitweb.

No significant differences from v4 version of this patch.

Differences from relevant parts of J.H. patch:
* It simply uses stat on last accessed file (checked for existence),
  instead of opening file for reading (without error detection!), running
  stat on it, and then closing it.

* One can use expire time of -1 (or to be more exact less than 0) to set
  expire time to never (cache is considered fresh forever, does not expire).

* There are some tests in t9503 of cache expiration (one of those assume
  that expire time of one day would be not expired in get after set).

* It reuses stat structure (stat(_)), and calculates current time only
  once.

 gitweb/lib/GitwebCache/SimpleFileCache.pm |   39 +++++++++++++++++++++++++++-
 t/t9503/test_cache_interface.pl           |   19 ++++++++++++++
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index a54f78f..74d7246 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -57,6 +57,10 @@ our $DEFAULT_NAMESPACE = '';
 #    The number of subdirectories deep to cache object item.  This should be
 #    large enough that no cache directory has more than a few hundred objects.
 #    Defaults to $DEFAULT_CACHE_DEPTH unless explicitly set.
+#  * 'default_expires_in' (Cache::Cache compatibile),
+#    'expires_in' (CHI compatibile) [seconds]
+#    The expiration time for objects place in the cache.
+#    Defaults to -1 (never expire) if not explicitly set.
 sub new {
 	my ($proto, $p_options_hash_ref) = @_;
 
@@ -64,7 +68,7 @@ sub new {
 	my $self  = {};
 	$self = bless($self, $class);
 
-	my ($root, $depth, $ns);
+	my ($root, $depth, $ns, $expires_in);
 	if (defined $p_options_hash_ref) {
 		$root  =
 			$p_options_hash_ref->{'cache_root'} ||
@@ -73,14 +77,19 @@ sub new {
 			$p_options_hash_ref->{'cache_depth'} ||
 			$p_options_hash_ref->{'depth'};
 		$ns    = $p_options_hash_ref->{'namespace'};
+		$expires_in =
+			$p_options_hash_ref->{'default_expires_in'} ||
+			$p_options_hash_ref->{'expires_in'};
 	}
 	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
 	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
 	$ns    = $DEFAULT_NAMESPACE   unless defined($ns);
+	$expires_in = -1 unless defined($expires_in); # <0 means never
 
 	$self->set_root($root);
 	$self->set_depth($depth);
 	$self->set_namespace($ns);
+	$self->set_expires_in($expires_in);
 
 	return $self;
 }
@@ -91,7 +100,7 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace)) {
+foreach my $i (qw(depth root namespace expires_in)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -272,6 +281,31 @@ sub remove {
 		or die "Couldn't remove file '$file': $!";
 }
 
+# $cache->is_valid($key)
+#
+# Returns a boolean indicating whether $key exists in the cache
+# and has not expired (global per-cache 'expires_in').
+sub is_valid {
+	my ($self, $key) = @_;
+
+	my $path = $self->path_to_key($key);
+
+	# does file exists in cache?
+	return 0 unless -f $path;
+	# get its modification time
+	my $mtime = (stat(_))[9] # _ to reuse stat structure used in -f test
+		or die "Couldn't stat file '$path': $!";
+
+	# expire time can be set to never
+	my $expires_in = $self->get_expires_in();
+	return 1 unless (defined $expires_in && $expires_in >= 0);
+
+	# is file expired?
+	my $now = time();
+
+	return (($now - $mtime) < $expires_in);
+}
+
 # Getting and setting
 
 # $cache->set($key, $data);
@@ -295,6 +329,7 @@ sub set {
 sub get {
 	my ($self, $key) = @_;
 
+	return undef unless $self->is_valid($key);
 	my $data = $self->fetch($key)
 		or return undef;
 
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 6a7b715..adca88d 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -78,4 +78,23 @@ subtest 'CHI interface' => sub {
 	done_testing();
 };
 
+# Test cache expiration
+#
+subtest 'cache expiration' => sub {
+	$cache->set_expires_in(60*60*24); # set expire time to 1 day
+	cmp_ok($cache->get_expires_in(), '>', 0, '"expires in" is greater than 0');
+	is($cache->get($key), $value,            'get returns cached value (not expired in 1d)');
+
+	$cache->set_expires_in(-1); # set expire time to never expire
+	is($cache->get_expires_in(), -1,         '"expires in" is set to never (-1)');
+	is($cache->get($key), $value,            'get returns cached value (not expired)');
+
+	$cache->set_expires_in(0);
+	is($cache->get_expires_in(),  0,         '"expires in" is set to now (0)');
+	$cache->set($key, $value);
+	ok(!defined($cache->get($key)),          'cache is expired');
+
+	done_testing();
+};
+
 done_testing();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 05/17] gitweb/lib - Regenerate entry if the cache file has size of 0
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (3 preceding siblings ...)
  2010-10-06 22:01 ` [PATCHv5 04/17] gitweb/lib - Stat-based cache expiration Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 06/17] gitweb/lib - Simple select(FH) based output capture Jakub Narebski
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

If the file representing cache entry has size 0 (in bytes), the cache
entry is considered invalid, regardless of its freshness, even if cache
expiration is turned off.

[jh: This is a quick, and thankfully easy, check to regenerate the
cache if the resulting file is of size 0.  This *SHOULDN'T* happen,
but it is possible that it could and thus adding the check.]

Based-on-commit-by: John 'Warthog9' Hawley <warthog9@eaglescrag.net>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch is new and wasn't present in previous version of this
series.

It is inspired by commit 22eb7af (Gitweb - Regenerate the cache if the
resulting file has size of 0, 2010-09-23) on 'master' branch of
git/warthog9/gitweb.git repository on git.kernel.org

 gitweb/lib/GitwebCache/SimpleFileCache.pm |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 74d7246..6833699 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -295,6 +295,8 @@ sub is_valid {
 	# get its modification time
 	my $mtime = (stat(_))[9] # _ to reuse stat structure used in -f test
 		or die "Couldn't stat file '$path': $!";
+	# cache entry is invalid if it is size 0 (in bytes)
+	return 0 unless ((stat(_))[7] > 0);
 
 	# expire time can be set to never
 	my $expires_in = $self->get_expires_in();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 06/17] gitweb/lib - Simple select(FH) based output capture
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (4 preceding siblings ...)
  2010-10-06 22:01 ` [PATCHv5 05/17] gitweb/lib - Regenerate entry if the cache file has size of 0 Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:52   ` Thomas Adam
  2010-10-06 23:03   ` Ævar Arnfjörð Bjarmason
  2010-10-06 22:01 ` [PATCHv5 07/17] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
                   ` (16 subsequent siblings)
  22 siblings, 2 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

Add two packages: GitwebCache::Capture, which defines interface, and
GitwebCache::Capture::SelectFH, which is actually implements simple
capturing.  GitwebCache::Capture::SelectFH captures output by using
select(FILEHANDLE) to change default filehandle for output.  This
means that output of a "print" or a "printf" (or a "write") without
a filehandle would be captured.

To change mode of filehandle used for capturing correctly,
  binmode select(), <mode>;
needs to be used in place of
  binmode STDOUT, <mode>;

Capturing is done using in-memory file held in Perl scalar.

Using select(FILEHANDLE) is a bit fragile as a method of capturing
output, as it assumes that we always use "print" or "printf" without
filehandle, and use select() which returns default filehandle for
output in place of explicit STDOUT.  On the other hand it has the
advantage of being simple.  Alternate solutions include using tie
(like in CGI::Cache), or using PerlIO layers - but the last requires
non-standard PerlIO::Util module.


Includes separate tests for capturing output.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
You can see alternate solutions for capturing output in 21/17 patch of
this series: "gitweb/lib - Alternate ways of capturing output" (an
appendix to this series).

Differences from v4:
* The capture interface tests are now invoked by a separate test
  script t/t9504-gitweb-capture-interface.sh, for 'prove' to work
  correctly (as test_external doesn't yet work as subtest).

* The t/t9504/test_capture_interface.pl uses GIT_BUILD_DIR rather than
  TEST_DIRECTORY, and respect GITWEBLIBDIR to make it possible to test
  installed version of module.

* Removed spurious changes and fixes to commits earlier in series
  (patch cleanup).


Differences from relevant parts of J.H. patch:
* Capturing gitweb output will be done without need to modify gitweb
  to either save generated output into $output variable, and then
  print it or save it in cache after it is generated in full (original
  J.H. patch in "Gitweb caching v2"), or changing all print statements
  to print to explicit filehandle which points to STDOUT if caching is
  disabled and to in-memory file if caching is enabled (modified
  J.H. patch in "Gitweb caching v5").

* Contrary to the '$output .= <sth>' solution, and similar to the
  'print {$out} <sth>' or 'print $out <sth>' (which can be thought of
  as explicit version of select($out)), this way of capturing output
  doesn't change gitweb behavior when caching is turned off; in
  particular it preserves streaming.

  Also the '$output .= <sth>' solution can affect performance because
  of repeated string concatenation.

* The most important issue is that I/O "layers" (PerlIO), like ':utf8'
  or ':raw', are *already applied* to the output that is captured.
  This means that captured output is *always* in binary (':raw') mode.
  In Perl 6 language it means that data returned by capturing engine
  is an equivalent of Buf, a collection of bytes, whether Buf or Str
  (a colection of logical characters) is printed.

  The overal result is that we would not need separate code path for
  caching binary output, and separate naming conventions for cache
  files for binary data.

  The t9504 test is about checking if both ':utf8' and ':raw' output
  is captured correctly.

 gitweb/lib/GitwebCache/Capture.pm          |   66 ++++++++++++++++++++++
 gitweb/lib/GitwebCache/Capture/SelectFH.pm |   82 ++++++++++++++++++++++++++++
 t/t9504-gitweb-capture-interface.sh        |   34 ++++++++++++
 t/t9504/test_capture_interface.pl          |   76 ++++++++++++++++++++++++++
 4 files changed, 258 insertions(+), 0 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/Capture.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture/SelectFH.pm
 create mode 100755 t/t9504-gitweb-capture-interface.sh
 create mode 100755 t/t9504/test_capture_interface.pl

diff --git a/gitweb/lib/GitwebCache/Capture.pm b/gitweb/lib/GitwebCache/Capture.pm
new file mode 100644
index 0000000..3e9fe81
--- /dev/null
+++ b/gitweb/lib/GitwebCache/Capture.pm
@@ -0,0 +1,66 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Output capturing for gitweb caching engine
+#
+
+# It is base abstract class (a role) for capturing output of gitweb
+# actions for gitweb caching engine.
+# 
+# Child (derived) concrete classes, which actually implement some method
+# of capturing STDOUT output, must implement the following methods:
+# * ->new(), to create new object of a capturing class
+# * ->start(), to start capturing output
+# * ->stop(), to stop capturing output and return it
+#
+# Before starting capture by using capture_block etc. subroutines,
+# one has to run <child class>->setup().
+
+package GitwebCache::Capture;
+
+use strict;
+use warnings;
+
+use Exporter qw(import);
+our @EXPORT    = qw(capture_start capture_stop capture_block);
+our @EXPORT_OK = qw(setup_capture);
+our %EXPORT_TAGS = (all => [ @EXPORT, @EXPORT_OK ]);
+
+# Holds object used for capture (of child class)
+my $capture;
+
+sub setup_capture {
+	my $self = shift || __PACKAGE__;
+
+	$capture = $self->new(@_);
+}
+
+sub capture {
+	my ($self, $code) = @_;
+
+	$self->start();
+	$code->();
+	return $self->stop();
+}
+
+# Wrap caching data; capture only STDOUT
+sub capture_block (&) {
+	my $code = shift;
+	return $capture->capture($code);
+}
+
+sub capture_start {
+	$capture->start(@_);
+}
+
+sub capture_stop {
+	return $capture->stop(@_);
+}
+
+1;
+__END__
+# end of package GitwebCache::Capture;
diff --git a/gitweb/lib/GitwebCache/Capture/SelectFH.pm b/gitweb/lib/GitwebCache/Capture/SelectFH.pm
new file mode 100644
index 0000000..18ce5c3
--- /dev/null
+++ b/gitweb/lib/GitwebCache/Capture/SelectFH.pm
@@ -0,0 +1,82 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Simple output capturing using select(FH);
+#
+
+# This module (class) captures output of 'print <sth>', 'printf <sth>'
+# and 'write <sth>' (without a filehandle) by using select(FILEHANDLE)
+# to change default filehandle for output, changing it to in-memory
+# file (saving output to scalar).
+#
+# Note that when using this simplest way of capturing, to change mode of
+# filehandle using for capturing correctly, "binmode STDOUT, <mode>;"
+# has to be changed to "binmode select(), <mode>;".  This has no change
+# if we are not capturing output using GitwebCache::Capture::SelectFH.
+
+package GitwebCache::Capture::SelectFH;
+
+use PerlIO;
+
+use strict;
+use warnings;
+
+use base qw(GitwebCache::Capture);
+use GitwebCache::Capture qw(:all);
+
+use Exporter qw(import);
+our @EXPORT      = @GitwebCache::Capture::EXPORT;
+our @EXPORT_OK   = @GitwebCache::Capture::EXPORT_OK;
+our %EXPORT_TAGS = %GitwebCache::Capture::EXPORT_TAGS;
+
+# Constructor
+sub new {
+	my $proto = shift;
+
+	my $class = ref($proto) || $proto;
+	my $self  = {};
+	$self = bless($self, $class);
+
+	$self->{'oldfh'} = select();
+	$self->{'data'} = '';
+
+	return $self;
+}
+
+# Start capturing data (STDOUT)
+# (printed using 'print <sth>' or 'printf <sth>')
+sub start {
+	my $self = shift;
+
+	$self->{'data'}    = '';
+	$self->{'data_fh'} = undef;
+	
+	open $self->{'data_fh'}, '>', \$self->{'data'}
+		or die "Couldn't open in-memory file for capture: $!";
+	$self->{'oldfh'} = select($self->{'data_fh'});
+
+	# note: this does not cover all cases
+	binmode select(), ':utf8'
+		if ((PerlIO::get_layers($self->{'oldfh'}))[-1] eq 'utf8');
+}
+
+# Stop capturing data (required for die_error)
+sub stop {
+	my $self = shift;
+
+	# return if we didn't start capturing
+	return unless defined $self->{'data_fh'};
+
+	select($self->{'oldfh'});
+	close $self->{'data_fh'}
+		or die "Couldn't close in-memory file for capture: $!";
+	return $self->{'data'};
+}
+
+1;
+__END__
+# end of package GitwebCache::Capture::SelectFH;
diff --git a/t/t9504-gitweb-capture-interface.sh b/t/t9504-gitweb-capture-interface.sh
new file mode 100755
index 0000000..82623f1
--- /dev/null
+++ b/t/t9504-gitweb-capture-interface.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Jakub Narebski
+#
+
+test_description='gitweb capturing interface
+
+This test checks capturing interface used for capturing gitweb output
+in gitweb caching (GitwebCache::Capture* modules).'
+
+# for now we are running only cache interface tests
+. ./test-lib.sh
+
+# this test is present in gitweb-lib.sh
+if ! test_have_prereq PERL; then
+	skip_all='perl not available, skipping test'
+	test_done
+fi
+
+"$PERL_PATH" -MTest::More -e 0 >/dev/null 2>&1 || {
+	skip_all='perl module Test::More unavailable, skipping test'
+	test_done
+}
+
+# ----------------------------------------------------------------------
+
+# The external test will outputs its own plan
+test_external_has_tap=1
+
+test_external \
+	'GitwebCache::Capture Perl API (in gitweb/lib/)' \
+	"$PERL_PATH" "$TEST_DIRECTORY"/t9504/test_capture_interface.pl
+
+test_done
diff --git a/t/t9504/test_capture_interface.pl b/t/t9504/test_capture_interface.pl
new file mode 100755
index 0000000..55c402a
--- /dev/null
+++ b/t/t9504/test_capture_interface.pl
@@ -0,0 +1,76 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+use utf8;
+
+use Test::More;
+
+# test source version
+use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
+
+# ....................................................................
+
+# prototypes must be known at compile time, otherwise they do not work
+BEGIN { use_ok('GitwebCache::Capture::SelectFH', qw(:all)); }
+
+# Test setting up capture
+#
+my $capture = new_ok('GitwebCache::Capture::SelectFH' => [], 'The $capture');
+isa_ok($capture, 'GitwebCache::Capture', 'The $capture');
+ok(setup_capture('GitwebCache::Capture::SelectFH'),
+   'setup_capture with package name: GitwebCache::Capture::SelectFH');
+ok(setup_capture($capture),
+   'setup_capture with subclass object: $capture');
+
+# Test properties of capture_block
+#
+is(prototype('capture_block'), '&', 'capture_block has (&) prototype');
+
+# Test capturing
+#
+diag('Should not print anything except test results and diagnostic');
+my $test_data = 'Capture this';
+my $captured = capture_block {
+	print $test_data;
+};
+is($captured, $test_data, 'capture_block captures simple data');
+
+binmode STDOUT, ':utf8';
+$test_data = <<'EOF';
+Áéí óú
+ÄËÑÏÖ
+Ábçdèfg
+Zażółć gęsią jaźń
+山田 太郎
+ブレームのテストです。
+
+はれひほふ
+
+しているのが、いるので。
+濱浜ほれぷりぽれまびぐりろへ。
+EOF
+utf8::decode($test_data);
+$captured = capture_block {
+	binmode select(), ':utf8';
+
+	print $test_data;
+};
+utf8::decode($captured);
+is($captured, $test_data, 'capture_block captures utf8 data');
+
+$test_data = '|\x{fe}\x{ff}|\x{9F}|\000|'; # invalid utf-8
+$captured = capture_block {
+	binmode select(), ':raw';
+
+	print $test_data;
+};
+is($captured, $test_data, 'capture_block captures raw data');
+
+
+done_testing();
+
+# Local Variables:
+# encoding: utf-8
+# End:
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 07/17] gitweb/lib - Cache captured output (using get/set)
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (5 preceding siblings ...)
  2010-10-06 22:01 ` [PATCHv5 06/17] gitweb/lib - Simple select(FH) based output capture Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 08/17] gitweb: Add optional output caching Jakub Narebski
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

Add GitwebCache::CacheOutput package, which contains cache_output
subroutine, and (re)exports capture_stop from GitwebCache::Capture.
The cache_output gets data from cache and prints it, or captures
output of provided subroutine (code reference), saves it to cache and
prints it.  It currently uses Cache::Cache compatibile (get, set)
interface to cache.  Capturing is currently (not configurable) done
using GitwebCache::Capture::SelectFH introduced in previous commit,
but any class derived from GitwebCache::Capture (like provided example
GitwebCache::Capture::TiedCapture and GitwebCache::Capture::PerlIO)
would work.

Gitweb would use cache_output to get page from cache, or to generate
page and save it to cache.  The capture_stop would be used in
die_error subroutine, because error pages would not be cached.

It is assumed that data is saved to cache _converted_, and should
therefore be read from cache and printed to STDOUT in ':raw' (binary)
mode.


Add t9505/test_cache_output.pl test, run as external test in
t9505-gitweb-cache.sh.  It checks that cache_output behaves correctly,
namely that it saves and restores action output in cache, and that it
prints generated output or cached output.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v4:
* Similarly to other patches in this revision of gitweb caching
  series, the new test for this new module, written in Perl, is now
  invoked from a separate test script, to make it possible to run
  'prove' on git test suite.

  Also new test uses GIT_BUILD_DIR rather than TEST_DIRECTORY/.., and
  it respects GITWEBLIBDIR to make it possible to test installed
  version of this module.

* GitwebCache::CacheOutput no longer loads GitwebCache::SimpleFileCache;
  It is now assumed that cache_output() caller loads required packages
  itself, so that ->get() and ->set() methods $cache instance, passed
  as argument to cache_output(), work correctly.

  This required small change to t/t9505/test_cache_output.pl


Differences from relevant parts of J.H. patch:
* cache_fetch() subroutine, now named cache_output(), is much, much
  simpler.  Well, at this point in the patch series it lacks most of
  the features of original cache_fetch() by J.H.: it doesn't have
  adaptive cache lifetime, nor locking to prevent 'stampeding herd'
  problem, nor serving stale data when waiting for cache regeneration,
  nor background data generation, nor activity indicator... but the
  cache_output() itself doesn't change much in later commits, as those
  features are added mainly via methods and subroutines that
  cache_output() calls.

* Capturing gitweb output would be done without need to extensively
  modify gitweb to either 1) save generated output into $output
  variable, i.e. replace 'print <sth>' by '$output .= <sth>', and print
  it or save in cache after it is generated in full (original J.H. patch
  in "Gitweb caching v2"), or 2) changing all print statements to print
  to explicit filehandle, i.e. replace 'print <sth>' by 
  'print {$out} <sth>', which filehandle points to STDOUT if caching is
  disabled and to in-memory file if caching is enabled (modified
  J.H. patch in "Gitweb caching v5").

  Currently capturing output is hardcoded to GitwebCache::Capture::SelectFH,
  but changing it to other compatibile capturing engine requires (in
  current version) change only only two lines in GitwebCache::CacheOutput
  module.

* It is assumed that capture generates binary output (with I/O filters
  already applied), and that therefore we read from cache files in
  binary mode, and we print both captured output and data retrieved from
  cache in ':raw' mode (in binmode).

* The key (human-readable unique id) for a page (for given capture) is
  passed explicitely to cache_output().  The cache_output() subroutine
  no longer depends of subroutines (like href()) and variables (like
  $cation or %actions) defined in gitweb.perl.

  This allows to simply use 'require <package>' instead of 'do $package_pm'
  (which also means that it can be installed as module in PERL5LIB etc.).

* Added tests for cache_output() behavior.

 gitweb/lib/GitwebCache/CacheOutput.pm |   64 ++++++++++++++++++++++++++++++
 t/t9505-gitweb-cache.sh               |   34 ++++++++++++++++
 t/t9505/test_cache_output.pl          |   70 +++++++++++++++++++++++++++++++++
 3 files changed, 168 insertions(+), 0 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/CacheOutput.pm
 create mode 100755 t/t9505-gitweb-cache.sh
 create mode 100755 t/t9505/test_cache_output.pl

diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
new file mode 100644
index 0000000..bba73ee
--- /dev/null
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -0,0 +1,64 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+#
+# This program is licensed under the GPLv2
+
+#
+# Capturing and caching (gitweb) output
+#
+
+# Capture output, save it in cache and print it, or retrieve it from
+# cache and print it.
+
+package GitwebCache::CacheOutput;
+
+use strict;
+use warnings;
+
+use GitwebCache::Capture::SelectFH qw(:all);
+
+use Exporter qw(import);
+our @EXPORT      = qw(cache_output capture_stop);
+our %EXPORT_TAGS = (all => [ @EXPORT ]);
+
+# cache_output($cache, $key, $action_code);
+#
+# Attempts to get $key from $cache; if successful, prints the value.
+# Otherwise, calls $action_code, capture its output and use
+# the captured output as the new value for $key in $cache,
+# then print captured output.
+#
+# It is assumed that captured data is already converted and it is
+# in ':raw' format (and thus restored in ':raw' from cache)
+our $CAPTURE_CLASS = 'GitwebCache::Capture::SelectFH';
+
+sub cache_output {
+	my ($cache, $key, $code) = @_;
+
+	my $data = $cache->get($key);
+
+	# capture and cache output, if there was nothing in the cache
+	if (!defined $data) {
+		my $capture = $CAPTURE_CLASS;
+		setup_capture($capture);
+
+		# do not use 'capture_block' prototype
+		$data = &capture_block($code);
+		$cache->set($key, $data) if defined $data;
+	}
+
+	# print cached data
+	if (defined $data) {
+		# select() instead of STDOUT is here for tests:
+		binmode select(), ':raw';
+		print $data;
+	}
+
+	return $data;
+}
+
+1;
+__END__
+# end of package GitwebCache::CacheOutput;
diff --git a/t/t9505-gitweb-cache.sh b/t/t9505-gitweb-cache.sh
new file mode 100755
index 0000000..55d3e17
--- /dev/null
+++ b/t/t9505-gitweb-cache.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Jakub Narebski
+#
+
+test_description='gitweb cache
+
+This test checks GitwebCache::CacheOutput Perl module that is
+responsible for capturing and caching gitweb output.'
+
+# for now we are running only cache interface tests
+. ./test-lib.sh
+
+# this test is present in gitweb-lib.sh
+if ! test_have_prereq PERL; then
+	skip_all='perl not available, skipping test'
+	test_done
+fi
+
+"$PERL_PATH" -MTest::More -e 0 >/dev/null 2>&1 || {
+	skip_all='perl module Test::More unavailable, skipping test'
+	test_done
+}
+
+# ----------------------------------------------------------------------
+
+# The external test will outputs its own plan
+test_external_has_tap=1
+
+test_external \
+	'GitwebCache::CacheOutput Perl API (in gitweb/lib/)' \
+	"$PERL_PATH" "$TEST_DIRECTORY"/t9505/test_cache_output.pl
+
+test_done
diff --git a/t/t9505/test_cache_output.pl b/t/t9505/test_cache_output.pl
new file mode 100755
index 0000000..0343591
--- /dev/null
+++ b/t/t9505/test_cache_output.pl
@@ -0,0 +1,70 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use Test::More;
+
+# test source version
+use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
+
+# ....................................................................
+
+# prototypes must be known at compile time, otherwise they do not work
+BEGIN { use_ok('GitwebCache::CacheOutput'); }
+
+# load GitwebCache::SimpleFileCache (gitweb.perl uses require too)
+# GitwebCache::Capture::SelectFH is loaded by GitwebCache::CacheOutput
+require_ok('GitwebCache::SimpleFileCache');
+
+# Test setting up $cache and $capture
+my $cache   = new_ok('GitwebCache::SimpleFileCache'   => [], 'The $cache  ');
+my $capture = new_ok('GitwebCache::Capture::SelectFH' => [], 'The $capture');
+
+# ......................................................................
+
+# Prepare for testing cache_output
+my $key = 'Key';
+my $action_output = <<'EOF';
+# This is data to be cached and shown
+EOF
+my $cached_output = <<"EOF";
+$action_output# (version recovered from cache)
+EOF
+sub action {
+	print $action_output;
+}
+
+# Catch output printed by cache_fetch
+# (only for 'print <sth>' and 'printf <sth>')
+sub capture_output_of_cache_output {
+	my $test_data = '';
+
+	open my $test_data_fh, '>', \$test_data;
+	my $oldfh = select($test_data_fh);
+
+	cache_output($cache, $key, \&action);
+
+	select($oldfh);
+	close $test_data_fh;
+
+	return $test_data;
+}
+
+# clean state
+$cache->set_expires_in(-1);
+$cache->remove($key);
+my $test_data;
+
+# first time (if there is no cache) generates cache entry
+$test_data = capture_output_of_cache_output();
+is($test_data, $action_output,        'action output is printed (generated)');
+is($cache->get($key), $action_output, 'action output is saved in cache (generated)');
+
+# second time (if cache is set/valid) reads from cache
+$cache->set($key, $cached_output);
+$test_data = capture_output_of_cache_output();
+is($test_data, $cached_output,        'action output is printed (from cache)');
+
+done_testing();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 08/17] gitweb: Add optional output caching
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (6 preceding siblings ...)
  2010-10-06 22:01 ` [PATCHv5 07/17] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:46   ` Ævar Arnfjörð Bjarmason
  2010-10-06 22:01 ` [PATCHv5 09/17] gitweb/lib - Adaptive cache expiration time Jakub Narebski
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

This commit actually adds output caching to gitweb, as we have now
minimal features required for it in GitwebCache::SimpleFileCache (a
'dumb' but fast file-based cache engine).  To enable cache you need
(at least) set $caching_enabled to true in gitweb config, and copy
caching modules alongside generated gitweb.cgi, e.g. by using 'make
install-gitweb'.  This is described in more detail in the new "Gitweb
caching" section in gitweb/README.

Caching in theory can be done using any Perl module that implements
Cache::Cache compatibile get/set (method) interface.  The default is
to use GitwebCache::SimpleFileCache.  Capturing and caching output is
done via cache_output subroutine from GitwebCache::CacheOutput.

The cache_output subroutine in GitwebCache::CacheOutput currently
captures output using GitwebCache::Capture::SelectFH package, which in
turn uses select(FILEHANDLE) to change default filehandle for output.
This means that a "print" or a "printf" (or a "write") in gitweb
source without a filehandle would be captured.  To change mode of
filehandle used for capturing correctly, "binmode STDOUT, <mode>" had
to be changed to "binmode select(), <mode>".

Capturing and caching is designed in such way that there is no
behaviour change if $caching_enabled is false.  If caching is not
enabled, then capturing is also turned off.

Enabling caching causes the following additional changes to gitweb
output:
* Disables content-type negotiation (choosing between 'text/html'
  mimetype and 'application/xhtml+xml') when caching, as there is no
  content-type negotiation done when retrieving page from cache.
  Use lowest common denominator of 'text/html' mimetype which can
  be used by all browsers.
* Disable optional timing info (how much time it took to generate the
  original page, and how many git commands it took), and in its place show
  unconditionally when page was originally generated (in GMT / UTC
  timezone).
* Disable 'blame_incremental' view, as it doesn't make sense without
  printing data as soon as it is generated (which would require tee-ing
  when capturing output for caching)... and it doesn't work currently
  anyway.

Add basic tests of caching support to t9500-gitweb-standalone-no-errors
test: set $caching_enabled to true and check for errors for first time
run (generating cache) and second time run (retrieving from cache) for a
single view - summary view for a project.


To be implemented (from original patch by J.H.):
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v4:
* Paths to modules to install gathered in GITWEB_MODULES variable in
  gitweb/Makefile do not include common 'lib/' prefix, following the
  change in "gitweb: Prepare for splitting gitweb"

* Separate preparing cache for output caching into configure_caching()
  subroutine; cache_output() is now called from dispatch(), following
  update to gitweb.

* Load needed module for cache (if $cache is set to name of module, it
  loads this module using 'eval "require $cache"'; note that if $cache
  is unset and $cache_enabled is true, then $cache is set to
  'GitwebCache::SimpleFileCache' before this).  

  This is connected with the fact that GitwebCache::CacheOutput no
  longer includes 'use GitwebCache::SimpleFileCache;'.

* Do not use die_error when configuring cache, as the code might be
  called before $cgi is set (though I think it is not the case with
  current gitweb version).

* Updated gitweb/README

* Remove spurious changes


Differences from relevant parts of J.H. patch:
* Capturing gitweb output is done without need to extensively modify
  gitweb; the only change related to capturing output is replacing 
  'binmode STDOUT, <mode>' with 'binmode select(), <mode>', required
  for currently used select($out) based capturing.

* Instead of using "binary" (sic!) valued $cache_enable (which means 0 or 1
  valued $cache_enable), a set of two variables is used.  The $cache
  variable can be used to select alternate caching engine / caching class.
  The $caching_enabled variable is used to actually enable/disable cache.

* The information about the time when page was generated is shown only if
  'timed' feature is enabled in gitweb config, and it is shown in place of
  usual time it took to generate page (shown when caching is not enabled).
  This means that change to gitweb/gitweb.css was not needed.

* cache_output() is run only when $caching_enabled.  Some of cache
  initializations, like creating cache instance, are in
  configure_cache() subroutine in gitweb.perl, and not at beginning of
  cache_output() (which was n.b. called cache_fetch() in J.H. patch).

* Cache options are contained in %cache_options hash, instead of individual
  global variables (which were using non-Perlish camelCase notation).

* There is information about caching in gitweb in gitweb/README

* Some of new features, like incremental blame view, are turned off
  when caching is enabled.

* There is simple test of output caching in gitweb in t9500, namely
  that gitweb runs without errors or warnings with caching enabled,
  both when saving and when restoring from cache.

 gitweb/Makefile                        |    6 ++
 gitweb/README                          |   57 +++++++++++++
 gitweb/gitweb.perl                     |  140 +++++++++++++++++++++++++++-----
 t/t9500-gitweb-standalone-no-errors.sh |   20 +++++
 4 files changed, 201 insertions(+), 22 deletions(-)

diff --git a/gitweb/Makefile b/gitweb/Makefile
index c2d72e4..ec14cd1 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -112,6 +112,12 @@ endif
 
 GITWEB_FILES += static/git-logo.png static/git-favicon.png
 
+# gitweb output caching
+GITWEB_MODULES += GitwebCache/CacheOutput.pm
+GITWEB_MODULES += GitwebCache/SimpleFileCache.pm
+GITWEB_MODULES += GitwebCache/Capture.pm
+GITWEB_MODULES += GitwebCache/Capture/SelectFH.pm
+
 GITWEB_REPLACE = \
 	-e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \
 	-e 's|++GIT_BINDIR++|$(bindir)|g' \
diff --git a/gitweb/README b/gitweb/README
index d481198..5e3a0bc 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -236,6 +236,11 @@ not include variables usually directly set during build):
    If server load exceed this value then return "503 Service Unavailable" error.
    Server load is taken to be 0 if gitweb cannot determine its value.  Set it to
    undefined value to turn it off.  The default is 300.
+ * $caching_enabled
+   If true, gitweb would use caching to speed up generating response.
+   Currently supported is only output (response) caching.  See "Gitweb caching"
+   section below for details on how to configure and customize caching.
+   The default is false (caching is disabled).
 
 
 Projects list file format
@@ -308,6 +313,58 @@ You can use the following files in repository:
    descriptions.
 
 
+Gitweb caching
+~~~~~~~~~~~~~~
+
+Currently gitweb supports only output (HTTP response) caching, similar
+to the one used on http://git.kernel.org.  To turn it on, set 
+$caching_enabled variable to true value in gitweb config file, i.e.:
+
+   our $caching_enabled = 1;
+
+You can choose which caching engine should gitweb use by setting $cache
+variable to _initialized_ instance of cache interface, e.g.:
+
+   use CHI;
+   our $cache = CHI->new( driver => 'Memcached',
+   	servers => [ "10.0.0.15:11211", "10.0.0.15:11212" ],
+   	l1_cache => { driver => 'FastMmap', root_dir => '/var/cache/gitweb' }
+   );
+
+Alternatively you can set $cache variable to the name of cache class,
+e.g.:
+
+   our $cache = 'Cache::FileCache';
+
+In this case caching engine should support Cache::Cache or CHI names for
+cache config (see below), and ignore unrecognized options.  Such caching
+engine should also implement (at least) ->get($key) and ->set($key, $data)
+methods (Cache::Cache and CHI compatible interface).
+
+If $cache is left unset (if it is left undefined), then gitweb would use
+GitwebCache::SimpleFileCache as caching engine.  This engine is 'dumb' (but
+fast) file based caching layer, currently without any support for cache size
+limiting, or even removing expired / grossly expired entries.  It has
+therefore the downside of requiring a huge amount of disk space if there are
+a number of repositories involved.  It is not uncommon for git.kernel.org to
+have on the order of 80G - 120G accumulate over the course of a few months.
+It is therefore recommended that the cache directory be periodically
+completely deleted; this operation is safe to perform.  Suggested mechanism
+(substitute $cachedir for actual path to gitweb cache):
+
+   # mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush
+
+Site-wide cache options are defined in %cache_options hash.  Those options
+apply only when $cache is unset (GitwebCache::SimpleFileCache is used), or
+if $cache is name of cache class (e.g. $cache = 'Cache::FileCache').  You
+can override cache options in gitweb config, e.g.:
+
+   $cache_options{'expires_in'} = 60; # 60 seconds = 1 minute
+
+Please read comments for %cache_options entries in gitweb/gitweb.perl for
+description of available cache options.
+
+
 Webserver configuration
 -----------------------
 
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e4c08ba..f81a4a2 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -263,6 +263,44 @@ our %highlight_ext = (
 	map { $_ => 'xml' } qw(xhtml html htm),
 );
 
+
+# This enables/disables the caching layer in gitweb.  Currently supported
+# is only output (response) caching, similar to the one used on git.kernel.org.
+our $caching_enabled = 0;
+# Set to _initialized_ instance of cache interface implementing (at least)
+# get($key) and set($key, $data) methods (Cache::Cache and CHI interfaces).
+# If unset, GitwebCache::SimpleFileCache would be used, which is 'dumb'
+# (but fast) file based caching layer, currently without any support for
+# cache size limiting.  It is therefore recommended that the cache directory
+# be periodically completely deleted; this operation is safe to perform.
+# Suggested mechanism:
+# mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush
+our $cache;
+# You define site-wide cache options defaults here; override them with
+# $GITWEB_CONFIG as necessary.
+our %cache_options = (
+	# The location in the filesystem that will hold the root of the cache.
+	# This directory will be created as needed (if possible) on the first
+	# cache set.  Note that either this directory must exists and web server
+	# has to have write permissions to it, or web server must be able to
+	# create this directory.
+	# Possible values:
+	# * 'cache' (relative to gitweb),
+	# * File::Spec->catdir(File::Spec->tmpdir(), 'gitweb-cache'),
+	# * '/var/cache/gitweb' (FHS compliant, requires being set up),
+	'cache_root' => 'cache',
+
+	# The number of subdirectories deep to cache object item.  This should be
+	# large enough that no cache directory has more than a few hundred
+	# objects.  Each non-leaf directory contains up to 256 subdirectories
+	# (00-ff).  Must be larger than 0.
+	'cache_depth' => 1,
+
+	# The (global) expiration time for objects placed in the cache, in seconds.
+	'expires_in' => 20,
+);
+
+
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
 our %feature = (
@@ -1055,7 +1093,15 @@ sub dispatch {
 	    !$project) {
 		die_error(400, "Project needed");
 	}
-	$actions{$action}->();
+
+	if ($caching_enabled) {
+		# human readable key identifying gitweb output
+		my $output_key = href(-replay => 1, -full => 1, -path_info => 0);
+
+		cache_output($cache, $output_key, $actions{$action});
+	} else {
+		$actions{$action}->();
+	}
 }
 
 sub reset_timer {
@@ -1071,6 +1117,8 @@ sub run_request {
 	evaluate_gitweb_config();
 	evaluate_git_version();
 	check_loadavg();
+	configure_caching()
+		if ($caching_enabled);
 
 	# $projectroot and $projects_list might be set in gitweb config file
 	$projects_list ||= $projectroot;
@@ -1143,6 +1191,38 @@ sub run {
 	1;
 }
 
+sub configure_caching {
+	if (!eval { require GitwebCache::CacheOutput; 1; }) {
+		# cache is configured _before_ handling request, so $cgi is not defined,
+		# so we can't just "die" with sending error message to web browser
+		#die_error(500, "Caching enabled and GitwebCache::CacheOutput not found");
+
+		# turn off caching and warn instead
+		$caching_enabled = 0;
+		warn "Caching enabled and GitwebCache::CacheOutput not found";
+	}
+	GitwebCache::CacheOutput->import();
+
+	# $cache might be initialized (instantiated) cache, i.e. cache object,
+	# or it might be name of class, or it might be undefined
+	unless (defined $cache && ref($cache)) {
+		$cache ||= 'GitwebCache::SimpleFileCache';
+		eval "require $cache";
+		die $@ if $@;
+		$cache = $cache->new({
+			%cache_options,
+			#'cache_root' => '/tmp/cache',
+			#'cache_depth' => 2,
+			#'expires_in' => 20, # in seconds (CHI compatibile)
+			# (Cache::Cache compatibile initialization)
+			'default_expires_in' => $cache_options{'expires_in'},
+			# (CHI compatibile initialization)
+			'root_dir' => $cache_options{'cache_root'},
+			'depth' => $cache_options{'cache_depth'},
+		});
+	}
+}
+
 run();
 
 if (defined caller) {
@@ -3405,7 +3485,9 @@ sub git_header_html {
 	# 'application/xhtml+xml', otherwise send it as plain old 'text/html'.
 	# we have to do this because MSIE sometimes globs '*/*', pretending to
 	# support xhtml+xml but choking when it gets what it asked for.
-	if (defined $cgi->http('HTTP_ACCEPT') &&
+	# Disable content-type negotiation when caching (use mimetype good for all).
+	if (!$caching_enabled &&
+	    defined $cgi->http('HTTP_ACCEPT') &&
 	    $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
 	    $cgi->Accept('application/xhtml+xml') != 0) {
 		$content_type = 'application/xhtml+xml';
@@ -3579,17 +3661,25 @@ sub git_footer_html {
 	}
 	print "</div>\n"; # class="page_footer"
 
-	if (defined $t0 && gitweb_check_feature('timed')) {
+	# timing info doesn't make much sense with output (response) caching,
+	# so when caching is enabled gitweb prints the time of page generation
+	if ((defined $t0 || $caching_enabled) &&
+	    gitweb_check_feature('timed')) {
 		print "<div id=\"generating_info\">\n";
-		print 'This page took '.
-		      '<span id="generating_time" class="time_span">'.
-		      Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
-		      ' seconds </span>'.
-		      ' and '.
-		      '<span id="generating_cmd">'.
-		      $number_of_git_cmds.
-		      '</span> git commands '.
-		      " to generate.\n";
+		if ($caching_enabled) {
+			print 'This page was generated at '.
+			      gmtime( time() )." GMT\n";
+		} else {
+			print 'This page took '.
+			      '<span id="generating_time" class="time_span">'.
+			      Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
+			      ' seconds </span>'.
+			      ' and '.
+			      '<span id="generating_cmd">'.
+			      $number_of_git_cmds.
+			      '</span> git commands '.
+			      " to generate.\n";
+		}
 		print "</div>\n"; # class="page_footer"
 	}
 
@@ -3598,8 +3688,8 @@ sub git_footer_html {
 	}
 
 	print qq!<script type="text/javascript" src="$javascript"></script>\n!;
-	if (defined $action &&
-	    $action eq 'blame_incremental') {
+	if (!$caching_enabled &&
+	    defined $action && $action eq 'blame_incremental') {
 		print qq!<script type="text/javascript">\n!.
 		      qq!startBlame("!. href(action=>"blame_data", -replay=>1) .qq!",\n!.
 		      qq!           "!. href() .qq!");\n!.
@@ -3640,6 +3730,10 @@ sub die_error {
 		500 => '500 Internal Server Error',
 		503 => '503 Service Unavailable',
 	);
+
+	# Do not cache error pages
+	capture_stop() if ($caching_enabled);
+
 	git_header_html($http_responses{$status}, undef, %opts);
 	print <<EOF;
 <div class="page_body">
@@ -5235,7 +5329,8 @@ sub git_tag {
 
 sub git_blame_common {
 	my $format = shift || 'porcelain';
-	if ($format eq 'porcelain' && $cgi->param('js')) {
+	if ($format eq 'porcelain' && $cgi->param('js') &&
+	    !$caching_enabled) {
 		$format = 'incremental';
 		$action = 'blame_incremental'; # for page title etc
 	}
@@ -5289,7 +5384,8 @@ sub git_blame_common {
 			or print "ERROR $!\n";
 
 		print 'END';
-		if (defined $t0 && gitweb_check_feature('timed')) {
+		if (!$caching_enabled &&
+		    defined $t0 && gitweb_check_feature('timed')) {
 			print ' '.
 			      Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
 			      ' '.$number_of_git_cmds;
@@ -5309,7 +5405,7 @@ sub git_blame_common {
 		$formats_nav .=
 			$cgi->a({-href => href(action=>"blame", javascript=>0, -replay=>1)},
 			        "blame") . " (non-incremental)";
-	} else {
+	} elsif (!$caching_enabled) {
 		$formats_nav .=
 			$cgi->a({-href => href(action=>"blame_incremental", -replay=>1)},
 			        "blame") . " (incremental)";
@@ -5468,7 +5564,7 @@ sub git_blame {
 }
 
 sub git_blame_incremental {
-	git_blame_common('incremental');
+	git_blame_common(!$caching_enabled ? 'incremental' : undef);
 }
 
 sub git_blame_data {
@@ -5548,9 +5644,9 @@ sub git_blob_plain {
 			($sandbox ? 'attachment' : 'inline')
 			. '; filename="' . $save_as . '"');
 	local $/ = undef;
-	binmode STDOUT, ':raw';
+	binmode select(), ':raw';
 	print <$fd>;
-	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+	binmode select(), ':utf8'; # as set at the beginning of gitweb.cgi
 	close $fd;
 }
 
@@ -5835,9 +5931,9 @@ sub git_snapshot {
 
 	open my $fd, "-|", $cmd
 		or die_error(500, "Execute git-archive failed");
-	binmode STDOUT, ':raw';
+	binmode select(), ':raw';
 	print <$fd>;
-	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+	binmode select(), ':utf8'; # as set at the beginning of gitweb.cgi
 	close $fd;
 }
 
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 4f2b9b0..c8b4286 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -676,4 +676,24 @@ test_expect_success HIGHLIGHT \
 	 gitweb_run "p=.git;a=blob;f=test.sh"'
 test_debug 'cat gitweb.log'
 
+# ----------------------------------------------------------------------
+# caching
+
+cat >>gitweb_config.perl <<\EOF
+$caching_enabled = 1;
+$cache_options{'expires_in'} = -1; # never expire cache for tests
+$cache_options{'cache_root'} = 'cache'; # to clear right thing
+EOF
+rm -rf cache
+
+test_expect_success \
+	'caching enabled (project summary, first run)' \
+	'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
+	'caching enabled (project summary, second run)' \
+	'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
 test_done
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 09/17] gitweb/lib - Adaptive cache expiration time
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (7 preceding siblings ...)
  2010-10-06 22:01 ` [PATCHv5 08/17] gitweb: Add optional output caching Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 10/21] gitweb/lib - Use CHI compatibile (compute method) caching interface Jakub Narebski
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

Add to GitwebCache::SimpleFileCache support for adaptive lifetime
(cache expiration) control.  Cache lifetime can be increased or
decreased by any factor, e.g. load average, through the definition
of the 'check_load' callback.

Note that using ->set_expires_in, or unsetting 'check_load' via
->set_check_load(undef) turns off adaptive caching.

Make gitweb automatically adjust cache lifetime by load, using
get_loadavg() function.  Define and describe default parameters for
dynamic (adaptive) cache expiration time control.

There are some very basic tests of dynamic expiration time in t9503,
namely checking if dynamic expire time is within given upper and lower
bounds.

To be implemented (from original patch by J.H.):
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
There are no significant difference from v4 (previous version)

Differences from relevant parts of J.H. patch:
* 'increase_factor' is configurable rather than hardcoded

* 'check_load' is passed via conctructor parameter; gitweb by default
  sets it to \&get_loadavg.  This means that the caching engine is not
  entangled with gitweb.

* options are stored in %cache_options, e.g. as 'expires_max' (compatible
  with Cache::Adaptive), and not as global variables with un-Perlish
  camelCase names like $maxCacheTime.

* API tests


 gitweb/gitweb.perl                        |   25 ++++++++-
 gitweb/lib/GitwebCache/SimpleFileCache.pm |   79 +++++++++++++++++++++++++++--
 t/t9503/test_cache_interface.pl           |   33 ++++++++++++
 3 files changed, 130 insertions(+), 7 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f81a4a2..e89f469 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -296,8 +296,29 @@ our %cache_options = (
 	# (00-ff).  Must be larger than 0.
 	'cache_depth' => 1,
 
-	# The (global) expiration time for objects placed in the cache, in seconds.
-	'expires_in' => 20,
+	# The (global) minimum expiration time for objects placed in the cache,
+	# in seconds.  If the dynamic adaptive cache exporation time is lower
+	# than this number, we set cache timeout to this minimum.
+	'expires_min' => 20,   # 20 seconds
+
+	# The (global) maximum expiration time for dynamic (adaptive) caching
+	# algorithm, in seconds.  If the adaptive cache lifetime exceeds this
+	# number, we set cache timeout to this maximum.
+	# (If 'expires_min' >= 'expires_max', there is no adaptive cache timeout,
+	# and 'expires_min' is used as expiration time for objects in cache.)
+	'expires_max' => 1200, # 20 minutes
+
+	# Cache lifetime will be increased by applying this factor to the result
+	# from 'check_load' callback (see below).
+	'expires_factor' => 60, # expire time in seconds for 1.0 (100% CPU) load
+
+	# User supplied callback for deciding the cache policy, usually system
+	# load.  Multiplied by 'expires_factor' gives adaptive expiration time,
+	# in seconds, subject to the limits imposed by 'expires_min' and
+	# 'expires_max' bounds.  Set to undef (or delete) to turn off dynamic
+	# lifetime control.
+	# (Compatibile with Cache::Adaptive.)
+	'check_load' => \&get_loadavg,
 );
 
 
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 6833699..72ab77a 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -61,6 +61,22 @@ our $DEFAULT_NAMESPACE = '';
 #    'expires_in' (CHI compatibile) [seconds]
 #    The expiration time for objects place in the cache.
 #    Defaults to -1 (never expire) if not explicitly set.
+#    Sets 'expires_min' to given value.
+#  * 'expires_min' [seconds]
+#    The minimum expiration time for objects in cache (e.g. with 0% CPU load).
+#    Used as lower bound in adaptive cache lifetime / expiration.
+#    Defaults to 20 seconds; 'expires_in' sets it also.
+#  * 'expires_max' [seconds]
+#    The maximum expiration time for objects in cache.
+#    Used as upper bound in adaptive cache lifetime / expiration.
+#    Defaults to 1200 seconds, if not set; 
+#    defaults to 'expires_min' if 'expires_in' is used.
+#  * 'check_load'
+#    Subroutine (code) used for adaptive cache lifetime / expiration.
+#    If unset, adaptive caching is turned off; defaults to unset.
+#  * 'increase_factor' [seconds / 100% CPU load]
+#    Factor multiplying 'check_load' result when calculating cache lietime.
+#    Defaults to 60 seconds for 100% SPU load ('check_load' returning 1.0).
 sub new {
 	my ($proto, $p_options_hash_ref) = @_;
 
@@ -68,7 +84,8 @@ sub new {
 	my $self  = {};
 	$self = bless($self, $class);
 
-	my ($root, $depth, $ns, $expires_in);
+	my ($root, $depth, $ns);
+	my ($expires_min, $expires_max, $increase_factor, $check_load);
 	if (defined $p_options_hash_ref) {
 		$root  =
 			$p_options_hash_ref->{'cache_root'} ||
@@ -77,19 +94,31 @@ sub new {
 			$p_options_hash_ref->{'cache_depth'} ||
 			$p_options_hash_ref->{'depth'};
 		$ns    = $p_options_hash_ref->{'namespace'};
-		$expires_in =
+		$expires_min =
+			$p_options_hash_ref->{'expires_min'} ||
 			$p_options_hash_ref->{'default_expires_in'} ||
 			$p_options_hash_ref->{'expires_in'};
+		$expires_max =
+			$p_options_hash_ref->{'expires_max'};
+		$increase_factor = $p_options_hash_ref->{'expires_factor'};
+		$check_load      = $p_options_hash_ref->{'check_load'};
 	}
 	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
 	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
 	$ns    = $DEFAULT_NAMESPACE   unless defined($ns);
-	$expires_in = -1 unless defined($expires_in); # <0 means never
+	$expires_min = -1 unless defined($expires_min);
+	$expires_max = $expires_min
+		if (!defined($expires_max) && exists $p_options_hash_ref->{'expires_in'});
+	$expires_max = -1 unless (defined($expires_max));
+	$increase_factor = 60 unless defined($increase_factor);
 
 	$self->set_root($root);
 	$self->set_depth($depth);
 	$self->set_namespace($ns);
-	$self->set_expires_in($expires_in);
+	$self->set_expires_min($expires_min);
+	$self->set_expires_max($expires_max);
+	$self->set_increase_factor($increase_factor);
+	$self->set_check_load($check_load);
 
 	return $self;
 }
@@ -100,7 +129,7 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace expires_in)) {
+foreach my $i (qw(depth root namespace expires_min expires_max increase_factor check_load)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -113,6 +142,46 @@ foreach my $i (qw(depth root namespace expires_in)) {
 	};
 }
 
+# ......................................................................
+# pseudo-accessors
+
+# returns adaptive lifetime of cache entry for given $key [seconds]
+sub get_expires_in {
+	my ($self) = @_;
+
+	# short-circuit
+	if (!defined $self->{'check_load'} ||
+	    $self->{'expires_max'} <= $self->{'expires_min'}) {
+		return $self->{'expires_min'};
+	}
+
+	my $expires_in =
+		#$self->{'expires_min'} +
+		$self->{'increase_factor'} * $self->check_load();
+
+	if ($expires_in < $self->{'expires_min'}) {
+		return $self->{'expires_min'};
+	} elsif ($expires_in > $self->{'expires_max'}) {
+		return $self->{'expires_max'};
+	}
+
+	return $expires_in;
+}
+
+# sets expiration time to $duration, turns off adaptive cache lifetime
+sub set_expires_in {
+	my ($self, $duration) = @_;
+
+	$self->{'expires_min'} = $self->{'expires_max'} = $duration;
+}
+
+# runs 'check_load' subroutine, for adaptive cache lifetime.
+# Note: check in caller that 'check_load' exists.
+sub check_load {
+	my $self = shift;
+	return $self->{'check_load'}->();
+}
+
 # ----------------------------------------------------------------------
 # utility functions and methods
 
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index adca88d..0ac0ed7 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -97,4 +97,37 @@ subtest 'cache expiration' => sub {
 	done_testing();
 };
 
+# Test assertions for adaptive cache expiration
+#
+my $load = 0.0;
+sub load { return $load; }
+my $expires_min = 10;
+my $expires_max = 30;
+$cache->set_expires_in(-1);
+$cache->set_expires_min($expires_min);
+$cache->set_expires_max($expires_max);
+$cache->set_check_load(\&load);
+subtest 'adaptive cache expiration' => sub {
+	cmp_ok($cache->get_expires_min(), '==', $expires_min,
+	       '"expires min" set correctly');
+	cmp_ok($cache->get_expires_max(), '==', $expires_max,
+	       '"expires max" set correctly');
+
+	$load = 0.0;
+	cmp_ok($cache->get_expires_in(), '>=', $expires_min,
+	       '"expires in" bound from down for load=0');
+	cmp_ok($cache->get_expires_in(), '<=', $expires_max,
+	       '"expires in" bound from up   for load=0');
+	
+	$load = 1_000;
+	cmp_ok($cache->get_expires_in(), '>=', $expires_min,
+	       '"expires in" bound from down for heavy load');
+	cmp_ok($cache->get_expires_in(), '<=', $expires_max,
+	       '"expires in" bound from up   for heavy load');
+
+	done_testing();
+};
+
+$cache->set_expires_in(-1);
+
 done_testing();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 10/21] gitweb/lib - Use CHI compatibile (compute method) caching interface
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (8 preceding siblings ...)
  2010-10-06 22:01 ` [PATCHv5 09/17] gitweb/lib - Adaptive cache expiration time Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 11/17] gitweb/lib - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

If $cache provides CHI compatible ->compute($key, $code) method, use it
instead of Cache::Cache compatible ->get($key) and ->set($key, $data).

GitwebCache::SimpleFileCache provides 'compute' method, which currently
simply use 'get' and 'set' methods in proscribed manner.  Nevertheless
'compute' method can be more flexible in choosing when to refresh cache,
and which process is to refresh/(re)generate cache entry.  This method
would use (advisory) locking to prevent 'cache miss stampede' (aka
'stampeding herd') problem in the next commit.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch has no differences since v4.

This patch doesn't strictly have an equivalent in J.H. patch adding
caching to gitweb.


 gitweb/lib/GitwebCache/CacheOutput.pm |   31 +++++++++++++++++++++++++++++++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
index bba73ee..7a13b2f 100644
--- a/gitweb/lib/GitwebCache/CacheOutput.pm
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -37,6 +37,37 @@ our $CAPTURE_CLASS = 'GitwebCache::Capture::SelectFH';
 sub cache_output {
 	my ($cache, $key, $code) = @_;
 
+	if ($cache->can('compute')) {
+		#other solution: goto &cache_output_compute;
+		return cache_output_compute($cache, $key, $code);
+	} else {
+		#other solution: goto &cache_output_get_set;
+		return cache_output_get_set($cache, $key, $code);
+	}
+}
+
+# for $cache which can ->compute($key, $code)
+sub cache_output_compute {
+	my ($cache, $key, $code) = @_;
+
+	my $capture = $CAPTURE_CLASS;
+	setup_capture($capture);
+
+	my $data = $cache->compute($key, sub { &capture_block($code) });
+
+	if (defined $data) {
+		# select() instead of STDOUT is here for t9503 test:
+		binmode select(), ':raw';
+		print $data;
+	}
+
+	return $data;
+}
+
+# for $cache which can ->get($key) and ->set($key, $data)
+sub cache_output_get_set {
+	my ($cache, $key, $code) = @_;
+
 	my $data = $cache->get($key);
 
 	# capture and cache output, if there was nothing in the cache
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 11/17] gitweb/lib - Use locking to avoid 'cache miss stampede' problem
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (9 preceding siblings ...)
  2010-10-06 22:01 ` [PATCHv5 10/21] gitweb/lib - Use CHI compatibile (compute method) caching interface Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 12/17] gitweb/lib - No need for File::Temp when locking Jakub Narebski
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

Create GitwebCache::FileCacheWithLocking module (class), derived from
GitwebCache::SimpleFileCache, where the ->compute($key, $code) method
use locking (via flock) to ensure that only one process would generate
data to update/fill-in cache; the rest would wait for the cache to
be (re)generated and would read data from cache.  If process that was
to (re)generate data dies or exits, one of the readers would take its
role.

Currently this feature can not be disabled via %cache_options,
although you can set $cache to 'GitwebCache::SimpleFileCache' instead.
Future new features (like: serving stale data while cache is being
regenerated, (re)generating cache in background, activity indicator)
all depend on locking.


A test in t9503 shows that in the case where there are two clients
trying to simultaneously access non-existent or stale cache entry,
(and generating data takes (artifically) a bit of time), if they are
using ->compute method the data is (re)generated once, as opposed to
if those clients are just using ->get/->set methods.

To be implemented (from original patch by J.H.):
* background building, and showing stale cache
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Note that there is slight inefficiency, in that filename for lockfile
depends on the filename for cache entry (it just adds '.lock' suffix),
but is recalculated from $key for both paths.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
The only difference from v4, besides not including 'lib/' prefix
when adding GitwebCache/FileCacheWithLocking.pm to GITWEB_MODULES
is that GitwebCache::FileCacheWithLocking module is automatically
loaded (require'd) by configure_cache() subroutine in gitweb.


Differences from relevant parts of J.H. patch:
* Locking on separate *.lock file, for simpler work and robustness;
  acquiring exclusive lock might require having file opened for
  possible write, following

    "File Locking Tricks and Traps" (http://perl.plover.com/yak/flock/)

  J.H. patch used separate *.bg file for lock only for background
  caching (to be implemented in one of next commits).

* Ensure that ->compute() delivers $data, unless $code fails.  See
  above about adding loop in ->compute() method.

* Single variable $lock_state, in place of $lockingStatus,
  $lockStatBG, $lockStat, $lockStatus in J.H. patch.

* Use indirect (lexical) filehandles for lockfiles, instead of
  barewords, i.e. global filehandles:

    open my $lock_fh, '+>', $lockfile;

  rather than equivalent of

    open LOCK, '>:utf8', $lockfile

* Do not use LOCK_UN: closing lockfile would unlock.  LOCK_UN doesn't
  always work as intended.

* You can turn off locking, but only by changing $cache to be
  'GitwebCache::SimpleFileCache' again.

* In my opinion much cleaner flow control

* Tests that locking work correctly.

 gitweb/Makefile                                |    1 +
 gitweb/README                                  |    6 +-
 gitweb/gitweb.perl                             |    4 +-
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   95 ++++++++++++++
 t/t9503/test_cache_interface.pl                |  156 +++++++++++++++++++++++-
 5 files changed, 257 insertions(+), 5 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/FileCacheWithLocking.pm

diff --git a/gitweb/Makefile b/gitweb/Makefile
index ec14cd1..a5327ba 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -115,6 +115,7 @@ GITWEB_FILES += static/git-logo.png static/git-favicon.png
 # gitweb output caching
 GITWEB_MODULES += GitwebCache/CacheOutput.pm
 GITWEB_MODULES += GitwebCache/SimpleFileCache.pm
+GITWEB_MODULES += GitwebCache/FileCacheWithLocking.pm
 GITWEB_MODULES += GitwebCache/Capture.pm
 GITWEB_MODULES += GitwebCache/Capture/SelectFH.pm
 
diff --git a/gitweb/README b/gitweb/README
index 5e3a0bc..8e664c5 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -341,8 +341,12 @@ cache config (see below), and ignore unrecognized options.  Such caching
 engine should also implement (at least) ->get($key) and ->set($key, $data)
 methods (Cache::Cache and CHI compatible interface).
 
+You can set $cache to 'GitwebCache::SimpleFileCache' if you don't want
+to use locking, but then some advanced features, like generating data in
+background, wouldn't work because they require locking.
+
 If $cache is left unset (if it is left undefined), then gitweb would use
-GitwebCache::SimpleFileCache as caching engine.  This engine is 'dumb' (but
+GitwebCache::FileCacheWithLocking as caching engine.  This engine is 'dumb' (but
 fast) file based caching layer, currently without any support for cache size
 limiting, or even removing expired / grossly expired entries.  It has
 therefore the downside of requiring a huge amount of disk space if there are
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e89f469..5da9651 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -269,7 +269,7 @@ our %highlight_ext = (
 our $caching_enabled = 0;
 # Set to _initialized_ instance of cache interface implementing (at least)
 # get($key) and set($key, $data) methods (Cache::Cache and CHI interfaces).
-# If unset, GitwebCache::SimpleFileCache would be used, which is 'dumb'
+# If unset, GitwebCache::FileCacheWithLocking would be used, which is 'dumb'
 # (but fast) file based caching layer, currently without any support for
 # cache size limiting.  It is therefore recommended that the cache directory
 # be periodically completely deleted; this operation is safe to perform.
@@ -1227,7 +1227,7 @@ sub configure_caching {
 	# $cache might be initialized (instantiated) cache, i.e. cache object,
 	# or it might be name of class, or it might be undefined
 	unless (defined $cache && ref($cache)) {
-		$cache ||= 'GitwebCache::SimpleFileCache';
+		$cache ||= 'GitwebCache::FileCacheWithLocking';
 		eval "require $cache";
 		die $@ if $@;
 		$cache = $cache->new({
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
new file mode 100644
index 0000000..c91c0ee
--- /dev/null
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -0,0 +1,95 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Gitweb caching engine, simple file-based cache, with locking
+#
+
+# Based on GitwebCache::SimpleFileCache, minimalistic cache that
+# stores data in the filesystem, without serialization.
+#
+# It uses file locks (flock) to have only one process generating data
+# and writing to cache, when using CHI interface ->compute() method.
+
+package GitwebCache::FileCacheWithLocking;
+use base qw(GitwebCache::SimpleFileCache);
+
+use strict;
+use warnings;
+
+use File::Path qw(mkpath);
+use Fcntl qw(:flock);
+
+# ......................................................................
+# constructor is inherited from GitwebCache::SimpleFileCache
+
+# ----------------------------------------------------------------------
+# utility functions and methods
+
+# Take an human readable key, and return path to be used for lockfile
+# Puts dirname of file path in second argument, if it is provided.
+sub get_lockname {
+	my $self = shift;
+
+	return $self->path_to_key(@_) . '.lock';
+}
+
+# ......................................................................
+# interface methods
+
+# $data = $cache->compute($key, $code);
+#
+# Combines the get and set operations in a single call.  Attempts to
+# get $key; if successful, returns the value.  Otherwise, calls $code
+# and uses the return value as the new value for $key, which is then
+# returned.
+#
+# Uses file locking to have only one process updating value for $key
+# to avoid 'cache miss stampede' (aka 'stampeding herd') problem.
+sub compute {
+	my ($self, $key, $code) = @_;
+
+	my $data = $self->get($key);
+	return $data if defined $data;
+
+	my $dir;
+	my $lockfile = $self->get_lockname($key, \$dir);
+
+	# ensure that directory leading to lockfile exists
+	if (!-d $dir) {
+		eval { mkpath($dir, 0, 0777); 1 }
+			or die "Couldn't mkpath '$dir' for lockfile: $!";
+	}
+
+	# this loop is to protect against situation where process that
+	# acquired exclusive lock (writer) dies or exits (die_error)
+	# before writing data to cache
+	my $lock_state; # needed for loop condition
+	do {
+		open my $lock_fh, '+>', $lockfile
+			or die "Could't open lockfile '$lockfile': $!";
+		$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
+		if ($lock_state) {
+			# acquired writers lock
+			$data = $code->($self, $key);
+			$self->set($key, $data);
+		} else {
+			# get readers lock
+			flock($lock_fh, LOCK_SH);
+			$data = $self->fetch($key);
+		}
+		# closing lockfile releases lock
+		close $lock_fh
+			or die "Could't close lockfile '$lockfile': $!";
+	} until (defined $data || $lock_state);
+	# repeat until we have data, or we tried generating data oneself and failed
+	return $data;
+}
+
+1;
+__END__
+# end of package GitwebCache::FileCacheWithLocking
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 0ac0ed7..4a941d8 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -11,8 +11,9 @@ use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
 
 # Test creating a cache
 #
-BEGIN { use_ok('GitwebCache::SimpleFileCache'); }
-my $cache = new_ok('GitwebCache::SimpleFileCache');
+BEGIN { use_ok('GitwebCache::FileCacheWithLocking'); }
+my $cache = new_ok('GitwebCache::FileCacheWithLocking');
+isa_ok($cache, 'GitwebCache::SimpleFileCache');
 
 # Test that default values are defined
 #
@@ -130,4 +131,155 @@ subtest 'adaptive cache expiration' => sub {
 
 $cache->set_expires_in(-1);
 
+# Test 'stampeding herd' / 'cache miss stampede' problem
+# (probably should be run only if GIT_TEST_LONG)
+#
+my $slow_time = 1; # how many seconds to sleep in mockup of slow generation
+sub get_value_slow {
+	$call_count++;
+	sleep $slow_time;
+	return $value;
+}
+
+sub cache_get_set {
+	my ($cache, $key) = @_;
+
+	my $data = $cache->get($key);
+	if (!defined $data) {
+		$data = get_value_slow();
+		$cache->set($key, $data);
+	}
+
+	return $data;
+}
+
+sub cache_compute {
+	my ($cache, $key) = @_;
+
+	my $data = $cache->compute($key, \&get_value_slow);
+	return $data;
+}
+
+sub run_child {
+	my ($writer, $data) = @_;
+
+	print $writer "$call_count\0";
+	print $writer "$data\0";
+	$writer->flush();
+
+	exit 0;
+}
+
+sub run_parent {
+	my ($reader, $data, $child) = @_;
+
+	local $/ = "\0";
+
+	my @lines = <$reader>;
+	chomp @lines;
+
+	waitpid $child, 0;
+	close $reader;
+
+	my ($child_count, $child_data) = @lines;
+	return ($child_count, $child_data);
+}
+
+sub count_total {
+	my ($kid_fh, $data, $pid) = @_;
+
+	if ($pid) {
+		my ($child_count, $child_data) =
+			run_parent($kid_fh, $data, $pid);
+		$call_count += $child_count;
+
+	} else {
+		run_child(\*STDOUT, $data);
+	}
+}
+
+my ($pid, $kid_fh);
+
+$call_count = 0;
+$cache->remove($key);
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 1
+		unless defined $pid;
+
+	my $data = cache_get_set($cache, $key);
+	count_total($kid_fh, $data, $pid);
+
+	cmp_ok($call_count, '==', 2, 'parallel get/set: get_value_slow() called twice');
+}
+
+$call_count = 0;
+$cache->remove($key);
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 1
+		unless defined $pid;
+
+	my $data = cache_compute($cache, $key);
+	count_total($kid_fh, $data, $pid);
+
+	cmp_ok($call_count, '==', 1, 'parallel compute: get_value_slow() called once');
+}
+
+
+# Test that it doesn't hang if get_action exits/dies
+#
+sub get_value_die {
+	$call_count++;
+	die "get_value_die\n";
+}
+
+sub cache_compute_catch {
+	my ($cache, $key) = @_;
+
+	eval { $cache->compute($key, \&get_value_die); };
+	my $eval_error = $@;
+	return $eval_error;
+}
+
+sub catch_errors {
+	my ($kid_fh, $eval_error, $pid) = @_;
+
+	my $child_eval_error;
+	if ($pid) {
+		(undef, $child_eval_error) =
+			run_parent($kid_fh, $eval_error, $pid);
+
+	} else {
+		run_child(\*STDOUT, $eval_error);
+	}
+
+	return ($eval_error, $child_eval_error);
+}
+
+$call_count = 0;
+my $result = eval {
+	$pid = open $kid_fh, '-|';
+ SKIP: {
+		skip "cannot fork: $!", 2
+			unless defined $pid;
+
+		local $SIG{ALRM} = sub { die "alarm\n"; };
+		alarm 4*$slow_time;
+
+		my $eval_error = cache_compute_catch($cache, 'No Key');
+		my $child_eval_error;
+
+		($eval_error, $child_eval_error) =
+			catch_errors($kid_fh, $eval_error, $pid);
+
+		is($eval_error,       "get_value_die\n", 'get_value_die() died in parent');
+		is($child_eval_error, "get_value_die\n", 'get_value_die() died in child');
+
+		alarm 0;
+	}
+};
+ok(!$@, 'no alarm call (neither process hung)');
+diag($@) if $@;
+
 done_testing();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 12/17] gitweb/lib - No need for File::Temp when locking
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (10 preceding siblings ...)
  2010-10-06 22:01 ` [PATCHv5 11/17] gitweb/lib - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 13/17] gitweb/lib - Serve stale data when waiting for filling cache Jakub Narebski
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

When using locking to ensure that only one process is generating data
and updating cache, there is no need to use File::Temp for temporary
file.  This should improve performance.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch was not present in previous v4 version of this series.  It
brings this solution closer to the one in J.H. patch, because with
locking it does not use File::Temp; the locking is safety enough by
itself.

 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index c91c0ee..e69047e 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -38,6 +38,22 @@ sub get_lockname {
 	return $self->path_to_key(@_) . '.lock';
 }
 
+# ----------------------------------------------------------------------
+# "private" utility functions and methods
+
+# take a file path to cache entry, and its directory
+# return filehandle and filename of open temporary file,
+# like File::Temp::tempfile
+sub _tempfile_to_path {
+	my ($file, $dir) = @_;
+
+	my $tempname = "$file.tmp";
+	open my $temp_fh, '>', $tempname
+		or die "Couldn't open temporary file '$tempname' for writing: $!";
+
+	return ($temp_fh, $tempname);
+}
+
 # ......................................................................
 # interface methods
 
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 13/17] gitweb/lib - Serve stale data when waiting for filling cache
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (11 preceding siblings ...)
  2010-10-06 22:01 ` [PATCHv5 12/17] gitweb/lib - No need for File::Temp when locking Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:01 ` [PATCHv5 14/17] gitweb/lib - Regenerate (refresh) cache in background Jakub Narebski
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

When process fails to acquire exclusive (writers) lock, then instead
of waiting for the other process to (re)generate and fill cache, serve
stale (expired) data from cache.  This is of course possible only if
there is some stale data in cache for given key.

This feature of GitwebCache::FileCacheWithLocking is used only for an
->update($key, $code) method.  It is controlled by 'max_lifetime'
cache parameter; you can set it to -1 to always serve stale data
if it exists, and you can set it to 0 (or any value smaller than
'expires_min') to turn this feature off.

This feature, as it is implemented currently, makes ->update() method a
bit asymmetric with respect to process that acquired writers lock and
those processes that didn't, which can be seen in the new test in t9503.
The process that is to regenerate (refresh) data in cache must wait for
the data to be generated in full before showing anything to client, while
the other processes show stale (expired) data immediately.  In order to
remove or reduce this asymmetry gitweb would need to employ one of the two
alternate solutions.  Either data should be (re)generated in background,
so that process that acquired writers lock would generate data in
background while serving stale data, or alternatively the process that
generates data should pass output to original STDOUT while capturing it
("tee" output).

When developing this feature, ->is_valid() method in base class
GitwebCache::SimpleFileCache acquired additional extra optional
parameter, where one can pass expire time instead of using whole-cache
global (adaptive) expire time.

To be implemented (from original patch by J.H.):
* background building,
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
No changes since v4 of gitweb caching series.


Differences from relevant parts of J.H. patch:
* Instead of using $maxCacheLife, use 'max_lifetime' option (with
  'max_cache_lifetime' as optional spelling in GitwebCache::SimpleFileCache
  constructor).

* Use 5*60*60 seconds for 5 hours, rather than 18000 (or 18_000) seconds.

* Serving stale data was enabled only when background cache regeneration was
  enabled; here it is enabled for readers regardless whether background
  generation (introduced in next commit) is turned on or not. 

* It is explicit that serving stale data when some process is
  regenerating cache is possible only with locking enabled, i.e. when
  using GitwebCache::FileCacheWithLocking.

* Tests of API

 gitweb/gitweb.perl                             |    8 ++
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   94 +++++++++++++++++++++++-
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |    9 +-
 t/t9503/test_cache_interface.pl                |   58 ++++++++++++++-
 4 files changed, 160 insertions(+), 9 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 5da9651..c920656 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -319,6 +319,14 @@ our %cache_options = (
 	# lifetime control.
 	# (Compatibile with Cache::Adaptive.)
 	'check_load' => \&get_loadavg,
+
+	# Maximum cache file life, in seconds.  If cache entry lifetime exceeds
+	# this value, it wouldn't be served as being too stale when waiting for
+	# cache to be regenerated/refreshed, instead of trying to display
+	# existing cache date.
+	# Set it to -1 to always serve existing data if it exists,
+	# set it to 0 to turn off serving stale data - always wait.
+	'max_lifetime' => 5*60*60, # 5 hours
 );
 
 
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index e69047e..73cc9ec 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -25,7 +25,88 @@ use File::Path qw(mkpath);
 use Fcntl qw(:flock);
 
 # ......................................................................
-# constructor is inherited from GitwebCache::SimpleFileCache
+# constructor
+
+# The options are set by passing in a reference to a hash containing
+# any of the following keys:
+#  * 'namespace'
+#    The namespace associated with this cache.  This allows easy separation of
+#    multiple, distinct caches without worrying about key collision.  Defaults
+#    to $DEFAULT_NAMESPACE.
+#  * 'cache_root' (Cache::FileCache compatibile),
+#    'root_dir' (CHI::Driver::File compatibile),
+#    The location in the filesystem that will hold the root of the cache.
+#    Defaults to $DEFAULT_CACHE_ROOT.
+#  * 'cache_depth' (Cache::FileCache compatibile),
+#    'depth' (CHI::Driver::File compatibile),
+#    The number of subdirectories deep to cache object item.  This should be
+#    large enough that no cache directory has more than a few hundred objects.
+#    Defaults to $DEFAULT_CACHE_DEPTH unless explicitly set.
+#  * 'default_expires_in' (Cache::Cache compatibile),
+#    'expires_in' (CHI compatibile) [seconds]
+#    The expiration time for objects place in the cache.
+#    Defaults to -1 (never expire) if not explicitly set.
+#    Sets 'expires_min' to given value.
+#  * 'expires_min' [seconds]
+#    The minimum expiration time for objects in cache (e.g. with 0% CPU load).
+#    Used as lower bound in adaptive cache lifetime / expiration.
+#    Defaults to 20 seconds; 'expires_in' sets it also.
+#  * 'expires_max' [seconds]
+#    The maximum expiration time for objects in cache.
+#    Used as upper bound in adaptive cache lifetime / expiration.
+#    Defaults to 1200 seconds, if not set; 
+#    defaults to 'expires_min' if 'expires_in' is used.
+#  * 'check_load'
+#    Subroutine (code) used for adaptive cache lifetime / expiration.
+#    If unset, adaptive caching is turned off; defaults to unset.
+#  * 'increase_factor' [seconds / 100% CPU load]
+#    Factor multiplying 'check_load' result when calculating cache lietime.
+#    Defaults to 60 seconds for 100% SPU load ('check_load' returning 1.0).
+#
+# (all the above are inherited from GitwebCache::SimpleFileCache)
+#
+#  * 'max_lifetime' [seconds]
+#    If it is greater than 0, and cache entry is expired but not older
+#    than it, serve stale data when waiting for cache entry to be 
+#    regenerated (refreshed).  Non-adaptive.
+#    Defaults to -1 (never expire / always serve stale).
+sub new {
+	my ($proto, $p_options_hash_ref) = @_;
+
+	my $class = ref($proto) || $proto;
+	my $self = $class->SUPER::new($p_options_hash_ref);
+
+	my ($max_lifetime);
+	if (defined $p_options_hash_ref) {
+		$max_lifetime =
+			$p_options_hash_ref->{'max_lifetime'} ||
+			$p_options_hash_ref->{'max_cache_lifetime'};
+	}
+	$max_lifetime = -1 unless defined($max_lifetime);
+
+	$self->set_max_lifetime($max_lifetime);
+
+	return $self;
+}
+
+# ......................................................................
+# accessors
+
+# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
+
+# creates get_depth() and set_depth($depth) etc. methods
+foreach my $i (qw(max_lifetime)) {
+	my $field = $i;
+	no strict 'refs';
+	*{"get_$field"} = sub {
+		my $self = shift;
+		return $self->{$field};
+	};
+	*{"set_$field"} = sub {
+		my ($self, $value) = @_;
+		$self->{$field} = $value;
+	};
+}
 
 # ----------------------------------------------------------------------
 # utility functions and methods
@@ -94,9 +175,14 @@ sub compute {
 			$data = $code->($self, $key);
 			$self->set($key, $data);
 		} else {
-			# get readers lock
-			flock($lock_fh, LOCK_SH);
-			$data = $self->fetch($key);
+			# try to retrieve stale data
+			$data = $self->fetch($key)
+				if $self->is_valid($key, $self->get_max_lifetime());
+			if (!defined $data) {
+				# get readers lock if there is no stale data to serve
+				flock($lock_fh, LOCK_SH);
+				$data = $self->fetch($key);
+			}
 		}
 		# closing lockfile releases lock
 		close $lock_fh
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 72ab77a..39f24f8 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -350,12 +350,13 @@ sub remove {
 		or die "Couldn't remove file '$file': $!";
 }
 
-# $cache->is_valid($key)
+# $cache->is_valid($key[, $expires_in])
 #
 # Returns a boolean indicating whether $key exists in the cache
-# and has not expired (global per-cache 'expires_in').
+# and has not expired.  Uses global per-cache expires time, unless
+# passed optional $expires_in argument.
 sub is_valid {
-	my ($self, $key) = @_;
+	my ($self, $key, $expires_in) = @_;
 
 	my $path = $self->path_to_key($key);
 
@@ -368,7 +369,7 @@ sub is_valid {
 	return 0 unless ((stat(_))[7] > 0);
 
 	# expire time can be set to never
-	my $expires_in = $self->get_expires_in();
+	$expires_in = defined $expires_in ? $expires_in : $self->get_expires_in();
 	return 1 unless (defined $expires_in && $expires_in >= 0);
 
 	# is file expired?
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 4a941d8..303bcf0 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -12,7 +12,9 @@ use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
 # Test creating a cache
 #
 BEGIN { use_ok('GitwebCache::FileCacheWithLocking'); }
-my $cache = new_ok('GitwebCache::FileCacheWithLocking');
+my $cache = new_ok('GitwebCache::FileCacheWithLocking', [ {
+	'max_lifetime' => 0, # turn it off
+} ]);
 isa_ok($cache, 'GitwebCache::SimpleFileCache');
 
 # Test that default values are defined
@@ -282,4 +284,58 @@ my $result = eval {
 ok(!$@, 'no alarm call (neither process hung)');
 diag($@) if $@;
 
+# Test that cache returns stale data in existing but expired cache situation
+# (probably should be run only if GIT_TEST_LONG)
+#
+my $stale_value = 'Stale Value';
+$cache->set($key, $stale_value);
+$call_count = 0;
+sub run_compute_forked {
+	my ($pid, $kid_fh) = @_;
+
+	my $data = $cache->compute($key, \&get_value_slow);
+
+	my $child_data;
+	if ($pid) {
+		(undef, $child_data) =
+			run_parent($kid_fh, $data, $pid);
+
+	} else {
+		run_child(\*STDOUT, $data);
+	}
+
+	return ($data, $child_data);
+}
+$cache->set_expires_in(0);    # expire now
+$cache->set_max_lifetime(-1); # forever (always serve stale data)
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 2
+		unless defined $pid;
+
+	my ($data, $child_data) = run_compute_forked($pid, $kid_fh);
+
+	# returning stale data works
+	ok($data eq $stale_value || $child_data eq $stale_value,
+	   'stale data in at least one process when expired');
+
+	$cache->set_expires_in(-1); # never expire for next ->get
+	is($cache->get($key), $value, 'value got set correctly, even if stale data returned');
+}
+
+$cache->set_expires_in(0);   # expire now
+$cache->set_max_lifetime(0); # don't serve stale data
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 2
+		unless defined $pid;
+
+	my ($data, $child_data) = run_compute_forked($pid, $kid_fh);
+
+	# no returning stale data
+	isnt($data,       $stale_value, 'no stale data in parent if configured');
+	isnt($child_data, $stale_value, 'no stale data in child  if configured');
+}
+$cache->set_expires_in(-1);
+
 done_testing();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 14/17] gitweb/lib - Regenerate (refresh) cache in background
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (12 preceding siblings ...)
  2010-10-06 22:01 ` [PATCHv5 13/17] gitweb/lib - Serve stale data when waiting for filling cache Jakub Narebski
@ 2010-10-06 22:01 ` Jakub Narebski
  2010-10-06 22:02 ` [PATCHv5 15/17] gitweb: Introduce %actions_info, gathering information about actions Jakub Narebski
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:01 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

This commit removes asymmetry in serving stale data (if it exists)
when regenerating cache in GitwebCache::FileCacheWithLocking.  The process
that acquired exclusive (writers) lock, and is therefore selected to
be the one that (re)generates data to fill the cache, can now generate
data in background, while serving stale data.

Those background processes are daemonized, i.e. detached from the main
process (the one returning data or stale data).  Otherwise there might be a
problem when gitweb is running as (part of) long-lived process, for example
from mod_perl (or in the future from FastCGI): it would leave unreaped
children as zombies (entries in process table).  We don't want to wait for
background process, and we can't set $SIG{CHLD} to 'IGNORE' in gitweb to
automatically reap child processes, because this interferes with using
  open my $fd, '-|', git_cmd(), 'param', ...
    or die_error(...)
  # read from <$fd>
  close $fd
    or die_error(...)
In the above code "close" for magic "-|" open calls waitpid...  and we
would would die with "No child processes".  Removing 'or die' would
possibly remove ability to react to other errors.

This feature can be enabled or disabled on demand via 'background_cache'
cache parameter.  It is turned on by default.


To be implemented (from original patch by J.H.):
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
No differences from v4 (from previous version).

Differences from relevant parts of J.H. patch:
* It is made explicit that background generation depends on using
  locking.

* Use boolean cache option 'background_cache', instead of CamelCase
  global variable $backgroundCache.

* Fork (run generating process in background) only if there is stale
  data to serve (and if background cache is turned on).  In J.H. patch
  forking was done unconditionally, only generation or exit depended
  on $backgroundCache.

* Daemonizes background process, detaching it from parent (using
  setsid).  This way whether main process is short-lived (gitweb as
  CGI) or long-lived (mod_perl or FastCGI), there would be no need to
  wait and no zombies.

* Lock before forking.

* Tests (currently only of API).

* In my opinion cleaner control flow.

 gitweb/gitweb.perl                             |    9 ++++
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   61 +++++++++++++++++++----
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |    3 +-
 t/t9503/test_cache_interface.pl                |   12 ++++-
 4 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c920656..d9ac063 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -327,6 +327,15 @@ our %cache_options = (
 	# Set it to -1 to always serve existing data if it exists,
 	# set it to 0 to turn off serving stale data - always wait.
 	'max_lifetime' => 5*60*60, # 5 hours
+
+	# This enables/disables background caching.  If it is set to true value,
+	# caching engine would return stale data (if it is not older than
+	# 'max_lifetime' seconds) if it exists, and launch process if regenerating
+	# (refreshing) cache into the background.  If it is set to false value,
+	# the process that fills cache must always wait for data to be generated.
+	# In theory this will make gitweb seem more responsive at the price of
+	# serving possibly stale data.
+	'background_cache' => 1,
 );
 
 
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 73cc9ec..f0d9d6f 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -23,6 +23,7 @@ use warnings;
 
 use File::Path qw(mkpath);
 use Fcntl qw(:flock);
+use POSIX qw(setsid);
 
 # ......................................................................
 # constructor
@@ -70,21 +71,27 @@ use Fcntl qw(:flock);
 #    than it, serve stale data when waiting for cache entry to be 
 #    regenerated (refreshed).  Non-adaptive.
 #    Defaults to -1 (never expire / always serve stale).
+#  * 'background_cache' (boolean)
+#    This enables/disables regenerating cache in background process.
+#    Defaults to true.
 sub new {
 	my ($proto, $p_options_hash_ref) = @_;
 
 	my $class = ref($proto) || $proto;
 	my $self = $class->SUPER::new($p_options_hash_ref);
 
-	my ($max_lifetime);
+	my ($max_lifetime, $background_cache);
 	if (defined $p_options_hash_ref) {
 		$max_lifetime =
 			$p_options_hash_ref->{'max_lifetime'} ||
 			$p_options_hash_ref->{'max_cache_lifetime'};
+		$background_cache = $p_options_hash_ref->{'background_cache'};
 	}
 	$max_lifetime = -1 unless defined($max_lifetime);
+	$background_cache = 1 unless defined($background_cache);
 
 	$self->set_max_lifetime($max_lifetime);
+	$self->set_background_cache($background_cache);
 
 	return $self;
 }
@@ -95,7 +102,7 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(max_lifetime)) {
+foreach my $i (qw(max_lifetime background_cache)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -148,9 +155,10 @@ sub _tempfile_to_path {
 # Uses file locking to have only one process updating value for $key
 # to avoid 'cache miss stampede' (aka 'stampeding herd') problem.
 sub compute {
-	my ($self, $key, $code) = @_;
+	my ($self, $key, $coderef) = @_;
+	my ($data, $stale_data);
 
-	my $data = $self->get($key);
+	$data = $self->get($key);
 	return $data if defined $data;
 
 	my $dir;
@@ -169,16 +177,47 @@ sub compute {
 	do {
 		open my $lock_fh, '+>', $lockfile
 			or die "Could't open lockfile '$lockfile': $!";
+
 		$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
 		if ($lock_state) {
-			# acquired writers lock
-			$data = $code->($self, $key);
-			$self->set($key, $data);
+			## acquired writers lock, have to generate data
+			my $pid;
+			if ($self->{'background_cache'}) {
+				# try to retrieve stale data
+				$stale_data = $self->fetch($key)
+					if $self->is_valid($key, $self->get_max_lifetime());
+
+				# fork if there is stale data, for background process
+				# to regenerate/refresh the cache (generate data)
+				$pid = fork() if (defined $stale_data);
+			}
+			if (!defined $pid || !$pid) {
+				## didn't fork, or are in background process
+
+				# daemonize background process, detaching it from parent
+				# see also Proc::Daemonize, Apache2::SubProcess
+				if (defined $pid) {
+					POSIX::setsid();  # or setpgrp(0, 0);
+					fork() && exit 0;
+				}
+
+				$data = $coderef->($self, $key);
+				$self->set($key, $data);
+
+				if (defined $pid) {
+					## in background process; parent will serve stale data
+					close $lock_fh
+						or die "Couldn't close lockfile '$lockfile' (background): $!";
+					exit 0;
+				}
+			}
+			
 		} else {
 			# try to retrieve stale data
-			$data = $self->fetch($key)
+			$stale_data = $self->fetch($key)
 				if $self->is_valid($key, $self->get_max_lifetime());
-			if (!defined $data) {
+
+			if (!defined $stale_data) {
 				# get readers lock if there is no stale data to serve
 				flock($lock_fh, LOCK_SH);
 				$data = $self->fetch($key);
@@ -187,9 +226,9 @@ sub compute {
 		# closing lockfile releases lock
 		close $lock_fh
 			or die "Could't close lockfile '$lockfile': $!";
-	} until (defined $data || $lock_state);
+	} until (defined $data || defined $stale_data || $lock_state);
 	# repeat until we have data, or we tried generating data oneself and failed
-	return $data;
+	return defined $data ? $data : $stale_data;
 }
 
 1;
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 39f24f8..2a3d9cf 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -129,7 +129,8 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace expires_min expires_max increase_factor check_load)) {
+foreach my $i (qw(depth root namespace
+                  expires_min expires_max increase_factor check_load)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 303bcf0..7ee628f 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -17,6 +17,9 @@ my $cache = new_ok('GitwebCache::FileCacheWithLocking', [ {
 } ]);
 isa_ok($cache, 'GitwebCache::SimpleFileCache');
 
+# compute can fork, don't generate zombies
+#local $SIG{CHLD} = 'IGNORE';
+
 # Test that default values are defined
 #
 ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
@@ -308,18 +311,21 @@ sub run_compute_forked {
 }
 $cache->set_expires_in(0);    # expire now
 $cache->set_max_lifetime(-1); # forever (always serve stale data)
+ok($cache->get_background_cache(), 'generate cache in background by default');
 $pid = open $kid_fh, '-|';
 SKIP: {
-	skip "cannot fork: $!", 2
+	skip "cannot fork: $!", 3
 		unless defined $pid;
 
 	my ($data, $child_data) = run_compute_forked($pid, $kid_fh);
 
 	# returning stale data works
-	ok($data eq $stale_value || $child_data eq $stale_value,
-	   'stale data in at least one process when expired');
+	is($data,       $stale_value, 'stale data in parent when expired');
+	is($child_data, $stale_value, 'stale data in child  when expired');
 
 	$cache->set_expires_in(-1); # never expire for next ->get
+	diag('waiting for background process to have time to set data');
+	sleep $slow_time; # wait for background process to have chance to set data
 	is($cache->get($key), $value, 'value got set correctly, even if stale data returned');
 }
 
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5 15/17] gitweb: Introduce %actions_info, gathering information about actions
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (13 preceding siblings ...)
  2010-10-06 22:01 ` [PATCHv5 14/17] gitweb/lib - Regenerate (refresh) cache in background Jakub Narebski
@ 2010-10-06 22:02 ` Jakub Narebski
  2010-10-06 22:02 ` [PATCHv5/RFC 16/17] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:02 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

Currently it only contains information about output format, and is not
used anywhere.  It will be used to check whether current action
produces HTML output, and therefore is displaying HTML-based progress
info about (re)generating cache makes sense.

It can contain information about allowed extra options, whether to
display link to feed (Atom or RSS), etc. in easier and faster way than
listing all matching or all non-matching actions at appropriate site.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is new patch, which didn't exist in v4 (previous version) of
gitweb caching series, and doesn't have equivalent in J.H. patch.

While it is patch preparing infrastructure to be used by progress
indicator (generating info), it is IMHO a good idea in itself, and
that is why it is not marked as RFC.

 gitweb/gitweb.perl |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index d9ac063..26d5619 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -807,6 +807,25 @@ our %allowed_options = (
 	"--no-merges" => [ qw(rss atom log shortlog history) ],
 );
 
+our %actions_info = ();
+sub evaluate_actions_info {
+	our %actions_info;
+	our (%actions);
+
+	# unless explicitely stated otherwise, default output format is html
+	foreach my $action (keys %actions) {
+		$actions_info{$action}{'output_format'} = 'html';
+	}
+	# list all exceptions; undef means variable (no definite format)
+	map { $actions_info{$_}{'output_format'} = 'text' }
+		qw(commitdiff_plain patch patches project_index blame_data);
+	map { $actions_info{$_}{'output_format'} = 'xml' }
+		qw(rss atom opml); # there are different types (document formats) of XML
+	map { $actions_info{$_}{'output_format'} = undef }
+		qw(blob_plain object);
+	$actions_info{'snapshot'}{'output_format'} = 'binary';
+}
+
 # fill %input_params with the CGI parameters. All values except for 'opt'
 # should be single values, but opt can be an array. We should probably
 # build an array of parameters that can be multi-valued, but since for the time
@@ -1153,6 +1172,7 @@ sub run_request {
 
 	evaluate_uri();
 	evaluate_gitweb_config();
+	evaluate_actions_info();
 	evaluate_git_version();
 	check_loadavg();
 	configure_caching()
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5/RFC 16/17] gitweb: Show appropriate "Generating..." page when regenerating cache
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (14 preceding siblings ...)
  2010-10-06 22:02 ` [PATCHv5 15/17] gitweb: Introduce %actions_info, gathering information about actions Jakub Narebski
@ 2010-10-06 22:02 ` Jakub Narebski
  2010-10-06 22:02 ` [PATCHv5/RFC 17/17] gitweb: Add startup delay to activity indicator for cache Jakub Narebski
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:02 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

When there exist stale/expired (but not too stale) version of
(re)generated page in cache, gitweb returns stale version (and updates
cache in background, assuming 'background_cache' is set to true value).
When there is no stale version suitable to serve the client, currently
we have to wait for the data to be generated in full before showing it.
Add to GitwebCache::FileCacheWithLocking, via 'generating_info' callback,
the ability to show user some activity indicator / progress bar, to
show that we are working on generating data.

Gitweb itself uses "Generating..." page as activity indicator, which
redirects (via <meta http-equiv="Refresh" ...>) to refreshed version
of the page after the cache is filled (via trick of not closing page
and therefore not closing connection till data is available in cache,
checked by getting shared/readers lock on lockfile for cache entry).
The git_generating_data_html() subroutine, which is used by gitweb
to implement this feature, is highly configurable: you can choose
frequency of writing some data so that connection won't get closed,
and maximum time to wait for data in "Generating..." page (see comments
in %generating_options hash definition).

git_generating_data_html() is run only for actions that have HTML
output; only for those actions HTML "Generating..." page makes sense.


This implements final feature from the original gitweb output caching
patch by J.H.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v4:
* New action_outputs_html() subroutine in gitweb, used for whitelisting in
  git_generating_data_html().

* No capture_stop(), as it is not needed: the 'generating_info' subroutine
  (e.g. git_generating_data_html()) is run outside capturing output.

* git_generating_data_html() now ends with 'goto DONE_GITWEB'


Differences from relevant parts of J.H. patch:
* When providing "Generating..." page for process which acquired writers
  lock, and spawned background process to (re)generate data in cache, we
  have to drop writers lock for 'generating_info' subroutine to try to get
  readers lock to know when data is finished generating.  This is done by
  closing lockfile, and then reopening; I have found that LOCK_UN doesn't
  work reliably enough.

* The subroutine that is responsible for doing "Generating..." progress
  info / activity indicator (cacheWaitForUpdate() subroutine in
  J.H. patch, git_generating_data_html() in this patch) resides in
  gitweb.perl, and not in caching module.  It is passed as callback (as
  code reference) to $cache constructor, as 'generating_info' parameter.

* The idea how git_generating_data_html() works for web browsers without
  Ajax, using only plain HTML and server-side scripting is somewhat
  described in comments.

* gitweb prints generating info in more restricted set of situations; the
  set of actions where gitweb does not generate activity indicator is
  larger.  We could probably provide activity indicator also for
  (possibly) non-HTML output, like 'blob_plain' or 'patches', provided
  that 'User-Agent' denotes that we are using web browser.

* "Generating..." info behavior can be configured (at least the timings)
  via %generating_options hash, instead of having those options hardcoded.

* Waiting is done using blocking flock + alarm, rather than "busy wait"
  loop with non-blocking flock + sleep.

* The page title shows that it is progress info, and for what action
  (for git_generating_data_html()).

* Ability to use sub-second precision if Time::HiRes is available
  (for git_generating_data_html()).

* Basic test for "Generating..." feature, though not for the
  git_generating_data_html().

* Fixed typo in DTD URL.


 gitweb/gitweb.perl                             |  120 +++++++++++++++++++++++-
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   52 ++++++++++-
 t/t9503/test_cache_interface.pl                |   61 ++++++++++++
 3 files changed, 227 insertions(+), 6 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 26d5619..07fa825 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -21,7 +21,7 @@ use CGI qw(:standard :escapeHTML -nosticky);
 use CGI::Util qw(unescape);
 use CGI::Carp qw(fatalsToBrowser set_message);
 use Encode;
-use Fcntl ':mode';
+use Fcntl qw(:mode :flock);
 use File::Find qw();
 use File::Basename qw(basename);
 binmode STDOUT, ':utf8';
@@ -336,8 +336,32 @@ our %cache_options = (
 	# In theory this will make gitweb seem more responsive at the price of
 	# serving possibly stale data.
 	'background_cache' => 1,
-);
 
+	# Subroutine which would be called when gitweb has to wait for data to
+	# be generated (it can't serve stale data because there isn't any,
+	# or if it exists it is older than 'max_lifetime').  The default
+	# is to use git_generating_data_html(), which creates "Generating..."
+	# page, which would then redirect or redraw/rewrite the page when
+	# data is ready.
+	# Set it to `undef' to disable this feature.
+	#
+	# Such subroutine (if invoked from GitwebCache::SimpleFileCache)
+	# is passed the following parameters: $cache instance, human-readable
+	# $key to current page, and filehandle $lock_fh to lockfile.
+	'generating_info' => \&git_generating_data_html,
+);
+# You define site-wide options for "Generating..." page (if enabled) here
+# (which means that $cache_options{'generating_info'} is set to coderef);
+# override them with $GITWEB_CONFIG as necessary.
+our %generating_options = (
+	# The time between generating new piece of output to prevent from
+	# redirection before data is ready, i.e. time between printing each
+	# dot in activity indicator / progress info, in seconds.
+	'print_interval' => 2,
+	# Maximum time "Generating..." page would be present, waiting for data,
+	# before unconditional redirect, in seconds.
+	'timeout' => $cache_options{'expires_min'},
+);
 
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
@@ -826,6 +850,11 @@ sub evaluate_actions_info {
 	$actions_info{'snapshot'}{'output_format'} = 'binary';
 }
 
+sub action_outputs_html {
+	my $action = shift;
+	return $actions_info{$action}{'output_format'} eq 'html';
+}
+
 # fill %input_params with the CGI parameters. All values except for 'opt'
 # should be single values, but opt can be an array. We should probably
 # build an array of parameters that can be multi-valued, but since for the time
@@ -3532,6 +3561,93 @@ sub get_page_title {
 	return $title;
 }
 
+# creates "Generating..." page when caching enabled and not in cache
+sub git_generating_data_html {
+	my ($cache, $key, $lock_fh) = @_;
+
+	# whitelist of actions that should get "Generating..." page
+	unless (action_outputs_html($action)) {
+		return;
+	}
+
+	my $title = "[Generating...] " . get_page_title();
+	# TODO: the following line of code duplicates the one
+	# in git_header_html, and it should probably be refactored.
+	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
+
+	# Use the trick that 'refresh' HTTP header equivalent (set via http-equiv)
+	# with timeout of 0 seconds would redirect as soon as page is finished.
+	# It assumes that browser would display partially received page.
+	# This "Generating..." redirect page should not be cached (externally).
+	print STDOUT $cgi->header(-type => 'text/html', -charset => 'utf-8',
+	                          -status=> '200 OK', -expires => 'now');
+	print STDOUT <<"EOF";
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
+<!-- git web interface version $version -->
+<!-- git core binaries version $git_version -->
+<head>
+<meta http-equiv="content-type" content="text/html; charset=utf-8" />
+<meta http-equiv="refresh" content="0" />
+<meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version" />
+<meta name="robots" content="noindex, nofollow" />
+<title>$title</title>
+</head>
+<body>
+EOF
+
+	local $| = 1; # autoflush
+	print STDOUT 'Generating...';
+
+	my $total_time = 0;
+	my $interval = $generating_options{'print_interval'} || 1;
+	my $timeout  = $generating_options{'timeout'};
+	my $alarm_handler = sub {
+		local $! = 1;
+		print STDOUT '.';
+		$total_time += $interval;
+		if ($total_time > $timeout) {
+			die "timeout\n";
+		}
+	};
+	eval {
+		# check if we can use functions from Time::HiRes
+		if (defined $t0) {
+			local $SIG{ALRM} = $alarm_handler;
+			Time::HiRes::alarm($interval, $interval);
+		} else {
+			local $SIG{ALRM} = sub {
+				$alarm_handler->();
+				alarm($interval);
+			};
+			alarm($interval);
+		}
+		my $lock_acquired;
+		do {
+			# loop is needed here because SIGALRM (from 'alarm')
+			# can interrupt process of acquiring lock
+			$lock_acquired = flock($lock_fh, LOCK_SH); # blocking readers lock
+		} until ($lock_acquired);
+		alarm 0;
+	};
+	# It doesn't really matter if we got lock, or timed-out
+	# but we should re-throw unknown (unexpected) errors
+	die $@ if ($@ and $@ !~ /timeout/);
+
+	print STDOUT <<"EOF";
+
+</body>
+</html>
+EOF
+
+	# after refresh web browser would reload page and send new request
+	goto DONE_GITWEB;
+	#exit 0;
+	#return;
+}
+
 sub git_header_html {
 	my $status = shift || "200 OK";
 	my $expires = shift;
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index f0d9d6f..9211a9a 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -74,24 +74,32 @@ use POSIX qw(setsid);
 #  * 'background_cache' (boolean)
 #    This enables/disables regenerating cache in background process.
 #    Defaults to true.
+#  * 'generating_info'
+#    Subroutine (code) called when process has to wait for cache entry
+#    to be (re)generated (when there is no not-too-stale data to serve
+#    instead), for other process (or bacground process).  It is passed
+#    $cache instance, $key, and opened $lock_fh filehandle to lockfile.
+#    Unset by default (which means no activity indicator).
 sub new {
 	my ($proto, $p_options_hash_ref) = @_;
 
 	my $class = ref($proto) || $proto;
 	my $self = $class->SUPER::new($p_options_hash_ref);
 
-	my ($max_lifetime, $background_cache);
+	my ($max_lifetime, $background_cache, $generating_info);
 	if (defined $p_options_hash_ref) {
 		$max_lifetime =
 			$p_options_hash_ref->{'max_lifetime'} ||
 			$p_options_hash_ref->{'max_cache_lifetime'};
 		$background_cache = $p_options_hash_ref->{'background_cache'};
+		$generating_info  = $p_options_hash_ref->{'generating_info'};
 	}
 	$max_lifetime = -1 unless defined($max_lifetime);
 	$background_cache = 1 unless defined($background_cache);
 
 	$self->set_max_lifetime($max_lifetime);
 	$self->set_background_cache($background_cache);
+	$self->set_generating_info($generating_info);
 
 	return $self;
 }
@@ -102,7 +110,7 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(max_lifetime background_cache)) {
+foreach my $i (qw(max_lifetime background_cache generating_info)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -115,6 +123,17 @@ foreach my $i (qw(max_lifetime background_cache)) {
 	};
 }
 
+# $cache->generating_info($key, $lock);
+# runs 'generating_info' subroutine, for activity indicator,
+# checking if it is defined first.
+sub generating_info {
+	my $self = shift;
+
+	if (defined $self->{'generating_info'}) {
+		$self->{'generating_info'}->($self, @_);
+	}
+}
+
 # ----------------------------------------------------------------------
 # utility functions and methods
 
@@ -189,7 +208,8 @@ sub compute {
 
 				# fork if there is stale data, for background process
 				# to regenerate/refresh the cache (generate data)
-				$pid = fork() if (defined $stale_data);
+				$pid = fork()
+					if (defined $stale_data || $self->{'generating_info'});
 			}
 			if (!defined $pid || !$pid) {
 				## didn't fork, or are in background process
@@ -206,18 +226,42 @@ sub compute {
 
 				if (defined $pid) {
 					## in background process; parent will serve stale data
+					## or show activity indicator, and serve data
 					close $lock_fh
 						or die "Couldn't close lockfile '$lockfile' (background): $!";
 					exit 0;
 				}
+
+			} else {
+				## forked, in parent process
+
+				# provide "generating page..." info if there is no stale data to serve
+				# might exit, or force web browser to do redirection (refresh)
+				if (!defined $stale_data) {
+					# lock can get inherited across forks; unlock
+					# flock($lock_fh, LOCK_UN); # <-- this doesn't work
+					close $lock_fh
+						or die "Couldn't close lockfile '$lockfile' for reopen: $!";
+					open $lock_fh, '<', $lockfile
+						or die "Couldn't reopen (for reading) lockfile '$lockfile': $!";
+
+					$self->generating_info($key, $lock_fh);
+					# generating info may exit, so we can not get there
+					# wait for and get data from background process
+					flock($lock_fh, LOCK_SH);
+					$data = $self->fetch($key);
+				}
 			}
-			
+
 		} else {
 			# try to retrieve stale data
 			$stale_data = $self->fetch($key)
 				if $self->is_valid($key, $self->get_max_lifetime());
 
 			if (!defined $stale_data) {
+				# there is no stale data to serve
+				# provide "generating page..." info
+				$self->generating_info($key, $lock_fh);
 				# get readers lock if there is no stale data to serve
 				flock($lock_fh, LOCK_SH);
 				$data = $self->fetch($key);
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 7ee628f..1f876bd 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -344,4 +344,65 @@ SKIP: {
 }
 $cache->set_expires_in(-1);
 
+# Test 'generating_info' feature
+#
+$cache->remove($key);
+my $progress_info = "Generating...";
+sub test_generating_info {
+	local $| = 1;
+	print "$progress_info";
+}
+$cache->set_generating_info(\&test_generating_info);
+# Catch output printed by ->compute
+# (only for 'print <sth>' and 'printf <sth>')
+sub capture_compute {
+	my $output = '';
+
+	open my $output_fh, '>', \$output;
+	my $oldfh = select($output_fh);
+
+	my $data = $cache->compute($key, \&get_value_slow);
+
+	select($oldfh);
+	close $output_fh;
+
+	return ($output, $data);
+}
+sub run_capture_compute_forked {
+	my $pid = shift;
+
+	my ($output, $data) = capture_compute();
+	my ($child_output, $child_data);
+
+	if ($pid) {
+		local $/ = "\0";
+		chomp($child_output = <$kid_fh>);
+		chomp($child_data   = <$kid_fh>);
+
+		waitpid $pid, 0;
+		close $kid_fh;
+	} else {
+		local $| = 1;
+		$output = '' unless defined $output;
+		$data   = '' unless defined $data;
+		print "$output\0$data\0";
+		exit 0;
+	}
+
+	return ($output, $data, $child_output, $child_data);
+}
+SKIP: {
+	$pid = open $kid_fh, '-|';
+	skip "cannot fork: $!", 4
+		unless defined $pid;
+
+	my ($output, $data, $child_output, $child_data) =
+		run_capture_compute_forked($pid);
+
+	is($output,       $progress_info, 'progress info from parent');
+	is($child_output, $progress_info, 'progress info from child');
+	is($data,         $value,         'data info from parent');
+	is($child_data,   $value,         'data info from child');
+}
+
 done_testing();
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCHv5/RFC 17/17] gitweb: Add startup delay to activity indicator for cache
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (15 preceding siblings ...)
  2010-10-06 22:02 ` [PATCHv5/RFC 16/17] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
@ 2010-10-06 22:02 ` Jakub Narebski
  2010-10-06 22:02 ` [RFC/PATCHv5 18/17] gitweb/lib - Add clear() and size() methods to caching interface Jakub Narebski
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:02 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

This adds support for [optional] startup delay to git_generating_data_html()
subroutine, which is used to provide "Generating..." page as activity
indicator when waiting for page to be generated if caching.  If the data
(page contents) gets generated within $generating_options{'staryp_delay'}
seconds, the "Generating..." page won't get displayed.

This feature was created in response to complaint by Petr 'Pasky' Baudis'
about "Generating..." feature.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
No changes since v4 (previous version).

This patch doesn't have equivalent in J.H. patch, and is rather
extension of idea of progress indicator for (re)generating [cache]
data.  It was inspired by comment by Petr 'Pasky' Baudis on #git IRC
channel that one doesn't want to have "Generating..." page to be shown
only to immediately vanish.

 gitweb/gitweb.perl |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 07fa825..fee8739 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -354,6 +354,9 @@ our %cache_options = (
 # (which means that $cache_options{'generating_info'} is set to coderef);
 # override them with $GITWEB_CONFIG as necessary.
 our %generating_options = (
+	# The delay before displaying "Generating..." page, in seconds.  It is
+	# intended for "Generating..." page to be shown only when really needed.
+	'startup_delay' => 1,
 	# The time between generating new piece of output to prevent from
 	# redirection before data is ready, i.e. time between printing each
 	# dot in activity indicator / progress info, in seconds.
@@ -3570,6 +3573,23 @@ sub git_generating_data_html {
 		return;
 	}
 
+	# Initial delay
+	if ($generating_options{'startup_delay'} > 0) {
+		eval {
+			local $SIG{ALRM} = sub { die "alarm clock restart\n" }; # NB: \n required
+			alarm $generating_options{'startup_delay'};
+			flock($lock_fh, LOCK_SH); # blocking readers lock
+			alarm 0;
+		};
+		if ($@) {
+			# propagate unexpected errors
+			die $@ if $@ !~ /alarm clock restart/;
+		} else {
+			# we got response within 'startup_delay' timeout
+			return;
+		}
+	}
+
 	my $title = "[Generating...] " . get_page_title();
 	# TODO: the following line of code duplicates the one
 	# in git_header_html, and it should probably be refactored.
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC/PATCHv5 18/17] gitweb/lib - Add clear() and size() methods to caching interface
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (16 preceding siblings ...)
  2010-10-06 22:02 ` [PATCHv5/RFC 17/17] gitweb: Add startup delay to activity indicator for cache Jakub Narebski
@ 2010-10-06 22:02 ` Jakub Narebski
  2010-10-06 22:56   ` Thomas Adam
  2010-10-06 22:02 ` [RFC PATCHv5 19/17] gitweb: Add beginnings of cache administration page Jakub Narebski
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:02 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

Add ->size() method, which following Cache::Cache interface returns
estimated total size of all entries in whole cache (in the namsepace
assiciated with give cache instance).  Note that ->get_size($key)
returns size of a single entry!

Add ->clear() method, which removes all entries from the namespace
associated with given cache instance.  For safety it requires
namespace to be set to true value, which means that it cannot be
empty; therefore default namespace is changed to 'gitweb'.

The ->clear() method should be fairly safe, because it first renames
directory (which should be atomic), and only then removes it
(following code from CGI::Driver::File).

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
The only difference from previous version is that a comment about some
method was moved to earlier commit that introduced it, where it
belongs.

This patch (implementing "backend") together with next patch
(implementing "interface") are meant to provide nice web interface to
safely cleaning (pruning) cache, as described in comments and
documentation (how to do this manually).

 gitweb/lib/GitwebCache/SimpleFileCache.pm |   48 ++++++++++++++++++++++++++--
 t/t9503/test_cache_interface.pl           |    1 -
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 2a3d9cf..8f172cb 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -20,8 +20,9 @@ package GitwebCache::SimpleFileCache;
 use strict;
 use warnings;
 
-use File::Path qw(mkpath);
-use File::Temp qw(tempfile);
+use File::Path qw(mkpath rmtree);
+use File::Temp qw(tempfile mktemp);
+use File::Find qw(find);
 use Digest::MD5 qw(md5_hex);
 
 # by default, the cache nests all entries on the filesystem single
@@ -37,7 +38,7 @@ our $DEFAULT_CACHE_ROOT = "cache";
 # by default we don't use cache namespace (empty namespace);
 # empty namespace does not allow for simple implementation of clear() method.
 #
-our $DEFAULT_NAMESPACE = '';
+our $DEFAULT_NAMESPACE = "gitweb";
 
 # ......................................................................
 # constructor
@@ -334,7 +335,7 @@ sub get_size {
 }
 
 # ......................................................................
-# interface methods
+# interface methods dealing with single item
 
 # Removing and expiring
 
@@ -427,6 +428,45 @@ sub compute {
 	return $data;
 }
 
+# ......................................................................
+# interface methods dealing with whole namespace
+
+# $cache->clear();
+#
+# Remove all entries from the namespace.
+# Namespace must be defined and not empty.
+sub clear {
+	my $self = shift;
+
+	return unless $self->get_namespace();
+
+	my $namespace_dir = $self->path_to_namespace();
+	return if !-d $namespace_dir;
+
+	my $renamed_dir = mktemp($namespace_dir . '.XXXX');
+	rename($namespace_dir, $renamed_dir);
+	rmtree($renamed_dir);
+	die "Couldn't remove '$renamed_dir' directory"
+		if -d $renamed_dir;
+}
+
+# $size = $cache->size();
+#
+# Size of whole names (or whole cache if namespace empty)
+sub size {
+	my $self = shift;
+
+	my $namespace_dir = $self->path_to_namespace();
+	return if !-d $namespace_dir;
+
+	my $total_size = 0;
+	my $add_size = sub { $total_size += -s $_ };
+
+	File::Find::find({ wanted => $add_size, no_chdir => 1 }, $namespace_dir);
+
+	return $total_size;
+}
+
 1;
 __END__
 # end of package GitwebCache::SimpleFileCache;
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 1f876bd..3a903cb 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -34,7 +34,6 @@ SKIP: {
 		unless ($GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT &&
 		        $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH);
 
-	is($cache->get_namespace(), '', "default namespace is ''");
 	cmp_ok($cache->get_root(),  'eq', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
 		"default cache root is '$GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT'");
 	cmp_ok($cache->get_depth(), '==', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCHv5 19/17] gitweb: Add beginnings of cache administration page
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (17 preceding siblings ...)
  2010-10-06 22:02 ` [RFC/PATCHv5 18/17] gitweb/lib - Add clear() and size() methods to caching interface Jakub Narebski
@ 2010-10-06 22:02 ` Jakub Narebski
  2010-10-06 22:02 ` [PoC PATCHv5 20/17] gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/) Jakub Narebski
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:02 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

Currently it shows estimated size of cache, and provides link to
clearing cache.

Cache administration page is visible only on local computer; the same
is true with respect to ability to clear cache.  Those are bare
beginnings of autorization framework.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v4:
* action_is_cacheable() (is_cacheable() in previous version) now uses
  %actions_info hash.

* Adding actions related to gitweb cache administration is now done in
  configure_caching() subroutine.

* cache_admin_auth_ok() now returns true also for running gitweb as
  script, when REMOTE_ADDR environment variable is not define.  It
  also covers situation where non-standard (not in RFC 3875)
  SERVER_ADDR is not defined.


This is "frontend" (interface) patch for gitweb cache management.

This is very much work in progress, sent to git mailing list as a
proof of concept.  Would something like that (perhaps with more
advanced authorization for admin) be useful?

The 'cache' page looks like this:

  Cache location      Size	 
  ------------------+---------+--------------
  cache/gitweb        14 KiB	[Clear cache]
  ------------------+---------+--------------

where '[Clear cache]' is a submit button.

 gitweb/gitweb.perl |  119 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 115 insertions(+), 4 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index fee8739..c413797 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -24,6 +24,8 @@ use Encode;
 use Fcntl qw(:mode :flock);
 use File::Find qw();
 use File::Basename qw(basename);
+use POSIX;
+
 binmode STDOUT, ':utf8';
 
 our $t0;
@@ -851,6 +853,10 @@ sub evaluate_actions_info {
 	map { $actions_info{$_}{'output_format'} = undef }
 		qw(blob_plain object);
 	$actions_info{'snapshot'}{'output_format'} = 'binary';
+
+	# specify uncacheable actions
+	map { $actions_info{$_}{'uncacheable'} = 1 }
+		qw(cache clear_cache);
 }
 
 sub action_outputs_html {
@@ -858,6 +864,11 @@ sub action_outputs_html {
 	return $actions_info{$action}{'output_format'} eq 'html';
 }
 
+sub action_is_cacheable {
+	my $action = shift;
+	return !$actions_info{$action}{'uncacheable'};
+}
+
 # fill %input_params with the CGI parameters. All values except for 'opt'
 # should be single values, but opt can be an array. We should probably
 # build an array of parameters that can be multi-valued, but since for the time
@@ -1178,12 +1189,13 @@ sub dispatch {
 	if (!defined($actions{$action})) {
 		die_error(400, "Unknown action");
 	}
-	if ($action !~ m/^(?:opml|project_list|project_index)$/ &&
+	if ($action !~ m/^(?:opml|project_list|project_index|cache|clear_cache)$/ &&
 	    !$project) {
 		die_error(400, "Project needed");
 	}
 
-	if ($caching_enabled) {
+	if ($caching_enabled &&
+	    action_is_cacheable($action)) {
 		# human readable key identifying gitweb output
 		my $output_key = href(-replay => 1, -full => 1, -path_info => 0);
 
@@ -1204,11 +1216,11 @@ sub run_request {
 
 	evaluate_uri();
 	evaluate_gitweb_config();
-	evaluate_actions_info();
 	evaluate_git_version();
 	check_loadavg();
 	configure_caching()
 		if ($caching_enabled);
+	evaluate_actions_info();
 
 	# $projectroot and $projects_list might be set in gitweb config file
 	$projects_list ||= $projectroot;
@@ -1311,6 +1323,10 @@ sub configure_caching {
 			'depth' => $cache_options{'cache_depth'},
 		});
 	}
+
+	# some actions are available only if cache is turned on
+	$actions{'cache'} = \&git_cache_admin;
+	$actions{'clear_cache'} = \&git_cache_clear;
 }
 
 run();
@@ -3673,7 +3689,7 @@ sub git_header_html {
 	my $expires = shift;
 	my %opts = @_;
 
-	my $title = get_page_title();
+	my $title = $opts{'-title'} || get_page_title();
 	my $content_type;
 	# require explicit support from the UA if we are to send the page as
 	# 'application/xhtml+xml', otherwise send it as plain old 'text/html'.
@@ -7325,3 +7341,98 @@ XML
 </opml>
 XML
 }
+
+# see Number::Bytes::Human
+sub human_readable_size {
+	my $bytes = shift || return;
+
+	my @units = ('', 'KiB', 'MiB', 'GiB', 'TiB');
+	my $block = 1024;
+
+	my $x = $bytes;
+	my $unit;
+	foreach (@units) {
+		$unit = $_, last if POSIX::ceil($x) < $block;
+		$x /= $block;
+	}
+
+	my $num;
+	if ($x < 10.0) {
+		$num = sprintf("%.1f", POSIX::ceil($x*10)/10); 
+	} else {
+		$num = sprintf("%d", POSIX::ceil($x));
+	}
+
+	return "$num $unit";
+}
+
+sub cache_admin_auth_ok {
+	if (defined $ENV{'REMOTE_ADDR'}) {
+		if (defined $ENV{'SERVER_ADDR'}) {
+			# SERVER_ADDR is not in RFC 3875
+			return $ENV{'SERVER_ADDR'} eq $ENV{'REMOTE_ADDR'};
+		} elsif ($ENV{'REMOTE_ADDR'} =~ m!^(?:127\.0\.0\.1|::1/128)$!) {
+			# localhost in IPv4 or IPv6
+			return 1;
+		}
+	} else {
+		# REMOTE_ADDR not defined, probably calling gitweb as script
+		return 1;
+	}
+
+	# restrict all but specified cases
+	return 0;
+}
+
+sub git_cache_admin {
+	$caching_enabled
+		or die_error(403, "Caching disabled");
+	cache_admin_auth_ok()
+		or die_error(403, "Cache administration not allowed");
+	$cache && ref($cache)
+		or die_error(500, "Cache is not present");
+
+	git_header_html(undef, undef,
+		-title => to_utf8($site_name) . " - Gitweb cache");
+
+	print <<'EOF_HTML';
+<table class="cache_admin">
+<tr><th>Cache location</th><th>Size</th><th>&nbsp;</th></tr>
+EOF_HTML
+	print '<tr class="light">' .
+	      '<td class="path">' . esc_path($cache->path_to_namespace()) . '</td>' .
+	      '<td>';
+	my $size;
+	$size = $cache->size()     if (!defined $size && $cache->can('size'));
+	$size = $cache->get_size() if (!defined $size && $cache->can('get_size'));
+	if (defined $size) {
+		print human_readable_size($size);
+	}
+	print '</td><td>';
+	if ($cache->can('clear')) {
+		print $cgi->start_form({-method => "POST",
+		                        -action => $my_uri,
+		                        -enctype => CGI::URL_ENCODED}) .
+		      $cgi->input({-name=>"a", -value=>"clear_cache", -type=>"hidden"}) .
+		      $cgi->submit({-label => 'Clear cache'}) .
+		      $cgi->end_form();
+	}
+	print <<'EOF_HTML';
+</td></tr>
+</table>
+EOF_HTML
+
+	git_footer_html();
+}
+
+sub git_cache_clear {
+	if (cache_admin_auth_ok() && $cache &&
+	    $cgi->request_method() eq 'POST') {
+
+		$cache->clear();
+	}
+
+	#print "cleared";
+	print $cgi->redirect(-uri => href(action=>'cache', -full=>1),
+	                     -status => '303 See Other');
+}
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PoC PATCHv5 20/17] gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/)
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (18 preceding siblings ...)
  2010-10-06 22:02 ` [RFC PATCHv5 19/17] gitweb: Add beginnings of cache administration page Jakub Narebski
@ 2010-10-06 22:02 ` Jakub Narebski
  2010-10-06 22:02 ` [PoC PATCHv5 21/17] gitweb/lib - Alternate ways of capturing output Jakub Narebski
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:02 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

Add t/t9503/benchmark_caching_interface.pl, which benchmarks 'set' and
'get' methods from GitwebCache::SimpleFileCache, and compares them
against Cache::FileCache, Cache::MemoryCache, and Cache::FastMmap
if they are available.

Includes example benchmark results.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Formerly as patch 06/17 in previous version of series, now marked PoC
("proof of concept"), and moved to the end of series.

Differences from v4:
* Make benchmark as if for persistent (get/set only) and
  non-persistent (setup+get/set) environment.

* Add benchmark for Cache::NullCache (from Cache::Cache distribution)
  as a kind of baseline.

* Do tests for Cache::FastMmap both with and without setting
  raw_values.

* Include versions of modules used in benchmarks, and add new sample
  benchmark results.

 t/t9503/benchmark_caching_interface.pl |  209 ++++++++++++++++++++++++++++++++
 1 files changed, 209 insertions(+), 0 deletions(-)
 create mode 100755 t/t9503/benchmark_caching_interface.pl

diff --git a/t/t9503/benchmark_caching_interface.pl b/t/t9503/benchmark_caching_interface.pl
new file mode 100755
index 0000000..db06f6e
--- /dev/null
+++ b/t/t9503/benchmark_caching_interface.pl
@@ -0,0 +1,209 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use File::Spec;
+use File::Path;
+use Benchmark qw(:all);
+
+# benchmark source version
+sub __DIR__ () {
+	File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
+}
+use lib __DIR__."/../../gitweb/lib";
+use GitwebCache::SimpleFileCache;
+
+my $key = 'http://localhost/cgi-bin/gitweb.cgi?p=git/git.git;a=blob_plain;f=lorem.txt;hb=HEAD';
+my $value = <<'EOF';
+Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
+eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
+minim veniam, quis nostrud exercitation ullamco laboris nisi ut
+aliquip ex ea commodo consequat. Duis aute irure dolor in
+reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
+pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
+culpa qui officia deserunt mollit anim id est laborum.
+EOF
+
+my $cache_root = __DIR__.'/cache';
+my %cache_options = (
+	'cache_root' => $cache_root, # Cache::FileCache compatibile
+	'root_dir'   => $cache_root, # CHI::Driver::File compatibile
+	'cache_depth' => 1, # Cache::FileCache compatibile
+	'depth'       => 1, # CHI::Driver::File compatibile
+	'default_expires_in' => 24*60*60, # Cache::Cache compatibile
+	'expires_in'         => 24*60*60, # CHI compatibile
+	'expire_time'        => 24*60*60, # Cache::FastMmap compatibile
+);
+
+mkdir($cache_root);
+
+my %caches;
+
+sub add_cache {
+	my ($caches, $cache_key, $module, %extra_opts) = @_;
+	my $namespace = $module;
+	$namespace =~ s/::/-/g;
+
+	$caches->{$cache_key}{'module'} = $module;
+	$caches->{$cache_key}{'start'} = sub {
+		$module->new({
+			'namespace' => $namespace,
+			%cache_options,
+			%extra_opts,
+		});
+	};
+	my $cache = $caches->{$cache_key}{'cache'}
+		= $caches->{$cache_key}{'start'}->();
+	$caches->{$cache_key}{'set'} = sub {
+		$cache->set($key, $value);
+	};
+	$caches->{$cache_key}{'get'} = sub {
+		$cache->get($key);
+	};
+	$caches->{$cache_key}{'setup+set'} = sub {
+		$caches->{$cache_key}{'start'}->();
+		$caches->{$cache_key}{'set'}->();
+	};
+	$caches->{$cache_key}{'setup+get'} = sub {
+		$caches->{$cache_key}{'start'}->();
+		$caches->{$cache_key}{'get'}->();
+	};
+}
+
+add_cache(\%caches, 'SimpleFileCache', 'GitwebCache::SimpleFileCache');
+
+# assume that all Cache::Cache modules are available if one of them is
+if (eval { require Cache::FileCache; 1; }) {
+	add_cache(\%caches, 'C::FileCache',   'Cache::FileCache');
+
+	# for reference
+	require Cache::MemoryCache;
+	add_cache(\%caches, 'C::MemoryCache', 'Cache::MemoryCache');
+	require Cache::NullCache;
+	add_cache(\%caches, 'C::NullCache',   'Cache::NullCache');
+}
+
+if (eval { require Cache::FastMmap; 1 }) {
+	add_cache(\%caches, 'FastMmap', 'Cache::FastMmap',
+		'share_file' => $cache_root.'/Cache-FastMmap',
+		'init_file' => 1,  # clear any exiting values and re-initialize file
+		'raw_values' => 1, # don't freeze/thaw (serialize) data
+	);
+	add_cache(\%caches, 'FastMmap (S)', 'Cache::FastMmap',
+		'share_file' => $cache_root.'/Cache-FastMmap',
+		'init_file' => 1,  # clear any exiting values and re0initialize file
+	);
+}
+
+my %codehash;
+my $count = -10;
+my $result_set;
+
+print '$cache->set($key, $value)'."\n";
+%codehash = map { $_ => $caches{$_}{'set'} } keys %caches;
+$result_set = timethese($count, \%codehash);
+cmpthese($result_set);
+print "\n";
+
+print '$cache->get($key)'."\n";
+%codehash = map { $_ => $caches{$_}{'get'} } keys %caches;
+$result_set = timethese($count, \%codehash);
+cmpthese($result_set);
+print "\n";
+
+## Cache::FastMmap shouldn't use "'init_file' => 1" for this
+#
+# print '$cache->new(...) + $cache->set($key, $value)'."\n";
+# %codehash = map { $_ => $caches{$_}{'setup+set'} } keys %caches;
+# $result_set = timethese($count, \%codehash);
+# cmpthese($result_set);
+# print "\n";
+#
+# print '$cache->new(...) + $cache->get($key)'."\n";
+# %codehash = map { $_ => $caches{$_}{'setup+get'} } keys %caches;
+# $result_set = timethese($count, \%codehash);
+# cmpthese($result_set);
+# print "\n";
+
+rmtree($cache_root);
+
+1;
+__END__
+## EXAMPLE OUTPUT ##
+#
+# Cache::FastMmap version 1.35
+# Cache::Cache (Cache::FileCache, Cache::MemoryCache) version 1.05
+#
+# $cache->set($key, $value)
+# Benchmark: running Cache::FastMmap, Cache::FileCache, Cache::MemoryCache, GitwebCache
+#   for at least 10 CPU seconds...
+# Cache::FastMmap:    11 wallclock secs ( 8.46 usr +  2.17 sys = 10.63 CPU) @ 15710.25/s (n=167000)
+# Cache::FileCache:   15 wallclock secs ( 8.43 usr +  2.25 sys = 10.68 CPU) @ 356.27/s   (n=3805)
+# Cache::MemoryCache: 13 wallclock secs ( 9.91 usr +  0.09 sys = 10.00 CPU) @ 3306.20/s  (n=33062)
+# GitwebCache:        29 wallclock secs ( 7.02 usr +  3.47 sys = 10.49 CPU) @ 605.91/s   (n=6356)
+#                       Rate Cache::FileCache GitwebCache Cache::MemoryCache Cache::FastMmap
+# Cache::FileCache     356/s               --        -41%               -89%            -98%
+# GitwebCache          606/s              70%          --               -82%            -96%
+# Cache::MemoryCache  3306/s             828%        446%                 --            -79%
+# Cache::FastMmap    15710/s            4310%       2493%               375%              --
+#
+# $cache->get($key)  # $key exists in cache
+# Benchmark: running Cache::FastMmap, Cache::FileCache, Cache::MemoryCache, GitwebCache
+#   for at least 10 CPU seconds...
+# Cache::FastMmap:    13 wallclock secs ( 7.32 usr +  2.83 sys = 10.15 CPU) @ 24260.59/s (n=246245)
+# Cache::FileCache:   12 wallclock secs ( 9.22 usr +  1.30 sys = 10.52 CPU) @ 972.62/s   (n=10232)
+# Cache::MemoryCache: 14 wallclock secs ( 9.89 usr +  0.12 sys = 10.01 CPU) @ 3679.52/s  (n=36832)
+# GitwebCache:        20 wallclock secs ( 8.16 usr +  2.36 sys = 10.52 CPU) @ 4401.05/s  (n=46299)
+#                       Rate Cache::FileCache Cache::MemoryCache GitwebCache Cache::FastMmap
+# Cache::FileCache     973/s               --               -74%        -78%            -96%
+# Cache::MemoryCache  3680/s             278%                 --        -16%            -85%
+# GitwebCache         4401/s             352%                20%          --            -82%
+# Cache::FastMmap    24261/s            2394%               559%        451%              --
+
+## EXAMPLE OUTPUT (new version of test, modified SimpleFileCache) ##
+#
+## like for non-persistent environment
+#
+# $cache->new(...) + $cache->set($key, $value)
+# Benchmark: running C::FileCache, C::MemoryCache, C::NullCache, SimpleFileCache
+#   for at least 10 CPU seconds...
+#                    Rate C::FileCache SimpleFileCache C::MemoryCache C::NullCache
+# C::FileCache      271/s           --            -47%           -85%         -99%
+# SimpleFileCache   510/s          88%              --           -71%         -98%
+# C::MemoryCache   1766/s         553%            246%             --         -94%
+# C::NullCache    29119/s       10660%           5612%          1549%           --
+#
+# $cache->new(...) + $cache->get($key)
+# Benchmark: running C::FileCache, C::MemoryCache, C::NullCache, SimpleFileCache
+#   for at least 10 CPU seconds...
+#                    Rate C::FileCache C::MemoryCache SimpleFileCache C::NullCache
+# C::FileCache      510/s           --           -73%            -82%         -98%
+# C::MemoryCache   1905/s         273%             --            -32%         -93%
+# SimpleFileCache  2806/s         450%            47%              --         -90%
+# C::NullCache    28626/s        5509%          1402%            920%           --
+#
+## like for persistent environment
+#
+# $cache->set($key, $value)
+# Benchmark: running C::FileCache, C::MemoryCache, C::NullCache,
+#   FastMmap, FastMmap (S), SimpleFileCache for at least 10 CPU seconds...
+#                     Rate C::FileCache SimpleFileCache C::MemoryCache FastMmap (S) FastMmap C::NullCache
+# C::FileCache       309/s           --            -46%           -90%         -94%     -98%        -100%
+# SimpleFileCache    574/s          86%              --           -82%         -88%     -96%        -100%
+# C::MemoryCache    3168/s         925%            451%             --         -34%     -78%         -99%
+# FastMmap (S)      4828/s        1462%            740%            52%           --     -66%         -99%
+# FastMmap         14289/s        4524%           2387%           351%         196%       --         -98%
+# C::NullCache    572686/s      185208%          99586%         17976%       11761%    3908%           --
+#
+# $cache->get($key)  # $key exists in cache
+# Benchmark: running C::FileCache, C::MemoryCache, C::NullCache,
+#   FastMmap, FastMmap (S), SimpleFileCache for at least 10 CPU seconds...
+#                     Rate C::FileCache SimpleFileCache C::MemoryCache FastMmap (S) FastMmap C::NullCache
+# C::FileCache       830/s           --           -77%            -79%         -94%     -96%        -100%
+# C::MemoryCache    3539/s         326%             --            -12%         -75%     -84%         -99%
+# SimpleFileCache   4040/s         387%            14%              --         -72%     -82%         -99%
+# FastMmap (S)     14367/s        1631%           306%            256%           --     -35%         -97%
+# FastMmap         22247/s        2580%           529%            451%          55%       --         -96%
+# C::NullCache    546698/s       65759%         15348%          13431%        3705%    2357%           --
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PoC PATCHv5 21/17] gitweb/lib - Alternate ways of capturing output
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (19 preceding siblings ...)
  2010-10-06 22:02 ` [PoC PATCHv5 20/17] gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/) Jakub Narebski
@ 2010-10-06 22:02 ` Jakub Narebski
  2010-10-10 20:32 ` [RFD] Possible improvements for output caching in gitweb Jakub Narebski
  2010-10-24 21:34 ` [PATCHv5 00/17] gitweb: Simple file based output caching J.H.
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 22:02 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, Petr Baudis, admin,
	Jakub Narebski

Besides GitwebCache::Capture::SelectFH, which uses select(FH) to
redirect 'print LIST' and 'printf FORMAT, LIST' to in-memory file to
capture output, add GitwebCache::Capture::TiedCapture which uses
tie-ing filehandle to capture output, and GitwebCache::Capture::PerlIO
which uses push_layer method from non-core PerlIO::Util module to
capture output.

Add test (which can be run standalone) for all those implementations,
checking ':utf8' and ':raw' output, and benchmark comparing them
(includes example benchmark tests).  Please note that the test for
alternate implementations is not run from t9504 test.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Formerly as patch 08/17 in previous version of series, now marked PoC
("proof of concept"), and moved to the end of series.

Differences from v4:
* GitwebCache::Capture::TiedCapture has now two (sub)versions: one
  appending data from tied operations to a string (after conversion)
  via TiedCapture::String (default), other redirecting tied operations
  to save to in-memory file via TiedCapture::PerlIO (new).

  Add tests for new TiedCapture::PerlIO.

* Tie::Restore (non-core module from CPAN) is now in separate file;
  should be probably in 'inc/' and not in 'lib/'.

* Change name of field in TiedCapture::* from 'mode' to 'binmode'.

* New example results of benchmark.

* Add test checking that all those implementations work correctly for
  capturing both ':utf8' and ':raw' printed data.

 gitweb/lib/GitwebCache/Capture/PerlIO.pm      |   79 +++++++++
 gitweb/lib/GitwebCache/Capture/TiedCapture.pm |   76 +++++++++
 gitweb/lib/Tie/Restore.pm                     |   24 +++
 gitweb/lib/TiedCapture/PerlIO.pm              |   56 ++++++
 gitweb/lib/TiedCapture/String.pm              |   53 ++++++
 t/t9504/benchmark_capture_implementations.pl  |  226 +++++++++++++++++++++++++
 t/t9504/test_capture_implementations.pl       |   85 +++++++++
 7 files changed, 599 insertions(+), 0 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/Capture/PerlIO.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture/TiedCapture.pm
 create mode 100644 gitweb/lib/Tie/Restore.pm
 create mode 100644 gitweb/lib/TiedCapture/PerlIO.pm
 create mode 100644 gitweb/lib/TiedCapture/String.pm
 create mode 100755 t/t9504/benchmark_capture_implementations.pl
 create mode 100755 t/t9504/test_capture_implementations.pl

diff --git a/gitweb/lib/GitwebCache/Capture/PerlIO.pm b/gitweb/lib/GitwebCache/Capture/PerlIO.pm
new file mode 100644
index 0000000..199aeed
--- /dev/null
+++ b/gitweb/lib/GitwebCache/Capture/PerlIO.pm
@@ -0,0 +1,79 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Output capturing using PerlIO layers
+#
+
+# This module requires PaerlIO::Util installed.
+
+package GitwebCache::Capture::PerlIO;
+
+use PerlIO::Util;
+
+use strict;
+use warnings;
+
+use base qw(GitwebCache::Capture);
+use GitwebCache::Capture qw(:all);
+
+use Exporter qw(import);
+our @EXPORT      = @GitwebCache::Capture::EXPORT;
+our @EXPORT_OK   = @GitwebCache::Capture::EXPORT_OK;
+our %EXPORT_TAGS = %GitwebCache::Capture::EXPORT_TAGS;
+
+# Constructor
+sub new {
+	my $proto = shift;
+
+	my $class = ref($proto) || $proto;
+	my $self  = {};
+	$self = bless($self, $class);
+
+	$self->{'data'} = '';
+
+	return $self;
+}
+
+# Start capturing data (STDOUT)
+# (printed using 'print <sth>' or 'printf <sth>')
+sub start {
+	my $self = shift;
+
+	$self->{'data'}    = '';
+	*STDOUT->push_layer('scalar' => \$self->{'data'});
+
+	# push ':utf8' on top, if it was on top
+	*STDOUT->push_layer(':utf8')
+		if ((*STDOUT->get_layers())[-2] eq 'utf8');
+}
+
+# Stop capturing data (required for die_error)
+sub stop {
+	my $self = shift;
+
+	# return if we didn't start capturing
+	my @layers = *STDOUT->get_layers();
+	return unless grep { $_ eq 'scalar' } @layers;
+
+	my $was_utf8 = $layers[-1] eq 'utf8';
+	# stop saving to scalar, i.e. remove topmost 'scalar' layer,
+	# but remember that 'utf8' layer might be on top of it
+	while ((my $layer = *STDOUT->pop_layer())) {
+		pop @layers;
+		last if $layer eq 'scalar';
+	}
+	# restore ':utf8' mode, if needed
+	if ($was_utf8 && $layers[-1] ne 'utf8') {
+		*STDOUT->push_layer('utf8');
+	}
+
+	return $self->{'data'};
+}
+
+1;
+__END__
+# end of package GitwebCache::Capture::PerlIO;
diff --git a/gitweb/lib/GitwebCache/Capture/TiedCapture.pm b/gitweb/lib/GitwebCache/Capture/TiedCapture.pm
new file mode 100644
index 0000000..6bed0f8
--- /dev/null
+++ b/gitweb/lib/GitwebCache/Capture/TiedCapture.pm
@@ -0,0 +1,76 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Simple output capturing by tie-ing filehandle
+#
+
+package GitwebCache::Capture::TiedCapture;
+
+use PerlIO;
+
+use strict;
+use warnings;
+
+use base qw(GitwebCache::Capture);
+use GitwebCache::Capture qw(:all);
+
+use Exporter qw(import);
+our @EXPORT      = @GitwebCache::Capture::EXPORT;
+our @EXPORT_OK   = @GitwebCache::Capture::EXPORT_OK;
+our %EXPORT_TAGS = %GitwebCache::Capture::EXPORT_TAGS;
+
+# Constructor
+sub new {
+	my $proto = shift;
+
+	my $class = ref($proto) || $proto;
+	my $self  = {};
+	$self = bless($self, $class);
+
+	$self->{'data'} = '';
+	$self->{'tied'} = undef;
+	$self->{'tie_class'} = shift || 'TiedCapture::String';
+	eval "require $self->{'tie_class'}";
+
+	return $self;
+}
+
+# Start capturing data (STDOUT)
+# (printed using 'print <sth>' or 'printf <sth>')
+sub start {
+	my $self = shift;
+
+	# savie tie
+	$self->{'tied'} = tied *STDOUT;
+
+	$self->{'data'} = '';
+	tie *STDOUT, $self->{'tie_class'}, \$self->{'data'};
+
+	# re-binmode, so that tied class would pick it up
+	binmode STDOUT,
+		(PerlIO::get_layers(*STDOUT))[-1] eq 'utf8' ? ':utf8' : ':raw';
+}
+
+# Stop capturing data (required for die_error)
+sub stop {
+	my $self = shift;
+
+	# return if we didn't start capturing
+	return unless tied(*STDOUT)->isa($self->{'tie_class'});
+
+	# restore ties, if there were any
+	untie *STDOUT;
+	if ($self->{'tied'}) {
+		tie *STDOUT, 'Tie::Restore', $self->{'tied'};
+	}
+
+	return $self->{'data'};
+}
+
+1;
+__END__
+# end of package GitwebCache::Capture::TiedCapture;
diff --git a/gitweb/lib/Tie/Restore.pm b/gitweb/lib/Tie/Restore.pm
new file mode 100644
index 0000000..687434e
--- /dev/null
+++ b/gitweb/lib/Tie/Restore.pm
@@ -0,0 +1,24 @@
+########################################################################
+# This package should probably be put in `gitweb/inc/' instead
+#
+# taken from http://search.cpan.org/~robwalker/Tie-Restore-0.11/Restore.pm
+# with POD documentation stripped out
+
+package Tie::Restore;
+# Written by Robby Walker ( webmaster@pointwriter.com )
+# for Point Writer ( http://www.pointwriter.com/ ).
+
+our $VERSION = '0.11';
+$VERSION = eval $VERSION;
+
+# $object = tied %hash;                # save
+# tie %hash, 'Tie::Restore', $object;  # restore
+
+sub TIESCALAR { $_[1] }
+sub TIEARRAY  { $_[1] }
+sub TIEHASH   { $_[1] }
+sub TIEHANDLE { $_[1] }
+
+1;
+__END__
+# end of package Tie::Restore
diff --git a/gitweb/lib/TiedCapture/PerlIO.pm b/gitweb/lib/TiedCapture/PerlIO.pm
new file mode 100644
index 0000000..4bbd724
--- /dev/null
+++ b/gitweb/lib/TiedCapture/PerlIO.pm
@@ -0,0 +1,56 @@
+########################################################################
+
+package TiedCapture::PerlIO;
+
+our $VERSION = '0.001';
+$VERSION = eval $VERSION;
+
+use strict;
+use warnings;
+
+use PerlIO;
+
+sub TIEHANDLE {
+	my ($proto, $dataref) = @_;
+	my $class = ref($proto) || $proto;
+	my $self = {};
+	$self = bless($self, $class);
+	$self->{'scalar'} = $dataref;
+	$self->{'binmode'} = ':utf8';
+
+	$self->{'scalar_fh'} = undef;
+	open $self->{'scalar_fh'}, '>', $self->{'scalar'}
+		or die "Couldn't open in-memory file for capture: $!";
+
+	return $self;
+}
+
+sub WRITE {
+	my $self = shift;
+	syswrite $self->{'scalar_fh'}, @_;
+}
+
+sub PRINT {
+	my $self = shift;
+	print { $self->{'scalar_fh'} } @_;
+}
+
+sub PRINTF {
+	my $self = shift;
+	printf { $self->{'scalar_fh'} } @_;
+}
+
+sub BINMODE {
+	my $self = shift;
+	$self->{'binmode'} = shift || ':raw';
+	binmode $self->{'scalar_fh'}, $self->{'binmode'};
+}
+
+#sub UNTIE {
+#	close $self->{'scalar_fh'};
+#	$self->{'scalar_fh'} = undef;
+#}
+
+1;
+__END__
+# end of package TiedCapture::PerlIO
diff --git a/gitweb/lib/TiedCapture/String.pm b/gitweb/lib/TiedCapture/String.pm
new file mode 100644
index 0000000..72b15a7
--- /dev/null
+++ b/gitweb/lib/TiedCapture/String.pm
@@ -0,0 +1,53 @@
+########################################################################
+
+package TiedCapture::String;
+
+our $VERSION = '0.001';
+$VERSION = eval $VERSION;
+
+use strict;
+use warnings;
+
+sub TIEHANDLE {
+	my ($proto, $dataref) = @_;
+	my $class = ref($proto) || $proto;
+	my $self = {};
+	$self = bless($self, $class);
+	$self->{'scalar'} = $dataref;
+	$self->{'binmode'} = ':utf8';
+	return $self;
+}
+
+sub append_str {
+	my ($self, $str) = @_;
+	utf8::encode($str) if ($self->{'binmode'} eq ':utf8');
+	${$self->{'scalar'}} .= $str;
+}
+
+sub WRITE {
+	my ($self, $buffer, $length, $offset) = @_;
+	$self->append_str(substr($buffer, $offset, $length));
+}
+
+sub PRINT {
+	my $self = shift;
+	$self->append_str(join('',@_));
+}
+
+sub PRINTF {
+	my $self = shift;
+	$self->append_str(sprintf(@_));
+}
+
+sub BINMODE {
+	my $self = shift;
+	$self->{'binmode'} = shift || ':raw';
+}
+
+#sub UNTIE {
+#	local $^W = 0;
+#}
+
+1;
+__END__
+# end of package TiedCapture::String
diff --git a/t/t9504/benchmark_capture_implementations.pl b/t/t9504/benchmark_capture_implementations.pl
new file mode 100755
index 0000000..588c1dc
--- /dev/null
+++ b/t/t9504/benchmark_capture_implementations.pl
@@ -0,0 +1,226 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use File::Spec;
+use File::Path;
+use Benchmark qw(:all);
+
+use PerlIO;
+
+# benchmark source version
+sub __DIR__ () {
+	File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
+}
+use lib __DIR__."/../../gitweb/lib";
+
+# ....................................................................
+
+# Load modules (without importing)
+#
+my @modules =
+	map { "GitwebCache::Capture::$_" }
+	qw(SelectFH TiedCapture);
+foreach my $mod (@modules) {
+	eval "require $mod";
+}
+if (eval { require PerlIO::Util; 1 }) {
+	require GitwebCache::Capture::PerlIO;
+	push @modules, 'GitwebCache::Capture::PerlIO';
+}
+
+# Set up capturing, for each module
+#
+my @captures = map { $_->new() } @modules;
+push @captures, GitwebCache::Capture::TiedCapture->new('TiedCapture::PerlIO');
+
+
+my $test_data = <<'EOF';
+Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
+eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
+minim veniam, quis nostrud exercitation ullamco laboris nisi ut
+aliquip ex ea commodo consequat. Duis aute irure dolor in
+reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
+pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
+culpa qui officia deserunt mollit anim id est laborum.
+
+Sed ut perspiciatis unde omnis iste natus error sit voluptatem
+accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae
+ab illo inventore veritatis et quasi architecto beatae vitae dicta
+sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit
+aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos
+qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui
+dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed
+quia non numquam eius modi tempora incidunt ut labore et dolore magnam
+aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum
+exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex
+ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in
+ea voluptate velit esse quam nihil molestiae consequatur, vel illum
+qui dolorem eum fugiat quo voluptas nulla pariatur?
+EOF
+
+my @captured_output;
+my $repeat = 100;
+sub capture_output {
+	my ($class, $mode) = @_;
+
+	$class->start();
+	binmode select(), $mode if defined($mode);
+	print $test_data for (1..$repeat);
+
+	return $class->stop();
+}
+
+my %codehash;
+for (my $i = 0; $i < @captures; $i++) {
+	my $capture = $captures[$i];
+	my $name = ref($capture);
+	$name =~ s/^.*:://;
+	$name .= " ($captures[$i]->{'tie_class'})"
+		if $captures[$i]->{'tie_class'};
+	$name =~ s/TiedCapture:://;
+
+	$codehash{$name} = sub { $captured_output[$i] = capture_output($capture) };
+}
+
+# ....................................................................
+
+my $test_other_modules = 0;
+
+if ($test_other_modules) {
+
+	if (eval { require Capture::Tiny; 1; }) {
+		$codehash{'Capture::Tiny'} = sub {
+			my ($stdout, $stderr) = Capture::Tiny::capture(sub {
+				print $test_data for (1..$repeat);
+			});
+			print STDERR $stderr if defined($stderr);
+		};
+	}
+
+	if (eval { require IO::CaptureOutput; 1; }) {
+		$codehash{'IO::CaptureOutput'} = sub {
+			my ($stdout, $stderr);
+			IO::CaptureOutput::capture(sub {
+				print $test_data for (1..$repeat);
+			}, \$stdout, \$stderr);
+			print STDERR $stderr if defined($stderr);
+		};
+		# somehow it interferes with capturing in GitwebCache::Capture::PerlIO
+		delete $codehash{'PerlIO'};
+	}
+
+	if (eval { require IO::Capture::Stdout; 1; }) {
+		$codehash{'IO::Capture'} = sub {
+			my $capture = IO::Capture::Stdout->new();
+
+			$capture->start();
+			print $test_data for (1..$repeat);
+			$capture->stop();
+
+			my $captured_output = join('', $capture->read());
+		};
+	}
+} # end if ($test_other_modules)
+
+# ....................................................................
+
+print "Capturing $repeat x ".length($test_data).
+      " = ".($repeat * length($test_data))." characters\n";
+my $count = -10; # CPU seconds
+my $result = timethese($count, \%codehash);
+cmpthese($result);
+
+#if (exists $codehash{PerlIO}) {
+#	cmpthese(-10, {
+#		'PerlIO::get_layers'  => sub { PerlIO::get_layers(*STDOUT); },
+#		'PerlIO::Util method' => sub { *STDOUT->get_layers(); },
+#	});
+#}
+
+1;
+__END__
+## EXAMPLE OUTPUT ##
+#
+## 1 x $test_data, PerlIO using *STDOUT->get_layers();
+# Benchmark: running PerlIO, SelectFH, TiedCapture for at least 10 CPU seconds...
+#      PerlIO:  9 wallclock secs (10.38 usr +  0.13 sys = 10.51 CPU) @  9676.31/s (n=101698)
+#    SelectFH: 12 wallclock secs (10.51 usr +  0.02 sys = 10.53 CPU) @ 12294.21/s (n=129458)
+# TiedCapture: 10 wallclock secs (10.24 usr +  0.06 sys = 10.30 CPU) @  9489.22/s (n=97739)
+#                Rate TiedCapture      PerlIO    SelectFH
+# TiedCapture  9489/s          --         -2%        -23%
+# PerlIO       9676/s          2%          --        -21%
+# SelectFH    12294/s         30%         27%          --
+#
+## 10 x $test_data, PerlIO using *STDOUT->get_layers();
+# Benchmark: running PerlIO, SelectFH, TiedCapture for at least 10 CPU seconds...
+#      PerlIO:  9 wallclock secs (10.47 usr +  0.07 sys = 10.54 CPU) @ 7558.35/s (n=79665)
+#    SelectFH: 11 wallclock secs (10.36 usr +  0.04 sys = 10.40 CPU) @ 8970.87/s (n=93297)
+# TiedCapture: 11 wallclock secs (10.45 usr +  0.02 sys = 10.47 CPU) @ 2602.77/s (n=27251)
+#               Rate TiedCapture      PerlIO    SelectFH
+# TiedCapture 2603/s          --        -66%        -71%
+# PerlIO      7558/s        190%          --        -16%
+# SelectFH    8971/s        245%         19%          --
+#
+## 100 x $test_data, PerlIO using *STDOUT->get_layers();
+# Benchmark: running PerlIO, SelectFH, TiedCapture for at least 50 CPU seconds...
+#      PerlIO: 67 wallclock secs (35.28 usr + 17.82 sys = 53.10 CPU) @ 832.41/s (n=44201)
+#    SelectFH: 73 wallclock secs (33.83 usr + 18.63 sys = 52.46 CPU) @ 830.06/s (n=43545)
+# TiedCapture: 71 wallclock secs (50.93 usr +  0.41 sys = 51.34 CPU) @  95.31/s (n=4893)
+#               Rate TiedCapture    SelectFH      PerlIO
+# TiedCapture 95.3/s          --        -89%        -89%
+# SelectFH     830/s        771%          --         -0%
+# PerlIO       832/s        773%          0%          --
+#
+## 100 x $test_data, PerlIO using mix of *STDOUT->get_layers() and PerlIO::get_layers(*STDOUT);
+# Capturing 100 x 1314 = 131400 characters
+# Benchmark: timing 25000 iterations of PerlIO, SelectFH, TiedCapture...
+#      PerlIO:  30 wallclock secs  (19.05 usr + 10.29 sys =  29.34 CPU) @ 852.08/s (n=25000)
+#    SelectFH:  30 wallclock secs  (18.95 usr + 10.26 sys =  29.21 CPU) @ 855.87/s (n=25000)
+# TiedCapture: 307 wallclock secs (267.37 usr +  2.95 sys = 270.32 CPU) @  92.48/s (n=25000)
+#               Rate TiedCapture      PerlIO    SelectFH
+# TiedCapture 92.5/s          --        -89%        -89%
+# PerlIO       852/s        821%          --         -0%
+# SelectFH     856/s        825%          0%          --
+#
+## 100 x $test_data (IO::CaptureOutput interferes with GitwebCache::Capture::PerlIO)
+# Capturing 100 x 1314 = 131400 characters
+# Benchmark: running IO::CaptureOutput, SelectFH, TiedCapture for at least 10 CPU seconds...
+# IO::CaptureOutput: 12 wallclock secs ( 5.12 usr +  5.63 sys = 10.75 CPU) @ 126.60/s (n=1361)
+#          SelectFH: 12 wallclock secs ( 6.93 usr +  3.45 sys = 10.38 CPU) @ 808.29/s (n=8390)
+#       TiedCapture: 11 wallclock secs (10.11 usr +  0.01 sys = 10.12 CPU) @ 103.26/s (n=1045)
+#                    Rate       TiedCapture IO::CaptureOutput          SelectFH
+# TiedCapture       103/s                --              -18%              -87%
+# IO::CaptureOutput 127/s               23%                --              -84%
+# SelectFH          808/s              683%              538%                --
+#
+## PerlIO::get_layers   == PerlIO::get_layers(*STDOUT)
+## PerlIU::Util method  == *STDOUT->get_layers()
+#                        Rate PerlIO::Util method  PerlIO::get_layers
+# PerlIO::Util method 54405/s                  --                -38%
+# PerlIO::get_layers  87672/s                 61%                  --
+
+##
+# Capturing 100 x 1314 = 131400 characters
+# Benchmark: running PerlIO, SelectFH, TiedCapture (PerlIO), TiedCapture (String)
+#   for at least 10 CPU seconds...
+#                        Rate TiedCapture (String) TiedCapture (PerlIO) SelectFH PerlIO
+# TiedCapture (String) 96.5/s                   --                 -76%     -88%   -88%
+# TiedCapture (PerlIO)  407/s                 322%                   --     -48%   -48%
+# SelectFH              787/s                 715%                  93%       --    -0%
+# PerlIO                789/s                 717%                  94%       0%     --
+#
+# comment: you can see effects of perltie overhead and repeated string concatenation here.
+
+##
+#
+# Capturing 100 x 1314 = 131400 characters
+# Benchmark: running IO::CaptureOutput, SelectFH, TiedCapture (PerlIO), TiedCapture (String)
+#   for at least 10 CPU seconds...
+#                       Rate TiedCapture (String) IO::CaptureOutput TiedCapture (PerlIO) SelectFH
+# TiedCapture (String) 109/s                   --               -4%                 -72%     -84%
+# IO::CaptureOutput    114/s                   4%                --                 -70%     -84%
+# TiedCapture (PerlIO) 384/s                 253%              237%                   --     -45%
+# SelectFH             693/s                 536%              509%                  80%       --
diff --git a/t/t9504/test_capture_implementations.pl b/t/t9504/test_capture_implementations.pl
new file mode 100755
index 0000000..86796ac
--- /dev/null
+++ b/t/t9504/test_capture_implementations.pl
@@ -0,0 +1,85 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use File::Spec;
+
+use Test::More;
+
+# test source version
+#use if defined($ENV{TEST_DIRECTORY}),
+#	lib => "$ENV{TEST_DIRECTORY}/../gitweb/lib";
+sub __DIR__ () {
+	File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
+}
+use lib __DIR__."/../../gitweb/lib";
+
+# ....................................................................
+
+# Load modules
+my @modules =
+	map { "GitwebCache::Capture::$_" }
+	qw(SelectFH TiedCapture);
+require_ok($_) foreach @modules;
+if (eval { require PerlIO::Util; 1 }) {
+	require_ok('GitwebCache::Capture::PerlIO');
+	unshift @modules, 'GitwebCache::Capture::PerlIO';
+}
+
+# Test setting up capture
+#
+my @captures = map { new_ok($_ => []) } @modules;
+push @captures, new_ok('GitwebCache::Capture::TiedCapture' => ['TiedCapture::PerlIO']);
+isa_ok($_, 'GitwebCache::Capture', ref($_)) foreach @captures;
+
+# Test capturing
+#
+diag('Should not print anything except test results and diagnostic');
+
+my $test_data;
+my @captured_output;
+sub capture {
+	my ($class, $mode) = @_;
+
+	$class->start();
+	binmode select(), $mode if defined($mode);
+	print $test_data;
+	return $class->stop();
+}
+sub test_captures {
+	my $mode = shift;
+
+	@captured_output = map { capture($_, $mode); } @captures;
+	if ($mode eq ':utf8') {
+		utf8::decode($_) foreach @captured_output;
+	}
+	for (my $i = 0; $i < @captures; $i++) {
+		my $name = ref($captures[$i]);
+		$name .= " ($captures[$i]->{'tie_class'})"
+			if $captures[$i]->{'tie_class'};
+		my $output = $captured_output[$i];
+		is($output, $test_data, "$name captures $mode data");
+	}
+}
+
+
+binmode STDOUT, ':utf8';
+$test_data = 'Zażółć gęsią jaźń';
+utf8::decode($test_data);
+#diag("\$test_data = $test_data (decoded)\n");
+ok(utf8::is_utf8($test_data), '$test_data is utf8  (utf8::is_utf8)');
+ok(utf8::valid($test_data),   '$test_data is valid (utf8::valid)');
+test_captures(':utf8');
+
+$test_data = '|\x{fe}\x{ff}|\x{9F}|\000|'; # invalid utf-8
+ok(!utf8::is_utf8($test_data), '$test_data is not utf8 (utf8::is_utf8)');
+ok(utf8::valid($test_data),    '$test_data is valid    (utf8::valid)');
+test_captures(':raw');
+
+done_testing();
+
+# Local Variables:
+# encoding: utf-8
+# End:
-- 
1.7.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 03/17] gitweb/lib - Very simple file based cache
  2010-10-06 22:01 ` [PATCHv5 03/17] gitweb/lib - Very simple file based cache Jakub Narebski
@ 2010-10-06 22:41   ` Thomas Adam
  2010-10-06 22:44     ` Ævar Arnfjörð Bjarmason
  2010-10-06 23:00     ` Jakub Narebski
  2010-10-06 22:57   ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 41+ messages in thread
From: Thomas Adam @ 2010-10-06 22:41 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

On 6 October 2010 23:01, Jakub Narebski <jnareb@gmail.com> wrote:
> +# creates get_depth() and set_depth($depth) etc. methods
> +foreach my $i (qw(depth root namespace)) {
> +       my $field = $i;
> +       no strict 'refs';

For each item, you'll set "no strict refs"?  This might be better off
outside the loop.  It's still scoped appropriately inside the
subroutine.

> +       my $file = $self->path_to_key($key);
> +       return undef unless (defined $file && -f $file);

PBP (Perl Best Practises) will tell you that explicitly returning
undef is discouraged -- "undef" should be reserved for those errors
you cannot handle, not ones you don't want to.

> +
> +       # Fast slurp, adapted from File::Slurp::read, with unnecessary options removed
> +       # via CHI::Driver::File (from CHI-0.33)
> +       my $buf = '';
> +       open my $read_fh, '<', $file
> +               or return undef;

Ditto.

> +       binmode $read_fh, ':raw';
> +
> +       my $size_left = -s $read_fh;
> +
> +       while ($size_left > 0) {
> +               my $read_cnt = sysread($read_fh, $buf, $size_left, length($buf));
> +               return undef unless defined $read_cnt;

Ditto.

> +               last if $read_cnt == 0;
> +               $size_left -= $read_cnt;
> +               #last if $size_left <= 0;
> +       }
> +
> +       close $read_fh
> +               or die "Couldn't close file '$file' opened for reading: $!";
> +       return $buf;
> +}

"use Carp;" would be more useful here, and hence croak() and confess().

> +sub store {
> +       my ($self, $key, $data) = @_;
> +
> +       my $dir;
> +       my $file = $self->path_to_key($key, \$dir);
> +       return undef unless (defined $file && defined $dir);

See above.

> +       # ensure that directory leading to cache file exists
> +       if (!-d $dir) {
> +               eval { mkpath($dir, 0, 0777); 1 }
> +                       or die "Couldn't create leading directory '$dir' (mkpath): $!";
> +       }

Why is this eval()ed?  It will still return false and set $! appropriately.

> +       # generate a temporary file
> +       my ($temp_fh, $tempname) = $self->_tempfile_to_path($file, $dir);
> +       chmod 0666, $tempname
> +               or warn "Couldn't change permissions to 0666 / -rw-rw-rw- for '$tempname': $!";
> +
> +       # Fast spew, adapted from File::Slurp::write, with unnecessary options removed
> +       # via CHI::Driver::File (from CHI-0.33)
> +       my $write_fh = $temp_fh;
> +       binmode $write_fh, ':raw';
> +
> +       my $size_left = length($data);
> +       my $offset = 0;
> +
> +       while ($size_left > 0) {
> +               my $write_cnt = syswrite($write_fh, $data, $size_left, $offset);
> +               return undef unless defined $write_cnt;

Again, with the undef.

> +               $size_left -= $write_cnt;
> +               $offset += $write_cnt; # == length($data);
> +       }
> +
> +       close $temp_fh
> +               or die "Couldn't close temporary file '$tempname' opened for writing: $!";
> +       rename($tempname, $file)
> +               or die "Couldn't rename temporary file '$tempname' to '$file': $!";
> +}
> +
> +# get size of an element associated with the $key (not the size of whole cache)
> +sub get_size {
> +       my ($self, $key) = @_;
> +
> +       my $path = $self->path_to_key($key)
> +               or return undef;

Again with the undef

> +       if (-f $path) {
> +               return -s $path;
> +       }
> +       return 0;
> +}
> +
> +# ......................................................................
> +# interface methods
> +
> +# Removing and expiring
> +
> +# $cache->remove($key)
> +#
> +# Remove the data associated with the $key from the cache.
> +sub remove {
> +       my ($self, $key) = @_;
> +
> +       my $file = $self->path_to_key($key)
> +               or return undef;
> +       return undef unless -f $file;

Again with the undef.

> +       unlink($file)
> +               or die "Couldn't remove file '$file': $!";
> +}
> +
> +# Getting and setting
> +
> +# $cache->set($key, $data);
> +#
> +# Associates $data with $key in the cache, overwriting any existing entry.
> +# Returns $data.
> +sub set {
> +       my ($self, $key, $data) = @_;
> +
> +       return unless (defined $key && defined $data);

return what?

> +       $self->store($key, $data);
> +
> +       return $data;
> +}
> +
> +# $data = $cache->get($key);
> +#
> +# Returns the data associated with $key.  If $key does not exist
> +# or has expired, returns undef.
> +sub get {
> +       my ($self, $key) = @_;
> +
> +       my $data = $self->fetch($key)
> +               or return undef;
> +
> +       return $data;
> +}
> +
> +# $data = $cache->compute($key, $code);
> +#
> +# Combines the get and set operations in a single call.  Attempts to
> +# get $key; if successful, returns the value.  Otherwise, calls $code
> +# and uses the return value as the new value for $key, which is then
> +# returned.
> +sub compute {
> +       my ($self, $key, $code) = @_;
> +
> +       my $data = $self->get($key);
> +       if (!defined $data) {
> +               $data = $code->($self, $key);
> +               $self->set($key, $data);
> +       }

Can you guarantee $code here?

unless( defined $code and ref $code eq 'CODE' )
{
        ....
}

Wouldn't it be easier to eval{} this and checj $@?

[...]

-- Thomas Adam

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 03/17] gitweb/lib - Very simple file based cache
  2010-10-06 22:41   ` Thomas Adam
@ 2010-10-06 22:44     ` Ævar Arnfjörð Bjarmason
  2010-10-06 22:46       ` Thomas Adam
  2010-10-06 23:00     ` Jakub Narebski
  1 sibling, 1 reply; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-06 22:44 UTC (permalink / raw)
  To: Thomas Adam
  Cc: Jakub Narebski, git, John 'Warthog9' Hawley, Petr Baudis,
	admin

On Wed, Oct 6, 2010 at 22:41, Thomas Adam <thomas@xteddy.org> wrote:
> On 6 October 2010 23:01, Jakub Narebski <jnareb@gmail.com> wrote:
>> +# creates get_depth() and set_depth($depth) etc. methods
>> +foreach my $i (qw(depth root namespace)) {
>> +       my $field = $i;
>> +       no strict 'refs';
>
> For each item, you'll set "no strict refs"?  This might be better off
> outside the loop.  It's still scoped appropriately inside the
> subroutine.
>
>> +       my $file = $self->path_to_key($key);
>> +       return undef unless (defined $file && -f $file);
>
> PBP (Perl Best Practises) will tell you that explicitly returning
> undef is discouraged -- "undef" should be reserved for those errors
> you cannot handle, not ones you don't want to.

[...]

>> +       return unless (defined $key && defined $data);
>
> return what?

false. You're right that "return undef;" is bad style, but "return;"
is what you should use instead.

Then you'll get undef in scalar context, and an empty list in list
context. With "return undef" you'll always get an undef, so:

    sub blah { retur undef }
    my (@foo) = blah();

Will make @foo = (undef), which'll make it evaluate to true in scalar
context since there's an item in the array, and give you a useless
array item.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 08/17] gitweb: Add optional output caching
  2010-10-06 22:01 ` [PATCHv5 08/17] gitweb: Add optional output caching Jakub Narebski
@ 2010-10-06 22:46   ` Ævar Arnfjörð Bjarmason
  2010-10-06 23:06     ` Jakub Narebski
  0 siblings, 1 reply; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-06 22:46 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

On Wed, Oct 6, 2010 at 22:01, Jakub Narebski <jnareb@gmail.com> wrote:

> +               $cache ||= 'GitwebCache::SimpleFileCache';
> +               eval "require $cache";

Just:

    eval { require $cache };

Instead?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 03/17] gitweb/lib - Very simple file based cache
  2010-10-06 22:44     ` Ævar Arnfjörð Bjarmason
@ 2010-10-06 22:46       ` Thomas Adam
  2010-10-06 22:47         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Adam @ 2010-10-06 22:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jakub Narebski, git, John 'Warthog9' Hawley, Petr Baudis,
	admin

On 6 October 2010 23:44, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Wed, Oct 6, 2010 at 22:41, Thomas Adam <thomas@xteddy.org> wrote:
>> On 6 October 2010 23:01, Jakub Narebski <jnareb@gmail.com> wrote:
>>> +# creates get_depth() and set_depth($depth) etc. methods
>>> +foreach my $i (qw(depth root namespace)) {
>>> +       my $field = $i;
>>> +       no strict 'refs';
>>
>> For each item, you'll set "no strict refs"?  This might be better off
>> outside the loop.  It's still scoped appropriately inside the
>> subroutine.
>>
>>> +       my $file = $self->path_to_key($key);
>>> +       return undef unless (defined $file && -f $file);
>>
>> PBP (Perl Best Practises) will tell you that explicitly returning
>> undef is discouraged -- "undef" should be reserved for those errors
>> you cannot handle, not ones you don't want to.
>
> [...]
>
>>> +       return unless (defined $key && defined $data);
>>
>> return what?
>
> false. You're right that "return undef;" is bad style, but "return;"
> is what you should use instead.

I didn't mean that.  :)  It was a question to ensure this is what was
wanted, and not another undef.  I am well aware of the differences.

-- Thomas Adam

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 03/17] gitweb/lib - Very simple file based cache
  2010-10-06 22:46       ` Thomas Adam
@ 2010-10-06 22:47         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-06 22:47 UTC (permalink / raw)
  To: Thomas Adam
  Cc: Jakub Narebski, git, John 'Warthog9' Hawley, Petr Baudis,
	admin

On Wed, Oct 6, 2010 at 22:46, Thomas Adam <thomas@xteddy.org> wrote:
> On 6 October 2010 23:44, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> On Wed, Oct 6, 2010 at 22:41, Thomas Adam <thomas@xteddy.org> wrote:
>>> On 6 October 2010 23:01, Jakub Narebski <jnareb@gmail.com> wrote:
>>>> +# creates get_depth() and set_depth($depth) etc. methods
>>>> +foreach my $i (qw(depth root namespace)) {
>>>> +       my $field = $i;
>>>> +       no strict 'refs';
>>>
>>> For each item, you'll set "no strict refs"?  This might be better off
>>> outside the loop.  It's still scoped appropriately inside the
>>> subroutine.
>>>
>>>> +       my $file = $self->path_to_key($key);
>>>> +       return undef unless (defined $file && -f $file);
>>>
>>> PBP (Perl Best Practises) will tell you that explicitly returning
>>> undef is discouraged -- "undef" should be reserved for those errors
>>> you cannot handle, not ones you don't want to.
>>
>> [...]
>>
>>>> +       return unless (defined $key && defined $data);
>>>
>>> return what?
>>
>> false. You're right that "return undef;" is bad style, but "return;"
>> is what you should use instead.
>
> I didn't mean that.  :)  It was a question to ensure this is what was
> wanted, and not another undef.  I am well aware of the differences.

Ah right, sorry about the noise then.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 06/17] gitweb/lib - Simple select(FH) based output capture
  2010-10-06 22:01 ` [PATCHv5 06/17] gitweb/lib - Simple select(FH) based output capture Jakub Narebski
@ 2010-10-06 22:52   ` Thomas Adam
  2010-10-06 23:22     ` Jakub Narebski
  2010-10-06 23:03   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 41+ messages in thread
From: Thomas Adam @ 2010-10-06 22:52 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

Hi --

On 6 October 2010 23:01, Jakub Narebski <jnareb@gmail.com> wrote:
> +sub capture {
> +       my ($self, $code) = @_;
> +
> +       $self->start();
> +       $code->();

unless( defined $code and $code eq 'CODE' )
{
       # Error?
        ....
}

> +       return $self->stop();
> +}
> +
> +# Wrap caching data; capture only STDOUT
> +sub capture_block (&) {

This is prototyped deliberately?

-- Thomas Adam

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC/PATCHv5 18/17] gitweb/lib - Add clear() and size() methods to caching interface
  2010-10-06 22:02 ` [RFC/PATCHv5 18/17] gitweb/lib - Add clear() and size() methods to caching interface Jakub Narebski
@ 2010-10-06 22:56   ` Thomas Adam
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Adam @ 2010-10-06 22:56 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

On 6 October 2010 23:02, Jakub Narebski <jnareb@gmail.com> wrote:
> Add ->size() method, which following Cache::Cache interface returns
> estimated total size of all entries in whole cache (in the namsepace
> assiciated with give cache instance).  Note that ->get_size($key)
> returns size of a single entry!
>
> Add ->clear() method, which removes all entries from the namespace
> associated with given cache instance.  For safety it requires
> namespace to be set to true value, which means that it cannot be
> empty; therefore default namespace is changed to 'gitweb'.
>
> The ->clear() method should be fairly safe, because it first renames
> directory (which should be atomic), and only then removes it
> (following code from CGI::Driver::File).
>
> Signed-off-by: Jakub Narebski <jnareb@gmail.com>
> ---
> The only difference from previous version is that a comment about some
> method was moved to earlier commit that introduced it, where it
> belongs.
>
> This patch (implementing "backend") together with next patch
> (implementing "interface") are meant to provide nice web interface to
> safely cleaning (pruning) cache, as described in comments and
> documentation (how to do this manually).
>
>  gitweb/lib/GitwebCache/SimpleFileCache.pm |   48 ++++++++++++++++++++++++++--
>  t/t9503/test_cache_interface.pl           |    1 -
>  2 files changed, 44 insertions(+), 5 deletions(-)
>
> diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
> index 2a3d9cf..8f172cb 100644
> --- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
> +++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
> @@ -20,8 +20,9 @@ package GitwebCache::SimpleFileCache;
>  use strict;
>  use warnings;
>
> -use File::Path qw(mkpath);
> -use File::Temp qw(tempfile);
> +use File::Path qw(mkpath rmtree);
> +use File::Temp qw(tempfile mktemp);
> +use File::Find qw(find);
>  use Digest::MD5 qw(md5_hex);
>
>  # by default, the cache nests all entries on the filesystem single
> @@ -37,7 +38,7 @@ our $DEFAULT_CACHE_ROOT = "cache";
>  # by default we don't use cache namespace (empty namespace);
>  # empty namespace does not allow for simple implementation of clear() method.
>  #
> -our $DEFAULT_NAMESPACE = '';
> +our $DEFAULT_NAMESPACE = "gitweb";
>
>  # ......................................................................
>  # constructor
> @@ -334,7 +335,7 @@ sub get_size {
>  }
>
>  # ......................................................................
> -# interface methods
> +# interface methods dealing with single item
>
>  # Removing and expiring
>
> @@ -427,6 +428,45 @@ sub compute {
>        return $data;
>  }
>
> +# ......................................................................
> +# interface methods dealing with whole namespace
> +
> +# $cache->clear();
> +#
> +# Remove all entries from the namespace.
> +# Namespace must be defined and not empty.
> +sub clear {
> +       my $self = shift;
> +
> +       return unless $self->get_namespace();
> +
> +       my $namespace_dir = $self->path_to_namespace();
> +       return if !-d $namespace_dir;
> +
> +       my $renamed_dir = mktemp($namespace_dir . '.XXXX');
> +       rename($namespace_dir, $renamed_dir);
> +       rmtree($renamed_dir);
> +       die "Couldn't remove '$renamed_dir' directory"
> +               if -d $renamed_dir;
> +}
> +
> +# $size = $cache->size();
> +#
> +# Size of whole names (or whole cache if namespace empty)
> +sub size {
> +       my $self = shift;
> +
> +       my $namespace_dir = $self->path_to_namespace();
> +       return if !-d $namespace_dir;
> +
> +       my $total_size = 0;
> +       my $add_size = sub { $total_size += -s $_ };
> +
> +       File::Find::find({ wanted => $add_size, no_chdir => 1 }, $namespace_dir);
> +
> +       return $total_size;
> +}
> +
>  1;

This looks good to me.

Acked-by:  Thomas Adam <thomas@xteddy.org>

-- Thomas Adam

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 03/17] gitweb/lib - Very simple file based cache
  2010-10-06 22:01 ` [PATCHv5 03/17] gitweb/lib - Very simple file based cache Jakub Narebski
  2010-10-06 22:41   ` Thomas Adam
@ 2010-10-06 22:57   ` Ævar Arnfjörð Bjarmason
  2010-10-06 23:46     ` Jakub Narebski
  1 sibling, 1 reply; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-06 22:57 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

On Wed, Oct 6, 2010 at 22:01, Jakub Narebski <jnareb@gmail.com> wrote:

> +sub new {
> +       my ($proto, $p_options_hash_ref) = @_;
> +
> +       my $class = ref($proto) || $proto;
> +       my $self  = {};
> +       $self = bless($self, $class);

You use:

    my $class = ref($proto) || $proto;

Throughout the new class definitions. Presumably that's just
copy/pasted from some old docs and you don't actually want to support:

    my $obj = Class->new;
    my $obj2 = $obj->new;

It's better just to:

    sub new {
        my ($class, $options) = @_:
        my $self = bless {}, $class;

Unless there's some reason for the ref($proto) that I've missed.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 03/17] gitweb/lib - Very simple file based cache
  2010-10-06 22:41   ` Thomas Adam
  2010-10-06 22:44     ` Ævar Arnfjörð Bjarmason
@ 2010-10-06 23:00     ` Jakub Narebski
  2010-10-06 23:12       ` Thomas Adam
  1 sibling, 1 reply; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 23:00 UTC (permalink / raw)
  To: Thomas Adam; +Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

Thank you very much for those comments on code.

On Thu, 7 Oct 2010, Thomas Adam wrote:
> On 6 October 2010 23:01, Jakub Narebski <jnareb@gmail.com> wrote:

> > +# creates get_depth() and set_depth($depth) etc. methods
> > +foreach my $i (qw(depth root namespace)) {
> > +       my $field = $i;
> > +       no strict 'refs';
> 
> For each item, you'll set "no strict refs"?  This might be better off
> outside the loop.  It's still scoped appropriately inside the
> subroutine.

On the other hand this way scope where "no strict 'refs';" is active
is limited... but I guess having "no strict 'refs';" outside loop would
be better.

> 
> > +       my $file = $self->path_to_key($key);
> > +       return undef unless (defined $file && -f $file);
> 
> PBP (Perl Best Practises) will tell you that explicitly returning
> undef is discouraged -- "undef" should be reserved for those errors
> you cannot handle, not ones you don't want to.

Well, Perl Best Practices are practices; sometimes there is good reason
to not take them into account (though probably not in this case).

Explicitly returning undef is discouraged because in list context the
returned undef (or rather 1-element list with 'undef' as sole element)
is true-ish.

On the other hand

> > +# $cache->set($key, $data);
> > +#
> > +# Associates $data with $key in the cache, overwriting any existing entry.
> > +# Returns $data.
> > +sub set {
> > +       my ($self, $key, $data) = @_;
> > +
> > +       return unless (defined $key && defined $data);
> 
> return what?
 
as you can see 'return' statement with no value looks rather cryptic
with statement modifier (conditional).

I should really have run gitweb, caching modules and tests through 
perlcritic...

> > +               last if $read_cnt == 0;
> > +               $size_left -= $read_cnt;
> > +               #last if $size_left <= 0;
> > +       }
> > +
> > +       close $read_fh
> > +               or die "Couldn't close file '$file' opened for reading: $!";
> > +       return $buf;
> > +}
> 
> "use Carp;" would be more useful here, and hence croak() and confess().

For a web application we usually do not want to have too detailed error
message present to client (to web browser) to avoid leaking of sensitive
information.
 
> > +       # ensure that directory leading to cache file exists
> > +       if (!-d $dir) {
> > +               eval { mkpath($dir, 0, 0777); 1 }
> > +                       or die "Couldn't create leading directory '$dir' (mkpath): $!";
> > +       }
> 
> Why is this eval()ed?  It will still return false and set $! appropriately.

IIRC mkpath *dies on error*, rather than returning false.  For better
error handling we would need to use make_path, but File::Path 2.0+
is in [stable] core only since Perl 5.10.
 
[...]
> > +# $data = $cache->compute($key, $code);
> > +#
> > +# Combines the get and set operations in a single call.  Attempts to
> > +# get $key; if successful, returns the value.  Otherwise, calls $code
> > +# and uses the return value as the new value for $key, which is then
> > +# returned.
> > +sub compute {
> > +       my ($self, $key, $code) = @_;
> > +
> > +       my $data = $self->get($key);
> > +       if (!defined $data) {
> > +               $data = $code->($self, $key);
> > +               $self->set($key, $data);
> > +       }
> 
> Can you guarantee $code here?

I don't want to code too defensively, but perhaps check for this is in
order... though what we should do if $code is not code reference?

> 
> unless( defined $code and ref $code eq 'CODE' )

"ref($code) eq 'CODE'" would be enough; 'undef' is not reference, and
ref(undef) returns "".

> {
>         ....
> }
> 
> Wouldn't it be easier to eval{} this and check $@?

So the answer is no.


Thanks again for your comments.
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 06/17] gitweb/lib - Simple select(FH) based output capture
  2010-10-06 22:01 ` [PATCHv5 06/17] gitweb/lib - Simple select(FH) based output capture Jakub Narebski
  2010-10-06 22:52   ` Thomas Adam
@ 2010-10-06 23:03   ` Ævar Arnfjörð Bjarmason
  2010-10-06 23:26     ` Jakub Narebski
  1 sibling, 1 reply; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-06 23:03 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

On Wed, Oct 6, 2010 at 22:01, Jakub Narebski <jnareb@gmail.com> wrote:

> * The most important issue is that I/O "layers" (PerlIO), like ':utf8'
>  or ':raw', are *already applied* to the output that is captured.
>  This means that captured output is *always* in binary (':raw') mode.
>  In Perl 6 language it means that data returned by capturing engine
>  is an equivalent of Buf, a collection of bytes, whether Buf or Str
>  (a colection of logical characters) is printed.

> +       # note: this does not cover all cases
> +       binmode select(), ':utf8'
> +               if ((PerlIO::get_layers($self->{'oldfh'}))[-1] eq 'utf8');

I'm not sure but maybe we want to use ":Encoding(UTF-8)" everywhere in
this series where you've used ":utf8". I.e. use the Encoding UTF-8
layer instead of the internal utf8 layer.

It's more strict, see perldoc Encode's "UTF-8 vs. utf8 vs. UTF8". But
maybe we don't care.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 08/17] gitweb: Add optional output caching
  2010-10-06 22:46   ` Ævar Arnfjörð Bjarmason
@ 2010-10-06 23:06     ` Jakub Narebski
  2010-10-06 23:16       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 23:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

On Thu, 7 Oct 2010, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Oct 6, 2010 at 22:01, Jakub Narebski <jnareb@gmail.com> wrote:
> 
> > +               $cache ||= 'GitwebCache::SimpleFileCache';
> > +               eval "require $cache";
> 
> Just:
> 
>     eval { require $cache };
> 
> Instead?

Wouldn't work correctly.  We want to use 'require BAREWORD' version,
where BAREWORD is name of module, e.g. GitwebCache::SimpleFileCache,
and which makes Perl to search for GitwebCache/SimpleFileCache.pm
in @INC.

The 'require STRING' version loads file given by a *path*, and it
doesn't do library lookup.

From `perldoc -f require`:

   But if you try this:

      $class = 'Foo::Bar';
      require $class;      # $class is not a bareword
      #or
      require "Foo::Bar";  # not a bareword because of the ""

   The require function will look for the "Foo::Bar" file in the @INC array
   and will complain about not finding "Foo::Bar" there.  In this case you
   can do:

      eval "require $class";


Well, we could insert hooks into @INC, but I don't think we want to use
such hack.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 03/17] gitweb/lib - Very simple file based cache
  2010-10-06 23:00     ` Jakub Narebski
@ 2010-10-06 23:12       ` Thomas Adam
  2010-10-06 23:32         ` Jakub Narebski
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Adam @ 2010-10-06 23:12 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

Hi --

On 7 October 2010 00:00, Jakub Narebski <jnareb@gmail.com> wrote:
> Thank you very much for those comments on code.

Sure.

> On the other hand this way scope where "no strict 'refs';" is active
> is limited... but I guess having "no strict 'refs';" outside loop would
> be better.

Definitely.

>>
>> > +       my $file = $self->path_to_key($key);
>> > +       return undef unless (defined $file && -f $file);
>>
>> PBP (Perl Best Practises) will tell you that explicitly returning
>> undef is discouraged -- "undef" should be reserved for those errors
>> you cannot handle, not ones you don't want to.
>
> Well, Perl Best Practices are practices; sometimes there is good reason
> to not take them into account (though probably not in this case).

Do not ever underestimate Conway's book, Jakub -- seriously.   It's
not a Bible, sure, but to ignore it or otherwise eschew its advise is
to shoot yourself in the foot.   :)

> I should really have run gitweb, caching modules and tests through
> perlcritic...

And your other patches.   :)

>> > +               last if $read_cnt == 0;
>> > +               $size_left -= $read_cnt;
>> > +               #last if $size_left <= 0;
>> > +       }
>> > +
>> > +       close $read_fh
>> > +               or die "Couldn't close file '$file' opened for reading: $!";
>> > +       return $buf;
>> > +}
>>
>> "use Carp;" would be more useful here, and hence croak() and confess().
>
> For a web application we usually do not want to have too detailed error
> message present to client (to web browser) to avoid leaking of sensitive
> information.

Unless we use fatalstobrowser, it will still end up in Apache's error
log.  I don't see a problem here.

>> > +       # ensure that directory leading to cache file exists
>> > +       if (!-d $dir) {
>> > +               eval { mkpath($dir, 0, 0777); 1 }
>> > +                       or die "Couldn't create leading directory '$dir' (mkpath): $!";
>> > +       }
>>
>> Why is this eval()ed?  It will still return false and set $! appropriately.
>
> IIRC mkpath *dies on error*, rather than returning false.  For better
> error handling we would need to use make_path, but File::Path 2.0+
> is in [stable] core only since Perl 5.10.

Umm, maybe.   I've not verified this, but we shouldn't rely on this either.

> I don't want to code too defensively, but perhaps check for this is in
> order... though what we should do if $code is not code reference?

confess() ahem, or otherwise die() since it's a complete fail here.
Appropriate logging is necessary here.

>>
>> unless( defined $code and ref $code eq 'CODE' )
>
> "ref($code) eq 'CODE'" would be enough; 'undef' is not reference, and
> ref(undef) returns "".

Sure, but it still assumes $code is a reference, and there's no
guarantee of that in the code path.

>> {
>>         ....
>> }
>>
>> Wouldn't it be easier to eval{} this and check $@?
>
> So the answer is no.

Fantastic!   :)

-- Thomas Adam

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 08/17] gitweb: Add optional output caching
  2010-10-06 23:06     ` Jakub Narebski
@ 2010-10-06 23:16       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-06 23:16 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

On Wed, Oct 6, 2010 at 23:06, Jakub Narebski <jnareb@gmail.com> wrote:
> On Thu, 7 Oct 2010, Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Oct 6, 2010 at 22:01, Jakub Narebski <jnareb@gmail.com> wrote:
>>
>> > +               $cache ||= 'GitwebCache::SimpleFileCache';
>> > +               eval "require $cache";
>>
>> Just:
>>
>>     eval { require $cache };
>>
>> Instead?
>
> Wouldn't work correctly.  We want to use 'require BAREWORD' version,
> where BAREWORD is name of module, e.g. GitwebCache::SimpleFileCache,
> and which makes Perl to search for GitwebCache/SimpleFileCache.pm
> in @INC.

Ah, I've been using perl5i too much :)

> Well, we could insert hooks into @INC, but I don't think we want to use
> such hack.

There's also:

    eval {
         my $path = $cache;
         $path =~ s[::][/]g;
         $path .= ".pm";
         require $path;
    };

But at that point it's probably easier just to use string eval. Unless there's
a performance issue (very unlikely).

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 06/17] gitweb/lib - Simple select(FH) based output capture
  2010-10-06 22:52   ` Thomas Adam
@ 2010-10-06 23:22     ` Jakub Narebski
  0 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 23:22 UTC (permalink / raw)
  To: Thomas Adam; +Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

Thomas Adam wrote:
> On 6 October 2010 23:01, Jakub Narebski <jnareb@gmail.com> wrote:

> > +sub capture {
> > +       my ($self, $code) = @_;
> > +
> > +       $self->start();
> > +       $code->();
> 
> unless( defined $code and ref($code) eq 'CODE' )
> {
>        # Error?
>         ....
> }

But what to do if $code is *not* a code reference?  Well, except perhaps
providing better error message...

> > +       return $self->stop();
> > +}
> > +
> > +# Wrap caching data; capture only STDOUT
> > +sub capture_block (&) {
> 
> This is prototyped deliberately?

Yes, it is prototyped deliberately to be able to use it like this

  my $data = capture_block {
     ...
  }

similarly to the API used by Capture::Tiny (which is not in core, which
captures also STDERR, and which do not provide capture_stop() equivalent
for die_error to be not captured and cached).

Only to explicitely discard prototype with '&capture_block($code)', when
using it in ->cache_output_* methods in GitwebCache::CacheOutput ;-)

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 06/17] gitweb/lib - Simple select(FH) based output capture
  2010-10-06 23:03   ` Ævar Arnfjörð Bjarmason
@ 2010-10-06 23:26     ` Jakub Narebski
  0 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 23:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

On Thu, 7 Oct 2010 01:03, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Oct 6, 2010 at 22:01, Jakub Narebski <jnareb@gmail.com> wrote:
> 
> > * The most important issue is that I/O "layers" (PerlIO), like ':utf8'
> >  or ':raw', are *already applied* to the output that is captured.
> >  This means that captured output is *always* in binary (':raw') mode.
> >  In Perl 6 language it means that data returned by capturing engine
> >  is an equivalent of Buf, a collection of bytes, whether Buf or Str
> >  (a colection of logical characters) is printed.
> 
> > +       # note: this does not cover all cases
> > +       binmode select(), ':utf8'
> > +               if ((PerlIO::get_layers($self->{'oldfh'}))[-1] eq 'utf8');

Sidenote: I just realized that we can simply try to replay all but
special layers from 'oldfh'.
 
> I'm not sure but maybe we want to use ":Encoding(UTF-8)" everywhere in
> this series where you've used ":utf8". I.e. use the Encoding UTF-8
> layer instead of the internal utf8 layer.
> 
> It's more strict, see perldoc Encode's "UTF-8 vs. utf8 vs. UTF8". But
> maybe we don't care.

I think for the time being we don't care. 

In the future we could perhaps update gitweb to use ':encoding(UTF-8)'
PerlIO layer rather than ':utf8' layer, and update capturing engine(s)
appropriately.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 03/17] gitweb/lib - Very simple file based cache
  2010-10-06 23:12       ` Thomas Adam
@ 2010-10-06 23:32         ` Jakub Narebski
  0 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 23:32 UTC (permalink / raw)
  To: Thomas Adam; +Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

On 7 Oct 2010, Thomas Adam wrote:
> On 7 October 2010 00:00, Jakub Narebski <jnareb@gmail.com> wrote:

[...] 
>> I should really have run gitweb, caching modules and tests through
>> perlcritic...
> 
> And your other patches.   :)

Well, you can't run (as far as I know) *patches* though perlcritic ;-)

>>>> +               last if $read_cnt == 0;
>>>> +               $size_left -= $read_cnt;
>>>> +               #last if $size_left <= 0;
>>>> +       }
>>>> +
>>>> +       close $read_fh
>>>> +               or die "Couldn't close file '$file' opened for reading: $!";
>>>> +       return $buf;
>>>> +}
>>>
>>> "use Carp;" would be more useful here, and hence croak() and confess().
>>
>> For a web application we usually do not want to have too detailed error
>> message present to client (to web browser) to avoid leaking of sensitive
>> information.
> 
> Unless we use fatalstobrowser, it will still end up in Apache's error
> log.  I don't see a problem here.

The problem is that we do equivalent of fatalsToBrowser, i.e. set up
error handler with 'set_message(\&handle_errors_html);'.
 
>>> unless( defined $code and ref $code eq 'CODE' )
>>
>> "ref($code) eq 'CODE'" would be enough; 'undef' is not reference, and
>> ref(undef) returns "".
> 
> Sure, but it still assumes $code is a reference, and there's no
> guarantee of that in the code path.

Yes, but if $code is not reference, then ref($code) evaluates to empty
string, and "ref($code) eq 'CODE'" comparison would fail... without any
warnings.  You don't need to check upfront for "defined $code"; it is
the only thing I wanted to nitpick.
 
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 03/17] gitweb/lib - Very simple file based cache
  2010-10-06 22:57   ` Ævar Arnfjörð Bjarmason
@ 2010-10-06 23:46     ` Jakub Narebski
  0 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-06 23:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, John 'Warthog9' Hawley, Petr Baudis, admin

On Thu, 7 Oct 2010, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Oct 6, 2010 at 22:01, Jakub Narebski <jnareb@gmail.com> wrote:
> 
> > +sub new {
> > +       my ($proto, $p_options_hash_ref) = @_;
> > +
> > +       my $class = ref($proto) || $proto;
> > +       my $self  = {};
> > +       $self = bless($self, $class);
> 
> You use:
> 
>     my $class = ref($proto) || $proto;
> 
> Throughout the new class definitions. Presumably that's just
> copy/pasted from some old docs and you don't actually want to support:
> 
>     my $obj = Class->new;
>     my $obj2 = $obj->new;
> 
> It's better just to:
> 
>     sub new {
>         my ($class, $options) = @_:
>         my $self = bless {}, $class;
> 
> Unless there's some reason for the ref($proto) that I've missed.

You are right, the 'my $class = ref($proto) || $proto;' dance is not
necessary; we would be always using Class->new().

Will fix.
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [RFD] Possible improvements for output caching in gitweb
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (20 preceding siblings ...)
  2010-10-06 22:02 ` [PoC PATCHv5 21/17] gitweb/lib - Alternate ways of capturing output Jakub Narebski
@ 2010-10-10 20:32 ` Jakub Narebski
  2010-10-24 21:34 ` [PATCHv5 00/17] gitweb: Simple file based output caching J.H.
  22 siblings, 0 replies; 41+ messages in thread
From: Jakub Narebski @ 2010-10-10 20:32 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog' Hawley,
	Petr Baudis, Petr Baudis, admin

On Thu, 7 Oct 2010, Jakub Narebski wrote:

> TODO list and areas of possible improvements would be send in separate
> email.

Here they are.  What do you think about them; which are needed, which 
ones would be nice to have, which are not worth the trouble, and what's 
perhaps most important: which ones are missing.  Also in what order of 
importance should they be worked upon.

Note that in the list below there are deliberately missing improvements
to the code (which were already commented on; thanks again).


New features and improvements related directly to cache or capture:

* Ajax-y progress indicator (perhaps inside skeleton of page)
  see 9982b6f (gitweb: Ajax-y "Generating..." page when regenerating
  cache (WIP), 2010-01-24) on 'gitweb/cache-kernel' branch in the
  http://repo.or.cz/w/git/jnareb-git.git repository.

  Instead of relying on http-equiv refresh trick (which uses the fact
  that web browsers render inclomplete page, and that refresh is done
  only after page is received in full), use XMLHttpRequest to get
  (re)generated version of the page, displaying progress info while at
  it, and redraw page when data is received in full.  All of this only
  when JavaScript is enabled, so I guess old trick should be kept as 
  fallback.

  This of course assumes that progress info indicator is important...


* error handler, like in CHI

  Instead of using 'die <message>' and relying on CGI.pm and gitweb
  catching the exception and displaying it (set_message from CGI.pm),
  pass error handler (wrapped die_error) to cache constructor.  In the
  case of CHI caching framework, there are 'on_get_error' and
  'on_set_error' options.

  In the original patches by J.H. subroutines from cache.pm used
  die_error directly; this was possible only because the file was
  loaded using "do 'cache.pm';" as a kind of mixin / role into gitweb
  code.


* $capture option to cache_output()

  Currently in GitwebCache::OutputCache the capture engine used to
  capture gitweb output to cache it is hardcoded; mind you, one would
  need to change code only in two places to use different compatibile
  caching engine, but still it would require changing code.  It would
  be better to pass $capture as parameter to cache_output(), just like
  $cache is.


* POD documentation instead of comments + make doc

  Currently gitweb caching modules  are documented (like original
  cache.pm by J.H.) only in comments, and a bit in gitweb/README.
  Though those modules are used only internally, it might be better
  to use POD (perlpod) for documentation, like in Git.pm.

  We migth also want to add 'doc' target in gitweb/Makefile, though
  it might be difficult (see perl/Makefile.PL and generated perl.mak).


* cache expire variation a la CHI

  CHI (caching interface in Perl) supports 'expires_variance' parameter,
  which according to documentation:

   "Controls the variable expiration feature, which allows items to
    expire a little earlier than the stated expiration time to help
    prevent cache miss stampedes.
       
    Value is between 0.0 and 1.0, with 0.0 meaning that items expire
    exactly when specified (feature is disabled), and 1.0 meaning that
    items might expire anytime from now til the stated expiration time.
    The default is 0.0. A setting of 0.10 to 0.25 would introduce a
    small amount of variation without interfering too much with intended
    expiration times."

  See http://p3rl.org/CHI (or CHI manpage, if you have it installed).
  This feature is about *avoiding* cache miss stampede, while locking
  is used to ensure that only one process is regenerating cache for
  a given entry.


* benchmarks for different caches under light and under heavy load;
  profiling of gitweb with caching using Devel::NYTProf.
  
  The problem is to both prepare repositories, and to generate traffic
  (or generate IO pressure) to represent real-life situation, where
  supposedly gitweb is IO bound, rather than CPU bound.


-------------------------------------------------------------
Below there are cache related improvements that require for 
GitwebCache::CacheOutput to be aware that it caches HTTP response,
which consist of HTTP headers (text) separated by an empty line
from a body of a request (which can be binary).

This can be done either by parsing response or a retrieved cache entry, 
or by storing headers and body separately, or by using some Perl data 
structure (like for example the one used by PSGI) and storing it 
serialized (though serialization can affect performance).


* X-Sendfile (or equivalent) support

  Web server encountering such HTTP header will discard all output and
  send the file specified by that header instead using web server
  internals including all optimizations like caching-headers and
  sendfile or mmap if configured.  For Apache it requires mod_xsendfile
  module (https://tn123.org/mod_xsendfile/), lighttpd has it build in
  (at least for FastCGI) but disabled by default; in Nginx similar
  feature is called X-Accel-Redirect.

  The idea is to use cache file for X-Sendfile contents; though this
  might require storing headers and body of response separately, and
  might be not much of speedup.


* compressed cache entries (transfer-encoding) (?)

  To reduce size taken by cache, and also reduce bandwidth taken by
  serving gitweb requests, save body of response compressed.  Then,
  if browser supports it, send compressed data with the HTTP header
  'Transfer-Encoding:' set to appriopriate value.

  The complication which, I think, we have to take into account is
  that some (hopefully small amount) of web browsers and net downloaders
  doesn't support transfer-encoding we plan to use (gzip or deflate).
  Also gitweb should compress file which it knows to not compress well,
  like already compressed snapshots (zip, tar.gz, tar.bz2) or images.

  There was patch " gitweb: Enable transparent compression for HTTP
  output" sent to git mailing list (using PerlIO::gzip), but in the
  cached case we pay CPU cost only *once*.


* Replace text/html with application/xml+xhtml in header
  when reading from cache.

  In the non-cached case, gitweb served page using either plain
  'text/html' content type, or if web browser accepts it more advanced
  'application/xhtml+xml' content type.  When caching is enabled, we
  had to always use 'text/html', because web browser (e.g. lynx) might
  not accept the other... but with cache being HTTP-aware, we can
  replace 'text/html' with 'application/xhtml+xml' in 'Content-Type:'
  HTTP header.


* Expires-In / cache-age synchronized with cache lifetime,
  Last-Modified synchronized with cache entry creation time.

  Currently all cache entries have the same global (per cache instance)
  expiration time.  The Expire header is not correlated with it.

  There are two issues: when storing data in cache, we can set Expire
  header (or cache-age pragma in Cache-Control header) to the expiration
  time of cache entry and set Last-Modified to the time cache entry was
  (re)generated (unless it is already set by gitweb).

  The other issue is that some data doesn't change, ever.  We set expire
  time to '+1d' (one day) in such case.  If we could mark those cache
  entries as having longer / infinite lifetime to not regenerate them...


* support for If-Modified-Since (external/browser caches)

  When caching is enabled, we know when page was created.  We could
  check for If-Modified-Since conditional request header, and return
  '304 Not Modified' HTTP response if we would serve from the same
  cache entry.  It would save bandwidth, and a bit of I/O.


* ETag support - gitweb version + cache key hash, possibly also Range
  requests.

  We can compose strong ETag validator from cache key hash and gitweb
  version string.  Maybe it would make possible to respond to Range
  requests for resuming download of e.g. large snapshot file...

  But it might be the fact that those features are unrelated...  


----------------------------------------------------------------------
Below there are proposed gitweb improvements and features, which would 
also improve caching support in gitweb:

* Time::HiRes is in core + simplify progress indicator

  Time::HiRes was first released with perl 5.007003 (5.7.3).  Because
  gitweb requires at least Perl 5.8 for its Unicode / UTF-8 support,
  we can assume that it is present.

  This would simplify code in git_generating_data_html()


* $per_request_config = 0/1 (default)/coderef
  or just $per_request_config = coderef

  If it would be possible to have config read only once in persistent
  environments such as mod_perl and FastCGI, and not once per request,
  it would improve performance when caching engine used has slow
  initialization / creation time, like Moose-based CHI.

  The basic idea is to put parts of config that change per request (like
  e.g. gitosis or gitolite uses) in coderef in $per_request_config
  variable; this coderef would be invoked once per config.  Example
  configuration:

    our $per_request_config = sub {
       $ENV{GL_USER} = ($cgi && $cgi->remote_user) || "gitweb";
    };


* authenthication / authorization for admin stuff

  Some kind of authenthications support would be needed for edit / write
  support in gitweb, and also for controlling access to the cache
  administration page.  We don't want anyone to be able to clear cache.

  In the current proof-of-concept patch the cache administration page
  is restricted to people accessing gitweb pages from localhost, or
  running gitweb as a standalone script.


* mod_perl handler

  It might be possible with altering / modifying gitweb only slightly to
  make it work *also* as native mod_perl handler, and not only via
  ModPerl::Registry.  

  This would make possible to initialize cache once per process
  lifetime, and not once per request.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv5 00/17] gitweb: Simple file based output caching
  2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (21 preceding siblings ...)
  2010-10-10 20:32 ` [RFD] Possible improvements for output caching in gitweb Jakub Narebski
@ 2010-10-24 21:34 ` J.H.
  22 siblings, 0 replies; 41+ messages in thread
From: J.H. @ 2010-10-24 21:34 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, admin

> J.H., could you comment on how this series relates to the gitweb code
> *currently* running on git.kernel.org wrt. code?  If possible, could you
> try to compare performance of those two implementations, the one
> presented here in this series, and the one used by git.kernel.org.

Haven't been ignoring this, but been quite busy with a whole pile of
other things (new hardware, kernel issues, patching exploits, generally
going insane).

I'm *HOPING* I can get around to this next week, I've also been working
on a couple of clean-ups in my own code which will likely land 'soon'.

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2010-10-24 21:34 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-06 22:01 [PATCHv5 00/17] gitweb: Simple file based output caching Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 01/17] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 02/17] gitweb: Prepare for splitting gitweb Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 03/17] gitweb/lib - Very simple file based cache Jakub Narebski
2010-10-06 22:41   ` Thomas Adam
2010-10-06 22:44     ` Ævar Arnfjörð Bjarmason
2010-10-06 22:46       ` Thomas Adam
2010-10-06 22:47         ` Ævar Arnfjörð Bjarmason
2010-10-06 23:00     ` Jakub Narebski
2010-10-06 23:12       ` Thomas Adam
2010-10-06 23:32         ` Jakub Narebski
2010-10-06 22:57   ` Ævar Arnfjörð Bjarmason
2010-10-06 23:46     ` Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 04/17] gitweb/lib - Stat-based cache expiration Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 05/17] gitweb/lib - Regenerate entry if the cache file has size of 0 Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 06/17] gitweb/lib - Simple select(FH) based output capture Jakub Narebski
2010-10-06 22:52   ` Thomas Adam
2010-10-06 23:22     ` Jakub Narebski
2010-10-06 23:03   ` Ævar Arnfjörð Bjarmason
2010-10-06 23:26     ` Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 07/17] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 08/17] gitweb: Add optional output caching Jakub Narebski
2010-10-06 22:46   ` Ævar Arnfjörð Bjarmason
2010-10-06 23:06     ` Jakub Narebski
2010-10-06 23:16       ` Ævar Arnfjörð Bjarmason
2010-10-06 22:01 ` [PATCHv5 09/17] gitweb/lib - Adaptive cache expiration time Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 10/21] gitweb/lib - Use CHI compatibile (compute method) caching interface Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 11/17] gitweb/lib - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 12/17] gitweb/lib - No need for File::Temp when locking Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 13/17] gitweb/lib - Serve stale data when waiting for filling cache Jakub Narebski
2010-10-06 22:01 ` [PATCHv5 14/17] gitweb/lib - Regenerate (refresh) cache in background Jakub Narebski
2010-10-06 22:02 ` [PATCHv5 15/17] gitweb: Introduce %actions_info, gathering information about actions Jakub Narebski
2010-10-06 22:02 ` [PATCHv5/RFC 16/17] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
2010-10-06 22:02 ` [PATCHv5/RFC 17/17] gitweb: Add startup delay to activity indicator for cache Jakub Narebski
2010-10-06 22:02 ` [RFC/PATCHv5 18/17] gitweb/lib - Add clear() and size() methods to caching interface Jakub Narebski
2010-10-06 22:56   ` Thomas Adam
2010-10-06 22:02 ` [RFC PATCHv5 19/17] gitweb: Add beginnings of cache administration page Jakub Narebski
2010-10-06 22:02 ` [PoC PATCHv5 20/17] gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/) Jakub Narebski
2010-10-06 22:02 ` [PoC PATCHv5 21/17] gitweb/lib - Alternate ways of capturing output Jakub Narebski
2010-10-10 20:32 ` [RFD] Possible improvements for output caching in gitweb Jakub Narebski
2010-10-24 21:34 ` [PATCHv5 00/17] gitweb: Simple file based output caching J.H.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).