git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
@ 2008-05-30 23:00 Lea Wiemann
  2008-05-30 23:03 ` Lea Wiemann
  2008-05-31 13:04 ` Petr Baudis
  0 siblings, 2 replies; 23+ messages in thread
From: Lea Wiemann @ 2008-05-30 23:00 UTC (permalink / raw)
  To: git; +Cc: Lea Wiemann

This simplifies git_get_head_hash a lot; the method might eventually
even go away.

I haven't checked whether this causes an IO performance regression by
instantiating a new Git repository instance, but in the end
Git->repository will be as fast as possible and do no eager disk
accesses.  No benchmarking yet at this stage.

Minor change: Moved the parameter shift in git_blob_plain to the top
for readability.

Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---

I have tested this by running the smoke-test.pl script on my small
test repository -- this covers calls to git_get_head_hash -- and then
recursively diffing the two resulting wget output directories before
and after the change.  The trees were essentially identical (save a
few timestamps inside the snapshot files).

For brevity, I'll refer to this test procedure as something like
"smoke-test.pl showed no differences" in future patches. ;-)

 gitweb/gitweb.perl |   24 ++++++++----------------
 1 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 57a1905..0efd2f7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -16,6 +16,7 @@ use Encode;
 use Fcntl ':mode';
 use File::Find qw();
 use File::Basename qw(basename);
+use Git;
 binmode STDOUT, ':utf8';
 
 BEGIN {
@@ -1508,20 +1509,11 @@ sub git_cmd_str {
 # get HEAD ref of given project as hash
 sub git_get_head_hash {
 	my $project = shift;
-	my $o_git_dir = $git_dir;
-	my $retval = undef;
-	$git_dir = "$projectroot/$project";
-	if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") {
-		my $head = <$fd>;
-		close $fd;
-		if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) {
-			$retval = $1;
-		}
-	}
-	if (defined $o_git_dir) {
-		$git_dir = $o_git_dir;
-	}
-	return $retval;
+	my $directory = "$projectroot/$project";
+	# Legacy side effect on $git_dir.  This will eventually go
+	# away as the global $git_dir is eliminated.
+	$git_dir = $directory if (!defined $git_dir);
+	Git->repository(Directory => $directory)->parse_rev("HEAD");
 }
 
 # get type of given object
@@ -4377,8 +4369,9 @@ sub git_heads {
 }
 
 sub git_blob_plain {
-	my $expires;
+	my $type = shift;
 
+	my $expires;
 	if (!defined $hash) {
 		if (defined $file_name) {
 			my $base = $hash_base || git_get_head_hash($project);
@@ -4392,7 +4385,6 @@ sub git_blob_plain {
 		$expires = "+1d";
 	}
 
-	my $type = shift;
 	open my $fd, "-|", git_cmd(), "cat-file", "blob", $hash
 		or die_error(undef, "Couldn't cat $file_name, $hash");
 
-- 
1.5.5.GIT

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-05-30 23:00 [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash Lea Wiemann
@ 2008-05-30 23:03 ` Lea Wiemann
  2008-05-31  9:40   ` Jakub Narebski
  2008-05-31 13:04 ` Petr Baudis
  1 sibling, 1 reply; 23+ messages in thread
From: Lea Wiemann @ 2008-05-30 23:03 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

Lea Wiemann wrote:
> Subject: [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash

For clarification, this is my first patch for refactoring Gitweb to use 
Git.pm's API.

In the end, I'm hoping that all (or at least most) of Gitweb's accesses 
to the repositories will go through this API, which allows us to add 
caching to the Git.pm API (rather than Gitweb) pretty easily and cleanly.

-- Lea

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-05-30 23:03 ` Lea Wiemann
@ 2008-05-31  9:40   ` Jakub Narebski
  2008-05-31 12:39     ` Lea Wiemann
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Narebski @ 2008-05-31  9:40 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

Lea Wiemann <lewiemann@gmail.com> writes:

> Lea Wiemann wrote:
> > Subject: [PATCH] gitweb: use Git.pm, and use its parse_rev method
> >          for git_get_head_hash
> 
> For clarification, this is my first patch for refactoring Gitweb to
> use Git.pm's API.

Good.

I'm not sure however how much of gitweb engine should be moved to
Git.pm.  I think calling git commands could be moved from git_cmd() to
command_output_pipe() or command_oneline(), replacing global variable
$git_dir by global variable $repo.  Perhaps gitweb's config file
parsing should be added as an option to Git.pm.  But I'm not sure for
example about parsing subroutines.

> In the end, I'm hoping that all (or at least most) of Gitweb's
> accesses to the repositories will go through this API, which allows us
> to add caching to the Git.pm API (rather than Gitweb) pretty easily
> and cleanly.

IMVHO caching command output is bad idea.  I'd rather have gitweb
cache _parsed_ data (i.e. Perl structures).

For example projects list (but also summary page for a project) is
composed as result of parsing output of _many_ git commands, in the
case of projects list coming from many different repositories.
Caching git command output (something like IPC::Cmd::Cached?)
would only repeat bad I/O patterns of git commands.

Also I don't think it is a good idea to pollute Git.pm by caching
command output.  Perhaps in Git::Cached, but as a last resort...

P.S. Here it is what could go to Git.pm:
 * Eager config parsing, using e.g. $repo->parse_config() to prepare
   %config hash, and have $repo->config(VAR) etc. use %config hash.
   This means one git command and not one git command per variable,
   but it also means that we have to convert to bool or int, or color
   in Perl.
 * unquote() (which is not repository instance merhod, but utility
   subroutine) to unquote and unescape filenames.
 * generating rfc2822 and iso-8601 dates from timestamp plus timezone,
   i.e. from author/committer/tagger date; it would be useful for
   example in git-send-email.

I'm not sure about parse_* subroutines (some of which are not generic
enough, I guess).
-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-05-31  9:40   ` Jakub Narebski
@ 2008-05-31 12:39     ` Lea Wiemann
  2008-06-01 22:19       ` Jakub Narebski
  0 siblings, 1 reply; 23+ messages in thread
From: Lea Wiemann @ 2008-05-31 12:39 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski wrote:
> But I'm not sure for example [that] parsing subroutines [that parse the
> output of the git commands should be in Git.pm].
> 
> [...] IMVHO caching command output is bad idea.  I'd rather have gitweb
> cache _parsed_ data (i.e. Perl structures).

Yes, I agree with you that parsed data should be cached, but for that 
reason I think that the subroutine that parse git's output should be in 
Git.pm, not in Gitweb:

If you want to cache parsed data, you need the caching layer in between 
the application (= Gitweb) and the repository access layer (= command 
output parsing), or you would have to insert cache code at every call to 
the repository access layer.  So you end up with these layers:

(Layer 0: Front-end [HTML] caching.)
Layer 1: Application (Gitweb)
Layer 2: Back-end caching
Layer 3: Repository access (command parsing)
Layer 4: Calls to the git binary

Layer 3 and 4 are application-independent (i.e. not Gitweb specific), 
and since they form a usable API, they might as well be written as a 
separate API rather than lumped together with Gitweb.  Git.pm is a start 
of such an API (it does layer 4 and a little bit of layer 3), so it 
seems natural for me to extend it.

Layer 2 is application-independent as well, so it can become an extra 
class in Git.pm or a separate module.  (It should stay independent of 
layers 3 and 4).

We may need to have an explicitly implemented layer 0 (front-end 
caching) in Gitweb for at least a subset of pages (like project pages), 
but we'll see if that's necessary.

> Caching git command output (something like IPC::Cmd::Cached?)
> would only repeat bad I/O patterns of git commands.

I hope you're not assuming that the back-end cache will reside on disk 
-- the problem is IO throughput, so having a cache on disk can really 
only work for a front-end cache.  For the back-end cache, we *will* have 
to use a memory cache (or no cache at all).

> P.S. Here it is what could go [from Gitweb] to Git.pm: [snip]

Good points, I agree those should eventually be moved to Git.pm.  (They 
are not my priority at the moment, since I really want to implement 
caching.  So unless they move naturally as part of my refactoring, I'll 
leave them in Gitweb for now.)

-- Lea

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-05-30 23:00 [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash Lea Wiemann
  2008-05-30 23:03 ` Lea Wiemann
@ 2008-05-31 13:04 ` Petr Baudis
  2008-05-31 14:19   ` [PATCH v2] " Lea Wiemann
  1 sibling, 1 reply; 23+ messages in thread
From: Petr Baudis @ 2008-05-31 13:04 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

On Sat, May 31, 2008 at 01:00:12AM +0200, Lea Wiemann wrote:
> This simplifies git_get_head_hash a lot; the method might eventually
> even go away.
> 
> I haven't checked whether this causes an IO performance regression by
> instantiating a new Git repository instance, but in the end
> Git->repository will be as fast as possible and do no eager disk
> accesses.  No benchmarking yet at this stage.
> 
> Minor change: Moved the parameter shift in git_blob_plain to the top
> for readability.

Please split this to a separate patch.

> Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>

Other than that,

Acked-by: Petr Baudis <pasky@suse.cz>

-- 
				Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.	-- J. W. von Goethe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-05-31 13:04 ` Petr Baudis
@ 2008-05-31 14:19   ` Lea Wiemann
  2008-05-31 14:34     ` Lea Wiemann
                       ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Lea Wiemann @ 2008-05-31 14:19 UTC (permalink / raw)
  To: git; +Cc: Lea Wiemann

This simplifies git_get_head_hash a lot; the method might eventually
even go away.

I haven't checked whether this causes an IO performance regression by
instantiating a new Git repository instance, but in the end
Git->repository will be as fast as possible and do no eager disk
accesses.  No benchmarking yet at this stage.

Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---

Per your request without the cleanup.  I won't submit the cleanup
patch separately, but I assume it will get cleaned up eventually when
someone touches that function.

 gitweb/gitweb.perl |   20 ++++++--------------
 1 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 57a1905..0ed3d6e 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -16,6 +16,7 @@ use Encode;
 use Fcntl ':mode';
 use File::Find qw();
 use File::Basename qw(basename);
+use Git;
 binmode STDOUT, ':utf8';
 
 BEGIN {
@@ -1508,20 +1509,11 @@ sub git_cmd_str {
 # get HEAD ref of given project as hash
 sub git_get_head_hash {
 	my $project = shift;
-	my $o_git_dir = $git_dir;
-	my $retval = undef;
-	$git_dir = "$projectroot/$project";
-	if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") {
-		my $head = <$fd>;
-		close $fd;
-		if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) {
-			$retval = $1;
-		}
-	}
-	if (defined $o_git_dir) {
-		$git_dir = $o_git_dir;
-	}
-	return $retval;
+	my $directory = "$projectroot/$project";
+	# Legacy side effect on $git_dir.  This will eventually go
+	# away as the global $git_dir is eliminated.
+	$git_dir = $directory if (!defined $git_dir);
+	Git->repository(Directory => $directory)->parse_rev("HEAD");
 }
 
 # get type of given object
-- 
1.5.5.GIT

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-05-31 14:19   ` [PATCH v2] " Lea Wiemann
@ 2008-05-31 14:34     ` Lea Wiemann
  2008-06-01  8:23     ` Junio C Hamano
  2008-06-01 23:18     ` [PATCH v2] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash Lea Wiemann
  2 siblings, 0 replies; 23+ messages in thread
From: Lea Wiemann @ 2008-05-31 14:34 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

Lea Wiemann wrote:
> Subject: [PATCH v2] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash

Just to clarify, this needs "[PATCH v2] perl/Git.pm: add parse_rev 
method" to work, here:
http://mid.gmane.org/1212241932-28605-1-git-send-email-LeWiemann@gmail.com

(Apologies I keep replying to my own patches.)

-- Lea

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-05-31 14:19   ` [PATCH v2] " Lea Wiemann
  2008-05-31 14:34     ` Lea Wiemann
@ 2008-06-01  8:23     ` Junio C Hamano
  2008-06-01 15:44       ` Lea Wiemann
  2008-06-01 23:18     ` [PATCH v2] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash Lea Wiemann
  2 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2008-06-01  8:23 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

Lea Wiemann <lewiemann@gmail.com> writes:

> This simplifies git_get_head_hash a lot; the method might eventually
> even go away.
>
> I haven't checked whether this causes an IO performance regression by
> instantiating a new Git repository instance, but in the end
> Git->repository will be as fast as possible and do no eager disk
> accesses.  No benchmarking yet at this stage.
>
> Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>

With this on top of your parse_rev patch (I used v2 but I do not think v3
changes the situation in any way), you seem to have broken t9500.

gitweb.log left in the trash directory says that it cannot find Git.pm in
@INC.  I suspect that you are not using your own Git in the build tree in
your test, but an already installed one.

Please make sure that you are testing with Git.pm that you are shipping.
A good way to do so would be to add a deliberate error in perl/Git.pm to
cause loading of the module to fail, and make sure test barfs upon "use
Git".

Something like this is needed on top of your patch, I think.

---

 t/t9500-gitweb-standalone-no-errors.sh |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index ae7082b..a62231f 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -47,8 +47,9 @@ gitweb_run () {
 	PATH_INFO=""$2""
 	export GATEWAY_INTERFACE HTTP_ACCEPT REQUEST_METHOD QUERY_STRING PATH_INFO
 
+	PERL5LIB=$(pwd)/../../perl/blib/lib
 	GITWEB_CONFIG=$(pwd)/gitweb_config.perl
-	export GITWEB_CONFIG
+	export GITWEB_CONFIG PERL5LIB
 
 	# some of git commands write to STDERR on error, but this is not
 	# written to web server logs, so we are not interested in that:

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-06-01  8:23     ` Junio C Hamano
@ 2008-06-01 15:44       ` Lea Wiemann
  2008-06-01 21:33         ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Lea Wiemann @ 2008-06-01 15:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> With this on top of your parse_rev patch (I used v2 but I do not think v3
> changes the situation in any way), you seem to have broken t9500.
> 
> [...] I suspect that you are not using your own Git in the build tree in
> your test, but an already installed one.

That was indeed the case, thanks for pointing it out!

However, after applying my two patches and your patch on a pristine 
current git.git clone, I still don't get an error, even though the 
Gitweb test uses the new Git.pm (which I tested it does).  Care to send 
me your error message so I can track it down, or even upload your 
complete tree somewhere?  Feel free to reply off-list or ping me on IRC.

> +++ b/t/t9500-gitweb-standalone-no-errors.sh
>  
> +	PERL5LIB=$(pwd)/../../perl/blib/lib

How about putting this into test-lib.sh?  There are more tests (like my 
new Git.pm test suite) that will need it, so setting it up in a central 
place would probably more convenient and prevent future problems of this 
sort.

If PERL5LIB already contains paths, can we just discard them, or should 
we preserve them?

Since perl/Makefile only copies Git.pm to blib/lib/Git.pm, we could also 
set the path to ../../perl, which would prevent us from accidentally 
running tests against an old version of Git.pm (because we haven't run 
cd perl; make before).  And perhaps add a comment to perl/Makefile about 
this, in case someone wants to change the build process in the future. 
Or is there some reason why this would be a bad idea?

-- Lea

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-06-01 15:44       ` Lea Wiemann
@ 2008-06-01 21:33         ` Junio C Hamano
  2008-06-01 22:16           ` [PATCH] test-lib.sh: set PERL5LIB instead of GITPERLLIB Lea Wiemann
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2008-06-01 21:33 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

Lea Wiemann <lewiemann@gmail.com> writes:

> Junio C Hamano wrote:
>> With this on top of your parse_rev patch (I used v2 but I do not think v3
>> changes the situation in any way), you seem to have broken t9500.
>>
>> [...] I suspect that you are not using your own Git in the build tree in
>> your test, but an already installed one.
>
> That was indeed the case, thanks for pointing it out!
>
> However, after applying my two patches and your patch on a pristine
> current git.git clone, I still don't get an error...

If you applied the "this on top of yours" fix, you shouldn't have seen any
breakage.

The breakage was not about what you did in Git.pm, but about adding "use
Git" in gitweb.perl but without arranging it to find Git.pm (I do not have
any git installed in usual places on my machine, and I strip the directory
I keep my git installation out of my $PATH when running tests, to make
sure "make test" can bootstrap itself without first installing).

> How about putting this into test-lib.sh?  There are more tests (like
> my new Git.pm test suite) that will need it, so setting it up in a
> central place would probably more convenient and prevent future
> problems of this sort.

Sure, I think that would be sensible.

> If PERL5LIB already contains paths, can we just discard them, or
> should we preserve them?

It would be best to keep the existing one but make ours the first thing to
be found, so that we will be sure that we test the one we just built.

> Since perl/Makefile only copies Git.pm to blib/lib/Git.pm, we could
> also set the path to ../../perl, which would prevent us from
> accidentally running tests against an old version of Git.pm (because
> we haven't run cd perl; make before).  And perhaps add a comment to
> perl/Makefile about this, in case someone wants to change the build
> process in the future. Or is there some reason why this would be a bad
> idea?

I think it is a bad idea.  In principle we should be testing what we just
built and will install; even if currently the "building procedure" for
blib/lib/Git.pm may happen to be bit-for-bit copy of the original, that is
just by accident and not something we would want to rely on.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] test-lib.sh: set PERL5LIB instead of GITPERLLIB
  2008-06-01 21:33         ` Junio C Hamano
@ 2008-06-01 22:16           ` Lea Wiemann
  2008-06-02  5:17             ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Lea Wiemann @ 2008-06-01 22:16 UTC (permalink / raw)
  To: git; +Cc: Lea Wiemann

By setting PERL5LIB for the tests, we enable Perl test scripts to just
say "use Git;" without adding the GITPERLLIB paths to @INC beforehand.

Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
Junio C Hamano wrote:
> Lea Wiemann <lewiemann@gmail.com> writes:
>> How about [setting PERL5LIB in] test-lib.sh?
> 
> Sure, I think that would be sensible.

Actually, GITPERLLIB becomes redundant in that case, so I've simply
replaced it with PERL5LIB in test-lib.sh.

I've tested the patch: all scripts pick up the right Git.pm, and all
tests still pass.  It also preserves existing PERL5LIB variables
correctly.

Btw, I'm assuming that it's OK to post small patches (like this one)
as replies.  If you or anyone else would prefer independent patches to
always start a new thread, please let me know.

-- Lea

 t/test-lib.sh |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 99b63da..2f86831 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -454,8 +454,9 @@ GIT_CONFIG_NOSYSTEM=1
 GIT_CONFIG_NOGLOBAL=1
 export PATH GIT_EXEC_PATH GIT_TEMPLATE_DIR GIT_CONFIG_NOSYSTEM GIT_CONFIG_NOGLOBAL
 
-GITPERLLIB=$(pwd)/../perl/blib/lib:$(pwd)/../perl/blib/arch/auto/Git
-export GITPERLLIB
+test -n "$PERL5LIB" && PERL5LIB=":$PERL5LIB"
+PERL5LIB="$(pwd)/../perl/blib/lib:$(pwd)/../perl/blib/arch/auto/Git$PERL5LIB"
+export PERL5LIB
 test -d ../templates/blt || {
 	error "You haven't built things yet, have you?"
 }
-- 
1.5.5.GIT

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-05-31 12:39     ` Lea Wiemann
@ 2008-06-01 22:19       ` Jakub Narebski
  2008-06-02  5:35         ` Junio C Hamano
  2008-06-02  9:29         ` Petr Baudis
  0 siblings, 2 replies; 23+ messages in thread
From: Jakub Narebski @ 2008-06-01 22:19 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

On Sat, 31 May 2008, Lea Wiemann wrote:
> Jakub Narebski wrote:
>>
>> But I'm not sure for example [that] parsing subroutines [that parse the
>> output of the git commands should be in Git.pm].
>> 
>> [...] IMVHO caching command output is bad idea.  I'd rather have gitweb
>> cache _parsed_ data (i.e. Perl structures).
> 
> Yes, I agree with you that parsed data should be cached, but for that 
> reason I think that the subroutine that parse git's output should be in 
> Git.pm, not in Gitweb:

This not necessarily follows.  First, I'm not sure if gitweb's parse_*
subroutines are generic enough; please take into account that Git.pm 
is used also by other git commands coded in Perl: git-cvsexportcommit
(but not git-cvsserver?), git-send-email, git-svn and (internal)
git-add--interactive.  Second, caching of _parsed_ data can be
implemented in gitweb, or in Gitweb::Cache / Gitweb::Caching.

> If you want to cache parsed data, you need the caching layer in between 
> the application (= Gitweb) and the repository access layer (= command 
> output parsing), or you would have to insert cache code at every call to 
> the repository access layer.  

Not every.  Please don't assume that we would want to cache _all_ the
data; herein madness lies.

But there is another reason to have caching as a layer, namely having
caching optional (i.e. enabled or disable), although this can be also
achieved by choosing 'Null'/'None' caching engine.

> So you end up with these layers: 
> 
> (Layer 0: Front-end [HTML] caching.)
> Layer 1: Application (Gitweb)
> Layer 2: Back-end caching
> Layer 3: Repository access (command parsing)
> Layer 4: Calls to the git binary
> 
> Layer 3 and 4 are application-independent (i.e. not Gitweb specific), 
> and since they form a usable API, they might as well be written as a 
> separate API rather than lumped together with Gitweb.  Git.pm is a start 
> of such an API (it does layer 4 and a little bit of layer 3), so it 
> seems natural for me to extend it.

This assumes that command parsing used by gitweb are generic enough
to put them in Git.pm.  But some IMVHO are very gitweb-specific, for
example the part in parse_commit_text() beginning with 
  # remove leading stuff of merges to make the interesting part visible
and the 'age_string*' keys there, parse_difftree_raw_line() which
currently does not support '-z' output, parse_from_to_diffinfo() which
is _very_ gitweb specific, git_get_heads_list() which is not generic
enough (it gets info which gitweb needs, but no more), etc.

> Layer 2 is application-independent as well, so it can become an extra 
> class in Git.pm or a separate module.  (It should stay independent of 
> layers 3 and 4).

I think it would be better as separate module.  Would it be Git::Cache
(or Git::Caching), Gitweb::Cache, or part of gitweb, that would have
to be decided.  Besides, I'm not sure if it is really application-
-independent as you say: I think we would get better result if we
collate data first, which is application dependent.  Also I think
there is no sense to cache everything: what to cache is again
application dependent.

> We may need to have an explicitly implemented layer 0 (front-end 
> caching) in Gitweb for at least a subset of pages (like project pages), 
> but we'll see if that's necessary.

I think that front-end caching (HTML/RSS/Atom output caching) has sense
for large web pages (large size and large number of items), such as
projects list page if it is unpaginated (and perhaps even if it is
divided into pages, although I'm sure not for project search page),
commonly requested snapshots (if they are enabled), large trees like
SPECS directory with all the package *.spec files for distribution
repository, perhaps summary/feed for most popular projects.  If (when)
syntax highlighting would got implemented, probably also caching
blob view (as CPU can become issue).

Front-end (HTML output) caching has the advantages of easy to calculate
strong ETag, and web server support for If-Match: and If-None-Match:
HTTP/1.1 headers.  You can easy support Range: request, needed for
resuming download (e.g. for downloading snapshots, if this feature is
enabled in gitweb).  You can even compress the output, and serve it to
clients which support proper transparent compression (Content-Encoding).

And of course it has the advantage of actually been written and tested
in work, in the case of kernel.org gitweb.  Although caching parsed
data was implemented in repo.or.cz gitweb, it was done only for
projects list view, and it is quite new and not so thoroughly tested
  http://article.gmane.org/gmane.comp.version-control.git/77469


It would be nice for front-end caching to have an option to use absolute
time for all time/dates, and to (optionally) not use adaptive
Content-Type...

>> Caching git command output (something like IPC::Cmd::Cached?)
>> would only repeat bad I/O patterns of git commands.
> 
> I hope you're not assuming that the back-end cache will reside on disk 
> -- the problem is IO throughput, so having a cache on disk can really 
> only work for a front-end cache.  For the back-end cache, we *will* have 
> to use a memory cache (or no cache at all).

Don't assume, check (in this case: benchmark[1]).  

But I reallu don't know enough about caching to say if it is one way
or the other...

[1] http://cpan.robm.fastmail.fm/cache_perf.html (a bit old)
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-05-31 14:19   ` [PATCH v2] " Lea Wiemann
  2008-05-31 14:34     ` Lea Wiemann
  2008-06-01  8:23     ` Junio C Hamano
@ 2008-06-01 23:18     ` Lea Wiemann
  2 siblings, 0 replies; 23+ messages in thread
From: Lea Wiemann @ 2008-06-01 23:18 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

Lea Wiemann wrote:
> Re: [PATCH v2] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash

Since I just renamed the method from parse_rev to get_hash, I would 
technically have to post a v3 of my patch, but I'm afraid that at some 
point I'll have a v4 and v5, with a lot of time wasted. :)

Hence, to minimize effort on all sides, I'd rather suggest that I won't 
post any Gitweb patches (other than as RFCs, or perhaps bug fixes to 
existing code) for now, so that we don't introduce an unnecessary Git.pm 
dependency while the refactoring happens.  Also, since I'm posting 
(too?) many patches around Git.pm and Gitweb already, this will reduce 
the load on reviewers.

-- Lea

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] test-lib.sh: set PERL5LIB instead of GITPERLLIB
  2008-06-01 22:16           ` [PATCH] test-lib.sh: set PERL5LIB instead of GITPERLLIB Lea Wiemann
@ 2008-06-02  5:17             ` Junio C Hamano
  2008-06-02 14:08               ` Lea Wiemann
  2008-06-03  0:20               ` [PATCH] " Lea Wiemann
  0 siblings, 2 replies; 23+ messages in thread
From: Junio C Hamano @ 2008-06-02  5:17 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

Lea Wiemann <lewiemann@gmail.com> writes:

>>> How about [setting PERL5LIB in] test-lib.sh?
>> 
>> Sure, I think that would be sensible.
>
> Actually, GITPERLLIB becomes redundant in that case, so I've simply
> replaced it with PERL5LIB in test-lib.sh.

Sorry, but now you mention it, it comes back to me why we have GITPERLLIB,
why the setting of GITPERLLIB in t/test-lib.sh we already have did not
work, and why my "this on top of yours" was a mere workaround and not a
real fix.

The fact is that Git.pm can be installed in a private path outside of
where people usually install site-perl modules, and the primary Makefile
has this rule to munge our perl scripts:

        $(patsubst %.perl,%,$(SCRIPT_PERL)): % : %.perl
                $(QUIET_GEN)$(RM) $@ $@+ && \
                INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C perl -s --no-print-directory instlibdir` && \
                sed -e '1{' \
                    -e '	s|#!.*perl|#!$(PERL_PATH_SQ)|' \
                    -e '	h' \
                    -e '	s=.*=use lib (split(/:/, $$ENV{GITPERLLIB} || "@@INSTLIBDIR@@"));=' \
                    -e '	H' \
                    -e '	x' \
                    -e '}' \
                    -e 's|@@INSTLIBDIR@@|'"$$INSTLIBDIR"'|g' \
                    -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
                    $@.perl >$@+ && \
                chmod +x $@+ && \
                mv $@+ $@

so that people do not have to have that path on PERL5LIB in their
environment when they use git (namely, git-send-email and other scripts
that do "use Git").

The real fix to the issue, which is consistent with what 6fcca93 (Use
$GITPERLLIB instead of $RUNNING_GIT_TESTS and centralize @INC munging,
2006-07-03) did, is not touch any of the t/* files (neither my "this on
top of yours", nor the patch I am responding to), but to fix the build
procedure of gitweb/gitweb.perl so that the above script rewriting is also
applied to it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-06-01 22:19       ` Jakub Narebski
@ 2008-06-02  5:35         ` Junio C Hamano
  2008-06-02  9:29         ` Petr Baudis
  1 sibling, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2008-06-02  5:35 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: Jakub Narebski, git

Jakub Narebski <jnareb@gmail.com> writes:

> On Sat, 31 May 2008, Lea Wiemann wrote:
> ...
>> So you end up with these layers: 
>> 
>> (Layer 0: Front-end [HTML] caching.)
>> Layer 1: Application (Gitweb)
>> Layer 2: Back-end caching
>> Layer 3: Repository access (command parsing)
>> Layer 4: Calls to the git binary
>> 
>> Layer 3 and 4 are application-independent (i.e. not Gitweb specific), 
>> and since they form a usable API, they might as well be written as a 
>> separate API rather than lumped together with Gitweb.  Git.pm is a start 
>> of such an API (it does layer 4 and a little bit of layer 3), so it 
>> seems natural for me to extend it.
>
> This assumes that command parsing used by gitweb are generic enough
> to put them in Git.pm.  But some IMVHO are very gitweb-specific, for
> example the part in parse_commit_text() beginning with 
>   # remove leading stuff of merges to make the interesting part visible
> and the 'age_string*' keys there, parse_difftree_raw_line() which
> currently does not support '-z' output, parse_from_to_diffinfo() which
> is _very_ gitweb specific, git_get_heads_list() which is not generic
> enough (it gets info which gitweb needs, but no more), etc.
>
>> Layer 2 is application-independent as well, so it can become an extra 
>> class in Git.pm or a separate module.  (It should stay independent of 
>> layers 3 and 4).
>
> I think it would be better as separate module.  Would it be Git::Cache
> (or Git::Caching), Gitweb::Cache, or part of gitweb, that would have
> to be decided.  Besides, I'm not sure if it is really application-
> -independent as you say: I think we would get better result if we
> collate data first, which is application dependent.  Also I think
> there is no sense to cache everything: what to cache is again
> application dependent.

Even though I (for some unknown reason) rarely agree with Jakub on this
list, I agree 100% with the above paragraph.  In fact I yesterday started
to write exactly the same thing but I could not word it well enough, and I
am glad Jakub said what I wanted to say in a form that is much clearer
than I would have ;-).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-06-01 22:19       ` Jakub Narebski
  2008-06-02  5:35         ` Junio C Hamano
@ 2008-06-02  9:29         ` Petr Baudis
  2008-06-02 21:32           ` Jakub Narebski
  1 sibling, 1 reply; 23+ messages in thread
From: Petr Baudis @ 2008-06-02  9:29 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Lea Wiemann, git

On Mon, Jun 02, 2008 at 12:19:23AM +0200, Jakub Narebski wrote:
> On Sat, 31 May 2008, Lea Wiemann wrote:
> > Layer 2 is application-independent as well, so it can become an extra 
> > class in Git.pm or a separate module.  (It should stay independent of 
> > layers 3 and 4).
> 
> I think it would be better as separate module.  Would it be Git::Cache
> (or Git::Caching), Gitweb::Cache, or part of gitweb, that would have
> to be decided.  Besides, I'm not sure if it is really application-
> -independent as you say: I think we would get better result if we
> collate data first, which is application dependent.  Also I think
> there is no sense to cache everything: what to cache is again
> application dependent.

I'm not very comfortable with putting caching directly to Git either.

Even if you _would_ decide that you want to add caching directly to Git
interface, it would be better to use extra module Git::Cached and
auto-wrap all Git functions through AUTOLOAD. But that still has the
problems Jakub desrcibed.

> > We may need to have an explicitly implemented layer 0 (front-end 
> > caching) in Gitweb for at least a subset of pages (like project pages), 
> > but we'll see if that's necessary.
> 
> I think that front-end caching (HTML/RSS/Atom output caching) has sense
> for large web pages (large size and large number of items), such as
> projects list page if it is unpaginated (and perhaps even if it is
> divided into pages, although I'm sure not for project search page),
> commonly requested snapshots (if they are enabled), large trees like
> SPECS directory with all the package *.spec files for distribution
> repository, perhaps summary/feed for most popular projects. If (when)
> syntax highlighting would got implemented, probably also caching
> blob view (as CPU can become issue).
> 
> Front-end (HTML output) caching has the advantages of easy to calculate
> strong ETag, and web server support for If-Match: and If-None-Match:
> HTTP/1.1 headers.  You can easy support Range: request, needed for
> resuming download (e.g. for downloading snapshots, if this feature is
> enabled in gitweb).

Caching snapshots would definitely make sense, sure.

> You can even compress the output, and serve it to clients which
> support proper transparent compression (Content-Encoding).

What does this have to do with caching?

> And of course it has the advantage of actually been written and tested
> in work, in the case of kernel.org gitweb.  Although caching parsed
> data was implemented in repo.or.cz gitweb, it was done only for
> projects list view, and it is quite new and not so thoroughly tested
>   http://article.gmane.org/gmane.comp.version-control.git/77469

This argument does have some value, but I don't think it matters too
much, since as far as I understood, it is going to get largely
reimplemented anyway.

> It would be nice for front-end caching to have an option to use absolute
> time for all time/dates, and to (optionally) not use adaptive
> Content-Type...

I'd hate to have to do unnecessary compromises in order to get sensible
caching.

Even in your excellent series on Gitweb caching series, I didn't spot
any arguments that would put frontend caching in front of the
intermediate data caching option; yes, it is the simplest solution
implementation-wise, but also the least flexible one. My gut feeling is
still to go with data caching instead of HTML caching.

-- 
				Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.	-- J. W. von Goethe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] test-lib.sh: set PERL5LIB instead of GITPERLLIB
  2008-06-02  5:17             ` Junio C Hamano
@ 2008-06-02 14:08               ` Lea Wiemann
  2008-06-02 14:13                 ` [PATCH v2] " Lea Wiemann
  2008-06-03  0:20               ` [PATCH] " Lea Wiemann
  1 sibling, 1 reply; 23+ messages in thread
From: Lea Wiemann @ 2008-06-02 14:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> the primary Makefile has this rule to munge our perl scripts:
> 
>                     -e '	s=.*=use lib (split(/:/, $$ENV{GITPERLLIB} || "@@INSTLIBDIR@@"));=' \
> 
> The real fix to the issue [is] to fix the build procedure of
> gitweb/gitweb.perl so that the above script rewriting is also applied to it.

I see -- in that case, gitweb.perl should indeed be fixed.

Still however, for the Perl tests (t/t9700/test.pl in my branch) we 
don't have a Makefile, so the only way for them to pick up the right 
Git.pm module is modifying PERL5LIB in test-lib.sh. [1]  I just realized 
that we not only don't need to set GITPERLLIB in this case, but we 
should actually unset it so that user-set paths don't cause a wrong 
version of Git.pm to be loaded.  I'll send a new patch.

-- Lea

[1] Or alternatively adding some (lengthy and cwd-dependent) "use lib 
..." statement at the top of every Perl test file, but that doesn't seem 
preferable given that we can solve this centrally in test-lib.sh.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2] test-lib.sh: set PERL5LIB instead of GITPERLLIB
  2008-06-02 14:08               ` Lea Wiemann
@ 2008-06-02 14:13                 ` Lea Wiemann
  2008-06-02 20:01                   ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Lea Wiemann @ 2008-06-02 14:13 UTC (permalink / raw)
  To: git; +Cc: Lea Wiemann, Lea Wiemann

From: Lea Wiemann <lewiemann@gmail.com>

By setting PERL5LIB for the tests, we enable Perl test scripts to just
say "use Git;" without adding the GITPERLLIB paths to @INC beforehand.

Also, unset GITPERLLIB so that user-set paths in GETPERLLIB don't
cause the wrong module to be loaded.

Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
Added since v1: Unset GITPERLLIB.

 t/test-lib.sh |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 99b63da..8ea0511 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -454,8 +454,10 @@ GIT_CONFIG_NOSYSTEM=1
 GIT_CONFIG_NOGLOBAL=1
 export PATH GIT_EXEC_PATH GIT_TEMPLATE_DIR GIT_CONFIG_NOSYSTEM GIT_CONFIG_NOGLOBAL
 
-GITPERLLIB=$(pwd)/../perl/blib/lib:$(pwd)/../perl/blib/arch/auto/Git
-export GITPERLLIB
+unset GITPERLLIB
+test -n "$PERL5LIB" && PERL5LIB=":$PERL5LIB"
+PERL5LIB="$(pwd)/../perl/blib/lib:$(pwd)/../perl/blib/arch/auto/Git$PERL5LIB"
+export PERL5LIB
 test -d ../templates/blt || {
 	error "You haven't built things yet, have you?"
 }
-- 
1.5.5.GIT

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] test-lib.sh: set PERL5LIB instead of GITPERLLIB
  2008-06-02 14:13                 ` [PATCH v2] " Lea Wiemann
@ 2008-06-02 20:01                   ` Junio C Hamano
  2008-06-02 22:19                     ` Lea Wiemann
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2008-06-02 20:01 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git, Lea Wiemann

Lea Wiemann <lewiemann@gmail.com> writes:

> From: Lea Wiemann <lewiemann@gmail.com>
>
> By setting PERL5LIB for the tests, we enable Perl test scripts to just
> say "use Git;" without adding the GITPERLLIB paths to @INC beforehand.
>
> Also, unset GITPERLLIB so that user-set paths in GETPERLLIB don't
> cause the wrong module to be loaded.
>
> Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
> ---
> Added since v1: Unset GITPERLLIB.

This goes even further in the opposite direction from the original.  It
looks cleaner, and I'd prefer the approach _if_ it worked.

But I do not think it does.

I think the reason we did not do this long time ago in 6fcca93 (Use
$GITPERLLIB instead of $RUNNING_GIT_TESTS and centralize @INC munging,
2006-07-03) was because of the precedence between "use lib @these_paths"
and $PERL5LIB does not work the way you seem to think it does.

If you say in your script:

	use lib '/usr/local/git/perl'
        use Git;

(and we do want to say so in our script to make sure that people can
install Git.pm outside of system install paths), running the script with
PERL5LIB set to elsewhere that has Git.pm would not help.

I just tried:

	$ pwd
        /net/knick/home/junio/junk/
	$ cat j.perl
        #!/usr/bin/perl -w
        use lib '/net/knick/home/junio/junk/1dir';
        use G;
        print $G::it;
        $ cat 1dir/G.pm
        package G;
        our $it = 1;
        1;
        $ cat 2dir/G.pm
        package G;
        our $it = 2;
        1;
        $ perl -w j.perl
        1
        $ PERL5LIB=/net/knick/home/junio/junk/2dir perl -w j.perl
        1

Because "t/t9700/<many>.perl" will _not_ be installed anyway, I do not
think they need to be "built" like the scripted Porcelain commands.  If
they unconditionally do "use lib $ENV{GITPERLLIB};" upfront, wouldn't that
be enough to guarantee that you would use the freshly built one during the
test, not the instsalled one?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-06-02  9:29         ` Petr Baudis
@ 2008-06-02 21:32           ` Jakub Narebski
  2008-06-02 22:31             ` Lea Wiemann
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Narebski @ 2008-06-02 21:32 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Lea Wiemann, git, Lars Hjemli, John Hawley

On Mon, 2 June 2008, Petr Baudis wrote:
> On Mon, Jun 02, 2008 at 12:19:23AM +0200, Jakub Narebski wrote:
> > On Sat, 31 May 2008, Lea Wiemann wrote:
> > >
> > > We may need to have an explicitly implemented layer 0 (front-end 
> > > caching) in Gitweb for at least a subset of pages (like project pages), 
> > > but we'll see if that's necessary.
> > 
> > I think that front-end caching (HTML/RSS/Atom output caching) has sense
> > for large web pages (large size and large number of items), such as
> > projects list page if it is unpaginated (and perhaps even if it is
> > divided into pages, although I'm sure not for project search page),
> > commonly requested snapshots (if they are enabled), large trees like
> > SPECS directory with all the package *.spec files for distribution
> > repository, perhaps summary/feed for most popular projects. If (when)
> > syntax highlighting would got implemented, probably also caching
> > blob view (as CPU can become issue).
> > 
> > Front-end (HTML output) caching has the advantages of easy to calculate
> > strong ETag, and web server support for If-Match: and If-None-Match:
> > HTTP/1.1 headers.  You can easy support Range: request, needed for
> > resuming download (e.g. for downloading snapshots, if this feature is
> > enabled in gitweb).
> 
> Caching snapshots would definitely make sense, sure.

That reminds me to finally implement nicer (git-describe like)
[proposed] snapshot filenames.  For example for snapshot of state
given by some tag (snapshot of tagged release [1]), don't use
generic
  <project basename>-<40-chars sha1>.<suffix>
  git-b2a42f55bc419352b848751b0763b0a2d1198479.tar.gz
but
  <project basename>-<tag name>.<suffix>
  git-v1.5.5.3.tar.gz
(well, currently tags don't have 'snapshot' link, but this is easily
fixed).  What do you think about (ab)using 'fp' (file_parent)
parameter to pass proposed snapshot file name?

[1] Would it be good feature to add support for limiting snapshots
to snapshots only of tagged releases (which would be I guess more
important when gitweb caching gets implemented).
 
> > You can even compress the output, and serve it to clients which
> > support proper transparent compression (Content-Encoding).
> 
> What does this have to do with caching?

Well, only in a sense that with front-end caching (to choose if CPU
matters most) this can be done "for free", without incurring extra
CPU, at the cost of little more disk space.

Of course, if most clients understand (accept) Content-Encoding
(transfer encoding), you can store compressed output, with a little
CPU cost to decompress for non-conformant clients; this way frontend
caching can have cache size comparable to [parsed] data caching.

> > And of course it has the advantage of actually been written and tested
> > in work, in the case of kernel.org gitweb.  Although caching parsed
> > data was implemented in repo.or.cz gitweb, it was done only for
> > projects list view, and it is quite new and not so thoroughly tested
> >   http://article.gmane.org/gmane.comp.version-control.git/77469
> 
> This argument does have some value, but I don't think it matters too
> much, since as far as I understood, it is going to get largely
> reimplemented anyway.

What I'd like to see in a bit of time is some estimate how much time
would take implementing data caching almost from scratch (a bit of
code in repo's gitweb), compared to merging in kernel.org's gitweb
caching code...

> > It would be nice for front-end caching to have an option to use absolute
> > time for all time/dates, and to (optionally) not use adaptive
> > Content-Type...
> 
> I'd hate to have to do unnecessary compromises in order to get sensible
> caching.
> 
> Even in your excellent series on Gitweb caching series, I didn't spot
> any arguments that would put frontend caching in front of the
> intermediate data caching option; yes, it is the simplest solution
> implementation-wise, but also the least flexible one. My gut feeling is
> still to go with data caching instead of HTML caching.

As Lars Hjemli wrote in "[RFC/PATCH] gitweb: Paginate project list"
thread (unfortunately not all articles got to git mailing list)

  http://thread.gmane.org/gmane.comp.version-control.git/81838/focus=81875

  <quote>
  In cgit I've chosen "projectlist in a single file" and "cache html
  output". This makes it cheap (in terms of cpu and io) to both generate
  and serve the cached page (and the cache works for all pages).
  </quote>

  <quote>
  While I agree that caching search result output almost never makes
  sense, I think it's more important that cache hits requires minimal
  processing. This is why I've chosen to cache the final result instead
  of an intermediate state, but both solutions obviously got some pros
  and cons.
  </quote>

caching final output is important if you want to minimize processing
(CPU time).  I'd say also if you want to implement Range: for resumable
downloads (snapshots), because otherwise I think it would be quote hard
to do reasonably (with caching only data).

So I guess best solution would be mixed one: use output cache for large
or CPU intensive pages, use data caching to limit cache size and for
maximum flexibility (relative dates, sorting by columns: athough that
would be best solved using some DHTML/JavaScript, paginated output,
projects search etc.)


By the way, we have your (Petr 'Pasky' Baudis, based on repo.or.cz)
and John 'Warthog9' Hawley (based on kernel.org) statements that gitweb
performance is I/O bound, but I don't remember any hard data.

I have said wrongly that one can use 'fio' tool to check I/O
performance; it is not true, this tool can be used to test _filesystem_
by generating specified pattern of I/O load.  I don't know of any tool
which allow to measure if I/O is bottleneck for given application;
'iogrind' measures cold cache start, and it requires I think compiled
program, as it uses Valgrind.  You can measure CPU load, time to
response and memory usage using 'time' (running gitweb as script),
ApacheBench and top; you can measure latency using LatencyTOP.  

Is there some iotop tool, and can it be used to measure performance
bottlenecks of web scripts (web applications)?

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] test-lib.sh: set PERL5LIB instead of GITPERLLIB
  2008-06-02 20:01                   ` Junio C Hamano
@ 2008-06-02 22:19                     ` Lea Wiemann
  0 siblings, 0 replies; 23+ messages in thread
From: Lea Wiemann @ 2008-06-02 22:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> the precedence between "use lib @these_paths"

Apologies, I somehow missed the or-expression in $$ENV{GITPERLLIB} || 
"@@INSTLIBDIR@@" and thought we didn't set any library paths if we unset 
GITPERLLIB. :(  So yes, of course my patch is rubbish -- sorry for 
wasting your time, I'll check more carefully next time!

> If [t/t9700/<many>.perl] unconditionally do "use lib $ENV{GITPERLLIB};"
> upfront, wouldn't that be enough to guarantee that you would use the
> freshly built one during the test, not the instsalled one?

Yup, that's fine.  I'll add it to the test file.

-- Lea

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
  2008-06-02 21:32           ` Jakub Narebski
@ 2008-06-02 22:31             ` Lea Wiemann
  0 siblings, 0 replies; 23+ messages in thread
From: Lea Wiemann @ 2008-06-02 22:31 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Petr Baudis, git, Lars Hjemli, John Hawley

Jakub Narebski wrote:
> Is there some iotop tool, and can it be used to measure performance
> bottlenecks of web scripts (web applications)?

There is iotop and sysstat's iostat, but the more interesting measure 
would probably be comparing top's 'wait' vs user+sys CPU time under 
realistic load ('wait' indicates that the CPU is idling because it is 
waiting for IO).

-- Lea

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] test-lib.sh: set PERL5LIB instead of GITPERLLIB
  2008-06-02  5:17             ` Junio C Hamano
  2008-06-02 14:08               ` Lea Wiemann
@ 2008-06-03  0:20               ` Lea Wiemann
  1 sibling, 0 replies; 23+ messages in thread
From: Lea Wiemann @ 2008-06-03  0:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> the primary Makefile has this rule to munge our perl scripts:
>
>                 [snip]
>                 sed -e '1{' \
>                     -e '	s|#!.*perl|#!$(PERL_PATH_SQ)|' \
>                     -e '	h' \
>                     -e '	s=.*=use lib (split(/:/, $$ENV{GITPERLLIB} || "@@INSTLIBDIR@@"));=' \
> 
> so that people do not have to have that path on PERL5LIB in their
> environment when they use [...] scripts that do "use Git"
> 
> The real fix to the issue [is]  to fix the build
> procedure of gitweb/gitweb.perl so that the above script rewriting is also
> applied to it.

I'm not good enough with sed to fix this, so if anyone wants to write a 
patch to the Makefile that adds "use lib (split(/:/, $$ENV{GITPERLLIB} 
|| "@@INSTLIBDIR@@"))" to gitweb.perl, please go ahead. ;-)

-- Lea

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2008-06-03  0:22 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-05-30 23:00 [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash Lea Wiemann
2008-05-30 23:03 ` Lea Wiemann
2008-05-31  9:40   ` Jakub Narebski
2008-05-31 12:39     ` Lea Wiemann
2008-06-01 22:19       ` Jakub Narebski
2008-06-02  5:35         ` Junio C Hamano
2008-06-02  9:29         ` Petr Baudis
2008-06-02 21:32           ` Jakub Narebski
2008-06-02 22:31             ` Lea Wiemann
2008-05-31 13:04 ` Petr Baudis
2008-05-31 14:19   ` [PATCH v2] " Lea Wiemann
2008-05-31 14:34     ` Lea Wiemann
2008-06-01  8:23     ` Junio C Hamano
2008-06-01 15:44       ` Lea Wiemann
2008-06-01 21:33         ` Junio C Hamano
2008-06-01 22:16           ` [PATCH] test-lib.sh: set PERL5LIB instead of GITPERLLIB Lea Wiemann
2008-06-02  5:17             ` Junio C Hamano
2008-06-02 14:08               ` Lea Wiemann
2008-06-02 14:13                 ` [PATCH v2] " Lea Wiemann
2008-06-02 20:01                   ` Junio C Hamano
2008-06-02 22:19                     ` Lea Wiemann
2008-06-03  0:20               ` [PATCH] " Lea Wiemann
2008-06-01 23:18     ` [PATCH v2] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash Lea Wiemann

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).