git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Lea Wiemann <lewiemann@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash
Date: Mon, 2 Jun 2008 00:19:23 +0200	[thread overview]
Message-ID: <200806020019.23858.jnareb@gmail.com> (raw)
In-Reply-To: <4841471E.2070302@gmail.com>

On Sat, 31 May 2008, Lea Wiemann wrote:
> Jakub Narebski wrote:
>>
>> But I'm not sure for example [that] parsing subroutines [that parse the
>> output of the git commands should be in Git.pm].
>> 
>> [...] IMVHO caching command output is bad idea.  I'd rather have gitweb
>> cache _parsed_ data (i.e. Perl structures).
> 
> Yes, I agree with you that parsed data should be cached, but for that 
> reason I think that the subroutine that parse git's output should be in 
> Git.pm, not in Gitweb:

This not necessarily follows.  First, I'm not sure if gitweb's parse_*
subroutines are generic enough; please take into account that Git.pm 
is used also by other git commands coded in Perl: git-cvsexportcommit
(but not git-cvsserver?), git-send-email, git-svn and (internal)
git-add--interactive.  Second, caching of _parsed_ data can be
implemented in gitweb, or in Gitweb::Cache / Gitweb::Caching.

> If you want to cache parsed data, you need the caching layer in between 
> the application (= Gitweb) and the repository access layer (= command 
> output parsing), or you would have to insert cache code at every call to 
> the repository access layer.  

Not every.  Please don't assume that we would want to cache _all_ the
data; herein madness lies.

But there is another reason to have caching as a layer, namely having
caching optional (i.e. enabled or disable), although this can be also
achieved by choosing 'Null'/'None' caching engine.

> So you end up with these layers: 
> 
> (Layer 0: Front-end [HTML] caching.)
> Layer 1: Application (Gitweb)
> Layer 2: Back-end caching
> Layer 3: Repository access (command parsing)
> Layer 4: Calls to the git binary
> 
> Layer 3 and 4 are application-independent (i.e. not Gitweb specific), 
> and since they form a usable API, they might as well be written as a 
> separate API rather than lumped together with Gitweb.  Git.pm is a start 
> of such an API (it does layer 4 and a little bit of layer 3), so it 
> seems natural for me to extend it.

This assumes that command parsing used by gitweb are generic enough
to put them in Git.pm.  But some IMVHO are very gitweb-specific, for
example the part in parse_commit_text() beginning with 
  # remove leading stuff of merges to make the interesting part visible
and the 'age_string*' keys there, parse_difftree_raw_line() which
currently does not support '-z' output, parse_from_to_diffinfo() which
is _very_ gitweb specific, git_get_heads_list() which is not generic
enough (it gets info which gitweb needs, but no more), etc.

> Layer 2 is application-independent as well, so it can become an extra 
> class in Git.pm or a separate module.  (It should stay independent of 
> layers 3 and 4).

I think it would be better as separate module.  Would it be Git::Cache
(or Git::Caching), Gitweb::Cache, or part of gitweb, that would have
to be decided.  Besides, I'm not sure if it is really application-
-independent as you say: I think we would get better result if we
collate data first, which is application dependent.  Also I think
there is no sense to cache everything: what to cache is again
application dependent.

> We may need to have an explicitly implemented layer 0 (front-end 
> caching) in Gitweb for at least a subset of pages (like project pages), 
> but we'll see if that's necessary.

I think that front-end caching (HTML/RSS/Atom output caching) has sense
for large web pages (large size and large number of items), such as
projects list page if it is unpaginated (and perhaps even if it is
divided into pages, although I'm sure not for project search page),
commonly requested snapshots (if they are enabled), large trees like
SPECS directory with all the package *.spec files for distribution
repository, perhaps summary/feed for most popular projects.  If (when)
syntax highlighting would got implemented, probably also caching
blob view (as CPU can become issue).

Front-end (HTML output) caching has the advantages of easy to calculate
strong ETag, and web server support for If-Match: and If-None-Match:
HTTP/1.1 headers.  You can easy support Range: request, needed for
resuming download (e.g. for downloading snapshots, if this feature is
enabled in gitweb).  You can even compress the output, and serve it to
clients which support proper transparent compression (Content-Encoding).

And of course it has the advantage of actually been written and tested
in work, in the case of kernel.org gitweb.  Although caching parsed
data was implemented in repo.or.cz gitweb, it was done only for
projects list view, and it is quite new and not so thoroughly tested
  http://article.gmane.org/gmane.comp.version-control.git/77469


It would be nice for front-end caching to have an option to use absolute
time for all time/dates, and to (optionally) not use adaptive
Content-Type...

>> Caching git command output (something like IPC::Cmd::Cached?)
>> would only repeat bad I/O patterns of git commands.
> 
> I hope you're not assuming that the back-end cache will reside on disk 
> -- the problem is IO throughput, so having a cache on disk can really 
> only work for a front-end cache.  For the back-end cache, we *will* have 
> to use a memory cache (or no cache at all).

Don't assume, check (in this case: benchmark[1]).  

But I reallu don't know enough about caching to say if it is one way
or the other...

[1] http://cpan.robm.fastmail.fm/cache_perf.html (a bit old)
-- 
Jakub Narebski
Poland

  reply	other threads:[~2008-06-01 22:20 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-30 23:00 [PATCH] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash Lea Wiemann
2008-05-30 23:03 ` Lea Wiemann
2008-05-31  9:40   ` Jakub Narebski
2008-05-31 12:39     ` Lea Wiemann
2008-06-01 22:19       ` Jakub Narebski [this message]
2008-06-02  5:35         ` Junio C Hamano
2008-06-02  9:29         ` Petr Baudis
2008-06-02 21:32           ` Jakub Narebski
2008-06-02 22:31             ` Lea Wiemann
2008-05-31 13:04 ` Petr Baudis
2008-05-31 14:19   ` [PATCH v2] " Lea Wiemann
2008-05-31 14:34     ` Lea Wiemann
2008-06-01  8:23     ` Junio C Hamano
2008-06-01 15:44       ` Lea Wiemann
2008-06-01 21:33         ` Junio C Hamano
2008-06-01 22:16           ` [PATCH] test-lib.sh: set PERL5LIB instead of GITPERLLIB Lea Wiemann
2008-06-02  5:17             ` Junio C Hamano
2008-06-02 14:08               ` Lea Wiemann
2008-06-02 14:13                 ` [PATCH v2] " Lea Wiemann
2008-06-02 20:01                   ` Junio C Hamano
2008-06-02 22:19                     ` Lea Wiemann
2008-06-03  0:20               ` [PATCH] " Lea Wiemann
2008-06-01 23:18     ` [PATCH v2] gitweb: use Git.pm, and use its parse_rev method for git_get_head_hash Lea Wiemann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200806020019.23858.jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=lewiemann@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).