git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "J.H." <warthog9@eaglescrag.net>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 00/18] Gitweb caching v8
Date: Thu, 09 Dec 2010 16:43:02 -0800	[thread overview]
Message-ID: <4D017796.4030506@eaglescrag.net> (raw)
In-Reply-To: <m3bp4u34vj.fsf@localhost.localdomain>

On 12/09/2010 03:26 PM, Jakub Narebski wrote:
> John, could you please in the future Cc me?  I am interested in gitweb
> output caching development.  Thanks in advance.

Apologies, apparently screwed up on my git send-email line.  I'll get
that right one of these eons.

> Could you please rebase it on top of v7.2 version?  The v7.2 patch
> series contained a few bugs that needs to be corrected.

I assume you mean 7.4, as opposed to 7.2... otherwise already done!

> I guess that in the final submission (i.e. the one that is to be
> merged in into git.git repository) those changes would be squashed in,
> isn't it?

I have no objections to squashing the reversions into a single patch,
just figured it was easier to break them out for the time being.

>> This still differs, by two patches, from whats in production on kernel.org.
>> It's missing the index page git:// link, and kernel.org and kernel.org also
> 
>>
>> has the forced version matching.  As a note I'll probably let this stew
>> another day or so on kernel.org and then I'll push it into the Fedora update
>> stream, as there's a couple of things in this patch series that would be 
>> good for them to have.
> 
> There was some discussion about git:// link in the past; nevertheless
> this issue is independent on gitweb caching and can (and should) be
> sent as a aeparate patch.
> 
> IIRC we agreed that because of backward compatibility forced versions
> match is quite useless (in general)...

The former wasn't submitted as that is a separate issue, the later was
not agreed on really but mostly me retracting the patches as they
weren't making any headway.

I mention the patches at all as clarification of what's actually running
on kernel.org, and eventually what will be in the gitweb-caching
packages that are part of Fedora and EPEL.

>> There is one additional script I've written that the Fedora folks are using,
>> and that might be useful to include, which is an 'offline' cache file generator.
>> It basically wraps gitweb.cgi and at the end moves the cache file into the right
>> place.  The Fedora folks were finding it took hours to generate their front
>> page, and that doing a background generation almost never completed (due to 
>> process death).  This was a simple way to handle that.  If people would like
>> I can add it in as an additional patch.
> 
> Are you detaching the background process?

No, in fact I completely turn off forking (using the $cacheDoFork variable.)

> It would be nice to have it as separate patch.

I can add it easily enough.

>> v8:
>> 	- Reverting several changes from Jakub's change set that make no sense
>>                 - is_cacheable changed to always return true - nothing special about
>>                   blame or blame_incremental as far as the caching engine is concerned
> 
> 'blame_incremental' is just another version of 'blame' view.  I have
> disabled it when caching is enabled in my rewrite (you instead disabled
> caching for 'blame_incremental' in your v7 and mine v7.x) because I
> couldn't get it to work together with caching.  Did you check that it
> works?

blame works fine, blame_incremental generates but doesn't..... ohhhh
someone added ajaxy kinda stuff and doesn't mention it anywhere.

Exciting.

blame_data needs to not get a 'generating...' page in all likelihood,
generating a blame_incremental page, letting it load and then refreshing
the whole thing gets me what I'm expecting.

Is enough to mask.

Guess I'm looking at a v9 now.

> Besides, withou "tee"-ing, i.e. printing output as it is captured,
> cached 'blame_data' means that 'blame_incremental' is not incremental,
> and therefore it vanishes its advantage over 'blame'.

There are only 2 ways to get to a blame_incremental page

1) By going to a blame page and clicking on the incremental link in the nav

2) By enabling it by default so when you click 'blame' it goes to
incremental first.

> In the case data is in cache, then 'blame_inremental' doesn't have
> advantage over 'blame' either.

Agreed, though it's easy enough to support in the caching engine,
basically don't return 'Generating...' and wait for that data to cache.
 Not really an advantage except that your not waiting for the whole
generation to get a page back at all.

>>                 - Reverted change to reset_output as
>>                         open STDOUT, ">&", \*STDOUT_REAL;
>>                   causes assertion failures:
>>                   Assertion !((((s->var)->sv_flags & (0x00004000|0x00008000)) == 0x00008000) && (((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVGV || ((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVLV)) failed: file "scalar.xs", line 49 at gitweb.cgi line 1221.
>>                   if we encounter an error *BEFORE* we've ever changed the output.
> 
> Which Perl version are you using?  Because I think you found error in Perl.
> Well, at least I have not happen on this bug.

This is perl, v5.10.0 built for x86_64-linux-thread-multi

> I have nothing againts using
> 
>   open STDOUT, ">&STDOUT_REAL";
> 
> though I really prefer that you used lexical filehandles, instead of
> "globs" which are global variables.
> 
> The following works:
> 
>   open STDOUT, '>&', fileno($fh);
> 
> Note that fileno(Symbol::qualify_to_ref($fh)) might be needed...

I see 0 advantage to shifting around STDOUT and STDERR to a lexical
filehandle vs. a glob in this case.  STDOUT_REAL retains all the
properties of STDOUT should it be needed elsewhere, including what it
was going and what it was doing.

I have no objection to shifting the file handles I'm using to lexical
variables, if nothing else the argument about them closing when falling
out of scope is worth it, but for STDOUT, STDERR, etc I don't think
switching to lexicals makes a lot of sense

>>         - Cleanups there were indirectly mentioned by Jakub
>>                 - Elimination of anything even remotely looking like duplicate code
>>                         - Creation of isBinaryAction() and isFeedAction()
> 
> Could you please do not use mixedCase names?

I'm fine with renaming those if you wish.

> First, that is what %actions_info from
> 
>   [PATCH 16/24] gitweb: Introduce %actions_info, gathering information about actions
>   http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163038
>   http://repo.or.cz/w/git/jnareb-git.git/commitdiff/305a10339b33d56b4a50708d71e8f42453c8cb1f
> 
> I have invented for.

I have not based any of my caching engine, right now, on anything you've
done for your rewrite.

> Second, why 'isBinaryAction()'?  there isn't something inherently
> different between binary (':raw') and text (':utf8') output, as I have
> repeatedly said before.

It's a binary action in that you are shoving something down the pipe
with the intention of sending the bits completely raw.  You read the
data raw, and write the data raw.  There is no interpretation of the
data as being anything but straight raw.

Right now, in gitweb already, there are two places that treat output
completely differently:

	- snapshot
	- blob_plain

The only reason isBinaryAction() (or any other function name or process
you want to grant it) exists is so that I can figure out if it's one of
those actions so I can deal with the cache and output handling
differently for each.

Yes, I could flip the entire caching engine over to following the same
mantra for everything and thus there is no need to care, but gitweb
itself isn't really setup to handle that separation cleanly right now,
and I'm trying to make as few bigger changes right now as is.


>>         - Added better error handling
>>                 - Creation of .err file in the cache directory
>>                 - Trap STDERR output into $output_err as this was spewing data prior
>>                   to any header information being sent
> 
> Why it is needed?  We capture output of "die" via CGI::Util::set_message,
> and "warn" output is captured to web server logs... unless you explicitely
> use "print STDERR <sth>" -- don't do that instead.

I have seen, in several instances, a case where git itself will generate
an error, it shoves it to STDERR which makes it to the client before
anything else, thus causing 500 level errors.

Added this so that STDERR got trapped and those messages didn't make it out.

>>         - Added hidden field in footer for url & hash of url, which is extremely useful
>>           for debugging
> 
> Nice idea, I'll see it.  Can it be disabled (information leakage)?

There's not really any information leakage per-se, unless you call
md5suming the url information leakage.

- John 'Warthog9' Hawley

  reply	other threads:[~2010-12-10  0:41 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 01/18] gitweb: Prepare for splitting gitweb John 'Warthog9' Hawley
2010-12-09 23:30   ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 02/18] gitweb: add output buffering and associated functions John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 03/18] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 04/18] gitweb: Minimal testing of gitweb caching John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 05/18] gitweb: Regression fix concerning binary output of files John 'Warthog9' Hawley
2010-12-09 23:33   ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 06/18] gitweb: Add more explicit means of disabling 'Generating...' page John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 07/18] gitweb: Revert back to $cache_enable vs. $caching_enabled John 'Warthog9' Hawley
2010-12-09 23:38   ` Jakub Narebski
2010-12-10  2:38     ` J.H.
2010-12-10 13:48       ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 08/18] gitweb: Change is_cacheable() to return true always John 'Warthog9' Hawley
2010-12-09 23:46   ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 09/18] gitweb: Revert reset_output() back to original code John 'Warthog9' Hawley
2010-12-09 23:58   ` Jakub Narebski
2010-12-10  2:43     ` J.H.
2010-12-09 21:57 ` [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type John 'Warthog9' Hawley
2010-12-10  0:06   ` Jakub Narebski
2010-12-10  3:39     ` J.H.
2010-12-10 12:10       ` Jakub Narebski
2010-12-10 12:25         ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 11/18] gitweb: add isDumbClient() check John 'Warthog9' Hawley
2010-12-10  0:12   ` Jakub Narebski
2010-12-10  4:00     ` J.H.
2010-12-11  0:07       ` Junio C Hamano
2010-12-11  0:15         ` Jakub Narebski
2010-12-11  1:15           ` J.H.
2010-12-11  1:40             ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs John 'Warthog9' Hawley
2010-12-10  0:16   ` Jakub Narebski
2010-12-10  0:32     ` Junio C Hamano
2010-12-10  0:47       ` Jakub Narebski
2010-12-10  5:56       ` J.H.
2010-12-09 21:57 ` [PATCH 13/18] gitweb: Add commented url & url hash to page footer John 'Warthog9' Hawley
2010-12-10  0:26   ` Jakub Narebski
2010-12-10  6:10     ` J.H.
2010-12-09 21:57 ` [PATCH 14/18] gitweb: add print_transient_header() function for central header printing John 'Warthog9' Hawley
2010-12-10  0:36   ` Jakub Narebski
2010-12-10  6:18     ` J.H.
2010-12-09 21:57 ` [PATCH 15/18] gitweb: Add show_warning() to display an immediate warning, with refresh John 'Warthog9' Hawley
2010-12-10  1:01   ` Jakub Narebski
2010-12-10  7:38     ` J.H.
2010-12-10 14:10       ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 16/18] gitweb: When changing output (STDOUT) change STDERR as well John 'Warthog9' Hawley
2010-12-10  1:36   ` Jakub Narebski
2010-12-12  5:25     ` J.H.
2010-12-12 15:17       ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 17/18] gitweb: Prepare for cached error pages & better error page handling John 'Warthog9' Hawley
2010-12-10  1:49   ` Jakub Narebski
2010-12-10  8:33     ` J.H.
2010-12-10 20:33       ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 18/18] gitweb: Add better error handling for gitweb caching John 'Warthog9' Hawley
2010-12-10  1:56   ` Jakub Narebski
2010-12-09 23:26 ` [PATCH 00/18] Gitweb caching v8 Jakub Narebski
2010-12-10  0:43   ` J.H. [this message]
2010-12-10  1:27     ` Jakub Narebski
2010-12-10  0:39 ` Junio C Hamano
2010-12-10  0:45   ` J.H.

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D017796.4030506@eaglescrag.net \
    --to=warthog9@eaglescrag.net \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).