git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC/PATCH 2/2] gitweb: check given hash before trying to create snapshot
@ 2009-09-10 21:20 Mark Rada
  2009-09-11  7:52 ` Jakub Narebski
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Rada @ 2009-09-10 21:20 UTC (permalink / raw)
  To: git; +Cc: Jakub Narebski, Junio C Hamano

This patch is dual purposed.

First, it makes things nicer in cases when you hand craft the snapshot
URL but make a typo (e.g. netx instead of next); you will now get an
error message instead of a broken tarball.

Second, any given treeish will always be translated to the full length,
unambiguous, hash id; this will be useful for things like creating
unique names for snapshot caches.

This patch includes test for t9501 to demonstrate the changed
functionality.

Signed-off-by: Mark Rada <marada@uwaterloo.ca>
---
 gitweb/gitweb.perl                       |    5 +++--
 t/t9501-gitweb-standalone-http-status.sh |   26 ++++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index d650188..4ae960c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5197,8 +5197,9 @@ sub git_snapshot {
 		die_error(403, "Unsupported snapshot format");
 	}
 
-	if (!defined $hash) {
-		$hash = &git_get_hash($project);
+	my $snapshot = &git_get_hash($project, $hash);
+	if (!$snapshot) {
+		die_error(400, "Not a valid hash id: $hash");
 	}
 
 	my $name = $project;
diff --git a/t/t9501-gitweb-standalone-http-status.sh b/t/t9501-gitweb-standalone-http-status.sh
index d0ff21d..4f8f147 100644
--- a/t/t9501-gitweb-standalone-http-status.sh
+++ b/t/t9501-gitweb-standalone-http-status.sh
@@ -75,4 +75,30 @@ test_expect_success \
 test_debug 'cat gitweb.output'
 
 
+test_expect_success \
+	'snapshots: bad treeish id' \
+	'gitweb_run "p=.git;a=snapshot;h=frizzumFrazzum;sf=tgz" &&
+	grep "400 - Not a valid hash id:" gitweb.output'
+test_debug 'cat gitweb.output'
+
+test_expect_success \
+	'snapshots: good treeish id' \
+	'gitweb_run "p=.git;a=snapshot;h=master;sf=tgz" &&
+	grep "Status: 200 OK" gitweb.output'
+test_debug 'cat gitweb.output'
+
+test_expect_success \
+	'snapshots: good object id' \
+	'ID=`git rev-parse --verify HEAD` &&
+	gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" &&
+	grep "Status: 200 OK" gitweb.output'
+test_debug 'cat gitweb.output'
+
+test_expect_success \
+	'snapshots: bad object id' \
+	'gitweb_run "p=.git;a=snapshot;h=abcdef01234;sf=tgz" &&
+	grep "400 - Not a valid hash id:" gitweb.output'
+test_debug 'cat gitweb.output'
+
+
 test_done
-- 
1.6.4.2

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC/PATCH 2/2] gitweb: check given hash before trying to create snapshot
  2009-09-10 21:20 [RFC/PATCH 2/2] gitweb: check given hash before trying to create snapshot Mark Rada
@ 2009-09-11  7:52 ` Jakub Narebski
  2009-09-11 15:44   ` Mark Rada
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Narebski @ 2009-09-11  7:52 UTC (permalink / raw)
  To: Mark Rada; +Cc: git, Junio C Hamano

On Thu, 10 Sep 2009, Mark Rada wrote:

> This patch is dual purposed.
> 
> First, it makes things nicer in cases when you hand craft the snapshot
> URL but make a typo (e.g. netx instead of next); you will now get an
> error message instead of a broken tarball.

This is a very good idea.

> 
> Second, any given treeish will always be translated to the full length,
> unambiguous, hash id; this will be useful for things like creating
> unique names for snapshot caches.

But this is not a good idea, IMHO.

First, it introduces feature that nobody uses (at least yet); we can
introduce this feature when it is needed instead.

Second, I'd rather have better names for snapshots than using full SHA-1.
For snapshot of 'v1.5.0' of repository 'repo.git' I'd prefer for snapshot
to be named 'repo-v1.5.0', and for snapshot of 'next' branch of the same
project to be named for example 'repo-next-20090909', or perhaps
'repo-next-2009-09-10T09:16:18' or 'repo-next-20090909-g5f6b0ff',
or 'repo-v1.6.5-rc0-164-g5f6b0ff'.

I'm not sure what would be the best name of snapshot of given 
subdirectory...


In short: I'd rather not improve on bad design of using full SHA-1
in snapshot name.

> 
> This patch includes test for t9501 to demonstrate the changed
> functionality.
> 
> Signed-off-by: Mark Rada <marada@uwaterloo.ca>
> ---
>  gitweb/gitweb.perl                       |    5 +++--
>  t/t9501-gitweb-standalone-http-status.sh |   26 ++++++++++++++++++++++++++
>  2 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index d650188..4ae960c 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -5197,8 +5197,9 @@ sub git_snapshot {
>  		die_error(403, "Unsupported snapshot format");
>  	}
>  
> -	if (!defined $hash) {
> -		$hash = &git_get_hash($project);
> +	my $snapshot = &git_get_hash($project, $hash);

Same comment as for PATCH 1/2: don't use '&' subroutine call if it
is not required.


> +	if (!$snapshot) {
> +		die_error(400, "Not a valid hash id: $hash");

Note that we don't use user input in _any_ of other error messages;
you would probably need to sanitize $hash.

By the way, wouldn't 404 (Not Found) be a better error code?

>  	}
>  
>  	my $name = $project;
> diff --git a/t/t9501-gitweb-standalone-http-status.sh b/t/t9501-gitweb-standalone-http-status.sh
> index d0ff21d..4f8f147 100644
> --- a/t/t9501-gitweb-standalone-http-status.sh
> +++ b/t/t9501-gitweb-standalone-http-status.sh
> @@ -75,4 +75,30 @@ test_expect_success \
>  test_debug 'cat gitweb.output'
>  
>  
> +test_expect_success \
> +	'snapshots: bad treeish id' \
> +	'gitweb_run "p=.git;a=snapshot;h=frizzumFrazzum;sf=tgz" &&
> +	grep "400 - Not a valid hash id:" gitweb.output'
> +test_debug 'cat gitweb.output'
> +
> +test_expect_success \
> +	'snapshots: good treeish id' \
> +	'gitweb_run "p=.git;a=snapshot;h=master;sf=tgz" &&
> +	grep "Status: 200 OK" gitweb.output'
> +test_debug 'cat gitweb.output'

Why you don't check for "Status: 400" too?

> +
> +test_expect_success \
> +	'snapshots: good object id' \
> +	'ID=`git rev-parse --verify HEAD` &&
> +	gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" &&
> +	grep "Status: 200 OK" gitweb.output'
> +test_debug 'cat gitweb.output'
> +
> +test_expect_success \
> +	'snapshots: bad object id' \
> +	'gitweb_run "p=.git;a=snapshot;h=abcdef01234;sf=tgz" &&
> +	grep "400 - Not a valid hash id:" gitweb.output'
> +test_debug 'cat gitweb.output'
> +
> +
>  test_done
> -- 
> 1.6.4.2
> 
> 

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC/PATCH 2/2] gitweb: check given hash before trying to create snapshot
  2009-09-11  7:52 ` Jakub Narebski
@ 2009-09-11 15:44   ` Mark Rada
  2009-09-11 16:42     ` Jakub Narebski
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Rada @ 2009-09-11 15:44 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Mark Rada, git, Junio C Hamano


On 2009-09-11, at 3:52 AM, Jakub Narebski wrote:

> Second, I'd rather have better names for snapshots than using full  
> SHA-1.
> For snapshot of 'v1.5.0' of repository 'repo.git' I'd prefer for  
> snapshot
> to be named 'repo-v1.5.0', and for snapshot of 'next' branch of the  
> same
> project to be named for example 'repo-next-20090909', or perhaps
> 'repo-next-2009-09-10T09:16:18' or 'repo-next-20090909-g5f6b0ff',
> or 'repo-v1.6.5-rc0-164-g5f6b0ff'.

Ah, yeah, well, I let $hash still hold the originally passed value,  
which
would be used to create the outputted file name (unless overwritten by  
the
client by using curl -o or something). Then $snapshot holds the full  
hash.
This way, if you were to type something like

http://git.kernel.org/?p=git/git.git;a=snapshot;h=next;sf=tgz

into your browser window, you would get git.git-next.tar.gz back, but  
the
backend could look up something like

git-5f6b0ffff13f5cd762d0a5a4e1c4dede58e8a537.tar.gz

using the $snapshot variable in some hypothetical cache (or even  
without the
cache it won't mangle the nicer name $hash might have).

Also, right now gitweb will not accept tags for hashes. This seems to be
because it passes the --verify option to rev-parse, but the output  
from using
and not using the verify option seems to be the same (other than also  
accepting
all tree-ishes). Could you let me know if there is a good reason not  
to take
off the --verify option? Otherwise, I would like to take it off in the  
next
version of this patch.

Your point about adding the short hash to snapshots of branch heads is  
cool,
I'll try that for the next version of the patch.


>> 	}
>>
>> 	my $name = $project;
>> diff --git a/t/t9501-gitweb-standalone-http-status.sh b/t/t9501- 
>> gitweb-standalone-http-status.sh
>> index d0ff21d..4f8f147 100644
>> --- a/t/t9501-gitweb-standalone-http-status.sh
>> +++ b/t/t9501-gitweb-standalone-http-status.sh
>> @@ -75,4 +75,30 @@ test_expect_success \
>> test_debug 'cat gitweb.output'
>>
>>
>> +test_expect_success \
>> +	'snapshots: bad treeish id' \
>> +	'gitweb_run "p=.git;a=snapshot;h=frizzumFrazzum;sf=tgz" &&
>> +	grep "400 - Not a valid hash id:" gitweb.output'
>> +test_debug 'cat gitweb.output'
>> +
>> +test_expect_success \
>> +	'snapshots: good treeish id' \
>> +	'gitweb_run "p=.git;a=snapshot;h=master;sf=tgz" &&
>> +	grep "Status: 200 OK" gitweb.output'
>> +test_debug 'cat gitweb.output'
>
> Why you don't check for "Status: 400" too?

I'm not sure which test you are referring to (I think the second). The
second test is valid and should return a nice .git-master.tar.gz
tarball.


>> Second, any given treeish will always be translated to the full  
>> length,
>> unambiguous, hash id; this will be useful for things like creating
>> unique names for snapshot caches.
>
> But this is not a good idea, IMHO.
>
> First, it introduces feature that nobody uses (at least yet); we can
> introduce this feature when it is needed instead.

Sorry for promoting vapourware, I did originally rip this patch out from
something else. I removed the comment from the v2 commit message.



--
Mark Rada (ferrous26)
marada@uwaterloo.ca

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC/PATCH 2/2] gitweb: check given hash before trying to create snapshot
  2009-09-11 15:44   ` Mark Rada
@ 2009-09-11 16:42     ` Jakub Narebski
  0 siblings, 0 replies; 4+ messages in thread
From: Jakub Narebski @ 2009-09-11 16:42 UTC (permalink / raw)
  To: Mark Rada; +Cc: git, Junio C Hamano

[This mail was very strangely wrapped; I fixed this for readability]

On Fri, 11 Sep 2009, Mark Rada wrote:
> On 2009-09-11, at 3:52 AM, Jakub Narebski wrote:
> 
>> Second, I'd rather have better names for snapshots than using full  SHA-1.
>> For snapshot of 'v1.5.0' of repository 'repo.git' I'd prefer for snapshot
>> to be named 'repo-v1.5.0', and for snapshot of 'next' branch of the  
>> same project to be named for example 'repo-next-20090909', or perhaps
>> 'repo-next-2009-09-10T09:16:18' or 'repo-next-20090909-g5f6b0ff',
>> or 'repo-v1.6.5-rc0-164-g5f6b0ff'.
> 
> Ah, yeah, well, I let $hash still hold the originally passed value,  
> which would be used to create the outputted file name (unless
> overwritten by the client by using curl -o or something).

Or by specifying different file name than proposed by browser.

> Then $snapshot holds the full hash. This way, if you were to type
> something like 
> 
> http://git.kernel.org/?p=git/git.git;a=snapshot;h=next;sf=tgz
> 
> into your browser window, you would get git.git-next.tar.gz back, but  
> the backend could look up something like
> 
> git-5f6b0ffff13f5cd762d0a5a4e1c4dede58e8a537.tar.gz
> 
> using the $snapshot variable in some hypothetical cache (or even  
> without the cache it won't mangle the nicer name $hash might have).

O.K.

> 
> Also, right now gitweb will not accept tags for hashes. This seems to be
> because it passes the --verify option to rev-parse, but the output  
> from using and not using the verify option seems to be the same (other
> than also accepting all tree-ishes). Could you let me know if there is
> a good reason not to take off the --verify option? Otherwise, I would
> like to take it off in the next version of this patch.

Errr, what?

  $ 5096:[gitweb/web@git]# git rev-parse --verify v1.5.0            
  6db027ffe03210324939b3dd655c4223ca023b45
  $ git rev-parse --verify refs/tags/v1.5.0
  6db027ffe03210324939b3dd655c4223ca023b45

So it works as intended.  The problem must be in some other place.

The '--verify' option is needed because git-rev-parse would otherwise
pass parameters it does not understand 'as is'.  Compare

  $ git rev-parse --verify v9.9.9  2>/dev/null

  $ git rev-parse          v9.9.9  2>/dev/null
  v9.9.9

> 
> Your point about adding the short hash to snapshots of branch heads is
> cool, I'll try that for the next version of the patch.

I think it would be better left for a _separate_ patch, as it is
separate feature (and I guess more complicated one).

>>> diff --git a/t/t9501-gitweb-standalone-http-status.sh b/t/t9501- 
>>> gitweb-standalone-http-status.sh
>>> index d0ff21d..4f8f147 100644
>>> --- a/t/t9501-gitweb-standalone-http-status.sh
>>> +++ b/t/t9501-gitweb-standalone-http-status.sh
>>> @@ -75,4 +75,30 @@ test_expect_success \
>>> test_debug 'cat gitweb.output'
>>>
>>>
>>> +test_expect_success \
>>> +	'snapshots: bad treeish id' \
>>> +	'gitweb_run "p=.git;a=snapshot;h=frizzumFrazzum;sf=tgz" &&
>>> +	grep "400 - Not a valid hash id:" gitweb.output'
>>> +test_debug 'cat gitweb.output'
>>> +
>>> +test_expect_success \
>>> +	'snapshots: good treeish id' \
>>> +	'gitweb_run "p=.git;a=snapshot;h=master;sf=tgz" &&
>>> +	grep "Status: 200 OK" gitweb.output'
>>> +test_debug 'cat gitweb.output'
>>
>> Why you don't check for "Status: 400" too?
> 
> I'm not sure which test you are referring to (I think the second). The
> second test is valid and should return a nice .git-master.tar.gz
> tarball.

The output of CGI script like gitweb (and therefore gitweb.output file,
as it is generated now) contains both HTTP headers, separated by single
empty CRLF delimited line from the proper output of a script.

In first test you check that _contents_ contain specific error message,
but you do not check if HTTP status code matches it (it should, because
of how die_error works).  In second test you check HTTP status.  If the
t/t9501-gitweb-standalone-http-status.sh is to be about status, I guess
that you should check HTTP status, and not contents of the page (which
is more likely to change, e.g. due to some prettifying).

In t9501 tests you need, I think, only the HTTP headers part, unless
you want also to check that the contents matches.  There was some sed
script shown to extract only HTTP headers.
 
>>> Second, any given treeish will always be translated to the full length,
>>> unambiguous, hash id; this will be useful for things like creating
>>> unique names for snapshot caches.
>>
>> But this is not a good idea, IMHO.
>>
>> First, it introduces feature that nobody uses (at least yet); we can
>> introduce this feature when it is needed instead.
> 
> Sorry for promoting vapourware, I did originally rip this patch out from
> something else. I removed the comment from the v2 commit message.

Ah. O.K. then.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-09-11 16:42 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-10 21:20 [RFC/PATCH 2/2] gitweb: check given hash before trying to create snapshot Mark Rada
2009-09-11  7:52 ` Jakub Narebski
2009-09-11 15:44   ` Mark Rada
2009-09-11 16:42     ` Jakub Narebski

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).