git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [BUG] git archive broken in 1.7.8.1
@ 2012-01-10 21:18 Albert Astals Cid
  2012-01-10 21:33 ` Carlos Martín Nieto
  0 siblings, 1 reply; 30+ messages in thread
From: Albert Astals Cid @ 2012-01-10 21:18 UTC (permalink / raw)
  To: git

CC me on answers since i'm not subscribed to the list

Hi, one of our [KDE] anongit servers was updated to 1.7.8.1 and not the syntax

git archive --remote=git://anongit.kde.org/repo.git HEAD:path

does not seem to return a valid tar archive anymore when it did work 
previously. In fact the man page of my version has that syntax in one of the 
examples.

Is that a regression or should have never it worked and the current behaviour 
is the correct one?

Albert

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] git archive broken in 1.7.8.1
  2012-01-10 21:18 [BUG] git archive broken in 1.7.8.1 Albert Astals Cid
@ 2012-01-10 21:33 ` Carlos Martín Nieto
  2012-01-10 22:05   ` Albert Astals Cid
  0 siblings, 1 reply; 30+ messages in thread
From: Carlos Martín Nieto @ 2012-01-10 21:33 UTC (permalink / raw)
  To: Albert Astals Cid; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 671 bytes --]

On Tue, Jan 10, 2012 at 10:18:41PM +0100, Albert Astals Cid wrote:
> CC me on answers since i'm not subscribed to the list
> 
> Hi, one of our [KDE] anongit servers was updated to 1.7.8.1 and not the syntax
> 
> git archive --remote=git://anongit.kde.org/repo.git HEAD:path

This syntax is no longer allowed due to some security tightening. Use
the alternate syntax

    git archive --remote=git://anongit.kde.org/repo.git HEAD -- path

> 
> does not seem to return a valid tar archive anymore when it did work 
> previously. In fact the man page of my version has that syntax in one of the 
> examples.

That sounds like a documentation bug.

   cmn

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] git archive broken in 1.7.8.1
  2012-01-10 21:33 ` Carlos Martín Nieto
@ 2012-01-10 22:05   ` Albert Astals Cid
  2012-01-10 22:50     ` Carlos Martín Nieto
  2012-01-10 23:01     ` [BUG] git archive broken in 1.7.8.1 Allan Wind
  0 siblings, 2 replies; 30+ messages in thread
From: Albert Astals Cid @ 2012-01-10 22:05 UTC (permalink / raw)
  To: Carlos Martín Nieto; +Cc: git

El Dimarts, 10 de gener de 2012, a les 22:33:44, Carlos Martín Nieto va 
escriure:
> On Tue, Jan 10, 2012 at 10:18:41PM +0100, Albert Astals Cid wrote:
> > CC me on answers since i'm not subscribed to the list
> > 
> > Hi, one of our [KDE] anongit servers was updated to 1.7.8.1 and not the
> > syntax
> > 
> > git archive --remote=git://anongit.kde.org/repo.git HEAD:path
> 
> This syntax is no longer allowed due to some security tightening. Use
> the alternate syntax
> 
>     git archive --remote=git://anongit.kde.org/repo.git HEAD -- path

Unfortunately this producess a tarball with a different layout, e.g.

git archive --remote=git://anongit.kde.org/kgraphviewer.git HEAD:doc/en_US
  gives me a tarball with the doc/en_US files in the root

git archive --remote=git://anongit.kde.org/kgraphviewer.git HEAD -- doc/en_US
  gives me a tarball with the doc/en_US folders and then the files

Is there a way to keep the old behaviour or do we need to update our scripts?

Thanks for the fast answer!

Albert

> 
> > does not seem to return a valid tar archive anymore when it did work
> > previously. In fact the man page of my version has that syntax in one of
> > the examples.
> 
> That sounds like a documentation bug.
> 
>    cmn

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] git archive broken in 1.7.8.1
  2012-01-10 22:05   ` Albert Astals Cid
@ 2012-01-10 22:50     ` Carlos Martín Nieto
  2012-01-10 23:21       ` Jeff King
  2012-01-10 23:01     ` [BUG] git archive broken in 1.7.8.1 Allan Wind
  1 sibling, 1 reply; 30+ messages in thread
From: Carlos Martín Nieto @ 2012-01-10 22:50 UTC (permalink / raw)
  To: Albert Astals Cid; +Cc: git, peff

[-- Attachment #1: Type: text/plain, Size: 1873 bytes --]

On Tue, Jan 10, 2012 at 11:05:45PM +0100, Albert Astals Cid wrote:
> El Dimarts, 10 de gener de 2012, a les 22:33:44, Carlos Martín Nieto va 
> escriure:
> > On Tue, Jan 10, 2012 at 10:18:41PM +0100, Albert Astals Cid wrote:
> > > CC me on answers since i'm not subscribed to the list
> > > 
> > > Hi, one of our [KDE] anongit servers was updated to 1.7.8.1 and not the
> > > syntax
> > > 
> > > git archive --remote=git://anongit.kde.org/repo.git HEAD:path
> > 
> > This syntax is no longer allowed due to some security tightening. Use
> > the alternate syntax
> > 
> >     git archive --remote=git://anongit.kde.org/repo.git HEAD -- path
> 
> Unfortunately this producess a tarball with a different layout, e.g.
> 
> git archive --remote=git://anongit.kde.org/kgraphviewer.git HEAD:doc/en_US
>   gives me a tarball with the doc/en_US files in the root
> 
> git archive --remote=git://anongit.kde.org/kgraphviewer.git HEAD -- doc/en_US
>   gives me a tarball with the doc/en_US folders and then the files
> 
> Is there a way to keep the old behaviour or do we need to update our scripts?

Not as far as I know. However, the commit that hardened the input
(ee27ca4a781844: archive: don't let remote clients get unreachable
commits, 2011-11-17) does state that HEAD:doc/en_US should be valid,
so it looks like it's actually a regression. As it's bedtime in my
timezone, I'm blaming Peff and I'll look into this if it hasn't been
fixed by the time I get to the office tomorrow.

> 
> Thanks for the fast answer!
> 
> Albert
> 
> > 
> > > does not seem to return a valid tar archive anymore when it did work
> > > previously. In fact the man page of my version has that syntax in one of
> > > the examples.
> > 
> > That sounds like a documentation bug.

Notice that the syntax is for the local case, not for --remote.

   cmn

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] git archive broken in 1.7.8.1
  2012-01-10 22:05   ` Albert Astals Cid
  2012-01-10 22:50     ` Carlos Martín Nieto
@ 2012-01-10 23:01     ` Allan Wind
  2012-01-11 12:51       ` Carlos Martín Nieto
  1 sibling, 1 reply; 30+ messages in thread
From: Allan Wind @ 2012-01-10 23:01 UTC (permalink / raw)
  To: git

On 2012-01-10 23:05:45, Albert Astals Cid wrote:
> Unfortunately this producess a tarball with a different layout, e.g.
> 
> git archive --remote=git://anongit.kde.org/kgraphviewer.git HEAD:doc/en_US
>   gives me a tarball with the doc/en_US files in the root

Meaning the files you have stored in git under doc/en_US are 
dumped in the root directory of the tar?  That does not sound 
like desired behavior for the feature.


/Allan
-- 
Allan Wind
Life Integrity, LLC
<http://lifeintegrity.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] git archive broken in 1.7.8.1
  2012-01-10 22:50     ` Carlos Martín Nieto
@ 2012-01-10 23:21       ` Jeff King
  2012-01-11 12:12         ` [PATCH] archive: re-allow HEAD:Documentation on a remote invocation Carlos Martín Nieto
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2012-01-10 23:21 UTC (permalink / raw)
  To: Carlos Martín Nieto; +Cc: Albert Astals Cid, git

On Tue, Jan 10, 2012 at 11:50:11PM +0100, Carlos Martín Nieto wrote:

> > Unfortunately this producess a tarball with a different layout, e.g.
> > 
> > git archive --remote=git://anongit.kde.org/kgraphviewer.git HEAD:doc/en_US
> >   gives me a tarball with the doc/en_US files in the root
> > 
> > git archive --remote=git://anongit.kde.org/kgraphviewer.git HEAD -- doc/en_US
> >   gives me a tarball with the doc/en_US folders and then the files
> > 
> > Is there a way to keep the old behaviour or do we need to update our scripts?
> 
> Not as far as I know. However, the commit that hardened the input
> (ee27ca4a781844: archive: don't let remote clients get unreachable
> commits, 2011-11-17) does state that HEAD:doc/en_US should be valid,
> so it looks like it's actually a regression. As it's bedtime in my
> timezone, I'm blaming Peff and I'll look into this if it hasn't been
> fixed by the time I get to the office tomorrow.

It is definitely my fault. According to ee27ca4a, sub-trees were an
unfortunate casualty of the tightening. However, I did have some patches
that moved towards allowing things like that safely (basically the rev
parser needs to pass more context back to the caller).

It's evening here and I'm doing family stuff, but I'll take a look
either late tonight or tomorrow morning.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] archive: re-allow HEAD:Documentation on a remote invocation
  2012-01-10 23:21       ` Jeff King
@ 2012-01-11 12:12         ` Carlos Martín Nieto
  2012-01-11 19:39           ` Jeff King
  2012-01-12  2:46           ` [PATCH] archive: re-allow HEAD:Documentation on a remote invocation Junio C Hamano
  0 siblings, 2 replies; 30+ messages in thread
From: Carlos Martín Nieto @ 2012-01-11 12:12 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Albert Astals Cid

The tightening done in (ee27ca4a: archive: don't let remote clients
get unreachable commits, 2011-11-17) went too far and disallowed
HEAD:Documentation as it would try to find "HEAD:Documentation" as a
ref.

Only DWIM the "HEAD" part to see if it exists as a ref. Once we're
sure that we've been given a valid ref, we follow the normal code
path. This still disallows attempts to access commits which are not
branch tips.

Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
---

AFAICT this should still be safe. Using HEAD^:Documentation or
<sha1>:Documentation still complains that HEAD^ and <sha1> aren't
refs.

 archive.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/archive.c b/archive.c
index 164bbd0..4735bfb 100644
--- a/archive.c
+++ b/archive.c
@@ -260,14 +260,21 @@ static void parse_treeish_arg(const char **argv,
 	/* Remotes are only allowed to fetch actual refs */
 	if (remote) {
 		char *ref = NULL;
-		if (!dwim_ref(name, strlen(name), sha1, &ref))
-			die("no such ref: %s", name);
+		const char *refname, *colon = NULL;
+
+		colon = strchr(name, ':');
+		if (colon)
+			refname = xstrndup(name, colon - name);
+		else
+			refname = name;
+
+		if (!dwim_ref(refname, strlen(refname), sha1, &ref))
+			die("no such ref: %s", refname);
 		free(ref);
 	}
-	else {
-		if (get_sha1(name, sha1))
-			die("Not a valid object name");
-	}
+
+	if (get_sha1(name, sha1))
+		die("Not a valid object name");
 
 	commit = lookup_commit_reference_gently(sha1, 1);
 	if (commit) {
-- 
1.7.8.352.g876a6f

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [BUG] git archive broken in 1.7.8.1
  2012-01-10 23:01     ` [BUG] git archive broken in 1.7.8.1 Allan Wind
@ 2012-01-11 12:51       ` Carlos Martín Nieto
  0 siblings, 0 replies; 30+ messages in thread
From: Carlos Martín Nieto @ 2012-01-11 12:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 937 bytes --]

On Tue, Jan 10, 2012 at 06:01:22PM -0500, Allan Wind wrote:
> On 2012-01-10 23:05:45, Albert Astals Cid wrote:
> > Unfortunately this producess a tarball with a different layout, e.g.
> > 
> > git archive --remote=git://anongit.kde.org/kgraphviewer.git HEAD:doc/en_US
> >   gives me a tarball with the doc/en_US files in the root
> 
> Meaning the files you have stored in git under doc/en_US are 
> dumped in the root directory of the tar?  That does not sound 
> like desired behavior for the feature.

Which feature do you mean? Using HEAD:doc/en_US should be equivalent
to running `tar cf - .` from inside the doc/en_US, as what you're
passing is that tree.

Using HEAD -- doc/en_US however means that you want to pack HEAD, but
limit it to files that match doc/en_US, so it's working as
designed. If you mean that 'HEAD -- doc/en_US' shouldn't give you a
tar with those files at the root, then we agree.

   cmn

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] archive: re-allow HEAD:Documentation on a remote invocation
  2012-01-11 12:12         ` [PATCH] archive: re-allow HEAD:Documentation on a remote invocation Carlos Martín Nieto
@ 2012-01-11 19:39           ` Jeff King
  2012-01-11 19:42             ` [PATCH 1/2] get_sha1_with_context: report features used in resolution Jeff King
  2012-01-11 19:42             ` [PATCH 2/2] archive: loosen restrictions on remote object lookup Jeff King
  2012-01-12  2:46           ` [PATCH] archive: re-allow HEAD:Documentation on a remote invocation Junio C Hamano
  1 sibling, 2 replies; 30+ messages in thread
From: Jeff King @ 2012-01-11 19:39 UTC (permalink / raw)
  To: Carlos Martín Nieto; +Cc: git, Albert Astals Cid

On Wed, Jan 11, 2012 at 01:12:38PM +0100, Carlos Martín Nieto wrote:

> The tightening done in (ee27ca4a: archive: don't let remote clients
> get unreachable commits, 2011-11-17) went too far and disallowed
> HEAD:Documentation as it would try to find "HEAD:Documentation" as a
> ref.
> 
> Only DWIM the "HEAD" part to see if it exists as a ref. Once we're
> sure that we've been given a valid ref, we follow the normal code
> path. This still disallows attempts to access commits which are not
> branch tips.

I'd rather not do this kind of ad-hoc parsing of sha1s in the archive
code, and instead let the regular resolution process tell us more about
what it did, so we can make a policy decision at the upper level.

Patches to follow:

  [1/2]: get_sha1_with_context: report features used in resolution
  [2/2]: archive: loosen restrictions on remote object lookup

> AFAICT this should still be safe. Using HEAD^:Documentation or
> <sha1>:Documentation still complains that HEAD^ and <sha1> aren't
> refs.

My patches enable things like HEAD^, but disallow a raw sha1. The only
way to safely allow a raw sha1 is to check its connectivity from the
tips, which can be somewhat expensive (you have to traverse every tree
of every commit in the worst case).

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2] get_sha1_with_context: report features used in resolution
  2012-01-11 19:39           ` Jeff King
@ 2012-01-11 19:42             ` Jeff King
  2012-01-12  2:36               ` Junio C Hamano
  2012-01-11 19:42             ` [PATCH 2/2] archive: loosen restrictions on remote object lookup Jeff King
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff King @ 2012-01-11 19:42 UTC (permalink / raw)
  To: Carlos Martín Nieto; +Cc: git, Albert Astals Cid

Most callers generally treat get_sha1 as a black box, giving
it a string from the user and expecting to get a sha1 in
return. The get_sha1_with_context function gives callers
more information about what happened while resolving the
object name so they can make better decisions about how to
use the result. We currently use this only to provide
information about the path entry used to find a blob.

We don't currently provide any information about the
resolution rules that were used to reach the final object.
Some callers may want these in order to enforce a policy
that a particular subset of the lookup rules are used (e.g.,
when serving remote requests).

This patch adds a set of bit-fields that document the use of
particular features during an object lookup.

Signed-off-by: Jeff King <peff@peff.net>
---
The diffstat looks a little scary, but it is mostly just the internal
get_sha1 functions learning to pass the object_context around.

 cache.h     |    7 +++++
 sha1_name.c |   73 +++++++++++++++++++++++++++++++++++++---------------------
 2 files changed, 53 insertions(+), 27 deletions(-)

diff --git a/cache.h b/cache.h
index 10afd71..ac25a37 100644
--- a/cache.h
+++ b/cache.h
@@ -809,6 +809,13 @@ struct object_context {
 	unsigned char tree[20];
 	char path[PATH_MAX];
 	unsigned mode;
+	unsigned used_ref:1,
+		 used_reflog:1,
+		 used_index:1,
+		 used_nth_checkout:1,
+		 used_describe_name:1,
+		 used_oneline:1,
+		 used_raw_hex:1;
 };
 
 extern int get_sha1(const char *str, unsigned char *sha1);
diff --git a/sha1_name.c b/sha1_name.c
index 03ffc2c..ad7c52a 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -7,7 +7,8 @@
 #include "refs.h"
 #include "remote.h"
 
-static int get_sha1_oneline(const char *, unsigned char *, struct commit_list *);
+static int get_sha1_oneline(const char *, unsigned char *, struct commit_list *,
+			    struct object_context *);
 
 static int find_short_object_filename(int len, const char *name, unsigned char *sha1)
 {
@@ -158,7 +159,7 @@ static int find_unique_short_object(int len, char *canonical,
 }
 
 static int get_short_sha1(const char *name, int len, unsigned char *sha1,
-			  int quietly)
+			  int quietly, struct object_context *oc)
 {
 	int i, status;
 	char canonical[40];
@@ -190,6 +191,8 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 	status = find_unique_short_object(i, canonical, res, sha1);
 	if (!quietly && (status == SHORT_NAME_AMBIGUOUS))
 		return error("short SHA1 %.*s is ambiguous.", len, canonical);
+	if (!status)
+		oc->used_raw_hex = 1;
 	return status;
 }
 
@@ -204,7 +207,8 @@ const char *find_unique_abbrev(const unsigned char *sha1, int len)
 		return hex;
 	while (len < 40) {
 		unsigned char sha1_ret[20];
-		status = get_short_sha1(hex, len, sha1_ret, 1);
+		struct object_context unused;
+		status = get_short_sha1(hex, len, sha1_ret, 1, &unused);
 		if (exists
 		    ? !status
 		    : status == SHORT_NAME_NOT_FOUND) {
@@ -255,17 +259,21 @@ static inline int upstream_mark(const char *string, int len)
 	return 0;
 }
 
-static int get_sha1_1(const char *name, int len, unsigned char *sha1);
+static int get_sha1_1(const char *name, int len, unsigned char *sha1,
+		      struct object_context *oc);
 
-static int get_sha1_basic(const char *str, int len, unsigned char *sha1)
+static int get_sha1_basic(const char *str, int len, unsigned char *sha1,
+			  struct object_context *oc)
 {
 	static const char *warn_msg = "refname '%.*s' is ambiguous.";
 	char *real_ref = NULL;
 	int refs_found = 0;
 	int at, reflog_len;
 
-	if (len == 40 && !get_sha1_hex(str, sha1))
+	if (len == 40 && !get_sha1_hex(str, sha1)) {
+		oc->used_raw_hex = 1;
 		return 0;
+	}
 
 	/* basic@{time or number or -number} format to query ref-log */
 	reflog_len = at = 0;
@@ -292,7 +300,8 @@ static int get_sha1_basic(const char *str, int len, unsigned char *sha1)
 		ret = interpret_branch_name(str+at, &buf);
 		if (ret > 0) {
 			/* substitute this branch name and restart */
-			return get_sha1_1(buf.buf, buf.len, sha1);
+			oc->used_nth_checkout = 1;
+			return get_sha1_1(buf.buf, buf.len, sha1, oc);
 		} else if (ret == 0) {
 			return -1;
 		}
@@ -305,6 +314,7 @@ static int get_sha1_basic(const char *str, int len, unsigned char *sha1)
 
 	if (!refs_found)
 		return -1;
+	oc->used_ref = 1;
 
 	if (warn_ambiguous_refs && refs_found > 1)
 		warning(warn_msg, len, str);
@@ -352,6 +362,7 @@ static int get_sha1_basic(const char *str, int len, unsigned char *sha1)
 				    len, str, co_cnt);
 			}
 		}
+		oc->used_reflog = 1;
 	}
 
 	free(real_ref);
@@ -359,10 +370,11 @@ static int get_sha1_basic(const char *str, int len, unsigned char *sha1)
 }
 
 static int get_parent(const char *name, int len,
-		      unsigned char *result, int idx)
+		      unsigned char *result, int idx,
+		      struct object_context *oc)
 {
 	unsigned char sha1[20];
-	int ret = get_sha1_1(name, len, sha1);
+	int ret = get_sha1_1(name, len, sha1, oc);
 	struct commit *commit;
 	struct commit_list *p;
 
@@ -389,13 +401,14 @@ static int get_parent(const char *name, int len,
 }
 
 static int get_nth_ancestor(const char *name, int len,
-			    unsigned char *result, int generation)
+			    unsigned char *result, int generation,
+			    struct object_context *oc)
 {
 	unsigned char sha1[20];
 	struct commit *commit;
 	int ret;
 
-	ret = get_sha1_1(name, len, sha1);
+	ret = get_sha1_1(name, len, sha1, oc);
 	if (ret)
 		return ret;
 	commit = lookup_commit_reference(sha1);
@@ -436,7 +449,8 @@ struct object *peel_to_type(const char *name, int namelen,
 	}
 }
 
-static int peel_onion(const char *name, int len, unsigned char *sha1)
+static int peel_onion(const char *name, int len, unsigned char *sha1,
+		      struct object_context *oc)
 {
 	unsigned char outer[20];
 	const char *sp;
@@ -476,7 +490,7 @@ static int peel_onion(const char *name, int len, unsigned char *sha1)
 	else
 		return -1;
 
-	if (get_sha1_1(name, sp - name - 2, outer))
+	if (get_sha1_1(name, sp - name - 2, outer, oc))
 		return -1;
 
 	o = parse_object(outer);
@@ -515,14 +529,15 @@ static int peel_onion(const char *name, int len, unsigned char *sha1)
 
 		prefix = xstrndup(sp + 1, name + len - 1 - (sp + 1));
 		commit_list_insert((struct commit *)o, &list);
-		ret = get_sha1_oneline(prefix, sha1, list);
+		ret = get_sha1_oneline(prefix, sha1, list, oc);
 		free(prefix);
 		return ret;
 	}
 	return 0;
 }
 
-static int get_describe_name(const char *name, int len, unsigned char *sha1)
+static int get_describe_name(const char *name, int len, unsigned char *sha1,
+			     struct object_context *oc)
 {
 	const char *cp;
 
@@ -535,14 +550,15 @@ static int get_describe_name(const char *name, int len, unsigned char *sha1)
 			if (ch == 'g' && cp[-1] == '-') {
 				cp++;
 				len -= cp - name;
-				return get_short_sha1(cp, len, sha1, 1);
+				return get_short_sha1(cp, len, sha1, 1, oc);
 			}
 		}
 	}
 	return -1;
 }
 
-static int get_sha1_1(const char *name, int len, unsigned char *sha1)
+static int get_sha1_1(const char *name, int len, unsigned char *sha1,
+		      struct object_context *oc)
 {
 	int ret, has_suffix;
 	const char *cp;
@@ -569,25 +585,25 @@ static int get_sha1_1(const char *name, int len, unsigned char *sha1)
 		if (!num && len1 == len - 1)
 			num = 1;
 		if (has_suffix == '^')
-			return get_parent(name, len1, sha1, num);
+			return get_parent(name, len1, sha1, num, oc);
 		/* else if (has_suffix == '~') -- goes without saying */
-		return get_nth_ancestor(name, len1, sha1, num);
+		return get_nth_ancestor(name, len1, sha1, num, oc);
 	}
 
-	ret = peel_onion(name, len, sha1);
+	ret = peel_onion(name, len, sha1, oc);
 	if (!ret)
 		return 0;
 
-	ret = get_sha1_basic(name, len, sha1);
+	ret = get_sha1_basic(name, len, sha1, oc);
 	if (!ret)
 		return 0;
 
 	/* It could be describe output that is "SOMETHING-gXXXX" */
-	ret = get_describe_name(name, len, sha1);
+	ret = get_describe_name(name, len, sha1, oc);
 	if (!ret)
 		return 0;
 
-	return get_short_sha1(name, len, sha1, 0);
+	return get_short_sha1(name, len, sha1, 0, oc);
 }
 
 /*
@@ -619,7 +635,8 @@ static int handle_one_ref(const char *path,
 }
 
 static int get_sha1_oneline(const char *prefix, unsigned char *sha1,
-			    struct commit_list *list)
+			    struct commit_list *list,
+			    struct object_context *oc)
 {
 	struct commit_list *backup = NULL, *l;
 	int found = 0;
@@ -672,6 +689,7 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1,
 	for (l = backup; l; l = l->next)
 		clear_commit_marks(l->item, ONELINE_SEEN);
 	free_commit_list(backup);
+	oc->used_oneline = found;
 	return found ? 0 : -1;
 }
 
@@ -1029,7 +1047,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
 
 	memset(oc, 0, sizeof(*oc));
 	oc->mode = S_IFINVALID;
-	ret = get_sha1_1(name, namelen, sha1);
+	ret = get_sha1_1(name, namelen, sha1, oc);
 	if (!ret)
 		return ret;
 	/* sha1:path --> object name of path in ent sha1
@@ -1046,7 +1064,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
 		if (!only_to_die && namelen > 2 && name[1] == '/') {
 			struct commit_list *list = NULL;
 			for_each_ref(handle_one_ref, &list);
-			return get_sha1_oneline(name + 2, sha1, list);
+			return get_sha1_oneline(name + 2, sha1, list, oc);
 		}
 		if (namelen < 3 ||
 		    name[2] != ':' ||
@@ -1081,6 +1099,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
 			if (ce_stage(ce) == stage) {
 				hashcpy(sha1, ce->sha1);
 				oc->mode = ce->ce_mode;
+				oc->used_index = 1;
 				free(new_path);
 				return 0;
 			}
@@ -1107,7 +1126,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
 			strncpy(object_name, name, cp-name);
 			object_name[cp-name] = '\0';
 		}
-		if (!get_sha1_1(name, cp-name, tree_sha1)) {
+		if (!get_sha1_1(name, cp-name, tree_sha1, oc)) {
 			const char *filename = cp+1;
 			char *new_filename = NULL;
 
-- 
1.7.9.rc0.33.gd3c17

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] archive: loosen restrictions on remote object lookup
  2012-01-11 19:39           ` Jeff King
  2012-01-11 19:42             ` [PATCH 1/2] get_sha1_with_context: report features used in resolution Jeff King
@ 2012-01-11 19:42             ` Jeff King
  2013-05-29 12:05               ` Ian Harvey
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff King @ 2012-01-11 19:42 UTC (permalink / raw)
  To: Carlos Martín Nieto; +Cc: git, Albert Astals Cid

Initially, "git upload-archive" would feed the tree
specification from the remote side directly into get_sha1,
giving the remote user the full power of the object name
resolver. This was convenient, but it also meant that remote
users could fetch disconnected trees by their sha1s, which
violates the long-standing behavior of upload-pack not to
make such objects available.

Later, commit ee27ca4 tightened this to use dwim_ref instead
of get_sha1 for the remote case, allowing only the use of
actual refs. Unfortunately, this broke some existing use
cases, like fetching sub-trees with "$ref:subdir".

This patch loosens the restrictions to re-enable those use
cases. It does this by using get_sha1_with_context for the
object lookup, and checking that only allowable features
were used.

Signed-off-by: Jeff King <peff@peff.net>
---
 archive.c                     |   34 ++++++++++++++-------
 t/t5002-archive-resolution.sh |   66 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+), 11 deletions(-)
 create mode 100755 t/t5002-archive-resolution.sh

diff --git a/archive.c b/archive.c
index 164bbd0..a031bde 100644
--- a/archive.c
+++ b/archive.c
@@ -246,6 +246,25 @@ static void parse_pathspec_arg(const char **pathspec,
 	}
 }
 
+static int check_object_context(int remote, const struct object_context *oc)
+{
+	/* For local requests, allow anything */
+	if (!remote)
+		return 1;
+	/*
+	 * Otherwise, require that we accessed the object through a ref,
+	 * but not have used any of the advanced features like looking in
+	 * the reflog.
+	 */
+	return oc->used_ref &&
+	       !oc->used_reflog &&
+	       !oc->used_index &&
+	       !oc->used_nth_checkout &&
+	       !oc->used_describe_name &&
+	       !oc->used_oneline &&
+	       !oc->used_raw_hex;
+}
+
 static void parse_treeish_arg(const char **argv,
 		struct archiver_args *ar_args, const char *prefix,
 		int remote)
@@ -256,18 +275,11 @@ static void parse_treeish_arg(const char **argv,
 	struct tree *tree;
 	const struct commit *commit;
 	unsigned char sha1[20];
+	struct object_context oc;
 
-	/* Remotes are only allowed to fetch actual refs */
-	if (remote) {
-		char *ref = NULL;
-		if (!dwim_ref(name, strlen(name), sha1, &ref))
-			die("no such ref: %s", name);
-		free(ref);
-	}
-	else {
-		if (get_sha1(name, sha1))
-			die("Not a valid object name");
-	}
+	if (get_sha1_with_context(name, sha1, &oc) ||
+	    !check_object_context(remote, &oc))
+		die("Not a valid object name");
 
 	commit = lookup_commit_reference_gently(sha1, 1);
 	if (commit) {
diff --git a/t/t5002-archive-resolution.sh b/t/t5002-archive-resolution.sh
new file mode 100755
index 0000000..bf2b55c
--- /dev/null
+++ b/t/t5002-archive-resolution.sh
@@ -0,0 +1,66 @@
+#!/bin/sh
+
+test_description='test object resolution methods for local and remote archive'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	echo a >a &&
+	git add . &&
+	git commit -m one &&
+	sha1_one=`git rev-parse HEAD` &&
+	mkdir subdir &&
+	echo b >subdir/b &&
+	git add . &&
+	git commit -m two &&
+	git checkout -b other &&
+	git checkout master
+'
+
+while read desc where what expect; do
+	cmd="git archive --format=tar -o result.tar"
+	test "$where" = "remote" && cmd="$cmd --remote=."
+	cmd="$cmd $what"
+
+	if test "$expect" = "deny"; then
+		test_expect_success "archive $desc ($where, should deny)" "
+			test_must_fail $cmd
+		"
+	else
+		test_expect_success "archive $desc ($where, should work)" '
+			'"$cmd"' &&
+			for i in '"$expect"'; do
+				echo "$i:`basename $i`"
+			done >expect &&
+			rm -rf result &&
+			mkdir result &&
+			(cd result &&
+			tar xf ../result.tar &&
+			for i in `find * -type f`; do
+				echo "$i:`cat $i`"
+			done >../actual
+			) &&
+			test_cmp expect actual
+		'
+	fi
+done <<EOF
+ref local  master a subdir/b
+ref remote master a subdir/b
+parent local  master^ a
+parent remote master^ a
+tree local  master^{tree} a subdir/b
+tree remote master^{tree} a subdir/b
+subtree local  master:subdir b
+subtree remote master:subdir b
+sha1 local  $sha1_one a
+sha1 remote $sha1_one deny
+reflog local  master@{1} a
+reflog remote master@{1} deny
+oneline local  :/one a
+oneline remote :/one deny
+oneline-ref local  master^{/one} a
+oneline-ref remote master^{/one} deny
+nth-checkout local  @{-1} a subdir/b
+nth-checkout remote @{-1} deny
+EOF
+
+test_done
-- 
1.7.9.rc0.33.gd3c17

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] get_sha1_with_context: report features used in resolution
  2012-01-11 19:42             ` [PATCH 1/2] get_sha1_with_context: report features used in resolution Jeff King
@ 2012-01-12  2:36               ` Junio C Hamano
  2012-01-12  2:51                 ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2012-01-12  2:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Carlos Martín Nieto, git, Albert Astals Cid

Jeff King <peff@peff.net> writes:

> Most callers generally treat get_sha1 as a black box, giving
> it a string from the user and expecting to get a sha1 in
> return. The get_sha1_with_context function gives callers
> more information about what happened while resolving the
> object name so they can make better decisions about how to
> use the result. We currently use this only to provide
> information about the path entry used to find a blob.
>
> We don't currently provide any information about the
> resolution rules that were used to reach the final object.
> Some callers may want these in order to enforce a policy
> that a particular subset of the lookup rules are used (e.g.,
> when serving remote requests).
>
> This patch adds a set of bit-fields that document the use of
> particular features during an object lookup.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> The diffstat looks a little scary, but it is mostly just the internal
> get_sha1 functions learning to pass the object_context around.

Hmm, shouldn't this also cover peel_to_type()?  That would have made it
also apply to the maintenance track.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] archive: re-allow HEAD:Documentation on a remote invocation
  2012-01-11 12:12         ` [PATCH] archive: re-allow HEAD:Documentation on a remote invocation Carlos Martín Nieto
  2012-01-11 19:39           ` Jeff King
@ 2012-01-12  2:46           ` Junio C Hamano
  2012-01-12  2:54             ` Jeff King
  1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2012-01-12  2:46 UTC (permalink / raw)
  To: Carlos Martín Nieto; +Cc: Jeff King, git, Albert Astals Cid

Carlos Martín Nieto <cmn@elego.de> writes:

> The tightening done in (ee27ca4a: archive: don't let remote clients
> get unreachable commits, 2011-11-17) went too far and disallowed
> HEAD:Documentation as it would try to find "HEAD:Documentation" as a
> ref.

I do not think it went too far. Actually we discussed this exact issue
when the topic was cooking, and saw no objections. The commit in question
itself advertises this restriction.

Why are we loosening it now? I do not see a compelling reason to do so.

> Only DWIM the "HEAD" part to see if it exists as a ref. Once we're
> sure that we've been given a valid ref, we follow the normal code
> path. This still disallows attempts to access commits which are not
> branch tips.
>
> Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
> ---
> AFAICT this should still be safe. Using HEAD^:Documentation or
> <sha1>:Documentation still complains that HEAD^ and <sha1> aren't
> refs.

Having said that, I think I agree this is a safe thing to do, _if_ we want
to loosen it.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] get_sha1_with_context: report features used in resolution
  2012-01-12  2:36               ` Junio C Hamano
@ 2012-01-12  2:51                 ` Jeff King
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2012-01-12  2:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Carlos Martín Nieto, git, Albert Astals Cid

On Wed, Jan 11, 2012 at 06:36:12PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Most callers generally treat get_sha1 as a black box, giving
> > it a string from the user and expecting to get a sha1 in
> > return. The get_sha1_with_context function gives callers
> > more information about what happened while resolving the
> > object name so they can make better decisions about how to
> > use the result. We currently use this only to provide
> > information about the path entry used to find a blob.
> >
> > We don't currently provide any information about the
> > resolution rules that were used to reach the final object.
> > Some callers may want these in order to enforce a policy
> > that a particular subset of the lookup rules are used (e.g.,
> > when serving remote requests).
> >
> > This patch adds a set of bit-fields that document the use of
> > particular features during an object lookup.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > The diffstat looks a little scary, but it is mostly just the internal
> > get_sha1 functions learning to pass the object_context around.
> 
> Hmm, shouldn't this also cover peel_to_type()?  That would have made it
> also apply to the maintenance track.

I don't see how peel_to_type is relevant. As far as get_sha1 is
concerned, the interesting thing is actually calling peel_onion. It does
get the context passed to it in my patch, but I didn't bother marking
that the peel feature was used (because it wasn't relevant to the policy
I wanted to implement in the follow-on patch).

But we could pretty easily mark the use of the peel feature, too.

I'm not sure what you mean about the maintenance track, though. AFAICT,
we don't separately call peel_to_type, but just potentially use it as
part of get_sha1_with_context. Am I missing something?

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] archive: re-allow HEAD:Documentation on a remote invocation
  2012-01-12  2:46           ` [PATCH] archive: re-allow HEAD:Documentation on a remote invocation Junio C Hamano
@ 2012-01-12  2:54             ` Jeff King
  2012-01-12  2:59               ` Jeff King
  2012-01-12  3:03               ` Junio C Hamano
  0 siblings, 2 replies; 30+ messages in thread
From: Jeff King @ 2012-01-12  2:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Carlos Martín Nieto, git, Albert Astals Cid

On Wed, Jan 11, 2012 at 06:46:56PM -0800, Junio C Hamano wrote:

> Carlos Martín Nieto <cmn@elego.de> writes:
> 
> > The tightening done in (ee27ca4a: archive: don't let remote clients
> > get unreachable commits, 2011-11-17) went too far and disallowed
> > HEAD:Documentation as it would try to find "HEAD:Documentation" as a
> > ref.
> 
> I do not think it went too far. Actually we discussed this exact issue
> when the topic was cooking, and saw no objections. The commit in question
> itself advertises this restriction.

I think you and I discussed it off list (I originally took this off-list
because the original issue did have some security implications). So I
don't think people necessarily had a chance to object.

> Why are we loosening it now? I do not see a compelling reason to do so.

I see it the opposite way. People are clearly using the "$ref:$path"
syntax. So why would we restrict them from doing so? There are no
security implications (i.e., they could always just grab $ref and
extract $path themselves). In my view, ee27ca4a was over-eager in its
restrictions because I wanted it to be simple and close the hole. Now we
can take our time adding more code to loosen it.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] archive: re-allow HEAD:Documentation on a remote invocation
  2012-01-12  2:54             ` Jeff King
@ 2012-01-12  2:59               ` Jeff King
  2012-01-12  3:03               ` Junio C Hamano
  1 sibling, 0 replies; 30+ messages in thread
From: Jeff King @ 2012-01-12  2:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Carlos Martín Nieto, git, Albert Astals Cid

On Wed, Jan 11, 2012 at 09:54:45PM -0500, Jeff King wrote:

> On Wed, Jan 11, 2012 at 06:46:56PM -0800, Junio C Hamano wrote:
> 
> > Carlos Martín Nieto <cmn@elego.de> writes:
> > 
> > > The tightening done in (ee27ca4a: archive: don't let remote clients
> > > get unreachable commits, 2011-11-17) went too far and disallowed
> > > HEAD:Documentation as it would try to find "HEAD:Documentation" as a
> > > ref.
> > 
> > I do not think it went too far. Actually we discussed this exact issue
> > when the topic was cooking, and saw no objections. The commit in question
> > itself advertises this restriction.
> 
> I think you and I discussed it off list (I originally took this off-list
> because the original issue did have some security implications). So I
> don't think people necessarily had a chance to object.

Here is the only on-list discussion:

  http://article.gmane.org/gmane.comp.version-control.git/186366

Quoted below:

  >> * jk/maint-1.6.2-upload-archive (2011-11-21) 1 commit
  >>  - archive: don't let remote clients get unreachable commits
  >>  (this branch is used by jk/maint-upload-archive.)
  >>
  >> * jk/maint-upload-archive (2011-11-21) 1 commit
  >>  - Merge branch 'jk/maint-1.6.2-upload-archive' into
  >>  jk/maint-upload-archive
  >>  (this branch uses jk/maint-1.6.2-upload-archive.)
  >>
  >> Will merge to 'next' after taking another look.
  >
  > Thanks. I also have some followup patches to re-loosen to at least
  > trees reachable from refs. Do you want to leave the tightening to
  > the maint track, and then consider the re-loosening for master?

  I was planning to first have the really tight version graduate to
  'master' and ship it in 1.7.9, while possibly merging that to 1.7.8.X
  series.  If we hear complaints from real users in the meantime before
  or after such releases, we could apply loosening patch on top of these
  topics and call them "regression fix", but I have been assuming that
  nobody would have been using this backdoor for anything that really
  matters.

So now we have heard a complaint. :)

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] archive: re-allow HEAD:Documentation on a remote invocation
  2012-01-12  2:54             ` Jeff King
  2012-01-12  2:59               ` Jeff King
@ 2012-01-12  3:03               ` Junio C Hamano
  2012-01-12  3:10                 ` Jeff King
  1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2012-01-12  3:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Carlos Martín Nieto, git, Albert Astals Cid

Jeff King <peff@peff.net> writes:

> I see it the opposite way. People are clearly using the "$ref:$path"
> syntax. So why would we restrict them from doing so? There are no
> security implications (i.e., they could always just grab $ref and
> extract $path themselves). In my view, ee27ca4a was over-eager in its
> restrictions because I wanted it to be simple and close the hole. Now we
> can take our time adding more code to loosen it.

Ok, so it is more like a partial revert of whatever we did. In that case,
I'd take CMN's patch to limit the extent of the changes, as it more
closely matches the spirit of the original ee27ca4 (archive: don't let
remote clients get unreachable commits, 2011-11-17) that singled out and
catered to the need of "archive" command alone. It is already part of the
v1.7.8.1 release, so I would prefer a change to be stupid and simple.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] archive: re-allow HEAD:Documentation on a remote invocation
  2012-01-12  3:03               ` Junio C Hamano
@ 2012-01-12  3:10                 ` Jeff King
  2012-01-12  3:20                   ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2012-01-12  3:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Carlos Martín Nieto, git, Albert Astals Cid

On Wed, Jan 11, 2012 at 07:03:36PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I see it the opposite way. People are clearly using the "$ref:$path"
> > syntax. So why would we restrict them from doing so? There are no
> > security implications (i.e., they could always just grab $ref and
> > extract $path themselves). In my view, ee27ca4a was over-eager in its
> > restrictions because I wanted it to be simple and close the hole. Now we
> > can take our time adding more code to loosen it.
> 
> Ok, so it is more like a partial revert of whatever we did. In that case,
> I'd take CMN's patch to limit the extent of the changes, as it more
> closely matches the spirit of the original ee27ca4 (archive: don't let
> remote clients get unreachable commits, 2011-11-17) that singled out and
> catered to the need of "archive" command alone. It is already part of the
> v1.7.8.1 release, so I would prefer a change to be stupid and simple.

For a maint release, I am OK with that. In the long term, I'd rather my
patches go onto master (either for 1.7.9 or for later), as I think they
are the right way to do it.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] archive: re-allow HEAD:Documentation on a remote invocation
  2012-01-12  3:10                 ` Jeff King
@ 2012-01-12  3:20                   ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2012-01-12  3:20 UTC (permalink / raw)
  To: Jeff King; +Cc: Carlos Martín Nieto, git, Albert Astals Cid

Jeff King <peff@peff.net> writes:

> For a maint release, I am OK with that. In the long term, I'd rather my
> patches go onto master (either for 1.7.9 or for later), as I think they
> are the right way to do it.

I would prefer not to use it in 1.7.9, as CMN's patch will make its only
immediate user disappear. I think we are in agreement that the change to
carry context around is a longer term thing, and when we apply it, the
safety we added to "archive" will become simpler.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] archive: loosen restrictions on remote object lookup
  2012-01-11 19:42             ` [PATCH 2/2] archive: loosen restrictions on remote object lookup Jeff King
@ 2013-05-29 12:05               ` Ian Harvey
  2013-06-05 16:38                 ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Ian Harvey @ 2013-05-29 12:05 UTC (permalink / raw)
  To: git

So, did this patch make it anywhere? We could really use it.

Here's the use case. The original ee27ca4 patch broke our build system when
the git server was upgraded to Debian Wheezy last night. The builder fetches
source from the repo in two pieces using git archive, and we need to make
sure both pieces are from the same commit. So we get a sha1 hash with git
ls-remote, and use it with git archive --remote. This, of course, breaks
with the 'no such ref' error.

At the very least, the documentation is wrong when it talks about passing a
commit ID to git archive: maintainers must surely agree that the
documentation and the actual behavior ought to match.

Personally, I'd like to see this patch adopted.

Thanks for listening,
Ian

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] archive: loosen restrictions on remote object lookup
  2013-05-29 12:05               ` Ian Harvey
@ 2013-06-05 16:38                 ` Jeff King
  2013-06-05 22:35                   ` [RFC/PATCH 0/4] real reachability checks for upload-archive Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2013-06-05 16:38 UTC (permalink / raw)
  To: Ian Harvey; +Cc: git

On Wed, May 29, 2013 at 12:05:41PM +0000, Ian Harvey wrote:

> So, did this patch make it anywhere? We could really use it.
> 
> Here's the use case. The original ee27ca4 patch broke our build system when
> the git server was upgraded to Debian Wheezy last night. The builder fetches
> source from the repo in two pieces using git archive, and we need to make
> sure both pieces are from the same commit. So we get a sha1 hash with git
> ls-remote, and use it with git archive --remote. This, of course, breaks
> with the 'no such ref' error.

The patch you are responding to[1] would not help there, either. It does
not allow raw sha1s. The only way to do that would be:

  1. Add an option to the server to allow arbitrary sha1s, even if they
     are not reachable from the ref tips. This is an easy fix, but
     requires server admins to cooperate (and they may or may not want
     to lose the "you can only access reachable things policy".

  2. Actually do a reachability check. Doing a full object check to
     allow fetching an arbitrary tree by sha1 is probably prohibitively
     expensive[2], but we could allow the form "<commit>[:<path>]", check
     that "<commit>" is reachable, and then allow arbitrary paths within
     it.

> At the very least, the documentation is wrong when it talks about passing a
> commit ID to git archive: maintainers must surely agree that the
> documentation and the actual behavior ought to match.

I am not sure which documentation you mean. The part about "commit ID"
in the current manpage is drawing the distinction between something that
resolves to a commit versus something that resolves to a tree. Either is
available both locally and remotely. I think the use of the phrase
"commit ID" is questionable there, as it really means "something that
resolves to a commit", not "a sha1 commit ID". We used to use the phrase
"commit-ish" to refer to that, but I think it has fallen out of favor as
being too jargon-y.

The documentation does not mention at all the restrictions placed on
refs using "--remote", and it probably should.

-Peff

[1] http://article.gmane.org/gmane.comp.version-control.git/188387

[2] If we had a reachability bitmap cache, calculating arbitrary object
    reachability would actually be pretty cheap. But the bitmap feature
    for core git is not yet ready for prime-time, so I think we should
    not depend on it yet.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC/PATCH 0/4] real reachability checks for upload-archive
  2013-06-05 16:38                 ` Jeff King
@ 2013-06-05 22:35                   ` Jeff King
  2013-06-05 22:37                     ` [PATCH 1/4] clear parsed flag when we free tree buffers Jeff King
                                       ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Jeff King @ 2013-06-05 22:35 UTC (permalink / raw)
  To: Ian Harvey; +Cc: git

On Wed, Jun 05, 2013 at 12:38:23PM -0400, Jeff King wrote:

>   2. Actually do a reachability check. Doing a full object check to
>      allow fetching an arbitrary tree by sha1 is probably prohibitively
>      expensive[2], but we could allow the form "<commit>[:<path>]", check
>      that "<commit>" is reachable, and then allow arbitrary paths within
>      it.

Thinking on this more, the full reachability check is no worse than what
a clone has to do to fetch the full repository. Here's a series that
does the full check. I'm not entirely happy with the performance,
though; details are in patch 3.

I think I'd be tempted to just go the more limiting "commit is
reachable" route, instead, which would solve your case (and most sane
cases).

  [1/4]: clear parsed flag when we free tree buffers
  [2/4]: upload-archive: restrict remote objects with reachability check
  [3/4]: list-objects: optimize "revs->blob_objects = 0" case
  [4/4]: archive: ignore blob objects when checking reachability

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/4] clear parsed flag when we free tree buffers
  2013-06-05 22:35                   ` [RFC/PATCH 0/4] real reachability checks for upload-archive Jeff King
@ 2013-06-05 22:37                     ` Jeff King
  2013-06-06 17:55                       ` Junio C Hamano
  2013-06-05 22:39                     ` [PATCH 2/4] upload-archive: restrict remote objects with reachability check Jeff King
                                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2013-06-05 22:37 UTC (permalink / raw)
  To: Ian Harvey; +Cc: git

Many code paths will free a tree object's buffer and set it
to NULL after finishing with it in order to keep memory
usage down during a traversal. However, out of 8 sites that
do this, only one actually unsets the "parsed" flag back.
Those sites that don't are setting a trap for later users of
the tree object; even after calling parse_tree, the buffer
will remain NULL, causing potential segfaults.

It is not known whether this is triggerable in the current
code. Most commands do not do an in-memory traversal
followed by actually using the objects again. However, it
does not hurt to be safe for future callers.

In most cases, we can abstract this out to a
"free_tree_buffer" helper. However, there are two
exceptions:

  1. The fsck code relies on the parsed flag to know that we
     were able to parse the object at one point. We can
     switch this to using a flag in the "flags" field.

  2. The index-pack code sets the buffer to NULL but does
     not free it (it is freed by a caller). We should still
     unset the parsed flag here, but we cannot use our
     helper, as we do not want to free the buffer.

Signed-off-by: Jeff King <peff@peff.net>
---
This shouldn't have any behavior change, but I'd worry a bit that I
missed some case in builtin/fsck.c where the new HAS_OBJ flag would need
set.

 builtin/fsck.c       | 17 ++++++++---------
 builtin/index-pack.c |  1 +
 builtin/reflog.c     |  3 +--
 http-push.c          |  3 +--
 list-objects.c       |  3 +--
 reachable.c          |  3 +--
 revision.c           |  3 +--
 tree.c               |  8 ++++++++
 tree.h               |  1 +
 walker.c             |  5 +----
 10 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index bb9a2cd..579fdcc 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -16,6 +16,7 @@
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
+#define HAS_OBJ   0x0004
 
 static int show_root;
 static int show_tags;
@@ -101,7 +102,7 @@ static int mark_object(struct object *obj, int type, void *data)
 	if (obj->flags & REACHABLE)
 		return 0;
 	obj->flags |= REACHABLE;
-	if (!obj->parsed) {
+	if (!(obj->flags & HAS_OBJ)) {
 		if (parent && !has_sha1_file(obj->sha1)) {
 			printf("broken link from %7s %s\n",
 				 typename(parent->type), sha1_to_hex(parent->sha1));
@@ -127,16 +128,13 @@ static int traverse_one_object(struct object *obj)
 	struct tree *tree = NULL;
 
 	if (obj->type == OBJ_TREE) {
-		obj->parsed = 0;
 		tree = (struct tree *)obj;
 		if (parse_tree(tree) < 0)
 			return 1; /* error already displayed */
 	}
 	result = fsck_walk(obj, mark_object, obj);
-	if (tree) {
-		free(tree->buffer);
-		tree->buffer = NULL;
-	}
+	if (tree)
+		free_tree_buffer(tree);
 	return result;
 }
 
@@ -178,7 +176,7 @@ static void check_reachable_object(struct object *obj)
 	 * except if it was in a pack-file and we didn't
 	 * do a full fsck
 	 */
-	if (!obj->parsed) {
+	if (!(obj->flags & HAS_OBJ)) {
 		if (has_sha1_pack(obj->sha1))
 			return; /* it is in pack - forget about it */
 		printf("missing %s %s\n", typename(obj->type), sha1_to_hex(obj->sha1));
@@ -306,8 +304,7 @@ static int fsck_obj(struct object *obj)
 	if (obj->type == OBJ_TREE) {
 		struct tree *item = (struct tree *) obj;
 
-		free(item->buffer);
-		item->buffer = NULL;
+		free_tree_buffer(item);
 	}
 
 	if (obj->type == OBJ_COMMIT) {
@@ -340,6 +337,7 @@ static int fsck_sha1(const unsigned char *sha1)
 		return error("%s: object corrupt or missing",
 			     sha1_to_hex(sha1));
 	}
+	obj->flags |= HAS_OBJ;
 	return fsck_obj(obj);
 }
 
@@ -352,6 +350,7 @@ static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type,
 		errors_found |= ERROR_OBJECT;
 		return error("%s: object corrupt or missing", sha1_to_hex(sha1));
 	}
+	obj->flags = HAS_OBJ;
 	return fsck_obj(obj);
 }
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 79dfe47..20cf284 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -765,6 +765,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			if (obj->type == OBJ_TREE) {
 				struct tree *item = (struct tree *) obj;
 				item->buffer = NULL;
+				obj->parsed = 0;
 			}
 			if (obj->type == OBJ_COMMIT) {
 				struct commit *commit = (struct commit *) obj;
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 54184b3..ba27f7c 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -94,8 +94,7 @@ static int tree_is_complete(const unsigned char *sha1)
 			complete = 0;
 		}
 	}
-	free(tree->buffer);
-	tree->buffer = NULL;
+	free_tree_buffer(tree);
 
 	if (complete)
 		tree->object.flags |= SEEN;
diff --git a/http-push.c b/http-push.c
index 395a8cf..c13b441 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1330,8 +1330,7 @@ static struct object_list **process_tree(struct tree *tree,
 			break;
 		}
 
-	free(tree->buffer);
-	tree->buffer = NULL;
+	free_tree_buffer(tree);
 	return p;
 }
 
diff --git a/list-objects.c b/list-objects.c
index 3dd4a96..c8c3463 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -123,8 +123,7 @@ static void process_tree(struct rev_info *revs,
 				     cb_data);
 	}
 	strbuf_setlen(base, baselen);
-	free(tree->buffer);
-	tree->buffer = NULL;
+	free_tree_buffer(tree);
 }
 
 static void mark_edge_parents_uninteresting(struct commit *commit,
diff --git a/reachable.c b/reachable.c
index e7e6a1e..654a8c5 100644
--- a/reachable.c
+++ b/reachable.c
@@ -80,8 +80,7 @@ static void process_tree(struct tree *tree,
 		else
 			process_blob(lookup_blob(entry.sha1), p, &me, entry.path, cp);
 	}
-	free(tree->buffer);
-	tree->buffer = NULL;
+	free_tree_buffer(tree);
 }
 
 static void process_tag(struct tag *tag, struct object_array *p,
diff --git a/revision.c b/revision.c
index 518cd08..eb988ee 100644
--- a/revision.c
+++ b/revision.c
@@ -135,8 +135,7 @@ void mark_tree_uninteresting(struct tree *tree)
 	 * We don't care about the tree any more
 	 * after it has been marked uninteresting.
 	 */
-	free(tree->buffer);
-	tree->buffer = NULL;
+	free_tree_buffer(tree);
 }
 
 void mark_parents_uninteresting(struct commit *commit)
diff --git a/tree.c b/tree.c
index 62fed63..1cbf60e 100644
--- a/tree.c
+++ b/tree.c
@@ -225,6 +225,14 @@ int parse_tree(struct tree *item)
 	return parse_tree_buffer(item, buffer, size);
 }
 
+void free_tree_buffer(struct tree *tree)
+{
+	free(tree->buffer);
+	tree->buffer = NULL;
+	tree->size = 0;
+	tree->object.parsed = 0;
+}
+
 struct tree *parse_tree_indirect(const unsigned char *sha1)
 {
 	struct object *obj = parse_object(sha1);
diff --git a/tree.h b/tree.h
index 69bcb5e..601ab9c 100644
--- a/tree.h
+++ b/tree.h
@@ -16,6 +16,7 @@ int parse_tree(struct tree *tree);
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size);
 
 int parse_tree(struct tree *tree);
+void free_tree_buffer(struct tree *tree);
 
 /* Parses and returns the tree in the given ent, chasing tags and commits. */
 struct tree *parse_tree_indirect(const unsigned char *sha1);
diff --git a/walker.c b/walker.c
index be389dc..633596e 100644
--- a/walker.c
+++ b/walker.c
@@ -56,10 +56,7 @@ static int process_tree(struct walker *walker, struct tree *tree)
 		if (!obj || process(walker, obj))
 			return -1;
 	}
-	free(tree->buffer);
-	tree->buffer = NULL;
-	tree->size = 0;
-	tree->object.parsed = 0;
+	free_tree_buffer(tree);
 	return 0;
 }
 
-- 
1.8.3.rc2.14.g7eee6b3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/4] upload-archive: restrict remote objects with reachability check
  2013-06-05 22:35                   ` [RFC/PATCH 0/4] real reachability checks for upload-archive Jeff King
  2013-06-05 22:37                     ` [PATCH 1/4] clear parsed flag when we free tree buffers Jeff King
@ 2013-06-05 22:39                     ` Jeff King
  2013-06-05 22:40                     ` [PATCH 3/4] list-objects: optimize "revs->blob_objects = 0" case Jeff King
                                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2013-06-05 22:39 UTC (permalink / raw)
  To: Ian Harvey; +Cc: git

When serving a remote request, git-upload-archive tries to
restrict access to unreachable objects, which matches the
behavior of upload-pack. However, we did so by restricting
the requested tree to "<ref>[:<path>]", because it is fast.
That covers the common cases, but does not allow requesting
items by a specific sha1 (either a tree or a commit sha1).

Instead, let's do the correct-but-slower method of actually
walking back from the tips to see if the requested object is
reachable. The performance impact of this is roughly:

  1. For a recent commit, the speed is about the same (we
     traverse in reverse chronological order, so we see it
     almost immediately).

  2. For an older commit, even one pointed at directly by a
     ref (e.g., an old tag), we are slower, because we
     traverse from the more recent tips. We are bounded in
     this case by the time to look at all commits (i.e.,
     "time git rev-list --all").

  3. When we see "$ref:$path", we typically perform much
     worse, because our traversal looks at all commits
     first, followed by all trees.

  4. The worst case (which we hit for an unreachable object)
     is equivalent to "time rev-list --objects --all", which
     is about the same amount of time pack-objects spends
     preparing a full clone (which can be in the tens of
     seconds for a large repository).

The implementation is a fairly straightforward application
of the traverse_commit_list function. Using the
mark_objects_reachable function would seem more appropriate,
but it has no mechanism for looking for a specific object,
which lets us end the traversal early in common cases.

Signed-off-by: Jeff King <peff@peff.net>
---
The slowdown from points (2) and (3) makes me hesitate on this. We can
address (2) by checking if the get_sha1 lookup used a refname
explicitly, but I'm no sure about (3).

 archive.c                     | 70 +++++++++++++++++++++++++++------
 t/t5005-archive-resolution.sh | 91 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 150 insertions(+), 11 deletions(-)
 create mode 100755 t/t5005-archive-resolution.sh

diff --git a/archive.c b/archive.c
index d254fa5..4d77624 100644
--- a/archive.c
+++ b/archive.c
@@ -5,6 +5,9 @@
 #include "archive.h"
 #include "parse-options.h"
 #include "unpack-trees.h"
+#include "diff.h"
+#include "revision.h"
+#include "list-objects.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [options] <tree-ish> [<path>...]"),
@@ -241,6 +244,59 @@ static void parse_pathspec_arg(const char **pathspec,
 	}
 }
 
+struct reachable_object_data {
+	struct rev_info revs;
+	struct object *obj;
+};
+
+static void check_object(struct object *obj, const struct name_path *path,
+			 const char *name, void *vdata)
+{
+	struct reachable_object_data *data = vdata;
+	/*
+	 * We found it; the caller will take care of marking it SEEN,
+	 * but we can end the traversal early.
+	 */
+	if (obj == data->obj) {
+		free_commit_list(data->revs.commits);
+		data->revs.commits = NULL;
+
+		free(data->revs.pending.objects);
+		data->revs.pending.nr = 0;
+		data->revs.pending.alloc = 0;
+		data->revs.pending.objects = NULL;
+	}
+}
+
+static void check_commit(struct commit *commit, void *vdata)
+{
+	check_object(&commit->object, NULL, NULL, vdata);
+}
+
+static int object_is_reachable(const unsigned char *sha1)
+{
+	static const char *argv[] = {
+		"rev-list",
+		"--objects",
+		"--all",
+		NULL
+	};
+	struct reachable_object_data data;
+
+	data.obj = parse_object(sha1);
+	if (!data.obj)
+		return 0;
+
+	save_commit_buffer = 0;
+	init_revisions(&data.revs, NULL);
+	setup_revisions(ARRAY_SIZE(argv) - 1, argv, &data.revs, NULL);
+	if (prepare_revision_walk(&data.revs))
+		return 0;
+
+	traverse_commit_list(&data.revs, check_commit, check_object, &data);
+	return data.obj->flags & SEEN;
+}
+
 static void parse_treeish_arg(const char **argv,
 		struct archiver_args *ar_args, const char *prefix,
 		int remote)
@@ -252,20 +308,12 @@ static void parse_treeish_arg(const char **argv,
 	const struct commit *commit;
 	unsigned char sha1[20];
 
-	/* Remotes are only allowed to fetch actual refs */
-	if (remote) {
-		char *ref = NULL;
-		const char *colon = strchr(name, ':');
-		int refnamelen = colon ? colon - name : strlen(name);
-
-		if (!dwim_ref(name, refnamelen, sha1, &ref))
-			die("no such ref: %.*s", refnamelen, name);
-		free(ref);
-	}
-
 	if (get_sha1(name, sha1))
 		die("Not a valid object name");
 
+	if (remote && !object_is_reachable(sha1))
+		die("Not a valid object name");
+
 	commit = lookup_commit_reference_gently(sha1, 1);
 	if (commit) {
 		commit_sha1 = commit->object.sha1;
diff --git a/t/t5005-archive-resolution.sh b/t/t5005-archive-resolution.sh
new file mode 100755
index 0000000..2e7ee1e
--- /dev/null
+++ b/t/t5005-archive-resolution.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description='test object resolution methods for local and remote archive'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	echo one >one && git add one && git commit -m one &&
+	sha1_referenced=`git rev-parse HEAD` &&
+	git tag tagged &&
+	echo two >two && git add two && git commit -m two &&
+	sha1_unreferenced=`git rev-parse HEAD` &&
+	git reset --hard HEAD^ &&
+	echo three >three && git add three && git commit -m three &&
+	git tag tagged-tree HEAD^{tree} &&
+	git reset --hard HEAD^ &&
+	mkdir subdir &&
+	echo four >subdir/four && git add subdir && git commit -m four &&
+	sha1_subtree=`git rev-parse HEAD:subdir`
+'
+
+# check that archiving $what from $where produces expected paths
+check() {
+	desc=$1; shift; # human-readable description
+	where=$1; shift; # local|remote
+	what=$1; shift; # the commit/tree id
+	expect="$*"; # expected paths or "deny"
+
+	cmd="git archive --format=tar -o result.tar"
+	test "$where" = "remote" && cmd="$cmd --remote=."
+	cmd="$cmd $what"
+
+	if test "$expect" = "deny"; then
+		test_expect_success "archive $desc ($where, should deny)" "
+			test_must_fail $cmd
+		"
+	else
+		test_expect_success "archive $desc ($where, should work)" '
+			'"$cmd"' &&
+			for i in '"$expect"'; do
+				echo "$i:`basename $i`"
+			done >expect &&
+			rm -rf result &&
+			mkdir result &&
+			(cd result &&
+			tar xf ../result.tar &&
+			for i in `find * -type f -print`; do
+				echo "$i:`cat $i`"
+			done >../actual
+			) &&
+			test_cmp expect actual
+		'
+	fi
+}
+
+check 'ref'  local master one subdir/four
+check 'ref' remote master one subdir/four
+
+check 'relative ref'  local master^ one
+check 'relative ref' remote master^ one
+
+check 'reachable sha1'  local $sha1_referenced one
+check 'reachable sha1' remote $sha1_referenced one
+
+check 'unreachable sha1'  local $sha1_unreferenced one two
+check 'unreachable sha1' remote $sha1_unreferenced deny
+
+check 'reachable reflog'  local master@{0} one subdir/four
+check 'reachable reflog' remote master@{0} one subdir/four
+
+check 'unreachable reflog'  local master@{4} one two
+check 'unreachable reflog' remote master@{4} deny
+
+check 'tree via ref^{tree}'  local master^{tree} one subdir/four
+check 'tree via ref^{tree}' remote master^{tree} one subdir/four
+
+check 'tree via ref:'  local master: one subdir/four
+check 'tree via ref:' remote master: one subdir/four
+
+check 'subtree via ref:sub'  local master:subdir four
+check 'subtree via ref:sub' remote master:subdir four
+
+check 'subtree via sha1'  local $sha1_subtree four
+check 'subtree via sha1' remote $sha1_subtree four
+
+check 'tagged commit'  local tagged one
+check 'tagged commit' remote tagged one
+
+check 'tagged tree'  local tagged-tree one three
+check 'tagged tree' remote tagged-tree one three
+
+test_done
-- 
1.8.3.rc2.14.g7eee6b3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/4] list-objects: optimize "revs->blob_objects = 0" case
  2013-06-05 22:35                   ` [RFC/PATCH 0/4] real reachability checks for upload-archive Jeff King
  2013-06-05 22:37                     ` [PATCH 1/4] clear parsed flag when we free tree buffers Jeff King
  2013-06-05 22:39                     ` [PATCH 2/4] upload-archive: restrict remote objects with reachability check Jeff King
@ 2013-06-05 22:40                     ` Jeff King
  2013-06-05 22:40                     ` [PATCH 4/4] archive: ignore blob objects when checking reachability Jeff King
  2013-06-06 17:27                     ` [RFC/PATCH 0/4] real reachability checks for upload-archive Junio C Hamano
  4 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2013-06-05 22:40 UTC (permalink / raw)
  To: Ian Harvey; +Cc: git

If we are traversing trees during a "--objects"
traversal, we may skip blobs if the "blob_objects" field of
rev_info is not set. But we do so as the first thing in
process_blob(), only after we have actually created the
"struct blob" object, incurring a hash lookup. We can
optimize out this no-op call completely.

This does not actually affect any current code, as all of
the current traversals always set blob_objects when looking
at objects, anyway.

Signed-off-by: Jeff King <peff@peff.net>
---
 list-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/list-objects.c b/list-objects.c
index c8c3463..77e6ec5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -116,7 +116,7 @@ static void process_tree(struct rev_info *revs,
 			process_gitlink(revs, entry.sha1,
 					show, &me, entry.path,
 					cb_data);
-		else
+		else if (revs->blob_objects)
 			process_blob(revs,
 				     lookup_blob(entry.sha1),
 				     show, &me, entry.path,
-- 
1.8.3.rc2.14.g7eee6b3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/4] archive: ignore blob objects when checking reachability
  2013-06-05 22:35                   ` [RFC/PATCH 0/4] real reachability checks for upload-archive Jeff King
                                       ` (2 preceding siblings ...)
  2013-06-05 22:40                     ` [PATCH 3/4] list-objects: optimize "revs->blob_objects = 0" case Jeff King
@ 2013-06-05 22:40                     ` Jeff King
  2013-06-06  7:57                       ` Michael Haggerty
  2013-06-07  0:50                       ` Eric Sunshine
  2013-06-06 17:27                     ` [RFC/PATCH 0/4] real reachability checks for upload-archive Junio C Hamano
  4 siblings, 2 replies; 30+ messages in thread
From: Jeff King @ 2013-06-05 22:40 UTC (permalink / raw)
  To: Ian Harvey; +Cc: git

We cannot create an archive from a blob object, so we would
not expect anyone to provide one to us. And if they do, we
will fail anyway just after the reachability check.  We can
therefore optimize our reachability check to ignore blobs
completely, and not even create a "struct blob" for them.

Depending on the repository size and the exact place we find
the reachable object in the traversal, this can save 20-25%,
a we can avoid many lookups in the object hash.

The downside of this is that a blob provided to a remote
archive process will fail with "no such object" rather than
"object is not a tree" (we could organize the code to retain
the old message, but since we no longer know whether the
blob is reachable or not, we would potentially be leaking
information about the existence of unreachable objects).

Signed-off-by: Jeff King <peff@peff.net>
---
 archive.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/archive.c b/archive.c
index 4d77624..98691cd 100644
--- a/archive.c
+++ b/archive.c
@@ -290,6 +290,7 @@ static int object_is_reachable(const unsigned char *sha1)
 	save_commit_buffer = 0;
 	init_revisions(&data.revs, NULL);
 	setup_revisions(ARRAY_SIZE(argv) - 1, argv, &data.revs, NULL);
+	data.revs.blob_objects = 0;
 	if (prepare_revision_walk(&data.revs))
 		return 0;
 
-- 
1.8.3.rc2.14.g7eee6b3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] archive: ignore blob objects when checking reachability
  2013-06-05 22:40                     ` [PATCH 4/4] archive: ignore blob objects when checking reachability Jeff King
@ 2013-06-06  7:57                       ` Michael Haggerty
  2013-06-07  0:50                       ` Eric Sunshine
  1 sibling, 0 replies; 30+ messages in thread
From: Michael Haggerty @ 2013-06-06  7:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Ian Harvey, git

On 06/06/2013 12:40 AM, Jeff King wrote:
> We cannot create an archive from a blob object, so we would
> not expect anyone to provide one to us. And if they do, we
> will fail anyway just after the reachability check.  We can
> therefore optimize our reachability check to ignore blobs
> completely, and not even create a "struct blob" for them.
> 
> Depending on the repository size and the exact place we find
> the reachable object in the traversal, this can save 20-25%,
> a we can avoid many lookups in the object hash.
> 
> The downside of this is that a blob provided to a remote
> archive process will fail with "no such object" rather than
> "object is not a tree" (we could organize the code to retain
> the old message, but since we no longer know whether the
> blob is reachable or not, we would potentially be leaking
> information about the existence of unreachable objects).

Could we change the error message to "no such tree object" to be
non-committal about the reason for the failure?


For a moment I thought that one could get correct error messages while
retaining the speed gain in the usual case by doing a quick object
lookup, and then

    check type of object
    if object is missing:
        die(there is no such object)
    else if object is a blob:
        do reachability test including blobs
        if object is not reachable:
            die(there is no such object)
        else
            die(object is not a tree)
    else
        do reachability test excluding blobs
        etc

However, even this would leak information about the existence of
nonreachable objects to a client measuring time time for the response
because the death due to non-reachability would take longer than death
due to missing object.  So, if one would insist on correct error
messages and no information leakage, one could just skip the first
"object is missing" optimization (it should be pretty rare anyway!) like so:

    check type of object
    if object is missing or object is a blob:
        /* Force the same delay in either case: */
        do reachability test including blobs
        if object is missing or object is not reachable:
            die(there is no such object)
        else
            die(object is not a tree)
    else
        do reachability test excluding blobs
        etc

I'm not suggesting that the extra effort is worth it; I just wanted to
mention the possibility.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC/PATCH 0/4] real reachability checks for upload-archive
  2013-06-05 22:35                   ` [RFC/PATCH 0/4] real reachability checks for upload-archive Jeff King
                                       ` (3 preceding siblings ...)
  2013-06-05 22:40                     ` [PATCH 4/4] archive: ignore blob objects when checking reachability Jeff King
@ 2013-06-06 17:27                     ` Junio C Hamano
  4 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2013-06-06 17:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Ian Harvey, git

Jeff King <peff@peff.net> writes:

> On Wed, Jun 05, 2013 at 12:38:23PM -0400, Jeff King wrote:
>
>>   2. Actually do a reachability check. Doing a full object check to
>>      allow fetching an arbitrary tree by sha1 is probably prohibitively
>>      expensive[2], but we could allow the form "<commit>[:<path>]", check
>>      that "<commit>" is reachable, and then allow arbitrary paths within
>>      it.
>
> Thinking on this more, the full reachability check is no worse than what
> a clone has to do to fetch the full repository. Here's a series that
> does the full check. I'm not entirely happy with the performance,
> though; details are in patch 3.

For some repository-servers, it may be OK to enable this by default,
but I suspect it would be better to have at least an opt-out server
configuration.

> I think I'd be tempted to just go the more limiting "commit is
> reachable" route, instead, which would solve your case (and most sane
> cases).

Yes, I think that is a reasonable thing to do.  After all, as you
noted in 4/4, you cannot ask for a single blob, and not being able
to ask for a single tree is not much different.

>   [1/4]: clear parsed flag when we free tree buffers
>   [2/4]: upload-archive: restrict remote objects with reachability check
>   [3/4]: list-objects: optimize "revs->blob_objects = 0" case
>   [4/4]: archive: ignore blob objects when checking reachability
>
> -Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] clear parsed flag when we free tree buffers
  2013-06-05 22:37                     ` [PATCH 1/4] clear parsed flag when we free tree buffers Jeff King
@ 2013-06-06 17:55                       ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2013-06-06 17:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Ian Harvey, git

Jeff King <peff@peff.net> writes:

> Many code paths will free a tree object's buffer and set it
> to NULL after finishing with it in order to keep memory
> usage down during a traversal. However, out of 8 sites that
> do this, only one actually unsets the "parsed" flag back.
> Those sites that don't are setting a trap for later users of
> the tree object; even after calling parse_tree, the buffer
> will remain NULL, causing potential segfaults.
>
> It is not known whether this is triggerable in the current
> code. Most commands do not do an in-memory traversal
> followed by actually using the objects again. However, it
> does not hurt to be safe for future callers.
>
> In most cases, we can abstract this out to a
> "free_tree_buffer" helper. However, there are two
> exceptions:
>
>   1. The fsck code relies on the parsed flag to know that we
>      were able to parse the object at one point. We can
>      switch this to using a flag in the "flags" field.
>
>   2. The index-pack code sets the buffer to NULL but does
>      not free it (it is freed by a caller). We should still
>      unset the parsed flag here, but we cannot use our
>      helper, as we do not want to free the buffer.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> This shouldn't have any behavior change, but I'd worry a bit that I
> missed some case in builtin/fsck.c where the new HAS_OBJ flag would need
> set.

check-unreachable-object?

I am wondering if you can use SEEN which is already there, or at
least set HAS_OBJ at the same place where SEEN is set.

The overall structure of fsck is:

 * we scan objects we see in the object store; they are marked with
   SEEN to avoid duplicated work in fsck_obj() and this scanning
   does not have anything to do with connectivity; then

 * we traverse connectivity graph from the starting points (refs,
   index, etc.).

and this obj->parsed check is used in the latter phase, so...


An unrelated tangent.

I suspect that 271b8d25b25 made a copy&paste error to "broken link"
message in builtin/fsck.c while looking at this patch, by the way.
The "lookup_object() to compute the first parameter given to the
callback in fsck_walk() returned NULL" case has two "from"; the
latter should be "to" (and I think in the longer term the function
signature of the callback needs to be enhanced to let us tell what
object name we found in "parent" that failed to give us an object).

>  builtin/fsck.c       | 17 ++++++++---------
>  builtin/index-pack.c |  1 +
>  builtin/reflog.c     |  3 +--
>  http-push.c          |  3 +--
>  list-objects.c       |  3 +--
>  reachable.c          |  3 +--
>  revision.c           |  3 +--
>  tree.c               |  8 ++++++++
>  tree.h               |  1 +
>  walker.c             |  5 +----
>  10 files changed, 24 insertions(+), 23 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index bb9a2cd..579fdcc 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -16,6 +16,7 @@
>  
>  #define REACHABLE 0x0001
>  #define SEEN      0x0002
> +#define HAS_OBJ   0x0004
>  
>  static int show_root;
>  static int show_tags;
> @@ -101,7 +102,7 @@ static int mark_object(struct object *obj, int type, void *data)
>  	if (obj->flags & REACHABLE)
>  		return 0;
>  	obj->flags |= REACHABLE;
> -	if (!obj->parsed) {
> +	if (!(obj->flags & HAS_OBJ)) {
>  		if (parent && !has_sha1_file(obj->sha1)) {
>  			printf("broken link from %7s %s\n",
>  				 typename(parent->type), sha1_to_hex(parent->sha1));
> @@ -127,16 +128,13 @@ static int traverse_one_object(struct object *obj)
>  	struct tree *tree = NULL;
>  
>  	if (obj->type == OBJ_TREE) {
> -		obj->parsed = 0;
>  		tree = (struct tree *)obj;
>  		if (parse_tree(tree) < 0)
>  			return 1; /* error already displayed */
>  	}
>  	result = fsck_walk(obj, mark_object, obj);
> -	if (tree) {
> -		free(tree->buffer);
> -		tree->buffer = NULL;
> -	}
> +	if (tree)
> +		free_tree_buffer(tree);
>  	return result;
>  }
>  
> @@ -178,7 +176,7 @@ static void check_reachable_object(struct object *obj)
>  	 * except if it was in a pack-file and we didn't
>  	 * do a full fsck
>  	 */
> -	if (!obj->parsed) {
> +	if (!(obj->flags & HAS_OBJ)) {
>  		if (has_sha1_pack(obj->sha1))
>  			return; /* it is in pack - forget about it */
>  		printf("missing %s %s\n", typename(obj->type), sha1_to_hex(obj->sha1));
> @@ -306,8 +304,7 @@ static int fsck_obj(struct object *obj)
>  	if (obj->type == OBJ_TREE) {
>  		struct tree *item = (struct tree *) obj;
>  
> -		free(item->buffer);
> -		item->buffer = NULL;
> +		free_tree_buffer(item);
>  	}
>  
>  	if (obj->type == OBJ_COMMIT) {
> @@ -340,6 +337,7 @@ static int fsck_sha1(const unsigned char *sha1)
>  		return error("%s: object corrupt or missing",
>  			     sha1_to_hex(sha1));
>  	}
> +	obj->flags |= HAS_OBJ;
>  	return fsck_obj(obj);
>  }
>  
> @@ -352,6 +350,7 @@ static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type,
>  		errors_found |= ERROR_OBJECT;
>  		return error("%s: object corrupt or missing", sha1_to_hex(sha1));
>  	}
> +	obj->flags = HAS_OBJ;
>  	return fsck_obj(obj);
>  }
>  
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 79dfe47..20cf284 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -765,6 +765,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
>  			if (obj->type == OBJ_TREE) {
>  				struct tree *item = (struct tree *) obj;
>  				item->buffer = NULL;
> +				obj->parsed = 0;
>  			}
>  			if (obj->type == OBJ_COMMIT) {
>  				struct commit *commit = (struct commit *) obj;
> diff --git a/builtin/reflog.c b/builtin/reflog.c
> index 54184b3..ba27f7c 100644
> --- a/builtin/reflog.c
> +++ b/builtin/reflog.c
> @@ -94,8 +94,7 @@ static int tree_is_complete(const unsigned char *sha1)
>  			complete = 0;
>  		}
>  	}
> -	free(tree->buffer);
> -	tree->buffer = NULL;
> +	free_tree_buffer(tree);
>  
>  	if (complete)
>  		tree->object.flags |= SEEN;
> diff --git a/http-push.c b/http-push.c
> index 395a8cf..c13b441 100644
> --- a/http-push.c
> +++ b/http-push.c
> @@ -1330,8 +1330,7 @@ static struct object_list **process_tree(struct tree *tree,
>  			break;
>  		}
>  
> -	free(tree->buffer);
> -	tree->buffer = NULL;
> +	free_tree_buffer(tree);
>  	return p;
>  }
>  
> diff --git a/list-objects.c b/list-objects.c
> index 3dd4a96..c8c3463 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -123,8 +123,7 @@ static void process_tree(struct rev_info *revs,
>  				     cb_data);
>  	}
>  	strbuf_setlen(base, baselen);
> -	free(tree->buffer);
> -	tree->buffer = NULL;
> +	free_tree_buffer(tree);
>  }
>  
>  static void mark_edge_parents_uninteresting(struct commit *commit,
> diff --git a/reachable.c b/reachable.c
> index e7e6a1e..654a8c5 100644
> --- a/reachable.c
> +++ b/reachable.c
> @@ -80,8 +80,7 @@ static void process_tree(struct tree *tree,
>  		else
>  			process_blob(lookup_blob(entry.sha1), p, &me, entry.path, cp);
>  	}
> -	free(tree->buffer);
> -	tree->buffer = NULL;
> +	free_tree_buffer(tree);
>  }
>  
>  static void process_tag(struct tag *tag, struct object_array *p,
> diff --git a/revision.c b/revision.c
> index 518cd08..eb988ee 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -135,8 +135,7 @@ void mark_tree_uninteresting(struct tree *tree)
>  	 * We don't care about the tree any more
>  	 * after it has been marked uninteresting.
>  	 */
> -	free(tree->buffer);
> -	tree->buffer = NULL;
> +	free_tree_buffer(tree);
>  }
>  
>  void mark_parents_uninteresting(struct commit *commit)
> diff --git a/tree.c b/tree.c
> index 62fed63..1cbf60e 100644
> --- a/tree.c
> +++ b/tree.c
> @@ -225,6 +225,14 @@ int parse_tree(struct tree *item)
>  	return parse_tree_buffer(item, buffer, size);
>  }
>  
> +void free_tree_buffer(struct tree *tree)
> +{
> +	free(tree->buffer);
> +	tree->buffer = NULL;
> +	tree->size = 0;
> +	tree->object.parsed = 0;
> +}
> +
>  struct tree *parse_tree_indirect(const unsigned char *sha1)
>  {
>  	struct object *obj = parse_object(sha1);
> diff --git a/tree.h b/tree.h
> index 69bcb5e..601ab9c 100644
> --- a/tree.h
> +++ b/tree.h
> @@ -16,6 +16,7 @@ int parse_tree(struct tree *tree);
>  int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size);
>  
>  int parse_tree(struct tree *tree);
> +void free_tree_buffer(struct tree *tree);
>  
>  /* Parses and returns the tree in the given ent, chasing tags and commits. */
>  struct tree *parse_tree_indirect(const unsigned char *sha1);
> diff --git a/walker.c b/walker.c
> index be389dc..633596e 100644
> --- a/walker.c
> +++ b/walker.c
> @@ -56,10 +56,7 @@ static int process_tree(struct walker *walker, struct tree *tree)
>  		if (!obj || process(walker, obj))
>  			return -1;
>  	}
> -	free(tree->buffer);
> -	tree->buffer = NULL;
> -	tree->size = 0;
> -	tree->object.parsed = 0;
> +	free_tree_buffer(tree);
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] archive: ignore blob objects when checking reachability
  2013-06-05 22:40                     ` [PATCH 4/4] archive: ignore blob objects when checking reachability Jeff King
  2013-06-06  7:57                       ` Michael Haggerty
@ 2013-06-07  0:50                       ` Eric Sunshine
  1 sibling, 0 replies; 30+ messages in thread
From: Eric Sunshine @ 2013-06-07  0:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Ian Harvey, Git List

On Wed, Jun 5, 2013 at 6:40 PM, Jeff King <peff@peff.net> wrote:
> We cannot create an archive from a blob object, so we would
> not expect anyone to provide one to us. And if they do, we
> will fail anyway just after the reachability check.  We can
> therefore optimize our reachability check to ignore blobs
> completely, and not even create a "struct blob" for them.
>
> Depending on the repository size and the exact place we find
> the reachable object in the traversal, this can save 20-25%,
> a we can avoid many lookups in the object hash.

s/a/as/

> The downside of this is that a blob provided to a remote
> archive process will fail with "no such object" rather than
> "object is not a tree" (we could organize the code to retain
> the old message, but since we no longer know whether the
> blob is reachable or not, we would potentially be leaking
> information about the existence of unreachable objects).
>
> Signed-off-by: Jeff King <peff@peff.net>

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2013-06-07  0:50 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-10 21:18 [BUG] git archive broken in 1.7.8.1 Albert Astals Cid
2012-01-10 21:33 ` Carlos Martín Nieto
2012-01-10 22:05   ` Albert Astals Cid
2012-01-10 22:50     ` Carlos Martín Nieto
2012-01-10 23:21       ` Jeff King
2012-01-11 12:12         ` [PATCH] archive: re-allow HEAD:Documentation on a remote invocation Carlos Martín Nieto
2012-01-11 19:39           ` Jeff King
2012-01-11 19:42             ` [PATCH 1/2] get_sha1_with_context: report features used in resolution Jeff King
2012-01-12  2:36               ` Junio C Hamano
2012-01-12  2:51                 ` Jeff King
2012-01-11 19:42             ` [PATCH 2/2] archive: loosen restrictions on remote object lookup Jeff King
2013-05-29 12:05               ` Ian Harvey
2013-06-05 16:38                 ` Jeff King
2013-06-05 22:35                   ` [RFC/PATCH 0/4] real reachability checks for upload-archive Jeff King
2013-06-05 22:37                     ` [PATCH 1/4] clear parsed flag when we free tree buffers Jeff King
2013-06-06 17:55                       ` Junio C Hamano
2013-06-05 22:39                     ` [PATCH 2/4] upload-archive: restrict remote objects with reachability check Jeff King
2013-06-05 22:40                     ` [PATCH 3/4] list-objects: optimize "revs->blob_objects = 0" case Jeff King
2013-06-05 22:40                     ` [PATCH 4/4] archive: ignore blob objects when checking reachability Jeff King
2013-06-06  7:57                       ` Michael Haggerty
2013-06-07  0:50                       ` Eric Sunshine
2013-06-06 17:27                     ` [RFC/PATCH 0/4] real reachability checks for upload-archive Junio C Hamano
2012-01-12  2:46           ` [PATCH] archive: re-allow HEAD:Documentation on a remote invocation Junio C Hamano
2012-01-12  2:54             ` Jeff King
2012-01-12  2:59               ` Jeff King
2012-01-12  3:03               ` Junio C Hamano
2012-01-12  3:10                 ` Jeff King
2012-01-12  3:20                   ` Junio C Hamano
2012-01-10 23:01     ` [BUG] git archive broken in 1.7.8.1 Allan Wind
2012-01-11 12:51       ` Carlos Martín Nieto

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).