git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Ian Harvey <iharvey@good.com>
Cc: git@vger.kernel.org
Subject: [PATCH 2/4] upload-archive: restrict remote objects with reachability check
Date: Wed, 5 Jun 2013 18:39:45 -0400	[thread overview]
Message-ID: <20130605223945.GB15607@sigill.intra.peff.net> (raw)
In-Reply-To: <20130605223551.GF8664@sigill.intra.peff.net>

When serving a remote request, git-upload-archive tries to
restrict access to unreachable objects, which matches the
behavior of upload-pack. However, we did so by restricting
the requested tree to "<ref>[:<path>]", because it is fast.
That covers the common cases, but does not allow requesting
items by a specific sha1 (either a tree or a commit sha1).

Instead, let's do the correct-but-slower method of actually
walking back from the tips to see if the requested object is
reachable. The performance impact of this is roughly:

  1. For a recent commit, the speed is about the same (we
     traverse in reverse chronological order, so we see it
     almost immediately).

  2. For an older commit, even one pointed at directly by a
     ref (e.g., an old tag), we are slower, because we
     traverse from the more recent tips. We are bounded in
     this case by the time to look at all commits (i.e.,
     "time git rev-list --all").

  3. When we see "$ref:$path", we typically perform much
     worse, because our traversal looks at all commits
     first, followed by all trees.

  4. The worst case (which we hit for an unreachable object)
     is equivalent to "time rev-list --objects --all", which
     is about the same amount of time pack-objects spends
     preparing a full clone (which can be in the tens of
     seconds for a large repository).

The implementation is a fairly straightforward application
of the traverse_commit_list function. Using the
mark_objects_reachable function would seem more appropriate,
but it has no mechanism for looking for a specific object,
which lets us end the traversal early in common cases.

Signed-off-by: Jeff King <peff@peff.net>
---
The slowdown from points (2) and (3) makes me hesitate on this. We can
address (2) by checking if the get_sha1 lookup used a refname
explicitly, but I'm no sure about (3).

 archive.c                     | 70 +++++++++++++++++++++++++++------
 t/t5005-archive-resolution.sh | 91 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 150 insertions(+), 11 deletions(-)
 create mode 100755 t/t5005-archive-resolution.sh

diff --git a/archive.c b/archive.c
index d254fa5..4d77624 100644
--- a/archive.c
+++ b/archive.c
@@ -5,6 +5,9 @@
 #include "archive.h"
 #include "parse-options.h"
 #include "unpack-trees.h"
+#include "diff.h"
+#include "revision.h"
+#include "list-objects.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [options] <tree-ish> [<path>...]"),
@@ -241,6 +244,59 @@ static void parse_pathspec_arg(const char **pathspec,
 	}
 }
 
+struct reachable_object_data {
+	struct rev_info revs;
+	struct object *obj;
+};
+
+static void check_object(struct object *obj, const struct name_path *path,
+			 const char *name, void *vdata)
+{
+	struct reachable_object_data *data = vdata;
+	/*
+	 * We found it; the caller will take care of marking it SEEN,
+	 * but we can end the traversal early.
+	 */
+	if (obj == data->obj) {
+		free_commit_list(data->revs.commits);
+		data->revs.commits = NULL;
+
+		free(data->revs.pending.objects);
+		data->revs.pending.nr = 0;
+		data->revs.pending.alloc = 0;
+		data->revs.pending.objects = NULL;
+	}
+}
+
+static void check_commit(struct commit *commit, void *vdata)
+{
+	check_object(&commit->object, NULL, NULL, vdata);
+}
+
+static int object_is_reachable(const unsigned char *sha1)
+{
+	static const char *argv[] = {
+		"rev-list",
+		"--objects",
+		"--all",
+		NULL
+	};
+	struct reachable_object_data data;
+
+	data.obj = parse_object(sha1);
+	if (!data.obj)
+		return 0;
+
+	save_commit_buffer = 0;
+	init_revisions(&data.revs, NULL);
+	setup_revisions(ARRAY_SIZE(argv) - 1, argv, &data.revs, NULL);
+	if (prepare_revision_walk(&data.revs))
+		return 0;
+
+	traverse_commit_list(&data.revs, check_commit, check_object, &data);
+	return data.obj->flags & SEEN;
+}
+
 static void parse_treeish_arg(const char **argv,
 		struct archiver_args *ar_args, const char *prefix,
 		int remote)
@@ -252,20 +308,12 @@ static void parse_treeish_arg(const char **argv,
 	const struct commit *commit;
 	unsigned char sha1[20];
 
-	/* Remotes are only allowed to fetch actual refs */
-	if (remote) {
-		char *ref = NULL;
-		const char *colon = strchr(name, ':');
-		int refnamelen = colon ? colon - name : strlen(name);
-
-		if (!dwim_ref(name, refnamelen, sha1, &ref))
-			die("no such ref: %.*s", refnamelen, name);
-		free(ref);
-	}
-
 	if (get_sha1(name, sha1))
 		die("Not a valid object name");
 
+	if (remote && !object_is_reachable(sha1))
+		die("Not a valid object name");
+
 	commit = lookup_commit_reference_gently(sha1, 1);
 	if (commit) {
 		commit_sha1 = commit->object.sha1;
diff --git a/t/t5005-archive-resolution.sh b/t/t5005-archive-resolution.sh
new file mode 100755
index 0000000..2e7ee1e
--- /dev/null
+++ b/t/t5005-archive-resolution.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description='test object resolution methods for local and remote archive'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	echo one >one && git add one && git commit -m one &&
+	sha1_referenced=`git rev-parse HEAD` &&
+	git tag tagged &&
+	echo two >two && git add two && git commit -m two &&
+	sha1_unreferenced=`git rev-parse HEAD` &&
+	git reset --hard HEAD^ &&
+	echo three >three && git add three && git commit -m three &&
+	git tag tagged-tree HEAD^{tree} &&
+	git reset --hard HEAD^ &&
+	mkdir subdir &&
+	echo four >subdir/four && git add subdir && git commit -m four &&
+	sha1_subtree=`git rev-parse HEAD:subdir`
+'
+
+# check that archiving $what from $where produces expected paths
+check() {
+	desc=$1; shift; # human-readable description
+	where=$1; shift; # local|remote
+	what=$1; shift; # the commit/tree id
+	expect="$*"; # expected paths or "deny"
+
+	cmd="git archive --format=tar -o result.tar"
+	test "$where" = "remote" && cmd="$cmd --remote=."
+	cmd="$cmd $what"
+
+	if test "$expect" = "deny"; then
+		test_expect_success "archive $desc ($where, should deny)" "
+			test_must_fail $cmd
+		"
+	else
+		test_expect_success "archive $desc ($where, should work)" '
+			'"$cmd"' &&
+			for i in '"$expect"'; do
+				echo "$i:`basename $i`"
+			done >expect &&
+			rm -rf result &&
+			mkdir result &&
+			(cd result &&
+			tar xf ../result.tar &&
+			for i in `find * -type f -print`; do
+				echo "$i:`cat $i`"
+			done >../actual
+			) &&
+			test_cmp expect actual
+		'
+	fi
+}
+
+check 'ref'  local master one subdir/four
+check 'ref' remote master one subdir/four
+
+check 'relative ref'  local master^ one
+check 'relative ref' remote master^ one
+
+check 'reachable sha1'  local $sha1_referenced one
+check 'reachable sha1' remote $sha1_referenced one
+
+check 'unreachable sha1'  local $sha1_unreferenced one two
+check 'unreachable sha1' remote $sha1_unreferenced deny
+
+check 'reachable reflog'  local master@{0} one subdir/four
+check 'reachable reflog' remote master@{0} one subdir/four
+
+check 'unreachable reflog'  local master@{4} one two
+check 'unreachable reflog' remote master@{4} deny
+
+check 'tree via ref^{tree}'  local master^{tree} one subdir/four
+check 'tree via ref^{tree}' remote master^{tree} one subdir/four
+
+check 'tree via ref:'  local master: one subdir/four
+check 'tree via ref:' remote master: one subdir/four
+
+check 'subtree via ref:sub'  local master:subdir four
+check 'subtree via ref:sub' remote master:subdir four
+
+check 'subtree via sha1'  local $sha1_subtree four
+check 'subtree via sha1' remote $sha1_subtree four
+
+check 'tagged commit'  local tagged one
+check 'tagged commit' remote tagged one
+
+check 'tagged tree'  local tagged-tree one three
+check 'tagged tree' remote tagged-tree one three
+
+test_done
-- 
1.8.3.rc2.14.g7eee6b3

  parent reply	other threads:[~2013-06-05 22:39 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-10 21:18 [BUG] git archive broken in 1.7.8.1 Albert Astals Cid
2012-01-10 21:33 ` Carlos Martín Nieto
2012-01-10 22:05   ` Albert Astals Cid
2012-01-10 22:50     ` Carlos Martín Nieto
2012-01-10 23:21       ` Jeff King
2012-01-11 12:12         ` [PATCH] archive: re-allow HEAD:Documentation on a remote invocation Carlos Martín Nieto
2012-01-11 19:39           ` Jeff King
2012-01-11 19:42             ` [PATCH 1/2] get_sha1_with_context: report features used in resolution Jeff King
2012-01-12  2:36               ` Junio C Hamano
2012-01-12  2:51                 ` Jeff King
2012-01-11 19:42             ` [PATCH 2/2] archive: loosen restrictions on remote object lookup Jeff King
2013-05-29 12:05               ` Ian Harvey
2013-06-05 16:38                 ` Jeff King
2013-06-05 22:35                   ` [RFC/PATCH 0/4] real reachability checks for upload-archive Jeff King
2013-06-05 22:37                     ` [PATCH 1/4] clear parsed flag when we free tree buffers Jeff King
2013-06-06 17:55                       ` Junio C Hamano
2013-06-05 22:39                     ` Jeff King [this message]
2013-06-05 22:40                     ` [PATCH 3/4] list-objects: optimize "revs->blob_objects = 0" case Jeff King
2013-06-05 22:40                     ` [PATCH 4/4] archive: ignore blob objects when checking reachability Jeff King
2013-06-06  7:57                       ` Michael Haggerty
2013-06-07  0:50                       ` Eric Sunshine
2013-06-06 17:27                     ` [RFC/PATCH 0/4] real reachability checks for upload-archive Junio C Hamano
2012-01-12  2:46           ` [PATCH] archive: re-allow HEAD:Documentation on a remote invocation Junio C Hamano
2012-01-12  2:54             ` Jeff King
2012-01-12  2:59               ` Jeff King
2012-01-12  3:03               ` Junio C Hamano
2012-01-12  3:10                 ` Jeff King
2012-01-12  3:20                   ` Junio C Hamano
2012-01-10 23:01     ` [BUG] git archive broken in 1.7.8.1 Allan Wind
2012-01-11 12:51       ` Carlos Martín Nieto

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130605223945.GB15607@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=iharvey@good.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).