git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/6] receive-pack: quarantine pushed objects
@ 2016-09-30 19:35 Jeff King
  2016-09-30 19:35 ` [PATCH 1/6] check_connected: accept an env argument Jeff King
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: Jeff King @ 2016-09-30 19:35 UTC (permalink / raw)
  To: git; +Cc: David Turner

I've mentioned before on the list that GitHub "quarantines" objects
while the pre-receive hook runs. Here are the patches to implement
that.

The basic problem is that as-is, index-pack admits pushed objects into
the main object database immediately, before the pre-receive hook runs.
It _has_ to, since the hook needs to be able to actually look at the
objects. However, this means that if the pre-receive hook rejects the
push, we still end up with the objects in the repository. We can't just
delete them as temporary files, because we don't know what other
processes might have started referencing them.

The solution here is to push into a "quarantine" directory that is
accessible only to pre-receive, check_connected(), etc, and only
move the objects into the main object database after we've finished
those basic checks.

One of the things we use it for at GitHub is object-size policy, which
we implement via a pre-receive hook (sort of; see below). This scheme
has been in use for about 2 years, though I did do a fair bit of
tweaking to make it ready for upstream (squashing bugfixes and merges
from upstream that came later, along with polishing a few rough edges I
saw while doing so). So I may have introduced new bugs. :)

The patches are:

  [1/6]: check_connected: accept an env argument
  [2/6]: sha1_file: always allow relative paths to alternates

    These two are preparatory.

  [3/6]: tmp-objdir: introduce API for temporary object directories
  [4/6]: receive-pack: quarantine objects until pre-receive accepts

    This is the interesting part.

  [5/6]: tmp-objdir: put quarantine information in the environment
  [6/6]: tmp-objdir: do not migrate files starting with '.'

    These are two changes that I ended up doing later to support another
    series. They're not strictly necessary here, but I think they're
    worth including now, as they change the visible behavior in minor
    ways. It seems like a good idea to start with what I think should be
    the final behavior.

    The other series is basically an optimization for the object-size
    policy. Without it, you are stuck walking the graph again in the
    pre-receive hook to find the new objects and check their sizes.

    But index-pack can do that for you very cheaply; it has the size of
    each object already. But it _doesn't_ produce nice error messages;
    it has no idea at what path the objects are found, and it doesn't
    know what kind of advice it should give the user.

    So what we can do is ask index-pack to make a note of any objects
    larger than N bytes, and write their sha1 and size into a file in
    the quarantine path. Then the pre-receive hook can look in that log
    and generate any nice message it wants. In the common case, the log
    is empty, and it does not have to do any work at all.

    These two patches set that up by letting index-pack and pre-receive
    know that quarantine path and use it to store arbitrary files that
    _don't_ get migrated to the main object database (i.e., the log file
    mentioned above).

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/6] check_connected: accept an env argument
  2016-09-30 19:35 [PATCH 0/6] receive-pack: quarantine pushed objects Jeff King
@ 2016-09-30 19:35 ` Jeff King
  2016-09-30 19:36 ` [PATCH 2/6] sha1_file: always allow relative paths to alternates Jeff King
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-09-30 19:35 UTC (permalink / raw)
  To: git; +Cc: David Turner

This lets callers influence the environment seen by
rev-list, which will be useful when we start providing
quarantined objects.

Signed-off-by: Jeff King <peff@peff.net>
---
 connected.c | 1 +
 connected.h | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/connected.c b/connected.c
index 8e3e4b1..136c2ac 100644
--- a/connected.c
+++ b/connected.c
@@ -63,6 +63,7 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
 				 _("Checking connectivity"));
 
 	rev_list.git_cmd = 1;
+	rev_list.env = opt->env;
 	rev_list.in = -1;
 	rev_list.no_stdout = 1;
 	if (opt->err_fd)
diff --git a/connected.h b/connected.h
index afa48cc..4ca325f 100644
--- a/connected.h
+++ b/connected.h
@@ -33,6 +33,11 @@ struct check_connected_options {
 
 	/* If non-zero, show progress as we traverse the objects. */
 	int progress;
+
+	/*
+	 * Insert these variables into the environment of the child process.
+	 */
+	const char **env;
 };
 
 #define CHECK_CONNECTED_INIT { 0 }
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/6] sha1_file: always allow relative paths to alternates
  2016-09-30 19:35 [PATCH 0/6] receive-pack: quarantine pushed objects Jeff King
  2016-09-30 19:35 ` [PATCH 1/6] check_connected: accept an env argument Jeff King
@ 2016-09-30 19:36 ` Jeff King
  2016-10-02  9:07   ` René Scharfe
  2016-09-30 19:36 ` [PATCH 3/6] tmp-objdir: introduce API for temporary object directories Jeff King
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2016-09-30 19:36 UTC (permalink / raw)
  To: git; +Cc: David Turner

We recursively expand alternates repositories, so that if A
borrows from B which borrows from C, A can see all objects.

For the root object database, we allow relative paths, so A
can point to B as "../B/objects". However, we currently do
not allow relative paths when recursing, so B must use an
absolute path to reach C.

That is an ancient protection from c2f493a (Transitively
read alternatives, 2006-05-07) that tries to avoid adding
the same alternate through two different paths. But since
5bdf0a8 (sha1_file: normalize alt_odb path before comparing
and storing, 2011-09-07), we use a normalized absolute path
for each alt_odb entry.

So this protection is no longer necessary; we will detect
the duplicate no matter how we got there.  And it's a good
idea to get rid of it, as it creates an unnecessary
complication when setting up recursive alternates (B has to
know that A is going to borrow from it and make sure to use
an absolute path).

We adjust the test script here to demonstrate that this now
works. Unfortunately, we can't demonstrate that the
duplicate is suppressed, since it has no user-visible
behavior (it's just one less place for our object lookups to
go). But you can verify it manually via gdb, with something
like:

    for i in a b c; do
            git init --bare $i
            blob=$(echo $i | git -C $i hash-object -w --stdin)
    done
    echo "../../b/objects" >a/objects/info/alternates
    echo "../../c/objects" >>a/objects/info/alternates
    echo "../../c/objects" >b/objects/info/alternates
    gdb --args git cat-file -e $blob

After prepare_alt_odb() runs, we have only a single copy of
"/path/to/c/objects/" in the alt_odb list.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c               | 7 +------
 t/t5613-info-alternate.sh | 4 ++--
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index b9c1fa3..9a79c19 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -343,12 +343,7 @@ static void link_alt_odb_entries(const char *alt, int len, int sep,
 		const char *entry = entries.items[i].string;
 		if (entry[0] == '\0' || entry[0] == '#')
 			continue;
-		if (!is_absolute_path(entry) && depth) {
-			error("%s: ignoring relative alternate object store %s",
-					relative_base, entry);
-		} else {
-			link_alt_odb_entry(entry, relative_base, depth, objdirbuf.buf);
-		}
+		link_alt_odb_entry(entry, relative_base, depth, objdirbuf.buf);
 	}
 	string_list_clear(&entries, 0);
 	free(alt_copy);
diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 9cd2626..b429707 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -102,9 +102,9 @@ test_valid_repo'
 cd "$base_dir"
 
 test_expect_success \
-    'that relative alternate is only possible for current dir' '
+    'that relative alternate is recursive' '
     cd D &&
-    ! (test_valid_repo)
+    test_valid_repo
 '
 
 cd "$base_dir"
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/6] tmp-objdir: introduce API for temporary object directories
  2016-09-30 19:35 [PATCH 0/6] receive-pack: quarantine pushed objects Jeff King
  2016-09-30 19:35 ` [PATCH 1/6] check_connected: accept an env argument Jeff King
  2016-09-30 19:36 ` [PATCH 2/6] sha1_file: always allow relative paths to alternates Jeff King
@ 2016-09-30 19:36 ` Jeff King
  2016-09-30 21:25   ` Junio C Hamano
  2016-09-30 21:32   ` David Turner
  2016-09-30 19:36 ` [PATCH 4/6] receive-pack: quarantine objects until pre-receive accepts Jeff King
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 32+ messages in thread
From: Jeff King @ 2016-09-30 19:36 UTC (permalink / raw)
  To: git; +Cc: David Turner

Once objects are added to the object database by a process,
they cannot easily be deleted, as we don't know what other
processes may have started referencing them. We have to
clean them up with git-gc, which will apply the usual
reachability and grace-period checks.

This patch provides an alternative: it helps callers create
a temporary directory inside the object directory, and a
temporary environment which can be passed to sub-programs to
ask them to write there (the original object directory
remains accessible as an alternate of the temporary one).

See tmp-objdir.h for details on the API.

Signed-off-by: Jeff King <peff@peff.net>
---
 Makefile     |   1 +
 cache.h      |   1 +
 sha1_file.c  |   6 ++
 tmp-objdir.c | 266 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tmp-objdir.h |  53 ++++++++++++
 5 files changed, 327 insertions(+)
 create mode 100644 tmp-objdir.c
 create mode 100644 tmp-objdir.h

diff --git a/Makefile b/Makefile
index 1aad150..4e3becb 100644
--- a/Makefile
+++ b/Makefile
@@ -831,6 +831,7 @@ LIB_OBJS += submodule-config.o
 LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
 LIB_OBJS += tempfile.o
+LIB_OBJS += tmp-objdir.o
 LIB_OBJS += trace.o
 LIB_OBJS += trailer.o
 LIB_OBJS += transport.o
diff --git a/cache.h b/cache.h
index ed3d5df..607c9b5 100644
--- a/cache.h
+++ b/cache.h
@@ -1389,6 +1389,7 @@ extern void prepare_alt_odb(void);
 extern void read_info_alternates(const char * relative_base, int depth);
 extern char *compute_alternate_path(const char *path, struct strbuf *err);
 extern void add_to_alternates_file(const char *reference);
+extern void add_to_alternates_internal(const char *reference);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
 extern int foreach_alt_odb(alt_odb_fn, void*);
 
diff --git a/sha1_file.c b/sha1_file.c
index 9a79c19..65deaf9 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -421,6 +421,12 @@ void add_to_alternates_file(const char *reference)
 	free(alts);
 }
 
+void add_to_alternates_internal(const char *reference)
+{
+	prepare_alt_odb();
+	link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0);
+}
+
 /*
  * Compute the exact path an alternate is at and returns it. In case of
  * error NULL is returned and the human readable error is added to `err`
diff --git a/tmp-objdir.c b/tmp-objdir.c
new file mode 100644
index 0000000..c92e6cc
--- /dev/null
+++ b/tmp-objdir.c
@@ -0,0 +1,266 @@
+#include "cache.h"
+#include "tmp-objdir.h"
+#include "dir.h"
+#include "sigchain.h"
+#include "string-list.h"
+#include "strbuf.h"
+#include "argv-array.h"
+
+struct tmp_objdir {
+	struct strbuf path;
+	struct argv_array env;
+};
+
+/*
+ * Allow only one tmp_objdir at a time in a running process, which simplifies
+ * our signal/atexit cleanup routines.  It's doubtful callers will ever need
+ * more than one, and we can expand later if so.  You can have many such
+ * tmp_objdirs simultaneously in many processes, of course.
+ */
+struct tmp_objdir *the_tmp_objdir;
+
+static void tmp_objdir_free(struct tmp_objdir *t)
+{
+	strbuf_release(&t->path);
+	argv_array_clear(&t->env);
+	free(t);
+}
+
+static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
+{
+	int err;
+
+	if (!t)
+		return 0;
+
+	if (t == the_tmp_objdir)
+		the_tmp_objdir = NULL;
+
+	/*
+	 * This may use malloc via strbuf_grow(), but we should
+	 * have pre-grown t->path sufficiently so that this
+	 * doesn't happen in practice.
+	 */
+	err = remove_dir_recursively(&t->path, 0);
+
+	/*
+	 * When we are cleaning up due to a signal, we won't bother
+	 * freeing memory; it may cause a deadlock if the signal
+	 * arrived while libc's allocator lock is held.
+	 */
+	if (!on_signal)
+		tmp_objdir_free(t);
+	return err;
+}
+
+int tmp_objdir_destroy(struct tmp_objdir *t)
+{
+	return tmp_objdir_destroy_1(t, 0);
+}
+
+static void remove_tmp_objdir(void)
+{
+	tmp_objdir_destroy(the_tmp_objdir);
+}
+
+static void remove_tmp_objdir_on_signal(int signo)
+{
+	tmp_objdir_destroy_1(the_tmp_objdir, 1);
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+static void env_append(struct argv_array *env, const char *key, const char *val)
+{
+	const char *old = getenv(key);
+
+	if (!old)
+		argv_array_pushf(env, "%s=%s", key, val);
+	else
+		argv_array_pushf(env, "%s=%s%c%s", key, old, PATH_SEP, val);
+}
+
+static void env_replace(struct argv_array *env, const char *key, const char *val)
+{
+	argv_array_pushf(env, "%s=%s", key, val);
+}
+
+static int setup_tmp_objdir(const char *root)
+{
+	char *path;
+	int ret = 0;
+
+	path = xstrfmt("%s/pack", root);
+	ret = mkdir(path, 0777);
+	free(path);
+
+	return ret;
+}
+
+struct tmp_objdir *tmp_objdir_create(void)
+{
+	static int installed_handlers;
+	struct tmp_objdir *t;
+
+	if (the_tmp_objdir)
+		die("BUG: only one tmp_objdir can be used at a time");
+
+	t = xmalloc(sizeof(*t));
+	strbuf_init(&t->path, 0);
+	argv_array_init(&t->env);
+
+	strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
+
+	/*
+	 * Grow the strbuf beyond any filename we expect to be placed in it.
+	 * If tmp_objdir_destroy() is called by a signal handler, then
+	 * we should be able to use the strbuf to remove files without
+	 * having to call malloc.
+	 */
+	strbuf_grow(&t->path, 1024);
+
+	if (!mkdtemp(t->path.buf)) {
+		/* free, not destroy, as we never touched the filesystem */
+		tmp_objdir_free(t);
+		return NULL;
+	}
+
+	the_tmp_objdir = t;
+	if (!installed_handlers) {
+		atexit(remove_tmp_objdir);
+		sigchain_push_common(remove_tmp_objdir_on_signal);
+		installed_handlers++;
+	}
+
+	if (setup_tmp_objdir(t->path.buf)) {
+		tmp_objdir_destroy(t);
+		return NULL;
+	}
+
+	env_append(&t->env, ALTERNATE_DB_ENVIRONMENT,
+		   absolute_path(get_object_directory()));
+	env_replace(&t->env, DB_ENVIRONMENT, absolute_path(t->path.buf));
+
+	return t;
+}
+
+/*
+ * Make sure we copy packfiles and their associated metafiles in the correct
+ * order. All of these ends_with checks are slightly expensive to do in
+ * the midst of a sorting routine, but in practice it shouldn't matter.
+ * We will have a relatively small number of packfiles to order, and loose
+ * objects exit early in the first line.
+ */
+static int pack_copy_priority(const char *name)
+{
+	if (!starts_with(name, "pack"))
+		return 0;
+	if (ends_with(name, ".keep"))
+		return 1;
+	if (ends_with(name, ".pack"))
+		return 2;
+	if (ends_with(name, ".idx"))
+		return 3;
+	return 4;
+}
+
+static int pack_copy_cmp(const char *a, const char *b)
+{
+	return pack_copy_priority(a) - pack_copy_priority(b);
+}
+
+static int read_dir_paths(struct string_list *out, const char *path)
+{
+	DIR *dh;
+	struct dirent *de;
+
+	dh = opendir(path);
+	if (!dh)
+		return -1;
+
+	while ((de = readdir(dh)))
+		if (!is_dot_or_dotdot(de->d_name))
+			string_list_append(out, de->d_name);
+
+	closedir(dh);
+	return 0;
+}
+
+static int migrate_paths(struct strbuf *src, struct strbuf *dst);
+
+static int migrate_one(struct strbuf *src, struct strbuf *dst)
+{
+	struct stat st;
+
+	if (stat(src->buf, &st) < 0)
+		return -1;
+	if (S_ISDIR(st.st_mode)) {
+		if (!mkdir(dst->buf, 0777)) {
+			if (adjust_shared_perm(dst->buf))
+				return -1;
+		} else if (errno != EEXIST)
+			return -1;
+		return migrate_paths(src, dst);
+	}
+	return finalize_object_file(src->buf, dst->buf);
+}
+
+static int migrate_paths(struct strbuf *src, struct strbuf *dst)
+{
+	size_t src_len = src->len, dst_len = dst->len;
+	struct string_list paths = STRING_LIST_INIT_DUP;
+	int i;
+	int ret = 0;
+
+	if (read_dir_paths(&paths, src->buf) < 0)
+		return -1;
+	paths.cmp = pack_copy_cmp;
+	string_list_sort(&paths);
+
+	for (i = 0; i < paths.nr; i++) {
+		const char *name = paths.items[i].string;
+
+		strbuf_addf(src, "/%s", name);
+		strbuf_addf(dst, "/%s", name);
+
+		ret |= migrate_one(src, dst);
+
+		strbuf_setlen(src, src_len);
+		strbuf_setlen(dst, dst_len);
+	}
+
+	string_list_clear(&paths, 0);
+	return ret;
+}
+
+int tmp_objdir_migrate(struct tmp_objdir *t)
+{
+	struct strbuf src = STRBUF_INIT, dst = STRBUF_INIT;
+	int ret;
+
+	if (!t)
+		return 0;
+
+	strbuf_addbuf(&src, &t->path);
+	strbuf_addstr(&dst, get_object_directory());
+
+	ret = migrate_paths(&src, &dst);
+
+	strbuf_release(&src);
+	strbuf_release(&dst);
+
+	tmp_objdir_destroy(t);
+	return ret;
+}
+
+const char **tmp_objdir_env(const struct tmp_objdir *t)
+{
+	if (!t)
+		return NULL;
+	return t->env.argv;
+}
+
+void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
+{
+	add_to_alternates_internal(t->path.buf);
+}
diff --git a/tmp-objdir.h b/tmp-objdir.h
new file mode 100644
index 0000000..aa47aa9
--- /dev/null
+++ b/tmp-objdir.h
@@ -0,0 +1,53 @@
+#ifndef TMP_OBJDIR_H
+#define TMP_OBJDIR_H
+
+/*
+ * This API allows you to create a temporary object directory, advertise it to
+ * sub-processes via GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES,
+ * and then either migrate its object into the main object directory, or remove
+ * it. The library handles unexpected signal/exit death by cleaning up the
+ * temporary directory.
+ *
+ * Example:
+ *
+ *	struct tmp_objdir *t = tmp_objdir_create();
+ *	if (!run_command_v_opt_cd_env(cmd, 0, NULL, tmp_objdir_env(t)) &&
+ *	    !tmp_objdir_migrate(t))
+ *		printf("success!\n");
+ *	else
+ *		die("failed...tmp_objdir will clean up for us");
+ *
+ */
+
+struct tmp_objdir;
+
+/*
+ * Create a new temporary object directory; returns NULL on failure.
+ */
+struct tmp_objdir *tmp_objdir_create(void);
+
+/*
+ * Return a list of environment strings, suitable for use with
+ * child_process.env, that can be passed to child programs to make use of the
+ * temporary object directory.
+ */
+const char **tmp_objdir_env(const struct tmp_objdir *);
+
+/*
+ * Finalize a temporary object directory by migrating its objects into the main
+ * object database.
+ */
+int tmp_objdir_migrate(struct tmp_objdir *);
+
+/*
+ * Destroy a temporary object directory, discarding any objects it contains.
+ */
+int tmp_objdir_destroy(struct tmp_objdir *);
+
+/*
+ * Add the temporary object directory as an alternate object store in the
+ * current process.
+ */
+void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
+
+#endif /* TMP_OBJDIR_H */
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/6] receive-pack: quarantine objects until pre-receive accepts
  2016-09-30 19:35 [PATCH 0/6] receive-pack: quarantine pushed objects Jeff King
                   ` (2 preceding siblings ...)
  2016-09-30 19:36 ` [PATCH 3/6] tmp-objdir: introduce API for temporary object directories Jeff King
@ 2016-09-30 19:36 ` Jeff King
  2016-10-01  9:12   ` Jeff King
  2016-09-30 19:36 ` [PATCH 5/6] tmp-objdir: put quarantine information in the environment Jeff King
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2016-09-30 19:36 UTC (permalink / raw)
  To: git; +Cc: David Turner

When a client pushes objects to us, index-pack checks the
objects themselves and then installs them into place. If we
then reject the push due to a pre-receive hook, we cannot
just delete the packfile; other processes may be depending
on it. We have to do a normal reachability check at this
point via `git gc`.

But such objects may hang around for weeks due to the
gc.pruneExpire grace period. And worse, during that time
they may be exploded from the pack into inefficient loose
objects.

Instead, this patch teaches receive-pack to put the new
objects into a "quarantine" temporary directory. We make
these objects available to the connectivity check and to the
pre-receive hook, and then install them into place only if
it is successful (and otherwise remove them as tempfiles).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/receive-pack.c     | 41 ++++++++++++++++++++++++++++++++++++++++-
 t/t5547-push-quarantine.sh | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100755 t/t5547-push-quarantine.sh

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 896b16f..04ad909 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -20,6 +20,7 @@
 #include "gpg-interface.h"
 #include "sigchain.h"
 #include "fsck.h"
+#include "tmp-objdir.h"
 
 static const char * const receive_pack_usage[] = {
 	N_("git receive-pack <git-dir>"),
@@ -86,6 +87,8 @@ static enum {
 } use_keepalive;
 static int keepalive_in_sec = 5;
 
+struct tmp_objdir *tmp_objdir;
+
 static enum deny_action parse_deny_action(const char *var, const char *value)
 {
 	if (value) {
@@ -663,6 +666,9 @@ static int run_and_feed_hook(const char *hook_name, feed_fn feed,
 	} else
 		argv_array_pushf(&proc.env_array, "GIT_PUSH_OPTION_COUNT");
 
+	if (tmp_objdir)
+		argv_array_pushv(&proc.env_array, tmp_objdir_env(tmp_objdir));
+
 	if (use_sideband) {
 		memset(&muxer, 0, sizeof(muxer));
 		muxer.proc = copy_to_sideband;
@@ -762,6 +768,7 @@ static int run_update_hook(struct command *cmd)
 	proc.stdout_to_stderr = 1;
 	proc.err = use_sideband ? -1 : 0;
 	proc.argv = argv;
+	proc.env = tmp_objdir_env(tmp_objdir);
 
 	code = start_command(&proc);
 	if (code)
@@ -833,6 +840,7 @@ static int update_shallow_ref(struct command *cmd, struct shallow_info *si)
 		    !delayed_reachability_test(si, i))
 			sha1_array_append(&extra, si->shallow->sha1[i]);
 
+	opt.env = tmp_objdir_env(tmp_objdir);
 	setup_alternate_shallow(&shallow_lock, &opt.shallow_file, &extra);
 	if (check_connected(command_singleton_iterator, cmd, &opt)) {
 		rollback_lock_file(&shallow_lock);
@@ -1240,12 +1248,17 @@ static void set_connectivity_errors(struct command *commands,
 
 	for (cmd = commands; cmd; cmd = cmd->next) {
 		struct command *singleton = cmd;
+		struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
 		if (shallow_update && si->shallow_ref[cmd->index])
 			/* to be checked in update_shallow_ref() */
 			continue;
+
+		opt.env = tmp_objdir_env(tmp_objdir);
 		if (!check_connected(command_singleton_iterator, &singleton,
-				     NULL))
+				     &opt))
 			continue;
+
 		cmd->error_string = "missing necessary objects";
 	}
 }
@@ -1428,6 +1441,7 @@ static void execute_commands(struct command *commands,
 	data.si = si;
 	opt.err_fd = err_fd;
 	opt.progress = err_fd && !quiet;
+	opt.env = tmp_objdir_env(tmp_objdir);
 	if (check_connected(iterate_receive_command_list, &data, &opt))
 		set_connectivity_errors(commands, si);
 
@@ -1444,6 +1458,19 @@ static void execute_commands(struct command *commands,
 		return;
 	}
 
+	/*
+	 * Now we'll start writing out refs, which means the objects need
+	 * to be in their final positions so that other processes can see them.
+	 */
+	if (tmp_objdir_migrate(tmp_objdir) < 0) {
+		for (cmd = commands; cmd; cmd = cmd->next) {
+			if (!cmd->error_string)
+				cmd->error_string = "unable to migrate objects to permanent storage";
+		}
+		return;
+	}
+	tmp_objdir = NULL;
+
 	check_aliased_updates(commands);
 
 	free(head_name_to_free);
@@ -1639,6 +1666,18 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		argv_array_push(&child.args, alt_shallow_file);
 	}
 
+	tmp_objdir = tmp_objdir_create();
+	if (!tmp_objdir)
+		return "unable to create temporary object directory";
+	child.env = tmp_objdir_env(tmp_objdir);
+
+	/*
+	 * Normally we just pass the tmp_objdir environment to the child
+	 * processes that do the heavy lifting, but we may need to see these
+	 * objects ourselves to set up shallow information.
+	 */
+	tmp_objdir_add_as_alternate(tmp_objdir);
+
 	if (ntohl(hdr.hdr_entries) < unpack_limit) {
 		argv_array_pushl(&child.args, "unpack-objects", hdr_arg, NULL);
 		if (quiet)
diff --git a/t/t5547-push-quarantine.sh b/t/t5547-push-quarantine.sh
new file mode 100755
index 0000000..1e5d32d
--- /dev/null
+++ b/t/t5547-push-quarantine.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+test_description='check quarantine of objects during push'
+. ./test-lib.sh
+
+test_expect_success 'create picky dest repo' '
+	git init --bare dest.git &&
+	write_script dest.git/hooks/pre-receive <<-\EOF
+	while read old new ref; do
+		test "$(git log -1 --format=%s $new)" = reject && exit 1
+	done
+	exit 0
+	EOF
+'
+
+test_expect_success 'accepted objects work' '
+	test_commit ok &&
+	git push dest.git HEAD &&
+	commit=$(git rev-parse HEAD) &&
+	git --git-dir=dest.git cat-file commit $commit
+'
+
+test_expect_success 'rejected objects are not installed' '
+	test_commit reject &&
+	commit=$(git rev-parse HEAD) &&
+	test_must_fail git push dest.git reject &&
+	test_must_fail git --git-dir=dest.git cat-file commit $commit
+'
+
+test_expect_success 'rejected objects are removed' '
+	echo "incoming-*" >expect &&
+	(cd dest.git/objects && echo incoming-*) >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 5/6] tmp-objdir: put quarantine information in the environment
  2016-09-30 19:35 [PATCH 0/6] receive-pack: quarantine pushed objects Jeff King
                   ` (3 preceding siblings ...)
  2016-09-30 19:36 ` [PATCH 4/6] receive-pack: quarantine objects until pre-receive accepts Jeff King
@ 2016-09-30 19:36 ` Jeff King
  2016-09-30 19:36 ` [PATCH 6/6] tmp-objdir: do not migrate files starting with '.' Jeff King
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-09-30 19:36 UTC (permalink / raw)
  To: git; +Cc: David Turner

The presence of the GIT_QUARANTINE_PATH variable lets any
called programs know that they're operating in a temporary
object directory (and where that directory is).

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h      | 1 +
 tmp-objdir.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/cache.h b/cache.h
index 607c9b5..fd81a6c 100644
--- a/cache.h
+++ b/cache.h
@@ -433,6 +433,7 @@ static inline enum object_type object_type(unsigned int mode)
 #define GIT_GLOB_PATHSPECS_ENVIRONMENT "GIT_GLOB_PATHSPECS"
 #define GIT_NOGLOB_PATHSPECS_ENVIRONMENT "GIT_NOGLOB_PATHSPECS"
 #define GIT_ICASE_PATHSPECS_ENVIRONMENT "GIT_ICASE_PATHSPECS"
+#define GIT_QUARANTINE_ENVIRONMENT "GIT_QUARANTINE_PATH"
 
 /*
  * This environment variable is expected to contain a boolean indicating
diff --git a/tmp-objdir.c b/tmp-objdir.c
index c92e6cc..9f53238 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -140,6 +140,8 @@ struct tmp_objdir *tmp_objdir_create(void)
 	env_append(&t->env, ALTERNATE_DB_ENVIRONMENT,
 		   absolute_path(get_object_directory()));
 	env_replace(&t->env, DB_ENVIRONMENT, absolute_path(t->path.buf));
+	env_replace(&t->env, GIT_QUARANTINE_ENVIRONMENT,
+		    absolute_path(t->path.buf));
 
 	return t;
 }
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 6/6] tmp-objdir: do not migrate files starting with '.'
  2016-09-30 19:35 [PATCH 0/6] receive-pack: quarantine pushed objects Jeff King
                   ` (4 preceding siblings ...)
  2016-09-30 19:36 ` [PATCH 5/6] tmp-objdir: put quarantine information in the environment Jeff King
@ 2016-09-30 19:36 ` Jeff King
  2016-10-02  9:20 ` [PATCH 0/6] receive-pack: quarantine pushed objects Christian Couder
  2016-10-03 20:48 ` [PATCH v2 0/5] " Jeff King
  7 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-09-30 19:36 UTC (permalink / raw)
  To: git; +Cc: David Turner

This avoids "." and "..", as we already do, but also leaves
room for index-pack to store extra data in the quarantine
area (e.g., for passing back any analysis to be read by the
pre-receive hook).

Signed-off-by: Jeff King <peff@peff.net>
---
 tmp-objdir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tmp-objdir.c b/tmp-objdir.c
index 9f53238..2181a42 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -181,7 +181,7 @@ static int read_dir_paths(struct string_list *out, const char *path)
 		return -1;
 
 	while ((de = readdir(dh)))
-		if (!is_dot_or_dotdot(de->d_name))
+		if (de->d_name[0] != '.')
 			string_list_append(out, de->d_name);
 
 	closedir(dh);
-- 
2.10.0.618.g82cc264

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] tmp-objdir: introduce API for temporary object directories
  2016-09-30 19:36 ` [PATCH 3/6] tmp-objdir: introduce API for temporary object directories Jeff King
@ 2016-09-30 21:25   ` Junio C Hamano
  2016-09-30 22:13     ` Jeff King
  2016-09-30 21:32   ` David Turner
  1 sibling, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2016-09-30 21:25 UTC (permalink / raw)
  To: Jeff King; +Cc: git, David Turner

Jeff King <peff@peff.net> writes:

> diff --git a/sha1_file.c b/sha1_file.c
> index 9a79c19..65deaf9 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -421,6 +421,12 @@ void add_to_alternates_file(const char *reference)
>  	free(alts);
>  }
>  
> +void add_to_alternates_internal(const char *reference)
> +{
> +	prepare_alt_odb();
> +	link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0);
> +}
> +

A function _internal being extern felt a bit funny.  We are only
appending so the first one does not have to be reprepare.

> +static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
> +{
> +	int err;
> +
> +	if (!t)
> +		return 0;
> +
> +	if (t == the_tmp_objdir)
> +		the_tmp_objdir = NULL;
> +
> +	/*
> +	 * This may use malloc via strbuf_grow(), but we should
> +	 * have pre-grown t->path sufficiently so that this
> +	 * doesn't happen in practice.
> +	 */
> +	err = remove_dir_recursively(&t->path, 0);
> +
> +	/*
> +	 * When we are cleaning up due to a signal, we won't bother
> +	 * freeing memory; it may cause a deadlock if the signal
> +	 * arrived while libc's allocator lock is held.
> +	 */
> +	if (!on_signal)
> +		tmp_objdir_free(t);
> +	return err;
> +}
> +
> +int tmp_objdir_destroy(struct tmp_objdir *t)
> +{
> +	return tmp_objdir_destroy_1(t, 0);
> +}

Looks sensible.

> +	t = xmalloc(sizeof(*t));
> +	strbuf_init(&t->path, 0);
> +	argv_array_init(&t->env);
> +
> +	strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());

I was wondering where you would put this in.  Inside .git/objects/
sounds good.

> +/*
> + * Make sure we copy packfiles and their associated metafiles in the correct
> + * order. All of these ends_with checks are slightly expensive to do in
> + * the midst of a sorting routine, but in practice it shouldn't matter.
> + * We will have a relatively small number of packfiles to order, and loose
> + * objects exit early in the first line.
> + */
> +static int pack_copy_priority(const char *name)
> +{
> +	if (!starts_with(name, "pack"))
> +		return 0;
> +	if (ends_with(name, ".keep"))
> +		return 1;
> +	if (ends_with(name, ".pack"))
> +		return 2;
> +	if (ends_with(name, ".idx"))
> +		return 3;
> +	return 4;
> +}

Thanks for being careful.  A blind "cp -r" would have ruined the
day.

We do not do bitmaps upon receiving, I guess.

> + *	struct tmp_objdir *t = tmp_objdir_create();
> + *	if (!run_command_v_opt_cd_env(cmd, 0, NULL, tmp_objdir_env(t)) &&
> + *	    !tmp_objdir_migrate(t))
> + *		printf("success!\n");
> + *	else
> + *		die("failed...tmp_objdir will clean up for us");

Made me briefly wonder if a caller might want to use appropriate
environment to use the tmp-objdir given by the API in addition to
its own, but then such a caller just needs to prepare its own argv-array
and concatenate tmp_objdir_env() before making the opt_cd_env call,
so this is perfectly fine.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH 3/6] tmp-objdir: introduce API for temporary object directories
  2016-09-30 19:36 ` [PATCH 3/6] tmp-objdir: introduce API for temporary object directories Jeff King
  2016-09-30 21:25   ` Junio C Hamano
@ 2016-09-30 21:32   ` David Turner
  2016-09-30 22:44     ` Jeff King
  1 sibling, 1 reply; 32+ messages in thread
From: David Turner @ 2016-09-30 21:32 UTC (permalink / raw)
  To: 'Jeff King', git@vger.kernel.org

> +static void env_append(struct argv_array *env, const char *key, const
> +char *val) {
> +	const char *old = getenv(key);
> +
> +	if (!old)
> +		argv_array_pushf(env, "%s=%s", key, val);
> +	else
> +		argv_array_pushf(env, "%s=%s%c%s", key, old, PATH_SEP,
> val); 
>+}

I would like a comment explaining this function. 

> + * Finalize a temporary object directory by migrating its objects into
> +the main
> + * object database.
> + */

This should mention that it frees its argument.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] tmp-objdir: introduce API for temporary object directories
  2016-09-30 21:25   ` Junio C Hamano
@ 2016-09-30 22:13     ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-09-30 22:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, David Turner

On Fri, Sep 30, 2016 at 02:25:43PM -0700, Junio C Hamano wrote:

> > +void add_to_alternates_internal(const char *reference)
> > +{
> > +	prepare_alt_odb();
> > +	link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0);
> > +}
> > +
> 
> A function _internal being extern felt a bit funny.  We are only
> appending so the first one does not have to be reprepare.

It's a match for add_to_alternates_file(). Suggestions for a better word
are welcome.

We do need to prepare_alt_odb(), as that is what sets up the
alt_odb_tail pointer. And also, a later prepare() call would overwrite
our entry.  We could refactor the alt_odb code, but it seemed simplest
to just make sure we don't add to an unprepared list.

> > +	t = xmalloc(sizeof(*t));
> > +	strbuf_init(&t->path, 0);
> > +	argv_array_init(&t->env);
> > +
> > +	strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
> 
> I was wondering where you would put this in.  Inside .git/objects/
> sounds good.

The name "incoming" is kind of arbitrary and related to the fact that
this is used for receive-pack (though if we were to use it on the
fetching side, I think it would be equally correct). I don't think it
really matters in practice.

> > +static int pack_copy_priority(const char *name)
> > +{
> > +	if (!starts_with(name, "pack"))
> > +		return 0;
> > +	if (ends_with(name, ".keep"))
> > +		return 1;
> > +	if (ends_with(name, ".pack"))
> > +		return 2;
> > +	if (ends_with(name, ".idx"))
> > +		return 3;
> > +	return 4;
> > +}
> 
> Thanks for being careful.  A blind "cp -r" would have ruined the
> day.
> 
> We do not do bitmaps upon receiving, I guess.

But we don't, but they (and anything else) would just sort at the end,
which is OK.

> > + *	struct tmp_objdir *t = tmp_objdir_create();
> > + *	if (!run_command_v_opt_cd_env(cmd, 0, NULL, tmp_objdir_env(t)) &&
> > + *	    !tmp_objdir_migrate(t))
> > + *		printf("success!\n");
> > + *	else
> > + *		die("failed...tmp_objdir will clean up for us");
> 
> Made me briefly wonder if a caller might want to use appropriate
> environment to use the tmp-objdir given by the API in addition to
> its own, but then such a caller just needs to prepare its own argv-array
> and concatenate tmp_objdir_env() before making the opt_cd_env call,
> so this is perfectly fine.

Yep, and that's exactly what happens in one spot of the next patch.
My original had just open-coded, but I was happy to see we have
argv_array_pushv() these days, so it's a one-liner.

In the very original version, the receive-pack process did not need to
access the new objects at all (not until ref update time anyway, at
which point they've been migrated). And that's why the environment is
intentionally kept separate, and the caller can feed it to whichever
sub-programs it chooses. But a later version of git that handled shallow
pushes required receive-pack to actually look at the objects, and I
added the add_to_alternates_internal() call you see here.

At that point, it does make me wonder if a better interface would be for
tmp_objdir to just set up the environment variables in the parent
process in the first place, and then restore them upon
tmp_objdir_destroy(). It makes things a bit more automatic, which makes
me hesitate, but I think it would be fine for receive-pack.

I dunno. I mostly left it alone because I did it this way long ago, and
it wasn't broke. Polishing for upstream is an opportunity to fix old
oddities, but I think there is some value in applying a more
battle-tested patch.

-Peff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] tmp-objdir: introduce API for temporary object directories
  2016-09-30 21:32   ` David Turner
@ 2016-09-30 22:44     ` Jeff King
  2016-09-30 23:07       ` David Turner
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2016-09-30 22:44 UTC (permalink / raw)
  To: David Turner; +Cc: git@vger.kernel.org

On Fri, Sep 30, 2016 at 09:32:04PM +0000, David Turner wrote:

> > +static void env_append(struct argv_array *env, const char *key, const
> > +char *val) {
> > +	const char *old = getenv(key);
> > +
> > +	if (!old)
> > +		argv_array_pushf(env, "%s=%s", key, val);
> > +	else
> > +		argv_array_pushf(env, "%s=%s%c%s", key, old, PATH_SEP,
> > val); 
> >+}
> 
> I would like a comment explaining this function. 

I'll squash in:

diff --git a/tmp-objdir.c b/tmp-objdir.c
index c92e6cc..a98c246 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -70,6 +70,13 @@ static void remove_tmp_objdir_on_signal(int signo)
 	raise(signo);
 }
 
+/*
+ * These env_* functions are for setting up the child environment; the
+ * "replace" variant overrides the value of any existing variable with that
+ * "key". The "append" variant puts our new value at the end of a list,
+ * separated by PATH_SEP (which is what separate values in
+ * GIT_ALTERNATE_OBJECT_DIRECTORIES).
+ */
 static void env_append(struct argv_array *env, const char *key, const char *val)
 {
 	const char *old = getenv(key);

> > + * Finalize a temporary object directory by migrating its objects into
> > +the main
> > + * object database.
> > + */
> 
> This should mention that it frees its argument.

And:

diff --git a/tmp-objdir.h b/tmp-objdir.h
index aa47aa9..b1e45b4 100644
--- a/tmp-objdir.h
+++ b/tmp-objdir.h
@@ -35,7 +35,8 @@ const char **tmp_objdir_env(const struct tmp_objdir *);
 
 /*
  * Finalize a temporary object directory by migrating its objects into the main
- * object database.
+ * object database, removing the temporary directory, and freeing any
+ * associated resources.
  */
 int tmp_objdir_migrate(struct tmp_objdir *);
 

-Peff

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* RE: [PATCH 3/6] tmp-objdir: introduce API for temporary object directories
  2016-09-30 22:44     ` Jeff King
@ 2016-09-30 23:07       ` David Turner
  0 siblings, 0 replies; 32+ messages in thread
From: David Turner @ 2016-09-30 23:07 UTC (permalink / raw)
  To: 'Jeff King'; +Cc: git@vger.kernel.org

Thanks.  The rest all look good too.

> -----Original Message-----
> From: Jeff King [mailto:peff@peff.net]
> Sent: Friday, September 30, 2016 6:45 PM
> To: David Turner
> Cc: git@vger.kernel.org
> Subject: Re: [PATCH 3/6] tmp-objdir: introduce API for temporary object
> directories
> 
> On Fri, Sep 30, 2016 at 09:32:04PM +0000, David Turner wrote:
> 
> > > +static void env_append(struct argv_array *env, const char *key,
> > > +const char *val) {
> > > +	const char *old = getenv(key);
> > > +
> > > +	if (!old)
> > > +		argv_array_pushf(env, "%s=%s", key, val);
> > > +	else
> > > +		argv_array_pushf(env, "%s=%s%c%s", key, old, PATH_SEP,
> > > val);
> > >+}
> >
> > I would like a comment explaining this function.
> 
> I'll squash in:
> 
> diff --git a/tmp-objdir.c b/tmp-objdir.c index c92e6cc..a98c246 100644
> --- a/tmp-objdir.c
> +++ b/tmp-objdir.c
> @@ -70,6 +70,13 @@ static void remove_tmp_objdir_on_signal(int signo)
>  	raise(signo);
>  }
> 
> +/*
> + * These env_* functions are for setting up the child environment; the
> + * "replace" variant overrides the value of any existing variable with
> +that
> + * "key". The "append" variant puts our new value at the end of a list,
> + * separated by PATH_SEP (which is what separate values in
> + * GIT_ALTERNATE_OBJECT_DIRECTORIES).
> + */
>  static void env_append(struct argv_array *env, const char *key, const char
> *val)  {
>  	const char *old = getenv(key);
> 
> > > + * Finalize a temporary object directory by migrating its objects
> > > +into the main
> > > + * object database.
> > > + */
> >
> > This should mention that it frees its argument.
> 
> And:
> 
> diff --git a/tmp-objdir.h b/tmp-objdir.h index aa47aa9..b1e45b4 100644
> --- a/tmp-objdir.h
> +++ b/tmp-objdir.h
> @@ -35,7 +35,8 @@ const char **tmp_objdir_env(const struct tmp_objdir
> *);
> 
>  /*
>   * Finalize a temporary object directory by migrating its objects into the main
> - * object database.
> + * object database, removing the temporary directory, and freeing any
> + * associated resources.
>   */
>  int tmp_objdir_migrate(struct tmp_objdir *);
> 
> 
> -Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] receive-pack: quarantine objects until pre-receive accepts
  2016-09-30 19:36 ` [PATCH 4/6] receive-pack: quarantine objects until pre-receive accepts Jeff King
@ 2016-10-01  9:12   ` Jeff King
  2017-04-08 14:53     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2016-10-01  9:12 UTC (permalink / raw)
  To: git; +Cc: David Turner

On Fri, Sep 30, 2016 at 03:36:30PM -0400, Jeff King wrote:

> @@ -1639,6 +1666,18 @@ static const char *unpack(int err_fd, struct shallow_info *si)
>  		argv_array_push(&child.args, alt_shallow_file);
>  	}
>  
> +	tmp_objdir = tmp_objdir_create();
> +	if (!tmp_objdir)
> +		return "unable to create temporary object directory";
> +	child.env = tmp_objdir_env(tmp_objdir);

One thing to note here: this new code kicks in all the time. My
reasoning was that there's basically no time you _wouldn't_ want it to,
and certainly that was the case for us when I wrote it. But I tried to
think of user-visible changes. Here's what I came up with:

  - we currently leave the tmp_pack_* for a failed push sitting around
    (e.g., if the client hangs up halfway through, or index-pack rejects
    the pack for some reason). But with this series, it would always be
    cleaned up. That's a very good thing if you're running a git hosting
    site. It might make things harder if you're debugging.

    I don't think it's a good reason not to enable this by default, but
    it _could_ be a reason to have a config switch to turn it off
    temporarily (or just leave the "incoming-*" directory in place).

  - the environment that pre-receive pack runs in has access to objects
    that the rest of the repository doesn't. So if you were to do
    something silly in your pre-receive like:

      # reject the push, but log a copy of the objects
      git update-ref refs/rejected/$(date +%s) $new_sha1
      exit 1

    Then your ref-update would succeed (you have $new_sha1), but the
    objects would be deleted immediately afterwards. I find this a
    somewhat questionable pattern, and I have no idea if anybody else
    has thought of it. But it _does_ work today, and not with this
    series.

I don't think it would be too hard to put a config conditional around
this tmp_objdir_create(). And then all of the tmp_objdir_env() calls
would just return NULL, and effectively become noops.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/6] sha1_file: always allow relative paths to alternates
  2016-09-30 19:36 ` [PATCH 2/6] sha1_file: always allow relative paths to alternates Jeff King
@ 2016-10-02  9:07   ` René Scharfe
  2016-10-02 13:03     ` Jeff King
  2016-10-02 15:38     ` Jeff King
  0 siblings, 2 replies; 32+ messages in thread
From: René Scharfe @ 2016-10-02  9:07 UTC (permalink / raw)
  To: Jeff King, git; +Cc: David Turner

Am 30.09.2016 um 21:36 schrieb Jeff King:
> We adjust the test script here to demonstrate that this now
> works. Unfortunately, we can't demonstrate that the
> duplicate is suppressed, since it has no user-visible
> behavior (it's just one less place for our object lookups to
> go). But you can verify it manually via gdb, with something
> like:
> 
>     for i in a b c; do
>             git init --bare $i
>             blob=$(echo $i | git -C $i hash-object -w --stdin)
>     done
>     echo "../../b/objects" >a/objects/info/alternates
>     echo "../../c/objects" >>a/objects/info/alternates
>     echo "../../c/objects" >b/objects/info/alternates
>     gdb --args git cat-file -e $blob
> 
> After prepare_alt_odb() runs, we have only a single copy of
> "/path/to/c/objects/" in the alt_odb list.

A better way would be to provide a UI for that.  We could easily bolt
it to the side of count-objects like in the patch below.  Feels a bit
hackish, though.

Per-ODB counting would be nice, but a lot more involved, I guess
(didn't try).


diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index ba92919..b2afe36 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -73,6 +73,12 @@ static int count_cruft(const char *basename, const char *path, void *data)
 	return 0;
 }
 
+static int print_alt_odb(struct alternate_object_database *alt, void *data)
+{
+	puts(alt->base);
+	return 0;
+}
+
 static char const * const count_objects_usage[] = {
 	N_("git count-objects [-v] [-H | --human-readable]"),
 	NULL
@@ -81,10 +87,13 @@ static char const * const count_objects_usage[] = {
 int cmd_count_objects(int argc, const char **argv, const char *prefix)
 {
 	int human_readable = 0;
+	int list_alt_odb = 0;
 	struct option opts[] = {
 		OPT__VERBOSE(&verbose, N_("be verbose")),
 		OPT_BOOL('H', "human-readable", &human_readable,
 			 N_("print sizes in human readable format")),
+		OPT_BOOL(0, "list-alternates", &list_alt_odb,
+			 N_("print list of alternate object databases")),
 		OPT_END(),
 	};
 
@@ -92,6 +101,12 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 	/* we do not take arguments other than flags for now */
 	if (argc)
 		usage_with_options(count_objects_usage, opts);
+
+	if (list_alt_odb) {
+		foreach_alt_odb(print_alt_odb, NULL);
+		return 0;
+	}
+
 	if (verbose) {
 		report_garbage = real_report_garbage;
 		report_linked_checkout_garbage();


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/6] receive-pack: quarantine pushed objects
  2016-09-30 19:35 [PATCH 0/6] receive-pack: quarantine pushed objects Jeff King
                   ` (5 preceding siblings ...)
  2016-09-30 19:36 ` [PATCH 6/6] tmp-objdir: do not migrate files starting with '.' Jeff King
@ 2016-10-02  9:20 ` Christian Couder
  2016-10-02 13:02   ` Jeff King
  2016-10-03 20:48 ` [PATCH v2 0/5] " Jeff King
  7 siblings, 1 reply; 32+ messages in thread
From: Christian Couder @ 2016-10-02  9:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git, David Turner

On Fri, Sep 30, 2016 at 9:35 PM, Jeff King <peff@peff.net> wrote:
> I've mentioned before on the list that GitHub "quarantines" objects
> while the pre-receive hook runs. Here are the patches to implement
> that.

Great! Thanks for upstreaming these patches!

I wonder if the patch you sent in:

https://public-inbox.org/git/20160816144642.5ikkta4l5hyx6act@sigill.intra.peff.net/

is still useful or not.

> The basic problem is that as-is, index-pack admits pushed objects into
> the main object database immediately, before the pre-receive hook runs.
> It _has_ to, since the hook needs to be able to actually look at the
> objects. However, this means that if the pre-receive hook rejects the
> push, we still end up with the objects in the repository. We can't just
> delete them as temporary files, because we don't know what other
> processes might have started referencing them.
>
> The solution here is to push into a "quarantine" directory that is
> accessible only to pre-receive, check_connected(), etc, and only
> move the objects into the main object database after we've finished
> those basic checks.

I guess if we fail the receive-pack because the pack is bigger than
receive.maxInputSize, then the "quarantine" directory will also be
removed, so the part of the pack that we received before failing the
receive-pack will be deleted.

[...]

>     These two patches set that up by letting index-pack and pre-receive
>     know that quarantine path and use it to store arbitrary files that
>     _don't_ get migrated to the main object database (i.e., the log file
>     mentioned above).

It would be nice to have a diffstat for the whole series.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/6] receive-pack: quarantine pushed objects
  2016-10-02  9:20 ` [PATCH 0/6] receive-pack: quarantine pushed objects Christian Couder
@ 2016-10-02 13:02   ` Jeff King
  2016-10-03  6:45     ` Christian Couder
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2016-10-02 13:02 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, David Turner

On Sun, Oct 02, 2016 at 11:20:59AM +0200, Christian Couder wrote:

> On Fri, Sep 30, 2016 at 9:35 PM, Jeff King <peff@peff.net> wrote:
> > I've mentioned before on the list that GitHub "quarantines" objects
> > while the pre-receive hook runs. Here are the patches to implement
> > that.
> 
> Great! Thanks for upstreaming these patches!
> 
> I wonder if the patch you sent in:
> 
> https://public-inbox.org/git/20160816144642.5ikkta4l5hyx6act@sigill.intra.peff.net/
> 
> is still useful or not.

It is potentially still useful for other code paths besides
receive-pack. But if the main concern is pushes, then yeah, I think it
is not really doing anything.

> I guess if we fail the receive-pack because the pack is bigger than
> receive.maxInputSize, then the "quarantine" directory will also be
> removed, so the part of the pack that we received before failing the
> receive-pack will be deleted.

Correct. _Any_ failure up to the tmp_objdir_migrate() call will drop the
objects. So that includes index-pack failing for any reason.

> >     These two patches set that up by letting index-pack and pre-receive
> >     know that quarantine path and use it to store arbitrary files that
> >     _don't_ get migrated to the main object database (i.e., the log file
> >     mentioned above).
> 
> It would be nice to have a diffstat for the whole series.

You mean in the cover letter? I do not mind including it if people find
them useful, but I personally have always just found them to be clutter
at that level.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/6] sha1_file: always allow relative paths to alternates
  2016-10-02  9:07   ` René Scharfe
@ 2016-10-02 13:03     ` Jeff King
  2016-10-02 15:38     ` Jeff King
  1 sibling, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-10-02 13:03 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, David Turner

On Sun, Oct 02, 2016 at 11:07:39AM +0200, René Scharfe wrote:

> Am 30.09.2016 um 21:36 schrieb Jeff King:
> > We adjust the test script here to demonstrate that this now
> > works. Unfortunately, we can't demonstrate that the
> > duplicate is suppressed, since it has no user-visible
> > behavior (it's just one less place for our object lookups to
> > go). But you can verify it manually via gdb, with something
> > like:
> > 
> >     for i in a b c; do
> >             git init --bare $i
> >             blob=$(echo $i | git -C $i hash-object -w --stdin)
> >     done
> >     echo "../../b/objects" >a/objects/info/alternates
> >     echo "../../c/objects" >>a/objects/info/alternates
> >     echo "../../c/objects" >b/objects/info/alternates
> >     gdb --args git cat-file -e $blob
> > 
> > After prepare_alt_odb() runs, we have only a single copy of
> > "/path/to/c/objects/" in the alt_odb list.
> 
> A better way would be to provide a UI for that.  We could easily bolt
> it to the side of count-objects like in the patch below.  Feels a bit
> hackish, though.

Yeah, I thought about something like that, but I'm a bit hesitant to add
public interfaces that exist just for testing. OTOH, "count-objects
--list-alternates" is conceivably a useful debugging tool for "did I set
up my alternates correctly?".

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/6] sha1_file: always allow relative paths to alternates
  2016-10-02  9:07   ` René Scharfe
  2016-10-02 13:03     ` Jeff King
@ 2016-10-02 15:38     ` Jeff King
  2016-10-02 16:59       ` Jeff King
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff King @ 2016-10-02 15:38 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, David Turner

On Sun, Oct 02, 2016 at 11:07:39AM +0200, René Scharfe wrote:

> diff --git a/builtin/count-objects.c b/builtin/count-objects.c
> index ba92919..b2afe36 100644
> --- a/builtin/count-objects.c
> +++ b/builtin/count-objects.c
> @@ -73,6 +73,12 @@ static int count_cruft(const char *basename, const char *path, void *data)
>  	return 0;
>  }
>  
> +static int print_alt_odb(struct alternate_object_database *alt, void *data)
> +{
> +	puts(alt->base);
> +	return 0;
> +}

This turns out to be wrong, because alt->base _isn't_ just the base;
it's also the scratch buffer we write into to form pathnames. So if
we've used the buffer to look up an object, we'll get that object name
here, not just the base.

It tends to work for your command because we do nothing except list the
alternates and exit, but I'm not sure if there are code paths which
might access an object.

I think giving a known state to the callback should be the
responsibility of foreach_alt_odb(), though.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/6] sha1_file: always allow relative paths to alternates
  2016-10-02 15:38     ` Jeff King
@ 2016-10-02 16:59       ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-10-02 16:59 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, David Turner

On Sun, Oct 02, 2016 at 11:38:56AM -0400, Jeff King wrote:

> On Sun, Oct 02, 2016 at 11:07:39AM +0200, René Scharfe wrote:
> 
> > diff --git a/builtin/count-objects.c b/builtin/count-objects.c
> > index ba92919..b2afe36 100644
> > --- a/builtin/count-objects.c
> > +++ b/builtin/count-objects.c
> > @@ -73,6 +73,12 @@ static int count_cruft(const char *basename, const char *path, void *data)
> >  	return 0;
> >  }
> >  
> > +static int print_alt_odb(struct alternate_object_database *alt, void *data)
> > +{
> > +	puts(alt->base);
> > +	return 0;
> > +}
> 
> This turns out to be wrong, because alt->base _isn't_ just the base;
> it's also the scratch buffer we write into to form pathnames. So if
> we've used the buffer to look up an object, we'll get that object name
> here, not just the base.
> 
> It tends to work for your command because we do nothing except list the
> alternates and exit, but I'm not sure if there are code paths which
> might access an object.
> 
> I think giving a known state to the callback should be the
> responsibility of foreach_alt_odb(), though.

Actually, it looks like there are a lot of opportunities for cleanups in
that area, and maybe some bugfixes (e.g., we should consistently be
using fspathcmp and not memcmp to check for duplicate paths). I started
on a series, but don't have anything to show yet (just FYI in case you
were thinking about doing the same).

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/6] receive-pack: quarantine pushed objects
  2016-10-02 13:02   ` Jeff King
@ 2016-10-03  6:45     ` Christian Couder
  0 siblings, 0 replies; 32+ messages in thread
From: Christian Couder @ 2016-10-03  6:45 UTC (permalink / raw)
  To: Jeff King; +Cc: git, David Turner

On Sun, Oct 2, 2016 at 3:02 PM, Jeff King <peff@peff.net> wrote:
> On Sun, Oct 02, 2016 at 11:20:59AM +0200, Christian Couder wrote:
>
>> I wonder if the patch you sent in:
>>
>> https://public-inbox.org/git/20160816144642.5ikkta4l5hyx6act@sigill.intra.peff.net/
>>
>> is still useful or not.
>
> It is potentially still useful for other code paths besides
> receive-pack. But if the main concern is pushes, then yeah, I think it
> is not really doing anything.
>
>> I guess if we fail the receive-pack because the pack is bigger than
>> receive.maxInputSize, then the "quarantine" directory will also be
>> removed, so the part of the pack that we received before failing the
>> receive-pack will be deleted.
>
> Correct. _Any_ failure up to the tmp_objdir_migrate() call will drop the
> objects. So that includes index-pack failing for any reason.

Great, thanks for explaining!

>> >     These two patches set that up by letting index-pack and pre-receive
>> >     know that quarantine path and use it to store arbitrary files that
>> >     _don't_ get migrated to the main object database (i.e., the log file
>> >     mentioned above).
>>
>> It would be nice to have a diffstat for the whole series.
>
> You mean in the cover letter? I do not mind including it if people find
> them useful, but I personally have always just found them to be clutter
> at that level.

I think it can help to quickly get an idea about what the series
impacts, and it would have made it easier for me to see that the
changes in the patch you sent previously
(https://public-inbox.org/git/20160816144642.5ikkta4l5hyx6act@sigill.intra.peff.net/)
are not part of this series.

Thanks anyway,
Christian.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 0/5] receive-pack: quarantine pushed objects
  2016-09-30 19:35 [PATCH 0/6] receive-pack: quarantine pushed objects Jeff King
                   ` (6 preceding siblings ...)
  2016-10-02  9:20 ` [PATCH 0/6] receive-pack: quarantine pushed objects Christian Couder
@ 2016-10-03 20:48 ` Jeff King
  2016-10-03 20:49   ` [PATCH v2 1/5] check_connected: accept an env argument Jeff King
                     ` (5 more replies)
  7 siblings, 6 replies; 32+ messages in thread
From: Jeff King @ 2016-10-03 20:48 UTC (permalink / raw)
  To: git; +Cc: David Turner

Here's a v2; the original was at:

  http://public-inbox.org/git/20160930193533.ynbepaago6oycg5t@sigill.intra.peff.net/

which contains the rationale.  The required alternate-objects patches
(both the "allow recursive relative" one, and the helper to add an
internal alt-odb) have been pushed into their own series, that I posted
here:

  http://public-inbox.org/git/20161003203321.rj5jepviwo57uhqw@sigill.intra.peff.net/

This needs to be applied on top.

Other than that, v2 just fixes the minor issues raised on the list
(missing "static", and some clarifying comments).

  [1/5]: check_connected: accept an env argument
  [2/5]: tmp-objdir: introduce API for temporary object directories
  [3/5]: receive-pack: quarantine objects until pre-receive accepts
  [4/5]: tmp-objdir: put quarantine information in the environment
  [5/5]: tmp-objdir: do not migrate files starting with '.'

 Makefile                   |   1 +
 builtin/receive-pack.c     |  41 ++++++-
 cache.h                    |   1 +
 connected.c                |   1 +
 connected.h                |   5 +
 t/t5547-push-quarantine.sh |  36 ++++++
 tmp-objdir.c               | 275 +++++++++++++++++++++++++++++++++++++++++++++
 tmp-objdir.h               |  54 +++++++++
 8 files changed, 413 insertions(+), 1 deletion(-)
 create mode 100755 t/t5547-push-quarantine.sh
 create mode 100644 tmp-objdir.c
 create mode 100644 tmp-objdir.h


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 1/5] check_connected: accept an env argument
  2016-10-03 20:48 ` [PATCH v2 0/5] " Jeff King
@ 2016-10-03 20:49   ` Jeff King
  2016-10-05 19:01     ` Jakub Narębski
  2016-10-03 20:49   ` [PATCH v2 2/5] tmp-objdir: introduce API for temporary object directories Jeff King
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2016-10-03 20:49 UTC (permalink / raw)
  To: git; +Cc: David Turner

This lets callers influence the environment seen by
rev-list, which will be useful when we start providing
quarantined objects.

Signed-off-by: Jeff King <peff@peff.net>
---
 connected.c | 1 +
 connected.h | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/connected.c b/connected.c
index 8e3e4b1..136c2ac 100644
--- a/connected.c
+++ b/connected.c
@@ -63,6 +63,7 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
 				 _("Checking connectivity"));
 
 	rev_list.git_cmd = 1;
+	rev_list.env = opt->env;
 	rev_list.in = -1;
 	rev_list.no_stdout = 1;
 	if (opt->err_fd)
diff --git a/connected.h b/connected.h
index afa48cc..4ca325f 100644
--- a/connected.h
+++ b/connected.h
@@ -33,6 +33,11 @@ struct check_connected_options {
 
 	/* If non-zero, show progress as we traverse the objects. */
 	int progress;
+
+	/*
+	 * Insert these variables into the environment of the child process.
+	 */
+	const char **env;
 };
 
 #define CHECK_CONNECTED_INIT { 0 }
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 2/5] tmp-objdir: introduce API for temporary object directories
  2016-10-03 20:48 ` [PATCH v2 0/5] " Jeff King
  2016-10-03 20:49   ` [PATCH v2 1/5] check_connected: accept an env argument Jeff King
@ 2016-10-03 20:49   ` Jeff King
  2016-10-03 20:49   ` [PATCH v2 3/5] receive-pack: quarantine objects until pre-receive accepts Jeff King
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-10-03 20:49 UTC (permalink / raw)
  To: git; +Cc: David Turner

Once objects are added to the object database by a process,
they cannot easily be deleted, as we don't know what other
processes may have started referencing them. We have to
clean them up with git-gc, which will apply the usual
reachability and grace-period checks.

This patch provides an alternative: it helps callers create
a temporary directory inside the object directory, and a
temporary environment which can be passed to sub-programs to
ask them to write there (the original object directory
remains accessible as an alternate of the temporary one).

See tmp-objdir.h for details on the API.

Signed-off-by: Jeff King <peff@peff.net>
---
 Makefile     |   1 +
 tmp-objdir.c | 273 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tmp-objdir.h |  54 ++++++++++++
 3 files changed, 328 insertions(+)
 create mode 100644 tmp-objdir.c
 create mode 100644 tmp-objdir.h

diff --git a/Makefile b/Makefile
index 1aad150..4e3becb 100644
--- a/Makefile
+++ b/Makefile
@@ -831,6 +831,7 @@ LIB_OBJS += submodule-config.o
 LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
 LIB_OBJS += tempfile.o
+LIB_OBJS += tmp-objdir.o
 LIB_OBJS += trace.o
 LIB_OBJS += trailer.o
 LIB_OBJS += transport.o
diff --git a/tmp-objdir.c b/tmp-objdir.c
new file mode 100644
index 0000000..9443868
--- /dev/null
+++ b/tmp-objdir.c
@@ -0,0 +1,273 @@
+#include "cache.h"
+#include "tmp-objdir.h"
+#include "dir.h"
+#include "sigchain.h"
+#include "string-list.h"
+#include "strbuf.h"
+#include "argv-array.h"
+
+struct tmp_objdir {
+	struct strbuf path;
+	struct argv_array env;
+};
+
+/*
+ * Allow only one tmp_objdir at a time in a running process, which simplifies
+ * our signal/atexit cleanup routines.  It's doubtful callers will ever need
+ * more than one, and we can expand later if so.  You can have many such
+ * tmp_objdirs simultaneously in many processes, of course.
+ */
+static struct tmp_objdir *the_tmp_objdir;
+
+static void tmp_objdir_free(struct tmp_objdir *t)
+{
+	strbuf_release(&t->path);
+	argv_array_clear(&t->env);
+	free(t);
+}
+
+static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
+{
+	int err;
+
+	if (!t)
+		return 0;
+
+	if (t == the_tmp_objdir)
+		the_tmp_objdir = NULL;
+
+	/*
+	 * This may use malloc via strbuf_grow(), but we should
+	 * have pre-grown t->path sufficiently so that this
+	 * doesn't happen in practice.
+	 */
+	err = remove_dir_recursively(&t->path, 0);
+
+	/*
+	 * When we are cleaning up due to a signal, we won't bother
+	 * freeing memory; it may cause a deadlock if the signal
+	 * arrived while libc's allocator lock is held.
+	 */
+	if (!on_signal)
+		tmp_objdir_free(t);
+	return err;
+}
+
+int tmp_objdir_destroy(struct tmp_objdir *t)
+{
+	return tmp_objdir_destroy_1(t, 0);
+}
+
+static void remove_tmp_objdir(void)
+{
+	tmp_objdir_destroy(the_tmp_objdir);
+}
+
+static void remove_tmp_objdir_on_signal(int signo)
+{
+	tmp_objdir_destroy_1(the_tmp_objdir, 1);
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+/*
+ * These env_* functions are for setting up the child environment; the
+ * "replace" variant overrides the value of any existing variable with that
+ * "key". The "append" variant puts our new value at the end of a list,
+ * separated by PATH_SEP (which is what separate values in
+ * GIT_ALTERNATE_OBJECT_DIRECTORIES).
+ */
+static void env_append(struct argv_array *env, const char *key, const char *val)
+{
+	const char *old = getenv(key);
+
+	if (!old)
+		argv_array_pushf(env, "%s=%s", key, val);
+	else
+		argv_array_pushf(env, "%s=%s%c%s", key, old, PATH_SEP, val);
+}
+
+static void env_replace(struct argv_array *env, const char *key, const char *val)
+{
+	argv_array_pushf(env, "%s=%s", key, val);
+}
+
+static int setup_tmp_objdir(const char *root)
+{
+	char *path;
+	int ret = 0;
+
+	path = xstrfmt("%s/pack", root);
+	ret = mkdir(path, 0777);
+	free(path);
+
+	return ret;
+}
+
+struct tmp_objdir *tmp_objdir_create(void)
+{
+	static int installed_handlers;
+	struct tmp_objdir *t;
+
+	if (the_tmp_objdir)
+		die("BUG: only one tmp_objdir can be used at a time");
+
+	t = xmalloc(sizeof(*t));
+	strbuf_init(&t->path, 0);
+	argv_array_init(&t->env);
+
+	strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
+
+	/*
+	 * Grow the strbuf beyond any filename we expect to be placed in it.
+	 * If tmp_objdir_destroy() is called by a signal handler, then
+	 * we should be able to use the strbuf to remove files without
+	 * having to call malloc.
+	 */
+	strbuf_grow(&t->path, 1024);
+
+	if (!mkdtemp(t->path.buf)) {
+		/* free, not destroy, as we never touched the filesystem */
+		tmp_objdir_free(t);
+		return NULL;
+	}
+
+	the_tmp_objdir = t;
+	if (!installed_handlers) {
+		atexit(remove_tmp_objdir);
+		sigchain_push_common(remove_tmp_objdir_on_signal);
+		installed_handlers++;
+	}
+
+	if (setup_tmp_objdir(t->path.buf)) {
+		tmp_objdir_destroy(t);
+		return NULL;
+	}
+
+	env_append(&t->env, ALTERNATE_DB_ENVIRONMENT,
+		   absolute_path(get_object_directory()));
+	env_replace(&t->env, DB_ENVIRONMENT, absolute_path(t->path.buf));
+
+	return t;
+}
+
+/*
+ * Make sure we copy packfiles and their associated metafiles in the correct
+ * order. All of these ends_with checks are slightly expensive to do in
+ * the midst of a sorting routine, but in practice it shouldn't matter.
+ * We will have a relatively small number of packfiles to order, and loose
+ * objects exit early in the first line.
+ */
+static int pack_copy_priority(const char *name)
+{
+	if (!starts_with(name, "pack"))
+		return 0;
+	if (ends_with(name, ".keep"))
+		return 1;
+	if (ends_with(name, ".pack"))
+		return 2;
+	if (ends_with(name, ".idx"))
+		return 3;
+	return 4;
+}
+
+static int pack_copy_cmp(const char *a, const char *b)
+{
+	return pack_copy_priority(a) - pack_copy_priority(b);
+}
+
+static int read_dir_paths(struct string_list *out, const char *path)
+{
+	DIR *dh;
+	struct dirent *de;
+
+	dh = opendir(path);
+	if (!dh)
+		return -1;
+
+	while ((de = readdir(dh)))
+		if (!is_dot_or_dotdot(de->d_name))
+			string_list_append(out, de->d_name);
+
+	closedir(dh);
+	return 0;
+}
+
+static int migrate_paths(struct strbuf *src, struct strbuf *dst);
+
+static int migrate_one(struct strbuf *src, struct strbuf *dst)
+{
+	struct stat st;
+
+	if (stat(src->buf, &st) < 0)
+		return -1;
+	if (S_ISDIR(st.st_mode)) {
+		if (!mkdir(dst->buf, 0777)) {
+			if (adjust_shared_perm(dst->buf))
+				return -1;
+		} else if (errno != EEXIST)
+			return -1;
+		return migrate_paths(src, dst);
+	}
+	return finalize_object_file(src->buf, dst->buf);
+}
+
+static int migrate_paths(struct strbuf *src, struct strbuf *dst)
+{
+	size_t src_len = src->len, dst_len = dst->len;
+	struct string_list paths = STRING_LIST_INIT_DUP;
+	int i;
+	int ret = 0;
+
+	if (read_dir_paths(&paths, src->buf) < 0)
+		return -1;
+	paths.cmp = pack_copy_cmp;
+	string_list_sort(&paths);
+
+	for (i = 0; i < paths.nr; i++) {
+		const char *name = paths.items[i].string;
+
+		strbuf_addf(src, "/%s", name);
+		strbuf_addf(dst, "/%s", name);
+
+		ret |= migrate_one(src, dst);
+
+		strbuf_setlen(src, src_len);
+		strbuf_setlen(dst, dst_len);
+	}
+
+	string_list_clear(&paths, 0);
+	return ret;
+}
+
+int tmp_objdir_migrate(struct tmp_objdir *t)
+{
+	struct strbuf src = STRBUF_INIT, dst = STRBUF_INIT;
+	int ret;
+
+	if (!t)
+		return 0;
+
+	strbuf_addbuf(&src, &t->path);
+	strbuf_addstr(&dst, get_object_directory());
+
+	ret = migrate_paths(&src, &dst);
+
+	strbuf_release(&src);
+	strbuf_release(&dst);
+
+	tmp_objdir_destroy(t);
+	return ret;
+}
+
+const char **tmp_objdir_env(const struct tmp_objdir *t)
+{
+	if (!t)
+		return NULL;
+	return t->env.argv;
+}
+
+void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
+{
+	add_to_alternates_memory(t->path.buf);
+}
diff --git a/tmp-objdir.h b/tmp-objdir.h
new file mode 100644
index 0000000..b1e45b4
--- /dev/null
+++ b/tmp-objdir.h
@@ -0,0 +1,54 @@
+#ifndef TMP_OBJDIR_H
+#define TMP_OBJDIR_H
+
+/*
+ * This API allows you to create a temporary object directory, advertise it to
+ * sub-processes via GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES,
+ * and then either migrate its object into the main object directory, or remove
+ * it. The library handles unexpected signal/exit death by cleaning up the
+ * temporary directory.
+ *
+ * Example:
+ *
+ *	struct tmp_objdir *t = tmp_objdir_create();
+ *	if (!run_command_v_opt_cd_env(cmd, 0, NULL, tmp_objdir_env(t)) &&
+ *	    !tmp_objdir_migrate(t))
+ *		printf("success!\n");
+ *	else
+ *		die("failed...tmp_objdir will clean up for us");
+ *
+ */
+
+struct tmp_objdir;
+
+/*
+ * Create a new temporary object directory; returns NULL on failure.
+ */
+struct tmp_objdir *tmp_objdir_create(void);
+
+/*
+ * Return a list of environment strings, suitable for use with
+ * child_process.env, that can be passed to child programs to make use of the
+ * temporary object directory.
+ */
+const char **tmp_objdir_env(const struct tmp_objdir *);
+
+/*
+ * Finalize a temporary object directory by migrating its objects into the main
+ * object database, removing the temporary directory, and freeing any
+ * associated resources.
+ */
+int tmp_objdir_migrate(struct tmp_objdir *);
+
+/*
+ * Destroy a temporary object directory, discarding any objects it contains.
+ */
+int tmp_objdir_destroy(struct tmp_objdir *);
+
+/*
+ * Add the temporary object directory as an alternate object store in the
+ * current process.
+ */
+void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
+
+#endif /* TMP_OBJDIR_H */
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 3/5] receive-pack: quarantine objects until pre-receive accepts
  2016-10-03 20:48 ` [PATCH v2 0/5] " Jeff King
  2016-10-03 20:49   ` [PATCH v2 1/5] check_connected: accept an env argument Jeff King
  2016-10-03 20:49   ` [PATCH v2 2/5] tmp-objdir: introduce API for temporary object directories Jeff King
@ 2016-10-03 20:49   ` Jeff King
  2016-10-03 20:49   ` [PATCH v2 4/5] tmp-objdir: put quarantine information in the environment Jeff King
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-10-03 20:49 UTC (permalink / raw)
  To: git; +Cc: David Turner

When a client pushes objects to us, index-pack checks the
objects themselves and then installs them into place. If we
then reject the push due to a pre-receive hook, we cannot
just delete the packfile; other processes may be depending
on it. We have to do a normal reachability check at this
point via `git gc`.

But such objects may hang around for weeks due to the
gc.pruneExpire grace period. And worse, during that time
they may be exploded from the pack into inefficient loose
objects.

Instead, this patch teaches receive-pack to put the new
objects into a "quarantine" temporary directory. We make
these objects available to the connectivity check and to the
pre-receive hook, and then install them into place only if
it is successful (and otherwise remove them as tempfiles).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/receive-pack.c     | 41 ++++++++++++++++++++++++++++++++++++++++-
 t/t5547-push-quarantine.sh | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100755 t/t5547-push-quarantine.sh

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 896b16f..267d320 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -20,6 +20,7 @@
 #include "gpg-interface.h"
 #include "sigchain.h"
 #include "fsck.h"
+#include "tmp-objdir.h"
 
 static const char * const receive_pack_usage[] = {
 	N_("git receive-pack <git-dir>"),
@@ -86,6 +87,8 @@ static enum {
 } use_keepalive;
 static int keepalive_in_sec = 5;
 
+static struct tmp_objdir *tmp_objdir;
+
 static enum deny_action parse_deny_action(const char *var, const char *value)
 {
 	if (value) {
@@ -663,6 +666,9 @@ static int run_and_feed_hook(const char *hook_name, feed_fn feed,
 	} else
 		argv_array_pushf(&proc.env_array, "GIT_PUSH_OPTION_COUNT");
 
+	if (tmp_objdir)
+		argv_array_pushv(&proc.env_array, tmp_objdir_env(tmp_objdir));
+
 	if (use_sideband) {
 		memset(&muxer, 0, sizeof(muxer));
 		muxer.proc = copy_to_sideband;
@@ -762,6 +768,7 @@ static int run_update_hook(struct command *cmd)
 	proc.stdout_to_stderr = 1;
 	proc.err = use_sideband ? -1 : 0;
 	proc.argv = argv;
+	proc.env = tmp_objdir_env(tmp_objdir);
 
 	code = start_command(&proc);
 	if (code)
@@ -833,6 +840,7 @@ static int update_shallow_ref(struct command *cmd, struct shallow_info *si)
 		    !delayed_reachability_test(si, i))
 			sha1_array_append(&extra, si->shallow->sha1[i]);
 
+	opt.env = tmp_objdir_env(tmp_objdir);
 	setup_alternate_shallow(&shallow_lock, &opt.shallow_file, &extra);
 	if (check_connected(command_singleton_iterator, cmd, &opt)) {
 		rollback_lock_file(&shallow_lock);
@@ -1240,12 +1248,17 @@ static void set_connectivity_errors(struct command *commands,
 
 	for (cmd = commands; cmd; cmd = cmd->next) {
 		struct command *singleton = cmd;
+		struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
 		if (shallow_update && si->shallow_ref[cmd->index])
 			/* to be checked in update_shallow_ref() */
 			continue;
+
+		opt.env = tmp_objdir_env(tmp_objdir);
 		if (!check_connected(command_singleton_iterator, &singleton,
-				     NULL))
+				     &opt))
 			continue;
+
 		cmd->error_string = "missing necessary objects";
 	}
 }
@@ -1428,6 +1441,7 @@ static void execute_commands(struct command *commands,
 	data.si = si;
 	opt.err_fd = err_fd;
 	opt.progress = err_fd && !quiet;
+	opt.env = tmp_objdir_env(tmp_objdir);
 	if (check_connected(iterate_receive_command_list, &data, &opt))
 		set_connectivity_errors(commands, si);
 
@@ -1444,6 +1458,19 @@ static void execute_commands(struct command *commands,
 		return;
 	}
 
+	/*
+	 * Now we'll start writing out refs, which means the objects need
+	 * to be in their final positions so that other processes can see them.
+	 */
+	if (tmp_objdir_migrate(tmp_objdir) < 0) {
+		for (cmd = commands; cmd; cmd = cmd->next) {
+			if (!cmd->error_string)
+				cmd->error_string = "unable to migrate objects to permanent storage";
+		}
+		return;
+	}
+	tmp_objdir = NULL;
+
 	check_aliased_updates(commands);
 
 	free(head_name_to_free);
@@ -1639,6 +1666,18 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		argv_array_push(&child.args, alt_shallow_file);
 	}
 
+	tmp_objdir = tmp_objdir_create();
+	if (!tmp_objdir)
+		return "unable to create temporary object directory";
+	child.env = tmp_objdir_env(tmp_objdir);
+
+	/*
+	 * Normally we just pass the tmp_objdir environment to the child
+	 * processes that do the heavy lifting, but we may need to see these
+	 * objects ourselves to set up shallow information.
+	 */
+	tmp_objdir_add_as_alternate(tmp_objdir);
+
 	if (ntohl(hdr.hdr_entries) < unpack_limit) {
 		argv_array_pushl(&child.args, "unpack-objects", hdr_arg, NULL);
 		if (quiet)
diff --git a/t/t5547-push-quarantine.sh b/t/t5547-push-quarantine.sh
new file mode 100755
index 0000000..1e5d32d
--- /dev/null
+++ b/t/t5547-push-quarantine.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+test_description='check quarantine of objects during push'
+. ./test-lib.sh
+
+test_expect_success 'create picky dest repo' '
+	git init --bare dest.git &&
+	write_script dest.git/hooks/pre-receive <<-\EOF
+	while read old new ref; do
+		test "$(git log -1 --format=%s $new)" = reject && exit 1
+	done
+	exit 0
+	EOF
+'
+
+test_expect_success 'accepted objects work' '
+	test_commit ok &&
+	git push dest.git HEAD &&
+	commit=$(git rev-parse HEAD) &&
+	git --git-dir=dest.git cat-file commit $commit
+'
+
+test_expect_success 'rejected objects are not installed' '
+	test_commit reject &&
+	commit=$(git rev-parse HEAD) &&
+	test_must_fail git push dest.git reject &&
+	test_must_fail git --git-dir=dest.git cat-file commit $commit
+'
+
+test_expect_success 'rejected objects are removed' '
+	echo "incoming-*" >expect &&
+	(cd dest.git/objects && echo incoming-*) >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 4/5] tmp-objdir: put quarantine information in the environment
  2016-10-03 20:48 ` [PATCH v2 0/5] " Jeff King
                     ` (2 preceding siblings ...)
  2016-10-03 20:49   ` [PATCH v2 3/5] receive-pack: quarantine objects until pre-receive accepts Jeff King
@ 2016-10-03 20:49   ` Jeff King
  2016-10-03 20:49   ` [PATCH v2 5/5] tmp-objdir: do not migrate files starting with '.' Jeff King
  2016-10-03 21:25   ` [PATCH v2 0/5] receive-pack: quarantine pushed objects Junio C Hamano
  5 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-10-03 20:49 UTC (permalink / raw)
  To: git; +Cc: David Turner

The presence of the GIT_QUARANTINE_PATH variable lets any
called programs know that they're operating in a temporary
object directory (and where that directory is).

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h      | 1 +
 tmp-objdir.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/cache.h b/cache.h
index 9866e46..fcab658 100644
--- a/cache.h
+++ b/cache.h
@@ -433,6 +433,7 @@ static inline enum object_type object_type(unsigned int mode)
 #define GIT_GLOB_PATHSPECS_ENVIRONMENT "GIT_GLOB_PATHSPECS"
 #define GIT_NOGLOB_PATHSPECS_ENVIRONMENT "GIT_NOGLOB_PATHSPECS"
 #define GIT_ICASE_PATHSPECS_ENVIRONMENT "GIT_ICASE_PATHSPECS"
+#define GIT_QUARANTINE_ENVIRONMENT "GIT_QUARANTINE_PATH"
 
 /*
  * This environment variable is expected to contain a boolean indicating
diff --git a/tmp-objdir.c b/tmp-objdir.c
index 9443868..780af8e 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -147,6 +147,8 @@ struct tmp_objdir *tmp_objdir_create(void)
 	env_append(&t->env, ALTERNATE_DB_ENVIRONMENT,
 		   absolute_path(get_object_directory()));
 	env_replace(&t->env, DB_ENVIRONMENT, absolute_path(t->path.buf));
+	env_replace(&t->env, GIT_QUARANTINE_ENVIRONMENT,
+		    absolute_path(t->path.buf));
 
 	return t;
 }
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 5/5] tmp-objdir: do not migrate files starting with '.'
  2016-10-03 20:48 ` [PATCH v2 0/5] " Jeff King
                     ` (3 preceding siblings ...)
  2016-10-03 20:49   ` [PATCH v2 4/5] tmp-objdir: put quarantine information in the environment Jeff King
@ 2016-10-03 20:49   ` Jeff King
  2016-10-03 21:25   ` [PATCH v2 0/5] receive-pack: quarantine pushed objects Junio C Hamano
  5 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-10-03 20:49 UTC (permalink / raw)
  To: git; +Cc: David Turner

This avoids "." and "..", as we already do, but also leaves
room for index-pack to store extra data in the quarantine
area (e.g., for passing back any analysis to be read by the
pre-receive hook).

Signed-off-by: Jeff King <peff@peff.net>
---
 tmp-objdir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tmp-objdir.c b/tmp-objdir.c
index 780af8e..64435f2 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -188,7 +188,7 @@ static int read_dir_paths(struct string_list *out, const char *path)
 		return -1;
 
 	while ((de = readdir(dh)))
-		if (!is_dot_or_dotdot(de->d_name))
+		if (de->d_name[0] != '.')
 			string_list_append(out, de->d_name);
 
 	closedir(dh);
-- 
2.10.0.618.g82cc264

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/5] receive-pack: quarantine pushed objects
  2016-10-03 20:48 ` [PATCH v2 0/5] " Jeff King
                     ` (4 preceding siblings ...)
  2016-10-03 20:49   ` [PATCH v2 5/5] tmp-objdir: do not migrate files starting with '.' Jeff King
@ 2016-10-03 21:25   ` Junio C Hamano
  2016-10-03 21:28     ` Jeff King
  5 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2016-10-03 21:25 UTC (permalink / raw)
  To: Jeff King; +Cc: git, David Turner

Jeff King <peff@peff.net> writes:

> Here's a v2; the original was at:
>
>   http://public-inbox.org/git/20160930193533.ynbepaago6oycg5t@sigill.intra.peff.net/
>
> which contains the rationale.  The required alternate-objects patches
> (both the "allow recursive relative" one, and the helper to add an
> internal alt-odb) have been pushed into their own series, that I posted
> here:
>
>   http://public-inbox.org/git/20161003203321.rj5jepviwo57uhqw@sigill.intra.peff.net/
>
> This needs to be applied on top.

Sorry, you lost me.  So this 5-patch series comes on top of a brand
new 18-patch clean-up series?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/5] receive-pack: quarantine pushed objects
  2016-10-03 21:25   ` [PATCH v2 0/5] receive-pack: quarantine pushed objects Junio C Hamano
@ 2016-10-03 21:28     ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-10-03 21:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, David Turner

On Mon, Oct 03, 2016 at 02:25:23PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Here's a v2; the original was at:
> >
> >   http://public-inbox.org/git/20160930193533.ynbepaago6oycg5t@sigill.intra.peff.net/
> >
> > which contains the rationale.  The required alternate-objects patches
> > (both the "allow recursive relative" one, and the helper to add an
> > internal alt-odb) have been pushed into their own series, that I posted
> > here:
> >
> >   http://public-inbox.org/git/20161003203321.rj5jepviwo57uhqw@sigill.intra.peff.net/
> >
> > This needs to be applied on top.
> 
> Sorry, you lost me.  So this 5-patch series comes on top of a brand
> new 18-patch clean-up series?

Yes, sorry to be unclear. The final sentence should have been:

  This v2 quarantine series need to be applied on top of that cleanup
  series.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/5] check_connected: accept an env argument
  2016-10-03 20:49   ` [PATCH v2 1/5] check_connected: accept an env argument Jeff King
@ 2016-10-05 19:01     ` Jakub Narębski
  2016-10-05 19:06       ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Narębski @ 2016-10-05 19:01 UTC (permalink / raw)
  To: Jeff King, git; +Cc: David Turner

W dniu 03.10.2016 o 22:49, Jeff King pisze:

> diff --git a/connected.h b/connected.h
> index afa48cc..4ca325f 100644
> --- a/connected.h
> +++ b/connected.h
> @@ -33,6 +33,11 @@ struct check_connected_options {
>  
>  	/* If non-zero, show progress as we traverse the objects. */
>  	int progress;
> +
> +	/*
> +	 * Insert these variables into the environment of the child process.
> +	 */
> +	const char **env;
>  };

Just a nitpick, but I wonder why one comment is in single-line form,
and the other uses block-form with a single line.

-- 
Jakub Narębski, bikeshedding


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/5] check_connected: accept an env argument
  2016-10-05 19:01     ` Jakub Narębski
@ 2016-10-05 19:06       ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-10-05 19:06 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: git, David Turner

On Wed, Oct 05, 2016 at 09:01:57PM +0200, Jakub Narębski wrote:

> > diff --git a/connected.h b/connected.h
> > index afa48cc..4ca325f 100644
> > --- a/connected.h
> > +++ b/connected.h
> > @@ -33,6 +33,11 @@ struct check_connected_options {
> >  
> >  	/* If non-zero, show progress as we traverse the objects. */
> >  	int progress;
> > +
> > +	/*
> > +	 * Insert these variables into the environment of the child process.
> > +	 */
> > +	const char **env;
> >  };
> 
> Just a nitpick, but I wonder why one comment is in single-line form,
> and the other uses block-form with a single line.

I think I wrote something longer originally, and then shortened it
before sending.

I don't generally think it matters much for a case like this (if it were
in the middle of code, I think the shorter form is worth doing, but here
it's basically a header for this variable).

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] receive-pack: quarantine objects until pre-receive accepts
  2016-10-01  9:12   ` Jeff King
@ 2017-04-08 14:53     ` Ævar Arnfjörð Bjarmason
  2017-04-10 21:14       ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 14:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, David Turner

On Sat, Oct 1, 2016 at 11:12 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Sep 30, 2016 at 03:36:30PM -0400, Jeff King wrote:
>
>> @@ -1639,6 +1666,18 @@ static const char *unpack(int err_fd, struct shallow_info *si)
>>               argv_array_push(&child.args, alt_shallow_file);
>>       }
>>
>> +     tmp_objdir = tmp_objdir_create();
>> +     if (!tmp_objdir)
>> +             return "unable to create temporary object directory";
>> +     child.env = tmp_objdir_env(tmp_objdir);
>
> One thing to note here: this new code kicks in all the time. My
> reasoning was that there's basically no time you _wouldn't_ want it to,
> and certainly that was the case for us when I wrote it. But I tried to
> think of user-visible changes. Here's what I came up with:
>
>   - we currently leave the tmp_pack_* for a failed push sitting around
>     (e.g., if the client hangs up halfway through, or index-pack rejects
>     the pack for some reason). But with this series, it would always be
>     cleaned up. That's a very good thing if you're running a git hosting
>     site. It might make things harder if you're debugging.
>
>     I don't think it's a good reason not to enable this by default, but
>     it _could_ be a reason to have a config switch to turn it off
>     temporarily (or just leave the "incoming-*" directory in place).
>
>   - the environment that pre-receive pack runs in has access to objects
>     that the rest of the repository doesn't. So if you were to do
>     something silly in your pre-receive like:
>
>       # reject the push, but log a copy of the objects
>       git update-ref refs/rejected/$(date +%s) $new_sha1
>       exit 1
>
>     Then your ref-update would succeed (you have $new_sha1), but the
>     objects would be deleted immediately afterwards. I find this a
>     somewhat questionable pattern, and I have no idea if anybody else
>     has thought of it. But it _does_ work today, and not with this
>     series.
>
> I don't think it would be too hard to put a config conditional around
> this tmp_objdir_create(). And then all of the tmp_objdir_env() calls
> would just return NULL, and effectively become noops.

Very late reply, but I have a question about this. Is there anything
you can do on the plumbing level to figure out which area an object is
in (of course that's not mutually exclusive).

The use-case for that is e.g. having a hook that rejects large
binaries, but has an exception for a binary that's been added in the
past, before your change there's no optimal way to find that out,
you'd need to go over the whole history and list all blobs that have
ever been added, with your change it would be trivial if something
could give me "this object is in the quarantine area", but AFAICT
there's no API that'll show me that.

Also, I think this whole thing could really do with some documentation
in githooks(5). E.g. what hooks does it apply for? The test is just
for pre-receive but the patch changes run_update_hook(), does it also
take effect for update? Ditto the caveat about update-ref.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] receive-pack: quarantine objects until pre-receive accepts
  2017-04-08 14:53     ` Ævar Arnfjörð Bjarmason
@ 2017-04-10 21:14       ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2017-04-10 21:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List, David Turner

On Sat, Apr 08, 2017 at 04:53:38PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Very late reply, but I have a question about this. Is there anything
> you can do on the plumbing level to figure out which area an object is
> in (of course that's not mutually exclusive).
> 
> The use-case for that is e.g. having a hook that rejects large
> binaries, but has an exception for a binary that's been added in the
> past, before your change there's no optimal way to find that out,
> you'd need to go over the whole history and list all blobs that have
> ever been added, with your change it would be trivial if something
> could give me "this object is in the quarantine area", but AFAICT
> there's no API that'll show me that.

No, I don't think there's a way to ask that. The best API to add it to
would probably be "cat-file --batch-check", which can tell you similar
things about the on-disk storage, like "is it a delta? against what?"
and the on-disk size.

_But_. Like those other things, it would have the big caveat that the
object may appear multiple times. So if somebody sent you an object that
was a duplicate of an existing one, you may or may not see it as being
in the quarantine directory.

I think a better rule for hooks is to complain about history which
introduces new references to problematic objects. IOW, instead of
looking at the object database, actually see what the history
introduces:

  while read old new refname; do
	echo "$new"
	if test "$old" != 0000000000000000000000000000000000000000; then
		echo "^$old"
	fi
  done |
  git rev-list --objects --stdin |
  while read sha1 fn; do
    # various object checks against $sha1
  done

However, if large binaries are your interest, I will suggest one other
thing: it's much cheaper to find them during index-pack, rather than by
traversing history. At GitHub we run with a patch to index-pack that
dumps the sha1s of any large blobs into .large-objects in the quarantine
directory. You can abort there, of course, but it's hard to produce a
implement any other policy or produce a nice message. But if you just
note them and leave it to the pre-receive hook, then that hook can do
the traversal to find, e.g., which pathnames reference the large blobs.
And it only has to run at all when we know there's at least one such
blob (and it can quit immediately when it finds it).

I can share those patches, but they need some cleanup to be ready for
inclusion.

> Also, I think this whole thing could really do with some documentation
> in githooks(5). E.g. what hooks does it apply for? The test is just
> for pre-receive but the patch changes run_update_hook(), does it also
> take effect for update? Ditto the caveat about update-ref.

My thinking was that the cases where the effects were user-visible were
sufficiently insane that nobody would need to know or care when the
feature was in use.

It isn't in use for the update hook. It can't be, because that is run
per-ref. So the objects need to be pulled out of the quarantine before
the first ref is updated (I think the call in run_update_hook is
leftover cruft from before I realized that; it should be a noop because
we'll always have migrated the objects by then, and tmp_objdir_env()
will just return NULL).

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2017-04-10 21:14 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-30 19:35 [PATCH 0/6] receive-pack: quarantine pushed objects Jeff King
2016-09-30 19:35 ` [PATCH 1/6] check_connected: accept an env argument Jeff King
2016-09-30 19:36 ` [PATCH 2/6] sha1_file: always allow relative paths to alternates Jeff King
2016-10-02  9:07   ` René Scharfe
2016-10-02 13:03     ` Jeff King
2016-10-02 15:38     ` Jeff King
2016-10-02 16:59       ` Jeff King
2016-09-30 19:36 ` [PATCH 3/6] tmp-objdir: introduce API for temporary object directories Jeff King
2016-09-30 21:25   ` Junio C Hamano
2016-09-30 22:13     ` Jeff King
2016-09-30 21:32   ` David Turner
2016-09-30 22:44     ` Jeff King
2016-09-30 23:07       ` David Turner
2016-09-30 19:36 ` [PATCH 4/6] receive-pack: quarantine objects until pre-receive accepts Jeff King
2016-10-01  9:12   ` Jeff King
2017-04-08 14:53     ` Ævar Arnfjörð Bjarmason
2017-04-10 21:14       ` Jeff King
2016-09-30 19:36 ` [PATCH 5/6] tmp-objdir: put quarantine information in the environment Jeff King
2016-09-30 19:36 ` [PATCH 6/6] tmp-objdir: do not migrate files starting with '.' Jeff King
2016-10-02  9:20 ` [PATCH 0/6] receive-pack: quarantine pushed objects Christian Couder
2016-10-02 13:02   ` Jeff King
2016-10-03  6:45     ` Christian Couder
2016-10-03 20:48 ` [PATCH v2 0/5] " Jeff King
2016-10-03 20:49   ` [PATCH v2 1/5] check_connected: accept an env argument Jeff King
2016-10-05 19:01     ` Jakub Narębski
2016-10-05 19:06       ` Jeff King
2016-10-03 20:49   ` [PATCH v2 2/5] tmp-objdir: introduce API for temporary object directories Jeff King
2016-10-03 20:49   ` [PATCH v2 3/5] receive-pack: quarantine objects until pre-receive accepts Jeff King
2016-10-03 20:49   ` [PATCH v2 4/5] tmp-objdir: put quarantine information in the environment Jeff King
2016-10-03 20:49   ` [PATCH v2 5/5] tmp-objdir: do not migrate files starting with '.' Jeff King
2016-10-03 21:25   ` [PATCH v2 0/5] receive-pack: quarantine pushed objects Junio C Hamano
2016-10-03 21:28     ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).