git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC/PATCH 0/3] limit the size of the packs we receive
@ 2016-08-15 19:57 Christian Couder
  2016-08-15 19:57 ` [RFC/PATCH 1/3] index-pack: add --max-input-size=<size> option Christian Couder
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Christian Couder @ 2016-08-15 19:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder

In https://public-inbox.org/git/20150612182045.GA23698%40peff.net/,
Peff sent a patch that is used by GitHub to abort `git receive-pack`
when the size of the pack we receive is bigger than a configured
limit.

GitLab is interested in using the same approach and in standardizing
the error messages the user could get back.

So I rebased Peff's patch to the current master, refreshed it a bit,
split it, and added the missing --max-input-size=<size> option to
`git unpack-objects` - to make it work for all `transfer.unpacklimit`
values - in a new patch.

There is no documentation yet for the `--max-input-size=<size>`
options added to `git index-pack` and `git unpack-objects`, nor for
the new `receive.maxsize` config option.

I kept Peff as the author of the patches that are made mostly from his
patch, but I added my Signed-off-by to them.

Christian Couder (1):
  unpack-objects: add --max-input-size=<size> option

Jeff King (2):
  index-pack: add --max-input-size=<size> option
  receive-pack: allow a maximum input size to be specified

 builtin/index-pack.c     |  5 +++++
 builtin/receive-pack.c   | 12 ++++++++++++
 builtin/unpack-objects.c |  7 +++++++
 t/t5546-push-limits.sh   | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+)
 create mode 100755 t/t5546-push-limits.sh

-- 
2.10.0.rc0.4.g229e32c.dirty


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC/PATCH 1/3] index-pack: add --max-input-size=<size> option
  2016-08-15 19:57 [RFC/PATCH 0/3] limit the size of the packs we receive Christian Couder
@ 2016-08-15 19:57 ` Christian Couder
  2016-08-15 20:10   ` Jeff King
  2016-08-15 19:57 ` [RFC/PATCH 2/3] unpack-objects: " Christian Couder
  2016-08-15 19:57 ` [RFC/PATCH 3/3] receive-pack: allow a maximum input size to be specified Christian Couder
  2 siblings, 1 reply; 11+ messages in thread
From: Christian Couder @ 2016-08-15 19:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder

From: Jeff King <peff@peff.net>

When receiving a pack-file, it can be useful to abort the
`git index-pack`, if the pack-file is too big.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/index-pack.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 1d2ea58..1fd60bd 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -87,6 +87,7 @@ static struct progress *progress;
 static unsigned char input_buffer[4096];
 static unsigned int input_offset, input_len;
 static off_t consumed_bytes;
+static off_t max_input_size;
 static unsigned deepest_delta;
 static git_SHA_CTX input_ctx;
 static uint32_t input_crc32;
@@ -297,6 +298,8 @@ static void use(int bytes)
 	if (signed_add_overflows(consumed_bytes, bytes))
 		die(_("pack too large for current definition of off_t"));
 	consumed_bytes += bytes;
+	if (max_input_size && consumed_bytes > max_input_size)
+		die(_("pack exceeds maximum allowed size"));
 }
 
 static const char *open_pack_file(const char *pack_name)
@@ -1714,6 +1717,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 					opts.off32_limit = strtoul(c+1, &c, 0);
 				if (*c || opts.off32_limit & 0x80000000)
 					die(_("bad %s"), arg);
+			} else if (skip_prefix(arg, "--max-input-size=", &arg)) {
+				max_input_size = strtoul(arg, NULL, 10);
 			} else
 				usage(index_pack_usage);
 			continue;
-- 
2.10.0.rc0.4.g229e32c.dirty


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC/PATCH 2/3] unpack-objects: add --max-input-size=<size> option
  2016-08-15 19:57 [RFC/PATCH 0/3] limit the size of the packs we receive Christian Couder
  2016-08-15 19:57 ` [RFC/PATCH 1/3] index-pack: add --max-input-size=<size> option Christian Couder
@ 2016-08-15 19:57 ` Christian Couder
  2016-08-15 20:11   ` Jeff King
  2016-08-15 19:57 ` [RFC/PATCH 3/3] receive-pack: allow a maximum input size to be specified Christian Couder
  2 siblings, 1 reply; 11+ messages in thread
From: Christian Couder @ 2016-08-15 19:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder

When receiving a pack-file, it can be useful to abort the
`git unpack-objects`, if the pack-file is too big.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/unpack-objects.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 172470b..59b1f39 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -19,6 +19,7 @@ static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]
 static unsigned char buffer[4096];
 static unsigned int offset, len;
 static off_t consumed_bytes;
+static off_t max_input_size;
 static git_SHA_CTX ctx;
 static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 
@@ -87,6 +88,8 @@ static void use(int bytes)
 	if (signed_add_overflows(consumed_bytes, bytes))
 		die("pack too large for current definition of off_t");
 	consumed_bytes += bytes;
+	if (max_input_size && consumed_bytes > max_input_size)
+		die(_("pack exceeds maximum allowed size"));
 }
 
 static void *get_data(unsigned long size)
@@ -550,6 +553,10 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix)
 				len = sizeof(*hdr);
 				continue;
 			}
+			if (skip_prefix(arg, "--max-input-size=", &arg)) {
+				max_input_size = strtoul(arg, NULL, 10);
+				continue;
+			}
 			usage(unpack_usage);
 		}
 
-- 
2.10.0.rc0.4.g229e32c.dirty


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC/PATCH 3/3] receive-pack: allow a maximum input size to be specified
  2016-08-15 19:57 [RFC/PATCH 0/3] limit the size of the packs we receive Christian Couder
  2016-08-15 19:57 ` [RFC/PATCH 1/3] index-pack: add --max-input-size=<size> option Christian Couder
  2016-08-15 19:57 ` [RFC/PATCH 2/3] unpack-objects: " Christian Couder
@ 2016-08-15 19:57 ` Christian Couder
  2016-08-15 20:40   ` Jeff King
  2 siblings, 1 reply; 11+ messages in thread
From: Christian Couder @ 2016-08-15 19:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder

From: Jeff King <peff@peff.net>

Receive-pack feeds its input to either index-pack or
unpack-objects, which will happily accept as many bytes as
a sender is willing to provide. Let's allow an arbitrary
cutoff point where we will stop writing bytes to disk.

What has already been written to disk can be cleaned
outside of receive-pack.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/receive-pack.c | 12 ++++++++++++
 t/t5546-push-limits.sh | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)
 create mode 100755 t/t5546-push-limits.sh

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 92e1213..7627f7f 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -46,6 +46,7 @@ static int transfer_unpack_limit = -1;
 static int advertise_atomic_push = 1;
 static int advertise_push_options;
 static int unpack_limit = 100;
+static off_t max_input_size;
 static int report_status;
 static int use_sideband;
 static int use_atomic;
@@ -212,6 +213,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (strcmp(var, "receive.maxsize") == 0) {
+		max_input_size = git_config_ulong(var, value);
+		return 0;
+	}
+
 	return git_default_config(var, value, cb);
 }
 
@@ -1650,6 +1656,9 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		if (fsck_objects)
 			argv_array_pushf(&child.args, "--strict%s",
 				fsck_msg_types.buf);
+		if (max_input_size)
+			argv_array_pushf(&child.args, "--max-input-size=%lu",
+				max_input_size);
 		child.no_stdout = 1;
 		child.err = err_fd;
 		child.git_cmd = 1;
@@ -1678,6 +1687,9 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 				fsck_msg_types.buf);
 		if (!reject_thin)
 			argv_array_push(&child.args, "--fix-thin");
+		if (max_input_size)
+			argv_array_pushf(&child.args, "--max-input-size=%lu",
+				max_input_size);
 		child.out = -1;
 		child.err = err_fd;
 		child.git_cmd = 1;
diff --git a/t/t5546-push-limits.sh b/t/t5546-push-limits.sh
new file mode 100755
index 0000000..d3a4d1a
--- /dev/null
+++ b/t/t5546-push-limits.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+
+test_description='check input limits for pushing'
+. ./test-lib.sh
+
+test_expect_success 'create known-size commit' '
+	test-genrandom foo 1024 >file &&
+	git add file &&
+	test_commit one-k
+'
+
+test_expect_success 'create remote repository' '
+	git init --bare dest &&
+	git --git-dir=dest config receive.unpacklimit 1
+'
+
+test_expect_success 'receive.maxsize rejects push' '
+	git --git-dir=dest config receive.maxsize 512 &&
+	test_must_fail git push dest HEAD
+'
+
+test_expect_success 'bumping limit allows push' '
+	git --git-dir=dest config receive.maxsize 4k &&
+	git push dest HEAD
+'
+
+test_expect_success 'create another known-size commit' '
+	test-genrandom bar 2048 >file2 &&
+	git add file2 &&
+	test_commit two-k
+'
+
+test_expect_success 'change unpacklimit' '
+	git --git-dir=dest config receive.unpacklimit 10
+'
+
+test_expect_success 'receive.maxsize rejects push' '
+	git --git-dir=dest config receive.maxsize 512 &&
+	test_must_fail git push dest HEAD
+'
+
+test_expect_success 'bumping limit allows push' '
+	git --git-dir=dest config receive.maxsize 4k &&
+	git push dest HEAD
+'
+
+test_done
-- 
2.10.0.rc0.4.g229e32c.dirty


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC/PATCH 1/3] index-pack: add --max-input-size=<size> option
  2016-08-15 19:57 ` [RFC/PATCH 1/3] index-pack: add --max-input-size=<size> option Christian Couder
@ 2016-08-15 20:10   ` Jeff King
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2016-08-15 20:10 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Junio C Hamano, Christian Couder

On Mon, Aug 15, 2016 at 09:57:27PM +0200, Christian Couder wrote:

> From: Jeff King <peff@peff.net>
> 
> When receiving a pack-file, it can be useful to abort the
> `git index-pack`, if the pack-file is too big.

Not much rationale here. I guess because it is all in the 3rd patch,
which ties it into receive-pack. I'm not sure it's worth repeating. I
guess it could all be squished back into one patch. I'm OK either way.

> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 1d2ea58..1fd60bd 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -87,6 +87,7 @@ static struct progress *progress;
>  static unsigned char input_buffer[4096];
>  static unsigned int input_offset, input_len;
>  static off_t consumed_bytes;
> +static off_t max_input_size;
>  static unsigned deepest_delta;
>  static git_SHA_CTX input_ctx;
>  static uint32_t input_crc32;
> @@ -297,6 +298,8 @@ static void use(int bytes)
>  	if (signed_add_overflows(consumed_bytes, bytes))
>  		die(_("pack too large for current definition of off_t"));
>  	consumed_bytes += bytes;
> +	if (max_input_size && consumed_bytes > max_input_size)
> +		die(_("pack exceeds maximum allowed size"));

Looks good. I see you marked it for translation, which makes sense.

On the original, I waffled on whether to share the size with the user in
the message. I didn't want to encourage people with "oh, if it's under
2G it must be OK, then!". Because really 2G was meant to be a "you
really shouldn't get this high, and we will unceremoniously dump your
push if you do".

>  static const char *open_pack_file(const char *pack_name)
> @@ -1714,6 +1717,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>  					opts.off32_limit = strtoul(c+1, &c, 0);
>  				if (*c || opts.off32_limit & 0x80000000)
>  					die(_("bad %s"), arg);
> +			} else if (skip_prefix(arg, "--max-input-size=", &arg)) {
> +				max_input_size = strtoul(arg, NULL, 10);

max_input_size is an off_t, but your parse only up to ULONG_MAX here.
For my purposes in the original patch, this was OK, as we set it at 2GB,
which works everywhere (and also, GitHub systems all have 64-bit "long"
these days). But somebody on a 32-bit system could not set this to 4GB,
even though I think index-pack could otherwise handle it. We seem to use
strtoumax() elsewhere, so that's probably a good match (technically it
can overflow an off_t, but in practice this value comes from the admin
and they will set something sane).

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC/PATCH 2/3] unpack-objects: add --max-input-size=<size> option
  2016-08-15 19:57 ` [RFC/PATCH 2/3] unpack-objects: " Christian Couder
@ 2016-08-15 20:11   ` Jeff King
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2016-08-15 20:11 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Junio C Hamano, Christian Couder

On Mon, Aug 15, 2016 at 09:57:28PM +0200, Christian Couder wrote:

> When receiving a pack-file, it can be useful to abort the
> `git unpack-objects`, if the pack-file is too big.
> 
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>

Same remarks here as on the last patch, including strtoumax. :)

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC/PATCH 3/3] receive-pack: allow a maximum input size to be specified
  2016-08-15 19:57 ` [RFC/PATCH 3/3] receive-pack: allow a maximum input size to be specified Christian Couder
@ 2016-08-15 20:40   ` Jeff King
  2016-08-15 22:48     ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2016-08-15 20:40 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Junio C Hamano, Christian Couder

On Mon, Aug 15, 2016 at 09:57:29PM +0200, Christian Couder wrote:

> From: Jeff King <peff@peff.net>
> 
> Receive-pack feeds its input to either index-pack or
> unpack-objects, which will happily accept as many bytes as
> a sender is willing to provide. Let's allow an arbitrary
> cutoff point where we will stop writing bytes to disk.
> 
> What has already been written to disk can be cleaned
> outside of receive-pack.

This second paragraph hints at a related problem.

Git is generally happy to leave tmp_pack_* around to be cleaned up later
next time git-gc runs. Including its default 2-week grace time.

So imagine that tries to "git push" in a loop. And each time they push,
you say "nope, that's too big". And each time you acquire a new 2GB
tmp_pack file. If your goal was to prevent somebody from streaming
straight to your filesystem and filling up your disk, then it wasn't
very successful. :)

The simple fix is to call register_tempfile() in open_pack_file(), and
just have index-pack clean up the file on its way out.

But there are harder cases. For instance, imagine somebody pushes a
500MB file, and you have a pre-receive hook that says "too big; I won't
accept this". And then they push in a loop, as before. You've accepted
the incoming pack into the repository by the time the pre-receive runs.
You can't just delete it, because you don't know if other simultaneous
processes have started to depend on the objects.

To solve that, I have patches that put incoming packfiles into a
"quarantine" area, then run the connectivity check and pre-receive hooks
with the quarantine accessible via GIT_ALTERNATE_OBJECT_DIRECTORIES. And
then we either move the quarantine packs into the real repo, or blow
away the tmpdir, depending on whether the hooks said the objects were
OK.

Those are patches I plan to share upstream but just haven't gotten
around to yet.

> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index 92e1213..7627f7f 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -46,6 +46,7 @@ static int transfer_unpack_limit = -1;
>  static int advertise_atomic_push = 1;
>  static int advertise_push_options;
>  static int unpack_limit = 100;
> +static off_t max_input_size;
>  static int report_status;
>  static int use_sideband;
>  static int use_atomic;
> @@ -212,6 +213,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
>  		return 0;
>  	}
>  
> +	if (strcmp(var, "receive.maxsize") == 0) {
> +		max_input_size = git_config_ulong(var, value);
> +		return 0;
> +	}

Another off_t/ulong mismatch. I think you want git_config_int64() here.

> @@ -1650,6 +1656,9 @@ static const char *unpack(int err_fd, struct shallow_info *si)
>  		if (fsck_objects)
>  			argv_array_pushf(&child.args, "--strict%s",
>  				fsck_msg_types.buf);
> +		if (max_input_size)
> +			argv_array_pushf(&child.args, "--max-input-size=%lu",
> +				max_input_size);

And here, PRIuMAX and uintmax_t. Or perhaps simpler, just store the
value as a string here and pass it on to index-pack (which would then
need to learn to handle suffixes like "2g"). We do a similar trick in
repack; see b861e23 (repack: propagate pack-objects options as strings,
2014-01-22).

> diff --git a/t/t5546-push-limits.sh b/t/t5546-push-limits.sh
> new file mode 100755
> index 0000000..d3a4d1a
> --- /dev/null
> +++ b/t/t5546-push-limits.sh
> @@ -0,0 +1,47 @@
> +#!/bin/sh
> +
> +test_description='check input limits for pushing'
> +. ./test-lib.sh
> +
> +test_expect_success 'create known-size commit' '
> +	test-genrandom foo 1024 >file &&
> +	git add file &&
> +	test_commit one-k
> +'
> +
> +test_expect_success 'create remote repository' '
> +	git init --bare dest &&
> +	git --git-dir=dest config receive.unpacklimit 1
> +'

We're going to do basically the same battery of tests against an
unpacklimit of "1" (to catch index-pack) and of "10" (to catch
unpack-objects). It might be clearer to just have a for-loop like:

  for unpacklimit in 1 100
  do
	test_expect_success 'create remote repository' '
		rm -rf dest &&
		git init --bare dest &&
		git -C dest config receive.unpacklimit $unpacklimit
	'

	test_expect_success 'receive.maxsize rejects push' '
		git -C dest config receive.maxsize 512 &&
		test_must_fail git push dest HEAD &&
	'

	test_expect_success 'bumping limit allows push' '
		git -C dest config receive.maxsize 4k &&
		git push dest HEAD
	'
  done

and it's probably worth a comment at the top of the loop explaining what
the heck those numbers mean. :)

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC/PATCH 3/3] receive-pack: allow a maximum input size to be specified
  2016-08-15 20:40   ` Jeff King
@ 2016-08-15 22:48     ` Junio C Hamano
  2016-08-16  1:03       ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2016-08-15 22:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Christian Couder, git, Christian Couder

Jeff King <peff@peff.net> writes:

> The simple fix is to call register_tempfile() in open_pack_file(), and
> just have index-pack clean up the file on its way out.
>
> But there are harder cases. For instance, imagine somebody pushes a
> 500MB file, and you have a pre-receive hook that says "too big; I won't
> accept this". And then they push in a loop, as before. You've accepted
> the incoming pack into the repository by the time the pre-receive runs.
> You can't just delete it, because you don't know if other simultaneous
> processes have started to depend on the objects.
>
> To solve that, I have patches that put incoming packfiles into a
> "quarantine" area, then run the connectivity check and pre-receive hooks
> with the quarantine accessible via GIT_ALTERNATE_OBJECT_DIRECTORIES. And
> then we either move the quarantine packs into the real repo, or blow
> away the tmpdir, depending on whether the hooks said the objects were
> OK.
>
> Those are patches I plan to share upstream but just haven't gotten
> around to yet.

I think these other patches can come later, independent from this
three-patch series resurrected from the archive, so I can take a
reroll of these once the integer-size issues you pointed out are
sorted out.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC/PATCH 3/3] receive-pack: allow a maximum input size to be specified
  2016-08-15 22:48     ` Junio C Hamano
@ 2016-08-16  1:03       ` Jeff King
  2016-08-16  8:25         ` Christian Couder
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2016-08-16  1:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christian Couder, git, Christian Couder

On Mon, Aug 15, 2016 at 03:48:55PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The simple fix is to call register_tempfile() in open_pack_file(), and
> > just have index-pack clean up the file on its way out.
> >
> > But there are harder cases. For instance, imagine somebody pushes a
> > 500MB file, and you have a pre-receive hook that says "too big; I won't
> > accept this". And then they push in a loop, as before. You've accepted
> > the incoming pack into the repository by the time the pre-receive runs.
> > You can't just delete it, because you don't know if other simultaneous
> > processes have started to depend on the objects.
> >
> > To solve that, I have patches that put incoming packfiles into a
> > "quarantine" area, then run the connectivity check and pre-receive hooks
> > with the quarantine accessible via GIT_ALTERNATE_OBJECT_DIRECTORIES. And
> > then we either move the quarantine packs into the real repo, or blow
> > away the tmpdir, depending on whether the hooks said the objects were
> > OK.
> >
> > Those are patches I plan to share upstream but just haven't gotten
> > around to yet.
> 
> I think these other patches can come later, independent from this
> three-patch series resurrected from the archive, so I can take a
> reroll of these once the integer-size issues you pointed out are
> sorted out.

Yeah, definitely it's independent. I was mostly suggesting to Christian
that he may want to look into the "easy" register_tempfile() case, as
GitLab may find this is only half of the fix that they want.

Also a patch I may try to polish and share in the future, but not as
high priority as some of the other stuff.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC/PATCH 3/3] receive-pack: allow a maximum input size to be specified
  2016-08-16  1:03       ` Jeff King
@ 2016-08-16  8:25         ` Christian Couder
  2016-08-16 14:46           ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Couder @ 2016-08-16  8:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Christian Couder

On Tue, Aug 16, 2016 at 3:03 AM, Jeff King <peff@peff.net> wrote:
> On Mon, Aug 15, 2016 at 03:48:55PM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>
>> > The simple fix is to call register_tempfile() in open_pack_file(), and
>> > just have index-pack clean up the file on its way out.
>> >
>> > But there are harder cases. For instance, imagine somebody pushes a
>> > 500MB file, and you have a pre-receive hook that says "too big; I won't
>> > accept this". And then they push in a loop, as before. You've accepted
>> > the incoming pack into the repository by the time the pre-receive runs.
>> > You can't just delete it, because you don't know if other simultaneous
>> > processes have started to depend on the objects.
>> >
>> > To solve that, I have patches that put incoming packfiles into a
>> > "quarantine" area, then run the connectivity check and pre-receive hooks
>> > with the quarantine accessible via GIT_ALTERNATE_OBJECT_DIRECTORIES. And
>> > then we either move the quarantine packs into the real repo, or blow
>> > away the tmpdir, depending on whether the hooks said the objects were
>> > OK.

Thanks Peff for the review and all these explanations and suggestions!

>> > Those are patches I plan to share upstream but just haven't gotten
>> > around to yet.

I would be happy to help if I can on these patches too!

>> I think these other patches can come later, independent from this
>> three-patch series resurrected from the archive, so I can take a
>> reroll of these once the integer-size issues you pointed out are
>> sorted out.

Ok, I just sent a non RFC reroll of the series with those issues
hopefully fixed.

> Yeah, definitely it's independent. I was mostly suggesting to Christian
> that he may want to look into the "easy" register_tempfile() case, as
> GitLab may find this is only half of the fix that they want.

Yeah, thanks again for this suggestion!

> Also a patch I may try to polish and share in the future, but not as
> high priority as some of the other stuff.

I can help polishing this patch if you want.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC/PATCH 3/3] receive-pack: allow a maximum input size to be specified
  2016-08-16  8:25         ` Christian Couder
@ 2016-08-16 14:46           ` Jeff King
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2016-08-16 14:46 UTC (permalink / raw)
  To: Christian Couder; +Cc: Junio C Hamano, git, Christian Couder

On Tue, Aug 16, 2016 at 10:25:44AM +0200, Christian Couder wrote:

> >> > Those are patches I plan to share upstream but just haven't gotten
> >> > around to yet.
> 
> I would be happy to help if I can on these patches too!

Thanks. I'll try to extract the quarantine patches, though I have a few
other things in my backlog first. I wrote them with the intent of
upstreaming, so I think they should be fairly clean.

> [register_tempfile on odb tmpfiles]
> > Also a patch I may try to polish and share in the future, but not as
> > high priority as some of the other stuff.
> 
> I can help polishing this patch if you want.

The original patch is not worth looking at; it predated all of the
tempfile.c infrastructure, so it added its own hacky versions those
functions.

This is basically what we're doing now (this is extracted from a larger
diff, so the line numbers may be a little funny):

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 3431de2..9c34353 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -304,8 +346,10 @@ static const char *open_pack_file(const char *pack_name)
 		input_fd = 0;
 		if (!pack_name) {
 			static char tmp_file[PATH_MAX];
+			struct tempfile *t = xcalloc(1, sizeof(*t));
 			output_fd = odb_mkstemp(tmp_file, sizeof(tmp_file),
 						"pack/tmp_pack_XXXXXX");
+			register_tempfile(t, tmp_file);
 			pack_name = xstrdup(tmp_file);
 		} else
 			output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 0600);
diff --git a/pack-write.c b/pack-write.c
index 33293ce..14597b4 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "pack.h"
 #include "csum-file.h"
+#include "tempfile.h"
 
 void reset_pack_idx_option(struct pack_idx_option *opts)
 {
@@ -73,7 +74,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	} else {
 		if (!index_name) {
 			static char tmp_file[PATH_MAX];
+			struct tempfile *t = xcalloc(1, sizeof(*t));
 			fd = odb_mkstemp(tmp_file, sizeof(tmp_file), "pack/tmp_idx_XXXXXX");
+			register_tempfile(t, tmp_file);
 			index_name = xstrdup(tmp_file);
 		} else {
 			unlink(index_name);
@@ -327,10 +330,13 @@ int encode_in_pack_object_header(enum object_type type, uintmax_t size, unsigned
 
 struct sha1file *create_tmp_packfile(char **pack_tmp_name)
 {
+	struct tempfile *t = xcalloc(1, sizeof(*t));
 	char tmpname[PATH_MAX];
 	int fd;
 
 	fd = odb_mkstemp(tmpname, sizeof(tmpname), "pack/tmp_pack_XXXXXX");
+
+	register_tempfile(t, tmpname);
 	*pack_tmp_name = xstrdup(tmpname);
 	return sha1fd(fd, *pack_tmp_name);
 }

But it makes me wonder if we should just be switching odb_mkstemp() to
calling register_tempfile(). Or better yet, switching out
git_mkstemp_mode() for one of the tempfile.h functions (mks_tempfile_m,
I think). There may be some trickery with the allocation of "struct
tempfile" though (the struct needs to persist for the remainder of the
program, but we would not want to allocate and leak one per object, so
we need some way of reusing them).

I also wonder if it should be configurable whether to keep half-written
temporary files. In the early days, it was a good source of debugging
(e.g., look at this half-written pack and see why index-pack choked on
it). But these days I doubt that is really that helpful. So maybe it
would be OK to just drop them unconditionally.

-Peff

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-08-16 14:46 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-15 19:57 [RFC/PATCH 0/3] limit the size of the packs we receive Christian Couder
2016-08-15 19:57 ` [RFC/PATCH 1/3] index-pack: add --max-input-size=<size> option Christian Couder
2016-08-15 20:10   ` Jeff King
2016-08-15 19:57 ` [RFC/PATCH 2/3] unpack-objects: " Christian Couder
2016-08-15 20:11   ` Jeff King
2016-08-15 19:57 ` [RFC/PATCH 3/3] receive-pack: allow a maximum input size to be specified Christian Couder
2016-08-15 20:40   ` Jeff King
2016-08-15 22:48     ` Junio C Hamano
2016-08-16  1:03       ` Jeff King
2016-08-16  8:25         ` Christian Couder
2016-08-16 14:46           ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).