git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] limit the size of the packs we receive
@ 2016-08-16  8:16 Christian Couder
  2016-08-16  8:16 ` [PATCH 1/3] index-pack: add --max-input-size=<size> option Christian Couder
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Christian Couder @ 2016-08-16  8:16 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder

Goal
~~~~

In https://public-inbox.org/git/20150612182045.GA23698%40peff.net/,
Peff sent a patch that is used by GitHub to abort `git receive-pack`
when the size of the pack we receive is bigger than a configured
limit.

GitLab is interested in using the same approach and in standardizing
the error messages the user could get back.

Comments
~~~~~~~~

I kept Peff as the author of the patches that are made mostly from his
patch, but I added my Signed-off-by to them.

Changes from previous RFC version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  - added documentation to all the 3 patches,

  - changed strtoul() to strtoumax() in the first 2 patches, as
    suggested by Peff,

  - changed git_config_ulong() to git_config_int64() and used PRIuMAX
    and uintmax_t in the last patch, as suggested by Peff,

  - reorganized the tests in the last patch, as suggested by Peff

  - improved commit message of the last patch with information from
    Peff

Links
~~~~~

This patch series is available here:

https://github.com/chriscool/git/commits/max-receive

The previous RFC version is here on GitHub:

https://github.com/chriscool/git/commits/max-receive2

and here on the list:

https://public-inbox.org/git/20160815195729.16826-1-chriscool@tuxfamily.org/

Peff's initial patch is:

https://public-inbox.org/git/20150612182045.GA23698%40peff.net/

Diff with previous RFC version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0bcb679..f5b6061 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2517,6 +2517,11 @@ receive.unpackLimit::
 	especially on slow filesystems.  If not set, the value of
 	`transfer.unpackLimit` is used instead.
 
+receive.maxsize::
+	If the size of a pack file is larger than this limit, then
+	git-receive-pack will error out, instead of accepting the pack
+	file. If not set or set to 0, then the size is unlimited.
+
 receive.denyDeletes::
 	If set to true, git-receive-pack will deny a ref update that deletes
 	the ref. Use this to prevent such a ref deletion via a push.
diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 7a4e055..1b4b65d 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -87,6 +87,8 @@ OPTIONS
 	Specifying 0 will cause Git to auto-detect the number of CPU's
 	and use maximum 3 threads.
 
+--max-input-size=<size>::
+	Die, if the pack is larger than <size>.
 
 Note
 ----
diff --git a/Documentation/git-receive-pack.txt b/Documentation/git-receive-pack.txt
index 000ee8d..0ccd5fb 100644
--- a/Documentation/git-receive-pack.txt
+++ b/Documentation/git-receive-pack.txt
@@ -33,6 +33,9 @@ post-update hooks found in the Documentation/howto directory.
 option, which tells it if updates to a ref should be denied if they
 are not fast-forwards.
 
+A number of other receive.* config options are available to tweak
+its behavior, see linkgit:git-config[1].
+
 OPTIONS
 -------
 <directory>::
diff --git a/Documentation/git-unpack-objects.txt b/Documentation/git-unpack-objects.txt
index 3e887d1..b3de50d 100644
--- a/Documentation/git-unpack-objects.txt
+++ b/Documentation/git-unpack-objects.txt
@@ -44,6 +44,9 @@ OPTIONS
 --strict::
 	Don't write objects with broken content or links.
 
+--max-input-size=<size>::
+	Die, if the pack is larger than <size>.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 1fd60bd..4a8b4ae 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1718,7 +1718,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				if (*c || opts.off32_limit & 0x80000000)
 					die(_("bad %s"), arg);
 			} else if (skip_prefix(arg, "--max-input-size=", &arg)) {
-				max_input_size = strtoul(arg, NULL, 10);
+				max_input_size = strtoumax(arg, NULL, 10);
 			} else
 				usage(index_pack_usage);
 			continue;
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 7627f7f..8c2943d 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -214,7 +214,7 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 	}
 
 	if (strcmp(var, "receive.maxsize") == 0) {
-		max_input_size = git_config_ulong(var, value);
+		max_input_size = git_config_int64(var, value);
 		return 0;
 	}
 
@@ -1657,8 +1657,8 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 			argv_array_pushf(&child.args, "--strict%s",
 				fsck_msg_types.buf);
 		if (max_input_size)
-			argv_array_pushf(&child.args, "--max-input-size=%lu",
-				max_input_size);
+			argv_array_pushf(&child.args, "--max-input-size=%"PRIuMAX,
+				(uintmax_t)max_input_size);
 		child.no_stdout = 1;
 		child.err = err_fd;
 		child.git_cmd = 1;
@@ -1688,8 +1688,8 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		if (!reject_thin)
 			argv_array_push(&child.args, "--fix-thin");
 		if (max_input_size)
-			argv_array_pushf(&child.args, "--max-input-size=%lu",
-				max_input_size);
+			argv_array_pushf(&child.args, "--max-input-size=%"PRIuMAX,
+				(uintmax_t)max_input_size);
 		child.out = -1;
 		child.err = err_fd;
 		child.git_cmd = 1;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 59b1f39..4532aa0 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -554,7 +554,7 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix)
 				continue;
 			}
 			if (skip_prefix(arg, "--max-input-size=", &arg)) {
-				max_input_size = strtoul(arg, NULL, 10);
+				max_input_size = strtoumax(arg, NULL, 10);
 				continue;
 			}
 			usage(unpack_usage);
diff --git a/t/t5546-push-limits.sh b/t/t5546-push-limits.sh
index d3a4d1a..b38d508 100755
--- a/t/t5546-push-limits.sh
+++ b/t/t5546-push-limits.sh
@@ -3,45 +3,40 @@
 test_description='check input limits for pushing'
 . ./test-lib.sh
 
-test_expect_success 'create known-size commit' '
-	test-genrandom foo 1024 >file &&
-	git add file &&
-	test_commit one-k
-'
-
 test_expect_success 'create remote repository' '
-	git init --bare dest &&
-	git --git-dir=dest config receive.unpacklimit 1
+	git init --bare dest
 '
 
-test_expect_success 'receive.maxsize rejects push' '
-	git --git-dir=dest config receive.maxsize 512 &&
-	test_must_fail git push dest HEAD
-'
+# Let's run tests with different unpack limits: 1 and 10
+# When the limit is 1, `git receive-pack` will call `git index-pack`.
+# When the limit is 10, `git receive-pack` will call `git unpack-objects`.
 
-test_expect_success 'bumping limit allows push' '
-	git --git-dir=dest config receive.maxsize 4k &&
-	git push dest HEAD
-'
+while read unpacklimit filesize filename
+do
 
-test_expect_success 'create another known-size commit' '
-	test-genrandom bar 2048 >file2 &&
-	git add file2 &&
-	test_commit two-k
-'
+	test_expect_success "create known-size ($filesize bytes) commit '$filename'" '
+		test-genrandom foo "$filesize" >"$filename" &&
+		git add "$filename" &&
+		test_commit "$filename"
+	'
 
-test_expect_success 'change unpacklimit' '
-	git --git-dir=dest config receive.unpacklimit 10
-'
+	test_expect_success "set unpacklimit to $unpacklimit" '
+		git --git-dir=dest config receive.unpacklimit "$unpacklimit"
+	'
 
-test_expect_success 'receive.maxsize rejects push' '
-	git --git-dir=dest config receive.maxsize 512 &&
-	test_must_fail git push dest HEAD
-'
+	test_expect_success 'setting receive.maxsize to 512 rejects push' '
+		git --git-dir=dest config receive.maxsize 512 &&
+		test_must_fail git push dest HEAD
+	'
 
-test_expect_success 'bumping limit allows push' '
-	git --git-dir=dest config receive.maxsize 4k &&
-	git push dest HEAD
-'
+	test_expect_success 'bumping limit to 4k allows push' '
+		git --git-dir=dest config receive.maxsize 4k &&
+		git push dest HEAD
+	'
+
+done <<\EOF
+1 1024 one-k-file
+10 2048 two-k-file
+EOF
 
 test_done


Christian Couder (1):
  unpack-objects: add --max-input-size=<size> option

Jeff King (2):
  index-pack: add --max-input-size=<size> option
  receive-pack: allow a maximum input size to be specified

 Documentation/config.txt             |  5 +++++
 Documentation/git-index-pack.txt     |  2 ++
 Documentation/git-receive-pack.txt   |  3 +++
 Documentation/git-unpack-objects.txt |  3 +++
 builtin/index-pack.c                 |  5 +++++
 builtin/receive-pack.c               | 12 +++++++++++
 builtin/unpack-objects.c             |  7 ++++++
 t/t5546-push-limits.sh               | 42 ++++++++++++++++++++++++++++++++++++
 8 files changed, 79 insertions(+)
 create mode 100755 t/t5546-push-limits.sh

-- 
2.10.0.rc0.3.g8535b4c


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 1/3] index-pack: add --max-input-size=<size> option
  2016-08-16  8:16 [PATCH 0/3] limit the size of the packs we receive Christian Couder
@ 2016-08-16  8:16 ` Christian Couder
  2016-08-16  8:17 ` [PATCH 2/3] unpack-objects: " Christian Couder
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Christian Couder @ 2016-08-16  8:16 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder

From: Jeff King <peff@peff.net>

When receiving a pack-file, it can be useful to abort the
`git index-pack`, if the pack-file is too big.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/git-index-pack.txt | 2 ++
 builtin/index-pack.c             | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 7a4e055..1b4b65d 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -87,6 +87,8 @@ OPTIONS
 	Specifying 0 will cause Git to auto-detect the number of CPU's
 	and use maximum 3 threads.
 
+--max-input-size=<size>::
+	Die, if the pack is larger than <size>.
 
 Note
 ----
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 1d2ea58..4a8b4ae 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -87,6 +87,7 @@ static struct progress *progress;
 static unsigned char input_buffer[4096];
 static unsigned int input_offset, input_len;
 static off_t consumed_bytes;
+static off_t max_input_size;
 static unsigned deepest_delta;
 static git_SHA_CTX input_ctx;
 static uint32_t input_crc32;
@@ -297,6 +298,8 @@ static void use(int bytes)
 	if (signed_add_overflows(consumed_bytes, bytes))
 		die(_("pack too large for current definition of off_t"));
 	consumed_bytes += bytes;
+	if (max_input_size && consumed_bytes > max_input_size)
+		die(_("pack exceeds maximum allowed size"));
 }
 
 static const char *open_pack_file(const char *pack_name)
@@ -1714,6 +1717,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 					opts.off32_limit = strtoul(c+1, &c, 0);
 				if (*c || opts.off32_limit & 0x80000000)
 					die(_("bad %s"), arg);
+			} else if (skip_prefix(arg, "--max-input-size=", &arg)) {
+				max_input_size = strtoumax(arg, NULL, 10);
 			} else
 				usage(index_pack_usage);
 			continue;
-- 
2.10.0.rc0.3.g8535b4c


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] unpack-objects: add --max-input-size=<size> option
  2016-08-16  8:16 [PATCH 0/3] limit the size of the packs we receive Christian Couder
  2016-08-16  8:16 ` [PATCH 1/3] index-pack: add --max-input-size=<size> option Christian Couder
@ 2016-08-16  8:17 ` Christian Couder
  2016-08-16  8:17 ` [PATCH 3/3] receive-pack: allow a maximum input size to be specified Christian Couder
  2016-08-16 13:11 ` [PATCH 0/3] limit the size of the packs we receive Jeff King
  3 siblings, 0 replies; 11+ messages in thread
From: Christian Couder @ 2016-08-16  8:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder

When receiving a pack-file, it can be useful to abort the
`git unpack-objects`, if the pack-file is too big.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/git-unpack-objects.txt | 3 +++
 builtin/unpack-objects.c             | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/Documentation/git-unpack-objects.txt b/Documentation/git-unpack-objects.txt
index 3e887d1..b3de50d 100644
--- a/Documentation/git-unpack-objects.txt
+++ b/Documentation/git-unpack-objects.txt
@@ -44,6 +44,9 @@ OPTIONS
 --strict::
 	Don't write objects with broken content or links.
 
+--max-input-size=<size>::
+	Die, if the pack is larger than <size>.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 172470b..4532aa0 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -19,6 +19,7 @@ static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]
 static unsigned char buffer[4096];
 static unsigned int offset, len;
 static off_t consumed_bytes;
+static off_t max_input_size;
 static git_SHA_CTX ctx;
 static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 
@@ -87,6 +88,8 @@ static void use(int bytes)
 	if (signed_add_overflows(consumed_bytes, bytes))
 		die("pack too large for current definition of off_t");
 	consumed_bytes += bytes;
+	if (max_input_size && consumed_bytes > max_input_size)
+		die(_("pack exceeds maximum allowed size"));
 }
 
 static void *get_data(unsigned long size)
@@ -550,6 +553,10 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix)
 				len = sizeof(*hdr);
 				continue;
 			}
+			if (skip_prefix(arg, "--max-input-size=", &arg)) {
+				max_input_size = strtoumax(arg, NULL, 10);
+				continue;
+			}
 			usage(unpack_usage);
 		}
 
-- 
2.10.0.rc0.3.g8535b4c


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] receive-pack: allow a maximum input size to be specified
  2016-08-16  8:16 [PATCH 0/3] limit the size of the packs we receive Christian Couder
  2016-08-16  8:16 ` [PATCH 1/3] index-pack: add --max-input-size=<size> option Christian Couder
  2016-08-16  8:17 ` [PATCH 2/3] unpack-objects: " Christian Couder
@ 2016-08-16  8:17 ` Christian Couder
  2016-08-16 13:16   ` Jeff King
  2016-08-16 13:11 ` [PATCH 0/3] limit the size of the packs we receive Jeff King
  3 siblings, 1 reply; 11+ messages in thread
From: Christian Couder @ 2016-08-16  8:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Christian Couder

From: Jeff King <peff@peff.net>

Receive-pack feeds its input to either index-pack or
unpack-objects, which will happily accept as many bytes as
a sender is willing to provide. Let's allow an arbitrary
cutoff point where we will stop writing bytes to disk.

Cleaning up what has already been written to disk is a
related problem that is not addressed by this patch.

The problem is that git-gc can clean up tmp_pack_* files
after their grace time expired, but that may not be
enough if someone tries to "git push" in a loop.

A simple fix is to call register_tempfile() in
open_pack_file(), and just have index-pack clean up the
file on its way out.

But there are harder cases. For instance, if somebody
pushes a 500MB file, and there is a pre-receive hook that
says "too big; I won't accept this". And then they push in
a loop, the incoming pack has already been accepted into
the repository by the time the pre-receive hook runs.
It's not possible to just delete it, because we don't know
if other simultaneous processes have started to depend on
the objects.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/config.txt           |  5 +++++
 Documentation/git-receive-pack.txt |  3 +++
 builtin/receive-pack.c             | 12 +++++++++++
 t/t5546-push-limits.sh             | 42 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 62 insertions(+)
 create mode 100755 t/t5546-push-limits.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0bcb679..f5b6061 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2517,6 +2517,11 @@ receive.unpackLimit::
 	especially on slow filesystems.  If not set, the value of
 	`transfer.unpackLimit` is used instead.
 
+receive.maxsize::
+	If the size of a pack file is larger than this limit, then
+	git-receive-pack will error out, instead of accepting the pack
+	file. If not set or set to 0, then the size is unlimited.
+
 receive.denyDeletes::
 	If set to true, git-receive-pack will deny a ref update that deletes
 	the ref. Use this to prevent such a ref deletion via a push.
diff --git a/Documentation/git-receive-pack.txt b/Documentation/git-receive-pack.txt
index 000ee8d..0ccd5fb 100644
--- a/Documentation/git-receive-pack.txt
+++ b/Documentation/git-receive-pack.txt
@@ -33,6 +33,9 @@ post-update hooks found in the Documentation/howto directory.
 option, which tells it if updates to a ref should be denied if they
 are not fast-forwards.
 
+A number of other receive.* config options are available to tweak
+its behavior, see linkgit:git-config[1].
+
 OPTIONS
 -------
 <directory>::
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 92e1213..8c2943d 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -46,6 +46,7 @@ static int transfer_unpack_limit = -1;
 static int advertise_atomic_push = 1;
 static int advertise_push_options;
 static int unpack_limit = 100;
+static off_t max_input_size;
 static int report_status;
 static int use_sideband;
 static int use_atomic;
@@ -212,6 +213,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (strcmp(var, "receive.maxsize") == 0) {
+		max_input_size = git_config_int64(var, value);
+		return 0;
+	}
+
 	return git_default_config(var, value, cb);
 }
 
@@ -1650,6 +1656,9 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		if (fsck_objects)
 			argv_array_pushf(&child.args, "--strict%s",
 				fsck_msg_types.buf);
+		if (max_input_size)
+			argv_array_pushf(&child.args, "--max-input-size=%"PRIuMAX,
+				(uintmax_t)max_input_size);
 		child.no_stdout = 1;
 		child.err = err_fd;
 		child.git_cmd = 1;
@@ -1678,6 +1687,9 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 				fsck_msg_types.buf);
 		if (!reject_thin)
 			argv_array_push(&child.args, "--fix-thin");
+		if (max_input_size)
+			argv_array_pushf(&child.args, "--max-input-size=%"PRIuMAX,
+				(uintmax_t)max_input_size);
 		child.out = -1;
 		child.err = err_fd;
 		child.git_cmd = 1;
diff --git a/t/t5546-push-limits.sh b/t/t5546-push-limits.sh
new file mode 100755
index 0000000..b38d508
--- /dev/null
+++ b/t/t5546-push-limits.sh
@@ -0,0 +1,42 @@
+#!/bin/sh
+
+test_description='check input limits for pushing'
+. ./test-lib.sh
+
+test_expect_success 'create remote repository' '
+	git init --bare dest
+'
+
+# Let's run tests with different unpack limits: 1 and 10
+# When the limit is 1, `git receive-pack` will call `git index-pack`.
+# When the limit is 10, `git receive-pack` will call `git unpack-objects`.
+
+while read unpacklimit filesize filename
+do
+
+	test_expect_success "create known-size ($filesize bytes) commit '$filename'" '
+		test-genrandom foo "$filesize" >"$filename" &&
+		git add "$filename" &&
+		test_commit "$filename"
+	'
+
+	test_expect_success "set unpacklimit to $unpacklimit" '
+		git --git-dir=dest config receive.unpacklimit "$unpacklimit"
+	'
+
+	test_expect_success 'setting receive.maxsize to 512 rejects push' '
+		git --git-dir=dest config receive.maxsize 512 &&
+		test_must_fail git push dest HEAD
+	'
+
+	test_expect_success 'bumping limit to 4k allows push' '
+		git --git-dir=dest config receive.maxsize 4k &&
+		git push dest HEAD
+	'
+
+done <<\EOF
+1 1024 one-k-file
+10 2048 two-k-file
+EOF
+
+test_done
-- 
2.10.0.rc0.3.g8535b4c


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] limit the size of the packs we receive
  2016-08-16  8:16 [PATCH 0/3] limit the size of the packs we receive Christian Couder
                   ` (2 preceding siblings ...)
  2016-08-16  8:17 ` [PATCH 3/3] receive-pack: allow a maximum input size to be specified Christian Couder
@ 2016-08-16 13:11 ` Jeff King
  2016-08-16 14:44   ` Christian Couder
  3 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2016-08-16 13:11 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Junio C Hamano, Christian Couder

On Tue, Aug 16, 2016 at 10:16:58AM +0200, Christian Couder wrote:

> Changes from previous RFC version
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
>   - added documentation to all the 3 patches,

Good idea.

>   - changed strtoul() to strtoumax() in the first 2 patches, as
>     suggested by Peff,
> 
>   - changed git_config_ulong() to git_config_int64() and used PRIuMAX
>     and uintmax_t in the last patch, as suggested by Peff,

Thinking a bit, off_t is actually signed. So maybe PRIdMAX (which we
don't seem to have compat macros for) would make more sense. I dunno if
anybody actually cares. This value shouldn't be signed anyway, and
nobody should be approaching the limits of a 64-bit number anyway (there
is no point in limiting the incoming pack to the exabyte range).

So I'm inclined not to worry about it.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] receive-pack: allow a maximum input size to be specified
  2016-08-16  8:17 ` [PATCH 3/3] receive-pack: allow a maximum input size to be specified Christian Couder
@ 2016-08-16 13:16   ` Jeff King
  2016-08-16 14:27     ` Christian Couder
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2016-08-16 13:16 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Junio C Hamano, Christian Couder

On Tue, Aug 16, 2016 at 10:17:01AM +0200, Christian Couder wrote:

> From: Jeff King <peff@peff.net>
> 
> Receive-pack feeds its input to either index-pack or
> unpack-objects, which will happily accept as many bytes as
> a sender is willing to provide. Let's allow an arbitrary
> cutoff point where we will stop writing bytes to disk.
> 
> Cleaning up what has already been written to disk is a
> related problem that is not addressed by this patch.
> 
> The problem is that git-gc can clean up tmp_pack_* files
> after their grace time expired, but that may not be
> enough if someone tries to "git push" in a loop.
> 
> A simple fix is to call register_tempfile() in
> open_pack_file(), and just have index-pack clean up the
> file on its way out.
> 
> But there are harder cases. For instance, if somebody
> pushes a 500MB file, and there is a pre-receive hook that
> says "too big; I won't accept this". And then they push in
> a loop, the incoming pack has already been accepted into
> the repository by the time the pre-receive hook runs.
> It's not possible to just delete it, because we don't know
> if other simultaneous processes have started to depend on
> the objects.

IMHO, everything after "not addressed by this patch" doesn't really add
anything. This commit has value on its own, and the discussion about the
next steps is already documented on the list (and hopefully will be
fixed with other patches!).

> +test_expect_success 'create remote repository' '
> +	git init --bare dest
> +'
> +
> +# Let's run tests with different unpack limits: 1 and 10
> +# When the limit is 1, `git receive-pack` will call `git index-pack`.
> +# When the limit is 10, `git receive-pack` will call `git unpack-objects`.
> +
> +while read unpacklimit filesize filename
> +do
> [...]
> +done <<\EOF
> +1 1024 one-k-file
> +10 2048 two-k-file
> +EOF

Is there any reason to use different filenames and sizes for the two
runs? They should behave identically, so it would make more sense to me
to subject them to identical inputs.

I know you need different files and filenames to continue pushing into
the same "dest", but the problem is easily solved by bumping the "git
init" into the loop.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] receive-pack: allow a maximum input size to be specified
  2016-08-16 13:16   ` Jeff King
@ 2016-08-16 14:27     ` Christian Couder
  2016-08-16 16:21       ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Couder @ 2016-08-16 14:27 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Christian Couder

On Tue, Aug 16, 2016 at 3:16 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Aug 16, 2016 at 10:17:01AM +0200, Christian Couder wrote:
>>
>> Cleaning up what has already been written to disk is a
>> related problem that is not addressed by this patch.
>>
>> The problem is that git-gc can clean up tmp_pack_* files
>> after their grace time expired, but that may not be
>> enough if someone tries to "git push" in a loop.
>>
>> A simple fix is to call register_tempfile() in
>> open_pack_file(), and just have index-pack clean up the
>> file on its way out.
>>
>> But there are harder cases. For instance, if somebody
>> pushes a 500MB file, and there is a pre-receive hook that
>> says "too big; I won't accept this". And then they push in
>> a loop, the incoming pack has already been accepted into
>> the repository by the time the pre-receive hook runs.
>> It's not possible to just delete it, because we don't know
>> if other simultaneous processes have started to depend on
>> the objects.
>
> IMHO, everything after "not addressed by this patch" doesn't really add
> anything. This commit has value on its own, and the discussion about the
> next steps is already documented on the list (and hopefully will be
> fixed with other patches!).

Ok, I will remove that in the next iteration.

>> +test_expect_success 'create remote repository' '
>> +     git init --bare dest
>> +'
>> +
>> +# Let's run tests with different unpack limits: 1 and 10
>> +# When the limit is 1, `git receive-pack` will call `git index-pack`.
>> +# When the limit is 10, `git receive-pack` will call `git unpack-objects`.
>> +
>> +while read unpacklimit filesize filename
>> +do
>> [...]
>> +done <<\EOF
>> +1 1024 one-k-file
>> +10 2048 two-k-file
>> +EOF
>
> Is there any reason to use different filenames and sizes for the two
> runs? They should behave identically, so it would make more sense to me
> to subject them to identical inputs.

About the sizes, I thought that some people might want to try sizes
closer to the limit and also that it is good anyway in general to add
a bit of "randomness", or at least variability, in the tests.

> I know you need different files and filenames to continue pushing into
> the same "dest", but the problem is easily solved by bumping the "git
> init" into the loop.

I thought that it would be a bit less wasteful to use the same "dest"
and also it would make the test more realistic as people often push in
non empty repos.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] limit the size of the packs we receive
  2016-08-16 13:11 ` [PATCH 0/3] limit the size of the packs we receive Jeff King
@ 2016-08-16 14:44   ` Christian Couder
  2016-08-16 14:48     ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Couder @ 2016-08-16 14:44 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Christian Couder

On Tue, Aug 16, 2016 at 3:11 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Aug 16, 2016 at 10:16:58AM +0200, Christian Couder wrote:
>
>>   - changed strtoul() to strtoumax() in the first 2 patches, as
>>     suggested by Peff,
>>
>>   - changed git_config_ulong() to git_config_int64() and used PRIuMAX
>>     and uintmax_t in the last patch, as suggested by Peff,
>
> Thinking a bit, off_t is actually signed. So maybe PRIdMAX (which we
> don't seem to have compat macros for) would make more sense. I dunno if
> anybody actually cares. This value shouldn't be signed anyway, and
> nobody should be approaching the limits of a 64-bit number anyway (there
> is no point in limiting the incoming pack to the exabyte range).
>
> So I'm inclined not to worry about it.

Yeah, you previously wrote:

> We seem to use strtoumax() elsewhere, so that's probably a good match
> (technically it can overflow an off_t, but in practice this value comes from
> the admin and they will set something sane).

and I thought that the same would apply to using PRIuMAX and uintmax_t
in this patch.

I can add something along your explanations in the commit message if
it is prefered.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] limit the size of the packs we receive
  2016-08-16 14:44   ` Christian Couder
@ 2016-08-16 14:48     ` Jeff King
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2016-08-16 14:48 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Junio C Hamano, Christian Couder

On Tue, Aug 16, 2016 at 04:44:01PM +0200, Christian Couder wrote:

>> [sizes and signedness of off_t]
> I can add something along your explanations in the commit message if
> it is prefered.

I think it's probably OK without it.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] receive-pack: allow a maximum input size to be specified
  2016-08-16 14:27     ` Christian Couder
@ 2016-08-16 16:21       ` Jeff King
  2016-08-18 14:06         ` Christian Couder
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2016-08-16 16:21 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Junio C Hamano, Christian Couder

On Tue, Aug 16, 2016 at 04:27:10PM +0200, Christian Couder wrote:

> >> +while read unpacklimit filesize filename
> >> +do
> >> [...]
> >> +done <<\EOF
> >> +1 1024 one-k-file
> >> +10 2048 two-k-file
> >> +EOF
> >
> > Is there any reason to use different filenames and sizes for the two
> > runs? They should behave identically, so it would make more sense to me
> > to subject them to identical inputs.
> 
> About the sizes, I thought that some people might want to try sizes
> closer to the limit and also that it is good anyway in general to add
> a bit of "randomness", or at least variability, in the tests.

In general, I'd prefer a systematic approach to introducing variables
into tests. If it's important that we test different sizes, then we
should do so, and not only test some combinations (and if it is not
important to test different sizes, then we should not waste CPU time and
the mental energy of people reading the tests).

IOW, when I see this, I wonder why the index-pack code path is not
tested against a 2k file. But there really isn't a good reason. So
either it does matter, and our tests do not have good coverage, or it
does not, and it is just making me wonder if the tests are buggy.

Worse, both files are created with the same seed via test-genrandom. So
I suspect the 2k file is a superset of the 1k file, and we may send it
is a thin delta (so it really is only testing a 1k input anyway!).

> I thought that it would be a bit less wasteful to use the same "dest"
> and also it would make the test more realistic as people often push in
> non empty repos.

I doubt it is expensive enough to matter in practice. Though note that
if you push the same commit to two new repositories, then you can
amortize the test-genrandom/add/commit step (i.e., push the exact same
packfile in both cases).

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] receive-pack: allow a maximum input size to be specified
  2016-08-16 16:21       ` Jeff King
@ 2016-08-18 14:06         ` Christian Couder
  0 siblings, 0 replies; 11+ messages in thread
From: Christian Couder @ 2016-08-18 14:06 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Christian Couder

On Tue, Aug 16, 2016 at 6:21 PM, Jeff King <peff@peff.net> wrote:
>> >
>> > Is there any reason to use different filenames and sizes for the two
>> > runs? They should behave identically, so it would make more sense to me
>> > to subject them to identical inputs.
>>
>> About the sizes, I thought that some people might want to try sizes
>> closer to the limit and also that it is good anyway in general to add
>> a bit of "randomness", or at least variability, in the tests.
>
> In general, I'd prefer a systematic approach to introducing variables
> into tests. If it's important that we test different sizes, then we
> should do so, and not only test some combinations (and if it is not
> important to test different sizes, then we should not waste CPU time and
> the mental energy of people reading the tests).
>
> IOW, when I see this, I wonder why the index-pack code path is not
> tested against a 2k file. But there really isn't a good reason. So
> either it does matter, and our tests do not have good coverage, or it
> does not, and it is just making me wonder if the tests are buggy.

I don't see things in the same way, but I will make the second file a
1k file too as I don't really care much about that.

> Worse, both files are created with the same seed via test-genrandom. So
> I suspect the 2k file is a superset of the 1k file, and we may send it
> is a thin delta (so it really is only testing a 1k input anyway!).

Yeah, I will use a different seed for the second file.

>> I thought that it would be a bit less wasteful to use the same "dest"
>> and also it would make the test more realistic as people often push in
>> non empty repos.
>
> I doubt it is expensive enough to matter in practice. Though note that
> if you push the same commit to two new repositories, then you can
> amortize the test-genrandom/add/commit step (i.e., push the exact same
> packfile in both cases).

Yeah, it probably doesn't matter anyway regarding efficiency, but I
still prefer to not create a new repo as I would find it more
confusing for the reader to use different repos.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-08-18 14:08 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-16  8:16 [PATCH 0/3] limit the size of the packs we receive Christian Couder
2016-08-16  8:16 ` [PATCH 1/3] index-pack: add --max-input-size=<size> option Christian Couder
2016-08-16  8:17 ` [PATCH 2/3] unpack-objects: " Christian Couder
2016-08-16  8:17 ` [PATCH 3/3] receive-pack: allow a maximum input size to be specified Christian Couder
2016-08-16 13:16   ` Jeff King
2016-08-16 14:27     ` Christian Couder
2016-08-16 16:21       ` Jeff King
2016-08-18 14:06         ` Christian Couder
2016-08-16 13:11 ` [PATCH 0/3] limit the size of the packs we receive Jeff King
2016-08-16 14:44   ` Christian Couder
2016-08-16 14:48     ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).