git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [bug] Reliably Reproducible Bad Packing of Objects
@ 2016-06-24 22:38 Christoph Michelbach
  2016-07-02  9:10 ` Duy Nguyen
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
  0 siblings, 2 replies; 39+ messages in thread
From: Christoph Michelbach @ 2016-06-24 22:38 UTC (permalink / raw)
  To: git

Hi,

when run on a 32 bit system (Linux system, don't know about other
systems),

mkdir test && cd test && git init && touch someFile && git add someFile
&& git commit -m "Initial commit." && dd if=/dev/urandom
of=bigBinaryFile bs=1MB count=4294 && git add bigBinaryFile && git
commit -m "Introduced big biary file."

reliably produces this error message: "error: bad packed object CRC
for"

Since git counts sizes in byte and uses ints for it, it can't handle
files bigger than (2^32 - 1) byte. That's 4'294.967296 MB. If you give
git a file bigger than it can handle, it usually just says "fatal:
Cannot handle files this big" without corrupting the repository. Btw.:
It'd be nice if the error message stated that this only occurs on 32
bit system and only with files 4 GiB in size or bigger.

To provoke the bug, the commands above creates a file which cannot be
compressed slightly less than (2^32 - 1) byte big, probably resulting
in a commit more than (2^32 - 1) byte big.

I was able to reproduce the bug on a Raspberry Pi Model 3 (ARMv7 CPU)
and a virtual machine running Ubuntu 16.04 32 bit (which was
specifically set up to test this, so it was a clean installation) on a
host running Ubuntu 16.04 64 bit on an ARM 64 bit x86 CPU (i7-4720HQ).

Output on raspi: https://gist.github.com/m1cm1c/d874f03be5b12cbd8b86ced
79fa456d1
Output on virt. machine: https://gist.github.com/m1cm1c/d0dd47828386bb0
f1e001b9f750416e0

Note that on the raspi I concatenated `git gc` to the end to show that
the object exists but is corrupt but you can already see that something
went wrong before `git gc` is executed.

If you look at the output on the virtual machine, however, you will not
see that something wrong until you read the part where I typed in `git
gc` which I find particularly worrying.

When checking whether you get the same result, please make sure to use
git 1.7.6 or newer. If you use an older version of git (older versions
are still being distributed, for example for Ubuntu 14.04), git will
try to memory-map a too big file resulting in a different error.

-- 
With kind regards
Christoph Michelbach


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [bug] Reliably Reproducible Bad Packing of Objects
  2016-06-24 22:38 [bug] Reliably Reproducible Bad Packing of Objects Christoph Michelbach
@ 2016-07-02  9:10 ` Duy Nguyen
  2016-07-02 14:35   ` Duy Nguyen
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
  1 sibling, 1 reply; 39+ messages in thread
From: Duy Nguyen @ 2016-07-02  9:10 UTC (permalink / raw)
  To: Christoph Michelbach; +Cc: git

On Sat, Jun 25, 2016 at 12:38:22AM +0200, Christoph Michelbach wrote:
> Hi,
> 
> when run on a 32 bit system (Linux system, don't know about other
> systems),
> 
> mkdir test && cd test && git init && touch someFile && git add someFile
> && git commit -m "Initial commit." && dd if=/dev/urandom
> of=bigBinaryFile bs=1MB count=4294 && git add bigBinaryFile && git
> commit -m "Introduced big biary file."
> 
> reliably produces this error message: "error: bad packed object CRC
> for"


I tried with a 32-bit git build on 64-bit system and did not see this
message. Maybe it's because the 64-bit system. Just too lazy to bring
up a 32-bit VM to process 4GB of data.

But I did look around a bit, and the problem is probably number
truncation. Could you try this patch?

-- 8< --
diff --git a/sha1_file.c b/sha1_file.c
index d5e1121..cb571ac 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2281,7 +2281,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 
 		if (do_check_packed_object_crc && p->index_version > 1) {
 			struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
-			unsigned long len = revidx[1].offset - obj_offset;
+			off_t len = revidx[1].offset - obj_offset;
 			if (check_pack_crc(p, &w_curs, obj_offset, len, revidx->nr)) {
 				const unsigned char *sha1 =
 					nth_packed_object_sha1(p, revidx->nr);
-- 8< --


If your system has large file support (you must or things must have
failed badly from the beginning), then both offsets in that
subtraction expression are of type off_t, aka 64-bit. However "len"
would be 32-bit and truncation can happen. check_pack_crc() takes
off_t too, so it seems well prepared already (but I didn't test).

PS. There's a similar statement in pack-objects.c, but I don't think
you have done anything to trigger it. OK, 'git gc' may..
--
Duy

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [bug] Reliably Reproducible Bad Packing of Objects
  2016-07-02  9:10 ` Duy Nguyen
@ 2016-07-02 14:35   ` Duy Nguyen
  0 siblings, 0 replies; 39+ messages in thread
From: Duy Nguyen @ 2016-07-02 14:35 UTC (permalink / raw)
  To: Christoph Michelbach; +Cc: Git Mailing List

On Sat, Jul 2, 2016 at 11:10 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Sat, Jun 25, 2016 at 12:38:22AM +0200, Christoph Michelbach wrote:
>> Hi,
>>
>> when run on a 32 bit system (Linux system, don't know about other
>> systems),
>>
>> mkdir test && cd test && git init && touch someFile && git add someFile
>> && git commit -m "Initial commit." && dd if=/dev/urandom
>> of=bigBinaryFile bs=1MB count=4294 && git add bigBinaryFile && git
>> commit -m "Introduced big biary file."
>>
>> reliably produces this error message: "error: bad packed object CRC
>> for"
>
>
> I tried with a 32-bit git build on 64-bit system and did not see this
> message. Maybe it's because the 64-bit system. Just too lazy to bring
> up a 32-bit VM to process 4GB of data.
>
> But I did look around a bit, and the problem is probably number
> truncation. Could you try this patch?

Just FYI. There may be more truncation problems on 32-bit systems.
"clang -Wshorten-64-to-32" reports a couple of spots that could be
real problems (pack-objects, index-pack...) for handling 4+ GB files
on these systems. Still working on it...
-- 
Duy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems
  2016-06-24 22:38 [bug] Reliably Reproducible Bad Packing of Objects Christoph Michelbach
  2016-07-02  9:10 ` Duy Nguyen
@ 2016-07-05 17:05 ` Nguyễn Thái Ngọc Duy
  2016-07-05 17:05   ` [PATCH 1/5] pack-objects: pass length to check_pack_crc() without truncation Nguyễn Thái Ngọc Duy
                     ` (12 more replies)
  1 sibling, 13 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-05 17:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

Since I now could reproduce the problem that Christoph showed, I
decided to send the good patches out. To sum up, we use "unsigned
long" in some places related to file size. On 32-bit systems, it's
limited to 32 bits even though the system can handle files larger than
that (off_t is 64-bit). This fixes it.

clang -Wshorten-64-to-32 is very helpful to spot these problems. I
have a couple more patches to clean all these warnings, but some need
more code study to see what is the right way to do.

Most of the rest seems harmless, except for the local variable "size"
in builtin/pack-objects.c:write_one(). I might send 6/5 for that one.

Nguyễn Thái Ngọc Duy (5):
  pack-objects: pass length to check_pack_crc() without truncation
  sha1_file.c: use type off_t* for object_info->disk_sizep
  index-pack: correct "len" type in unpack_data()
  index-pack: report correct bad object offsets even if they are large
  index-pack: correct "offset" type in unpack_entry_data()

 builtin/cat-file.c     |  4 ++--
 builtin/index-pack.c   | 23 ++++++++++++-----------
 builtin/pack-objects.c |  2 +-
 cache.h                |  2 +-
 sha1_file.c            |  2 +-
 5 files changed, 17 insertions(+), 16 deletions(-)

-- 
2.8.2.537.g0965dd9


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/5] pack-objects: pass length to check_pack_crc() without truncation
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
@ 2016-07-05 17:05   ` Nguyễn Thái Ngọc Duy
  2016-07-12 17:16     ` Junio C Hamano
  2016-07-05 17:05   ` [PATCH 2/5] sha1_file.c: use type off_t* for object_info->disk_sizep Nguyễn Thái Ngọc Duy
                     ` (11 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-05 17:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

On 32 bit systems with large file support, unsigned long is 32-bit
while the two offsets in the subtraction expression (pack-objects has
the exact same expression as in sha1_file.c but not shown in diff) are
in 64-bit. If an in-pack object is larger than 2^32 len/datalen is
truncated and we get a misleading "error: bad packed object CRC for
..." as a result.

Use off_t for len and datalen. check_pack_crc() already accepts this
argument as off_t and can deal with 4+ GB.

Noticed-by: Christoph Michelbach <michelbach94@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/pack-objects.c | 2 +-
 sha1_file.c            | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8f5e358..df6ecd5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -349,7 +349,7 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry
 	struct revindex_entry *revidx;
 	off_t offset;
 	enum object_type type = entry->type;
-	unsigned long datalen;
+	off_t datalen;
 	unsigned char header[10], dheader[10];
 	unsigned hdrlen;
 
diff --git a/sha1_file.c b/sha1_file.c
index d5e1121..cb571ac 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2281,7 +2281,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 
 		if (do_check_packed_object_crc && p->index_version > 1) {
 			struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
-			unsigned long len = revidx[1].offset - obj_offset;
+			off_t len = revidx[1].offset - obj_offset;
 			if (check_pack_crc(p, &w_curs, obj_offset, len, revidx->nr)) {
 				const unsigned char *sha1 =
 					nth_packed_object_sha1(p, revidx->nr);
-- 
2.8.2.537.g0965dd9


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/5] sha1_file.c: use type off_t* for object_info->disk_sizep
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
  2016-07-05 17:05   ` [PATCH 1/5] pack-objects: pass length to check_pack_crc() without truncation Nguyễn Thái Ngọc Duy
@ 2016-07-05 17:05   ` Nguyễn Thái Ngọc Duy
  2016-07-12 17:20     ` Junio C Hamano
  2016-07-12 19:26     ` Junio C Hamano
  2016-07-05 17:05   ` [PATCH 3/5] index-pack: correct "len" type in unpack_data() Nguyễn Thái Ngọc Duy
                     ` (10 subsequent siblings)
  12 siblings, 2 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-05 17:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

This field, filled by sha1_object_info() contains the on-disk size of
an object, which could go over 4GB limit of unsigned long on 32-bit
systems. Use off_t for it instead and update all callers.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/cat-file.c | 4 ++--
 cache.h            | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 618103f..5b34bd0 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -131,7 +131,7 @@ struct expand_data {
 	unsigned char sha1[20];
 	enum object_type type;
 	unsigned long size;
-	unsigned long disk_size;
+	off_t disk_size;
 	const char *rest;
 	unsigned char delta_base_sha1[20];
 
@@ -191,7 +191,7 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
 		if (data->mark_query)
 			data->info.disk_sizep = &data->disk_size;
 		else
-			strbuf_addf(sb, "%lu", data->disk_size);
+			strbuf_addf(sb, "%"PRIuMAX, data->disk_size);
 	} else if (is_atom("rest", atom, len)) {
 		if (data->mark_query)
 			data->split_on_whitespace = 1;
diff --git a/cache.h b/cache.h
index c73becb..a4465cb 100644
--- a/cache.h
+++ b/cache.h
@@ -1508,7 +1508,7 @@ struct object_info {
 	/* Request */
 	enum object_type *typep;
 	unsigned long *sizep;
-	unsigned long *disk_sizep;
+	off_t *disk_sizep;
 	unsigned char *delta_base_sha1;
 	struct strbuf *typename;
 
-- 
2.8.2.537.g0965dd9


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/5] index-pack: correct "len" type in unpack_data()
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
  2016-07-05 17:05   ` [PATCH 1/5] pack-objects: pass length to check_pack_crc() without truncation Nguyễn Thái Ngọc Duy
  2016-07-05 17:05   ` [PATCH 2/5] sha1_file.c: use type off_t* for object_info->disk_sizep Nguyễn Thái Ngọc Duy
@ 2016-07-05 17:05   ` Nguyễn Thái Ngọc Duy
  2016-07-05 20:25     ` Johannes Sixt
  2016-07-05 17:05   ` [PATCH 4/5] index-pack: report correct bad object offsets even if they are large Nguyễn Thái Ngọc Duy
                     ` (9 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-05 17:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

On 32-bit systems with large file support, one entry could be larger
than 4GB and overflow "len". Correct it so we can unpack a full entry.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/index-pack.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index e8c71fc..cafaab7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -549,13 +549,13 @@ static void *unpack_data(struct object_entry *obj,
 			 void *cb_data)
 {
 	off_t from = obj[0].idx.offset + obj[0].hdr_size;
-	unsigned long len = obj[1].idx.offset - from;
+	off_t len = obj[1].idx.offset - from;
 	unsigned char *data, *inbuf;
 	git_zstream stream;
 	int status;
 
 	data = xmallocz(consume ? 64*1024 : obj->size);
-	inbuf = xmalloc((len < 64*1024) ? len : 64*1024);
+	inbuf = xmalloc((len < 64*1024) ? (int)len : 64*1024);
 
 	memset(&stream, 0, sizeof(stream));
 	git_inflate_init(&stream);
@@ -563,15 +563,15 @@ static void *unpack_data(struct object_entry *obj,
 	stream.avail_out = consume ? 64*1024 : obj->size;
 
 	do {
-		ssize_t n = (len < 64*1024) ? len : 64*1024;
+		ssize_t n = (len < 64*1024) ? (ssize_t)len : 64*1024;
 		n = xpread(get_thread_data()->pack_fd, inbuf, n, from);
 		if (n < 0)
 			die_errno(_("cannot pread pack file"));
 		if (!n)
-			die(Q_("premature end of pack file, %lu byte missing",
-			       "premature end of pack file, %lu bytes missing",
-			       len),
-			    len);
+			die(Q_("premature end of pack file, %"PRIuMAX" byte missing",
+			       "premature end of pack file, %"PRIuMAX" bytes missing",
+			       (unsigned int)len),
+			    (uintmax_t)len);
 		from += n;
 		len -= n;
 		stream.next_in = inbuf;
-- 
2.8.2.537.g0965dd9


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 4/5] index-pack: report correct bad object offsets even if they are large
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
                     ` (2 preceding siblings ...)
  2016-07-05 17:05   ` [PATCH 3/5] index-pack: correct "len" type in unpack_data() Nguyễn Thái Ngọc Duy
@ 2016-07-05 17:05   ` Nguyễn Thái Ngọc Duy
  2016-07-12 17:24     ` Junio C Hamano
  2016-07-12 19:27     ` Junio C Hamano
  2016-07-05 17:05   ` [PATCH 5/5] index-pack: correct "offset" type in unpack_entry_data() Nguyễn Thái Ngọc Duy
                     ` (8 subsequent siblings)
  12 siblings, 2 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-05 17:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

Use the right type for offsets in this case, off_t, which makes a
difference on 32-bit systems with large file support, and change
formatting code accordingly.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/index-pack.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index cafaab7..73f7cd2 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -338,10 +338,10 @@ static void parse_pack_header(void)
 	use(sizeof(struct pack_header));
 }
 
-static NORETURN void bad_object(unsigned long offset, const char *format,
+static NORETURN void bad_object(off_t offset, const char *format,
 		       ...) __attribute__((format (printf, 2, 3)));
 
-static NORETURN void bad_object(unsigned long offset, const char *format, ...)
+static NORETURN void bad_object(off_t offset, const char *format, ...)
 {
 	va_list params;
 	char buf[1024];
@@ -349,7 +349,8 @@ static NORETURN void bad_object(unsigned long offset, const char *format, ...)
 	va_start(params, format);
 	vsnprintf(buf, sizeof(buf), format, params);
 	va_end(params);
-	die(_("pack has bad object at offset %lu: %s"), offset, buf);
+	die(_("pack has bad object at offset %"PRIiMAX": %s"),
+	    (intmax_t)offset, buf);
 }
 
 static inline struct thread_local *get_thread_data(void)
-- 
2.8.2.537.g0965dd9


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 5/5] index-pack: correct "offset" type in unpack_entry_data()
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
                     ` (3 preceding siblings ...)
  2016-07-05 17:05   ` [PATCH 4/5] index-pack: report correct bad object offsets even if they are large Nguyễn Thái Ngọc Duy
@ 2016-07-05 17:05   ` Nguyễn Thái Ngọc Duy
  2016-07-05 18:11   ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Christoph Michelbach
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-05 17:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

unpack_entry_data() receives an off_t value from unpack_raw_entry(),
which could be larger than unsigned long on 32-bit systems with large
file support. Correct the type so truncation does not happen. This
only affects bad object reporting though.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/index-pack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 73f7cd2..f35a9dc 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -430,7 +430,7 @@ static int is_delta_type(enum object_type type)
 	return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
 }
 
-static void *unpack_entry_data(unsigned long offset, unsigned long size,
+static void *unpack_entry_data(off_t offset, unsigned long size,
 			       enum object_type type, unsigned char *sha1)
 {
 	static char fixed_buf[8192];
-- 
2.8.2.537.g0965dd9


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
                     ` (4 preceding siblings ...)
  2016-07-05 17:05   ` [PATCH 5/5] index-pack: correct "offset" type in unpack_entry_data() Nguyễn Thái Ngọc Duy
@ 2016-07-05 18:11   ` Christoph Michelbach
       [not found]   ` <1467756891.4798.1.camel@gmail.com>
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Christoph Michelbach @ 2016-07-05 18:11 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git; +Cc: Junio C Hamano

Thank you very much!

-- 
With kind regards
Christoph Michelbach

On Tue, 2016-07-05 at 19:05 +0200, Nguyễn Thái Ngọc Duy wrote:
> Since I now could reproduce the problem that Christoph showed, I
> decided to send the good patches out. To sum up, we use "unsigned
> long" in some places related to file size. On 32-bit systems, it's
> limited to 32 bits even though the system can handle files larger
> than
> that (off_t is 64-bit). This fixes it.
> 
> clang -Wshorten-64-to-32 is very helpful to spot these problems. I
> have a couple more patches to clean all these warnings, but some need
> more code study to see what is the right way to do.
> 
> Most of the rest seems harmless, except for the local variable "size"
> in builtin/pack-objects.c:write_one(). I might send 6/5 for that one.
> 
> Nguyễn Thái Ngọc Duy (5):
>   pack-objects: pass length to check_pack_crc() without truncation
>   sha1_file.c: use type off_t* for object_info->disk_sizep
>   index-pack: correct "len" type in unpack_data()
>   index-pack: report correct bad object offsets even if they are
> large
>   index-pack: correct "offset" type in unpack_entry_data()
> 
>  builtin/cat-file.c     |  4 ++--
>  builtin/index-pack.c   | 23 ++++++++++++-----------
>  builtin/pack-objects.c |  2 +-
>  cache.h                |  2 +-
>  sha1_file.c            |  2 +-
>  5 files changed, 17 insertions(+), 16 deletions(-)
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/5] index-pack: correct "len" type in unpack_data()
  2016-07-05 17:05   ` [PATCH 3/5] index-pack: correct "len" type in unpack_data() Nguyễn Thái Ngọc Duy
@ 2016-07-05 20:25     ` Johannes Sixt
  2016-07-06 15:25       ` Duy Nguyen
  0 siblings, 1 reply; 39+ messages in thread
From: Johannes Sixt @ 2016-07-05 20:25 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Junio C Hamano, michelbach94

Am 05.07.2016 um 19:05 schrieb Nguyễn Thái Ngọc Duy:
> +			die(Q_("premature end of pack file, %"PRIuMAX" byte missing",
> +			       "premature end of pack file, %"PRIuMAX" bytes missing",

I would be surprised if this form of translation worked...

> +			       (unsigned int)len),
> +			    (uintmax_t)len);

-- Hannes


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/5] index-pack: correct "len" type in unpack_data()
  2016-07-05 20:25     ` Johannes Sixt
@ 2016-07-06 15:25       ` Duy Nguyen
  2016-07-06 16:04         ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Duy Nguyen @ 2016-07-06 15:25 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Git Mailing List, Junio C Hamano, Christoph Michelbach

On Tue, Jul 5, 2016 at 10:25 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> Am 05.07.2016 um 19:05 schrieb Nguyễn Thái Ngọc Duy:
>>
>> +                       die(Q_("premature end of pack file, %"PRIuMAX"
>> byte missing",
>> +                              "premature end of pack file, %"PRIuMAX"
>> bytes missing",
>
>
> I would be surprised if this form of translation worked...

What can I say, gettext is smart. In gc.c we already have this

die(_("gc is already running on machine '%s' pid %"PRIuMAX" (use
--force if not)"), name, (uintmax_t)pid);

and vi.po shows

#: builtin/gc.c:397
#, c-format
msgid ""
"gc is already running on machine '%s' pid %<PRIuMAX> (use --force if not)"
-- 
Duy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/5] index-pack: correct "len" type in unpack_data()
  2016-07-06 15:25       ` Duy Nguyen
@ 2016-07-06 16:04         ` Junio C Hamano
  2016-07-06 16:08           ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-07-06 16:04 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Johannes Sixt, Git Mailing List, Christoph Michelbach

Duy Nguyen <pclouds@gmail.com> writes:

> What can I say, gettext is smart. In gc.c we already have this
>
> die(_("gc is already running on machine '%s' pid %"PRIuMAX" (use
> --force if not)"), name, (uintmax_t)pid);
>
> and vi.po shows
>
> #: builtin/gc.c:397
> #, c-format
> msgid ""
> "gc is already running on machine '%s' pid %<PRIuMAX> (use --force if not)"

And translators are expcted to keep "%<PRIuMAX>" in their translated
.po files, and whatever translates .po into .mo knows what
%<PRIuMAX> should be mapped to?

That is surprising.

On a related but not surprising tangent, I see this example in
gettext.info that may be relevant.

       About larger integer types, such as ‘uintmax_t’ or ‘unsigned long
    long’: they can be handled by reducing the value to a range that fits in
    an ‘unsigned long’.  Simply casting the value to ‘unsigned long’ would
    not do the right thing, since it would treat ‘ULONG_MAX + 1’ like zero,
    ‘ULONG_MAX + 2’ like singular, and the like.  Here you can exploit the
    fact that all mentioned plural form formulas eventually become periodic,
    with a period that is a divisor of 100 (or 1000 or 1000000).  So, when
    you reduce a large value to another one in the range [1000000, 1999999]
    that ends in the same 6 decimal digits, you can assume that it will lead
    to the same plural form selection.  This code does this:

         #include <inttypes.h>
         uintmax_t nbytes = ...;
         printf (ngettext ("The file has %"PRIuMAX" byte.",
                           "The file has %"PRIuMAX" bytes.",
                           (nbytes > ULONG_MAX
                            ? (nbytes % 1000000) + 1000000
                            : nbytes)),
                 nbytes);




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/5] index-pack: correct "len" type in unpack_data()
  2016-07-06 16:04         ` Junio C Hamano
@ 2016-07-06 16:08           ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-07-06 16:08 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Johannes Sixt, Git Mailing List, Christoph Michelbach

Junio C Hamano <gitster@pobox.com> writes:

> Duy Nguyen <pclouds@gmail.com> writes:
>
>> What can I say, gettext is smart. In gc.c we already have this
>>
>> die(_("gc is already running on machine '%s' pid %"PRIuMAX" (use
>> --force if not)"), name, (uintmax_t)pid);
>>
>> and vi.po shows
>>
>> #: builtin/gc.c:397
>> #, c-format
>> msgid ""
>> "gc is already running on machine '%s' pid %<PRIuMAX> (use --force if not)"
>
> And translators are expcted to keep "%<PRIuMAX>" in their translated
> .po files, and whatever translates .po into .mo knows what
> %<PRIuMAX> should be mapped to?
>
> That is surprising.

gettext.info answered me ;-)

    The ‘gettext’ tools and library have special support for these
    ‘<inttypes.h>’ macros.  You can therefore simply write

         printf (gettext ("The amount is %0" PRId64 "\n"), number);

    The PO file will contain the string "The amount is %0<PRId64>\n".  The
    translators will provide a translation containing "%0<PRId64>" as well,
    and at runtime the ‘gettext’ function’s result will contain the
    appropriate constant string, "d" or "ld" or "lld".

       This works only for the predefined ‘<inttypes.h>’ macro...

Nice.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems
       [not found]     ` <CACsJy8BDQbanGsf=3z3K-OuH0++EuqQFEB22udXJT+WZnFKSBg@mail.gmail.com>
@ 2016-07-06 18:02       ` Christoph Michelbach
  2016-07-06 18:54         ` Duy Nguyen
  0 siblings, 1 reply; 39+ messages in thread
From: Christoph Michelbach @ 2016-07-06 18:02 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

I now created a repo on a 64 bit system (same command as before), then
duplicated it. One copy I applied git gc to on the 64 bit host system,
the other copy I gave to my 32 bit virtual machine to apply git gc to
it.

The 64 bit host uses git from the Ubuntu repositories, the 32 bit
virtual machine uses git 2.9 from github with the patches applied.

git gc worked without problems on the host but I got

frank@frank-virtual-16-04-32-bit:~/g$ git gc
Counting objects: 6, done.
Compressing objects: 100% (3/3), done.
error: bad packed object CRC for
f595ad71c1a1ecc312ddcb32a84a4bfc4a2ed1c8
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 0 (delta 0)
error: failed to validate delta base reference at offset 342896 from
.git/objects/pack/pack-630b5a3f28cd9d02a546462bf0d0bc640ffde784.pack
error: bad offset for revindex
error: failed to read object 4246d27f8e0149d45687b0cc23bc29a67f1f0c79
at offset 342887 from .git/objects/pack/pack-
630b5a3f28cd9d02a546462bf0d0bc640ffde784.pack
fatal: packed object 4246d27f8e0149d45687b0cc23bc29a67f1f0c79 (stored
in .git/objects/pack/pack-
630b5a3f28cd9d02a546462bf0d0bc640ffde784.pack) is corrupt
error: failed to run prune
frank@frank-virtual-16-04-32-bit:~/g$ 

on the virtual machine.

Not including the mailing list in CC wasn't intended.

-- 
With kind regards
Christoph Michelbach

On Wed, 2016-07-06 at 17:23 +0200, Duy Nguyen wrote:
> On Wed, Jul 6, 2016 at 12:14 AM, Christoph Michelbach
> <michelbach94@gmail.com> wrote:
> > 
> > I now tried git gc again and it failed (in a different way, though;
> > and
> > the error message only appeared once git gc terminated).
> > 
> > Full input and output:
> > 
> > christoph@virt-16-04-32-bit:~$ mkdir test && cd test && git init &&
> > touch someFile && git add someFile && git commit -m "Initial
> > commit."
> > && dd if=/dev/urandom of=bigBinaryFile bs=1MB count=4294 && git add
> > bigBinaryFile && git commit -m "Introduced big biary file."
> > Initialized empty Git repository in /home/christoph/test/.git/
> > [master (root-commit) 20507ef] Initial commit.
> >  1 file changed, 0 insertions(+), 0 deletions(-)
> >  create mode 100644 someFile
> > 4294+0 records in
> > 4294+0 records out
> > 4294000000 bytes (4.3 GB, 4.0 GiB) copied, 435.236 s, 9.9 MB/s
> > [master 88e5dbb] Introduced big biary file.
> >  1 file changed, 0 insertions(+), 0 deletions(-)
> >  create mode 100644 bigBinaryFile
> > christoph@virt-16-04-32-bit:~/test$ git gc
> > Counting objects: 6, done.
> > Compressing objects: 100% (3/3), done.
> > Writing objects: 100% (6/6), done.
> > Total 6 (delta 0), reused 1 (delta 0)
> > error: inflate: data stream error (incorrect header check)
> > error: failed to read object
> > 705f438ccb845871ffba9d4b56f16ac763652937
> Sigh.. I'll try again :) BTW if you have a 64-bit machine too, try
> create the repo with it (so we are sure the repo is not corrupted in
> the first place) then you can run tests like this on 32-bit systems.
> The previous "bad crc" message was a fault in the code, not the data.
> 
> PS. I have just let gc run till the end (last time I tried
> pack-objects, which is only part of gc) and got a(nother) error.
> Checking... BTW, you should send these mails to git@ as well, no need
> for private message exchange.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems
  2016-07-06 18:02       ` Christoph Michelbach
@ 2016-07-06 18:54         ` Duy Nguyen
  0 siblings, 0 replies; 39+ messages in thread
From: Duy Nguyen @ 2016-07-06 18:54 UTC (permalink / raw)
  To: Christoph Michelbach; +Cc: Git Mailing List

On Wed, Jul 6, 2016 at 8:02 PM, Christoph Michelbach
<michelbach94@gmail.com> wrote:
> I now created a repo on a 64 bit system (same command as before), then
> duplicated it. One copy I applied git gc to on the 64 bit host system,
> the other copy I gave to my 32 bit virtual machine to apply git gc to
> it.
>
> The 64 bit host uses git from the Ubuntu repositories, the 32 bit
> virtual machine uses git 2.9 from github with the patches applied.
>
> git gc worked without problems on the host but I got
>
> frank@frank-virtual-16-04-32-bit:~/g$ git gc
> Counting objects: 6, done.
> Compressing objects: 100% (3/3), done.
> error: bad packed object CRC for
> f595ad71c1a1ecc312ddcb32a84a4bfc4a2ed1c8
> Writing objects: 100% (6/6), done.
> Total 6 (delta 0), reused 0 (delta 0)
> error: failed to validate delta base reference at offset 342896 from
> .git/objects/pack/pack-630b5a3f28cd9d02a546462bf0d0bc640ffde784.pack

Yup. So far I can only say that pack-objects generates incorrect pack
:-(  I hope it's the possible 6/5 patch I mentioned earlier but I'm
not sure. Will continue tomorrow.

> error: bad offset for revindex
> error: failed to read object 4246d27f8e0149d45687b0cc23bc29a67f1f0c79
> at offset 342887 from .git/objects/pack/pack-
> 630b5a3f28cd9d02a546462bf0d0bc640ffde784.pack
> fatal: packed object 4246d27f8e0149d45687b0cc23bc29a67f1f0c79 (stored
> in .git/objects/pack/pack-
> 630b5a3f28cd9d02a546462bf0d0bc640ffde784.pack) is corrupt
> error: failed to run prune
> frank@frank-virtual-16-04-32-bit:~/g$
>
> on the virtual machine.
>
> Not including the mailing list in CC wasn't intended.
>
> --
> With kind regards
> Christoph Michelbach
>
> On Wed, 2016-07-06 at 17:23 +0200, Duy Nguyen wrote:
>> On Wed, Jul 6, 2016 at 12:14 AM, Christoph Michelbach
>> <michelbach94@gmail.com> wrote:
>> >
>> > I now tried git gc again and it failed (in a different way, though;
>> > and
>> > the error message only appeared once git gc terminated).
>> >
>> > Full input and output:
>> >
>> > christoph@virt-16-04-32-bit:~$ mkdir test && cd test && git init &&
>> > touch someFile && git add someFile && git commit -m "Initial
>> > commit."
>> > && dd if=/dev/urandom of=bigBinaryFile bs=1MB count=4294 && git add
>> > bigBinaryFile && git commit -m "Introduced big biary file."
>> > Initialized empty Git repository in /home/christoph/test/.git/
>> > [master (root-commit) 20507ef] Initial commit.
>> >  1 file changed, 0 insertions(+), 0 deletions(-)
>> >  create mode 100644 someFile
>> > 4294+0 records in
>> > 4294+0 records out
>> > 4294000000 bytes (4.3 GB, 4.0 GiB) copied, 435.236 s, 9.9 MB/s
>> > [master 88e5dbb] Introduced big biary file.
>> >  1 file changed, 0 insertions(+), 0 deletions(-)
>> >  create mode 100644 bigBinaryFile
>> > christoph@virt-16-04-32-bit:~/test$ git gc
>> > Counting objects: 6, done.
>> > Compressing objects: 100% (3/3), done.
>> > Writing objects: 100% (6/6), done.
>> > Total 6 (delta 0), reused 1 (delta 0)
>> > error: inflate: data stream error (incorrect header check)
>> > error: failed to read object
>> > 705f438ccb845871ffba9d4b56f16ac763652937
>> Sigh.. I'll try again :) BTW if you have a 64-bit machine too, try
>> create the repo with it (so we are sure the repo is not corrupted in
>> the first place) then you can run tests like this on 32-bit systems.
>> The previous "bad crc" message was a fault in the code, not the data.
>>
>> PS. I have just let gc run till the end (last time I tried
>> pack-objects, which is only part of gc) and got a(nother) error.
>> Checking... BTW, you should send these mails to git@ as well, no need
>> for private message exchange.



-- 
Duy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
                     ` (6 preceding siblings ...)
       [not found]   ` <1467756891.4798.1.camel@gmail.com>
@ 2016-07-10 10:41   ` Duy Nguyen
  2016-07-10 10:42   ` [PATCH 6/5] pack-objects: do not truncate result in-pack object size " Nguyễn Thái Ngọc Duy
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Duy Nguyen @ 2016-07-10 10:41 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Junio C Hamano, Christoph Michelbach,
	Nguyễn Thái Ngọc Duy

On Tue, Jul 5, 2016 at 7:05 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> Most of the rest seems harmless, except for the local variable "size"
> in builtin/pack-objects.c:write_one(). I might send 6/5 for that one.

This truncated "size" produces incorrect .idx file and makes everybody
else fail to read the (correct) .pack file. So 6/5 is coming shortly.

I also add 7/5 to be able to run fsck on 32-bit systems, but 7/5 could
help improve performance a bit on 64-bit systems too.
-- 
Duy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 6/5] pack-objects: do not truncate result in-pack object size on 32-bit systems
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
                     ` (7 preceding siblings ...)
  2016-07-10 10:41   ` Duy Nguyen
@ 2016-07-10 10:42   ` Nguyễn Thái Ngọc Duy
  2016-07-10 10:45   ` [PATCH 7/5] fsck: use streaming interface for large blobs in pack Nguyễn Thái Ngọc Duy
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-10 10:42 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

A typical diff will not show what's going on and you need to see full
functions. The core code is like this, at the end of of write_one()

	e->idx.offset = *offset;
	size = write_object(f, e, *offset);
	if (!size) {
		e->idx.offset = recursing;
		return WRITE_ONE_BREAK;
	}
	written_list[nr_written++] = &e->idx;

	/* make sure off_t is sufficiently large not to wrap */
	if (signed_add_overflows(*offset, size))
		die("pack too large for current definition of off_t");
	*offset += size;

Here we can see that the in-pack object size is returned by
write_object (or indirectly by write_reuse_object). And it's used to
calculate object offsets, which end up in the pack index file,
generated at the end.

If "size" overflows (on 32-bit sytems, unsigned long is 32-bit while
off_t can be 64-bit), we got wrong offsets and produce incorrect .idx
file, which may make it look like the .pack file is corrupted.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/pack-objects.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index df6ecd5..f854ca4 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -341,8 +341,8 @@ static unsigned long write_no_reuse_object(struct sha1file *f, struct object_ent
 }
 
 /* Return 0 if we will bust the pack-size limit */
-static unsigned long write_reuse_object(struct sha1file *f, struct object_entry *entry,
-					unsigned long limit, int usable_delta)
+static off_t write_reuse_object(struct sha1file *f, struct object_entry *entry,
+				unsigned long limit, int usable_delta)
 {
 	struct packed_git *p = entry->in_pack;
 	struct pack_window *w_curs = NULL;
@@ -415,11 +415,12 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry
 }
 
 /* Return 0 if we will bust the pack-size limit */
-static unsigned long write_object(struct sha1file *f,
-				  struct object_entry *entry,
-				  off_t write_offset)
+static off_t write_object(struct sha1file *f,
+			  struct object_entry *entry,
+			  off_t write_offset)
 {
-	unsigned long limit, len;
+	unsigned long limit;
+	off_t len;
 	int usable_delta, to_reuse;
 
 	if (!pack_to_stdout)
@@ -491,7 +492,7 @@ static enum write_one_status write_one(struct sha1file *f,
 				       struct object_entry *e,
 				       off_t *offset)
 {
-	unsigned long size;
+	off_t size;
 	int recursing;
 
 	/*
-- 
2.8.2.537.g0965dd9


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 7/5] fsck: use streaming interface for large blobs in pack
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
                     ` (8 preceding siblings ...)
  2016-07-10 10:42   ` [PATCH 6/5] pack-objects: do not truncate result in-pack object size " Nguyễn Thái Ngọc Duy
@ 2016-07-10 10:45   ` Nguyễn Thái Ngọc Duy
  2016-07-12 18:39     ` Junio C Hamano
  2016-07-12 17:07   ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Junio C Hamano
                     ` (2 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-10 10:45 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

For blobs, we want to make sure the on-disk data is not corrupted
(i.e. can be inflated and produce the expected SHA-1). Blob content is
opaque, there's nothing else inside to check for.

For really large blobs, we may want to avoid unpacking the entire blob
in memory, just to check whether it produces the same SHA-1. On 32-bit
systems, we may not have enough virtual address space for such memory
allocation. And even on 64-bit where it's not a problem, allocating a
lot more memory could result in kicking other parts of systems to swap
file, generating lots of I/O and slowing everything down.

For this particular operation, not unpacking the blob and letting
check_sha1_signature, which supports streaming interface, do the job
is sufficient. check_sha1_signature() is not shown in the diff,
unfortunately. But if will be called when "data_valid && !data" is
false.

We will call the callback function "fn" with NULL as "data". The only
callback of this function is fsck_obj_buffer(), which does not touch
"data" at all if it's a blob.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 pack-check.c | 15 +++++++++++++--
 pack.h       |  1 +
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/pack-check.c b/pack-check.c
index 1da89a4..0777766 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -105,6 +105,8 @@ static int verify_packfile(struct packed_git *p,
 		void *data;
 		enum object_type type;
 		unsigned long size;
+		off_t curpos;
+		int data_valid = 0;
 
 		if (p->index_version > 1) {
 			off_t offset = entries[i].offset;
@@ -116,8 +118,17 @@ static int verify_packfile(struct packed_git *p,
 					    sha1_to_hex(entries[i].sha1),
 					    p->pack_name, (uintmax_t)offset);
 		}
-		data = unpack_entry(p, entries[i].offset, &type, &size);
-		if (!data)
+
+		curpos = entries[i].offset;
+		type = unpack_object_header(p, w_curs, &curpos, &size);
+		unuse_pack(w_curs);
+
+		if (type != OBJ_BLOB || size < big_file_threshold) {
+			data = unpack_entry(p, entries[i].offset, &type, &size);
+			data_valid = 1;
+		}
+
+		if (data_valid && !data)
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    sha1_to_hex(entries[i].sha1), p->pack_name,
 				    (uintmax_t)entries[i].offset);
diff --git a/pack.h b/pack.h
index 3223f5a..0e77429 100644
--- a/pack.h
+++ b/pack.h
@@ -74,6 +74,7 @@ struct pack_idx_entry {
 
 
 struct progress;
+/* Note, the data argument could be NULL if object type is blob */
 typedef int (*verify_fn)(const unsigned char*, enum object_type, unsigned long, void*, int*);
 
 extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, const unsigned char *sha1);
-- 
2.8.2.537.g0965dd9


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
                     ` (9 preceding siblings ...)
  2016-07-10 10:45   ` [PATCH 7/5] fsck: use streaming interface for large blobs in pack Nguyễn Thái Ngọc Duy
@ 2016-07-12 17:07   ` Junio C Hamano
  2016-07-12 18:48     ` Junio C Hamano
  2016-07-12 20:38   ` Junio C Hamano
  2016-07-13 15:43   ` [PATCH v2 0/7] " Nguyễn Thái Ngọc Duy
  12 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-07-12 17:07 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, michelbach94

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Since I now could reproduce the problem that Christoph showed, I
> decided to send the good patches out. To sum up, we use "unsigned
> long" in some places related to file size. On 32-bit systems, it's
> limited to 32 bits even though the system can handle files larger than
> that (off_t is 64-bit). This fixes it.
>
> clang -Wshorten-64-to-32 is very helpful to spot these problems. I
> have a couple more patches to clean all these warnings, but some need
> more code study to see what is the right way to do.
>
> Most of the rest seems harmless, except for the local variable "size"
> in builtin/pack-objects.c:write_one(). I might send 6/5 for that one.


Thanks.

> Nguyễn Thái Ngọc Duy (5):
>   pack-objects: pass length to check_pack_crc() without truncation
>   sha1_file.c: use type off_t* for object_info->disk_sizep
>   index-pack: correct "len" type in unpack_data()
>   index-pack: report correct bad object offsets even if they are large
>   index-pack: correct "offset" type in unpack_entry_data()
>
>  builtin/cat-file.c     |  4 ++--
>  builtin/index-pack.c   | 23 ++++++++++++-----------
>  builtin/pack-objects.c |  2 +-
>  cache.h                |  2 +-
>  sha1_file.c            |  2 +-
>  5 files changed, 17 insertions(+), 16 deletions(-)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/5] pack-objects: pass length to check_pack_crc() without truncation
  2016-07-05 17:05   ` [PATCH 1/5] pack-objects: pass length to check_pack_crc() without truncation Nguyễn Thái Ngọc Duy
@ 2016-07-12 17:16     ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-07-12 17:16 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, michelbach94

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 8f5e358..df6ecd5 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -349,7 +349,7 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry
>  	struct revindex_entry *revidx;
>  	off_t offset;
>  	enum object_type type = entry->type;
> -	unsigned long datalen;
> +	off_t datalen;
>  	unsigned char header[10], dheader[10];
>  	unsigned hdrlen;
>  
> diff --git a/sha1_file.c b/sha1_file.c
> index d5e1121..cb571ac 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2281,7 +2281,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
>  
>  		if (do_check_packed_object_crc && p->index_version > 1) {
>  			struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
> -			unsigned long len = revidx[1].offset - obj_offset;
> +			off_t len = revidx[1].offset - obj_offset;
>  			if (check_pack_crc(p, &w_curs, obj_offset, len, revidx->nr)) 

It is sad that check_pack_crc() already knew to take off_t here and
we (unknowingly) had a deliberate truncation by using ulong.
Looks obvioiusly correct.

Thanks.

{
>  				const unsigned char *sha1 =
>  					nth_packed_object_sha1(p, revidx->nr);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/5] sha1_file.c: use type off_t* for object_info->disk_sizep
  2016-07-05 17:05   ` [PATCH 2/5] sha1_file.c: use type off_t* for object_info->disk_sizep Nguyễn Thái Ngọc Duy
@ 2016-07-12 17:20     ` Junio C Hamano
  2016-07-12 19:26     ` Junio C Hamano
  1 sibling, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-07-12 17:20 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, michelbach94

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This field, filled by sha1_object_info() contains the on-disk size of
> an object, which could go over 4GB limit of unsigned long on 32-bit
> systems. Use off_t for it instead and update all callers.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  builtin/cat-file.c | 4 ++--
>  cache.h            | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 618103f..5b34bd0 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -131,7 +131,7 @@ struct expand_data {
>  	unsigned char sha1[20];
>  	enum object_type type;
>  	unsigned long size;
> -	unsigned long disk_size;
> +	off_t disk_size;
>  	const char *rest;
>  	unsigned char delta_base_sha1[20];
>  
> @@ -191,7 +191,7 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
>  		if (data->mark_query)
>  			data->info.disk_sizep = &data->disk_size;
>  		else
> -			strbuf_addf(sb, "%lu", data->disk_size);
> +			strbuf_addf(sb, "%"PRIuMAX, data->disk_size);

Doesn't this now need a cast?

>  	} else if (is_atom("rest", atom, len)) {
>  		if (data->mark_query)
>  			data->split_on_whitespace = 1;
> diff --git a/cache.h b/cache.h
> index c73becb..a4465cb 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1508,7 +1508,7 @@ struct object_info {
>  	/* Request */
>  	enum object_type *typep;
>  	unsigned long *sizep;
> -	unsigned long *disk_sizep;
> +	off_t *disk_sizep;
>  	unsigned char *delta_base_sha1;
>  	struct strbuf *typename;

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/5] index-pack: report correct bad object offsets even if they are large
  2016-07-05 17:05   ` [PATCH 4/5] index-pack: report correct bad object offsets even if they are large Nguyễn Thái Ngọc Duy
@ 2016-07-12 17:24     ` Junio C Hamano
  2016-07-12 19:27     ` Junio C Hamano
  1 sibling, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-07-12 17:24 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, michelbach94

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Use the right type for offsets in this case, off_t, which makes a
> difference on 32-bit systems with large file support, and change
> formatting code accordingly.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  builtin/index-pack.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index cafaab7..73f7cd2 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -338,10 +338,10 @@ static void parse_pack_header(void)
>  	use(sizeof(struct pack_header));
>  }
>  
> -static NORETURN void bad_object(unsigned long offset, const char *format,
> +static NORETURN void bad_object(off_t offset, const char *format,
>  		       ...) __attribute__((format (printf, 2, 3)));
>  
> -static NORETURN void bad_object(unsigned long offset, const char *format, ...)
> +static NORETURN void bad_object(off_t offset, const char *format, ...)
>  {
>  	va_list params;
>  	char buf[1024];
> @@ -349,7 +349,8 @@ static NORETURN void bad_object(unsigned long offset, const char *format, ...)
>  	va_start(params, format);
>  	vsnprintf(buf, sizeof(buf), format, params);
>  	va_end(params);
> -	die(_("pack has bad object at offset %lu: %s"), offset, buf);
> +	die(_("pack has bad object at offset %"PRIiMAX": %s"),
> +	    (intmax_t)offset, buf);
>  }

We seem to have a fallback definition only for PRIuMAX.

off_t is supposed to be a signed integer type [*1*], but we know
this is a positive offset when we issue this warning, so PRIuMAX
would probably be fine.


[Reference]

*1* http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/5] fsck: use streaming interface for large blobs in pack
  2016-07-10 10:45   ` [PATCH 7/5] fsck: use streaming interface for large blobs in pack Nguyễn Thái Ngọc Duy
@ 2016-07-12 18:39     ` Junio C Hamano
  2016-07-12 19:06       ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-07-12 18:39 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, michelbach94

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> For this particular operation, not unpacking the blob and letting
> check_sha1_signature, which supports streaming interface, do the job
> is sufficient.

That reasoning is sound, I would think, but...

> We will call the callback function "fn" with NULL as "data". The only
> callback of this function is fsck_obj_buffer(), which does not touch
> "data" at all if it's a blob.

... this may be a bit too magical that the assumption needs to be
left in an in-code comment to warn people against temptation to
using "data" over there, not just where the type of verify_fn is
declared.  Essentially, you are changing the expectation from
existing functions of that type.

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  pack-check.c | 15 +++++++++++++--
>  pack.h       |  1 +
>  2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/pack-check.c b/pack-check.c
> index 1da89a4..0777766 100644
> --- a/pack-check.c
> +++ b/pack-check.c
> @@ -105,6 +105,8 @@ static int verify_packfile(struct packed_git *p,
>  		void *data;
>  		enum object_type type;
>  		unsigned long size;
> +		off_t curpos;
> +		int data_valid = 0;
>  
>  		if (p->index_version > 1) {
>  			off_t offset = entries[i].offset;
> @@ -116,8 +118,17 @@ static int verify_packfile(struct packed_git *p,
>  					    sha1_to_hex(entries[i].sha1),
>  					    p->pack_name, (uintmax_t)offset);
>  		}
> -		data = unpack_entry(p, entries[i].offset, &type, &size);
> -		if (!data)
> +
> +		curpos = entries[i].offset;
> +		type = unpack_object_header(p, w_curs, &curpos, &size);
> +		unuse_pack(w_curs);

This made me wonder where this "unuse" comes from; w_curs's in-use
count is incremented by calling unpack_object_header() and we are
done with that access at this point, hence we have unuse here.

> +		if (type != OBJ_BLOB || size < big_file_threshold) {
> +			data = unpack_entry(p, entries[i].offset, &type, &size);
> +			data_valid = 1;

This codepath slurps the data in-core to hash and data is later
freed, i.e. non-blob objects and small ones are handled as before.

> +		}
> +
> +		if (data_valid && !data)
>  			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
>  				    sha1_to_hex(entries[i].sha1), p->pack_name,
>  				    (uintmax_t)entries[i].offset);

Otherwise, we'd go to check_sha1_signature() with map==NULL.  And
that is exactly what we want---map==NULL is the way we tell the
function to use the streaming interface to check.

Good.

> diff --git a/pack.h b/pack.h
> index 3223f5a..0e77429 100644
> --- a/pack.h
> +++ b/pack.h
> @@ -74,6 +74,7 @@ struct pack_idx_entry {
>  
>  
>  struct progress;
> +/* Note, the data argument could be NULL if object type is blob */
>  typedef int (*verify_fn)(const unsigned char*, enum object_type, unsigned long, void*, int*);
>  
>  extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, const unsigned char *sha1);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems
  2016-07-12 17:07   ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Junio C Hamano
@ 2016-07-12 18:48     ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-07-12 18:48 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, michelbach94

Junio C Hamano <gitster@pobox.com> writes:

> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> Since I now could reproduce the problem that Christoph showed, I
>> decided to send the good patches out. To sum up, we use "unsigned
>> long" in some places related to file size. On 32-bit systems, it's
>> limited to 32 bits even though the system can handle files larger than
>> that (off_t is 64-bit). This fixes it.
>>
>> clang -Wshorten-64-to-32 is very helpful to spot these problems. I
>> have a couple more patches to clean all these warnings, but some need
>> more code study to see what is the right way to do.
>>
>> Most of the rest seems harmless, except for the local variable "size"
>> in builtin/pack-objects.c:write_one(). I might send 6/5 for that one.
>
>
> Thanks.

All looked nicely done.  I'll queue with a few SQUASH??? (please ack
or reject them) in between on 'pu' and push the result out later
today.

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/5] fsck: use streaming interface for large blobs in pack
  2016-07-12 18:39     ` Junio C Hamano
@ 2016-07-12 19:06       ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-07-12 19:06 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, michelbach94

Junio C Hamano <gitster@pobox.com> writes:

>> +		if (type != OBJ_BLOB || size < big_file_threshold) {
>> +			data = unpack_entry(p, entries[i].offset, &type, &size);
>> +			data_valid = 1;
>
> This codepath slurps the data in-core to hash and data is later
> freed, i.e. non-blob objects and small ones are handled as before.
>
>> +		}
>> +
>> +		if (data_valid && !data)
>>  			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
>>  				    sha1_to_hex(entries[i].sha1), p->pack_name,
>>  				    (uintmax_t)entries[i].offset);
>
> Otherwise, we'd go to check_sha1_signature() with map==NULL.  And
> that is exactly what we want---map==NULL is the way we tell the
> function to use the streaming interface to check.
>
> Good.

But not quite correct yet ;-)

Here is what I'd propose to squash in.  Even though data_valid
protects the above if() condition from accessing an otherwise
uninitialized "data", the call to check_sha1_signature() we have
later will get noticed by the compiler that "data" is passed
uninitialized.

More importantly, uninitialized data passed may be non-NULL, in
which case it would not trigger the streaming interface.

 pack-check.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/pack-check.c b/pack-check.c
index 0777766..ac263fd 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -123,7 +123,14 @@ static int verify_packfile(struct packed_git *p,
 		type = unpack_object_header(p, w_curs, &curpos, &size);
 		unuse_pack(w_curs);
 
-		if (type != OBJ_BLOB || size < big_file_threshold) {
+		if (type == OBJ_BLOB && big_file_threshold <= size) {
+			/*
+			 * Let check_sha1_signature() to check it with
+			 * the streaming interface; no point slurping
+			 * the data in-core only to discard.
+			 */
+			data = NULL;
+		} else {
 			data = unpack_entry(p, entries[i].offset, &type, &size);
 			data_valid = 1;
 		}

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/5] sha1_file.c: use type off_t* for object_info->disk_sizep
  2016-07-05 17:05   ` [PATCH 2/5] sha1_file.c: use type off_t* for object_info->disk_sizep Nguyễn Thái Ngọc Duy
  2016-07-12 17:20     ` Junio C Hamano
@ 2016-07-12 19:26     ` Junio C Hamano
  1 sibling, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-07-12 19:26 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, michelbach94

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> -			strbuf_addf(sb, "%lu", data->disk_size);
> +			strbuf_addf(sb, "%"PRIuMAX, data->disk_size);

Subject: [PATCH] SQUASH???

---
 builtin/cat-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 58ad284..13ed944 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -184,7 +184,7 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
 		if (data->mark_query)
 			data->info.disk_sizep = &data->disk_size;
 		else
-			strbuf_addf(sb, "%"PRIuMAX, data->disk_size);
+			strbuf_addf(sb, "%"PRIuMAX, (uintmax_t)data->disk_size);
 	} else if (is_atom("rest", atom, len)) {
 		if (data->mark_query)
 			data->split_on_whitespace = 1;
-- 
2.9.1-500-g4c1e1e0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/5] index-pack: report correct bad object offsets even if they are large
  2016-07-05 17:05   ` [PATCH 4/5] index-pack: report correct bad object offsets even if they are large Nguyễn Thái Ngọc Duy
  2016-07-12 17:24     ` Junio C Hamano
@ 2016-07-12 19:27     ` Junio C Hamano
  1 sibling, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-07-12 19:27 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, michelbach94

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

>  	va_end(params);
> -	die(_("pack has bad object at offset %lu: %s"), offset, buf);
> +	die(_("pack has bad object at offset %"PRIiMAX": %s"),
> +	    (intmax_t)offset, buf);

Subject: [PATCH] SQUASH???

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 73f7cd2..e2d8ae4 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -349,8 +349,8 @@ static NORETURN void bad_object(off_t offset, const char *format, ...)
 	va_start(params, format);
 	vsnprintf(buf, sizeof(buf), format, params);
 	va_end(params);
-	die(_("pack has bad object at offset %"PRIiMAX": %s"),
-	    (intmax_t)offset, buf);
+	die(_("pack has bad object at offset %"PRIuMAX": %s"),
+	    (uintmax_t)offset, buf);
 }
 
 static inline struct thread_local *get_thread_data(void)
-- 
2.9.1-500-g4c1e1e0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
                     ` (10 preceding siblings ...)
  2016-07-12 17:07   ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Junio C Hamano
@ 2016-07-12 20:38   ` Junio C Hamano
  2016-07-13  6:01     ` Duy Nguyen
  2016-07-13 15:43   ` [PATCH v2 0/7] " Nguyễn Thái Ngọc Duy
  12 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-07-12 20:38 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, michelbach94

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Since I now could reproduce the problem that Christoph showed, I
> decided to send the good patches out. To sum up, we use "unsigned
> long" in some places related to file size. On 32-bit systems, it's
> limited to 32 bits even though the system can handle files larger than
> that (off_t is 64-bit). This fixes it.
>
> clang -Wshorten-64-to-32 is very helpful to spot these problems. I
> have a couple more patches to clean all these warnings, but some need
> more code study to see what is the right way to do.
>
> Most of the rest seems harmless, except for the local variable "size"
> in builtin/pack-objects.c:write_one(). I might send 6/5 for that one.

With 1-7/5 it appears t1050 is broken?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems
  2016-07-12 20:38   ` Junio C Hamano
@ 2016-07-13  6:01     ` Duy Nguyen
  0 siblings, 0 replies; 39+ messages in thread
From: Duy Nguyen @ 2016-07-13  6:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Christoph Michelbach

On Tue, Jul 12, 2016 at 10:38 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> Since I now could reproduce the problem that Christoph showed, I
>> decided to send the good patches out. To sum up, we use "unsigned
>> long" in some places related to file size. On 32-bit systems, it's
>> limited to 32 bits even though the system can handle files larger than
>> that (off_t is 64-bit). This fixes it.
>>
>> clang -Wshorten-64-to-32 is very helpful to spot these problems. I
>> have a couple more patches to clean all these warnings, but some need
>> more code study to see what is the right way to do.
>>
>> Most of the rest seems harmless, except for the local variable "size"
>> in builtin/pack-objects.c:write_one(). I might send 6/5 for that one.
>
> With 1-7/5 it appears t1050 is broken?

Oops, sorry I focused on testing big repos and forgot the test suite.
But hehe.. it's a good thing. That test was added by 735efde, where
fsck could keep on going after failing to inflate some blobs (instead
of hitting malloc failure and abort). Now it's even better, fsck
passes. I'll fix that up and resend with your squashes in.
-- 
Duy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 0/7] Number truncation with 4+ GB files on 32-bit systems
  2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
                     ` (11 preceding siblings ...)
  2016-07-12 20:38   ` Junio C Hamano
@ 2016-07-13 15:43   ` Nguyễn Thái Ngọc Duy
  2016-07-13 15:43     ` [PATCH v2 1/7] pack-objects: pass length to check_pack_crc() without truncation Nguyễn Thái Ngọc Duy
                       ` (7 more replies)
  12 siblings, 8 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-13 15:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

A diff from nd/pack-ofs-4gb-limit can explain the changes better than
me.

I did not add PRIdMAX or similar because that carries a risk to exotic
platforms that people rarely test. Just casting to unsigned should be
fine.

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 55eac75..b08bc8b 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -356,6 +356,10 @@ static int fsck_sha1(const unsigned char *sha1)
 static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type,
 			   unsigned long size, void *buffer, int *eaten)
 {
+	/*
+	 * Note, buffer may be NULL if type is OBJ_BLOB. See
+	 * verify_packfile(), data_valid variable for details.
+	 */
 	struct object *obj;
 	obj = parse_object_buffer(sha1, type, size, buffer, eaten);
 	if (!obj) {
diff --git a/pack-check.c b/pack-check.c
index 14e8cb0..d123846 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -106,7 +106,7 @@ static int verify_packfile(struct packed_git *p,
 		enum object_type type;
 		unsigned long size;
 		off_t curpos;
-		int data_valid = 0;
+		int data_valid;
 
 		if (p->index_version > 1) {
 			off_t offset = entries[i].offset;
@@ -130,6 +130,7 @@ static int verify_packfile(struct packed_git *p,
 			 * the data in-core only to discard.
 			 */
 			data = NULL;
+			data_valid = 0;
 		} else {
 			data = unpack_entry(p, entries[i].offset, &type, &size);
 			data_valid = 1;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index f9f3d13..096dbff 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -177,10 +177,9 @@ test_expect_success 'zip achiving, deflate' '
 	git archive --format=zip HEAD >/dev/null
 '
 
-test_expect_success 'fsck' '
-	test_must_fail git fsck 2>err &&
-	n=$(grep "error: attempting to allocate .* over limit" err | wc -l) &&
-	test "$n" -gt 1
+test_expect_success 'fsck large blobs' '
+	git fsck 2>err &&
+	test_must_be_empty err
 '
 
 test_done

Nguyễn Thái Ngọc Duy (7):
  pack-objects: pass length to check_pack_crc() without truncation
  sha1_file.c: use type off_t* for object_info->disk_sizep
  index-pack: correct "len" type in unpack_data()
  index-pack: report correct bad object offsets even if they are large
  index-pack: correct "offset" type in unpack_entry_data()
  pack-objects: do not truncate result in-pack object size on 32-bit systems
  fsck: use streaming interface for large blobs in pack

 builtin/cat-file.c     |  4 ++--
 builtin/fsck.c         |  4 ++++
 builtin/index-pack.c   | 23 ++++++++++++-----------
 builtin/pack-objects.c | 17 +++++++++--------
 cache.h                |  2 +-
 pack-check.c           | 23 +++++++++++++++++++++--
 pack.h                 |  1 +
 sha1_file.c            |  2 +-
 t/t1050-large.sh       |  7 +++----
 9 files changed, 54 insertions(+), 29 deletions(-)

-- 
2.9.1.564.gb2f7278


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 1/7] pack-objects: pass length to check_pack_crc() without truncation
  2016-07-13 15:43   ` [PATCH v2 0/7] " Nguyễn Thái Ngọc Duy
@ 2016-07-13 15:43     ` Nguyễn Thái Ngọc Duy
  2016-07-13 15:43     ` [PATCH v2 2/7] sha1_file.c: use type off_t* for object_info->disk_sizep Nguyễn Thái Ngọc Duy
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-13 15:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

On 32 bit systems with large file support, unsigned long is 32-bit
while the two offsets in the subtraction expression (pack-objects has
the exact same expression as in sha1_file.c but not shown in diff) are
in 64-bit. If an in-pack object is larger than 2^32 len/datalen is
truncated and we get a misleading "error: bad packed object CRC for
..." as a result.

Use off_t for len and datalen. check_pack_crc() already accepts this
argument as off_t and can deal with 4+ GB.

Noticed-by: Christoph Michelbach <michelbach94@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c | 2 +-
 sha1_file.c            | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b6664ce..a3a98c5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -349,7 +349,7 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry
 	struct revindex_entry *revidx;
 	off_t offset;
 	enum object_type type = entry->type;
-	unsigned long datalen;
+	off_t datalen;
 	unsigned char header[10], dheader[10];
 	unsigned hdrlen;
 
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..cd9b560 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2282,7 +2282,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 
 		if (do_check_packed_object_crc && p->index_version > 1) {
 			struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
-			unsigned long len = revidx[1].offset - obj_offset;
+			off_t len = revidx[1].offset - obj_offset;
 			if (check_pack_crc(p, &w_curs, obj_offset, len, revidx->nr)) {
 				const unsigned char *sha1 =
 					nth_packed_object_sha1(p, revidx->nr);
-- 
2.9.1.564.gb2f7278


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 2/7] sha1_file.c: use type off_t* for object_info->disk_sizep
  2016-07-13 15:43   ` [PATCH v2 0/7] " Nguyễn Thái Ngọc Duy
  2016-07-13 15:43     ` [PATCH v2 1/7] pack-objects: pass length to check_pack_crc() without truncation Nguyễn Thái Ngọc Duy
@ 2016-07-13 15:43     ` Nguyễn Thái Ngọc Duy
  2016-07-13 15:44     ` [PATCH v2 3/7] index-pack: correct "len" type in unpack_data() Nguyễn Thái Ngọc Duy
                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-13 15:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

This field, filled by sha1_object_info() contains the on-disk size of
an object, which could go over 4GB limit of unsigned long on 32-bit
systems. Use off_t for it instead and update all callers.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/cat-file.c | 4 ++--
 cache.h            | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 54db118..13ed944 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -131,7 +131,7 @@ struct expand_data {
 	unsigned char sha1[20];
 	enum object_type type;
 	unsigned long size;
-	unsigned long disk_size;
+	off_t disk_size;
 	const char *rest;
 	unsigned char delta_base_sha1[20];
 
@@ -184,7 +184,7 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
 		if (data->mark_query)
 			data->info.disk_sizep = &data->disk_size;
 		else
-			strbuf_addf(sb, "%lu", data->disk_size);
+			strbuf_addf(sb, "%"PRIuMAX, (uintmax_t)data->disk_size);
 	} else if (is_atom("rest", atom, len)) {
 		if (data->mark_query)
 			data->split_on_whitespace = 1;
diff --git a/cache.h b/cache.h
index 4ff196c..ea64b51 100644
--- a/cache.h
+++ b/cache.h
@@ -1502,7 +1502,7 @@ struct object_info {
 	/* Request */
 	enum object_type *typep;
 	unsigned long *sizep;
-	unsigned long *disk_sizep;
+	off_t *disk_sizep;
 	unsigned char *delta_base_sha1;
 	struct strbuf *typename;
 
-- 
2.9.1.564.gb2f7278


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 3/7] index-pack: correct "len" type in unpack_data()
  2016-07-13 15:43   ` [PATCH v2 0/7] " Nguyễn Thái Ngọc Duy
  2016-07-13 15:43     ` [PATCH v2 1/7] pack-objects: pass length to check_pack_crc() without truncation Nguyễn Thái Ngọc Duy
  2016-07-13 15:43     ` [PATCH v2 2/7] sha1_file.c: use type off_t* for object_info->disk_sizep Nguyễn Thái Ngọc Duy
@ 2016-07-13 15:44     ` Nguyễn Thái Ngọc Duy
  2016-07-13 15:44     ` [PATCH v2 4/7] index-pack: report correct bad object offsets even if they are large Nguyễn Thái Ngọc Duy
                       ` (4 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-13 15:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

On 32-bit systems with large file support, one entry could be larger
than 4GB and overflow "len". Correct it so we can unpack a full entry.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index e8c71fc..cafaab7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -549,13 +549,13 @@ static void *unpack_data(struct object_entry *obj,
 			 void *cb_data)
 {
 	off_t from = obj[0].idx.offset + obj[0].hdr_size;
-	unsigned long len = obj[1].idx.offset - from;
+	off_t len = obj[1].idx.offset - from;
 	unsigned char *data, *inbuf;
 	git_zstream stream;
 	int status;
 
 	data = xmallocz(consume ? 64*1024 : obj->size);
-	inbuf = xmalloc((len < 64*1024) ? len : 64*1024);
+	inbuf = xmalloc((len < 64*1024) ? (int)len : 64*1024);
 
 	memset(&stream, 0, sizeof(stream));
 	git_inflate_init(&stream);
@@ -563,15 +563,15 @@ static void *unpack_data(struct object_entry *obj,
 	stream.avail_out = consume ? 64*1024 : obj->size;
 
 	do {
-		ssize_t n = (len < 64*1024) ? len : 64*1024;
+		ssize_t n = (len < 64*1024) ? (ssize_t)len : 64*1024;
 		n = xpread(get_thread_data()->pack_fd, inbuf, n, from);
 		if (n < 0)
 			die_errno(_("cannot pread pack file"));
 		if (!n)
-			die(Q_("premature end of pack file, %lu byte missing",
-			       "premature end of pack file, %lu bytes missing",
-			       len),
-			    len);
+			die(Q_("premature end of pack file, %"PRIuMAX" byte missing",
+			       "premature end of pack file, %"PRIuMAX" bytes missing",
+			       (unsigned int)len),
+			    (uintmax_t)len);
 		from += n;
 		len -= n;
 		stream.next_in = inbuf;
-- 
2.9.1.564.gb2f7278


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 4/7] index-pack: report correct bad object offsets even if they are large
  2016-07-13 15:43   ` [PATCH v2 0/7] " Nguyễn Thái Ngọc Duy
                       ` (2 preceding siblings ...)
  2016-07-13 15:44     ` [PATCH v2 3/7] index-pack: correct "len" type in unpack_data() Nguyễn Thái Ngọc Duy
@ 2016-07-13 15:44     ` Nguyễn Thái Ngọc Duy
  2016-07-13 15:44     ` [PATCH v2 5/7] index-pack: correct "offset" type in unpack_entry_data() Nguyễn Thái Ngọc Duy
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-13 15:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

Use the right type for offsets in this case, off_t, which makes a
difference on 32-bit systems with large file support, and change
formatting code accordingly.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index cafaab7..e2d8ae4 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -338,10 +338,10 @@ static void parse_pack_header(void)
 	use(sizeof(struct pack_header));
 }
 
-static NORETURN void bad_object(unsigned long offset, const char *format,
+static NORETURN void bad_object(off_t offset, const char *format,
 		       ...) __attribute__((format (printf, 2, 3)));
 
-static NORETURN void bad_object(unsigned long offset, const char *format, ...)
+static NORETURN void bad_object(off_t offset, const char *format, ...)
 {
 	va_list params;
 	char buf[1024];
@@ -349,7 +349,8 @@ static NORETURN void bad_object(unsigned long offset, const char *format, ...)
 	va_start(params, format);
 	vsnprintf(buf, sizeof(buf), format, params);
 	va_end(params);
-	die(_("pack has bad object at offset %lu: %s"), offset, buf);
+	die(_("pack has bad object at offset %"PRIuMAX": %s"),
+	    (uintmax_t)offset, buf);
 }
 
 static inline struct thread_local *get_thread_data(void)
-- 
2.9.1.564.gb2f7278


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 5/7] index-pack: correct "offset" type in unpack_entry_data()
  2016-07-13 15:43   ` [PATCH v2 0/7] " Nguyễn Thái Ngọc Duy
                       ` (3 preceding siblings ...)
  2016-07-13 15:44     ` [PATCH v2 4/7] index-pack: report correct bad object offsets even if they are large Nguyễn Thái Ngọc Duy
@ 2016-07-13 15:44     ` Nguyễn Thái Ngọc Duy
  2016-07-13 15:44     ` [PATCH v2 6/7] pack-objects: do not truncate result in-pack object size on 32-bit systems Nguyễn Thái Ngọc Duy
                       ` (2 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-13 15:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

unpack_entry_data() receives an off_t value from unpack_raw_entry(),
which could be larger than unsigned long on 32-bit systems with large
file support. Correct the type so truncation does not happen. This
only affects bad object reporting though.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index e2d8ae4..1008d7f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -430,7 +430,7 @@ static int is_delta_type(enum object_type type)
 	return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
 }
 
-static void *unpack_entry_data(unsigned long offset, unsigned long size,
+static void *unpack_entry_data(off_t offset, unsigned long size,
 			       enum object_type type, unsigned char *sha1)
 {
 	static char fixed_buf[8192];
-- 
2.9.1.564.gb2f7278


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 6/7] pack-objects: do not truncate result in-pack object size on 32-bit systems
  2016-07-13 15:43   ` [PATCH v2 0/7] " Nguyễn Thái Ngọc Duy
                       ` (4 preceding siblings ...)
  2016-07-13 15:44     ` [PATCH v2 5/7] index-pack: correct "offset" type in unpack_entry_data() Nguyễn Thái Ngọc Duy
@ 2016-07-13 15:44     ` Nguyễn Thái Ngọc Duy
  2016-07-13 15:44     ` [PATCH v2 7/7] fsck: use streaming interface for large blobs in pack Nguyễn Thái Ngọc Duy
  2016-07-13 16:16     ` [PATCH v2 0/7] Number truncation with 4+ GB files on 32-bit systems Junio C Hamano
  7 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-13 15:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

A typical diff will not show what's going on and you need to see full
functions. The core code is like this, at the end of of write_one()

	e->idx.offset = *offset;
	size = write_object(f, e, *offset);
	if (!size) {
		e->idx.offset = recursing;
		return WRITE_ONE_BREAK;
	}
	written_list[nr_written++] = &e->idx;

	/* make sure off_t is sufficiently large not to wrap */
	if (signed_add_overflows(*offset, size))
		die("pack too large for current definition of off_t");
	*offset += size;

Here we can see that the in-pack object size is returned by
write_object (or indirectly by write_reuse_object). And it's used to
calculate object offsets, which end up in the pack index file,
generated at the end.

If "size" overflows (on 32-bit sytems, unsigned long is 32-bit while
off_t can be 64-bit), we got wrong offsets and produce incorrect .idx
file, which may make it look like the .pack file is corrupted.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a3a98c5..ac7a3a5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -341,8 +341,8 @@ static unsigned long write_no_reuse_object(struct sha1file *f, struct object_ent
 }
 
 /* Return 0 if we will bust the pack-size limit */
-static unsigned long write_reuse_object(struct sha1file *f, struct object_entry *entry,
-					unsigned long limit, int usable_delta)
+static off_t write_reuse_object(struct sha1file *f, struct object_entry *entry,
+				unsigned long limit, int usable_delta)
 {
 	struct packed_git *p = entry->in_pack;
 	struct pack_window *w_curs = NULL;
@@ -415,11 +415,12 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry
 }
 
 /* Return 0 if we will bust the pack-size limit */
-static unsigned long write_object(struct sha1file *f,
-				  struct object_entry *entry,
-				  off_t write_offset)
+static off_t write_object(struct sha1file *f,
+			  struct object_entry *entry,
+			  off_t write_offset)
 {
-	unsigned long limit, len;
+	unsigned long limit;
+	off_t len;
 	int usable_delta, to_reuse;
 
 	if (!pack_to_stdout)
@@ -491,7 +492,7 @@ static enum write_one_status write_one(struct sha1file *f,
 				       struct object_entry *e,
 				       off_t *offset)
 {
-	unsigned long size;
+	off_t size;
 	int recursing;
 
 	/*
-- 
2.9.1.564.gb2f7278


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 7/7] fsck: use streaming interface for large blobs in pack
  2016-07-13 15:43   ` [PATCH v2 0/7] " Nguyễn Thái Ngọc Duy
                       ` (5 preceding siblings ...)
  2016-07-13 15:44     ` [PATCH v2 6/7] pack-objects: do not truncate result in-pack object size on 32-bit systems Nguyễn Thái Ngọc Duy
@ 2016-07-13 15:44     ` Nguyễn Thái Ngọc Duy
  2016-07-13 16:16     ` [PATCH v2 0/7] Number truncation with 4+ GB files on 32-bit systems Junio C Hamano
  7 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-07-13 15:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, michelbach94,
	Nguyễn Thái Ngọc Duy

For blobs, we want to make sure the on-disk data is not corrupted
(i.e. can be inflated and produce the expected SHA-1). Blob content is
opaque, there's nothing else inside to check for.

For really large blobs, we may want to avoid unpacking the entire blob
in memory, just to check whether it produces the same SHA-1. On 32-bit
systems, we may not have enough virtual address space for such memory
allocation. And even on 64-bit where it's not a problem, allocating a
lot more memory could result in kicking other parts of systems to swap
file, generating lots of I/O and slowing everything down.

For this particular operation, not unpacking the blob and letting
check_sha1_signature, which supports streaming interface, do the job
is sufficient. check_sha1_signature() is not shown in the diff,
unfortunately. But if will be called when "data_valid && !data" is
false.

We will call the callback function "fn" with NULL as "data". The only
callback of this function is fsck_obj_buffer(), which does not touch
"data" at all if it's a blob.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fsck.c   |  4 ++++
 pack-check.c     | 23 +++++++++++++++++++++--
 pack.h           |  1 +
 t/t1050-large.sh |  7 +++----
 4 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 55eac75..b08bc8b 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -356,6 +356,10 @@ static int fsck_sha1(const unsigned char *sha1)
 static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type,
 			   unsigned long size, void *buffer, int *eaten)
 {
+	/*
+	 * Note, buffer may be NULL if type is OBJ_BLOB. See
+	 * verify_packfile(), data_valid variable for details.
+	 */
 	struct object *obj;
 	obj = parse_object_buffer(sha1, type, size, buffer, eaten);
 	if (!obj) {
diff --git a/pack-check.c b/pack-check.c
index 1da89a4..d123846 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -105,6 +105,8 @@ static int verify_packfile(struct packed_git *p,
 		void *data;
 		enum object_type type;
 		unsigned long size;
+		off_t curpos;
+		int data_valid;
 
 		if (p->index_version > 1) {
 			off_t offset = entries[i].offset;
@@ -116,8 +118,25 @@ static int verify_packfile(struct packed_git *p,
 					    sha1_to_hex(entries[i].sha1),
 					    p->pack_name, (uintmax_t)offset);
 		}
-		data = unpack_entry(p, entries[i].offset, &type, &size);
-		if (!data)
+
+		curpos = entries[i].offset;
+		type = unpack_object_header(p, w_curs, &curpos, &size);
+		unuse_pack(w_curs);
+
+		if (type == OBJ_BLOB && big_file_threshold <= size) {
+			/*
+			 * Let check_sha1_signature() check it with
+			 * the streaming interface; no point slurping
+			 * the data in-core only to discard.
+			 */
+			data = NULL;
+			data_valid = 0;
+		} else {
+			data = unpack_entry(p, entries[i].offset, &type, &size);
+			data_valid = 1;
+		}
+
+		if (data_valid && !data)
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    sha1_to_hex(entries[i].sha1), p->pack_name,
 				    (uintmax_t)entries[i].offset);
diff --git a/pack.h b/pack.h
index 3223f5a..0e77429 100644
--- a/pack.h
+++ b/pack.h
@@ -74,6 +74,7 @@ struct pack_idx_entry {
 
 
 struct progress;
+/* Note, the data argument could be NULL if object type is blob */
 typedef int (*verify_fn)(const unsigned char*, enum object_type, unsigned long, void*, int*);
 
 extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, const unsigned char *sha1);
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index f9f3d13..096dbff 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -177,10 +177,9 @@ test_expect_success 'zip achiving, deflate' '
 	git archive --format=zip HEAD >/dev/null
 '
 
-test_expect_success 'fsck' '
-	test_must_fail git fsck 2>err &&
-	n=$(grep "error: attempting to allocate .* over limit" err | wc -l) &&
-	test "$n" -gt 1
+test_expect_success 'fsck large blobs' '
+	git fsck 2>err &&
+	test_must_be_empty err
 '
 
 test_done
-- 
2.9.1.564.gb2f7278


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/7] Number truncation with 4+ GB files on 32-bit systems
  2016-07-13 15:43   ` [PATCH v2 0/7] " Nguyễn Thái Ngọc Duy
                       ` (6 preceding siblings ...)
  2016-07-13 15:44     ` [PATCH v2 7/7] fsck: use streaming interface for large blobs in pack Nguyễn Thái Ngọc Duy
@ 2016-07-13 16:16     ` Junio C Hamano
  7 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-07-13 16:16 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, michelbach94

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> A diff from nd/pack-ofs-4gb-limit can explain the changes better than
> me.
>
> I did not add PRIdMAX or similar because that carries a risk to exotic
> platforms that people rarely test. Just casting to unsigned should be
> fine.

Looks very sensible.  Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2016-07-13 16:19 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-24 22:38 [bug] Reliably Reproducible Bad Packing of Objects Christoph Michelbach
2016-07-02  9:10 ` Duy Nguyen
2016-07-02 14:35   ` Duy Nguyen
2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
2016-07-05 17:05   ` [PATCH 1/5] pack-objects: pass length to check_pack_crc() without truncation Nguyễn Thái Ngọc Duy
2016-07-12 17:16     ` Junio C Hamano
2016-07-05 17:05   ` [PATCH 2/5] sha1_file.c: use type off_t* for object_info->disk_sizep Nguyễn Thái Ngọc Duy
2016-07-12 17:20     ` Junio C Hamano
2016-07-12 19:26     ` Junio C Hamano
2016-07-05 17:05   ` [PATCH 3/5] index-pack: correct "len" type in unpack_data() Nguyễn Thái Ngọc Duy
2016-07-05 20:25     ` Johannes Sixt
2016-07-06 15:25       ` Duy Nguyen
2016-07-06 16:04         ` Junio C Hamano
2016-07-06 16:08           ` Junio C Hamano
2016-07-05 17:05   ` [PATCH 4/5] index-pack: report correct bad object offsets even if they are large Nguyễn Thái Ngọc Duy
2016-07-12 17:24     ` Junio C Hamano
2016-07-12 19:27     ` Junio C Hamano
2016-07-05 17:05   ` [PATCH 5/5] index-pack: correct "offset" type in unpack_entry_data() Nguyễn Thái Ngọc Duy
2016-07-05 18:11   ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Christoph Michelbach
     [not found]   ` <1467756891.4798.1.camel@gmail.com>
     [not found]     ` <CACsJy8BDQbanGsf=3z3K-OuH0++EuqQFEB22udXJT+WZnFKSBg@mail.gmail.com>
2016-07-06 18:02       ` Christoph Michelbach
2016-07-06 18:54         ` Duy Nguyen
2016-07-10 10:41   ` Duy Nguyen
2016-07-10 10:42   ` [PATCH 6/5] pack-objects: do not truncate result in-pack object size " Nguyễn Thái Ngọc Duy
2016-07-10 10:45   ` [PATCH 7/5] fsck: use streaming interface for large blobs in pack Nguyễn Thái Ngọc Duy
2016-07-12 18:39     ` Junio C Hamano
2016-07-12 19:06       ` Junio C Hamano
2016-07-12 17:07   ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Junio C Hamano
2016-07-12 18:48     ` Junio C Hamano
2016-07-12 20:38   ` Junio C Hamano
2016-07-13  6:01     ` Duy Nguyen
2016-07-13 15:43   ` [PATCH v2 0/7] " Nguyễn Thái Ngọc Duy
2016-07-13 15:43     ` [PATCH v2 1/7] pack-objects: pass length to check_pack_crc() without truncation Nguyễn Thái Ngọc Duy
2016-07-13 15:43     ` [PATCH v2 2/7] sha1_file.c: use type off_t* for object_info->disk_sizep Nguyễn Thái Ngọc Duy
2016-07-13 15:44     ` [PATCH v2 3/7] index-pack: correct "len" type in unpack_data() Nguyễn Thái Ngọc Duy
2016-07-13 15:44     ` [PATCH v2 4/7] index-pack: report correct bad object offsets even if they are large Nguyễn Thái Ngọc Duy
2016-07-13 15:44     ` [PATCH v2 5/7] index-pack: correct "offset" type in unpack_entry_data() Nguyễn Thái Ngọc Duy
2016-07-13 15:44     ` [PATCH v2 6/7] pack-objects: do not truncate result in-pack object size on 32-bit systems Nguyễn Thái Ngọc Duy
2016-07-13 15:44     ` [PATCH v2 7/7] fsck: use streaming interface for large blobs in pack Nguyễn Thái Ngọc Duy
2016-07-13 16:16     ` [PATCH v2 0/7] Number truncation with 4+ GB files on 32-bit systems Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).