git@vger.kernel.org mailing list mirror (one of many)
From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	michelbach94@gmail.com,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH 7/5] fsck: use streaming interface for large blobs in pack
Date: Sun, 10 Jul 2016 12:45:55 +0200
Message-ID: <20160710104555.27478-1-pclouds@gmail.com>
In-Reply-To: <20160705170558.10906-1-pclouds@gmail.com>

For blobs, we want to make sure the on-disk data is not corrupted
(i.e. that it can be inflated and produces the expected SHA-1). Blob
content is opaque; there is nothing else inside to check.

For really large blobs, we may want to avoid unpacking the entire blob
in memory just to check whether it produces the same SHA-1. On 32-bit
systems, there may not be enough virtual address space for such an
allocation. Even on 64-bit systems, where address space is not a
problem, allocating that much memory could push other parts of the
system out to swap, generating lots of I/O and slowing everything down.
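
To make "produces the expected SHA-1" concrete: a blob's object name is
the SHA-1 of the header "blob <size>" plus a NUL byte, followed by the
raw content. Because SHA-1 is computed incrementally, the content can be
fed through one fixed-size buffer, which is what keeps a streaming check's
memory use flat regardless of blob size. The sketch below is only an
illustration under that assumption: the SHA-1 routines and
hash_blob_stream() are hypothetical helpers written for this example, not
Git's internal API (Git does this through its own hashing wrappers and
streaming interface).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Minimal SHA-1 (FIPS 180-4), inlined so the sketch has no dependencies. */
typedef struct {
	uint32_t h[5];
	uint64_t len;          /* total bytes fed so far */
	unsigned char buf[64]; /* pending partial block */
	size_t buflen;
} sha1_ctx;

static uint32_t rol32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

static void sha1_block(sha1_ctx *c, const unsigned char *p)
{
	uint32_t w[80], a, b, cc, d, e, f, k, t;
	int i;
	for (i = 0; i < 16; i++)	/* load 16 big-endian words */
		w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
		       (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
	for (; i < 80; i++)		/* expand the message schedule */
		w[i] = rol32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
	a = c->h[0]; b = c->h[1]; cc = c->h[2]; d = c->h[3]; e = c->h[4];
	for (i = 0; i < 80; i++) {
		if (i < 20)      { f = (b & cc) | (~b & d);           k = 0x5A827999; }
		else if (i < 40) { f = b ^ cc ^ d;                    k = 0x6ED9EBA1; }
		else if (i < 60) { f = (b & cc) | (b & d) | (cc & d); k = 0x8F1BBCDC; }
		else             { f = b ^ cc ^ d;                    k = 0xCA62C1D6; }
		t = rol32(a, 5) + f + e + k + w[i];
		e = d; d = cc; cc = rol32(b, 30); b = a; a = t;
	}
	c->h[0] += a; c->h[1] += b; c->h[2] += cc; c->h[3] += d; c->h[4] += e;
}

static void sha1_init(sha1_ctx *c)
{
	static const uint32_t iv[5] = {
		0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
	memcpy(c->h, iv, sizeof(iv));
	c->len = 0;
	c->buflen = 0;
}

static void sha1_update(sha1_ctx *c, const void *data, size_t n)
{
	const unsigned char *p = data;
	c->len += n;
	while (n) {
		size_t room = 64 - c->buflen;
		size_t take = n < room ? n : room;
		memcpy(c->buf + c->buflen, p, take);
		c->buflen += take; p += take; n -= take;
		if (c->buflen == 64) {	/* a full block is ready */
			sha1_block(c, c->buf);
			c->buflen = 0;
		}
	}
}

/* Pad, append the 64-bit big-endian bit length, emit 40 hex digits. */
static void sha1_final_hex(sha1_ctx *c, char hex[41])
{
	uint64_t bits = c->len * 8;	/* captured before padding bumps len */
	unsigned char pad = 0x80, zero = 0, lenbuf[8];
	int i;
	sha1_update(c, &pad, 1);
	while (c->buflen != 56)
		sha1_update(c, &zero, 1);
	for (i = 0; i < 8; i++)
		lenbuf[i] = (unsigned char)(bits >> (56 - 8 * i));
	sha1_update(c, lenbuf, 8);
	for (i = 0; i < 20; i++)
		sprintf(hex + 2 * i, "%02x", (unsigned)(c->h[i / 4] >> (24 - 8 * (i % 4))) & 0xffu);
}

/* Hypothetical helper: SHA-1 of "blob <size>" NUL + `size` bytes read from
 * `in`, through one fixed buffer. Returns -1 on a short read. */
static int hash_blob_stream(FILE *in, unsigned long size, char hex[41])
{
	unsigned char chunk[8192];
	char hdr[32];
	int hdrlen = sprintf(hdr, "blob %lu", size) + 1;	/* +1 keeps the NUL */
	sha1_ctx c;
	sha1_init(&c);
	sha1_update(&c, hdr, (size_t)hdrlen);
	while (size) {
		size_t want = size < sizeof(chunk) ? (size_t)size : sizeof(chunk);
		size_t got = fread(chunk, 1, want, in);
		if (got != want)
			return -1;	/* truncated input: cannot match */
		sha1_update(&c, chunk, got);
		size -= got;
	}
	sha1_final_hex(&c, hex);
	return 0;
}
```

For example, streaming the 12 bytes "hello world\n" from a FILE * yields
3b18e512dba79e4c8300dd08aeb37f8e728b8dad, the same name
`git hash-object` reports for that content. The 8192-byte chunk size is
an arbitrary choice for the sketch; peak memory does not grow with the blob.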

For this particular operation, it is sufficient to skip unpacking the
blob and let check_sha1_signature(), which supports the streaming
interface, do the job. Unfortunately, the call to check_sha1_signature()
is not visible in the diff, but it will be called whenever
"data_valid && !data" is false.
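
The per-object decision can be modeled in isolation. The sketch below is
hypothetical: verify_action_for(), both enums, and the unpack_ok flag
(standing in for whether unpack_entry() returns a buffer) are made up for
illustration; only the two conditions are taken from the patch. It
assumes, as the patch does, that check_sha1_signature() streams the object
itself when it is handed a NULL buffer.

```c
#include <stddef.h>

/* Local stand-in for the object types in Git's enum object_type. */
enum obj_type { T_COMMIT = 1, T_TREE = 2, T_BLOB = 3, T_TAG = 4 };

/* What the patched loop ends up doing with one pack entry. */
enum verify_action {
	REPORT_UNPACK_ERROR,	/* "cannot unpack %s from %s at offset ..." */
	CHECK_IN_CORE,		/* hash the inflated buffer */
	CHECK_STREAMING		/* data == NULL: let the checker stream the object */
};

/* Hypothetical model; `unpack_ok` says whether unpack_entry() would succeed. */
static enum verify_action verify_action_for(enum obj_type type,
					    unsigned long size,
					    unsigned long big_file_threshold,
					    int unpack_ok)
{
	static char inflated;	/* dummy target for a "successful" unpack */
	void *data = NULL;	/* stays NULL when unpacking is skipped */
	int data_valid = 0;

	if (type != T_BLOB || size < big_file_threshold) {
		data = unpack_ok ? &inflated : NULL;
		data_valid = 1;
	}
	if (data_valid && !data)
		return REPORT_UNPACK_ERROR;
	return data ? CHECK_IN_CORE : CHECK_STREAMING;
}
```

With a threshold of 1024, a 1024-byte blob takes the streaming path,
because a blob is inflated only when size < big_file_threshold, while a
tree or commit is still inflated whatever its size. The model sets "data"
to NULL explicitly on the skipped path, since that value is what later
reaches check_sha1_signature() and the callback. (In Git,
big_file_threshold comes from core.bigFileThreshold, 512 MiB by default.)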

For such blobs, the callback function "fn" will be called with NULL as
"data". The only callback passed to this function is fsck_obj_buffer(),
which does not touch "data" at all if the object is a blob.

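A callback written against this contract therefore has to tolerate a
NULL buffer for blobs. The following is a hypothetical callback with the
same parameter shape as verify_fn in pack.h, using a local stand-in for
enum object_type; it is not fsck_obj_buffer(), only an illustration of
the rule that blobs may arrive without data while other types still
carry their inflated content.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for the object types in Git's enum object_type. */
enum obj_type { T_COMMIT = 1, T_TREE = 2, T_BLOB = 3, T_TAG = 4 };

/* Same parameter shape as verify_fn in pack.h. */
typedef int (*verify_cb)(const unsigned char *sha1, enum obj_type type,
			 unsigned long size, void *data, int *eaten);

/*
 * Hypothetical callback: blobs may arrive with data == NULL (large ones
 * were hash-checked by streaming and never inflated), so a blob is
 * accepted without looking at the buffer. Other types still need it.
 */
static int sample_verify_cb(const unsigned char *sha1, enum obj_type type,
			    unsigned long size, void *data, int *eaten)
{
	(void)sha1;
	*eaten = 0;		/* never takes ownership of the buffer */
	if (type == T_BLOB)
		return 0;	/* opaque content: nothing more to check */
	if (!data)
		return -1;	/* non-blob without a buffer breaks the contract */
	/* illustrative check: a commit object starts with a "tree " line */
	if (type == T_COMMIT && (size < 5 || memcmp(data, "tree ", 5) != 0))
		return -1;
	return 0;
}
```

Returning early for blobs, before any dereference, is the property the
NULL-data contract depends on.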
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 pack-check.c | 15 +++++++++++++--
 pack.h       |  1 +
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/pack-check.c b/pack-check.c
index 1da89a4..0777766 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -105,6 +105,8 @@ static int verify_packfile(struct packed_git *p,
 		void *data;
 		enum object_type type;
 		unsigned long size;
+		off_t curpos;
+		int data_valid = 0;
 
 		if (p->index_version > 1) {
 			off_t offset = entries[i].offset;
@@ -116,8 +118,17 @@ static int verify_packfile(struct packed_git *p,
 					    sha1_to_hex(entries[i].sha1),
 					    p->pack_name, (uintmax_t)offset);
 		}
-		data = unpack_entry(p, entries[i].offset, &type, &size);
-		if (!data)
+		data = NULL;
+		curpos = entries[i].offset;
+		type = unpack_object_header(p, w_curs, &curpos, &size);
+		unuse_pack(w_curs);
+
+		if (type != OBJ_BLOB || size < big_file_threshold) {
+			data = unpack_entry(p, entries[i].offset, &type, &size);
+			data_valid = 1;
+		}
+
+		if (data_valid && !data)
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    sha1_to_hex(entries[i].sha1), p->pack_name,
 				    (uintmax_t)entries[i].offset);
diff --git a/pack.h b/pack.h
index 3223f5a..0e77429 100644
--- a/pack.h
+++ b/pack.h
@@ -74,6 +74,7 @@ struct pack_idx_entry {
 
 
 struct progress;
+/* Note, the data argument could be NULL if object type is blob */
 typedef int (*verify_fn)(const unsigned char*, enum object_type, unsigned long, void*, int*);
 
 extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, const unsigned char *sha1);
-- 
2.8.2.537.g0965dd9


Thread overview: 39+ messages
2016-06-24 22:38 [bug] Reliably Reproducible Bad Packing of Objects Christoph Michelbach
2016-07-02  9:10 ` Duy Nguyen
2016-07-02 14:35   ` Duy Nguyen
2016-07-05 17:05 ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Nguyễn Thái Ngọc Duy
2016-07-05 17:05   ` [PATCH 1/5] pack-objects: pass length to check_pack_crc() without truncation Nguyễn Thái Ngọc Duy
2016-07-12 17:16     ` Junio C Hamano
2016-07-05 17:05   ` [PATCH 2/5] sha1_file.c: use type off_t* for object_info->disk_sizep Nguyễn Thái Ngọc Duy
2016-07-12 17:20     ` Junio C Hamano
2016-07-12 19:26     ` Junio C Hamano
2016-07-05 17:05   ` [PATCH 3/5] index-pack: correct "len" type in unpack_data() Nguyễn Thái Ngọc Duy
2016-07-05 20:25     ` Johannes Sixt
2016-07-06 15:25       ` Duy Nguyen
2016-07-06 16:04         ` Junio C Hamano
2016-07-06 16:08           ` Junio C Hamano
2016-07-05 17:05   ` [PATCH 4/5] index-pack: report correct bad object offsets even if they are large Nguyễn Thái Ngọc Duy
2016-07-12 17:24     ` Junio C Hamano
2016-07-12 19:27     ` Junio C Hamano
2016-07-05 17:05   ` [PATCH 5/5] index-pack: correct "offset" type in unpack_entry_data() Nguyễn Thái Ngọc Duy
2016-07-05 18:11   ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Christoph Michelbach
     [not found]   ` <1467756891.4798.1.camel@gmail.com>
     [not found]     ` <CACsJy8BDQbanGsf=3z3K-OuH0++EuqQFEB22udXJT+WZnFKSBg@mail.gmail.com>
2016-07-06 18:02       ` Christoph Michelbach
2016-07-06 18:54         ` Duy Nguyen
2016-07-10 10:41   ` Duy Nguyen
2016-07-10 10:42   ` [PATCH 6/5] pack-objects: do not truncate result in-pack object size " Nguyễn Thái Ngọc Duy
2016-07-10 10:45   ` Nguyễn Thái Ngọc Duy [this message]
2016-07-12 18:39     ` [PATCH 7/5] fsck: use streaming interface for large blobs in pack Junio C Hamano
2016-07-12 19:06       ` Junio C Hamano
2016-07-12 17:07   ` [PATCH 0/5] Number truncation with 4+ GB files on 32-bit systems Junio C Hamano
2016-07-12 18:48     ` Junio C Hamano
2016-07-12 20:38   ` Junio C Hamano
2016-07-13  6:01     ` Duy Nguyen
2016-07-13 15:43   ` [PATCH v2 0/7] " Nguyễn Thái Ngọc Duy
2016-07-13 15:43     ` [PATCH v2 1/7] pack-objects: pass length to check_pack_crc() without truncation Nguyễn Thái Ngọc Duy
2016-07-13 15:43     ` [PATCH v2 2/7] sha1_file.c: use type off_t* for object_info->disk_sizep Nguyễn Thái Ngọc Duy
2016-07-13 15:44     ` [PATCH v2 3/7] index-pack: correct "len" type in unpack_data() Nguyễn Thái Ngọc Duy
2016-07-13 15:44     ` [PATCH v2 4/7] index-pack: report correct bad object offsets even if they are large Nguyễn Thái Ngọc Duy
2016-07-13 15:44     ` [PATCH v2 5/7] index-pack: correct "offset" type in unpack_entry_data() Nguyễn Thái Ngọc Duy
2016-07-13 15:44     ` [PATCH v2 6/7] pack-objects: do not truncate result in-pack object size on 32-bit systems Nguyễn Thái Ngọc Duy
2016-07-13 15:44     ` [PATCH v2 7/7] fsck: use streaming interface for large blobs in pack Nguyễn Thái Ngọc Duy
2016-07-13 16:16     ` [PATCH v2 0/7] Number truncation with 4+ GB files on 32-bit systems Junio C Hamano
