git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH 4/7] parse_object: avoid putting whole blob in core
Date: Wed,  7 Mar 2012 17:54:18 +0700	[thread overview]
Message-ID: <1331117661-19378-5-git-send-email-pclouds@gmail.com> (raw)
In-Reply-To: <1331117661-19378-1-git-send-email-pclouds@gmail.com>

Traditionally, all the callers of check_sha1_signature() first
called read_sha1_file() to prepare the whole object data in core,
and called this function.  The function is used to revalidate what
we read from the object database actually matches the object name we
used to ask for the data from the object database.

Update the API to allow callers to pass NULL as the object data, and
have the function read and hash the object data using streaming API
to recompute the object name, without having to hold everything in
core at the same time.  This is most useful in parse_object() that
parses a blob object, because this caller does not have to keep the
actual blob data around in memory after a "struct blob" is returned.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 object.c    |   11 +++++++++++
 sha1_file.c |   42 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index 6b06297..0498b18 100644
--- a/object.c
+++ b/object.c
@@ -198,6 +198,17 @@ struct object *parse_object(const unsigned char *sha1)
 	if (obj && obj->parsed)
 		return obj;
 
+	if ((obj && obj->type == OBJ_BLOB) ||
+	    (!obj && has_sha1_file(sha1) &&
+	     sha1_object_info(sha1, NULL) == OBJ_BLOB)) {
+		if (check_sha1_signature(repl, NULL, 0, NULL) < 0) {
+			error("sha1 mismatch %s\n", sha1_to_hex(repl));
+			return NULL;
+		}
+		parse_blob_buffer(lookup_blob(sha1), NULL, 0);
+		return lookup_object(sha1);
+	}
+
 	buffer = read_sha1_file(sha1, &type, &size);
 	if (buffer) {
 		if (check_sha1_signature(repl, buffer, size, typename(type)) < 0) {
diff --git a/sha1_file.c b/sha1_file.c
index 4f06a0e..ad314f0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -19,6 +19,7 @@
 #include "pack-revindex.h"
 #include "sha1-lookup.h"
 #include "bulk-checkin.h"
+#include "streaming.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -1146,10 +1147,47 @@ static const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
 	return NULL;
 }
 
-int check_sha1_signature(const unsigned char *sha1, void *map, unsigned long size, const char *type)
+/*
+ * With an in-core object data in "map", rehash it to make sure the
+ * object name actually matches "sha1" to detect object corruption.
+ * With "map" == NULL, try reading the object named with "sha1" using
+ * the streaming interface and rehash it to do the same.
+ */
+int check_sha1_signature(const unsigned char *sha1, void *map,
+			 unsigned long size, const char *type)
 {
 	unsigned char real_sha1[20];
-	hash_sha1_file(map, size, type, real_sha1);
+	enum object_type obj_type;
+	struct git_istream *st;
+	git_SHA_CTX c;
+	char hdr[32];
+	int hdrlen;
+
+	if (map) {
+		hash_sha1_file(map, size, type, real_sha1);
+		return hashcmp(sha1, real_sha1) ? -1 : 0;
+	}
+
+	st = open_istream(sha1, &obj_type, &size, NULL);
+	if (!st)
+		return -1;
+
+	/* Generate the header */
+	hdrlen = sprintf(hdr, "%s %lu", typename(obj_type), size) + 1;
+
+	/* Sha1.. */
+	git_SHA1_Init(&c);
+	git_SHA1_Update(&c, hdr, hdrlen);
+	for (;;) {
+		char buf[1024 * 16];
+		ssize_t readlen = read_istream(st, buf, sizeof(buf));
+
+		if (!readlen)
+			break;
+		git_SHA1_Update(&c, buf, readlen);
+	}
+	git_SHA1_Final(real_sha1, &c);
+	close_istream(st);
 	return hashcmp(sha1, real_sha1) ? -1 : 0;
 }
 
-- 
1.7.8.36.g69ee2

  parent reply	other threads:[~2012-03-07 10:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-06  7:15 What's cooking in git.git (Mar 2012, #03; Mon, 5) Junio C Hamano
2012-03-06  9:40 ` Nguyen Thai Ngoc Duy
2012-03-06 18:23   ` Junio C Hamano
2012-03-07 10:54     ` [PATCH 0/7] nd/stream-more updates Nguyễn Thái Ngọc Duy
2012-03-07 10:54       ` [PATCH 1/7] streaming: make streaming-write-entry to be more reusable Nguyễn Thái Ngọc Duy
2012-03-07 10:54       ` [PATCH 2/7] Add more large blob test cases Nguyễn Thái Ngọc Duy
2012-03-07 10:54       ` [PATCH 3/7] cat-file: use streaming API to print blobs Nguyễn Thái Ngọc Duy
2012-03-07 10:54       ` Nguyễn Thái Ngọc Duy [this message]
2012-03-07 10:54       ` [PATCH 5/7] show: use streaming API for showing blobs Nguyễn Thái Ngọc Duy
2012-03-07 10:54       ` [PATCH 6/7] fsck: use streaming API for writing lost-found blobs Nguyễn Thái Ngọc Duy
2012-03-07 10:54       ` [PATCH 7/7] update-server-info: respect core.bigfilethreshold Nguyễn Thái Ngọc Duy
2012-03-07 17:13       ` [PATCH 0/7] nd/stream-more updates Junio C Hamano
2012-03-07  6:28 ` In preparation for 1.7.10-rc0 Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1331117661-19378-5-git-send-email-pclouds@gmail.com \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).