git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/2] minor fast-export speedup
@ 2013-03-17  8:32 Jeff King
  2013-03-17  8:33 ` [PATCH 1/2] fast-export: rename handle_object function Jeff King
  2013-03-17  8:38 ` [PATCH 2/2] fast-export: do not load blob objects twice Jeff King
  0 siblings, 2 replies; 3+ messages in thread
From: Jeff King @ 2013-03-17  8:32 UTC
  To: git

While grepping through all of the calls to parse_object (to see how they
handled error conditions, for the other series I just posted), I noticed
this opportunity for a small speedup in fast-export (about 4-15% in the
timings quoted in patch 2). The first patch is a cleanup; the second is
the interesting bit.

  [1/2]: fast-export: rename handle_object function
  [2/2]: fast-export: do not load blob objects twice

A useful third patch on top might be to stream blobs out rather than
load them into memory, but I didn't want to go there tonight.

-Peff


* [PATCH 1/2] fast-export: rename handle_object function
  2013-03-17  8:32 [PATCH 0/2] minor fast-export speedup Jeff King
@ 2013-03-17  8:33 ` Jeff King
  2013-03-17  8:38 ` [PATCH 2/2] fast-export: do not load blob objects twice Jeff King
  1 sibling, 0 replies; 3+ messages in thread
From: Jeff King @ 2013-03-17  8:33 UTC
  To: git

The handle_object function is vaguely named; it operates
only on blobs, and its purpose is to export the blob to
the output stream. Let's call it "export_blob" to make it
clearer what it does.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fast-export.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 77dffd1..3eba852 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -113,7 +113,7 @@ static void show_progress(void)
 		printf("progress %d objects\n", counter);
 }
 
-static void handle_object(const unsigned char *sha1)
+static void export_blob(const unsigned char *sha1)
 {
 	unsigned long size;
 	enum object_type type;
@@ -312,7 +312,7 @@ static void handle_commit(struct commit *commit, struct rev_info *rev)
 	/* Export the referenced blobs, and remember the marks. */
 	for (i = 0; i < diff_queued_diff.nr; i++)
 		if (!S_ISGITLINK(diff_queued_diff.queue[i]->two->mode))
-			handle_object(diff_queued_diff.queue[i]->two->sha1);
+			export_blob(diff_queued_diff.queue[i]->two->sha1);
 
 	mark_next_object(&commit->object);
 	if (!is_encoding_utf8(encoding))
@@ -512,7 +512,7 @@ static void get_tags_and_duplicates(struct rev_cmdline_info *info,
 				commit = (struct commit *)tag;
 				break;
 			case OBJ_BLOB:
-				handle_object(tag->object.sha1);
+				export_blob(tag->object.sha1);
 				continue;
 			default: /* OBJ_TAG (nested tags) is already handled */
 				warning("Tag points to object of unexpected type %s, skipping.",
-- 
1.8.2.rc2.7.gef06216


* [PATCH 2/2] fast-export: do not load blob objects twice
  2013-03-17  8:32 [PATCH 0/2] minor fast-export speedup Jeff King
  2013-03-17  8:33 ` [PATCH 1/2] fast-export: rename handle_object function Jeff King
@ 2013-03-17  8:38 ` Jeff King
  1 sibling, 0 replies; 3+ messages in thread
From: Jeff King @ 2013-03-17  8:38 UTC
  To: git

When fast-export wants to export a blob object, it first
calls parse_object to get a "struct object" and check
whether we have already shown the object.  If we haven't
shown it, we then use read_sha1_file to pull it from disk
and write it out.

That means we load each blob from disk twice: once for
parse_object to find its type and check its sha1, and a
second time when we actually output it. We can drop this to
a single load by using lookup_object to check the SHOWN
flag, then reading the blob once, checking its sha1
signature, and outputting that same buffer.

This provides modest speedups on git.git (best-of-five, "git
fast-export HEAD >/dev/null"):

  [before]                [after]
  real    0m14.347s       real    0m13.780s
  user    0m14.084s       user    0m13.620s
  sys     0m0.208s        sys     0m0.100s

and somewhat more on more blob-heavy repos (this is a
repository full of media files):

  [before]                [after]
  real    0m52.236s       real    0m44.451s
  user    0m50.568s       user    0m43.000s
  sys     0m1.536s        sys     0m1.284s

Signed-off-by: Jeff King <peff@peff.net>
---
We actually spend a non-trivial amount of time re-checking the sha1 of
objects we are loading. This change also makes it easy to drop that
checking, though perhaps the additional safety is a good thing to have
during an export. The timings without it are:

  git.git (was 13.780s)
  real    0m11.452s
  user    0m11.336s
  sys     0m0.072s

  photos (was 44.451s)
  real    0m18.383s
  user    0m17.108s
  sys     0m1.224s

 builtin/fast-export.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3eba852..d380155 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -119,6 +119,7 @@ static void export_blob(const unsigned char *sha1)
 	enum object_type type;
 	char *buf;
 	struct object *object;
+	int eaten;
 
 	if (no_data)
 		return;
@@ -126,16 +127,18 @@ static void export_blob(const unsigned char *sha1)
 	if (is_null_sha1(sha1))
 		return;
 
-	object = parse_object(sha1);
-	if (!object)
-		die ("Could not read blob %s", sha1_to_hex(sha1));
-
-	if (object->flags & SHOWN)
+	object = lookup_object(sha1);
+	if (object && object->flags & SHOWN)
 		return;
 
 	buf = read_sha1_file(sha1, &type, &size);
 	if (!buf)
 		die ("Could not read blob %s", sha1_to_hex(sha1));
+	if (check_sha1_signature(sha1, buf, size, typename(type)) < 0)
+		die("sha1 mismatch in blob %s", sha1_to_hex(sha1));
+	object = parse_object_buffer(sha1, type, size, buf, &eaten);
+	if (!object)
+		die("Could not read blob %s", sha1_to_hex(sha1));
 
 	mark_next_object(object);
 
@@ -147,7 +150,8 @@ static void export_blob(const unsigned char *sha1)
 	show_progress();
 
 	object->flags |= SHOWN;
-	free(buf);
+	if (!eaten)
+		free(buf);
 }
 
 static int depth_first(const void *a_, const void *b_)
-- 
1.8.2.rc2.7.gef06216

