* [PATCH 0/2] minor fast-export speedup
@ 2013-03-17 8:32 Jeff King
2013-03-17 8:33 ` [PATCH 1/2] fast-export: rename handle_object function Jeff King
2013-03-17 8:38 ` [PATCH 2/2] fast-export: do not load blob objects twice Jeff King
From: Jeff King @ 2013-03-17 8:32 UTC
To: git
While grepping through all of the calls to parse_object (to see how they
handled error conditions, for the other series I just posted), I noticed
this opportunity for a small speedup in fast-export (5-15%). The first
patch is a cleanup; the second is the interesting bit.
[1/2]: fast-export: rename handle_object function
[2/2]: fast-export: do not load blob objects twice
A useful third patch on top might be to stream blobs out rather than
load them into memory, but I didn't want to go there tonight.
-Peff
* [PATCH 1/2] fast-export: rename handle_object function
From: Jeff King @ 2013-03-17 8:33 UTC
To: git
The handle_object function is rather vaguely named; it only
operates on blobs, and its purpose is to export the blob to
the output stream. Let's call it "export_blob" to make it
more clear what it does.
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/fast-export.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 77dffd1..3eba852 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -113,7 +113,7 @@ static void show_progress(void)
 		printf("progress %d objects\n", counter);
 }

-static void handle_object(const unsigned char *sha1)
+static void export_blob(const unsigned char *sha1)
 {
 	unsigned long size;
 	enum object_type type;
@@ -312,7 +312,7 @@ static void handle_commit(struct commit *commit, struct rev_info *rev)
 	/* Export the referenced blobs, and remember the marks. */
 	for (i = 0; i < diff_queued_diff.nr; i++)
 		if (!S_ISGITLINK(diff_queued_diff.queue[i]->two->mode))
-			handle_object(diff_queued_diff.queue[i]->two->sha1);
+			export_blob(diff_queued_diff.queue[i]->two->sha1);

 	mark_next_object(&commit->object);
 	if (!is_encoding_utf8(encoding))
@@ -512,7 +512,7 @@ static void get_tags_and_duplicates(struct rev_cmdline_info *info,
 				commit = (struct commit *)tag;
 				break;
 			case OBJ_BLOB:
-				handle_object(tag->object.sha1);
+				export_blob(tag->object.sha1);
 				continue;
 			default: /* OBJ_TAG (nested tags) is already handled */
 				warning("Tag points to object of unexpected type %s, skipping.",
--
1.8.2.rc2.7.gef06216
* [PATCH 2/2] fast-export: do not load blob objects twice
From: Jeff King @ 2013-03-17 8:38 UTC
To: git
When fast-export wants to export a blob object, it first
calls parse_object to get a "struct object" and check
whether the object has already been shown. If it has not
been shown, we then use read_sha1_file to pull it from disk
and write it out.

That means we load each blob from disk twice: once for
parse_object to find its type and check its sha1, and a
second time when we actually output it. We can drop this to
a single load by using lookup_object to check the SHOWN
flag, then reading the blob once and both checking its sha1
and writing it out from that single buffer.
This provides modest speedups on git.git (best-of-five, "git
fast-export HEAD >/dev/null"):

  [before]            [after]
  real    0m14.347s   real    0m13.780s
  user    0m14.084s   user    0m13.620s
  sys     0m0.208s    sys     0m0.100s

and somewhat more on more blob-heavy repos (this is a
repository full of media files):

  [before]            [after]
  real    0m52.236s   real    0m44.451s
  user    0m50.568s   user    0m43.000s
  sys     0m1.536s    sys     0m1.284s
Signed-off-by: Jeff King <peff@peff.net>
---
We actually spend a non-trivial amount of time re-checking the sha1 of
objects we are loading. This change also makes it easy to drop that
checking, though perhaps the additional safety is a good thing to have
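during an export. The timings without the sha1 check are:

  git.git (was 14.347s before this patch, 13.780s after)
  real    0m11.452s
  user    0m11.336s
  sys     0m0.072s

  photos, the media-file repository (was 52.236s before this patch, 44.451s after)
  real    0m18.383s
  user    0m17.108s
  sys     0m1.224s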
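
To make the "load twice" versus "load once" difference from the commit message concrete, here is a minimal, self-contained sketch. It is not git code: every toy_* name is hypothetical, the "disk" is an in-memory array, and a simple rolling checksum stands in for the sha1. It only illustrates the ordering of the steps (flag check, load, verify, output) and counts the simulated reads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of this is git's real object store. */
#define TOY_SHOWN 1u
#define TOY_NBLOBS 3

struct toy_blob {
	const char *data;    /* "on-disk" contents */
	unsigned expect_sum; /* stored checksum, standing in for the sha1 */
	unsigned flags;      /* in-memory state, e.g. TOY_SHOWN */
};

static struct toy_blob toy_store[TOY_NBLOBS] = {
	{ "hello", 0, 0 }, { "world", 0, 0 }, { "hello world", 0, 0 },
};
static int toy_loads; /* number of simulated disk reads */

static unsigned toy_sum(const char *buf)
{
	unsigned sum = 0;
	while (*buf)
		sum = sum * 31 + (unsigned char)*buf++;
	return sum;
}

/* Reset counters and flags, and record each blob's expected checksum. */
static void toy_init(void)
{
	int i;
	toy_loads = 0;
	for (i = 0; i < TOY_NBLOBS; i++) {
		toy_store[i].expect_sum = toy_sum(toy_store[i].data);
		toy_store[i].flags = 0;
	}
}

/* Simulated expensive read; every call counts as one load. */
static const char *toy_read(int id)
{
	assert(id >= 0 && id < TOY_NBLOBS);
	toy_loads++;
	return toy_store[id].data;
}

/* Old flow: load to verify, test the flag, then load again to output. */
static int toy_export_twice(int id, size_t *written)
{
	const char *buf = toy_read(id);
	if (toy_sum(buf) != toy_store[id].expect_sum)
		return -1;
	if (toy_store[id].flags & TOY_SHOWN)
		return 0;
	buf = toy_read(id);
	*written += strlen(buf);
	toy_store[id].flags |= TOY_SHOWN;
	return 1;
}

/* New flow: test the in-memory flag first, then load and verify once. */
static int toy_export_once(int id, size_t *written)
{
	const char *buf;
	if (toy_store[id].flags & TOY_SHOWN)
		return 0; /* already exported; no read at all */
	buf = toy_read(id);
	if (toy_sum(buf) != toy_store[id].expect_sum)
		return -1;
	*written += strlen(buf);
	toy_store[id].flags |= TOY_SHOWN;
	return 1;
}
```

In this toy model the first export of a blob costs two reads in the old flow and one in the new flow, and a repeated export costs one more read in the old flow and none in the new one. The real savings depend on how expensive the object lookup and sha1 check are, as the timings above show.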
builtin/fast-export.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3eba852..d380155 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -119,6 +119,7 @@ static void export_blob(const unsigned char *sha1)
 	enum object_type type;
 	char *buf;
 	struct object *object;
+	int eaten;

 	if (no_data)
 		return;
@@ -126,16 +127,18 @@ static void export_blob(const unsigned char *sha1)
 	if (is_null_sha1(sha1))
 		return;

-	object = parse_object(sha1);
-	if (!object)
-		die ("Could not read blob %s", sha1_to_hex(sha1));
-
-	if (object->flags & SHOWN)
+	object = lookup_object(sha1);
+	if (object && object->flags & SHOWN)
 		return;

 	buf = read_sha1_file(sha1, &type, &size);
 	if (!buf)
 		die ("Could not read blob %s", sha1_to_hex(sha1));
+	if (check_sha1_signature(sha1, buf, size, typename(type)) < 0)
+		die("sha1 mismatch in blob %s", sha1_to_hex(sha1));
+	object = parse_object_buffer(sha1, type, size, buf, &eaten);
+	if (!object)
+		die("Could not read blob %s", sha1_to_hex(sha1));

 	mark_next_object(object);

@@ -147,7 +150,8 @@ static void export_blob(const unsigned char *sha1)
 	show_progress();

 	object->flags |= SHOWN;
-	free(buf);
+	if (!eaten)
+		free(buf);
 }

 static int depth_first(const void *a_, const void *b_)
--
1.8.2.rc2.7.gef06216
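
The last hunk frees buf only when parse_object_buffer did not set eaten. As a rough illustration of that ownership convention (the parser reports whether it kept the caller's buffer, and the caller frees it only if not), here is a small sketch. The toy_* names and the keep parameter are hypothetical and are not part of git's API; the sketch assumes only that the flag means "the parsed object now owns the buffer".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object that may or may not keep the buffer it was parsed from. */
struct toy_object {
	char *kept;  /* non-NULL only if the object took ownership of the buffer */
	size_t size;
};

/*
 * Parse "buf" into "obj". When keep is nonzero the object retains the
 * buffer and *eaten is set to 1; otherwise *eaten is set to 0 and the
 * caller still owns buf.
 */
static void toy_parse_buffer(struct toy_object *obj, char *buf, size_t size,
			     int keep, int *eaten)
{
	assert(obj && buf && eaten);
	obj->size = size;
	obj->kept = keep ? buf : NULL;
	*eaten = keep ? 1 : 0;
}

/* Caller side: free the buffer only if the parser did not take it. */
static int toy_load_and_parse(struct toy_object *obj, const char *src, int keep)
{
	int eaten;
	size_t size = strlen(src);
	char *buf = malloc(size + 1);

	if (!buf)
		return -1;
	memcpy(buf, src, size + 1);
	toy_parse_buffer(obj, buf, size, keep, &eaten);
	if (!eaten)
		free(buf);
	return eaten;
}

/* Release whatever the object still owns. */
static void toy_release(struct toy_object *obj)
{
	free(obj->kept);
	obj->kept = NULL;
}
```

Freeing unconditionally, as the pre-patch code did, is only safe while nothing else holds on to the buffer. Once the buffer is handed to a parser that may retain it, the conditional free avoids leaving the object with a dangling pointer.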