From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <gitster@pobox.com>,
Git Mailing List <git@vger.kernel.org>
Cc: Kristian H?gsberg <krh@redhat.com>
Subject: Re: performance problem: "git commit filename"
Date: Sat, 12 Jan 2008 20:04:22 -0800 (PST) [thread overview]
Message-ID: <alpine.LFD.1.00.0801121949180.2806@woody.linux-foundation.org> (raw)
In-Reply-To: <alpine.LFD.1.00.0801121735020.2806@woody.linux-foundation.org>
On Sat, 12 Jan 2008, Linus Torvalds wrote:
>
> HOWEVER. When that logic was converted from that shell-script into a
> builtin-commit.c, that conversion was not done correctly. The old "git
> read-tree -i -m" was not translated as a "unpack_trees()" call, but as
> this in prepare_index():
>
> discard_cache()
> ..
> tree = parse_tree_indirect(head_sha1);
> ..
> read_tree(tree, 0, NULL)
>
> which is very wrong, because it replaces the old index entirely, and
> doesn't do that stat information merging.
This patch may or may not fix it.
It makes builtin-commit.c use the same logic that "git read-tree -i -m"
does (which is what the old shell script did), and it seems to pass the
test-suite, and it looks pretty obvious.
It also brings down the number of open/mmap/munmap/close calls to where it
should be, although it still does *way* too many "lstat()" operations (ie
it does 4*lstat for each file in the index - one more than the
non-filename one does).
With that fixed, performance is also roughly where it should be (ie the
17-18s for the cold-cache case), because it no longer needs to rehash all
the files!
HOWEVER. This was just a quick hack, and while it all looks sane, this is
some damn core code. Somebody else should double- and triple-check this.
[ That 4x lstat thing bothers me. I think we should add a flag to the
index saying "we checked this once already, it's clean", so that if we
do multiple passes over the index, we can still do just a single lstat()
on just the first pass. But that's a separate issue.
On Linux, a cached lstat() is almost free. Well, at least compared to
all the crap operating systems out there. And obviously, if you do
multiple lstat's per file, all but the first one *will* be cached.
However, "almost free" still isn't zero, and with the kernel having 23k
files in it, doing almost a hundred thousand lstat's is still something
that only takes about half a second or so for me. We _really_ should do
only ~23k or so of them, and the cached cache should take on the order
of 0.15s, rather than half a second!
So this is worth optimizing. With bigger repositories, it's going to be
more noticeable, and with other operating systems, all those lstat()'s
will cost much _much_ more. Of course, any IO overhead will be much
bigger, so this is mostly a cached-case issue, but cached-case is still
important.. ]
Anyway, consider this being conditionally signed-off-by: me, assuming
a few other people spend a bit of time double-checking all my logic.
Please?
Linus
---
builtin-commit.c | 37 ++++++++++++++++++++++++++++---------
1 files changed, 28 insertions(+), 9 deletions(-)
diff --git a/builtin-commit.c b/builtin-commit.c
index 73f1e35..cc5134e 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -21,6 +21,7 @@
#include "utf8.h"
#include "parse-options.h"
#include "path-list.h"
+#include "unpack-trees.h"
static const char * const builtin_commit_usage[] = {
"git-commit [options] [--] <filepattern>...",
@@ -177,10 +178,34 @@ static void add_remove_files(struct path_list *list)
}
}
+static void create_base_index(void)
+{
+ struct tree *tree;
+ struct unpack_trees_options opts;
+ struct tree_desc t;
+
+ if (initial_commit) {
+ discard_cache();
+ return;
+ }
+
+ memset(&opts, 0, sizeof(opts));
+ opts.head_idx = 1;
+ opts.index_only = 1;
+ opts.merge = 1;
+
+ opts.fn = oneway_merge;
+ tree = parse_tree_indirect(head_sha1);
+ if (!tree)
+ die("failed to unpack HEAD tree object");
+ parse_tree(tree);
+ init_tree_desc(&t, tree->buffer, tree->size);
+ unpack_trees(1, &t, &opts);
+}
+
static char *prepare_index(int argc, const char **argv, const char *prefix)
{
int fd;
- struct tree *tree;
struct path_list partial;
const char **pathspec = NULL;
@@ -278,14 +303,8 @@ static char *prepare_index(int argc, const char **argv, const char *prefix)
fd = hold_lock_file_for_update(&false_lock,
git_path("next-index-%d", getpid()), 1);
- discard_cache();
- if (!initial_commit) {
- tree = parse_tree_indirect(head_sha1);
- if (!tree)
- die("failed to unpack HEAD tree object");
- if (read_tree(tree, 0, NULL))
- die("failed to read HEAD tree object");
- }
+
+ create_base_index();
add_remove_files(&partial);
refresh_cache(REFRESH_QUIET);
next prev parent reply other threads:[~2008-01-13 4:05 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-12 22:46 performance problem: "git commit filename" Linus Torvalds
2008-01-13 1:46 ` Linus Torvalds
2008-01-13 4:04 ` Linus Torvalds [this message]
2008-01-13 5:38 ` Daniel Barkalow
2008-01-13 8:14 ` Junio C Hamano
2008-01-13 16:57 ` Linus Torvalds
2008-01-13 19:31 ` Daniel Barkalow
2008-01-13 8:12 ` Junio C Hamano
2008-01-13 10:33 ` Junio C Hamano
2008-01-13 10:54 ` [PATCH] builtin-commit.c: do not lstat(2) partially committed paths twice Junio C Hamano
2008-01-13 11:09 ` performance problem: "git commit filename" Junio C Hamano
2008-01-13 17:24 ` Linus Torvalds
2008-01-13 19:39 ` Junio C Hamano
2008-01-13 22:36 ` [PATCH] index: be careful when handling long names Junio C Hamano
2008-01-13 22:53 ` Alex Riesen
2008-01-13 23:08 ` Junio C Hamano
2008-01-13 23:33 ` Alex Riesen
2008-01-14 21:03 ` Junio C Hamano
2008-01-14 1:00 ` performance problem: "git commit filename" Junio C Hamano
2008-01-14 17:07 ` Linus Torvalds
2008-01-14 18:38 ` Junio C Hamano
2008-01-14 19:39 ` Linus Torvalds
2008-01-14 20:08 ` Junio C Hamano
2008-01-14 21:00 ` Linus Torvalds
2008-01-15 0:18 ` Linus Torvalds
2008-01-15 1:13 ` Junio C Hamano
2008-01-13 10:38 ` [PATCH] builtin-commit.c: remove useless check added by faulty cut and paste Junio C Hamano
2008-01-14 21:23 ` しらいしななこ
2008-01-14 21:54 ` Junio C Hamano
2008-01-14 23:46 ` performance problem: "git commit filename" Kristian Høgsberg
2008-01-14 23:15 ` Kristian Høgsberg
2008-01-14 23:48 ` Junio C Hamano
2008-01-14 23:53 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.1.00.0801121949180.2806@woody.linux-foundation.org \
--to=torvalds@linux-foundation.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=krh@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).