git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Cc: Kristian H?gsberg <krh@redhat.com>
Subject: Re: performance problem: "git commit filename"
Date: Sat, 12 Jan 2008 20:04:22 -0800 (PST)	[thread overview]
Message-ID: <alpine.LFD.1.00.0801121949180.2806@woody.linux-foundation.org> (raw)
In-Reply-To: <alpine.LFD.1.00.0801121735020.2806@woody.linux-foundation.org>



On Sat, 12 Jan 2008, Linus Torvalds wrote:
> 
> HOWEVER. When that logic was converted from that shell-script into a 
> builtin-commit.c, that conversion was not done correctly. The old "git 
> read-tree -i -m" was not translated as a "unpack_trees()" call, but as 
> this in prepare_index():
> 
> 	discard_cache()
> 	..
> 	tree = parse_tree_indirect(head_sha1);
> 	..
> 	read_tree(tree, 0, NULL)
> 
> which is very wrong, because it replaces the old index entirely, and 
> doesn't do that stat information merging.

This patch may or may not fix it.

It makes builtin-commit.c use the same logic that "git read-tree -i -m" 
does (which is what the old shell script did), and it seems to pass the 
test-suite, and it looks pretty obvious.

It also brings down the number of open/mmap/munmap/close calls to where it 
should be, although it still does *way* too many "lstat()" operations (ie 
it does 4*lstat for each file in the index - one more than the 
non-filename one does).

With that fixed, performance is also roughly where it should be (ie the 
17-18s for the cold-cache case), because it no longer needs to rehash all 
the files!

HOWEVER. This was just a quick hack, and while it all looks sane, this is 
some damn core code. Somebody else should double- and triple-check this.

[ That 4x lstat thing bothers me. I think we should add a flag to the 
  index saying "we checked this once already, it's clean", so that if we 
  do multiple passes over the index, we can still do just a single lstat() 
  on just the first pass. But that's a separate issue.

  On Linux, a cached lstat() is almost free. Well, at least compared to 
  all the crap operating systems out there. And obviously, if you do 
  multiple lstat's per file, all but the first one *will* be cached.

  However, "almost free" still isn't zero, and with the kernel having 23k 
  files in it, doing almost a hundred thousand lstat's is still something 
  that only takes about half a second or so for me. We _really_ should do 
  only ~23k or so of them, and the cached cache should take on the order 
  of 0.15s, rather than half a second!

  So this is worth optimizing. With bigger repositories, it's going to be 
  more noticeable, and with other operating systems, all those lstat()'s 
  will cost much _much_ more. Of course, any IO overhead will be much 
  bigger, so this is mostly a cached-case issue, but cached-case is still 
  important.. ]

Anyway, consider this being conditionally signed-off-by: me, assuming 
a few other people spend a bit of time double-checking all my logic.

Please?

			Linus

---
 builtin-commit.c |   37 ++++++++++++++++++++++++++++---------
 1 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/builtin-commit.c b/builtin-commit.c
index 73f1e35..cc5134e 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -21,6 +21,7 @@
 #include "utf8.h"
 #include "parse-options.h"
 #include "path-list.h"
+#include "unpack-trees.h"
 
 static const char * const builtin_commit_usage[] = {
 	"git-commit [options] [--] <filepattern>...",
@@ -177,10 +178,34 @@ static void add_remove_files(struct path_list *list)
 	}
 }
 
+static void create_base_index(void)
+{
+	struct tree *tree;
+	struct unpack_trees_options opts;
+	struct tree_desc t;
+
+	if (initial_commit) {
+		discard_cache();
+		return;
+	}
+
+	memset(&opts, 0, sizeof(opts));
+	opts.head_idx = 1;
+	opts.index_only = 1;
+	opts.merge = 1;
+	
+	opts.fn = oneway_merge;
+	tree = parse_tree_indirect(head_sha1);
+	if (!tree)
+		die("failed to unpack HEAD tree object");
+	parse_tree(tree);
+	init_tree_desc(&t, tree->buffer, tree->size);
+	unpack_trees(1, &t, &opts);
+}
+
 static char *prepare_index(int argc, const char **argv, const char *prefix)
 {
 	int fd;
-	struct tree *tree;
 	struct path_list partial;
 	const char **pathspec = NULL;
 
@@ -278,14 +303,8 @@ static char *prepare_index(int argc, const char **argv, const char *prefix)
 
 	fd = hold_lock_file_for_update(&false_lock,
 				       git_path("next-index-%d", getpid()), 1);
-	discard_cache();
-	if (!initial_commit) {
-		tree = parse_tree_indirect(head_sha1);
-		if (!tree)
-			die("failed to unpack HEAD tree object");
-		if (read_tree(tree, 0, NULL))
-			die("failed to read HEAD tree object");
-	}
+
+	create_base_index();
 	add_remove_files(&partial);
 	refresh_cache(REFRESH_QUIET);
 

  reply	other threads:[~2008-01-13  4:05 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-12 22:46 performance problem: "git commit filename" Linus Torvalds
2008-01-13  1:46 ` Linus Torvalds
2008-01-13  4:04   ` Linus Torvalds [this message]
2008-01-13  5:38     ` Daniel Barkalow
2008-01-13  8:14       ` Junio C Hamano
2008-01-13 16:57       ` Linus Torvalds
2008-01-13 19:31         ` Daniel Barkalow
2008-01-13  8:12     ` Junio C Hamano
2008-01-13 10:33       ` Junio C Hamano
2008-01-13 10:54         ` [PATCH] builtin-commit.c: do not lstat(2) partially committed paths twice Junio C Hamano
2008-01-13 11:09         ` performance problem: "git commit filename" Junio C Hamano
2008-01-13 17:24         ` Linus Torvalds
2008-01-13 19:39           ` Junio C Hamano
2008-01-13 22:36             ` [PATCH] index: be careful when handling long names Junio C Hamano
2008-01-13 22:53               ` Alex Riesen
2008-01-13 23:08                 ` Junio C Hamano
2008-01-13 23:33                   ` Alex Riesen
2008-01-14 21:03                     ` Junio C Hamano
2008-01-14  1:00           ` performance problem: "git commit filename" Junio C Hamano
2008-01-14 17:07             ` Linus Torvalds
2008-01-14 18:38               ` Junio C Hamano
2008-01-14 19:39               ` Linus Torvalds
2008-01-14 20:08                 ` Junio C Hamano
2008-01-14 21:00                   ` Linus Torvalds
2008-01-15  0:18             ` Linus Torvalds
2008-01-15  1:13               ` Junio C Hamano
2008-01-13 10:38       ` [PATCH] builtin-commit.c: remove useless check added by faulty cut and paste Junio C Hamano
2008-01-14 21:23         ` しらいしななこ
2008-01-14 21:54           ` Junio C Hamano
2008-01-14 23:46     ` performance problem: "git commit filename" Kristian Høgsberg
2008-01-14 23:15   ` Kristian Høgsberg
2008-01-14 23:48     ` Junio C Hamano
2008-01-14 23:53       ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.1.00.0801121949180.2806@woody.linux-foundation.org \
    --to=torvalds@linux-foundation.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=krh@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).