git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Cc: "Kristian Høgsberg" <krh@redhat.com>
Subject: performance problem: "git commit filename"
Date: Sat, 12 Jan 2008 14:46:08 -0800 (PST)	[thread overview]
Message-ID: <alpine.LFD.1.00.0801121426510.2806@woody.linux-foundation.org> (raw)


I thought we had fixed this long long ago, but if we did, it has 
re-surfaced.

Using an explicit filename with "git commit" is _extremely_ slow. Lookie 
here:

	[torvalds@woody linux]$ time git commit fs/exec.c
	no changes added to commit (use "git add" and/or "git commit -a")

	real    0m1.671s
	user    0m1.200s
	sys     0m0.328s

that's closer to two seconds on a fast machine, with the whole tree 
cached!

And for the uncached case, it's just unbearably slow: two and a half 
*minutes*.

In contrast, without the filename, it's much faster:

	[torvalds@woody linux]$ time git commit
	no changes added to commit (use "git add" and/or "git commit -a")
	
	real    0m0.387s
	user    0m0.220s
	sys     0m0.168s

with the cold-cache case now being "just" 18s (which is still long, but 
we're talking eight times faster, and certainly not unbearable!)

Doing an "strace -c" on the thing shows why. In the filename case, we 
have:

	% time     seconds  usecs/call     calls    errors syscall
	------ ----------- ----------- --------- --------- ----------------
	 32.69    0.000868           0     92299        37 lstat
	 17.40    0.000462           0     29958      3993 open
	 15.78    0.000419           0      5522           getdents
	 15.56    0.000413           0     23165           mmap
	 11.37    0.000302           0     23118           munmap
	  5.76    0.000153           0     25966         2 close
	  1.43    0.000038           0      2845           fstat
	...

and in the non-filename case we have

	% time     seconds  usecs/call     calls    errors syscall
	------ ----------- ----------- --------- --------- ----------------
	 53.67    0.000600           0     69227        31 lstat
	 23.35    0.000261           0      5522           getdents
	 11.09    0.000124           2        55           munmap
	  4.20    0.000047           0       285           write
	  3.31    0.000037           0      5537      2638 open
	  2.33    0.000026           0      2899         1 close
	  2.06    0.000023           0      2844           fstat
	...

notice how the expensive case has a lot of successful open/mmap/munmap 
calls: it is *literally* ignoring the valid entries in the old index 
entirely, and re-hashing every single file in the tree! No wonder it is 
slow!

Just counting "lstat()" calls, it's worth noticing that the non-filename 
case seems to do three lstat's for each index entry (and yes, that's two
too many), but the named file case has upped that to *four* lstats per 
entry, and then added the one open/mmap/munmap/close on top of that!

I'm pretty sure we didn't use to do things this badly. And if this is a 
regression like I think it is, it should be fixed before a real 1.5.4 
release.

I'll try to see if I can see what's up, but I thought I'd better let 
others know too, in case I don't have time. I *suspect* (but have nothing 
what-so-ever to back that up) that this happened as part of making commit 
a builtin.

			Linus

             reply	other threads:[~2008-01-12 22:47 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-12 22:46 Linus Torvalds [this message]
2008-01-13  1:46 ` performance problem: "git commit filename" Linus Torvalds
2008-01-13  4:04   ` Linus Torvalds
2008-01-13  5:38     ` Daniel Barkalow
2008-01-13  8:14       ` Junio C Hamano
2008-01-13 16:57       ` Linus Torvalds
2008-01-13 19:31         ` Daniel Barkalow
2008-01-13  8:12     ` Junio C Hamano
2008-01-13 10:33       ` Junio C Hamano
2008-01-13 10:54         ` [PATCH] builtin-commit.c: do not lstat(2) partially committed paths twice Junio C Hamano
2008-01-13 11:09         ` performance problem: "git commit filename" Junio C Hamano
2008-01-13 17:24         ` Linus Torvalds
2008-01-13 19:39           ` Junio C Hamano
2008-01-13 22:36             ` [PATCH] index: be careful when handling long names Junio C Hamano
2008-01-13 22:53               ` Alex Riesen
2008-01-13 23:08                 ` Junio C Hamano
2008-01-13 23:33                   ` Alex Riesen
2008-01-14 21:03                     ` Junio C Hamano
2008-01-14  1:00           ` performance problem: "git commit filename" Junio C Hamano
2008-01-14 17:07             ` Linus Torvalds
2008-01-14 18:38               ` Junio C Hamano
2008-01-14 19:39               ` Linus Torvalds
2008-01-14 20:08                 ` Junio C Hamano
2008-01-14 21:00                   ` Linus Torvalds
2008-01-15  0:18             ` Linus Torvalds
2008-01-15  1:13               ` Junio C Hamano
2008-01-13 10:38       ` [PATCH] builtin-commit.c: remove useless check added by faulty cut and paste Junio C Hamano
2008-01-14 21:23         ` しらいしななこ
2008-01-14 21:54           ` Junio C Hamano
2008-01-14 23:46     ` performance problem: "git commit filename" Kristian Høgsberg
2008-01-14 23:15   ` Kristian Høgsberg
2008-01-14 23:48     ` Junio C Hamano
2008-01-14 23:53       ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.1.00.0801121426510.2806@woody.linux-foundation.org \
    --to=torvalds@linux-foundation.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=krh@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).