git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* 'setup_work_tree()' considered harmful
@ 2008-06-17  0:45 Linus Torvalds
  2008-06-17  0:52 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Linus Torvalds @ 2008-06-17  0:45 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List; +Cc: Johannes Schindelin


[ Dscho cc'd because I think he is the primary culprits for this thing, I 
  think. Commit e90fdc39b6903502192b2dd11e5503cea721a1ad in particular, 
  methinks. ]

So because I was looking at system call traces for object creation (due to 
those "rename to the final resting place" patches I did that got merged 
recently), I've noticed that some commands have *much* worse system call 
patterns than others.

In particular, doing a "git add ." will use absolute pathnames for all git 
files, while a "git diff" will not. And this is quite noticeable - the 
absolute pathnames are not just longer, they have more path components in 
them. Making them a lot slower to look up and use.

Of course, "a lot" depends on things a bit, but it really is noticeable.

To test, I created a kernel tree (no .git), and did a "git init" followed 
by a "git add .". Here's the timings for current git head:

	[torvalds@woody kernel]$ time git add .

	real    0m8.377s
	user    0m6.556s
	sys     0m1.656s

After that, I fixed "write_loose_object()" to not unnecessarily try to 
open a git object file, because every single caller has already done a 
"has_sha1_file(sha1)" or "has_loose_object(sha1)" check before calling 
that function, so trying to open it again is just pointless.

As a result, git add sped up a tiny bit:

	[torvalds@woody kernel]$ time ~/git/git-add .

	real    0m8.341s
	user    0m6.588s
	sys     0m1.548s

but I'll admit that it's not exactly noticeable. Half a percentage point 
is not a big deal.

Then I created a hack that just made "setup_work_tree()" a no-op (because 
it does all that crazy stuff that forces GIT_DIR to be an absolute path 
etc). As a result I got:

	[torvalds@woody kernel]$ time ~/git/git-add .

	real    0m7.849s
	user    0m6.420s
	sys     0m1.296s

ie now we're talking about a 5%+ performance difference.

Of course, this is all for the hot-cache case, and it wouldn't be 
noticeable for a cold-cache case, but it really can be a real performance 
issue. And no, it's not that "setup_work_tree()" itself is expensive, but 
depending on how GIT_DIR is set up, you get very different system call 
patterns.

Here's what current git does for one file (sound/usb/usbmixer_maps.c):

	lstat("sound/usb/usbmixer_maps.c", {st_mode=S_IFREG|0664, st_size=10230, ...}) = 0
	open("sound/usb/usbmixer_maps.c", O_RDONLY) = 4
	mmap(NULL, 10230, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7fa247228000
	close(4)                                = 0
	access("/home/torvalds/kernel/.git/objects/d7/55be0ad8115da5c1296211678f81a7774277d5", F_OK) = -1 ENOENT (No such file or directory)
	open("/home/torvalds/kernel/.git/objects/d7/55be0ad8115da5c1296211678f81a7774277d5", O_RDONLY) = -1 ENOENT (No such file or directory)
	open("/home/torvalds/kernel/.git/objects/d7/tmp_obj_55be0ad8115da5c12962116_vDqXRl", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
	write(4, "x\1\315ZmS\333H\22\276\257\370W\364\361\t\210\1K\206$\300.\273\6\314\306\273@(l\'"..., 3402) = 3402
	fchmod(4, 0444)                         = 0
	close(4)                                = 0
	link("/home/torvalds/kernel/.git/objects/d7/tmp_obj_55be0ad8115da5c12962116_vDqXRl", "/home/torvalds/kernel/.git/objects/d7/55be0ad8115da5c1296211678f81a7774277d5") = 0
	unlink("/home/torvalds/kernel/.git/objects/d7/tmp_obj_55be0ad8115da5c12962116_vDqXRl") = 0
	munmap(0x7fa247228000, 10230)           = 0

and here is the fixed version:

	lstat("sound/usb/usbmixer_maps.c", {st_mode=S_IFREG|0664, st_size=10230, ...}) = 0
	open("sound/usb/usbmixer_maps.c", O_RDONLY) = 4
	mmap(NULL, 10230, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7f54c231a000
	close(4)                                = 0
	access(".git/objects/d7/55be0ad8115da5c1296211678f81a7774277d5", F_OK) = -1 ENOENT (No such file or directory)
	open(".git/objects/d7/tmp_obj_55be0ad8115da5c12962116_fKy6px", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
	write(4, "x\1\315ZmS\333H\22\276\257\370W\364\361\t\210\1K\206$\300.\273\6\314\306\273@(l\'"..., 3402) = 3402
	fchmod(4, 0444)                         = 0
	close(4)                                = 0
	link(".git/objects/d7/tmp_obj_55be0ad8115da5c12962116_fKy6px", ".git/objects/d7/55be0ad8115da5c1296211678f81a7774277d5") = 0
	unlink(".git/objects/d7/tmp_obj_55be0ad8115da5c12962116_fKy6px") = 0
	munmap(0x7f54c231a000, 10230)           = 0

ie note how it does one unnecessary "open()" less, but more importantly, 
notice the difference between "/home/torvalds/kernel/.git/*" and ".git/*", 
and realize that that second difference was the much more noticeable one.

And no, obviously the right fix is not to just comment out all of 
"setup_work_tree()" (it will break stuff that depends on GIT_WORKTREE), 
but I did that as a minimal example of showing what the bad effect of that 
function is.

In general, I think we've gone in the wrong direction with a lot of the 
"make_absolute_path" stuff. See above. 5% performance loss is not good.

		Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 'setup_work_tree()' considered harmful
  2008-06-17  0:45 'setup_work_tree()' considered harmful Linus Torvalds
@ 2008-06-17  0:52 ` Linus Torvalds
  2008-06-17 11:01 ` Johannes Schindelin
  2008-06-18  9:05 ` Mike Hommey
  2 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2008-06-17  0:52 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List; +Cc: Johannes Schindelin



On Mon, 16 Jun 2008, Linus Torvalds wrote:
> 
> After that, I fixed "write_loose_object()" to not unnecessarily try to 
> open a git object file, because every single caller has already done a 
> "has_sha1_file(sha1)" or "has_loose_object(sha1)" check before calling 
> that function, so trying to open it again is just pointless.

In case anybody cares, the obvious diff for this part is appended. The 
FIXME! was wrong anyway - any collision detection needs to be done by the 
callers when they do the "do we already have this object". Doing it in 
write_loose_object() is too late (not to mention the wrong place to try to 
do so anyway - not all callers generate new objects: some callers just 
turn a packed object into a loose one).

			Linus

---
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 16 Jun 2008 17:17:10 -0700
Subject: [PATCH] write_loose_object: don't bother trying to read an old object

Before even calling this, all callers have done a "has_sha1_file(sha1)"
or "has_loose_object(sha1)" check, so there is no point in doing a
second check.

If something races with us on object creation, we handle that in the
final link() that moves it to the right place.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 sha1_file.c |   14 --------------
 1 files changed, 0 insertions(+), 14 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index f9ec069..4255099 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2160,20 +2160,6 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	static char tmpfile[PATH_MAX];
 
 	filename = sha1_file_name(sha1);
-	fd = open(filename, O_RDONLY);
-	if (fd >= 0) {
-		/*
-		 * FIXME!!! We might do collision checking here, but we'd
-		 * need to uncompress the old file and check it. Later.
-		 */
-		close(fd);
-		return 0;
-	}
-
-	if (errno != ENOENT) {
-		return error("sha1 file %s: %s\n", filename, strerror(errno));
-	}
-
 	fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
 	if (fd < 0) {
 		if (errno == EPERM)
-- 
1.5.6.rc3.7.g336d0.dirty

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: 'setup_work_tree()' considered harmful
  2008-06-17  0:45 'setup_work_tree()' considered harmful Linus Torvalds
  2008-06-17  0:52 ` Linus Torvalds
@ 2008-06-17 11:01 ` Johannes Schindelin
  2008-06-18  9:05 ` Mike Hommey
  2 siblings, 0 replies; 5+ messages in thread
From: Johannes Schindelin @ 2008-06-17 11:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

Hi,

On Mon, 16 Jun 2008, Linus Torvalds wrote:

> [ Dscho cc'd because I think he is the primary culprits for this thing, I 
>   think. Commit e90fdc39b6903502192b2dd11e5503cea721a1ad in particular, 
>   methinks. ]

Yes, I am not happy with work-tree.  In fact, I grew to positively hate 
it.  IMHO it was not well designed when it entered Git, and unfortunately 
my endeavors to clean it up were not very successful, either.

> [...]
>
> In particular, doing a "git add ." will use absolute pathnames for all 
> git files, while a "git diff" will not. And this is quite noticeable - 
> the absolute pathnames are not just longer, they have more path 
> components in them. Making them a lot slower to look up and use.
> 
> [...]
> 
> In general, I think we've gone in the wrong direction with a lot of the 
> "make_absolute_path" stuff. See above. 5% performance loss is not good.

Yes, that is true.  However, I think we need it for the case where 
work_tree is set (technically, we could try to be clever when work_tree is 
set in such a manner that we can operate with relative paths, but I think 
that is just not worth it).

So I am thinking about reverting to the old behavior, but _just_ for the 
common case that no work tree was set.

This might be more tricky than it used to be, because of the many special 
cases work trees require.

I briefly considered working on this now, but I simply do not have the 
time, and I seem to be unconcentrated these days.  Probably the best thing 
would be to scrap the whole work-tree thing and throw it out, admitting 
that it was a mistake to begin with.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 'setup_work_tree()' considered harmful
  2008-06-17  0:45 'setup_work_tree()' considered harmful Linus Torvalds
  2008-06-17  0:52 ` Linus Torvalds
  2008-06-17 11:01 ` Johannes Schindelin
@ 2008-06-18  9:05 ` Mike Hommey
  2008-06-18 16:26   ` Linus Torvalds
  2 siblings, 1 reply; 5+ messages in thread
From: Mike Hommey @ 2008-06-18  9:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List, Johannes Schindelin

On Mon, Jun 16, 2008 at 05:45:25PM -0700, Linus Torvalds wrote:
> ie note how it does one unnecessary "open()" less, but more importantly, 
> notice the difference between "/home/torvalds/kernel/.git/*" and ".git/*", 
> and realize that that second difference was the much more noticeable one.
> 
> And no, obviously the right fix is not to just comment out all of 
> "setup_work_tree()" (it will break stuff that depends on GIT_WORKTREE), 
> but I did that as a minimal example of showing what the bad effect of that 
> function is.
> 
> In general, I think we've gone in the wrong direction with a lot of the 
> "make_absolute_path" stuff. See above. 5% performance loss is not good.

Maybe using openat, fstatat, etc. when they are available, could be a
good thing, already, though it wouldn't help for other platforms.

Mike

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 'setup_work_tree()' considered harmful
  2008-06-18  9:05 ` Mike Hommey
@ 2008-06-18 16:26   ` Linus Torvalds
  0 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2008-06-18 16:26 UTC (permalink / raw)
  To: Mike Hommey; +Cc: Junio C Hamano, Git Mailing List, Johannes Schindelin



On Wed, 18 Jun 2008, Mike Hommey wrote:
> 
> Maybe using openat, fstatat, etc. when they are available, could be a
> good thing, already, though it wouldn't help for other platforms.

I agree that openat() and friends would be nice for this (use a file 
descriptor instead of a string to point to object directories etc), but 
it's unportable enough that the pain of having to have two totally 
different access routines is just not worth it. It could be done, but it 
would need some really nifty abstraction layer.

We do already have _part_ of that abstraction layer with things like 
"git_path()" etc, but it would have to be extended upon quite a bit. 

		Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-06-18 16:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-17  0:45 'setup_work_tree()' considered harmful Linus Torvalds
2008-06-17  0:52 ` Linus Torvalds
2008-06-17 11:01 ` Johannes Schindelin
2008-06-18  9:05 ` Mike Hommey
2008-06-18 16:26   ` Linus Torvalds

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).