* 'setup_work_tree()' considered harmful
@ 2008-06-17 0:45 Linus Torvalds
2008-06-17 0:52 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Linus Torvalds @ 2008-06-17 0:45 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List; +Cc: Johannes Schindelin
[ Dscho cc'd because I think he is the primary culprits for this thing, I
think. Commit e90fdc39b6903502192b2dd11e5503cea721a1ad in particular,
methinks. ]
So because I was looking at system call traces for object creation (due to
those "rename to the final resting place" patches I did that got merged
recently), I've noticed that some commands have *much* worse system call
patterns than others.
In particular, doing a "git add ." will use absolute pathnames for all git
files, while a "git diff" will not. And this is quite noticeable - the
absolute pathnames are not just longer, they have more path components in
them. Making them a lot slower to look up and use.
Of course, "a lot" depends on things a bit, but it really is noticeable.
To test, I created a kernel tree (no .git), and did a "git init" followed
by a "git add .". Here's the timings for current git head:
[torvalds@woody kernel]$ time git add .
real 0m8.377s
user 0m6.556s
sys 0m1.656s
After that, I fixed "write_loose_object()" to not unnecessarily try to
open a git object file, because every single caller has already done a
"has_sha1_file(sha1)" or "has_loose_object(sha1)" check before calling
that function, so trying to open it again is just pointless.
As a result, git add sped up a tiny bit:
[torvalds@woody kernel]$ time ~/git/git-add .
real 0m8.341s
user 0m6.588s
sys 0m1.548s
but I'll admit that it's not exactly noticeable. Half a percentage point
is not a big deal.
Then I created a hack that just made "setup_work_tree()" a no-op (because
it does all that crazy stuff that forces GIT_DIR to be an absolute path
etc). As a result I got:
[torvalds@woody kernel]$ time ~/git/git-add .
real 0m7.849s
user 0m6.420s
sys 0m1.296s
ie now we're talking about a 5%+ performance difference.
Of course, this is all for the hot-cache case, and it wouldn't be
noticeable for a cold-cache case, but it really can be a real performance
issue. And no, it's not that "setup_work_tree()" itself is expensive, but
depending on how GIT_DIR is set up, you get very different system call
patterns.
Here's what current git does for one file (sound/usb/usbmixer_maps.c):
lstat("sound/usb/usbmixer_maps.c", {st_mode=S_IFREG|0664, st_size=10230, ...}) = 0
open("sound/usb/usbmixer_maps.c", O_RDONLY) = 4
mmap(NULL, 10230, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7fa247228000
close(4) = 0
access("/home/torvalds/kernel/.git/objects/d7/55be0ad8115da5c1296211678f81a7774277d5", F_OK) = -1 ENOENT (No such file or directory)
open("/home/torvalds/kernel/.git/objects/d7/55be0ad8115da5c1296211678f81a7774277d5", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/torvalds/kernel/.git/objects/d7/tmp_obj_55be0ad8115da5c12962116_vDqXRl", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
write(4, "x\1\315ZmS\333H\22\276\257\370W\364\361\t\210\1K\206$\300.\273\6\314\306\273@(l\'"..., 3402) = 3402
fchmod(4, 0444) = 0
close(4) = 0
link("/home/torvalds/kernel/.git/objects/d7/tmp_obj_55be0ad8115da5c12962116_vDqXRl", "/home/torvalds/kernel/.git/objects/d7/55be0ad8115da5c1296211678f81a7774277d5") = 0
unlink("/home/torvalds/kernel/.git/objects/d7/tmp_obj_55be0ad8115da5c12962116_vDqXRl") = 0
munmap(0x7fa247228000, 10230) = 0
and here is the fixed version:
lstat("sound/usb/usbmixer_maps.c", {st_mode=S_IFREG|0664, st_size=10230, ...}) = 0
open("sound/usb/usbmixer_maps.c", O_RDONLY) = 4
mmap(NULL, 10230, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7f54c231a000
close(4) = 0
access(".git/objects/d7/55be0ad8115da5c1296211678f81a7774277d5", F_OK) = -1 ENOENT (No such file or directory)
open(".git/objects/d7/tmp_obj_55be0ad8115da5c12962116_fKy6px", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
write(4, "x\1\315ZmS\333H\22\276\257\370W\364\361\t\210\1K\206$\300.\273\6\314\306\273@(l\'"..., 3402) = 3402
fchmod(4, 0444) = 0
close(4) = 0
link(".git/objects/d7/tmp_obj_55be0ad8115da5c12962116_fKy6px", ".git/objects/d7/55be0ad8115da5c1296211678f81a7774277d5") = 0
unlink(".git/objects/d7/tmp_obj_55be0ad8115da5c12962116_fKy6px") = 0
munmap(0x7f54c231a000, 10230) = 0
ie note how it does one unnecessary "open()" less, but more importantly,
notice the difference between "/home/torvalds/kernel/.git/*" and ".git/*",
and realize that that second difference was the much more noticeable one.
And no, obviously the right fix is not to just comment out all of
"setup_work_tree()" (it will break stuff that depends on GIT_WORKTREE),
but I did that as a minimal example of showing what the bad effect of that
function is.
In general, I think we've gone in the wrong direction with a lot of the
"make_absolute_path" stuff. See above. 5% performance loss is not good.
Linus
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 'setup_work_tree()' considered harmful
2008-06-17 0:45 'setup_work_tree()' considered harmful Linus Torvalds
@ 2008-06-17 0:52 ` Linus Torvalds
2008-06-17 11:01 ` Johannes Schindelin
2008-06-18 9:05 ` Mike Hommey
2 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2008-06-17 0:52 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List; +Cc: Johannes Schindelin
On Mon, 16 Jun 2008, Linus Torvalds wrote:
>
> After that, I fixed "write_loose_object()" to not unnecessarily try to
> open a git object file, because every single caller has already done a
> "has_sha1_file(sha1)" or "has_loose_object(sha1)" check before calling
> that function, so trying to open it again is just pointless.
In case anybody cares, the obvious diff for this part is appended. The
FIXME! was wrong anyway - any collision detection needs to be done by the
callers when they do the "do we already have this object". Doing it in
write_loose_object() is too late (not to mention the wrong place to try to
do so anyway - not all callers generate new objects: some callers just
turn a packed object into a loose one).
Linus
---
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 16 Jun 2008 17:17:10 -0700
Subject: [PATCH] write_loose_object: don't bother trying to read an old object
Before even calling this, all callers have done a "has_sha1_file(sha1)"
or "has_loose_object(sha1)" check, so there is no point in doing a
second check.
If something races with us on object creation, we handle that in the
final link() that moves it to the right place.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
sha1_file.c | 14 --------------
1 files changed, 0 insertions(+), 14 deletions(-)
diff --git a/sha1_file.c b/sha1_file.c
index f9ec069..4255099 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2160,20 +2160,6 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
static char tmpfile[PATH_MAX];
filename = sha1_file_name(sha1);
- fd = open(filename, O_RDONLY);
- if (fd >= 0) {
- /*
- * FIXME!!! We might do collision checking here, but we'd
- * need to uncompress the old file and check it. Later.
- */
- close(fd);
- return 0;
- }
-
- if (errno != ENOENT) {
- return error("sha1 file %s: %s\n", filename, strerror(errno));
- }
-
fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
if (fd < 0) {
if (errno == EPERM)
--
1.5.6.rc3.7.g336d0.dirty
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: 'setup_work_tree()' considered harmful
2008-06-17 0:45 'setup_work_tree()' considered harmful Linus Torvalds
2008-06-17 0:52 ` Linus Torvalds
@ 2008-06-17 11:01 ` Johannes Schindelin
2008-06-18 9:05 ` Mike Hommey
2 siblings, 0 replies; 5+ messages in thread
From: Johannes Schindelin @ 2008-06-17 11:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
Hi,
On Mon, 16 Jun 2008, Linus Torvalds wrote:
> [ Dscho cc'd because I think he is the primary culprits for this thing, I
> think. Commit e90fdc39b6903502192b2dd11e5503cea721a1ad in particular,
> methinks. ]
Yes, I am not happy with work-tree. In fact, I grew to positively hate
it. IMHO it was not well designed when it entered Git, and unfortunately
my endeavors to clean it up were not very successful, either.
> [...]
>
> In particular, doing a "git add ." will use absolute pathnames for all
> git files, while a "git diff" will not. And this is quite noticeable -
> the absolute pathnames are not just longer, they have more path
> components in them. Making them a lot slower to look up and use.
>
> [...]
>
> In general, I think we've gone in the wrong direction with a lot of the
> "make_absolute_path" stuff. See above. 5% performance loss is not good.
Yes, that is true. However, I think we need it for the case where
work_tree is set (technically, we could try to be clever when work_tree is
set in such a manner that we can operate with relative paths, but I think
that is just not worth it).
So I am thinking about reverting to the old behavior, but _just_ for the
common case that no work tree was set.
This might be more tricky than it used to be, because of the many special
cases work trees require.
I briefly considered working on this now, but I simply do not have the
time, and I seem to be unconcentrated these days. Probably the best thing
would be to scrap the whole work-tree thing and throw it out, admitting
that it was a mistake to begin with.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 'setup_work_tree()' considered harmful
2008-06-17 0:45 'setup_work_tree()' considered harmful Linus Torvalds
2008-06-17 0:52 ` Linus Torvalds
2008-06-17 11:01 ` Johannes Schindelin
@ 2008-06-18 9:05 ` Mike Hommey
2008-06-18 16:26 ` Linus Torvalds
2 siblings, 1 reply; 5+ messages in thread
From: Mike Hommey @ 2008-06-18 9:05 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List, Johannes Schindelin
On Mon, Jun 16, 2008 at 05:45:25PM -0700, Linus Torvalds wrote:
> ie note how it does one unnecessary "open()" less, but more importantly,
> notice the difference between "/home/torvalds/kernel/.git/*" and ".git/*",
> and realize that that second difference was the much more noticeable one.
>
> And no, obviously the right fix is not to just comment out all of
> "setup_work_tree()" (it will break stuff that depends on GIT_WORKTREE),
> but I did that as a minimal example of showing what the bad effect of that
> function is.
>
> In general, I think we've gone in the wrong direction with a lot of the
> "make_absolute_path" stuff. See above. 5% performance loss is not good.
Maybe using openat, fstatat, etc. when they are available, could be a
good thing, already, though it wouldn't help for other platforms.
Mike
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 'setup_work_tree()' considered harmful
2008-06-18 9:05 ` Mike Hommey
@ 2008-06-18 16:26 ` Linus Torvalds
0 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2008-06-18 16:26 UTC (permalink / raw)
To: Mike Hommey; +Cc: Junio C Hamano, Git Mailing List, Johannes Schindelin
On Wed, 18 Jun 2008, Mike Hommey wrote:
>
> Maybe using openat, fstatat, etc. when they are available, could be a
> good thing, already, though it wouldn't help for other platforms.
I agree that openat() and friends would be nice for this (use a file
descriptor instead of a string to point to object directories etc), but
it's unportable enough that the pain of having to have two totally
different access routines is just not worth it. It could be done, but it
would need some really nifty abstraction layer.
We do already have _part_ of that abstraction layer with things like
"git_path()" etc, but it would have to be extended upon quite a bit.
Linus
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-06-18 16:29 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-17 0:45 'setup_work_tree()' considered harmful Linus Torvalds
2008-06-17 0:52 ` Linus Torvalds
2008-06-17 11:01 ` Johannes Schindelin
2008-06-18 9:05 ` Mike Hommey
2008-06-18 16:26 ` Linus Torvalds
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).