git@vger.kernel.org mailing list mirror (one of many)
* git performance after directory copy
@ 2010-09-20  9:20 Gaer, A.
  2010-09-20  9:45 ` Michael J Gruber
  2010-09-20  9:56 ` Matthieu Moy
  0 siblings, 2 replies; 8+ messages in thread
From: Gaer, A. @ 2010-09-20  9:20 UTC
  To: git

Hello all,

while moving a project directory around I stumbled over an interesting
phenomenon. On a copied source directory "git status" seems to be about
3 times slower than on the original directory. Only after a "git reset"
both copies behave the same again. Is this connected to the timestamps
of files & directories? Actually I would like to move the project
directories of several software developers to a new partition and
forcing them all to "git reset" in all of their repos is a little bit
annoying. Any suggestions how to "repair" the repos less intrusive?

Here's how I measured. The trees reside on an ext3 FS. I have lots of
free RAM, so after the first operation all further "git status" seem to
run from FS cache in RAM.

$ git clone <path to your preferred kernel>
$ cd kernel
$ time git status # several times!
...
$ time git status
# On branch master
nothing to commit (working directory clean)

real    0m0.691s
user    0m0.256s
sys     0m0.356s

$ cd ..
$ rsync -a kernel/ kernel2/
$ cd kernel2
$ time git status # several times!
...
$ time git status
# On branch master
nothing to commit (working directory clean)

real    0m2.705s
user    0m1.724s
sys     0m0.816s

$ git reset
$ time git status
# On branch master
nothing to commit (working directory clean)

real    0m0.704s
user    0m0.296s
sys     0m0.348s

Regards,
 Andreas.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git performance after directory copy
  2010-09-20  9:20 git performance after directory copy Gaer, A.
@ 2010-09-20  9:45 ` Michael J Gruber
  2010-09-20  9:56 ` Matthieu Moy
  1 sibling, 0 replies; 8+ messages in thread
From: Michael J Gruber @ 2010-09-20  9:45 UTC
  To: Gaer, A.; +Cc: git

Gaer, A. venit, vidit, dixit 20.09.2010 11:20:
> Hello all,
> 
> while moving a project directory around I stumbled over an interesting
> phenomenon. On a copied source directory "git status" seems to be about
> 3 times slower than on the original directory. Only after a "git reset"
> both copies behave the same again. Is this connected to the timestamps
> of files & directories? Actually I would like to move the project
> directories of several software developers to a new partition and
> forcing them all to "git reset" in all of their repos is a little bit
> annoying. Any suggestions how to "repair" the repos less intrusive?
> 

Since you clone from A to B, then copy from B several times, why don't
you clone from A several times instead?

If it's really about moving across file-system boundaries, then I don't
think there's a way around: you need to refresh the index with the
changed inodes information.
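
A minimal sketch of why a copy invalidates the cached stat data, assuming
GNU coreutils ("stat -c", "cp -a"); "cp -a" stands in for the "rsync -a"
used in the original report, and the file names are hypothetical. The
index records the inode number alongside the timestamps of each entry,
so a copy that preserves mtime still fails to match:

```shell
#!/bin/sh
# Sketch: a copy preserves mtime but allocates a new inode.
# Assumes GNU coreutils (stat -c, cp -a); file names are hypothetical.
set -e
demo=$(mktemp -d)
echo "hello" > "$demo/bar.txt"
cp -a "$demo/bar.txt" "$demo/bar-copy.txt"

# %i = inode number, %Y = mtime in seconds since the epoch
orig_inode=$(stat -c %i "$demo/bar.txt")
copy_inode=$(stat -c %i "$demo/bar-copy.txt")
orig_mtime=$(stat -c %Y "$demo/bar.txt")
copy_mtime=$(stat -c %Y "$demo/bar-copy.txt")

echo "inode: $orig_inode -> $copy_inode"
echo "mtime: $orig_mtime -> $copy_mtime"
```

The two inode numbers differ while the two mtime values are equal, which
is the situation every file of a copied working tree ends up in.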

Michael


* Re: git performance after directory copy
  2010-09-20  9:20 git performance after directory copy Gaer, A.
  2010-09-20  9:45 ` Michael J Gruber
@ 2010-09-20  9:56 ` Matthieu Moy
  2010-09-20 10:54   ` Michael J Gruber
                     ` (2 more replies)
  1 sibling, 3 replies; 8+ messages in thread
From: Matthieu Moy @ 2010-09-20  9:56 UTC
  To: Gaer, A.; +Cc: git

"Gaer, A." <Andreas.Gaer@baslerweb.com> writes:

> On a copied source directory "git status" seems to be about
> 3 times slower than on the original directory.

It is expected that the first "git status" be slower. It will most
likely have to actually re-diff the files and update the index
stat-cache.

But I'm surprised that the next "git status" are still slow. Other
people may get a better explanation, but this very much looks like a
bug.

Are you sure you don't have any permission problem, like a read-only
.git/index?

To investigate a bit, you can play with diff.autorefreshindex. When
set to false, "git diff" will tell you about the files which are
identical, but do not have the same stat information (and hence, that
Git has to re-diff). For example:

$ git config diff.autorefreshindex false
$ git diff

# no output: bar.txt exists, but is up to date.

$ touch bar.txt
$ git diff     
diff --git a/bar.txt b/bar.txt

# No actual diff, but the file appears since its stat information is
# different.

$ git diff
diff --git a/bar.txt b/bar.txt

# Further "git diff" behave the same.

$ git status
...
$ git diff  

# git status did update the stat-cache, hence, no output from git diff
# anymore.

Another diagnosis tool would be "strace -e open,lstat64 git status".
For example:

$ touch bar.txt 
$ strace -e open,lstat64 git status |& grep bar.txt
lstat64("bar.txt", {st_mode=S_IFREG|0644, st_size=30, ...}) = 0
open("bar.txt", O_RDONLY|O_LARGEFILE)   = 3
$ strace -e open,lstat64 git status |& grep bar.txt
lstat64("bar.txt", {st_mode=S_IFREG|0644, st_size=30, ...}) = 0

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


* Re: git performance after directory copy
  2010-09-20  9:56 ` Matthieu Moy
@ 2010-09-20 10:54   ` Michael J Gruber
  2010-09-20 11:37   ` AW: " Gaer, A.
  2010-09-20 11:48   ` Johannes Sixt
  2 siblings, 0 replies; 8+ messages in thread
From: Michael J Gruber @ 2010-09-20 10:54 UTC
  To: Matthieu Moy; +Cc: Gaer, A., git

Matthieu Moy venit, vidit, dixit 20.09.2010 11:56:
> "Gaer, A." <Andreas.Gaer@baslerweb.com> writes:
> 
>> On a copied source directory "git status" seems to be about
>> 3 times slower than on the original directory.
> 
> It is expected that the first "git status" be slower. It will most
> likely have to actually re-diff the files and update the index
> stat-cache.
> 
> But I'm surprised that the next "git status" are still slow. Other
> people may get a better explanation, but this very much looks like a
> bug.

In the OP's case, the inode/dev info differs (after copying, or after
moving across FS boundaries). I don't think "git status" updates these.
I noticed (and reported) something like this a while ago when, after a
reboot, I would get warnings about crossing FS boundaries. Only a "rm
.git/index && git reset --hard" was able to refresh the index properly.
This case seems to be similar, although a simple "git reset --hard"
appears to suffice.

Michael


* AW: git performance after directory copy
  2010-09-20  9:56 ` Matthieu Moy
  2010-09-20 10:54   ` Michael J Gruber
@ 2010-09-20 11:37   ` Gaer, A.
  2010-09-20 11:48   ` Johannes Sixt
  2 siblings, 0 replies; 8+ messages in thread
From: Gaer, A. @ 2010-09-20 11:37 UTC
  To: Matthieu Moy; +Cc: git

Hello all,

thanks Matthieu for your tip. "git diff" without "autorefreshindex"
shows that all files have different stat info. Calling "git update-index
--refresh" seems to be the right cure.
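
A minimal sketch of how that refresh could be run over several
developers' directories in one go, assuming every repository has an
ordinary ".git" directory somewhere below a common base path. The
helper names refresh_repos and count_stat_dirty are hypothetical, not
part of Git:

```shell
#!/bin/sh
# Sketch: refresh the cached stat info of every repository below a directory.
# refresh_repos and count_stat_dirty are hypothetical helper names.

# Print the number of index entries that "git diff-files" reports as changed.
# diff-files does not refresh the index, so stat-only mismatches count too.
count_stat_dirty() {
    (cd "$1" && git diff-files --name-only | wc -l | tr -d ' ')
}

# Run "git update-index --refresh" in each working tree found below $1.
refresh_repos() {
    find "$1" -type d -name .git -prune -print | while read -r gitdir; do
        worktree=$(dirname "$gitdir")
        echo "refreshing $worktree"
        # -q: do not error out when a file really is modified
        (cd "$worktree" && git update-index -q --refresh) || true
    done
}
```

It would be called once after the move, for example as
"refresh_repos /path/to/new/partition" (a placeholder path), so that
nobody has to run "git reset" by hand. Files with real modifications
are left alone; only the stat information of unchanged files is updated.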

What I find a little bit confusing is that "git status" does not warn
about such a situation, or even "repair" it. As I said, I wanted to move
the project directories of several developers to a new partition without
too much interference and the first try didn't succeed because people
complained about "bad" "git status" performance (people get used to git
rocket-fast performance very soon ;-).

Maybe this was introduced in the 1.7 release: "git status" is not "git
commit --dry-run" anymore. "git commit --dry-run" does behave as you
expect: first call takes a little bit longer, subsequent calls are fast
again.

BTW, I tested on a system with git version 1.7.1 installed, but release
notes do not suggest any changes in that respect in 1.7.2 or 1.7.3.

Regards,
 Andreas.


* Re: git performance after directory copy
  2010-09-20  9:56 ` Matthieu Moy
  2010-09-20 10:54   ` Michael J Gruber
  2010-09-20 11:37   ` AW: " Gaer, A.
@ 2010-09-20 11:48   ` Johannes Sixt
  2010-09-20 11:53     ` AW: " Gaer, A.
  2010-09-20 13:57     ` Matthieu Moy
  2 siblings, 2 replies; 8+ messages in thread
From: Johannes Sixt @ 2010-09-20 11:48 UTC
  To: Matthieu Moy; +Cc: Gaer, A., git

On 9/20/2010 11:56, Matthieu Moy wrote:
> But I'm surprised that the next "git status" are still slow. Other
> people may get a better explanation, but this very much looks like a
> bug.

Most likely, Andreas works with 1.7.1. From the release notes of 1.7.1.1:

 * "git status" stopped refreshing the index by mistake in 1.7.1.
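
A minimal sketch of a check for that one release, assuming the usual
"git version X.Y.Z" output format of "git --version". The helper name
status_refresh_affected is hypothetical, and only the release named in
the note above is treated as affected:

```shell
#!/bin/sh
# Sketch: flag the one release named in the 1.7.1.1 release notes.
# status_refresh_affected is a hypothetical helper name.

# Succeeds (exit 0) if the given "git --version" output is exactly 1.7.1.
status_refresh_affected() {
    # "git version 1.7.1" -> "1.7.1"
    version=${1#git version }
    [ "$version" = "1.7.1" ]
}

if status_refresh_affected "$(git --version 2>/dev/null)"; then
    echo "git 1.7.1: upgrade, or run git update-index --refresh after copying"
else
    echo "not the affected 1.7.1 release"
fi
```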

-- Hannes


* AW: git performance after directory copy
  2010-09-20 11:48   ` Johannes Sixt
@ 2010-09-20 11:53     ` Gaer, A.
  2010-09-20 13:57     ` Matthieu Moy
  1 sibling, 0 replies; 8+ messages in thread
From: Gaer, A. @ 2010-09-20 11:53 UTC
  To: Johannes Sixt, Matthieu Moy; +Cc: git

Hello Johannes,

you are right. Sorry. I totally overlooked the release notes for 1.7.1.1.
Thanks for all your help!

Regards,
 Andreas.
 

-----Original Message-----
From: Johannes Sixt [mailto:j.sixt@viscovery.net] 
Sent: Monday, 20 September 2010 13:48
To: Matthieu Moy
Cc: Gaer, A.; git@vger.kernel.org
Subject: Re: git performance after directory copy

On 9/20/2010 11:56, Matthieu Moy wrote:
> But I'm surprised that the next "git status" are still slow. Other
> people may get a better explanation, but this very much looks like a
> bug.

Most likely, Andreas works with 1.7.1. From the release notes of 1.7.1.1:

 * "git status" stopped refreshing the index by mistake in 1.7.1.

-- Hannes


* Re: git performance after directory copy
  2010-09-20 11:48   ` Johannes Sixt
  2010-09-20 11:53     ` AW: " Gaer, A.
@ 2010-09-20 13:57     ` Matthieu Moy
  1 sibling, 0 replies; 8+ messages in thread
From: Matthieu Moy @ 2010-09-20 13:57 UTC
  To: Johannes Sixt; +Cc: Gaer, A., git

Johannes Sixt <j.sixt@viscovery.net> writes:

> On 9/20/2010 11:56, Matthieu Moy wrote:
>> But I'm surprised that the next "git status" are still slow. Other
>> people may get a better explanation, but this very much looks like a
>> bug.
>
> Most likely, Andreas works with 1.7.1. From the release notes of 1.7.1.1:
>
>  * "git status" stopped refreshing the index by mistake in 1.7.1.

Nice catch, and this explains why I couldn't reproduce with latest
Git.

More precisely, it was fixed here:

b2f6fd9 t7508: add a test for "git status" in a read-only repository
4bb6644 git status: refresh the index if possible
4c926b3 t7508: add test for "git status" refreshing the index

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


end of thread, other threads:[~2010-09-20 14:02 UTC | newest]
