git@vger.kernel.org mailing list mirror (one of many)
* git commit results in many lstat()s
From: Gumbel, Matthew K @ 2017-02-01 21:45 UTC
  To: git@vger.kernel.org

Hello,

My high-level problem is to speed up git commit on a large repository stored on an NFS filesystem. I see via strace that it is slow because it makes a large number (~50,000) of lstat() calls serially. Every call is a round trip to the NFS server.

I do not understand why git commit must call lstat() on every file in the repository, even when I specify the name of the file I want to commit on the command line. Can somebody explain why it must call lstat on every file?

My command line looks like this:

    git commit -uno -o -m asdf file-to-commit.txt

Secondly, are there any optimizations I can make to avoid this behavior?

Thanks,
Matt



* Re: git commit results in many lstat()s
From: Junio C Hamano @ 2017-02-01 22:11 UTC
  To: Gumbel, Matthew K; +Cc: git@vger.kernel.org

"Gumbel, Matthew K" <matthew.k.gumbel@intel.com> writes:

> I do not understand why git commit must call lstat() on every file
> in the repository, even when I specify the name of the file I want
> to commit on the command line.

Assuming the "COPYING" and "README.md" files are already tracked:

    $ >COPYING
    $ >README.md
    $ git commit COPYING

would open an editor, in which you would see a list of files under
"Changes to be committed", "Changes not staged for commit", etc.
README.md would be listed in the second class.

To figure out which paths are "changed" without having to open every
file and compare its contents with what is recorded in the commit
you are building on top of, we call lstat(2) on each file to see
whether its timestamp (and other information in the inode) is the
same as when you checked it out of HEAD.
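The stat-based check described above can be illustrated with a small shell sketch. It is a hypothetical approximation, not Git's code: `stat_sig` and `stat_changed` are made-up names, it assumes GNU coreutils stat(1), and real Git also compares device, uid/gid, mode and sub-second timestamps, and handles "racily clean" entries.

```shell
#!/bin/sh
# Sketch only: illustrates the idea, not Git's implementation.
# Assumes GNU coreutils stat(1); BSD/macOS stat uses different options.

# Print a small "stat signature" for a path: inode, size, mtime, ctime.
# Like lstat(2), GNU stat does not follow a symlink unless given -L.
stat_sig() {
    stat -c '%i %s %Y %Z' -- "$1"
}

# Exit 0 ("changed") if the path's current signature differs from the
# signature recorded earlier in $2. No file content is read.
stat_changed() {
    [ "$(stat_sig "$1")" != "$2" ]
}
```

For example, record `sig=$(stat_sig COPYING)` at checkout time and later run `stat_changed COPYING "$sig"`. Doing this for every tracked path is why the number of lstat(2) calls grows with the size of the repository rather than with the number of paths named on the command line.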

    $ git commit --no-status COPYING

would reduce the number of lstat(2) calls somewhat, because the codepath
is told that it does not have to build the list to be shown in the
editor.  So would

    $ git commit -m "empty COPYING" COPYING

These two only halve the number of lstat(2) calls, by taking advantage
of the fact that the list of "modified files" does not have to be
built.  There probably are other things that can be optimized.
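The COPYING/README.md scenario above can be replayed in a throwaway repository. This sketch is hypothetical: the function name, file contents and demo identity are made up. It takes the `-m` route, so no editor is opened and no status list has to be built.

```shell
#!/bin/sh
# Hypothetical demo: recreate the COPYING/README.md scenario in a new
# repository at $1 and commit only COPYING with an explicit message.
# The body runs in a subshell so the caller's directory is unchanged.
demo_only_commit() (
    repo=$1
    git init -q "$repo" || exit 1
    cd "$repo" || exit 1
    echo license > COPYING
    echo readme > README.md
    git add COPYING README.md
    git -c user.name=Demo -c user.email=demo@example.com \
        commit -q -m initial || exit 1
    # Truncate both tracked files, as with ">COPYING" and ">README.md".
    : > COPYING
    : > README.md
    # Commit only the named path; README.md stays modified but unstaged.
    git -c user.name=Demo -c user.email=demo@example.com \
        commit -q -m "empty COPYING" COPYING
)
```

After `demo_only_commit /tmp/demo-repo`, HEAD records only the change to COPYING, while README.md still shows up as a change not staged for commit.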



* RE: git commit results in many lstat()s
From: Gumbel, Matthew K @ 2017-02-01 22:26 UTC
  To: Junio C Hamano; +Cc: git@vger.kernel.org

"Junio C Hamano" <jch2355@gmail.com> writes:
> There probably are other things that can be optimized.

Yes, I think that when the user passes the --only flag to git-commit, git does not
need to call refresh_cache() in prepare_index() in builtin/commit.c.

I may experiment with that. Do you see any downside or negative side-effects?

Matt


* Re: git commit results in many lstat()s
From: Junio C Hamano @ 2017-02-01 23:50 UTC
  To: Gumbel, Matthew K; +Cc: git@vger.kernel.org

"Gumbel, Matthew K" <matthew.k.gumbel@intel.com> writes:

> "Junio C Hamano" <jch2355@gmail.com> writes:
>> There probably are other things that can be optimized.
>
> Yes, I think that when the user passes --only flag to git-commit, then git does not
> need to call refresh_cache() in prepare_index() in builtin/commit.c.
>
> I may experiment with that. Do you see any downside or negative side-effects?

There may be other fallout, but one that immediately comes to mind
is that it may break the pre-commit hook.

When we get "--only", we prepare a temporary index to create the
commit from, and give it to the pre-commit hook.  The hook expects
the cached stat information to be up to date; in other words, it
does not have to run 'update-index --refresh' before using plumbing
commands like "diff-index" to do its own inspection of the working tree.
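To make the pre-commit concern concrete, here is a hypothetical helper a hook might use. `changed_paths` is a made-up name. `git update-index --refresh` and `git diff-index` are the plumbing commands mentioned above. Without the refresh, diff-index would also report files whose cached stat data is merely stale (for example after a `touch`) as modified. A hook that refreshes first no longer depends on the caller having done so, but it pays for the very lstat() calls the proposed change tries to avoid.

```shell
#!/bin/sh
# Hypothetical pre-commit helper (not part of Git). diff-index trusts the
# cached stat data in the index that GIT_INDEX_FILE points at, so refresh
# that index first.
changed_paths() {
    # Re-lstat() every tracked path and update stale stat info; -q keeps
    # going quietly when some files really need updating.
    git update-index -q --refresh >/dev/null
    # Tracked paths whose working-tree copy differs from HEAD.
    git diff-index --name-only HEAD --
}
```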


* RE: git commit results in many lstat()s
From: Gumbel, Matthew K @ 2017-02-02  0:14 UTC
  To: Junio C Hamano; +Cc: git@vger.kernel.org

"Junio C Hamano" <jch2355@gmail.com> writes:

> "Gumbel, Matthew K" <matthew.k.gumbel@intel.com> writes:
>
>> Yes, I think that when the user passes --only flag to git-commit, then git does not
>> need to call refresh_cache() in prepare_index() in builtin/commit.c.
>>
>> I may experiment with that. Do you see any downside or negative side-effects?

> There may be other fallouts, but one that immediately comes to mind
> is that it may break pre-commit hook.

If a pre-commit hook exists, we can fall back to the original behavior
and call refresh_cache(). Many repositories will not have a pre-commit
hook and can benefit from the speedup.
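The proposed fallback needs to know whether a pre-commit hook would run at all. Git only runs a hook that is an executable file in the hooks directory, so the shipped `*.sample` files do not count. A rough shell approximation follows. `has_pre_commit_hook` is a hypothetical name, and the sketch deliberately ignores `core.hooksPath`. The real check would live in C inside builtin/commit.c.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 (a .git directory) contains an
# executable pre-commit hook, i.e. one that Git would actually run.
# Assumes hooks live in $GIT_DIR/hooks; core.hooksPath is not consulted.
has_pre_commit_hook() {
    hook=$1/hooks/pre-commit
    [ -f "$hook" ] && [ -x "$hook" ]
}
```

Usage would look like `has_pre_commit_hook "$(git rev-parse --git-dir)"`. When it fails, skipping the refresh would be the candidate optimization.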

I'm testing such a change locally. The Git test suite seems to take
quite a while to run. Do you know of any way to run it in parallel or
otherwise speed it up?

Thanks,
Matt


* Re: git commit results in many lstat()s
From: brian m. carlson @ 2017-02-02  0:25 UTC
  To: Gumbel, Matthew K; +Cc: Junio C Hamano, git@vger.kernel.org

On Thu, Feb 02, 2017 at 12:14:30AM +0000, Gumbel, Matthew K wrote:
> I'm testing such a change locally. Git test suite seems to be running for quite
> a while. Do you know any way to run it in parallel or otherwise speed it
> up?

I usually do something like the following:

  make -j3 all && (cd t && GIT_PROVE_OPTS=-j3 make prove)

This, of course, requires that you have Perl's prove installed, which
has been part of core Perl since 5.10.1.
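If you would rather not hard-code -j3, a small wrapper can derive the job count from the number of online CPUs. This is a hypothetical sketch: `test_jobs` and `run_git_tests` are made-up names, and it assumes getconf(1) understands _NPROCESSORS_ONLN, as it does on Linux and macOS.

```shell
#!/bin/sh
# Number of online CPUs, falling back to 1 if it cannot be determined.
test_jobs() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
    case $n in
        ''|*[!0-9]*|0) n=1 ;;
    esac
    echo "$n"
}

# Build Git, then run the test suite through prove with the same
# parallelism. Call this from the top of a git.git checkout.
run_git_tests() {
    njobs=$(test_jobs)
    make -j"$njobs" all && (cd t && GIT_PROVE_OPTS="-j$njobs" make prove)
}
```

Calling `run_git_tests` is then the same as the command above, with -j set to the CPU count instead of 3.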
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

