git@vger.kernel.org mailing list mirror (one of many)
* First Git status takes 40+ minutes, when mounting fileystem/diskimage with 50G GIT repo + 900G of builds articles
@ 2019-08-22 18:02 Saravanan Shanmugham (sarvi)
  2019-08-22 18:13 ` Junio C Hamano
  2019-08-22 19:15 ` Johannes Sixt
  0 siblings, 2 replies; 7+ messages in thread
From: Saravanan Shanmugham (sarvi) @ 2019-08-22 18:02 UTC (permalink / raw)
  To: git@vger.kernel.org



We have a diskimage/filesystem that has a 50G Git repository + 900G of binary/build artifacts and untracked files.
When we mount such a diskimage, the very first “git status” command can take as long as 40-50 minutes.
Subsequent “git status” runs finish in 5-10 seconds.

If I have a diskimage of just the 50G source repository, and I mount it and run “git status”, it takes around 15 seconds.

How can we optimize this to be faster?

I suspect warming the filesystem cache is in play.
But so is the fact that Git walks every tree to find untracked files.

My interest in git status is the 50G of sources/repository, not the 900G of build-generated artifacts in the workspace.
I have tried adding .gitignore files to whole directory trees that contain build artifacts; about 700G is excluded using git ignores, and that only drops the time for git status to 30 minutes, which is still high.
time git status -uno --ignored=no
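
One way to tell the two suspected costs apart is to compare a run that only refreshes tracked files (-uno) with a run that also walks the working tree for untracked files. The following is only a sketch on a throwaway repository; the temporary directory and the file names main.c and build/out.o are made up for illustration and stand in for the real workspace.

```shell
#!/bin/sh
# Sketch: compare "git status" with and without the untracked-file walk.
# The throwaway repository below is a stand-in for the real workspace.
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q .
echo 'int main(void) { return 0; }' > main.c
git add main.c
mkdir build
echo 'object code' > build/out.o    # untracked build output

# Tracked files only: Git checks the index entries, no directory walk.
tracked_only=$(git status --porcelain -uno)

# Normal mode: Git also walks the working tree for untracked files.
with_untracked=$(git status --porcelain -unormal)

printf 'tracked only:\n%s\n' "$tracked_only"
printf 'with untracked:\n%s\n' "$with_untracked"
```

On the real workspace, prefixing each of the two status commands with time after a fresh mount would show how much of the delay comes from the directory walk rather than from checking tracked files.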

Any suggestions on how to root cause and optimize this case?


Thanks,
Sarvi
Occam’s Razor Rules



* Re: First Git status takes 40+ minutes, when mounting fileystem/diskimage with 50G GIT repo + 900G of builds articles
  2019-08-22 18:02 First Git status takes 40+ minutes, when mounting fileystem/diskimage with 50G GIT repo + 900G of builds articles Saravanan Shanmugham (sarvi)
@ 2019-08-22 18:13 ` Junio C Hamano
  2019-08-23  0:32   ` Saravanan Shanmugham (sarvi)
  2019-08-22 19:15 ` Johannes Sixt
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2019-08-22 18:13 UTC (permalink / raw)
  To: Saravanan Shanmugham (sarvi); +Cc: git@vger.kernel.org

"Saravanan Shanmugham (sarvi)" <sarvi@cisco.com> writes:

> I suspect warming the filesystem cache is in play.
> But so is the fact that Git walks every tree to find untracked files.

Enable the untracked cache and "update-index --refresh", before
freezing the repository + working tree state in the diskimage?
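
Spelled out as commands, the suggestion might look like the sketch below. It runs against a throwaway repository with made-up file names; on the real image the same git commands would presumably be run inside the mounted workspace just before the image is frozen. Setting core.untrackedCache is an extra step assumed here so that later commands keep the cache enabled.

```shell
#!/bin/sh
# Sketch: prime the index and the untracked cache before freezing an image.
# The throwaway repository stands in for the mounted workspace.
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
echo 'hello' > tracked.txt
git add tracked.txt
mkdir build
echo 'object code' > build/out.o    # untracked build output

# Enable the untracked cache, which is stored as an extension of the index.
git update-index --untracked-cache

# Assumption: also record the setting in the repository config.
git config core.untrackedCache true

# Refresh the cached stat information for tracked files.
git update-index --refresh

# One status run walks the tree and fills the untracked cache.
git status --porcelain >/dev/null

cache_setting=$(git config --get core.untrackedCache)
echo "core.untrackedCache=$cache_setting"
```

Because both the refreshed stat data and the untracked cache live in the index file, they are saved inside the image along with the repository.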


* Re: First Git status takes 40+ minutes, when mounting fileystem/diskimage with 50G GIT repo + 900G of builds articles
  2019-08-22 18:02 First Git status takes 40+ minutes, when mounting fileystem/diskimage with 50G GIT repo + 900G of builds articles Saravanan Shanmugham (sarvi)
  2019-08-22 18:13 ` Junio C Hamano
@ 2019-08-22 19:15 ` Johannes Sixt
  2019-08-22 19:32   ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Johannes Sixt @ 2019-08-22 19:15 UTC (permalink / raw)
  To: Saravanan Shanmugham (sarvi); +Cc: git@vger.kernel.org

On 22.08.19 at 20:02, Saravanan Shanmugham (sarvi) wrote:
> We have a diskimage/filesystem that has a 50G Git repository + 900G
> of binary/build artifacts and untracked files. When we mount such a
> diskimage, the very first “git status” command can take as long as
> 40-50 minutes. Subsequent “git status” runs finish in 5-10 seconds.
>
> If I have a diskimage of just the 50G source repository, and I mount
> it and run “git status”, it takes around 15 seconds.

Are you saying that you commonly mount and unmount the filesystem?

Git tracks a device number in the index. Could it happen that it is
different every time you mount the filesystem? Because when it is, Git
reads the data and checks whether it has changed. At this time, the
device number is also fixed up in the index. Thereafter, "git status" is
fast because it sees from the cached file properties that no change was
made and does not have to read the data.

You may set "git config core.checkStat minimal" to avoid the problem.
But it may come with its own problems (certain kinds of modifications
would not be noticed, although these would be hard to trigger in practice).
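
For reference, a sketch of applying and inspecting the setting per repository. It uses a throwaway repository rather than the real workspace. According to the git-config documentation, with "minimal" only the whole-second part of mtime (and of ctime, if core.trustCtime is set) and the file size are compared.

```shell
#!/bin/sh
# Sketch: set core.checkStat for one repository and read it back.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Compare only whole-second mtime and file size when checking for changes.
git config core.checkStat minimal
check_stat=$(git config --get core.checkStat)
echo "core.checkStat=$check_stat"

# Remove the setting again to return to the default behaviour.
git config --unset core.checkStat
```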

-- Hannes


* Re: First Git status takes 40+ minutes, when mounting fileystem/diskimage with 50G GIT repo + 900G of builds articles
  2019-08-22 19:15 ` Johannes Sixt
@ 2019-08-22 19:32   ` Junio C Hamano
  2019-08-22 21:29     ` Johannes Sixt
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2019-08-22 19:32 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Saravanan Shanmugham (sarvi), git@vger.kernel.org

Johannes Sixt <j6t@kdbg.org> writes:

> On 22.08.19 at 20:02, Saravanan Shanmugham (sarvi) wrote:
>> We have a diskimage/filesystem that has a 50G Git repository + 900G
>> of binary/build artifacts and untracked files. When we mount such a
>> diskimage, the very first “git status” command can take as long as
>> 40-50 minutes. Subsequent “git status” runs finish in 5-10 seconds.
>>
>> If I have a diskimage of just the 50G source repository, and I mount
>> it and run “git status”, it takes around 15 seconds.
>
> Are you saying that you commonly mount and unmount the filesystem?
>
> Git tracks a device number in the index. Could it happen that it is
> different every time you mount the filesystem?

I read the above to mean that a diskimage file is treated as a
virtual block device on which a filesystem image exists, and it is
mounted via the loopback device mechanism.  In such a case, I do not
think stability of i-num would be an issue (the filesystem image
would record them all).

> You may set "git config core.checkStat minimal" to avoid the problem.
> But it may come with its own problems (certain kinds of modifications
> would not be noticed, although these would be hard to trigger in practice).

Yeah.


* Re: First Git status takes 40+ minutes, when mounting fileystem/diskimage with 50G GIT repo + 900G of builds articles
  2019-08-22 19:32   ` Junio C Hamano
@ 2019-08-22 21:29     ` Johannes Sixt
  2019-08-22 21:36       ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Sixt @ 2019-08-22 21:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Saravanan Shanmugham (sarvi), git@vger.kernel.org

On 22.08.19 at 21:32, Junio C Hamano wrote:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
>> On 22.08.19 at 20:02, Saravanan Shanmugham (sarvi) wrote:
>>> We have a diskimage/filesystem that has a 50G Git repository + 900G
>>> of binary/build artifacts and untracked files. When we mount such a
>>> diskimage, the very first “git status” command can take as long as
>>> 40-50 minutes. Subsequent “git status” runs finish in 5-10 seconds.
>>>
>>> If I have a diskimage of just the 50G source repository, and I mount
>>> it and run “git status”, it takes around 15 seconds.
>>
>> Are you saying that you commonly mount and unmount the filesystem?
>>
>> Git tracks a device number in the index. Could it happen that it is
>> different every time you mount the filesystem?
> 
> I read the above to mean that a diskimage file is treated as a
> virtual block device on which a filesystem image exists, and it is
> mounted via the loopback device mechanism.  In such a case, I do not
> think stability of i-num would be an issue (the filesystem image
> would record them all).

Inode number would be stable, but st_dev may not be. But it looks like a
default build does not use it anyway (I see that we do not define
USE_STDEV), so my guess was most likely wrong.

-- Hannes


* Re: First Git status takes 40+ minutes, when mounting fileystem/diskimage with 50G GIT repo + 900G of builds articles
  2019-08-22 21:29     ` Johannes Sixt
@ 2019-08-22 21:36       ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2019-08-22 21:36 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Saravanan Shanmugham (sarvi), git@vger.kernel.org

Johannes Sixt <j6t@kdbg.org> writes:

> Inode number would be stable, but st_dev may not be. But it looks like a
> default build does not use it anyway (I see that we do not define
> USE_STDEV), so my guess was most likely wrong.

Ahh, thanks.  I overlooked the device number, but I think the
default settings exclude st_dev because it was unstable on NFS and
friends.


* Re: First Git status takes 40+ minutes, when mounting fileystem/diskimage with 50G GIT repo + 900G of builds articles
  2019-08-22 18:13 ` Junio C Hamano
@ 2019-08-23  0:32   ` Saravanan Shanmugham (sarvi)
  0 siblings, 0 replies; 7+ messages in thread
From: Saravanan Shanmugham (sarvi) @ 2019-08-23  0:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

Thanks. That seems to help.

Just for context, we are using a copy-on-write/cloning solution to give developers a prebuilt workspace, with all the platforms fully built and the sources already cloned,
so they are ready for incremental development.

We create an ext4 diskimage with a git clone workspace (50G) and 900G of fully built trees, then freeze that diskimage.
Developers can clone that diskimage into a copy-on-write copy (in 30 seconds), which they can mount and use for incremental build development.

I did the following.

Mounted the existing filesystem
git update-index --untracked-cache
git update-index --refresh

Not sure what you meant by "working tree state in the diskimage".

I then detached the diskimage, dropped all the caches, and remounted the diskimage.

git status: the very first git status after dropping the caches now returns in 1.54 seconds, which is very much acceptable.

I obviously need to read up more on what Git caches and how.

Can you point me to any documentation on what sort of information Git caches, and how to understand and debug it?
I would like to understand what the above git update-index commands actually do to make this faster.
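
As a partial starting point, two built-in facilities can show some of what Git keeps and where time goes: git ls-files --debug prints the stat data recorded in the index for each tracked file, and setting GIT_TRACE_PERFORMANCE=1 makes Git write timing lines to stderr. The sketch below uses a throwaway repository with a made-up file name; it does not cover the contents of the untracked cache itself.

```shell
#!/bin/sh
# Sketch: inspect the stat data cached in the index and collect timings.
set -eu
repo=$(mktemp -d)
trace_log=$(mktemp)
cd "$repo"
git init -q .
echo 'hello' > tracked.txt
git add tracked.txt

# Per-file stat data (ctime, mtime, dev, ino, uid, gid, size) kept in the index.
index_dump=$(git ls-files --debug)
printf '%s\n' "$index_dump"

# Timing information for one status run, captured from stderr.
GIT_TRACE_PERFORMANCE=1 git status --porcelain >/dev/null 2>"$trace_log"
cat "$trace_log"
```

"git update-index --refresh" brings these cached stat fields up to date, so a later "git status" can usually tell from lstat() results alone that a tracked file is unchanged, without reading its contents.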
 
Thanks,
Sarvi
Occam’s Razor Rules

On 8/22/19, 11:13 AM, "Junio C Hamano" <gitster@pobox.com> wrote:

    "Saravanan Shanmugham (sarvi)" <sarvi@cisco.com> writes:
    
    > I suspect warming the filesystem cache is in play.
    > But so is the fact that Git walks every tree to find untracked files.
    
    Enable the untracked cache and "update-index --refresh", before
    freezing the repository + working tree state in the diskimage?
    


