git@vger.kernel.org mailing list mirror (one of many)
From: Stefan Zager <szager@google.com>
To: Karsten Blees <karsten.blees@gmail.com>
Cc: Sebastian Schuberth <sschuberth@gmail.com>,
	git@vger.kernel.org, msysGit <msysgit@googlegroups.com>
Subject: Re: Windows performance / threading file access
Date: Thu, 10 Oct 2013 22:35:47 -0700
Message-ID: <CAHOQ7J_sNnajm9M+QUd-QwkQGP2vOidzAW5_5EzsdwBGTDCnSA@mail.gmail.com>
In-Reply-To: <52574B90.3070309@gmail.com>

On Thu, Oct 10, 2013 at 5:51 PM, Karsten Blees <karsten.blees@gmail.com> wrote:

> >> I've noticed that when working with a very large repository using msys
> >> git, the initial checkout of a cloned repository is excruciatingly
> >> slow (80%+ of total clone time).  The root cause, I think, is that git
> >> does all the file access serially, and that's really slow on Windows.
> >>
>
> What exactly do you mean by "excruciatingly slow"?
>
> I just ran a few tests with a big repo (WebKit, ~2GB, ~200k files). A full
> checkout with git 1.8.4 on my SSD took 52s on Linux and 81s on Windows.
> Xcopy /s took ~4 minutes (so xcopy is much slower than git). On a 'real' HD
> (WD Caviar Green) the Windows checkout took ~9 minutes.

I'm using Blink for my test, which should be more or less indistinguishable
from WebKit.  I'm using a standard spinning disk, no SSD.  For my purposes,
I need to optimize this for "standard"-ish hardware, not best-in-class.

For my test, I first run 'git clone -n <repo>', and then measure the
running time of 'git checkout --force HEAD'.  On Linux, the checkout
command runs in 0:12; on Windows, it's about 3:30.
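
The measurement above could be scripted roughly as follows. This is only
a sketch, assuming Python 3 and a git binary on PATH; the helper name
time_command is made up for illustration and is not part of git.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run cmd to completion; return (exit code, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, cwd=cwd)
    return result.returncode, time.perf_counter() - start


# Example (hypothetical path): time the checkout in a clone that was
# created beforehand with 'git clone -n <repo> /path/to/clone'.
#
#   code, secs = time_command(["git", "checkout", "--force", "HEAD"],
#                             cwd="/path/to/clone")
#   print(f"exit={code} elapsed={secs:.1f}s")
```

Wall-clock time is used because the cost of interest is time spent
waiting on the filesystem, not CPU time.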

> If your numbers are much slower, check for overeager virus scanners and
> probably the infamous "User Account Control" (On Vista/7 (8?), the
> luafv.sys driver slows down things on the system drive even with UAC turned
> off in control panel. The driver can be disabled with "sc config luafv
> start= disabled" + reboot. Reenable with "sc config luafv start= auto").

I confess that I am pretty ignorant about Windows, so I'll have to research
these.

> >> Has anyone considered threading file access to speed this up?  In
> >> particular, I've got my eye on this loop in unpack-trees.c:
> >>
>
> It's probably worth a try, however, in my experience, doing disk IO in
> parallel tends to slow things down due to more disk seeks.

> I'd rather try to minimize seeks, ...
>

In my experience, modern disk controllers are very good at this; it
rarely, if ever, makes sense to try to outsmart them.

But, from talking to Windows-savvy people, I believe the issue is not disk
seek time, but rather the fact that Windows doesn't cache file stat
information.  Instead, it goes all the way to the source of truth (i.e.,
the physical disk) every time it stats a file or directory.  That's what
causes the checkout to be so slow: all those file stats run serially.

Does that sound right?  I'm prepared to be wrong about this; but if no one
has tried it, then it's probably at least worth an experiment.
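
One cheap way to test the stat hypothesis in isolation, before touching
unpack-trees.c, might be to time serial versus threaded lstat calls over
an already checked-out tree. The sketch below assumes Python 3; all
function names and the default of 8 workers are illustrative choices,
not anything taken from git. It measures only stat calls, not the
checkout loop itself.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def list_paths(root):
    """Collect every file and directory path below root."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    return paths


def try_lstat(path):
    """lstat one path; return 1 on success, 0 if the call fails."""
    try:
        os.lstat(path)
        return 1
    except OSError:
        return 0


def stat_serial(paths):
    """lstat the paths one after another; return the number of successes."""
    return sum(try_lstat(p) for p in paths)


def stat_threaded(paths, workers=8):
    """lstat the paths from a thread pool; return the number of successes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(try_lstat, paths))


def timed(fn, *args):
    """Call fn(*args); return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Running timed(stat_serial, paths) and timed(stat_threaded, paths) on the
same tree gives the two numbers to compare. Whichever pass runs second
may benefit from anything the OS did cache during the first, so the
order should be swapped between runs before drawing conclusions.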

Thanks,

Stefan
