git@vger.kernel.org mailing list mirror (one of many)
From: Jeff Hostetler <git@jeffhostetler.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning
Date: Tue, 4 Apr 2017 14:25:34 -0400
Message-ID: <9916fcdf-6d28-82c2-af7e-c144ac0982ca@jeffhostetler.com>
In-Reply-To: <CACBZZX5gYeRWOY+J0E55FfzqnqP8+4JGR9f42Y+MVqUZhxt87A@mail.gmail.com>



On 4/4/2017 11:18 AM, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Apr 4, 2017 at 5:07 PM, Jeff Hostetler <git@jeffhostetler.com> wrote:
>>
>> On 4/3/2017 5:16 PM, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> Add a new manpage that gives an overview of how to tweak git's
>>> performance.
>>>
>>> There's currently no good single resource for things a git site
>>> administrator might want to look into to improve performance for his
>>> site & his users. This unfinished documentation aims to be the first
>>> thing someone might want to look at when investigating ways to improve
>>> git performance.
>>>
>>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>>> ---
>>>
>>> I've been wanting to get something like this started for a while. It's
>>> obviously woefully incomplete. Pointers about what to include would be
>>> great & whether including something like this makes sense.
>>>
>>> Things I have on my TODO list:
>>>
>>>  - Add a section discussing how refs impact performance, suggest
>>>    e.g. archiving old tags if possible, or at least run "git remote
>>>    prune origin" regularly on clients.
>>>
>>>  - Discuss split index a bit, although I'm not very confident in
>>>    describing what its pros & cons are.
>>>
>>>  - Should we be covering good practices for your repo going forward to
>>>    maintain good performance? E.g. don't have some huge tree all in
>>>    one directory (use subdirs), don't add binary (rather
>>>    un-delta-able) content if you can help it etc.
>>>
>>>  - The new core.checksumIndex option being discussed on-list. Which
>>>    actually drove me to finally write this up (hrm, this sounds useful,
>>>    but unless I was watching the list I'd probably never see it...).
>>
>>
>> You might also consider core.preloadIndex.
>
> It's been enabled by default since 2.1.0 (299e29870b), or do you mean
> talk about disabling it, or "this is a perf option we have on by
> default"?
>
> I don't know the pros of disabling that, haven't used it myself & it's
> not clear from the docs.

Sorry, no, don't disable it.  Maybe just add an ack that
it should stay on.
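
For anyone who wants to double-check their own setup, something like
the following shows whether the option has been switched off somewhere
in their config.  This is only a sketch: the helper name
preload_index_state is made up for illustration, and an unset key simply
means git falls back to its built-in default.

```shell
# Print the effective core.preloadIndex value as seen by "git config".
# "git config --get" exits non-zero when the key is not set anywhere,
# in which case git uses its built-in default.
preload_index_state() {
    value=$(git config --bool --get core.preloadIndex) ||
        value="unset (built-in default applies)"
    printf 'core.preloadIndex: %s\n' "$value"
}

preload_index_state
```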


>
>> For people with very large trees, talk about sparse-checkout.
>
> *nod*
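
A rough sketch of what a sparse-checkout example in that section could
look like, using core.sparseCheckout plus "git read-tree -mu HEAD" as
described in git-read-tree(1).  The throwaway repository and the "src"
and "assets" directory names are hypothetical, picked only to make the
demo self-contained.

```shell
# Build a throwaway repository with two top-level directories.
# "src" and "assets" are hypothetical names used only for this demo.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir src assets
echo 'int main(void) { return 0; }' > src/main.c
echo 'large binary stand-in' > assets/blob.bin
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'initial'

# Restrict the working tree to src/ and re-read the index so that the
# sparse patterns take effect.
git config core.sparseCheckout true
mkdir -p .git/info
echo 'src/' > .git/info/sparse-checkout
git read-tree -mu HEAD

# List what is left in the working tree.
ls
```

The files outside the pattern stay in the index and in history; they are
just not populated in the working tree, which is where the savings for
very large trees come from.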
>
>> And (on Windows) core.fscache.  Or leave a place for
>> an addendum for Windows that we can fill in later.
>
> I have no core.fscache in my git.git, did you mean something else?

This is only in the Git for Windows tree.  It hasn't
made it upstream yet.

https://github.com/git-for-windows/git/commits/master/compat/win32/fscache.c

Ignore this for now if you want and we can fill in the details
later for you.

>
>>
>>
>>>
>>>
>>>  Documentation/Makefile           |   1 +
>>>  Documentation/gitperformance.txt | 107 +++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 108 insertions(+)
>>>  create mode 100644 Documentation/gitperformance.txt
>>>
>>> diff --git a/Documentation/Makefile b/Documentation/Makefile
>>> index b5be2e2d3f..528aa22354 100644
>>> --- a/Documentation/Makefile
>>> +++ b/Documentation/Makefile
>>> @@ -23,6 +23,7 @@ MAN5_TXT += gitrepository-layout.txt
>>>  MAN5_TXT += gitweb.conf.txt
>>>
>>>  MAN7_TXT += gitcli.txt
>>> +MAN7_TXT += gitperformance.txt
>>>  MAN7_TXT += gitcore-tutorial.txt
>>>  MAN7_TXT += gitcredentials.txt
>>>  MAN7_TXT += gitcvs-migration.txt
>>> diff --git a/Documentation/gitperformance.txt b/Documentation/gitperformance.txt
>>> new file mode 100644
>>> index 0000000000..0548d1e721
>>> --- /dev/null
>>> +++ b/Documentation/gitperformance.txt
>>> @@ -0,0 +1,107 @@
>>> +gitperformance(7)
>>> +=================
>>> +
>>> +NAME
>>> +----
>>> +gitperformance - How to improve Git's performance
>>> +
>>> +SYNOPSIS
>>> +--------
>>> +
>>> +A guide to improving Git's performance beyond the defaults.
>>> +
>>> +DESCRIPTION
>>> +-----------
>>> +
>>> +Git is mostly performant by default, but ships with various
>>> +configuration options, command-line options, etc. that might improve
>>> +performance, but for various reasons aren't on by default.
>>> +
>>> +This document provides a brief overview of these features.
>>> +
>>> +The reader should not assume that turning on all of these features
>>> +will increase performance. Depending on the repository, workload &
>>> +use-case, turning some of them on might severely harm performance.
>>> +
>>> +This document serves as a starting point for things to look into when
>>> +it comes to improving performance, not as a checklist for things to
>>> +enable or disable.
>>> +
>>> +Performance by topic
>>> +--------------------
>>> +
>>> +It can be hard to divide the performance features into topics, but
>>> +most of them fall into various well-defined buckets. E.g. there are
>>> +features that help with the performance of "git status", and couldn't
>>> +possibly impact repositories without working copies, and then some
>>> +that only impact the performance of cloning from a server, or help the
>>> +server itself etc.
>>> +
>>> +git status
>>> +~~~~~~~~~~
>>> +
>>> +Running "git status" requires traversing the working tree & comparing
>>> +it with the index. Several configuration options can help with its
>>> +performance, with some trade-offs.
>>> +
>>> +- config: "core.untrackedCache=true" (see linkgit:git-config[1]) can
>>> +  save on `stat(2)` calls by caching the mtime of each directory in
>>> +  the working tree; if a directory's mtime is unchanged, git avoids
>>> +  recursing into it to `stat(2)` every file it contains.
>>> ++
>>> +pros: Can drastically speed up "git status".
>>> ++
>>> +cons: There's a speed hit for initially populating & maintaining the
>>> +cache. Doesn't work on all filesystems (see `--test-untracked-cache`
>>> +in linkgit:git-update-index[1]).
>>> +
>>> +- config: "status.showUntrackedFiles=no" (see
>>> +  linkgit:git-config[1]). Skips looking for files in the working tree
>>> +  git doesn't already know about.
>>> ++
>>> +pros: Speeds up "git status" by making it do a lot less work.
>>> ++
>>> +cons: If there are any new & untracked files anywhere in the working
>>> +tree they won't be noticed by git. Makes it easy to accidentally miss
>>> +files to "git add" before committing, or files which might impact the
>>> +code in the working tree, but which git won't know exist.
>>> +
>>> +git grep
>>> +~~~~~~~~
>>> +
>>> +- config: "grep.patternType=perl" (see linkgit:git-config[1]) makes
>>> +  "git grep" use the PCRE library by default. This can be faster than
>>> +  POSIX regular expressions in many cases.
>>> ++
>>> +pros: Can, depending on the use-case, be faster than default "git grep".
>>> ++
>>> +cons: Can also be slower, and in some edge cases produce different
>>> +results.
>>> +
>>> +- config: "grep.threads=*" (see linkgit:git-config[1] &
>>> +  linkgit:git-grep[1]). Tunes the number of "git grep" worker threads.
>>> ++
>>> +pros: Giving this a more optimal value might result in a faster grep.
>>> ++
>>> +cons: It might not.
>>> +
>>> +Server options to help clients
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +These features can be enabled on git servers. They won't help the
>>> +performance of the servers themselves, but will help clients that need
>>> +to talk to those servers.
>>> +
>>> +- config: "repack.writeBitmaps=true" (see
>>> +  linkgit:git-config[1]). Spends more time during repack to produce a
>>> +  bitmap index, which helps clients with "fetch" & "clone" performance.
>>> ++
>>> +pros: Once enabled & run regularly as part of "git repack", speeds up
>>> +"clone" and "fetch".
>>> ++
>>> +cons: Takes extra time during repack, requires doing full
>>> +non-incremental repacks with `-A` or `-a`.
>>> +
>>> +GIT
>>> +---
>>> +Part of the linkgit:git[1] suite
>>>
>>
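
For anyone who wants to try the "git status" options from the quoted
draft before deciding on them, here is a minimal sketch that applies
them to the current repository only (no --global), so the experiment is
easy to undo.  The function names tune_status and untune_status are
invented for this example.  Running "git update-index
--test-untracked-cache" first, as the draft suggests, checks whether
the filesystem supports the untracked cache.

```shell
# Apply the "git status" options from the draft to this repository only.
tune_status() {
    git config core.untrackedCache true
    git config status.showUntrackedFiles no
}

# Remove the local settings again, restoring the previous behaviour.
untune_status() {
    git config --unset core.untrackedCache
    git config --unset status.showUntrackedFiles
}
```

Comparing "time git status" before and after tune_status gives a rough
idea of the gain on a particular working tree.  Note the trade-off named
in the draft: with status.showUntrackedFiles=no, new files no longer
show up in the status output at all.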

Thread overview: 10+ messages
2017-04-03 21:16 [PATCH/RFC] gitperformance: add new documentation about git performance tuning Ævar Arnfjörð Bjarmason
2017-04-03 21:34 ` Eric Wong
2017-04-03 21:57   ` Ævar Arnfjörð Bjarmason
2017-04-03 22:39     ` Eric Wong
2017-04-04 21:12       ` Ævar Arnfjörð Bjarmason
2017-04-04  2:19     ` Jeff King
2017-04-04 15:07 ` Jeff Hostetler
2017-04-04 15:18   ` Ævar Arnfjörð Bjarmason
2017-04-04 18:25     ` Jeff Hostetler [this message]
2017-04-05 12:56 ` Duy Nguyen
