git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Ben Peart <peartben@gmail.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	git <git@vger.kernel.org>, "Ben Peart" <benpeart@microsoft.com>,
	"Alex Vandiver" <alexmv@dropbox.com>,
	"Christian Couder" <christian.couder@gmail.com>,
	jamill@microsoft.com
Subject: Re: Some rough edges of core.fsmonitor
Date: Thu, 1 Feb 2018 17:40:10 +0700	[thread overview]
Message-ID: <CACsJy8CShVSQAZ6ZaGT9i0JDEn19cTMaEmLRDV3ULjaj9DqZUg@mail.gmail.com> (raw)
In-Reply-To: <bb8a433f-bd61-1f3c-2034-1acc96539882@gmail.com>

On Tue, Jan 30, 2018 at 6:16 AM, Ben Peart <peartben@gmail.com> wrote:
>
>
> On 1/29/2018 4:40 AM, Duy Nguyen wrote:
>>
>> On Sat, Jan 27, 2018 at 12:43:41PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> b) with fsmonitor
>>>
>>>      $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
>>>      12:34:23.833625 read-cache.c:1890       performance: 0.049485685 s:
>>> read cache .git/index
>>
>>
>> This is sort of off topic but may be interesting for big repo guys. It
>> looks like read cache's time is partially dominated by malloc().
>>
>
> That is correct.  We have tried a few different ways to address this. First
> was my patch series [1] that would parallelize all of the read cache code.
>
> We quickly found that malloc() was the biggest culprit and by speeding that
> up, we got most of the wins.  At Peff's recommendation [2], we looked into
> using tcmalloc but found that 1) it has bugs on Windows and 2) it isn't
> being actively maintained so it didn't seem those bugs would ever get fixed.
>
> We are currently working on a patch that will use a refactored version of
> the mem_pool in fast-import.c to do block allocations of the cache entries
> which is giving us about a 22% improvement in "git status" times.  One

My apologies if this has been discussed in the second half of 2017
which I have no idea what happened.

I just wonder if it's possible to design a "file format" that is
basically a memory dump of lots of struct cache_entry? This "file"
will stay in a shared memory somewhere and never get written down to
disk. Since it's very close to the data structure we have in core, the
most we need to do after mmap'ing it (and keeping it mmap'd until the
end) is adjust some pointers. These entries are of course read-only.
When you modify/create new entries, they are created new, using the
old malloc(). We just need to make sure not free the read-only
cache_entry entries and munmap() the whole thing when we
discard_index().

This opens up another option to deal with the large UNTR and TREE
extensions in a similar way. These will be your next headache after
you have reduced parse time for main entries.

> challenge has been ensuring that cache entries are not passed from one
> index/mem_pool to another which could cause access after free bugs.

We kind of have something close to that, but not entirely the same.
When split index is used, the same cache_entry can appear in two
index_state structs. Of course you can free only one of them (and you
can only do so when you know both index_state are gone). I see some
code cleanup opportunity :)

>
> [1]
> https://public-inbox.org/git/20171109141737.47976-1-benpeart@microsoft.com/
> [2]
> https://public-inbox.org/git/20171120153846.v5b7ho42yzrznqoh@sigill.intra.peff.net/
-- 
Duy

  reply	other threads:[~2018-02-01 10:40 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-27  0:28 Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
2018-01-27  1:36 ` Duy Nguyen
2018-01-27  1:39   ` [PATCH] trace: measure where the time is spent in the index-heavy operations Nguyễn Thái Ngọc Duy
2018-01-27 11:58     ` Thomas Gummerer
2018-01-27 12:27       ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-01-27 11:43   ` Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
2018-01-27 12:39     ` Duy Nguyen
2018-01-27 13:09       ` Duy Nguyen
2018-01-27 19:01         ` Ævar Arnfjörð Bjarmason
2018-01-30 22:41           ` Ben Peart
2018-01-29  9:40     ` Duy Nguyen
2018-01-29 23:16       ` Ben Peart
2018-02-01 10:40         ` Duy Nguyen [this message]
2018-01-28 20:44 ` Johannes Schindelin
2018-01-28 22:28   ` Ævar Arnfjörð Bjarmason
2018-01-30  1:21     ` Ben Peart
2018-01-31 10:15       ` Duy Nguyen
2018-02-04  9:38         ` [PATCH] dir.c: ignore paths containing .git when invalidating untracked cache Nguyễn Thái Ngọc Duy
2018-02-05 17:44           ` Ben Peart
2018-02-06 12:02             ` Duy Nguyen
2018-02-07  9:21           ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-02-07  9:21             ` Nguyễn Thái Ngọc Duy
2018-02-07 16:59               ` Ben Peart
2018-02-13 10:00                 ` Duy Nguyen
2018-02-13 17:57                   ` Junio C Hamano
2018-02-14  1:24                     ` Duy Nguyen
2018-02-14  8:00                       ` Junio C Hamano
2018-01-30 22:57 ` Some rough edges of core.fsmonitor Ben Peart
2018-01-30 23:16   ` Ævar Arnfjörð Bjarmason
2018-01-31 16:12     ` Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACsJy8CShVSQAZ6ZaGT9i0JDEn19cTMaEmLRDV3ULjaj9DqZUg@mail.gmail.com \
    --to=pclouds@gmail.com \
    --cc=alexmv@dropbox.com \
    --cc=avarab@gmail.com \
    --cc=benpeart@microsoft.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jamill@microsoft.com \
    --cc=peartben@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).