git@vger.kernel.org mailing list mirror (one of many)
From: Taylor Blau <ttaylorr@github.com>
To: Sun Chao <16657101987@163.com>
Cc: "Taylor Blau" <me@ttaylorr.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Sun Chao via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v2] packfile: freshen the mtime of packfile by configuration
Date: Wed, 14 Jul 2021 13:04:03 -0400
Message-ID: <YO8XrOChAtxhpuxS@nand.local>
In-Reply-To: <ACE7ECBE-0D7A-4FB8-B4F9-F9E32BE2234C@163.com>

On Thu, Jul 15, 2021 at 12:46:47AM +0800, Sun Chao wrote:
> > Stepping back, I'm not sure I understand why freshening a pack is so
> > slow for you. freshen_file() just calls utime(2), and any sync back to
> > the disk shouldn't need to update the pack itself, just a couple of
> > fields in its inode. Maybe you could help explain further.
> >
> > [ ... ]
>
> The reason why we want to avoid freshening the mtime of ".pack" files is to
> improve the reading speed of Git servers.
>
> We have some large repositories on our Git servers (some are bigger than 10GB),
> and we created '.keep' files for large ".pack" files. We want the big files
> unchanged to speed up git upload-pack, because in our mind the file system
> cache will reduce the disk IO if a file does not change.
>
> However, we find the mtime of ".pack" files changes over time, which makes the
> file system always reload the big files. That takes a lot of IO time and results
> in lower speed of git upload-pack, and eventually the disk IOPS is exhausted.

That's surprising behavior to me. Are you saying that calling utime(2)
causes the *page* cache to be invalidated and that most reads are
cache-misses lowering overall IOPS?

If so, then I am quite surprised ;). The only state that should be
dirtied by calling utime(2) is the inode itself, so the blocks referred
to by the inode corresponding to a pack should be left intact.
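
As a rough userspace illustration of that distinction, here is a minimal
sketch that bumps a file's mtime and checks that its bytes are unchanged.
It uses touch(1) as a stand-in for the utime(2) call, and "fake.pack",
bump_mtime_check and demo are hypothetical names, not anything from Git.
It cannot show what the kernel does with the page cache; it only shows
that the timestamp update is a metadata-only change from the file's
point of view.

```shell
#!/bin/sh
# Sketch: set a file's mtime into the past and confirm its contents are
# byte-for-byte the same afterwards. All names here are illustrative.

# $1: file whose mtime is bumped, $2: reference file created "now".
bump_mtime_check() {
    file=$1
    ref=$2
    before=$(cksum <"$file") || return 1
    # -m limits the update to the modification time; -t takes CCYYMMDDhhmm.
    touch -m -t 200001010000 "$file" || return 1
    after=$(cksum <"$file") || return 1
    # The reference file must now be newer than the bumped file.
    if [ -z "$(find "$ref" -newer "$file")" ]; then
        echo "mtime not updated"
        return 1
    fi
    if [ "$before" = "$after" ]; then
        echo "mtime changed, contents identical"
    else
        echo "contents changed"
    fi
}

# Run the check against a throwaway file that stands in for a pack.
demo() {
    tmp=$(mktemp -d) || return 1
    printf 'stand-in for pack data\n' >"$tmp/fake.pack"
    : >"$tmp/ref"
    bump_mtime_check "$tmp/fake.pack" "$tmp/ref"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Calling `demo` after sourcing the snippet runs the check in a temporary
directory and cleans up afterwards.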

If you're on Linux, you can try observing the behavior of evicting
inodes, blocks, or both from the disk cache by changing "2" in the
following:

    hyperfine 'git pack-objects --all --stdout --delta-base-offset >/dev/null' \
      --prepare='sync; echo 2 | sudo tee /proc/sys/vm/drop_caches'

where "1" drops the page cache, "2" drops dentries and inodes, and "3"
evicts both.

I wonder if you could share the results of running the above with each
of the values "1", "2", and "3", as well as with the `--prepare` swapped
for `--warmup=3` to warm your caches (and give us an idea of what your
expected performance is probably like).
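
In case it saves some typing, here is a small sketch that prints the four
invocations described above (three cold-cache runs and one warm-cache
run) instead of running them, so they can be reviewed first. The helper
names (drop_caches_label, cold_cmd, warm_cmd, print_plan) and the BENCH
variable are made up for this example; only the hyperfine flags and the
drop_caches path are taken from the command above.

```shell
#!/bin/sh
# Sketch: print one hyperfine invocation per drop_caches mode, plus a
# warm-cache run. Nothing is executed; the helpers only print commands.

BENCH='git pack-objects --all --stdout --delta-base-offset >/dev/null'

# Describe what each /proc/sys/vm/drop_caches value evicts.
drop_caches_label() {
    case "$1" in
        1) echo "page cache" ;;
        2) echo "dentries and inodes" ;;
        3) echo "page cache, dentries and inodes" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Print the cold-cache command for drop_caches mode $1.
cold_cmd() {
    printf "hyperfine '%s' --prepare='sync; echo %s | sudo tee /proc/sys/vm/drop_caches'\n" "$BENCH" "$1"
}

# Print the warm-cache command.
warm_cmd() {
    printf "hyperfine '%s' --warmup=3\n" "$BENCH"
}

# Print all four commands, each preceded by a comment line.
print_plan() {
    for mode in 1 2 3; do
        printf '# mode %s: %s\n' "$mode" "$(drop_caches_label "$mode")"
        cold_cmd "$mode"
    done
    printf '# warm caches\n'
    warm_cmd
}
```

Running `print_plan` shows the commands; piping its output to `sh` from
inside the repository would run them, assuming hyperfine is installed
and sudo is available.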

Thanks,
Taylor
