git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Sun Chao via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Sun Chao <16657101987@163.com>
Subject: [PATCH 0/2] packfile: freshen the mtime of packfile by bump file
Date: Mon, 16 Aug 2021 17:05:59 +0000	[thread overview]
Message-ID: <pull.1066.git.git.1629133561.gitgitgadget@gmail.com> (raw)

packfile: freshen the mtime of packfile by bump file

We've talked about the cache reload through earlier patches(
https://lore.kernel.org/git/pull.1043.git.git.1625943685565.gitgitgadget@gmail.com),
and we stopped because no further evidence can tell NFS client will reload
the page caches if the file mtime changed. So our team have done these
experiments:

Step1: prepare git servers which mount the NFS disk and a big repo

We prepared 3 vms named c1, s1 and s2, we also have a NFS server named n1.
s1 and s2 mount the NFS disk from n1 by:

    mount -t nfs -o vers=3,timeo=600,nolock,noatime,lookupcache=postive,\
    actimeo=3 <n1 ip addr>:/repositories /mnt/repositories


We setup git server services on s1 and s2, so we can clone repos from s1 by
git commands. Then we created a repository under /mnt/repositories, and
pushed large files to the repository, so we can find a large .pack file in
the repository with about 1.2 GB size.

Step2: do first git clone from client after drop caches of s1

First we drop the caches from s1 by:

    sync; echo 3 > /proc/sys/vm/drop_caches


Then we run git command in c1 to clone the huge repository we created in
Step1, at the same time we run the two commands in s1:

    tcpdump -nn host <n1 ip addr> -w 1st_command.pcap
    nfsiostat 1 -p /mnt/repositories


try to get the result and check what happends.

Step3: do new git clones without drop caches of s1

After Step2, we called new git clone command in c1 to clone the huge
repository for serveral times, and also run the commands at the same time:

    tcpdump -nn host <n1 ip addr> -w lots_of_command.pcap
    nfsiostat 1 -p /mnt/repositories


Step4: do new git clones with packfile mtime changed

After Step2 and Step3, we try to touch all the ".pack" files from s2, and we
call a new git clone in c1 to download the huge repository again, and run
the two command in s1 at the same time:

    tcpdump -nn host <n1 ip addr> -w mtime_changed_command.pcap
    nfsiostat 1 -p /mnt/repositories


Result:

We got a about 1.4GB big pcap file during Step2 and Step4, we can find lots
of READ request and response after open it with wireshark. And by
'nfsiostat' command we can see the 'ops/s' and 'KB/s' of 'read' in the
output shows a relatively large value for a while.

But we got a 4MB pcap file in Step3, and open it with wireshark, we can only
find GETATTR and FSSTAT requests and response. And we the 'nfsiostat' always
show 0 in 'ops/s' and 'KB/s' of 'read' part in the output.

We have done Step1 to Step4 serveral times, each time the result are same.

So we can make sure the NFS client will reload the page cache if other NFS
client changes the mtime of the large .pack files. And for git servers which
use filesystem like NFS to manage large repositories, reload large files
that only have mtime changed result big NFS server IOPS pressure and that
also makes the git server slow because the IO is the bottleneck when there
are too many client requests for the same big repositries.

And I do think the team who manage the git servers need a configuration
choise which can enhance the mtime of packfile through another file which
should be small enough or even empty. It should be backward compatibility
when it is in default value, but just as metioned by Ævar before, maybe
somepeople what to use it in mixed-version environment, we should warn them
in documents, but such configuration do big help for some team who run some
servers mount the NFS disks.

Sun Chao (2):
  packfile: rename `derive_filename()` to `derive_pack_filename()`
  packfile: freshen the mtime of packfile by bump file

 Documentation/config/core.txt   |  11 +++
 builtin/index-pack.c            |  19 +----
 cache.h                         |   1 +
 config.c                        |   5 ++
 environment.c                   |   1 +
 object-file.c                   |  30 +++++++-
 packfile.c                      |  25 ++++++-
 packfile.h                      |   7 ++
 t/t5326-pack-mtime-bumpfiles.sh | 118 ++++++++++++++++++++++++++++++++
 9 files changed, 198 insertions(+), 19 deletions(-)
 create mode 100755 t/t5326-pack-mtime-bumpfiles.sh


base-commit: 5d213e46bb7b880238ff5ea3914e940a50ae9369
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1066%2Fsunchao9%2Fmaster-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1066/sunchao9/master-v1
Pull-Request: https://github.com/git/git/pull/1066
-- 
gitgitgadget

             reply	other threads:[~2021-08-16 17:06 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-16 17:05 Sun Chao via GitGitGadget [this message]
2021-08-16 17:06 ` [PATCH 1/2] packfile: rename `derive_filename()` to `derive_pack_filename()` Sun Chao via GitGitGadget
2021-08-16 17:06 ` [PATCH 2/2] packfile: freshen the mtime of packfile by bump file Sun Chao via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1066.git.git.1629133561.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=16657101987@163.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).