git@vger.kernel.org mailing list mirror (one of many)
From: "kylezhao(赵柯宇)" <kylezhao@tencent.com>
To: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: reachability-bitmap makes push performance worse ?
Date: Tue, 14 Jun 2022 07:59:41 +0000
Message-ID: <b940e705fbe9454685757f2e3055e2ce@tencent.com>


Hi All, 
 
Thank you for reading my report.
 
 
How did we find out?
 
The problem described in the title occurs on our git server.
Each git repository has multiple replicas on our servers to increase read performance, and data is synchronized between these replicas with git push.
One day we found that git push for one repository was significantly slower than usual: it took more than ten seconds just to create a new branch from an existing commit.
 
How to reproduce the problem?
 
git version: 2.36.1
 
# /data/test/repo is a bare git repository which can reproduce the problem
$ cd /data/test/repo
 
# number of refs
$ git show-ref | wc -l
21134
# pack information
$ ls objects/pack/ -hl
total 14G
-r--r--r-- 1 root root  43M Jun 14 04:16 pack-9a7fc024652645a632fb82a4ff26c3ddf4883eed.bitmap
-r--r--r-- 1 root root 169M Jun 14 04:15 pack-9a7fc024652645a632fb82a4ff26c3ddf4883eed.idx
-r--r--r-- 1 root root  14G Jun 14 04:14 pack-9a7fc024652645a632fb82a4ff26c3ddf4883eed.pack
 
# objects information
$ git count-objects -v
count: 0
size: 0
in-pack: 5185141
packs: 1
size-pack: 13938704
prune-packable: 0
garbage: 0
size-garbage: 0
 
# number of commits
$ git rev-list --all |  wc -l
955262
 
$ cp -r /data/test/repo /data/test/replica-1
$ cp -r /data/test/repo /data/test/replica-2
$ cd /data/test/replica-1
 
# create a branch from an existing commit
$ git update-ref refs/heads/b_1 43fa4721c61106583cd552da85da3bd84f0f9929
$ git show-ref | grep 43fa4721c61106583cd552da85da3bd84f0f9929
43fa4721c61106583cd552da85da3bd84f0f9929 refs/heads/b_1
 
# number of commits of the ref
$ git rev-list refs/heads/b_1 |  wc -l
117836
 
# git push with bitmap
$ GIT_TRACE=1 git push file:///data/test/replica-2 refs/heads/b_1
04:19:07.654103 git.c:459               trace: built-in: git push file:///data/test/replica-2 refs/heads/b_1
04:19:07.690006 run-command.c:654       trace: run_command: unset GIT_DIR GIT_IMPLICIT_WORK_TREE GIT_PREFIX; 'git-receive-pack '\''/data/test/replica-2'\'''
04:19:07.694339 git.c:459               trace: built-in: git receive-pack /data/test/replica-2
04:19:07.751814 run-command.c:654       trace: run_command: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
04:19:07.754011 git.c:459               trace: built-in: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
04:19:20.304868 run-command.c:654       trace: run_command: GIT_ALTERNATE_OBJECT_DIRECTORIES=/data/test/replica-2/./objects GIT_OBJECT_DIRECTORY=/data/test/replica-2/./objects/tmp_objdir-incoming-CaCTHm GIT_QUARANTINE_PATH=/data/test/replica-2/./objects/tmp_objdir-incoming-CaCTHm git unpack-objects --pack_header=2,0
remote: 04:19:20.306550 git.c:459               trace: built-in: git unpack-objects --pack_header=2,0
04:19:20.306903 run-command.c:654       trace: run_command: GIT_ALTERNATE_OBJECT_DIRECTORIES=/data/test/replica-2/./objects GIT_OBJECT_DIRECTORY=/data/test/replica-2/./objects/tmp_objdir-incoming-CaCTHm GIT_QUARANTINE_PATH=/data/test/replica-2/./objects/tmp_objdir-incoming-CaCTHm git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
remote: 04:19:20.308332 git.c:459               trace: built-in: git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
remote: 04:19:20.344031 run-command.c:654       trace: run_command: unset GIT_ALTERNATE_OBJECT_DIRECTORIES GIT_DIR GIT_OBJECT_DIRECTORY GIT_PREFIX; git --git-dir=/data/test/replica-2 for-each-ref '--format=%(objectname)'
remote: 04:19:20.346359 git.c:459               trace: built-in: git for-each-ref '--format=%(objectname)'
04:19:20.395511 run-command.c:654       trace: run_command: git gc --auto --quiet
remote: 04:19:20.397949 git.c:459               trace: built-in: git gc --auto --quiet
To file:///data/test/replica-2
* [new branch]                b_1 -> b_1
 
# reset replica-2, then remove the bitmap from replica-1 (the current directory, i.e. the pushing side)
$ rm -rf /data/test/replica-2
$ cp -r /data/test/repo /data/test/replica-2
$ rm objects/pack/pack-9a7fc024652645a632fb82a4ff26c3ddf4883eed.bitmap
 
 
# git push without bitmap
$ GIT_TRACE=1 git push file:///data/test/replica-2 refs/heads/b_1
04:20:44.633590 git.c:459               trace: built-in: git push file:///data/test/replica-2 refs/heads/b_1
04:20:44.668908 run-command.c:654       trace: run_command: unset GIT_DIR GIT_IMPLICIT_WORK_TREE GIT_PREFIX; 'git-receive-pack '\''/data/test/replica-2'\'''
04:20:44.673234 git.c:459               trace: built-in: git receive-pack /data/test/replica-2
04:20:44.720852 run-command.c:654       trace: run_command: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
04:20:44.723100 git.c:459               trace: built-in: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
04:20:44.800298 run-command.c:654       trace: run_command: GIT_ALTERNATE_OBJECT_DIRECTORIES=/data/test/replica-2/./objects GIT_OBJECT_DIRECTORY=/data/test/replica-2/./objects/tmp_objdir-incoming-UOWY1E GIT_QUARANTINE_PATH=/data/test/replica-2/./objects/tmp_objdir-incoming-UOWY1E git unpack-objects --pack_header=2,0
remote: 04:20:44.802056 git.c:459               trace: built-in: git unpack-objects --pack_header=2,0
04:20:44.802474 run-command.c:654       trace: run_command: GIT_ALTERNATE_OBJECT_DIRECTORIES=/data/test/replica-2/./objects GIT_OBJECT_DIRECTORY=/data/test/replica-2/./objects/tmp_objdir-incoming-UOWY1E GIT_QUARANTINE_PATH=/data/test/replica-2/./objects/tmp_objdir-incoming-UOWY1E git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
remote: 04:20:44.803930 git.c:459               trace: built-in: git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
remote: 04:20:44.834388 run-command.c:654       trace: run_command: unset GIT_ALTERNATE_OBJECT_DIRECTORIES GIT_DIR GIT_OBJECT_DIRECTORY GIT_PREFIX; git --git-dir=/data/test/replica-2 for-each-ref '--format=%(objectname)'
remote: 04:20:44.836220 git.c:459               trace: built-in: git for-each-ref '--format=%(objectname)'
04:20:44.884165 run-command.c:654       trace: run_command: git gc --auto --quiet
remote: 04:20:44.886108 git.c:459               trace: built-in: git gc --auto --quiet
To file:///data/test/replica-2
* [new branch]                b_1 -> b_1
 
 
The traces above show that, with the bitmap present, git push waits on the git pack-objects process for about 13 seconds (04:19:07 to 04:19:20).
After I deleted the bitmap, the whole git push completed in less than 1s.
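
For readers who want the exact figures, the time spent in pack-objects can be read off the GIT_TRACE timestamps. The sketch below is only an illustration: `trace_elapsed` is a hypothetical helper name, and it assumes both timestamps fall on the same day. The inputs are the timestamps of the `built-in: git pack-objects` line and of the next `run_command` line in each trace.

```shell
# Hypothetical helper: seconds between two GIT_TRACE timestamps (HH:MM:SS.ffffff).
# Assumes both timestamps are on the same day.
trace_elapsed() {
    awk -v start="$1" -v end="$2" 'BEGIN {
        split(start, s, ":"); split(end, e, ":")
        printf "%.3f\n", (e[1]*3600 + e[2]*60 + e[3]) - (s[1]*3600 + s[2]*60 + s[3])
    }'
}

# with bitmap: pack-objects start -> next traced command
trace_elapsed 04:19:07.754011 04:19:20.304868   # prints 12.551

# without bitmap
trace_elapsed 04:20:44.723100 04:20:44.800298   # prints 0.077
```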
 
During testing, we found that not every git repository is significantly affected by the bitmap.
The impact may depend on the number of objects in the repository, the number of refs, and the commit that the pushed branch points to.
 
We benefit from the bitmap performance optimizations for git fetch and clone, but they seem to hurt the performance of git push.
 
Maybe bitmaps could be disabled during git push?
As far as I know, the number of objects counted during a git push is usually small relative to the entire repository.
In that case, counting objects by building bitmaps in memory may take more time than counting them without bitmaps.
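
One existing knob that might approximate this is the `pack.useBitmaps` configuration variable, which controls whether pack-objects uses bitmaps when packing to stdout. The sketch below is untested against the repository in this report; `push_without_bitmaps` is a hypothetical wrapper name. It turns the setting off for a single push only, so fetch and clone served from the same repository keep using the bitmap.

```shell
# Hypothetical wrapper: run one push with pack bitmaps disabled.
# `-c` applies the setting to this git invocation and the git
# processes it starts locally; the repository config is not changed.
push_without_bitmaps() {
    git -c pack.useBitmaps=false push "$@"
}

# Usage with the paths from the reproduction above:
# push_without_bitmaps file:///data/test/replica-2 refs/heads/b_1
```

Whether this removes the delay seen above would need to be confirmed by repeating the GIT_TRACE=1 measurement.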
 
Of course, if anyone has a better solution, I would be glad to hear it.
 
Regards,
Kyle
    
