git@vger.kernel.org mailing list mirror (one of many)
* reachability-bitmap makes push performance worse ?
@ 2022-06-14  7:59 kylezhao(赵柯宇)
  2022-06-14  8:55 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 4+ messages in thread
From: kylezhao(赵柯宇) @ 2022-06-14  7:59 UTC (permalink / raw)
  To: git@vger.kernel.org


Hi All, 
 
Thank you for reading my report.
 
 
How did we find out?
 
The problem described in the title occurs on our git server.
Each git repository has multiple replicas on our servers to increase git read performance, and the replicas are synchronized with git push.
One day we found that git push for one repository was very slow: it took more than ten seconds just to create a new branch from an existing commit.
 
How to reproduce the problem?
 
git version: 2.36.1
 
# /data/test/repo is a bare git repository which can reproduce the problem
$ cd /data/test/repo
 
# number of refs
$ git show-ref | wc -l
21134
# pack information
$ ls objects/pack/ -hl
total 14G
-r--r--r-- 1 root root  43M Jun 14 04:16 pack-9a7fc024652645a632fb82a4ff26c3ddf4883eed.bitmap
-r--r--r-- 1 root root 169M Jun 14 04:15 pack-9a7fc024652645a632fb82a4ff26c3ddf4883eed.idx
-r--r--r-- 1 root root  14G Jun 14 04:14 pack-9a7fc024652645a632fb82a4ff26c3ddf4883eed.pack
 
# objects information
$ git count-objects -v
count: 0
size: 0
in-pack: 5185141
packs: 1
size-pack: 13938704
prune-packable: 0
garbage: 0
size-garbage: 0
 
# number of commits
$ git rev-list --all |  wc -l
955262
 
$ cp -r /data/test/repo /data/test/replica-1
$ cp -r /data/test/repo /data/test/replica-2
$ cd /data/test/replica-1
 
# create a branch from an existing commit
$ git update-ref refs/heads/b_1 43fa4721c61106583cd552da85da3bd84f0f9929
$ git show-ref | grep 43fa4721c61106583cd552da85da3bd84f0f9929
43fa4721c61106583cd552da85da3bd84f0f9929 refs/heads/b_1
 
# number of commits of the ref
$ git rev-list refs/heads/b_1 |  wc -l
117836
 
# git push with bitmap
$ GIT_TRACE=1 git push file:///data/test/replica-2 refs/heads/b_1
04:19:07.654103 git.c:459               trace: built-in: git push file:///data/test/replica-2 refs/heads/b_1
04:19:07.690006 run-command.c:654       trace: run_command: unset GIT_DIR GIT_IMPLICIT_WORK_TREE GIT_PREFIX; 'git-receive-pack '\''/data/test/replica-2'\'''
04:19:07.694339 git.c:459               trace: built-in: git receive-pack /data/test/replica-2
04:19:07.751814 run-command.c:654       trace: run_command: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
04:19:07.754011 git.c:459               trace: built-in: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
04:19:20.304868 run-command.c:654       trace: run_command: GIT_ALTERNATE_OBJECT_DIRECTORIES=/data/test/replica-2/./objects GIT_OBJECT_DIRECTORY=/data/test/replica-2/./objects/tmp_objdir-incoming-CaCTHm GIT_QUARANTINE_PATH
=/data/test/replica-2/./objects/tmp_objdir-incoming-CaCTHm git unpack-objects --pack_header=2,0
remote: 04:19:20.306550 git.c:459               trace: built-in: git unpack-objects --pack_header=2,0
04:19:20.306903 run-command.c:654       trace: run_command: GIT_ALTERNATE_OBJECT_DIRECTORIES=/data/test/replica-2/./objects GIT_OBJECT_DIRECTORY=/data/test/replica-2/./objects/tmp_objdir-incoming-CaCTHm GIT_QUARANTINE_PATH
=/data/test/replica-2/./objects/tmp_objdir-incoming-CaCTHm git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
remote: 04:19:20.308332 git.c:459               trace: built-in: git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
remote: 04:19:20.344031 run-command.c:654       trace: run_command: unset GIT_ALTERNATE_OBJECT_DIRECTORIES GIT_DIR GIT_OBJECT_DIRECTORY GIT_PREFIX; git --git-dir=/data/test/replica-2 for-each-ref '--format=%(objectname)'
remote: 04:19:20.346359 git.c:459               trace: built-in: git for-each-ref '--format=%(objectname)'
04:19:20.395511 run-command.c:654       trace: run_command: git gc --auto --quiet
remote: 04:19:20.397949 git.c:459               trace: built-in: git gc --auto --quiet
To file:///data/test/replica-2
* [new branch]                b_1 -> b_1
 
# reset replica-2 and remove the bitmap from replica-1 (the current directory)
$ rm -rf /data/test/replica-2
$ cp -r /data/test/repo /data/test/replica-2
$ rm objects/pack/pack-9a7fc024652645a632fb82a4ff26c3ddf4883eed.bitmap
 
 
# git push without bitmap
$ GIT_TRACE=1 git push file:///data/test/replica-2 refs/heads/b_1
04:20:44.633590 git.c:459               trace: built-in: git push file:///data/test/replica-2 refs/heads/b_1
04:20:44.668908 run-command.c:654       trace: run_command: unset GIT_DIR GIT_IMPLICIT_WORK_TREE GIT_PREFIX; 'git-receive-pack '\''/data/test/replica-2'\'''
04:20:44.673234 git.c:459               trace: built-in: git receive-pack /data/test/replica-2
04:20:44.720852 run-command.c:654       trace: run_command: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
04:20:44.723100 git.c:459               trace: built-in: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
04:20:44.800298 run-command.c:654       trace: run_command: GIT_ALTERNATE_OBJECT_DIRECTORIES=/data/test/replica-2/./objects GIT_OBJECT_DIRECTORY=/data/test/replica-2/./objects/tmp_objdir-incoming-UOWY1E GIT_QUARANTINE_PATH
=/data/test/replica-2/./objects/tmp_objdir-incoming-UOWY1E git unpack-objects --pack_header=2,0
remote: 04:20:44.802056 git.c:459               trace: built-in: git unpack-objects --pack_header=2,0
04:20:44.802474 run-command.c:654       trace: run_command: GIT_ALTERNATE_OBJECT_DIRECTORIES=/data/test/replica-2/./objects GIT_OBJECT_DIRECTORY=/data/test/replica-2/./objects/tmp_objdir-incoming-UOWY1E GIT_QUARANTINE_PATH
=/data/test/replica-2/./objects/tmp_objdir-incoming-UOWY1E git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
remote: 04:20:44.803930 git.c:459               trace: built-in: git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
remote: 04:20:44.834388 run-command.c:654       trace: run_command: unset GIT_ALTERNATE_OBJECT_DIRECTORIES GIT_DIR GIT_OBJECT_DIRECTORY GIT_PREFIX; git --git-dir=/data/test/replica-2 for-each-ref '--format=%(objectname)'
remote: 04:20:44.836220 git.c:459               trace: built-in: git for-each-ref '--format=%(objectname)'
04:20:44.884165 run-command.c:654       trace: run_command: git gc --auto --quiet
remote: 04:20:44.886108 git.c:459               trace: built-in: git gc --auto --quiet
To file:///data/test/replica-2
* [new branch]                b_1 -> b_1
 
 
The traces above show that git push is stuck in the git pack-objects process for about 13s.
After I deleted the bitmap, the whole git push completed in less than 1s.
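
As a rough cross-check of those figures, the pack-objects phase can be timed from the trace timestamps above (from the pack-objects line to the next traced command). This is a minimal sketch assuming only POSIX sh and awk; trace_delta is a hypothetical helper name, not a git command.

```shell
# Print the elapsed seconds between two GIT_TRACE timestamps (HH:MM:SS.micro).
# trace_delta is a made-up helper for illustration; it does not handle
# timestamps that wrap past midnight.
trace_delta() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, ":"); split(b, y, ":")
        printf "%.6f\n", (y[1]*3600 + y[2]*60 + y[3]) - (x[1]*3600 + x[2]*60 + x[3])
    }'
}

trace_delta 04:19:07.754011 04:19:20.304868   # push with bitmap
trace_delta 04:20:44.723100 04:20:44.800298   # push without bitmap
```

The first call covers the run with the bitmap and the second the run without it, so the two results can be compared directly.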
 
During testing, we found that not every git repository was significantly affected by bitmap. 
This may be related to the number of objects in the git repository itself, the number of refs, and the sha1 pointed to by the pushed branch.
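
To compare repositories on the factors listed above, the same commands used in the reproduction can be wrapped in a small helper. This is only a sketch: repo_stats is a hypothetical name, and counting *.bitmap files under objects/pack is an assumption about how to detect a bitmap, not something git reports directly.

```shell
# Summarize ref count, commit count and number of bitmap files for one repo.
# repo_stats is a made-up helper name for illustration.
repo_stats() {
    repo=$1
    gitdir=$(git -C "$repo" rev-parse --absolute-git-dir) || return 1
    refs=$(git -C "$repo" show-ref | wc -l)
    commits=$(git -C "$repo" rev-list --all --count)
    bitmaps=$(find "$gitdir/objects/pack" -name '*.bitmap' 2>/dev/null | wc -l)
    # $((...)) strips the padding some wc implementations add.
    echo "refs=$((refs)) commits=$((commits)) bitmaps=$((bitmaps))"
}

# usage: repo_stats /data/test/repo
```

Running it on an affected and an unaffected repository gives a quick side-by-side view of the numbers that may matter here.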
 
We benefit from the bitmap performance optimizations for git fetch and clone, but they seem to hurt the performance of git push.

Maybe we can disable bitmaps during git push?
As far as I know, the number of objects counted during a git push is usually small relative to the entire repository.
Counting them by building bitmaps in memory may take more time than counting them without bitmaps.
 
Of course, it would be better if anyone has a better solution.
 
Regards,
Kyle
    


* Re: reachability-bitmap makes push performance worse ?
  2022-06-14  7:59 reachability-bitmap makes push performance worse ? kylezhao(赵柯宇)
@ 2022-06-14  8:55 ` Ævar Arnfjörð Bjarmason
  2022-06-14 11:00   ` [Internet]Re: " kylezhao(赵柯宇)
  0 siblings, 1 reply; 4+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-14  8:55 UTC (permalink / raw)
  To: kylezhao(赵柯宇); +Cc: git@vger.kernel.org


This is a known issue, I think you've found the same problem discussed
in these past threads:

https://lore.kernel.org/git/38b99459158a45b1bea09037f3dd092d@exmbdft7.ad.twosigma.com/
https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/

The latter one in particular has a lot of extra details. The former also
has the suggestion of a per-push bitmap configuration as a workaround.
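
One way to get a per-push setting with stock git is to pass the existing pack.useBitmaps option on the command line for a single push. That option is documented as controlling bitmap use when packing to stdout, which is what the pack-objects call in the trace does (--stdout). The following is a minimal sketch, not a verified fix for the repository in the report; the temporary paths and the demo identity are made up so that it runs anywhere.

```shell
# Throwaway repositories so the sketch is runnable anywhere; on the setup
# from the report only the push line would be needed.
tmp=$(mktemp -d)
git init -q --bare "$tmp/replica-2"
git init -q "$tmp/replica-1"
git -C "$tmp/replica-1" symbolic-ref HEAD refs/heads/b_1
git -C "$tmp/replica-1" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m init

# -c applies pack.useBitmaps=false to this invocation only; the repository
# configuration is left unchanged.
git -C "$tmp/replica-1" -c pack.useBitmaps=false \
    push -q "file://$tmp/replica-2" refs/heads/b_1

# Show the commit that arrived on the receiving side.
git --git-dir="$tmp/replica-2" rev-parse --verify -q refs/heads/b_1
```

With the paths from the report, the equivalent push would be `git -c pack.useBitmaps=false push file:///data/test/replica-2 refs/heads/b_1`. Because nothing is written to the config file, fetches and clones served from the same repository keep their bitmap behaviour.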

As your numbers show it's still an issue today, but those threads should
help you if you're looking to dig further into the root cause.

Aside from the underlying root causes it would be very nice to fix the
progress code in that area, i.e. we "stall" on "Enumerating objects",
which is just a matter of us not having a separate progress bar for the
very expensive bitmap work we're doing.


* RE: [Internet]Re: reachability-bitmap makes push performance worse ?
  2022-06-14  8:55 ` Ævar Arnfjörð Bjarmason
@ 2022-06-14 11:00   ` kylezhao(赵柯宇)
  2022-06-14 14:22     ` Derrick Stolee
  0 siblings, 1 reply; 4+ messages in thread
From: kylezhao(赵柯宇) @ 2022-06-14 11:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git@vger.kernel.org

> This is a known issue, I think you've found the same problem discussed in these past threads:
> 
> https://lore.kernel.org/git/38b99459158a45b1bea09037f3dd092d@exmbdft7.ad.twosigma.com/
> https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/

Thanks.

> The latter one in particular has a lot of extra details. The former also has the suggestion of a per-push bitmap configuration as a workaround.
>
> As your numbers show it's still an issue today, but those threads should help you if you're looking to dig further into the root cause.
> 
> Aside from the underlying root causes it would be very nice to fix the progress code in that area, i.e. we "stall" on "Enumerating objects", which is just a matter of us not having a separate progress bar for the very expensive bitmap work we're doing.

It looks like optimizing the bitmap to solve the problem will be a long process. This requires developers to have a deep understanding of the algorithm.

A per-push bitmap configuration as a workaround can't completely solve the problem, but it works for me. 
After all, bitmaps were not designed to optimize git push. Most of the time, git push is not called as frequently as git fetch.

The problem has been around for 3 years. Has the community considered providing a config like "push.useBitmap" to prevent git push from using bitmaps?
It would be appreciated if there were such a config, which could quickly solve my problem and doesn't seem like a lot of work.

If no one is interested in it, I can also try to submit a patch (although it may be a bit slow since I am new to the git community).



* Re: [Internet]Re: reachability-bitmap makes push performance worse ?
  2022-06-14 11:00   ` [Internet]Re: " kylezhao(赵柯宇)
@ 2022-06-14 14:22     ` Derrick Stolee
  0 siblings, 0 replies; 4+ messages in thread
From: Derrick Stolee @ 2022-06-14 14:22 UTC (permalink / raw)
  To: kylezhao(赵柯宇),
	Ævar Arnfjörð Bjarmason
  Cc: git@vger.kernel.org

On 6/14/2022 7:00 AM, kylezhao(赵柯宇) wrote:
>> This is a known issue, I think you've found the same problem discussed in these past threads:
>>
>> https://lore.kernel.org/git/38b99459158a45b1bea09037f3dd092d@exmbdft7.ad.twosigma.com/
>> https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/
> 
> Thanks.
> 
>> The latter one in particular has a lot of extra details. The former also
>> has the suggestion of a per-push bitmap configuration as a workaround.
>>
>> As your numbers show it's still an issue today, but those threads should
>> help you if you're looking to dig further into the root cause.
>>
>> Aside from the underlying root causes it would be very nice to fix the
>> progress code in that area, i.e. we "stall" on "Enumerating objects",
>> which is just a matter of us not having a separate progress bar for the
>> very expensive bitmap work we're doing.
> 
> It looks like optimizing the bitmap to solve the problem will be a long
> process. This requires developers to have a deep understanding of the
> algorithm.
> 
> A per-push bitmap configuration as a workaround can't completely solve the
> problem, but it works for me. After all, bitmap was not designed to optimize
> git push. Most of the time, git push is not called as frequently as git fetch.

I think the issue is that bitmaps are designed to support servers, which don't
exactly use "git push" but instead use "git upload-pack" with a very different
type of data (a lot of branches simultaneously from a large variety of bases).

Clients that use "git push" don't generally have bitmaps, so this
has not been a priority. For clients, it is faster to do a more focused object
walk. See these commits and blog post for more details:

* d5d2e93577 (revision: implement sparse algorithm, 2019-01-16)
* 3d036eb0d2 (pack-objects: create pack.useSparse setting, 2019-01-16)
* https://devblogs.microsoft.com/devops/exploring-new-frontiers-for-git-push-performance/

Hopefully that gives enough context as to why one would want to disable bitmaps
for most "git push" operations.
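
For reference, the pack.useSparse setting named in the commits above is an ordinary boolean configuration value. A minimal sketch of setting it in a client repository and reading it back follows; the temporary path is illustrative only, and whether the setting helps in a given repository is not something this sketch measures.

```shell
# Throwaway client repository; the path is made up for the example.
tmp=$(mktemp -d)
git init -q "$tmp/client"

# pack.useSparse is the setting created by 3d036eb0d2 (see the list above).
git -C "$tmp/client" config pack.useSparse true

# Read the value back, normalized as a boolean.
git -C "$tmp/client" config --bool pack.useSparse
```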

> The problem has been around for 3 years. Has the community considered providing
> a config like "push.useBitmap" to prevent git push using bitmap? It would be
> appreciated if there is such a config, which can quickly solve my problem and
> doesn't seem like a lot of work.

I think this config would be a good idea, and I would even argue that we might
want to set it to "false" by default.

> If no one is interested in it, I can also try to submit a patch (although it
> may be a bit slow since I am new to the git community).

We would welcome the contribution! I look forward to seeing it when you're
ready.

Thanks,
-Stolee

