git@vger.kernel.org list mirror (unofficial, one of many)
* why git is so slow for a tiny git push?
       [not found] ` <5a6f3e8f29f74c93bf3af5da636df973@xiaomi.com>
@ 2021-10-09 18:05   ` 程洋
  2021-10-11 16:53     ` Jeff King
  2021-10-28 13:17     ` Han-Wen Nienhuys
  0 siblings, 2 replies; 16+ messages in thread
From: 程洋 @ 2021-10-09 18:05 UTC (permalink / raw)
  To: git

I have a very large repository with about 9 million objects and roughly 300k refs.
I noticed that git push is very slow even for a tiny change. An example is shown below.

Packing 3 objects that total only 7 KB takes 36 seconds in pack-objects (that is the time after I enabled pack.useSparse).
However, if I manually call “pack-objects” with exactly the same object SHA-1s, it takes less than 0.005 seconds.
What is actually passed to “pack-objects” when I call “git push”?

I read an article that says git enumerates all "uninteresting objects" to determine what to send, but I don't understand why. In my case git should only need to enumerate the objects in "1a2d494b1b71469eebbd42aeabe1736bfa4b51fa..ddf3b84dca1aa4fe209a218380df1482af0d6b48". It's insane. I have a master server and a slave server that serve this repository to my users, and a cron job that pushes every change from the master to the slave. I found that the master server's CPU is saturated all the time because of these push jobs.

Is there any solution?


Here is my full output
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
work@c5-miui-miuigit-slave15:~/repositories/miui/platform/frameworks/base2/base$  GIT_TRACE_PERFORMANCE=1 GIT_TRACE_PACKET=1 git push git://10.172.32.31/miui/platform/frameworks/base.git bsp-qcom-s:bsp-qcom-s
.....
.....
.....
.....
01:53:00.595910 pkt-line.c:80           packet:         push< 2d5eb3d37e2aa6659920bda688310584c70a37d0 refs/remotes/origin/v9-l-ido-stable2x
01:53:00.595917 pkt-line.c:80           packet:         push< 120ebf9b593c8f914981506b8390dd317fb8c9f1 refs/remotes/origin/c
01:53:00.595954 pkt-line.c:80           packet:         push< 90403256e6775476bd8dee31f28b31af9f7eac89 refs/remotes/origin/v9-o-d1s-dev-backup
01:53:00.595962 pkt-line.c:80           packet:         push< 9a78ddd5a4d6759585d29549466138291b665d35 refs/remotes/origin/v9-o-dipper-dev
01:53:00.595969 pkt-line.c:80           packet:         push< 343dfeb613e411e0ede4ec6327e2b171e39ff523 refs/remotes/origin/v9-o-nitrogen-dev
01:53:00.595976 pkt-line.c:80           packet:         push< 35e23e916c2dccd8603a6fe485ee56d688003ff5 refs/remotes/origin/v9-o-sagit-dev
01:53:00.595984 pkt-line.c:80           packet:         push< dc62ab5c28f7f6154d41d138301fe7f555db160b refs/remotes/origin/v9-o-sakura-dev
01:53:00.595991 pkt-line.c:80           packet:         push< 00b4763e8219de6e8a76b9ce24ecb3460030783f refs/remotes/origin/v9-o-scorpio-dev
01:53:00.595998 pkt-line.c:80           packet:         push< 2f7b3a268712ba187bb1d399c698cb1836c5d47a refs/remotes/origin/v9-o-sirius-dev
01:53:00.596005 pkt-line.c:80           packet:         push< e14f29817e4128463ceeccc531f313ae1b138780 refs/remotes/origin/v9-o-whyred-dev
01:53:00.596012 pkt-line.c:80           packet:         push< c7620c21cf3aade7666c448471306c1264032c5d refs/remotes/origin/v9-o-ysl-dev
01:53:00.596022 pkt-line.c:80           packet:         push< c1cbe43f1bc50dcd422ef354e694ea45ca8aa797 refs/remotes/origin/wt-p-laurus-native
01:53:00.596029 pkt-line.c:80           packet:         push< 3c736c7b701b9023f08c3641bcc77041e35d0eca refs/remotes/origin/wt-q-laurus-miui
01:53:00.596036 pkt-line.c:80           packet:         push< 3dc0351efe79817f5c8bf23564e3c483ff059833 refs/remotes/origin/wt-q-laurus-native
01:53:00.596044 pkt-line.c:80           packet:         push< 58ac901f9302e7f52325f4ce905cc2cbdfa310ca refs/remotes/origin/wt-r-evergreen
01:53:00.596053 pkt-line.c:80           packet:         push< 0000
01:53:00.601910 pkt-line.c:80           packet:         push> 1a2d494b1b71469eebbd42aeabe1736bfa4b51fa ddf3b84dca1aa4fe209a218380df1482af0d6b48 refs/heads/bsp-qcom-s\0 report-status-v2 side-band-64k object-format=sha1 agent=git/2.32.0
01:53:00.601959 pkt-line.c:80           packet:         push> 0000
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 40 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 7.24 KiB | 7.24 MiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
01:53:19.256506 trace.c:487             performance: 18.653337755 s: git command: /usr/libexec/git-core/git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress
01:53:22.298695 pkt-line.c:80           packet:     sideband< \1000eunpack ok001dok refs/heads/bsp-qcom-s0000
01:53:22.298735 pkt-line.c:80           packet:     sideband< 0000
01:53:22.298745 pkt-line.c:80           packet:         push< unpack ok
01:53:22.298770 pkt-line.c:80           packet:         push< ok refs/heads/bsp-qcom-s
01:53:22.298779 pkt-line.c:80           packet:         push< 0000
To git://10.172.32.31/miui/platform/frameworks/base.git
   1a2d494b1b71..ddf3b84dca1a  bsp-qcom-s -> bsp-qcom-s
01:53:22.316441 trace.c:487             performance: 22.883688573 s: git command: git push git://10.172.32.31/miui/platform/frameworks/base.git bsp-qcom-s:bsp-qcom-s
work@c5-miui-miuigit-slave15:~/repositories/miui/platform/frameworks/base2/base$ echo 1a2d494b1b71469eebbd42aeabe1736bfa4b51fa..ddf3b84dca1aa4fe209a218380df1482af0d6b48 > 1
work@c5-miui-miuigit-slave15:~/repositories/miui/platform/frameworks/base2/base$ time /usr/libexec/git-core/git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress < 1
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 40 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 292 bytes | 292.00 KiB/s, done.

Total 3 (delta 2), reused 0 (delta 0), pack-reused 0

real    0m0.005s
user    0m0.000s
sys     0m0.004s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#/******本邮件及其附件含有小米公司的保密信息,仅限于发送给上面地址中列出的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件! This e-mail and its attachments contain confidential information from XIAOMI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!******/#


* Re: why git is so slow for a tiny git push?
  2021-10-09 18:05   ` why git is so slow for a tiny git push? 程洋
@ 2021-10-11 16:53     ` Jeff King
  2021-10-12  8:04       ` [External Mail]Re: " 程洋
  2021-10-28 13:17     ` Han-Wen Nienhuys
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff King @ 2021-10-11 16:53 UTC (permalink / raw)
  To: 程洋; +Cc: git

On Sat, Oct 09, 2021 at 06:05:56PM +0000, 程洋 wrote:

> I have a really big repository which has 9m objects and maybe 300k refs.
> I noticed that git push is really slow for a tiny change. An example shows below
> 
> 3 objects which is only 7 kb takes 36 seconds to pack-objects (it's the time after i enable pack.usesparse)
> However if I manually call “pack-objects” with the exactly same objects SHA1. It only take less than 0.005 second
> What is really pass to “pack-objects” when I call “git push”?

Do you have an objects/pack/pack-*.bitmap file on the sending side?

The bitmap code is eager to produce an exact set difference between what
is being sent and what the other side has. If you have incomplete bitmap
coverage (which is almost a certainty if you have 300k refs), it may do
a lot of traversal filling in the "what the other side has" part of the
bitmap, even though it does not end up helping the final result in this
case.

Bitmaps are enabled by default on bare repos since Git v2.22.0. You can
override this with:

  git config repack.writeBitmaps false
  git gc

(or if you don't want to do the gc, you can safely remove the '.bitmap'
file).

I notice you used GIT_TRACE_PERFORMANCE below. Try GIT_TRACE2_PERF
instead, which goes into detail within particular processes. If this is
related to bitmaps, I'd expect the time to go to the "enumerate-objects"
region.

-Peff


* RE: [External Mail]Re: why git is so slow for a tiny git push?
  2021-10-11 16:53     ` Jeff King
@ 2021-10-12  8:04       ` 程洋
  2021-10-12  8:39         ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: 程洋 @ 2021-10-12  8:04 UTC (permalink / raw)
  To: Jeff King; +Cc: git

I do have a bitmap, because my master server also serves as a download server.
However, I'm using git 2.17.0, and I didn't set repack.writeBitmaps.

I also tried GIT_TRACE2_PERF, and "enumerating objects" is indeed where most of the time goes.
But why can bitmaps make a push slow? Do you mean that if writeBitmaps is true, every push regenerates the bitmap file? If that's what you mean, it doesn't match what I see: the only bitmap file in my repo has not changed over time (its modification time is one month ago, long before I ran the experiment).

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I came up with an idea, since I found that if I reduce the number of refs, the push is much faster than before.
So I used receive.hideRefs to hide most of the refs, and I commented out "reject_updates_to_hidden" in receive-pack.c (because I need to update those hidden refs, or add new refs).


* Re: [External Mail]Re: why git is so slow for a tiny git push?
  2021-10-12  8:04       ` [External Mail]Re: " 程洋
@ 2021-10-12  8:39         ` Jeff King
  2021-10-12  9:08           ` 程洋
  2021-10-12 10:06           ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 16+ messages in thread
From: Jeff King @ 2021-10-12  8:39 UTC (permalink / raw)
  To: 程洋; +Cc: git

On Tue, Oct 12, 2021 at 08:04:44AM +0000, 程洋 wrote:

> I have bitmap indeed because my master server also serves as download server.
> However I'm using git 2.17.0, and I didn't set repack.writeBitmaps

On that version and without the config, then perhaps you (or somebody)
passed "-b" to git-repack.

> But why bitmaps can cause push to be slow? Do you mean that if
> writeBitmaps is true, every push will regenerate bitmap file? If
> that's what you mean, what I see is the only bitmap file in my repo
> didn't change across time (the modify time is one month ago, long
> before I run the experiment)

No, it is not regenerating the on-disk bitmaps. But when deciding the
set of objects to send, pack-objects will generate an internal bitmap
which is the set difference of objects reachable from the pushed refs,
minus objects reachable from the refs the other side told us
they had.

It uses the on-disk bitmaps as much as possible, but there may be
commits not covered by bitmaps (either because they were pushed since
the last repack which built bitmaps, or simply because it's too
expensive to put a bitmap on every commit, so we sprinkle them
throughout the commit history). In those cases we have to traverse parts
of the object graph by walking commits and opening up trees. This can be
expensive, and is where your time is going.
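
A toy Python sketch of the set difference described above may make the cost easier to see. It is not Git's code: the commit graph, the `reachable_objects` helper, and the object names are all hypothetical, and a real bitmap is a bit array rather than a Python set. In this made-up setup the other side advertises one ref covered by an on-disk bitmap (`m9`) and one stale ref at the end of a long side branch with no bitmap (`s999`), and the push adds a single commit (`m10`).

```python
def reachable_objects(tips, graph, ondisk_bitmaps):
    """Return (objects reachable from tips, number of commits walked by hand)."""
    result = set()
    walked = 0
    # Phase 1: tips that already have an on-disk bitmap are expanded for free.
    pending = []
    for tip in tips:
        if tip in ondisk_bitmaps:
            result |= ondisk_bitmaps[tip]
        else:
            pending.append(tip)
    # Phase 2: everything else has to be found by walking commits and trees.
    while pending:
        commit = pending.pop()
        if commit in result:            # already covered, stop here
            continue
        if commit in ondisk_bitmaps:    # reached a bitmapped commit, stop here
            result |= ondisk_bitmaps[commit]
            continue
        parents, objects = graph[commit]
        result |= objects
        walked += 1
        pending.extend(parents)
    return result, walked


def objs(name):
    # Hypothetical: each commit contributes itself plus one tree object.
    return {name, "tree-" + name}


graph = {"m0": ([], objs("m0")), "s0": (["m0"], objs("s0"))}
for i in range(1, 11):                  # main line m0..m10
    graph[f"m{i}"] = ([f"m{i-1}"], objs(f"m{i}"))
for i in range(1, 1000):                # long side branch s0..s999
    graph[f"s{i}"] = ([f"s{i-1}"], objs(f"s{i}"))

# The only on-disk bitmap sits on m9 and covers m0..m9.
ondisk_bitmaps = {"m9": set().union(*(objs(f"m{i}") for i in range(10)))}

want, walked_want = reachable_objects(["m10"], graph, ondisk_bitmaps)
have, walked_have = reachable_objects(["m9", "s999"], graph, ondisk_bitmaps)
to_send = want - have

print(sorted(to_send), walked_want, walked_have)
# prints: ['m10', 'tree-m10'] 1 1000
```

Only two objects end up in the result, yet filling in the "have" side walks 1000 commits that contribute nothing to it. With hundreds of thousands of advertised refs and partial bitmap coverage, that wasted walk is the kind of cost described above.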

Reachability bitmaps _usually_ make things faster, but they have some
cases where they make things worse (especially if you have a ton of
refs, or haven't repacked recently).

If bitmaps are causing a problem for your push, they are likely to be
causing problems for fetches, too. But if you want to keep them to serve
fetches, but not use them for push, you should be able to do:

  git -c pack.usebitmaps=false push

-Peff


* RE: [External Mail]Re: why git is so slow for a tiny git push?
  2021-10-12  8:39         ` Jeff King
@ 2021-10-12  9:08           ` 程洋
  2021-10-12 21:39             ` Jeff King
  2021-10-12 10:06           ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 16+ messages in thread
From: 程洋 @ 2021-10-12  9:08 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Oh my god.
Jesus. It works for me. After disabling writebitmap, the time cost decreased from 33 seconds to 0.9 seconds.

But now it turns out that the remote side takes 13 seconds to receive the pack. Since git receive-pack is started automatically on the remote side, is there any way to enable GIT_TRACE2_PERF on the server side?


* Re: [External Mail]Re: why git is so slow for a tiny git push?
  2021-10-12  8:39         ` Jeff King
  2021-10-12  9:08           ` 程洋
@ 2021-10-12 10:06           ` Ævar Arnfjörð Bjarmason
  2021-10-12 21:46             ` Jeff King
  1 sibling, 1 reply; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-12 10:06 UTC (permalink / raw)
  To: Jeff King; +Cc: 程洋, git


On Tue, Oct 12 2021, Jeff King wrote:

> On Tue, Oct 12, 2021 at 08:04:44AM +0000, 程洋 wrote:
>
>> I have bitmap indeed because my master server also serves as download server.
>> However I'm using git 2.17.0, and I didn't set repack.writeBitmaps
>
> On that version and without the config, then perhaps you (or somebody)
> passed "-b" to git-repack.
>
>> But why bitmaps can cause push to be slow? Do you mean that if
>> writeBitmaps is true, every push will regenerate bitmap file? If
>> that's what you mean, what I see is the only bitmap file in my repo
>> didn't change across time (the modify time is one month ago, long
>> before I run the experiment)
>
> No, it is not regenerating the on-disk bitmaps. But when deciding the
> set of objects to send, pack-objects will generate an internal bitmap
> which is the set difference of objects reachable from the pushed refs,
> minus objects reachable from the refs the other side told us
> they had.
>
> It uses the on-disk bitmaps as much as possible, but there may be
> commits not covered by bitmaps (either because they were pushed since
> the last repack which built bitmaps, or simply because it's too
> expensive to put a bitmap on every commit, so we sprinkle them
> throughout the commit history). In those cases we have to traverse parts
> of the object graph by walking commits and opening up trees. This can be
> expensive, and is where your time is going.
>
> Reachability bitmaps _usually_ make things faster, but they have some
> cases where they make things worse (especially if you have a ton of
> refs, or haven't repacked recently).
>
> If bitmaps are causing a problem for your push, they are likely to be
> causing problems for fetches, too. But if you want to keep them to serve
> fetches, but not use them for push, you should be able to do:
>
>   git -c pack.usebitmaps=false push

For the last on-list discussion of (probably the same) problem, which in
turn references an even earlier one, see:
https://lore.kernel.org/git/878s6nfq54.fsf@evledraar.gmail.com/

I don't remember if my own report from mid-2019 said so or contradicts
this (and I didn't re-read the thread), but FWIW I *vaguely* recall that
the case I ran into *might* have had to do with a user running into this
on a shared "staging" server.

I.e. one where users logged in as their own user, cd'd to a shared git
repo they got a lock on, and ran fetch/push/deploy commands. One user
had a pack.useBitmaps=true or equivalent in their config (or had
manually run such a repack), so there were some very old stale bitmaps
around.

This was also a setup with a gc.bigPackThreshold configured, which now
that I think about it might have made it much worse, i.e. we'd keep that
*.bitmap on the "big pack" around forever.

But more generally with these side-indexes it seems to me that the code
involved might not be considering these sorts of edge cases, i.e. my
understanding from you above is that if we have bitmaps anywhere we'll
try to in-memory use them for all the objects in play? Or that otherwise
having "partial" bitmaps leads to pathological behavior.

tl;dr: We can't assume that a config of "I like to write side-index
A/B/C/... when I repack" means that the repo is in that state *now*, but
it seems that we do.


* Re: [External Mail]Re: why git is so slow for a tiny git push?
  2021-10-12  9:08           ` 程洋
@ 2021-10-12 21:39             ` Jeff King
  2021-10-14  6:47               ` 程洋
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2021-10-12 21:39 UTC (permalink / raw)
  To: 程洋; +Cc: git

On Tue, Oct 12, 2021 at 09:08:08AM +0000, 程洋 wrote:

> Jesus. It works for me. After disable writebitmap, time cost decrease
> from 33 seconds to 0.9 seconds.
> 
> But now it turns out that, remote side takes 13 seconds to receive the
> pack,  since git receive-pack is triggered automatically from remote
> side, is there anyway to enable GIT_TRACE2_PERF on server side?

For the environment variable, it depends on your protocol. If you can
push over ssh (and the other side lets you execute arbitrary commands),
then:

  git push --receive-pack='GIT_TRACE2_PERF=/tmp/foo.trace git-receive-pack'

Otherwise, you can look at setting the trace2.perfTarget config option
on the server side. I haven't played with it myself before.

-Peff


* Re: [External Mail]Re: why git is so slow for a tiny git push?
  2021-10-12 10:06           ` Ævar Arnfjörð Bjarmason
@ 2021-10-12 21:46             ` Jeff King
  2021-11-23  6:42               ` 程洋
  2021-11-24  8:07               ` 程洋
  0 siblings, 2 replies; 16+ messages in thread
From: Jeff King @ 2021-10-12 21:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: 程洋, git

On Tue, Oct 12, 2021 at 12:06:04PM +0200, Ævar Arnfjörð Bjarmason wrote:

> But more generally with these side-indexes it seems to me that the code
> involved might not be considering these sorts of edge cases, i.e. my
> understanding from you above is that if we have bitmaps anywhere we'll
> try to in-memory use them for all the objects in play? Or that otherwise
> having "partial" bitmaps leads to pathological behavior.

Sure, if there was an easy way to know beforehand whether the bitmap was
going to help or run into these pathological cases, it would be nice to
detect it. I don't know what that is (and I've given it quite a lot of
thought over the past 8 years).

I suspect the most promising direction would be to teach the bitmap code
to behave more like the regular traversal by just walking down to the
UNINTERESTING commits. Right now it gets a complete bitmap for the
commits we don't want, and then a bitmap for the ones we do want, and
takes a set difference.

It could instead walk both sides in the usual way, filling in the bitmap
for each, and then stop when it hits boundary commits. The bitmap for
the boundary commit (if we don't have a full one on-disk) is filled in
with what's in its tree. That means it's incomplete, and the result
might include some extra objects (e.g., if boundary~100 had a blob that
went away, but later came back in a descendant that isn't marked
uninteresting). That's the same tradeoff the non-bitmap traversal makes.
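
The tradeoff in that last paragraph can be shown with a small Python sketch. This is only an illustration of the idea, not an implementation of it: the three-commit linear `history`, the blob names, and both helper functions are hypothetical, and on a linear history the boundary is simply the commit the other side already has.

```python
# Hypothetical linear history: blob-B exists in c0, is deleted in c1,
# and comes back in c2. The other side has c1; we are pushing c2.
history = {
    "c0": {"parent": None, "tree": {"blob-A", "blob-B"}},
    "c1": {"parent": "c0", "tree": {"blob-A"}},
    "c2": {"parent": "c1", "tree": {"blob-A", "blob-B"}},
}


def all_reachable(tip):
    """Every commit and blob reachable from tip (the exact, expensive answer)."""
    objects = set()
    while tip is not None:
        objects |= {tip} | history[tip]["tree"]
        tip = history[tip]["parent"]
    return objects


def exact_send(want, have):
    # Current bitmap behavior: complete bitmaps on both sides, then subtract.
    return all_reachable(want) - all_reachable(have)


def boundary_send(want, have):
    # Proposed behavior: stop at the boundary commit and describe it only
    # by what is in its own tree, without walking its ancestors.
    uninteresting = {have} | history[have]["tree"]
    wanted = set()
    tip = want
    while tip is not None and tip != have:
        wanted |= {tip} | history[tip]["tree"]
        tip = history[tip]["parent"]
    return wanted - uninteresting


print(sorted(exact_send("c2", "c1")))     # prints: ['c2']
print(sorted(boundary_send("c2", "c1")))  # prints: ['blob-B', 'c2']
```

The boundary version never looks below `c1`, so it is cheap, but it sends `blob-B` even though the other side already has it through `c0`.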

It would be pretty major surgery to the bitmap code. I haven't actually
tried it before.

-Peff


* RE: [External Mail]Re: why git is so slow for a tiny git push?
  2021-10-12 21:39             ` Jeff King
@ 2021-10-14  6:47               ` 程洋
  2021-10-26 21:54                 ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: 程洋 @ 2021-10-14  6:47 UTC (permalink / raw)
  To: Jeff King; +Cc: git

It seems that git receive-pack itself only takes 1 second.
After that, git rev-list takes most of the time. I don't know what it is doing.


* Re: [External Mail]Re: why git is so slow for a tiny git push?
  2021-10-14  6:47               ` 程洋
@ 2021-10-26 21:54                 ` Jeff King
  2021-10-27  2:48                   ` 程洋
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2021-10-26 21:54 UTC (permalink / raw)
  To: 程洋; +Cc: git

On Thu, Oct 14, 2021 at 06:47:31AM +0000, 程洋 wrote:

> Seems that git receive-pack only takes 1 seconds.
> After that, git rev-list takes the most time, I don't know what is it doing

It's checking connectivity of what was sent (i.e., that the other side
sent us all the objects). If you have a ton of refs on the server side,
there are some known issues here, as just loading the "we already have
this" side of the traversal can be very expensive.
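
As a rough picture of that check, here is a toy Python model. It is a sketch under stated assumptions, not how Git implements it: `check_connected`, the `parents` map, the `present` set, and the ref names are all hypothetical. The point it tries to show is that every existing ref has to be loaded as "already have this" before the (usually tiny) walk from the new tips can start.

```python
def check_connected(new_tips, existing_tips, parents, present):
    """Return (ok, refs_loaded, commits_walked) for a toy connectivity check."""
    # Loading the "we already have this" side costs work per existing ref,
    # no matter how small the push is.
    known_good = set(existing_tips)
    seen = set()
    walked = 0
    stack = list(new_tips)
    while stack:
        commit = stack.pop()
        if commit in known_good or commit in seen:
            continue                    # reached history we already trust
        if commit not in present:
            return False, len(known_good), walked   # sender left a gap
        seen.add(commit)
        walked += 1
        stack.extend(parents[commit])
    return True, len(known_good), walked


parents = {"old": [], "new1": ["old"], "new2": ["new1"]}
refs = ["old"] + [f"ref{i}" for i in range(1000)]   # hypothetical ref names

complete = check_connected(["new2"], refs, parents, {"old", "new1", "new2"})
missing = check_connected(["new2"], refs, parents, {"old", "new2"})

print(complete)  # prints: (True, 1001, 2)
print(missing)   # prints: (False, 1001, 1)
```

In both runs only one or two commits are actually walked, while all 1001 refs are loaded first. That fixed per-ref cost is what dominates when the server has a very large number of refs.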

There's been some recent work to make this faster, like f559d6d45e
(revision: avoid hitting packfiles when commits are in commit-graph,
2021-08-09) and f45022dc2f (connected: do not sort input revisions,
2021-08-09), which will be in the upcoming release. I'm sure there are
more improvements that could be made on top, but those show the general
direction.

-Peff


* RE: [External Mail]Re: why git is so slow for a tiny git push?
  2021-10-26 21:54                 ` Jeff King
@ 2021-10-27  2:48                   ` 程洋
  0 siblings, 0 replies; 16+ messages in thread
From: 程洋 @ 2021-10-27  2:48 UTC (permalink / raw)
  To: Jeff King; +Cc: git

That's cool. Thanks for your help. I will try to compile it myself.


* Re: why git is so slow for a tiny git push?
  2021-10-09 18:05   ` why git is so slow for a tiny git push? 程洋
  2021-10-11 16:53     ` Jeff King
@ 2021-10-28 13:17     ` Han-Wen Nienhuys
  1 sibling, 0 replies; 16+ messages in thread
From: Han-Wen Nienhuys @ 2021-10-28 13:17 UTC (permalink / raw)
  To: 程洋; +Cc: git

On Sat, Oct 9, 2021 at 8:06 PM 程洋 <chengyang@xiaomi.com> wrote:
>
> I have a really big repository which has 9m objects and maybe 300k refs.
> I noticed that git push is really slow for a tiny change. An example shows below
>
> 3 objects which is only 7 kb takes 36 seconds to pack-objects (it's the time after i enable pack.usesparse)
> However if I manually call “pack-objects” with the exactly same objects SHA1. It only take less than 0.005 second
> What is really pass to “pack-objects” when I call “git push”?
>
> I read an article says git will enumerate all "uninteresting objects" to determine what to send. but i don't understand, in my case git should only enumerate objects between "1a2d494b1b71469eebbd42aeabe1736bfa4b51fa..ddf3b84dca1aa4fe209a218380df1482af0d6b48". It's insane. I have a master server and a slave server serve this repository to my users. And i have a cron job to push every change from master to slave. And i found my master server CPU is full all the time because of the push jobs
>
> Is there any solution?

Not sure if this is your problem, but we have heard reports of bitmaps
slowing down push replication for Gerrit (it looks like you're working
with Android repositories). See
https://groups.google.com/g/repo-discuss/c/Xb8TbBXUYxw/m/jv5hqZ2PCQAJ
for background and suggestions.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.

* RE: [External Mail]Re: why git is so slow for a tiny git push?
  2021-10-12 21:46             ` Jeff King
@ 2021-11-23  6:42               ` 程洋
  2021-11-24 18:15                 ` Jeff King
  2021-11-24  8:07               ` 程洋
  1 sibling, 1 reply; 16+ messages in thread
From: 程洋 @ 2021-11-23  6:42 UTC (permalink / raw)
  To: Jeff King, Ævar Arnfjörð Bjarmason; +Cc: git

I have run into another problem.
When I clone from the remote server, it spends 25 seconds in `Enumerating objects` and then 1 second in `Counting objects` using the bitmap.
I don't understand why a fresh clone needs `Enumerating objects` at all. Isn't `Counting objects` enough for the server to determine what to send?

Here is the remote server trace:


11:49:12.438519 common-main.c:48             | d0 | main                     | version      |     |           |           |              | 2.33.1.558.g2bd2f258f4.dirty
11:49:12.438556 common-main.c:49             | d0 | main                     | start        |     |  0.000274 |           |              | git daemon --inetd --syslog --export-all --enable=upload-pack --enable=receive-pack --base-path=/home/work/repositories
11:49:12.438607 compat/linux/procinfo.c:170  | d0 | main                     | cmd_ancestry |     |           |           |              | ancestry:[xinetd systemd]
11:49:12.438655 git.c:737                    | d0 | main                     | cmd_name     |     |           |           |              | _run_dashed_ (_run_dashed_)
11:49:12.438668 run-command.c:739            | d0 | main                     | child_start  |     |  0.000390 |           |              | [ch0] class:dashed argv:[git-daemon --inetd --syslog --export-all --enable=upload-pack --enable=receive-pack --base-path=/home/work/repositories]
11:49:12.439555 common-main.c:48             | d1 | main                     | version      |     |           |           |              | 2.33.1.558.g2bd2f258f4.dirty
11:49:12.439589 common-main.c:49             | d1 | main                     | start        |     |  0.000242 |           |              | /usr/libexec/git-core/git-daemon --inetd --syslog --export-all --enable=upload-pack --enable=receive-pack --base-path=/home/work/repositories
11:49:12.439645 compat/linux/procinfo.c:170  | d1 | main                     | cmd_ancestry |     |           |           |              | ancestry:[git xinetd systemd]
11:49:12.439809 run-command.c:739            | d1 | main                     | child_start  |     |  0.000467 |           |              | [ch0] class:? argv:[git upload-pack --strict --timeout=0 .]
11:49:12.440747 common-main.c:48             | d2 | main                     | version      |     |           |           |              | 2.33.1.558.g2bd2f258f4.dirty
11:49:12.440772 common-main.c:49             | d2 | main                     | start        |     |  0.000252 |           |              | /usr/libexec/git-core/git upload-pack --strict --timeout=0 .
11:49:12.440833 compat/linux/procinfo.c:170  | d2 | main                     | cmd_ancestry |     |           |           |              | ancestry:[git-daemon git xinetd systemd]
11:49:12.440853 git.c:456                    | d2 | main                     | cmd_name     |     |           |           |              | upload-pack (_run_dashed_/upload-pack)
11:49:12.441013 protocol.c:76                | d2 | main                     | data         |     |  0.000494 |  0.000494 | transfer     | negotiated-version:2
11:49:12.481208 run-command.c:739            | d2 | main                     | child_start  |     |  0.040684 |           |              | [ch0] class:? argv:[git pack-objects --revs --thin --stdout --progress --delta-base-offset]
11:49:12.482307 common-main.c:48             | d3 | main                     | version      |     |           |           |              | 2.33.1.558.g2bd2f258f4.dirty
11:49:12.482334 common-main.c:49             | d3 | main                     | start        |     |  0.000220 |           |              | /usr/libexec/git-core/git pack-objects --revs --thin --stdout --progress --delta-base-offset
11:49:12.482405 compat/linux/procinfo.c:170  | d3 | main                     | cmd_ancestry |     |           |           |              | ancestry:[git git-daemon git xinetd systemd]
11:49:12.482500 git.c:456                    | d3 | main                     | cmd_name     |     |           |           |              | pack-objects (_run_dashed_/upload-pack/pack-objects)
11:49:12.482632 builtin/pack-objects.c:4140  | d3 | main                     | region_enter | r0  |  0.000522 |           | pack-objects | label:enumerate-objects
11:49:12.482825 progress.c:268               | d3 | main                     | region_enter | r0  |  0.000715 |           | progress     | ..label:Enumerating objects
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
11:49:21.477783 progress.c:329               | d3 | main                     | data         | r0  |  8.995670 |  8.994955 | progress     | ....total_objects:0
11:49:21.477848 progress.c:336               | d3 | main                     | region_leave | r0  |  8.995738 |  8.995023 | progress     | ..label:Enumerating objects
11:49:21.477880 builtin/pack-objects.c:4162  | d3 | main                     | region_leave | r0  |  8.995770 |  8.995248 | pack-objects | label:enumerate-objects
11:49:21.477891 builtin/pack-objects.c:4168  | d3 | main                     | region_enter | r0  |  8.995782 |           | pack-objects | label:prepare-pack
11:49:21.477903 progress.c:268               | d3 | main                     | region_enter | r0  |  8.995794 |           | progress     | ..label:Counting objects
11:49:22.316806 progress.c:329               | d3 | main                     | data         | r0  |  9.834695 |  0.838901 | progress     | ....total_objects:1383396
11:49:22.316848 progress.c:336               | d3 | main                     | region_leave | r0  |  9.834738 |  0.838944 | progress     | ..label:Counting objects
11:49:22.366109 progress.c:268               | d3 | main                     | region_enter | r0  |  9.883998 |           | progress     | ..label:Compressing objects
11:49:34.208323 trace2/tr2_tgt_perf.c:201    | d2 | main                     | signal       |     | 21.767795 |           |              | signo:13
11:49:34.208372 trace2/tr2_tgt_perf.c:201    | d3 | main                     | signal       |     | 21.726219 |           |              | ....signo:13
11:49:34.218767 run-command.c:995            | d1 | main                     | child_exit   |     | 21.779417 | 21.778950 |              | [ch0] pid:48725 code:141
11:49:34.218809 common-main.c:54             | d1 | main                     | exit         |     | 21.779469 |           |              | code:141
11:49:34.218822 trace2/tr2_tgt_perf.c:213    | d1 | main                     | atexit       |     | 21.779482 |           |              | code:141
11:49:34.219135 run-command.c:995            | d0 | main                     | child_exit   |     | 21.780855 | 21.780465 |              | [ch0] pid:48724 code:141
11:49:34.219170 git.c:759                    | d0 | main                     | exit         |     | 21.780893 |           |              | code:141
11:49:34.219182 trace2/tr2_tgt_perf.c:213    | d0 | main                     | atexit       |     | 21.780906 |           |              | code:141
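
To read a trace like the one above, it can help to pull the elapsed time out of each region_leave event. The following is a minimal sketch, assuming the pipe-separated column layout seen in this trace, where the seventh column is the time spent inside the region. The function name region_durations is made up for illustration and is not part of Git.

```python
# Three region_leave lines copied from the trace above.
TRACE = """\
11:49:21.477848 progress.c:336               | d3 | main                     | region_leave | r0  |  8.995738 |  8.995023 | progress     | ..label:Enumerating objects
11:49:21.477880 builtin/pack-objects.c:4162  | d3 | main                     | region_leave | r0  |  8.995770 |  8.995248 | pack-objects | label:enumerate-objects
11:49:22.316848 progress.c:336               | d3 | main                     | region_leave | r0  |  9.834738 |  0.838944 | progress     | ..label:Counting objects
"""


def region_durations(trace_text):
    """Return {label: seconds} for every region_leave event in the text."""
    durations = {}
    for line in trace_text.splitlines():
        # Assumed layout: 9 pipe-separated columns, message last.
        fields = [field.strip() for field in line.split("|", 8)]
        if len(fields) != 9 or fields[3] != "region_leave":
            continue
        # Nested regions are indented with dots in the message column.
        label = fields[8].lstrip(".")
        if label.startswith("label:"):
            label = label[len("label:"):]
        durations[label] = float(fields[6])
    return durations


if __name__ == "__main__":
    ordered = sorted(region_durations(TRACE).items(), key=lambda kv: -kv[1])
    for label, seconds in ordered:
        print(f"{seconds:10.6f}s  {label}")
```

On the three sample lines this reports about 9.0 seconds for each of the two enumerate regions and about 0.84 seconds for Counting objects, which matches the gap visible in the timestamps.
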
-----Original Message-----
From: Jeff King <peff@peff.net>
Sent: Wednesday, October 13, 2021 5:46 AM
To: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Cc: 程洋 <chengyang@xiaomi.com>; git@vger.kernel.org
Subject: Re: [External Mail]Re: why git is so slow for a tiny git push?

*This message originated from outside of XIAOMI. Please treat this email with caution*


On Tue, Oct 12, 2021 at 12:06:04PM +0200, Ævar Arnfjörð Bjarmason wrote:

> But more generally with these side-indexes it seems to me that the
> code involved might not be considering these sorts of edge cases, i.e.
> my understanding from you above is that if we have bitmaps anywhere
> we'll try to in-memory use them for all the objects in play? Or that
> otherwise having "partial" bitmaps leads to pathological behavior.

Sure, if there was an easy way to know beforehand whether the bitmap was going to help or run into these pathological cases, it would be nice to detect it. I don't know what that is (and I've given it quite a lot of thought over the past 8 years).

I suspect the most promising direction would be to teach the bitmap code to behave more like the regular traversal by just walking down to the UNINTERESTING commits. Right now it gets a complete bitmap for the commits we don't want, and then a bitmap for the ones we do want, and takes a set difference.

It could instead walk both sides in the usual way, filling in the bitmap for each, and then stop when it hits boundary commits. The bitmap for the boundary commit (if we don't have a full one on-disk) is filled in with what's in its tree. That means it's incomplete, and the result might include some extra objects (e.g., if boundary~100 had a blob that went away, but later came back in a descendant that isn't marked uninteresting). That's the same tradeoff the non-bitmap traversal makes.

It would be pretty major surgery to the bitmap code. I haven't actually tried it before.
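
The tradeoff described above can be illustrated with a toy model. This is only a sketch: Python sets stand in for bitmaps, the four-commit history and the names blob_x and blob_y are invented, and neither function reflects Git's actual bitmap code.

```python
# Toy linear history A <- B <- C <- D. Each commit maps to the objects
# present in its snapshot (the commit itself plus its tree contents).
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
SNAPSHOT = {
    "A": {"A", "blob_x", "blob_y"},
    "B": {"B", "blob_y"},            # blob_x is deleted here
    "C": {"C", "blob_y"},
    "D": {"D", "blob_x", "blob_y"},  # blob_x comes back
}


def reachable(tip):
    """Complete 'bitmap' for tip: objects of tip and all its ancestors."""
    objects, stack, seen = set(), [tip], set()
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        objects |= SNAPSHOT[commit]
        stack.extend(PARENTS[commit])
    return objects


def exact_difference(want, have):
    """Two complete bitmaps, then a set difference."""
    return reachable(want) - reachable(have)


def boundary_difference(want, have):
    """Stop walking at the boundary commit and use only its own tree."""
    uninteresting = set(SNAPSHOT[have])
    objects, stack, seen = set(), [want], set()
    while stack:
        commit = stack.pop()
        if commit == have or commit in seen:
            continue
        seen.add(commit)
        objects |= SNAPSHOT[commit]
        stack.extend(PARENTS[commit])
    return objects - uninteresting


if __name__ == "__main__":
    print(sorted(exact_difference("D", "C")))
    print(sorted(boundary_difference("D", "C")))
```

With want D and have C, the exact version yields only D, because it walks all the way back to A and learns that blob_x is already on the other side. The boundary version stops at C, never sees A, and so also sends blob_x: cheaper to compute, at the cost of an extra object.
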

-Peff

* RE: [External Mail]Re: why git is so slow for a tiny git push?
  2021-10-12 21:46             ` Jeff King
  2021-11-23  6:42               ` 程洋
@ 2021-11-24  8:07               ` 程洋
  1 sibling, 0 replies; 16+ messages in thread
From: 程洋 @ 2021-11-24  8:07 UTC (permalink / raw)
  To: Jeff King, Ævar Arnfjörð Bjarmason; +Cc: git

It seems that "get_object_list_from_bitmap" takes 9 seconds.
Is that expected?

I'm not sure, but here is my guess:
I have 300k refs, but a clone with `--no-tags` only needs "refs/heads/*". Git has to search and filter refs across the whole bitmap file, which takes a lot of time.
I think JGit handles this in a really smart way. It packs all of refs/heads into one bitmap file and the other refs into another, because 90% of clone operations only need refs/heads.



* Re: [External Mail]Re: why git is so slow for a tiny git push?
  2021-11-23  6:42               ` 程洋
@ 2021-11-24 18:15                 ` Jeff King
  2021-11-25  2:53                   ` 程洋
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2021-11-24 18:15 UTC (permalink / raw)
  To: 程洋; +Cc: Taylor Blau, Ævar Arnfjörð Bjarmason, git

On Tue, Nov 23, 2021 at 06:42:12AM +0000, 程洋 wrote:

> I got another problem here.
> When I tries to clone from remote server. It took me 25 seconds to enumerating objects. And then 1 second to `couting objects` by bitmap.
> I don't understand, why a fresh clone need `enumerating objects` ? Is `couting objects` enough for the server to determine what to send?

In older versions of Git, the "counting objects" progress meter used to
be the actual object graph traversal. That changed in v2.18 (via
5af050437a), but you may still see some references to "counting objects
is expensive".

These days that is called "enumerating objects", and "counting objects"
is just doing a quick-ish pass over that list to do some light analysis
(e.g., if we can reuse an on-disk delta). I'd expect "enumerating" to be
expensive in general, and "counting" to be quick in general.

The "enumerating" phase is where we determine what to send, whether it's
for a clone or a fetch, and may involve opening up a bunch of trees to
walk the graph. It's what reachability bitmaps are supposed to make
faster. But if you have 300k refs, as you've mentioned, you almost
certainly don't have complete coverage of all of the ref tips, so we'll
have to fall back to doing at least a partial graph traversal.
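
A rough sketch of why incomplete bitmap coverage forces a partial traversal, assuming a toy history in which only one commit has a stored bitmap. The commit names, the STORED_BITMAPS table, and enumerate_with_bitmaps are all hypothetical, and sets of commit names stand in for real bitmaps.

```python
# Toy history: a main line c0 <- c1 <- ... <- c9 plus two side refs,
# s1 (forked from c2) and s2 (forked from c3). All names are made up.
PARENTS = {f"c{i}": ([f"c{i - 1}"] if i else []) for i in range(10)}
PARENTS.update({"s1": ["c2"], "s2": ["c3"]})

# Only c5 has a stored bitmap: the set of commits reachable from it.
STORED_BITMAPS = {"c5": {f"c{i}" for i in range(6)}}


def enumerate_with_bitmaps(tips, parents, stored_bitmaps):
    """Return (reachable commits, number of commits walked by hand)."""
    result, walked, stack = set(), 0, list(tips)
    while stack:
        commit = stack.pop()
        if commit in result:
            continue
        if commit in stored_bitmaps:
            # Cheap path: one OR covers this commit and all its ancestors.
            result |= stored_bitmaps[commit]
            continue
        # Expensive path: no bitmap here, so open the commit and keep going.
        result.add(commit)
        walked += 1
        stack.extend(parents[commit])
    return result, walked


if __name__ == "__main__":
    for tips in (["c9"], ["c9", "s1", "s2"]):
        _, walked = enumerate_with_bitmaps(tips, PARENTS, STORED_BITMAPS)
        print(tips, "walked", walked)
```

With only c9 as a tip, the walk covers four commits before it reaches the bitmap at c5. Adding the two uncovered side refs makes it walk ten, because their history has to be traversed by hand all the way to the root.
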

Taylor (cc'd) has been looking at some tricks for speeding up cases like
this with a lot of refs. But I don't think there's anything to show
publicly yet.

-Peff


* RE: [External Mail]Re: why git is so slow for a tiny git push?
  2021-11-24 18:15                 ` Jeff King
@ 2021-11-25  2:53                   ` 程洋
  0 siblings, 0 replies; 16+ messages in thread
From: 程洋 @ 2021-11-25  2:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Ævar Arnfjörð Bjarmason, git

Well, we do have 300k refs, but only 1000 of them are under refs/heads.
I think most users only need refs/heads, and a few only need refs/tags. As for the other refs, we hardly see any use case.

JGit handles this in a smart way: it creates 2 pack files and 2 bitmaps, where pack A contains everything in refs/heads and pack B contains the other refs. When a user does a fresh clone, it just needs to send pack A, without having to determine what can be reused.
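
The split described above could be modelled roughly as follows. This is a sketch of the idea only, not JGit's code: the ref names, the REACHABLE table, and split_packs are invented for illustration.

```python
# Hypothetical refs and, for each tip, the set of objects reachable from it.
REFS = {
    "refs/heads/main": "m2",
    "refs/tags/v1": "m1",
    "refs/changes/01/1": "x1",
}
REACHABLE = {
    "m1": {"m1"},
    "m2": {"m1", "m2"},
    "x1": {"m1", "x1"},
}


def split_packs(refs, reachable):
    """Pack A: everything reachable from refs/heads/*. Pack B: the rest."""
    head_tips = {tip for name, tip in refs.items()
                 if name.startswith("refs/heads/")}
    other_tips = set(refs.values()) - head_tips
    pack_a = set().union(*(reachable[tip] for tip in head_tips))
    # Objects already stored in pack A are not repeated in pack B.
    pack_b = set().union(*(reachable[tip] for tip in other_tips)) - pack_a
    return pack_a, pack_b


if __name__ == "__main__":
    pack_a, pack_b = split_packs(REFS, REACHABLE)
    print("pack A:", sorted(pack_a))
    print("pack B:", sorted(pack_b))
```

In this model a clone that asks only for refs/heads needs exactly the contents of pack A, so the server has nothing left to enumerate for that request.
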


end of thread, other threads:[~2021-11-25  2:55 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <c5a8595658d6416684c2bbd317494c49@xiaomi.com>
     [not found] ` <5a6f3e8f29f74c93bf3af5da636df973@xiaomi.com>
2021-10-09 18:05   ` why git is so slow for a tiny git push? 程洋
2021-10-11 16:53     ` Jeff King
2021-10-12  8:04       ` [External Mail]Re: " 程洋
2021-10-12  8:39         ` Jeff King
2021-10-12  9:08           ` 程洋
2021-10-12 21:39             ` Jeff King
2021-10-14  6:47               ` 程洋
2021-10-26 21:54                 ` Jeff King
2021-10-27  2:48                   ` 程洋
2021-10-12 10:06           ` Ævar Arnfjörð Bjarmason
2021-10-12 21:46             ` Jeff King
2021-11-23  6:42               ` 程洋
2021-11-24 18:15                 ` Jeff King
2021-11-25  2:53                   ` 程洋
2021-11-24  8:07               ` 程洋
2021-10-28 13:17     ` Han-Wen Nienhuys
