* RE: [External Mail]Re: why git is so slow for a tiny git push?
2021-10-12 8:39 ` Jeff King
@ 2021-10-12 9:08 ` 程洋
2021-10-12 21:39 ` Jeff King
2021-10-12 10:06 ` Ævar Arnfjörð Bjarmason
1 sibling, 1 reply; 16+ messages in thread
From: 程洋 @ 2021-10-12 9:08 UTC (permalink / raw)
To: Jeff King; +Cc: git@vger.kernel.org
Oh my god.
Jesus. It works for me. After disabling writebitmap, the time dropped from 33 seconds to 0.9 seconds.
But now it turns out that the remote side takes 13 seconds to receive the pack. Since git receive-pack is triggered automatically on the remote side, is there any way to enable GIT_TRACE2_PERF on the server side?
-----Original Message-----
From: Jeff King <peff@peff.net>
Sent: Tuesday, October 12, 2021 4:40 PM
To: 程洋 <chengyang@xiaomi.com>
Cc: git@vger.kernel.org
Subject: Re: [External Mail]Re: why git is so slow for a tiny git push?
*This message originated from outside of XIAOMI. Please treat this email with caution*
On Tue, Oct 12, 2021 at 08:04:44AM +0000, 程洋 wrote:
> I have bitmap indeed because my master server also serves as download server.
> However I'm using git 2.17.0, and I didn't set repack.writeBitmaps
On that version and without the config, then perhaps you (or somebody) passed "-b" to git-repack.
> But why bitmaps can cause push to be slow? Do you mean that if
> writeBitmaps is true, every push will regenerate bitmap file? If
> that's what you mean, what I see is the only bitmap file in my repo
> didn't change across time (the modify time is one month ago, long
> before I run the experiment)
No, it is not regenerating the on-disk bitmaps. But when deciding the set of objects to send, pack-objects will generate an internal bitmap which is the set difference of objects reachable from the pushed refs, minus objects reachable from the refs the other side told us they had.
It uses the on-disk bitmaps as much as possible, but there may be commits not covered by bitmaps (either because they were pushed since the last repack which built bitmaps, or simply because it's too expensive to put a bitmap on every commit, so we sprinkle them throughout the commit history). In those cases we have to traverse parts of the object graph by walking commits and opening up trees. This can be expensive, and is where your time is going.
Reachability bitmaps _usually_ make things faster, but they have some cases where they make things worse (especially if you have a ton of refs, or haven't repacked recently).
If bitmaps are causing a problem for your push, they are likely to be causing problems for fetches, too. But if you want to keep them to serve fetches, but not use them for push, you should be able to do:
    git -c pack.usebitmaps=false push
-Peff
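A minimal sketch of the one-off override, not taken from the thread: it builds a throwaway repository, writes a bitmap with "git repack -b", and pushes with pack.usebitmaps disabled for that single command. The temporary paths, the demo identity, and the branch name "main" are assumptions made for illustration.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)                       # scratch area (assumption)
git init -q --bare "$tmp/remote.git"   # stands in for the real server
git init -q "$tmp/work"
cd "$tmp/work"
echo hello >file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"
# "-b" asks repack to write a reachability bitmap next to the new pack.
git repack -q -a -d -b
bitmaps=$(find .git/objects/pack -name '*.bitmap' | wc -l | tr -d ' ')
echo "bitmap files on disk: $bitmaps"
# Disable bitmap use for this push only; the stored config is untouched.
git -c pack.usebitmaps=false push -q "$tmp/remote.git" HEAD:refs/heads/main
pushed=$(git --git-dir="$tmp/remote.git" rev-parse refs/heads/main)
echo "remote main is now at $pushed"
```

Because the setting is passed with "-c", fetches served from the same repository keep using the on-disk bitmap.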
#/******本邮件及其附件含有小米公司的保密信息,仅限于发送给上面地址中列出的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件! This e-mail and its attachments contain confidential information from XIAOMI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!******/#
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [External Mail]Re: why git is so slow for a tiny git push?
2021-10-12 9:08 ` 程洋
@ 2021-10-12 21:39 ` Jeff King
2021-10-14 6:47 ` 程洋
0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2021-10-12 21:39 UTC (permalink / raw)
To: 程洋; +Cc: git@vger.kernel.org
On Tue, Oct 12, 2021 at 09:08:08AM +0000, 程洋 wrote:
> Jesus. It works for me. After disabling writebitmap, the time dropped
> from 33 seconds to 0.9 seconds.
>
> But now it turns out that the remote side takes 13 seconds to receive
> the pack. Since git receive-pack is triggered automatically on the
> remote side, is there any way to enable GIT_TRACE2_PERF on the server side?
For the environment variable, it depends on your protocol. If you can
push over ssh (and the other side lets you execute arbitrary commands),
then:
    git push --receive-pack='GIT_TRACE2_PERF=/tmp/foo.trace git-receive-pack'
Otherwise, you can look at setting the trace2.perfTarget config option
on the server side. I haven't played with it myself before.
-Peff
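A rough sketch of the config route, under a few assumptions: the Trace2 documentation says these settings are read only from the system and global config files and that the target must be an absolute path, so the example sets it with "--global". HOME is redirected to a scratch directory so that no real configuration is touched; on an actual server you would set it for the account that runs the Git daemon.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
# Point HOME at a scratch directory so the sketch leaves real config alone.
export HOME="$tmp/home"
unset XDG_CONFIG_HOME
mkdir -p "$HOME"
# The perf target has to be an absolute path.
git config --global trace2.perfTarget "$tmp/perf.trace"
git init -q --bare "$tmp/server.git"
# Every Git process started by this user now appends to the trace file,
# which would include a receive-pack spawned by an incoming push.
git --git-dir="$tmp/server.git" rev-parse --git-dir >/dev/null
events=$(grep -c . "$tmp/perf.trace")
echo "trace lines written so far: $events"
```

Remember to unset the option afterwards, since the file grows with every Git command that user runs.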
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [External Mail]Re: why git is so slow for a tiny git push?
2021-10-12 21:39 ` Jeff King
@ 2021-10-14 6:47 ` 程洋
2021-10-26 21:54 ` Jeff King
0 siblings, 1 reply; 16+ messages in thread
From: 程洋 @ 2021-10-14 6:47 UTC (permalink / raw)
To: Jeff King; +Cc: git@vger.kernel.org
It seems that git receive-pack only takes 1 second.
After that, git rev-list takes the most time. I don't know what it is doing.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [External Mail]Re: why git is so slow for a tiny git push?
2021-10-14 6:47 ` 程洋
@ 2021-10-26 21:54 ` Jeff King
2021-10-27 2:48 ` 程洋
0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2021-10-26 21:54 UTC (permalink / raw)
To: 程洋; +Cc: git@vger.kernel.org
On Thu, Oct 14, 2021 at 06:47:31AM +0000, 程洋 wrote:
> It seems that git receive-pack only takes 1 second.
> After that, git rev-list takes the most time. I don't know what it is doing.
It's checking connectivity of what was sent (i.e., that the other side
sent us all the objects). If you have a ton of refs on the server side,
there are some known issues here, as just loading the "we already have
this" side of the traversal can be very expensive.
There's been some recent work to make this faster, like f559d6d45e
(revision: avoid hitting packfiles when commits are in commit-graph,
2021-08-09) and f45022dc2f (connected: do not sort input revisions,
2021-08-09), which will be in the upcoming release. I'm sure there are
more improvements that could be made on top, but those show the general
direction.
-Peff
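To make the connectivity check more concrete, here is a small sketch that approximates it by hand in a throwaway repository. The rev-list options are meant to mirror the kind of walk described above (start at the new tip, stop at anything existing refs can reach); the repository layout, the commit messages, and the "rewind the branch" trick used to simulate a just-received tip are assumptions for illustration.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}
commit "already on the server"
old=$(git rev-parse HEAD)
commit "just pushed"
new=$(git rev-parse HEAD)
# Rewind the branch so "new" is not reachable from any ref, the way a
# freshly received tip looks before the ref update is accepted.
git reset -q --soft "$old"
refs=$(git for-each-ref | wc -l | tr -d ' ')
echo "refs on the \"we already have this\" side: $refs"
# Walk from the new tip and stop at anything reachable from existing refs.
# A missing object makes rev-list fail.
if echo "$new" | git rev-list --objects --stdin --not --all --quiet
then
    connected=yes
else
    connected=no
fi
echo "connected: $connected"
```

With one ref the "--not --all" side is trivial; with hundreds of thousands of refs, loading that side is the expensive part.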
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [External Mail]Re: why git is so slow for a tiny git push?
2021-10-26 21:54 ` Jeff King
@ 2021-10-27 2:48 ` 程洋
0 siblings, 0 replies; 16+ messages in thread
From: 程洋 @ 2021-10-27 2:48 UTC (permalink / raw)
To: Jeff King; +Cc: git@vger.kernel.org
That's cool. Thanks for your help. I will try to compile it myself.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [External Mail]Re: why git is so slow for a tiny git push?
2021-10-12 8:39 ` Jeff King
2021-10-12 9:08 ` 程洋
@ 2021-10-12 10:06 ` Ævar Arnfjörð Bjarmason
2021-10-12 21:46 ` Jeff King
1 sibling, 1 reply; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-12 10:06 UTC (permalink / raw)
To: Jeff King; +Cc: 程洋, git@vger.kernel.org
On Tue, Oct 12 2021, Jeff King wrote:
> On Tue, Oct 12, 2021 at 08:04:44AM +0000, 程洋 wrote:
>
>> I have bitmap indeed because my master server also serves as download server.
>> However I'm using git 2.17.0, and I didn't set repack.writeBitmaps
>
> On that version and without the config, then perhaps you (or somebody)
> passed "-b" to git-repack.
>
>> But why bitmaps can cause push to be slow? Do you mean that if
>> writeBitmaps is true, every push will regenerate bitmap file? If
>> that's what you mean, what I see is the only bitmap file in my repo
>> didn't change across time (the modify time is one month ago, long
>> before I run the experiment)
>
> No, it is not regenerating the on-disk bitmaps. But when deciding the
> set of objects to send, pack-objects will generate an internal bitmap
> which is the set difference of objects reachable from the pushed refs,
> minus objects reachable from the refs the other side told us
> they had.
>
> It uses the on-disk bitmaps as much as possible, but there may be
> commits not covered by bitmaps (either because they were pushed since
> the last repack which built bitmaps, or simply because it's too
> expensive to put a bitmap on every commit, so we sprinkle them
> throughout the commit history). In those cases we have to traverse parts
> of the object graph by walking commits and opening up trees. This can be
> expensive, and is where your time is going.
>
> Reachability bitmaps _usually_ make things faster, but they have some
> cases where they make things worse (especially if you have a ton of
> refs, or haven't repacked recently).
>
> If bitmaps are causing a problem for your push, they are likely to be
> causing problems for fetches, too. But if you want to keep them to serve
> fetches, but not use them for push, you should be able to do:
>
> git -c pack.usebitmaps=false push
For the last on-list discussion of (probably the same) problem, which in
turn references an even earlier one:
https://lore.kernel.org/git/878s6nfq54.fsf@evledraar.gmail.com/
I don't remember if my own report from mid-2019 said so or contradicts
this (and I didn't re-read the thread), but FWIW I *vaguely* recall that
the case I ran into *might* have had to do with a user running into this
on a shared "staging" server.
I.e. one where users logged in as their own user, cd'd to a shared git
repo they got a lock on, and ran fetch/push/deploy commands. One user
had a pack.useBitmaps=true or equivalent in their config (or had
manually run such a repack), so there were some very old stale bitmaps
around.
This was also a setup with a gc.bigPackThreshold configured, which now
that I think about it might have made it much worse, i.e. we'd keep that
*.bitmap on the "big pack" around forever.
But more generally with these side-indexes it seems to me that the code
involved might not be considering these sorts of edge cases, i.e. my
understanding from you above is that if we have bitmaps anywhere we'll
try to in-memory use them for all the objects in play? Or that otherwise
having "partial" bitmaps leads to pathological behavior.
tl;dr: We can't assume that a config of "I like to write side-index
A/B/C/... when I repack" means that the repo is in that state *now*, but
it seems that we do.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [External Mail]Re: why git is so slow for a tiny git push?
2021-10-12 10:06 ` Ævar Arnfjörð Bjarmason
@ 2021-10-12 21:46 ` Jeff King
2021-11-23 6:42 ` 程洋
2021-11-24 8:07 ` 程洋
0 siblings, 2 replies; 16+ messages in thread
From: Jeff King @ 2021-10-12 21:46 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: 程洋, git@vger.kernel.org
On Tue, Oct 12, 2021 at 12:06:04PM +0200, Ævar Arnfjörð Bjarmason wrote:
> But more generally with these side-indexes it seems to me that the code
> involved might not be considering these sorts of edge cases, i.e. my
> understanding from you above is that if we have bitmaps anywhere we'll
> try to in-memory use them for all the objects in play? Or that otherwise
> having "partial" bitmaps leads to pathological behavior.
Sure, if there was an easy way to know beforehand whether the bitmap was
going to help or run into these pathological cases, it would be nice to
detect it. I don't know what that is (and I've given it quite a lot of
thought over the past 8 years).
I suspect the most promising direction would be to teach the bitmap code to behave
more like the regular traversal by just walking down to the
UNINTERESTING commits. Right now it gets a complete bitmap for the
commits we don't want, and then a bitmap for the ones we do want, and
takes a set difference.
It could instead walk both sides in the usual way, filling in the bitmap
for each, and then stop when it hits boundary commits. The bitmap for
the boundary commit (if we don't have a full one on-disk) is filled in
with what's in its tree. That means it's incomplete, and the result
might include some extra objects (e.g., if boundary~100 had a blob that
went away, but later came back in a descendant that isn't marked
uninteresting). That's the same tradeoff the non-bitmap traversal makes.
It would be pretty major surgery to the bitmap code. I haven't actually
tried it before.
-Peff
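For readers who want to compare the two traversals, a minimal sketch follows. It is not the proposed change, only a way to run the existing regular walk and the existing bitmap set difference side by side with "git rev-list". The file names and the two-commit history are assumptions; in such a tiny linear history both approaches return the same objects, and the differences discussed above only appear in larger, messier histories.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
commit_file() {
    echo "$1" >"$1.txt"
    git add "$1.txt"
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "add $1"
}
commit_file a
have=$(git rev-parse HEAD)   # what the other side already has
commit_file b
want=$(git rev-parse HEAD)   # what we are asked to send
git repack -q -a -d -b       # write a pack with an on-disk bitmap
# Regular traversal: walk from "want" down to the uninteresting "have".
plain=$(git rev-list --objects "$want" --not "$have" | wc -l | tr -d ' ')
# Bitmap traversal: (reachable from want) minus (reachable from have).
bitmap=$(git rev-list --objects --use-bitmap-index "$want" --not "$have" \
    | wc -l | tr -d ' ')
echo "plain=$plain bitmap=$bitmap"
```

Wrapping each rev-list call in "time" on a real repository is a quick way to see whether bitmaps help or hurt for a given pair of tips.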
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [External Mail]Re: why git is so slow for a tiny git push?
2021-10-12 21:46 ` Jeff King
@ 2021-11-23 6:42 ` 程洋
2021-11-24 18:15 ` Jeff King
2021-11-24 8:07 ` 程洋
1 sibling, 1 reply; 16+ messages in thread
From: 程洋 @ 2021-11-23 6:42 UTC (permalink / raw)
To: Jeff King, Ævar Arnfjörð Bjarmason; +Cc: git@vger.kernel.org
I got another problem here.
When I try to clone from the remote server, it takes 25 seconds for `enumerating objects`, and then 1 second for `counting objects` using the bitmap.
I don't understand: why does a fresh clone need `enumerating objects`? Isn't `counting objects` enough for the server to determine what to send?
Here is the remote server trace:
11:49:12.438519 common-main.c:48 | d0 | main | version | | | | | 2.33.1.558.g2bd2f258f4.dirty
11:49:12.438556 common-main.c:49 | d0 | main | start | | 0.000274 | | | git daemon --inetd --syslog --export-all --enable=upload-pack --enable=receive-pack --base-path=/home/work/repositories
11:49:12.438607 compat/linux/procinfo.c:170 | d0 | main | cmd_ancestry | | | | | ancestry:[xinetd systemd]
11:49:12.438655 git.c:737 | d0 | main | cmd_name | | | | | _run_dashed_ (_run_dashed_)
11:49:12.438668 run-command.c:739 | d0 | main | child_start | | 0.000390 | | | [ch0] class:dashed argv:[git-daemon --inetd --syslog --export-all --enable=upload-pack --enable=receive-pack --base-path=/home/work/repositories]
11:49:12.439555 common-main.c:48 | d1 | main | version | | | | | 2.33.1.558.g2bd2f258f4.dirty
11:49:12.439589 common-main.c:49 | d1 | main | start | | 0.000242 | | | /usr/libexec/git-core/git-daemon --inetd --syslog --export-all --enable=upload-pack --enable=receive-pack --base-path=/home/work/repositories
11:49:12.439645 compat/linux/procinfo.c:170 | d1 | main | cmd_ancestry | | | | | ancestry:[git xinetd systemd]
11:49:12.439809 run-command.c:739 | d1 | main | child_start | | 0.000467 | | | [ch0] class:? argv:[git upload-pack --strict --timeout=0 .]
11:49:12.440747 common-main.c:48 | d2 | main | version | | | | | 2.33.1.558.g2bd2f258f4.dirty
11:49:12.440772 common-main.c:49 | d2 | main | start | | 0.000252 | | | /usr/libexec/git-core/git upload-pack --strict --timeout=0 .
11:49:12.440833 compat/linux/procinfo.c:170 | d2 | main | cmd_ancestry | | | | | ancestry:[git-daemon git xinetd systemd]
11:49:12.440853 git.c:456 | d2 | main | cmd_name | | | | | upload-pack (_run_dashed_/upload-pack)
11:49:12.441013 protocol.c:76 | d2 | main | data | | 0.000494 | 0.000494 | transfer | negotiated-version:2
11:49:12.481208 run-command.c:739 | d2 | main | child_start | | 0.040684 | | | [ch0] class:? argv:[git pack-objects --revs --thin --stdout --progress --delta-base-offset]
11:49:12.482307 common-main.c:48 | d3 | main | version | | | | | 2.33.1.558.g2bd2f258f4.dirty
11:49:12.482334 common-main.c:49 | d3 | main | start | | 0.000220 | | | /usr/libexec/git-core/git pack-objects --revs --thin --stdout --progress --delta-base-offset
11:49:12.482405 compat/linux/procinfo.c:170 | d3 | main | cmd_ancestry | | | | | ancestry:[git git-daemon git xinetd systemd]
11:49:12.482500 git.c:456 | d3 | main | cmd_name | | | | | pack-objects (_run_dashed_/upload-pack/pack-objects)
11:49:12.482632 builtin/pack-objects.c:4140 | d3 | main | region_enter | r0 | 0.000522 | | pack-objects | label:enumerate-objects
11:49:12.482825 progress.c:268 | d3 | main | region_enter | r0 | 0.000715 | | progress | ..label:Enumerating objects
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
11:49:21.477783 progress.c:329 | d3 | main | data | r0 | 8.995670 | 8.994955 | progress | ....total_objects:0
11:49:21.477848 progress.c:336 | d3 | main | region_leave | r0 | 8.995738 | 8.995023 | progress | ..label:Enumerating objects
11:49:21.477880 builtin/pack-objects.c:4162 | d3 | main | region_leave | r0 | 8.995770 | 8.995248 | pack-objects | label:enumerate-objects
11:49:21.477891 builtin/pack-objects.c:4168 | d3 | main | region_enter | r0 | 8.995782 | | pack-objects | label:prepare-pack
11:49:21.477903 progress.c:268 | d3 | main | region_enter | r0 | 8.995794 | | progress | ..label:Counting objects
11:49:22.316806 progress.c:329 | d3 | main | data | r0 | 9.834695 | 0.838901 | progress | ....total_objects:1383396
11:49:22.316848 progress.c:336 | d3 | main | region_leave | r0 | 9.834738 | 0.838944 | progress | ..label:Counting objects
11:49:22.366109 progress.c:268 | d3 | main | region_enter | r0 | 9.883998 | | progress | ..label:Compressing objects
11:49:34.208323 trace2/tr2_tgt_perf.c:201 | d2 | main | signal | | 21.767795 | | | signo:13
11:49:34.208372 trace2/tr2_tgt_perf.c:201 | d3 | main | signal | | 21.726219 | | | ....signo:13
11:49:34.218767 run-command.c:995 | d1 | main | child_exit | | 21.779417 | 21.778950 | | [ch0] pid:48725 code:141
11:49:34.218809 common-main.c:54 | d1 | main | exit | | 21.779469 | | | code:141
11:49:34.218822 trace2/tr2_tgt_perf.c:213 | d1 | main | atexit | | 21.779482 | | | code:141
11:49:34.219135 run-command.c:995 | d0 | main | child_exit | | 21.780855 | 21.780465 | | [ch0] pid:48724 code:141
11:49:34.219170 git.c:759 | d0 | main | exit | | 21.780893 | | | code:141
11:49:34.219182 trace2/tr2_tgt_perf.c:213 | d0 | main | atexit | | 21.780906 | | | code:141
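A trace like the one above can be summarized mechanically. The sketch below is only an illustration: it copies four lines from this trace into a temporary file, keeps the region_leave events, and prints the time spent inside each region, largest first. The field positions assume the pipe-separated perf format shown here.

```shell
#!/bin/sh
set -eu
trace=$(mktemp)
# Four lines copied from the trace in this thread (one is not a region).
cat >"$trace" <<'EOF'
11:49:21.477848 progress.c:336 | d3 | main | region_leave | r0 | 8.995738 | 8.995023 | progress | ..label:Enumerating objects
11:49:21.477880 builtin/pack-objects.c:4162 | d3 | main | region_leave | r0 | 8.995770 | 8.995248 | pack-objects | label:enumerate-objects
11:49:22.316806 progress.c:329 | d3 | main | data | r0 | 9.834695 | 0.838901 | progress | ....total_objects:1383396
11:49:22.316848 progress.c:336 | d3 | main | region_leave | r0 | 9.834738 | 0.838944 | progress | ..label:Counting objects
EOF
summary=$(awk -F'|' '
    $4 ~ /region_leave/ {
        elapsed = $7; gsub(/ /, "", elapsed)        # time spent in the region
        label = $9; sub(/^[ .]*label:/, "", label)  # drop nesting dots and prefix
        printf "%s\t%s\n", elapsed, label
    }' "$trace" | LC_ALL=C sort -rn)
echo "$summary"
```

On these lines the enumerate-objects region accounts for about 9 seconds and the counting pass for under 1 second, matching the gap marked by the dashed line in the trace.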
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [External Mail]Re: why git is so slow for a tiny git push?
2021-11-23 6:42 ` 程洋
@ 2021-11-24 18:15 ` Jeff King
2021-11-25 2:53 ` 程洋
0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2021-11-24 18:15 UTC (permalink / raw)
To: 程洋
Cc: Taylor Blau, Ævar Arnfjörð Bjarmason,
git@vger.kernel.org
On Tue, Nov 23, 2021 at 06:42:12AM +0000, 程洋 wrote:
> I got another problem here.
> When I try to clone from the remote server, it takes 25 seconds for `enumerating objects`, and then 1 second for `counting objects` using the bitmap.
> I don't understand: why does a fresh clone need `enumerating objects`? Isn't `counting objects` enough for the server to determine what to send?
In older versions of Git, the "counting objects" progress meter used to
be the actual object graph traversal. That changed in v2.18 (via
5af050437a), but you may still see some references to "counting objects
is expensive".
These days that is called "enumerating objects", and "counting objects"
is just doing a quick-ish pass over that list to do some light analysis
(e.g., if we can reuse an on-disk delta). I'd expect "enumerating" to be
expensive in general, and "counting" to be quick in general.
The "enumerating" phase is where we determine what to send whether it's
for a clone or a fetch, and may involve opening up a bunch of trees to
walk the graph. It's what reachability bitmaps are supposed to make
faster. But if you have 300k refs, as you've mentioned, you almost
certainly don't have complete coverage of all of the ref tips, so we'll
have to fall back to doing at least a partial graph traversal.
Taylor (cc'd) has been looking at some tricks for speeding up cases like
this with a lot of refs. But I don't think there's anything to show
publicly yet.
-Peff
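One way to watch the "enumerating" phase in isolation is to run pack-objects by hand with a perf trace, as in this sketch. The throwaway repository and file names are assumptions, and the region label is the one that appears in the server trace earlier in the thread; older Git versions may not emit it.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
echo hello >file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"
# Ask pack-objects for everything reachable from HEAD, roughly what a
# clone requests, and write a perf trace to an absolute path.
echo HEAD | GIT_TRACE2_PERF="$tmp/pack.trace" \
    git pack-objects --revs --stdout >"$tmp/out.pack"
# One line per finished region shows where the time went.
grep region_leave "$tmp/pack.trace" || true
regions=$(grep -c 'label:enumerate-objects' "$tmp/pack.trace")
echo "enumerate-objects events: $regions"
```

Running the same command in a copy of the real repository, with and without "-c pack.usebitmaps=false", shows how much of the enumeration time is attributable to bitmaps.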
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [External Mail]Re: why git is so slow for a tiny git push?
2021-11-24 18:15 ` Jeff King
@ 2021-11-25 2:53 ` 程洋
0 siblings, 0 replies; 16+ messages in thread
From: 程洋 @ 2021-11-25 2:53 UTC (permalink / raw)
To: Jeff King
Cc: Taylor Blau, Ævar Arnfjörð Bjarmason,
git@vger.kernel.org
Well, we do have 300k refs, but only 1000 refs/heads.
However, I think most users only require refs/heads, and a few people only require refs/tags. As for the other refs, we hardly see any use case.
So jgit handles this in a smart way: it creates 2 pack files and 2 bitmaps, where pack A contains all refs/heads and pack B contains the other refs. When a user does a fresh clone, it just needs to send pack A without determining whether it can be reused or not.
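As a small illustration of the ref distribution being described, the sketch below counts refs per top-level namespace in a throwaway repository. The "refs/changes" namespace and the tag name are hypothetical stand-ins for the "other refs"; on a real server you would run only the for-each-ref pipeline.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m base
tip=$(git rev-parse HEAD)
# Hypothetical extra namespaces standing in for the non-branch refs.
for i in 1 2 3; do
    git update-ref "refs/changes/$i/head" "$tip"
done
git update-ref refs/tags/v1 "$tip"
# Count refs by their first two path components, largest group first.
by_namespace=$(git for-each-ref --format='%(refname)' \
    | cut -d/ -f1-2 | sort | uniq -c | sort -rn)
echo "$by_namespace"
heads=$(git for-each-ref refs/heads | wc -l | tr -d ' ')
changes=$(git for-each-ref refs/changes | wc -l | tr -d ' ')
```

A breakdown like this makes it easy to see how few of the ref tips a typical clone actually asks for.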
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [External Mail]Re: why git is so slow for a tiny git push?
2021-10-12 21:46 ` Jeff King
2021-11-23 6:42 ` 程洋
@ 2021-11-24 8:07 ` 程洋
1 sibling, 0 replies; 16+ messages in thread
From: 程洋 @ 2021-11-24 8:07 UTC (permalink / raw)
To: Jeff King, Ævar Arnfjörð Bjarmason; +Cc: git@vger.kernel.org
It seems that "get_object_list_from_bitmap" takes 9 seconds.
Is that expected?
I'm not sure, but here is my guess:
I have 300k refs, but a clone with `--no-tags` only requires "refs/heads/*". Git has to search and filter refs in the whole bitmap file, which takes a lot of time.
I think jgit does this in a really smart way. It packs all refs/heads into one bitmap file, and the other refs into another bitmap file, because 90% of clone operations only require refs/heads.
11:49:34.219135 run-command.c:995 | d0 | main | child_exit | | 21.780855 | 21.780465 | | [ch0] pid:48724 code:141
11:49:34.219170 git.c:759 | d0 | main | exit | | 21.780893 | | | code:141
11:49:34.219182 trace2/tr2_tgt_perf.c:213 | d0 | main | atexit | | 21.780906 | | | code:141
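A trace like the one above is easier to read once the region timings are pulled out and sorted. The following is only a rough sketch: it assumes the nine pipe-separated columns seen in these lines, with the seventh column holding the elapsed time of a region on its region_leave event. The function name slowest_regions is made up for illustration and is not part of git.

```python
def slowest_regions(lines):
    """Return (seconds, process, category, label) per region_leave event, slowest first."""
    found = []
    for line in lines:
        # Assumed columns:
        # time+source | process | thread | event | region | t_abs | t_rel | category | message
        cols = [c.strip() for c in line.split("|", 8)]
        if len(cols) != 9 or cols[3] != "region_leave":
            continue  # skips separators and every other event type
        # Nested messages are indented with dots in the perf format; drop them.
        found.append((float(cols[6]), cols[1], cols[7], cols[8].lstrip(".")))
    return sorted(found, reverse=True)


# Three lines copied from the trace above.
SAMPLE = [
    "11:49:21.477848 progress.c:336 | d3 | main | region_leave | r0 | 8.995738 | 8.995023 | progress | ..label:Enumerating objects",
    "11:49:21.477880 builtin/pack-objects.c:4162 | d3 | main | region_leave | r0 | 8.995770 | 8.995248 | pack-objects | label:enumerate-objects",
    "11:49:22.316848 progress.c:336 | d3 | main | region_leave | r0 | 9.834738 | 0.838944 | progress | ..label:Counting objects",
]

for seconds, proc, category, label in slowest_regions(SAMPLE):
    print(f"{seconds:10.6f}s  {proc}  {category:<12}  {label}")
```

On these three lines the enumerate-objects region in pack-objects (d3) sorts first at roughly 9 seconds, which matches the gap visible in the timestamps.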
-----Original Message-----
From: Jeff King <peff@peff.net>
Sent: Wednesday, October 13, 2021 5:46 AM
To: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Cc: 程洋 <chengyang@xiaomi.com>; git@vger.kernel.org
Subject: Re: [External Mail]Re: why git is so slow for a tiny git push?
On Tue, Oct 12, 2021 at 12:06:04PM +0200, Ævar Arnfjörð Bjarmason wrote:
> But more generally with these side-indexes it seems to me that the
> code involved might not be considering these sorts of edge cases, i.e.
> my understanding from you above is that if we have bitmaps anywhere
> we'll try to in-memory use them for all the objects in play? Or that
> otherwise having "partial" bitmaps leads to pathological behavior.
Sure, if there was an easy way to know beforehand whether the bitmap was going to help or run into these pathological cases, it would be nice to detect it. I don't know what that is (and I've given it quite a lot of thought over the past 8 years).
I suspect the most promising direction would be to teach the bitmap code to behave more like the regular traversal by just walking down to the UNINTERESTING commits. Right now it gets a complete bitmap for the commits we don't want, and then a bitmap for the ones we do want, and takes a set difference.
It could instead walk both sides in the usual way, filling in the bitmap for each, and then stop when it hits boundary commits. The bitmap for the boundary commit (if we don't have a full one on-disk) is filled in with what's in its tree. That means it's incomplete, and the result might include some extra objects (e.g., if boundary~100 had a blob that went away, but later came back in a descendant that isn't marked uninteresting). That's the same tradeoff the non-bitmap traversal makes.
It would be pretty major surgery to the bitmap code. I haven't actually tried it before.
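To make the tradeoff concrete, here is a toy model of the two strategies described above. It is a sketch under heavy assumptions, not git's implementation: history is a small dict, each commit's tree is flattened to a set of blob names, no on-disk bitmaps exist, and all names (HISTORY, full_difference, boundary_walk) are invented for illustration. The second value each function returns counts how many commit trees had to be opened.

```python
# Toy history: commit -> (parents, blobs in its tree). Entirely hypothetical data.
HISTORY = {
    "c0": ([], {"X"}),
    "c1": (["c0"], {"a"}),            # blob X goes away
    "c2": (["c1"], {"a", "b"}),
    "c3": (["c2"], {"a", "b", "X"}),  # blob X comes back
}


def ancestors(tips):
    """All commits reachable from tips, including the tips themselves."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(HISTORY[commit][0])
    return seen


def blobs_of(commits):
    """Union of the blobs in the trees of the given commits."""
    blobs = set()
    for commit in commits:
        blobs |= HISTORY[commit][1]
    return blobs


def full_difference(wants, haves):
    """Complete 'want' set minus complete 'have' set (current bitmap behavior, simplified)."""
    want_side, have_side = ancestors(wants), ancestors(haves)
    to_send = blobs_of(want_side) - blobs_of(have_side)
    return to_send, len(want_side | have_side)


def boundary_walk(wants, haves):
    """Walk from the wants, stop at uninteresting commits, subtract only boundary trees."""
    uninteresting = ancestors(haves)  # commit-only walk; no trees opened here
    interesting, boundary, stack = set(), set(), list(wants)
    while stack:
        commit = stack.pop()
        if commit in uninteresting:
            boundary.add(commit)      # stop walking at the boundary
        elif commit not in interesting:
            interesting.add(commit)
            stack.extend(HISTORY[commit][0])
    to_send = blobs_of(interesting) - blobs_of(boundary)
    return to_send, len(interesting | boundary)


print(full_difference(["c3"], ["c2"]))  # exact answer, but every tree is opened
print(boundary_walk(["c3"], ["c2"]))    # fewer trees opened, X is sent needlessly
```

With c3 wanted and c2 already on the other side, the full difference sends nothing extra but opens all four trees, while the boundary walk opens only two and sends blob X even though the receiver already has it via c0. That is the "extra objects" case from the paragraph above.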
-Peff