git@vger.kernel.org mailing list mirror (one of many)
From: 程洋 <chengyang@xiaomi.com>
To: ZheNing Hu <adlternative@gmail.com>
Cc: "Jonathan Tan" <jonathantanmy@google.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	何浩 <hehao@xiaomi.com>, "Xin7 Ma 马鑫" <maxin7@xiaomi.com>,
	石奉兵 <shifengbing@xiaomi.com>, 凡军辉 <fanjunhui@xiaomi.com>,
	王汉基 <wanghanji@xiaomi.com>
Subject: RE: [External Mail]Re: Partial-clone cause big performance impact on server
Date: Mon, 15 Aug 2022 13:15:07 +0000
Message-ID: <44c62b62ce8f418d8929bdffc894d329@xiaomi.com>
In-Reply-To: <CAOLTT8R6hNKWGen4RD2sSU-asjjS6HXnxY2JC4k9SeL4YDzB-g@mail.gmail.com>

There is a really easy way to reproduce it:

git clone --filter=blob:none -b master "https://android.googlesource.com/platform/prebuilts/gradle-plugin"

Even Google's AOSP Gerrit has this problem: you will find that the checkout hangs for minutes.
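
To see how many "want" lines the client sends during that checkout, the packet trace can be written to a file and counted. The following is only a rough sketch: the helper name count_wants and the log file name packet.log are made up for illustration, and the grep pattern assumes the usual GIT_TRACE_PACKET line shape, where lines sent by the client contain "> want <object id>".

```shell
# count_wants LOGFILE
# Hypothetical helper: print the number of "want" lines that the client
# sent, according to a GIT_TRACE_PACKET log. Sent packets are marked with
# ">" after the program name, so "> want " matches outgoing wants only.
count_wants() {
    grep -c '> want ' "$1"
}

# Example use (needs network access, so it is left commented out).
# GIT_TRACE_PACKET accepts an absolute path and appends the trace there.
#
#   GIT_TRACE_PACKET="$PWD/packet.log" \
#       git clone --filter=blob:none -b master \
#       "https://android.googlesource.com/platform/prebuilts/gradle-plugin"
#   count_wants packet.log
```

A count close to the number of blobs in the checked-out tree would be consistent with the client asking for each missing blob individually.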


> -----Original Message-----
> From: ZheNing Hu <adlternative@gmail.com>
> Sent: Monday, August 15, 2022 1:16 PM
> To: 程洋 <chengyang@xiaomi.com>
> Cc: Jonathan Tan <jonathantanmy@google.com>; git@vger.kernel.org; 何浩
> <hehao@xiaomi.com>; Xin7 Ma 马鑫 <maxin7@xiaomi.com>; 石奉兵
> <shifengbing@xiaomi.com>; 凡军辉 <fanjunhui@xiaomi.com>; 王汉基
> <wanghanji@xiaomi.com>
> Subject: Re: [External Mail]Re: Partial-clone cause big performance impact on
> server
>
> 程洋 <chengyang@xiaomi.com> wrote on Sat, Aug 13, 2022 at 16:00:
> >
> > > >     3. With GIT_TRACE_PACKET=1, we found that on big repositories
> > > >     (200K+ refs, 6M+ objects) Git sends 40k "want" lines.
> > > >     4. We then traced our server (Gerrit with JGit) and found that it
> > > >     was counting objects. When we checked those 40k objects, most of
> > > >     them were blobs rather than commits (which means they are not in
> > > >     the bitmap).
> > > >     5. We believe that is the root cause of our problem: Git sends too
> > > >     many "want SHA1" lines for objects that are not in the bitmap,
> > > >     which makes the server count objects frequently and slows it down.
> > > >
> > > > What we want is to download only what we need to check out a specific
> > > > commit. But if one commit contains very many objects (40k+ in our
> > > > case), counting takes more time than downloading.
> > > > Is it possible to have Git send only a "commit want" rather than
> > > > every object's SHA1 one by one?
> > >
> > > On a technical level, it may be possible - at the point in the Git
> > > code where the batch prefetch occurs, I'm not sure if we have the
> > > commit, but we could plumb the commit information there. (We have
> > > the tree, but this doesn't help us here because as far as I know,
> > > the tree won't be in the bitmap so the server would need to count
> > > objects anyway, resulting in the same problem.)
> > >
> > > However, sending only commits as wants would mean that we would be
> > > fetching more blobs than needed. For example, if we were to clone
> > > (with checkout) and then checkout HEAD^, sending a "commit want" for
> > > the latter checkout would result in all blobs referenced by the
> > > commit's tree being fetched and not only the blobs that are different.
> >
> > It seems your solution requires changes on both the server side and
> > the client side. Why don't we just add another filter that makes a
> > partial clone always send commit-level wants?
> > If we check out HEAD~1, the client can then send "want HEAD~1 HEAD~2".
> >
>
> I am interested in this question too; maybe I can try to see whether we can do this. ;-)
>
> ZheNing Hu


Thread overview: 40+ messages
2022-08-11  8:09 Partial-clone cause big performance impact on server 程洋
2022-08-11 17:22 ` Jonathan Tan
2022-08-13  7:55   ` Re: [External Mail]Re: " 程洋
2022-08-13 11:41     ` 程洋
2022-08-15  5:16     ` ZheNing Hu
2022-08-15 13:15       ` 程洋 [this message]
2022-08-12 12:21 ` Derrick Stolee
2022-08-14  6:48 ` Jeff King
2022-08-15 13:18   ` Derrick Stolee
2022-08-15 14:50     ` [External Mail]Re: " 程洋
2022-08-17 10:22     ` 程洋
2022-08-17 13:41       ` Derrick Stolee
2022-08-18  5:49         ` Jeff King
2022-09-01  6:53   ` 程洋
2022-09-01 16:19     ` Jeff King
2022-09-05 11:17       ` 程洋
2022-09-06 18:38         ` Jeff King
2022-09-06 22:58           ` [PATCH 0/3] speeding up on-demand fetch for blobs in partial clone Jeff King
2022-09-06 23:01             ` [PATCH 1/3] parse_object(): allow skipping hash check Jeff King
2022-09-07 14:15               ` Derrick Stolee
2022-09-07 20:44                 ` Jeff King
2022-09-06 23:05             ` [PATCH 2/3] upload-pack: skip parse-object re-hashing of "want" objects Jeff King
2022-09-07 14:36               ` Derrick Stolee
2022-09-07 14:45                 ` Derrick Stolee
2022-09-07 20:50                   ` Jeff King
2022-09-07 19:26               ` Junio C Hamano
2022-09-07 20:36                 ` Jeff King
2022-09-07 20:48                   ` [BUG] t1800: Fails for error text comparison rsbecker
2022-09-07 21:55                     ` Junio C Hamano
2022-09-07 22:23                       ` rsbecker
2022-09-07 21:02                   ` [PATCH 2/3] upload-pack: skip parse-object re-hashing of "want" objects Jeff King
2022-09-07 22:07                     ` Junio C Hamano
2022-09-08  5:04                       ` Jeff King
2022-09-08 16:41                         ` Junio C Hamano
2022-09-06 23:06             ` [PATCH 3/3] parse_object(): check commit-graph when skip_hash set Jeff King
2022-09-07 14:46               ` Derrick Stolee
2022-09-07 19:31               ` Junio C Hamano
2022-09-08 10:39                 ` [External Mail]Re: " 程洋
2022-09-08 18:42                   ` Jeff King
2022-09-07 14:48             ` [PATCH 0/3] speeding up on-demand fetch for blobs in partial clone Derrick Stolee
