git@vger.kernel.org mailing list mirror (one of many)
From: ZheNing Hu <adlternative@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: "Emily Shaffer" <emilyshaffer@google.com>,
	"Git List" <git@vger.kernel.org>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Victoria Dye" <vdye@github.com>
Subject: Re: Question relate to collaboration on git monorepo
Date: Fri, 23 Sep 2022 22:31:59 +0800
Message-ID: <CAOLTT8QfwOi9yx_qZZgyGa8iL8kHWutEED7ok_jxwTcYT_hf9Q@mail.gmail.com>
In-Reply-To: <CABPp-BEBB1oqdVcXrWwMAdtb0TwHZvr-6KDa210j5ncw54Di_g@mail.gmail.com>

Elijah Newren <newren@gmail.com> wrote on Thu, Sep 22, 2022 at 07:36:
>
> On Wed, Sep 21, 2022 at 8:22 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > Emily Shaffer <emilyshaffer@google.com> wrote on Wed, Sep 21, 2022 at 02:53:
> > >
> > > On Tue, Sep 20, 2022 at 5:42 AM ZheNing Hu <adlternative@gmail.com> wrote:
> > > >
> > > > Hey, guys,
> > > >
> > > > If two users of a git monorepo are working on different subprojects,
> > > > /project1 and /project2, using partial clone and sparse-checkout,
> > > > and user one pushes first, then user two must pull some blobs that
> > > > user one pushed before he can push too. I guess each repo will
> > > > gradually grow with objects from the other's project, so is there any
> > > > way to delete unnecessary blobs outside the working project (the
> > > > sparse-checkout filterspec), or to make git pull not fetch these
> > > > unnecessary blobs at all?
> > >
> > > This is exactly what the combination of partial clone and sparse
> > > checkout is for!
> > >
> > > Dev A is working on project1/, and excludes project2/ from her sparse
> > > filter; she also cloned with `--filter=blob:none`.
> > > Dev B is working on project2/, and excludes project1/ from his sparse
> > > filter, and similarly is using the blob:none partial clone filter.
> > >
> > > Assuming everybody is contributing by direct push, and not using a
> > > code review tool or something else which handles the push for them...
> > > Dev A finishes first, and pushes.
> > > Dev B needs to pull, like you say - but during that pull he doesn't
> > > need to fetch the objects in project1, because they're excluded by the
> > > combination of his partial clone filter and his sparse checkout
> > > pattern. The pull needs to happen because there is a new commit which
> > > Dev B's commit needs to treat as a parent, and so Dev B's client needs
> > > to know the ID of that commit.
> > >
> >
> > I don't agree here; it does fetch the blobs during git pull. So I made a
> > small change to the previous test:
> >
> > (
> >   cd m2
> >   git cat-file --batch-check --batch-all-objects | grep blob | wc -l > blob_count1
> > #  git push
> > #  git -c pull.rebase=false pull --no-edit #no conflict
> >   git fetch origin main
> >   git cat-file --batch-check --batch-all-objects | grep blob | wc -l > blob_count2
> >   git merge --no-edit origin/main
> >   git cat-file --batch-check --batch-all-objects | grep blob | wc -l > blob_count3
> >   printf "blob_count1=%s\n" $(cat blob_count1)
> >   printf "blob_count2=%s\n" $(cat blob_count2)
> >   printf "blob_count3=%s\n" $(cat blob_count3)
> > )
> >
> > warning: This repository uses promisor remotes. Some objects may not be loaded.
> > remote: Enumerating objects: 32, done.
> > remote: Counting objects: 100% (32/32), done.
> > remote: Compressing objects: 100% (20/20), done.
> > remote: Total 30 (delta 0), reused 0 (delta 0), pack-reused 0
> > Receiving objects: 100% (30/30), 2.61 KiB | 2.61 MiB/s, done.
> > From /Users/adl/./mono-repo
> >  * branch            main       -> FETCH_HEAD
> >    a6a17f2..16a8585  main       -> origin/main
> > warning: This repository uses promisor remotes. Some objects may not be loaded.
> > Merge made by the 'ort' strategy.
>
> Note: The merge completed successfully, and we see no evidence of
> additional blobs being downloaded before this point.
>

Agreed. Debugging shows that this is not caused by the merge itself,
but by the "finish" phase of git merge, which fetches the missing objects
in order to show the diffstat.

(lldb) b fetch_objects
Breakpoint 1: where = git`fetch_objects + 29 at promisor-remote.c:18:23, address = 0x0000000100275f4d
(lldb) r
Process 62227 launched: '/Users/adl/repos/git/git' (x86_64)
Merge made by the 'ort' strategy.
Process 62227 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100275f4d git`fetch_objects(repo=0x0000000100406a88, remote_name="origin", oids=0x0000000101204360, oid_nr=1) at promisor-remote.c:18:23
   15  const struct object_id *oids,
   16  int oid_nr)
   17  {
-> 18  struct child_process child = CHILD_PROCESS_INIT;
   19  int i;
   20  FILE *child_in;
   21
Target 0: (git) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100275f4d git`fetch_objects(repo=0x0000000100406a88, remote_name="origin", oids=0x0000000101204360, oid_nr=1) at promisor-remote.c:18:23
    frame #1: 0x0000000100275ea3 git`promisor_remote_get_direct(repo=0x0000000100406a88, oids=0x0000000101204360, oid_nr=1) at promisor-remote.c:249:7
    frame #2: 0x00000001001a2fe3 git`diff_queued_diff_prefetch(repository=0x0000000100406a88) at diff.c:6781:2
    frame #3: 0x00000001001a3075 git`diffcore_std(options=0x00007ff7bfefed20) at diff.c:6805:3
    frame #4: 0x000000010009ca11 git`finish(head_commit=0x000000010151f000, remoteheads=0x0000600000004390, new_head=0x00007ff7bfeff030, msg="Merge made by the 'ort' strategy.") at merge.c:499:3
    frame #5: 0x000000010009d787 git`finish_automerge(head=0x000000010151f000, head_subsumed=0, common=0x0000600000004330, remoteheads=0x0000600000004390, result_tree=0x00007ff7bfeff280, wt_strategy="ort") at merge.c:960:2
    frame #6: 0x000000010009b07b git`cmd_merge(argc=1, argv=0x00007ff7bfeff660, prefix=0x0000000000000000) at merge.c:1743:9
    frame #7: 0x0000000100005573 git`run_builtin(p=0x00000001003e0e60, argc=3, argv=0x00007ff7bfeff660) at git.c:466:11
    frame #8: 0x0000000100004098 git`handle_builtin(argc=3, argv=0x00007ff7bfeff660) at git.c:721:3
    frame #9: 0x0000000100004f76 git`run_argv(argcp=0x00007ff7bfeff4dc, argv=0x00007ff7bfeff4d0) at git.c:788:4
    frame #10: 0x0000000100003e69 git`cmd_main(argc=3, argv=0x00007ff7bfeff660) at git.c:921:19
    frame #11: 0x000000010011e8f6 git`main(argc=4, argv=0x00007ff7bfeff658) at common-main.c:56:11
    frame #12: 0x00000001005b94fe dyld`start + 462
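
As a complement to counting blobs with git cat-file, here is a rough sketch
of how one might list the objects a partial clone still lacks, without
triggering a fetch, using git rev-list --missing=print. The throwaway
"server" and "client" repositories, the file contents and the variable
names are assumptions made up for illustration; they are not the repos from
the test above.

```shell
set -e

# Hypothetical throwaway repositories, created only for this illustration.
work=$(mktemp -d)
git init -q "$work/server"
git -C "$work/server" config user.name "Demo User"
git -C "$work/server" config user.email "demo@example.com"
# Let the "server" honour --filter requests from partial clones.
git -C "$work/server" config uploadpack.allowFilter true

mkdir "$work/server/project1" "$work/server/project2"
echo one > "$work/server/project1/file1"
echo two > "$work/server/project2/file2"
git -C "$work/server" add .
git -C "$work/server" commit -q -m "initial"

# Blobless partial clone; --no-checkout avoids fetching blobs for a worktree.
git clone -q --no-checkout --filter=blob:none "file://$work/server" "$work/client"

# Missing objects are printed with a leading '?'. Count them.
missing_count=$(git -C "$work/client" rev-list --objects --missing=print HEAD | grep -c '^?' || true)
echo "missing_count=$missing_count"
```

Running the same rev-list command before and after a merge should show
whether the merge (or its diffstat) pulled in any of the missing blobs.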

> > remote: Enumerating objects: 1, done.
> > remote: Counting objects: 100% (1/1), done.
> > remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
> > Receiving objects: 100% (1/1), 87 bytes | 87.00 KiB/s, done.
>
> Here, we do have an object download, which occurred after the merge
> completed, so there must be something happening after the merge which
> needs the extra blob; if we keep reading...
>
> >  project1/file1 | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
>
> Ah, the 'helpful' diffstat.  It downloads blobs from a promisor remote
> just so we can see what has changed, including in the area of the
> project we don't care about.
>
> (This is yet another reason it'd be nice to have a --restrict mode for
> grep/diff/log/etc. for sparse-checkout uses, and an ability to make it
> the default in some repo, so you could get just the diffstat within
> the region of the project that you care about.  We're discussing such
> an idea, but it isn't implemented yet.)
>
> > warning: This repository uses promisor remotes. Some objects may not be loaded.
> > blob_count1=11
> > blob_count2=11
> > blob_count3=12
> >
> > The result shows that blob count doesn't change in git fetch, but in git merge.
>
> If you add --no-stat to your merge command (or set merge.stat to
> false), the extra blob will not be downloaded.

After setting merge.stat to false, the problem is solved. Thanks a lot!
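
For anyone who wants to try this, a minimal sketch of the setting on a
throwaway repository follows. The repository, branch name "side" and file
names are made up for illustration; in the real test, the equivalent would
be running git config merge.stat false in the clone (or passing --no-stat
to git merge).

```shell
set -e

# Hypothetical throwaway repository, created only for this illustration.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)

# Create a side branch and a diverging commit on the main branch,
# so that the merge below is a true merge and not a fast-forward.
git checkout -q -b side
echo side > side.txt
git add side.txt
git commit -q -m "side"
git checkout -q "$main_branch"
echo main > main.txt
git add main.txt
git commit -q -m "main"

# Disable the diffstat that is normally printed at the end of a merge.
git config merge.stat false

merge_output=$(git merge --no-edit side 2>&1)
printf '%s\n' "$merge_output"
```

With merge.stat set to false, the merge output should contain no
"N file changed" summary, which is the step that needed the extra blob.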

ZheNing Hu

Thread overview: 12+ messages
2022-09-20 12:42 Question relate to collaboration on git monorepo ZheNing Hu
2022-09-20 18:53 ` Emily Shaffer
2022-09-21 15:22   ` ZheNing Hu
2022-09-21 23:36     ` Elijah Newren
2022-09-22 14:24       ` Derrick Stolee
2022-09-22 15:20         ` Emily Shaffer
2022-09-23  2:08           ` Elijah Newren
2022-09-23 15:46         ` Junio C Hamano
2022-09-23 18:11           ` Derrick Stolee
2022-09-23 14:31       ` ZheNing Hu [this message]
2022-09-21  1:47 ` Elijah Newren
2022-09-21 15:42   ` ZheNing Hu
