git@vger.kernel.org mailing list mirror (one of many)
From: ZheNing Hu <adlternative@gmail.com>
To: Emily Shaffer <emilyshaffer@google.com>
Cc: "Git List" <git@vger.kernel.org>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Victoria Dye" <vdye@github.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: Re: Question relate to collaboration on git monorepo
Date: Wed, 21 Sep 2022 23:22:41 +0800
Message-ID: <CAOLTT8TwdwfHCCv+x51++Aanf3tipMegfZiTKFbQtfh7b_EY0A@mail.gmail.com>
In-Reply-To: <CAJoAoZnm8jE9rT+rrze-zP7KZNW=oCZjcrFWqjDssW3LzxrKPg@mail.gmail.com>

Emily Shaffer <emilyshaffer@google.com> wrote on Wed, Sep 21, 2022 at 02:53:
>
> On Tue, Sep 20, 2022 at 5:42 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > Hey, guys,
> >
> > If two users of a git monorepo are working on different subprojects,
> > /project1 and /project2, using partial clone and sparse-checkout,
> > and user one pushes first, then user two, who wants to push too, must
> > first pull some blobs that were pushed by user one. I guess their repo
> > size will gradually increase with objects from other projects, so is
> > there any way to delete unnecessary blobs outside the working project
> > (the sparse-checkout filterspec), or to make git pull not fetch these
> > unnecessary blobs at all?
>
> This is exactly what the combination of partial clone and sparse
> checkout is for!
>
> Dev A is working on project1/, and excludes project2/ from her sparse
> filter; she also cloned with `--filter=blob:none`.
> Dev B is working on project2/, and excludes project1/ from his sparse
> filter, and similarly is using the blob:none partial clone filter.
>
> Assuming everybody is contributing by direct push, and not using a
> code review tool or something else which handles the push for them...
> Dev A finishes first, and pushes.
> Dev B needs to pull, like you say - but during that pull he doesn't
> need to fetch the objects in project1, because they're excluded by the
> combination of his partial clone filter and his sparse checkout
> pattern. The pull needs to happen because there is a new commit which
> Dev B's commit needs to treat as a parent, and so Dev B's client needs
> to know the ID of that commit.
>
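
For reference, the setup described above corresponds roughly to the
following. This is only a sketch: the helper name sparse_partial_clone
and its three arguments are made up for illustration, and it assumes a
git recent enough to have `git clone --sparse` and `git sparse-checkout`.

```shell
# Hypothetical helper (not part of git): clone without blobs, then limit
# the working tree to a single project directory.
#   $1 = repository URL, $2 = target directory, $3 = project directory
sparse_partial_clone() {
  # --filter=blob:none makes this a partial clone: commits and trees are
  # downloaded, blobs are fetched lazily when something needs them.
  git clone --filter=blob:none --sparse "$1" "$2" &&
  # Restrict the checkout to the one project this developer works on.
  git -C "$2" sparse-checkout set "$3"
}
```

Dev B would then run something like
`sparse_partial_clone <url> mono project2`, where the URL and directory
names are placeholders.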

I don't agree here: blobs are indeed fetched during git pull. To show
this, I made a small change to the previous test:

(
  cd m2
  # Count the blobs that are present locally before fetching.
  git cat-file --batch-check --batch-all-objects | grep blob | wc -l >blob_count1
#  git push
#  git -c pull.rebase=false pull --no-edit #no conflict
  git fetch origin main
  # Count again after the fetch ...
  git cat-file --batch-check --batch-all-objects | grep blob | wc -l >blob_count2
  git merge --no-edit origin/main
  # ... and once more after the merge.
  git cat-file --batch-check --batch-all-objects | grep blob | wc -l >blob_count3
  printf "blob_count1=%s\n" $(cat blob_count1)
  printf "blob_count2=%s\n" $(cat blob_count2)
  printf "blob_count3=%s\n" $(cat blob_count3)
)

warning: This repository uses promisor remotes. Some objects may not be loaded.
remote: Enumerating objects: 32, done.
remote: Counting objects: 100% (32/32), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 30 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (30/30), 2.61 KiB | 2.61 MiB/s, done.
From /Users/adl/./mono-repo
 * branch            main       -> FETCH_HEAD
   a6a17f2..16a8585  main       -> origin/main
warning: This repository uses promisor remotes. Some objects may not be loaded.
Merge made by the 'ort' strategy.
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (1/1), 87 bytes | 87.00 KiB/s, done.
 project1/file1 | 10 ++++++++++
 1 file changed, 10 insertions(+)
warning: This repository uses promisor remotes. Some objects may not be loaded.
blob_count1=11
blob_count2=11
blob_count3=12

The result shows that the blob count doesn't change during git fetch,
but it does during git merge. However, not all of the history of this
blob is pulled down here, so the local repository should grow slowly.
So I was wondering whether there is a way to periodically clean up
these unneeded blobs.
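
A slightly stricter way to take these counts would be to match the type
column of the `git cat-file --batch-check` output exactly, instead of
grepping the whole line. A minimal sketch, where the function name
count_objects_of_type is only an illustration:

```shell
# Count objects of one type in `git cat-file --batch-check` output read
# from stdin. Each input line has the form "<oid> <type> <size>".
#   $1 = object type to count (blob, tree, commit or tag)
count_objects_of_type() {
  # "n + 0" makes awk print 0 instead of an empty line when nothing matched.
  awk -v want="$1" '$2 == want { n++ } END { print n + 0 }'
}

# Usage inside a repository (not run here):
#   git cat-file --batch-check --batch-all-objects | count_objects_of_type blob
```

Because awk prints the bare number, this also avoids the leading
whitespace that `wc -l` emits on some platforms.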

> >
> > The large number of interruptions in git push may be another
> > problem: if thousands of projects are in one monorepo, and
> > no one else has any code that would conflict with mine in any way,
> > do I still need to pull every time? Is there a way to make
> > improvements here?
>
> The typical improvement people make here is to use some form of
> automation or tooling to perform the push and merge for them. That
> usually falls to the code review tool. We can call the history like
> this: "S" is the source commit which both A and B branched from, and
> "A" and "B" are the commits by their respective owners. Because of the
> order of push, we want the final commit history to look like "S -> A
> -> B". Dev A's local history looks like "S -> A" and Dev B's local
> history looks like "S -> B".
>
> If we're using the GitHub PR model, then GitHub may do merge commits
> for us, and it creates those merge commits automatically at the time
> someone pushes "Merge PR" (or whatever the button is called). So our
> history probably looks like:
> o  (merge B)
> |   \
> o   |  (merge A)
> |\  |
> | | B
> | A |
> | / /
> S
>
> In this case, neither A nor B needs to know about the other, because
> the merge commit is being created by the code review tool.
>
> With tooling like Gerrit, or other tooling that uses the rebase
> strategy (rather than merge), pretty much the same thing happens -
> both devs can push without knowing about each other's commits because
> the review tool's automation performs the rebase (that is, the "git
> pull" you described) for them.
>

Agreed. With GitHub PRs or Gerrit, the merge/rebase happens on the
remote server, so the local repo never runs git merge and therefore
doesn't need to fetch those blobs.

> But if you're not using tooling, yeah, Dev B needs to know which
> commit should come before his own commit, so he needs to fetch latest
> history, even though the only changes are from Dev A who was working
> somewhere else in the monorepo.
>
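
Without such tooling, the fetch-then-push cycle could at least be
scripted on the client. A rough sketch, assuming a rebase-based
workflow; the helper name push_with_retry and the default of 5 attempts
are arbitrary choices for illustration:

```shell
# Hypothetical helper (not part of git): rebase onto the upstream branch
# and push, retrying when someone else pushed in between.
#   $1 = remote (default origin), $2 = branch (default main),
#   $3 = maximum number of attempts (default 5)
push_with_retry() {
  remote=${1:-origin} branch=${2:-main} max=${3:-5}
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    # Pick up commits pushed by others and replay local work on top.
    # Stop right away if the rebase itself fails (e.g. a real conflict).
    git pull --rebase "$remote" "$branch" || return 1
    # Succeeds unless the remote branch moved again in the meantime.
    if git push "$remote" "HEAD:$branch"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

This does not remove the need to fetch the latest history; it only
automates the retry when a push is rejected.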

Thanks for the answer,
ZheNing Hu
