git@vger.kernel.org mailing list mirror (one of many)
From: Sean Allred <allred.sean@gmail.com>
To: 程洋 <chengyang@xiaomi.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	姜浩哲 <jianghaozhe1@xiaomi.com>
Subject: Re: Git fetch slow on local repository with 600k refs
Date: Tue, 14 Mar 2023 09:29:31 -0500
Message-ID: <m0wn3jxu7g.fsf@epic96565.epic.com>
In-Reply-To: <e28a23e8eb044d26947462b8619e88bd@xiaomi.com>


程洋 <chengyang@xiaomi.com> writes:

> We run a Gerrit server cluster and use the pull-replication
> plugin to sync changes between master and slave.
>
> When a change is pushed to the master, it notifies the slave, and
> the slave fetches it from the master.
>
> But we found that in a big repository with 600k refs, a fetch takes
> 5-10 seconds even when fetching a 1-byte change. Here is the
> GIT_TRACE2_PERF
>
> I did an experiment fetching a ref that my slave already has, and we
> can see that git rev-list takes 2 seconds to run. (I guess it tries
> to find the remote object among the reachable objects of the local
> refs, one by one.)
>
> Is there any way to optimize this situation?

Do you need all those refs as refs -- or are you just looking to keep
the commits?

For the latter, we found a rather clever solution that we're looking to
upstream at some point: collect all refs into a single 'archive' ref
whose commits are fake merge commits (there's no actual conflict
resolution happening -- we just use the same tree over and over). We
make each commit message look like show-ref output. For example:

A single ref (refs/archive) pointing to commit (A), with contents

    tree <some arbitrary tree>
    parent <B> [... 500 other commits 'merged' in ...]
    author <system user>
    committer <system user>

    deadbeef0123456788... refs/tags/very/old/release-1
    deadbeef0123456789... refs/tags/very/old/release-2
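
A minimal sketch of how the text of such an archive commit could be
assembled, written in Python purely for illustration. This is not the
tooling described above: the function name build_archive_commit, its
parameters, and the shape of the identity string are all hypothetical.
It assumes the caller already has a valid tree ID and a list of commit
IDs to keep reachable.

```python
from typing import Iterable, Tuple


def build_archive_commit(
    tree: str,
    parents: Iterable[str],
    refs: Iterable[Tuple[str, str]],
    ident: str,
) -> str:
    """Return the text of a commit object that 'merges' the archived commits.

    tree    -- any valid tree ID; the same one can be reused every time
    parents -- commit IDs kept reachable by this archive commit
    refs    -- (object ID, ref name) pairs to record, show-ref style
    ident   -- identity used for both author and committer (hypothetical;
               a real commit needs "Name <email> <timestamp> <tz>")
    """
    header = [f"tree {tree}"]
    header += [f"parent {p}" for p in parents]
    header += [f"author {ident}", f"committer {ident}"]
    # One "<oid> <refname>" line per archived ref, like `git show-ref` prints.
    body = [f"{oid} {name}" for oid, name in refs]
    # A blank line separates the commit headers from the message.
    return "\n".join(header) + "\n\n" + "\n".join(body) + "\n"
```

One plausible way to use the result would be to write it with
`git hash-object -t commit -w --stdin` and then point refs/archive at
the new object with `git update-ref`; whether the approach described
above does it that way is not stated.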

When we want to pull a ref out of the archive, we have a process in
place to do so. This keeps the total number of refs down and the
fetch/push performance within acceptable limits.
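
The process for pulling a ref back out is not described here. As a
rough sketch under that caveat, looking up an archived ref could amount
to parsing the show-ref-style commit message. The function names
parse_archive_message and lookup_archived_ref are hypothetical, and the
sketch assumes every recorded line has the form "<oid> <refname>".

```python
from typing import Dict, Optional


def parse_archive_message(message: str) -> Dict[str, str]:
    """Map ref name -> object ID from a show-ref-style commit message."""
    refs: Dict[str, str] = {}
    for line in message.splitlines():
        oid, sep, name = line.strip().partition(" ")
        # Skip blank or malformed lines rather than guessing.
        if sep and name.startswith("refs/"):
            refs[name] = oid
    return refs


def lookup_archived_ref(message: str, name: str) -> Optional[str]:
    """Return the object ID recorded for `name`, or None if not archived."""
    return parse_archive_message(message).get(name)
```

The message itself could be read with something like
`git log -1 --format=%B refs/archive`, and a recovered ID could be
turned back into a real ref with `git update-ref <name> <oid>`.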

--
Sean Allred
