git@vger.kernel.org mailing list mirror (one of many)
From: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
To: git@vger.kernel.org
Cc: tools@linux.kernel.org
Subject: grokmirror-2.0 is available
Date: Mon, 21 Sep 2020 13:06:51 -0400
Message-ID: <20200921170651.aszbydzvnj7l4y2w@chatter.i7.local>

Hello:

I am pleased to announce version 2.0 of kernel.org's git mirroring
software, grokmirror. This is a major rewrite that intentionally breaks
the upgrade path from grokmirror-1.x, because significant backend changes
require careful consideration by replica administrators. Please see
the UPGRADING.rst document provided with this release.

## New in grokmirror-2.0

- Drop support for Python < 3.6
- Introduce "object storage" repositories that benefit from git-pack
  delta islands and reduce the overall disk storage footprint (results
  depend directly on the number of forks).
- Drop dependency on GitPython: use git calls directly for all operations
- Remove progress bars to slim down dependencies (drops enlighten)
- Make grok-pull operate in daemon mode with -o (see contrib for
  systemd unit files). This is more efficient than cron mode when
  updates run very frequently.
- Provide a socket listener for pubsub push updates (see contrib for
  Google pubsubv1.py).
- Merge fsck.conf and repos.conf into a single config file. This
  requires creating a new configuration file after the upgrade. See
  UPGRADING.rst for details.
- Record and propagate HEAD position using the manifest file.
- Add grok-bundle command to create clone.bundle files for CDN-offloaded
  cloning (mostly used by Android's repo command).
- Add SELinux policy for EL7 (see contrib).
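To illustrate the HEAD-propagation item, a replica could read each repository's HEAD from the manifest and apply it locally with `git symbolic-ref`. This is a minimal sketch, not grokmirror's code: it assumes the manifest has already been decompressed into a JSON object keyed by repository path, and the `head` field name and both helper functions are hypothetical.

```python
import json
import subprocess


def heads_from_manifest(manifest_text):
    """Return {repo_path: target_ref} for manifest entries that record HEAD.

    ASSUMPTION: each entry stores HEAD under a hypothetical "head" key, in
    the same "ref: refs/heads/<branch>" form git writes to a HEAD file.
    Entries without that key are skipped.
    """
    heads = {}
    for repo, info in json.loads(manifest_text).items():
        head = info.get("head")
        if isinstance(head, str) and head.startswith("ref: "):
            heads[repo] = head[len("ref: "):].strip()
    return heads


def apply_head(git_dir, target_ref):
    """Point HEAD of the local replica at target_ref (requires git)."""
    subprocess.run(
        ["git", "--git-dir", git_dir, "symbolic-ref", "HEAD", target_ref],
        check=True,
    )


if __name__ == "__main__":
    sample = '{"/pub/scm/git/git.git": {"head": "ref: refs/heads/master"}}'
    print(heads_from_manifest(sample))
```

Using `git symbolic-ref` rather than writing the HEAD file by hand lets git validate the ref name.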

## Object Storage Repositories

Grokmirror 2.0 introduces the concept of "object storage repositories", which
aim to optimize how repository forks are stored on disk and served to
cloning clients.

When grok-fsck runs, it will automatically recognize related repositories by
analyzing their root commits. If it finds two or more related repositories, it
will set up a unified "object storage" repo and fetch all refs from each
related repository into it.
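The root-commit matching described above can be sketched as follows. This is an illustration under assumptions, not grokmirror's implementation: it treats two repositories as related when they share any root commit, groups them transitively with a small union-find, and uses hypothetical helper names.

```python
import subprocess


def root_commits(git_dir):
    """Return the set of parentless (root) commits reachable from any ref."""
    out = subprocess.run(
        ["git", "--git-dir", git_dir, "rev-list", "--max-parents=0", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(out.split())


def group_related(roots_by_repo):
    """Group repositories that share a root commit, transitively.

    roots_by_repo maps repo name -> set of root commit IDs. Only groups
    with two or more members are returned, since a lone repository has
    nothing to share an object storage repo with.
    """
    parent = {repo: repo for repo in roots_by_repo}

    def find(repo):
        while parent[repo] != repo:
            parent[repo] = parent[parent[repo]]  # path halving
            repo = parent[repo]
        return repo

    first_owner = {}  # root commit -> first repository seen with it
    for repo, roots in roots_by_repo.items():
        for root in roots:
            if root in first_owner:
                parent[find(repo)] = find(first_owner[root])
            else:
                first_owner[root] = repo

    groups = {}
    for repo in roots_by_repo:
        groups.setdefault(find(repo), set()).add(repo)
    return [g for g in groups.values() if len(g) > 1]
```

In use, `root_commits()` would be called for every repository on disk and the resulting mapping passed to `group_related()`.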

For example, suppose you have the mainline repository:

  torvalds/linux.git:
    refs/heads/master
    refs/tags/v5.0-rc3
    ...

and its fork:

  maintainer/linux.git:
    refs/heads/master
    refs/heads/devbranch
    refs/tags/v5.0-rc3
    ...

Grok-fsck will set up an object storage repository and fetch all refs from both
repositories:

  objstore/[random-guid-name].git
     refs/virtual/[sha1-of-torvalds/linux.git:12]/heads/master
     refs/virtual/[sha1-of-torvalds/linux.git:12]/tags/v5.0-rc3
     ...
     refs/virtual/[sha1-of-maintainer/linux.git:12]/heads/master
     refs/virtual/[sha1-of-maintainer/linux.git:12]/heads/devbranch
     refs/virtual/[sha1-of-maintainer/linux.git:12]/tags/v5.0-rc3
     ...
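The refs/virtual layout shown above can be sketched in a few lines. This assumes the 12-character prefix is the SHA-1 of the repository name as UTF-8 text; the exact string grokmirror hashes is not stated here, and the helper names are hypothetical. The fetch uses a standard glob refspec to copy every ref of a sibling into its own namespace.

```python
import hashlib


def virtual_prefix(repo_name, length=12):
    """Short SHA-1 hex digest of the repository name.

    ASSUMPTION: the hash input is the repository name, e.g.
    "torvalds/linux.git", encoded as UTF-8.
    """
    return hashlib.sha1(repo_name.encode("utf-8")).hexdigest()[:length]


def virtual_ref(repo_name, ref):
    """Map refs/heads/master -> refs/virtual/<prefix>/heads/master."""
    if not ref.startswith("refs/"):
        raise ValueError(f"not a full refname: {ref}")
    return f"refs/virtual/{virtual_prefix(repo_name)}/{ref[len('refs/'):]}"


def fetch_into_objstore_cmd(objstore_dir, repo_dir, repo_name):
    """git command that copies every ref of repo_dir into its namespace."""
    refspec = f"+refs/*:refs/virtual/{virtual_prefix(repo_name)}/*"
    return ["git", "--git-dir", objstore_dir, "fetch", repo_dir, refspec]
```

The leading `+` in the refspec allows non-fast-forward updates, since a fork's branches may be rebased between runs.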

Then both torvalds/linux.git and maintainer/linux.git will be configured to use
objstore/[random-guid-name].git via objects/info/alternates and repacked so
that they contain only metadata and no objects of their own.
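The alternates step can be sketched as follows, assuming the object storage repository already holds every object (after the fetch into it). The helper names are hypothetical. `git repack -l` passes `--local` to `git pack-objects`, so objects available through alternates are left out of the new local pack.

```python
import os
import subprocess
import tempfile


def set_alternates(repo_dir, objstore_dir):
    """Make repo_dir borrow objects from objstore_dir via objects/info/alternates.

    The alternates file lists one objects directory per line; an absolute
    path is written here to avoid relative-path surprises.
    """
    info_dir = os.path.join(repo_dir, "objects", "info")
    os.makedirs(info_dir, exist_ok=True)
    target = os.path.abspath(os.path.join(objstore_dir, "objects"))
    with open(os.path.join(info_dir, "alternates"), "w", encoding="utf-8") as fh:
        fh.write(target + "\n")
    return target


def repack_without_borrowed_objects(repo_dir):
    """Repack so the new pack omits objects available through alternates."""
    # -a: repack everything into one pack; -d: delete now-redundant packs;
    # -l: pass --local to pack-objects, skipping objects borrowed via alternates.
    subprocess.run(
        ["git", "--git-dir", repo_dir, "repack", "-a", "-d", "-l"], check=True
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(set_alternates(os.path.join(tmp, "linux.git"),
                             os.path.join(tmp, "objstore", "example.git")))
```

Order matters: the alternates file must be in place before repacking, otherwise `-l` has nothing to skip.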

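The alternates repository will be repacked with "delta islands" enabled.
Delta islands stop git from storing an object as a delta against a base
that is not reachable from the same "sibling" repository, which should
help optimize clone operations for each sibling.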
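Git configures delta islands with the `pack.island` option: each value is a left-anchored regular expression matched against ref names, and the captured groups name the island. The sketch below approximates that assignment and builds the repack commands. The island regex is an assumption chosen to match the refs/virtual example refs above, not grokmirror's actual setting, and Python's `re` stands in for git's extended regular expressions.

```python
import re

# ASSUMPTION: one island per sibling, keyed on the hex prefix in
# refs/virtual/<prefix>/...; grokmirror's real pack.island value may differ.
ISLAND_REGEX = r"refs/virtual/([0-9a-f]+)/"


def island_for(refname, patterns):
    """Approximate git's island assignment for a single ref.

    Patterns are left-anchored (re.match), captured groups are joined with
    "-" to form the island name, and when several patterns match, the last
    one wins. Returns None for refs that belong to no island.
    """
    island = None
    for pattern in patterns:
        match = re.match(pattern, refname)
        if match:
            island = "-".join(g for g in match.groups() if g is not None)
    return island


def island_repack_cmds(objstore_dir, pattern=ISLAND_REGEX):
    """git commands that set pack.island and repack with delta islands."""
    return [
        ["git", "--git-dir", objstore_dir, "config", "pack.island", pattern],
        ["git", "--git-dir", objstore_dir, "repack", "-a", "-d",
         "--delta-islands"],
    ]
```

With this pattern, heads and tags of the same sibling land in one island, so a clone of that sibling can reuse stored deltas without pulling in bases from other forks.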

Please see the example grokmirror.conf for more details about configuring
objstore repositories.

## Space savings using object storage repositories

Any disk space savings will depend on how many repositories are forks of
each other. For git.kernel.org, which was already aggressively using
alternates for all linux.git forks, we saw a reduction from 60GB to 20GB
for the entirety of git.kernel.org content. On some of the
codeaurora.org systems, especially those containing many pre-release
forks of entire AOSP repo collections, we saw space usage go from 3TB to
under 1TB.

## Stability

This release has proven quite stable and has been running on
git.kernel.org and a subset of codeaurora.org systems for over a
month. However, since the trickiest part is the initial conversion of
repositories to object storage repos, we urge you to proceed with
caution. Please study the UPGRADING.rst document before making any
changes to your infrastructure.

Please send all support questions to tools@linux.kernel.org.

Best regards,
Konstantin
