git@vger.kernel.org mailing list mirror (one of many)
From: "Robin H. Johnson" <robbat2@gentoo.org>
To: Git Mailing List <git@vger.kernel.org>
Subject: Bundles: Partial Clone & Shallow clone
Date: Tue, 27 Oct 2020 21:55:08 +0000
Message-ID: <robbat2-20201027T175013-504497035Z@orbis-terrarum.net>

Hi!

I was wondering if anybody has made a start towards easier ways to
create bundles of partial & shallow clones.

I'm looking at potential ways to improve the various snapshot &
incremental update mechanisms that we have available in Gentoo Linux's
tree distribution.

Right now, the various mechanisms available are:
- git tree WITHOUT generated metadata (git-daemon, smart-http)

Plus all of the following, WITH generated metadata:
- git tree (git-daemon OR smart-http)
- daily full snapshots in several formats/layouts
- deltas of full snapshots in some of the formats & layouts [1-day interval, no rollback]
- rsync:// tree [30-min intervals, no rollback]

Offline/air-gapped use cases are presently expected to use the
snapshots + deltas, but the 1-day cadence is longer than desirable.

Several of these formats cannot roll back unless the user kept the old
tree or the snapshot it was generated from.

rsync:// performance is heavily impacted by network latency and by the
large number of small objects being transferred.

What I'd like to offer instead are CDN-replicated bundles, generated at
regular cadences with the absolute minimal content, taken from the git
tree WITH generated metadata. The git tree would have tags for every
cadence point (ideally every 30 minutes, with potential pruning of old
tags). The bundles would be accompanied by separately published,
GPG-signed checksums, so that updates can be verified.
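A rough sketch of how such separately published, GPG-signed checksums could be produced and checked with stock coreutils and GnuPG commands. The directory layout, the manifest name SHA256SUMS and all function names are hypothetical, not an existing Gentoo tool.

```shell
#!/bin/sh
# Sketch only: assumes all bundles for one cadence point sit in a single
# directory and that the manifest is named SHA256SUMS (both hypothetical).

# Publisher: write a checksum manifest covering every *.bundle in $1.
write_bundle_checksums() {
    (cd "$1" && sha256sum -- *.bundle > SHA256SUMS)
}

# Publisher: detached, ASCII-armored signature of the manifest.
# $2 is the signing key ID; a usable secret key is required.
sign_bundle_checksums() {
    gpg --batch --yes --local-user "$2" --armor --detach-sign \
        --output "$1/SHA256SUMS.asc" "$1/SHA256SUMS"
}

# Client: check every listed bundle against the manifest.
check_bundle_checksums() {
    (cd "$1" && sha256sum --check --quiet SHA256SUMS)
}

# Client: trust the manifest only if its signature is good, then check it.
verify_bundle_checksums() {
    gpg --verify "$1/SHA256SUMS.asc" "$1/SHA256SUMS" &&
        check_bundle_checksums "$1"
}
```

The client checks the signature before trusting any listed checksum, so an altered manifest is rejected before any bundle is handed to git.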

These would come in two variants:
1. Daily full snapshots, equivalent to a depth=1 clone.
2. 30-min & daily incremental bundles, using partial clone (these need
   to include the new blobs that would be present when up to date, plus
   knowledge of which files can be deleted).
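For the incremental variant, the list of what such a bundle would minimally carry (blobs that are new or changed at the newer tag, plus paths deleted since the older one) can already be computed with git diff-tree. This sketch assumes both cadence tags exist in one repository; list_minimal_update is a hypothetical helper that only prints the list and does not build a bundle from it.

```shell
#!/bin/sh
# Hypothetical helper: $1 = repository, $2 = older cadence tag,
# $3 = newer cadence tag. Prints one line per item:
#   need <blob-id> <path>   content added, modified or type-changed at $3
#   delete <path>           path present at $2 but gone at $3
# Paths containing tabs, newlines or quoted characters are not handled.
list_minimal_update() {
    # Raw diff-tree lines look like
    #   :<old mode> <new mode> <old id> <new id> <status><TAB><path>
    # so the new object id is the 4th space-separated field before the tab.
    git -C "$1" diff-tree -r --no-renames --diff-filter=AMT "$2" "$3" |
        awk -F '\t' '{ split($1, meta, " "); print "need", meta[4], $2 }'
    git -C "$1" diff-tree -r --no-renames --diff-filter=D --name-only "$2" "$3" |
        sed 's/^/delete /'
}
```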

This should let users load a consistent set of daily or 30-min bundles
into their gitdir and hop between those tags [there would be no gaps].
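On the consumer side, loading bundles and detecting gaps could look roughly like this. It assumes the bundles are applied oldest first and that each one advertises its cadence tag under refs/tags/; apply_bundles is a hypothetical name. git bundle verify fails when a prerequisite commit is missing, which is exactly the gap case.

```shell
#!/bin/sh
# Hypothetical consumer-side helper: $1 = local repository, remaining
# arguments = bundle files, oldest first. Assumes each bundle carries its
# cadence tag(s) under refs/tags/.
apply_bundles() (
    repo=$1
    shift
    for b in "$@"; do
        # git -C changes directory, so resolve relative bundle paths first.
        case $b in /*) ;; *) b=$PWD/$b ;; esac
        # verify fails on a missing prerequisite commit (a gap) or a bad file.
        if ! git -C "$repo" bundle verify "$b" >/dev/null 2>&1; then
            echo "gap or unusable bundle at $b: fill in online" >&2
            exit 1
        fi
        git -C "$repo" fetch --quiet "$b" 'refs/tags/*:refs/tags/*' || exit 1
    done
)
```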

If they did have gaps, due to a missing bundle or wanting a commit
between the cadence points, and were online, they could use the
partial-clone mechanisms to fill in their tree as needed.

Right now, I can naively generate the snapshots by explicitly making a
new detached shallow clone & then generating a bundle of that. 
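A minimal sketch of that two-step workaround, assuming the source repository is a local path and the cadence point is a tag; make_snapshot_bundle is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper mirroring the clone-then-bundle workaround:
#   $1 = source repository path, $2 = cadence tag, $3 = output bundle.
# The body runs in a subshell so its variables stay local.
make_snapshot_bundle() (
    src=$1 tag=$2 out=$3
    # Make both paths absolute: file:// needs it, and git -C changes cwd.
    case $src in /*) ;; *) src=$PWD/$src ;; esac
    case $out in /*) ;; *) out=$PWD/$out ;; esac
    tmp=$(mktemp -d) || exit 1
    # --depth is ignored for plain local paths, hence the file:// URL.
    # --branch also accepts a tag; --bare skips the checkout entirely.
    if git clone --quiet --bare --depth=1 --branch "$tag" \
            "file://$src" "$tmp/snap.git" &&
        git -C "$tmp/snap.git" bundle create "$out" "$tag"
    then
        rc=0
    else
        rc=1
    fi
    rm -rf "$tmp"
    exit "$rc"
)
```

The temporary shallow clone exists only so that git bundle create stops walking history at the tagged commit.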

Incremental bundles are already possible, but presently include all
commits, trees & blobs between two points.
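For comparison, a plain incremental bundle between two cadence tags is a single git bundle create call with a revision range, as in this sketch (make_incremental_bundle is a hypothetical wrapper).

```shell
#!/bin/sh
# Hypothetical helper: $1 = repository, $2 = previous cadence tag,
# $3 = new cadence tag, $4 = output bundle path.
make_incremental_bundle() (
    repo=$1 old=$2 new=$3 out=$4
    case $out in /*) ;; *) out=$PWD/$out ;; esac
    # "$old..$new" leaves out everything reachable from $old; the commit
    # $old points at is recorded as a prerequisite in the bundle header,
    # and refs/tags/$new is recorded as the bundle's head.
    # Every commit, tree and blob introduced in between is still packed,
    # including blobs that were superseded again before $new.
    git -C "$repo" bundle create "$out" "$old..$new"
)
```

A repository that already has the older tag can fetch from the resulting file; an empty repository cannot, since it lacks the prerequisite commit.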

The bundles are already generally smaller than our prior snapshots and
deltas, but I'm looking to make the process easier and cover the
remaining gaps [if we have a high-change period, the bundle winds up
bigger than the equivalent snapshot deltas, because it carries the
intermediate blobs].

Ask 1. Ability to generate a shallow bundle without the intermediate clone step.

Ask 2. For incremental bundles, the ability to exclude blobs not needed by
       the latest commit (and its tree).

I think both of these would be possible by adding some variant of the
--filter mechanism to git-bundle.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

Thread overview: 2+ messages
2020-10-27 21:55 Robin H. Johnson [this message]
2020-10-30  8:19 ` Bundles: Partial Clone & Shallow clone Christian Couder
