git@vger.kernel.org mailing list mirror (one of many)
From: Caleb Gray <hey@calebgray.com>
To: git@vger.kernel.org
Subject: Re: Add a "Flattened Cache" to `git --clone`?
Date: Thu, 14 May 2020 14:33:06 -0700
Message-ID: <CAGjfG9bsQh2C6WP242v4LoiaSdghZDPuqns0VO82Txe-V54_KA@mail.gmail.com>
In-Reply-To: <20200514210501.GY1596452@mit.edu>

To clarify: I'm talking about a server-side-only cache which behaves
much like a `tar` file: it is a flat version of exactly(*) what ends
up on the client's storage. When a client runs `git clone` and
there's a valid cache on the other end, that's all that gets streamed.
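
A minimal sketch of the invalidation rule described above, assuming the
cache is keyed by the commit ID at the tip of the one cached branch.
Nothing here is part of git: the class, its methods, and `build_flat`
(a stand-in for whatever would produce the flattened stream) are all
hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FlattenedCache:
    """Toy model of a server-side clone cache for a single branch (hypothetical)."""

    build_flat: Callable[[str], bytes]  # assumed: produces the flat stream for a tip
    cached_tip: Optional[str] = None
    payload: Optional[bytes] = None
    rebuilds: int = 0

    def is_valid(self, current_tip: str) -> bool:
        # The cache is valid only if it was built for the current branch tip.
        return self.payload is not None and self.cached_tip == current_tip

    def serve_clone(self, current_tip: str) -> bytes:
        # On a miss (first clone, or a push moved the tip), rebuild while serving.
        if not self.is_valid(current_tip):
            self.payload = self.build_flat(current_tip)
            self.cached_tip = current_tip
            self.rebuilds += 1
        return self.payload


# Made-up tip IDs; a real server would read them from the repository.
cache = FlattenedCache(build_flat=lambda tip: f"flat:{tip}".encode())
cache.serve_clone("aaa111")  # first clone builds the cache
cache.serve_clone("aaa111")  # second clone streams the cached bytes as-is
cache.serve_clone("bbb222")  # a push moved the tip, so the cache is rebuilt
print(cache.rebuilds)
```

In this model the only server-side work on a cache hit is comparing two
commit IDs; everything else is streaming stored bytes.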

Konstantin's point that a repo like Linux is bound to see little/no
benefit (in fact, it'll just constantly invalidate/rewrite the ~1 GB
cache) is reasonable. This feature definitely targets the "niche"
audience of repos that see fewer pushes to master than clones.
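
To make that trade-off concrete, here is a rough back-of-the-envelope
sketch. It assumes every push to the cached branch invalidates the cache
and that the first clone after each push pays for the rebuild; the
function name and the sample event sequences are made up for
illustration, not measurements.

```python
def cache_hit_ratio(events: str) -> float:
    """Fraction of clones served from cache.

    `events` is a string of 'c' (clone) and 'p' (push to the cached branch).
    """
    valid = False
    hits = 0
    clones = 0
    for ev in events:
        if ev == "p":
            valid = False  # assumed: any push invalidates the cache
        elif ev == "c":
            clones += 1
            if valid:
                hits += 1
            valid = True  # a miss rebuilds the cache while serving
    return hits / clones if clones else 0.0


# Rarely pushed repo: a single push in the middle of 100 clones.
quiet = cache_hit_ratio("c" * 50 + "p" + "c" * 50)
# Busy repo: a push lands after every clone, so no clone finds a valid cache.
busy = cache_hit_ratio("cp" * 50)
print(f"quiet={quiet:.2f} busy={busy:.2f}")
```

Under these assumptions the cache only pays off when clones clearly
outnumber pushes, which is the niche described above.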

Bryan is exactly on the right track for what I'm referring to: the CDN
approach did come to mind (and is superior in nearly every way).

Junio nailed it: I'm not hoping for anything revolutionary here, just
hoping to reduce the redundant steps in clone down to a single
(presumably faster) step.

If the community agrees that a "cache for master and that's all" is too
limited to be of much benefit, I'm also more than capable of designing a
more useful (and more complex) graph/reduce-based solution which could
dynamically bundle the most statistically relevant data for whatever
context the code is working in, though I can't commit to any sort of
deadline for that sort of contribution.



On Thu, May 14, 2020 at 2:05 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> On Thu, May 14, 2020 at 04:33:26PM -0400, Konstantin Ryabitsev wrote:
> > > Assuming my idea doesn't contradict other best practices or standards
> > > already in place,  I'd like to transform the typical `git clone` flow
> > > from:
> > >
> > >  Cloning into 'linux'...
> > >  remote: Enumerating objects: 4154, done.
> > >  remote: Counting objects: 100% (4154/4154), done.
> > >  remote: Compressing objects: 100% (2535/2535), done.
> > >  remote: Total 7344127 (delta 2564), reused 2167 (delta 1612),
> > > pack-reused 7339973
> > >  Receiving objects: 100% (7344127/7344127), 1.22 GiB | 8.51 MiB/s, done.
> > >  Resolving deltas: 100% (6180880/6180880), done.
> > >
> > > To subsequent clones (until cache invalidated) using the "flattened
> > > cache" version (presumably built while fulfilling the first clone
> > > request above):
> > >
> > >  Cloning into 'linux'...
> > >  Receiving cache: 100% (7344127/7344127), 1.22 GiB | 8.51 MiB/s, done.
> >
> > I don't think it's a common workflow for someone to repeatedly clone
> > linux.git. Automated processes like CI would be doing it, but they tend
> > to blow away the local disk between jobs, so they are unlikely to
> > benefit from any native git local cache for something like this (in
> > fact, we recommend that people use clone.bundle files for their CI
> > needs, as described here:
> > https://www.kernel.org/best-way-to-do-linux-clones-for-your-ci.html).
>
> If the goal is a git local cache, we have this today.  I'm not sure
> this is what Caleb was asking for, though:
>
> git clone --bare https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git base
> git clone --reference base https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git ext4
>
>                                                         - Ted


Thread overview: 17+ messages
2020-05-14 14:34 Add a "Flattened Cache" to `git --clone`? Caleb Gray
2020-05-14 20:33 ` Konstantin Ryabitsev
2020-05-14 20:54   ` Bryan Turner
2020-05-14 21:05   ` Theodore Y. Ts'o
2020-05-14 21:09     ` Eric Sunshine
2020-05-14 21:10     ` Konstantin Ryabitsev
2020-05-14 21:23       ` Junio C Hamano
2020-05-14 21:44         ` Konstantin Ryabitsev
2020-05-15 21:42           ` Eric Wong
2020-05-17 22:12             ` Konstantin Ryabitsev
     [not found]               ` <1061511589863147@mail.yandex.ru>
2020-05-25 14:02                 ` Caleb Gray
2020-05-14 21:33     ` Caleb Gray [this message]
2020-05-14 21:56       ` Junio C Hamano
2020-05-14 22:04         ` Caleb Gray
2020-05-14 22:30           ` Junio C Hamano
2020-05-14 22:44           ` Bryan Turner
2020-05-14 21:19   ` Junio C Hamano
