From: Eric Wong <normalperson@yhbt.net>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH] pack-objects: warn on split packs disabling bitmaps
Date: Thu, 28 Apr 2016 08:02:28 +0000
Message-ID: <20160428080228.GB5252@dcvr.yhbt.net>
In-Reply-To: <20160428022514.GC9707@sigill.intra.peff.net>

Jeff King <peff@peff.net> wrote:
> On Wed, Apr 27, 2016 at 09:53:24PM +0000, Eric Wong wrote:
>
> > It can be tempting for a server admin to want a stable set of
> > long-lived packs for dumb clients; but also want to enable
> > bitmaps to serve smart clients more quickly.
>
> But I did want to mention one thing, which is that long-lived split
> packs are a tradeoff, even for dumb clients. The pack format cannot do
> deltas between packs, so the sum of your split packs is larger than a
> single pack would be. That's a good thing for somebody who cloned
> earlier, and wants to fetch only a few small packs on top. But it's much worse
> for somebody who wants to do a fresh clone, and has to grab all of the
> packs either way.

Definitely a tradeoff, but a fresh clone with split packs might only
be (at worst) doubling or tripling bandwidth use on both sides?
However, the CPU/memory cost of packing is at least an order of
magnitude (more likely several orders of magnitude) more
expensive on the server. The client most likely won't care
about CPU/memory usage, though.
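
To put rough numbers on the bandwidth side of that tradeoff, here is a
minimal sketch. The pack sizes are made up for illustration (not
measurements of any real repo), and the helper names are hypothetical.
It assumes a dumb client always downloads whole packs, so a server that
repacks everything into one pack forces an existing dumb client to
fetch that entire pack again.

```python
def clone_bytes(pack_sizes):
    """Bytes a dumb client downloads for a fresh clone: every pack, in full."""
    return sum(pack_sizes)


def incremental_bytes(pack_sizes, packs_already_held):
    """Bytes a dumb client downloads when it already holds the first N packs."""
    return sum(pack_sizes[packs_already_held:])


# Hypothetical sizes in MB. The split packs sum to more than the single
# pack because objects cannot be stored as deltas against another pack.
SINGLE_PACK = [800]
SPLIT_PACKS = [700, 500, 400]

# Fresh clone: split packs cost more bandwidth.
fresh_ratio = clone_bytes(SPLIT_PACKS) / clone_bytes(SINGLE_PACK)

# Returning client that already has the two oldest split packs only
# needs the newest one; with a freshly repacked single pack it would
# have nothing reusable and would download all of it.
split_update = incremental_bytes(SPLIT_PACKS, 2)
single_update = incremental_bytes(SINGLE_PACK, 0)

print(f"fresh clone costs {fresh_ratio:.1f}x with split packs")
print(f"update: {split_update} MB (split) vs {single_update} MB (single)")
```

With these made-up sizes the fresh clone lands at the "doubling" end of
the guess above, while the returning dumb client fetches half as much.
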
> > Fwiw, I'm hoping to publish an ~800MB git-clone-able repo of
> > our ML archives, soonish. I can serve terabytes of dumb HTTP
> > traffic all day long without breaking a sweat; but smart
> > packing of big repos worries me; especially when feeding
> > slow clients and having to leave processes running
> > (or buffering pack output to disk). So perhaps I'll teach
> > my HTTP server to play dumb whenever CPU/memory usage is high.
>
> Yeah, CPU and memory load for serving large clones is a problem. Memory
> especially scales with number of objects (because we keep the whole
> packing list in memory for the entirety of the write). At GitHub, we
> have some changes to try to serve things verbatim from the on-disk pack
> without even creating an in-memory list of objects (it's just a bitmap
> of which objects in the packfile to send), and that reduces CPU and
> memory load quite a bit. Cleaning up and submitting those patches has
> been on my todo list for a while, but I just haven't gotten to it. I'm
> of course happy to share the messy state if you want to pick through it
> yourself.

Sure thing! I can't promise I'll have time, either, but being
able to serve packs verbatim would be great; especially if you
could multiplex it with epoll/kqueue for folks on slow pipes
(and maybe use sendfile, but perhaps that's not worth the effort
with TLS everywhere nowadays).
I was also wondering if fresh clones could be memoized entirely.
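
One possible reading of "memoized", sketched under assumptions: cache
the pack built for a fresh clone, keyed by a digest of the current ref
tips, and rebuild only when some ref moves. Nothing here is git's or
anyone's actual implementation; `refs_key`, `CloneCache` and the
`build_pack` callable are hypothetical names, and the pack is held in
memory only to keep the example short.

```python
import hashlib


def refs_key(refs):
    """Stable digest of a {refname: object-id} mapping."""
    h = hashlib.sha256()
    for name in sorted(refs):
        h.update(f"{refs[name]} {name}\n".encode())
    return h.hexdigest()


class CloneCache:
    """Remember the last full-clone pack until any ref changes."""

    def __init__(self, build_pack):
        # build_pack is a hypothetical callable: refs dict -> pack bytes.
        self._build_pack = build_pack
        self._key = None
        self._pack = None
        self.builds = 0  # how many times the expensive step actually ran

    def pack_for(self, refs):
        key = refs_key(refs)
        if key != self._key:
            # Ref tips differ from the cached state: pay the packing cost once.
            self._pack = self._build_pack(refs)
            self._key = key
            self.builds += 1
        return self._pack
```

A server would call `pack_for()` with the current refs on each fresh
clone; every clone between two pushes then reuses the same bytes.
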