From: Eric Wong <normalperson@yhbt.net>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH] pack-objects: warn on split packs disabling bitmaps
Date: Thu, 28 Apr 2016 08:02:28 +0000
Message-ID: <20160428080228.GB5252@dcvr.yhbt.net>
In-Reply-To: <20160428022514.GC9707@sigill.intra.peff.net>

Jeff King <peff@peff.net> wrote:
> On Wed, Apr 27, 2016 at 09:53:24PM +0000, Eric Wong wrote:
> 
> > It can be tempting for a server admin to want a stable set of
> > long-lived packs for dumb clients; but also want to enable
> > bitmaps to serve smart clients more quickly.

> But I did want to mention one thing, which is that long-lived split
> packs are a tradeoff, even for dumb clients. The pack format cannot do
> deltas between packs, so the sum of your split packs is larger than a
> single pack would be. That's a good thing for somebody who cloned
> earlier, and wants to only grab a few small packs on top. But it's much worse
> for somebody who wants to do a fresh clone, and has to grab all of the
> packs either way.

Definitely a tradeoff, but a fresh clone with split packs might only
be (at worst) doubling or tripling bandwidth use on both sides?

However, the CPU/memory cost of packing is at least an order of
magnitude (more likely several orders of magnitude) more
expensive on the server.  The client most likely won't care
about CPU/memory usage, though.

> >  Fwiw, I'm hoping to publish an ~800MB git-clone-able repo of
> >  our ML archives, soonish.  I can serve terabytes of dumb HTTP
> >  traffic all day long without breaking a sweat; but smart
> >  packing of big repos worries me; especially when feeding
> >  slow clients and having to leave processes running
> >  (or buffering pack output to disk).  So perhaps I'll teach
> >  my HTTP server to play dumb whenever CPU/memory usage is high.
> 
> Yeah, CPU and memory load for serving large clones is a problem. Memory
> especially scales with number of objects (because we keep the whole
> packing list in memory for the entirety of the write). At GitHub, we
> have some changes to try to serve things verbatim from the on-disk pack
> without even creating an in-memory list of objects (it's just a bitmap
> of which objects in the packfile to send), and that reduces CPU and
> memory load quite a bit. Cleaning up and submitting those patches has
> been on my todo list for a while, but I just haven't gotten to it. I'm
> of course happy to share the messy state if you want to pick through it
> yourself.

Sure thing!  I can't promise I'll have time, either, but being
able to serve packs verbatim would be great; especially if you
could multiplex it with epoll/kqueue for folks on slow pipes
(and maybe use sendfile, but perhaps that's not worth the effort
 with TLS everywhere nowadays).
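
To make the multiplexing idea concrete, here is a rough sketch in
Python, not anything from git or from GitHub's patches.  It assumes
the pack on disk can be sent byte-for-byte, so the only per-client
state is an offset into the file.  The function name and CHUNK size
are made up for illustration; selectors.DefaultSelector picks
epoll or kqueue where the platform has them.  It uses plain send()
rather than sendfile.

```python
import os
import selectors

CHUNK = 64 * 1024  # bytes offered to a client per wakeup; arbitrary choice


def serve_verbatim(path, clients):
    """Stream the file at `path` to every connected socket in `clients`.

    Hypothetical helper: one thread, one open file, and one integer
    offset per client, so slow readers do not tie up a process each.
    """
    size = os.path.getsize(path)
    with selectors.DefaultSelector() as sel, open(path, "rb") as pack:
        for sock in clients:
            sock.setblocking(False)
            sel.register(sock, selectors.EVENT_WRITE, data={"offset": 0})
        pending = len(clients)
        while pending:
            # Only sockets that can accept more data are returned here.
            for key, _events in sel.select():
                state = key.data
                pack.seek(state["offset"])
                done = False
                try:
                    state["offset"] += key.fileobj.send(pack.read(CHUNK))
                    done = state["offset"] >= size
                except BlockingIOError:
                    continue  # buffer filled up again; wait for next wakeup
                except (BrokenPipeError, ConnectionResetError):
                    done = True  # client went away; stop tracking it
                if done:
                    sel.unregister(key.fileobj)
                    pending -= 1
```

A real server would also be accepting new connections in the same
loop and would need to deal with TLS, which is where sendfile stops
helping.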

I was also wondering if fresh clones could be memoized entirely.
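
By "memoized" I mean something like the following sketch (Python,
with hypothetical names throughout; nothing here is an existing git
interface).  It assumes a fresh clone is fully determined by the set
of ref tips being advertised, so the packed result can be reused
until any ref changes.  build_pack stands in for whatever expensive
step actually produces the pack.

```python
import hashlib


class CloneCache:
    """Reuse the pack built for a fresh clone until the refs change."""

    def __init__(self, build_pack):
        self._build_pack = build_pack  # hypothetical: refs dict -> bytes
        self._key = None
        self._pack = None

    @staticmethod
    def key_for(refs):
        # Hash the sorted "oid name" lines so the key does not depend
        # on the order in which refs were enumerated.
        digest = hashlib.sha1()
        for name, oid in sorted(refs.items()):
            digest.update(f"{oid} {name}\n".encode())
        return digest.hexdigest()

    def get(self, refs):
        key = self.key_for(refs)
        if key != self._key:
            # Refs moved (or first request): pay the packing cost once.
            self._pack = self._build_pack(refs)
            self._key = key
        return self._pack
```

The cache holds a single entry on purpose: only the current ref
state is useful to a fresh clone, and keeping older packs around
would just cost disk or memory.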
