git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Eric Wong <normalperson@yhbt.net>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH] pack-objects: warn on split packs disabling bitmaps
Date: Wed, 27 Apr 2016 22:25:14 -0400
Message-ID: <20160428022514.GC9707@sigill.intra.peff.net>
In-Reply-To: <20160427215324.GA22165@dcvr.yhbt.net>

On Wed, Apr 27, 2016 at 09:53:24PM +0000, Eric Wong wrote:

> It can be tempting for a server admin to want a stable set of
> long-lived packs for dumb clients; but also want to enable
> bitmaps to serve smart clients more quickly.
> 
> Unfortunately, such a configuration is impossible;
> so at least warn users of this incompatibility since
> commit 21134714787a02a37da15424d72c0119b2b8ed71
> ("pack-objects: turn off bitmaps when we split packs").
> 
> Tested the warning by inspecting the output of:
> 
> 	make -C t t5310-pack-bitmaps.sh GIT_TEST_OPTS=-v
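The condition the patch warns about can be sketched roughly as below. This is a hypothetical, standalone illustration, not git's actual pack-objects code: the struct, its field, and the function name are made up, and the warning wording is only illustrative. The idea is that a bitmap index describes a single pack, so once the output is split into several packs (e.g. because of pack.packSizeLimit) the bitmap has to be dropped, and the sketch warns instead of dropping it silently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical options struct; not git's real internals. */
struct pack_opts {
	bool write_bitmap_index; /* caller asked for a bitmap index */
};

/*
 * Called once the writer knows how many packs it produced. A bitmap
 * index covers exactly one pack, so a split output cannot have one.
 * Warn rather than silently disabling it. Returns whether a bitmap
 * will still be written.
 */
static bool maybe_disable_bitmap(struct pack_opts *opts, unsigned nr_packs)
{
	assert(nr_packs > 0); /* at least one pack must have been written */
	if (opts->write_bitmap_index && nr_packs > 1) {
		fprintf(stderr, "warning: disabling bitmap writing, "
			"packs are split due to pack.packSizeLimit\n");
		opts->write_bitmap_index = false;
	}
	return opts->write_bitmap_index;
}
```

With a single output pack the bitmap survives; with three it is turned off and the warning is printed once.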

I think the intent and code in your patch is fine; looks like doc
specifics are being discussed elsewhere.

But I did want to mention one thing, which is that long-lived split
packs are a tradeoff, even for dumb clients. The pack format cannot do
deltas between packs, so the sum of your split packs is larger than a
single pack would be. That's a good thing for somebody who cloned
earlier, and wants to fetch only a few small packs on top. But it's much worse
for somebody who wants to do a fresh clone, and has to grab all of the
packs either way.
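To make the size tradeoff concrete, here is a toy model in C. It is a sketch under simplified, hypothetical assumptions (made-up sizes, no compression, no real pack accounting): an object can only be stored as a delta if its base is in the same pack, otherwise it must be stored whole.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object description; all sizes are hypothetical units. */
struct toy_obj {
	size_t full_size;  /* bytes if stored whole */
	int base;          /* index of delta base, or -1 for none */
	size_t delta_size; /* bytes if stored as a delta against base */
	int pack;          /* which pack this object was written to */
};

/* Sum of stored sizes across all packs under the toy model. */
static size_t total_stored_size(const struct toy_obj *objs, size_t n)
{
	size_t total = 0;
	for (size_t i = 0; i < n; i++) {
		const struct toy_obj *o = &objs[i];
		if (o->base >= 0) {
			assert((size_t)o->base < n);
			if (objs[o->base].pack == o->pack) {
				/* base lives in the same pack: delta allowed */
				total += o->delta_size;
				continue;
			}
		}
		/* no base, or base is in another pack: store whole */
		total += o->full_size;
	}
	return total;
}
```

With two 1000-unit objects where the second is a 50-unit delta of the first, one pack stores 1050 units, but putting them in separate packs stores 2000. Someone who already has the first pack only fetches 1000, while a fresh clone pays the full 2000.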

>  Fwiw, I'm hoping to publish an ~800MB git-clone-able repo of
>  our ML archives, soonish.  I can serve terabytes of dumb HTTP
>  traffic all day long without breaking a sweat; but smart
>  packing of big repos worries me; especially when feeding
>  slow clients and having to leave processes running
>  (or buffering pack output to disk).  So perhaps I'll teach
>  my HTTP server to play dumb whenever CPU/memory usage is high.

Yeah, CPU and memory load for serving large clones is a problem. Memory
in particular scales with the number of objects (because we keep the whole
packing list in memory for the entirety of the write). At GitHub, we
have some changes to try to serve things verbatim from the on-disk pack
without even creating an in-memory list of objects (it's just a bitmap
of which objects in the packfile to send), and that reduces CPU and
memory load quite a bit. Cleaning up and submitting those patches has
been on my todo list for a while, but I just haven't gotten to it. I'm
of course happy to share the messy state if you want to pick through it
yourself.
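A very rough sketch of "send verbatim from the on-disk pack, driven by a bitmap" might look like the C below. This is hypothetical and is not the GitHub patches described above. It assumes objects are numbered by their position in the pack, that offsets[i] is where entry i starts and offsets[n] is where the entry data ends, and that bit i of the bitmap means "send object i". It deliberately ignores the hard parts: an OFS_DELTA entry encodes the distance to its base, so skipping objects in between changes that distance, and a real response also needs a fresh pack header and trailing checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns whether bit i is set in the "objects to send" bitmap. */
static int want_bit(const uint64_t *want, size_t i)
{
	return (want[i / 64] >> (i % 64)) & 1;
}

/*
 * Copies the selected pack entries into out, merging runs of adjacent
 * selected objects into a single memcpy. No per-object list is built;
 * the bitmap alone drives the copy. Returns bytes written.
 */
static size_t copy_selected(const unsigned char *pack, const size_t *offsets,
			    size_t n, const uint64_t *want, unsigned char *out)
{
	size_t written = 0;
	size_t i = 0;

	while (i < n) {
		if (!want_bit(want, i)) {
			i++;
			continue;
		}
		size_t start = i;
		while (i < n && want_bit(want, i))
			i++;
		/* entries start..i-1 are contiguous on disk */
		assert(offsets[i] >= offsets[start]);
		size_t len = offsets[i] - offsets[start];
		memcpy(out + written, pack + offsets[start], len);
		written += len;
	}
	return written;
}
```

The point of the design is that memory is proportional to the bitmap (one bit per object in the pack) rather than to a full in-memory packing list.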

-Peff


Thread overview: 6+ messages
2016-04-27 21:53 [PATCH] pack-objects: warn on split packs disabling bitmaps Eric Wong
2016-04-27 22:56 ` Junio C Hamano
2016-04-28  2:15   ` Jeff King
2016-04-28  7:28   ` [PATCH v2] " Eric Wong
2016-04-28  2:25 ` Jeff King [this message]
2016-04-28  8:02   ` [PATCH] " Eric Wong
