* [PATCH] pack-objects: warn on split packs disabling bitmaps @ 2016-04-27 21:53 Eric Wong 2016-04-27 22:56 ` Junio C Hamano 2016-04-28 2:25 ` [PATCH] " Jeff King 0 siblings, 2 replies; 6+ messages in thread From: Eric Wong @ 2016-04-27 21:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jeff King It can be tempting for a server admin to want a stable set of long-lived packs for dumb clients; but also want to enable bitmaps to serve smart clients more quickly. Unfortunately, such a configuration is impossible; so at least warn users of this incompatibility since commit 21134714787a02a37da15424d72c0119b2b8ed71 ("pack-objects: turn off bitmaps when we split packs"). Tested the warning by inspecting the output of: make -C t t5310-pack-bitmaps.sh GIT_TEST_OPTS=-v Signed-off-by: Eric Wong <normalperson@yhbt.net> --- Fwiw, I'm hoping to publish an ~800MB git-clone-able repo of our ML archives, soonish. I can serve terabytes of dumb HTTP traffic all day long without breaking a sweat; but smart packing of big repos worries me; especially when feeding slow clients and having to leave processes running (or buffering pack output to disk). So perhaps I'll teach my HTTP server play dumb whenever CPU/memory usage is high. Documentation/config.txt | 11 +++++++---- Documentation/git-pack-objects.txt | 3 ++- Documentation/git-repack.txt | 9 ++++++--- builtin/pack-objects.c | 9 ++++++++- 4 files changed, 23 insertions(+), 9 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 42d2b50..ec31170 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2156,8 +2156,10 @@ pack.packSizeLimit:: The maximum size of a pack. This setting only affects packing to a file when repacking, i.e. the git:// protocol is unaffected. It can be overridden by the `--max-pack-size` - option of linkgit:git-repack[1]. The minimum size allowed is - limited to 1 MiB. The default is unlimited. + option of linkgit:git-repack[1]. Reaching this limit prevents + pack bitmaps from being written when `repack.writeBitmaps` + is configured. The minimum size allowed is limited to 1 MiB. + The default is unlimited. Common unit suffixes of 'k', 'm', or 'g' are supported. @@ -2557,8 +2559,9 @@ repack.writeBitmaps:: objects to disk (e.g., when `git repack -a` is run). This index can speed up the "counting objects" phase of subsequent packs created for clones and fetches, at the cost of some disk - space and extra time spent on the initial repack. Defaults to - false. + space and extra time spent on the initial repack. This has + no effect if `pack.packSizeLimit` is configured and reached. + Defaults to false. rerere.autoUpdate:: When set to true, `git-rerere` updates the index with the diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index bbea529..19cdcd0 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -110,7 +110,8 @@ base-name:: --max-pack-size=<n>:: Maximum size of each output pack file. The size can be suffixed with "k", "m", or "g". The minimum size allowed is limited to 1 MiB. - If specified, multiple packfiles may be created. + If specified, multiple packfiles may be created, which also + prevents the creation of a bitmap index. The default is unlimited, unless the config variable `pack.packSizeLimit` is set. diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index af230d0..2065812 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -106,7 +106,8 @@ other objects in that pack they already have locally. --max-pack-size=<n>:: Maximum size of each output pack file. The size can be suffixed with "k", "m", or "g". The minimum size allowed is limited to 1 MiB. - If specified, multiple packfiles may be created. + If specified, multiple packfiles may be created, causing + `--write-bitmap-index` and `repack.writeBitmaps` to be ignored. The default is unlimited, unless the config variable `pack.packSizeLimit` is set. @@ -115,7 +116,9 @@ other objects in that pack they already have locally. Write a reachability bitmap index as part of the repack. This only makes sense when used with `-a` or `-A`, as the bitmaps must be able to refer to all reachable objects. This option - overrides the setting of `pack.writeBitmaps`. + overrides the setting of `repack.writeBitmaps`. This option + has no effect if a multiple packfiles are created due to + reaching `pack.packSizeLimit` or `--max-pack-size`. --pack-kept-objects:: Include objects in `.keep` files when repacking. Note that we @@ -123,7 +126,7 @@ other objects in that pack they already have locally. This means that we may duplicate objects, but this makes the option safe to use when there are concurrent pushes or fetches. This option is generally only useful if you are writing bitmaps - with `-b` or `pack.writeBitmaps`, as it ensures that the + with `-b` or `repack.writeBitmaps`, as it ensures that the bitmapped packfile has the necessary objects. Configuration diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a27de5b..b6664ce 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -759,6 +759,10 @@ static off_t write_reused_pack(struct sha1file *f) return reuse_packfile_offset - sizeof(struct pack_header); } +static const char no_split_warning[] = N_( +"disabling bitmap writing, packs are split due to pack.packSizeLimit" +); + static void write_pack_file(void) { uint32_t i = 0, j; @@ -813,7 +817,10 @@ static void write_pack_file(void) fixup_pack_header_footer(fd, sha1, pack_tmp_name, nr_written, sha1, offset); close(fd); - write_bitmap_index = 0; + if (write_bitmap_index) { + warning(_(no_split_warning)); + write_bitmap_index = 0; + } } if (!pack_to_stdout) { -- EW ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] pack-objects: warn on split packs disabling bitmaps 2016-04-27 21:53 [PATCH] pack-objects: warn on split packs disabling bitmaps Eric Wong @ 2016-04-27 22:56 ` Junio C Hamano 2016-04-28 2:15 ` Jeff King 2016-04-28 7:28 ` [PATCH v2] " Eric Wong 2016-04-28 2:25 ` [PATCH] " Jeff King 1 sibling, 2 replies; 6+ messages in thread From: Junio C Hamano @ 2016-04-27 22:56 UTC (permalink / raw) To: Eric Wong; +Cc: git, Jeff King Eric Wong <normalperson@yhbt.net> writes: > It can be tempting for a server admin to want a stable set of > long-lived packs for dumb clients; but also want to enable > bitmaps to serve smart clients more quickly. > > Unfortunately, such a configuration is impossible; > so at least warn users of this incompatibility since > commit 21134714787a02a37da15424d72c0119b2b8ed71 > ("pack-objects: turn off bitmaps when we split packs"). > > Tested the warning by inspecting the output of: > > make -C t t5310-pack-bitmaps.sh GIT_TEST_OPTS=-v While I do not think the update in this patch is wrong, this makes me wonder what happens to a server admin who wants a stable set of long-lived objects in a pack, and other objects in another pack that is frequently recreated, by first packing objects needed for the latest released version into a single pack and marking it with .keep and then running "git repack" to create the second pack for the remainder. There is no automatic split involved, so we would get a bitmap file for each of these two packs. Would that pose a problem? The pack with the newer part of the history would not satisfy "all of the reachable objects are in the same pack" condition. > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 42d2b50..ec31170 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -2156,8 +2156,10 @@ pack.packSizeLimit:: > The maximum size of a pack. This setting only affects > packing to a file when repacking, i.e. the git:// protocol > is unaffected. It can be overridden by the `--max-pack-size` > - option of linkgit:git-repack[1]. The minimum size allowed is > - limited to 1 MiB. The default is unlimited. > + option of linkgit:git-repack[1]. Reaching this limit prevents > + pack bitmaps from being written when `repack.writeBitmaps` > + is configured. The minimum size allowed is limited to 1 MiB. > + The default is unlimited. This is not wrong per-se, but I find the additional text overly verbose. The resulting text has at least three issues: - This sets maximum. It does not say what happens when the maximum is reached ("packing fails" is a valid expectation). We should spell out that when the maximum is reached, we will create multiple packfiles. - It is not "reaching this limit" that prevents bitmaps from being written. It is "we will create multiple packfiles" that does. - Even when repack.writeBitmaps is not configured, but bitmap is requested via the command line option "repack -b", bitmap generation is disabled once we end up creating multiple packfiles. > @@ -2557,8 +2559,9 @@ repack.writeBitmaps:: > objects to disk (e.g., when `git repack -a` is run). This > index can speed up the "counting objects" phase of subsequent > packs created for clones and fetches, at the cost of some disk > - space and extra time spent on the initial repack. Defaults to > - false. > + space and extra time spent on the initial repack. This has > + no effect if `pack.packSizeLimit` is configured and reached. > + Defaults to false. This has the opposite issue as the third point above. We have to ignore repack.writeBitmaps when we end up splitting a pack, regardless of the reason why we split was pack.packSizeLimit or by an explicit request from the command line. Also to future-proof these two paragraphs (and those that are touched by later parts of this patch), it may even be a good idea to omit the mention of packsizelimit altogether. We may invent a new and different reason why we end up producing multiple packfiles, other than packsizelimit. What this patch wants to make readers aware is that when multiple packs are generated, bitmap generation is disabled. Other details (e.g. how the user asks us to create multiple packs, either via command line or configuration, or how the use asks us to generate bitmaps) are of lessor importance. > diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt > index bbea529..19cdcd0 100644 > --- a/Documentation/git-pack-objects.txt > +++ b/Documentation/git-pack-objects.txt > @@ -110,7 +110,8 @@ base-name:: > --max-pack-size=<n>:: > Maximum size of each output pack file. The size can be suffixed with > "k", "m", or "g". The minimum size allowed is limited to 1 MiB. > - If specified, multiple packfiles may be created. > + If specified, multiple packfiles may be created, which also > + prevents the creation of a bitmap index. This is a good update, judging with the yardstick I set in the previous paragraph in this review. > The default is unlimited, unless the config variable > `pack.packSizeLimit` is set. > > diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt > index af230d0..2065812 100644 > --- a/Documentation/git-repack.txt > +++ b/Documentation/git-repack.txt > @@ -106,7 +106,8 @@ other objects in that pack they already have locally. > --max-pack-size=<n>:: > Maximum size of each output pack file. The size can be suffixed with > "k", "m", or "g". The minimum size allowed is limited to 1 MiB. > - If specified, multiple packfiles may be created. > + If specified, multiple packfiles may be created, causing > + `--write-bitmap-index` and `repack.writeBitmaps` to be ignored. And this is not. Just say "bitmap index is not created". > @@ -115,7 +116,9 @@ other objects in that pack they already have locally. > Write a reachability bitmap index as part of the repack. This > only makes sense when used with `-a` or `-A`, as the bitmaps > must be able to refer to all reachable objects. This option > - overrides the setting of `pack.writeBitmaps`. > + overrides the setting of `repack.writeBitmaps`. This option > + has no effect if a multiple packfiles are created due to > + reaching `pack.packSizeLimit` or `--max-pack-size`. Dropping "due to ..." makes it perfect. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] pack-objects: warn on split packs disabling bitmaps 2016-04-27 22:56 ` Junio C Hamano @ 2016-04-28 2:15 ` Jeff King 2016-04-28 7:28 ` [PATCH v2] " Eric Wong 1 sibling, 0 replies; 6+ messages in thread From: Jeff King @ 2016-04-28 2:15 UTC (permalink / raw) To: Junio C Hamano; +Cc: Eric Wong, git On Wed, Apr 27, 2016 at 03:56:46PM -0700, Junio C Hamano wrote: > Eric Wong <normalperson@yhbt.net> writes: > > > It can be tempting for a server admin to want a stable set of > > long-lived packs for dumb clients; but also want to enable > > bitmaps to serve smart clients more quickly. > > > > Unfortunately, such a configuration is impossible; > > so at least warn users of this incompatibility since > > commit 21134714787a02a37da15424d72c0119b2b8ed71 > > ("pack-objects: turn off bitmaps when we split packs"). > > > > Tested the warning by inspecting the output of: > > > > make -C t t5310-pack-bitmaps.sh GIT_TEST_OPTS=-v > > While I do not think the update in this patch is wrong, this makes > me wonder what happens to a server admin who wants a stable set of > long-lived objects in a pack, and other objects in another pack that > is frequently recreated, by first packing objects needed for the > latest released version into a single pack and marking it with .keep > and then running "git repack" to create the second pack for the > remainder. > > There is no automatic split involved, so we would get a bitmap file > for each of these two packs. Would that pose a problem? The pack > with the newer part of the history would not satisfy "all of the > reachable objects are in the same pack" condition. You will not get two bitmaps in such a case. In add_object_entry(), if we exclude an object via want_object_in_pack(), we will issue a warning and turn off bitmaps. That includes finding that one of the reachable objects we would need is in a .keep pack. You could in theory construct a broken non-closed bitmap by feeding an arbitrary set of objects to pack-objects, but we turn off bitmap writing entirely unless --all is used. So the test in add_object_entry() should be sufficient for all cases there (it's actually too conservative; it's possible that the object is not reachable from the other objects that are going into the pack, but this is so impractical that it's not even worth trying to catch). The split-pack check from 211347147 had to come separately, because we only find out about the split much later during the write phase (and again, it is too conservative; it's _possible_ that the objects being split aren't reachable from anything in the previous pack, but it's exceedingly unlikely and not worth caring about). Adding a warning as Eric's patch does is a definite improvement, and something I should have done in the original (we _could_ just use the same no_closure_warning, as that is technically what is going on, but I think it is nice to mention pack-splitting, which is more likely to lead the user in the right direction to fixing it). > [discussion of doc fixes] I do not mind overly if pack-splitting mentions disabling bitmaps. But I think it may be simpler to keep the documentation in the bitmap section, rather than trying to cross-reference in both directions. It is the bitmap code which is not prepared to work with pack-splits (and other things, like .keep), not the other way around. -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2] pack-objects: warn on split packs disabling bitmaps 2016-04-27 22:56 ` Junio C Hamano 2016-04-28 2:15 ` Jeff King @ 2016-04-28 7:28 ` Eric Wong 1 sibling, 0 replies; 6+ messages in thread From: Eric Wong @ 2016-04-28 7:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jeff King Junio C Hamano <gitster@pobox.com> wrote: > Eric Wong <normalperson@yhbt.net> writes: > > +++ b/Documentation/git-pack-objects.txt > > @@ -110,7 +110,8 @@ base-name:: > > --max-pack-size=<n>:: > > Maximum size of each output pack file. The size can be suffixed with > > "k", "m", or "g". The minimum size allowed is limited to 1 MiB. > > - If specified, multiple packfiles may be created. > > + If specified, multiple packfiles may be created, which also > > + prevents the creation of a bitmap index. > > This is a good update, judging with the yardstick I set in the > previous paragraph in this review. Thanks for the review; made some adjustments and have v2 below. > > --- a/Documentation/git-repack.txt > > +++ b/Documentation/git-repack.txt > > @@ -106,7 +106,8 @@ other objects in that pack they already have locally. > > --max-pack-size=<n>:: > > Maximum size of each output pack file. The size can be suffixed with > > "k", "m", or "g". The minimum size allowed is limited to 1 MiB. > > - If specified, multiple packfiles may be created. > > + If specified, multiple packfiles may be created, causing > > + `--write-bitmap-index` and `repack.writeBitmaps` to be ignored. > > And this is not. Just say "bitmap index is not created". Ah, I've now synced the same --max-pack-size doc from git-pack-objects.txt you liked into v2 below. I worded my original differently between pack-objects and repack since I figured repack is more likely used by end users; and perhaps warranted an explanation that didn't require describing the actual problem... But I suppose "repack" isn't commonly called anymore, either. > > @@ -115,7 +116,9 @@ other objects in that pack they already have locally. > > Write a reachability bitmap index as part of the repack. This > > only makes sense when used with `-a` or `-A`, as the bitmaps > > must be able to refer to all reachable objects. This option > > - overrides the setting of `pack.writeBitmaps`. > > + overrides the setting of `repack.writeBitmaps`. This option > > + has no effect if a multiple packfiles are created due to > > + reaching `pack.packSizeLimit` or `--max-pack-size`. > > Dropping "due to ..." makes it perfect. Done, along with: s/effect if a multiple/effect when multiple/ "if a" was definitely a typo, "if" is probably alright, but I suspect "when" is even better. -------------8<------------- Subject: [PATCH] pack-objects: warn on split packs disabling bitmaps It can be tempting for a server admin to want a stable set of long-lived packs for dumb clients; but also want to enable bitmaps to serve smart clients more quickly. Unfortunately, such a configuration is impossible; so at least warn users of this incompatibility since commit 21134714787a02a37da15424d72c0119b2b8ed71 ("pack-objects: turn off bitmaps when we split packs"). Tested the warning by inspecting the output of: make -C t t5310-pack-bitmaps.sh GIT_TEST_OPTS=-v Signed-off-by: Eric Wong <normalperson@yhbt.net> --- Documentation/config.txt | 12 ++++++++---- Documentation/git-pack-objects.txt | 3 ++- Documentation/git-repack.txt | 8 +++++--- builtin/pack-objects.c | 9 ++++++++- 4 files changed, 23 insertions(+), 9 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 3d8bc97..3ea3372 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2165,8 +2165,11 @@ pack.packSizeLimit:: The maximum size of a pack. This setting only affects packing to a file when repacking, i.e. the git:// protocol is unaffected. It can be overridden by the `--max-pack-size` - option of linkgit:git-repack[1]. The minimum size allowed is - limited to 1 MiB. The default is unlimited. + option of linkgit:git-repack[1]. Reaching this limit results + in the creation of multiple packfiles; which in turn prevents + bitmaps from being created. + The minimum size allowed is limited to 1 MiB. + The default is unlimited. Common unit suffixes of 'k', 'm', or 'g' are supported. @@ -2566,8 +2569,9 @@ repack.writeBitmaps:: objects to disk (e.g., when `git repack -a` is run). This index can speed up the "counting objects" phase of subsequent packs created for clones and fetches, at the cost of some disk - space and extra time spent on the initial repack. Defaults to - false. + space and extra time spent on the initial repack. This has + no effect if multiple packfiles are created. + Defaults to false. rerere.autoUpdate:: When set to true, `git-rerere` updates the index with the diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index bbea529..19cdcd0 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -110,7 +110,8 @@ base-name:: --max-pack-size=<n>:: Maximum size of each output pack file. The size can be suffixed with "k", "m", or "g". The minimum size allowed is limited to 1 MiB. - If specified, multiple packfiles may be created. + If specified, multiple packfiles may be created, which also + prevents the creation of a bitmap index. The default is unlimited, unless the config variable `pack.packSizeLimit` is set. diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index af230d0..b9c02ce 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -106,7 +106,8 @@ other objects in that pack they already have locally. --max-pack-size=<n>:: Maximum size of each output pack file. The size can be suffixed with "k", "m", or "g". The minimum size allowed is limited to 1 MiB. - If specified, multiple packfiles may be created. + If specified, multiple packfiles may be created, which also + prevents the creation of a bitmap index. The default is unlimited, unless the config variable `pack.packSizeLimit` is set. @@ -115,7 +116,8 @@ other objects in that pack they already have locally. Write a reachability bitmap index as part of the repack. This only makes sense when used with `-a` or `-A`, as the bitmaps must be able to refer to all reachable objects. This option - overrides the setting of `pack.writeBitmaps`. + overrides the setting of `repack.writeBitmaps`. This option + has no effect if multiple packfiles are created. --pack-kept-objects:: Include objects in `.keep` files when repacking. Note that we @@ -123,7 +125,7 @@ other objects in that pack they already have locally. This means that we may duplicate objects, but this makes the option safe to use when there are concurrent pushes or fetches. This option is generally only useful if you are writing bitmaps - with `-b` or `pack.writeBitmaps`, as it ensures that the + with `-b` or `repack.writeBitmaps`, as it ensures that the bitmapped packfile has the necessary objects. Configuration diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a27de5b..b6664ce 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -759,6 +759,10 @@ static off_t write_reused_pack(struct sha1file *f) return reuse_packfile_offset - sizeof(struct pack_header); } +static const char no_split_warning[] = N_( +"disabling bitmap writing, packs are split due to pack.packSizeLimit" +); + static void write_pack_file(void) { uint32_t i = 0, j; @@ -813,7 +817,10 @@ static void write_pack_file(void) fixup_pack_header_footer(fd, sha1, pack_tmp_name, nr_written, sha1, offset); close(fd); - write_bitmap_index = 0; + if (write_bitmap_index) { + warning(_(no_split_warning)); + write_bitmap_index = 0; + } } if (!pack_to_stdout) { -- EW ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] pack-objects: warn on split packs disabling bitmaps 2016-04-27 21:53 [PATCH] pack-objects: warn on split packs disabling bitmaps Eric Wong 2016-04-27 22:56 ` Junio C Hamano @ 2016-04-28 2:25 ` Jeff King 2016-04-28 8:02 ` Eric Wong 1 sibling, 1 reply; 6+ messages in thread From: Jeff King @ 2016-04-28 2:25 UTC (permalink / raw) To: Eric Wong; +Cc: Junio C Hamano, git On Wed, Apr 27, 2016 at 09:53:24PM +0000, Eric Wong wrote: > It can be tempting for a server admin to want a stable set of > long-lived packs for dumb clients; but also want to enable > bitmaps to serve smart clients more quickly. > > Unfortunately, such a configuration is impossible; > so at least warn users of this incompatibility since > commit 21134714787a02a37da15424d72c0119b2b8ed71 > ("pack-objects: turn off bitmaps when we split packs"). > > Tested the warning by inspecting the output of: > > make -C t t5310-pack-bitmaps.sh GIT_TEST_OPTS=-v I think the intent and code in your patch is fine; looks like doc specifics are being discussed elsewhere. But I did want to mention one thing, which is that long-lived split packs are a tradeoff, even for dumb clients. The pack format cannot do deltas between packs, so the sum of your split packs is larger than a single pack would be. That's a good thing for somebody who cloned earlier, and wants to only a few small packs on top. But it's much worse for somebody who wants to do a fresh clone, and has to grab all of the packs either way. > Fwiw, I'm hoping to publish an ~800MB git-clone-able repo of > our ML archives, soonish. I can serve terabytes of dumb HTTP > traffic all day long without breaking a sweat; but smart > packing of big repos worries me; especially when feeding > slow clients and having to leave processes running > (or buffering pack output to disk). So perhaps I'll teach > my HTTP server play dumb whenever CPU/memory usage is high. Yeah, CPU and memory load for serving large clones is a problem. Memory especially scales with number of objects (because we keep the whole packing list in memory for the entirety of the write). At GitHub, we have some changes to try to serve things verbatim from the on-disk pack without even creating an in-memory list of objects (it's just a bitmap of which objects in the packfile to send), and that reduces CPU and memory load quite a bit. Cleaning up and submitting those patches has been on my todo list for a while, but I just haven't gotten to it. I'm of course happy to share the messy state if you want to pick through it yourself. -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] pack-objects: warn on split packs disabling bitmaps 2016-04-28 2:25 ` [PATCH] " Jeff King @ 2016-04-28 8:02 ` Eric Wong 0 siblings, 0 replies; 6+ messages in thread From: Eric Wong @ 2016-04-28 8:02 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git Jeff King <peff@peff.net> wrote: > On Wed, Apr 27, 2016 at 09:53:24PM +0000, Eric Wong wrote: > > > It can be tempting for a server admin to want a stable set of > > long-lived packs for dumb clients; but also want to enable > > bitmaps to serve smart clients more quickly. > But I did want to mention one thing, which is that long-lived split > packs are a tradeoff, even for dumb clients. The pack format cannot do > deltas between packs, so the sum of your split packs is larger than a > single pack would be. That's a good thing for somebody who cloned > earlier, and wants to only a few small packs on top. But it's much worse > for somebody who wants to do a fresh clone, and has to grab all of the > packs either way. Definitely a trade off, but a fresh clone with packs might only be (at worst) doubling or tripling bandwidth use on both sides? However, the CPU/memory cost of packing is at least an order of magnitude (more likely several orders of magnitude) more expensive on the server. The client most likely won't care about CPU/memory usage, though. > > Fwiw, I'm hoping to publish an ~800MB git-clone-able repo of > > our ML archives, soonish. I can serve terabytes of dumb HTTP > > traffic all day long without breaking a sweat; but smart > > packing of big repos worries me; especially when feeding > > slow clients and having to leave processes running > > (or buffering pack output to disk). So perhaps I'll teach > > my HTTP server play dumb whenever CPU/memory usage is high. > > Yeah, CPU and memory load for serving large clones is a problem. Memory > especially scales with number of objects (because we keep the whole > packing list in memory for the entirety of the write). At GitHub, we > have some changes to try to serve things verbatim from the on-disk pack > without even creating an in-memory list of objects (it's just a bitmap > of which objects in the packfile to send), and that reduces CPU and > memory load quite a bit. Cleaning up and submitting those patches has > been on my todo list for a while, but I just haven't gotten to it. I'm > of course happy to share the messy state if you want to pick through it > yourself. Sure thing! I can't promise I'll have time, either, but being able to serve packs verbatim would be great; especially if you could multiplex it with epoll/kqueue for folks on slow pipes (and maybe use sendfile, but perhaps that's not worth the effort with TLS everywhere nowadays). I was also wondering if fresh clones could be memoized entirely. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-04-28 8:02 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-04-27 21:53 [PATCH] pack-objects: warn on split packs disabling bitmaps Eric Wong 2016-04-27 22:56 ` Junio C Hamano 2016-04-28 2:15 ` Jeff King 2016-04-28 7:28 ` [PATCH v2] " Eric Wong 2016-04-28 2:25 ` [PATCH] " Jeff King 2016-04-28 8:02 ` Eric Wong
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).