git@vger.kernel.org mailing list mirror (one of many)
* "disabling bitmap writing, as some objects are not being packed"?
@ 2016-12-16 21:05 David Turner
  2016-12-16 21:27 ` Jeff King
  2016-12-16 21:28 ` Junio C Hamano
  0 siblings, 2 replies; 21+ messages in thread
From: David Turner @ 2016-12-16 21:05 UTC (permalink / raw)
  To: git; +Cc: peff

I'm a bit confused by the message "disabling bitmap writing, as some
objects are not being packed".  I see it in the gc.log file on my git
server.

1. Its presence in the gc.log file prevents future automatic garbage
collection.  This seems bad.  I understand the desire to avoid making
things worse if a past gc has run into issues.  But this warning is
non-fatal; the only consequence is that many operations get slower.  But
a lack of gc when there are too many packs causes that consequence too
(often a much worse slowdown than would be caused by the missing
bitmap).

So I wonder if it would be better for auto gc to grep gc.log for fatal
errors (as opposed to warnings) and only skip running if any are found.
Alternately, we could simply put warnings into gc.log.warning and
reserve gc.log for fatal errors. I'm not sure which would be simpler.  
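For concreteness, the blocking behavior and the usual manual remedy look roughly like this (a sketch, assuming a standard non-bare layout):

```shell
# Once gc.log exists from a failed or noisy daemonized run,
# "git gc --auto" prints its contents and refuses to collect.
cat .git/gc.log     # e.g. "warning: disabling bitmap writing, ..."
git gc --auto       # reports the last run's log and skips gc

# Removing the file re-enables automatic gc:
rm .git/gc.log
```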

2. I don't understand what would cause that message.  That is, what bad
thing am I doing that I should stop doing?  I've briefly skimmed the
code and commit message, but the answer isn't leaping out at me.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: "disabling bitmap writing, as some objects are not being packed"?
  2016-12-16 21:05 "disabling bitmap writing, as some objects are not being packed"? David Turner
@ 2016-12-16 21:27 ` Jeff King
  2016-12-16 21:28 ` Junio C Hamano
  1 sibling, 0 replies; 21+ messages in thread
From: Jeff King @ 2016-12-16 21:27 UTC (permalink / raw)
  To: David Turner; +Cc: git

On Fri, Dec 16, 2016 at 04:05:31PM -0500, David Turner wrote:

> 1. Its presence in the gc.log file prevents future automatic garbage
> collection.  This seems bad.  I understand the desire to avoid making
> things worse if a past gc has run into issues.  But this warning is
> non-fatal; the only consequence is that many operations get slower.  But
> a lack of gc when there are too many packs causes that consequence too
> (often a much worse slowdown than would be caused by the missing
> bitmap).
> 
> So I wonder if it would be better for auto gc to grep gc.log for fatal
> errors (as opposed to warnings) and only skip running if any are found.
> Alternately, we could simply put warnings into gc.log.warning and
> reserve gc.log for fatal errors. I'm not sure which would be simpler.  

Without thinking too hard on it, that seems like the appropriate
solution to me, too.

> 2. I don't understand what would cause that message.  That is, what bad
> thing am I doing that I should stop doing?  I've briefly skimmed the
> code and commit message, but the answer isn't leaping out at me.

Do you have alternates and are using --local? Do you have .keep packs
and have set repack.packKeptObjects to false?

There are other ways (e.g., an incremental repack), but I think those
are the likely ones to get via "git gc".

-Peff


* Re: "disabling bitmap writing, as some objects are not being packed"?
  2016-12-16 21:05 "disabling bitmap writing, as some objects are not being packed"? David Turner
  2016-12-16 21:27 ` Jeff King
@ 2016-12-16 21:28 ` Junio C Hamano
  2016-12-16 21:32   ` Jeff King
  2016-12-17  7:50   ` "disabling bitmap writing, as some objects are not being packed"? Duy Nguyen
  1 sibling, 2 replies; 21+ messages in thread
From: Junio C Hamano @ 2016-12-16 21:28 UTC (permalink / raw)
  To: David Turner; +Cc: git, peff, Nguyễn Thái Ngọc Duy

David Turner <novalis@novalis.org> writes:

> I'm a bit confused by the message "disabling bitmap writing, as some
> objects are not being packed".  I see it in the gc.log file on my git
> server.

> 1. Its presence in the gc.log file prevents future automatic garbage
> collection.  This seems bad.  I understand the desire to avoid making
> things worse if a past gc has run into issues.  But this warning is
> non-fatal; the only consequence is that many operations get slower.  But
> a lack of gc when there are too many packs causes that consequence too
> (often a much worse slowdown than would be caused by the missing
> bitmap).
>
> So I wonder if it would be better for auto gc to grep gc.log for fatal
> errors (as opposed to warnings) and only skip running if any are found.
> Alternately, we could simply put warnings into gc.log.warning and
> reserve gc.log for fatal errors. I'm not sure which would be simpler.  

I am not sure if string matching is really a good idea, as I'd
assume that these messages are eligible for i18n.

329e6e8794 ("gc: save log from daemonized gc --auto and print it
next time", 2015-09-19) wanted to notice that auto-gc is not
making progress and used the presence of error messages as a cue.
In your case, I think the auto-gc _is_ making progress, reducing
the number of loose objects in the repository or consolidating many
packfiles into one, and the message is only about the fact that
packing is punting and not producing a bitmap as you asked, which
is different from not making any progress.  I do not think log vs
warn is a good criterion to tell them apart, either.

In any case, as the error message asks the user to do, the user
eventually wants to correct the root cause before removing the
gc.log; I am not sure report_last_gc_error() is the place to correct
this in the first place.

> 2. I don't understand what would cause that message.  That is, what bad
> thing am I doing that I should stop doing?  I've briefly skimmed the
> code and commit message, but the answer isn't leaping out at me.

Enabling bitmap generation for incremental packing that does not
cram everything into a single pack is triggering it, I would
presume.  Perhaps we should ignore -b option in most of the cases
and enable it only for "repack -a -d -f" codepath?  Or detect that
we are being run from "gc --auto" and automatically disable -b?  I
have a feeling that an approach along that line is closer to the
real solution than tweaking report_last_gc_error() and trying to
deduce if we are making any progress.




* Re: "disabling bitmap writing, as some objects are not being packed"?
  2016-12-16 21:28 ` Junio C Hamano
@ 2016-12-16 21:32   ` Jeff King
  2016-12-16 21:40     ` David Turner
  2016-12-17  7:50   ` "disabling bitmap writing, as some objects are not being packed"? Duy Nguyen
  1 sibling, 1 reply; 21+ messages in thread
From: Jeff King @ 2016-12-16 21:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, git, Nguyễn Thái Ngọc Duy

On Fri, Dec 16, 2016 at 01:28:00PM -0800, Junio C Hamano wrote:

> > 2. I don't understand what would cause that message.  That is, what bad
> > thing am I doing that I should stop doing?  I've briefly skimmed the
> > code and commit message, but the answer isn't leaping out at me.
> 
> Enabling bitmap generation for incremental packing that does not
> cram everything into a single pack is triggering it, I would
> presume.  Perhaps we should ignore -b option in most of the cases
> and enable it only for "repack -a -d -f" codepath?  Or detect that
> we are being run from "gc --auto" and automatically disable -b?  I
> have a feeling that an approach along that line is closer to the
> real solution than tweaking report_last_gc_error() and trying to
> deduce if we are making any progress.

Ah, indeed. I was thinking in my other response that "git gc" would
always kick off an all-into-one repack. But "gc --auto" will not in
certain cases. And yes, in those cases you definitely would want
--no-write-bitmap-index. I think it would be reasonable for "git repack"
to disable bitmap-writing automatically when not doing an all-into-one
repack.

-Peff


* Re: "disabling bitmap writing, as some objects are not being packed"?
  2016-12-16 21:32   ` Jeff King
@ 2016-12-16 21:40     ` David Turner
  2016-12-16 21:49       ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: David Turner @ 2016-12-16 21:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Nguyễn Thái Ngọc Duy

On Fri, 2016-12-16 at 16:32 -0500, Jeff King wrote:
> On Fri, Dec 16, 2016 at 01:28:00PM -0800, Junio C Hamano wrote:
> 
> > > 2. I don't understand what would cause that message.  That is, what bad
> > > thing am I doing that I should stop doing?  I've briefly skimmed the
> > > code and commit message, but the answer isn't leaping out at me.
> > 
> > Enabling bitmap generation for incremental packing that does not
> > cram everything into a single pack is triggering it, I would
> > presume.  Perhaps we should ignore -b option in most of the cases
> > and enable it only for "repack -a -d -f" codepath?  Or detect that
> > we are being run from "gc --auto" and automatically disable -b?  I
> > have a feeling that an approach along that line is closer to the
> > real solution than tweaking report_last_gc_error() and trying to
> > deduce if we are making any progress.
> 
> Ah, indeed. I was thinking in my other response that "git gc" would
> always kick off an all-into-one repack. But "gc --auto" will not in
> certain cases. And yes, in those cases you definitely would want
> --no-write-bitmap-index. I think it would be reasonable for "git repack"
> to disable bitmap-writing automatically when not doing an all-into-one
> repack.

I do not have alternates and am not using --local.  Nor do I have .keep
packs.

I would assume, based on the documentation, that auto gc would be doing
an all-into-one repack:
"If the number of packs exceeds the value of gc.autopacklimit, then
 existing packs (except those marked with a .keep file) are
 consolidated into a single pack by using the -A option of git
 repack."

I don't have any settings that limit the size of packs, either.  And a
manual git repack -a -d creates only a single pack.  Its loneliness
doesn't last long, because pretty soon a new pack is created by an
incoming push.

Unless this just means that some objects are being kept loose (perhaps
because they are unreferenced)? 




* Re: "disabling bitmap writing, as some objects are not being packed"?
  2016-12-16 21:40     ` David Turner
@ 2016-12-16 21:49       ` Jeff King
  2016-12-16 23:59         ` [PATCH] pack-objects: don't warn about bitmaps on incremental pack David Turner
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2016-12-16 21:49 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, git, Nguyễn Thái Ngọc Duy

On Fri, Dec 16, 2016 at 04:40:16PM -0500, David Turner wrote:

> I would assume, based on the documentation, that auto gc would be doing
> an all-into-one repack:
> "If the number of packs exceeds the value of gc.autopacklimit, then
>  existing packs (except those marked with a .keep file) are
>  consolidated into a single pack by using the -A option of git
>  repack."
> 
> I don't have any settings that limit the size of packs, either.  And a
> manual git repack -a -d creates only a single pack.  Its loneliness
> doesn't last long, because pretty soon a new pack is created by an
> incoming push.

The interesting code is in need_to_gc():

        /*
         * If there are too many loose objects, but not too many
         * packs, we run "repack -d -l".  If there are too many packs,
         * we run "repack -A -d -l".  Otherwise we tell the caller
         * there is no need.
         */
        if (too_many_packs())
                add_repack_all_option();
        else if (!too_many_loose_objects())
                return 0;

So if you have (say) 10 packs and 10,000 objects, we'll incrementally
pack those objects into a single new pack.
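The two checks are driven by config; a sketch of inspecting the thresholds (the defaults in the comments are from the git-config documentation):

```shell
# Loose-object threshold for triggering auto gc (default 6700):
git config --get gc.auto || echo "unset (default 6700)"

# Pack-count threshold that triggers the all-into-one repack
# (default 50):
git config --get gc.autoPackLimit || echo "unset (default 50)"
```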

I never noticed this myself because we do not use auto-gc at GitHub at
all. We only ever do a big all-into-one repack.

> Unless this just means that some objects are being kept loose (perhaps
> because they are unreferenced)? 

If they're unreferenced, they won't be part of the new pack. You might
accumulate loose objects that are ejected from previous packs, which
could trigger auto-gc to do an incremental pack (even though it wouldn't
be productive, because they're unreferenced!). You may also get them
from pushes (small pushes will be exploded into loose objects by
default).
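Whether an incoming push is kept as a pack or exploded into loose objects is governed by the unpack limit; a sketch of the knobs (the 100-object default is per the git-config documentation):

```shell
# On the receiving side: pushes with fewer than this many objects
# are exploded into loose objects; setting it to 1 keeps every
# push as a pack.
git config receive.unpackLimit 1

# transfer.unpackLimit is the shared fallback used when the
# fetch/receive-specific keys are unset (default 100):
git config --get transfer.unpackLimit || echo "unset (default 100)"
```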

-Peff


* [PATCH] pack-objects: don't warn about bitmaps on incremental pack
  2016-12-16 21:49       ` Jeff King
@ 2016-12-16 23:59         ` David Turner
  2016-12-17  4:04           ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: David Turner @ 2016-12-16 23:59 UTC (permalink / raw)
  To: git, peff; +Cc: David Turner

When running git pack-objects --incremental, we do not expect to be
able to write a bitmap; it is very likely that objects in the new pack
will have references to objects outside of the pack.  So we don't need
to warn the user about it.

This warning was making its way into gc.log because auto-gc will do an
incremental repack when there are too many loose objects but not too
many packs.  While that gc.log was present, future auto-gc runs would
refuse to start.

Signed-off-by: David Turner <dturner@twosigma.com>
---
 builtin/pack-objects.c  |  3 ++-
 t/t5310-pack-bitmaps.sh | 12 ++++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0fd52bd..96de213 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1083,7 +1083,8 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 	if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) {
 		/* The pack is missing an object, so it will not have closure */
 		if (write_bitmap_index) {
-			warning(_(no_closure_warning));
+			if (!incremental)
+				warning(_(no_closure_warning));
 			write_bitmap_index = 0;
 		}
 		return 0;
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index b4c7a6f..d81636e 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -247,6 +247,18 @@ test_expect_success 'pack-objects respects --incremental' '
 	test_cmp 4.objects objects
 '
 
+test_expect_success 'incremental repack does not create bitmaps' '
+	test_commit 11 &&
+	ls .git/objects/pack/ | grep bitmap >existing_bitmaps &&
+	ls .git/objects/pack/ | grep -v bitmap >existing_packs &&
+	git repack -d 2>err &&
+	test_line_count = 0 err &&
+	ls .git/objects/pack/ | grep bitmap >output &&
+	ls .git/objects/pack/ | grep -v bitmap >post_packs &&
+	test_cmp existing_bitmaps output &&
+	! test_cmp existing_packs post_packs
+'
+
 test_expect_success 'pack with missing blob' '
 	rm $(objpath $blob) &&
 	git pack-objects --stdout --revs <revs >/dev/null
-- 
2.8.0.rc4.22.g8ae061a



* Re: [PATCH] pack-objects: don't warn about bitmaps on incremental pack
  2016-12-16 23:59         ` [PATCH] pack-objects: don't warn about bitmaps on incremental pack David Turner
@ 2016-12-17  4:04           ` Jeff King
  2016-12-19 16:03             ` David Turner
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2016-12-17  4:04 UTC (permalink / raw)
  To: David Turner; +Cc: git

On Fri, Dec 16, 2016 at 06:59:35PM -0500, David Turner wrote:

> When running git pack-objects --incremental, we do not expect to be
> able to write a bitmap; it is very likely that objects in the new pack
> will have references to objects outside of the pack.  So we don't need
> to warn the user about it.
> [...]
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 0fd52bd..96de213 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1083,7 +1083,8 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
>  	if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) {
>  		/* The pack is missing an object, so it will not have closure */
>  		if (write_bitmap_index) {
> -			warning(_(no_closure_warning));
> +			if (!incremental)
> +				warning(_(no_closure_warning));
>  			write_bitmap_index = 0;
>  		}
>  		return 0;

I agree that the user doesn't need to be warned about it when running
"gc --auto", but I wonder if somebody invoking "pack-objects
--incremental --write-bitmap-index" ought to be.

In other words, your patch is detecting at a low level that we've been
given a nonsense combination of options, but should we perhaps stop
passing nonsense in the first place?

Either at the repack level, with something like:

diff --git a/builtin/repack.c b/builtin/repack.c
index 80dd06b4a2..6608a902b1 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -231,8 +231,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		argv_array_pushf(&cmd.args, "--no-reuse-delta");
 	if (no_reuse_object)
 		argv_array_pushf(&cmd.args, "--no-reuse-object");
-	if (write_bitmaps)
-		argv_array_push(&cmd.args, "--write-bitmap-index");
 
 	if (pack_everything & ALL_INTO_ONE) {
 		get_non_kept_pack_filenames(&existing_packs);
@@ -256,8 +254,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	} else {
 		argv_array_push(&cmd.args, "--unpacked");
 		argv_array_push(&cmd.args, "--incremental");
+		write_bitmap_index = 0;
 	}
 
+	if (write_bitmaps)
+		argv_array_push(&cmd.args, "--write-bitmap-index");
 	if (local)
 		argv_array_push(&cmd.args,  "--local");
 	if (quiet)

Though that still means we do not warn on:

  git repack --write-bitmap-index

which is nonsense (it is asking for an incremental repack with bitmaps).

So maybe do it at the gc level, like:

diff --git a/builtin/gc.c b/builtin/gc.c
index 069950d0b4..d3c978c765 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -191,6 +191,11 @@ static void add_repack_all_option(void)
 	}
 }
 
+static void add_repack_incremental_option(void)
+{
+	argv_array_push(&repack, "--no-write-bitmap-index");
+}
+
 static int need_to_gc(void)
 {
 	/*
@@ -208,7 +213,9 @@ static int need_to_gc(void)
 	 */
 	if (too_many_packs())
 		add_repack_all_option();
-	else if (!too_many_loose_objects())
+	else if (too_many_loose_objects())
+		add_repack_incremental_option();
+	else
 		return 0;
 
 	if (run_hook_le(NULL, "pre-auto-gc", NULL))

-Peff


* Re: "disabling bitmap writing, as some objects are not being packed"?
  2016-12-16 21:28 ` Junio C Hamano
  2016-12-16 21:32   ` Jeff King
@ 2016-12-17  7:50   ` Duy Nguyen
  2017-02-08  1:03     ` David Turner
  1 sibling, 1 reply; 21+ messages in thread
From: Duy Nguyen @ 2016-12-17  7:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, Git Mailing List, Jeff King

On Sat, Dec 17, 2016 at 4:28 AM, Junio C Hamano <gitster@pobox.com> wrote:
> David Turner <novalis@novalis.org> writes:
>
>> I'm a bit confused by the message "disabling bitmap writing, as some
>> objects are not being packed".  I see it in the gc.log file on my git
>> server.
>
>> 1. Its presence in the gc.log file prevents future automatic garbage
>> collection.  This seems bad.  I understand the desire to avoid making
>> things worse if a past gc has run into issues.  But this warning is
>> non-fatal; the only consequence is that many operations get slower.  But
>> a lack of gc when there are too many packs causes that consequence too
>> (often a much worse slowdown than would be caused by the missing
>> bitmap).
>>
>> So I wonder if it would be better for auto gc to grep gc.log for fatal
>> errors (as opposed to warnings) and only skip running if any are found.
>> Alternately, we could simply put warnings into gc.log.warning and
>> reserve gc.log for fatal errors. I'm not sure which would be simpler.
>
> I am not sure if string matching is really a good idea, as I'd
> assume that these messages are eligible for i18n.

And we can't grep for fatal errors anyway. The problem that led to
329e6e8794 was this line:

    warning: There are too many unreachable loose objects; run 'git prune' to remove them.

which is not fatal.

> 329e6e8794 ("gc: save log from daemonized gc --auto and print it
> next time", 2015-09-19) wanted to notice that auto-gc is not
> making progress and used the presence of error messages as a cue.
> In your case, I think the auto-gc _is_ making progress, reducing
> the number of loose objects in the repository or consolidating many
> packfiles into one

Yeah, the key point is making progress, and to detect that reliably we
need some way for all the commands that git-gc executes to report it
back; git-repack in this particular case, but...

> and the message is only about the fact that
> packing is punting and not producing a bitmap as you asked, which
> is different from not making any progress.  I do not think log vs
> warn is a good criterion to tell them apart, either.
>
> In any case, as the error message asks the user to do, the user
> eventually wants to correct the root cause before removing the
> gc.log; I am not sure report_last_gc_error() is the place to correct
> this in the first place.
>
>> 2. I don't understand what would cause that message.  That is, what bad
>> thing am I doing that I should stop doing?  I've briefly skimmed the
>> code and commit message, but the answer isn't leaping out at me.
>
> Enabling bitmap generation for incremental packing that does not
> cram everything into a single pack is triggering it, I would
> presume.  Perhaps we should ignore -b option in most of the cases
> and enable it only for "repack -a -d -f" codepath?  Or detect that
> we are being run from "gc --auto" and automatically disable -b?

... since we have to make changes down in git-repack for that anyway,
perhaps doing this is better. We can pass --auto (or something) to
repack to tell it about this special caller, so that it only prints
something to stderr in serious cases.

Or we could detect cases where background gc'ing won't work well and
always run it in the foreground (e.g. when bitmap generation is enabled).

> I have a feeling that an approach along that line is closer to the
> real solution than tweaking report_last_gc_error() and trying to
> deduce if we are making any progress.
-- 
Duy


* RE: [PATCH] pack-objects: don't warn about bitmaps on incremental pack
  2016-12-17  4:04           ` Jeff King
@ 2016-12-19 16:03             ` David Turner
  0 siblings, 0 replies; 21+ messages in thread
From: David Turner @ 2016-12-19 16:03 UTC (permalink / raw)
  To: 'Jeff King'; +Cc: git@vger.kernel.org

> diff --git a/builtin/gc.c b/builtin/gc.c
> index 069950d0b4..d3c978c765 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -191,6 +191,11 @@ static void add_repack_all_option(void)
>  	}
>  }
> 
> +static void add_repack_incremental_option(void)
> +{
> +	argv_array_push(&repack, "--no-write-bitmap-index");
> +}
> +
>  static int need_to_gc(void)
>  {
>  	/*
> @@ -208,7 +213,9 @@ static int need_to_gc(void)
>  	 */
>  	if (too_many_packs())
>  		add_repack_all_option();
> -	else if (!too_many_loose_objects())
> +	else if (too_many_loose_objects())
> +		add_repack_incremental_option();
> +	else
>  		return 0;
> 
>  	if (run_hook_le(NULL, "pre-auto-gc", NULL))

Sure, that's fine.


* Re: "disabling bitmap writing, as some objects are not being packed"?
  2016-12-17  7:50   ` "disabling bitmap writing, as some objects are not being packed"? Duy Nguyen
@ 2017-02-08  1:03     ` David Turner
  2017-02-08  6:45       ` Duy Nguyen
  0 siblings, 1 reply; 21+ messages in thread
From: David Turner @ 2017-02-08  1:03 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Git Mailing List, Jeff King

On Sat, 2016-12-17 at 14:50 +0700, Duy Nguyen wrote:
> And we can't grep for fatal errors anyway. The problem that led to
> 329e6e8794 was this line
> 
>     warning: There are too many unreachable loose objects; run 'git prune' to remove them.
> 
> which is not fatal.

So, speaking of that message, I noticed that our git servers were
getting slow again and found that message in gc.log.

I propose to make auto gc not write that message either. Any objections?




* Re: "disabling bitmap writing, as some objects are not being packed"?
  2017-02-08  1:03     ` David Turner
@ 2017-02-08  6:45       ` Duy Nguyen
  2017-02-08  8:24         ` David Turner
  0 siblings, 1 reply; 21+ messages in thread
From: Duy Nguyen @ 2017-02-08  6:45 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, Git Mailing List, Jeff King

On Wed, Feb 8, 2017 at 8:03 AM, David Turner <novalis@novalis.org> wrote:
> On Sat, 2016-12-17 at 14:50 +0700, Duy Nguyen wrote:
>> And we can't grep for fatal errors anyway. The problem that led to
>> 329e6e8794 was this line
>>
>>     warning: There are too many unreachable loose objects; run 'git prune' to remove them.
>>
>> which is not fatal.
>
> So, speaking of that message, I noticed that our git servers were
> getting slow again and found that message in gc.log.
>
> I propose to make auto gc not write that message either. Any objections?

Does that really help? auto gc would run more often, but unreachable
loose objects are still present and potentially make your servers
slow? Should these servers run periodic and explicit gc/prune?
-- 
Duy


* Re: "disabling bitmap writing, as some objects are not being packed"?
  2017-02-08  6:45       ` Duy Nguyen
@ 2017-02-08  8:24         ` David Turner
  2017-02-08  8:37           ` Duy Nguyen
  0 siblings, 1 reply; 21+ messages in thread
From: David Turner @ 2017-02-08  8:24 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Git Mailing List, Jeff King

On Wed, 2017-02-08 at 13:45 +0700, Duy Nguyen wrote:
> On Wed, Feb 8, 2017 at 8:03 AM, David Turner <novalis@novalis.org> wrote:
> > On Sat, 2016-12-17 at 14:50 +0700, Duy Nguyen wrote:
> >> And we can't grep for fatal errors anyway. The problem that led to
> >> 329e6e8794 was this line
> >>
> >>     warning: There are too many unreachable loose objects; run 'git prune' to remove them.
> >>
> >> which is not fatal.
> >
> > So, speaking of that message, I noticed that our git servers were
> > getting slow again and found that message in gc.log.
> >
> > I propose to make auto gc not write that message either. Any objections?
> 
> Does that really help? auto gc would run more often, but unreachable
> loose objects are still present and potentially make your servers
> slow? Should these servers run periodic and explicit gc/prune?

At least pack files wouldn't accumulate.  This is the major cause of
slowdown, since each pack file must be checked for each object.

(And, also, maybe those unreachable loose objects are too new to get
gc'd, but if we retry next week, we'll gc them).




* Re: "disabling bitmap writing, as some objects are not being packed"?
  2017-02-08  8:24         ` David Turner
@ 2017-02-08  8:37           ` Duy Nguyen
  2017-02-08 17:44             ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Duy Nguyen @ 2017-02-08  8:37 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, Git Mailing List, Jeff King

On Wed, Feb 8, 2017 at 3:24 PM, David Turner <novalis@novalis.org> wrote:
> On Wed, 2017-02-08 at 13:45 +0700, Duy Nguyen wrote:
>> On Wed, Feb 8, 2017 at 8:03 AM, David Turner <novalis@novalis.org> wrote:
>> > On Sat, 2016-12-17 at 14:50 +0700, Duy Nguyen wrote:
>> >> And we can't grep for fatal errors anyway. The problem that led to
>> >> 329e6e8794 was this line
>> >>
>> >>     warning: There are too many unreachable loose objects; run 'git prune' to remove them.
>> >>
>> >> which is not fatal.
>> >
>> > So, speaking of that message, I noticed that our git servers were
>> > getting slow again and found that message in gc.log.
>> >
>> > I propose to make auto gc not write that message either. Any objections?
>>
>> Does that really help? auto gc would run more often, but unreachable
>> loose objects are still present and potentially make your servers
>> slow? Should these servers run periodic and explicit gc/prune?
>
> At least pack files wouldn't accumulate.  This is the major cause of
> slowdown, since each pack file must be checked for each object.
>
> (And, also, maybe those unreachable loose objects are too new to get
> gc'd, but if we retry next week, we'll gc them).

I was about to suggest a config option that lets you run auto gc
unconditionally, which, I think, is better than suppressing the
message. Then I found gc.autoDetach. If you set it to false globally,
I think you'll get the behavior you want.
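Concretely, that would be something like (a sketch):

```shell
# Keep auto gc in the foreground so its messages reach stderr
# immediately instead of accumulating in gc.log:
git config --global gc.autoDetach false
```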

On second thought, perhaps gc.autoDetach should default to false if
there's no tty, since its main point is to stop breaking interactive
usage. That would make the server side happy (no tty there).
-- 
Duy


* Re: "disabling bitmap writing, as some objects are not being packed"?
  2017-02-08  8:37           ` Duy Nguyen
@ 2017-02-08 17:44             ` Junio C Hamano
  2017-02-08 19:05               ` David Turner
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2017-02-08 17:44 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: David Turner, Git Mailing List, Jeff King

Duy Nguyen <pclouds@gmail.com> writes:

> On second thought, perhaps gc.autoDetach should default to false if
> there's no tty, since its main point is to stop breaking interactive
> usage. That would make the server side happy (no tty there).

Sounds like an idea, but wouldn't that keep the end-user coming over
the network waiting after accepting a push until the GC completes, I
wonder.  If an impatient user disconnects, would that end up killing
an ongoing GC?  etc.



* Re: "disabling bitmap writing, as some objects are not being packed"?
  2017-02-08 17:44             ` Junio C Hamano
@ 2017-02-08 19:05               ` David Turner
  2017-02-08 19:08                 ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: David Turner @ 2017-02-08 19:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List, Jeff King

On Wed, 2017-02-08 at 09:44 -0800, Junio C Hamano wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
> 
> > On second thought, perhaps gc.autoDetach should default to false if
> > there's no tty, since its main point is to stop breaking interactive
> > usage. That would make the server side happy (no tty there).
> 
> Sounds like an idea, but wouldn't that keep the end-user coming over
> the network waiting after accepting a push until the GC completes, I
> wonder.  If an impatient user disconnects, would that end up killing
> an ongoing GC?  etc.

Regardless, it's impolite to keep the user waiting. So, I think we
should just not write the "too many unreachable loose objects" message
if auto-gc is on.  Does that sound OK?




* Re: "disabling bitmap writing, as some objects are not being packed"?
  2017-02-08 19:05               ` David Turner
@ 2017-02-08 19:08                 ` Jeff King
  2017-02-08 22:14                   ` David Turner
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2017-02-08 19:08 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, Duy Nguyen, Git Mailing List

On Wed, Feb 08, 2017 at 02:05:42PM -0500, David Turner wrote:

> On Wed, 2017-02-08 at 09:44 -0800, Junio C Hamano wrote:
> > Duy Nguyen <pclouds@gmail.com> writes:
> > 
> > > On second thought, perhaps gc.autoDetach should default to false if
> > > there's no tty, since its main point is to stop breaking interactive
> > > usage. That would make the server side happy (no tty there).
> > 
> > Sounds like an idea, but wouldn't that keep the end-user coming over
> > the network waiting after accepting a push until the GC completes, I
> > wonder.  If an impatient user disconnects, would that end up killing
> > an ongoing GC?  etc.
> 
> Regardless, it's impolite to keep the user waiting. So, I think we
> should just not write the "too many unreachable loose objects" message
> if auto-gc is on.  Does that sound OK?

I thought the point of that message was to prevent auto-gc from kicking
in over and over again due to objects that won't actually get pruned.

I wonder if you'd want to either bump the auto-gc object limit, or
possibly reduce the gc.pruneExpire limit to keep this situation from
coming up in the first place (or at least mitigating the amount of time
it's the case).
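
For concreteness, those knobs live in the repository config and could be
tuned along these lines (the values here are picked arbitrarily for
illustration, not a recommendation):

```ini
# $GIT_DIR/config -- illustrative values only
[gc]
	auto = 20000             ; default is 6700; raising it makes loose
	                         ; objects trip auto-gc less often
	pruneExpire = 3.days.ago ; default is 2.weeks.ago; a shorter window
	                         ; lets unreachable objects leave sooner
```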

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: "disabling bitmap writing, as some objects are not being packed"?
  2017-02-08 19:08                 ` Jeff King
@ 2017-02-08 22:14                   ` David Turner
  2017-02-08 23:00                     ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: David Turner @ 2017-02-08 22:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Duy Nguyen, Git Mailing List

On Wed, 2017-02-08 at 14:08 -0500, Jeff King wrote:
> On Wed, Feb 08, 2017 at 02:05:42PM -0500, David Turner wrote:
> 
> > On Wed, 2017-02-08 at 09:44 -0800, Junio C Hamano wrote:
> > > Duy Nguyen <pclouds@gmail.com> writes:
> > > 
> > > > On second thought, perhaps gc.autoDetach should default to false if
> > > > there's no tty, since its main point is to stop breaking interactive
> > > > usage. That would make the server side happy (no tty there).
> > > 
> > > Sounds like an idea, but wouldn't that keep the end-user coming over
> > > the network waiting after accepting a push until the GC completes, I
> > > wonder.  If an impatient user disconnects, would that end up killing
> > > an ongoing GC?  etc.
> > 
> > Regardless, it's impolite to keep the user waiting. So, I think we
> > should just not write the "too many unreachable loose objects" message
> > if auto-gc is on.  Does that sound OK?
> 
> I thought the point of that message was to prevent auto-gc from kicking
> in over and over again due to objects that won't actually get pruned.
> 
> I wonder if you'd want to either bump the auto-gc object limit, or
> possibly reduce the gc.pruneExpire limit to keep this situation from
> coming up in the first place (or at least mitigating the amount of time
> it's the case).

Auto-gc might not succeed in pruning objects, but it will at least
reduce the number of packs, which is super-important for performance.

I think the intent of automatic gc is to have a git repository be
relatively low-maintenance from a server-operator perspective.  (Side
note: it's fairly trivial for a user with push access to mess with the
check simply by pushing a bunch of objects whose shas start with 17).
It seems odd that git gets itself into a state where it refuses to do
any maintenance just because at some point some piece of the maintenance
didn't make progress.
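
(For context on the "17" aside: the check being gamed estimates the
loose-object count by sampling the objects/17 fanout directory and
scaling by 256, roughly like this sketch with a fabricated object
directory standing in for a real one:)

```shell
# Sketch of auto-gc's loose-object estimate: count files in the
# objects/17 fanout directory and scale by 256 (one bucket of 256).
objdir=$(mktemp -d)          # stand-in for .git/objects
mkdir "$objdir/17"
for i in 1 2 3; do : > "$objdir/17/fake$i"; done   # three sampled objects
n=$(find "$objdir/17" -type f | wc -l)
estimate=$((n * 256))
echo "estimated loose objects: $estimate"
rm -rf "$objdir"
```

which is why pushing a batch of objects whose names happen to start
with 17 skews the estimate upward.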

Sure, I could change my configuration, but that doesn't help the other
folks (e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=813084 )
who run into this.

I have three thoughts on this:

Idea 1: when gc --auto would issue this message, instead it could create
a file named gc.too-much-garbage (instead of gc.log), with this message.
If that file exists, and it is less than one day (?) old, then we don't
attempt to do a full gc; instead we just run git repack -A -d.  (If it's
more than one day old, we just delete it and continue anyway).

Idea 2 : Like idea 1, but instead of repacking, just smash the existing
packs together into one big pack.  In other words, don't consider
dangling objects, or recompute deltas.  Twitter has a tool called "git
combine-pack" that does this:
https://github.com/dturner-tw/git/blob/dturner/journal/builtin/combine-pack.c

That's less space-efficient than a true repack, but it's no worse than
having the packs separate, and it's a win for read performance because
there's no need to do a linear search over N packs to find an object.

Idea 3: As I suggested last time, separate fatal and non-fatal errors.
If gc fails because of EIO or something, we probably don't want to touch
the disk anymore. But here, the worst consequence is that we waste some
processing power. And it's better to occasionally waste processing power
in a non-interactive setting than it is to do so when a user will be
blocked on it.  So non-fatal warnings should go to gc.log, and fatal
errors should go to gc.fatal.  gc.log won't block gc from running. I
think this is my preferred option.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: "disabling bitmap writing, as some objects are not being packed"?
  2017-02-08 22:14                   ` David Turner
@ 2017-02-08 23:00                     ` Jeff King
  2017-02-09  0:18                       ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2017-02-08 23:00 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, Duy Nguyen, Git Mailing List

On Wed, Feb 08, 2017 at 05:14:03PM -0500, David Turner wrote:

> > I wonder if you'd want to either bump the auto-gc object limit, or
> > possibly reduce the gc.pruneExpire limit to keep this situation from
> > coming up in the first place (or at least mitigating the amount of time
> > it's the case).
> 
> Auto-gc might not succeed in pruning objects, but it will at least
> reduce the number of packs, which is super-important for performance.

Right, I mean to bump the loose-object limit but keep the
gc.autoPackLimit at 50. If you couple that with setting
transfer.unpackLimit, then each push creates a single pack, and you
repack after 50 pushes.

You don't have to care about loose objects, because you know you only
get them when a "gc" ejects loose objects (so they're not as efficient,
but nothing actually accesses them; they just hang around until their
mtime grace period is up).
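
In config terms, that combination would look something like this
(numbers illustrative):

```ini
# $GIT_DIR/config -- sketch of the setup described above
[gc]
	auto = 100000          ; effectively stop loose objects from
	                       ; triggering auto-gc
	autoPackLimit = 50     ; the default; 50 accumulated packs still
	                       ; trip a repack
[transfer]
	unpackLimit = 1        ; keep every incoming push as its own pack
```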

> I think the intent of automatic gc is to have a git repository be
> relatively low-maintenance from a server-operator perspective.  (Side
> note: it's fairly trivial for a user with push access to mess with the
> check simply by pushing a bunch of objects whose shas start with 17).
> It seems odd that git gets itself into a state where it refuses to do
> any maintenance just because at some point some piece of the maintenance
> didn't make progress.

In my experience, auto-gc has never been a low-maintenance operation on
the server side (and I do think it was primarily designed with clients
in mind).

At GitHub we disable it entirely, and do our own gc based on a throttled
job queue (one reason to throttle is that repacking is memory and I/O
intensive, so you really don't want a bunch of repacks kicking off
all at once). So basically any repo that gets pushed to goes on the
queue, and then we pick the worst cases from the queue based on how
badly they need packing[1].

I wish regular Git were more turn-key in that respect. Maybe it is for
smaller sites, but we certainly didn't find it so. And I don't know that
it's feasible to really share the solution. It's entangled with our
database (to store last-pushed and last-maintenance values for repos)
and our job scheduler.

[1] The "how bad" thing is a heuristic, and we found it's generally
    proportional to the number of bytes stored in objects _outside_ of
    the big "main" pack. So 2 big pushes may need maintenance more
    than 10 tiny pushes, because they have more objects (and our goal
    with maintenance isn't just saving disk space or avoiding the linear
    pack search, but having up-to-date bitmaps and good on-disk deltas
    to make serving fetches as cheap as possible).
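
    That metric is cheap to compute from the filesystem alone; a rough
    sketch (with fabricated pack files standing in for a real
    objects/pack directory):

```shell
# Rough cut of the "bytes outside the main pack" metric:
# total size of all packfiles minus the largest one.
packdir=$(mktemp -d)                              # stand-in for objects/pack
printf '%102400s' '' > "$packdir/pack-main.pack"  # 100 KB "main" pack
printf '%4096s'   '' > "$packdir/pack-a.pack"     # two small push packs
printf '%8192s'   '' > "$packdir/pack-b.pack"
total=0; largest=0
for p in "$packdir"/*.pack; do
    sz=$(( $(wc -c < "$p") ))
    total=$((total + sz))
    if [ "$sz" -gt "$largest" ]; then largest=$sz; fi
done
outside=$((total - largest))
echo "bytes outside main pack: $outside"
rm -rf "$packdir"
```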

> Sure, I could change my configuration, but that doesn't help the other
> folks (e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=813084 )
> who run into this.

Yeah, I'm certainly open to improving Git's defaults. If it's not clear
from the above, I mostly just gave up for a site the size of GitHub. :)

> Idea 1: when gc --auto would issue this message, instead it could create
> a file named gc.too-much-garbage (instead of gc.log), with this message.
> If that file exists, and it is less than one day (?) old, then we don't
> attempt to do a full gc; instead we just run git repack -A -d.  (If it's
> more than one day old, we just delete it and continue anyway).

I kind of wonder if this should apply to _any_ error. I.e., just check
the mtime of gc.log and forcibly remove it when it's older than a day.
You never want to get into a state that will fail to resolve itself
eventually. That might still happen (e.g., corrupt repo), but at the
very least it won't be because Git is too dumb to try again.
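
The mtime check could be as simple as this sketch (the one-day cutoff
and the layout are illustrative; `touch -d` here is just simulating an
old failure):

```shell
# Expire a stale gc.log so the next `git gc --auto` gets a fresh chance.
repo=$(mktemp -d)                        # stand-in for a bare repository
touch -d '2 days ago' "$repo/gc.log"     # pretend a gc failed two days ago
# find -mtime -1 matches files modified within the last 24h; an empty
# result means gc.log is older than a day, so drop it.
if [ -e "$repo/gc.log" ] && [ -z "$(find "$repo/gc.log" -mtime -1)" ]; then
    rm -f "$repo/gc.log"
fi
if [ -e "$repo/gc.log" ]; then expired=no; else expired=yes; fi
echo "stale gc.log expired: $expired"
rm -rf "$repo"
```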

> Idea 2 : Like idea 1, but instead of repacking, just smash the existing
> packs together into one big pack.  In other words, don't consider
> dangling objects, or recompute deltas.  Twitter has a tool called "git
> combine-pack" that does this:
> https://github.com/dturner-tw/git/blob/dturner/journal/builtin/combine-pack.c

We wrote something similar at GitHub, too, but we never ended up using
it in production. We found that with a sane scheduler, it's not too big
a deal to just do maintenance once in a while.

  Also, our original problem was that repos which have gotten out of
  hand (say, 5000 packs) repacked _very_ slowly with a normal repack. So
  a "fast pack" followed by a real pack was a viable way out of that. In
  the end, I just made pack-objects handle this case better, and we
  don't need the fast-pack.

> That's less space-efficient than a true repack, but it's no worse than
> having the packs separate, and it's a win for read performance because
> there's no need to do a linear search over N packs to find an object.

Over the long term you may end up with worse packs, because the true
repack will drop some delta opportunities between objects in the same
pack (reasoning that they weren't made into deltas last time, so it's
not worth trying again). You'd probably need to use "-f" periodically.

This is all speculation, though. We never did it in production, so I was
never able to measure the real impact over time.

> Idea 3: As I suggested last time, separate fatal and non-fatal errors.
> If gc fails because of EIO or something, we probably don't want to touch
> the disk anymore. But here, the worst consequence is that we waste some
> processing power. And it's better to occasionally waste processing power
> in a non-interactive setting than it is to do so when a user will be
> blocked on it.  So non-fatal warnings should go to gc.log, and fatal
> errors should go to gc.fatal.  gc.log won't block gc from running. I
> think this is my preferred option.

This seems like your (1), except that it handles more than one type of
non-fatal error. So I like it much better.

I'm still not sure if it's worth making the fatal/non-fatal distinction.
Doing so is perhaps safer, but it does mean that somebody has to decide
which errors are important enough to block a retry totally, and which
are not. In theory, it would be safe to always _try_ and then the gc
process can decide when something is broken and abort. And all you've
wasted is some processing power each day.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: "disabling bitmap writing, as some objects are not being packed"?
  2017-02-08 23:00                     ` Jeff King
@ 2017-02-09  0:18                       ` Junio C Hamano
  2017-02-09  1:12                         ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2017-02-09  0:18 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, Duy Nguyen, Git Mailing List

Jeff King <peff@peff.net> writes:

> In my experience, auto-gc has never been a low-maintenance operation on
> the server side (and I do think it was primarily designed with clients
> in mind).

I do not think auto-gc was ever tweaked to help server usage at any
point in its history; it was invented strictly to help end-users
(mostly new ones).

> At GitHub we disable it entirely, and do our own gc based on a throttled
> job queue ...
> I wish regular Git were more turn-key in that respect. Maybe it is for
> smaller sites, but we certainly didn't find it so. And I don't know that
> it's feasible to really share the solution. It's entangled with our
> database (to store last-pushed and last-maintenance values for repos)
> and our job scheduler.

Thanks for sharing the insights from the trenches ;-)

> Yeah, I'm certainly open to improving Git's defaults. If it's not clear
> from the above, I mostly just gave up for a site the size of GitHub. :)
>
>> Idea 1: when gc --auto would issue this message, instead it could create
>> a file named gc.too-much-garbage (instead of gc.log), with this message.
>> If that file exists, and it is less than one day (?) old, then we don't
>> attempt to do a full gc; instead we just run git repack -A -d.  (If it's
>> more than one day old, we just delete it and continue anyway).
>
> I kind of wonder if this should apply to _any_ error. I.e., just check
> the mtime of gc.log and forcibly remove it when it's older than a day.
> You never want to get into a state that will fail to resolve itself
> eventually. That might still happen (e.g., corrupt repo), but at the
> very least it won't be because Git is too dumb to try again.

;-)

>> Idea 2 : Like idea 1, but instead of repacking, just smash the existing
>> packs together into one big pack.  In other words, don't consider
>> dangling objects, or recompute deltas.  Twitter has a tool called "git
>> combine-pack" that does this:
>> https://github.com/dturner-tw/git/blob/dturner/journal/builtin/combine-pack.c
>
> We wrote something similar at GitHub, too, but we never ended up using
> it in production. We found that with a sane scheduler, it's not too big
> a deal to just do maintenance once in a while.

Thanks again for this.  I've also been wondering about how effective
a "concatenate packs without paying reachability penalty" would be.

> I'm still not sure if it's worth making the fatal/non-fatal distinction.
> Doing so is perhaps safer, but it does mean that somebody has to decide
> which errors are important enough to block a retry totally, and which
> are not. In theory, it would be safe to always _try_ and then the gc
> process can decide when something is broken and abort. And all you've
> wasted is some processing power each day.

Yup, and somebody or something needs to monitor so that repeated
failures can be dealt with.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: "disabling bitmap writing, as some objects are not being packed"?
  2017-02-09  0:18                       ` Junio C Hamano
@ 2017-02-09  1:12                         ` Jeff King
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff King @ 2017-02-09  1:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, Duy Nguyen, Git Mailing List

On Wed, Feb 08, 2017 at 04:18:25PM -0800, Junio C Hamano wrote:

> > We wrote something similar at GitHub, too, but we never ended up using
> > it in production. We found that with a sane scheduler, it's not too big
> > a deal to just do maintenance once in a while.
> 
> Thanks again for this.  I've also been wondering about how effective
> a "concatenate packs without paying reachability penalty" would be.

For the sake of posterity, I'll include our patch at the end (sorry, not
chunked into nice readable commits; that never existed in the first
place).

> > I'm still not sure if it's worth making the fatal/non-fatal distinction.
> > Doing so is perhaps safer, but it does mean that somebody has to decide
> > which errors are important enough to block a retry totally, and which
> > are not. In theory, it would be safe to always _try_ and then the gc
> > process can decide when something is broken and abort. And all you've
> > wasted is some processing power each day.
> 
> Yup, and somebody or something needs to monitor so that repeated
> failures can be dealt with.

Yes. I think that part is probably outside the scope of Git. But if
auto-gc leaves gc.log lying around, it would be easy to visit each repo
and collect the various failures.

-- >8 --
This is the "pack-fast" patch, for reference. It applies on v2.6.5,
though I had to do some wiggling due to a few of our other custom
patches, so it's possible I introduced new bugs. It compiles, but I
didn't actually re-test the result.  I _think_ the original at least
generated valid packs in all cases.

So I would certainly not recommend anybody run this. It's just a
possible base to work off of if anybody's interested in the topic. I
haven't looked at David's combine-packs at all to see if it is any less
gross. :)

---
 Makefile            |   1 +
 builtin.h           |   1 +
 builtin/pack-fast.c | 618 +++++++++++++++++++++++++++++++++++
 cache.h             |   5 +
 git.c               |   1 +
 pack-bitmap-write.c | 167 +++++++++-
 pack-bitmap.c       |   2 +-
 pack-bitmap.h       |   8 +
 sha1_file.c         |   4 +-
 9 files changed, 792 insertions(+), 15 deletions(-)

diff --git a/Makefile b/Makefile
index 37e2d9e18..524b185ec 100644
--- a/Makefile
+++ b/Makefile
@@ -887,6 +887,7 @@ BUILTIN_OBJS += builtin/mv.o
 BUILTIN_OBJS += builtin/name-rev.o
 BUILTIN_OBJS += builtin/notes.o
 BUILTIN_OBJS += builtin/pack-objects.o
+BUILTIN_OBJS += builtin/pack-fast.o
 BUILTIN_OBJS += builtin/pack-redundant.o
 BUILTIN_OBJS += builtin/pack-refs.o
 BUILTIN_OBJS += builtin/patch-id.o
diff --git a/builtin.h b/builtin.h
index 79aaf0afe..df4e4d668 100644
--- a/builtin.h
+++ b/builtin.h
@@ -95,6 +95,7 @@ extern int cmd_mv(int argc, const char **argv, const char *prefix);
 extern int cmd_name_rev(int argc, const char **argv, const char *prefix);
 extern int cmd_notes(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_objects(int argc, const char **argv, const char *prefix);
+extern int cmd_pack_fast(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_redundant(int argc, const char **argv, const char *prefix);
 extern int cmd_patch_id(int argc, const char **argv, const char *prefix);
 extern int cmd_prune(int argc, const char **argv, const char *prefix);
diff --git a/builtin/pack-fast.c b/builtin/pack-fast.c
new file mode 100644
index 000000000..ad9f5e5f1
--- /dev/null
+++ b/builtin/pack-fast.c
@@ -0,0 +1,618 @@
+#include "builtin.h"
+#include "cache.h"
+#include "pack.h"
+#include "progress.h"
+#include "csum-file.h"
+#include "sha1-lookup.h"
+#include "parse-options.h"
+#include "tempfile.h"
+#include "pack-bitmap.h"
+#include "pack-revindex.h"
+
+static const char *pack_usage[] = {
+	N_("git pack-fast --quiet [options...] [base-name]"),
+	NULL
+};
+
+struct packwriter {
+	struct tempfile *tmp;
+	off_t total;
+	int fd;
+	uint32_t crc32;
+	unsigned do_crc;
+};
+
+static void packwriter_crc32_start(struct packwriter *w)
+{
+	w->crc32 = crc32(0, NULL, 0);
+	w->do_crc = 1;
+}
+
+static uint32_t packwriter_crc32_end(struct packwriter *w)
+{
+	w->do_crc = 0;
+	return w->crc32;
+}
+
+static void packwriter_write(struct packwriter *w, const void *buf, unsigned int count)
+{
+	if (w->do_crc)
+		w->crc32 = crc32(w->crc32, buf, count);
+	write_or_die(w->fd, buf, count);
+	w->total += count;
+}
+
+static off_t packwriter_total(struct packwriter *w)
+{
+	return w->total;
+}
+
+static void packwriter_init(struct packwriter *w)
+{
+	char tmpname[PATH_MAX];
+
+	w->fd = odb_mkstemp(tmpname, sizeof(tmpname), "pack/tmp_pack_XXXXXX");
+	w->total = 0;
+	w->do_crc = 0;
+	w->tmp = xcalloc(1, sizeof(*w->tmp));
+
+	register_tempfile(w->tmp, tmpname);
+}
+
+
+static int progress = 1;
+static struct progress *progress_state;
+static struct pack_idx_option pack_idx_opts;
+static const char *base_name = "pack-fast";
+static int skip_largest;
+static int write_bitmap_index = 1;
+
+static struct packed_git **all_packfiles;
+static unsigned int all_packfiles_nr;
+
+static struct pack_idx_entry **written_list;
+static unsigned int written_nr;
+
+struct write_slab {
+	struct write_slab *next;
+	unsigned int nr;
+
+	struct write_slab_entry {
+		struct pack_idx_entry idx;
+		enum object_type real_type;
+	} entries[];
+};
+
+static struct write_slab *written_slab_root;
+static struct write_slab *written_slab_current;
+
+static void add_to_write_list(
+	const unsigned char *sha1, off_t offset, uint32_t crc32,
+	enum object_type real_type)
+{
+	struct write_slab *slab = written_slab_current;
+	struct write_slab_entry *entry = &(slab->entries[slab->nr++]);
+
+	entry->real_type = real_type;
+	entry->idx.offset = offset;
+	entry->idx.crc32 = crc32;
+	hashcpy(entry->idx.sha1, sha1);
+}
+
+static void preallocate_write_slab(unsigned int num_entries)
+{
+	struct write_slab *slab = xmalloc(
+		sizeof(struct write_slab) +
+		num_entries * sizeof(struct write_slab_entry));
+
+	slab->next = NULL;
+	slab->nr = 0;
+
+	if (!written_slab_current) {
+		written_slab_current = slab;
+		written_slab_root = slab;
+	} else {
+		written_slab_current->next = slab;
+		written_slab_current = slab;
+	}
+}
+
+static struct skipped_object {
+	off_t skipped_offset;
+	off_t real_offset;
+} *skipped_list;
+static unsigned int skipped_nr;
+static unsigned int skipped_alloc;
+
+static void add_to_skipped_list(off_t skipped_offset, off_t real_offset)
+{
+	if (skipped_nr >= skipped_alloc) {
+		skipped_alloc = (skipped_alloc + 32) * 2;
+		REALLOC_ARRAY(skipped_list, skipped_alloc);
+	}
+
+	skipped_list[skipped_nr].skipped_offset = skipped_offset;
+	skipped_list[skipped_nr].real_offset = real_offset;
+	skipped_nr++;
+}
+
+static off_t find_real_offset_for_base(off_t skipped_offset)
+{
+	int lo = 0, hi = skipped_nr;
+	while (lo < hi) {
+		int mi = lo + ((hi - lo) / 2);
+		if (skipped_offset == skipped_list[mi].skipped_offset)
+			return skipped_list[mi].real_offset;
+		if (skipped_offset < skipped_list[mi].skipped_offset)
+			hi = mi;
+		else
+			lo = mi + 1;
+	}
+
+	return 0;
+}
+
+/*
+ * Record the offsets needed in our reused packfile chunks due to
+ * "gaps" where we omitted some objects.
+ */
+static struct reused_chunk {
+	off_t start;
+	off_t offset;
+} *reused_chunks;
+static int reused_chunks_nr;
+static int reused_chunks_alloc;
+
+static void record_reused_object(off_t where, off_t offset)
+{
+	if (reused_chunks_nr && reused_chunks[reused_chunks_nr-1].offset == offset)
+		return;
+
+	ALLOC_GROW(reused_chunks, reused_chunks_nr + 1,
+		   reused_chunks_alloc);
+	reused_chunks[reused_chunks_nr].start = where;
+	reused_chunks[reused_chunks_nr].offset = offset;
+	reused_chunks_nr++;
+}
+
+/*
+ * Binary search to find the chunk that "where" is in. Note
+ * that we're not looking for an exact match, just the first
+ * chunk that contains it (which implicitly ends at the start
+ * of the next chunk).
+ */
+static off_t find_reused_offset(off_t where)
+{
+	int lo = 0, hi = reused_chunks_nr;
+	while (lo < hi) {
+		int mi = lo + ((hi - lo) / 2);
+		if (where == reused_chunks[mi].start)
+			return reused_chunks[mi].offset;
+		if (where < reused_chunks[mi].start)
+			hi = mi;
+		else
+			lo = mi + 1;
+	}
+
+	/*
+	 * The first chunk starts at zero, so we can't have gone below
+	 * there.
+	 */
+	assert(lo);
+	return reused_chunks[lo-1].offset;
+}
+
+static uint32_t nth_packed_object_crc32(const struct packed_git *p, uint32_t nr)
+{
+	const uint32_t *index_crc = p->index_data;
+	index_crc += 2 + 256 + p->num_objects * (20/4) + nr;
+	return ntohl(*index_crc);
+}
+
+static void load_index_or_die(struct packed_git *p)
+{
+	if (open_pack_index(p) < 0)
+		die("failed to open index for '%s'", p->pack_name);
+
+	if (p->index_version != 2)
+		die("unsupported index version %d (fast-pack requires index v2)",
+			p->index_version);
+}
+
+static int sort_pack(const void *a_, const void *b_)
+{
+	struct packed_git *a = *((struct packed_git **)a_);
+	struct packed_git *b = *((struct packed_git **)b_);
+
+	if (a->mtime > b->mtime)
+		return 1;
+	else if (a->mtime == b->mtime)
+		return 0;
+	return -1;
+}
+
+static void find_packfiles(void)
+{
+	struct packed_git *p;
+	unsigned int n;
+
+	prepare_packed_git();
+
+	for (n = 0, p = packed_git; p; p = p->next) {
+		if (p->pack_local)
+			n++;
+	}
+
+	all_packfiles = xcalloc(n, sizeof(struct packed_git *));
+	all_packfiles_nr = n;
+
+	for (n = 0, p = packed_git; p; p = p->next) {
+		if (p->pack_local)
+			all_packfiles[n++] = p;
+	}
+
+	for (n = 1; n < all_packfiles_nr; ++n) {
+		if (all_packfiles[n]->pack_size > all_packfiles[0]->pack_size) {
+			struct packed_git *tmp = all_packfiles[0];
+			all_packfiles[0] = all_packfiles[n];
+			all_packfiles[n] = tmp;
+		}
+	}
+
+	qsort(all_packfiles + 1, all_packfiles_nr - 1, sizeof(struct packed_git *), sort_pack);
+}
+
+static int sha1_index__cmp(const void *a_, const void *b_)
+{
+	struct pack_idx_entry *a = *((struct pack_idx_entry **)a_);
+	struct pack_idx_entry *b = *((struct pack_idx_entry **)b_);
+	return hashcmp(a->sha1, b->sha1);
+}
+
+static const unsigned char *sha1_index__access(size_t pos, void *table)
+{
+	struct pack_idx_entry **index = table;
+	return index[pos]->sha1;
+}
+
+static void sha1_index_update(void)
+{
+	const unsigned int left_nr = written_nr;
+	const unsigned int right_nr = written_slab_current->nr;
+	const unsigned int total_nr = left_nr + right_nr;
+
+	struct pack_idx_entry **left = written_list;
+	struct pack_idx_entry **right = xmalloc(right_nr * sizeof(struct pack_idx_entry *));
+	struct pack_idx_entry **result = xmalloc(total_nr * sizeof(struct pack_idx_entry *));
+
+	unsigned int i, j, n;
+
+	for (j = 0; j < right_nr; ++j)
+		right[j] = (struct pack_idx_entry *)(&written_slab_current->entries[j]);
+
+	qsort(right, right_nr, sizeof(struct pack_idx_entry  *), sha1_index__cmp);
+
+	for (i = j = n = 0; i < left_nr && j < right_nr; ++n) {
+		struct pack_idx_entry *a = left[i];
+		struct pack_idx_entry *b = right[j];
+
+		if (hashcmp(a->sha1, b->sha1) <= 0) {
+			result[n] = a;
+			i++;
+		} else {
+			result[n] = b;
+			j++;
+		}
+	}
+
+	for (; i < left_nr; ++n, ++i)
+		result[n] = left[i];
+
+	for (; j < right_nr; ++n, ++j)
+		result[n] = right[j];
+
+	free(written_list);
+	free(right);
+
+	written_list = result;
+	written_nr = total_nr;
+}
+
+static off_t sha1_index_find_offset(const unsigned char *sha1)
+{
+	int pos = sha1_pos(sha1, written_list, written_nr, sha1_index__access);
+	return (pos < 0) ? 0 : written_list[pos]->offset;
+}
+
+static void copy_pack_data(
+		struct packwriter *w,
+		struct packed_git *p,
+		struct pack_window **w_curs,
+		off_t offset,
+		off_t len)
+{
+	unsigned char *in;
+	unsigned long avail;
+
+	while (len) {
+		in = use_pack(p, w_curs, offset, &avail);
+		if (avail > len)
+			avail = (unsigned long)len;
+		packwriter_write(w, in, avail);
+		offset += avail;
+		len -= avail;
+	}
+}
+
+extern enum object_type packed_to_object_type(
+	struct packed_git *p, off_t obj_offset, enum object_type type,
+	struct pack_window **w_curs, off_t curpos);
+
+static int append_object_1(
+	struct revindex_entry *reventry,
+	struct packwriter *w,
+	struct packed_git *pack,
+	struct pack_window **w_curs,
+	enum object_type *real_type)
+{
+	const off_t offset = reventry[0].offset;
+	const off_t next = reventry[1].offset;
+
+	off_t cur;
+	enum object_type type;
+	unsigned long size;
+
+	record_reused_object(offset, offset - packwriter_total(w));
+
+	cur = offset;
+	type = unpack_object_header(pack, w_curs, &cur, &size);
+	assert(type >= 0);
+
+	if (write_bitmap_index)
+		*real_type = packed_to_object_type(pack, offset, type, w_curs, cur);
+
+	if (type == OBJ_OFS_DELTA) {
+		const off_t base_offset = get_delta_base(pack, w_curs, &cur, type, offset);
+		const off_t real_base_offset = find_real_offset_for_base(base_offset);
+		off_t fixed_offset = 0;
+
+		assert(base_offset != 0);
+
+		if (real_base_offset) {
+			fixed_offset = packwriter_total(w) - real_base_offset;
+		} else {
+			off_t fixup = find_reused_offset(offset) - find_reused_offset(base_offset);
+			if (fixup)
+				fixed_offset = offset - base_offset - fixup;
+		}
+
+		if (fixed_offset) {
+			unsigned char header[10], ofs_header[10];
+			unsigned i, len, ofs_len;
+
+			assert(fixed_offset > 0);
+			len = encode_in_pack_object_header(OBJ_OFS_DELTA, size, header);
+
+			i = sizeof(ofs_header) - 1;
+			ofs_header[i] = fixed_offset & 127;
+			while (fixed_offset >>= 7)
+				ofs_header[--i] = 128 | (--fixed_offset & 127);
+
+			ofs_len = sizeof(ofs_header) - i;
+
+			packwriter_write(w, header, len);
+			packwriter_write(w, ofs_header + sizeof(ofs_header) - ofs_len, ofs_len);
+			copy_pack_data(w, pack, w_curs, cur, next - cur);
+			return 1;
+		}
+
+		/* ...otherwise we have no fixup, and can write it verbatim */
+	}
+
+	copy_pack_data(w, pack, w_curs, offset, next - offset);
+	return 0;
+}
+
+static int copy_packfile(int from, struct packwriter *w)
+{
+	unsigned char buffer[8192];
+	struct stat st;
+	ssize_t to_read;
+
+	if (from < 0 || fstat(from, &st))
+		return -1;
+
+	posix_fadvise(from, 0, st.st_size, POSIX_FADV_SEQUENTIAL);
+	to_read = st.st_size - 20;
+
+	if (progress)
+		fprintf(stderr, "Copying main packfile...");
+
+	while (to_read) {
+		ssize_t r, cap = sizeof(buffer);
+
+		if (cap > to_read)
+			cap = to_read;
+
+		r = xread(from, buffer, cap);
+		if (r < 0)
+			return -1;
+
+		packwriter_write(w, buffer, r);
+		to_read -= r;
+	}
+
+	if (progress)
+		fprintf(stderr, " done.\n");
+	assert(to_read == 0);
+	return 0;
+}
+
+static void write_initial_packfile(struct packed_git *p, struct packwriter *w)
+{
+	unsigned int n;
+	int source_fd = git_open_noatime(p->pack_name);
+
+	if (copy_packfile(source_fd, w) < 0)
+		die_errno("failed to copy '%s'", p->pack_name);
+	close(source_fd);
+
+	load_index_or_die(p);
+	preallocate_write_slab(p->num_objects);
+
+	if (progress)
+		progress_state = start_progress("Indexing main packfile", p->num_objects);
+
+	for (n = 0; n < p->num_objects; ++n) {
+		const unsigned char *sha1 = nth_packed_object_sha1(p, n);
+		const off_t offset = nth_packed_object_offset(p, n);
+		const uint32_t crc32 = nth_packed_object_crc32(p, n);
+		add_to_write_list(sha1, offset, crc32, OBJ_BAD);
+		display_progress(progress_state, n + 1);
+	}
+
+	stop_progress(&progress_state);
+	close_pack_index(p);
+
+	written_list = xmalloc(p->num_objects * sizeof(struct packed_git *));
+	written_nr = p->num_objects;
+	for (n = 0; n < written_nr; ++n)
+		written_list[n] = (struct pack_idx_entry *)(&written_slab_current->entries[n]);
+}
+
+static void append_packfile(struct packed_git *p, struct packwriter *w)
+{
+	struct pack_window *w_curs = NULL;
+	struct pack_revindex *revidx;
+
+	unsigned int n;
+
+	load_index_or_die(p);
+	preallocate_write_slab(p->num_objects);
+	revidx = revindex_for_pack(p);
+
+	if (progress)
+		progress_state = start_progress("Appending packfile", p->num_objects);
+
+	for (n = 0; n < p->num_objects; ++n) {
+		struct revindex_entry *reventry = &revidx->revindex[n];
+		const unsigned char *sha1 = nth_packed_object_sha1(p, reventry[0].nr);
+		const off_t offset_in_pack = sha1_index_find_offset(sha1);
+
+		if (!offset_in_pack) {
+			const off_t offset = packwriter_total(w);
+
+			enum object_type real_type = OBJ_BAD;
+			uint32_t crc32;
+			int rewrite_header;
+
+			packwriter_crc32_start(w);
+			rewrite_header = append_object_1(reventry, w, p, &w_curs, &real_type);
+			crc32 = packwriter_crc32_end(w);
+
+			if (!rewrite_header && crc32 != nth_packed_object_crc32(p, reventry[0].nr))
+				die("crc32 check failed for %s", sha1_to_hex(sha1));
+
+			add_to_write_list(sha1, offset, crc32, real_type);
+		} else {
+			add_to_skipped_list(reventry[0].offset, offset_in_pack);
+		}
+
+		display_progress(progress_state, n + 1);
+	}
+
+	stop_progress(&progress_state);
+	unuse_pack(&w_curs);
+	close_pack_windows(p);
+	close_pack_index(p);
+
+	sha1_index_update();
+	skipped_nr = 0;
+	reused_chunks_nr = 0;
+}
+
+static void write_packs(void)
+{
+	struct packwriter w;
+	unsigned int i;
+
+	packwriter_init(&w);
+	write_initial_packfile(all_packfiles[0], &w);
+
+	for (i = 1; i < all_packfiles_nr; ++i)
+		append_packfile(all_packfiles[i], &w);
+
+	/* finalize pack */
+	{
+		unsigned char sha1[20];
+		struct strbuf tmpname = STRBUF_INIT;
+
+		fixup_pack_header_footer(w.fd, sha1, w.tmp->filename.buf, written_nr, NULL, 0);
+		close(w.fd);
+
+		strbuf_addf(&tmpname, "%s-", base_name);
+
+		finish_tmp_packfile(&tmpname, w.tmp->filename.buf,
+				written_list, written_nr,
+				&pack_idx_opts, sha1);
+
+		if (write_bitmap_index) {
+			strbuf_addf(&tmpname, "%s.bitmap", sha1_to_hex(sha1));
+			bitmap_rewrite_existing(
+				all_packfiles[0],
+				written_list, written_nr,
+				packwriter_total(&w),
+				sha1, tmpname.buf);
+		}
+
+		strbuf_release(&tmpname);
+		puts(sha1_to_hex(sha1));
+	}
+}
+
+void pack_fast_grow_typemaps(struct packed_git *p, struct ewah_bitmap **typemaps)
+{
+	uint32_t n;
+	size_t pos = p->num_objects;
+	struct write_slab *slab = written_slab_root;
+
+	assert(slab->nr == p->num_objects);
+	assert(slab->next);
+	slab = slab->next;
+
+	while (slab) {
+		for (n = 0; n < slab->nr; ++n) {
+			const enum object_type real_type = slab->entries[n].real_type;
+			assert(real_type >= OBJ_COMMIT && real_type <= OBJ_TAG);
+			ewah_set(typemaps[real_type - 1], pos++);
+		}
+		slab = slab->next;
+	}
+}
+
+int cmd_pack_fast(int argc, const char **argv, const char *prefix)
+{
+	struct option pack_fast_options[] = {
+		OPT_SET_INT('q', "quiet", &progress,
+			    N_("do not show progress meter"), 0),
+		OPT_SET_INT(0, "progress", &progress,
+			    N_("show progress meter"), 1),
+		OPT_BOOL(0, "skip-largest", &skip_largest,
+			 N_("do not pack the largest packfile in the repository")),
+		OPT_END(),
+	};
+
+	reset_pack_idx_option(&pack_idx_opts);
+	progress = isatty(2);
+	argc = parse_options(argc, argv, prefix, pack_fast_options,
+			     pack_usage, 0);
+
+	if (argc) {
+		base_name = argv[0];
+		argc--;
+	}
+
+	if (!base_name || argc)
+		usage_with_options(pack_usage, pack_fast_options);
+
+	find_packfiles();
+	write_packs();
+	return 0;
+}
diff --git a/cache.h b/cache.h
index 6f53962bf..1a13961bd 100644
--- a/cache.h
+++ b/cache.h
@@ -1336,6 +1336,11 @@ extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsign
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, unsigned long *);
+extern off_t get_delta_base(struct packed_git *p,
+			    struct pack_window **w_curs,
+			    off_t *curpos,
+			    enum object_type type,
+			    off_t delta_obj_offset);
 
 /*
  * Iterate over the files in the loose-object parts of the object
diff --git a/git.c b/git.c
index 40f9df089..d81bd4469 100644
--- a/git.c
+++ b/git.c
@@ -440,6 +440,7 @@ static struct cmd_struct commands[] = {
 	{ "name-rev", cmd_name_rev, RUN_SETUP },
 	{ "notes", cmd_notes, RUN_SETUP },
 	{ "pack-objects", cmd_pack_objects, RUN_SETUP },
+	{ "pack-fast", cmd_pack_fast, RUN_SETUP },
 	{ "pack-redundant", cmd_pack_redundant, RUN_SETUP },
 	{ "pack-refs", cmd_pack_refs, RUN_SETUP },
 	{ "patch-id", cmd_patch_id },
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index c05d1386a..449715f02 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -505,23 +505,39 @@ void bitmap_writer_set_checksum(unsigned char *sha1)
 	hashcpy(writer.pack_checksum, sha1);
 }
 
+static struct sha1file *bitmap_file_new(char *tmp_file, size_t len)
+{
+	int fd = odb_mkstemp(tmp_file, len, "pack/tmp_bitmap_XXXXXX");
+
+	if (fd < 0)
+		die_errno("unable to create '%s'", tmp_file);
+
+	return sha1fd(fd, tmp_file);
+}
+
+static void bitmap_file_close(struct sha1file *f, const char *tmp_file, const char *dest)
+{
+	sha1close(f, NULL, CSUM_FSYNC);
+
+	if (adjust_shared_perm(tmp_file))
+		die_errno("unable to make temporary bitmap file readable");
+
+	if (rename(tmp_file, dest))
+		die_errno("unable to rename temporary bitmap file to '%s'", dest);
+}
+
 void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
 			  uint16_t options)
 {
-	static char tmp_file[PATH_MAX];
 	static uint16_t default_version = 1;
 	static uint16_t flags = BITMAP_OPT_FULL_DAG;
+	char tmp_file[PATH_MAX];
 	struct sha1file *f;
-
 	struct bitmap_disk_header header;
 
-	int fd = odb_mkstemp(tmp_file, sizeof(tmp_file), "pack/tmp_bitmap_XXXXXX");
-
-	if (fd < 0)
-		die_errno("unable to create '%s'", tmp_file);
-	f = sha1fd(fd, tmp_file);
+	f = bitmap_file_new(tmp_file, sizeof(tmp_file));
 
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
 	header.version = htons(default_version);
@@ -539,11 +555,138 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
 
-	sha1close(f, NULL, CSUM_FSYNC);
+	bitmap_file_close(f, tmp_file, filename);
+}
 
-	if (adjust_shared_perm(tmp_file))
-		die_errno("unable to make temporary bitmap file readable");
+static void *try_load_bitmap(struct packed_git *p, size_t *_size_out)
+{
+	void *reused_bitmap;
+	size_t reused_bitmap_size;
+
+	int fd;
+	struct stat st;
+	char *idx_name;
+
+	idx_name = pack_bitmap_filename(p);
+	fd = git_open_noatime(idx_name);
+	free(idx_name);
+
+	if (fd < 0)
+		return NULL;
+
+	if (fstat(fd, &st)) {
+		close(fd);
+		return NULL;
+	}
+
+	reused_bitmap_size = xsize_t(st.st_size);
+	reused_bitmap = xmmap(NULL, reused_bitmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
+
+	*_size_out = reused_bitmap_size;
+	return reused_bitmap;
+}
+
+extern void pack_fast_grow_typemaps(struct packed_git *p, struct ewah_bitmap **typemaps);
+
+static size_t rewrite_type_maps(struct sha1file *f,
+	struct packed_git *p, unsigned char *original_map, size_t original_size, size_t pos)
+{
+	struct ewah_bitmap *typemaps[4];
+	int r, i;
+
+	for (i = 0; i < 4; ++i) {
+		typemaps[i] = ewah_pool_new();
+		r = ewah_read_mmap(typemaps[i], original_map + pos, original_size - pos);
+		if (r < 0)
+			die("failed to read bitmap index");
+		pos += r;
+	}
+
+	pack_fast_grow_typemaps(p, typemaps);
+
+	for (i = 0; i < 4; ++i) {
+		dump_bitmap(f, typemaps[i]);
+		ewah_pool_free(typemaps[i]);
+	}
+
+	return pos;
+}
+
+static size_t rewrite_bitmaps(struct sha1file *f,
+	struct packed_git *p, unsigned char *original_map, size_t original_size, size_t pos,
+	uint32_t entry_count, struct pack_idx_entry **index, uint32_t index_nr)
+{
+	uint32_t i;
+
+	for (i = 0; i < entry_count; ++i) {
+		const unsigned char *sha1;
+		uint32_t src_idx, src_buffer_len, total_len;
+		int new_idx;
+
+		src_idx = get_be32(original_map + pos);
+		pos += 4;
+
+		sha1 = nth_packed_object_sha1(p, src_idx);
+		new_idx = sha1_pos(sha1, index, index_nr, sha1_access);
+		sha1write_be32(f, (uint32_t)new_idx);
+
+		src_buffer_len = get_be32(original_map + pos + 2 + 4);
+		total_len = (3 * 4) + (src_buffer_len * 8);
+
+		sha1write(f, original_map + pos, 2 + total_len);
+		pos += 2 + total_len;
+
+		if (pos > original_size)
+			die("unexpected end of file");
+	}
+
+	return pos;
+}
+
+void bitmap_rewrite_existing(
+	struct packed_git *p,
+	struct pack_idx_entry **index,
+	uint32_t index_nr,
+	off_t pack_offset,
+	const unsigned char *pack_sha1,
+	const char *filename)
+{
+	char tmp_file[PATH_MAX];
+	struct sha1file *f;
+
+	unsigned char *original_map;
+	size_t original_size, pos = 0;
+	struct bitmap_disk_header header;
+
+	original_map = try_load_bitmap(p, &original_size);
+	if (!original_map || original_size < sizeof(header) + 20)
+		return;
+
+	memcpy(&header, original_map, sizeof(header));
+	hashcpy(header.checksum, pack_sha1);
+
+	if (memcmp(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)) != 0)
+		die("existing bitmap for '%s' is corrupted", p->pack_name);
+
+	if (ntohs(header.version) != 1)
+		die("existing bitmap for '%s' has an unsupported version", p->pack_name);
+
+	f = bitmap_file_new(tmp_file, sizeof(tmp_file));
+
+	sha1write(f, &header, sizeof(header));
+	pos = sizeof(header);
+	pos = rewrite_type_maps(f, p, original_map, original_size, pos);
+	pos = rewrite_bitmaps(f, p, original_map, original_size, pos,
+			ntohl(header.entry_count), index, index_nr);
+
+	if (ntohs(header.options) & BITMAP_OPT_HASH_CACHE) {
+		uint32_t i, zero = 0;
+		sha1write(f, original_map + pos, p->num_objects * 4);
+		for (i = p->num_objects; i < index_nr; ++i)
+			sha1write(f, &zero, 4);
+		pos += (p->num_objects * 4);
+	}
 
-	if (rename(tmp_file, filename))
-		die_errno("unable to rename temporary bitmap file to '%s'", filename);
+	bitmap_file_close(f, tmp_file, filename);
 }
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 637770af8..ee361fa6a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -250,7 +250,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 	return 0;
 }
 
-static char *pack_bitmap_filename(struct packed_git *p)
+char *pack_bitmap_filename(struct packed_git *p)
 {
 	char *idx_name;
 	int len;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 0adcef77b..398523dbb 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -34,6 +34,7 @@ typedef int (*show_reachable_fn)(
 	struct packed_git *found_pack,
 	off_t found_offset);
 
+char *pack_bitmap_filename(struct packed_git *p);
 int prepare_bitmap_git(void);
 void count_bitmap_commit_list(uint32_t *commits, uint32_t *trees, uint32_t *blobs, uint32_t *tags);
 void traverse_bitmap_commit_list(show_reachable_fn show_reachable);
@@ -53,5 +54,12 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
 			  uint16_t options);
+void bitmap_rewrite_existing(
+	struct packed_git *p,
+	struct pack_idx_entry **index,
+	uint32_t index_nr,
+	off_t pack_offset,
+	const unsigned char *pack_sha1,
+	const char *filename);
 
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 72289696d..bcd447f16 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1821,7 +1821,7 @@ unsigned long get_size_from_delta(struct packed_git *p,
 	return get_delta_hdr_size(&data, delta_head+sizeof(delta_head));
 }
 
-static off_t get_delta_base(struct packed_git *p,
+off_t get_delta_base(struct packed_git *p,
 				    struct pack_window **w_curs,
 				    off_t *curpos,
 				    enum object_type type,
@@ -1936,7 +1936,7 @@ static int retry_bad_packed_offset(struct packed_git *p, off_t obj_offset)
 
 #define POI_STACK_PREALLOC 64
 
-static enum object_type packed_to_object_type(struct packed_git *p,
+enum object_type packed_to_object_type(struct packed_git *p,
 					      off_t obj_offset,
 					      enum object_type type,
 					      struct pack_window **w_curs,


Thread overview: 21+ messages
2016-12-16 21:05 "disabling bitmap writing, as some objects are not being packed"? David Turner
2016-12-16 21:27 ` Jeff King
2016-12-16 21:28 ` Junio C Hamano
2016-12-16 21:32   ` Jeff King
2016-12-16 21:40     ` David Turner
2016-12-16 21:49       ` Jeff King
2016-12-16 23:59         ` [PATCH] pack-objects: don't warn about bitmaps on incremental pack David Turner
2016-12-17  4:04           ` Jeff King
2016-12-19 16:03             ` David Turner
2016-12-17  7:50   ` "disabling bitmap writing, as some objects are not being packed"? Duy Nguyen
2017-02-08  1:03     ` David Turner
2017-02-08  6:45       ` Duy Nguyen
2017-02-08  8:24         ` David Turner
2017-02-08  8:37           ` Duy Nguyen
2017-02-08 17:44             ` Junio C Hamano
2017-02-08 19:05               ` David Turner
2017-02-08 19:08                 ` Jeff King
2017-02-08 22:14                   ` David Turner
2017-02-08 23:00                     ` Jeff King
2017-02-09  0:18                       ` Junio C Hamano
2017-02-09  1:12                         ` Jeff King