git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* WIth git-next, writing bitmaps fails when keep files are present
@ 2014-01-23  2:38 Siddharth Agarwal
  2014-01-23 20:36 ` Siddharth Agarwal
  2014-01-23 22:52 ` [PATCH] pack-objects: turn off bitmaps when skipping objects Jeff King
  0 siblings, 2 replies; 27+ messages in thread
From: Siddharth Agarwal @ 2014-01-23  2:38 UTC (permalink / raw
  To: git

Running git-next, writing bitmap indexes fails if a keep file is present 
from an earlier pack.

With git at b139ac2, the following commands demonstrate the problem:

git init test
cd test
touch a
git add a
git commit -m "a"

git repack -ad  # generate a pack file
for f in .git/objects/pack/*.pack; touch ${f/%pack/keep}  # mark it as 
to keep

touch b
git add b
git commit -m "b"
git repack -adb

This fails at the bitmap writing stage with something like:

Counting objects: 2, done.
Delta compression using up to 24 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), done.
fatal: Failed to write bitmap index. Packfile doesn't have full closure 
(object 7388a015938147155b600eaacc59af6e78c75e5a is missing)

In our case we have .keep files lying around from ages ago (possibly due 
to kill -9s run on the server). It also means that running repack -a 
with bitmap writing enabled on a repo becomes problematic if a fetch is 
run concurrently.

Even if we practice good .keep hygiene, this seems like a bug in git 
that should be fixed.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WIth git-next, writing bitmaps fails when keep files are present
  2014-01-23  2:38 WIth git-next, writing bitmaps fails when keep files are present Siddharth Agarwal
@ 2014-01-23 20:36 ` Siddharth Agarwal
  2014-01-23 22:52 ` [PATCH] pack-objects: turn off bitmaps when skipping objects Jeff King
  1 sibling, 0 replies; 27+ messages in thread
From: Siddharth Agarwal @ 2014-01-23 20:36 UTC (permalink / raw
  To: git

On 01/22/2014 06:38 PM, Siddharth Agarwal wrote:
> In our case we have .keep files lying around from ages ago (possibly 
> due to kill -9s run on the server). It also means that running repack 
> -a with bitmap writing enabled on a repo becomes problematic if a 
> fetch is run concurrently.

We briefly discussed locking our repos while the repack was run, but the 
repo that would benefit the most from repacks cannot be locked to pushes 
for even a tenth of the time that repack takes on it.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] pack-objects: turn off bitmaps when skipping objects
  2014-01-23  2:38 WIth git-next, writing bitmaps fails when keep files are present Siddharth Agarwal
  2014-01-23 20:36 ` Siddharth Agarwal
@ 2014-01-23 22:52 ` Jeff King
  2014-01-23 23:45   ` Siddharth Agarwal
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff King @ 2014-01-23 22:52 UTC (permalink / raw
  To: Siddharth Agarwal; +Cc: git

On Wed, Jan 22, 2014 at 06:38:57PM -0800, Siddharth Agarwal wrote:

> Running git-next, writing bitmap indexes fails if a keep file is
> present from an earlier pack.

Right, that's expected.

The bitmap format cannot represent objects that are not present in the
pack. So we cannot write a bitmap index if any object reachable from a
packed commit is omitted from the pack.

We could be nicer and downgrade it to a warning, though. The patch below
does that.

> In our case we have .keep files lying around from ages ago (possibly
> due to kill -9s run on the server).

We ran into that problem at GitHub, too. We just turn off
`--honor-pack-keep` during our repacks, as we never want them on anyway
(and we would prefer to ignore the .keep than to abort the bitmap).

> It also means that running repack -a with bitmap writing enabled on a
> repo becomes problematic if a fetch is run concurrently.

For the most part, no. The .keep file should generally only be set
during the period between indexing the pack and updating the refs (so
while checking connectivity and running hooks). But pack-objects starts
from the ref tips and walks backwards. Until they are updated, it will
not try to pack the objects in the .keep files, as nobody references
them. There are two loopholes, though:

  1. In some instances, a remote may send an object we already have
     (e.g., because it is a blob referenced in an old commit, but newly
     referenced again due to a revert; we do not do a full object
     difference during the protocol negotiation, for reasons of
     efficiency). If that is the case, we may omit it if pack-objects
     starts during the period that the .pack and .keep files exist.

  2. Once the fetch updates the refs, it removes the .keep file. But
     this isn't atomic. A repack which starts between the two may pick
     up the new ref values, but also see the .keep file.

These are both unlikely, but possible on a very busy repository. The
patch below will downgrade each to a warning, rather than aborting the
repack.

So this should just work out of the box with this patch.  But if bitmaps
are important to you (say, you are running a very busy site and want
to make sure you always have bitmaps turned on) and you do not otherwise
care about .keep files, you may want to disable them, too.

-Peff

-- >8 --
Subject: pack-objects: turn off bitmaps when skipping objects

The pack bitmap format requires that we have a single bit
for each object in the pack, and that each object's bitmap
represents its complete set of reachable objects. Therefore
we have no way to represent the bitmap of an object which
references objects outside the pack.

We notice this problem while generating the bitmaps, as we
try to find the offset of a particular object and realize
that we do not have it. In this case we die, and neither the
bitmap nor the pack is generated. This is correct, but
perhaps a little unfriendly. If you have bitmaps turned on
in the config, many repacks will fail which would otherwise
succeed. E.g., incremental repacks, repacks with "-l" when
you have alternates, ".keep" files.

Instead, this patch notices early that we are omitting some
objects from the pack and turns off bitmaps (with a
warning). Note that this is not strictly correct, as it's
possible that the object being omitted is not reachable from
any other object in the pack. In practice, this is almost
never the case, and there are two advantages to doing it
this way:

  1. The code is much simpler, as we do not have to cleanly
     abort the bitmap-generation process midway through.

  2. We do not waste time partially generating bitmaps only
     to find out that some object deep in the history is not
     being packed.

Signed-off-by: Jeff King <peff@peff.net>
---
I tried to keep the warning to an 80-character line without making it
too confusing. Suggestions welcome if it doesn't make sense to people.

 builtin/pack-objects.c  | 12 +++++++++++-
 t/t5310-pack-bitmaps.sh |  5 ++++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8364fbd..76831d9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1000,6 +1000,10 @@ static void create_object_entry(const unsigned char *sha1,
 	entry->no_try_delta = no_try_delta;
 }
 
+static const char no_closure_warning[] = N_(
+"disabling bitmap writing, as some objects are not being packed"
+);
+
 static int add_object_entry(const unsigned char *sha1, enum object_type type,
 			    const char *name, int exclude)
 {
@@ -1010,8 +1014,14 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 	if (have_duplicate_entry(sha1, exclude, &index_pos))
 		return 0;
 
-	if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset))
+	if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) {
+		/* The pack is missing an object, so it will not have closure */
+		if (write_bitmap_index) {
+			warning(_(no_closure_warning));
+			write_bitmap_index = 0;
+		}
 		return 0;
+	}
 
 	create_object_entry(sha1, type, pack_name_hash(name),
 			    exclude, name && no_try_delta(name),
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index d3a3afa..f13525c 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -91,7 +91,10 @@ test_expect_success 'fetch (partial bitmap)' '
 
 test_expect_success 'incremental repack cannot create bitmaps' '
 	test_commit more-1 &&
-	test_must_fail git repack -d
+	find .git/objects/pack -name "*.bitmap" >expect &&
+	git repack -d &&
+	find .git/objects/pack -name "*.bitmap" >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'incremental repack can disable bitmaps' '
-- 
1.8.5.2.500.g8060133

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] pack-objects: turn off bitmaps when skipping objects
  2014-01-23 22:52 ` [PATCH] pack-objects: turn off bitmaps when skipping objects Jeff King
@ 2014-01-23 23:45   ` Siddharth Agarwal
  2014-01-23 23:53     ` Siddharth Agarwal
  2014-01-23 23:56     ` [PATCH] pack-objects: turn off bitmaps when skipping objects Vicent Martí
  0 siblings, 2 replies; 27+ messages in thread
From: Siddharth Agarwal @ 2014-01-23 23:45 UTC (permalink / raw
  To: Jeff King; +Cc: git

On 01/23/2014 02:52 PM, Jeff King wrote:
> Right, that's expected.
>
> The bitmap format cannot represent objects that are not present in the
> pack. So we cannot write a bitmap index if any object reachable from a
> packed commit is omitted from the pack.
>
> We could be nicer and downgrade it to a warning, though. The patch below
> does that.

This makes sense.

>> In our case we have .keep files lying around from ages ago (possibly
>> due to kill -9s run on the server).
> We ran into that problem at GitHub, too. We just turn off
> `--honor-pack-keep` during our repacks, as we never want them on anyway
> (and we would prefer to ignore the .keep than to abort the bitmap).

Yes, we'd prefer to do that too. How do you actually do this, though? I 
don't see a way to pass `--honor-pack-keep` (shouldn't I pass in its 
inverse?) down to `git-pack-objects`.

>> It also means that running repack -a with bitmap writing enabled on a
>> repo becomes problematic if a fetch is run concurrently.
> For the most part, no. The .keep file should generally only be set
> during the period between indexing the pack and updating the refs (so
> while checking connectivity and running hooks). But pack-objects starts
> from the ref tips and walks backwards. Until they are updated, it will
> not try to pack the objects in the .keep files, as nobody references
> them.

The worry is less certain objects not being packed and more the old 
packs being deleted by git repack, isn't it? From the man page for 
git-index-pack:

--keep
Before moving the index into its final destination create an empty .keep 
file for the associated pack file. This option is usually necessary with
--stdin to prevent a simultaneous git repack process from deleting the 
newly constructed pack and index before refs can be updated to use 
objects contained in the pack.

I could be misunderstanding things here, though. From the description in 
the man page it's not clear what the actual failure mode here is.

> There are two loopholes, though:
>
>    1. In some instances, a remote may send an object we already have
>       (e.g., because it is a blob referenced in an old commit, but newly
>       referenced again due to a revert; we do not do a full object
>       difference during the protocol negotiation, for reasons of
>       efficiency). If that is the case, we may omit it if pack-objects
>       starts during the period that the .pack and .keep files exist.
>
>    2. Once the fetch updates the refs, it removes the .keep file. But
>       this isn't atomic. A repack which starts between the two may pick
>       up the new ref values, but also see the .keep file.
>
> These are both unlikely, but possible on a very busy repository. The
> patch below will downgrade each to a warning, rather than aborting the
> repack.
>
> So this should just work out of the box with this patch.  But if bitmaps
> are important to you (say, you are running a very busy site and want
> to make sure you always have bitmaps turned on) and you do not otherwise
> care about .keep files, you may want to disable them, too.

We need to make sure bitmaps are always turned on, but we need to be 
even more certain that pushes don't fail due to races.

> -Peff
>
> -- >8 --
> Subject: pack-objects: turn off bitmaps when skipping objects
>
> The pack bitmap format requires that we have a single bit
> for each object in the pack, and that each object's bitmap
> represents its complete set of reachable objects. Therefore
> we have no way to represent the bitmap of an object which
> references objects outside the pack.
>
> We notice this problem while generating the bitmaps, as we
> try to find the offset of a particular object and realize
> that we do not have it. In this case we die, and neither the
> bitmap nor the pack is generated. This is correct, but
> perhaps a little unfriendly. If you have bitmaps turned on
> in the config, many repacks will fail which would otherwise
> succeed. E.g., incremental repacks, repacks with "-l" when
> you have alternates, ".keep" files.
>
> Instead, this patch notices early that we are omitting some
> objects from the pack and turns off bitmaps (with a
> warning). Note that this is not strictly correct, as it's
> possible that the object being omitted is not reachable from
> any other object in the pack. In practice, this is almost
> never the case, and there are two advantages to doing it
> this way:
>
>    1. The code is much simpler, as we do not have to cleanly
>       abort the bitmap-generation process midway through.
>
>    2. We do not waste time partially generating bitmaps only
>       to find out that some object deep in the history is not
>       being packed.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I tried to keep the warning to an 80-character line without making it
> too confusing. Suggestions welcome if it doesn't make sense to people.
>
>   builtin/pack-objects.c  | 12 +++++++++++-
>   t/t5310-pack-bitmaps.sh |  5 ++++-
>   2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 8364fbd..76831d9 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1000,6 +1000,10 @@ static void create_object_entry(const unsigned char *sha1,
>   	entry->no_try_delta = no_try_delta;
>   }
>   
> +static const char no_closure_warning[] = N_(
> +"disabling bitmap writing, as some objects are not being packed"
> +);
> +
>   static int add_object_entry(const unsigned char *sha1, enum object_type type,
>   			    const char *name, int exclude)
>   {
> @@ -1010,8 +1014,14 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
>   	if (have_duplicate_entry(sha1, exclude, &index_pos))
>   		return 0;
>   
> -	if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset))
> +	if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) {
> +		/* The pack is missing an object, so it will not have closure */
> +		if (write_bitmap_index) {
> +			warning(_(no_closure_warning));
> +			write_bitmap_index = 0;
> +		}
>   		return 0;
> +	}
>   
>   	create_object_entry(sha1, type, pack_name_hash(name),
>   			    exclude, name && no_try_delta(name),
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index d3a3afa..f13525c 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -91,7 +91,10 @@ test_expect_success 'fetch (partial bitmap)' '
>   
>   test_expect_success 'incremental repack cannot create bitmaps' '
>   	test_commit more-1 &&
> -	test_must_fail git repack -d
> +	find .git/objects/pack -name "*.bitmap" >expect &&
> +	git repack -d &&
> +	find .git/objects/pack -name "*.bitmap" >actual &&
> +	test_cmp expect actual
>   '
>   
>   test_expect_success 'incremental repack can disable bitmaps' '

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] pack-objects: turn off bitmaps when skipping objects
  2014-01-23 23:45   ` Siddharth Agarwal
@ 2014-01-23 23:53     ` Siddharth Agarwal
  2014-01-24  2:28       ` Jeff King
  2014-01-23 23:56     ` [PATCH] pack-objects: turn off bitmaps when skipping objects Vicent Martí
  1 sibling, 1 reply; 27+ messages in thread
From: Siddharth Agarwal @ 2014-01-23 23:53 UTC (permalink / raw
  To: Jeff King; +Cc: git

On 01/23/2014 03:45 PM, Siddharth Agarwal wrote:
>
> The worry is less certain objects not being packed and more the old 
> packs being deleted by git repack, isn't it? From the man page for 
> git-index-pack:

This should probably be "new pack" and not "old packs", I guess. Not 
knowing much about how this actually works, I'm assuming the scenario 
here is something like:

(1) git receive-pack receives a pack P.pack and writes it to disk
(2) git index-pack runs on P.pack
(3) git repack runs separately, finds pack P.pack with no refs pointing 
to it, and deletes it
(4) everything goes wrong

With a keep file, this would be averted because

(1) git receive-pack receives a pack P.pack and writes it to disk
(2) git index-pack writes a keep file for P.pack, called P.keep
(3) git repack runs separately, finds pack P.pack with a keep file, 
doesn't touch it
(4) git index-pack finishes, and something updates refs to point to 
P.pack and deletes P.keep

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] pack-objects: turn off bitmaps when skipping objects
  2014-01-23 23:45   ` Siddharth Agarwal
  2014-01-23 23:53     ` Siddharth Agarwal
@ 2014-01-23 23:56     ` Vicent Martí
  2014-01-24  2:26       ` Jeff King
  1 sibling, 1 reply; 27+ messages in thread
From: Vicent Martí @ 2014-01-23 23:56 UTC (permalink / raw
  To: Siddharth Agarwal; +Cc: Jeff King, git

On Fri, Jan 24, 2014 at 12:45 AM, Siddharth Agarwal <sid0@fb.com> wrote:
> Yes, we'd prefer to do that too. How do you actually do this, though? I
> don't see a way to pass `--honor-pack-keep` (shouldn't I pass in its
> inverse?) down to `git-pack-objects`.

We run with this patch in production, it may be of use to you:
https://gist.github.com/vmg/8589317

In fact, it may be worth upstreaming too. I'll kindly ask peff to do
it when he has a moment.

Apologies for not attaching the patch inline, the GMail web UI doesn't
mix well with patch workflow.

Cheers,
vmg

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] pack-objects: turn off bitmaps when skipping objects
  2014-01-23 23:56     ` [PATCH] pack-objects: turn off bitmaps when skipping objects Vicent Martí
@ 2014-01-24  2:26       ` Jeff King
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff King @ 2014-01-24  2:26 UTC (permalink / raw
  To: Vicent Martí; +Cc: Siddharth Agarwal, git

On Fri, Jan 24, 2014 at 12:56:17AM +0100, Vicent Martí wrote:

> On Fri, Jan 24, 2014 at 12:45 AM, Siddharth Agarwal <sid0@fb.com> wrote:
> > Yes, we'd prefer to do that too. How do you actually do this, though? I
> > don't see a way to pass `--honor-pack-keep` (shouldn't I pass in its
> > inverse?) down to `git-pack-objects`.
> 
> We run with this patch in production, it may be of use to you:
> https://gist.github.com/vmg/8589317
> 
> In fact, it may be worth upstreaming too. I'll kindly ask peff to do
> it when he has a moment.

I was actually looking at it earlier when I sent this message. The
tricky thing about the patch is that it turns off --honor-pack-keep, but
does _not_ teach git-repack to clean up the .keep file.

Which I think is the right and safe thing to do, as otherwise you might
blow away a pack with .keep, even though you did not just pack its
objects (i.e., because it was written by a fetch or push which did not
yet update the refs). So the safe thing is to actually duplicate those
objects, leave the .keep pack around, and then assume it will get
cleaned up on the next repack.

If you _do_ have a stale .keep file, though, then that stale pack will
hang around forever (presumably with its objects duplicated in the
"real" pack).

So I think the patch is doing the right thing, but I was still figuring
out how to explain it (and I hope I just did). I'll post it with a full
commit message tomorrow.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] pack-objects: turn off bitmaps when skipping objects
  2014-01-23 23:53     ` Siddharth Agarwal
@ 2014-01-24  2:28       ` Jeff King
  2014-01-24  2:44         ` Siddharth Agarwal
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2014-01-24  2:28 UTC (permalink / raw
  To: Siddharth Agarwal; +Cc: git

On Thu, Jan 23, 2014 at 03:53:28PM -0800, Siddharth Agarwal wrote:

> On 01/23/2014 03:45 PM, Siddharth Agarwal wrote:
> >
> >The worry is less certain objects not being packed and more the old
> >packs being deleted by git repack, isn't it? From the man page for
> >git-index-pack:
> 
> This should probably be "new pack" and not "old packs", I guess. Not
> knowing much about how this actually works, I'm assuming the scenario
> here is something like:
> 
> (1) git receive-pack receives a pack P.pack and writes it to disk
> (2) git index-pack runs on P.pack
> (3) git repack runs separately, finds pack P.pack with no refs
> pointing to it, and deletes it
> (4) everything goes wrong
> 
> With a keep file, this would be averted because
> 
> (1) git receive-pack receives a pack P.pack and writes it to disk
> (2) git index-pack writes a keep file for P.pack, called P.keep
> (3) git repack runs separately, finds pack P.pack with a keep file,
> doesn't touch it
> (4) git index-pack finishes, and something updates refs to point to
> P.pack and deletes P.keep

I think your understanding is accurate here. So we want repack to
respect keep files for deletion, but we _not_ necessarily want
pack-objects to avoid packing an object just because it's in a pack
marked by .keep (see my other email).

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] pack-objects: turn off bitmaps when skipping objects
  2014-01-24  2:28       ` Jeff King
@ 2014-01-24  2:44         ` Siddharth Agarwal
  2014-01-28  6:09           ` [PATCH] repack: add `repack.honorpackkeep` config var Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Siddharth Agarwal @ 2014-01-24  2:44 UTC (permalink / raw
  To: Jeff King; +Cc: git

On 01/23/2014 06:28 PM, Jeff King wrote:
> I think your understanding is accurate here. So we want repack to
> respect keep files for deletion, but we _not_ necessarily want
> pack-objects to avoid packing an object just because it's in a pack
> marked by .keep (see my other email).

Yes, that makes sense and sounds pretty safe.

So the right solution for us probably is to apply the patch Vicent 
posted, set repack.honorpackkeep to false, and also have a cron job that 
cleans up stale .keep files so that subsequent repacks clean it up.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] repack: add `repack.honorpackkeep` config var
  2014-01-24  2:44         ` Siddharth Agarwal
@ 2014-01-28  6:09           ` Jeff King
  2014-01-28  9:21             ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2014-01-28  6:09 UTC (permalink / raw
  To: Siddharth Agarwal; +Cc: Vicent Marti, Junio C Hamano, git

On Thu, Jan 23, 2014 at 06:44:43PM -0800, Siddharth Agarwal wrote:

> On 01/23/2014 06:28 PM, Jeff King wrote:
> >I think your understanding is accurate here. So we want repack to
> >respect keep files for deletion, but we _not_ necessarily want
> >pack-objects to avoid packing an object just because it's in a pack
> >marked by .keep (see my other email).
> 
> Yes, that makes sense and sounds pretty safe.
> 
> So the right solution for us probably is to apply the patch Vicent
> posted, set repack.honorpackkeep to false, and also have a cron job
> that cleans up stale .keep files so that subsequent repacks clean it
> up.

Yes, that matches what we do at GitHub.

Here's Vicent's patch, with documentation and an expanded commit
message. I think it should be suitable for upstream git.

-- >8 --
From: Vicent Marti <tanoku@gmail.com>
Subject: repack: add `repack.honorpackkeep` config var

The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.

However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.

Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:

  1. The pushed .keep pack contains objects that were
     already in the repository (e.g., blobs due to a revert of
     an old commit).

  2. Receive-pack updates the refs, making the objects
     reachable, but before it removes the .keep file, the
     repack runs.

In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.

This patch introduces an option to disable the
`--honor-pack-keep` option.  It is not triggered by default,
even when pack.writeBitmaps is turned on, because its use
depends on your overall packing strategy and use of .keep
files.

Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.

Signed-off-by: Jeff King <peff@peff.net>
---
Intended for the jk/pack-bitmap topic.

 Documentation/config.txt | 8 ++++++++
 builtin/repack.c         | 8 +++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 947e6f8..5036a10 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2128,6 +2128,14 @@ repack.usedeltabaseoffset::
 	"false" and repack. Access from old Git versions over the
 	native protocol are unaffected by this option.
 
+repack.honorPackKeep::
+	If set to false, include objects in `.keep` files when repacking
+	via `git repack`. Note that we still do not delete `.keep` packs
+	after `pack-objects` finishes. This means that we may duplicate
+	objects, but this makes the option safe to use when there are
+	concurrent pushes or fetches. This option is generally only
+	useful if you have set `pack.writeBitmaps`. Defaults to true.
+
 rerere.autoupdate::
 	When set to true, `git-rerere` updates the index with the
 	resulting contents after it cleanly resolves conflicts using
diff --git a/builtin/repack.c b/builtin/repack.c
index a9c4593..585c41d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,6 +9,7 @@
 #include "argv-array.h"
 
 static int delta_base_offset = 1;
+static int honor_pack_keep = 1;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, void *cb)
 		delta_base_offset = git_config_bool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "repack.honorpackkeep")) {
+		honor_pack_keep = git_config_bool(var, value);
+		return 0;
+	}
 	return git_default_config(var, value, cb);
 }
 
@@ -190,10 +195,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	argv_array_push(&cmd_args, "pack-objects");
 	argv_array_push(&cmd_args, "--keep-true-parents");
-	argv_array_push(&cmd_args, "--honor-pack-keep");
 	argv_array_push(&cmd_args, "--non-empty");
 	argv_array_push(&cmd_args, "--all");
 	argv_array_push(&cmd_args, "--reflog");
+	if (honor_pack_keep)
+		argv_array_push(&cmd_args, "--honor-pack-keep");
 	if (window)
 		argv_array_pushf(&cmd_args, "--window=%u", window);
 	if (window_memory)
-- 
1.8.5.2.500.g8060133

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-01-28  6:09           ` [PATCH] repack: add `repack.honorpackkeep` config var Jeff King
@ 2014-01-28  9:21             ` Junio C Hamano
  2014-02-24  8:24               ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2014-01-28  9:21 UTC (permalink / raw
  To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git

Jeff King <peff@peff.net> writes:

> The git-repack command always passes `--honor-pack-keep`
> to pack-objects. This has traditionally been a good thing,
> as we do not want to duplicate those objects in a new pack,
> and we are not going to delete the old pack.
> ...
> Note that this option just disables the pack-objects
> behavior. We still leave packs with a .keep in place, as we
> do not necessarily know that we have duplicated all of their
> objects.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Intended for the jk/pack-bitmap topic.

Two comments.

 - It seems that this adds a configuration variable that cannot be
   countermanded from the command line. It has to come with either a
   very good justification in the documentation describing why it is
   a bad idea to even allow overriding from the command line in a
   repository that sets it, or a command line option to let the
   users override it. I personally prefer the latter, because that
   will be one less thing for users to remember (i.e. "usually you
   can override the configured default from the command line, but
   this variable cannot be because of these very good reasons").

 - In the context of "pack-objects", the name "--honor-pack-keep"
   makes sense; it is understood that pack-objects will _not_ remove
   kept packfile, so "honoring" can only mean "do not attempt to
   pick objects out of kept packs to add to the pack being
   generated." and there is no room for --no-honor-pack-keep to be
   mistaken as "you canremove the ones marked to be kept after
   saving the still-used objects in it away."

   But does the same name make sense in the context of "repack"?

Thanks. 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-01-28  9:21             ` Junio C Hamano
@ 2014-02-24  8:24               ` Jeff King
  2014-02-24 19:10                 ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2014-02-24  8:24 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git

On Tue, Jan 28, 2014 at 01:21:43AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The git-repack command always passes `--honor-pack-keep`
> > to pack-objects. This has traditionally been a good thing,
> > as we do not want to duplicate those objects in a new pack,
> > and we are not going to delete the old pack.
> > ...
> > Note that this option just disables the pack-objects
> > behavior. We still leave packs with a .keep in place, as we
> > do not necessarily know that we have duplicated all of their
> > objects.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > Intended for the jk/pack-bitmap topic.
> 
> Two comments.

Sorry, this one slipped through the cracks. Here's a re-roll addressing
your comments.

>  - It seems that this adds a configuration variable that cannot be
>    countermanded from the command line. It has to come with either a
>    very good justification in the documentation describing why it is
>    a bad idea to even allow overriding from the command line in a
>    repository that sets it, or a command line option to let the
>    users override it. I personally prefer the latter, because that
>    will be one less thing for users to remember (i.e. "usually you
>    can override the configured default from the command line, but
>    this variable cannot be because of these very good reasons").

It was less "it is a bad idea to override" and more "I cannot conceive
of any good reason to override". And since you can always use "git -c",
I didn't think it was worth cluttering repack's options.

However, I suppose if one were explicitly bitmapping a single invocation
with `git repack -adb`, it might make sense to use it on the command
line. Fixed in the re-roll.

>  - In the context of "pack-objects", the name "--honor-pack-keep"
>    makes sense; it is understood that pack-objects will _not_ remove
>    kept packfile, so "honoring" can only mean "do not attempt to
>    pick objects out of kept packs to add to the pack being
>    generated." and there is no room for --no-honor-pack-keep to be
>    mistaken as "you canremove the ones marked to be kept after
>    saving the still-used objects in it away."
> 
>    But does the same name make sense in the context of "repack"?

I think the distinction you are making is to capture the second second
from the docs:

  If set to false, include objects in `.keep` files when repacking via
  `git repack`. Note that we still do not delete `.keep` packs after
  `pack-objects` finishes.

The best name I could come up with is "--pack-keep-objects", since that
is literally what it is doing. I'm not wild about the name because it is
easy to read "keep" as a verb (and "pack" as a noun). I think it's OK,
but suggestions are welcome.

-- >8 --
From: Vicent Marti <tanoku@gmail.com>
Subject: repack: add `repack.packKeepObjects` config var

The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.

However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.

Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:

  1. The pushed .keep pack contains objects that were
     already in the repository (e.g., blobs due to a revert of
     an old commit).

  2. Receive-pack updates the refs, making the objects
     reachable, but before it removes the .keep file, the
     repack runs.

In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.

This patch introduces an option to disable the
`--honor-pack-keep` option.  It is not triggered by default,
even when pack.writeBitmaps is turned on, because its use
depends on your overall packing strategy and use of .keep
files.

Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.

Signed-off-by: Jeff King <peff@peff.net>
---
I added a test, too, mostly to make sure I didn't bungle the option,
since it's negated from its former name.

 Documentation/config.txt     |  5 +++++
 Documentation/git-repack.txt |  8 ++++++++
 builtin/repack.c             | 10 +++++++++-
 t/t7700-repack.sh            | 16 ++++++++++++++++
 4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index becbade..8ad081e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2136,6 +2136,11 @@ repack.usedeltabaseoffset::
 	"false" and repack. Access from old Git versions over the
 	native protocol are unaffected by this option.
 
+repack.packKeepObjects::
+	If set to true, makes `git repack` act as if
+	`--pack-keep-objects` was passed. See linkgit:git-repack[1] for
+	details. Defaults to false.
+
 rerere.autoupdate::
 	When set to true, `git-rerere` updates the index with the
 	resulting contents after it cleanly resolves conflicts using
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 002cfd5..0c1ffbd 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -117,6 +117,14 @@ other objects in that pack they already have locally.
 	must be able to refer to all reachable objects. This option
 	overrides the setting of `pack.writebitmaps`.
 
+--pack-keep-objects::
+	Include objects in `.keep` files when repacking.  Note that we
+	still do not delete `.keep` packs after `pack-objects` finishes.
+	This means that we may duplicate objects, but this makes the
+	option safe to use when there are concurrent pushes or fetches.
+	This option is generally only useful if you are writing bitmaps
+	with `-b` or `pack.writebitmaps`, as it ensures that the
+	bitmapped packfile has the necessary objects.
 
 Configuration
 -------------
diff --git a/builtin/repack.c b/builtin/repack.c
index 49f5857..0785a4e 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,6 +9,7 @@
 #include "argv-array.h"
 
 static int delta_base_offset = 1;
+static int pack_keep_objects;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, void *cb)
 		delta_base_offset = git_config_bool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "repack.packkeepobjects")) {
+		pack_keep_objects = git_config_bool(var, value);
+		return 0;
+	}
 	return git_default_config(var, value, cb);
 }
 
@@ -175,6 +180,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("limits the maximum delta depth")),
 		OPT_STRING(0, "max-pack-size", &max_pack_size, N_("bytes"),
 				N_("maximum size of each packfile")),
+		OPT_BOOL(0, "pack-keep-objects", &pack_keep_objects,
+				N_("repack objects in packs marked with .keep")),
 		OPT_END()
 	};
 
@@ -190,7 +197,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	argv_array_push(&cmd_args, "pack-objects");
 	argv_array_push(&cmd_args, "--keep-true-parents");
-	argv_array_push(&cmd_args, "--honor-pack-keep");
+	if (!pack_keep_objects)
+		argv_array_push(&cmd_args, "--honor-pack-keep");
 	argv_array_push(&cmd_args, "--non-empty");
 	argv_array_push(&cmd_args, "--all");
 	argv_array_push(&cmd_args, "--reflog");
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index b45bd1e..13ca93c 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -35,6 +35,22 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 	test -z "$found_duplicate_object"
 '
 
+test_expect_success '--pack-keep-objects duplicates objects' '
+	# build on $objsha1, $packsha1, and .keep state from previous
+	git repack -Adl --pack-keep-objects &&
+	test_when_finished "found_duplicate_object=" &&
+	for p in .git/objects/pack/*.idx; do
+		idx=$(basename $p)
+		test "pack-$packsha1.idx" = "$idx" && continue
+		if git verify-pack -v $p | egrep "^$objsha1"; then
+			found_duplicate_object=1
+			echo "DUPLICATE OBJECT FOUND"
+			break
+		fi
+	done &&
+	test "$found_duplicate_object" = 1
+'
+
 test_expect_success 'loose objects in alternate ODB are not repacked' '
 	mkdir alt_objects &&
 	echo `pwd`/alt_objects > .git/objects/info/alternates &&
-- 
1.8.5.2.500.g8060133

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-02-24  8:24               ` Jeff King
@ 2014-02-24 19:10                 ` Junio C Hamano
  2014-02-26 10:13                   ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2014-02-24 19:10 UTC (permalink / raw
  To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git

Jeff King <peff@peff.net> writes:

> Sorry, this one slipped through the cracks. Here's a re-roll addressing
> your comments.
> ...
>>  - In the context of "pack-objects", the name "--honor-pack-keep"
>>    makes sense; it is understood that pack-objects will _not_ remove
>>    kept packfile, so "honoring" can only mean "do not attempt to
>>    pick objects out of kept packs to add to the pack being
>>    generated." and there is no room for --no-honor-pack-keep to be
>>    mistaken as "you canremove the ones marked to be kept after
>>    saving the still-used objects in it away."
>> 
>>    But does the same name make sense in the context of "repack"?
>
> I think the distinction you are making is to capture the second second
> from the docs:
>
>   If set to false, include objects in `.keep` files when repacking via
>   `git repack`. Note that we still do not delete `.keep` packs after
>   `pack-objects` finishes.
>
> The best name I could come up with is "--pack-keep-objects", since that
> is literally what it is doing. I'm not wild about the name because it is
> easy to read "keep" as a verb (and "pack" as a noun). I think it's OK,
> but suggestions are welcome.

pack-kept-objects then?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-02-24 19:10                 ` Junio C Hamano
@ 2014-02-26 10:13                   ` Jeff King
  2014-02-26 20:30                     ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2014-02-26 10:13 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git

On Mon, Feb 24, 2014 at 11:10:49AM -0800, Junio C Hamano wrote:

> > The best name I could come up with is "--pack-keep-objects", since that
> > is literally what it is doing. I'm not wild about the name because it is
> > easy to read "keep" as a verb (and "pack" as a noun). I think it's OK,
> > but suggestions are welcome.
> 
> pack-kept-objects then?

Hmm. That does address my point above, but somehow the word "kept" feels
awkward to me. I'm ambivalent between the two.

Here's the "kept" version if you want to apply that.

-- >8 --
From: Vicent Marti <tanoku@gmail.com>
Subject: [PATCH] repack: add `repack.packKeptObjects` config var

The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.

However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.

Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:

  1. The pushed .keep pack contains objects that were
     already in the repository (e.g., blobs due to a revert of
     an old commit).

  2. Receive-pack updates the refs, making the objects
     reachable, but before it removes the .keep file, the
     repack runs.

In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.

This patch introduces an option to disable the
`--honor-pack-keep` option.  It is not triggered by default,
even when pack.writeBitmaps is turned on, because its use
depends on your overall packing strategy and use of .keep
files.

Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/config.txt     |  5 +++++
 Documentation/git-repack.txt |  8 ++++++++
 builtin/repack.c             | 10 +++++++++-
 t/t7700-repack.sh            | 16 ++++++++++++++++
 4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index becbade..3a3d84f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2136,6 +2136,11 @@ repack.usedeltabaseoffset::
 	"false" and repack. Access from old Git versions over the
 	native protocol are unaffected by this option.
 
+repack.packKeptObjects::
+	If set to true, makes `git repack` act as if
+	`--pack-kept-objects` was passed. See linkgit:git-repack[1] for
+	details. Defaults to false.
+
 rerere.autoupdate::
 	When set to true, `git-rerere` updates the index with the
 	resulting contents after it cleanly resolves conflicts using
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 002cfd5..4786a78 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -117,6 +117,14 @@ other objects in that pack they already have locally.
 	must be able to refer to all reachable objects. This option
 	overrides the setting of `pack.writebitmaps`.
 
+--pack-kept-objects::
+	Include objects in `.keep` files when repacking.  Note that we
+	still do not delete `.keep` packs after `pack-objects` finishes.
+	This means that we may duplicate objects, but this makes the
+	option safe to use when there are concurrent pushes or fetches.
+	This option is generally only useful if you are writing bitmaps
+	with `-b` or `pack.writebitmaps`, as it ensures that the
+	bitmapped packfile has the necessary objects.
 
 Configuration
 -------------
diff --git a/builtin/repack.c b/builtin/repack.c
index 49f5857..49947b2 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,6 +9,7 @@
 #include "argv-array.h"
 
 static int delta_base_offset = 1;
+static int pack_kept_objects;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, void *cb)
 		delta_base_offset = git_config_bool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "repack.packkeptobjects")) {
+		pack_kept_objects = git_config_bool(var, value);
+		return 0;
+	}
 	return git_default_config(var, value, cb);
 }
 
@@ -175,6 +180,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("limits the maximum delta depth")),
 		OPT_STRING(0, "max-pack-size", &max_pack_size, N_("bytes"),
 				N_("maximum size of each packfile")),
+		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
+				N_("repack objects in packs marked with .keep")),
 		OPT_END()
 	};
 
@@ -190,7 +197,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	argv_array_push(&cmd_args, "pack-objects");
 	argv_array_push(&cmd_args, "--keep-true-parents");
-	argv_array_push(&cmd_args, "--honor-pack-keep");
+	if (!pack_kept_objects)
+		argv_array_push(&cmd_args, "--honor-pack-keep");
 	argv_array_push(&cmd_args, "--non-empty");
 	argv_array_push(&cmd_args, "--all");
 	argv_array_push(&cmd_args, "--reflog");
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index b45bd1e..f8431a8 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -35,6 +35,22 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 	test -z "$found_duplicate_object"
 '
 
+test_expect_success '--pack-kept-objects duplicates objects' '
+	# build on $objsha1, $packsha1, and .keep state from previous
+	git repack -Adl --pack-kept-objects &&
+	test_when_finished "found_duplicate_object=" &&
+	for p in .git/objects/pack/*.idx; do
+		idx=$(basename $p)
+		test "pack-$packsha1.idx" = "$idx" && continue
+		if git verify-pack -v $p | egrep "^$objsha1"; then
+			found_duplicate_object=1
+			echo "DUPLICATE OBJECT FOUND"
+			break
+		fi
+	done &&
+	test "$found_duplicate_object" = 1
+'
+
 test_expect_success 'loose objects in alternate ODB are not repacked' '
 	mkdir alt_objects &&
 	echo `pwd`/alt_objects > .git/objects/info/alternates &&
-- 
1.8.5.2.500.g8060133

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-02-26 10:13                   ` Jeff King
@ 2014-02-26 20:30                     ` Junio C Hamano
  2014-02-27 11:27                       ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2014-02-26 20:30 UTC (permalink / raw
  To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git

Jeff King <peff@peff.net> writes:

> On Mon, Feb 24, 2014 at 11:10:49AM -0800, Junio C Hamano wrote:
>
>> > The best name I could come up with is "--pack-keep-objects", since that
>> > is literally what it is doing. I'm not wild about the name because it is
>> > easy to read "keep" as a verb (and "pack" as a noun). I think it's OK,
>> > but suggestions are welcome.
>> 
>> pack-kept-objects then?
>
> Hmm. That does address my point above, but somehow the word "kept" feels
> awkward to me. I'm ambivalent between the two.

That word does make my backside somewhat itchy ;-)

Would it help to take a step back and think what the option really
does?  Perhaps we should call it --pack-all-objects, which is short
for --pack-all-objectsregardless-of-where-they-currently-are-stored,
or something?  The word "all" gives a wrong connotation in a
different way (e.g. "regardless of reachability" is a possible wrong
interpretation), so that does not sound too good, either.

"--repack-kept-objects"?  "--include-kept-objects"?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-02-26 20:30                     ` Junio C Hamano
@ 2014-02-27 11:27                       ` Jeff King
  2014-02-27 18:04                         ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2014-02-27 11:27 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git

On Wed, Feb 26, 2014 at 12:30:36PM -0800, Junio C Hamano wrote:

> >> pack-kept-objects then?
> >
> > Hmm. That does address my point above, but somehow the word "kept" feels
> > awkward to me. I'm ambivalent between the two.
> 
> That word does make my backside somewhat itchy ;-)
> 
> Would it help to take a step back and think what the option really
> does?  Perhaps we should call it --pack-all-objects, which is short
> for --pack-all-objectsregardless-of-where-they-currently-are-stored,
> or something?  The word "all" gives a wrong connotation in a
> different way (e.g. "regardless of reachability" is a possible wrong
> interpretation), so that does not sound too good, either.

I do not think "all" is what we want to say. It only affects "kept"
objects, not any of the other exclusions (e.g., "-l").

> "--repack-kept-objects"?  "--include-kept-objects"?

Of all of them, I think --pack-kept-objects is probably the best. And I
think we are hitting diminishing returns in thinking too much more on
the name. :)

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-02-27 11:27                       ` Jeff King
@ 2014-02-27 18:04                         ` Junio C Hamano
  2014-02-28  8:55                           ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2014-02-27 18:04 UTC (permalink / raw
  To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git

Jeff King <peff@peff.net> writes:

> Of all of them, I think --pack-kept-objects is probably the best. And I
> think we are hitting diminishing returns in thinking too much more on
> the name. :)

True enough.

I wonder if it makes sense to link it with "pack.writebitmaps" more
tightly, without even exposing it as a seemingly orthogonal knob
that can be tweaked, though.

I think that is because I do not fully understand the ", because ..."
part of the below:

>> This patch introduces an option to disable the
>> `--honor-pack-keep` option.  It is not triggered by default,
>> even when pack.writeBitmaps is turned on, because its use
>> depends on your overall packing strategy and use of .keep
>> files.

If you ask --write-bitmap-index (or have pack.writeBitmaps on), you
do want the bitmap-index to be written, and unless you tell
pack-objects to ignore the .keep marker, it cannot do so, no?

Does the ", because ..." part above mean "you may have an overall
packing strategy to use .keep file to not ever repack some subset of
the objects, so we will not silently explode the kept objects into a
new pack"?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-02-27 18:04                         ` Junio C Hamano
@ 2014-02-28  8:55                           ` Jeff King
  2014-02-28 17:09                             ` Nasser Grainawi
  2014-02-28 18:45                             ` Junio C Hamano
  0 siblings, 2 replies; 27+ messages in thread
From: Jeff King @ 2014-02-28  8:55 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git

On Thu, Feb 27, 2014 at 10:04:44AM -0800, Junio C Hamano wrote:

> I wonder if it makes sense to link it with "pack.writebitmaps" more
> tightly, without even exposing it as a seemingly orthogonal knob
> that can be tweaked, though.
> 
> I think that is because I do not fully understand the ", because ..."
> part of the below:
> 
> >> This patch introduces an option to disable the
> >> `--honor-pack-keep` option.  It is not triggered by default,
> >> even when pack.writeBitmaps is turned on, because its use
> >> depends on your overall packing strategy and use of .keep
> >> files.
> 
> If you ask --write-bitmap-index (or have pack.writeBitmaps on), you
> do want the bitmap-index to be written, and unless you tell
> pack-objects to ignore the .keep marker, it cannot do so, no?
> 
> Does the ", because ..." part above mean "you may have an overall
> packing strategy to use .keep file to not ever repack some subset of
> the objects, so we will not silently explode the kept objects into a
> new pack"?

Exactly. The two features (bitmaps and .keep) are not compatible with
each other, so you have to prioritize one. If you are using static .keep
files, you might want them to continue being respected at the expense of
using bitmaps for that repo. So I think you want a separate option from
--write-bitmap-index to allow the appropriate flexibility.

The default is another matter.  I think most people using .bitmaps on a
server would probably want to set repack.packKeptObjects.  They would
want to repack often to take advantage of the .bitmaps anyway, so they
probably don't care about .keep files (any they see are due to races
with incoming pushes).

So we could do something like falling back to turning the option on if
--write-bitmap-index is on _and_ the user didn't specify
--pack-kept-objects. The existing default is mostly there because it is
the conservative choice (.keep files continue to do their thing as
normal unless you say otherwise). But the fallback thing would be one
less knob that bitmap users would need to turn in the common case.

Here's the interdiff for doing the fallback:

---
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3a3d84f..a8ddc7f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2139,7 +2139,9 @@ repack.usedeltabaseoffset::
 repack.packKeptObjects::
 	If set to true, makes `git repack` act as if
 	`--pack-kept-objects` was passed. See linkgit:git-repack[1] for
-	details. Defaults to false.
+	details. Defaults to `false` normally, but `true` if a bitmap
+	index is being written (either via `--write-bitmap-index` or
+	`pack.writeBitmaps`).
 
 rerere.autoupdate::
 	When set to true, `git-rerere` updates the index with the
diff --git a/builtin/repack.c b/builtin/repack.c
index 49947b2..6b0b62d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,7 +9,7 @@
 #include "argv-array.h"
 
 static int delta_base_offset = 1;
-static int pack_kept_objects;
+static int pack_kept_objects = -1;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -190,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, builtin_repack_options,
 				git_repack_usage, 0);
 
+	if (pack_kept_objects < 0)
+		pack_kept_objects = write_bitmap;
+
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
 
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index f8431a8..b1eed5c 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 	objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 |
 		sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
 	mv pack-* .git/objects/pack/ &&
-	git repack -A -d -l &&
+	git repack --no-pack-kept-objects -A -d -l &&
 	git prune-packed &&
 	for p in .git/objects/pack/*.idx; do
 		idx=$(basename $p)
@@ -35,9 +35,9 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 	test -z "$found_duplicate_object"
 '
 
-test_expect_success '--pack-kept-objects duplicates objects' '
+test_expect_success 'writing bitmaps duplicates .keep objects' '
 	# build on $objsha1, $packsha1, and .keep state from previous
-	git repack -Adl --pack-kept-objects &&
+	git repack -Adl &&
 	test_when_finished "found_duplicate_object=" &&
 	for p in .git/objects/pack/*.idx; do
 		idx=$(basename $p)

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-02-28  8:55                           ` Jeff King
@ 2014-02-28 17:09                             ` Nasser Grainawi
  2014-03-01  6:05                               ` Jeff King
  2014-02-28 18:45                             ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Nasser Grainawi @ 2014-02-28 17:09 UTC (permalink / raw
  To: Jeff King; +Cc: Junio C Hamano, Siddharth Agarwal, Vicent Marti, git

On Feb 28, 2014, at 1:55 AM, Jeff King <peff@peff.net> wrote:

> On Thu, Feb 27, 2014 at 10:04:44AM -0800, Junio C Hamano wrote:
> 
>> I wonder if it makes sense to link it with "pack.writebitmaps" more
>> tightly, without even exposing it as a seemingly orthogonal knob
>> that can be tweaked, though.
>> 
>> I think that is because I do not fully understand the ", because ..."
>> part of the below:
>> 
>>>> This patch introduces an option to disable the
>>>> `--honor-pack-keep` option.  It is not triggered by default,
>>>> even when pack.writeBitmaps is turned on, because its use
>>>> depends on your overall packing strategy and use of .keep
>>>> files.
>> 
>> If you ask --write-bitmap-index (or have pack.writeBitmaps on), you
>> do want the bitmap-index to be written, and unless you tell
>> pack-objects to ignore the .keep marker, it cannot do so, no?
>> 
>> Does the ", because ..." part above mean "you may have an overall
>> packing strategy to use .keep file to not ever repack some subset of
>> the objects, so we will not silently explode the kept objects into a
>> new pack"?
> 
> Exactly. The two features (bitmaps and .keep) are not compatible with
> each other, so you have to prioritize one. If you are using static .keep
> files, you might want them to continue being respected at the expense of
> using bitmaps for that repo. So I think you want a separate option from
> --write-bitmap-index to allow the appropriate flexibility.

Has anyone thought about how to make them compatible? We're using Martin Fick's git-exproll script which makes heavy use of keeps to reduce pack file churn. In addition to the on-disk benefits we get there, the driving factor behind creating exproll was to prevent Gerrit from having two large (30GB+) mostly duplicated pack files open in memory at the same time. Repacking in JGit would help in a single-master environment, but we'd be back to having this problem once we go to a multi-master setup.

Perhaps the solution here is actually something in JGit where it could aggressively try to close references to pack files, but that still doesn't help the disk churn problem. As Peff says below, we would want to repack often to get up-to-date bitmaps, but ideally we could do that without writing hundreds of GBs to disk (which is obviously worse when "disk" is a NFS mount).

> 
> The default is another matter.  I think most people using .bitmaps on a
> server would probably want to set repack.packKeptObjects.  They would
> want to repack often to take advantage of the .bitmaps anyway, so they
> probably don't care about .keep files (any they see are due to races
> with incoming pushes).
> 
> So we could do something like falling back to turning the option on if
> --write-bitmap-index is on _and_ the user didn't specify
> --pack-kept-objects. The existing default is mostly there because it is
> the conservative choice (.keep files continue to do their thing as
> normal unless you say otherwise). But the fallback thing would be one
> less knob that bitmap users would need to turn in the common case.
> 
> Here's the interdiff for doing the fallback:
> 
> ---
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 3a3d84f..a8ddc7f 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2139,7 +2139,9 @@ repack.usedeltabaseoffset::
> repack.packKeptObjects::
> 	If set to true, makes `git repack` act as if
> 	`--pack-kept-objects` was passed. See linkgit:git-repack[1] for
> -	details. Defaults to false.
> +	details. Defaults to `false` normally, but `true` if a bitmap
> +	index is being written (either via `--write-bitmap-index` or
> +	`pack.writeBitmaps`).
> 
> rerere.autoupdate::
> 	When set to true, `git-rerere` updates the index with the
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 49947b2..6b0b62d 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -9,7 +9,7 @@
> #include "argv-array.h"
> 
> static int delta_base_offset = 1;
> -static int pack_kept_objects;
> +static int pack_kept_objects = -1;
> static char *packdir, *packtmp;
> 
> static const char *const git_repack_usage[] = {
> @@ -190,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> 	argc = parse_options(argc, argv, prefix, builtin_repack_options,
> 				git_repack_usage, 0);
> 
> +	if (pack_kept_objects < 0)
> +		pack_kept_objects = write_bitmap;
> +
> 	packdir = mkpathdup("%s/pack", get_object_directory());
> 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
> 
> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index f8431a8..b1eed5c 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
> 	objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 |
> 		sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
> 	mv pack-* .git/objects/pack/ &&
> -	git repack -A -d -l &&
> +	git repack --no-pack-kept-objects -A -d -l &&
> 	git prune-packed &&
> 	for p in .git/objects/pack/*.idx; do
> 		idx=$(basename $p)
> @@ -35,9 +35,9 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
> 	test -z "$found_duplicate_object"
> '
> 
> -test_expect_success '--pack-kept-objects duplicates objects' '
> +test_expect_success 'writing bitmaps duplicates .keep objects' '
> 	# build on $objsha1, $packsha1, and .keep state from previous
> -	git repack -Adl --pack-kept-objects &&
> +	git repack -Adl &&
> 	test_when_finished "found_duplicate_object=" &&
> 	for p in .git/objects/pack/*.idx; do
> 		idx=$(basename $p)
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-02-28  8:55                           ` Jeff King
  2014-02-28 17:09                             ` Nasser Grainawi
@ 2014-02-28 18:45                             ` Junio C Hamano
  2014-03-01  5:43                               ` Jeff King
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2014-02-28 18:45 UTC (permalink / raw
  To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git

Jeff King <peff@peff.net> writes:

> On Thu, Feb 27, 2014 at 10:04:44AM -0800, Junio C Hamano wrote:
>
>> I wonder if it makes sense to link it with "pack.writebitmaps" more
>> tightly, without even exposing it as a seemingly orthogonal knob
>> that can be tweaked, though.
>> 
>> I think that is because I do not fully understand the ", because ..."
>> part of the below:
>> 
>> >> This patch introduces an option to disable the
>> >> `--honor-pack-keep` option.  It is not triggered by default,
>> >> even when pack.writeBitmaps is turned on, because its use
>> >> depends on your overall packing strategy and use of .keep
>> >> files.
>> 
>> If you ask --write-bitmap-index (or have pack.writeBitmaps on), you
>> do want the bitmap-index to be written, and unless you tell
>> pack-objects to ignore the .keep marker, it cannot do so, no?
>> 
>> Does the ", because ..." part above mean "you may have an overall
>> packing strategy to use .keep file to not ever repack some subset of
>> the objects, so we will not silently explode the kept objects into a
>> new pack"?
>
> Exactly. The two features (bitmaps and .keep) are not compatible with
> each other, so you have to prioritize one. If you are using static .keep
> files, you might want them to continue being respected at the expense of
> using bitmaps for that repo. So I think you want a separate option from
> --write-bitmap-index to allow the appropriate flexibility.

What is "the appropriate flexibility", though?  If the user wants to
use bitmap, we would need to drop .keep, no?  Doesn't always having
two copies in two packs degrade performance unnecessarily (without
even talking about wasted diskspace)?  An explicit .keep exists in
the repository because it is expensive and undesirable to duplicate
what is in there in the first place, so it feels to me that either

 - Disable with warning, or outright refuse, the "-b" option if
   there is .keep (if we want to give precedence to .keep); or

 - Remove .keep with warning when "-b" option is given (if we want
   to give precedence to "-b").

and nothing else would be a reasonable option.  Unfortunately, we
can do neither automatically because there could be a transient .keep
file in an active repository.

So I think I agree with this...

> The default is another matter.  I think most people using .bitmaps on a
> server would probably want to set repack.packKeptObjects.  They would
> want to repack often to take advantage of the .bitmaps anyway, so they
> probably don't care about .keep files (any they see are due to races
> with incoming pushes).

... which makes me think that repack.packKeptObjects is merely a
distraction---it should be enough to just pass "--pack-kept-objects"
when "-b" is asked, without giving any extra configurability, no?

> So we could do something like falling back to turning the option on if
> --write-bitmap-index is on _and_ the user didn't specify
> --pack-kept-objects.

If you mean "didn't specify --no-pack-kept-objects", then I think
that is sensible.  I still do not know why we would want the
configuration variable, though.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-02-28 18:45                             ` Junio C Hamano
@ 2014-03-01  5:43                               ` Jeff King
  2014-03-03 18:13                                 ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2014-03-01  5:43 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git

On Fri, Feb 28, 2014 at 10:45:39AM -0800, Junio C Hamano wrote:

> > Exactly. The two features (bitmaps and .keep) are not compatible with
> > each other, so you have to prioritize one. If you are using static .keep
> > files, you might want them to continue being respected at the expense of
> > using bitmaps for that repo. So I think you want a separate option from
> > --write-bitmap-index to allow the appropriate flexibility.
> 
> What is "the appropriate flexibility", though?  If the user wants to
> use bitmap, we would need to drop .keep, no?

Or the flip side: if the user wants to use .keep, we should drop
bitmaps. My point is that we do not know which way the user wants to
go, so we should not tie the options together.

> Doesn't always having two copies in two packs degrade performance
> unnecessarily (without even talking about wasted diskspace)?  An
> explicit .keep exists in the repository because it is expensive and
> undesirable to duplicate what is in there in the first place, so it
> feels to me that either

The benefits of static .keep files are (I think):

  1. less I/O during repacks, as you do not rewrite a static set of
     objects

  2. less turnover of packfiles, which can make dumb access more
     efficient (both for dumb clients, but also for things like
     non-git-aware backups).

I think the existence of smart-http more or less nullifies (2). For (1),
it helps at first, but you get diminishing returns as your non-keep
packfile grows. I think it only helps in pathological cases (e.g., you
mark 10GB worth of giant blobs in a .keep pack, and then pack the other
10MB of trees, commits, and normal-sized blobs as usual).

>  - Disable with warning, or outright refuse, the "-b" option if
>    there is .keep (if we want to give precedence to .keep); or
> 
>  - Remove .keep with warning when "-b" option is given (if we want
>    to give precedence to "-b").
> 
> and nothing else would be a reasonable option.  Unfortunately, we
> can do neither automatically because there could be a transient .keep
> file in an active repository.

Right, the transient ones complicate the issue. But I think even for
static .keep versus bitmaps, there is question. See below...

> > The default is another matter.  I think most people using .bitmaps on a
> > server would probably want to set repack.packKeptObjects.  They would
> > want to repack often to take advantage of the .bitmaps anyway, so they
> > probably don't care about .keep files (any they see are due to races
> > with incoming pushes).
> 
> ... which makes me think that repack.packKeptObjects is merely a
> distraction---it should be enough to just pass "--pack-kept-objects"
> when "-b" is asked, without giving any extra configurability, no?

But you do not necessarily ask for "-b" explicitly; it might come from
the config, too. Imagine you have a server with many repos. You want to
use bitmaps when you can, so you set pack.writeBitmaps in
/etc/gitconfig. But in a few repos, you want to use .keep files, and
it's more important for you to use it than bitmaps (e.g., because it is
one of the pathological cases above). So you set repack.packKeptObjects
to false in /etc/gitconfig, to prefer .keep to bitmaps where
appropriate.

If you did not have that config option, your alternative would be to
turn off pack.writeBitmaps in the repositories with .keep files. But
then you need to per-repo keep that flag in sync with whether or not the
repo has .keep files.

To be clear, at GitHub we do not plan on ever having
repack.packKeptObjects off (for now we have it on explicitly, but if it
were connected to pack.writeBitmaps, then we would be happy with that
default). I am mostly trying to give an escape hatch to let people use
different optimization strategies if they want.

If we are going to have --pack-kept-objects (and I think we should),
I think we also should have a matching config option. Because it is
useful when matched with the bitmap code, and that can be turned on both
from the command-line or from the config. Wherever you are doing that,
you would want to be able to make the matching .keep decision.

And I don't think it hurts much.  With the fallback-to-on behavior, you
do not have to even care that it is there unless you are doing something
clever.

> > So we could do something like falling back to turning the option on if
> > --write-bitmap-index is on _and_ the user didn't specify
> > --pack-kept-objects.
> 
> If you mean "didn't specify --no-pack-kept-objects", then I think
> that is sensible.  I still do not know why we would want the
> configuration variable, though.

Right, I meant "if the pack-kept-objects variable is set, either on or
off, either via the command line or in the config".

As far as what the config is good for, see above.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-02-28 17:09                             ` Nasser Grainawi
@ 2014-03-01  6:05                               ` Jeff King
  2014-03-03 19:12                                 ` Shawn Pearce
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2014-03-01  6:05 UTC (permalink / raw
  To: Nasser Grainawi; +Cc: Junio C Hamano, Siddharth Agarwal, Vicent Marti, git

On Fri, Feb 28, 2014 at 10:09:08AM -0700, Nasser Grainawi wrote:

> > Exactly. The two features (bitmaps and .keep) are not compatible with
> > each other, so you have to prioritize one. If you are using static .keep
> > files, you might want them to continue being respected at the expense of
> > using bitmaps for that repo. So I think you want a separate option from
> > --write-bitmap-index to allow the appropriate flexibility.
> 
> Has anyone thought about how to make them compatible?

Yes, but it's complicated and not likely to happen soon.

Having .keep files means that you are not including some objects in the
newly created pack. Each bit in a commit's bitmap corresponds to one
object in the pack, and whether it is reachable from that commit. The
bitmap is only useful if we can calculate the full reachability from it,
and it has no way to specify objects outside of the pack.

To fix this, you would need to change the on-disk format of the bitmaps
to somehow reference objects outside of the pack. Either by having the
bitmaps index a repo-global set of objects, or by permitting a list of
"edge" objects that are referenced from the pack, but not included (and
then when assembling the full reachable list, you would have to recurse
across "edge" objects to find their reachable list in another pack,
etc).

So it's possible, but it would complicate the scheme quite a bit, and
would not be backwards compatible with either JGit or C Git.

> We're using Martin Fick's git-exproll script which makes heavy use of
> keeps to reduce pack file churn. In addition to the on-disk benefits
> we get there, the driving factor behind creating exproll was to
> prevent Gerrit from having two large (30GB+) mostly duplicated pack
> files open in memory at the same time. Repacking in JGit would help in
> a single-master environment, but we'd be back to having this problem
> once we go to a multi-master setup.
> 
> Perhaps the solution here is actually something in JGit where it could
> aggressively try to close references to pack files

In C git we don't worry about this too much, because our programs tend
to be short-lived, and references to the old pack will go away quickly.
Plus it is all mmap'd, so as we simply stop accessing the pages of the
old pack, they should eventually be dropped if there is memory pressure.

I seem to recall that JGit does not mmap its packfiles. Does it pread?
In that case, I'd expect unused bits from the duplicated packfile to get
dropped from the disk cache over time. If it loads whole packfiles into
memory, then yes, it should probably close more aggressively.

> , but that still
> doesn't help the disk churn problem. As Peff says below, we would want
> to repack often to get up-to-date bitmaps, but ideally we could do
> that without writing hundreds of GBs to disk (which is obviously worse
> when "disk" is a NFS mount).

Ultimately I think the solution to the churn problem is a packfile-like
storage that allows true appending of deltas. You can come up with a
scheme to allow deltas between on-disk packs (i.e., "thin" packs on
disk). The trick there is handling the dependencies and cycles. I think
you could get by with a strict ordering of packs and a few rules:

  1. An object in a pack with weight A cannot have as a base an object
     in a pack with weight <= A.

  2. A pack with weight A cannot be deleted if there exists a pack with
     weight > A.

But you'd want to also add in a single update-able index over all the
packfiles, and even then you'd still want to pack occasionally (because
you'd end up with deltas on bases going back in time, but you really
prefer your bases to be near the tip of history).

So I am not volunteering to work on it. :)

At GitHub we mostly deal with the churn by throwing more server
resources at it. But we have the advantage of having a very large number
of small-to-medium repos, which is relatively easy to scale up. A small
number of huge repos is trickier.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-03-01  5:43                               ` Jeff King
@ 2014-03-03 18:13                                 ` Junio C Hamano
  2014-03-03 18:15                                   ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2014-03-03 18:13 UTC (permalink / raw
  To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git

Jeff King <peff@peff.net> writes:

> On Fri, Feb 28, 2014 at 10:45:39AM -0800, Junio C Hamano wrote:
>
>> > Exactly. The two features (bitmaps and .keep) are not compatible with
>> > each other, so you have to prioritize one. If you are using static .keep
>> > files, you might want them to continue being respected at the expense of
>> > using bitmaps for that repo. So I think you want a separate option from
>> > --write-bitmap-index to allow the appropriate flexibility.
>> 
>> What is "the appropriate flexibility", though?  If the user wants to
>> use bitmap, we would need to drop .keep, no?
>
> Or the flip side: if the user wants to use .keep, we should drop
> bitmaps. My point is that we do not know which way the user wants to
> go, so we should not tie the options together.

Hmph.  I think the short of your later explanation is "global config
may tell us to use bitmap, in which case we would need a way to
defeat that and have existing .keep honored, and it makes it easier
to do so if these two are kept separate, because you do not want to
run around and selectively disable bitmaps in these repositories.
We can instead do so with repack.packKeptObjects in the global
configuration." and I tend to agree with the reasoning.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-03-03 18:13                                 ` Junio C Hamano
@ 2014-03-03 18:15                                   ` Jeff King
  2014-03-03 19:51                                     ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2014-03-03 18:15 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git

On Mon, Mar 03, 2014 at 10:13:47AM -0800, Junio C Hamano wrote:

> > Or the flip side: if the user wants to use .keep, we should drop
> > bitmaps. My point is that we do not know which way the user wants to
> > go, so we should not tie the options together.
> 
> Hmph.  I think the short of your later explanation is "global config
> may tell us to use bitmap, in which case we would need a way to
> defeat that and have existing .keep honored, and it makes it easier
> to do so if these two are kept separate, because you do not want to
> run around and selectively disable bitmaps in these repositories.
> We can instead do so with repack.packKeptObjects in the global
> configuration." and I tend to agree with the reasoning.

Yes. Do you need a re-roll from me? I think the last version I sent +
the squash to tie the default to bitmap-writing makes the most sense.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-03-01  6:05                               ` Jeff King
@ 2014-03-03 19:12                                 ` Shawn Pearce
  0 siblings, 0 replies; 27+ messages in thread
From: Shawn Pearce @ 2014-03-03 19:12 UTC (permalink / raw
  To: Jeff King
  Cc: Nasser Grainawi, Junio C Hamano, Siddharth Agarwal, Vicent Marti,
	git

On Fri, Feb 28, 2014 at 10:05 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Feb 28, 2014 at 10:09:08AM -0700, Nasser Grainawi wrote:
>
>> > Exactly. The two features (bitmaps and .keep) are not compatible with
>> > each other, so you have to prioritize one. If you are using static .keep
>> > files, you might want them to continue being respected at the expense of
>> > using bitmaps for that repo. So I think you want a separate option from
>> > --write-bitmap-index to allow the appropriate flexibility.
>>
>> Has anyone thought about how to make them compatible?
>
> Yes, but it's complicated and not likely to happen soon.
>
> Having .keep files means that you are not including some objects in the
> newly created pack. Each bit in a commit's bitmap corresponds to one
> object in the pack, and whether it is reachable from that commit. The
> bitmap is only useful if we can calculate the full reachability from it,
> and it has no way to specify objects outside of the pack.
>
> To fix this, you would need to change the on-disk format of the bitmaps
> to somehow reference objects outside of the pack. Either by having the
> bitmaps index a repo-global set of objects, or by permitting a list of
> "edge" objects that are referenced from the pack, but not included (and
> then when assembling the full reachable list, you would have to recurse
> across "edge" objects to find their reachable list in another pack,
> etc).
>
> So it's possible, but it would complicate the scheme quite a bit, and
> would not be backwards compatible with either JGit or C Git.

Colby Ranger always wanted to add this to the bitmap scheme. Construct
a partial pack bitmap on a partial pack of "recent" objects, with edge
pointers naming objects that are not in this pack but whose closures
need to be considered part of the bitmap. Its complicated in-memory
because you need to fuse together two or more bitmaps (the partial
pack one, and the larger historical kept pack) before running the
"want AND NOT have" computation.

Colby did not find time to work on this in JGit, so it just didn't get
implemented. But we did consider it, as the servers at Google we built
bitmap for use a multi-level pack scheme and don't want to rebuild
packs all of the time.

>> We're using Martin Fick's git-exproll script which makes heavy use of
>> keeps to reduce pack file churn. In addition to the on-disk benefits
>> we get there, the driving factor behind creating exproll was to
>> prevent Gerrit from having two large (30GB+) mostly duplicated pack
>> files open in memory at the same time. Repacking in JGit would help in
>> a single-master environment, but we'd be back to having this problem
>> once we go to a multi-master setup.
>>
>> Perhaps the solution here is actually something in JGit where it could
>> aggressively try to close references to pack files
>
> In C git we don't worry about this too much, because our programs tend
> to be short-lived, and references to the old pack will go away quickly.
> Plus it is all mmap'd, so as we simply stop accessing the pages of the
> old pack, they should eventually be dropped if there is memory pressure.
>
> I seem to recall that JGit does not mmap its packfiles. Does it pread?

JGit does not mmap because you can't munmap() until the Java GC gets
around to freeing the tiny little header object that contains the
memory address of the start of the mmap segment. This can take ages,
to the point where you run out of virtual address space in the process
and s**t starts to fail left and right inside of the JVM. The GC is
just unable to prioritize finding those tiny headers and getting them
out of the heap so the munmap can take place safely.

So yea, JGit does pread() for the blocks but it holds those in its own
buffer cache inside of the Java heap. Where a 4K disk block is a 4K
memory array that puts pressure on the GC to actually wake up and free
resources that are unused. What Nasser is talking about is JGit may
take a long time to realize one pack is unused and start kicking those
blocks out of its buffer cache. Those blocks are reference counted and
the file descriptor JGit preads from is held open so long as at least
one block is in the buffer cache. By keeping the file open we force
the filesystem to keep the inode alive a lot longer, which means the
disk needs a huge amount of free space to store the unlinked but still
open 30G pack files from prior GC generations.

> In that case, I'd expect unused bits from the duplicated packfile to get
> dropped from the disk cache over time. If it loads whole packfiles into
> memory, then yes, it should probably close more aggressively.

Its more than that, its the inode being kept alive by the open file
descriptor...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-03-03 18:15                                   ` Jeff King
@ 2014-03-03 19:51                                     ` Junio C Hamano
  2014-03-03 20:04                                       ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2014-03-03 19:51 UTC (permalink / raw
  To: Jeff King; +Cc: Siddharth Agarwal, Vicent Marti, git

Jeff King <peff@peff.net> writes:

> On Mon, Mar 03, 2014 at 10:13:47AM -0800, Junio C Hamano wrote:
>
>> > Or the flip side: if the user wants to use .keep, we should drop
>> > bitmaps. My point is that we do not know which way the user wants to
>> > go, so we should not tie the options together.
>> 
>> Hmph.  I think the short of your later explanation is "global config
>> may tell us to use bitmap, in which case we would need a way to
>> defeat that and have existing .keep honored, and it makes it easier
>> to do so if these two are kept separate, because you do not want to
>> run around and selectively disable bitmaps in these repositories.
>> We can instead do so with repack.packKeptObjects in the global
>> configuration." and I tend to agree with the reasoning.
>
> Yes. Do you need a re-roll from me? I think the last version I sent +
> the squash to tie the default to bitmap-writing makes the most sense.

I have 9e20b390 (repack: add `repack.packKeptObjects` config var,
2014-02-26); I do not recall I've squashed anything into it, though.

Do you mean this one?

Here's the interdiff for doing the fallback:

---
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3a3d84f..a8ddc7f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2139,7 +2139,9 @@ repack.usedeltabaseoffset::
 repack.packKeptObjects::
 	If set to true, makes `git repack` act as if
 	`--pack-kept-objects` was passed. See linkgit:git-repack[1] for
-	details. Defaults to false.
+	details. Defaults to `false` normally, but `true` if a bitmap
+	index is being written (either via `--write-bitmap-index` or
+	`pack.writeBitmaps`).
 
 rerere.autoupdate::
 	When set to true, `git-rerere` updates the index with the
diff --git a/builtin/repack.c b/builtin/repack.c
index 49947b2..6b0b62d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,7 +9,7 @@
 #include "argv-array.h"
 
 static int delta_base_offset = 1;
-static int pack_kept_objects;
+static int pack_kept_objects = -1;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -190,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, builtin_repack_options,
 				git_repack_usage, 0);
 
+	if (pack_kept_objects < 0)
+		pack_kept_objects = write_bitmap;
+
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
 
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index f8431a8..b1eed5c 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 	objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 |
 		sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
 	mv pack-* .git/objects/pack/ &&
-	git repack -A -d -l &&
+	git repack --no-pack-kept-objects -A -d -l &&
 	git prune-packed &&
 	for p in .git/objects/pack/*.idx; do
 		idx=$(basename $p)
@@ -35,9 +35,9 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 	test -z "$found_duplicate_object"
 '
 
-test_expect_success '--pack-kept-objects duplicates objects' '
+test_expect_success 'writing bitmaps duplicates .keep objects' '
 	# build on $objsha1, $packsha1, and .keep state from previous
-	git repack -Adl --pack-kept-objects &&
+	git repack -Adl &&
 	test_when_finished "found_duplicate_object=" &&
 	for p in .git/objects/pack/*.idx; do
 		idx=$(basename $p)

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] repack: add `repack.honorpackkeep` config var
  2014-03-03 19:51                                     ` Junio C Hamano
@ 2014-03-03 20:04                                       ` Jeff King
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff King @ 2014-03-03 20:04 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Siddharth Agarwal, Vicent Marti, git

On Mon, Mar 03, 2014 at 11:51:06AM -0800, Junio C Hamano wrote:

> > Yes. Do you need a re-roll from me? I think the last version I sent +
> > the squash to tie the default to bitmap-writing makes the most sense.
> 
> I have 9e20b390 (repack: add `repack.packKeptObjects` config var,
> 2014-02-26); I do not recall I've squashed anything into it, though.
> 
> Do you mean this one?
> 
> Here's the interdiff for doing the fallback:
> [...]

Yes. Though I just noticed that the commit message needs updating if
that is squashed in. Here is the whole patch, with that update.

And I am dropping Vicent as the author, since I think there are now
literally zero lines of his left. ;)

-- >8 --
Subject: [PATCH] repack: add `repack.packKeptObjects` config var

The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.

However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.

Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:

  1. The pushed .keep pack contains objects that were
     already in the repository (e.g., blobs due to a revert of
     an old commit).

  2. Receive-pack updates the refs, making the objects
     reachable, but before it removes the .keep file, the
     repack runs.

In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.

This patch introduces both a command-line and config option
to disable the `--honor-pack-keep` option.  By default, it
is triggered when pack.writeBitmaps (or `--write-bitmap-index`
is turned on), but specifying it explicitly can override the
behavior (e.g., in cases where you prefer .keep files to
bitmaps, but only when they are present).

Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/config.txt     |  7 +++++++
 Documentation/git-repack.txt |  8 ++++++++
 builtin/repack.c             | 13 ++++++++++++-
 t/t7700-repack.sh            | 18 +++++++++++++++++-
 4 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index becbade..a8ddc7f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2136,6 +2136,13 @@ repack.usedeltabaseoffset::
 	"false" and repack. Access from old Git versions over the
 	native protocol are unaffected by this option.
 
+repack.packKeptObjects::
+	If set to true, makes `git repack` act as if
+	`--pack-kept-objects` was passed. See linkgit:git-repack[1] for
+	details. Defaults to `false` normally, but `true` if a bitmap
+	index is being written (either via `--write-bitmap-index` or
+	`pack.writeBitmaps`).
+
 rerere.autoupdate::
 	When set to true, `git-rerere` updates the index with the
 	resulting contents after it cleanly resolves conflicts using
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 002cfd5..4786a78 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -117,6 +117,14 @@ other objects in that pack they already have locally.
 	must be able to refer to all reachable objects. This option
 	overrides the setting of `pack.writebitmaps`.
 
+--pack-kept-objects::
+	Include objects in `.keep` files when repacking.  Note that we
+	still do not delete `.keep` packs after `pack-objects` finishes.
+	This means that we may duplicate objects, but this makes the
+	option safe to use when there are concurrent pushes or fetches.
+	This option is generally only useful if you are writing bitmaps
+	with `-b` or `pack.writebitmaps`, as it ensures that the
+	bitmapped packfile has the necessary objects.
 
 Configuration
 -------------
diff --git a/builtin/repack.c b/builtin/repack.c
index 49f5857..6b0b62d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -9,6 +9,7 @@
 #include "argv-array.h"
 
 static int delta_base_offset = 1;
+static int pack_kept_objects = -1;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -22,6 +23,10 @@ static int repack_config(const char *var, const char *value, void *cb)
 		delta_base_offset = git_config_bool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "repack.packkeptobjects")) {
+		pack_kept_objects = git_config_bool(var, value);
+		return 0;
+	}
 	return git_default_config(var, value, cb);
 }
 
@@ -175,6 +180,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("limits the maximum delta depth")),
 		OPT_STRING(0, "max-pack-size", &max_pack_size, N_("bytes"),
 				N_("maximum size of each packfile")),
+		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
+				N_("repack objects in packs marked with .keep")),
 		OPT_END()
 	};
 
@@ -183,6 +190,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, builtin_repack_options,
 				git_repack_usage, 0);
 
+	if (pack_kept_objects < 0)
+		pack_kept_objects = write_bitmap;
+
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
 
@@ -190,7 +200,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	argv_array_push(&cmd_args, "pack-objects");
 	argv_array_push(&cmd_args, "--keep-true-parents");
-	argv_array_push(&cmd_args, "--honor-pack-keep");
+	if (!pack_kept_objects)
+		argv_array_push(&cmd_args, "--honor-pack-keep");
 	argv_array_push(&cmd_args, "--non-empty");
 	argv_array_push(&cmd_args, "--all");
 	argv_array_push(&cmd_args, "--reflog");
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index b45bd1e..b1eed5c 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -21,7 +21,7 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 	objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 |
 		sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
 	mv pack-* .git/objects/pack/ &&
-	git repack -A -d -l &&
+	git repack --no-pack-kept-objects -A -d -l &&
 	git prune-packed &&
 	for p in .git/objects/pack/*.idx; do
 		idx=$(basename $p)
@@ -35,6 +35,22 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 	test -z "$found_duplicate_object"
 '
 
+test_expect_success 'writing bitmaps can duplicate .keep objects' '
+	# build on $objsha1, $packsha1, and .keep state from previous
+	git repack -Adl &&
+	test_when_finished "found_duplicate_object=" &&
+	for p in .git/objects/pack/*.idx; do
+		idx=$(basename $p)
+		test "pack-$packsha1.idx" = "$idx" && continue
+		if git verify-pack -v $p | egrep "^$objsha1"; then
+			found_duplicate_object=1
+			echo "DUPLICATE OBJECT FOUND"
+			break
+		fi
+	done &&
+	test "$found_duplicate_object" = 1
+'
+
 test_expect_success 'loose objects in alternate ODB are not repacked' '
 	mkdir alt_objects &&
 	echo `pwd`/alt_objects > .git/objects/info/alternates &&
-- 
1.8.5.2.500.g8060133

^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2014-03-03 20:04 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-23  2:38 WIth git-next, writing bitmaps fails when keep files are present Siddharth Agarwal
2014-01-23 20:36 ` Siddharth Agarwal
2014-01-23 22:52 ` [PATCH] pack-objects: turn off bitmaps when skipping objects Jeff King
2014-01-23 23:45   ` Siddharth Agarwal
2014-01-23 23:53     ` Siddharth Agarwal
2014-01-24  2:28       ` Jeff King
2014-01-24  2:44         ` Siddharth Agarwal
2014-01-28  6:09           ` [PATCH] repack: add `repack.honorpackkeep` config var Jeff King
2014-01-28  9:21             ` Junio C Hamano
2014-02-24  8:24               ` Jeff King
2014-02-24 19:10                 ` Junio C Hamano
2014-02-26 10:13                   ` Jeff King
2014-02-26 20:30                     ` Junio C Hamano
2014-02-27 11:27                       ` Jeff King
2014-02-27 18:04                         ` Junio C Hamano
2014-02-28  8:55                           ` Jeff King
2014-02-28 17:09                             ` Nasser Grainawi
2014-03-01  6:05                               ` Jeff King
2014-03-03 19:12                                 ` Shawn Pearce
2014-02-28 18:45                             ` Junio C Hamano
2014-03-01  5:43                               ` Jeff King
2014-03-03 18:13                                 ` Junio C Hamano
2014-03-03 18:15                                   ` Jeff King
2014-03-03 19:51                                     ` Junio C Hamano
2014-03-03 20:04                                       ` Jeff King
2014-01-23 23:56     ` [PATCH] pack-objects: turn off bitmaps when skipping objects Vicent Martí
2014-01-24  2:26       ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).