git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] Increase core.packedGitLimit
@ 2017-04-20 20:41 David Turner
  2017-04-20 21:02 ` Jeff King
  2017-06-21 13:44 ` [PATCH] Increase core.packedGitLimit Jeff King
  0 siblings, 2 replies; 13+ messages in thread
From: David Turner @ 2017-04-20 20:41 UTC (permalink / raw)
  To: git; +Cc: David Turner

When core.packedGitLimit is exceeded, git will close packs.  If there
is a repack operation going on in parallel with a fetch, the fetch
might open a pack, and then be forced to close it due to
packedGitLimit being hit.  The repack could then delete the pack
out from under the fetch, causing the fetch to fail.

Increase core.packedGitLimit's default value to prevent
this.

On current 64-bit x86_64 machines, 48 bits of address space are
available.  It appears that 64-bit ARM machines have no standard
amount of address space (that is, it varies by manufacturer), and IA64
and POWER machines have the full 64 bits.  So 48 bits is the only
limit that we can reasonably care about.  We reserve a few bits of the
48-bit address space for the kernel's use (this is not strictly
necessary, but it's better to be safe), and use up to the remaining
45.  No git repository will be anywhere near this large any time soon,
so this should prevent the failure.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: David Turner <dturner@twosigma.com>
---
 git-compat-util.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 8a4a3f85e7..1c5de153a5 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -616,7 +616,7 @@ extern int git_lstat(const char *, struct stat *);
 #endif
 
 #define DEFAULT_PACKED_GIT_LIMIT \
-	((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? 8192 : 256))
+	((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? (32 * 1024L * 1024L) : 256))
 
 #ifdef NO_PREAD
 #define pread git_pread
-- 
2.11.GIT


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] Increase core.packedGitLimit
  2017-04-20 20:41 [PATCH] Increase core.packedGitLimit David Turner
@ 2017-04-20 21:02 ` Jeff King
  2017-04-20 21:58   ` Johannes Schindelin
  2017-06-21 13:51   ` [PATCH] docs: update 64-bit core.packedGitLimit default Jeff King
  2017-06-21 13:44 ` [PATCH] Increase core.packedGitLimit Jeff King
  1 sibling, 2 replies; 13+ messages in thread
From: Jeff King @ 2017-04-20 21:02 UTC (permalink / raw)
  To: David Turner; +Cc: git

On Thu, Apr 20, 2017 at 04:41:18PM -0400, David Turner wrote:

> When core.packedGitLimit is exceeded, git will close packs.  If there
> is a repack operation going on in parallel with a fetch, the fetch
> might open a pack, and then be forced to close it due to
> packedGitLimit being hit.  The repack could then delete the pack
> out from under the fetch, causing the fetch to fail.
> 
> Increase core.packedGitLimit's default value to prevent
> this.
> 
> On current 64-bit x86_64 machines, 48 bits of address space are
> available.  It appears that 64-bit ARM machines have no standard
> amount of address space (that is, it varies by manufacturer), and IA64
> and POWER machines have the full 64 bits.  So 48 bits is the only
> limit that we can reasonably care about.  We reserve a few bits of the
> 48-bit address space for the kernel's use (this is not strictly
> necessary, but it's better to be safe), and use up to the remaining
> 45.  No git repository will be anywhere near this large any time soon,
> so this should prevent the failure.

Yep, I think this is a reasonable direction.

> ---
>  git-compat-util.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

This probably needs an update to the core.packedGitLimit section of
Documentation/config.txt.

> diff --git a/git-compat-util.h b/git-compat-util.h
> index 8a4a3f85e7..1c5de153a5 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -616,7 +616,7 @@ extern int git_lstat(const char *, struct stat *);
>  #endif
>  
>  #define DEFAULT_PACKED_GIT_LIMIT \
> -	((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? 8192 : 256))
> +	((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? (32 * 1024L * 1024L) : 256))

I wondered if we would run afoul of integer sizes on 64-bit systems where
"long" is still only 32-bits (i.e., Windows). But I think it's OK,
because the values before we cast to size_t are in megabytes. So your
32*1024*1024 needs only 25 bits to store it. And then after we cast to
size_t, everything is in 64-bit.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Increase core.packedGitLimit
  2017-04-20 21:02 ` Jeff King
@ 2017-04-20 21:58   ` Johannes Schindelin
  2017-04-20 22:04     ` David Turner
  2017-06-21 13:51   ` [PATCH] docs: update 64-bit core.packedGitLimit default Jeff King
  1 sibling, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2017-04-20 21:58 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git

Hi Peff,

On Thu, 20 Apr 2017, Jeff King wrote:

> On Thu, Apr 20, 2017 at 04:41:18PM -0400, David Turner wrote:
> 
> > When core.packedGitLimit is exceeded, git will close packs.  If there
> > is a repack operation going on in parallel with a fetch, the fetch
> > might open a pack, and then be forced to close it due to
> > packedGitLimit being hit.  The repack could then delete the pack
> > out from under the fetch, causing the fetch to fail.
> > 
> > Increase core.packedGitLimit's default value to prevent
> > this.
> > 
> > On current 64-bit x86_64 machines, 48 bits of address space are
> > available.  It appears that 64-bit ARM machines have no standard
> > amount of address space (that is, it varies by manufacturer), and IA64
> > and POWER machines have the full 64 bits.  So 48 bits is the only
> > limit that we can reasonably care about.  We reserve a few bits of the
> > 48-bit address space for the kernel's use (this is not strictly
> > necessary, but it's better to be safe), and use up to the remaining
> > 45.  No git repository will be anywhere near this large any time soon,
> > so this should prevent the failure.
> 
> Yep, I think this is a reasonable direction.
> 
> > ---
> >  git-compat-util.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> This probably needs an update to the core.packedGitLimit section of
> Documentation/config.txt.
> 
> > diff --git a/git-compat-util.h b/git-compat-util.h
> > index 8a4a3f85e7..1c5de153a5 100644
> > --- a/git-compat-util.h
> > +++ b/git-compat-util.h
> > @@ -616,7 +616,7 @@ extern int git_lstat(const char *, struct stat *);
> >  #endif
> >  
> >  #define DEFAULT_PACKED_GIT_LIMIT \
> > -	((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? 8192 : 256))
> > +	((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? (32 * 1024L * 1024L) : 256))
> 
> I wondered if we would run afoul of integer sizes on 64-bit systems where
> "long" is still only 32-bits (i.e., Windows). But I think it's OK,
> because the values before we cast to size_t are in megabytes. So your
> 32*1024*1024 needs only 25 bits to store it. And then after we cast to
> size_t, everything is in 64-bit.

Indeed, when I patch a local Git checkout accordingly, I see that
packed_git_limit is set to 35184372088832.

The bigger problem in this regard is that users are allowed to override
this via core.packedgitlimit but that value is parsed as an unsigned long.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH] Increase core.packedGitLimit
  2017-04-20 21:58   ` Johannes Schindelin
@ 2017-04-20 22:04     ` David Turner
  2017-04-21  9:34       ` Johannes Schindelin
  0 siblings, 1 reply; 13+ messages in thread
From: David Turner @ 2017-04-20 22:04 UTC (permalink / raw)
  To: 'Johannes Schindelin', Jeff King; +Cc: git@vger.kernel.org


> -----Original Message-----
> From: Johannes Schindelin [mailto:Johannes.Schindelin@gmx.de]
> Sent: Thursday, April 20, 2017 5:58 PM
> To: Jeff King <peff@peff.net>
> Cc: David Turner <David.Turner@twosigma.com>; git@vger.kernel.org
> Subject: Re: [PATCH] Increase core.packedGitLimit
> 
> Hi Peff,
> 
> On Thu, 20 Apr 2017, Jeff King wrote:
> 
> > On Thu, Apr 20, 2017 at 04:41:18PM -0400, David Turner wrote:
> >
> > > When core.packedGitLimit is exceeded, git will close packs.  If
> > > there is a repack operation going on in parallel with a fetch, the
> > > fetch might open a pack, and then be forced to close it due to
> > > packedGitLimit being hit.  The repack could then delete the pack out
> > > from under the fetch, causing the fetch to fail.
> > >
> > > Increase core.packedGitLimit's default value to prevent this.
> > >
> > > On current 64-bit x86_64 machines, 48 bits of address space are
> > > available.  It appears that 64-bit ARM machines have no standard
> > > amount of address space (that is, it varies by manufacturer), and
> > > IA64 and POWER machines have the full 64 bits.  So 48 bits is the
> > > only limit that we can reasonably care about.  We reserve a few bits
> > > of the 48-bit address space for the kernel's use (this is not
> > > strictly necessary, but it's better to be safe), and use up to the
> > > remaining 45.  No git repository will be anywhere near this large
> > > any time soon, so this should prevent the failure.
> >
> > Yep, I think this is a reasonable direction.
> >
> > > ---
> > >  git-compat-util.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > This probably needs an update to the core.packedGitLimit section of
> > Documentation/config.txt.
> >
> > > diff --git a/git-compat-util.h b/git-compat-util.h index
> > > 8a4a3f85e7..1c5de153a5 100644
> > > --- a/git-compat-util.h
> > > +++ b/git-compat-util.h
> > > @@ -616,7 +616,7 @@ extern int git_lstat(const char *, struct stat
> > > *);  #endif
> > >
> > >  #define DEFAULT_PACKED_GIT_LIMIT \
> > > -	((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? 8192 : 256))
> > > +	((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? (32 * 1024L *
> > > +1024L) : 256))
> >
> > I wondered if we would run afoul of integer sizes on 64-bit systems
> > where "long" is still only 32-bits (i.e., Windows). But I think it's
> > OK, because the values before we cast to size_t are in megabytes. So
> > your
> > 32*1024*1024 needs only 25 bits to store it. And then after we cast to
> > size_t, everything is in 64-bit.
> 
> Indeed, when I patch a local Git checkout accordingly, I see that
> packed_git_limit is set to 35184372088832.
> 
> The bigger problem in this regard is that users are allowed to override this via
> core.packedgitlimit but that value is parsed as an unsigned long.

We might want to think about replacing git_config_ulong with 
git_config_size_t in nearly all cases. "long" has ceased to be 
useful.  More modern versions of C prefer uint64_t, but I
think that we'll usually want size_t because these values will
be used as memory limits of various sorts.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH] Increase core.packedGitLimit
  2017-04-20 22:04     ` David Turner
@ 2017-04-21  9:34       ` Johannes Schindelin
  0 siblings, 0 replies; 13+ messages in thread
From: Johannes Schindelin @ 2017-04-21  9:34 UTC (permalink / raw)
  To: David Turner; +Cc: Jeff King, git@vger.kernel.org

Hi Dave,

On Thu, 20 Apr 2017, David Turner wrote:
> 
> We might want to think about replacing git_config_ulong with 
> git_config_size_t in nearly all cases. "long" has ceased to be 
> useful.  More modern versions of C prefer uint64_t, but I
> think that we'll usually want size_t because these values will
> be used as memory limits of various sorts.

Indeed, `unsigned long` has ceased to be useful ever since the
introduction of inttypes.h. It is just too undefined.

We probably also need git_config_off_t(), though, and maybe even
git_config_ssize_t().

I would definitely welcome a change in direction where we use datatypes
also as a documentation of the variables' purpose.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Increase core.packedGitLimit
  2017-04-20 20:41 [PATCH] Increase core.packedGitLimit David Turner
  2017-04-20 21:02 ` Jeff King
@ 2017-06-21 13:44 ` Jeff King
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff King @ 2017-06-21 13:44 UTC (permalink / raw)
  To: David Turner; +Cc: git

On Thu, Apr 20, 2017 at 04:41:18PM -0400, David Turner wrote:

> When core.packedGitLimit is exceeded, git will close packs.  If there
> is a repack operation going on in parallel with a fetch, the fetch
> might open a pack, and then be forced to close it due to
> packedGitLimit being hit.  The repack could then delete the pack
> out from under the fetch, causing the fetch to fail.
> 
> Increase core.packedGitLimit's default value to prevent
> this.
> 
> On current 64-bit x86_64 machines, 48 bits of address space are
> available.  It appears that 64-bit ARM machines have no standard
> amount of address space (that is, it varies by manufacturer), and IA64
> and POWER machines have the full 64 bits.  So 48 bits is the only
> limit that we can reasonably care about.  We reserve a few bits of the
> 48-bit address space for the kernel's use (this is not strictly
> necessary, but it's better to be safe), and use up to the remaining
> 45.  No git repository will be anywhere near this large any time soon,
> so this should prevent the failure.

I happened to run into another case of this which failed fairly
reliably, and confirmed that bumping packedGitLimit fixed it. Having
forgotten completely that we did this bump here, I wrote a similar
patch. :-/

The content of the patch is basically the same, but for the sake of the
list archive, here's the bit from my commit message about the
reproduction recipe (in case we should ever need to revisit it again,
though I'm pretty sure that your patch should fix the problem for good).

-- >8 --

You can see this situation in practice with strace:

  # two objects, one big and one small; for the sake of
  # speed, we'll make our big object only 1.1MB and drop
  # core.packedGitLimit to 1MB. Note that we care about
  # on-disk compressed sizes here, so we need uncompressible
  # data.
  small=$(echo small | git hash-object -w --stdin)
  big=$(dd if=/dev/urandom bs=1k count=1100 | git hash-object -w --stdin)

  # Now put each in a pack, which will get mmap'd
  # separately.
  echo $small | git pack-objects objects/pack/pack
  echo $big | git pack-objects objects/pack/pack

  # Now we look in the packs alternately, and observe the
  # access patterns.
  {
          echo $small
          echo $big
          echo $small
  } | strace -e open,close,mmap,munmap \
      git -c core.packedgitlimit=1M \
          -c core.packedGitWindowSize=512K \
          cat-file --batch >/dev/null

The trace will look something like this (I've omitted the
uninteresting .idx bits):

  open("./objects/pack/pack-b2a8452664fe8848fd0a78be8b6f69ce65af3c57.pack", O_RDONLY|O_CLOEXEC) = 3
  mmap(NULL, 47, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fb5890af000
  close(3)                                = 0

There we opened the small pack, mapped the whole thing, and
closed it.

  open("./objects/pack/pack-acfa5f8886b391d0b2475ef3f19bcc387ce19271.pack", O_RDONLY|O_CLOEXEC) = 3
  mmap(NULL, 524288, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fb589011000
  mmap(NULL, 1130496, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb588efd000
  munmap(0x7fb5890af000, 47)              = 0
  mmap(NULL, 524288, PROT_READ, MAP_PRIVATE, 3, 0x80000) = 0x7fb5881c0000
  munmap(0x7fb589011000, 524288)          = 0
  mmap(NULL, 78211, PROT_READ, MAP_PRIVATE, 3, 0x100000) = 0x7fb58909a000
  munmap(0x7fb588efd000, 1130496)         = 0

And here we open the larger pack. Note that we don't close
the descriptor because the file is bigger than our 512k
window.  And in fact we end up creating several maps as we
progress through the file. But before making the second one,
which would push us over our packedGitLimit, we close the
original small 47-byte map.

  open("./objects/pack/pack-b2a8452664fe8848fd0a78be8b6f69ce65af3c57.pack", O_RDONLY|O_CLOEXEC) = 4
  mmap(NULL, 47, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7fb5890af000
  close(4)                                = 0

And here we access the small pack again, but we have to
re-open and map it from scratch.  If there had been a
simultaneous repack and if this had been pack-objects, we
would now fail the operation.

These numbers are shrunk to make the test more tractable,
but you can see the same thing with the defaults when
fetching from a large repository.  The real-world situation
that would trigger this is something like:

  0. Repo has 8GB or larger packfile (this is the default
     core.packedGitLimit on 64-bit machines).

  1. Somebody pushes a "small" pack (smaller than the 1GB
     default window).

  2. Somebody tries to clone/fetch the result. The
     pack-objects process on the server starts by generating
     the list of objects to write, committing to the
     use of those packs. After accessing the recent objects
     in the small pack, it opens the big pack and closes the
     small one.

  3. Meanwhile, git-repack runs and folds the small pack
     into a larger one.

  4. The pack-objects process hits the write phase, but it
     can no longer access the recent objects from the small
     pack, and dies.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] docs: update 64-bit core.packedGitLimit default
  2017-04-20 21:02 ` Jeff King
  2017-04-20 21:58   ` Johannes Schindelin
@ 2017-06-21 13:51   ` Jeff King
  2017-06-21 16:25     ` Stefan Beller
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff King @ 2017-06-21 13:51 UTC (permalink / raw)
  To: David Turner; +Cc: git, Junio C Hamano

On Thu, Apr 20, 2017 at 05:02:55PM -0400, Jeff King wrote:

> >  git-compat-util.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> This probably needs an update to the core.packedGitLimit section of
> Documentation/config.txt.

Looks like we never did that part. Here it is (Junio, this goes on top
of dt/raise-core-packed-git-limit).

-- >8 --
Subject: [PATCH] docs: update 64-bit core.packedGitLimit default

We bumped the default in be4ca2905 (Increase
core.packedGitLimit, 2017-04-20) but never adjusted the
documentation to match.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/config.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index f6278a5ae..fc2cf1fe1 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -683,7 +683,8 @@ core.packedGitLimit::
 	bytes at once to complete an operation it will unmap existing
 	regions to reclaim virtual address space within the process.
 +
-Default is 256 MiB on 32 bit platforms and 8 GiB on 64 bit platforms.
+Default is 256 MiB on 32 bit platforms and 32 TiB (effectively
+unlimited) on 64 bit platforms.
 This should be reasonable for all users/operating systems, except on
 the largest projects.  You probably do not need to adjust this value.
 +
-- 
2.13.1.792.g159074dab


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] docs: update 64-bit core.packedGitLimit default
  2017-06-21 13:51   ` [PATCH] docs: update 64-bit core.packedGitLimit default Jeff King
@ 2017-06-21 16:25     ` Stefan Beller
  2017-06-21 18:06       ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Beller @ 2017-06-21 16:25 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git@vger.kernel.org, Junio C Hamano

On Wed, Jun 21, 2017 at 6:51 AM, Jeff King <peff@peff.net> wrote:
> On Thu, Apr 20, 2017 at 05:02:55PM -0400, Jeff King wrote:
>
>> >  git-compat-util.h | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> This probably needs an update to the core.packedGitLimit section of
>> Documentation/config.txt.
>
> Looks like we never did that part. Here it is (Junio, this goes on top
> of dt/raise-core-packed-git-limit).
>
> -- >8 --
> Subject: [PATCH] docs: update 64-bit core.packedGitLimit default
>
> We bumped the default in be4ca2905 (Increase
> core.packedGitLimit, 2017-04-20) but never adjusted the
> documentation to match.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  Documentation/config.txt | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index f6278a5ae..fc2cf1fe1 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -683,7 +683,8 @@ core.packedGitLimit::
>         bytes at once to complete an operation it will unmap existing
>         regions to reclaim virtual address space within the process.
>  +
> -Default is 256 MiB on 32 bit platforms and 8 GiB on 64 bit platforms.
> +Default is 256 MiB on 32 bit platforms and 32 TiB (effectively
> +unlimited) on 64 bit platforms.

nit: I would have not written "effectively unlimited", as we state
the limit right here. The further sentences already sooth the
user to not worry to much:

    This should be reasonable for all users/operating systems,
    except on the largest projects.  You probably do not need to
    adjust this value.

But as we just adjusted the default to prevent a bug,
maybe there are good reasons to adjust the value for users,
too? (Specifically 32 bit platforms could run into the problem
that the commit be4ca2905 (Increase core.packedGitLimit,
2017-04-20) states.)

While I think this patch is worthwhile applying (even as-is),
I think we might want to put another patch here?
Or is there a way to fix the underlying issue when fetch and
repack is run at the same time?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] docs: update 64-bit core.packedGitLimit default
  2017-06-21 16:25     ` Stefan Beller
@ 2017-06-21 18:06       ` Jeff King
  2017-06-21 18:38         ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2017-06-21 18:06 UTC (permalink / raw)
  To: Stefan Beller; +Cc: David Turner, git@vger.kernel.org, Junio C Hamano

On Wed, Jun 21, 2017 at 09:25:09AM -0700, Stefan Beller wrote:

> > -Default is 256 MiB on 32 bit platforms and 8 GiB on 64 bit platforms.
> > +Default is 256 MiB on 32 bit platforms and 32 TiB (effectively
> > +unlimited) on 64 bit platforms.
>
> nit: I would have not written "effectively unlimited", as we state
> the limit right here. The further sentences already sooth the
> user to not worry to much:
>
>     This should be reasonable for all users/operating systems,
>     except on the largest projects.  You probably do not need to
>     adjust this value.

I actually thought about taking it the other way, and simply saying "and
unlimited on 64 bit platforms". That's not technically true, of course.
You're limited by size_t and how much address space the OS will give
you. But unless you have petabyte repositories, you'd never know.

> But as we just adjusted the default to prevent a bug,
> maybe there are good reasons to adjust the value for users,
> too? (Specifically 32 bit platforms could run into the problem
> that the commit be4ca2905 (Increase core.packedGitLimit,
> 2017-04-20) states.)

No, there's really not a good reason to adjust it. If you're on 32-bit,
you almost certainly can't increase it to avoid the issue[1]. You could
bump it to say 512MB or even 1GB, and if your repository happens to be
smaller than that, it would help. But then you may also hit the
problem of running out of contiguous address space. In which case mmap
fails and you are much worse off than you otherwise would have been. The
exact value that's reasonable is going to to depend on things like how
much address space the kernel gives you, how much space other libraries
take up, how your libc allocates, etc. There's some discussion in the
thread around [2].

But in the end, I think the answer is "don't host a busy site full of
large repositories on a 32-bit system. It's 2017, for goodness' sake".

[1] I wouldn't call it a bug exactly; just an unfortunate interaction
    between two simultaneous processes.

[2] http://public-inbox.org/git/Pine.LNX.4.64.0612240126380.3671@woody.osdl.org/

> While I think this patch is worthwhile applying (even as-is),
> I think we might want to put another patch here?
> Or is there a way to fix the underlying issue when fetch and
> repack is run at the same time?

Before I remembered that we had already applied David's patch, I wrote
up a really long commit message for my own version. Let me quote a bit
from it, because it sets up some context for talking about solutions:

    In general, Git is resilient to the object database being
    repacked. If we fail to find an object, we re-scan the pack
    directory to pick the newly written pack.  The one exception
    is pack-objects, which records certain details of each
    objects (such as its exact pack location and delta format)
    and uses that during the final write phase. If a repack
    occurs while we're serving a fetch, the fetch may die with
    "packfile is invalid".

    We dealt with this somewhat in 4c0801820 (pack-objects:
    protect against disappearing packs, 2011-10-14), which
    doesn't commit to the use of a pack until we've opened and
    mmap'd the packfile. But that still leaves a loophole: we
    sometimes close and unmap packs if we hit certain limits.

    The rules are basically:

      1. If we open a pack bigger than our mmap window size
         (defined by core.packedGitWindowSize), we keep the
         descriptor open to make further maps.

      2. If we open a small pack, we mmap the whole thing and
         close the descriptor. This prevents us consuming too
         many descriptors if you have a lot of small packs.

      3. If we hit our descriptor limit (determined internally
         by getrlimit), we look for an open pack to close.

      4. If we hit our memory limit (defined by
         core.packedGitLimit), we close mmap windows until
         we're under the limit.

    As long as we have a reasonable number of packs (compared to
    the available descriptors), we won't close any descriptors
    according to (3). If we bump against the packedGitLimit in
    (4) and we close a large pack from (1), we're also OK; we
    retain the descriptor and can mmap again later. But if we
    hit the limit in (4) and close a small pack from (2), we'll
    have to re-open the pack, which may fail if it was repacked.

So the other direction, instead of avoiding the memory limit in (4), is
to stop closing "small" packs in (2). But I don't think that's a good
idea. Even with the code after David's patch, you can still trigger the
problem by running out of file descriptors. And if we stop closing
small packs, that makes it even more likely for that to happen.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] docs: update 64-bit core.packedGitLimit default
  2017-06-21 18:06       ` Jeff King
@ 2017-06-21 18:38         ` Junio C Hamano
  2017-06-21 18:53           ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2017-06-21 18:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, David Turner, git@vger.kernel.org

Jeff King <peff@peff.net> writes:

> So the other direction, instead of avoiding the memory limit in (4), is
> to stop closing "small" packs in (2). But I don't think that's a good
> idea. Even with the code after David's patch, you can still trigger the
> problem by running out of file descriptors. And if we stop closing
> small packs, that makes it even more likely for that to happen.

I recall that when we notice that we cannot access a loose one that
we earlier thought existed we fall back to rescan the packs?  Would
an approach similar to that can work to deal with the "closed small
pack goes away" scenario?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] docs: update 64-bit core.packedGitLimit default
  2017-06-21 18:38         ` Junio C Hamano
@ 2017-06-21 18:53           ` Jeff King
  2017-06-21 19:22             ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2017-06-21 18:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, David Turner, git@vger.kernel.org

On Wed, Jun 21, 2017 at 11:38:54AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > So the other direction, instead of avoiding the memory limit in (4), is
> > to stop closing "small" packs in (2). But I don't think that's a good
> > idea. Even with the code after David's patch, you can still trigger the
> > problem by running out of file descriptors. And if we stop closing
> > small packs, that makes it even more likely for that to happen.
> 
> I recall that when we notice that we cannot access a loose one that
> we earlier thought existed we fall back to rescan the packs?  Would
> an approach similar to that can work to deal with the "closed small
> pack goes away" scenario?

Not very well. See the first paragraph of my explanation. Basically,
pack-objects is special because it makes decisions based on (and records
pointers to) the particular packed representation. If that goes away, it
just bails.

Which isn't to say that falling back is impossible. I think in the worst
case that it could say "oops, I can't access the pack that has object X
anymore", fall back to finding _any_ copy of it and including it as a
pure base object (it's too late at that point to make a delta, and
trying to be clever about reusing on-disk deltas is likely just going to
end up with a broken corner case).

So then you have a sub-optimal pack, but at least it didn't die(). If
that happens for one object, I don't think it's that big a deal. But the
resulting pack could end up pretty sub-optimal if you lose access to a
whole pack. And remember, "small" here is just smaller than the window
size, which is a gigabyte on 64 bit systems. So imagine that you lose
access to a 500 MB pack, but we recover by sending base objects. Then
everything that was in that pack gets converted to its full non-delta
representation, which could mean it expands to several gigabytes. The
current behavior to die() and retry the fetch is not that bad an
alternative.

Of course, the best alternative is retaining access to the packs, which
is what typically happens now on 64-bit systems (it's just that the
packedGitLimit was set pointlessly low). I'm not sure if you're asking
in general, or as a last-ditch effort for 32-bit systems.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] docs: update 64-bit core.packedGitLimit default
  2017-06-21 18:53           ` Jeff King
@ 2017-06-21 19:22             ` Junio C Hamano
  2017-06-21 19:31               ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2017-06-21 19:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, David Turner, git@vger.kernel.org

Jeff King <peff@peff.net> writes:

> Of course, the best alternative is retaining access to the packs, which
> is what typically happens now on 64-bit systems (it's just that the
> packedGitLimit was set pointlessly low). I'm not sure if you're asking
> in general, or as a last-ditch effort for 32-bit systems.

As the last-ditch effort for 32-bit boxen.  But at this point in
time we probably do not care about them too deeply?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] docs: update 64-bit core.packedGitLimit default
  2017-06-21 19:22             ` Junio C Hamano
@ 2017-06-21 19:31               ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2017-06-21 19:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, David Turner, git@vger.kernel.org

On Wed, Jun 21, 2017 at 12:22:08PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Of course, the best alternative is retaining access to the packs, which
> > is what typically happens now on 64-bit systems (it's just that the
> > packedGitLimit was set pointlessly low). I'm not sure if you're asking
> > in general, or as a last-ditch effort for 32-bit systems.
> 
> As the last-ditch effort for 32-bit boxen.  But at this point in
> time we probably do not care about them too deeply?

Right, that was my general feeling. Especially as this is an issue only
for people hosting large and very active repositories (so "upgrade to a
better platform" is an easier pill to swallow for a few servers rather
than the whole of the Git population).

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-06-21 19:31 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-20 20:41 [PATCH] Increase core.packedGitLimit David Turner
2017-04-20 21:02 ` Jeff King
2017-04-20 21:58   ` Johannes Schindelin
2017-04-20 22:04     ` David Turner
2017-04-21  9:34       ` Johannes Schindelin
2017-06-21 13:51   ` [PATCH] docs: update 64-bit core.packedGitLimit default Jeff King
2017-06-21 16:25     ` Stefan Beller
2017-06-21 18:06       ` Jeff King
2017-06-21 18:38         ` Junio C Hamano
2017-06-21 18:53           ` Jeff King
2017-06-21 19:22             ` Junio C Hamano
2017-06-21 19:31               ` Jeff King
2017-06-21 13:44 ` [PATCH] Increase core.packedGitLimit Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).