git@vger.kernel.org mailing list mirror (one of many)
* Git server eats all memory
@ 2010-08-04 14:57 Ivan Kanis
  2010-08-04 15:55 ` Matthieu Moy
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Ivan Kanis @ 2010-08-04 14:57 UTC (permalink / raw)
  To: git

Hello,

I am running git 1.7.0.4 on Debian stable. I have compiled git from
source. I am cloning using ssh. The client and the server are running
the same version. The bare repository is 4.5G with various binary files,
and I have 6G of memory on my server.

I am having a problem with memory ballooning when receiving objects
from the server. The amount of memory used on the server seems to be the
same size as the objects received.

I have been discussing this quite a bit on #git at freenode. Hilary said
it may be due to file mmap, which would skew the memory reported by
top.

I have run two clones side by side and noticed that the server started
to swap and performance was awful. This means that if two developers
are doing a clone at the same time they will notice the slowness.

Another worry is that we're planning to have more repositories of the
same size and the server just won't scale.

I am wondering if anyone has seen this behavior? I'll do whatever I can
to troubleshoot the problem. I know C but I just don't know where to
look. Any help would be very much appreciated.

Kind regards,
-- 
Ivan Kanis

Let a fool hold his tongue and he will pass for a sage.
    -- Publilius Syrus 


* Re: Git server eats all memory
  2010-08-04 14:57 Git server eats all memory Ivan Kanis
@ 2010-08-04 15:55 ` Matthieu Moy
  2010-08-04 17:50   ` Ivan Kanis
  2010-08-04 20:12 ` Avery Pennarun
  2010-08-10  0:46 ` Robin H. Johnson
  2 siblings, 1 reply; 28+ messages in thread
From: Matthieu Moy @ 2010-08-04 15:55 UTC (permalink / raw)
  To: Ivan Kanis; +Cc: git

Ivan Kanis <expire-by-2010-08-09@kanis.fr> writes:

> Hello,
>
> I am running git 1.7.0.4 on Debian stable. I have compiled git from
> source. I am cloning using ssh. The client and the server are running
> the same version. The bare repository is 4.5G with various binary files,
> I have 6G of memory on my server.

I never tried that size myself, but according to what I read on the
mailing list, that should remain within what Git could manage, maybe a
bit painfully.

The standard followup question in your case is: is the repository
fully packed on the server? If not, maybe "git gc" or "git gc
--aggressive" (expensive, but a one-time operation) could help.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


* Re: Git server eats all memory
  2010-08-04 15:55 ` Matthieu Moy
@ 2010-08-04 17:50   ` Ivan Kanis
  0 siblings, 0 replies; 28+ messages in thread
From: Ivan Kanis @ 2010-08-04 17:50 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> wrote:

> The standard followup question in your case is: is the repository
> fully packed on the server? If not, maybe "git gc" or "git gc
> --aggressive" (expensive, but a one-time operation) could help.

Hi Matthieu,

I'll give it a shot when I get back to work.

I have done the following:

git repack -adf --window=100 --depth=20 --window-memory=50m

It greatly helps the compression stage that used up all the memory. The
problem is in the next phase when receiving objects. It is a bit strange
as receiving objects should not take up any memory on the server.

Take care,
-- 
Ivan Kanis
http://kanis.fr

Email is a wonderful thing for people whose role in life is to be on
top of things. But not for me; my role is to be on the bottom of
things. What I do takes long hours of studying and uninterruptible
concentration.
    -- Donald Knuth                                                 


* Re: Git server eats all memory
  2010-08-04 14:57 Git server eats all memory Ivan Kanis
  2010-08-04 15:55 ` Matthieu Moy
@ 2010-08-04 20:12 ` Avery Pennarun
  2010-08-05  6:33   ` Ivan Kanis
  2010-08-10  0:46 ` Robin H. Johnson
  2 siblings, 1 reply; 28+ messages in thread
From: Avery Pennarun @ 2010-08-04 20:12 UTC (permalink / raw)
  To: Ivan Kanis; +Cc: git

On Wed, Aug 4, 2010 at 10:57 AM, Ivan Kanis
<expire-by-2010-08-09@kanis.fr> wrote:
> I am running git 1.7.0.4 on Debian stable. I have compiled git from
> source. I am cloning using ssh. The client and the server are running
> the same version. The bare repository is 4.5G with various binary files,
> I have 6G of memory on my server.
>
> I am having problem with memory ballooning when receiving object
> from the server. The amount of memory used on the server seems to be same
> size as the object received.

Git works fine with huge repositories; it does not work fine at all
with very large individual objects in a repository, and it does what
you're experiencing.  There are a few minor workarounds (like the
repack command someone mentioned) that slightly reduce the symptoms,
but the symptoms will crop up again eventually.

There's at least one project intended to solve this that people have linked to:
http://caca.zoy.org/wiki/git-bigfiles

...but it's incomplete and it doesn't look like their repo has changed
in some time.

I'm not aware of any workaround to this sort of thing other than
"don't store large objects in git."  You can split the big objects
into a bunch of small objects, as bup does, but then you have to
reassemble them all, which is inconvenient.

Sorry,

Avery


* Re: Git server eats all memory
  2010-08-04 20:12 ` Avery Pennarun
@ 2010-08-05  6:33   ` Ivan Kanis
  2010-08-05 22:45     ` Jared Hance
  2010-08-06  1:37     ` Nguyen Thai Ngoc Duy
  0 siblings, 2 replies; 28+ messages in thread
From: Ivan Kanis @ 2010-08-05  6:33 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: git

Avery Pennarun <apenwarr@gmail.com> wrote:

> On Wed, Aug 4, 2010 at 10:57 AM, Ivan Kanis

>> I am having problem with memory ballooning when receiving object
>> from the server. The amount of memory used on the server seems to be same
>> size as the object received.

> Git works fine with huge repositories; it does not work fine at all
> with very large individual objects in a repository, and it does what
> you're experiencing.

Hello Avery,

The largest object is 120M. I didn't describe the problem very
accurately. The memory consumed is the *sum* of the objects
downloaded. 

For example, a repository of 4G consumes 4G of memory at the end of the
receiving objects phase. What is very interesting is the total of
objects downloaded is the same as the memory consumed. That makes me
think there is a link somewhere. Surely it shouldn't consume that much
memory.

I am ready to do whatever it takes to diagnose the problem. I know C
pretty well and am ready to look into it, but I am not sure where to
start.

Take care,
-- 
Ivan Kanis
http://kanis.fr

Seriousness is the only refuge of the shallow.
    -- Oscar Wilde 


* Re: Git server eats all memory
  2010-08-05  6:33   ` Ivan Kanis
@ 2010-08-05 22:45     ` Jared Hance
  2010-08-06  1:37     ` Nguyen Thai Ngoc Duy
  1 sibling, 0 replies; 28+ messages in thread
From: Jared Hance @ 2010-08-05 22:45 UTC (permalink / raw)
  To: git

On Thu, Aug 05, 2010 at 08:33:02AM +0200, Ivan Kanis wrote:
> I am ready to do whatever to diagnose the problem. I know C pretty well
> and am ready to look into it but I am not sure where to start.

It sounds like Git is reading each object into memory to send it to
the client, but doesn't remember to free the memory at the end of
sending the object, so it remains as a memory leak.

I would look in the code for malloc calls that don't have a free call,
or spots where free calls might not be hit.


* Re: Git server eats all memory
  2010-08-05  6:33   ` Ivan Kanis
  2010-08-05 22:45     ` Jared Hance
@ 2010-08-06  1:37     ` Nguyen Thai Ngoc Duy
  2010-08-06  1:51       ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 28+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-08-06  1:37 UTC (permalink / raw)
  To: Ivan Kanis; +Cc: Avery Pennarun, git

On Thu, Aug 5, 2010 at 4:33 PM, Ivan Kanis
<expire-by-2010-08-10@kanis.fr> wrote:
> I am ready to do whatever to diagnose the problem. I know C pretty well
> and am ready to look into it but I am not sure where to start.

Try "git pack-objects --all --stdout > /dev/null" on the repo on the
server to see if it uses the same amount of memory you saw in cloning.
You can then try debugging that command if it does.
-- 
Duy


* Re: Git server eats all memory
  2010-08-06  1:37     ` Nguyen Thai Ngoc Duy
@ 2010-08-06  1:51       ` Nguyen Thai Ngoc Duy
  2010-08-06 11:34         ` Jakub Narebski
  2010-08-06 17:23         ` Ivan Kanis
  0 siblings, 2 replies; 28+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-08-06  1:51 UTC (permalink / raw)
  To: Ivan Kanis; +Cc: Avery Pennarun, git

On Fri, Aug 6, 2010 at 11:37 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Thu, Aug 5, 2010 at 4:33 PM, Ivan Kanis
> <expire-by-2010-08-10@kanis.fr> wrote:
>> I am ready to do whatever to diagnose the problem. I know C pretty well
>> and am ready to look into it but I am not sure where to start.
>
> Try "git pack-objects --all --stdout > /dev/null" on the repo on
> server to see if it uses the same amount of memory you saw in cloning.
> You can then try debugging that command if it does.

Naah, git pack-objects needs a list of commit tips. Try
git for-each-ref|cut -c 1-40|git pack-objects --all --stdout > /dev/null
-- 
Duy


* Re: Git server eats all memory
  2010-08-06  1:51       ` Nguyen Thai Ngoc Duy
@ 2010-08-06 11:34         ` Jakub Narebski
  2010-08-06 17:23         ` Ivan Kanis
  1 sibling, 0 replies; 28+ messages in thread
From: Jakub Narebski @ 2010-08-06 11:34 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Ivan Kanis, Avery Pennarun, git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> On Fri, Aug 6, 2010 at 11:37 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
>> On Thu, Aug 5, 2010 at 4:33 PM, Ivan Kanis <expire-by-2010-08-10@kanis.fr> wrote:

>>> I am ready to do whatever to diagnose the problem. I know C pretty well
>>> and am ready to look into it but I am not sure where to start.
>>
>> Try "git pack-objects --all --stdout > /dev/null" on the repo on
>> server to see if it uses the same amount of memory you saw in cloning.
>> You can then try debugging that command if it does.
> 
> Naah, git pack-objects needs list of commit tips. Try
> git for-each-ref|cut -c 1-40|git pack-objects --all --stdout > /dev/null

Nitpick: git-for-each-ref has a `--format' option (for example
--format='%(objectname)'), so there is no need for `cut'.

-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: Git server eats all memory
  2010-08-06  1:51       ` Nguyen Thai Ngoc Duy
  2010-08-06 11:34         ` Jakub Narebski
@ 2010-08-06 17:23         ` Ivan Kanis
  2010-08-07  6:42           ` Dmitry Potapov
       [not found]           ` <AANLkTi=yeTh2tKn9t_=iZbdB5VLrfCPZ2_fBpYdf9wta@mail.gmail.com>
  1 sibling, 2 replies; 28+ messages in thread
From: Ivan Kanis @ 2010-08-06 17:23 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy, jaredhance; +Cc: Avery Pennarun, jnareb, git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:

> Naah, git pack-objects needs list of commit tips. Try
> git for-each-ref|cut -c 1-40|git pack-objects --all --stdout > /dev/null

Jared Hance <jaredhance@gmail.com> wrote:

> I would look in the code for malloc calls that don't have a free call,
> or spots where free calls might not be hit.

Hello Jared and Nguyen,

Thank you Nguyen for your command. I can now reproduce the problem
without needing the network. I have been following Jared's lead today on
a potential memory leak. Here is what I found out.

I downloaded the latest release of git, 1.7.2.1, and compiled it with
debugging support. I ran valgrind on the command and found two memory
leaks. I put the output at the bottom of the e-mail as it's not very
interesting. I patched one of the leaks in pack-objects.c but got the
same problem: over 4G of memory consumption for a 4G repository.

I've come to the conclusion that it's not a memory leak. 

This afternoon I put macros around the following functions: xmalloc,
xmallocz, xrealloc, xcalloc and xmmap. They reported the line of code and
the size passed to each function. I then ran the result through a script
that totaled the amount used by each bit of code.

Here are the top 3 consumers:

| function | source                     | size in M |
|----------+----------------------------+-----------|
| xrealloc | builtin/pack-objects.c:690 |        86 |
| xmallocz | patch-delta.c:36           |       301 |
| xmmap    | sha1_file.c:772            |      4393 |

I expected the malloc to take 4G but was surprised it didn't. It seems
to be mmap taking all the memory. I am not familiar with that function,
it looks like it's mapping memory to a file... Is it reasonable to mmap
so much memory?
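
The totalling step described above could be sketched roughly as follows.
This is not the script that produced the table; the log format (one
"function file:line bytes" record per line) and the names total_by_site
and sample_log are assumptions made for illustration.

```python
from collections import defaultdict


def total_by_site(lines):
    """Sum the bytes requested per (function, call site) pair.

    Each record is assumed to look like: "xmmap sha1_file.c:772 1048576".
    """
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed records
        function, site, size = parts
        totals[(function, site)] += int(size)
    # Smallest first, so the largest consumer ends up on the last row.
    return sorted(totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical log lines, not real measurements.
    sample_log = [
        "xmmap sha1_file.c:772 1048576",
        "xmallocz patch-delta.c:36 4096",
        "xmmap sha1_file.c:772 2097152",
    ]
    for (function, site), size in total_by_site(sample_log):
        print(f"{function:10} {site:28} {size / 2**20:8.2f} M")
```

Note that a running total of the sizes passed in measures cumulative
requests. It ignores free() and munmap(), so it is an upper bound rather
than a measure of peak or resident memory.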

Today I chatted with someone on freenode #git and he reported the same
problem on his 2G repository. I am glad I am not the only one seeing
this ;)

I tried reading the code but it's going over my head. I'll look at it
some more next Monday.

If anyone is familiar with the source code of git I would love to have
some insight into this.

Take care,

Ivan Kanis

PS: output of valgrind --leak-check=full

65 bytes in 1 blocks are definitely lost in loss record 4 of 7
   at 0x4C2260E: malloc (vg_replace_malloc.c:207)
   by 0x4C22797: realloc (vg_replace_malloc.c:429)
   by 0x4C600D: xrealloc (wrapper.c:80)
   by 0x4B7939: strbuf_grow (strbuf.c:70)
   by 0x4B80BA: strbuf_addf (strbuf.c:201)
   by 0x4832EF: system_path (exec_cmd.c:37)
   by 0x483411: setup_path (exec_cmd.c:104)
   by 0x404AF2: main (git.c:536)

512 bytes in 1 blocks are definitely lost in loss record 5 of 8
   at 0x4C203E4: calloc (vg_replace_malloc.c:397)
   by 0x4C5F9D: xcalloc (wrapper.c:96)
   by 0x445741: cmd_pack_objects (pack-objects.c:2117)
   by 0x4048EE: handle_internal_command (git.c:270)
   by 0x404B03: main (git.c:470)
-- 
http://kanis.fr

Everything should be made as simple as possible, but not simpler.
    -- Albert Einstein 


* Re: Git server eats all memory
  2010-08-06 17:23         ` Ivan Kanis
@ 2010-08-07  6:42           ` Dmitry Potapov
  2010-08-09 10:12             ` Excessive mmap [was Git server eats all memory] Ivan Kanis
       [not found]           ` <AANLkTi=yeTh2tKn9t_=iZbdB5VLrfCPZ2_fBpYdf9wta@mail.gmail.com>
  1 sibling, 1 reply; 28+ messages in thread
From: Dmitry Potapov @ 2010-08-07  6:42 UTC (permalink / raw)
  To: Ivan Kanis; +Cc: Nguyen Thai Ngoc Duy, jaredhance, Avery Pennarun, jnareb, git

On Fri, Aug 06, 2010 at 07:23:17PM +0200, Ivan Kanis wrote:
>
> I expected the malloc to take 4G but was surprised it didn't. It seems
> to be mmap taking all the memory. I am not familiar with that function,
> it looks like it's mapping memory to a file... Is it reasonable to mmap
> so much memory?

AFAIK, Git does not need to mmap the whole pack into memory, but it
is more efficient to mmap the whole pack wherever possible, because
access to it is completely random, so if you keep only one sliding
window, you will have to re-read it many times. Besides, the mmap size
does not mean that so much physical memory is used. Pages should
be loaded when they are needed, and if you have more than one
client cloning the same repo, this memory should be shared by them.
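
The cost of keeping only one sliding window under random access can be
illustrated with a toy model. It assumes a single window aligned to
multiples of its own size and counts how often that window would have to
be remapped for a given access pattern. Git's real window management is
more elaborate, and count_remaps is a hypothetical helper, not a git
function.

```python
import random


def count_remaps(offsets, window_size):
    """Count how often a single aligned window must be (re)mapped.

    offsets: byte offsets accessed, in order.
    window_size: size of the one window kept mapped at a time.
    """
    remaps = 0
    current_window = None
    for offset in offsets:
        window = offset // window_size  # index of the window holding offset
        if window != current_window:
            remaps += 1
            current_window = window
    return remaps


if __name__ == "__main__":
    pack_size = 4 * 2**30  # a 4 GiB pack, roughly the size in this thread
    rng = random.Random(0)
    accesses = [rng.randrange(pack_size) for _ in range(10_000)]
    for window_size in (32 * 2**20, 2**30, pack_size):
        print(window_size, count_remaps(accesses, window_size))
```

With a window as large as the pack the count stays at one, while small
windows are remapped on almost every access.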


Dmitry


* Re: Git server eats all memory
       [not found]             ` <wesbp9cnnag.fsf@kanis.fr>
@ 2010-08-09  9:57               ` Nguyen Thai Ngoc Duy
  2010-08-09 17:38                 ` Ivan Kanis
  0 siblings, 1 reply; 28+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-08-09  9:57 UTC (permalink / raw)
  To: Ivan Kanis; +Cc: git

On Mon, Aug 9, 2010 at 7:33 PM, Ivan Kanis
<expire-by-2010-08-14@kanis.fr> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
>
>> Can you send me massif report of git pack-objects? It'd be interesting
>> to see how memory is allocated.
>
> Hi Nguyen,
>
> I have attached the massif report.

Thanks. It does not look like it used a lot of memory (~50MB) (viewed
with ms_print). Git allocates 32 bytes per tree and 48 per commit plus
all tree contents, and all that will stay in memory until the end.
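
To get a feel for those per-object figures, here is a back-of-the-envelope
calculation. The object counts are invented for illustration, and the 32
and 48 byte figures are simply the ones quoted above; tree contents, which
also stay in memory, are not included.

```python
TREE_BYTES = 32    # per-tree figure quoted above
COMMIT_BYTES = 48  # per-commit figure quoted above


def object_struct_bytes(trees, commits):
    """Memory taken by the in-core tree and commit records alone."""
    return trees * TREE_BYTES + commits * COMMIT_BYTES


if __name__ == "__main__":
    # Hypothetical repository: 1,000,000 trees and 200,000 commits.
    total = object_struct_bytes(1_000_000, 200_000)
    print(f"{total / 2**20:.1f} MiB")
```

Even with over a million such objects the records themselves stay in the
tens of megabytes, far from the gigabytes reported earlier in the thread.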

This command on git already gives me ~40MB peak. Are you sure you ran
it on your big repo?

echo|valgrind --tool=massif ./git pack-objects --all --stdout > /dev/null
-- 
Duy


* Excessive mmap [was Git server eats all memory]
  2010-08-07  6:42           ` Dmitry Potapov
@ 2010-08-09 10:12             ` Ivan Kanis
  2010-08-09 12:35               ` Dmitry Potapov
  2018-06-20 14:53               ` Duy Nguyen
  0 siblings, 2 replies; 28+ messages in thread
From: Ivan Kanis @ 2010-08-09 10:12 UTC (permalink / raw)
  To: Dmitry Potapov
  Cc: Ivan Kanis, Nguyen Thai Ngoc Duy, jaredhance, Avery Pennarun,
	jnareb, git

Dmitry Potapov <dpotapov@gmail.com> wrote:

> On Fri, Aug 06, 2010 at 07:23:17PM +0200, Ivan Kanis wrote:
>>
>> I expected the malloc to take 4G but was surprised it didn't. It seems
>> to be mmap taking all the memory. I am not familiar with that function,
>> it looks like it's mapping memory to a file... Is it reasonable to mmap
>> so much memory?
>
> AFAIK, Git does not need to mmap the whole pack to memory, but it
> is more efficient to mmap the whole pack wherever possible, because
> it has a completely random access, so if you store only one sliding
> window, you will have to re-read it many times. Besides, mmap size
> does not mean that so much physical memory is used. Pages should
> be loaded when they are necessary, and if you have more than one
> client cloning the same repo, this memory should be shared by them.

I have cloned identical repositories side by side and the system starts
to swap. I think it shows that cloning two repositories doesn't share
the mmap.

I saw this constant defined in git-compat-util.h

/* This value must be multiple of (pagesize * 2) */
#define DEFAULT_PACKED_GIT_WINDOW_SIZE \
        (sizeof(void*) >= 8 \
                ?  1 * 1024 * 1024 * 1024 \
                : 32 * 1024 * 1024)

If I read this correctly git is allocating 1G of mmap on a 64-bit
architecture. Isn't that a bit much? I am running on a 64-bit server so
I have bumped DEFAULT_PACKED_GIT_WINDOW_SIZE down to 64M but, alas, the
pack command still takes over 4G...
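
The arithmetic behind the macro can be worked through in a few lines. This
is a sketch that mirrors the macro quoted above; default_window_size and
windows_to_cover are illustrative names, and the 4.5 GiB figure is the
repository size from the first message (the pack itself may be smaller).

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def default_window_size(pointer_bytes):
    """Mirror DEFAULT_PACKED_GIT_WINDOW_SIZE for a given sizeof(void*)."""
    return 1 * GIB if pointer_bytes >= 8 else 32 * MIB


def windows_to_cover(pack_bytes, window_bytes):
    """Number of windows of window_bytes needed to span pack_bytes."""
    return -(-pack_bytes // window_bytes)  # ceiling division


if __name__ == "__main__":
    pack = int(4.5 * GIB)
    for label, window in (("64-bit default", default_window_size(8)),
                          ("32-bit default", default_window_size(4)),
                          ("64M override", 64 * MIB)):
        count = windows_to_cover(pack, window)
        print(label, window // MIB, "MiB ->", count, "windows")
```

The window size bounds how large each individual mapping is. It does not
change how many bytes of the pack are eventually read, which may be why
lowering it made no visible difference here.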

I'll keep investigating today,

Take care,
-- 
Ivan Kanis

Nothing in life is to be feared. It is only to be understood.
    -- Marie Curie 


* Re: Excessive mmap [was Git server eats all memory]
  2010-08-09 10:12             ` Excessive mmap [was Git server eats all memory] Ivan Kanis
@ 2010-08-09 12:35               ` Dmitry Potapov
  2010-08-09 16:34                 ` Ivan Kanis
  2018-06-20 14:53               ` Duy Nguyen
  1 sibling, 1 reply; 28+ messages in thread
From: Dmitry Potapov @ 2010-08-09 12:35 UTC (permalink / raw)
  To: Ivan Kanis
  Cc: Ivan Kanis, Nguyen Thai Ngoc Duy, jaredhance, Avery Pennarun,
	jnareb, git

On Mon, Aug 09, 2010 at 12:12:34PM +0200, Ivan Kanis wrote:
>
> I have clone identical repositories and the system starts to swap. I
> think it shows that cloning two repository doesn't share mmap.

Though Git uses MAP_PRIVATE with mmap, this only marks mapped pages as
copy-on-write. Because cloning does not change the pack file, all those
pages should be shared. So, the only reasons for swapping could be:
- each cloning operation accesses different pages at the same time, so
  more pages have to be loaded into memory to allow the two programs
  to run simultaneously.
- each operation allocates 387MB (according to your earlier data), so
  it may add more memory pressure.
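
The copy-on-write behaviour can be seen with a small experiment. The
sketch below uses Python's mmap.ACCESS_COPY, which asks for a private
copy-on-write mapping (MAP_PRIVATE on Unix). The helper name
write_through_private_mapping is made up for this example.

```python
import mmap
import os
import tempfile


def write_through_private_mapping(path):
    """Modify a private (copy-on-write) mapping and report both views.

    Returns (bytes seen through the mapping, bytes stored in the file).
    """
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
            m[0:4] = b"XXXX"  # changes only this process's private copy
            seen_in_mapping = bytes(m[0:4])
    with open(path, "rb") as f:
        stored_in_file = f.read(4)
    return seen_in_mapping, stored_in_file


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"PACK" + b"\0" * 4092)
        os.close(fd)
        print(write_through_private_mapping(path))
    finally:
        os.remove(path)
```

Only pages that are written get a private copy. A process that merely
reads the mapping, as a clone does with a pack file, keeps using the
shared pages.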

>
> I saw this constant defined in git-compat-util.h
>
> /* This value must be multiple of (pagesize * 2) */
> #define DEFAULT_PACKED_GIT_WINDOW_SIZE \
>        (sizeof(void*) >= 8 \
>                ?  1 * 1024 * 1024 * 1024 \
>                : 32 * 1024 * 1024)
>
> If I read this correctly git is allocating 1G of mmap on 64 bit
> architecture. Isn't that a bit much?

On a 64-bit architecture, you have plenty of virtual address space, and
mapping a file into memory should not take much physical memory (only space
needed for system tables). You can reduce core.packedGitWindowSize
in the Git configuration to see if it helps, but I doubt that it
will have any noticeable effect.


Dmitry


* Re: Excessive mmap [was Git server eats all memory]
  2010-08-09 12:35               ` Dmitry Potapov
@ 2010-08-09 16:34                 ` Ivan Kanis
  2010-08-09 16:50                   ` Avery Pennarun
  0 siblings, 1 reply; 28+ messages in thread
From: Ivan Kanis @ 2010-08-09 16:34 UTC (permalink / raw)
  To: Dmitry Potapov
  Cc: Nguyen Thai Ngoc Duy, jaredhance, Avery Pennarun, jnareb, git

Dmitry Potapov <dpotapov@gmail.com> wrote:

> Though Git uses MAP_PRIVATE with mmap, this only marks mapped pages as
> copy-on-write. Because cloning does not change the pack file, all those
> pages should be shared.

I reran the test today. One client is receiving while the other one is
compressing. I have to interrupt both clients because the server is
becoming unusable. I am sure you are right about sharing the pages but I
can't test it.

> On 64-bit architecture, you have plenty virtual space, and mapping
> a file to memory should not take much physical memory (only space
> needed for system tables). 

What I can tell from the mmap man page is that it should map memory to a
file. I assume it shouldn't take up physical memory. However I am seeing
physical memory being consumed. It might be a feature of the kernel. Is
there a way to turn it off?

> You can reduce core.packedGitWindowSize in the Git configuration to
> see if it helps, but I doubt that it will have any noticeable effect.

It was worth a shot, but it didn't help...

Looking some more into it today the bulk of the memory allocation
happens in write_pack_file in the following loop.

for (; i < nr_objects; i++) {
    if (!write_one(f, objects + i, &offset))
        break;
    display_progress(progress_state, written);
}

This eventually calls write_object. Here I am wondering whether the
unuse_pack function is doing its job. As far as I can tell it writes a
null in memory, which I think is not enough to reclaim memory.

I also looked at the use_pack function where the mmap is
happening. Would it be worth refactoring this function so that it uses
an index within a file instead of mmap?

Unless I hear of a better idea I'll be trying that tomorrow...

Take care,
-- 
Ivan Kanis

I can stand brute force, but brute reason is quite unbearable.  There
is something unfair about its use. It is hitting below the intellect.
    -- Oscar Wilde 


* Re: Excessive mmap [was Git server eats all memory]
  2010-08-09 16:34                 ` Ivan Kanis
@ 2010-08-09 16:50                   ` Avery Pennarun
  2010-08-09 17:45                     ` Tomas Carnecky
                                       ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Avery Pennarun @ 2010-08-09 16:50 UTC (permalink / raw)
  To: Ivan Kanis; +Cc: Dmitry Potapov, Nguyen Thai Ngoc Duy, jaredhance, jnareb, git

On Mon, Aug 9, 2010 at 12:34 PM, Ivan Kanis
<expire-by-2010-08-14@kanis.fr> wrote:
> Dmitry Potapov <dpotapov@gmail.com> wrote:
>> On 64-bit architecture, you have plenty virtual space, and mapping
>> a file to memory should not take much physical memory (only space
>> needed for system tables).
>
> What I can tell from the mmap man page is that it should map memory to a
> file. I assume it shouldn't take up physical memory. However I am seeing
> physical memory being consumed. It might be a feature of the kernel. Is
> there a way to turn it off?

'ps axu' will show two columns: VSZ (the VSIZE discussed below) and RSS.
The only one that actually matters is RSS.

When you mmap a file, it will immediately consume a lot of VSIZE - but
this won't affect your available system memory, because you have only
consumed "virtual" memory.  Instead of swapping that memory out to the
swap file, the kernel knows that this chunk of virtual memory is
already on disk - inside the mmap'd file.

When you access some of the pages of the mmap'd file, the kernel will
swap those pages into memory, which increases RSS.  This uses *real*
memory on the system.

As git generates a new pack file, it needs to access every single page
of every single pack that it's reading from, so eventually, all the
stuff you need will get sucked into RSS, so you'll see that number
grow and grow.  If your packfiles are huge, this is a lot of memory.

Now, the kernel is supposed to be smart enough to release old pages
out of RSS if you stop using them; it's no different from what the
kernel does with any cached file data.  So it shouldn't be expensive
to mmap instead of just reading the file.
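
On Linux the same two figures can be read from /proc/<pid>/status as
VmSize and VmRSS. A minimal sketch, assuming that file's usual
"Name: value kB" layout; parse_vm_fields and snapshot are hypothetical
helpers, and the sparse temporary file merely stands in for a pack.

```python
import mmap
import os
import tempfile


def parse_vm_fields(status_text):
    """Extract VmSize and VmRSS (in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("VmSize", "VmRSS"):
            fields[name] = int(rest.split()[0])  # value is followed by "kB"
    return fields


def snapshot():
    """Return the fields for this process, or {} where /proc is missing."""
    path = "/proc/self/status"
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return parse_vm_fields(f.read())


if __name__ == "__main__":
    size = 256 * 2**20
    with tempfile.TemporaryFile() as f:
        f.truncate(size)  # sparse 256 MiB file
        print("before mmap:", snapshot())
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as m:
            print("after mmap: ", snapshot())  # VmSize is expected to jump
            for i in range(0, size, mmap.PAGESIZE):
                m[i]  # touch one byte per page
            print("after touch:", snapshot())  # VmRSS is expected to grow
```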

> Looking some more into it today the bulk of the memory allocation
> happens in write_pack_file in the following loop.
>
> for (; i < nr_objects; i++) {
>    if (!write_one(f, objects + i, &offset))
>        break;
>    display_progress(progress_state, written);
> }
>
> This eventually calls write_object, here I am wondering if the
> unuse_pack function is doing its job. As far as I can tell it writes a
> null in memory, that I think is not enough to reclaim memory.

What do you mean by the "memory allocation" happens here?  How are you
measuring it?

unuse_pack indeed doesn't free any memory; it just zeroes a pointer
and decreases a refcount.  I don't know much about this code, but I
assume something else goes and cleans up the mmaps later.

In any case, mmap/munmap have little to do with your "real" memory
usage.  munmap() won't free any actual kernel memory; the used pages
will still be floating around in disk cache.

> I also looked at the use_pack function where the mmap is
> happening. Would it be worth refactoring this function so that it uses
> an index within a file instead of mmap?
>
> Unless I hear of a better idea I'll be trying that tomorrow...

I wouldn't expect this to help, but I would be interested to hear if it does.

If the problem is simply that you're flooding the kernel disk cache
with data you'll use only once, to the detriment of everything else on
the system, then one thing that might help could be posix_fadvise:

    posix_fadvise(fd, ofs, len, POSIX_FADV_DONTNEED);

bup uses this when backing up huge files, since it knows it's only
going to use each block once, and this seemed to decrease system load
(without affecting bup's own performance) in some test cases.
However, it uses this for filesystem files, not packs, so it's a
different use case.
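
As a rough illustration of that idea (not bup's or git's actual code), the
sketch below reads a file sequentially and tells the kernel after each
chunk that the bytes just read will not be needed again. It uses Python's
os.posix_fadvise, which is only available on Unix; the function name
read_once and the chunk size are assumptions.

```python
import os
import tempfile


def read_once(path, chunk_size=1 << 20):
    """Read a file sequentially, advising the kernel to drop each chunk.

    Returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            # ... process the chunk here ...
            if hasattr(os, "posix_fadvise"):
                # Advisory only: the kernel is free to ignore the hint.
                os.posix_fadvise(fd, total, len(chunk),
                                 os.POSIX_FADV_DONTNEED)
            total += len(chunk)
    finally:
        os.close(fd)
    return total


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 << 20))
    try:
        print(read_once(tmp.name), "bytes read")
    finally:
        os.remove(tmp.name)
```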

On the other hand, perhaps a more important question is: why does git
feel like it needs to generate entirely new packs for each person
doing a clone on your system?  Shouldn't it be reusing existing ones
and just streaming them straight out to the recipient?

Have fun,

Avery


* Re: Git server eats all memory
  2010-08-09  9:57               ` Git server eats all memory Nguyen Thai Ngoc Duy
@ 2010-08-09 17:38                 ` Ivan Kanis
  0 siblings, 0 replies; 28+ messages in thread
From: Ivan Kanis @ 2010-08-09 17:38 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Ivan Kanis, git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:

> This command on git already gives me ~40MB peak. Are you sure you ran
> it on your big repo?

Hi Nguyen,

Yes I ran it on the big repo...

Take care,
-- 
Ivan Kanis
http://kanis.fr

Men of lofty genius when they are doing the least work are the most
active.
    -- Leonardo da Vinci 


* Re: Excessive mmap [was Git server eats all memory]
  2010-08-09 16:50                   ` Avery Pennarun
@ 2010-08-09 17:45                     ` Tomas Carnecky
  2010-08-09 18:17                       ` Avery Pennarun
  2010-08-09 21:28                     ` Dmitry Potapov
  2010-08-11 15:47                     ` Ivan Kanis
  2 siblings, 1 reply; 28+ messages in thread
From: Tomas Carnecky @ 2010-08-09 17:45 UTC (permalink / raw)
  To: Avery Pennarun
  Cc: Ivan Kanis, Dmitry Potapov, Nguyen Thai Ngoc Duy, jaredhance,
	jnareb, git

On 8/9/10 6:50 PM, Avery Pennarun wrote:
> On the other hand, perhaps a more important question is: why does git
> feel like it needs to generate entirely new packs for each person
> doing a clone on your system?  Shouldn't it be reusing existing ones
> and just streaming them straight out to the recipient?

Isn't that something that the rev cache is supposed to fix?

tom


* Re: Excessive mmap [was Git server eats all memory]
  2010-08-09 17:45                     ` Tomas Carnecky
@ 2010-08-09 18:17                       ` Avery Pennarun
  0 siblings, 0 replies; 28+ messages in thread
From: Avery Pennarun @ 2010-08-09 18:17 UTC (permalink / raw)
  To: Tomas Carnecky
  Cc: Ivan Kanis, Dmitry Potapov, Nguyen Thai Ngoc Duy, jaredhance,
	jnareb, git

On Mon, Aug 9, 2010 at 1:45 PM, Tomas Carnecky <tom@dbservice.com> wrote:
> On 8/9/10 6:50 PM, Avery Pennarun wrote:
>> On the other hand, perhaps a more important question is: why does git
>> feel like it needs to generate entirely new packs for each person
>> doing a clone on your system?  Shouldn't it be reusing existing ones
>> and just streaming them straight out to the recipient?
>
> Isn't that something that the rev cache is supposed to fix?

I wouldn't think so - at least not when cloning a repository from
scratch.  The whole idea of a clone is you ought to be able to copy
the pack contents verbatim since you should want "all" the objects.
Though maybe I've missed something... I've never read the code for
that stuff.

Have fun,

Avery


* Re: Excessive mmap [was Git server eats all memory]
  2010-08-09 16:50                   ` Avery Pennarun
  2010-08-09 17:45                     ` Tomas Carnecky
@ 2010-08-09 21:28                     ` Dmitry Potapov
  2010-08-11 15:47                     ` Ivan Kanis
  2 siblings, 0 replies; 28+ messages in thread
From: Dmitry Potapov @ 2010-08-09 21:28 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Ivan Kanis, Nguyen Thai Ngoc Duy, jaredhance, jnareb, git

On Mon, Aug 09, 2010 at 12:50:30PM -0400, Avery Pennarun wrote:
>
> On the other hand, perhaps a more important question is: why does git
> feel like it needs to generate entirely new packs for each person
> doing a clone on your system?  Shouldn't it be reusing existing ones
> and just streaming them straight out to the recipient?

Git cannot send the whole pack, in general, because it sends only those
objects that are requested by the client. So, except for the initial
clone, it is never the whole pack. Moreover, even during the initial
clone, it may not be the whole pack, but only the visible objects from
it. So, git has to generate a new pack for every clone operation. But the
generated pack should reuse deltas from the initial pack, so it should
not be a very expensive operation.


Dmitry


* Re: Git server eats all memory
  2010-08-04 14:57 Git server eats all memory Ivan Kanis
  2010-08-04 15:55 ` Matthieu Moy
  2010-08-04 20:12 ` Avery Pennarun
@ 2010-08-10  0:46 ` Robin H. Johnson
  2010-08-10  2:31   ` Sverre Rabbelier
  2010-08-11 15:54   ` Ivan Kanis
  2 siblings, 2 replies; 28+ messages in thread
From: Robin H. Johnson @ 2010-08-10  0:46 UTC (permalink / raw)
  To: Git Mailing List

On Wed, Aug 04, 2010 at 04:57:39PM +0200,  Ivan Kanis wrote:
> I am wondering if anyone has seen this behavior? I'll do whatever I can
> to troubleshoot the problem. I know C but I just don't know where to
> look at. Any help would be very much appreciated.
We've seen a similar problem in experimental planning for migrating the
core Gentoo repository to Git. That's a 900MiB packfile ('git repack
-adf --window=250 --depth=250' taking 2 hours).

Multiple concurrent full clones push the server into swap. We had 16GiB
of RAM, and this was still occurring.

Our temporary solution plan is via hooks: if you're asking for an item
before a certain point, throw an error telling you to download a
git-bundle from a given URL instead (as a bonus, you can resume that
trivially).

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85


* Re: Git server eats all memory
  2010-08-10  0:46 ` Robin H. Johnson
@ 2010-08-10  2:31   ` Sverre Rabbelier
  2010-08-11 10:30     ` Sam Vilain
  2010-08-11 15:54   ` Ivan Kanis
  1 sibling, 1 reply; 28+ messages in thread
From: Sverre Rabbelier @ 2010-08-10  2:31 UTC (permalink / raw)
  To: Robin H. Johnson, Sam Vilain
  Cc: Dmitry Potapov, Ivan Kanis, Nguyen Thai Ngoc Duy, jaredhance,
	Avery Pennarun, jnareb, git

Heya,

[please don't cull the cc list]

On Mon, Aug 9, 2010 at 19:46, Robin H. Johnson <robbat2@gentoo.org> wrote:
> Our temporary solution plan is via hooks, if you're asking for a item
> before a certain point, throw an error telling you to download a
> git-bundle from a given URL instead (as a bonus you can resume that
> trivially).

Seems like there should be a way to tell the git server that certain
pack files should be sent to the client verbatim. Perhaps the protocol
could learn a new capability to support such a negotiation in which
the server will assume that the client either has the required packs,
or continue negotiation under the assumption that they will be
downloaded first.

Sounds like Sam might have something relevant to say about this? Or perhaps
the pack caching gsoc project is relevant?

-- 
Cheers,

Sverre Rabbelier


* Re: Git server eats all memory
  2010-08-10  2:31   ` Sverre Rabbelier
@ 2010-08-11 10:30     ` Sam Vilain
  0 siblings, 0 replies; 28+ messages in thread
From: Sam Vilain @ 2010-08-11 10:30 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: Robin H. Johnson, Dmitry Potapov, Ivan Kanis,
	Nguyen Thai Ngoc Duy, jaredhance, Avery Pennarun, jnareb, git

On Mon, 2010-08-09 at 21:31 -0500, Sverre Rabbelier wrote:
> On Mon, Aug 9, 2010 at 19:46, Robin H. Johnson <robbat2@gentoo.org> wrote:
> > Our temporary solution plan is via hooks, if you're asking for a item
> > before a certain point, throw an error telling you to download a
> > git-bundle from a given URL instead (as a bonus you can resume that
> > trivially).
> 
> Seems like there should be a way to tell the git server that certain
> pack files should be sent to the client verbatim. Perhaps the protocol
> could learn a new capability to support such a negotiation in which
> the server will assume that the client either has the required packs,
> or continue negotiation under the assumption that they will be
> downloaded first.
> 
> Sounds like Sam might have some relevant to say about this? Or perhaps
> the pack caching gsoc project is relevant?

Sure, well the project was supposed to be primarily useful for this use
case.  It just needs someone to pick it up and revitalize it so it can
be merged... looks like there's a topgit series at github.com/sirnot/git
from Oct. last year.

Sam


* Re: Excessive mmap [was Git server eats all memory]
  2010-08-09 16:50                   ` Avery Pennarun
  2010-08-09 17:45                     ` Tomas Carnecky
  2010-08-09 21:28                     ` Dmitry Potapov
@ 2010-08-11 15:47                     ` Ivan Kanis
  2010-08-11 16:35                       ` Avery Pennarun
  2 siblings, 1 reply; 28+ messages in thread
From: Ivan Kanis @ 2010-08-11 15:47 UTC (permalink / raw)
  To: Avery Pennarun
  Cc: Dmitry Potapov, Nguyen Thai Ngoc Duy, jaredhance, jnareb, git

Hi Avery,

Avery Pennarun <apenwarr@gmail.com> wrote:

> ... When you access some of the pages of the mmap'd file, the kernel
> will swap those pages into memory, which increases RSS.  This uses
> *real* memory on the system...

Thanks for the very clear explanations.

> Now, the kernel is supposed to be smart enough to release old pages
> out of RSS if you stop using them; it's no different from what the
> kernel does with any cached file data.  So it shouldn't be expensive
> to mmap instead of just reading the file.

How can the kernel release old pages? There does not seem to be any way
to tell it that it doesn't need a given memory block.

>> Looking some more into it today the bulk of the memory allocation
>> happens in write_pack_file in the following loop.
>>
>> for (; i < nr_objects; i++) {
>>    if (!write_one(f, objects + i, &offset))
>>        break;
>>    display_progress(progress_state, written);
>> }
>>
>> This eventually calls write_object, here I am wondering if the
>> unuse_pack function is doing its job. As far as I can tell it writes a
>> null in memory, that I think is not enough to reclaim memory.
>
> What do you mean by the "memory allocation" happens here?  How are you
> measuring it?

I run top and look at the RES column. I put a printf before and after
the code block and watch the memory go up and up.

>> I also looked at the use_pack function where the mmap is
>> happening. Would it be worth refactoring this function so that it uses
>> an index withing a file instead of mmap?
>>
>> Unless I hear of a better idea I'll be trying that tomorrow...
>
> I wouldn't expect this to help, but I would be interested to hear if
> it does.

I got caught up with other things at work, but I think I'll be able to try
Friday.

> If the problem is simply that you're flooding the kernel disk cache
> with data you'll use only once, to the detriment of everything else on
> the system, then one thing that might help could be posix_fadvise:
>
>     posix_fadvise(fd, ofs, len, POSIX_FADV_DONTNEED);

Sounds interesting, I'll try sticking that in the unuse_pack function
Friday.
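
A minimal sketch of the posix_fadvise() call quoted above, wrapped in a
hypothetical helper named drop_cached_range() that is not part of git. It
assumes a POSIX system. The advice is only a hint, so the kernel is free to
ignore it, and as far as I know Linux will not drop pages that are still
mapped by some process, so an munmap() may have to come first.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not a git function): tell the kernel that the
 * byte range [ofs, ofs + len) of an open file is not expected to be
 * read again soon, so its clean cached pages may be dropped.
 * A len of 0 means "from ofs to the end of the file".
 *
 * Returns 0 on success or an errno value such as EBADF.
 * posix_fadvise() reports errors through its return value and does
 * not set errno.
 */
int drop_cached_range(int fd, off_t ofs, off_t len)
{
    return posix_fadvise(fd, ofs, len, POSIX_FADV_DONTNEED);
}
```

Calling drop_cached_range(fd, 0, 0) would cover a whole file. Whether this
changes the swapping seen on the server is untested here.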

> On the other hand, perhaps a more important question is: why does git
> feel like it needs to generate entirely new packs for each person
> doing a clone on your system?  Shouldn't it be reusing existing ones
> and just streaming them straight out to the recipient?

Ah interesting point. Two things make me suspect the mmap is not shared
between processes. One is that mmap is done with the MAP_PRIVATE flag,
which according to the man page doesn't share between processes. The
second is that the mmap is done on a temporary file created by
odb_mkstemp; I don't believe the name is identical between the two
processes.

Take care,
-- 
Ivan Kanis
http://kanis.fr

Nobody ever went broke underestimating the intelligence of the
American public.
    -- H L Mencken 


* Re: Git server eats all memory
  2010-08-10  0:46 ` Robin H. Johnson
  2010-08-10  2:31   ` Sverre Rabbelier
@ 2010-08-11 15:54   ` Ivan Kanis
  1 sibling, 0 replies; 28+ messages in thread
From: Ivan Kanis @ 2010-08-11 15:54 UTC (permalink / raw)
  To: Robin H. Johnson
  Cc: srabbelier, Sam Vilain, Dmitry Potapov, Nguyen Thai Ngoc Duy,
	jaredhance, Avery Pennarun, jnareb, git

Hi Robin,

"Robin H. Johnson" <robbat2@gentoo.org> wrote:

> On Wed, Aug 04, 2010 at 04:57:39PM +0200,  Ivan Kanis wrote:
>> I am wondering if anyone has seen this behavior? I'll do whatever I can
>> to troubleshoot the problem. I know C but I just don't know where to
>> look at. Any help would be very much appreciated.
>
> Multiple concurrent full clones push the server into swap. We had 16GiB
> of RAM, and this was still occurring.

Glad I am not the only one seeing this :D

> Our temporary solution plan is via hooks, if you're asking for a item
> before a certain point, throw an error telling you to download a
> git-bundle from a given URL instead (as a bonus you can resume that
> trivially).

I don't understand the solution... Do you have a snippet of the hook?

Take care,
-- 
Ivan Kanis
http://kanis.fr


* Re: Excessive mmap [was Git server eats all memory]
  2010-08-11 15:47                     ` Ivan Kanis
@ 2010-08-11 16:35                       ` Avery Pennarun
       [not found]                         ` <wes4oetv31i.fsf@kanis.fr>
  0 siblings, 1 reply; 28+ messages in thread
From: Avery Pennarun @ 2010-08-11 16:35 UTC (permalink / raw)
  To: Ivan Kanis; +Cc: Dmitry Potapov, Nguyen Thai Ngoc Duy, jaredhance, jnareb, git

On Wed, Aug 11, 2010 at 11:47 AM, Ivan Kanis
<expire-by-2010-08-16@kanis.fr> wrote:
> Avery Pennarun <apenwarr@gmail.com> wrote:
>> Now, the kernel is supposed to be smart enough to release old pages
>> out of RSS if you stop using them; it's no different from what the
>> kernel does with any cached file data.  So it shouldn't be expensive
>> to mmap instead of just reading the file.
>
> How can the kernel release old pages? There does not seem to be anyway
> to tell it that it doesn't need a given memory block.

The kernel doesn't care whether you "need" it; it swaps out "needed"
pages all the time.

With a normal dirty memory page (allocated with malloc() or whatever),
the kernel will need to write it out to the swap file before it drops
it from RSS.  Then if the process needs to read/write that page in the
future, it'll have to read it back in from the swap file and increase
RSS again before it can be used.

With mmap'd files it's slightly different.  As long as the page hasn't
been modified (and as far as I know, git never writes to pages of
packfiles) then we already know that page is safe on disk.  So if the
kernel needs to "swap it out", it just drops it immediately from RSS
and doesn't do any I/O.  When/if the process needs to read/write the
page in the future, the kernel can swap it in the way it did in the
first place: from the original file.

If I understand correctly, all this means that the kernel on average
tries to drop mmap'd file pages out of RSS more than other kinds of
dirty pages, because swapping out mmap'd pages is cheaper.

If you think about it, if you do 'cat filename' in a loop, every new
'cat' process needs to load filename into memory. Of course the kernel
doesn't throw away the pages just because cat exits; it keeps a cache
of the file's pages in memory, and just feeds them to the next 'cat'
process when it starts.  So the kernel keeps stuff in memory even if
nobody is currently using it.  The surprising thing (at first) is that
the kernel is also happy to throw away pages even if you *are* using
them, as long as it can get them back.

Swapping is based on how frequently a page is used, not whether that
page is currently mapped into someone's address space.  (Disclaimer: I
haven't read the code.  Maybe it does give higher priority to pages
that are currently mapped.)
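
To watch this behaviour directly, here is a rough sketch, assuming Linux and
its mincore() call. The helper resident_pages() is hypothetical: it maps a
file read-only, optionally touches each page, and counts how many pages the
kernel reports as resident. One caveat: recent Linux kernels only give an
honest answer for files the caller owns or can write to, and report every
page as resident otherwise.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `path` read-only and return the number of
 * its pages that mincore() reports as resident in RAM. If `touch` is
 * non-zero, one byte of every page is read first, which faults the
 * pages in. Returns -1 on error or for an empty file.
 */
long resident_pages(const char *path, int touch)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    struct stat st;
    long count = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    size_t npages = (len + page - 1) / page;
    unsigned char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    if (vec) {
        if (touch) {
            volatile unsigned char sink = 0;
            for (size_t i = 0; i < len; i += page)
                sink += map[i]; /* read access only, pages stay clean */
            (void)sink;
        }
        if (mincore(map, len, vec) == 0) {
            count = 0;
            for (size_t i = 0; i < npages; i++)
                count += vec[i] & 1; /* low bit set means resident */
        }
        free(vec);
    }
    munmap(map, len);
    return count;
}
```

Running it against a packfile with touch set to 0 while a clone is in progress
would show how much of the pack is already in the page cache, whichever
process put it there.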

>>> Looking some more into it today the bulk of the memory allocation
>>> happens in write_pack_file in the following loop.
>>>
>>> for (; i < nr_objects; i++) {
>>>    if (!write_one(f, objects + i, &offset))
>>>        break;
>>>    display_progress(progress_state, written);
>>> }
>>>
>>> This eventually calls write_object, here I am wondering if the
>>> unuse_pack function is doing its job. As far as I can tell it writes a
>>> null in memory, that I think is not enough to reclaim memory.
>>
>> What do you mean by the "memory allocation" happens here?  How are you
>> measuring it?
>
> I run top and look at the RES column. I put a printf before and after
> the code block and watch the memory go up and up.

Yeah, that's not a very good way to do it.  The problem is that RSS is
*guaranteed* to go up in this location: you've just accessed an mmap'd
page you haven't used before.  That's not a bug.  Furthermore, if
multiple processes are mmap'ing the same pages, *all* those processes
might see their RSS go up, but it's the "same" pages, so that's not
actually taking twice the physical memory.

Unfortunately there are no really reliable ways to track this kind of
memory usage (as far as I know).  The tricks I often use are:

1)  while sleep 1; do free; done

2)  vmstat 1

Command #1 will show you what's happening to your physical RAM.  If
you run one git-repack, do you see the 'free' column decreasing by the
same amount as the RSS increases?  If you run two repacks at once,
does it increase as the sum of the two RSS columns, or just one of
them, or something else?

Command #2 will show you your blocks swapped in and out per second.
The interesting columns are si/so/bi/bo.
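
A further option on Linux, not mentioned in the thread, is the per-process
file /proc/<pid>/smaps. Besides "Rss:" it has a "Pss:" line for each mapping,
which divides every shared page among the processes mapping it, so two
clones sharing one pack should each show roughly half of it. A minimal
sketch, where smaps_total_kib() is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: add up one field (for example "Rss:" or "Pss:")
 * over every mapping listed in a Linux smaps file.
 * Returns the total in KiB, or -1 if the file cannot be opened.
 */
long smaps_total_kib(const char *smaps_path, const char *field)
{
    FILE *f = fopen(smaps_path, "r");
    size_t flen = strlen(field);
    char line[512];
    long total = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        long kib;
        /* Lines look like "Pss:                  60 kB". */
        if (strncmp(line, field, flen) == 0 &&
            sscanf(line + flen, "%ld", &kib) == 1)
            total += kib;
    }
    fclose(f);
    return total;
}
```

For example, smaps_total_kib("/proc/1234/smaps", "Pss:") for each clone's
pack-objects process, where 1234 is a placeholder pid. Reading another
user's smaps file normally needs matching permissions.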

>> On the other hand, perhaps a more important question is: why does git
>> feel like it needs to generate entirely new packs for each person
>> doing a clone on your system?  Shouldn't it be reusing existing ones
>> and just streaming them straight out to the recipient?
>
> Ah interesting point. Two things make me suspect the mmap is not shared
> between processes. One is that mmap is done with the MAP_PRIVATE flag
> which according to the man page doesn't share between processes. The
> second is that the mmap is done on a temporary file created by
> odb_mkstemp, I don't believe the name is identical between the two
> processes.

MAP_PRIVATE is a little more complicated than that.  What it means is
that if one of the processes *writes* to one of the pages, the other
process won't see the changes.  But if nobody writes to the pages -
and I'm pretty sure nobody does - then the kernel won't just copy the
data for no reason, because it would be pointlessly inefficient.
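
The copy-on-write behaviour described above can be checked with a small
experiment. This is only a sketch, assuming a POSIX system; the helper
private_write_stays_private() is hypothetical. It writes one byte through a
MAP_PRIVATE mapping and then reads the same byte back from the file itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: modify the first byte of `path` through a
 * MAP_PRIVATE mapping, then read that byte from the file with pread().
 * Returns 1 if the file still holds its original byte (the write
 * stayed private to this mapping), 0 if the file changed, -1 on error.
 */
int private_write_stays_private(const char *path)
{
    struct stat st;
    unsigned char before, on_disk;
    int result = -1;
    int fd = open(path, O_RDONLY); /* read access is enough for MAP_PRIVATE */

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    unsigned char *map = mmap(NULL, (size_t)st.st_size,
                              PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    before = map[0];
    map[0] = (unsigned char)(before + 1); /* copy-on-write of this one page */

    if (pread(fd, &on_disk, 1, 0) == 1)
        result = (on_disk == before);

    munmap(map, (size_t)st.st_size);
    close(fd);
    return result;
}
```

Only the page that was written gets a private copy. Pages that are merely
read keep coming from the shared page cache.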

That said, you're obviously experiencing bad behaviour, ie. it's not
working like it's supposed to, one way or another.  So you shouldn't
trust that your kernel, or git, or even my explanations are correct :)

Have fun,

Avery


* Re: Excessive mmap [was Git server eats all memory]
       [not found]                         ` <wes4oetv31i.fsf@kanis.fr>
@ 2010-08-17 17:07                           ` Dmitry Potapov
  0 siblings, 0 replies; 28+ messages in thread
From: Dmitry Potapov @ 2010-08-17 17:07 UTC (permalink / raw)
  To: Ivan Kanis
  Cc: Avery Pennarun, Sverre Rabbelier, Robin H. Johnson, Sam Vilain,
	Nguyen Thai Ngoc Duy, jaredhance, jnareb, git

Hi Ivan,

On Tue, Aug 17, 2010 at 02:26:01PM +0200, Ivan Kanis wrote:
>
> I have ran the following command for my tests: vmstat -SM -n 60
>
> Here's the stat for one git clone, I see that 4237M of memory is
> consumed. That is roughly the size of the repository. This raises my
> first question: why is the memory not reclaimed, at the end of the run?

It is consumed by the system cache to hold the most recently read data, and
the cache will hold it as long as there is enough free memory. When there is
not enough free memory, the system will free some old (long-unused) pages
from the system cache. This allows the system to avoid re-reading the same
files when there is enough memory to keep them.

>
> ls -lh objects/pack/*.pack
> 4.2G objects/pack/pack-55ad6d01f37427ca69e6267b0cd4e5257e57272c.pack
>
> Is it a sensible behavior to leave a 4G file lying around?

I am not sure I understand your question. This pack contains all data
of your repository. So what did you expect to happen?

> Does it get erased when people are pushing changes in?

No, usually a push adds a new pack. However, when there are too many
packs, the garbage collector will try to repack everything into one new
pack.  You can create an empty file with the same name but with a .keep
extension to protect the specified pack from being repacked.
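
As an illustration of the .keep convention, here is a small sketch. The
helper keep_pack() is hypothetical and not part of git: it only derives the
.keep name from a .pack path and creates an empty file there.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: given ".../pack-<id>.pack", create an empty
 * ".../pack-<id>.keep" next to it.
 * Returns 0 on success, -1 if the path does not end in ".pack", is
 * too long, or the file cannot be created.
 */
int keep_pack(const char *pack_path)
{
    static const char suffix[] = ".pack";
    size_t slen = sizeof(suffix) - 1;
    size_t len = strlen(pack_path);
    char keep[4096];
    int fd;

    if (len <= slen || len >= sizeof(keep) ||
        strcmp(pack_path + len - slen, suffix) != 0)
        return -1;

    /* ".keep" has the same length as ".pack", so the buffer still fits. */
    memcpy(keep, pack_path, len - slen);
    strcpy(keep + len - slen, ".keep");

    fd = open(keep, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    return close(fd);
}
```

For the pack listed above that would be
keep_pack("objects/pack/pack-55ad6d01f37427ca69e6267b0cd4e5257e57272c.pack").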


Dmitry


* Re: Excessive mmap [was Git server eats all memory]
  2010-08-09 10:12             ` Excessive mmap [was Git server eats all memory] Ivan Kanis
  2010-08-09 12:35               ` Dmitry Potapov
@ 2018-06-20 14:53               ` Duy Nguyen
  1 sibling, 0 replies; 28+ messages in thread
From: Duy Nguyen @ 2018-06-20 14:53 UTC (permalink / raw)
  To: Ivan Kanis
  Cc: Dmitry Potapov, Ivan Kanis, Jared Hance, Avery Pennarun,
	Jakub Narebski, Git Mailing List

On Tue, Jun 19, 2018 at 10:27 PM Ivan Kanis
<expire-by-2010-08-14@kanis.fr> wrote:
>
> Dmitry Potapov <dpotapov@gmail.com> wrote:
>
> > On Fri, Aug 06, 2010 at 07:23:17PM +0200, Ivan Kanis wrote:
> >>
> >> I expected the malloc to take 4G but was surprised it didn't. It seems
> >> to be mmap taking all the memory. I am not familiar with that function,
> >> it looks like it's mapping memory to a file... Is it reasonable to mmap
> >> so much memory?
> >
> > AFAIK, Git does not need to mmap the whole pack to memory, but it
> > is more efficient to mmap the whole pack wherever possible, because
> > it has a completely random access, so if you store only one sliding
> > window, you will have to re-read it many times. Besides, mmap size
> > does not mean that so much physical memory is used. Pages should
> > be loaded when they are necessary, and if you have more than one
> > client cloning the same repo, this memory should be shared by them.
>
> I have clone identical repositories and the system starts to swap. I
> think it shows that cloning two repository doesn't share mmap.

I doubt it (assuming you're on linux). If you suspect this, configure
core.packedGitWindowSize to reduce the mmap size. There are lots of
other things in a cloning process that do not share (is this client or
server btw?) and things could add up.
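
For reference, a sketch of what that could look like in the config file of
the repository being served. The values are hypothetical and only show the
syntax. core.packedGitLimit is a related setting that caps the total number
of bytes mapped from packfiles at once. Whether either one helps with the
swapping described in this thread is untested.

```ini
[core]
	# Bytes of a packfile mapped per window (hypothetical value).
	packedGitWindowSize = 32m
	# Upper bound on bytes mapped from all packfiles at once (hypothetical value).
	packedGitLimit = 256m
```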
--
Duy


Thread overview: 28+ messages
2010-08-04 14:57 Git server eats all memory Ivan Kanis
2010-08-04 15:55 ` Matthieu Moy
2010-08-04 17:50   ` Ivan Kanis
2010-08-04 20:12 ` Avery Pennarun
2010-08-05  6:33   ` Ivan Kanis
2010-08-05 22:45     ` Jared Hance
2010-08-06  1:37     ` Nguyen Thai Ngoc Duy
2010-08-06  1:51       ` Nguyen Thai Ngoc Duy
2010-08-06 11:34         ` Jakub Narebski
2010-08-06 17:23         ` Ivan Kanis
2010-08-07  6:42           ` Dmitry Potapov
2010-08-09 10:12             ` Excessive mmap [was Git server eats all memory] Ivan Kanis
2010-08-09 12:35               ` Dmitry Potapov
2010-08-09 16:34                 ` Ivan Kanis
2010-08-09 16:50                   ` Avery Pennarun
2010-08-09 17:45                     ` Tomas Carnecky
2010-08-09 18:17                       ` Avery Pennarun
2010-08-09 21:28                     ` Dmitry Potapov
2010-08-11 15:47                     ` Ivan Kanis
2010-08-11 16:35                       ` Avery Pennarun
     [not found]                         ` <wes4oetv31i.fsf@kanis.fr>
2010-08-17 17:07                           ` Dmitry Potapov
2018-06-20 14:53               ` Duy Nguyen
     [not found]           ` <AANLkTi=yeTh2tKn9t_=iZbdB5VLrfCPZ2_fBpYdf9wta@mail.gmail.com>
     [not found]             ` <wesbp9cnnag.fsf@kanis.fr>
2010-08-09  9:57               ` Git server eats all memory Nguyen Thai Ngoc Duy
2010-08-09 17:38                 ` Ivan Kanis
2010-08-10  0:46 ` Robin H. Johnson
2010-08-10  2:31   ` Sverre Rabbelier
2010-08-11 10:30     ` Sam Vilain
2010-08-11 15:54   ` Ivan Kanis
