git@vger.kernel.org mailing list mirror (one of many)
* Git exhausts memory.
@ 2011-04-02  5:01 Alif Wahid
  2011-04-02 15:05 ` Nicolas Pitre
  0 siblings, 1 reply; 19+ messages in thread
From: Alif Wahid @ 2011-04-02  5:01 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1471 bytes --]

Hi there,

I'm using Git v1.7.1 on Ubuntu v10.10 and unfortunately Git seems to
regularly exhaust the memory on my machine and fails to compress loose
objects and/or collect garbage.

My Intel based dual-core machine has 2 GB of RAM and 4 GB of swap
space. I need to track a working tree with a handful of really large
tarballs that rarely change and loads of really small text files that
change frequently. What I'm seeing is that over time whenever "git gc"
runs automatically it fails with the message "fatal: Out of memory,
malloc failed". So I've been trying to manually run "git repack -ad
--window-memory=1g --max-pack-size=1g" in the hope that Git will not
exceed the physical memory. But I still get the same error message :(

As I can't make my repository public, I've attached a simple Python
script that generates a ~1.3 GB file containing random integers (takes
roughly 10 min. on my machine). Then I run the following four commands
and get the out-of-memory failure from "git repack". This is
effectively emulating the scenario I have with my repository.

$ git init
$ git add ./test_data.dat
$ git commit ./test_data.dat -m "Test data."
$ git repack -ad --window-memory=1g --max-pack-size=1g
Counting objects: 3, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (2/2), done.
fatal: Out of memory, malloc failed

I can't find anything on the wiki about out-of-memory failures. Any
info/help would be much appreciated.

Regards

Alif

[-- Attachment #2: test_data.py --]
[-- Type: text/x-python, Size: 1048 bytes --]

#! /usr/bin/env python

import sys, os, random

if __name__ == '__main__':

    fbuff = 2**28   # file buffer size in bytes
    fints = 10**8   # number of integers to write
    fsize = 12      # number of digits to write per integer
    
    fname = os.path.splitext(os.path.basename(sys.argv[0]))[0]
    try:
        os.mkdir(fname)
    except OSError:
        pass    # directory already exists
        
    fpath = os.path.join(fname, fname+'.dat')
    fhand = open(fpath, 'w', fbuff)
    
    sys.stdout.write('Writing %d MB to file \'%s\'\n' % (fints*(fsize+1)/10**6, fpath))
    sys.stdout.write('This will take some time, please wait.\n')
    sys.stdout.write('Progress:')
    sys.stdout.flush()

    random.seed(-1)
    fform = '%'+str(fsize)+'d\n'
    for x in xrange(fints): 
        fhand.write(fform % (random.randint(0,fints)))
        if x % (fints/10) == 0 and x > 0:
            sys.stdout.write('%3d/10' % (x/(fints/10))) 
            sys.stdout.flush()

    sys.stdout.write(' 10/10\n')
    sys.stdout.flush()
    fhand.close()



* Re: Git exhausts memory.
  2011-04-02  5:01 Git exhausts memory Alif Wahid
@ 2011-04-02 15:05 ` Nicolas Pitre
  2011-04-03  9:15   ` Alif Wahid
  0 siblings, 1 reply; 19+ messages in thread
From: Nicolas Pitre @ 2011-04-02 15:05 UTC (permalink / raw)
  To: Alif Wahid; +Cc: git

On Sat, 2 Apr 2011, Alif Wahid wrote:

> Hi there,
> 
> I'm using Git v1.7.1 on Ubuntu v10.10 and unfortunately Git seems to
> regularly exhaust the memory on my machine and fails to compress loose
> objects and/or collect garbage.
> 
> My Intel based dual-core machine has 2 GB of RAM and 4 GB of swap
> space. I need to track a working tree with a handful of really large
> tarballs that rarely change and loads of really small text files that
> change frequently. What I'm seeing is that over time whenever "git gc"
> runs automatically it fails with the message "fatal: Out of memory,
> malloc failed". So I've been trying to manually run "git repack -ad
> --window-memory=1g --max-pack-size=1g" in the hope that Git will not
> exceed the physical memory. But I still get the same error message :(

Don't use --max-pack-size.  That won't help here.

How large are those tar files?


Nicolas


* Re: Git exhausts memory.
  2011-04-02 15:05 ` Nicolas Pitre
@ 2011-04-03  9:15   ` Alif Wahid
  2011-04-03 15:18     ` Nicolas Pitre
  0 siblings, 1 reply; 19+ messages in thread
From: Alif Wahid @ 2011-04-03  9:15 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git

Hi Nicolas,

On 3 April 2011 02:05, Nicolas Pitre <nico@fluxnic.net> wrote:

> Don't use --max-pack-size.  That won't help here.

I've tried --window-memory on its own with several different values
and every attempt failed. It seems to me as though this option is
simply ignored or has no effect.

> How large are those tar files?

The tar files aggregate to just under 2 GB and my complete working
tree is around 3 GB. Whenever I run git-gc or git-repack they seem to
reach a virtual memory footprint of roughly 2.5 GB over the course of
10 minutes and then fail with the out-of-memory error.
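
(For anyone who wants to reproduce that measurement: something along
these lines should work, assuming GNU time is installed as
/usr/bin/time on Ubuntu,

$ /usr/bin/time -v git repack -ad --window-memory=1g

which reports a "Maximum resident set size" line once the command
exits. The exact numbers will obviously vary.)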

Alif


* Re: Git exhausts memory.
  2011-04-03  9:15   ` Alif Wahid
@ 2011-04-03 15:18     ` Nicolas Pitre
  2011-04-04 12:52       ` Alif Wahid
  0 siblings, 1 reply; 19+ messages in thread
From: Nicolas Pitre @ 2011-04-03 15:18 UTC (permalink / raw)
  To: Alif Wahid; +Cc: git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 818 bytes --]

On Sun, 3 Apr 2011, Alif Wahid wrote:

> Hi Nicolas,
> 
> On 3 April 2011 02:05, Nicolas Pitre <nico@fluxnic.net> wrote:
> 
> > Don't use --max-pack-size.  That won't help here.
> 
> I've tried --window-memory on its own with several different values
> and every attempt failed. It seems to me as though this option is
> simply ignored or has no effect.

It is not ignored, but there are situations where there are problems
making it effective, especially if a few files are very large.

> > How large are those tar files?
> 
> The tar files aggregate to just under 2 GB and my complete working
> tree is around 3 GB.

What about the individual size for those files?

Something you can try is to simply tell Git not to attempt any delta 
compression on those tar files using gitattributes (see the man page of 
the same name).
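
For example -- just a minimal sketch, adjust the pattern to whatever
your archives are actually named -- a .gitattributes file at the top
of your work tree (or $GIT_DIR/info/attributes) containing:

	*.tar -delta

followed by another "git repack -ad" should keep those blobs out of
the delta window entirely.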


Nicolas


* Re: Git exhausts memory.
  2011-04-03 15:18     ` Nicolas Pitre
@ 2011-04-04 12:52       ` Alif Wahid
  2011-04-04 14:57         ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 19+ messages in thread
From: Alif Wahid @ 2011-04-04 12:52 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git

Hi Nicolas,

On 4 April 2011 01:18, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> Something you can try is to simply tell Git not to attempt any delta
> compression on those tar files using gitattributes (see the man page of
> the same name).
>

Seems to have worked. Both git-gc and git-repack appear to be less
memory hungry now and do actually run to completion without failure.

Thanks for your help.

Cheers

Alif


* Re: Git exhausts memory.
  2011-04-04 12:52       ` Alif Wahid
@ 2011-04-04 14:57         ` Nguyen Thai Ngoc Duy
  2011-04-05  2:22           ` David Fries
  2011-04-05 16:48           ` Holger Hellmuth
  0 siblings, 2 replies; 19+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-04-04 14:57 UTC (permalink / raw)
  To: Nicolas Pitre, Git Mailing List; +Cc: Alif Wahid

On Mon, Apr 4, 2011 at 7:52 PM, Alif Wahid <alif.wahid@gmail.com> wrote:
> Hi Nicolas,
>
> On 4 April 2011 01:18, Nicolas Pitre <nico@fluxnic.net> wrote:
>>
>> Something you can try is to simply tell Git not to attempt any delta
>> compression on those tar files using gitattributes (see the man page of
>> the same name).

Should we change the default to not delta if a blob exceeds a
predefined limit (say 128M)? People who deliberately want to delta
them can still set the delta attr. 1.8.0 material maybe?
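
(Under such a default, opting a path back in would be something like
this in .gitattributes -- the pattern is only an example:

	*.tar delta

while everybody else gets the cheaper behaviour without doing
anything.)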

> Seems to have worked. Both git-gc and git-repack appear to be less
> memory hungry now and do actually run to completion without failure.
>
> Thanks for your help.
-- 
Duy


* Re: Git exhausts memory.
  2011-04-04 14:57         ` Nguyen Thai Ngoc Duy
@ 2011-04-05  2:22           ` David Fries
  2011-04-05  4:35             ` Alif Wahid
  2011-04-05 16:48           ` Holger Hellmuth
  1 sibling, 1 reply; 19+ messages in thread
From: David Fries @ 2011-04-05  2:22 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Nicolas Pitre, Git Mailing List, Alif Wahid

On Mon, Apr 04, 2011 at 09:57:01PM +0700, Nguyen Thai Ngoc Duy wrote:
> On Mon, Apr 4, 2011 at 7:52 PM, Alif Wahid <alif.wahid@gmail.com> wrote:
> > Hi Nicolas,
> >
> > On 4 April 2011 01:18, Nicolas Pitre <nico@fluxnic.net> wrote:
> >>
> >> Something you can try is to simply tell Git not to attempt any delta
> >> compression on those tar files using gitattributes (see the man page of
> >> the same name).
> 
> Should we change the default to not delta if a blob exceeds a
> predefined limit (say 128M)? People who deliberately want to delta
> them can still set the delta attr. 1.8.0 material maybe?

I think it would be better to define it in terms of available memory.
Something like the minimum of system memory or address space, and
delta up to X amount of that (it might be good to leave off swap to
reduce thrashing).  There has to be a better way than a straight 128MB
default.

The number which works on my 8GB desktop is going to kill the computer
in the trunk of my car with 48MB of RAM.  I've actually seen a 700 MB
repository fail with `git-gc --aggressive` on a system with 4GB of RAM
because it ran out of memory; it only worked by leaving off the
--aggressive option.

> > Seems to have worked. Both git-gc and git-repack appear to be less
> > memory hungry now and do actually run to completion without failure.
> >
> > Thanks for your help.
> -- 
> Duy

-- 
David Fries <david@fries.net>    PGP pub CB1EE8F0
http://fries.net/~david/


* Re: Git exhausts memory.
  2011-04-05  2:22           ` David Fries
@ 2011-04-05  4:35             ` Alif Wahid
  2011-04-05 11:13               ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 19+ messages in thread
From: Alif Wahid @ 2011-04-05  4:35 UTC (permalink / raw)
  To: David Fries, Nguyen Thai Ngoc Duy, Nicolas Pitre; +Cc: Git Mailing List

Hi everyone,

On 5 April 2011 12:22, David Fries <david@fries.net> wrote:
> On Mon, Apr 04, 2011 at 09:57:01PM +0700, Nguyen Thai Ngoc Duy wrote:
>> On Mon, Apr 4, 2011 at 7:52 PM, Alif Wahid <alif.wahid@gmail.com> wrote:
>> > Hi Nicolas,
>> >
>> > On 4 April 2011 01:18, Nicolas Pitre <nico@fluxnic.net> wrote:
>> >>
>> >> Something you can try is to simply tell Git not to attempt any delta
>> >> compression on those tar files using gitattributes (see the man page of
>> >> the same name).
>>
>> Should we change the default to not delta if a blob exceeds a
>> predefined limit (say 128M)? People who deliberately want to delta
>> them can still set the delta attr. 1.8.0 material maybe?
>
> I think it would be better to define it in terms of available memory.
> Something like the minimum of system memory or address space, and
> delta up to X amount of that (it might be good to leave off swap to
> reduce thrashing).  There has to be a better way than a straight 128MB
> default.
>
> The number which works on my 8GB desktop is going to kill the computer
> in the trunk of my car with 48MB of RAM.  I've actually seen a 700 MB
> repository fail with `git-gc --aggressive` on a system with 4GB of RAM
> because it ran out of memory; it only worked by leaving off the
> --aggressive option.

It seems to me that if "git init" creates a $GIT_DIR/info/attributes
file by default with a line like "*.gz -delta", then that will disable
the memory-intensive delta compression plumbing for those special
cases where people need to track gzip archives (similarly another line
"*.bz2 -delta" for bzip2 archives and so on). Since these files
supposedly can't be compressed much further, I think Git ought to have
a default heuristic to not attempt any delta compression on them.
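
In other words the default file might contain nothing more than a
handful of entries along these lines (the exact list of patterns is of
course open for discussion):

	# don't attempt delta compression on already-compressed archives
	*.gz  -delta
	*.bz2 -delta
	*.zip -delta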

Cheers

Alif


* Re: Git exhausts memory.
  2011-04-05  4:35             ` Alif Wahid
@ 2011-04-05 11:13               ` Nguyen Thai Ngoc Duy
  2011-04-05 11:26                 ` Alif Wahid
  0 siblings, 1 reply; 19+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-04-05 11:13 UTC (permalink / raw)
  To: Alif Wahid
  Cc: David Fries, Nicolas Pitre, Git Mailing List, Junio C Hamano,
	Michael J Gruber

On Tue, Apr 5, 2011 at 11:35 AM, Alif Wahid <alif.wahid@gmail.com> wrote:
> It seems to me that if "git init" creates a $GIT_DIR/info/attributes
> file by default with a line like "*.gz -delta", then that will disable
> the memory-intensive delta compression plumbing for those special
> cases where people need to track gzip archives (similarly another line
> "*.bz2 -delta" for bzip2 archives and so on). Since these files
> supposedly can't be compressed much further, I think Git ought to have
> a default heuristic to not attempt any delta compression on them.

I was thinking of a very similar thing on my ride home, but I selected
files by size, not extension. With the (hopefully coming soon)
introduction of the pathspec magic specifier [1], we can teach git-attr
to express "files that have a size in the range [a,b]" (either a or b
can be infinite). The rest is like yours: apply -delta to the selected
files, then put such a rule with a default range in the default
template.

[1] http://thread.gmane.org/gmane.comp.version-control.git/169813/focus=169844
-- 
Duy


* Re: Git exhausts memory.
  2011-04-05 11:13               ` Nguyen Thai Ngoc Duy
@ 2011-04-05 11:26                 ` Alif Wahid
  0 siblings, 0 replies; 19+ messages in thread
From: Alif Wahid @ 2011-04-05 11:26 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: David Fries, Nicolas Pitre, Git Mailing List, Junio C Hamano,
	Michael J Gruber

On 5 April 2011 21:13, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Tue, Apr 5, 2011 at 11:35 AM, Alif Wahid <alif.wahid@gmail.com> wrote:
>> It seems to me that if "git init" creates a $GIT_DIR/info/attributes
>> file by default with a line like "*.gz -delta", then that will disable
>> the memory-intensive delta compression plumbing for those special
>> cases where people need to track gzip archives (similarly another line
>> "*.bz2 -delta" for bzip2 archives and so on). Since these files
>> supposedly can't be compressed much further, I think Git ought to have
>> a default heuristic to not attempt any delta compression on them.
>
> I was thinking of a very similar thing on my ride home, but I selected
> files by size, not extension. With the (hopefully coming soon)
> introduction of the pathspec magic specifier [1], we can teach git-attr
> to express "files that have a size in the range [a,b]" (either a or b
> can be infinite). The rest is like yours: apply -delta to the selected
> files, then put such a rule with a default range in the default
> template.
>
> [1] http://thread.gmane.org/gmane.comp.version-control.git/169813/focus=169844

Yeah, makes sense.

I also noticed the following thread regarding big file support. Most
of the details there are related to this issue as well.

http://thread.gmane.org/gmane.comp.version-control.git/170649/focus=170649

Alif


* Re: Git exhausts memory.
  2011-04-04 14:57         ` Nguyen Thai Ngoc Duy
  2011-04-05  2:22           ` David Fries
@ 2011-04-05 16:48           ` Holger Hellmuth
  2011-04-05 17:06             ` Shawn Pearce
  1 sibling, 1 reply; 19+ messages in thread
From: Holger Hellmuth @ 2011-04-05 16:48 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Nicolas Pitre, Git Mailing List, Alif Wahid

On 04.04.2011 16:57, Nguyen Thai Ngoc Duy wrote:
> On Mon, Apr 4, 2011 at 7:52 PM, Alif Wahid<alif.wahid@gmail.com>  wrote:
>> Hi Nicolas,
>>
>> On 4 April 2011 01:18, Nicolas Pitre<nico@fluxnic.net>  wrote:
>>>
>>> Something you can try is to simply tell Git not to attempt any delta
>>> compression on those tar files using gitattributes (see the man page of
>>> the same name).
>
> Should we change the default to not delta if a blob exceeds a
> predefined limit (say 128M)? People who deliberately want to delta
> them can still set the delta attr. 1.8.0 material maybe?


Isn't this already done with the config variable core.bigFileThreshold?

The documentation says: "Files larger than this size are stored deflated,
without attempting delta compression. ... Default is 512 MiB on all
platforms."


* Re: Git exhausts memory.
  2011-04-05 16:48           ` Holger Hellmuth
@ 2011-04-05 17:06             ` Shawn Pearce
  2011-04-05 17:44               ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Shawn Pearce @ 2011-04-05 17:06 UTC (permalink / raw)
  To: Holger Hellmuth
  Cc: Nguyen Thai Ngoc Duy, Nicolas Pitre, Git Mailing List, Alif Wahid

On Tue, Apr 5, 2011 at 12:48, Holger Hellmuth <hellmuth@ira.uka.de> wrote:
> On 04.04.2011 16:57, Nguyen Thai Ngoc Duy wrote:
>>
>> Should we change the default to not delta if a blob exceeds a
>> predefined limit (say 128M)? People who deliberately want to delta
>> them can still set the delta attr. 1.8.0 material maybe?
>
> Isn't this already done with the config variable core.bigFileThreshold?
>
> The documentation says: "Files larger than this size are stored deflated,
> without attempting delta compression. ... Default is 512 MiB on all
> platforms."

This is only implemented inside of fast-import. pack-objects does not
honor this variable.

-- 
Shawn.


* Re: Git exhausts memory.
  2011-04-05 17:06             ` Shawn Pearce
@ 2011-04-05 17:44               ` Junio C Hamano
  2011-04-05 20:56                 ` Nicolas Pitre
  2011-04-06 15:51                 ` Jay Soffian
  0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2011-04-05 17:44 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: Holger Hellmuth, Nguyen Thai Ngoc Duy, Nicolas Pitre,
	Git Mailing List, Alif Wahid

Shawn Pearce <spearce@spearce.org> writes:

> On Tue, Apr 5, 2011 at 12:48, Holger Hellmuth <hellmuth@ira.uka.de> wrote:
>> On 04.04.2011 16:57, Nguyen Thai Ngoc Duy wrote:
>>>
>>> Should we change the default to not delta if a blob exceeds a
>>> predefined limit (say 128M)? People who deliberately want to delta
>>> them can still set the delta attr. 1.8.0 material maybe?
>>
>> Isn't this already done with the config variable core.bigFileThreshold?
>>
>> The documentation says: "Files larger than this size are stored deflated,
>> without attempting delta compression. ... Default is 512 MiB on all
>> platforms."
>
> This is only implemented inside of fast-import. pack-objects does not
> honor this variable.

Do you mean perhaps we should?

 builtin/pack-objects.c |    8 ++++++--
 cache.h                |    1 +
 config.c               |    6 ++++++
 environment.c          |    1 +
 fast-import.c          |    5 -----
 5 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b0503b2..f402a84 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1142,8 +1142,12 @@ static void get_object_details(void)
 		sorted_by_offset[i] = objects + i;
 	qsort(sorted_by_offset, nr_objects, sizeof(*sorted_by_offset), pack_offset_sort);
 
-	for (i = 0; i < nr_objects; i++)
-		check_object(sorted_by_offset[i]);
+	for (i = 0; i < nr_objects; i++) {
+		struct object_entry *entry = sorted_by_offset[i];
+		check_object(entry);
+		if (big_file_threshold <= entry->size)
+			entry->no_try_delta = 1;
+	}
 
 	free(sorted_by_offset);
 }
diff --git a/cache.h b/cache.h
index 2674f4c..316d85f 100644
--- a/cache.h
+++ b/cache.h
@@ -573,6 +573,7 @@ extern int core_compression_seen;
 extern size_t packed_git_window_size;
 extern size_t packed_git_limit;
 extern size_t delta_base_cache_limit;
+extern uintmax_t big_file_threshold;
 extern int read_replace_refs;
 extern int fsync_object_files;
 extern int core_preload_index;
diff --git a/config.c b/config.c
index 0abcada..d06fb19 100644
--- a/config.c
+++ b/config.c
@@ -567,6 +567,12 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.bigfilethreshold")) {
+		long n = git_config_int(var, value);
+		big_file_threshold = 0 < n ? n : 0;
+		return 0;
+	}
+
 	if (!strcmp(var, "core.packedgitlimit")) {
 		packed_git_limit = git_config_int(var, value);
 		return 0;
diff --git a/environment.c b/environment.c
index f4549d3..3d1ab51 100644
--- a/environment.c
+++ b/environment.c
@@ -35,6 +35,7 @@ int fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 16 * 1024 * 1024;
+uintmax_t big_file_threshold = 512 * 1024 * 1024;
 const char *pager_program;
 int pager_use_color = 1;
 const char *editor_program;
diff --git a/fast-import.c b/fast-import.c
index 65d65bf..3e4e655 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -274,7 +274,6 @@ struct recent_command {
 /* Configured limits on output */
 static unsigned long max_depth = 10;
 static off_t max_packsize;
-static uintmax_t big_file_threshold = 512 * 1024 * 1024;
 static int force_update;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
@@ -3206,10 +3205,6 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		max_packsize = git_config_ulong(k, v);
 		return 0;
 	}
-	if (!strcmp(k, "core.bigfilethreshold")) {
-		long n = git_config_int(k, v);
-		big_file_threshold = 0 < n ? n : 0;
-	}
 	return git_default_config(k, v, cb);
 }
 


* Re: Git exhausts memory.
  2011-04-05 17:44               ` Junio C Hamano
@ 2011-04-05 20:56                 ` Nicolas Pitre
  2011-04-05 22:16                   ` Junio C Hamano
  2011-04-06 15:51                 ` Jay Soffian
  1 sibling, 1 reply; 19+ messages in thread
From: Nicolas Pitre @ 2011-04-05 20:56 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Shawn Pearce, Holger Hellmuth, Nguyen Thai Ngoc Duy,
	Git Mailing List, Alif Wahid

On Tue, 5 Apr 2011, Junio C Hamano wrote:

> Shawn Pearce <spearce@spearce.org> writes:
> 
> > On Tue, Apr 5, 2011 at 12:48, Holger Hellmuth <hellmuth@ira.uka.de> wrote:
> >> On 04.04.2011 16:57, Nguyen Thai Ngoc Duy wrote:
> >>>
> >>> Should we change the default to not delta if a blob exceeds a
> >>> predefined limit (say 128M)? People who deliberately want to delta
> >>> them can still set the delta attr. 1.8.0 material maybe?
> >>
> >> Isn't this already done with the config variable core.bigFileThreshold?
> >>
> >> The documentation says: "Files larger than this size are stored deflated,
> >> without attempting delta compression. ... Default is 512 MiB on all
> >> platforms."
> >
> > This is only implemented inside of fast-import. pack-objects does not
> > honor this variable.
> 
> Do you mean perhaps we should?

Yes.

Acked-by: Nicolas Pitre <nico@fluxnic.net>


>  builtin/pack-objects.c |    8 ++++++--
>  cache.h                |    1 +
>  config.c               |    6 ++++++
>  environment.c          |    1 +
>  fast-import.c          |    5 -----
>  5 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index b0503b2..f402a84 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1142,8 +1142,12 @@ static void get_object_details(void)
>  		sorted_by_offset[i] = objects + i;
>  	qsort(sorted_by_offset, nr_objects, sizeof(*sorted_by_offset), pack_offset_sort);
>  
> -	for (i = 0; i < nr_objects; i++)
> -		check_object(sorted_by_offset[i]);
> +	for (i = 0; i < nr_objects; i++) {
> +		struct object_entry *entry = sorted_by_offset[i];
> +		check_object(entry);
> +		if (big_file_threshold <= entry->size)
> +			entry->no_try_delta = 1;
> +	}
>  
>  	free(sorted_by_offset);
>  }
> diff --git a/cache.h b/cache.h
> index 2674f4c..316d85f 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -573,6 +573,7 @@ extern int core_compression_seen;
>  extern size_t packed_git_window_size;
>  extern size_t packed_git_limit;
>  extern size_t delta_base_cache_limit;
> +extern uintmax_t big_file_threshold;
>  extern int read_replace_refs;
>  extern int fsync_object_files;
>  extern int core_preload_index;
> diff --git a/config.c b/config.c
> index 0abcada..d06fb19 100644
> --- a/config.c
> +++ b/config.c
> @@ -567,6 +567,12 @@ static int git_default_core_config(const char *var, const char *value)
>  		return 0;
>  	}
>  
> +	if (!strcmp(var, "core.bigfilethreshold")) {
> +		long n = git_config_int(var, value);
> +		big_file_threshold = 0 < n ? n : 0;
> +		return 0;
> +	}
> +
>  	if (!strcmp(var, "core.packedgitlimit")) {
>  		packed_git_limit = git_config_int(var, value);
>  		return 0;
> diff --git a/environment.c b/environment.c
> index f4549d3..3d1ab51 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -35,6 +35,7 @@ int fsync_object_files;
>  size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
>  size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
>  size_t delta_base_cache_limit = 16 * 1024 * 1024;
> +uintmax_t big_file_threshold = 512 * 1024 * 1024;
>  const char *pager_program;
>  int pager_use_color = 1;
>  const char *editor_program;
> diff --git a/fast-import.c b/fast-import.c
> index 65d65bf..3e4e655 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -274,7 +274,6 @@ struct recent_command {
>  /* Configured limits on output */
>  static unsigned long max_depth = 10;
>  static off_t max_packsize;
> -static uintmax_t big_file_threshold = 512 * 1024 * 1024;
>  static int force_update;
>  static int pack_compression_level = Z_DEFAULT_COMPRESSION;
>  static int pack_compression_seen;
> @@ -3206,10 +3205,6 @@ static int git_pack_config(const char *k, const char *v, void *cb)
>  		max_packsize = git_config_ulong(k, v);
>  		return 0;
>  	}
> -	if (!strcmp(k, "core.bigfilethreshold")) {
> -		long n = git_config_int(k, v);
> -		big_file_threshold = 0 < n ? n : 0;
> -	}
>  	return git_default_config(k, v, cb);
>  }
>  
> 


* Re: Git exhausts memory.
  2011-04-05 20:56                 ` Nicolas Pitre
@ 2011-04-05 22:16                   ` Junio C Hamano
  2011-04-05 22:19                     ` Shawn Pearce
  2011-04-06  0:34                     ` Nicolas Pitre
  0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2011-04-05 22:16 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Shawn Pearce, Holger Hellmuth, Nguyen Thai Ngoc Duy,
	Git Mailing List, Alif Wahid

Nicolas Pitre <nico@fluxnic.net> writes:

>> > This is only implemented inside of fast-import. pack-objects does not
>> > honor this variable.
>> 
>> Do you mean perhaps we should?
>
> Yes.
>
> Acked-by: Nicolas Pitre <nico@fluxnic.net>

I actually was somewhat unhappy to use uintmax_t type in the public header
for some reason I cannot quite explain (perhaps religious), and was hoping
somebody with more sanity than myself would stop me or show me a better
way.

>> diff --git a/cache.h b/cache.h
>> index 2674f4c..316d85f 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -573,6 +573,7 @@ extern int core_compression_seen;
>>  extern size_t packed_git_window_size;
>>  extern size_t packed_git_limit;
>>  extern size_t delta_base_cache_limit;
>> +extern uintmax_t big_file_threshold;
>>  extern int read_replace_refs;
>>  extern int fsync_object_files;
>>  extern int core_preload_index;


* Re: Git exhausts memory.
  2011-04-05 22:16                   ` Junio C Hamano
@ 2011-04-05 22:19                     ` Shawn Pearce
  2011-04-06  0:34                     ` Nicolas Pitre
  1 sibling, 0 replies; 19+ messages in thread
From: Shawn Pearce @ 2011-04-05 22:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nicolas Pitre, Holger Hellmuth, Nguyen Thai Ngoc Duy,
	Git Mailing List, Alif Wahid

On Tue, Apr 5, 2011 at 18:16, Junio C Hamano <gitster@pobox.com> wrote:
> Nicolas Pitre <nico@fluxnic.net> writes:
>
>>> > This is only implemented inside of fast-import. pack-objects does not
>>> > honor this variable.
>>>
>>> Do you mean perhaps we should?
>>
>> Yes.
>>
>> Acked-by: Nicolas Pitre <nico@fluxnic.net>
>
> I actually was somewhat unhappy to use uintmax_t type in the public header
> for some reason I cannot quite explain (perhaps religious), and was hoping
> somebody with more sanity than myself would stop me or show me a better
> way.

unsigned long? Without even looking at the source, I bet that is the
type used by pack-objects for the size member that you are comparing
against.

-- 
Shawn.


* Re: Git exhausts memory.
  2011-04-05 22:16                   ` Junio C Hamano
  2011-04-05 22:19                     ` Shawn Pearce
@ 2011-04-06  0:34                     ` Nicolas Pitre
  1 sibling, 0 replies; 19+ messages in thread
From: Nicolas Pitre @ 2011-04-06  0:34 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Shawn Pearce, Holger Hellmuth, Nguyen Thai Ngoc Duy,
	Git Mailing List, Alif Wahid

On Tue, 5 Apr 2011, Junio C Hamano wrote:

> Nicolas Pitre <nico@fluxnic.net> writes:
> 
> >> > This is only implemented inside of fast-import. pack-objects does not
> >> > honor this variable.
> >> 
> >> Do you mean perhaps we should?
> >
> > Yes.
> >
> > Acked-by: Nicolas Pitre <nico@fluxnic.net>
> 
> I actually was somewhat unhappy to use uintmax_t type in the public header
> for some reason I cannot quite explain (perhaps religious), and was hoping
> somebody with more sanity than myself would stop me or show me a better
> way.

Just use unsigned long.  Everywhere we have object size, it is stored as 
unsigned long.


Nicolas


* Re: Git exhausts memory.
  2011-04-05 17:44               ` Junio C Hamano
  2011-04-05 20:56                 ` Nicolas Pitre
@ 2011-04-06 15:51                 ` Jay Soffian
  2011-04-06 16:33                   ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Jay Soffian @ 2011-04-06 15:51 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Shawn Pearce, Holger Hellmuth, Nguyen Thai Ngoc Duy,
	Nicolas Pitre, Git Mailing List, Alif Wahid

On Tue, Apr 5, 2011 at 1:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
>  builtin/pack-objects.c |    8 ++++++--
>  cache.h                |    1 +
>  config.c               |    6 ++++++
>  environment.c          |    1 +
>  fast-import.c          |    5 -----
>  5 files changed, 14 insertions(+), 7 deletions(-)

This will be whitespace-damaged by Gmail, but anyway:

diff --git i/Documentation/config.txt w/Documentation/config.txt
index 750c86d..91aa9be 100644
--- i/Documentation/config.txt
+++ w/Documentation/config.txt
@@ -443,7 +443,6 @@ be delta compressed, but larger binary media files won't be.
 +
 Common unit suffixes of 'k', 'm', or 'g' are supported.
 +
-Currently only linkgit:git-fast-import[1] honors this setting.

 core.excludesfile::
 	In addition to '.gitignore' (per-directory) and


j.


* Re: Git exhausts memory.
  2011-04-06 15:51                 ` Jay Soffian
@ 2011-04-06 16:33                   ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2011-04-06 16:33 UTC (permalink / raw)
  To: Jay Soffian
  Cc: Shawn Pearce, Holger Hellmuth, Nguyen Thai Ngoc Duy,
	Nicolas Pitre, Git Mailing List, Alif Wahid

Jay Soffian <jaysoffian@gmail.com> writes:

> This will be whitespace-damaged by Gmail, but anyway:

I've already done that last night.  Thanks.

