git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] Better "gc --aggressive"
@ 2014-03-16 13:34 Nguyễn Thái Ngọc Duy
  2014-03-16 13:35 ` [PATCH 1/4] environment.c: fix constness for odb_pack_keep() Nguyễn Thái Ngọc Duy
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-03-16 13:34 UTC (permalink / raw
  To: git; +Cc: Nguyễn Thái Ngọc Duy

See [1] for the discussion that led to this series. It attempts to
pack the repo with two different depths: old history tightly packed
(smaller but also takes longer time to access) and recent history on
the opposite.

First draft, probably still some bugs lurking in pack_old_history().
It would be great if people could try it out on large repos and report
back the results (pack size between the old and new aggressive, gc
time, git log and blame speed...)

[1] http://thread.gmane.org/gmane.comp.version-control.git/242277

Nguyễn Thái Ngọc Duy (4):
  environment.c: fix constness for odb_pack_keep()
  pack-objects: support --keep
  gc --aggressive: make --depth configurable
  gc --aggressive: three phase repacking

 Documentation/config.txt           |  24 ++++++++
 Documentation/git-gc.txt           |   3 +
 Documentation/git-pack-objects.txt |   4 ++
 builtin/gc.c                       | 113 ++++++++++++++++++++++++++++++++++++-
 builtin/pack-objects.c             |  26 +++++++++
 environment.c                      |   2 +-
 git-compat-util.h                  |   2 +-
 7 files changed, 169 insertions(+), 5 deletions(-)

-- 
1.9.0.40.gaa8c3ea

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/4] environment.c: fix constness for odb_pack_keep()
  2014-03-16 13:34 [PATCH 0/4] Better "gc --aggressive" Nguyễn Thái Ngọc Duy
@ 2014-03-16 13:35 ` Nguyễn Thái Ngọc Duy
  2014-03-16 13:35 ` [PATCH] index-pack: do not segfault when keep_name is NULL Nguyễn Thái Ngọc Duy
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-03-16 13:35 UTC (permalink / raw
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 environment.c     | 2 +-
 git-compat-util.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/environment.c b/environment.c
index c3c8606..5c4815d 100644
--- a/environment.c
+++ b/environment.c
@@ -237,7 +237,7 @@ int odb_mkstemp(char *template, size_t limit, const char *pattern)
 	return xmkstemp_mode(template, mode);
 }
 
-int odb_pack_keep(char *name, size_t namesz, unsigned char *sha1)
+int odb_pack_keep(char *name, size_t namesz, const unsigned char *sha1)
 {
 	int fd;
 
diff --git a/git-compat-util.h b/git-compat-util.h
index 585ef8a..adbfb5e 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -533,7 +533,7 @@ extern FILE *xfdopen(int fd, const char *mode);
 extern int xmkstemp(char *template);
 extern int xmkstemp_mode(char *template, int mode);
 extern int odb_mkstemp(char *template, size_t limit, const char *pattern);
-extern int odb_pack_keep(char *name, size_t namesz, unsigned char *sha1);
+extern int odb_pack_keep(char *name, size_t namesz, const unsigned char *sha1);
 
 static inline size_t xsize_t(off_t len)
 {
-- 
1.9.0.40.gaa8c3ea

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH] index-pack: do not segfault when keep_name is NULL
  2014-03-16 13:34 [PATCH 0/4] Better "gc --aggressive" Nguyễn Thái Ngọc Duy
  2014-03-16 13:35 ` [PATCH 1/4] environment.c: fix constness for odb_pack_keep() Nguyễn Thái Ngọc Duy
@ 2014-03-16 13:35 ` Nguyễn Thái Ngọc Duy
  2014-03-16 13:35 ` [PATCH 2/4] pack-objects: support --keep Nguyễn Thái Ngọc Duy
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-03-16 13:35 UTC (permalink / raw
  To: git; +Cc: Nguyễn Thái Ngọc Duy

keep_name is used to print error messages a couple lines down. Reset
it to the real path returned by odb_pack_keep() if it's set to NULL by
caller.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 One of these moments I will make git log and friends optionally recognize
 "Diff-Options:" line in commit message. Adding -U14 in this case
 should help the reviewer to see how those error messages are printed.

 builtin/index-pack.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index a6b1c17..d95c3dc 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1283,9 +1283,10 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 	if (keep_msg) {
 		int keep_fd, keep_msg_len = strlen(keep_msg);
 
-		if (!keep_name)
+		if (!keep_name) {
 			keep_fd = odb_pack_keep(name, sizeof(name), sha1);
-		else
+			keep_name = name;
+		} else
 			keep_fd = open(keep_name, O_RDWR|O_CREAT|O_EXCL, 0600);
 
 		if (keep_fd < 0) {
-- 
1.9.0.40.gaa8c3ea

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/4] pack-objects: support --keep
  2014-03-16 13:34 [PATCH 0/4] Better "gc --aggressive" Nguyễn Thái Ngọc Duy
  2014-03-16 13:35 ` [PATCH 1/4] environment.c: fix constness for odb_pack_keep() Nguyễn Thái Ngọc Duy
  2014-03-16 13:35 ` [PATCH] index-pack: do not segfault when keep_name is NULL Nguyễn Thái Ngọc Duy
@ 2014-03-16 13:35 ` Nguyễn Thái Ngọc Duy
  2014-03-16 13:35 ` [PATCH 3/4] gc --aggressive: make --depth configurable Nguyễn Thái Ngọc Duy
  2014-03-16 13:35 ` [PATCH 4/4] gc --aggressive: three phase repacking Nguyễn Thái Ngọc Duy
  4 siblings, 0 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-03-16 13:35 UTC (permalink / raw
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-pack-objects.txt |  4 ++++
 builtin/pack-objects.c             | 26 ++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index cdab9ed..8aae3b5 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -117,6 +117,10 @@ base-name::
 	has a .keep file to be ignored, even if it would have
 	otherwise been packed.
 
+--keep::
+	Before moving the index into its final destination
+	create an empty .keep file for the associated pack file.
+
 --incremental::
 	This flag causes an object already in a pack to be ignored
 	even if it would have otherwise been packed.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1fb972f..46c7a57 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -57,6 +57,7 @@ static int num_preferred_base;
 static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
+static int keep_pack;
 
 static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
@@ -745,6 +746,23 @@ static off_t write_reused_pack(struct sha1file *f)
 	return reuse_packfile_offset - sizeof(struct pack_header);
 }
 
+static void write_keep_file(const unsigned char *sha1)
+{
+	const char *msg = "pack-objects --keep\n";
+	char name[PATH_MAX];
+	int keep_fd = odb_pack_keep(name, sizeof(name), sha1);
+	if (keep_fd < 0) {
+		if (errno != EEXIST)
+			die_errno(_("cannot write keep file '%s'"),
+				  name);
+		return;
+	}
+	write_or_die(keep_fd, msg, strlen(msg));
+	if (close(keep_fd) != 0)
+		die_errno(_("cannot close written keep file '%s'"),
+			  name);
+}
+
 static void write_pack_file(void)
 {
 	uint32_t i = 0, j;
@@ -805,6 +823,9 @@ static void write_pack_file(void)
 			struct stat st;
 			char tmpname[PATH_MAX];
 
+			if (keep_pack)
+				write_keep_file(sha1);
+
 			/*
 			 * Packs are runtime accessed in their mtime
 			 * order since newer packs are more likely to contain
@@ -2609,6 +2630,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("use a bitmap index if available to speed up counting objects")),
 		OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index,
 			 N_("write a bitmap index together with the pack index")),
+		OPT_BOOL(0, "keep", &keep_pack,
+			 N_("create .keep for the new pack")),
 		OPT_END(),
 	};
 
@@ -2672,6 +2695,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!pack_to_stdout && thin)
 		die("--thin cannot be used to build an indexable pack.");
 
+	if (pack_to_stdout && keep_pack)
+		die("--keep cannot be used with --stdout");
+
 	if (keep_unreachable && unpack_unreachable)
 		die("--keep-unreachable and --unpack-unreachable are incompatible.");
 
-- 
1.9.0.40.gaa8c3ea

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/4] gc --aggressive: make --depth configurable
  2014-03-16 13:34 [PATCH 0/4] Better "gc --aggressive" Nguyễn Thái Ngọc Duy
                   ` (2 preceding siblings ...)
  2014-03-16 13:35 ` [PATCH 2/4] pack-objects: support --keep Nguyễn Thái Ngọc Duy
@ 2014-03-16 13:35 ` Nguyễn Thái Ngọc Duy
       [not found]   ` <CAG+J_Dw=Y5d2JTOngkxH=vNg3C43nP5=y7S6VXS=aHgmBshYZQ@mail.gmail.com>
  2014-03-16 13:35 ` [PATCH 4/4] gc --aggressive: three phase repacking Nguyễn Thái Ngọc Duy
  4 siblings, 1 reply; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-03-16 13:35 UTC (permalink / raw
  To: git; +Cc: Nguyễn Thái Ngọc Duy

When 1c192f3 (gc --aggressive: make it really aggressive - 2007-12-06)
made --depth=250 the default value, it didn't really explain the
reason behind, especially the pros and cons of --depth=250.

An old mail from Linus below explains it at length. Long story short,
--depth=250 is a disk saver and a performance killer. Not everybody
agrees on that aggressiveness. Let the user configure it.

-- 8< --
From: Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] gc --aggressive: make it really aggressive
Date: Thu, 6 Dec 2007 08:19:24 -0800 (PST)
Message-ID: <alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org>
Gmane-URL: http://article.gmane.org/gmane.comp.gcc.devel/94637

On Thu, 6 Dec 2007, Harvey Harrison wrote:
>
> 7:41:25elapsed 86%CPU

Heh. And this is why you want to do it exactly *once*, and then just
export the end result for others ;)

> -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack

But yeah, especially if you allow longer delta chains, the end result can
be much smaller (and what makes the one-time repack more expensive is the
window size, not the delta chain - you could make the delta chains longer
with no cost overhead at packing time)

HOWEVER.

The longer delta chains do make it potentially much more expensive to then
use old history. So there's a trade-off. And quite frankly, a delta depth
of 250 is likely going to cause overflows in the delta cache (which is
only 256 entries in size *and* it's a hash, so it's going to start having
hash conflicts long before hitting the 250 depth limit).

So when I said "--depth=250 --window=250", I chose those numbers more as
an example of extremely aggressive packing, and I'm not at all sure that
the end result is necessarily wonderfully usable. It's going to save disk
space (and network bandwidth - the delta's will be re-used for the network
protocol too!), but there are definitely downsides too, and using long
delta chains may simply not be worth it in practice.

(And some of it might just want to have git tuning, ie if people think
that long deltas are worth it, we could easily just expand on the delta
hash, at the cost of some more memory used!)

That said, the good news is that working with *new* history will not be
affected negatively, and if you want to be _really_ sneaky, there are ways
to say "create a pack that contains the history up to a version one year
ago, and be very aggressive about those old versions that we still want to
have around, but do a separate pack for newer stuff using less aggressive
parameters"

So this is something that can be tweaked, although we don't really have
any really nice interfaces for stuff like that (ie the git delta cache
size is hardcoded in the sources and cannot be set in the config file, and
the "pack old history more aggressively" involves some manual scripting
and knowing how "git pack-objects" works rather than any nice simple
command line switch).

So the thing to take away from this is:
 - git is certainly flexible as hell
 - .. but to get the full power you may need to tweak things
 - .. happily you really only need to have one person to do the tweaking,
   and the tweaked end results will be available to others that do not
   need to know/care.

And whether the difference between 320MB and 500MB is worth any really
involved tweaking (considering the potential downsides), I really don't
know. Only testing will tell.

			Linus
-- 8< --

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/config.txt | 5 +++++
 Documentation/git-gc.txt | 3 +++
 builtin/gc.c             | 8 +++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 79e5768..5ce7f9a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1151,6 +1151,11 @@ filter.<driver>.smudge::
 	object to a worktree file upon checkout.  See
 	linkgit:gitattributes[5] for details.
 
+gc.aggressiveDepth::
+	The depth parameter used in the delta compression
+	algorithm used by 'git gc --aggressive'.  This defaults
+	to 250.
+
 gc.aggressiveWindow::
 	The window size parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index e158a3b..273c466 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -124,6 +124,9 @@ the value, the more time is spent optimizing the delta compression.  See
 the documentation for the --window' option in linkgit:git-repack[1] for
 more details.  This defaults to 250.
 
+Similarly, the optional configuration variable 'gc.aggressiveDepth'
+controls --depth option in linkgit:git-repack[1]. This defaults to 250.
+
 The optional configuration variable 'gc.pruneExpire' controls how old
 the unreferenced loose objects have to be before they are pruned.  The
 default is "2 weeks ago".
diff --git a/builtin/gc.c b/builtin/gc.c
index 63d400b..72aa912 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -26,6 +26,7 @@ static const char * const builtin_gc_usage[] = {
 };
 
 static int pack_refs = 1;
+static int aggressive_depth = 250;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
@@ -66,6 +67,10 @@ static int gc_config(const char *var, const char *value, void *cb)
 		aggressive_window = git_config_int(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "gc.aggressivedepth")) {
+		aggressive_depth = git_config_int(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "gc.auto")) {
 		gc_auto_threshold = git_config_int(var, value);
 		return 0;
@@ -294,7 +299,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	if (aggressive) {
 		argv_array_push(&repack, "-f");
-		argv_array_push(&repack, "--depth=250");
+		if (aggressive_depth > 0)
+			argv_array_pushf(&repack, "--depth=%d", aggressive_depth);
 		if (aggressive_window > 0)
 			argv_array_pushf(&repack, "--window=%d", aggressive_window);
 	}
-- 
1.9.0.40.gaa8c3ea

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-16 13:34 [PATCH 0/4] Better "gc --aggressive" Nguyễn Thái Ngọc Duy
                   ` (3 preceding siblings ...)
  2014-03-16 13:35 ` [PATCH 3/4] gc --aggressive: make --depth configurable Nguyễn Thái Ngọc Duy
@ 2014-03-16 13:35 ` Nguyễn Thái Ngọc Duy
  2014-03-17 22:12   ` Junio C Hamano
  2014-03-18  4:50   ` Jeff King
  4 siblings, 2 replies; 21+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-03-16 13:35 UTC (permalink / raw
  To: git; +Cc: Nguyễn Thái Ngọc Duy

As explained in the previous commit, current aggressive settings
--depth=250 --window=250 could slow down repository access
significantly. Notice that people usually work on recent history only,
we could keep recent history more loosely packed, so that repo access
is fast most of the time while the pack file remains small.

Three more configuration variables are used to make that happen. The
first one, gc.aggressiveCommitLimits covers the old history part,
which will be tightly packed. The remaining part will be packed with
gc.lessAggresiveWindow and gc.lessAggressiveDepth. If
gc.aggressiveCommitLimits is empty, everything will be tightly packed
as before.

The repack process becomes:

 - repack -adf on old history (e.g. the default --before=1.year.ago)
   mark to keep that pack

 - repack the second time with lessAggressive settings, the kept pack
   should be left untouched.

 - remove .keep file and repack the final time, reusing all deltas

This process costs more time, but produce a more effecient pack. It is
assumed that people who do "gc --aggressive" do not do this often and
do not mind if it takes a bit longer.

git.git is not a great repo to test it because its size is modest but
so are my laptop's cpu and memory, so here are the timings and pack
sizes

            size  time
old aggr.   36MB  5m51
new aggr.   37MB  6m13
repack -adf 48MB  1m12

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/config.txt |  19 +++++++++
 builtin/gc.c             | 109 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5ce7f9a..47979dc 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1161,6 +1161,25 @@ gc.aggressiveWindow::
 	algorithm used by 'git gc --aggressive'.  This defaults
 	to 250.
 
+gc.aggressiveCommitLimits::
+	This one parameter to linkgit:git-rev-list[1] to select
+	commits that are repacked with gc.aggressiveDepth and
+	gc.aggressiveWindow, while the remaining commits are repacked
+	with gc.lessAggressiveDepth and gc.lessAggressiveWindow.
++
+If this is an empty string, everything will be repacked with
+gc.aggressiveWindow and gc.aggressiveDepth.
+
+gc.lessAggressiveDepth::
+	The depth parameter used in the delta compression
+	algorithm used by 'git gc --aggressive' when
+	gc.aggressiveCommitLimits is set.  This defaults to 50.
+
+gc.lessAggressiveWindow::
+	The window size parameter used in the delta compression
+	algorithm used by 'git gc --aggressive' when
+	gc.aggressiveCommitLimits is set.  This defaults to 250.
+
 gc.auto::
 	When there are approximately more than this many loose
 	objects in the repository, `git gc --auto` will pack them.
diff --git a/builtin/gc.c b/builtin/gc.c
index 72aa912..37fc603 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -28,10 +28,14 @@ static const char * const builtin_gc_usage[] = {
 static int pack_refs = 1;
 static int aggressive_depth = 250;
 static int aggressive_window = 250;
+static const char *aggressive_rev_list = "--before=1.year.ago";
+static int less_aggressive_depth = 50;
+static int less_aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
 static int detach_auto = 1;
 static const char *prune_expire = "2.weeks.ago";
+static int delta_base_offset = 1;
 
 static struct argv_array pack_refs_cmd = ARGV_ARRAY_INIT;
 static struct argv_array reflog = ARGV_ARRAY_INIT;
@@ -39,10 +43,13 @@ static struct argv_array repack = ARGV_ARRAY_INIT;
 static struct argv_array prune = ARGV_ARRAY_INIT;
 static struct argv_array rerere = ARGV_ARRAY_INIT;
 
+static char *keep_file;
 static char *pidfile;
 
 static void remove_pidfile(void)
 {
+	if (keep_file)
+		unlink_or_warn(keep_file);
 	if (pidfile)
 		unlink(pidfile);
 }
@@ -54,6 +61,63 @@ static void remove_pidfile_on_signal(int signo)
 	raise(signo);
 }
 
+static void pack_old_history(int quiet)
+{
+	struct child_process pack_objects;
+	struct child_process rev_list;
+	struct argv_array av_po = ARGV_ARRAY_INIT;
+	struct argv_array av_rl = ARGV_ARRAY_INIT;
+	char sha1[41];
+
+	argv_array_pushl(&av_rl, "rev-list", "--all", "--objects",
+			 "--reflog", NULL);
+	argv_array_push(&av_rl, aggressive_rev_list);
+
+	memset(&rev_list, 0, sizeof(rev_list));
+	rev_list.no_stdin = 1;
+	rev_list.out = -1;
+	rev_list.git_cmd = 1;
+	rev_list.argv = av_rl.argv;
+
+	if (start_command(&rev_list))
+		die(_("gc: unable to fork git-rev-list"));
+
+	argv_array_pushl(&av_po, "pack-objects", "--keep-true-parents",
+			 "--honor-pack-keep", "--non-empty", "--no-reuse-delta",
+			 "--keep", "--local", NULL);
+	if (delta_base_offset)
+		argv_array_push(&av_po,  "--delta-base-offset");
+	if (quiet)
+		argv_array_push(&av_po, "-q");
+	if (aggressive_window)
+		argv_array_pushf(&av_po, "--window=%d", aggressive_window);
+	if (aggressive_depth)
+		argv_array_pushf(&av_po, "--depth=%d", aggressive_depth);
+	argv_array_push(&av_po, git_path("objects/pack/pack"));
+
+	memset(&pack_objects, 0, sizeof(pack_objects));
+	pack_objects.in = rev_list.out;
+	pack_objects.out = -1;
+	pack_objects.git_cmd = 1;
+	pack_objects.argv = av_po.argv;
+
+	if (start_command(&pack_objects))
+		die(_("gc: unable to fork git-pack-objects"));
+
+	if (read_in_full(pack_objects.out, sha1, 41) != 41 ||
+	    sha1[40] != '\n')
+		die_errno(_("gc: pack-objects did not return the new pack's SHA-1"));
+	sha1[40] = '\0';
+	keep_file = git_pathdup("objects/pack/pack-%s.keep", sha1);
+	close(pack_objects.out);
+
+	if (finish_command(&rev_list))
+		die(_("gc: git-rev-list died with error"));
+
+	if (finish_command(&pack_objects))
+		die(_("gc: git-pack-objects died with error"));
+}
+
 static int gc_config(const char *var, const char *value, void *cb)
 {
 	if (!strcmp(var, "gc.packrefs")) {
@@ -71,6 +135,22 @@ static int gc_config(const char *var, const char *value, void *cb)
 		aggressive_depth = git_config_int(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "gc.aggressivecommitlimits")) {
+		aggressive_rev_list = value && *value ? xstrdup(value) : NULL;
+		return 0;
+	}
+	if (!strcmp(var, "gc.lessaggressivewindow")) {
+		less_aggressive_window = git_config_int(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "gc.lessaggressivedepth")) {
+		less_aggressive_depth = git_config_int(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "repack.usedeltabaseoffset")) {
+		delta_base_offset = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "gc.auto")) {
 		gc_auto_threshold = git_config_int(var, value);
 		return 0;
@@ -298,11 +378,19 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		usage_with_options(builtin_gc_usage, builtin_gc_options);
 
 	if (aggressive) {
+		int depth, window;
+		if (aggressive_rev_list) {
+			depth = less_aggressive_depth;
+			window = less_aggressive_window;
+		} else {
+			depth = aggressive_depth;
+			window = aggressive_window;
+		}
 		argv_array_push(&repack, "-f");
-		if (aggressive_depth > 0)
-			argv_array_pushf(&repack, "--depth=%d", aggressive_depth);
-		if (aggressive_window > 0)
-			argv_array_pushf(&repack, "--window=%d", aggressive_window);
+		if (depth > 0)
+			argv_array_pushf(&repack, "--depth=%d", depth);
+		if (window > 0)
+			argv_array_pushf(&repack, "--window=%d", window);
 	}
 	if (quiet)
 		argv_array_push(&repack, "-q");
@@ -343,9 +431,22 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (run_command_v_opt(reflog.argv, RUN_GIT_CMD))
 		return error(FAILED_RUN, reflog.argv[0]);
 
+	if (aggressive && aggressive_rev_list)
+		pack_old_history(quiet);
+
 	if (run_command_v_opt(repack.argv, RUN_GIT_CMD))
 		return error(FAILED_RUN, repack.argv[0]);
 
+	if (aggressive && aggressive_rev_list) {
+		if (keep_file)
+			unlink_or_warn(keep_file);
+		argv_array_clear(&repack);
+		argv_array_pushl(&repack, "repack", "-d", "-l", NULL);
+		add_repack_all_option();
+		if (run_command_v_opt(repack.argv, RUN_GIT_CMD))
+			return error(FAILED_RUN, repack.argv[0]);
+	}
+
 	if (prune_expire) {
 		argv_array_push(&prune, prune_expire);
 		if (quiet)
-- 
1.9.0.40.gaa8c3ea

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/4] gc --aggressive: make --depth configurable
       [not found]   ` <CAG+J_Dw=Y5d2JTOngkxH=vNg3C43nP5=y7S6VXS=aHgmBshYZQ@mail.gmail.com>
@ 2014-03-16 23:06     ` Duy Nguyen
  0 siblings, 0 replies; 21+ messages in thread
From: Duy Nguyen @ 2014-03-16 23:06 UTC (permalink / raw
  To: Jay Soffian; +Cc: git

On Mon, Mar 17, 2014 at 12:10 AM, Jay Soffian <jaysoffian@gmail.com> wrote:
> On Sun, Mar 16, 2014 at 9:35 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> wrote:
>>
>> When 1c192f3 (gc --aggressive: make it really aggressive - 2007-12-06)
>> made --depth=250 the default value, it didn't really explain the
>> reason behind, especially the pros and cons of --depth=250.
>>
>> An old mail from Linus below explains it at length. Long story short,
>> --depth=250 is a disk saver and a performance killer. Not everybody
>> agrees on that aggressiveness. Let the user configure it.
>
>
> Suggestion to link to the email discussion(s) on gmane in the Documentation
> or at least the commit message?

If you mean the discussion back in 2007, there is a fake header
"Gmane-URL" in Linus' mail in the commit message.
-- 
Duy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-16 13:35 ` [PATCH 4/4] gc --aggressive: three phase repacking Nguyễn Thái Ngọc Duy
@ 2014-03-17 22:12   ` Junio C Hamano
  2014-03-17 22:59     ` Duy Nguyen
  2014-03-18  4:50   ` Jeff King
  1 sibling, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2014-03-17 22:12 UTC (permalink / raw
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> As explained in the previous commit,...

[PATCH 3/4] becomes a commit with an empty log message for some
reason.  Have you tried running "am -s" on it?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-17 22:12   ` Junio C Hamano
@ 2014-03-17 22:59     ` Duy Nguyen
  2014-03-17 23:07       ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Duy Nguyen @ 2014-03-17 22:59 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Git Mailing List

On Tue, Mar 18, 2014 at 5:12 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> As explained in the previous commit,...
>
> [PATCH 3/4] becomes a commit with an empty log message for some
> reason.  Have you tried running "am -s" on it?

"am -s" worked fine. "am -s --scissors" did not (because of my use of
scissors in the commit message). Not sure what happened on your side..
-- 
Duy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-17 22:59     ` Duy Nguyen
@ 2014-03-17 23:07       ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2014-03-17 23:07 UTC (permalink / raw
  To: Duy Nguyen; +Cc: Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> On Tue, Mar 18, 2014 at 5:12 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>>
>>> As explained in the previous commit,...
>>
>> [PATCH 3/4] becomes a commit with an empty log message for some
>> reason.  Have you tried running "am -s" on it?
>
> "am -s" worked fine. "am -s --scissors" did not (because of my use of
> scissors in the commit message). Not sure what happened on your side..

Yeah, I meant "am -c".

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-16 13:35 ` [PATCH 4/4] gc --aggressive: three phase repacking Nguyễn Thái Ngọc Duy
  2014-03-17 22:12   ` Junio C Hamano
@ 2014-03-18  4:50   ` Jeff King
  2014-03-18  5:00     ` Duy Nguyen
  2014-03-18  5:07     ` Jeff King
  1 sibling, 2 replies; 21+ messages in thread
From: Jeff King @ 2014-03-18  4:50 UTC (permalink / raw
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Sun, Mar 16, 2014 at 08:35:04PM +0700, Nguyễn Thái Ngọc Duy wrote:

> As explained in the previous commit, current aggressive settings
> --depth=250 --window=250 could slow down repository access
> significantly. Notice that people usually work on recent history only,
> we could keep recent history more loosely packed, so that repo access
> is fast most of the time while the pack file remains small.

One thing I have not seen is real-world timings showing the slowdown
based on --depth. Did I miss them, or are we just making assumptions
based on one old case from 2009 (that, AFAIK does not have real numbers,
just speculation)? Has anyone measured the effect of bumping the delta
cache size (and its hash implementation)?

> git.git is not a great repo to test it because its size is modest but
> so are my laptop's cpu and memory, so here are the timings and pack
> sizes
> 
>             size  time
> old aggr.   36MB  5m51
> new aggr.   37MB  6m13
> repack -adf 48MB  1m12

I am not clear on what these times mean. It looks like the new code is
slower _and_ bigger. Can you explain them?

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-18  4:50   ` Jeff King
@ 2014-03-18  5:00     ` Duy Nguyen
  2014-03-18  5:13       ` Jeff King
  2014-03-19 11:03       ` Duy Nguyen
  2014-03-18  5:07     ` Jeff King
  1 sibling, 2 replies; 21+ messages in thread
From: Duy Nguyen @ 2014-03-18  5:00 UTC (permalink / raw
  To: Jeff King; +Cc: Git Mailing List

On Tue, Mar 18, 2014 at 11:50 AM, Jeff King <peff@peff.net> wrote:
> On Sun, Mar 16, 2014 at 08:35:04PM +0700, Nguyễn Thái Ngọc Duy wrote:
>
>> As explained in the previous commit, current aggressive settings
>> --depth=250 --window=250 could slow down repository access
>> significantly. Notice that people usually work on recent history only,
>> we could keep recent history more loosely packed, so that repo access
>> is fast most of the time while the pack file remains small.
>
> One thing I have not seen is real-world timings showing the slowdown
> based on --depth. Did I miss them, or are we just making assumptions
> based on one old case from 2009 (that, AFAIK does not have real numbers,
> just speculation)? Has anyone measured the effect of bumping the delta
> cache size (and its hash implementation)?

David tested it with git-blame [1]. I should probably run some tests
too (I don't remember if I tested some operations last time).

http://thread.gmane.org/gmane.comp.version-control.git/242277/focus=242435

>> git.git is not a great repo to test it because its size is modest but
>> so are my laptop's cpu and memory, so here are the timings and pack
>> sizes
>>
>>             size  time
>> old aggr.   36MB  5m51
>> new aggr.   37MB  6m13
>> repack -adf 48MB  1m12
>
> I am not clear on what these times mean. It looks like the new code is
> slower _and_ bigger. Can you explain them?

That's right :) The upside is faster operations, which is complely
missed here. The good thing from those numbers is pack size does not
increase much (the upper limit would be repack -adf with default
settings).
-- 
Duy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-18  4:50   ` Jeff King
  2014-03-18  5:00     ` Duy Nguyen
@ 2014-03-18  5:07     ` Jeff King
  2014-03-18  5:16       ` Duy Nguyen
  2014-03-18  6:19       ` David Kastrup
  1 sibling, 2 replies; 21+ messages in thread
From: Jeff King @ 2014-03-18  5:07 UTC (permalink / raw
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Tue, Mar 18, 2014 at 12:50:50AM -0400, Jeff King wrote:

> On Sun, Mar 16, 2014 at 08:35:04PM +0700, Nguyễn Thái Ngọc Duy wrote:
> 
> > As explained in the previous commit, current aggressive settings
> > --depth=250 --window=250 could slow down repository access
> > significantly. Notice that people usually work on recent history only,
> > we could keep recent history more loosely packed, so that repo access
> > is fast most of the time while the pack file remains small.
> 
> One thing I have not seen is real-world timings showing the slowdown
> based on --depth. Did I miss them, or are we just making assumptions
> based on one old case from 2009 (that, AFAIK does not have real numbers,
> just speculation)? Has anyone measured the effect of bumping the delta
> cache size (and its hash implementation)?

Just as a very quick, rough data point, here are before-and-after
timings for the patch below doing "git rev-list --objects --all" on my
linux.git, which is a mix of "--aggressive" and normal packing (I didn't
do a "repack -f", but it's partially what I've downloaded from k.org and
what I've repacked in various experiments over the past few months).

  [before]
  real    0m28.824s
  user    0m28.620s
  sys     0m0.232s

  [after]
  real    0m21.694s
  user    0m21.544s
  sys     0m0.172s

The numbers below are completely pulled out of a hat, so we can perhaps
do even better. But I think it shows that there is room for improvement
in the delta base cache.

---
diff --git a/environment.c b/environment.c
index c3c8606..73ed670 100644
--- a/environment.c
+++ b/environment.c
@@ -37,7 +37,7 @@ int core_compression_seen;
 int fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
-size_t delta_base_cache_limit = 16 * 1024 * 1024;
+size_t delta_base_cache_limit = 128 * 1024 * 1024;
 unsigned long big_file_threshold = 512 * 1024 * 1024;
 const char *pager_program;
 int pager_use_color = 1;
diff --git a/sha1_file.c b/sha1_file.c
index b37c6f6..a9ab8e3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1944,7 +1944,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	return buffer;
 }
 
-#define MAX_DELTA_CACHE (256)
+#define MAX_DELTA_CACHE (1024)
 
 static size_t delta_base_cached;
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-18  5:00     ` Duy Nguyen
@ 2014-03-18  5:13       ` Jeff King
  2014-03-18  6:16         ` David Kastrup
  2014-03-19 11:03       ` Duy Nguyen
  1 sibling, 1 reply; 21+ messages in thread
From: Jeff King @ 2014-03-18  5:13 UTC (permalink / raw
  To: Duy Nguyen; +Cc: Git Mailing List

On Tue, Mar 18, 2014 at 12:00:48PM +0700, Duy Nguyen wrote:

> On Tue, Mar 18, 2014 at 11:50 AM, Jeff King <peff@peff.net> wrote:
> > On Sun, Mar 16, 2014 at 08:35:04PM +0700, Nguyễn Thái Ngọc Duy wrote:
> >
> >> As explained in the previous commit, current aggressive settings
> >> --depth=250 --window=250 could slow down repository access
> >> significantly. Notice that people usually work on recent history only,
> >> we could keep recent history more loosely packed, so that repo access
> >> is fast most of the time while the pack file remains small.
> >
> > One thing I have not seen is real-world timings showing the slowdown
> > based on --depth. Did I miss them, or are we just making assumptions
> > based on one old case from 2009 (that, AFAIK does not have real numbers,
> > just speculation)? Has anyone measured the effect of bumping the delta
> > cache size (and its hash implementation)?
> 
> David tested it with git-blame [1]. I should probably run some tests
> too (I don't remember if I tested some operations last time).
> 
> http://thread.gmane.org/gmane.comp.version-control.git/242277/focus=242435

Ah, thanks. I do remember that thread now.

It looks like David's last word is that he gets a significant
performance from bumping the delta base cache size (and number of
buckets). And that matches the timings I just did. I suspect there are
still pathological cases that could behave worse, but it really sounds
like we should be looking into improving that cache as a first step.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-18  5:07     ` Jeff King
@ 2014-03-18  5:16       ` Duy Nguyen
  2014-03-18  6:19         ` Duy Nguyen
       [not found]         ` <CALbm-EbZSuzynXoUNEifP=Ga_mj6Fp9L9Do-mxhRdMvUEfogig@mail.gmail.com>
  2014-03-18  6:19       ` David Kastrup
  1 sibling, 2 replies; 21+ messages in thread
From: Duy Nguyen @ 2014-03-18  5:16 UTC (permalink / raw
  To: Jeff King; +Cc: Git Mailing List

On Tue, Mar 18, 2014 at 12:07 PM, Jeff King <peff@peff.net> wrote:
>   [before]
>   real    0m28.824s
>   user    0m28.620s
>   sys     0m0.232s
>
>   [after]
>   real    0m21.694s
>   user    0m21.544s
>   sys     0m0.172s
>
> The numbers below are completely pulled out of a hat, so we can perhaps
> do even better. But I think it shows that there is room for improvement
> in the delta base cache.

There is definitely room for improvement in delta base cache:
increasing cache size, better eviction strategies.. But I think it's
orthogonal to gc --aggressive improvement. From what Linus described
---aggressive is more for archival purposes but I think people have
been using it for upstream repos as well. Better packed repos could
speed up current clients immediately (although this argument is
somewhat flawed, server updates are usually at slower pace than
clients..).
-- 
Duy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-18  5:13       ` Jeff King
@ 2014-03-18  6:16         ` David Kastrup
  0 siblings, 0 replies; 21+ messages in thread
From: David Kastrup @ 2014-03-18  6:16 UTC (permalink / raw
  To: Jeff King; +Cc: Duy Nguyen, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Tue, Mar 18, 2014 at 12:00:48PM +0700, Duy Nguyen wrote:
>
>> On Tue, Mar 18, 2014 at 11:50 AM, Jeff King <peff@peff.net> wrote:
>> > On Sun, Mar 16, 2014 at 08:35:04PM +0700, Nguyễn Thái Ngọc Duy wrote:
>> >
>> >> As explained in the previous commit, current aggressive settings
>> >> --depth=250 --window=250 could slow down repository access
>> >> significantly. Notice that people usually work on recent history only,
>> >> we could keep recent history more loosely packed, so that repo access
>> >> is fast most of the time while the pack file remains small.
>> >
>> > One thing I have not seen is real-world timings showing the slowdown
>> > based on --depth. Did I miss them, or are we just making assumptions
>> > based on one old case from 2009 (that, AFAIK does not have real numbers,
>> > just speculation)? Has anyone measured the effect of bumping the delta
>> > cache size (and its hash implementation)?
>> 
>> David tested it with git-blame [1]. I should probably run some tests
>> too (I don't remember if I tested some operations last time).
>> 
>> http://thread.gmane.org/gmane.comp.version-control.git/242277/focus=242435
>
> Ah, thanks. I do remember that thread now.
>
> It looks like David's last word is that he gets a significant
> performance from bumping the delta base cache size (and number of
> buckets).

Increasing number of buckets was having comparatively minor effects
(that was the suggestion I started with), actually _degrading_
performance rather soon.  The delta base cache size was much more
noticeable.  I had prepared a patch serious increasing it.  The reason
I have not submitted it yet is that I have not found a compelling
real-world test case _apart_ from the fast git-blame that is still
missing implementation of -M and -C options.

There should be other commands digging through large amounts of old
history, but I did not really find something benchmarking convincingly.
Either most stuff is inefficient anyway, or the access order is
better-behaved, causing fewer unwanted cache flushes.

Access order in the optimized git-blame case is basically done with a
reverse commit-time based priority queue leading to a breadth-first
strategy.  It still beats unsorted access solidly in its timing.  Don't
think I compared depth-first results (inversing the priority queue
sorting condition) with regard to cache results, but it's bad for
interactive use as it tends to leave some recent history unblamed for a
long time while digging up stuff in the remote past.

Moderate cache size increases seem like a better strategy, and the
default size of 16M does not make a lot of sense with modern computers.
In particular since the history digging is rarely competing with other
memory intensive operations at the same time.

> And that matches the timings I just did. I suspect there are still
> pathological cases that could behave worse, but it really sounds like
> we should be looking into improving that cache as a first step.

I can put up a patch.  My git-blame experiments used 128M, and the patch
proposes a more conservative 64M.  I don't actually have made
experiments for the 64M setting, though.  The current default is 16M.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-18  5:07     ` Jeff King
  2014-03-18  5:16       ` Duy Nguyen
@ 2014-03-18  6:19       ` David Kastrup
  1 sibling, 0 replies; 21+ messages in thread
From: David Kastrup @ 2014-03-18  6:19 UTC (permalink / raw
  To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git

Jeff King <peff@peff.net> writes:

> On Tue, Mar 18, 2014 at 12:50:50AM -0400, Jeff King wrote:
>
>> On Sun, Mar 16, 2014 at 08:35:04PM +0700, Nguyễn Thái Ngọc Duy wrote:
>> 
>> > As explained in the previous commit, current aggressive settings
>> > --depth=250 --window=250 could slow down repository access
>> > significantly. Notice that people usually work on recent history only,
>> > we could keep recent history more loosely packed, so that repo access
>> > is fast most of the time while the pack file remains small.
>> 
>> One thing I have not seen is real-world timings showing the slowdown
>> based on --depth. Did I miss them, or are we just making assumptions
>> based on one old case from 2009 (that, AFAIK does not have real numbers,
>> just speculation)? Has anyone measured the effect of bumping the delta
>> cache size (and its hash implementation)?
>
> Just as a very quick, rough data point, here are before-and-after
> timings for the patch below doing "git rev-list --objects --all" on my
> linux.git, which is a mix of "--aggressive" and normal packing (I didn't
> do a "repack -f", but it's partially what I've downloaded from k.org and
> what I've repacked in various experiments over the past few months).
>
>   [before]
>   real    0m28.824s
>   user    0m28.620s
>   sys     0m0.232s
>
>   [after]
>   real    0m21.694s
>   user    0m21.544s
>   sys     0m0.172s
>
> The numbers below are completely pulled out of a hat, so we can perhaps
> do even better. But I think it shows that there is room for improvement
> in the delta base cache.
>
> ---
> diff --git a/environment.c b/environment.c
> index c3c8606..73ed670 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -37,7 +37,7 @@ int core_compression_seen;
>  int fsync_object_files;
>  size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
>  size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
> -size_t delta_base_cache_limit = 16 * 1024 * 1024;
> +size_t delta_base_cache_limit = 128 * 1024 * 1024;

You need to change a file in Documentation as well.  Can offer a patch.

>  unsigned long big_file_threshold = 512 * 1024 * 1024;
>  const char *pager_program;
>  int pager_use_color = 1;
> diff --git a/sha1_file.c b/sha1_file.c
> index b37c6f6..a9ab8e3 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1944,7 +1944,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
>  	return buffer;
>  }
>  
> -#define MAX_DELTA_CACHE (256)
> +#define MAX_DELTA_CACHE (1024)

This one really needs experimentation.  I found that increases here lead
to performance degradation rather soon, probably because of decreased
memory locality without significant reduction in cache collisions.  Not
sure whether it's worth touching at all.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-18  5:16       ` Duy Nguyen
@ 2014-03-18  6:19         ` Duy Nguyen
  2014-03-18  7:38           ` David Kastrup
       [not found]         ` <CALbm-EbZSuzynXoUNEifP=Ga_mj6Fp9L9Do-mxhRdMvUEfogig@mail.gmail.com>
  1 sibling, 1 reply; 21+ messages in thread
From: Duy Nguyen @ 2014-03-18  6:19 UTC (permalink / raw
  To: Jeff King; +Cc: Git Mailing List

On Tue, Mar 18, 2014 at 12:16 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> But I think it's orthogonal to gc --aggressive improvement.

There's another reason that improving gc may be a good idea (or not).
It depends on how other git implementations handle long delta chains.
If they hate long delta chains like current git too, then more reason
to make gc less aggressive.
-- 
Duy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-18  6:19         ` Duy Nguyen
@ 2014-03-18  7:38           ` David Kastrup
  0 siblings, 0 replies; 21+ messages in thread
From: David Kastrup @ 2014-03-18  7:38 UTC (permalink / raw
  To: Duy Nguyen; +Cc: Jeff King, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> On Tue, Mar 18, 2014 at 12:16 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>> But I think it's orthogonal to gc --aggressive improvement.
>
> There's another reason that improving gc may be a good idea (or not).
> It depends on how other git implementations handle long delta chains.

"Other git implementations" including current installations that have a
half-life of at last a year.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
  2014-03-18  5:00     ` Duy Nguyen
  2014-03-18  5:13       ` Jeff King
@ 2014-03-19 11:03       ` Duy Nguyen
  1 sibling, 0 replies; 21+ messages in thread
From: Duy Nguyen @ 2014-03-19 11:03 UTC (permalink / raw
  To: Jeff King; +Cc: Git Mailing List

On Tue, Mar 18, 2014 at 12:00 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>>>             size  time
>>> old aggr.   36MB  5m51
>>> new aggr.   37MB  6m13
>>> repack -adf 48MB  1m12
>>
>> I am not clear on what these times mean. It looks like the new code is
>> slower _and_ bigger. Can you explain them?
>
> That's right :) The upside is faster operations, which is complely
> missed here. The good thing from those numbers is pack size does not
> increase much (the upper limit would be repack -adf with default
> settings).

Something is not right. The performance numbers are against me. Still looking..
-- 
Duy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] gc --aggressive: three phase repacking
       [not found]         ` <CALbm-EbZSuzynXoUNEifP=Ga_mj6Fp9L9Do-mxhRdMvUEfogig@mail.gmail.com>
@ 2014-03-20  1:31           ` Duy Nguyen
  0 siblings, 0 replies; 21+ messages in thread
From: Duy Nguyen @ 2014-03-20  1:31 UTC (permalink / raw
  To: Stefan Beller; +Cc: Jeff King, Git Mailing List

On Wed, Mar 19, 2014 at 9:14 PM, Stefan Beller <stefanbeller@gmail.com> wrote:
> Maybe we should introduce another option --pack-for-archive
> which features the characteristics of the old --aggressive.
> And the new --aggressive would be a tradeoff between space and
> time?
> Thinking further we could add a couple of options there like
>  --developer-friendly which makes blaming fast
>  --hosting-friendly which makes it most efficient when upstream
>  --archival-friendly (the old --aggressive)
>  --...

--aggressive-mode=<mode> would be a good option to select those
friendly modes instead of a bunch of new options. Well, maby "gc
--mode" is enough, we already have two modes: normal and aggressive
(or although maybe it should be renamed archive).
-- 
Duy

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2014-03-20  1:31 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-16 13:34 [PATCH 0/4] Better "gc --aggressive" Nguyễn Thái Ngọc Duy
2014-03-16 13:35 ` [PATCH 1/4] environment.c: fix constness for odb_pack_keep() Nguyễn Thái Ngọc Duy
2014-03-16 13:35 ` [PATCH] index-pack: do not segfault when keep_name is NULL Nguyễn Thái Ngọc Duy
2014-03-16 13:35 ` [PATCH 2/4] pack-objects: support --keep Nguyễn Thái Ngọc Duy
2014-03-16 13:35 ` [PATCH 3/4] gc --aggressive: make --depth configurable Nguyễn Thái Ngọc Duy
     [not found]   ` <CAG+J_Dw=Y5d2JTOngkxH=vNg3C43nP5=y7S6VXS=aHgmBshYZQ@mail.gmail.com>
2014-03-16 23:06     ` Duy Nguyen
2014-03-16 13:35 ` [PATCH 4/4] gc --aggressive: three phase repacking Nguyễn Thái Ngọc Duy
2014-03-17 22:12   ` Junio C Hamano
2014-03-17 22:59     ` Duy Nguyen
2014-03-17 23:07       ` Junio C Hamano
2014-03-18  4:50   ` Jeff King
2014-03-18  5:00     ` Duy Nguyen
2014-03-18  5:13       ` Jeff King
2014-03-18  6:16         ` David Kastrup
2014-03-19 11:03       ` Duy Nguyen
2014-03-18  5:07     ` Jeff King
2014-03-18  5:16       ` Duy Nguyen
2014-03-18  6:19         ` Duy Nguyen
2014-03-18  7:38           ` David Kastrup
     [not found]         ` <CALbm-EbZSuzynXoUNEifP=Ga_mj6Fp9L9Do-mxhRdMvUEfogig@mail.gmail.com>
2014-03-20  1:31           ` Duy Nguyen
2014-03-18  6:19       ` David Kastrup

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).