git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/8] Updated git-gc --auto series.
@ 2007-09-17  8:27 Junio C Hamano
  2007-09-17  8:27 ` [PATCH 1/8] Export matches_pack_name() and fix its return value Junio C Hamano
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Junio C Hamano @ 2007-09-17  8:27 UTC (permalink / raw
  To: git

An updated series of "git-gc --auto", on top of 'next',
consisting of 8 patches, will follow this message.

Differences from the previous round are:

 - Earlier if you have too many unreachable loose objects,
   automated gc would have tried to "repack -d -l" which would
   not improve the situation at all every time it was run.  It
   now at least warns upon such a situation;

 - pack-objects learned --keep-unreachable option which helps
   "repack -a -d" not to lose packed but unreachable objects
   while repacking existing packs into a new pack;

 - repack learned -A option which is similar to -a but gives
   --keep-unreachable to underlying pack-objects;

 - "git-gc --auto" runs "git-repack -A -d -l" when there are too
   many packs in the repository;

 - These changes are documented ;-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/8] Export matches_pack_name() and fix its return value
  2007-09-17  8:27 [PATCH 0/8] Updated git-gc --auto series Junio C Hamano
@ 2007-09-17  8:27 ` Junio C Hamano
  2007-09-17  8:27 ` [PATCH 2/8] pack-objects --keep-unreachable Junio C Hamano
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2007-09-17  8:27 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

The function sounds boolean; make it behave as one, not "0 for
success, non-zero for failure".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 cache.h     |    1 +
 sha1_file.c |   14 +++++++-------
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index 70abbd5..3fa5b8e 100644
--- a/cache.h
+++ b/cache.h
@@ -529,6 +529,7 @@ extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsign
 extern unsigned long unpack_object_header_gently(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t);
 extern const char *packed_object_info_detail(struct packed_git *, off_t, unsigned long *, unsigned long *, unsigned int *, unsigned char *);
+extern int matches_pack_name(struct packed_git *p, const char *name);
 
 /* Dumb servers support */
 extern int update_server_info(int);
diff --git a/sha1_file.c b/sha1_file.c
index 9978a58..5801c3e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1684,22 +1684,22 @@ off_t find_pack_entry_one(const unsigned char *sha1,
 	return 0;
 }
 
-static int matches_pack_name(struct packed_git *p, const char *ig)
+int matches_pack_name(struct packed_git *p, const char *name)
 {
 	const char *last_c, *c;
 
-	if (!strcmp(p->pack_name, ig))
-		return 0;
+	if (!strcmp(p->pack_name, name))
+		return 1;
 
 	for (c = p->pack_name, last_c = c; *c;)
 		if (*c == '/')
 			last_c = ++c;
 		else
 			++c;
-	if (!strcmp(last_c, ig))
-		return 0;
+	if (!strcmp(last_c, name))
+		return 1;
 
-	return 1;
+	return 0;
 }
 
 static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e, const char **ignore_packed)
@@ -1717,7 +1717,7 @@ static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e, cons
 		if (ignore_packed) {
 			const char **ig;
 			for (ig = ignore_packed; *ig; ig++)
-				if (!matches_pack_name(p, *ig))
+				if (matches_pack_name(p, *ig))
 					break;
 			if (*ig)
 				goto next;
-- 
1.5.3.1.967.g6bb01

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/8] pack-objects --keep-unreachable
  2007-09-17  8:27 [PATCH 0/8] Updated git-gc --auto series Junio C Hamano
  2007-09-17  8:27 ` [PATCH 1/8] Export matches_pack_name() and fix its return value Junio C Hamano
@ 2007-09-17  8:27 ` Junio C Hamano
  2007-09-17  8:27 ` [PATCH 3/8] repack -A -d: use --keep-unreachable when repacking Junio C Hamano
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2007-09-17  8:27 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

This new option is meant to be used in conjunction with the
options "git repack -a -d" usually invokes the underlying
pack-objects.  When this option is given, objects unreachable
from the refs in packs named with --unpacked= option are added
to the resulting pack, in addition to the reachable objects that
are not in packs marked with *.keep files.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin-pack-objects.c |   95 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 12509fa..ba7c8da 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -21,7 +21,7 @@ git-pack-objects [{ -q | --progress | --all-progress }] \n\
 	[--window=N] [--window-memory=N] [--depth=N] \n\
 	[--no-reuse-delta] [--no-reuse-object] [--delta-base-offset] \n\
 	[--non-empty] [--revs [--unpacked | --all]*] [--reflog] \n\
-	[--stdout | base-name] [<ref-list | <object-list]";
+	[--stdout | base-name] [--keep-unreachable] [<ref-list | <object-list]";
 
 struct object_entry {
 	struct pack_idx_entry idx;
@@ -57,7 +57,7 @@ static struct object_entry **written_list;
 static uint32_t nr_objects, nr_alloc, nr_result, nr_written;
 
 static int non_empty;
-static int no_reuse_delta, no_reuse_object;
+static int no_reuse_delta, no_reuse_object, keep_unreachable;
 static int local;
 static int incremental;
 static int allow_ofs_delta;
@@ -1625,15 +1625,19 @@ static void read_object_list_from_stdin(void)
 	}
 }
 
+#define OBJECT_ADDED (1u<<20)
+
 static void show_commit(struct commit *commit)
 {
 	add_object_entry(commit->object.sha1, OBJ_COMMIT, NULL, 0);
+	commit->object.flags |= OBJECT_ADDED;
 }
 
 static void show_object(struct object_array_entry *p)
 {
 	add_preferred_base_object(p->name);
 	add_object_entry(p->item->sha1, p->item->type, p->name, 0);
+	p->item->flags |= OBJECT_ADDED;
 }
 
 static void show_edge(struct commit *commit)
@@ -1641,6 +1645,86 @@ static void show_edge(struct commit *commit)
 	add_preferred_base(commit->object.sha1);
 }
 
+struct in_pack_object {
+	off_t offset;
+	struct object *object;
+};
+
+struct in_pack {
+	int alloc;
+	int nr;
+	struct in_pack_object *array;
+};
+
+static void mark_in_pack_object(struct object *object, struct packed_git *p, struct in_pack *in_pack)
+{
+	in_pack->array[in_pack->nr].offset = find_pack_entry_one(object->sha1, p);
+	in_pack->array[in_pack->nr].object = object;
+	in_pack->nr++;
+}
+
+/*
+ * Compare the objects in the offset order, in order to emulate the
+ * "git-rev-list --objects" output that produced the pack originally.
+ */
+static int ofscmp(const void *a_, const void *b_)
+{
+	struct in_pack_object *a = (struct in_pack_object *)a_;
+	struct in_pack_object *b = (struct in_pack_object *)b_;
+
+	if (a->offset < b->offset)
+		return -1;
+	else if (a->offset > b->offset)
+		return 1;
+	else
+		return hashcmp(a->object->sha1, b->object->sha1);
+}
+
+static void add_objects_in_unpacked_packs(struct rev_info *revs)
+{
+	struct packed_git *p;
+	struct in_pack in_pack;
+	uint32_t i;
+
+	memset(&in_pack, 0, sizeof(in_pack));
+
+	for (p = packed_git; p; p = p->next) {
+		const unsigned char *sha1;
+		struct object *o;
+
+		for (i = 0; i < revs->num_ignore_packed; i++) {
+			if (matches_pack_name(p, revs->ignore_packed[i]))
+				break;
+		}
+		if (revs->num_ignore_packed <= i)
+			continue;
+		if (open_pack_index(p))
+			die("cannot open pack index");
+
+		ALLOC_GROW(in_pack.array,
+			   in_pack.nr + p->num_objects,
+			   in_pack.alloc);
+
+		for (i = 0; i < p->num_objects; i++) {
+			sha1 = nth_packed_object_sha1(p, i);
+			o = lookup_unknown_object(sha1);
+			if (!(o->flags & OBJECT_ADDED))
+				mark_in_pack_object(o, p, &in_pack);
+			o->flags |= OBJECT_ADDED;
+		}
+	}
+
+	if (in_pack.nr) {
+		qsort(in_pack.array, in_pack.nr, sizeof(in_pack.array[0]),
+		      ofscmp);
+		for (i = 0; i < in_pack.nr; i++) {
+			struct object *o = in_pack.array[i].object;
+			add_object_entry(o->sha1, o->type, "", 0);
+		}
+	}
+	free(in_pack.array);
+}
+
 static void get_object_list(int ac, const char **av)
 {
 	struct rev_info revs;
@@ -1672,6 +1756,9 @@ static void get_object_list(int ac, const char **av)
 	prepare_revision_walk(&revs);
 	mark_edges_uninteresting(revs.commits, &revs, show_edge);
 	traverse_commit_list(&revs, show_commit, show_object);
+
+	if (keep_unreachable)
+		add_objects_in_unpacked_packs(&revs);
 }
 
 static int adjust_perm(const char *path, mode_t mode)
@@ -1789,6 +1876,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			use_internal_rev_list = 1;
 			continue;
 		}
+		if (!strcmp("--keep-unreachable", arg)) {
+			keep_unreachable = 1;
+			continue;
+		}
 		if (!strcmp("--unpacked", arg) ||
 		    !prefixcmp(arg, "--unpacked=") ||
 		    !strcmp("--reflog", arg) ||
-- 
1.5.3.1.967.g6bb01

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/8] repack -A -d: use --keep-unreachable when repacking
  2007-09-17  8:27 [PATCH 0/8] Updated git-gc --auto series Junio C Hamano
  2007-09-17  8:27 ` [PATCH 1/8] Export matches_pack_name() and fix its return value Junio C Hamano
  2007-09-17  8:27 ` [PATCH 2/8] pack-objects --keep-unreachable Junio C Hamano
@ 2007-09-17  8:27 ` Junio C Hamano
  2007-09-17  8:27 ` [PATCH 4/8] git-gc --auto: move threshold check to need_to_gc() function Junio C Hamano
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2007-09-17  8:27 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

This is a safer variant of "repack -a -d" that does not drop
unreachable objects that are in packs.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 git-repack.sh |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/git-repack.sh b/git-repack.sh
index 156c5e8..204084e 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -3,17 +3,19 @@
 # Copyright (c) 2005 Linus Torvalds
 #
 
-USAGE='[-a] [-d] [-f] [-l] [-n] [-q] [--max-pack-size=N] [--window=N] [--window-memory=N] [--depth=N]'
+USAGE='[-a|-A] [-d] [-f] [-l] [-n] [-q] [--max-pack-size=N] [--window=N] [--window-memory=N] [--depth=N]'
 SUBDIRECTORY_OK='Yes'
 . git-sh-setup
 
-no_update_info= all_into_one= remove_redundant=
+no_update_info= all_into_one= remove_redundant= keep_unreachable=
 local= quiet= no_reuse= extra=
 while case "$#" in 0) break ;; esac
 do
 	case "$1" in
 	-n)	no_update_info=t ;;
 	-a)	all_into_one=t ;;
+	-A)	all_into_one=t
+		keep_unreachable=t ;;
 	-d)	remove_redundant=t ;;
 	-q)	quiet=-q ;;
 	-f)	no_reuse=--no-reuse-object ;;
@@ -59,7 +61,13 @@ case ",$all_into_one," in
 			fi
 		done
 	fi
-	[ -z "$args" ] && args='--unpacked --incremental'
+	if test -z "$args"
+	then
+		args='--unpacked --incremental'
+	elif test -n "$keep_unreachable"
+	then
+		args="$args --keep-unreachable"
+	fi
 	;;
 esac
 
-- 
1.5.3.1.967.g6bb01

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/8] git-gc --auto: move threshold check to need_to_gc() function.
  2007-09-17  8:27 [PATCH 0/8] Updated git-gc --auto series Junio C Hamano
                   ` (2 preceding siblings ...)
  2007-09-17  8:27 ` [PATCH 3/8] repack -A -d: use --keep-unreachable when repacking Junio C Hamano
@ 2007-09-17  8:27 ` Junio C Hamano
  2007-09-17  8:27 ` [PATCH 5/8] git-gc --auto: add documentation Junio C Hamano
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2007-09-17  8:27 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

That is where we decide if we are going to run gc
automatically.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin-gc.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin-gc.c b/builtin-gc.c
index 093b3dd..f046a2a 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -80,6 +80,13 @@ static int need_to_gc(void)
 	int num_loose = 0;
 	int needed = 0;
 
+	/*
+	 * Setting gc.auto to 0 or negative can disable the
+	 * automatic gc
+	 */
+	if (gc_auto_threshold <= 0)
+		return 0;
+
 	if (sizeof(path) <= snprintf(path, sizeof(path), "%s/17", objdir)) {
 		warning("insanely long object directory %.*s", 50, objdir);
 		return 0;
@@ -129,8 +136,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 		if (!strcmp(arg, "--auto")) {
-			if (gc_auto_threshold <= 0)
-				return 0;
 			auto_gc = 1;
 			continue;
 		}
-- 
1.5.3.1.967.g6bb01

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/8] git-gc --auto: add documentation.
  2007-09-17  8:27 [PATCH 0/8] Updated git-gc --auto series Junio C Hamano
                   ` (3 preceding siblings ...)
  2007-09-17  8:27 ` [PATCH 4/8] git-gc --auto: move threshold check to need_to_gc() function Junio C Hamano
@ 2007-09-17  8:27 ` Junio C Hamano
  2007-09-17  8:27 ` [PATCH 6/8] git-gc --auto: protect ourselves from accumulated cruft Junio C Hamano
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2007-09-17  8:27 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

This documents the auto-packing of loose objects performed by
git-gc --auto.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt |    7 +++++++
 Documentation/git-gc.txt |   11 ++++++++++-
 2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 866e053..3643c0b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -439,6 +439,13 @@ gc.aggressiveWindow::
 	algorithm used by 'git gc --aggressive'.  This defaults
 	to 10.
 
+gc.auto::
+	When there are approximately more than this many loose
+	objects in the repository, `git gc --auto` that is
+	invoked by some Porcelain commands will create a new
+	pack and prune them.  Setting this to 0 disables the
+	auto garbage collection.
+
 gc.packrefs::
 	`git gc` does not run `git pack-refs` in a bare repository by
 	default so that older dumb-transport clients can still fetch
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index c7742ca..40c1ce4 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -8,7 +8,7 @@ git-gc - Cleanup unnecessary files and optimize the local repository
 
 SYNOPSIS
 --------
-'git-gc' [--prune] [--aggressive]
+'git-gc' [--prune] [--aggressive] [--auto]
 
 DESCRIPTION
 -----------
@@ -43,6 +43,15 @@ OPTIONS
 	persistent, so this option only needs to be used occasionally; every
 	few hundred changesets or so.
 
+--auto::
+	With this option, `git gc` checks if there are too many
+	loose objects in the repository and runs
+	gitlink:git-repack[1] with `-d -l` option to pack them.
+	The threshold is set with `gc.auto` configuration
+	variable, and can be disabled by setting it to 0.  Some
+	Porcelain commands use this after they perform operation
+	that could create many loose objects automatically.
+
 Configuration
 -------------
 
-- 
1.5.3.1.967.g6bb01

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 6/8] git-gc --auto: protect ourselves from accumulated cruft
  2007-09-17  8:27 [PATCH 0/8] Updated git-gc --auto series Junio C Hamano
                   ` (4 preceding siblings ...)
  2007-09-17  8:27 ` [PATCH 5/8] git-gc --auto: add documentation Junio C Hamano
@ 2007-09-17  8:27 ` Junio C Hamano
  2007-09-17  8:27 ` [PATCH 7/8] git-gc --auto: restructure the way "repack" command line is built Junio C Hamano
  2007-09-17  8:27 ` [PATCH 8/8] git-gc --auto: run "repack -A -d -l" as necessary Junio C Hamano
  7 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2007-09-17  8:27 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

Deciding to run "repack -d -l" when there are too many
loose objects would backfire when there are too many loose
objects that are unreachable, because repacking that way would
never improve the situation.  Detect that case by checking the
number of loose objects again after automatic garbage collection
runs, and issue an warning to run "prune" manually.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin-gc.c |   25 +++++++++++++++++--------
 1 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/builtin-gc.c b/builtin-gc.c
index f046a2a..bf29f5e 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -64,7 +64,7 @@ static void append_option(const char **cmd, const char *opt, int max_length)
 	cmd[i] = NULL;
 }
 
-static int need_to_gc(void)
+static int too_many_loose_objects(void)
 {
 	/*
 	 * Quickly check if a "gc" is needed, by estimating how
@@ -80,13 +80,6 @@ static int need_to_gc(void)
 	int num_loose = 0;
 	int needed = 0;
 
-	/*
-	 * Setting gc.auto to 0 or negative can disable the
-	 * automatic gc
-	 */
-	if (gc_auto_threshold <= 0)
-		return 0;
-
 	if (sizeof(path) <= snprintf(path, sizeof(path), "%s/17", objdir)) {
 		warning("insanely long object directory %.*s", 50, objdir);
 		return 0;
@@ -109,6 +102,18 @@ static int need_to_gc(void)
 	return needed;
 }
 
+static int need_to_gc(void)
+{
+	/*
+	 * Setting gc.auto to 0 or negative can disable the
+	 * automatic gc
+	 */
+	if (gc_auto_threshold <= 0)
+		return 0;
+
+	return too_many_loose_objects();
+}
+
 int cmd_gc(int argc, const char **argv, const char *prefix)
 {
 	int i;
@@ -170,5 +175,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (run_command_v_opt(argv_rerere, RUN_GIT_CMD))
 		return error(FAILED_RUN, argv_rerere[0]);
 
+	if (auto_gc && too_many_loose_objects())
+		warning("There are too many unreachable loose objects; "
+			"run 'git prune' to remove them.");
+
 	return 0;
 }
-- 
1.5.3.1.967.g6bb01

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 7/8] git-gc --auto: restructure the way "repack" command line is built.
  2007-09-17  8:27 [PATCH 0/8] Updated git-gc --auto series Junio C Hamano
                   ` (5 preceding siblings ...)
  2007-09-17  8:27 ` [PATCH 6/8] git-gc --auto: protect ourselves from accumulated cruft Junio C Hamano
@ 2007-09-17  8:27 ` Junio C Hamano
  2007-09-17  8:27 ` [PATCH 8/8] git-gc --auto: run "repack -A -d -l" as necessary Junio C Hamano
  7 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2007-09-17  8:27 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

We used to build the command line to run repack outside of
need_to_gc() but with the next patch we would want to tweak the
command line depending on the nature of need.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin-gc.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/builtin-gc.c b/builtin-gc.c
index bf29f5e..34ce35b 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -29,8 +29,6 @@ static const char *argv_repack[MAX_ADD] = {"repack", "-a", "-d", "-l", NULL};
 static const char *argv_prune[] = {"prune", NULL};
 static const char *argv_rerere[] = {"rerere", "gc", NULL};
 
-static const char *argv_repack_auto[] = {"repack", "-d", "-l", NULL};
-
 static int gc_config(const char *var, const char *value)
 {
 	if (!strcmp(var, "gc.packrefs")) {
@@ -104,6 +102,8 @@ static int too_many_loose_objects(void)
 
 static int need_to_gc(void)
 {
+	int ac = 0;
+
 	/*
 	 * Setting gc.auto to 0 or negative can disable the
 	 * automatic gc
@@ -111,7 +111,14 @@ static int need_to_gc(void)
 	if (gc_auto_threshold <= 0)
 		return 0;
 
-	return too_many_loose_objects();
+	if (!too_many_loose_objects())
+		return 0;
+
+	argv_repack[ac++] = "repack";
+	argv_repack[ac++] = "-d";
+	argv_repack[ac++] = "-l";
+	argv_repack[ac++] = NULL;
+	return 1;
 }
 
 int cmd_gc(int argc, const char **argv, const char *prefix)
@@ -154,8 +161,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		 * Auto-gc should be least intrusive as possible.
 		 */
 		prune = 0;
-		for (i = 0; i < ARRAY_SIZE(argv_repack_auto); i++)
-			argv_repack[i] = argv_repack_auto[i];
 		if (!need_to_gc())
 			return 0;
 	}
-- 
1.5.3.1.967.g6bb01

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 8/8] git-gc --auto: run "repack -A -d -l" as necessary.
  2007-09-17  8:27 [PATCH 0/8] Updated git-gc --auto series Junio C Hamano
                   ` (6 preceding siblings ...)
  2007-09-17  8:27 ` [PATCH 7/8] git-gc --auto: restructure the way "repack" command line is built Junio C Hamano
@ 2007-09-17  8:27 ` Junio C Hamano
  7 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2007-09-17  8:27 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

This teaches "git-gc --auto" to consolidate many packs into one
without losing unreachable objects in them by using "repack -A"
when there are too many packfiles that are not marked with *.keep
in the repository.  gc.autopacklimit configuration can be used
to set the maximum number of packs a repository is allowed to
have before this mechanism kicks in.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt |    9 +++++-
 Documentation/git-gc.txt |    7 +++++-
 builtin-gc.c             |   57 +++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 64 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3643c0b..f5136c3 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -443,8 +443,13 @@ gc.auto::
 	When there are approximately more than this many loose
 	objects in the repository, `git gc --auto` that is
 	invoked by some Porcelain commands will create a new
-	pack and prune them.  Setting this to 0 disables the
-	auto garbage collection.
+	pack and prune them.  Setting this to 0 disables this.
+
+gc.autopacklimit::
+	When there are more than this many packs that are not
+	marked with `*.keep` file in the repository, `git gc
+	--auto` consolidates them into one larger pack.  Setting
+	this to 0 disables this.
 
 gc.packrefs::
 	`git gc` does not run `git pack-refs` in a bare repository by
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 40c1ce4..b9d5660 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -47,10 +47,15 @@ OPTIONS
 	With this option, `git gc` checks if there are too many
 	loose objects in the repository and runs
 	gitlink:git-repack[1] with `-d -l` option to pack them.
-	The threshold is set with `gc.auto` configuration
+	The threshold for loose objects is set with `gc.auto` configuration
 	variable, and can be disabled by setting it to 0.  Some
 	Porcelain commands use this after they perform operation
 	that could create many loose objects automatically.
+	Additionally, when there are too many packs are present,
+	they are consolidated into one larger pack by running
+	the `git-repack` command with `-A` option.  The
+	threshold for number of packs is set with
+	`gc.autopacklimit` configuration variable.
 
 Configuration
 -------------
diff --git a/builtin-gc.c b/builtin-gc.c
index 34ce35b..a82f6be 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -21,6 +21,7 @@ static const char builtin_gc_usage[] = "git-gc [--prune] [--aggressive]";
 static int pack_refs = 1;
 static int aggressive_window = -1;
 static int gc_auto_threshold = 6700;
+static int gc_auto_pack_limit = 20;
 
 #define MAX_ADD 10
 static const char *argv_pack_refs[] = {"pack-refs", "--all", "--prune", NULL};
@@ -46,6 +47,10 @@ static int gc_config(const char *var, const char *value)
 		gc_auto_threshold = git_config_int(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "gc.autopacklimit")) {
+		gc_auto_pack_limit = git_config_int(var, value);
+		return 0;
+	}
 	return git_default_config(var, value);
 }
 
@@ -78,6 +83,9 @@ static int too_many_loose_objects(void)
 	int num_loose = 0;
 	int needed = 0;
 
+	if (gc_auto_threshold <= 0)
+		return 0;
+
 	if (sizeof(path) <= snprintf(path, sizeof(path), "%s/17", objdir)) {
 		warning("insanely long object directory %.*s", 50, objdir);
 		return 0;
@@ -100,21 +108,58 @@ static int too_many_loose_objects(void)
 	return needed;
 }
 
+static int too_many_packs(void)
+{
+	struct packed_git *p;
+	int cnt;
+
+	if (gc_auto_pack_limit <= 0)
+		return 0;
+
+	for (cnt = 0, p = packed_git; p; p = p->next) {
+		char *suffix;
+		int keep;
+		if (!p->pack_local)
+			continue;
+		suffix = p->pack_name + strlen(p->pack_name) - 5;
+		if (memcmp(suffix, ".pack", 6))
+			continue;
+		memcpy(suffix, ".keep", 6);
+		keep = access(p->pack_name, F_OK) && (errno == ENOENT);
+		memcpy(suffix, ".pack", 6);
+		if (keep)
+			continue;
+		/*
+		 * Perhaps check the size of the pack and count only
+		 * very small ones here?
+		 */
+		cnt++;
+	}
+	return gc_auto_pack_limit <= cnt;
+}
+
 static int need_to_gc(void)
 {
 	int ac = 0;
 
 	/*
-	 * Setting gc.auto to 0 or negative can disable the
-	 * automatic gc
+	 * Setting gc.auto and gc.autopacklimit to 0 or negative can
+	 * disable the automatic gc.
 	 */
-	if (gc_auto_threshold <= 0)
-		return 0;
-
-	if (!too_many_loose_objects())
+	if (gc_auto_threshold <= 0 && gc_auto_pack_limit <= 0)
 		return 0;
 
+	/*
+	 * If there are too many loose objects, but not too many
+	 * packs, we run "repack -d -l".  If there are too many packs,
+	 * we run "repack -A -d -l".  Otherwise we tell the caller
+	 * there is no need.
+	 */
 	argv_repack[ac++] = "repack";
+	if (too_many_packs())
+		argv_repack[ac++] = "-A";
+	if (!too_many_loose_objects() && ac == 1)
+		return 0;
 	argv_repack[ac++] = "-d";
 	argv_repack[ac++] = "-l";
 	argv_repack[ac++] = NULL;
-- 
1.5.3.1.967.g6bb01

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/8] git-gc --auto: add documentation.
  2007-09-17  8:44 [PATCH 1/8] Export matches_pack_name() and fix its return value Junio C Hamano
@ 2007-09-17  8:44 ` Junio C Hamano
  2007-09-17  9:36   ` Johannes Schindelin
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2007-09-17  8:44 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano

This documents the auto-packing of loose objects performed by
git-gc --auto.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt |    7 +++++++
 Documentation/git-gc.txt |   11 ++++++++++-
 2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 866e053..3643c0b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -439,6 +439,13 @@ gc.aggressiveWindow::
 	algorithm used by 'git gc --aggressive'.  This defaults
 	to 10.
 
+gc.auto::
+	When there are approximately more than this many loose
+	objects in the repository, `git gc --auto` that is
+	invoked by some Porcelain commands will create a new
+	pack and prune them.  Setting this to 0 disables the
+	auto garbage collection.
+
 gc.packrefs::
 	`git gc` does not run `git pack-refs` in a bare repository by
 	default so that older dumb-transport clients can still fetch
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index c7742ca..40c1ce4 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -8,7 +8,7 @@ git-gc - Cleanup unnecessary files and optimize the local repository
 
 SYNOPSIS
 --------
-'git-gc' [--prune] [--aggressive]
+'git-gc' [--prune] [--aggressive] [--auto]
 
 DESCRIPTION
 -----------
@@ -43,6 +43,15 @@ OPTIONS
 	persistent, so this option only needs to be used occasionally; every
 	few hundred changesets or so.
 
+--auto::
+	With this option, `git gc` checks if there are too many
+	loose objects in the repository and runs
+	gitlink:git-repack[1] with `-d -l` option to pack them.
+	The threshold is set with `gc.auto` configuration
+	variable, and can be disabled by setting it to 0.  Some
+	Porcelain commands use this after they perform operation
+	that could create many loose objects automatically.
+
 Configuration
 -------------
 
-- 
1.5.3.1.967.g6bb01

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/8] git-gc --auto: add documentation.
  2007-09-17  8:44 ` [PATCH 5/8] git-gc --auto: add documentation Junio C Hamano
@ 2007-09-17  9:36   ` Johannes Schindelin
  2007-09-17 19:54     ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Schindelin @ 2007-09-17  9:36 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Hi,

On Mon, 17 Sep 2007, Junio C Hamano wrote:

> +	When there are approximately more than this many loose
> +	objects in the repository, `git gc --auto` that is
> +	invoked by some Porcelain commands will create a new
> +	pack and prune them.

It might strike a newbie as dangerous to read "prune them" here.  Maybe

	... `git gc --auto` will pack the loose objects.  Some Porcelain 
	commands use this command, to do a light weight garbage collection
	from time to time.
+
*Note*: you should still run "git gc" from time to time, since the 
"--auto" option is designed to be fast, whereas "git gc" _without_ 
auto is designed to create small packs.

Hmm? (I think that this is the most visible part of the --auto business, 
so I would document the "git gc" note here...)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/8] git-gc --auto: add documentation.
  2007-09-17  9:36   ` Johannes Schindelin
@ 2007-09-17 19:54     ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2007-09-17 19:54 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Mon, 17 Sep 2007, Junio C Hamano wrote:
>
>> +	When there are approximately more than this many loose
>> +	objects in the repository, `git gc --auto` that is
>> +	invoked by some Porcelain commands will create a new
>> +	pack and prune them.
>
> It might strike a newbie as dangerous to read "prune them" here.  Maybe
>
> 	... `git gc --auto` will pack the loose objects.  Some Porcelain 
> 	commands use this command, to do a light weight garbage collection
> 	from time to time.
> +
> *Note*: you should still run "git gc" from time to time, since the 
> "--auto" option is designed to be fast, whereas "git gc" _without_ 
> auto is designed to create small packs.
>
> Hmm? (I think that this is the most visible part of the --auto business, 
> so I would document the "git gc" note here...)

Makes sense.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2007-09-17 19:54 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-09-17  8:27 [PATCH 0/8] Updated git-gc --auto series Junio C Hamano
2007-09-17  8:27 ` [PATCH 1/8] Export matches_pack_name() and fix its return value Junio C Hamano
2007-09-17  8:27 ` [PATCH 2/8] pack-objects --keep-unreachable Junio C Hamano
2007-09-17  8:27 ` [PATCH 3/8] repack -A -d: use --keep-unreachable when repacking Junio C Hamano
2007-09-17  8:27 ` [PATCH 4/8] git-gc --auto: move threshold check to need_to_gc() function Junio C Hamano
2007-09-17  8:27 ` [PATCH 5/8] git-gc --auto: add documentation Junio C Hamano
2007-09-17  8:27 ` [PATCH 6/8] git-gc --auto: protect ourselves from accumulated cruft Junio C Hamano
2007-09-17  8:27 ` [PATCH 7/8] git-gc --auto: restructure the way "repack" command line is built Junio C Hamano
2007-09-17  8:27 ` [PATCH 8/8] git-gc --auto: run "repack -A -d -l" as necessary Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2007-09-17  8:44 [PATCH 1/8] Export matches_pack_name() and fix its return value Junio C Hamano
2007-09-17  8:44 ` [PATCH 5/8] git-gc --auto: add documentation Junio C Hamano
2007-09-17  9:36   ` Johannes Schindelin
2007-09-17 19:54     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).