git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC/PATCH 0/8] Add configuration options for split-index
@ 2016-07-11 17:22 Christian Couder
  2016-07-11 17:22 ` [RFC/PATCH 1/8] config: add git_config_get_split_index() Christian Couder
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Christian Couder @ 2016-07-11 17:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Nguyen Thai Ngoc Duy, Christian Couder

Goal
~~~~

We want to make it possible to use the split-index feature
automatically by just setting a new "core.splitIndex" configuration
variable to true.

This can be valuable as split-index can help significantly speed up
`git rebase` especially along with the current work to libify `git
apply`.

Design
~~~~~~

The design is similar as the previous work that introduced
"core.untrackedCache". 

The new "core.splitIndex" configuration option can be either true,
false or undefined which is the default.

When it is true, the split index is created, if it does not already
exists, when the index is read. When it is false, the split index is
removed if it exists, when the index is read. Otherwise it is left as
is.

Highlevel view of the patches in the series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    - Patches 1/8 to 4/8 introduce the functions that are reading the
      configuration variable and tweaking the split index depending on
      its value.

    - Patches 5/8 and 6/8 add some documentation for the new feature.

    - Patch 7/8 adds a few simple tests.

    - Patch 8/8 cleans up the old "sharedindex.XXXX" previously
      created by split-index.

Future work
~~~~~~~~~~~

One thing that is probably missing is a mechanism to avoid having too
many changes accumulating in the (split) index while in split index
mode. The git-update-index documentation says:

	If split-index mode is already enabled and `--split-index` is
	given again, all changes in $GIT_DIR/index are pushed back to
	the shared index file.

but it is probably better to not expect the user to think about it and
to have a mechanism that pushes back all changes to the shared index
file automatically when some threshold is reached. The threshold could
be for example when $GIT_DIR/index size is larger than 25% of the
shared index size. Opinions, test results or test ideas are welcome on
this.

Links
~~~~~

The last iteration of the git apply libification patch series is
there:

http://thread.gmane.org/gmane.comp.version-control.git/298344/

This patch series is also available here:

https://github.com/chriscool/git/commits/config-split-index


Christian Couder (8):
  config: add git_config_get_split_index()
  split-index: add {add,remove}_split_index() functions
  read-cache: add and then use tweak_split_index()
  update-index: warn in case of split-index incoherency
  Documentation/config: add information for core.splitIndex
  Documentation/git-update-index: talk about core.splitIndex config var
  t1700: add tests for core.splitIndex
  read-cache: unlink old sharedindex files

 Documentation/config.txt           |  4 ++++
 Documentation/git-update-index.txt |  6 ++++++
 builtin/update-index.c             | 20 ++++++++---------
 cache.h                            |  1 +
 config.c                           | 10 +++++++++
 read-cache.c                       | 44 +++++++++++++++++++++++++++++++++++++-
 split-index.c                      | 18 ++++++++++++++++
 split-index.h                      |  2 ++
 t/t1700-split-index.sh             | 44 ++++++++++++++++++++++++++++++++++++++
 9 files changed, 138 insertions(+), 11 deletions(-)

-- 
2.9.0.250.g7087ccc.dirty


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC/PATCH 1/8] config: add git_config_get_split_index()
  2016-07-11 17:22 [RFC/PATCH 0/8] Add configuration options for split-index Christian Couder
@ 2016-07-11 17:22 ` Christian Couder
  2016-07-11 17:22 ` [RFC/PATCH 2/8] split-index: add {add,remove}_split_index() functions Christian Couder
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Christian Couder @ 2016-07-11 17:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Nguyen Thai Ngoc Duy, Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 cache.h  |  1 +
 config.c | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/cache.h b/cache.h
index f1dc289..5296a50 100644
--- a/cache.h
+++ b/cache.h
@@ -1695,6 +1695,7 @@ extern int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest);
 extern int git_config_get_maybe_bool(const char *key, int *dest);
 extern int git_config_get_pathname(const char *key, const char **dest);
 extern int git_config_get_untracked_cache(void);
+extern int git_config_get_split_index(void);
 
 /*
  * This is a hack for test programs like test-dump-untracked-cache to
diff --git a/config.c b/config.c
index bea937e..318194b 100644
--- a/config.c
+++ b/config.c
@@ -1656,6 +1656,16 @@ int git_config_get_untracked_cache(void)
 	return -1; /* default value */
 }
 
+int git_config_get_split_index(void)
+{
+	int val = -1;
+
+	if (!git_config_get_maybe_bool("core.splitindex", &val))
+		return val;
+
+	return -1; /* default value */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
-- 
2.9.0.250.g7087ccc.dirty


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC/PATCH 2/8] split-index: add {add,remove}_split_index() functions
  2016-07-11 17:22 [RFC/PATCH 0/8] Add configuration options for split-index Christian Couder
  2016-07-11 17:22 ` [RFC/PATCH 1/8] config: add git_config_get_split_index() Christian Couder
@ 2016-07-11 17:22 ` Christian Couder
  2016-07-11 17:22 ` [RFC/PATCH 3/8] read-cache: add and then use tweak_split_index() Christian Couder
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Christian Couder @ 2016-07-11 17:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Nguyen Thai Ngoc Duy, Christian Couder

Also use the functions in cmd_update_index() in
builtin/update-index.c.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/update-index.c | 17 ++++-------------
 split-index.c          | 18 ++++++++++++++++++
 split-index.h          |  2 ++
 3 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 6cdfd5f..f06fe80 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -1108,19 +1108,10 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
-	if (split_index > 0) {
-		init_split_index(&the_index);
-		the_index.cache_changed |= SPLIT_INDEX_ORDERED;
-	} else if (!split_index && the_index.split_index) {
-		/*
-		 * can't discard_split_index(&the_index); because that
-		 * will destroy split_index->base->cache[], which may
-		 * be shared with the_index.cache[]. So yeah we're
-		 * leaking a bit here.
-		 */
-		the_index.split_index = NULL;
-		the_index.cache_changed |= SOMETHING_CHANGED;
-	}
+	if (split_index > 0)
+		add_split_index(&the_index);
+	else if (!split_index && the_index.split_index)
+		remove_split_index(&the_index);
 
 	switch (untracked_cache) {
 	case UC_UNSPECIFIED:
diff --git a/split-index.c b/split-index.c
index 3c75d4b..9466b69 100644
--- a/split-index.c
+++ b/split-index.c
@@ -319,3 +319,21 @@ void replace_index_entry_in_base(struct index_state *istate,
 		istate->split_index->base->cache[new->index - 1] = new;
 	}
 }
+
+void add_split_index(struct index_state *istate)
+{
+	init_split_index(istate);
+	istate->cache_changed |= SPLIT_INDEX_ORDERED;
+}
+
+void remove_split_index(struct index_state *istate)
+{
+	/*
+	 * can't discard_split_index(&the_index); because that
+	 * will destroy split_index->base->cache[], which may
+	 * be shared with the_index.cache[]. So yeah we're
+	 * leaking a bit here.
+	 */
+	istate->split_index = NULL;
+	istate->cache_changed |= SOMETHING_CHANGED;
+}
diff --git a/split-index.h b/split-index.h
index c1324f5..df91c1b 100644
--- a/split-index.h
+++ b/split-index.h
@@ -31,5 +31,7 @@ void merge_base_index(struct index_state *istate);
 void prepare_to_write_split_index(struct index_state *istate);
 void finish_writing_split_index(struct index_state *istate);
 void discard_split_index(struct index_state *istate);
+void add_split_index(struct index_state *istate);
+void remove_split_index(struct index_state *istate);
 
 #endif
-- 
2.9.0.250.g7087ccc.dirty


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC/PATCH 3/8] read-cache: add and then use tweak_split_index()
  2016-07-11 17:22 [RFC/PATCH 0/8] Add configuration options for split-index Christian Couder
  2016-07-11 17:22 ` [RFC/PATCH 1/8] config: add git_config_get_split_index() Christian Couder
  2016-07-11 17:22 ` [RFC/PATCH 2/8] split-index: add {add,remove}_split_index() functions Christian Couder
@ 2016-07-11 17:22 ` Christian Couder
  2016-07-11 17:22 ` [RFC/PATCH 4/8] update-index: warn in case of split-index incoherency Christian Couder
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Christian Couder @ 2016-07-11 17:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Nguyen Thai Ngoc Duy, Christian Couder

This will make us use the split-index feature or not depending
on the value of the core.splitIndex config variable.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 read-cache.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index db27766..ae292d6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1538,10 +1538,27 @@ static void tweak_untracked_cache(struct index_state *istate)
 	}
 }
 
+static void tweak_split_index(struct index_state *istate)
+{
+	switch (git_config_get_split_index()) {
+	case -1: /* unset: do nothing */
+		break;
+	case 0: /* false */
+		remove_split_index(istate);
+		break;
+	case 1: /* true */
+		add_split_index(istate);
+		break;
+	default: /* unknown value: do nothing */
+		break;
+	}
+}
+
 static void post_read_index_from(struct index_state *istate)
 {
 	check_ce_order(istate);
 	tweak_untracked_cache(istate);
+	tweak_split_index(istate);
 }
 
 /* remember to discard_cache() before reading a different cache! */
-- 
2.9.0.250.g7087ccc.dirty


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC/PATCH 4/8] update-index: warn in case of split-index incoherency
  2016-07-11 17:22 [RFC/PATCH 0/8] Add configuration options for split-index Christian Couder
                   ` (2 preceding siblings ...)
  2016-07-11 17:22 ` [RFC/PATCH 3/8] read-cache: add and then use tweak_split_index() Christian Couder
@ 2016-07-11 17:22 ` Christian Couder
  2016-07-11 17:22 ` [RFC/PATCH 5/8] Documentation/config: add information for core.splitIndex Christian Couder
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Christian Couder @ 2016-07-11 17:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Nguyen Thai Ngoc Duy, Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/update-index.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index f06fe80..2b8aaa6 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -1108,10 +1108,19 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
-	if (split_index > 0)
+	if (split_index > 0) {
+		if (git_config_get_split_index() == 0)
+			warning("core.splitIndex is set to false; "
+				"remove or change it, if you really want to "
+				"enable split index");
 		add_split_index(&the_index);
-	else if (!split_index && the_index.split_index)
+	} else if (!split_index && the_index.split_index) {
+		if (git_config_get_split_index() == 1)
+			warning("core.splitIndex is set to true; "
+				"remove or change it, if you really want to "
+				"disable split index");
 		remove_split_index(&the_index);
+	}
 
 	switch (untracked_cache) {
 	case UC_UNSPECIFIED:
-- 
2.9.0.250.g7087ccc.dirty


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC/PATCH 5/8] Documentation/config: add information for core.splitIndex
  2016-07-11 17:22 [RFC/PATCH 0/8] Add configuration options for split-index Christian Couder
                   ` (3 preceding siblings ...)
  2016-07-11 17:22 ` [RFC/PATCH 4/8] update-index: warn in case of split-index incoherency Christian Couder
@ 2016-07-11 17:22 ` Christian Couder
  2016-07-11 17:22 ` [RFC/PATCH 6/8] Documentation/git-update-index: talk about core.splitIndex config var Christian Couder
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Christian Couder @ 2016-07-11 17:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Nguyen Thai Ngoc Duy, Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/config.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e208af1..aec8ecb 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -324,6 +324,10 @@ core.trustctime::
 	crawlers and some backup systems).
 	See linkgit:git-update-index[1]. True by default.
 
+core.splitIndex::
+	If true, the split-index feature of the index will be used.
+	See linkgit:git-update-index[1]. False by default.
+
 core.untrackedCache::
 	Determines what to do about the untracked cache feature of the
 	index. It will be kept, if this variable is unset or set to
-- 
2.9.0.250.g7087ccc.dirty


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC/PATCH 6/8] Documentation/git-update-index: talk about core.splitIndex config var
  2016-07-11 17:22 [RFC/PATCH 0/8] Add configuration options for split-index Christian Couder
                   ` (4 preceding siblings ...)
  2016-07-11 17:22 ` [RFC/PATCH 5/8] Documentation/config: add information for core.splitIndex Christian Couder
@ 2016-07-11 17:22 ` Christian Couder
  2016-07-11 17:22 ` [RFC/PATCH 7/8] t1700: add tests for core.splitIndex Christian Couder
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Christian Couder @ 2016-07-11 17:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Nguyen Thai Ngoc Duy, Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/git-update-index.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index c6cbed1..2293140 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -171,6 +171,12 @@ may not support it yet.
 	given again, all changes in $GIT_DIR/index are pushed back to
 	the shared index file. This mode is designed for very large
 	indexes that take a significant amount of time to read or write.
++
+These options take effect whatever the value of the `core.splitIndex`
+configuration variable (see linkgit:git-config[1]). But a warning is
+emitted when the change goes against the configured value, as the
+configured value will take effect next time the index is read and this
+will remove the intended effect of the option.
 
 --untracked-cache::
 --no-untracked-cache::
-- 
2.9.0.250.g7087ccc.dirty


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC/PATCH 7/8] t1700: add tests for core.splitIndex
  2016-07-11 17:22 [RFC/PATCH 0/8] Add configuration options for split-index Christian Couder
                   ` (5 preceding siblings ...)
  2016-07-11 17:22 ` [RFC/PATCH 6/8] Documentation/git-update-index: talk about core.splitIndex config var Christian Couder
@ 2016-07-11 17:22 ` Christian Couder
  2016-07-11 17:22 ` [RFC/PATCH 8/8] read-cache: unlink old sharedindex files Christian Couder
  2016-07-12 16:01 ` [RFC/PATCH 0/8] Add configuration options for split-index Duy Nguyen
  8 siblings, 0 replies; 19+ messages in thread
From: Christian Couder @ 2016-07-11 17:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Nguyen Thai Ngoc Duy, Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t1700-split-index.sh | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 8aef49f..f1af0d5 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -200,4 +200,41 @@ EOF
 	test_cmp expect actual
 '
 
+test_expect_success 'set core.splitIndex config variable to true' '
+	git config core.splitIndex true &&
+	: >three &&
+	git update-index --add three &&
+	git ls-files --stage >ls-files.actual &&
+	cat >ls-files.expect <<EOF &&
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	one
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	three
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	two
+EOF
+	test_cmp ls-files.expect ls-files.actual &&
+	BASE=$(test-dump-split-index .git/index | grep "^base") &&
+	test-dump-split-index .git/index | sed "/^own/d" >actual &&
+	cat >expect <<EOF &&
+$BASE
+replacements:
+deletions:
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'set core.splitIndex config variable to false' '
+	git config core.splitIndex false &&
+	git update-index --force-remove three &&
+	git ls-files --stage >ls-files.actual &&
+	cat >ls-files.expect <<EOF &&
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	one
+100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	two
+EOF
+	test_cmp ls-files.expect ls-files.actual &&
+	test-dump-split-index .git/index | sed "/^own/d" >actual &&
+	cat >expect <<EOF &&
+not a split index
+EOF
+	test_cmp expect actual
+'
+
 test_done
-- 
2.9.0.250.g7087ccc.dirty


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC/PATCH 8/8] read-cache: unlink old sharedindex files
  2016-07-11 17:22 [RFC/PATCH 0/8] Add configuration options for split-index Christian Couder
                   ` (6 preceding siblings ...)
  2016-07-11 17:22 ` [RFC/PATCH 7/8] t1700: add tests for core.splitIndex Christian Couder
@ 2016-07-11 17:22 ` Christian Couder
  2016-07-11 18:27   ` Duy Nguyen
  2016-07-12 16:01 ` [RFC/PATCH 0/8] Add configuration options for split-index Duy Nguyen
  8 siblings, 1 reply; 19+ messages in thread
From: Christian Couder @ 2016-07-11 17:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Nguyen Thai Ngoc Duy, Christian Couder

Everytime split index is turned on, it creates a "sharedindex.XXXX"
file in the git directory. This makes sure that old sharedindex
files are removed after a new one has been created.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 read-cache.c           | 27 ++++++++++++++++++++++++++-
 t/t1700-split-index.sh |  7 +++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index ae292d6..13aa058 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2166,6 +2166,28 @@ static int write_split_index(struct index_state *istate,
 	return ret;
 }
 
+static void clean_shared_index_files(const char *current_hex)
+{
+	struct dirent *de;
+	DIR *dir = opendir(get_git_dir());
+
+	if (!dir) {
+		error_errno("unable to open git dir: %s", get_git_dir());
+		return;
+	}
+
+	while ((de = readdir(dir)) != NULL) {
+		const char *sha1_hex;
+		if (!skip_prefix(de->d_name, "sharedindex.", &sha1_hex))
+			continue;
+		if (!strcmp(sha1_hex, current_hex))
+			continue;
+		if (unlink(git_path("%s", de->d_name)))
+			error_errno("unable to unlink: %s", git_path("%s", de->d_name));
+	}
+	closedir(dir);
+}
+
 static struct tempfile temporary_sharedindex;
 
 static int write_shared_index(struct index_state *istate,
@@ -2187,8 +2209,11 @@ static int write_shared_index(struct index_state *istate,
 	}
 	ret = rename_tempfile(&temporary_sharedindex,
 			      git_path("sharedindex.%s", sha1_to_hex(si->base->sha1)));
-	if (!ret)
+	if (!ret) {
 		hashcpy(si->base_sha1, si->base->sha1);
+		clean_shared_index_files(sha1_to_hex(si->base->sha1));
+	}
+
 	return ret;
 }
 
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index f1af0d5..5036efa 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -237,4 +237,11 @@ EOF
 	test_cmp expect actual
 '
 
+test_expect_success 'only one "sharedindex" files' '
+	git config core.splitIndex true &&
+	: >three &&
+	git update-index --add three &&
+	test $(ls .git/sharedindex.* | wc -l) = 1
+'
+
 test_done
-- 
2.9.0.250.g7087ccc.dirty


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH 8/8] read-cache: unlink old sharedindex files
  2016-07-11 17:22 ` [RFC/PATCH 8/8] read-cache: unlink old sharedindex files Christian Couder
@ 2016-07-11 18:27   ` Duy Nguyen
  2016-07-12  7:04     ` Christian Couder
  0 siblings, 1 reply; 19+ messages in thread
From: Duy Nguyen @ 2016-07-11 18:27 UTC (permalink / raw)
  To: Christian Couder
  Cc: Git Mailing List, Junio C Hamano,
	Ævar Arnfjörð Bjarmason, Christian Couder

On Mon, Jul 11, 2016 at 7:22 PM, Christian Couder
<christian.couder@gmail.com> wrote:
> Everytime split index is turned on, it creates a "sharedindex.XXXX"
> file in the git directory. This makes sure that old sharedindex
> files are removed after a new one has been created.

Hmm it's one-way link, we don't know how many index files use this
shared index file, how can you be sure nobody else will need it? I'm
thinking about temporary indexes. If a temp index is created, saved on
disk, and use delete the shared index file, the real, main index may
become useless. Temp index will most likely replace the main index
(git commit) but if a failure happens, we can't fall back.

A safer approach is "touch" the shared index every time a linked index
is used, then we can delete shared indexes with old mtime, older than
a grace period, in git-prune (or here).
-- 
Duy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH 8/8] read-cache: unlink old sharedindex files
  2016-07-11 18:27   ` Duy Nguyen
@ 2016-07-12  7:04     ` Christian Couder
  2016-07-12 15:12       ` Duy Nguyen
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Couder @ 2016-07-12  7:04 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Junio C Hamano,
	Ævar Arnfjörð Bjarmason, Christian Couder

On Mon, Jul 11, 2016 at 8:27 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Mon, Jul 11, 2016 at 7:22 PM, Christian Couder
> <christian.couder@gmail.com> wrote:
>> Everytime split index is turned on, it creates a "sharedindex.XXXX"
>> file in the git directory. This makes sure that old sharedindex
>> files are removed after a new one has been created.
>
> Hmm it's one-way link, we don't know how many index files use this
> shared index file, how can you be sure nobody else will need it? I'm
> thinking about temporary indexes. If a temp index is created, saved on
> disk, and use delete the shared index file, the real, main index may
> become useless. Temp index will most likely replace the main index
> (git commit) but if a failure happens, we can't fall back.

Isn't there a way to scan all the current indexes (temp or not) to see
which shared indexes they need?

> A safer approach is "touch" the shared index every time a linked index
> is used, then we can delete shared indexes with old mtime, older than
> a grace period, in git-prune (or here).

Maybe old linked indexes could be converted after some time to use a
newer shared index, so that we can get rid of the old shared indexes.
That seems safer than just deleting old linked indexes.

Thanks for your review,
Christian.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH 8/8] read-cache: unlink old sharedindex files
  2016-07-12  7:04     ` Christian Couder
@ 2016-07-12 15:12       ` Duy Nguyen
  2016-07-12 19:45         ` Christian Couder
  0 siblings, 1 reply; 19+ messages in thread
From: Duy Nguyen @ 2016-07-12 15:12 UTC (permalink / raw)
  To: Christian Couder
  Cc: Git Mailing List, Junio C Hamano,
	Ævar Arnfjörð Bjarmason, Christian Couder

On Tue, Jul 12, 2016 at 9:04 AM, Christian Couder
<christian.couder@gmail.com> wrote:
> On Mon, Jul 11, 2016 at 8:27 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>> On Mon, Jul 11, 2016 at 7:22 PM, Christian Couder
>> <christian.couder@gmail.com> wrote:
>>> Everytime split index is turned on, it creates a "sharedindex.XXXX"
>>> file in the git directory. This makes sure that old sharedindex
>>> files are removed after a new one has been created.
>>
>> Hmm it's one-way link, we don't know how many index files use this
>> shared index file, how can you be sure nobody else will need it? I'm
>> thinking about temporary indexes. If a temp index is created, saved on
>> disk, and use delete the shared index file, the real, main index may
>> become useless. Temp index will most likely replace the main index
>> (git commit) but if a failure happens, we can't fall back.
>
> Isn't there a way to scan all the current indexes (temp or not) to see
> which shared indexes they need?

No. People could create an index file anywhere in theory. So you don't
know how many index files there are.

>> A safer approach is "touch" the shared index every time a linked index
>> is used, then we can delete shared indexes with old mtime, older than
>> a grace period, in git-prune (or here).
>
> Maybe old linked indexes could be converted after some time to use a
> newer shared index, so that we can get rid of the old shared indexes.
> That seems safer than just deleting old linked indexes.

It really depends. If the shared part is too small for old indexes, we
might as well unsplit them. In practice though, the only long-term
index file is $GIT_DIR/index. If we don't delete old shared index
files too close to their creation time, temp index files will go away.
-- 
Duy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH 0/8] Add configuration options for split-index
  2016-07-11 17:22 [RFC/PATCH 0/8] Add configuration options for split-index Christian Couder
                   ` (7 preceding siblings ...)
  2016-07-11 17:22 ` [RFC/PATCH 8/8] read-cache: unlink old sharedindex files Christian Couder
@ 2016-07-12 16:01 ` Duy Nguyen
  2016-07-23 16:11   ` Christian Couder
  8 siblings, 1 reply; 19+ messages in thread
From: Duy Nguyen @ 2016-07-12 16:01 UTC (permalink / raw)
  To: Christian Couder
  Cc: Git Mailing List, Junio C Hamano,
	Ævar Arnfjörð Bjarmason, Christian Couder

On Mon, Jul 11, 2016 at 7:22 PM, Christian Couder
<christian.couder@gmail.com> wrote:
> Future work
> ~~~~~~~~~~~
>
> One thing that is probably missing is a mechanism to avoid having too
> many changes accumulating in the (split) index while in split index
> mode. The git-update-index documentation says:
>
>         If split-index mode is already enabled and `--split-index` is
>         given again, all changes in $GIT_DIR/index are pushed back to
>         the shared index file.
>
> but it is probably better to not expect the user to think about it and
> to have a mechanism that pushes back all changes to the shared index
> file automatically when some threshold is reached. The threshold could
> be for example when $GIT_DIR/index size is larger than 25% of the
> shared index size. Opinions, test results or test ideas are welcome on
> this.

Oh yes I would like something like this. I stuck to the basics because
as you see you need to define some criteria to re-split again, but
without experimenting on real repos, I could just have gone the wrong
way. Index file size or the percentage of entries in linked/shared
indexes are two good candidates. You can also just re-split on
commands that likely lead to increasing linked index size a lot (maybe
git-reset), or run long enough that some extra processing won't get
noticed. For example git-gc should definitely re-split if this feature
is on, but I can't say if it's often enough.
-- 
Duy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH 8/8] read-cache: unlink old sharedindex files
  2016-07-12 15:12       ` Duy Nguyen
@ 2016-07-12 19:45         ` Christian Couder
  2016-07-13 15:16           ` Duy Nguyen
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Couder @ 2016-07-12 19:45 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Junio C Hamano,
	Ævar Arnfjörð Bjarmason, Christian Couder

On Tue, Jul 12, 2016 at 5:12 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>
> No. People could create an index file anywhere in theory. So you don't
> know how many index files there are.

Maybe when an index file is created, its path and its sharedindex file
could be appended into a log file.
We could check this log file to see if we can remove sharedindex files.
We would need to remove the entries in this log file for the indexes
that are no longer there.

Or instead of one log file we could have a file for each index file in
a special directory called for example "indexinfo".
So we could just delete the file if its related index is no longer there.

> It really depends. If the shared part is too small for old indexes, we
> might as well unsplit them. In practice though, the only long-term
> index file is $GIT_DIR/index. If we don't delete old shared index
> files too close to their creation time, temp index files will go away.

We could treat $GIT_DIR/index specially so that if there are no temp
index files, there should be nothing in "indexinfo".

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH 8/8] read-cache: unlink old sharedindex files
  2016-07-12 19:45         ` Christian Couder
@ 2016-07-13 15:16           ` Duy Nguyen
  2016-07-13 17:54             ` Christian Couder
  0 siblings, 1 reply; 19+ messages in thread
From: Duy Nguyen @ 2016-07-13 15:16 UTC (permalink / raw)
  To: Christian Couder
  Cc: Git Mailing List, Junio C Hamano,
	Ævar Arnfjörð Bjarmason, Christian Couder

On Tue, Jul 12, 2016 at 9:45 PM, Christian Couder
<christian.couder@gmail.com> wrote:
> On Tue, Jul 12, 2016 at 5:12 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>>
>> No. People could create an index file anywhere in theory. So you don't
>> know how many index files there are.
>
> Maybe when an index file is created, its path and its sharedindex file
> could be appended into a log file.
> We could check this log file to see if we can remove sharedindex files.
> We would need to remove the entries in this log file for the indexes
> that are no longer there.
>
> Or instead of one log file we could have a file for each index file in
> a special directory called for example "indexinfo".
> So we could just delete the file if its related index is no longer there.

New files will require locking so people don't append at the same
time. And maybe another new host of problems. I think we can just go
with the garbage collection way that we have done for unreachable
objects.

Your indexinfo idea looks very close to multiple locking, an index
would lock the shared index it's linked to, preventing it from being
removed. For single locking, we can just create a file named $X.lock,
but for multiple locks, maybe we can go with
$X.lock-$index_trailing_sha1. Will it work? I don't know. Just
thinking out loud.

>> It really depends. If the shared part is too small for old indexes, we
>> might as well unsplit them. In practice though, the only long-term
>> index file is $GIT_DIR/index. If we don't delete old shared index
>> files too close to their creation time, temp index files will go away.
>
> We could treat $GIT_DIR/index specially so that if there are no temp
> index files, there should be nothing in "indexinfo".

No, temp index files are needed. And it will  be hard to treat
$GIT_DIR/index specially because updating it involves a temp index:
you first prepare a new index in $GIT_DIR/index.lock. If everything
goes well, you atomically rename it to $GIT_DIR/index. You may be able
to treat $GIT_DIR/index.lock special too, but that's starting to get
out of hand.
-- 
Duy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH 8/8] read-cache: unlink old sharedindex files
  2016-07-13 15:16           ` Duy Nguyen
@ 2016-07-13 17:54             ` Christian Couder
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Couder @ 2016-07-13 17:54 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Junio C Hamano,
	Ævar Arnfjörð Bjarmason, Christian Couder

On Wed, Jul 13, 2016 at 5:16 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Tue, Jul 12, 2016 at 9:45 PM, Christian Couder
> <christian.couder@gmail.com> wrote:
>> On Tue, Jul 12, 2016 at 5:12 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>>>
>>> No. People could create an index file anywhere in theory. So you don't
>>> know how many index files there are.
>>
>> Maybe when an index file is created, its path and its sharedindex file
>> could be appended into a log file.
>> We could check this log file to see if we can remove sharedindex files.
>> We would need to remove the entries in this log file for the indexes
>> that are no longer there.
>>
>> Or instead of one log file we could have a file for each index file in
>> a special directory called for example "indexinfo".
>> So we could just delete the file if its related index is no longer there.
>
> New files will require locking so people don't append at the same
> time.

If we create a new file for each index file with a name that depends
on the index path (for example maybe the sha1 of the index path) then
as many index files with the same path are not possible we should be
safe.

> And maybe another new host of problems. I think we can just go
> with the garbage collection way that we have done for unreachable
> objects.
>
> Your indexinfo idea looks very close to multiple locking, an index
> would lock the shared index it's linked to, preventing it from being
> removed. For single locking, we can just create a file named $X.lock,
> but for multiple locks, maybe we can go with
> $X.lock-$index_trailing_sha1. Will it work? I don't know. Just
> thinking out loud.

Isn't is possible that the same index (so with the same trailing sha1)
be created in two different places?

>>> It really depends. If the shared part is too small for old indexes, we
>>> might as well unsplit them. In practice though, the only long-term
>>> index file is $GIT_DIR/index. If we don't delete old shared index
>>> files too close to their creation time, temp index files will go away.
>>
>> We could treat $GIT_DIR/index specially so that if there are no temp
>> index files, there should be nothing in "indexinfo".
>
> No, temp index files are needed. And it will  be hard to treat
> $GIT_DIR/index specially because updating it involves a temp index:
> you first prepare a new index in $GIT_DIR/index.lock.

If the new index is always prepared in $GIT_DIR/index.lock, then there
is no problem in the first place because when my original patch calls
clean_shared_index_files() the $GIT_DIR/index.lock is taken. Or maybe
I am missing something?

> If everything
> goes well, you atomically rename it to $GIT_DIR/index. You may be able
> to treat $GIT_DIR/index.lock special too, but that's starting to get
> out of hand.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH 0/8] Add configuration options for split-index
  2016-07-12 16:01 ` [RFC/PATCH 0/8] Add configuration options for split-index Duy Nguyen
@ 2016-07-23 16:11   ` Christian Couder
  2016-07-25 16:04     ` Duy Nguyen
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Couder @ 2016-07-23 16:11 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Junio C Hamano,
	Ævar Arnfjörð Bjarmason, Christian Couder

On Tue, Jul 12, 2016 at 6:01 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Mon, Jul 11, 2016 at 7:22 PM, Christian Couder
> <christian.couder@gmail.com> wrote:
>> Future work
>> ~~~~~~~~~~~
>>
>> One thing that is probably missing is a mechanism to avoid having too
>> many changes accumulating in the (split) index while in split index
>> mode. The git-update-index documentation says:
>>
>>         If split-index mode is already enabled and `--split-index` is
>>         given again, all changes in $GIT_DIR/index are pushed back to
>>         the shared index file.
>>
>> but it is probably better to not expect the user to think about it and
>> to have a mechanism that pushes back all changes to the shared index
>> file automatically when some threshold is reached. The threshold could
>> be for example when $GIT_DIR/index size is larger than 25% of the
>> shared index size. Opinions, test results or test ideas are welcome on
>> this.
>
> Oh yes I would like something like this. I stuck to the basics because
> as you see you need to define some criteria to re-split again, but
> without experimenting on real repos, I could just have gone the wrong
> way. Index file size or the percentage of entries in linked/shared
> indexes are two good candidates.

Ok, I started working on automatically pushing back all changes to the
shared index when the percentage of entries in linked vs shared
indexes is greater than 25% (maybe I will make it configurable later).
It is very basic and doesn't work well for now (for one thing it is
missing added entries), see:

https://github.com/chriscool/git/commits/config-split-index8

Basically I would like a way to count then entries that are only in
the linked index without modifying them to be safe, but I have a hard
time seeing how I could modify prepare_to_write_split_index() to get
that.

> You can also just re-split on
> commands that likely lead to increasing linked index size a lot (maybe
> git-reset), or run long enough that some extra processing won't get
> noticed. For example git-gc should definitely re-split if this feature
> is on, but I can't say if it's often enough.

I'd like to avoid re-split only on some specific commands as I feel it
could lead to bad behavior when none of these specific commands are
called but the linked index is growing because of other commands.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH 0/8] Add configuration options for split-index
  2016-07-23 16:11   ` Christian Couder
@ 2016-07-25 16:04     ` Duy Nguyen
  2016-07-25 21:18       ` Christian Couder
  0 siblings, 1 reply; 19+ messages in thread
From: Duy Nguyen @ 2016-07-25 16:04 UTC (permalink / raw)
  To: Christian Couder
  Cc: Git Mailing List, Junio C Hamano,
	Ævar Arnfjörð Bjarmason, Christian Couder

On Sat, Jul 23, 2016 at 6:11 PM, Christian Couder
<christian.couder@gmail.com> wrote:
> Ok, I started working on automatically pushing back all changes to the
> shared index when the percentage of entries in linked vs shared
> indexes is greater than 25% (maybe I will make it configurable later).
> It is very basic and doesn't work well for now (for one thing it is
> missing added entries), see:
>
> https://github.com/chriscool/git/commits/config-split-index8
>
> Basically I would like a way to count then entries that are only in
> the linked index without modifying them to be safe, but I have a hard
> time seeing how I could modify prepare_to_write_split_index() to get
> that.

Hmm.. can you do the counting separately? A shared cache_entry must
have its field "index" greater than zero. By counting the number of
entries whose index is zero (i.e. not shared) against the total number
of real (*) entries, you should have a decent estimate when to split.
Then you can do exactly what "git update-index --no-split-index" and
"git update-index --split-index" sequence does, but in write_index().
It's easier than messing inside split-index.c. If we hit performance
problem, then we can look into changing split-index.c

(*) remember that some entries may be marked CE_REMOVE, which are dead
entries and should not be counted because they will never be written
down on disk.
-- 
Duy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH 0/8] Add configuration options for split-index
  2016-07-25 16:04     ` Duy Nguyen
@ 2016-07-25 21:18       ` Christian Couder
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Couder @ 2016-07-25 21:18 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Junio C Hamano,
	Ævar Arnfjörð Bjarmason, Christian Couder

On Mon, Jul 25, 2016 at 6:04 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>
> Hmm.. can you do the counting separately? A shared cache_entry must
> have its field "index" greater than zero. By counting the number of
> entries whose index is zero (i.e. not shared) against the total number
> of real (*) entries, you should have a decent estimate when to split.
> Then you can do exactly what "git update-index --no-split-index" and
> "git update-index --split-index" sequence does, but in write_index().
> It's easier than messing inside split-index.c. If we hit performance
> problem, then we can look into changing split-index.c

Yeah, thanks for the suggestion. I will try it.

> (*) remember that some entries may be marked CE_REMOVE, which are dead
> entries and should not be counted because they will never be written
> down on disk.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-07-25 21:18 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-11 17:22 [RFC/PATCH 0/8] Add configuration options for split-index Christian Couder
2016-07-11 17:22 ` [RFC/PATCH 1/8] config: add git_config_get_split_index() Christian Couder
2016-07-11 17:22 ` [RFC/PATCH 2/8] split-index: add {add,remove}_split_index() functions Christian Couder
2016-07-11 17:22 ` [RFC/PATCH 3/8] read-cache: add and then use tweak_split_index() Christian Couder
2016-07-11 17:22 ` [RFC/PATCH 4/8] update-index: warn in case of split-index incoherency Christian Couder
2016-07-11 17:22 ` [RFC/PATCH 5/8] Documentation/config: add information for core.splitIndex Christian Couder
2016-07-11 17:22 ` [RFC/PATCH 6/8] Documentation/git-update-index: talk about core.splitIndex config var Christian Couder
2016-07-11 17:22 ` [RFC/PATCH 7/8] t1700: add tests for core.splitIndex Christian Couder
2016-07-11 17:22 ` [RFC/PATCH 8/8] read-cache: unlink old sharedindex files Christian Couder
2016-07-11 18:27   ` Duy Nguyen
2016-07-12  7:04     ` Christian Couder
2016-07-12 15:12       ` Duy Nguyen
2016-07-12 19:45         ` Christian Couder
2016-07-13 15:16           ` Duy Nguyen
2016-07-13 17:54             ` Christian Couder
2016-07-12 16:01 ` [RFC/PATCH 0/8] Add configuration options for split-index Duy Nguyen
2016-07-23 16:11   ` Christian Couder
2016-07-25 16:04     ` Duy Nguyen
2016-07-25 21:18       ` Christian Couder

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).