git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [PATCH 1/3] repository: fix repo_read_index with submodules
  2017-12-10 21:21 [PATCH 0/3] fixes for split index mode Thomas Gummerer
@ 2017-12-10 21:22 ` Thomas Gummerer
  2017-12-11 18:54   ` Brandon Williams
  2017-12-10 21:22 ` [PATCH 2/3] prune: fix pruning with multiple worktrees and split index Thomas Gummerer
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-10 21:22 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Lars Schneider, Brandon Williams, Jeff King,
	Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Thomas Gummerer

repo_read_index calls read_index_from, which takes an path argument for
the location of the index file.  For the split index however it relies
on the current working directory to construct the path using git_path.

repo_read_index calls read_index_from with the full path for the index
file, however it doesn't change the cwd, so when split index mode is
turned on, read_index_from can't find the file for the split index.

For example t3007-ls-files-recurse-submodules.sh was broken with
GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
broken in a similar manner, probably by introducing struct repository
there, although I didn't track down the exact commit for that.

Fix this by introducing a new read_index_for_repo function, which knows
about the correct paths for the submodules.

The alternative would have been to make the callers pass in the base
path for the split index, however that ended up being more complicated,
and I think we want to converge towards using struct repository for
things like these anyway.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 cache.h      |  1 +
 read-cache.c | 19 +++++++++++++++++--
 repository.c |  2 +-
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index cb5db7bf83..d42bea1ef7 100644
--- a/cache.h
+++ b/cache.h
@@ -614,6 +614,7 @@ extern int read_index_preload(struct index_state *, const struct pathspec *paths
 extern int do_read_index(struct index_state *istate, const char *path,
 			 int must_exist); /* for testting only! */
 extern int read_index_from(struct index_state *, const char *path);
+extern int read_index_for_repo(const struct repository *);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
 
diff --git a/read-cache.c b/read-cache.c
index 2eb81a66b9..4d5c4ad79b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -20,6 +20,7 @@
 #include "split-index.h"
 #include "utf8.h"
 #include "fsmonitor.h"
+#include "repository.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -1871,7 +1872,8 @@ static void freshen_shared_index(char *base_sha1_hex, int warn)
 	free(shared_index);
 }
 
-int read_index_from(struct index_state *istate, const char *path)
+static int do_read_index_from(struct index_state *istate, const char *path,
+			      const struct repository *repo)
 {
 	struct split_index *split_index;
 	int ret;
@@ -1896,7 +1898,10 @@ int read_index_from(struct index_state *istate, const char *path)
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
 
 	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
-	base_path = git_path("sharedindex.%s", base_sha1_hex);
+	if (repo)
+		base_path = repo_git_path(repo, "sharedindex.%s", base_sha1_hex);
+	else
+		base_path = git_path("sharedindex.%s", base_sha1_hex);
 	ret = do_read_index(split_index->base, base_path, 1);
 	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
 		die("broken index, expect %s in %s, got %s",
@@ -1909,6 +1914,16 @@ int read_index_from(struct index_state *istate, const char *path)
 	return ret;
 }
 
+int read_index_for_repo(const struct repository *repo)
+{
+	return do_read_index_from(repo->index, repo->index_file, repo);
+}
+
+int read_index_from(struct index_state *istate, const char *path)
+{
+	return do_read_index_from(istate, path, NULL);
+}
+
 int is_index_unborn(struct index_state *istate)
 {
 	return (!istate->cache_nr && !istate->timestamp.sec);
diff --git a/repository.c b/repository.c
index bb2fae5446..928b1f553d 100644
--- a/repository.c
+++ b/repository.c
@@ -229,5 +229,5 @@ int repo_read_index(struct repository *repo)
 	if (!repo->index)
 		repo->index = xcalloc(1, sizeof(*repo->index));
 
-	return read_index_from(repo->index, repo->index_file);
+	return read_index_for_repo(repo);
 }
-- 
2.15.1.504.g5279b80103


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 2/3] prune: fix pruning with multiple worktrees and split index
  2017-12-10 21:21 [PATCH 0/3] fixes for split index mode Thomas Gummerer
  2017-12-10 21:22 ` [PATCH 1/3] repository: fix repo_read_index with submodules Thomas Gummerer
@ 2017-12-10 21:22 ` Thomas Gummerer
  2017-12-11 19:09   ` Brandon Williams
  2017-12-10 21:22 ` [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
  2017-12-17 22:51 ` [PATCH v2 0/3] fixes for split index mode Thomas Gummerer
  3 siblings, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-10 21:22 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Lars Schneider, Brandon Williams, Jeff King,
	Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Thomas Gummerer

be489d02d2 ("revision.c: --indexed-objects add objects from all
worktrees", 2017-08-23) made sure that pruning takes objects from all
worktrees into account.

It did that by reading the index of every worktree and adding the
necessary index objects to the set of pending objects.  The index is
read by read_index_from.  As mentioned in the previous commit,
read_index_from depends on the CWD for the location of the split index,
and add_index_objects_to_pending doesn't set that before using
read_index_from.

Instead of using read_index_from, use repo_read_index, which is aware of
the proper paths for the worktree.

This fixes t5304-prune when ran with GIT_TEST_SPLIT_INDEX set.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

This also fixes t7009 when ran with GIT_TEST_SPLIT_INDEX.  I'm not
quite sure why it is fixed by this.  Either way I tracked the failure
down to f767178a5a ("Merge branch 'jk/no-null-sha1-in-cache-tree'",
2017-05-16).  Maybe Peff has an idea why this fixes that test?

 repository.c | 11 +++++++++++
 repository.h |  2 ++
 revision.c   | 13 ++++++++-----
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/repository.c b/repository.c
index 928b1f553d..3c9bfbd1b8 100644
--- a/repository.c
+++ b/repository.c
@@ -2,6 +2,7 @@
 #include "repository.h"
 #include "config.h"
 #include "submodule-config.h"
+#include "worktree.h"
 
 /* The main repository */
 static struct repository the_repo = {
@@ -146,6 +147,16 @@ int repo_init(struct repository *repo, const char *gitdir, const char *worktree)
 	return -1;
 }
 
+/*
+ * Initialize 'repo' based on the provided worktree
+ * Return 0 upon success and a non-zero value upon failure.
+ */
+int repo_worktree_init(struct repository *repo, struct worktree *worktree)
+{
+	return repo_init(repo, get_worktree_git_dir(worktree),
+			 worktree->path);
+}
+
 /*
  * Initialize 'submodule' as the submodule given by 'path' in parent repository
  * 'superproject'.
diff --git a/repository.h b/repository.h
index 7f5e24a0a2..2adeb05bf4 100644
--- a/repository.h
+++ b/repository.h
@@ -4,6 +4,7 @@
 struct config_set;
 struct index_state;
 struct submodule_cache;
+struct worktree;
 
 struct repository {
 	/* Environment */
@@ -87,6 +88,7 @@ extern struct repository *the_repository;
 extern void repo_set_gitdir(struct repository *repo, const char *path);
 extern void repo_set_worktree(struct repository *repo, const char *path);
 extern int repo_init(struct repository *repo, const char *gitdir, const char *worktree);
+extern int repo_worktree_init(struct repository *repo, struct worktree *worktree);
 extern int repo_submodule_init(struct repository *submodule,
 			       struct repository *superproject,
 			       const char *path);
diff --git a/revision.c b/revision.c
index e2e691dd5a..9d8d9b96d1 100644
--- a/revision.c
+++ b/revision.c
@@ -22,6 +22,7 @@
 #include "packfile.h"
 #include "worktree.h"
 #include "argv-array.h"
+#include "repository.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -1346,15 +1347,17 @@ void add_index_objects_to_pending(struct rev_info *revs, unsigned int flags)
 	worktrees = get_worktrees(0);
 	for (p = worktrees; *p; p++) {
 		struct worktree *wt = *p;
-		struct index_state istate = { NULL };
+		struct repository *repo;
 
+		repo = xmalloc(sizeof(struct repository));
 		if (wt->is_current)
 			continue; /* current index already taken care of */
+		if (repo_worktree_init(repo, wt))
+			BUG("couldn't initialize repository object from worktree");
 
-		if (read_index_from(&istate,
-				    worktree_git_path(wt, "index")) > 0)
-			do_add_index_objects_to_pending(revs, &istate);
-		discard_index(&istate);
+		if (repo_read_index(repo) > 0)
+			do_add_index_objects_to_pending(revs, repo->index);
+		discard_index(repo->index);
 	}
 	free_worktrees(worktrees);
 }
-- 
2.15.1.504.g5279b80103


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-10 21:21 [PATCH 0/3] fixes for split index mode Thomas Gummerer
  2017-12-10 21:22 ` [PATCH 1/3] repository: fix repo_read_index with submodules Thomas Gummerer
  2017-12-10 21:22 ` [PATCH 2/3] prune: fix pruning with multiple worktrees and split index Thomas Gummerer
@ 2017-12-10 21:22 ` Thomas Gummerer
  2017-12-10 23:37   ` Eric Sunshine
  2017-12-11 21:09   ` SZEDER Gábor
  2017-12-17 22:51 ` [PATCH v2 0/3] fixes for split index mode Thomas Gummerer
  3 siblings, 2 replies; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-10 21:22 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Lars Schneider, Brandon Williams, Jeff King,
	Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Thomas Gummerer

Make sure that split index doesn't get broken, by running it on travis
CI.

Run the test suite with split index enabled in linux 64 bit mode, and
leave split index turned off in 32-bit mode.  The laternative would be
to add an extra target in the matrix, enabling split index mode, but
that would only use additional cycles on travis and would not bring that
much benefit, as we are still running the test suite in the "vanilla"
version in the 32-bit mode.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 .travis.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.travis.yml b/.travis.yml
index 281f101f31..c83c766dee 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -39,7 +39,7 @@ env:
 
 matrix:
   include:
-    - env: GETTEXT_POISON=YesPlease
+    - env: GETTEXT_POISON=YesPlease GIT_TEST_SPLIT_INDEX=YesPlease
       os: linux
       compiler:
       addons:
-- 
2.15.1.504.g5279b80103


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-10 21:22 ` [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
@ 2017-12-10 23:37   ` Eric Sunshine
  2017-12-11 21:09   ` SZEDER Gábor
  1 sibling, 0 replies; 94+ messages in thread
From: Eric Sunshine @ 2017-12-10 23:37 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git List, Christian Couder, Lars Schneider, Brandon Williams,
	Jeff King, Junio C Hamano, Nguyễn Thái Ngọc Duy

On Sun, Dec 10, 2017 at 4:22 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> Make sure that split index doesn't get broken, by running it on travis
> CI.
>
> Run the test suite with split index enabled in linux 64 bit mode, and
> leave split index turned off in 32-bit mode.  The laternative would be

s/laternative/alternative/

> to add an extra target in the matrix, enabling split index mode, but
> that would only use additional cycles on travis and would not bring that
> much benefit, as we are still running the test suite in the "vanilla"
> version in the 32-bit mode.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/3] repository: fix repo_read_index with submodules
  2017-12-10 21:22 ` [PATCH 1/3] repository: fix repo_read_index with submodules Thomas Gummerer
@ 2017-12-11 18:54   ` Brandon Williams
  2017-12-11 20:37     ` Thomas Gummerer
  0 siblings, 1 reply; 94+ messages in thread
From: Brandon Williams @ 2017-12-11 18:54 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Christian Couder, Lars Schneider, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy

On 12/10, Thomas Gummerer wrote:
> repo_read_index calls read_index_from, which takes an path argument for
> the location of the index file.  For the split index however it relies
> on the current working directory to construct the path using git_path.
> 
> repo_read_index calls read_index_from with the full path for the index
> file, however it doesn't change the cwd, so when split index mode is
> turned on, read_index_from can't find the file for the split index.
> 
> For example t3007-ls-files-recurse-submodules.sh was broken with
> GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
> object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
> broken in a similar manner, probably by introducing struct repository
> there, although I didn't track down the exact commit for that.
> 
> Fix this by introducing a new read_index_for_repo function, which knows
> about the correct paths for the submodules.
> 
> The alternative would have been to make the callers pass in the base
> path for the split index, however that ended up being more complicated,
> and I think we want to converge towards using struct repository for
> things like these anyway.

Thanks for catching this, I'm not a user of split index myself which is
why I unfortunately overlooked this.  Definitely a good change.  I
really only have one nit below.

> 
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  cache.h      |  1 +
>  read-cache.c | 19 +++++++++++++++++--
>  repository.c |  2 +-
>  3 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/cache.h b/cache.h
> index cb5db7bf83..d42bea1ef7 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -614,6 +614,7 @@ extern int read_index_preload(struct index_state *, const struct pathspec *paths
>  extern int do_read_index(struct index_state *istate, const char *path,
>  			 int must_exist); /* for testting only! */
>  extern int read_index_from(struct index_state *, const char *path);
> +extern int read_index_for_repo(const struct repository *);
>  extern int is_index_unborn(struct index_state *);
>  extern int read_index_unmerged(struct index_state *);
>  
> diff --git a/read-cache.c b/read-cache.c
> index 2eb81a66b9..4d5c4ad79b 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -20,6 +20,7 @@
>  #include "split-index.h"
>  #include "utf8.h"
>  #include "fsmonitor.h"
> +#include "repository.h"
>  
>  /* Mask for the name length in ce_flags in the on-disk index */
>  
> @@ -1871,7 +1872,8 @@ static void freshen_shared_index(char *base_sha1_hex, int warn)
>  	free(shared_index);
>  }
>  
> -int read_index_from(struct index_state *istate, const char *path)
> +static int do_read_index_from(struct index_state *istate, const char *path,
> +			      const struct repository *repo)
>  {
>  	struct split_index *split_index;
>  	int ret;
> @@ -1896,7 +1898,10 @@ int read_index_from(struct index_state *istate, const char *path)
>  		split_index->base = xcalloc(1, sizeof(*split_index->base));
>  
>  	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> -	base_path = git_path("sharedindex.%s", base_sha1_hex);
> +	if (repo)
> +		base_path = repo_git_path(repo, "sharedindex.%s", base_sha1_hex);
> +	else
> +		base_path = git_path("sharedindex.%s", base_sha1_hex);
>  	ret = do_read_index(split_index->base, base_path, 1);
>  	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
>  		die("broken index, expect %s in %s, got %s",
> @@ -1909,6 +1914,16 @@ int read_index_from(struct index_state *istate, const char *path)
>  	return ret;
>  }
>  
> +int read_index_for_repo(const struct repository *repo)
> +{
> +	return do_read_index_from(repo->index, repo->index_file, repo);
> +}
> +
> +int read_index_from(struct index_state *istate, const char *path)
> +{
> +	return do_read_index_from(istate, path, NULL);
> +}

Instead of passing NULL and having to special case it in
'do_read_index_from()', how about we pass in 'the_repository'?

> +
>  int is_index_unborn(struct index_state *istate)
>  {
>  	return (!istate->cache_nr && !istate->timestamp.sec);
> diff --git a/repository.c b/repository.c
> index bb2fae5446..928b1f553d 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -229,5 +229,5 @@ int repo_read_index(struct repository *repo)
>  	if (!repo->index)
>  		repo->index = xcalloc(1, sizeof(*repo->index));
>  
> -	return read_index_from(repo->index, repo->index_file);
> +	return read_index_for_repo(repo);
>  }
> -- 
> 2.15.1.504.g5279b80103
> 

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 2/3] prune: fix pruning with multiple worktrees and split index
  2017-12-10 21:22 ` [PATCH 2/3] prune: fix pruning with multiple worktrees and split index Thomas Gummerer
@ 2017-12-11 19:09   ` Brandon Williams
  2017-12-11 21:39     ` Thomas Gummerer
  0 siblings, 1 reply; 94+ messages in thread
From: Brandon Williams @ 2017-12-11 19:09 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Christian Couder, Lars Schneider, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy

On 12/10, Thomas Gummerer wrote:
> be489d02d2 ("revision.c: --indexed-objects add objects from all
> worktrees", 2017-08-23) made sure that pruning takes objects from all
> worktrees into account.
> 
> It did that by reading the index of every worktree and adding the
> necessary index objects to the set of pending objects.  The index is
> read by read_index_from.  As mentioned in the previous commit,
> read_index_from depends on the CWD for the location of the split index,
> and add_index_objects_to_pending doesn't set that before using
> read_index_from.
> 
> Instead of using read_index_from, use repo_read_index, which is aware of
> the proper paths for the worktree.
> 
> This fixes t5304-prune when ran with GIT_TEST_SPLIT_INDEX set.
> 

I'm on the fence about this change.  I understand that this will ensure
that the proper objects aren't pruned when using a split index in the
presence of worktrees but I think the solution needs to be thought
through a bit more.

My big concern right now is the interaction of 'struct worktree's and
'struct repository'.  I'll try to highlight my concerns below.

> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
> 
> This also fixes t7009 when ran with GIT_TEST_SPLIT_INDEX.  I'm not
> quite sure why it is fixed by this.  Either way I tracked the failure
> down to f767178a5a ("Merge branch 'jk/no-null-sha1-in-cache-tree'",
> 2017-05-16).  Maybe Peff has an idea why this fixes that test?
> 
>  repository.c | 11 +++++++++++
>  repository.h |  2 ++
>  revision.c   | 13 ++++++++-----
>  3 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/repository.c b/repository.c
> index 928b1f553d..3c9bfbd1b8 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -2,6 +2,7 @@
>  #include "repository.h"
>  #include "config.h"
>  #include "submodule-config.h"
> +#include "worktree.h"
>  
>  /* The main repository */
>  static struct repository the_repo = {
> @@ -146,6 +147,16 @@ int repo_init(struct repository *repo, const char *gitdir, const char *worktree)
>  	return -1;
>  }
>  
> +/*
> + * Initialize 'repo' based on the provided worktree
> + * Return 0 upon success and a non-zero value upon failure.
> + */
> +int repo_worktree_init(struct repository *repo, struct worktree *worktree)
> +{
> +	return repo_init(repo, get_worktree_git_dir(worktree),
> +			 worktree->path);
> +}

My first concern is the use of 'get_worktree_git_dir()'.  Under the hood
it calls 'get_git_dir()', 'get_git_common_dir()', and
'git_common_path()' which rely on global state as stored in
'the_repository'.  So how does one initialize a repository struct (using
this initializer) using a worktree from a repository other than the
global 'the_repository' struct?  I'm not sure I have an answer right
now, but its an issue that needs to be thought through before we head
down this road.

Just thinking to myself, Does it make sense to have worktree's as a
separate struct or to have them stored in 'struct repository' in some
way?  Shouldn't a repository struct have a way to interact with all of
its worktrees?  How would initializing a repository struct for every
worktree work once we migrate the object store to be stored in 'struct
repoisotry'?  Shouldn't every worktree share the same object store
in-memory like they do on-disk?

> +
>  /*
>   * Initialize 'submodule' as the submodule given by 'path' in parent repository
>   * 'superproject'.
> diff --git a/repository.h b/repository.h
> index 7f5e24a0a2..2adeb05bf4 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -4,6 +4,7 @@
>  struct config_set;
>  struct index_state;
>  struct submodule_cache;
> +struct worktree;
>  
>  struct repository {
>  	/* Environment */
> @@ -87,6 +88,7 @@ extern struct repository *the_repository;
>  extern void repo_set_gitdir(struct repository *repo, const char *path);
>  extern void repo_set_worktree(struct repository *repo, const char *path);
>  extern int repo_init(struct repository *repo, const char *gitdir, const char *worktree);
> +extern int repo_worktree_init(struct repository *repo, struct worktree *worktree);
>  extern int repo_submodule_init(struct repository *submodule,
>  			       struct repository *superproject,
>  			       const char *path);
> diff --git a/revision.c b/revision.c
> index e2e691dd5a..9d8d9b96d1 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -22,6 +22,7 @@
>  #include "packfile.h"
>  #include "worktree.h"
>  #include "argv-array.h"
> +#include "repository.h"
>  
>  volatile show_early_output_fn_t show_early_output;
>  
> @@ -1346,15 +1347,17 @@ void add_index_objects_to_pending(struct rev_info *revs, unsigned int flags)
>  	worktrees = get_worktrees(0);
>  	for (p = worktrees; *p; p++) {
>  		struct worktree *wt = *p;
> -		struct index_state istate = { NULL };
> +		struct repository *repo;
>  
> +		repo = xmalloc(sizeof(struct repository));

This was allocated but never freed, was that intentional?

>  		if (wt->is_current)
>  			continue; /* current index already taken care of */
> +		if (repo_worktree_init(repo, wt))
> +			BUG("couldn't initialize repository object from worktree");
>  
> -		if (read_index_from(&istate,
> -				    worktree_git_path(wt, "index")) > 0)
> -			do_add_index_objects_to_pending(revs, &istate);
> -		discard_index(&istate);
> +		if (repo_read_index(repo) > 0)
> +			do_add_index_objects_to_pending(revs, repo->index);
> +		discard_index(repo->index);

One we have separate object stores per-repository how would we handle
this since this pruning should only work on a single repository's object
store?

>  	}
>  	free_worktrees(worktrees);
>  }
> -- 
> 2.15.1.504.g5279b80103
> 

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/3] repository: fix repo_read_index with submodules
  2017-12-11 18:54   ` Brandon Williams
@ 2017-12-11 20:37     ` Thomas Gummerer
  0 siblings, 0 replies; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-11 20:37 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, Christian Couder, Lars Schneider, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy

On 12/11, Brandon Williams wrote:
> On 12/10, Thomas Gummerer wrote:
> > repo_read_index calls read_index_from, which takes an path argument for
> > the location of the index file.  For the split index however it relies
> > on the current working directory to construct the path using git_path.
> > 
> > repo_read_index calls read_index_from with the full path for the index
> > file, however it doesn't change the cwd, so when split index mode is
> > turned on, read_index_from can't find the file for the split index.
> > 
> > For example t3007-ls-files-recurse-submodules.sh was broken with
> > GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
> > object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
> > broken in a similar manner, probably by introducing struct repository
> > there, although I didn't track down the exact commit for that.
> > 
> > Fix this by introducing a new read_index_for_repo function, which knows
> > about the correct paths for the submodules.
> > 
> > The alternative would have been to make the callers pass in the base
> > path for the split index, however that ended up being more complicated,
> > and I think we want to converge towards using struct repository for
> > things like these anyway.
> 
> Thanks for catching this, I'm not a user of split index myself which is
> why I unfortunately overlooked this.  Definitely a good change.  I
> really only have one nit below.

Me neither, I just remember to run the split index tests every once in
a while, which is also why it's taken so long to catch this.
Hopefully after we fixed this we can get travis to run the tests,
which should help :)

> > 
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >  cache.h      |  1 +
> >  read-cache.c | 19 +++++++++++++++++--
> >  repository.c |  2 +-
> >  3 files changed, 19 insertions(+), 3 deletions(-)
> > 
> > diff --git a/cache.h b/cache.h
> > index cb5db7bf83..d42bea1ef7 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -614,6 +614,7 @@ extern int read_index_preload(struct index_state *, const struct pathspec *paths
> >  extern int do_read_index(struct index_state *istate, const char *path,
> >  			 int must_exist); /* for testting only! */
> >  extern int read_index_from(struct index_state *, const char *path);
> > +extern int read_index_for_repo(const struct repository *);
> >  extern int is_index_unborn(struct index_state *);
> >  extern int read_index_unmerged(struct index_state *);
> >  
> > diff --git a/read-cache.c b/read-cache.c
> > index 2eb81a66b9..4d5c4ad79b 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -20,6 +20,7 @@
> >  #include "split-index.h"
> >  #include "utf8.h"
> >  #include "fsmonitor.h"
> > +#include "repository.h"
> >  
> >  /* Mask for the name length in ce_flags in the on-disk index */
> >  
> > @@ -1871,7 +1872,8 @@ static void freshen_shared_index(char *base_sha1_hex, int warn)
> >  	free(shared_index);
> >  }
> >  
> > -int read_index_from(struct index_state *istate, const char *path)
> > +static int do_read_index_from(struct index_state *istate, const char *path,
> > +			      const struct repository *repo)
> >  {
> >  	struct split_index *split_index;
> >  	int ret;
> > @@ -1896,7 +1898,10 @@ int read_index_from(struct index_state *istate, const char *path)
> >  		split_index->base = xcalloc(1, sizeof(*split_index->base));
> >  
> >  	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> > -	base_path = git_path("sharedindex.%s", base_sha1_hex);
> > +	if (repo)
> > +		base_path = repo_git_path(repo, "sharedindex.%s", base_sha1_hex);
> > +	else
> > +		base_path = git_path("sharedindex.%s", base_sha1_hex);
> >  	ret = do_read_index(split_index->base, base_path, 1);
> >  	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
> >  		die("broken index, expect %s in %s, got %s",
> > @@ -1909,6 +1914,16 @@ int read_index_from(struct index_state *istate, const char *path)
> >  	return ret;
> >  }
> >  
> > +int read_index_for_repo(const struct repository *repo)
> > +{
> > +	return do_read_index_from(repo->index, repo->index_file, repo);
> > +}
> > +
> > +int read_index_from(struct index_state *istate, const char *path)
> > +{
> > +	return do_read_index_from(istate, path, NULL);
> > +}
> 
> Instead of passing NULL and having to special case it in
> 'do_read_index_from()', how about we pass in 'the_repository'?

I think that makes sense the only way this function used to work with
split index turned on is if it's called from the main repository, so
just passing through 'the_repository' would have the function behave
the exact same way as before.

> > +
> >  int is_index_unborn(struct index_state *istate)
> >  {
> >  	return (!istate->cache_nr && !istate->timestamp.sec);
> > diff --git a/repository.c b/repository.c
> > index bb2fae5446..928b1f553d 100644
> > --- a/repository.c
> > +++ b/repository.c
> > @@ -229,5 +229,5 @@ int repo_read_index(struct repository *repo)
> >  	if (!repo->index)
> >  		repo->index = xcalloc(1, sizeof(*repo->index));
> >  
> > -	return read_index_from(repo->index, repo->index_file);
> > +	return read_index_for_repo(repo);
> >  }
> > -- 
> > 2.15.1.504.g5279b80103
> > 
> 
> -- 
> Brandon Williams

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-10 21:22 ` [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
  2017-12-10 23:37   ` Eric Sunshine
@ 2017-12-11 21:09   ` SZEDER Gábor
  2017-12-11 21:42     ` Thomas Gummerer
  1 sibling, 1 reply; 94+ messages in thread
From: SZEDER Gábor @ 2017-12-11 21:09 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: SZEDER Gábor, Christian Couder, Lars Schneider,
	Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, git

> Make sure that split index doesn't get broken, by running it on travis
> CI.
> 
> Run the test suite with split index enabled in linux 64 bit mode, and
> leave split index turned off in 32-bit mode.

This doesn't accurately describe what the patch does.
Travis CI runs three 64 bit Linux build jobs for us: one compiled with
Clang, one with GCC, and one with GETTEXT_POISON enabled.  This patch
enables split index only in the latter build job, but not in the Clang
and GCC build jobs.

>  The laternative would be
> to add an extra target in the matrix, enabling split index mode, but
> that would only use additional cycles on travis and would not bring that
> much benefit, as we are still running the test suite in the "vanilla"
> version in the 32-bit mode.
> 
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  .travis.yml | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/.travis.yml b/.travis.yml
> index 281f101f31..c83c766dee 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -39,7 +39,7 @@ env:
>  
>  matrix:
>    include:
> -    - env: GETTEXT_POISON=YesPlease
> +    - env: GETTEXT_POISON=YesPlease GIT_TEST_SPLIT_INDEX=YesPlease
>        os: linux
>        compiler:
>        addons:
> -- 
> 2.15.1.504.g5279b80103

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 2/3] prune: fix pruning with multiple worktrees and split index
  2017-12-11 19:09   ` Brandon Williams
@ 2017-12-11 21:39     ` Thomas Gummerer
  0 siblings, 0 replies; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-11 21:39 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, Christian Couder, Lars Schneider, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy

On 12/11, Brandon Williams wrote:
> On 12/10, Thomas Gummerer wrote:
> > be489d02d2 ("revision.c: --indexed-objects add objects from all
> > worktrees", 2017-08-23) made sure that pruning takes objects from all
> > worktrees into account.
> > 
> > It did that by reading the index of every worktree and adding the
> > necessary index objects to the set of pending objects.  The index is
> > read by read_index_from.  As mentioned in the previous commit,
> > read_index_from depends on the CWD for the location of the split index,
> > and add_index_objects_to_pending doesn't set that before using
> > read_index_from.
> > 
> > Instead of using read_index_from, use repo_read_index, which is aware of
> > the proper paths for the worktree.
> > 
> > This fixes t5304-prune when ran with GIT_TEST_SPLIT_INDEX set.
> > 
> 
> I'm on the fence about this change.  I understand that this will ensure
> that the proper objects aren't pruned when using a split index in the
> presence of worktrees but I think the solution needs to be thought
> through a bit more.
> 
> My big concern right now is the interaction of 'struct worktree's and
> 'struct repository'.  I'll try to highlight my concerns below.

Thanks for the review!  My thoughts on the concerns are below.

> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> > 
> > This also fixes t7009 when ran with GIT_TEST_SPLIT_INDEX.  I'm not
> > quite sure why it is fixed by this.  Either way I tracked the failure
> > down to f767178a5a ("Merge branch 'jk/no-null-sha1-in-cache-tree'",
> > 2017-05-16).  Maybe Peff has an idea why this fixes that test?
> > 
> >  repository.c | 11 +++++++++++
> >  repository.h |  2 ++
> >  revision.c   | 13 ++++++++-----
> >  3 files changed, 21 insertions(+), 5 deletions(-)
> > 
> > diff --git a/repository.c b/repository.c
> > index 928b1f553d..3c9bfbd1b8 100644
> > --- a/repository.c
> > +++ b/repository.c
> > @@ -2,6 +2,7 @@
> >  #include "repository.h"
> >  #include "config.h"
> >  #include "submodule-config.h"
> > +#include "worktree.h"
> >  
> >  /* The main repository */
> >  static struct repository the_repo = {
> > @@ -146,6 +147,16 @@ int repo_init(struct repository *repo, const char *gitdir, const char *worktree)
> >  	return -1;
> >  }
> >  
> > +/*
> > + * Initialize 'repo' based on the provided worktree
> > + * Return 0 upon success and a non-zero value upon failure.
> > + */
> > +int repo_worktree_init(struct repository *repo, struct worktree *worktree)
> > +{
> > +	return repo_init(repo, get_worktree_git_dir(worktree),
> > +			 worktree->path);
> > +}
> 
> My first concern is the use of 'get_worktree_git_dir()'.  Under the hood
> it calls 'get_git_dir()', 'get_git_common_dir()', and
> 'git_common_path()' which rely on global state as stored in
> 'the_repository'.  So how does one initialize a repository struct (using
> this initializer) using a worktree from a repository other than the
> global 'the_repository' struct?  I'm not sure I have an answer right
> now, but its an issue that needs to be thought through before we head
> down this road.

The main reason for just using 'get_worktree_git_dir()' was because it
exists and it does exactly what I needed.  If we ever have the need to
initialize a repository struct for a worktree from another repository
struct, I would imagine we introduce a new function
'repo_get_worktree_git_dir()', which takes a struct repository as
input and returns the worktree git dir based on that.

And then we'd have to add a different initializer
'repo_worktree_init_from_repo()' or something like that, where we
could initialize a worktree from a repository struct different from
'the_repository'.  But maybe there are some complications there that I
didn't think about?

> Just thinking to myself, Does it make sense to have worktree's as a
> separate struct or to have them stored in 'struct repository' in some
> way?

I'm not sure I have a good answer for this.  Looking at the libgit2
source, which has a repository struct for (I assume at least, I'm not
very familiar with the libgit2 source :)), they have a repository
struct for each worktree, with a flag specifying whether the struct is
for a worktree, or a "normal" git repository/the main worktree.

Obviously that doesn't mean we should do it the same way, but it
probably also doesn't hurt to look at prior art.

> Shouldn't a repository struct have a way to interact with all of
> its worktrees?

It could go either way I guess depending on how we want to design it.
If we want the repository struct to interact with all worktrees, it
seems to me like we would almost need some kind of 'struct
super_repository', which then contains a list of 'struct repository's
representing each worktree, probably including the main worktree.

But then again in normal operation we probably only want the
repository struct for the current repository we're working with, not
all of the worktrees.  I suspect it's only in some special cases where
we want to have all worktrees in memory at once (such as in the prune
case).

Or how would you haven envisioned this?

> How would initializing a repository struct for every
> worktree work once we migrate the object store to be stored in 'struct
> repoisotry'?  Shouldn't every worktree share the same object store
> in-memory like they do on-disk?

I haven't thought this far.  They probably want to share the same
object store, if they all want the object store to be loaded.  But
we can do that either way, whether we have one 'struct repository'
that has links to the worktrees, or we have multiple 'struct
repository's, which each have a ref-counted pointer to the object
store or something like that.

> > +
> >  /*
> >   * Initialize 'submodule' as the submodule given by 'path' in parent repository
> >   * 'superproject'.
> > diff --git a/repository.h b/repository.h
> > index 7f5e24a0a2..2adeb05bf4 100644
> > --- a/repository.h
> > +++ b/repository.h
> > @@ -4,6 +4,7 @@
> >  struct config_set;
> >  struct index_state;
> >  struct submodule_cache;
> > +struct worktree;
> >  
> >  struct repository {
> >  	/* Environment */
> > @@ -87,6 +88,7 @@ extern struct repository *the_repository;
> >  extern void repo_set_gitdir(struct repository *repo, const char *path);
> >  extern void repo_set_worktree(struct repository *repo, const char *path);
> >  extern int repo_init(struct repository *repo, const char *gitdir, const char *worktree);
> > +extern int repo_worktree_init(struct repository *repo, struct worktree *worktree);
> >  extern int repo_submodule_init(struct repository *submodule,
> >  			       struct repository *superproject,
> >  			       const char *path);
> > diff --git a/revision.c b/revision.c
> > index e2e691dd5a..9d8d9b96d1 100644
> > --- a/revision.c
> > +++ b/revision.c
> > @@ -22,6 +22,7 @@
> >  #include "packfile.h"
> >  #include "worktree.h"
> >  #include "argv-array.h"
> > +#include "repository.h"
> >  
> >  volatile show_early_output_fn_t show_early_output;
> >  
> > @@ -1346,15 +1347,17 @@ void add_index_objects_to_pending(struct rev_info *revs, unsigned int flags)
> >  	worktrees = get_worktrees(0);
> >  	for (p = worktrees; *p; p++) {
> >  		struct worktree *wt = *p;
> > -		struct index_state istate = { NULL };
> > +		struct repository *repo;
> >  
> > +		repo = xmalloc(sizeof(struct repository));
> 
> This was allocated but never freed, was that intentional?

No, that was an oversight, will fix, thanks!

> >  		if (wt->is_current)
> >  			continue; /* current index already taken care of */
> > +		if (repo_worktree_init(repo, wt))
> > +			BUG("couldn't initialize repository object from worktree");
> >  
> > -		if (read_index_from(&istate,
> > -				    worktree_git_path(wt, "index")) > 0)
> > -			do_add_index_objects_to_pending(revs, &istate);
> > -		discard_index(&istate);
> > +		if (repo_read_index(repo) > 0)
> > +			do_add_index_objects_to_pending(revs, repo->index);
> > +		discard_index(repo->index);
> 
> One we have separate object stores per-repository how would we handle
> this since this pruning should only work on a single repository's object
> store?

That depends a lot on how we design the object store and how it's held
by the repository struct.  If we only have one in-memory
representation of it, the answer is obvious.  If we have multiple
in-memory representations of it at the same time, things will get a
bit more hairy.  But until we have at least a design of how that would
look like it's quite hard to tell I'd say.

> >  	}
> >  	free_worktrees(worktrees);
> >  }
> > -- 
> > 2.15.1.504.g5279b80103
> > 
> 
> -- 
> Brandon Williams

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-11 21:09   ` SZEDER Gábor
@ 2017-12-11 21:42     ` Thomas Gummerer
  2017-12-12 15:54       ` Lars Schneider
  0 siblings, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-11 21:42 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Christian Couder, Lars Schneider, Brandon Williams, Jeff King,
	Junio C Hamano, Nguyễn Thái Ngọc Duy, git

On 12/11, SZEDER Gábor wrote:
> > Make sure that split index doesn't get broken, by running it on travis
> > CI.
> > 
> > Run the test suite with split index enabled in linux 64 bit mode, and
> > leave split index turned off in 32-bit mode.
> 
> This doesn't accurately describe what the patch does.
> Travis CI runs three 64 bit Linux build jobs for us: one compiled with
> Clang, one with GCC, and one with GETTEXT_POISON enabled.  This patch
> enables split index only in the latter build job, but not in the Clang
> and GCC build jobs.

You're right, it's my first time using travis CI and I got confused
about how the .travis.yml works, thanks for catching that.  Will
re-phrase the commit message.

> >  The laternative would be
> > to add an extra target in the matrix, enabling split index mode, but
> > that would only use additional cycles on travis and would not bring that
> > much benefit, as we are still running the test suite in the "vanilla"
> > version in the 32-bit mode.
> > 
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >  .travis.yml | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/.travis.yml b/.travis.yml
> > index 281f101f31..c83c766dee 100644
> > --- a/.travis.yml
> > +++ b/.travis.yml
> > @@ -39,7 +39,7 @@ env:
> >  
> >  matrix:
> >    include:
> > -    - env: GETTEXT_POISON=YesPlease
> > +    - env: GETTEXT_POISON=YesPlease GIT_TEST_SPLIT_INDEX=YesPlease
> >        os: linux
> >        compiler:
> >        addons:
> > -- 
> > 2.15.1.504.g5279b80103

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-11 21:42     ` Thomas Gummerer
@ 2017-12-12 15:54       ` Lars Schneider
  2017-12-12 19:15         ` Junio C Hamano
  0 siblings, 1 reply; 94+ messages in thread
From: Lars Schneider @ 2017-12-12 15:54 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: SZEDER Gábor, Christian Couder, Brandon Williams, Jeff King,
	Junio C Hamano, Nguyễn Thái Ngọc Duy, git


> On 11 Dec 2017, at 22:42, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> 
> On 12/11, SZEDER Gábor wrote:
>>> Make sure that split index doesn't get broken, by running it on travis
>>> CI.
>>> 
>>> Run the test suite with split index enabled in linux 64 bit mode, and
>>> leave split index turned off in 32-bit mode.
>> 
>> This doesn't accurately describe what the patch does.
>> Travis CI runs three 64 bit Linux build jobs for us: one compiled with
>> Clang, one with GCC, and one with GETTEXT_POISON enabled.  This patch
>> enables split index only in the latter build job, but not in the Clang
>> and GCC build jobs.
> 
> You're right, it's my first time using travis CI and I got confused
> about how the .travis.yml works, thanks for catching that.  Will
> re-phrase the commit message.

Szeder is spot on. If you fix up the message, then this patch looks
perfect! Pragmatic (= very good!) idea to combine GITTEXT_POISON with 
GIT_TEST_SPLIT_INDEX :-)

Thanks,
Lars


> 
>>> The laternative would be
>>> to add an extra target in the matrix, enabling split index mode, but
>>> that would only use additional cycles on travis and would not bring that
>>> much benefit, as we are still running the test suite in the "vanilla"
>>> version in the 32-bit mode.
>>> 
>>> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
>>> ---
>>> .travis.yml | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/.travis.yml b/.travis.yml
>>> index 281f101f31..c83c766dee 100644
>>> --- a/.travis.yml
>>> +++ b/.travis.yml
>>> @@ -39,7 +39,7 @@ env:
>>> 
>>> matrix:
>>>   include:
>>> -    - env: GETTEXT_POISON=YesPlease
>>> +    - env: GETTEXT_POISON=YesPlease GIT_TEST_SPLIT_INDEX=YesPlease
>>>       os: linux
>>>       compiler:
>>>       addons:
>>> -- 
>>> 2.15.1.504.g5279b80103


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-12 15:54       ` Lars Schneider
@ 2017-12-12 19:15         ` Junio C Hamano
  2017-12-12 20:15           ` Thomas Gummerer
  2017-12-13 17:21           ` Lars Schneider
  0 siblings, 2 replies; 94+ messages in thread
From: Junio C Hamano @ 2017-12-12 19:15 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Thomas Gummerer, SZEDER Gábor, Christian Couder,
	Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, git

Lars Schneider <larsxschneider@gmail.com> writes:

>> You're right, it's my first time using travis CI and I got confused
>> about how the .travis.yml works, thanks for catching that.  Will
>> re-phrase the commit message.
>
> Szeder is spot on. If you fix up the message, then this patch looks
> perfect! Pragmatic (= very good!) idea to combine GITTEXT_POISON with 
> GIT_TEST_SPLIT_INDEX :-)

I am failing to guess the real intent of the smiley here.

If split-index code is so easy to break, I do not think it is a good
idea to combine it into the poison build.  In fact, the poison test
is useless on a codebase where other/real breakages are expected to
exist, because it is about seeing messages meant for non-humans are
not passed to the _() mechanism by sloppy coding, and the way it
does so is to corrupt all the messages that come through the _()
mechanism.  If we do not even produce a message when a correct code
_should_ produce one, poison test would catch nothing useful.

I wonder if it makes more sense to update ci/run-tests.sh so that
its final step is run twice with different settings, like so?

 ci/run-tests.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ci/run-tests.sh b/ci/run-tests.sh
index f0c743de94..15a5f5a6cc 100755
--- a/ci/run-tests.sh
+++ b/ci/run-tests.sh
@@ -8,3 +8,4 @@
 mkdir -p $HOME/travis-cache
 ln -s $HOME/travis-cache/.prove t/.prove
 make --quiet test
+GIT_TEST_SPLIT_INDEX=LetsTryIt make --quiet test

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-12 19:15         ` Junio C Hamano
@ 2017-12-12 20:15           ` Thomas Gummerer
  2017-12-12 20:51             ` Junio C Hamano
  2017-12-13 17:21           ` Lars Schneider
  1 sibling, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-12 20:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Lars Schneider, SZEDER Gábor, Christian Couder,
	Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, git

On 12/12, Junio C Hamano wrote:
> Lars Schneider <larsxschneider@gmail.com> writes:
> 
> >> You're right, it's my first time using travis CI and I got confused
> >> about how the .travis.yml works, thanks for catching that.  Will
> >> re-phrase the commit message.
> >
> > Szeder is spot on. If you fix up the message, then this patch looks
> > perfect! Pragmatic (= very good!) idea to combine GITTEXT_POISON with 
> > GIT_TEST_SPLIT_INDEX :-)
> 
> I am failing to guess the real intent of the smiley here.
> 
> If split-index code is so easy to break, I do not think it is a good
> idea to combine it into the poison build.  In fact, the poison test
> is useless on a codebase where other/real breakages are expected to
> exist, because it is about seeing messages meant for non-humans are
> not passed to the _() mechanism by sloppy coding, and the way it
> does so is to corrupt all the messages that come through the _()
> mechanism.  If we do not even produce a message when a correct code
> _should_ produce one, poison test would catch nothing useful.

The breakages wen the split-index code fails tend to break things in
much more obvious manners than a wrong message, usually git ends up
dying if it gets broken.  Both of the bugs that were fixed here would
have been caught with the change in my patch.

But yeah I can see the argument that it doesn't give us a guarantee
that it catches all things the test suite could catch.

> I wonder if it makes more sense to update ci/run-tests.sh so that
> its final step is run twice with different settings, like so?

I kind of wanted to avoid that, because it ends up running the test
suite twice on travis for every test, which seems a bit overkill.  But
I don't exactly how worried we are about cycles on travis (I don't
check it very often personally, so I don't really know).  If we aren't
worried about cycles what you have below would certainly make sense.

>  ci/run-tests.sh | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/ci/run-tests.sh b/ci/run-tests.sh
> index f0c743de94..15a5f5a6cc 100755
> --- a/ci/run-tests.sh
> +++ b/ci/run-tests.sh
> @@ -8,3 +8,4 @@
>  mkdir -p $HOME/travis-cache
>  ln -s $HOME/travis-cache/.prove t/.prove
>  make --quiet test
> +GIT_TEST_SPLIT_INDEX=LetsTryIt make --quiet test

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-12 20:15           ` Thomas Gummerer
@ 2017-12-12 20:51             ` Junio C Hamano
  2017-12-13 23:21               ` Thomas Gummerer
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2017-12-12 20:51 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Lars Schneider, SZEDER Gábor, Christian Couder,
	Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, git

Thomas Gummerer <t.gummerer@gmail.com> writes:

>
> The breakages wen the split-index code fails tend to break things in
> much more obvious manners than a wrong message, usually git ends up
> dying if it gets broken.  Both of the bugs that were fixed here would
> have been caught with the change in my patch.
>
> But yeah I can see the argument that it doesn't give us a guarantee
> that it catches all things the test suite could catch.

I think you misunderstood me.  When split index is much easier to
break than poison tests, combining them together would hurt the test
coverage of poison tests.  If you value poison tests much more than
how well split index mode works, that is a worse outcome.

>> I wonder if it makes more sense to update ci/run-tests.sh so that
>> its final step is run twice with different settings, like so?
>
> I kind of wanted to avoid that, because it ends up running the test
> suite twice on travis for every test, which seems a bit overkill.  But
> I don't exactly how worried we are about cycles on travis (I don't
> check it very often personally, so I don't really know).  If we aren't
> worried about cycles what you have below would certainly make sense.

I think 64-bit gcc/clang builds tend to cost us about 10-20 minutes
each, and 32-bit linux builds about 10 minutes or so.  I wonder if
it makes sense to do the "run twice" for only one of 64-bit builds
(perhaps the clang one, as I suspect 32-bit linux one uses gcc) and
the 32-bit linux build, and nowhere else.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-12 19:15         ` Junio C Hamano
  2017-12-12 20:15           ` Thomas Gummerer
@ 2017-12-13 17:21           ` Lars Schneider
  2017-12-13 17:38             ` Junio C Hamano
  1 sibling, 1 reply; 94+ messages in thread
From: Lars Schneider @ 2017-12-13 17:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Thomas Gummerer, SZEDER Gábor, Christian Couder,
	Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, git


> On 12 Dec 2017, at 20:15, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Lars Schneider <larsxschneider@gmail.com> writes:
> 
>>> You're right, it's my first time using travis CI and I got confused
>>> about how the .travis.yml works, thanks for catching that.  Will
>>> re-phrase the commit message.
>> 
>> Szeder is spot on. If you fix up the message, then this patch looks
>> perfect! Pragmatic (= very good!) idea to combine GITTEXT_POISON with 
>> GIT_TEST_SPLIT_INDEX :-)
> 
> I am failing to guess the real intent of the smiley here.

No real reason. I was just happy to see that Travis CI seems to
be useful for the Git project.


> If split-index code is so easy to break, I do not think it is a good
> idea to combine it into the poison build.  In fact, the poison test
> is useless on a codebase where other/real breakages are expected to
> exist, because it is about seeing messages meant for non-humans are
> not passed to the _() mechanism by sloppy coding, and the way it
> does so is to corrupt all the messages that come through the _()
> mechanism.  If we do not even produce a message when a correct code
> _should_ produce one, poison test would catch nothing useful.
> 
> I wonder if it makes more sense to update ci/run-tests.sh so that
> its final step is run twice with different settings, like so?

Agreed - I didn't think it through. Let's keep it separate then.

I think your solution points into the right direction.
Right now we have the following test matrix:

1. Linux - clang
2. Linux - gcc
3. Mac - clang
4. Mac - gcc
5. Linux - gcc - GET_TEXT_POISION
6. Linux - gcc - 32bit
7. Windows

AFAIK your solution would run the split index test for 
1, 2, 3, and 4. I think that is too much.

1 runs the fastest and I would like to keep it that way
to get a quick "general" result. I think only 2 should be
extended in the way you are suggesting. We could run
the tests with different env variables there. What else
do we have besides GIT_TEST_SPLIT_INDEX?

Would that work for everyone?

- Lars


> ci/run-tests.sh | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/ci/run-tests.sh b/ci/run-tests.sh
> index f0c743de94..15a5f5a6cc 100755
> --- a/ci/run-tests.sh
> +++ b/ci/run-tests.sh
> @@ -8,3 +8,4 @@
> mkdir -p $HOME/travis-cache
> ln -s $HOME/travis-cache/.prove t/.prove
> make --quiet test
> +GIT_TEST_SPLIT_INDEX=LetsTryIt make --quiet test


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-13 17:21           ` Lars Schneider
@ 2017-12-13 17:38             ` Junio C Hamano
  2017-12-13 17:46               ` Lars Schneider
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2017-12-13 17:38 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Thomas Gummerer, SZEDER Gábor, Christian Couder,
	Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, git

Lars Schneider <larsxschneider@gmail.com> writes:

> I think your solution points into the right direction.
> Right now we have the following test matrix:
>
> 1. Linux - clang
> 2. Linux - gcc
> 3. Mac - clang
> 4. Mac - gcc
> 5. Linux - gcc - GET_TEXT_POISION
> 6. Linux - gcc - 32bit
> 7. Windows
>
> AFAIK your solution would run the split index test for 
> 1, 2, 3, and 4. I think that is too much.

Not that it matters too much, but I meant to add it to 1. and
6. when I said "only one of 64-bit build plus 32-bit one".  

> 1 runs the fastest and I would like to keep it that way
> to get a quick "general" result. I think only 2 should be
> extended in the way you are suggesting. We could run
> the tests with different env variables there. What else
> do we have besides GIT_TEST_SPLIT_INDEX?
>
> Would that work for everyone?

I am OK to make 2. use split-index (which unfortunately would mean
we lose tests without split-index under gcc), or add 2.5 that is a
copy of 2. plus split-index.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-13 17:38             ` Junio C Hamano
@ 2017-12-13 17:46               ` Lars Schneider
  2017-12-13 23:28                 ` Thomas Gummerer
  0 siblings, 1 reply; 94+ messages in thread
From: Lars Schneider @ 2017-12-13 17:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Thomas Gummerer, SZEDER Gábor, Christian Couder,
	Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, git


> On 13 Dec 2017, at 18:38, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Lars Schneider <larsxschneider@gmail.com> writes:
> 
>> I think your solution points into the right direction.
>> Right now we have the following test matrix:
>> 
>> 1. Linux - clang
>> 2. Linux - gcc
>> 3. Mac - clang
>> 4. Mac - gcc
>> 5. Linux - gcc - GET_TEXT_POISION
>> 6. Linux - gcc - 32bit
>> 7. Windows
>> 
>> AFAIK your solution would run the split index test for 
>> 1, 2, 3, and 4. I think that is too much.
> 
> Not that it matters too much, but I meant to add it to 1. and
> 6. when I said "only one of 64-bit build plus 32-bit one".  

Ah. Sorry, I didn't get that.


>> 1 runs the fastest and I would like to keep it that way
>> to get a quick "general" result. I think only 2 should be
>> extended in the way you are suggesting. We could run
>> the tests with different env variables there. What else
>> do we have besides GIT_TEST_SPLIT_INDEX?
>> 
>> Would that work for everyone?
> 
> I am OK to make 2. use split-index (which unfortunately would mean
> we lose tests without split-index under gcc), or add 2.5 that is a
> copy of 2. plus split-index.


I think I experessed myself poorly. As far as I understand it,
GIT_TEST_SPLIT_INDEX is an environment variable that only needs
to be set at test time and not at compile time. Is this right?

If yes, my idea for 2. is as follows:
- build Git with gcc
- run tests with "make --quiet test"
- run tests with "GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test"

This should be quicker than a new 2.5 target because we don't need to
spin up the machine and build Git. Plus, we could run the tests
a few more times with other test flags.

- Lars

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-12 20:51             ` Junio C Hamano
@ 2017-12-13 23:21               ` Thomas Gummerer
  0 siblings, 0 replies; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-13 23:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Lars Schneider, SZEDER Gábor, Christian Couder,
	Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, git

On 12/12, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> >
> > The breakages wen the split-index code fails tend to break things in
> > much more obvious manners than a wrong message, usually git ends up
> > dying if it gets broken.  Both of the bugs that were fixed here would
> > have been caught with the change in my patch.
> >
> > But yeah I can see the argument that it doesn't give us a guarantee
> > that it catches all things the test suite could catch.
> 
> I think you misunderstood me.  When split index is much easier to
> break than poison tests, combining them together would hurt the test
> coverage of poison tests.  If you value poison tests much more than
> how well split index mode works, that is a worse outcome.

Sorry, indeed I misunderstood you.  I do agree with the above.

> >> I wonder if it makes more sense to update ci/run-tests.sh so that
> >> its final step is run twice with different settings, like so?
> >
> > I kind of wanted to avoid that, because it ends up running the test
> > suite twice on travis for every test, which seems a bit overkill.  But
> > I don't exactly how worried we are about cycles on travis (I don't
> > check it very often personally, so I don't really know).  If we aren't
> > worried about cycles what you have below would certainly make sense.
> 
> I think 64-bit gcc/clang builds tend to cost us about 10-20 minutes
> each, and 32-bit linux builds about 10 minutes or so.  I wonder if
> it makes sense to do the "run twice" for only one of 64-bit builds
> (perhaps the clang one, as I suspect 32-bit linux one uses gcc) and
> the 32-bit linux build, and nowhere else.

Yeah I'd be happy with something like that.  And I see there's some
more discussion about it in another part of the thread, will follow up
there.  Thanks!

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-13 17:46               ` Lars Schneider
@ 2017-12-13 23:28                 ` Thomas Gummerer
  0 siblings, 0 replies; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-13 23:28 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Junio C Hamano, SZEDER Gábor, Christian Couder,
	Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, git

On 12/13, Lars Schneider wrote:
> 
> > On 13 Dec 2017, at 18:38, Junio C Hamano <gitster@pobox.com> wrote:
> > 
> > Lars Schneider <larsxschneider@gmail.com> writes:
> > 
> >> I think your solution points into the right direction.
> >> Right now we have the following test matrix:
> >> 
> >> 1. Linux - clang
> >> 2. Linux - gcc
> >> 3. Mac - clang
> >> 4. Mac - gcc
> >> 5. Linux - gcc - GET_TEXT_POISION
> >> 6. Linux - gcc - 32bit
> >> 7. Windows
> >> 
> >> AFAIK your solution would run the split index test for 
> >> 1, 2, 3, and 4. I think that is too much.
> > 
> > Not that it matters too much, but I meant to add it to 1. and
> > 6. when I said "only one of 64-bit build plus 32-bit one".  
> 
> Ah. Sorry, I didn't get that.
> 
> 
> >> 1 runs the fastest and I would like to keep it that way
> >> to get a quick "general" result. I think only 2 should be
> >> extended in the way you are suggesting. We could run
> >> the tests with different env variables there. What else
> >> do we have besides GIT_TEST_SPLIT_INDEX?
> >> 
> >> Would that work for everyone?
> > 
> > I am OK to make 2. use split-index (which unfortunately would mean
> > we lose tests without split-index under gcc), or add 2.5 that is a
> > copy of 2. plus split-index.
> 
> 
> I think I experessed myself poorly. As far as I understand it,
> GIT_TEST_SPLIT_INDEX is an environment variable that only needs
> to be set at test time and not at compile time. Is this right?

Yes, that's correct.

> If yes, my idea for 2. is as follows:
> - build Git with gcc
> - run tests with "make --quiet test"
> - run tests with "GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test"
> 
> This should be quicker than a new 2.5 target because we don't need to
> spin up the machine and build Git. Plus, we could run the tests
> a few more times with other test flags.

I'd be happy with that.  Thanks for the discussion on this, will
change this in my re-roll.

> - Lars

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v2 0/3] fixes for split index mode
  2017-12-10 21:21 [PATCH 0/3] fixes for split index mode Thomas Gummerer
                   ` (2 preceding siblings ...)
  2017-12-10 21:22 ` [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
@ 2017-12-17 22:51 ` Thomas Gummerer
  2017-12-17 22:51   ` [PATCH v2 1/3] repository: fix repo_read_index with submodules Thomas Gummerer
                     ` (3 more replies)
  3 siblings, 4 replies; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-17 22:51 UTC (permalink / raw)
  To: git
  Cc: Lars Schneider, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Thomas Gummerer

The previous round was at <20171210212202.28231-1-t.gummerer@gmail.com>.

Thanks Brandon, Eric, Szeder, Lars and Junio for the comments on the
previous round.  

Changes since the previous round:

- pass in the_repository to do_read_index_from, so we don't have to
  special case NULL anymore.
- free struct repository properly after allocating it in revision.c
- run the travis tests in the linux-gcc and linux 32-bit builds, to
  avoid clashes with the GETTEXT_POISON tests.

The travis tests now depend on sg/travis-fixes, as running the split
index tests separately gets simplified with the introduction of the
'$jobname' variable.  The script falls back gracefully to not runing
the split index test under linux with gcc and only running the
"normal" tests there, while the split index tests would still get run
in the linux 32-bit build.

Because it's a "soft" dependency I thought having this dependency
would be okay, but let me know if you it's getting to complicated and
we can use the environment variables set by travis for now, and then
switch to the '$jobname' variable when it's merged to next/master.

I kept the usage of struct repository for worktrees, as I didn't hear
anything back after my last reply in <20171211213948.GC25616@hank>,
and don't see a better way of doing it.

Interdiff below:
diff --git a/.travis.yml b/.travis.yml
index c83c766dee..281f101f31 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -39,7 +39,7 @@ env:
 
 matrix:
   include:
-    - env: GETTEXT_POISON=YesPlease GIT_TEST_SPLIT_INDEX=YesPlease
+    - env: GETTEXT_POISON=YesPlease
       os: linux
       compiler:
       addons:
diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
index e30fb2cddc..f173c9cf2a 100755
--- a/ci/run-linux32-build.sh
+++ b/ci/run-linux32-build.sh
@@ -27,4 +27,5 @@ linux32 --32bit i386 su -m -l $CI_USER -c '
     cd /usr/src/git &&
     make --jobs=2 &&
     make --quiet test
+    GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test
 '
diff --git a/ci/run-tests.sh b/ci/run-tests.sh
index f0c743de94..c7aee5b9ff 100755
--- a/ci/run-tests.sh
+++ b/ci/run-tests.sh
@@ -8,3 +8,7 @@
 mkdir -p $HOME/travis-cache
 ln -s $HOME/travis-cache/.prove t/.prove
 make --quiet test
+if test "$jobname" = "linux-gcc"
+then
+	GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test
+fi
diff --git a/read-cache.c b/read-cache.c
index 4d5c4ad79b..70357febdc 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1898,10 +1898,7 @@ static int do_read_index_from(struct index_state *istate, const char *path,
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
 
 	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
-	if (repo)
-		base_path = repo_git_path(repo, "sharedindex.%s", base_sha1_hex);
-	else
-		base_path = git_path("sharedindex.%s", base_sha1_hex);
+	base_path = repo_git_path(repo, "sharedindex.%s", base_sha1_hex);
 	ret = do_read_index(split_index->base, base_path, 1);
 	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
 		die("broken index, expect %s in %s, got %s",
@@ -1921,7 +1918,7 @@ int read_index_for_repo(const struct repository *repo)
 
 int read_index_from(struct index_state *istate, const char *path)
 {
-	return do_read_index_from(istate, path, NULL);
+	return do_read_index_from(istate, path, the_repository);
 }
 
 int is_index_unborn(struct index_state *istate)
diff --git a/revision.c b/revision.c
index 9d8d9b96d1..34e1e4b799 100644
--- a/revision.c
+++ b/revision.c
@@ -1358,6 +1358,7 @@ void add_index_objects_to_pending(struct rev_info *revs, unsigned int flags)
 		if (repo_read_index(repo) > 0)
 			do_add_index_objects_to_pending(revs, repo->index);
 		discard_index(repo->index);
+		free(repo);
 	}
 	free_worktrees(worktrees);
 }

Thomas Gummerer (3):
  repository: fix repo_read_index with submodules
  prune: fix pruning with multiple worktrees and split index
  travis: run tests with GIT_TEST_SPLIT_INDEX

 cache.h                 |  1 +
 ci/run-linux32-build.sh |  1 +
 ci/run-tests.sh         |  1 +
 read-cache.c            | 16 ++++++++++++++--
 repository.c            | 13 ++++++++++++-
 repository.h            |  2 ++
 revision.c              | 14 +++++++++-----
 7 files changed, 40 insertions(+), 8 deletions(-)

-- 
2.15.1.620.gb9897f4670


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v2 1/3] repository: fix repo_read_index with submodules
  2017-12-17 22:51 ` [PATCH v2 0/3] fixes for split index mode Thomas Gummerer
@ 2017-12-17 22:51   ` Thomas Gummerer
  2017-12-18 18:01     ` Brandon Williams
  2017-12-17 22:51   ` [PATCH v2 2/3] prune: fix pruning with multiple worktrees and split index Thomas Gummerer
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-17 22:51 UTC (permalink / raw)
  To: git
  Cc: Lars Schneider, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Thomas Gummerer

repo_read_index calls read_index_from, which takes an path argument for
the location of the index file.  For the split index however it relies
on the current working directory to construct the path using git_path.

repo_read_index calls read_index_from with the full path for the index
file, however it doesn't change the cwd, so when split index mode is
turned on, read_index_from can't find the file for the split index.

For example t3007-ls-files-recurse-submodules.sh was broken with
GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
broken in a similar manner, probably by introducing struct repository
there, although I didn't track down the exact commit for that.

Fix this by introducing a new read_index_for_repo function, which knows
about the correct paths for the submodules.

The alternative would have been to make the callers pass in the base
path for the split index, however that ended up being more complicated,
and I think we want to converge towards using struct repository for
things like these anyway.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 cache.h      |  1 +
 read-cache.c | 16 ++++++++++++++--
 repository.c |  2 +-
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index cb5db7bf83..d42bea1ef7 100644
--- a/cache.h
+++ b/cache.h
@@ -614,6 +614,7 @@ extern int read_index_preload(struct index_state *, const struct pathspec *paths
 extern int do_read_index(struct index_state *istate, const char *path,
 			 int must_exist); /* for testting only! */
 extern int read_index_from(struct index_state *, const char *path);
+extern int read_index_for_repo(const struct repository *);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
 
diff --git a/read-cache.c b/read-cache.c
index 2eb81a66b9..70357febdc 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -20,6 +20,7 @@
 #include "split-index.h"
 #include "utf8.h"
 #include "fsmonitor.h"
+#include "repository.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -1871,7 +1872,8 @@ static void freshen_shared_index(char *base_sha1_hex, int warn)
 	free(shared_index);
 }
 
-int read_index_from(struct index_state *istate, const char *path)
+static int do_read_index_from(struct index_state *istate, const char *path,
+			      const struct repository *repo)
 {
 	struct split_index *split_index;
 	int ret;
@@ -1896,7 +1898,7 @@ int read_index_from(struct index_state *istate, const char *path)
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
 
 	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
-	base_path = git_path("sharedindex.%s", base_sha1_hex);
+	base_path = repo_git_path(repo, "sharedindex.%s", base_sha1_hex);
 	ret = do_read_index(split_index->base, base_path, 1);
 	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
 		die("broken index, expect %s in %s, got %s",
@@ -1909,6 +1911,16 @@ int read_index_from(struct index_state *istate, const char *path)
 	return ret;
 }
 
+int read_index_for_repo(const struct repository *repo)
+{
+	return do_read_index_from(repo->index, repo->index_file, repo);
+}
+
+int read_index_from(struct index_state *istate, const char *path)
+{
+	return do_read_index_from(istate, path, the_repository);
+}
+
 int is_index_unborn(struct index_state *istate)
 {
 	return (!istate->cache_nr && !istate->timestamp.sec);
diff --git a/repository.c b/repository.c
index bb2fae5446..928b1f553d 100644
--- a/repository.c
+++ b/repository.c
@@ -229,5 +229,5 @@ int repo_read_index(struct repository *repo)
 	if (!repo->index)
 		repo->index = xcalloc(1, sizeof(*repo->index));
 
-	return read_index_from(repo->index, repo->index_file);
+	return read_index_for_repo(repo);
 }
-- 
2.15.1.620.gb9897f4670


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v2 2/3] prune: fix pruning with multiple worktrees and split index
  2017-12-17 22:51 ` [PATCH v2 0/3] fixes for split index mode Thomas Gummerer
  2017-12-17 22:51   ` [PATCH v2 1/3] repository: fix repo_read_index with submodules Thomas Gummerer
@ 2017-12-17 22:51   ` Thomas Gummerer
  2017-12-18 18:19     ` Brandon Williams
  2017-12-17 22:51   ` [PATCH v2 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
  2018-01-07 22:30   ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
  3 siblings, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-17 22:51 UTC (permalink / raw)
  To: git
  Cc: Lars Schneider, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Thomas Gummerer

be489d02d2 ("revision.c: --indexed-objects add objects from all
worktrees", 2017-08-23) made sure that pruning takes objects from all
worktrees into account.

It did that by reading the index of every worktree and adding the
necessary index objects to the set of pending objects.  The index is
read by read_index_from.  As mentioned in the previous commit,
read_index_from depends on the CWD for the location of the split index,
and add_index_objects_to_pending doesn't set that before using
read_index_from.

Instead of using read_index_from, use repo_read_index, which is aware of
the proper paths for the worktree.

This fixes t5304-prune when ran with GIT_TEST_SPLIT_INDEX set.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 repository.c | 11 +++++++++++
 repository.h |  2 ++
 revision.c   | 14 +++++++++-----
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/repository.c b/repository.c
index 928b1f553d..3c9bfbd1b8 100644
--- a/repository.c
+++ b/repository.c
@@ -2,6 +2,7 @@
 #include "repository.h"
 #include "config.h"
 #include "submodule-config.h"
+#include "worktree.h"
 
 /* The main repository */
 static struct repository the_repo = {
@@ -146,6 +147,16 @@ int repo_init(struct repository *repo, const char *gitdir, const char *worktree)
 	return -1;
 }
 
+/*
+ * Initialize 'repo' based on the provided worktree
+ * Return 0 upon success and a non-zero value upon failure.
+ */
+int repo_worktree_init(struct repository *repo, struct worktree *worktree)
+{
+	return repo_init(repo, get_worktree_git_dir(worktree),
+			 worktree->path);
+}
+
 /*
  * Initialize 'submodule' as the submodule given by 'path' in parent repository
  * 'superproject'.
diff --git a/repository.h b/repository.h
index 7f5e24a0a2..2adeb05bf4 100644
--- a/repository.h
+++ b/repository.h
@@ -4,6 +4,7 @@
 struct config_set;
 struct index_state;
 struct submodule_cache;
+struct worktree;
 
 struct repository {
 	/* Environment */
@@ -87,6 +88,7 @@ extern struct repository *the_repository;
 extern void repo_set_gitdir(struct repository *repo, const char *path);
 extern void repo_set_worktree(struct repository *repo, const char *path);
 extern int repo_init(struct repository *repo, const char *gitdir, const char *worktree);
+extern int repo_worktree_init(struct repository *repo, struct worktree *worktree);
 extern int repo_submodule_init(struct repository *submodule,
 			       struct repository *superproject,
 			       const char *path);
diff --git a/revision.c b/revision.c
index e2e691dd5a..34e1e4b799 100644
--- a/revision.c
+++ b/revision.c
@@ -22,6 +22,7 @@
 #include "packfile.h"
 #include "worktree.h"
 #include "argv-array.h"
+#include "repository.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -1346,15 +1347,18 @@ void add_index_objects_to_pending(struct rev_info *revs, unsigned int flags)
 	worktrees = get_worktrees(0);
 	for (p = worktrees; *p; p++) {
 		struct worktree *wt = *p;
-		struct index_state istate = { NULL };
+		struct repository *repo;
 
+		repo = xmalloc(sizeof(struct repository));
 		if (wt->is_current)
 			continue; /* current index already taken care of */
+		if (repo_worktree_init(repo, wt))
+			BUG("couldn't initialize repository object from worktree");
 
-		if (read_index_from(&istate,
-				    worktree_git_path(wt, "index")) > 0)
-			do_add_index_objects_to_pending(revs, &istate);
-		discard_index(&istate);
+		if (repo_read_index(repo) > 0)
+			do_add_index_objects_to_pending(revs, repo->index);
+		discard_index(repo->index);
+		free(repo);
 	}
 	free_worktrees(worktrees);
 }
-- 
2.15.1.620.gb9897f4670


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v2 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-17 22:51 ` [PATCH v2 0/3] fixes for split index mode Thomas Gummerer
  2017-12-17 22:51   ` [PATCH v2 1/3] repository: fix repo_read_index with submodules Thomas Gummerer
  2017-12-17 22:51   ` [PATCH v2 2/3] prune: fix pruning with multiple worktrees and split index Thomas Gummerer
@ 2017-12-17 22:51   ` Thomas Gummerer
  2017-12-18 18:16     ` Lars Schneider
  2018-01-07 22:30   ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
  3 siblings, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-17 22:51 UTC (permalink / raw)
  To: git
  Cc: Lars Schneider, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Thomas Gummerer

Split index mode only has a few dedicated tests, but as the index is
involved in nearly every git operation, this doesn't quite cover all the
ways repositories with split index can break.  To use split index mode
throughout the test suite a GIT_TEST_SPLIT_INDEX environment variable
can be set, which makes git split the index at random and thus
excercises the functionality much more thoroughly.

As this is not turned on by default, it is not executed nearly as often
as the test suite is run, so occationally breakages slip through.  Try
to counteract that by running the test suite with GIT_TEST_SPLIT_INDEX
mode turned on on travis.

To avoid using too many cycles on travis only run split index mode in
the linux-gcc and the linux 32-bit gcc targets.  The Linux builds were
chosen over the Mac OS builds because they tend to be much faster to
complete.

The linux gcc build was chosen over the linux clang build because the
linux clang build is the fastest build, so it can serve as an early
indicator if something is broken and we want to avoid spending the extra
cycles of running the test suite twice for that.

Helped-by: Lars Schneider <larsxschneider@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 ci/run-linux32-build.sh | 1 +
 ci/run-tests.sh         | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
index e30fb2cddc..f173c9cf2a 100755
--- a/ci/run-linux32-build.sh
+++ b/ci/run-linux32-build.sh
@@ -27,4 +27,5 @@ linux32 --32bit i386 su -m -l $CI_USER -c '
     cd /usr/src/git &&
     make --jobs=2 &&
     make --quiet test
+    GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test
 '
diff --git a/ci/run-tests.sh b/ci/run-tests.sh
index f0c743de94..c7aee5b9ff 100755
--- a/ci/run-tests.sh
+++ b/ci/run-tests.sh
@@ -8,3 +8,7 @@
 mkdir -p $HOME/travis-cache
 ln -s $HOME/travis-cache/.prove t/.prove
 make --quiet test
+if test "$jobname" = "linux-gcc"
+then
+	GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test
+fi
-- 
2.15.1.620.gb9897f4670


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 1/3] repository: fix repo_read_index with submodules
  2017-12-17 22:51   ` [PATCH v2 1/3] repository: fix repo_read_index with submodules Thomas Gummerer
@ 2017-12-18 18:01     ` Brandon Williams
  2017-12-18 23:05       ` Thomas Gummerer
  0 siblings, 1 reply; 94+ messages in thread
From: Brandon Williams @ 2017-12-18 18:01 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Lars Schneider, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

On 12/17, Thomas Gummerer wrote:
> repo_read_index calls read_index_from, which takes an path argument for
> the location of the index file.  For the split index however it relies

> on the current working directory to construct the path using git_path.

This line isn't actually true and should probably be fixed.  git_path
doesn't rely on the CWD but rather it relies on the gitdir of the main
repository (the_repository).

> 
> repo_read_index calls read_index_from with the full path for the index
> file, however it doesn't change the cwd, so when split index mode is
> turned on, read_index_from can't find the file for the split index.
> 
> For example t3007-ls-files-recurse-submodules.sh was broken with
> GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
> object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
> broken in a similar manner, probably by introducing struct repository
> there, although I didn't track down the exact commit for that.
> 
> Fix this by introducing a new read_index_for_repo function, which knows
> about the correct paths for the submodules.
> 
> The alternative would have been to make the callers pass in the base
> path for the split index, however that ended up being more complicated,
> and I think we want to converge towards using struct repository for
> things like these anyway.
> 
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  cache.h      |  1 +
>  read-cache.c | 16 ++++++++++++++--
>  repository.c |  2 +-
>  3 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/cache.h b/cache.h
> index cb5db7bf83..d42bea1ef7 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -614,6 +614,7 @@ extern int read_index_preload(struct index_state *, const struct pathspec *paths
>  extern int do_read_index(struct index_state *istate, const char *path,
>  			 int must_exist); /* for testting only! */
>  extern int read_index_from(struct index_state *, const char *path);
> +extern int read_index_for_repo(const struct repository *);
>  extern int is_index_unborn(struct index_state *);
>  extern int read_index_unmerged(struct index_state *);
>  
> diff --git a/read-cache.c b/read-cache.c
> index 2eb81a66b9..70357febdc 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -20,6 +20,7 @@
>  #include "split-index.h"
>  #include "utf8.h"
>  #include "fsmonitor.h"
> +#include "repository.h"
>  
>  /* Mask for the name length in ce_flags in the on-disk index */
>  
> @@ -1871,7 +1872,8 @@ static void freshen_shared_index(char *base_sha1_hex, int warn)
>  	free(shared_index);
>  }
>  
> -int read_index_from(struct index_state *istate, const char *path)
> +static int do_read_index_from(struct index_state *istate, const char *path,
> +			      const struct repository *repo)
>  {
>  	struct split_index *split_index;
>  	int ret;
> @@ -1896,7 +1898,7 @@ int read_index_from(struct index_state *istate, const char *path)
>  		split_index->base = xcalloc(1, sizeof(*split_index->base));
>  
>  	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> -	base_path = git_path("sharedindex.%s", base_sha1_hex);
> +	base_path = repo_git_path(repo, "sharedindex.%s", base_sha1_hex);
>  	ret = do_read_index(split_index->base, base_path, 1);
>  	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
>  		die("broken index, expect %s in %s, got %s",
> @@ -1909,6 +1911,16 @@ int read_index_from(struct index_state *istate, const char *path)
>  	return ret;
>  }
>  
> +int read_index_for_repo(const struct repository *repo)
> +{
> +	return do_read_index_from(repo->index, repo->index_file, repo);
> +}

> +
> +int read_index_from(struct index_state *istate, const char *path)
> +{
> +	return do_read_index_from(istate, path, the_repository);
> +}

This looks fine, though I wonder what the point of passing in the index
file even was since we end just ended up reading the 'sharedindex' file based
on the git path. I'm just curious about how this function evolved.

> +
>  int is_index_unborn(struct index_state *istate)
>  {
>  	return (!istate->cache_nr && !istate->timestamp.sec);
> diff --git a/repository.c b/repository.c
> index bb2fae5446..928b1f553d 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -229,5 +229,5 @@ int repo_read_index(struct repository *repo)
>  	if (!repo->index)
>  		repo->index = xcalloc(1, sizeof(*repo->index));
>  
> -	return read_index_from(repo->index, repo->index_file);
> +	return read_index_for_repo(repo);
>  }
> -- 
> 2.15.1.620.gb9897f4670
> 

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-17 22:51   ` [PATCH v2 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
@ 2017-12-18 18:16     ` Lars Schneider
  2018-01-04 20:13       ` Thomas Gummerer
  0 siblings, 1 reply; 94+ messages in thread
From: Lars Schneider @ 2017-12-18 18:16 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git List, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor


> On 17 Dec 2017, at 23:51, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> 
> Split index mode only has a few dedicated tests, but as the index is
> involved in nearly every git operation, this doesn't quite cover all the
> ways repositories with split index can break.  To use split index mode
> throughout the test suite a GIT_TEST_SPLIT_INDEX environment variable
> can be set, which makes git split the index at random and thus
> excercises the functionality much more thoroughly.
> 
> As this is not turned on by default, it is not executed nearly as often
> as the test suite is run, so occationally breakages slip through.  Try
> to counteract that by running the test suite with GIT_TEST_SPLIT_INDEX
> mode turned on on travis.
> 
> To avoid using too many cycles on travis only run split index mode in
> the linux-gcc and the linux 32-bit gcc targets.

I am surprised to see the split index mode test for the linux 32-bit
target. Is it likely that a split index bug appears only on 32-bit? 
Wouldn't the linux-gcc target be sufficient to save resources/time?


>  The Linux builds were
> chosen over the Mac OS builds because they tend to be much faster to
> complete.
> 
> The linux gcc build was chosen over the linux clang build because the
> linux clang build is the fastest build, so it can serve as an early
> indicator if something is broken and we want to avoid spending the extra
> cycles of running the test suite twice for that.
> 
> Helped-by: Lars Schneider <larsxschneider@gmail.com>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
> ci/run-linux32-build.sh | 1 +
> ci/run-tests.sh         | 4 ++++
> 2 files changed, 5 insertions(+)
> 
> diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
> index e30fb2cddc..f173c9cf2a 100755
> --- a/ci/run-linux32-build.sh
> +++ b/ci/run-linux32-build.sh
> @@ -27,4 +27,5 @@ linux32 --32bit i386 su -m -l $CI_USER -c '
>     cd /usr/src/git &&
>     make --jobs=2 &&
>     make --quiet test
> +    GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test
> '
> diff --git a/ci/run-tests.sh b/ci/run-tests.sh
> index f0c743de94..c7aee5b9ff 100755
> --- a/ci/run-tests.sh
> +++ b/ci/run-tests.sh
> @@ -8,3 +8,7 @@
> mkdir -p $HOME/travis-cache
> ln -s $HOME/travis-cache/.prove t/.prove
> make --quiet test
> +if test "$jobname" = "linux-gcc"
> +then
> +	GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test
> +fi

For now I think that looks good. Maybe we could define additional test 
configurations with an environment variable. That could be an array variable
defined in the lib-travis.ci "case" statement:
https://github.com/git/git/blob/1229713f78cd2883798e95f33c19c81b523413fd/ci/lib-travisci.sh#L42-L65


- Lars

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 2/3] prune: fix pruning with multiple worktrees and split index
  2017-12-17 22:51   ` [PATCH v2 2/3] prune: fix pruning with multiple worktrees and split index Thomas Gummerer
@ 2017-12-18 18:19     ` Brandon Williams
  2018-01-03 22:18       ` Thomas Gummerer
  0 siblings, 1 reply; 94+ messages in thread
From: Brandon Williams @ 2017-12-18 18:19 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Lars Schneider, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

On 12/17, Thomas Gummerer wrote:
> be489d02d2 ("revision.c: --indexed-objects add objects from all
> worktrees", 2017-08-23) made sure that pruning takes objects from all
> worktrees into account.
> 
> It did that by reading the index of every worktree and adding the
> necessary index objects to the set of pending objects.  The index is
> read by read_index_from.  As mentioned in the previous commit,
> read_index_from depends on the CWD for the location of the split index,

As I mentioned before this doesn't actually depend on the CWD but
rather the per-worktree gitdir.

> and add_index_objects_to_pending doesn't set that before using
> read_index_from.
> 
> Instead of using read_index_from, use repo_read_index, which is aware of
> the proper paths for the worktree.
> 
> This fixes t5304-prune when ran with GIT_TEST_SPLIT_INDEX set.
> 
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  repository.c | 11 +++++++++++
>  repository.h |  2 ++
>  revision.c   | 14 +++++++++-----
>  3 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/repository.c b/repository.c
> index 928b1f553d..3c9bfbd1b8 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -2,6 +2,7 @@
>  #include "repository.h"
>  #include "config.h"
>  #include "submodule-config.h"
> +#include "worktree.h"
>  
>  /* The main repository */
>  static struct repository the_repo = {
> @@ -146,6 +147,16 @@ int repo_init(struct repository *repo, const char *gitdir, const char *worktree)
>  	return -1;
>  }
>  
> +/*
> + * Initialize 'repo' based on the provided worktree
> + * Return 0 upon success and a non-zero value upon failure.
> + */
> +int repo_worktree_init(struct repository *repo, struct worktree *worktree)
> +{
> +	return repo_init(repo, get_worktree_git_dir(worktree),
> +			 worktree->path);

I still feel very unsettled about this and don't think its a good idea.
get_worktree_git_dir depends implicitly on the global the_repository
object and I would like to avoid relying on it for an initialization
function like this.

> +}
> +
>  /*
>   * Initialize 'submodule' as the submodule given by 'path' in parent repository
>   * 'superproject'.
> diff --git a/repository.h b/repository.h
> index 7f5e24a0a2..2adeb05bf4 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -4,6 +4,7 @@
>  struct config_set;
>  struct index_state;
>  struct submodule_cache;
> +struct worktree;
>  
>  struct repository {
>  	/* Environment */
> @@ -87,6 +88,7 @@ extern struct repository *the_repository;
>  extern void repo_set_gitdir(struct repository *repo, const char *path);
>  extern void repo_set_worktree(struct repository *repo, const char *path);
>  extern int repo_init(struct repository *repo, const char *gitdir, const char *worktree);
> +extern int repo_worktree_init(struct repository *repo, struct worktree *worktree);
>  extern int repo_submodule_init(struct repository *submodule,
>  			       struct repository *superproject,
>  			       const char *path);
> diff --git a/revision.c b/revision.c
> index e2e691dd5a..34e1e4b799 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -22,6 +22,7 @@
>  #include "packfile.h"
>  #include "worktree.h"
>  #include "argv-array.h"
> +#include "repository.h"
>  
>  volatile show_early_output_fn_t show_early_output;
>  
> @@ -1346,15 +1347,18 @@ void add_index_objects_to_pending(struct rev_info *revs, unsigned int flags)
>  	worktrees = get_worktrees(0);
>  	for (p = worktrees; *p; p++) {
>  		struct worktree *wt = *p;
> -		struct index_state istate = { NULL };
> +		struct repository *repo;
>  
> +		repo = xmalloc(sizeof(struct repository));
>  		if (wt->is_current)
>  			continue; /* current index already taken care of */
> +		if (repo_worktree_init(repo, wt))
> +			BUG("couldn't initialize repository object from worktree");
>  
> -		if (read_index_from(&istate,
> -				    worktree_git_path(wt, "index")) > 0)

Ok, after thinking this through a bit more I think a better approach may
be to restructure the call to read_index_from to take in both an index
file as well as the explicit gitdir to use when constructing a path to
the sharedindex file.  That way you can fix this for worktrees and
submodules without having to pass in a repository object to the logic
which is reading an index file as well as avoiding needing to init a
repository object for every worktree.

So you could rework the first patch to do something like:

diff --git a/read-cache.c b/read-cache.c
index 2eb81a66b..3a04b5567 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1863,20 +1863,19 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
  * This way, shared index can be removed if they have not been used
  * for some time.
  */
-static void freshen_shared_index(char *base_sha1_hex, int warn)
+static void freshen_shared_index(const char *shared_index, int warn)
 {
-	char *shared_index = git_pathdup("sharedindex.%s", base_sha1_hex);
 	if (!check_and_freshen_file(shared_index, 1) && warn)
 		warning("could not freshen shared index '%s'", shared_index);
-	free(shared_index);
 }
 
-int read_index_from(struct index_state *istate, const char *path)
+int read_index_from(struct index_state *istate, const char *path,
+		    const char *gitdir)
 {
 	struct split_index *split_index;
 	int ret;
 	char *base_sha1_hex;
-	const char *base_path;
+	char *base_path;
 
 	/* istate->initialized covers both .git/index and .git/sharedindex.xxx */
 	if (istate->initialized)
@@ -1896,14 +1895,14 @@ int read_index_from(struct index_state *istate, const char *path)
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
 
 	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
-	base_path = git_path("sharedindex.%s", base_sha1_hex);
+	base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);
 	ret = do_read_index(split_index->base, base_path, 1);
 	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
 		die("broken index, expect %s in %s, got %s",
 		    base_sha1_hex, base_path,
 		    sha1_to_hex(split_index->base->sha1));
 
-	freshen_shared_index(base_sha1_hex, 0);
+	freshen_shared_index(base_path, 0);
 	merge_base_index(istate);
 	post_read_index_from(istate);
+ free(base_path);
 	return ret;


-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 1/3] repository: fix repo_read_index with submodules
  2017-12-18 18:01     ` Brandon Williams
@ 2017-12-18 23:05       ` Thomas Gummerer
  2017-12-18 23:05         ` Brandon Williams
  0 siblings, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2017-12-18 23:05 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, Lars Schneider, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

On 12/18, Brandon Williams wrote:
> On 12/17, Thomas Gummerer wrote:
> > repo_read_index calls read_index_from, which takes an path argument for
> > the location of the index file.  For the split index however it relies
> 
> > on the current working directory to construct the path using git_path.
> 
> This line isn't actually true and should probably be fixed.  git_path
> doesn't rely on the CWD but rather it relies on the gitdir of the main
> repository (the_repository).

Right, let me fix that.  Thanks!

> > 
> > repo_read_index calls read_index_from with the full path for the index
> > file, however it doesn't change the cwd, so when split index mode is
> > turned on, read_index_from can't find the file for the split index.
> > 
> > For example t3007-ls-files-recurse-submodules.sh was broken with
> > GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
> > object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
> > broken in a similar manner, probably by introducing struct repository
> > there, although I didn't track down the exact commit for that.
> > 
> > Fix this by introducing a new read_index_for_repo function, which knows
> > about the correct paths for the submodules.
> > 
> > The alternative would have been to make the callers pass in the base
> > path for the split index, however that ended up being more complicated,
> > and I think we want to converge towards using struct repository for
> > things like these anyway.
> > 
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >  cache.h      |  1 +
> >  read-cache.c | 16 ++++++++++++++--
> >  repository.c |  2 +-
> >  3 files changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/cache.h b/cache.h
> > index cb5db7bf83..d42bea1ef7 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -614,6 +614,7 @@ extern int read_index_preload(struct index_state *, const struct pathspec *paths
> >  extern int do_read_index(struct index_state *istate, const char *path,
> >  			 int must_exist); /* for testting only! */
> >  extern int read_index_from(struct index_state *, const char *path);
> > +extern int read_index_for_repo(const struct repository *);
> >  extern int is_index_unborn(struct index_state *);
> >  extern int read_index_unmerged(struct index_state *);
> >  
> > diff --git a/read-cache.c b/read-cache.c
> > index 2eb81a66b9..70357febdc 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -20,6 +20,7 @@
> >  #include "split-index.h"
> >  #include "utf8.h"
> >  #include "fsmonitor.h"
> > +#include "repository.h"
> >  
> >  /* Mask for the name length in ce_flags in the on-disk index */
> >  
> > @@ -1871,7 +1872,8 @@ static void freshen_shared_index(char *base_sha1_hex, int warn)
> >  	free(shared_index);
> >  }
> >  
> > -int read_index_from(struct index_state *istate, const char *path)
> > +static int do_read_index_from(struct index_state *istate, const char *path,
> > +			      const struct repository *repo)
> >  {
> >  	struct split_index *split_index;
> >  	int ret;
> > @@ -1896,7 +1898,7 @@ int read_index_from(struct index_state *istate, const char *path)
> >  		split_index->base = xcalloc(1, sizeof(*split_index->base));
> >  
> >  	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> > -	base_path = git_path("sharedindex.%s", base_sha1_hex);
> > +	base_path = repo_git_path(repo, "sharedindex.%s", base_sha1_hex);
> >  	ret = do_read_index(split_index->base, base_path, 1);
> >  	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
> >  		die("broken index, expect %s in %s, got %s",
> > @@ -1909,6 +1911,16 @@ int read_index_from(struct index_state *istate, const char *path)
> >  	return ret;
> >  }
> >  
> > +int read_index_for_repo(const struct repository *repo)
> > +{
> > +	return do_read_index_from(repo->index, repo->index_file, repo);
> > +}
> 
> > +
> > +int read_index_from(struct index_state *istate, const char *path)
> > +{
> > +	return do_read_index_from(istate, path, the_repository);
> > +}
> 
> This looks fine, though I wonder what the point of passing in the index
> file even was since we end just ended up reading the 'sharedindex' file based
> on the git path. I'm just curious about how this function evolved.

There are some callsites that are using an index different form
$gitdir/index, or even GIT_INDEX_FILE.  e.g. see builtin/am.c [*1*],
which uses it's own 'patch-merge-index' in the am state directory for
its internal operations.

The split index mode was later bolted on, and the sharedindex.xxxx
would always go in $gitdir for the repository.  Others probably know
quite a bit more about this, while I'm always interested in index
related things as that's how I got started with the git project, I
couldn't follow all the conversations that were going on there.

*1*: https://github.com/gitster/git/blob/52015aaf9d19c97b52c47c7046058e6d029ff856/builtin/am.c#L1844

> > +
> >  int is_index_unborn(struct index_state *istate)
> >  {
> >  	return (!istate->cache_nr && !istate->timestamp.sec);
> > diff --git a/repository.c b/repository.c
> > index bb2fae5446..928b1f553d 100644
> > --- a/repository.c
> > +++ b/repository.c
> > @@ -229,5 +229,5 @@ int repo_read_index(struct repository *repo)
> >  	if (!repo->index)
> >  		repo->index = xcalloc(1, sizeof(*repo->index));
> >  
> > -	return read_index_from(repo->index, repo->index_file);
> > +	return read_index_for_repo(repo);
> >  }
> > -- 
> > 2.15.1.620.gb9897f4670
> > 
> 
> -- 
> Brandon Williams

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 1/3] repository: fix repo_read_index with submodules
  2017-12-18 23:05       ` Thomas Gummerer
@ 2017-12-18 23:05         ` Brandon Williams
  0 siblings, 0 replies; 94+ messages in thread
From: Brandon Williams @ 2017-12-18 23:05 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Lars Schneider, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

On 12/18, Thomas Gummerer wrote:
> On 12/18, Brandon Williams wrote:
> > On 12/17, Thomas Gummerer wrote:
> > > repo_read_index calls read_index_from, which takes an path argument for
> > > the location of the index file.  For the split index however it relies
> > 
> > > on the current working directory to construct the path using git_path.
> > 
> > This line isn't actually true and should probably be fixed.  git_path
> > doesn't rely on the CWD but rather it relies on the gitdir of the main
> > repository (the_repository).
> 
> Right, let me fix that.  Thanks!
> 
> > > 
> > > repo_read_index calls read_index_from with the full path for the index
> > > file, however it doesn't change the cwd, so when split index mode is
> > > turned on, read_index_from can't find the file for the split index.
> > > 
> > > For example t3007-ls-files-recurse-submodules.sh was broken with
> > > GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
> > > object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
> > > broken in a similar manner, probably by introducing struct repository
> > > there, although I didn't track down the exact commit for that.
> > > 
> > > Fix this by introducing a new read_index_for_repo function, which knows
> > > about the correct paths for the submodules.
> > > 
> > > The alternative would have been to make the callers pass in the base
> > > path for the split index, however that ended up being more complicated,
> > > and I think we want to converge towards using struct repository for
> > > things like these anyway.
> > > 
> > > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > > ---
> > >  cache.h      |  1 +
> > >  read-cache.c | 16 ++++++++++++++--
> > >  repository.c |  2 +-
> > >  3 files changed, 16 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/cache.h b/cache.h
> > > index cb5db7bf83..d42bea1ef7 100644
> > > --- a/cache.h
> > > +++ b/cache.h
> > > @@ -614,6 +614,7 @@ extern int read_index_preload(struct index_state *, const struct pathspec *paths
> > >  extern int do_read_index(struct index_state *istate, const char *path,
> > >  			 int must_exist); /* for testting only! */
> > >  extern int read_index_from(struct index_state *, const char *path);
> > > +extern int read_index_for_repo(const struct repository *);
> > >  extern int is_index_unborn(struct index_state *);
> > >  extern int read_index_unmerged(struct index_state *);
> > >  
> > > diff --git a/read-cache.c b/read-cache.c
> > > index 2eb81a66b9..70357febdc 100644
> > > --- a/read-cache.c
> > > +++ b/read-cache.c
> > > @@ -20,6 +20,7 @@
> > >  #include "split-index.h"
> > >  #include "utf8.h"
> > >  #include "fsmonitor.h"
> > > +#include "repository.h"
> > >  
> > >  /* Mask for the name length in ce_flags in the on-disk index */
> > >  
> > > @@ -1871,7 +1872,8 @@ static void freshen_shared_index(char *base_sha1_hex, int warn)
> > >  	free(shared_index);
> > >  }
> > >  
> > > -int read_index_from(struct index_state *istate, const char *path)
> > > +static int do_read_index_from(struct index_state *istate, const char *path,
> > > +			      const struct repository *repo)
> > >  {
> > >  	struct split_index *split_index;
> > >  	int ret;
> > > @@ -1896,7 +1898,7 @@ int read_index_from(struct index_state *istate, const char *path)
> > >  		split_index->base = xcalloc(1, sizeof(*split_index->base));
> > >  
> > >  	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> > > -	base_path = git_path("sharedindex.%s", base_sha1_hex);
> > > +	base_path = repo_git_path(repo, "sharedindex.%s", base_sha1_hex);
> > >  	ret = do_read_index(split_index->base, base_path, 1);
> > >  	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
> > >  		die("broken index, expect %s in %s, got %s",
> > > @@ -1909,6 +1911,16 @@ int read_index_from(struct index_state *istate, const char *path)
> > >  	return ret;
> > >  }
> > >  
> > > +int read_index_for_repo(const struct repository *repo)
> > > +{
> > > +	return do_read_index_from(repo->index, repo->index_file, repo);
> > > +}
> > 
> > > +
> > > +int read_index_from(struct index_state *istate, const char *path)
> > > +{
> > > +	return do_read_index_from(istate, path, the_repository);
> > > +}
> > 
> > This looks fine, though I wonder what the point of passing in the index
> > file even was since we end just ended up reading the 'sharedindex' file based
> > on the git path. I'm just curious about how this function evolved.
> 
> There are some callsites that are using an index different form
> $gitdir/index, or even GIT_INDEX_FILE.  e.g. see builtin/am.c [*1*],
> which uses it's own 'patch-merge-index' in the am state directory for
> its internal operations.
> 
> The split index mode was later bolted on, and the sharedindex.xxxx
> would always go in $gitdir for the repository.  Others probably know
> quite a bit more about this, while I'm always interested in index
> related things as that's how I got started with the git project, I
> couldn't follow all the conversations that were going on there.
> 
> *1*: https://github.com/gitster/git/blob/52015aaf9d19c97b52c47c7046058e6d029ff856/builtin/am.c#L1844

Thanks for the explanation and pointer! :) 

> 
> > > +
> > >  int is_index_unborn(struct index_state *istate)
> > >  {
> > >  	return (!istate->cache_nr && !istate->timestamp.sec);
> > > diff --git a/repository.c b/repository.c
> > > index bb2fae5446..928b1f553d 100644
> > > --- a/repository.c
> > > +++ b/repository.c
> > > @@ -229,5 +229,5 @@ int repo_read_index(struct repository *repo)
> > >  	if (!repo->index)
> > >  		repo->index = xcalloc(1, sizeof(*repo->index));
> > >  
> > > -	return read_index_from(repo->index, repo->index_file);
> > > +	return read_index_for_repo(repo);
> > >  }
> > > -- 
> > > 2.15.1.620.gb9897f4670
> > > 
> > 
> > -- 
> > Brandon Williams

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 2/3] prune: fix pruning with multiple worktrees and split index
  2017-12-18 18:19     ` Brandon Williams
@ 2018-01-03 22:18       ` Thomas Gummerer
  2018-01-04 19:12         ` Junio C Hamano
  0 siblings, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-03 22:18 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, Lars Schneider, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

[sorry for the late reply.  I was on Christmas holidays until today
and am still catching up on the mailing list.  It will probably take
me untill the weekend to send a re-roll]

On 12/18, Brandon Williams wrote:
> On 12/17, Thomas Gummerer wrote:
> > be489d02d2 ("revision.c: --indexed-objects add objects from all
> > worktrees", 2017-08-23) made sure that pruning takes objects from all
> > worktrees into account.
> > 
> > It did that by reading the index of every worktree and adding the
> > necessary index objects to the set of pending objects.  The index is
> > read by read_index_from.  As mentioned in the previous commit,
> > read_index_from depends on the CWD for the location of the split index,
> 
> As I mentioned before this doesn't actually depend on the CWD but
> rather the per-worktree gitdir.

Right, will fix.

> > and add_index_objects_to_pending doesn't set that before using
> > read_index_from.
> > 
> > Instead of using read_index_from, use repo_read_index, which is aware of
> > the proper paths for the worktree.
> > 
> > This fixes t5304-prune when ran with GIT_TEST_SPLIT_INDEX set.
> > 
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >  repository.c | 11 +++++++++++
> >  repository.h |  2 ++
> >  revision.c   | 14 +++++++++-----
> >  3 files changed, 22 insertions(+), 5 deletions(-)
> > 
> > diff --git a/repository.c b/repository.c
> > index 928b1f553d..3c9bfbd1b8 100644
> > --- a/repository.c
> > +++ b/repository.c
> > @@ -2,6 +2,7 @@
> >  #include "repository.h"
> >  #include "config.h"
> >  #include "submodule-config.h"
> > +#include "worktree.h"
> >  
> >  /* The main repository */
> >  static struct repository the_repo = {
> > @@ -146,6 +147,16 @@ int repo_init(struct repository *repo, const char *gitdir, const char *worktree)
> >  	return -1;
> >  }
> >  
> > +/*
> > + * Initialize 'repo' based on the provided worktree
> > + * Return 0 upon success and a non-zero value upon failure.
> > + */
> > +int repo_worktree_init(struct repository *repo, struct worktree *worktree)
> > +{
> > +	return repo_init(repo, get_worktree_git_dir(worktree),
> > +			 worktree->path);
> 
> I still feel very unsettled about this and don't think its a good idea.
> get_worktree_git_dir depends implicitly on the global the_repository
> object and I would like to avoid relying on it for an initialization
> function like this.
> 
> > +}
> > +
> >  /*
> >   * Initialize 'submodule' as the submodule given by 'path' in parent repository
> >   * 'superproject'.
> > diff --git a/repository.h b/repository.h
> > index 7f5e24a0a2..2adeb05bf4 100644
> > --- a/repository.h
> > +++ b/repository.h
> > @@ -4,6 +4,7 @@
> >  struct config_set;
> >  struct index_state;
> >  struct submodule_cache;
> > +struct worktree;
> >  
> >  struct repository {
> >  	/* Environment */
> > @@ -87,6 +88,7 @@ extern struct repository *the_repository;
> >  extern void repo_set_gitdir(struct repository *repo, const char *path);
> >  extern void repo_set_worktree(struct repository *repo, const char *path);
> >  extern int repo_init(struct repository *repo, const char *gitdir, const char *worktree);
> > +extern int repo_worktree_init(struct repository *repo, struct worktree *worktree);
> >  extern int repo_submodule_init(struct repository *submodule,
> >  			       struct repository *superproject,
> >  			       const char *path);
> > diff --git a/revision.c b/revision.c
> > index e2e691dd5a..34e1e4b799 100644
> > --- a/revision.c
> > +++ b/revision.c
> > @@ -22,6 +22,7 @@
> >  #include "packfile.h"
> >  #include "worktree.h"
> >  #include "argv-array.h"
> > +#include "repository.h"
> >  
> >  volatile show_early_output_fn_t show_early_output;
> >  
> > @@ -1346,15 +1347,18 @@ void add_index_objects_to_pending(struct rev_info *revs, unsigned int flags)
> >  	worktrees = get_worktrees(0);
> >  	for (p = worktrees; *p; p++) {
> >  		struct worktree *wt = *p;
> > -		struct index_state istate = { NULL };
> > +		struct repository *repo;
> >  
> > +		repo = xmalloc(sizeof(struct repository));
> >  		if (wt->is_current)
> >  			continue; /* current index already taken care of */
> > +		if (repo_worktree_init(repo, wt))
> > +			BUG("couldn't initialize repository object from worktree");
> >  
> > -		if (read_index_from(&istate,
> > -				    worktree_git_path(wt, "index")) > 0)
> 
> Ok, after thinking this through a bit more I think a better approach may
> be to restructure the call to read_index_from to take in both an index
> file as well as the explicit gitdir to use when constructing a path to
> the sharedindex file.  That way you can fix this for worktrees and
> submodules without having to pass in a repository object to the logic
> which is reading an index file as well as avoiding needing to init a
> repository object for every worktree.

I still think with eventually we probably only want to read the index
through a repository object instead of reading it to a separate struct
index_state.

But we can probably defer that for now, as this series really just
wants to fix the regressions, and we can think a bit more about how
the repository struct interacts with worktrees.

> So you could rework the first patch to do something like:

Thanks for the below, I'll try the below and see how much churn it
causes adjusting all the callsites, and send a re-roll later this
week.

> diff --git a/read-cache.c b/read-cache.c
> index 2eb81a66b..3a04b5567 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1863,20 +1863,19 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>   * This way, shared index can be removed if they have not been used
>   * for some time.
>   */
> -static void freshen_shared_index(char *base_sha1_hex, int warn)
> +static void freshen_shared_index(const char *shared_index, int warn)
>  {
> -	char *shared_index = git_pathdup("sharedindex.%s", base_sha1_hex);
>  	if (!check_and_freshen_file(shared_index, 1) && warn)
>  		warning("could not freshen shared index '%s'", shared_index);
> -	free(shared_index);
>  }
>  
> -int read_index_from(struct index_state *istate, const char *path)
> +int read_index_from(struct index_state *istate, const char *path,
> +		    const char *gitdir)
>  {
>  	struct split_index *split_index;
>  	int ret;
>  	char *base_sha1_hex;
> -	const char *base_path;
> +	char *base_path;
>  
>  	/* istate->initialized covers both .git/index and .git/sharedindex.xxx */
>  	if (istate->initialized)
> @@ -1896,14 +1895,14 @@ int read_index_from(struct index_state *istate, const char *path)
>  		split_index->base = xcalloc(1, sizeof(*split_index->base));
>  
>  	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> -	base_path = git_path("sharedindex.%s", base_sha1_hex);
> +	base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);
>  	ret = do_read_index(split_index->base, base_path, 1);
>  	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
>  		die("broken index, expect %s in %s, got %s",
>  		    base_sha1_hex, base_path,
>  		    sha1_to_hex(split_index->base->sha1));
>  
> -	freshen_shared_index(base_sha1_hex, 0);
> +	freshen_shared_index(base_path, 0);
>  	merge_base_index(istate);
>  	post_read_index_from(istate);
> + free(base_path);
>  	return ret;
> 
> 
> -- 
> Brandon Williams

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 2/3] prune: fix pruning with multiple worktrees and split index
  2018-01-03 22:18       ` Thomas Gummerer
@ 2018-01-04 19:12         ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2018-01-04 19:12 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Brandon Williams, git, Lars Schneider, Jeff King,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

Thomas Gummerer <t.gummerer@gmail.com> writes:

> [sorry for the late reply.  I was on Christmas holidays until today
> and am still catching up on the mailing list.  It will probably take
> me untill the weekend to send a re-roll]
>
> On 12/18, Brandon Williams wrote:
>> On 12/17, Thomas Gummerer wrote:
>> > be489d02d2 ("revision.c: --indexed-objects add objects from all
>> > worktrees", 2017-08-23) made sure that pruning takes objects from all
>> > worktrees into account.
>> > 
>> > It did that by reading the index of every worktree and adding the
>> > necessary index objects to the set of pending objects.  The index is
>> > read by read_index_from.  As mentioned in the previous commit,
>> > read_index_from depends on the CWD for the location of the split index,
>> 
>> As I mentioned before this doesn't actually depend on the CWD but
>> rather the per-worktree gitdir.
>
> Right, will fix.

Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2017-12-18 18:16     ` Lars Schneider
@ 2018-01-04 20:13       ` Thomas Gummerer
  2018-01-05 11:03         ` Lars Schneider
  2018-01-07 20:02         ` Thomas Gummerer
  0 siblings, 2 replies; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-04 20:13 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Git List, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

On 12/18, Lars Schneider wrote:
> 
> > On 17 Dec 2017, at 23:51, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > 
> > Split index mode only has a few dedicated tests, but as the index is
> > involved in nearly every git operation, this doesn't quite cover all the
> > ways repositories with split index can break.  To use split index mode
> > throughout the test suite a GIT_TEST_SPLIT_INDEX environment variable
> > can be set, which makes git split the index at random and thus
> > excercises the functionality much more thoroughly.
> > 
> > As this is not turned on by default, it is not executed nearly as often
> > as the test suite is run, so occationally breakages slip through.  Try
> > to counteract that by running the test suite with GIT_TEST_SPLIT_INDEX
> > mode turned on on travis.
> > 
> > To avoid using too many cycles on travis only run split index mode in
> > the linux-gcc and the linux 32-bit gcc targets.
> 
> I am surprised to see the split index mode test for the linux 32-bit
> target. Is it likely that a split index bug appears only on 32-bit? 
> Wouldn't the linux-gcc target be sufficient to save resources/time?

I'm not sure it's particularly likely for a bug to appear only on
32-bit builds.  It also doesn't seem to take too long to run, so I
thought I'd add it just in case, but I'm happy running the tests only
in the 64-bit builds if that's preferred.

> >  The Linux builds were
> > chosen over the Mac OS builds because they tend to be much faster to
> > complete.
> > 
> > The linux gcc build was chosen over the linux clang build because the
> > linux clang build is the fastest build, so it can serve as an early
> > indicator if something is broken and we want to avoid spending the extra
> > cycles of running the test suite twice for that.
> > 
> > Helped-by: Lars Schneider <larsxschneider@gmail.com>
> > Helped-by: Junio C Hamano <gitster@pobox.com>
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> > ci/run-linux32-build.sh | 1 +
> > ci/run-tests.sh         | 4 ++++
> > 2 files changed, 5 insertions(+)
> > 
> > diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
> > index e30fb2cddc..f173c9cf2a 100755
> > --- a/ci/run-linux32-build.sh
> > +++ b/ci/run-linux32-build.sh
> > @@ -27,4 +27,5 @@ linux32 --32bit i386 su -m -l $CI_USER -c '
> >     cd /usr/src/git &&
> >     make --jobs=2 &&
> >     make --quiet test
> > +    GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test
> > '
> > diff --git a/ci/run-tests.sh b/ci/run-tests.sh
> > index f0c743de94..c7aee5b9ff 100755
> > --- a/ci/run-tests.sh
> > +++ b/ci/run-tests.sh
> > @@ -8,3 +8,7 @@
> > mkdir -p $HOME/travis-cache
> > ln -s $HOME/travis-cache/.prove t/.prove
> > make --quiet test
> > +if test "$jobname" = "linux-gcc"
> > +then
> > +	GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test
> > +fi
> 
> For now I think that looks good. Maybe we could define additional test 
> configurations with an environment variable. That could be an array variable
> defined in the lib-travis.ci "case" statement:
> https://github.com/git/git/blob/1229713f78cd2883798e95f33c19c81b523413fd/ci/lib-travisci.sh#L42-L65

That sounds like a good idea.  I'll try to see if I can come up with
something.

> - Lars

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2018-01-04 20:13       ` Thomas Gummerer
@ 2018-01-05 11:03         ` Lars Schneider
  2018-01-07 20:02         ` Thomas Gummerer
  1 sibling, 0 replies; 94+ messages in thread
From: Lars Schneider @ 2018-01-05 11:03 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git List, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor


> On 04 Jan 2018, at 21:13, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> 
> On 12/18, Lars Schneider wrote:
>> 
>>> On 17 Dec 2017, at 23:51, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>>> 
>>> Split index mode only has a few dedicated tests, but as the index is
>>> involved in nearly every git operation, this doesn't quite cover all the
>>> ways repositories with split index can break.  To use split index mode
>>> throughout the test suite a GIT_TEST_SPLIT_INDEX environment variable
>>> can be set, which makes git split the index at random and thus
>>> excercises the functionality much more thoroughly.
>>> 
>>> As this is not turned on by default, it is not executed nearly as often
>>> as the test suite is run, so occationally breakages slip through.  Try
>>> to counteract that by running the test suite with GIT_TEST_SPLIT_INDEX
>>> mode turned on on travis.
>>> 
>>> To avoid using too many cycles on travis only run split index mode in
>>> the linux-gcc and the linux 32-bit gcc targets.
>> 
>> I am surprised to see the split index mode test for the linux 32-bit
>> target. Is it likely that a split index bug appears only on 32-bit? 
>> Wouldn't the linux-gcc target be sufficient to save resources/time?
> 
> I'm not sure it's particularly likely for a bug to appear only on
> 32-bit builds.  It also doesn't seem to take too long to run, so I
> thought I'd add it just in case, but I'm happy running the tests only
> in the 64-bit builds if that's preferred.

CI running times naturally add up over time as more and more stuff is
tested. That's why I would argue testing GIT_TEST_SPLIT_INDEX on 64-bit
is probably sufficient. However, I am not really familiar with 
GIT_TEST_SPLIT_INDEX. 

There is no right or wrong here and either way is fine with me. 
I just wanted to express my "keep an eye on CI time" concern.


Thanks,
Lars

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2018-01-04 20:13       ` Thomas Gummerer
  2018-01-05 11:03         ` Lars Schneider
@ 2018-01-07 20:02         ` Thomas Gummerer
  1 sibling, 0 replies; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-07 20:02 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Git List, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

On 01/04, Thomas Gummerer wrote:
> On 12/18, Lars Schneider wrote:
> > [snip]
> > For now I think that looks good. Maybe we could define additional test 
> > configurations with an environment variable. That could be an array variable
> > defined in the lib-travis.ci "case" statement:
> > https://github.com/git/git/blob/1229713f78cd2883798e95f33c19c81b523413fd/ci/lib-travisci.sh#L42-L65
> 
> That sounds like a good idea.  I'll try to see if I can come up with
> something.

On second thought I'd prefer to just leave it as is for now, and leave
defining additional test configurations for a future iteration.
Having it configurable makes it a bit uglier, and I'm not even sure
how many configurations we can just test at runtime vs. having to
compile with the flag set.  So I'd like to punt on that for now, and
introduce more configurability once we actually need it :)

> > - Lars

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v3 0/3] fixes for split index mode
  2017-12-17 22:51 ` [PATCH v2 0/3] fixes for split index mode Thomas Gummerer
                     ` (2 preceding siblings ...)
  2017-12-17 22:51   ` [PATCH v2 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
@ 2018-01-07 22:30   ` Thomas Gummerer
  2018-01-07 22:30     ` [PATCH v3 1/3] read-cache: fix reading the shared index for other repos Thomas Gummerer
                       ` (4 more replies)
  3 siblings, 5 replies; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-07 22:30 UTC (permalink / raw)
  To: git
  Cc: Lars Schneider, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Thomas Gummerer

Thanks Brandon and Lars for comments on the previous round.

Previous rounds were at <20171210212202.28231-1-t.gummerer@gmail.com>
and <20171217225122.28941-1-t.gummerer@gmail.com>.

Changes since the previous round:

- reworked the patches to no longer try to use struct repository for
  worktrees, but pass gitdir into read_index_from instead (Thanks
  Brandon :)).  So the fixes that were in 1/3 and 2/3 previously are
  now in 1/3
- 2/3 now fixes t7009 properly.  I thought it was fixed before, but it
  probably just passed the tests because of the GIT_TEST_SPLIT_INDEX
  "randomness".
- The travis job is now only running the 64-bit linux build with split
  index mode to save even more cycles.
- As this wasn't picked up anywhere yet, I took the freedom to rebase
  this onto the current master, which includes sg/travis-fixes, which
  made it a bit easier for me to test.  If this makes the life harder
  for anyone reviewing this let me know and I can base it on the same
  commit previous iterations were based on.

Thomas Gummerer (3):
  read-cache: fix reading the shared index for other repos
  split-index: don't write cache tree with null sha1 entries
  travis: run tests with GIT_TEST_SPLIT_INDEX

 cache-tree.c            |  2 +-
 cache.h                 |  8 +++++---
 ci/run-linux32-build.sh |  1 +
 ci/run-tests.sh         |  4 ++++
 read-cache.c            | 25 ++++++++++++++-----------
 repository.c            |  2 +-
 revision.c              |  3 ++-
 split-index.c           |  2 ++
 t/t1700-split-index.sh  | 19 +++++++++++++++++++
 9 files changed, 49 insertions(+), 17 deletions(-)

-- 
2.16.0.rc1.238.g530d649a79


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-07 22:30   ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
@ 2018-01-07 22:30     ` Thomas Gummerer
  2018-01-08 10:41       ` Duy Nguyen
                         ` (2 more replies)
  2018-01-07 22:30     ` [PATCH v3 2/3] split-index: don't write cache tree with null oid entries Thomas Gummerer
                       ` (3 subsequent siblings)
  4 siblings, 3 replies; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-07 22:30 UTC (permalink / raw)
  To: git
  Cc: Lars Schneider, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Thomas Gummerer

read_index_from() takes a path argument for the location of the index
file.  For reading the shared index in split index mode however it just
ignores that path argument, and reads it from the gitdir of the current
repository.

This works as long as an index in the_repository is read.  Once that
changes, such as when we read the index of a submodule, or of a
different working tree than the current one, the gitdir of
the_repository will no longer contain the appropriate shared index,
and git will fail to read it.

For example t3007-ls-files-recurse-submodules.sh was broken with
GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
broken in a similar manner, probably by introducing struct repository
there, although I didn't track down the exact commit for that.

be489d02d2 ("revision.c: --indexed-objects add objects from all
worktrees", 2017-08-23) breaks with split index mode in a similar
manner, not erroring out when it can't read the index, but instead
carrying on with pruning, without taking the index of the worktree into
account.

Fix this by passing an additional gitdir parameter to read_index_from,
to indicate where it should look for and read the shared index from.

read_cache_from() defaults to using the gitdir of the_repository.  As it
is mostly a convenience macro, having to pass get_git_dir() for every
call seems overkill, and if necessary users can have more control by
using read_index_from().

Helped-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 cache-tree.c |  2 +-
 cache.h      |  5 +++--
 read-cache.c | 23 +++++++++++++----------
 repository.c |  2 +-
 revision.c   |  3 ++-
 5 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index e03e72c34a..0dd6292a94 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -608,7 +608,7 @@ int write_index_as_tree(unsigned char *sha1, struct index_state *index_state, co
 
 	hold_lock_file_for_update(&lock_file, index_path, LOCK_DIE_ON_ERROR);
 
-	entries = read_index_from(index_state, index_path);
+	entries = read_index_from(index_state, index_path, get_git_dir());
 	if (entries < 0) {
 		ret = WRITE_TREE_UNREADABLE_INDEX;
 		goto out;
diff --git a/cache.h b/cache.h
index d8b975a571..d5a7d1d7f1 100644
--- a/cache.h
+++ b/cache.h
@@ -371,7 +371,7 @@ extern void free_name_hash(struct index_state *istate);
 #define active_cache_tree (the_index.cache_tree)
 
 #define read_cache() read_index(&the_index)
-#define read_cache_from(path) read_index_from(&the_index, (path))
+#define read_cache_from(path) read_index_from(&the_index, (path), (get_git_dir()))
 #define read_cache_preload(pathspec) read_index_preload(&the_index, (pathspec))
 #define is_cache_unborn() is_index_unborn(&the_index)
 #define read_cache_unmerged() read_index_unmerged(&the_index)
@@ -616,7 +616,8 @@ extern int read_index(struct index_state *);
 extern int read_index_preload(struct index_state *, const struct pathspec *pathspec);
 extern int do_read_index(struct index_state *istate, const char *path,
 			 int must_exist); /* for testting only! */
-extern int read_index_from(struct index_state *, const char *path);
+extern int read_index_from(struct index_state *, const char *path,
+			   const char *gitdir);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
 
diff --git a/read-cache.c b/read-cache.c
index 2eb81a66b9..3f5b4afa36 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1603,7 +1603,7 @@ int hold_locked_index(struct lock_file *lk, int lock_flags)
 
 int read_index(struct index_state *istate)
 {
-	return read_index_from(istate, get_index_file());
+	return read_index_from(istate, get_index_file(), get_git_dir());
 }
 
 static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk,
@@ -1863,20 +1863,19 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
  * This way, shared index can be removed if they have not been used
  * for some time.
  */
-static void freshen_shared_index(char *base_sha1_hex, int warn)
+static void freshen_shared_index(const char *shared_index, int warn)
 {
-	char *shared_index = git_pathdup("sharedindex.%s", base_sha1_hex);
 	if (!check_and_freshen_file(shared_index, 1) && warn)
 		warning("could not freshen shared index '%s'", shared_index);
-	free(shared_index);
 }
 
-int read_index_from(struct index_state *istate, const char *path)
+int read_index_from(struct index_state *istate, const char *path,
+		    const char *gitdir)
 {
 	struct split_index *split_index;
 	int ret;
 	char *base_sha1_hex;
-	const char *base_path;
+	char *base_path;
 
 	/* istate->initialized covers both .git/index and .git/sharedindex.xxx */
 	if (istate->initialized)
@@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, const char *path)
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
 
 	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
-	base_path = git_path("sharedindex.%s", base_sha1_hex);
+	base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);
 	ret = do_read_index(split_index->base, base_path, 1);
 	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
 		die("broken index, expect %s in %s, got %s",
 		    base_sha1_hex, base_path,
 		    sha1_to_hex(split_index->base->sha1));
 
-	freshen_shared_index(base_sha1_hex, 0);
+	freshen_shared_index(base_path, 0);
 	merge_base_index(istate);
 	post_read_index_from(istate);
+	free(base_path);
 	return ret;
 }
 
@@ -2573,8 +2573,11 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	ret = write_split_index(istate, lock, flags);
 
 	/* Freshen the shared index only if the split-index was written */
-	if (!ret && !new_shared_index)
-		freshen_shared_index(sha1_to_hex(si->base_sha1), 1);
+	if (!ret && !new_shared_index) {
+		const char *shared_index = git_path("sharedindex.%s",
+						    sha1_to_hex(si->base_sha1));
+		freshen_shared_index(shared_index, 1);
+	}
 
 out:
 	if (flags & COMMIT_LOCK)
diff --git a/repository.c b/repository.c
index 998413b8bb..0d715f4fdb 100644
--- a/repository.c
+++ b/repository.c
@@ -236,5 +236,5 @@ int repo_read_index(struct repository *repo)
 	if (!repo->index)
 		repo->index = xcalloc(1, sizeof(*repo->index));
 
-	return read_index_from(repo->index, repo->index_file);
+	return read_index_from(repo->index, repo->index_file, repo->gitdir);
 }
diff --git a/revision.c b/revision.c
index 72f2b4572e..76d6d125b3 100644
--- a/revision.c
+++ b/revision.c
@@ -1352,7 +1352,8 @@ void add_index_objects_to_pending(struct rev_info *revs, unsigned int flags)
 			continue; /* current index already taken care of */
 
 		if (read_index_from(&istate,
-				    worktree_git_path(wt, "index")) > 0)
+				    worktree_git_path(wt, "index"),
+				    get_worktree_git_dir(wt)) > 0)
 			do_add_index_objects_to_pending(revs, &istate);
 		discard_index(&istate);
 	}
-- 
2.16.0.rc1.238.g530d649a79


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v3 2/3] split-index: don't write cache tree with null oid entries
  2018-01-07 22:30   ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
  2018-01-07 22:30     ` [PATCH v3 1/3] read-cache: fix reading the shared index for other repos Thomas Gummerer
@ 2018-01-07 22:30     ` Thomas Gummerer
  2018-01-07 22:30     ` [PATCH v3 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-07 22:30 UTC (permalink / raw)
  To: git
  Cc: Lars Schneider, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Thomas Gummerer

In a96d3cc3f6 ("cache-tree: reject entries with null sha1", 2017-04-21)
we made sure that broken cache entries do not get propagated to new
trees.  Part of that was making sure not to re-use an existing cache
tree that includes a null oid.

It did so by dropping the cache tree in 'do_write_index()' if one of
the entries contains a null oid.  In split index mode however, there
are two invocations to 'do_write_index()', one for the shared index
and one for the split index.  The cache tree is only written once, to
the split index.

As we only loop through the elements that are effectively being
written by the current invocation, that may not include the entry with
a null oid in the split index (when it is already written to the
shared index), where we write the cache tree.  Therefore in split
index mode we may still end up writing the cache tree, even though
there is an entry with a null oid in the index.

Fix this by checking for null oids in prepare_to_write_split_index,
where we loop the entries of the shared index as well as the entries for
the split index.

This fixes t7009 with GIT_TEST_SPLIT_INDEX.  Also add a new test that's
more specifically showing the problem.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 cache.h                |  3 ++-
 read-cache.c           |  2 +-
 split-index.c          |  2 ++
 t/t1700-split-index.sh | 19 +++++++++++++++++++
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index d5a7d1d7f1..fd755c32cf 100644
--- a/cache.h
+++ b/cache.h
@@ -345,7 +345,8 @@ struct index_state {
 	struct split_index *split_index;
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
-		 initialized : 1;
+		 initialized : 1,
+		 drop_cache_tree : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
diff --git a/read-cache.c b/read-cache.c
index 3f5b4afa36..d13ce83794 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2243,7 +2243,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	struct stat st;
 	struct ondisk_cache_entry_extended ondisk;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
-	int drop_cache_tree = 0;
+	int drop_cache_tree = istate->drop_cache_tree;
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
diff --git a/split-index.c b/split-index.c
index 83e39ec8d7..284d04d67f 100644
--- a/split-index.c
+++ b/split-index.c
@@ -238,6 +238,8 @@ void prepare_to_write_split_index(struct index_state *istate)
 				ALLOC_GROW(entries, nr_entries+1, nr_alloc);
 				entries[nr_entries++] = ce;
 			}
+			if (is_null_oid(&ce->oid))
+				istate->drop_cache_tree = 1;
 		}
 	}
 
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index af9b847761..c087b63367 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -401,4 +401,23 @@ done <<\EOF
 0642 -rw-r---w-
 EOF
 
+test_expect_success 'writing split index with null sha1 does not write cache tree' '
+	git config core.splitIndex true &&
+	git config splitIndex.maxPercentChange 0 &&
+	git commit -m "commit" &&
+	{
+		git ls-tree HEAD &&
+		printf "160000 commit $_z40\\tbroken\\n"
+	} >broken-tree &&
+	echo "add broken entry" >msg &&
+
+	tree=$(git mktree <broken-tree) &&
+	test_tick &&
+	commit=$(git commit-tree $tree -p HEAD <msg) &&
+	git update-ref HEAD "$commit" &&
+	GIT_ALLOW_NULL_SHA1=1 git reset --hard &&
+	(test-dump-cache-tree >cache-tree.out || true) &&
+	test_line_count = 0 cache-tree.out
+'
+
 test_done
-- 
2.16.0.rc1.238.g530d649a79


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v3 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX
  2018-01-07 22:30   ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
  2018-01-07 22:30     ` [PATCH v3 1/3] read-cache: fix reading the shared index for other repos Thomas Gummerer
  2018-01-07 22:30     ` [PATCH v3 2/3] split-index: don't write cache tree with null oid entries Thomas Gummerer
@ 2018-01-07 22:30     ` Thomas Gummerer
  2018-01-13 22:37     ` [PATCH v3 4/3] read-cache: don't try to write index if we can't write shared index Thomas Gummerer
  2018-01-18 21:53     ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
  4 siblings, 0 replies; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-07 22:30 UTC (permalink / raw)
  To: git
  Cc: Lars Schneider, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Thomas Gummerer

Split index mode only has a few dedicated tests, but as the index is
involved in nearly every git operation, this doesn't quite cover all the
ways repositories with split index can break.  To use split index mode
throughout the test suite a GIT_TEST_SPLIT_INDEX environment variable
can be set, which makes git split the index at random and thus
excercises the functionality much more thoroughly.

As this is not turned on by default, it is not executed nearly as often
as the test suite is run, so occationally breakages slip through.  Try
to counteract that by running the test suite with GIT_TEST_SPLIT_INDEX
mode turned on on travis.

To avoid using too many cycles on travis only run split index mode in
the linux-gcc target only.  The Linux build was chosen over the Mac OS
builds because it tends to be much faster to complete.

The linux gcc build was chosen over the linux clang build because the
linux clang build is the fastest build, so it can serve as an early
indicator if something is broken and we want to avoid spending the extra
cycles of running the test suite twice for that.

Helped-by: Lars Schneider <larsxschneider@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 ci/run-tests.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/ci/run-tests.sh b/ci/run-tests.sh
index f0c743de94..c7aee5b9ff 100755
--- a/ci/run-tests.sh
+++ b/ci/run-tests.sh
@@ -8,3 +8,7 @@
 mkdir -p $HOME/travis-cache
 ln -s $HOME/travis-cache/.prove t/.prove
 make --quiet test
+if test "$jobname" = "linux-gcc"
+then
+	GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test
+fi
-- 
2.16.0.rc1.238.g530d649a79


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-07 22:30     ` [PATCH v3 1/3] read-cache: fix reading the shared index for other repos Thomas Gummerer
@ 2018-01-08 10:41       ` Duy Nguyen
  2018-01-08 22:41         ` Thomas Gummerer
  2018-01-08 23:38         ` Brandon Williams
  2018-01-16 21:42       ` Brandon Williams
  2018-01-19 21:57       ` Junio C Hamano
  2 siblings, 2 replies; 94+ messages in thread
From: Duy Nguyen @ 2018-01-08 10:41 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Lars Schneider, Brandon Williams, Jeff King,
	Junio C Hamano, SZEDER Gábor

On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, const char *path)
>                 split_index->base = xcalloc(1, sizeof(*split_index->base));
>
>         base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> -       base_path = git_path("sharedindex.%s", base_sha1_hex);
> +       base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);

Personally I prefer the repo_git_path() from v2 (sorry I was away and
could not comment anything). The thing is, git_path() and friends
could do some path translation underneath to support multiple
worktrees. Think of the given path here as a "virtual path" that may
be translated to something else, not exactly <git_dir> + "/" +
"sharedindex.%s". But in practice, we're not breaking the relationship
between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing
manual path transformation here is fine.

What about the other git_path() in this file? With patch applied I still get

> ~/w/git/temp $ git grep git_path read-cache.c
read-cache.c:           shared_index_path = git_path("%s", de->d_name);
read-cache.c:   temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
read-cache.c:                         git_path("sharedindex.%s",
sha1_to_hex(si->base->sha1)));
read-cache.c:           const char *shared_index = git_path("sharedindex.%s",

I suppose submodule has not triggered any of these code paths yet. Not
sure if we should deal with them now or wait until later.

Perhaps if we add a "struct repository *" pointer inside index_state,
we could retrieve back the_repository (or others) and call
repo_git_path() everywhere without changing index api too much. I
don't know. I like the  'struct repository' concept but couldn't
follow its development so I don't if this is what it should become.
-- 
Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-08 10:41       ` Duy Nguyen
@ 2018-01-08 22:41         ` Thomas Gummerer
  2018-01-13 22:33           ` Thomas Gummerer
  2018-01-08 23:38         ` Brandon Williams
  1 sibling, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-08 22:41 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Lars Schneider, Brandon Williams, Jeff King,
	Junio C Hamano, SZEDER Gábor

On 01/08, Duy Nguyen wrote:
> On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, const char *path)
> >                 split_index->base = xcalloc(1, sizeof(*split_index->base));
> >
> >         base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> > -       base_path = git_path("sharedindex.%s", base_sha1_hex);
> > +       base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);
> 
> Personally I prefer the repo_git_path() from v2 (sorry I was away and
> could not comment anything).

It felt slightly nicer to me as well, but it threw up a bunch of
questions about how worktrees will fit together with struct
repository.  As I don't feel very strongly about this either way I
decided to go with what Brandon suggested as an alternative, which
allows us to defer that decision.  I'd be happy to revert this to the
way I had it in v2, but I don't want to drag the discussion on too
long either, as this series does fix some real regressions.

> The thing is, git_path() and friends
> could do some path translation underneath to support multiple
> worktrees. Think of the given path here as a "virtual path" that may
> be translated to something else, not exactly <git_dir> + "/" +
> "sharedindex.%s". But in practice, we're not breaking the relationship
> between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing
> manual path transformation here is fine.
>
> What about the other git_path() in this file? With patch applied I still get
> 
> > ~/w/git/temp $ git grep git_path read-cache.c
> read-cache.c:           shared_index_path = git_path("%s", de->d_name);
> read-cache.c:   temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
> read-cache.c:                         git_path("sharedindex.%s",
> sha1_to_hex(si->base->sha1)));
> read-cache.c:           const char *shared_index = git_path("sharedindex.%s",

Good point, I hadn't looked at these, I only looked at the current
test failures.  I'm going to be away for the rest of the week, but
I'll have a look at them when I come back.

> I suppose submodule has not triggered any of these code paths yet. Not
> sure if we should deal with them now or wait until later.
> 
> Perhaps if we add a "struct repository *" pointer inside index_state,
> we could retrieve back the_repository (or others) and call
> repo_git_path() everywhere without changing index api too much. I
> don't know. I like the  'struct repository' concept but couldn't
> follow its development so I don't if this is what it should become.

Interesting.  I didn't follow the development of  struct repository
too closely either, so I'm not sure.  Brandon might have more of an
opinion on that? :)

> -- 
> Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-08 10:41       ` Duy Nguyen
  2018-01-08 22:41         ` Thomas Gummerer
@ 2018-01-08 23:38         ` Brandon Williams
  2018-01-09  1:24           ` Duy Nguyen
  1 sibling, 1 reply; 94+ messages in thread
From: Brandon Williams @ 2018-01-08 23:38 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Thomas Gummerer, Git Mailing List, Lars Schneider, Jeff King,
	Junio C Hamano, SZEDER Gábor

On 01/08, Duy Nguyen wrote:
> On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, const char *path)
> >                 split_index->base = xcalloc(1, sizeof(*split_index->base));
> >
> >         base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> > -       base_path = git_path("sharedindex.%s", base_sha1_hex);
> > +       base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);
> 
> Personally I prefer the repo_git_path() from v2 (sorry I was away and
> could not comment anything). The thing is, git_path() and friends
> could do some path translation underneath to support multiple
> worktrees. Think of the given path here as a "virtual path" that may
> be translated to something else, not exactly <git_dir> + "/" +
> "sharedindex.%s". But in practice, we're not breaking the relationship
> between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing
> manual path transformation here is fine.

My biggest complaint about v2 is that we still don't quite know the best
way to integrate worktrees and struct repository yet so I was very
reluctant to start having them interact in the way v2 was using them
together.  I'm very much in favor of this version (v3) as each worktree
can explicitly provide their gitdir to be used to determine where to
read the shared index file without having to replicate a struct
repository for each.

> 
> What about the other git_path() in this file? With patch applied I still get
> 
> > ~/w/git/temp $ git grep git_path read-cache.c
> read-cache.c:           shared_index_path = git_path("%s", de->d_name);
> read-cache.c:   temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
> read-cache.c:                         git_path("sharedindex.%s",
> sha1_to_hex(si->base->sha1)));
> read-cache.c:           const char *shared_index = git_path("sharedindex.%s",
> 
> I suppose submodule has not triggered any of these code paths yet. Not
> sure if we should deal with them now or wait until later.
> 
> Perhaps if we add a "struct repository *" pointer inside index_state,
> we could retrieve back the_repository (or others) and call
> repo_git_path() everywhere without changing index api too much. I
> don't know. I like the  'struct repository' concept but couldn't
> follow its development so I don't if this is what it should become.

I'm not too keen on having an index_state struct contain a back pointer
to a repository struct.  I do think that we may want to have worktree
structs contain a back pointer to the struct repository they correspond
to.  That way there is only one instance of the repository (and
object-store once that gets integrated) yet multiple worktrees.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-08 23:38         ` Brandon Williams
@ 2018-01-09  1:24           ` Duy Nguyen
  0 siblings, 0 replies; 94+ messages in thread
From: Duy Nguyen @ 2018-01-09  1:24 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Thomas Gummerer, Git Mailing List, Lars Schneider, Jeff King,
	Junio C Hamano, SZEDER Gábor

On Tue, Jan 9, 2018 at 6:38 AM, Brandon Williams <bmwill@google.com> wrote:
> On 01/08, Duy Nguyen wrote:
>> On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> > @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, const char *path)
>> >                 split_index->base = xcalloc(1, sizeof(*split_index->base));
>> >
>> >         base_sha1_hex = sha1_to_hex(split_index->base_sha1);
>> > -       base_path = git_path("sharedindex.%s", base_sha1_hex);
>> > +       base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);
>>
>> Personally I prefer the repo_git_path() from v2 (sorry I was away and
>> could not comment anything). The thing is, git_path() and friends
>> could do some path translation underneath to support multiple
>> worktrees. Think of the given path here as a "virtual path" that may
>> be translated to something else, not exactly <git_dir> + "/" +
>> "sharedindex.%s". But in practice, we're not breaking the relationship
>> between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing
>> manual path transformation here is fine.
>
> My biggest complaint about v2 is that we still don't quite know the best
> way to integrate worktrees and struct repository yet so I was very
> reluctant to start having them interact in the way v2 was using them
> together.

This will be on my todo list (after I have finished with
nd/worktree-move which has been on 'pu' for too long). I'm ok with v3
then.

> I'm very much in favor of this version (v3) as each worktree
> can explicitly provide their gitdir to be used to determine where to
> read the shared index file without having to replicate a struct
> repository for each.

Isn't that the end goal though (I vaguely recall this discussion when
'struct repository' was an idea), that we will pass struct repository
around [1] rather than relying on global objects.

[1] or in this case struct worktree-something because the index
belongs more to a worktree than the object/refs database.

>> What about the other git_path() in this file? With patch applied I still get
>>
>> > ~/w/git/temp $ git grep git_path read-cache.c
>> read-cache.c:           shared_index_path = git_path("%s", de->d_name);
>> read-cache.c:   temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
>> read-cache.c:                         git_path("sharedindex.%s",
>> sha1_to_hex(si->base->sha1)));
>> read-cache.c:           const char *shared_index = git_path("sharedindex.%s",
>>
>> I suppose submodule has not triggered any of these code paths yet. Not
>> sure if we should deal with them now or wait until later.
>>
>> Perhaps if we add a "struct repository *" pointer inside index_state,
>> we could retrieve back the_repository (or others) and call
>> repo_git_path() everywhere without changing index api too much. I
>> don't know. I like the  'struct repository' concept but couldn't
>> follow its development so I don't if this is what it should become.
>
> I'm not too keen on having an index_state struct contain a back pointer
> to a repository struct.  I do think that we may want to have worktree
> structs contain a back pointer to the struct repository they correspond
> to.  That way there is only one instance of the repository (and
> object-store once that gets integrated) yet multiple worktrees.

Yeah back pointer from worktree struct makes sense.
-- 
Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-08 22:41         ` Thomas Gummerer
@ 2018-01-13 22:33           ` Thomas Gummerer
  0 siblings, 0 replies; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-13 22:33 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Lars Schneider, Brandon Williams, Jeff King,
	Junio C Hamano, SZEDER Gábor

On 01/08, Thomas Gummerer wrote:
> On 01/08, Duy Nguyen wrote:
> > On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > > @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, const char *path)
> > >                 split_index->base = xcalloc(1, sizeof(*split_index->base));
> > >
> > >         base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> > > -       base_path = git_path("sharedindex.%s", base_sha1_hex);
> > > +       base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);
> > 
> > Personally I prefer the repo_git_path() from v2 (sorry I was away and
> > could not comment anything).
> 
> It felt slightly nicer to me as well, but it threw up a bunch of
> questions about how worktrees will fit together with struct
> repository.  As I don't feel very strongly about this either way I
> decided to go with what Brandon suggested as an alternative, which
> allows us to defer that decision.  I'd be happy to revert this to the
> way I had it in v2, but I don't want to drag the discussion on too
> long either, as this series does fix some real regressions.
> 
> > The thing is, git_path() and friends
> > could do some path translation underneath to support multiple
> > worktrees. Think of the given path here as a "virtual path" that may
> > be translated to something else, not exactly <git_dir> + "/" +
> > "sharedindex.%s". But in practice, we're not breaking the relationship
> > between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing
> > manual path transformation here is fine.
> >
> > What about the other git_path() in this file? With patch applied I still get
> > 
> > > ~/w/git/temp $ git grep git_path read-cache.c
> > read-cache.c:           shared_index_path = git_path("%s", de->d_name);
> > read-cache.c:   temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
> > read-cache.c:                         git_path("sharedindex.%s",
> > sha1_to_hex(si->base->sha1)));
> > read-cache.c:           const char *shared_index = git_path("sharedindex.%s",
> 
> Good point, I hadn't looked at these, I only looked at the current
> test failures.  I'm going to be away for the rest of the week, but
> I'll have a look at them when I come back.
> 
> > I suppose submodule has not triggered any of these code paths yet. Not
> > sure if we should deal with them now or wait until later.

Having had a look at these now, they are all in the write_locked_index
codepath.  We should probably have something like
'repo_write_locked_index()' for this.  But that probably requires a
bit more work/discussion to see what this should look like.  I'd
rather keep this patch series focused on the current breakages, and
deal with that in a separate patch series.

While looking at this, I did find another breakage in the split index
code, which I'll send as 4/3.

> > Perhaps if we add a "struct repository *" pointer inside index_state,
> > we could retrieve back the_repository (or others) and call
> > repo_git_path() everywhere without changing index api too much. I
> > don't know. I like the  'struct repository' concept but couldn't
> > follow its development so I don't if this is what it should become.
> 
> Interesting.  I didn't follow the development of  struct repository
> too closely either, so I'm not sure.  Brandon might have more of an
> opinion on that? :)
> 
> > -- 
> > Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v3 4/3] read-cache: don't try to write index if we can't write shared index
  2018-01-07 22:30   ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
                       ` (2 preceding siblings ...)
  2018-01-07 22:30     ` [PATCH v3 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
@ 2018-01-13 22:37     ` Thomas Gummerer
  2018-01-14  9:36       ` Duy Nguyen
  2018-01-18 21:53     ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
  4 siblings, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-13 22:37 UTC (permalink / raw)
  To: git
  Cc: Lars Schneider, Brandon Williams, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

In a0a967568e ("update-index --split-index: do not split if $GIT_DIR is
read only", 2014-06-13), we tried to make sure we can still write an
index, even if the shared index can not be written.

We did so by just calling 'do_write_locked_index()' from
'write_shared_index()'.  'do_write_locked_index()' always at least
closes the tempfile nowadays, and used to close or commit the lockfile
if COMMIT_LOCK or CLOSE_LOCK were given at the time this feature was
introduced.  COMMIT_LOCK or CLOSE_LOCK is passed in by most callers of
'write_locked_index()'.

After calling 'write_shared_index()', we call 'write_split_index()',
which calls 'do_write_locked_index()' again, which then tries to use the
closed lockfile again, but in fact fails to do so as it's already
closed.

In the current version, git will in fact segfault if it can't create a
new file in $gitdir, and this feature seems to never have worked in the
first place.

Ever since introducing the split index feature, nobody has complained
about this failing, and it really just papers over repositories that
will sooner or later need fixing anyway.

Therefore just make being unable to write the split index a proper
error, and have users fix their repositories instead of trying (but
failing) to paper over the error.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 read-cache.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index d13ce83794..a9c8facdfd 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2471,18 +2471,15 @@ static int clean_shared_index_files(const char *current_hex)
 	return 0;
 }
 
-static int write_shared_index(struct index_state *istate,
-			      struct lock_file *lock, unsigned flags)
+static int write_shared_index(struct index_state *istate)
 {
 	struct tempfile *temp;
 	struct split_index *si = istate->split_index;
 	int ret;
 
 	temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
-	if (!temp) {
-		hashclr(si->base_sha1);
-		return do_write_locked_index(istate, lock, flags);
-	}
+	if (!temp)
+		return error("cannot create new shared index");
 	move_cache_to_base_index(istate);
 	ret = do_write_index(si->base, temp, 1);
 	if (ret) {
@@ -2565,7 +2562,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	new_shared_index = istate->cache_changed & SPLIT_INDEX_ORDERED;
 
 	if (new_shared_index) {
-		ret = write_shared_index(istate, lock, flags);
+		ret = write_shared_index(istate);
 		if (ret)
 			goto out;
 	}
-- 
2.16.0.rc2.280.g09355b536d


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 4/3] read-cache: don't try to write index if we can't write shared index
  2018-01-13 22:37     ` [PATCH v3 4/3] read-cache: don't try to write index if we can't write shared index Thomas Gummerer
@ 2018-01-14  9:36       ` Duy Nguyen
  2018-01-14 10:18         ` [PATCH 1/3] read-cache.c: change type of "temp" in write_shared_index() Nguyễn Thái Ngọc Duy
  2018-01-14 14:29         ` [PATCH v3 4/3] read-cache: don't try to write index " Thomas Gummerer
  0 siblings, 2 replies; 94+ messages in thread
From: Duy Nguyen @ 2018-01-14  9:36 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Lars Schneider, Brandon Williams, Jeff King,
	Junio C Hamano, SZEDER Gábor

On Sun, Jan 14, 2018 at 5:37 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> In a0a967568e ("update-index --split-index: do not split if $GIT_DIR is
> read only", 2014-06-13), we tried to make sure we can still write an
> index, even if the shared index can not be written.
>
> We did so by just calling 'do_write_locked_index()' from
> 'write_shared_index()'.  'do_write_locked_index()' always at least
> closes the tempfile nowadays, and used to close or commit the lockfile
> if COMMIT_LOCK or CLOSE_LOCK were given at the time this feature was
> introduced.  COMMIT_LOCK or CLOSE_LOCK is passed in by most callers of
> 'write_locked_index()'.
>
> After calling 'write_shared_index()', we call 'write_split_index()',
> which calls 'do_write_locked_index()' again, which then tries to use the
> closed lockfile again, but in fact fails to do so as it's already
> closed.
>
> In the current version, git will in fact segfault if it can't create a
> new file in $gitdir, and this feature seems to never have worked in the
> first place.
>
> Ever since introducing the split index feature, nobody has complained
> about this failing, and it really just papers over repositories that
> will sooner or later need fixing anyway.

Actually there's one valid case for this: you're accessing a read-only
$GIT_DIR (.e.g maybe from a web server cgi script which may be run by
user nobody or something) and creating a temporary index _outside_
$GIT_DIR. I used to do this when I wanted to do "git grep" on some
SHA-1 a couple times. Doing "git grep <SHA-1>" directly (a couple
times) pays full cost for walking trees. If you prepare an index
first, you pay it only once.

> Therefore just make being unable to write the split index a proper
> error, and have users fix their repositories instead of trying (but
> failing) to paper over the error.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  read-cache.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index d13ce83794..a9c8facdfd 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2471,18 +2471,15 @@ static int clean_shared_index_files(const char *current_hex)
>         return 0;
>  }
>
> -static int write_shared_index(struct index_state *istate,
> -                             struct lock_file *lock, unsigned flags)
> +static int write_shared_index(struct index_state *istate)
>  {
>         struct tempfile *temp;
>         struct split_index *si = istate->split_index;
>         int ret;
>
>         temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
> -       if (!temp) {
> -               hashclr(si->base_sha1);
> -               return do_write_locked_index(istate, lock, flags);

I think this code tries to do what's done near the beginning of
write_locked_index() where we also bail out early:

-- 8< --
        if (!si || alternate_index_output ||
            (istate->cache_changed & ~EXTMASK)) {
                if (si)
                        hashclr(si->base_sha1);
                ret = do_write_locked_index(istate, lock, flags);
                goto out;
        }
-- 8< --

the only difference is it does not realize that it can't do "goto
out;" like that code unless something goes wrong. I'll try to prepare
a patch that move tempfile creation out of write_shared_index()
instead. Patches coming in a bit..
-- 
Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 1/3] read-cache.c: change type of "temp" in write_shared_index()
  2018-01-14  9:36       ` Duy Nguyen
@ 2018-01-14 10:18         ` Nguyễn Thái Ngọc Duy
  2018-01-14 10:18           ` [PATCH 2/3] read-cache.c: move tempfile creation/cleanup out of write_shared_index Nguyễn Thái Ngọc Duy
  2018-01-14 10:18           ` [PATCH 3/3] read-cache: don't write index twice if we can't write shared index Nguyễn Thái Ngọc Duy
  2018-01-14 14:29         ` [PATCH v3 4/3] read-cache: don't try to write index " Thomas Gummerer
  1 sibling, 2 replies; 94+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-01-14 10:18 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, larsxschneider, bmwill, peff, Junio C Hamano,
	szeder.dev, Nguyễn Thái Ngọc Duy

This local variable 'temp' will be passed in from the caller in the next
patch. To reduce patch noise, let's change its type now while it's still
a local variable and get all the trival conversion out of the next patch.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 2eb81a66b9..536086e1fe 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2474,30 +2474,32 @@ static int clean_shared_index_files(const char *current_hex)
 static int write_shared_index(struct index_state *istate,
 			      struct lock_file *lock, unsigned flags)
 {
-	struct tempfile *temp;
+	struct tempfile *real_temp;
+	struct tempfile **temp = &real_temp;
 	struct split_index *si = istate->split_index;
 	int ret;
 
-	temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
-	if (!temp) {
+	real_temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
+	if (!real_temp) {
 		hashclr(si->base_sha1);
 		return do_write_locked_index(istate, lock, flags);
 	}
+	temp = &real_temp;
 	move_cache_to_base_index(istate);
-	ret = do_write_index(si->base, temp, 1);
+	ret = do_write_index(si->base, *temp, 1);
 	if (ret) {
-		delete_tempfile(&temp);
+		delete_tempfile(temp);
 		return ret;
 	}
-	ret = adjust_shared_perm(get_tempfile_path(temp));
+	ret = adjust_shared_perm(get_tempfile_path(*temp));
 	if (ret) {
 		int save_errno = errno;
-		error("cannot fix permission bits on %s", get_tempfile_path(temp));
-		delete_tempfile(&temp);
+		error("cannot fix permission bits on %s", get_tempfile_path(*temp));
+		delete_tempfile(temp);
 		errno = save_errno;
 		return ret;
 	}
-	ret = rename_tempfile(&temp,
+	ret = rename_tempfile(temp,
 			      git_path("sharedindex.%s", sha1_to_hex(si->base->sha1)));
 	if (!ret) {
 		hashcpy(si->base_sha1, si->base->sha1);
-- 
2.15.1.600.g899a5f85c6


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 2/3] read-cache.c: move tempfile creation/cleanup out of write_shared_index
  2018-01-14 10:18         ` [PATCH 1/3] read-cache.c: change type of "temp" in write_shared_index() Nguyễn Thái Ngọc Duy
@ 2018-01-14 10:18           ` Nguyễn Thái Ngọc Duy
  2018-01-14 10:18           ` [PATCH 3/3] read-cache: don't write index twice if we can't write shared index Nguyễn Thái Ngọc Duy
  1 sibling, 0 replies; 94+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-01-14 10:18 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, larsxschneider, bmwill, peff, Junio C Hamano,
	szeder.dev, Nguyễn Thái Ngọc Duy

For one thing, we have more consistent cleanup procedure now and always
keep errno intact.

The real purpose is the ability to break out of write_locked_index()
early when mks_tempfile() fails in the next patch. It's more awkward to
do it if this mks_tempfile() is still inside write_shared_index().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 536086e1fe..c568643f55 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2472,31 +2472,18 @@ static int clean_shared_index_files(const char *current_hex)
 }
 
 static int write_shared_index(struct index_state *istate,
-			      struct lock_file *lock, unsigned flags)
+			      struct tempfile **temp)
 {
-	struct tempfile *real_temp;
-	struct tempfile **temp = &real_temp;
 	struct split_index *si = istate->split_index;
 	int ret;
 
-	real_temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
-	if (!real_temp) {
-		hashclr(si->base_sha1);
-		return do_write_locked_index(istate, lock, flags);
-	}
-	temp = &real_temp;
 	move_cache_to_base_index(istate);
 	ret = do_write_index(si->base, *temp, 1);
-	if (ret) {
-		delete_tempfile(temp);
+	if (ret)
 		return ret;
-	}
 	ret = adjust_shared_perm(get_tempfile_path(*temp));
 	if (ret) {
-		int save_errno = errno;
 		error("cannot fix permission bits on %s", get_tempfile_path(*temp));
-		delete_tempfile(temp);
-		errno = save_errno;
 		return ret;
 	}
 	ret = rename_tempfile(temp,
@@ -2567,7 +2554,21 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	new_shared_index = istate->cache_changed & SPLIT_INDEX_ORDERED;
 
 	if (new_shared_index) {
-		ret = write_shared_index(istate, lock, flags);
+		struct tempfile *temp;
+		int saved_errno;
+
+		temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
+		if (!temp) {
+			hashclr(si->base_sha1);
+			ret = do_write_locked_index(istate, lock, flags);
+		} else
+			ret = write_shared_index(istate, &temp);
+
+		saved_errno = errno;
+		if (is_tempfile_active(temp))
+			delete_tempfile(&temp);
+		errno = saved_errno;
+
 		if (ret)
 			goto out;
 	}
-- 
2.15.1.600.g899a5f85c6


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-14 10:18         ` [PATCH 1/3] read-cache.c: change type of "temp" in write_shared_index() Nguyễn Thái Ngọc Duy
  2018-01-14 10:18           ` [PATCH 2/3] read-cache.c: move tempfile creation/cleanup out of write_shared_index Nguyễn Thái Ngọc Duy
@ 2018-01-14 10:18           ` Nguyễn Thái Ngọc Duy
  2018-01-18 11:36             ` SZEDER Gábor
  1 sibling, 1 reply; 94+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-01-14 10:18 UTC (permalink / raw)
  To: git
  Cc: t.gummerer, larsxschneider, bmwill, peff, Junio C Hamano,
	szeder.dev, Nguyễn Thái Ngọc Duy

In a0a967568e ("update-index --split-index: do not split if $GIT_DIR is
read only", 2014-06-13), we tried to make sure we can still write an
index, even if the shared index can not be written.

We did so by just calling 'do_write_locked_index()' just before
'write_shared_index()'.  'do_write_locked_index()' always at least
closes the tempfile nowadays, and used to close or commit the lockfile
if COMMIT_LOCK or CLOSE_LOCK were given at the time this feature was
introduced.  COMMIT_LOCK or CLOSE_LOCK is passed in by most callers of
'write_locked_index()'.

After calling 'write_shared_index()', we call 'write_split_index()',
which calls 'do_write_locked_index()' again, which then tries to use the
closed lockfile again, but in fact fails to do so as it's already
closed. This eventually leads to a segfault.

Make sure to write the main index only once.

[nd: most of the commit message and investigation done by Thomas, I only
tweaked the solution a bit]

Helped-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c           |  5 +++--
 t/t1700-split-index.sh | 19 +++++++++++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index c568643f55..c58c0a978a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2561,8 +2561,9 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		if (!temp) {
 			hashclr(si->base_sha1);
 			ret = do_write_locked_index(istate, lock, flags);
-		} else
-			ret = write_shared_index(istate, &temp);
+			goto out;
+		}
+		ret = write_shared_index(istate, &temp);
 
 		saved_errno = errno;
 		if (is_tempfile_active(temp))
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index af9b847761..5494389dbb 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -401,4 +401,23 @@ done <<\EOF
 0642 -rw-r---w-
 EOF
 
+test_expect_success POSIXPERM 'graceful handling splitting index is not allowed' '
+	test_create_repo ro &&
+	(
+		cd ro &&
+		test_commit initial &&
+		git update-index --split-index &&
+		test -f .git/sharedindex.*
+	) &&
+	test_when_finished "chmod -R u+w ro" &&
+	chmod -R u-w ro &&
+	cp ro/.git/index new-index &&
+	GIT_INDEX_FILE="$(pwd)/new-index" git -C ro update-index --split-index &&
+	chmod -R u+w ro &&
+	rm ro/.git/sharedindex.* &&
+	GIT_INDEX_FILE=new-index git ls-files >actual &&
+	echo initial.t >expected &&
+	test_cmp expected actual
+'
+
 test_done
-- 
2.15.1.600.g899a5f85c6


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 4/3] read-cache: don't try to write index if we can't write shared index
  2018-01-14  9:36       ` Duy Nguyen
  2018-01-14 10:18         ` [PATCH 1/3] read-cache.c: change type of "temp" in write_shared_index() Nguyễn Thái Ngọc Duy
@ 2018-01-14 14:29         ` " Thomas Gummerer
  1 sibling, 0 replies; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-14 14:29 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Lars Schneider, Brandon Williams, Jeff King,
	Junio C Hamano, SZEDER Gábor

On 01/14, Duy Nguyen wrote:
> On Sun, Jan 14, 2018 at 5:37 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > In a0a967568e ("update-index --split-index: do not split if $GIT_DIR is
> > read only", 2014-06-13), we tried to make sure we can still write an
> > index, even if the shared index can not be written.
> >
> > We did so by just calling 'do_write_locked_index()' from
> > 'write_shared_index()'.  'do_write_locked_index()' always at least
> > closes the tempfile nowadays, and used to close or commit the lockfile
> > if COMMIT_LOCK or CLOSE_LOCK were given at the time this feature was
> > introduced.  COMMIT_LOCK or CLOSE_LOCK is passed in by most callers of
> > 'write_locked_index()'.
> >
> > After calling 'write_shared_index()', we call 'write_split_index()',
> > which calls 'do_write_locked_index()' again, which then tries to use the
> > closed lockfile again, but in fact fails to do so as it's already
> > closed.
> >
> > In the current version, git will in fact segfault if it can't create a
> > new file in $gitdir, and this feature seems to never have worked in the
> > first place.
> >
> > Ever since introducing the split index feature, nobody has complained
> > about this failing, and it really just papers over repositories that
> > will sooner or later need fixing anyway.
> 
> Actually there's one valid case for this: you're accessing a read-only
> $GIT_DIR (.e.g maybe from a web server cgi script which may be run by
> user nobody or something) and creating a temporary index _outside_
> $GIT_DIR. I used to do this when I wanted to do "git grep" on some
> SHA-1 a couple times. Doing "git grep <SHA-1>" directly (a couple
> times) pays full cost for walking trees. If you prepare an index
> first, you pay it only once.

Makes sense, I didn't realize that usecase, thanks!

> > Therefore just make being unable to write the split index a proper
> > error, and have users fix their repositories instead of trying (but
> > failing) to paper over the error.
> >
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >  read-cache.c | 11 ++++-------
> >  1 file changed, 4 insertions(+), 7 deletions(-)
> >
> > diff --git a/read-cache.c b/read-cache.c
> > index d13ce83794..a9c8facdfd 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -2471,18 +2471,15 @@ static int clean_shared_index_files(const char *current_hex)
> >         return 0;
> >  }
> >
> > -static int write_shared_index(struct index_state *istate,
> > -                             struct lock_file *lock, unsigned flags)
> > +static int write_shared_index(struct index_state *istate)
> >  {
> >         struct tempfile *temp;
> >         struct split_index *si = istate->split_index;
> >         int ret;
> >
> >         temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
> > -       if (!temp) {
> > -               hashclr(si->base_sha1);
> > -               return do_write_locked_index(istate, lock, flags);
> 
> I think this code tries to do what's done near the beginning of
> write_locked_index() where we also bail out early:
> 
> -- 8< --
>         if (!si || alternate_index_output ||
>             (istate->cache_changed & ~EXTMASK)) {
>                 if (si)
>                         hashclr(si->base_sha1);
>                 ret = do_write_locked_index(istate, lock, flags);
>                 goto out;
>         }
> -- 8< --
> 
> the only difference is it does not realize that it can't do "goto
> out;" like that code unless something goes wrong. I'll try to prepare
> a patch that move tempfile creation out of write_shared_index()
> instead. Patches coming in a bit..

Thanks for fixing this in a nicer way :)

> -- 
> Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-07 22:30     ` [PATCH v3 1/3] read-cache: fix reading the shared index for other repos Thomas Gummerer
  2018-01-08 10:41       ` Duy Nguyen
@ 2018-01-16 21:42       ` Brandon Williams
  2018-01-17  0:16         ` Duy Nguyen
  2018-01-19 21:57       ` Junio C Hamano
  2 siblings, 1 reply; 94+ messages in thread
From: Brandon Williams @ 2018-01-16 21:42 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Lars Schneider, Jeff King, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

On 01/07, Thomas Gummerer wrote:
> read_index_from() takes a path argument for the location of the index
> file.  For reading the shared index in split index mode however it just
> ignores that path argument, and reads it from the gitdir of the current
> repository.
> 
> This works as long as an index in the_repository is read.  Once that
> changes, such as when we read the index of a submodule, or of a
> different working tree than the current one, the gitdir of
> the_repository will no longer contain the appropriate shared index,
> and git will fail to read it.
> 
> For example t3007-ls-files-recurse-submodules.sh was broken with
> GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
> object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
> broken in a similar manner, probably by introducing struct repository
> there, although I didn't track down the exact commit for that.
> 
> be489d02d2 ("revision.c: --indexed-objects add objects from all
> worktrees", 2017-08-23) breaks with split index mode in a similar
> manner, not erroring out when it can't read the index, but instead
> carrying on with pruning, without taking the index of the worktree into
> account.
> 
> Fix this by passing an additional gitdir parameter to read_index_from,
> to indicate where it should look for and read the shared index from.
> 
> read_cache_from() defaults to using the gitdir of the_repository.  As it
> is mostly a convenience macro, having to pass get_git_dir() for every
> call seems overkill, and if necessary users can have more control by
> using read_index_from().

I'm not saying we need to change this again but I got to thinking about
what the root cause for this bug is and I think it's a design flaw in
how split index is implemented.  IIUC Split index is an index extension
that can be enabled to limit the size of the index file that is written
when making changes to the index.  It breaks the index into two pieces,
index (which contains only changes) and sharedindex.XXXXX (which
contains unchanged information) where 'XXXXX' is a value found in the
index file.  If we don't do anything fancy then these two files live
next to one another in a repository's git directory at $GIT_DIR/index
and $GIT_DIR/sharedindex.XXXXX.  This seems to work all well and fine
except that this isn't always the case and the read_index_from function
takes this into account by enabling a caller to specify a path to where
the index file is located.  We can do this by specifying the index file
we want to use by setting GIT_INDEX_FILE.

Being able to specify an index file via the environment is a feature
that has existed for a very long time (one that I personally think makes
things difficult because of things like additions like the sharedindex)
and I don't think was taken into account when introducing the split
index extension.  In this case if i were to specify a location of an
index file in my home directory '~/index' and be using the split index
feature then the corresponding sharedindex file would live in my
repository's git directory '~/project/.git/sharedindex.XXXXX'.  So the
sharedindex file is always located relative to the project's git
directory and not the index file itself, which is kind of confusing.
Maybe a better design would be to have the sharedindex file located
relative to the index file.

Anyway, some food for thought.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-16 21:42       ` Brandon Williams
@ 2018-01-17  0:16         ` Duy Nguyen
  2018-01-17  0:32           ` Brandon Williams
  2018-01-17 18:16           ` Jonathan Nieder
  0 siblings, 2 replies; 94+ messages in thread
From: Duy Nguyen @ 2018-01-17  0:16 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Thomas Gummerer, Git Mailing List, Lars Schneider, Jeff King,
	Junio C Hamano, SZEDER Gábor

On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams <bmwill@google.com> wrote:
> On 01/07, Thomas Gummerer wrote:
>> read_index_from() takes a path argument for the location of the index
>> file.  For reading the shared index in split index mode however it just
>> ignores that path argument, and reads it from the gitdir of the current
>> repository.
>>
>> This works as long as an index in the_repository is read.  Once that
>> changes, such as when we read the index of a submodule, or of a
>> different working tree than the current one, the gitdir of
>> the_repository will no longer contain the appropriate shared index,
>> and git will fail to read it.
>>
>> For example t3007-ls-files-recurse-submodules.sh was broken with
>> GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
>> object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
>> broken in a similar manner, probably by introducing struct repository
>> there, although I didn't track down the exact commit for that.
>>
>> be489d02d2 ("revision.c: --indexed-objects add objects from all
>> worktrees", 2017-08-23) breaks with split index mode in a similar
>> manner, not erroring out when it can't read the index, but instead
>> carrying on with pruning, without taking the index of the worktree into
>> account.
>>
>> Fix this by passing an additional gitdir parameter to read_index_from,
>> to indicate where it should look for and read the shared index from.
>>
>> read_cache_from() defaults to using the gitdir of the_repository.  As it
>> is mostly a convenience macro, having to pass get_git_dir() for every
>> call seems overkill, and if necessary users can have more control by
>> using read_index_from().
>
> I'm not saying we need to change this again but I got to thinking about
> what the root cause for this bug is and I think it's a design flaw in
> how split index is implemented.  IIUC Split index is an index extension
> that can be enabled to limit the size of the index file that is written
> when making changes to the index.  It breaks the index into two pieces,
> index (which contains only changes) and sharedindex.XXXXX (which
> contains unchanged information) where 'XXXXX' is a value found in the
> index file.  If we don't do anything fancy then these two files live
> next to one another in a repository's git directory at $GIT_DIR/index
> and $GIT_DIR/sharedindex.XXXXX.  This seems to work all well and fine
> except that this isn't always the case and the read_index_from function
> takes this into account by enabling a caller to specify a path to where
> the index file is located.  We can do this by specifying the index file
> we want to use by setting GIT_INDEX_FILE.
>
> Being able to specify an index file via the environment is a feature
> that has existed for a very long time (one that I personally think makes
> things difficult because of things like additions like the sharedindex)
> and I don't think was taken into account when introducing the split
> index extension.

It was (partly because I did use GIT_INDEX_FILE on occasions).

> In this case if i were to specify a location of an
> index file in my home directory '~/index' and be using the split index
> feature then the corresponding sharedindex file would live in my
> repository's git directory '~/project/.git/sharedindex.XXXXX'.  So the
> sharedindex file is always located relative to the project's git
> directory and not the index file itself, which is kind of confusing.
> Maybe a better design would be to have the sharedindex file located
> relative to the index file.

That adds more problems. Now when you move the index file around you
have to move the shared index file too (think about atomic rename
which we use in plenty of places, we can't achieve that by moving two
files). A new dependency to $GIT_DIR is not that confusing to me, the
index file is useless anyway if you don't have access to
$GIT_DIR/objects. There was always the option to _not_ split the index
when $GIT_INDEX_FILE is specified, I think I did consider that but I
dropped it because we'd lose the performance gain by splitting.

> Anyway, some food for thought.
>
> --
> Brandon Williams



-- 
Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-17  0:16         ` Duy Nguyen
@ 2018-01-17  0:32           ` Brandon Williams
  2018-01-17 18:16           ` Jonathan Nieder
  1 sibling, 0 replies; 94+ messages in thread
From: Brandon Williams @ 2018-01-17  0:32 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Thomas Gummerer, Git Mailing List, Lars Schneider, Jeff King,
	Junio C Hamano, SZEDER Gábor

On 01/17, Duy Nguyen wrote:
> On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams <bmwill@google.com> wrote:
> > On 01/07, Thomas Gummerer wrote:
> >> read_index_from() takes a path argument for the location of the index
> >> file.  For reading the shared index in split index mode however it just
> >> ignores that path argument, and reads it from the gitdir of the current
> >> repository.
> >>
> >> This works as long as an index in the_repository is read.  Once that
> >> changes, such as when we read the index of a submodule, or of a
> >> different working tree than the current one, the gitdir of
> >> the_repository will no longer contain the appropriate shared index,
> >> and git will fail to read it.
> >>
> >> For example t3007-ls-files-recurse-submodules.sh was broken with
> >> GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
> >> object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
> >> broken in a similar manner, probably by introducing struct repository
> >> there, although I didn't track down the exact commit for that.
> >>
> >> be489d02d2 ("revision.c: --indexed-objects add objects from all
> >> worktrees", 2017-08-23) breaks with split index mode in a similar
> >> manner, not erroring out when it can't read the index, but instead
> >> carrying on with pruning, without taking the index of the worktree into
> >> account.
> >>
> >> Fix this by passing an additional gitdir parameter to read_index_from,
> >> to indicate where it should look for and read the shared index from.
> >>
> >> read_cache_from() defaults to using the gitdir of the_repository.  As it
> >> is mostly a convenience macro, having to pass get_git_dir() for every
> >> call seems overkill, and if necessary users can have more control by
> >> using read_index_from().
> >
> > I'm not saying we need to change this again but I got to thinking about
> > what the root cause for this bug is and I think it's a design flaw in
> > how split index is implemented.  IIUC Split index is an index extension
> > that can be enabled to limit the size of the index file that is written
> > when making changes to the index.  It breaks the index into two pieces,
> > index (which contains only changes) and sharedindex.XXXXX (which
> > contains unchanged information) where 'XXXXX' is a value found in the
> > index file.  If we don't do anything fancy then these two files live
> > next to one another in a repository's git directory at $GIT_DIR/index
> > and $GIT_DIR/sharedindex.XXXXX.  This seems to work all well and fine
> > except that this isn't always the case and the read_index_from function
> > takes this into account by enabling a caller to specify a path to where
> > the index file is located.  We can do this by specifying the index file
> > we want to use by setting GIT_INDEX_FILE.
> >
> > Being able to specify an index file via the environment is a feature
> > that has existed for a very long time (one that I personally think makes
> > things difficult because of things like additions like the sharedindex)
> > and I don't think was taken into account when introducing the split
> > index extension.
> 
> It was (partly because I did use GIT_INDEX_FILE on occasions).
> 
> > In this case if i were to specify a location of an
> > index file in my home directory '~/index' and be using the split index
> > feature then the corresponding sharedindex file would live in my
> > repository's git directory '~/project/.git/sharedindex.XXXXX'.  So the
> > sharedindex file is always located relative to the project's git
> > directory and not the index file itself, which is kind of confusing.
> > Maybe a better design would be to have the sharedindex file located
> > relative to the index file.
> 
> That adds more problems. Now when you move the index file around you
> have to move the shared index file too (think about atomic rename
> which we use in plenty of places, we can't achieve that by moving two
> files). A new dependency to $GIT_DIR is not that confusing to me, the
> index file is useless anyway if you don't have access to
> $GIT_DIR/objects. There was always the option to _not_ split the index
> when $GIT_INDEX_FILE is specified, I think I did consider that but I
> dropped it because we'd lose the performance gain by splitting.
> 
> > Anyway, some food for thought.
> >
> > --
> > Brandon Williams
> 
> 
> 
> -- 
> Duy

Thanks for giving me some more context :) makes me feel more confident
in the solution that's already been proposed.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-17  0:16         ` Duy Nguyen
  2018-01-17  0:32           ` Brandon Williams
@ 2018-01-17 18:16           ` Jonathan Nieder
  2018-01-18 10:19             ` Duy Nguyen
  1 sibling, 1 reply; 94+ messages in thread
From: Jonathan Nieder @ 2018-01-17 18:16 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Brandon Williams, Thomas Gummerer, Git Mailing List,
	Lars Schneider, Jeff King, Junio C Hamano, SZEDER Gábor

Hi,

Duy Nguyen wrote:
> On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams <bmwill@google.com> wrote:

>>                                  IIUC Split index is an index extension
>> that can be enabled to limit the size of the index file that is written
>> when making changes to the index.  It breaks the index into two pieces,
>> index (which contains only changes) and sharedindex.XXXXX (which
>> contains unchanged information) where 'XXXXX' is a value found in the
>> index file.  If we don't do anything fancy then these two files live
>> next to one another in a repository's git directory at $GIT_DIR/index
>> and $GIT_DIR/sharedindex.XXXXX.  This seems to work all well and fine
>> except that this isn't always the case and the read_index_from function
>> takes this into account by enabling a caller to specify a path to where
>> the index file is located.  We can do this by specifying the index file
>> we want to use by setting GIT_INDEX_FILE.
[...]
>> In this case if i were to specify a location of an
>> index file in my home directory '~/index' and be using the split index
>> feature then the corresponding sharedindex file would live in my
>> repository's git directory '~/project/.git/sharedindex.XXXXX'.  So the
>> sharedindex file is always located relative to the project's git
>> directory and not the index file itself, which is kind of confusing.
>> Maybe a better design would be to have the sharedindex file located
>> relative to the index file.
>
> That adds more problems. Now when you move the index file around you
> have to move the shared index file too (think about atomic rename
> which we use in plenty of places, we can't achieve that by moving two
> files). A new dependency to $GIT_DIR is not that confusing to me, the
> index file is useless anyway if you don't have access to
> $GIT_DIR/objects. There was always the option to _not_ split the index
> when $GIT_INDEX_FILE is specified, I think I did consider that but I
> dropped it because we'd lose the performance gain by splitting.

Can you elaborate a little more on this?

At first glance, it seems simpler to say "paths in index extensions
named in the index file are relative to the location of the index
file" and to make moving the index file also require moving the shared
index file, exactly as you say.  So at least from a "principle of
least surprise" perspective I would be tempted to go that way.

It's true that we rely on atomic rename in plenty of places, but only
within a directory.  (Filesystem boundaries, NFS, etc mean that atomic
renames across directories are a lost cause.)

Fortunately index files (including temp index files used by scripts)
tend to only be in $GIT_DIR, for exactly that reason.  So I am
wondering if switching to index-file-relative semantics would be an
invasive move and what the pros and cons of such a move are.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-17 18:16           ` Jonathan Nieder
@ 2018-01-18 10:19             ` Duy Nguyen
  0 siblings, 0 replies; 94+ messages in thread
From: Duy Nguyen @ 2018-01-18 10:19 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Brandon Williams, Thomas Gummerer, Git Mailing List,
	Lars Schneider, Jeff King, Junio C Hamano, SZEDER Gábor

On Thu, Jan 18, 2018 at 1:16 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Hi,
>
> Duy Nguyen wrote:
>> On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams <bmwill@google.com> wrote:
>
>>>                                  IIUC Split index is an index extension
>>> that can be enabled to limit the size of the index file that is written
>>> when making changes to the index.  It breaks the index into two pieces,
>>> index (which contains only changes) and sharedindex.XXXXX (which
>>> contains unchanged information) where 'XXXXX' is a value found in the
>>> index file.  If we don't do anything fancy then these two files live
>>> next to one another in a repository's git directory at $GIT_DIR/index
>>> and $GIT_DIR/sharedindex.XXXXX.  This seems to work all well and fine
>>> except that this isn't always the case and the read_index_from function
>>> takes this into account by enabling a caller to specify a path to where
>>> the index file is located.  We can do this by specifying the index file
>>> we want to use by setting GIT_INDEX_FILE.
> [...]
>>> In this case if i were to specify a location of an
>>> index file in my home directory '~/index' and be using the split index
>>> feature then the corresponding sharedindex file would live in my
>>> repository's git directory '~/project/.git/sharedindex.XXXXX'.  So the
>>> sharedindex file is always located relative to the project's git
>>> directory and not the index file itself, which is kind of confusing.
>>> Maybe a better design would be to have the sharedindex file located
>>> relative to the index file.
>>
>> That adds more problems. Now when you move the index file around you
>> have to move the shared index file too (think about atomic rename
>> which we use in plenty of places, we can't achieve that by moving two
>> files). A new dependency to $GIT_DIR is not that confusing to me, the
>> index file is useless anyway if you don't have access to
>> $GIT_DIR/objects. There was always the option to _not_ split the index
>> when $GIT_INDEX_FILE is specified, I think I did consider that but I
>> dropped it because we'd lose the performance gain by splitting.
>
> Can you elaborate a little more on this?
>
> At first glance, it seems simpler to say "paths in index extensions
> named in the index file are relative to the location of the index
> file" and to make moving the index file also require moving the shared
> index file, exactly as you say.  So at least from a "principle of
> least surprise" perspective I would be tempted to go that way.
>
> It's true that we rely on atomic rename in plenty of places, but only
> within a directory.  (Filesystem boundaries, NFS, etc mean that atomic
> renames across directories are a lost cause.)
>
> Fortunately index files (including temp index files used by scripts)
> tend to only be in $GIT_DIR, for exactly that reason.  So I am
> wondering if switching to index-file-relative semantics would be an
> invasive move and what the pros and cons of such a move are.

I think it gets messier. Now you have to move two files. If the first
move succeeds but the second one fails, recovery may involve un-move
the first file, but its old content is already gone. We probably can
get around that. But since the shared index is assumed big and heavy,
I just went with "store it in the place it's going to be and never
move it anywhere ever (until nobody uses it then it's deleted)"
-- 
Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-14 10:18           ` [PATCH 3/3] read-cache: don't write index twice if we can't write shared index Nguyễn Thái Ngọc Duy
@ 2018-01-18 11:36             ` SZEDER Gábor
  2018-01-18 12:47               ` Duy Nguyen
  0 siblings, 1 reply; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-18 11:36 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Git mailing list, Thomas Gummerer, Lars Schneider,
	Brandon Williams, Jeff King, Junio C Hamano

This series, queued as 'nd/shared-index-fix', makes the 32 bit Linux
build job fail on Travis CI.  Unfortunately, all it can tell us about
the failure is this:

  Test Summary Report
  -------------------
  t1700-split-index.sh                             (Wstat: 256 Tests: 23
  Failed: 1)
    Failed test:  23
    Non-zero exit status: 1
  Files=809, Tests=18491, 401 wallclock secs ( 7.22 usr  1.60 sys + 263.16
  cusr 49.58 csys = 321.56 CPU)
  Result: FAIL

because it can't access the test's verbose log due to lack of
permissions.


  https://travis-ci.org/git/git/jobs/329681826#L2074

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-18 11:36             ` SZEDER Gábor
@ 2018-01-18 12:47               ` Duy Nguyen
  2018-01-18 13:29                 ` Jeff King
  2018-01-22 18:27                 ` [PATCH 3/3] read-cache: don't write index twice if we can't write shared index SZEDER Gábor
  0 siblings, 2 replies; 94+ messages in thread
From: Duy Nguyen @ 2018-01-18 12:47 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Git mailing list, Thomas Gummerer, Lars Schneider,
	Brandon Williams, Jeff King, Junio C Hamano

On Thu, Jan 18, 2018 at 6:36 PM, SZEDER Gábor <szeder.dev@gmail.com> wrote:
> This series, queued as 'nd/shared-index-fix', makes the 32 bit Linux
> build job fail on Travis CI.  Unfortunately, all it can tell us about
> the failure is this:
>
>   Test Summary Report
>   -------------------
>   t1700-split-index.sh                             (Wstat: 256 Tests: 23
>   Failed: 1)
>     Failed test:  23
>     Non-zero exit status: 1
>   Files=809, Tests=18491, 401 wallclock secs ( 7.22 usr  1.60 sys + 263.16
>   cusr 49.58 csys = 321.56 CPU)
>   Result: FAIL
>
> because it can't access the test's verbose log due to lack of
> permissions.
>
>
>   https://travis-ci.org/git/git/jobs/329681826#L2074

I may need help getting that log (or even better the trash directory
of t1700). I tried -m32 build, then valgrind on amd64 (because it
didn't work with my 32 build), running the tests with dash and even
the linux32 docker version that comes with "ci" directory. Everything
worked for me.
-- 
Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-18 12:47               ` Duy Nguyen
@ 2018-01-18 13:29                 ` Jeff King
  2018-01-18 13:36                   ` Duy Nguyen
  2018-01-22 18:27                 ` [PATCH 3/3] read-cache: don't write index twice if we can't write shared index SZEDER Gábor
  1 sibling, 1 reply; 94+ messages in thread
From: Jeff King @ 2018-01-18 13:29 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: SZEDER Gábor, Git mailing list, Thomas Gummerer,
	Lars Schneider, Brandon Williams, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]

On Thu, Jan 18, 2018 at 07:47:50PM +0700, Duy Nguyen wrote:

> I may need help getting that log (or even better the trash directory
> of t1700). I tried -m32 build, then valgrind on amd64 (because it
> didn't work with my 32 build), running the tests with dash and even
> the linux32 docker version that comes with "ci" directory. Everything
> worked for me.

It fails for me locally if I run it in the linux32 docker environment.
Here's the end of the "-v -x" output:

  + GIT_INDEX_FILE=/usr/src/git/t/trash directory.t1700-split-index/new-index git -C ro update-index --split-index
  + chmod -R u+w ro
  + rm ro/.git/sharedindex.bbdf63f5a51242904dba2ea929ea4f56fcc340b6
  + GIT_INDEX_FILE=new-index git ls-files
  fatal: .git/sharedindex.bbdf63f5a51242904dba2ea929ea4f56fcc340b6: index file open failed: No such file or directory
  error: last command exited with $?=128
  not ok 23 - graceful handling splitting index is not allowed

I don't know if the trash state will be helpful, but it's attached.

-Peff

[-- Attachment #2: t1700-trash.tar.gz --]
[-- Type: application/gzip, Size: 24979 bytes --]

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-18 13:29                 ` Jeff King
@ 2018-01-18 13:36                   ` Duy Nguyen
  2018-01-18 15:00                     ` Duy Nguyen
  0 siblings, 1 reply; 94+ messages in thread
From: Duy Nguyen @ 2018-01-18 13:36 UTC (permalink / raw)
  To: Jeff King
  Cc: SZEDER Gábor, Git mailing list, Thomas Gummerer,
	Lars Schneider, Brandon Williams, Junio C Hamano

On Thu, Jan 18, 2018 at 8:29 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Jan 18, 2018 at 07:47:50PM +0700, Duy Nguyen wrote:
>
>> I may need help getting that log (or even better the trash directory
>> of t1700). I tried -m32 build, then valgrind on amd64 (because it
>> didn't work with my 32 build), running the tests with dash and even
>> the linux32 docker version that comes with "ci" directory. Everything
>> worked for me.
>
> It fails for me locally if I run it in the linux32 docker environment.
> Here's the end of the "-v -x" output:
>
>   + GIT_INDEX_FILE=/usr/src/git/t/trash directory.t1700-split-index/new-index git -C ro update-index --split-index
>   + chmod -R u+w ro
>   + rm ro/.git/sharedindex.bbdf63f5a51242904dba2ea929ea4f56fcc340b6
>   + GIT_INDEX_FILE=new-index git ls-files
>   fatal: .git/sharedindex.bbdf63f5a51242904dba2ea929ea4f56fcc340b6: index file open failed: No such file or directory
>   error: last command exited with $?=128
>   not ok 23 - graceful handling splitting index is not allowed
>
> I don't know if the trash state will be helpful, but it's attached.

Thanks. Somehow my way of forcing mks_tempfile() to fail, well..
failed in this environment. I see the same output if I remove "chmod
-R u-w ro". I think I have enough to continue from here.
-- 
Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-18 13:36                   ` Duy Nguyen
@ 2018-01-18 15:00                     ` Duy Nguyen
  2018-01-18 21:37                       ` Jeff King
  0 siblings, 1 reply; 94+ messages in thread
From: Duy Nguyen @ 2018-01-18 15:00 UTC (permalink / raw)
  To: Jeff King
  Cc: SZEDER Gábor, Git mailing list, Thomas Gummerer,
	Lars Schneider, Brandon Williams, Junio C Hamano

On Thu, Jan 18, 2018 at 08:36:32PM +0700, Duy Nguyen wrote:
> On Thu, Jan 18, 2018 at 8:29 PM, Jeff King <peff@peff.net> wrote:
> > On Thu, Jan 18, 2018 at 07:47:50PM +0700, Duy Nguyen wrote:
> >
> >> I may need help getting that log (or even better the trash directory
> >> of t1700). I tried -m32 build, then valgrind on amd64 (because it
> >> didn't work with my 32 build), running the tests with dash and even
> >> the linux32 docker version that comes with "ci" directory. Everything
> >> worked for me.
> >
> > It fails for me locally if I run it in the linux32 docker environment.
> > Here's the end of the "-v -x" output:
> >
> >   + GIT_INDEX_FILE=/usr/src/git/t/trash directory.t1700-split-index/new-index git -C ro update-index --split-index
> >   + chmod -R u+w ro
> >   + rm ro/.git/sharedindex.bbdf63f5a51242904dba2ea929ea4f56fcc340b6
> >   + GIT_INDEX_FILE=new-index git ls-files
> >   fatal: .git/sharedindex.bbdf63f5a51242904dba2ea929ea4f56fcc340b6: index file open failed: No such file or directory
> >   error: last command exited with $?=128
> >   not ok 23 - graceful handling splitting index is not allowed
> >
> > I don't know if the trash state will be helpful, but it's attached.
> 
> Thanks. Somehow my way of forcing mks_tempfile() to fail, well..
> failed in this environment. I see the same output if I remove "chmod
> -R u-w ro". I think I have enough to continue from here.

The test suite was run as root, no wonder why my removing write access
has no effect. I got the test to pass with this, but then it fails
with

    Can't write .prove (Permission denied) at /usr/share/perl/5.22/App/Prove.pm line 542.

Some more chown'ing or chmod'ing is required....

-- 8< --
Subject: [PATCH] ci: don't accidentally run the test suite as root

This script assigns and adds a user named "ci" in a subshell so the
outer CI_USER is not affected. For some reason, CI_USER is actually
empty on Travis linux32 builds. This makes the following "su" useless
and the test suite is run as root.

Some tests may expect file/dir permissions to be respected. But root
rules all and ignores all. That could make tests fail.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 ci/run-linux32-build.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
index c19c50c1c9..92b488ff27 100755
--- a/ci/run-linux32-build.sh
+++ b/ci/run-linux32-build.sh
@@ -21,8 +21,8 @@ linux32 --32bit i386 sh -c '
 # If a host user id is given, then create a user "ci" with the host user
 # id to make everything accessible to the host user.
 HOST_UID=$1 &&
-CI_USER=$USER &&
-test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER) &&
+CI_USER=${USER:-ci} &&
+test -z "$HOST_UID" || useradd -u $HOST_UID $CI_USER &&
 
 # Build and test
 linux32 --32bit i386 su -m -l $CI_USER -c '
-- 
2.16.0.47.g5ff498d35b

-- 8< --

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-18 15:00                     ` Duy Nguyen
@ 2018-01-18 21:37                       ` Jeff King
  2018-01-18 22:32                         ` SZEDER Gábor
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff King @ 2018-01-18 21:37 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: SZEDER Gábor, Git mailing list, Thomas Gummerer,
	Lars Schneider, Brandon Williams, Junio C Hamano

On Thu, Jan 18, 2018 at 10:00:14PM +0700, Duy Nguyen wrote:

> The test suite was run as root, no wonder why my removing write access
> has no effect. I got the test to pass with this, but then it fails
> with
> 
>     Can't write .prove (Permission denied) at /usr/share/perl/5.22/App/Prove.pm line 542.
> 
> Some more chown'ing or chmod'ing is required....

Ah, right. I agree that we probably ought to run the ci as non-root.
However, if the test requires non-root, it probably needs to be marked
with the SANITY prereq.

I also ran into one funny thing: if you run the script with "-i", then
we do not run the test_when_finished block. And therefore the "ro"
directory is left without its write bit, and the next test run fails, as
it cannot "rm -rf" the old trash directory out of the way.

I'm not sure there's a good solution, though. Skipping the
test_when_finished block on a "-i" run is intentional, to let you
inspect the broken state.

> Subject: [PATCH] ci: don't accidentally run the test suite as root
> 
> This script assigns and adds a user named "ci" in a subshell so the
> outer CI_USER is not affected. For some reason, CI_USER is actually
> empty on Travis linux32 builds. This makes the following "su" useless
> and the test suite is run as root.

Are we sure this was the problem on Travis, and it wasn't just an issue
with how I reproduced via docker?

-Peff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 0/3] fixes for split index mode
  2018-01-07 22:30   ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
                       ` (3 preceding siblings ...)
  2018-01-13 22:37     ` [PATCH v3 4/3] read-cache: don't try to write index if we can't write shared index Thomas Gummerer
@ 2018-01-18 21:53     ` Thomas Gummerer
  2018-01-19 18:34       ` Junio C Hamano
  4 siblings, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-18 21:53 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Lars Schneider, Brandon Williams, Jeff King, git,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

Friendly ping on this series now that 2.16 is out :) Is there anything
in this series (up to 3/3, 4/3 can be dropped now that Duy fixed it in
a nicer way) that still needs updating?  It fixes a few bugs in split
index mode with submodules/worktrees, so it would be nice to get this
reviewed/merged.

On 01/07, Thomas Gummerer wrote:
> Thanks Brandon and Lars for comments on the previous round.
> 
> Previous rounds were at <20171210212202.28231-1-t.gummerer@gmail.com>
> and <20171217225122.28941-1-t.gummerer@gmail.com>.
> 
> Changes since the previous round:
> 
> - reworked the patches to no longer try to use struct repository for
>   worktrees, but pass gitdir into read_index_from instead (Thanks
>   Brandon :)).  So the fixes that were in 1/3 and 2/3 previously are
>   now in 1/3
> - 2/3 now fixes t7009 properly.  I thought it was fixed before, but it
>   probably just passed the tests because of the GIT_TEST_SPLIT_INDEX
>   "randomness".
> - The travis job is now only running the 64-bit linux build with split
>   index mode to save even more cycles.
> - As this wasn't picked up anywhere yet, I took the freedom to rebase
>   this onto the current master, which includes sg/travis-fixes, which
>   made it a bit easier for me to test.  If this makes the life harder
>   for anyone reviewing this let me know and I can base it on the same
>   commit previous iterations were based on.
> 
> Thomas Gummerer (3):
>   read-cache: fix reading the shared index for other repos
>   split-index: don't write cache tree with null sha1 entries
>   travis: run tests with GIT_TEST_SPLIT_INDEX
> 
>  cache-tree.c            |  2 +-
>  cache.h                 |  8 +++++---
>  ci/run-linux32-build.sh |  1 +
>  ci/run-tests.sh         |  4 ++++
>  read-cache.c            | 25 ++++++++++++++-----------
>  repository.c            |  2 +-
>  revision.c              |  3 ++-
>  split-index.c           |  2 ++
>  t/t1700-split-index.sh  | 19 +++++++++++++++++++
>  9 files changed, 49 insertions(+), 17 deletions(-)
> 
> -- 
> 2.16.0.rc1.238.g530d649a79
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-18 21:37                       ` Jeff King
@ 2018-01-18 22:32                         ` SZEDER Gábor
  2018-01-19  0:30                           ` Duy Nguyen
  2018-01-22 13:32                           ` [PATCH 0/5] Travis CI: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
  0 siblings, 2 replies; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-18 22:32 UTC (permalink / raw)
  To: Jeff King
  Cc: Duy Nguyen, Git mailing list, Thomas Gummerer, Lars Schneider,
	Brandon Williams, Junio C Hamano

On Thu, Jan 18, 2018 at 10:37 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Jan 18, 2018 at 10:00:14PM +0700, Duy Nguyen wrote:
>
>> The test suite was run as root, no wonder why my removing write access
>> has no effect. I got the test to pass with this, but then it fails
>> with
>>
>>     Can't write .prove (Permission denied) at /usr/share/perl/5.22/App/Prove.pm line 542.
>>
>> Some more chown'ing or chmod'ing is required....

This is the fallout of running the tests as root in the past.  With your
patch 'prove' is run as a non-root user, but the prove state is loaded
from Travis CI's cache, where it has been written as root the last time
around, so now we don't have permissions to (over)write it.

I have patches in the works to address this as well.

>> Subject: [PATCH] ci: don't accidentally run the test suite as root
>>
>> This script assigns and adds a user named "ci" in a subshell so the
>> outer CI_USER is not affected. For some reason, CI_USER is actually
>> empty on Travis linux32 builds. This makes the following "su" useless
>> and the test suite is run as root.
>
> Are we sure this was the problem on Travis, and it wasn't just an issue
> with how I reproduced via docker?

Yes, we are.


Gábor

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-18 22:32                         ` SZEDER Gábor
@ 2018-01-19  0:30                           ` Duy Nguyen
  2018-01-22 13:32                           ` [PATCH 0/5] Travis CI: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
  1 sibling, 0 replies; 94+ messages in thread
From: Duy Nguyen @ 2018-01-19  0:30 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Jeff King, Git mailing list, Thomas Gummerer, Lars Schneider,
	Brandon Williams, Junio C Hamano

On Fri, Jan 19, 2018 at 5:32 AM, SZEDER Gábor <szeder.dev@gmail.com> wrote:
> On Thu, Jan 18, 2018 at 10:37 PM, Jeff King <peff@peff.net> wrote:
>> On Thu, Jan 18, 2018 at 10:00:14PM +0700, Duy Nguyen wrote:
>>
>>> The test suite was run as root, no wonder why my removing write access
>>> has no effect. I got the test to pass with this, but then it fails
>>> with
>>>
>>>     Can't write .prove (Permission denied) at /usr/share/perl/5.22/App/Prove.pm line 542.
>>>
>>> Some more chown'ing or chmod'ing is required....
>
> This is the fallout of running the tests as root in the past.  With your
> patch 'prove' is run as a non-root user, but the prove state is loaded
> from Travis CI's cache, where it has been written as root the last time
> around, so now we don't have permissions to (over)write it.
>
> I have patches in the works to address this as well.

Great. I'll leave it to you then. I will update the test with SANITY
(and a small fix in the test name) and think more about the "-i" Jeff
mentioned.
-- 
Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 0/3] fixes for split index mode
  2018-01-18 21:53     ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
@ 2018-01-19 18:34       ` Junio C Hamano
  2018-01-19 21:11         ` Thomas Gummerer
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2018-01-19 18:34 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Lars Schneider, Brandon Williams, Jeff King, git,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Friendly ping on this series now that 2.16 is out :) Is there anything
> in this series (up to 3/3, 4/3 can be dropped now that Duy fixed it in
> a nicer way) that still needs updating?  It fixes a few bugs in split
> index mode with submodules/worktrees, so it would be nice to get this
> reviewed/merged.

I was wondering about the same thing.  Especially it wasn't very
clear to me what Duy's replacement was meant to replace and how well
it was supposed to work with the rest of your series.

Let's drop 4/3 and queue 1-3/3 for now.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 0/3] fixes for split index mode
  2018-01-19 18:34       ` Junio C Hamano
@ 2018-01-19 21:11         ` Thomas Gummerer
  0 siblings, 0 replies; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-19 21:11 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Lars Schneider, Brandon Williams, Jeff King, git,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

On 01/19, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > Friendly ping on this series now that 2.16 is out :) Is there anything
> > in this series (up to 3/3, 4/3 can be dropped now that Duy fixed it in
> > a nicer way) that still needs updating?  It fixes a few bugs in split
> > index mode with submodules/worktrees, so it would be nice to get this
> > reviewed/merged.
> 
> I was wondering about the same thing.  Especially it wasn't very
> clear to me what Duy's replacement was meant to replace and how well
> it was supposed to work with the rest of your series.

Sorry about the confusion.  4/3 was replaced by Duy's series, but the
rest of this series is independent of those patches.

> Let's drop 4/3 and queue 1-3/3 for now.

Thanks!

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-07 22:30     ` [PATCH v3 1/3] read-cache: fix reading the shared index for other repos Thomas Gummerer
  2018-01-08 10:41       ` Duy Nguyen
  2018-01-16 21:42       ` Brandon Williams
@ 2018-01-19 21:57       ` Junio C Hamano
  2018-01-20 11:58         ` Thomas Gummerer
  2 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2018-01-19 21:57 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Lars Schneider, Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

Thomas Gummerer <t.gummerer@gmail.com> writes:

> read_cache_from() defaults to using the gitdir of the_repository.  As it
> is mostly a convenience macro, having to pass get_git_dir() for every
> call seems overkill, and if necessary users can have more control by
> using read_index_from().

This was a bit painful change, given that some changes in flight do
add new callsites to read_index_from() and they got the function
changed under their feet.

Please double check if I made the right git-dir to be passed to them
when I push out 'pu' in a few hours.

Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-19 21:57       ` Junio C Hamano
@ 2018-01-20 11:58         ` Thomas Gummerer
  2018-01-22  6:14           ` Junio C Hamano
  0 siblings, 1 reply; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-20 11:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Lars Schneider, Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

On 01/19, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > read_cache_from() defaults to using the gitdir of the_repository.  As it
> > is mostly a convenience macro, having to pass get_git_dir() for every
> > call seems overkill, and if necessary users can have more control by
> > using read_index_from().
> 
> This was a bit painful change, given that some changes in flight do
> add new callsites to read_index_from() and they got the function
> changed under their feet.

Sorry about that.  Is there any way to make such a change less painful
in the future?

> Please double check if I made the right git-dir to be passed to them
> when I push out 'pu' in a few hours.

I think one conversion was not quite correct, even though all tests
still pass with GIT_TEST_SPLIT_INDEX set.

The following diff fixes that conversation and has a test showing the
breakage:

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 6a49f9e628..4d86a3574f 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -612,7 +612,8 @@ static void validate_no_submodules(const struct worktree *wt)
        struct index_state istate = {0};
        int i, found_submodules = 0;
 
-       if (read_index_from(&istate, worktree_git_path(wt, "index"), get_git_dir()) > 0) {
+       if (read_index_from(&istate, worktree_git_path(wt, "index"),
+                           get_worktree_git_dir(wt)) > 0) {
                for (i = 0; i < istate.cache_nr; i++) {
                        struct cache_entry *ce = istate.cache[i];
 
diff --git a/t/t2028-worktree-move.sh b/t/t2028-worktree-move.sh
index b3105eaaed..8faf61bbf5 100755
--- a/t/t2028-worktree-move.sh
+++ b/t/t2028-worktree-move.sh
@@ -90,6 +90,16 @@ test_expect_success 'move main worktree' '
        test_must_fail git worktree move . def
 '
 
+test_expect_success 'move worktree with split index' '
+       git worktree add test &&
+       (
+               cd test &&
+               test_commit file &&
+               git update-index --split-index
+       ) &&
+       git worktree move test test-destination
+'
+
 test_expect_success 'remove main worktree' '
        test_must_fail git worktree remove .
 '

With this applied what you have in pu looks good to me.  Does the
above help, or should I send this in another for for you to apply?

> Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-20 11:58         ` Thomas Gummerer
@ 2018-01-22  6:14           ` Junio C Hamano
  2018-01-27 12:18             ` Thomas Gummerer
  2018-02-07 22:41             ` Junio C Hamano
  0 siblings, 2 replies; 94+ messages in thread
From: Junio C Hamano @ 2018-01-22  6:14 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Lars Schneider, Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

Thomas Gummerer <t.gummerer@gmail.com> writes:

> On 01/19, Junio C Hamano wrote:
>> Thomas Gummerer <t.gummerer@gmail.com> writes:
>> 
>> > read_cache_from() defaults to using the gitdir of the_repository.  As it
>> > is mostly a convenience macro, having to pass get_git_dir() for every
>> > call seems overkill, and if necessary users can have more control by
>> > using read_index_from().
>> 
>> This was a bit painful change, given that some changes in flight do
>> add new callsites to read_index_from() and they got the function
>> changed under their feet.
>
> Sorry about that.  Is there any way to make such a change less painful
> in the future?

One way is to do for read_index_from() what you did for the existing
users of read_cache_from().  Introduce a _new_ helper that will not
be known for any existing topics in flight, and use that to make the
existing API a thin wrapper around it.

I _think_ I got it right with evil merge, so unless this fix needs
to be delayed for extended period of time for whatever reason while
any more new callers of the function appears (which is unlikely), we
should be OK ;-)

Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 0/5] Travis CI: don't run the test suite as root in the 32 bit Linux build
  2018-01-18 22:32                         ` SZEDER Gábor
  2018-01-19  0:30                           ` Duy Nguyen
@ 2018-01-22 13:32                           ` SZEDER Gábor
  2018-01-22 13:32                             ` [PATCH 1/5] travis-ci: use 'set -x' for the commands under 'su' " SZEDER Gábor
                                               ` (4 more replies)
  1 sibling, 5 replies; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-22 13:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Jeff King, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, git, SZEDER Gábor


The important one is patch 4, which fixes the issue of running the test
suite as root, with a bit of sugar on top to make sure that a future
"non-root" build job can cope with a root-written cache directory from
the past.

The rest is a collection of small cleanups and improvements to make the
debugging this Docked-based build more convenient.

SZEDER Gábor (5):
  travis-ci: use 'set -x' for the commands under 'su' in the 32 bit
    Linux build
  travis-ci: use 'set -e' in the 32 bit Linux build job
  travis-ci: don't repeat the path of the cache directory
  travis-ci: don't run the test suite as root in the 32 bit Linux build
  travis-ci: don't fail if user already exists on 32 bit Linux build job

 ci/lib-travisci.sh        |  7 +++---
 ci/run-build-and-tests.sh |  2 +-
 ci/run-linux32-build.sh   | 55 +++++++++++++++++++++++++++++++++++------------
 ci/run-linux32-docker.sh  |  7 ++++--
 4 files changed, 51 insertions(+), 20 deletions(-)

-- 
2.16.1.80.gc0eec9753d


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 1/5] travis-ci: use 'set -x' for the commands under 'su' in the 32 bit Linux build
  2018-01-22 13:32                           ` [PATCH 0/5] Travis CI: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
@ 2018-01-22 13:32                             ` " SZEDER Gábor
  2018-01-22 13:32                             ` [PATCH 2/5] travis-ci: use 'set -e' in the 32 bit Linux build job SZEDER Gábor
                                               ` (3 subsequent siblings)
  4 siblings, 0 replies; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-22 13:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Jeff King, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, git, SZEDER Gábor

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 ci/run-linux32-build.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
index c19c50c1c9..5a36a8d7c0 100755
--- a/ci/run-linux32-build.sh
+++ b/ci/run-linux32-build.sh
@@ -26,6 +26,7 @@ test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER) &&
 
 # Build and test
 linux32 --32bit i386 su -m -l $CI_USER -c '
+    set -x &&
     cd /usr/src/git &&
     ln -s /tmp/travis-cache/.prove t/.prove &&
     make --jobs=2 &&
-- 
2.16.1.80.gc0eec9753d


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 2/5] travis-ci: use 'set -e' in the 32 bit Linux build job
  2018-01-22 13:32                           ` [PATCH 0/5] Travis CI: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
  2018-01-22 13:32                             ` [PATCH 1/5] travis-ci: use 'set -x' for the commands under 'su' " SZEDER Gábor
@ 2018-01-22 13:32                             ` SZEDER Gábor
  2018-01-23 16:26                               ` Jeff King
  2018-01-22 13:32                             ` [PATCH 3/5] travis-ci: don't repeat the path of the cache directory SZEDER Gábor
                                               ` (2 subsequent siblings)
  4 siblings, 1 reply; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-22 13:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Jeff King, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, git, SZEDER Gábor

All 'ci/*' scripts use 'set -e' to break the build job if a command
fails, except 'ci/run-linux32-build.sh' which relies on the && chain
to do the same.  This inconsistency among the 'ci/*' scripts is asking
for trouble: I forgot about the && chain more than once while working
on this patch series.

Enable 'set -e' for the whole script and for the commands executed
under 'su' as well.

While touching every line in the 'su' command block anyway, change
their indentation to use a tab instead of spaces.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 ci/run-linux32-build.sh | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
index 5a36a8d7c0..248183982b 100755
--- a/ci/run-linux32-build.sh
+++ b/ci/run-linux32-build.sh
@@ -6,29 +6,29 @@
 #   run-linux32-build.sh [host-user-id]
 #
 
-set -x
+set -ex
 
 # Update packages to the latest available versions
 linux32 --32bit i386 sh -c '
     apt update >/dev/null &&
     apt install -y build-essential libcurl4-openssl-dev libssl-dev \
 	libexpat-dev gettext python >/dev/null
-' &&
+'
 
 # If this script runs inside a docker container, then all commands are
 # usually executed as root. Consequently, the host user might not be
 # able to access the test output files.
 # If a host user id is given, then create a user "ci" with the host user
 # id to make everything accessible to the host user.
-HOST_UID=$1 &&
-CI_USER=$USER &&
-test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER) &&
+HOST_UID=$1
+CI_USER=$USER
+test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER)
 
 # Build and test
 linux32 --32bit i386 su -m -l $CI_USER -c '
-    set -x &&
-    cd /usr/src/git &&
-    ln -s /tmp/travis-cache/.prove t/.prove &&
-    make --jobs=2 &&
-    make --quiet test
+	set -ex
+	cd /usr/src/git
+	ln -s /tmp/travis-cache/.prove t/.prove
+	make --jobs=2
+	make --quiet test
 '
-- 
2.16.1.80.gc0eec9753d


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 3/5] travis-ci: don't repeat the path of the cache directory
  2018-01-22 13:32                           ` [PATCH 0/5] Travis CI: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
  2018-01-22 13:32                             ` [PATCH 1/5] travis-ci: use 'set -x' for the commands under 'su' " SZEDER Gábor
  2018-01-22 13:32                             ` [PATCH 2/5] travis-ci: use 'set -e' in the 32 bit Linux build job SZEDER Gábor
@ 2018-01-22 13:32                             ` SZEDER Gábor
  2018-01-23 16:30                               ` Jeff King
  2018-01-22 13:32                             ` [PATCH 4/5] travis-ci: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
  2018-01-22 13:32                             ` [PATCH 5/5] travis-ci: don't fail if user already exists on 32 bit Linux build job SZEDER Gábor
  4 siblings, 1 reply; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-22 13:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Jeff King, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, git, SZEDER Gábor

Some of our 'ci/*' scripts repeat the name or full path of the Travis
CI cache directory, and the following patches will add new places
using that path.

Use a variable to refer to the path of the cache directory instead, so
it's hard-coded only in a single place.

Pay extra attention to the 32 bit Linux build: it runs in a Docker
container, so pass the path of the cache directory from the host to
the container in an environment variable.  Furthermore, use the
variable in the container only if it's set, to prevent errors when
someone is running or debugging the Docker build locally, because in
that case the variable won't be set as there won't be any Travis CI
cache.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 ci/lib-travisci.sh        | 7 ++++---
 ci/run-build-and-tests.sh | 2 +-
 ci/run-linux32-build.sh   | 6 +++---
 ci/run-linux32-docker.sh  | 5 ++++-
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/ci/lib-travisci.sh b/ci/lib-travisci.sh
index 07f27c7270..1efee55ef7 100755
--- a/ci/lib-travisci.sh
+++ b/ci/lib-travisci.sh
@@ -21,8 +21,6 @@ skip_branch_tip_with_tag () {
 	fi
 }
 
-good_trees_file="$HOME/travis-cache/good-trees"
-
 # Save some info about the current commit's tree, so we can skip the build
 # job if we encounter the same tree again and can provide a useful info
 # message.
@@ -83,7 +81,10 @@ check_unignored_build_artifacts ()
 # and installing dependencies.
 set -ex
 
-mkdir -p "$HOME/travis-cache"
+cache_dir="$HOME/travis-cache"
+good_trees_file="$cache_dir/good-trees"
+
+mkdir -p "$cache_dir"
 
 skip_branch_tip_with_tag
 skip_good_tree
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index d3a094603f..30790e258d 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -5,7 +5,7 @@
 
 . ${0%/*}/lib-travisci.sh
 
-ln -s $HOME/travis-cache/.prove t/.prove
+ln -s "$cache_dir/.prove" t/.prove
 
 make --jobs=2
 make --quiet test
diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
index 248183982b..c9476d6598 100755
--- a/ci/run-linux32-build.sh
+++ b/ci/run-linux32-build.sh
@@ -25,10 +25,10 @@ CI_USER=$USER
 test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER)
 
 # Build and test
-linux32 --32bit i386 su -m -l $CI_USER -c '
+linux32 --32bit i386 su -m -l $CI_USER -c "
 	set -ex
 	cd /usr/src/git
-	ln -s /tmp/travis-cache/.prove t/.prove
+	test -n '$cache_dir' && ln -s '$cache_dir/.prove' t/.prove
 	make --jobs=2
 	make --quiet test
-'
+"
diff --git a/ci/run-linux32-docker.sh b/ci/run-linux32-docker.sh
index 4f191c5bb1..15288ea2cf 100755
--- a/ci/run-linux32-docker.sh
+++ b/ci/run-linux32-docker.sh
@@ -11,6 +11,8 @@ docker pull daald/ubuntu32:xenial
 # $ docker run -itv "${PWD}:/usr/src/git" --entrypoint /bin/bash daald/ubuntu32:xenial
 # root@container:/# /usr/src/git/ci/run-linux32-build.sh
 
+container_cache_dir=/tmp/travis-cache
+
 docker run \
 	--interactive \
 	--env DEVELOPER \
@@ -18,8 +20,9 @@ docker run \
 	--env GIT_PROVE_OPTS \
 	--env GIT_TEST_OPTS \
 	--env GIT_TEST_CLONE_2GB \
+	--env cache_dir="$container_cache_dir" \
 	--volume "${PWD}:/usr/src/git" \
-	--volume "${HOME}/travis-cache:/tmp/travis-cache" \
+	--volume "$cache_dir:$container_cache_dir" \
 	daald/ubuntu32:xenial \
 	/usr/src/git/ci/run-linux32-build.sh $(id -u $USER)
 
-- 
2.16.1.80.gc0eec9753d


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4/5] travis-ci: don't run the test suite as root in the 32 bit Linux build
  2018-01-22 13:32                           ` [PATCH 0/5] Travis CI: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
                                               ` (2 preceding siblings ...)
  2018-01-22 13:32                             ` [PATCH 3/5] travis-ci: don't repeat the path of the cache directory SZEDER Gábor
@ 2018-01-22 13:32                             ` SZEDER Gábor
  2018-01-23 16:43                               ` Jeff King
  2018-01-22 13:32                             ` [PATCH 5/5] travis-ci: don't fail if user already exists on 32 bit Linux build job SZEDER Gábor
  4 siblings, 1 reply; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-22 13:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Jeff King, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, git, SZEDER Gábor

Travis CI runs the 32 bit Linux build job in a Docker container, where
all commands are executed as root by default.  Therefore, ever since
we added this build job in 88dedd5e7 (Travis: also test on 32-bit
Linux, 2017-03-05), we have a bit of code to create a user in the
container matching the ID of the host user and then to run the test
suite as this user.  Matching the host user ID is important, because
otherwise the host user would have no access to any files written by
processes running in the container, notably the logs of failed tests
couldn't be included in the build job's trace log.

Alas, this piece of code never worked, because it sets the variable
holding the user name ($CI_USER) in a subshell, meaning it doesn't
have any effect by the time we get to the point to actually use the
variable to switch users with 'su'.  So all this time we were running
the test suite as root.

Reorganize that piece of code in 'ci/run-linux32-build.sh' a bit to
avoid that problematic subshell and to ensure that we switch to the
right user.  Furthermore, make the script's optional host user ID
option mandatory, so running the build accidentally as root will
become harder when debugging locally.  If someone really wants to run
the test suite as root, whatever the reasons might be, it'll still be
possible to do so by explicitly passing '0' as host user ID.

Finally, one last catch: since commit 7e72cfcee (travis-ci: save prove
state for the 32 bit Linux build, 2017-12-27) the 'prove' test harness
has been writing its state to the Travis CI cache directory from
within the Docker container while running as root.  After this patch
'prove' will run as a regular user, so in future build jobs it won't
be able overwrite a previously written, still root-owned state file,
resulting in build job failures.  To resolve this we should manually
delete caches containing such root-owned files, but that would be a
hassle.  Instead, work this around by changing the owner of the whole
contents of the cache directory to the host user ID.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 ci/run-linux32-build.sh  | 30 +++++++++++++++++++++++++-----
 ci/run-linux32-docker.sh |  2 +-
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
index c9476d6598..e37e1d2d5f 100755
--- a/ci/run-linux32-build.sh
+++ b/ci/run-linux32-build.sh
@@ -3,11 +3,17 @@
 # Build and test Git in a 32-bit environment
 #
 # Usage:
-#   run-linux32-build.sh [host-user-id]
+#   run-linux32-build.sh <host-user-id>
 #
 
 set -ex
 
+if test $# -ne 1 || test -z "$1"
+then
+	echo >&2 "usage: run-linux32-build.sh <host-user-id>"
+	exit 1
+fi
+
 # Update packages to the latest available versions
 linux32 --32bit i386 sh -c '
     apt update >/dev/null &&
@@ -18,11 +24,25 @@ linux32 --32bit i386 sh -c '
 # If this script runs inside a docker container, then all commands are
 # usually executed as root. Consequently, the host user might not be
 # able to access the test output files.
-# If a host user id is given, then create a user "ci" with the host user
-# id to make everything accessible to the host user.
+# If a non 0 host user id is given, then create a user "ci" with that
+# user id to make everything accessible to the host user.
 HOST_UID=$1
-CI_USER=$USER
-test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER)
+if test $HOST_UID -eq 0
+then
+	# Just in case someone does want to run the test suite as root.
+	CI_USER=root
+else
+	CI_USER=ci
+	useradd -u $HOST_UID $CI_USER
+	# Due to a bug the test suite was run as root in the past, so
+	# a prove state file created back then is only accessible by
+	# root.  Now that bug is fixed, the test suite is run as a
+	# regular user, but the prove state file coming from Travis
+	# CI's cache might still be owned by root.
+	# Make sure that this user has rights to any cached files,
+	# including an existing prove state file.
+	test -n "$cache_dir" && chown -R $HOST_UID:$HOST_UID "$cache_dir"
+fi
 
 # Build and test
 linux32 --32bit i386 su -m -l $CI_USER -c "
diff --git a/ci/run-linux32-docker.sh b/ci/run-linux32-docker.sh
index 15288ea2cf..21637903ce 100755
--- a/ci/run-linux32-docker.sh
+++ b/ci/run-linux32-docker.sh
@@ -9,7 +9,7 @@ docker pull daald/ubuntu32:xenial
 
 # Use the following command to debug the docker build locally:
 # $ docker run -itv "${PWD}:/usr/src/git" --entrypoint /bin/bash daald/ubuntu32:xenial
-# root@container:/# /usr/src/git/ci/run-linux32-build.sh
+# root@container:/# /usr/src/git/ci/run-linux32-build.sh <host-user-id>
 
 container_cache_dir=/tmp/travis-cache
 
-- 
2.16.1.80.gc0eec9753d


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 5/5] travis-ci: don't fail if user already exists on 32 bit Linux build job
  2018-01-22 13:32                           ` [PATCH 0/5] Travis CI: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
                                               ` (3 preceding siblings ...)
  2018-01-22 13:32                             ` [PATCH 4/5] travis-ci: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
@ 2018-01-22 13:32                             ` SZEDER Gábor
  2018-01-23 16:46                               ` Jeff King
  4 siblings, 1 reply; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-22 13:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Jeff King, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, git, SZEDER Gábor

The 32 bit Linux build job runs in a Docker container, which lends
itself to running and debugging locally, too.  Especially during
debugging one usually doesn't want to start with a fresh container
every time, to save time spent on installing a bunch of dependencies.
However, that doesn't work quite smootly, because the script running
in the container always creates a new user, which then must be removed
every time before subsequent executions, or the build script fails.

Make this process more convenient and don't try to create that user if
it already exists and has the right user ID in the container, so
developers don't have to bother with running a 'userdel' each time
before they run the build script.

The build job on Travis CI always starts with a fresh Docker
container, so this change doesn't make a difference there.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 ci/run-linux32-build.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
index e37e1d2d5f..13047adde3 100755
--- a/ci/run-linux32-build.sh
+++ b/ci/run-linux32-build.sh
@@ -33,7 +33,13 @@ then
 	CI_USER=root
 else
 	CI_USER=ci
-	useradd -u $HOST_UID $CI_USER
+	if test "$(id -u $CI_USER)" = $HOST_UID
+	then
+		: # user already exists with the right ID
+	else
+		useradd -u $HOST_UID $CI_USER
+	fi
+
 	# Due to a bug the test suite was run as root in the past, so
 	# a prove state file created back then is only accessible by
 	# root.  Now that bug is fixed, the test suite is run as a
-- 
2.16.1.80.gc0eec9753d


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-18 12:47               ` Duy Nguyen
  2018-01-18 13:29                 ` Jeff King
@ 2018-01-22 18:27                 ` SZEDER Gábor
  2018-01-22 19:46                   ` Eric Sunshine
                                     ` (2 more replies)
  1 sibling, 3 replies; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-22 18:27 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jeff King, Lars Schneider, Thomas Gummerer, Brandon Williams,
	git, SZEDER Gábor


On Thu, Jan 18, 2018 at 1:47 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Thu, Jan 18, 2018 at 6:36 PM, SZEDER Gábor <szeder.dev@gmail.com> wrote:
>> This series, queued as 'nd/shared-index-fix', makes the 32 bit Linux
>> build job fail on Travis CI.  Unfortunately, all it can tell us about
>> the failure is this:
>>
>>   Test Summary Report
>>   -------------------
>>   t1700-split-index.sh                             (Wstat: 256 Tests: 23
>>   Failed: 1)
>>     Failed test:  23
>>     Non-zero exit status: 1
>>   Files=809, Tests=18491, 401 wallclock secs ( 7.22 usr  1.60 sys + 263.16
>>   cusr 49.58 csys = 321.56 CPU)
>>   Result: FAIL
>>
>> because it can't access the test's verbose log due to lack of
>> permissions.
>>
>>
>>   https://travis-ci.org/git/git/jobs/329681826#L2074
>
> I may need help getting that log (or even better the trash directory
> of t1700). 

Well, shower thoughts gave me an idea, see the included PoC patch below.
I can't really decide whether it's too clever or too ugly :)

It produces output like this (a previous WIP version without
'--immediate'):

  https://travis-ci.org/szeder/git/jobs/331601009#L2486


  -- >8 --

Subject: [PATCH] travis-ci: include the trash directories of failed tests in
 the trace log

The trash directory of a failed test might contain valuable
information about the cause of the failure, but we have no access to
the trash directories of Travis CI build jobs.  The only feedback we
get from there is the trace log, so...

Modify 'ci/print-test-failures.sh' to create a tar.gz archive of the
test directory of each failed test and encode it with base64, so the
result is a block of ASCII text that can be safely included in the
trace log, along with a hint about how to restore it.  Furthermore,
run tests with '--immediate' to faithfully preserve the failed state.

A few of our tests create a sizeable trash directory, so limit the
size of each included base64-encoded block, let's say, to 1MB.

Note:

  - The logs of Linux build jobs coming from Travis CI have mostly
    CRLF line endings, which makes 'base64 -d' from 'coreutils'
    complain about "invalid input"; it has to be converted to LF
    first.  'openssl base64 -d' can grok it just fine, even without
    conversion.

  - The logs of OSX build jobs have CRCRLF line endings.  However, the
    'base64' util of OSX doesn't wrap its output at 76 columns, i.e.
    prints one single albeit very long line.  This means that while
    'base64' from 'coreutils' still complains, by the time it gets to
    the invalid input it already decoded everything and produced a
    valid .tar.gz.  OTOH, 'openssl base64 -d' doesn't grok it, but
    exits without any error message or even an error code, even after
    converting to CRLF or LF line endings.

    Go figure.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 ci/lib-travisci.sh        |  2 +-
 ci/print-test-failures.sh | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/ci/lib-travisci.sh b/ci/lib-travisci.sh
index 1efee55ef7..981c6e9504 100755
--- a/ci/lib-travisci.sh
+++ b/ci/lib-travisci.sh
@@ -97,7 +97,7 @@ fi
 export DEVELOPER=1
 export DEFAULT_TEST_TARGET=prove
 export GIT_PROVE_OPTS="--timer --jobs 3 --state=failed,slow,save"
-export GIT_TEST_OPTS="--verbose-log"
+export GIT_TEST_OPTS="--verbose-log --immediate"
 export GIT_TEST_CLONE_2GB=YesPlease
 
 case "$jobname" in
diff --git a/ci/print-test-failures.sh b/ci/print-test-failures.sh
index 4f261ddc01..cc6a1247a4 100755
--- a/ci/print-test-failures.sh
+++ b/ci/print-test-failures.sh
@@ -23,5 +23,25 @@ do
 		echo "$(tput setaf 1)${TEST_OUT}...$(tput sgr0)"
 		echo "------------------------------------------------------------------------"
 		cat "${TEST_OUT}"
+
+		TEST_NAME="${TEST_EXIT%.exit}"
+		TEST_NAME="${TEST_NAME##*/}"
+		TRASH_DIR="t/trash directory.$TEST_NAME"
+		TRASH_TGZ_B64="$TEST_NAME.trash.base64"
+		if [ -d "$TRASH_DIR" ]
+		then
+			tar czp "$TRASH_DIR" |base64 >"$TRASH_TGZ_B64"
+
+			if [ 1048576 -lt $(wc -c <"$TRASH_TGZ_B64") ]
+			then
+				# larger than 1MB
+				echo "$(tput setaf 1)Trash directory of '$TEST_NAME' is too big to be included in the trace log$(tput sgr0)"
+			else
+				echo "$(tput setaf 1)Trash directory of '$TEST_NAME':$(tput sgr0)"
+				echo "(Extract it to a file from the raw log, make sure it has Unix line endings, then run 'base64 -d <file> |tar vxzp' to restore it)"
+				cat "$TRASH_TGZ_B64"
+				echo "$(tput setaf 1)End of trash directory of '$TEST_NAME'$(tput sgr0)"
+			fi
+		fi
 	fi
 done
-- 
2.16.1.80.gc0eec9753d


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-22 18:27                 ` [PATCH 3/3] read-cache: don't write index twice if we can't write shared index SZEDER Gábor
@ 2018-01-22 19:46                   ` Eric Sunshine
  2018-01-22 22:10                     ` SZEDER Gábor
  2018-01-24  9:11                   ` Duy Nguyen
  2018-01-26 22:44                   ` Lars Schneider
  2 siblings, 1 reply; 94+ messages in thread
From: Eric Sunshine @ 2018-01-22 19:46 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Duy Nguyen, Jeff King, Lars Schneider, Thomas Gummerer,
	Brandon Williams, Git List

On Mon, Jan 22, 2018 at 1:27 PM, SZEDER Gábor <szeder.dev@gmail.com> wrote:
> Subject: [PATCH] travis-ci: include the trash directories of failed tests in
>  the trace log
>
> The trash directory of a failed test might contain valuable
> information about the cause of the failure, but we have no access to
> the trash directories of Travis CI build jobs.  The only feedback we
> get from there is the trace log, so...
>
> Modify 'ci/print-test-failures.sh' to create a tar.gz archive of the
> test directory of each failed test and encode it with base64, so the
> result is a block of ASCII text that can be safely included in the
> trace log, along with a hint about how to restore it.  Furthermore,
> run tests with '--immediate' to faithfully preserve the failed state.
>
> A few of our tests create a sizeable trash directory, so limit the
> size of each included base64-encoded block, let's say, to 1MB.
>
> Note:
>
>   - The logs of Linux build jobs coming from Travis CI have mostly
>     CRLF line endings, which makes 'base64 -d' from 'coreutils'
>     complain about "invalid input"; it has to be converted to LF
>     first.  'openssl base64 -d' can grok it just fine, even without
>     conversion.
>
>   - The logs of OSX build jobs have CRCRLF line endings.  However, the
>     'base64' util of OSX doesn't wrap its output at 76 columns, i.e.
>     prints one single albeit very long line.  This means that while

Perhaps you could pipe the 'base64' output through 'fold' or 'fmt'?

>     'base64' from 'coreutils' still complains, by the time it gets to
>     the invalid input it already decoded everything and produced a
>     valid .tar.gz.  OTOH, 'openssl base64 -d' doesn't grok it, but
>     exits without any error message or even an error code, even after
>     converting to CRLF or LF line endings.
>
>     Go figure.
>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-22 19:46                   ` Eric Sunshine
@ 2018-01-22 22:10                     ` SZEDER Gábor
  0 siblings, 0 replies; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-22 22:10 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Duy Nguyen, Jeff King, Lars Schneider, Thomas Gummerer,
	Brandon Williams, Git List

On Mon, Jan 22, 2018 at 8:46 PM, Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Mon, Jan 22, 2018 at 1:27 PM, SZEDER Gábor <szeder.dev@gmail.com> wrote:

>>   - The logs of OSX build jobs have CRCRLF line endings.  However, the
>>     'base64' util of OSX doesn't wrap its output at 76 columns, i.e.
>>     prints one single albeit very long line.  This means that while
>
> Perhaps you could pipe the 'base64' output through 'fold' or 'fmt'?

No need to, according to its manpage[1], OSX's base64 has an option to
wrap lines after given number of characters.  Of course it uses a
different letter for the option than the coreutils base64... :)

(While long lines are ugly, in this particular case it comes handy: when
using vim to extract the base64-encoded section from the log to a
separate file, it's less keystrokes to yank a single line than to search
for the end of the encoded section.)


[1] - https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/base64.1.html

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 2/5] travis-ci: use 'set -e' in the 32 bit Linux build job
  2018-01-22 13:32                             ` [PATCH 2/5] travis-ci: use 'set -e' in the 32 bit Linux build job SZEDER Gábor
@ 2018-01-23 16:26                               ` Jeff King
  2018-01-23 16:32                                 ` Jeff King
  2018-01-24 12:12                                 ` SZEDER Gábor
  0 siblings, 2 replies; 94+ messages in thread
From: Jeff King @ 2018-01-23 16:26 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, git

On Mon, Jan 22, 2018 at 02:32:17PM +0100, SZEDER Gábor wrote:

> All 'ci/*' scripts use 'set -e' to break the build job if a command
> fails, except 'ci/run-linux32-build.sh' which relies on the && chain
> to do the same.  This inconsistency among the 'ci/*' scripts is asking
> for trouble: I forgot about the && chain more than once while working
> on this patch series.

I think this actually fixes a bug:

>  # Update packages to the latest available versions
>  linux32 --32bit i386 sh -c '
>      apt update >/dev/null &&
>      apt install -y build-essential libcurl4-openssl-dev libssl-dev \
>  	libexpat-dev gettext python >/dev/null
> -' &&
> +'

If this step failed, then...

>  # If this script runs inside a docker container, then all commands are
>  # usually executed as root. Consequently, the host user might not be
>  # able to access the test output files.
>  # If a host user id is given, then create a user "ci" with the host user
>  # id to make everything accessible to the host user.
> -HOST_UID=$1 &&
> -CI_USER=$USER &&
> -test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER) &&

We'd have triggered the right-hand side of this "||", and run the rest
of the script.  The whole "||" block should have been inside {}.

But after your patch, it should be fine with "set -e". Although...

> +HOST_UID=$1
> +CI_USER=$USER
> +test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER)

If this "useradd" step fails, we wouldn't abort the script, because it's
part of a conditional. You'd need a manual "|| exit 1" at the end of
this line. Or to use a real "if" block.

Reading this line, I'm also slightly confused by setting CI_USER in the
subshell, and then only using it once. Should it be respecting the outer
CI_USER setting? If not, should it just hard-code "ci" on the useradd
command-line?  But that has nothing to do with your patch here.

-Peff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/5] travis-ci: don't repeat the path of the cache directory
  2018-01-22 13:32                             ` [PATCH 3/5] travis-ci: don't repeat the path of the cache directory SZEDER Gábor
@ 2018-01-23 16:30                               ` Jeff King
  2018-01-24 13:14                                 ` SZEDER Gábor
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff King @ 2018-01-23 16:30 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, git

On Mon, Jan 22, 2018 at 02:32:18PM +0100, SZEDER Gábor wrote:

> Some of our 'ci/*' scripts repeat the name or full path of the Travis
> CI cache directory, and the following patches will add new places
> using that path.
> 
> Use a variable to refer to the path of the cache directory instead, so
> it's hard-coded only in a single place.
> 
> Pay extra attention to the 32 bit Linux build: it runs in a Docker
> container, so pass the path of the cache directory from the host to
> the container in an environment variable.  Furthermore, use the
> variable in the container only if it's set, to prevent errors when
> someone is running or debugging the Docker build locally, because in
> that case the variable won't be set as there won't be any Travis CI
> cache.

Makes sense that we need to pass it down, but...

> diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
> index 248183982b..c9476d6598 100755
> --- a/ci/run-linux32-build.sh
> +++ b/ci/run-linux32-build.sh
> @@ -25,10 +25,10 @@ CI_USER=$USER
>  test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER)
>  
>  # Build and test
> -linux32 --32bit i386 su -m -l $CI_USER -c '
> +linux32 --32bit i386 su -m -l $CI_USER -c "
>  	set -ex
>  	cd /usr/src/git
> -	ln -s /tmp/travis-cache/.prove t/.prove
> +	test -n '$cache_dir' && ln -s '$cache_dir/.prove' t/.prove
>  	make --jobs=2
>  	make --quiet test
> -'
> +"

This interpolates $cache_dir while generating the snippet. You've stuck
it in single-quotes, which prevents most quoting problems, but obviously
it's an issue if the variable itself contains a single-quote. Is there
a reason not to just export $cache_dir in the environment and access it
directly from the snippet?

Probably not a _big_ deal, since we control the contents of the
variable, but it just seems like a fragile practice to avoid.

-Peff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 2/5] travis-ci: use 'set -e' in the 32 bit Linux build job
  2018-01-23 16:26                               ` Jeff King
@ 2018-01-23 16:32                                 ` Jeff King
  2018-01-24 12:12                                 ` SZEDER Gábor
  1 sibling, 0 replies; 94+ messages in thread
From: Jeff King @ 2018-01-23 16:32 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, git

On Tue, Jan 23, 2018 at 11:26:33AM -0500, Jeff King wrote:

> > +HOST_UID=$1
> > +CI_USER=$USER
> > +test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER)
> 
> If this "useradd" step fails, we wouldn't abort the script, because it's
> part of a conditional. You'd need a manual "|| exit 1" at the end of
> this line. Or to use a real "if" block.
> 
> Reading this line, I'm also slightly confused by setting CI_USER in the
> subshell, and then only using it once. Should it be respecting the outer
> CI_USER setting? If not, should it just hard-code "ci" on the useradd
> command-line?  But that has nothing to do with your patch here.

OK, nevermind on all this. I just read patch 4. :)

-Peff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4/5] travis-ci: don't run the test suite as root in the 32 bit Linux build
  2018-01-22 13:32                             ` [PATCH 4/5] travis-ci: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
@ 2018-01-23 16:43                               ` Jeff King
  2018-01-24 13:45                                 ` SZEDER Gábor
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff King @ 2018-01-23 16:43 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, git

On Mon, Jan 22, 2018 at 02:32:19PM +0100, SZEDER Gábor wrote:

> Travis CI runs the 32 bit Linux build job in a Docker container, where
> all commands are executed as root by default.  Therefore, ever since
> we added this build job in 88dedd5e7 (Travis: also test on 32-bit
> Linux, 2017-03-05), we have a bit of code to create a user in the
> container matching the ID of the host user and then to run the test
> suite as this user.  Matching the host user ID is important, because
> otherwise the host user would have no access to any files written by
> processes running in the container, notably the logs of failed tests
> couldn't be included in the build job's trace log.
> 
> Alas, this piece of code never worked, because it sets the variable
> holding the user name ($CI_USER) in a subshell, meaning it doesn't
> have any effect by the time we get to the point to actually use the
> variable to switch users with 'su'.  So all this time we were running
> the test suite as root.

This all makes sense to me, and the patch looks sane.

> Reorganize that piece of code in 'ci/run-linux32-build.sh' a bit to
> avoid that problematic subshell and to ensure that we switch to the
> right user.  Furthermore, make the script's optional host user ID
> option mandatory, so running the build accidentally as root will
> become harder when debugging locally.  If someone really wants to run
> the test suite as root, whatever the reasons might be, it'll still be
> possible to do so by explicitly passing '0' as host user ID.

Coincidentally, we recently set up a docker test build for our local fork
of Git. We found the whole "mount the existing git source tree" strategy
slightly annoying, exactly because of these mismatches between the host
and docker uids.

Instead, we ended up with something more like:

  git archive HEAD | docker run ...

and the in-container script starts with:

  tar -x

to extract the to-be-tested source, and ends with a dump of the failures
to stdout.

I don't know if it's worth switching strategies now, but just some food
for thought. It may be less interesting with Travis, too, because you're
also trying to carry the prove cache across runs, which does require
some filesystem interaction.

(As an aside, I'm not sure the prove cache is doing much. Running in
slow-to-fast order helps if you are trying to run massively in parallel,
but we only use -j3 for our Travis builds).

-Peff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 5/5] travis-ci: don't fail if user already exists on 32 bit Linux build job
  2018-01-22 13:32                             ` [PATCH 5/5] travis-ci: don't fail if user already exists on 32 bit Linux build job SZEDER Gábor
@ 2018-01-23 16:46                               ` Jeff King
  2018-01-24  0:32                                 ` Duy Nguyen
  2018-01-24 19:39                                 ` SZEDER Gábor
  0 siblings, 2 replies; 94+ messages in thread
From: Jeff King @ 2018-01-23 16:46 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, git

On Mon, Jan 22, 2018 at 02:32:20PM +0100, SZEDER Gábor wrote:

> The 32 bit Linux build job runs in a Docker container, which lends
> itself to running and debugging locally, too.  Especially during
> debugging one usually doesn't want to start with a fresh container
> every time, to save time spent on installing a bunch of dependencies.
> However, that doesn't work quite smootly, because the script running
> in the container always creates a new user, which then must be removed
> every time before subsequent executions, or the build script fails.
> 
> Make this process more convenient and don't try to create that user if
> it already exists and has the right user ID in the container, so
> developers don't have to bother with running a 'userdel' each time
> before they run the build script.

Makes sense.

> The build job on Travis CI always starts with a fresh Docker
> container, so this change doesn't make a difference there.

I wonder if that is fixable. Installing dependencies into the container
takes quite a lot of time, and is just wasted effort.

> diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
> index e37e1d2d5f..13047adde3 100755
> --- a/ci/run-linux32-build.sh
> +++ b/ci/run-linux32-build.sh
> @@ -33,7 +33,13 @@ then
>  	CI_USER=root
>  else
>  	CI_USER=ci
> -	useradd -u $HOST_UID $CI_USER
> +	if test "$(id -u $CI_USER)" = $HOST_UID
> +	then
> +		: # user already exists with the right ID
> +	else
> +		useradd -u $HOST_UID $CI_USER
> +	fi

Is it worth redirecting the stderr of "id" to avoid noise when $CI_USER
does not yet exist at all? Or is that a useful log message? :)

-Peff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 5/5] travis-ci: don't fail if user already exists on 32 bit Linux build job
  2018-01-23 16:46                               ` Jeff King
@ 2018-01-24  0:32                                 ` Duy Nguyen
  2018-01-24 19:39                                 ` SZEDER Gábor
  1 sibling, 0 replies; 94+ messages in thread
From: Duy Nguyen @ 2018-01-24  0:32 UTC (permalink / raw)
  To: Jeff King
  Cc: SZEDER Gábor, Junio C Hamano, Lars Schneider,
	Johannes Schindelin, Thomas Gummerer, Brandon Williams,
	Git Mailing List

On Tue, Jan 23, 2018 at 11:46 PM, Jeff King <peff@peff.net> wrote:
>> The build job on Travis CI always starts with a fresh Docker
>> container, so this change doesn't make a difference there.
>
> I wonder if that is fixable. Installing dependencies into the container
> takes quite a lot of time, and is just wasted effort.

I agree (because apt-get install phase took forever on my laptop).
I've seen people just include a Dockerfile that builds everything in
(except things like creating user matching host). First try to fetch
it from docker hub (or a local registry but I don't think it applies
to us). If not found, rebuild the image with Dockerfile and run a new
container with it. Every time somebody changes dockerfile, they are
supposed to step the image version up.

Of course this is still wasted effort unless somebody pushes images to
docker registry and the script has to rebuild the image every single
time [1]. Somebody could do that manually. I see docker hub has some
automated rebuild feature, perhaps that'll work for us.

[1] It still benefits users who run ci scripts manually because the
created image will stay in their local machine.
-- 
Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-22 18:27                 ` [PATCH 3/3] read-cache: don't write index twice if we can't write shared index SZEDER Gábor
  2018-01-22 19:46                   ` Eric Sunshine
@ 2018-01-24  9:11                   ` Duy Nguyen
  2018-01-26 22:44                   ` Lars Schneider
  2 siblings, 0 replies; 94+ messages in thread
From: Duy Nguyen @ 2018-01-24  9:11 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Jeff King, Lars Schneider, Thomas Gummerer, Brandon Williams,
	Git Mailing List

On Tue, Jan 23, 2018 at 1:27 AM, SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Thu, Jan 18, 2018 at 1:47 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>> On Thu, Jan 18, 2018 at 6:36 PM, SZEDER Gábor <szeder.dev@gmail.com> wrote:
>>> This series, queued as 'nd/shared-index-fix', makes the 32 bit Linux
>>> build job fail on Travis CI.  Unfortunately, all it can tell us about
>>> the failure is this:
>>>
>>>   Test Summary Report
>>>   -------------------
>>>   t1700-split-index.sh                             (Wstat: 256 Tests: 23
>>>   Failed: 1)
>>>     Failed test:  23
>>>     Non-zero exit status: 1
>>>   Files=809, Tests=18491, 401 wallclock secs ( 7.22 usr  1.60 sys + 263.16
>>>   cusr 49.58 csys = 321.56 CPU)
>>>   Result: FAIL
>>>
>>> because it can't access the test's verbose log due to lack of
>>> permissions.
>>>
>>>
>>>   https://travis-ci.org/git/git/jobs/329681826#L2074
>>
>> I may need help getting that log (or even better the trash directory
>> of t1700).
>
> Well, shower thoughts gave me an idea, see the included PoC patch below.
> I can't really decide whether it's too clever or too ugly :)

I can't comment on the patch. But there is one thing I think I should
mention. As someone new to Travis, I didn't know that I could set up
my own Travis jobs and get the logs myself. Maybe you should point
that out when you point people to travis test failures (which I
appreciate).
-- 
Duy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 2/5] travis-ci: use 'set -e' in the 32 bit Linux build job
  2018-01-23 16:26                               ` Jeff King
  2018-01-23 16:32                                 ` Jeff King
@ 2018-01-24 12:12                                 ` SZEDER Gábor
  2018-01-24 15:49                                   ` Jeff King
  1 sibling, 1 reply; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-24 12:12 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, Git mailing list

On Tue, Jan 23, 2018 at 5:26 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Jan 22, 2018 at 02:32:17PM +0100, SZEDER Gábor wrote:
>
>> All 'ci/*' scripts use 'set -e' to break the build job if a command
>> fails, except 'ci/run-linux32-build.sh' which relies on the && chain
>> to do the same.  This inconsistency among the 'ci/*' scripts is asking
>> for trouble: I forgot about the && chain more than once while working
>> on this patch series.
>
> I think this actually fixes a bug:
>
>>  # Update packages to the latest available versions
>>  linux32 --32bit i386 sh -c '
>>      apt update >/dev/null &&
>>      apt install -y build-essential libcurl4-openssl-dev libssl-dev \
>>       libexpat-dev gettext python >/dev/null
>> -' &&
>> +'
>
> If this step failed, then...
>
>>  # If this script runs inside a docker container, then all commands are
>>  # usually executed as root. Consequently, the host user might not be
>>  # able to access the test output files.
>>  # If a host user id is given, then create a user "ci" with the host user
>>  # id to make everything accessible to the host user.
>> -HOST_UID=$1 &&
>> -CI_USER=$USER &&
>> -test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER) &&
>
> We'd have triggered the right-hand side of this "||", and run the rest
> of the script.  The whole "||" block should have been inside {}.

Indeed, the && chain is broken, I didn't noticed that.

Luckily it was broken in a way that it didn't lead to false successes:
if installing dependencies fails, then the first && chain
ensures that HOST_UID is not set, and then useradd will error out with
"invalid user ID 'ci'", causing the second && chain to abort the script
and thus breaking the build.

Will update the commit message accordingly.

> But after your patch, it should be fine with "set -e". Although...
>
>> +HOST_UID=$1
>> +CI_USER=$USER
>> +test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER)
>
> If this "useradd" step fails, we wouldn't abort the script, because it's
> part of a conditional. You'd need a manual "|| exit 1" at the end of
> this line. Or to use a real "if" block.

No.  It does abort the script, because it isn't part of a conditional.
Try to run the script twice in the same container instance: in the
second run the user already exists, useradd fails and the whole script
aborts.


> Reading this line, I'm also slightly confused by setting CI_USER in the
> subshell, and then only using it once. Should it be respecting the outer
> CI_USER setting? If not, should it just hard-code "ci" on the useradd
> command-line?  But that has nothing to do with your patch here.

I see you've already made it to patch 4, good :)

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/5] travis-ci: don't repeat the path of the cache directory
  2018-01-23 16:30                               ` Jeff King
@ 2018-01-24 13:14                                 ` SZEDER Gábor
  0 siblings, 0 replies; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-24 13:14 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, Git mailing list

On Tue, Jan 23, 2018 at 5:30 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Jan 22, 2018 at 02:32:18PM +0100, SZEDER Gábor wrote:

>> diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
>> index 248183982b..c9476d6598 100755
>> --- a/ci/run-linux32-build.sh
>> +++ b/ci/run-linux32-build.sh
>> @@ -25,10 +25,10 @@ CI_USER=$USER
>>  test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER)
>>
>>  # Build and test
>> -linux32 --32bit i386 su -m -l $CI_USER -c '
>> +linux32 --32bit i386 su -m -l $CI_USER -c "
>>       set -ex
>>       cd /usr/src/git
>> -     ln -s /tmp/travis-cache/.prove t/.prove
>> +     test -n '$cache_dir' && ln -s '$cache_dir/.prove' t/.prove
>>       make --jobs=2
>>       make --quiet test
>> -'
>> +"
>
> This interpolates $cache_dir while generating the snippet. You've stuck
> it in single-quotes, which prevents most quoting problems, but obviously
> it's an issue if the variable itself contains a single-quote. Is there
> a reason not to just export $cache_dir in the environment and access it
> directly from the snippet?

Right, that's my conditioned response to single quotes kicking in :)

We don't even need to export the variable, because Docker has already
done it (that's what 'docker run --env VAR' does).


> Probably not a _big_ deal, since we control the contents of the
> variable, but it just seems like a fragile practice to avoid.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4/5] travis-ci: don't run the test suite as root in the 32 bit Linux build
  2018-01-23 16:43                               ` Jeff King
@ 2018-01-24 13:45                                 ` SZEDER Gábor
  2018-01-24 15:56                                   ` Jeff King
  0 siblings, 1 reply; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-24 13:45 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, Git mailing list

On Tue, Jan 23, 2018 at 5:43 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Jan 22, 2018 at 02:32:19PM +0100, SZEDER Gábor wrote:
>
>> Travis CI runs the 32 bit Linux build job in a Docker container, where
>> all commands are executed as root by default.  Therefore, ever since
>> we added this build job in 88dedd5e7 (Travis: also test on 32-bit
>> Linux, 2017-03-05), we have a bit of code to create a user in the
>> container matching the ID of the host user and then to run the test
>> suite as this user.  Matching the host user ID is important, because
>> otherwise the host user would have no access to any files written by
>> processes running in the container, notably the logs of failed tests
>> couldn't be included in the build job's trace log.
>>
>> Alas, this piece of code never worked, because it sets the variable
>> holding the user name ($CI_USER) in a subshell, meaning it doesn't
>> have any effect by the time we get to the point to actually use the
>> variable to switch users with 'su'.  So all this time we were running
>> the test suite as root.
>
> This all makes sense to me, and the patch looks sane.
>
>> Reorganize that piece of code in 'ci/run-linux32-build.sh' a bit to
>> avoid that problematic subshell and to ensure that we switch to the
>> right user.  Furthermore, make the script's optional host user ID
>> option mandatory, so running the build accidentally as root will
>> become harder when debugging locally.  If someone really wants to run
>> the test suite as root, whatever the reasons might be, it'll still be
>> possible to do so by explicitly passing '0' as host user ID.
>
> Coincidentally, we recently set up a docker test build for our local fork
> of Git. We found the whole "mount the existing git source tree" strategy
> slightly annoying, exactly because of these mismatches between the host
> and docker uids.
>
> Instead, we ended up with something more like:
>
>   git archive HEAD | docker run ...
>
> and the in-container script starts with:
>
>   tar -x
>
> to extract the to-be-tested source, and ends with a dump of the failures
> to stdout.
>
> I don't know if it's worth switching strategies now, but just some food
> for thought. It may be less interesting with Travis, too, because you're
> also trying to carry the prove cache across runs, which does require
> some filesystem interaction.

The prove state in itself doesn't need matching user IDs in the host and
the container, because it's only accessed in the container[1].

I think the key is the handling of verbose logs of failed test(s).  The
original motivation for matching the user IDs was, I suppose, that we
wanted to dump the verbose log of the failed test(s) to the trace log on
the host, because this way it's fairly consistent with how other build
job do the same: it uses the same 'after_failure' script to dump the
verbose logs, it relies on Travis CI to automatically run that script if
something goes wrong, and those logs end up in the 'after_failure'
fold.

[1] - I ignored saving and restoring the cache here: while it's done on
      the host, but it's done by Travis CI, presumably as root, so the
      owner of the file doesn't matter.

> (As an aside, I'm not sure the prove cache is doing much. Running in
> slow-to-fast order helps if you are trying to run massively in parallel,
> but we only use -j3 for our Travis builds).

It saves about a minute / 10% of runtime; it's mentioned in 7e72cfcee
(travis-ci: save prove state for the 32 bit Linux build, 2017-12-27).

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 2/5] travis-ci: use 'set -e' in the 32 bit Linux build job
  2018-01-24 12:12                                 ` SZEDER Gábor
@ 2018-01-24 15:49                                   ` Jeff King
  0 siblings, 0 replies; 94+ messages in thread
From: Jeff King @ 2018-01-24 15:49 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, Git mailing list

On Wed, Jan 24, 2018 at 01:12:48PM +0100, SZEDER Gábor wrote:

> >> -test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER) &&
> [...]
> Indeed, the && chain is broken, I didn't noticed that.
> 
> Luckily it was broken in a way that it didn't lead to false successes:
> if installing dependencies fails, then the first && chain
> ensures that HOST_UID is not set, and then useradd will error out with
> "invalid user ID 'ci'", causing the second && chain to abort the script
> and thus breaking the build.

True. :) I didn't even look closely at whether the failures could be
correlated.

> >> +HOST_UID=$1
> >> +CI_USER=$USER
> >> +test -z $HOST_UID || (CI_USER="ci" && useradd -u $HOST_UID $CI_USER)
> >
> > If this "useradd" step fails, we wouldn't abort the script, because it's
> > part of a conditional. You'd need a manual "|| exit 1" at the end of
> > this line. Or to use a real "if" block.
> 
> No.  It does abort the script, because it isn't part of a conditional.
> Try to run the script twice in the same container instance: in the
> second run the user already exists, useradd fails and the whole script
> aborts.

You're right. I was confusing it with this case:

  (one; two) || three

in which we continue with "two" even if "one" fails. But it's only the
left-hand side of the || that does that.

-Peff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4/5] travis-ci: don't run the test suite as root in the 32 bit Linux build
  2018-01-24 13:45                                 ` SZEDER Gábor
@ 2018-01-24 15:56                                   ` Jeff King
  2018-01-24 18:01                                     ` Jeff King
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff King @ 2018-01-24 15:56 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, Git mailing list

On Wed, Jan 24, 2018 at 02:45:18PM +0100, SZEDER Gábor wrote:

> I think the key is the handling of verbose logs of failed test(s).  The
> original motivation for matching the user IDs was, I suppose, that we
> wanted to dump the verbose log of the failed test(s) to the trace log on
> the host, because this way it's fairly consistent with how other build
> job do the same: it uses the same 'after_failure' script to dump the
> verbose logs, it relies on Travis CI to automatically run that script if
> something goes wrong, and those logs end up in the 'after_failure'
> fold.

That makes sense, I guess. In the CI system I'm using there's no such
distinction, and I just do:

    tar -x && make -j64 test && exit 0
    echo "Failing tests:"
    echo "--------------"
    grep -l '[^0]' t/test-results/*.exit |
    while read failed; do
      base=${failed%.exit}
      name=${base#t/test-results/}
      echo "==> $name"
      cat "$base.out"
    done
    exit 1

There's no folds there at all, but we could of course do our own. It may
not be worth messing with, though, if you've shaken all the problems out
of the existing setup.

> > (As an aside, I'm not sure the prove cache is doing much. Running in
> > slow-to-fast order helps if you are trying to run massively in parallel,
> > but we only use -j3 for our Travis builds).
> 
> It saves about a minute / 10% of runtime; it's mentioned in 7e72cfcee
> (travis-ci: save prove state for the 32 bit Linux build, 2017-12-27).

I'm surprised we get that much benefit out of a 3-way parallel run, but
I'll believe you if you measured it. I guess it's because a lot of the
really long tests are right at the end, numerically (especially if svn
tests are enabled). I wonder if "--shuffle" would yield similar
benefits.

-Peff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4/5] travis-ci: don't run the test suite as root in the 32 bit Linux build
  2018-01-24 15:56                                   ` Jeff King
@ 2018-01-24 18:01                                     ` Jeff King
  2018-01-24 19:51                                       ` Jeff King
  0 siblings, 1 reply; 94+ messages in thread
From: Jeff King @ 2018-01-24 18:01 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, Git mailing list

On Wed, Jan 24, 2018 at 10:56:22AM -0500, Jeff King wrote:

> > > (As an aside, I'm not sure the prove cache is doing much. Running in
> > > slow-to-fast order helps if you are trying to run massively in parallel,
> > > but we only use -j3 for our Travis builds).
> > 
> > It saves about a minute / 10% of runtime; it's mentioned in 7e72cfcee
> > (travis-ci: save prove state for the 32 bit Linux build, 2017-12-27).
> 
> I'm surprised we get that much benefit out of a 3-way parallel run, but
> I'll believe you if you measured it. I guess it's because a lot of the
> really long tests are right at the end, numerically (especially if svn
> tests are enabled). I wonder if "--shuffle" would yield similar
> benefits.

Just for fun, I tried running:

  cd t
  best-of-five make GIT_PROVE_OPTS='-j3'
  best-of-five make GIT_PROVE_OPTS='-j3 --state=slow,save'
  best-of-five make GIT_PROVE_OPTS='-j3 --shuffle'

and got:

  stock
  Attempt 1: 137.057
  Attempt 2: 137.635
  Attempt 3: 138.925
  Attempt 4: 134.693
  Attempt 5: 139.581

  slow,save
  Attempt 1: 133.157
  Attempt 2: 135.602
  Attempt 3: 133.225
  Attempt 4: 136.278
  Attempt 5: 133.382

  shuffle
  Attempt 1: 136.717
  Attempt 2: 138.805
  Attempt 3: 145.734
  Attempt 4: 145.226
  Attempt 5: 145.889

I had expected the shuffle to be sometimes-fast and sometimes-slow, but
it seems like it is just-slow. So that's probably not a big win. It also
doesn't look like state-saving gets us much.

Those runs don't have cvs/svn installed. I repeated with those
installed. The whole run is much slower then (about 230s), but the
relative timings are the same.

I wonder what is different between my setup and Travis. I guess one
is that I use a tmpfs for the test-root. I wonder if that could throw
off the relative timings, or the importance of parallelization.

-Peff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 5/5] travis-ci: don't fail if user already exists on 32 bit Linux build job
  2018-01-23 16:46                               ` Jeff King
  2018-01-24  0:32                                 ` Duy Nguyen
@ 2018-01-24 19:39                                 ` SZEDER Gábor
  1 sibling, 0 replies; 94+ messages in thread
From: SZEDER Gábor @ 2018-01-24 19:39 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, Git mailing list

On Tue, Jan 23, 2018 at 5:46 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Jan 22, 2018 at 02:32:20PM +0100, SZEDER Gábor wrote:

>> diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
>> index e37e1d2d5f..13047adde3 100755
>> --- a/ci/run-linux32-build.sh
>> +++ b/ci/run-linux32-build.sh
>> @@ -33,7 +33,13 @@ then
>>       CI_USER=root
>>  else
>>       CI_USER=ci
>> -     useradd -u $HOST_UID $CI_USER
>> +     if test "$(id -u $CI_USER)" = $HOST_UID
>> +     then
>> +             : # user already exists with the right ID
>> +     else
>> +             useradd -u $HOST_UID $CI_USER
>> +     fi
>
> Is it worth redirecting the stderr of "id" to avoid noise when $CI_USER
> does not yet exist at all? Or is that a useful log message? :)

I think it's worth silencing that error.  I'm not even sure it was
useful while working on this patch series, but an error message in the
middle of the log of a successful build job is surely distracting and
might raise some eyebrows (though I suspect that barely anyone looks at
logs of successful build jobs).

OTOH, I might turn that comment into an echo msg...

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 4/5] travis-ci: don't run the test suite as root in the 32 bit Linux build
  2018-01-24 18:01                                     ` Jeff King
@ 2018-01-24 19:51                                       ` Jeff King
  0 siblings, 0 replies; 94+ messages in thread
From: Jeff King @ 2018-01-24 19:51 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, Duy Nguyen, Lars Schneider, Johannes Schindelin,
	Thomas Gummerer, Brandon Williams, Git mailing list

On Wed, Jan 24, 2018 at 01:01:42PM -0500, Jeff King wrote:

> Just for fun, I tried running:
> 
>   cd t
>   best-of-five make GIT_PROVE_OPTS='-j3'
>   best-of-five make GIT_PROVE_OPTS='-j3 --state=slow,save'
>   best-of-five make GIT_PROVE_OPTS='-j3 --shuffle'
> [...]
> I wonder what is different between my setup and Travis. I guess one
> is that I use a tmpfs for the test-root. I wonder if that could throw
> off the relative timings, or the importance of parallelization.

I just tried again without a tmpfs (actually, with
GIT_TEST_OPTS=--verbose-log, which matches Travis) and still got the
"prove" version as only about 10s faster. So I dunno. My disk is an SSD,
so that probably _is_ faster than Travis's disks, and maybe that would
matter.

I'm giving up on finding it. If you happen to have a recipe for
reproducing a larger speedup with "--state=slow,save", I'd be curious to
see it. But don't spend much time on it on my account. The existing
prove-caching setup is there and working, so even in the worst case if
it's not helping, it's not actively hurting much.

-Peff

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/3] read-cache: don't write index twice if we can't write shared index
  2018-01-22 18:27                 ` [PATCH 3/3] read-cache: don't write index twice if we can't write shared index SZEDER Gábor
  2018-01-22 19:46                   ` Eric Sunshine
  2018-01-24  9:11                   ` Duy Nguyen
@ 2018-01-26 22:44                   ` Lars Schneider
  2 siblings, 0 replies; 94+ messages in thread
From: Lars Schneider @ 2018-01-26 22:44 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Duy Nguyen, Jeff King, Thomas Gummerer, Brandon Williams, git


> On 22 Jan 2018, at 19:27, SZEDER Gábor <szeder.dev@gmail.com> wrote:
> 
> 
> On Thu, Jan 18, 2018 at 1:47 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>> On Thu, Jan 18, 2018 at 6:36 PM, SZEDER Gábor <szeder.dev@gmail.com> wrote:
>>> This series, queued as 'nd/shared-index-fix', makes the 32 bit Linux
>>> build job fail on Travis CI.  Unfortunately, all it can tell us about
>>> the failure is this:
>>> 
>>>  Test Summary Report
>>>  -------------------
>>>  t1700-split-index.sh                             (Wstat: 256 Tests: 23
>>>  Failed: 1)
>>>    Failed test:  23
>>>    Non-zero exit status: 1
>>>  Files=809, Tests=18491, 401 wallclock secs ( 7.22 usr  1.60 sys + 263.16
>>>  cusr 49.58 csys = 321.56 CPU)
>>>  Result: FAIL
>>> 
>>> because it can't access the test's verbose log due to lack of
>>> permissions.
>>> 
>>> 
>>>  https://travis-ci.org/git/git/jobs/329681826#L2074
>> 
>> I may need help getting that log (or even better the trash directory
>> of t1700). 
> 
> Well, shower thoughts gave me an idea, see the included PoC patch below.
> I can't really decide whether it's too clever or too ugly :)

Creative :-)

Alternatively, we could store write access credentials [1] of a public
repo in Travis and upload the trash directory to it. This could
work "out-of-the-box" for git/git. Personal Travis environments (e.g.
larsxschneider/git) would need to setup that individually.

- Lars

[1] https://docs.travis-ci.com/user/environment-variables/#Defining-Variables-in-Repository-Settings
    ^^^ that's the mechanism we use for the Windows build credentials

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-22  6:14           ` Junio C Hamano
@ 2018-01-27 12:18             ` Thomas Gummerer
  2018-02-07 22:41             ` Junio C Hamano
  1 sibling, 0 replies; 94+ messages in thread
From: Thomas Gummerer @ 2018-01-27 12:18 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Lars Schneider, Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

On 01/21, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > On 01/19, Junio C Hamano wrote:
> >> Thomas Gummerer <t.gummerer@gmail.com> writes:
> >> 
> >> > read_cache_from() defaults to using the gitdir of the_repository.  As it
> >> > is mostly a convenience macro, having to pass get_git_dir() for every
> >> > call seems overkill, and if necessary users can have more control by
> >> > using read_index_from().
> >> 
> >> This was a bit painful change, given that some changes in flight do
> >> add new callsites to read_index_from() and they got the function
> >> changed under their feet.
> >
> > Sorry about that.  Is there any way to make such a change less painful
> > in the future?
> 
> One way is to do for read_index_from() what you did for the existing
> users of read_cache_from().  Introduce a _new_ helper that will not
> be known for any existing topics in flight, and use that to make the
> existing API a thin wrapper around it.

I'll do that next time.  My worries were just that the
'read_index_from()' API is broken with split index, which is what this
series fixes.  It would be easy to introduce new breakages if we're
not careful.

> I _think_ I got it right with evil merge, so unless this fix needs
> to be delayed for extended period of time for whatever reason while
> any more new callers of the function appears (which is unlikely), we
> should be OK ;-)

Thanks!  As mentioned there's one call that was added that the evil
merge didn't get quite right, for which I sent the diff below in my
previous email.  But I'm happy to fix that on top once this series
goes to master or next if that's preferred.

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 6a49f9e628..4d86a3574f 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -612,7 +612,8 @@ static void validate_no_submodules(const struct worktree *wt)
        struct index_state istate = {0};
        int i, found_submodules = 0;
 
-       if (read_index_from(&istate, worktree_git_path(wt, "index"), get_git_dir()) > 0) {
+       if (read_index_from(&istate, worktree_git_path(wt, "index"),
+                           get_worktree_git_dir(wt)) > 0) {
                for (i = 0; i < istate.cache_nr; i++) {
                        struct cache_entry *ce = istate.cache[i];
 
diff --git a/t/t2028-worktree-move.sh b/t/t2028-worktree-move.sh
index b3105eaaed..8faf61bbf5 100755
--- a/t/t2028-worktree-move.sh
+++ b/t/t2028-worktree-move.sh
@@ -90,6 +90,16 @@ test_expect_success 'move main worktree' '
        test_must_fail git worktree move . def
 '
 
+test_expect_success 'move worktree with split index' '
+       git worktree add test &&
+       (
+               cd test &&
+               test_commit file &&
+               git update-index --split-index
+       ) &&
+       git worktree move test test-destination
+'
+
 test_expect_success 'remove main worktree' '
        test_must_fail git worktree remove .
 '

> Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
  2018-01-22  6:14           ` Junio C Hamano
  2018-01-27 12:18             ` Thomas Gummerer
@ 2018-02-07 22:41             ` Junio C Hamano
  1 sibling, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2018-02-07 22:41 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Lars Schneider, Brandon Williams, Jeff King,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

Junio C Hamano <gitster@pobox.com> writes:

>>> This was a bit painful change, given that some changes in flight do
>>> add new callsites to read_index_from() and they got the function
>>> changed under their feet.
>>
>> Sorry about that.  Is there any way to make such a change less painful
>> in the future?
>
> One way is to do for read_index_from() what you did for the existing
> users of read_cache_from().  Introduce a _new_ helper that will not
> be known for any existing topics in flight, and use that to make the
> existing API a thin wrapper around it.

This continues to cause pain simply because read_index_from() is
something new topics want to stay stable X-<.  

I'll be merging this to 'next' hopefully as part of today's
integration run, so will need to endure the pain only until it (and
other topics that conflict with it) all graduate to 'master'.

Next time a patch with an internal API change like this appears,
please remind me to push it back a lot stronger.  I was too lenient
and ended up slowing down the progress of other topics this time.
Thanks.



^ permalink raw reply	[flat|nested] 94+ messages in thread

end of thread, back to index

Thread overview: 94+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-10 21:21 [PATCH 0/3] fixes for split index mode Thomas Gummerer
2017-12-10 21:22 ` [PATCH 1/3] repository: fix repo_read_index with submodules Thomas Gummerer
2017-12-11 18:54   ` Brandon Williams
2017-12-11 20:37     ` Thomas Gummerer
2017-12-10 21:22 ` [PATCH 2/3] prune: fix pruning with multiple worktrees and split index Thomas Gummerer
2017-12-11 19:09   ` Brandon Williams
2017-12-11 21:39     ` Thomas Gummerer
2017-12-10 21:22 ` [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
2017-12-10 23:37   ` Eric Sunshine
2017-12-11 21:09   ` SZEDER Gábor
2017-12-11 21:42     ` Thomas Gummerer
2017-12-12 15:54       ` Lars Schneider
2017-12-12 19:15         ` Junio C Hamano
2017-12-12 20:15           ` Thomas Gummerer
2017-12-12 20:51             ` Junio C Hamano
2017-12-13 23:21               ` Thomas Gummerer
2017-12-13 17:21           ` Lars Schneider
2017-12-13 17:38             ` Junio C Hamano
2017-12-13 17:46               ` Lars Schneider
2017-12-13 23:28                 ` Thomas Gummerer
2017-12-17 22:51 ` [PATCH v2 0/3] fixes for split index mode Thomas Gummerer
2017-12-17 22:51   ` [PATCH v2 1/3] repository: fix repo_read_index with submodules Thomas Gummerer
2017-12-18 18:01     ` Brandon Williams
2017-12-18 23:05       ` Thomas Gummerer
2017-12-18 23:05         ` Brandon Williams
2017-12-17 22:51   ` [PATCH v2 2/3] prune: fix pruning with multiple worktrees and split index Thomas Gummerer
2017-12-18 18:19     ` Brandon Williams
2018-01-03 22:18       ` Thomas Gummerer
2018-01-04 19:12         ` Junio C Hamano
2017-12-17 22:51   ` [PATCH v2 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
2017-12-18 18:16     ` Lars Schneider
2018-01-04 20:13       ` Thomas Gummerer
2018-01-05 11:03         ` Lars Schneider
2018-01-07 20:02         ` Thomas Gummerer
2018-01-07 22:30   ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
2018-01-07 22:30     ` [PATCH v3 1/3] read-cache: fix reading the shared index for other repos Thomas Gummerer
2018-01-08 10:41       ` Duy Nguyen
2018-01-08 22:41         ` Thomas Gummerer
2018-01-13 22:33           ` Thomas Gummerer
2018-01-08 23:38         ` Brandon Williams
2018-01-09  1:24           ` Duy Nguyen
2018-01-16 21:42       ` Brandon Williams
2018-01-17  0:16         ` Duy Nguyen
2018-01-17  0:32           ` Brandon Williams
2018-01-17 18:16           ` Jonathan Nieder
2018-01-18 10:19             ` Duy Nguyen
2018-01-19 21:57       ` Junio C Hamano
2018-01-20 11:58         ` Thomas Gummerer
2018-01-22  6:14           ` Junio C Hamano
2018-01-27 12:18             ` Thomas Gummerer
2018-02-07 22:41             ` Junio C Hamano
2018-01-07 22:30     ` [PATCH v3 2/3] split-index: don't write cache tree with null oid entries Thomas Gummerer
2018-01-07 22:30     ` [PATCH v3 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
2018-01-13 22:37     ` [PATCH v3 4/3] read-cache: don't try to write index if we can't write shared index Thomas Gummerer
2018-01-14  9:36       ` Duy Nguyen
2018-01-14 10:18         ` [PATCH 1/3] read-cache.c: change type of "temp" in write_shared_index() Nguyễn Thái Ngọc Duy
2018-01-14 10:18           ` [PATCH 2/3] read-cache.c: move tempfile creation/cleanup out of write_shared_index Nguyễn Thái Ngọc Duy
2018-01-14 10:18           ` [PATCH 3/3] read-cache: don't write index twice if we can't write shared index Nguyễn Thái Ngọc Duy
2018-01-18 11:36             ` SZEDER Gábor
2018-01-18 12:47               ` Duy Nguyen
2018-01-18 13:29                 ` Jeff King
2018-01-18 13:36                   ` Duy Nguyen
2018-01-18 15:00                     ` Duy Nguyen
2018-01-18 21:37                       ` Jeff King
2018-01-18 22:32                         ` SZEDER Gábor
2018-01-19  0:30                           ` Duy Nguyen
2018-01-22 13:32                           ` [PATCH 0/5] Travis CI: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
2018-01-22 13:32                             ` [PATCH 1/5] travis-ci: use 'set -x' for the commands under 'su' " SZEDER Gábor
2018-01-22 13:32                             ` [PATCH 2/5] travis-ci: use 'set -e' in the 32 bit Linux build job SZEDER Gábor
2018-01-23 16:26                               ` Jeff King
2018-01-23 16:32                                 ` Jeff King
2018-01-24 12:12                                 ` SZEDER Gábor
2018-01-24 15:49                                   ` Jeff King
2018-01-22 13:32                             ` [PATCH 3/5] travis-ci: don't repeat the path of the cache directory SZEDER Gábor
2018-01-23 16:30                               ` Jeff King
2018-01-24 13:14                                 ` SZEDER Gábor
2018-01-22 13:32                             ` [PATCH 4/5] travis-ci: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
2018-01-23 16:43                               ` Jeff King
2018-01-24 13:45                                 ` SZEDER Gábor
2018-01-24 15:56                                   ` Jeff King
2018-01-24 18:01                                     ` Jeff King
2018-01-24 19:51                                       ` Jeff King
2018-01-22 13:32                             ` [PATCH 5/5] travis-ci: don't fail if user already exists on 32 bit Linux build job SZEDER Gábor
2018-01-23 16:46                               ` Jeff King
2018-01-24  0:32                                 ` Duy Nguyen
2018-01-24 19:39                                 ` SZEDER Gábor
2018-01-22 18:27                 ` [PATCH 3/3] read-cache: don't write index twice if we can't write shared index SZEDER Gábor
2018-01-22 19:46                   ` Eric Sunshine
2018-01-22 22:10                     ` SZEDER Gábor
2018-01-24  9:11                   ` Duy Nguyen
2018-01-26 22:44                   ` Lars Schneider
2018-01-14 14:29         ` [PATCH v3 4/3] read-cache: don't try to write index " Thomas Gummerer
2018-01-18 21:53     ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
2018-01-19 18:34       ` Junio C Hamano
2018-01-19 21:11         ` Thomas Gummerer

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox