git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/2] worktree add race fix
@ 2019-02-18 17:04 Michal Suchanek
  2019-02-18 17:04 ` [PATCH 1/2] worktree: fix worktree add race Michal Suchanek
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Michal Suchanek @ 2019-02-18 17:04 UTC (permalink / raw)
  To: git
  Cc: Michal Suchanek, Eric Sunshine, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

Hello,

I am running a git automation script that creates a tree and commits it into a
git repository repeatedly.

I noticed that the step which creates a tree is the most time-consuming part of
the script, and when a lot of data is to be automatically added to the
repository it is beneficial to parallelize this part.

To do so I had the script create a dozen worktrees and share the work between
them. The problem is that automatically creating several worktrees occasionally
fails.

The most common problem is in the worktree add implementation itself which
tries to find an available directory name and then mkdir() it. Of course, doing
that several times in parallel causes issues.
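
A minimal sketch of the idea, outside of Git: instead of probing with stat()
and creating the directory later, let mkdir() both test and claim the name,
and retry with a numeric suffix only on EEXIST. The helper name
claim_unique_dir and its parameters are hypothetical and chosen for
illustration; this is not the code in builtin/worktree.c.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Claim a unique directory by letting mkdir() arbitrate between
 * concurrent callers. Tries "base", then "base1", "base2", ... until
 * one mkdir() succeeds. Returns 0 with the claimed path in out, or -1
 * with errno set. (Hypothetical helper, not a Git function.)
 */
int claim_unique_dir(const char *base, char *out, size_t outlen)
{
	unsigned int counter = 0;
	int n = snprintf(out, outlen, "%s", base);

	if (n < 0 || (size_t)n >= outlen) {
		errno = ENAMETOOLONG;
		return -1;
	}
	while (mkdir(out, 0777)) {
		if (errno != EEXIST)
			return -1;	/* real failure, e.g. EACCES */
		if (++counter == 0) {	/* counter wrapped: give up */
			errno = EEXIST;
			return -1;
		}
		n = snprintf(out, outlen, "%s%u", base, counter);
		if (n < 0 || (size_t)n >= outlen) {
			errno = ENAMETOOLONG;
			return -1;
		}
	}
	return 0;
}
```

Because mkdir() fails with EEXIST when the name is already taken, two
processes racing for the same name cannot both succeed; the loser simply moves
on to the next suffix.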

When running a stress test to make sure the fix is effective I uncovered
additional issues in get_common_dir_noenv. This function is used on each
worktree to build a worktree list.

Apparently it can happen that stat() claims there is a commondir file but when
trying to open the file it is missing.

Another even rarer issue is that the file might be zero size because another
process initializing a worktree opened the file but has not written its content
yet.

When any of this happens git aborts, failing to create a worktree because an
unrelated worktree is not yet fully initialized.
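
The stat()-then-open gap described above can be avoided by opening the file
directly and treating "not there" as an ordinary outcome. The following is a
sketch under stated assumptions: read_optional_file is a hypothetical helper
(Git itself uses strbuf_read_file), and the fixed-size buffer is a
simplification.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read a small text file without checking for its existence first.
 * Returns 1 if the file was read (buf holds its content with trailing
 * CR/LF stripped, possibly empty), 0 if it does not exist, and -1 on
 * any other error. Content beyond buflen - 1 bytes is ignored.
 * (Hypothetical helper, not part of Git.)
 */
int read_optional_file(const char *path, char *buf, size_t buflen)
{
	size_t len = 0;
	ssize_t n;
	int fd, saved_errno;

	if (!buflen) {
		errno = EINVAL;
		return -1;
	}
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (errno == ENOENT || errno == ENOTDIR) ? 0 : -1;

	while (len < buflen - 1) {
		n = read(fd, buf + len, buflen - 1 - len);
		if (n < 0 && errno == EINTR)
			continue;	/* interrupted, retry */
		if (n < 0) {
			saved_errno = errno;
			close(fd);
			errno = saved_errno;
			return -1;
		}
		if (n == 0)
			break;		/* end of file */
		len += (size_t)n;
	}
	close(fd);

	while (len && (buf[len - 1] == '\n' || buf[len - 1] == '\r'))
		len--;
	buf[len] = '\0';
	return 1;
}
```

A caller can then tell "missing" (return 0) apart from "present but empty"
(return 1 with an empty buf) and decide separately how to treat each case.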

I have tested that these patches fix the issue. However, I expect a race against
removing/pruning worktrees is still possible.

For previous discussion see

http://public-inbox.org/git/CAPig+cSdpq0Bfq3zSK8kJd6da3dKixK7qYQ24=ZwbuQtsaLNZw@mail.gmail.com/

Michal Suchanek (2):
  worktree: fix worktree add race.
  setup: don't fail if commondir reference is deleted.

 builtin/worktree.c | 12 +++++++-----
 setup.c            | 16 +++++++++++-----
 2 files changed, 18 insertions(+), 10 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2] worktree: fix worktree add race.
  2019-02-18 17:04 [PATCH 0/2] worktree add race fix Michal Suchanek
@ 2019-02-18 17:04 ` Michal Suchanek
  2019-02-18 17:04 ` [PATCH 2/2] setup: don't fail if commondir reference is deleted Michal Suchanek
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Michal Suchanek @ 2019-02-18 17:04 UTC (permalink / raw)
  To: git
  Cc: Michal Suchanek, Eric Sunshine, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

Git runs a stat loop to find a worktree name that's available and then does
mkdir on the found name. Turn it into a mkdir loop to avoid another invocation
of worktree add finding the same free name and creating the directory first.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v2:
- simplify loop exit condition
- exit early if the mkdir fails for reason other than already present
worktree
- make counter unsigned
---
 builtin/worktree.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 3f9907fcc994..85a604cfe98c 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -268,10 +268,10 @@ static int add_worktree(const char *path, const char *refname,
 	struct strbuf sb_git = STRBUF_INIT, sb_repo = STRBUF_INIT;
 	struct strbuf sb = STRBUF_INIT;
 	const char *name;
-	struct stat st;
 	struct child_process cp = CHILD_PROCESS_INIT;
 	struct argv_array child_env = ARGV_ARRAY_INIT;
-	int counter = 0, len, ret;
+	unsigned int counter = 0;
+	int len, ret;
 	struct strbuf symref = STRBUF_INIT;
 	struct commit *commit = NULL;
 	int is_branch = 0;
@@ -295,8 +295,12 @@ static int add_worktree(const char *path, const char *refname,
 	if (safe_create_leading_directories_const(sb_repo.buf))
 		die_errno(_("could not create leading directories of '%s'"),
 			  sb_repo.buf);
-	while (!stat(sb_repo.buf, &st)) {
+
+	while (mkdir(sb_repo.buf, 0777)) {
 		counter++;
+		if ((errno != EEXIST) || !counter /* overflow */)
+			die_errno(_("could not create directory of '%s'"),
+				  sb_repo.buf);
 		strbuf_setlen(&sb_repo, len);
 		strbuf_addf(&sb_repo, "%d", counter);
 	}
@@ -306,8 +310,6 @@ static int add_worktree(const char *path, const char *refname,
 	atexit(remove_junk);
 	sigchain_push_common(remove_junk_on_signal);
 
-	if (mkdir(sb_repo.buf, 0777))
-		die_errno(_("could not create directory of '%s'"), sb_repo.buf);
 	junk_git_dir = xstrdup(sb_repo.buf);
 	is_junk = 1;
 
-- 
2.20.1



* [PATCH 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-18 17:04 [PATCH 0/2] worktree add race fix Michal Suchanek
  2019-02-18 17:04 ` [PATCH 1/2] worktree: fix worktree add race Michal Suchanek
@ 2019-02-18 17:04 ` Michal Suchanek
  2019-02-18 21:00   ` Eric Sunshine
  2019-02-21 10:50   ` Duy Nguyen
  2019-02-20 16:16 ` [PATCH v3 1/2] worktree: fix worktree add race Michal Suchanek
  2019-02-20 16:16 ` [PATCH v3 2/2] setup: don't fail if commondir reference is deleted Michal Suchanek
  3 siblings, 2 replies; 27+ messages in thread
From: Michal Suchanek @ 2019-02-18 17:04 UTC (permalink / raw)
  To: git
  Cc: Michal Suchanek, Eric Sunshine, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

When adding worktrees git can die in get_common_dir_noenv while
examining existing worktrees because the commondir file does not exist.
Rather than testing if the file exists before reading it, handle ENOENT.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v2:
- do not test file existence first, just read it and handle ENOENT.
- handle zero size file correctly
---
 setup.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/setup.c b/setup.c
index ca9e8a949ed8..dd865f280d34 100644
--- a/setup.c
+++ b/setup.c
@@ -270,12 +270,20 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 {
 	struct strbuf data = STRBUF_INIT;
 	struct strbuf path = STRBUF_INIT;
-	int ret = 0;
+	int ret;
 
 	strbuf_addf(&path, "%s/commondir", gitdir);
-	if (file_exists(path.buf)) {
-		if (strbuf_read_file(&data, path.buf, 0) <= 0)
+	ret = strbuf_read_file(&data, path.buf, 0);
+	if (ret <= 0) {
+		/*
+		 * if file is missing or zero size (just being written)
+		 * assume default, bail otherwise
+		 */
+		if (ret && errno != ENOENT)
 			die_errno(_("failed to read %s"), path.buf);
+		strbuf_addstr(sb, gitdir);
+		ret = 0;
+	} else {
 		while (data.len && (data.buf[data.len - 1] == '\n' ||
 				    data.buf[data.len - 1] == '\r'))
 			data.len--;
@@ -286,8 +294,6 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 		strbuf_addbuf(&path, &data);
 		strbuf_add_real_path(sb, path.buf);
 		ret = 1;
-	} else {
-		strbuf_addstr(sb, gitdir);
 	}
 
 	strbuf_release(&data);
-- 
2.20.1



* Re: [PATCH 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-18 17:04 ` [PATCH 2/2] setup: don't fail if commondir reference is deleted Michal Suchanek
@ 2019-02-18 21:00   ` Eric Sunshine
  2019-02-21 10:50   ` Duy Nguyen
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Sunshine @ 2019-02-18 21:00 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Git List, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

On Mon, Feb 18, 2019 at 12:05 PM Michal Suchanek <msuchanek@suse.de> wrote:
> When adding wotktrees git can die in get_common_dir_noenv while
> examining existing worktrees because the commondir file does not exist.
> Rather than testing if the file exists before reading it handle ENOENT.

This commit message leaves the reader wondering under what conditions
"commondir file" might not exist. For instance, the reader might
wonder "Is it simply a condition of normal operation or does it arise
under odd circumstances?". Without this information, it is difficult
for someone reading the explanation to understand if or how this code
might validly be changed in the future. Your cover letter contained
explanation which likely ought to be duplicated here as an aid to
future readers.

> Signed-off-by: Michal Suchanek <msuchanek@suse.de>


* [PATCH v3 1/2] worktree: fix worktree add race.
  2019-02-18 17:04 [PATCH 0/2] worktree add race fix Michal Suchanek
  2019-02-18 17:04 ` [PATCH 1/2] worktree: fix worktree add race Michal Suchanek
  2019-02-18 17:04 ` [PATCH 2/2] setup: don't fail if commondir reference is deleted Michal Suchanek
@ 2019-02-20 16:16 ` Michal Suchanek
  2019-02-20 16:34   ` Eric Sunshine
  2019-03-08  9:20   ` Duy Nguyen
  2019-02-20 16:16 ` [PATCH v3 2/2] setup: don't fail if commondir reference is deleted Michal Suchanek
  3 siblings, 2 replies; 27+ messages in thread
From: Michal Suchanek @ 2019-02-20 16:16 UTC (permalink / raw)
  To: git
  Cc: Michal Suchanek, Eric Sunshine, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

Git runs a stat loop to find a worktree name that's available and then does
mkdir on the found name. Turn it into a mkdir loop to avoid another invocation
of worktree add finding the same free name and creating the directory first.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v2:
- simplify loop exit condition
- exit early if the mkdir fails for reason other than already present
worktree
- make counter unsigned
---
 builtin/worktree.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 3f9907fcc994..85a604cfe98c 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -268,10 +268,10 @@ static int add_worktree(const char *path, const char *refname,
 	struct strbuf sb_git = STRBUF_INIT, sb_repo = STRBUF_INIT;
 	struct strbuf sb = STRBUF_INIT;
 	const char *name;
-	struct stat st;
 	struct child_process cp = CHILD_PROCESS_INIT;
 	struct argv_array child_env = ARGV_ARRAY_INIT;
-	int counter = 0, len, ret;
+	unsigned int counter = 0;
+	int len, ret;
 	struct strbuf symref = STRBUF_INIT;
 	struct commit *commit = NULL;
 	int is_branch = 0;
@@ -295,8 +295,12 @@ static int add_worktree(const char *path, const char *refname,
 	if (safe_create_leading_directories_const(sb_repo.buf))
 		die_errno(_("could not create leading directories of '%s'"),
 			  sb_repo.buf);
-	while (!stat(sb_repo.buf, &st)) {
+
+	while (mkdir(sb_repo.buf, 0777)) {
 		counter++;
+		if ((errno != EEXIST) || !counter /* overflow */)
+			die_errno(_("could not create directory of '%s'"),
+				  sb_repo.buf);
 		strbuf_setlen(&sb_repo, len);
 		strbuf_addf(&sb_repo, "%d", counter);
 	}
@@ -306,8 +310,6 @@ static int add_worktree(const char *path, const char *refname,
 	atexit(remove_junk);
 	sigchain_push_common(remove_junk_on_signal);
 
-	if (mkdir(sb_repo.buf, 0777))
-		die_errno(_("could not create directory of '%s'"), sb_repo.buf);
 	junk_git_dir = xstrdup(sb_repo.buf);
 	is_junk = 1;
 
-- 
2.20.1



* [PATCH v3 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-18 17:04 [PATCH 0/2] worktree add race fix Michal Suchanek
                   ` (2 preceding siblings ...)
  2019-02-20 16:16 ` [PATCH v3 1/2] worktree: fix worktree add race Michal Suchanek
@ 2019-02-20 16:16 ` Michal Suchanek
  2019-02-20 16:55   ` Eric Sunshine
  3 siblings, 1 reply; 27+ messages in thread
From: Michal Suchanek @ 2019-02-20 16:16 UTC (permalink / raw)
  To: git
  Cc: Michal Suchanek, Eric Sunshine, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

Apparently it can happen that stat() claims there is a commondir file but when
trying to open the file it is missing.

Another even rarer issue is that the file might be zero size because another
process initializing a worktree opened the file but has not written its content
yet.

When any of this happnes git aborts failing to perform perfectly valid
command because unrelated worktree is not yet fully initialized.

Rather than testing if the file exists before reading it handle ENOENT
and ENOTDIR.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v2:
- do not test file existence first, just read it and handle ENOENT.
- handle zero size file correctly
v3:
- handle ENOTDIR as well
- add more details to commit message
---
 setup.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/setup.c b/setup.c
index ca9e8a949ed8..49306e36990d 100644
--- a/setup.c
+++ b/setup.c
@@ -270,12 +270,20 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 {
 	struct strbuf data = STRBUF_INIT;
 	struct strbuf path = STRBUF_INIT;
-	int ret = 0;
+	int ret;
 
 	strbuf_addf(&path, "%s/commondir", gitdir);
-	if (file_exists(path.buf)) {
-		if (strbuf_read_file(&data, path.buf, 0) <= 0)
+	ret = strbuf_read_file(&data, path.buf, 0);
+	if (ret <= 0) {
+		/*
+		 * if file is missing or zero size (just being written)
+		 * assume default, bail otherwise
+		 */
+		if (ret && errno != ENOENT && errno != ENOTDIR)
 			die_errno(_("failed to read %s"), path.buf);
+		strbuf_addstr(sb, gitdir);
+		ret = 0;
+	} else {
 		while (data.len && (data.buf[data.len - 1] == '\n' ||
 				    data.buf[data.len - 1] == '\r'))
 			data.len--;
@@ -286,8 +294,6 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 		strbuf_addbuf(&path, &data);
 		strbuf_add_real_path(sb, path.buf);
 		ret = 1;
-	} else {
-		strbuf_addstr(sb, gitdir);
 	}
 
 	strbuf_release(&data);
-- 
2.20.1



* Re: [PATCH v3 1/2] worktree: fix worktree add race.
  2019-02-20 16:16 ` [PATCH v3 1/2] worktree: fix worktree add race Michal Suchanek
@ 2019-02-20 16:34   ` Eric Sunshine
  2019-02-20 17:29     ` Michal Suchánek
  2019-03-08  9:20   ` Duy Nguyen
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Sunshine @ 2019-02-20 16:34 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Git List, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

On Wed, Feb 20, 2019 at 11:17 AM Michal Suchanek <msuchanek@suse.de> wrote:
> Git runs a stat loop to find a worktree name that's available and then does
> mkdir on the found name. Turn it to mkdir loop to avoid another invocation of
> worktree add finding the same free name and creating the directory first.
>
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> @@ -295,8 +295,12 @@ static int add_worktree(const char *path, const char *refname,
>         if (safe_create_leading_directories_const(sb_repo.buf))
>                 die_errno(_("could not create leading directories of '%s'"),
>                           sb_repo.buf);
> -       while (!stat(sb_repo.buf, &st)) {
> +       while (mkdir(sb_repo.buf, 0777)) {
>                 counter++;
> +               if ((errno != EEXIST) || !counter /* overflow */)
> +                       die_errno(_("could not create directory of '%s'"),
> +                                 sb_repo.buf);
>                 strbuf_setlen(&sb_repo, len);
>                 strbuf_addf(&sb_repo, "%d", counter);
>         }
> @@ -306,8 +310,6 @@ static int add_worktree(const char *path, const char *refname,
>         atexit(remove_junk);
>         sigchain_push_common(remove_junk_on_signal);
> -       if (mkdir(sb_repo.buf, 0777))
> -               die_errno(_("could not create directory of '%s'"), sb_repo.buf);
>         junk_git_dir = xstrdup(sb_repo.buf);
>         is_junk = 1;

Did you audit this "junk" handling to verify that stuff which ought to
be cleaned up still is cleaned up now that the mkdir() and die() have
been moved above the atexit(remove_junk) invocation?

I did just audit it, and I _think_ that it still works as expected,
but it would be good to hear that someone else has come to the same
conclusion.


* Re: [PATCH v3 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-20 16:16 ` [PATCH v3 2/2] setup: don't fail if commondir reference is deleted Michal Suchanek
@ 2019-02-20 16:55   ` Eric Sunshine
  2019-02-20 17:16     ` Michal Suchánek
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Sunshine @ 2019-02-20 16:55 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Git List, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

On Wed, Feb 20, 2019 at 11:17 AM Michal Suchanek <msuchanek@suse.de> wrote:
> Apparently it can happen that stat() claims there is a commondir file but when
> trying to open the file it is missing.

Under what circumstances?

> Another even rarer issue is that the file might be zero size because another
> process initializing a worktree opened the file but has not written is content
> yet.

Based upon the explanation thus far, I'm having trouble understanding
under what circumstances these race conditions can arise. Are you
trying to invoke Git commands in a particular worktree even as the
worktree itself is being created?

Without this information being spelled out clearly, it is going to be
difficult for someone in the future to reason about why the code is
the way it is following this change.

> When any of this happnes git aborts failing to perform perfectly valid
> command because unrelated worktree is not yet fully initialized.

s/happnes/happens/

> Rather than testing if the file exists before reading it handle ENOENT
> and ENOTDIR.

One more comment below...

> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
> diff --git a/setup.c b/setup.c
> @@ -270,12 +270,20 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
>  {
>         strbuf_addf(&path, "%s/commondir", gitdir);
> -       if (file_exists(path.buf)) {
> -               if (strbuf_read_file(&data, path.buf, 0) <= 0)
> +       ret = strbuf_read_file(&data, path.buf, 0);
> +       if (ret <= 0) {
> +               /*
> +                * if file is missing or zero size (just being written)
> +                * assume default, bail otherwise
> +                */
> +               if (ret && errno != ENOENT && errno != ENOTDIR)
>                         die_errno(_("failed to read %s"), path.buf);

It's not clear from the explanation given in the commit message if the
new behavior is indeed sensible. The original intent of the code, as I
understand it, is to validate "commondir", to ensure that it is not
somehow corrupt (such as the user editing it and making it empty).
Following this change, that particular validation no longer takes
place. But, more importantly, what does it mean to fall back to
"default" for this particular worktree? I'm having trouble
understanding how the new behavior can be correct or desirable. (Am I
missing something obvious?)

> +               strbuf_addstr(sb, gitdir);
> +               ret = 0;
> +       } else {


* Re: [PATCH v3 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-20 16:55   ` Eric Sunshine
@ 2019-02-20 17:16     ` Michal Suchánek
  2019-02-20 18:35       ` Eric Sunshine
  0 siblings, 1 reply; 27+ messages in thread
From: Michal Suchánek @ 2019-02-20 17:16 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

On Wed, 20 Feb 2019 11:55:46 -0500
Eric Sunshine <sunshine@sunshineco.com> wrote:

> On Wed, Feb 20, 2019 at 11:17 AM Michal Suchanek <msuchanek@suse.de> wrote:
> > Apparently it can happen that stat() claims there is a commondir file but when
> > trying to open the file it is missing.  
> 
> Under what circumstances?

I would like to know that as well. The only command tested was worktree
add which should not remove the file. Nonetheless running many worktree
add commands in parallel can cause the file to go away for some of
them. For many commands git calls itself recursively so there is
probably much more going on than the single function that creates the
worktree.

> 
> > Another even rarer issue is that the file might be zero size because another
> > process initializing a worktree opened the file but has not written is content
> > yet.  
> 
> Based upon the explanation thus far, I'm having trouble understanding
> under what circumstances these race conditions can arise. Are you
> trying to invoke Git commands in a particular worktree even as the
> worktree itself is being created?

It's explained in the following paragraph. If you have multiple
worktrees some *other* worktree may be uninitialized.

> 
> Without this information being spelled out clearly, it is going to be
> difficult for someone in the future to reason about why the code is
> the way it is following this change.
> 
> > When any of this happnes git aborts failing to perform perfectly valid
> > command because unrelated worktree is not yet fully initialized.  
> 
> s/happnes/happens/
> 
> > Rather than testing if the file exists before reading it handle ENOENT
> > and ENOTDIR.  
> 
> One more comment below...
> 
> > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> > ---
> > diff --git a/setup.c b/setup.c
> > @@ -270,12 +270,20 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
> >  {
> >         strbuf_addf(&path, "%s/commondir", gitdir);
> > -       if (file_exists(path.buf)) {
> > -               if (strbuf_read_file(&data, path.buf, 0) <= 0)
> > +       ret = strbuf_read_file(&data, path.buf, 0);
> > +       if (ret <= 0) {
> > +               /*
> > +                * if file is missing or zero size (just being written)
> > +                * assume default, bail otherwise
> > +                */
> > +               if (ret && errno != ENOENT && errno != ENOTDIR)
> >                         die_errno(_("failed to read %s"), path.buf);  
> 
> It's not clear from the explanation given in the commit message if the
> new behavior is indeed sensible. The original intent of the code, as I
> understand it, is to validate "commondir", to ensure that it is not
> somehow corrupt (such as the user editing it and making it empty).

How is it validated in the code below when it is non-zero size?

There is *no* validation whatsoever. Yet zero size is somehow totally
unacceptable and requires that git working in *any* worktree aborts if
commondir file in *any* worktree is zero size.

> Following this change, that particular validation no longer takes
> place. But, more importantly, what does it mean to fall back to
> "default" for this particular worktree? I'm having trouble
> understanding how the new behavior can be correct or desirable. (Am I
> missing something obvious?)

If the file can be missing altogether and that is not an error, how is it
incorrect or undesirable to ignore a zero-size file?

Thanks

Michal


* Re: [PATCH v3 1/2] worktree: fix worktree add race.
  2019-02-20 16:34   ` Eric Sunshine
@ 2019-02-20 17:29     ` Michal Suchánek
  0 siblings, 0 replies; 27+ messages in thread
From: Michal Suchánek @ 2019-02-20 17:29 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

On Wed, 20 Feb 2019 11:34:54 -0500
Eric Sunshine <sunshine@sunshineco.com> wrote:

> On Wed, Feb 20, 2019 at 11:17 AM Michal Suchanek <msuchanek@suse.de> wrote:
> > Git runs a stat loop to find a worktree name that's available and then does
> > mkdir on the found name. Turn it to mkdir loop to avoid another invocation of
> > worktree add finding the same free name and creating the directory first.
> >
> > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> > ---
> > diff --git a/builtin/worktree.c b/builtin/worktree.c
> > @@ -295,8 +295,12 @@ static int add_worktree(const char *path, const char *refname,
> >         if (safe_create_leading_directories_const(sb_repo.buf))
> >                 die_errno(_("could not create leading directories of '%s'"),
> >                           sb_repo.buf);
> > -       while (!stat(sb_repo.buf, &st)) {
> > +       while (mkdir(sb_repo.buf, 0777)) {
> >                 counter++;
> > +               if ((errno != EEXIST) || !counter /* overflow */)
> > +                       die_errno(_("could not create directory of '%s'"),
> > +                                 sb_repo.buf);
> >                 strbuf_setlen(&sb_repo, len);
> >                 strbuf_addf(&sb_repo, "%d", counter);
> >         }
> > @@ -306,8 +310,6 @@ static int add_worktree(const char *path, const char *refname,
> >         atexit(remove_junk);
> >         sigchain_push_common(remove_junk_on_signal);
> > -       if (mkdir(sb_repo.buf, 0777))
> > -               die_errno(_("could not create directory of '%s'"), sb_repo.buf);
> >         junk_git_dir = xstrdup(sb_repo.buf);
> >         is_junk = 1;  
> 
> Did you audit this "junk" handling to verify that stuff which ought to
> be cleaned up still is cleaned up now that the mkdir() and die() have
> been moved above the atexit(remove_junk) invocation?
> 
> I did just audit it, and I _think_ that it still works as expected,
> but it would be good to hear that someone else has come to the same
> conclusion.

The die() is executed only when mkdir() fails so there is no junk to
clean up in that case.
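
To illustrate the ordering argument, here is a simplified sketch. The names
remove_junk and is_junk mirror those visible in the patch context, but the
bodies below are assumptions written for illustration, not the code in
builtin/worktree.c; create_tracked_dir and install_junk_handler are
hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static char junk_dir[4096];	/* directory to remove at exit, if any */
static int is_junk;		/* set only once a directory exists */

/* Exit handler: a no-op unless a directory was actually created. */
void remove_junk(void)
{
	if (!is_junk)
		return;
	rmdir(junk_dir);
	is_junk = 0;
}

/* Register the handler; safe to call before or after the mkdir(). */
int install_junk_handler(void)
{
	return atexit(remove_junk);
}

/*
 * Create the directory first and mark it as junk only on success.
 * If mkdir() fails, nothing exists yet, so the handler has nothing
 * to clean up for this call.
 */
int create_tracked_dir(const char *path)
{
	int n;

	if (mkdir(path, 0777))
		return -1;
	n = snprintf(junk_dir, sizeof(junk_dir), "%s", path);
	if (n < 0 || (size_t)n >= sizeof(junk_dir)) {
		rmdir(path);
		errno = ENAMETOOLONG;
		return -1;
	}
	is_junk = 1;
	return 0;
}
```

In this sketch a failed mkdir() returns before any cleanup state is touched,
which is the property being relied on when the die() sits above the atexit()
registration.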

Thanks

Michal


* Re: [PATCH v3 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-20 17:16     ` Michal Suchánek
@ 2019-02-20 18:35       ` Eric Sunshine
  2019-02-21  9:27         ` Eric Sunshine
  2019-02-21 11:19         ` Michal Suchánek
  0 siblings, 2 replies; 27+ messages in thread
From: Eric Sunshine @ 2019-02-20 18:35 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Git List, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

On Wed, Feb 20, 2019 at 12:16 PM Michal Suchánek <msuchanek@suse.de> wrote:
> On Wed, 20 Feb 2019 11:55:46 -0500
> Eric Sunshine <sunshine@sunshineco.com> wrote:
> > On Wed, Feb 20, 2019 at 11:17 AM Michal Suchanek <msuchanek@suse.de> wrote:
> > > Apparently it can happen that stat() claims there is a commondir file but when
> > > trying to open the file it is missing.
> >
> > Under what circumstances?
>
> I would like to know that as well. The only command tested was worktree
> add which should not remove the file. Nonetheless running many woktree
> add commands in parallel can cause the file to go away for some of
> them.

You actually encountered this particular error message, correct? Was
that before or after you fixed the race in builtin/worktree.c itself
via patch 1/2? Did the reported 'errno' indicate that the file did not
exist or was it some other error?

> For many commands git calls itself recursively so there is
> probably much more going on than the single function that creates the
> worktree.

"git worktree add" is careful to invoke other Git commands only after
"commondir" exists, so it's not clear how this circumstance arises if
the file is indeed missing by the time the other Git command is run.

> > > Another even rarer issue is that the file might be zero size because another
> > > process initializing a worktree opened the file but has not written is content
> > > yet.
> >
> > Based upon the explanation thus far, I'm having trouble understanding
> > under what circumstances these race conditions can arise. Are you
> > trying to invoke Git commands in a particular worktree even as the
> > worktree itself is being created?
>
> It's explained in the following paragraph. If you have multiple
> worktrees some *other* worktreee may be uninitialized.

I understand that, but setup.c:get_common_dir_noenv() is concerned
only with _this_ worktree -- the one in which the Git command is being
run -- so it's not clear if or how some other partially-initialized
worktree could have any impact. (And, I'm having trouble fathoming how
it could, which is why I'm asking these questions).

Is it possible that when you saw that error message, it actually arose
from some code other than setup.c:get_common_dir_noenv()?

> > > -       if (file_exists(path.buf)) {
> > > -               if (strbuf_read_file(&data, path.buf, 0) <= 0)
> > > +       ret = strbuf_read_file(&data, path.buf, 0);
> > > +       if (ret <= 0) {
> > > +               /*
> > > +                * if file is missing or zero size (just being written)
> > > +                * assume default, bail otherwise
> > > +                */
> > > +               if (ret && errno != ENOENT && errno != ENOTDIR)
> > >                         die_errno(_("failed to read %s"), path.buf);
> >
> > It's not clear from the explanation given in the commit message if the
> > new behavior is indeed sensible. The original intent of the code, as I
> > understand it, is to validate "commondir", to ensure that it is not
> > somehow corrupt (such as the user editing it and making it empty).
>
> How is it validated in the code below when it is non-zero size?

Checking whether the file has content _is_ a form of validation, even
if not extensive validation.

> There is *no* validation whatsoever. Yet zero size is somehow totally
> unacceptable and requires that git working in *any* worktree aborts if
> commondir file in *any* worktree is zero size.

As noted above, it's not clear from the commit message how this case
can arise given that setup.c:get_common_dir_noenv() is presumably
concerned with and only consults _this_ worktree, so I'm having
trouble understanding how the state of other worktrees could impact
it.

> > Following this change, that particular validation no longer takes
> > place. But, more importantly, what does it mean to fall back to
> > "default" for this particular worktree? I'm having trouble
> > understanding how the new behavior can be correct or desirable. (Am I
> > missing something obvious?)
>
> If the file can be missing altogether and it is not an error how it is
> incorrect or undesirable to ignore zero size file?

Because the _presence_ of that file indicates a linked worktree,
whereas its absence indicates the main worktree. If the file is
present but empty, then that is an abnormal condition, i.e. some form
of corruption.

The difference is significant, and that's why I'm asking if the new
behavior is correct or desirable. If you start interpreting this
abnormal condition as a non-error, then get_common_dir_noenv() will be
reporting that this is the main worktree when in fact it is (a somehow
corrupted) linked worktree. Such false reporting could trigger
undesirable and outright wrong behavior in callers.
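
The three states distinguished above can be made explicit in a small sketch.
The enum and the function classify_gitdir are hypothetical names for
illustration; Git's get_common_dir_noenv() returns an int and does not use
this interface. Note that a single stat() is only a snapshot: an empty file
may also belong to a worktree that is still being created, which is the case
reported earlier in this thread.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

enum worktree_kind {
	WT_MAIN,	/* no commondir file: main worktree */
	WT_LINKED,	/* commondir file with content: linked worktree */
	WT_CORRUPT,	/* commondir file present but empty */
	WT_ERROR	/* could not tell; errno is set */
};

/* Classify a gitdir by the state of its "commondir" file. */
enum worktree_kind classify_gitdir(const char *gitdir)
{
	char path[4096];
	struct stat st;
	int n = snprintf(path, sizeof(path), "%s/commondir", gitdir);

	if (n < 0 || (size_t)n >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return WT_ERROR;
	}
	if (stat(path, &st))
		return (errno == ENOENT || errno == ENOTDIR) ?
			WT_MAIN : WT_ERROR;
	return st.st_size > 0 ? WT_LINKED : WT_CORRUPT;
}
```

Keeping the empty-file case separate from the missing-file case lets a caller
choose between aborting and skipping the entry, rather than silently reporting
a linked worktree as the main one.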


* Re: [PATCH v3 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-20 18:35       ` Eric Sunshine
@ 2019-02-21  9:27         ` Eric Sunshine
  2019-02-21 11:13           ` Michal Suchánek
  2019-02-21 11:19         ` Michal Suchánek
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Sunshine @ 2019-02-21  9:27 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Git List, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

On Wed, Feb 20, 2019 at 1:35 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Wed, Feb 20, 2019 at 12:16 PM Michal Suchánek <msuchanek@suse.de> wrote:
> > On Wed, 20 Feb 2019 11:55:46 -0500
> > Eric Sunshine <sunshine@sunshineco.com> wrote:
> > > On Wed, Feb 20, 2019 at 11:17 AM Michal Suchanek <msuchanek@suse.de> wrote:
> > > > Another even rarer issue is that the file might be zero size because another
> > > > process initializing a worktree opened the file but has not written is content
> > > > yet.
> > >
> > > Based upon the explanation thus far, I'm having trouble understanding
> > > under what circumstances these race conditions can arise. Are you
> > > trying to invoke Git commands in a particular worktree even as the
> > > worktree itself is being created?
> >
> > It's explained in the following paragraph. If you have multiple
> > worktrees some *other* worktree may be uninitialized.
>
> I understand that, but setup.c:get_common_dir_noenv() is concerned
> only with _this_ worktree -- the one in which the Git command is being
> run -- so it's not clear if or how some other partially-initialized
> worktree could have any impact. (And, I'm having trouble fathoming how
> it could, which is why I'm asking these questions).

I still can't see how setup.c:get_common_dir_noenv() could be
responsible for the behavior you're describing of _any_ Git command
erroring out due to _any_ worktree being incompletely-initialized.
However, I can imagine "git worktree add" itself being racy and
failing due to a missing or empty "commondir" file for some other
worktree since that command _does_ consult other worktree entries when
validating the "add" operation via
builtin/worktree.c:validate_worktree_add() which calls
get_worktrees(). If get_worktrees() is subject to that raciness
problem, then "git worktree add" will inherit that undesirable
raciness behavior (as will other "git worktree" commands which call
get_worktrees(), such as "git worktree list").

> Is it possible that when you saw that error message, it actually arose
> from some code other than setup.c:get_common_dir_noenv()?

So, I'm suspecting get_worktrees() or some function it calls (and so
on) as the racy culprit.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-18 17:04 ` [PATCH 2/2] setup: don't fail if commondir reference is deleted Michal Suchanek
  2019-02-18 21:00   ` Eric Sunshine
@ 2019-02-21 10:50   ` Duy Nguyen
  2019-02-21 13:50     ` Michal Suchánek
  1 sibling, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2019-02-21 10:50 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Git Mailing List, Eric Sunshine, Marketa Calabkova,
	Junio C Hamano

On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
>
> When adding worktrees git can die in get_common_dir_noenv while
> examining existing worktrees because the commondir file does not exist.
> Rather than testing if the file exists before reading it, handle ENOENT.

I don't think we could go around fixing every access to incomplete
worktrees like this. If this is because of racy 'worktree add', then
perhaps a better solution is to make it absolutely clear it's not ready
for anybody to access.

For example, we can suffix the worktree directory name with ".lock"
and make sure get_worktrees() ignores entries ending with ".lock".
That should protect other commands while 'worktree add' is still
running. Only when the worktree is complete should 'worktree add'
rename the directory to lose ".lock" and run external commands like
git-checkout to populate the worktree.

> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
> v2:
> - do not test file existence first, just read it and handle ENOENT.
> - handle zero size file correctly
> ---
>  setup.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/setup.c b/setup.c
> index ca9e8a949ed8..dd865f280d34 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -270,12 +270,20 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
>  {
>         struct strbuf data = STRBUF_INIT;
>         struct strbuf path = STRBUF_INIT;
> -       int ret = 0;
> +       int ret;
>
>         strbuf_addf(&path, "%s/commondir", gitdir);
> -       if (file_exists(path.buf)) {
> -               if (strbuf_read_file(&data, path.buf, 0) <= 0)
> +       ret = strbuf_read_file(&data, path.buf, 0);
> +       if (ret <= 0) {
> +               /*
> +                * if file is missing or zero size (just being written)
> +                * assume default, bail otherwise
> +                */
> +               if (ret && errno != ENOENT)
>                         die_errno(_("failed to read %s"), path.buf);
> +               strbuf_addstr(sb, gitdir);
> +               ret = 0;
> +       } else {
>                 while (data.len && (data.buf[data.len - 1] == '\n' ||
>                                     data.buf[data.len - 1] == '\r'))
>                         data.len--;
> @@ -286,8 +294,6 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
>                 strbuf_addbuf(&path, &data);
>                 strbuf_add_real_path(sb, path.buf);
>                 ret = 1;
> -       } else {
> -               strbuf_addstr(sb, gitdir);
>         }
>
>         strbuf_release(&data);
> --
> 2.20.1
>


-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-21  9:27         ` Eric Sunshine
@ 2019-02-21 11:13           ` Michal Suchánek
  0 siblings, 0 replies; 27+ messages in thread
From: Michal Suchánek @ 2019-02-21 11:13 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

On Thu, 21 Feb 2019 04:27:21 -0500
Eric Sunshine <sunshine@sunshineco.com> wrote:

> On Wed, Feb 20, 2019 at 1:35 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > On Wed, Feb 20, 2019 at 12:16 PM Michal Suchánek <msuchanek@suse.de> wrote:  
> > > On Wed, 20 Feb 2019 11:55:46 -0500
> > > Eric Sunshine <sunshine@sunshineco.com> wrote:  
> > > > On Wed, Feb 20, 2019 at 11:17 AM Michal Suchanek <msuchanek@suse.de> wrote:  
> > > > > Another even rarer issue is that the file might be zero size because another
> > > > > process initializing a worktree opened the file but has not written its content
> > > > > yet.  
> > > >
> > > > Based upon the explanation thus far, I'm having trouble understanding
> > > > under what circumstances these race conditions can arise. Are you
> > > > trying to invoke Git commands in a particular worktree even as the
> > > > worktree itself is being created?  
> > >
> > > It's explained in the following paragraph. If you have multiple
> > > > worktrees some *other* worktree may be uninitialized.  
> >
> > I understand that, but setup.c:get_common_dir_noenv() is concerned
> > only with _this_ worktree -- the one in which the Git command is being
> > run -- so it's not clear if or how some other partially-initialized
> > worktree could have any impact. (And, I'm having trouble fathoming how
> > it could, which is why I'm asking these questions).  
> 
> I still can't see how setup.c:get_common_dir_noenv() could be
> responsible for the behavior you're describing of _any_ Git command
> erroring out due to _any_ worktree being incompletely-initialized.
> However, I can imagine "git worktree add" itself being racy and
> failing due to a missing or empty "commondir" file for some other
> worktree since that command _does_ consult other worktree entries when
> validating the "add" operation via
> builtin/worktree.c:validate_worktree_add() which calls
> get_worktrees(). If get_worktrees() is subject to that raciness
> problem, then "git worktree add" will inherit that undesirable
> raciness behavior (as will other "git worktree" commands which call
> get_worktrees(), such as "git worktree list").
> 
> > Is it possible that when you saw that error message, it actually arose
> > from some code other than setup.c:get_common_dir_noenv()?  
> 
> So, I'm suspecting get_worktrees() or some function it calls (and so
> on) as the racy culprit.

Yes, that's my explanation for the situation as well.

Thanks

Michal

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-20 18:35       ` Eric Sunshine
  2019-02-21  9:27         ` Eric Sunshine
@ 2019-02-21 11:19         ` Michal Suchánek
  1 sibling, 0 replies; 27+ messages in thread
From: Michal Suchánek @ 2019-02-21 11:19 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Marketa Calabkova,
	Nguyễn Thái Ngọc Duy, Junio C Hamano

On Wed, 20 Feb 2019 13:35:57 -0500
Eric Sunshine <sunshine@sunshineco.com> wrote:

> On Wed, Feb 20, 2019 at 12:16 PM Michal Suchánek <msuchanek@suse.de> wrote:
> > On Wed, 20 Feb 2019 11:55:46 -0500
> > Eric Sunshine <sunshine@sunshineco.com> wrote:  

> > > Following this change, that particular validation no longer takes
> > > place. But, more importantly, what does it mean to fall back to
> > > "default" for this particular worktree? I'm having trouble
> > > understanding how the new behavior can be correct or desirable. (Am I
> > > missing something obvious?)  
> >
> > If the file can be missing altogether and it is not an error how it is
> > incorrect or undesirable to ignore zero size file?  
> 
> Because the _presence_ of that file indicates a linked worktree,
> whereas its absence indicates the main worktree. If the file is
> present but empty, then that is an abnormal condition, i.e. some form
> of corruption.
> 
> The difference is significant, and that's why I'm asking if the new
> behavior is correct or desirable. If you start interpreting this
> abnormal condition as a non-error, then get_common_dir_noenv() will be
> reporting that this is the main worktree when in fact it is (a somehow
> corrupted) linked worktree. Such false reporting could trigger
> undesirable and outright wrong behavior in callers.

This is not an issue introduced with this patch, however. The worktree
is not initialized atomically. First the worktree directory is created
and then it is populated with content including the commondir reference.

Because there is no big repository lock that everyone takes to access
a repository, other running git processes can see the worktree without
the commondir file.

The way this is mitigated in users of get_worktrees() is an assumption
that the first worktree is the main worktree.

Whether this is sufficient is not something this patchset aims to address.
It merely addresses get_worktrees() aborting due to hitting a specific
stage in the initialization of a worktree.

Thanks

Michal

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-21 10:50   ` Duy Nguyen
@ 2019-02-21 13:50     ` Michal Suchánek
  2019-02-21 17:07       ` Phillip Wood
  2019-02-22  9:26       ` Duy Nguyen
  0 siblings, 2 replies; 27+ messages in thread
From: Michal Suchánek @ 2019-02-21 13:50 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Eric Sunshine, Marketa Calabkova,
	Junio C Hamano

On Thu, 21 Feb 2019 17:50:38 +0700
Duy Nguyen <pclouds@gmail.com> wrote:

> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
> >
> > When adding worktrees git can die in get_common_dir_noenv while
> > examining existing worktrees because the commondir file does not exist.
> > Rather than testing if the file exists before reading it handle ENOENT.  
> 
> I don't think we could go around fixing every access to incomplete
> worktrees like this. If this is because of racy 'worktree add', then
> perhaps a better solution is make it absolutely clear it's not ready
> for anybody to access.
> 
> For example, we can suffix the worktree directory name with ".lock"
> and make sure get_worktrees() ignores entries ending with ".lock".
> That should protect other commands while 'worktree add' is still
> running. Only when the worktree is complete that 'worktree add' should
> rename the directory to lose ".lock" and run external commands like
> git-checkout to populate the worktree.

The problem is we don't forbid worktree names ending with ".lock".
Which means that if we start to forbid them now existing worktrees
might become inaccessible.

Thanks

Michal

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-21 13:50     ` Michal Suchánek
@ 2019-02-21 17:07       ` Phillip Wood
  2019-02-21 17:12         ` Eric Sunshine
  2019-02-22  9:32         ` Duy Nguyen
  2019-02-22  9:26       ` Duy Nguyen
  1 sibling, 2 replies; 27+ messages in thread
From: Phillip Wood @ 2019-02-21 17:07 UTC (permalink / raw)
  To: Michal Suchánek, Duy Nguyen
  Cc: Git Mailing List, Eric Sunshine, Marketa Calabkova,
	Junio C Hamano

Hi Michal/Duy

On 21/02/2019 13:50, Michal Suchánek wrote:
> On Thu, 21 Feb 2019 17:50:38 +0700
> Duy Nguyen <pclouds@gmail.com> wrote:
> 
>> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
>>>
>>> When adding worktrees git can die in get_common_dir_noenv while
>>> examining existing worktrees because the commondir file does not exist.
>>> Rather than testing if the file exists before reading it handle ENOENT.
>>
>> I don't think we could go around fixing every access to incomplete
>> worktrees like this. If this is because of racy 'worktree add', then
>> perhaps a better solution is make it absolutely clear it's not ready
>> for anybody to access.
>>
>> For example, we can suffix the worktree directory name with ".lock"
>> and make sure get_worktrees() ignores entries ending with ".lock".
>> That should protect other commands while 'worktree add' is still
>> running. Only when the worktree is complete that 'worktree add' should
>> rename the directory to lose ".lock" and run external commands like
>> git-checkout to populate the worktree.
> 
> The problem is we don't forbid worktree names ending with ".lock".
> Which means that if we start to forbid them now existing worktrees
> might become inaccessible.

I think it is also racy as the renaming breaks the use of mkdir erroring 
out if the directory already exists. One solution is to have a lock 
entry in $GIT_COMMON_DIR/worktree-locks and make sure the code that 
iterates over the entries in $GIT_COMMON_DIR/worktrees skips any that 
have a corresponding entry in $GIT_COMMON_DIR/worktree-locks. If the 
worktree-locks/<dir> is created before worktrees/<dir> then it should be 
race free (you will have to remove the lock if the real entry cannot be 
created and then increment the counter and try again). Entries could 
also be locked on removal to prevent a race there.
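
A rough sketch of the scheme described above, assuming plain POSIX
mkdir()/rmdir(). The names reserve_worktree_entry(), entry_is_locked()
and build_path() are hypothetical and do not exist in Git, and the retry
limit of 1000 is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "<common>/<subdir>/<name>"; returns -1 if it does not fit. */
static int build_path(char *buf, size_t len, const char *common,
		      const char *subdir, const char *name)
{
	int n = snprintf(buf, len, "%s/%s/%s", common, subdir, name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Readers would skip any worktrees/<name> for which this returns 1. */
static int entry_is_locked(const char *common, const char *name)
{
	char lock[4096];
	struct stat st;

	if (build_path(lock, sizeof(lock), common, "worktree-locks", name))
		return 0;
	return stat(lock, &st) == 0;
}

/*
 * Take worktree-locks/<name> first, then create worktrees/<name>.
 * On success returns 0 with the chosen name in 'out'; the lock is
 * still held and the caller removes it once the entry is complete.
 */
static int reserve_worktree_entry(const char *common, const char *base,
				  char *out, size_t outlen)
{
	char lock[4096], entry[4096];
	unsigned int counter;

	assert(common && base && out);
	for (counter = 0; counter < 1000; counter++) {
		int n, saved;

		if (counter)
			n = snprintf(out, outlen, "%s%u", base, counter);
		else
			n = snprintf(out, outlen, "%s", base);
		if (n < 0 || (size_t)n >= outlen ||
		    build_path(lock, sizeof(lock), common, "worktree-locks", out) ||
		    build_path(entry, sizeof(entry), common, "worktrees", out)) {
			errno = ENAMETOOLONG;
			return -1;
		}
		if (mkdir(lock, 0777)) {
			if (errno == EEXIST)
				continue; /* another process holds this name */
			return -1;
		}
		if (!mkdir(entry, 0777))
			return 0; /* entry created while the lock is held */
		saved = errno;
		rmdir(lock); /* could not create the entry: drop the lock */
		if (saved != EEXIST) {
			errno = saved;
			return -1;
		}
	}
	errno = EEXIST;
	return -1;
}
```

Each name has its own lock directory, so two processes only contend
when they pick the same name.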

Best Wishes

Phillip

> Thanks
> 
> Michal
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-21 17:07       ` Phillip Wood
@ 2019-02-21 17:12         ` Eric Sunshine
  2019-02-21 17:27           ` Phillip Wood
  2019-02-21 17:33           ` Michal Suchánek
  2019-02-22  9:32         ` Duy Nguyen
  1 sibling, 2 replies; 27+ messages in thread
From: Eric Sunshine @ 2019-02-21 17:12 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Michal Suchánek, Duy Nguyen, Git Mailing List,
	Marketa Calabkova, Junio C Hamano

On Thu, Feb 21, 2019 at 12:07 PM Phillip Wood <phillip.wood@talktalk.net> wrote:
> On 21/02/2019 13:50, Michal Suchánek wrote:
> >> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
> > The problem is we don't forbid worktree names ending with ".lock".
> > Which means that if we start to forbid them now existing worktrees
> > might become inaccessible.
>
> I think it is also racy as the renaming breaks the use of mkdir erroring
> out if the directory already exists. One solution is to have a lock
> entry in $GIT_COMMON_DIR/worktree-locks and make sure the code that
> iterates over the entries in $GIT_COMMON_DIR/worktrees skips any that
> have a corresponding entry in $GIT_COMMON_DIR/worktree-locks. If the
> worktree-locks/<dir> is created before worktrees/<dir> then it should be
> race free (you will have to remove the lock if the real entry cannot be
> created and then increment the counter and try again). Entries could
> also be locked on removal to prevent a race there.

I wonder, though, how much this helps or hinders the use-case which
prompted this patch series in the first place; to wit, creating
hundreds or thousands of worktrees. Doing so serially was too slow, so
the many "git worktree add" invocations were instead run in parallel
(which led to "discovery" of race conditions). Using a global worktree
lock would serialize worktree creation, thus slowing it down once
again.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-21 17:12         ` Eric Sunshine
@ 2019-02-21 17:27           ` Phillip Wood
  2019-03-04 13:30             ` Michal Suchánek
  2019-02-21 17:33           ` Michal Suchánek
  1 sibling, 1 reply; 27+ messages in thread
From: Phillip Wood @ 2019-02-21 17:27 UTC (permalink / raw)
  To: Eric Sunshine, Phillip Wood
  Cc: Michal Suchánek, Duy Nguyen, Git Mailing List,
	Marketa Calabkova, Junio C Hamano

Hi Eric

On 21/02/2019 17:12, Eric Sunshine wrote:
> On Thu, Feb 21, 2019 at 12:07 PM Phillip Wood <phillip.wood@talktalk.net> wrote:
>> On 21/02/2019 13:50, Michal Suchánek wrote:
>>>> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
>>> The problem is we don't forbid worktree names ending with ".lock".
>>> Which means that if we start to forbid them now existing worktrees
>>> might become inaccessible.
>>
>> I think it is also racy as the renaming breaks the use of mkdir erroring
>> out if the directory already exists. One solution is to have a lock
>> entry in $GIT_COMMON_DIR/worktree-locks and make sure the code that
>> iterates over the entries in $GIT_COMMON_DIR/worktrees skips any that
>> have a corresponding entry in $GIT_COMMON_DIR/worktree-locks. If the
>> worktree-locks/<dir> is created before worktrees/<dir> then it should be
>> race free (you will have to remove the lock if the real entry cannot be
>> created and then increment the counter and try again). Entries could
>> also be locked on removal to prevent a race there.
> 
> I wonder, though, how much this helps or hinders the use-case which
> prompted this patch series in the first place; to wit, creating
> hundreds or thousands of worktrees. Doing so serially was too slow, so
> the many "git worktree add" invocations were instead run in parallel
> (which led to "discovery" of race conditions). Using a global worktree
> lock would serialize worktree creation, thus slowing it down once
> again.

The idea is that there is a per-worktree lock stored under worktree-locks 
(hence the plural name). Using a separate directory for the locks gets 
round the problems of name clashes between the lock for a worktree 
called foo and one called foo.lock and means we can rely on mkdir 
erroring out if the worktree name already exists as there is no renaming.

Best Wishes

Phillip


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-21 17:12         ` Eric Sunshine
  2019-02-21 17:27           ` Phillip Wood
@ 2019-02-21 17:33           ` Michal Suchánek
  1 sibling, 0 replies; 27+ messages in thread
From: Michal Suchánek @ 2019-02-21 17:33 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Phillip Wood, Duy Nguyen, Git Mailing List, Marketa Calabkova,
	Junio C Hamano

On Thu, 21 Feb 2019 12:12:28 -0500
Eric Sunshine <sunshine@sunshineco.com> wrote:

> On Thu, Feb 21, 2019 at 12:07 PM Phillip Wood <phillip.wood@talktalk.net> wrote:
> > On 21/02/2019 13:50, Michal Suchánek wrote:  
> > >> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:  
> > > The problem is we don't forbid worktree names ending with ".lock".
> > > Which means that if we start to forbid them now existing worktrees
> > > might become inaccessible.  
> >
> > I think it is also racy as the renaming breaks the use of mkdir erroring
> > out if the directory already exists. One solution is to have a lock
> > entry in $GIT_COMMON_DIR/worktree-locks and make sure the code that
> > iterates over the entries in $GIT_COMMON_DIR/worktrees skips any that
> > have a corresponding entry in $GIT_COMMON_DIR/worktree-locks. If the
> > worktree-locks/<dir> is created before worktrees/<dir> then it should be
> > race free (you will have to remove the lock if the real entry cannot be
> > created and then increment the counter and try again). Entries could
> > also be locked on removal to prevent a race there.  
> 
> I wonder, though, how much this helps or hinders the use-case which
> prompted this patch series in the first place; to wit, creating
> hundreds or thousands of worktrees. Doing so serially was too slow, so
> the many "git worktree add" invocations were instead run in parallel
> (which led to "discovery" of race conditions). Using a global worktree
> lock would serialize worktree creation, thus slowing it down once
> again.

I created thousands of worktrees only for stress-testing. The real
workload needs only a dozen of them. That still leads to hitting a
race condition occasionally and automation failure.

Creating a separate lock directory will probably work. The question is
when do you need to take the lock. Before adding a worktree, sure.
Before deleting it as well. The problem is that deleting a worktree
successfully without creating some broken state needs to exclude
processes that might add stuff in the worktree directory. How many
operations then do *not* need to take the lock?

Thanks

Michal

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-21 13:50     ` Michal Suchánek
  2019-02-21 17:07       ` Phillip Wood
@ 2019-02-22  9:26       ` Duy Nguyen
  1 sibling, 0 replies; 27+ messages in thread
From: Duy Nguyen @ 2019-02-22  9:26 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Git Mailing List, Eric Sunshine, Marketa Calabkova,
	Junio C Hamano

On Thu, Feb 21, 2019 at 8:50 PM Michal Suchánek <msuchanek@suse.de> wrote:
>
> On Thu, 21 Feb 2019 17:50:38 +0700
> Duy Nguyen <pclouds@gmail.com> wrote:
>
> > On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
> > >
> > > When adding worktrees git can die in get_common_dir_noenv while
> > > examining existing worktrees because the commondir file does not exist.
> > > Rather than testing if the file exists before reading it handle ENOENT.
> >
> > I don't think we could go around fixing every access to incomplete
> > worktrees like this. If this is because of racy 'worktree add', then
> > perhaps a better solution is make it absolutely clear it's not ready
> > for anybody to access.
> >
> > For example, we can suffix the worktree directory name with ".lock"
> > and make sure get_worktrees() ignores entries ending with ".lock".
> > That should protect other commands while 'worktree add' is still
> > running. Only when the worktree is complete that 'worktree add' should
> > rename the directory to lose ".lock" and run external commands like
> > git-checkout to populate the worktree.
>
> The problem is we don't forbid worktree names ending with ".lock".
> Which means that if we start to forbid them now existing worktrees
> might become inaccessible.

Worktrees ending with .lock will not work well now anyway. While [1]
reports the problem with worktree names having a whitespace, ".lock"
is in the same class (not a valid refname) and will result in the same
error. So if you have "*.lock" worktrees now you're already in
trouble.

[1] https://public-inbox.org/git/1550673274.30738.0@yandex.ru/T/#m9d86e0a388fd4961bc102c2c69e8bc3b2db07a42
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-21 17:07       ` Phillip Wood
  2019-02-21 17:12         ` Eric Sunshine
@ 2019-02-22  9:32         ` Duy Nguyen
  2019-02-22 10:20           ` Phillip Wood
  1 sibling, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2019-02-22  9:32 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Michal Suchánek, Git Mailing List, Eric Sunshine,
	Marketa Calabkova, Junio C Hamano

On Fri, Feb 22, 2019 at 12:07 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> Hi Michal/Duy
>
> On 21/02/2019 13:50, Michal Suchánek wrote:
> > On Thu, 21 Feb 2019 17:50:38 +0700
> > Duy Nguyen <pclouds@gmail.com> wrote:
> >
> >> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
> >>>
> >>> When adding worktrees git can die in get_common_dir_noenv while
> >>> examining existing worktrees because the commondir file does not exist.
> >>> Rather than testing if the file exists before reading it handle ENOENT.
> >>
> >> I don't think we could go around fixing every access to incomplete
> >> worktrees like this. If this is because of racy 'worktree add', then
> >> perhaps a better solution is make it absolutely clear it's not ready
> >> for anybody to access.
> >>
> >> For example, we can suffix the worktree directory name with ".lock"
> >> and make sure get_worktrees() ignores entries ending with ".lock".
> >> That should protect other commands while 'worktree add' is still
> >> running. Only when the worktree is complete that 'worktree add' should
> >> rename the directory to lose ".lock" and run external commands like
> >> git-checkout to populate the worktree.
> >
> > The problem is we don't forbid worktree names ending with ".lock".
> > Which means that if we start to forbid them now existing worktrees
> > might become inaccessible.
>
> I think it is also racy as the renaming breaks the use of mkdir erroring
> out if the directory already exists.

You mean the part where we see "fred" exists and decide to try the
name "fred1" instead (i.e. patch 1/2)?

I don't think it's the problem if that's the case. We mkdir
"fred.lock" _then_ check if "fred" exists. If it does, remove
fred.lock and move on to fred1.lock. Then we rename fred1.lock to
fred1 and error out if rename fails.
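
Roughly, in standalone C (a sketch only, not the actual builtin/worktree.c
code; create_worktree_entry() is a hypothetical name, the retry limit is
arbitrary, and the step that fills in the entry is left as a comment):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * mkdir "<name>.lock", check that "<name>" is free, then rename the
 * lock directory to "<name>". Returns 0 with the chosen name in 'out'.
 */
static int create_worktree_entry(const char *dir, const char *base,
				 char *out, size_t outlen)
{
	char tmp[4096], final[4096];
	struct stat st;
	unsigned int counter;

	assert(dir && base && out);
	for (counter = 0; counter < 1000; counter++) {
		int n, m, k, saved;

		if (counter)
			n = snprintf(out, outlen, "%s%u", base, counter);
		else
			n = snprintf(out, outlen, "%s", base);
		m = snprintf(tmp, sizeof(tmp), "%s/%s.lock", dir, out);
		k = snprintf(final, sizeof(final), "%s/%s", dir, out);
		if (n < 0 || (size_t)n >= outlen ||
		    m < 0 || (size_t)m >= sizeof(tmp) ||
		    k < 0 || (size_t)k >= sizeof(final)) {
			errno = ENAMETOOLONG;
			return -1;
		}
		if (mkdir(tmp, 0777)) {
			if (errno == EEXIST)
				continue; /* someone else is creating this name */
			return -1;
		}
		if (!stat(final, &st)) {
			rmdir(tmp); /* "<name>" is taken: try the next one */
			continue;
		}
		if (errno != ENOENT) {
			saved = errno;
			rmdir(tmp);
			errno = saved;
			return -1;
		}
		/* ... write commondir, gitdir, HEAD etc. into tmp here ... */
		if (rename(tmp, final)) {
			saved = errno;
			rmdir(tmp);
			errno = saved;
			return -1;
		}
		return 0;
	}
	errno = EEXIST;
	return -1;
}
```

Note that POSIX rename() will replace an existing empty directory, so it
is the mkdir of "<name>.lock" that provides the exclusion here, not the
rename itself.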
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-22  9:32         ` Duy Nguyen
@ 2019-02-22 10:20           ` Phillip Wood
  0 siblings, 0 replies; 27+ messages in thread
From: Phillip Wood @ 2019-02-22 10:20 UTC (permalink / raw)
  To: Duy Nguyen, Phillip Wood
  Cc: Michal Suchánek, Git Mailing List, Eric Sunshine,
	Marketa Calabkova, Junio C Hamano

Hi Duy

On 22/02/2019 09:32, Duy Nguyen wrote:
> On Fri, Feb 22, 2019 at 12:07 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>>
>> Hi Michal/Duy
>>
>> On 21/02/2019 13:50, Michal Suchánek wrote:
>>> On Thu, 21 Feb 2019 17:50:38 +0700
>>> Duy Nguyen <pclouds@gmail.com> wrote:
>>>
>>>> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:
>>>>>
>>>>> When adding worktrees git can die in get_common_dir_noenv while
>>>>> examining existing worktrees because the commondir file does not exist.
>>>>> Rather than testing if the file exists before reading it handle ENOENT.
>>>>
>>>> I don't think we could go around fixing every access to incomplete
>>>> worktrees like this. If this is because of racy 'worktree add', then
>>>> perhaps a better solution is make it absolutely clear it's not ready
>>>> for anybody to access.
>>>>
>>>> For example, we can suffix the worktree directory name with ".lock"
>>>> and make sure get_worktrees() ignores entries ending with ".lock".
>>>> That should protect other commands while 'worktree add' is still
>>>> running. Only when the worktree is complete that 'worktree add' should
>>>> rename the directory to lose ".lock" and run external commands like
>>>> git-checkout to populate the worktree.
>>>
>>> The problem is we don't forbid worktree names ending with ".lock".
>>> Which means that if we start to forbid them now existing worktrees
>>> might become inaccessible.
>>
>> I think it is also racy as the renaming breaks the use of mkdir erroring
>> out if the directory already exists.
> 
> You mean the part where we see "fred" exists and decide to try the
> name "fred1" instead (i.e. patch 1/2)?
> 
> I don't think it's the problem if that's the case. We mkdir
> "fred.lock" _then_ check if "fred" exists. If it does, remove
> fred.lock and move on to fred1.lock. Then we rename fred1.lock to
> fred1 and error out if rename fails.

Ah you're right, if another process tries to create fred.lock as we're 
renaming it either their mkdir fred.lock will fail or they'll see fred 
once they've made fred.lock

Sorry for the confusion

Phillip

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] setup: don't fail if commondir reference is deleted.
  2019-02-21 17:27           ` Phillip Wood
@ 2019-03-04 13:30             ` Michal Suchánek
  0 siblings, 0 replies; 27+ messages in thread
From: Michal Suchánek @ 2019-03-04 13:30 UTC (permalink / raw)
  To: Phillip Wood
  Cc: phillip.wood, Eric Sunshine, Duy Nguyen, Git Mailing List,
	Marketa Calabkova, Junio C Hamano

Hello,

On Thu, 21 Feb 2019 17:27:04 +0000
Phillip Wood <phillip.wood@talktalk.net> wrote:

> Hi Eric
> 
> On 21/02/2019 17:12, Eric Sunshine wrote:
> > On Thu, Feb 21, 2019 at 12:07 PM Phillip Wood <phillip.wood@talktalk.net> wrote:  
> >> On 21/02/2019 13:50, Michal Suchánek wrote:  
> >>>> On Tue, Feb 19, 2019 at 12:05 AM Michal Suchanek <msuchanek@suse.de> wrote:  
> >>> The problem is we don't forbid worktree names ending with ".lock".
> >>> Which means that if we start to forbid them now existing worktrees
> >>> might become inaccessible.  
> >>
> >> I think it is also racy as the renaming breaks the use of mkdir erroring
> >> out if the directory already exists. One solution is to have a lock
> >> entry in $GIT_COMMON_DIR/worktree-locks and make sure the code that
> >> iterates over the entries in $GIT_COMMON_DIR/worktrees skips any that
> >> have a corresponding entry in $GIT_COMMON_DIR/worktree-locks. If the
> >> worktree-locks/<dir> is created before worktrees/<dir> then it should be
> >> race free (you will have to remove the lock if the real entry cannot be
> >> created and then increment the counter and try again). Entries could
> >> also be locked on removal to prevent a race there.  
> > 
> > I wonder, though, how much this helps or hinders the use-case which
> > prompted this patch series in the first place; to wit, creating
> > hundreds or thousands of worktrees. Doing so serially was too slow, so
> > the many "git worktree add" invocations were instead run in parallel
> > (which led to "discovery" of race conditions). Using a global worktree
> > lock would serialize worktree creation, thus slowing it down once
> > again.  
> 
> The idea is that there is a per-worktree lock stored under worktree-locks 
> (hence the plural name). Using a separate directory for the locks gets 
> round the problems of name clashes between the lock for a worktree 
> called foo and one called foo.lock and means we can rely on mkdir 
> erroring out if the worktree name already exists as there is no renaming.

I suppose this separate directory would work. When are you supposed to
take the lock, though?

When adding a worktree, sure.

When managing worktrees, sure. Otherwise you would see the incomplete
worktrees.

When doing anything in git? Probably. Because otherwise you could
accidentally use the incomplete worktree. Or somebody deleting a worktree
would fail to remove it because you would keep adding files to it.

Isn't git supposed to allow parallel access to the repository?

As things stand if you wanted to implement worktree locking you would
need to lock the worktree for *every* operation that touches it, and
for many operations you would have to lock/unlock *all* worktrees one by
one to find the worktree you are supposed to work on.

I don't feel like adding locking to all of git to fix this problem.

Sure, adding enough locking to ensure repository consistency at all
times would be nice but it also needs to be granular enough to not harm
performance. I can't say I understand the git repository layout and
usage well enough to design that.

Thanks

Michal

* Re: [PATCH v3 1/2] worktree: fix worktree add race.
  2019-02-20 16:16 ` [PATCH v3 1/2] worktree: fix worktree add race Michal Suchanek
  2019-02-20 16:34   ` Eric Sunshine
@ 2019-03-08  9:20   ` Duy Nguyen
  2019-03-08  9:37     ` Eric Sunshine
  2019-03-11  1:55     ` Junio C Hamano
  1 sibling, 2 replies; 27+ messages in thread
From: Duy Nguyen @ 2019-03-08  9:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Michal Suchanek, Eric Sunshine,
	Marketa Calabkova

Junio, it seems 2/2 is stuck in an endless discussion. But 1/2 is good
regardless, maybe pick it up now and let 2/2 come later whenever it's
ready?

On Wed, Feb 20, 2019 at 11:16 PM Michal Suchanek <msuchanek@suse.de> wrote:
>
> Git runs a stat loop to find a worktree name that's available and then does
> mkdir on the found name. Turn it into a mkdir loop to avoid another invocation
> of worktree add finding the same free name and creating the directory first.
>
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
> v2:
> - simplify loop exit condition
> - exit early if the mkdir fails for reason other than already present
> worktree
> - make counter unsigned
> ---
>  builtin/worktree.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 3f9907fcc994..85a604cfe98c 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -268,10 +268,10 @@ static int add_worktree(const char *path, const char *refname,
>         struct strbuf sb_git = STRBUF_INIT, sb_repo = STRBUF_INIT;
>         struct strbuf sb = STRBUF_INIT;
>         const char *name;
> -       struct stat st;
>         struct child_process cp = CHILD_PROCESS_INIT;
>         struct argv_array child_env = ARGV_ARRAY_INIT;
> -       int counter = 0, len, ret;
> +       unsigned int counter = 0;
> +       int len, ret;
>         struct strbuf symref = STRBUF_INIT;
>         struct commit *commit = NULL;
>         int is_branch = 0;
> @@ -295,8 +295,12 @@ static int add_worktree(const char *path, const char *refname,
>         if (safe_create_leading_directories_const(sb_repo.buf))
>                 die_errno(_("could not create leading directories of '%s'"),
>                           sb_repo.buf);
> -       while (!stat(sb_repo.buf, &st)) {
> +
> +       while (mkdir(sb_repo.buf, 0777)) {
>                 counter++;
> +               if ((errno != EEXIST) || !counter /* overflow */)
> +                       die_errno(_("could not create directory of '%s'"),
> +                                 sb_repo.buf);
>                 strbuf_setlen(&sb_repo, len);
>                 strbuf_addf(&sb_repo, "%d", counter);
>         }
> @@ -306,8 +310,6 @@ static int add_worktree(const char *path, const char *refname,
>         atexit(remove_junk);
>         sigchain_push_common(remove_junk_on_signal);
>
> -       if (mkdir(sb_repo.buf, 0777))
> -               die_errno(_("could not create directory of '%s'"), sb_repo.buf);
>         junk_git_dir = xstrdup(sb_repo.buf);
>         is_junk = 1;
>
> --
> 2.20.1
>
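
The difference between the two loops in the patch above can be shown in
isolation. The following is a minimal sketch, not the code in
builtin/worktree.c: claim_unique_dir() is a hypothetical helper that uses a
plain char buffer in place of git's strbuf and returns an error instead of
calling die_errno(). With a stat() loop, two processes can both see the same
name as free before either creates it; with a mkdir() loop, the check and the
creation are a single step, so only one process can win a given name.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of git): claim a fresh directory named
 * `base`, or `base1`, `base2`, ... if the earlier names are taken.
 * On success the claimed path is left in `out` and 0 is returned.
 * On failure -1 is returned with errno set.
 */
static int claim_unique_dir(const char *base, char *out, size_t outlen)
{
	unsigned int counter = 0;
	int n = snprintf(out, outlen, "%s", base);

	if (n < 0 || (size_t)n >= outlen) {
		errno = ENAMETOOLONG;
		return -1;
	}
	/* mkdir() tests for the name and claims it in one atomic step. */
	while (mkdir(out, 0777)) {
		if (errno != EEXIST)
			return -1;	/* real error, e.g. EACCES or ENOENT */
		if (!++counter) {	/* counter wrapped around: give up */
			errno = EEXIST;
			return -1;
		}
		n = snprintf(out, outlen, "%s%u", base, counter);
		if (n < 0 || (size_t)n >= outlen) {
			errno = ENAMETOOLONG;
			return -1;
		}
	}
	return 0;
}
```

Calling claim_unique_dir("foo", buf, sizeof(buf)) twice in an empty directory
would leave "foo" in the buffer the first time and "foo1" the second, whether
the two calls come from one process or from two racing ones.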


-- 
Duy

* Re: [PATCH v3 1/2] worktree: fix worktree add race.
  2019-03-08  9:20   ` Duy Nguyen
@ 2019-03-08  9:37     ` Eric Sunshine
  2019-03-11  1:55     ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Sunshine @ 2019-03-08  9:37 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Junio C Hamano, Git Mailing List, Michal Suchanek,
	Marketa Calabkova

On Fri, Mar 8, 2019 at 4:20 AM Duy Nguyen <pclouds@gmail.com> wrote:
> Junio, it seems 2/2 is stuck in an endless discussion. But 1/2 is good
> regardless, maybe pick it up now and let 2/2 come later whenever it's
> ready?

Yep, 1/2 seems a good idea and has not been controversial. It may not
solve all the race conditions, but it is a good step forward.

* Re: [PATCH v3 1/2] worktree: fix worktree add race.
  2019-03-08  9:20   ` Duy Nguyen
  2019-03-08  9:37     ` Eric Sunshine
@ 2019-03-11  1:55     ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2019-03-11  1:55 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Michal Suchanek, Eric Sunshine,
	Marketa Calabkova

Duy Nguyen <pclouds@gmail.com> writes:

> Junio, it seems 2/2 is stuck in an endless discussion. But 1/2 is good
> regardless, maybe pick it up now and let 2/2 come later whenever it's
> ready?

Thanks for poking, and I think it is a good idea.

Thread overview: 27+ messages
2019-02-18 17:04 [PATCH 0/2] worktree add race fix Michal Suchanek
2019-02-18 17:04 ` [PATCH 1/2] worktree: fix worktree add race Michal Suchanek
2019-02-18 17:04 ` [PATCH 2/2] setup: don't fail if commondir reference is deleted Michal Suchanek
2019-02-18 21:00   ` Eric Sunshine
2019-02-21 10:50   ` Duy Nguyen
2019-02-21 13:50     ` Michal Suchánek
2019-02-21 17:07       ` Phillip Wood
2019-02-21 17:12         ` Eric Sunshine
2019-02-21 17:27           ` Phillip Wood
2019-03-04 13:30             ` Michal Suchánek
2019-02-21 17:33           ` Michal Suchánek
2019-02-22  9:32         ` Duy Nguyen
2019-02-22 10:20           ` Phillip Wood
2019-02-22  9:26       ` Duy Nguyen
2019-02-20 16:16 ` [PATCH v3 1/2] worktree: fix worktree add race Michal Suchanek
2019-02-20 16:34   ` Eric Sunshine
2019-02-20 17:29     ` Michal Suchánek
2019-03-08  9:20   ` Duy Nguyen
2019-03-08  9:37     ` Eric Sunshine
2019-03-11  1:55     ` Junio C Hamano
2019-02-20 16:16 ` [PATCH v3 2/2] setup: don't fail if commondir reference is deleted Michal Suchanek
2019-02-20 16:55   ` Eric Sunshine
2019-02-20 17:16     ` Michal Suchánek
2019-02-20 18:35       ` Eric Sunshine
2019-02-21  9:27         ` Eric Sunshine
2019-02-21 11:13           ` Michal Suchánek
2019-02-21 11:19         ` Michal Suchánek
