git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] reftable/stack: fix race in up-to-date check
@ 2024-01-18 13:41 Patrick Steinhardt
  2024-01-18 13:41 ` [PATCH 1/2] reftable/stack: unconditionally reload stack after commit Patrick Steinhardt
  2024-01-18 13:41 ` [PATCH 2/2] reftable/stack: fix race in up-to-date check Patrick Steinhardt
  0 siblings, 2 replies; 7+ messages in thread
From: Patrick Steinhardt @ 2024-01-18 13:41 UTC (permalink / raw
  To: git

[-- Attachment #1: Type: text/plain, Size: 1225 bytes --]

Hi,

In 6fdfaf15 (reftable/stack: use stat info to avoid re-reading stack
list, 2024-01-11) we have introduced a new mechanism to avoid re-reading
the table list in case stat(3P) figures out that the stack didn't change
since the last time we read it.

Unfortunately, I've been able to hit a bug there that can happen under
very specific circumstances when a reading and writing process race with
each other. When the writer appends to the stack and then compacts it,
it may happen that we recycle the previously-used inode of "tables.list"
so that the resulting `struct stat` exactly matches the cached value of
the reading process.

The first patch is merely cosmetic, whereas the second patch fixes the
described race. The topic depends on ps/reftable-optimize-io, which has
introduced the issue.

Patrick

Patrick Steinhardt (2):
  reftable/stack: unconditionally reload stack after commit
  reftable/stack: fix race in up-to-date check

 reftable/stack.c  | 101 ++++++++++++++++++++++++++++++++++++++++++----
 reftable/stack.h  |   4 +-
 reftable/system.h |   1 -
 3 files changed, 96 insertions(+), 10 deletions(-)


base-commit: 718a93ecc06ed59dda4e6a5d91b1c2169275694f
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] reftable/stack: unconditionally reload stack after commit
  2024-01-18 13:41 [PATCH 0/2] reftable/stack: fix race in up-to-date check Patrick Steinhardt
@ 2024-01-18 13:41 ` Patrick Steinhardt
  2024-01-18 13:41 ` [PATCH 2/2] reftable/stack: fix race in up-to-date check Patrick Steinhardt
  1 sibling, 0 replies; 7+ messages in thread
From: Patrick Steinhardt @ 2024-01-18 13:41 UTC (permalink / raw
  To: git

[-- Attachment #1: Type: text/plain, Size: 1137 bytes --]

After we have committed an addition to the reftable stack we call
`reftable_stack_reload()` to reload the stack and thus reflect the
changes that were just added. This function will only conditionally
reload the stack in case `stack_uptodate()` tells us that the stack
needs reloading. This check is wasteful though because we already know
that the stack needs reloading.

Call `reftable_stack_reload_maybe_reuse()` instead, which will
unconditionally reload the stack. This is merely a conceptual fix, the
code in question was not found to cause any problems in practice.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index c28d82299d..705cfb6caa 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -585,7 +585,7 @@ int reftable_addition_commit(struct reftable_addition *add)
 	add->new_tables = NULL;
 	add->new_tables_len = 0;
 
-	err = reftable_stack_reload(add->stack);
+	err = reftable_stack_reload_maybe_reuse(add->stack, 1);
 	if (err)
 		goto done;
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] reftable/stack: fix race in up-to-date check
  2024-01-18 13:41 [PATCH 0/2] reftable/stack: fix race in up-to-date check Patrick Steinhardt
  2024-01-18 13:41 ` [PATCH 1/2] reftable/stack: unconditionally reload stack after commit Patrick Steinhardt
@ 2024-01-18 13:41 ` Patrick Steinhardt
  2024-01-18 20:07   ` Junio C Hamano
  2024-01-20  1:05   ` Jeff King
  1 sibling, 2 replies; 7+ messages in thread
From: Patrick Steinhardt @ 2024-01-18 13:41 UTC (permalink / raw
  To: git

[-- Attachment #1: Type: text/plain, Size: 9609 bytes --]

In 6fdfaf15a0 (reftable/stack: use stat info to avoid re-reading stack
list, 2024-01-11) we have introduced a new mechanism to avoid re-reading
the table list in case stat(3P) figures out that the stack didn't change
since the last time we read it.

While this change significantly improved performance when writing many
refs, it can unfortunately lead to false negatives in very specific
scenarios. Given two processes A and B, there is a feasible sequence of
events that cause us to accidentally treat the table list as up-to-date
even though it changed:

  1. A reads the reftable stack and caches its stat info.

  2. B updates the stack, appending a new table to "tables.list". This
     will both use a new inode and result in a different file size, thus
     invalidating A's cache in theory.

  3. B decides to auto-compact the stack and merges two tables. The file
     size now matches what A has cached again. Furthermore, the
     filesystem may decide to recycle the inode number of the file we
     have replaced in (2) because it is not in use anymore.

  4. A reloads the reftable stack. Neither the inode number nor the
     file size changed. If the timestamps did not change either then we
     think the cached copy of our stack is up-to-date.

In fact, the commit introduced three related issues:

  - Non-POSIX compliant systems may not report proper `st_dev` and
    `st_ino` values in stat(3P), which made us rely solely on the
    file's potentially coarse-grained mtime and ctime.

  - `stat_validity_check()` and friends may end up not comparing
    `st_dev` and `st_ino` depending on the "core.checkstat" config,
    again reducing the signal to the mtime and ctime.

  - `st_ino` can be recycled, rendering the check moot even on
    POSIX-compliant systems.

Given that POSIX defines that "The st_ino and st_dev fields taken
together uniquely identify the file within the system", these issues led
to the most important signal to establish file identity to be ignored or
become useless in some cases.

Refactor the code to stop using `stat_validity_check()`. Instead, we
manually stat(3P) the file descriptors to make relevant information
available. On Windows and MSYS2 the result will have both `st_dev` and
`st_ino` set to 0, which allows us to address the first issue by not
using the stat-based cache in that case. It also allows us to make sure
that we always compare `st_dev` and `st_ino`, addressing the second
issue.

The third issue of inode recycling can be addressed by keeping the file
descriptor of "files.list" open during the lifetime of the reftable
stack. As the file will still exist on disk even though it has been
unlinked it is impossible for its inode to be recycled as long as the
file descriptor is still open.

This should address the race in a POSIX-compliant way. The only real
downside is that this mechanism cannot be used on non-POSIX-compliant
systems like Windows. But we at least have the second-level caching
mechanism in place that compares contents of "files.list" with the
currently loaded list of tables.

This new mechanism performs roughly the same as the previous one that
relied on `stat_validity_check()`:

  Benchmark 1: update-ref: create many refs (HEAD~)
    Time (mean ± σ):      4.754 s ±  0.026 s    [User: 2.204 s, System: 2.549 s]
    Range (min … max):    4.694 s …  4.802 s    20 runs

  Benchmark 2: update-ref: create many refs (HEAD)
    Time (mean ± σ):      4.721 s ±  0.020 s    [User: 2.194 s, System: 2.527 s]
    Range (min … max):    4.691 s …  4.753 s    20 runs

  Summary
    update-ref: create many refs (HEAD~) ran
      1.01 ± 0.01 times faster than update-ref: create many refs (HEAD)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c  | 99 +++++++++++++++++++++++++++++++++++++++++++----
 reftable/stack.h  |  4 +-
 reftable/system.h |  1 -
 3 files changed, 95 insertions(+), 9 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 705cfb6caa..77a387a86c 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -66,6 +66,7 @@ int reftable_new_stack(struct reftable_stack **dest, const char *dir,
 	strbuf_addstr(&list_file_name, "/tables.list");
 
 	p->list_file = strbuf_detach(&list_file_name, NULL);
+	p->list_fd = -1;
 	p->reftable_dir = xstrdup(dir);
 	p->config = config;
 
@@ -175,7 +176,12 @@ void reftable_stack_destroy(struct reftable_stack *st)
 		st->readers_len = 0;
 		FREE_AND_NULL(st->readers);
 	}
-	stat_validity_clear(&st->list_validity);
+
+	if (st->list_fd >= 0) {
+		close(st->list_fd);
+		st->list_fd = -1;
+	}
+
 	FREE_AND_NULL(st->list_file);
 	FREE_AND_NULL(st->reftable_dir);
 	reftable_free(st);
@@ -375,11 +381,59 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
 		sleep_millisec(delay);
 	}
 
-	stat_validity_update(&st->list_validity, fd);
-
 out:
-	if (err)
-		stat_validity_clear(&st->list_validity);
+	/*
+	 * Invalidate the stat cache. It is sufficient to only close the file
+	 * descriptor and keep the cached stat info because we never use the
+	 * latter when the former is negative.
+	 */
+	if (st->list_fd >= 0) {
+		close(st->list_fd);
+		st->list_fd = -1;
+	}
+
+	/*
+	 * Cache stat information in case it provides a useful signal to us.
+	 * According to POSIX, "The st_ino and st_dev fields taken together
+	 * uniquely identify the file within the system." That being said,
+	 * Windows is not POSIX compliant and we do not have these fields
+	 * available. So the information we have there is insufficient to
+	 * determine whether two file descriptors point to the same file.
+	 *
+	 * While we could fall back to using other signals like the file's
+	 * mtime, those are not sufficient to avoid races. We thus refrain from
+	 * using the stat cache on such systems and fall back to the secondary
+	 * caching mechanism, which is to check whether contents of the file
+	 * have changed.
+	 *
+	 * On other systems which are POSIX compliant we must keep the file
+	 * descriptor open. This is to avoid a race condition where two
+	 * processes access the reftable stack at the same point in time:
+	 *
+	 *   1. A reads the reftable stack and caches its stat info.
+	 *
+	 *   2. B updates the stack, appending a new table to "tables.list".
+	 *      This will both use a new inode and result in a different file
+	 *      size, thus invalidating A's cache in theory.
+	 *
+	 *   3. B decides to auto-compact the stack and merges two tables. The
+	 *      file size now matches what A has cached again. Furthermore, the
+	 *      filesystem may decide to recycle the inode number of the file
+	 *      we have replaced in (2) because it is not in use anymore.
+	 *
+	 *   4. A reloads the reftable stack. Neither the inode number nor the
+	 *      file size changed. If the timestamps did not change either then
+	 *      we think the cached copy of our stack is up-to-date.
+	 *
+	 * By keeping the file descriptor open the inode number cannot be
+	 * recycled, mitigating the race.
+	 */
+	if (!err && fd >= 0 && !fstat(fd, &st->list_st) &&
+	    st->list_st.st_dev && st->list_st.st_ino) {
+		st->list_fd = fd;
+		fd = -1;
+	}
+
 	if (fd >= 0)
 		close(fd);
 	free_names(names);
@@ -396,8 +450,39 @@ static int stack_uptodate(struct reftable_stack *st)
 	int err;
 	int i = 0;
 
-	if (stat_validity_check(&st->list_validity, st->list_file))
-		return 0;
+	/*
+	 * When we have cached stat information available then we use it to
+	 * verify whether the file has been rewritten.
+	 *
+	 * Note that we explicitly do not want to use `stat_validity_check()`
+	 * and friends here because they may end up not comparing the `st_dev`
+	 * and `st_ino` fields. These functions thus cannot guarantee that we
+	 * indeed still have the same file.
+	 */
+	if (st->list_fd >= 0) {
+		struct stat list_st;
+
+		if (stat(st->list_file, &list_st) < 0) {
+			/*
+			 * It's fine for "tables.list" to not exist. In that
+			 * case, we have to refresh when the loaded stack has
+			 * any readers.
+			 */
+			if (errno == ENOENT)
+				return !!st->readers_len;
+			return REFTABLE_IO_ERROR;
+		}
+
+		/*
+		 * When "tables.list" refers to the same file we can assume
+		 * that it didn't change. This is because we always use
+		 * rename(3P) to update the file and never write to it
+		 * directly.
+		 */
+		if (st->list_st.st_dev == list_st.st_dev &&
+		    st->list_st.st_ino == list_st.st_ino)
+			return 0;
+	}
 
 	err = read_lines(st->list_file, &names);
 	if (err < 0)
diff --git a/reftable/stack.h b/reftable/stack.h
index 3f80cc598a..c1e3efa899 100644
--- a/reftable/stack.h
+++ b/reftable/stack.h
@@ -14,8 +14,10 @@ license that can be found in the LICENSE file or at
 #include "reftable-stack.h"
 
 struct reftable_stack {
-	struct stat_validity list_validity;
+	struct stat list_st;
 	char *list_file;
+	int list_fd;
+
 	char *reftable_dir;
 	int disable_auto_compact;
 
diff --git a/reftable/system.h b/reftable/system.h
index 2cc7adf271..6b74a81514 100644
--- a/reftable/system.h
+++ b/reftable/system.h
@@ -12,7 +12,6 @@ license that can be found in the LICENSE file or at
 /* This header glues the reftable library to the rest of Git */
 
 #include "git-compat-util.h"
-#include "statinfo.h"
 #include "strbuf.h"
 #include "hash-ll.h" /* hash ID, sizes.*/
 #include "dir.h" /* remove_dir_recursively, for tests.*/
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] reftable/stack: fix race in up-to-date check
  2024-01-18 13:41 ` [PATCH 2/2] reftable/stack: fix race in up-to-date check Patrick Steinhardt
@ 2024-01-18 20:07   ` Junio C Hamano
  2024-01-20  1:05   ` Jeff King
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2024-01-18 20:07 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> This should address the race in a POSIX-compliant way. The only real
> downside is that this mechanism cannot be used on non-POSIX-compliant
> systems like Windows. But we at least have the second-level caching
> mechanism in place that compares contents of "files.list" with the
> currently loaded list of tables.

OK.


> +	/*
> +	 * Cache stat information in case it provides a useful signal to us.
> +	 * According to POSIX, "The st_ino and st_dev fields taken together
> +	 * uniquely identify the file within the system." That being said,
> +	 * Windows is not POSIX compliant and we do not have these fields
> +	 * available. So the information we have there is insufficient to
> +	 * determine whether two file descriptors point to the same file.
> +	 *
> +	 * While we could fall back to using other signals like the file's
> +	 * mtime, those are not sufficient to avoid races. We thus refrain from
> +	 * using the stat cache on such systems and fall back to the secondary
> +	 * caching mechanism, which is to check whether contents of the file
> +	 * have changed.

OK.

> +	 *
> +	 * On other systems which are POSIX compliant we must keep the file
> +	 * descriptor open. This is to avoid a race condition where two
> +	 * processes access the reftable stack at the same point in time:
> +	 *
> +	 *   1. A reads the reftable stack and caches its stat info.
> +	 *
> +	 *   2. B updates the stack, appending a new table to "tables.list".
> +	 *      This will both use a new inode and result in a different file
> +	 *      size, thus invalidating A's cache in theory.
> +	 *
> +	 *   3. B decides to auto-compact the stack and merges two tables. The
> +	 *      file size now matches what A has cached again. Furthermore, the
> +	 *      filesystem may decide to recycle the inode number of the file
> +	 *      we have replaced in (2) because it is not in use anymore.
> +	 *
> +	 *   4. A reloads the reftable stack. Neither the inode number nor the
> +	 *      file size changed. If the timestamps did not change either then
> +	 *      we think the cached copy of our stack is up-to-date.
> +	 *
> +	 * By keeping the file descriptor open the inode number cannot be
> +	 * recycled, mitigating the race.
> +	 */

This is nasty.  Well diagnosed and fixed.

Will queue.

Thanks.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] reftable/stack: fix race in up-to-date check
  2024-01-18 13:41 ` [PATCH 2/2] reftable/stack: fix race in up-to-date check Patrick Steinhardt
  2024-01-18 20:07   ` Junio C Hamano
@ 2024-01-20  1:05   ` Jeff King
  2024-01-22 10:32     ` Patrick Steinhardt
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff King @ 2024-01-20  1:05 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git

On Thu, Jan 18, 2024 at 02:41:56PM +0100, Patrick Steinhardt wrote:

> Refactor the code to stop using `stat_validity_check()`. Instead, we
> manually stat(3P) the file descriptors to make relevant information
> available. On Windows and MSYS2 the result will have both `st_dev` and
> `st_ino` set to 0, which allows us to address the first issue by not
> using the stat-based cache in that case. It also allows us to make sure
> that we always compare `st_dev` and `st_ino`, addressing the second
> issue.

I didn't think too hard about the details, but does this mean that
every user of stat_validity_check() has the same issue? The other big
one is packed-refs (for which the code was originally written). Should
this fix go into that API?

-Peff


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] reftable/stack: fix race in up-to-date check
  2024-01-20  1:05   ` Jeff King
@ 2024-01-22 10:32     ` Patrick Steinhardt
  2024-01-23  0:32       ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Patrick Steinhardt @ 2024-01-22 10:32 UTC (permalink / raw
  To: Jeff King; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2687 bytes --]

On Fri, Jan 19, 2024 at 08:05:59PM -0500, Jeff King wrote:
> On Thu, Jan 18, 2024 at 02:41:56PM +0100, Patrick Steinhardt wrote:
> 
> > Refactor the code to stop using `stat_validity_check()`. Instead, we
> > manually stat(3P) the file descriptors to make relevant information
> > available. On Windows and MSYS2 the result will have both `st_dev` and
> > `st_ino` set to 0, which allows us to address the first issue by not
> > using the stat-based cache in that case. It also allows us to make sure
> > that we always compare `st_dev` and `st_ino`, addressing the second
> > issue.
> 
> I didn't think too hard about the details, but does this mean that
> every user of stat_validity_check() has the same issue? The other big
> one is packed-refs (for which the code was originally written). Should
> this fix go into that API?

In theory, the issue is the same for the `packed-refs` file. But in
practice it's much less likely to be an issue:

  - The file gets rewritten a lot less frequently than the "tables.list"
    file, making the race less likely in the first place. It can only
    happen when git-pack-refs(1) races with concurrent readers, whereas
    it can happen for any two concurrent process with reftables.

  - Due to entries in the `packed-refs` being of variable length it's
    less likely that the size will be exactly the same after the file
    has been rewritten. For reftables we have constant-length entries in
    the "tables.list", so it's likely to happen there.

  - It is very unlikely that we have the same issue with inode reuse
    with the packed-refs file. The only reason why we have it with the
    reftable backend is that it is very likely that we end up writing to
    "tables.list" twice, once for the normal update and once for auto
    compaction.

So overall, I doubt that it's all that critical in practice for the
packed-refs backend. It _is_ possible to happen, but chances are
significantly lower. I cannot recall a single report of this issue,
which underpins how unlikely it seems to be for the files backend.

Also, applying the same fix for the packed-refs would essentially mean
that the caching mechanism is now ineffective on Windows systems where
we do not have proper `st_dev` and `st_ino` values available. I think
this is a no-go in the context of packed-refs because we don't have a
second-level caching mechanism like we do in the reftable backend. It's
not great that we have to reread the "tables.list" file on Windows
systems for now, but at least it's better than having to reread the
complete "packed-refs" file like we'd have to do with the files backend.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] reftable/stack: fix race in up-to-date check
  2024-01-22 10:32     ` Patrick Steinhardt
@ 2024-01-23  0:32       ` Jeff King
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2024-01-23  0:32 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git

On Mon, Jan 22, 2024 at 11:32:14AM +0100, Patrick Steinhardt wrote:

> > I didn't think too hard about the details, but does this mean that
> > every user of stat_validity_check() has the same issue? The other big
> > one is packed-refs (for which the code was originally written). Should
> > this fix go into that API?
> 
> In theory, the issue is the same for the `packed-refs` file. But in
> practice it's much less likely to be an issue:

Thanks for laying this all out. It does concern me a little that there's
still a possible race, because they can be so hard to catch and debug in
practice. But I think you make a compelling argument that it's probably
not happening a lot in practice, and especially...

> Also, applying the same fix for the packed-refs would essentially mean
> that the caching mechanism is now ineffective on Windows systems where
> we do not have proper `st_dev` and `st_ino` values available. I think
> this is a no-go in the context of packed-refs because we don't have a
> second-level caching mechanism like we do in the reftable backend. It's
> not great that we have to reread the "tables.list" file on Windows
> systems for now, but at least it's better than having to reread the
> complete "packed-refs" file like we'd have to do with the files backend.

...here that the performance profile is so different. If the "fix"
means re-reading the whole packed-refs file constantly, that's going
to be quite noticeable.

Given that this race has been here forever-ish, I agree with you that we
should leave it be.

-Peff


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-01-23  1:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-18 13:41 [PATCH 0/2] reftable/stack: fix race in up-to-date check Patrick Steinhardt
2024-01-18 13:41 ` [PATCH 1/2] reftable/stack: unconditionally reload stack after commit Patrick Steinhardt
2024-01-18 13:41 ` [PATCH 2/2] reftable/stack: fix race in up-to-date check Patrick Steinhardt
2024-01-18 20:07   ` Junio C Hamano
2024-01-20  1:05   ` Jeff King
2024-01-22 10:32     ` Patrick Steinhardt
2024-01-23  0:32       ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).