git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Marc Strapetz via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Marc Strapetz <marc.strapetz@syntevo.com>
Subject: Re: [PATCH] update-index: refresh should rewrite index in case of racy timestamps
Date: Wed, 22 Dec 2021 15:52:59 -0800	[thread overview]
Message-ID: <xmqqfsqkdwo4.fsf@gitster.g> (raw)
In-Reply-To: <pull.1105.git.1640181390841.gitgitgadget@gmail.com> (Marc Strapetz via GitGitGadget's message of "Wed, 22 Dec 2021 13:56:30 +0000")

"Marc Strapetz via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  builtin/update-index.c               |  6 +++
>  cache.h                              |  1 +
>  read-cache.c                         |  2 +-
>  t/t2108-update-index-refresh-racy.sh | 58 ++++++++++++++++++++++++++++
>  4 files changed, 66 insertions(+), 1 deletion(-)
>  create mode 100755 t/t2108-update-index-refresh-racy.sh
>
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index 187203e8bb5..0a069281e23 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -787,6 +787,12 @@ static int refresh(struct refresh_params *o, unsigned int flag)
>  	setup_work_tree();
>  	read_cache();
>  	*o->has_errors |= refresh_cache(o->flags | flag);
> +	if (has_racy_timestamp(&the_index)) {
> +		/* For racy timestamps we should set active_cache_changed immediately:
> +		 * other callbacks may follow for which some of them may reset
> +		 * active_cache_changed. */
> +		active_cache_changed |= SOMETHING_CHANGED;
> +	}

Documentation/CodingGuidelines says:

 - Multi-line comments include their delimiters on separate lines from
   the text.  E.g.

	/*
	 * A very long
	 * multi-line comment.
	 */

The last half-sentence puzzles me, partly because of the word
"callback", which is an implementation detail of how --refresh and
other actions are triggered by the update-index command.  Calling
them "operation" or "action" might be easier to understand.  I dunno.

But more problematic is the word "reset", which at least to me
implies that the SOMETHING_CHANGED bit may be cleared by them, which
sounds just wrong and broken.

    ... goes and looks ...

Ah, there are cases where we do clear active_cache_changed when we
notice that an operation detected an error, to avoid spreading the
breakage by writing the index file out, and I think that is the
right thing to do.  Which means that the above patch is not quite
right.  Perhaps taking all of the above together, something like
this?

	*o->has_errors |= refresh_cache(o->flags | flag);
	if (*o->has_errors)
		active_cache_changed = 0; 
	else if (has_racy_timestamps(&the_index))
        	/*
		 * Even if nothing else has changed, updating the file
		 * increases the chance that racy timestamps become
		 * non-racy, helping future run-time performance.
		 */
		active_cache_changed |= SOMETHING_CHANGED;


> diff --git a/t/t2108-update-index-refresh-racy.sh b/t/t2108-update-index-refresh-racy.sh
> new file mode 100755
> index 00000000000..ece1151847c
> --- /dev/null
> +++ b/t/t2108-update-index-refresh-racy.sh
> @@ -0,0 +1,58 @@
> +#!/bin/sh
> +
> +test_description='update-index refresh tests related to racy timestamps'
> +
> +. ./test-lib.sh
> +
> +reset_mtime() {

Documentation/CodingGuidelines

 - We prefer a space between the function name and the parentheses,
   and no space inside the parentheses. The opening "{" should also
   be on the same line.

	(incorrect)
	my_function(){
		...

	(correct)
	my_function () {
		...

> +	test-tool chmtime =$(test-tool chmtime --get .git/fs-tstamp) $1

Even if we know all the existing callers pass a single word argument
to this function, it would be a good discipline to put double-quotes
around "$1" to assure the readers that we are future-proofed.

> +}
> +
> +update_assert_unchanged() {
> +	local ts1=$(test-tool chmtime --get .git/index) &&
> +	git update-index $1 &&
> +	local ts2=$(test-tool chmtime --get .git/index) &&
> +	[ $ts1 -eq $ts2 ]

Documentation/CodingGuidelines

 - We prefer "test" over "[ ... ]".

> +}
> +
> +update_assert_changed() {
> +	local ts1=$(test-tool chmtime --get .git/index) &&
> +	test_might_fail git update-index $1 &&
> +	local ts2=$(test-tool chmtime --get .git/index) &&
> +	[ $ts1 -ne $ts2 ]
> +}
> +
> +test_expect_success 'setup' '
> +	touch .git/fs-tstamp &&

Not that it is wrong, but do we need to create such a throw-away
file inside the .git directory?

When we care only the presence of a path, and not that the path has
the current timestamp, we prefer not to use "touch".

	>.git/fs-tstamp

I am debating myself which is more appropriate in this case.  A
mistaken implementation of "touch" could call gettimeofday() and use
the result to call utimes(), leaving wallclock timestamp in the
result, but redirecation to create or truncate the path is a more
guaranteed way to make sure the timestamp comes from the filesystem,
so it may be more suitable for our needs here.

> +	test-tool chmtime -1 .git/fs-tstamp &&
> +	echo content >file &&
> +	reset_mtime file &&
> +
> +	git add file &&
> +	git commit -m "initial import"
> +'
> +
> +test_expect_success '--refresh has no racy timestamps to fix' '
> +	reset_mtime .git/index &&
> +	test-tool chmtime +1 .git/index &&
> +	update_assert_unchanged --refresh
> +'
> +
> +test_expect_success '--refresh should fix racy timestamp' '
> +	reset_mtime .git/index &&
> +	update_assert_changed --refresh
> +'
> +
> +test_expect_success '--really-refresh should fix racy timestamp' '
> +	reset_mtime .git/index &&
> +	update_assert_changed --really-refresh
> +'
> +
> +test_expect_success '--refresh should fix racy timestamp even if needs update' '
> +	echo content2 >file &&
> +	reset_mtime file &&
> +	reset_mtime .git/index &&
> +	update_assert_changed --refresh
> +'
> +
> +test_done
>
> base-commit: 597af311a2899bfd6640b9b107622c5795d5f998

  reply	other threads:[~2021-12-22 23:53 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-22 13:56 [PATCH] update-index: refresh should rewrite index in case of racy timestamps Marc Strapetz via GitGitGadget
2021-12-22 23:52 ` Junio C Hamano [this message]
2021-12-23 18:24   ` Marc Strapetz
2022-01-05 13:15 ` [PATCH v2 0/2] " Marc Strapetz via GitGitGadget
2022-01-05 13:15   ` [PATCH v2 1/2] t7508: add tests capturing racy timestamp handling Marc Strapetz via GitGitGadget
2022-01-05 20:59     ` Junio C Hamano
2022-01-06 10:21       ` Marc Strapetz
2022-01-05 13:15   ` [PATCH v2 2/2] update-index: refresh should rewrite index in case of racy timestamps Marc Strapetz via GitGitGadget
2022-01-05 21:03     ` Junio C Hamano
2022-01-06 22:34   ` [PATCH v3 0/4] " Marc Strapetz via GitGitGadget
2022-01-06 22:34     ` [PATCH v3 1/4] test-lib: introduce API for verifying file mtime Marc Strapetz via GitGitGadget
2022-01-06 23:55       ` Junio C Hamano
2022-01-06 22:34     ` [PATCH v3 2/4] t7508: fix bogus mtime verification Marc Strapetz via GitGitGadget
2022-01-06 22:34     ` [PATCH v3 3/4] t7508: add tests capturing racy timestamp handling Marc Strapetz via GitGitGadget
2022-01-06 22:34     ` [PATCH v3 4/4] update-index: refresh should rewrite index in case of racy timestamps Marc Strapetz via GitGitGadget
2022-01-07 11:17     ` [PATCH v4 0/4] " Marc Strapetz via GitGitGadget
2022-01-07 11:17       ` [PATCH v4 1/4] test-lib: introduce API for verifying file mtime Marc Strapetz via GitGitGadget
2022-01-07 11:17       ` [PATCH v4 2/4] t7508: fix bogus mtime verification Marc Strapetz via GitGitGadget
2022-01-07 11:17       ` [PATCH v4 3/4] t7508: add tests capturing racy timestamp handling Marc Strapetz via GitGitGadget
2022-01-07 11:17       ` [PATCH v4 4/4] update-index: refresh should rewrite index in case of racy timestamps Marc Strapetz via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqfsqkdwo4.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=marc.strapetz@syntevo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).