git@vger.kernel.org mailing list mirror (one of many)
From: Johannes Schindelin <johannes.schindelin@gmx.de>
To: Karsten Blees <karsten.blees@gmail.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH/RFC] read-cache: fix file time comparisons with different precisions
Date: Mon, 28 Sep 2015 14:52:38 +0200
Message-ID: <763be6c1331ac57cf7dee3636d82f994@dscho.org>
In-Reply-To: <560918F8.1080905@gmail.com>

Hi Karsten,

On 2015-09-28 12:39, Karsten Blees wrote:
> Different git variants record file times in the index with different
> precisions, according to their capabilities. E.g. git compiled with NO_NSEC
> records seconds only, JGit records the mtime in milliseconds, but leaves
> ctime blank (because ctime is unavailable in Java).
> 
> This causes performance issues in git compiled with USE_NSEC, because index
> entries with such 'incomplete' timestamps are considered dirty, triggering
> unnecessary content checks.
> 
> Add a file time comparison function that auto-detects the precision based
> on the number of trailing 0 digits, and compares with the lower precision
> of both values. This initial version supports the known precisions seconds
> (git + NO_NSEC), milliseconds (JGit) and nanoseconds (git + USE_NSEC), but
> can be easily extended to e.g. microseconds.
> 
> Use the new comparison function in both dirty and racy checks. As a side
> effect, this fixes racy detection in USE_NSEC-enabled git with
> core.checkStat=minimal, as the coreStat setting now affects racy checks as
> well.
> 
> Finally, do not check ctime if ctime.sec is 0 (as recorded by JGit).

Great analysis, and nice patch. I would like to offer one suggestion in addition:

> diff --git a/read-cache.c b/read-cache.c
> index 87204a5..3a4e6cd 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -99,23 +99,50 @@ void fill_stat_data(struct stat_data *sd, struct stat *st)
>  	sd->sd_size = st->st_size;
>  }
>  
> +/*
> + * Compares two file times. Returns 0 if equal, <0 if t1 < t2, >0 if t1 > t2.
> + * Auto-detects precision based on trailing 0 digits. Compares seconds only if
> + * core.checkStat=minimal.
> + */
> +static inline int cmp_filetime(uint32_t t1_sec, uint32_t t1_nsec,
> +			       uint32_t t2_sec, uint32_t t2_nsec) {
> +#ifdef USE_NSEC
> +	/*
> +	 * Compare seconds and return result if different, or checkStat=minimal,
> +	 * or one of the time stamps has second precision only (nsec == 0).
> +	 */
> +	int diff = t1_sec - t2_sec;
> +	if (diff || !check_stat || !t1_nsec || !t2_nsec)
> +		return diff;
> +
> +	/*
> +	 * Check if one of the time stamps has millisecond precision only (i.e.
> +	 * the trailing 6 digits are 0). First check the trailing 6 bits so that
> +	 * we only do (slower) modulo division if necessary.
> +	 */
> +	if ((!(t1_nsec & 0x3f) && !(t1_nsec % 1000000)) ||
> +	    (!(t2_nsec & 0x3f) && !(t2_nsec % 1000000)))
> +		/* Compare milliseconds. */
> +		return (t1_nsec - t2_nsec) / 1000000;
> +
> +	/* Compare nanoseconds */
> +	return t1_nsec - t2_nsec;
> +#else
> +	return t1_sec - t2_sec;
> +#endif
> +}

As this affects only setups where the same repository is accessed via clients with different precision, would it make sense to hide this behind a config option? I.e. something like

static int cmp_filetime_precise(uint32_t t1_sec, uint32_t t1_nsec,
			        uint32_t t2_sec, uint32_t t2_nsec)
{
#ifdef USE_NSEC
	return t1_sec != t2_sec ? t1_sec - t2_sec : t1_nsec - t2_nsec;
#else
	return t1_sec - t2_sec;
#endif
}

static int cmp_filetime_mixed(uint32_t t1_sec, uint32_t t1_nsec,
			      uint32_t t2_sec, uint32_t t2_nsec)
{
#ifdef USE_NSEC
	... detect lower precision and compare with lower precision only...
#else
	return t1_sec - t2_sec;
#endif
}

static int (*cmp_filetime)(uint32_t t1_sec, uint32_t t1_nsec,
			   uint32_t t2_sec, uint32_t t2_nsec)
	= cmp_filetime_precise;

... point cmp_filetime at cmp_filetime_mixed if core.mixedTimeSpec = true...
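
To make the idea a bit more concrete, here is a rough standalone sketch of how the selection could look. It is not tested against git.git. It assumes USE_NSEC is in effect, `cmp_u32()` and `select_cmp_filetime()` are made-up helper names, and `mixed_time_spec` merely stands in for whatever the config code would hand over for the proposed core.mixedTimeSpec setting. The mixed variant truncates each side to milliseconds before comparing, which is my reading of the intent rather than a copy of the patch.

```c
#include <stdint.h>

/* Three-way compare that avoids wraparound from subtracting unsigned values. */
static int cmp_u32(uint32_t a, uint32_t b)
{
	return (a > b) - (a < b);
}

/* Full precision: seconds first, then nanoseconds. */
static int cmp_filetime_precise(uint32_t t1_sec, uint32_t t1_nsec,
				uint32_t t2_sec, uint32_t t2_nsec)
{
	if (t1_sec != t2_sec)
		return cmp_u32(t1_sec, t2_sec);
	return cmp_u32(t1_nsec, t2_nsec);
}

/* Mixed precision: compare with the lower precision of both values. */
static int cmp_filetime_mixed(uint32_t t1_sec, uint32_t t1_nsec,
			      uint32_t t2_sec, uint32_t t2_nsec)
{
	if (t1_sec != t2_sec)
		return cmp_u32(t1_sec, t2_sec);
	/* One side has second precision only: nothing more to compare. */
	if (!t1_nsec || !t2_nsec)
		return 0;
	/* One side has millisecond precision only: compare milliseconds. */
	if (!(t1_nsec % 1000000) || !(t2_nsec % 1000000))
		return cmp_u32(t1_nsec / 1000000, t2_nsec / 1000000);
	return cmp_u32(t1_nsec, t2_nsec);
}

/* Callers go through this pointer; the precise variant is the default. */
static int (*cmp_filetime)(uint32_t, uint32_t, uint32_t, uint32_t)
	= cmp_filetime_precise;

/* Hypothetical hook, to be called once the config value is known. */
static void select_cmp_filetime(int mixed_time_spec)
{
	cmp_filetime = mixed_time_spec ? cmp_filetime_mixed
				       : cmp_filetime_precise;
}
```

With the default in place, cmp_filetime(10, 123456789, 10, 123000000) reports a difference. After select_cmp_filetime(1) the same call returns 0, because the second timestamp only carries millisecond precision.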

Otherwise there would be that little loophole where (nsec % 1000000) == 0 *by chance* and we assume the timestamps to be identical even if they are not.
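
A small illustration of that loophole, again only a sketch: `looks_like_msec()` and `false_match()` are hypothetical helpers that mimic the millisecond detection, and both nanosecond values are assumed to belong to the same second. If nanosecond values were spread evenly, a genuine one would land on a whole millisecond about once in a million.

```c
#include <stdint.h>

/* Mimics the heuristic: non-zero and the trailing 6 decimal digits are 0. */
static int looks_like_msec(uint32_t nsec)
{
	return nsec != 0 && nsec % 1000000 == 0;
}

/*
 * Returns 1 if two *different* nanosecond values (same second assumed)
 * would be treated as equal by a millisecond-precision comparison.
 */
static int false_match(uint32_t a_nsec, uint32_t b_nsec)
{
	if (a_nsec == b_nsec)
		return 0;
	if (!looks_like_msec(a_nsec) && !looks_like_msec(b_nsec))
		return 0;
	return a_nsec / 1000000 == b_nsec / 1000000;
}
```

For example, false_match(42000000, 42000789) returns 1: an entry recorded at exactly 42 ms with full precision is taken for a millisecond timestamp, so a change 789 ns later in the same second goes unnoticed.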

Ciao,
Dscho
