From: Johannes Schindelin <johannes.schindelin@gmx.de>
To: Karsten Blees <karsten.blees@gmail.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH/RFC] read-cache: fix file time comparisons with different precisions
Date: Mon, 28 Sep 2015 14:52:38 +0200
Message-ID: <763be6c1331ac57cf7dee3636d82f994@dscho.org>
In-Reply-To: <560918F8.1080905@gmail.com>
Hi Karsten,
On 2015-09-28 12:39, Karsten Blees wrote:
> Different git variants record file times in the index with different
> precisions, according to their capabilities. E.g. git compiled with NO_NSEC
> records seconds only, JGit records the mtime in milliseconds, but leaves
> ctime blank (because ctime is unavailable in Java).
>
> This causes performance issues in git compiled with USE_NSEC, because index
> entries with such 'incomplete' timestamps are considered dirty, triggering
> unnecessary content checks.
>
> Add a file time comparison function that auto-detects the precision based
> on the number of trailing 0 digits, and compares with the lower precision
> of both values. This initial version supports the known precisions seconds
> (git + NO_NSEC), milliseconds (JGit) and nanoseconds (git + USE_NSEC), but
> can be easily extended to e.g. microseconds.
>
> Use the new comparison function in both dirty and racy checks. As a side
> effect, this fixes racy detection in USE_NSEC-enabled git with
> core.checkStat=minimal, as the coreStat setting now affects racy checks as
> well.
>
> Finally, do not check ctime if ctime.sec is 0 (as recorded by JGit).
Great analysis, and nice patch. I would like to offer one suggestion in addition:
> diff --git a/read-cache.c b/read-cache.c
> index 87204a5..3a4e6cd 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -99,23 +99,50 @@ void fill_stat_data(struct stat_data *sd, struct stat *st)
> sd->sd_size = st->st_size;
> }
>
> +/*
> + * Compares two file times. Returns 0 if equal, <0 if t1 < t2, >0 if t1 > t2.
> + * Auto-detects precision based on trailing 0 digits. Compares seconds only if
> + * core.checkStat=minimal.
> + */
> +static inline int cmp_filetime(uint32_t t1_sec, uint32_t t1_nsec,
> + uint32_t t2_sec, uint32_t t2_nsec) {
> +#ifdef USE_NSEC
> + /*
> + * Compare seconds and return result if different, or checkStat=minimal,
> + * or one of the time stamps has second precision only (nsec == 0).
> + */
> + int diff = t1_sec - t2_sec;
> + if (diff || !check_stat || !t1_nsec || !t2_nsec)
> + return diff;
> +
> + /*
> + * Check if one of the time stamps has millisecond precision only (i.e.
> + * the trailing 6 digits are 0). First check the trailing 6 bits so that
> + * we only do (slower) modulo division if necessary.
> + */
> + if ((!(t1_nsec & 0x3f) && !(t1_nsec % 1000000)) ||
> + (!(t2_nsec & 0x3f) && !(t2_nsec % 1000000)))
> + /* Compare milliseconds. */
> + return (t1_nsec - t2_nsec) / 1000000;
> +
> + /* Compare nanoseconds */
> + return t1_nsec - t2_nsec;
> +#else
> + return t1_sec - t2_sec;
> +#endif
> +}
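For anyone who wants to experiment with the comparison quoted above outside of git, here is a minimal standalone sketch. It is not the patch itself: the name cmp_filetime_sketch and the helper cmp_u32 are made up for illustration, the core.checkStat handling and the "& 0x3f" pre-check are left out, and it always behaves as if USE_NSEC were defined. Because the operands are uint32_t, it uses explicit three-way comparisons instead of subtraction, and it truncates each value to whole milliseconds before comparing, which is not the same as dividing the difference.

```c
#include <assert.h>
#include <stdint.h>

/* Three-way comparison of two unsigned values: -1, 0 or 1. */
static int cmp_u32(uint32_t a, uint32_t b)
{
	return (a > b) - (a < b);
}

/*
 * Hypothetical standalone variant of the quoted comparison.
 * Returns 0 if equal, <0 if t1 < t2, >0 if t1 > t2, using the
 * lower precision of the two time stamps.
 */
static int cmp_filetime_sketch(uint32_t t1_sec, uint32_t t1_nsec,
			       uint32_t t2_sec, uint32_t t2_nsec)
{
	int diff = cmp_u32(t1_sec, t2_sec);

	/* A valid nanosecond field is always below one second. */
	assert(t1_nsec < 1000000000u && t2_nsec < 1000000000u);

	/* nsec == 0 is taken to mean "second precision only". */
	if (diff || !t1_nsec || !t2_nsec)
		return diff;

	/* A multiple of 10^6 is taken to mean "millisecond precision only". */
	if (!(t1_nsec % 1000000u) || !(t2_nsec % 1000000u))
		return cmp_u32(t1_nsec / 1000000u, t2_nsec / 1000000u);

	/* Both time stamps appear to carry full nanosecond precision. */
	return cmp_u32(t1_nsec, t2_nsec);
}
```

With this sketch, (100, 0) and (100, 123456789) compare equal at second precision, and (100, 123000000) and (100, 123456789) compare equal at millisecond precision.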
As this affects only setups where the same repository is accessed via clients with different precision, would it make sense to hide this behind a config option? I.e. something like
static int cmp_filetime_precise(uint32_t t1_sec, uint32_t t1_nsec,
uint32_t t2_sec, uint32_t t2_nsec)
{
#ifdef USE_NSEC
return t1_sec != t2_sec ? t1_sec - t2_sec : t1_nsec - t2_nsec;
#else
return t1_sec - t2_sec;
#endif
}
static int cmp_filetime_mixed(uint32_t t1_sec, uint32_t t1_nsec,
uint32_t t2_sec, uint32_t t2_nsec)
{
#ifdef USE_NSEC
... detect lower precision and compare with lower precision only...
#else
return t1_sec - t2_sec;
#endif
}
static int (*cmp_filetime)(uint32_t t1_sec, uint32_t t1_nsec,
uint32_t t2_sec, uint32_t t2_nsec)
= cmp_filetime_precise;
... point cmp_filetime at cmp_filetime_mixed if core.mixedTimeSpec = true...
Otherwise there would be a small loophole: a genuine nanosecond time stamp could satisfy (nsec % 1000000) == 0 *by chance*, in which case we would compare at millisecond precision only and assume the timestamps to be identical even if they are not.
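To make both the opt-in and the loophole concrete, here is a rough, self-contained sketch. Everything beyond the names used in this message is an assumption: set_mixed_time_spec() is a hypothetical hook that would be called once core.mixedTimeSpec has been read, cmp_u32() is an illustrative helper, and the functions return only the sign of the comparison rather than a difference. The #ifdef USE_NSEC branches are omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Three-way comparison of two unsigned values: -1, 0 or 1. */
static int cmp_u32(uint32_t a, uint32_t b)
{
	return (a > b) - (a < b);
}

/* Full precision: seconds first, then nanoseconds. */
static int cmp_filetime_precise(uint32_t t1_sec, uint32_t t1_nsec,
				uint32_t t2_sec, uint32_t t2_nsec)
{
	int diff = cmp_u32(t1_sec, t2_sec);

	return diff ? diff : cmp_u32(t1_nsec, t2_nsec);
}

/* Mixed precision: guess the lower precision and compare with that. */
static int cmp_filetime_mixed(uint32_t t1_sec, uint32_t t1_nsec,
			      uint32_t t2_sec, uint32_t t2_nsec)
{
	int diff = cmp_u32(t1_sec, t2_sec);

	/* A valid nanosecond field is always below one second. */
	assert(t1_nsec < 1000000000u && t2_nsec < 1000000000u);

	/* nsec == 0 is taken to mean "second precision only". */
	if (diff || !t1_nsec || !t2_nsec)
		return diff;

	/* A multiple of 10^6 is taken to mean "millisecond precision only". */
	if (!(t1_nsec % 1000000u) || !(t2_nsec % 1000000u))
		return cmp_u32(t1_nsec / 1000000u, t2_nsec / 1000000u);

	return cmp_u32(t1_nsec, t2_nsec);
}

/* Precise comparison is the default; mixed comparison is opt-in. */
static int (*cmp_filetime)(uint32_t t1_sec, uint32_t t1_nsec,
			   uint32_t t2_sec, uint32_t t2_nsec)
	= cmp_filetime_precise;

/* Hypothetical hook, assumed to run after the config has been parsed. */
static void set_mixed_time_spec(int enabled)
{
	cmp_filetime = enabled ? cmp_filetime_mixed : cmp_filetime_precise;
}
```

With the default, cmp_filetime(100, 123000000, 100, 123000456) reports a difference. After set_mixed_time_spec(1), the same call returns 0, because 123000000 is indistinguishable from a millisecond-precision value. 999 of the 10^9 possible nsec values are nonzero multiples of 10^6, so the chance is small but not zero.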
Ciao,
Dscho