git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* broken racy detection and performance issues with nanosecond file times
@ 2015-09-25 23:28 Karsten Blees
  2015-09-28 10:39 ` [PATCH/RFC] read-cache: fix file time comparisons with different precisions Karsten Blees
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Karsten Blees @ 2015-09-25 23:28 UTC (permalink / raw)
  To: Git List

Hi there,

I think I found a few nasty problems with racy detection, as well as
performance issues when using git implementations with different file
time resolutions on the same repository (e.g. git compiled with and
without USE_NSEC, libgit2 compiled with and without USE_NSEC, JGit
executed in different Java implementations...).

Let me start by listing relevant file time gotchas (skip this if it
sounds too familiar) before diving into problem descriptions. Some
ideas for potential solutions are at the end.


Notable file time facts:
========================

The st_ctime discrepancy:
* stat.st_ctime means "change time" (of file metadata) on POSIX
  systems and "creation time" on Windows
* While some file systems may track all four time stamps (mtime,
  atime, change time and creation time), there are no public OS APIs
  to obtain creation time on POSIX / change time on Windows.

Linux:
* In-core file times may not be properly rounded to on-disk
  precision, causing spurious file time changes when the cache is
  refreshed from disk. This was fixed for typical Unix file systems
  in kernel 2.6.11. The fix for CEPH, CIFS, NTFS, UFS and FUSE will
  be in kernel 4.3. There's no fix for FAT-based file systems yet.
* Maximum file time precision is 1 ns (or 1 s with really old glibc).

Windows:
* Maximum file time precision is 100 ns.

Java <= 6:
* Only exposes mtime in milliseconds (via File.getLastModifiedTime).

Java >= 7:
* Only exposes mtime, atime and creation time, no change time (see
  java.nio.file.attribute.BasicFileAttributes).
* Maximum file time precision is implementation specific (OpenJDK:
  1 microsecond on both Unix [1] and Windows [2]).
* On platforms or file systems that don't support creation time,
  BasicFileAttribtes.creationTime() is implementation specific
  (OpenJDK returns mtime instead). There's no public API to detect
  whether creation time is supported or "emulated" in some way.

Git Options:
* NO_NSEC (git only): compile-time option that disables recording of
  nanoseconds in the index, implies USE_NSEC=false.
* USE_NSEC (git and libgit2 with [3]): compile-time option that
  enables nanosecond comparison in both up-to-date and racy checks.
* core.checkStat=minimal (git, libgit2, JGit): config-option that
  disables nanosecond comparison in up-to-date checks, but not in
  racy checks.

JGit:
* Only uses mtime, rounded to milliseconds. While there is a
  DirCacheEntry.setCreationTime() [4] to set the index entry's ctime
  field, AFAICT its not used anywhere.
* Does not compare nanoseconds if the cached value recorded in the
  index is 0, to prevent performance issues with NO_NSEC git
  implementations [5].


Problem 1: Failure to detect racy files (without USE_NSEC)
==========================================================

Git may not detect racy changes when 'update-index' runs in parallel
to work tree updates.

Consider this (where timestamps are t<seconds>.<nanoseconds>):

 t0.0$ echo "foo" > file1
 t0.1$ git update-index file1 &  # runs in background
 t0.2$ # update-index records stats and sha1 of file1 in new index
 t0.3$ echo "bar" > file1
 ....$ # update-index writes other index entries
 t1.0$ # update-index finishes (sets mtime of the new index to t1.0!)
 t1.1$ git status # doesn't detect that file1 has changed

The problem here is that racy checks in 'git status' compare against
the new index file's mtime (t1.0), which may be newer than the last
change of file1.


Problem 2: Failure to detect racy files (mixed USE_NSEC)
========================================================

Git may fail to detect racy conditions if file times in .git/index
have been recorded by another git implementation with better file
time resolution.

Consider the following sequence:

 t0.0$ echo "foo" > file1
 t0.1$ use-nsec-git update-index file1
 t0.2$ echo "bar" > file1
 ....$ sleep 1
 t1.0$ touch file2
 t1.1$ use-nsec-git status # rewrites index, to store file2 change
 t1.2$ git status # doesn't detect that file1 has changed

The problem here is that the first, nsec-enabled 'git status' does
not consider file1 racy (with nanosecond precision, the file is dirty
already (t0.0 != t0.2), so no racy-checks are performed). Thus, it
will not squash the size field (as a second-precision-git would).
However, it will rewrite the index to capture the status change of
file2, and thus create a new index file with mtime = t1.1. Similar
to problem 1, subsequent 'git status' with second-precision has no
way to detect that file1 has changed.

This problem would not be limited to USE_NSEC-enabled/disabled git,
it occurs whenever different file time resolutions are at play, e.g.:
 * second-based git vs. millisecond-based JGit
 * millisecond-based JGit vs. nanosecond-enabled git
 * GIT_WORK_TREE on ext2 (1 s) and GIT_DIR on ext4 (1 ns)
 * JGit executed by different Java implementations (with different
   file time resolutions)


Problem 3: Failure to detect racy files with core.checkStat=minimal
===================================================================

Consider the example above (problem 2). With core.checkStat=minimal,
the nanosecond-enabled git also fails to detect that file1 has
changed.

This is because racy checks are still done with nanosecond precision
(despite checkStat=minimal), and against the *cached* mtime, not the
real one. I.e.:
 * in match_stat_data(), nanoseconds are ignored, and file1 is
   considered unchanged (as t0[.0] == t0[.2]).
 * in ie_match_stat(), we pass the cache entry to is_racy_timestamp()
   (which has mtime == t0.0), even though we know the current mtime
   at this point (t0.2)
 * in is_racy_stat(), file1 is not considered racy, because the index
   file's mtime (t0.1) is newer than the cached mtime (t0.0)


Problem 4: Performance issues with mixed file time resolutions
==============================================================

A git implementation will consider files dirty (i.e. triggering a
content check) if the index entry has been recorded by another git
implementation with lower file time resolution.

Examples:

Git compiled with NO_NSEC writes index entries with nanosecond
fields == 0. A USE_NSEC-enabled git will consider these files dirty
(except in the rare case that on-disk nanoseconds of the file time
are really 0).

JGit writes index entries with mtime nanosecond fields rounded to
milliseconds. Again, a USE_NSEC-enabled git will consider the files
dirty.

JGit writes index entries with ctime seconds and nanoseconds == 0.
All other git implementations will consider such files dirty.


Ideas for potential solutions:
==============================

Performance issues:
-------------------

1. Compare file times in minimum supported precision
   When comparing file times, use the minimum precision supported by
   both the writing and reading git implementations.
1a. Simplest variant: Don't compare nanoseconds if the field in the
   cached index entry is 0. JGit already does this [5], but at the
   same time it is very unfriendly to USE_NSEC-enabled git by storing
   only milliseconds in the nanosecond field. This "simple" solution
   implies that git implementations that cannot provide full
   nanosecond precision must leave the nanosecond field empty.
1b. More involved: Store the precision in the index entry.
   We only need 30 bits to encode nanoseconds, so the high 2 bits of
   the nanosecond field could be used as follows:
   00: second precision (i.e. ignore, for backward compatibility)
   01: millisecond precision
   10: microsecond precision
   11: nanosecond precision
   When reading the index, USE-NSEC-enabled git implementations would
   do dirty checks with the minimum precision supported by themselves
   and the creator of the index entry.


2. Don't use ctime in dirty checks if ctime.sec == 0.


Racy detection:
---------------

3. Minimal racy solution
   * Do all racy checks with second-precision only.
   * When committing an index.lock file, reset mtime to the time
     before git started reading the old index (i.e. time(null) when
     calling read_cache()).

   I believe this should fix all three racy problems described above,
   although restraining ourselves to second-precision somewhat
   thwarts the ability to track nanoseconds in the first place.
   
   The problem with this solution is that files changed by git itself
   will appear racy to the next git process, thus increasing the
   performance penalty after e.g. a large checkout. Although I think
   that re-reading the file after the file's mtime is the only way to
   be really sure it hasn't been changed.


4. More ideas to solve the racy problem
   Conceptually, any changes that happen at the same time or after we
   start capturing information about a file may be missed by the
   recording process. Thus, a "safe" way to use file times for racy /
   dirty checks would be as follows:

     start_capture = filesystem(file).now()
     oid = read_sha1(file)
     mtime1 = lstat(file).mtime
     racy = mtime1 >= start_capture
     ...
     mtime2 = lstat(file).mtime
     check_content = mtime1 != mtime2 || racy

   Whereas Git currently does something like this:

     mtime1 = lstat(file).mtime
     oid = read_sha1(file)
     ...
     end_capture = lstat(index).mtime
     mtime2 = lstat(file).mtime
     check_content = mtime1 != mtime2 || mtime1 >= end_capture   

   One problem with this is that end_capture is only known after
   closing the index file, which is why currently, racy checks can
   only be done by the next git process that reads the index.

   Additionally, rewriting the index file changes its mtime and thus
   deprives subsequent git processes from doing racy checks. This is
   currently solved by squashing the size field of racy entries.
   Which means that a third git process needs to fill the size back
   in, rewriting the index again...

   I suspect that we could get away with fewer index rewrites if we
   did racy checks in the git process that initially updates the
   index entry. I.e.:
    * get start_capture from index.lock immediately after creating it
      (this ignores that index.lock may be on another file system
      with different file time precision than the work tree)
    * do racy checks immediately and store the results in the entry
    * to accommodate different file time precisions, the racy "flag"
      could indicate at which file time precision the entry would
      have to be considered racy. E.g.
        if (mtime1.sec < start_capture.sec)
          return NOT_RACY;
	else if (mtime1.sec > start_capture.sec || 
                 mtime1.nsec >= start_capture.nsec)
          return ALWAYS_RACY;
        else if (mtime1.nsec / 1000 == start_capture.nsec / 1000)
          return RACY_AT_USEC_MSEC_SEC;
        else if (mitme1.nsec / 1000000 == start_capture.nsec / 1000000)
          return RACY_AT_MSEC_SEC;
        else
          return RACY_AT_SEC;
    * for backward compatibility, we could still squash the size and
      store the original size + racy info in an index extension - on
      the other hand, reliable change detection is so fundamental to
      an SCM that we may want to keep racy info in the core index
      entry structure, probably even if it means a format change

   An advantage of this would be that when rewriting the index, git
   is no longer required to treat racy entries in any special way -
   if the git command is not interested in the racy entry, it can
   simply copy it to the next index file, without checking file
   content or squashing the size field. Commands like git status
   would only need to rewrite the index if the racy info changes
   (i.e. enough time has passed).


Please let me know what you think of this...maybe I've completely
screwed up and can no longer see the forest for all the trees.

TIA,
Karsten

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH/RFC] read-cache: fix file time comparisons with different precisions
  2015-09-25 23:28 broken racy detection and performance issues with nanosecond file times Karsten Blees
@ 2015-09-28 10:39 ` Karsten Blees
  2015-09-28 12:52   ` Johannes Schindelin
  2015-09-28 17:38 ` broken racy detection and performance issues with nanosecond file times Junio C Hamano
  2015-09-28 18:17 ` Junio C Hamano
  2 siblings, 1 reply; 8+ messages in thread
From: Karsten Blees @ 2015-09-28 10:39 UTC (permalink / raw)
  To: Git List

Different git variants record file times in the index with different
precisions, according to their capabilities. E.g. git compiled with NO_NSEC
records seconds only, JGit records the mtime in milliseconds, but leaves
ctime blank (because ctime is unavailable in Java).

This causes performance issues in git compiled with USE_NSEC, because index
entries with such 'incomplete' timestamps are considered dirty, triggering
unnecessary content checks.

Add a file time comparison function that auto-detects the precision based
on the number of trailing 0 digits, and compares with the lower precision
of both values. This initial version supports the known precisions seconds
(git + NO_NSEC), milliseconds (JGit) and nanoseconds (git + USE_NSEC), but
can be easily extended to e.g. microseconds.

Use the new comparison function in both dirty and racy checks. As a side
effect, this fixes racy detection in USE_NSEC-enabled git with
core.checkStat=minimal, as the coreStat setting now affects racy checks as
well.

Finally, do not check ctime if ctime.sec is 0 (as recorded by JGit).

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 read-cache.c | 62 ++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 21 deletions(-)


This patch fixes problems 3 and 4, by trying to auto-detect the recorded file
time precision.


diff --git a/read-cache.c b/read-cache.c
index 87204a5..3a4e6cd 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -99,23 +99,50 @@ void fill_stat_data(struct stat_data *sd, struct stat *st)
 	sd->sd_size = st->st_size;
 }
 
+/*
+ * Compares two file times. Returns 0 if equal, <0 if t1 < t2, >0 if t1 > t2.
+ * Auto-detects precision based on trailing 0 digits. Compares seconds only if
+ * core.checkStat=minimal.
+ */
+static inline int cmp_filetime(uint32_t t1_sec, uint32_t t1_nsec,
+			       uint32_t t2_sec, uint32_t t2_nsec) {
+#ifdef USE_NSEC
+	/*
+	 * Compare seconds and return result if different, or checkStat=mimimal,
+	 * or one of the time stamps has second precision only (nsec == 0).
+	 */
+	int diff = t1_sec - t2_sec;
+	if (diff || !check_stat || !t1_nsec || !t2_nsec)
+		return diff;
+
+	/*
+	 * Check if one of the time stamps has millisecond precision only (i.e.
+	 * the trailing 6 digits are 0). First check the trailing 6 bits so that
+	 * we only do (slower) modulo division if necessary.
+	 */
+	if ((!(t1_nsec & 0x3f) && !(t1_nsec % 1000000)) ||
+	    (!(t2_nsec & 0x3f) && !(t2_nsec % 1000000)))
+		/* Compare milliseconds. */
+		return (t1_nsec - t2_nsec) / 1000000;
+
+	/* Compare nanoseconds */
+	return t1_nsec - t2_nsec;
+#else
+	return t1_sec - t2_sec;
+#endif
+}
+
 int match_stat_data(const struct stat_data *sd, struct stat *st)
 {
 	int changed = 0;
 
-	if (sd->sd_mtime.sec != (unsigned int)st->st_mtime)
-		changed |= MTIME_CHANGED;
-	if (trust_ctime && check_stat &&
-	    sd->sd_ctime.sec != (unsigned int)st->st_ctime)
-		changed |= CTIME_CHANGED;
-
-#ifdef USE_NSEC
-	if (check_stat && sd->sd_mtime.nsec != ST_MTIME_NSEC(*st))
+	if (cmp_filetime(sd->sd_mtime.sec, sd->sd_mtime.nsec,
+			 (unsigned) st->st_mtime, ST_MTIME_NSEC(*st)))
 		changed |= MTIME_CHANGED;
-	if (trust_ctime && check_stat &&
-	    sd->sd_ctime.nsec != ST_CTIME_NSEC(*st))
+	if (trust_ctime && check_stat && sd->sd_ctime.sec &&
+	    cmp_filetime(sd->sd_ctime.sec, sd->sd_ctime.nsec,
+			 (unsigned) st->st_ctime, ST_CTIME_NSEC(*st)))
 		changed |= CTIME_CHANGED;
-#endif
 
 	if (check_stat) {
 		if (sd->sd_uid != (unsigned int) st->st_uid ||
@@ -276,16 +303,9 @@ static int ce_match_stat_basic(const struct cache_entry *ce, struct stat *st)
 static int is_racy_stat(const struct index_state *istate,
 			const struct stat_data *sd)
 {
-	return (istate->timestamp.sec &&
-#ifdef USE_NSEC
-		 /* nanosecond timestamped files can also be racy! */
-		(istate->timestamp.sec < sd->sd_mtime.sec ||
-		 (istate->timestamp.sec == sd->sd_mtime.sec &&
-		  istate->timestamp.nsec <= sd->sd_mtime.nsec))
-#else
-		istate->timestamp.sec <= sd->sd_mtime.sec
-#endif
-		);
+	return istate->timestamp.sec &&
+	       (cmp_filetime(istate->timestamp.sec, istate->timestamp.nsec,
+			     sd->sd_mtime.sec, sd->sd_mtime.nsec) <= 0);
 }
 
 static int is_racy_timestamp(const struct index_state *istate,
-- 
2.1.0.msysgit.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC] read-cache: fix file time comparisons with different precisions
  2015-09-28 10:39 ` [PATCH/RFC] read-cache: fix file time comparisons with different precisions Karsten Blees
@ 2015-09-28 12:52   ` Johannes Schindelin
  2015-09-29 10:23     ` Karsten Blees
  0 siblings, 1 reply; 8+ messages in thread
From: Johannes Schindelin @ 2015-09-28 12:52 UTC (permalink / raw)
  To: Karsten Blees; +Cc: Git List

Hi Karsten,

On 2015-09-28 12:39, Karsten Blees wrote:
> Different git variants record file times in the index with different
> precisions, according to their capabilities. E.g. git compiled with NO_NSEC
> records seconds only, JGit records the mtime in milliseconds, but leaves
> ctime blank (because ctime is unavailable in Java).
> 
> This causes performance issues in git compiled with USE_NSEC, because index
> entries with such 'incomplete' timestamps are considered dirty, triggering
> unnecessary content checks.
> 
> Add a file time comparison function that auto-detects the precision based
> on the number of trailing 0 digits, and compares with the lower precision
> of both values. This initial version supports the known precisions seconds
> (git + NO_NSEC), milliseconds (JGit) and nanoseconds (git + USE_NSEC), but
> can be easily extended to e.g. microseconds.
> 
> Use the new comparison function in both dirty and racy checks. As a side
> effect, this fixes racy detection in USE_NSEC-enabled git with
> core.checkStat=minimal, as the coreStat setting now affects racy checks as
> well.
> 
> Finally, do not check ctime if ctime.sec is 0 (as recorded by JGit).

Great analysis, and nice patch. I would like to offer one suggestion in addition:

> diff --git a/read-cache.c b/read-cache.c
> index 87204a5..3a4e6cd 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -99,23 +99,50 @@ void fill_stat_data(struct stat_data *sd, struct stat *st)
>  	sd->sd_size = st->st_size;
>  }
>  
> +/*
> + * Compares two file times. Returns 0 if equal, <0 if t1 < t2, >0 if t1 > t2.
> + * Auto-detects precision based on trailing 0 digits. Compares seconds only if
> + * core.checkStat=minimal.
> + */
> +static inline int cmp_filetime(uint32_t t1_sec, uint32_t t1_nsec,
> +			       uint32_t t2_sec, uint32_t t2_nsec) {
> +#ifdef USE_NSEC
> +	/*
> +	 * Compare seconds and return result if different, or checkStat=mimimal,
> +	 * or one of the time stamps has second precision only (nsec == 0).
> +	 */
> +	int diff = t1_sec - t2_sec;
> +	if (diff || !check_stat || !t1_nsec || !t2_nsec)
> +		return diff;
> +
> +	/*
> +	 * Check if one of the time stamps has millisecond precision only (i.e.
> +	 * the trailing 6 digits are 0). First check the trailing 6 bits so that
> +	 * we only do (slower) modulo division if necessary.
> +	 */
> +	if ((!(t1_nsec & 0x3f) && !(t1_nsec % 1000000)) ||
> +	    (!(t2_nsec & 0x3f) && !(t2_nsec % 1000000)))
> +		/* Compare milliseconds. */
> +		return (t1_nsec - t2_nsec) / 1000000;
> +
> +	/* Compare nanoseconds */
> +	return t1_nsec - t2_nsec;
> +#else
> +	return t1_sec - t2_sec;
> +#endif
> +}

As this affects only setups where the same repository is accessed via clients with different precision, would it make sense to hide this behind a config option? I.e. something like

static int cmp_filetime_precise(uint32_t t1_sec, uint32_t t1_nsec,
			        uint32_t t2_sec, uint32_t t2_nsec)
{
#ifdef USE_NSEC
	return t1_sec != t2_sec ? t1_sec - t2_sec : t1_nsec - t2_nsec;
#else
	return t1_sec - t2_sec;
#endif
}

static int cmp_filetime_mixed(uint32_t t1_sec, uint32_t t1_nsec,
			      uint32_t t2_sec, uint32_t t2_nsec)
{
#ifdef USE_NSEC
	... detect lower precision and compare with lower precision only...
#else
	return t1_sec - t2_sec;
#endif
}

static (int *)cmp_filetime(uint32_t t1_sec, uint32_t t1_nsec,
			   uint32_t t2_sec, uint32_t t2_nsec)
	= cmp_filetime_precise;

... modify cmp_filetime_precise if core.mixedTimeSpec = true...

Otherwise there would be that little loop-hole where (nsec % 1000) == 0 *by chance* and we assume the timestamps to be identical even if they are not.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: broken racy detection and performance issues with nanosecond file times
  2015-09-25 23:28 broken racy detection and performance issues with nanosecond file times Karsten Blees
  2015-09-28 10:39 ` [PATCH/RFC] read-cache: fix file time comparisons with different precisions Karsten Blees
@ 2015-09-28 17:38 ` Junio C Hamano
  2015-09-29 11:28   ` Karsten Blees
  2015-09-28 18:17 ` Junio C Hamano
  2 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2015-09-28 17:38 UTC (permalink / raw)
  To: Karsten Blees; +Cc: Git List

Karsten Blees <karsten.blees@gmail.com> writes:

> Problem 1: Failure to detect racy files (without USE_NSEC)
> ==========================================================
>
> Git may not detect racy changes when 'update-index' runs in parallel
> to work tree updates.
>
> Consider this (where timestamps are t<seconds>.<nanoseconds>):
>
>  t0.0$ echo "foo" > file1
>  t0.1$ git update-index file1 &  # runs in background

I just wonder after looking at the ampersand here ...

> Please let me know what you think of this...maybe I've completely
> screwed up and can no longer see the forest for all the trees.

... if your task would become much simpler if you declare "once you
give Git the control, do not muck with the repository until you get
the control back".

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: broken racy detection and performance issues with nanosecond file times
  2015-09-25 23:28 broken racy detection and performance issues with nanosecond file times Karsten Blees
  2015-09-28 10:39 ` [PATCH/RFC] read-cache: fix file time comparisons with different precisions Karsten Blees
  2015-09-28 17:38 ` broken racy detection and performance issues with nanosecond file times Junio C Hamano
@ 2015-09-28 18:17 ` Junio C Hamano
  2 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2015-09-28 18:17 UTC (permalink / raw)
  To: Karsten Blees; +Cc: Git List

Karsten Blees <karsten.blees@gmail.com> writes:

> Ideas for potential solutions:
> ==============================
>
> Performance issues:
> -------------------
>
> 1. Compare file times in minimum supported precision
>    When comparing file times, use the minimum precision supported by
>    both the writing and reading git implementations.
> 1a. Simplest variant: Don't compare nanoseconds if the field in the
>    cached index entry is 0. JGit already does this [5], but at the
>    same time it is very unfriendly to USE_NSEC-enabled git by storing
>    only milliseconds in the nanosecond field. This "simple" solution
>    implies that git implementations that cannot provide full
>    nanosecond precision must leave the nanosecond field empty.
> 1b. More involved: Store the precision in the index entry.
>    We only need 30 bits to encode nanoseconds, so the high 2 bits of
>    the nanosecond field could be used as follows:
>    00: second precision (i.e. ignore, for backward compatibility)
>    01: millisecond precision
>    10: microsecond precision
>    11: nanosecond precision
>    When reading the index, USE-NSEC-enabled git implementations would
>    do dirty checks with the minimum precision supported by themselves
>    and the creator of the index entry.

Yeah, my gut feeling is that we should make sure that at least 1a is
done by all implementations.

I agree that 1b. is a bit more involved in that all binary that was
built with USE_NSEC that is not aware of these 2-bits need to be
eradicated for a new version to be deployed --- the transition for
users who use multiple implementations will be a pain (those that
use just one implementation of Git can just say "rm -f .git/index &&
git reset --hard" or something after updating to the new version of
Git).

> 2. Don't use ctime in dirty checks if ctime.sec == 0.

OK.  That is slightly less drastic than !trust_ctime, I guess.

> Racy detection:
> ---------------
>
> 3. Minimal racy solution
>    * Do all racy checks with second-precision only.
>    * When committing an index.lock file, reset mtime to the time
>      before git started reading the old index (i.e. time(null) when
>      calling read_cache()).
>
>    I believe this should fix all three racy problems described above,
>    although restraining ourselves to second-precision somewhat
>    thwarts the ability to track nanoseconds in the first place.
>    
>    The problem with this solution is that files changed by git itself
>    will appear racy to the next git process, thus increasing the
>    performance penalty after e.g. a large checkout. Although I think
>    that re-reading the file after the file's mtime is the only way to
>    be really sure it hasn't been changed.

... the last of which is what is done anyway, so I think the above,
especally the second bullet-point, is all sensible.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC] read-cache: fix file time comparisons with different precisions
  2015-09-28 12:52   ` Johannes Schindelin
@ 2015-09-29 10:23     ` Karsten Blees
  2015-09-29 13:42       ` Johannes Schindelin
  0 siblings, 1 reply; 8+ messages in thread
From: Karsten Blees @ 2015-09-29 10:23 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Git List

Am 28.09.2015 um 14:52 schrieb Johannes Schindelin:
> Otherwise there would be that little loop-hole where (nsec % 1000) == 0 *by chance* and we assume the timestamps to be identical even if they are not.

Yeah, but in this case the file would be racy, as racy-checks use
the same comparison now.

IMO change detection is so fundamental that it should Just Work,
without having a plethora of config options that we need to explain
to end users.

If that means that once in a million cases we need an extra content
check to revalidate such falsely racy entries, that's fine with me.

Cheers,
Karsten

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: broken racy detection and performance issues with nanosecond file times
  2015-09-28 17:38 ` broken racy detection and performance issues with nanosecond file times Junio C Hamano
@ 2015-09-29 11:28   ` Karsten Blees
  0 siblings, 0 replies; 8+ messages in thread
From: Karsten Blees @ 2015-09-29 11:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List

Am 28.09.2015 um 19:38 schrieb Junio C Hamano:
> Karsten Blees <karsten.blees@gmail.com> writes:
> 
>> Problem 1: Failure to detect racy files (without USE_NSEC)
>> ==========================================================
>>
>> Git may not detect racy changes when 'update-index' runs in parallel
>> to work tree updates.
>>
>> Consider this (where timestamps are t<seconds>.<nanoseconds>):
>>
>>  t0.0$ echo "foo" > file1
>>  t0.1$ git update-index file1 &  # runs in background
> 
> I just wonder after looking at the ampersand here ...
> 
>> Please let me know what you think of this...maybe I've completely
>> screwed up and can no longer see the forest for all the trees.
> 
> ... if your task would become much simpler if you declare "once you
> give Git the control, do not muck with the repository until you get
> the control back".
> 

This is just to illustrate the problem. GUI-based applications will
often do things in the background that you cannot control. E.g. gitk,
git gui, Eclipse or TortoiseGit don't tell you when and how long you
shouldn't touch the working copy. At the same time, IntelliJ IDEA and
most office suits have the auto-save feature turned on by default,
and you cannot tell them when *not* to auto-save.

It may still be quite unlikely that this happens (you need two
changes within a second, without changing the file size), but *if*
it happens, the user may not even notice. And as git trusts the
false stat data blindly, the problem won't go away automatically.
You can mark all entries racy by setting index mtime to some value
far in the past, but this implies that you noticed that something
was wrong...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC] read-cache: fix file time comparisons with different precisions
  2015-09-29 10:23     ` Karsten Blees
@ 2015-09-29 13:42       ` Johannes Schindelin
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Schindelin @ 2015-09-29 13:42 UTC (permalink / raw)
  To: Karsten Blees; +Cc: Git List

Hi Karsten,

On 2015-09-29 12:23, Karsten Blees wrote:
> Am 28.09.2015 um 14:52 schrieb Johannes Schindelin:
>> Otherwise there would be that little loop-hole where (nsec % 1000) == 0 *by chance* and we assume the timestamps to be identical even if they are not.
> 
> Yeah, but in this case the file would be racy, as racy-checks use
> the same comparison now.

True.

> IMO change detection is so fundamental that it should Just Work,
> without having a plethora of config options that we need to explain
> to end users.
> 
> If that means that once in a million cases we need an extra content
> check to revalidate such falsely racy entries, that's fine with me.

You have a good point there. I retract my objections.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-09-29 13:42 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-25 23:28 broken racy detection and performance issues with nanosecond file times Karsten Blees
2015-09-28 10:39 ` [PATCH/RFC] read-cache: fix file time comparisons with different precisions Karsten Blees
2015-09-28 12:52   ` Johannes Schindelin
2015-09-29 10:23     ` Karsten Blees
2015-09-29 13:42       ` Johannes Schindelin
2015-09-28 17:38 ` broken racy detection and performance issues with nanosecond file times Junio C Hamano
2015-09-29 11:28   ` Karsten Blees
2015-09-28 18:17 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).