git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Fix double "close()" in ce_compare_data
@ 2006-07-31 16:55     ` Linus Torvalds
  2006-07-31 19:05       ` Junio C Hamano
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Linus Torvalds @ 2006-07-31 16:55 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List


Doing an "strace" on "git diff" shows that we close() a file descriptor 
twice (getting EBADFD on the second one) when we end up in ce_compare_data 
if the index does not match the checked-out stat information.

The "index_fd()" function will already have closed the fd for us, so we 
should not close it again.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

The way I found this also showed a potential performance problem: if you 
do a "git reset --hard" (or similar) after you have changes in your tree, 
it will write the index file with the same timestamp as the checked out 
files that it re-wrote.

That will also then forever afterwards (well, until the next "git 
update-index --refresh") cause the "uncommon" timestamp case in 
ce_match_stat(), where we check the index-file timestamp against the 
timestamp of the stat data, to trigger.

Not very good. The "ce_modified_check_fs()" tests can be quite expensive 
if you have lots of those files because we end up then calling the 
"ce_compare_data()" function a lot. And suddenly "git diff" doesn't take a 
tenth of a second any more.

We should really try to have some way to re-generate the index 
automatically when this case triggers, so that we only need to do it 
_once_ rather than keep doing it forever while the index is "potentially 
stale".

Any ideas?

diff --git a/read-cache.c b/read-cache.c
index c0b0313..f92cdaa 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -61,7 +61,7 @@ static int ce_compare_data(struct cache_
 		unsigned char sha1[20];
 		if (!index_fd(sha1, fd, st, 0, NULL))
 			match = memcmp(sha1, ce->sha1, 20);
-		close(fd);
+		/* index_fd() closed the file descriptor already */
 	}
 	return match;
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Fix double "close()" in ce_compare_data
  2006-07-31 16:55     ` Fix double "close()" in ce_compare_data Linus Torvalds
@ 2006-07-31 19:05       ` Junio C Hamano
  2006-08-05 11:20       ` [PATCH] Racy git: avoid having to be always too careful Junio C Hamano
  2006-08-15 20:12       ` [PATCH] Documentation/technical/racy-git.txt Junio C Hamano
  2 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2006-07-31 19:05 UTC (permalink / raw
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> The way I found this also showed a potential performance problem: if you 
> do a "git reset --hard" (or similar) after you have changes in your tree, 
> it will write the index file with the same timestamp as the checked out 
> files that it re-wrote.
>...
> We should really try to have some way to re-generate the index 
> automatically when this case triggers, so that we only need to do it 
> _once_ rather than keep doing it forever while the index is "potentially 
> stale".
>
> Any ideas?

Just before writing the index out, we could do a gettimeofday()
and wait for 1 second (or two if you are on FAT) if we have many
many paths that have the same timestamp.  Ugly and would not
work on NFS I suspect.

We could do gratuitous "update-index --refresh" but that is
uglier especially if done at the core level; the caller does not
expect the index file to be updated.

Or we could remove the check and tell the user not to touch the
working tree after calling update-index (I am not seriously
suggesting this).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] Racy git: avoid having to be always too careful
  2006-07-31 16:55     ` Fix double "close()" in ce_compare_data Linus Torvalds
  2006-07-31 19:05       ` Junio C Hamano
@ 2006-08-05 11:20       ` Junio C Hamano
  2006-08-08 15:43         ` Johannes Schindelin
  2006-08-15 20:12       ` [PATCH] Documentation/technical/racy-git.txt Junio C Hamano
  2 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2006-08-05 11:20 UTC (permalink / raw
  To: Linus Torvalds; +Cc: git

Immediately after a bulk checkout, most of the paths in the
working tree would have the same timestamp as the index file,
and this would force ce_match_stat() to take slow path for all
of them.  When writing an index file out, if many of the paths
have very new (read: the same timestamp as the index file being
written out) timestamp, we are better off delaying the return
from the command, to make sure that later command to touch the
working tree files will leave newer timestamps than recorded in
the index, thereby avoiding to take the slow path.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

Linus Torvalds <torvalds@osdl.org> writes:

> Doing an "strace" on "git diff" shows that we close() a file descriptor 
> twice (getting EBADFD on the second one) when we end up in ce_compare_data 
> if the index does not match the checked-out stat information.
>
> The "index_fd()" function will already have closed the fd for us, so we 
> should not close it again.
>
> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> ---
>
> The way I found this also showed a potential performance problem: if you 
> do a "git reset --hard" (or similar) after you have changes in your tree, 
> it will write the index file with the same timestamp as the checked out 
> files that it re-wrote.
>
> That will also then forever afterwards (well, until the next "git 
> update-index --refresh") cause the "uncommon" timestamp case in 
> ce_match_stat(), where we check the index-file timestamp against the 
> timestamp of the stat data, to trigger.
>
> Not very good. The "ce_modified_check_fs()" tests can be quite expensive 
> if you have lots of those files because we end up then calling the 
> "ce_compare_data()" function a lot. And suddenly "git diff" doesn't take a 
> tenth of a second any more.
>
> We should really try to have some way to re-generate the index 
> automatically when this case triggers, so that we only need to do it 
> _once_ rather than keep doing it forever while the index is "potentially 
> stale".
>
> Any ideas?

Here is what I came up with.  I am not very happy with its
magic, but should be better than nothing.

 read-cache.c |   44 ++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index f92cdaa..ce76c20 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -5,6 +5,7 @@
  */
 #include "cache.h"
 #include "cache-tree.h"
+#include <time.h>
 
 /* Index extensions.
  *
@@ -923,7 +924,7 @@ static void ce_smudge_racily_clean_entry
 		 * $ echo filfre >nitfol
 		 * $ git-update-index --add nitfol
 		 *
-		 * but it does not.  Whe the second update-index runs,
+		 * but it does not.  When the second update-index runs,
 		 * it notices that the entry "frotz" has the same timestamp
 		 * as index, and if we were to smudge it by resetting its
 		 * size to zero here, then the object name recorded
@@ -945,7 +946,9 @@ int write_cache(int newfd, struct cache_
 {
 	SHA_CTX c;
 	struct cache_header hdr;
-	int i, removed;
+	int i, removed, recent;
+	struct stat st;
+	time_t now;
 
 	for (i = removed = 0; i < entries; i++)
 		if (!cache[i]->ce_mode)
@@ -959,15 +962,19 @@ int write_cache(int newfd, struct cache_
 	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
 		return -1;
 
+	now = fstat(newfd, &st) ? 0 : st.st_mtime;
+	recent = 0;
 	for (i = 0; i < entries; i++) {
 		struct cache_entry *ce = cache[i];
+		time_t entry_time = (time_t) ntohl(ce->ce_mtime.sec);
 		if (!ce->ce_mode)
 			continue;
-		if (index_file_timestamp &&
-		    index_file_timestamp <= ntohl(ce->ce_mtime.sec))
+		if (index_file_timestamp && index_file_timestamp <= entry_time)
 			ce_smudge_racily_clean_entry(ce);
 		if (ce_write(&c, newfd, ce, ce_size(ce)) < 0)
 			return -1;
+		if (now && now <= entry_time)
+			recent++;
 	}
 
 	/* Write extension data here */
@@ -983,5 +990,34 @@ int write_cache(int newfd, struct cache_
 			return -1;
 		}
 	}
+
+	/*
+	 * To prevent later ce_match_stat() from always falling into
+	 * check_fs(), if we have too many entries that can trigger
+	 * racily clean check, we are better off delaying the return.
+	 * We arbitrarily say if more than 20 paths or 25% of total
+	 * paths are very new, we delay the return until the index
+	 * file gets a new timestamp.
+	 *
+	 * NOTE! NOTE! NOTE!
+	 *
+	 * This assumes that nobody is touching the working tree while
+	 * we are updating the index.
+	 */
+	if (20 < recent || entries <= recent * 4) {
+		now = fstat(newfd, &st) ? 0 : st.st_mtime;
+		while (now && !fstat(newfd, &st) && st.st_mtime <= now) {
+			struct timespec rq, rm;
+			off_t where = lseek(newfd, 0, SEEK_CUR);
+			rq.tv_sec = 0;
+			rq.tv_nsec = 250000000;
+			nanosleep(&rq, &rm);
+			if ((where == (off_t) -1) ||
+			    (write(newfd, "", 1) != 1) ||
+			    (lseek(newfd, -1, SEEK_CUR) != where) ||
+			    ftruncate(newfd, where))
+				break;
+		}
+	}
 	return ce_flush(&c, newfd);
 }
-- 
1.4.2.rc3.g19a8a

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] Racy git: avoid having to be always too careful
  2006-08-05 11:20       ` [PATCH] Racy git: avoid having to be always too careful Junio C Hamano
@ 2006-08-08 15:43         ` Johannes Schindelin
  2006-08-08 17:25           ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Johannes Schindelin @ 2006-08-08 15:43 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Hi,

On Sat, 5 Aug 2006, Junio C Hamano wrote:

> Immediately after a bulk checkout, most of the paths in the
> working tree would have the same timestamp as the index file,
> and this would force ce_match_stat() to take slow path for all
> of them.  When writing an index file out, if many of the paths
> have very new (read: the same timestamp as the index file being
> written out) timestamp, we are better off delaying the return
> from the command, to make sure that later command to touch the
> working tree files will leave newer timestamps than recorded in
> the index, thereby avoiding to take the slow path.

This makes "make test" dog slow. How about making this overrideable?
(It is just a guess, but I _think_ that the sleeping is worse than having 
to check the files with the same time stamp again and again -- a "git 
status" will help that).

"make test" without this patch:

	real    9m59.314s
	user    0m36.580s
	sys     0m39.290s

"make test" with this patch:

	real    1m36.429s
	user    0m37.410s
	sys     0m40.460s

-- 8< --
[PATCH] read-cache: optionally disable being cautious with racy caches

By setting the environment variable GIT_RISK_RACY_CACHE or the config
variable core.riskRacyCache, the sleeping (to avoid a racy cache) is
disabled. This is also used for "make test", drastically reducing the
needed time.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
 Documentation/config.txt |    4 ++++
 Documentation/git.txt    |    5 +++++
 cache.h                  |    1 +
 config.c                 |    4 ++++
 environment.c            |    1 +
 read-cache.c             |    3 ++-
 t/test-lib.sh            |    2 ++
 7 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d89916b..b61fd91 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -103,6 +103,10 @@ core.legacyheaders::
 	database directly (where the "http://" and "rsync://" protocols
 	count as direct access).
 
+core.riskRacyCache::
+	A boolean which allows to disable the 1-second sleeping to avoid
+	a racy cache.
+
 alias.*::
 	Command aliases for the gitlink:git[1] command wrapper - e.g.
 	after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/Documentation/git.txt b/Documentation/git.txt
index ba525d3..ff464e8 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -649,6 +649,11 @@ other
 	stderr telling about alias expansion, built-in command
 	execution and external command execution.
 
+'GIT_RISK_RACY_CACHE'::
+	If this variable is set, gitlink:git-read-cache[1] will never
+	sleep to avoid a racy cache by making the timestamps of the index
+	file newer than the files referenced in the index.
+
 Discussion[[Discussion]]
 ------------------------
 include::README[]
diff --git a/cache.h b/cache.h
index 19fdef6..e8bbe6d 100644
--- a/cache.h
+++ b/cache.h
@@ -174,6 +174,7 @@ #define REFRESH_UNMERGED	0x0002	/* allow
 #define REFRESH_QUIET		0x0004	/* be quiet about it */
 #define REFRESH_IGNORE_MISSING	0x0008	/* ignore non-existent */
 extern int refresh_cache(unsigned int flags);
+extern int risk_racy_cache;
 
 struct lock_file {
 	struct lock_file *next;
diff --git a/config.c b/config.c
index c6e6f6a..4ab9329 100644
--- a/config.c
+++ b/config.c
@@ -291,6 +291,10 @@ int git_default_config(const char *var, 
 		return 0;
 	}
 
+	if (!strcmp(var, "core.riskracycache")) {
+		risk_racy_cache = git_config_bool(var, value);
+	}
+
 	if (!strcmp(var, "user.name")) {
 		strlcpy(git_default_name, value, sizeof(git_default_name));
 		return 0;
diff --git a/environment.c b/environment.c
index 1ce3411..a481fc4 100644
--- a/environment.c
+++ b/environment.c
@@ -24,6 +24,7 @@ const char *apply_default_whitespace = N
 int zlib_compression_level = Z_DEFAULT_COMPRESSION;
 int pager_in_use;
 int pager_use_color = 1;
+int risk_racy_cache = 0;
 
 static int dyn_git_object_dir, dyn_git_index_file, dyn_git_graft_file;
 static char *git_dir, *git_object_dir, *git_index_file, *git_refs_dir,
diff --git a/read-cache.c b/read-cache.c
index d64b503..f20dac5 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1021,7 +1021,8 @@ int write_cache(int newfd, struct cache_
 	 * This assumes that nobody is touching the working tree while
 	 * we are updating the index.
 	 */
-	if (20 < recent || entries <= recent * 4) {
+	if (!risk_racy_cache && !getenv("GIT_RISK_RACY_CACHE") &&
+			(20 < recent || entries <= recent * 4)) {
 		now = fstat(newfd, &st) ? 0 : st.st_mtime;
 		while (now && !fstat(newfd, &st) && st.st_mtime <= now) {
 			struct timespec rq, rm;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index b6d119a..3b6882d 100755
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -197,6 +197,8 @@ # t/ subdirectory and are run in trash s
 PATH=$(pwd)/..:$PATH
 GIT_EXEC_PATH=$(pwd)/..
 export PATH GIT_EXEC_PATH
+GIT_RISK_RACY_CACHE=1
+export GIT_RISK_RACY_CACHE
 
 # Similarly use ../compat/subprocess.py if our python does not
 # have subprocess.py on its own.
-- 
1.4.2.rc3.g6b27

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] Racy git: avoid having to be always too careful
  2006-08-08 15:43         ` Johannes Schindelin
@ 2006-08-08 17:25           ` Junio C Hamano
  2006-08-08 22:30             ` Johannes Schindelin
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2006-08-08 17:25 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: Linus Torvalds, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> This makes "make test" dog slow. How about making this overrideable?
> (It is just a guess, but I _think_ that the sleeping is worse than having 
> to check the files with the same time stamp again and again -- a "git 
> status" will help that).
>
>
> -- 8< --
> [PATCH] read-cache: optionally disable being cautious with racy caches
>
> By setting the environment variable GIT_RISK_RACY_CACHE or the config
> variable core.riskRacyCache, the sleeping (to avoid a racy cache) is
> disabled.

"RISK" is a misnomer here -- the original code without patch is
being cautious and taking a runtime hit but not risky.  The
"delay while writing the cache out for the first time" is trying
to avoid the runtime hit by taking a hit at cache generation
time.  In either case you never risk racy cache.

Perhaps throwing away the whole "delay" thing might be simpler
and more worthwhile.

Another possibility is to tweak the heuristics -- currently we
say 20 paths or 1/4 of whole paths is too many and would cause
too much runtime hit but that was done without any measurement.
We could raise the threashold which would solve the case for the
testsuite whose trees are all small.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Racy git: avoid having to be always too careful
  2006-08-08 17:25           ` Junio C Hamano
@ 2006-08-08 22:30             ` Johannes Schindelin
  2006-08-08 23:07               ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Johannes Schindelin @ 2006-08-08 22:30 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Hi,

On Tue, 8 Aug 2006, Junio C Hamano wrote:

> Another possibility is to tweak the heuristics -- currently we
> say 20 paths or 1/4 of whole paths is too many and would cause
> too much runtime hit but that was done without any measurement.
> We could raise the threashold which would solve the case for the
> testsuite whose trees are all small.

Okay, how about 100 paths? And make _this_ a config variable?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Racy git: avoid having to be always too careful
  2006-08-08 22:30             ` Johannes Schindelin
@ 2006-08-08 23:07               ` Junio C Hamano
  2006-08-08 23:44                 ` Johannes Schindelin
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2006-08-08 23:07 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: Linus Torvalds, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Okay, how about 100 paths? And make _this_ a config variable?

Actually, I have a better patch I'll be pushing out in "next"
this evening.

Hint: ce_write() batches and buffers things, so "now" obtained
in the current code is unreliable.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Racy git: avoid having to be always too careful
  2006-08-08 23:07               ` Junio C Hamano
@ 2006-08-08 23:44                 ` Johannes Schindelin
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Schindelin @ 2006-08-08 23:44 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Hi,

On Tue, 8 Aug 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > Okay, how about 100 paths? And make _this_ a config variable?
> 
> Actually, I have a better patch I'll be pushing out in "next"
> this evening.

Pity. I just whipped up this patch...

-- 8< --
[PATCH] read-cache: introduce core.racyThreshold

The expensive sleep is done only when there are more than 100 updated
files now. This number is overrideable by the core.racyThreshold variable.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
 Documentation/config.txt |    5 +++++
 cache.h                  |    1 +
 config.c                 |    5 +++++
 environment.c            |    1 +
 read-cache.c             |    2 +-
 5 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d89916b..f428fb5 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -103,6 +103,11 @@ core.legacyheaders::
 	database directly (where the "http://" and "rsync://" protocols
 	count as direct access).
 
+core.racyThreshold::
+	The number of files which need to be updated at least, before
+	read-cache sleeps to make the timestamps of these files and the
+	timestamp of the index file distinct.
+
 alias.*::
 	Command aliases for the gitlink:git[1] command wrapper - e.g.
 	after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/cache.h b/cache.h
index 19fdef6..8c0a830 100644
--- a/cache.h
+++ b/cache.h
@@ -193,6 +193,7 @@ extern int warn_ambiguous_refs;
 extern int shared_repository;
 extern const char *apply_default_whitespace;
 extern int zlib_compression_level;
+extern int racy_threshold;
 
 #define GIT_REPO_VERSION 0
 extern int repository_format_version;
diff --git a/config.c b/config.c
index c6e6f6a..01b8c23 100644
--- a/config.c
+++ b/config.c
@@ -291,6 +291,11 @@ int git_default_config(const char *var, 
 		return 0;
 	}
 
+	if (!strcmp(var, "core.racythreshold")) {
+		racy_threshold = git_config_int(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "user.name")) {
 		strlcpy(git_default_name, value, sizeof(git_default_name));
 		return 0;
diff --git a/environment.c b/environment.c
index 1ce3411..3656f36 100644
--- a/environment.c
+++ b/environment.c
@@ -24,6 +24,7 @@ const char *apply_default_whitespace = N
 int zlib_compression_level = Z_DEFAULT_COMPRESSION;
 int pager_in_use;
 int pager_use_color = 1;
+int racy_threshold = 100;
 
 static int dyn_git_object_dir, dyn_git_index_file, dyn_git_graft_file;
 static char *git_dir, *git_object_dir, *git_index_file, *git_refs_dir,
diff --git a/read-cache.c b/read-cache.c
index d64b503..42a3f78 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1021,7 +1021,7 @@ int write_cache(int newfd, struct cache_
 	 * This assumes that nobody is touching the working tree while
 	 * we are updating the index.
 	 */
-	if (20 < recent || entries <= recent * 4) {
+	if (racy_threshold < recent) {
 		now = fstat(newfd, &st) ? 0 : st.st_mtime;
 		while (now && !fstat(newfd, &st) && st.st_mtime <= now) {
 			struct timespec rq, rm;
-- 
1.4.2.rc3.gfa72-dirty

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 0/6] Configuration tweaks for Solaris
@ 2006-08-15  9:00 Dennis Stosberg
  2006-08-15  9:01 ` [PATCH 1/6] Solaris has strlcpy() at least since version 8 Dennis Stosberg
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Dennis Stosberg @ 2006-08-15  9:00 UTC (permalink / raw
  To: git

Hello,

The current configure script fails to generate a working configuration
on Solaris for a number of different reasons.  With these patches on the
"next" branch "gmake clean configure && ./configure && gmake all test"
completes on Solaris 9 with Sun CC 5.8 without errors.

The second patch may be suitable for the "maint" branch as well. Without
it, at least t3800-mktag.sh fails.

Regards,
Dennis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] Solaris has strlcpy() at least since version 8
  2006-08-15  9:00 [PATCH 0/6] Configuration tweaks for Solaris Dennis Stosberg
@ 2006-08-15  9:01 ` Dennis Stosberg
  2006-08-15  9:01 ` [PATCH 2/6] Solaris does not support C99 format strings before version 10 Dennis Stosberg
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Dennis Stosberg @ 2006-08-15  9:01 UTC (permalink / raw
  To: git

See http://docs.sun.com/app/docs/doc/816-3321/6m9k23sjk?a=view

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>
---

 Makefile |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 66c4fcc..495631a 100644
--- a/Makefile
+++ b/Makefile
@@ -338,7 +338,6 @@ ifeq ($(uname_S),SunOS)
 	NEEDS_NSL = YesPlease
 	SHELL_PATH = /bin/bash
 	NO_STRCASESTR = YesPlease
-	NO_STRLCPY = YesPlease
 	ifeq ($(uname_R),5.8)
 		NEEDS_LIBICONV = YesPlease
 		NO_UNSETENV = YesPlease

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/6] Solaris does not support C99 format strings before version 10
  2006-08-15  9:00 [PATCH 0/6] Configuration tweaks for Solaris Dennis Stosberg
  2006-08-15  9:01 ` [PATCH 1/6] Solaris has strlcpy() at least since version 8 Dennis Stosberg
@ 2006-08-15  9:01 ` Dennis Stosberg
  2006-08-15  9:01 ` [PATCH 3/6] Look for sockaddr_storage in sys/socket.h Dennis Stosberg
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Dennis Stosberg @ 2006-08-15  9:01 UTC (permalink / raw
  To: git

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>
---

 Makefile |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index 495631a..3cb6531 100644
--- a/Makefile
+++ b/Makefile
@@ -342,10 +342,12 @@ ifeq ($(uname_S),SunOS)
 		NEEDS_LIBICONV = YesPlease
 		NO_UNSETENV = YesPlease
 		NO_SETENV = YesPlease
+		NO_C99_FORMAT = YesPlease
 	endif
 	ifeq ($(uname_R),5.9)
 		NO_UNSETENV = YesPlease
 		NO_SETENV = YesPlease
+		NO_C99_FORMAT = YesPlease
 	endif
 	INSTALL = ginstall
 	TAR = gtar

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/6] Look for sockaddr_storage in sys/socket.h
  2006-08-15  9:00 [PATCH 0/6] Configuration tweaks for Solaris Dennis Stosberg
  2006-08-15  9:01 ` [PATCH 1/6] Solaris has strlcpy() at least since version 8 Dennis Stosberg
  2006-08-15  9:01 ` [PATCH 2/6] Solaris does not support C99 format strings before version 10 Dennis Stosberg
@ 2006-08-15  9:01 ` Dennis Stosberg
  2006-08-15  9:01 ` [PATCH 4/6] Fix detection of ipv6 on Solaris Dennis Stosberg
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Dennis Stosberg @ 2006-08-15  9:01 UTC (permalink / raw
  To: git

On Solaris and the BSDs the definition of "struct sockaddr_storage"
is not available from "netinet/in.h".  On Solaris "sys/socket.h" is 
enough, at least OpenBSD needs "sys/types.h", too.

Using "sys/types.h" and "sys/socket.h" seems to be a more portable
way.

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>
---

 configure.ac |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index e890131..0321d43 100644
--- a/configure.ac
+++ b/configure.ac
@@ -181,8 +181,10 @@ # Define NO_SOCKADDR_STORAGE if your pla
 # sockaddr_storage.
 AC_CHECK_TYPE(struct sockaddr_storage,
 [NO_SOCKADDR_STORAGE=],
-[NO_SOCKADDR_STORAGE=YesPlease],
-[#include <netinet/in.h>])
+[NO_SOCKADDR_STORAGE=YesPlease],[
+#include <sys/types.h>
+#include <sys/socket.h>
+])
 AC_SUBST(NO_SOCKADDR_STORAGE)
 #
 # Define NO_IPV6 if you lack IPv6 support and getaddrinfo().

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/6] Fix detection of ipv6 on Solaris
  2006-08-15  9:00 [PATCH 0/6] Configuration tweaks for Solaris Dennis Stosberg
                   ` (2 preceding siblings ...)
  2006-08-15  9:01 ` [PATCH 3/6] Look for sockaddr_storage in sys/socket.h Dennis Stosberg
@ 2006-08-15  9:01 ` Dennis Stosberg
  2006-08-15  9:01 ` [PATCH 5/6] On Solaris nanosleep() is not in libc but in librt Dennis Stosberg
  2006-08-15  9:01 ` [PATCH 6/6] Fix compilation with Sun CC Dennis Stosberg
  5 siblings, 0 replies; 18+ messages in thread
From: Dennis Stosberg @ 2006-08-15  9:01 UTC (permalink / raw
  To: git

The configuration script detects whether linking with -lsocket is
necessary but doesn't add -lsocket to LIBS.  This lets the ipv6 test
fail.

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>
---

 configure.ac |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/configure.ac b/configure.ac
index 0321d43..36f9cd9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -154,6 +154,7 @@ AC_CHECK_LIB([c], [socket],
 [NEEDS_SOCKET=],
 [NEEDS_SOCKET=YesPlease])
 AC_SUBST(NEEDS_SOCKET)
+test -n "$NEEDS_SOCKET" && LIBS="$LIBS -lsocket"
 
 
 ## Checks for header files.

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/6] On Solaris nanosleep() is not in libc but in librt
  2006-08-15  9:00 [PATCH 0/6] Configuration tweaks for Solaris Dennis Stosberg
                   ` (3 preceding siblings ...)
  2006-08-15  9:01 ` [PATCH 4/6] Fix detection of ipv6 on Solaris Dennis Stosberg
@ 2006-08-15  9:01 ` Dennis Stosberg
  2006-08-15 10:35   ` Junio C Hamano
  2006-08-15  9:01 ` [PATCH 6/6] Fix compilation with Sun CC Dennis Stosberg
  5 siblings, 1 reply; 18+ messages in thread
From: Dennis Stosberg @ 2006-08-15  9:01 UTC (permalink / raw
  To: git

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>
---

 Makefile      |   11 +++++++++--
 config.mak.in |    1 +
 configure.ac  |   10 ++++++++--
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 3cb6531..d352901 100644
--- a/Makefile
+++ b/Makefile
@@ -67,8 +67,10 @@ # Define NEEDS_SSL_WITH_CRYPTO if you ne
 #
 # Define NEEDS_LIBICONV if linking with libc is not enough (Darwin).
 #
-# Define NEEDS_SOCKET if linking with libc is not enough (SunOS,
-# Patrick Mauritz).
+# Define NEEDS_SOCKET if linking with libc is not enough for socket()
+# (SunOS, Patrick Mauritz).
+#
+# Define NEEDS_RT if linking with libc is not enough for nanosleep() (SunOS)
 #
 # Define NO_MMAP if you want to avoid mmap.
 #
@@ -336,6 +338,7 @@ endif
 ifeq ($(uname_S),SunOS)
 	NEEDS_SOCKET = YesPlease
 	NEEDS_NSL = YesPlease
+	NEEDS_RT = YesPlease
 	SHELL_PATH = /bin/bash
 	NO_STRCASESTR = YesPlease
 	ifeq ($(uname_R),5.8)
@@ -479,6 +482,10 @@ ifdef NEEDS_NSL
 	EXTLIBS += -lnsl
 	SIMPLE_LIB += -lnsl
 endif
+ifdef NEEDS_RT
+	EXTLIBS += -lrt
+	SIMPLE_LIB += -lrt
+endif
 ifdef NO_D_TYPE_IN_DIRENT
 	BASIC_CFLAGS += -DNO_D_TYPE_IN_DIRENT
 endif
diff --git a/config.mak.in b/config.mak.in
index 369e611..038767e 100644
--- a/config.mak.in
+++ b/config.mak.in
@@ -29,6 +29,7 @@ NO_CURL=@NO_CURL@
 NO_EXPAT=@NO_EXPAT@
 NEEDS_LIBICONV=@NEEDS_LIBICONV@
 NEEDS_SOCKET=@NEEDS_SOCKET@
+NEEDS_RT=@NEEDS_RT@
 NO_D_INO_IN_DIRENT=@NO_D_INO_IN_DIRENT@
 NO_D_TYPE_IN_DIRENT=@NO_D_TYPE_IN_DIRENT@
 NO_SOCKADDR_STORAGE=@NO_SOCKADDR_STORAGE@
diff --git a/configure.ac b/configure.ac
index 36f9cd9..6f1d87a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -148,13 +148,19 @@ AC_CHECK_LIB([c], [iconv],
 [NEEDS_LIBICONV=YesPlease])
 AC_SUBST(NEEDS_LIBICONV)
 #
-# Define NEEDS_SOCKET if linking with libc is not enough (SunOS,
-# Patrick Mauritz).
+# Define NEEDS_SOCKET if linking with libc is not enough for socket()
+# (SunOS, Patrick Mauritz).
 AC_CHECK_LIB([c], [socket],
 [NEEDS_SOCKET=],
 [NEEDS_SOCKET=YesPlease])
 AC_SUBST(NEEDS_SOCKET)
 test -n "$NEEDS_SOCKET" && LIBS="$LIBS -lsocket"
+#
+# Define NEEDS_RT if linking with libc is not enough for nanosleep (SunOS)
+AC_CHECK_LIB([c], [nanosleep],
+[NEEDS_RT=],
+[NEEDS_RT=YesPlease])
+AC_SUBST(NEEDS_RT)
 
 
 ## Checks for header files.

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/6] Fix compilation with Sun CC
  2006-08-15  9:00 [PATCH 0/6] Configuration tweaks for Solaris Dennis Stosberg
                   ` (4 preceding siblings ...)
  2006-08-15  9:01 ` [PATCH 5/6] On Solaris nanosleep() is not in libc but in librt Dennis Stosberg
@ 2006-08-15  9:01 ` Dennis Stosberg
  5 siblings, 0 replies; 18+ messages in thread
From: Dennis Stosberg @ 2006-08-15  9:01 UTC (permalink / raw
  To: git

- Add the CFLAGS variable to config.mak.in to override the Makefile's
  default, which is gcc-specific and won't work with Sun CC.
- Prefer "cc" over "gcc", because Pasky's Git.pm will not compile with gcc
  on Solaris at all. On Linux and the free BSDs "cc" is linked to "gcc"
  anyway.
- Set correct flag to generate position-independent code.
- Add "-xO3" (= use default optimization level) to CFLAGS.

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>
---

 Makefile      |    6 +++++-
 config.mak.in |    2 ++
 configure.ac  |    9 ++++++++-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index d352901..aeefc4e 100644
--- a/Makefile
+++ b/Makefile
@@ -114,6 +114,7 @@ uname_P := $(shell sh -c 'uname -p 2>/de
 # CFLAGS and LDFLAGS are for the users to override from the command line.
 
 CFLAGS = -g -O2 -Wall
+PIC_FLAG = -fPIC
 LDFLAGS =
 ALL_CFLAGS = $(CFLAGS)
 ALL_LDFLAGS = $(LDFLAGS)
@@ -408,6 +409,9 @@ endif
 ifneq (,$(findstring arm,$(uname_M)))
 	ARM_SHA1 = YesPlease
 endif
+ifeq ($(uname_M),sun4u)
+	USE_PIC = YesPlease
+endif
 ifeq ($(uname_M),x86_64)
 	USE_PIC = YesPlease
 endif
@@ -554,7 +558,7 @@ endif
 endif
 endif
 ifdef USE_PIC
-	ALL_CFLAGS += -fPIC
+	ALL_CFLAGS += $(PIC_FLAG)
 endif
 ifdef NO_ACCURATE_DIFF
 	BASIC_CFLAGS += -DNO_ACCURATE_DIFF
diff --git a/config.mak.in b/config.mak.in
index 038767e..1fd5f7e 100644
--- a/config.mak.in
+++ b/config.mak.in
@@ -2,6 +2,8 @@ # git Makefile configuration, included i
 # @configure_input@
 
 CC = @CC@
+CFLAGS = @CFLAGS@
+PIC_FLAG = @PIC_FLAG@
 AR = @AR@
 TAR = @TAR@
 #INSTALL = @INSTALL@		# needs install-sh or install.sh in sources
diff --git a/configure.ac b/configure.ac
index 6f1d87a..427ac23 100644
--- a/configure.ac
+++ b/configure.ac
@@ -95,7 +95,14 @@ AC_SUBST(PYTHON_PATH)
 ## Checks for programs.
 AC_MSG_NOTICE([CHECKS for programs])
 #
-AC_PROG_CC
+AC_PROG_CC([cc gcc])
+if test -n "$GCC"; then
+	PIC_FLAG="-fPIC"
+else
+	AC_CHECK_DECL(__SUNPRO_C, [CFLAGS="$CFLAGS -xO3"; PIC_FLAG="-KPIC"])
+fi
+AC_SUBST(PIC_FLAG)
+
 #AC_PROG_INSTALL		# needs install-sh or install.sh in sources
 AC_CHECK_TOOL(AR, ar, :)
 AC_CHECK_PROGS(TAR, [gtar tar])

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] On Solaris nanosleep() is not in libc but in librt
  2006-08-15  9:01 ` [PATCH 5/6] On Solaris nanosleep() is not in libc but in librt Dennis Stosberg
@ 2006-08-15 10:35   ` Junio C Hamano
  2006-07-31 16:55     ` Fix double "close()" in ce_compare_data Linus Torvalds
  2006-08-15 11:18     ` [PATCH 5/6] On Solaris nanosleep() is not in libc but in librt Alex Riesen
  0 siblings, 2 replies; 18+ messages in thread
From: Junio C Hamano @ 2006-08-15 10:35 UTC (permalink / raw
  To: Dennis Stosberg; +Cc: git

Dennis Stosberg <dennis@stosberg.net> writes:

> -# Define NEEDS_SOCKET if linking with libc is not enough (SunOS,
> -# Patrick Mauritz).
> +# Define NEEDS_SOCKET if linking with libc is not enough for socket()
> +# (SunOS, Patrick Mauritz).
> +#
> +# Define NEEDS_RT if linking with libc is not enough for nanosleep() (SunOS)

Ah, nanosleep(2) was my fault, and we should be able to just use
straight sleep(3) there.  The purpose of the loop is to wait
until the next filesystem timestamp granularity, and the code
uses subsecond sleep in the hope that it can shorten the delay
to 0.5 seconds on average instead of a full second.

How exotic is -lrt on SunOS?  I suspect it is not worth
depending on it only for that single use in read-cache.c

We might want to yank out the whole "racy-git avoidance is
costly later so let's delay writing the index out" codepath
later, but that is a separate issue and needs some testing on
large trees to figure it out.  After playing with the kernel
tree, I have a feeling that the whole thing may not be worth
it.

In any case, an obvious tentative patch is here.

diff --git a/read-cache.c b/read-cache.c
index b18f9f7..ec4dd5a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -5,7 +5,6 @@
  */
 #include "cache.h"
 #include "cache-tree.h"
-#include <time.h>
 
 /* Index extensions.
  *
@@ -1033,11 +1032,8 @@ #if 0
 			fprintf(stderr, "now        %lu\n", now);
 #endif
 			while (!fstat(newfd, &st) && st.st_mtime <= now) {
-				struct timespec rq, rm;
 				off_t where = lseek(newfd, 0, SEEK_CUR);
-				rq.tv_sec = 0;
-				rq.tv_nsec = 250000000;
-				nanosleep(&rq, &rm);
+				sleep(1);
 				if ((where == (off_t) -1) ||
 				    (write(newfd, "", 1) != 1) ||
 				    (lseek(newfd, -1, SEEK_CUR) != where) ||

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] On Solaris nanosleep() is not in libc but in librt
  2006-08-15 10:35   ` Junio C Hamano
  2006-07-31 16:55     ` Fix double "close()" in ce_compare_data Linus Torvalds
@ 2006-08-15 11:18     ` Alex Riesen
  1 sibling, 0 replies; 18+ messages in thread
From: Alex Riesen @ 2006-08-15 11:18 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Dennis Stosberg, git

On 8/15/06, Junio C Hamano <junkio@cox.net> wrote:
> > -# Define NEEDS_SOCKET if linking with libc is not enough (SunOS,
> > -# Patrick Mauritz).
> > +# Define NEEDS_SOCKET if linking with libc is not enough for socket()
> > +# (SunOS, Patrick Mauritz).
> > +#
> > +# Define NEEDS_RT if linking with libc is not enough for nanosleep() (SunOS)
>
> Ah, nanosleep(2) was my fault, and we should be able to just use
> straight sleep(3) there.  The purpose of the loop is to wait
> until the next filesystem timestamp granularity, and the code
> uses subsecond sleep in the hope that it can shorten the delay
> to 0.5 seconds on average instead of a full second.

Was it not SunOS where sleep was implemented by means of SIGALRM?
Besides, we still can shorten the delay by using select(2).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] Documentation/technical/racy-git.txt
  2006-07-31 16:55     ` Fix double "close()" in ce_compare_data Linus Torvalds
  2006-07-31 19:05       ` Junio C Hamano
  2006-08-05 11:20       ` [PATCH] Racy git: avoid having to be always too careful Junio C Hamano
@ 2006-08-15 20:12       ` Junio C Hamano
  2 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2006-08-15 20:12 UTC (permalink / raw
  To: Linus Torvalds; +Cc: git, Johannes Schindelin


Signed-off-by: Junio C Hamano <junkio@cox.net>
---

   Junio C Hamano <junkio@cox.net> writes:

   > We might want to yank out the whole "racy-git avoidance is
   > costly later so let's delay writing the index out" codepath
   > later, but that is a separate issue and needs some testing on
   > large trees to figure it out.  After playing with the kernel
   > tree, I have a feeling that the whole thing may not be worth
   > it.

   Addressed to Linus because he originally brought up this issue
   in <Pine.LNX.4.64.0607310945490.4168@g5.osdl.org>, Johannes
   CC'ed because he had some comments earlier on the same topic
   and he is generally a good person to talk to when I have
   doubts on issues ;-).

 Documentation/technical/racy-git.txt |  193 ++++++++++++++++++++++++++++++++++
 1 files changed, 193 insertions(+), 0 deletions(-)

diff --git a/Documentation/technical/racy-git.txt b/Documentation/technical/racy-git.txt
new file mode 100644
index 0000000..7597d04
--- /dev/null
+++ b/Documentation/technical/racy-git.txt
@@ -0,0 +1,193 @@
+Use of index and Racy git problem
+=================================
+
+Background
+----------
+
+The index is one of the most important data structure in git.
+It represents a virtual working tree state by recording list of
+paths and their object names and serves as a staging area to
+write out the next tree object to be committed.  The state is
+"virtual" in the sense that it does not necessarily have to, and
+often does not, match the files in the working tree.
+
+There are cases git needs to examine the differences between the
+virtual working tree state in the index and the files in the
+working tree.  The most obvious case is when the user asks `git
+diff` (or its low level implementation, `git diff-files`) or
+`git-ls-files --modified`.  In addition, git internally checks
+if the files in the working tree is different from what are
+recorded in the index to avoid stomping on local changes in them
+during patch application, switching branches, and merging.
+
+In order to speed up this comparison between the files in the
+working tree and the index entries, the index entries record the
+information obtained from the filesystem via `lstat(2)` system
+call when they were last updated.  When checking if they differ,
+git first runs `lstat(2)` on the files and compare the result
+with this information (this is what was originally done by the
+`ce_match_stat()` function, which the current code does in
+`ce_match_stat_basic()` function).  If some of these "cached
+stat information" fields do not match, git can tell that the
+files are modified without even looking at their contents.
+
+Note: not all members in `struct stat` obtained via `lstat(2)`
+are used for this comparison.  For example, `st_atime` obviously
+is not useful.  Currently, git compares the file type (regular
+files vs symbolic links) and executable bits (only for regular
+files) from `st_mode` member, `st_mtime` and `st_ctime`
+timestamps, `st_uid`, `st_gid`, `st_ino`, and `st_size` members.
+With a `USE_STDEV` compile-time option, `st_dev` is also
+compared, but this is not enabled by default because this member
+is not stable on network filesystems.  With `USE_NSEC`
+compile-time option, `st_mtim.tv_nsec` and `st_ctim.tv_nsec`
+members are also compared, but this is not enabled by default
+because the value of this member becomes meaningless once the
+inode is evicted from the inode cache on filesystems that do not
+store it on disk.
+
+
+Racy git
+--------
+
+There is one slight problem with the optimization based on the
+cached stat information.  Consider this sequence:
+
+  $ git update-index 'foo'
+  : modify 'foo' in-place without changing its size
+
+The first `update-index` computes the object name of the
+contents of file `foo` and updates the index entry for `foo`
+along with the `struct stat` information.  If the modification
+that follows it happens very fast so that the file's `st_mtime`
+timestamp does not change, after this sequence, the cached stat
+information the index entry records still exactly match what you
+can obtain from the filesystem, but the file `foo` is modified.
+This way, git can incorrectly think files in the working tree
+are unmodified even though they actually are.  This is called
+the "racy git" problem (discovered by Pasky), and the entries
+that appear clean when they may not be because of this problem
+are called "racily clean".
+
+To avoid this problem, git does two things:
+
+. When the cached stat information says the file has not been
+  modified, and the `st_mtime` is the same as (or newer than)
+  the timestamp of the index file itself (which is the time `git
+  update-index foo` finished running in the above example), it
+  also compares the contents with the object registered in the
+  index entry to make sure they match.
+
+. When the index file is updated that contains racily clean
+  entries, cached `st_size` information is truncated to zero
+  before writing a new version of the index file.
+
+Because the index file itself is written after collecting all
+the stat information from updated paths, `st_mtime` timestamp of
+it is usually the same as or newer than any of the paths the
+index contains.  And no matter how quick the modification that
+follows `git update-index foo` finishes, the resulting
+`st_mtime` timestamp on `foo` cannot get the timestamp earlier
+than the index file.  Therefore, index entries that can be
+racily clean are limited to the ones that have the same
+timestamp as the index file itself.
+
+The callers that want to check if an index entry matches the
+corresponding file in the working tree continue to call
+`ce_match_stat()`, but with this change, `ce_match_stat()` uses
+`ce_modified_check_fs()` to see if racily clean ones are
+actually clean after comparing the cached stat information using
+`ce_match_stat_basic()`.
+
+The problem the latter solves is this sequence:
+
+  $ git update-index 'foo'
+  : modify 'foo' in-place without changing its size
+  : wait for enough time
+  $ git update-index 'bar'
+
+Without the latter, the timestamp of the index file gets a newer
+value, and falsely clean entry `foo` would not be caught by the
+timestamp comparison check done with the former logic anymore.
+The latter makes sure that the cached stat information for `foo`
+would never match with the file in the working tree, so later
+checks by `ce_match_stat_basic()` would report the index entry
+does not match the file and git does not have to fall back on more
+expensive `ce_modified_check_fs()`.
+
+
+Runtime penalty
+---------------
+
+The runtime penalty of falling back to `ce_modified_check_fs()`
+from `ce_match_stat()` can be very expensive when there are many
+racily clean entries.  An obvious way to artificially create
+this situation is to give the same timestamp to all the files in
+the working tree in a large project, run `git update-index` on
+them, and give the same timestamp to the index file:
+
+  $ date >.datestamp
+  $ git ls-files | xargs touch -r .datestamp
+  $ git ls-files | git update-index --stdin
+  $ touch -r .datestamp .git/index
+
+This will make all index entries racily clean.  The linux-2.6
+project, for example, there are over 20,000 files in the working
+tree.  On my Athron 64X2 3800+, after the above:
+
+  $ /usr/bin/time git diff-files
+  1.68user 0.54system 0:02.22elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
+  0inputs+0outputs (0major+67111minor)pagefaults 0swaps
+  $ git update-index MAINTAINERS
+  $ /usr/bin/time git diff-files
+  0.02user 0.12system 0:00.14elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
+  0inputs+0outputs (0major+935minor)pagefaults 0swaps
+
+Running `git update-index` in the middle checked the racily
+clean entries, and left the cached `st_mtime` for all the paths
+intact because they were actually clean (so this step took about
+the same amount of time as the first `git diff-files`).  After
+that, they are not racily clean anymore but are truly clean, so
+the second invocation of `git diff-files` fully took advantage
+of the cached stat information.
+
+
+Avoiding runtime penalty
+------------------------
+
+In order to avoid the above runtime penalty, the recent "master"
+branch (post 1.4.2) has a code that makes sure the index file
+gets timestamp newer than the youngest files in the index when
+there are many young files with the same timestamp as the
+resulting index file would otherwise would have by waiting
+before finishing writing the index file out.
+
+I suspect that in practice the situation where many paths in the
+index are all racily clean is quite rare.  The only code paths
+that can record recent timestamp for large number of paths I
+know of are:
+
+. Initial `git add .` of a large project.
+
+. `git checkout` of a large project from an empty index into an
+  unpopulated working tree.
+
+Note: switching branches with `git checkout` keeps the cached
+stat information of existing working tree files that are the
+same between the current branch and the new branch, which are
+all older than the resulting index file, and they will not
+become racily clean.  Only the files that are actually checked
+out can become racily clean.
+
+In a large project where raciness avoidance cost really matters,
+however, the initial computation of all object names in the
+index takes more than one second, and the index file is written
+out after all that happens.  Therefore the timestamp of the
+index file will be more than one seconds later than the the
+youngest file in the working tree.  This means that in these
+cases there actually will not be any racily clean entry in
+the resulting index.
+
+So in summary I think we should not worry about avoiding the
+runtime penalty and get rid of the "wait before finishing
+writing" code out.
-- 
1.4.2.g59bb

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2006-08-15 20:12 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-15  9:00 [PATCH 0/6] Configuration tweaks for Solaris Dennis Stosberg
2006-08-15  9:01 ` [PATCH 1/6] Solaris has strlcpy() at least since version 8 Dennis Stosberg
2006-08-15  9:01 ` [PATCH 2/6] Solaris does not support C99 format strings before version 10 Dennis Stosberg
2006-08-15  9:01 ` [PATCH 3/6] Look for sockaddr_storage in sys/socket.h Dennis Stosberg
2006-08-15  9:01 ` [PATCH 4/6] Fix detection of ipv6 on Solaris Dennis Stosberg
2006-08-15  9:01 ` [PATCH 5/6] On Solaris nanosleep() is not in libc but in librt Dennis Stosberg
2006-08-15 10:35   ` Junio C Hamano
2006-07-31 16:55     ` Fix double "close()" in ce_compare_data Linus Torvalds
2006-07-31 19:05       ` Junio C Hamano
2006-08-05 11:20       ` [PATCH] Racy git: avoid having to be always too careful Junio C Hamano
2006-08-08 15:43         ` Johannes Schindelin
2006-08-08 17:25           ` Junio C Hamano
2006-08-08 22:30             ` Johannes Schindelin
2006-08-08 23:07               ` Junio C Hamano
2006-08-08 23:44                 ` Johannes Schindelin
2006-08-15 20:12       ` [PATCH] Documentation/technical/racy-git.txt Junio C Hamano
2006-08-15 11:18     ` [PATCH 5/6] On Solaris nanosleep() is not in libc but in librt Alex Riesen
2006-08-15  9:01 ` [PATCH 6/6] Fix compilation with Sun CC Dennis Stosberg

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).