git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] Avoid accessing a slow working copy during diffcore operations.
@ 2006-12-14 11:15 Shawn O. Pearce
  2006-12-14 13:57 ` Alex Riesen
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Shawn O. Pearce @ 2006-12-14 11:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Alex Riesen

The Cygwin folks have done a fine job at creating a POSIX layer
on Windows That Just Works(tm).  However it comes with a penalty;
accessing files in the working tree by way of stat/open/mmap can
be slower for diffcore than inflating the data from a blob which
is stored in a packfile.

This performance problem is especially an issue in merge-recursive
when dealing with nearly 7000 added files, as we are loading
each file's content from the working directory to perform rename
detection.  I have literally seen (and sadly watched) paint dry in
less time than it takes for merge-recursive to finish such a merge.
On the other hand this very same merge runs very fast on Solaris.

If Git is compiled with NO_FAST_WORKING_DIRECTORY set then we will
avoid looking at the working directory when the blob in question
is available within a packfile and the caller doesn't need the data
unpacked into a temporary file.

We don't use loose objects as they have the same open/mmap/close
costs as the working directory file access, but have the additional
CPU overhead of needing to inflate the content before use.  So it
is still faster to use the working tree file over the loose object.

If the caller needs the file data unpacked into a temporary file
its likely because they are going to call an external diff program,
passing the file as a parameter.  In this case reusing the working
tree file will be faster as we don't need to inflate the data and
write it out to a temporary file.

The NO_FAST_WORKING_DIRECTORY feature is enabled by default on
Cygwin, as that is the platform which currently appears to benefit
the most from this option.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 Makefile |    7 +++++++
 diff.c   |   20 +++++++++++++++++---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index bf4c65d..cd9992a 100644
--- a/Makefile
+++ b/Makefile
@@ -69,6 +69,9 @@ all:
 #
 # Define NO_MMAP if you want to avoid mmap.
 #
+# Define NO_FAST_WORKING_DIRECTORY if accessing objects in pack files is
+# generally faster on your platform than accessing the working directory.
+#
 # Define NO_IPV6 if you lack IPv6 support and getaddrinfo().
 #
 # Define NO_SOCKADDR_STORAGE if your platform does not have struct
@@ -355,6 +358,7 @@ ifeq ($(uname_O),Cygwin)
 	NO_SYMLINK_HEAD = YesPlease
 	NEEDS_LIBICONV = YesPlease
 	NO_C99_FORMAT = YesPlease
+	NO_FAST_WORKING_DIRECTORY = UnfortunatelyYes
 	# There are conflicting reports about this.
 	# On some boxes NO_MMAP is needed, and not so elsewhere.
 	# Try uncommenting this if you see things break -- YMMV.
@@ -506,6 +510,9 @@ ifdef NO_MMAP
 	COMPAT_CFLAGS += -DNO_MMAP
 	COMPAT_OBJS += compat/mmap.o
 endif
+ifdef NO_FAST_WORKING_DIRECTORY
+	BASIC_CFLAGS += -DNO_FAST_WORKING_DIRECTORY
+endif
 ifdef NO_IPV6
 	BASIC_CFLAGS += -DNO_IPV6
 endif
diff --git a/diff.c b/diff.c
index 0b284b3..565b23c 100644
--- a/diff.c
+++ b/diff.c
@@ -1172,7 +1172,7 @@ void fill_filespec(struct diff_filespec *spec, const unsigned char *sha1,
  * the work tree has that object contents, return true, so that
  * prepare_temp_file() does not have to inflate and extract.
  */
-static int work_tree_matches(const char *name, const unsigned char *sha1)
+static int work_tree_matches(const char *name, const unsigned char *sha1, int want_file)
 {
 	struct cache_entry *ce;
 	struct stat st;
@@ -1193,6 +1193,20 @@ static int work_tree_matches(const char *name, const unsigned char *sha1)
 	if (!active_cache)
 		return 0;
 
+#ifdef NO_FAST_WORKING_DIRECTORY
+	/* We want to avoid the working directory if our caller
+	 * doesn't need the data in a normal file, this system
+	 * is rather slow with its stat/open/mmap/close syscalls,
+	 * and the object is contained in a pack file.  The pack
+	 * is probably already open and will be faster to obtain
+	 * the data through than the working directory.  Loose
+	 * objects however would tend to be slower as they need
+	 * to be individually opened and inflated.
+	 */
+	if (!want_file && has_sha1_pack(sha1, NULL))
+		return 0;
+#endif
+
 	len = strlen(name);
 	pos = cache_name_pos(name, len);
 	if (pos < 0)
@@ -1279,7 +1293,7 @@ int diff_populate_filespec(struct diff_filespec *s, int size_only)
 	if (s->data)
 		return err;
 	if (!s->sha1_valid ||
-	    work_tree_matches(s->path, s->sha1)) {
+	    work_tree_matches(s->path, s->sha1, 0)) {
 		struct stat st;
 		int fd;
 		if (lstat(s->path, &st) < 0) {
@@ -1386,7 +1400,7 @@ static void prepare_temp_file(const char *name,
 	}
 
 	if (!one->sha1_valid ||
-	    work_tree_matches(name, one->sha1)) {
+	    work_tree_matches(name, one->sha1, 1)) {
 		struct stat st;
 		if (lstat(name, &st) < 0) {
 			if (errno == ENOENT)
-- 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] Avoid accessing a slow working copy during diffcore operations.
  2006-12-14 11:15 [PATCH] Avoid accessing a slow working copy during diffcore operations Shawn O. Pearce
@ 2006-12-14 13:57 ` Alex Riesen
  2006-12-14 14:12   ` Johannes Schindelin
  2006-12-15  9:55 ` Jakub Narebski
  2006-12-16  6:03 ` Junio C Hamano
  2 siblings, 1 reply; 10+ messages in thread
From: Alex Riesen @ 2006-12-14 13:57 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git

On 12/14/06, Shawn O. Pearce <spearce@spearce.org> wrote:
> If Git is compiled with NO_FAST_WORKING_DIRECTORY set then we will
> avoid looking at the working directory when the blob in question
> is available within a packfile and the caller doesn't need the data
> unpacked into a temporary file.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Avoid accessing a slow working copy during diffcore operations.
  2006-12-14 13:57 ` Alex Riesen
@ 2006-12-14 14:12   ` Johannes Schindelin
  2006-12-14 14:49     ` Alex Riesen
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin @ 2006-12-14 14:12 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Shawn O. Pearce, Junio C Hamano, git

Hi,

On Thu, 14 Dec 2006, Alex Riesen wrote:

> On 12/14/06, Shawn O. Pearce <spearce@spearce.org> wrote:
> > If Git is compiled with NO_FAST_WORKING_DIRECTORY set then we will
> > avoid looking at the working directory when the blob in question
> > is available within a packfile and the caller doesn't need the data
> > unpacked into a temporary file.
> 
> Why can't it be useful in generic code? What are the downsides?

It is usually cheaper to just read the file, especially if it is still 
cached, because the alternative means unpacking the loose object, or 
worse, unpacking the packed object _along_ with the objects in its delta 
chain.

Not every OS sucks cache-wise, and you should not make others suffer for 
Redmond's shortcomings.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Avoid accessing a slow working copy during diffcore operations.
  2006-12-14 14:12   ` Johannes Schindelin
@ 2006-12-14 14:49     ` Alex Riesen
  2006-12-14 15:37       ` Johannes Schindelin
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Riesen @ 2006-12-14 14:49 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Shawn O. Pearce, Junio C Hamano, git

On 12/14/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > > If Git is compiled with NO_FAST_WORKING_DIRECTORY set then we will
> > > avoid looking at the working directory when the blob in question
> > > is available within a packfile and the caller doesn't need the data
> > > unpacked into a temporary file.
> >
> > Why can't it be useful in generic code? What are the downsides?
>
> It is usually cheaper to just read the file, especially if it is still
> cached, because the alternative means unpacking the loose object, or
> worse, unpacking the packed object _along_ with the objects in its delta
> chain.

But you have to read less, and even that could be in cache as well
and unpacking in userspace could be faster than open/write temporary/
read temporary/close/unlink temporary file on a normal system

> Not every OS sucks cache-wise, and you should not make others suffer for
> Redmond's shortcomings.

I'm just do not understand why avoiding temporary file wouldn't help

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Avoid accessing a slow working copy during diffcore operations.
  2006-12-14 14:49     ` Alex Riesen
@ 2006-12-14 15:37       ` Johannes Schindelin
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2006-12-14 15:37 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Shawn O. Pearce, Junio C Hamano, git

Hi,

On Thu, 14 Dec 2006, Alex Riesen wrote:

> On 12/14/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > 
> > It is usually cheaper to just read the file, especially if it is still 
> > cached, because the alternative means unpacking the loose object, or 
> > worse, unpacking the packed object _along_ with the objects in its 
> > delta chain.
> 
> But you have to read less, and even that could be in cache as well and 
> unpacking in userspace could be faster than open/write temporary/ read 
> temporary/close/unlink temporary file on a normal system

You have to unpack anyway, since even the loose objects are packed. But to 
reconstruct a deltified object, you have to reconstruct possibly many 
objects.

So yes, if you have it in the working directory (unpacked), it should be 
faster to just read it, especially if it is still in the filesystem cache.

> > Not every OS sucks cache-wise, and you should not make others suffer 
> > for Redmond's shortcomings.
> 
> I'm just do not understand why avoiding temporary file wouldn't help all 
> OSes, even if they do not suck cache-wise.

Ah! But it is not temporary! It is the "working copy", which means the 
file you have in the working directory. IOW it is an unpacked blob, which 
happens to be already unpacked anyway.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Avoid accessing a slow working copy during diffcore operations.
  2006-12-14 11:15 [PATCH] Avoid accessing a slow working copy during diffcore operations Shawn O. Pearce
  2006-12-14 13:57 ` Alex Riesen
@ 2006-12-15  9:55 ` Jakub Narebski
  2006-12-15 15:05   ` Shawn Pearce
  2006-12-16  6:03 ` Junio C Hamano
  2 siblings, 1 reply; 10+ messages in thread
From: Jakub Narebski @ 2006-12-15  9:55 UTC (permalink / raw)
  To: git

Shawn O. Pearce wrote:

> +# Define NO_FAST_WORKING_DIRECTORY if accessing objects in pack files is
> +# generally faster on your platform than accessing the working directory.
> +#

Hmmm... I wonder if generated by autoconf ./configure script can autodetect
that. Probably not.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Avoid accessing a slow working copy during diffcore operations.
  2006-12-15  9:55 ` Jakub Narebski
@ 2006-12-15 15:05   ` Shawn Pearce
  0 siblings, 0 replies; 10+ messages in thread
From: Shawn Pearce @ 2006-12-15 15:05 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:
> Shawn O. Pearce wrote:
> 
> > +# Define NO_FAST_WORKING_DIRECTORY if accessing objects in pack files is
> > +# generally faster on your platform than accessing the working directory.
> > +#
> 
> Hmmm... I wonder if generated by autoconf ./configure script can autodetect
> that. Probably not.

I think it would be difficult to perform an automatic performance
test prior to building Git to determine how we should build Git on
the current platform.  Besides, Cygwin users tend to not use the
./configure script as it takes longer than just using the Makefile.

AFAIK the only "OS" which suffers from this problem is Windows.
Anything else that Git has been ported too has reasonably fast
syscalls for file IO, which means this should be unset.

-- 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Avoid accessing a slow working copy during diffcore operations.
  2006-12-14 11:15 [PATCH] Avoid accessing a slow working copy during diffcore operations Shawn O. Pearce
  2006-12-14 13:57 ` Alex Riesen
  2006-12-15  9:55 ` Jakub Narebski
@ 2006-12-16  6:03 ` Junio C Hamano
  2006-12-16 11:54   ` Johannes Schindelin
  2 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2006-12-16  6:03 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git, Alex Riesen

"Shawn O. Pearce" <spearce@spearce.org> writes:

> If Git is compiled with NO_FAST_WORKING_DIRECTORY set then we will
> avoid looking at the working directory when the blob in question
> is available within a packfile and the caller doesn't need the data
> unpacked into a temporary file.

I'd take the patch as is, but...

> -static int work_tree_matches(const char *name, const unsigned char *sha1)
> +static int work_tree_matches(const char *name, const unsigned char *sha1, int want_file)

this feels wrong.  It is not about "work tree matches" anymore.
reuse_worktree_copy(), perhaps.

> @@ -1193,6 +1193,20 @@ static int work_tree_matches(const char *name, const unsigned char *sha1)
>  	if (!active_cache)
>  		return 0;
>  
> +#ifdef NO_FAST_WORKING_DIRECTORY
> +	/* We want to avoid the working directory if our caller
> +	 * doesn't need the data in a normal file, this system
> +	 * is rather slow with its stat/open/mmap/close syscalls,
> +	 * and the object is contained in a pack file.  The pack
> +	 * is probably already open and will be faster to obtain
> +	 * the data through than the working directory.  Loose
> +	 * objects however would tend to be slower as they need
> +	 * to be individually opened and inflated.
> +	 */
> +	if (!want_file && has_sha1_pack(sha1, NULL))
> +		return 0;
> +#endif
> +

Also I'd prefer doing this without #ifdef;

        if (defined(NO_FAST_WORKING_DIRECTORY) &&
        	!want_file && has_sha1_pack(sha1, NULL))
		return 0;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Avoid accessing a slow working copy during diffcore operations.
  2006-12-16  6:03 ` Junio C Hamano
@ 2006-12-16 11:54   ` Johannes Schindelin
  2006-12-16 13:38     ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin @ 2006-12-16 11:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git, Alex Riesen

Hi,

On Fri, 15 Dec 2006, Junio C Hamano wrote:

> Also I'd prefer doing this without #ifdef;
> 
>         if (defined(NO_FAST_WORKING_DIRECTORY) &&
>         	!want_file && has_sha1_pack(sha1, NULL))
> 		return 0;

Are you sure? AFAIU it is an OS dependent problem, so it should not be 
configurable at runtime anyway.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Avoid accessing a slow working copy during diffcore operations.
  2006-12-16 11:54   ` Johannes Schindelin
@ 2006-12-16 13:38     ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2006-12-16 13:38 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Shawn O. Pearce, git, Alex Riesen

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Fri, 15 Dec 2006, Junio C Hamano wrote:
>
>> Also I'd prefer doing this without #ifdef;
>> 
>>         if (defined(NO_FAST_WORKING_DIRECTORY) &&
>>         	!want_file && has_sha1_pack(sha1, NULL))
>> 		return 0;
>
> Are you sure? AFAIU it is an OS dependent problem, so it should not be 
> configurable at runtime anyway.

Well, defined(SYMBOL) does not seem to be a valid C X-<, and I
rewrote it in what I committed.  But the point was that the
whole thing:

	if (!FAST_WORKING_DIRECTORY && whatever)
		statement;

when FAST_WORKING_DIRECTORY is a compilation time constant,
would be optimized out for sane platforms with a reasonable
compiler.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2006-12-16 13:38 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-12-14 11:15 [PATCH] Avoid accessing a slow working copy during diffcore operations Shawn O. Pearce
2006-12-14 13:57 ` Alex Riesen
2006-12-14 14:12   ` Johannes Schindelin
2006-12-14 14:49     ` Alex Riesen
2006-12-14 15:37       ` Johannes Schindelin
2006-12-15  9:55 ` Jakub Narebski
2006-12-15 15:05   ` Shawn Pearce
2006-12-16  6:03 ` Junio C Hamano
2006-12-16 11:54   ` Johannes Schindelin
2006-12-16 13:38     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).