git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Checking out on a different+partial directory
@ 2010-05-17 10:53 Eli Barzilay
  2010-05-22  5:48 ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Eli Barzilay @ 2010-05-17 10:53 UTC (permalink / raw
  To: git

Say that I have a repository at /some/dir, and a tree that contains
most of its files at /another/dir (the second is a build directory,
made with `git archive', so some files removed due to export-ignore
attributes and some built files are present).

Is there a convenient way to make /some/dir usable as a repository?
Two things that I tried are

  git --work-tree=/another/dir reset --hard master

which one time, but then failed with "fatal: unable to read tree...",
and another is

  cp -a /some/dir/.git /another/dir
  cd /another/dir
  git reset --hard master

which looks like it can suffer from the same problem.

(It would be especially nice if there's a way to have only different
files touched in /another/dir.)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Checking out on a different+partial directory
  2010-05-17 10:53 Checking out on a different+partial directory Eli Barzilay
@ 2010-05-22  5:48 ` Jeff King
  2010-05-22  6:43   ` [PATCH] fix over-eager caching in sha1_file_name Jeff King
  2010-05-27  5:10   ` Checking out on a different+partial directory Eli Barzilay
  0 siblings, 2 replies; 6+ messages in thread
From: Jeff King @ 2010-05-22  5:48 UTC (permalink / raw
  To: Eli Barzilay; +Cc: git

On Mon, May 17, 2010 at 06:53:55AM -0400, Eli Barzilay wrote:

> Say that I have a repository at /some/dir, and a tree that contains
> most of its files at /another/dir (the second is a build directory,
> made with `git archive', so some files removed due to export-ignore
> attributes and some built files are present).
> 
> Is there a convenient way to make /some/dir usable as a repository?

Did you mean /another/dir? /some/dir is already a repository as per your
description above (is it a bare repository or a regular one?).

> Two things that I tried are
> 
>   git --work-tree=/another/dir reset --hard master
> 
> which one time, but then failed with "fatal: unable to read tree...",
> and another is

I would have thought that worked, assuming you were in /some/dir. And
oddly, _some_ stuff works. I tried:

  mkdir repo && cd repo && git init
  echo content >file && git add . && git commit -m file
  git archive --format=tar --prefix=work/ HEAD | tar -C .. -xf -

to create the situation. Running

  cd repo
  git --work-tree=$PWD/../work diff-files
  git --work-tree=$PWD/../work diff

works as expected (the first one notices stat-dirtiness, and the second
shows an empty diff). But then I get:

  $ git --work-tree=$PWD/../work reset --hard
  fatal: unable to read tree ea394696ee1f7f5b55ca0d97d37748933cf39095

So there is clearly a bug. I'll investigate.

>   cp -a /some/dir/.git /another/dir
>   cd /another/dir
>   git reset --hard master
> 
> which looks like it can suffer from the same problem.

That should work, too.

> (It would be especially nice if there's a way to have only different
> files touched in /another/dir.)

Only different files will be rewritten, but git will have to read all of
the files to determine their sha1 (usually it avoids this by checking
stat info, but obviously your exported files will not match the stat
info in the /another/dir's index).

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] fix over-eager caching in sha1_file_name
  2010-05-22  5:48 ` Jeff King
@ 2010-05-22  6:43   ` Jeff King
  2010-05-22  6:59     ` [PATCH] remove " Jeff King
  2010-05-27  5:10   ` Checking out on a different+partial directory Eli Barzilay
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff King @ 2010-05-22  6:43 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Eli Barzilay, git

On Sat, May 22, 2010 at 01:48:52AM -0400, Jeff King wrote:

> I would have thought that worked, assuming you were in /some/dir. And
> oddly, _some_ stuff works. I tried:
> 
>   mkdir repo && cd repo && git init
>   echo content >file && git add . && git commit -m file
>   git archive --format=tar --prefix=work/ HEAD | tar -C .. -xf -
> 
> to create the situation. Running
> 
>   cd repo
>   git --work-tree=$PWD/../work diff-files
>   git --work-tree=$PWD/../work diff
> 
> works as expected (the first one notices stat-dirtiness, and the second
> shows an empty diff). But then I get:
> 
>   $ git --work-tree=$PWD/../work reset --hard
>   fatal: unable to read tree ea394696ee1f7f5b55ca0d97d37748933cf39095
> 
> So there is clearly a bug. I'll investigate.

Ugh. This code dates back to the first two weeks of git, but didn't
become a bug until sometime in the intervening 5 years. See below for
the explanation and fix.

-- >8 --
Subject: [PATCH] fix over-eager caching in sha1_file_name

This function takes a sha1 and produces a loose object
filename. It caches the location of the object directory so
that it can fill the sha1 information directly without
allocating a new buffer (and in its original incarnation,
without calling getenv(), though these days we cache that
with the code in environment.c).

This cached base directory can become stale, however, if in
a single process git changes the location of the object
directory (e.g., by running setup_work_tree, which will
chdir to the new worktree).

In most cases this isn't a problem, because we tend to set
up the git repository location and do any chdir()s before
actually looking up any objects, so the first lookup will
cache the correct location. In the case of reset --hard,
however, we do something like:

  1. look up the commit object

  2. notice we are doing --hard, run setup_work_tree

  3. look up the tree object to reset

Step (3) fails because our cache object directory value is
bogus.

This patch takes the minimalist fix. It retains the caching,
but checks the validity of our object directory against the
one cached in environment.c, which adds only a single
function call and a pointer comparison to the fast path.

Signed-off-by: Jeff King <peff@peff.net>
---
As I noted above, this is the minimal fix. I think it would be more
readable, though, to simply remove this caching layer altogether and use
a static buffer. I suspect the original was just trying to avoid the
slow getenv() call, which is no longer an issue now. We can probably
afford an snprintf. I'll post that patch shortly.

 sha1_file.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index d8e61a6..f047b3c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -103,10 +103,12 @@ static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
 char *sha1_file_name(const unsigned char *sha1)
 {
 	static char *name, *base;
+	static const char *sha1_file_directory;
 
-	if (!base) {
-		const char *sha1_file_directory = get_object_directory();
+	if (!base || sha1_file_directory != get_object_directory()) {
+		sha1_file_directory = get_object_directory();
 		int len = strlen(sha1_file_directory);
+		free(base);
 		base = xmalloc(len + 60);
 		memcpy(base, sha1_file_directory, len);
 		memset(base+len, 0, 60);
-- 
1.7.1.352.gb63a7.dirty

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] remove over-eager caching in sha1_file_name
  2010-05-22  6:43   ` [PATCH] fix over-eager caching in sha1_file_name Jeff King
@ 2010-05-22  6:59     ` Jeff King
  2010-05-26  5:07       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2010-05-22  6:59 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Eli Barzilay, git

On Sat, May 22, 2010 at 02:43:08AM -0400, Jeff King wrote:

> This patch takes the minimalist fix. It retains the caching,
> but checks the validity of our object directory against the
> one cached in environment.c, which adds only a single
> function call and a pointer comparison to the fast path.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> As I noted above, this is the minimal fix. I think it would be more
> readable, though, to simply remove this caching layer altogether and use
> a static buffer. I suspect the original was just trying to avoid the
> slow getenv() call, which is no longer an issue now. We can probably
> afford an snprintf. I'll post that patch shortly.

Actually, thinking on it more, micro-optimizing this is really
pointless. I was thinking that it would get called a lot, so we need to
care. But the next step is almost certainly to open, mmap, and zlib
decompress the resulting object, which is way more expensive.

So here it is with caching ripped out. More readable and more robust,
and I'm sure we can afford an extra strlen() and memcpy() on each object
lookup.

-- >8 --
Subject: [PATCH] remove over-eager caching in sha1_file_name

This function takes a sha1 and produces a loose object
filename. It caches the location of the object directory so
that it can fill the sha1 information directly without
allocating a new buffer (and in its original incarnation,
without calling getenv(), though these days we cache that
with the code in environment.c).

This cached base directory can become stale, however, if in
a single process git changes the location of the object
directory (e.g., by running setup_work_tree, which will
chdir to the new worktree).

In most cases this isn't a problem, because we tend to set
up the git repository location and do any chdir()s before
actually looking up any objects, so the first lookup will
cache the correct location. In the case of reset --hard,
however, we do something like:

  1. look up the commit object

  2. notice we are doing --hard, run setup_work_tree

  3. look up the tree object to reset

Step (3) fails because our cache object directory value is
bogus.

This patch simply removes the caching. We use a static
buffer instead of allocating one each time (the original
version treated the malloc'd buffer as a static, so there is
no change in calling semantics).

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c |   28 +++++++++++++++-------------
 1 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index d8e61a6..e42ef96 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -102,20 +102,22 @@ static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
  */
 char *sha1_file_name(const unsigned char *sha1)
 {
-	static char *name, *base;
+	static char buf[PATH_MAX];
+	const char *objdir;
+	int len;
 
-	if (!base) {
-		const char *sha1_file_directory = get_object_directory();
-		int len = strlen(sha1_file_directory);
-		base = xmalloc(len + 60);
-		memcpy(base, sha1_file_directory, len);
-		memset(base+len, 0, 60);
-		base[len] = '/';
-		base[len+3] = '/';
-		name = base + len + 1;
-	}
-	fill_sha1_path(name, sha1);
-	return base;
+	objdir = get_object_directory();
+	len = strlen(objdir);
+
+	/* '/' + sha1(2) + '/' + sha1(38) + '\0' */
+	if (len + 43 > PATH_MAX)
+		die("insanely long object directory %s", objdir);
+	memcpy(buf, objdir, len);
+	buf[len] = '/';
+	buf[len+3] = '/';
+	buf[len+42] = '\0';
+	fill_sha1_path(buf + len + 1, sha1);
+	return buf;
 }
 
 static char *sha1_get_pack_name(const unsigned char *sha1,
-- 
1.7.1.227.ge187a.dirty

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] remove over-eager caching in sha1_file_name
  2010-05-22  6:59     ` [PATCH] remove " Jeff King
@ 2010-05-26  5:07       ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2010-05-26  5:07 UTC (permalink / raw
  To: Jeff King; +Cc: Eli Barzilay, git

Jeff King <peff@peff.net> writes:

> Actually, thinking on it more, micro-optimizing this is really
> pointless. I was thinking that it would get called a lot, so we need to
> care. But the next step is almost certainly to open, mmap, and zlib
> decompress the resulting object, which is way more expensive.
>
> So here it is with caching ripped out. More readable and more robust,
> and I'm sure we can afford an extra strlen() and memcpy() on each object
> lookup.

Makes sense; thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Checking out on a different+partial directory
  2010-05-22  5:48 ` Jeff King
  2010-05-22  6:43   ` [PATCH] fix over-eager caching in sha1_file_name Jeff King
@ 2010-05-27  5:10   ` Eli Barzilay
  1 sibling, 0 replies; 6+ messages in thread
From: Eli Barzilay @ 2010-05-27  5:10 UTC (permalink / raw
  To: git

Jeff King <peff@peff.net> writes:

> On Mon, May 17, 2010 at 06:53:55AM -0400, Eli Barzilay wrote:
>
>> Say that I have a repository at /some/dir, and a tree that contains
>> most of its files at /another/dir (the second is a build directory,
>> made with `git archive', so some files removed due to export-ignore
>> attributes and some built files are present).
>> 
>> Is there a convenient way to make /some/dir usable as a repository?
>
> Did you mean /another/dir? /some/dir is already a repository as per your
> description above

Yes, sorry for the confusion...


> (is it a bare repository or a regular one?).

It's a regular one.


>> Two things that I tried are
>> 
>>   git --work-tree=/another/dir reset --hard master
>> 
>> which one time, but then failed with "fatal: unable to read tree...",
>> and another is
>
> I would have thought that worked, assuming you were in /some/dir. And
> oddly, _some_ stuff works. I tried:
>
> [...]
>
> So there is clearly a bug. I'll investigate.

Thanks!


>>   cp -a /some/dir/.git /another/dir
>>   cd /another/dir
>>   git reset --hard master
>> 
>> which looks like it can suffer from the same problem.
>
> That should work, too.

Heh, I just assumed that I'm doing something similarly wrong in both
cases...  In any case, I finally settled on doing things the other
way: grab all of the files that were created by the built tree and add
them to the repository directory (that is -- get stuff from
/another/dir into /some/dir).  It's a little less convenient for me,
so I'll probably switch back to the above, given that it should work.


>> (It would be especially nice if there's a way to have only
>> different files touched in /another/dir.)
>
> Only different files will be rewritten, but git will have to read all of
> the files to determine their sha1 (usually it avoids this by checking
> stat info, but obviously your exported files will not match the stat
> info in the /another/dir's index).

Yes, I expected this cost...

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-05-27  5:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-17 10:53 Checking out on a different+partial directory Eli Barzilay
2010-05-22  5:48 ` Jeff King
2010-05-22  6:43   ` [PATCH] fix over-eager caching in sha1_file_name Jeff King
2010-05-22  6:59     ` [PATCH] remove " Jeff King
2010-05-26  5:07       ` Junio C Hamano
2010-05-27  5:10   ` Checking out on a different+partial directory Eli Barzilay

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).