git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Work around too many file descriptors
@ 2010-11-01 22:54 Shawn O. Pearce
  2010-11-01 22:54 ` [PATCH 1/2] Use git_open_noatime when accessing pack data Shawn O. Pearce
  2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce
  0 siblings, 2 replies; 11+ messages in thread
From: Shawn O. Pearce @ 2010-11-01 22:54 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

This tiny series tries to work around not having enough file
descriptors to open every pack in a repository.  It is the logical
conclusion of Johannes Schindelin's commit fd73ccf2 ("Cope better
with a _lot_ of packs").

The patch was prepared on top of gitster's recent two changes to
improve error handling in read_sha1_file().

Shawn O. Pearce (2):
  Use git_open_noatime when accessing pack data
  Work around EMFILE when there are too many pack files

 sha1_file.c |   43 ++++++++++++++++++++++++++++---------------
 1 files changed, 28 insertions(+), 15 deletions(-)

-- 
1.7.3.2.191.g2d0e5

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] Use git_open_noatime when accessing pack data
  2010-11-01 22:54 [PATCH 0/2] Work around too many file descriptors Shawn O. Pearce
@ 2010-11-01 22:54 ` Shawn O. Pearce
  2010-11-03 17:07   ` Junio C Hamano
  2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce
  1 sibling, 1 reply; 11+ messages in thread
From: Shawn O. Pearce @ 2010-11-01 22:54 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

This utility function avoids an unnecessary update of the access time
for a loose object file.  Just as the atime isn't useful on a loose
object, its not useful on the pack or the corresonding idx file.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 sha1_file.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index f709ed6..a6c1934 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -35,6 +35,8 @@ static size_t sz_fmt(size_t s) { return s; }
 
 const unsigned char null_sha1[20];
 
+static int git_open_noatime(const char *name);
+
 int safe_create_leading_directories(char *path)
 {
 	char *pos = path + offset_1st_component(path);
@@ -298,7 +300,7 @@ static void read_info_alternates(const char * relative_base, int depth)
 	int fd;
 
 	sprintf(path, "%s/%s", relative_base, alt_file_name);
-	fd = open(path, O_RDONLY);
+	fd = git_open_noatime(path);
 	if (fd < 0)
 		return;
 	if (fstat(fd, &st) || (st.st_size == 0)) {
@@ -411,7 +413,7 @@ static int check_packed_git_idx(const char *path,  struct packed_git *p)
 	struct pack_idx_header *hdr;
 	size_t idx_size;
 	uint32_t version, nr, i, *index;
-	int fd = open(path, O_RDONLY);
+	int fd = git_open_noatime(path);
 	struct stat st;
 
 	if (fd < 0)
@@ -655,9 +657,9 @@ static int open_packed_git_1(struct packed_git *p)
 	if (!p->index_data && open_pack_index(p))
 		return error("packfile %s index unavailable", p->pack_name);
 
-	p->pack_fd = open(p->pack_name, O_RDONLY);
+	p->pack_fd = git_open_noatime(p->pack_name);
 	while (p->pack_fd < 0 && errno == EMFILE && unuse_one_window(p, -1))
-		p->pack_fd = open(p->pack_name, O_RDONLY);
+		p->pack_fd = git_open_noatime(p->pack_name);
 	if (p->pack_fd < 0 || fstat(p->pack_fd, &st))
 		return -1;
 
-- 
1.7.3.2.191.g2d0e5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] Work around EMFILE when there are too many pack files
  2010-11-01 22:54 [PATCH 0/2] Work around too many file descriptors Shawn O. Pearce
  2010-11-01 22:54 ` [PATCH 1/2] Use git_open_noatime when accessing pack data Shawn O. Pearce
@ 2010-11-01 22:54 ` Shawn O. Pearce
  2010-11-02  8:44   ` Johannes Sixt
  2010-11-03 17:06   ` Junio C Hamano
  1 sibling, 2 replies; 11+ messages in thread
From: Shawn O. Pearce @ 2010-11-01 22:54 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

When opening any files in the object database, release unused pack
windows if the open(2) syscall fails due to EMFILE (too many open
files in this process).  This allows Git to degrade gracefully on
a repository with thousands of pack files, and a commit stored in
a loose object in the middle of the history.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 sha1_file.c |   43 +++++++++++++++++++++++++++----------------
 1 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index a6c1934..43d68e0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -35,7 +35,7 @@ static size_t sz_fmt(size_t s) { return s; }
 
 const unsigned char null_sha1[20];
 
-static int git_open_noatime(const char *name);
+static int git_open_noatime(const char *name, struct packed_git *p);
 
 int safe_create_leading_directories(char *path)
 {
@@ -300,7 +300,7 @@ static void read_info_alternates(const char * relative_base, int depth)
 	int fd;
 
 	sprintf(path, "%s/%s", relative_base, alt_file_name);
-	fd = git_open_noatime(path);
+	fd = git_open_noatime(path, NULL);
 	if (fd < 0)
 		return;
 	if (fstat(fd, &st) || (st.st_size == 0)) {
@@ -413,7 +413,7 @@ static int check_packed_git_idx(const char *path,  struct packed_git *p)
 	struct pack_idx_header *hdr;
 	size_t idx_size;
 	uint32_t version, nr, i, *index;
-	int fd = git_open_noatime(path);
+	int fd = git_open_noatime(path, p);
 	struct stat st;
 
 	if (fd < 0)
@@ -657,9 +657,7 @@ static int open_packed_git_1(struct packed_git *p)
 	if (!p->index_data && open_pack_index(p))
 		return error("packfile %s index unavailable", p->pack_name);
 
-	p->pack_fd = git_open_noatime(p->pack_name);
-	while (p->pack_fd < 0 && errno == EMFILE && unuse_one_window(p, -1))
-		p->pack_fd = git_open_noatime(p->pack_name);
+	p->pack_fd = git_open_noatime(p->pack_name, p);
 	if (p->pack_fd < 0 || fstat(p->pack_fd, &st))
 		return -1;
 
@@ -876,7 +874,7 @@ static void prepare_packed_git_one(char *objdir, int local)
 	sprintf(path, "%s/pack", objdir);
 	len = strlen(path);
 	dir = opendir(path);
-	while (!dir && errno == EMFILE && unuse_one_window(packed_git, -1))
+	while (!dir && errno == EMFILE && unuse_one_window(NULL, -1))
 		dir = opendir(path);
 	if (!dir) {
 		if (errno != ENOENT)
@@ -1024,18 +1022,31 @@ int check_sha1_signature(const unsigned char *sha1, void *map, unsigned long siz
 	return hashcmp(sha1, real_sha1) ? -1 : 0;
 }
 
-static int git_open_noatime(const char *name)
+static int git_open_noatime(const char *name, struct packed_git *p)
 {
 	static int sha1_file_open_flag = O_NOATIME;
-	int fd = open(name, O_RDONLY | sha1_file_open_flag);
 
-	/* Might the failure be due to O_NOATIME? */
-	if (fd < 0 && errno != ENOENT && sha1_file_open_flag) {
-		fd = open(name, O_RDONLY);
+	for (;;) {
+		int fd = open(name, O_RDONLY | sha1_file_open_flag);
 		if (fd >= 0)
+			return fd;
+
+		/* Might the failure be insufficient file descriptors? */
+		if (errno == EMFILE) {
+			if (unuse_one_window(p, -1))
+				continue;
+			else
+				return -1;
+		}
+
+		/* Might the failure be due to O_NOATIME? */
+		if (errno != ENOENT && sha1_file_open_flag) {
 			sha1_file_open_flag = 0;
+			continue;
+		}
+
+		return -1;
 	}
-	return fd;
 }
 
 static int open_sha1_file(const unsigned char *sha1)
@@ -1044,7 +1055,7 @@ static int open_sha1_file(const unsigned char *sha1)
 	char *name = sha1_file_name(sha1);
 	struct alternate_object_database *alt;
 
-	fd = git_open_noatime(name);
+	fd = git_open_noatime(name, NULL);
 	if (fd >= 0)
 		return fd;
 
@@ -1053,7 +1064,7 @@ static int open_sha1_file(const unsigned char *sha1)
 	for (alt = alt_odb_list; alt; alt = alt->next) {
 		name = alt->name;
 		fill_sha1_path(name, sha1);
-		fd = git_open_noatime(alt->base);
+		fd = git_open_noatime(alt->base, NULL);
 		if (fd >= 0)
 			return fd;
 	}
@@ -2314,7 +2325,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 
 	filename = sha1_file_name(sha1);
 	fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
-	while (fd < 0 && errno == EMFILE && unuse_one_window(packed_git, -1))
+	while (fd < 0 && errno == EMFILE && unuse_one_window(NULL, -1))
 		fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
 	if (fd < 0) {
 		if (errno == EACCES)
-- 
1.7.3.2.191.g2d0e5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] Work around EMFILE when there are too many pack files
  2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce
@ 2010-11-02  8:44   ` Johannes Sixt
  2010-11-03 17:06   ` Junio C Hamano
  1 sibling, 0 replies; 11+ messages in thread
From: Johannes Sixt @ 2010-11-02  8:44 UTC (permalink / raw
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git

Am 11/1/2010 23:54, schrieb Shawn O. Pearce:
> -static int git_open_noatime(const char *name)
> +static int git_open_noatime(const char *name, struct packed_git *p)
>  {
>  	static int sha1_file_open_flag = O_NOATIME;
> -	int fd = open(name, O_RDONLY | sha1_file_open_flag);
>  
> -	/* Might the failure be due to O_NOATIME? */
> -	if (fd < 0 && errno != ENOENT && sha1_file_open_flag) {
> -		fd = open(name, O_RDONLY);
> +	for (;;) {
> +		int fd = open(name, O_RDONLY | sha1_file_open_flag);
>  		if (fd >= 0)
> +			return fd;
> +
> +		/* Might the failure be insufficient file descriptors? */
> +		if (errno == EMFILE) {
> +			if (unuse_one_window(p, -1))
> +				continue;
> +			else
> +				return -1;
> +		}
> +
> +		/* Might the failure be due to O_NOATIME? */
> +		if (errno != ENOENT && sha1_file_open_flag) {
>  			sha1_file_open_flag = 0;
> +			continue;
> +		}
> +
> +		return -1;
>  	}
> -	return fd;
>  }

Oh, boy! A function that returns a value, but does not have a return
statement at the end? Even a goto would be easier to read:

retry:
	fd = open(name, O_RDONLY | sha1_file_open_flag);
	if (fd >= 0)
		return fd;

	if (errno == EMFILE) {
		if (unuse_one_window(p, -1))
			goto retry;
		return -1;
	}

	if (errno != ENOENT && sha1_file_open_flag) {
		sha1_file_open_flag = 0;
		goto retry;
	}

	return -1;

IMHO, of course.

-- Hannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] Work around EMFILE when there are too many pack files
  2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce
  2010-11-02  8:44   ` Johannes Sixt
@ 2010-11-03 17:06   ` Junio C Hamano
  1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2010-11-03 17:06 UTC (permalink / raw
  To: Shawn O. Pearce; +Cc: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> @@ -876,7 +874,7 @@ static void prepare_packed_git_one(char *objdir, int local)
>  	sprintf(path, "%s/pack", objdir);
>  	len = strlen(path);
>  	dir = opendir(path);
> -	while (!dir && errno == EMFILE && unuse_one_window(packed_git, -1))
> +	while (!dir && errno == EMFILE && unuse_one_window(NULL, -1))

Hmmmm, why?

> @@ -2314,7 +2325,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
>  
>  	filename = sha1_file_name(sha1);
>  	fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
> -	while (fd < 0 && errno == EMFILE && unuse_one_window(packed_git, -1))
> +	while (fd < 0 && errno == EMFILE && unuse_one_window(NULL, -1))

Again, why?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data
  2010-11-01 22:54 ` [PATCH 1/2] Use git_open_noatime when accessing pack data Shawn O. Pearce
@ 2010-11-03 17:07   ` Junio C Hamano
  2010-11-03 17:41     ` Jonathan Nieder
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2010-11-03 17:07 UTC (permalink / raw
  To: Shawn O. Pearce; +Cc: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> This utility function avoids an unnecessary update of the access time
> for a loose object file.  Just as the atime isn't useful on a loose
> object, its not useful on the pack or the corresonding idx file.
>
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

Hearing the name "git-open-noatime", one would naturally assume that it is
a way to open files without burdening the filesystem with inode metadata
update traffic, as that was the original reason why we open loose objects
without atime update.  We historically anticipated to have very many of
the loose objects lying around, and this optimization made sense.

As any sane repository would have far fewer packfiles than loose objects,
one would think that, while it may not hurt, using git-open-noatime to
open packfiles is just a misguided performance measure.  Not.

This patch (and the next patch) adds "we unuse pack windows to retry
opening if we have too many files already open" logic, which is a lot more
important side effect, especially when this function is used for packfiles
(because they tend to stay open for a long time, unlike loose object files
that are opened, read/mapped, and then immediately closed) than what the
name of this function says it does.

Even though I think the issue you are solving is worth addressing, I do
not think I like the structure of the API resulting from these two
patches.  Most of the callers, except for the ones in check-packed-git-idx
and open-packed-git-1, do not care about "keeping one packfile" interface,
so I would prefer to see a two-patch series along the lines of ...

 (1) introduce "int git_open_ro(const char *)" to replace the current
     git_open_noatime().  The point is that the function no longer is
     about avoiding from smudging the inode metadata.  Instead, it becomes
     the preferred way for us to get a read-only fd.

 (2) call your git_open_noatime() implementation git_open_rowpf() or
     something.  Make git_open_ro() a thin wrapper of this function that
     passes NULL for its packed_git parameter.  Two callers that care
     about protecting a pack they are operating on will call this function
     directly.

We can of course do without s/git_open_noatime/git_open_ro/; and it will
make the patch much smaller.  The rename is purely a clarification of the
API and is optional.  It may make it easier to explain the name of the new
function, though.

By the way, I think I still owe you a patch to selectively pack-ref only
old ones.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data
  2010-11-03 17:07   ` Junio C Hamano
@ 2010-11-03 17:41     ` Jonathan Nieder
  2010-11-03 19:35       ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Jonathan Nieder @ 2010-11-03 17:41 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git

Junio C Hamano wrote:

>  (1) introduce "int git_open_ro(const char *)" to replace the current
>      git_open_noatime().  The point is that the function no longer is
>      about avoiding from smudging the inode metadata.  Instead, it becomes
>      the preferred way for us to get a read-only fd.
[...]
> We can of course do without s/git_open_noatime/git_open_ro/; and it will
> make the patch much smaller.  The rename is purely a clarification of the
> API and is optional.  It may make it easier to explain the name of the new
> function, though.

Probably a silly question, but should all readonly open()s actually use
noatime?  Some uses for atime:

 - tmpwatch.  That one's a bit insane, anyway, but I think it might
   work because loose objects and packs are read-only.

 - mail clients.  For this case, noatime is the right thing to do ---
   git's access does mean the mail was read.

 - listing important files, as in popularity-contest.  For this,
   noatime is also the right thing to do.

Judging from these three use cases, readonly open()s to the worktree
should indeed use noatime, but open()s of .git/config, say, should
not.  Hmm.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data
  2010-11-03 17:41     ` Jonathan Nieder
@ 2010-11-03 19:35       ` Junio C Hamano
  2010-11-04  5:04         ` Jonathan Nieder
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2010-11-03 19:35 UTC (permalink / raw
  To: Jonathan Nieder; +Cc: Shawn O. Pearce, git

Jonathan Nieder <jrnieder@gmail.com> writes:

> Judging from these three use cases, readonly open()s to the worktree
> should indeed use noatime, but open()s of .git/config, say, should
> not.  Hmm.

Why not, when you are talking about readonly open?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data
  2010-11-03 19:35       ` Junio C Hamano
@ 2010-11-04  5:04         ` Jonathan Nieder
  2010-11-04  5:23           ` Kevin Ballard
  2010-11-05 17:26           ` Junio C Hamano
  0 siblings, 2 replies; 11+ messages in thread
From: Jonathan Nieder @ 2010-11-04  5:04 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git

Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>> Judging from these three use cases, readonly open()s to the worktree
>> should indeed use noatime, but open()s of .git/config, say, should
>> not.  Hmm.
>
> Why not, when you are talking about readonly open?

A .git/config frequently accessed by git is being used, so tmpwatch
shouldn't delete it.  (Meanwhile, a worktree frequently accessed by
git is not precious on that account, and letting git touch the atime
there might mask more useful information about when other programs
touched the files.)

But that is all very fuzzy to me.  I tend to mount noatime except on
/usr (for popularity-contest).

I guess I should put it another way.  What if anything does readonly
have to do with O_NOATIME?  Why shouldn't we always use O_NOATIME?
Why should the operating system provide atime at all?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data
  2010-11-04  5:04         ` Jonathan Nieder
@ 2010-11-04  5:23           ` Kevin Ballard
  2010-11-05 17:26           ` Junio C Hamano
  1 sibling, 0 replies; 11+ messages in thread
From: Kevin Ballard @ 2010-11-04  5:23 UTC (permalink / raw
  To: Jonathan Nieder; +Cc: Junio C Hamano, Shawn O. Pearce, git

On Nov 3, 2010, at 10:04 PM, Jonathan Nieder wrote:

> I guess I should put it another way.  What if anything does readonly
> have to do with O_NOATIME?  Why shouldn't we always use O_NOATIME?
> Why should the operating system provide atime at all?

atime is useful for user-level files that are opened or modified as a
direct result of actions the user took. I don't think a took like git
has any reason to ever want to touch atime.

-Kevin Ballard

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data
  2010-11-04  5:04         ` Jonathan Nieder
  2010-11-04  5:23           ` Kevin Ballard
@ 2010-11-05 17:26           ` Junio C Hamano
  1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2010-11-05 17:26 UTC (permalink / raw
  To: Jonathan Nieder; +Cc: Shawn O. Pearce, git

Jonathan Nieder <jrnieder@gmail.com> writes:

> I guess I should put it another way.  What if anything does readonly
> have to do with O_NOATIME?  Why shouldn't we always use O_NOATIME?
> Why should the operating system provide atime at all?

Well, I think what you are getting at is that the proposed API in Shawn's
patch and my suggestion is upside down, and it should be layered more like
this (from lower to higher layers) from the interface's point of view:

 - git_open_wpf(): an API to get a file descriptor to a file, while
   protecting open fds to a packfile from getting reclaimed;

 - git_open(): a thin wrapper of the previous, for callers that do not
   need any "with-pack-file" aspect of it;

 - git_open_noatime(): a thin wrapper of the previous, for callers that do
   not want to incur inode metainformation traffic to the filesystem.
 
even though the bulk of implementation, including the logic to handle
no-atime, probably needs to happen in the lowermost layer.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-11-05 17:27 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-01 22:54 [PATCH 0/2] Work around too many file descriptors Shawn O. Pearce
2010-11-01 22:54 ` [PATCH 1/2] Use git_open_noatime when accessing pack data Shawn O. Pearce
2010-11-03 17:07   ` Junio C Hamano
2010-11-03 17:41     ` Jonathan Nieder
2010-11-03 19:35       ` Junio C Hamano
2010-11-04  5:04         ` Jonathan Nieder
2010-11-04  5:23           ` Kevin Ballard
2010-11-05 17:26           ` Junio C Hamano
2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce
2010-11-02  8:44   ` Johannes Sixt
2010-11-03 17:06   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).