* [PATCH 0/2] Work around too many file descriptors
@ 2010-11-01 22:54 Shawn O. Pearce
2010-11-01 22:54 ` [PATCH 1/2] Use git_open_noatime when accessing pack data Shawn O. Pearce
2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce
0 siblings, 2 replies; 11+ messages in thread
From: Shawn O. Pearce @ 2010-11-01 22:54 UTC (permalink / raw
To: Junio C Hamano; +Cc: git
This tiny series tries to work around not having enough file
descriptors to open every pack in a repository. It is the logical
conclusion of Johannes Schindelin's commit fd73ccf2 ("Cope better
with a _lot_ of packs").
The patch was prepared on top of gitster's recent two changes to
improve error handling in read_sha1_file().
Shawn O. Pearce (2):
Use git_open_noatime when accessing pack data
Work around EMFILE when there are too many pack files
sha1_file.c | 43 ++++++++++++++++++++++++++++---------------
1 files changed, 28 insertions(+), 15 deletions(-)
--
1.7.3.2.191.g2d0e5
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] Use git_open_noatime when accessing pack data
2010-11-01 22:54 [PATCH 0/2] Work around too many file descriptors Shawn O. Pearce
@ 2010-11-01 22:54 ` Shawn O. Pearce
2010-11-03 17:07 ` Junio C Hamano
2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce
1 sibling, 1 reply; 11+ messages in thread
From: Shawn O. Pearce @ 2010-11-01 22:54 UTC (permalink / raw
To: Junio C Hamano; +Cc: git
This utility function avoids an unnecessary update of the access time
for a loose object file. Just as the atime isn't useful on a loose
object, its not useful on the pack or the corresonding idx file.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
sha1_file.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/sha1_file.c b/sha1_file.c
index f709ed6..a6c1934 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -35,6 +35,8 @@ static size_t sz_fmt(size_t s) { return s; }
const unsigned char null_sha1[20];
+static int git_open_noatime(const char *name);
+
int safe_create_leading_directories(char *path)
{
char *pos = path + offset_1st_component(path);
@@ -298,7 +300,7 @@ static void read_info_alternates(const char * relative_base, int depth)
int fd;
sprintf(path, "%s/%s", relative_base, alt_file_name);
- fd = open(path, O_RDONLY);
+ fd = git_open_noatime(path);
if (fd < 0)
return;
if (fstat(fd, &st) || (st.st_size == 0)) {
@@ -411,7 +413,7 @@ static int check_packed_git_idx(const char *path, struct packed_git *p)
struct pack_idx_header *hdr;
size_t idx_size;
uint32_t version, nr, i, *index;
- int fd = open(path, O_RDONLY);
+ int fd = git_open_noatime(path);
struct stat st;
if (fd < 0)
@@ -655,9 +657,9 @@ static int open_packed_git_1(struct packed_git *p)
if (!p->index_data && open_pack_index(p))
return error("packfile %s index unavailable", p->pack_name);
- p->pack_fd = open(p->pack_name, O_RDONLY);
+ p->pack_fd = git_open_noatime(p->pack_name);
while (p->pack_fd < 0 && errno == EMFILE && unuse_one_window(p, -1))
- p->pack_fd = open(p->pack_name, O_RDONLY);
+ p->pack_fd = git_open_noatime(p->pack_name);
if (p->pack_fd < 0 || fstat(p->pack_fd, &st))
return -1;
--
1.7.3.2.191.g2d0e5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/2] Work around EMFILE when there are too many pack files
2010-11-01 22:54 [PATCH 0/2] Work around too many file descriptors Shawn O. Pearce
2010-11-01 22:54 ` [PATCH 1/2] Use git_open_noatime when accessing pack data Shawn O. Pearce
@ 2010-11-01 22:54 ` Shawn O. Pearce
2010-11-02 8:44 ` Johannes Sixt
2010-11-03 17:06 ` Junio C Hamano
1 sibling, 2 replies; 11+ messages in thread
From: Shawn O. Pearce @ 2010-11-01 22:54 UTC (permalink / raw
To: Junio C Hamano; +Cc: git
When opening any files in the object database, release unused pack
windows if the open(2) syscall fails due to EMFILE (too many open
files in this process). This allows Git to degrade gracefully on
a repository with thousands of pack files, and a commit stored in
a loose object in the middle of the history.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
sha1_file.c | 43 +++++++++++++++++++++++++++----------------
1 files changed, 27 insertions(+), 16 deletions(-)
diff --git a/sha1_file.c b/sha1_file.c
index a6c1934..43d68e0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -35,7 +35,7 @@ static size_t sz_fmt(size_t s) { return s; }
const unsigned char null_sha1[20];
-static int git_open_noatime(const char *name);
+static int git_open_noatime(const char *name, struct packed_git *p);
int safe_create_leading_directories(char *path)
{
@@ -300,7 +300,7 @@ static void read_info_alternates(const char * relative_base, int depth)
int fd;
sprintf(path, "%s/%s", relative_base, alt_file_name);
- fd = git_open_noatime(path);
+ fd = git_open_noatime(path, NULL);
if (fd < 0)
return;
if (fstat(fd, &st) || (st.st_size == 0)) {
@@ -413,7 +413,7 @@ static int check_packed_git_idx(const char *path, struct packed_git *p)
struct pack_idx_header *hdr;
size_t idx_size;
uint32_t version, nr, i, *index;
- int fd = git_open_noatime(path);
+ int fd = git_open_noatime(path, p);
struct stat st;
if (fd < 0)
@@ -657,9 +657,7 @@ static int open_packed_git_1(struct packed_git *p)
if (!p->index_data && open_pack_index(p))
return error("packfile %s index unavailable", p->pack_name);
- p->pack_fd = git_open_noatime(p->pack_name);
- while (p->pack_fd < 0 && errno == EMFILE && unuse_one_window(p, -1))
- p->pack_fd = git_open_noatime(p->pack_name);
+ p->pack_fd = git_open_noatime(p->pack_name, p);
if (p->pack_fd < 0 || fstat(p->pack_fd, &st))
return -1;
@@ -876,7 +874,7 @@ static void prepare_packed_git_one(char *objdir, int local)
sprintf(path, "%s/pack", objdir);
len = strlen(path);
dir = opendir(path);
- while (!dir && errno == EMFILE && unuse_one_window(packed_git, -1))
+ while (!dir && errno == EMFILE && unuse_one_window(NULL, -1))
dir = opendir(path);
if (!dir) {
if (errno != ENOENT)
@@ -1024,18 +1022,31 @@ int check_sha1_signature(const unsigned char *sha1, void *map, unsigned long siz
return hashcmp(sha1, real_sha1) ? -1 : 0;
}
-static int git_open_noatime(const char *name)
+static int git_open_noatime(const char *name, struct packed_git *p)
{
static int sha1_file_open_flag = O_NOATIME;
- int fd = open(name, O_RDONLY | sha1_file_open_flag);
- /* Might the failure be due to O_NOATIME? */
- if (fd < 0 && errno != ENOENT && sha1_file_open_flag) {
- fd = open(name, O_RDONLY);
+ for (;;) {
+ int fd = open(name, O_RDONLY | sha1_file_open_flag);
if (fd >= 0)
+ return fd;
+
+ /* Might the failure be insufficient file descriptors? */
+ if (errno == EMFILE) {
+ if (unuse_one_window(p, -1))
+ continue;
+ else
+ return -1;
+ }
+
+ /* Might the failure be due to O_NOATIME? */
+ if (errno != ENOENT && sha1_file_open_flag) {
sha1_file_open_flag = 0;
+ continue;
+ }
+
+ return -1;
}
- return fd;
}
static int open_sha1_file(const unsigned char *sha1)
@@ -1044,7 +1055,7 @@ static int open_sha1_file(const unsigned char *sha1)
char *name = sha1_file_name(sha1);
struct alternate_object_database *alt;
- fd = git_open_noatime(name);
+ fd = git_open_noatime(name, NULL);
if (fd >= 0)
return fd;
@@ -1053,7 +1064,7 @@ static int open_sha1_file(const unsigned char *sha1)
for (alt = alt_odb_list; alt; alt = alt->next) {
name = alt->name;
fill_sha1_path(name, sha1);
- fd = git_open_noatime(alt->base);
+ fd = git_open_noatime(alt->base, NULL);
if (fd >= 0)
return fd;
}
@@ -2314,7 +2325,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
filename = sha1_file_name(sha1);
fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
- while (fd < 0 && errno == EMFILE && unuse_one_window(packed_git, -1))
+ while (fd < 0 && errno == EMFILE && unuse_one_window(NULL, -1))
fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
if (fd < 0) {
if (errno == EACCES)
--
1.7.3.2.191.g2d0e5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] Work around EMFILE when there are too many pack files
2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce
@ 2010-11-02 8:44 ` Johannes Sixt
2010-11-03 17:06 ` Junio C Hamano
1 sibling, 0 replies; 11+ messages in thread
From: Johannes Sixt @ 2010-11-02 8:44 UTC (permalink / raw
To: Shawn O. Pearce; +Cc: Junio C Hamano, git
Am 11/1/2010 23:54, schrieb Shawn O. Pearce:
> -static int git_open_noatime(const char *name)
> +static int git_open_noatime(const char *name, struct packed_git *p)
> {
> static int sha1_file_open_flag = O_NOATIME;
> - int fd = open(name, O_RDONLY | sha1_file_open_flag);
>
> - /* Might the failure be due to O_NOATIME? */
> - if (fd < 0 && errno != ENOENT && sha1_file_open_flag) {
> - fd = open(name, O_RDONLY);
> + for (;;) {
> + int fd = open(name, O_RDONLY | sha1_file_open_flag);
> if (fd >= 0)
> + return fd;
> +
> + /* Might the failure be insufficient file descriptors? */
> + if (errno == EMFILE) {
> + if (unuse_one_window(p, -1))
> + continue;
> + else
> + return -1;
> + }
> +
> + /* Might the failure be due to O_NOATIME? */
> + if (errno != ENOENT && sha1_file_open_flag) {
> sha1_file_open_flag = 0;
> + continue;
> + }
> +
> + return -1;
> }
> - return fd;
> }
Oh, boy! A function that returns a value, but does not have a return
statement at the end? Even a goto would be easier to read:
retry:
fd = open(name, O_RDONLY | sha1_file_open_flag);
if (fd >= 0)
return fd;
if (errno == EMFILE) {
if (unuse_one_window(p, -1))
goto retry;
return -1;
}
if (errno != ENOENT && sha1_file_open_flag) {
sha1_file_open_flag = 0;
goto retry;
}
return -1;
IMHO, of course.
-- Hannes
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] Work around EMFILE when there are too many pack files
2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce
2010-11-02 8:44 ` Johannes Sixt
@ 2010-11-03 17:06 ` Junio C Hamano
1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2010-11-03 17:06 UTC (permalink / raw
To: Shawn O. Pearce; +Cc: git
"Shawn O. Pearce" <spearce@spearce.org> writes:
> @@ -876,7 +874,7 @@ static void prepare_packed_git_one(char *objdir, int local)
> sprintf(path, "%s/pack", objdir);
> len = strlen(path);
> dir = opendir(path);
> - while (!dir && errno == EMFILE && unuse_one_window(packed_git, -1))
> + while (!dir && errno == EMFILE && unuse_one_window(NULL, -1))
Hmmmm, why?
> @@ -2314,7 +2325,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
>
> filename = sha1_file_name(sha1);
> fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
> - while (fd < 0 && errno == EMFILE && unuse_one_window(packed_git, -1))
> + while (fd < 0 && errno == EMFILE && unuse_one_window(NULL, -1))
Again, why?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data
2010-11-01 22:54 ` [PATCH 1/2] Use git_open_noatime when accessing pack data Shawn O. Pearce
@ 2010-11-03 17:07 ` Junio C Hamano
2010-11-03 17:41 ` Jonathan Nieder
0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2010-11-03 17:07 UTC (permalink / raw
To: Shawn O. Pearce; +Cc: git
"Shawn O. Pearce" <spearce@spearce.org> writes:
> This utility function avoids an unnecessary update of the access time
> for a loose object file. Just as the atime isn't useful on a loose
> object, its not useful on the pack or the corresonding idx file.
>
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Hearing the name "git-open-noatime", one would naturally assume that it is
a way to open files without burdening the filesystem with inode metadata
update traffic, as that was the original reason why we open loose objects
without atime update. We historically anticipated to have very many of
the loose objects lying around, and this optimization made sense.
As any sane repository would have far fewer packfiles than loose objects,
one would think that, while it may not hurt, using git-open-noatime to
open packfiles is just a misguided performance measure. Not.
This patch (and the next patch) adds "we unuse pack windows to retry
opening if we have too many files already open" logic, which is a lot more
important side effect, especially when this function is used for packfiles
(because they tend to stay open for a long time, unlike loose object files
that are opened, read/mapped, and then immediately closed) than what the
name of this function says it does.
Even though I think the issue you are solving is worth addressing, I do
not think I like the structure of the API resulting from these two
patches. Most of the callers, except for the ones in check-packed-git-idx
and open-packed-git-1, do not care about "keeping one packfile" interface,
so I would prefer to see a two-patch series along the lines of ...
(1) introduce "int git_open_ro(const char *)" to replace the current
git_open_noatime(). The point is that the function no longer is
about avoiding from smudging the inode metadata. Instead, it becomes
the preferred way for us to get a read-only fd.
(2) call your git_open_noatime() implementation git_open_rowpf() or
something. Make git_open_ro() a thin wrapper of this function that
passes NULL for its packed_git parameter. Two callers that care
about protecting a pack they are operating on will call this function
directly.
We can of course do without s/git_open_noatime/git_open_ro/; and it will
make the patch much smaller. The rename is purely a clarification of the
API and is optional. It may make it easier to explain the name of the new
function, though.
By the way, I think I still owe you a patch to selectively pack-ref only
old ones.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data
2010-11-03 17:07 ` Junio C Hamano
@ 2010-11-03 17:41 ` Jonathan Nieder
2010-11-03 19:35 ` Junio C Hamano
0 siblings, 1 reply; 11+ messages in thread
From: Jonathan Nieder @ 2010-11-03 17:41 UTC (permalink / raw
To: Junio C Hamano; +Cc: Shawn O. Pearce, git
Junio C Hamano wrote:
> (1) introduce "int git_open_ro(const char *)" to replace the current
> git_open_noatime(). The point is that the function no longer is
> about avoiding from smudging the inode metadata. Instead, it becomes
> the preferred way for us to get a read-only fd.
[...]
> We can of course do without s/git_open_noatime/git_open_ro/; and it will
> make the patch much smaller. The rename is purely a clarification of the
> API and is optional. It may make it easier to explain the name of the new
> function, though.
Probably a silly question, but should all readonly open()s actually use
noatime? Some uses for atime:
- tmpwatch. That one's a bit insane, anyway, but I think it might
work because loose objects and packs are read-only.
- mail clients. For this case, noatime is the right thing to do ---
git's access does mean the mail was read.
- listing important files, as in popularity-contest. For this,
noatime is also the right thing to do.
Judging from these three use cases, readonly open()s to the worktree
should indeed use noatime, but open()s of .git/config, say, should
not. Hmm.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data
2010-11-03 17:41 ` Jonathan Nieder
@ 2010-11-03 19:35 ` Junio C Hamano
2010-11-04 5:04 ` Jonathan Nieder
0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2010-11-03 19:35 UTC (permalink / raw
To: Jonathan Nieder; +Cc: Shawn O. Pearce, git
Jonathan Nieder <jrnieder@gmail.com> writes:
> Judging from these three use cases, readonly open()s to the worktree
> should indeed use noatime, but open()s of .git/config, say, should
> not. Hmm.
Why not, when you are talking about readonly open?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data
2010-11-03 19:35 ` Junio C Hamano
@ 2010-11-04 5:04 ` Jonathan Nieder
2010-11-04 5:23 ` Kevin Ballard
2010-11-05 17:26 ` Junio C Hamano
0 siblings, 2 replies; 11+ messages in thread
From: Jonathan Nieder @ 2010-11-04 5:04 UTC (permalink / raw
To: Junio C Hamano; +Cc: Shawn O. Pearce, git
Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
>> Judging from these three use cases, readonly open()s to the worktree
>> should indeed use noatime, but open()s of .git/config, say, should
>> not. Hmm.
>
> Why not, when you are talking about readonly open?
A .git/config frequently accessed by git is being used, so tmpwatch
shouldn't delete it. (Meanwhile, a worktree frequently accessed by
git is not precious on that account, and letting git touch the atime
there might mask more useful information about when other programs
touched the files.)
But that is all very fuzzy to me. I tend to mount noatime except on
/usr (for popularity-contest).
I guess I should put it another way. What if anything does readonly
have to do with O_NOATIME? Why shouldn't we always use O_NOATIME?
Why should the operating system provide atime at all?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data
2010-11-04 5:04 ` Jonathan Nieder
@ 2010-11-04 5:23 ` Kevin Ballard
2010-11-05 17:26 ` Junio C Hamano
1 sibling, 0 replies; 11+ messages in thread
From: Kevin Ballard @ 2010-11-04 5:23 UTC (permalink / raw
To: Jonathan Nieder; +Cc: Junio C Hamano, Shawn O. Pearce, git
On Nov 3, 2010, at 10:04 PM, Jonathan Nieder wrote:
> I guess I should put it another way. What if anything does readonly
> have to do with O_NOATIME? Why shouldn't we always use O_NOATIME?
> Why should the operating system provide atime at all?
atime is useful for user-level files that are opened or modified as a
direct result of actions the user took. I don't think a took like git
has any reason to ever want to touch atime.
-Kevin Ballard
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data
2010-11-04 5:04 ` Jonathan Nieder
2010-11-04 5:23 ` Kevin Ballard
@ 2010-11-05 17:26 ` Junio C Hamano
1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2010-11-05 17:26 UTC (permalink / raw
To: Jonathan Nieder; +Cc: Shawn O. Pearce, git
Jonathan Nieder <jrnieder@gmail.com> writes:
> I guess I should put it another way. What if anything does readonly
> have to do with O_NOATIME? Why shouldn't we always use O_NOATIME?
> Why should the operating system provide atime at all?
Well, I think what you are getting at is that the proposed API in Shawn's
patch and my suggestion is upside down, and it should be layered more like
this (from lower to higher layers) from the interface's point of view:
- git_open_wpf(): an API to get a file descriptor to a file, while
protecting open fds to a packfile from getting reclaimed;
- git_open(): a thin wrapper of the previous, for callers that do not
need any "with-pack-file" aspect of it;
- git_open_noatime(): a thin wrapper of the previous, for callers that do
not want to incur inode metainformation traffic to the filesystem.
even though the bulk of implementation, including the logic to handle
no-atime, probably needs to happen in the lowermost layer.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2010-11-05 17:27 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-01 22:54 [PATCH 0/2] Work around too many file descriptors Shawn O. Pearce
2010-11-01 22:54 ` [PATCH 1/2] Use git_open_noatime when accessing pack data Shawn O. Pearce
2010-11-03 17:07 ` Junio C Hamano
2010-11-03 17:41 ` Jonathan Nieder
2010-11-03 19:35 ` Junio C Hamano
2010-11-04 5:04 ` Jonathan Nieder
2010-11-04 5:23 ` Kevin Ballard
2010-11-05 17:26 ` Junio C Hamano
2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce
2010-11-02 8:44 ` Johannes Sixt
2010-11-03 17:06 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).