* [PATCH 0/2] Work around too many file descriptors @ 2010-11-01 22:54 Shawn O. Pearce 2010-11-01 22:54 ` [PATCH 1/2] Use git_open_noatime when accessing pack data Shawn O. Pearce 2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce 0 siblings, 2 replies; 11+ messages in thread From: Shawn O. Pearce @ 2010-11-01 22:54 UTC (permalink / raw To: Junio C Hamano; +Cc: git This tiny series tries to work around not having enough file descriptors to open every pack in a repository. It is the logical conclusion of Johannes Schindelin's commit fd73ccf2 ("Cope better with a _lot_ of packs"). The patch was prepared on top of gitster's recent two changes to improve error handling in read_sha1_file(). Shawn O. Pearce (2): Use git_open_noatime when accessing pack data Work around EMFILE when there are too many pack files sha1_file.c | 43 ++++++++++++++++++++++++++++--------------- 1 files changed, 28 insertions(+), 15 deletions(-) -- 1.7.3.2.191.g2d0e5 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] Use git_open_noatime when accessing pack data 2010-11-01 22:54 [PATCH 0/2] Work around too many file descriptors Shawn O. Pearce @ 2010-11-01 22:54 ` Shawn O. Pearce 2010-11-03 17:07 ` Junio C Hamano 2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce 1 sibling, 1 reply; 11+ messages in thread From: Shawn O. Pearce @ 2010-11-01 22:54 UTC (permalink / raw To: Junio C Hamano; +Cc: git This utility function avoids an unnecessary update of the access time for a loose object file. Just as the atime isn't useful on a loose object, its not useful on the pack or the corresonding idx file. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- sha1_file.c | 10 ++++++---- 1 files changed, 6 insertions(+), 4 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index f709ed6..a6c1934 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -35,6 +35,8 @@ static size_t sz_fmt(size_t s) { return s; } const unsigned char null_sha1[20]; +static int git_open_noatime(const char *name); + int safe_create_leading_directories(char *path) { char *pos = path + offset_1st_component(path); @@ -298,7 +300,7 @@ static void read_info_alternates(const char * relative_base, int depth) int fd; sprintf(path, "%s/%s", relative_base, alt_file_name); - fd = open(path, O_RDONLY); + fd = git_open_noatime(path); if (fd < 0) return; if (fstat(fd, &st) || (st.st_size == 0)) { @@ -411,7 +413,7 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) struct pack_idx_header *hdr; size_t idx_size; uint32_t version, nr, i, *index; - int fd = open(path, O_RDONLY); + int fd = git_open_noatime(path); struct stat st; if (fd < 0) @@ -655,9 +657,9 @@ static int open_packed_git_1(struct packed_git *p) if (!p->index_data && open_pack_index(p)) return error("packfile %s index unavailable", p->pack_name); - p->pack_fd = open(p->pack_name, O_RDONLY); + p->pack_fd = git_open_noatime(p->pack_name); while (p->pack_fd < 0 && errno == EMFILE && unuse_one_window(p, -1)) - p->pack_fd = open(p->pack_name, O_RDONLY); + p->pack_fd = git_open_noatime(p->pack_name); if (p->pack_fd < 0 || fstat(p->pack_fd, &st)) return -1; -- 1.7.3.2.191.g2d0e5 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data 2010-11-01 22:54 ` [PATCH 1/2] Use git_open_noatime when accessing pack data Shawn O. Pearce @ 2010-11-03 17:07 ` Junio C Hamano 2010-11-03 17:41 ` Jonathan Nieder 0 siblings, 1 reply; 11+ messages in thread From: Junio C Hamano @ 2010-11-03 17:07 UTC (permalink / raw To: Shawn O. Pearce; +Cc: git "Shawn O. Pearce" <spearce@spearce.org> writes: > This utility function avoids an unnecessary update of the access time > for a loose object file. Just as the atime isn't useful on a loose > object, its not useful on the pack or the corresonding idx file. > > Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Hearing the name "git-open-noatime", one would naturally assume that it is a way to open files without burdening the filesystem with inode metadata update traffic, as that was the original reason why we open loose objects without atime update. We historically anticipated to have very many of the loose objects lying around, and this optimization made sense. As any sane repository would have far fewer packfiles than loose objects, one would think that, while it may not hurt, using git-open-noatime to open packfiles is just a misguided performance measure. Not. This patch (and the next patch) adds "we unuse pack windows to retry opening if we have too many files already open" logic, which is a lot more important side effect, especially when this function is used for packfiles (because they tend to stay open for a long time, unlike loose object files that are opened, read/mapped, and then immediately closed) than what the name of this function says it does. Even though I think the issue you are solving is worth addressing, I do not think I like the structure of the API resulting from these two patches. Most of the callers, except for the ones in check-packed-git-idx and open-packed-git-1, do not care about "keeping one packfile" interface, so I would prefer to see a two-patch series along the lines of ... (1) introduce "int git_open_ro(const char *)" to replace the current git_open_noatime(). The point is that the function no longer is about avoiding from smudging the inode metadata. Instead, it becomes the preferred way for us to get a read-only fd. (2) call your git_open_noatime() implementation git_open_rowpf() or something. Make git_open_ro() a thin wrapper of this function that passes NULL for its packed_git parameter. Two callers that care about protecting a pack they are operating on will call this function directly. We can of course do without s/git_open_noatime/git_open_ro/; and it will make the patch much smaller. The rename is purely a clarification of the API and is optional. It may make it easier to explain the name of the new function, though. By the way, I think I still owe you a patch to selectively pack-ref only old ones. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data 2010-11-03 17:07 ` Junio C Hamano @ 2010-11-03 17:41 ` Jonathan Nieder 2010-11-03 19:35 ` Junio C Hamano 0 siblings, 1 reply; 11+ messages in thread From: Jonathan Nieder @ 2010-11-03 17:41 UTC (permalink / raw To: Junio C Hamano; +Cc: Shawn O. Pearce, git Junio C Hamano wrote: > (1) introduce "int git_open_ro(const char *)" to replace the current > git_open_noatime(). The point is that the function no longer is > about avoiding from smudging the inode metadata. Instead, it becomes > the preferred way for us to get a read-only fd. [...] > We can of course do without s/git_open_noatime/git_open_ro/; and it will > make the patch much smaller. The rename is purely a clarification of the > API and is optional. It may make it easier to explain the name of the new > function, though. Probably a silly question, but should all readonly open()s actually use noatime? Some uses for atime: - tmpwatch. That one's a bit insane, anyway, but I think it might work because loose objects and packs are read-only. - mail clients. For this case, noatime is the right thing to do --- git's access does mean the mail was read. - listing important files, as in popularity-contest. For this, noatime is also the right thing to do. Judging from these three use cases, readonly open()s to the worktree should indeed use noatime, but open()s of .git/config, say, should not. Hmm. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data 2010-11-03 17:41 ` Jonathan Nieder @ 2010-11-03 19:35 ` Junio C Hamano 2010-11-04 5:04 ` Jonathan Nieder 0 siblings, 1 reply; 11+ messages in thread From: Junio C Hamano @ 2010-11-03 19:35 UTC (permalink / raw To: Jonathan Nieder; +Cc: Shawn O. Pearce, git Jonathan Nieder <jrnieder@gmail.com> writes: > Judging from these three use cases, readonly open()s to the worktree > should indeed use noatime, but open()s of .git/config, say, should > not. Hmm. Why not, when you are talking about readonly open? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data 2010-11-03 19:35 ` Junio C Hamano @ 2010-11-04 5:04 ` Jonathan Nieder 2010-11-04 5:23 ` Kevin Ballard 2010-11-05 17:26 ` Junio C Hamano 0 siblings, 2 replies; 11+ messages in thread From: Jonathan Nieder @ 2010-11-04 5:04 UTC (permalink / raw To: Junio C Hamano; +Cc: Shawn O. Pearce, git Junio C Hamano wrote: > Jonathan Nieder <jrnieder@gmail.com> writes: >> Judging from these three use cases, readonly open()s to the worktree >> should indeed use noatime, but open()s of .git/config, say, should >> not. Hmm. > > Why not, when you are talking about readonly open? A .git/config frequently accessed by git is being used, so tmpwatch shouldn't delete it. (Meanwhile, a worktree frequently accessed by git is not precious on that account, and letting git touch the atime there might mask more useful information about when other programs touched the files.) But that is all very fuzzy to me. I tend to mount noatime except on /usr (for popularity-contest). I guess I should put it another way. What if anything does readonly have to do with O_NOATIME? Why shouldn't we always use O_NOATIME? Why should the operating system provide atime at all? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data 2010-11-04 5:04 ` Jonathan Nieder @ 2010-11-04 5:23 ` Kevin Ballard 2010-11-05 17:26 ` Junio C Hamano 1 sibling, 0 replies; 11+ messages in thread From: Kevin Ballard @ 2010-11-04 5:23 UTC (permalink / raw To: Jonathan Nieder; +Cc: Junio C Hamano, Shawn O. Pearce, git On Nov 3, 2010, at 10:04 PM, Jonathan Nieder wrote: > I guess I should put it another way. What if anything does readonly > have to do with O_NOATIME? Why shouldn't we always use O_NOATIME? > Why should the operating system provide atime at all? atime is useful for user-level files that are opened or modified as a direct result of actions the user took. I don't think a took like git has any reason to ever want to touch atime. -Kevin Ballard ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Use git_open_noatime when accessing pack data 2010-11-04 5:04 ` Jonathan Nieder 2010-11-04 5:23 ` Kevin Ballard @ 2010-11-05 17:26 ` Junio C Hamano 1 sibling, 0 replies; 11+ messages in thread From: Junio C Hamano @ 2010-11-05 17:26 UTC (permalink / raw To: Jonathan Nieder; +Cc: Shawn O. Pearce, git Jonathan Nieder <jrnieder@gmail.com> writes: > I guess I should put it another way. What if anything does readonly > have to do with O_NOATIME? Why shouldn't we always use O_NOATIME? > Why should the operating system provide atime at all? Well, I think what you are getting at is that the proposed API in Shawn's patch and my suggestion is upside down, and it should be layered more like this (from lower to higher layers) from the interface's point of view: - git_open_wpf(): an API to get a file descriptor to a file, while protecting open fds to a packfile from getting reclaimed; - git_open(): a thin wrapper of the previous, for callers that do not need any "with-pack-file" aspect of it; - git_open_noatime(): a thin wrapper of the previous, for callers that do not want to incur inode metainformation traffic to the filesystem. even though the bulk of implementation, including the logic to handle no-atime, probably needs to happen in the lowermost layer. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] Work around EMFILE when there are too many pack files 2010-11-01 22:54 [PATCH 0/2] Work around too many file descriptors Shawn O. Pearce 2010-11-01 22:54 ` [PATCH 1/2] Use git_open_noatime when accessing pack data Shawn O. Pearce @ 2010-11-01 22:54 ` Shawn O. Pearce 2010-11-02 8:44 ` Johannes Sixt 2010-11-03 17:06 ` Junio C Hamano 1 sibling, 2 replies; 11+ messages in thread From: Shawn O. Pearce @ 2010-11-01 22:54 UTC (permalink / raw To: Junio C Hamano; +Cc: git When opening any files in the object database, release unused pack windows if the open(2) syscall fails due to EMFILE (too many open files in this process). This allows Git to degrade gracefully on a repository with thousands of pack files, and a commit stored in a loose object in the middle of the history. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- sha1_file.c | 43 +++++++++++++++++++++++++++---------------- 1 files changed, 27 insertions(+), 16 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index a6c1934..43d68e0 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -35,7 +35,7 @@ static size_t sz_fmt(size_t s) { return s; } const unsigned char null_sha1[20]; -static int git_open_noatime(const char *name); +static int git_open_noatime(const char *name, struct packed_git *p); int safe_create_leading_directories(char *path) { @@ -300,7 +300,7 @@ static void read_info_alternates(const char * relative_base, int depth) int fd; sprintf(path, "%s/%s", relative_base, alt_file_name); - fd = git_open_noatime(path); + fd = git_open_noatime(path, NULL); if (fd < 0) return; if (fstat(fd, &st) || (st.st_size == 0)) { @@ -413,7 +413,7 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) struct pack_idx_header *hdr; size_t idx_size; uint32_t version, nr, i, *index; - int fd = git_open_noatime(path); + int fd = git_open_noatime(path, p); struct stat st; if (fd < 0) @@ -657,9 +657,7 @@ static int open_packed_git_1(struct packed_git *p) if (!p->index_data && open_pack_index(p)) return error("packfile %s index unavailable", p->pack_name); - p->pack_fd = git_open_noatime(p->pack_name); - while (p->pack_fd < 0 && errno == EMFILE && unuse_one_window(p, -1)) - p->pack_fd = git_open_noatime(p->pack_name); + p->pack_fd = git_open_noatime(p->pack_name, p); if (p->pack_fd < 0 || fstat(p->pack_fd, &st)) return -1; @@ -876,7 +874,7 @@ static void prepare_packed_git_one(char *objdir, int local) sprintf(path, "%s/pack", objdir); len = strlen(path); dir = opendir(path); - while (!dir && errno == EMFILE && unuse_one_window(packed_git, -1)) + while (!dir && errno == EMFILE && unuse_one_window(NULL, -1)) dir = opendir(path); if (!dir) { if (errno != ENOENT) @@ -1024,18 +1022,31 @@ int check_sha1_signature(const unsigned char *sha1, void *map, unsigned long siz return hashcmp(sha1, real_sha1) ? -1 : 0; } -static int git_open_noatime(const char *name) +static int git_open_noatime(const char *name, struct packed_git *p) { static int sha1_file_open_flag = O_NOATIME; - int fd = open(name, O_RDONLY | sha1_file_open_flag); - /* Might the failure be due to O_NOATIME? */ - if (fd < 0 && errno != ENOENT && sha1_file_open_flag) { - fd = open(name, O_RDONLY); + for (;;) { + int fd = open(name, O_RDONLY | sha1_file_open_flag); if (fd >= 0) + return fd; + + /* Might the failure be insufficient file descriptors? */ + if (errno == EMFILE) { + if (unuse_one_window(p, -1)) + continue; + else + return -1; + } + + /* Might the failure be due to O_NOATIME? */ + if (errno != ENOENT && sha1_file_open_flag) { sha1_file_open_flag = 0; + continue; + } + + return -1; } - return fd; } static int open_sha1_file(const unsigned char *sha1) @@ -1044,7 +1055,7 @@ static int open_sha1_file(const unsigned char *sha1) char *name = sha1_file_name(sha1); struct alternate_object_database *alt; - fd = git_open_noatime(name); + fd = git_open_noatime(name, NULL); if (fd >= 0) return fd; @@ -1053,7 +1064,7 @@ static int open_sha1_file(const unsigned char *sha1) for (alt = alt_odb_list; alt; alt = alt->next) { name = alt->name; fill_sha1_path(name, sha1); - fd = git_open_noatime(alt->base); + fd = git_open_noatime(alt->base, NULL); if (fd >= 0) return fd; } @@ -2314,7 +2325,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen, filename = sha1_file_name(sha1); fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename); - while (fd < 0 && errno == EMFILE && unuse_one_window(packed_git, -1)) + while (fd < 0 && errno == EMFILE && unuse_one_window(NULL, -1)) fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename); if (fd < 0) { if (errno == EACCES) -- 1.7.3.2.191.g2d0e5 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] Work around EMFILE when there are too many pack files 2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce @ 2010-11-02 8:44 ` Johannes Sixt 2010-11-03 17:06 ` Junio C Hamano 1 sibling, 0 replies; 11+ messages in thread From: Johannes Sixt @ 2010-11-02 8:44 UTC (permalink / raw To: Shawn O. Pearce; +Cc: Junio C Hamano, git Am 11/1/2010 23:54, schrieb Shawn O. Pearce: > -static int git_open_noatime(const char *name) > +static int git_open_noatime(const char *name, struct packed_git *p) > { > static int sha1_file_open_flag = O_NOATIME; > - int fd = open(name, O_RDONLY | sha1_file_open_flag); > > - /* Might the failure be due to O_NOATIME? */ > - if (fd < 0 && errno != ENOENT && sha1_file_open_flag) { > - fd = open(name, O_RDONLY); > + for (;;) { > + int fd = open(name, O_RDONLY | sha1_file_open_flag); > if (fd >= 0) > + return fd; > + > + /* Might the failure be insufficient file descriptors? */ > + if (errno == EMFILE) { > + if (unuse_one_window(p, -1)) > + continue; > + else > + return -1; > + } > + > + /* Might the failure be due to O_NOATIME? */ > + if (errno != ENOENT && sha1_file_open_flag) { > sha1_file_open_flag = 0; > + continue; > + } > + > + return -1; > } > - return fd; > } Oh, boy! A function that returns a value, but does not have a return statement at the end? Even a goto would be easier to read: retry: fd = open(name, O_RDONLY | sha1_file_open_flag); if (fd >= 0) return fd; if (errno == EMFILE) { if (unuse_one_window(p, -1)) goto retry; return -1; } if (errno != ENOENT && sha1_file_open_flag) { sha1_file_open_flag = 0; goto retry; } return -1; IMHO, of course. -- Hannes ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] Work around EMFILE when there are too many pack files 2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce 2010-11-02 8:44 ` Johannes Sixt @ 2010-11-03 17:06 ` Junio C Hamano 1 sibling, 0 replies; 11+ messages in thread From: Junio C Hamano @ 2010-11-03 17:06 UTC (permalink / raw To: Shawn O. Pearce; +Cc: git "Shawn O. Pearce" <spearce@spearce.org> writes: > @@ -876,7 +874,7 @@ static void prepare_packed_git_one(char *objdir, int local) > sprintf(path, "%s/pack", objdir); > len = strlen(path); > dir = opendir(path); > - while (!dir && errno == EMFILE && unuse_one_window(packed_git, -1)) > + while (!dir && errno == EMFILE && unuse_one_window(NULL, -1)) Hmmmm, why? > @@ -2314,7 +2325,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen, > > filename = sha1_file_name(sha1); > fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename); > - while (fd < 0 && errno == EMFILE && unuse_one_window(packed_git, -1)) > + while (fd < 0 && errno == EMFILE && unuse_one_window(NULL, -1)) Again, why? ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2010-11-05 17:27 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-01 22:54 [PATCH 0/2] Work around too many file descriptors Shawn O. Pearce 2010-11-01 22:54 ` [PATCH 1/2] Use git_open_noatime when accessing pack data Shawn O. Pearce 2010-11-03 17:07 ` Junio C Hamano 2010-11-03 17:41 ` Jonathan Nieder 2010-11-03 19:35 ` Junio C Hamano 2010-11-04 5:04 ` Jonathan Nieder 2010-11-04 5:23 ` Kevin Ballard 2010-11-05 17:26 ` Junio C Hamano 2010-11-01 22:54 ` [PATCH 2/2] Work around EMFILE when there are too many pack files Shawn O. Pearce 2010-11-02 8:44 ` Johannes Sixt 2010-11-03 17:06 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).