* git repack leaks disk space on ENOSPC @ 2017-10-11 15:05 Andreas Krey 2017-10-12 3:17 ` Jonathan Nieder 0 siblings, 1 reply; 6+ messages in thread From: Andreas Krey @ 2017-10-11 15:05 UTC (permalink / raw) To: Git Users Hi all, I observed (again) an annoying behavior of 'git repack': When the new pack file cannot be fully written because the disk gets full beforehand, the tmp_pack file isn't deleted, meaning the disk stays full: $ df -h .; git repack -ad; df -h .; ls -lart .git/objects/pack/tmp*; rm .git/objects/pack/tmp*; df -h . Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg02-localworkspaces 250G 245G 5.1G 98% /workspaces/calvin Counting objects: 4715349, done. Delta compression using up to 8 threads. Compressing objects: 100% (978051/978051), done. fatal: sha1 file '.git/objects/pack/tmp_pack_xB7DMt' write error: No space left on device Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg02-localworkspaces 250G 250G 20K 100% /workspaces/calvin -r--r--r-- 1 andrkrey users 5438435328 Oct 11 17:03 .git/objects/pack/tmp_pack_xB7DMt rm: remove write-protected regular file '.git/objects/pack/tmp_pack_xB7DMt'? y Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg02-localworkspaces 250G 245G 5.1G 98% /workspaces/calvin - Andreas git version 2.15.0.rc0 -- "Totally trivial. Famous last words." From: Linus Torvalds <torvalds@*.org> Date: Fri, 22 Jan 2010 07:29:21 -0800 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git repack leaks disk space on ENOSPC 2017-10-11 15:05 git repack leaks disk space on ENOSPC Andreas Krey @ 2017-10-12 3:17 ` Jonathan Nieder 2017-10-12 9:34 ` Andreas Krey 2017-10-12 13:27 ` Jeff King 0 siblings, 2 replies; 6+ messages in thread From: Jonathan Nieder @ 2017-10-12 3:17 UTC (permalink / raw) To: Andreas Krey; +Cc: Git Users Hi Andreas, Andreas Krey wrote: > I observed (again) an annoying behavior of 'git repack': Do you have context for this 'again'? E.g. was this discussed previously on-list? > When the new pack file cannot be fully written because > the disk gets full beforehand, the tmp_pack file isn't > deleted, meaning the disk stays full: > > $ df -h .; git repack -ad; df -h .; ls -lart .git/objects/pack/tmp*; rm .git/objects/pack/tmp*; df -h . > Filesystem Size Used Avail Use% Mounted on > /dev/mapper/vg02-localworkspaces 250G 245G 5.1G 98% /workspaces/calvin > Counting objects: 4715349, done. > Delta compression using up to 8 threads. > Compressing objects: 100% (978051/978051), done. > fatal: sha1 file '.git/objects/pack/tmp_pack_xB7DMt' write error: No space left on device > Filesystem Size Used Avail Use% Mounted on > /dev/mapper/vg02-localworkspaces 250G 250G 20K 100% /workspaces/calvin > -r--r--r-- 1 andrkrey users 5438435328 Oct 11 17:03 .git/objects/pack/tmp_pack_xB7DMt > rm: remove write-protected regular file '.git/objects/pack/tmp_pack_xB7DMt'? y > Filesystem Size Used Avail Use% Mounted on > /dev/mapper/vg02-localworkspaces 250G 245G 5.1G 98% /workspaces/calvin > > git version 2.15.0.rc0 I can imagine this behavior of retaining tmp_pack being useful for debugging in some circumstances, but I agree with you that it is certainly not a good default. Chasing this down, I find: pack-write.c::create_tmp_packfile chooses the filename builtin/pack-objects.c::write_pack_file writes to it and the .bitmap, calling pack-write.c::finish_tmp_packfile to rename it into place Nothing tries to install an atexit handler to do anything special to it on exit. The natural thing, I'd expect, would be for pack-write to use the tempfile API (see tempfile.h) to create and finish the file. That way, we'd get such atexit handlers for free. If we want a way to keep temp files for debugging on abnormal exit, we could set that up separately as a generic feature of the tempfile API (e.g. an envvar GIT_KEEP_TEMPFILES_ON_FAILURE), making that an orthogonal topic. Does using create_tempfile there seem like a good path forward to you? Would you be interested in working on it (either writing a patch with such a fix or a test in t/ to make sure it keeps working)? Thanks, Jonathan ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git repack leaks disk space on ENOSPC 2017-10-12 3:17 ` Jonathan Nieder @ 2017-10-12 9:34 ` Andreas Krey 2017-10-12 11:01 ` brian m. carlson 2017-10-12 13:36 ` Jeff King 2017-10-12 13:27 ` Jeff King 1 sibling, 2 replies; 6+ messages in thread From: Andreas Krey @ 2017-10-12 9:34 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Git Users On Wed, 11 Oct 2017 20:17:03 +0000, Jonathan Nieder wrote: > Hi Andreas, > > Andreas Krey wrote: > > > I observed (again) an annoying behavior of 'git repack': > > Do you have context for this 'again'? E.g. was this discussed > previously on-list? I think I posted about it, but no discussion. I poked a bit at the code, with not much luck back then. ... > Does using create_tempfile there seem like a good path forward to you? > Would you be interested in working on it (either writing a patch with > such a fix or a test in t/ to make sure it keeps working)? I will look into creating a patch (thanks for the pointers), but I don't see how to make a testcase for this - pre-filling the disk doesn't sound like a good idea. Most people probably won't run in this situation, and then won't have tmp_packs with a dozen GBytes each lying around. Andreas -- "Totally trivial. Famous last words." From: Linus Torvalds <torvalds@*.org> Date: Fri, 22 Jan 2010 07:29:21 -0800 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git repack leaks disk space on ENOSPC 2017-10-12 9:34 ` Andreas Krey @ 2017-10-12 11:01 ` brian m. carlson 2017-10-12 13:36 ` Jeff King 1 sibling, 0 replies; 6+ messages in thread From: brian m. carlson @ 2017-10-12 11:01 UTC (permalink / raw) To: Andreas Krey; +Cc: Jonathan Nieder, Git Users [-- Attachment #1: Type: text/plain, Size: 1061 bytes --] On Thu, Oct 12, 2017 at 11:34:39AM +0200, Andreas Krey wrote: > On Wed, 11 Oct 2017 20:17:03 +0000, Jonathan Nieder wrote: > > Does using create_tempfile there seem like a good path forward to you? > > Would you be interested in working on it (either writing a patch with > > such a fix or a test in t/ to make sure it keeps working)? > > I will look into creating a patch (thanks for the pointers), > but I don't see how to make a testcase for this - pre-filling the > disk doesn't sound like a good idea. Most people probably won't run in > this situation, and then won't have tmp_packs with a dozen GBytes each > lying around. A patch would be very welcome. We have this problem not infrequently at work with development and test VMs, which tend to have a relatively small amount of disk. If you decide that you don't want to create a patch, I'll probably pick it up eventually. -- brian m. carlson / brian with sandals: Houston, Texas, US https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 867 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git repack leaks disk space on ENOSPC 2017-10-12 9:34 ` Andreas Krey 2017-10-12 11:01 ` brian m. carlson @ 2017-10-12 13:36 ` Jeff King 1 sibling, 0 replies; 6+ messages in thread From: Jeff King @ 2017-10-12 13:36 UTC (permalink / raw) To: Andreas Krey; +Cc: Jonathan Nieder, Git Users On Thu, Oct 12, 2017 at 11:34:39AM +0200, Andreas Krey wrote: > > Does using create_tempfile there seem like a good path forward to you? > > Would you be interested in working on it (either writing a patch with > > such a fix or a test in t/ to make sure it keeps working)? > > I will look into creating a patch (thanks for the pointers), > but I don't see how to make a testcase for this - pre-filling the > disk doesn't sound like a good idea. Most people probably won't run in > this situation, and then won't have tmp_packs with a dozen GBytes each > lying around. It may be easier to trigger a case which rejects the pack for other reasons. For an incoming index-pack, turning on transfer.fsckObjects is an easy one. For a repack, perhaps corrupting a loose object to-be-packed would work. E.g.: git init echo content >file git add file git commit -m foo # corrupt the blob in a subtle way obj=.git/objects/$(git rev-parse HEAD:file | sed 's,..,&/,') chmod +w $obj echo cruft >>$obj After that, I get: $ git repack -ad Counting objects: 3, done. error: garbage at end of loose object 'd95f3ad14dee633a758d2e331151e950dd13e4ed' fatal: loose object d95f3ad14dee633a758d2e331151e950dd13e4ed (stored in .git/objects/d9/5f3ad14dee633a758d2e331151e950dd13e4ed) is corrupt $ find -type f .git/objects/pack .git/objects/pack/tmp_pack_0GaXwk -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git repack leaks disk space on ENOSPC 2017-10-12 3:17 ` Jonathan Nieder 2017-10-12 9:34 ` Andreas Krey @ 2017-10-12 13:27 ` Jeff King 1 sibling, 0 replies; 6+ messages in thread From: Jeff King @ 2017-10-12 13:27 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Andreas Krey, Git Users On Wed, Oct 11, 2017 at 08:17:03PM -0700, Jonathan Nieder wrote: > I can imagine this behavior of retaining tmp_pack being useful for > debugging in some circumstances, but I agree with you that it is > certainly not a good default. > > Chasing this down, I find: > > pack-write.c::create_tmp_packfile chooses the filename > builtin/pack-objects.c::write_pack_file writes to it and the .bitmap, calling > pack-write.c::finish_tmp_packfile to rename it into place > > Nothing tries to install an atexit handler to do anything special to it > on exit. > > The natural thing, I'd expect, would be for pack-write to use the > tempfile API (see tempfile.h) to create and finish the file. That way, > we'd get such atexit handlers for free. If we want a way to keep temp > files for debugging on abnormal exit, we could set that up separately as > a generic feature of the tempfile API (e.g. an envvar > GIT_KEEP_TEMPFILES_ON_FAILURE), making that an orthogonal topic. Yes, I think this is the right direction. I've had a patch in GitHub's fork for years that does so (since otherwise failures can fill up your disk and need manual intervention). The main reason that I hadn't submitted it upstream was because of the "you can never free a struct tempfile" requirement. So my patch just leaks the tempfile structs. That's OK for packs, of which we tend to create only a few in a given process, but doesn't scale for loose objects. Now that 89563ec379 (Merge branch 'jk/incore-lockfile-removal', 2017-09-19) has landed, I think it makes sense to pursue that direction. My patch roughly looks like: diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 4ff567db47..7f261e56c4 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -308,9 +348,11 @@ static const char *open_pack_file(const char *pack_name) input_fd = 0; if (!pack_name) { struct strbuf tmp_file = STRBUF_INIT; + struct tempfile *t = xcalloc(1, sizeof(*t)); output_fd = odb_mkstemp(&tmp_file, "pack/tmp_pack_XXXXXX"); pack_name = strbuf_detach(&tmp_file, NULL); + register_tempfile(t, pack_name); } else { output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 0600); if (output_fd < 0) but note that's not quite what we'd want. It never closes the tempfile, so: 1. Under the new regime, we'd still leak the struct! 2. Git will still try to unlink the tempfile on exit, even if we successfully moved it into place. So I think all the code around open_pack_file() needs to learn to pass around the tempfile struct, and eventually use rename_tempfile() to cement it in place. I also suspect that odb_mkstemp should just take a "struct tempfile". -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-10-12 13:36 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-10-11 15:05 git repack leaks disk space on ENOSPC Andreas Krey 2017-10-12 3:17 ` Jonathan Nieder 2017-10-12 9:34 ` Andreas Krey 2017-10-12 11:01 ` brian m. carlson 2017-10-12 13:36 ` Jeff King 2017-10-12 13:27 ` Jeff King
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).