* Order of operations at the end of fast-import? @ 2020-04-16 4:24 Mike Hommey 2020-04-16 5:09 ` Jeff King 2020-04-16 11:34 ` Mike Hommey 0 siblings, 2 replies; 5+ messages in thread From: Mike Hommey @ 2020-04-16 4:24 UTC (permalink / raw) To: git Hi, I just noticed that the order of operations at the end of fast-import are: - end_packfile - dump_branches - dump_tags The first may create loose objects if the pack is small (per fastimport.unpackLimit, defaulting to 100). The latter two create refs. There seems to be a theoretical race condition here, if something else triggers a gc at the "wrong" time, the loose objects might be cleaned up and the branches/tags refs become dangling. I understand that the packfile does need to be finished before creating the refs, and that the unpacking replaces that when there aren't enough objects, but wouldn't it be more data-safe to actually finish the pack, create the refs, and then unpack? Mike ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Order of operations at the end of fast-import? 2020-04-16 4:24 Order of operations at the end of fast-import? Mike Hommey @ 2020-04-16 5:09 ` Jeff King 2020-04-16 18:34 ` Johannes Sixt 2020-04-16 11:34 ` Mike Hommey 1 sibling, 1 reply; 5+ messages in thread From: Jeff King @ 2020-04-16 5:09 UTC (permalink / raw) To: Mike Hommey; +Cc: git On Thu, Apr 16, 2020 at 01:24:49PM +0900, Mike Hommey wrote: > I just noticed that the order of operations at the end of fast-import > are: > - end_packfile > - dump_branches > - dump_tags > > The first may create loose objects if the pack is small (per > fastimport.unpackLimit, defaulting to 100). The latter two create refs. > > There seems to be a theoretical race condition here, if something else > triggers a gc at the "wrong" time, the loose objects might be cleaned up > and the branches/tags refs become dangling. > > I understand that the packfile does need to be finished before creating > the refs, and that the unpacking replaces that when there aren't enough > objects, but wouldn't it be more data-safe to actually finish the pack, > create the refs, and then unpack? That race is there even without the unpacking step. Another gc might remove the pack, dropping its unreferenced objects. We do add a ".keep" between writing the pack and updating the refs, but it doesn't look like it's done atomically (i.e., we write the .idx file and _then_ add the .keep). So there's a small race there. But all of this is also true of any operation, like git-commit. It's creating new loose objects, and then will try to reference them. In between, a simultaneous gc will think they're unreachable. Likewise, receiving a push may write a pack (with a .keep, though in the correct order) or even loose objects. This is usually handled by the gc expiration time, which is compared against the file mtime. The default is 2 weeks, but even something short like 5 minutes would be plenty to avoid this race (even for a long import, we should be updating the mtime every time we call write()). In fact, gc will use the same expiration for clearing out tempfiles. So even before we write the final pack and its .keep, any temporary files we're writing into would be at risk. But again, if we're actively writing, their mtimes should save them. -Peff ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Order of operations at the end of fast-import? 2020-04-16 5:09 ` Jeff King @ 2020-04-16 18:34 ` Johannes Sixt 2020-04-16 18:58 ` Jeff King 0 siblings, 1 reply; 5+ messages in thread From: Johannes Sixt @ 2020-04-16 18:34 UTC (permalink / raw) To: Jeff King; +Cc: Mike Hommey, git Am 16.04.20 um 07:09 schrieb Jeff King: > We do add a ".keep" between writing the pack and updating the refs, but > it doesn't look like it's done atomically (i.e., we write the .idx file > and _then_ add the .keep). So there's a small race there. > > But all of this is also true of any operation, like git-commit. It's > creating new loose objects, and then will try to reference them. In > between, a simultaneous gc will think they're unreachable. Likewise, > receiving a push may write a pack (with a .keep, though in the correct > order) or even loose objects. > > This is usually handled by the gc expiration time, which is compared > against the file mtime. The default is 2 weeks, but even something short > like 5 minutes would be plenty to avoid this race (even for a long > import, we should be updating the mtime every time we call write()). Except that on Windows the times are only updated reliably when all file handles refering to the file are closed. Would that poke a hole in your argument? (I don't think so, as long as expiration times are not ridiculously short; and people who do gc --prune=now should know what they are doing -- and when.) > In fact, gc will use the same expiration for clearing out tempfiles. So > even before we write the final pack and its .keep, any temporary files > we're writing into would be at risk. But again, if we're actively > writing, their mtimes should save them. -- Hannes ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Order of operations at the end of fast-import? 2020-04-16 18:34 ` Johannes Sixt @ 2020-04-16 18:58 ` Jeff King 0 siblings, 0 replies; 5+ messages in thread From: Jeff King @ 2020-04-16 18:58 UTC (permalink / raw) To: Johannes Sixt; +Cc: Mike Hommey, git On Thu, Apr 16, 2020 at 08:34:27PM +0200, Johannes Sixt wrote: > > This is usually handled by the gc expiration time, which is compared > > against the file mtime. The default is 2 weeks, but even something short > > like 5 minutes would be plenty to avoid this race (even for a long > > import, we should be updating the mtime every time we call write()). > > Except that on Windows the times are only updated reliably when all file > handles refering to the file are closed. Would that poke a hole in your > argument? (I don't think so, as long as expiration times are not > ridiculously short; and people who do gc --prune=now should know what > they are doing -- and when.) Yes, if you do "git gc --prune=5.minutes" while an hour-long fast-import is running, you might be in trouble. Don't do that. Though on _any_ system, there are other cases where objects sit untouched before being referenced. I guess we pull from the index for reachability, so "git add" is safe for blobs. I don't know if we are clever enough to use cache-tree (and it might not even be there), so possibly those trees are at risk until you run "git commit" (or even after you run while waiting in the editor). Certainly I think the whole pruning system is a patchwork of assumptions that could be violated in extreme cases. But it has been that way for 15 years. If people aren't routinely finding objects pruned out from under them, I'm not inclined to spend a lot of time digging on it. -Peff ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Order of operations at the end of fast-import? 2020-04-16 4:24 Order of operations at the end of fast-import? Mike Hommey 2020-04-16 5:09 ` Jeff King @ 2020-04-16 11:34 ` Mike Hommey 1 sibling, 0 replies; 5+ messages in thread From: Mike Hommey @ 2020-04-16 11:34 UTC (permalink / raw) To: git On Thu, Apr 16, 2020 at 01:24:49PM +0900, Mike Hommey wrote: > Hi, > > I just noticed that the order of operations at the end of fast-import > are: > - end_packfile > - dump_branches > - dump_tags > > The first may create loose objects if the pack is small (per > fastimport.unpackLimit, defaulting to 100). The latter two create refs. > > There seems to be a theoretical race condition here, if something else > triggers a gc at the "wrong" time, the loose objects might be cleaned up > and the branches/tags refs become dangling. Come to think of it, this is only a problem with --prune=now, and the same applies to loose objects created with git add, git write-tree, git-commit-tree. Mike ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-04-16 18:58 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-04-16 4:24 Order of operations at the end of fast-import? Mike Hommey 2020-04-16 5:09 ` Jeff King 2020-04-16 18:34 ` Johannes Sixt 2020-04-16 18:58 ` Jeff King 2020-04-16 11:34 ` Mike Hommey
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).