On Wed, Nov 10, 2021 at 03:36:59AM -0500, Jeff King wrote:
> On Tue, Nov 09, 2021 at 12:25:46PM +0100, Patrick Steinhardt wrote:
>
> > So I've finally found the time to have another look at massaging this
> > into the ref_transaction mechanism. If we do want to batch the
> > fsync(3P) calls, then we basically have two different alternatives:
> >
> >   1. We first lock all loose refs by creating the respective lock
> >      files and writing the updated ref into that file. We keep the
> >      file descriptor open such that we can then flush them all in
> >      one go.
> >
> >   2. Same as before, we lock all refs and write the updated pointers
> >      into the lockfiles, but this time we close each lockfile after
> >      having written to it. Later, we reopen them all to fsync(3P)
> >      them to disk.
> >
> > I'm afraid neither alternative is any good: the first risks running
> > out of file descriptors if you queue up lots of refs, and the second
> > is going to be slow, especially so on Windows if I'm not mistaken.
>
> I agree the first is a dead end. I had imagined something like the
> second, but I agree that we'd have to measure the cost of re-opening.
> It's too bad there is no syscall to sync a particular set of paths
> (or even better, a directory tree recursively).
>
> There is another option, though: the batch-fsync code for objects
> does a "cheap" fsync while we have the descriptor open, and then
> later triggers a to-disk sync on a separate file. My understanding is
> that this works because modern filesystems will make sure the data
> write is in the journal on the cheap sync, and then the separate-file
> sync makes sure the journal goes to disk.
>
> We could do something like that here. In fact, if you don't care
> about durability but only about filesystem corruption, then you only
> care about the first sync (because the bad case is when the rename
> gets journaled but the data write doesn't).

Ah, interesting. That does sound like a good way forward to me, thanks
for the pointers!

Patrick

> In fact, even if you did choose to re-open and fsync each one, that's
> still sequential. We'd need some way to tell the kernel to sync them
> all at once. The batch-fsync trickery above is one such way (I
> haven't tried, but I wonder if making a bunch of fsync calls in
> parallel would work similarly).
>
> > So with both not being feasible, we'll likely have to come up with
> > a more complex scheme if we want to batch-sync files. One idea
> > would be to fsync(3P) all lockfiles every $n refs, but it adds
> > complexity in a place where I'd really like to have things as
> > simple as possible. It also raises the question of what $n would
> > have to be.
>
> I do think syncing every $n would not be too hard to implement. It
> could all be hidden behind a state machine API that collects lock
> files and flushes when it sees fit. You'd just call a magic
> "batch_fsync_and_close" instead of "fsync() and close()", though you
> would have to remember to do a final flush call to tell it there are
> no more coming.
>
> -Peff
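
To make the "cheap sync, then one hard sync" trick described above
concrete, here is a rough Linux-only sketch. It is not code from
git.git or from this thread: sync_file_range(2), fsync(2) and
mkstemp(3) are real interfaces, but the helper names and the dummy
file are assumptions for illustration.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    /*
     * Cheap per-file flush: start writeback of the lockfile's data
     * while its descriptor is still open, without forcing a journal
     * commit.
     */
    static int cheap_flush(int fd)
    {
        return sync_file_range(fd, 0, 0,
                               SYNC_FILE_RANGE_WAIT_BEFORE |
                               SYNC_FILE_RANGE_WRITE |
                               SYNC_FILE_RANGE_WAIT_AFTER);
    }

    /*
     * One "hard" sync for the whole batch: fsync a freshly created
     * dummy file, which forces the journal (and with it the cheap
     * flushes above) to disk. In git this file would have to live
     * under $GIT_DIR so that it shares a filesystem journal with the
     * refs; /tmp here is only a placeholder.
     */
    static int flush_batch(void)
    {
        char path[] = "/tmp/bulk-fsync-XXXXXX";
        int fd = mkstemp(path);
        int ret;

        if (fd < 0)
            return -1;
        ret = fsync(fd);
        unlink(path);
        close(fd);
        return ret;
    }

The durability-versus-corruption point then falls out naturally: if
you only want to rule out the journaled-rename-without-data case,
cheap_flush() per lockfile is enough and flush_batch() can be skipped.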
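
And a minimal sketch of the state machine described at the end,
assuming a fixed flush threshold. "batch_fsync_and_close" is the
hypothetical name from the mail; the struct and everything else are
made up here:

    #include <stddef.h>
    #include <unistd.h>

    #define BATCH_SIZE 32 /* the "$n" from above; would need measuring */

    struct fsync_batch {
        int fds[BATCH_SIZE];
        size_t nr;
    };

    /*
     * Fsync and close everything collected so far. The caller must
     * invoke this one final time after the last ref, since the
     * machine cannot know on its own that no more files are coming.
     */
    static int batch_flush(struct fsync_batch *b)
    {
        int ret = 0;
        size_t i;

        for (i = 0; i < b->nr; i++) {
            if (fsync(b->fds[i]) < 0)
                ret = -1;
            close(b->fds[i]);
        }
        b->nr = 0;
        return ret;
    }

    /*
     * Hand over a fully written lock file; flushes whenever the batch
     * is full, so at most BATCH_SIZE descriptors are ever held open.
     */
    static int batch_fsync_and_close(struct fsync_batch *b, int fd)
    {
        b->fds[b->nr++] = fd;
        if (b->nr == BATCH_SIZE)
            return batch_flush(b);
        return 0;
    }

Capping the batch at BATCH_SIZE descriptors also sidesteps the
fd-exhaustion problem of alternative 1, at the cost of the fsyncs
within each batch still being sequential.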