Non-robust lock files in containers can lead to repo corruption

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* Non-robust lock files in containers can lead to repo corruption
@ 2019-08-10 16:05 Gregory Szorc
  2019-08-12 13:48 ` Randall S. Becker
  2019-08-12 16:38 ` Jeff King
  0 siblings, 2 replies; 5+ messages in thread
From: Gregory Szorc @ 2019-08-10 16:05 UTC (permalink / raw)
  To: git

I tracked down a source of Git corrupting repositories to lock file
design not being robust when containers / PID namespaces are present.

In my case, the corruption stemmed from premature release of the `git
gc` lock in the gc.pid file. But since the lock file code for that file
is in gc.c, there could be other lock files in Git affected by the same
design limitation as well.

The lock design of gc.pid stores the current hostname and PID of the
locking process in the file. If another process comes along and its
hostname matches the stored hostname, it checks to see if the listed PID
exists. If the PID is missing, it assumes the lock is stale and releases
the lock.

A limitation with this approach is it isn't robust in the presence of
containers / PID namespaces. In containers, it is common for the
hostname to match the container host's hostname. Or the hostname will be
static string. In Kubernetes, all containers within a pod share the same
hostname. Containers (almost always) run in separate PID namespaces, so
PIDs from outside the container aren't visible to the container itself.
This means that if e.g. 2 `git gc` processes are running with the same
hostname in separate containers / PID namespaces, Git could prematurely
release the lock file because it thinks the "other" PID is dead and repo
corruption could ensue due to the 2 `git gc` processes racing with each
other.

The on-disk format of lock files obviously needs to be backwards
compatible with older clients. One backwards compatible solution is to
append something to the hostname to disambiguate containers / PID
namespaces. Mercurial appends the current PID namespace identifier to
the hostname [1] and my experience is that this is sufficient to
mitigate the issue. It is possible more robust solutions are achievable.

Gregory

[1] https://www.mercurial-scm.org/repo/hg/rev/1f151a33af8e

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: Non-robust lock files in containers can lead to repo corruption
  2019-08-10 16:05 Non-robust lock files in containers can lead to repo corruption Gregory Szorc
@ 2019-08-12 13:48 ` Randall S. Becker
  2019-08-12 15:54   ` Junio C Hamano
  2019-08-12 16:38 ` Jeff King
  1 sibling, 1 reply; 5+ messages in thread
From: Randall S. Becker @ 2019-08-12 13:48 UTC (permalink / raw)
  To: 'Gregory Szorc', git

On August 10, 2019 12:06 PM, Gregory Szorc wrote:
> I tracked down a source of Git corrupting repositories to lock file design not
> being robust when containers / PID namespaces are present.
> 
> In my case, the corruption stemmed from premature release of the `git gc`
> lock in the gc.pid file. But since the lock file code for that file is in gc.c, there
> could be other lock files in Git affected by the same design limitation as well.
> 
> The lock design of gc.pid stores the current hostname and PID of the locking
> process in the file. If another process comes along and its hostname matches
> the stored hostname, it checks to see if the listed PID exists. If the PID is
> missing, it assumes the lock is stale and releases the lock.
> 
> A limitation with this approach is it isn't robust in the presence of containers
> / PID namespaces. In containers, it is common for the hostname to match
> the container host's hostname. Or the hostname will be static string. In
> Kubernetes, all containers within a pod share the same hostname. Containers
> (almost always) run in separate PID namespaces, so PIDs from outside the
> container aren't visible to the container itself.
> This means that if e.g. 2 `git gc` processes are running with the same
> hostname in separate containers / PID namespaces, Git could prematurely
> release the lock file because it thinks the "other" PID is dead and repo
> corruption could ensue due to the 2 `git gc` processes racing with each other.
> 
> The on-disk format of lock files obviously needs to be backwards compatible
> with older clients. One backwards compatible solution is to append
> something to the hostname to disambiguate containers / PID namespaces.
> Mercurial appends the current PID namespace identifier to the hostname [1]
> and my experience is that this is sufficient to mitigate the issue. It is possible
> more robust solutions are achievable.

While I like the idea personally, many platforms, including NonStop (TNS/E),  do not support pid namespaces. In particular setns(2) may not be implemented. Please make sure that any changes detect this condition properly and omit the use of namespaces.

Regards,
Randall



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Non-robust lock files in containers can lead to repo corruption
  2019-08-12 13:48 ` Randall S. Becker
@ 2019-08-12 15:54   ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2019-08-12 15:54 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: 'Gregory Szorc', git

"Randall S. Becker" <rsbecker@nexbridge.com> writes:

>> The lock design of gc.pid stores the current hostname and PID of the locking
>> process in the file. If another process comes along and its hostname matches
>> the stored hostname, it checks to see if the listed PID exists. If the PID is
>> missing, it assumes the lock is stale and releases the lock.

The assumption that <hostname, pid> pair can be used to identify
running process (until the pid wraps around and gets reused) does
not sound particularly limited Git.  Don't container folks solve it
without touching individual applications?  For example, is it a
viable option to isolate UTS namespaces (CLONE_NEWUTS) in addition
to PID namespaces (CLONE_NEWPID)?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Non-robust lock files in containers can lead to repo corruption
  2019-08-10 16:05 Non-robust lock files in containers can lead to repo corruption Gregory Szorc
  2019-08-12 13:48 ` Randall S. Becker
@ 2019-08-12 16:38 ` Jeff King
  2019-08-13 19:13   ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Jeff King @ 2019-08-12 16:38 UTC (permalink / raw)
  To: Gregory Szorc; +Cc: git

On Sat, Aug 10, 2019 at 09:05:33AM -0700, Gregory Szorc wrote:

> I tracked down a source of Git corrupting repositories to lock file
> design not being robust when containers / PID namespaces are present.
> 
> In my case, the corruption stemmed from premature release of the `git
> gc` lock in the gc.pid file. But since the lock file code for that file
> is in gc.c, there could be other lock files in Git affected by the same
> design limitation as well.

I don't think there are. Most of Git's locks are predicated purely on
the existence of the lockfile (with the intent that they'd work over
systems like NFS). The gc lock is a weird one-off.

And while it's not great for multiple gc's to run at the same time
(because it wastes CPU), two of them running at the same time shouldn't
cause a corruption. If you have a reproducible demonstration where that
happens, I'd be very interested to see it.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Non-robust lock files in containers can lead to repo corruption
  2019-08-12 16:38 ` Jeff King
@ 2019-08-13 19:13   ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2019-08-13 19:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Gregory Szorc, git

Jeff King <peff@peff.net> writes:

> I don't think there are. Most of Git's locks are predicated purely on
> the existence of the lockfile (with the intent that they'd work over
> systems like NFS). The gc lock is a weird one-off.
>
> And while it's not great for multiple gc's to run at the same time
> (because it wastes CPU), two of them running at the same time shouldn't
> cause a corruption. If you have a reproducible demonstration where that
> happens, I'd be very interested to see it.

Good point.

And come to think of it, gc "lock" does not have to be a lock to
begin with.  It is not "I am forbidding all of you guys from doing
gc, because that would break the result of gc _I_ am doing right
now", which is what we traditionally call "lock".  It is merely a
"We need to do this every once in a while and I am doing it now.  I
let others know that I am already doing so, so they do not have to
start the same thing right now" advisory.

And the code (i.e. lock_repo_for_gc()) allows the current process to
run when

 - "--force" option is given, or
 - the lockfile cannot be open()ed, or
 - the lockfile cannot be fstat()ed, or
 - the lockfile is older than 12 hours, or
 - the lockfile has malformed contents, or
 - the lockfile was taken on a host with the same name from ours,
   and a process with the same pid as recorded is not running.

Following the """12 hour limit is very generous as gc should never
take that long. On the other hand we don't really need a strict
limit here, running gc --auto one day late is not a big
problem. --force can be used in manual gc after the user verifies
that no gc is running.""" reasoning, I suspect that it shouldn't be
too bad even if we dropped the last condition (i.e. "is the process
still running?")  from the set of these conditions.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-08-13 19:13 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-10 16:05 Non-robust lock files in containers can lead to repo corruption Gregory Szorc
2019-08-12 13:48 ` Randall S. Becker
2019-08-12 15:54   ` Junio C Hamano
2019-08-12 16:38 ` Jeff King
2019-08-13 19:13   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).