git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: "Torsten Bögershausen" <tboegi@web.de>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Johannes Sixt" <j6t@kdbg.org>
Cc: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: [PATCH v4 00/32] Lockfile correctness and refactoring
Date: Fri, 12 Sep 2014 14:50:23 +0200	[thread overview]
Message-ID: <5412EC0F.5090601@alum.mit.edu> (raw)
In-Reply-To: <540C6A02.9070403@web.de>

On 09/07/2014 04:21 PM, Torsten Bögershausen wrote:
> 
> On 2014-09-06 09.50, Michael Haggerty wrote:
>> Sorry for the long delay since v3. This version mostly cleans up a
>> couple more places where the lockfile object was left in an
>> ill-defined state. 
> No problem with the delay.
> The most important question is if we do the lk->active handling right.
> Set it to false as seen as possible, and to true as late as possible,
> then die() cleanly.
> So the ->acive handling looks (more or less, please see below) and
> deserves another critical review, may be.
> 
> Instead of commenting each patch, I collected a mixture of small questions
> and possible suggestions into a diff file.
> 
> diff --git a/lockfile.c b/lockfile.c
> index e54d260..7f27ea2 100644
> --- a/lockfile.c
> +++ b/lockfile.c
> @@ -18,6 +18,10 @@
>   *    Usually, if $FILENAME is a symlink, then it is resolved, and the
>   *    file ultimately pointed to is the one that is locked and later
>   *    replaced.  However, if LOCK_NODEREF is used, then $FILENAME
> +LOCK_NODEREF can be read either as
> +LOCK_NODE_REF or LOCK_NO_DEREF
> +should we change it ?
> +

Good idea.

>   *    itself is locked and later replaced, even if it is a symlink.
>   *
>   * 2. Write the new file contents to the lockfile.
> @@ -46,9 +50,18 @@
>   *   state:
>   *   - the lockfile exists
>   *   - active is set
> - *   - filename holds the filename of the lockfile
> + *   - filename holds the filename of the lockfile in a strbuf

I don't think this is necessary. The point of this list is to describe
the state machine, not the contents of the lock_file structure, so this
detail is only a distraction. And even if a reader is confused, the
compiler will warn if he tries to use the strbuf as if it were a string.

>   *   - fd holds a file descriptor open for writing to the lockfile
>   *   - owner holds the PID of the process that locked the file
> +question: Why do we need the PID here ?
> +Do we open a lock file and do a fork() ?
> +And if yes, the child gets a new PID, what happens when the
> +child gets a signal ?
> +Who "owns" the lockfile, the parent, the child, both ?
> +The child has access to all data, the fd is open and can be used,
> +why do we not allow a rollback, when the child dies ?

Good questions. I will add an explanation of the purpose of the pid in
this docstring.

>   *
>   * - Locked, lockfile closed (after close_lock_file()).  Same as the
>   *   previous state, except that the lockfile is closed and fd is -1.
> @@ -57,7 +70,7 @@
>   *   rollback_lock_file(), or a failed attempt to lock).  In this
>   *   state:
>   *   - active is unset
> - *   - filename is the empty string (usually, though there are
> + *   - filename is an empty string buffer (usually, though there are
>   *     transitory states in which this condition doesn't hold)
>   *   - fd is -1
>   *   - the object is left registered in the lock_file_list, and
> @@ -68,7 +81,7 @@
>  
>  static struct lock_file *volatile lock_file_list;
>  
> -static void remove_lock_file(void)
> +static void remove_lock_files(void)

Good idea. I will rename these two functions.

>  {
>      pid_t me = getpid();
>  
> @@ -79,9 +92,9 @@ static void remove_lock_file(void)
>      }
>  }
>  
> -static void remove_lock_file_on_signal(int signo)
> +static void remove_lock_files_on_signal(int signo)
>  {
> -    remove_lock_file();
> +    remove_lock_files();
>      sigchain_pop(signo);
>      raise(signo);
>  }
> @@ -93,7 +106,7 @@ static void remove_lock_file_on_signal(int signo)
>   * "/", if any).  If path is empty or the root directory ("/"), set
>   * path to the empty string.
>   */
> -static void trim_last_path_elm(struct strbuf *path)
> +static void trim_last_path_elem(struct strbuf *path)

I agree that the old name was bad. I will make it even more explicit:
trim_last_path_component().

>  {
>      int i = path->len;
>  
> @@ -143,7 +156,7 @@ static void resolve_symlink(struct strbuf *path)
>               * link is a relative path, so replace the
>               * last element of p with it.
>               */
> -            trim_last_path_elm(path);
> +            trim_last_path_elem(path);
>  
>          strbuf_addbuf(path, &link);
>      }
> @@ -153,13 +166,16 @@ static void resolve_symlink(struct strbuf *path)
>  /* Make sure errno contains a meaningful value on error */
>  static int lock_file(struct lock_file *lk, const char *path, int flags)
>  {
> +    struct stat st;
> +    int mode = 0666;
>      if (!lock_file_list) {
>          /* One-time initialization */
> -        sigchain_push_common(remove_lock_file_on_signal);
> -        atexit(remove_lock_file);
> +        sigchain_push_common(remove_lock_files_on_signal);
> +        atexit(remove_lock_files);
>      }
>  
> -    assert(!lk->active);
> +  if (lk->active)
> +        die("lk->active %s", path);

OK, but I will use die("BUG:...") since this would be an indication of a
bug in git.

>      if (!lk->on_list) {
>          /* Initialize *lk and add it to lock_file_list: */
> @@ -167,16 +183,25 @@ static int lock_file(struct lock_file *lk, const char *path, int flags)
>          lk->active = 0;
>          lk->owner = 0;
>          lk->on_list = 1;
> -        strbuf_init(&lk->filename, 0);
> +        strbuf_init(&lk->filename, strlen(path) + LOCK_SUFFIX_LEN);

OK.

>          lk->next = lock_file_list;
>          lock_file_list = lk;
>      }
>  
> +    strbuf_reset(&lk->filename); /* Better to be save */

I agree, but this would be a bug so I'd rather die("BUG:") in this case.

>      strbuf_addstr(&lk->filename, path);
>      if (!(flags & LOCK_NODEREF))
>          resolve_symlink(&lk->filename);
>      strbuf_addstr(&lk->filename, LOCK_SUFFIX);
> -    lk->fd = open(lk->filename.buf, O_RDWR | O_CREAT | O_EXCL, 0666);
> +    /*
> +     * adjust_shared_perm() will widen permissions if needed,
> +     * otherwise keep permissions restrictive
> +     *
> +     */
> +    if (!stat(path, &st))
> +        mode = st.st_mode & 07777;
> +
> +    lk->fd = open(lk->filename.buf, O_RDWR | O_CREAT | O_EXCL, mode);

I think this change is separate from the changes made in this series,
and needs its own justification (i.e., it's not immediately obvious to
me whether the change is an improvement). Would you mind submitting it
as a separate patch?

>      if (lk->fd < 0) {
>          strbuf_reset(&lk->filename);
>          return -1;
> @@ -268,7 +293,7 @@ int close_lock_file(struct lock_file *lk)
>      return close(fd);
>  }
>  
> -int reopen_lock_file(struct lock_file *lk)
> +int reopen_lock_file_UNUSED_CAN_IT_BE_REMOVED(struct lock_file *lk)

Junio added this function recently, without any callers, in 93dcaea2. I
assume he has some diabolical plan for it. Junio?

>  {
>      if (0 <= lk->fd)
>          die(_("BUG: reopen a lockfile that is still open"));
> @@ -283,7 +308,7 @@ int commit_lock_file_to(struct lock_file *lk, const char *path)
>      int save_errno;
>  
>      if (!lk->active)
> -        die("BUG: attempt to commit unlocked object");
> +        die("BUG: attempt to commit unlocked object %s", path);

OK.

>  
>      if (lk->fd >= 0 && close_lock_file(lk))
>          goto rollback;
> @@ -325,10 +350,12 @@ void rollback_lock_file(struct lock_file *lk)
>  {
>      if (!lk->active)
>          return;
> +    lk->active = 0; /* We are going to de-activate,
> +                       so active is no longer valid already here ? */

I think the question is: what should happen if a signal arrives at this
moment? If we set active=0 here, then the signal handler wouldn't clean
up this lockfile at all. But we haven't removed it yet, either. So the
lockfile would get left behind.

So how much longer can we leave active set here?

>  
>      if (lk->fd >= 0)
>          close_lock_file(lk);

close_lock_file() is careful to clear lk->fd *before* closing it, so it
shouldn't be possible for the file to get closed twice [1]. Therefore I
think it is OK to leave lk->active set during the call to close_lock_file().

>      unlink_or_warn(lk->filename.buf);

This line is a bit dangerous. If we call unlink_or_warn() with
lk->active set, then there is a chance that unlink_and_warn() will be
called a second time by a signal handler that runs after this line but
before lk->active is cleared. The first call would delete the lockfile
that we created, but the second call (which might be delayed for a bit
while our signal handler works its way through the lockfile linked-list)
might end up deleting a lockfile that another process created between
our two calls. That would be a bad outcome, but it requires a double
coincidence: our process has to receive a signal at this pessimal moment
*and* another process has to create a lockfile after that event but
before our signal handler has run. And for *real* damage to occur, a
*third* process has to incorrectly acquire the *same* lockfile because
of its incorrect deletion by our process, and it has to make a change
that conflicts with the change that the second process is trying to make.

On the other hand, if we clear lk->active *before* calling
unlink_or_warn(), then the danger is that we receive a signal and never
clean up the lockfile. The result is perhaps not as bad as deleting
another process's lockfile, but it seems to me that it is far more
likely: it can happen regardless of whether another process is racing
with us, let alone a third process is trying to acquire the same lock.

Can anybody think of a reasonable way to make this 100% safe?

If not, I think the code as written is safer than your proposed change.

> -    lk->active = 0;
> +    //lk->active = 0;
>      strbuf_reset(&lk->filename);
>  }
> 
> 

Thanks very much for your feedback!

Michael

[1] Another question (and I think somebody else brought it up) is
whether close_lock_file() should actually close the file *before*
clearing lk->fd. I will address that question elsewhere.

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

  reply	other threads:[~2014-09-12 12:57 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-06  7:50 [PATCH v4 00/32] Lockfile correctness and refactoring Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 01/32] unable_to_lock_die(): rename function from unable_to_lock_index_die() Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 02/32] api-lockfile: expand the documentation Michael Haggerty
2014-09-09 22:40   ` Junio C Hamano
2014-09-06  7:50 ` [PATCH v4 03/32] rollback_lock_file(): do not clear filename redundantly Michael Haggerty
2014-09-11 19:13   ` Ronnie Sahlberg
2014-09-06  7:50 ` [PATCH v4 04/32] rollback_lock_file(): exit early if lock is not active Michael Haggerty
2014-09-11 19:13   ` Ronnie Sahlberg
2014-09-06  7:50 ` [PATCH v4 05/32] rollback_lock_file(): set fd to -1 Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 06/32] lockfile: unlock file if lockfile permissions cannot be adjusted Michael Haggerty
2014-09-09 22:39   ` Junio C Hamano
2014-09-12 11:03     ` Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 07/32] hold_lock_file_for_append(): release lock on errors Michael Haggerty
2014-09-09 22:41   ` Junio C Hamano
2014-09-12 11:04     ` Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 08/32] lock_file(): always add lock_file object to lock_file_list Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 09/32] lockfile.c: document the various states of lock_file objects Michael Haggerty
2014-09-11 19:57   ` Ronnie Sahlberg
2014-09-06  7:50 ` [PATCH v4 10/32] cache.h: define constants LOCK_SUFFIX and LOCK_SUFFIX_LEN Michael Haggerty
2014-09-11 22:15   ` Ronnie Sahlberg
2014-09-12 16:44     ` Michael Haggerty
2014-09-11 22:42   ` Ronnie Sahlberg
2014-09-12 17:13     ` Michael Haggerty
2014-09-12 17:32       ` Ronnie Sahlberg
2014-09-06  7:50 ` [PATCH v4 11/32] delete_ref_loose(): don't muck around in the lock_file's filename Michael Haggerty
2014-09-13  7:41   ` Johannes Sixt
2014-09-14  6:27     ` Michael Haggerty
2014-09-14  6:38       ` Michael Haggerty
2014-09-14 14:49         ` Johannes Sixt
2014-09-06  7:50 ` [PATCH v4 12/32] prepare_index(): declare return value to be (const char *) Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 13/32] write_packed_entry_fn(): convert cb_data into a (const int *) Michael Haggerty
2014-09-11 19:55   ` Ronnie Sahlberg
2014-09-06  7:50 ` [PATCH v4 14/32] lock_file(): exit early if lockfile cannot be opened Michael Haggerty
2014-09-11 22:49   ` Ronnie Sahlberg
2014-09-06  7:50 ` [PATCH v4 15/32] remove_lock_file(): call rollback_lock_file() Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 16/32] commit_lock_file(): inline temporary variable Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 17/32] commit_lock_file(): die() if called for unlocked lockfile object Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 18/32] commit_lock_file(): if close fails, roll back Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 19/32] commit_lock_file(): rollback lock file on failure to rename Michael Haggerty
2014-09-10  7:55   ` Jeff King
2014-09-10 12:55     ` Duy Nguyen
2014-09-06  7:50 ` [PATCH v4 20/32] api-lockfile: document edge cases Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 21/32] dump_marks(): remove a redundant call to rollback_lock_file() Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 22/32] git_config_set_multivar_in_file(): avoid " Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 23/32] lockfile: avoid transitory invalid states Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 24/32] struct lock_file: declare some fields volatile Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 25/32] try_merge_strategy(): remove redundant lock_file allocation Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 26/32] try_merge_strategy(): use a statically-allocated lock_file object Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 27/32] commit_lock_file(): use a strbuf to manage temporary space Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 28/32] Change lock_file::filename into a strbuf Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 29/32] resolve_symlink(): use a strbuf for internal scratch space Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 30/32] resolve_symlink(): take a strbuf parameter Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 31/32] trim_last_path_elm(): replace last_path_elm() Michael Haggerty
2014-09-06  7:50 ` [PATCH v4 32/32] Extract a function commit_lock_file_to() Michael Haggerty
2014-09-07 14:21 ` [PATCH v4 00/32] Lockfile correctness and refactoring Torsten Bögershausen
2014-09-12 12:50   ` Michael Haggerty [this message]
2014-09-08 22:35 ` Junio C Hamano
2014-09-10  8:13 ` Jeff King
2014-09-10 10:25   ` Duy Nguyen
2014-09-10 10:30     ` Jeff King
2014-09-10 16:51       ` Junio C Hamano
2014-09-10 19:11         ` Jeff King
2014-09-12 11:28           ` Michael Haggerty
2014-09-12 11:13         ` Michael Haggerty
2014-09-12 14:21   ` Michael Haggerty
2014-09-13 18:51     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5412EC0F.5090601@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=peff@peff.net \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).