git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Stefan Beller <sbeller@google.com>
Cc: git <git@vger.kernel.org>
Subject: Re: File locking issues with fetching submodules in parallel
Date: Tue, 20 Feb 2018 23:27:10 +0100 (STD)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.1802202314180.31@ZVAVAG-6OXH6DA.rhebcr.pbec.zvpebfbsg.pbz> (raw)
In-Reply-To: <CAGZ79kZXtE84nfO_nx+H61+2B9SU7ASD4r4JozwV=8HZ050SCA@mail.gmail.com>

Hi Stefan,

On Tue, 20 Feb 2018, Stefan Beller wrote:

> On Tue, Feb 20, 2018 at 8:20 AM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > Hi Stefan (and other submodule wizards),
> >
> > Igor Melnichenko reported a submodule problem in a Git for Windows ticket:
> >
> >         https://github.com/git-for-windows/git/issues/1469
> >
> 
> Thanks for bringing this up. Is it better to discuss it here on at that
> issue?

It's up to you. I have no preference.

> > Part of it is due to Windows' behavior where you cannot read the same
> > file in one process while you write it in another process.
> >
> > The other part is how our submodule code works in parallel. In
> > particular, we seem to write the `.git` file maybe even every time a
> > submodule is fetched, unconditionally, not even testing whether the
> > .git file is already there and contains the correct contents?
> 
> only when they need to be newly cloned (as that is when the
> submodule--helper clone is invoked). When fetching, we'd not touch the
> '.git' file IIRC.

Okay. I saw multiple calls to the same connect_work_tree_and_git_dir()
function in submodule.c, and was not so sure that it was only clone.

But the bug report talks about `git submodule add` with a new URL and a
new submodule location, so I guess it is going down the clone path.

> > For some reason, the bug reporter saw a "Permission denied" on the
> > `.git` file when we try to write it (and I am pretty certain that I
> > tracked it down correctly to the `connect_work_tree_and_git_dir()`
> > function). The intermittent "Permission denied" error seems to suggest
> > that *another* process is accessing this file while we are writing it.
> 
> The reporter also reports this coming out of "git submodule add",
> which is not parallelized, but rather in the messy state of migrating
> as much code into C from shell, trying to not introduce to many bugs. ;)
> 
> So for add there is no parallelism I am aware of, which makes me wonder
> how the race is coming in there.

Hrm. You're right, there is no indication in that bug report that `git
submodule` spawns multiple processes in parallel.

> I recall making the decision to unconditionally write out the '.git' file
> for other operations like "git reset --hard --recurse-submodules" or
> "git checkout --force --recurse-submodules", as there you are really asking
> for a clean slate. I missed the Windows FS semantics for that case,
> so I guess re-reading the file to make sure it is already correct would
> avoid issues on Windows there. (though I do not yet recall how the
> parallel stuff would bite us there, as the file is written as the very
> first thing)

Maybe it is not parallel, but maybe it is simply an unclosed handle to the
.git file? I do not see anything in read_gitfile_gently() (which is the
only location reading the .git file I know of)... Did I miss anything?

> > It also seems that this problem becomes worse if the firewall is turned
> > on, in which case a couple of network operations become a little slower
> > (which I could imagine to be the factor making the problems more likely).
> >
> > A plausible workaround would be to write the `.git` file only when needed
> > (which also would fix a bug on macOS/Linux, although the window is
> > substantially smaller: the reader could potentially read a
> > partially-written file).
> >
> > But maybe we are simply holding onto an open handle to the `.git` file
> > too long?
> 
> That could very well be the case.

But as I said above, I fail to find any smoking gun in the source code.
The only code path I see that reads the .git file releases the resources
correctly AFAICT.

> > I tried to put together a bit more information here:
> >
> > https://github.com/git-for-windows/git/issues/1469#issuecomment-366932746
> 
> Thanks!
> 
> > Do you think there is an easy solution for this? You're much deeper in the
> > submodule code than I ever want to be...
> 
> I'll take a look for open handles, though we're already using our
> wrapper functions
> like 'write_file', which is supposed to close the handle gracefully (and has
> undergone testing on Windows a lot as it is the standard write function)

Right, even some of my code uses this successfully...

Curious problem.

Ciao,
Dscho

      reply	other threads:[~2018-02-20 22:27 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-20 16:20 File locking issues with fetching submodules in parallel Johannes Schindelin
2018-02-20 20:40 ` Stefan Beller
2018-02-20 22:27   ` Johannes Schindelin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.1802202314180.31@ZVAVAG-6OXH6DA.rhebcr.pbec.zvpebfbsg.pbz \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).