git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git <git@vger.kernel.org>
Subject: Re: File locking issues with fetching submodules in parallel
Date: Tue, 20 Feb 2018 12:40:02 -0800
Message-ID: <CAGZ79kZXtE84nfO_nx+H61+2B9SU7ASD4r4JozwV=8HZ050SCA@mail.gmail.com> (raw)
In-Reply-To: <nycvar.QRO.7.76.6.1802201709570.31@ZVAVAG-6OXH6DA.rhebcr.pbec.zvpebfbsg.pbz>

On Tue, Feb 20, 2018 at 8:20 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Stefan (and other submodule wizards),
>
> Igor Melnichenko reported a submodule problem in a Git for Windows ticket:
>
>         https://github.com/git-for-windows/git/issues/1469
>

Thanks for bringing this up. Is it better to discuss it here on at that issue?

> Part of it is due to Windows' behavior where you cannot read the same file
> in one process while you write it in another process.
>
> The other part is how our submodule code works in parallel. In particular,
> we seem to write the `.git` file maybe even every time a submodule is
> fetched, unconditionally, not even testing whether the .git file is
> already there and contains the correct contents?

only when they need to be newly cloned (as that is when the
submodule--helper clone is invoked). When fetching, we'd not touch
the '.git' file IIRC.

> For some reason, the bug reporter saw a "Permission denied" on the `.git`
> file when we try to write it (and I am pretty certain that I tracked it
> down correctly to the `connect_work_tree_and_git_dir()` function). The
> intermittent "Permission denied" error seems to suggest that *another*
> process is accessing this file while we are writing it.

The reporter also reports this coming out of "git submodule add",
which is not parallelized, but rather in the messy state of migrating
as much code into C from shell, trying to not introduce to many bugs. ;)

So for add there is no parallelism I am aware of, which makes me wonder
how the race is coming in there.

I recall making the decision to unconditionally write out the '.git' file
for other operations like "git reset --hard --recurse-submodules" or
"git checkout --force --recurse-submodules", as there you are really asking
for a clean slate. I missed the Windows FS semantics for that case,
so I guess re-reading the file to make sure it is already correct would
avoid issues on Windows there. (though I do not yet recall how the
parallel stuff would bite us there, as the file is written as the very
first thing)

> It also seems that this problem becomes worse if the firewall is turned
> on, in which case a couple of network operations become a little slower
> (which I could imagine to be the factor making the problems more likely).
>
> A plausible workaround would be to write the `.git` file only when needed
> (which also would fix a bug on macOS/Linux, although the window is
> substantially smaller: the reader could potentially read a
> partially-written file).
>
> But maybe we are simply holding onto an open handle to the `.git` file too
> long?

That could very well be the case.

> I tried to put together a bit more information here:
>
> https://github.com/git-for-windows/git/issues/1469#issuecomment-366932746

Thanks!

> Do you think there is an easy solution for this? You're much deeper in the
> submodule code than I ever want to be...

I'll take a look for open handles, though we're already using our
wrapper functions
like 'write_file', which is supposed to close the handle gracefully (and has
undergone testing on Windows a lot as it is the standard write function)

Thanks,
Stefan

  reply index

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-20 16:20 Johannes Schindelin
2018-02-20 20:40 ` Stefan Beller [this message]
2018-02-20 22:27   ` Johannes Schindelin

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kZXtE84nfO_nx+H61+2B9SU7ASD4r4JozwV=8HZ050SCA@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox