git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] git: add --no-optional-locks option
@ 2017-09-21  4:32 Jeff King
  2017-09-21  4:55 ` Junio C Hamano
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Jeff King @ 2017-09-21  4:32 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
to lock the index and update it, 2016-08-12). Folks working on GitHub
Desktop complained to me that it's only available on Windows. :)

I expanded the scope a bit to let us give the same treatment to more
commands in the long run.  I'd also be OK with just cherry-picking your
patch to non-Windows Git if you don't find my reasoning below
compelling. But I think we need _something_ like this, as the other
solutions I could come up with don't seem very promising.

-Peff

-- >8 --
Some tools like IDEs or fancy editors may periodically run
commands like "git status" in the background to keep track
of the state of the repository. Some of these commands may
refresh the index and write out the result in an
opportunistic way: if they can get the index lock, then they
update the on-disk index with any updates they find. And if
not, then their in-core refresh is lost and just has to be
recomputed by the next caller.

But taking the index lock may conflict with other operations
in the repository. Especially ones that the user is doing
themselves, which _aren't_ opportunistic. In other words,
"git status" knows how to back off when somebody else is
holding the lock, but other commands don't know that status
would be happy to drop the lock if somebody else wanted it.

There are a couple possible solutions:

  1. Have some kind of "pseudo-lock" that allows other
     commands to tell status that they want the lock.

     This is likely to be complicated and error-prone to
     implement (and maybe even impossible with just
     dotlocks to work from, as it requires some
     inter-process communication).

  2. Avoid background runs of commands like "git status"
     that want to do opportunistic updates, preferring
     instead plumbing like diff-files, etc.

     This is awkward for a couple of reasons. One is that
     "status --porcelain" reports a lot more about the
     repository state than is available from individual
     plumbing commands. And two is that we actually _do_
     want to see the refreshed index. We just don't want to
     take a lock or write out the result. Whereas commands
     like diff-files expect us to refresh the index
     separately and write it to disk so that they can depend
     on the result. But that write is exactly what we're
     trying to avoid.

  3. Ask "status" not to lock or write the index.

     This is easy to implement. The big downside is that any
     work done in refreshing the index for such a call is
     lost when the process exits. So a background process
     may end up re-hashing a changed file multiple times
     until the user runs a command that does an index
     refresh themselves.

This patch implements the option 3. The idea (and the test)
is largely stolen from a Git for Windows patch by Johannes
Schindelin, 67e5ce7f63 (status: offer *not* to lock the
index and update it, 2016-08-12). The twist here is that
instead of making this an option to "git status", it becomes
a "git" option and matching environment variable.

The reason there is two-fold:

  1. An environment variable is carried through to
     sub-processes. And whether an invocation is a
     background process or not should apply to the whole
     process tree. So you could do "git --no-optional-locks
     foo", and if "foo" is a script or alias that calls
     "status", you'll still get the effect.

  2. There may be other programs that want the same
     treatment.

     I've punted here on finding more callers to convert,
     since "status" is the obvious one to call as a repeated
     background job. But "git diff"'s opportunistic refresh
     of the index may be a good candidate.

The test is taken from 67e5ce7f63, and it's worth repeating
Johannes's explanation:

  Note that the regression test added in this commit does
  not *really* verify that no index.lock file was written;
  that test is not possible in a portable way. Instead, we
  verify that .git/index is rewritten *only* when `git
  status` is run without `--no-optional-locks`.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git.txt | 13 +++++++++++++
 builtin/commit.c      |  5 ++++-
 cache.h               |  6 ++++++
 environment.c         |  5 +++++
 git.c                 |  4 ++++
 t/t7508-status.sh     | 10 ++++++++++
 6 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 6e3a6767e5..8dd3ae05ae 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -159,6 +159,10 @@ foo.bar= ...`) sets `foo.bar` to the empty string which ` git config
 	Add "icase" magic to all pathspec. This is equivalent to setting
 	the `GIT_ICASE_PATHSPECS` environment variable to `1`.
 
+--no-optional-locks::
+	Do not perform optional operations that require locks. This is
+	equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`.
+
 GIT COMMANDS
 ------------
 
@@ -697,6 +701,15 @@ of clones and fetches.
 	which feed potentially-untrusted URLS to git commands.  See
 	linkgit:git-config[1] for more details.
 
+`GIT_OPTIONAL_LOCKS`::
+	If set to `0`, Git will avoid performing any operations which
+	require taking a lock and which are not required to complete the
+	requested operation. For example, this will prevent `git status`
+	from refreshing the index as a side effect. This is useful for
+	processes running in the background which do not want to cause
+	lock contention with other operations on the repository.
+	Defaults to `1`.
+
 Discussion[[Discussion]]
 ------------------------
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 58f9747c2f..fafd492029 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1387,7 +1387,10 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	read_cache_preload(&s.pathspec);
 	refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, &s.pathspec, NULL, NULL);
 
-	fd = hold_locked_index(&index_lock, 0);
+	if (use_optional_locks())
+		fd = hold_locked_index(&index_lock, 0);
+	else
+		fd = -1;
 
 	s.is_initial = get_oid(s.reference, &oid) ? 1 : 0;
 	if (!s.is_initial)
diff --git a/cache.h b/cache.h
index a916bc79e3..fea400c5ec 100644
--- a/cache.h
+++ b/cache.h
@@ -443,6 +443,7 @@ static inline enum object_type object_type(unsigned int mode)
 #define GIT_NOGLOB_PATHSPECS_ENVIRONMENT "GIT_NOGLOB_PATHSPECS"
 #define GIT_ICASE_PATHSPECS_ENVIRONMENT "GIT_ICASE_PATHSPECS"
 #define GIT_QUARANTINE_ENVIRONMENT "GIT_QUARANTINE_PATH"
+#define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
 
 /*
  * This environment variable is expected to contain a boolean indicating
@@ -782,6 +783,11 @@ extern int protect_ntfs;
  */
 extern int ref_paranoia;
 
+/*
+ * Returns the boolean value of $GIT_OPTIONAL_LOCKS (or the default value).
+ */
+int use_optional_locks(void);
+
 /*
  * The character that begins a commented line in user-editable file
  * that is subject to stripspace.
diff --git a/environment.c b/environment.c
index f1f934b6fd..8289c25b44 100644
--- a/environment.c
+++ b/environment.c
@@ -338,3 +338,8 @@ void reset_shared_repository(void)
 {
 	need_shared_repository_from_config = 1;
 }
+
+int use_optional_locks(void)
+{
+	return git_env_bool(GIT_OPTIONAL_LOCKS_ENVIRONMENT, 1);
+}
diff --git a/git.c b/git.c
index f31dca6962..9e96dd4090 100644
--- a/git.c
+++ b/git.c
@@ -182,6 +182,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
 			setenv(GIT_ICASE_PATHSPECS_ENVIRONMENT, "1", 1);
 			if (envchanged)
 				*envchanged = 1;
+		} else if (!strcmp(cmd, "--no-optional-locks")) {
+			setenv(GIT_OPTIONAL_LOCKS_ENVIRONMENT, "0", 1);
+			if (envchanged)
+				*envchanged = 1;
 		} else if (!strcmp(cmd, "--shallow-file")) {
 			(*argv)++;
 			(*argc)--;
diff --git a/t/t7508-status.sh b/t/t7508-status.sh
index 43d19a9b22..93f162a4f7 100755
--- a/t/t7508-status.sh
+++ b/t/t7508-status.sh
@@ -1670,4 +1670,14 @@ test_expect_success '"Initial commit" should not be noted in commit template' '
 	test_i18ngrep ! "Initial commit" output
 '
 
+test_expect_success '--no-optional-locks prevents index update' '
+	test-chmtime =1234567890 .git/index &&
+	git --no-optional-locks status &&
+	test-chmtime -v +0 .git/index >out &&
+	grep ^1234567890 out &&
+	git status &&
+	test-chmtime -v +0 .git/index >out &&
+	! grep ^1234567890 out
+'
+
 test_done
-- 
2.14.1.1040.gcaf8795f39

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-21  4:32 [PATCH] git: add --no-optional-locks option Jeff King
@ 2017-09-21  4:55 ` Junio C Hamano
  2017-09-21  5:08   ` Jeff King
  2017-09-21 18:25 ` Johannes Sixt
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2017-09-21  4:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git

Jeff King <peff@peff.net> writes:

> Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
> to lock the index and update it, 2016-08-12). Folks working on GitHub
> Desktop complained to me that it's only available on Windows. :)
>
> I expanded the scope a bit to let us give the same treatment to more
> commands in the long run.  I'd also be OK with just cherry-picking your
> patch to non-Windows Git if you don't find my reasoning below
> compelling. But I think we need _something_ like this, as the other
> solutions I could come up with don't seem very promising.

The phrase 'optional lock' does not answer this question clearly,
though: does it make sense to extend the coverage of this option in
the future to things more than the "opportunistic update to the
index file"?

If the answer is no, then having 'index' instead of 'lock' in the
name of the option would make more sense (and 'opportunistic' over
'optional', too), because what the change is about is to allow other
processes that are directly interacting with the user to update the
index, and 'lock' being hindrance is merely an implementation
detail.  The comment on the "test" in the log message mentions as if
it were a short-coming that it does not check the lock but checks
if the index is written, but I think that is testing what matters
and preferable than testing "did we lock and then unlock it?"

On the other hand, if the answer is yes, then I am curious what
other things this may extend to, and if these other things are also
opportunistic optimizations.

Other than that, I think this change (including the part that this
is done globally and down to subprocesses as needed) makes sense.

Thanks (and sorry for not being Johannes ;-).



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-21  4:55 ` Junio C Hamano
@ 2017-09-21  5:08   ` Jeff King
  2017-09-21  5:29     ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-09-21  5:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

On Thu, Sep 21, 2017 at 01:55:58PM +0900, Junio C Hamano wrote:

> The phrase 'optional lock' does not answer this question clearly,
> though: does it make sense to extend the coverage of this option in
> the future to things more than the "opportunistic update to the
> index file"?
> 
> If the answer is no, then having 'index' instead of 'lock' in the
> name of the option would make more sense (and 'opportunistic' over
> 'optional', too), because what the change is about is to allow other
> processes that are directly interacting with the user to update the
> index, and 'lock' being hindrance is merely an implementation
> detail.  The comment on the "test" in the log message mentions as if
> it were a short-coming that it does not check the lock but checks
> if the index is written, but I think that is testing what matters
> and preferable than testing "did we lock and then unlock it?"
> 
> On the other hand, if the answer is yes, then I am curious what
> other things this may extend to, and if these other things are also
> opportunistic optimizations.

I left it intentionally vague exactly because I thought we might want to
leave room for the answer to change to "yes" eventually.  For instance,
imagine that we had a ref storage format that required periodic
compaction, and readers might sometimes choose to compact in order to
save future readers from repeating some work they've done. If that
compaction means holding a lock even for a brief period, I think it
would fall under this option.

I admit that's just adding more hand-waving to the pile. But I don't
think it really _hurts_ to leave that door open (aside from making the
documentation a bit wishy-washy). And it helps because callers would
pick up the new features automatically, without having to learn about a
new option.

And I think that's really what this option is. It is less about the
caller asking for some specific behavior, and more about them telling
Git about the context in which it's executing so it can make intelligent
decisions.

And in that sense, something descriptive like --background-process
perhaps would be a better name. Except that I couldn't come up with a
name that isn't confusing (certainly --background-process implies to me
that Git would itself run in the background, which makes no sense here).

I also considered something like "--read-only" to tell Git that we
should avoid writing to the repository. But that's really not what this
does. It just avoids writes that may cause contention, not all writes.

I also considered using the word "opportunistic" in the option name, but
decided it was too long and hard to spell.

So there. I am open to a better name, but I could not come up with one.

> Thanks (and sorry for not being Johannes ;-).

You lack his rugged good looks, but your review was still welcome.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-21  5:08   ` Jeff King
@ 2017-09-21  5:29     ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2017-09-21  5:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git

Jeff King <peff@peff.net> writes:

> I admit that's just adding more hand-waving to the pile. But I don't
> think it really _hurts_ to leave that door open (aside from making the
> documentation a bit wishy-washy). And it helps because callers would
> pick up the new features automatically, without having to learn about a
> new option.

Oh, I do agree it is a good idea to leave that door open, so that
scripts that rely on today's --no-optional-locks option will not
have to be updated when a similar feature (like the "ref compaction"
example you mentioned) against which "let's disable when we are not
the primary process, in order to keep the interference to the
minimum" would be something we would want to say.  The option being
added here should cover these future needs.

> And I think that's really what this option is. It is less about the
> caller asking for some specific behavior, and more about them telling
> Git about the context in which it's executing so it can make intelligent
> decisions.

Yes, indeed.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-21  4:32 [PATCH] git: add --no-optional-locks option Jeff King
  2017-09-21  4:55 ` Junio C Hamano
@ 2017-09-21 18:25 ` Johannes Sixt
  2017-09-22  4:25   ` Jeff King
  2017-09-22  6:42 ` Daniel Santos
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Johannes Sixt @ 2017-09-21 18:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git

Am 21.09.2017 um 06:32 schrieb Jeff King:
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 6e3a6767e5..8dd3ae05ae 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -159,6 +159,10 @@ foo.bar= ...`) sets `foo.bar` to the empty string which ` git config
>   	Add "icase" magic to all pathspec. This is equivalent to setting
>   	the `GIT_ICASE_PATHSPECS` environment variable to `1`.
>   
> +--no-optional-locks::
> +	Do not perform optional operations that require locks. This is
> +	equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`.
> +
>   GIT COMMANDS
>   ------------
>   
> @@ -697,6 +701,15 @@ of clones and fetches.
>   	which feed potentially-untrusted URLS to git commands.  See
>   	linkgit:git-config[1] for more details.
>   
> +`GIT_OPTIONAL_LOCKS`::
> +	If set to `0`, Git will avoid performing any operations which
> +	require taking a lock and which are not required to complete the
> +	requested operation. For example, this will prevent `git status`
> +	from refreshing the index as a side effect. This is useful for
> +	processes running in the background which do not want to cause
> +	lock contention with other operations on the repository.
> +	Defaults to `1`.

I don't think we should pass this environment variable to remote 
repositories. It should be listed in local_repo_env[] in environment.c.

-- Hannes

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-21 18:25 ` Johannes Sixt
@ 2017-09-22  4:25   ` Jeff King
  2017-09-22 11:22     ` Jeff Hostetler
  2017-09-22 20:09     ` Stefan Beller
  0 siblings, 2 replies; 26+ messages in thread
From: Jeff King @ 2017-09-22  4:25 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Johannes Schindelin, git

On Thu, Sep 21, 2017 at 08:25:50PM +0200, Johannes Sixt wrote:

> > +`GIT_OPTIONAL_LOCKS`::
> > +	If set to `0`, Git will avoid performing any operations which
> > +	require taking a lock and which are not required to complete the
> > +	requested operation. For example, this will prevent `git status`
> > +	from refreshing the index as a side effect. This is useful for
> > +	processes running in the background which do not want to cause
> > +	lock contention with other operations on the repository.
> > +	Defaults to `1`.
> 
> I don't think we should pass this environment variable to remote
> repositories. It should be listed in local_repo_env[] in environment.c.

I'm not sure I agree. This is really about the context in which the
command is executing, not anything about the particular repository
you're operating on.

For fetch/push operations that touch a remote, I doubt it would matter
either way (and anyway, those often cross network boundaries that don't
propagate environment variables anyway).

But imagine that "git status" learns to recurse into submodules and run
"git status" inside them. Surely we would want the submodule repos to
also avoid taking any unnecessary locks?

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-21  4:32 [PATCH] git: add --no-optional-locks option Jeff King
  2017-09-21  4:55 ` Junio C Hamano
  2017-09-21 18:25 ` Johannes Sixt
@ 2017-09-22  6:42 ` Daniel Santos
  2017-09-22 16:04   ` Jeff King
  2017-09-24 11:31 ` Kaartic Sivaraam
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Daniel Santos @ 2017-09-22  6:42 UTC (permalink / raw)
  To: Jeff King, Johannes Schindelin; +Cc: git

On 09/20/2017 11:32 PM, Jeff King wrote:
> Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
> to lock the index and update it, 2016-08-12). Folks working on GitHub
> Desktop complained to me that it's only available on Windows. :)
>
> I expanded the scope a bit to let us give the same treatment to more
> commands in the long run.  I'd also be OK with just cherry-picking your
> patch to non-Windows Git if you don't find my reasoning below
> compelling. But I think we need _something_ like this, as the other
> solutions I could come up with don't seem very promising.
>
> -Peff
>
> -- >8 --
> Some tools like IDEs or fancy editors may periodically run
> commands like "git status" in the background to keep track
> of the state of the repository. Some of these commands may
> refresh the index and write out the result in an
> opportunistic way: if they can get the index lock, then they
> update the on-disk index with any updates they find. And if
> not, then their in-core refresh is lost and just has to be
> recomputed by the next caller.
>
> But taking the index lock may conflict with other operations
> in the repository. Especially ones that the user is doing
> themselves, which _aren't_ opportunistic. In other words,
> "git status" knows how to back off when somebody else is
> holding the lock, but other commands don't know that status
> would be happy to drop the lock if somebody else wanted it.

Interestingly, this usually slaps me when performing an _interactive_
rebase.  It occurred to me that if I'm performing an interaction
operation, it doesn't seem unreasonable for git wait up to 125ms or so
for the lock and then prompting the user to ask if they want to continue
waiting for the lock.

> There are a couple possible solutions:
>
>   1. Have some kind of "pseudo-lock" that allows other
>      commands to tell status that they want the lock.
>
>      This is likely to be complicated and error-prone to
>      implement (and maybe even impossible with just
>      dotlocks to work from, as it requires some
>      inter-process communication).
>
>   2. Avoid background runs of commands like "git status"
>      that want to do opportunistic updates, preferring
>      instead plumbing like diff-files, etc.
>
>      This is awkward for a couple of reasons. One is that
>      "status --porcelain" reports a lot more about the
>      repository state than is available from individual
>      plumbing commands. And two is that we actually _do_
>      want to see the refreshed index. We just don't want to
>      take a lock or write out the result. Whereas commands
>      like diff-files expect us to refresh the index
>      separately and write it to disk so that they can depend
>      on the result. But that write is exactly what we're
>      trying to avoid.
>
>   3. Ask "status" not to lock or write the index.
>
>      This is easy to implement. The big downside is that any
>      work done in refreshing the index for such a call is
>      lost when the process exits. So a background process
>      may end up re-hashing a changed file multiple times
>      until the user runs a command that does an index
>      refresh themselves.

That is not necessarily the case.  I don't actually know git on the
inside, but I would ask you to consider a read-write lock and a hybrid
of one and three.

I don't know what dotlocks are, but I'm certain that you can implement a
rw lock using lock files and no other IPC, although it does increase the
complexity.  The way this works is that `git status' acquires a read
lock and does its thing.  If it has real changes, instead of discarding
them it attempts to upgrade to a write lock.  If that fails, you throw
it away, otherwise you write them and release.

In order to implement rw locks with only lock files, "off the cuff" I
say you have a single "lock list" file that should never be deleted and
a "lock lock" file that is held in order to read or modify the list. 
The format of the lock list would have a pair of 32-bit wrapping
modification counts (or versions) at the top -- one for modifications to
the lock list its self and another for modifications to the underlying
data (i.e., the number of times a write lock has been acquired).  This
header is followed by entries something like this:

<operation> <pid> <version> <timestamp>

<operation>  'r' if waiting for a read lock
             'R' if actively reading
             'w' if waiting for write lock
             'W' if actively writing
<pid>        The pid
<version>    If active, the version of the data at the time lock acquired or zero.
<timestamp>  Time began waiting or time lock acquired


An operation of 'r' or 'w' means that you are waiting and upper case
means that you are active.  <version> is the version of the data at the
time the lock became active and writers increment it when they acquire a
lock.  You wait with file alteration notification on the lock list (if
there is any doubt based upon timestamp precision then you can examine
the lock list version).  When you want to read or write, you lock the
lock-lock (this sounds like a joke... "Lock, lock.", "Who's there?",
"Reader.", "Reader who?"....) and examine the lock list.  If it's empty,
you add yourself as an active reader or writer with an upper case 'R' or
'W' and release the lock-lock.  If there are only readers, and you want
to read, you add yourself as an active reader.  The version of the lock
list is incremented every time it is modified.

Read-write locks need to be given a priority policy of either readers,
writers, fifo or don't-care.  In this case that should probably be
writers.  So if you want to read and there is an active or waiting
writer, the prospective reader would either add themselves with an 'r'
or fail if they don't want to wait.  If a process wants to write and
there are active readers or writers, it adds its self to the list or
fails as well.  When all active readers have exited, then which ever
prospective writers gets the lock-lock first can make themselves the
active writer.  When a process acquires a write lock, it increases the
data version number.  If a reader lock tries to upgrade to a writer lock
but the data version changed than it fails.

Is there not already a library somewhere that does this?  Either way,
your current effort seems like a step in the right direction -- and
thanks for that!

Daniel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-22  4:25   ` Jeff King
@ 2017-09-22 11:22     ` Jeff Hostetler
  2017-09-22 15:04       ` Jeff King
  2017-09-22 20:09     ` Stefan Beller
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff Hostetler @ 2017-09-22 11:22 UTC (permalink / raw)
  To: Jeff King, Johannes Sixt; +Cc: Johannes Schindelin, git



On 9/22/2017 12:25 AM, Jeff King wrote:
> On Thu, Sep 21, 2017 at 08:25:50PM +0200, Johannes Sixt wrote:
> 
>>> +`GIT_OPTIONAL_LOCKS`::
>>> +	If set to `0`, Git will avoid performing any operations which
>>> +	require taking a lock and which are not required to complete the
>>> +	requested operation. For example, this will prevent `git status`
>>> +	from refreshing the index as a side effect. This is useful for
>>> +	processes running in the background which do not want to cause
>>> +	lock contention with other operations on the repository.
>>> +	Defaults to `1`. >>
>> I don't think we should pass this environment variable to remote
>> repositories. It should be listed in local_repo_env[] in environment.c.
> 
> I'm not sure I agree. This is really about the context in which the
> command is executing, not anything about the particular repository
> you're operating on.
> 
> For fetch/push operations that touch a remote, I doubt it would matter
> either way (and anyway, those often cross network boundaries that don't
> propagate environment variables anyway).
> 
> But imagine that "git status" learns to recurse into submodules and run
> "git status" inside them. Surely we would want the submodule repos to
> also avoid taking any unnecessary locks?
> 
> -Peff
> 

https://github.com/git-for-windows/git/commit/ff63b51c22389139a864eb2e565c6cdc5a30f061

https://github.com/git-for-windows/git/pull/1004/commits/45bad66192352481acbc826f11d90c8928b39a7a

We should compare this with what we did in Git for Windows last fall.
I guess those commits didn't get pushed upstream.

We added '--no-lock-index' to keep status from locking the index
during status and effectively being read-only.  This helped with
problems with Visual Studio similar to the ones being described
for KDevelop.

Jeff


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-22 11:22     ` Jeff Hostetler
@ 2017-09-22 15:04       ` Jeff King
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2017-09-22 15:04 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Johannes Sixt, Johannes Schindelin, git

On Fri, Sep 22, 2017 at 07:22:28AM -0400, Jeff Hostetler wrote:

> > > I don't think we should pass this environment variable to remote
> > > repositories. It should be listed in local_repo_env[] in environment.c.
> > 
> > I'm not sure I agree. This is really about the context in which the
> > command is executing, not anything about the particular repository
> > you're operating on.
> > 
> > For fetch/push operations that touch a remote, I doubt it would matter
> > either way (and anyway, those often cross network boundaries that don't
> > propagate environment variables anyway).
> > 
> > But imagine that "git status" learns to recurse into submodules and run
> > "git status" inside them. Surely we would want the submodule repos to
> > also avoid taking any unnecessary locks?
> > 
> > -Peff
> > 
> 
> https://github.com/git-for-windows/git/commit/ff63b51c22389139a864eb2e565c6cdc5a30f061
> 
> https://github.com/git-for-windows/git/pull/1004/commits/45bad66192352481acbc826f11d90c8928b39a7a
> 
> We should compare this with what we did in Git for Windows last fall.
> I guess those commits didn't get pushed upstream.

Right. I think you missed the initial message in the thread that
explains how this is an expanded version of ff63b51c22. :)

I didn't know about the environment thing in 45bad66192, though[1]. That
makes me even more confident that this is the right approach.

-Peff

[1] Sorry for not doing my homework more carefully on the existing
    solution.  GitHub Desktop ran into the same situation and pointed me
    at ff63b51c22. I extrapolated the rest of it on my own. ;)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-22  6:42 ` Daniel Santos
@ 2017-09-22 16:04   ` Jeff King
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2017-09-22 16:04 UTC (permalink / raw)
  To: Daniel Santos; +Cc: Johannes Schindelin, git

On Fri, Sep 22, 2017 at 01:42:10AM -0500, Daniel Santos wrote:

> > But taking the index lock may conflict with other operations
> > in the repository. Especially ones that the user is doing
> > themselves, which _aren't_ opportunistic. In other words,
> > "git status" knows how to back off when somebody else is
> > holding the lock, but other commands don't know that status
> > would be happy to drop the lock if somebody else wanted it.
> 
> Interestingly, this usually slaps me when performing an _interactive_
> rebase.  It occurred to me that if I'm performing an interaction
> operation, it doesn't seem unreasonable for git wait up to 125ms or so
> for the lock and then prompting the user to ask if they want to continue
> waiting for the lock.

Yes, lock timeouts would help with this situation, though not eliminate
it entirely (and we've been adding some lock timeouts in a few places
for related situations).

We generally avoid prompting, especially when a command can just be
reissued. It sounds like "git rebase" gets into a funny state if it
cannot grab the index lock. That's something that should be fixed. Even
if it bails, you should be able to move forward with "rebase --continue"
or similar.

> That is not necessarily the case.  I don't actually know git on the
> inside, but I would ask you to consider a read-write lock and a hybrid
> of one and three.
> 
> I don't know what dotlocks are, but I'm certain that you can implement a
> rw lock using lock files and no other IPC, although it does increase the
> complexity.  The way this works is that `git status' acquires a read
> lock and does its thing.  If it has real changes, instead of discarding
> them it attempts to upgrade to a write lock.  If that fails, you throw
> it away, otherwise you write them and release.

By dotlocks, I just mean our strategy of creating O_EXCL "index.lock"
files.

You're right that it can be done using two such locks, and communicating
via the lockfile contents. So my "impossible" was overstating it. I do
stand by my contention that it's much more complex than the existing
scheme. :)

More importantly, though, it changes the locking contract completely
between old and new versions (and between other implementations).
There's probably only a small minority of users who might use two
implementations to simultaneously access the same repository, but it is
a use case we try to support. I think we'd have to require a
repository-version bump to tell programs to use the new scheme.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-22  4:25   ` Jeff King
  2017-09-22 11:22     ` Jeff Hostetler
@ 2017-09-22 20:09     ` Stefan Beller
  2017-09-22 21:25       ` Jeff King
  1 sibling, 1 reply; 26+ messages in thread
From: Stefan Beller @ 2017-09-22 20:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Johannes Schindelin, git@vger.kernel.org

On Thu, Sep 21, 2017 at 9:25 PM, Jeff King <peff@peff.net> wrote:

>
> But imagine that "git status" learns to recurse into submodules and run
> "git status" inside them. Surely we would want the submodule repos to
> also avoid taking any unnecessary locks?

You can teach Git to recurse into submodules already at home,
just 'git config status.submoduleSummary none'. ;)

It occurs to me that the config name is badly choosen, as it stores
an argument for git status --ignore-submodules[=mode]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-22 20:09     ` Stefan Beller
@ 2017-09-22 21:25       ` Jeff King
  2017-09-22 21:41         ` Stefan Beller
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-09-22 21:25 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes Sixt, Johannes Schindelin, git@vger.kernel.org

On Fri, Sep 22, 2017 at 01:09:32PM -0700, Stefan Beller wrote:

> On Thu, Sep 21, 2017 at 9:25 PM, Jeff King <peff@peff.net> wrote:
> 
> >
> > But imagine that "git status" learns to recurse into submodules and run
> > "git status" inside them. Surely we would want the submodule repos to
> > also avoid taking any unnecessary locks?
> 
> You can teach Git to recurse into submodules already at home,
> just 'git config status.submoduleSummary none'. ;)
> 
> It occurs to me that the config name is badly choosen, as it stores
> an argument for git status --ignore-submodules[=mode]

Ah, thanks. I _thought_ we could already do that but when I went looking
for the standard --recursive option I couldn't find it.

So yes, I would think we would want this option to apply recursively in
that case, even when we cross repository boundaries.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-22 21:25       ` Jeff King
@ 2017-09-22 21:41         ` Stefan Beller
  2017-09-23  3:34           ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Beller @ 2017-09-22 21:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Johannes Schindelin, git@vger.kernel.org

On Fri, Sep 22, 2017 at 2:25 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Sep 22, 2017 at 01:09:32PM -0700, Stefan Beller wrote:
>
>> On Thu, Sep 21, 2017 at 9:25 PM, Jeff King <peff@peff.net> wrote:
>>
>> >
>> > But imagine that "git status" learns to recurse into submodules and run
>> > "git status" inside them. Surely we would want the submodule repos to
>> > also avoid taking any unnecessary locks?
>>
>> You can teach Git to recurse into submodules already at home,
>> just 'git config status.submoduleSummary none'. ;)
>>
>> It occurs to me that the config name is badly choosen, as it stores
>> an argument for git status --ignore-submodules[=mode]
>
> Ah, thanks. I _thought_ we could already do that but when I went looking
> for the standard --recursive option I couldn't find it.

Thanks for checking for submodules there.

I personally prefer --recurse-submodules despite the longer name,
because a plain recursive option can mean anything in a sufficiently
complex program such as Git (recurse into the tree (c.f. ls-tree), or
for the algorithm used (c.f. merge, diff) or yet another dimension
I did not think of).

> So yes, I would think we would want this option to apply recursively in
> that case, even when we cross repository boundaries.

Regarding the actual patch which is heavily aimed at coping with IDEs
despite the command line being used, I wonder how many IDEs pass
--ignore-submodules and recurse themselves (if needed). Reason for
my suspicion is [1] which does pay attention to submodules:

>    Our application calls status including the following flags:
>    --porcelain=v2 --ignored --untracked-files=all --ignore-submodules=none

[1] https://public-inbox.org/git/2bbb1d0f-ae06-1878-d185-112bd51f75c9@gmail.com/

There might be another option to cope with the situation:

 4. Teach all commands to spinlock / busywait shortly for important
     locks instead of giving up. In that case git-status rewriting
     the index ought to be fine?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-22 21:41         ` Stefan Beller
@ 2017-09-23  3:34           ` Jeff King
  2017-09-25 18:51             ` Stefan Beller
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-09-23  3:34 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes Sixt, Johannes Schindelin, git@vger.kernel.org

On Fri, Sep 22, 2017 at 02:41:06PM -0700, Stefan Beller wrote:

> > Ah, thanks. I _thought_ we could already do that but when I went looking
> > for the standard --recursive option I couldn't find it.
> 
> Thanks for checking for submodules there.
> 
> I personally prefer --recurse-submodules despite the longer name,
> because a plain recursive option can mean anything in a sufficiently
> complex program such as Git (recurse into the tree (c.f. ls-tree), or
> for the algorithm used (c.f. merge, diff) or yet another dimension
> I did not think of).

Yeah, I agree that's a better name (mentioning --recursive is probably
just showing my general ignorance of submodules).

> > So yes, I would think we would want this option to apply recursively in
> > that case, even when we cross repository boundaries.
> 
> Regarding the actual patch which is heavily aimed at coping with IDEs
> despite the command line being used, I wonder how many IDEs pass
> --ignore-submodules and recurse themselves (if needed). Reason for
> my suspicion is [1] which does pay attention to submodules:
> 
> >    Our application calls status including the following flags:
> >    --porcelain=v2 --ignored --untracked-files=all --ignore-submodules=none
> 
> [1] https://public-inbox.org/git/2bbb1d0f-ae06-1878-d185-112bd51f75c9@gmail.com/

Yeah, I think that's probably Visual Studio (which is what Johannes
presumably wrote the original patch for). GitHub Desktop does not
currently pass it, though perhaps should consider doing so (though of
course the user could tweak their config, too).

> There might be another option to cope with the situation:
> 
>  4. Teach all commands to spinlock / busywait shortly for important
>      locks instead of giving up. In that case git-status rewriting
>      the index ought to be fine?

We do have all the infrastructure in place to do a reasonable busywait
with backoff. I think the patch is roughly just:

diff --git a/read-cache.c b/read-cache.c
index 65f4fe8375..fc1ba122a3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1563,7 +1563,8 @@ static int read_index_extension(struct index_state *istate,
 
 int hold_locked_index(struct lock_file *lk, int lock_flags)
 {
-	return hold_lock_file_for_update(lk, get_index_file(), lock_flags);
+	return hold_lock_file_for_update_timeout(lk, get_index_file(),
+						 lock_flags, 500);
 }
 
 int read_index(struct index_state *istate)

though I think there are a few sites which manually call
hold_lock_file_for_update() on the index that would need similar
treatment.

I suspect it would work OK in practice, unless your index is so big that
500ms isn't enough. The user may also see minor stalls when there's lock
contention. I'm not sure how annoying that would be.

-Peff

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-21  4:32 [PATCH] git: add --no-optional-locks option Jeff King
                   ` (2 preceding siblings ...)
  2017-09-22  6:42 ` Daniel Santos
@ 2017-09-24 11:31 ` Kaartic Sivaraam
  2017-09-25 16:17   ` Johannes Schindelin
  2017-09-25 16:10 ` Johannes Schindelin
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Kaartic Sivaraam @ 2017-09-24 11:31 UTC (permalink / raw)
  To: Jeff King, Johannes Schindelin; +Cc: git

On Thursday 21 September 2017 10:02 AM, Jeff King wrote:
> Some tools like IDEs or fancy editors may periodically run
> commands like "git status" in the background to keep track
> of the state of the repository.

I might be missing something, shouldn't the IDEs be encouraged to use
libgit2 instead? I thought it was meant for these use cases.

Note: I assume getting the status through libgit2 doesn't create an 
index.lock file.

>    3. Ask "status" not to lock or write the index.
> 
>       This is easy to implement. The big downside is that any
>       work done in refreshing the index for such a call is
>       lost when the process exits. So a background process
>       may end up re-hashing a changed file multiple times
>       until the user runs a command that does an index
>       refresh themselves.
> 
> This patch implements the option 3.

So, if I get that correctly "git status --no-optional-locks" is a way
to get the "current" status without updating the on disk index file?

> +`GIT_OPTIONAL_LOCKS`::
> +	If set to `0`, Git will avoid performing any operations which
> +	require taking a lock and which are not required to complete the
> +	requested operation.

The above sentence seems to be a little hard to interpret. How about,

    If set to `0`, Git will complete the requested operation without
    performing the optional sub-operations that require taking a lock.


---
Kaartic

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-21  4:32 [PATCH] git: add --no-optional-locks option Jeff King
                   ` (3 preceding siblings ...)
  2017-09-24 11:31 ` Kaartic Sivaraam
@ 2017-09-25 16:10 ` Johannes Schindelin
  2017-09-25 17:00   ` Jeff King
       [not found] ` <79ed4c34-1727-7c1e-868a-1206902638ad@gmail.com>
  2017-09-27  6:54 ` [PATCH v2] " Jeff King
  6 siblings, 1 reply; 26+ messages in thread
From: Johannes Schindelin @ 2017-09-25 16:10 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi Peff,

On Thu, 21 Sep 2017, Jeff King wrote:

> Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
> to lock the index and update it, 2016-08-12). Folks working on GitHub
> Desktop complained to me that it's only available on Windows. :)

Okay, so this is trying to help me by upstreaming a patch from Git for
Windows?

If so: thanks!

The changes, in particular the different semantics, will guarantee that I
have to work on the consumer's side here, though, leaving me even less
time for the Git mailing list, so I will need a lot more help with
upstreaming patches.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-24 11:31 ` Kaartic Sivaraam
@ 2017-09-25 16:17   ` Johannes Schindelin
  2017-09-26 14:44     ` Jeff Hostetler
  0 siblings, 1 reply; 26+ messages in thread
From: Johannes Schindelin @ 2017-09-25 16:17 UTC (permalink / raw)
  To: Kaartic Sivaraam; +Cc: Jeff King, git

Hi Kaartic,

On Sun, 24 Sep 2017, Kaartic Sivaraam wrote:

> On Thursday 21 September 2017 10:02 AM, Jeff King wrote:
> > Some tools like IDEs or fancy editors may periodically run commands
> > like "git status" in the background to keep track of the state of the
> > repository.
> 
> I might be missing something, shouldn't the IDEs be encouraged to use
> libgit2 instead? I thought it was meant for these use cases.

There are pros and cons. Visual Studio moved away from libgit2 e.g. to
support SSH (back then, libgit2 did not support spawning processes, I have
no idea whether that changed in the meantime).

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-25 16:10 ` Johannes Schindelin
@ 2017-09-25 17:00   ` Jeff King
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2017-09-25 17:00 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Mon, Sep 25, 2017 at 06:10:12PM +0200, Johannes Schindelin wrote:

> On Thu, 21 Sep 2017, Jeff King wrote:
> 
> > Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
> > to lock the index and update it, 2016-08-12). Folks working on GitHub
> > Desktop complained to me that it's only available on Windows. :)
> 
> Okay, so this is trying to help me by upstreaming a patch from Git for
> Windows?
> 
> If so: thanks!

Yes, in a round-about way. :)

> The changes, in particular the different semantics, will guarantee that I
> have to work on the consumer's side here, though, leaving me even less
> time for the Git mailing list, so I will need a lot more help with
> upstreaming patches.

I think you can get away with doing nothing for the time being. The two
patches can co-exist in the GfW codebase[1], and people using the
existing option can switch over at their leisure. Eventually you may
decide to revert 67e5ce7f63.

-Peff

[1] Modulo some trivial conflict resolution when the two are merged.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-23  3:34           ` Jeff King
@ 2017-09-25 18:51             ` Stefan Beller
  2017-09-27  6:44               ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Beller @ 2017-09-25 18:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Johannes Schindelin, git@vger.kernel.org

>> There might be another option to cope with the situation:
>>
>>  4. Teach all commands to spinlock / busywait shortly for important
>>      locks instead of giving up. In that case git-status rewriting
>>      the index ought to be fine?
>
> We do have all the infrastructure in place to do a reasonable busywait
> with backoff. I think the patch is roughly just:
>
> diff --git a/read-cache.c b/read-cache.c
> index 65f4fe8375..fc1ba122a3 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1563,7 +1563,8 @@ static int read_index_extension(struct index_state *istate,
>
>  int hold_locked_index(struct lock_file *lk, int lock_flags)
>  {
> -       return hold_lock_file_for_update(lk, get_index_file(), lock_flags);
> +       return hold_lock_file_for_update_timeout(lk, get_index_file(),
> +                                                lock_flags, 500);
>  }
>
>  int read_index(struct index_state *istate)
>
> though I think there are a few sites which manually call
> hold_lock_file_for_update() on the index that would need similar
> treatment.

uh, too bad. The patch above looks really promising, though. :)

>
> I suspect it would work OK in practice, unless your index is so big that
> 500ms isn't enough. The user may also see minor stalls when there's lock
> contention. I'm not sure how annoying that would be.

There is only one way to find out. Though we don't want to volunteer
all users into this experiment, I'd presume.

Regarding larger indexes, I wonder if we can adapt the 500ms
to the repo size. At first I thought the abbreviation length could be
a good proxy to determine the maximum waiting time, but now I am
not so sure any more.

Stefan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-25 16:17   ` Johannes Schindelin
@ 2017-09-26 14:44     ` Jeff Hostetler
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Hostetler @ 2017-09-26 14:44 UTC (permalink / raw)
  To: Johannes Schindelin, Kaartic Sivaraam; +Cc: Jeff King, git



On 9/25/2017 12:17 PM, Johannes Schindelin wrote:
> Hi Kaartic,
> 
> On Sun, 24 Sep 2017, Kaartic Sivaraam wrote:
> 
>> On Thursday 21 September 2017 10:02 AM, Jeff King wrote:
>>> Some tools like IDEs or fancy editors may periodically run commands
>>> like "git status" in the background to keep track of the state of the
>>> repository.
>>
>> I might be missing something, shouldn't the IDEs be encouraged to use
>> libgit2 instead? I thought it was meant for these use cases.
> 
> There are pros and cons. Visual Studio moved away from libgit2 e.g. to
> support SSH (back then, libgit2 did not support spawning processes, I have
> no idea whether that changed in the meantime).

There were other issues besides feature parity.  The big one for VS
was that it moved the git computations into a separate process and
address space.  You can't easily read/modify/write a 500MB .git/index
file into memory (with however many copies of the data that that
creates) in a 32-bit process.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
       [not found] ` <79ed4c34-1727-7c1e-868a-1206902638ad@gmail.com>
@ 2017-09-27  6:40   ` Jeff King
  2017-09-27 13:50     ` Kaartic Sivaraam
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-09-27  6:40 UTC (permalink / raw)
  To: Kaartic Sivaraam; +Cc: Johannes Schindelin, git

On Sun, Sep 24, 2017 at 01:08:41PM +0530, Kaartic Sivaraam wrote:

> On Thursday 21 September 2017 10:02 AM, Jeff King wrote:
> > Some tools like IDEs or fancy editors may periodically run
> > commands like "git status" in the background to keep track
> > of the state of the repository.
> 
> I might be missing something, shouldn't the IDEs be encouraged to use
> libgit2 instead? I thought it was meant for these use cases.

GitHub Desktop, at least, has actually moved _away_ from libgit2. I
think there were a number of reasons. Some match what Dscho said
regarding Visual Studio. But I think a main one is just that libgit2
doesn't quite match regular git for feature parity or pace of
development. So you're stuck shelling out to regular Git half the time
anyway, and now you get to handle dependencies on _two_ systems. :)

> Note: I assume getting the status through libgit2 doesn't create an
> index.lock file.

I don't know if that's the case or not.

> > This patch implements the option 3.
> 
> So, if I get that correctly "git status --no-optional-locks" is a way to get
> the "current" status without updating the on disk index file?

It's actually "git --no-optional-locks status", since it's a git-wide
option (it's just that "status" is the only one who respects it for
now).

> > +`GIT_OPTIONAL_LOCKS`::
> > +	If set to `0`, Git will avoid performing any operations which
> > +	require taking a lock and which are not required to complete the
> > +	requested operation.
> 
> The above sentence seems to be a little confusing. How about,
> 
>    If set to `0`, Git will complete the requested operation without
>    performing the optional sub-operations that require taking a lock.

Yeah, my original is pretty clunky. I ended up with:

  If set to `0`, Git will complete any requested operation without
  performing any optional sub-operations that require taking a lock.

which swaps out "the" for "any".

Thanks.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-25 18:51             ` Stefan Beller
@ 2017-09-27  6:44               ` Jeff King
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2017-09-27  6:44 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes Sixt, Johannes Schindelin, git@vger.kernel.org

On Mon, Sep 25, 2017 at 11:51:31AM -0700, Stefan Beller wrote:

> > diff --git a/read-cache.c b/read-cache.c
> > index 65f4fe8375..fc1ba122a3 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1563,7 +1563,8 @@ static int read_index_extension(struct index_state *istate,
> >
> >  int hold_locked_index(struct lock_file *lk, int lock_flags)
> >  {
> > -       return hold_lock_file_for_update(lk, get_index_file(), lock_flags);
> > +       return hold_lock_file_for_update_timeout(lk, get_index_file(),
> > +                                                lock_flags, 500);
> >  }
> >
> >  int read_index(struct index_state *istate)
> >
> > though I think there are a few sites which manually call
> > hold_lock_file_for_update() on the index that would need similar
> > treatment.
> 
> uh, too bad. The patch above looks really promising, though. :)

There are probably only a handful of other callers, and they'd just need
to swap out s/update/&_timeout/. So it really is pretty trivial.

> > I suspect it would work OK in practice, unless your index is so big that
> > 500ms isn't enough. The user may also see minor stalls when there's lock
> > contention. I'm not sure how annoying that would be.
> 
> There is only one way to find out. Though we don't want to volunteer
> all users into this experiment, I'd presume.

Yes. One of the nice things about the optional-locks approach is that it
only affects callers who specify the option. And the general idea has
gotten a year of testing in Visual Studio, which makes me feel good
about it.

> Regarding larger indexes, I wonder if we can adapt the 500ms
> to the repo size. At first I thought the abbreviation length could be
> a good proxy to determine the maximum waiting time, but now I am
> not so sure any more.

I think madness that way lies.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2] git: add --no-optional-locks option
  2017-09-21  4:32 [PATCH] git: add --no-optional-locks option Jeff King
                   ` (5 preceding siblings ...)
       [not found] ` <79ed4c34-1727-7c1e-868a-1206902638ad@gmail.com>
@ 2017-09-27  6:54 ` Jeff King
  2017-09-28 16:15   ` Johannes Schindelin
  6 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-09-27  6:54 UTC (permalink / raw)
  To: git
  Cc: Johannes Schindelin, Johannes Sixt, Junio C Hamano, Stefan Beller,
	Kaartic Sivaraam

This is an update of my --no-optional-locks patch. The only difference
is that I updated the documentation to be a bit less clunky (Thanks,
Kaartic).

For the other comments on v1, I decided not to make any changes:

  - JSixt suggested marking this as a local-repo variable. I think we
    really do want it to cross repo boundaries to handle submodules (and
    the GfW version does the same).

  - there was some discussion over the name. I didn't see other
    suggestions, and I didn't come up with anything better.

  - there was some discussion of alternatives, including reader/writer
    locking schemes and lock-retry timeouts for other callers. This
    approach has the benefit of simplicity and having been tested in the
    real world by GfW/VS.  I wouldn't mind if somebody wanted to explore
    lock retries, but I think we'd want to have this feature regardless.

So I'm happy with this version, though of course I'm open to more
comments.

-- >8 --
Subject: git: add --no-optional-locks option

Some tools like IDEs or fancy editors may periodically run
commands like "git status" in the background to keep track
of the state of the repository. Some of these commands may
refresh the index and write out the result in an
opportunistic way: if they can get the index lock, then they
update the on-disk index with any updates they find. And if
not, then their in-core refresh is lost and just has to be
recomputed by the next caller.

But taking the index lock may conflict with other operations
in the repository. Especially ones that the user is doing
themselves, which _aren't_ opportunistic. In other words,
"git status" knows how to back off when somebody else is
holding the lock, but other commands don't know that status
would be happy to drop the lock if somebody else wanted it.

There are a couple possible solutions:

  1. Have some kind of "pseudo-lock" that allows other
     commands to tell status that they want the lock.

     This is likely to be complicated and error-prone to
     implement (and maybe even impossible with just
     dotlocks to work from, as it requires some
     inter-process communication).

  2. Avoid background runs of commands like "git status"
     that want to do opportunistic updates, preferring
     instead plumbing like diff-files, etc.

     This is awkward for a couple of reasons. One is that
     "status --porcelain" reports a lot more about the
     repository state than is available from individual
     plumbing commands. And two is that we actually _do_
     want to see the refreshed index. We just don't want to
     take a lock or write out the result. Whereas commands
     like diff-files expect us to refresh the index
     separately and write it to disk so that they can depend
     on the result. But that write is exactly what we're
     trying to avoid.

  3. Ask "status" not to lock or write the index.

     This is easy to implement. The big downside is that any
     work done in refreshing the index for such a call is
     lost when the process exits. So a background process
     may end up re-hashing a changed file multiple times
     until the user runs a command that does an index
     refresh themselves.

This patch implements the option 3. The idea (and the test)
is largely stolen from a Git for Windows patch by Johannes
Schindelin, 67e5ce7f63 (status: offer *not* to lock the
index and update it, 2016-08-12). The twist here is that
instead of making this an option to "git status", it becomes
a "git" option and matching environment variable.

The reason there is two-fold:

  1. An environment variable is carried through to
     sub-processes. And whether an invocation is a
     background process or not should apply to the whole
     process tree. So you could do "git --no-optional-locks
     foo", and if "foo" is a script or alias that calls
     "status", you'll still get the effect.

  2. There may be other programs that want the same
     treatment.

     I've punted here on finding more callers to convert,
     since "status" is the obvious one to call as a repeated
     background job. But "git diff"'s opportunistic refresh
     of the index may be a good candidate.

The test is taken from 67e5ce7f63, and it's worth repeating
Johannes's explanation:

  Note that the regression test added in this commit does
  not *really* verify that no index.lock file was written;
  that test is not possible in a portable way. Instead, we
  verify that .git/index is rewritten *only* when `git
  status` is run without `--no-optional-locks`.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git.txt | 12 ++++++++++++
 builtin/commit.c      |  5 ++++-
 cache.h               |  6 ++++++
 environment.c         |  5 +++++
 git.c                 |  4 ++++
 t/t7508-status.sh     | 10 ++++++++++
 6 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 6e3a6767e5..f7e603bc82 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -159,6 +159,10 @@ foo.bar= ...`) sets `foo.bar` to the empty string which ` git config
 	Add "icase" magic to all pathspec. This is equivalent to setting
 	the `GIT_ICASE_PATHSPECS` environment variable to `1`.
 
+--no-optional-locks::
+	Do not perform optional operations that require locks. This is
+	equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`.
+
 GIT COMMANDS
 ------------
 
@@ -697,6 +701,14 @@ of clones and fetches.
 	which feed potentially-untrusted URLS to git commands.  See
 	linkgit:git-config[1] for more details.
 
+`GIT_OPTIONAL_LOCKS`::
+	If set to `0`, Git will complete any requested operation without
+	performing any optional sub-operations that require taking a lock.
+	For example, this will prevent `git status` from refreshing the
+	index as a side effect. This is useful for processes running in
+	the background which do not want to cause lock contention with
+	other operations on the repository.  Defaults to `1`.
+
 Discussion[[Discussion]]
 ------------------------
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 58f9747c2f..fafd492029 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1387,7 +1387,10 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	read_cache_preload(&s.pathspec);
 	refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, &s.pathspec, NULL, NULL);
 
-	fd = hold_locked_index(&index_lock, 0);
+	if (use_optional_locks())
+		fd = hold_locked_index(&index_lock, 0);
+	else
+		fd = -1;
 
 	s.is_initial = get_oid(s.reference, &oid) ? 1 : 0;
 	if (!s.is_initial)
diff --git a/cache.h b/cache.h
index 49b083ee0a..75f8efc56f 100644
--- a/cache.h
+++ b/cache.h
@@ -444,6 +444,7 @@ static inline enum object_type object_type(unsigned int mode)
 #define GIT_NOGLOB_PATHSPECS_ENVIRONMENT "GIT_NOGLOB_PATHSPECS"
 #define GIT_ICASE_PATHSPECS_ENVIRONMENT "GIT_ICASE_PATHSPECS"
 #define GIT_QUARANTINE_ENVIRONMENT "GIT_QUARANTINE_PATH"
+#define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
 
 /*
  * This environment variable is expected to contain a boolean indicating
@@ -783,6 +784,11 @@ extern int protect_ntfs;
  */
 extern int ref_paranoia;
 
+/*
+ * Returns the boolean value of $GIT_OPTIONAL_LOCKS (or the default value).
+ */
+int use_optional_locks(void);
+
 /*
  * The character that begins a commented line in user-editable file
  * that is subject to stripspace.
diff --git a/environment.c b/environment.c
index f1f934b6fd..8289c25b44 100644
--- a/environment.c
+++ b/environment.c
@@ -338,3 +338,8 @@ void reset_shared_repository(void)
 {
 	need_shared_repository_from_config = 1;
 }
+
+int use_optional_locks(void)
+{
+	return git_env_bool(GIT_OPTIONAL_LOCKS_ENVIRONMENT, 1);
+}
diff --git a/git.c b/git.c
index f31dca6962..9e96dd4090 100644
--- a/git.c
+++ b/git.c
@@ -182,6 +182,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
 			setenv(GIT_ICASE_PATHSPECS_ENVIRONMENT, "1", 1);
 			if (envchanged)
 				*envchanged = 1;
+		} else if (!strcmp(cmd, "--no-optional-locks")) {
+			setenv(GIT_OPTIONAL_LOCKS_ENVIRONMENT, "0", 1);
+			if (envchanged)
+				*envchanged = 1;
 		} else if (!strcmp(cmd, "--shallow-file")) {
 			(*argv)++;
 			(*argc)--;
diff --git a/t/t7508-status.sh b/t/t7508-status.sh
index 43d19a9b22..93f162a4f7 100755
--- a/t/t7508-status.sh
+++ b/t/t7508-status.sh
@@ -1670,4 +1670,14 @@ test_expect_success '"Initial commit" should not be noted in commit template' '
 	test_i18ngrep ! "Initial commit" output
 '
 
+test_expect_success '--no-optional-locks prevents index update' '
+	test-chmtime =1234567890 .git/index &&
+	git --no-optional-locks status &&
+	test-chmtime -v +0 .git/index >out &&
+	grep ^1234567890 out &&
+	git status &&
+	test-chmtime -v +0 .git/index >out &&
+	! grep ^1234567890 out
+'
+
 test_done
-- 
2.14.2.988.g01c8b37dde


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-27  6:40   ` Jeff King
@ 2017-09-27 13:50     ` Kaartic Sivaraam
  2017-09-27 16:28       ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Kaartic Sivaraam @ 2017-09-27 13:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git

On Wed, 2017-09-27 at 02:40 -0400, Jeff King wrote:
> On Sun, Sep 24, 2017 at 01:08:41PM +0530, Kaartic Sivaraam wrote:
> 
> > 
> > So, if I get that correctly "git status --no-optional-locks" is a way to get
> > the "current" status without updating the on disk index file?
> 
> It's actually "git --no-optional-locks status", since it's a git-wide
> option (it's just that "status" is the only one who respects it for
> now).

Thanks for correcting the command. Now let me ask my (corrected)
question again! So, if I get that correctly "git --no-optional-
locks status" is a way to get the "current" status *without updating*
the on disk index file?

---
Kaartic

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] git: add --no-optional-locks option
  2017-09-27 13:50     ` Kaartic Sivaraam
@ 2017-09-27 16:28       ` Jeff King
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2017-09-27 16:28 UTC (permalink / raw)
  To: Kaartic Sivaraam; +Cc: Johannes Schindelin, git

On Wed, Sep 27, 2017 at 07:20:05PM +0530, Kaartic Sivaraam wrote:

> On Wed, 2017-09-27 at 02:40 -0400, Jeff King wrote:
> > On Sun, Sep 24, 2017 at 01:08:41PM +0530, Kaartic Sivaraam wrote:
> > 
> > > 
> > > So, if I get that correctly "git status --no-optional-locks" is a way to get
> > > the "current" status without updating the on disk index file?
> > 
> > It's actually "git --no-optional-locks status", since it's a git-wide
> > option (it's just that "status" is the only one who respects it for
> > now).
> 
> Thanks for correcting the command. Now let me ask my (corrected)
> question again! So, if I get that correctly "git --no-optional-
> locks status" is a way to get the "current" status *without updating*
> the on disk index file?

Yes.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] git: add --no-optional-locks option
  2017-09-27  6:54 ` [PATCH v2] " Jeff King
@ 2017-09-28 16:15   ` Johannes Schindelin
  0 siblings, 0 replies; 26+ messages in thread
From: Johannes Schindelin @ 2017-09-28 16:15 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Johannes Sixt, Junio C Hamano, Stefan Beller,
	Kaartic Sivaraam

Hi Peff,

On Wed, 27 Sep 2017, Jeff King wrote:

> For the other comments on v1, I decided not to make any changes:
> 
>   - JSixt suggested marking this as a local-repo variable. I think we
>     really do want it to cross repo boundaries to handle submodules (and
>     the GfW version does the same).

Indeed. We had it originally as a local-repo only variable, and you
probably guessed that submodules made us rethink that approach.

>   - there was some discussion of alternatives, including reader/writer
>     locking schemes and lock-retry timeouts for other callers. This
>     approach has the benefit of simplicity and having been tested in the
>     real world by GfW/VS.  I wouldn't mind if somebody wanted to explore
>     lock retries, but I think we'd want to have this feature regardless.

We also weighed other options, and took this route for simplicity. It's
possible to explain during an uninterrupted elevator ride what it does.
Makes for less bugs.

And yes, we wanted to test it first before showing it to the Git mailing
list ;-)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-09-28 16:15 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-21  4:32 [PATCH] git: add --no-optional-locks option Jeff King
2017-09-21  4:55 ` Junio C Hamano
2017-09-21  5:08   ` Jeff King
2017-09-21  5:29     ` Junio C Hamano
2017-09-21 18:25 ` Johannes Sixt
2017-09-22  4:25   ` Jeff King
2017-09-22 11:22     ` Jeff Hostetler
2017-09-22 15:04       ` Jeff King
2017-09-22 20:09     ` Stefan Beller
2017-09-22 21:25       ` Jeff King
2017-09-22 21:41         ` Stefan Beller
2017-09-23  3:34           ` Jeff King
2017-09-25 18:51             ` Stefan Beller
2017-09-27  6:44               ` Jeff King
2017-09-22  6:42 ` Daniel Santos
2017-09-22 16:04   ` Jeff King
2017-09-24 11:31 ` Kaartic Sivaraam
2017-09-25 16:17   ` Johannes Schindelin
2017-09-26 14:44     ` Jeff Hostetler
2017-09-25 16:10 ` Johannes Schindelin
2017-09-25 17:00   ` Jeff King
     [not found] ` <79ed4c34-1727-7c1e-868a-1206902638ad@gmail.com>
2017-09-27  6:40   ` Jeff King
2017-09-27 13:50     ` Kaartic Sivaraam
2017-09-27 16:28       ` Jeff King
2017-09-27  6:54 ` [PATCH v2] " Jeff King
2017-09-28 16:15   ` Johannes Schindelin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).