From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Alban Gruin <alban.gruin@gmail.com>
Cc: git@vger.kernel.org, Thomas Gummerer <t.gummerer@gmail.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC PATCH v1 0/6] stash: drop usage of a second index
Date: Mon, 15 Jun 2020 17:27:15 +0200 [thread overview]
Message-ID: <20200615152715.GD2898@szeder.dev> (raw)
In-Reply-To: <20200505104849.13602-1-alban.gruin@gmail.com>
On Tue, May 05, 2020 at 12:48:43PM +0200, Alban Gruin wrote:
> The old scripted `git stash' used to create a second index to save
> modified and untracked files, and restore untracked files, without
> affecting the main index. This behaviour was carried on when it was
> rewritten in C, and here, most operations performed on the second index
> are done by forked commands (ie. `read-tree' instead of reset_tree(),
> etc.). This works most of the time, except in some edge case with the
> split-index when the split file has expired and is deleted by a forked
> command: the main index may still contain a reference to the now-deleted
> file, and subsequent operations on the index will fail [0].
>
> The goal of this series is to modernise (a bit) builtin/stash.c, and to
> fix the aforementionned edge case.
While this patch series does fix this edge case, it doesn't fix the
root cause, which is not specific to 'git stash', but is more general
issue when using split index with an alternate index. More on this in
the notes attached to the patch below.
Having said that, and without actually looking at your patches, I
think removing the second index from the builtin stash is a good idea
nonetheless.
> I have to admit that I don't really know how to test this.
> GIT_TEST_SPLIT_INDEX failed on me (gdb showed me that it does not enable
> the split-index at all, at least in `git stash' and its forks)
You are right, as it turns out GIT_TEST_SPLIT_INDEX=1 has been broken
for a while now. The patch below fixes this, but there is a bit of a
fallout of test failures, which it doesn't fix, because I'm not quite
sure how to fix one of them. Again, more on this in the attached
notes.
--- >8 ---
Subject: [PATCH] [RFH] read-cache: fix GIT_TEST_SPLIT_INDEX
Setting GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the split-index
feature in the test suite and trigger index splitting (mostly)
randomly. Alas, this has been broken since 6e37c8ed3c (read-cache.c:
fix writing "link" index ext with null base oid, 2019-02-13), and no
index splitting has been performed at all since then.
There are two places where we check the value of GIT_TEST_SPLIT_INDEX,
and before 6e37c8ed3c they worked like this:
1) In the lower-level do_write_index(), where, if
GIT_TEST_SPLIT_INDEX is enabled, we call init_split_index().
This call merely allocates and zero-initializes
'istate->split_index', but does nothing else (i.e. doesn't fill
the base/shared index with cache entries, doesn't actually write
a shared index file, etc.). Pertinent to this issue, the hash of
the base index remains all zeroed out.
2) In the higher-level write_locked_index(), where, if
GIT_TEST_SPLIT_INDEX is enabled _and_ 'istate->split_index' has
already been initialized, it randomly sets the flag that triggers
index splitting later in this function. This randomness comes
from the first byte of the hash of the base index via an 'if
((first_byte & 15) < 6)' condition.
However, if 'istate->split_index' hasn't been initialized (i.e.
still NULL), then it calls do_write_locked_index(), which
internally calls do_write_index() mentioned above, and then a
'goto out' skips right over this second GIT_TEST_SPLIT_INDEX
check.
This means that while GIT_TEST_SPLIT_INDEX=1 usually triggered index
splitting randomly, the first two index writes were always
deterministic, though I suspect this was unintentional (this is still
before 6e37c8ed3c):
- The initial index write never splits the index.
During the first index write write_locked_index() is called with
'istate->split_index' still uninitialized, so the check in 2) is
not executed. It still calls do_write_index(), though, which then
executes the check in 1). The resulting all zero base index hash
then leads to the 'link' extension being written to '.git/index',
though a shared index file is not written:
$ rm .git/index
$ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
$ test-tool dump-split-index .git/index
own c6ef71168597caec8553c83d9d0048f1ef416170
base 0000000000000000000000000000000000000000
100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 0 file
replacements:
deletions:
$ ls -l .git/sharedindex.*
ls: cannot access '.git/sharedindex.*': No such file or directory
- The second index write always splits the index.
When the index written in the previous point is read,
'istate->split_index' is initialized because of the presence of
the 'link' extension. So during the second write
write_locked_index() does run the check in 2), and the first byte
of the all zero base index hash always fulfills the randomness
condition, which in turn always triggers the index splitting.
- Subsequent index writes will find the 'link' extension with a real
non-zero base index hash, so from then on the check in 2) is
executed and the first byte of the base index hash is as random as
it gets (coming from the SHA-1 of index data including timestamps
and inodes...).
All this worked until 6e37c8ed3c came along, and stopped writing the
'link' extension if the hash of the base index was all zero:
$ rm .git/index
$ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
$ test-tool dump-split-index .git/index
own abbd6f6458d5dee73ae8e210ca15a68a390c6fd7
not a split index
$ ls -l .git/sharedindex.*
ls: cannot access '.git/sharedindex.*': No such file or directory
Since there is no 'link' extension, in the second index write
'istate->split_index' remains uninitialized, and the check in 2) is
not executed, and ultimately the index is never split.
Fix this by modifying write_locked_index() to make sure to check
GIT_TEST_SPLIT_INDEX even if 'istate->split_index' is still
uninitialized, and initialize it if necessary. The check for
GIT_TEST_SPLIT_INDEX and separate init_split_index() call in
do_write_index() thus becomes unnecessary, so remove it. Furthermore,
add a test to 't1700-split-index.sh' to make sure that
GIT_TEST_SPLIT_INDEX=1 will keep working (though only check the
index splitting on the first index write, because after that it will
be random).
Note that this change does not restore the pre-6e37c8ed3c behaviour,
as it will deterministically split the index already on the first
index write. Since GIT_TEST_SPLIT_INDEX is purely a developer aid,
there is no backwards compatibility issue here. The new behaviour
does trigger test failures in 't0003-attributes.sh' and
't1600-index.sh', though, which should be fixed in preparatory patches
first.
---
Notes:
The failures in 't1600-index.sh' are easy: those tests check how bogus
values in $GIT_INDEX_VERSION or 'index.version' are handled, and all
failures look like this:
--- expect.err 2020-06-14 17:28:21.043527044 +0000
+++ actual.err 2020-06-14 17:28:21.043527044 +0000
@@ -1,2 +1,4 @@
warning: index.version set, but the value is invalid.
Using version
+warning: index.version set, but the value is invalid.
+Using version
We get one such warning when writing any index file, and since we now
split the index on the first write, we get one warning when writing
the shared index and a second warning when writing '.git/index'. A
simple 'sane_unset GIT_TEST_SPLIT_INDEX' fixes these failures, and
since this test script is about some intricacies of the index, I think
this is the right thing to do in this case.
The failure in 't0003-attributes.sh' is tricky, because it does this:
# The previous tests wrote the split and shared shared index files
# in '$TRASH_DIRECTORY/.git'.
(
cd bare.git &&
GIT_INDEX_FILE=../.git/index \
git check-attr --cached ....
)
which fails with:
fatal: ./sharedindex.3e9b03a8e3e1533d2bb2e22c77d4784ce2e7b108: index file open failed: No such file or directory
Unsetting GIT_TEST_SPLIT_INDEX in the test script fixes this failure
as well, of course, though I don't think this is the right thing to do
in this case. Worse, this failure raises some serious questions:
- Where to look for and where to write the shared index file when
using an alternate index location?
Currently it's read from / written to gitdir. This might be fine
as long as both cwd and the alternate index are in the same
repository while writing and reading. However, as this failing
test shows, if the shared index was written in one repository,
then an attempt to read it while in another repository using
GIT_INDEX_FILE will fail.
I think it would be more sensible to write the shared index next
to the (alternate) index file in the same directory, wherever that
might be. But what about backwards compatibility? IOW can there
be a sensible use case that relies on GIT_INDEX_FILE pointing to
somewhere, but its shared index file is still being written to the
current repository?
- What about expiring shared index files while using an alternate
index?
We must be careful not to expire any sharedindex.<sha1> files that
are used by '.git/index' or any existing alternate index files.
Unfortunately, we are not that careful, and I think this is the
root cause of the issue that Alban is trying to fix. The failing
test below demonstrates this using basically only 'git
update-index', without all the complexity of 'git stash'.
test_expect_failure 'split index expiration vs. alternate index' '
>file1 &&
>file2 &&
git update-index --split-index --add file1 &&
# debug: this should print .git/index and .git/sharedindex.<sha1>.
echo .git/*index* &&
GIT_INDEX_FILE=.git/otherindex \
git -c splitIndex.sharedIndexExpire=now \
update-index --split-index --add file2 &&
# debug: this should print .git/index, .git/otherindex and _two_
# .git/sharedindex.<sha1> files, but, alas, it prints only one
# shared index, because the other got expired.
echo .git/*index* &&
git diff --cached --name-only >actual1 &&
echo file1 >expect1 &&
test_cmp expect1 actual1 &&
GIT_INDEX_FILE=.git/otherindex git diff --cached --name-only >actual2 &&
echo file2 >expect2 &&
test_cmp expect2 actual2
'
Now, not expiring the shared index used by '.git/index' while
writing to an alternate split index is easy, because we do know
about '.git/index', of course, and thus can check it to exclude
its shared index. But how could a git process expiring shared
indexes know whether a shared index file belongs to an alternate
index?! I'm afraid it can't, which leads to the next question...
- Should we even allow writing a split index when using an alternate
index file?
If we don't do that, then there is no risk of expiring the shared
index of an alternate index, because there will be no such shared
index to begin with. Furthermore, if we don't write a shared
index while using an alternate index, then we won't trigger
expiration while using an alternate index, so we won't
accidentally expire the shared index used by '.git/index'.
Note, however, that we should still support _reading_ the shared
index of an alternate index, as that alternate index might be the
"real" '.git/index' of a different repository, like in the failing
test in t0003.
- Should we even allow 'splitIndex.sharedIndexExpire=now'?
I believe, though haven't confirmed, that it can cause trouble
even without using an alternate index. Consider the following
sequence of events:
- Git process A reads '.git/index', finds the 'link' extension,
and reads the SHA1 recorded there that determines the filename
of its shared index.
- The scheduler steps in, and puts process A to sleep.
- Git process B updates the index, decides that it's time to
write a new shared index, does so, and then because of
'splitIndex.sharedIndexExpire=now' it removes all other shared
index files.
- The scheduler wakes process A, which now tries to open the
shared index file it just learned about, but fails because
that file has just been removed by process B.
This is similar to the issue we have with 'git gc --prune=now',
except that 'git gc's documentation explicitly warns about the
risks of using '--prune=now', while the description of
'splitIndex.sharedIndexExpire' doesn't have any such warning.
I think that 'splitIndex.sharedIndexExpire=now' should be allowed,
for those who hopefully know what they are doing, just as we allow
'git gc --prune=now', but the documentation should clearly warn
against its potential pitfalls.
read-cache.c | 22 +++++++++++++---------
t/t1700-split-index.sh | 10 ++++++++++
2 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/read-cache.c b/read-cache.c
index aa427c5c17..b888c5df44 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2801,11 +2801,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
}
}
- if (!istate->version) {
+ if (!istate->version)
istate->version = get_index_format_default(the_repository);
- if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0))
- init_split_index(istate);
- }
/* demote version 3 to version 2 when the latter suffices */
if (istate->version == 3 || istate->version == 2)
@@ -3222,7 +3219,7 @@ static int too_many_not_shared_entries(struct index_state *istate)
int write_locked_index(struct index_state *istate, struct lock_file *lock,
unsigned flags)
{
- int new_shared_index, ret;
+ int new_shared_index, ret, test_split_index_env;
struct split_index *si = istate->split_index;
if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
@@ -3237,7 +3234,9 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
if (istate->fsmonitor_last_update)
fill_fsmonitor_bitmap(istate);
- if (!si || alternate_index_output ||
+ test_split_index_env = git_env_bool("GIT_TEST_SPLIT_INDEX", 0);
+
+ if ((!si && !test_split_index_env) || alternate_index_output ||
(istate->cache_changed & ~EXTMASK)) {
if (si)
oidclr(&si->base_oid);
@@ -3245,10 +3244,15 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
goto out;
}
- if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0)) {
- int v = si->base_oid.hash[0];
- if ((v & 15) < 6)
+ if (test_split_index_env) {
+ if (!si) {
+ si = init_split_index(istate);
istate->cache_changed |= SPLIT_INDEX_ORDERED;
+ } else {
+ int v = si->base_oid.hash[0];
+ if ((v & 15) < 6)
+ istate->cache_changed |= SPLIT_INDEX_ORDERED;
+ }
}
if (too_many_not_shared_entries(istate))
istate->cache_changed |= SPLIT_INDEX_ORDERED;
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 12a5568844..e0c9e16a8a 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -507,4 +507,14 @@ test_expect_success 'do not refresh null base index' '
)
'
+test_expect_success 'GIT_TEST_SPLIT_INDEX works' '
+ test_create_repo git-test-split-index &&
+ (
+ cd git-test-split-index &&
+ >file &&
+ GIT_TEST_SPLIT_INDEX=1 git update-index --add file &&
+ verbose test $(ls -l .git/sharedindex.* |wc -l) = 1
+ )
+'
+
test_done
--
2.27.0.278.g0bed8b425d
next prev parent reply other threads:[~2020-06-15 15:27 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-05 10:48 [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
2020-05-05 10:48 ` [RFC PATCH v1 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
2020-06-13 8:09 ` Christian Couder
2020-05-05 10:48 ` [RFC PATCH v1 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
2020-06-13 8:52 ` Christian Couder
2020-06-13 18:00 ` Alban Gruin
2020-06-15 12:02 ` Christian Couder
2020-05-05 10:48 ` [RFC PATCH v1 3/6] stash: remove the second index in stash_patch() Alban Gruin
2020-06-13 9:38 ` Christian Couder
2020-06-13 10:04 ` Christian Couder
2020-05-05 10:48 ` [RFC PATCH v1 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
2020-06-13 18:51 ` Christian Couder
2020-05-05 10:48 ` [RFC PATCH v1 5/6] stash: remove the second index in restore_untracked() Alban Gruin
2020-06-13 19:41 ` Christian Couder
2020-05-05 10:48 ` [RFC PATCH v1 6/6] stash: remove `stash_index_path' Alban Gruin
2020-06-04 12:07 ` [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
2020-06-13 7:52 ` Christian Couder
2020-06-25 12:35 ` Alban Gruin
2020-06-15 15:27 ` SZEDER Gábor [this message]
2020-06-15 21:50 ` SZEDER Gábor
2020-06-16 7:06 ` SZEDER Gábor
2020-06-17 20:04 ` Junio C Hamano
2020-06-17 21:31 ` Alban Gruin
2020-06-30 15:15 ` [PATCH v2 " Alban Gruin
2020-06-30 15:15 ` [PATCH v2 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
2020-06-30 15:15 ` [PATCH v2 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
2020-06-30 15:15 ` [PATCH v2 3/6] stash: remove the second index in stash_patch() Alban Gruin
2020-06-30 15:15 ` [PATCH v2 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
2020-06-30 15:15 ` [PATCH v2 5/6] stash: remove the second index in restore_untracked() Alban Gruin
2020-07-31 13:45 ` Christian Couder
2020-07-31 16:16 ` Alban Gruin
2020-06-30 15:15 ` [PATCH v2 6/6] stash: remove `stash_index_path' Alban Gruin
2020-07-31 13:53 ` [PATCH v2 0/6] stash: drop usage of a second index Christian Couder
2020-07-31 16:51 ` [PATCH v3 " Alban Gruin
2020-07-31 16:51 ` [PATCH v3 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
2020-07-31 18:28 ` Junio C Hamano
2020-07-31 16:51 ` [PATCH v3 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
2020-07-31 18:26 ` Junio C Hamano
2020-08-02 2:20 ` Junio C Hamano
2020-07-31 16:51 ` [PATCH v3 3/6] stash: remove the second index in stash_patch() Alban Gruin
2020-07-31 16:51 ` [PATCH v3 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
2020-07-31 16:51 ` [PATCH v3 5/6] stash: remove the second index in restore_untracked() Alban Gruin
2020-07-31 16:51 ` [PATCH v3 6/6] stash: remove `stash_index_path' Alban Gruin
2020-07-31 17:48 ` [PATCH v3 0/6] stash: drop usage of a second index Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200615152715.GD2898@szeder.dev \
--to=szeder.dev@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=alban.gruin@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=t.gummerer@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).