* [PATCH v8] read-cache: call verify_hdr() in a background thread
@ 2017-04-25 18:41 git
2017-04-25 18:41 ` [PATCH v8] read-cache: force_verify_index_checksum git
2017-04-26 4:11 ` [PATCH v8] read-cache: call verify_hdr() in a background thread Junio C Hamano
0 siblings, 2 replies; 7+ messages in thread
From: git @ 2017-04-25 18:41 UTC (permalink / raw)
To: git; +Cc: gitster, peff, j6t, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Version 8 of this patch converts the unit test to use
perl to corrupt the index checksum (rather than altering
a filename) and also verifies the fsck error message as
suggested in response to v7 on the mailing list.
If there are no other suggestions, I think this version
should be considered final.
Jeff Hostetler (1):
read-cache: force_verify_index_checksum
builtin/fsck.c | 1 +
cache.h | 2 ++
read-cache.c | 7 +++++++
t/t1450-fsck.sh | 32 ++++++++++++++++++++++++++++++++
4 files changed, 42 insertions(+)
--
2.9.3
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v8] read-cache: force_verify_index_checksum
2017-04-25 18:41 [PATCH v8] read-cache: call verify_hdr() in a background thread git
@ 2017-04-25 18:41 ` git
2017-04-26 4:11 ` [PATCH v8] read-cache: call verify_hdr() in a background thread Junio C Hamano
1 sibling, 0 replies; 7+ messages in thread
From: git @ 2017-04-25 18:41 UTC (permalink / raw)
To: git; +Cc: gitster, peff, j6t, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Teach git to skip verification of the SHA1-1 checksum at the end of
the index file in verify_hdr() which is called from read_index()
unless the "force_verify_index_checksum" global variable is set.
Teach fsck to force this verification.
The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.
These effect can be seen using t/perf/p0002-read-cache.sh:
Test HEAD~1 HEAD
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times 0.66(0.44+0.20) 0.30(0.27+0.02) -54.5%
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
builtin/fsck.c | 1 +
cache.h | 2 ++
read-cache.c | 7 +++++++
t/t1450-fsck.sh | 32 ++++++++++++++++++++++++++++++++
4 files changed, 42 insertions(+)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 1a5cacc..5512d06 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -771,6 +771,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
}
if (keep_cache_objects) {
+ verify_index_checksum = 1;
read_cache();
for (i = 0; i < active_nr; i++) {
unsigned int mode;
diff --git a/cache.h b/cache.h
index 80b6372..87f13bf 100644
--- a/cache.h
+++ b/cache.h
@@ -685,6 +685,8 @@ extern void update_index_if_able(struct index_state *, struct lock_file *);
extern int hold_locked_index(struct lock_file *, int);
extern void set_alternate_index_output(const char *);
+extern int verify_index_checksum;
+
/* Environment bits from configuration mechanism */
extern int trust_executable_bit;
extern int trust_ctime;
diff --git a/read-cache.c b/read-cache.c
index 9054369..c4205aa 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1371,6 +1371,9 @@ struct ondisk_cache_entry_extended {
ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
ondisk_cache_entry_size(ce_namelen(ce)))
+/* Allow fsck to force verification of the index checksum. */
+int verify_index_checksum;
+
static int verify_hdr(struct cache_header *hdr, unsigned long size)
{
git_SHA_CTX c;
@@ -1382,6 +1385,10 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
hdr_version = ntohl(hdr->hdr_version);
if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
return error("bad index version %d", hdr_version);
+
+ if (!verify_index_checksum)
+ return 0;
+
git_SHA1_Init(&c);
git_SHA1_Update(&c, hdr, size - 20);
git_SHA1_Final(sha1, &c);
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 33a51c9..eff1cd6 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -689,4 +689,36 @@ test_expect_success 'bogus head does not fallback to all heads' '
! grep $blob out
'
+# Corrupt the checksum on the index.
+# Add 1 to the last byte in the SHA.
+corrupt_index_checksum () {
+ perl -w -e '
+ use Fcntl ":seek";
+ open my $fh, "+<", ".git/index" or die "open: $!";
+ binmode $fh;
+ seek $fh, -1, SEEK_END or die "seek: $!";
+ read $fh, my $in_byte, 1 or die "read: $!";
+
+ $in_value = unpack("C", $in_byte);
+ $out_value = ($in_value + 1) & 255;
+
+ $out_byte = pack("C", $out_value);
+
+ seek $fh, -1, SEEK_END or die "seek: $!";
+ print $fh $out_byte;
+ close $fh or die "close: $!";
+ '
+}
+
+# Corrupt the checksum on the index and then
+# verify that only fsck notices.
+test_expect_success 'detect corrupt index file in fsck' '
+ cp .git/index .git/index.backup &&
+ test_when_finished "mv .git/index.backup .git/index" &&
+ corrupt_index_checksum &&
+ test_must_fail git fsck --cache 2>expect &&
+ grep "bad index file" expect &&
+ git status
+'
+
test_done
--
2.9.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v8] read-cache: call verify_hdr() in a background thread
2017-04-25 18:41 [PATCH v8] read-cache: call verify_hdr() in a background thread git
2017-04-25 18:41 ` [PATCH v8] read-cache: force_verify_index_checksum git
@ 2017-04-26 4:11 ` Junio C Hamano
2017-04-26 4:34 ` Junio C Hamano
2017-04-26 13:03 ` Jeff Hostetler
1 sibling, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2017-04-26 4:11 UTC (permalink / raw)
To: git; +Cc: git, peff, j6t, Jeff Hostetler
git@jeffhostetler.com writes:
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Version 8 of this patch converts the unit test to use
> perl to corrupt the index checksum (rather than altering
> a filename) and also verifies the fsck error message as
> suggested in response to v7 on the mailing list.
>
> If there are no other suggestions, I think this version
> should be considered final.
Oops. The earlier one has already been in 'master' for a few days.
Let's make this an incremental update.
Is the description in the following something you are OK with (so
that I can add your sign-off)?
Thanks.
-- >8 --
From: Jeff Hostetler <jeffhost@microsoft.com>
Date: Tue, 25 Apr 2017 18:41:09 +0000
Subject: t1450: avoid use of "sed" on the index, which is a binary file
The previous step added a path zzzzzzzz to the index, and then used
"sed" to replace this string to yyyyyyyy to create a test case where
the checksum at the end of the file does not match the contents.
Unfortunately, use of "sed" on a non-text file is not portable.
Instead, use a Perl script that seeks to the end and modifies the
last byte of the file (where we _know_ stores the trailing
checksum).
---
t/t1450-fsck.sh | 33 ++++++++++++++++++++++++++-------
1 file changed, 26 insertions(+), 7 deletions(-)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 677e15a7a4..eff1cd68e9 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -689,16 +689,35 @@ test_expect_success 'bogus head does not fallback to all heads' '
! grep $blob out
'
+# Corrupt the checksum on the index.
+# Add 1 to the last byte in the SHA.
+corrupt_index_checksum () {
+ perl -w -e '
+ use Fcntl ":seek";
+ open my $fh, "+<", ".git/index" or die "open: $!";
+ binmode $fh;
+ seek $fh, -1, SEEK_END or die "seek: $!";
+ read $fh, my $in_byte, 1 or die "read: $!";
+
+ $in_value = unpack("C", $in_byte);
+ $out_value = ($in_value + 1) & 255;
+
+ $out_byte = pack("C", $out_value);
+
+ seek $fh, -1, SEEK_END or die "seek: $!";
+ print $fh $out_byte;
+ close $fh or die "close: $!";
+ '
+}
+
+# Corrupt the checksum on the index and then
+# verify that only fsck notices.
test_expect_success 'detect corrupt index file in fsck' '
cp .git/index .git/index.backup &&
test_when_finished "mv .git/index.backup .git/index" &&
- echo zzzzzzzz >zzzzzzzz &&
- git add zzzzzzzz &&
- sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy &&
- mv .git/index.yyy .git/index &&
- # Confirm that fsck detects invalid checksum
- test_must_fail git fsck --cache &&
- # Confirm that status no longer complains about invalid checksum
+ corrupt_index_checksum &&
+ test_must_fail git fsck --cache 2>expect &&
+ grep "bad index file" expect &&
git status
'
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v8] read-cache: call verify_hdr() in a background thread
2017-04-26 4:11 ` [PATCH v8] read-cache: call verify_hdr() in a background thread Junio C Hamano
@ 2017-04-26 4:34 ` Junio C Hamano
2017-04-26 13:06 ` Jeff Hostetler
2017-04-26 13:03 ` Jeff Hostetler
1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2017-04-26 4:34 UTC (permalink / raw)
To: git; +Cc: git, peff, j6t, Jeff Hostetler
Junio C Hamano <gitster@pobox.com> writes:
> git@jeffhostetler.com writes:
>
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Version 8 of this patch converts the unit test to use
>> perl to corrupt the index checksum (rather than altering
>> a filename) and also verifies the fsck error message as
>> suggested in response to v7 on the mailing list.
>>
>> If there are no other suggestions, I think this version
>> should be considered final.
>
> Oops. The earlier one has already been in 'master' for a few days.
> Let's make this an incremental update.
>
> Is the description in the following something you are OK with (so
> that I can add your sign-off)?
>
> Thanks.
By the way, I said I am fine with the two-liner version in Dscho's
message, but I am also OK with this version that does not use a
separate dd and instead does everything in a single invocation of
Perl. As long as you've tested this version, there is no point
replacing it with yet another one.
Thanks.
> -- >8 --
> From: Jeff Hostetler <jeffhost@microsoft.com>
> Date: Tue, 25 Apr 2017 18:41:09 +0000
> Subject: t1450: avoid use of "sed" on the index, which is a binary file
>
> The previous step added a path zzzzzzzz to the index, and then used
> "sed" to replace this string to yyyyyyyy to create a test case where
> the checksum at the end of the file does not match the contents.
>
> Unfortunately, use of "sed" on a non-text file is not portable.
> Instead, use a Perl script that seeks to the end and modifies the
> last byte of the file (where we _know_ stores the trailing
> checksum).
>
>
> ---
> t/t1450-fsck.sh | 33 ++++++++++++++++++++++++++-------
> 1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 677e15a7a4..eff1cd68e9 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -689,16 +689,35 @@ test_expect_success 'bogus head does not fallback to all heads' '
> ! grep $blob out
> '
>
> +# Corrupt the checksum on the index.
> +# Add 1 to the last byte in the SHA.
> +corrupt_index_checksum () {
> + perl -w -e '
> + use Fcntl ":seek";
> + open my $fh, "+<", ".git/index" or die "open: $!";
> + binmode $fh;
> + seek $fh, -1, SEEK_END or die "seek: $!";
> + read $fh, my $in_byte, 1 or die "read: $!";
> +
> + $in_value = unpack("C", $in_byte);
> + $out_value = ($in_value + 1) & 255;
> +
> + $out_byte = pack("C", $out_value);
> +
> + seek $fh, -1, SEEK_END or die "seek: $!";
> + print $fh $out_byte;
> + close $fh or die "close: $!";
> + '
> +}
> +
> +# Corrupt the checksum on the index and then
> +# verify that only fsck notices.
> test_expect_success 'detect corrupt index file in fsck' '
> cp .git/index .git/index.backup &&
> test_when_finished "mv .git/index.backup .git/index" &&
> - echo zzzzzzzz >zzzzzzzz &&
> - git add zzzzzzzz &&
> - sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy &&
> - mv .git/index.yyy .git/index &&
> - # Confirm that fsck detects invalid checksum
> - test_must_fail git fsck --cache &&
> - # Confirm that status no longer complains about invalid checksum
> + corrupt_index_checksum &&
> + test_must_fail git fsck --cache 2>expect &&
> + grep "bad index file" expect &&
> git status
> '
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v8] read-cache: call verify_hdr() in a background thread
2017-04-26 4:11 ` [PATCH v8] read-cache: call verify_hdr() in a background thread Junio C Hamano
2017-04-26 4:34 ` Junio C Hamano
@ 2017-04-26 13:03 ` Jeff Hostetler
1 sibling, 0 replies; 7+ messages in thread
From: Jeff Hostetler @ 2017-04-26 13:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, peff, j6t, Jeff Hostetler
On 4/26/2017 12:11 AM, Junio C Hamano wrote:
> git@jeffhostetler.com writes:
>
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Version 8 of this patch converts the unit test to use
>> perl to corrupt the index checksum (rather than altering
>> a filename) and also verifies the fsck error message as
>> suggested in response to v7 on the mailing list.
>>
>> If there are no other suggestions, I think this version
>> should be considered final.
> Oops. The earlier one has already been in 'master' for a few days.
> Let's make this an incremental update.
>
> Is the description in the following something you are OK with (so
> that I can add your sign-off)?
Yes, this is fine. Thanks!
Jeff
>
> Thanks.
>
> -- >8 --
> From: Jeff Hostetler <jeffhost@microsoft.com>
> Date: Tue, 25 Apr 2017 18:41:09 +0000
> Subject: t1450: avoid use of "sed" on the index, which is a binary file
>
> The previous step added a path zzzzzzzz to the index, and then used
> "sed" to replace this string to yyyyyyyy to create a test case where
> the checksum at the end of the file does not match the contents.
>
> Unfortunately, use of "sed" on a non-text file is not portable.
> Instead, use a Perl script that seeks to the end and modifies the
> last byte of the file (where we _know_ stores the trailing
> checksum).
>
>
> ---
> t/t1450-fsck.sh | 33 ++++++++++++++++++++++++++-------
> 1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 677e15a7a4..eff1cd68e9 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -689,16 +689,35 @@ test_expect_success 'bogus head does not fallback to all heads' '
> ! grep $blob out
> '
>
> +# Corrupt the checksum on the index.
> +# Add 1 to the last byte in the SHA.
> +corrupt_index_checksum () {
> + perl -w -e '
> + use Fcntl ":seek";
> + open my $fh, "+<", ".git/index" or die "open: $!";
> + binmode $fh;
> + seek $fh, -1, SEEK_END or die "seek: $!";
> + read $fh, my $in_byte, 1 or die "read: $!";
> +
> + $in_value = unpack("C", $in_byte);
> + $out_value = ($in_value + 1) & 255;
> +
> + $out_byte = pack("C", $out_value);
> +
> + seek $fh, -1, SEEK_END or die "seek: $!";
> + print $fh $out_byte;
> + close $fh or die "close: $!";
> + '
> +}
> +
> +# Corrupt the checksum on the index and then
> +# verify that only fsck notices.
> test_expect_success 'detect corrupt index file in fsck' '
> cp .git/index .git/index.backup &&
> test_when_finished "mv .git/index.backup .git/index" &&
> - echo zzzzzzzz >zzzzzzzz &&
> - git add zzzzzzzz &&
> - sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy &&
> - mv .git/index.yyy .git/index &&
> - # Confirm that fsck detects invalid checksum
> - test_must_fail git fsck --cache &&
> - # Confirm that status no longer complains about invalid checksum
> + corrupt_index_checksum &&
> + test_must_fail git fsck --cache 2>expect &&
> + grep "bad index file" expect &&
> git status
> '
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v8] read-cache: call verify_hdr() in a background thread
2017-04-26 4:34 ` Junio C Hamano
@ 2017-04-26 13:06 ` Jeff Hostetler
2017-04-27 0:41 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Jeff Hostetler @ 2017-04-26 13:06 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, peff, j6t, Jeff Hostetler
On 4/26/2017 12:34 AM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> git@jeffhostetler.com writes:
>>
>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>
>>> Version 8 of this patch converts the unit test to use
>>> perl to corrupt the index checksum (rather than altering
>>> a filename) and also verifies the fsck error message as
>>> suggested in response to v7 on the mailing list.
>>>
>>> If there are no other suggestions, I think this version
>>> should be considered final.
>> Oops. The earlier one has already been in 'master' for a few days.
>> Let's make this an incremental update.
>>
>> Is the description in the following something you are OK with (so
>> that I can add your sign-off)?
>>
>> Thanks.
> By the way, I said I am fine with the two-liner version in Dscho's
> message, but I am also OK with this version that does not use a
> separate dd and instead does everything in a single invocation of
> Perl. As long as you've tested this version, there is no point
> replacing it with yet another one.
Either version is fine. I'd say use my perl version as I have tested it and
it is simple enough and already in the tree. I don't think it's worth the
bother to switch it to the dd version.
Thanks
Jeff
> Thanks.
>
>> -- >8 --
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>> Date: Tue, 25 Apr 2017 18:41:09 +0000
>> Subject: t1450: avoid use of "sed" on the index, which is a binary file
>>
>> The previous step added a path zzzzzzzz to the index, and then used
>> "sed" to replace this string to yyyyyyyy to create a test case where
>> the checksum at the end of the file does not match the contents.
>>
>> Unfortunately, use of "sed" on a non-text file is not portable.
>> Instead, use a Perl script that seeks to the end and modifies the
>> last byte of the file (where we _know_ stores the trailing
>> checksum).
>>
>>
>> ---
>> t/t1450-fsck.sh | 33 ++++++++++++++++++++++++++-------
>> 1 file changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
>> index 677e15a7a4..eff1cd68e9 100755
>> --- a/t/t1450-fsck.sh
>> +++ b/t/t1450-fsck.sh
>> @@ -689,16 +689,35 @@ test_expect_success 'bogus head does not fallback to all heads' '
>> ! grep $blob out
>> '
>>
>> +# Corrupt the checksum on the index.
>> +# Add 1 to the last byte in the SHA.
>> +corrupt_index_checksum () {
>> + perl -w -e '
>> + use Fcntl ":seek";
>> + open my $fh, "+<", ".git/index" or die "open: $!";
>> + binmode $fh;
>> + seek $fh, -1, SEEK_END or die "seek: $!";
>> + read $fh, my $in_byte, 1 or die "read: $!";
>> +
>> + $in_value = unpack("C", $in_byte);
>> + $out_value = ($in_value + 1) & 255;
>> +
>> + $out_byte = pack("C", $out_value);
>> +
>> + seek $fh, -1, SEEK_END or die "seek: $!";
>> + print $fh $out_byte;
>> + close $fh or die "close: $!";
>> + '
>> +}
>> +
>> +# Corrupt the checksum on the index and then
>> +# verify that only fsck notices.
>> test_expect_success 'detect corrupt index file in fsck' '
>> cp .git/index .git/index.backup &&
>> test_when_finished "mv .git/index.backup .git/index" &&
>> - echo zzzzzzzz >zzzzzzzz &&
>> - git add zzzzzzzz &&
>> - sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy &&
>> - mv .git/index.yyy .git/index &&
>> - # Confirm that fsck detects invalid checksum
>> - test_must_fail git fsck --cache &&
>> - # Confirm that status no longer complains about invalid checksum
>> + corrupt_index_checksum &&
>> + test_must_fail git fsck --cache 2>expect &&
>> + grep "bad index file" expect &&
>> git status
>> '
>>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v8] read-cache: call verify_hdr() in a background thread
2017-04-26 13:06 ` Jeff Hostetler
@ 2017-04-27 0:41 ` Junio C Hamano
0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2017-04-27 0:41 UTC (permalink / raw)
To: Jeff Hostetler; +Cc: git, peff, j6t, Jeff Hostetler
Jeff Hostetler <git@jeffhostetler.com> writes:
> Either version is fine. I'd say use my perl version as I have tested it and
> it is simple enough and already in the tree. I don't think it's worth the
> bother to switch it to the dd version.
Thanks, I agree what you said is very sensible.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-04-27 0:41 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-25 18:41 [PATCH v8] read-cache: call verify_hdr() in a background thread git
2017-04-25 18:41 ` [PATCH v8] read-cache: force_verify_index_checksum git
2017-04-26 4:11 ` [PATCH v8] read-cache: call verify_hdr() in a background thread Junio C Hamano
2017-04-26 4:34 ` Junio C Hamano
2017-04-26 13:06 ` Jeff Hostetler
2017-04-27 0:41 ` Junio C Hamano
2017-04-26 13:03 ` Jeff Hostetler
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).