git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
  2018-04-17 18:10 [RFC PATCH 00/12] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
@ 2018-05-10 17:34 ` Derrick Stolee
  2018-05-10 19:05   ` Martin Ågren
  2018-05-10 19:17   ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 9+ messages in thread
From: Derrick Stolee @ 2018-05-10 17:34 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	stolee@gmail.com, Derrick Stolee

The commit-graph feature is not useful to end users until the
commit-graph file is maintained automatically by Git during normal
upkeep operations. One natural place to trigger this write is during
'git gc'.

Before automatically generating a commit-graph, we need to be able to
verify the contents of a commit-graph file. Integrate commit-graph
checks into 'fsck' that check the commit-graph contents against commits
in the object database.

Thanks, Jakub, for the feedback on the RFC. I think there are still some
things to decide at a high-level before we dig too far into the review.
Specifically, the integration points in 'fsck', 'gc', and 'fetch' are
worth considering our alternatives.

For 'fsck', the current integration is minimal: if core.commitGraph is
true, then 'git fsck' calls 'git commit-graph verify --object-dir=X'
for the objects directory and every alternate. There are a few options
to consider here:

1. Keep this behavior: we should always check the commit-graph if it
   exists.

2. Add a --[no-]commit-graph argument to 'fsck' that toggles the
   commit-graph verification.

3. Remove all direct integration between 'fsck' and 'commit-graph' and
   instead rely on users checking 'git commit-graph verify' manually
   when they suspect a problem with the commit-graph file. While this
   option is worth considering, it is my least favorite since it requires
   more from users.

For 'gc' and 'fetch', these seemed like natural places to update the
commit-graph file. Relative to the other maintenance that occurs during
these commands, the 'git commit-graph write' command is fast, especially
for incremental updates; only the "new" commits are walked when computing
generation numbers and other metadata for the commit-graph file.

The behavior in this patch series does the following:

1. Near the end of 'git gc', run 'git commit-graph write'. The location
   of this code assumes that a 'git gc --auto' has not terminated early
   due to not meeting the auto threshold.

2. At the end of 'git fetch', run 'git commit-graph write'. This means
   that every reachable commit will be in the commit-graph after a
   a successful fetch, which seems a reasonable frequency. Then, the
   only times we would be missing a reachable commit is after creating
   one locally. There is a problem with the current patch, though: every
   'git fetch' call runs 'git commit-graph write', even if there were no
   ref updates or objects downloaded. Is there a simple way to detect if
   the fetch was non-trivial?

One obvious problem with this approach: if we compute this during 'gc'
AND 'fetch', there will be times where a 'fetch' calls 'gc' and triggers
two commit-graph writes. If I were to abandon one of these patches, it
would be the 'fetch' integration. A 'git gc' really wants to delete all
references to unreachable commits, and without updating the commit-graph
we may still have commit data in the commit-graph file that is not in
the object database. In fact, deleting commits from the object database
but not from the commit-graph will cause 'git commit-graph verify' to
fail!

I welcome discussion on these ideas, as we are venturing out of the
"pure data structure" world and into the "user experience" world. I am
less confident in my skills in this world, but the feature is worthless
if it does not improve the user experience.

Thanks,
-Stolee

Derrick Stolee (12):

Commits 01-07 focus on the 'git commit-graph verify' subcommand. These
are ready for full, rigorous review.

  commit-graph: add 'verify' subcommand
  commit-graph: verify file header information
  commit-graph: parse commit from chosen graph
  commit-graph: verify fanout and lookup table
  commit: force commit to parse from object database
  commit-graph: load a root tree from specific graph
  commit-graph: verify commit contents against odb

Commit 08 integrates 'git commit-graph verify' into fsck. The work here
is the minimum integration possible. (See above discussion of options.)

  fsck: verify commit-graph

Commit 09 introduces a new '--reachable' option only to make the calls
from 'gc' and 'fetch' simpler. Commits 10-11 integrate writing the
commit-graph into 'gc' and 'fetch', respectively. (See above disucssion.)

  commit-graph: add '--reachable' option
  gc: automatically write commit-graph files
  fetch: compute commit-graph by default

Commit 12 simply deletes sections from the "Future Work" section
of the commit-graph design document.

  commit-graph: update design document

 Documentation/config.txt                 |  10 ++
 Documentation/git-commit-graph.txt       |  14 ++-
 Documentation/git-fsck.txt               |   3 +
 Documentation/git-gc.txt                 |   4 +
 Documentation/technical/commit-graph.txt |  22 ----
 builtin/commit-graph.c                   |  79 +++++++++++++-
 builtin/fetch.c                          |  13 +++
 builtin/fsck.c                           |  21 ++++
 builtin/gc.c                             |   8 ++
 commit-graph.c                           | 175 ++++++++++++++++++++++++++++++-
 commit-graph.h                           |   2 +
 commit.c                                 |  13 ++-
 commit.h                                 |   1 +
 t/t5318-commit-graph.sh                  |  25 +++++
 14 files changed, 353 insertions(+), 37 deletions(-)


base-commit: 34fdd433396ee0e3ef4de02eb2189f8226eafe4e
-- 
2.16.2.329.gfb62395de6


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
  2018-05-10 17:34 ` [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch Derrick Stolee
@ 2018-05-10 19:05   ` Martin Ågren
  2018-05-10 19:22     ` Stefan Beller
  2018-05-10 19:17   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 9+ messages in thread
From: Martin Ågren @ 2018-05-10 19:05 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git@vger.kernel.org, peff@peff.net, sbeller@google.com,
	jnareb@gmail.com, stolee@gmail.com

On 10 May 2018 at 19:34, Derrick Stolee <dstolee@microsoft.com> wrote:

> Commits 01-07 focus on the 'git commit-graph verify' subcommand. These
> are ready for full, rigorous review.

I don't know about "full" and "rigorous", but I tried to offer my thoughts.

A lingering feeling I have is that users could possibly benefit from
seeing "the commit-graph has a bad foo" a bit more than just "the
commit-graph is bad". But adding "the bar is baz, should have been
frotz" might not bring that much. Maybe you could keep the translatable
string somewhat simple, then, if the more technical data could be useful
to Git developers, dump it in a non-translated format. (I guess it could
be hidden behind a debug switch, but let's take one step at a time..)
This is nothing I feel strongly about.

>  t/t5318-commit-graph.sh                  |  25 +++++

I wonder about tests. Some tests seem to use `dd` to corrupt files and
check that it gets caught. Doing this in a a hash-agnostic way could be
tricky, but maybe not impossible. I guess we could do something
probabilistic, like "set the first two bytes of the very last OID to
zero -- surely all OIDs can't start with 16 zero bits". Hmm, that might
still require knowing the size of the OIDs...

I hope to find time to do some more hands-on testing of this to see that
errors actually do get caught.

I saw you redirect stdout to a file "output", and anticipated later
commits to actually look into it. I never saw that though. (I did not
apply the patches, so I could have missed something.)

Martin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
  2018-05-10 17:34 ` [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch Derrick Stolee
  2018-05-10 19:05   ` Martin Ågren
@ 2018-05-10 19:17   ` Ævar Arnfjörð Bjarmason
  2018-05-11 17:23     ` Derrick Stolee
  1 sibling, 1 reply; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-05-10 19:17 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git@vger.kernel.org, peff@peff.net, sbeller@google.com,
	jnareb@gmail.com, stolee@gmail.com


On Thu, May 10 2018, Derrick Stolee wrote:

> The behavior in this patch series does the following:
>
> 1. Near the end of 'git gc', run 'git commit-graph write'. The location
>    of this code assumes that a 'git gc --auto' has not terminated early
>    due to not meeting the auto threshold.
>
> 2. At the end of 'git fetch', run 'git commit-graph write'. This means
>    that every reachable commit will be in the commit-graph after a
>    a successful fetch, which seems a reasonable frequency. Then, the
>    only times we would be missing a reachable commit is after creating
>    one locally. There is a problem with the current patch, though: every
>    'git fetch' call runs 'git commit-graph write', even if there were no
>    ref updates or objects downloaded. Is there a simple way to detect if
>    the fetch was non-trivial?
>
> One obvious problem with this approach: if we compute this during 'gc'
> AND 'fetch', there will be times where a 'fetch' calls 'gc' and triggers
> two commit-graph writes. If I were to abandon one of these patches, it
> would be the 'fetch' integration. A 'git gc' really wants to delete all
> references to unreachable commits, and without updating the commit-graph
> we may still have commit data in the commit-graph file that is not in
> the object database. In fact, deleting commits from the object database
> but not from the commit-graph will cause 'git commit-graph verify' to
> fail!
>
> I welcome discussion on these ideas, as we are venturing out of the
> "pure data structure" world and into the "user experience" world. I am
> less confident in my skills in this world, but the feature is worthless
> if it does not improve the user experience.

I really like #1 here, but I wonder why #2 is necessary.

I.e. is it critical for the performance of the commit graph feature that
it be kept really up-to-date, moreso than other things that rely on gc
--auto (e.g. the optional bitmap index)?

Even if that's the case, I think something that does this via gc --auto
is a much better option. I.e. now we have gc.auto & gc.autoPackLimit, if
the answer to my question above is "yes" this could also be accomplished
by introducing a new graph-specific gc.* setting, and --auto would just
update the graph more often, but leave the rest.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
  2018-05-10 19:05   ` Martin Ågren
@ 2018-05-10 19:22     ` Stefan Beller
  2018-05-11 17:23       ` Derrick Stolee
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Beller @ 2018-05-10 19:22 UTC (permalink / raw)
  To: Martin Ågren
  Cc: Derrick Stolee, git@vger.kernel.org, peff@peff.net,
	jnareb@gmail.com, stolee@gmail.com

On Thu, May 10, 2018 at 12:05 PM, Martin Ågren <martin.agren@gmail.com> wrote:
> On 10 May 2018 at 19:34, Derrick Stolee <dstolee@microsoft.com> wrote:
>
>> Commits 01-07 focus on the 'git commit-graph verify' subcommand. These
>> are ready for full, rigorous review.
>
> I don't know about "full" and "rigorous", but I tried to offer my thoughts.
>
> A lingering feeling I have is that users could possibly benefit from
> seeing "the commit-graph has a bad foo" a bit more than just "the
> commit-graph is bad". But adding "the bar is baz, should have been
> frotz" might not bring that much. Maybe you could keep the translatable
> string somewhat simple, then, if the more technical data could be useful
> to Git developers, dump it in a non-translated format. (I guess it could
> be hidden behind a debug switch, but let's take one step at a time..)
> This is nothing I feel strongly about.
>
>>  t/t5318-commit-graph.sh                  |  25 +++++
>
> I wonder about tests. Some tests seem to use `dd` to corrupt files and
> check that it gets caught. Doing this in a a hash-agnostic way could be
> tricky, but maybe not impossible. I guess we could do something
> probabilistic, like "set the first two bytes of the very last OID to
> zero -- surely all OIDs can't start with 16 zero bits". Hmm, that might
> still require knowing the size of the OIDs...
>
> I hope to find time to do some more hands-on testing of this to see that
> errors actually do get caught.

Given that the commit graph is secondary data, the users work around
to quickly get back to a well working repository is most likely to remove
the file and regenerate it.
As a developer who wants to fix the bug, a stacktrace/datadump and the
history of git commands might be most valuable, but I agree we should
hide that behind a debug flag.

Packfiles and loose objects are primary data, which means that those
need a more advanced way to diagnose and repair them, so I would imagine
the commit graph fsck is closer to bitmaps fsck, which I would have suspected
to be found in t5310, but a quick read doesn't reveal many tests that are
checking for integrity. So I guess the test coverage here is ok, (although we
should always ask for more)

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
@ 2018-05-10 20:45 Martin Ågren
  2018-05-11 12:30 ` Derrick Stolee
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Ågren @ 2018-05-10 20:45 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Stefan Beller, Jeff King, Jakub Narebski, Derrick Stolee

On 10 May 2018 at 21:22, Stefan Beller <sbeller@google.com> wrote:
> On Thu, May 10, 2018 at 12:05 PM, Martin Ågren <martin.agren@gmail.com> wrote:

>> I hope to find time to do some more hands-on testing of this to see that
>> errors actually do get caught.

> Packfiles and loose objects are primary data, which means that those
> need a more advanced way to diagnose and repair them, so I would imagine
> the commit graph fsck is closer to bitmaps fsck, which I would have suspected
> to be found in t5310, but a quick read doesn't reveal many tests that are
> checking for integrity. So I guess the test coverage here is ok, (although we
> should always ask for more)

Since I'm wrapping up for today, I'm posting some simple tests that I
assembled. The last of these showcases one or two problems with the
current error-reporting. Depending on the error, there can be *lots* of
errors reported and there are no new-lines, so the result on stdout can
be a wall of not-very-legible text.

Some of these might not make sense. I just started going through the
documentation on the format, causing some sort of corruption in each
field. Maybe this can be helpful somehow.

Martin

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 82f95eb11f..a7e48db2de 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -255,4 +255,49 @@ test_expect_success 'git fsck (checks commit-graph)' '
 	git fsck
 '
 
+# usage: corrupt_data <file> <pos> [<data>]
+corrupt_data() {
+	file=$1
+	pos=$2
+	data="${3:-\0}"
+	printf "$data" | dd of="$file" bs=1 seek="$pos" conv=notrunc
+}
+
+test_expect_success 'detect bad signature' '
+	cd "$TRASH_DIRECTORY/full" &&
+	cp $objdir/info/commit-graph commit-graph-backup &&
+	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+	corrupt_data $objdir/info/commit-graph 0 "\0" &&
+	test_must_fail git commit-graph verify 2>err &&
+	grep "graph signature" err
+'
+
+test_expect_success 'detect bad version number' '
+	cd "$TRASH_DIRECTORY/full" &&
+	cp $objdir/info/commit-graph commit-graph-backup &&
+	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+	corrupt_data $objdir/info/commit-graph 4 "\02" &&
+	test_must_fail git commit-graph verify 2>err &&
+	grep "graph version" err
+'
+
+test_expect_success 'detect bad hash version' '
+	cd "$TRASH_DIRECTORY/full" &&
+	cp $objdir/info/commit-graph commit-graph-backup &&
+	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+	corrupt_data $objdir/info/commit-graph 5 "\02" &&
+	test_must_fail git commit-graph verify 2>err &&
+	grep "hash version" err
+'
+
+test_expect_success 'detect too small chunk-count' '
+	cd "$TRASH_DIRECTORY/full" &&
+	cp $objdir/info/commit-graph commit-graph-backup &&
+	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+	corrupt_data $objdir/info/commit-graph 6 "\01" &&
+	test_must_fail git commit-graph verify 2>err &&
+	cat err
+'
+
+
 test_done
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
  2018-05-10 20:45 [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch Martin Ågren
@ 2018-05-11 12:30 ` Derrick Stolee
  0 siblings, 0 replies; 9+ messages in thread
From: Derrick Stolee @ 2018-05-11 12:30 UTC (permalink / raw)
  To: Martin Ågren, Derrick Stolee
  Cc: git, Stefan Beller, Jeff King, Jakub Narebski

On 5/10/2018 4:45 PM, Martin Ågren wrote:
> On 10 May 2018 at 21:22, Stefan Beller <sbeller@google.com> wrote:
>> On Thu, May 10, 2018 at 12:05 PM, Martin Ågren <martin.agren@gmail.com> wrote:
>>> I hope to find time to do some more hands-on testing of this to see that
>>> errors actually do get caught.
>> Packfiles and loose objects are primary data, which means that those
>> need a more advanced way to diagnose and repair them, so I would imagine
>> the commit graph fsck is closer to bitmaps fsck, which I would have suspected
>> to be found in t5310, but a quick read doesn't reveal many tests that are
>> checking for integrity. So I guess the test coverage here is ok, (although we
>> should always ask for more)
> Since I'm wrapping up for today, I'm posting some simple tests that I
> assembled. The last of these showcases one or two problems with the
> current error-reporting. Depending on the error, there can be *lots* of
> errors reported and there are no new-lines, so the result on stdout can
> be a wall of not-very-legible text.
>
> Some of these might not make sense. I just started going through the
> documentation on the format, causing some sort of corruption in each
> field. Maybe this can be helpful somehow.
>
> Martin
>
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 82f95eb11f..a7e48db2de 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -255,4 +255,49 @@ test_expect_success 'git fsck (checks commit-graph)' '
>   	git fsck
>   '
>   
> +# usage: corrupt_data <file> <pos> [<data>]
> +corrupt_data() {
> +	file=$1
> +	pos=$2
> +	data="${3:-\0}"
> +	printf "$data" | dd of="$file" bs=1 seek="$pos" conv=notrunc
> +}
> +
> +test_expect_success 'detect bad signature' '
> +	cd "$TRASH_DIRECTORY/full" &&
> +	cp $objdir/info/commit-graph commit-graph-backup &&
> +	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
> +	corrupt_data $objdir/info/commit-graph 0 "\0" &&
> +	test_must_fail git commit-graph verify 2>err &&
> +	grep "graph signature" err
> +'
> +
> +test_expect_success 'detect bad version number' '
> +	cd "$TRASH_DIRECTORY/full" &&
> +	cp $objdir/info/commit-graph commit-graph-backup &&
> +	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
> +	corrupt_data $objdir/info/commit-graph 4 "\02" &&
> +	test_must_fail git commit-graph verify 2>err &&
> +	grep "graph version" err
> +'
> +
> +test_expect_success 'detect bad hash version' '
> +	cd "$TRASH_DIRECTORY/full" &&
> +	cp $objdir/info/commit-graph commit-graph-backup &&
> +	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
> +	corrupt_data $objdir/info/commit-graph 5 "\02" &&
> +	test_must_fail git commit-graph verify 2>err &&
> +	grep "hash version" err
> +'
> +
> +test_expect_success 'detect too small chunk-count' '
> +	cd "$TRASH_DIRECTORY/full" &&
> +	cp $objdir/info/commit-graph commit-graph-backup &&
> +	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
> +	corrupt_data $objdir/info/commit-graph 6 "\01" &&
> +	test_must_fail git commit-graph verify 2>err &&
> +	cat err
> +'
> +
> +
>   test_done


Martin: thank you so much for these test examples, and for running them 
to find out about the newline issue. This is really helpful.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
  2018-05-10 19:17   ` Ævar Arnfjörð Bjarmason
@ 2018-05-11 17:23     ` Derrick Stolee
  0 siblings, 0 replies; 9+ messages in thread
From: Derrick Stolee @ 2018-05-11 17:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee
  Cc: git@vger.kernel.org, peff@peff.net, sbeller@google.com,
	jnareb@gmail.com

On 5/10/2018 3:17 PM, Ævar Arnfjörð Bjarmason wrote:
> On Thu, May 10 2018, Derrick Stolee wrote:
>
>> The behavior in this patch series does the following:
>>
>> 1. Near the end of 'git gc', run 'git commit-graph write'. The location
>>     of this code assumes that a 'git gc --auto' has not terminated early
>>     due to not meeting the auto threshold.
>>
>> 2. At the end of 'git fetch', run 'git commit-graph write'. This means
>>     that every reachable commit will be in the commit-graph after a
>>     a successful fetch, which seems a reasonable frequency. Then, the
>>     only times we would be missing a reachable commit is after creating
>>     one locally. There is a problem with the current patch, though: every
>>     'git fetch' call runs 'git commit-graph write', even if there were no
>>     ref updates or objects downloaded. Is there a simple way to detect if
>>     the fetch was non-trivial?
>>
>> One obvious problem with this approach: if we compute this during 'gc'
>> AND 'fetch', there will be times where a 'fetch' calls 'gc' and triggers
>> two commit-graph writes. If I were to abandon one of these patches, it
>> would be the 'fetch' integration. A 'git gc' really wants to delete all
>> references to unreachable commits, and without updating the commit-graph
>> we may still have commit data in the commit-graph file that is not in
>> the object database. In fact, deleting commits from the object database
>> but not from the commit-graph will cause 'git commit-graph verify' to
>> fail!
>>
>> I welcome discussion on these ideas, as we are venturing out of the
>> "pure data structure" world and into the "user experience" world. I am
>> less confident in my skills in this world, but the feature is worthless
>> if it does not improve the user experience.
> I really like #1 here, but I wonder why #2 is necessary.
>
> I.e. is it critical for the performance of the commit graph feature that
> it be kept really up-to-date, moreso than other things that rely on gc
> --auto (e.g. the optional bitmap index)?

It is not critical. The feature has been designed to have recent commits 
not in the file. For simplicity, it is probably best to limit ourselves 
to writing after a non-trivial 'gc'.

> Even if that's the case, I think something that does this via gc --auto
> is a much better option. I.e. now we have gc.auto & gc.autoPackLimit, if
> the answer to my question above is "yes" this could also be accomplished
> by introducing a new graph-specific gc.* setting, and --auto would just
> update the graph more often, but leave the rest.

This is an excellent idea for a follow-up series, if we find we want the 
commit-graph written more frequently. For now, I'm satisfied with one 
place where it is automatically computed.

I'll drop the fetch integration in my v2 series.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
  2018-05-10 19:22     ` Stefan Beller
@ 2018-05-11 17:23       ` Derrick Stolee
  2018-05-11 17:30         ` Martin Ågren
  0 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee @ 2018-05-11 17:23 UTC (permalink / raw)
  To: Stefan Beller, Martin Ågren
  Cc: Derrick Stolee, git@vger.kernel.org, peff@peff.net,
	jnareb@gmail.com

On 5/10/2018 3:22 PM, Stefan Beller wrote:
> On Thu, May 10, 2018 at 12:05 PM, Martin Ågren <martin.agren@gmail.com> wrote:
>> On 10 May 2018 at 19:34, Derrick Stolee <dstolee@microsoft.com> wrote:
>>
>>> Commits 01-07 focus on the 'git commit-graph verify' subcommand. These
>>> are ready for full, rigorous review.
>> I don't know about "full" and "rigorous", but I tried to offer my thoughts.
>>
>> A lingering feeling I have is that users could possibly benefit from
>> seeing "the commit-graph has a bad foo" a bit more than just "the
>> commit-graph is bad". But adding "the bar is baz, should have been
>> frotz" might not bring that much. Maybe you could keep the translatable
>> string somewhat simple, then, if the more technical data could be useful
>> to Git developers, dump it in a non-translated format. (I guess it could
>> be hidden behind a debug switch, but let's take one step at a time..)
>> This is nothing I feel strongly about.
>>
>>>   t/t5318-commit-graph.sh                  |  25 +++++
>> I wonder about tests. Some tests seem to use `dd` to corrupt files and
>> check that it gets caught. Doing this in a a hash-agnostic way could be
>> tricky, but maybe not impossible. I guess we could do something
>> probabilistic, like "set the first two bytes of the very last OID to
>> zero -- surely all OIDs can't start with 16 zero bits". Hmm, that might
>> still require knowing the size of the OIDs...
>>
>> I hope to find time to do some more hands-on testing of this to see that
>> errors actually do get caught.
> Given that the commit graph is secondary data, the users work around
> to quickly get back to a well working repository is most likely to remove
> the file and regenerate it.
> As a developer who wants to fix the bug, a stacktrace/datadump and the
> history of git commands might be most valuable, but I agree we should
> hide that behind a debug flag.
>
> Packfiles and loose objects are primary data, which means that those
> need a more advanced way to diagnose and repair them, so I would imagine
> the commit graph fsck is closer to bitmaps fsck, which I would have suspected
> to be found in t5310, but a quick read doesn't reveal many tests that are
> checking for integrity. So I guess the test coverage here is ok, (although we
> should always ask for more)

My main goal is to help developers figure out what is wrong with a file, 
and then we can use other diagnostic tools to discover how it got into 
that state.

Martin's initial test cases are wonderful. I've adapted them to test the 
other conditions in the verify_commit_graph() method and caught some 
interesting behavior in the process. I'm preparing v2 so we can 
investigate the direction of the tests.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
  2018-05-11 17:23       ` Derrick Stolee
@ 2018-05-11 17:30         ` Martin Ågren
  0 siblings, 0 replies; 9+ messages in thread
From: Martin Ågren @ 2018-05-11 17:30 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Stefan Beller, Derrick Stolee, git@vger.kernel.org, peff@peff.net,
	jnareb@gmail.com

On 11 May 2018 at 19:23, Derrick Stolee <stolee@gmail.com> wrote:

> Martin's initial test cases are wonderful. I've adapted them to test the
> other conditions in the verify_commit_graph() method and caught some
> interesting behavior in the process. I'm preparing v2 so we can investigate
> the direction of the tests.

Cool, I'm glad you found them useful. One thought I had was that you
could possibly write the tests such that you introduce errors from the
back of the file. That might enable you to do less of the "backup
commit-graph file and restore it"-dance. Just a thought.

Martin

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-05-11 17:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-10 20:45 [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch Martin Ågren
2018-05-11 12:30 ` Derrick Stolee
  -- strict thread matches above, loose matches on Subject: below --
2018-04-17 18:10 [RFC PATCH 00/12] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
2018-05-10 17:34 ` [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch Derrick Stolee
2018-05-10 19:05   ` Martin Ågren
2018-05-10 19:22     ` Stefan Beller
2018-05-11 17:23       ` Derrick Stolee
2018-05-11 17:30         ` Martin Ågren
2018-05-10 19:17   ` Ævar Arnfjörð Bjarmason
2018-05-11 17:23     ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).