* [PATCH] commit-graph: add verify changed paths option
@ 2020-07-31 7:49 Son Luong Ngoc via GitGitGadget
2020-07-31 16:21 ` Christian Couder
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Son Luong Ngoc via GitGitGadget @ 2020-07-31 7:49 UTC (permalink / raw)
To: git; +Cc: Son Luong Ngoc, Son Luong Ngoc
From: Son Luong Ngoc <sluongng@gmail.com>
Add '--has-changed-paths' option to 'git commit-graph verify' subcommand
to validate whether the commit-graph was written with '--changed-paths'
option.
Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
Commit-Graph: Verify bloom filter
When I was working on git-care(1) and Gitaly(2), the need to check
whether a commit-graph (split or non-split) were built with Bloom
filter. This is needed especially when a repository primary commit-graph
write strategy is '--split' and the bottom chains might rarely be
re-written (or never) thus Bloom filter is never applied to the graph.
Provides users with a straight forward way to validate the existence of
Bloom filter chunks to save user having to read the commit-graph
manually as show in (1) and (2).
References:
1. https://github.com/sluongng/git-care/commit/d0feaa381ea3ec7b0e617c6596ad6e3cf16b884a
2. https://gitlab.com/sluongng/gitaly/-/commit/78dba8b73e720b11500482b19b755346ec853025
------------------------------------------------------------------------
It's probably going to take me a bit more time to write up some tests
for this, so I want to send it out first for comments.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-687%2Fsluongng%2Fsluongngoc%2Fverify-bloom-filter-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-687/sluongng/sluongngoc/verify-bloom-filter-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/687
builtin/commit-graph.c | 12 +++++++++---
commit-graph.c | 22 +++++++++++++++++-----
commit-graph.h | 12 +++++++++---
3 files changed, 35 insertions(+), 11 deletions(-)
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 16c9f6101a..ce8a7cbe90 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -18,7 +18,8 @@ static char const * const builtin_commit_graph_usage[] = {
};
static const char * const builtin_commit_graph_verify_usage[] = {
- N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
+ N_("git commit-graph verify [--object-dir <objdir>] [--shallow] "
+ "[--has-changed-paths] [--[no-]progress]"),
NULL
};
@@ -37,6 +38,7 @@ static struct opts_commit_graph {
int append;
int split;
int shallow;
+ int has_changed_paths;
int progress;
int enable_changed_paths;
} opts;
@@ -71,12 +73,14 @@ static int graph_verify(int argc, const char **argv)
int open_ok;
int fd;
struct stat st;
- int flags = 0;
+ enum commit_graph_verify_flags flags = 0;
static struct option builtin_commit_graph_verify_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir,
N_("dir"),
N_("The object directory to store the graph")),
+ OPT_BOOL(0, "has-changed-paths", &opts.has_changed_paths,
+ N_("verify that the commit-graph includes changed paths")),
OPT_BOOL(0, "shallow", &opts.shallow,
N_("if the commit-graph is split, only verify the tip file")),
OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
@@ -94,8 +98,10 @@ static int graph_verify(int argc, const char **argv)
opts.obj_dir = get_object_directory();
if (opts.shallow)
flags |= COMMIT_GRAPH_VERIFY_SHALLOW;
+ if (opts.has_changed_paths)
+ flags |= COMMIT_GRAPH_VERIFY_CHANGED_PATHS;
if (opts.progress)
- flags |= COMMIT_GRAPH_WRITE_PROGRESS;
+ flags |= COMMIT_GRAPH_VERIFY_PROGRESS;
odb = find_odb(the_repository, opts.obj_dir);
graph_name = get_commit_graph_filename(odb);
diff --git a/commit-graph.c b/commit-graph.c
index 1af68c297d..d83f5a2325 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -250,7 +250,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st,
return ret;
}
-static int verify_commit_graph_lite(struct commit_graph *g)
+static int verify_commit_graph_lite(struct commit_graph *g, int verify_changed_path)
{
/*
* Basic validation shared between parse_commit_graph()
@@ -276,6 +276,16 @@ static int verify_commit_graph_lite(struct commit_graph *g)
error("commit-graph is missing the Commit Data chunk");
return 1;
}
+ if (verify_changed_path) {
+ if (!g->chunk_bloom_indexes) {
+ error("commit-graph is missing Bloom Index chunk");
+ return 1;
+ }
+ if (!g->chunk_bloom_data) {
+ error("commit-graph is missing Bloom Data chunk");
+ return 1;
+ }
+ }
return 0;
}
@@ -439,7 +449,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size)
hashcpy(graph->oid.hash, graph->data + graph->data_len - graph->hash_len);
- if (verify_commit_graph_lite(graph))
+ if (verify_commit_graph_lite(graph, 0))
goto free_and_return;
return graph;
@@ -2216,7 +2226,9 @@ static void graph_report(const char *fmt, ...)
#define GENERATION_ZERO_EXISTS 1
#define GENERATION_NUMBER_EXISTS 2
-int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
+int verify_commit_graph(struct repository *r,
+ struct commit_graph *g,
+ enum commit_graph_verify_flags flags)
{
uint32_t i, cur_fanout_pos = 0;
struct object_id prev_oid, cur_oid, checksum;
@@ -2231,7 +2243,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
return 1;
}
- verify_commit_graph_error = verify_commit_graph_lite(g);
+ verify_commit_graph_error = verify_commit_graph_lite(g, flags & COMMIT_GRAPH_VERIFY_CHANGED_PATHS);
if (verify_commit_graph_error)
return verify_commit_graph_error;
@@ -2284,7 +2296,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
return verify_commit_graph_error;
- if (flags & COMMIT_GRAPH_WRITE_PROGRESS)
+ if (flags & COMMIT_GRAPH_VERIFY_PROGRESS)
progress = start_progress(_("Verifying commits in commit graph"),
g->num_commits);
diff --git a/commit-graph.h b/commit-graph.h
index 28f89cdf3e..29c01b5000 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -94,6 +94,12 @@ enum commit_graph_write_flags {
COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 3),
};
+enum commit_graph_verify_flags {
+ COMMIT_GRAPH_VERIFY_SHALLOW = (1 << 0),
+ COMMIT_GRAPH_VERIFY_CHANGED_PATHS = (1 << 1),
+ COMMIT_GRAPH_VERIFY_PROGRESS = (1 << 2),
+};
+
enum commit_graph_split_flags {
COMMIT_GRAPH_SPLIT_UNSPECIFIED = 0,
COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED = 1,
@@ -122,9 +128,9 @@ int write_commit_graph(struct object_directory *odb,
enum commit_graph_write_flags flags,
const struct split_commit_graph_opts *split_opts);
-#define COMMIT_GRAPH_VERIFY_SHALLOW (1 << 0)
-
-int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags);
+int verify_commit_graph(struct repository *r,
+ struct commit_graph *g,
+ enum commit_graph_verify_flags flags);
void close_commit_graph(struct raw_object_store *);
void free_commit_graph(struct commit_graph *);
base-commit: 47ae905ffb98cc4d4fd90083da6bc8dab55d9ecc
--
gitgitgadget
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] commit-graph: add verify changed paths option
2020-07-31 7:49 [PATCH] commit-graph: add verify changed paths option Son Luong Ngoc via GitGitGadget
@ 2020-07-31 16:21 ` Christian Couder
2020-07-31 17:14 ` Junio C Hamano
2020-07-31 18:02 ` Jeff King
2 siblings, 0 replies; 8+ messages in thread
From: Christian Couder @ 2020-07-31 16:21 UTC (permalink / raw)
To: Son Luong Ngoc via GitGitGadget; +Cc: git, Son Luong Ngoc
On Fri, Jul 31, 2020 at 9:52 AM Son Luong Ngoc via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Son Luong Ngoc <sluongng@gmail.com>
>
> Add '--has-changed-paths' option to 'git commit-graph verify' subcommand
> to validate whether the commit-graph was written with '--changed-paths'
> option.
>
> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
[...]
> It's probably going to take me a bit more time to write up some tests
> for this,
It would need some documentation too.
> so I want to send it out first for comments.
[...]
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 16c9f6101a..ce8a7cbe90 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -18,7 +18,8 @@ static char const * const builtin_commit_graph_usage[] = {
> };
>
> static const char * const builtin_commit_graph_verify_usage[] = {
> - N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
> + N_("git commit-graph verify [--object-dir <objdir>] [--shallow] "
> + "[--has-changed-paths] [--[no-]progress]"),
> NULL
> };
>
> @@ -37,6 +38,7 @@ static struct opts_commit_graph {
> int append;
> int split;
> int shallow;
> + int has_changed_paths;
> int progress;
> int enable_changed_paths;
> } opts;
> @@ -71,12 +73,14 @@ static int graph_verify(int argc, const char **argv)
> int open_ok;
> int fd;
> struct stat st;
> - int flags = 0;
> + enum commit_graph_verify_flags flags = 0;
>
> static struct option builtin_commit_graph_verify_options[] = {
> OPT_STRING(0, "object-dir", &opts.obj_dir,
> N_("dir"),
> N_("The object directory to store the graph")),
> + OPT_BOOL(0, "has-changed-paths", &opts.has_changed_paths,
> + N_("verify that the commit-graph includes changed paths")),
> OPT_BOOL(0, "shallow", &opts.shallow,
> N_("if the commit-graph is split, only verify the tip file")),
> OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
> @@ -94,8 +98,10 @@ static int graph_verify(int argc, const char **argv)
> opts.obj_dir = get_object_directory();
> if (opts.shallow)
> flags |= COMMIT_GRAPH_VERIFY_SHALLOW;
> + if (opts.has_changed_paths)
> + flags |= COMMIT_GRAPH_VERIFY_CHANGED_PATHS;
I wonder if OPT_BIT() could be used instead of OPT_BOOL() above to
directly set the above flag, as the 'has_changed_paths' field in
'struct opts_commit_graph' seems to be used only for the purpose of
setting this flag.
> if (opts.progress)
> - flags |= COMMIT_GRAPH_WRITE_PROGRESS;
> + flags |= COMMIT_GRAPH_VERIFY_PROGRESS;
Does this change belong to this patch? I think it would deserve an
explanation in the commit message if that's the case.
> odb = find_odb(the_repository, opts.obj_dir);
> graph_name = get_commit_graph_filename(odb);
> diff --git a/commit-graph.c b/commit-graph.c
> index 1af68c297d..d83f5a2325 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -250,7 +250,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st,
> return ret;
> }
>
> -static int verify_commit_graph_lite(struct commit_graph *g)
> +static int verify_commit_graph_lite(struct commit_graph *g, int verify_changed_path)
[...]
> -int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
> +int verify_commit_graph(struct repository *r,
> + struct commit_graph *g,
> + enum commit_graph_verify_flags flags)
It seems to me that it would be more coherent to have both
verify_commit_graph() and verify_commit_graph_lite() accept an 'enum
commit_graph_verify_flags flags' argument.
Right now the "has_changed_paths" option is first an int, then it's
converted to a flag and then to an int again before being passed to
verify_commit_graph_lite(). It would be simpler if it could be a flag
all along.
Thanks,
Christian.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] commit-graph: add verify changed paths option
2020-07-31 7:49 [PATCH] commit-graph: add verify changed paths option Son Luong Ngoc via GitGitGadget
2020-07-31 16:21 ` Christian Couder
@ 2020-07-31 17:14 ` Junio C Hamano
2020-07-31 18:06 ` Taylor Blau
2020-07-31 18:02 ` Jeff King
2 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2020-07-31 17:14 UTC (permalink / raw)
To: Son Luong Ngoc via GitGitGadget; +Cc: git, Son Luong Ngoc
"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Son Luong Ngoc <sluongng@gmail.com>
>
> Add '--has-changed-paths' option to 'git commit-graph verify' subcommand
> to validate whether the commit-graph was written with '--changed-paths'
> option.
The implementation seems to be only about "does this section exist?"
and not "does this section have healthy/uncorrupted data?", which
feels a bit strange for "verify". Instead of setting ourselves up
to having to add "--has-this-section" and "--has-that-section" every
time a new kind of data is added to the system, how about giving the
verify command an option to list all the sections found in the file,
or a separate "git commit-graph list-sections" subcommand?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] commit-graph: add verify changed paths option
2020-07-31 7:49 [PATCH] commit-graph: add verify changed paths option Son Luong Ngoc via GitGitGadget
2020-07-31 16:21 ` Christian Couder
2020-07-31 17:14 ` Junio C Hamano
@ 2020-07-31 18:02 ` Jeff King
2020-07-31 18:09 ` Taylor Blau
2 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2020-07-31 18:02 UTC (permalink / raw)
To: Son Luong Ngoc via GitGitGadget; +Cc: git, Son Luong Ngoc
On Fri, Jul 31, 2020 at 07:49:25AM +0000, Son Luong Ngoc via GitGitGadget wrote:
> From: Son Luong Ngoc <sluongng@gmail.com>
>
> Add '--has-changed-paths' option to 'git commit-graph verify' subcommand
> to validate whether the commit-graph was written with '--changed-paths'
> option.
Is a single boolean flag sufficient? If you have incrementals, you might
have some slices with this chunk and some without. What should the
boolean be in that case?
I thought we had some way of reporting the number of commits covered by
filters, but I can't seem to find it.
Our "test-tool read-graph" can report on whether there's a bloom filter
chunk, but I think it also doesn't distinguish between different slices
(and anyway, it wouldn't be suitable for tools that don't rely on an
actual built git.git directory).
-Peff
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] commit-graph: add verify changed paths option
2020-07-31 17:14 ` Junio C Hamano
@ 2020-07-31 18:06 ` Taylor Blau
0 siblings, 0 replies; 8+ messages in thread
From: Taylor Blau @ 2020-07-31 18:06 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Son Luong Ngoc via GitGitGadget, git, Son Luong Ngoc
On Fri, Jul 31, 2020 at 10:14:39AM -0700, Junio C Hamano wrote:
> "Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Son Luong Ngoc <sluongng@gmail.com>
> >
> > Add '--has-changed-paths' option to 'git commit-graph verify' subcommand
> > to validate whether the commit-graph was written with '--changed-paths'
> > option.
>
> The implementation seems to be only about "does this section exist?"
> and not "does this section have healthy/uncorrupted data?", which
> feels a bit strange for "verify". Instead of setting ourselves up
> to having to add "--has-this-section" and "--has-that-section" every
> time a new kind of data is added to the system, how about giving the
> verify command an option to list all the sections found in the file,
> or a separate "git commit-graph list-sections" subcommand?
Completely agreed. When I suggested that Son work on this, I more had in
mind something like 'git commit-graph verify --changed-paths' to mean
"verify the integrity of the commit-graph(s), including regenerating
changed-path Bloom filters and making sure they match".
If you are just curious whether or not the section exists, I'd rather
write a script to look for the 'BIDX' or 'BDAT' chunk IDs. That said, if
they're spread across incremental, maybe it makes more sense to extend
the commit-graph test tool.
I dunno.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] commit-graph: add verify changed paths option
2020-07-31 18:02 ` Jeff King
@ 2020-07-31 18:09 ` Taylor Blau
2020-07-31 19:14 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Taylor Blau @ 2020-07-31 18:09 UTC (permalink / raw)
To: Jeff King; +Cc: Son Luong Ngoc via GitGitGadget, git, Son Luong Ngoc
On Fri, Jul 31, 2020 at 02:02:35PM -0400, Jeff King wrote:
> On Fri, Jul 31, 2020 at 07:49:25AM +0000, Son Luong Ngoc via GitGitGadget wrote:
>
> > From: Son Luong Ngoc <sluongng@gmail.com>
> >
> > Add '--has-changed-paths' option to 'git commit-graph verify' subcommand
> > to validate whether the commit-graph was written with '--changed-paths'
> > option.
>
> Is a single boolean flag sufficient? If you have incrementals, you might
> have some slices with this chunk and some without. What should the
> boolean be in that case?
I think you'd really want to know which layers do and don't have
filters. It might be even more interesting to have a tool like what 'git
show-index' is to '*.idx' files, maybe something like 'git show-graph'
or 'git show-commit-graph'. Its output would be one line per commit that
shows:
- what layer in the chain it's located at
- its graph_pos
- its generation number
- whether or not it has a Bloom filter
- ???
That would be a useful tool for debugging anyway, even outside of the
test suite. It would be even better if we could replace the test-tool
with it.
On an unrelated note; this patch is broken as-is, since it will only
report that Bloom filters exist if the top-most graph has them. I have a
patch to fix this that I have been meaning to send out for most of this
week. I'll try to get to it shortly.
> I thought we had some way of reporting the number of commits covered by
> filters, but I can't seem to find it.
I don't recall having anything like that.
> Our "test-tool read-graph" can report on whether there's a bloom filter
> chunk, but I think it also doesn't distinguish between different slices
> (and anyway, it wouldn't be suitable for tools that don't rely on an
> actual built git.git directory).
>
> -Peff
Thanks,
Taylor
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] commit-graph: add verify changed paths option
2020-07-31 18:09 ` Taylor Blau
@ 2020-07-31 19:14 ` Jeff King
2020-07-31 19:31 ` Son Luong Ngoc
0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2020-07-31 19:14 UTC (permalink / raw)
To: Taylor Blau; +Cc: Son Luong Ngoc via GitGitGadget, git, Son Luong Ngoc
On Fri, Jul 31, 2020 at 02:09:56PM -0400, Taylor Blau wrote:
> > Is a single boolean flag sufficient? If you have incrementals, you might
> > have some slices with this chunk and some without. What should the
> > boolean be in that case?
>
> I think you'd really want to know which layers do and don't have
> filters. It might be even more interesting to have a tool like what 'git
> show-index' is to '*.idx' files, maybe something like 'git show-graph'
> or 'git show-commit-graph'. Its output would be one line per commit that
> shows:
>
> - what layer in the chain it's located at
> - its graph_pos
> - its generation number
> - whether or not it has a Bloom filter
> - ???
>
> That would be a useful tool for debugging anyway, even outside of the
> test suite. It would be even better if we could replace the test-tool
> with it.
Yeah, that was exactly what I had in mind, except that I'd make it a
sub-command of "git commit-graph" ("show" or perhaps "dump").
-Peff
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] commit-graph: add verify changed paths option
2020-07-31 19:14 ` Jeff King
@ 2020-07-31 19:31 ` Son Luong Ngoc
0 siblings, 0 replies; 8+ messages in thread
From: Son Luong Ngoc @ 2020-07-31 19:31 UTC (permalink / raw)
To: Jeff King; +Cc: Taylor Blau, Son Luong Ngoc via GitGitGadget, git
Note: re-send to mailing list due to me forgot to turn on Plain Text format.
(sorry for the noise)
Hi Peff, Taylor, Junio and Christian,
Thanks a lot for the valuable feedbacks.
This is exactly what I was hoping for by sending out the patch early!
> On Jul 31, 2020, at 21:14, Jeff King <peff@peff.net> wrote:
>
> On Fri, Jul 31, 2020 at 02:09:56PM -0400, Taylor Blau wrote:
>
>>> Is a single boolean flag sufficient? If you have incrementals, you might
>>> have some slices with this chunk and some without. What should the
>>> boolean be in that case?
>>
>> I think you'd really want to know which layers do and don't have
>> filters. It might be even more interesting to have a tool like what 'git
>> show-index' is to '*.idx' files, maybe something like 'git show-graph'
>> or 'git show-commit-graph'. Its output would be one line per commit that
>> shows:
>>
>> - what layer in the chain it's located at
>> - its graph_pos
>> - its generation number
>> - whether or not it has a Bloom filter
>> - ???
>>
>> That would be a useful tool for debugging anyway, even outside of the
>> test suite. It would be even better if we could replace the test-tool
>> with it.
>
> Yeah, that was exactly what I had in mind, except that I'd make it a
> sub-command of "git commit-graph" ("show" or perhaps "dump").
I loved Junio's initial suggestion and the follow up here.
I was thinking of something like 'git commit-graph verify --verbose' but
now I agree that a distinct command such as 'show' might be more
distinct and better communicate the purpose.
I will stick with my poor-man bash/golang script for now to invalidate
the commit-graph (chain or no-chain) as it does the job just fine.
Let me see if I have the capacity to implement 'show' sub-command
after. ^_^!
>
> -Peff
Cheers,
Son Luong.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-07-31 19:31 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-31 7:49 [PATCH] commit-graph: add verify changed paths option Son Luong Ngoc via GitGitGadget
2020-07-31 16:21 ` Christian Couder
2020-07-31 17:14 ` Junio C Hamano
2020-07-31 18:06 ` Taylor Blau
2020-07-31 18:02 ` Jeff King
2020-07-31 18:09 ` Taylor Blau
2020-07-31 19:14 ` Jeff King
2020-07-31 19:31 ` Son Luong Ngoc
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).