git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Derrick Stolee <dstolee@microsoft.com>
Cc: git@vger.kernel.org, "avarab\@gmail.com" <avarab@gmail.com>,
	"martin.agren\@gmail.com" <martin.agren@gmail.com>,
	"peff\@peff.net" <peff@peff.net>
Subject: Re: [PATCH v2 01/12] commit-graph: add 'verify' subcommand
Date: Sun, 20 May 2018 14:10:41 +0200	[thread overview]
Message-ID: <86d0xqeddq.fsf@gmail.com> (raw)
In-Reply-To: <20180511211504.79877-2-dstolee@microsoft.com> (Derrick Stolee's message of "Fri, 11 May 2018 21:15:15 +0000")

Derrick Stolee <dstolee@microsoft.com> writes:

> If the commit-graph file becomes corrupt, we need a way to verify
> that its contents match the object database. In the manner of
> 'git fsck' we will implement a 'git commit-graph verify' subcommand
> to report all issues with the file.
>
> Add the 'verify' subcommand to the 'commit-graph' builtin and its
> documentation. The subcommand is currently a no-op except for
> loading the commit-graph into memory, which may trigger run-time
> errors that would be caught by normal use. Add a simple test that
> ensures the command returns a zero error code.
>
> If no commit-graph file exists, this is an acceptable state. Do
> not report any errors.

All right.  Nice introductory patch.

>
> During review, we noticed that a FREE_AND_NULL(graph_name) was
> placed after a possible 'return', and this pattern was also in
> graph_read(). Fix that case, too.

This should probably be a separate [micro-]patch.  Especially as Martin
Ågren noticed it is not correct...

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-commit-graph.txt |  6 ++++++
>  builtin/commit-graph.c             | 40 +++++++++++++++++++++++++++++++++++++-
>  commit-graph.c                     |  5 +++++
>  commit-graph.h                     |  2 ++
>  t/t5318-commit-graph.sh            | 10 ++++++++++
>  5 files changed, 62 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> index 4c97b555cc..a222cfab08 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -10,6 +10,7 @@ SYNOPSIS
>  --------
>  [verse]
>  'git commit-graph read' [--object-dir <dir>]
> +'git commit-graph verify' [--object-dir <dir>]
>  'git commit-graph write' <options> [--object-dir <dir>]
>  
>  
> @@ -52,6 +53,11 @@ existing commit-graph file.
>  Read a graph file given by the commit-graph file and output basic
>  details about the graph file. Used for debugging purposes.
>  
> +'verify'::
> +
> +Read the commit-graph file and verify its contents against the object
> +database. Used to check for corrupted data.

I wonder if it would be useful to have an option to verify commit-graph
file without accessing the object database, checking just that it is
well formed.

Anyway, it could be added later, if needed.

> +
>  
>  EXAMPLES
>  --------
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 37420ae0fd..af3101291f 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -8,10 +8,16 @@
>  static char const * const builtin_commit_graph_usage[] = {
>  	N_("git commit-graph [--object-dir <objdir>]"),
>  	N_("git commit-graph read [--object-dir <objdir>]"),
> +	N_("git commit-graph verify [--object-dir <objdir>]"),
>  	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
>  	NULL
>  };
>  
> +static const char * const builtin_commit_graph_verify_usage[] = {
> +	N_("git commit-graph verify [--object-dir <objdir>]"),
> +	NULL
> +};
> +
>  static const char * const builtin_commit_graph_read_usage[] = {
>  	N_("git commit-graph read [--object-dir <objdir>]"),
>  	NULL
> @@ -29,6 +35,36 @@ static struct opts_commit_graph {
>  	int append;
>  } opts;
>  
> +
> +static int graph_verify(int argc, const char **argv)

A reminder for myself: exit code 0 means no errors.

> +{
> +	struct commit_graph *graph = 0;
> +	char *graph_name;
> +
> +	static struct option builtin_commit_graph_verify_options[] = {
> +		OPT_STRING(0, "object-dir", &opts.obj_dir,
> +			   N_("dir"),
> +			   N_("The object directory to store the graph")),
> +		OPT_END(),
> +	};
> +
> +	argc = parse_options(argc, argv, NULL,
> +			     builtin_commit_graph_verify_options,
> +			     builtin_commit_graph_verify_usage, 0);
> +
> +	if (!opts.obj_dir)
> +		opts.obj_dir = get_object_directory();

All right, simple handling of a subcommand and its options.


I still think that '--object-dir=<path>' should be a git wrapper option,
like '--git-dir=<path>' and '--work-tree=<path>' (and
'--namespace=<path>') are.  It would be command-line option equivalent
to the GIT_OBJECT_DIRECTORY environment variable, just like
--git-dir=<path> is for GIT_DIR, and --work-tree=<path> is for
GIT_WORK_TREE, etc.

This way the code would be implemented once for all commands, and there
would be no duplicated code for each git-commit-graph subcommand.

But that may be a matter of a separate patch.

> +
> +	graph_name = get_commit_graph_filename(opts.obj_dir);

This returns full path of commit-graph file, allocating it.

> +	graph = load_commit_graph_one(graph_name);

This reads the file (no checking if core.commitGraph is set, no handling
of alternatives), and verifies that:
 - file is not too small, i.e. smaller than GRAPH_MIN_SIZE
 - it has correct signature
 - it has correct graph version
 - it has correct hash version
 - chunk [offsets] fit within file
 - that OID Fanout, OID Lookup, Commit Data and Large Edges chunks are
   not repeated, though not that all required chunks are present

> +	FREE_AND_NULL(graph_name);

All right, graph_name is not used further, so we free it and set it to
NULL.

Note however that if load_commit_graph_one() finds errors, it would exit
with error code 1... but then exiting the process would free its
resources anyway.

> +
> +	if (!graph)
> +		return 0;

The only case where load_commit_graph_one() returns NULL instead of
exiting is if the file cannot be opened for reading (e.g. it does not
exists, or cannot be opened for reading), or its status cannot be read.

Side question: this is nice defensive programming, but could it happen
that file can be opened for reading, but its status cannot be read?

> +
> +	return verify_commit_graph(graph);

Here we leak graph->fd (it is neither unmapped, nor closed) but that may
not matter as we are exiting anyway.  Well, at least until it gets
libified, then maybe.

> +}
> +
>  static int graph_read(int argc, const char **argv)
>  {
>  	struct commit_graph *graph = NULL;
> @@ -50,10 +86,10 @@ static int graph_read(int argc, const char **argv)
>  
>  	graph_name = get_commit_graph_filename(opts.obj_dir);
>  	graph = load_commit_graph_one(graph_name);
> +	FREE_AND_NULL(graph_name);
>  
>  	if (!graph)
>  		die("graph file %s does not exist", graph_name);
> -	FREE_AND_NULL(graph_name);

Here we use graph_name, which has been just freed and set to NULL.

This should probably be (but I may be wrong):


	if (!graph) {
		UNLEAK(graph_name);
		die("graph file %s does not exist", graph_name);
	}
	FREE_AND_NULL(graph_name);


>  
>  	printf("header: %08x %d %d %d %d\n",
>  		ntohl(*(uint32_t*)graph->data),
> @@ -160,6 +196,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
>  			     PARSE_OPT_STOP_AT_NON_OPTION);
>  
>  	if (argc > 0) {
> +		if (!strcmp(argv[0], "verify"))
> +			return graph_verify(argc, argv);
>  		if (!strcmp(argv[0], "read"))
>  			return graph_read(argc, argv);
>  		if (!strcmp(argv[0], "write"))

All right, straightforward adding of a new subcommand.

> diff --git a/commit-graph.c b/commit-graph.c
> index a8c337dd77..b25aaed128 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -817,3 +817,8 @@ void write_commit_graph(const char *obj_dir,
>  	oids.alloc = 0;
>  	oids.nr = 0;
>  }
> +
> +int verify_commit_graph(struct commit_graph *g)
> +{
> +	return !g;
> +}

All right, nice placeholder.

> diff --git a/commit-graph.h b/commit-graph.h
> index 96cccb10f3..71a39c5a57 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -53,4 +53,6 @@ void write_commit_graph(const char *obj_dir,
>  			int nr_commits,
>  			int append);
>  
> +int verify_commit_graph(struct commit_graph *g);
> +
>  #endif
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 77d85aefe7..6ca451dfd2 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -11,6 +11,11 @@ test_expect_success 'setup full repo' '
>  	objdir=".git/objects"
>  '
>  
> +test_expect_success 'verify graph with no graph file' '
> +	cd "$TRASH_DIRECTORY/full" &&
> +	git commit-graph verify
> +'

Nice to have this test here, but as it is now it is not an independent
test.  It depends on the fact that other earlier tests did not generate
graph file.

Perhaps it would be better to have a separate directory with repository
without commit graph file, and separate directory with repository with
commit graph file.  Or rename commit graph file if it exists, renaming
it back after tests finishes.  Or something like that.

> +
>  test_expect_success 'write graph with no packs' '
>  	cd "$TRASH_DIRECTORY/full" &&
>  	git commit-graph write --object-dir . &&
> @@ -230,4 +235,9 @@ test_expect_success 'perform fast-forward merge in full repo' '
>  	test_cmp expect output
>  '
>  
> +test_expect_success 'git commit-graph verify' '
> +	cd "$TRASH_DIRECTORY/full" &&
> +	git commit-graph verify >output
> +'

This is amost the same tests as the one added earlier, the only
difference is its place in t/t5318-commit-graph.sh.  This test is not
independent.

Though I'm not sure if that would matter much.

> +
>  test_done

  parent reply	other threads:[~2018-05-20 12:10 UTC|newest]

Thread overview: 149+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-17 18:10 [RFC PATCH 00/12] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
2018-04-17 18:10 ` [RFC PATCH 01/12] fixup! commit-graph: always load commit-graph information Derrick Stolee
2018-04-17 18:10 ` [RFC PATCH 03/12] commit-graph: check file header information Derrick Stolee
2018-04-19 15:58   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 02/12] commit-graph: add 'check' subcommand Derrick Stolee
2018-04-19 13:24   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 04/12] commit-graph: parse commit from chosen graph Derrick Stolee
2018-04-19 17:21   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 05/12] commit-graph: check fanout and lookup table Derrick Stolee
2018-04-20  7:27   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 06/12] commit: force commit to parse from object database Derrick Stolee
2018-04-20 12:13   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 07/12] commit-graph: load a root tree from specific graph Derrick Stolee
2018-04-20 12:18   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 08/12] commit-graph: verify commit contents against odb Derrick Stolee
2018-04-20 16:47   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 10/12] commit-graph: add '--reachable' option Derrick Stolee
2018-04-20 17:17   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 09/12] fsck: check commit-graph Derrick Stolee
2018-04-20 16:59   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 12/12] commit-graph: update design document Derrick Stolee
2018-04-20 19:10   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 11/12] gc: automatically write commit-graph files Derrick Stolee
2018-04-20 17:34   ` Jakub Narebski
2018-04-20 18:33     ` Ævar Arnfjörð Bjarmason
2018-04-17 18:50 ` [RFC PATCH 00/12] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
2018-05-10 17:34 ` [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch Derrick Stolee
2018-05-10 17:34   ` [PATCH 01/12] commit-graph: add 'verify' subcommand Derrick Stolee
2018-05-10 18:15     ` Martin Ågren
2018-05-10 17:34   ` [PATCH 02/12] commit-graph: verify file header information Derrick Stolee
2018-05-10 18:21     ` Martin Ågren
2018-05-10 17:34   ` [PATCH 03/12] commit-graph: parse commit from chosen graph Derrick Stolee
2018-05-10 17:34   ` [PATCH 04/12] commit-graph: verify fanout and lookup table Derrick Stolee
2018-05-10 18:29     ` Martin Ågren
2018-05-11 15:17       ` Derrick Stolee
2018-05-10 17:34   ` [PATCH 05/12] commit: force commit to parse from object database Derrick Stolee
2018-05-10 17:34   ` [PATCH 06/12] commit-graph: load a root tree from specific graph Derrick Stolee
2018-05-10 17:34   ` [PATCH 07/12] commit-graph: verify commit contents against odb Derrick Stolee
2018-05-10 17:34   ` [PATCH 08/12] fsck: verify commit-graph Derrick Stolee
2018-05-10 17:34   ` [PATCH 09/12] commit-graph: add '--reachable' option Derrick Stolee
2018-05-10 17:34   ` [PATCH 10/12] gc: automatically write commit-graph files Derrick Stolee
2018-05-10 17:34   ` [PATCH 11/12] fetch: compute commit-graph by default Derrick Stolee
2018-05-10 17:34   ` [PATCH 12/12] commit-graph: update design document Derrick Stolee
2018-05-10 19:05   ` [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch Martin Ågren
2018-05-10 19:22     ` Stefan Beller
2018-05-11 17:23       ` Derrick Stolee
2018-05-11 17:30         ` Martin Ågren
2018-05-10 19:17   ` Ævar Arnfjörð Bjarmason
2018-05-11 17:23     ` Derrick Stolee
2018-05-11 21:15   ` [PATCH v2 00/12] Integrate commit-graph into fsck and gc Derrick Stolee
2018-05-11 21:15     ` [PATCH v2 01/12] commit-graph: add 'verify' subcommand Derrick Stolee
2018-05-12 13:31       ` Martin Ågren
2018-05-14 13:27         ` Derrick Stolee
2018-05-20 12:10       ` Jakub Narebski [this message]
2018-05-11 21:15     ` [PATCH v2 02/12] commit-graph: verify file header information Derrick Stolee
2018-05-12 13:35       ` Martin Ågren
2018-05-14 13:31         ` Derrick Stolee
2018-05-20 20:00       ` Jakub Narebski
2018-05-11 21:15     ` [PATCH v2 03/12] commit-graph: test that 'verify' finds corruption Derrick Stolee
2018-05-12 13:43       ` Martin Ågren
2018-05-21 18:53       ` Jakub Narebski
2018-05-24 16:28         ` Derrick Stolee
2018-05-11 21:15     ` [PATCH v2 04/12] commit-graph: parse commit from chosen graph Derrick Stolee
2018-05-12 20:50       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 05/12] commit-graph: verify fanout and lookup table Derrick Stolee
2018-05-11 21:15     ` [PATCH v2 06/12] commit: force commit to parse from object database Derrick Stolee
2018-05-12 20:54       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 07/12] commit-graph: load a root tree from specific graph Derrick Stolee
2018-05-12 20:55       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 08/12] commit-graph: verify commit contents against odb Derrick Stolee
2018-05-12 21:17       ` Martin Ågren
2018-05-14 13:44         ` Derrick Stolee
2018-05-15 21:12       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 09/12] fsck: verify commit-graph Derrick Stolee
2018-05-17 18:13       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 11/12] gc: automatically write commit-graph files Derrick Stolee
2018-05-17 18:20       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 10/12] commit-graph: add '--reachable' option Derrick Stolee
2018-05-17 18:16       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 12/12] commit-graph: update design document Derrick Stolee
2018-05-24 16:25     ` [PATCH v3 00/20] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
2018-05-24 16:25       ` [PATCH v3 01/20] commit-graph: UNLEAK before die() Derrick Stolee
2018-05-24 22:47         ` Stefan Beller
2018-05-25  0:08           ` Derrick Stolee
2018-05-24 16:25       ` [PATCH v3 02/20] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
2018-05-26 18:46         ` Jakub Narebski
2018-05-26 20:30           ` brian m. carlson
2018-06-02 19:43             ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 03/20] commit-graph: parse commit from chosen graph Derrick Stolee
2018-05-27 10:23         ` Jakub Narebski
2018-05-29 12:31           ` Derrick Stolee
2018-05-24 16:25       ` [PATCH v3 04/20] commit: force commit to parse from object database Derrick Stolee
2018-05-27 18:04         ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 05/20] commit-graph: load a root tree from specific graph Derrick Stolee
2018-05-27 19:12         ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 06/20] commit-graph: add 'verify' subcommand Derrick Stolee
2018-05-27 22:55         ` Jakub Narebski
2018-05-30 16:07           ` Derrick Stolee
2018-06-02 21:19             ` Jakub Narebski
2018-06-04 11:30               ` Derrick Stolee
2018-05-24 16:25       ` [PATCH v3 07/20] commit-graph: verify catches corrupt signature Derrick Stolee
2018-05-28 14:05         ` Jakub Narebski
2018-05-29 12:43           ` Derrick Stolee
2018-06-02 22:30             ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 08/20] commit-graph: verify required chunks are present Derrick Stolee
2018-05-28 17:11         ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
2018-05-30 13:34         ` Jakub Narebski
2018-05-30 16:18           ` Derrick Stolee
2018-06-02  4:38         ` Duy Nguyen
2018-06-04 11:32           ` Derrick Stolee
2018-06-04 14:42             ` Duy Nguyen
2018-05-24 16:25       ` [PATCH v3 10/20] commit-graph: verify objects exist Derrick Stolee
2018-05-30 19:22         ` Jakub Narebski
2018-05-31 12:53           ` Derrick Stolee
2018-05-24 16:25       ` [PATCH v3 11/20] commit-graph: verify root tree OIDs Derrick Stolee
2018-05-30 22:24         ` Jakub Narebski
2018-05-31 13:16           ` Derrick Stolee
2018-06-02 22:50             ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 12/20] commit-graph: verify parent list Derrick Stolee
2018-06-01 23:21         ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 13/20] commit-graph: verify generation number Derrick Stolee
2018-06-02 12:23         ` Jakub Narebski
2018-06-04 11:47           ` Derrick Stolee
2018-05-24 16:25       ` [PATCH v3 14/20] commit-graph: verify commit date Derrick Stolee
2018-06-02 12:29         ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 15/20] commit-graph: test for corrupted octopus edge Derrick Stolee
2018-06-02 12:39         ` Jakub Narebski
2018-06-04 13:08           ` Derrick Stolee
2018-05-24 16:26       ` [PATCH v3 16/20] commit-graph: verify contents match checksum Derrick Stolee
2018-05-30 12:35         ` SZEDER Gábor
2018-06-02 15:52         ` Jakub Narebski
2018-06-04 11:55           ` Derrick Stolee
2018-05-24 16:26       ` [PATCH v3 17/20] fsck: verify commit-graph Derrick Stolee
2018-06-02 16:17         ` Jakub Narebski
2018-06-04 11:59           ` Derrick Stolee
2018-05-24 16:26       ` [PATCH v3 18/20] commit-graph: add '--reachable' option Derrick Stolee
2018-06-02 17:34         ` Jakub Narebski
2018-06-04 12:44           ` Derrick Stolee
2018-05-24 16:26       ` [PATCH v3 19/20] gc: automatically write commit-graph files Derrick Stolee
2018-06-02 18:03         ` Jakub Narebski
2018-06-04 12:51           ` Derrick Stolee
2018-05-24 16:26       ` [PATCH v3 20/20] commit-graph: update design document Derrick Stolee
2018-06-02 18:27         ` Jakub Narebski
2018-05-24 21:15       ` [PATCH v3 00/20] Integrate commit-graph into 'fsck' and 'gc' Ævar Arnfjörð Bjarmason
2018-05-25  4:11       ` Junio C Hamano
2018-05-29  4:27       ` Junio C Hamano
2018-05-29 12:37         ` Derrick Stolee
2018-05-29 13:41           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86d0xqeddq.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=martin.agren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).