git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Arthur Chan <arthur.chan@adalogics.com>
Subject: Re: [PATCH] fuzz: add basic fuzz testing for git command
Date: Tue, 13 Sep 2022 09:13:32 -0700	[thread overview]
Message-ID: <xmqqv8pr9rrn.fsf@gitster.g> (raw)
In-Reply-To: <pull.1351.git.1663078962231.gitgitgadget@gmail.com> (Arthur Chan via GitGitGadget's message of "Tue, 13 Sep 2022 14:22:42 +0000")

"Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  .gitignore        |   2 +
>  Makefile          |   2 +
>  fuzz-cmd-base.c   | 117 ++++++++++++++++++++++++++++++++++++++++++++++
>  fuzz-cmd-base.h   |  13 ++++++
>  fuzz-cmd-status.c |  68 +++++++++++++++++++++++++++
>  5 files changed, 202 insertions(+)
>  create mode 100644 fuzz-cmd-base.c
>  create mode 100644 fuzz-cmd-base.h
>  create mode 100644 fuzz-cmd-status.c

Just like we have t/ hierarchy for testing, if we plan to add more
fuzz-* related things on top of what we already have (like those
that can be seen in the context of this patch), I would prefer to
see a creation of fuzz/ hierarchy and move existing stuff there as
the first step before adding more.

And more fuzzing is good, if we can afford it ;-)

Thanks.

Even though I am not taking this patch as-is, let's give a cursory
look to make sure the future iteration can be more reviewable by
pointing out various CodingGuidelines issues.

> diff --git a/fuzz-cmd-base.c b/fuzz-cmd-base.c
> new file mode 100644
> index 00000000000..98f05c78372
> --- /dev/null
> +++ b/fuzz-cmd-base.c
> @@ -0,0 +1,117 @@
> +#include "cache.h"

Good to have this as the first thing.

> +#include "fuzz-cmd-base.h"
> +
> +
> +/*
> + * This function is used to randomize the content of a file with the
> + * random data. The random data normally come from the fuzzing engine
> + * LibFuzzer in order to create randomization of the git file worktree
> + * and possibly messing up of certain git config file to fuzz different
> + * git command execution logic.
> + */
> +void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size) {

Unlike other control structure with multiple statements in a block,
the surrounding braces {} around function block sit on their own
lines.  I.e.

    void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size)
    {


> +   char fname[256];

In our codebase, tab-width is 8 and we indent with tabs.

Use <strbuf.h> and avoid snprintf(), e.g.

	struct strbuf fname = STRBUF_INIT;
	strbuf_addf(&fname, "%s/%s", dir, name);
	... use fname.buf ...
	strbuf_release(&fname);

> +   FILE *fp;
> +

Good that you leave a blank between the end of decl and the
beginning of the statements.

> +   snprintf(fname, 255, "%s/%s", dir, name);
> +
> +   fp = fopen(fname, "wb");
> +   if (fp) {
> +      fwrite(data_chunk, 1, data_size, fp);
> +      fclose(fp);
> +   }
> +}

Why doesn't this care about errors at all?  Not even fopen errors?

> +/*
> + * This function is the variants of the above functions which takes
> + * in a set of target files to be processed. These target file are

"... is a variant of the above function, which takes a set of ..."

> + * passing to the above function one by one for content rewrite.
> + */
> +void randomize_git_files(char *dir, char *name_set[], int files_count, char *data, int size) {
> +   int data_size = size / files_count;
> +
> +   for(int i=0; i<files_count; i++) {

We do not yet officially allow variable decl for for() statement
like this.  We'll start allowing it later this year but we are
waiting for oddball platform/compiler folks to scream right now.

IOW, we write the above more like so:

	int data_size = size / files_count;
	int i;

        for (i = 0; i < files_count; i++) {

Take also notice how we use whitespaces around non-unary operators.

> +      char *data_chunk = malloc(data_size);
> +      memcpy(data_chunk, data + (i * data_size), data_size);
> +      randomize_git_file(dir, name_set[i], data_chunk, data_size);
> +
> +      free(data_chunk);
> +   }

As data_size does not change in this loop and the contents of
data_chunk from each round is discardable, allocating once outside
may make more sense.  Actually, as the called function makes only
read-only accesses of data_chunk, I do not quite see why you need to
make a copy in the first place.

We do not use malloc() etc. directly out of the system; study wrapper.c
and find xmalloc() and friends.

What if size is not a multiple of files_count, by the way?

I'll stop here as we already have plenty above (read: it is not "I
didn't spot any problems in the patch after this point").

Thanks.

  parent reply	other threads:[~2022-09-13 17:27 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-13 14:22 [PATCH] fuzz: add basic fuzz testing for git command Arthur Chan via GitGitGadget
2022-09-13 15:57 ` Ævar Arnfjörð Bjarmason
2022-09-16 15:54   ` Arthur Chan
2022-09-13 16:13 ` Junio C Hamano [this message]
2022-09-16 16:06   ` Arthur Chan
2022-09-16 17:29 ` [PATCH v2] " Arthur Chan via GitGitGadget
2022-09-16 17:37   ` Junio C Hamano
2022-09-16 18:07     ` Arthur Chan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqv8pr9rrn.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=arthur.chan@adalogics.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).