[PATCH] fuzz: add basic fuzz testing for git command

git@vger.kernel.org mailing list mirror (one of many)

* [PATCH] fuzz: add basic fuzz testing for git command
@ 2022-09-13 14:22 Arthur Chan via GitGitGadget
  2022-09-13 15:57 ` Ævar Arnfjörð Bjarmason
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Arthur Chan via GitGitGadget @ 2022-09-13 14:22 UTC (permalink / raw)
  To: git; +Cc: Arthur Chan, Arthur Chan

From: Arthur Chan <arthur.chan@adalogics.com>

fuzz-cmd-base.c and fuzz-cmd-base.h provide base functions for
fuzzing git commands. They are compatible with libFuzzer
(and possibly other fuzzing engines).
fuzz-cmd-status.c provides the first git command fuzzing target
as a demonstration of the approach.

CC: Josh Steadmon <steadmon@google.com>
Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>
---
    fuzz: add basic fuzz testing for git command
    
    An initial attempt to create a libFuzzer-compatible fuzzer for git
    commands. fuzz-cmd-base.c and fuzz-cmd-base.h provide base functions
    for fuzzing git commands; they are compatible with libFuzzer (and
    possibly other fuzzing engines). fuzz-cmd-status.c provides the first
    git command fuzzing target as a demonstration of the approach.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1351%2Farthurscchan%2Ffuzz-git-cmd-status-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1351/arthurscchan/fuzz-git-cmd-status-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1351

 .gitignore        |   2 +
 Makefile          |   2 +
 fuzz-cmd-base.c   | 117 ++++++++++++++++++++++++++++++++++++++++++++++
 fuzz-cmd-base.h   |  13 ++++++
 fuzz-cmd-status.c |  68 +++++++++++++++++++++++++++
 5 files changed, 202 insertions(+)
 create mode 100644 fuzz-cmd-base.c
 create mode 100644 fuzz-cmd-base.h
 create mode 100644 fuzz-cmd-status.c

diff --git a/.gitignore b/.gitignore
index 80b530bbed2..5d0ce214164 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,8 @@
 /fuzz_corpora
 /fuzz-pack-headers
 /fuzz-pack-idx
+/fuzz-cmd-base
+/fuzz-cmd-status
 /GIT-BUILD-OPTIONS
 /GIT-CFLAGS
 /GIT-LDFLAGS
diff --git a/Makefile b/Makefile
index c6e126e54c2..20742935073 100644
--- a/Makefile
+++ b/Makefile
@@ -689,6 +689,7 @@ ETAGS_TARGET = TAGS
 FUZZ_OBJS += fuzz-commit-graph.o
 FUZZ_OBJS += fuzz-pack-headers.o
 FUZZ_OBJS += fuzz-pack-idx.o
+FUZZ_OBJS += fuzz-cmd-status.o
 .PHONY: fuzz-objs
 fuzz-objs: $(FUZZ_OBJS)
 
@@ -961,6 +962,7 @@ LIB_OBJS += fsck.o
 LIB_OBJS += fsmonitor.o
 LIB_OBJS += fsmonitor-ipc.o
 LIB_OBJS += fsmonitor-settings.o
+LIB_OBJS += fuzz-cmd-base.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
diff --git a/fuzz-cmd-base.c b/fuzz-cmd-base.c
new file mode 100644
index 00000000000..98f05c78372
--- /dev/null
+++ b/fuzz-cmd-base.c
@@ -0,0 +1,117 @@
+#include "cache.h"
+#include "fuzz-cmd-base.h"
+
+
+/*
+ * This function is used to randomize the content of a file with the
+ * random data. The random data normally come from the fuzzing engine
+ * LibFuzzer in order to create randomization of the git file worktree
+ * and possibly messing up of certain git config file to fuzz different
+ * git command execution logic.
+ */
+void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size) {
+   char fname[256];
+   FILE *fp;
+
+   snprintf(fname, 255, "%s/%s", dir, name);
+
+   fp = fopen(fname, "wb");
+   if (fp) {
+      fwrite(data_chunk, 1, data_size, fp);
+      fclose(fp);
+   }
+}
+
+/*
+ * This function is the variants of the above functions which takes
+ * in a set of target files to be processed. These target file are
+ * passing to the above function one by one for content rewrite.
+ */
+void randomize_git_files(char *dir, char *name_set[], int files_count, char *data, int size) {
+   int data_size = size / files_count;
+
+   for(int i=0; i<files_count; i++) {
+      char *data_chunk = malloc(data_size);
+      memcpy(data_chunk, data + (i * data_size), data_size);
+
+      randomize_git_file(dir, name_set[i], data_chunk, data_size);
+
+      free(data_chunk);
+   }
+}
+
+/*
+ * Instead of randomizing the content of existing files. This helper
+ * function helps generate a temp file with random file name before
+ * passing to the above functions to get randomized content for later
+ * fuzzing of git command
+ */
+void generate_random_file(char *data, int size) {
+   unsigned char *hash = malloc(size);
+   char *fname = malloc((size*2)+12);
+   char *data_chunk = malloc(size);
+
+   memcpy(hash, data, size);
+   memcpy(data_chunk, data + size, size);
+
+   snprintf(fname, size*2+11, "TEMP-%s-TEMP", hash_to_hex(hash));
+   randomize_git_file(".", fname, data_chunk, size);
+
+   free(hash);
+   free(fname);
+   free(data_chunk);
+}
+
+/*
+ * This function helps to generate random commit and build up a
+ * worktree with randomization to provide a target for the fuzzing
+ * of git commands.
+ */
+void generate_commit(char *data, int size) {
+   int ret = 0;
+   char *data_chunk = malloc(size * 2);
+   memcpy(data_chunk, data, size * 2);
+
+   generate_random_file(data_chunk, size);
+   ret += system("git add TEMP-*-TEMP");
+   ret += system("git commit -m\"New Commit\"");
+
+   free(data_chunk);
+}
+
+/*
+ * In some cases, there maybe some fuzzing logic that will mess
+ * up with the git repository and its configuration and settings.
+ * This function aims to reset the git repository into the default
+ * base settings before each round of fuzzing.
+ */
+int reset_git_folder(void) {
+   int ret = 0;
+
+   ret += system("rm -rf ./.git");
+   ret += system("rm -f ./TEMP-*-TEMP");
+   ret += system("git init");
+   ret += system("git config --global user.name \"FUZZ\"");
+   ret += system("git config --global user.email \"FUZZ@LOCALHOST\"");
+   ret += system("git config --global --add safe.directory '*'");
+   ret += system("git add ./TEMP_1 ./TEMP_2");
+   ret += system("git commit -m\"First Commit\"");
+
+   return ret;
+}
+
+/*
+ * This helper function returns the maximum number of commit can
+ * be generated by the provided random data without reusing the
+ * data to increase randomization of the fuzzing target and allow
+ * more path of fuzzing to be covered.
+ */
+int get_max_commit_count(int data_size, int git_files_count, int hash_size) {
+   int count = (data_size - 4 - git_files_count * 2) / (hash_size * 2);
+
+   if(count > 20) {
+      count = 20;
+   }
+
+   return count;
+}
diff --git a/fuzz-cmd-base.h b/fuzz-cmd-base.h
new file mode 100644
index 00000000000..d63e46eac75
--- /dev/null
+++ b/fuzz-cmd-base.h
@@ -0,0 +1,13 @@
+#ifndef FUZZ_CMD_BASE_H
+#define FUZZ_CMD_BASE_H
+
+#define HASH_SIZE 20
+
+void randomize_git_files(char *dir, char *name_set[], int files_count, char *data, int size);
+void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size);
+void generate_random_file(char *data, int size);
+void generate_commit(char *data, int size);
+int reset_git_folder(void);
+int get_max_commit_count(int data_size, int git_files_count, int hash_size);
+
+#endif
diff --git a/fuzz-cmd-status.c b/fuzz-cmd-status.c
new file mode 100644
index 00000000000..b02410a1259
--- /dev/null
+++ b/fuzz-cmd-status.c
@@ -0,0 +1,68 @@
+#include "builtin.h"
+#include "repository.h"
+#include "fuzz-cmd-base.h"
+
+int cmd_status(int argc, const char **argv, const char *prefix);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
+   int no_of_commit;
+   int max_commit_count;
+   char *argv[2];
+   char *data_chunk;
+   char *basedir = "./.git";
+
+   /*
+    *  Initialize the repository
+    */
+   initialize_the_repository();
+
+   max_commit_count = get_max_commit_count(size, 0, HASH_SIZE);
+
+   /*
+    * End this round of fuzzing if the data is not large enough
+    */
+   if (size <= (HASH_SIZE * 2 + 4)) {
+      repo_clear(the_repository);
+      return 0;
+   }
+
+   if (reset_git_folder()) {
+      repo_clear(the_repository);
+      return 0;
+   }
+
+   /*
+    * Generate random commit
+    */
+   no_of_commit = (*((int *)data)) % max_commit_count + 1;
+   data += 4;
+   size -= 4;
+
+   for (int i=0; i<no_of_commit; i++) {
+      data_chunk = malloc(HASH_SIZE * 2);
+      memcpy(data_chunk, data, HASH_SIZE * 2);
+      generate_commit(data_chunk, HASH_SIZE);
+      data += (HASH_SIZE * 2);
+      size -= (HASH_SIZE * 2);
+      free(data_chunk);
+   }
+
+   /*
+    * Final preparing of the repository settings
+    */
+   repo_clear(the_repository);
+   repo_init(the_repository, basedir, ".");
+
+   /*
+    * Calling target git command
+    */
+   argv[0] = "status";
+   argv[1] = "-v";
+   cmd_status(2, (const char **)argv, (const char *)"");
+
+   repo_clear(the_repository);
+
+   return 0;
+}

base-commit: dd3f6c4cae7e3b15ce984dce8593ff7569650e24
-- 
gitgitgadget
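
One detail of the harness in fuzz-cmd-status.c is worth illustrating. It reads the commit count with `*((int *)data)`, which is a signed and possibly unaligned read, so `% max_commit_count + 1` can come out as zero or negative. The following is a minimal sketch of one alternative, not code from the patch: the helper name `pick_commit_count` is hypothetical, and it assumes the byte order of the input does not matter to the fuzzer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: derive a commit count in
 * the range [1, max_count] from the first four bytes of the fuzz input.
 * Returns 0 when the input is too short or max_count is not positive.
 */
static int pick_commit_count(const uint8_t *data, size_t size, int max_count)
{
	uint32_t raw;

	if (size < sizeof(raw) || max_count <= 0)
		return 0;

	/* memcpy avoids the unaligned, signed read of *(int *)data */
	memcpy(&raw, data, sizeof(raw));

	return (int)(raw % (uint32_t)max_count) + 1;
}
```

A caller would treat a return value of 0 as "skip this round", in the same way the harness already returns early for inputs that are too small.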


* Re: [PATCH] fuzz: add basic fuzz testing for git command
  2022-09-13 14:22 [PATCH] fuzz: add basic fuzz testing for git command Arthur Chan via GitGitGadget
@ 2022-09-13 15:57 ` Ævar Arnfjörð Bjarmason
  2022-09-16 15:54   ` Arthur Chan
  2022-09-13 16:13 ` Junio C Hamano
  2022-09-16 17:29 ` [PATCH v2] " Arthur Chan via GitGitGadget
  2 siblings, 1 reply; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-13 15:57 UTC (permalink / raw)
  To: Arthur Chan via GitGitGadget; +Cc: git, Arthur Chan


On Tue, Sep 13 2022, Arthur Chan via GitGitGadget wrote:

> From: Arthur Chan <arthur.chan@adalogics.com>
> [...]

Just a quick comment. The coding style of this project is to:

> +void randomize_git_files(char *dir, char *name_set[], int files_count, char *data, int size) {

...try to wrap at 79 columns.

> +   int data_size = size / files_count;

...and to use \t for indentation, not spaces.

> +   for(int i=0; i<files_count; i++) {

...and e.g. to use "for (", not "for(", spaces around "<" etc. We also
tend to pre-declare "int" instead of putting it in "for" etc.

> +void generate_random_file(char *data, int size) {

Can we really not use the APIs we have already for this (maybe not due
to the fuzz testing aspect of this...)

> +   ret += system("git add TEMP-*-TEMP");
> +   ret += system("git commit -m\"New Commit\"");

(I have not looked deeply). We usually write *.sh tests in t/*.sh, can
this really not be driven by that sort of infrastructure?


* Re: [PATCH] fuzz: add basic fuzz testing for git command
  2022-09-13 14:22 [PATCH] fuzz: add basic fuzz testing for git command Arthur Chan via GitGitGadget
  2022-09-13 15:57 ` Ævar Arnfjörð Bjarmason
@ 2022-09-13 16:13 ` Junio C Hamano
  2022-09-16 16:06   ` Arthur Chan
  2022-09-16 17:29 ` [PATCH v2] " Arthur Chan via GitGitGadget
  2 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2022-09-13 16:13 UTC (permalink / raw)
  To: Arthur Chan via GitGitGadget; +Cc: git, Arthur Chan

"Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  .gitignore        |   2 +
>  Makefile          |   2 +
>  fuzz-cmd-base.c   | 117 ++++++++++++++++++++++++++++++++++++++++++++++
>  fuzz-cmd-base.h   |  13 ++++++
>  fuzz-cmd-status.c |  68 +++++++++++++++++++++++++++
>  5 files changed, 202 insertions(+)
>  create mode 100644 fuzz-cmd-base.c
>  create mode 100644 fuzz-cmd-base.h
>  create mode 100644 fuzz-cmd-status.c

Just like we have t/ hierarchy for testing, if we plan to add more
fuzz-* related things on top of what we already have (like those
that can be seen in the context of this patch), I would prefer to
see a creation of fuzz/ hierarchy and move existing stuff there as
the first step before adding more.

And more fuzzing is good, if we can afford it ;-)

Thanks.

Even though I am not taking this patch as-is, let's give a cursory
look to make sure the future iteration can be more reviewable by
pointing out various CodingGuidelines issues.

> diff --git a/fuzz-cmd-base.c b/fuzz-cmd-base.c
> new file mode 100644
> index 00000000000..98f05c78372
> --- /dev/null
> +++ b/fuzz-cmd-base.c
> @@ -0,0 +1,117 @@
> +#include "cache.h"

Good to have this as the first thing.

> +#include "fuzz-cmd-base.h"
> +
> +
> +/*
> + * This function is used to randomize the content of a file with the
> + * random data. The random data normally come from the fuzzing engine
> + * LibFuzzer in order to create randomization of the git file worktree
> + * and possibly messing up of certain git config file to fuzz different
> + * git command execution logic.
> + */
> +void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size) {

Unlike other control structure with multiple statements in a block,
the surrounding braces {} around function block sit on their own
lines.  I.e.

    void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size)
    {

> +   char fname[256];

In our codebase, tab-width is 8 and we indent with tabs.

Use <strbuf.h> and avoid snprintf(), e.g.

	struct strbuf fname = STRBUF_INIT;
	strbuf_addf(&fname, "%s/%s", dir, name);
	... use fname.buf ...
	strbuf_release(&fname);

> +   FILE *fp;
> +

Good that you leave a blank between the end of decl and the
beginning of the statements.

> +   snprintf(fname, 255, "%s/%s", dir, name);
> +
> +   fp = fopen(fname, "wb");
> +   if (fp) {
> +      fwrite(data_chunk, 1, data_size, fp);
> +      fclose(fp);
> +   }
> +}

Why doesn't this care about errors at all?  Not even fopen errors?
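
To illustrate what caring about these errors could look like, here is a minimal standalone sketch in plain C. It is an assumption-laden example rather than git code: the name `write_file_checked` is hypothetical, and inside git one would use strbuf and the project's own wrappers instead of a fixed buffer and raw stdio.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical standalone variant of randomize_git_file() that reports
 * failures. Plain C only; this does not use git's strbuf or wrappers.
 * Returns 0 on success, -1 on any error.
 */
static int write_file_checked(const char *dir, const char *name,
			      const void *data, size_t size)
{
	char path[4096];
	FILE *fp;
	int n, ret = 0;

	/* snprintf returns the untruncated length, so overflow is detectable */
	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fp = fopen(path, "wb");
	if (!fp)
		return -1;

	if (fwrite(data, 1, size, fp) != size)
		ret = -1;
	/* fclose flushes buffered data, so a write error may only show up here */
	if (fclose(fp) != 0)
		ret = -1;

	return ret;
}
```

With a return value like this, the caller can decide whether a failed write should end the current round of fuzzing.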

> +/*
> + * This function is the variants of the above functions which takes
> + * in a set of target files to be processed. These target file are

"... is a variant of the above function, which takes a set of ..."

> + * passing to the above function one by one for content rewrite.
> + */
> +void randomize_git_files(char *dir, char *name_set[], int files_count, char *data, int size) {
> +   int data_size = size / files_count;
> +
> +   for(int i=0; i<files_count; i++) {

We do not yet officially allow variable decl for for() statement
like this.  We'll start allowing it later this year but we are
waiting for oddball platform/compiler folks to scream right now.

IOW, we write the above more like so:

	int data_size = size / files_count;
	int i;

	for (i = 0; i < files_count; i++) {

Take also notice how we use whitespaces around non-unary operators.

> +      char *data_chunk = malloc(data_size);
> +      memcpy(data_chunk, data + (i * data_size), data_size);
> +      randomize_git_file(dir, name_set[i], data_chunk, data_size);
> +
> +      free(data_chunk);
> +   }

As data_size does not change in this loop and the contents of
data_chunk from each round is discardable, allocating once outside
may make more sense.  Actually, as the called function makes only
read-only accesses of data_chunk, I do not quite see why you need to
make a copy in the first place.

We do not use malloc() etc. directly out of the system; study wrapper.c
and find xmalloc() and friends.

What if size is not a multiple of files_count, by the way?
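
To make the question concrete: with size = 10 and files_count = 3, integer division gives each file 3 bytes and the last byte is never used. A files_count of 0 would divide by zero. The sketch below spells that arithmetic out; `split_evenly` is a hypothetical helper name, not something in the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper: describe how `size` bytes are shared among
 * `files_count` files when each file gets size / files_count bytes.
 * Stores the per-file chunk size in *chunk and the number of trailing
 * bytes that no file receives in *leftover. Returns -1 if files_count
 * is not positive (which would otherwise divide by zero).
 */
static int split_evenly(size_t size, int files_count,
			size_t *chunk, size_t *leftover)
{
	if (files_count <= 0)
		return -1;

	*chunk = size / (size_t)files_count;
	*leftover = size % (size_t)files_count;
	return 0;
}

/* File i can then read from data + i * chunk directly; no copy is needed. */
```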

I'll stop here as we already have plenty above (read: it is not "I
didn't spot any problems in the patch after this point").

Thanks.


* Re: [PATCH] fuzz: add basic fuzz testing for git command
  2022-09-13 15:57 ` Ævar Arnfjörð Bjarmason
@ 2022-09-16 15:54   ` Arthur Chan
  0 siblings, 0 replies; 8+ messages in thread
From: Arthur Chan @ 2022-09-16 15:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Arthur Chan via GitGitGadget
  Cc: git, Arthur Chan, david

Thanks for the styling suggestions. I have reviewed my patch and changed
most of the styling accordingly. I am just a little confused by two of
the suggestions.

1) I am sorry, but I could not find a similar API for generating random
files for fuzzing, which requires not only random file names but also
random content that is purposely generated by the fuzzing engine.
Could you kindly suggest which existing APIs I could use for this
purpose? Thanks.

2) The system commands are in the code because the fuzzer needs to
reset the git repository on each round of fuzzing. The fuzzer is
integrated into the LLVM oss-fuzz library, so the reset logic has to
be included within the code itself.
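
As a side note on the reset commands from the v1 patch: on POSIX systems system() returns a wait status rather than a plain exit code, so summing the results with `ret += system(...)` does not produce a meaningful value. The following is a minimal sketch of a stricter pattern, assuming a POSIX platform; `run_all` is a hypothetical helper name and is not part of the patch.

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run shell commands in order and stop at the first
 * one that fails. Returns 0 if all succeed, -1 otherwise. Assumes a POSIX
 * system, where system() returns a wait status rather than an exit code.
 */
static int run_all(const char *const *cmds, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		int status = system(cmds[i]);

		if (status == -1 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			return -1;
	}
	return 0;
}
```

Stopping at the first failure also avoids running `git commit` against a repository that `git init` failed to create.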

Thanks again for your helpful comments and hope to hear back from you.

ADA Logics Ltd is registered in England. No: 11624074.
Registered office: 266 Banbury Road, Post Box 292,
OX2 7DL, Oxford, Oxfordshire , United Kingdom


* Re: [PATCH] fuzz: add basic fuzz testing for git command
  2022-09-13 16:13 ` Junio C Hamano
@ 2022-09-16 16:06   ` Arthur Chan
  0 siblings, 0 replies; 8+ messages in thread
From: Arthur Chan @ 2022-09-16 16:06 UTC (permalink / raw)
  To: Junio C Hamano, Arthur Chan via GitGitGadget; +Cc: git, Arthur Chan, david


On 13/9/2022 5:13 pm, Junio C Hamano wrote:
> "Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>   .gitignore        |   2 +
>>   Makefile          |   2 +
>>   fuzz-cmd-base.c   | 117 ++++++++++++++++++++++++++++++++++++++++++++++
>>   fuzz-cmd-base.h   |  13 ++++++
>>   fuzz-cmd-status.c |  68 +++++++++++++++++++++++++++
>>   5 files changed, 202 insertions(+)
>>   create mode 100644 fuzz-cmd-base.c
>>   create mode 100644 fuzz-cmd-base.h
>>   create mode 100644 fuzz-cmd-status.c
> Just like we have t/ hierarchy for testing, if we plan to add more
> fuzz-* related things on top of what we already have (like those
> that can be seen in the context of this patch), I would prefer to
> see a creation of fuzz/ hierarchy and move existing stuff there as
> the first step before adding more.
>
> And more fuzzing is good, if we can afford it ;-)
Fixed. I moved the fuzzers into a new directory, oss-fuzz.
>
> Thanks.
>
> Even though I am not taking this patch as-is, let's give a cursory
> look to make sure the future iteration can be more reviewable by
> pointing out various CodingGuidelines issues.
>
Thanks for the styling suggestions; I have changed most of them accordingly.
>> diff --git a/fuzz-cmd-base.c b/fuzz-cmd-base.c
>> new file mode 100644
>> index 00000000000..98f05c78372
>> --- /dev/null
>> +++ b/fuzz-cmd-base.c
>> @@ -0,0 +1,117 @@
>> +#include "cache.h"
> Good to have this as the first thing.
>
>> +#include "fuzz-cmd-base.h"
>> +
>> +
>> +/*
>> + * This function is used to randomize the content of a file with the
>> + * random data. The random data normally come from the fuzzing engine
>> + * LibFuzzer in order to create randomization of the git file worktree
>> + * and possibly messing up of certain git config file to fuzz different
>> + * git command execution logic.
>> + */
>> +void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size) {
> Unlike other control structure with multiple statements in a block,
> the surrounding braces {} around function block sit on their own
> lines.  I.e.
>
>      void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size)
>      {
>
>
>> +   char fname[256];
> In our codebase, tab-width is 8 and we indent with tabs.
>
> Use <strbuf.h> and avoid snprintf(), e.g.
>
>       struct strbuf fname = STRBUF_INIT;
>       strbuf_addf(&fname, "%s/%s", dir, name);
>       ... use fname.buf ...
>       strbuf_release(&fname);
I have changed all the snprintf code to use strbuf instead. Thanks for
the suggestion.
>> +   FILE *fp;
>> +
> Good that you leave a blank between the end of decl and the
> beginning of the statements.
>
>> +   snprintf(fname, 255, "%s/%s", dir, name);
>> +
>> +   fp = fopen(fname, "wb");
>> +   if (fp) {
>> +      fwrite(data_chunk, 1, data_size, fp);
>> +      fclose(fp);
>> +   }
>> +}
> Why doesn't this care about errors at all?  Not even fopen errors?
>
I have changed the code a little bit. In general, though, failing to
generate the contents of a file happens many times during the fuzzing
process, because some random fuzzing data results in unexpected
behaviour, and we currently just skip that round of fuzzing.
>> +/*
>> + * This function is the variants of the above functions which takes
>> + * in a set of target files to be processed. These target file are
> "... is a variant of the above function, which takes a set of ..."
>
>> + * passing to the above function one by one for content rewrite.
>> + */
>> +void randomize_git_files(char *dir, char *name_set[], int files_count, char *data, int size) {
>> +   int data_size = size / files_count;
>> +
>> +   for(int i=0; i<files_count; i++) {
> We do not yet officially allow variable decl for for() statement
> like this.  We'll start allowing it later this year but we are
> waiting for oddball platform/compiler folks to scream right now.
>
> IOW, we write the above more like so:
>
>       int data_size = size / files_count;
>       int i;
>
>          for (i = 0; i < files_count; i++) {
>
> Take also notice how we use whitespaces around non-unary operators.
Thanks, changed the code style accordingly.
>> +      char *data_chunk = malloc(data_size);
>> +      memcpy(data_chunk, data + (i * data_size), data_size);
>> +      randomize_git_file(dir, name_set[i], data_chunk, data_size);
>> +
>> +      free(data_chunk);
>> +   }
> As data_size does not change in this loop and the contents of
> data_chunk from each round is discardable, allocating once outside
> may make more sense.  Actually, as the called function makes only
> read-only accesses of data_chunk, I do not quite see why you need to
> make a copy in the first place.
>
> We do not use malloc() etc. directly out of the system; study wrapper.c
> and find xmalloc() and friends.
Changed to use xmallocz_gently() instead of malloc(). Thanks for the suggestion.
>
> What if size is not a multiple of files_count, by the way?
It does not matter; the unused bytes are simply ignored. We just
ensure that the oss-fuzz engine provides enough random bytes to
generate the random file contents.
> I'll stop here as we already have plenty above (read: it is not "I
> didn't spot any problems in the patch after this point").
Thanks, and sorry for the trouble. This is my first time contributing
patches to git, and I do not know most of the conventions and style.
I have changed most of them accordingly with my best effort and will
prepare a v2 soon.
>
> Thanks.


* [PATCH v2] fuzz: add basic fuzz testing for git command
  2022-09-13 14:22 [PATCH] fuzz: add basic fuzz testing for git command Arthur Chan via GitGitGadget
  2022-09-13 15:57 ` Ævar Arnfjörð Bjarmason
  2022-09-13 16:13 ` Junio C Hamano
@ 2022-09-16 17:29 ` Arthur Chan via GitGitGadget
  2022-09-16 17:37   ` Junio C Hamano
  2 siblings, 1 reply; 8+ messages in thread
From: Arthur Chan via GitGitGadget @ 2022-09-16 17:29 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason, Arthur Chan, Arthur Chan

From: Arthur Chan <arthur.chan@adalogics.com>

fuzz-cmd-base.c and fuzz-cmd-base.h provide base functions for
fuzzing git commands. They are compatible with libFuzzer
(and possibly other fuzzing engines).
fuzz-cmd-status.c provides the first git command fuzzing target
as a demonstration of the approach.

CC: Josh Steadmon <steadmon@google.com>
CC: David Korczynski <david@adalogics.com>
Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>
---
    fuzz: add basic fuzz testing for git command
    
    An initial attempt to create a libFuzzer-compatible fuzzer for git
    commands. fuzz-cmd-base.c and fuzz-cmd-base.h provide base functions
    for fuzzing git commands; they are compatible with libFuzzer (and
    possibly other fuzzing engines). fuzz-cmd-status.c provides the first
    git command fuzzing target as a demonstration of the approach.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1351%2Farthurscchan%2Ffuzz-git-cmd-status-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1351/arthurscchan/fuzz-git-cmd-status-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1351

Range-diff vs v1:

 1:  11a09cef59a ! 1:  9100c7c8e51 fuzz: add basic fuzz testing for git command
     @@ Commit message
          as a demonstration of the approach.
      
          CC: Josh Steadmon <steadmon@google.com>
     +    CC: David Korczynski <david@adalogics.com>
          Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>
      
       ## .gitignore ##
     @@ .gitignore
       /GIT-LDFLAGS
      
       ## Makefile ##
     -@@ Makefile: ETAGS_TARGET = TAGS
     - FUZZ_OBJS += fuzz-commit-graph.o
     - FUZZ_OBJS += fuzz-pack-headers.o
     - FUZZ_OBJS += fuzz-pack-idx.o
     -+FUZZ_OBJS += fuzz-cmd-status.o
     +@@ Makefile: SCRIPTS = $(SCRIPT_SH_GEN) \
     + 
     + ETAGS_TARGET = TAGS
     + 
     +-FUZZ_OBJS += fuzz-commit-graph.o
     +-FUZZ_OBJS += fuzz-pack-headers.o
     +-FUZZ_OBJS += fuzz-pack-idx.o
     ++FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o
     ++FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o
     ++FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o
     ++FUZZ_OBJS += oss-fuzz/fuzz-cmd-status.o
       .PHONY: fuzz-objs
       fuzz-objs: $(FUZZ_OBJS)
       
     -@@ Makefile: LIB_OBJS += fsck.o
     - LIB_OBJS += fsmonitor.o
     - LIB_OBJS += fsmonitor-ipc.o
     - LIB_OBJS += fsmonitor-settings.o
     -+LIB_OBJS += fuzz-cmd-base.o
     - LIB_OBJS += gettext.o
     - LIB_OBJS += gpg-interface.o
     - LIB_OBJS += graph.o
     +@@ Makefile: LIB_OBJS += oid-array.o
     + LIB_OBJS += oidmap.o
     + LIB_OBJS += oidset.o
     + LIB_OBJS += oidtree.o
     ++LIB_OBJS += oss-fuzz/fuzz-cmd-base.o
     + LIB_OBJS += pack-bitmap-write.o
     + LIB_OBJS += pack-bitmap.o
     + LIB_OBJS += pack-check.o
      
     - ## fuzz-cmd-base.c (new) ##
     + ## oss-fuzz/fuzz-cmd-base.c (new) ##
      @@
      +#include "cache.h"
      +#include "fuzz-cmd-base.h"
     @@ fuzz-cmd-base.c (new)
      + * random data. The random data normally come from the fuzzing engine
      + * LibFuzzer in order to create randomization of the git file worktree
      + * and possibly messing up of certain git config file to fuzz different
     -+ * git command execution logic.
     ++ * git command execution logic. Return -1 if it fails to create the file.
      + */
     -+void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size) {
     -+   char fname[256];
     -+   FILE *fp;
     -+
     -+   snprintf(fname, 255, "%s/%s", dir, name);
     -+
     -+   fp = fopen(fname, "wb");
     -+   if (fp) {
     -+      fwrite(data_chunk, 1, data_size, fp);
     -+      fclose(fp);
     -+   }
     ++int randomize_git_file(char *dir, char *name, char *data, int size)
     ++{
     ++	FILE *fp;
     ++	int ret = 0;
     ++	struct strbuf fname = STRBUF_INIT;
     ++
     ++	strbuf_addf(&fname, "%s/%s", dir, name);
     ++
     ++	fp = fopen(fname.buf, "wb");
     ++	if (fp)
     ++	{
     ++		fwrite(data, 1, size, fp);
     ++	}
     ++	else
     ++	{
     ++		ret = -1;
     ++	}
     ++
     ++	fclose(fp);
     ++	strbuf_release(&fname);
     ++
     ++	return ret;
      +}
      +
      +/*
     -+ * This function is the variants of the above functions which takes
     -+ * in a set of target files to be processed. These target file are
     ++ * This function is a variant of the above function which takes
     ++ * a set of target files to be processed. These target file are
      + * passing to the above function one by one for content rewrite.
     ++ * The data is equally divided for each of the files, and the
     ++ * remaining bytes (if not divisible) will be ignored.
      + */
     -+void randomize_git_files(char *dir, char *name_set[], int files_count, char *data, int size) {
     -+   int data_size = size / files_count;
     -+
     -+   for(int i=0; i<files_count; i++) {
     -+      char *data_chunk = malloc(data_size);
     -+      memcpy(data_chunk, data + (i * data_size), data_size);
     -+
     -+      randomize_git_file(dir, name_set[i], data_chunk, data_size);
     -+
     -+      free(data_chunk);
     -+   }
     ++void randomize_git_files(char *dir, char *name_set[],
     ++	int files_count, char *data, int size)
     ++{
     ++	int i;
     ++	int data_size = size / files_count;
     ++	char *data_chunk = xmallocz_gently(data_size);
     ++
     ++	if (!data_chunk)
     ++	{
     ++		return;
     ++	}
     ++
     ++	for (i = 0; i < files_count; i++)
     ++	{
     ++		memcpy(data_chunk, data + (i * data_size), data_size);
     ++		randomize_git_file(dir, name_set[i], data_chunk, data_size);
     ++	}
     ++	free(data_chunk);
      +}
      +
      +/*
      + * Instead of randomizing the content of existing files. This helper
      + * function helps generate a temp file with random file name before
      + * passing to the above functions to get randomized content for later
     -+ * fuzzing of git command
     ++ * fuzzing of git command.
      + */
     -+void generate_random_file(char *data, int size) {
     -+   unsigned char *hash = malloc(size);
     -+   char *fname = malloc((size*2)+12);
     -+   char *data_chunk = malloc(size);
     -+
     -+   memcpy(hash, data, size);
     -+   memcpy(data_chunk, data + size, size);
     -+
     -+   snprintf(fname, size*2+11, "TEMP-%s-TEMP", hash_to_hex(hash));
     -+   randomize_git_file(".", fname, data_chunk, size);
     -+
     -+   free(hash);
     -+   free(fname);
     -+   free(data_chunk);
     ++void generate_random_file(char *data, int size)
     ++{
     ++	unsigned char *hash = xmallocz_gently(size);
     ++	char *data_chunk = xmallocz_gently(size);
     ++	struct strbuf fname = STRBUF_INIT;
     ++
     ++	if (!hash || !data_chunk)
     ++	{
     ++		return;
     ++	}
     ++
     ++	memcpy(hash, data, size);
     ++	memcpy(data_chunk, data + size, size);
     ++
     ++	strbuf_addf(&fname, "TEMP-%s-TEMP", hash_to_hex(hash));
     ++	randomize_git_file(".", fname.buf, data_chunk, size);
     ++
     ++	free(hash);
     ++	free(data_chunk);
     ++	strbuf_release(&fname);
      +}
      +
      +/*
     @@ fuzz-cmd-base.c (new)
      + * of git commands.
      + */
      +void generate_commit(char *data, int size) {
     -+   int ret = 0;
     -+   char *data_chunk = malloc(size * 2);
     -+   memcpy(data_chunk, data, size * 2);
     ++	char *data_chunk = xmallocz_gently(size * 2);
     ++
     ++	if (!data_chunk)
     ++	{
     ++		return;
     ++	}
     ++
     ++	memcpy(data_chunk, data, size * 2);
     ++	generate_random_file(data_chunk, size);
      +
     -+   generate_random_file(data_chunk, size);
     -+   ret += system("git add TEMP-*-TEMP");
     -+   ret += system("git commit -m\"New Commit\"");
     ++	free(data_chunk);
      +
     -+   free(data_chunk);
     ++	if (system("git add TEMP-*-TEMP") || system("git commit -m\"New Commit\""))
     ++	{
     ++		/* Just skip the commit if it fails */
     ++		return;
     ++	}
      +}
      +
      +/*
      + * In some cases, there may be some fuzzing logic that will mess
      + * up the git repository and its configuration and settings.
     -+ * This function aims to reset the git repository into the default
     -+ * base settings before each round of fuzzing.
     ++ * This function integrates into the fuzzing process and
     ++ * resets the git repository to the default
     ++ * base settings before each round of fuzzing.
      + */
     -+int reset_git_folder(void) {
     -+   int ret = 0;
     -+
     -+   ret += system("rm -rf ./.git");
     -+   ret += system("rm -f ./TEMP-*-TEMP");
     -+   ret += system("git init");
     -+   ret += system("git config --global user.name \"FUZZ\"");
     -+   ret += system("git config --global user.email \"FUZZ@LOCALHOST\"");
     -+   ret += system("git config --global --add safe.directory '*'");
     -+   ret += system("git add ./TEMP_1 ./TEMP_2");
     -+   ret += system("git commit -m\"First Commit\"");
     -+
     -+   return ret;
     ++int reset_git_folder(void)
     ++{
     ++	int ret = 0;
     ++
     ++	ret += system("rm -rf ./.git");
     ++	ret += system("rm -f ./TEMP-*-TEMP");
     ++
     ++	if (system("git init") ||
     ++		system("git config --global user.name \"FUZZ\"") ||
     ++		system("git config --global user.email \"FUZZ@LOCALHOST\"") ||
     ++		system("git config --global --add safe.directory '*'") ||
     ++		system("git add ./TEMP_1 ./TEMP_2") ||
     ++		system("git commit -m\"First Commit\""))
     ++	{
     ++		return -1;
     ++	}
     ++
     ++	return 0;
      +}
      +
      +/*
     @@ fuzz-cmd-base.c (new)
      + * data, to increase randomization of the fuzzing target and allow
      + * more fuzzing paths to be covered.
      + */
     -+int get_max_commit_count(int data_size, int git_files_count, int hash_size) {
     -+   int count = (data_size - 4 - git_files_count * 2) / (hash_size * 2);
     ++int get_max_commit_count(int data_size, int git_files_count, int hash_size)
     ++{
     ++	int count = (data_size - 4 - git_files_count * 2) / (hash_size * 2);
      +
     -+   if(count > 20) {
     -+      count = 20;
     -+   }
     ++	if (count > 20)
     ++	{
     ++		count = 20;
     ++	}
      +
     -+   return count;
     ++	return count;
      +}
      
     - ## fuzz-cmd-base.h (new) ##
     + ## oss-fuzz/fuzz-cmd-base.h (new) ##
      @@
      +#ifndef FUZZ_CMD_BASE_H
      +#define FUZZ_CMD_BASE_H
      +
      +#define HASH_SIZE 20
      +
     -+void randomize_git_files(char *dir, char *name_set[], int files_count, char *data, int size);
     -+void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size);
     ++int randomize_git_file(char *dir, char *name, char *data, int size);
     ++void randomize_git_files(char *dir, char *name_set[],
     ++	int files_count, char *data, int size);
      +void generate_random_file(char *data, int size);
      +void generate_commit(char *data, int size);
      +int reset_git_folder(void);
     @@ fuzz-cmd-base.h (new)
      +
      +#endif
      
     - ## fuzz-cmd-status.c (new) ##
     + ## oss-fuzz/fuzz-cmd-status.c (new) ##
      @@
      +#include "builtin.h"
      +#include "repository.h"
     @@ fuzz-cmd-status.c (new)
      +
      +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
      +
     -+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
     -+   int no_of_commit;
     -+   int max_commit_count;
     -+   char *argv[2];
     -+   char *data_chunk;
     -+   char *basedir = "./.git";
     -+
     -+   /*
     -+    *  Initialize the repository
     -+    */
     -+   initialize_the_repository();
     -+
     -+   max_commit_count = get_max_commit_count(size, 0, HASH_SIZE);
     -+
     -+   /*
     -+    * End this round of fuzzing if the data is not large enough
     -+    */
     -+   if (size <= (HASH_SIZE * 2 + 4)) {
     -+      repo_clear(the_repository);
     -+      return 0;
     -+   }
     -+
     -+   if (reset_git_folder()) {
     -+      repo_clear(the_repository);
     -+      return 0;
     -+   }
     -+
     -+   /*
     -+    * Generate random commit
     -+    */
     -+   no_of_commit = (*((int *)data)) % max_commit_count + 1;
     -+   data += 4;
     -+   size -= 4;
     -+
     -+   for (int i=0; i<no_of_commit; i++) {
     -+      data_chunk = malloc(HASH_SIZE * 2);
     -+      memcpy(data_chunk, data, HASH_SIZE * 2);
     -+      generate_commit(data_chunk, HASH_SIZE);
     -+      data += (HASH_SIZE * 2);
     -+      size -= (HASH_SIZE * 2);
     -+      free(data_chunk);
     -+   }
     -+
     -+   /*
     -+    * Final preparing of the repository settings
     -+    */
     -+   repo_clear(the_repository);
     -+   repo_init(the_repository, basedir, ".");
     -+
     -+   /*
     -+    * Calling target git command
     -+    */
     -+   argv[0] = "status";
     -+   argv[1] = "-v";
     -+   cmd_status(2, (const char **)argv, (const char *)"");
     -+
     -+   repo_clear(the_repository);
     -+
     -+   return 0;
     ++int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
     ++{
     ++	int i;
     ++	int no_of_commit;
     ++	int max_commit_count;
     ++	char *argv[2];
     ++	char *data_chunk;
     ++	char *basedir = "./.git";
     ++
     ++	/*
     ++	 *  Initialize the repository
     ++	 */
     ++	initialize_the_repository();
     ++
     ++	max_commit_count = get_max_commit_count(size, 0, HASH_SIZE);
     ++
     ++	/*
     ++	 * End this round of fuzzing if the data is not large enough
     ++	 */
     ++	if (size <= (HASH_SIZE * 2 + 4) || reset_git_folder())
     ++	{
     ++		repo_clear(the_repository);
     ++		return 0;
     ++	}
     ++
     ++
     ++	/*
     ++	 * Generate random commit
     ++	 */
     ++	no_of_commit = (*((int *)data)) % max_commit_count + 1;
     ++	data += 4;
     ++	size -= 4;
     ++
     ++	data_chunk = xmallocz_gently(HASH_SIZE * 2);
     ++
     ++	if (!data_chunk)
     ++	{
     ++		repo_clear(the_repository);
     ++		return 0;
     ++	}
     ++
     ++	for (i = 0; i < no_of_commit; i++)
     ++	{
     ++		memcpy(data_chunk, data, HASH_SIZE * 2);
     ++		generate_commit(data_chunk, HASH_SIZE);
     ++		data += (HASH_SIZE * 2);
     ++		size -= (HASH_SIZE * 2);
     ++	}
     ++
     ++	free(data_chunk);
     ++
     ++	/*
     ++	 * Final preparing of the repository settings
     ++	 */
     ++	repo_clear(the_repository);
     ++	if (repo_init(the_repository, basedir, "."))
     ++	{
     ++		repo_clear(the_repository);
     ++		return 0;
     ++	}
     ++
     ++	/*
     ++	 * Calling target git command
     ++	 */
     ++	argv[0] = "status";
     ++	argv[1] = "-v";
     ++	cmd_status(2, (const char **)argv, (const char *)"");
     ++
     ++	repo_clear(the_repository);
     ++	return 0;
      +}
     +
     + ## fuzz-commit-graph.c => oss-fuzz/fuzz-commit-graph.c ##
     +
     + ## fuzz-pack-headers.c => oss-fuzz/fuzz-pack-headers.c ##
     +
     + ## fuzz-pack-idx.c => oss-fuzz/fuzz-pack-idx.c ##
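
Note on the input layout consumed by oss-fuzz/fuzz-cmd-status.c: the
harness reads a 4-byte header to pick a commit count, then consumes one
chunk of HASH_SIZE * 2 bytes per commit (HASH_SIZE bytes hashed into the
file name, HASH_SIZE bytes of file content), capped at 20 commits. The
following is a minimal sketch of that arithmetic and is not part of the
patch. The names sketch_max_commits and sketch_commit_count are
hypothetical. Unlike the patch, the sketch copies the header with
memcpy() into an unsigned value; that is an assumption made here to
avoid an unaligned load and a negative modulo result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HASH_SIZE 20   /* mirrors HASH_SIZE in fuzz-cmd-base.h */
#define SKETCH_HEADER_LEN 4   /* bytes consumed for the commit count */
#define SKETCH_MAX_COMMITS 20 /* same cap as get_max_commit_count() */

/* Number of whole (name, content) chunks that the input can supply. */
static int sketch_max_commits(size_t size)
{
	size_t chunk = SKETCH_HASH_SIZE * 2;
	size_t count;

	/* The harness rejects inputs of this size before using the count. */
	if (size <= SKETCH_HEADER_LEN + chunk)
		return 0;

	count = (size - SKETCH_HEADER_LEN) / chunk;
	return count > SKETCH_MAX_COMMITS ? SKETCH_MAX_COMMITS : (int)count;
}

/*
 * Derive the commit count from the first four bytes of the input.
 * The result is in the range 1..max, or 0 when the input is too small.
 */
static int sketch_commit_count(const uint8_t *data, size_t size)
{
	uint32_t raw;
	int max = sketch_max_commits(size);

	assert(data != NULL);
	if (!max)
		return 0;

	memcpy(&raw, data, sizeof(raw));
	return (int)(raw % (uint32_t)max) + 1;
}
```

For instance, an 84-byte input leaves 80 bytes after the header, so at
most two commits can be generated from it without reusing data.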


 .gitignore                                    |   2 +
 Makefile                                      |   8 +-
 oss-fuzz/fuzz-cmd-base.c                      | 159 ++++++++++++++++++
 oss-fuzz/fuzz-cmd-base.h                      |  14 ++
 oss-fuzz/fuzz-cmd-status.c                    |  79 +++++++++
 .../fuzz-commit-graph.c                       |   0
 .../fuzz-pack-headers.c                       |   0
 fuzz-pack-idx.c => oss-fuzz/fuzz-pack-idx.c   |   0
 8 files changed, 259 insertions(+), 3 deletions(-)
 create mode 100644 oss-fuzz/fuzz-cmd-base.c
 create mode 100644 oss-fuzz/fuzz-cmd-base.h
 create mode 100644 oss-fuzz/fuzz-cmd-status.c
 rename fuzz-commit-graph.c => oss-fuzz/fuzz-commit-graph.c (100%)
 rename fuzz-pack-headers.c => oss-fuzz/fuzz-pack-headers.c (100%)
 rename fuzz-pack-idx.c => oss-fuzz/fuzz-pack-idx.c (100%)

diff --git a/.gitignore b/.gitignore
index 80b530bbed2..5d0ce214164 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,8 @@
 /fuzz_corpora
 /fuzz-pack-headers
 /fuzz-pack-idx
+/fuzz-cmd-base
+/fuzz-cmd-status
 /GIT-BUILD-OPTIONS
 /GIT-CFLAGS
 /GIT-LDFLAGS
diff --git a/Makefile b/Makefile
index c6e126e54c2..4aafe20489e 100644
--- a/Makefile
+++ b/Makefile
@@ -686,9 +686,10 @@ SCRIPTS = $(SCRIPT_SH_GEN) \
 
 ETAGS_TARGET = TAGS
 
-FUZZ_OBJS += fuzz-commit-graph.o
-FUZZ_OBJS += fuzz-pack-headers.o
-FUZZ_OBJS += fuzz-pack-idx.o
+FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o
+FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o
+FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o
+FUZZ_OBJS += oss-fuzz/fuzz-cmd-status.o
 .PHONY: fuzz-objs
 fuzz-objs: $(FUZZ_OBJS)
 
@@ -1009,6 +1010,7 @@ LIB_OBJS += oid-array.o
 LIB_OBJS += oidmap.o
 LIB_OBJS += oidset.o
 LIB_OBJS += oidtree.o
+LIB_OBJS += oss-fuzz/fuzz-cmd-base.o
 LIB_OBJS += pack-bitmap-write.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-check.o
diff --git a/oss-fuzz/fuzz-cmd-base.c b/oss-fuzz/fuzz-cmd-base.c
new file mode 100644
index 00000000000..25fb7b838f0
--- /dev/null
+++ b/oss-fuzz/fuzz-cmd-base.c
@@ -0,0 +1,159 @@
+#include "cache.h"
+#include "fuzz-cmd-base.h"
+
+
+/*
+ * This function is used to randomize the content of a file with
+ * random data. The random data normally comes from the fuzzing engine
+ * (libFuzzer) in order to randomize the files in the git worktree
+ * and possibly corrupt certain git config files, to fuzz different
+ * git command execution logic. Return -1 if it fails to create the file.
+ */
+int randomize_git_file(char *dir, char *name, char *data, int size)
+{
+	FILE *fp;
+	int ret = 0;
+	struct strbuf fname = STRBUF_INIT;
+
+	strbuf_addf(&fname, "%s/%s", dir, name);
+
+	fp = fopen(fname.buf, "wb");
+	if (fp)
+	{
+		fwrite(data, 1, size, fp);
+		fclose(fp);
+	}
+	else
+	{
+		ret = -1;
+	}
+
+	strbuf_release(&fname);
+
+	return ret;
+}
+
+/*
+ * This function is a variant of the above function which takes
+ * a set of target files to be processed. These target files are
+ * passed to the above function one by one for content rewrite.
+ * The data is equally divided among the files, and the
+ * remaining bytes (if not divisible) will be ignored.
+ */
+void randomize_git_files(char *dir, char *name_set[],
+	int files_count, char *data, int size)
+{
+	int i;
+	int data_size = size / files_count;
+	char *data_chunk = xmallocz_gently(data_size);
+
+	if (!data_chunk)
+	{
+		return;
+	}
+
+	for (i = 0; i < files_count; i++)
+	{
+		memcpy(data_chunk, data + (i * data_size), data_size);
+		randomize_git_file(dir, name_set[i], data_chunk, data_size);
+	}
+	free(data_chunk);
+}
+
+/*
+ * Instead of randomizing the content of existing files, this helper
+ * function helps generate a temp file with random file name before
+ * passing to the above functions to get randomized content for later
+ * fuzzing of git command.
+ */
+void generate_random_file(char *data, int size)
+{
+	unsigned char *hash = xmallocz_gently(size);
+	char *data_chunk = xmallocz_gently(size);
+	struct strbuf fname = STRBUF_INIT;
+
+	if (!hash || !data_chunk)
+	{
+		return;
+	}
+
+	memcpy(hash, data, size);
+	memcpy(data_chunk, data + size, size);
+
+	strbuf_addf(&fname, "TEMP-%s-TEMP", hash_to_hex(hash));
+	randomize_git_file(".", fname.buf, data_chunk, size);
+
+	free(hash);
+	free(data_chunk);
+	strbuf_release(&fname);
+}
+
+/*
+ * This function helps to generate random commit and build up a
+ * worktree with randomization to provide a target for the fuzzing
+ * of git commands.
+ */
+void generate_commit(char *data, int size) {
+	char *data_chunk = xmallocz_gently(size * 2);
+
+	if (!data_chunk)
+	{
+		return;
+	}
+
+	memcpy(data_chunk, data, size * 2);
+	generate_random_file(data_chunk, size);
+
+	free(data_chunk);
+
+	if (system("git add TEMP-*-TEMP") || system("git commit -m\"New Commit\""))
+	{
+		/* Just skip the commit if it fails */
+		return;
+	}
+}
+
+/*
+ * In some cases, there may be some fuzzing logic that will mess
+ * up the git repository and its configuration and settings.
+ * This function integrates into the fuzzing process and
+ * resets the git repository to the default
+ * base settings before each round of fuzzing.
+ */
+int reset_git_folder(void)
+{
+	int ret = 0;
+
+	ret += system("rm -rf ./.git");
+	ret += system("rm -f ./TEMP-*-TEMP");
+
+	if (system("git init") ||
+		system("git config --global user.name \"FUZZ\"") ||
+		system("git config --global user.email \"FUZZ@LOCALHOST\"") ||
+		system("git config --global --add safe.directory '*'") ||
+		system("git add ./TEMP_1 ./TEMP_2") ||
+		system("git commit -m\"First Commit\""))
+	{
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * This helper function returns the maximum number of commits that can
+ * be generated from the provided random data without reusing the
+ * data, to increase randomization of the fuzzing target and allow
+ * more fuzzing paths to be covered.
+ */
+int get_max_commit_count(int data_size, int git_files_count, int hash_size)
+{
+	int count = (data_size - 4 - git_files_count * 2) / (hash_size * 2);
+
+	if (count > 20)
+	{
+		count = 20;
+	}
+
+	return count;
+}
diff --git a/oss-fuzz/fuzz-cmd-base.h b/oss-fuzz/fuzz-cmd-base.h
new file mode 100644
index 00000000000..ed9ef62d554
--- /dev/null
+++ b/oss-fuzz/fuzz-cmd-base.h
@@ -0,0 +1,14 @@
+#ifndef FUZZ_CMD_BASE_H
+#define FUZZ_CMD_BASE_H
+
+#define HASH_SIZE 20
+
+int randomize_git_file(char *dir, char *name, char *data, int size);
+void randomize_git_files(char *dir, char *name_set[],
+	int files_count, char *data, int size);
+void generate_random_file(char *data, int size);
+void generate_commit(char *data, int size);
+int reset_git_folder(void);
+int get_max_commit_count(int data_size, int git_files_count, int hash_size);
+
+#endif
diff --git a/oss-fuzz/fuzz-cmd-status.c b/oss-fuzz/fuzz-cmd-status.c
new file mode 100644
index 00000000000..eb87d1ed13d
--- /dev/null
+++ b/oss-fuzz/fuzz-cmd-status.c
@@ -0,0 +1,79 @@
+#include "builtin.h"
+#include "repository.h"
+#include "fuzz-cmd-base.h"
+
+int cmd_status(int argc, const char **argv, const char *prefix);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
+{
+	int i;
+	int no_of_commit;
+	int max_commit_count;
+	char *argv[2];
+	char *data_chunk;
+	char *basedir = "./.git";
+
+	/*
+	 *  Initialize the repository
+	 */
+	initialize_the_repository();
+
+	max_commit_count = get_max_commit_count(size, 0, HASH_SIZE);
+
+	/*
+	 * End this round of fuzzing if the data is not large enough
+	 */
+	if (size <= (HASH_SIZE * 2 + 4) || reset_git_folder())
+	{
+		repo_clear(the_repository);
+		return 0;
+	}
+
+
+	/*
+	 * Generate random commit
+	 */
+	no_of_commit = (*((int *)data)) % max_commit_count + 1;
+	data += 4;
+	size -= 4;
+
+	data_chunk = xmallocz_gently(HASH_SIZE * 2);
+
+	if (!data_chunk)
+	{
+		repo_clear(the_repository);
+		return 0;
+	}
+
+	for (i = 0; i < no_of_commit; i++)
+	{
+		memcpy(data_chunk, data, HASH_SIZE * 2);
+		generate_commit(data_chunk, HASH_SIZE);
+		data += (HASH_SIZE * 2);
+		size -= (HASH_SIZE * 2);
+	}
+
+	free(data_chunk);
+
+	/*
+	 * Final preparing of the repository settings
+	 */
+	repo_clear(the_repository);
+	if (repo_init(the_repository, basedir, "."))
+	{
+		repo_clear(the_repository);
+		return 0;
+	}
+
+	/*
+	 * Calling target git command
+	 */
+	argv[0] = "status";
+	argv[1] = "-v";
+	cmd_status(2, (const char **)argv, (const char *)"");
+
+	repo_clear(the_repository);
+	return 0;
+}
diff --git a/fuzz-commit-graph.c b/oss-fuzz/fuzz-commit-graph.c
similarity index 100%
rename from fuzz-commit-graph.c
rename to oss-fuzz/fuzz-commit-graph.c
diff --git a/fuzz-pack-headers.c b/oss-fuzz/fuzz-pack-headers.c
similarity index 100%
rename from fuzz-pack-headers.c
rename to oss-fuzz/fuzz-pack-headers.c
diff --git a/fuzz-pack-idx.c b/oss-fuzz/fuzz-pack-idx.c
similarity index 100%
rename from fuzz-pack-idx.c
rename to oss-fuzz/fuzz-pack-idx.c

base-commit: dd3f6c4cae7e3b15ce984dce8593ff7569650e24
-- 
gitgitgadget
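
For reference, randomize_git_file() in the patch above opens a file,
writes the fuzzer-provided bytes, and reports failure with -1. The
following is a minimal standalone sketch of that pattern with every step
checked: fclose() is only called on a stream that fopen() actually
returned, and short writes are reported. It is not part of the patch.
The name sketch_write_file and the fixed-size path buffer are
assumptions for illustration; the patch itself builds the path with
strbuf.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write `size` bytes of `data` to `dir`/`name`.
 * Returns 0 on success and -1 on any failure.
 */
static int sketch_write_file(const char *dir, const char *name,
			     const void *data, size_t size)
{
	char path[4096];
	FILE *fp;
	int ret = 0;
	int n;

	assert(dir != NULL && name != NULL);

	/* Refuse paths that do not fit instead of truncating them. */
	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fp = fopen(path, "wb");
	if (!fp)
		return -1; /* nothing to close when fopen() failed */

	if (fwrite(data, 1, size, fp) != size)
		ret = -1;
	if (fclose(fp))
		ret = -1; /* buffered data may fail to flush on close */

	return ret;
}
```

Returning early when fopen() fails keeps the cleanup path simple, since
passing a NULL stream to fclose() is undefined behavior.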


* Re: [PATCH v2] fuzz: add basic fuzz testing for git command
  2022-09-16 17:29 ` [PATCH v2] " Arthur Chan via GitGitGadget
@ 2022-09-16 17:37   ` Junio C Hamano
  2022-09-16 18:07     ` Arthur Chan
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2022-09-16 17:37 UTC (permalink / raw)
  To: Arthur Chan via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Arthur Chan

"Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Arthur Chan <arthur.chan@adalogics.com>
>
> fuzz-cmd-base.c / fuzz-cmd-base.h provides base functions for
> fuzzing on git command which are compatible with libFuzzer
> (and possibly other fuzzing engines).
> fuzz-cmd-status.c provides first git command fuzzing target
> as a demonstration of the approach.

As I said in my review on the previous round, please make the
"cleaning up of existing stuff" and "addition of new stuff" into two
separate patches, the latter building on top of the former.  That
will make it easier to review the former (as there shouldn't be
anything that would add or change the way how the moved stuff
interacts with the rest of the world) and also the latter (as the
scope of the second patch would be much smaller and more focused).

Thanks.


* Re: [PATCH v2] fuzz: add basic fuzz testing for git command
  2022-09-16 17:37   ` Junio C Hamano
@ 2022-09-16 18:07     ` Arthur Chan
  0 siblings, 0 replies; 8+ messages in thread
From: Arthur Chan @ 2022-09-16 18:07 UTC (permalink / raw)
  To: Junio C Hamano, Arthur Chan via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Arthur Chan


On 16/9/2022 6:37 pm, Junio C Hamano wrote:
> "Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Arthur Chan <arthur.chan@adalogics.com>
>>
>> fuzz-cmd-base.c / fuzz-cmd-base.h provides base functions for
>> fuzzing on git command which are compatible with libFuzzer
>> (and possibly other fuzzing engines).
>> fuzz-cmd-status.c provides first git command fuzzing target
>> as a demonstration of the approach.
> As I said in my review on the previous round, please make the
> "cleaning up of existing stuff" and "addition of new stuff" into two
> separate patches, the latter building on top of the former.  That
> will make it easier to review the former (as there shouldn't be
> anything that would add or change the way how the moved stuff
> interacts with the rest of the world) and also the latter (as the
> scope of the second patch would be much smaller and more focused).
>
> Thanks.

Thanks. Sorry for the misunderstanding on my side. I will go ahead and
create a PR for moving the existing fuzzers before submitting another one
for the new fuzzers. Thanks very much for your kind suggestions and time.

Cheers.

ADA Logics Ltd is registered in England. No: 11624074.
Registered office: 266 Banbury Road, Post Box 292,
OX2 7DL, Oxford, Oxfordshire , United Kingdom


end of thread, other threads:[~2022-09-16 18:07 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-13 14:22 [PATCH] fuzz: add basic fuzz testing for git command Arthur Chan via GitGitGadget
2022-09-13 15:57 ` Ævar Arnfjörð Bjarmason
2022-09-16 15:54   ` Arthur Chan
2022-09-13 16:13 ` Junio C Hamano
2022-09-16 16:06   ` Arthur Chan
2022-09-16 17:29 ` [PATCH v2] " Arthur Chan via GitGitGadget
2022-09-16 17:37   ` Junio C Hamano
2022-09-16 18:07     ` Arthur Chan
