From: Arthur Chan <arthur.chan@adalogics.com>
To: Junio C Hamano <gitster@pobox.com>,
Arthur Chan via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Arthur Chan <arthur.chan@adalogics.com>,
david@adalogics.com
Subject: Re: [PATCH] fuzz: add basic fuzz testing for git command
Date: Fri, 16 Sep 2022 17:06:10 +0100
Message-ID: <d1a53455-f9cc-3c7a-0867-8b141435d251@adalogics.com>
In-Reply-To: <xmqqv8pr9rrn.fsf@gitster.g>
On 13/9/2022 5:13 pm, Junio C Hamano wrote:
> "Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> .gitignore | 2 +
>> Makefile | 2 +
>> fuzz-cmd-base.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++
>> fuzz-cmd-base.h | 13 ++++++
>> fuzz-cmd-status.c | 68 +++++++++++++++++++++++++++
>> 5 files changed, 202 insertions(+)
>> create mode 100644 fuzz-cmd-base.c
>> create mode 100644 fuzz-cmd-base.h
>> create mode 100644 fuzz-cmd-status.c
> Just like we have t/ hierarchy for testing, if we plan to add more
> fuzz-* related things on top of what we already have (like those
> that can be seen in the context of this patch), I would prefer to
> see a creation of fuzz/ hierarchy and move existing stuff there as
> the first step before adding more.
>
> And more fuzzing is good, if we can afford it ;-)
Fixed. I have moved the fuzzer into a new directory, oss-fuzz.
>
> Thanks.
>
> Even though I am not taking this patch as-is, let's give a cursory
> look to make sure the future iteration can be more reviewable by
> pointing out various CodingGuidelines issues.
>
Thanks for the style suggestions. I have changed most of the code accordingly.
>> diff --git a/fuzz-cmd-base.c b/fuzz-cmd-base.c
>> new file mode 100644
>> index 00000000000..98f05c78372
>> --- /dev/null
>> +++ b/fuzz-cmd-base.c
>> @@ -0,0 +1,117 @@
>> +#include "cache.h"
> Good to have this as the first thing.
>
>> +#include "fuzz-cmd-base.h"
>> +
>> +
>> +/*
>> + * This function is used to randomize the content of a file with the
>> + * random data. The random data normally come from the fuzzing engine
>> + * LibFuzzer in order to create randomization of the git file worktree
>> + * and possibly messing up of certain git config file to fuzz different
>> + * git command execution logic.
>> + */
>> +void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size) {
> Unlike other control structure with multiple statements in a block,
> the surrounding braces {} around function block sit on their own
> lines. I.e.
>
> void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size)
> {
>
>
>> + char fname[256];
> In our codebase, tab-width is 8 and we indent with tabs.
>
> Use <strbuf.h> and avoid snprintf(), e.g.
>
> struct strbuf fname = STRBUF_INIT;
> strbuf_addf(&fname, "%s/%s", dir, name);
> ... use fname.buf ...
> strbuf_release(&fname);
I have changed all the snprintf code to use strbuf instead. Thanks for
the suggestion.
>> + FILE *fp;
>> +
> Good that you leave a blank between the end of decl and the
> beginning of the statements.
>
>> + snprintf(fname, 255, "%s/%s", dir, name);
>> +
>> + fp = fopen(fname, "wb");
>> + if (fp) {
>> + fwrite(data_chunk, 1, data_size, fp);
>> + fclose(fp);
>> + }
>> +}
> Why doesn't this care about errors at all? Not even fopen errors?
>
I have changed the code a little. In general, though, failures to
generate a file's contents happen many times during fuzzing, because
some random fuzzing data results in unexpected behaviour, and we
currently just skip that round of fuzzing.
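
As a rough sketch of what "skip that round" could look like (this is
not the code from the patch; write_fuzz_chunk() is a made-up name and
plain stdio is assumed), the helper can report failure to its caller
instead of silently ignoring it:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not the code from the patch: write `size`
 * bytes of fuzzer data to `path`. Returns 0 on success and -1 on
 * any failure, so the caller can skip the current fuzzing round.
 */
static int write_fuzz_chunk(const char *path, const char *data, size_t size)
{
	FILE *fp;
	int ret = 0;

	assert(path && (data || !size));

	fp = fopen(path, "wb");
	if (!fp)
		return -1;
	/* A short write counts as a failure. */
	if (size && fwrite(data, 1, size, fp) != size)
		ret = -1;
	/* fclose() can also report a delayed write error. */
	if (fclose(fp))
		ret = -1;
	return ret;
}
```

A caller would then return early from the fuzz target whenever this
returns -1, rather than running the git command on a half-written
worktree.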
>> +/*
>> + * This function is the variants of the above functions which takes
>> + * in a set of target files to be processed. These target file are
> "... is a variant of the above function, which takes a set of ..."
>
>> + * passing to the above function one by one for content rewrite.
>> + */
>> +void randomize_git_files(char *dir, char *name_set[], int files_count, char *data, int size) {
>> + int data_size = size / files_count;
>> +
>> + for(int i=0; i<files_count; i++) {
> We do not yet officially allow variable decl for for() statement
> like this. We'll start allowing it later this year but we are
> waiting for oddball platform/compiler folks to scream right now.
>
> IOW, we write the above more like so:
>
> int data_size = size / files_count;
> int i;
>
> for (i = 0; i < files_count; i++) {
>
> Take also notice how we use whitespaces around non-unary operators.
Thanks, changed the code style accordingly.
>> + char *data_chunk = malloc(data_size);
>> + memcpy(data_chunk, data + (i * data_size), data_size);
>> + randomize_git_file(dir, name_set[i], data_chunk, data_size);
>> +
>> + free(data_chunk);
>> + }
> As data_size does not change in this loop and the contents of
> data_chunk from each round is discardable, allocating once outside
> may make more sense. Actually, as the called function makes only
> read-only accesses of data_chunk, I do not quite see why you need to
> make a copy in the first place.
>
> We do not use malloc() etc. directly out of the system; study wrapper.c
> and find xmalloc() and friends.
Changed to use xmallocz_gentle() instead of malloc(). Thanks for the suggestion.
>
> What if size is not a multiple of files_count, by the way?
It does not matter; the unused bytes are simply ignored. We just
ensure that the oss-fuzz engine provides enough random bytes to
generate the random file contents.
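
To illustrate the arithmetic (a sketch only; nth_chunk() is a
hypothetical name and not part of the patch), each file gets
size / files_count bytes and the trailing size % files_count bytes
are never handed out. Slicing by pointer like this would also avoid
the per-file copy mentioned above:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return a pointer to the i-th chunk of `data`
 * when `size` bytes are split evenly across `files_count` files.
 * *chunk_size receives size / files_count; the remaining
 * size % files_count bytes at the end of `data` stay unused.
 */
static const char *nth_chunk(const char *data, size_t size,
			     size_t files_count, size_t i,
			     size_t *chunk_size)
{
	assert(files_count > 0 && i < files_count);

	*chunk_size = size / files_count;
	return data + i * *chunk_size;
}
```

For example, with 10 bytes and 3 files every chunk is 3 bytes long,
the chunks start at offsets 0, 3 and 6, and the last byte is ignored.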
> I'll stop here as we already have plenty above (read: it is not "I
> didn't spot any problems in the patch after this point").
Thanks, and sorry for the trouble. This is my first time contributing
patches to git, and I did not know most of the conventions and style.
I have changed most of them as best I can and will prepare a v2 soon.
>
> Thanks.
ADA Logics Ltd is registered in England. No: 11624074.
Registered office: 266 Banbury Road, Post Box 292,
OX2 7DL, Oxford, Oxfordshire , United Kingdom