From: Emily Shaffer <emilyshaffer@google.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [RFC PATCH] grep: provide sane default to grep_source struct
Date: Thu, 16 May 2019 14:05:58 -0700
Message-ID: <20190516210558.GB138048@google.com>
In-Reply-To: <20190516022911.GA135875@google.com>

On Wed, May 15, 2019 at 08:11:52PM -0700, Jonathan Nieder wrote:
> Hi,
>
> Emily Shaffer wrote:
>
> > grep_buffer creates a struct grep_source gs and calls grep_source()
> > with it. However, gs.name is null, which eventually produces a
> > segmentation fault in
> > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is
> > not set.
>
> Thanks for catching it. Taking a step back, I think the problem is in
> the definition of "struct grep_source":
>
>   struct grep_source {
>           char *name;
>
>           enum grep_source_type {
>                   GREP_SOURCE_OID,
>                   GREP_SOURCE_FILE,
>                   GREP_SOURCE_BUF,
>           } type;
>           void *identifier;
>
>           ...
>   };
>
> What is the difference between a 'name' and an 'identifier'? Who is
> responsible for free()ing them? Can they be NULL? This is pretty
> underdocumented for a public type.
>
> If we take the point of view that 'name' should always be non-NULL,
> then I wonder:
>
> - can we document that?
> - can grep_source_init enforce that?

Today grep_source_init() accepts a NULL 'name' and stores it as-is (note
the xstrdup_or_null() below), so requiring 'name' to be non-NULL would
somewhat change its current intent:

  void grep_source_init(struct grep_source *gs, enum grep_source_type type,
                        const char *name, const char *path,
                        const void *identifier)
  {
          gs->type = type;
          gs->name = xstrdup_or_null(name);
          ...

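To illustrate how a NULL name survives initialization, here is a minimal
sketch. It assumes simplified stand-ins rather than the real grep.[ch]
code: dup_or_null(), struct demo_source and demo_source_init() are
hypothetical names that only mimic the pass-through behavior of
xstrdup_or_null().

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for xstrdup_or_null(): duplicate the string,
 * or pass NULL straight through to the caller.
 */
static char *dup_or_null(const char *str)
{
	char *copy;

	if (!str)
		return NULL;
	copy = malloc(strlen(str) + 1);
	if (!copy)
		abort(); /* assumption: treat allocation failure as fatal */
	strcpy(copy, str);
	return copy;
}

/* Simplified stand-in for struct grep_source: only the field discussed. */
struct demo_source {
	char *name;
};

/* Mirrors the shape of the init above: a NULL name is stored as NULL. */
static void demo_source_init(struct demo_source *gs, const char *name)
{
	gs->name = dup_or_null(name);
}
```

With this shape, nothing at init time distinguishes "no name given" from
a mistake, so any later code that prints the name has to handle NULL
itself.
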
> - can we take advantage of that in grep_source as well, as a sanity
> check that the grep_source has been initialized?
> - while we're here, can we describe what the field is used for
> (prefixing output with context before a ":", I believe)?

In general the documentation for grep.[ch] is pretty light. There aren't
any header comments, and `Documentation/technical/api-grep.txt` is still
a TODO. So I agree that we should document it anywhere we can.

> > Jonathan Nieder proposed alternatively adding some check to grep_source()
> > to ensure that if opt->status_only is unset, gs->name must be non-NULL
> > (and yell about it if not), as well as some extra comments indicating
> > what assumptions are made about the data coming into functions like
> > grep_source(). I'm fine with that as well (although I'm not sure it
> > makes sense semantically to require a name which the user probably can't
> > easily set, or else ban the user from printing LOC during grep). Mostly
> > I'm happy with any solution besides a segfault with no error logging :)
>
> Let's compare the two possibilities.
>
> The advantage of "(in memory)" is that it Just Works, which should
> make a nicer development experience with getting a new caller mostly
> working on the way to getting them working just the way you want.
>
> The disadvantage is that if we start outputting that in production, we
> and static analyzers are less likely to notice. In other words,
> printing "(in memory)" is leaking details to the end user that do not
> match what the end user asked for. NULL would instead produce a
> crash, prompting the author of the caller to fix it.
>
> What was particularly pernicious about this example is that the NULL
> dereference only occurs if the grep has a match. So I suppose I'm
> leaning toward (in addition to adding some comments to document the
> struct) adding a check like
>
>   if (!gs->name && !opt->status_only)
>           BUG("grep calls that could print name require name");
>
> to grep_source.

Why not both? :)

But seriously, I am planning to send a second patch with both (the
"(in memory)" default and the BUG() check), per Junio's reply.

I'll consider the documentation out of scope for now, since I'm not sure
I know enough about the intent or history of grep.[ch] to document
anything yet. :)

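As a rough sketch of how the two approaches could coexist, assuming
simplified stand-in types rather than the real grep.[ch] definitions
(struct sketch_source, struct sketch_opt and both helper names are
hypothetical):

```c
#include <stddef.h>

/* Simplified stand-ins; only the fields this discussion touches. */
struct sketch_source {
	const char *name;
};

struct sketch_opt {
	int status_only;
};

/*
 * Approach 1: the caller that builds a source from an in-memory buffer
 * supplies a placeholder when it has no better name to offer.
 */
static void sketch_set_name(struct sketch_source *gs, const char *name)
{
	gs->name = name ? name : "(in memory)";
}

/*
 * Approach 2: the condition from the proposed check. Returns 1 when the
 * call could print a name but none was supplied. In the real code this
 * condition would guard a BUG() call instead of returning a value.
 */
static int sketch_name_missing(const struct sketch_source *gs,
			       const struct sketch_opt *opt)
{
	return !gs->name && !opt->status_only;
}
```

The placeholder keeps the in-memory caller working, while the check
still catches any other caller that forgets to set a name, regardless of
whether the search finds a match.
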
>
> That would also sidestep the question of whether this debugging aid
> should be translated. :)
>
> Sensible?
>
> Thanks,
> Jonathan