git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Emily Shaffer <emilyshaffer@google.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [RFC PATCH] grep: provide sane default to grep_source struct
Date: Thu, 16 May 2019 14:05:58 -0700	[thread overview]
Message-ID: <20190516210558.GB138048@google.com> (raw)
In-Reply-To: <20190516022911.GA135875@google.com>

On Wed, May 15, 2019 at 08:11:52PM -0700, Jonathan Nieder wrote:
> Hi,
> 
> Emily Shaffer wrote:
> 
> > grep_buffer creates a struct grep_source gs and calls grep_source()
> > with it. However, gs.name is null, which eventually produces a
> > segmentation fault in
> > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is
> > not set.
> 
> Thanks for catching it.  Taking a step back, I think the problem is in
> the definition of "struct grep_source":
> 
> 	struct grep_source {
> 		char *name;
> 
> 		enum grep_source_type {
> 			GREP_SOURCE_OID,
> 			GREP_SOURCE_FILE,
> 			GREP_SOURCE_BUF,
> 		} type;
> 		void *identifier;
> 
> 		...
> 	};
> 
> What is the difference between a 'name' and an 'identifier'?  Who is
> responsible for free()ing them?  Can they be NULL?  This is pretty
> underdocumented for a public type.
> 
> If we take the point of view that 'name' should always be non-NULL,
> then I wonder:
> 
> - can we document that?
> - can grep_source_init enforce that?

Today grep_source_init() defaults to NULL. So if we decide that 'name'
should be non-NULL it will be somewhat changing the intent.

	void grep_source_init(struct grep_source *gs, enum grep_source_type type,        
	                      const char *name, const char *path,                        
	                      const void *identifier)                                    
	{                                                                                
	        gs->type = type;                                                         
	        gs->name = xstrdup_or_null(name); 
	...

> - can we take advantage of that in grep_source as well, as a sanity
>   check that the grep_source has been initialized?
> - while we're here, can we describe what the field is used for
>   (prefixing output with context before a ":", I believe)?

In general the documentation for grep.[ch] is pretty light. There aren't
any header comments and `Documentation/technical/api-grep.txt` is a
todo. So I agree that we should document it anywhere we can.

> > Jonathan Nieder proposed alternatively adding some check to grep_source()
> > to ensure that if opt->status_only is unset, gs->name must be non-NULL
> > (and yell about it if not), as well as some extra comments indicating
> > what assumptions are made about the data coming into functions like
> > grep_source(). I'm fine with that as well (although I'm not sure it
> > makes sense semantically to require a name which the user probably can't
> > easily set, or else ban the user from printing LOC during grep). Mostly
> > I'm happy with any solution besides a segfault with no error logging :)
> 
> Let's compare the two possibilities.
> 
> The advantage of "(in memory)" is that it Just Works, which should
> make a nicer development experience with getting a new caller mostly
> working on the way to getting them working just the way you want.
> 
> The disadvantage is that if we start outputting that in production, we
> and static analyzers are less likely to notice.  In other words,
> printing "(in memory)" is leaking details to the end user that do not
> match what the end user asked for.  NULL would instead produce a
> crash, prompting the author of the caller to fix it.
> 
> What was particularly pernicious about this example is that the NULL
> dereference only occurs if the grep has a match.  So I suppose I'm
> leaning toward (in addition to adding some comments to document the
> struct) adding a check like
> 
> 	if (!gs->name && !opt->status_only)
> 		BUG("grep calls that could print name require name");
> 
> to grep_source.

Why not both? :)

But seriously, I am planning to push a second patch with both, per
Junio's reply.

I'll consider the documentation out of scope for now since I'm not sure
I know enough about grep.[ch]'s intent or history to document anything
yet. :)

> 
> That would also sidestep the question of whether this debugging aid
> should be translated. :)
> 
> Sensible?
> 
> Thanks,
> Jonathan

  reply	other threads:[~2019-05-16 21:06 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-16  2:00 [RFC PATCH] grep: provide sane default to grep_source struct Emily Shaffer
2019-05-16  3:07 ` Junio C Hamano
2019-05-16  3:13   ` Jonathan Nieder
2019-05-16  3:11 ` Jonathan Nieder
2019-05-16 21:05   ` Emily Shaffer [this message]
2019-05-16 21:44 ` [PATCH v2] " Emily Shaffer
2019-05-16 22:02   ` Jeff King
2019-05-21 23:52     ` Emily Shaffer
2019-05-22  0:17       ` Jonathan Nieder
2019-05-22  0:34   ` [PATCH v3] " Emily Shaffer
2019-05-22  0:58     ` Jonathan Nieder
2019-05-22  4:01     ` Jeff King
2019-05-23 20:23     ` [PATCH v4] grep: fail if call could output and name is null Emily Shaffer
2019-05-23 20:34       ` Jonathan Nieder
2019-05-28 17:57         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190516210558.GB138048@google.com \
    --to=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).