* [RFC PATCH] grep: provide sane default to grep_source struct @ 2019-05-16 2:00 Emily Shaffer 2019-05-16 3:07 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Emily Shaffer @ 2019-05-16 2:00 UTC (permalink / raw) To: git; +Cc: jrn, Emily Shaffer grep_buffer creates a struct grep_source gs and calls grep_source() with it. However, gs.name is null, which eventually produces a segmentation fault in grep_source()->grep_source_1()->show_line() when grep_opt.status_only is not set. This seems to be unreachable from existing commands but is reachable in the event that someone rolls their own revision walk and neglects to set rev_info->grep_filter->status_only. Conceivably, someone might want to print all changed lines of all commits which match some regex. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> --- Hiya, I ran into this issue while I was working on a tutorial on rolling your own revision walk. I didn't want to print all the lines associated with a matching change, but it didn't seem good that a seemingly-sane grep_filter config was segfaulting. I don't know if adding a placeholder name is the right answer (hence RFC patch). Jonathan Nieder proposed alternatively adding some check to grep_source() to ensure that if opt->status_only is unset, gs->name must be non-NULL (and yell about it if not), as well as some extra comments indicating what assumptions are made about the data coming into functions like grep_source(). I'm fine with that as well (although I'm not sure it makes sense semantically to require a name which the user probably can't easily set, or else ban the user from printing LOC during grep). Mostly I'm happy with any solution besides a segfault with no error logging :) Thanks in advance for everyone's thoughts. Emily grep.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/grep.c b/grep.c index 0d50598acd..fd84454faf 100644 --- a/grep.c +++ b/grep.c @@ -2045,7 +2045,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size) struct grep_source gs; int r; - grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL); + grep_source_init(&gs, GREP_SOURCE_BUF, _("(in memory)"), NULL, NULL); gs.buf = buf; gs.size = size; -- 2.21.0.1020.gf2820cf01a-goog ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RFC PATCH] grep: provide sane default to grep_source struct 2019-05-16 2:00 [RFC PATCH] grep: provide sane default to grep_source struct Emily Shaffer @ 2019-05-16 3:07 ` Junio C Hamano 2019-05-16 3:13 ` Jonathan Nieder 2019-05-16 3:11 ` Jonathan Nieder 2019-05-16 21:44 ` [PATCH v2] " Emily Shaffer 2 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2019-05-16 3:07 UTC (permalink / raw) To: Emily Shaffer; +Cc: git, jrn Emily Shaffer <emilyshaffer@google.com> writes: > I don't know if adding a placeholder name is the right answer (hence RFC > patch). > > Jonathan Nieder proposed alternatively adding some check to grep_source() > to ensure that if opt->status_only is unset, gs->name must be non-NULL > (and yell about it if not), as well as some extra comments indicating > what assumptions are made about the data coming into functions like > grep_source(). I'm fine with that as well (although I'm not sure it That's an interesting problem. Currently the only use of the function is to see if the log message matches with the given pattern (yes/no), but it is conceivable that new callers may want to prepare in-core data and use it to see if that matches the pattern, or even to _show_ the lines that match the pattern (possibly with its own .output callback). Such a caller may even make repeated calls to the function, and want to give hint in the output to help users identify which in-core data produced hits, which implies that a hardcoded "(in core)" is not such a great idea. In a future where we support such a caller, the function signature of grep_buffer() would change to include the name, and you would be passing that to grep_source_init(). Until that happens, it would be OK to use a hardcoded "in-core" constant here, but to futureproof for such a future, I think it makes sense to have a check for a BUG() suggested by jrn---that would become useful once callers of grep_buffer() can give names prevent them from passing NULL when it later would be used. > makes sense semantically to require a name which the user probably can't > easily set, or else ban the user from printing LOC during grep). Mostly > I'm happy with any solution besides a segfault with no error logging :) > > Thanks in advance for everyone's thoughts. > Emily > > > grep.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/grep.c b/grep.c > index 0d50598acd..fd84454faf 100644 > --- a/grep.c > +++ b/grep.c > @@ -2045,7 +2045,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size) > struct grep_source gs; > int r; > > - grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL); > + grep_source_init(&gs, GREP_SOURCE_BUF, _("(in memory)"), NULL, NULL); > gs.buf = buf; > gs.size = size; ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH] grep: provide sane default to grep_source struct 2019-05-16 3:07 ` Junio C Hamano @ 2019-05-16 3:13 ` Jonathan Nieder 0 siblings, 0 replies; 15+ messages in thread From: Jonathan Nieder @ 2019-05-16 3:13 UTC (permalink / raw) To: Junio C Hamano; +Cc: Emily Shaffer, git Junio C Hamano wrote: > Currently the only use of the function is to see if the log message > matches with the given pattern (yes/no), but it is conceivable that > new callers may want to prepare in-core data and use it to see if > that matches the pattern, or even to _show_ the lines that match the > pattern (possibly with its own .output callback). Oh, good point about .output. That answers my question about getting the output in the right place. [...] >> Thanks in advance for everyone's thoughts. >> Emily Thanks, both. Jonathan ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH] grep: provide sane default to grep_source struct 2019-05-16 2:00 [RFC PATCH] grep: provide sane default to grep_source struct Emily Shaffer 2019-05-16 3:07 ` Junio C Hamano @ 2019-05-16 3:11 ` Jonathan Nieder 2019-05-16 21:05 ` Emily Shaffer 2019-05-16 21:44 ` [PATCH v2] " Emily Shaffer 2 siblings, 1 reply; 15+ messages in thread From: Jonathan Nieder @ 2019-05-16 3:11 UTC (permalink / raw) To: Emily Shaffer; +Cc: git Hi, Emily Shaffer wrote: > grep_buffer creates a struct grep_source gs and calls grep_source() > with it. However, gs.name is null, which eventually produces a > segmentation fault in > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is > not set. Thanks for catching it. Taking a step back, I think the problem is in the definition of "struct grep_source": struct grep_source { char *name; enum grep_source_type { GREP_SOURCE_OID, GREP_SOURCE_FILE, GREP_SOURCE_BUF, } type; void *identifier; ... }; What is the difference between a 'name' and an 'identifier'? Who is responsible for free()ing them? Can they be NULL? This is pretty underdocumented for a public type. If we take the point of view that 'name' should always be non-NULL, then I wonder: - can we document that? - can grep_source_init enforce that? - can we take advantage of that in grep_source as well, as a sanity check that the grep_source has been initialized? - while we're here, can we describe what the field is used for (prefixing output with context before a ":", I believe)? [...] > This seems to be unreachable from existing commands but is reachable in > the event that someone rolls their own revision walk and neglects to set > rev_info->grep_filter->status_only. init_revisions sets that field, so a caller would have to replace grep_filter or unset status_only to trigger this. > Conceivably, someone might want to > print all changed lines of all commits which match some regex. Hm, does that work well? The caller of grep_buffer in revision.c doesn't seem to be careful about where in the output hits would be printed, but I might be missing something. [...] > I ran into this issue while I was working on a tutorial on rolling your > own revision walk. I didn't want to print all the lines associated with > a matching change, but it didn't seem good that a seemingly-sane > grep_filter config was segfaulting. Good find, and thanks for taking the time to make it easier to debug for the next person. [...] > Jonathan Nieder proposed alternatively adding some check to grep_source() > to ensure that if opt->status_only is unset, gs->name must be non-NULL > (and yell about it if not), as well as some extra comments indicating > what assumptions are made about the data coming into functions like > grep_source(). I'm fine with that as well (although I'm not sure it > makes sense semantically to require a name which the user probably can't > easily set, or else ban the user from printing LOC during grep). Mostly > I'm happy with any solution besides a segfault with no error logging :) Let's compare the two possibilities. The advantage of "(in memory)" is that it Just Works, which should make a nicer development experience with getting a new caller mostly working on the way to getting them working just the way you want. The disadvantage is that if we start outputting that in production, we and static analyzers are less likely to notice. In other words, printing "(in memory)" is leaking details to the end user that do not match what the end user asked for. NULL would instead produce a crash, prompting the author of the caller to fix it. What was particularly pernicious about this example is that the NULL dereference only occurs if the grep has a match. So I suppose I'm leaning toward (in addition to adding some comments to document the struct) adding a check like if (!gs->name && !opt->status_only) BUG("grep calls that could print name require name"); to grep_source. That would also sidestep the question of whether this debugging aid should be translated. :) Sensible? Thanks, Jonathan ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH] grep: provide sane default to grep_source struct 2019-05-16 3:11 ` Jonathan Nieder @ 2019-05-16 21:05 ` Emily Shaffer 0 siblings, 0 replies; 15+ messages in thread From: Emily Shaffer @ 2019-05-16 21:05 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git On Wed, May 15, 2019 at 08:11:52PM -0700, Jonathan Nieder wrote: > Hi, > > Emily Shaffer wrote: > > > grep_buffer creates a struct grep_source gs and calls grep_source() > > with it. However, gs.name is null, which eventually produces a > > segmentation fault in > > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is > > not set. > > Thanks for catching it. Taking a step back, I think the problem is in > the definition of "struct grep_source": > > struct grep_source { > char *name; > > enum grep_source_type { > GREP_SOURCE_OID, > GREP_SOURCE_FILE, > GREP_SOURCE_BUF, > } type; > void *identifier; > > ... > }; > > What is the difference between a 'name' and an 'identifier'? Who is > responsible for free()ing them? Can they be NULL? This is pretty > underdocumented for a public type. > > If we take the point of view that 'name' should always be non-NULL, > then I wonder: > > - can we document that? > - can grep_source_init enforce that? Today grep_source_init() defaults to NULL. So if we decide that 'name' should be non-NULL it will be somewhat changing the intent. void grep_source_init(struct grep_source *gs, enum grep_source_type type, const char *name, const char *path, const void *identifier) { gs->type = type; gs->name = xstrdup_or_null(name); ... > - can we take advantage of that in grep_source as well, as a sanity > check that the grep_source has been initialized? > - while we're here, can we describe what the field is used for > (prefixing output with context before a ":", I believe)? In general the documentation for grep.[ch] is pretty light. There aren't any header comments and `Documentation/technical/api-grep.txt` is a todo. So I agree that we should document it anywhere we can. > > Jonathan Nieder proposed alternatively adding some check to grep_source() > > to ensure that if opt->status_only is unset, gs->name must be non-NULL > > (and yell about it if not), as well as some extra comments indicating > > what assumptions are made about the data coming into functions like > > grep_source(). I'm fine with that as well (although I'm not sure it > > makes sense semantically to require a name which the user probably can't > > easily set, or else ban the user from printing LOC during grep). Mostly > > I'm happy with any solution besides a segfault with no error logging :) > > Let's compare the two possibilities. > > The advantage of "(in memory)" is that it Just Works, which should > make a nicer development experience with getting a new caller mostly > working on the way to getting them working just the way you want. > > The disadvantage is that if we start outputting that in production, we > and static analyzers are less likely to notice. In other words, > printing "(in memory)" is leaking details to the end user that do not > match what the end user asked for. NULL would instead produce a > crash, prompting the author of the caller to fix it. > > What was particularly pernicious about this example is that the NULL > dereference only occurs if the grep has a match. So I suppose I'm > leaning toward (in addition to adding some comments to document the > struct) adding a check like > > if (!gs->name && !opt->status_only) > BUG("grep calls that could print name require name"); > > to grep_source. Why not both? :) But seriously, I am planning to push a second patch with both, per Junio's reply. I'll consider the documentation out of scope for now since I'm not sure I know enough about grep.[ch]'s intent or history to document anything yet. :) > > That would also sidestep the question of whether this debugging aid > should be translated. :) > > Sensible? > > Thanks, > Jonathan ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2] grep: provide sane default to grep_source struct 2019-05-16 2:00 [RFC PATCH] grep: provide sane default to grep_source struct Emily Shaffer 2019-05-16 3:07 ` Junio C Hamano 2019-05-16 3:11 ` Jonathan Nieder @ 2019-05-16 21:44 ` Emily Shaffer 2019-05-16 22:02 ` Jeff King 2019-05-22 0:34 ` [PATCH v3] " Emily Shaffer 2 siblings, 2 replies; 15+ messages in thread From: Emily Shaffer @ 2019-05-16 21:44 UTC (permalink / raw) To: emilyshaffer; +Cc: git, jrn grep_buffer creates a struct grep_source gs and calls grep_source() with it. However, gs.name is null, which eventually produces a segmentation fault in grep_source()->grep_source_1()->show_line() when grep_opt.status_only is not set. This seems to be unreachable from existing commands but is reachable in the event that someone rolls their own revision walk and neglects to set rev_info->grep_filter->status_only. Conceivably, someone might want to print all changed lines of all commits which match some regex. To futureproof, give developers a heads-up if grep_source() is called and a valid name field is expected but not given. This path should be unreachable now, but in the future may become reachable if developers want to add the ability to use grep_buffer() in prepared in-core data and provide hints to the user about where the data came from. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> --- I've added the BUG() line to grep_source(). I considered adding it to grep_source_1() but didn't see much difference. Looks like grep_source_1() exists purely as a helper to grep_source() and is never called by anybody else... but it's the function where the crash would happen. So I'm not sure. I also modified the wording for the BUG("") statement a little from jrn's suggestion to hopefully make it more explicit, but I welcome wordsmithing. grep.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/grep.c b/grep.c index 0d50598acd..6ceb880a8c 100644 --- a/grep.c +++ b/grep.c @@ -2021,6 +2021,9 @@ static int chk_hit_marker(struct grep_expr *x) int grep_source(struct grep_opt *opt, struct grep_source *gs) { + if (!opt->status_only && gs->name == NULL) + BUG("grep call which could print a name requires " + "grep_source.name be non-NULL"); /* * we do not have to do the two-pass grep when we do not check * buffer-wide "all-match". @@ -2045,7 +2048,11 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size) struct grep_source gs; int r; - grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL); + /* TODO: In the future it may become desirable to pass in the name as + * an argument to grep_buffer(). At that time, "(in core)" should be + * replaced. + */ + grep_source_init(&gs, GREP_SOURCE_BUF, _("(in core)"), NULL, NULL); gs.buf = buf; gs.size = size; -- 2.21.0.1020.gf2820cf01a-goog ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2] grep: provide sane default to grep_source struct 2019-05-16 21:44 ` [PATCH v2] " Emily Shaffer @ 2019-05-16 22:02 ` Jeff King 2019-05-21 23:52 ` Emily Shaffer 2019-05-22 0:34 ` [PATCH v3] " Emily Shaffer 1 sibling, 1 reply; 15+ messages in thread From: Jeff King @ 2019-05-16 22:02 UTC (permalink / raw) To: Emily Shaffer; +Cc: git, jrn On Thu, May 16, 2019 at 02:44:44PM -0700, Emily Shaffer wrote: > grep_buffer creates a struct grep_source gs and calls grep_source() > with it. However, gs.name is null, which eventually produces a > segmentation fault in > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is > not set. Funny wrapping here. > This seems to be unreachable from existing commands but is reachable in > the event that someone rolls their own revision walk and neglects to set > rev_info->grep_filter->status_only. Conceivably, someone might want to > print all changed lines of all commits which match some regex. > > To futureproof, give developers a heads-up if grep_source() is called > and a valid name field is expected but not given. This path should be > unreachable now, but in the future may become reachable if developers > want to add the ability to use grep_buffer() in prepared in-core data > and provide hints to the user about where the data came from. So I guess this is probably my fault, as I was the one who factored out the grep_source bits long ago (though I would also not be surprised if it could be triggered before, too). I think adding a BUG() is the right thing to do. > I've added the BUG() line to grep_source(). I considered adding it to > grep_source_1() but didn't see much difference. Looks like > grep_source_1() exists purely as a helper to grep_source() and is never > called by anybody else... but it's the function where the crash would > happen. So I'm not sure. I agree it probably doesn't matter. I'd probably stick at at the top of grep_source_1(), just with the rationale that it could possibly catch more cases if somebody ever refactors grep_source() to have two different callers (e.g., imagine we later add a variant that takes more options, but leave the old one to avoid disrupting existing callers). > I also modified the wording for the BUG("") statement a little from > jrn's suggestion to hopefully make it more explicit, but I welcome > wordsmithing. > [...] > + BUG("grep call which could print a name requires " > + "grep_source.name be non-NULL"); It looks fine to me. The point is that nobody should ever see this. And if they do, we should already get a file/line number telling us where the problem is (and a coredump to poke at). So as long as it's enough to convince a regular user that they should make a report to the mailing list, it will have done its job. > diff --git a/grep.c b/grep.c > index 0d50598acd..6ceb880a8c 100644 > --- a/grep.c > +++ b/grep.c > @@ -2021,6 +2021,9 @@ static int chk_hit_marker(struct grep_expr *x) > > int grep_source(struct grep_opt *opt, struct grep_source *gs) > { > + if (!opt->status_only && gs->name == NULL) > + BUG("grep call which could print a name requires " > + "grep_source.name be non-NULL"); > /* > * we do not have to do the two-pass grep when we do not check > * buffer-wide "all-match". Minor nit, but can we put a blank line between the new block and the comment? > @@ -2045,7 +2048,11 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size) > struct grep_source gs; > int r; > > - grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL); > + /* TODO: In the future it may become desirable to pass in the name as > + * an argument to grep_buffer(). At that time, "(in core)" should be > + * replaced. > + */ > + grep_source_init(&gs, GREP_SOURCE_BUF, _("(in core)"), NULL, NULL); Hmm. I don't see much point in this one, as it would just avoid triggering our BUG(). If somebody is adding new grep_buffer() callers that don't use status_only, wouldn't we want them to see the BUG() to know that they need to refactor grep_buffer() to provide a name? -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2] grep: provide sane default to grep_source struct 2019-05-16 22:02 ` Jeff King @ 2019-05-21 23:52 ` Emily Shaffer 2019-05-22 0:17 ` Jonathan Nieder 0 siblings, 1 reply; 15+ messages in thread From: Emily Shaffer @ 2019-05-21 23:52 UTC (permalink / raw) To: Jeff King; +Cc: git, jrn On Thu, May 16, 2019 at 06:02:54PM -0400, Jeff King wrote: > On Thu, May 16, 2019 at 02:44:44PM -0700, Emily Shaffer wrote: > > > grep_buffer creates a struct grep_source gs and calls grep_source() > > with it. However, gs.name is null, which eventually produces a > > segmentation fault in > > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is > > not set. > > Funny wrapping here. > > > This seems to be unreachable from existing commands but is reachable in > > the event that someone rolls their own revision walk and neglects to set > > rev_info->grep_filter->status_only. Conceivably, someone might want to > > print all changed lines of all commits which match some regex. > > > > To futureproof, give developers a heads-up if grep_source() is called > > and a valid name field is expected but not given. This path should be > > unreachable now, but in the future may become reachable if developers > > want to add the ability to use grep_buffer() in prepared in-core data > > and provide hints to the user about where the data came from. > > So I guess this is probably my fault, as I was the one who factored out > the grep_source bits long ago (though I would also not be surprised if > it could be triggered before, too). > > I think adding a BUG() is the right thing to do. > > > I've added the BUG() line to grep_source(). I considered adding it to > > grep_source_1() but didn't see much difference. Looks like > > grep_source_1() exists purely as a helper to grep_source() and is never > > called by anybody else... but it's the function where the crash would > > happen. So I'm not sure. > > I agree it probably doesn't matter. I'd probably stick at at the top of > grep_source_1(), just with the rationale that it could possibly catch > more cases if somebody ever refactors grep_source() to have two > different callers (e.g., imagine we later add a variant that takes more > options, but leave the old one to avoid disrupting existing callers). I think this combined with needing a history lesson to know not to call grep_source_1() makes a pretty strong case for moving it. I'll put the BUG() line at the top of grep_source_1() instead. > > > I also modified the wording for the BUG("") statement a little from > > jrn's suggestion to hopefully make it more explicit, but I welcome > > wordsmithing. > > [...] > > + BUG("grep call which could print a name requires " > > + "grep_source.name be non-NULL"); > > It looks fine to me. The point is that nobody should ever see this. And > if they do, we should already get a file/line number telling us where > the problem is (and a coredump to poke at). So as long as it's enough to > convince a regular user that they should make a report to the mailing > list, it will have done its job. > > > diff --git a/grep.c b/grep.c > > index 0d50598acd..6ceb880a8c 100644 > > --- a/grep.c > > +++ b/grep.c > > @@ -2021,6 +2021,9 @@ static int chk_hit_marker(struct grep_expr *x) > > > > int grep_source(struct grep_opt *opt, struct grep_source *gs) > > { > > + if (!opt->status_only && gs->name == NULL) > > + BUG("grep call which could print a name requires " > > + "grep_source.name be non-NULL"); > > /* > > * we do not have to do the two-pass grep when we do not check > > * buffer-wide "all-match". > > Minor nit, but can we put a blank line between the new block and the > comment? I made sure to include blank lines surrounding this block in its new home. > > > @@ -2045,7 +2048,11 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size) > > struct grep_source gs; > > int r; > > > > - grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL); > > + /* TODO: In the future it may become desirable to pass in the name as > > + * an argument to grep_buffer(). At that time, "(in core)" should be > > + * replaced. > > + */ > > + grep_source_init(&gs, GREP_SOURCE_BUF, _("(in core)"), NULL, NULL); > > Hmm. I don't see much point in this one, as it would just avoid > triggering our BUG(). If somebody is adding new grep_buffer() callers > that don't use status_only, wouldn't we want them to see the BUG() to > know that they need to refactor grep_buffer() to provide a name? For me, I'd rather have *some* functionality available without doing a large refactor. The line now isn't inaccurate, and `git grep --fixed "(in core)"` shows it's the only instance of that string in the codebase, so it's easy to track down where it's happening. Though the BUG() line is pretty easy to track down too. Having a default field like this might help folks move quickly if they just want proof of concept that their idea works without adding a bunch of plumbing. But then, they could add it on their own if they're that early in their feature.... Can we think of a reason anybody would want to be able to use it this way with the placeholder string? > > -Peff Thanks for your comments Peff. - Emily ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2] grep: provide sane default to grep_source struct 2019-05-21 23:52 ` Emily Shaffer @ 2019-05-22 0:17 ` Jonathan Nieder 0 siblings, 0 replies; 15+ messages in thread From: Jonathan Nieder @ 2019-05-22 0:17 UTC (permalink / raw) To: Emily Shaffer; +Cc: Jeff King, git Hi, Emily Shaffer wrote: > On Thu, May 16, 2019 at 06:02:54PM -0400, Jeff King wrote: >> On Thu, May 16, 2019 at 02:44:44PM -0700, Emily Shaffer wrote: >>> + /* TODO: In the future it may become desirable to pass in the name as >>> + * an argument to grep_buffer(). At that time, "(in core)" should be >>> + * replaced. >>> + */ (micronit, likely moot: Git's multi-line comments start with "/*" on its own line: /* * NEEDSWORK: Passing the name in as an argument would allow * "(in core)" to be replaced. */ .) >>> + grep_source_init(&gs, GREP_SOURCE_BUF, _("(in core)"), NULL, NULL); >> >> Hmm. I don't see much point in this one, as it would just avoid >> triggering our BUG(). If somebody is adding new grep_buffer() callers >> that don't use status_only, wouldn't we want them to see the BUG() to >> know that they need to refactor grep_buffer() to provide a name? [...] > Can we think of a reason anybody would want to be able to use it this > way with the placeholder string? I agree with Peff here: using NULL puts this in a better place with respect to Rusty's API design manifesto[1]. With the "(in core)" default, I may end up triggering the "(in core)" behavior in production, because there is not a clear enough signal that my code path is making a mistake. That's problematic because it gives the end user a confusing experience: the end user cares where the line comes from, not that it spent a second or two in core. With the NULL default, *especially* after this patch, such usage would instead trigger a BUG: line in output, meaning - if it gets exercised in tests, the test will fail, prompting the patch auther to pass in a more appropriate label - if it gets missed in tests and gets triggered in production, the error message makes it clear that this is a mistake so the user is likely to report a bug instead of assuming this is deliberate but confusing behavior In that vein, this patch is very helpful, since the BUG would trip *consistently*, not only when the grep pattern matches. Failing consistently like this is a huge improvement in API usability. It would be even better if we could catch the problem at compile time, but one thing at a time. Thanks, Jonathan [1] https://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html, https://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3] grep: provide sane default to grep_source struct 2019-05-16 21:44 ` [PATCH v2] " Emily Shaffer 2019-05-16 22:02 ` Jeff King @ 2019-05-22 0:34 ` Emily Shaffer 2019-05-22 0:58 ` Jonathan Nieder ` (2 more replies) 1 sibling, 3 replies; 15+ messages in thread From: Emily Shaffer @ 2019-05-22 0:34 UTC (permalink / raw) To: emilyshaffer; +Cc: git, jrn grep_buffer creates a struct grep_source gs and calls grep_source() with it. However, gs.name is null, which eventually produces a segmentation fault in grep_source()->grep_source_1()->show_line() when grep_opt.status_only is not set. This seems to be unreachable from existing commands but is reachable in the event that someone rolls their own revision walk and neglects to set rev_info->grep_filter->status_only. Conceivably, someone might want to print all changed lines of all commits which match some regex. To futureproof, give developers a heads-up if grep_source() is called and a valid name field is expected but not given. This path should be unreachable now, but in the future may become reachable if developers want to add the ability to use grep_buffer() in prepared in-core data and provide hints to the user about where the data came from. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> --- I think Peff and Jonathan are right. If someone does want to hack away on something in the very early stages, the BUG() line gives a pretty clear indicator of what to modify to make it work. I also moved the BUG() to grep_source_1() to keep it close to the error itself and reflowed the commit message. - Emily grep.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/grep.c b/grep.c index 0d50598acd..f7c3a5803e 100644 --- a/grep.c +++ b/grep.c @@ -1780,6 +1780,10 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle enum grep_context ctx = GREP_CONTEXT_HEAD; xdemitconf_t xecfg; + if (!opt->status_only && gs->name == NULL) + BUG("grep call which could print a name requires " + "grep_source.name be non-NULL"); + if (!opt->output) opt->output = std_output; -- 2.21.0.1020.gf2820cf01a-goog ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v3] grep: provide sane default to grep_source struct 2019-05-22 0:34 ` [PATCH v3] " Emily Shaffer @ 2019-05-22 0:58 ` Jonathan Nieder 2019-05-22 4:01 ` Jeff King 2019-05-23 20:23 ` [PATCH v4] grep: fail if call could output and name is null Emily Shaffer 2 siblings, 0 replies; 15+ messages in thread From: Jonathan Nieder @ 2019-05-22 0:58 UTC (permalink / raw) To: Emily Shaffer; +Cc: git, Jeff King Emily Shaffer wrote: > Subject: grep: provide sane default to grep_source struct This subject line doesn't describe what the patch does any more. How about something like grep: fail early if call could print output and name is not set ? > grep_buffer creates a struct grep_source gs and calls grep_source() > with it. However, gs.name is null, which eventually produces a > segmentation fault in grep_source()->grep_source_1()->show_line() > when grep_opt.status_only is not set. This is the place to describe the motivation behind the patch: - what is the current behavior? - what is the motivation behind the current behavior? - in what scenario does it fail? - how does it fail, and what is undesirable about that failure? Perhaps something like grep_source, the workhorse of Git's grep library, allows passing in an arbitrary grep_source representing the haystack in which to search for a needle matching a pattern. In most callers, the grep_source::name field is set to an appropriate prefix to print before a colon when a result matches: README:Git is an Open Source project covered by the GNU General Public One caller, grep_buffer, leaves the 'name' field set to NULL because there is not enough context to determine an appropriate name to put in such output lines. In practice, this has been fine because the only caller of grep_buffer is "git log --grep" and that caller sets status_only to disable output (and only check whether a match exists). But this is brittle: a future caller can call grep_buffer without status_only set, and as soon as it hits a match, grep_source will segfault in the process of trying to print (null):Git is an Open Source project covered by the GNU General Public > This seems to be unreachable from existing commands but is reachable in > the event that someone rolls their own revision walk and neglects to set > rev_info->grep_filter->status_only. Conceivably, someone might want to > print all changed lines of all commits which match some regex. Focusing on the use case instead of the details of how it is implemented, we can simplify this to For example, such a future caller might want to print all matching lines from commits which match a regex. > To futureproof, give developers a heads-up if grep_source() is called > and a valid name field is expected but not given. This path should be > unreachable now, but in the future may become reachable if developers > want to add the ability to use grep_buffer() in prepared in-core data > and provide hints to the user about where the data came from. Here we can concisely describe the improved behavior with the fix: Futureproof by diagnosing a use of the API that could trigger that condition early, before we know whether the pattern matches: BUG: git.c:832: grep call which could print a name requires grep_source.name be non-NULL Aborted And then what it makes possible: This way, the caller's author gets a quick indication of how to fix it, by ensuring that either grep_source.name is set or that status_only is set to prevent printing output. And how it came up: Noticed while adding such a call in a tutorial on revision walks. > Signed-off-by: Emily Shaffer <emilyshaffer@google.com> > --- [...] > I also moved the BUG() to grep_source_1() to keep it close to the error > itself and reflowed the commit message. Makes sense. Thanks for these summaries of what changed between each revision of the patch --- they're very helpful. With whatever subset of the above suggested commit message changes make sense, Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Thanks. [...] > --- a/grep.c > +++ b/grep.c > @@ -1780,6 +1780,10 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle > enum grep_context ctx = GREP_CONTEXT_HEAD; > xdemitconf_t xecfg; > > + if (!opt->status_only && gs->name == NULL) optional: can use !gs->name here. > + BUG("grep call which could print a name requires " > + "grep_source.name be non-NULL"); ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] grep: provide sane default to grep_source struct 2019-05-22 0:34 ` [PATCH v3] " Emily Shaffer 2019-05-22 0:58 ` Jonathan Nieder @ 2019-05-22 4:01 ` Jeff King 2019-05-23 20:23 ` [PATCH v4] grep: fail if call could output and name is null Emily Shaffer 2 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2019-05-22 4:01 UTC (permalink / raw) To: Emily Shaffer; +Cc: git, jrn On Tue, May 21, 2019 at 05:34:02PM -0700, Emily Shaffer wrote: > I think Peff and Jonathan are right. If someone does want to hack away > on something in the very early stages, the BUG() line gives a pretty > clear indicator of what to modify to make it work. > > I also moved the BUG() to grep_source_1() to keep it close to the error > itself and reflowed the commit message. Thanks. This version looks good to me (though if you took any of Jonathan's suggestions in the other part of the thread that would be fine with me, too). -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v4] grep: fail if call could output and name is null 2019-05-22 0:34 ` [PATCH v3] " Emily Shaffer 2019-05-22 0:58 ` Jonathan Nieder 2019-05-22 4:01 ` Jeff King @ 2019-05-23 20:23 ` Emily Shaffer 2019-05-23 20:34 ` Jonathan Nieder 2 siblings, 1 reply; 15+ messages in thread From: Emily Shaffer @ 2019-05-23 20:23 UTC (permalink / raw) To: git; +Cc: Emily Shaffer, Jeff King, Jonathan Nieder grep_source(), which performs much of the work for Git's grep library, allows passing an arbitrary struct grep_source which represents the text which grep_source() should search to match a pattern in the provided struct grep_opt. In most callers, the grep_source::name field is set to an appropriate prefix to print before a colon when a result matches: README:Git is an Open Source project covered by the GNU General One caller, grep_buffer(), leaves the grep_source::name field set to NULL because there isn't enough context to determine an appropriate name for this kind of output line. In practice, this has been fine: the only caller of grep_buffer() is "git log --grep", and that caller sets grep_opt::status_only, which disables output and only checks whether a match exists. But this is brittle: a future caller can call grep_buffer() without grep_opt::status_only set, and as soon as it hits a match, grep_source() will try to print the match and segfault: (null):Git is an Open Source project covered by the GNU General For example, a future caller might want to print all matching lines from commits which match a regex. Futureproof by diagnosing early a use of the API that could trigger that condition, before we know whether the pattern matches: BUG: grep.c:1783: grep call which could print a name requires grep_source.name be non-NULL Aborted This way, the caller's author gets an indication of how to fix the issue - by providing grep_source::name or setting grep_opt::status_only - and they are warned of the potential for a segfault unconditionally, rather than only if there is a match. Noticed while adding such a call to a tutorial on revision walks. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> --- Since v3, only the commit message has changed. Reworked based on Jonathan Nieder's suggestions (with some modifications for readability). - Emily grep.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/grep.c b/grep.c index 0d50598acd..f7c3a5803e 100644 --- a/grep.c +++ b/grep.c @@ -1780,6 +1780,10 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle enum grep_context ctx = GREP_CONTEXT_HEAD; xdemitconf_t xecfg; + if (!opt->status_only && gs->name == NULL) + BUG("grep call which could print a name requires " + "grep_source.name be non-NULL"); + if (!opt->output) opt->output = std_output; -- 2.22.0.rc1.257.g3120a18244-goog ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v4] grep: fail if call could output and name is null 2019-05-23 20:23 ` [PATCH v4] grep: fail if call could output and name is null Emily Shaffer @ 2019-05-23 20:34 ` Jonathan Nieder 2019-05-28 17:57 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Jonathan Nieder @ 2019-05-23 20:34 UTC (permalink / raw) To: Emily Shaffer; +Cc: git, Jeff King Emily Shaffer wrote: > Signed-off-by: Emily Shaffer <emilyshaffer@google.com> > Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> > --- > Since v3, only the commit message has changed. Reworked based on > Jonathan Nieder's suggestions (with some modifications for readability). > > grep.c | 4 ++++ > 1 file changed, 4 insertions(+) This is indeed Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Thanks for your patient work. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4] grep: fail if call could output and name is null 2019-05-23 20:34 ` Jonathan Nieder @ 2019-05-28 17:57 ` Junio C Hamano 0 siblings, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2019-05-28 17:57 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Emily Shaffer, git, Jeff King Jonathan Nieder <jrnieder@gmail.com> writes: > Emily Shaffer wrote: > >> Signed-off-by: Emily Shaffer <emilyshaffer@google.com> >> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> >> --- >> Since v3, only the commit message has changed. Reworked based on >> Jonathan Nieder's suggestions (with some modifications for readability). >> >> grep.c | 4 ++++ >> 1 file changed, 4 insertions(+) > > This is indeed > Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> > > Thanks for your patient work. Thanks, both. Will queue. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2019-05-28 17:57 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-05-16 2:00 [RFC PATCH] grep: provide sane default to grep_source struct Emily Shaffer 2019-05-16 3:07 ` Junio C Hamano 2019-05-16 3:13 ` Jonathan Nieder 2019-05-16 3:11 ` Jonathan Nieder 2019-05-16 21:05 ` Emily Shaffer 2019-05-16 21:44 ` [PATCH v2] " Emily Shaffer 2019-05-16 22:02 ` Jeff King 2019-05-21 23:52 ` Emily Shaffer 2019-05-22 0:17 ` Jonathan Nieder 2019-05-22 0:34 ` [PATCH v3] " Emily Shaffer 2019-05-22 0:58 ` Jonathan Nieder 2019-05-22 4:01 ` Jeff King 2019-05-23 20:23 ` [PATCH v4] grep: fail if call could output and name is null Emily Shaffer 2019-05-23 20:34 ` Jonathan Nieder 2019-05-28 17:57 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).