git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] remote: initialize values that might not be set
@ 2021-06-07 12:39 Derrick Stolee via GitGitGadget
  2021-06-07 21:59 ` Johannes Schindelin
  0 siblings, 1 reply; 6+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-06-07 12:39 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

I noticed during an unrelated test with Valgrind that these variables
might be left un-set by stat_tracking_info() in some cases. Initialize
them so that a later branch upon their value is consistent.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    remote: initialize values that might not be set
    
    A very minor fixup.
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-974%2Fderrickstolee%2Fremote-uninitialized-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-974/derrickstolee/remote-uninitialized-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/974

 remote.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/remote.c b/remote.c
index c3f85c17ca7c..a116392fb057 100644
--- a/remote.c
+++ b/remote.c
@@ -2101,7 +2101,7 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs,
 int format_tracking_info(struct branch *branch, struct strbuf *sb,
 			 enum ahead_behind_flags abf)
 {
-	int ours, theirs, sti;
+	int ours = 0, theirs = 0, sti = 0;
 	const char *full_base;
 	char *base;
 	int upstream_is_gone = 0;

base-commit: 71ca53e8125e36efbda17293c50027d31681a41f
-- 
gitgitgadget


* Re: [PATCH] remote: initialize values that might not be set
  2021-06-07 12:39 [PATCH] remote: initialize values that might not be set Derrick Stolee via GitGitGadget
@ 2021-06-07 21:59 ` Johannes Schindelin
  2021-06-07 23:21   ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Johannes Schindelin @ 2021-06-07 21:59 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, stolee, Derrick Stolee, Derrick Stolee

Hi Stolee,

On Mon, 7 Jun 2021, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> I noticed during an unrelated test with Valgrind that these variables
> might be left un-set by stat_tracking_info() in some cases. Initialize
> them so that a later branch upon their value is consistent.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>     remote: initialize values that might not be set
>
>     A very minor fixup.
>
>     Thanks, -Stolee
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-974%2Fderrickstolee%2Fremote-uninitialized-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-974/derrickstolee/remote-uninitialized-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/974
>
>  remote.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/remote.c b/remote.c
> index c3f85c17ca7c..a116392fb057 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -2101,7 +2101,7 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs,
>  int format_tracking_info(struct branch *branch, struct strbuf *sb,
>  			 enum ahead_behind_flags abf)
>  {
> -	int ours, theirs, sti;
> +	int ours = 0, theirs = 0, sti = 0;

While I like this change, I am somewhat confused where the values are used
for branching. The only time I see them used when `stat_branch_pair()` has
_not_ initialized `ours` and `theirs` is in those `trace2_data_intmax()`
calls. Otherwise `sti` is set to -1 and the other users of `ours` and
`theirs` aren't reached.

If my reading of the code is correct, maybe the commit message could be
adjusted to talk about tracing instead of branching?

Thanks,
Dscho

>  	const char *full_base;
>  	char *base;
>  	int upstream_is_gone = 0;
>
> base-commit: 71ca53e8125e36efbda17293c50027d31681a41f
> --
> gitgitgadget
>


* Re: [PATCH] remote: initialize values that might not be set
  2021-06-07 21:59 ` Johannes Schindelin
@ 2021-06-07 23:21   ` Junio C Hamano
  2021-06-10  9:24     ` Johannes Schindelin
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2021-06-07 23:21 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Derrick Stolee via GitGitGadget, git, stolee, Derrick Stolee,
	Derrick Stolee

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> diff --git a/remote.c b/remote.c
>> index c3f85c17ca7c..a116392fb057 100644
>> --- a/remote.c
>> +++ b/remote.c
>> @@ -2101,7 +2101,7 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs,
>>  int format_tracking_info(struct branch *branch, struct strbuf *sb,
>>  			 enum ahead_behind_flags abf)
>>  {
>> -	int ours, theirs, sti;
>> +	int ours = 0, theirs = 0, sti = 0;
>
> While I like this change, I am somewhat confused where the values are used
> for branching. The only time I see them used when `stat_branch_pair()` has
> _not_ initialized `ours` and `theirs` is in those `trace2_data_intmax()`
> calls. Otherwise `sti` is set to -1 and the other users of `ours` and
> `theirs` aren't reached.
>
> If my reading of the code is correct, maybe the commit message could be
> adjusted to talk about tracing instead of branching?

I too wondered why initializing them to 0 is safe (instead of hiding
latent bugs).  I think that stat_tracking_info() would always return
-1 if it returns before reaching the point in stat_branch_pair(), but
it is not clear how we can futureproof the whole thing.

If these two are initialized to say -1 here, and then we had some
sanity check, perhaps like so:

	sti = stat_tracking_info(branch, &ours, &theirs, &full_base, 0, abf);
+	assert(sti < 0 || (0 <= ours && 0 <= theirs));
	if (sti < 0) {
		if (!full_base)
	...

to enforce the invariant we assume (i.e. OK sti means ours and
theirs are set), it would allow us to sleep better, perhaps?
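
A minimal, self-contained sketch of the sentinel-plus-assert idea above,
for illustration only. fake_stat_tracking_info() and checked_counts() are
hypothetical stand-ins, not the real remote.c code; the sketch assumes
only what the thread states, namely that a negative return value means
the two counters were never written.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for stat_tracking_info(): when there is no
 * upstream it returns -1 and leaves the output counters untouched;
 * otherwise it fills them in and returns 0 (identical) or 1 (diverged).
 */
static int fake_stat_tracking_info(int has_upstream, int ahead, int behind,
				   int *num_ours, int *num_theirs)
{
	if (!has_upstream)
		return -1;	/* outputs deliberately left alone */
	*num_ours = ahead;
	*num_theirs = behind;
	return (ahead || behind) ? 1 : 0;
}

/*
 * Initialize the counters to a sentinel that no valid count can take,
 * then assert the invariant "a non-negative result implies both
 * counters were set". The counters are copied out so a caller can
 * inspect them.
 */
static int checked_counts(int has_upstream, int ahead, int behind,
			  int *ours_out, int *theirs_out)
{
	int ours = -1, theirs = -1;	/* sentinel: "not set" */
	int sti = fake_stat_tracking_info(has_upstream, ahead, behind,
					  &ours, &theirs);

	/* Fires if a future early return forgets to set the counters. */
	assert(sti < 0 || (0 <= ours && 0 <= theirs));

	*ours_out = ours;
	*theirs_out = theirs;
	return sti;
}
```

With a 0 initializer the same assertion could never fire, because 0 is
also a valid count; the sentinel is what makes a forgotten assignment
detectable.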


* Re: [PATCH] remote: initialize values that might not be set
  2021-06-07 23:21   ` Junio C Hamano
@ 2021-06-10  9:24     ` Johannes Schindelin
  2021-06-11  1:41       ` Junio C Hamano
  2021-06-11 17:56       ` Derrick Stolee
  0 siblings, 2 replies; 6+ messages in thread
From: Johannes Schindelin @ 2021-06-10  9:24 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, stolee, Derrick Stolee,
	Derrick Stolee

Hi Junio,

On Tue, 8 Jun 2021, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> diff --git a/remote.c b/remote.c
> >> index c3f85c17ca7c..a116392fb057 100644
> >> --- a/remote.c
> >> +++ b/remote.c
> >> @@ -2101,7 +2101,7 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs,
> >>  int format_tracking_info(struct branch *branch, struct strbuf *sb,
> >>  			 enum ahead_behind_flags abf)
> >>  {
> >> -	int ours, theirs, sti;
> >> +	int ours = 0, theirs = 0, sti = 0;
> >
> > While I like this change, I am somewhat confused where the values are used
> > for branching. The only time I see them used when `stat_branch_pair()` has
> > _not_ initialized `ours` and `theirs` is in those `trace2_data_intmax()`
> > calls. Otherwise `sti` is set to -1 and the other users of `ours` and
> > `theirs` aren't reached.
> >
> > If my reading of the code is correct, maybe the commit message could be
> > adjusted to talk about tracing instead of branching?
>
> I too wondered why initializing them to 0 is safe (instead of hiding
> latent bugs).  I think that stat_tracking_info() would always return
> -1 if it returns before reaching the point in stat_branch_pair(),

While that is true, I was trying to make a different point: I noticed that
the `ours`/`theirs` variables _are_ used, even if `sti` is negative. The
code that I looked at reads like this:

	int format_tracking_info(struct branch *branch, struct strbuf *sb,
				 enum ahead_behind_flags abf)
	{
		int ours, theirs, sti;
		const char *full_base;
		char *base;
		int upstream_is_gone = 0;

		trace2_region_enter("tracking", "stat_tracking_info", NULL);
		sti = stat_tracking_info(branch, &ours, &theirs, &full_base, 0, abf);
		trace2_data_intmax("tracking", NULL, "stat_tracking_info/ab_flags", abf);
		trace2_data_intmax("tracking", NULL, "stat_tracking_info/ab_result", sti);
		if (abf == AHEAD_BEHIND_FULL) {
		    trace2_data_intmax("tracking", NULL, "stat_tracking_info/ab_ahead", ours);
		    trace2_data_intmax("tracking", NULL, "stat_tracking_info/ab_behind", theirs);
		}
		trace2_region_leave("tracking", "stat_tracking_info", NULL);

		if (sti < 0) {
			if (!full_base)
				return 0;
			upstream_is_gone = 1;
		}

You will notice that there are two Trace2 calls in that conditional `abf
== AHEAD_BEHIND_FULL` block.

Now, what I failed to realize when reviewing this code (and I _bet_ Stolee
was in the same boat when they contributed the patch) is that this version
of `format_tracking_info()` is different from what is in v2.32.0. It is
the version we have in the `microsoft/git` fork, and it has not yet made
it upstream. To be precise, it is this commit:
https://github.com/microsoft/git/commit/91209e591b0398c8334a78001a245807f7eb348a

In light of this, it might make more sense for us to fixup! this commit
thusly:

-- snip --
diff --git a/remote.c b/remote.c
index caed9cbc31b1..cfb7b6bd8d30 100644
--- a/remote.c
+++ b/remote.c
@@ -2110,7 +2110,7 @@ int format_tracking_info(struct branch *branch, struct strbuf *sb,
 	sti = stat_tracking_info(branch, &ours, &theirs, &full_base, 0, abf);
 	trace2_data_intmax("tracking", NULL, "stat_tracking_info/ab_flags", abf);
 	trace2_data_intmax("tracking", NULL, "stat_tracking_info/ab_result", sti);
-	if (abf == AHEAD_BEHIND_FULL) {
+	if (sti >= 0 && abf == AHEAD_BEHIND_FULL) {
 	    trace2_data_intmax("tracking", NULL, "stat_tracking_info/ab_ahead", ours);
 	    trace2_data_intmax("tracking", NULL, "stat_tracking_info/ab_behind", theirs);
 	}
-- snap --

This would be in line with how `format_tracking_info()` avoids accessing
`ours` and `theirs` if `stat_tracking_info()` returned a negative value.
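
The guard can be sketched in isolation as follows. This is only an
illustration under stated assumptions: sketch_trace_tracking() and the
SKETCH_AB_* enum are hypothetical names, and snprintf() into a buffer
stands in for the trace2_data_intmax() calls. The counters are passed by
pointer so that they are read only inside the guarded block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for enum ahead_behind_flags. */
enum sketch_abf { SKETCH_AB_QUICK = 0, SKETCH_AB_FULL = 1 };

/*
 * Formats the fields that would be traced into `out` and returns how
 * many fields were written (2 or 4), or -1 if the buffer is too small.
 * `ours` and `theirs` are dereferenced only when `sti` is non-negative
 * and the full ahead/behind computation was requested.
 */
static int sketch_trace_tracking(char *out, size_t len, int sti,
				 enum sketch_abf abf,
				 const int *ours, const int *theirs)
{
	int fields = 2;
	int n;

	assert(out && len > 0);
	n = snprintf(out, len, "ab_flags=%d ab_result=%d", (int)abf, sti);
	if (n < 0 || (size_t)n >= len)
		return -1;

	if (sti >= 0 && abf == SKETCH_AB_FULL) {
		int m = snprintf(out + n, len - (size_t)n,
				 " ab_ahead=%d ab_behind=%d", *ours, *theirs);
		if (m < 0 || (size_t)m >= len - (size_t)n)
			return -1;
		fields += 2;
	}
	return fields;
}
```

Calling it with a negative `sti` and NULL for both counters is safe in
this sketch, which is the property the guard is meant to provide.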

I opened the corresponding PR here:
https://github.com/microsoft/git/pull/373

> but it is not clear how we can futureproof the whole thing.
>
> If these two are initialized to say -1 here, and then we had some
> sanity check, perhaps like so:
>
> 	sti = stat_tracking_info(branch, &ours, &theirs, &full_base, 0, abf);
> +	assert(sti < 0 || (0 <= ours && 0 <= theirs));
> 	if (sti < 0) {
> 		if (!full_base)
> 	...
>
> to enforce the invariant we assume (i.e. OK sti means ours and
> theirs are set), it would allow us to sleep better, perhaps?

As I have stated elsewhere, I am somewhat doubtful of the benefit those
`assert()` calls give us.

I wish there was a way to integrate some sort of static analysis that
would warn us about using uninitialized values.

Of course, we would have to make sure that it does not show as many false
positives about `struct strbuf` and `struct strvec` "overrunning" on their
buffer. This is what dominates Coverity's report, for example.

FWIW I played a little with CodeQL on GitHub, but have not found time to
continue on that in a long time (my current state is pushed as `codeql`
to my fork: https://github.com/git/git/compare/master...dscho:codeql, just
in case somebody interested wants to take this further).

Ciao,
Dscho


* Re: [PATCH] remote: initialize values that might not be set
  2021-06-10  9:24     ` Johannes Schindelin
@ 2021-06-11  1:41       ` Junio C Hamano
  2021-06-11 17:56       ` Derrick Stolee
  1 sibling, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2021-06-11  1:41 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Derrick Stolee via GitGitGadget, git, stolee, Derrick Stolee,
	Derrick Stolee

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> You will notice that there are two Trace2 calls in that conditional `abf
> == AHEAD_BEHIND_FULL` block.

Yes, the calls use ours/theirs uninitialized.  Is it sensible to
show 0 there, or "(unset)" or its moral equivalent (e.g. "-1")?  Not
showing them indeed is an option, which is what you did below, and
that I find sensible, too.

> Now, what I failed to realize when reviewing this code (and I _bet_ Stolee
> was in the same boat when they contributed the patch) is that this version
> of `format_tracking_info()` is different from what is in v2.32.0. It is
> the version we have in the `microsoft/git` fork, and it has not yet made
> it upstream. To be precise, it is this commit:
> https://github.com/microsoft/git/commit/91209e591b0398c8334a78001a245807f7eb348a
>
> In light of this, it might make more sense for us to fixup! this commit
> thusly:
>
> -- snip --
> diff --git a/remote.c b/remote.c
> index caed9cbc31b1..cfb7b6bd8d30 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -2110,7 +2110,7 @@ int format_tracking_info(struct branch *branch, struct strbuf *sb,
>  	sti = stat_tracking_info(branch, &ours, &theirs, &full_base, 0, abf);
>  	trace2_data_intmax("tracking", NULL, "stat_tracking_info/ab_flags", abf);
>  	trace2_data_intmax("tracking", NULL, "stat_tracking_info/ab_result", sti);
> -	if (abf == AHEAD_BEHIND_FULL) {
> +	if (sti >= 0 && abf == AHEAD_BEHIND_FULL) {
>  	    trace2_data_intmax("tracking", NULL, "stat_tracking_info/ab_ahead", ours);
>  	    trace2_data_intmax("tracking", NULL, "stat_tracking_info/ab_behind", theirs);
>  	}


* Re: [PATCH] remote: initialize values that might not be set
  2021-06-10  9:24     ` Johannes Schindelin
  2021-06-11  1:41       ` Junio C Hamano
@ 2021-06-11 17:56       ` Derrick Stolee
  1 sibling, 0 replies; 6+ messages in thread
From: Derrick Stolee @ 2021-06-11 17:56 UTC (permalink / raw)
  To: Johannes Schindelin, Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, Derrick Stolee,
	Derrick Stolee

On 6/10/2021 5:24 AM, Johannes Schindelin wrote:
> Now, what I failed to realize when reviewing this code (and I _bet_ Stolee
> was in the same boat when they contributed the patch) is that this version
> of `format_tracking_info()` is different from what is in v2.32.0. It is
> the version we have in the `microsoft/git` fork, and it has not yet made
> it upstream. To be precise, it is this commit:
> https://github.com/microsoft/git/commit/91209e591b0398c8334a78001a245807f7eb348a

I _did_ miss that this wasn't necessary in v2.32.0 and only exists in
microsoft/git. My cherry-pick applied cleanly, but I should have been
more careful.

> In light of this, it might make more sense for us to fixup! this commit
> thusly:

I've approved your PR in microsoft/git. That settles the real problem
and this patch can be dropped.

Sorry for the noise!

-Stolee

