git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] commit-reach: fix sorting commits by generation
@ 2018-10-22 21:10 Thomas Gummerer
  2018-10-22 21:53 ` René Scharfe
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gummerer @ 2018-10-22 21:10 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Thomas Gummerer

compare_commit_by_gen is used to sort a list of pointers to 'struct
commit'.  The comparison function for qsort is called with pointers to
the objects it needs to compare, so when sorting a list of 'struct
commit *', the arguments are of type 'struct commit **'.  However,
currently the comparison function casts it's arguments to 'struct
commit *' and uses those, leading to out of bounds memory access and
potentially to wrong results.  Fix that.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

I noticed this by running the test suite through valgrind.  I'm not
familiar with this code, so I'm not sure why this didn't cause any
issues or how they would manifest, but this seems like the right fix
for this function either way.

 commit-reach.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index bc522d6840..9efddfd7a0 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -516,8 +516,8 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
 
 static int compare_commits_by_gen(const void *_a, const void *_b)
 {
-	const struct commit *a = (const struct commit *)_a;
-	const struct commit *b = (const struct commit *)_b;
+	const struct commit *a = *(const struct commit **)_a;
+	const struct commit *b = *(const struct commit **)_b;
 
 	if (a->generation < b->generation)
 		return -1;
-- 
2.19.1.759.g500967bb5e


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] commit-reach: fix sorting commits by generation
  2018-10-22 21:10 [PATCH] commit-reach: fix sorting commits by generation Thomas Gummerer
@ 2018-10-22 21:53 ` René Scharfe
  2018-10-23 20:32   ` Thomas Gummerer
  0 siblings, 1 reply; 4+ messages in thread
From: René Scharfe @ 2018-10-22 21:53 UTC (permalink / raw)
  To: Thomas Gummerer, git; +Cc: Derrick Stolee

Am 22.10.2018 um 23:10 schrieb Thomas Gummerer:
> compare_commit_by_gen is used to sort a list of pointers to 'struct
> commit'.  The comparison function for qsort is called with pointers to
> the objects it needs to compare, so when sorting a list of 'struct
> commit *', the arguments are of type 'struct commit **'.  However,
> currently the comparison function casts it's arguments to 'struct
> commit *' and uses those, leading to out of bounds memory access and
> potentially to wrong results.  Fix that.
> 
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
> 
> I noticed this by running the test suite through valgrind.  I'm not
> familiar with this code, so I'm not sure why this didn't cause any
> issues or how they would manifest, but this seems like the right fix
> for this function either way.

Right; I sent a similar patch a while ago, but it seems to have fallen
through the cracks:

https://public-inbox.org/git/d1b58614-989f-5998-6c53-c19eee409a2f@web.de/

Anyway, your implied question was discussed back then.  Derrick wrote:

   The reason to sort is to hopefully minimize the amount we walk by 
   exploring the "lower" commits first. This is a performance-only thing, 
   not a correctness issue (which is why the bug exists). Even then, it is 
   just a heuristic.

Does b6723e4671 in pu (commit-reach: fix first-parent heuristic) change
that picture?  Did a quick test and found no performance difference with
and without the fix on top, i.e. proper sorting didn't seem to matter.

>  commit-reach.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/commit-reach.c b/commit-reach.c
> index bc522d6840..9efddfd7a0 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -516,8 +516,8 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
>  
>  static int compare_commits_by_gen(const void *_a, const void *_b)
>  {
> -	const struct commit *a = (const struct commit *)_a;
> -	const struct commit *b = (const struct commit *)_b;
> +	const struct commit *a = *(const struct commit **)_a;
> +	const struct commit *b = *(const struct commit **)_b;
>  
>  	if (a->generation < b->generation)
>  		return -1;
> 

Looks good to me.

René

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] commit-reach: fix sorting commits by generation
  2018-10-22 21:53 ` René Scharfe
@ 2018-10-23 20:32   ` Thomas Gummerer
  2018-10-24 13:19     ` Derrick Stolee
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gummerer @ 2018-10-23 20:32 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Derrick Stolee

On 10/22, René Scharfe wrote:
> Am 22.10.2018 um 23:10 schrieb Thomas Gummerer:
> > compare_commit_by_gen is used to sort a list of pointers to 'struct
> > commit'.  The comparison function for qsort is called with pointers to
> > the objects it needs to compare, so when sorting a list of 'struct
> > commit *', the arguments are of type 'struct commit **'.  However,
> > currently the comparison function casts it's arguments to 'struct
> > commit *' and uses those, leading to out of bounds memory access and
> > potentially to wrong results.  Fix that.
> > 
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> > 
> > I noticed this by running the test suite through valgrind.  I'm not
> > familiar with this code, so I'm not sure why this didn't cause any
> > issues or how they would manifest, but this seems like the right fix
> > for this function either way.
> 
> Right; I sent a similar patch a while ago, but it seems to have fallen
> through the cracks:
> 
> https://public-inbox.org/git/d1b58614-989f-5998-6c53-c19eee409a2f@web.de/

Whoops I didn't notice that, I only checked whether the problem still
exists in pu.  I'd be more than happy to go with your patch instead.

> Anyway, your implied question was discussed back then.  Derrick wrote:
> 
>    The reason to sort is to hopefully minimize the amount we walk by 
>    exploring the "lower" commits first. This is a performance-only thing, 
>    not a correctness issue (which is why the bug exists). Even then, it is 
>    just a heuristic.

Thanks for pointing that out!

> Does b6723e4671 in pu (commit-reach: fix first-parent heuristic) change
> that picture?  Did a quick test and found no performance difference with
> and without the fix on top, i.e. proper sorting didn't seem to matter.

I just gave 'test-tool reach can_all_from_reach' a try and got the
same results, with or without the fix the times are very similar.  I
haven't had time to follow the commit-graph series though, so I'm not
sure I used it correctly.  I tried it on the linux repository with the
following input:

X:v4.10
X:v4.9
X:v4.8
X:v4.7
X:v4.6
X:v4.5
X:v4.4
X:v4.3
X:v4.2
X:v4.1
Y:v3.10
Y:v3.9
Y:v3.8
Y:v3.7
Y:v3.6
Y:v3.5
Y:v3.4
Y:v3.3
Y:v3.2
Y:v3.1

> >  commit-reach.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/commit-reach.c b/commit-reach.c
> > index bc522d6840..9efddfd7a0 100644
> > --- a/commit-reach.c
> > +++ b/commit-reach.c
> > @@ -516,8 +516,8 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
> >  
> >  static int compare_commits_by_gen(const void *_a, const void *_b)
> >  {
> > -	const struct commit *a = (const struct commit *)_a;
> > -	const struct commit *b = (const struct commit *)_b;
> > +	const struct commit *a = *(const struct commit **)_a;
> > +	const struct commit *b = *(const struct commit **)_b;
> >  
> >  	if (a->generation < b->generation)
> >  		return -1;
> > 
> 
> Looks good to me.
> 
> René

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] commit-reach: fix sorting commits by generation
  2018-10-23 20:32   ` Thomas Gummerer
@ 2018-10-24 13:19     ` Derrick Stolee
  0 siblings, 0 replies; 4+ messages in thread
From: Derrick Stolee @ 2018-10-24 13:19 UTC (permalink / raw)
  To: Thomas Gummerer, René Scharfe; +Cc: git, Jeff King

On 10/23/2018 4:32 PM, Thomas Gummerer wrote:
> On 10/22, René Scharfe wrote:
>> Am 22.10.2018 um 23:10 schrieb Thomas Gummerer:
>>
>> Anyway, your implied question was discussed back then.  Derrick wrote:
>>
>>     The reason to sort is to hopefully minimize the amount we walk by
>>     exploring the "lower" commits first. This is a performance-only thing,
>>     not a correctness issue (which is why the bug exists). Even then, it is
>>     just a heuristic.
> Thanks for pointing that out!
>
>> Does b6723e4671 in pu (commit-reach: fix first-parent heuristic) change
>> that picture?  Did a quick test and found no performance difference with
>> and without the fix on top, i.e. proper sorting didn't seem to matter.
> I just gave 'test-tool reach can_all_from_reach' a try and got the
> same results, with or without the fix the times are very similar.  I
> haven't had time to follow the commit-graph series though, so I'm not
> sure I used it correctly.

Thanks for your attention here. I've been thinking a lot about this 
heuristic and have concluded the following two things are true:

(1) When we return 1, the order that we explore the 'from' commits does 
not change the explored set of commits.

(2) When we return 0, the order that we explore the 'to' commits will 
change the explored set, but it is difficult to say that the heuristic 
helps more than it hurts.

Item (1) is contrary to what I had thought when I first created the 
heuristic.

The details are tricky, but essentially each DFS starting at a 'from' 
commit may be short-circuited due to a prior walk, but swapping the 
order of those two 'from' commits would lead to the same set of commits 
to be explored (with the short-circuit happening in the other commit). 
The only change is that we can terminate early if we fully explore a 
'from' commit and do not find a commit marked with 'with_flag'. In this 
sense, it would be best to explore the commits that are "closest" to the 
generation number cutoff first, as we can maybe find a negative answer 
earlier in the search.

In this sense, we could remove the sort entirely and probably not have 
much performance hit. But since the set of 'from' commits is probably 
much smaller than the set of commits to explore, the sort is likely 
inexpensive.

In conclusion: I cannot say that this sort is super-important. As for 
the potential benefits in (2), I'll leave that to people who run git as 
a server who may have telemetry around fetch negotiation. How often do 
we actually say we need more rounds of negotiation? What kinds of data 
shapes matter there?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-10-24 13:19 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-22 21:10 [PATCH] commit-reach: fix sorting commits by generation Thomas Gummerer
2018-10-22 21:53 ` René Scharfe
2018-10-23 20:32   ` Thomas Gummerer
2018-10-24 13:19     ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).