git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] merge-ort: only do pointer arithmetic for non-empty lists
@ 2021-04-10  8:30 Andrzej Hunt via GitGitGadget
  2021-04-10 11:48 ` René Scharfe
  2021-04-11 11:05 ` [PATCH v2] " Andrzej Hunt via GitGitGadget
  0 siblings, 2 replies; 9+ messages in thread
From: Andrzej Hunt via GitGitGadget @ 2021-04-10  8:30 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Andrzej Hunt, Andrzej Hunt

From: Andrzej Hunt <ajrhunt@google.com>

versions could be an empty string_list. In that case, versions->items is
NULL, and we shouldn't be trying to perform pointer arithmetic with it (as
that results in undefined behaviour).

This issue has probably existed since:
  ee4012dcf9 (merge-ort: step 2 of tree writing -- function to create tree object, 2020-12-13)
But it only started occurring during tests since tests started using
merge-ort:
  f3b964a07e (Add testing with merge-ort merge strategy, 2021-03-20)

For reference - here's the original UBSAN commit that implemented this
check, it sounds like this behaviour isn't actually likely to cause any
issues (but we might as well fix it regardless):
https://reviews.llvm.org/D67122

UBSAN output from t3404 or t5601:

merge-ort.c:2669:43: runtime error: applying zero offset to null pointer
    #0 0x78bb53 in write_tree merge-ort.c:2669:43
    #1 0x7856c9 in process_entries merge-ort.c:3303:2
    #2 0x782317 in merge_ort_nonrecursive_internal merge-ort.c:3744:2
    #3 0x77feef in merge_incore_nonrecursive merge-ort.c:3853:2
    #4 0x6f6a5c in do_recursive_merge sequencer.c:640:3
    #5 0x6f6a5c in do_pick_commit sequencer.c:2221:9
    #6 0x6ef055 in single_pick sequencer.c:4814:9
    #7 0x6ef055 in sequencer_pick_revisions sequencer.c:4867:10
    #8 0x4fb392 in run_sequencer revert.c:225:9
    #9 0x4fa5b0 in cmd_revert revert.c:235:8
    #10 0x42abd7 in run_builtin git.c:453:11
    #11 0x429531 in handle_builtin git.c:704:3
    #12 0x4282fb in run_argv git.c:771:4
    #13 0x4282fb in cmd_main git.c:902:19
    #14 0x524b63 in main common-main.c:52:11
    #15 0x7fc2ca340349 in __libc_start_main (/lib64/libc.so.6+0x24349)
    #16 0x4072b9 in _start start.S:120

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior merge-ort.c:2669:43 in

Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
---
    merge-ort: only do pointer arithmetic for non-empty lists
    
    Here's a small and inconsequential UBSAN issue that started happening
    when running tests on next since yesterday (since the merge of
    en/ort-readiness).
    
    It can be reproduced with something like this (t3404 also triggers the
    same issue):
    
    make SANITIZE=undefined COPTS="-Og -g" T="$(wildcard t5601-*.sh)"
    GIT_TEST_OPTS="-v" UBSAN_OPTIONS=print_stacktrace=1 test
    
    ATB, Andrzej

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-930%2Fahunt%2Fmerge-ort-ubsan-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-930/ahunt/merge-ort-ubsan-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/930

 merge-ort.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 5e118a85ee04..4da4b4688336 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2504,8 +2504,10 @@ static void write_tree(struct object_id *result_oid,
 	 * We won't use relevant_entries again and will let it just pop off the
 	 * stack, so there won't be allocation worries or anything.
 	 */
-	relevant_entries.items = versions->items + offset;
-	relevant_entries.nr = versions->nr - offset;
+	if (versions->nr) {
+		relevant_entries.items = versions->items + offset;
+		relevant_entries.nr = versions->nr - offset;
+	}
 	QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
 
 	/* Pre-allocate some space in buf */

base-commit: 89b43f80a514aee58b662ad606e6352e03eaeee4
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] merge-ort: only do pointer arithmetic for non-empty lists
  2021-04-10  8:30 [PATCH] merge-ort: only do pointer arithmetic for non-empty lists Andrzej Hunt via GitGitGadget
@ 2021-04-10 11:48 ` René Scharfe
  2021-04-10 22:56   ` Junio C Hamano
  2021-04-11  9:12   ` Andrzej Hunt
  2021-04-11 11:05 ` [PATCH v2] " Andrzej Hunt via GitGitGadget
  1 sibling, 2 replies; 9+ messages in thread
From: René Scharfe @ 2021-04-10 11:48 UTC (permalink / raw)
  To: Andrzej Hunt via GitGitGadget, git
  Cc: Elijah Newren, Andrzej Hunt, Andrzej Hunt

Am 10.04.21 um 10:30 schrieb Andrzej Hunt via GitGitGadget:
> From: Andrzej Hunt <ajrhunt@google.com>
>
> versions could be an empty string_list. In that case, versions->items is
> NULL, and we shouldn't be trying to perform pointer arithmetic with it (as
> that results in undefined behaviour).
>
> This issue has probably existed since:
>   ee4012dcf9 (merge-ort: step 2 of tree writing -- function to create tree object, 2020-12-13)
> But it only started occurring during tests since tests started using
> merge-ort:
>   f3b964a07e (Add testing with merge-ort merge strategy, 2021-03-20)
>
> For reference - here's the original UBSAN commit that implemented this
> check, it sounds like this behaviour isn't actually likely to cause any
> issues (but we might as well fix it regardless):
> https://reviews.llvm.org/D67122
>
> UBSAN output from t3404 or t5601:
>
> merge-ort.c:2669:43: runtime error: applying zero offset to null pointer
>     #0 0x78bb53 in write_tree merge-ort.c:2669:43
>     #1 0x7856c9 in process_entries merge-ort.c:3303:2
>     #2 0x782317 in merge_ort_nonrecursive_internal merge-ort.c:3744:2
>     #3 0x77feef in merge_incore_nonrecursive merge-ort.c:3853:2
>     #4 0x6f6a5c in do_recursive_merge sequencer.c:640:3
>     #5 0x6f6a5c in do_pick_commit sequencer.c:2221:9
>     #6 0x6ef055 in single_pick sequencer.c:4814:9
>     #7 0x6ef055 in sequencer_pick_revisions sequencer.c:4867:10
>     #8 0x4fb392 in run_sequencer revert.c:225:9
>     #9 0x4fa5b0 in cmd_revert revert.c:235:8
>     #10 0x42abd7 in run_builtin git.c:453:11
>     #11 0x429531 in handle_builtin git.c:704:3
>     #12 0x4282fb in run_argv git.c:771:4
>     #13 0x4282fb in cmd_main git.c:902:19
>     #14 0x524b63 in main common-main.c:52:11
>     #15 0x7fc2ca340349 in __libc_start_main (/lib64/libc.so.6+0x24349)
>     #16 0x4072b9 in _start start.S:120
>
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior merge-ort.c:2669:43 in
>
> Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
> ---
>     merge-ort: only do pointer arithmetic for non-empty lists
>
>     Here's a small and inconsequential UBSAN issue that started happening
>     when running tests on next since yesterday (since the merge of
>     en/ort-readiness).
>
>     It can be reproduced with something like this (t3404 also triggers the
>     same issue):
>
>     make SANITIZE=undefined COPTS="-Og -g" T="$(wildcard t5601-*.sh)"
>     GIT_TEST_OPTS="-v" UBSAN_OPTIONS=print_stacktrace=1 test
>
>     ATB, Andrzej
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-930%2Fahunt%2Fmerge-ort-ubsan-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-930/ahunt/merge-ort-ubsan-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/930
>
>  merge-ort.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 5e118a85ee04..4da4b4688336 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -2504,8 +2504,10 @@ static void write_tree(struct object_id *result_oid,
>  	 * We won't use relevant_entries again and will let it just pop off the
>  	 * stack, so there won't be allocation worries or anything.
>  	 */
> -	relevant_entries.items = versions->items + offset;
> -	relevant_entries.nr = versions->nr - offset;
> +	if (versions->nr) {
> +		relevant_entries.items = versions->items + offset;
> +		relevant_entries.nr = versions->nr - offset;
> +	}
>  	QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);

Reading the diff I was wondering if QSORT now gets handed uninitialized
values if version-nr is 0.  The answer is no -- relevant_entries is
initialized at declaration.  Otherwise the compiler would have probably
warned, but sometimes it gets confused.

I wonder why relevant_entries is introduced at all, though.  It's not
referenced later.  So how about this instead?

	if (versions->nr)
		QSORT(versions->items + offset, nr, tree_entry_order);

The intent to sort the last versions->nr-offset entries of versions,
but only if it's not empty, is easier to see like this, I think.

>
>  	/* Pre-allocate some space in buf */
>
> base-commit: 89b43f80a514aee58b662ad606e6352e03eaeee4
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] merge-ort: only do pointer arithmetic for non-empty lists
  2021-04-10 11:48 ` René Scharfe
@ 2021-04-10 22:56   ` Junio C Hamano
  2021-04-11  9:14     ` Andrzej Hunt
  2021-04-11  9:12   ` Andrzej Hunt
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2021-04-10 22:56 UTC (permalink / raw)
  To: René Scharfe
  Cc: Andrzej Hunt via GitGitGadget, git, Elijah Newren, Andrzej Hunt,
	Andrzej Hunt

René Scharfe <l.s.r@web.de> writes:

>> diff --git a/merge-ort.c b/merge-ort.c
>> index 5e118a85ee04..4da4b4688336 100644
>> --- a/merge-ort.c
>> +++ b/merge-ort.c
>> @@ -2504,8 +2504,10 @@ static void write_tree(struct object_id *result_oid,
>>  	 * We won't use relevant_entries again and will let it just pop off the
>>  	 * stack, so there won't be allocation worries or anything.
>>  	 */
>> -	relevant_entries.items = versions->items + offset;
>> -	relevant_entries.nr = versions->nr - offset;
>> +	if (versions->nr) {
>> +		relevant_entries.items = versions->items + offset;
>> +		relevant_entries.nr = versions->nr - offset;
>> +	}
>>  	QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
>
> Reading the diff I was wondering if QSORT now gets handed uninitialized
> values if version-nr is 0.  The answer is no -- relevant_entries is
> initialized at declaration.  Otherwise the compiler would have probably
> warned, but sometimes it gets confused.
>
> I wonder why relevant_entries is introduced at all, though.  It's not
> referenced later.  So how about this instead?
>
> 	if (versions->nr)
> 		QSORT(versions->items + offset, nr, tree_entry_order);
>
> The intent to sort the last versions->nr-offset entries of versions,
> but only if it's not empty, is easier to see like this, I think.

That does make sense.  I wonder if there needs some assertion to
ensure that "offset" does not exceed "versions.nr", though.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] merge-ort: only do pointer arithmetic for non-empty lists
  2021-04-10 11:48 ` René Scharfe
  2021-04-10 22:56   ` Junio C Hamano
@ 2021-04-11  9:12   ` Andrzej Hunt
  1 sibling, 0 replies; 9+ messages in thread
From: Andrzej Hunt @ 2021-04-11  9:12 UTC (permalink / raw)
  To: René Scharfe, Andrzej Hunt via GitGitGadget, git
  Cc: Elijah Newren, Andrzej Hunt


On 10/04/2021 13:48, René Scharfe wrote:
> Am 10.04.21 um 10:30 schrieb Andrzej Hunt via GitGitGadget:
>> [...]
>> diff --git a/merge-ort.c b/merge-ort.c
>> index 5e118a85ee04..4da4b4688336 100644
>> --- a/merge-ort.c
>> +++ b/merge-ort.c
>> @@ -2504,8 +2504,10 @@ static void write_tree(struct object_id *result_oid,
>>   	 * We won't use relevant_entries again and will let it just pop off the
>>   	 * stack, so there won't be allocation worries or anything.
>>   	 */
>> -	relevant_entries.items = versions->items + offset;
>> -	relevant_entries.nr = versions->nr - offset;
>> +	if (versions->nr) {
>> +		relevant_entries.items = versions->items + offset;
>> +		relevant_entries.nr = versions->nr - offset;
>> +	}
>>   	QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
> 
> Reading the diff I was wondering if QSORT now gets handed uninitialized
> values if version-nr is 0.  The answer is no -- relevant_entries is
> initialized at declaration.  Otherwise the compiler would have probably
> warned, but sometimes it gets confused.
> 
> I wonder why relevant_entries is introduced at all, though.  It's not
> referenced later.  So how about this instead?
> 
> 	if (versions->nr)
> 		QSORT(versions->items + offset, nr, tree_entry_order);
> 
> The intent to sort the last versions->nr-offset entries of versions,
> but only if it's not empty, is easier to see like this, I think.
> 

That is much more elegant, I will follow this approach. Thank you for 
the suggestion!

An alternative might be to keep relevant_entries, and replace all later 
usages of versions->items[offset+i] with relevant_entries.items[i], but 
that's more invasive and I don't see any good reason for doing so given 
that the existing pattern works fine.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] merge-ort: only do pointer arithmetic for non-empty lists
  2021-04-10 22:56   ` Junio C Hamano
@ 2021-04-11  9:14     ` Andrzej Hunt
  0 siblings, 0 replies; 9+ messages in thread
From: Andrzej Hunt @ 2021-04-11  9:14 UTC (permalink / raw)
  To: Junio C Hamano, René Scharfe
  Cc: Andrzej Hunt via GitGitGadget, git, Elijah Newren, Andrzej Hunt


On 11/04/2021 00:56, Junio C Hamano wrote:
> René Scharfe <l.s.r@web.de> writes:
> 
> [...] >> Reading the diff I was wondering if QSORT now gets handed uninitialized
>> values if version-nr is 0.  The answer is no -- relevant_entries is
>> initialized at declaration.  Otherwise the compiler would have probably
>> warned, but sometimes it gets confused.
>>
>> I wonder why relevant_entries is introduced at all, though.  It's not
>> referenced later.  So how about this instead?
>>
>> 	if (versions->nr)
>> 		QSORT(versions->items + offset, nr, tree_entry_order);
>>
>> The intent to sort the last versions->nr-offset entries of versions,
>> but only if it's not empty, is easier to see like this, I think.
> 
> That does make sense.  I wonder if there needs some assertion to
> ensure that "offset" does not exceed "versions.nr", though.
> 

I think so - I will add the assertion. Following the original offset 
calculations is not trivial- which seems like a good enough reason for 
the assert.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2] merge-ort: only do pointer arithmetic for non-empty lists
  2021-04-10  8:30 [PATCH] merge-ort: only do pointer arithmetic for non-empty lists Andrzej Hunt via GitGitGadget
  2021-04-10 11:48 ` René Scharfe
@ 2021-04-11 11:05 ` Andrzej Hunt via GitGitGadget
  2021-04-12 15:52   ` Elijah Newren
  1 sibling, 1 reply; 9+ messages in thread
From: Andrzej Hunt via GitGitGadget @ 2021-04-11 11:05 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, René Scharfe, Andrzej Hunt, Andrzej Hunt

From: Andrzej Hunt <ajrhunt@google.com>

versions could be an empty string_list. In that case, versions->items is
NULL, and we shouldn't be trying to perform pointer arithmetic with it (as
that results in undefined behaviour).

Moreover we only use the results of this calculation once when calling
QSORT. Therefore we choose to skip creating relevant_entries and call
QSORT directly with our manipulated pointers (but only if there's data
requiring sorting). This lets us avoid abusing the string_list API,
and saves us from having to explain why this abuse is OK.

Finally, an assertion is added to make sure that write_tree() is called
with a valid offset.

This issue has probably existed since:
  ee4012dcf9 (merge-ort: step 2 of tree writing -- function to create tree object, 2020-12-13)
But it only started occurring during tests since tests started using
merge-ort:
  f3b964a07e (Add testing with merge-ort merge strategy, 2021-03-20)

For reference - here's the original UBSAN commit that implemented this
check, it sounds like this behaviour isn't actually likely to cause any
issues (but we might as well fix it regardless):
https://reviews.llvm.org/D67122

UBSAN output from t3404 or t5601:

merge-ort.c:2669:43: runtime error: applying zero offset to null pointer
    #0 0x78bb53 in write_tree merge-ort.c:2669:43
    #1 0x7856c9 in process_entries merge-ort.c:3303:2
    #2 0x782317 in merge_ort_nonrecursive_internal merge-ort.c:3744:2
    #3 0x77feef in merge_incore_nonrecursive merge-ort.c:3853:2
    #4 0x6f6a5c in do_recursive_merge sequencer.c:640:3
    #5 0x6f6a5c in do_pick_commit sequencer.c:2221:9
    #6 0x6ef055 in single_pick sequencer.c:4814:9
    #7 0x6ef055 in sequencer_pick_revisions sequencer.c:4867:10
    #8 0x4fb392 in run_sequencer revert.c:225:9
    #9 0x4fa5b0 in cmd_revert revert.c:235:8
    #10 0x42abd7 in run_builtin git.c:453:11
    #11 0x429531 in handle_builtin git.c:704:3
    #12 0x4282fb in run_argv git.c:771:4
    #13 0x4282fb in cmd_main git.c:902:19
    #14 0x524b63 in main common-main.c:52:11
    #15 0x7fc2ca340349 in __libc_start_main (/lib64/libc.so.6+0x24349)
    #16 0x4072b9 in _start start.S:120

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior merge-ort.c:2669:43 in

Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
---
    merge-ort: only do pointer arithmetic for non-empty lists
    
    V2 removes relevant_entries as per René's suggestion, and adds an
    assertion as per Junio's question.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-930%2Fahunt%2Fmerge-ort-ubsan-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-930/ahunt/merge-ort-ubsan-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/930

Range-diff vs v1:

 1:  d52ce7110446 ! 1:  48c1a0ad6f46 merge-ort: only do pointer arithmetic for non-empty lists
     @@ Commit message
          NULL, and we shouldn't be trying to perform pointer arithmetic with it (as
          that results in undefined behaviour).
      
     +    Moreover we only use the results of this calculation once when calling
     +    QSORT. Therefore we choose to skip creating relevant_entries and call
     +    QSORT directly with our manipulated pointers (but only if there's data
     +    requiring sorting). This lets us avoid abusing the string_list API,
     +    and saves us from having to explain why this abuse is OK.
     +
     +    Finally, an assertion is added to make sure that write_tree() is called
     +    with a valid offset.
     +
          This issue has probably existed since:
            ee4012dcf9 (merge-ort: step 2 of tree writing -- function to create tree object, 2020-12-13)
          But it only started occurring during tests since tests started using
     @@ Commit message
      
       ## merge-ort.c ##
      @@ merge-ort.c: static void write_tree(struct object_id *result_oid,
     - 	 * We won't use relevant_entries again and will let it just pop off the
     - 	 * stack, so there won't be allocation worries or anything.
     - 	 */
     + 	size_t maxlen = 0, extra;
     + 	unsigned int nr = versions->nr - offset;
     + 	struct strbuf buf = STRBUF_INIT;
     +-	struct string_list relevant_entries = STRING_LIST_INIT_NODUP;
     + 	int i;
     + 
     +-	/*
     +-	 * We want to sort the last (versions->nr-offset) entries in versions.
     +-	 * Do so by abusing the string_list API a bit: make another string_list
     +-	 * that contains just those entries and then sort them.
     +-	 *
     +-	 * We won't use relevant_entries again and will let it just pop off the
     +-	 * stack, so there won't be allocation worries or anything.
     +-	 */
      -	relevant_entries.items = versions->items + offset;
      -	relevant_entries.nr = versions->nr - offset;
     -+	if (versions->nr) {
     -+		relevant_entries.items = versions->items + offset;
     -+		relevant_entries.nr = versions->nr - offset;
     -+	}
     - 	QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
     +-	QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
     ++	assert(offset <= versions->nr);
     ++	if (versions->nr)
     ++		QSORT(versions->items + offset, nr, tree_entry_order);
       
       	/* Pre-allocate some space in buf */
     + 	extra = hash_size + 8; /* 8: 6 for mode, 1 for space, 1 for NUL char */


 merge-ort.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 5e118a85ee04..702eb5bf362d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2493,20 +2493,11 @@ static void write_tree(struct object_id *result_oid,
 	size_t maxlen = 0, extra;
 	unsigned int nr = versions->nr - offset;
 	struct strbuf buf = STRBUF_INIT;
-	struct string_list relevant_entries = STRING_LIST_INIT_NODUP;
 	int i;
 
-	/*
-	 * We want to sort the last (versions->nr-offset) entries in versions.
-	 * Do so by abusing the string_list API a bit: make another string_list
-	 * that contains just those entries and then sort them.
-	 *
-	 * We won't use relevant_entries again and will let it just pop off the
-	 * stack, so there won't be allocation worries or anything.
-	 */
-	relevant_entries.items = versions->items + offset;
-	relevant_entries.nr = versions->nr - offset;
-	QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
+	assert(offset <= versions->nr);
+	if (versions->nr)
+		QSORT(versions->items + offset, nr, tree_entry_order);
 
 	/* Pre-allocate some space in buf */
 	extra = hash_size + 8; /* 8: 6 for mode, 1 for space, 1 for NUL char */

base-commit: 89b43f80a514aee58b662ad606e6352e03eaeee4
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] merge-ort: only do pointer arithmetic for non-empty lists
  2021-04-11 11:05 ` [PATCH v2] " Andrzej Hunt via GitGitGadget
@ 2021-04-12 15:52   ` Elijah Newren
  2021-04-12 17:39     ` Junio C Hamano
  2021-04-12 17:47     ` Junio C Hamano
  0 siblings, 2 replies; 9+ messages in thread
From: Elijah Newren @ 2021-04-12 15:52 UTC (permalink / raw)
  To: Andrzej Hunt via GitGitGadget
  Cc: Git Mailing List, René Scharfe, Andrzej Hunt, Andrzej Hunt

On Sun, Apr 11, 2021 at 4:05 AM Andrzej Hunt via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Andrzej Hunt <ajrhunt@google.com>
>
> versions could be an empty string_list. In that case, versions->items is
> NULL, and we shouldn't be trying to perform pointer arithmetic with it (as
> that results in undefined behaviour).
>
> Moreover we only use the results of this calculation once when calling
> QSORT. Therefore we choose to skip creating relevant_entries and call
> QSORT directly with our manipulated pointers (but only if there's data
> requiring sorting). This lets us avoid abusing the string_list API,
> and saves us from having to explain why this abuse is OK.

Oh, nice, I like that.

>
> Finally, an assertion is added to make sure that write_tree() is called
> with a valid offset.
>
> This issue has probably existed since:
>   ee4012dcf9 (merge-ort: step 2 of tree writing -- function to create tree object, 2020-12-13)
> But it only started occurring during tests since tests started using
> merge-ort:
>   f3b964a07e (Add testing with merge-ort merge strategy, 2021-03-20)
>
> For reference - here's the original UBSAN commit that implemented this
> check, it sounds like this behaviour isn't actually likely to cause any
> issues (but we might as well fix it regardless):
> https://reviews.llvm.org/D67122
>
> UBSAN output from t3404 or t5601:
>
> merge-ort.c:2669:43: runtime error: applying zero offset to null pointer
>     #0 0x78bb53 in write_tree merge-ort.c:2669:43
>     #1 0x7856c9 in process_entries merge-ort.c:3303:2
>     #2 0x782317 in merge_ort_nonrecursive_internal merge-ort.c:3744:2
>     #3 0x77feef in merge_incore_nonrecursive merge-ort.c:3853:2
>     #4 0x6f6a5c in do_recursive_merge sequencer.c:640:3
>     #5 0x6f6a5c in do_pick_commit sequencer.c:2221:9
>     #6 0x6ef055 in single_pick sequencer.c:4814:9
>     #7 0x6ef055 in sequencer_pick_revisions sequencer.c:4867:10
>     #8 0x4fb392 in run_sequencer revert.c:225:9
>     #9 0x4fa5b0 in cmd_revert revert.c:235:8
>     #10 0x42abd7 in run_builtin git.c:453:11
>     #11 0x429531 in handle_builtin git.c:704:3
>     #12 0x4282fb in run_argv git.c:771:4
>     #13 0x4282fb in cmd_main git.c:902:19
>     #14 0x524b63 in main common-main.c:52:11
>     #15 0x7fc2ca340349 in __libc_start_main (/lib64/libc.so.6+0x24349)
>     #16 0x4072b9 in _start start.S:120
>
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior merge-ort.c:2669:43 in
>
> Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
> ---
>     merge-ort: only do pointer arithmetic for non-empty lists
>
>     V2 removes relevant_entries as per René's suggestion, and adds an
>     assertion as per Junio's question.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-930%2Fahunt%2Fmerge-ort-ubsan-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-930/ahunt/merge-ort-ubsan-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/930
>
> Range-diff vs v1:
>
>  1:  d52ce7110446 ! 1:  48c1a0ad6f46 merge-ort: only do pointer arithmetic for non-empty lists
>      @@ Commit message
>           NULL, and we shouldn't be trying to perform pointer arithmetic with it (as
>           that results in undefined behaviour).
>
>      +    Moreover we only use the results of this calculation once when calling
>      +    QSORT. Therefore we choose to skip creating relevant_entries and call
>      +    QSORT directly with our manipulated pointers (but only if there's data
>      +    requiring sorting). This lets us avoid abusing the string_list API,
>      +    and saves us from having to explain why this abuse is OK.
>      +
>      +    Finally, an assertion is added to make sure that write_tree() is called
>      +    with a valid offset.
>      +
>           This issue has probably existed since:
>             ee4012dcf9 (merge-ort: step 2 of tree writing -- function to create tree object, 2020-12-13)
>           But it only started occurring during tests since tests started using
>      @@ Commit message
>
>        ## merge-ort.c ##
>       @@ merge-ort.c: static void write_tree(struct object_id *result_oid,
>      -   * We won't use relevant_entries again and will let it just pop off the
>      -   * stack, so there won't be allocation worries or anything.
>      -   */
>      +  size_t maxlen = 0, extra;
>      +  unsigned int nr = versions->nr - offset;

Should we remove the initialization here and just define nr only, and then...

>      +  struct strbuf buf = STRBUF_INIT;
>      +- struct string_list relevant_entries = STRING_LIST_INIT_NODUP;
>      +  int i;
>      +
>      +- /*
>      +-  * We want to sort the last (versions->nr-offset) entries in versions.
>      +-  * Do so by abusing the string_list API a bit: make another string_list
>      +-  * that contains just those entries and then sort them.
>      +-  *
>      +-  * We won't use relevant_entries again and will let it just pop off the
>      +-  * stack, so there won't be allocation worries or anything.
>      +-  */
>       - relevant_entries.items = versions->items + offset;
>       - relevant_entries.nr = versions->nr - offset;
>      -+ if (versions->nr) {
>      -+         relevant_entries.items = versions->items + offset;
>      -+         relevant_entries.nr = versions->nr - offset;
>      -+ }
>      -  QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
>      +- QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
>      ++ assert(offset <= versions->nr);

Define nr here after this assert?  The definition of nr also depended
implicitly on this requirement, so if we're making this requirement
explicit, it would make sense to define nr after stating the
requirement.

>      ++ if (versions->nr)
>      ++         QSORT(versions->items + offset, nr, tree_entry_order);
>
>         /* Pre-allocate some space in buf */
>      +  extra = hash_size + 8; /* 8: 6 for mode, 1 for space, 1 for NUL char */
>
>
>  merge-ort.c | 15 +++------------
>  1 file changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 5e118a85ee04..702eb5bf362d 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -2493,20 +2493,11 @@ static void write_tree(struct object_id *result_oid,
>         size_t maxlen = 0, extra;
>         unsigned int nr = versions->nr - offset;
>         struct strbuf buf = STRBUF_INIT;
> -       struct string_list relevant_entries = STRING_LIST_INIT_NODUP;
>         int i;
>
> -       /*
> -        * We want to sort the last (versions->nr-offset) entries in versions.
> -        * Do so by abusing the string_list API a bit: make another string_list
> -        * that contains just those entries and then sort them.
> -        *
> -        * We won't use relevant_entries again and will let it just pop off the
> -        * stack, so there won't be allocation worries or anything.
> -        */
> -       relevant_entries.items = versions->items + offset;
> -       relevant_entries.nr = versions->nr - offset;
> -       QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
> +       assert(offset <= versions->nr);
> +       if (versions->nr)
> +               QSORT(versions->items + offset, nr, tree_entry_order);
>
>         /* Pre-allocate some space in buf */
>         extra = hash_size + 8; /* 8: 6 for mode, 1 for space, 1 for NUL char */
>
> base-commit: 89b43f80a514aee58b662ad606e6352e03eaeee4
> --
> gitgitgadget

Otherwise, this patch looks good to me; thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] merge-ort: only do pointer arithmetic for non-empty lists
  2021-04-12 15:52   ` Elijah Newren
@ 2021-04-12 17:39     ` Junio C Hamano
  2021-04-12 17:47     ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2021-04-12 17:39 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Andrzej Hunt via GitGitGadget, Git Mailing List,
	René Scharfe, Andrzej Hunt, Andrzej Hunt

Elijah Newren <newren@gmail.com> writes:

> Otherwise, this patch looks good to me; thanks!

Queued, manually amended, and here is what I have.

$ git range-diff @{1}...
1:  ca4cc8d182 ! 1:  c1ea48a8f7 merge-ort: only do pointer arithmetic for non-empty lists
    @@ Commit message
         SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior merge-ort.c:2669:43 in
     
         Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
    +    Reviewed-by: Elijah Newren <newren@gmail.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      ## merge-ort.c ##
     @@ merge-ort.c: static void write_tree(struct object_id *result_oid,
    + 		       size_t hash_size)
    + {
      	size_t maxlen = 0, extra;
    - 	unsigned int nr = versions->nr - offset;
    +-	unsigned int nr = versions->nr - offset;
    ++	unsigned int nr;
      	struct strbuf buf = STRBUF_INIT;
     -	struct string_list relevant_entries = STRING_LIST_INIT_NODUP;
      	int i;
    @@ merge-ort.c: static void write_tree(struct object_id *result_oid,
     -	relevant_entries.nr = versions->nr - offset;
     -	QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
     +	assert(offset <= versions->nr);
    ++	nr = versions->nr - offset;
     +	if (versions->nr)
     +		QSORT(versions->items + offset, nr, tree_entry_order);
      

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] merge-ort: only do pointer arithmetic for non-empty lists
  2021-04-12 15:52   ` Elijah Newren
  2021-04-12 17:39     ` Junio C Hamano
@ 2021-04-12 17:47     ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2021-04-12 17:47 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Andrzej Hunt via GitGitGadget, Git Mailing List,
	René Scharfe, Andrzej Hunt, Andrzej Hunt

Elijah Newren <newren@gmail.com> writes:

>>         /* Pre-allocate some space in buf */
>>         extra = hash_size + 8; /* 8: 6 for mode, 1 for space, 1 for NUL char */
>>
>> base-commit: 89b43f80a514aee58b662ad606e6352e03eaeee4
>> --
>> gitgitgadget
>
> Otherwise, this patch looks good to me; thanks!

By the way, I noticed the post-context comment and got curious.

	/* Pre-allocate some space in buf */
	extra = hash_size + 8; /* 8: 6 for mode, 1 for space, 1 for NUL char */
	for (i = 0; i < nr; i++) {
		maxlen += strlen(versions->items[offset+i].string) + extra;
	}
	strbuf_grow(&buf, maxlen);

Because "6 for mode" is wrong if it means "%06o", but the format
used in the code is "%o" so there is no correctness issues in the
code (Phew).  This "grow" is to avoid having to repeatedly
reallocate "nr" times in the loop that comes, but

 (1) s/6 for mode/up to 6 for mode/ to make it less misleading.

 (2) does this preallocation really help performance?

 (3) it is really disturbing to find a custom tree-writing code that
     is exercised only by ort here.  Tree-writing has been one major
     source of broken objects in various reimplementation of Git---
     all known ones made a mistake sometime in the past---and I'd
     strongly prefer to see a single helper that formats a tree
     object, given a set of <mode, object name, path> in the longer
     code health.




^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-04-12 17:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-10  8:30 [PATCH] merge-ort: only do pointer arithmetic for non-empty lists Andrzej Hunt via GitGitGadget
2021-04-10 11:48 ` René Scharfe
2021-04-10 22:56   ` Junio C Hamano
2021-04-11  9:14     ` Andrzej Hunt
2021-04-11  9:12   ` Andrzej Hunt
2021-04-11 11:05 ` [PATCH v2] " Andrzej Hunt via GitGitGadget
2021-04-12 15:52   ` Elijah Newren
2021-04-12 17:39     ` Junio C Hamano
2021-04-12 17:47     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).