git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] xdiff: handle allocation failures
@ 2022-02-09 10:59 Phillip Wood via GitGitGadget
  2022-02-09 10:59 ` [PATCH 1/3] xdiff: handle allocation failure in patience diff Phillip Wood via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Phillip Wood via GitGitGadget @ 2022-02-09 10:59 UTC (permalink / raw)
  To: git; +Cc: Edward Thomson, Phillip Wood

Other users of xdiff such as libgit2 need to be able to handle allocation
failures. This series fixes the few cases where we are not doing this
already.

Edward's patch[1] reminded me that I had these waiting to be submitted.

[1] https://lore.kernel.org/git/20220209013354.GB7@abe733c6e288/

Phillip Wood (3):
  xdiff: handle allocation failure in patience diff
  xdiff: refactor a function
  xdiff: handle allocation failure when merging

 xdiff/xmerge.c    | 44 ++++++++++++++++++++++++--------------------
 xdiff/xpatience.c | 17 ++++++++++++-----
 2 files changed, 36 insertions(+), 25 deletions(-)


base-commit: 38062e73e009f27ea192d50481fcb5e7b0e9d6eb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1140%2Fphillipwood%2Fwip%2Fxdiff-handle-allocation-failures-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1140/phillipwood/wip/xdiff-handle-allocation-failures-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1140
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] xdiff: handle allocation failure in patience diff
  2022-02-09 10:59 [PATCH 0/3] xdiff: handle allocation failures Phillip Wood via GitGitGadget
@ 2022-02-09 10:59 ` Phillip Wood via GitGitGadget
  2022-02-09 10:59 ` [PATCH 2/3] xdiff: refactor a function Phillip Wood via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Phillip Wood via GitGitGadget @ 2022-02-09 10:59 UTC (permalink / raw)
  To: git; +Cc: Edward Thomson, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Other users of libxdiff such as libgit2 need to be able to handle
allocation failures. As NULL is a valid return value the function
signature is changed to be able report allocation failures.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff/xpatience.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
index c5d48e80aef..3e3d99f8922 100644
--- a/xdiff/xpatience.c
+++ b/xdiff/xpatience.c
@@ -198,7 +198,7 @@ static int binary_search(struct entry **sequence, int longest,
  * item per sequence length: the sequence with the smallest last
  * element (in terms of line2).
  */
-static struct entry *find_longest_common_sequence(struct hashmap *map)
+static int find_longest_common_sequence(struct hashmap *map, struct entry **res)
 {
 	struct entry **sequence = xdl_malloc(map->nr * sizeof(struct entry *));
 	int longest = 0, i;
@@ -211,6 +211,9 @@ static struct entry *find_longest_common_sequence(struct hashmap *map)
 	 */
 	int anchor_i = -1;
 
+	if (!sequence)
+		return -1;
+
 	for (entry = map->first; entry; entry = entry->next) {
 		if (!entry->line2 || entry->line2 == NON_UNIQUE)
 			continue;
@@ -230,8 +233,9 @@ static struct entry *find_longest_common_sequence(struct hashmap *map)
 
 	/* No common unique lines were found */
 	if (!longest) {
+		*res = NULL;
 		xdl_free(sequence);
-		return NULL;
+		return 0;
 	}
 
 	/* Iterate starting at the last element, adjusting the "next" members */
@@ -241,8 +245,9 @@ static struct entry *find_longest_common_sequence(struct hashmap *map)
 		entry->previous->next = entry;
 		entry = entry->previous;
 	}
+	*res = entry;
 	xdl_free(sequence);
-	return entry;
+	return 0;
 }
 
 static int match(struct hashmap *map, int line1, int line2)
@@ -358,14 +363,16 @@ static int patience_diff(mmfile_t *file1, mmfile_t *file2,
 		return 0;
 	}
 
-	first = find_longest_common_sequence(&map);
+	result = find_longest_common_sequence(&map, &first);
+	if (result)
+		goto out;
 	if (first)
 		result = walk_common_sequence(&map, first,
 			line1, count1, line2, count2);
 	else
 		result = fall_back_to_classic_diff(&map,
 			line1, count1, line2, count2);
-
+ out:
 	xdl_free(map.entries);
 	return result;
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] xdiff: refactor a function
  2022-02-09 10:59 [PATCH 0/3] xdiff: handle allocation failures Phillip Wood via GitGitGadget
  2022-02-09 10:59 ` [PATCH 1/3] xdiff: handle allocation failure in patience diff Phillip Wood via GitGitGadget
@ 2022-02-09 10:59 ` Phillip Wood via GitGitGadget
  2022-02-09 18:04   ` Junio C Hamano
  2022-02-09 10:59 ` [PATCH 3/3] xdiff: handle allocation failure when merging Phillip Wood via GitGitGadget
  2022-02-16 10:15 ` [PATCH v2 0/4] xdiff: handle allocation failures Phillip Wood via GitGitGadget
  3 siblings, 1 reply; 18+ messages in thread
From: Phillip Wood via GitGitGadget @ 2022-02-09 10:59 UTC (permalink / raw)
  To: git; +Cc: Edward Thomson, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Rather than having to remember exactly what to free after an
allocation failure use the standard "goto out" pattern. This will
simplify the next commit that starts handling currently unhandled
failures.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff/xmerge.c | 36 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
index fff0b594f9a..48c5e9e3f35 100644
--- a/xdiff/xmerge.c
+++ b/xdiff/xmerge.c
@@ -684,35 +684,30 @@ static int xdl_do_merge(xdfenv_t *xe1, xdchange_t *xscr1,
 int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
 		xmparam_t const *xmp, mmbuffer_t *result)
 {
-	xdchange_t *xscr1, *xscr2;
-	xdfenv_t xe1, xe2;
-	int status;
+	xdchange_t *xscr1 = NULL, *xscr2 = NULL;
+	xdfenv_t xe1 = { 0 }, xe2 = { 0 };
+	int status = -1;
 	xpparam_t const *xpp = &xmp->xpp;
 
 	result->ptr = NULL;
 	result->size = 0;
 
-	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0) {
-		return -1;
-	}
-	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0) {
-		xdl_free_env(&xe1);
-		return -1;
-	}
+	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0)
+		goto out;
+
+	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0)
+		goto out;
+
 	if (xdl_change_compact(&xe1.xdf1, &xe1.xdf2, xpp->flags) < 0 ||
 	    xdl_change_compact(&xe1.xdf2, &xe1.xdf1, xpp->flags) < 0 ||
-	    xdl_build_script(&xe1, &xscr1) < 0) {
-		xdl_free_env(&xe1);
-		return -1;
-	}
+	    xdl_build_script(&xe1, &xscr1) < 0)
+		goto out;
+
 	if (xdl_change_compact(&xe2.xdf1, &xe2.xdf2, xpp->flags) < 0 ||
 	    xdl_change_compact(&xe2.xdf2, &xe2.xdf1, xpp->flags) < 0 ||
-	    xdl_build_script(&xe2, &xscr2) < 0) {
-		xdl_free_script(xscr1);
-		xdl_free_env(&xe1);
-		xdl_free_env(&xe2);
-		return -1;
-	}
+	    xdl_build_script(&xe2, &xscr2) < 0)
+		goto out;
+
 	status = 0;
 	if (!xscr1) {
 		result->ptr = xdl_malloc(mf2->size);
@@ -727,6 +722,7 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
 				      &xe2, xscr2,
 				      xmp, result);
 	}
+ out:
 	xdl_free_script(xscr1);
 	xdl_free_script(xscr2);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] xdiff: handle allocation failure when merging
  2022-02-09 10:59 [PATCH 0/3] xdiff: handle allocation failures Phillip Wood via GitGitGadget
  2022-02-09 10:59 ` [PATCH 1/3] xdiff: handle allocation failure in patience diff Phillip Wood via GitGitGadget
  2022-02-09 10:59 ` [PATCH 2/3] xdiff: refactor a function Phillip Wood via GitGitGadget
@ 2022-02-09 10:59 ` Phillip Wood via GitGitGadget
  2022-02-16 10:15 ` [PATCH v2 0/4] xdiff: handle allocation failures Phillip Wood via GitGitGadget
  3 siblings, 0 replies; 18+ messages in thread
From: Phillip Wood via GitGitGadget @ 2022-02-09 10:59 UTC (permalink / raw)
  To: git; +Cc: Edward Thomson, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Other users of xdiff such as libgit2 need to be able to handle
allocation failures. These allocation failures were previously
ignored.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff/xmerge.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
index 48c5e9e3f35..b9d72f0e9f8 100644
--- a/xdiff/xmerge.c
+++ b/xdiff/xmerge.c
@@ -711,10 +711,18 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
 	status = 0;
 	if (!xscr1) {
 		result->ptr = xdl_malloc(mf2->size);
+		if (!result->ptr) {
+			status = -1;
+			goto out;
+		}
 		memcpy(result->ptr, mf2->ptr, mf2->size);
 		result->size = mf2->size;
 	} else if (!xscr2) {
 		result->ptr = xdl_malloc(mf1->size);
+		if (!result->ptr) {
+			status = -1;
+			goto out;
+		}
 		memcpy(result->ptr, mf1->ptr, mf1->size);
 		result->size = mf1->size;
 	} else {
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] xdiff: refactor a function
  2022-02-09 10:59 ` [PATCH 2/3] xdiff: refactor a function Phillip Wood via GitGitGadget
@ 2022-02-09 18:04   ` Junio C Hamano
  2022-02-10  6:28     ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2022-02-09 18:04 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget; +Cc: git, Edward Thomson, Phillip Wood

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Rather than having to remember exactly what to free after an
> allocation failure use the standard "goto out" pattern. This will
> simplify the next commit that starts handling currently unhandled
> failures.

It sound like this is meant to be a no-op clean-up; let's see.

>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  xdiff/xmerge.c | 36 ++++++++++++++++--------------------
>  1 file changed, 16 insertions(+), 20 deletions(-)
>
> diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
> index fff0b594f9a..48c5e9e3f35 100644
> --- a/xdiff/xmerge.c
> +++ b/xdiff/xmerge.c
> @@ -684,35 +684,30 @@ static int xdl_do_merge(xdfenv_t *xe1, xdchange_t *xscr1,
>  int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
>  		xmparam_t const *xmp, mmbuffer_t *result)
>  {
> -	xdchange_t *xscr1, *xscr2;
> -	xdfenv_t xe1, xe2;
> -	int status;
> +	xdchange_t *xscr1 = NULL, *xscr2 = NULL;
> +	xdfenv_t xe1 = { 0 }, xe2 = { 0 };
> +	int status = -1;
>  	xpparam_t const *xpp = &xmp->xpp;
>  
>  	result->ptr = NULL;
>  	result->size = 0;
>  
> -	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0) {
> -		return -1;
> -	}
> -	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0) {
> -		xdl_free_env(&xe1);
> -		return -1;
> -	}

OK, xdl_do_diff() calls xdl_free_env(xe) before an error return (I
didn't check if patience and histogram also do so correctly), so the
original was not leaking xe1 or xe2.

> +	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0)
> +		goto out;
> +
> +	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0)
> +		goto out;
> +

And this does not change that.

>  	if (xdl_change_compact(&xe1.xdf1, &xe1.xdf2, xpp->flags) < 0 ||
>  	    xdl_change_compact(&xe1.xdf2, &xe1.xdf1, xpp->flags) < 0 ||
> -	    xdl_build_script(&xe1, &xscr1) < 0) {
> -		xdl_free_env(&xe1);
> -		return -1;
> -	}
> +	    xdl_build_script(&xe1, &xscr1) < 0)
> +		goto out;
> +

Here, as xdl_build_script() does free the script it failed to build,
but not the xe it was given, the original is correct to free xe1.

We jump from here to leave the freeing of xe1 to the clean-up part.

>  	if (xdl_change_compact(&xe2.xdf1, &xe2.xdf2, xpp->flags) < 0 ||
>  	    xdl_change_compact(&xe2.xdf2, &xe2.xdf1, xpp->flags) < 0 ||
> -	    xdl_build_script(&xe2, &xscr2) < 0) {
> -		xdl_free_script(xscr1);
> -		xdl_free_env(&xe1);
> -		xdl_free_env(&xe2);
> -		return -1;
> -	}
> +	    xdl_build_script(&xe2, &xscr2) < 0)
> +		goto out;

Ditto for xe2 here.

>  	status = 0;
>  	if (!xscr1) {
>  		result->ptr = xdl_malloc(mf2->size);
> @@ -727,6 +722,7 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
>  				      &xe2, xscr2,
>  				      xmp, result);
>  	}
> + out:
>  	xdl_free_script(xscr1);
>  	xdl_free_script(xscr2);

And after the post-context of this hunk, we do free_env() both xe1
and xe2, so we should be doing the same thing as before.

Looking good.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] xdiff: refactor a function
  2022-02-09 18:04   ` Junio C Hamano
@ 2022-02-10  6:28     ` Junio C Hamano
  2022-02-11 15:19       ` Phillip Wood
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2022-02-10  6:28 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget; +Cc: git, Edward Thomson, Phillip Wood

Junio C Hamano <gitster@pobox.com> writes:

>>  int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
>>  		xmparam_t const *xmp, mmbuffer_t *result)
>>  {
>> -	xdchange_t *xscr1, *xscr2;
>> -	xdfenv_t xe1, xe2;
>> -	int status;
>> +	xdchange_t *xscr1 = NULL, *xscr2 = NULL;
>> +	xdfenv_t xe1 = { 0 }, xe2 = { 0 };
>> +	int status = -1;
>>  	xpparam_t const *xpp = &xmp->xpp;
>>  
>>  	result->ptr = NULL;
>>  	result->size = 0;
>>  
>> -	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0) {
>> -		return -1;
>> -	}
>> -	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0) {
>> -		xdl_free_env(&xe1);
>> -		return -1;
>> -	}
>
> OK, xdl_do_diff() calls xdl_free_env(xe) before an error return (I
> didn't check if patience and histogram also do so correctly), so the
> original was not leaking xe1 or xe2.

After I wrote the above, I took a brief look at patience and histogram,
and it does not seem to release resources held by "env" when it signals
a failure by returning a negative value.  So it seems that the original
used with patience or histogram were leaking env when it failed, and
this patch plugs that small leak.

If that is indeed the case, please note it in the proposed log
message.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] xdiff: refactor a function
  2022-02-10  6:28     ` Junio C Hamano
@ 2022-02-11 15:19       ` Phillip Wood
  2022-02-11 16:46         ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Phillip Wood @ 2022-02-11 15:19 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood via GitGitGadget
  Cc: git, Edward Thomson, Phillip Wood

Hi Junio

On 10/02/2022 06:28, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>>>   int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
>>>   		xmparam_t const *xmp, mmbuffer_t *result)
>>>   {
>>> -	xdchange_t *xscr1, *xscr2;
>>> -	xdfenv_t xe1, xe2;
>>> -	int status;
>>> +	xdchange_t *xscr1 = NULL, *xscr2 = NULL;
>>> +	xdfenv_t xe1 = { 0 }, xe2 = { 0 };
>>> +	int status = -1;
>>>   	xpparam_t const *xpp = &xmp->xpp;
>>>   
>>>   	result->ptr = NULL;
>>>   	result->size = 0;
>>>   
>>> -	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0) {
>>> -		return -1;
>>> -	}
>>> -	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0) {
>>> -		xdl_free_env(&xe1);
>>> -		return -1;
>>> -	}
>>
>> OK, xdl_do_diff() calls xdl_free_env(xe) before an error return (I
>> didn't check if patience and histogram also do so correctly), so the
>> original was not leaking xe1 or xe2.
> 
> After I wrote the above, I took a brief look at patience and histogram,
> and it does not seem to release resources held by "env" when it signals
> a failure by returning a negative value.  So it seems that the original
> used with patience or histogram were leaking env when it failed, and
> this patch plugs that small leak.
> 
> If that is indeed the case, please note it in the proposed log
> message.

Oh well spotted, xdl_do_diff() only frees "env" if the myers algorithm 
has an error, if the patience or histogram algorithms have an error then 
they do not free "env" and it is not freed by xdl_do_diff(). This patch 
inadvertently fixes that leak when merging but not when calling 
xdl_do_diff() to compact conflicts in zealous mode or when doing a plain 
diff. I think the simplest fix is to have xdl_do_diff() free "env" when 
there is an error what ever algorithm is used.

I'll try to put a patch together to fix the other cases. If we fix this 
leak in xdl_do_diff() then maybe we should go back to returning -1 in 
the hunk above and explain in the log message why that is ok.

Best Wishes

Phillip

> Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] xdiff: refactor a function
  2022-02-11 15:19       ` Phillip Wood
@ 2022-02-11 16:46         ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2022-02-11 16:46 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Phillip Wood via GitGitGadget, git, Edward Thomson, Phillip Wood

Phillip Wood <phillip.wood123@gmail.com> writes:

> Oh well spotted, xdl_do_diff() only frees "env" if the myers algorithm
> has an error, if the patience or histogram algorithms have an error
> then they do not free "env" and it is not freed by xdl_do_diff(). This
> patch inadvertently fixes that leak when merging but not when calling 
> xdl_do_diff() to compact conflicts in zealous mode or when doing a
> plain diff. I think the simplest fix is to have xdl_do_diff() free
> "env" when there is an error what ever algorithm is used.

Heh, I didn't even look at the other uses of xdl_do_diff(); I am
glad you did.

> I'll try to put a patch together to fix the other cases. If we fix
> this leak in xdl_do_diff() then maybe we should go back to returning
> -1 in the hunk above and explain in the log message why that is ok.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 0/4] xdiff: handle allocation failures
  2022-02-09 10:59 [PATCH 0/3] xdiff: handle allocation failures Phillip Wood via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-02-09 10:59 ` [PATCH 3/3] xdiff: handle allocation failure when merging Phillip Wood via GitGitGadget
@ 2022-02-16 10:15 ` Phillip Wood via GitGitGadget
  2022-02-16 10:15   ` [PATCH v2 1/4] xdiff: fix a memory leak Phillip Wood via GitGitGadget
                     ` (3 more replies)
  3 siblings, 4 replies; 18+ messages in thread
From: Phillip Wood via GitGitGadget @ 2022-02-16 10:15 UTC (permalink / raw)
  To: git; +Cc: Edward Thomson, Phillip Wood, Phillip Wood

Thanks to Junio for his comments on V1.

Changes since V1:

 * Patch 1 is new and addresses a memory leak noticed by Junio
 * Patch 2 is unchanged
 * Patch 3 now avoids a double free of xe1 on error
 * Patch 4 reworked handling of the return value

V1 Cover Letter: Other users of xdiff such as libgit2 need to be able to
handle allocation failures. This series fixes the few cases where we are not
doing this already.

Edward's patch[1] reminded me that I had these waiting to be submitted.

[1] https://lore.kernel.org/git/20220209013354.GB7@abe733c6e288/

Phillip Wood (4):
  xdiff: fix a memory leak
  xdiff: handle allocation failure in patience diff
  xdiff: refactor a function
  xdiff: handle allocation failure when merging

 xdiff/xdiffi.c     | 33 +++++++++++++++++----------------
 xdiff/xhistogram.c |  3 ---
 xdiff/xmerge.c     | 42 ++++++++++++++++++++++--------------------
 xdiff/xpatience.c  | 21 ++++++++++++---------
 4 files changed, 51 insertions(+), 48 deletions(-)


base-commit: 38062e73e009f27ea192d50481fcb5e7b0e9d6eb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1140%2Fphillipwood%2Fwip%2Fxdiff-handle-allocation-failures-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1140/phillipwood/wip/xdiff-handle-allocation-failures-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1140

Range-diff vs v1:

 -:  ----------- > 1:  3fcb6c80367 xdiff: fix a memory leak
 1:  b8f88f1b9f8 = 2:  fec1f4a4152 xdiff: handle allocation failure in patience diff
 2:  8655bb0348d ! 3:  073a45e9e85 xdiff: refactor a function
     @@ Metadata
       ## Commit message ##
          xdiff: refactor a function
      
     -    Rather than having to remember exactly what to free after an
     -    allocation failure use the standard "goto out" pattern. This will
     -    simplify the next commit that starts handling currently unhandled
     -    failures.
     +    Use the standard "goto out" pattern rather than repeating very similar
     +    code after checking for each error. This will simplify the next commit
     +    that starts handling allocation failures that are currently ignored.
     +    On error xdl_do_diff() frees the environment so we need to take care
     +    to avoid a double free in that case. xdl_build_script() does not
     +    assign a result unless it is successful so there is no possibility of
     +    a double free if it fails.
      
          Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
      
     @@ xdiff/xmerge.c: static int xdl_do_merge(xdfenv_t *xe1, xdchange_t *xscr1,
       		xmparam_t const *xmp, mmbuffer_t *result)
       {
      -	xdchange_t *xscr1, *xscr2;
     --	xdfenv_t xe1, xe2;
     --	int status;
      +	xdchange_t *xscr1 = NULL, *xscr2 = NULL;
     -+	xdfenv_t xe1 = { 0 }, xe2 = { 0 };
     + 	xdfenv_t xe1, xe2;
     +-	int status;
      +	int status = -1;
       	xpparam_t const *xpp = &xmp->xpp;
       
     @@ xdiff/xmerge.c: static int xdl_do_merge(xdfenv_t *xe1, xdchange_t *xscr1,
       	result->size = 0;
       
      -	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0) {
     --		return -1;
     ++	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0)
     + 		return -1;
      -	}
      -	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0) {
      -		xdl_free_env(&xe1);
      -		return -1;
      -	}
     -+	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0)
     -+		goto out;
      +
      +	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0)
     -+		goto out;
     ++		goto free_xe1; /* avoid double free of xe2 */
      +
       	if (xdl_change_compact(&xe1.xdf1, &xe1.xdf2, xpp->flags) < 0 ||
       	    xdl_change_compact(&xe1.xdf2, &xe1.xdf1, xpp->flags) < 0 ||
     @@ xdiff/xmerge.c: int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
       	xdl_free_script(xscr1);
       	xdl_free_script(xscr2);
       
     +-	xdl_free_env(&xe1);
     + 	xdl_free_env(&xe2);
     ++ free_xe1:
     ++	xdl_free_env(&xe1);
     + 
     + 	return status;
     + }
 3:  3e935abba1d ! 4:  7e705d771b0 xdiff: handle allocation failure when merging
     @@ Commit message
      
       ## xdiff/xmerge.c ##
      @@ xdiff/xmerge.c: int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
     - 	status = 0;
     + 	    xdl_build_script(&xe2, &xscr2) < 0)
     + 		goto out;
     + 
     +-	status = 0;
       	if (!xscr1) {
       		result->ptr = xdl_malloc(mf2->size);
     -+		if (!result->ptr) {
     -+			status = -1;
     ++		if (!result->ptr)
      +			goto out;
     -+		}
     ++		status = 0;
       		memcpy(result->ptr, mf2->ptr, mf2->size);
       		result->size = mf2->size;
       	} else if (!xscr2) {
       		result->ptr = xdl_malloc(mf1->size);
     -+		if (!result->ptr) {
     -+			status = -1;
     ++		if (!result->ptr)
      +			goto out;
     -+		}
     ++		status = 0;
       		memcpy(result->ptr, mf1->ptr, mf1->size);
       		result->size = mf1->size;
       	} else {

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/4] xdiff: fix a memory leak
  2022-02-16 10:15 ` [PATCH v2 0/4] xdiff: handle allocation failures Phillip Wood via GitGitGadget
@ 2022-02-16 10:15   ` Phillip Wood via GitGitGadget
  2022-02-16 10:28     ` Ævar Arnfjörð Bjarmason
  2022-02-16 10:15   ` [PATCH v2 2/4] xdiff: handle allocation failure in patience diff Phillip Wood via GitGitGadget
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Phillip Wood via GitGitGadget @ 2022-02-16 10:15 UTC (permalink / raw)
  To: git; +Cc: Edward Thomson, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Although the patience and histogram algorithms initialize the
environment they do not free it if there is an error. In contrast for
the Myers algorithm the environment is initalized in xdl_do_diff() and
it is freed if there is an error. Fix this by always initializing the
environment in xdl_do_diff() and freeing it there if there is an
error. Remove the comment in do_patience_diff() about the environment
being freed by xdl_diff() as it is not accurate because (a) xdl_diff()
does not do that if there is an error and (b) xdl_diff() is not the
only caller.

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff/xdiffi.c     | 33 +++++++++++++++++----------------
 xdiff/xhistogram.c |  3 ---
 xdiff/xpatience.c  |  4 ----
 3 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 69689fab247..758410c11ac 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -315,16 +315,19 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	long *kvd, *kvdf, *kvdb;
 	xdalgoenv_t xenv;
 	diffdata_t dd1, dd2;
+	int res;
 
-	if (XDF_DIFF_ALG(xpp->flags) == XDF_PATIENCE_DIFF)
-		return xdl_do_patience_diff(mf1, mf2, xpp, xe);
-
-	if (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF)
-		return xdl_do_histogram_diff(mf1, mf2, xpp, xe);
+	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0)
+		return -1;
 
-	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) {
+	if (XDF_DIFF_ALG(xpp->flags) == XDF_PATIENCE_DIFF) {
+		res = xdl_do_patience_diff(mf1, mf2, xpp, xe);
+		goto out;
+	}
 
-		return -1;
+	if (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF) {
+		res = xdl_do_histogram_diff(mf1, mf2, xpp, xe);
+		goto out;
 	}
 
 	/*
@@ -359,17 +362,15 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	dd2.rchg = xe->xdf2.rchg;
 	dd2.rindex = xe->xdf2.rindex;
 
-	if (xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
-			 kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0, &xenv) < 0) {
-
-		xdl_free(kvd);
-		xdl_free_env(xe);
-		return -1;
-	}
-
+	res = xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
+			   kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0,
+			   &xenv);
 	xdl_free(kvd);
+ out:
+	if (res < 0)
+		xdl_free_env(xe);
 
-	return 0;
+	return res;
 }
 
 
diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index 80794748b0d..01decffc332 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -372,9 +372,6 @@ out:
 int xdl_do_histogram_diff(mmfile_t *file1, mmfile_t *file2,
 	xpparam_t const *xpp, xdfenv_t *env)
 {
-	if (xdl_prepare_env(file1, file2, xpp, env) < 0)
-		return -1;
-
 	return histogram_diff(xpp, env,
 		env->xdf1.dstart + 1, env->xdf1.dend - env->xdf1.dstart + 1,
 		env->xdf2.dstart + 1, env->xdf2.dend - env->xdf2.dstart + 1);
diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
index c5d48e80aef..e8de8d150ce 100644
--- a/xdiff/xpatience.c
+++ b/xdiff/xpatience.c
@@ -373,10 +373,6 @@ static int patience_diff(mmfile_t *file1, mmfile_t *file2,
 int xdl_do_patience_diff(mmfile_t *file1, mmfile_t *file2,
 		xpparam_t const *xpp, xdfenv_t *env)
 {
-	if (xdl_prepare_env(file1, file2, xpp, env) < 0)
-		return -1;
-
-	/* environment is cleaned up in xdl_diff() */
 	return patience_diff(file1, file2, xpp, env,
 			1, env->xdf1.nrec, 1, env->xdf2.nrec);
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/4] xdiff: handle allocation failure in patience diff
  2022-02-16 10:15 ` [PATCH v2 0/4] xdiff: handle allocation failures Phillip Wood via GitGitGadget
  2022-02-16 10:15   ` [PATCH v2 1/4] xdiff: fix a memory leak Phillip Wood via GitGitGadget
@ 2022-02-16 10:15   ` Phillip Wood via GitGitGadget
  2022-02-16 10:15   ` [PATCH v2 3/4] xdiff: refactor a function Phillip Wood via GitGitGadget
  2022-02-16 10:15   ` [PATCH v2 4/4] xdiff: handle allocation failure when merging Phillip Wood via GitGitGadget
  3 siblings, 0 replies; 18+ messages in thread
From: Phillip Wood via GitGitGadget @ 2022-02-16 10:15 UTC (permalink / raw)
  To: git; +Cc: Edward Thomson, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Other users of libxdiff such as libgit2 need to be able to handle
allocation failures. As NULL is a valid return value the function
signature is changed to be able report allocation failures.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff/xpatience.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
index e8de8d150ce..1a21c6a74b3 100644
--- a/xdiff/xpatience.c
+++ b/xdiff/xpatience.c
@@ -198,7 +198,7 @@ static int binary_search(struct entry **sequence, int longest,
  * item per sequence length: the sequence with the smallest last
  * element (in terms of line2).
  */
-static struct entry *find_longest_common_sequence(struct hashmap *map)
+static int find_longest_common_sequence(struct hashmap *map, struct entry **res)
 {
 	struct entry **sequence = xdl_malloc(map->nr * sizeof(struct entry *));
 	int longest = 0, i;
@@ -211,6 +211,9 @@ static struct entry *find_longest_common_sequence(struct hashmap *map)
 	 */
 	int anchor_i = -1;
 
+	if (!sequence)
+		return -1;
+
 	for (entry = map->first; entry; entry = entry->next) {
 		if (!entry->line2 || entry->line2 == NON_UNIQUE)
 			continue;
@@ -230,8 +233,9 @@ static struct entry *find_longest_common_sequence(struct hashmap *map)
 
 	/* No common unique lines were found */
 	if (!longest) {
+		*res = NULL;
 		xdl_free(sequence);
-		return NULL;
+		return 0;
 	}
 
 	/* Iterate starting at the last element, adjusting the "next" members */
@@ -241,8 +245,9 @@ static struct entry *find_longest_common_sequence(struct hashmap *map)
 		entry->previous->next = entry;
 		entry = entry->previous;
 	}
+	*res = entry;
 	xdl_free(sequence);
-	return entry;
+	return 0;
 }
 
 static int match(struct hashmap *map, int line1, int line2)
@@ -358,14 +363,16 @@ static int patience_diff(mmfile_t *file1, mmfile_t *file2,
 		return 0;
 	}
 
-	first = find_longest_common_sequence(&map);
+	result = find_longest_common_sequence(&map, &first);
+	if (result)
+		goto out;
 	if (first)
 		result = walk_common_sequence(&map, first,
 			line1, count1, line2, count2);
 	else
 		result = fall_back_to_classic_diff(&map,
 			line1, count1, line2, count2);
-
+ out:
 	xdl_free(map.entries);
 	return result;
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/4] xdiff: refactor a function
  2022-02-16 10:15 ` [PATCH v2 0/4] xdiff: handle allocation failures Phillip Wood via GitGitGadget
  2022-02-16 10:15   ` [PATCH v2 1/4] xdiff: fix a memory leak Phillip Wood via GitGitGadget
  2022-02-16 10:15   ` [PATCH v2 2/4] xdiff: handle allocation failure in patience diff Phillip Wood via GitGitGadget
@ 2022-02-16 10:15   ` Phillip Wood via GitGitGadget
  2022-02-16 10:15   ` [PATCH v2 4/4] xdiff: handle allocation failure when merging Phillip Wood via GitGitGadget
  3 siblings, 0 replies; 18+ messages in thread
From: Phillip Wood via GitGitGadget @ 2022-02-16 10:15 UTC (permalink / raw)
  To: git; +Cc: Edward Thomson, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Use the standard "goto out" pattern rather than repeating very similar
code after checking for each error. This will simplify the next commit
that starts handling allocation failures that are currently ignored.
On error xdl_do_diff() frees the environment so we need to take care
to avoid a double free in that case. xdl_build_script() does not
assign a result unless it is successful so there is no possibility of
a double free if it fails.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff/xmerge.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
index fff0b594f9a..d43abe05b95 100644
--- a/xdiff/xmerge.c
+++ b/xdiff/xmerge.c
@@ -684,35 +684,30 @@ static int xdl_do_merge(xdfenv_t *xe1, xdchange_t *xscr1,
 int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
 		xmparam_t const *xmp, mmbuffer_t *result)
 {
-	xdchange_t *xscr1, *xscr2;
+	xdchange_t *xscr1 = NULL, *xscr2 = NULL;
 	xdfenv_t xe1, xe2;
-	int status;
+	int status = -1;
 	xpparam_t const *xpp = &xmp->xpp;
 
 	result->ptr = NULL;
 	result->size = 0;
 
-	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0) {
+	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0)
 		return -1;
-	}
-	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0) {
-		xdl_free_env(&xe1);
-		return -1;
-	}
+
+	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0)
+		goto free_xe1; /* avoid double free of xe2 */
+
 	if (xdl_change_compact(&xe1.xdf1, &xe1.xdf2, xpp->flags) < 0 ||
 	    xdl_change_compact(&xe1.xdf2, &xe1.xdf1, xpp->flags) < 0 ||
-	    xdl_build_script(&xe1, &xscr1) < 0) {
-		xdl_free_env(&xe1);
-		return -1;
-	}
+	    xdl_build_script(&xe1, &xscr1) < 0)
+		goto out;
+
 	if (xdl_change_compact(&xe2.xdf1, &xe2.xdf2, xpp->flags) < 0 ||
 	    xdl_change_compact(&xe2.xdf2, &xe2.xdf1, xpp->flags) < 0 ||
-	    xdl_build_script(&xe2, &xscr2) < 0) {
-		xdl_free_script(xscr1);
-		xdl_free_env(&xe1);
-		xdl_free_env(&xe2);
-		return -1;
-	}
+	    xdl_build_script(&xe2, &xscr2) < 0)
+		goto out;
+
 	status = 0;
 	if (!xscr1) {
 		result->ptr = xdl_malloc(mf2->size);
@@ -727,11 +722,13 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
 				      &xe2, xscr2,
 				      xmp, result);
 	}
+ out:
 	xdl_free_script(xscr1);
 	xdl_free_script(xscr2);
 
-	xdl_free_env(&xe1);
 	xdl_free_env(&xe2);
+ free_xe1:
+	xdl_free_env(&xe1);
 
 	return status;
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/4] xdiff: handle allocation failure when merging
  2022-02-16 10:15 ` [PATCH v2 0/4] xdiff: handle allocation failures Phillip Wood via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-02-16 10:15   ` [PATCH v2 3/4] xdiff: refactor a function Phillip Wood via GitGitGadget
@ 2022-02-16 10:15   ` Phillip Wood via GitGitGadget
  3 siblings, 0 replies; 18+ messages in thread
From: Phillip Wood via GitGitGadget @ 2022-02-16 10:15 UTC (permalink / raw)
  To: git; +Cc: Edward Thomson, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Other users of xdiff such as libgit2 need to be able to handle
allocation failures. These allocation failures were previously
ignored.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff/xmerge.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
index d43abe05b95..af40c88a5b3 100644
--- a/xdiff/xmerge.c
+++ b/xdiff/xmerge.c
@@ -708,13 +708,18 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
 	    xdl_build_script(&xe2, &xscr2) < 0)
 		goto out;
 
-	status = 0;
 	if (!xscr1) {
 		result->ptr = xdl_malloc(mf2->size);
+		if (!result->ptr)
+			goto out;
+		status = 0;
 		memcpy(result->ptr, mf2->ptr, mf2->size);
 		result->size = mf2->size;
 	} else if (!xscr2) {
 		result->ptr = xdl_malloc(mf1->size);
+		if (!result->ptr)
+			goto out;
+		status = 0;
 		memcpy(result->ptr, mf1->ptr, mf1->size);
 		result->size = mf1->size;
 	} else {
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/4] xdiff: fix a memory leak
  2022-02-16 10:15   ` [PATCH v2 1/4] xdiff: fix a memory leak Phillip Wood via GitGitGadget
@ 2022-02-16 10:28     ` Ævar Arnfjörð Bjarmason
  2022-02-16 13:49       ` Phillip Wood
  0 siblings, 1 reply; 18+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-16 10:28 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Edward Thomson, Phillip Wood, Phillip Wood


On Wed, Feb 16 2022, Phillip Wood via GitGitGadget wrote:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Although the patience and histogram algorithms initialize the
> environment they do not free it if there is an error. In contrast for
> the Myers algorithm the environment is initalized in xdl_do_diff() and
> it is freed if there is an error. Fix this by always initializing the
> environment in xdl_do_diff() and freeing it there if there is an
> error. Remove the comment in do_patience_diff() about the environment
> being freed by xdl_diff() as it is not accurate because (a) xdl_diff()
> does not do that if there is an error and (b) xdl_diff() is not the
> only caller.
>
> Reported-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  xdiff/xdiffi.c     | 33 +++++++++++++++++----------------
>  xdiff/xhistogram.c |  3 ---
>  xdiff/xpatience.c  |  4 ----
>  3 files changed, 17 insertions(+), 23 deletions(-)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index 69689fab247..758410c11ac 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -315,16 +315,19 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>  	long *kvd, *kvdf, *kvdb;
>  	xdalgoenv_t xenv;
>  	diffdata_t dd1, dd2;
> +	int res;
>  
> -	if (XDF_DIFF_ALG(xpp->flags) == XDF_PATIENCE_DIFF)
> -		return xdl_do_patience_diff(mf1, mf2, xpp, xe);
> -
> -	if (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF)
> -		return xdl_do_histogram_diff(mf1, mf2, xpp, xe);
> +	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0)
> +		return -1;
>  
> -	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) {
> +	if (XDF_DIFF_ALG(xpp->flags) == XDF_PATIENCE_DIFF) {
> +		res = xdl_do_patience_diff(mf1, mf2, xpp, xe);
> +		goto out;
> +	}
>  
> -		return -1;
> +	if (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF) {
> +		res = xdl_do_histogram_diff(mf1, mf2, xpp, xe);
> +		goto out;
>  	}
>  
>  	/*
> @@ -359,17 +362,15 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>  	dd2.rchg = xe->xdf2.rchg;
>  	dd2.rindex = xe->xdf2.rindex;
>  
> -	if (xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
> -			 kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0, &xenv) < 0) {
> -
> -		xdl_free(kvd);
> -		xdl_free_env(xe);
> -		return -1;
> -	}
> -
> +	res = xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
> +			   kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0,
> +			   &xenv);
>  	xdl_free(kvd);
> + out:
> +	if (res < 0)
> +		xdl_free_env(xe);
>  
> -	return 0;
> +	return res;
>  }

I wanted to test if this made some diff test pass under SANITIZE=leak
that didn't before, and my usual quick way to discover that is to run
the tests with something like this for testing:
	
	diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
	index 758410c11ac..7811ce2a1d3 100644
	--- a/xdiff/xdiffi.c
	+++ b/xdiff/xdiffi.c
	@@ -368,7 +368,7 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
	 	xdl_free(kvd);
	  out:
	 	if (res < 0)
	-		xdl_free_env(xe);
	+		BUG("needed freeing");
	 
	 	return res;
	 }

But doing so has all tests passing, so either this code is unreachable
or not reachable by any of our tests.

More generally I think this patch is taking the wrong approach, why are
we trying to the members of a stack variable in xdl_do_diff(), when that
variable isn't ours, but is created by our caller? Why aren't they
dealing with it?

The main issue the memory handling is such a pain here is because of
that mixed ownership.

The below POC (and seems to pass tests) shows a way to do that, which I
think in general is *much* simpler. I.e. sure we'll call free()
redundantly some of the time, but that'll be safe as long as we
zero-initialize the relevant struct.

The xdiff code is much harder to read & maintain than it needs to be
because it's trying to micro-optimize these sort of patterns.

E.g. with the diff below we'll now redundantly call the free on a xe2
when we only used a xe1, but "who cares?" is my considered opinion on
that topic :)

By not trying to micro-optimize it like that the memory management and
ownership becomes so much easier to deal with.

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 758410c11ac..6ad30a98b62 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -1054,19 +1054,19 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
 int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
 	xdchange_t *xscr;
-	xdfenv_t xe;
+	xdfenv_t xe = { 0 };
 	emit_func_t ef = xecfg->hunk_func ? xdl_call_hunk_func : xdl_emit_diff;
+	int status = 0;
 
 	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0) {
-
-		return -1;
+		status = -1;
+		goto cleanup;
 	}
 	if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
 	    xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 ||
 	    xdl_build_script(&xe, &xscr) < 0) {
-
-		xdl_free_env(&xe);
-		return -1;
+		status = -1;
+		goto cleanup;
 	}
 	if (xscr) {
 		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
@@ -1078,12 +1078,14 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		if (ef(&xe, xscr, ecb, xecfg) < 0) {
 
 			xdl_free_script(xscr);
-			xdl_free_env(&xe);
-			return -1;
+			status = -1;
+			goto cleanup;
 		}
 		xdl_free_script(xscr);
 	}
+
+cleanup:
 	xdl_free_env(&xe);
 
-	return 0;
+	return status;
 }
diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
index fff0b594f9a..4751eab9c12 100644
--- a/xdiff/xmerge.c
+++ b/xdiff/xmerge.c
@@ -685,7 +685,8 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
 		xmparam_t const *xmp, mmbuffer_t *result)
 {
 	xdchange_t *xscr1, *xscr2;
-	xdfenv_t xe1, xe2;
+	xdfenv_t xe1 = { 0 };
+	xdfenv_t xe2 = { 0 };
 	int status;
 	xpparam_t const *xpp = &xmp->xpp;
 
@@ -693,25 +694,25 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
 	result->size = 0;
 
 	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0) {
-		return -1;
+		status = -1;
+		goto cleanup;
 	}
 	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0) {
-		xdl_free_env(&xe1);
-		return -1;
+		status = -1;
+		goto cleanup;
 	}
 	if (xdl_change_compact(&xe1.xdf1, &xe1.xdf2, xpp->flags) < 0 ||
 	    xdl_change_compact(&xe1.xdf2, &xe1.xdf1, xpp->flags) < 0 ||
 	    xdl_build_script(&xe1, &xscr1) < 0) {
-		xdl_free_env(&xe1);
-		return -1;
+		status = -1;
+		goto cleanup;
 	}
 	if (xdl_change_compact(&xe2.xdf1, &xe2.xdf2, xpp->flags) < 0 ||
 	    xdl_change_compact(&xe2.xdf2, &xe2.xdf1, xpp->flags) < 0 ||
 	    xdl_build_script(&xe2, &xscr2) < 0) {
 		xdl_free_script(xscr1);
-		xdl_free_env(&xe1);
-		xdl_free_env(&xe2);
-		return -1;
+		status = -1;
+		goto cleanup;
 	}
 	status = 0;
 	if (!xscr1) {
@@ -730,6 +731,7 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
 	xdl_free_script(xscr1);
 	xdl_free_script(xscr2);
 
+cleanup:
 	xdl_free_env(&xe1);
 	xdl_free_env(&xe2);
 
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index cfa6e2220ff..63fb2eee975 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -414,7 +414,8 @@ int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
 	 * ranges of lines instead of the whole files.
 	 */
 	mmfile_t subfile1, subfile2;
-	xdfenv_t env;
+	xdfenv_t env = { 0 };
+	int status = 0;
 
 	subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1]->ptr;
 	subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2]->ptr +
@@ -422,13 +423,16 @@ int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
 	subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1]->ptr;
 	subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2]->ptr +
 		diff_env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
-	if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0)
-		return -1;
+	if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0) {
+		status = -1;
+		goto cleanup;
+	}
 
 	memcpy(diff_env->xdf1.rchg + line1 - 1, env.xdf1.rchg, count1);
 	memcpy(diff_env->xdf2.rchg + line2 - 1, env.xdf2.rchg, count2);
 
+cleanup:
 	xdl_free_env(&env);
 
-	return 0;
+	return status;
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/4] xdiff: fix a memory leak
  2022-02-16 10:28     ` Ævar Arnfjörð Bjarmason
@ 2022-02-16 13:49       ` Phillip Wood
  2022-02-16 14:38         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 18+ messages in thread
From: Phillip Wood @ 2022-02-16 13:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Phillip Wood via GitGitGadget
  Cc: git, Edward Thomson, Phillip Wood

Hi Ævar

Thanks for taking a look at this patch

On 16/02/2022 10:28, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Feb 16 2022, Phillip Wood via GitGitGadget wrote:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> Although the patience and histogram algorithms initialize the
>> environment they do not free it if there is an error. In contrast for
>> the Myers algorithm the environment is initalized in xdl_do_diff() and
>> it is freed if there is an error. Fix this by always initializing the
>> environment in xdl_do_diff() and freeing it there if there is an
>> error. Remove the comment in do_patience_diff() about the environment
>> being freed by xdl_diff() as it is not accurate because (a) xdl_diff()
>> does not do that if there is an error and (b) xdl_diff() is not the
>> only caller.
>>
>> Reported-by: Junio C Hamano <gitster@pobox.com>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>>   xdiff/xdiffi.c     | 33 +++++++++++++++++----------------
>>   xdiff/xhistogram.c |  3 ---
>>   xdiff/xpatience.c  |  4 ----
>>   3 files changed, 17 insertions(+), 23 deletions(-)
>>
>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>> index 69689fab247..758410c11ac 100644
>> --- a/xdiff/xdiffi.c
>> +++ b/xdiff/xdiffi.c
>> @@ -315,16 +315,19 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>>   	long *kvd, *kvdf, *kvdb;
>>   	xdalgoenv_t xenv;
>>   	diffdata_t dd1, dd2;
>> +	int res;
>>   
>> -	if (XDF_DIFF_ALG(xpp->flags) == XDF_PATIENCE_DIFF)
>> -		return xdl_do_patience_diff(mf1, mf2, xpp, xe);
>> -
>> -	if (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF)
>> -		return xdl_do_histogram_diff(mf1, mf2, xpp, xe);
>> +	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0)
>> +		return -1;
>>   
>> -	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) {
>> +	if (XDF_DIFF_ALG(xpp->flags) == XDF_PATIENCE_DIFF) {
>> +		res = xdl_do_patience_diff(mf1, mf2, xpp, xe);
>> +		goto out;
>> +	}
>>   
>> -		return -1;
>> +	if (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF) {
>> +		res = xdl_do_histogram_diff(mf1, mf2, xpp, xe);
>> +		goto out;
>>   	}
>>   
>>   	/*
>> @@ -359,17 +362,15 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>>   	dd2.rchg = xe->xdf2.rchg;
>>   	dd2.rindex = xe->xdf2.rindex;
>>   
>> -	if (xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
>> -			 kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0, &xenv) < 0) {
>> -
>> -		xdl_free(kvd);
>> -		xdl_free_env(xe);
>> -		return -1;
>> -	}
>> -
>> +	res = xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
>> +			   kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0,
>> +			   &xenv);
>>   	xdl_free(kvd);
>> + out:
>> +	if (res < 0)
>> +		xdl_free_env(xe);
>>   
>> -	return 0;
>> +	return res;
>>   }
> 
> I wanted to test if this made some diff test pass under SANITIZE=leak
> that didn't before, and my usual quick way to discover that is to run
> the tests with something like this for testing:
> 	
> 	diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> 	index 758410c11ac..7811ce2a1d3 100644
> 	--- a/xdiff/xdiffi.c
> 	+++ b/xdiff/xdiffi.c
> 	@@ -368,7 +368,7 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> 	 	xdl_free(kvd);
> 	  out:
> 	 	if (res < 0)
> 	-		xdl_free_env(xe);
> 	+		BUG("needed freeing");
> 	
> 	 	return res;
> 	 }
> 
> But doing so has all tests passing, so either this code is unreachable
> or not reachable by any of our tests.

Yeah, it is fixing a leak in the error path. The only possible error is 
an allocation failure and we die rather than returning in that case so 
the test suite is not going to tell us anything about this patch.

> More generally I think this patch is taking the wrong approach, why are
> we trying to the members of a stack variable in xdl_do_diff(), when that
> variable isn't ours, but is created by our caller? Why aren't they
> dealing with it?

They are not dealing with it because they do not initialize it - it is 
an "out" parameter that is used to return data to the caller. This patch 
changes the logic to "whoever initializes it is responsible for freeing 
it if there is an error". By doing that we localize the error handling 
to xdl_do_diff() and can leave the callers unchanged.

> The main issue the memory handling is such a pain here is because of
> that mixed ownership.
> 
> The below POC (and seems to pass tests) shows a way to do that, which I
> think in general is *much* simpler. I.e. sure we'll call free()
> redundantly some of the time, but that'll be safe as long as we
> zero-initialize the relevant struct.
> 
> The xdiff code is much harder to read & maintain than it needs to be
> because it's trying to micro-optimize these sort of patterns.
> 
> E.g. with the diff below we'll now redundantly call the free on a xe2
> when we only used a xe1, but "who cares?" is my considered opinion on
> that topic :)

Then you should reread the cover letter and commit message and code 
comment in patch 3 that all mention avoiding a double free in that case. 
If you have any ideas how to make those clearer please let me know.

If xdl_do_diff(orig, mf2, xpp, &xe2) fails it frees any members of xe2 
that it allocated but it does not NULL them. If we then call 
xdl_free_env(&xe2) we will have a double free error.

Best Wishes

Phillip

> By not trying to micro-optimize it like that the memory management and
> ownership becomes so much easier to deal with.
> 
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index 758410c11ac..6ad30a98b62 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -1054,19 +1054,19 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
>   int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>   	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
>   	xdchange_t *xscr;
> -	xdfenv_t xe;
> +	xdfenv_t xe = { 0 };
>   	emit_func_t ef = xecfg->hunk_func ? xdl_call_hunk_func : xdl_emit_diff;
> +	int status = 0;
>   
>   	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0) {
> -
> -		return -1;
> +		status = -1;
> +		goto cleanup;
>   	}
>   	if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
>   	    xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 ||
>   	    xdl_build_script(&xe, &xscr) < 0) {
> -
> -		xdl_free_env(&xe);
> -		return -1;
> +		status = -1;
> +		goto cleanup;
>   	}
>   	if (xscr) {
>   		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
> @@ -1078,12 +1078,14 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>   		if (ef(&xe, xscr, ecb, xecfg) < 0) {
>   
>   			xdl_free_script(xscr);
> -			xdl_free_env(&xe);
> -			return -1;
> +			status = -1;
> +			goto cleanup;
>   		}
>   		xdl_free_script(xscr);
>   	}
> +
> +cleanup:
>   	xdl_free_env(&xe);
>   
> -	return 0;
> +	return status;
>   }
> diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
> index fff0b594f9a..4751eab9c12 100644
> --- a/xdiff/xmerge.c
> +++ b/xdiff/xmerge.c
> @@ -685,7 +685,8 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
>   		xmparam_t const *xmp, mmbuffer_t *result)
>   {
>   	xdchange_t *xscr1, *xscr2;
> -	xdfenv_t xe1, xe2;
> +	xdfenv_t xe1 = { 0 };
> +	xdfenv_t xe2 = { 0 };
>   	int status;
>   	xpparam_t const *xpp = &xmp->xpp;
>   
> @@ -693,25 +694,25 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
>   	result->size = 0;
>   
>   	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0) {
> -		return -1;
> +		status = -1;
> +		goto cleanup;
>   	}
>   	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0) {
> -		xdl_free_env(&xe1);
> -		return -1;
> +		status = -1;
> +		goto cleanup;
>   	}
>   	if (xdl_change_compact(&xe1.xdf1, &xe1.xdf2, xpp->flags) < 0 ||
>   	    xdl_change_compact(&xe1.xdf2, &xe1.xdf1, xpp->flags) < 0 ||
>   	    xdl_build_script(&xe1, &xscr1) < 0) {
> -		xdl_free_env(&xe1);
> -		return -1;
> +		status = -1;
> +		goto cleanup;
>   	}
>   	if (xdl_change_compact(&xe2.xdf1, &xe2.xdf2, xpp->flags) < 0 ||
>   	    xdl_change_compact(&xe2.xdf2, &xe2.xdf1, xpp->flags) < 0 ||
>   	    xdl_build_script(&xe2, &xscr2) < 0) {
>   		xdl_free_script(xscr1);
> -		xdl_free_env(&xe1);
> -		xdl_free_env(&xe2);
> -		return -1;
> +		status = -1;
> +		goto cleanup;
>   	}
>   	status = 0;
>   	if (!xscr1) {
> @@ -730,6 +731,7 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
>   	xdl_free_script(xscr1);
>   	xdl_free_script(xscr2);
>   
> +cleanup:
>   	xdl_free_env(&xe1);
>   	xdl_free_env(&xe2);
>   
> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
> index cfa6e2220ff..63fb2eee975 100644
> --- a/xdiff/xutils.c
> +++ b/xdiff/xutils.c
> @@ -414,7 +414,8 @@ int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
>   	 * ranges of lines instead of the whole files.
>   	 */
>   	mmfile_t subfile1, subfile2;
> -	xdfenv_t env;
> +	xdfenv_t env = { 0 };
> +	int status = 0;
>   
>   	subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1]->ptr;
>   	subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2]->ptr +
> @@ -422,13 +423,16 @@ int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
>   	subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1]->ptr;
>   	subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2]->ptr +
>   		diff_env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
> -	if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0)
> -		return -1;
> +	if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0) {
> +		status = -1;
> +		goto cleanup;
> +	}
>   
>   	memcpy(diff_env->xdf1.rchg + line1 - 1, env.xdf1.rchg, count1);
>   	memcpy(diff_env->xdf2.rchg + line2 - 1, env.xdf2.rchg, count2);
>   
> +cleanup:
>   	xdl_free_env(&env);
>   
> -	return 0;
> +	return status;
>   }


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/4] xdiff: fix a memory leak
  2022-02-16 13:49       ` Phillip Wood
@ 2022-02-16 14:38         ` Ævar Arnfjörð Bjarmason
  2022-02-16 16:55           ` Junio C Hamano
  2022-02-17 11:05           ` Phillip Wood
  0 siblings, 2 replies; 18+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-16 14:38 UTC (permalink / raw)
  To: phillip.wood; +Cc: Phillip Wood via GitGitGadget, git, Edward Thomson


On Wed, Feb 16 2022, Phillip Wood wrote:

> Hi Ævar
>
> Thanks for taking a look at this patch
>
> On 16/02/2022 10:28, Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Feb 16 2022, Phillip Wood via GitGitGadget wrote:
>> 
>>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>>
>>> Although the patience and histogram algorithms initialize the
>>> environment they do not free it if there is an error. In contrast for
>>> the Myers algorithm the environment is initalized in xdl_do_diff() and
>>> it is freed if there is an error. Fix this by always initializing the
>>> environment in xdl_do_diff() and freeing it there if there is an
>>> error. Remove the comment in do_patience_diff() about the environment
>>> being freed by xdl_diff() as it is not accurate because (a) xdl_diff()
>>> does not do that if there is an error and (b) xdl_diff() is not the
>>> only caller.
>>>
>>> Reported-by: Junio C Hamano <gitster@pobox.com>
>>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>>> ---
>>>   xdiff/xdiffi.c     | 33 +++++++++++++++++----------------
>>>   xdiff/xhistogram.c |  3 ---
>>>   xdiff/xpatience.c  |  4 ----
>>>   3 files changed, 17 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>>> index 69689fab247..758410c11ac 100644
>>> --- a/xdiff/xdiffi.c
>>> +++ b/xdiff/xdiffi.c
>>> @@ -315,16 +315,19 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>>>   	long *kvd, *kvdf, *kvdb;
>>>   	xdalgoenv_t xenv;
>>>   	diffdata_t dd1, dd2;
>>> +	int res;
>>>   -	if (XDF_DIFF_ALG(xpp->flags) == XDF_PATIENCE_DIFF)
>>> -		return xdl_do_patience_diff(mf1, mf2, xpp, xe);
>>> -
>>> -	if (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF)
>>> -		return xdl_do_histogram_diff(mf1, mf2, xpp, xe);
>>> +	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0)
>>> +		return -1;
>>>   -	if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) {
>>> +	if (XDF_DIFF_ALG(xpp->flags) == XDF_PATIENCE_DIFF) {
>>> +		res = xdl_do_patience_diff(mf1, mf2, xpp, xe);
>>> +		goto out;
>>> +	}
>>>   -		return -1;
>>> +	if (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF) {
>>> +		res = xdl_do_histogram_diff(mf1, mf2, xpp, xe);
>>> +		goto out;
>>>   	}
>>>     	/*
>>> @@ -359,17 +362,15 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>>>   	dd2.rchg = xe->xdf2.rchg;
>>>   	dd2.rindex = xe->xdf2.rindex;
>>>   -	if (xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
>>> -			 kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0, &xenv) < 0) {
>>> -
>>> -		xdl_free(kvd);
>>> -		xdl_free_env(xe);
>>> -		return -1;
>>> -	}
>>> -
>>> +	res = xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
>>> +			   kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0,
>>> +			   &xenv);
>>>   	xdl_free(kvd);
>>> + out:
>>> +	if (res < 0)
>>> +		xdl_free_env(xe);
>>>   -	return 0;
>>> +	return res;
>>>   }
>> I wanted to test if this made some diff test pass under
>> SANITIZE=leak
>> that didn't before, and my usual quick way to discover that is to run
>> the tests with something like this for testing:
>> 	
>> 	diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>> 	index 758410c11ac..7811ce2a1d3 100644
>> 	--- a/xdiff/xdiffi.c
>> 	+++ b/xdiff/xdiffi.c
>> 	@@ -368,7 +368,7 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>> 	 	xdl_free(kvd);
>> 	  out:
>> 	 	if (res < 0)
>> 	-		xdl_free_env(xe);
>> 	+		BUG("needed freeing");
>> 	
>> 	 	return res;
>> 	 }
>> But doing so has all tests passing, so either this code is
>> unreachable
>> or not reachable by any of our tests.
>
> Yeah, it is fixing a leak in the error path. The only possible error
> is an allocation failure and we die rather than returning in that case
> so the test suite is not going to tell us anything about this patch.

Indeed. I meant that comment as "here's the first thing I tried, and
when that didn't do anything I went digging further, and...".

>> More generally I think this patch is taking the wrong approach, why are
>> we trying to the members of a stack variable in xdl_do_diff(), when that
>> variable isn't ours, but is created by our caller? Why aren't they
>> dealing with it?
>
> They are not dealing with it because they do not initialize it - it is
> an "out" parameter that is used to return data to the caller. This
> patch changes the logic to "whoever initializes it is responsible for
> freeing it if there is an error". By doing that we localize the error
> handling to xdl_do_diff() and can leave the callers unchanged.

Yes, I'm saying that we're needlessly piling on complexity by continuing
with this pattern in the xdiff/ codebase. I think it's fair to question
the direction in general.

>> The main issue the memory handling is such a pain here is because of
>> that mixed ownership.
>> The below POC (and seems to pass tests) shows a way to do that,
>> which I
>> think in general is *much* simpler. I.e. sure we'll call free()
>> redundantly some of the time, but that'll be safe as long as we
>> zero-initialize the relevant struct.
>> The xdiff code is much harder to read & maintain than it needs to be
>> because it's trying to micro-optimize these sort of patterns.
>> E.g. with the diff below we'll now redundantly call the free on a
>> xe2
>> when we only used a xe1, but "who cares?" is my considered opinion on
>> that topic :)
>
> Then you should reread the cover letter and commit message and code
> comment in patch 3 that all mention avoiding a double free in that
> case. If you have any ideas how to make those clearer please let me
> know.

To use FREE_AND_NULL() instead of free()? Then double-freeing stops
being an issue, and we don't need to do the accounting.

> If xdl_do_diff(orig, mf2, xpp, &xe2) fails it frees any members of xe2
> that it allocated but it does not NULL them. If we then call 
> xdl_free_env(&xe2) we will have a double free error.

Right. I'm saying that if you follow the chain from xdl_do_diff() down
we call e.g. xdl_do_patience_diff(), which then calls xdl_prepare_env(),
which will on failure call e.g. xdl_free_ctx(&xe->xdf2), i.e. freeing
specific things inside the "xe" struct that it specifically knows it
allocated on those codepaths.

If we instead stick to the rule that he who created the stack variable
is responsible for freeing it, we generally speaking avoid the need for
FREE_AND_NULL() over free(), but in either case the control flow becomes
much simpler and less verbose.

I.e. we do away with the incremental freeing because some function 3
stack frames down knows it populated 1/10 struct fields, instead it all
becomes:

 1. Do we own this variable? No, return -1 then. No freeing here.
 2. The thing that owns the variable calls free_thingy(&x)
 3. We created "x" with "x = { 0 }" so calling free() on all its fields is safe
 4. We do just that.

It's a lot simpler at the cost of a tiny bit of redundant work, but in
these cases we're on the error path anyway, so it won't be a performance
issue.

That's how we tend to do it with all of git's internal APIs as far as
freeing is concerned, xdiff doesn't do it because it's externally
imported. But I think it's a better direction, and something that wolud
work for both git.git and libgit2.git (and the diff/code size isn't that
different).

>
>> By not trying to micro-optimize it like that the memory management and
>> ownership becomes so much easier to deal with.
>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>> index 758410c11ac..6ad30a98b62 100644
>> --- a/xdiff/xdiffi.c
>> +++ b/xdiff/xdiffi.c
>> @@ -1054,19 +1054,19 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
>>   int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>>   	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
>>   	xdchange_t *xscr;
>> -	xdfenv_t xe;
>> +	xdfenv_t xe = { 0 };
>>   	emit_func_t ef = xecfg->hunk_func ? xdl_call_hunk_func : xdl_emit_diff;
>> +	int status = 0;
>>     	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0) {
>> -
>> -		return -1;
>> +		status = -1;
>> +		goto cleanup;
>>   	}
>>   	if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
>>   	    xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 ||
>>   	    xdl_build_script(&xe, &xscr) < 0) {
>> -
>> -		xdl_free_env(&xe);
>> -		return -1;
>> +		status = -1;
>> +		goto cleanup;
>>   	}
>>   	if (xscr) {
>>   		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
>> @@ -1078,12 +1078,14 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>>   		if (ef(&xe, xscr, ecb, xecfg) < 0) {
>>     			xdl_free_script(xscr);
>> -			xdl_free_env(&xe);
>> -			return -1;
>> +			status = -1;
>> +			goto cleanup;
>>   		}
>>   		xdl_free_script(xscr);
>>   	}
>> +
>> +cleanup:
>>   	xdl_free_env(&xe);
>>   -	return 0;
>> +	return status;
>>   }
>> diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
>> index fff0b594f9a..4751eab9c12 100644
>> --- a/xdiff/xmerge.c
>> +++ b/xdiff/xmerge.c
>> @@ -685,7 +685,8 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
>>   		xmparam_t const *xmp, mmbuffer_t *result)
>>   {
>>   	xdchange_t *xscr1, *xscr2;
>> -	xdfenv_t xe1, xe2;
>> +	xdfenv_t xe1 = { 0 };
>> +	xdfenv_t xe2 = { 0 };
>>   	int status;
>>   	xpparam_t const *xpp = &xmp->xpp;
>>   @@ -693,25 +694,25 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1,
>> mmfile_t *mf2,
>>   	result->size = 0;
>>     	if (xdl_do_diff(orig, mf1, xpp, &xe1) < 0) {
>> -		return -1;
>> +		status = -1;
>> +		goto cleanup;
>>   	}
>>   	if (xdl_do_diff(orig, mf2, xpp, &xe2) < 0) {
>> -		xdl_free_env(&xe1);
>> -		return -1;
>> +		status = -1;
>> +		goto cleanup;
>>   	}
>>   	if (xdl_change_compact(&xe1.xdf1, &xe1.xdf2, xpp->flags) < 0 ||
>>   	    xdl_change_compact(&xe1.xdf2, &xe1.xdf1, xpp->flags) < 0 ||
>>   	    xdl_build_script(&xe1, &xscr1) < 0) {
>> -		xdl_free_env(&xe1);
>> -		return -1;
>> +		status = -1;
>> +		goto cleanup;
>>   	}
>>   	if (xdl_change_compact(&xe2.xdf1, &xe2.xdf2, xpp->flags) < 0 ||
>>   	    xdl_change_compact(&xe2.xdf2, &xe2.xdf1, xpp->flags) < 0 ||
>>   	    xdl_build_script(&xe2, &xscr2) < 0) {
>>   		xdl_free_script(xscr1);
>> -		xdl_free_env(&xe1);
>> -		xdl_free_env(&xe2);
>> -		return -1;
>> +		status = -1;
>> +		goto cleanup;
>>   	}
>>   	status = 0;
>>   	if (!xscr1) {
>> @@ -730,6 +731,7 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, mmfile_t *mf2,
>>   	xdl_free_script(xscr1);
>>   	xdl_free_script(xscr2);
>>   +cleanup:
>>   	xdl_free_env(&xe1);
>>   	xdl_free_env(&xe2);
>>   diff --git a/xdiff/xutils.c b/xdiff/xutils.c
>> index cfa6e2220ff..63fb2eee975 100644
>> --- a/xdiff/xutils.c
>> +++ b/xdiff/xutils.c
>> @@ -414,7 +414,8 @@ int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
>>   	 * ranges of lines instead of the whole files.
>>   	 */
>>   	mmfile_t subfile1, subfile2;
>> -	xdfenv_t env;
>> +	xdfenv_t env = { 0 };
>> +	int status = 0;
>>     	subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1]->ptr;
>>   	subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2]->ptr +
>> @@ -422,13 +423,16 @@ int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
>>   	subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1]->ptr;
>>   	subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2]->ptr +
>>   		diff_env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
>> -	if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0)
>> -		return -1;
>> +	if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0) {
>> +		status = -1;
>> +		goto cleanup;
>> +	}
>>     	memcpy(diff_env->xdf1.rchg + line1 - 1, env.xdf1.rchg,
>> count1);
>>   	memcpy(diff_env->xdf2.rchg + line2 - 1, env.xdf2.rchg, count2);
>>   +cleanup:
>>   	xdl_free_env(&env);
>>   -	return 0;
>> +	return status;
>>   }


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/4] xdiff: fix a memory leak
  2022-02-16 14:38         ` Ævar Arnfjörð Bjarmason
@ 2022-02-16 16:55           ` Junio C Hamano
  2022-02-17 11:05           ` Phillip Wood
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2022-02-16 16:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: phillip.wood, Phillip Wood via GitGitGadget, git, Edward Thomson

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> They are not dealing with it because they do not initialize it - it is
>> an "out" parameter that is used to return data to the caller. This
>> patch changes the logic to "whoever initializes it is responsible for
>> freeing it if there is an error". By doing that we localize the error
>> handling to xdl_do_diff() and can leave the callers unchanged.
>
> Yes, I'm saying that we're needlessly piling on complexity by continuing
> with this pattern in the xdiff/ codebase. I think it's fair to question
> the direction in general.

It is perfectly OK to question, but I'd prefer to see local
consistency.

Thanks.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/4] xdiff: fix a memory leak
  2022-02-16 14:38         ` Ævar Arnfjörð Bjarmason
  2022-02-16 16:55           ` Junio C Hamano
@ 2022-02-17 11:05           ` Phillip Wood
  1 sibling, 0 replies; 18+ messages in thread
From: Phillip Wood @ 2022-02-17 11:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, phillip.wood
  Cc: Phillip Wood via GitGitGadget, git, Edward Thomson

On 16/02/2022 14:38, Ævar Arnfjörð Bjarmason wrote:
> [...]
>>> More generally I think this patch is taking the wrong approach, why are
>>> we trying to the members of a stack variable in xdl_do_diff(), when that
>>> variable isn't ours, but is created by our caller? Why aren't they
>>> dealing with it?
>>
>> They are not dealing with it because they do not initialize it - it is
>> an "out" parameter that is used to return data to the caller. This
>> patch changes the logic to "whoever initializes it is responsible for
>> freeing it if there is an error". By doing that we localize the error
>> handling to xdl_do_diff() and can leave the callers unchanged.
> 
> Yes, I'm saying that we're needlessly piling on complexity by continuing
> with this pattern in the xdiff/ codebase. I think it's fair to question
> the direction in general.

I think if there were a lot of these bugs that needed fixing then the 
wholesale changes to the way memory is managed you seem to be suggesting 
would be worth it. However I think the existing code is mostly correct 
so such a change would be a lot of churn and would absorb a lot of 
reviewer time that would be better spent elsewhere compared to fixing up 
the code we have.

Best Wishes

Phillip

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-02-17 11:06 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-09 10:59 [PATCH 0/3] xdiff: handle allocation failures Phillip Wood via GitGitGadget
2022-02-09 10:59 ` [PATCH 1/3] xdiff: handle allocation failure in patience diff Phillip Wood via GitGitGadget
2022-02-09 10:59 ` [PATCH 2/3] xdiff: refactor a function Phillip Wood via GitGitGadget
2022-02-09 18:04   ` Junio C Hamano
2022-02-10  6:28     ` Junio C Hamano
2022-02-11 15:19       ` Phillip Wood
2022-02-11 16:46         ` Junio C Hamano
2022-02-09 10:59 ` [PATCH 3/3] xdiff: handle allocation failure when merging Phillip Wood via GitGitGadget
2022-02-16 10:15 ` [PATCH v2 0/4] xdiff: handle allocation failures Phillip Wood via GitGitGadget
2022-02-16 10:15   ` [PATCH v2 1/4] xdiff: fix a memory leak Phillip Wood via GitGitGadget
2022-02-16 10:28     ` Ævar Arnfjörð Bjarmason
2022-02-16 13:49       ` Phillip Wood
2022-02-16 14:38         ` Ævar Arnfjörð Bjarmason
2022-02-16 16:55           ` Junio C Hamano
2022-02-17 11:05           ` Phillip Wood
2022-02-16 10:15   ` [PATCH v2 2/4] xdiff: handle allocation failure in patience diff Phillip Wood via GitGitGadget
2022-02-16 10:15   ` [PATCH v2 3/4] xdiff: refactor a function Phillip Wood via GitGitGadget
2022-02-16 10:15   ` [PATCH v2 4/4] xdiff: handle allocation failure when merging Phillip Wood via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).