git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Bug report: git -L requires excessive memory.
@ 2022-10-29 16:59 man dog
  2022-10-31 21:45 ` SZEDER Gábor
  0 siblings, 1 reply; 17+ messages in thread
From: man dog @ 2022-10-29 16:59 UTC (permalink / raw)
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)
git log -L /regex/,/regex/:myfile to a repo in which 2MB text file is
committed about 2800 times.

What did you expect to happen? (Expected behavior)
get the result.

What happened instead? (Actual behavior)
fatal: Out of memory, malloc failed (tried to allocate 2346801 bytes)

What's different between what you expected and what actually happened?
The function requires too much memory.
-n option should work for -L function.

Anything else you want to add:
I made a script to reproduce this. Please run the script below.
Results in each environments are in its header.
A workaround which is given in other BBS is included also.




#!/bin/bash
#
# Bug report: git -L requires excessive memory.
# Run this script to reproduce
#
# MINGW32(git version 2.38.1.windows.1) fatal: Out of memory, malloc
failed (tried to allocate 2346801 bytes)
# MINGW64(git version 2.38.1.windows.1) requires  8.6GB
# Linux64(git version 2.20.1          ) requires 13.1GB
#

git --version

if [ ! -d .git ]; then
  git init
  c=${1:-3000}
  for (( i=0;i<c;i++)); do
    gawk -v r="$i" '
      BEGIN{
        for (i=0;i<100;i++) {
          if (r>=i) {
            printf("function func_%03d(){ // revised at %d\n",i,
int((r-i)/100)*100+i%100)
            printf("  // contents of function\n")
            printf("}\n")
            make_subfuncs(i);
          }
        }
        exit
      }
      function make_subfuncs(i,    j){
        for (j=0;j<300;j++) {
          printf("function func_%03d_sub%03d(){\n",i,j)
          printf("  // contents of sub functions are NOT revised.\n")
          printf("}\n")
        }
      }' > test.txt
    git add test.txt
    git commit -m "revision $i"
  done
  git gc
fi

git log -L /func_007\(/,/}$/:test.txt # this command requires excessive memory.
git log -L /func_007\(/,/}$/:test.txt -n 10 # -n option doesn't work also.
#git log -L /func_007\(/,/}$/:test.txt HEAD~10..HEAD~0 # this works.

#
# This can be a workaround
#
step=50
num=`git log | grep -c commit`
for ((i=0;i<$num;i+=$step)); do
  end=$((i+$step))
  range=HEAD~$end..HEAD~$i
  if [ $end -ge $num ]; then
    range=HEAD~$i
  fi
#  echo $range
  git --no-pager log -L /func_007\(/,/}$/:test.txt $range
done




[System Info]
[Enabled Hooks]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug report: git -L requires excessive memory.
  2022-10-29 16:59 Bug report: git -L requires excessive memory man dog
@ 2022-10-31 21:45 ` SZEDER Gábor
  2022-10-31 21:56   ` Taylor Blau
  0 siblings, 1 reply; 17+ messages in thread
From: SZEDER Gábor @ 2022-10-31 21:45 UTC (permalink / raw)
  To: man dog; +Cc: git

On Sun, Oct 30, 2022 at 01:59:41AM +0900, man dog wrote:
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
> 
> What did you do before the bug happened? (Steps to reproduce your issue)
> git log -L /regex/,/regex/:myfile to a repo in which 2MB text file is
> committed about 2800 times.
> 
> What did you expect to happen? (Expected behavior)
> get the result.
> 
> What happened instead? (Actual behavior)
> fatal: Out of memory, malloc failed (tried to allocate 2346801 bytes)

Thanks for the report and the reproduction recipe.

This is not a buggy allocation (the size matches the size of the test
file + 1 byte), but the line-level log apparently leaks some memory
for each commit modifying the file in question, and in your case their
combined size is excessive because of the somewhat big file that is
modified in every commit.

'line-log.c' contains two "NEEDSWORK leaking like a sieve" comments,
but you managed to stumble upon yet another case (those two are in the
code path handling merge commits, but your history is linear).

The patch below plugs this leak.

  ---  >8  ---

diff --git a/line-log.c b/line-log.c
index 51d93310a4..b6ea82ac6b 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1195,6 +1195,9 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c
 	if (parent)
 		add_line_range(rev, parent, parent_range);
 	free_line_log_data(parent_range);
+	for (int i = 0; i < queue.nr; i++)
+		diff_free_filepair(queue.queue[i]);
+	free(queue.queue);
 	return changed;
 }
 
  ---  8<  ---

> What's different between what you expected and what actually happened?
> The function requires too much memory.
> -n option should work for -L function.

Line-level log does work with '-n', but the file is so big and is
modified so many times between commits that do touch the specified
line range, that by the time it gets to the 10th commit to show it has
already leaked over 4GB memory.  Had you specified an even smaller
number of commits to show it might have worked:

  $ for i in {1..7} ; do /usr/bin/time -f "n: $i  maxRSS: %M" git log -L /func_007\(/,/}$/:test.txt -n $i >/dev/null || break ; done
  n: 1  maxRSS: 531192
  n: 2  maxRSS: 989504
  n: 3  maxRSS: 1447900
  n: 4  maxRSS: 1906408
  n: 5  maxRSS: 2364740
  n: 6  maxRSS: 2823148
  n: 7  maxRSS: 3282360

In you reproduction recipe the given line range is modified every 100
commit and there are 3000 commits in total, so I estimate the total
memory usage to be somewhere around 13.5GB.  With the patch above it
tops out at around 260MB.


> Anything else you want to add:
> I made a script to reproduce this. Please run the script below.
> Results in each environments are in its header.
> A workaround which is given in other BBS is included also.
> 
> 
> 
> 
> #!/bin/bash
> #
> # Bug report: git -L requires excessive memory.
> # Run this script to reproduce
> #
> # MINGW32(git version 2.38.1.windows.1) fatal: Out of memory, malloc
> failed (tried to allocate 2346801 bytes)
> # MINGW64(git version 2.38.1.windows.1) requires  8.6GB
> # Linux64(git version 2.20.1          ) requires 13.1GB
> #
> 
> git --version
> 
> if [ ! -d .git ]; then
>   git init
>   c=${1:-3000}
>   for (( i=0;i<c;i++)); do
>     gawk -v r="$i" '
>       BEGIN{
>         for (i=0;i<100;i++) {
>           if (r>=i) {
>             printf("function func_%03d(){ // revised at %d\n",i,
> int((r-i)/100)*100+i%100)
>             printf("  // contents of function\n")
>             printf("}\n")
>             make_subfuncs(i);
>           }
>         }
>         exit
>       }
>       function make_subfuncs(i,    j){
>         for (j=0;j<300;j++) {
>           printf("function func_%03d_sub%03d(){\n",i,j)
>           printf("  // contents of sub functions are NOT revised.\n")
>           printf("}\n")
>         }
>       }' > test.txt
>     git add test.txt
>     git commit -m "revision $i"
>   done
>   git gc
> fi
> 
> git log -L /func_007\(/,/}$/:test.txt # this command requires excessive memory.
> git log -L /func_007\(/,/}$/:test.txt -n 10 # -n option doesn't work also.
> #git log -L /func_007\(/,/}$/:test.txt HEAD~10..HEAD~0 # this works.

Perhaps I misunderstood, but I got the impression that you think that
'HEAD~10..HEAD~0' and '-n 10' do the same.

They are not: 'HEAD~10..HEAD~0' means to process only the last ten
commits, so it can't leak all that much, and that's why it worked.
'-n 10', however, means to _show_ only ten commits, but process as
many commits as necessary to find those ten.  In your case, with the
line range being modified every 100 commit, that amounts to processing
over 1000 commits.

> #
> # This can be a workaround
> #
> step=50
> num=`git log | grep -c commit`
> for ((i=0;i<$num;i+=$step)); do
>   end=$((i+$step))
>   range=HEAD~$end..HEAD~$i
>   if [ $end -ge $num ]; then
>     range=HEAD~$i
>   fi
> #  echo $range
>   git --no-pager log -L /func_007\(/,/}$/:test.txt $range
> done
> 
> 
> 
> 
> [System Info]
> [Enabled Hooks]

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Bug report: git -L requires excessive memory.
  2022-10-31 21:45 ` SZEDER Gábor
@ 2022-10-31 21:56   ` Taylor Blau
  2022-11-02 22:01     ` [PATCH 0/3] line-log: plug some memory leaks SZEDER Gábor
  0 siblings, 1 reply; 17+ messages in thread
From: Taylor Blau @ 2022-10-31 21:56 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: man dog, git

On Mon, Oct 31, 2022 at 10:45:54PM +0100, SZEDER Gábor wrote:
> 'line-log.c' contains two "NEEDSWORK leaking like a sieve" comments,
> but you managed to stumble upon yet another case (those two are in the
> code path handling merge commits, but your history is linear).

;-).

> The patch below plugs this leak.
>
>   ---  >8  ---
>
> diff --git a/line-log.c b/line-log.c
> index 51d93310a4..b6ea82ac6b 100644
> --- a/line-log.c
> +++ b/line-log.c
> @@ -1195,6 +1195,9 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c
>  	if (parent)
>  		add_line_range(rev, parent, parent_range);
>  	free_line_log_data(parent_range);
> +	for (int i = 0; i < queue.nr; i++)
> +		diff_free_filepair(queue.queue[i]);
> +	free(queue.queue);
>  	return changed;
>  }
>
>   ---  8<  ---

Very nice. I wouldn't be surprised if there are other leaks, in the
line-log code, but fixing one is good regardless. If you resend with
your S-o-b, I'd be happy to queue it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/3] line-log: plug some memory leaks
  2022-10-31 21:56   ` Taylor Blau
@ 2022-11-02 22:01     ` SZEDER Gábor
  2022-11-02 22:01       ` [PATCH 1/3] line-log: free diff queue when processing non-merge commits SZEDER Gábor
                         ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: SZEDER Gábor @ 2022-11-02 22:01 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, man dog, SZEDER Gábor

The first patch plugs the reported big memory leak, the second one
plugs a minor leak, and the little cleanup in the third puts the
cherry on top.

SZEDER Gábor (3):
  line-log: free diff queue when processing non-merge commits
  line-log: free the diff queues' arrays when processing merge commits
  diff.c: use diff_free_queue()

 diff.c     | 17 +++++++++--------
 diffcore.h |  1 +
 line-log.c |  7 +++----
 3 files changed, 13 insertions(+), 12 deletions(-)

-- 
2.38.1.564.g99c012faba


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] line-log: free diff queue when processing non-merge commits
  2022-11-02 22:01     ` [PATCH 0/3] line-log: plug some memory leaks SZEDER Gábor
@ 2022-11-02 22:01       ` SZEDER Gábor
  2022-11-03  0:20         ` Taylor Blau
  2022-11-02 22:01       ` [PATCH 2/3] line-log: free the diff queues' arrays when processing merge commits SZEDER Gábor
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: SZEDER Gábor @ 2022-11-02 22:01 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, man dog, SZEDER Gábor

When processing a non-merge commit, the line-level log first asks the
tree-diff machinery whether any of the files in the given line ranges
were modified between the current commit and its parent, and if some
of them were, then it loads the contents of those files from both
commits to see whether their line ranges were modified and/or need to
be adjusted.  Alas, it doesn't free() the diff queue holding the
results of that query and the contents of those files once its done.
This can add up to a substantial amount of leaked memory, especially
when the file in question is big and is frequently modified: a user
reported "Out of memory, malloc failed" errors with a 2MB text file
that was modified ~2800 times [1] (I estimate the leak would use up
almost 11GB memory in that case).

Free that diff queue to plug this memory leak.  However, instead of
simply open-coding the necessary three lines, add them as a helper
function to the diff API, because it will be useful elsewhere as well.

[1] https://public-inbox.org/git/CAFOPqVXz2XwzX8vGU7wLuqb2ZuwTuOFAzBLRM_QPk+NJa=eC-g@mail.gmail.com/

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 diff.c     | 7 +++++++
 diffcore.h | 1 +
 line-log.c | 1 +
 3 files changed, 9 insertions(+)

diff --git a/diff.c b/diff.c
index 35e46dd968..ef94175163 100644
--- a/diff.c
+++ b/diff.c
@@ -5773,6 +5773,13 @@ void diff_free_filepair(struct diff_filepair *p)
 	free(p);
 }
 
+void diff_free_queue(struct diff_queue_struct *q)
+{
+	for (int i = 0; i < q->nr; i++)
+		diff_free_filepair(q->queue[i]);
+	free(q->queue);
+}
+
 const char *diff_aligned_abbrev(const struct object_id *oid, int len)
 {
 	int abblen;
diff --git a/diffcore.h b/diffcore.h
index badc2261c2..9b588a1ee1 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -162,6 +162,7 @@ struct diff_filepair *diff_queue(struct diff_queue_struct *,
 				 struct diff_filespec *,
 				 struct diff_filespec *);
 void diff_q(struct diff_queue_struct *, struct diff_filepair *);
+void diff_free_queue(struct diff_queue_struct *q);
 
 /* dir_rename_relevance: the reason we want rename information for a dir */
 enum dir_rename_relevance {
diff --git a/line-log.c b/line-log.c
index 51d93310a4..7a74daf2e8 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1195,6 +1195,7 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c
 	if (parent)
 		add_line_range(rev, parent, parent_range);
 	free_line_log_data(parent_range);
+	diff_free_queue(&queue);
 	return changed;
 }
 
-- 
2.38.1.564.g99c012faba


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] line-log: free the diff queues' arrays when processing merge commits
  2022-11-02 22:01     ` [PATCH 0/3] line-log: plug some memory leaks SZEDER Gábor
  2022-11-02 22:01       ` [PATCH 1/3] line-log: free diff queue when processing non-merge commits SZEDER Gábor
@ 2022-11-02 22:01       ` SZEDER Gábor
  2022-11-03  0:21         ` Taylor Blau
  2022-11-02 22:01       ` [PATCH 3/3] diff.c: use diff_free_queue() SZEDER Gábor
  2022-11-03  9:05       ` [PATCH 0/3] line-log: plug some memory leaks Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 17+ messages in thread
From: SZEDER Gábor @ 2022-11-02 22:01 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, man dog, SZEDER Gábor

When processing merge commits, the line-level log first creates an
array of diff queues, each comparing the merge commit with one of its
parents, to check whether any of the files in the given line ranges
were modified.  Alas, when freeing these queues it only frees the
filepairs in the queues, but not the queues' internal arrays holding
pointers to those filepairs.

Use the diff_free_queue() helper function introduced in the previous
commit to free the diff queues' internal arrays as well.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 line-log.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/line-log.c b/line-log.c
index 7a74daf2e8..a7f3e7f6ce 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1089,10 +1089,8 @@ static struct diff_filepair *diff_filepair_dup(struct diff_filepair *pair)
 
 static void free_diffqueues(int n, struct diff_queue_struct *dq)
 {
-	int i, j;
-	for (i = 0; i < n; i++)
-		for (j = 0; j < dq[i].nr; j++)
-			diff_free_filepair(dq[i].queue[j]);
+	for (int i = 0; i < n; i++)
+		diff_free_queue(&dq[i]);
 	free(dq);
 }
 
-- 
2.38.1.564.g99c012faba


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] diff.c: use diff_free_queue()
  2022-11-02 22:01     ` [PATCH 0/3] line-log: plug some memory leaks SZEDER Gábor
  2022-11-02 22:01       ` [PATCH 1/3] line-log: free diff queue when processing non-merge commits SZEDER Gábor
  2022-11-02 22:01       ` [PATCH 2/3] line-log: free the diff queues' arrays when processing merge commits SZEDER Gábor
@ 2022-11-02 22:01       ` SZEDER Gábor
  2022-11-03  0:24         ` Taylor Blau
  2022-11-03  9:05       ` [PATCH 0/3] line-log: plug some memory leaks Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 17+ messages in thread
From: SZEDER Gábor @ 2022-11-02 22:01 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, man dog, SZEDER Gábor

Use diff_free_queue() instead of open-coding it.  This shortens the
code and make it less repetitive.

Note that the second hunk in diff_flush() is interesting, because the
'free_queue' label separates the loop freeing the queue's filepairs
from free()-ing the queue's internal array.  This is somewhat
suspicious, but it was not an issue before: there is only one place
from where we jump to this label with a goto, and that is protected by
an 'if (!q->nr && ...)' condition, i.e. we only skipped the loop
freeing the filepairs when there were no filepairs in the queue to
begin with.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 diff.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index ef94175163..03e6ffb5e4 100644
--- a/diff.c
+++ b/diff.c
@@ -6337,13 +6337,9 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
-	int i;
 	int result = diff_get_patch_id(options, oid, diff_header_only);
 
-	for (i = 0; i < q->nr; i++)
-		diff_free_filepair(q->queue[i]);
-
-	free(q->queue);
+	diff_free_queue(q);
 	DIFF_QUEUE_CLEAR(q);
 
 	return result;
@@ -6612,10 +6608,8 @@ void diff_flush(struct diff_options *options)
 	if (output_format & DIFF_FORMAT_CALLBACK)
 		options->format_callback(q, options, options->format_callback_data);
 
-	for (i = 0; i < q->nr; i++)
-		diff_free_filepair(q->queue[i]);
 free_queue:
-	free(q->queue);
+	diff_free_queue(q);
 	DIFF_QUEUE_CLEAR(q);
 	diff_free(options);
 
-- 
2.38.1.564.g99c012faba


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] line-log: free diff queue when processing non-merge commits
  2022-11-02 22:01       ` [PATCH 1/3] line-log: free diff queue when processing non-merge commits SZEDER Gábor
@ 2022-11-03  0:20         ` Taylor Blau
  2022-11-07 15:11           ` SZEDER Gábor
  0 siblings, 1 reply; 17+ messages in thread
From: Taylor Blau @ 2022-11-03  0:20 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, Taylor Blau, man dog

On Wed, Nov 02, 2022 at 11:01:40PM +0100, SZEDER Gábor wrote:
> When processing a non-merge commit, the line-level log first asks the
> tree-diff machinery whether any of the files in the given line ranges
> were modified between the current commit and its parent, and if some
> of them were, then it loads the contents of those files from both
> commits to see whether their line ranges were modified and/or need to
> be adjusted.  Alas, it doesn't free() the diff queue holding the
> results of that query and the contents of those files once its done.
> This can add up to a substantial amount of leaked memory, especially
> when the file in question is big and is frequently modified: a user
> reported "Out of memory, malloc failed" errors with a 2MB text file
> that was modified ~2800 times [1] (I estimate the leak would use up
> almost 11GB memory in that case).
>
> Free that diff queue to plug this memory leak.  However, instead of
> simply open-coding the necessary three lines, add them as a helper
> function to the diff API, because it will be useful elsewhere as well.

Nicely explained.

> ---
>  diff.c     | 7 +++++++
>  diffcore.h | 1 +
>  line-log.c | 1 +
>  3 files changed, 9 insertions(+)

And all looks reasonable here, good...

> diff --git a/diff.c b/diff.c
> index 35e46dd968..ef94175163 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -5773,6 +5773,13 @@ void diff_free_filepair(struct diff_filepair *p)
>  	free(p);
>  }
>
> +void diff_free_queue(struct diff_queue_struct *q)
> +{
> +	for (int i = 0; i < q->nr; i++)
> +		diff_free_filepair(q->queue[i]);
> +	free(q->queue);
> +}

Though I wonder, should diff_free_queue() be a noop when q is NULL? The
caller in process_ranges_ordinary_commit() doesn't care, of course,
since q is always non-NULL there.

But if we're making it part of the diff API, we should probably err on
the side of flexibility.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] line-log: free the diff queues' arrays when processing merge commits
  2022-11-02 22:01       ` [PATCH 2/3] line-log: free the diff queues' arrays when processing merge commits SZEDER Gábor
@ 2022-11-03  0:21         ` Taylor Blau
  0 siblings, 0 replies; 17+ messages in thread
From: Taylor Blau @ 2022-11-03  0:21 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, Taylor Blau, man dog

On Wed, Nov 02, 2022 at 11:01:41PM +0100, SZEDER Gábor wrote:
> ---
>  line-log.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)

Looking good.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] diff.c: use diff_free_queue()
  2022-11-02 22:01       ` [PATCH 3/3] diff.c: use diff_free_queue() SZEDER Gábor
@ 2022-11-03  0:24         ` Taylor Blau
  2022-11-07 16:13           ` SZEDER Gábor
  0 siblings, 1 reply; 17+ messages in thread
From: Taylor Blau @ 2022-11-03  0:24 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, Taylor Blau, man dog

On Wed, Nov 02, 2022 at 11:01:42PM +0100, SZEDER Gábor wrote:
> Use diff_free_queue() instead of open-coding it.  This shortens the
> code and make it less repetitive.
>
> Note that the second hunk in diff_flush() is interesting, because the
> 'free_queue' label separates the loop freeing the queue's filepairs
> from free()-ing the queue's internal array.  This is somewhat
> suspicious, but it was not an issue before: there is only one place
> from where we jump to this label with a goto, and that is protected by
> an 'if (!q->nr && ...)' condition, i.e. we only skipped the loop
> freeing the filepairs when there were no filepairs in the queue to
> begin with.
>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  diff.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index ef94175163..03e6ffb5e4 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6337,13 +6337,9 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
>  int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
>  {
>  	struct diff_queue_struct *q = &diff_queued_diff;
> -	int i;
>  	int result = diff_get_patch_id(options, oid, diff_header_only);
>
> -	for (i = 0; i < q->nr; i++)
> -		diff_free_filepair(q->queue[i]);
> -
> -	free(q->queue);
> +	diff_free_queue(q);
>  	DIFF_QUEUE_CLEAR(q);

So, this all looks fine to me. But I did a quick grep around for
DIFF_QUEUE_CLEAR(), and this macro is used in quite a few places.
Mostly, as far as I can tell, to "empty" out the diff-queue by setting
its 'queue' pointer to NULL, and its 'nr' back to 0.

Should we be freeing the memory held by the queue there more
aggressively? I.e., should we make sure that there is a
diff_free_queue() call above each expansion of the DIFF_QUEUE_CLEAR()
macro?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] line-log: plug some memory leaks
  2022-11-02 22:01     ` [PATCH 0/3] line-log: plug some memory leaks SZEDER Gábor
                         ` (2 preceding siblings ...)
  2022-11-02 22:01       ` [PATCH 3/3] diff.c: use diff_free_queue() SZEDER Gábor
@ 2022-11-03  9:05       ` Ævar Arnfjörð Bjarmason
  3 siblings, 0 replies; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-03  9:05 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, Taylor Blau, man dog


On Wed, Nov 02 2022, SZEDER Gábor wrote:

> The first patch plugs the reported big memory leak, the second one
> plugs a minor leak, and the little cleanup in the third puts the
> cherry on top.

Looks good as far as it goes.

The "further" part seems a real mess though, e.g. I came up with the
below on top as a quick test, i.e. we have other existing users that
also loop over the same struct, and free "queue[n]", but do so
differently.

I handled the combine-diff.c one (rough WIP, only compiled it), but then
diffcore-rename.c has another such case.

I wonder if you looked at that further, and if the free function we're
adding now should anticipate that case or not.

And, orthagonally I came up with this rough WIP yesterday:
	
	@@ -6640,9 +6633,7 @@ static void diffcore_apply_filter(struct diff_options *options)
	 {
	 	int i;
	 	struct diff_queue_struct *q = &diff_queued_diff;
	-	struct diff_queue_struct outq;
	-
	-	DIFF_QUEUE_CLEAR(&outq);
	+	struct diff_queue_struct outq = DIFF_QUEUE_STRUCT_INIT;
	 
	 	if (!options->filter)
	 		return;
	@@ -6735,8 +6726,7 @@ static void diffcore_skip_stat_unmatch(struct diff_options *diffopt)
	 {
	 	int i;
	 	struct diff_queue_struct *q = &diff_queued_diff;
	-	struct diff_queue_struct outq;
	-	DIFF_QUEUE_CLEAR(&outq);
	+	struct diff_queue_struct outq = DIFF_QUEUE_STRUCT_INIT;
	 
	 	for (i = 0; i < q->nr; i++) {
	 		struct diff_filepair *p = q->queue[i];
	diff --git a/diffcore.h b/diffcore.h
	index badc2261c20..a0a89568cec 100644
	--- a/diffcore.h
	+++ b/diffcore.h
	@@ -150,12 +150,7 @@ struct diff_queue_struct {
	 	int alloc;
	 	int nr;
	 };
	-
	-#define DIFF_QUEUE_CLEAR(q) \
	-	do { \
	-		(q)->queue = NULL; \
	-		(q)->nr = (q)->alloc = 0; \
	-	} while (0)
	+	#define DIFF_QUEUE_STRUCT_INIT { 0 }

You leave the DIFF_QUEUE_CLEAR in place, but I wonder given that that's
the common pattern whether you shouldn't have a *_reset() and
*_{free,release}() (one of the two, your current naming is fine) which
resets it too, as some callers seen below & in your diff context want
that.


diff --git a/combine-diff.c b/combine-diff.c
index b0ece954808..509c58ad556 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1297,12 +1297,6 @@ void show_combined_diff(struct combine_diff_path *p,
 		show_patch_diff(p, num_parent, 1, rev);
 }
 
-static void free_combined_pair(struct diff_filepair *pair)
-{
-	free(pair->two);
-	free(pair);
-}
-
 /*
  * A combine_diff_path expresses N parents on the LHS against 1 merge
  * result. Synthesize a diff_filepair that has N entries on the "one"
@@ -1355,9 +1349,7 @@ static void handle_combined_callback(struct diff_options *opt,
 	for (i = 0, p = paths; p; p = p->next)
 		q.queue[i++] = combined_pair(p, num_parent);
 	opt->format_callback(&q, opt, opt->format_callback_data);
-	for (i = 0; i < num_paths; i++)
-		free_combined_pair(q.queue[i]);
-	free(q.queue);
+	diff_free_queue(&q, 1);
 }
 
 static const char *path_path(void *obj)
diff --git a/diff.c b/diff.c
index 03e6ffb5e4e..b48105c070f 100644
--- a/diff.c
+++ b/diff.c
@@ -5773,10 +5773,20 @@ void diff_free_filepair(struct diff_filepair *p)
 	free(p);
 }
 
-void diff_free_queue(struct diff_queue_struct *q)
+static void free_combined_pair(struct diff_filepair *pair)
 {
-	for (int i = 0; i < q->nr; i++)
-		diff_free_filepair(q->queue[i]);
+	free(pair->two);
+	free(pair);
+}
+
+void diff_free_queue(struct diff_queue_struct *q, int combined)
+{
+	for (int i = 0; i < q->nr; i++) {
+		if (combined)
+			diff_free_filepair(q->queue[i]);
+		else
+			free_combined_pair(q->queue[i]);
+	}
 	free(q->queue);
 }
 
@@ -6339,7 +6349,7 @@ int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int result = diff_get_patch_id(options, oid, diff_header_only);
 
-	diff_free_queue(q);
+	diff_free_queue(q, 0);
 	DIFF_QUEUE_CLEAR(q);
 
 	return result;
@@ -6609,7 +6619,7 @@ void diff_flush(struct diff_options *options)
 		options->format_callback(q, options, options->format_callback_data);
 
 free_queue:
-	diff_free_queue(q);
+	diff_free_queue(q, 0);
 	DIFF_QUEUE_CLEAR(q);
 	diff_free(options);
 
diff --git a/diffcore-rename.c b/diffcore-rename.c
index c0422d9e709..1754411a916 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1686,6 +1686,7 @@ void diffcore_rename_extended(struct diff_options *options,
 			pair_to_free = p;
 
 		if (pair_to_free)
+			/* ??? */
 			pool_diff_free_filepair(pool, pair_to_free);
 	}
 	diff_debug_queue("done copying original", &outq);
diff --git a/diffcore.h b/diffcore.h
index 9b588a1ee15..6cb74e6eadf 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -162,7 +162,7 @@ struct diff_filepair *diff_queue(struct diff_queue_struct *,
 				 struct diff_filespec *,
 				 struct diff_filespec *);
 void diff_q(struct diff_queue_struct *, struct diff_filepair *);
-void diff_free_queue(struct diff_queue_struct *q);
+void diff_free_queue(struct diff_queue_struct *q, int combined);
 
 /* dir_rename_relevance: the reason we want rename information for a dir */
 enum dir_rename_relevance {
diff --git a/line-log.c b/line-log.c
index a7f3e7f6ce4..88c22b20c0f 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1090,7 +1090,7 @@ static struct diff_filepair *diff_filepair_dup(struct diff_filepair *pair)
 static void free_diffqueues(int n, struct diff_queue_struct *dq)
 {
 	for (int i = 0; i < n; i++)
-		diff_free_queue(&dq[i]);
+		diff_free_queue(&dq[i], 0);
 	free(dq);
 }
 
@@ -1193,7 +1193,7 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c
 	if (parent)
 		add_line_range(rev, parent, parent_range);
 	free_line_log_data(parent_range);
-	diff_free_queue(&queue);
+	diff_free_queue(&queue, 0);
 	return changed;
 }
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] line-log: free diff queue when processing non-merge commits
  2022-11-03  0:20         ` Taylor Blau
@ 2022-11-07 15:11           ` SZEDER Gábor
  2022-11-07 15:29             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 17+ messages in thread
From: SZEDER Gábor @ 2022-11-07 15:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, man dog

On Wed, Nov 02, 2022 at 08:20:21PM -0400, Taylor Blau wrote:
> > +void diff_free_queue(struct diff_queue_struct *q)
> > +{
> > +	for (int i = 0; i < q->nr; i++)
> > +		diff_free_filepair(q->queue[i]);
> > +	free(q->queue);
> > +}
> 
> Though I wonder, should diff_free_queue() be a noop when q is NULL? The
> caller in process_ranges_ordinary_commit() doesn't care, of course,
> since q is always non-NULL there.
> 
> But if we're making it part of the diff API, we should probably err on
> the side of flexibility.

On one hand, strbuf_reset(), string_list_clear(), or strvec_clear()
would all segfault on a NULL strbuf, string_list, or strvec pointer.

On the other hand, given the usage patterns of the diff API, and that
it mostly only works on the dreaded global 'diff_queued_diff'
instance, I don't think there is any flexibility to be gained with
this; indeed it is already more flexible than many diff API functions
as it works on the diff queue given as parameter instead of that
global instance.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] line-log: free diff queue when processing non-merge commits
  2022-11-07 15:11           ` SZEDER Gábor
@ 2022-11-07 15:29             ` Ævar Arnfjörð Bjarmason
  2022-11-07 15:57               ` SZEDER Gábor
  0 siblings, 1 reply; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-07 15:29 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Taylor Blau, git, man dog


On Mon, Nov 07 2022, SZEDER Gábor wrote:

> On Wed, Nov 02, 2022 at 08:20:21PM -0400, Taylor Blau wrote:
>> > +void diff_free_queue(struct diff_queue_struct *q)
>> > +{
>> > +	for (int i = 0; i < q->nr; i++)
>> > +		diff_free_filepair(q->queue[i]);
>> > +	free(q->queue);
>> > +}
>> 
>> Though I wonder, should diff_free_queue() be a noop when q is NULL? The
>> caller in process_ranges_ordinary_commit() doesn't care, of course,
>> since q is always non-NULL there.
>> 
>> But if we're making it part of the diff API, we should probably err on
>> the side of flexibility.
>
> On one hand, strbuf_reset(), string_list_clear(), or strvec_clear()
> would all segfault on a NULL strbuf, string_list, or strvec pointer.

But the reason we do that is because those APIs will always ensure that
the struct is never in an inconsistent state, as opposed to the
destructor you're adding here.

I.e. if you were to work with the queue after this diff_free_queue()
call in process_ranges_ordinary_commit() you'd segfault, not so with
those other APIs.

> On the other hand, given the usage patterns of the diff API, and that
> it mostly only works on the dreaded global 'diff_queued_diff'
> instance, I don't think there is any flexibility to be gained with
> this; indeed it is already more flexible than many diff API functions
> as it works on the diff queue given as parameter instead of that
> global instance.

I pointed how this could be nicer if you made it work like those other
APIs in
https://lore.kernel.org/git/221103.864jvg2yit.gmgdl@evledraar.gmail.com/;
I.e. we could do away with DIFF_QUEUE_CLEAR() after calling this
"free()".

But in lieu of such a larger change, just adding a call to
"DIFF_QUEUE_CLEAR()" in this new free() function seems like it could
make thing safer at very little cost.

We're also far from consistent about this, but I wish it worked like
that and were called:

	diff_queue_struct_{release,clear}()

I.e. the usual naming is:

	<struct name>_{release,clear}()

In cases where we don't free() the pointer itself, but assume that we're
working on a struct on the stack, whereas *_free() functions will free
the malloc'd pointer itself, as well as anything it contains.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] line-log: free diff queue when processing non-merge commits
  2022-11-07 15:29             ` Ævar Arnfjörð Bjarmason
@ 2022-11-07 15:57               ` SZEDER Gábor
  2022-11-08  2:14                 ` Taylor Blau
  0 siblings, 1 reply; 17+ messages in thread
From: SZEDER Gábor @ 2022-11-07 15:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Taylor Blau, git, man dog

On Mon, Nov 07, 2022 at 04:29:39PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Nov 07 2022, SZEDER Gábor wrote:
> 
> > On Wed, Nov 02, 2022 at 08:20:21PM -0400, Taylor Blau wrote:
> >> > +void diff_free_queue(struct diff_queue_struct *q)
> >> > +{
> >> > +	for (int i = 0; i < q->nr; i++)
> >> > +		diff_free_filepair(q->queue[i]);
> >> > +	free(q->queue);
> >> > +}
> >> 
> >> Though I wonder, should diff_free_queue() be a noop when q is NULL? The
> >> caller in process_ranges_ordinary_commit() doesn't care, of course,
> >> since q is always non-NULL there.
> >> 
> >> But if we're making it part of the diff API, we should probably err on
> >> the side of flexibility.
> >
> > On one hand, strbuf_reset(), string_list_clear(), or strvec_clear()
> > would all segfault on a NULL strbuf, string_list, or strvec pointer.
> 
> But the reason we do that is because those APIs will always ensure that
> the struct is never in an inconsistent state, as opposed to the
> destructor you're adding here.

Taylor's suggestion quoted above is not about the internal state of
the diff queue, but about a NULL pointer passed to diff_free_queue().

> I.e. if you were to work with the queue after this diff_free_queue()
> call in process_ranges_ordinary_commit() you'd segfault, not so with
> those other APIs.
> 
> > On the other hand, given the usage patterns of the diff API, and that
> > it mostly only works on the dreaded global 'diff_queued_diff'
> > instance, I don't think there is any flexibility to be gained with
> > this; indeed it is already more flexible than many diff API functions
> > as it works on the diff queue given as parameter instead of that
> > global instance.
> 
> I pointed how this could be nicer if you made it work like those other
> APIs in
> https://lore.kernel.org/git/221103.864jvg2yit.gmgdl@evledraar.gmail.com/;
> I.e. we could do away with DIFF_QUEUE_CLEAR() after calling this
> "free()".
> 
> But in lieu of such a larger change, just adding a call to
> "DIFF_QUEUE_CLEAR()" in this new free() function seems like it could
> make thing safer at very little cost.
> 
> We're also far from consistent about this, but I wish it worked like
> that and were called:
> 
> 	diff_queue_struct_{release,clear}()
> 
> I.e. the usual naming is:
> 
> 	<struct name>_{release,clear}()
> 
> In cases where we don't free() the pointer itself, but assume that we're
> working on a struct on the stack, whereas *_free() functions will free
> the malloc'd pointer itself, as well as anything it contains.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] diff.c: use diff_free_queue()
  2022-11-03  0:24         ` Taylor Blau
@ 2022-11-07 16:13           ` SZEDER Gábor
  2022-11-08  2:14             ` Taylor Blau
  0 siblings, 1 reply; 17+ messages in thread
From: SZEDER Gábor @ 2022-11-07 16:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, man dog

On Wed, Nov 02, 2022 at 08:24:09PM -0400, Taylor Blau wrote:
> On Wed, Nov 02, 2022 at 11:01:42PM +0100, SZEDER Gábor wrote:
> >  int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
> >  {
> >  	struct diff_queue_struct *q = &diff_queued_diff;
> > -	int i;
> >  	int result = diff_get_patch_id(options, oid, diff_header_only);
> >
> > -	for (i = 0; i < q->nr; i++)
> > -		diff_free_filepair(q->queue[i]);
> > -
> > -	free(q->queue);
> > +	diff_free_queue(q);
> >  	DIFF_QUEUE_CLEAR(q);
> 
> So, this all looks fine to me. But I did a quick grep around for
> DIFF_QUEUE_CLEAR(), and this macro is used in quite a few places.
> Mostly, as far as I can tell, to "empty" out the diff-queue by setting
> its 'queue' pointer to NULL, and its 'nr' back to 0.
> 
> Should we be freeing the memory held by the queue there more
> aggressively? I.e., should we make sure that there is a
> diff_free_queue() call above each expansion of the DIFF_QUEUE_CLEAR()
> macro?

Definitely not.  DIFF_QUEUE_CLEAR is often used to initialize a just
created 'struct diff_queue_struct' instance; by adding a
diff_free_queue() above those it would operate on uninitialized
memory.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] line-log: free diff queue when processing non-merge commits
  2022-11-07 15:57               ` SZEDER Gábor
@ 2022-11-08  2:14                 ` Taylor Blau
  0 siblings, 0 replies; 17+ messages in thread
From: Taylor Blau @ 2022-11-08  2:14 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Ævar Arnfjörð Bjarmason, Taylor Blau, git, man dog

On Mon, Nov 07, 2022 at 04:57:21PM +0100, SZEDER Gábor wrote:
> On Mon, Nov 07, 2022 at 04:29:39PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >
> > On Mon, Nov 07 2022, SZEDER Gábor wrote:
> >
> > > On Wed, Nov 02, 2022 at 08:20:21PM -0400, Taylor Blau wrote:
> > >> > +void diff_free_queue(struct diff_queue_struct *q)
> > >> > +{
> > >> > +	for (int i = 0; i < q->nr; i++)
> > >> > +		diff_free_filepair(q->queue[i]);
> > >> > +	free(q->queue);
> > >> > +}
> > >>
> > >> Though I wonder, should diff_free_queue() be a noop when q is NULL? The
> > >> caller in process_ranges_ordinary_commit() doesn't care, of course,
> > >> since q is always non-NULL there.
> > >>
> > >> But if we're making it part of the diff API, we should probably err on
> > >> the side of flexibility.
> > >
> > > On one hand, strbuf_reset(), string_list_clear(), or strvec_clear()
> > > would all segfault on a NULL strbuf, string_list, or strvec pointer.
> >
> > But the reason we do that is because those APIs will always ensure that
> > the struct is never in an inconsistent state, as opposed to the
> > destructor you're adding here.
>
> Taylor's suggestion quoted above is not about the internal state of
> the diff queue, but about a NULL pointer passed to diff_free_queue().

I think your perspective that strbuf_reset(), string_list_clear(), etc.
all segfault on a NULL argument is extremely reasonable. Let's start
merging this down.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] diff.c: use diff_free_queue()
  2022-11-07 16:13           ` SZEDER Gábor
@ 2022-11-08  2:14             ` Taylor Blau
  0 siblings, 0 replies; 17+ messages in thread
From: Taylor Blau @ 2022-11-08  2:14 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Taylor Blau, git, man dog

On Mon, Nov 07, 2022 at 05:13:11PM +0100, SZEDER Gábor wrote:
> On Wed, Nov 02, 2022 at 08:24:09PM -0400, Taylor Blau wrote:
> > On Wed, Nov 02, 2022 at 11:01:42PM +0100, SZEDER Gábor wrote:
> > >  int diff_flush_patch_id(struct diff_options *options, struct object_id *oid, int diff_header_only)
> > >  {
> > >  	struct diff_queue_struct *q = &diff_queued_diff;
> > > -	int i;
> > >  	int result = diff_get_patch_id(options, oid, diff_header_only);
> > >
> > > -	for (i = 0; i < q->nr; i++)
> > > -		diff_free_filepair(q->queue[i]);
> > > -
> > > -	free(q->queue);
> > > +	diff_free_queue(q);
> > >  	DIFF_QUEUE_CLEAR(q);
> >
> > So, this all looks fine to me. But I did a quick grep around for
> > DIFF_QUEUE_CLEAR(), and this macro is used in quite a few places.
> > Mostly, as far as I can tell, to "empty" out the diff-queue by setting
> > its 'queue' pointer to NULL, and its 'nr' back to 0.
> >
> > Should we be freeing the memory held by the queue there more
> > aggressively? I.e., should we make sure that there is a
> > diff_free_queue() call above each expansion of the DIFF_QUEUE_CLEAR()
> > macro?
>
> Definitely not.  DIFF_QUEUE_CLEAR is often used to initialize a just
> created 'struct diff_queue_struct' instance; by adding a
> diff_free_queue() above those it would operate on uninitialized
> memory.

Thanks for pointing it out.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-11-08  2:14 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-29 16:59 Bug report: git -L requires excessive memory man dog
2022-10-31 21:45 ` SZEDER Gábor
2022-10-31 21:56   ` Taylor Blau
2022-11-02 22:01     ` [PATCH 0/3] line-log: plug some memory leaks SZEDER Gábor
2022-11-02 22:01       ` [PATCH 1/3] line-log: free diff queue when processing non-merge commits SZEDER Gábor
2022-11-03  0:20         ` Taylor Blau
2022-11-07 15:11           ` SZEDER Gábor
2022-11-07 15:29             ` Ævar Arnfjörð Bjarmason
2022-11-07 15:57               ` SZEDER Gábor
2022-11-08  2:14                 ` Taylor Blau
2022-11-02 22:01       ` [PATCH 2/3] line-log: free the diff queues' arrays when processing merge commits SZEDER Gábor
2022-11-03  0:21         ` Taylor Blau
2022-11-02 22:01       ` [PATCH 3/3] diff.c: use diff_free_queue() SZEDER Gábor
2022-11-03  0:24         ` Taylor Blau
2022-11-07 16:13           ` SZEDER Gábor
2022-11-08  2:14             ` Taylor Blau
2022-11-03  9:05       ` [PATCH 0/3] line-log: plug some memory leaks Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).