git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [patch] possible memory leak in diff.c::diff_free_filepair()
@ 2005-08-13 10:58 Yasushi SHOJI
  2005-08-13 19:30 ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Yasushi SHOJI @ 2005-08-13 10:58 UTC (permalink / raw
  To: git

Hi all,

When I run git-diff-tree on big change, it seems the command eats so
much memory.  so I just put git under valgrind to see what's going on.

here is the output:

==26475== 63816 bytes in 766 blocks are definitely lost in loss record 7 of 7
==26475==    at 0x1B8FF896: malloc (vg_replace_malloc.c:149)
==26475==    by 0x805203B: alloc_filespec (diff.c:214)
==26475==    by 0x80528C5: diff_addremove (diff.c:1141)
==26475==    by 0x8049C7A: show_file (diff-tree.c:97)
==26475==    by 0x8049D63: show_file (diff-tree.c:206)
==26475==    by 0x8049D63: show_file (diff-tree.c:206)
==26475==    by 0x8049D63: show_file (diff-tree.c:206)
==26475==    by 0x8049EB3: diff_tree (diff-tree.c:118)
==26475==    by 0x804A12E: diff_tree_sha1 (diff-tree.c:260)
==26475==    by 0x804A06E: diff_tree (diff-tree.c:139)
==26475==    by 0x804A12E: diff_tree_sha1 (diff-tree.c:260)
==26475==    by 0x804A06E: diff_tree (diff-tree.c:139)
==26475== 
==26475== LEAK SUMMARY:
==26475==    definitely lost: 63816 bytes in 766 blocks.
==26475==      possibly lost: 0 bytes in 0 blocks.
==26475==    still reachable: 351 bytes in 6 blocks.
==26475==         suppressed: 0 bytes in 0 blocks.

diff_free_filespec_data() doesn't free diff_filespec itself.  is this
because in merge_broken() filespec itself is used but fliespec data
need to be freed?

so I've put one more function, diff_free_filespec(), between
diff_free_filepare() and diff_free_filespec_data() call-chain.

result is:

==27983== LEAK SUMMARY:
==27983==    definitely lost: 0 bytes in 0 blocks.
==27983==      possibly lost: 0 bytes in 0 blocks.
==27983==    still reachable: 276 bytes in 6 blocks.
==27983==         suppressed: 0 bytes in 0 blocks.

Signed-off-by: Yasushi SHOJI <yashi@atmark-techno.com>

---
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -767,10 +767,16 @@ struct diff_filepair *diff_queue(struct 
 	return dp;
 }
 
+void diff_free_filespec(struct diff_filespec *s)
+{
+	diff_free_filespec_data(s);
+        free(s);
+}
+
 void diff_free_filepair(struct diff_filepair *p)
 {
-	diff_free_filespec_data(p->one);
-	diff_free_filespec_data(p->two);
+	diff_free_filespec(p->one);
+	diff_free_filespec(p->two);
 	free(p);
 }
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] possible memory leak in diff.c::diff_free_filepair()
  2005-08-13 10:58 [patch] possible memory leak in diff.c::diff_free_filepair() Yasushi SHOJI
@ 2005-08-13 19:30 ` Junio C Hamano
  2005-08-13 21:09   ` Yasushi SHOJI
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2005-08-13 19:30 UTC (permalink / raw
  To: Yasushi SHOJI; +Cc: git

Yasushi SHOJI <yashi@atmark-techno.com> writes:

> When I run git-diff-tree on big change, it seems the command eats so
> much memory.  so I just put git under valgrind to see what's going on.
>
> diff_free_filespec_data() doesn't free diff_filespec itself.  is this
> because in merge_broken() filespec itself is used but fliespec data
> need to be freed?

Thanks for the patch.  I am wondering if the same leak exists in
diff_free_filepair(), which frees the filespec data without
freeing filespec itself for both sides.  If this is something
you can trap easily with valgrind I would really appreciate it.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] possible memory leak in diff.c::diff_free_filepair()
  2005-08-13 19:30 ` Junio C Hamano
@ 2005-08-13 21:09   ` Yasushi SHOJI
  2005-08-13 21:31     ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Yasushi SHOJI @ 2005-08-13 21:09 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

At Sat, 13 Aug 2005 12:30:59 -0700,
Junio C Hamano wrote:
> 
> Yasushi SHOJI <yashi@atmark-techno.com> writes:
> 
> > When I run git-diff-tree on big change, it seems the command eats so
> > much memory.  so I just put git under valgrind to see what's going on.
> >
> > diff_free_filespec_data() doesn't free diff_filespec itself.  is this
> > because in merge_broken() filespec itself is used but fliespec data
> > need to be freed?
> 
> Thanks for the patch.  I am wondering if the same leak exists in
> diff_free_filepair(), which frees the filespec data without
> freeing filespec itself for both sides.  If this is something
> you can trap easily with valgrind I would really appreciate it.

oops.  probably my english wasn't clear. my patch fixes
diff_free_filepair().

the reason I asked about merge_broken() was that those two functions
are the only functions calling diff_free_filespec_data(). it's first
time reading git source code and still learning ;)
--
          yashi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] possible memory leak in diff.c::diff_free_filepair()
  2005-08-13 21:09   ` Yasushi SHOJI
@ 2005-08-13 21:31     ` Junio C Hamano
  2005-08-16  3:05       ` Yasushi SHOJI
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2005-08-13 21:31 UTC (permalink / raw
  To: Yasushi SHOJI; +Cc: git

Yasushi SHOJI <yashi@atmark-techno.com> writes:

> oops.  probably my english wasn't clear. my patch fixes
> diff_free_filepair().

No, my reading of your patch when I wrote that message was
wrong.  The attached is what I ended up doing based on your
patch.  It does not seem to barf with the following test on
either git repository itself nor recent linux-2.6 repository,
which is a good sign.

    $ export MALLOC_CHECK_=2
    $ ./git-rev-list HEAD |
      ./git-diff-tree --stdin -r -B -C --find-copies-harder |
      sed -ne '/^[^:]/p;/ [MRCDA][0-9][0-9]*	/p'

When the command is run on linux-2.6 repository, virtual memory
consumption of git-diff-tree command skyrockets to about half a
gig, because it maps all files in two adjacent revisions of the
entire kernel tree.  But it seems to reclaim things reasonably
well and goes back down to less than 10m when it starts to
process the next commit pair.

------------
From: Yasushi SHOJI <yashi@atmark-techno.com>
Date: 1123930736 +0900
[PATCH] plug memory leak in diff.c::diff_free_filepair()

When I run git-diff-tree on big change, it seems the command eats so
much memory.  so I just put git under valgrind to see what's going on.
diff_free_filespec_data() doesn't free diff_filespec itself.

[jc: I ended up doing things slightly differently from Yasushi's
patch.  The original idea was to use free_filespec_data() only to
free the data portion and keep useing the filespec itself, but
no existing code seems to do things that way, so I just yanked
that part out.]

Signed-off-by: Yasushi SHOJI <yashi@atmark-techno.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 diff.c           |    9 ++++-----
 diffcore-break.c |    4 ++--
 diffcore.h       |    2 +-
 3 files changed, 7 insertions(+), 8 deletions(-)

20226fa40d48069b55cf165c9e197a003e1608a8
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -405,14 +405,13 @@ int diff_populate_filespec(struct diff_f
 	return 0;
 }
 
-void diff_free_filespec_data(struct diff_filespec *s)
+void diff_free_filespec(struct diff_filespec *s)
 {
 	if (s->should_free)
 		free(s->data);
 	else if (s->should_munmap)
 		munmap(s->data, s->size);
-	s->should_free = s->should_munmap = 0;
-	s->data = NULL;
+	free(s);
 }
 
 static void prep_temp_blob(struct diff_tempfile *temp,
@@ -769,8 +768,8 @@ struct diff_filepair *diff_queue(struct 
 
 void diff_free_filepair(struct diff_filepair *p)
 {
-	diff_free_filespec_data(p->one);
-	diff_free_filespec_data(p->two);
+	diff_free_filespec(p->one);
+	diff_free_filespec(p->two);
 	free(p);
 }
 
diff --git a/diffcore-break.c b/diffcore-break.c
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -231,8 +231,8 @@ static void merge_broken(struct diff_fil
 
 	dp = diff_queue(outq, d->one, c->two);
 	dp->score = p->score;
-	diff_free_filespec_data(d->two);
-	diff_free_filespec_data(c->one);
+	diff_free_filespec(d->two);
+	diff_free_filespec(c->one);
 	free(d);
 	free(c);
 }
diff --git a/diffcore.h b/diffcore.h
--- a/diffcore.h
+++ b/diffcore.h
@@ -43,7 +43,7 @@ extern void fill_filespec(struct diff_fi
 			  unsigned short);
 
 extern int diff_populate_filespec(struct diff_filespec *, int);
-extern void diff_free_filespec_data(struct diff_filespec *);
+extern void diff_free_filespec(struct diff_filespec *);
 
 struct diff_filepair {
 	struct diff_filespec *one;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] possible memory leak in diff.c::diff_free_filepair()
  2005-08-13 21:31     ` Junio C Hamano
@ 2005-08-16  3:05       ` Yasushi SHOJI
  2005-08-16  4:32         ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Yasushi SHOJI @ 2005-08-16  3:05 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

At Sat, 13 Aug 2005 14:31:59 -0700,
Junio C Hamano wrote:
> 
> Yasushi SHOJI <yashi@atmark-techno.com> writes:
> 
> > oops.  probably my english wasn't clear. my patch fixes
> > diff_free_filepair().
> 
> When the command is run on linux-2.6 repository, virtual memory
> consumption of git-diff-tree command skyrockets to about half a
> gig, because it maps all files in two adjacent revisions of the
> entire kernel tree.  But it seems to reclaim things reasonably
> well and goes back down to less than 10m when it starts to
> process the next commit pair.

it tunes out that, at least for my problem is to populating filespec
data in parepare_temp_file() and not freeing it after creating temp
file with prep_temp_blob().

parepare_temp_file() and diff_populate_filespec() has a lot in
similarity. so it'd be nice to refactor some. and re-introduce
diff_free_filespec_data() and call right after prep_temp_blob() in
prepare_temp_file().

Junio, did you also mean to clean-up these functions when you said in
the thread of "Re: gitweb - option to disable rename detection"?

regards,
--
         yashi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] possible memory leak in diff.c::diff_free_filepair()
  2005-08-16  3:05       ` Yasushi SHOJI
@ 2005-08-16  4:32         ` Junio C Hamano
  2005-08-21  7:14           ` Yasushi SHOJI
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2005-08-16  4:32 UTC (permalink / raw
  To: Yasushi SHOJI; +Cc: git

Yasushi SHOJI <yashi@atmark-techno.com> writes:

> parepare_temp_file() and diff_populate_filespec() has a lot in
> similarity. so it'd be nice to refactor some. and re-introduce
> diff_free_filespec_data() and call right after prep_temp_blob() in
> prepare_temp_file().

Another thing that may (or may not) help would be to move that
prepare_temp_file() and stuff to happen in the forked child
process instead of the parent, so that we do not have to worry
about freeing the buffer; it has been some time since I worked
on that part of the code so this may not be an option to make
things easier.

> Junio, did you also mean to clean-up these functions when you said in
> the thread of "Re: gitweb - option to disable rename detection"?

No.  I was talking about cleaning up the similar option parsing
code, and the call into diffcore_std() which takes more
arguments every time a new option is added to the family.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] possible memory leak in diff.c::diff_free_filepair()
  2005-08-16  4:32         ` Junio C Hamano
@ 2005-08-21  7:14           ` Yasushi SHOJI
  0 siblings, 0 replies; 7+ messages in thread
From: Yasushi SHOJI @ 2005-08-21  7:14 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

At Mon, 15 Aug 2005 21:32:12 -0700,
Junio C Hamano wrote:
> 
> Yasushi SHOJI <yashi@atmark-techno.com> writes:
> 
> > parepare_temp_file() and diff_populate_filespec() has a lot in
> > similarity. so it'd be nice to refactor some. and re-introduce
> > diff_free_filespec_data() and call right after prep_temp_blob() in
> > prepare_temp_file().
> 
> Another thing that may (or may not) help would be to move that
> prepare_temp_file() and stuff to happen in the forked child
> process instead of the parent, so that we do not have to worry
> about freeing the buffer; it has been some time since I worked
> on that part of the code so this may not be an option to make
> things easier.

well, I couldn't figure out how to do it in child.  because the parent
need to know temp files in order to delete after child used it.

anyway, here is a patch to fix the problem in the simplest way.

regards,
--
          yashi

-----------
Make diff_flush a bit more system friendly.

This seems to be the simplest way to fix the peak time
high memory usage in diff_flush.  Don't wait all iteration
to free each filepair.

Signed-off-by: Yasushi SHOJI <yashi@atmark-techno.com>


---

 diff.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

e0f7f97ddc29a9172c2f6c1221eca936aa80835c
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -1010,9 +1010,8 @@ void diff_flush(int diff_output_style, i
 			diff_flush_name(p, line_termination);
 			break;
 		}
-	}
-	for (i = 0; i < q->nr; i++)
 		diff_free_filepair(q->queue[i]);
+	}
 	free(q->queue);
 	q->queue = NULL;
 	q->nr = q->alloc = 0;

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2005-08-21  7:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-08-13 10:58 [patch] possible memory leak in diff.c::diff_free_filepair() Yasushi SHOJI
2005-08-13 19:30 ` Junio C Hamano
2005-08-13 21:09   ` Yasushi SHOJI
2005-08-13 21:31     ` Junio C Hamano
2005-08-16  3:05       ` Yasushi SHOJI
2005-08-16  4:32         ` Junio C Hamano
2005-08-21  7:14           ` Yasushi SHOJI

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).