* Something is broken in repack
  From: Jon Smirl @ 2007-12-07 23:05 UTC
  To: Git Mailing List

Using this config:

    [pack]
            threads = 4
            deltacachesize = 256M
            deltacachelimit = 0

and the 330MB gcc pack for input:

    git repack -a -d -f --depth=250 --window=250

    complete  seconds  RAM
      10%        47    1GB
      20%        29    1GB
      30%        24    1GB
      40%        18    1GB
      50%       110    1.2GB
      60%        85    1.4GB
      70%       195    1.5GB
      80%       186    2.5GB
      90%       489    3.8GB
      95%       800    4.8GB

I killed it because it started swapping.

The mmaps are only about 400MB in this case. At the end the git
process had 4.4GB of physical RAM allocated.

Starting from a highly compressed pack greatly aggravates the problem.
Starting with a 2GB pack of the same data, my process size only grew
to 3GB with 2GB of mmaps.

--
Jon Smirl
jonsmirl@gmail.com
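For anyone wanting to reproduce the measurement, a minimal sketch of
the setup (the config keys and the repack invocation are the ones
quoted above; using /usr/bin/time and sampling RSS with ps are my
assumptions about the method, not something stated in the report):

    # sketch: assumes a clone of the gcc repo already packed down
    # to the 330MB pack mentioned above
    git config pack.threads 4
    git config pack.deltaCacheSize 256m
    git config pack.deltaCacheLimit 0

    # wall-clock and CPU time for the whole repack
    /usr/bin/time git repack -a -d -f --depth=250 --window=250

    # in another terminal, sample resident memory while it runs
    while sleep 10; do ps -C git -o rss=; done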
* Re: Something is broken in repack
  From: Linus Torvalds @ 2007-12-08 0:37 UTC
  To: Jon Smirl, Nicolas Pitre; Cc: Git Mailing List

On Fri, 7 Dec 2007, Jon Smirl wrote:
>
> Using this config:
> [pack]
>         threads = 4
>         deltacachesize = 256M

I think deltacachesize is broken.

The code in try_delta() that replaces a delta cache entry with another
one seems very buggy wrt that whole "delta_cache_size" update. It does

        delta_cache_size -= trg_entry->delta_size;

to account for the old delta going away, but it does this *after*
having already replaced trg_entry->delta_size with the new delta
entry.

I suspect there are other issues going on too, but that's the one that
I noticed from a quick look-through.

Nico? I think this one is yours..

                Linus
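Restated as a minimal sketch (abstracted from the real try_delta()
code; the sizes are plain numbers here purely for illustration):

    #include <stdio.h>

    static long delta_cache_size;

    int main(void)
    {
            long entry_delta_size = 100;   /* old cached delta size */
            long delta_size = 60;          /* new, smaller delta    */

            delta_cache_size = 100;

            /* buggy order: the entry is updated first ... */
            entry_delta_size = delta_size;
            /* ... so the accounting subtracts the NEW size */
            delta_cache_size -= entry_delta_size;
            delta_cache_size += delta_size;
            printf("buggy:   %ld (should be 60)\n", delta_cache_size);

            /* correct order: settle accounting, then update entry */
            delta_cache_size = 100;
            entry_delta_size = 100;
            delta_cache_size -= entry_delta_size;  /* old size */
            delta_cache_size += delta_size;        /* new size */
            entry_delta_size = delta_size;
            printf("correct: %ld\n", delta_cache_size);
            return 0;
    }

The patch that follows moves the entry update after the cache
accounting, which is the same idea.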
* [PATCH] pack-objects: fix delta cache size accounting
  From: Nicolas Pitre @ 2007-12-08 1:27 UTC
  To: Junio C Hamano; Cc: Linus Torvalds, Jon Smirl, Git Mailing List

The wrong value was subtracted from delta_cache_size when replacing a
cached delta, as trg_entry->delta_size was used after the old size had
been replaced by the new size.

Noticed by Linus.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---

On Fri, 7 Dec 2007, Linus Torvalds wrote:

> The code in try_delta() that replaces a delta cache entry with
> another one seems very buggy wrt that whole "delta_cache_size"
> update. It does
>
>         delta_cache_size -= trg_entry->delta_size;
>
> to account for the old delta going away, but it does this *after*
> having already replaced trg_entry->delta_size with the new delta
> entry.

Doh!  Mea culpa.

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 4f44658..350ece4 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1422,10 +1422,6 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 		}
 	}
 
-	trg_entry->delta = src_entry;
-	trg_entry->delta_size = delta_size;
-	trg->depth = src->depth + 1;
-
 	/*
 	 * Handle memory allocation outside of the cache
 	 * accounting lock.  Compiler will optimize the strangeness
@@ -1439,7 +1435,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 		trg_entry->delta_data = NULL;
 	}
 	if (delta_cacheable(src_size, trg_size, delta_size)) {
-		delta_cache_size += trg_entry->delta_size;
+		delta_cache_size += delta_size;
 		cache_unlock();
 		trg_entry->delta_data = xrealloc(delta_buf, delta_size);
 	} else {
@@ -1447,6 +1443,10 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 		cache_unlock();
 		free(delta_buf);
 	}
 
+	trg_entry->delta = src_entry;
+	trg_entry->delta_size = delta_size;
+	trg->depth = src->depth + 1;
+
 	return 1;
 }
* Re: Something is broken in repack
  From: Nicolas Pitre @ 2007-12-08 1:46 UTC
  To: Jon Smirl; Cc: Git Mailing List

On Fri, 7 Dec 2007, Jon Smirl wrote:

> Using this config:
> [pack]
>         threads = 4
>         deltacachesize = 256M
>         deltacachelimit = 0

Since you get different results depending on the source pack used,
those cache settings, even if there was a bug with them, are not
significant.

> And the 330MB gcc pack for input
> git repack -a -d -f --depth=250 --window=250
> [...]
> I killed it because it started swapping
>
> The mmaps are only about 400MB in this case.
> At the end the git process had 4.4GB of physical RAM allocated.

That's really bad.

> Starting from a highly compressed pack greatly aggravates the
> problem.

That is really interesting though.

> Starting with a 2GB pack of the same data my process size only grew
> to 3GB with 2GB of mmaps.

Which is quite reasonable, even if the same issue might still be
there.

So the problem seems to be related to the pack access code and not the
repack code. And it must have something to do with the number of
deltas being replayed. And because the repack is attempting delta
compression roughly from newest to oldest, and because old objects are
typically in a deeper delta chain, this might explain the progressive
slowdown.

So something must be wrong with the delta cache in sha1_file.c
somehow.

Nicolas
* Re: Something is broken in repack
  From: Jon Smirl @ 2007-12-08 2:04 UTC
  To: Nicolas Pitre; Cc: Git Mailing List

On 12/7/07, Nicolas Pitre <nico@cam.org> wrote:
> [...]
> So the problem seems to be related to the pack access code and not
> the repack code. And it must have something to do with the number of
> deltas being replayed. And because the repack is attempting delta
> compression roughly from newest to oldest, and because old objects
> are typically in a deeper delta chain, this might explain the
> progressive slowdown.
>
> So something must be wrong with the delta cache in sha1_file.c
> somehow.

I applied the delta accounting patch. It took about 200MB off the
memory use, but that doesn't make a dent in 4GB of allocations.

--
Jon Smirl
jonsmirl@gmail.com
* Re: Something is broken in repack
  From: Nicolas Pitre @ 2007-12-08 2:28 UTC
  To: Jon Smirl; Cc: Git Mailing List

On Fri, 7 Dec 2007, Jon Smirl wrote:

> On 12/7/07, Nicolas Pitre <nico@cam.org> wrote:
> > [...]
> > So something must be wrong with the delta cache in sha1_file.c
> > somehow.

Staring at the cache code I don't see anything wrong with it.

> I applied the delta accounting patch. It took about 200MB off the
> memory use, but that doesn't make a dent in 4GB of allocations.

Right. I didn't expect much from that fix.

Nicolas
* Re: Something is broken in repack
  From: Jon Smirl @ 2007-12-08 3:29 UTC
  To: Nicolas Pitre; Cc: Git Mailing List

The kernel repo has the same problem but not nearly as bad.

Starting from a default pack:
  git repack -a -d -f --depth=1000 --window=1000
uses 1GB of physical memory.

Now do the command again:
  git repack -a -d -f --depth=1000 --window=1000
uses 1.3GB of physical memory.

I suspect the gcc repo has much longer revision chains than the kernel
one, since the kernel repo is only a few years old. The Mozilla repo
contained revision chains with over 2,000 revisions. Longer revision
chains result in longer delta chains.

So what is allocating the extra memory? It is either a function of the
number of entries in the chain, or related to accessing the chain,
since a chain with more entries will need to be accessed more times.

I have a 168MB kernel pack now, after 15 minutes of four cores at
100%.

Here's another observation: the gcc objects are larger. The kernel has
650K objects in 190MB, gcc has 870K objects in 330MB (roughly 290
versus 380 bytes per object), so the average gcc object is about 30%
larger. How should the average kernel developer interpret this?

--
Jon Smirl
jonsmirl@gmail.com
* Re: Something is broken in repack
  From: David Brown @ 2007-12-08 3:37 UTC
  To: Jon Smirl; Cc: Nicolas Pitre, Git Mailing List

On Fri, Dec 07, 2007 at 10:29:31PM -0500, Jon Smirl wrote:

> The kernel repo has the same problem but not nearly as bad.
>
> Starting from a default pack:
>   git repack -a -d -f --depth=1000 --window=1000
> uses 1GB of physical memory.
>
> Now do the command again:
>   git repack -a -d -f --depth=1000 --window=1000
> uses 1.3GB of physical memory.

With my repo that contains a bunch of 50MB tar files, I've found I
must specify --window-memory as well to keep repack from using nearly
unbounded amounts of memory. Perhaps it is the larger files found in
gcc that provoke this.

A window size of 1000 can take a lot of memory if the objects are
large.

Dave
* Re: Something is broken in repack
  From: Jon Smirl @ 2007-12-08 4:22 UTC
  To: David Brown, Nicolas Pitre, Git Mailing List

On 12/7/07, David Brown <git@davidb.org> wrote:
> With my repo that contains a bunch of 50MB tar files, I've found I
> must specify --window-memory as well to keep repack from using
> nearly unbounded amounts of memory. Perhaps it is the larger files
> found in gcc that provoke this.
>
> A window size of 1000 can take a lot of memory if the objects are
> large.

This is a partial solution to the problem. Adding windowmemory = 256M
took memory consumption down from 4.8GB to 2.8GB. It took an hour to
run the test.

It's not the complete solution, since my git process is still using
2.4GB of physical memory, and I'm still seeing a lot of slowdown in
the last 10%.

Does the gcc repo contain some giant objects? Why wasn't the memory
freed after their chain was processed?

Most of the last 10% is being done on a single CPU. There must be a
chain of giant objects that is unbalancing everything.

--
Jon Smirl
jonsmirl@gmail.com
* Re: Something is broken in repack
  From: Nicolas Pitre @ 2007-12-08 4:30 UTC
  To: Jon Smirl; Cc: David Brown, Git Mailing List

On Fri, 7 Dec 2007, Jon Smirl wrote:

> Does the gcc repo contain some giant objects? Why wasn't the memory
> freed after their chain was processed?

It should be.

> Most of the last 10% is being done on a single CPU. There must be a
> chain of giant objects that is unbalancing everything.

I'm about to send a patch to fix the thread balancing for real this
time.

Nicolas
* Re: Something is broken in repack
  From: Jon Smirl @ 2007-12-08 5:01 UTC
  To: Nicolas Pitre; Cc: David Brown, Git Mailing List

On 12/7/07, Nicolas Pitre <nico@cam.org> wrote:
> I'm about to send a patch to fix the thread balancing for real this
> time.

Something is really broken in the last 5% of that repo. I have been
processing at 97% for 30 minutes without moving to 98%.

--
Jon Smirl
jonsmirl@gmail.com
* Re: Something is broken in repack
  From: Nicolas Pitre @ 2007-12-08 5:12 UTC
  To: Jon Smirl; Cc: Git Mailing List

On Sat, 8 Dec 2007, Jon Smirl wrote:

> Something is really broken in the last 5% of that repo. I have been
> processing at 97% for 30 minutes without moving to 98%.

This is a clear sign of a problem, indeed.

I'll be away for the weekend, so here's a few things to try out if you
feel like it:

1) Make sure the problem occurs with the thread code disabled. That
   would eliminate one variable, and will help for #2.

2) Try bisecting the issue. If you can find an old Git version where
   the issue doesn't appear, then simply run "git bisect" to find the
   exact commit causing the problem. Best with a repo that doesn't
   take ages to repack.

3) Compile Git against the dmalloc library in order to identify where
   the huge memory leak is happening.

Nicolas
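Item 2 spelled out as commands might look like this (a sketch only:
the "known good" tag is a placeholder, not an established fact, and
/usr/bin/time -v is GNU time's verbose mode, which reports the maximum
resident set size):

    # bisecting git itself for the memory regression
    cd git.git
    git bisect start
    git bisect bad master          # assumed bad
    git bisect good v1.5.2         # placeholder known-good version

    # at each step: build this revision, repack a small test repo
    # with it, and judge peak memory
    make
    (cd ../test-repo &&
     PATH=/path/to/git.git:$PATH \
     /usr/bin/time -v git repack -a -d -f --depth=250 --window=250)

    git bisect good                # or: git bisect bad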
* Re: Something is broken in repack
  From: Harvey Harrison @ 2007-12-08 3:48 UTC
  To: Jon Smirl; Cc: Nicolas Pitre, Git Mailing List

On Fri, 2007-12-07 at 22:29 -0500, Jon Smirl wrote:

> I suspect the gcc repo has much longer revision chains than the
> kernel one, since the kernel repo is only a few years old. The
> Mozilla repo contained revision chains with over 2,000 revisions.
> Longer revision chains result in longer delta chains.

I sent out a partial delta breakdown for the gcc repo earlier; here's
the whole list.

Breakdown of the gcc packfile (total objects: 1017922):

ChainLength: Objects Cumulative
1: 103817 103817
2: 67332 171149
3: 57520 228669
4: 52570 281239
5: 43910 325149
6: 37520 362669
7: 35248 397917
8: 29819 427736
9: 27619 455355
10: 22656 478011
11: 21073 499084
12: 18738 517822
13: 16674 534496
14: 14882 549378
15: 14424 563802
16: 12765 576567
17: 11662 588229
18: 11845 600074
19: 11694 611768
20: 9625 621393
21: 9031 630424
22: 8437 638861
23: 8217 647078
24: 7927 655005
25: 7955 662960
26: 7092 670052
27: 7004 677056
28: 6724 683780
29: 6626 690406
30: 5875 696281
31: 5970 702251
32: 5726 707977
33: 6025 714002
34: 5354 719356
35: 6413 725769
36: 4933 730702
37: 4888 735590
38: 4561 740151
39: 4366 744517
40: 4166 748683
41: 4531 753214
42: 4029 757243
43: 3701 760944
44: 3647 764591
45: 3553 768144
46: 3509 771653
47: 3473 775126
48: 3442 778568
49: 3379 781947
50: 3395 785342
51: 3315 788657
52: 3168 791825
53: 3345 795170
54: 3166 798336
55: 3237 801573
56: 2795 804368
57: 2768 807136
58: 2666 809802
59: 2723 812525
60: 2547 815072
61: 2565 817637
62: 2622 820259
63: 2521 822780
64: 2492 825272
65: 2529 827801
66: 2566 830367
67: 2685 833052
68: 2458 835510
69: 2457 837967
70: 2440 840407
71: 2410 842817
72: 2337 845154
73: 2301 847455
74: 2201 849656
75: 2127 851783
76: 2256 854039
77: 2038 856077
78: 1925 858002
79: 1965 859967
80: 1929 861896
81: 1890 863786
82: 1873 865659
83: 1964 867623
84: 1898 869521
85: 1839 871360
86: 1933 873293
87: 1876 875169
88: 1851 877020
89: 1789 878809
90: 1790 880599
91: 1804 882403
92: 1696 884099
93: 1863 885962
94: 1889 887851
95: 1766 889617
96: 1731 891348
97: 1775 893123
98: 1750 894873
99: 1767 896640
100: 1644 898284
101: 1642 899926
102: 1489 901415
103: 1532 902947
104: 1564 904511
105: 1477 905988
106: 1461 907449
107: 1383 908832
108: 1422 910254
109: 1316 911570
110: 1480 913050
111: 1329 914379
112: 1375 915754
113: 1292 917046
114: 1224 918270
115: 1123 919393
116: 1216 920609
117: 1252 921861
118: 1252 923113
119: 1346 924459
120: 1320 925779
121: 1277 927056
122: 1234 928290
123: 1200 929490
124: 1255 930745
125: 1206 931951
126: 1155 933106
127: 1246 934352
128: 1226 935578
129: 1194 936772
130: 1268 938040
131: 1334 939374
132: 1146 940520
133: 1220 941740
134: 1055 942795
135: 1110 943905
136: 1095 945000
137: 1294 946294
138: 1204 947498
139: 1218 948716
140: 1101 949817
141: 993 950810
142: 975 951785
143: 1014 952799
144: 968 953767
145: 957 954724
146: 1069 955793
147: 996 956789
148: 967 957756
149: 964 958720
150: 954 959674
151: 949 960623
152: 1001 961624
153: 1042 962666
154: 1057 963723
155: 948 964671
156: 966 965637
157: 833 966470
158: 959 967429
159: 907 968336
160: 854 969190
161: 847 970037
162: 836 970873
163: 769 971642
164: 747 972389
165: 755 973144
166: 707 973851
167: 774 974625
168: 777 975402
169: 783 976185
170: 707 976892
171: 738 977630
172: 775 978405
173: 781 979186
174: 698 979884
175: 801 980685
176: 712 981397
177: 679 982076
178: 775 982851
179: 696 983547
180: 760 984307
181: 740 985047
182: 752 985799
183: 704 986503
184: 683 987186
185: 690 987876
186: 741 988617
187: 642 989259
188: 672 989931
189: 679 990610
190: 691 991301
191: 648 991949
192: 703 992652
193: 675 993327
194: 687 994014
195: 625 994639
196: 607 995246
197: 583 995829
198: 632 996461
199: 540 997001
200: 652 997653
201: 600 998253
202: 628 998881
203: 624 999505
204: 582 1000087
205: 548 1000635
206: 520 1001155
207: 648 1001803
208: 556 1002359
209: 563 1002922
210: 508 1003430
211: 570 1004000
212: 530 1004530
213: 575 1005105
214: 527 1005632
215: 521 1006153
216: 515 1006668
217: 513 1007181
218: 460 1007641
219: 491 1008132
220: 474 1008606
221: 471 1009077
222: 482 1009559
223: 485 1010044
224: 439 1010483
225: 385 1010868
226: 385 1011253
227: 403 1011656
228: 380 1012036
229: 376 1012412
230: 377 1012789
231: 415 1013204
232: 394 1013598
233: 362 1013960
234: 334 1014294
235: 366 1014660
236: 317 1014977
237: 362 1015339
238: 343 1015682
239: 392 1016074
240: 317 1016391
241: 305 1016696
242: 319 1017015
243: 276 1017291
244: 247 1017538
245: 179 1017717
246: 111 1017828
247: 61 1017889
248: 27 1017916
249: 6 1017922

Harvey
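A breakdown like this can be derived from git verify-pack, which ends
its verbose output with per-chain-length summary lines; a rough sketch
(the exact wording and field positions of those summary lines are from
memory, so verify them against your git version before trusting the
field numbers):

    # verify-pack -v ends with lines like
    #   chain length = 10: 22656 objects
    # so adding the cumulative column is a one-liner:
    git verify-pack -v .git/objects/pack/pack-*.idx |
    awk '/^chain length/ { cum += $5; print $4, $5, cum }'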
* Re: Something is broken in repack
  From: Jon Smirl @ 2007-12-08 2:22 UTC
  To: Nicolas Pitre; Cc: Git Mailing List

On 12/7/07, Nicolas Pitre <nico@cam.org> wrote:
> So the problem seems to be related to the pack access code and not
> the repack code. And it must have something to do with the number of
> deltas being replayed. And because the repack is attempting delta
> compression roughly from newest to oldest, and because old objects
> are typically in a deeper delta chain, this might explain the
> progressive slowdown.

What could be wrongly allocating 4GB of memory? Figure that out and
you should have your answer. The slowdown may be coming from having to
search through more and more objects in memory.

Memory consumption seems to be correlated with the depth of the delta
chain being accessed. It blows up tremendously right at the end. It
may even grow as the square of the chain length. For the normal
default case the square didn't hurt, but 250*250 = 62,500, which would
eat a huge amount of memory.

--
Jon Smirl
jonsmirl@gmail.com
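As a back-of-the-envelope illustration of that square (a sketch with
assumed numbers, not measurements: it just evaluates d(d+1)/2
reconstructions at an assumed average object size):

    #include <stdio.h>

    int main(void)
    {
            double avg_obj = 380;   /* assumed bytes per object */
            int d;

            for (d = 50; d <= 250; d += 50) {
                    /* replaying depth d from scratch touches all d
                       ancestors: 1 + 2 + ... + d = d(d+1)/2 steps */
                    double bytes = d * (d + 1) / 2.0 * avg_obj;
                    printf("depth %3d: ~%.1f MB if intermediate "
                           "results stay live\n",
                           d, bytes / (1024 * 1024));
            }
            return 0;
    }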
* Re: Something is broken in repack
  From: Harvey Harrison @ 2007-12-08 3:44 UTC
  To: Nicolas Pitre; Cc: Jon Smirl, Git Mailing List

On Fri, 2007-12-07 at 20:46 -0500, Nicolas Pitre wrote:
> On Fri, 7 Dec 2007, Jon Smirl wrote:
> > [...]
> > The mmaps are only about 400MB in this case.
> > At the end the git process had 4.4GB of physical RAM allocated.
> > Starting with a 2GB pack of the same data my process size only
> > grew to 3GB with 2GB of mmaps.
>
> Which is quite reasonable, even if the same issue might still be
> there.
>
> So the problem seems to be related to the pack access code and not
> the repack code. [...]
>
> So something must be wrong with the delta cache in sha1_file.c
> somehow.

All I have is a qualitative observation, but during the process of
creating the pack there was a _huge_ slowdown between 10-15%: from
hundreds or dozens of objects per second down to a single object per
second, with a corresponding increase in process size. I didn't keep
any numbers at the time, but it was noticeable.

I wonder if there are a bunch of huge objects somewhere in gcc's
history?

Harvey
* Re: Something is broken in repack
  From: Junio C Hamano @ 2007-12-08 22:18 UTC
  To: Nicolas Pitre; Cc: Jon Smirl, Git Mailing List

Nicolas Pitre <nico@cam.org> writes:

> On Fri, 7 Dec 2007, Jon Smirl wrote:
>
>> Starting with a 2GB pack of the same data my process size only grew
>> to 3GB with 2GB of mmaps.
>
> Which is quite reasonable, even if the same issue might still be
> there.
>
> So the problem seems to be related to the pack access code and not
> the repack code. And it must have something to do with the number of
> deltas being replayed. And because the repack is attempting delta
> compression roughly from newest to oldest, and because old objects
> are typically in a deeper delta chain, this might explain the
> progressive slowdown.
>
> So something must be wrong with the delta cache in sha1_file.c
> somehow.

I was reaching the same conclusion but haven't managed to spot
anything blatantly wrong in that area. Will need to dig more.
* Re: Something is broken in repack
  From: Junio C Hamano @ 2007-12-09 8:05 UTC
  To: Jon Smirl; Cc: Nicolas Pitre, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> Nicolas Pitre <nico@cam.org> writes:
>
>> [...]
>> So something must be wrong with the delta cache in sha1_file.c
>> somehow.
>
> I was reaching the same conclusion but haven't managed to spot
> anything blatantly wrong in that area. Will need to dig more.

Does this problem correlate with the use of threads? Do you see the
same bloat with or without THREADED_DELTA_SEARCH defined?
* Re: Something is broken in repack
  From: Jon Smirl @ 2007-12-09 15:19 UTC
  To: Junio C Hamano; Cc: Nicolas Pitre, Git Mailing List

On 12/9/07, Junio C Hamano <gitster@pobox.com> wrote:
> Does this problem correlate with the use of threads? Do you see the
> same bloat with or without THREADED_DELTA_SEARCH defined?

I just started a non-threaded one. It will be four or five hours
before it finishes.

--
Jon Smirl
jonsmirl@gmail.com
* Re: Something is broken in repack
  From: Jon Smirl @ 2007-12-09 18:25 UTC
  To: Junio C Hamano; Cc: Nicolas Pitre, Git Mailing List

On 12/9/07, Junio C Hamano <gitster@pobox.com> wrote:
> Does this problem correlate with the use of threads? Do you see the
> same bloat with or without THREADED_DELTA_SEARCH defined?

Something else seems to be wrong.

With threading turned off:        5000 CPU seconds, 13% done
With threading on, threads = 1:   5000 CPU seconds, 13%
With threading on, threads = 2:    180 CPU seconds, 13%
With threading on, threads = 4:    150 CPU seconds, 13%

This can't be right; four cores are not 40x one core.

So maybe the observed progressive slowdown is because the percent
complete is being reported wrong in the threaded case. If that's the
case we may be looking in the wrong place for problems.

The times are only approximate; I'm using the CPU for other things.

--
Jon Smirl
jonsmirl@gmail.com
* Re: Something is broken in repack
  From: Nicolas Pitre @ 2007-12-10 1:07 UTC
  To: Jon Smirl; Cc: Junio C Hamano, Git Mailing List

On Sun, 9 Dec 2007, Jon Smirl wrote:

> Something else seems to be wrong.
>
> With threading turned off:        5000 CPU seconds, 13% done
> With threading on, threads = 1:   5000 CPU seconds, 13%
> With threading on, threads = 2:    180 CPU seconds, 13%
> With threading on, threads = 4:    150 CPU seconds, 13%
>
> This can't be right; four cores are not 40x one core.

It may be right. The object list to apply delta compression on doesn't
necessarily require a uniform amount of cycles throughout. When using
multiple threads, the list is broken in parts for each thread, and
later parts might end up being simply much easier to process,
therefore changing the percentage figure.

> So maybe the observed progressive slowdown is because the percent
> complete is being reported wrong in the threaded case. If that's the
> case we may be looking in the wrong place for problems.

I really doubt it.

Nicolas
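A minimal sketch of why an equal split can behave that way (this is
not the actual pack-objects partitioning code, and the object count is
just the gcc figure reused for illustration): each thread gets the
same number of objects, but the delta-search cost per object varies
enormously, so one slice can finish far earlier and skew the combined
progress percentage.

    #include <stdio.h>

    struct slice { int start, count; };

    int main(void)
    {
            int nr_objects = 1017922, nr_threads = 4, i, chunk;
            struct slice s[4];

            chunk = nr_objects / nr_threads;
            for (i = 0; i < nr_threads; i++) {
                    s[i].start = i * chunk;
                    s[i].count = (i == nr_threads - 1)
                               ? nr_objects - i * chunk : chunk;
                    /* equal counts, but NOT equal work per slice */
                    printf("thread %d: objects %d..%d\n", i,
                           s[i].start, s[i].start + s[i].count - 1);
            }
            return 0;
    }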
* Re: Something is broken in repack
  From: Nicolas Pitre @ 2007-12-10 2:49 UTC
  To: Junio C Hamano; Cc: Jon Smirl, Git Mailing List

On Sat, 8 Dec 2007, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
>
>> [...]
>> So something must be wrong with the delta cache in sha1_file.c
>> somehow.
>
> I was reaching the same conclusion but haven't managed to spot
> anything blatantly wrong in that area. Will need to dig more.

I didn't find anything wrong there either. I'll have to run some more
gcc repacking tests myself, even though I don't have a blazingly fast
machine, which makes for rather long turnarounds.

Nicolas
* Re: Something is broken in repack
  From: David Brown @ 2007-12-08 2:56 UTC
  To: Jon Smirl; Cc: Git Mailing List

On Fri, Dec 07, 2007 at 06:05:38PM -0500, Jon Smirl wrote:

> Using this config:
> [pack]
>         threads = 4
>         deltacachesize = 256M
>         deltacachelimit = 0

Just out of curiosity, does adding

    [pack]
            windowmemory = 256M

help? I've found this to grow very large when there are large blobs.

Dave
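Equivalently, from the command line (the camel-cased key names are how
git documents them; the flat lowercase spellings used in this thread
work too, since config keys are case-insensitive):

    # cap the memory used by the delta search window, per thread
    git config pack.windowMemory 256m

    # the related knobs discussed in this thread
    git config pack.threads 4
    git config pack.deltaCacheSize 256m
    git config pack.deltaCacheLimit 0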
* Re: Something is broken in repack
  From: Nicolas Pitre @ 2007-12-10 19:56 UTC
  To: Jon Smirl; Cc: Git Mailing List

On Fri, 7 Dec 2007, Jon Smirl wrote:

> Using this config:
> [pack]
>         threads = 4
>         deltacachesize = 256M
>         deltacachelimit = 0
>
> And the 330MB gcc pack for input
> git repack -a -d -f --depth=250 --window=250
> [...]
> I killed it because it started swapping
>
> The mmaps are only about 400MB in this case.
> At the end the git process had 4.4GB of physical RAM allocated.
>
> Starting from a highly compressed pack greatly aggravates the
> problem. Starting with a 2GB pack of the same data my process size
> only grew to 3GB with 2GB of mmaps.

You said having reproduced the issue, albeit not as severe, with the
Linux kernel repo. I did just that:

    # to get the default pack:
    $ git repack -a -f -d

    # first measurement with a repack from a default pack
    $ /usr/bin/time git repack -a -f --window=256 --depth=256
    2572.17user 5.87system 22:46.80elapsed 188%CPU (0avgtext+0avgdata 0maxresident)k
    15720inputs+356640outputs (71major+264376minor)pagefaults 0swaps

    # do it again to start from a highly packed pack
    $ /usr/bin/time git repack -a -f --window=256 --depth=256
    2573.53user 5.62system 22:45.60elapsed 188%CPU (0avgtext+0avgdata 0maxresident)k
    29176inputs+356664outputs (210major+274887minor)pagefaults 0swaps

This is with pack.threads=2 on a P4 with HT, and I'm using the machine
for other tasks as well, but all measured time is sensibly the same
for both cases. Virtual memory allocation never reached 700MB in both
cases either.

Nicolas
* Re: Something is broken in repack
  From: Jon Smirl @ 2007-12-10 20:05 UTC
  To: Nicolas Pitre; Cc: Git Mailing List

On 12/10/07, Nicolas Pitre <nico@cam.org> wrote:
> You said having reproduced the issue, albeit not as severe, with the
> Linux kernel repo. I did just that:
> [...]
> This is with pack.threads=2 on a P4 with HT, and I'm using the
> machine for other tasks as well, but all measured time is sensibly
> the same for both cases. Virtual memory allocation never reached
> 700MB in both cases either.

This is the mail about the kernel pack; the one you quoted is a gcc
run.

The kernel repo has the same problem but not nearly as bad.

Starting from a default pack:
  git repack -a -d -f --depth=1000 --window=1000
uses 1GB of physical memory.

Now do the command again:
  git repack -a -d -f --depth=1000 --window=1000
uses 1.3GB of physical memory.

I suspect the gcc repo has much longer revision chains than the kernel
one, since the kernel repo is only a few years old. The Mozilla repo
contained revision chains with over 2,000 revisions. Longer revision
chains result in longer delta chains.

So what is allocating the extra memory? It is either a function of the
number of entries in the chain, or related to accessing the chain,
since a chain with more entries will need to be accessed more times.

I have a 168MB kernel pack now, after 15 minutes of four cores at
100%.

Here's another observation: the gcc objects are larger. The kernel has
650K objects in 190MB, gcc has 870K objects in 330MB. The average gcc
object is 30% larger. How should the average kernel developer
interpret this?

--
Jon Smirl
jonsmirl@gmail.com
* Re: Something is broken in repack
  From: Morten Welinder @ 2007-12-10 20:16 UTC
  To: Jon Smirl; Cc: Nicolas Pitre, Git Mailing List

> Here's another observation: the gcc objects are larger. The kernel
> has 650K objects in 190MB, gcc has 870K objects in 330MB. The
> average gcc object is 30% larger. How should the average kernel
> developer interpret this?

Could this be explained by the ChangeLog file? It's large; it has tons
of revisions; it is a prime candidate for delta compression.

Morten
* Re: Something is broken in repack
  From: Jon Smirl @ 2007-12-11 2:25 UTC
  To: Git Mailing List, Nicolas Pitre

New run using the same configuration, with the addition of the more
efficient load-balancing patches and the delta cache accounting fix.

Seconds are wall clock time. They are lower since the patch made
threading better at using all four cores. I am stuck at 380-390% CPU
utilization for the git process.

complete  seconds  RAM
  10%        60    900M   (includes counting)
  20%        15    900M
  30%        15    900M
  40%        50    1.2G
  50%        80    1.3G
  60%        70    1.7G
  70%       140    1.8G
  80%       180    2.0G
  90%       280    2.2G
  95%       530    2.8G   (1,420 total to here, previous was 1,983)
 100%      1390    2.85G

During the writing phase RAM fell to 1.6G.
What is being freed in the writing phase??

I have no explanation for the change in RAM usage. Two guesses come to
mind: memory fragmentation, or the change in the way the work was
split up altered RAM usage.

Total CPU time was 195 minutes in 70 minutes of clock time, about 70%
efficient. During the compress phase all four cores were active until
the last 90 seconds. Writing the objects took over 23 minutes, CPU
bound on one core.

New pack file is: 270,594,853
Old one was:      344,543,752
It still has 828,660 objects.

On 12/7/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> Using this config:
> [pack]
>         threads = 4
>         deltacachesize = 256M
>         deltacachelimit = 0
>
> And the 330MB gcc pack for input
> git repack -a -d -f --depth=250 --window=250
> [...]
> I killed it because it started swapping

--
Jon Smirl
jonsmirl@gmail.com
* Re: Something is broken in repack
  From: Junio C Hamano @ 2007-12-11 2:55 UTC
  To: Jon Smirl; Cc: Git Mailing List, Nicolas Pitre

"Jon Smirl" <jonsmirl@gmail.com> writes:

>   95%       530    2.8G   (1,420 total to here, previous was 1,983)
>  100%      1390    2.85G
> During the writing phase RAM fell to 1.6G.
> What is being freed in the writing phase??

entry->delta_data is the only thing I can think of that is freed in
that function but was allocated much earlier, before entering the
function.
* Re: Something is broken in repack
  From: Nicolas Pitre @ 2007-12-11 3:27 UTC
  To: Junio C Hamano; Cc: Jon Smirl, Git Mailing List

On Mon, 10 Dec 2007, Junio C Hamano wrote:

> "Jon Smirl" <jonsmirl@gmail.com> writes:
>
>>   95%       530    2.8G   (1,420 total to here, previous was 1,983)
>>  100%      1390    2.85G
>> During the writing phase RAM fell to 1.6G.
>> What is being freed in the writing phase??
>
> entry->delta_data is the only thing I can think of that is freed in
> that function but was allocated much earlier, before entering the
> function.

Yet all ->delta_data instances are limited to 256MB according to Jon's
config.

Nicolas
* Re: Something is broken in repack
  From: David Kastrup @ 2007-12-11 11:08 UTC
  To: Nicolas Pitre; Cc: Junio C Hamano, Jon Smirl, Git Mailing List

Nicolas Pitre <nico@cam.org> writes:

> On Mon, 10 Dec 2007, Junio C Hamano wrote:
>
>> entry->delta_data is the only thing I can think of that is freed in
>> that function but was allocated much earlier, before entering the
>> function.
>
> Yet all ->delta_data instances are limited to 256MB according to
> Jon's config.

Maybe address space fragmentation is involved here? malloc/free for
large areas works using mmap in glibc. There must be enough
_contiguous_ space for a new allocation to succeed.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Something is broken in repack
  From: Pierre Habouzit @ 2007-12-11 12:08 UTC
  To: David Kastrup; Cc: Nicolas Pitre, Junio C Hamano, Jon Smirl, Git Mailing List

On Tue, Dec 11, 2007 at 11:08:47AM +0000, David Kastrup wrote:

> Maybe address space fragmentation is involved here? malloc/free for
> large areas works using mmap in glibc. There must be enough
> _contiguous_ space for a new allocation to succeed.

Well, that's interesting, but there is a way to know for sure instead
of taking bets. Just use valgrind --tool=massif and look at the pretty
picture; it'll tell what was going on very accurately.

Note that I find your explanation unlikely: glibc uses mmap for sizes
over 128k by default (IIRC), and as soon as you use mmaps, it's the
kernel that deals with the address space, and it's not necessarily
contiguous; that's only true for the heap.

--
·O·  Pierre Habouzit
··O  madcoder@debian.org
OOO  http://www.madism.org
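For the record, a sketch of that measurement (the pack-objects flags
below mirror roughly what repack -a -f passes, and massif's output
handling differs between valgrind versions, so treat the details as
assumptions to adapt):

    # repack is a wrapper script, so profile the underlying
    # pack-objects directly; it writes the pack named by the last arg
    valgrind --tool=massif \
        git pack-objects --all --no-reuse-delta \
        --window=250 --depth=250 /tmp/test-pack </dev/null

    # render the allocation graph from massif.out.<pid>
    ms_print massif.out.*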
* Re: Something is broken in repack
  From: David Kastrup @ 2007-12-11 12:18 UTC
  To: Pierre Habouzit; Cc: Nicolas Pitre, Junio C Hamano, Jon Smirl, Git Mailing List

Pierre Habouzit <madcoder@artemis.madism.org> writes:

> On Tue, Dec 11, 2007 at 11:08:47AM +0000, David Kastrup wrote:
>
>> Maybe address space fragmentation is involved here? malloc/free for
>> large areas works using mmap in glibc. There must be enough
>> _contiguous_ space for a new allocation to succeed.
>
> Note that I find your explanation unlikely: glibc uses mmap for
> sizes over 128k by default (IIRC), and as soon as you use mmaps,
> it's the kernel that deals with the address space, and it's not
> necessarily contiguous; that's only true for the heap.

Every single allocation needs to be contiguous in virtual address
space and must not collide with existing virtual address space
allocations. So fragmentation is at least a logistical issue.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Something is broken in repack
  From: Nicolas Pitre @ 2007-12-11 3:49 UTC
  To: Jon Smirl; Cc: Git Mailing List

On Mon, 10 Dec 2007, Jon Smirl wrote:

> New run using the same configuration, with the addition of the more
> efficient load-balancing patches and the delta cache accounting fix.
>
> Seconds are wall clock time. They are lower since the patch made
> threading better at using all four cores. I am stuck at 380-390% CPU
> utilization for the git process.
>
> complete  seconds  RAM
>   10%        60    900M   (includes counting)
>   [...]
>   95%       530    2.8G   (1,420 total to here, previous was 1,983)
>  100%      1390    2.85G
>
> During the writing phase RAM fell to 1.6G.
> What is being freed in the writing phase??

The cached delta results, but you put a cap of 256MB for them.

Could you try again with that cache disabled entirely, with
pack.deltacachesize = 1 (don't use 0 as that means unbounded)?

And then, while still keeping the delta cache disabled, could you try
with pack.threads = 2, and pack.threads = 1?

I'm sorry to ask you to do this, but I don't have enough RAM to even
complete a repack with threads=2, so I'm reattempting single threaded
at the moment. But I really wonder if the threading has such an effect
on memory usage.

> New pack file is: 270,594,853
> Old one was:      344,543,752
> It still has 828,660 objects.

You mean the pack for the gcc repo is now less than 300MB? Wow.

Nicolas
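Spelled out, the requested runs could look like this (a sketch; the
repack invocation is the one used throughout the thread):

    # 1 byte effectively disables the delta cache; 0 means unbounded
    git config pack.deltaCacheSize 1

    for t in 2 1; do
            git config pack.threads $t
            echo "=== pack.threads = $t ==="
            /usr/bin/time git repack -a -d -f --depth=250 --window=250
    done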
* Re: Something is broken in repack
  From: Jon Smirl @ 2007-12-11 5:25 UTC
  To: Nicolas Pitre, Junio C Hamano; Cc: Git Mailing List

On 12/10/07, Nicolas Pitre <nico@cam.org> wrote:
> Could you try again with that cache disabled entirely, with
> pack.deltacachesize = 1 (don't use 0 as that means unbounded)?
>
> And then, while still keeping the delta cache disabled, could you
> try with pack.threads = 2, and pack.threads = 1?
>
> I'm sorry to ask you to do this, but I don't have enough RAM to even
> complete a repack with threads=2, so I'm reattempting single
> threaded at the moment. But I really wonder if the threading has
> such an effect on memory usage.

I already have a threads = 1 run going with this config. The binary
and config were the same as the threads=4 run.

  10%    28min   950M
  40%   135min   950M
  50%   157min   900M
  60%   160min   830M
 100%   170min   830M

Something is hurting bad with threads: 170 CPU minutes with one
thread, versus 195 CPU minutes with four threads.

Is there a different memory allocator that can be used when
multithreaded with gcc? This whole problem may be coming from the
memory allocation function. git is hardly interacting at all on the
thread level, so it's likely a problem in the C run-time.

    [core]
            repositoryformatversion = 0
            filemode = true
            bare = false
            logallrefupdates = true
    [pack]
            threads = 1
            deltacachesize = 256M
            windowmemory = 256M
            deltacachelimit = 0
    [remote "origin"]
            url = git://git.infradead.org/gcc.git
            fetch = +refs/heads/*:refs/remotes/origin/*
    [branch "trunk"]
            remote = origin
            merge = refs/heads/trunk

--
Jon Smirl
jonsmirl@gmail.com
* Re: Something is broken in repack
  From: Jon Smirl @ 2007-12-11 5:29 UTC
  To: Nicolas Pitre, Junio C Hamano, gcc; Cc: Git Mailing List

I added the gcc people to the CC; it's their repository. Maybe they
can help us sort this out.

On 12/11/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> I already have a threads = 1 run going with this config. The binary
> and config were the same as the threads=4 run.
>
>   10%    28min   950M
>   40%   135min   950M
>   50%   157min   900M
>   60%   160min   830M
>  100%   170min   830M
>
> Something is hurting bad with threads: 170 CPU minutes with one
> thread, versus 195 CPU minutes with four threads.
> [...]

--
Jon Smirl
jonsmirl@gmail.com
* Re: Something is broken in repack 2007-12-11 5:29 ` Jon Smirl @ 2007-12-11 7:01 ` Jon Smirl 2007-12-11 7:34 ` Andreas Ericsson ` (3 more replies) 2007-12-11 13:31 ` Nicolas Pitre 1 sibling, 4 replies; 82+ messages in thread From: Jon Smirl @ 2007-12-11 7:01 UTC (permalink / raw To: Nicolas Pitre, Junio C Hamano, gcc; +Cc: Git Mailing List Switching to the Google perftools malloc http://goog-perftools.sourceforge.net/ 10% 30 828M 20% 15 831M 30% 10 834M 40% 50 1014M 50% 80 1086M 60% 80 1500M 70% 200 1.53G 80% 200 1.85G 90% 260 1.87G 95% 520 1.97G 100% 1335 2.24G Google allocator knocked 600MB off from memory use. Memory consumption did not fall during the write out phase like it did with gcc. Since all of this is with the same code except for changing the threading split, those runs where memory consumption went to 4.5GB with the gcc allocator must have triggered an extreme problem with fragmentation. Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of being faster are not true. So why does our threaded code take 20 CPU minutes longer (12%) to run than the same code with a single thread? Clock time is obviously faster. Are the threads working too close to each other in memory and bouncing cache lines between the cores? Q6600 is just two E6600s in the same package, the caches are not shared. Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc) with 4 threads? But only need 950MB with one thread? Where's the extra gigabyte going? Is there another allocator to try? One that combines Google's efficiency with gcc's speed? On 12/11/07, Jon Smirl <jonsmirl@gmail.com> wrote: > I added the gcc people to the CC, it's their repository. Maybe they > can help us sort this out. > > On 12/11/07, Jon Smirl <jonsmirl@gmail.com> wrote: > > On 12/10/07, Nicolas Pitre <nico@cam.org> wrote: > > > On Mon, 10 Dec 2007, Jon Smirl wrote: > > > > > > > New run using same configuration. With the addition of the more > > > > efficient load balancing patches and delta cache accounting. > > > > > > > > Seconds are wall clock time. They are lower since the patch made > > > > threading better at using all four cores. I am stuck at 380-390% CPU > > > > utilization for the git process. > > > > > > > > complete seconds RAM > > > > 10% 60 900M (includes counting) > > > > 20% 15 900M > > > > 30% 15 900M > > > > 40% 50 1.2G > > > > 50% 80 1.3G > > > > 60% 70 1.7G > > > > 70% 140 1.8G > > > > 80% 180 2.0G > > > > 90% 280 2.2G > > > > 95% 530 2.8G - 1,420 total to here, previous was 1,983 > > > > 100% 1390 2.85G > > > > During the writing phase RAM fell to 1.6G > > > > What is being freed in the writing phase?? > > > > > > The cached delta results, but you put a cap of 256MB for them. > > > > > > Could you try again with that cache disabled entirely, with > > > pack.deltacachesize = 1 (don't use 0 as that means unbounded). > > > > > > And then, while still keeping the delta cache disabled, could you try > > > with pack.threads = 2, and pack.threads = 1 ? > > > > > > I'm sorry to ask you to do this but I don't have enough ram to even > > > complete a repack with threads=2 so I'm reattempting single threaded at > > > the moment. But I really wonder if the threading has such an effect on > > > memory usage. > > > > I already have a threads = 1 running with this config. Binary and > > config were same from threads=4 run. > > > > 10% 28min 950M > > 40% 135min 950M > > 50% 157min 900M > > 60% 160min 830M > > 100% 170min 830M > > > > Something is hurting bad with threads. 
170 CPU minutes with one > > thread, versus 195 CPU minutes with four threads. > > > > Is there a different memory allocator that can be used when > > multithreaded on gcc? This whole problem may be coming from the memory > > allocation function. git is hardly interacting at all on the thread > > level so it's likely a problem in the C run-time. > > > > [core] > > repositoryformatversion = 0 > > filemode = true > > bare = false > > logallrefupdates = true > > [pack] > > threads = 1 > > deltacachesize = 256M > > windowmemory = 256M > > deltacachelimit = 0 > > [remote "origin"] > > url = git://git.infradead.org/gcc.git > > fetch = +refs/heads/*:refs/remotes/origin/* > > [branch "trunk"] > > remote = origin > > merge = refs/heads/trunk > > > > > > > > > > > > > > > > > > > > > > > > > I have no explanation for the change in RAM usage. Two guesses come to > > > > mind. Memory fragmentation. Or the change in the way the work was > > > > split up altered RAM usage. > > > > > > > > Total CPU time was 195 minutes in 70 minutes clock time. About 70% > > > > efficient. During the compress phase all four cores were active until > > > > the last 90 seconds. Writing the objects took over 23 minutes CPU > > > > bound on one core. > > > > > > > > New pack file is: 270,594,853 > > > > Old one was: 344,543,752 > > > > It still has 828,660 objects > > > > > > You mean the pack for the gcc repo is now less than 300MB? Wow. > > > > > > > > > Nicolas > > > > > > > > > -- > > Jon Smirl > > jonsmirl@gmail.com > > > > > -- > Jon Smirl > jonsmirl@gmail.com > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 82+ messages in thread
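For readers trying to reproduce the allocator swap: Jon doesn't say exactly how he linked tcmalloc in, but a sketch of one way that needs no rebuild of git is to preload it (the library name and path below are assumptions that vary per system):

    LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so \
        git repack -a -d -f --depth=250 --window=250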
* Re: Something is broken in repack 2007-12-11 7:01 ` Jon Smirl @ 2007-12-11 7:34 ` Andreas Ericsson 2007-12-11 13:49 ` Nicolas Pitre ` (2 subsequent siblings) 3 siblings, 0 replies; 82+ messages in thread From: Andreas Ericsson @ 2007-12-11 7:34 UTC (permalink / raw To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List Jon Smirl wrote: > Switching to the Google perftools malloc > http://goog-perftools.sourceforge.net/ > > Google allocator knocked 600MB off from memory use. > Memory consumption did not fall during the write out phase like it did with gcc. > > Since all of this is with the same code except for changing the > threading split, those runs where memory consumption went to 4.5GB > with the gcc allocator must have triggered an extreme problem with > fragmentation. > > Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of > being faster are not true. > Did you use the tcmalloc with heap checker/profiler, or tcmalloc_minimal? -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 7:01 ` Jon Smirl 2007-12-11 7:34 ` Andreas Ericsson @ 2007-12-11 13:49 ` Nicolas Pitre 2007-12-11 15:00 ` Nicolas Pitre 2007-12-11 16:33 ` Linus Torvalds 2007-12-11 17:28 ` Daniel Berlin 3 siblings, 1 reply; 82+ messages in thread From: Nicolas Pitre @ 2007-12-11 13:49 UTC (permalink / raw To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > Switching to the Google perftools malloc > http://goog-perftools.sourceforge.net/ > > 10% 30 828M > 20% 15 831M > 30% 10 834M > 40% 50 1014M > 50% 80 1086M > 60% 80 1500M > 70% 200 1.53G > 80% 200 1.85G > 90% 260 1.87G > 95% 520 1.97G > 100% 1335 2.24G > > Google allocator knocked 600MB off from memory use. > Memory consumption did not fall during the write out phase like it did with gcc. > > Since all of this is with the same code except for changing the > threading split, those runs where memory consumption went to 4.5GB > with the gcc allocator must have triggered an extreme problem with > fragmentation. Did you mean the glibc allocator? > Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of > being faster are not true. > > So why does our threaded code take 20 CPU minutes longer (12%) to run > than the same code with a single thread? Clock time is obviously > faster. Are the threads working too close to each other in memory and > bouncing cache lines between the cores? Q6600 is just two E6600s in > the same package, the caches are not shared. Of course there'll always be a certain amount of wasted cycles when threaded. The locking overhead, the extra contention for IO, etc. So 12% overhead (3% per thread) when using 4 threads is not that bad I would say. > Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc) > with 4 threads? But only need 950MB with one thread? Where's the extra > gigabyte going? I really don't know. Did you try with pack.deltacachesize set to 1 ? And yet, this is still missing the actual issue. The issue being that the 2.1GB pack as a _source_ doesn't cause as much memory to be allocated even if the _result_ pack ends up being the same. I was able to repack the 2.1GB pack on my machine which has 1GB of ram. Now that it has been repacked, I can't repack it anymore, even when single threaded, as it starts crawling into swap fairly quickly. It is really non-intuitive and actually senseless that Git would require twice as much RAM to deal with a pack that is 7 times smaller. Nicolas (still puzzled) ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 13:49 ` Nicolas Pitre @ 2007-12-11 15:00 ` Nicolas Pitre 2007-12-11 15:36 ` Jon Smirl 2007-12-11 16:20 ` Nicolas Pitre 0 siblings, 2 replies; 82+ messages in thread From: Nicolas Pitre @ 2007-12-11 15:00 UTC (permalink / raw To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Nicolas Pitre wrote: > And yet, this is still missing the actual issue. The issue being that > the 2.1GB pack as a _source_ doesn't cause as much memory to be > allocated even if the _result_ pack ends up being the same. > > I was able to repack the 2.1GB pack on my machine which has 1GB of ram. > Now that it has been repacked, I can't repack it anymore, even when > single threaded, as it start crowling into swap fairly quickly. It is > really non intuitive and actually senseless that Git would require twice > as much RAM to deal with a pack that is 7 times smaller. OK, here's something else for you to try: core.deltabasecachelimit=0 pack.threads=2 pack.deltacachesize=1 With that I'm able to repack the small gcc pack on my machine with 1GB of ram using: git repack -a -f -d --window=250 --depth=250 and top reports a ~700m virt and ~500m res without hitting swap at all. It is only at 25% so far, but I was unable to get that far before. Would be curious to know what you get with 4 threads on your machine. Nicolas ^ permalink raw reply [flat|nested] 82+ messages in thread
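For reference, Nicolas's suggested settings correspond to these commands run inside the repository (equivalent to editing .git/config by hand):

    git config core.deltabasecachelimit 0
    git config pack.threads 2
    git config pack.deltacachesize 1
    git repack -a -f -d --window=250 --depth=250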
* Re: Something is broken in repack 2007-12-11 15:00 ` Nicolas Pitre @ 2007-12-11 15:36 ` Jon Smirl 2007-12-11 16:20 ` Nicolas Pitre 1 sibling, 0 replies; 82+ messages in thread From: Jon Smirl @ 2007-12-11 15:36 UTC (permalink / raw To: Nicolas Pitre; +Cc: Junio C Hamano, gcc, Git Mailing List On 12/11/07, Nicolas Pitre <nico@cam.org> wrote: > On Tue, 11 Dec 2007, Nicolas Pitre wrote: > > > And yet, this is still missing the actual issue. The issue being that > > the 2.1GB pack as a _source_ doesn't cause as much memory to be > > allocated even if the _result_ pack ends up being the same. > > > > I was able to repack the 2.1GB pack on my machine which has 1GB of ram. > > Now that it has been repacked, I can't repack it anymore, even when > > single threaded, as it starts crawling into swap fairly quickly. It is > > really non-intuitive and actually senseless that Git would require twice > > as much RAM to deal with a pack that is 7 times smaller. > > OK, here's something else for you to try: > > core.deltabasecachelimit=0 > pack.threads=2 > pack.deltacachesize=1 > > With that I'm able to repack the small gcc pack on my machine with 1GB > of ram using: > > git repack -a -f -d --window=250 --depth=250 > > and top reports a ~700m virt and ~500m res without hitting swap at all. > It is only at 25% so far, but I was unable to get that far before. > > Would be curious to know what you get with 4 threads on your machine. Changing those parameters really slowed down counting the objects. I used to be able to count in 45 seconds; now it took 130 seconds. I still have the Google allocator linked in. 4 threads, cumulative clock time 25% 200 seconds, 820/627M 55% 510 seconds, 1240/1000M - little late recording 75% 15 minutes, 1658/1500M 90% 22 minutes, 1974/1800M it's still running but there is no significant change. Are two types of allocations being mixed? 1) long term, global objects kept until the end of everything 2) volatile, private objects allocated only while the object is being compressed and then freed Separating these would make a big difference to the fragmentation problem. Single threading probably wouldn't see a fragmentation problem from mixing the allocation types. When a thread is created it could allocate a private 20MB (or whatever) pool. The volatile, private objects would come from that pool. Long term objects would stay in the global pool. Since they are long term they will just get laid down sequentially in memory. Separating these allocation types makes things way easier for malloc. CPU time would be helped by removing some of the locking if possible. -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 82+ messages in thread
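A minimal sketch of the per-thread scratch pool Jon describes (hypothetical code, not from git; the names and the 20MB figure are illustrative): short-lived buffers come from a thread-local block and are released all at once, so they never interleave with the long-term global allocations.

    #include <stdlib.h>

    /* Hypothetical thread-local pool for the volatile allocations. */
    struct scratch_pool {
            char *base;     /* one large block grabbed at thread start */
            size_t size;    /* total size, e.g. 20MB */
            size_t used;    /* bump pointer */
    };

    static __thread struct scratch_pool pool;

    static int pool_init(size_t size)
    {
            pool.base = malloc(size);
            pool.size = size;
            pool.used = 0;
            return pool.base ? 0 : -1;
    }

    /* Serve a short-lived allocation from the pool. */
    static void *pool_alloc(size_t n)
    {
            void *p;
            n = (n + 15) & ~(size_t)15;     /* keep 16-byte alignment */
            if (pool.used + n > pool.size)
                    return NULL;            /* real code would fall back to malloc() */
            p = pool.base + pool.used;
            pool.used += n;
            return p;
    }

    /* Drop all short-lived buffers at once when the object is done. */
    static void pool_reset(void)
    {
            pool.used = 0;
    }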
* Re: Something is broken in repack 2007-12-11 15:00 ` Nicolas Pitre 2007-12-11 15:36 ` Jon Smirl @ 2007-12-11 16:20 ` Nicolas Pitre 2007-12-11 16:21 ` Jon Smirl 1 sibling, 1 reply; 82+ messages in thread From: Nicolas Pitre @ 2007-12-11 16:20 UTC (permalink / raw To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Nicolas Pitre wrote: > OK, here's something else for you to try: > > core.deltabasecachelimit=0 > pack.threads=2 > pack.deltacachesize=1 > > With that I'm able to repack the small gcc pack on my machine with 1GB > of ram using: > > git repack -a -f -d --window=250 --depth=250 > > and top reports a ~700m virt and ~500m res without hitting swap at all. > It is only at 25% so far, but I was unable to get that far before. Well, around 55% memory usage skyrocketed to 1.6GB and the system went deep into swap. So I restarted it with no threads. Nicolas (even more puzzled) ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 16:20 ` Nicolas Pitre @ 2007-12-11 16:21 ` Jon Smirl 2007-12-12 5:12 ` Nicolas Pitre 0 siblings, 1 reply; 82+ messages in thread From: Jon Smirl @ 2007-12-11 16:21 UTC (permalink / raw To: Nicolas Pitre; +Cc: Junio C Hamano, gcc, Git Mailing List On 12/11/07, Nicolas Pitre <nico@cam.org> wrote: > On Tue, 11 Dec 2007, Nicolas Pitre wrote: > > > OK, here's something else for you to try: > > > > core.deltabasecachelimit=0 > > pack.threads=2 > > pack.deltacachesize=1 > > > > With that I'm able to repack the small gcc pack on my machine with 1GB > > of ram using: > > > > git repack -a -f -d --window=250 --depth=250 > > > > and top reports a ~700m virt and ~500m res without hitting swap at all. > > It is only at 25% so far, but I was unable to get that far before. > > Well, around 55% memory usage skyrocketed to 1.6GB and the system went > deep into swap. So I restarted it with no threads. > > Nicolas (even more puzzled) On the plus side you are seeing what I see, so it proves I am not imagining it. -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 16:21 ` Jon Smirl @ 2007-12-12 5:12 ` Nicolas Pitre 2007-12-12 8:05 ` David Kastrup ` (2 more replies) 0 siblings, 3 replies; 82+ messages in thread From: Nicolas Pitre @ 2007-12-12 5:12 UTC (permalink / raw To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > On 12/11/07, Nicolas Pitre <nico@cam.org> wrote: > > On Tue, 11 Dec 2007, Nicolas Pitre wrote: > > > > > OK, here's something else for you to try: > > > > > > core.deltabasecachelimit=0 > > > pack.threads=2 > > > pack.deltacachesize=1 > > > > > > With that I'm able to repack the small gcc pack on my machine with 1GB > > > of ram using: > > > > > > git repack -a -f -d --window=250 --depth=250 > > > > > > and top reports a ~700m virt and ~500m res without hitting swap at all. > > > It is only at 25% so far, but I was unable to get that far before. > > > > Well, around 55% memory usage skyrocketed to 1.6GB and the system went > > deep into swap. So I restarted it with no threads. > > > > Nicolas (even more puzzled) > > On the plus side you are seeing what I see, so it proves I am not imagining it. Well... This is weird. It seems that memory fragmentation is really really killing us here. The fact that the Google allocator did manage to waste quite a bit less memory is a good indicator already. I did modify the progress display to show accounted memory that was allocated vs memory that was freed but still not released to the system. At least that gives you an idea of memory allocation and fragmentation with glibc in real time: diff --git a/progress.c b/progress.c index d19f80c..46ac9ef 100644 --- a/progress.c +++ b/progress.c @@ -8,6 +8,7 @@ * published by the Free Software Foundation. */ +#include <malloc.h> #include "git-compat-util.h" #include "progress.h" @@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done) if (progress->total) { unsigned percent = n * 100 / progress->total; if (percent != progress->last_percent || progress_update) { + struct mallinfo m = mallinfo(); progress->last_percent = percent; - fprintf(stderr, "%s: %3u%% (%u/%u)%s%s", - progress->title, percent, n, - progress->total, tp, eol); + fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s", + progress->title, percent, n, progress->total, + m.uordblks >> 18, m.fordblks >> 18, + tp, eol); fflush(stderr); progress_update = 0; return 1; This shows that at some point the repack goes into a big memory surge. I don't have enough RAM to see how fragmented memory gets though, since it starts swapping around 50% done with 2 threads. With only 1 thread, memory usage grows significantly at around 11% with a pretty noticeable slowdown in the progress rate. So I think the theory goes like this: There is a block of big objects together in the list somewhere. Initially, all those big objects are assigned to thread #1 out of 4. Because those objects are big, they get really slow to delta compress, and storing them all in a window with 250 slots takes significant memory. Threads 2, 3, and 4 have "easy" work loads, so they complete fairly quickly compared to thread #1. But since the progress display is global, you won't notice that one thread is actually crawling slowly. To keep all threads busy until the end, those threads that are done with their work load will steal some work from another thread, choosing the one with the largest remaining work. That is most likely thread #1. 
So as threads 2, 3, and 4 complete, they will steal from thread 1 and populate their own window with those big objects too, and get slow too. And because all threads gets to work on those big objects towards the end, the progress display will then show a significant slowdown, and memory usage will almost quadruple. Add memory fragmentation to that and you have a clogged system. Solution: pack.deltacachesize=1 pack.windowmemory=16M Limiting the window memory to 16MB will automatically shrink the window size when big objects are encountered, therefore keeping much fewer of those objects at the same time in memory, which in turn means they will be processed much more quickly. And somehow that must help with memory fragmentation as well. Setting pack.deltacachesize to 1 is simply to disable the caching of delta results entirely which will only slow down the writing phase, but I wanted to keep it out of the picture for now. With the above settings, I'm currently repacking the gcc repo with 2 threads, and memory allocation never exceeded 700m virt and 400m res, while the mallinfo shows about 350MB, and progress has reached 90% which has never occurred on this machine with the 300MB source pack so far. Nicolas ^ permalink raw reply related [flat|nested] 82+ messages in thread
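In config form, the workaround above is (a sketch using the keys already discussed in this thread):

    git config pack.deltacachesize 1
    git config pack.windowmemory 16M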
* Re: Something is broken in repack 2007-12-12 5:12 ` Nicolas Pitre @ 2007-12-12 8:05 ` David Kastrup 2007-12-14 16:18 ` Wolfram Gloger 0 siblings, 1 reply; 82+ messages in thread From: David Kastrup @ 2007-12-12 8:05 UTC (permalink / raw To: Nicolas Pitre; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List Nicolas Pitre <nico@cam.org> writes: > Well... This is weird. > > It seems that memory fragmentation is really really killing us here. > The fact that the Google allocator did manage to waste quite a bit less memory > is a good indicator already. Maybe a malloc/free/mmap wrapper that records the requested sizes and alloc/free order and dumps them to a file so that one can make a compact git-free standalone test case for the glibc maintainers might be a good thing. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 82+ messages in thread
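A bare-bones sketch of such a wrapper (an illustration, not the tool Wolfram posts below; it glosses over dlsym() re-entrancy and thread safety): build it as a shared object, preload it, and every malloc/free gets logged with its size and address for later replay.

    /* trace_malloc.c -- build: gcc -shared -fPIC -o trace_malloc.so trace_malloc.c -ldl
     * use:   LD_PRELOAD=./trace_malloc.so git repack ...
     */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void *(*real_malloc)(size_t);
    static void (*real_free)(void *);
    static FILE *trace;

    static void trace_init(void)
    {
            real_malloc = dlsym(RTLD_NEXT, "malloc");
            real_free = dlsym(RTLD_NEXT, "free");
            trace = fopen("/tmp/malloc.trace", "w");
    }

    void *malloc(size_t n)
    {
            void *p;
            if (!real_malloc)
                    trace_init();
            p = real_malloc(n);
            if (trace)
                    fprintf(trace, "a %zu %p\n", n, p);   /* alloc: size, address */
            return p;
    }

    void free(void *p)
    {
            if (!real_free)
                    trace_init();
            if (trace && p)
                    fprintf(trace, "f %p\n", p);          /* free: address */
            real_free(p);
    }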
* Re: Something is broken in repack 2007-12-12 8:05 ` David Kastrup @ 2007-12-14 16:18 ` Wolfram Gloger 0 siblings, 0 replies; 82+ messages in thread From: Wolfram Gloger @ 2007-12-14 16:18 UTC (permalink / raw To: dak; +Cc: nico, jonsmirl, gitster, gcc, git Hi, > Maybe an malloc/free/mmap wrapper that records the requested sizes and > alloc/free order and dumps them to file so that one can make a compact > git-free standalone test case for the glibc maintainers might be a good > thing. I already have such a wrapper: http://malloc.de/malloc/mtrace-20060529.tar.gz But note that it does interfere with the thread scheduling, so it can't record the exact same allocation pattern as when not using the wrapper. Regards, Wolfram. ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-12 5:12 ` Nicolas Pitre 2007-12-12 8:05 ` David Kastrup @ 2007-12-12 15:48 ` Nicolas Pitre 2007-12-12 16:17 ` Paolo Bonzini ` (2 more replies) 2007-12-12 16:13 ` Nicolas Pitre 2 siblings, 3 replies; 82+ messages in thread From: Nicolas Pitre @ 2007-12-12 15:48 UTC (permalink / raw To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Wed, 12 Dec 2007, Nicolas Pitre wrote: > Add memory fragmentation to that and you have a clogged system. > > Solution: > > pack.deltacachesize=1 > pack.windowmemory=16M > > Limiting the window memory to 16MB will automatically shrink the window > size when big objects are encountered, therefore keeping much fewer of > those objects at the same time in memory, which in turn means they will > be processed much more quickly. And somehow that must help with memory > fragmentation as well. OK scrap that. When I returned to the computer this morning, the repack was completed... with a 1.3GB pack instead. So... The gcc repo apparently really needs a large window to efficiently compress those large objects. But when those large objects are already well deltified and you repack again with a large window, somehow the memory allocator is way more involved, probably even more so when there are several threads in parallel amplifying the issue, and things probably get to a point of no return with regard to memory fragmentation after a while. So... my conclusion is that the glibc allocator has fragmentation issues with this work load, given the notable difference with the Google allocator, which itself might not be completely immune to fragmentation issues of its own. And because the gcc repo requires a large window of big objects to get good compression, you're better off not using 4 threads to repack it with -a -f. The fact that the size of the source pack has such an influence is probably only because the increased usage of the delta base object cache is playing a role in the global memory allocation pattern, allowing for the bad fragmentation issue to occur. If you could run one last test with the mallinfo patch I posted, without the pack.windowmemory setting, and adding the reported values along with those from top, then we could formally conclude that this is a memory fragmentation issue. So I don't think Git itself is actually bad. The gcc repo most certainly constitutes a nasty use case for memory allocators, but I don't think there is much we can do about it besides possibly implementing our own memory allocator with active defragmentation where possible (read memcpy) at some point to give glibc's allocator some chance to breathe a bit more. In the meantime you might have to use only one thread and lots of memory to repack the gcc repo, or find the perfect memory allocator to be used with Git. After all, packing the whole gcc history to around 230MB is quite a stunt but it requires sufficient resources to achieve it. Fortunately, like Linus said, such a wholesale repack is not something that most users have to do anyway. Nicolas ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-12 15:48 ` Nicolas Pitre @ 2007-12-12 16:17 ` Paolo Bonzini 2007-12-12 16:37 ` Linus Torvalds 2007-12-13 13:32 ` Nguyen Thai Ngoc Duy 2 siblings, 0 replies; 82+ messages in thread From: Paolo Bonzini @ 2007-12-12 16:17 UTC (permalink / raw To: gcc; +Cc: git > When I returned to the computer this morning, the repack was > completed... with a 1.3GB pack instead. > > So... The gcc repo apparently really needs a large window to efficiently > compress those large objects. So, am I right that if you have a very well-done pack (such as gcc's), you might want to repack in two phases: - first discarding the old deltas and using a small window, thus producing a bad pack that can be repacked without humongous amounts of memory... - ... then discarding the old deltas and producing another well-compressed pack? Paolo ^ permalink raw reply [flat|nested] 82+ messages in thread
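In command form, Paolo's two-phase idea would look something like this (a sketch; the small window/depth values in phase 1 are arbitrary):

    # phase 1: throw away the old deltas cheaply, using a small window
    git repack -a -d -f --window=10 --depth=10
    # phase 2: recompress from the now loosely-packed repository
    git repack -a -d -f --window=250 --depth=250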
* Re: Something is broken in repack 2007-12-12 15:48 ` Nicolas Pitre 2007-12-12 16:17 ` Paolo Bonzini @ 2007-12-12 16:37 ` Linus Torvalds 2007-12-12 16:42 ` David Miller ` (2 more replies) 2007-12-13 13:32 ` Nguyen Thai Ngoc Duy 2 siblings, 3 replies; 82+ messages in thread From: Linus Torvalds @ 2007-12-12 16:37 UTC (permalink / raw To: Nicolas Pitre; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List On Wed, 12 Dec 2007, Nicolas Pitre wrote: > > So... my conclusion is that the glibc allocator has fragmentation issues > with this work load, given the notable difference with the Google > allocator, which itself might not be completely immune to fragmentation > issues of its own. Yes. Note that delta following involves patterns something like allocate (small) space for delta for i in (1..depth) { allocate large space for base allocate large space for result .. apply delta .. free large space for base free small space for delta } so if you have some stupid heap algorithm that doesn't try to merge and re-use free'd spaces very aggressively (because that takes CPU time!), you might have memory usage be horribly inflated by the heap having all those holes for all the objects that got free'd in the chain that don't get aggressively re-used. Threaded memory allocators then make this worse by probably using totally different heaps for different threads (in order to avoid locking), so they will *all* have the fragmentation issue. And if you *really* want to cause trouble for a memory allocator, what you should try to do is to allocate the memory in one thread, and free it in another, and then things can really explode (the freeing thread notices that the allocation is not in its thread-local heap, so instead of really freeing it, it puts it on a separate list of areas to be freed later by the original thread when it needs memory - or worse, it adds it to the local thread list, and makes it effectively totally impossible to then ever merge different free'd allocations ever again because the freed things will be on different heap lists!). I'm not saying that particular case happens in git, I'm just saying that it's not unheard of. And with the delta cache and the object lookup, it's not at _all_ impossible that we hit the "allocate in one thread, free in another" case! Linus ^ permalink raw reply [flat|nested] 82+ messages in thread
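Linus's pattern can be turned into a toy program that models the allocation behaviour (illustration only; sizes are made up and kept below the default glibc mmap threshold so every block comes from, and returns to, the main heap):

    #include <stdlib.h>

    /* Toy model of delta-chain application: walk a chain of DEPTH deltas,
     * each step allocating a large result, then freeing the previous base
     * and the small delta, leaving holes of mixed sizes behind. */
    int main(void)
    {
            enum { DEPTH = 250, LARGE = 64 << 10, SMALL = 1 << 10 };
            int i;
            void *base = malloc(LARGE);             /* the undeltified base object */
            for (i = 0; i < DEPTH; i++) {
                    void *delta = malloc(SMALL);    /* small space for the delta */
                    void *result = malloc(LARGE);   /* large space for the result */
                    /* ... apply delta to base, producing result ... */
                    free(base);                     /* large hole left behind */
                    free(delta);                    /* small hole left behind */
                    base = result;                  /* result is the next base */
            }
            free(base);
            return 0;
    }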
* Re: Something is broken in repack 2007-12-12 16:37 ` Linus Torvalds @ 2007-12-12 16:42 ` David Miller 2007-12-12 16:54 ` Linus Torvalds 2007-12-12 17:12 ` Jon Smirl 2007-12-14 16:12 ` Wolfram Gloger 2 siblings, 1 reply; 82+ messages in thread From: David Miller @ 2007-12-12 16:42 UTC (permalink / raw To: torvalds; +Cc: nico, jonsmirl, gitster, gcc, git From: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed, 12 Dec 2007 08:37:10 -0800 (PST) > I'm not saying that particular case happens in git, I'm just saying that > it's not unheard of. And with the delta cache and the object lookup, it's > not at _all_ impossible that we hit the "allocate in one thread, free in > another" case! One thing that supports these theories is that, while running these large repacks, I notice that the RSS is roughly 2/3 of the amount of virtual address space allocated. I personally don't think it's unreasonable for GIT to have its own customized allocator at least for certain object types. ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:42 ` David Miller @ 2007-12-12 16:54 ` Linus Torvalds 0 siblings, 0 replies; 82+ messages in thread From: Linus Torvalds @ 2007-12-12 16:54 UTC (permalink / raw To: David Miller; +Cc: nico, jonsmirl, gitster, gcc, git On Wed, 12 Dec 2007, David Miller wrote: > > I personally don't think it's unreasonable for GIT to have its > own customized allocator at least for certain object types. Well, we actually already *do* have a customized allocator, but currently only for the actual core "object descriptor" that really just has the SHA1 and object flags in it (and a few extra words depending on object type). Those are critical for certain loads, and small too (so using the standard allocator wasted a _lot_ of memory). In addition, they're fixed-size and never free'd, so a specialized allocator really can do a lot better than any general-purpose memory allocator ever could. But the actual object *contents* are currently all allocated with whatever the standard libc malloc/free allocator is that you compile for (or load dynamically). Having a specialized allocator for them is a much more involved issue, exactly because we do have interesting allocation patterns etc. That said, at least those object allocations are all single-threaded (for right now, at least), so even when git does multi-threaded stuff, the core sha1_file.c stuff is always run under a single lock, and a simpler allocator that doesn't care about threads is likely to be much better than one that tries to have thread-local heaps etc. I suspect that is what the google allocator does. It probably doesn't have per-thread heaps, it just uses locking (and quite possibly things like per-*size* heaps, which is much more memory-efficient and helps avoid some of the fragmentation problems). Locking is much slower than per-thread accesses, but it doesn't have the issues with per-thread-fragmentation and all the problems with one thread allocating and another one freeing. Linus ^ permalink raw reply [flat|nested] 82+ messages in thread
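The allocator Linus refers to lives in git's alloc.c; in outline it works roughly like this (a paraphrased sketch, not the verbatim source): fixed-size, never-freed nodes are carved out of large blocks, so there is no per-node bookkeeping and nothing to fragment.

    #include <stdlib.h>

    #define BLOCKING 1024   /* nodes carved per block */

    struct node_slab {
            char *next;     /* next unused node in the current block */
            size_t left;    /* nodes remaining in the current block */
    };

    /* Hand out one fixed-size node; grab a fresh block when empty.
     * Nodes are never freed individually. */
    static void *alloc_node(struct node_slab *s, size_t node_size)
    {
            void *ret;
            if (!s->left) {
                    s->next = malloc(BLOCKING * node_size);
                    if (!s->next)
                            return NULL;
                    s->left = BLOCKING;
            }
            ret = s->next;
            s->next += node_size;
            s->left--;
            return ret;
    }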
* Re: Something is broken in repack 2007-12-12 16:37 ` Linus Torvalds 2007-12-12 16:42 ` David Miller @ 2007-12-12 17:12 ` Jon Smirl 2007-12-14 16:12 ` Wolfram Gloger 2 siblings, 0 replies; 82+ messages in thread From: Jon Smirl @ 2007-12-12 17:12 UTC (permalink / raw To: Linus Torvalds; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On 12/12/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Wed, 12 Dec 2007, Nicolas Pitre wrote: > > > > So... my conclusion is that the glibc allocator has fragmentation issues > > with this work load, given the notable difference with the Google > > allocator, which itself might not be completely immune to fragmentation > > issues of its own. > > Yes. > > Note that delta following involves patterns something like > > allocate (small) space for delta > for i in (1..depth) { > allocate large space for base > allocate large space for result > .. apply delta .. > free large space for base > free small space for delta > } Is it hard to hack up something that statically allocates a big block of memory per thread for these two and then just reuses it? allocate (small) space for delta allocate large space for base The alternating between long term and short term allocations definitely aggravates fragmentation. > > so if you have some stupid heap algorithm that doesn't try to merge and > re-use free'd spaces very aggressively (because that takes CPU time!), you > might have memory usage be horribly inflated by the heap having all those > holes for all the objects that got free'd in the chain that don't get > aggressively re-used. > > Threaded memory allocators then make this worse by probably using totally > different heaps for different threads (in order to avoid locking), so they > will *all* have the fragmentation issue. > > And if you *really* want to cause trouble for a memory allocator, what you > should try to do is to allocate the memory in one thread, and free it in > another, and then things can really explode (the freeing thread notices > that the allocation is not in its thread-local heap, so instead of really > freeing it, it puts it on a separate list of areas to be freed later by > the original thread when it needs memory - or worse, it adds it to the > local thread list, and makes it effectively totally impossible to then > ever merge different free'd allocations ever again because the freed > things will be on different heap lists!). > > I'm not saying that particular case happens in git, I'm just saying that > it's not unheard of. And with the delta cache and the object lookup, it's > not at _all_ impossible that we hit the "allocate in one thread, free in > another" case! > > Linus > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-12 16:37 ` Linus Torvalds 2007-12-12 16:42 ` David Miller 2007-12-12 17:12 ` Jon Smirl @ 2007-12-14 16:12 ` Wolfram Gloger 2007-12-14 16:45 ` David Kastrup 2 siblings, 1 reply; 82+ messages in thread From: Wolfram Gloger @ 2007-12-14 16:12 UTC (permalink / raw To: torvalds; +Cc: nico, jonsmirl, gitster, gcc, git Hi, > Note that delta following involves patterns something like > > allocate (small) space for delta > for i in (1..depth) { > allocate large space for base > allocate large space for result > .. apply delta .. > free large space for base > free small space for delta > } > > so if you have some stupid heap algorithm that doesn't try to merge and > re-use free'd spaces very aggressively (because that takes CPU time!), ptmalloc2 (in glibc) _per arena_ is basically best-fit. This is the best known general strategy, but it certainly cannot be the best in every case. > you > might have memory usage be horribly inflated by the heap having all those > holes for all the objects that got free'd in the chain that don't get > aggressively re-used. It depends how large 'large' is -- if it exceeds the mmap() threshold (settable with mallopt(M_MMAP_THRESHOLD, ...)) the 'large' spaces will be allocated with mmap() and won't cause any internal fragmentation. It might pay to experiment with this parameter if it is hard to avoid the alloc/free large space sequence. > Threaded memory allocators then make this worse by probably using totally > different heaps for different threads (in order to avoid locking), so they > will *all* have the fragmentation issue. Indeed. Could someone perhaps try ptmalloc3 (http://malloc.de/malloc/ptmalloc3-current.tar.gz) on this case? Thanks, Wolfram. ^ permalink raw reply [flat|nested] 82+ messages in thread
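The threshold tweak Wolfram mentions is a one-line experiment (sketch; the 1MB value is arbitrary, and the call must happen early, before the big allocations start), somewhere near the top of main():

    #include <malloc.h>

    /* Ask glibc to satisfy allocations of 1MB and up via mmap(), so they
     * are returned to the kernel on free() and cannot fragment the heap. */
    mallopt(M_MMAP_THRESHOLD, 1024 * 1024);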
* Re: Something is broken in repack 2007-12-14 16:12 ` Wolfram Gloger @ 2007-12-14 16:45 ` David Kastrup 2007-12-14 16:59 ` Wolfram Gloger 0 siblings, 1 reply; 82+ messages in thread From: David Kastrup @ 2007-12-14 16:45 UTC (permalink / raw To: Wolfram Gloger; +Cc: torvalds, nico, jonsmirl, gitster, gcc, git Wolfram Gloger <wmglo@dent.med.uni-muenchen.de> writes: > Hi, > >> Note that delta following involves patterns something like >> >> allocate (small) space for delta >> for i in (1..depth) { >> allocate large space for base >> allocate large space for result >> .. apply delta .. >> free large space for base >> free small space for delta >> } >> >> so if you have some stupid heap algorithm that doesn't try to merge and >> re-use free'd spaces very aggressively (because that takes CPU time!), > > ptmalloc2 (in glibc) _per arena_ is basically best-fit. This is the > best known general strategy, Uh what? Someone crank out his copy of "The Art of Computer Programming", I think volume 1. Best fit is known (analyzed and proven and documented decades ago) to be one of the worst strategies for memory allocation. Exactly because it leads to huge fragmentation problems. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-14 16:45 ` David Kastrup @ 2007-12-14 16:59 ` Wolfram Gloger 0 siblings, 0 replies; 82+ messages in thread From: Wolfram Gloger @ 2007-12-14 16:59 UTC (permalink / raw To: dak; +Cc: wmglo, torvalds, nico, jonsmirl, gitster, gcc, git Hi, > Uh what? Someone crank out his copy of "The Art of Computer > Programming", I think volume 1. Best fit is known (analyzed and proven > and documented decades ago) to be one of the worst strategies for memory > allocation. Exactly because it leads to huge fragmentation problems. Well, quoting http://gee.cs.oswego.edu/dl/html/malloc.html: "As shown by Wilson et al, best-fit schemes (of various kinds and approximations) tend to produce the least fragmentation on real loads compared to other general approaches such as first-fit." See [Wilson 1995] ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for more details and references. Regards, Wolfram. ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-12 15:48 ` Nicolas Pitre 2007-12-12 16:17 ` Paolo Bonzini 2007-12-12 16:37 ` Linus Torvalds @ 2007-12-13 13:32 ` Nguyen Thai Ngoc Duy 2007-12-13 15:32 ` Paolo Bonzini 2 siblings, 1 reply; 82+ messages in thread From: Nguyen Thai Ngoc Duy @ 2007-12-13 13:32 UTC (permalink / raw To: Nicolas Pitre; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List On Dec 12, 2007 10:48 PM, Nicolas Pitre <nico@cam.org> wrote: > In the mean time you might have to use only one thread and lots of > memory to repack the gcc repo, or find the perfect memory allocator to > be used with Git. After all, packing the whole gcc history to around > 230MB is quite a stunt but it requires sufficient resources to > achieve it. Fortunately, like Linus said, such a wholesale repack is not > something that most users have to do anyway. Is there an alternative to "git repack -a -d" that repacks everything but the first pack? -- Duy ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-13 13:32 ` Nguyen Thai Ngoc Duy @ 2007-12-13 15:32 ` Paolo Bonzini 2007-12-13 16:29 ` Paolo Bonzini 2007-12-13 16:39 ` Johannes Sixt 0 siblings, 2 replies; 82+ messages in thread From: Paolo Bonzini @ 2007-12-13 15:32 UTC (permalink / raw To: git; +Cc: gcc Nguyen Thai Ngoc Duy wrote: > On Dec 12, 2007 10:48 PM, Nicolas Pitre <nico@cam.org> wrote: >> In the mean time you might have to use only one thread and lots of >> memory to repack the gcc repo, or find the perfect memory allocator to >> be used with Git. After all, packing the whole gcc history to around >> 230MB is quite a stunt but it requires sufficient resources to >> achieve it. Fortunately, like Linus said, such a wholesale repack is not >> something that most users have to do anyway. > > Is there an alternative to "git repack -a -d" that repacks everything > but the first pack? That would be a pretty good idea for big repositories. If I were to implement it, I would actually add a .git/config option like pack.permanent so that more than one pack could be made permanent; then to repack really really everything you'd need "git repack -a -a -d". Paolo ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-13 15:32 ` Paolo Bonzini @ 2007-12-13 16:29 ` Paolo Bonzini 2007-12-13 16:39 ` Johannes Sixt 1 sibling, 0 replies; 82+ messages in thread From: Paolo Bonzini @ 2007-12-13 16:29 UTC (permalink / raw Cc: git, gcc >> Is there an alternative to "git repack -a -d" that repacks everything >> but the first pack? > > That would be a pretty good idea for big repositories. If I were to > implement it, I would actually add a .git/config option like > pack.permanent so that more than one pack could be made permanent; then > to repack really really everything you'd need "git repack -a -a -d". Actually there is something like this, as seen from the source of git-repack: for e in `cd "$PACKDIR" && find . -type f -name '*.pack' \ | sed -e 's/^\.\///' -e 's/\.pack$//'` do if [ -e "$PACKDIR/$e.keep" ]; then : keep else args="$args --unpacked=$e.pack" existing="$existing $e" fi done So, just create a file named as the pack, but with extension ".keep". Paolo ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-13 15:32 ` Paolo Bonzini 2007-12-13 16:29 ` Paolo Bonzini @ 2007-12-13 16:39 ` Johannes Sixt 2007-12-14 1:04 ` Jakub Narebski 1 sibling, 1 reply; 82+ messages in thread From: Johannes Sixt @ 2007-12-13 16:39 UTC (permalink / raw To: Paolo Bonzini; +Cc: git, gcc Paolo Bonzini schrieb: > Nguyen Thai Ngoc Duy wrote: >> On Dec 12, 2007 10:48 PM, Nicolas Pitre <nico@cam.org> wrote: >>> In the mean time you might have to use only one thread and lots of >>> memory to repack the gcc repo, or find the perfect memory allocator to >>> be used with Git. After all, packing the whole gcc history to around >>> 230MB is quite a stunt but it requires sufficient resources to >>> achieve it. Fortunately, like Linus said, such a wholesale repack is not >>> something that most users have to do anyway. >> >> Is there an alternative to "git repack -a -d" that repacks everything >> but the first pack? > > That would be a pretty good idea for big repositories. If I were to > implement it, I would actually add a .git/config option like > pack.permanent so that more than one pack could be made permanent; then > to repack really really everything you'd need "git repack -a -a -d". It's already there: If you have a pack .git/objects/pack/pack-foo.pack, then "touch .git/objects/pack/pack-foo.keep" marks the pack as precious. -- Hannes ^ permalink raw reply [flat|nested] 82+ messages in thread
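Putting the last two messages together, marking every existing pack in a repository as precious is just (a sketch; the file contents are free-form, see Jakub's note below):

    for p in .git/objects/pack/pack-*.pack; do
        echo "base pack, do not repack" > "${p%.pack}.keep"
    done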
* Re: Something is broken in repack 2007-12-13 16:39 ` Johannes Sixt @ 2007-12-14 1:04 ` Jakub Narebski 2007-12-14 6:14 ` Paolo Bonzini 0 siblings, 1 reply; 82+ messages in thread From: Jakub Narebski @ 2007-12-14 1:04 UTC (permalink / raw To: git; +Cc: gcc Johannes Sixt wrote: > Paolo Bonzini schrieb: >> Nguyen Thai Ngoc Duy wrote: >>> >>> Is there an alternative to "git repack -a -d" that repacks everything >>> but the first pack? >> >> That would be a pretty good idea for big repositories. If I were to >> implement it, I would actually add a .git/config option like >> pack.permanent so that more than one pack could be made permanent; then >> to repack really really everything you'd need "git repack -a -a -d". > > It's already there: If you have a pack .git/objects/pack/pack-foo.pack, then > "touch .git/objects/pack/pack-foo.keep" marks the pack as precious. Actually you can (and probably should) put a single line stating the _reason_ the pack is to be kept in the *.keep file. Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of all things. -- Jakub Narebski Warsaw, Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-14 1:04 ` Jakub Narebski @ 2007-12-14 6:14 ` Paolo Bonzini 2007-12-14 6:24 ` Nguyen Thai Ngoc Duy 2007-12-14 13:25 ` Nicolas Pitre 0 siblings, 2 replies; 82+ messages in thread From: Paolo Bonzini @ 2007-12-14 6:14 UTC (permalink / raw To: git; +Cc: gcc > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of > all things. I found that the .keep file is not transmitted over the network (at least I tried with git+ssh:// and http:// protocols), however. Paolo ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-14 6:14 ` Paolo Bonzini @ 2007-12-14 6:24 ` Nguyen Thai Ngoc Duy 2007-12-14 8:20 ` Paolo Bonzini 2007-12-14 10:40 ` Jakub Narebski 1 sibling, 2 replies; 82+ messages in thread From: Nguyen Thai Ngoc Duy @ 2007-12-14 6:24 UTC (permalink / raw To: Paolo Bonzini; +Cc: git, gcc On Dec 14, 2007 1:14 PM, Paolo Bonzini <bonzini@gnu.org> wrote: > > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of > > all things. > > I found that the .keep file is not transmitted over the network (at > least I tried with git+ssh:// and http:// protocols), however. I'm thinking about "git clone --keep" to mark initial packs precious. But 'git clone' is being rewritten in C. Let's wait until the C rewrite is done. -- Duy ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-14 6:24 ` Nguyen Thai Ngoc Duy @ 2007-12-14 8:20 ` Paolo Bonzini 2007-12-14 9:01 ` Harvey Harrison 2007-12-14 10:40 ` Jakub Narebski 1 sibling, 1 reply; 82+ messages in thread From: Paolo Bonzini @ 2007-12-14 8:20 UTC (permalink / raw To: gcc; +Cc: git > I'm thinking about "git clone --keep" to mark initial packs precious. > But 'git clone' is under rewrite to C. Let's wait until C rewrite is > done. It should be the default, IMHO. Paolo ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-14 8:20 ` Paolo Bonzini @ 2007-12-14 9:01 ` Harvey Harrison 0 siblings, 0 replies; 82+ messages in thread From: Harvey Harrison @ 2007-12-14 9:01 UTC (permalink / raw To: Paolo Bonzini; +Cc: gcc, git On Fri, 2007-12-14 at 09:20 +0100, Paolo Bonzini wrote: > > I'm thinking about "git clone --keep" to mark initial packs precious. > > But 'git clone' is under rewrite to C. Let's wait until C rewrite is > > done. > > It should be the default, IMHO. > While it doesn't mark the packs as .keep, git will reuse all of the old deltas you got in the original clone, so you're not losing anything. Harvey ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-14 6:24 ` Nguyen Thai Ngoc Duy 2007-12-14 8:20 ` Paolo Bonzini @ 2007-12-14 10:40 ` Jakub Narebski 2007-12-14 10:52 ` Nguyen Thai Ngoc Duy 1 sibling, 1 reply; 82+ messages in thread From: Jakub Narebski @ 2007-12-14 10:40 UTC (permalink / raw To: Nguyen Thai Ngoc Duy; +Cc: Paolo Bonzini, git, gcc "Nguyen Thai Ngoc Duy" <pclouds@gmail.com> writes: > On Dec 14, 2007 1:14 PM, Paolo Bonzini <bonzini@gnu.org> wrote: > > > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of > > > all things. > > > > I found that the .keep file is not transmitted over the network (at > > least I tried with git+ssh:// and http:// protocols), however. > > I'm thinking about "git clone --keep" to mark initial packs precious. > But 'git clone' is under rewrite to C. Let's wait until C rewrite is > done. But if you clone via network, pack might be network optimized if you use "smart" transport, not disk optimized, at least with current git which regenerates pack also on clone AFAIK. -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-14 10:40 ` Jakub Narebski @ 2007-12-14 10:52 ` Nguyen Thai Ngoc Duy 0 siblings, 0 replies; 82+ messages in thread From: Nguyen Thai Ngoc Duy @ 2007-12-14 10:52 UTC (permalink / raw To: Jakub Narebski, Harvey Harrison; +Cc: Paolo Bonzini, git, gcc On Dec 14, 2007 4:01 PM, Harvey Harrison <harvey.harrison@gmail.com> wrote: > While it doesn't mark the packs as .keep, git will reuse all of the old > deltas you got in the original clone, so you're not losing anything. There is another reason I want it. I have an ~800MB pack and I don't want git to rewrite the pack every time I repack my changes. So it's also about disk usage (it doesn't require 800MB on disk to prepare the new pack, and doesn't write as much). On Dec 14, 2007 5:40 PM, Jakub Narebski <jnareb@gmail.com> wrote: > But if you clone via network, pack might be network optimized if you > use "smart" transport, not disk optimized, at least with current git > which regenerates pack also on clone AFAIK. Um.. that's OK, it just regenerates once. -- Duy ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-14 6:14 ` Paolo Bonzini 2007-12-14 6:24 ` Nguyen Thai Ngoc Duy @ 2007-12-14 13:25 ` Nicolas Pitre 1 sibling, 0 replies; 82+ messages in thread From: Nicolas Pitre @ 2007-12-14 13:25 UTC (permalink / raw To: Paolo Bonzini; +Cc: git, gcc On Fri, 14 Dec 2007, Paolo Bonzini wrote: > > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of > > all things. > > I found that the .keep file is not transmitted over the network (at least I > tried with git+ssh:// and http:// protocols), however. That is a local policy. Nicolas ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-12 5:12 ` Nicolas Pitre 2007-12-12 8:05 ` David Kastrup 2007-12-12 15:48 ` Nicolas Pitre @ 2007-12-12 16:13 ` Nicolas Pitre 2007-12-13 7:32 ` Andreas Ericsson 2 siblings, 1 reply; 82+ messages in thread From: Nicolas Pitre @ 2007-12-12 16:13 UTC (permalink / raw To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Wed, 12 Dec 2007, Nicolas Pitre wrote: > I did modify the progress display to show accounted memory that was > allocated vs memory that was freed but still not released to the system. > At least that gives you an idea of memory allocation and fragmentation > with glibc in real time: > > diff --git a/progress.c b/progress.c > index d19f80c..46ac9ef 100644 > --- a/progress.c > +++ b/progress.c > @@ -8,6 +8,7 @@ > * published by the Free Software Foundation. > */ > > +#include <malloc.h> > #include "git-compat-util.h" > #include "progress.h" > > @@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done) > if (progress->total) { > unsigned percent = n * 100 / progress->total; > if (percent != progress->last_percent || progress_update) { > + struct mallinfo m = mallinfo(); > progress->last_percent = percent; > - fprintf(stderr, "%s: %3u%% (%u/%u)%s%s", > - progress->title, percent, n, > - progress->total, tp, eol); > + fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s", > + progress->title, percent, n, progress->total, > + m.uordblks >> 18, m.fordblks >> 18, > + tp, eol); Note: I didn't know what unit of memory those blocks represent, so the shift is most probably wrong. Nicolas ^ permalink raw reply [flat|nested] 82+ messages in thread
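For what it's worth, glibc's mallinfo reports uordblks and fordblks in bytes, so a 20-bit shift would print megabytes (note the struct fields are plain int, so the values wrap above 2GB); the fprintf in the patch above would become:

    fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s",
            progress->title, percent, n, progress->total,
            m.uordblks >> 20, m.fordblks >> 20,
            tp, eol);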
* Re: Something is broken in repack 2007-12-12 16:13 ` Nicolas Pitre @ 2007-12-13 7:32 ` Andreas Ericsson 2007-12-14 16:03 ` Wolfram Gloger 0 siblings, 1 reply; 82+ messages in thread From: Andreas Ericsson @ 2007-12-13 7:32 UTC (permalink / raw To: Nicolas Pitre; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List Nicolas Pitre wrote: > On Wed, 12 Dec 2007, Nicolas Pitre wrote: > >> I did modify the progress display to show accounted memory that was >> allocated vs memory that was freed but still not released to the system. >> At least that gives you an idea of memory allocation and fragmentation >> with glibc in real time: >> >> diff --git a/progress.c b/progress.c >> index d19f80c..46ac9ef 100644 >> --- a/progress.c >> +++ b/progress.c >> @@ -8,6 +8,7 @@ >> * published by the Free Software Foundation. >> */ >> >> +#include <malloc.h> >> #include "git-compat-util.h" >> #include "progress.h" >> >> @@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done) >> if (progress->total) { >> unsigned percent = n * 100 / progress->total; >> if (percent != progress->last_percent || progress_update) { >> + struct mallinfo m = mallinfo(); >> progress->last_percent = percent; >> - fprintf(stderr, "%s: %3u%% (%u/%u)%s%s", >> - progress->title, percent, n, >> - progress->total, tp, eol); >> + fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s", >> + progress->title, percent, n, progress->total, >> + m.uordblks >> 18, m.fordblks >> 18, >> + tp, eol); > > Note: I didn't know what unit of memory those blocks represents, so the > shift is most probably wrong. > Me neither, but it appears to me as if hblkhd holds the actual memory consumed by the process. It seems to store the information in bytes, which I find a bit dubious unless glibc has some internal multiplier. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-13 7:32 ` Andreas Ericsson @ 2007-12-14 16:03 ` Wolfram Gloger 0 siblings, 0 replies; 82+ messages in thread From: Wolfram Gloger @ 2007-12-14 16:03 UTC (permalink / raw To: ae; +Cc: nico, jonsmirl, gitster, gcc, git Hi, > >> if (progress->total) { > >> unsigned percent = n * 100 / progress->total; > >> if (percent != progress->last_percent || progress_update) { > >> + struct mallinfo m = mallinfo(); > >> progress->last_percent = percent; > >> - fprintf(stderr, "%s: %3u%% (%u/%u)%s%s", > >> - progress->title, percent, n, > >> - progress->total, tp, eol); > >> + fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s", > >> + progress->title, percent, n, progress->total, > >> + m.uordblks >> 18, m.fordblks >> 18, > >> + tp, eol); > > > > Note: I didn't know what unit of memory those blocks represents, so the > > shift is most probably wrong. > > > > Me neither, but it appears to me as if hblkhd holds the actual memory > consumed by the process. It seems to store the information in bytes, > which I find a bit dubious unless glibc has some internal multiplier. mallinfo() will only give you the used memory for the main arena. When you have separate arenas (likely when concurrent threads have been used), the only way to get the full picture is to call malloc_stats(), which prints to stderr. Regards, Wolfram. ^ permalink raw reply [flat|nested] 82+ messages in thread
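A sketch of Wolfram's alternative in the same spot of the progress code (malloc_stats() is the real glibc call; exactly where to hook it is a judgment call):

    #include <malloc.h>
    ...
    /* prints per-arena and total in-use/system bytes to stderr */
    malloc_stats();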
* Re: Something is broken in repack 2007-12-11 7:01 ` Jon Smirl 2007-12-11 7:34 ` Andreas Ericsson 2007-12-11 13:49 ` Nicolas Pitre @ 2007-12-11 16:33 ` Linus Torvalds 2007-12-11 17:21 ` Nicolas Pitre 2007-12-11 18:43 ` Jon Smirl 2007-12-11 17:28 ` Daniel Berlin 3 siblings, 2 replies; 82+ messages in thread From: Linus Torvalds @ 2007-12-11 16:33 UTC (permalink / raw To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > > So why does our threaded code take 20 CPU minutes longer (12%) to run > than the same code with a single thread? Threaded code *always* takes more CPU time. The only thing you can hope for is a wall-clock reduction. You're seeing probably a combination of (a) more cache misses (b) bigger dataset active at a time and a probably fairly minuscule (c) threading itself tends to have some overheads. > Q6600 is just two E6600s in the same package, the caches are not shared. Sure they are shared. They're just not *entirely* shared. But they are shared between each two cores, so each thread essentially has only half the cache they had with the non-threaded version. Threading is *not* a magic solution to all problems. It gives you potentially twice the CPU power, but there are real downsides that you should keep in mind. > Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc) > with 4 threads? But only need 950MB with one thread? Where's the extra > gigabyte going? I suspect that it's really simple: you have a few rather big files in the gcc history, with deep delta chains. And what happens when you have four threads running at the same time is that they all need to keep all those objects that they are working on - and their hash state - in memory at the same time! So if you want to use more threads, that _forces_ you to have a bigger memory footprint, simply because you have more "live" objects that you work on. Normally, that isn't much of a problem, since most source files are small, but if you have a few deep delta chains on big files, the delta chain itself is going to use memory (you may have limited the size of the cache, but it's still needed for the actual delta generation, so it's not like the memory usage went away). That said, I suspect there are a few things fighting you: - threading is hard. I haven't looked a lot at the changes Nico did to do a threaded object packer, but what I've seen does not convince me it is correct. The "trg_entry" accesses are *mostly* protected with "cache_lock", but nothing else really seems to be, so quite frankly, I wouldn't trust the threaded version very much. It's off by default, and for a good reason, I think. For example: the packing code does this: if (!src->data) { read_lock(); src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz); read_unlock(); ... and that's racy. If two threads come in at roughly the same time and see a NULL src->data, they'll both get the lock, and they'll both (serially) try to fill it in. It will all *work*, but one of them will have done unnecessary work, and one of them will have their result thrown away and leaked. Are you hitting issues like this? I dunno. The object sorting means that different threads normally shouldn't look at the same objects (not even the sources), so probably not, but basically, I wouldn't trust the threading 100%. It needs work, and it needs to stay off by default. - you're working on a problem that isn't really even worth optimizing that much. 
The *normal* case is to re-use old deltas, which makes all of the issues you are fighting basically go away (because you only have a few _incremental_ objects that need deltaing). In other words: the _real_ optimizations have already been done, and are done elsewhere, and are much smarter (the best way to optimize X is not to make X run fast, but to avoid doing X in the first place!). The thing you are trying to work with is the one-time-only case where you explicitly disable that big and important optimization, and then you complain about the end result being slow! It's like saying that you're compiling with extreme debugging and no optimizations, and then complaining that the end result doesn't run as fast as if you used -O2. Except this is a hundred times worse, because you literally asked git to do the really expensive thing that it really really doesn't want to do ;) > Is there another allocator to try? One that combines Google's > efficiency with gcc's speed? See above: I'd look around at threading-related bugs and check the way we lock (or don't) accesses. Linus ^ permalink raw reply [flat|nested] 82+ messages in thread
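For illustration, a self-contained sketch of the lazy-initialization pattern Linus points at, with the usual remedy of re-checking under the lock; load_object() and ensure_loaded() are hypothetical stand-ins rather than the actual pack-objects code, and memory-ordering subtleties are ignored:

    #include <pthread.h>
    #include <stdlib.h>

    static pthread_mutex_t read_mutex = PTHREAD_MUTEX_INITIALIZER;

    struct unpacked {
        void *data;
        unsigned long size;
    };

    /* Stand-in for read_sha1_file(): an expensive, non-reentrant load. */
    static void *load_object(unsigned long *sz)
    {
        *sz = 4096;
        return calloc(1, *sz);
    }

    static void ensure_loaded(struct unpacked *src)
    {
        if (!src->data) {                    /* cheap unlocked peek */
            pthread_mutex_lock(&read_mutex);
            if (!src->data)                  /* re-check under the lock */
                src->data = load_object(&src->size);
            pthread_mutex_unlock(&read_mutex);
        }
    }

With the unlocked check alone, two threads can both observe NULL and both perform the load, and the loser's buffer is wasted or leaked; the second check closes that window. (As Nicolas explains below, pack-objects actually sidesteps the issue entirely: no two threads ever share an entry.)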
* Re: Something is broken in repack 2007-12-11 16:33 ` Linus Torvalds @ 2007-12-11 17:21 ` Nicolas Pitre 2007-12-11 17:24 ` David Miller 2007-12-11 18:43 ` Jon Smirl 1 sibling, 1 reply; 82+ messages in thread From: Nicolas Pitre @ 2007-12-11 17:21 UTC (permalink / raw To: Linus Torvalds; +Cc: Jon Smirl, Junio C Hamano, gcc, Git Mailing List [-- Attachment #1: Type: TEXT/PLAIN, Size: 4146 bytes --] On Tue, 11 Dec 2007, Linus Torvalds wrote: > That said, I suspect there are a few things fighting you: > > - threading is hard. I haven't looked a lot at the changes Nico did to do > a threaded object packer, but what I've seen does not convince me it is > correct. The "trg_entry" accesses are *mostly* protected with > "cache_lock", but nothing else really seems to be, so quite frankly, I > wouldn't trust the threaded version very much. It's off by default, and > for a good reason, I think. I beg to differ (of course, since I always know precisely what I do, and like you, my code never has bugs). Seriously though, the trg_entry does not have to be protected at all. Why? Simply because each thread has its own exclusive set of objects which no other threads ever mess with. They never overlap. > For example: the packing code does this: > > if (!src->data) { > read_lock(); > src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz); > read_unlock(); > ... > > and that's racy. If two threads come in at roughly the same time and > see a NULL src->data, they'll both get the lock, and they'll both > (serially) try to fill it in. It will all *work*, but one of them will > have done unnecessary work, and one of them will have their result > thrown away and leaked. No. Once again, it is impossible for two threads to ever see the same src->data at all. The lock is there simply because read_sha1_file() is not reentrant. > Are you hitting issues like this? I dunno. The object sorting means > that different threads normally shouldn't look at the same objects (not > even the sources), so probably not, but basically, I wouldn't trust the > threading 100%. It needs work, and it needs to stay off by default. For now it is, but I wouldn't say it really needs significant work at this point. The latest thread patches were more about tuning than correctness. What the threading could be doing, though, is uncovering some other bugs, like in the pack mmap windowing code for example. Although that code is serialized by the read lock above, the fact that multiple threads are hammering on it in turns means that the mmap window is possibly seeking back and forth much more often than otherwise, possibly leaking something in the process. > - you're working on a problem that isn't really even worth optimizing > that much. The *normal* case is to re-use old deltas, which makes all > of the issues you are fighting basically go away (because you only have > a few _incremental_ objects that need deltaing). > > In other words: the _real_ optimizations have already been done, and > are done elsewhere, and are much smarter (the best way to optimize X is > not to make X run fast, but to avoid doing X in the first place!). The > thing you are trying to work with is the one-time-only case where you > explicitly disable that big and important optimization, and then you > complain about the end result being slow! > > It's like saying that you're compiling with extreme debugging and no > optimizations, and then complaining that the end result doesn't run as > fast as if you used -O2. 
Except this is a hundred times worse, because > you literally asked git to do the really expensive thing that it really > really doesn't want to do ;) Linus, please pay attention to the _actual_ important issue here. Sure, I've been tuning the threading code in parallel to the attempt to debug this memory usage issue. BUT. The point is that repacking the gcc repo using "git repack -a -f --window=250" has a radically different memory usage profile depending on whether you do the repack on the earlier 2.1GB pack or the later 300MB pack. _That_ is the issue. Ironically, it is the 300MB pack that causes the repack to blow memory usage out of proportion. And in both cases, the threading code has to do the same work whether or not the original pack was densely packed, since -f throws away all existing deltas anyway. So something is fishy somewhere other than in the packing code. Nicolas ^ permalink raw reply [flat|nested] 82+ messages in thread
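A sketch of the partitioning Nicolas describes, under the assumption that the object list is split into disjoint contiguous slices, one per thread; the names are illustrative rather than the actual pack-objects structures:

    #include <pthread.h>

    struct object_entry;                 /* opaque for this sketch */

    struct thread_params {
        pthread_t thread;
        struct object_entry **list;      /* this thread's exclusive slice */
        unsigned list_size;
    };

    /* Hand each of nr_threads workers a disjoint slice of the list, so
     * no entry (and hence no trg_entry) is ever visible to two threads. */
    static void partition_work(struct object_entry **list, unsigned nr,
                               struct thread_params *p, int nr_threads)
    {
        unsigned chunk = nr / nr_threads;
        int i;

        for (i = 0; i < nr_threads; i++) {
            unsigned size = (i == nr_threads - 1) ? nr : chunk;
            p[i].list = list;
            p[i].list_size = size;
            list += size;
            nr -= size;
        }
    }

Because the slices never overlap, per-entry fields need no locking at all; only shared services, such as the non-reentrant read_sha1_file() and the delta cache accounting, take a lock.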
* Re: Something is broken in repack 2007-12-11 17:21 ` Nicolas Pitre @ 2007-12-11 17:24 ` David Miller 2007-12-11 17:44 ` Nicolas Pitre 0 siblings, 1 reply; 82+ messages in thread From: David Miller @ 2007-12-11 17:24 UTC (permalink / raw To: nico; +Cc: torvalds, jonsmirl, gitster, gcc, git From: Nicolas Pitre <nico@cam.org> Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST) > BUT. The point is that repacking the gcc repo using "git repack -a -f > --window=250" has a radically different memory usage profile depending on > whether you do the repack on the earlier 2.1GB pack or the later 300MB pack. If you repack on the smaller pack file, git has to expand more stuff internally in order to search the deltas, whereas with the larger pack file I bet git has to undelta'ify less often to get base object blobs for delta search. In fact that behavior makes perfect sense to me, and I don't understand GIT internals very well :-) ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 17:24 ` David Miller @ 2007-12-11 17:44 ` Nicolas Pitre 2007-12-11 20:26 ` Andreas Ericsson 0 siblings, 1 reply; 82+ messages in thread From: Nicolas Pitre @ 2007-12-11 17:44 UTC (permalink / raw To: David Miller; +Cc: Linus Torvalds, jonsmirl, Junio C Hamano, gcc, git On Tue, 11 Dec 2007, David Miller wrote: > From: Nicolas Pitre <nico@cam.org> > Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST) > > > BUT. The point is that repacking the gcc repo using "git repack -a -f > > --window=250" has a radically different memory usage profile depending on > > whether you do the repack on the earlier 2.1GB pack or the later 300MB pack. > > If you repack on the smaller pack file, git has to expand more stuff > internally in order to search the deltas, whereas with the larger pack > file I bet git has to undelta'ify less often to get base object blobs > for delta search. Of course. I came to that conclusion two days ago. And despite being pretty familiar with the involved code (I wrote part of it myself) I just can't spot anything wrong with it so far. But somehow the threading code keeps distracting people from that issue since it gets to do the same work whether or not the source pack is densely packed. Nicolas (who wishes he had access to a much faster machine to investigate this issue) ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 17:44 ` Nicolas Pitre @ 2007-12-11 20:26 ` Andreas Ericsson 0 siblings, 0 replies; 82+ messages in thread From: Andreas Ericsson @ 2007-12-11 20:26 UTC (permalink / raw To: Nicolas Pitre Cc: David Miller, Linus Torvalds, jonsmirl, Junio C Hamano, gcc, git Nicolas Pitre wrote: > On Tue, 11 Dec 2007, David Miller wrote: > >> From: Nicolas Pitre <nico@cam.org> >> Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST) >> >>> BUT. The point is that repacking the gcc repo using "git repack -a -f >>> --window=250" has a radically different memory usage profile depending on whether you >>> do the repack on the earlier 2.1GB pack or the later 300MB pack. >> If you repack on the smaller pack file, git has to expand more stuff >> internally in order to search the deltas, whereas with the larger pack >> file I bet git has to undelta'ify less often to get base object blobs >> for delta search. > > Of course. I came to that conclusion two days ago. And despite being > pretty familiar with the involved code (I wrote part of it myself) I > just can't spot anything wrong with it so far. > > But somehow the threading code keeps distracting people from that issue > since it gets to do the same work whether or not the source pack is > densely packed. > > Nicolas > (who wishes he had access to a much faster machine to investigate this issue) If it's still an issue next week, we'll have a 16-core (8 dual-core CPUs) machine with some 32GB of RAM coming in that'll be free for about two days. You'll have to remind me about it though, as I've got a lot on my mind these days. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 16:33 ` Linus Torvalds 2007-12-11 17:21 ` Nicolas Pitre @ 2007-12-11 18:43 ` Jon Smirl 2007-12-11 18:57 ` Nicolas Pitre 2007-12-11 19:17 ` Linus Torvalds 1 sibling, 2 replies; 82+ messages in thread From: Jon Smirl @ 2007-12-11 18:43 UTC (permalink / raw To: Linus Torvalds; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On 12/11/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Tue, 11 Dec 2007, Jon Smirl wrote: > > > > So why does our threaded code take 20 CPU minutes longer (12%) to run > > than the same code with a single thread? > > Threaded code *always* takes more CPU time. The only thing you can hope > for is a wall-clock reduction. You're seeing probably a combination of > (a) more cache misses > (b) bigger dataset active at a time > and a probably fairly minuscule > (c) threading itself tends to have some overheads. > > > Q6600 is just two E6600s in the same package, the caches are not shared. > > Sure they are shared. They're just not *entirely* shared. But they are > shared between each two cores, so each thread essentially has only half > the cache they had with the non-threaded version. > > Threading is *not* a magic solution to all problems. It gives you > potentially twice the CPU power, but there are real downsides that you > should keep in mind. > > > Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc) > > with 4 threads? But only need 950MB with one thread? Where's the extra > > gigabyte going? > > I suspect that it's really simple: you have a few rather big files in the > gcc history, with deep delta chains. And what happens when you have four > threads running at the same time is that they all need to keep all those > objects that they are working on - and their hash state - in memory at the > same time! > > So if you want to use more threads, that _forces_ you to have a bigger > memory footprint, simply because you have more "live" objects that you > work on. Normally, that isn't much of a problem, since most source files > are small, but if you have a few deep delta chains on big files, both the > delta chain itself is going to use memory (you may have limited the size > of the cache, but it's still needed for the actual delta generation, so > it's not like the memory usage went away). This makes sense. Those runs that blew up to 4.5GB were a combination of this effect and fragmentation in the gcc allocator. Google allocator appears to be much better at controlling fragmentation. Is there a reasonable scheme to force the chains to only be loaded once and then shared between worker threads? The memory blow up appears to be directly correlated with chain length. > > That said, I suspect there are a few things fighting you: > > - threading is hard. I haven't looked a lot at the changes Nico did to do > a threaded object packer, but what I've seen does not convince me it is > correct. The "trg_entry" accesses are *mostly* protected with > "cache_lock", but nothing else really seems to be, so quite frankly, I > wouldn't trust the threaded version very much. It's off by default, and > for a good reason, I think. > > For example: the packing code does this: > > if (!src->data) { > read_lock(); > src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz); > read_unlock(); > ... > > and that's racy. If two threads come in at roughly the same time and > see a NULL src->data, they'll both get the lock, and they'll both > (serially) try to fill it in. 
It will all *work*, but one of them will > have done unnecessary work, and one of them will have their result > thrown away and leaked. That may account for the threaded version needing an extra 20 minutes CPU time. An extra 12% of CPU seems like too much overhead for threading. Just letting a couple of those long chain compressions be done twice could account for it. > > Are you hitting issues like this? I dunno. The object sorting means > that different threads normally shouldn't look at the same objects (not > even the sources), so probably not, but basically, I wouldn't trust the > threading 100%. It needs work, and it needs to stay off by default. > > - you're working on a problem that isn't really even worth optimizing > that much. The *normal* case is to re-use old deltas, which makes all > of the issues you are fighting basically go away (because you only have > a few _incremental_ objects that need deltaing). I agree, this problem only occurs when people import giant repositories. But every time someone hits these problems they declare git to be screwed up and proceed to trash it in their blogs. > In other words: the _real_ optimizations have already been done, and > are done elsewhere, and are much smarter (the best way to optimize X is > not to make X run fast, but to avoid doing X in the first place!). The > thing you are trying to work with is the one-time-only case where you > explicitly disable that big and important optimization, and then you > complain about the end result being slow! > > It's like saying that you're compiling with extreme debugging and no > optimizations, and then complaining that the end result doesn't run as > fast as if you used -O2. Except this is a hundred times worse, because > you literally asked git to do the really expensive thing that it really > really doesn't want to do ;) > > > Is there another allocator to try? One that combines Google's > > efficiency with gcc's speed? > > See above: I'd look around at threading-related bugs and check the way we > lock (or don't) accesses. > > Linus > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 18:43 ` Jon Smirl @ 2007-12-11 18:57 ` Nicolas Pitre 2007-12-11 19:17 ` Linus Torvalds 1 sibling, 0 replies; 82+ messages in thread From: Nicolas Pitre @ 2007-12-11 18:57 UTC (permalink / raw To: Jon Smirl; +Cc: Linus Torvalds, Junio C Hamano, gcc, Git Mailing List [-- Attachment #1: Type: TEXT/PLAIN, Size: 3040 bytes --] On Tue, 11 Dec 2007, Jon Smirl wrote: > This makes sense. Those runs that blew up to 4.5GB were a combination > of this effect and fragmentation in the gcc allocator. I disagree. This is insane. > Google allocator appears to be much better at controlling fragmentation. Indeed. And if fragmentation is indeed wasting half of Git's memory usage then we'll have to come up with a custom memory allocator. > Is there a reasonable scheme to force the chains to only be loaded > once and then shared between worker threads? The memory blow up > appears to be directly correlated with chain length. No. That would be the equivalent of holding each revision of all files uncompressed all at once in memory. > > That said, I suspect there are a few things fighting you: > > > > - threading is hard. I haven't looked a lot at the changes Nico did to do > > a threaded object packer, but what I've seen does not convince me it is > > correct. The "trg_entry" accesses are *mostly* protected with > > "cache_lock", but nothing else really seems to be, so quite frankly, I > > wouldn't trust the threaded version very much. It's off by default, and > > for a good reason, I think. > > > > For example: the packing code does this: > > > > if (!src->data) { > > read_lock(); > > src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz); > > read_unlock(); > > ... > > > > and that's racy. If two threads come in at roughly the same time and > > see a NULL src->data, they'll both get the lock, and they'll both > > (serially) try to fill it in. It will all *work*, but one of them will > > have done unnecessary work, and one of them will have their result > > thrown away and leaked. > > That may account for the threaded version needing an extra 20 minutes > CPU time. An extra 12% of CPU seems like too much overhead for > threading. Just letting a couple of those long chain compressions be > done twice could account for it. No it may not. This theory is wrong as explained before. > > > > Are you hitting issues like this? I dunno. The object sorting means > > that different threads normally shouldn't look at the same objects (not > > even the sources), so probably not, but basically, I wouldn't trust the > > threading 100%. It needs work, and it needs to stay off by default. > > > > - you're working on a problem that isn't really even worth optimizing > > that much. The *normal* case is to re-use old deltas, which makes all > > of the issues you are fighting basically go away (because you only have > > a few _incremental_ objects that need deltaing). > > I agree, this problem only occurs when people import giant > repositories. But every time someone hits these problems they declare > git to be screwed up and proceed to trash it in their blogs. It's not only for repack. Someone just reported git-blame being unusable too due to insane memory usage, which I suspect is due to the same issue. Nicolas ^ permalink raw reply [flat|nested] 82+ messages in thread
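If a custom allocator ever became necessary, one classic shape is a pool ("bump") allocator that carves small, same-lifetime objects out of large blocks, sidestepping per-chunk fragmentation; this is a generic sketch, not something proposed in the thread:

    #include <stdlib.h>

    struct mem_pool {
        struct mem_pool *next;
        size_t used, cap;
        unsigned char space[];           /* block the objects live in */
    };

    /* Allocate len bytes from the current block, starting a new one
     * when it is full; individual frees are given up in exchange for
     * freeing whole blocks at once and near-zero fragmentation. */
    static void *pool_alloc(struct mem_pool **head, size_t len)
    {
        struct mem_pool *p = *head;

        len = (len + 7) & ~(size_t)7;    /* keep 8-byte alignment */
        if (!p || p->cap - p->used < len) {
            size_t cap = len > (1 << 20) ? len : (1 << 20);
            p = malloc(sizeof(*p) + cap);
            if (!p)
                return NULL;
            p->next = *head;
            p->used = 0;
            p->cap = cap;
            *head = p;
        }
        p->used += len;
        return p->space + p->used - len;
    }

The trade-off is that nothing can be freed early, so it only fits allocations whose lifetimes genuinely end together.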
* Re: Something is broken in repack 2007-12-11 18:43 ` Jon Smirl 2007-12-11 18:57 ` Nicolas Pitre @ 2007-12-11 19:17 ` Linus Torvalds 2007-12-11 19:40 ` Junio C Hamano 1 sibling, 1 reply; 82+ messages in thread From: Linus Torvalds @ 2007-12-11 19:17 UTC (permalink / raw To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > > > > So if you want to use more threads, that _forces_ you to have a bigger > > memory footprint, simply because you have more "live" objects that you > > work on. Normally, that isn't much of a problem, since most source files > > are small, but if you have a few deep delta chains on big files, both the > > delta chain itself is going to use memory (you may have limited the size > > of the cache, but it's still needed for the actual delta generation, so > > it's not like the memory usage went away). > > This makes sense. Those runs that blew up to 4.5GB were a combination > of this effect and fragmentation in the gcc allocator. Google > allocator appears to be much better at controlling fragmentation. Yes. I think we do have some case where we simply keep a lot of objects around, and if we are talking reasonably large deltas, we'll have the whole delta-chain in memory just to unpack one single object. The delta cache size limits kick in only when we explicitly cache old delta results (in case they will be re-used, which is rather common); they don't affect the normal "I'm using this data right now" case at all. And then fragmentation makes it much much worse. Since the allocation patterns aren't nice (they are pretty random and depend on just the sizes of the objects), and the lifetimes aren't always nicely nested _either_ (they become more so when you disable the cache entirely, but that's just death for performance), I'm not surprised that there can be memory allocators that end up having some issues. > Is there a reasonable scheme to force the chains to only be loaded > once and then shared between worker threads? The memory blow up > appears to be directly correlated with chain length. The worker threads explicitly avoid touching the same objects, and no, you definitely don't want to explode the chains globally once, because the whole point is that we do fit 15 years' worth of history into 300MB of pack-file thanks to having a very dense representation. The "loaded once" part is the mmap'ing of the pack-file into memory, but if you were to actually then try to expand the chains, you'd be talking about many *many* more gigabytes of memory than you already see used ;) So what you actually want to do is to just re-use already packed delta chains directly, which is what we normally do. But you are explicitly looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why it then blows up. I'm sure we can find places to improve. But I would like to reiterate the statement that you're kind of doing a "don't do that then" case which is really - by design - meant to be done once and never again, and is using resources - again, pretty much by design - wildly inappropriately just to get an initial packing done. > That may account for the threaded version needing an extra 20 minutes > CPU time. An extra 12% of CPU seems like too much overhead for > threading. Just letting a couple of those long chain compressions be > done twice could account for it. Well, Nico pointed out that those things should all be thread-private data, so no, the race isn't there (unless there's some other bug there). 
> I agree, this problem only occurs when people import giant > repositories. But every time someone hits these problems they declare > git to be screwed up and proceed to trash it in their blogs. Sure. I'd love to do global packing without paying the cost, but it really was a design decision. Thanks to doing off-line packing ("let it run overnight on some beefy machine") we can get better results. It's expensive, yes. But it was pretty much meant to be expensive. It's a very efficient compression algorithm, after all, and you're turning it up to eleven ;) I also suspect that the gcc archive makes things more interesting thanks to having some rather large files. The ChangeLog is probably the worst case (large file with *lots* of edits), but I suspect the *.po files aren't wonderful either. Linus ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 19:17 ` Linus Torvalds @ 2007-12-11 19:40 ` Junio C Hamano 2007-12-11 20:34 ` Andreas Ericsson 0 siblings, 1 reply; 82+ messages in thread From: Junio C Hamano @ 2007-12-11 19:40 UTC (permalink / raw To: Linus Torvalds; +Cc: Jon Smirl, Nicolas Pitre, gcc, Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > On Tue, 11 Dec 2007, Jon Smirl wrote: >> > >> > So if you want to use more threads, that _forces_ you to have a bigger >> > memory footprint, simply because you have more "live" objects that you >> > work on. Normally, that isn't much of a problem, since most source files >> > are small, but if you have a few deep delta chains on big files, both the >> > delta chain itself is going to use memory (you may have limited the size >> > of the cache, but it's still needed for the actual delta generation, so >> > it's not like the memory usage went away). >> >> This makes sense. Those runs that blew up to 4.5GB were a combination >> of this effect and fragmentation in the gcc allocator. Google >> allocator appears to be much better at controlling fragmentation. > > Yes. I think we do have some case where we simply keep a lot of objects > around, and if we are talking reasonably large deltas, we'll have the > whole delta-chain in memory just to unpack one single object. Eh, excuse me. unpack_delta_entry() - first unpacks the base object (this goes recursive); - uncompresses the delta; - applies the delta to the base to obtain the target object; - frees delta; - frees (but allows it to be cached) the base object; - returns the result So no matter how deep a chain is, you keep only one delta at a time in core, not the whole delta-chain in core. > So what you actually want to do is to just re-use already packed delta > chains directly, which is what we normally do. But you are explicitly > looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why > it then blows up. While that does not explain, as Nico pointed out, the huge difference between the two repack runs that have different starting packs, I would say it is a fair thing to say. If you have a suboptimal pack (i.e. not enough reusable deltas, as in the 2.1GB pack case), do run "repack -f", but if you have a good pack (i.e. 300MB pack), don't. ^ permalink raw reply [flat|nested] 82+ messages in thread
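In pseudo-C, the flow Junio outlines looks roughly like this; patch_delta() is git's real delta application routine, while the pack handle and the other helpers are hypothetical stand-ins:

    #include <stdlib.h>
    #include <sys/types.h>               /* off_t */

    struct pack;                         /* hypothetical pack handle */

    /* hypothetical stand-ins for the real pack access routines */
    extern void *unpack_entry_sketch(struct pack *, off_t, unsigned long *);
    extern off_t delta_base_offset(struct pack *, off_t);
    extern void *inflate_delta(struct pack *, off_t, unsigned long *);
    /* the real thing, from patch-delta.c */
    extern void *patch_delta(const void *src, unsigned long src_size,
                             const void *delta, unsigned long delta_size,
                             unsigned long *dst_size);

    static void *unpack_delta_entry_sketch(struct pack *p, off_t obj_off,
                                           unsigned long *sizep)
    {
        unsigned long base_size, delta_size;
        void *base, *delta, *result;

        /* unpack the base object -- the recursive step */
        base = unpack_entry_sketch(p, delta_base_offset(p, obj_off),
                                   &base_size);
        /* uncompress this object's delta */
        delta = inflate_delta(p, obj_off, &delta_size);
        /* apply the delta to the base to obtain the target object */
        result = patch_delta(base, base_size, delta, delta_size, sizep);
        free(delta);                     /* delta dies as soon as applied */
        free(base);                      /* base released (git may cache it) */
        return result;
    }

So at any instant roughly one base, one delta and one result are in core, independent of how deep the chain is.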
* Re: Something is broken in repack 2007-12-11 19:40 ` Junio C Hamano @ 2007-12-11 20:34 ` Andreas Ericsson 0 siblings, 0 replies; 82+ messages in thread From: Andreas Ericsson @ 2007-12-11 20:34 UTC (permalink / raw To: Junio C Hamano Cc: Linus Torvalds, Jon Smirl, Nicolas Pitre, gcc, Git Mailing List Junio C Hamano wrote: > Linus Torvalds <torvalds@linux-foundation.org> writes: > >> So what you actually want to do is to just re-use already packed delta >> chains directly, which is what we normally do. But you are explicitly >> looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why >> it then blows up. > > While that does not explain, as Nico pointed out, the huge difference > between the two repack runs that have different starting packs, I would > say it is a fair thing to say. If you have a suboptimal pack (i.e. not > enough reusable deltas, as in the 2.1GB pack case), do run "repack -f", > but if you have a good pack (i.e. 300MB pack), don't. I think this is too much of a mystery for a lot of people to let it go. Even I started looking into it, and I've got so little spare time just now that I wouldn't stand much of a chance of making a contribution even if I had written the code originally. That being said, I think the fact that some git repositories really *can't* be repacked on some machines (because repacking eats ALL virtual memory) is really something that lowers git's reputation among huge projects. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 7:01 ` Jon Smirl ` (2 preceding siblings ...) 2007-12-11 16:33 ` Linus Torvalds @ 2007-12-11 17:28 ` Daniel Berlin 3 siblings, 0 replies; 82+ messages in thread From: Daniel Berlin @ 2007-12-11 17:28 UTC (permalink / raw To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, gcc, Git Mailing List On 12/11/07, Jon Smirl <jonsmirl@gmail.com> wrote: > > Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of > being faster are not true. Depends on your allocation patterns. For our apps, it certainly is :) Of course, I don't know if we've updated the external allocator in a while, I'll bug the people in charge of it. ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 5:29 ` Jon Smirl 2007-12-11 7:01 ` Jon Smirl @ 2007-12-11 13:31 ` Nicolas Pitre 1 sibling, 0 replies; 82+ messages in thread From: Nicolas Pitre @ 2007-12-11 13:31 UTC (permalink / raw To: Jon Smirl; +Cc: Junio C Hamano, gcc, Git Mailing List On Tue, 11 Dec 2007, Jon Smirl wrote: > I added the gcc people to the CC, it's their repository. Maybe they > can help us sort this out. Unless there is a Git expert amongst the gcc crowd, I somehow doubt it. And gcc people with an interest in Git internals are probably already on the Git mailing list. Nicolas ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 5:25 ` Jon Smirl 2007-12-11 5:29 ` Jon Smirl @ 2007-12-11 6:01 ` Sean 2007-12-11 6:20 ` Jon Smirl 1 sibling, 1 reply; 82+ messages in thread From: Sean @ 2007-12-11 6:01 UTC (permalink / raw To: Jon Smirl; +Cc: Nicolas Pitre, Junio C Hamano, Git Mailing List On Tue, 11 Dec 2007 00:25:55 -0500 "Jon Smirl" <jonsmirl@gmail.com> wrote: > Something is hurting bad with threads. 170 CPU minutes with one > thread, versus 195 CPU minutes with four threads. > > Is there a different memory allocator that can be used when > multithreaded on gcc? This whole problem may be coming from the memory > allocation function. git is hardly interacting at all on the thread > level so it's likely a problem in the C run-time. You might want to try Google's malloc, it's basically a drop-in replacement with some optional built-in performance monitoring capabilities. It is said to be much faster and better at threading than glibc's: http://code.google.com/p/google-perftools/wiki/GooglePerformanceTools http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html You can LD_PRELOAD it or link directly. Cheers, Sean ^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: Something is broken in repack 2007-12-11 6:01 ` Sean @ 2007-12-11 6:20 ` Jon Smirl 0 siblings, 0 replies; 82+ messages in thread From: Jon Smirl @ 2007-12-11 6:20 UTC (permalink / raw To: Sean; +Cc: Nicolas Pitre, Junio C Hamano, Git Mailing List On 12/11/07, Sean <seanlkml@sympatico.ca> wrote: > On Tue, 11 Dec 2007 00:25:55 -0500 > "Jon Smirl" <jonsmirl@gmail.com> wrote: > > > Something is hurting bad with threads. 170 CPU minutes with one > > thread, versus 195 CPU minutes with four threads. > > > > Is there a different memory allocator that can be used when > > multithreaded on gcc? This whole problem may be coming from the memory > > allocation function. git is hardly interacting at all on the thread > > level so it's likely a problem in the C run-time. > > You might want to try Google's malloc, it's basically a drop-in replacement > with some optional built-in performance monitoring capabilities. It is said > to be much faster and better at threading than glibc's: > > http://code.google.com/p/google-perftools/wiki/GooglePerformanceTools > > http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html > > > You can LD_PRELOAD it or link directly. I'm 45 minutes into a run using it. It doesn't seem to be any faster but it is reducing memory consumption significantly. The run should be done in another 20 minutes or so. > > Cheers, > Sean > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 82+ messages in thread
end of thread, other threads:[~2007-12-14 17:00 UTC | newest] Thread overview: 82+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-12-07 23:05 Something is broken in repack Jon Smirl 2007-12-08 0:37 ` Linus Torvalds 2007-12-08 1:27 ` [PATCH] pack-objects: fix delta cache size accounting Nicolas Pitre 2007-12-08 1:46 ` Something is broken in repack Nicolas Pitre 2007-12-08 2:04 ` Jon Smirl 2007-12-08 2:28 ` Nicolas Pitre 2007-12-08 3:29 ` Jon Smirl 2007-12-08 3:37 ` David Brown 2007-12-08 4:22 ` Jon Smirl 2007-12-08 4:30 ` Nicolas Pitre 2007-12-08 5:01 ` Jon Smirl 2007-12-08 5:12 ` Nicolas Pitre 2007-12-08 3:48 ` Harvey Harrison 2007-12-08 2:22 ` Jon Smirl 2007-12-08 3:44 ` Harvey Harrison 2007-12-08 22:18 ` Junio C Hamano 2007-12-09 8:05 ` Junio C Hamano 2007-12-09 15:19 ` Jon Smirl 2007-12-09 18:25 ` Jon Smirl 2007-12-10 1:07 ` Nicolas Pitre 2007-12-10 2:49 ` Nicolas Pitre 2007-12-08 2:56 ` David Brown 2007-12-10 19:56 ` Nicolas Pitre 2007-12-10 20:05 ` Jon Smirl 2007-12-10 20:16 ` Morten Welinder 2007-12-11 2:25 ` Jon Smirl 2007-12-11 2:55 ` Junio C Hamano 2007-12-11 3:27 ` Nicolas Pitre 2007-12-11 11:08 ` David Kastrup 2007-12-11 12:08 ` Pierre Habouzit 2007-12-11 12:18 ` David Kastrup 2007-12-11 3:49 ` Nicolas Pitre 2007-12-11 5:25 ` Jon Smirl 2007-12-11 5:29 ` Jon Smirl 2007-12-11 7:01 ` Jon Smirl 2007-12-11 7:34 ` Andreas Ericsson 2007-12-11 13:49 ` Nicolas Pitre 2007-12-11 15:00 ` Nicolas Pitre 2007-12-11 15:36 ` Jon Smirl 2007-12-11 16:20 ` Nicolas Pitre 2007-12-11 16:21 ` Jon Smirl 2007-12-12 5:12 ` Nicolas Pitre 2007-12-12 8:05 ` David Kastrup 2007-12-14 16:18 ` Wolfram Gloger 2007-12-12 15:48 ` Nicolas Pitre 2007-12-12 16:17 ` Paolo Bonzini 2007-12-12 16:37 ` Linus Torvalds 2007-12-12 16:42 ` David Miller 2007-12-12 16:54 ` Linus Torvalds 2007-12-12 17:12 ` Jon Smirl 2007-12-14 16:12 ` Wolfram Gloger 2007-12-14 16:45 ` David Kastrup 2007-12-14 16:59 ` Wolfram Gloger 2007-12-13 13:32 ` Nguyen Thai Ngoc Duy 2007-12-13 15:32 ` Paolo Bonzini 2007-12-13 16:29 ` Paolo Bonzini 2007-12-13 16:39 ` Johannes Sixt 2007-12-14 1:04 ` Jakub Narebski 2007-12-14 6:14 ` Paolo Bonzini 2007-12-14 6:24 ` Nguyen Thai Ngoc Duy 2007-12-14 8:20 ` Paolo Bonzini 2007-12-14 9:01 ` Harvey Harrison 2007-12-14 10:40 ` Jakub Narebski 2007-12-14 10:52 ` Nguyen Thai Ngoc Duy 2007-12-14 13:25 ` Nicolas Pitre 2007-12-12 16:13 ` Nicolas Pitre 2007-12-13 7:32 ` Andreas Ericsson 2007-12-14 16:03 ` Wolfram Gloger 2007-12-11 16:33 ` Linus Torvalds 2007-12-11 17:21 ` Nicolas Pitre 2007-12-11 17:24 ` David Miller 2007-12-11 17:44 ` Nicolas Pitre 2007-12-11 20:26 ` Andreas Ericsson 2007-12-11 18:43 ` Jon Smirl 2007-12-11 18:57 ` Nicolas Pitre 2007-12-11 19:17 ` Linus Torvalds 2007-12-11 19:40 ` Junio C Hamano 2007-12-11 20:34 ` Andreas Ericsson 2007-12-11 17:28 ` Daniel Berlin 2007-12-11 13:31 ` Nicolas Pitre 2007-12-11 6:01 ` Sean 2007-12-11 6:20 ` Jon Smirl