* Re: Git and GCC  (90+ messages in thread)
From: David Miller @ 2007-12-06  2:28 UTC
To: dberlin; +Cc: ismail, gcc, git

From: "Daniel Berlin" <dberlin@dberlin.org>
Date: Wed, 5 Dec 2007 14:08:41 -0500

> So I tried a full history conversion using git-svn of the gcc
> repository (i.e. every trunk revision from 1-HEAD as of yesterday).
> The git-svn import was done using repacks every 1000 revisions.
> After it finished, I used git-gc --aggressive --prune.  Two hours
> later, it finished.
> The final size after this is 1.5 gig for all of the history of gcc
> for just trunk.
>
> dberlin@home:/compilerstuff/gitgcc/gccrepo/.git/objects/pack$ ls -trl
> total 1568899
> -r--r--r-- 1 dberlin dberlin 1585972834 2007-12-05 14:01
> pack-cd328fcf0bd673d8f2f72c42fbe67da64cbcd218.pack
> -r--r--r-- 1 dberlin dberlin   19008488 2007-12-05 14:01
> pack-cd328fcf0bd673d8f2f72c42fbe67da64cbcd218.idx
>
> This is 3x bigger than hg *and* hg doesn't require me to waste my
> life repacking every so often.
> The hg operations run roughly as fast as the git ones.
>
> I'm sure there are magic options, magic command lines, etc., I could
> use to make it smaller.
>
> I'm sure if I spent the next few weeks fucking around with git, it
> may even be usable!
>
> But given that git is harder to use, requires manual repacking to get
> any kind of sane space usage, and is 3x bigger anyway, I don't see
> any advantage to continuing to experiment with git and gcc.

I would really appreciate it if you would share experiences like this
with the git community, who have now been CC:'d.  That's the only way
this situation is going to improve.

When you don't CC: the people who can fix the problem, I can only
speculate that perhaps at least subconsciously you don't care if the
situation improves or not.  The OpenSolaris folks behaved similarly,
and that really ticked me off.
* Re: Git and GCC
From: Daniel Berlin @ 2007-12-06  2:41 UTC
To: David Miller; +Cc: ismail, gcc, git

On 12/5/07, David Miller <davem@davemloft.net> wrote:
> [...]
> I would really appreciate it if you would share experiences like
> this with the git community, who have now been CC:'d.
>
> That's the only way this situation is going to improve.
>
> When you don't CC: the people who can fix the problem, I can only
> speculate that perhaps at least subconsciously you don't care if
> the situation improves or not.

I didn't CC the git community for three reasons:

1. It's not the nicest message in the world, and thus more likely to
   get bad responses than constructive ones.
2. Based on the level of usability, I simply assume it is too young
   for regular developers to use.  At least, I hope this is the case.
3. People I know have had bad experiences raising usability issues
   with the git community in the past.  I am not likely to fare any
   better, so I would rather have someone who is involved with both
   our community and theirs raise these issues, rather than a
   complete newcomer.

But hey, whatever floats your boat :)

It is true I gave up quickly, but this is mainly because I don't like
to fight with my tools.  I am quite fine with a distributed workflow:
I now use 8 or so gcc branches in Mercurial (auto-synced from svn) and
merge a lot between them.  I wanted to see if git would sanely let me
manage the commits back to svn.  After fighting with it, I gave up and
just wrote a Python extension to hg that lets me commit non-svn
changesets back to svn directly from hg.

--Dan
* Re: Git and GCC
From: David Miller @ 2007-12-06  2:52 UTC
To: dberlin; +Cc: ismail, gcc, git

From: "Daniel Berlin" <dberlin@dberlin.org>
Date: Wed, 5 Dec 2007 21:41:19 -0500

> It is true I gave up quickly, but this is mainly because I don't
> like to fight with my tools.
> I am quite fine with a distributed workflow: I now use 8 or so gcc
> branches in Mercurial (auto-synced from svn) and merge a lot between
> them.  I wanted to see if git would sanely let me manage the commits
> back to svn.  After fighting with it, I gave up and just wrote a
> Python extension to hg that lets me commit non-svn changesets back
> to svn directly from hg.

I find it ironic that you were even willing to write tools to
facilitate your hg-based gcc workflow.

That really shows what your thinking is on this matter: you're willing
to put effort towards making hg work better for you, but you're not
willing to expend that level of effort to see if git can do so as
well.

This is what really eats me from the inside about your dissatisfaction
with git.  Your analysis seems to be a self-fulfilling prophecy, and
that's totally unfair to both hg and git.
* Re: Git and GCC
From: Daniel Berlin @ 2007-12-06  3:47 UTC
To: David Miller; +Cc: ismail, gcc, git

On 12/5/07, David Miller <davem@davemloft.net> wrote:
> [...]
> I find it ironic that you were even willing to write tools to
> facilitate your hg-based gcc workflow.

Why?

> That really shows what your thinking is on this matter: you're
> willing to put effort towards making hg work better for you, but
> you're not willing to expend that level of effort to see if git can
> do so as well.

See, now you claim to know my thinking.

I went back to hg because git's space usage wasn't even in the
ballpark, and I couldn't get git-svn rebase to update the revs after
the initial import (even though I had properly used a rewriteRoot).
The size is clearly not just svn data; it's in the git pack itself.

I spent a long time working on SVN to reduce its space usage (on the
repository side, cleaning up the client side, and giving a path to
svn devs to reduce it further), as well as on UI issues, and I really
don't feel like having to do the same for git.  I'm tired of having
to spend a large amount of effort to get my tools to work.

If the community wants to find and fix the problem, I've already said
repeatedly I'll happily give over my repo, data, whatever.  You are
correct that I am not going to spend even more effort when I can be
productive with something else much quicker.  The devil I know
(committing to svn) is better than the devil I don't (diving into git
source code and finding/fixing what is causing this space blowup).
The Python extension took me a few hours (< 4).  In git, I spent
those hours waiting for git-gc to finish.

> This is what really eats me from the inside about your
> dissatisfaction with git.  Your analysis seems to be a
> self-fulfilling prophecy, and that's totally unfair to both hg and
> git.

Oh?  You seem to be taking this awfully personally.

I came into this completely open-minded.  Really, I did (I'm sure
you'll claim otherwise).  Git people told me it would work great, and
that I'd have a really small git repo and be able to commit back to
svn.  I tried it.  It didn't work out.  It doesn't seem to be usable,
for whatever reason.  I'm happy to give details, data, whatever.

I made the engineering decision that my effort would be better spent
doing something I knew I could do quickly (make hg commit back to svn
for my purposes) than trying to improve larger issues in git (UI and
space usage).  That took me a few hours, and I was happy again.

I would have been incredibly happy to have git come up with a 400 meg
gcc repository, and to be happily committing away from git-svn to
gcc's repository ... But it didn't happen.

So far, you have yet to actually do anything but incorrectly tell me
what I am thinking.  I'll probably try again in 6 months, and maybe
it will be better.
* Re: Git and GCC
From: David Miller @ 2007-12-06  4:20 UTC
To: dberlin; +Cc: ismail, gcc, git

From: "Daniel Berlin" <dberlin@dberlin.org>
Date: Wed, 5 Dec 2007 22:47:01 -0500

> The size is clearly not just svn data; it's in the git pack itself.

And other users have shown much smaller metadata from a git import,
and yes, those are including all of the repository history and
branches, not just the trunk.
* Re: Git and GCC
From: Harvey Harrison @ 2007-12-06  4:28 UTC
To: David Miller; +Cc: dberlin, ismail, gcc, git

On Wed, 2007-12-05 at 20:20 -0800, David Miller wrote:
> [...]
> And other users have shown much smaller metadata from a git import,
> and yes, those are including all of the repository history and
> branches, not just the trunk.

David,

I think it is actually a bug in git gc with the --aggressive
option... Mind you, even if he solves that, the format git svn uses
for its bidirectional metadata is so space-inefficient that Daniel
will be crying for other reasons immediately afterwards: 4MB for
every branch and tag in gcc svn (more than a few thousand).  You only
need it around for any branches you are planning on committing to,
but it is all created during the default git svn import.

FYI,

Harvey
* Re: Git and GCC
From: Daniel Berlin @ 2007-12-06  4:32 UTC
To: David Miller; +Cc: ismail, gcc, git

On 12/5/07, David Miller <davem@davemloft.net> wrote:
> [...]
> And other users have shown much smaller metadata from a git import,
> and yes, those are including all of the repository history and
> branches, not just the trunk.

I followed the instructions in the tutorials.
I followed the instructions given to me by people who created these.
I came up with a 1.5 gig pack file.

Do you want to help, or do you want to argue with me?  Right now it
sounds like you are trying to blame me or make it look like I did
something wrong.

You are, of course, welcome to try it yourself.  I can give you the
absolute exact commands I used, and with git 1.5.3.7, it will give
you a 1.5 gig pack file.
* Re: Git and GCC
From: David Miller @ 2007-12-06  4:48 UTC
To: dberlin; +Cc: ismail, gcc, git

From: "Daniel Berlin" <dberlin@dberlin.org>
Date: Wed, 5 Dec 2007 23:32:52 -0500

> I followed the instructions in the tutorials.
> I followed the instructions given to me by people who created these.
> I came up with a 1.5 gig pack file.
> Do you want to help, or do you want to argue with me?

Several people replied in this thread showing what options can lead
to smaller pack files.

They also listed the git limitations that would affect the kind of
work you are doing, which seemed mostly to deal with the high space
cost of branching and tags when converting to/from SVN repos.
* Re: Git and GCC
From: Daniel Berlin @ 2007-12-06  5:11 UTC
To: David Miller; +Cc: ismail, gcc, git

On 12/5/07, David Miller <davem@davemloft.net> wrote:
> [...]
> Several people replied in this thread showing what options can lead
> to smaller pack files.

Actually, one person did, but that's okay, let's assume it was
several.  I am currently trying Harvey's options.

I asked about using the pre-existing repos so I didn't have to do
this, but they were all:
1. Done using read-only imports, or
2. Don't contain full history.
(i.e. the one that contains full history that is often posted here
was done as a read-only import and thus doesn't have the metadata.)

> They also listed the git limitations that would affect the kind of
> work you are doing, which seemed mostly to deal with the high space
> cost of branching and tags when converting to/from SVN repos.

Actually, it turns out that git-gc --aggressive does this dumb thing
to pack files sometimes, regardless of whether you converted from an
SVN repo or not.
* Re: Git and GCC
From: Harvey Harrison @ 2007-12-06  5:15 UTC
To: Daniel Berlin; +Cc: David Miller, ismail, gcc, git

On Thu, 2007-12-06 at 00:11 -0500, Daniel Berlin wrote:
> [...]
> I asked about using the pre-existing repos so I didn't have to do
> this, but they were all:
> 1. Done using read-only imports, or
> 2. Don't contain full history.
> (i.e. the one that contains full history that is often posted here
> was done as a read-only import and thus doesn't have the metadata.)

While you won't get the git svn metadata if you clone the infradead
repo, it can be recreated on the fly by git svn if you want to start
committing directly to gcc svn.

Harvey
* Re: Git and GCC
From: Daniel Berlin @ 2007-12-06  5:17 UTC
To: Harvey Harrison; +Cc: David Miller, ismail, gcc, git

> While you won't get the git svn metadata if you clone the infradead
> repo, it can be recreated on the fly by git svn if you want to start
> committing directly to gcc svn.

I will give this a try :)
* Re: Git and GCC
From: Jon Smirl @ 2007-12-06  6:47 UTC
To: Daniel Berlin; +Cc: Harvey Harrison, David Miller, ismail, gcc, git

On 12/6/07, Daniel Berlin <dberlin@dberlin.org> wrote:
> > While you won't get the git svn metadata if you clone the
> > infradead repo, it can be recreated on the fly by git svn if you
> > want to start committing directly to gcc svn.
>
> I will give this a try :)

Back when I was working on the Mozilla repository, we were able to
convert the full 4GB CVS repository, complete with all history, into
a 450MB pack file.  That work is where the git fast-import tool came
from.  But it took a month of messing with the import tools to
achieve this, and Mozilla still chose another VCS (mainly because of
poor Windows support in git).

Like Linus says, this type of command will yield the smallest pack
file:

  git repack -a -d --depth=250 --window=250

I do agree that importing multi-gigabyte repositories is not a daily
occurrence nor a turn-key operation.  There are significant issues
when translating from one VCS to another.  The lack of global branch
tracking in CVS causes extreme problems on import.  Hand editing of
CVS files also caused endless trouble.

The key to converting repositories of this size is RAM: 4GB minimum,
more would be better.  git-repack is not multi-threaded.  There were
a few attempts at making it multi-threaded, but none were too
successful.  If I remember right, with loads of RAM, a repack of a
450MB repository was taking about five hours on a 2.8GHz Core 2.  But
this is something you only have to do once for the import.  Later
repacks will reuse the original deltas.

-- 
Jon Smirl
jonsmirl@gmail.com
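The repack invocation Jon recommends can be exercised end-to-end on a
throwaway repository before committing hours to a full gcc import.  A
minimal sketch, assuming only that a reasonably modern git is on PATH
(the repository contents and paths below are made up for
illustration; on a real import you would run the repack inside the
converted clone instead):

```shell
# Sketch: try the wide-window, deep-chain repack from the thread on a
# tiny scratch repository.  The temp dir, file name, and commit
# contents are all hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# A few similar commits so the repack has something to delta against.
for i in 1 2 3; do
  seq 1 200 > file.txt
  echo "rev $i" >> file.txt
  git add file.txt
  git -c user.name=t -c user.email=t@example.com commit -qm "rev $i"
done

# The settings from the thread: --window widens the set of candidate
# objects considered when searching for a delta base, --depth allows
# longer delta chains; both trade repack CPU time for a smaller pack.
git repack -a -d --depth=250 --window=250
ls .git/objects/pack/
```

On a repository the size of gcc's history this is exactly the step
that takes hours and gigabytes of RAM, which is why the thread treats
it as a one-time import cost.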
* Re: Git and GCC
From: Jeff King @ 2007-12-06  7:15 UTC
To: Jon Smirl; +Cc: Daniel Berlin, Harvey Harrison, David Miller,
    ismail, gcc, git

On Thu, Dec 06, 2007 at 01:47:54AM -0500, Jon Smirl wrote:
> The key to converting repositories of this size is RAM: 4GB
> minimum, more would be better.  git-repack is not multi-threaded.
> There were a few attempts at making it multi-threaded, but none
> were too successful.  If I remember right, with loads of RAM, a
> repack of a 450MB repository was taking about five hours on a
> 2.8GHz Core 2.  But this is something you only have to do once for
> the import.  Later repacks will reuse the original deltas.

Actually, Nicolas put quite a bit of work into multi-threading the
repack process; the results have been in master for some time, and
will be in the soon-to-be-released v1.5.4.

The downside is that the threading partitions the object space, so
the resulting size is not necessarily as small (but I don't know that
anybody has done testing on large repos to find out how large the
difference is).

-Peff
* Re: Git and GCC
From: Nicolas Pitre @ 2007-12-06 14:18 UTC
To: Jeff King; +Cc: Jon Smirl, Daniel Berlin, Harvey Harrison,
    David Miller, ismail, gcc, git

On Thu, 6 Dec 2007, Jeff King wrote:
> The downside is that the threading partitions the object space, so
> the resulting size is not necessarily as small (but I don't know
> that anybody has done testing on large repos to find out how large
> the difference is).

Quick guesstimate is in the 1% ballpark.

Nicolas
* Re: Git and GCC
From: Jeff King @ 2007-12-06 17:39 UTC
To: Nicolas Pitre; +Cc: Jon Smirl, Daniel Berlin, Harvey Harrison,
    David Miller, ismail, gcc, git

On Thu, Dec 06, 2007 at 09:18:39AM -0500, Nicolas Pitre wrote:
> > The downside is that the threading partitions the object space,
> > so the resulting size is not necessarily as small (but I don't
> > know that anybody has done testing on large repos to find out how
> > large the difference is).
>
> Quick guesstimate is in the 1% ballpark.

Fortunately, we now have numbers.  Harvey Harrison reported repacking
the gcc repo and getting these results:

> /usr/bin/time git repack -a -d -f --window=250 --depth=250
>
> 23266.37user 581.04system 7:41:25elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (419835major+123275804minor)pagefaults 0swaps
>
> -r--r--r-- 1 hharrison hharrison  29091872 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.idx
> -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack

I tried the threaded repack with pack.threads = 3 on a dual-processor
machine, and got:

  time git repack -a -d -f --window=250 --depth=250

  real    309m59.849s
  user    377m43.948s
  sys       8m23.319s

  -r--r--r-- 1 peff peff  28570088 2007-12-06 10:11 pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.idx
  -r--r--r-- 1 peff peff 339922573 2007-12-06 10:11 pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.pack

So it is about 5% bigger.  What is really disappointing is that we
saved only about 20% of the time.  I didn't sit around watching the
stages, but my guess is that we spent a long time in the
single-threaded "writing objects" stage with a thrashing delta cache.

-Peff
* Re: Git and GCC
From: Nicolas Pitre @ 2007-12-06 18:02 UTC
To: Jeff King; +Cc: Jon Smirl, Daniel Berlin, Harvey Harrison,
    David Miller, ismail, gcc, git

On Thu, 6 Dec 2007, Jeff King wrote:
> [...]
> I tried the threaded repack with pack.threads = 3 on a
> dual-processor machine, and got:
>
>   time git repack -a -d -f --window=250 --depth=250
>
>   real    309m59.849s
>   user    377m43.948s
>   sys       8m23.319s
>
> So it is about 5% bigger.

Right.  I should probably revisit that idea of finding deltas across
partition boundaries to mitigate that loss.

And those partitions could be made coarser as well, to reduce the
number of such partition gaps (just increase the value of chunk_size
on line 1648 in builtin-pack-objects.c).

> What is really disappointing is that we saved only about 20% of the
> time.  I didn't sit around watching the stages, but my guess is
> that we spent a long time in the single-threaded "writing objects"
> stage with a thrashing delta cache.

Maybe you should run the non-threaded repack on the same machine to
have a good comparison.

And if you have only 2 CPUs, you will have better performance with
pack.threads = 2; otherwise there'll be wasteful task switching going
on.

And of course, if the delta cache is being thrashed, that might be
due to the way the existing pack was previously packed.  Hence the
current pack might impact object _access_ when repacking.  So for a
really, really fair performance comparison, you'd have to preserve
the original pack and swap it back before each repack attempt.

Nicolas
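The knobs Nicolas and Jeff are tuning on the command line can also be
made persistent per repository, so that plain `git repack` or `git
gc` picks them up.  A small sketch, assuming git is on PATH (the
scratch repository and the values themselves are illustrative; the
thread's advice is to match pack.threads to the actual CPU count):

```shell
# Sketch: persist the pack-generation settings discussed in the
# thread as repository config.  Values here are examples only.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

git config pack.threads 2    # one delta-search thread per CPU core
git config pack.window 250   # wide delta search window
git config pack.depth 250    # allow deep delta chains

git config --get pack.threads
```

With these set, subsequent repacks in that clone use the threaded,
wide-window search without the long option strings quoted earlier in
the thread.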
* Re: Git and GCC
From: Jeff King @ 2007-12-07  6:50 UTC
To: Nicolas Pitre; +Cc: Jon Smirl, Daniel Berlin, Harvey Harrison,
    David Miller, ismail, gcc, git

On Thu, Dec 06, 2007 at 01:02:58PM -0500, Nicolas Pitre wrote:
> > What is really disappointing is that we saved only about 20% of
> > the time.  I didn't sit around watching the stages, but my guess
> > is that we spent a long time in the single-threaded "writing
> > objects" stage with a thrashing delta cache.
>
> Maybe you should run the non-threaded repack on the same machine to
> have a good comparison.

Sorry, I should have been more clear.  By "saved" I meant "we needed
N minutes of CPU time, but took only M minutes of real time to use
it."  IOW, if we assume that the threading had zero overhead and that
we were completely CPU-bound, then the task would have taken N
minutes of real time.  Obviously those assumptions aren't true, but I
was attempting to say "it would have been at most N minutes of real
time to do it single-threaded."

> And if you have only 2 CPUs, you will have better performance with
> pack.threads = 2; otherwise there'll be wasteful task switching
> going on.

Yes, but that is balanced by one thread running out of data way
earlier than the other, and completing the task with only one CPU.  I
am doing a 4-thread test on a quad-CPU right now, and I will also try
it with threads=1 and threads=6 for comparison.

> And of course, if the delta cache is being thrashed, that might be
> due to the way the existing pack was previously packed.  Hence the
> current pack might impact object _access_ when repacking.  So for a
> really, really fair performance comparison, you'd have to preserve
> the original pack and swap it back before each repack attempt.

I am working each time from the pack generated by fetching from
git://git.infradead.org/gcc.git.

-Peff
* Re: Git and GCC
From: Jeff King @ 2007-12-07  7:27 UTC
To: Nicolas Pitre; +Cc: Jon Smirl, Daniel Berlin, Harvey Harrison,
    David Miller, ismail, gcc, git

On Fri, Dec 07, 2007 at 01:50:47AM -0500, Jeff King wrote:
> Yes, but that is balanced by one thread running out of data way
> earlier than the other, and completing the task with only one CPU.
> I am doing a 4-thread test on a quad-CPU right now, and I will also
> try it with threads=1 and threads=6 for comparison.

Hmm.  As this has been running, I read the rest of the thread, and it
looks like Jon Smirl has already posted the interesting numbers.  So
never mind, unless there is something particular you would like to
see.

-Peff
* Re: Git and GCC 2007-12-06 17:39 ` Jeff King 2007-12-06 18:02 ` Nicolas Pitre @ 2007-12-06 18:35 ` Linus Torvalds 2007-12-06 18:55 ` Jon Smirl ` (2 more replies) 2007-12-07 3:31 ` David Miller 2 siblings, 3 replies; 90+ messages in thread From: Linus Torvalds @ 2007-12-06 18:35 UTC (permalink / raw) To: Jeff King Cc: Nicolas Pitre, Jon Smirl, Daniel Berlin, Harvey Harrison, David Miller, ismail, gcc, git On Thu, 6 Dec 2007, Jeff King wrote: > > What is really disappointing is that we saved only about 20% of the > time. I didn't sit around watching the stages, but my guess is that we > spent a long time in the single threaded "writing objects" stage with a > thrashing delta cache. I don't think you spent all that much time writing the objects. That part isn't very intensive, it's mostly about the IO. I suspect you may simply be dominated by memory-throughput issues. The delta matching doesn't cache all that well, and using two or more cores isn't going to help all that much if they are largely waiting for memory (and quite possibly also perhaps fighting each other for a shared cache? Is this a Core 2 with the shared L2?) Linus ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 18:35 ` Linus Torvalds @ 2007-12-06 18:55 ` Jon Smirl 2007-12-06 19:08 ` Nicolas Pitre 2007-12-07 7:31 ` Jeff King 2007-12-08 0:47 ` Harvey Harrison 2 siblings, 1 reply; 90+ messages in thread From: Jon Smirl @ 2007-12-06 18:55 UTC (permalink / raw) To: Linus Torvalds Cc: Jeff King, Nicolas Pitre, Daniel Berlin, Harvey Harrison, David Miller, ismail, gcc, git On 12/6/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Thu, 6 Dec 2007, Jeff King wrote: > > > > What is really disappointing is that we saved only about 20% of the > > time. I didn't sit around watching the stages, but my guess is that we > > spent a long time in the single threaded "writing objects" stage with a > > thrashing delta cache. > > I don't think you spent all that much time writing the objects. That part > isn't very intensive, it's mostly about the IO. > > I suspect you may simply be dominated by memory-throughput issues. The > delta matching doesn't cache all that well, and using two or more cores > isn't going to help all that much if they are largely waiting for memory > (and quite possibly also perhaps fighting each other for a shared cache? > Is this a Core 2 with the shared L2?) When I last looked at the code, the problem was in evenly dividing the work. I was using a four core machine and most of the time one core would end up with 3-5x the work of the lightest loaded core. Setting pack.threads up to 20 fixed the problem. With a high number of threads I was able to get a 4hr pack to finish in something like 1:15. A scheme where each core could work a minute without communicating to the other cores would be best. It would also be more efficient if the cores could avoid having sync points between them. -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 90+ messages in thread
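[Editor's note: the oversubscription workaround Jon describes is a one-line config change. A minimal sketch follows; `pack.threads` is the real git config key from the thread, but the throwaway repository and the value 20 are purely illustrative.]

```shell
# Sketch: set pack.threads in a disposable repo and read it back.
# The value 20 mirrors Jon's experiment (oversubscribing threads to
# smooth out the load imbalance); it is not a tuned recommendation.
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config pack.threads 20
git -C "$repo" config pack.threads   # prints: 20
```

Subsequent `git repack` runs in that repository would then use 20 delta-search threads (in a threaded build of git).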
* Re: Git and GCC 2007-12-06 18:55 ` Jon Smirl @ 2007-12-06 19:08 ` Nicolas Pitre 2007-12-06 21:39 ` Jon Smirl 0 siblings, 1 reply; 90+ messages in thread From: Nicolas Pitre @ 2007-12-06 19:08 UTC (permalink / raw) To: Jon Smirl Cc: Linus Torvalds, Jeff King, Daniel Berlin, Harvey Harrison, David Miller, ismail, gcc, git On Thu, 6 Dec 2007, Jon Smirl wrote: > On 12/6/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > > > On Thu, 6 Dec 2007, Jeff King wrote: > > > > > > What is really disappointing is that we saved only about 20% of the > > > time. I didn't sit around watching the stages, but my guess is that we > > > spent a long time in the single threaded "writing objects" stage with a > > > thrashing delta cache. > > > > I don't think you spent all that much time writing the objects. That part > > isn't very intensive, it's mostly about the IO. > > > > I suspect you may simply be dominated by memory-throughput issues. The > > delta matching doesn't cache all that well, and using two or more cores > > isn't going to help all that much if they are largely waiting for memory > > (and quite possibly also perhaps fighting each other for a shared cache? > > Is this a Core 2 with the shared L2?) > > When I lasted looked at the code, the problem was in evenly dividing > the work. I was using a four core machine and most of the time one > core would end up with 3-5x the work of the lightest loaded core. > Setting pack.threads up to 20 fixed the problem. With a high number of > threads I was able to get a 4hr pack to finished in something like > 1:15. But as far as I know you didn't try my latest incarnation which has been available in Git's master branch for a few months already. Nicolas ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 19:08 ` Nicolas Pitre @ 2007-12-06 21:39 ` Jon Smirl 2007-12-06 22:08 ` Nicolas Pitre 0 siblings, 1 reply; 90+ messages in thread From: Jon Smirl @ 2007-12-06 21:39 UTC (permalink / raw) To: Nicolas Pitre Cc: Linus Torvalds, Jeff King, Daniel Berlin, Harvey Harrison, David Miller, ismail, gcc, git On 12/6/07, Nicolas Pitre <nico@cam.org> wrote: > > When I lasted looked at the code, the problem was in evenly dividing > > the work. I was using a four core machine and most of the time one > > core would end up with 3-5x the work of the lightest loaded core. > > Setting pack.threads up to 20 fixed the problem. With a high number of > > threads I was able to get a 4hr pack to finished in something like > > 1:15. > > But as far as I know you didn't try my latest incarnation which has been > available in Git's master branch for a few months already. I've deleted all my giant packs. Using the kernel pack: 4GB Q6600 Using the current thread pack code I get these results. The interesting case is the last one. I set it to 15 threads and monitored with 'top'. For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and 74-100% was 100% CPU. It never used all four cores. The only other things running were top and my desktop. This is the same load balancing problem I observed earlier. Much more clock time was spent in the 2/1 core phases than the 3 core one. Threaded, threads = 5 jonsmirl@terra:/home/linux$ time git repack -a -d -f Counting objects: 648366, done. Compressing objects: 100% (647457/647457), done. Writing objects: 100% (648366/648366), done. Total 648366 (delta 528994), reused 0 (delta 0) real 1m31.395s user 2m59.239s sys 0m3.048s jonsmirl@terra:/home/linux$ 12 seconds counting 53 seconds compressing 38 seconds writing Without threads, jonsmirl@terra:/home/linux$ time git repack -a -d -f warning: no threads support, ignoring pack.threads Counting objects: 648366, done. Compressing objects: 100% (647457/647457), done.
Writing objects: 100% (648366/648366), done. Total 648366 (delta 528999), reused 0 (delta 0) real 2m54.849s user 2m51.267s sys 0m1.412s jonsmirl@terra:/home/linux$ Threaded, threads = 5 jonsmirl@terra:/home/linux$ time git repack -a -d -f --depth=250 --window=250 Counting objects: 648366, done. Compressing objects: 100% (647457/647457), done. Writing objects: 100% (648366/648366), done. Total 648366 (delta 539080), reused 0 (delta 0) real 9m18.032s user 19m7.484s sys 0m3.880s jonsmirl@terra:/home/linux$ jonsmirl@terra:/home/linux/.git/objects/pack$ ls -l total 182156 -r--r--r-- 1 jonsmirl jonsmirl 15561848 2007-12-06 16:15 pack-f1f8637d2c68eb1c964ec7c1877196c0c7513412.idx -r--r--r-- 1 jonsmirl jonsmirl 170768761 2007-12-06 16:15 pack-f1f8637d2c68eb1c964ec7c1877196c0c7513412.pack jonsmirl@terra:/home/linux/.git/objects/pack$ Non-threaded: jonsmirl@terra:/home/linux$ time git repack -a -d -f --depth=250 --window=250 warning: no threads support, ignoring pack.threads Counting objects: 648366, done. Compressing objects: 100% (647457/647457), done. Writing objects: 100% (648366/648366), done. Total 648366 (delta 539080), reused 0 (delta 0) real 18m51.183s user 18m46.538s sys 0m1.604s jonsmirl@terra:/home/linux$ jonsmirl@terra:/home/linux/.git/objects/pack$ ls -l total 182156 -r--r--r-- 1 jonsmirl jonsmirl 15561848 2007-12-06 15:33 pack-f1f8637d2c68eb1c964ec7c1877196c0c7513412.idx -r--r--r-- 1 jonsmirl jonsmirl 170768761 2007-12-06 15:33 pack-f1f8637d2c68eb1c964ec7c1877196c0c7513412.pack jonsmirl@terra:/home/linux/.git/objects/pack$ Threaded, threads = 15 jonsmirl@terra:/home/linux$ time git repack -a -d -f --depth=250 --window=250 Counting objects: 648366, done. Compressing objects: 100% (647457/647457), done. Writing objects: 100% (648366/648366), done. Total 648366 (delta 539080), reused 0 (delta 0) real 9m18.325s user 19m14.340s sys 0m3.996s jonsmirl@terra:/home/linux$ -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 90+ messages in thread
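[Editor's note: the timings above work out to almost exactly a 2x wall-clock speedup for the big-window repack. A quick sanity check, with the seconds transcribed from Jon's output above:]

```shell
# 18m51s unthreaded vs 9m18s with pack.threads=5 for the
# --depth=250 --window=250 repack (values copied from the transcript).
single=1131    # 18*60 + 51
threaded=558   # 9*60 + 18
echo $(( single * 100 / threaded ))   # prints: 202  (i.e. ~2.02x)
```

Well short of 4x on four cores, which is the load-balancing problem discussed in the surrounding messages.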
* Re: Git and GCC 2007-12-06 21:39 ` Jon Smirl @ 2007-12-06 22:08 ` Nicolas Pitre 2007-12-06 22:11 ` Jon Smirl 2007-12-06 22:22 ` Jon Smirl 0 siblings, 2 replies; 90+ messages in thread From: Nicolas Pitre @ 2007-12-06 22:08 UTC (permalink / raw) To: Jon Smirl Cc: Linus Torvalds, Jeff King, Daniel Berlin, Harvey Harrison, David Miller, ismail, gcc, git On Thu, 6 Dec 2007, Jon Smirl wrote: > On 12/6/07, Nicolas Pitre <nico@cam.org> wrote: > > > When I lasted looked at the code, the problem was in evenly dividing > > > the work. I was using a four core machine and most of the time one > > > core would end up with 3-5x the work of the lightest loaded core. > > > Setting pack.threads up to 20 fixed the problem. With a high number of > > > threads I was able to get a 4hr pack to finished in something like > > > 1:15. > > > > But as far as I know you didn't try my latest incarnation which has been > > available in Git's master branch for a few months already. > > I've deleted all my giant packs. Using the kernel pack: > 4GB Q6600 > > Using the current thread pack code I get these results. > > The interesting case is the last one. I set it to 15 threads and > monitored with 'top'. > For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and > 74-100% was 100% CPU. It never used all for cores. The only other > things running were top and my desktop. This is the same load > balancing problem I observed earlier. Well, that's possible with a window 25 times larger than the default. The load balancing is solved with a master thread serving relatively small object list segments to any work thread that finished with its previous segment. But the size for those segments is currently fixed to window * 1000 which is way too large when window == 250. I have to find a way to auto-tune that segment size somehow. But with the default window size there should not be any such noticeable load balancing problem. Note that threading only happens in the compression phase. 
The count and write phases are hardly parallelized. Nicolas ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 22:08 ` Nicolas Pitre @ 2007-12-06 22:11 ` Jon Smirl 2007-12-06 22:22 ` Jon Smirl 1 sibling, 0 replies; 90+ messages in thread From: Jon Smirl @ 2007-12-06 22:11 UTC (permalink / raw) To: Nicolas Pitre Cc: Linus Torvalds, Jeff King, Daniel Berlin, Harvey Harrison, David Miller, ismail, gcc, git On 12/6/07, Nicolas Pitre <nico@cam.org> wrote: > On Thu, 6 Dec 2007, Jon Smirl wrote: > > > On 12/6/07, Nicolas Pitre <nico@cam.org> wrote: > > > > When I lasted looked at the code, the problem was in evenly dividing > > > > the work. I was using a four core machine and most of the time one > > > > core would end up with 3-5x the work of the lightest loaded core. > > > > Setting pack.threads up to 20 fixed the problem. With a high number of > > > > threads I was able to get a 4hr pack to finished in something like > > > > 1:15. > > > > > > But as far as I know you didn't try my latest incarnation which has been > > > available in Git's master branch for a few months already. > > > > I've deleted all my giant packs. Using the kernel pack: > > 4GB Q6600 > > > > Using the current thread pack code I get these results. > > > > The interesting case is the last one. I set it to 15 threads and > > monitored with 'top'. > > For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and > > 74-100% was 100% CPU. It never used all for cores. The only other > > things running were top and my desktop. This is the same load > > balancing problem I observed earlier. > > Well, that's possible with a window 25 times larger than the default. > > The load balancing is solved with a master thread serving relatively > small object list segments to any work thread that finished with its > previous segment. But the size for those segments is currently fixed to > window * 1000 which is way too large when window == 250. > > I have to find a way to auto-tune that segment size somehow. That would be nice. 
Threading is most important on the giant pack/window combinations. The normal case is fast enough that I don't really notice it. These giant pack/window combos can run 8-10 hours. > > But with the default window size there should not be any such noticeable > load balancing problem. I only spend 30 seconds in the compression phase without making the window larger. It's not long enough to really see what is going on. > > Note that threading only happens in the compression phase. The count > and write phase are hardly paralleled. > > > Nicolas > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 22:08 ` Nicolas Pitre 2007-12-06 22:11 ` Jon Smirl @ 2007-12-06 22:22 ` Jon Smirl 2007-12-06 22:30 ` Nicolas Pitre 1 sibling, 1 reply; 90+ messages in thread From: Jon Smirl @ 2007-12-06 22:22 UTC (permalink / raw) To: Nicolas Pitre Cc: Linus Torvalds, Jeff King, Daniel Berlin, Harvey Harrison, David Miller, ismail, gcc, git On 12/6/07, Nicolas Pitre <nico@cam.org> wrote: > On Thu, 6 Dec 2007, Jon Smirl wrote: > > > On 12/6/07, Nicolas Pitre <nico@cam.org> wrote: > > > > When I lasted looked at the code, the problem was in evenly dividing > > > > the work. I was using a four core machine and most of the time one > > > > core would end up with 3-5x the work of the lightest loaded core. > > > > Setting pack.threads up to 20 fixed the problem. With a high number of > > > > threads I was able to get a 4hr pack to finished in something like > > > > 1:15. > > > > > > But as far as I know you didn't try my latest incarnation which has been > > > available in Git's master branch for a few months already. > > > > I've deleted all my giant packs. Using the kernel pack: > > 4GB Q6600 > > > > Using the current thread pack code I get these results. > > > > The interesting case is the last one. I set it to 15 threads and > > monitored with 'top'. > > For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and > > 74-100% was 100% CPU. It never used all for cores. The only other > > things running were top and my desktop. This is the same load > > balancing problem I observed earlier. > > Well, that's possible with a window 25 times larger than the default. Why did it never use more than three cores? > > The load balancing is solved with a master thread serving relatively > small object list segments to any work thread that finished with its > previous segment. But the size for those segments is currently fixed to > window * 1000 which is way too large when window == 250. > > I have to find a way to auto-tune that segment size somehow. 
> > But with the default window size there should not be any such noticeable > load balancing problem. > > Note that threading only happens in the compression phase. The count > and write phase are hardly paralleled. > > > Nicolas > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 22:22 ` Jon Smirl @ 2007-12-06 22:30 ` Nicolas Pitre 2007-12-06 22:44 ` Jon Smirl 0 siblings, 1 reply; 90+ messages in thread From: Nicolas Pitre @ 2007-12-06 22:30 UTC (permalink / raw) To: Jon Smirl Cc: Linus Torvalds, Jeff King, Daniel Berlin, Harvey Harrison, David Miller, ismail, gcc, git On Thu, 6 Dec 2007, Jon Smirl wrote: > On 12/6/07, Nicolas Pitre <nico@cam.org> wrote: > > On Thu, 6 Dec 2007, Jon Smirl wrote: > > > > > On 12/6/07, Nicolas Pitre <nico@cam.org> wrote: > > > > > When I lasted looked at the code, the problem was in evenly dividing > > > > > the work. I was using a four core machine and most of the time one > > > > > core would end up with 3-5x the work of the lightest loaded core. > > > > > Setting pack.threads up to 20 fixed the problem. With a high number of > > > > > threads I was able to get a 4hr pack to finished in something like > > > > > 1:15. > > > > > > > > But as far as I know you didn't try my latest incarnation which has been > > > > available in Git's master branch for a few months already. > > > > > > I've deleted all my giant packs. Using the kernel pack: > > > 4GB Q6600 > > > > > > Using the current thread pack code I get these results. > > > > > > The interesting case is the last one. I set it to 15 threads and > > > monitored with 'top'. > > > For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and > > > 74-100% was 100% CPU. It never used all for cores. The only other > > > things running were top and my desktop. This is the same load > > > balancing problem I observed earlier. > > > > Well, that's possible with a window 25 times larger than the default. > > Why did it never use more than three cores? You have 648366 objects total, and only 647457 of them are subject to delta compression. With a window size of 250 and a default thread segment of window * 1000 that means only 3 segments will be distributed to threads, hence only 3 threads with work to do. 
Nicolas ^ permalink raw reply [flat|nested] 90+ messages in thread
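[Editor's note: the arithmetic Nicolas describes can be checked directly. The object count and window size are taken from the thread; the ceiling division below is an assumed model of the segment carving in builtin-pack-objects.c, matching his description.]

```shell
# With chunk_size fixed at window * 1000, a window of 250 gives
# 250000 objects per segment -- so ~647k deltifiable objects split
# into only 3 segments, i.e. only 3 threads ever have work.
objects=647457                 # deltifiable objects in Jon's kernel repack
window=250
chunk=$(( window * 1000 ))
echo $(( (objects + chunk - 1) / chunk ))   # prints: 3
```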
* Re: Git and GCC 2007-12-06 22:30 ` Nicolas Pitre @ 2007-12-06 22:44 ` Jon Smirl 0 siblings, 0 replies; 90+ messages in thread From: Jon Smirl @ 2007-12-06 22:44 UTC (permalink / raw) To: Nicolas Pitre Cc: Linus Torvalds, Jeff King, Daniel Berlin, Harvey Harrison, David Miller, ismail, gcc, git On 12/6/07, Nicolas Pitre <nico@cam.org> wrote: > > > Well, that's possible with a window 25 times larger than the default. > > > > Why did it never use more than three cores? > > You have 648366 objects total, and only 647457 of them are subject to > delta compression. > > With a window size of 250 and a default thread segment of window * 1000 > that means only 3 segments will be distributed to threads, hence only 3 > threads with work to do. One little tweak and the clock time drops from 9.5 to 6 minutes. The tweak makes all four cores work. jonsmirl@terra:/home/apps/git$ git diff diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 4f44658..e0dd12e 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1645,7 +1645,7 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size, } /* this should be auto-tuned somehow */ - chunk_size = window * 1000; + chunk_size = window * 50; do { unsigned sublist_size = chunk_size; jonsmirl@terra:/home/linux/.git$ time git repack -a -d -f --depth=250 --window=250 Counting objects: 648366, done. Compressing objects: 100% (647457/647457), done. Writing objects: 100% (648366/648366), done. Total 648366 (delta 539043), reused 0 (delta 0) real 6m2.109s user 20m0.491s sys 0m4.608s jonsmirl@terra:/home/linux/.git$ > > > Nicolas > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply related [flat|nested] 90+ messages in thread
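[Editor's note: the same ceiling-division model shows why Jon's tweak helps: with chunk_size = window * 50 the object list splits into dozens of segments, plenty to keep all four cores fed. Numbers from the thread; the division is an assumed model of the segment carving, not code from git itself.]

```shell
# Jon's tweaked segment size: 250 * 50 = 12500 objects per segment.
objects=647457
window=250
chunk=$(( window * 50 ))
echo $(( (objects + chunk - 1) / chunk ))   # prints: 52  (segments)
```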
* Re: Git and GCC 2007-12-06 18:35 ` Linus Torvalds 2007-12-06 18:55 ` Jon Smirl @ 2007-12-07 7:31 ` Jeff King 2007-12-08 0:47 ` Harvey Harrison 2 siblings, 0 replies; 90+ messages in thread From: Jeff King @ 2007-12-07 7:31 UTC (permalink / raw) To: Linus Torvalds Cc: Nicolas Pitre, Jon Smirl, Daniel Berlin, Harvey Harrison, David Miller, ismail, gcc, git On Thu, Dec 06, 2007 at 10:35:22AM -0800, Linus Torvalds wrote: > > What is really disappointing is that we saved only about 20% of the > > time. I didn't sit around watching the stages, but my guess is that we > > spent a long time in the single threaded "writing objects" stage with a > > thrashing delta cache. > > I don't think you spent all that much time writing the objects. That part > isn't very intensive, it's mostly about the IO. It can get nasty with super-long deltas thrashing the cache, I think. But in this case, I think it ended up being just a poor division of labor caused by the chunk_size parameter using the quite large window size (see elsewhere in the thread for discussion). > I suspect you may simply be dominated by memory-throughput issues. The > delta matching doesn't cache all that well, and using two or more cores > isn't going to help all that much if they are largely waiting for memory > (and quite possibly also perhaps fighting each other for a shared cache? > Is this a Core 2 with the shared L2?) I think the chunk_size more or less explains it. I have had reasonable success keeping both CPUs busy on similar tasks in the past (but with smaller window sizes). For reference, it was a Core 2 Duo; do they all share L2, or is there something I can look for in /proc/cpuinfo? -Peff ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 18:35 ` Linus Torvalds 2007-12-06 18:55 ` Jon Smirl 2007-12-07 7:31 ` Jeff King @ 2007-12-08 0:47 ` Harvey Harrison 2007-12-10 9:54 ` Gabriel Paubert 2 siblings, 1 reply; 90+ messages in thread From: Harvey Harrison @ 2007-12-08 0:47 UTC (permalink / raw) To: Linus Torvalds Cc: Jeff King, Nicolas Pitre, Jon Smirl, Daniel Berlin, David Miller, ismail, gcc, git Some interesting stats from the highly packed gcc repo. The long chain lengths very quickly tail off. Over 60% of the objects have a chain length of 20 or less. If anyone wants the full list let me know. I also have included a few other interesting points, the git default depth of 50, my initial guess of 100 and every 10% in the cumulative distribution from 60-100%. This shows the git default of 50 really isn't that bad, and after about 100 it really starts to get sparse. Harvey 1: 103817 103817 10.20% 1017922 2: 67332 171149 16.81% 3: 57520 228669 22.46% 4: 52570 281239 27.63% 5: 43910 325149 31.94% 6: 37520 362669 35.63% 7: 35248 397917 39.09% 8: 29819 427736 42.02% 9: 27619 455355 44.73% 10: 22656 478011 46.96% 11: 21073 499084 49.03% 12: 18738 517822 50.87% 13: 16674 534496 52.51% 14: 14882 549378 53.97% 15: 14424 563802 55.39% 16: 12765 576567 56.64% 17: 11662 588229 57.79% 18: 11845 600074 58.95% 19: 11694 611768 60.10% 20: 9625 621393 61.05% 34: 5354 719356 70.67% 50: 3395 785342 77.15% 60: 2547 815072 80.07% 100: 1644 898284 88.25% 113: 1292 917046 90.09% 158: 959 967429 95.04% 200: 652 997653 98.01% 219: 491 1008132 99.04% 245: 179 1017717 99.98% 246: 111 1017828 99.99% 247: 61 1017889 100.00% 248: 27 1017916 100.00% 249: 6 1017922 100.00% ^ permalink raw reply [flat|nested] 90+ messages in thread
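[Editor's note: a sketch of how a depth histogram like Harvey's can be produced. `git verify-pack -v` prints one line per object, with the delta depth and base appended for deltified objects (so delta lines carry extra fields -- format assumed from the git documentation). The input below is fabricated sample output, not real gcc data.]

```shell
# Tally objects by delta depth from verify-pack-style lines.
# Line format: sha1 type size size-in-pack offset [depth base-sha1]
cat > sample.txt <<'EOF'
aaaa blob 100 60 12 1 bbbb
bbbb blob 120 80 72
cccc blob 90 50 152 2 aaaa
dddd blob 95 55 202 1 bbbb
EOF
awk 'NF >= 7 { depth[$6]++ } END { for (d in depth) print d, depth[d] }' \
    sample.txt | sort -n
# prints:
# 1 2
# 2 1
```

On a real repository you would pipe `git verify-pack -v .git/objects/pack/pack-*.idx` into the same awk, and a cumulative column like Harvey's falls out of a running sum.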
* Re: Git and GCC 2007-12-08 0:47 ` Harvey Harrison @ 2007-12-10 9:54 ` Gabriel Paubert 2007-12-10 15:35 ` Nicolas Pitre 0 siblings, 1 reply; 90+ messages in thread From: Gabriel Paubert @ 2007-12-10 9:54 UTC (permalink / raw) To: Harvey Harrison Cc: Linus Torvalds, Jeff King, Nicolas Pitre, Jon Smirl, Daniel Berlin, David Miller, ismail, gcc, git On Fri, Dec 07, 2007 at 04:47:19PM -0800, Harvey Harrison wrote: > Some interesting stats from the highly packed gcc repo. The long chain > lengths very quickly tail off. Over 60% of the objects have a chain > length of 20 or less. If anyone wants the full list let me know. I > also have included a few other interesting points, the git default > depth of 50, my initial guess of 100 and every 10% in the cumulative > distribution from 60-100%. > > This shows the git default of 50 really isn't that bad, and after > about 100 it really starts to get sparse. Do you have a way to know which files have the longest chains? I have a suspicion that the ChangeLog* files are among them, not only because they are, almost without exception, only modified by prepending text to the previous version (and a fairly small amount compared to the size of the file), and therefore the diff is simple (a single hunk) so that the limit on chain depth is probably what causes a new copy to be created. Besides that, these files grow quite large and become some of the largest files in the tree, and at least one of them is changed for every commit. This leads again to many versions of fairly large files. If this guess is right, it implies that most of the size gains from longer chains come from having fewer copies of the ChangeLog* files. From a performance point of view, it is rather favourable since the differences are simple. This would also explain why the window parameter has little effect. Regards, Gabriel ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-10 9:54 ` Gabriel Paubert @ 2007-12-10 15:35 ` Nicolas Pitre 0 siblings, 0 replies; 90+ messages in thread From: Nicolas Pitre @ 2007-12-10 15:35 UTC (permalink / raw) To: Gabriel Paubert Cc: Harvey Harrison, Linus Torvalds, Jeff King, Jon Smirl, Daniel Berlin, David Miller, ismail, gcc, git On Mon, 10 Dec 2007, Gabriel Paubert wrote: > On Fri, Dec 07, 2007 at 04:47:19PM -0800, Harvey Harrison wrote: > > Some interesting stats from the highly packed gcc repo. The long chain > > lengths very quickly tail off. Over 60% of the objects have a chain > > length of 20 or less. If anyone wants the full list let me know. I > > also have included a few other interesting points, the git default > > depth of 50, my initial guess of 100 and every 10% in the cumulative > > distribution from 60-100%. > > > > This shows the git default of 50 really isn't that bad, and after > > about 100 it really starts to get sparse. > > Do you have a way to know which files have the longest chains? With 'git verify-pack -v' you get the delta depth for each object. Then you can use 'git show' with the object SHA1 to see its content. > I have a suspiscion that the ChangeLog* files are among them, > not only because they are, almost without exception, only modified > by prepending text to the previous version (and a fairly small amount > compared to the size of the file), and therefore the diff is simple > (a single hunk) so that the limit on chain depth is probably what > causes a new copy to be created. My gcc repo is currently repacked with a max delta depth of 50, and a quick sample of those objects at the depth limit does indeed show the content of the ChangeLog file. But I have occurrences of the root directory tree object too, and the "GCC machine description for IA-32" content as well. But yes, the really deep delta chains are most certainly going to contain those ChangeLog files. 
> Besides that these files grow quite large and become some of the > largest files in the tree, and at least one of them is changed > for every commit. This leads again to many versions of fairly > large files. > > If this guess is right, this implies that most of the size gains > from longer chains comes from having less copies of the ChangeLog* > files. From a performance point of view, it is rather favourable > since the differences are simple. This would also explain why > the window parameter has little effect. Well, actually the window parameter does have big effects. For instance the default of 10 is completely inadequate for the gcc repo, since changing the window size from 10 to 100 made the corresponding pack shrink from 2.1GB down to 400MB, with the same max delta depth. Nicolas ^ permalink raw reply [flat|nested] 90+ messages in thread
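[Editor's note: the inspection Nicolas suggests can be sketched end to end on a throwaway repository. The toy history below is fabricated -- a single growing ChangeLog-style file, mimicking the pattern Gabriel describes -- so that the resulting pack contains real delta chains to list; on the actual gcc pack you would point verify-pack at its .idx instead.]

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q .
# Build a few similar revisions of one file so the pack contains deltas.
for i in 1 2 3 4 5 6; do
  seq 1 $(( 2000 + i * 50 )) > ChangeLog
  git add ChangeLog
  git -c user.name=t -c user.email=t@example.com commit -qm "rev $i"
done
git repack -a -d -f -q --window=10 --depth=50
# List the deepest delta chains: depth then sha1, largest depth first.
git verify-pack -v .git/objects/pack/*.idx |
  awk 'NF >= 7 { print $6, $1 }' | sort -rn | head -5
```

Any sha1 from that list can then be fed to `git show <sha1>` to see which content sits at the bottom of the longest chains.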
* Re: Git and GCC 2007-12-06 17:39 ` Jeff King 2007-12-06 18:02 ` Nicolas Pitre 2007-12-06 18:35 ` Linus Torvalds @ 2007-12-07 3:31 ` David Miller 2007-12-07 6:38 ` Jeff King 2 siblings, 1 reply; 90+ messages in thread From: David Miller @ 2007-12-07 3:31 UTC (permalink / raw) To: peff; +Cc: nico, jonsmirl, dberlin, harvey.harrison, ismail, gcc, git From: Jeff King <peff@peff.net> Date: Thu, 6 Dec 2007 12:39:47 -0500 > I tried the threaded repack with pack.threads = 3 on a dual-processor > machine, and got: > > time git repack -a -d -f --window=250 --depth=250 > > real 309m59.849s > user 377m43.948s > sys 8m23.319s > > -r--r--r-- 1 peff peff 28570088 2007-12-06 10:11 pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.idx > -r--r--r-- 1 peff peff 339922573 2007-12-06 10:11 pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.pack > > So it is about 5% bigger. What is really disappointing is that we saved > only about 20% of the time. I didn't sit around watching the stages, but > my guess is that we spent a long time in the single threaded "writing > objects" stage with a thrashing delta cache. If someone can give me a good way to run this test case I can have my 64-cpu Niagara-2 box crunch on this and see how fast it goes and how much larger the resulting pack file is. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 3:31 ` David Miller @ 2007-12-07 6:38 ` Jeff King 2007-12-07 7:10 ` Jon Smirl 0 siblings, 1 reply; 90+ messages in thread From: Jeff King @ 2007-12-07 6:38 UTC (permalink / raw) To: David Miller; +Cc: nico, jonsmirl, dberlin, harvey.harrison, ismail, gcc, git On Thu, Dec 06, 2007 at 07:31:21PM -0800, David Miller wrote: > > So it is about 5% bigger. What is really disappointing is that we saved > > only about 20% of the time. I didn't sit around watching the stages, but > > my guess is that we spent a long time in the single threaded "writing > > objects" stage with a thrashing delta cache. > > If someone can give me a good way to run this test case I can > have my 64-cpu Niagara-2 box crunch on this and see how fast > it goes and how much larger the resulting pack file is. That would be fun to see. The procedure I am using is this: # compile recent git master with threaded delta cd git echo THREADED_DELTA_SEARCH = 1 >>config.mak make install # get the gcc pack mkdir gcc && cd gcc git --bare init git config remote.gcc.url git://git.infradead.org/gcc.git git config remote.gcc.fetch \ '+refs/remotes/gcc.gnu.org/*:refs/remotes/gcc.gnu.org/*' git remote update # make a copy, so we can run further tests from a known point cd .. cp -a gcc test # and test multithreaded large depth/window repacking cd test git config pack.threads 4 time git repack -a -d -f --window=250 --depth=250 -Peff ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 6:38 ` Jeff King @ 2007-12-07 7:10 ` Jon Smirl 2007-12-07 12:53 ` David Miller 0 siblings, 1 reply; 90+ messages in thread From: Jon Smirl @ 2007-12-07 7:10 UTC (permalink / raw) To: Jeff King; +Cc: David Miller, nico, dberlin, harvey.harrison, ismail, gcc, git On 12/7/07, Jeff King <peff@peff.net> wrote: > On Thu, Dec 06, 2007 at 07:31:21PM -0800, David Miller wrote: > > > > So it is about 5% bigger. What is really disappointing is that we saved > > > only about 20% of the time. I didn't sit around watching the stages, but > > > my guess is that we spent a long time in the single threaded "writing > > > objects" stage with a thrashing delta cache. > > > > If someone can give me a good way to run this test case I can > > have my 64-cpu Niagara-2 box crunch on this and see how fast > > it goes and how much larger the resulting pack file is. > > That would be fun to see. The procedure I am using is this: > > # compile recent git master with threaded delta > cd git > echo THREADED_DELTA_SEARCH = 1 >>config.mak > make install > > # get the gcc pack > mkdir gcc && cd gcc > git --bare init > git config remote.gcc.url git://git.infradead.org/gcc.git > git config remote.gcc.fetch \ > '+refs/remotes/gcc.gnu.org/*:refs/remotes/gcc.gnu.org/*' > git remote update > > # make a copy, so we can run further tests from a known point > cd .. > cp -a gcc test > > # and test multithreaded large depth/window repacking > cd test > git config pack.threads 4 64 threads with 64 CPUs, if they are multicore you want even more. you need to adjust chunk_size as mentioned in the other mail. > time git repack -a -d -f --window=250 --depth=250 > > -Peff > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 7:10 ` Jon Smirl @ 2007-12-07 12:53 ` David Miller 2007-12-07 17:23 ` Linus Torvalds 2007-12-10 9:57 ` David Miller 0 siblings, 2 replies; 90+ messages in thread From: David Miller @ 2007-12-07 12:53 UTC (permalink / raw) To: jonsmirl; +Cc: peff, nico, dberlin, harvey.harrison, ismail, gcc, git From: "Jon Smirl" <jonsmirl@gmail.com> Date: Fri, 7 Dec 2007 02:10:49 -0500 > On 12/7/07, Jeff King <peff@peff.net> wrote: > > On Thu, Dec 06, 2007 at 07:31:21PM -0800, David Miller wrote: > > > > # and test multithreaded large depth/window repacking > > cd test > > git config pack.threads 4 > > 64 threads with 64 CPUs, if they are multicore you want even more. > you need to adjust chunk_size as mentioned in the other mail. It's an 8 core system with 64 cpu threads. > > time git repack -a -d -f --window=250 --depth=250 Didn't work very well, even with the one-liner patch for chunk_size it died. I think I need to build 64-bit binaries. davem@huronp11:~/src/GCC/git/test$ time git repack -a -d -f --window=250 --depth=250 Counting objects: 1190671, done. fatal: Out of memory? mmap failed: Cannot allocate memory real 58m36.447s user 289m8.270s sys 4m40.680s davem@huronp11:~/src/GCC/git/test$ While it did run, the load was anywhere between 5 and 9; although it did create 64 threads, the size of the process was about 3.2GB. This may be in part why it wasn't able to use all 64 threads effectively. Like I said, it seemed to have 9 active at best at any one time; most of the time only 4 or 5 were busy doing anything. Also I could end up being performance limited by SHA, it's not very well tuned on Sparc. It's been on my TODO list to code up the crypto unit support for Niagara-2 in the kernel, then work with Herbert Xu on the userland interfaces to take advantage of that in things like libssl. Even a better C/asm version would probably improve GIT performance a bit. Is SHA a significant portion of the compute during these repacks? I should run oprofile...
^ permalink raw reply [flat|nested] 90+ messages in thread
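The knobs being discussed above can be bounded so that a many-threaded repack does not exhaust memory. A minimal sketch in a throwaway repository, with assumed values (the thread count and memory cap are illustrative, not the settings David used):

```shell
# Sketch: configure a threaded repack in a scratch repository.
# pack.threads and pack.windowMemory are real git settings; the values
# here are assumptions to illustrate the idea -- tune to your machine.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config pack.threads 4          # e.g. one thread per core; 64 on a Niagara-2
git config pack.windowMemory 256m  # cap the per-thread delta-window memory
git config pack.threads            # print the stored values back
git config pack.windowMemory
# The full rewrite-everything repack from the thread would then be:
#   git repack -a -d -f --window=250 --depth=250
```

With an unbounded window (the default at the time), each thread can hold its whole delta window in memory at once, which is what made the 64-thread run above balloon past 3GB.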
* Re: Git and GCC 2007-12-07 12:53 ` David Miller @ 2007-12-07 17:23 ` Linus Torvalds 2007-12-07 20:26 ` Giovanni Bajo 2007-12-08 1:55 ` David Miller 2007-12-10 9:57 ` David Miller 1 sibling, 2 replies; 90+ messages in thread From: Linus Torvalds @ 2007-12-07 17:23 UTC (permalink / raw) To: David Miller Cc: jonsmirl, peff, nico, dberlin, harvey.harrison, ismail, gcc, git On Fri, 7 Dec 2007, David Miller wrote: > > Also I could end up being performance limited by SHA, it's not very > well tuned on Sparc. It's been on my TODO list to code up the crypto > unit support for Niagara-2 in the kernel, then work with Herbert Xu on > the userland interfaces to take advantage of that in things like > libssl. Even a better C/asm version would probably improve GIT > performance a bit. I doubt you can use the hardware support. Kernel-only hw support is inherently broken for any sane user-space usage, the setup costs are just way way too high. To be useful, crypto engines need to support direct user space access (ie a regular instruction, with all state being held in normal registers that get saved/restored by the kernel). > Is SHA a significant portion of the compute during these repacks? > I should run oprofile... SHA1 is almost totally insignificant on x86. It hardly shows up. But we have a good optimized version there. zlib tends to be a lot more noticeable (especially the uncompression: it may be faster than compression, but it's done _so_ much more that it totally dominates). Linus ^ permalink raw reply [flat|nested] 90+ messages in thread
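Linus's point about inflate versus deflate cost is easy to observe outside git. A rough sketch using gzip as a stand-in for zlib (the file sizes are arbitrary assumptions): compressing a buffer costs far more than decompressing it, so the reason inflate dominates git profiles is call frequency, not per-call cost.

```shell
# Rough illustration with gzip as a zlib stand-in: compress once,
# decompress once, compare wall-clock cost. Decompression is much
# cheaper per call; git simply performs it far more often.
d=$(mktemp -d)
cd "$d"
head -c 8000000 /dev/urandom > blob   # incompressible half
head -c 8000000 /dev/zero   >> blob   # compressible half
gzip -9c blob > blob.gz
time gzip -9c blob > /dev/null        # deflate: the expensive direction
time gzip -dc blob.gz > /dev/null     # inflate: much cheaper per call
gzip -dc blob.gz | cmp -s - blob && echo "roundtrip ok"
```

The exact ratio depends on the data and compression level, but deflate at level 9 is typically an order of magnitude slower than inflate on the same bytes.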
* Re: Git and GCC 2007-12-07 17:23 ` Linus Torvalds @ 2007-12-07 20:26 ` Giovanni Bajo 2007-12-07 22:14 ` Jakub Narebski 2007-12-08 1:55 ` David Miller 1 sibling, 1 reply; 90+ messages in thread From: Giovanni Bajo @ 2007-12-07 20:26 UTC (permalink / raw) To: Linus Torvalds Cc: David Miller, jonsmirl, peff, nico, dberlin, harvey.harrison, ismail, gcc, git On 12/7/2007 6:23 PM, Linus Torvalds wrote: >> Is SHA a significant portion of the compute during these repacks? >> I should run oprofile... > > SHA1 is almost totally insignificant on x86. It hardly shows up. But we > have a good optimized version there. > > zlib tends to be a lot more noticeable (especially the uncompression: it > may be faster than compression, but it's done _so_ much more that it > totally dominates). Have you considered alternatives, like: http://www.oberhumer.com/opensource/ucl/ -- Giovanni Bajo ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 20:26 ` Giovanni Bajo @ 2007-12-07 22:14 ` Jakub Narebski 2007-12-07 23:04 ` Luke Lu 2007-12-07 23:14 ` Giovanni Bajo 0 siblings, 2 replies; 90+ messages in thread From: Jakub Narebski @ 2007-12-07 22:14 UTC (permalink / raw) To: Giovanni Bajo Cc: Linus Torvalds, David Miller, jonsmirl, peff, nico, dberlin, harvey.harrison, ismail, gcc, git Giovanni Bajo <rasky@develer.com> writes: > On 12/7/2007 6:23 PM, Linus Torvalds wrote: > > >> Is SHA a significant portion of the compute during these repacks? > >> I should run oprofile... > > SHA1 is almost totally insignificant on x86. It hardly shows up. But > > we have a good optimized version there. > > zlib tends to be a lot more noticeable (especially the > > *uncompression*: it may be faster than compression, but it's done _so_ > > much more that it totally dominates). > > Have you considered alternatives, like: > http://www.oberhumer.com/opensource/ucl/ <quote> As compared to LZO, the UCL algorithms achieve a better compression ratio but *decompression* is a little bit slower. See below for some rough timings. </quote> It is uncompression speed that is more important, because it is used much more often. -- Jakub Narebski ShadeHawk on #git ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 22:14 ` Jakub Narebski @ 2007-12-07 23:04 ` Luke Lu 2007-12-07 23:14 ` Giovanni Bajo 1 sibling, 0 replies; 90+ messages in thread From: Luke Lu @ 2007-12-07 23:04 UTC (permalink / raw) To: Jakub Narebski Cc: Giovanni Bajo, Linus Torvalds, David Miller, jonsmirl, peff, nico, dberlin, harvey.harrison, ismail, gcc, git On Dec 7, 2007, at 2:14 PM, Jakub Narebski wrote: > Giovanni Bajo <rasky@develer.com> writes: >> On 12/7/2007 6:23 PM, Linus Torvalds wrote: >>>> Is SHA a significant portion of the compute during these repacks? >>>> I should run oprofile... >>> SHA1 is almost totally insignificant on x86. It hardly shows up. But >>> we have a good optimized version there. >>> zlib tends to be a lot more noticeable (especially the >>> *uncompression*: it may be faster than compression, but it's done >>> _so_ >>> much more that it totally dominates). >> >> Have you considered alternatives, like: >> http://www.oberhumer.com/opensource/ucl/ > > <quote> > As compared to LZO, the UCL algorithms achieve a better compression > ratio but *decompression* is a little bit slower. See below for some > rough timings. > </quote> > > It is uncompression speed that is more important, because it is used > much more often. So why didn't we consider lzo then? It's much faster than zlib. __Luke ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 22:14 ` Jakub Narebski 2007-12-07 23:04 ` Luke Lu @ 2007-12-07 23:14 ` Giovanni Bajo 2007-12-07 23:33 ` Daniel Berlin 1 sibling, 1 reply; 90+ messages in thread From: Giovanni Bajo @ 2007-12-07 23:14 UTC (permalink / raw) To: Jakub Narebski Cc: Linus Torvalds, David Miller, jonsmirl, peff, nico, dberlin, harvey.harrison, ismail, gcc, git On Fri, 2007-12-07 at 14:14 -0800, Jakub Narebski wrote: > > >> Is SHA a significant portion of the compute during these repacks? > > >> I should run oprofile... > > > SHA1 is almost totally insignificant on x86. It hardly shows up. But > > > we have a good optimized version there. > > > zlib tends to be a lot more noticeable (especially the > > > *uncompression*: it may be faster than compression, but it's done _so_ > > > much more that it totally dominates). > > > > Have you considered alternatives, like: > > http://www.oberhumer.com/opensource/ucl/ > > <quote> > As compared to LZO, the UCL algorithms achieve a better compression > ratio but *decompression* is a little bit slower. See below for some > rough timings. > </quote> > > It is uncompression speed that is more important, because it is used > much more often. I know, but the point is not what is the fastest, but whether it's fast enough to get off the profiles. I think UCL is fast enough, since it's still times faster than zlib. Anyway, LZO is GPL too, so why not consider it too. They are good libraries. -- Giovanni Bajo ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 23:14 ` Giovanni Bajo @ 2007-12-07 23:33 ` Daniel Berlin 2007-12-08 12:00 ` Johannes Schindelin 0 siblings, 1 reply; 90+ messages in thread From: Daniel Berlin @ 2007-12-07 23:33 UTC (permalink / raw) To: Giovanni Bajo Cc: Jakub Narebski, Linus Torvalds, David Miller, jonsmirl, peff, nico, harvey.harrison, ismail, gcc, git On 12/7/07, Giovanni Bajo <rasky@develer.com> wrote: > On Fri, 2007-12-07 at 14:14 -0800, Jakub Narebski wrote: > > > > >> Is SHA a significant portion of the compute during these repacks? > > > >> I should run oprofile... > > > > SHA1 is almost totally insignificant on x86. It hardly shows up. But > > > > we have a good optimized version there. > > > > zlib tends to be a lot more noticeable (especially the > > > > *uncompression*: it may be faster than compression, but it's done _so_ > > > > much more that it totally dominates). > > > > > > Have you considered alternatives, like: > > > http://www.oberhumer.com/opensource/ucl/ > > > > <quote> > > As compared to LZO, the UCL algorithms achieve a better compression > > ratio but *decompression* is a little bit slower. See below for some > > rough timings. > > </quote> > > > > It is uncompression speed that is more important, because it is used > > much more often. > > I know, but the point is not what is the fastest, but whether it's fast > enough to get off the profiles. I think UCL is fast enough since it's > still times faster than zlib. Anyway, LZO is GPL too, so why not > consider it too. They are good libraries. At worst, you could also use fastlz (www.fastlz.org), which is faster than all of these by a factor of 4 (and compression-wise, is actually sometimes better, sometimes worse, than LZO). ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 23:33 ` Daniel Berlin @ 2007-12-08 12:00 ` Johannes Schindelin 0 siblings, 0 replies; 90+ messages in thread From: Johannes Schindelin @ 2007-12-08 12:00 UTC (permalink / raw) To: Daniel Berlin Cc: Giovanni Bajo, Jakub Narebski, Linus Torvalds, David Miller, jonsmirl, peff, nico, harvey.harrison, ismail, gcc, git Hi, On Fri, 7 Dec 2007, Daniel Berlin wrote: > On 12/7/07, Giovanni Bajo <rasky@develer.com> wrote: > > On Fri, 2007-12-07 at 14:14 -0800, Jakub Narebski wrote: > > > > > > >> Is SHA a significant portion of the compute during these > > > > >> repacks? I should run oprofile... > > > > > SHA1 is almost totally insignificant on x86. It hardly shows up. > > > > > But we have a good optimized version there. zlib tends to be a > > > > > lot more noticeable (especially the *uncompression*: it may be > > > > > faster than compression, but it's done _so_ much more that it > > > > > totally dominates). > > > > > > > > Have you considered alternatives, like: > > > > http://www.oberhumer.com/opensource/ucl/ > > > > > > <quote> > > > As compared to LZO, the UCL algorithms achieve a better > > > compression ratio but *decompression* is a little bit slower. See > > > below for some rough timings. > > > </quote> > > > > > > It is uncompression speed that is more important, because it is used > > > much more often. > > > > I know, but the point is not what is the fastestest, but if it's fast > > enough to get off the profiles. I think UCL is fast enough since it's > > still times faster than zlib. Anyway, LZO is GPL too, so why not > > considering it too. They are good libraries. > > > At worst, you could also use fastlz (www.fastlz.org), which is faster > than all of these by a factor of 4 (and compression wise, is actually > sometimes better, sometimes worse, than LZO). fastLZ is awfully short on details when it comes to a comparison of the resulting file sizes. 
The only result I saw was that for the (single) example they chose, compressed size was 470MB as opposed to 361MB for zip's _fastest_ mode. Really, that's not acceptable for me in the context of git. Besides, if you change the compression algorithm you will have to add support for legacy clients to _recompress_ with libz. Which most likely would make Sisyphos grin watching them servers. Ciao, Dscho ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 17:23 ` Linus Torvalds 2007-12-07 20:26 ` Giovanni Bajo @ 2007-12-08 1:55 ` David Miller 1 sibling, 0 replies; 90+ messages in thread From: David Miller @ 2007-12-08 1:55 UTC (permalink / raw) To: torvalds; +Cc: jonsmirl, peff, nico, dberlin, harvey.harrison, ismail, gcc, git From: Linus Torvalds <torvalds@linux-foundation.org> Date: Fri, 7 Dec 2007 09:23:47 -0800 (PST) > > > On Fri, 7 Dec 2007, David Miller wrote: > > > > Also I could end up being performance limited by SHA, it's not very > > well tuned on Sparc. It's been on my TODO list to code up the crypto > > unit support for Niagara-2 in the kernel, then work with Herbert Xu on > > the userland interfaces to take advantage of that in things like > > libssl. Even a better C/asm version would probably improve GIT > > performance a bit. > > I doubt you can use the hardware support. Kernel-only hw support is > inherently broken for any sane user-space usage, the setup costs are just > way way too high. To be useful, crypto engines need to support direct user > space access (ie a regular instruction, with all state being held in > normal registers that get saved/restored by the kernel). Unfortunately they are hypervisor calls, and you have to give the thing physical addresses for the buffer to work on, so letting userland get at it directly isn't currently doable. I still believe that there are cases where userland can take advantage of in-kernel crypto devices, such as when we are streaming the data into the kernel anyway (for a write() or sendmsg()) and the user just wants the transformation to be done on that stream. As a specific case, hardware crypto SSL support works quite well for sendmsg() user packet data. And this is the kind of API Solaris provides to get good SSL performance with Niagara. > > Is SHA a significant portion of the compute during these repacks? > > I should run oprofile... > > SHA1 is almost totally insignificant on x86. It hardly shows up. But we > have a good optimized version there. Ok. > zlib tends to be a lot more noticeable (especially the uncompression: it > may be faster than compression, but it's done _so_ much more that it > totally dominates). zlib is really hard to optimize on Sparc; I've tried numerous times. Actually compress is the real cycle killer, and in that case the inner loop wants to dereference 2-byte shorts at a time, but they are unaligned half of the time, and the check for alignment nullifies the gains of avoiding the two-byte loads. I don't think uncompress is optimized with asm on any platform the way the compress side is. It's a pretty straightforward transformation and the memory accesses dominate the overhead. I'll do some profiling to see what might be worth looking into. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 12:53 ` David Miller 2007-12-07 17:23 ` Linus Torvalds @ 2007-12-10 9:57 ` David Miller 1 sibling, 0 replies; 90+ messages in thread From: David Miller @ 2007-12-10 9:57 UTC (permalink / raw) To: jonsmirl; +Cc: peff, nico, dberlin, harvey.harrison, ismail, gcc, git From: David Miller <davem@davemloft.net> Date: Fri, 07 Dec 2007 04:53:29 -0800 (PST) > I should run oprofile... While doing the initial object counting, most of the time is spent in lookup_object(), memcmp() (via hashcmp()), and inflate(). I tried to see if I could do some tricks on sparc with the hashcmp() but the sha1 pointers are very often not even 4-byte aligned. I suspect lookup_object() could be improved if it didn't use a hash table without chaining, but I can see why 'struct object' size is a concern and thus why things are done the way they are.

samples  %        app name         symbol name
504      13.7517  libc-2.6.1.so    memcmp
386      10.5321  libz.so.1.2.3.3  inflate
288       7.8581  git              lookup_object
248       6.7667  libz.so.1.2.3.3  inflate_fast
201       5.4843  libz.so.1.2.3.3  inflate_table
175       4.7749  git              decode_tree_entry
...

Deltifying is 94% consumed by create_delta(); the rest is completely in the noise.

samples  %        app name         symbol name
10581    94.8373  git              create_delta
181       1.6223  git              create_delta_index
72        0.6453  git              prepare_pack
55        0.4930  libc-2.6.1.so    loop
34        0.3047  libz.so.1.2.3.3  inflate_fast
33        0.2958  libc-2.6.1.so    _int_malloc
22        0.1972  libshadow.so     shadowUpdatePacked
21        0.1882  libc-2.6.1.so    _int_free
19        0.1703  libc-2.6.1.so    malloc
...

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 5:11 ` Daniel Berlin 2007-12-06 5:15 ` Harvey Harrison @ 2007-12-06 6:09 ` Linus Torvalds 2007-12-06 7:49 ` Harvey Harrison ` (4 more replies) 1 sibling, 5 replies; 90+ messages in thread From: Linus Torvalds @ 2007-12-06 6:09 UTC (permalink / raw) To: Daniel Berlin; +Cc: David Miller, ismail, gcc, git On Thu, 6 Dec 2007, Daniel Berlin wrote: > > Actually, it turns out that git-gc --aggressive does this dumb thing > to pack files sometimes regardless of whether you converted from an > SVN repo or not. Absolutely. git --aggressive is mostly dumb. It's really only useful for the case of "I know I have a *really* bad pack, and I want to throw away all the bad packing decisions I have done". To explain this, it's worth explaining (you are probably aware of it, but let me go through the basics anyway) how git delta-chains work, and how they are so different from most other systems. In other SCM's, a delta-chain is generally fixed. It might be "forwards" or "backwards", and it might evolve a bit as you work with the repository, but generally it's a chain of changes to a single file represented as some kind of single SCM entity. In CVS, it's obviously the *,v file, and a lot of other systems do rather similar things. Git also does delta-chains, but it does them a lot more "loosely". There is no fixed entity. Deltas are generated against any random other version that git deems to be a good delta candidate (with various fairly successful heuristics), and there are absolutely no hard grouping rules. This is generally a very good thing. 
It's good for various conceptual reasons (ie git internally never really even needs to care about the whole revision chain - it doesn't really think in terms of deltas at all), but it's also great because getting rid of the inflexible delta rules means that git doesn't have any problems at all with merging two files together, for example - there simply are no arbitrary *,v "revision files" that have some hidden meaning. It also means that the choice of deltas is a much more open-ended question. If you limit the delta chain to just one file, you really don't have a lot of choices on what to do about deltas, but in git, it really can be a totally different issue. And this is where the really badly named "--aggressive" comes in. While git generally tries to re-use delta information (because it's a good idea, and it doesn't waste CPU time re-finding all the good deltas we found earlier), sometimes you want to say "let's start all over, with a blank slate, and ignore all the previous delta information, and try to generate a new set of deltas". So "--aggressive" is not really about being aggressive, but about wasting CPU time re-doing a decision we already did earlier! *Sometimes* that is a good thing. Some import tools in particular could generate really horribly bad deltas. Anything that uses "git fast-import", for example, likely doesn't have much of a great delta layout, so it might be worth saying "I want to start from a clean slate". But almost always, in other cases, it's actually a really bad thing to do. It's going to waste CPU time, and especially if you had actually done a good job at deltaing earlier, the end result isn't going to re-use all those *good* deltas you already found, so you'll actually end up with a much worse end result too! I'll send a patch to Junio to just remove the "git gc --aggressive" documentation. 
It can be useful, but it generally is useful only when you really understand at a very deep level what it's doing, and that documentation doesn't help you do that. Generally, doing incremental "git gc" is the right approach, and better than doing "git gc --aggressive". It's going to re-use old deltas, and when those old deltas can't be found (the reason for doing incremental GC in the first place!) it's going to create new ones. On the other hand, it's definitely true that an "initial import of a long and involved history" is a point where it can be worth spending a lot of time finding the *really* good deltas. Then, every user ever after (as long as they don't use "git gc --aggressive" to undo it!) will get the advantage of that one-time event. So especially for big projects with a long history, it's probably worth doing some extra work, telling the delta finding code to go wild. So the equivalent of "git gc --aggressive" - but done *properly* - is to do (overnight) something like git repack -a -d --depth=250 --window=250 where that depth thing is just about how deep the delta chains can be (make them longer for old history - it's worth the space overhead), and the window thing is about how big an object window we want each delta candidate to scan. And here, you might well want to add the "-f" flag (which is the "drop all old deltas" one), since you now are actually trying to make sure that this one actually finds good candidates. And then it's going to take forever and a day (ie a "do it overnight" thing). But the end result is that everybody downstream from that repository will get much better packs, without having to spend any effort on it themselves. Linus ^ permalink raw reply [flat|nested] 90+ messages in thread
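The "done properly" recipe above can be scripted end to end. A sketch against a tiny throwaway repository (on a toy repo it is instant; on something like gcc's full history it is the overnight job Linus describes):

```shell
# Sketch of the one-time deep repack from the thread, run in a scratch
# repo so it is self-contained. The demo identity is a placeholder.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo hello > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm init
# -f drops previously reused deltas so the deep window/depth limits are
# applied afresh to every object, per Linus's and Nicolas's advice:
git repack -a -d -f --window=250 --depth=250
ls .git/objects/pack/
```

Everyone who clones from the repacked repository then inherits the small pack for free, since the same deltas are reused on the wire.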
* Re: Git and GCC 2007-12-06 6:09 ` Linus Torvalds @ 2007-12-06 7:49 ` Harvey Harrison 2007-12-06 8:11 ` David Brown 2007-12-06 14:01 ` Nicolas Pitre 2007-12-06 12:03 ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin ` (3 subsequent siblings) 4 siblings, 2 replies; 90+ messages in thread From: Harvey Harrison @ 2007-12-06 7:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: Daniel Berlin, David Miller, ismail, gcc, git > git repack -a -d --depth=250 --window=250 > Since I have the whole gcc repo locally I'll give this a shot overnight just to see what can be done at the extreme end of things. Harvey ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 7:49 ` Harvey Harrison @ 2007-12-06 8:11 ` David Brown 2007-12-06 14:01 ` Nicolas Pitre 1 sibling, 0 replies; 90+ messages in thread From: David Brown @ 2007-12-06 8:11 UTC (permalink / raw) To: Harvey Harrison Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git On Wed, Dec 05, 2007 at 11:49:21PM -0800, Harvey Harrison wrote: > >> git repack -a -d --depth=250 --window=250 >> > >Since I have the whole gcc repo locally I'll give this a shot overnight >just to see what can be done at the extreme end of things. When I tried this on a very large repo, at least one with some large files in it, git quickly exceeded my physical memory and started thrashing the machine. I had good results with

git config pack.deltaCacheSize 512m
git config pack.windowMemory 512m

of course adjusting based on your physical memory. I think changing the windowMemory will affect the resulting compression, so changing these ratios might get better compression out of the result. If you're really patient, though, you could leave the unbounded window, hope you have enough swap, and just let it run. Dave ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 7:49 ` Harvey Harrison 2007-12-06 8:11 ` David Brown @ 2007-12-06 14:01 ` Nicolas Pitre 1 sibling, 0 replies; 90+ messages in thread From: Nicolas Pitre @ 2007-12-06 14:01 UTC (permalink / raw) To: Harvey Harrison Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git On Wed, 5 Dec 2007, Harvey Harrison wrote: > > > git repack -a -d --depth=250 --window=250 > > > > Since I have the whole gcc repo locally I'll give this a shot overnight > just to see what can be done at the extreme end or things. Don't forget to add -f as well. Nicolas ^ permalink raw reply [flat|nested] 90+ messages in thread
* [PATCH] gc --aggressive: make it really aggressive 2007-12-06 6:09 ` Linus Torvalds 2007-12-06 7:49 ` Harvey Harrison @ 2007-12-06 12:03 ` Johannes Schindelin 2007-12-06 13:42 ` Theodore Tso ` (3 more replies) 2007-12-06 18:04 ` Git and GCC Daniel Berlin ` (2 subsequent siblings) 4 siblings, 4 replies; 90+ messages in thread From: Johannes Schindelin @ 2007-12-06 12:03 UTC (permalink / raw) To: Linus Torvalds; +Cc: Daniel Berlin, David Miller, ismail, gcc, git, gitster The default was not to change the window or depth at all. As suggested by Jon Smirl, Linus Torvalds and others, default to --window=250 --depth=250 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- On Wed, 5 Dec 2007, Linus Torvalds wrote: > On Thu, 6 Dec 2007, Daniel Berlin wrote: > > > > Actually, it turns out that git-gc --aggressive does this dumb > > thing to pack files sometimes regardless of whether you > > converted from an SVN repo or not. > > Absolutely. git --aggressive is mostly dumb. It's really only > useful for the case of "I know I have a *really* bad pack, and I > want to throw away all the bad packing decisions I have done". > > [...] > > So the equivalent of "git gc --aggressive" - but done *properly* > - is to do (overnight) something like > > git repack -a -d --depth=250 --window=250 How about this, then? 
 builtin-gc.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/builtin-gc.c b/builtin-gc.c
index 799c263..c6806d3 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -23,7 +23,7 @@ static const char * const builtin_gc_usage[] = {
 };
 
 static int pack_refs = 1;
-static int aggressive_window = -1;
+static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 20;
@@ -192,6 +192,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (aggressive) {
 		append_option(argv_repack, "-f", MAX_ADD);
+		append_option(argv_repack, "--depth=250", MAX_ADD);
 		if (aggressive_window > 0) {
 			sprintf(buf, "--window=%d", aggressive_window);
 			append_option(argv_repack, buf, MAX_ADD);
-- 
1.5.3.7.2157.g9598e

^ permalink raw reply related [flat|nested] 90+ messages in thread
* Re: [PATCH] gc --aggressive: make it really aggressive 2007-12-06 12:03 ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin @ 2007-12-06 13:42 ` Theodore Tso 2007-12-06 14:15 ` Nicolas Pitre 2007-12-06 14:22 ` Pierre Habouzit ` (2 subsequent siblings) 3 siblings, 1 reply; 90+ messages in thread From: Theodore Tso @ 2007-12-06 13:42 UTC (permalink / raw) To: Johannes Schindelin Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git, gitster On Thu, Dec 06, 2007 at 12:03:38PM +0000, Johannes Schindelin wrote: > > The default was not to change the window or depth at all. As suggested > by Jon Smirl, Linus Torvalds and others, default to > > --window=250 --depth=250 I'd also suggest adding a comment in the man pages that this should only be done rarely, and that it can potentially take a *long* time (i.e., overnight) for big repositories, and in general it's not worth the effort to use --aggressive. Apologies to Linus and to the gcc folks, since I was the one who originally coded up gc --aggressive, and at the time my intent was "rarely does it make sense, and it may take a long time". The reason why I didn't make the default --window and --depth larger is because at the time the biggest repo I had easy access to was the Linux kernel's, and there you rapidly hit diminishing returns at much smaller numbers, so there was no real point in using --window=250 --depth=250. Linus later pointed out that what we *really* should do at some point is change repack -f to potentially retry to find a better delta, but to reuse the existing delta if it was no worse. That automatically does the right thing in the case where you had previously done a repack with --window=<large n> --depth=<large n>, but then later try using "gc --aggressive", which ends up doing a worse job and throwing away the information from the previous repack with large window and depth sizes. Unfortunately no one ever got around to implementing that. 
Regards, - Ted ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH] gc --aggressive: make it really aggressive 2007-12-06 13:42 ` Theodore Tso @ 2007-12-06 14:15 ` Nicolas Pitre 0 siblings, 0 replies; 90+ messages in thread From: Nicolas Pitre @ 2007-12-06 14:15 UTC (permalink / raw) To: Theodore Tso Cc: Johannes Schindelin, Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git, gitster On Thu, 6 Dec 2007, Theodore Tso wrote: > Linus later pointed out that what we *really* should do is at some > point was to change repack -f to potentially retry to find a better > delta, but to reuse the existing delta if it was no worse. That > automatically does the right thing in the case where you had > previously done a repack with --window=<large n> --depth=<large n>, > but then later try using "gc --agressive", which ends up doing a worse > job and throwing away the information from the previous repack with > large window and depth sizes. Unfortunately no one ever got around to > implementing that. I did start looking at it, but there are subtle issues to consider, such as making sure not to create delta loops. Currently this is avoided by never involving already reused deltas in new delta chains, except for edge base objects. IOW, this requires some head scratching which I didn't have the time for so far. Nicolas ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH] gc --aggressive: make it really aggressive 2007-12-06 12:03 ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin 2007-12-06 13:42 ` Theodore Tso @ 2007-12-06 14:22 ` Pierre Habouzit 2007-12-06 15:55 ` Johannes Schindelin 2007-12-06 15:30 ` Harvey Harrison 2009-03-18 16:01 ` Johannes Schindelin 3 siblings, 1 reply; 90+ messages in thread From: Pierre Habouzit @ 2007-12-06 14:22 UTC (permalink / raw) To: Johannes Schindelin Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git, gitster [-- Attachment #1: Type: text/plain, Size: 655 bytes --] On Thu, Dec 06, 2007 at 12:03:38PM +0000, Johannes Schindelin wrote: > > The default was not to change the window or depth at all. As suggested > by Jon Smirl, Linus Torvalds and others, default to > > --window=250 --depth=250 Well, this will explode on many quite reasonably sized systems. This should also use a memory limit that could be auto-guessed from the system's total physical memory (50% of the actual memory could be a good default, e.g.). On very large repositories, using that on e.g. the Linux kernel swaps like hell on a machine with 1GB of RAM and almost nothing running on it (less than 200MB of RAM actually used). [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH] gc --aggressive: make it really aggressive 2007-12-06 14:22 ` Pierre Habouzit @ 2007-12-06 15:55 ` Johannes Schindelin 2007-12-06 17:05 ` David Kastrup 0 siblings, 1 reply; 90+ messages in thread From: Johannes Schindelin @ 2007-12-06 15:55 UTC (permalink / raw) To: Pierre Habouzit Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git, gitster Hi, On Thu, 6 Dec 2007, Pierre Habouzit wrote: > On Thu, Dec 06, 2007 at 12:03:38PM +0000, Johannes Schindelin wrote: > > > > The default was not to change the window or depth at all. As > > suggested by Jon Smirl, Linus Torvalds and others, default to > > > > --window=250 --depth=250 > > well, this will explode on many quite reasonnably sized systems. This > should also use a memory-limit that could be auto-guessed from the > system total physical memory (50% of the actual memory could be a good > idea e.g.). > > On very large repositories, using that on the e.g. linux kernel, swaps > like hell on a machine with 1Go of ram, and almost nothing running on it > (less than 200Mo of ram actually used) Yes. However, I think that --aggressive should be aggressive, and if you decide to run it on a machine which lacks the muscle to be aggressive, well, you should have known better. The upside: if you run this on a strong machine and clone it to a weak machine, you'll still have the benefit of a small pack (and you should mark it as .keep, too, to keep the benefit...) Ciao, Dscho ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH] gc --aggressive: make it really aggressive 2007-12-06 15:55 ` Johannes Schindelin @ 2007-12-06 17:05 ` David Kastrup 0 siblings, 0 replies; 90+ messages in thread From: David Kastrup @ 2007-12-06 17:05 UTC (permalink / raw) To: Johannes Schindelin Cc: Pierre Habouzit, Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git, gitster Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > However, I think that --aggressive should be aggressive, and if you > decide to run it on a machine which lacks the muscle to be aggressive, > well, you should have known better. That's a rather cheap shot. "you should have known better" than expecting to be able to use a documented command and option because the git developers happened to have a nicer machine... _How_ is one supposed to have known better? -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH] gc --aggressive: make it really aggressive 2007-12-06 12:03 ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin 2007-12-06 13:42 ` Theodore Tso 2007-12-06 14:22 ` Pierre Habouzit @ 2007-12-06 15:30 ` Harvey Harrison 2007-12-06 15:56 ` Johannes Schindelin 2007-12-06 16:19 ` Linus Torvalds 2009-03-18 16:01 ` Johannes Schindelin 3 siblings, 2 replies; 90+ messages in thread From: Harvey Harrison @ 2007-12-06 15:30 UTC (permalink / raw) To: Johannes Schindelin Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git, gitster Wow /usr/bin/time git repack -a -d -f --window=250 --depth=250 23266.37user 581.04system 7:41:25elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (419835major+123275804minor)pagefaults 0swaps -r--r--r-- 1 hharrison hharrison 29091872 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.idx -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack That extra delta depth really does make a difference. Just over a 300MB pack in the end, for all gcc branches/tags as of last night. Cheers, Harvey ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH] gc --aggressive: make it really aggressive 2007-12-06 15:30 ` Harvey Harrison @ 2007-12-06 15:56 ` Johannes Schindelin 2007-12-06 16:19 ` Linus Torvalds 1 sibling, 0 replies; 90+ messages in thread From: Johannes Schindelin @ 2007-12-06 15:56 UTC (permalink / raw) To: Harvey Harrison Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, git, gitster Hi, On Thu, 6 Dec 2007, Harvey Harrison wrote: > -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 > pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack Wow. Ciao, Dscho ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH] gc --aggressive: make it really aggressive 2007-12-06 15:30 ` Harvey Harrison 2007-12-06 15:56 ` Johannes Schindelin @ 2007-12-06 16:19 ` Linus Torvalds 1 sibling, 0 replies; 90+ messages in thread From: Linus Torvalds @ 2007-12-06 16:19 UTC (permalink / raw) To: Harvey Harrison Cc: Johannes Schindelin, Daniel Berlin, David Miller, ismail, gcc, Git Mailing List, Junio C Hamano On Thu, 6 Dec 2007, Harvey Harrison wrote: > > 7:41:25elapsed 86%CPU Heh. And this is why you want to do it exactly *once*, and then just export the end result for others ;) > -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack But yeah, especially if you allow longer delta chains, the end result can be much smaller (and what makes the one-time repack more expensive is the window size, not the delta chain - you could make the delta chains longer with no cost overhead at packing time) HOWEVER. The longer delta chains do make it potentially much more expensive to then use old history. So there's a trade-off. And quite frankly, a delta depth of 250 is likely going to cause overflows in the delta cache (which is only 256 entries in size *and* it's a hash, so it's going to start having hash conflicts long before hitting the 250 depth limit). So when I said "--depth=250 --window=250", I chose those numbers more as an example of extremely aggressive packing, and I'm not at all sure that the end result is necessarily wonderfully usable. It's going to save disk space (and network bandwidth - the delta's will be re-used for the network protocol too!), but there are definitely downsides too, and using long delta chains may simply not be worth it in practice. (And some of it might just want to have git tuning, ie if people think that long deltas are worth it, we could easily just expand on the delta hash, at the cost of some more memory used!) 
That said, the good news is that working with *new* history will not be affected negatively, and if you want to be _really_ sneaky, there are ways to say "create a pack that contains the history up to a version one year ago, and be very aggressive about those old versions that we still want to have around, but do a separate pack for newer stuff using less aggressive parameters" So this is something that can be tweaked, although we don't really have any really nice interfaces for stuff like that (ie the git delta cache size is hardcoded in the sources and cannot be set in the config file, and the "pack old history more aggressively" involves some manual scripting and knowing how "git pack-objects" works rather than any nice simple command line switch). So the thing to take away from this is: - git is certainly flexible as hell - .. but to get the full power you may need to tweak things - .. happily you really only need to have one person to do the tweaking, and the tweaked end results will be available to others that do not need to know/care. And whether the difference between 320MB and 500MB is worth any really involved tweaking (considering the potential downsides), I really don't know. Only testing will tell. Linus ^ permalink raw reply [flat|nested] 90+ messages in thread
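The "pack old history more aggressively, keep newer stuff in a separate pack" scripting Linus alludes to can be sketched with `git pack-objects` and a `.keep` marker. The following is a toy demonstration on a throwaway synthetic repository (the `old-cutoff` tag standing in for "one year ago" is made up); on a real repository you would substitute your own cutoff and run the aggressive step overnight:

```shell
# Toy sketch: pack old history aggressively once, mark that pack as kept,
# then let ordinary repacks handle only the newer objects.
set -e
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@t
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@t

for i in 1 2 3 4 5 6; do
    seq "$i" 300 > file.txt
    git add file.txt
    git commit -qm "rev $i"
    if [ "$i" = 3 ]; then git tag old-cutoff; fi   # hypothetical "one year ago" point
done

# 1. Pack everything reachable from the cutoff, very aggressively.
git rev-list --objects old-cutoff |
    git pack-objects -q --window=250 --depth=250 \
        .git/objects/pack/pack-old > /dev/null

# 2. Mark that pack as kept so later repacks leave it alone.
for p in .git/objects/pack/pack-old-*.pack; do touch "${p%.pack}.keep"; done

# 3. Repack the remainder with ordinary parameters; kept packs are skipped.
git repack -q -a -d
```

The `.keep` file is the same mechanism Dscho mentions upthread: `git repack -a` excludes objects in kept packs, so the expensive aggressive pack is built once and never rewritten.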
* Re: [PATCH] gc --aggressive: make it really aggressive 2007-12-06 12:03 ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin ` (2 preceding siblings ...) 2007-12-06 15:30 ` Harvey Harrison @ 2009-03-18 16:01 ` Johannes Schindelin 2009-03-18 16:27 ` Teemu Likonen 2009-03-18 18:02 ` Nicolas Pitre 3 siblings, 2 replies; 90+ messages in thread From: Johannes Schindelin @ 2009-03-18 16:01 UTC (permalink / raw) To: git; +Cc: gitster Hi, On Thu, 6 Dec 2007, Johannes Schindelin wrote: > > The default was not to change the window or depth at all. As suggested > by Jon Smirl, Linus Torvalds and others, default to > > --window=250 --depth=250 > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > --- Guess what. This is still unresolved, and yet somebody else had to be bitten by 'git gc --aggressive' being everything but aggressive. So... I think it is high time to resolve the issue, either by applying this patch with a delay of over one year, or by the pack wizards trying to implement that 'never fall back to a worse delta' idea mentioned in this thread. Although I suggest, really, that implying --depth=250 --window=250 (unless overridden by the config) with --aggressive is not at all wrong. Ciao, Dscho ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH] gc --aggressive: make it really aggressive 2009-03-18 16:01 ` Johannes Schindelin @ 2009-03-18 16:27 ` Teemu Likonen 2009-03-18 18:02 ` Nicolas Pitre 1 sibling, 0 replies; 90+ messages in thread From: Teemu Likonen @ 2009-03-18 16:27 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git, gitster On 2009-03-18 17:01 (+0100), Johannes Schindelin wrote: >> The default was not to change the window or depth at all. As >> suggested by Jon Smirl, Linus Torvalds and others, default to >> >> --window=250 --depth=250 > Guess what. This is still unresolved, and yet somebody else had to be > bitten by 'git gc --aggressive' being everything but aggressive. Pieter de Bie's tests seem to suggest that usually --window=50 --depth=50 gives about the same results as higher values: http://vcscompare.blogspot.com/2008/06/git-repack-parameters.html I don't understand the issue very well myself so I really can't say what would be a/the good value. Anyway, I agree that it would be nice if "git gc --aggressive" were aggressive and a user wouldn't need to know about "git repack" and its cryptic low-level options. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH] gc --aggressive: make it really aggressive 2009-03-18 16:01 ` Johannes Schindelin 2009-03-18 16:27 ` Teemu Likonen @ 2009-03-18 18:02 ` Nicolas Pitre 1 sibling, 0 replies; 90+ messages in thread From: Nicolas Pitre @ 2009-03-18 18:02 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git, gitster On Wed, 18 Mar 2009, Johannes Schindelin wrote: > Hi, > > On Thu, 6 Dec 2007, Johannes Schindelin wrote: > > > > > The default was not to change the window or depth at all. As suggested > > by Jon Smirl, Linus Torvalds and others, default to > > > > --window=250 --depth=250 > > > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > > --- > > Guess what. This is still unresolved, and yet somebody else had to be > bitten by 'git gc --aggressive' being everything but aggressive. > > So... I think it is high time to resolve the issue, either by applying > this patch with a delay of over one year, or by the pack wizards trying to > implement that 'never fall back to a worse delta' idea mentioned in this > thread. This is just a bit complicated to implement (cycle avoidance, etc). > Although I suggest, really, that implying --depth=250 --window=250 (unless > overridden by the config) with --aggressive is not at all wrong. ACK. Nicolas ^ permalink raw reply [flat|nested] 90+ messages in thread
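For what it's worth, the knobs argued about in this subthread are exposed as configuration in later versions of git: `gc.aggressiveWindow` and `gc.aggressiveDepth` override what `git gc --aggressive` implies, per repository. A minimal sketch on a throwaway repository:

```shell
# Sketch: set the window/depth that `git gc --aggressive` will use,
# instead of relying on the built-in defaults debated in this thread.
set -e
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config gc.aggressiveWindow 250
git config gc.aggressiveDepth 250
git config --get gc.aggressiveWindow
```

With these set, running `git gc --aggressive` in that repository behaves like the patch proposed here.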
* Re: Git and GCC 2007-12-06 6:09 ` Linus Torvalds 2007-12-06 7:49 ` Harvey Harrison 2007-12-06 12:03 ` [PATCH] gc --aggressive: make it really aggressive Johannes Schindelin @ 2007-12-06 18:04 ` Daniel Berlin 2007-12-06 18:29 ` Linus Torvalds 2007-12-07 2:42 ` Harvey Harrison 2007-12-06 18:24 ` NightStrike 2007-12-06 19:12 ` Jon Loeliger 4 siblings, 2 replies; 90+ messages in thread From: Daniel Berlin @ 2007-12-06 18:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Miller, ismail, gcc, git On 12/6/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Thu, 6 Dec 2007, Daniel Berlin wrote: > > > > Actually, it turns out that git-gc --aggressive does this dumb thing > > to pack files sometimes regardless of whether you converted from an > > SVN repo or not. > > Absolutely. git --aggressive is mostly dumb. It's really only useful for > the case of "I know I have a *really* bad pack, and I want to throw away > all the bad packing decisions I have done". > > To explain this, it's worth explaining (you are probably aware of it, but > let me go through the basics anyway) how git delta-chains work, and how > they are so different from most other systems. > I worked on Monotone and other systems that use object stores for a little while :) In particular, I believe GIT's original object store was based on Monotone, IIRC. > In other SCM's, a delta-chain is generally fixed. It might be "forwards" > or "backwards", and it might evolve a bit as you work with the repository, > but generally it's a chain of changes to a single file represented as some > kind of single SCM entity. In CVS, it's obviously the *,v file, and a lot > of other systems do rather similar things. > > Git also does delta-chains, but it does them a lot more "loosely". There > is no fixed entity.
Deltas are generated against any random other version > that git deems to be a good delta candidate (with various fairly > successful heuristics), and there are absolutely no hard grouping rules. Sure. SVN actually supports this (surprisingly), it just never happens to choose delta bases that aren't related by ancestry. (I.e. it would have absolutely no problem with you using random other parts of the repository as delta bases, and I've played with it before). I actually advocated we move towards an object store model, as ancestry can be a crappy way of approximating similarity when you have a lot of branches. > So the equivalent of "git gc --aggressive" - but done *properly* - is to > do (overnight) something like > > git repack -a -d --depth=250 --window=250 > I gave this a try overnight, and it definitely helps a lot. Thanks! > And then it's going to take forever and a day (ie a "do it overnight" > thing). But the end result is that everybody downstream from that > repository will get much better packs, without having to spend any effort > on it themselves. > If your forever and a day is spent figuring out which deltas to use, you can reduce this significantly. If it is spent writing out the data, it's much harder. :) ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 18:04 ` Git and GCC Daniel Berlin @ 2007-12-06 18:29 ` Linus Torvalds 2007-12-07 2:42 ` Harvey Harrison 1 sibling, 0 replies; 90+ messages in thread From: Linus Torvalds @ 2007-12-06 18:29 UTC (permalink / raw) To: Daniel Berlin; +Cc: David Miller, ismail, gcc, git On Thu, 6 Dec 2007, Daniel Berlin wrote: > > I worked on Monotone and other systems that use object stores. for a > little while :) In particular, I believe GIT's original object store was > based on Monotone, IIRC. Yes and no. Monotone does what git does for the blobs. But there is a big difference in how git then does it for everything else too, ie trees and history. Tree being in that object store in particular are very important, and one of the biggest deals for deltas (actually, for two reasons: most of the time they don't change AT ALL if some subdirectory gets no changes and you don't need any delta, and even when they do change, it's usually going to delta very well, since it's usually just a small part that changes). > > And then it's going to take forever and a day (ie a "do it overnight" > > thing). But the end result is that everybody downstream from that > > repository will get much better packs, without having to spend any effort > > on it themselves. > > If your forever and a day is spent figuring out which deltas to use, > you can reduce this significantly. It's almost all about figuring out the delta. Which is why *not* using "-f" (or "--aggressive") is such a big deal for normal operation, because then you just skip it all. Linus ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 18:04 ` Git and GCC Daniel Berlin 2007-12-06 18:29 ` Linus Torvalds @ 2007-12-07 2:42 ` Harvey Harrison 2007-12-07 3:01 ` Linus Torvalds 1 sibling, 1 reply; 90+ messages in thread From: Harvey Harrison @ 2007-12-07 2:42 UTC (permalink / raw) To: Daniel Berlin; +Cc: Linus Torvalds, David Miller, ismail, gcc, git On Thu, 2007-12-06 at 13:04 -0500, Daniel Berlin wrote: > On 12/6/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > So the equivalent of "git gc --aggressive" - but done *properly* - is to > > do (overnight) something like > > > > git repack -a -d --depth=250 --window=250 > > > I gave this a try overnight, and it definitely helps a lot. > Thanks! I've updated the public mirror repo with the very-packed version. People cloning it now should get the just over 300MB repo now. git.infradead.org/gcc.git Cheers, Harvey ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 2:42 ` Harvey Harrison @ 2007-12-07 3:01 ` Linus Torvalds 2007-12-07 4:06 ` Jon Smirl 0 siblings, 1 reply; 90+ messages in thread From: Linus Torvalds @ 2007-12-07 3:01 UTC (permalink / raw) To: Harvey Harrison; +Cc: Daniel Berlin, David Miller, ismail, gcc, git On Thu, 6 Dec 2007, Harvey Harrison wrote: > > I've updated the public mirror repo with the very-packed version. Side note: it might be interesting to compare timings for history-intensive stuff with and without this kind of very-packed situation. The very density of a smaller pack-file might be enough to overcome the downsides (more CPU time to apply longer delta-chains), but regardless, real numbers talks, bullshit walks. So wouldn't it be nice to have real numbers? One easy way to get real numbers for history would be to just time some reasonably costly operation that uses lots of history. Ie just do a time git blame -C gcc/regclass.c > /dev/null and see if the deeper delta chains are very expensive. (Yeah, the above is pretty much designed to be the worst possible case for this kind of aggressive history packing, but I don't know if that choice of file to try to annotate is a good choice or not. I suspect that "git blame -C" with a CVS import is just horrid, because CVS commits tend to be pretty big and nasty and not as localized as we've tried to make things in the kernel, so doing the code copy detection is probably horrendously expensive) Linus ^ permalink raw reply [flat|nested] 90+ messages in thread
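The measurement Linus asks for can be scripted. The sketch below builds a small synthetic repository and times the same history-heavy operation against two packings of the same history; absolute numbers on a toy repo mean nothing, the point is the methodology of holding the workload fixed and varying only the pack parameters:

```shell
# Toy sketch: one history, two packings, same blame workload.
set -e
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@t
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@t
for i in $(seq 1 30); do
    seq "$i" 400 > f.c
    git add f.c
    git commit -qm "rev $i"
done

for window in 10 250; do
    git repack -q -a -d -f --window="$window" --depth=250
    t0=$(date +%s)
    git blame -C f.c > /dev/null      # the history-intensive workload
    echo "window=$window blame took $(( $(date +%s) - t0 ))s"
done
```

On the real gcc repository this is where `gcc/regclass.c` would go in place of the synthetic `f.c`.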
* Re: Git and GCC 2007-12-07 3:01 ` Linus Torvalds @ 2007-12-07 4:06 ` Jon Smirl 2007-12-07 4:21 ` Nicolas Pitre 2007-12-07 5:21 ` Linus Torvalds 0 siblings, 2 replies; 90+ messages in thread From: Jon Smirl @ 2007-12-07 4:06 UTC (permalink / raw) To: Linus Torvalds Cc: Harvey Harrison, Daniel Berlin, David Miller, ismail, gcc, git On 12/6/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Thu, 6 Dec 2007, Harvey Harrison wrote: > > > > I've updated the public mirror repo with the very-packed version. > > Side note: it might be interesting to compare timings for > history-intensive stuff with and without this kind of very-packed > situation. > > The very density of a smaller pack-file might be enough to overcome the > downsides (more CPU time to apply longer delta-chains), but regardless, > real numbers talks, bullshit walks. So wouldn't it be nice to have real > numbers? > > One easy way to get real numbers for history would be to just time some > reasonably costly operation that uses lots of history. Ie just do a > > time git blame -C gcc/regclass.c > /dev/null > > and see if the deeper delta chains are very expensive. jonsmirl@terra:/video/gcc$ time git blame -C gcc/regclass.c > /dev/null real 1m21.967s user 1m21.329s sys 0m0.640s The Mozilla repo is at least 50% larger than the gcc one. It took me 23 minutes to repack the gcc one on my $800 Dell. The trick to this is lots of RAM and 64b. There is little disk IO during the compression phase, everything is cached. I have a 4.8GB git process with 4GB of physical memory. Everything started slowing down a lot when the process got that big. Does git really need 4.8GB to repack? I could only keep 3.4GB resident. Luckily this happen at 95% completion. With 8GB of memory you should be able to do this repack in under 20 minutes. 
jonsmirl@terra:/video/gcc$ time git repack -a -d -f --depth=250 --window=250 real 22m54.380s user 69m18.948s sys 0m23.773s > (Yeah, the above is pretty much designed to be the worst possible case for > this kind of aggressive history packing, but I don't know if that choice > of file to try to annotate is a good choice or not. I suspect that "git > blame -C" with a CVS import is just horrid, because CVS commits tend to be > pretty big and nasty and not as localized as we've tried to make things in > the kernel, so doing the code copy detection is probably horrendously > expensive) > > Linus > - > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 4:06 ` Jon Smirl @ 2007-12-07 4:21 ` Nicolas Pitre 2007-12-07 5:21 ` Linus Torvalds 1 sibling, 0 replies; 90+ messages in thread From: Nicolas Pitre @ 2007-12-07 4:21 UTC (permalink / raw) To: Jon Smirl Cc: Linus Torvalds, Harvey Harrison, Daniel Berlin, David Miller, ismail, gcc, git On Thu, 6 Dec 2007, Jon Smirl wrote: > I have a 4.8GB git process with 4GB of physical memory. Everything > started slowing down a lot when the process got that big. Does git > really need 4.8GB to repack? I could only keep 3.4GB resident. Luckily > this happen at 95% completion. With 8GB of memory you should be able > to do this repack in under 20 minutes. Probably you have too many cached delta results. By default, every delta smaller than 1000 bytes is kept in memory until the write phase. Try using pack.deltacachesize = 256M or lower, or try disabling this caching entirely with pack.deltacachelimit = 0. Nicolas ^ permalink raw reply [flat|nested] 90+ messages in thread
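The settings Nicolas names are the `pack.deltaCacheSize` and `pack.deltaCacheLimit` configuration keys. A minimal sketch of applying them (throwaway repository; the 256 MiB cap is his suggested example value):

```shell
# Sketch: bound the in-memory cache of delta results used during repack.
set -e
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
# Cap the delta-result cache at 256 MiB...
git config pack.deltaCacheSize 256m
# ...or disable caching of delta results entirely:
git config pack.deltaCacheLimit 0
git config --get-regexp 'pack\.deltacache'
```

Subsequent `git repack` runs in that repository then pick the settings up automatically.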
* Re: Git and GCC 2007-12-07 4:06 ` Jon Smirl 2007-12-07 4:21 ` Nicolas Pitre @ 2007-12-07 5:21 ` Linus Torvalds 2007-12-07 7:08 ` Jon Smirl 1 sibling, 1 reply; 90+ messages in thread From: Linus Torvalds @ 2007-12-07 5:21 UTC (permalink / raw) To: Jon Smirl; +Cc: Harvey Harrison, Daniel Berlin, David Miller, ismail, gcc, git On Thu, 6 Dec 2007, Jon Smirl wrote: > > > > time git blame -C gcc/regclass.c > /dev/null > > jonsmirl@terra:/video/gcc$ time git blame -C gcc/regclass.c > /dev/null > > real 1m21.967s > user 1m21.329s Well, I was also hoping for a "compared to not-so-aggressive packing" number on the same machine.. IOW, what I was wondering is whether there is a visible performance downside to the deeper delta chains in the 300MB pack vs the (less aggressive) 500MB pack. Linus ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 5:21 ` Linus Torvalds @ 2007-12-07 7:08 ` Jon Smirl 2007-12-07 19:36 ` Nicolas Pitre 0 siblings, 1 reply; 90+ messages in thread From: Jon Smirl @ 2007-12-07 7:08 UTC (permalink / raw) To: Linus Torvalds Cc: Harvey Harrison, Daniel Berlin, David Miller, ismail, gcc, git On 12/7/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Thu, 6 Dec 2007, Jon Smirl wrote: > > > > > > time git blame -C gcc/regclass.c > /dev/null > > > > jonsmirl@terra:/video/gcc$ time git blame -C gcc/regclass.c > /dev/null > > > > real 1m21.967s > > user 1m21.329s > > Well, I was also hoping for a "compared to not-so-aggressive packing" > number on the same machine.. IOW, what I was wondering is whether there is > a visible performance downside to the deeper delta chains in the 300MB > pack vs the (less aggressive) 500MB pack. Same machine with a default pack jonsmirl@terra:/video/gcc/.git/objects/pack$ ls -l total 2145716 -r--r--r-- 1 jonsmirl jonsmirl 23667932 2007-12-07 02:03 pack-bd163555ea9240a7fdd07d2708a293872665f48b.idx -r--r--r-- 1 jonsmirl jonsmirl 2171385413 2007-12-07 02:03 pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack jonsmirl@terra:/video/gcc/.git/objects/pack$ Delta lengths have virtually no impact. The bigger pack file causes more IO which offsets the increased delta processing time. One of my rules is smaller is almost always better. Smaller eliminates IO and helps with the CPU cache. It's like the kernel being optimized for size instead of speed ending up being faster. time git blame -C gcc/regclass.c > /dev/null real 1m19.289s user 1m17.853s sys 0m0.952s > > Linus > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-07 7:08 ` Jon Smirl @ 2007-12-07 19:36 ` Nicolas Pitre 0 siblings, 0 replies; 90+ messages in thread From: Nicolas Pitre @ 2007-12-07 19:36 UTC (permalink / raw) To: Jon Smirl Cc: Linus Torvalds, Harvey Harrison, Daniel Berlin, David Miller, ismail, gcc, git On Fri, 7 Dec 2007, Jon Smirl wrote: > On 12/7/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > > > On Thu, 6 Dec 2007, Jon Smirl wrote: > > > > > > > > time git blame -C gcc/regclass.c > /dev/null > > > > > > jonsmirl@terra:/video/gcc$ time git blame -C gcc/regclass.c > /dev/null > > > > > > real 1m21.967s > > > user 1m21.329s > > > > Well, I was also hoping for a "compared to not-so-aggressive packing" > > number on the same machine.. IOW, what I was wondering is whether there is > > a visible performance downside to the deeper delta chains in the 300MB > > pack vs the (less aggressive) 500MB pack. > > Same machine with a default pack > > jonsmirl@terra:/video/gcc/.git/objects/pack$ ls -l > total 2145716 > -r--r--r-- 1 jonsmirl jonsmirl 23667932 2007-12-07 02:03 > pack-bd163555ea9240a7fdd07d2708a293872665f48b.idx > -r--r--r-- 1 jonsmirl jonsmirl 2171385413 2007-12-07 02:03 > pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack > jonsmirl@terra:/video/gcc/.git/objects/pack$ > > Delta lengths have virtually no impact. I can confirm this. I just did a repack keeping the default depth of 50 but with window=100 instead of the default of 10, and the pack shrunk from 2171385413 bytes down to 410607140 bytes. So our default window size is definitely not adequate for the gcc repo. OTOH, I recall tytso mentioning something about not having much return on a bigger window size in his tests when he proposed to increase the default delta depth to 50. 
So there is definitely some kind of threshold at which point the increased window size stops being advantageous wrt the number of cycles involved, and we should find a way to correlate it to the data set to have a better default window size than the current fixed default. Nicolas ^ permalink raw reply [flat|nested] 90+ messages in thread
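The window-size sweep Nicolas is describing can be automated. This toy sketch repacks the same synthetic history at several window sizes and reports the resulting pack size; run against a real repository like gcc's, it locates the diminishing-returns threshold he mentions:

```shell
# Toy sweep: identical history, varying --window, report pack size.
set -e
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@t
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@t
for i in $(seq 1 40); do
    seq "$i" 500 > f.txt
    git add f.txt
    git commit -qm "rev $i"
done

for w in 10 50 100 250; do
    git repack -q -a -d -f --window="$w" --depth=50   # -f: recompute deltas
    pack=$(echo .git/objects/pack/*.pack)             # single pack after -a -d
    echo "window=$w pack=$(wc -c < "$pack") bytes"
done
```

The interesting output on a real repository is where the pack size stops shrinking as the window grows, since CPU cost keeps rising roughly linearly with the window.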
* Re: Git and GCC 2007-12-06 6:09 ` Linus Torvalds ` (2 preceding siblings ...) 2007-12-06 18:04 ` Git and GCC Daniel Berlin @ 2007-12-06 18:24 ` NightStrike 2007-12-06 18:45 ` Linus Torvalds 2007-12-06 19:12 ` Jon Loeliger 4 siblings, 1 reply; 90+ messages in thread From: NightStrike @ 2007-12-06 18:24 UTC (permalink / raw) To: Linus Torvalds; +Cc: Daniel Berlin, David Miller, ismail, gcc, git On 12/6/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Thu, 6 Dec 2007, Daniel Berlin wrote: > > > > Actually, it turns out that git-gc --aggressive does this dumb thing > > to pack files sometimes regardless of whether you converted from an > > SVN repo or not. > I'll send a patch to Junio to just remove the "git gc --aggressive" > documentation. It can be useful, but it generally is useful only when you > really understand at a very deep level what it's doing, and that > documentation doesn't help you do that. No disrespect is meant by this reply. I am just curious (and I am probably misunderstanding something). Why remove all of the documentation entirely? Wouldn't it be better to just document it more thoroughly? I thought you did a fine job in this post in explaining its purpose, when to use it, when not to, etc. Removing the documentation seems counter-intuitive when you've already gone to the trouble of creating good documentation here in this post. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 18:24 ` NightStrike @ 2007-12-06 18:45 ` Linus Torvalds 2007-12-07 5:36 ` NightStrike 0 siblings, 1 reply; 90+ messages in thread From: Linus Torvalds @ 2007-12-06 18:45 UTC (permalink / raw) To: NightStrike; +Cc: Daniel Berlin, David Miller, ismail, gcc, git On Thu, 6 Dec 2007, NightStrike wrote: > > No disrespect is meant by this reply. I am just curious (and I am > probably misunderstanding something).. Why remove all of the > documentation entirely? Wouldn't it be better to just document it > more thoroughly? Well, part of it is that I don't think "--aggressive" as it is implemented right now is really almost *ever* the right answer. We could change the implementation, of course, but generally the right thing to do is to not use it (tweaking the "--window" and "--depth" manually for the repacking is likely the more natural thing to do). The other part of the answer is that, when you *do* want to do what that "--aggressive" tries to achieve, it's such a special case event that while it should probably be documented, I don't think it should necessarily be documented where it is now (as part of "git gc"), but as part of a much more technical manual for "deep and subtle tricks you can play". > I thought you did a fine job in this post in explaining its purpose, > when to use it, when not to, etc. Removing the documention seems > counter-intuitive when you've already gone to the trouble of creating > good documentation here in this post. I'm so used to writing emails, and I *like* trying to explain what is going on, so I have no problems at all doing that kind of thing. However, trying to write a manual or man-page or other technical documentation is something rather different. IOW, I like explaining git within the _context_ of a discussion or a particular problem/issue. But documentation should work regardless of context (or at least set it up), and that's the part I am not so good at. 
In other words, if somebody (hint hint) thinks my explanation was good and readable, I'd love for them to try to turn it into real documentation by editing it up and creating enough context for it! But I'm not personally very likely to do that. I'd just send Junio the patch to remove a misleading part of the documentation we have. Linus ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 18:45 ` Linus Torvalds @ 2007-12-07 5:36 ` NightStrike 0 siblings, 0 replies; 90+ messages in thread From: NightStrike @ 2007-12-07 5:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: Daniel Berlin, David Miller, ismail, gcc, git On 12/6/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Thu, 6 Dec 2007, NightStrike wrote: > > > > No disrespect is meant by this reply. I am just curious (and I am > > probably misunderstanding something).. Why remove all of the > > documentation entirely? Wouldn't it be better to just document it > > more thoroughly? > > Well, part of it is that I don't think "--aggressive" as it is implemented > right now is really almost *ever* the right answer. We could change the > implementation, of course, but generally the right thing to do is to not > use it (tweaking the "--window" and "--depth" manually for the repacking > is likely the more natural thing to do). > > The other part of the answer is that, when you *do* want to do what that > "--aggressive" tries to achieve, it's such a special case event that while > it should probably be documented, I don't think it should necessarily be > documented where it is now (as part of "git gc"), but as part of a much > more technical manual for "deep and subtle tricks you can play". > > > I thought you did a fine job in this post in explaining its purpose, > > when to use it, when not to, etc. Removing the documention seems > > counter-intuitive when you've already gone to the trouble of creating > > good documentation here in this post. > > I'm so used to writing emails, and I *like* trying to explain what is > going on, so I have no problems at all doing that kind of thing. However, > trying to write a manual or man-page or other technical documentation is > something rather different. > > IOW, I like explaining git within the _context_ of a discussion or a > particular problem/issue. 
But documentation should work regardless of > context (or at least set it up), and that's the part I am not so good at. > > In other words, if somebody (hint hint) thinks my explanation was good and > readable, I'd love for them to try to turn it into real documentation by > editing it up and creating enough context for it! But I'm not personally > very likely to do that. I'd just send Junio the patch to remove a > misleading part of the documentation we have. hehe.. I'd love to, actually. I can work on it next week. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 6:09 ` Linus Torvalds ` (3 preceding siblings ...) 2007-12-06 18:24 ` NightStrike @ 2007-12-06 19:12 ` Jon Loeliger 2007-12-06 19:39 ` Linus Torvalds 2007-12-06 20:04 ` Junio C Hamano 4 siblings, 2 replies; 90+ messages in thread From: Jon Loeliger @ 2007-12-06 19:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: Daniel Berlin, David Miller, ismail, gcc, Git List On Thu, 2007-12-06 at 00:09, Linus Torvalds wrote: > Git also does delta-chains, but it does them a lot more "loosely". There > is no fixed entity. Deltas are generated against any random other version > that git deems to be a good delta candidate (with various fairly > successful heuristics), and there are absolutely no hard grouping rules. I'd like to learn more about that. Can someone point me to either more documentation on it? In the absence of that, perhaps a pointer to the source code that implements it? I guess one question I posit is, would it be more accurate to think of this as a "delta net" in a weighted graph rather than a "delta chain"? Thanks, jdl ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 19:12 ` Jon Loeliger @ 2007-12-06 19:39 ` Linus Torvalds 2007-12-07 0:29 ` Jakub Narebski 2007-12-06 20:04 ` Junio C Hamano 1 sibling, 1 reply; 90+ messages in thread From: Linus Torvalds @ 2007-12-06 19:39 UTC (permalink / raw) To: Jon Loeliger; +Cc: Daniel Berlin, David Miller, ismail, gcc, Git List On Thu, 6 Dec 2007, Jon Loeliger wrote: > > On Thu, 2007-12-06 at 00:09, Linus Torvalds wrote: > > Git also does delta-chains, but it does them a lot more "loosely". There > > is no fixed entity. Deltas are generated against any random other version > > that git deems to be a good delta candidate (with various fairly > > successful heuristics), and there are absolutely no hard grouping rules. > > I'd like to learn more about that. Can someone point me to > either more documentation on it? In the absence of that, > perhaps a pointer to the source code that implements it? Well, in a very real sense, what the delta code does is: - just list every single object in the whole repository - walk over each object, trying to find another object that it can be written as a delta against - write out the result as a pack-file That's simplified: we may not walk _all_ objects, for example: only a global repack does that (and most pack creations are actually for pushing and pulling between two repositories, so we only walk the objects that are in the source but not the destination repository). The interesting phase is the "walk each object, try to find a delta" part. In particular, you don't want to try to find a delta by comparing each object to every other object out there (that would be O(n^2) in objects, and with a fairly high constant cost too!). So what it does is to sort the objects by a few heuristics (type of object, base name that object was found as when traversing a tree and size, and how recently it was found in the history).
And then over that sorted list, it tries to find deltas between entries that are "close" to each other (and that's where the "--window=xyz" thing comes in - it says how big the window is for objects being close. A smaller window generates somewhat less good deltas, but takes a lot less effort to generate). The source is in git/builtin-pack-objects.c, with the core of it being - try_delta() - try to generate a *single* delta when given an object pair. - find_deltas() - do the actual list traversal - prepare_pack() and type_size_sort() - create the delta sort list from the list of objects. but that whole file is probably some of the more opaque parts of git. > I guess one question I posit is, would it be more accurate > to think of this as a "delta net" in a weighted graph rather > than a "delta chain"? It's certainly not a simple chain, it's more of a set of acyclic directed graphs in the object list. And yes, it's weighted by the size of the delta between objects, and the optimization problem is kind of akin to finding the smallest spanning tree (well, forest - since you do *not* want to create one large graph, you also want to make the individual trees shallow enough that you don't have excessive delta depth). There are good algorithms for finding minimum spanning trees, but this one is complicated by the fact that the biggest cost (by far!) is the calculation of the weights itself. So rather than really worry about finding the minimal tree/forest, the code needs to worry about not having to even calculate all the weights! (That, btw, is a common theme. A lot of git is about traversing graphs, like the revision graph. And most of the trivial graph problems all assume that you have the whole graph, but since the "whole graph" is the whole history of the repository, those algorithms are totally worthless, since they are fundamentally much too expensive - if we have to generate the whole history, we're already screwed for a big project.
So things like revision graph calculation, the main performance issue is to avoid having to even *look* at parts of the graph that we don't need to see!) Linus ^ permalink raw reply [flat|nested] 90+ messages in thread
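What Linus describes can be observed directly with git verify-pack; here is a minimal sketch, assuming a working git installation (the throwaway repository and its contents are invented for illustration and are not from the thread):

```shell
# Build a throwaway repository whose blobs are obvious delta candidates,
# repack it, then inspect the delta chains git actually chose.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
for i in 1 2 3 4 5; do
    seq 1 $((i * 100)) > data.txt    # successive versions of one file
    git add data.txt
    git commit -q -m "version $i"
done
# --window and --depth bound the heuristic search described above.
git repack -a -d -q --window=10 --depth=10
# verify-pack -v ends with a histogram of delta chain lengths.
git verify-pack -v .git/objects/pack/pack-*.idx | grep "chain length"
```

With only five near-identical blobs the histogram shows short chains; run the same verify-pack command on a real pack to see how deep the chains go there.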
* Re: Git and GCC 2007-12-06 19:39 ` Linus Torvalds @ 2007-12-07 0:29 ` Jakub Narebski 0 siblings, 0 replies; 90+ messages in thread From: Jakub Narebski @ 2007-12-07 0:29 UTC (permalink / raw) To: Linus Torvalds Cc: Jon Loeliger, Daniel Berlin, David Miller, Ismail Donmez, gcc, git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, 6 Dec 2007, Jon Loeliger wrote: >> I guess one question I posit is, would it be more accurate >> to think of this as a "delta net" in a weighted graph rather >> than a "delta chain"? > > It's certainly not a simple chain, it's more of a set of acyclic directed > graphs in the object list. And yes, it's weighted by the size of the delta > between objects, and the optimization problem is kind of akin to finding > the smallest spanning tree (well, forest - since you do *not* want to > create one large graph, you also want to make the individual trees shallow > enough that you don't have excessive delta depth). > > There are good algorithms for finding minimum spanning trees, but this one > is complicated by the fact that the biggest cost (by far!) is the > calculation of the weights itself. So rather than really worry about > finding the minimal tree/forest, the code needs to worry about not having > to even calculate all the weights! > > (That, btw, is a common theme. A lot of git is about traversing graphs, > like the revision graph. And most of the trivial graph problems all assume > that you have the whole graph, but since the "whole graph" is the whole > history of the repository, those algorithms are totally worthless, since > they are fundamentally much too expensive - if we have to generate the > whole history, we're already screwed for a big project. So things like > revision graph calculation, the main performance issue is to avoid having > to even *look* at parts of the graph that we don't need to see!) Hmmm...
I think that these two problems (finding a minimal spanning forest with limited depth, and traversing a graph) with the additional constraint of avoiding computing the weights / the whole graph would make a good problem to present in a CompSci course. Just a thought... -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 19:12 ` Jon Loeliger 2007-12-06 19:39 ` Linus Torvalds @ 2007-12-06 20:04 ` Junio C Hamano 2007-12-06 21:02 ` Junio C Hamano 1 sibling, 1 reply; 90+ messages in thread From: Junio C Hamano @ 2007-12-06 20:04 UTC (permalink / raw) To: Jon Loeliger Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, Git List Jon Loeliger <jdl@freescale.com> writes: > On Thu, 2007-12-06 at 00:09, Linus Torvalds wrote: > >> Git also does delta-chains, but it does them a lot more "loosely". There >> is no fixed entity. Deltas are generated against any random other version >> that git deems to be a good delta candidate (with various fairly >> successful heuristics), and there are absolutely no hard grouping rules. > > I'd like to learn more about that. Can someone point me to > either more documentation on it? In the absence of that, > perhaps a pointer to the source code that implements it? See Documentation/technical/pack-heuristics.txt, but the document predates delta reusing and does not talk about it; that was covered here: http://thread.gmane.org/gmane.comp.version-control.git/16223/focus=16267 > I guess one question I posit is, would it be more accurate > to think of this as a "delta net" in a weighted graph rather > than a "delta chain"? Yes. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 20:04 ` Junio C Hamano @ 2007-12-06 21:02 ` Junio C Hamano 2007-12-06 22:26 ` David Kastrup 0 siblings, 1 reply; 90+ messages in thread From: Junio C Hamano @ 2007-12-06 21:02 UTC (permalink / raw) To: Jon Loeliger Cc: Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, Git List Junio C Hamano <gitster@pobox.com> writes: > Jon Loeliger <jdl@freescale.com> writes: > >> I'd like to learn more about that. Can someone point me to >> either more documentation on it? In the absence of that, >> perhaps a pointer to the source code that implements it? > > See Documentation/technical/pack-heuristics.txt, A somewhat funny thing about this is ... $ git show --stat --summary b116b297 commit b116b297a80b54632256eb89dd22ea2b140de622 Author: Jon Loeliger <jdl@jdl.com> Date: Thu Mar 2 19:19:29 2006 -0600 Added Packing Heursitics IRC writeup. Signed-off-by: Jon Loeliger <jdl@jdl.com> Signed-off-by: Junio C Hamano <junkio@cox.net> Documentation/technical/pack-heuristics.txt | 466 +++++++++++++++++++++++++++ 1 files changed, 466 insertions(+), 0 deletions(-) create mode 100644 Documentation/technical/pack-heuristics.txt ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 21:02 ` Junio C Hamano @ 2007-12-06 22:26 ` David Kastrup 2007-12-06 22:38 ` [OT] " Randy Dunlap 0 siblings, 1 reply; 90+ messages in thread From: David Kastrup @ 2007-12-06 22:26 UTC (permalink / raw) To: Junio C Hamano Cc: Jon Loeliger, Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, Git List Junio C Hamano <gitster@pobox.com> writes: > Junio C Hamano <gitster@pobox.com> writes: > >> Jon Loeliger <jdl@freescale.com> writes: >> >>> I'd like to learn more about that. Can someone point me to >>> either more documentation on it? In the absence of that, >>> perhaps a pointer to the source code that implements it? >> >> See Documentation/technical/pack-heuristics.txt, > > A somewhat funny thing about this is ... > > $ git show --stat --summary b116b297 > commit b116b297a80b54632256eb89dd22ea2b140de622 > Author: Jon Loeliger <jdl@jdl.com> > Date: Thu Mar 2 19:19:29 2006 -0600 > > Added Packing Heursitics IRC writeup. Ah, fishing for compliments. The cookie baking season... -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 90+ messages in thread
* [OT] Re: Git and GCC 2007-12-06 22:26 ` David Kastrup @ 2007-12-06 22:38 ` Randy Dunlap 0 siblings, 0 replies; 90+ messages in thread From: Randy Dunlap @ 2007-12-06 22:38 UTC (permalink / raw) To: David Kastrup Cc: Junio C Hamano, Jon Loeliger, Linus Torvalds, Daniel Berlin, David Miller, ismail, gcc, Git List On Thu, 06 Dec 2007 23:26:07 +0100 David Kastrup wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > Junio C Hamano <gitster@pobox.com> writes: > > > >> Jon Loeliger <jdl@freescale.com> writes: > >> > >>> I'd like to learn more about that. Can someone point me to > >>> either more documentation on it? In the absence of that, > >>> perhaps a pointer to the source code that implements it? > >> > >> See Documentation/technical/pack-heuristics.txt, > > > > A somewhat funny thing about this is ... > > > > $ git show --stat --summary b116b297 > > commit b116b297a80b54632256eb89dd22ea2b140de622 > > Author: Jon Loeliger <jdl@jdl.com> > > Date: Thu Mar 2 19:19:29 2006 -0600 > > > > Added Packing Heursitics IRC writeup. > > Ah, fishing for compliments. The cookie baking season... Indeed. Here are some really good & sweet recipes (IMHO). http://www.xenotime.net/linux/recipes/ --- ~Randy Features and documentation: http://lwn.net/Articles/260136/ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 3:47 ` Daniel Berlin 2007-12-06 4:20 ` David Miller @ 2007-12-06 4:25 ` Harvey Harrison 2007-12-06 4:54 ` Linus Torvalds 1 sibling, 1 reply; 90+ messages in thread From: Harvey Harrison @ 2007-12-06 4:25 UTC (permalink / raw) To: Daniel Berlin; +Cc: David Miller, ismail, gcc, git I fought with this a few months ago when I did my own clone of gcc svn. My bad for only discussing this on #git at the time. Should have put this to the list as well. If anyone recalls, my report was something along the lines of: git gc --aggressive explodes pack size. git repack -a -d --depth=100 --window=100 produced a ~550MB packfile; immediately afterwards, a git gc --aggressive produced a 1.5G packfile. This was for all branches/tags, not just trunk like Daniel's repo. The best theory I had at the time was that the gc doesn't find as good deltas or doesn't allow the same delta chain depth and so generates a new object in the pack, rather than reusing a good delta it already has in the well-packed pack. Cheers, Harvey ^ permalink raw reply [flat|nested] 90+ messages in thread
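For concreteness, the comparison Harvey describes can be reproduced on an existing clone with a fragment like this (the du calls are just for measurement; absolute sizes will differ per repository, and this is not runnable standalone):

```shell
# Run inside an existing repository clone.
git repack -a -d --depth=100 --window=100
du -sh .git/objects/pack        # size after the tuned repack

git gc --aggressive --prune     # discards existing deltas, redoes everything
du -sh .git/objects/pack        # reportedly came out ~3x larger for gcc
```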
* Re: Git and GCC 2007-12-06 4:25 ` Harvey Harrison @ 2007-12-06 4:54 ` Linus Torvalds 2007-12-06 5:04 ` Harvey Harrison 0 siblings, 1 reply; 90+ messages in thread From: Linus Torvalds @ 2007-12-06 4:54 UTC (permalink / raw) To: Harvey Harrison; +Cc: Daniel Berlin, David Miller, ismail, gcc, git On Wed, 5 Dec 2007, Harvey Harrison wrote: > > If anyone recalls my report was something along the lines of > git gc --aggressive explodes pack size. Yes, --aggressive is generally a bad idea. I think we should remove it or at least fix it. It doesn't do what the name implies, because it actually throws away potentially good packing, and re-does it all from a clean slate. That said, it's totally pointless for a person who isn't a git proponent to do an initial import, and in that sense I agree with Daniel: he shouldn't waste his time with tools that he doesn't know or care about, since there are people who *can* do a better job, and who know what they are doing, and understand and like the tool. While you can do a half-assed job with just mindlessly running "git svnimport" (which is deprecated these days) or "git svn clone" (better), the fact is, to do a *good* import does likely mean spending some effort on it. Trying to make the user names / emails to be better with a mailmap, for example. [ By default, for example, "git svn clone/fetch" seems to create those horrible fake email addresses that contain the ID of the SVN repo in each commit - I'm not talking about the "git-svn-id", I'm talking about the "user@hex-string-goes-here" thing for the author. Maybe people don't really care, but isn't that ugly as hell? I'd think it's worth it doing a really nice import, spending some effort on it. But maybe those things come from the older CVS->SVN import, I don't really know. I've done a few SVN imports, but I've done them just for stuff where I didn't want to touch SVN, but just wanted to track some project like libgpod. 
For things like *that*, a totally mindless "git svn" thing is fine ] Of course, that does require there to be git people in the gcc crowd who are motivated enough to do the proper import and then make sure it's up-to-date and hosted somewhere. If those people don't exist, I'm not sure there's much idea to it. The point being, you cannot ask a non-git person to do a major git import for an actual switch-over. Yes, it *can* be as simple as just doing a git svn clone --stdlayout svn://gcc.gnu.org/svn/gcc gcc but the fact remains, you want to spend more effort and expertise on it if you actually want the result to be used as a basis for future work (as opposed to just tracking somebody else's SVN tree). That includes: - do the historic import with good packing (and no, "--aggressive" is not it, never mind the misleading name and man-page) - probably mailmap entries, certainly spending some time validating the results. - hosting it and perhaps most importantly - helping people who are *not* git users get up to speed. because doing a good job at it is like asking a CVS newbie to set up a branch in CVS. I'm sure you can do it from man-pages, but I'm also sure you sure as hell won't like the end result. Linus ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 4:54 ` Linus Torvalds @ 2007-12-06 5:04 ` Harvey Harrison 0 siblings, 0 replies; 90+ messages in thread From: Harvey Harrison @ 2007-12-06 5:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: Daniel Berlin, David Miller, ismail, gcc, git On Wed, 2007-12-05 at 20:54 -0800, Linus Torvalds wrote: > > On Wed, 5 Dec 2007, Harvey Harrison wrote: > > > > If anyone recalls my report was something along the lines of > > git gc --aggressive explodes pack size. > [ By default, for example, "git svn clone/fetch" seems to create those > horrible fake email addresses that contain the ID of the SVN repo in > each commit - I'm not talking about the "git-svn-id", I'm talking about > the "user@hex-string-goes-here" thing for the author. Maybe people don't > really care, but isn't that ugly as hell? I'd think it's worth it doing > a really nice import, spending some effort on it. > > But maybe those things come from the older CVS->SVN import, I don't > really know. I've done a few SVN imports, but I've done them just for > stuff where I didn't want to touch SVN, but just wanted to track some > project like libgpod. For things like *that*, a totally mindless "git > svn" thing is fine ] > git svn does accept a mailmap at import time with the same format as the cvs importer I think. But for someone that just wants a repo to check out this was easiest. I'd be willing to spend the time to do a nicer job if there was any interest from the gcc side, but I'm not that invested (other than owing them for an often-used tool). Harvey ^ permalink raw reply [flat|nested] 90+ messages in thread
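For reference, the authors file that git svn accepts via -A/--authors-file is a plain text mapping from SVN usernames to full Git identities. A sketch; the two entries below are illustrative, built from addresses that appear in this thread, and the SVN usernames are assumptions:

```shell
# Hypothetical authors file for git svn's -A/--authors-file option:
# each line maps one SVN username to a Git identity.
cat > authors.txt <<'EOF'
jdl = Jon Loeliger <jdl@jdl.com>
dberlin = Daniel Berlin <dberlin@dberlin.org>
EOF

# The import would then be started with (requires network access):
# git svn clone --stdlayout -A authors.txt svn://gcc.gnu.org/svn/gcc gcc.git
```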
* Re: Git and GCC 2007-12-06 2:52 ` David Miller 2007-12-06 3:47 ` Daniel Berlin @ 2007-12-06 11:57 ` Johannes Schindelin 2007-12-06 12:04 ` Ismail Dönmez 1 sibling, 1 reply; 90+ messages in thread From: Johannes Schindelin @ 2007-12-06 11:57 UTC (permalink / raw) To: David Miller; +Cc: dberlin, ismail, gcc, git Hi, On Wed, 5 Dec 2007, David Miller wrote: > From: "Daniel Berlin" <dberlin@dberlin.org> > Date: Wed, 5 Dec 2007 21:41:19 -0500 > > > It is true I gave up quickly, but this is mainly because i don't like > > to fight with my tools. > > > > I am quite fine with a distributed workflow, I now use 8 or so gcc > > branches in mercurial (auto synced from svn) and merge a lot between > > them. I wanted to see if git would sanely let me manage the commits > > back to svn. After fighting with it, i gave up and just wrote a > > python extension to hg that lets me commit non-svn changesets back to > > svn directly from hg. > > I find it ironic that you were even willing to write tools to facilitate > your hg based gcc workflow. That really shows what your thinking is on > this matter, in that you're willing to put effort towards making hg work > better for you but you're not willing to expend that level of effort to > see if git can do so as well. While this is true... > This is what really eats me from the inside about your dissatisfaction > with git. Your analysis seems to be a self-fulfilling prophecy, and > that's totally unfair to both hg and git. ... I actually appreciate people complaining -- in the meantime. It shows right away what group you belong to in the "those who can do, do; those who can't, complain" divide. You can see that very easily on the git list, or on the #git channel on irc.freenode.net. There is enough data for a study which yearns to be written, that shows how quickly we resolve issues with people that are sincerely interested in a solution.
(Of course, on the other hand, there are also quite a few cases which show how frustrating (for both sides) and unfruitful discussions started by a complaint are.) So I fully expect an issue like Daniel's to be resolved in a matter of minutes on the git list, if the OP gives us a chance. If we are not even Cc'ed, you are completely right, she or he probably does not want the issue to be resolved. Ciao, Dscho ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Git and GCC 2007-12-06 11:57 ` Johannes Schindelin @ 2007-12-06 12:04 ` Ismail Dönmez 0 siblings, 0 replies; 90+ messages in thread From: Ismail Dönmez @ 2007-12-06 12:04 UTC (permalink / raw) To: Johannes Schindelin; +Cc: David Miller, dberlin, gcc, git On Thursday 06 December 2007 13:57:06, Johannes Schindelin wrote: [...] > So I fully expect an issue like Daniel's to be resolved in a matter of > minutes on the git list, if the OP gives us a chance. If we are not even > Cc'ed, you are completely right, she or he probably does not want the > issue to be resolved. Let's be fair about this: Ollie Wild already sent a mail about git-svn disk usage, and there is no concrete solution yet, though it seems the bottleneck is known. Regards, ismail -- Never learn by your mistakes, if you do you may never dare to try again. ^ permalink raw reply [flat|nested] 90+ messages in thread
* "Argument list too long" in git remote update (Was: Git and GCC) [not found] ` <1197572755.898.15.camel@brick> @ 2007-12-17 22:15 ` Geert Bosch 2007-12-17 22:59 ` Johannes Schindelin 2007-12-17 23:01 ` Linus Torvalds 0 siblings, 2 replies; 90+ messages in thread From: Geert Bosch @ 2007-12-17 22:15 UTC (permalink / raw) To: Harvey Harrison; +Cc: Git Mailing List On Dec 13, 2007, at 14:05, Harvey Harrison wrote: > After the discussions lately regarding the gcc svn mirror. I'm coming > up with a recipe to set up your own git-svn mirror. Suggestions on > the > following. > > // Create directory and initialize git > mkdir gcc > cd gcc > git init > // add the remote site that currently mirrors gcc > // I have chosen the name gcc.gnu.org *1* as my local name to refer to > // this choose something else if you like > git remote add gcc.gnu.org git://git.infradead.org/gcc.git > // fetching someone else's remote branches is not a standard thing > to do > // so we'll need to edit our .git/config file > // you should have a section that looks like: > [remote "gcc.gnu.org"] > url = git://git.infradead.org/gcc.git > fetch = +refs/heads/*:refs/remotes/gcc.gnu.org/* > // infradead's mirror puts the gcc svn branches in its own namespace > // refs/remotes/gcc.gnu.org/* > // change our fetch line accordingly > [remote "gcc.gnu.org"] > url = git://git.infradead.org/gcc.git > fetch = +refs/remotes/gcc.gnu.org/*:refs/remotes/gcc.gnu.org/* > // fetch the remote data from the mirror site > git remote update With git version 1.5.3.6 on Mac OS X, this results in: potomac%:~/gcc%git remote update Updating gcc.gnu.org /opt/git/bin/git-fetch: line 220: /opt/git/bin/git: Argument list too long warning: no common commits [after a long wait and a good amount of network traffic] fatal: index-pack died of signal 13 fetch gcc.gnu.org: command returned error: 126 potomac%:~/gcc% Any ideas on what to do to resolve this? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: "Argument list too long" in git remote update (Was: Git and GCC) 2007-12-17 22:15 ` "Argument list too long" in git remote update (Was: Git and GCC) Geert Bosch @ 2007-12-17 22:59 ` Johannes Schindelin 2007-12-17 23:01 ` Linus Torvalds 1 sibling, 0 replies; 90+ messages in thread From: Johannes Schindelin @ 2007-12-17 22:59 UTC (permalink / raw) To: Geert Bosch; +Cc: Harvey Harrison, Git Mailing List Hi, On Mon, 17 Dec 2007, Geert Bosch wrote: > On Dec 13, 2007, at 14:05, Harvey Harrison wrote: > > After the discussions lately regarding the gcc svn mirror. I'm coming > > up with a recipe to set up your own git-svn mirror. Suggestions on the > > following. > > > > // Create directory and initialize git > > mkdir gcc > > cd gcc > > git init > > // add the remote site that currently mirrors gcc > > // I have chosen the name gcc.gnu.org *1* as my local name to refer to > > // this choose something else if you like > > git remote add gcc.gnu.org git://git.infradead.org/gcc.git > > // fetching someone else's remote branches is not a standard thing to do > > // so we'll need to edit our .git/config file > > // you should have a section that looks like: > > [remote "gcc.gnu.org"] > > url = git://git.infradead.org/gcc.git > > fetch = +refs/heads/*:refs/remotes/gcc.gnu.org/* > > // infradead's mirror puts the gcc svn branches in its own namespace > > // refs/remotes/gcc.gnu.org/* > > // change our fetch line accordingly > > [remote "gcc.gnu.org"] > > url = git://git.infradead.org/gcc.git > > fetch = +refs/remotes/gcc.gnu.org/*:refs/remotes/gcc.gnu.org/* > > // fetch the remote data from the mirror site > > git remote update > > With git version 1.5.3.6 on Mac OS X, this results in: > potomac%:~/gcc%git remote update > Updating gcc.gnu.org > /opt/git/bin/git-fetch: line 220: /opt/git/bin/git: Argument list too long > warning: no common commits > [after a long wait and a good amount of network traffic] > fatal: index-pack died of signal 13 > fetch gcc.gnu.org: command returned error: 126 > potomac%:~/gcc% > > Any ideas on what to do to resolve this? Unfortunately, the builtin remote did not make it into git's master yet, and it will probably miss 1.5.4. Chances are that this would make the bug go away, but Junio said that on one of his machines, the regression tests fail with the builtin remote. In the meantime, "git fetch gcc.gnu.org" should do what you want, methinks. Hth, Dscho ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: "Argument list too long" in git remote update (Was: Git and GCC) 2007-12-17 22:15 ` "Argument list too long" in git remote update (Was: Git and GCC) Geert Bosch 2007-12-17 22:59 ` Johannes Schindelin @ 2007-12-17 23:01 ` Linus Torvalds 2007-12-18 1:34 ` Derek Fawcus 1 sibling, 1 reply; 90+ messages in thread From: Linus Torvalds @ 2007-12-17 23:01 UTC (permalink / raw) To: Geert Bosch; +Cc: Harvey Harrison, Git Mailing List On Mon, 17 Dec 2007, Geert Bosch wrote: > > With git version 1.5.3.6 on Mac OS X, this results in: > potomac%:~/gcc%git remote update > Updating gcc.gnu.org > /opt/git/bin/git-fetch: line 220: /opt/git/bin/git: Argument list too long Oops. > Any ideas on what to do to resolve this? Can you try the current git tree? "git fetch" is built-in these days, and that old shell-script that ran "git fetch--tool" on all the refs is no more, so most likely the problem simply no longer exists. But maybe there is some way to raise the argument size limit on OS X. One thing to check is whether maybe you have an excessively big environment (just run "printenv" to see what it contains) which might be cutting down on the size allowed for arguments. Linus ^ permalink raw reply [flat|nested] 90+ messages in thread
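Linus's two checks can be run directly with standard POSIX tools; a sketch (the numbers are platform-specific, and the split between argv and environment is an approximation):

```shell
# How much room does argv actually have on this machine?
argmax=$(getconf ARG_MAX)        # kernel limit on argv + environment
envsize=$(printenv | wc -c)      # bytes already taken by the environment
echo "ARG_MAX:       $argmax bytes"
echo "environment:   $envsize bytes"
echo "left for argv: $((argmax - envsize)) bytes (roughly)"
```

An excessively large environment, as Linus notes, shows up immediately in the second number.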
* Re: "Argument list too long" in git remote update (Was: Git and GCC) 2007-12-17 23:01 ` Linus Torvalds @ 2007-12-18 1:34 ` Derek Fawcus 2007-12-18 1:52 ` Shawn O. Pearce 0 siblings, 1 reply; 90+ messages in thread From: Derek Fawcus @ 2007-12-18 1:34 UTC (permalink / raw) To: Linus Torvalds; +Cc: Geert Bosch, Harvey Harrison, Git Mailing List On Mon, Dec 17, 2007 at 03:01:25PM -0800, Linus Torvalds wrote: > > > With git version 1.5.3.6 on Mac OS X, this results in: > > potomac%:~/gcc%git remote update > > Updating gcc.gnu.org > > /opt/git/bin/git-fetch: line 220: /opt/git/bin/git: Argument list too long > But maybe there is some way to raise the argument size limit on OS X. Well, the certification for Leopard claims it can be up to 256k. I don't know about Tiger or earlier, but ARG_MAX on my 10.4 box is also (256 * 1024). So - how much do people want? Or maybe there is some sort of limit in play here? DF ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: "Argument list too long" in git remote update (Was: Git and GCC) 2007-12-18 1:34 ` Derek Fawcus @ 2007-12-18 1:52 ` Shawn O. Pearce 0 siblings, 0 replies; 90+ messages in thread From: Shawn O. Pearce @ 2007-12-18 1:52 UTC (permalink / raw) To: Derek Fawcus Cc: Linus Torvalds, Geert Bosch, Harvey Harrison, Git Mailing List Derek Fawcus <dfawcus@cisco.com> wrote: > On Mon, Dec 17, 2007 at 03:01:25PM -0800, Linus Torvalds wrote: > > > > > With git version 1.5.3.6 on Mac OS X, this results in: > > > potomac%:~/gcc%git remote update > > > Updating gcc.gnu.org > > > /opt/git/bin/git-fetch: line 220: /opt/git/bin/git: Argument list too long > > > But maybe there is some way to raise the argument size limit on OS X. > > Well the certification for Leopard claims it can be up to 256k. > > I don't know about Tiger or earlier, but ARG_MAX on my 10.4 > box is also (256 * 1024). > > So - how much do people want? Or maybe there is some sort limit in play here? I heard there's like 1000 branches in that GCC repository, not counting tags. Each branch name is at least 12 bytes or so ("refs/heads/.."). It adds up. It should be fixed in latest git (1.5.4-rc0) as that uses the new builtin-fetch implementation, which passes the lists around in memory rather than on the command line. Much faster, and doesn't suffer from argument/environment limits. -- Shawn. ^ permalink raw reply [flat|nested] 90+ messages in thread
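Shawn's "it adds up" can be made concrete with a little arithmetic; the branch count below is the figure he cites, while the average ref-name length is an assumption for illustration:

```shell
# Back-of-the-envelope: command-line space consumed by 1000 refs.
branches=1000   # rough branch count cited for the gcc mirror
avglen=40       # assumed average length of one "refs/remotes/..." name
total=$((branches * (avglen + 1)))   # +1 byte per separating space
echo "~$total bytes of ref arguments on a single command line"
```

Around 40KB of arguments plus a large environment is enough to trip the older, smaller ARG_MAX limits.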
* Re: [PATCH] gc --aggressive: make it really aggressive
@ 2007-12-06 19:07 J.C. Pizarro
0 siblings, 0 replies; 90+ messages in thread
From: J.C. Pizarro @ 2007-12-06 19:07 UTC (permalink / raw)
To: David Kastrup, Johannes Schindelin
Cc: Pierre Habouzit, Linus Torvalds, Daniel Berlin, David Miller,
ismail, gcc, git, gitster
On 2007/12/06, David Kastrup <dak@gnu.org> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > However, I think that --aggressive should be aggressive, and if you
> > decide to run it on a machine which lacks the muscle to be aggressive,
> > well, you should have known better.
>
> That's a rather cheap shot. "you should have known better" than
> expecting to be able to use a documented command and option because the
> git developers happened to have a nicer machine...
>
> _How_ is one supposed to have known better?
>
> --
> David Kastrup, Kriemhildstr. 15, 44793 Bochum
In GIT, the --aggressive option doesn't make it aggressive.
In GCC, the -Wall option doesn't enable all warnings.
It's a one-to-one tie, with similar reputations. Rest in peace.
J.C. Pizarro
^ permalink raw reply [flat|nested] 90+ messages in thread