From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steven Grimm Subject: Re: People unaware of the importance of "git gc"? Date: Wed, 05 Sep 2007 10:35:36 -0700 Message-ID: <46DEE8E8.2000801@midwinter.com> References: <69b0c0350709050947k5e32ba7fj38924a0968569d9a@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Linus Torvalds , Git Mailing List To: Govind Salinas X-From: git-owner@vger.kernel.org Wed Sep 05 19:35:44 2007 Return-path: Envelope-to: gcvg-git@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by lo.gmane.org with esmtp (Exim 4.50) id 1ISymz-0007w3-1s for gcvg-git@gmane.org; Wed, 05 Sep 2007 19:35:41 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754623AbXIERfg (ORCPT ); Wed, 5 Sep 2007 13:35:36 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753503AbXIERfg (ORCPT ); Wed, 5 Sep 2007 13:35:36 -0400 Received: from tater2.midwinter.com ([216.32.86.91]:58066 "HELO midwinter.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with SMTP id S1754015AbXIERfe (ORCPT ); Wed, 5 Sep 2007 13:35:34 -0400 Received: (qmail 6623 invoked from network); 5 Sep 2007 17:35:34 -0000 Received: from c-76-21-16-80.hsd1.ca.comcast.net (HELO pinklady.local) (koreth@76.21.16.80) by tater.midwinter.com with SMTP; 5 Sep 2007 17:35:34 -0000 User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) In-Reply-To: <69b0c0350709050947k5e32ba7fj38924a0968569d9a@mail.gmail.com> Sender: git-owner@vger.kernel.org Precedence: bulk X-Mailing-List: git@vger.kernel.org Archived-At: Govind Salinas wrote: > This is one reason why I really think that gc should be *plumbing* > and *not* porcelain. > That's a good way to think of it IMO. It's a low-level operation (albeit one that encapsulates other, lower-level ones) that tells git to rearrange its internal data structures. It is not something that has any user-visible effect. Every other porcelain-level git command *does something* from the user's point of view. Running git-gc is basically a no-op, which from the user's point of view makes it a waste of keystrokes and an annoying distraction from focusing on the stuff they're using git to help them build. > The user should never have to trigger a gc, they should even be > discouraged from doing so. That is how other gc systems are. Can you > imagine if you had a Java app that had a button on it to do a gc? > When should I push it? Should I wait till the system is getting slow > or just start spamming the button whenever I'm bored? I know that > Java/c#/py GC are different than git gc, but they fulfill the same > basic purpose as git gc. IE to clean up unused items and free up > resources. Git additionally may do some re-optimization, but that is > not relevant to a user. > I'll play devil's advocate for a moment here, though, and say that, as others have suggested in this thread, git could be made to tell you when it's appropriate to run gc. So the "I don't know when to run it" argument isn't a hard one to address. With that in mind, here's what the message should look like IMO: --- Your repository can be optimized for better performance and lower disk usage. Please run "git gc" to optimize it now, or run "git config gc.auto true" to tell git to automatically optimize it in the future (this will launch processes in the background.) For more information, "man git-gc". --- And that "gc.auto" config option (just an arbitrary name, call it something else if that's no good) actually has four settings: warn (the default) - prints the warning message, at most once every N minutes (we can determine a good value for N) true - launches git-gc in the background as needed false - suppresses the warning and the check that triggers the warning foreground - launches git-gc in the foreground as needed (to make it easier to abort) I don't buy the "git gc takes too much memory to run in the background" argument as a reason automatic git-gc is a bad idea. Many of us (me included) work on machines with plenty of memory to launch a background git-gc without hampering our development work, and/or on repositories small enough that it doesn't eat that much memory in the first place. And if you make it an option that the user has to enable, people on low-memory machines can simply not enable it, end of problem. One big problem with git-gc now is that it's not discoverable. Or rather, the need for it isn't discoverable. So at the very least we should print the warning, IMO -- and if we're already going to all the trouble to determine whether or not git-gc needs to be run, it will reduce the "why are you telling me to run something when you could just do it for me, you stupid machine?" factor if there's an easily discoverable way to just do it as needed. -Steve