From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff King Subject: Re: RFC GSoC idea: new "git config" features Date: Sat, 1 Mar 2014 02:52:47 -0500 Message-ID: <20140301075247.GF20397@sigill.intra.peff.net> References: <53108650.2020708@alum.mit.edu> <53112794.2070007@alum.mit.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: Junio C Hamano , git discussion list To: Michael Haggerty X-From: git-owner@vger.kernel.org Sat Mar 01 08:52:55 2014 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1WJej8-0005pF-Lm for gcvg-git-2@plane.gmane.org; Sat, 01 Mar 2014 08:52:55 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752504AbaCAHwu (ORCPT ); Sat, 1 Mar 2014 02:52:50 -0500 Received: from cloud.peff.net ([50.56.180.127]:59242 "HELO peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1752423AbaCAHwt (ORCPT ); Sat, 1 Mar 2014 02:52:49 -0500 Received: (qmail 25982 invoked by uid 102); 1 Mar 2014 07:52:49 -0000 Received: from c-71-63-4-13.hsd1.va.comcast.net (HELO sigill.intra.peff.net) (71.63.4.13) (smtp-auth username relayok, mechanism cram-md5) by peff.net (qpsmtpd/0.84) with ESMTPA; Sat, 01 Mar 2014 01:52:49 -0600 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Sat, 01 Mar 2014 02:52:47 -0500 Content-Disposition: inline In-Reply-To: <53112794.2070007@alum.mit.edu> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Sat, Mar 01, 2014 at 01:19:32AM +0100, Michael Haggerty wrote: > I absolutely understand that changing all of the config parsers is not > feasible. But I had imagined a third route: > > (3) parse the config once, storing the raw values to records in > memory. When an "unset" is seen, delete any previous records that > have accumulated for that key. After the whole config has been > read, iterate through the records, feeding the surviving values > into the callback in the order they were originally read (minus > deletions). > > Do you see any problems with this way of implementing the functionality > (aside from slightly increased overhead)? Yeah, this is something I have considered many times. It does have some overhead, but the existing system is not great either. As you noted, we often read the config several times for a given program invocation. But moreover, we linearly strcmp each config key we find against each one we know about. In some cases we return early if a sub-function is looking for `starts_with(key, "foo.")`, but in most cases we just look for "foo.bar", "foo.baz", and so on. If we had the keys in-memory, we could reverse this: config code asks for keys it cares about, and we can do an optimized lookup (binary search, hash, etc). That also makes many constructs easier to express. Recently we had a problem where the parsing order of "remote.pushdefault" and "branch.*.pushremote" mattered, because they were read into the same variable. The solution is to use two variables and reconcile them after all config is read. But if you can just query the config subsystem directly, the boilerplate of reading them into strings goes away, and you can just do: x = config_string_getf("branch.%s.pushremote", current_branch); if (!x) x = config_string_get("remote.pushdefault"); if (!x) x = config_string_getf("branch.%s.remote", current_branch); if (!x) x = "origin"; As it is now, the code that does this has a lot more boilerplate, and is split across several functions. Another example is the way we have to manage "deferred" config in git-status (see 84b4202). This might be more clear if we could simply `config_get_bool("status.branch")` at the right moment. The talk of efficiency is probably a red-herring here. I do not think config-reading is a significant source of slow-down in the current code. But I'd be in favor of something that reduced boilerplate and made the code easier to read. > > But the side effects these callbacks may cause are not limited to > > setting a simple scaler variable (like 'frotz' in the illustration) > > but would include things that are hard to undo once done > > (e.g. calling a set-up function with a lot of side effects). Most callbacks would convert to a query system in a pretty straightforward way, but some that have side effects might be tricky. Converting them all may be too large for a GSoC project, but I think you could do it gradually: 1. Convert the parser to read into an in-memory representation, but leave git_config() as a wrapper which iterates over it. 2. Add query functions like config_string_get() above. 3. Convert callbacks to query functions one by one. 4. Eventually drop git_config(). A GSoC project could take us partway through (3). -Peff