Re: RFC GSoC idea: new "git config" features

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Jeff King <peff@peff.net>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: Junio C Hamano <gitster@pobox.com>,
	git discussion list <git@vger.kernel.org>
Subject: Re: RFC GSoC idea: new "git config" features
Date: Sat, 1 Mar 2014 02:52:47 -0500	[thread overview]
Message-ID: <20140301075247.GF20397@sigill.intra.peff.net> (raw)
In-Reply-To: <53112794.2070007@alum.mit.edu>

On Sat, Mar 01, 2014 at 01:19:32AM +0100, Michael Haggerty wrote:

> I absolutely understand that changing all of the config parsers is not
> feasible.  But I had imagined a third route:
> 
> (3) parse the config once, storing the raw values to records in
>     memory.  When an "unset" is seen, delete any previous records that
>     have accumulated for that key.  After the whole config has been
>     read, iterate through the records, feeding the surviving values
>     into the callback in the order they were originally read (minus
>     deletions).
> 
> Do you see any problems with this way of implementing the functionality
> (aside from slightly increased overhead)?

Yeah, this is something I have considered many times. It does have some
overhead, but the existing system is not great either. As you noted, we
often read the config several times for a given program invocation.

But moreover, we linearly strcmp each config key we find against each
one we know about. In some cases we return early if a sub-function is
looking for `starts_with(key, "foo.")`, but in most cases we just look
for "foo.bar", "foo.baz", and so on.

If we had the keys in-memory, we could reverse this: config code asks
for keys it cares about, and we can do an optimized lookup (binary
search, hash, etc).

That also makes many constructs easier to express. Recently we had a
problem where the parsing order of "remote.pushdefault" and
"branch.*.pushremote" mattered, because they were read into the same
variable. The solution is to use two variables and reconcile them after
all config is read. But if you can just query the config subsystem
directly, the boilerplate of reading them into strings goes away, and
you can just do:

  x = config_string_getf("branch.%s.pushremote", current_branch);
  if (!x)
          x = config_string_get("remote.pushdefault");
  if (!x)
          x = config_string_getf("branch.%s.remote", current_branch);
  if (!x)
          x = "origin";

As it is now, the code that does this has a lot more boilerplate, and is
split across several functions.

Another example is the way we have to manage "deferred" config in
git-status (see 84b4202). This might be more clear if we could simply
`config_get_bool("status.branch")` at the right moment.

The talk of efficiency is probably a red-herring here. I do not think
config-reading is a significant source of slow-down in the current code.
But I'd be in favor of something that reduced boilerplate and made the
code easier to read.

> > But the side effects these callbacks may cause are not limited to
> > setting a simple scaler variable (like 'frotz' in the illustration)
> > but would include things that are hard to undo once done
> > (e.g. calling a set-up function with a lot of side effects).

Most callbacks would convert to a query system in a pretty
straightforward way, but some that have side effects might be tricky.
Converting them all may be too large for a GSoC project, but I think you
could do it gradually:

  1. Convert the parser to read into an in-memory representation, but
     leave git_config() as a wrapper which iterates over it.

  2. Add query functions like config_string_get() above.

  3. Convert callbacks to query functions one by one.

  4. Eventually drop git_config().

A GSoC project could take us partway through (3).

-Peff

next prev parent reply	other threads:[~2014-03-01  7:52 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-28 12:51 RFC GSoC idea: new "git config" features Michael Haggerty
2014-02-28 20:00 ` Junio C Hamano
2014-03-01  0:19   ` Michael Haggerty
2014-03-01  7:52     ` Jeff King [this message]
2014-03-01 11:01       ` Matthieu Moy
2014-03-14  4:43         ` Jeff King
2014-03-14 21:00           ` Junio C Hamano
2014-03-03 20:07       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140301075247.GF20397@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).