git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* RFC GSoC idea: new "git config" features
@ 2014-02-28 12:51 Michael Haggerty
  2014-02-28 20:00 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Haggerty @ 2014-02-28 12:51 UTC (permalink / raw)
  To: git discussion list

I just wrote up another double-idea that has been stewing in my head for
a while:

* Allow configuration values to be unset via a config file
* Fix "git config --unset" to clean up detritus from sections that are
left empty.

These ideas are more "out there" than the last, and might be too
controversial to be implemented, let alone as a GSoC project.  I'd
definitely like some feedback.

And if you like this idea or the other one I proposed, please volunteer
to be a co-mentor!  I will be traveling for a few weeks this summer, so
I *won't* be able to be the sole mentor to a student.

I wrote up this idea in the following pull request:

    https://github.com/git/git.github.io/pull/6

I will also append the text, for your mailing-list-reading convenience.

Michael

### `git config` improvements

This project proposes the following two improvements to `git config`.
Please note that this project has a significant "political" component
to it, because some of the details of the features will be
controversial.

#### Unsetting configuration options

Some Git configuration options have an effect by their mere existence.
(I.e., setting the option to "false" or the empty string is different
than leaving it unset altogether.)  Also, some configuration options
can take multiple values.

However, there is no way for an option file to "unset" an option--that
is, to change the option back to "unset".  This is awkward, because
configuration values are read from various places (`/etc/gitconfig`,
`~/.config/git/config` or `~/.gitconfig`, and `$GIT_DIR/config`, plus
perhaps files that are included by other configuration files).
Therefore, if an option is set in one of the earlier files, there is
no way for it to be unset in a later one.  The unwanted option might
have even been set in a file like `/etc/gitconfig` that the user
doesn't have permission to modify.

It would be nice to have a syntax that can be used to unset any
previously-defined values for an option.  Perhaps

    [section "subsection"]
            !option

The above is currently currently a syntax error that causes Git to
terminate, so some thought has to go into a transition plan for
enabling this feature.  Maybe a syntax has to be invented that
conforms to the current format, like

    [unset]
            name = section.subsection.option

Because options are currently processed as they are read, this change
will require the code that reads options files to be changed
significantly.

Leave yourself a lot of time to attain a consensus on the mailing list
about how this can be done while retaining reasonable backwards
compatibility.

#### Tidy configuration files

When a configuration file is repeatedly modified, often garbage is
left behind.  For example, after

    git config my.option true
    git config --unset my.option
    git config my.option true
    git config --unset my.option

the bottom of the configuration file is left with the useless lines

    [my]
    [my]

It would be nice to clean up such garbage when rewriting the
configuration file.  This project is a bit tricky because of the
possible presence of comments.  For example, what if an empty section
looks like this:

    [my]
            # This section is for my own private settings

or this:

    [my]
    # This section is for my own private settings

or this:

    # This section is for my own private settings:
    [my]

?  In some such cases it might be desireable either to retain the
section even though it is empty, or to delete the comment along with
the section.  Very likely there will be some obvious patterns when
everybody agrees that an empty section can be deleted, and other, more
controversial cases where you will have to reach a consensus on the
mailing list about what should be done.

 - Language: C
 - Difficulty: medium
 - Possible mentors: Michael Haggerty


-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC GSoC idea: new "git config" features
  2014-02-28 12:51 RFC GSoC idea: new "git config" features Michael Haggerty
@ 2014-02-28 20:00 ` Junio C Hamano
  2014-03-01  0:19   ` Michael Haggerty
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2014-02-28 20:00 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git discussion list

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I just wrote up another double-idea that has been stewing in my head for
> a while:
>
> * Allow configuration values to be unset via a config file
> * Fix "git config --unset" to clean up detritus from sections that are
> left empty.

The former is *way* too large for a GSoC project.  Most
configuration variables are meant to be read sequencially and affect
in-core variables directly, like

        /* file-scope global */
	static int frotz = -1;  /* unset */

        static int parse_config_frotz(const char *key, const char *value, void *cb)
	{
        	if (!strcmp(key, "core.frotz"))
                	frotz = git_config_int(value);
		return 0;
	}

	... and somewhere ...
        	git_config(parse_config_frotz, NULL);

The config parsers are distributed and there is no single registry
that knows how in-core variables owned by each subsystem represent
an "unset" value.  In the above example, -1 is such a sentinel
value, but in some other contexts, the subsystem may choose to use
INT_MAX.  The only way to allow "resetting to previous" is to

 (1) come up with a way to pass "this key is being reset to
     'unspecified'" to existing git_config() callback functions
     (like parse_config_frotz() in the above illustration), which
     may or may not involve changing the function signature of the
     callbacks;

 (2) go through all the git_config() callback functions and make
     them understand the new "reset to 'unspecified'" convention.

which may not sound too bad at the first glance (especially, the
first one is almost trivial).

But the side effects these callbacks may cause are not limited to
setting a simple scaler variable (like 'frotz' in the illustration)
but would include things that are hard to undo once done
(e.g. calling a set-up function with a lot of side effects).

The latter, on the other hand, should be a change that is of a
fairly limited scope, and would be a good fit for a GSoC project
(incidentally, it has been one of the items on my leftover-bits list
http://git-blame.blogspot.com/p/leftover-bits.html for quite some
time).

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC GSoC idea: new "git config" features
  2014-02-28 20:00 ` Junio C Hamano
@ 2014-03-01  0:19   ` Michael Haggerty
  2014-03-01  7:52     ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Haggerty @ 2014-03-01  0:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git discussion list

On 02/28/2014 09:00 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> I just wrote up another double-idea that has been stewing in my head for
>> a while:
>>
>> * Allow configuration values to be unset via a config file
>> * Fix "git config --unset" to clean up detritus from sections that are
>> left empty.
> 
> The former is *way* too large for a GSoC project.  Most
> configuration variables are meant to be read sequencially and affect
> in-core variables directly, like
> 
>         /* file-scope global */
> 	static int frotz = -1;  /* unset */
> 
>         static int parse_config_frotz(const char *key, const char *value, void *cb)
> 	{
>         	if (!strcmp(key, "core.frotz"))
>                 	frotz = git_config_int(value);
> 		return 0;
> 	}
> 
> 	... and somewhere ...
>         	git_config(parse_config_frotz, NULL);
> 
> The config parsers are distributed and there is no single registry
> that knows how in-core variables owned by each subsystem represent
> an "unset" value.  In the above example, -1 is such a sentinel
> value, but in some other contexts, the subsystem may choose to use
> INT_MAX.  The only way to allow "resetting to previous" is to
> 
>  (1) come up with a way to pass "this key is being reset to
>      'unspecified'" to existing git_config() callback functions
>      (like parse_config_frotz() in the above illustration), which
>      may or may not involve changing the function signature of the
>      callbacks;
> 
>  (2) go through all the git_config() callback functions and make
>      them understand the new "reset to 'unspecified'" convention.

I absolutely understand that changing all of the config parsers is not
feasible.  But I had imagined a third route:

(3) parse the config once, storing the raw values to records in
    memory.  When an "unset" is seen, delete any previous records that
    have accumulated for that key.  After the whole config has been
    read, iterate through the records, feeding the surviving values
    into the callback in the order they were originally read (minus
    deletions).

Do you see any problems with this way of implementing the functionality
(aside from slightly increased overhead)?

And once we have a way to store config records in memory, it might also
make sense to reuse the parsed values for later config inquiries (after
checking that the files have not changed since the last read).  After
this second step the net performance change might even be advantageous.

> which may not sound too bad at the first glance (especially, the
> first one is almost trivial).
> 
> But the side effects these callbacks may cause are not limited to
> setting a simple scaler variable (like 'frotz' in the illustration)
> but would include things that are hard to undo once done
> (e.g. calling a set-up function with a lot of side effects).
> 
> The latter, on the other hand, should be a change that is of a
> fairly limited scope, and would be a good fit for a GSoC project
> (incidentally, it has been one of the items on my leftover-bits list
> http://git-blame.blogspot.com/p/leftover-bits.html for quite some
> time).

But only the latter part would be a bit meager as a GSoC project, don't
you think?

Thanks for the feedback.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC GSoC idea: new "git config" features
  2014-03-01  0:19   ` Michael Haggerty
@ 2014-03-01  7:52     ` Jeff King
  2014-03-01 11:01       ` Matthieu Moy
  2014-03-03 20:07       ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Jeff King @ 2014-03-01  7:52 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list

On Sat, Mar 01, 2014 at 01:19:32AM +0100, Michael Haggerty wrote:

> I absolutely understand that changing all of the config parsers is not
> feasible.  But I had imagined a third route:
> 
> (3) parse the config once, storing the raw values to records in
>     memory.  When an "unset" is seen, delete any previous records that
>     have accumulated for that key.  After the whole config has been
>     read, iterate through the records, feeding the surviving values
>     into the callback in the order they were originally read (minus
>     deletions).
> 
> Do you see any problems with this way of implementing the functionality
> (aside from slightly increased overhead)?

Yeah, this is something I have considered many times. It does have some
overhead, but the existing system is not great either. As you noted, we
often read the config several times for a given program invocation.

But moreover, we linearly strcmp each config key we find against each
one we know about. In some cases we return early if a sub-function is
looking for `starts_with(key, "foo.")`, but in most cases we just look
for "foo.bar", "foo.baz", and so on.

If we had the keys in-memory, we could reverse this: config code asks
for keys it cares about, and we can do an optimized lookup (binary
search, hash, etc).

That also makes many constructs easier to express. Recently we had a
problem where the parsing order of "remote.pushdefault" and
"branch.*.pushremote" mattered, because they were read into the same
variable. The solution is to use two variables and reconcile them after
all config is read. But if you can just query the config subsystem
directly, the boilerplate of reading them into strings goes away, and
you can just do:

  x = config_string_getf("branch.%s.pushremote", current_branch);
  if (!x)
          x = config_string_get("remote.pushdefault");
  if (!x)
          x = config_string_getf("branch.%s.remote", current_branch);
  if (!x)
          x = "origin";

As it is now, the code that does this has a lot more boilerplate, and is
split across several functions.

Another example is the way we have to manage "deferred" config in
git-status (see 84b4202). This might be more clear if we could simply
`config_get_bool("status.branch")` at the right moment.

The talk of efficiency is probably a red-herring here. I do not think
config-reading is a significant source of slow-down in the current code.
But I'd be in favor of something that reduced boilerplate and made the
code easier to read.

> > But the side effects these callbacks may cause are not limited to
> > setting a simple scaler variable (like 'frotz' in the illustration)
> > but would include things that are hard to undo once done
> > (e.g. calling a set-up function with a lot of side effects).

Most callbacks would convert to a query system in a pretty
straightforward way, but some that have side effects might be tricky.
Converting them all may be too large for a GSoC project, but I think you
could do it gradually:

  1. Convert the parser to read into an in-memory representation, but
     leave git_config() as a wrapper which iterates over it.

  2. Add query functions like config_string_get() above.

  3. Convert callbacks to query functions one by one.

  4. Eventually drop git_config().

A GSoC project could take us partway through (3).

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC GSoC idea: new "git config" features
  2014-03-01  7:52     ` Jeff King
@ 2014-03-01 11:01       ` Matthieu Moy
  2014-03-14  4:43         ` Jeff King
  2014-03-03 20:07       ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Matthieu Moy @ 2014-03-01 11:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, Junio C Hamano, git discussion list

Jeff King <peff@peff.net> writes:

> If we had the keys in-memory, we could reverse this: config code asks
> for keys it cares about, and we can do an optimized lookup (binary
> search, hash, etc).

I'm actually dreaming of a system where a configuration variable could
be "declared" in Git's source code, with associated type (list/single
value, boolean/string/path/...), default value and documentation (and
then Documentation/config.txt could become a generated file). One could
imagine a lot of possibilities like

$ git config --describe <some-variable>
Type: boolean
Default value: true
Description: ...

Somehow, do for config variables what has been done for command-line
option parsing.

Migrating the whole code to such system would take time, but creating
the system and applying it to a few examples might be feasible as a GSoC
project.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC GSoC idea: new "git config" features
  2014-03-01  7:52     ` Jeff King
  2014-03-01 11:01       ` Matthieu Moy
@ 2014-03-03 20:07       ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2014-03-03 20:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, git discussion list

Jeff King <peff@peff.net> writes:

> Most callbacks would convert to a query system in a pretty
> straightforward way, but some that have side effects might be tricky.
> Converting them all may be too large for a GSoC project, but I think you
> could do it gradually:
>
>   1. Convert the parser to read into an in-memory representation, but
>      leave git_config() as a wrapper which iterates over it.
>
>   2. Add query functions like config_string_get() above.
>
>   3. Convert callbacks to query functions one by one.
>
>   4. Eventually drop git_config().
>
> A GSoC project could take us partway through (3).

I actually discarded the "read from these config files to preparsed
structure to memory, later to be consumed by repeated calls to the
git_config() callback functions, making the only difference from the
current scheme that the preparsed structure will be reset when there
is the new 'reset to the original' definition" as obvious and
uninteresting.

This is one of these times that I find myself blessed with capable
others that can go beyond, building on top of such an idea that I
may have discarded without thinking it through, around me ;-)

Yes, the new abstraction like config_<type>_get() that can live
alongside the existing "git_config() feeds callback chain
everything" and gradually replace the latter, would be a good way
forward.  Given that we read configuration multiple times anyway for
different purposes, even without the new abstraction, the end result
might perform better if we read the files once and reused in later
calls to git_config().

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC GSoC idea: new "git config" features
  2014-03-01 11:01       ` Matthieu Moy
@ 2014-03-14  4:43         ` Jeff King
  2014-03-14 21:00           ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2014-03-14  4:43 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Michael Haggerty, Junio C Hamano, git discussion list

On Sat, Mar 01, 2014 at 12:01:44PM +0100, Matthieu Moy wrote:

> Jeff King <peff@peff.net> writes:
> 
> > If we had the keys in-memory, we could reverse this: config code asks
> > for keys it cares about, and we can do an optimized lookup (binary
> > search, hash, etc).
> 
> I'm actually dreaming of a system where a configuration variable could
> be "declared" in Git's source code, with associated type (list/single
> value, boolean/string/path/...), default value and documentation (and
> then Documentation/config.txt could become a generated file). One could
> imagine a lot of possibilities like

Yes, I think something like that would be very nice. I am not a big
fan of code generation, but if we had config queries like
"config_get_bool", then I think it would be reasonably pleasant to take
a spec like:

  Key: help.browser
  Type: string
  Description: Specify the browser for help...

and turn it into:

  const char *config_get_help_browser(void)
  {
          return config_get_string("help.browser");
  }

So technically code generation, but all the heavy lifting is done behind
the scenes. We're not saving lines in the result so much as avoiding
repeating ourselves (that is, the generated code is only mapping the
config-type from the spec into a C type and function name that gives us
extra compile-time safety).

However, I skimmed through config.txt looking for a key to use in my
example above, and there are a surprising number of one-off semantics
(e.g., things that are mostly bool, but can be "auto" or take some other
special value). We may find that the "Type" field has a surprising
number of variants that makes a technique like this annoying. But I'd
reserve judgement until somebody actually tries encoding a significant
chunk of the config keys and we see what it looks like.

> Migrating the whole code to such system would take time, but creating
> the system and applying it to a few examples might be feasible as a GSoC
> project.

Agreed, as long as we have enough examples to feel confident that the
infrastructure is sufficient.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC GSoC idea: new "git config" features
  2014-03-14  4:43         ` Jeff King
@ 2014-03-14 21:00           ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2014-03-14 21:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Matthieu Moy, Michael Haggerty, git discussion list

Jeff King <peff@peff.net> writes:

> On Sat, Mar 01, 2014 at 12:01:44PM +0100, Matthieu Moy wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > If we had the keys in-memory, we could reverse this: config code asks
>> > for keys it cares about, and we can do an optimized lookup (binary
>> > search, hash, etc).
>> 
>> I'm actually dreaming of a system where a configuration variable could
>> be "declared" in Git's source code, with associated type (list/single
>> value, boolean/string/path/...), default value and documentation (and
>> then Documentation/config.txt could become a generated file). One could
>> imagine a lot of possibilities like
>
> Yes, I think something like that would be very nice. ...
> ...
>> Migrating the whole code to such system would take time, but creating
>> the system and applying it to a few examples might be feasible as a GSoC
>> project.
>
> Agreed, as long as we have enough examples to feel confident that the
> infrastructure is sufficient.

I agree that it would give us a lot of enhancement opportunities if
we had a central catalog of what the supported configuration
variables are and what semantics (e.g. type, multi-value-ness, etc.)
they have.

One thing we need to be careful about is that we still must support
random configuration items that git-core does not care about at all
but scripts (and future versions of git-core) read off of, though.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-03-14 21:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-28 12:51 RFC GSoC idea: new "git config" features Michael Haggerty
2014-02-28 20:00 ` Junio C Hamano
2014-03-01  0:19   ` Michael Haggerty
2014-03-01  7:52     ` Jeff King
2014-03-01 11:01       ` Matthieu Moy
2014-03-14  4:43         ` Jeff King
2014-03-14 21:00           ` Junio C Hamano
2014-03-03 20:07       ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).