git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Tanay Abhra <tanayabh@gmail.com>
To: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>
Cc: git@vger.kernel.org, Ramkumar Ramachandra <artagnon@gmail.com>,
	Karsten Blees <karsten.blees@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCH v4 1/2] add `config_set` API for caching config files
Date: Wed, 02 Jul 2014 17:28:59 +0530	[thread overview]
Message-ID: <53B3F403.6030501@gmail.com> (raw)
In-Reply-To: <vpqoax8m8bh.fsf@anie.imag.fr>



On 7/2/2014 2:44 PM, Matthieu Moy wrote:
> Tanay Abhra <tanayabh@gmail.com> writes:
> 
>> Add a `config_set` construct that points to an ordered set of config files
>> specified by the user, each of which represents what was read and cached
>> in core as hashmaps. Add two external functions `git_configset_get_value`
>> and `git_configset_get_value_multi` for querying from the config sets,
>> which follow `last one wins` semantic(i.e. if there are multiple matches
> 
> Space before (
> 

Noted. Sorry about that.

>> for the queried key in the files of the configset the value returned will
>> the last value in the last matching file of the configset) Add type

> Does this remark also apply to git_configset_get_value_multi? My
> understanding was that "last one wins" applied only to
> git_configset_get_value.
> 

Maybe a reworded sentence may work,
`git_configset_get_value_multi` returns a list of values sorted in order of
increasing priority (i.e. last match will be at the end of the list.)

>> Add a default `config_set`, `the_config_set` to cache all key-value pairs
> 
> I don't like the name "the_config_set". It's not the only one. Perhaps
> repo_config_set? (not totally satisfactory as it does not contain only
> the repo, but the repo+user+system)
> 
> What do others think?
> 
>> read from usual config files(repo specific .git/config, user wide
>> ~/.gitconfig and the global /etc/gitconfig). `the_config_set` uses a
>> single hashmap populated using git_config(). The reason for doing so is
>> twofold, one is to honour include directives, another is to guarantee O(1)
>> lookups for usual config values, as values are queried for hundred of
>> times during a run of a git program.
> 
> What is the reason to deal with `the_config_set` and other config sets
> differently? You're giving arguments to store `the_config_set` as a
> single hashmap, but what is the reason to store others as multiple
> hashmaps?
> 
> And actually, I don't completely buy your arguments: having 3 or 4
> hashmaps (.git/config, ~/.gitconfig, ~/.config/git/config and
> /etc/gitconfig) would be a O(4) lookup, which is still O(1), and you
> could deal with include directives by having ".git/config and included
> files" in a hashmap, "~/.gitconfig and included files" in a second
> hashmap, ...
> 
> My guess is that the real argument is "git_config already does what I
> want and I'm too lazy to change it". And I do consider lazyness as a
> quality for a computer-scientist ;-).
> 
> I would personally find it much simpler to have a single hashmap. We'd
> lose the ability to invalidate the cache for only a single file, but I'm
> not sure it's worth the code complexity.
> 

Point noted. I can take the multiple hashmap route for `the_config_set`.
Still, since it will be the most used config set in the code and for each
query it would have to look through n hashmaps hampering performance. I
can change it if you want but like you, I don't think it is worth the code
complexity.

>> +Querying For Specific Variables
>> +-------------------------------
>> +
>> +For programs wanting to query for specific variables in a non-callback
>> +manner, the config API provides two functions `git_config_get_value`
>> +and `git_config_get_value_multi`. They both read values from an internal
>> +`git_config_iter` gives a way to iterate over the keys in cache. Existing
>> +`git_config` callback function signature is used for iterating.
> 
> "Existing" refers to the old state. It's OK in a commit message, but
> won't make sense in the future, when your non-callback API and the old
> callback API will live side by side.
>

Noted.

>> +The config API also provides type specific API functions which do conversion
>> +as well as retrieval for the queried variable, including:
>> +
>> +`git_config_get_int`::
>> +Parse the retrieved value to an integer, including unit factors. Dies on
>> +error; otherwise, allocates and copies the integer into the dest parameter.
>> +Returns 0 on success, or 1 if no value was found.
> 
> It seems strange to me to return 1 here, and -1 in git_config_get_value
> to mean the same thing.
> 

Noted. Some of the type specific function return -1 on wrong parsing, I will
put return value 1 for no value found in all of the cases.

>> +`git_config_bool`::
>> +/* config-hash.c */

>> +#define CONFIG_SET_INIT { NULL, 0, 0 }
>> +
>> +struct config_set {
>> +	struct config_set_item *item;
>> +	unsigned int nr, alloc;
>> +};
>> +
>> +struct config_set_item {
>> +	struct hashmap config_hash;
>> +	char *filename;
>> +static struct hashmap *add_config_hash(const char *filename, struct config_set *cs)
>> +{
>> +	int ret = 0;
>> +	struct config_set_item *ct;
>> +	struct config_set_cb cb;
>> +	ALLOC_GROW(cs->item, cs->nr + 1, cs->alloc);
>> +	ct = &cs->item[cs->nr++];
>> +	ct->filename = xstrdup(filename);
>> +	config_hash_init(&ct->config_hash);
>> +	cb.cs = cs;
>> +	cb.ct = ct;
>> +	/*
>> +	 * When filename is "default", hashmap is constructed from the usual set of
>> +	 * config files (i.e repo specific .git/config, user wide ~/.gitconfig and the
>> +	 * global /etc/gitconfig), used in `the_config_set`
>> +	 */
>> +	if (!strcmp(filename, "default"))
> 
> How does one read a file actually called "default" with this API?
> 
> More generally, why do you need a special-case here? I think it would be
> much better to leave this part unaware of the default, and have a
> separate function like create_repo_config_hash (or
> create_the_config_hash) that would call git_config(...). There actually
> isn't much shared code between the "default" case and the others in your
> function.
>

Noted, a separate function would be much better.

>> +static struct hashmap *get_config_hash(const char *filename, struct config_set *cs)
>> +{
>> +	int i;
>> +	for(i = 0; i < cs->nr; i++) {
>> +		if (!strcmp(cs->item[i].filename, filename))
>> +			return &cs->item[i].config_hash;
>> +	}
>> +	return add_config_hash(filename, cs);
>> +}
>> +
>> +static struct config_hash_entry *config_hash_find_entry(const char *key, const char* filename,
>> +							struct config_set *cs)
> 
> I don't get the logic here.
> 
> Either the caller explicitly manages a config_set variable like
> 
>   config_set my_set = git_configset_init();
>   git_configset_add_file(my_set, "my-file");
>   git_configset_get_value(my_set, "some.variable.name");
> 
> Or there's an implicit singleton mapping files to hashmaps to allow the
> user to say
> 
>   git_configset_get_value("my-file", "some.variable.name");
> 
> without asking the user to explicitly manipulate config_set variables.
> 
> It seems to me that your solution is a bad compromise between both
> solutions: you do require the user to manipulate config_set variables,
> but you also require a filename to look for the right entry in the list.
> 
> Did you miss the end of Junio's message:
> 
>   http://thread.gmane.org/gmane.comp.version-control.git/252567
> 
> ?
>

This is an internal function which is used to iterate over every `config_set`
item which contains a hashmap and filename as an identifier.
The exposed API which I documented above doesn't require the user to pass the
filename when querying for a value.

The API works like this, suppose there are two files,
A.config	B.config
[foo]		[foo]
a = b		a = d
a = c		a = e

config_set *my_set = git_configset_init();
git_configset_add_file(my_set, "A.config");
git_configset_add_file(my_set, "B.config");
git_configset_get_value(my_set, "foo.a");

Here get_value calls config_hash_find_entry once for each configset_item
"A.config" and "B.config" with key as "foo.a". which in turn returns
config_hash_item which contains the key and the value list.

We get two string_lists containing values for the key foo.a,
{b,c} and {d,e} corresponding to A.config & B.config  respectively.

get_value then returns the value with the highest priority which is 'e'
after going through both of the lists and picking the last one.
These steps are done by internal functions not by the user facing
API ones.

> If the user explicitly asks for a single entry in the list, then why
> group them in a list? I guess the question is the same as above, and the
> more I read the patch, the more I think a config_set should contain a
> single hashmap.
>

Do you think so after reading the whole patch? It would be much easier
to implement config_set using a single hashmap . But Junio had suggested
that it would be much better for config_set to be a collection of hashmaps
and `the_config_set` to contain a single hashmap[1].

[1]http://thread.gmane.org/gmane.comp.version-control.git/252329/focus=252460

>> +static int config_hash_add_value(const char *key, const char *value,
>> +				 struct config_set_item *ct, struct config_set *cs)
>> +{
>> +	struct hashmap *config_hash = &ct->config_hash;
>> +	struct config_hash_entry *e;
>> +	e = config_hash_find_entry(key, ct->filename, cs);
>> +
>> +	if (!e) {
>> +		e = xmalloc(sizeof(*e));
>> +		hashmap_entry_init(e, strhash(key));
>> +		e->key = xstrdup(key);
>> +		memset(&e->value_list, 0, sizeof(e->value_list));
>> +		e->value_list.strdup_strings = 1;
>> +		hashmap_add(config_hash, e);
>> +	}
>> +	/*
>> +	 * Since the values are being fed by git_config*() callback mechanism, they
>> +	 * are already normalised.
>> +int git_config_get_value(const char *key, const char **value)
>> +{
>> +	if (!the_config_set_initialised) {
>> +		git_configset_add_file(&the_config_set, "default");
>> +		the_config_set_initialised = 1;
>> +	}
>> +	return git_configset_get_value(&the_config_set, key, value);
>> +}
>> +
>> +const struct string_list *git_config_get_value_multi(const char *key)
>> +{
>> +	if (!the_config_set_initialised) {
>> +		git_configset_add_file(&the_config_set, "default");
>> +		the_config_set_initialised = 1;
>> +	}
>> +	return git_configset_get_value_multi(&the_config_set, key);
>> +}
> 
> These if statements could be refactored, and the_config_set_initialised
> could be hidden to the caller. Just make a function get_the_config_set()
> that initializes if needed and returns a pointer to the_config_set.
>

Noted. Thanks for the quick review.

Tanay.

  reply	other threads:[~2014-07-02 11:59 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-02  6:01 [PATCH v4 0/3] git config cache & special querying api utilizing the cache Tanay Abhra
2014-07-02  6:01 ` [PATCH v4 1/2] add `config_set` API for caching config files Tanay Abhra
2014-07-02  9:14   ` Matthieu Moy
2014-07-02 11:58     ` Tanay Abhra [this message]
2014-07-02 14:32       ` Matthieu Moy
2014-07-02 17:09     ` Junio C Hamano
2014-07-04  4:58     ` Tanay Abhra
2014-07-04  9:17       ` Matthieu Moy
2014-07-04  9:25         ` Tanay Abhra
2014-07-04  9:43           ` Matthieu Moy
2014-07-07 17:02           ` Junio C Hamano
2014-07-02 17:00   ` Junio C Hamano
2014-07-02 17:09     ` Matthieu Moy
2014-07-03 17:05     ` Tanay Abhra
2014-07-02  6:01 ` [PATCH 2/2] test-config: Add tests for the config_set API Tanay Abhra
2014-07-02  9:29   ` Matthieu Moy
2014-07-02 12:04     ` Tanay Abhra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53B3F403.6030501@gmail.com \
    --to=tanayabh@gmail.com \
    --cc=Matthieu.Moy@grenoble-inp.fr \
    --cc=artagnon@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=karsten.blees@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).