git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>
To: Tanay Abhra <tanayabh@gmail.com>
Cc: git@vger.kernel.org, Ramkumar Ramachandra <artagnon@gmail.com>,
	Karsten Blees <karsten.blees@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCH v4 1/2] add `config_set` API for caching config files
Date: Wed, 02 Jul 2014 11:14:10 +0200	[thread overview]
Message-ID: <vpqoax8m8bh.fsf@anie.imag.fr> (raw)
In-Reply-To: <1404280905-26763-2-git-send-email-tanayabh@gmail.com> (Tanay Abhra's message of "Tue, 1 Jul 2014 23:01:44 -0700")

Tanay Abhra <tanayabh@gmail.com> writes:

> Add a `config_set` construct that points to an ordered set of config files
> specified by the user, each of which represents what was read and cached
> in core as hashmaps. Add two external functions `git_configset_get_value`
> and `git_configset_get_value_multi` for querying from the config sets,
> which follow `last one wins` semantic(i.e. if there are multiple matches

Space before (

> for the queried key in the files of the configset the value returned will
> the last value in the last matching file of the configset) Add type

will _be_ ?

Does this remark also apply to git_configset_get_value_multi? My
understanding was that "last one wins" applied only to
git_configset_get_value.

> Add a default `config_set`, `the_config_set` to cache all key-value pairs

I don't like the name "the_config_set". It's not the only one. Perhaps
repo_config_set? (not totally satisfactory as it does not contain only
the repo, but the repo+user+system)

What do others think?

> read from usual config files(repo specific .git/config, user wide

(You always forget space before '('...)

> ~/.gitconfig and the global /etc/gitconfig). `the_config_set` uses a
> single hashmap populated using git_config(). The reason for doing so is
> twofold, one is to honour include directives, another is to guarantee O(1)
> lookups for usual config values, as values are queried for hundred of
> times during a run of a git program.

What is the reason to deal with `the_config_set` and other config sets
differently? You're giving arguments to store `the_config_set` as a
single hashmap, but what is the reason to store others as multiple
hashmaps?

And actually, I don't completely buy your arguments: having 3 or 4
hashmaps (.git/config, ~/.gitconfig, ~/.config/git/config and
/etc/gitconfig) would be a O(4) lookup, which is still O(1), and you
could deal with include directives by having ".git/config and included
files" in a hashmap, "~/.gitconfig and included files" in a second
hashmap, ...

My guess is that the real argument is "git_config already does what I
want and I'm too lazy to change it". And I do consider lazyness as a
quality for a computer-scientist ;-).

I would personally find it much simpler to have a single hashmap. We'd
lose the ability to invalidate the cache for only a single file, but I'm
not sure it's worth the code complexity.

> +Querying For Specific Variables
> +-------------------------------
> +
> +For programs wanting to query for specific variables in a non-callback
> +manner, the config API provides two functions `git_config_get_value`
> +and `git_config_get_value_multi`.They both read values from an internal

Space after .

> +`git_config_iter` gives a way to iterate over the keys in cache. Existing
> +`git_config` callback function signature is used for iterating.

"Existing" refers to the old state. It's OK in a commit message, but
won't make sense in the future, when your non-callback API and the old
callback API will live side by side.

> +The config API also provides type specific API functions which do conversion
> +as well as retrieval for the queried variable, including:
> +
> +`git_config_get_int`::
> +Parse the retrieved value to an integer, including unit factors. Dies on
> +error; otherwise, allocates and copies the integer into the dest parameter.
> +Returns 0 on success, or 1 if no value was found.

It seems strange to me to return 1 here, and -1 in git_config_get_value
to mean the same thing.

> +`git_config_bool`::

Isn't it git_config_get_bool?

> +/* config-hash.c */

You may point to the documentation in the comment too.

> +#define CONFIG_SET_INIT { NULL, 0, 0 }
> +
> +struct config_set {
> +	struct config_set_item *item;
> +	unsigned int nr, alloc;
> +};
> +
> +struct config_set_item {
> +	struct hashmap config_hash;
> +	char *filename;

Perhaps const char *?

> +static struct hashmap *add_config_hash(const char *filename, struct config_set *cs)
> +{
> +	int ret = 0;
> +	struct config_set_item *ct;
> +	struct config_set_cb cb;
> +	ALLOC_GROW(cs->item, cs->nr + 1, cs->alloc);
> +	ct = &cs->item[cs->nr++];
> +	ct->filename = xstrdup(filename);
> +	config_hash_init(&ct->config_hash);
> +	cb.cs = cs;
> +	cb.ct = ct;
> +	/*
> +	 * When filename is "default", hashmap is constructed from the usual set of
> +	 * config files (i.e repo specific .git/config, user wide ~/.gitconfig and the
> +	 * global /etc/gitconfig), used in `the_config_set`
> +	 */
> +	if (!strcmp(filename, "default"))

How does one read a file actually called "default" with this API?

More generally, why do you need a special-case here? I think it would be
much better to leave this part unaware of the default, and have a
separate function like create_repo_config_hash (or
create_the_config_hash) that would call git_config(...). There actually
isn't much shared code between the "default" case and the others in your
function.

> +static struct hashmap *get_config_hash(const char *filename, struct config_set *cs)
> +{
> +	int i;
> +	for(i = 0; i < cs->nr; i++) {
> +		if (!strcmp(cs->item[i].filename, filename))
> +			return &cs->item[i].config_hash;
> +	}
> +	return add_config_hash(filename, cs);
> +}
> +
> +static struct config_hash_entry *config_hash_find_entry(const char *key, const char* filename,
> +							struct config_set *cs)

I don't get the logic here.

Either the caller explicitly manages a config_set variable like

  config_set my_set = git_configset_init();
  git_configset_add_file(my_set, "my-file");
  git_configset_get_value(my_set, "some.variable.name");

Or there's an implicit singleton mapping files to hashmaps to allow the
user to say

  git_configset_get_value("my-file", "some.variable.name");

without asking the user to explicitly manipulate config_set variables.

It seems to me that your solution is a bad compromise between both
solutions: you do require the user to manipulate config_set variables,
but you also require a filename to look for the right entry in the list.

Did you miss the end of Junio's message:

  http://thread.gmane.org/gmane.comp.version-control.git/252567

?

If the user explicitly asks for a single entry in the list, then why
group them in a list? I guess the question is the same as above, and the
more I read the patch, the more I think a config_set should contain a
single hashmap.

> +static int config_hash_add_value(const char *key, const char *value,
> +				 struct config_set_item *ct, struct config_set *cs)
> +{
> +	struct hashmap *config_hash = &ct->config_hash;
> +	struct config_hash_entry *e;
> +	e = config_hash_find_entry(key, ct->filename, cs);
> +
> +	if (!e) {
> +		e = xmalloc(sizeof(*e));
> +		hashmap_entry_init(e, strhash(key));
> +		e->key = xstrdup(key);
> +		memset(&e->value_list, 0, sizeof(e->value_list));
> +		e->value_list.strdup_strings = 1;
> +		hashmap_add(config_hash, e);
> +	}
> +	/*
> +	 * Since the values are being fed by git_config*() callback mechanism, they
> +	 * are already normalised.

We usually speak en_US rather than en_UK => normalized.

> +int git_config_get_value(const char *key, const char **value)
> +{
> +	if (!the_config_set_initialised) {
> +		git_configset_add_file(&the_config_set, "default");
> +		the_config_set_initialised = 1;
> +	}
> +	return git_configset_get_value(&the_config_set, key, value);
> +}
> +
> +const struct string_list *git_config_get_value_multi(const char *key)
> +{
> +	if (!the_config_set_initialised) {
> +		git_configset_add_file(&the_config_set, "default");
> +		the_config_set_initialised = 1;
> +	}
> +	return git_configset_get_value_multi(&the_config_set, key);
> +}

These if statements could be refactored, and the_config_set_initialised
could be hidden to the caller. Just make a function get_the_config_set()
that initializes if needed and returns a pointer to the_config_set.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

  reply	other threads:[~2014-07-02  9:14 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-02  6:01 [PATCH v4 0/3] git config cache & special querying api utilizing the cache Tanay Abhra
2014-07-02  6:01 ` [PATCH v4 1/2] add `config_set` API for caching config files Tanay Abhra
2014-07-02  9:14   ` Matthieu Moy [this message]
2014-07-02 11:58     ` Tanay Abhra
2014-07-02 14:32       ` Matthieu Moy
2014-07-02 17:09     ` Junio C Hamano
2014-07-04  4:58     ` Tanay Abhra
2014-07-04  9:17       ` Matthieu Moy
2014-07-04  9:25         ` Tanay Abhra
2014-07-04  9:43           ` Matthieu Moy
2014-07-07 17:02           ` Junio C Hamano
2014-07-02 17:00   ` Junio C Hamano
2014-07-02 17:09     ` Matthieu Moy
2014-07-03 17:05     ` Tanay Abhra
2014-07-02  6:01 ` [PATCH 2/2] test-config: Add tests for the config_set API Tanay Abhra
2014-07-02  9:29   ` Matthieu Moy
2014-07-02 12:04     ` Tanay Abhra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=vpqoax8m8bh.fsf@anie.imag.fr \
    --to=matthieu.moy@grenoble-inp.fr \
    --cc=artagnon@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=karsten.blees@gmail.com \
    --cc=sunshine@sunshineco.com \
    --cc=tanayabh@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).