git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
	git discussion list <git@vger.kernel.org>,
	Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>
Subject: Re: RFC GSoC idea: git configuration caching (needs co-mentor!)
Date: Thu, 6 Mar 2014 14:46:54 -0500	[thread overview]
Message-ID: <20140306194654.GA28203@sigill.intra.peff.net> (raw)
In-Reply-To: <xmqqtxbbxh99.fsf@gitster.dls.corp.google.com>

On Thu, Mar 06, 2014 at 11:24:18AM -0800, Junio C Hamano wrote:

> > * Add new API calls that allow the cache to be inquired easily and
> >   efficiently.  Rewrite other functions like `git_config_int()` to be
> >   cache-aware.
> 
> Are you sure about the second sentence of this item is what you
> want?
> 
> git_config_<type>(name, value) are all about parsing "value" (string
> or NULL) as <type>, return the parsed value or complain against a
> bad value for "name".  They do not care where these "name" and
> "value" come from right now, and there is no reason for them to
> start caring about caching.  They will still be the underlying
> helper functions the git_config() callbacks will depend on even
> after the second item in your list happens.

Yeah, I agree we want a _new_ set of helpers for retrieving values in a
non-callback way. We could call those helpers "git_config_int" (and
rename the existing pure functions), but I'd rather not, as it simply
invites confusion with the existing ones.

> A set of new API calls would look more like this, I would think:
> 
> 	extern int git_get_config_string_multi(const char *, int *, const char ***);

Not important at this stage, but I was hoping we could keep the names of
the new helpers shorter. :)

> 	const char *git_get_config_string(const char *name)
>         {
> 		const char **values, *result;
>                 int num_values;
> 
> 	        if (git_get_config_string_multi("sample.str", &num_values, &values))
>         		return NULL;
>                 result = num_values ? values[num_values - 1] : NULL;
>                 free(values);
> 		return result;
> 	}
> 
> that implements the "last one wins" semantics.  The real thing would
> need to avoid allocation and free overhead.

One of the things that needs to be figured out by the student is the
format of the internal cache. I had actually envisioned a mapping of
keys to values, where values are represented as a full list of strings.
Then your "string_multi" can just return a pointer to that list, and a
last-one-wins lookup can grab the final value, with no allocation or
ownership complexity. We'd lose the relative order of different config
keys, but those should never be important (only the order of single
keys, but that is reflected in the order of the value list).

Another approach would be to actually represent the syntax tree of the
config file in memory. That would make lookups of individual keys more
expensive, but would enable other manipulation. E.g., if your syntax
tree included nodes for comments and other non-semantic constructs, then
we can use it for a complete rewrite. And "git config" becomes:

  1. Read the tree.

  2. Perform operations on the tree (add nodes, delete nodes, etc).

  3. Write out the tree.

and things like "remove the section header when the last item in the
section is removed" become trivial during step 2.

But comparing those approaches is something for the student to figure
out, I think.

-Peff

  reply	other threads:[~2014-03-06 19:47 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-06  5:57 RFC GSoC idea: git configuration caching (needs co-mentor!) Michael Haggerty
2014-03-06 19:24 ` Junio C Hamano
2014-03-06 19:46   ` Jeff King [this message]
2014-03-06 21:33   ` Michael Haggerty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140306194654.GA28203@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=Matthieu.Moy@grenoble-inp.fr \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).