git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: David Turner <dturner@twopensource.com>
Cc: git@vger.kernel.org, pclouds@gmail.com,
	Ramsay Jones <ramsay@ramsayjones.plus.com>
Subject: Re: [PATCH v6 03/19] index-helper: new daemon for caching index and related stuff
Date: Thu, 28 Apr 2016 11:58:37 -0700	[thread overview]
Message-ID: <xmqq60v15zmq.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <1461787481-877-4-git-send-email-dturner@twopensource.com> (David Turner's message of "Wed, 27 Apr 2016 16:04:25 -0400")

David Turner <dturner@twopensource.com> writes:

> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ...
> The biggest gain is not having to verify the trailing SHA-1, which
> takes lots of time especially on large index files. But this also
> opens doors for further optimiztions:

optimizAtion

> Git can poke the daemon via unix domain sockets to tell it to refresh
> the index cache, or to keep it alive some more minutes. It can't give
> any real index data directly to the daemon. Real data goes to disk
> first, then the daemon reads and verifies it from there. Poking only
> happens for $GIT_DIR/index, not temporary index files.

Is this limited to "poking", or the helper daemon is not involved in
codepaths that handle temporary index at all?  It makes sense if it
is the latter, and it doesn't if it were the former, but it is
unclear in this paragraph.

> $GIT_DIR/index-helper.sock is the socket for the daemon process. The
> daemon reads from the socket and executes commands.
>
> Named pipes were considered for portability reasons, but then commands
> that need replies from the daemon would have open their own pipes,

"would have TO open"?

> since a named pipe should only have one reader.  Unix domain sockets
> don't have this problem.

> diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
> new file mode 100644
> index 0000000..77687c0
> --- /dev/null
> +++ b/Documentation/git-index-helper.txt
> @@ -0,0 +1,47 @@
> +git-index-helper(1)
> +===================
> +
> +NAME
> +----
> +git-index-helper - A simple cache daemon for speeding up index file access
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git index-helper' [options]
> +
> +DESCRIPTION
> +-----------
> +Keep the index file in memory for faster access. This daemon is per
> +repository.

Not an objection but a question.  Does this mean "per index file"?

What the users would need to be aware of is that it is not like
running a single daemon instance helps the toplevel project with its
entire 600 submodules checked out you have--you would need that many
instances of the helper, which is already conveyed well with "per
repository".  But I was wondering if it helps users experimenting
with the multiple worktree feature if we said "per index file".  It
would make it more clear that you would run two instances of the
helper daemon when you use another worktree in addition to your
primary repository.

> +... The following commands are used to control the daemon:
> +
> +"refresh"::
> +	Reread the index.
> +
> +"poke":
> +	Let the daemon know the index is to be read. It keeps the
> +	daemon alive longer, unless `--exit-after=0` is used.
> +
> +All commands and replies are terminated by a 0 byte.

s/0/NUL/

> diff --git a/cache.h b/cache.h
> index 4180e2b..43fb314 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -334,6 +334,8 @@ struct index_state {
>  	struct cache_time timestamp;
>  	unsigned name_hash_initialized : 1,
>  		 keep_mmap : 1,
> +		 from_shm : 1,
> +		 to_shm : 1,

keep_mmap bit had its own commit, which made it more-or-less
unnecessary to have a separate explanation, but it is unclear how
these two bits are envisioned to be used...

> diff --git a/index-helper.c b/index-helper.c
> new file mode 100644
> index 0000000..976e913
> --- /dev/null
> +++ b/index-helper.c
> @@ -0,0 +1,285 @@
> +#include "cache.h"
> +#include "parse-options.h"
> +#include "sigchain.h"
> +#include "strbuf.h"
> +#include "exec_cmd.h"
> +#include "split-index.h"
> +#include "lockfile.h"
> +#include "cache.h"
> +#include "unix-socket.h"
> +#include "pkt-line.h"
> +
> +struct shm {
> +	unsigned char sha1[20];
> +	void *shm;
> +	size_t size;
> +};
> +
> +static struct shm shm_index;
> +static struct shm shm_base_index;
> +
> +static void release_index_shm(struct shm *is)
> +{
> +	if (!is->shm)
> +		return;
> +	munmap(is->shm, is->size);
> +	unlink(git_path("shm-index-%s", sha1_to_hex(is->sha1)));
> +	is->shm = NULL;
> +}
> +
> +static void cleanup_shm(void)
> +{
> +	release_index_shm(&shm_index);
> +	release_index_shm(&shm_base_index);
> +}
> +
> +static void cleanup(void)
> +{
> +	unlink(git_path("index-helper.sock"));
> +	cleanup_shm();
> +}
> +
> +static void cleanup_on_signal(int sig)
> +{
> +	/* We ignore sigpipes -- that's just a client being broken. */

Is it really "a broken" client, in which case we would want to know
about the breakage so that we can fix it?  Or is it a case of "a
client wasn't interested in what we had to say, which is perfectly
fine"?

> +	if (sig == SIGPIPE)
> +		return;
> +	cleanup();
> +	sigchain_pop(sig);
> +	raise(sig);
> +}
> +
> +static int shared_mmap_create(int file_flags, int file_mode, size_t size,
> +			      void **new_mmap, int mmap_prot, int mmap_flags,
> +			      const char *path)
> +{

I can understand size, new_mmap and path, but do all others need the
customizability?  It is hard to tell at this point in the series as
there is only one caller, which gives a reasonable set of values to
these parameters.  And each of all these "reasonable values" is the
only reasonable value for each of these parameters-- you would ever
want to have mmap_prot without PROT_WRIT as this is creating an
empty region that is mapped to a newly created file, for example.
It is inconceivable that a function named "shared_mmap_create" ever
passes mmap_flags that lack MAP_SHARED, for another example, or
file_flags that is not O_CREAT|O_EXCL|O_RDWR for that matter.

> + ...
> +	if (ftruncate(fd, size))
> +		goto done;
> +	*new_mmap = mmap(NULL, size, mmap_prot, mmap_flags, fd, 0);
> ...
> +static void share_index(struct index_state *istate, struct shm *is)
> +{
> +	void *new_mmap;
> +	if (istate->mmap_size <= 20 ||
> +	    hashcmp(istate->sha1,
> +		    (unsigned char *)istate->mmap + istate->mmap_size - 20) ||
> +	    !hashcmp(istate->sha1, is->sha1) ||
> +	    shared_mmap_create(O_CREAT | O_EXCL | O_RDWR, 0700,
> +			       istate->mmap_size, &new_mmap,
> +			       PROT_READ | PROT_WRITE, MAP_SHARED,
> +			       git_path("shm-index-%s",
> +					sha1_to_hex(istate->sha1))) < 0)
> +		return;
> +	release_index_shm(is);
> +	is->size = istate->mmap_size;
> +	is->shm = new_mmap;
> +	hashcpy(is->sha1, istate->sha1);

So "is" is expected to be a valid instance of shm that was obtained
previously.  When we fail to map and return, does it make sense to
retain it?  Because this function does not return any value, the
caller wouldn't be able to tell that shared-mmap-create failed, and
will keep using that stale instance of "is"--I cannot tell if that
is OK in the caller of this thing or not from what I read so far in
this series...

> +static void refresh(void)
> +{
> +	discard_index(&the_index);
> +	the_index.keep_mmap = 1;
> +	the_index.to_shm    = 1;

Again, mysterious to_shm appears here...

> +		/* Wait for a request */
> +		FD_ZERO(&readfds);
> +		FD_SET(fd, &readfds);
> +		result = select(fd + 1, &readfds, NULL, NULL, timeout_p);

I admit that I am old fashioned and am more familiar with select(2)
than poll(2), but I think we try to avoid select(2) in new code.

  reply	other threads:[~2016-04-28 18:58 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-27 20:04 [PATCH v6 00/19] index-helper/watchman David Turner
2016-04-27 20:04 ` [PATCH v6 01/19] read-cache.c: fix constness of verify_hdr() David Turner
2016-04-27 20:04 ` [PATCH v6 02/19] read-cache: allow to keep mmap'd memory after reading David Turner
2016-04-27 20:04 ` [PATCH v6 03/19] index-helper: new daemon for caching index and related stuff David Turner
2016-04-28 18:58   ` Junio C Hamano [this message]
2016-04-29 18:46     ` David Turner
2016-04-29 20:47       ` Junio C Hamano
2016-05-01  0:22       ` Duy Nguyen
2016-04-27 20:04 ` [PATCH v6 04/19] index-helper: add --strict David Turner
2016-04-27 20:04 ` [PATCH v6 05/19] index-helper: log warnings David Turner
2016-04-27 20:04 ` [PATCH v6 06/19] daemonize(): set a flag before exiting the main process David Turner
2016-04-27 20:04 ` [PATCH v6 07/19] index-helper: add --detach David Turner
2016-04-27 20:04 ` [PATCH v6 08/19] read-cache: add watchman 'WAMA' extension David Turner
2016-04-27 20:04 ` [PATCH v6 09/19] Add watchman support to reduce index refresh cost David Turner
2016-04-27 20:04 ` [PATCH v6 10/19] index-helper: use watchman to avoid refreshing index with lstat() David Turner
2016-04-27 20:04 ` [PATCH v6 11/19] update-index: enable/disable watchman support David Turner
2016-04-27 20:04 ` [PATCH v6 12/19] unpack-trees: preserve index extensions David Turner
2016-04-27 20:04 ` [PATCH v6 13/19] watchman: add a config option to enable the extension David Turner
2016-04-27 20:04 ` [PATCH v6 14/19] index-helper: kill mode David Turner
2016-04-27 20:04 ` [PATCH v6 15/19] index-helper: don't run if already running David Turner
2016-04-27 20:04 ` [PATCH v6 16/19] index-helper: autorun mode David Turner
2016-04-27 20:04 ` [PATCH v6 17/19] index-helper: optionally automatically run David Turner
2016-04-27 20:04 ` [PATCH v6 18/19] Add tracing to measure where most of the time is spent David Turner
2016-04-27 20:04 ` [PATCH v6 19/19] untracked-cache: config option David Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq60v15zmq.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=ramsay@ramsayjones.plus.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).