git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: David Turner <dturner@twopensource.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, pclouds@gmail.com,
	Ramsay Jones <ramsay@ramsayjones.plus.com>
Subject: Re: [PATCH v6 03/19] index-helper: new daemon for caching index and related stuff
Date: Fri, 29 Apr 2016 14:46:57 -0400	[thread overview]
Message-ID: <1461955617.4123.37.camel@twopensource.com> (raw)
In-Reply-To: <xmqq60v15zmq.fsf@gitster.mtv.corp.google.com>

On Thu, 2016-04-28 at 11:58 -0700, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> > From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> > ...
> > The biggest gain is not having to verify the trailing SHA-1, which
> > takes lots of time especially on large index files. But this also
> > opens doors for further optimiztions:
> 
> optimizAtion
> 
> > Git can poke the daemon via unix domain sockets to tell it to
> > refresh
> > the index cache, or to keep it alive some more minutes. It can't
> > give
> > any real index data directly to the daemon. Real data goes to disk
> > first, then the daemon reads and verifies it from there. Poking
> > only
> > happens for $GIT_DIR/index, not temporary index files.
> 
> Is this limited to "poking", or the helper daemon is not involved in
> codepaths that handle temporary index at all?  It makes sense if it
> is the latter, and it doesn't if it were the former, but it is
> unclear in this paragraph.

It is in fact the latter.  Will clarify.

> > $GIT_DIR/index-helper.sock is the socket for the daemon process.
> > The
> > daemon reads from the socket and executes commands.
> > 
> > Named pipes were considered for portability reasons, but then
> > commands
> > that need replies from the daemon would have open their own pipes,
> 
> "would have TO open"?

Oops.

> > since a named pipe should only have one reader.  Unix domain
> > sockets
> > don't have this problem.
> 
> > diff --git a/Documentation/git-index-helper.txt b/Documentation/git
> > -index-helper.txt
> > new file mode 100644
> > index 0000000..77687c0
> > --- /dev/null
> > +++ b/Documentation/git-index-helper.txt
> > @@ -0,0 +1,47 @@
> > +git-index-helper(1)
> > +===================
> > +
> > +NAME
> > +----
> > +git-index-helper - A simple cache daemon for speeding up index
> > file access
> > +
> > +SYNOPSIS
> > +--------
> > +[verse]
> > +'git index-helper' [options]
> > +
> > +DESCRIPTION
> > +-----------
> > +Keep the index file in memory for faster access. This daemon is
> > per
> > +repository.
> 
> Not an objection but a question.  Does this mean "per index file"?
>
> What the users would need to be aware of is that it is not like
> running a single daemon instance helps the toplevel project with its
> entire 600 submodules checked out you have--you would need that many
> instances of the helper, which is already conveyed well with "per
> repository".  But I was wondering if it helps users experimenting
> with the multiple worktree feature if we said "per index file".  It
> would make it more clear that you would run two instances of the
> helper daemon when you use another worktree in addition to your
> primary repository.

Per working tree/repository, yes (not quite per-index because working
trees with split indexes have two).  But yeah, that's confusing; will
clarify.

> > +... The following commands are used to control the daemon:
> > +
> > +"refresh"::
> > +	Reread the index.
> > +
> > +"poke":
> > +	Let the daemon know the index is to be read. It keeps the
> > +	daemon alive longer, unless `--exit-after=0` is used.
> > +
> > +All commands and replies are terminated by a 0 byte.
> 
> s/0/NUL/

Will fix.

> > diff --git a/cache.h b/cache.h
> > index 4180e2b..43fb314 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -334,6 +334,8 @@ struct index_state {
> >  	struct cache_time timestamp;
> >  	unsigned name_hash_initialized : 1,
> >  		 keep_mmap : 1,
> > +		 from_shm : 1,
> > +		 to_shm : 1,
> 
> keep_mmap bit had its own commit, which made it more-or-less
> unnecessary to have a separate explanation, but it is unclear how
> these two bits are envisioned to be used...

Will add comments.

> > diff --git a/index-helper.c b/index-helper.c
> > new file mode 100644
> > index 0000000..976e913
> > --- /dev/null
> > +++ b/index-helper.c
> > @@ -0,0 +1,285 @@
> > +#include "cache.h"
> > +#include "parse-options.h"
> > +#include "sigchain.h"
> > +#include "strbuf.h"
> > +#include "exec_cmd.h"
> > +#include "split-index.h"
> > +#include "lockfile.h"
> > +#include "cache.h"
> > +#include "unix-socket.h"
> > +#include "pkt-line.h"
> > +
> > +struct shm {
> > +	unsigned char sha1[20];
> > +	void *shm;
> > +	size_t size;
> > +};
> > +
> > +static struct shm shm_index;
> > +static struct shm shm_base_index;
> > +
> > +static void release_index_shm(struct shm *is)
> > +{
> > +	if (!is->shm)
> > +		return;
> > +	munmap(is->shm, is->size);
> > +	unlink(git_path("shm-index-%s", sha1_to_hex(is->sha1)));
> > +	is->shm = NULL;
> > +}
> > +
> > +static void cleanup_shm(void)
> > +{
> > +	release_index_shm(&shm_index);
> > +	release_index_shm(&shm_base_index);
> > +}
> > +
> > +static void cleanup(void)
> > +{
> > +	unlink(git_path("index-helper.sock"));
> > +	cleanup_shm();
> > +}
> > +
> > +static void cleanup_on_signal(int sig)
> > +{
> > +	/* We ignore sigpipes -- that's just a client being
> > broken. */
> 
> Is it really "a broken" client, in which case we would want to know
> about the breakage so that we can fix it?  Or is it a case of "a
> client wasn't interested in what we had to say, which is perfectly
> fine"?

It's usually a client that's killed at an inopportune moment e.g.:
git diff 
^C 
#oops I meant 
git diff --cached

I guess I'll reword to "suddenly dying".


> > +	if (sig == SIGPIPE)
> > +		return;
> > +	cleanup();
> > +	sigchain_pop(sig);
> > +	raise(sig);
> > +}
> > +
> > +static int shared_mmap_create(int file_flags, int file_mode,
> > size_t size,
> > +			      void **new_mmap, int mmap_prot, int
> > mmap_flags,
> > +			      const char *path)
> > +{
> 
> I can understand size, new_mmap and path, but do all others need the
> customizability?  It is hard to tell at this point in the series as
> there is only one caller, which gives a reasonable set of values to
> these parameters.  And each of all these "reasonable values" is the
> only reasonable value for each of these parameters-- you would ever
> want to have mmap_prot without PROT_WRIT as this is creating an
> empty region that is mapped to a newly created file, for example.
> It is inconceivable that a function named "shared_mmap_create" ever
> passes mmap_flags that lack MAP_SHARED, for another example, or
> file_flags that is not O_CREAT|O_EXCL|O_RDWR for that matter.

Yeah, good point.

> > + ...
> > +	if (ftruncate(fd, size))
> > +		goto done;
> > +	*new_mmap = mmap(NULL, size, mmap_prot, mmap_flags, fd,
> > 0);
> > ...
> > +static void share_index(struct index_state *istate, struct shm
> > *is)
> > +{
> > +	void *new_mmap;
> > +	if (istate->mmap_size <= 20 ||
> > +	    hashcmp(istate->sha1,
> > +		    (unsigned char *)istate->mmap + istate
> > ->mmap_size - 20) ||
> > +	    !hashcmp(istate->sha1, is->sha1) ||
> > +	    shared_mmap_create(O_CREAT | O_EXCL | O_RDWR, 0700,
> > +			       istate->mmap_size, &new_mmap,
> > +			       PROT_READ | PROT_WRITE, MAP_SHARED,
> > +			       git_path("shm-index-%s",
> > +					sha1_to_hex(istate
> > ->sha1))) < 0)
> > +		return;
> > +	release_index_shm(is);
> > +	is->size = istate->mmap_size;
> > +	is->shm = new_mmap;
> > +	hashcpy(is->sha1, istate->sha1);
> 
> So "is" is expected to be a valid instance of shm that was obtained
> previously.  When we fail to map and return, does it make sense to
> retain it?  Because this function does not return any value, the
> caller wouldn't be able to tell that shared-mmap-create failed, and
> will keep using that stale instance of "is"--I cannot tell if that
> is OK in the caller of this thing or not from what I read so far in
> this series...

Honestly, if we fail to mmap, we should probably just die.  That's not
a normal circumstance, and there's not much point in keeping a non
-functioning index-helper around.

> > +static void refresh(void)
> > +{
> > +	discard_index(&the_index);
> > +	the_index.keep_mmap = 1;
> > +	the_index.to_shm    = 1;
> 
> Again, mysterious to_shm appears here...

I expect that the explanation of to_shm will help here.

> > +		/* Wait for a request */
> > +		FD_ZERO(&readfds);
> > +		FD_SET(fd, &readfds);
> > +		result = select(fd + 1, &readfds, NULL, NULL,
> > timeout_p);
> 
> I admit that I am old fashioned and am more familiar with select(2)
> than poll(2), but I think we try to avoid select(2) in new code.

I don't see that in Documentation/CodingGuidelines (although I admit
that I did not even consider checking it before writing this code, so
if it had been there, I would have missed it).

Why do we try to avoid select(2)?  It seems marginally less efficient
for small numbers of fds, and might not scale to large numbers of fds,
but we're not using it in a performance-critical way, and we only care
about one fd at a time.

  reply	other threads:[~2016-04-29 18:47 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-27 20:04 [PATCH v6 00/19] index-helper/watchman David Turner
2016-04-27 20:04 ` [PATCH v6 01/19] read-cache.c: fix constness of verify_hdr() David Turner
2016-04-27 20:04 ` [PATCH v6 02/19] read-cache: allow to keep mmap'd memory after reading David Turner
2016-04-27 20:04 ` [PATCH v6 03/19] index-helper: new daemon for caching index and related stuff David Turner
2016-04-28 18:58   ` Junio C Hamano
2016-04-29 18:46     ` David Turner [this message]
2016-04-29 20:47       ` Junio C Hamano
2016-05-01  0:22       ` Duy Nguyen
2016-04-27 20:04 ` [PATCH v6 04/19] index-helper: add --strict David Turner
2016-04-27 20:04 ` [PATCH v6 05/19] index-helper: log warnings David Turner
2016-04-27 20:04 ` [PATCH v6 06/19] daemonize(): set a flag before exiting the main process David Turner
2016-04-27 20:04 ` [PATCH v6 07/19] index-helper: add --detach David Turner
2016-04-27 20:04 ` [PATCH v6 08/19] read-cache: add watchman 'WAMA' extension David Turner
2016-04-27 20:04 ` [PATCH v6 09/19] Add watchman support to reduce index refresh cost David Turner
2016-04-27 20:04 ` [PATCH v6 10/19] index-helper: use watchman to avoid refreshing index with lstat() David Turner
2016-04-27 20:04 ` [PATCH v6 11/19] update-index: enable/disable watchman support David Turner
2016-04-27 20:04 ` [PATCH v6 12/19] unpack-trees: preserve index extensions David Turner
2016-04-27 20:04 ` [PATCH v6 13/19] watchman: add a config option to enable the extension David Turner
2016-04-27 20:04 ` [PATCH v6 14/19] index-helper: kill mode David Turner
2016-04-27 20:04 ` [PATCH v6 15/19] index-helper: don't run if already running David Turner
2016-04-27 20:04 ` [PATCH v6 16/19] index-helper: autorun mode David Turner
2016-04-27 20:04 ` [PATCH v6 17/19] index-helper: optionally automatically run David Turner
2016-04-27 20:04 ` [PATCH v6 18/19] Add tracing to measure where most of the time is spent David Turner
2016-04-27 20:04 ` [PATCH v6 19/19] untracked-cache: config option David Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1461955617.4123.37.camel@twopensource.com \
    --to=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=ramsay@ramsayjones.plus.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).