git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: David Turner <dturner@twopensource.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 04/19] index-helper: new daemon for caching index and related stuff
Date: Fri, 11 Mar 2016 08:11:57 +0700	[thread overview]
Message-ID: <CACsJy8CT7x3p2TLvMbJ2OkNbd=PZ7bxZQJv1wSU1j_8s826J2A@mail.gmail.com> (raw)
In-Reply-To: <1457641341.13557.32.camel@twopensource.com>

On Fri, Mar 11, 2016 at 3:22 AM, David Turner <dturner@twopensource.com> wrote:
> Reads (ignoring SHA verification) will be slightly slower (due to the
> btree overhead).  If, in general, we only had to read part of the
> index, that would be faster. But a fair amount of git code is written
> to assume that it is reasonable to iterate over the whole index.  So I
> don't see a win from just switching to LMDB without other changes.

I didn't think of the btree overhead. Will need to run some test.. I
think lmdb just gives us pointers to mmap'd file, so an entire index
reading can be fast. We just record where the on-disk data is. We need
to make index-on-lmdb use native byte endian though to avoid
conversion and actually reading all index entries because of that.

> (I guess we could parallelize index deserialization more easily with
> lmdb, but I don't know
>
> The original intent of the index was to be a "staging area" to build up
> a tree before committing it.  This implies an alternate view of the
> index as a diff against some (possibly-empty) tree.  An index diff or
> status operation, then, just requires iterating over the changes.
>
> I haven't really dug into merges to understand whether it would be
> reasonable for them to use a representation like this, and what that
> would do to speed there.

I think we keep the same internal representation of the index, a list
of paths with extra info like stat and some flags. Minimal impacts to
the code base that way. There are some cases (I think) where we just
need to read a portion of the index (filtered by pathspec). If we
don't update the index afterwards, lmdb partial reads definitely help.
I need to check if those cases are common (and worth optimizing for)
though.

> In general, git takes the commit side of the commit-diff duality. This
> is generally a good thing, because commits are simpler to reason about.
> But I wonder if we could do better for this specific case.
>
> But I don't think switching to LMDB would replace index-helper.

The point of index-helper is reduce reading cost (let's leave watchman
out for now). But it looks like we're not certain if using lmdb can
reduce reading cost at lower complexity. I'll give it some thought
over the weekend. Maybe in the end we better just go with
index-helper...
-- 
Duy

  reply	other threads:[~2016-03-11  1:12 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-09 18:36 [PATCH 00/19] index-helper, watchman David Turner
2016-03-09 18:36 ` [PATCH 01/19] trace.c: add GIT_TRACE_PACK_STATS for pack usage statistics David Turner
2016-03-09 22:58   ` Junio C Hamano
2016-03-10  0:05     ` David Turner
2016-03-10 10:59       ` Duy Nguyen
2016-03-09 18:36 ` [PATCH 02/19] read-cache.c: fix constness of verify_hdr() David Turner
2016-03-09 18:36 ` [PATCH 03/19] read-cache: allow to keep mmap'd memory after reading David Turner
2016-03-09 23:02   ` Junio C Hamano
2016-03-10  0:09     ` David Turner
2016-03-09 18:36 ` [PATCH 04/19] index-helper: new daemon for caching index and related stuff David Turner
2016-03-09 23:09   ` Junio C Hamano
2016-03-09 23:21     ` Junio C Hamano
2016-03-10  0:01       ` David Turner
2016-03-10 11:17       ` Duy Nguyen
2016-03-10 20:22         ` David Turner
2016-03-11  1:11           ` Duy Nguyen [this message]
2016-03-10  0:18     ` David Turner
2016-03-15 11:56     ` Duy Nguyen
2016-03-15 15:56       ` Junio C Hamano
2016-03-15 11:52   ` Duy Nguyen
2016-03-09 18:36 ` [PATCH 05/19] trace.c: add GIT_TRACE_INDEX_STATS for index statistics David Turner
2016-03-09 18:36 ` [PATCH 06/19] index-helper: add --strict David Turner
2016-03-09 18:36 ` [PATCH 07/19] daemonize(): set a flag before exiting the main process David Turner
2016-03-09 18:36 ` [PATCH 08/19] index-helper: add --detach David Turner
2016-03-09 18:36 ` [PATCH 09/19] index-helper: add Windows support David Turner
2016-03-16 11:42   ` Duy Nguyen
2016-03-17 12:18     ` Johannes Schindelin
2016-03-17 12:59       ` Duy Nguyen
2016-03-09 18:36 ` [PATCH 10/19] read-cache: add watchman 'WAMA' extension David Turner
2016-03-09 18:36 ` [PATCH 11/19] Add watchman support to reduce index refresh cost David Turner
2016-03-09 18:36 ` [PATCH 12/19] read-cache: allow index-helper to prepare shm before git reads it David Turner
2016-03-09 18:36 ` [PATCH 13/19] index-helper: use watchman to avoid refreshing index with lstat() David Turner
2016-03-09 18:36 ` [PATCH 14/19] update-index: enable/disable watchman support David Turner
2016-03-09 18:36 ` [PATCH 15/19] unpack-trees: preserve index extensions David Turner
2016-03-09 18:36 ` [PATCH 16/19] index-helper: rewrite pidfile after daemonizing David Turner
2016-03-09 18:36 ` [PATCH 17/19] index-helper: process management David Turner
2016-03-09 18:36 ` [PATCH 18/19] index-helper: autorun David Turner
2016-03-15 12:12   ` Duy Nguyen
2016-03-15 14:26     ` Johannes Schindelin
2016-03-16 11:37       ` Duy Nguyen
2016-03-16 18:11       ` David Turner
2016-03-16 18:27         ` Johannes Schindelin
2016-03-17 13:02           ` Duy Nguyen
2016-03-17 14:43             ` Johannes Schindelin
2016-03-17 18:31               ` David Turner
2016-03-18  0:50               ` Duy Nguyen
2016-03-18  7:14                 ` Johannes Schindelin
2016-03-18  7:44                   ` Duy Nguyen
2016-03-18 17:22                     ` David Turner
2016-03-18 23:09                       ` Duy Nguyen
2016-03-18  7:17                 ` Johannes Schindelin
2016-03-18  7:34                   ` Duy Nguyen
2016-03-18 15:57                     ` Johannes Schindelin
2016-03-09 18:36 ` [PATCH 19/19] hack: watchman/untracked cache mashup David Turner
2016-03-15 12:31   ` Duy Nguyen
2016-03-17  0:56     ` David Turner
2016-03-17 13:06       ` Duy Nguyen
2016-03-17 18:08         ` David Turner
2016-03-29 17:09 ` [PATCH 00/19] index-helper, watchman Torsten Bögershausen
2016-03-29 21:51   ` David Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACsJy8CT7x3p2TLvMbJ2OkNbd=PZ7bxZQJv1wSU1j_8s826J2A@mail.gmail.com' \
    --to=pclouds@gmail.com \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).