From: Duane Knesek <duane.knesek@gmail.com>
To: git@vger.kernel.org
Subject: FUSE with git
Date: Thu, 22 Feb 2018 22:20:49 -0600
Message-ID: <CACzbczZNAxN94wfNhFX2=Xbob6sO1wUrY2M+ZNE4rDet_tapRQ@mail.gmail.com>

Disclaimer:  I am not a git developer, nor have I ever written
anything with FUSE, so I apologize if the following is idiotic:

I've been looking for a virtual file system (on Linux) that works with
git to make huge working directories fast.  I found that Microsoft
wrote GVFS on Windows for this purpose.  In a forum discussion they
said that a FUSE implementation was too slow and that they had to
write a full file system at the kernel level to be fast enough.  Their
web page also boasts that a checkout takes 30 seconds rather than 3
hours.  My question is: why?

If a FUSE file system were implemented so that a git checkout did
nothing more than copy the snapshot manifest file locally, wouldn't
that be nearly instantaneous?  The file system could then fetch files
by the hashes in that manifest whenever one needed to be read.  Files
would only need to be stored locally if they were modified.  Since the
file system would know exactly which files were modified, it seems
that git status and commit would be fast as well.
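
To make that concrete, here is a minimal sketch of the idea in Python
using the fusepy binding (my choice of library, purely for
illustration).  The manifest layout, fetch_blob(), and
modified_paths() are hypothetical placeholders, not anything GVFS
actually does:

    import errno
    import stat

    from fuse import FUSE, FuseOSError, Operations  # pip install fusepy

    def fetch_blob(sha1_hex):
        # Hypothetical: pull a blob's bytes from the remote object store.
        raise NotImplementedError

    class LazyCheckoutFS(Operations):
        # readdir(), open(), truncate(), etc. omitted for brevity.
        def __init__(self, manifest):
            # manifest: {"/path/in/tree": ("<40-hex blob id>", size)}
            self.manifest = manifest
            self.dirty = {}  # path -> modified bytes (copy-on-write)

        def getattr(self, path, fh=None):
            if path == "/":
                return dict(st_mode=stat.S_IFDIR | 0o755, st_nlink=2)
            if path in self.dirty:
                return dict(st_mode=stat.S_IFREG | 0o644, st_nlink=1,
                            st_size=len(self.dirty[path]))
            if path in self.manifest:
                return dict(st_mode=stat.S_IFREG | 0o644, st_nlink=1,
                            st_size=self.manifest[path][1])
            raise FuseOSError(errno.ENOENT)

        def read(self, path, size, offset, fh):
            data = self.dirty.get(path)
            if data is None:
                # Lazy: content is fetched only when actually read.
                data = fetch_blob(self.manifest[path][0])
            return data[offset:offset + size]

        def write(self, path, data, offset, fh):
            base = self.dirty.get(path)
            if base is None:
                entry = self.manifest.get(path)
                base = fetch_blob(entry[0]) if entry else b""
            self.dirty[path] = (base[:offset] + data +
                                base[offset + len(data):])
            return len(data)

        def modified_paths(self):
            # A status/commit built on this never scans the tree; it
            # only looks at the copy-on-write set.
            return sorted(self.dirty)

    # FUSE(LazyCheckoutFS(manifest), "/mnt/worktree", foreground=True)

Note that the manifest has to carry file sizes; otherwise getattr
would have to fetch whole blobs just to answer stat calls.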

Perhaps MS implemented GVFS at the kernel level because building a big
tree from scratch would be slow if every file access had to cross into
user space over and over again?  If so, then what if a build system
like Bazel (from Google) were used to always build everything
incrementally?  It too could be modified (maybe via a plugin) to ask
the file system exactly which files changed, without reading
everything.  The file system could also use Bazel's hashes and remote
caching to provide unmodified binary content, just as it would use
git's SHA-1 to provide unmodified source content.  So when a user did
a checkout, it would appear that all the binaries were committed
alongside the source code.  Bazel would build new binaries from the
sources that were modified, and only those new binaries would be
written locally.  To execute those binaries, most would be read from
the cache while new ones would be read locally.
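
As a sketch of how the hashing would line up: the blob-ID computation
below is exactly how git names file contents, while cache_get,
cache_put, and compile_fn are hypothetical hooks (Bazel's real cache
keys cover the whole action, all inputs plus the command line, not a
single source file):

    import hashlib

    def git_blob_id(content: bytes) -> str:
        # git's name for file content: SHA-1 of "blob <len>\0" + content
        header = b"blob %d\x00" % len(content)
        return hashlib.sha1(header + content).hexdigest()

    def fetch_or_build(source: bytes, cache_get, cache_put, compile_fn):
        # Hypothetical remote-cache lookup keyed by input content hash:
        # only build (and publish) on a miss.
        key = git_blob_id(source)
        artifact = cache_get(key)
        if artifact is None:
            artifact = compile_fn(source)  # only modified sources rebuild
            cache_put(key, artifact)       # later checkouts get a hit
        return artifact

Running git hash-object on a file prints the same ID, which is why
unmodified content could be served straight out of the object store.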

Perhaps that runtime part is the issue?  Executing the resulting
binaries was too slow due to the slower file access?  I would think
that hit would not be too bad, and would mostly occur at startup, but
perhaps I'm wrong.


Microsoft is full of really smart guys.  So clearly I am missing
something.  What is it?


(Sorry if I'm wasting your time)
