git@vger.kernel.org mailing list mirror (one of many)
From: Avery Pennarun <apenwarr@gmail.com>
To: Joshua Juran <jjuran@gmail.com>
Cc: Finn Arne Gangstad <finnag@pvv.org>, git@vger.kernel.org
Subject: Re: inotify daemon speedup for git [POC/HACK]
Date: Tue, 27 Jul 2010 19:51:26 -0400
Message-ID: <AANLkTi=oA33M4DmS5FyDx7Wn1DFrUGcmhSYkvcSYMc2r@mail.gmail.com>
In-Reply-To: <9E67A084-4EDB-4CCB-A771-11B97107F4EF@gmail.com>

On Tue, Jul 27, 2010 at 7:39 PM, Joshua Juran <jjuran@gmail.com> wrote:
> On Jul 27, 2010, at 4:29 PM, Avery Pennarun wrote:
>
>> An inotify daemon could easily keep track of which files have been
>> added that aren't in the index... but where would it put the list of
>> files git doesn't know about?  Do they go in the index with a special
>> NOT_REALLY_INDEXED flag?
>
> One option is not to write it to disk at all.  The client could consult the
> daemon directly.

True.  What would the client-server protocol look like, though?  "Give
me the list of unknown files?"  Does the daemon need to understand
.gitignore or will it send back a list of all my million *.o files
every time?  etc.
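
To make the .gitignore question concrete, here is a minimal sketch of a
daemon-side "list unknown files" query that filters ignored paths before
replying. Everything in it is an assumption for illustration: the function
name, the arguments, and especially the use of fnmatch-style globs matched
against the basename, which is only a rough stand-in for real .gitignore
semantics.

```python
from fnmatch import fnmatchcase


def unknown_files(seen, indexed, ignore_patterns):
    """Return paths the daemon has seen that are neither indexed nor ignored.

    seen            -- iterable of paths reported by the watcher (hypothetical)
    indexed         -- iterable of paths already in the index
    ignore_patterns -- simple globs such as "*.o"; NOT full .gitignore rules
    """
    indexed = set(indexed)
    result = []
    for path in sorted(set(seen)):
        if path in indexed:
            continue
        # Match against the basename only, the way a bare "*.o" pattern
        # is usually intended.
        name = path.rsplit("/", 1)[-1]
        if any(fnmatchcase(name, pat) for pat in ignore_patterns):
            continue
        result.append(path)
    return result
```

If the daemon filters like this, the reply stays small even with a million
*.o files on disk; if it does not, the client has to do the same filtering
on every query.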

Offhand, I think it would be nice to have an inotify daemon maintain
something like the git index file: a list of *all* the files, in a
form that's a) random access, not just sequential, and b) really fast
when accessed sequentially.
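
As a rough illustration of those two criteria, the sketch below stores the
sorted paths in one blob and puts a fixed-width offset table in front of it.
The offset table gives random access (binary search), and the blob gives a
fast linear pass. The layout and all names are hypothetical; this is not
bup's or git's actual format.

```python
import struct

# Hypothetical layout (an assumption, not a real on-disk format):
#   header:  4-byte big-endian entry count N
#   offsets: N 4-byte big-endian offsets into the path blob
#   blob:    the sorted paths, each NUL-terminated


def build_index(paths):
    """Serialize a set of paths into the layout described above."""
    names = sorted(p.encode("utf-8") for p in set(paths))
    offsets, blob = [], bytearray()
    for name in names:
        offsets.append(len(blob))
        blob += name + b"\0"
    header = struct.pack(">I", len(names))
    table = struct.pack(">%dI" % len(names), *offsets)
    return header + table + bytes(blob)


def _path_at(buf, count, i):
    """Fetch the i-th path via the fixed-width offset table."""
    (off,) = struct.unpack_from(">I", buf, 4 + 4 * i)
    start = 4 + 4 * count + off
    end = buf.index(b"\0", start)
    return buf[start:end]


def contains(buf, path):
    """Random access: binary search over the offset table."""
    target = path.encode("utf-8")
    (count,) = struct.unpack_from(">I", buf, 0)
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        name = _path_at(buf, count, mid)
        if name < target:
            lo = mid + 1
        elif name > target:
            hi = mid
        else:
            return True
    return False


def iter_paths(buf):
    """Sequential access: one linear pass over the path blob."""
    (count,) = struct.unpack_from(">I", buf, 0)
    pos = 4 + 4 * count
    for _ in range(count):
        end = buf.index(b"\0", pos)
        yield buf[pos:end].decode("utf-8")
        pos = end + 1
```

This sketch does nothing for cheap insertion: adding one path means
rewriting the offset table and the blob.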

Knowing that large numbers of files can cause slowness, I was planning
ahead for inotify when I designed bup's index file format, and it
meets the above criteria.  Unfortunately I screwed up other stuff
(adding new files is too slow) and it still needs to be rewritten
anyway.  Oh well.

While we're here, it's probably worth mentioning that git's index file
format (which stores a sequential list of full paths in alphabetical
order, instead of an actual hierarchy) does become a bottleneck when
you have a huge number of files in your repo (like literally a
million).  You can't actually binary search through the index!  The
current implementation of submodules lets you dodge that scalability
problem, since you end up with multiple smaller index files.  Anyway,
that's fixable too.
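
To illustrate the flat-list versus hierarchy distinction, here is a small
sketch that contrasts a sequential scan over a flat list of full paths with
a lookup that walks a nested directory structure one path component at a
time. The function names and the in-memory dict representation are
assumptions for illustration only and say nothing about how git stores or
loads its index. The sketch also assumes no path is both a file and a
directory prefix of another path.

```python
def build_tree(paths):
    """Turn flat 'a/b/c' paths into nested dicts; None marks a file."""
    root = {}
    for path in paths:
        node = root
        parts = path.split("/")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = None
    return root


def tree_lookup(root, path):
    """Hierarchical lookup: cost depends on path depth, not total file count."""
    node = root
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return node is None  # True only for files, not directories


def flat_scan_steps(paths, target):
    """Sequential lookup: return how many entries were examined, or None."""
    for steps, path in enumerate(paths, 1):
        if path == target:
            return steps
    return None
```

With a million entries, the flat scan may touch most of them to find one
path, while the hierarchical walk only touches the directories along that
path.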

Have fun,

Avery
