git@vger.kernel.org mailing list mirror (one of many)
From: Avery Pennarun <apenwarr@gmail.com>
To: Joshua Juran <jjuran@gmail.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
	Finn Arne Gangstad <finnag@pvv.org>,
	git@vger.kernel.org
Subject: Re: inotify daemon speedup for git [POC/HACK]
Date: Tue, 27 Jul 2010 21:31:38 -0400
Message-ID: <AANLkTi=TQnyATgJ0LSdR3qeeCVAgu+wOFcHmHUBguPiV@mail.gmail.com>
In-Reply-To: <52EDBD9A-2961-4F66-88B3-07BF873FA994@gmail.com>

On Tue, Jul 27, 2010 at 9:14 PM, Joshua Juran <jjuran@gmail.com> wrote:
> Okay, I have an idea.  If I understand correctly, the index is a flat
> database of records including a pathname and several fixed-length fields.
>  Since the records are not fixed-length, only sequential search is possible,
> even though the records are sorted by pathname.
>
> Here's the idea:  Divide the database into blocks.  Each block contains a
> block header and the records belonging to a single directory.  The block
> header contains the length of the block and also the offset to the next
> block, in bytes.  In addition to a record for each indexed file in a
> directory, a directory's block also contains records for subdirectories. The
> mode flags in a record indicate the record type.  Directory records contain
> an offset in bytes to the block for that directory (in place of the SHA-1
> hash).  The block list is preceded by a file header, which includes the
> offset in bytes of the root block.  All offsets are from the beginning of
> the file.
>
> Instead of having to search among every file in the repository, the search
> space now includes only the immediate descendants of each directory in the
> target file's path.  If a directory is modified then it can either be
> rewritten in place (if there's sufficient room) or appended to the end of
> the file (requiring the old and new sequentially preceding blocks and the
> parent directory's block to update their offsets).
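
[Editor's sketch, not part of the original message.] The per-directory block
layout proposed above can be illustrated roughly as follows. Everything about
the byte layout here is an assumption made for illustration: the "BIDX" magic,
the field widths, and the names build_index and lookup are hypothetical and are
not git's or bup's actual index format. The sketch also leaves out the
next-block offset and the in-place rewrite path, since neither is needed to
show the lookup.

```python
import hashlib
import struct

# Hypothetical on-disk layout (illustration only, not git's index format).
FILE_HDR = struct.Struct(">4sI")    # magic, byte offset of the root block
BLOCK_HDR = struct.Struct(">I")     # total block length in bytes
REC_HDR = struct.Struct(">HB20s")   # name length, record type, payload
TYPE_FILE, TYPE_DIR = 0, 1


def _write_dir(tree, out):
    """Append blocks for `tree`, children first; return this block's offset."""
    records = b""
    for name in sorted(tree):
        entry = tree[name]
        if isinstance(entry, dict):
            # Directory record: a block offset takes the place of the SHA-1.
            child_off = _write_dir(entry, out)
            payload = struct.pack(">I", child_off).ljust(20, b"\0")
            rtype = TYPE_DIR
        else:
            payload, rtype = entry, TYPE_FILE
        raw = name.encode()
        records += REC_HDR.pack(len(raw), rtype, payload) + raw
    offset = len(out)
    out.extend(BLOCK_HDR.pack(BLOCK_HDR.size + len(records)) + records)
    return offset


def build_index(tree):
    """Serialize a nested dict {name: sha1_bytes | subtree_dict}."""
    out = bytearray(FILE_HDR.size)  # reserve room for the file header
    root = _write_dir(tree, out)
    FILE_HDR.pack_into(out, 0, b"BIDX", root)
    return bytes(out)


def lookup(index, path):
    """Return the 20-byte hash stored for `path`, or None if absent."""
    magic, block_off = FILE_HDR.unpack_from(index, 0)
    if magic != b"BIDX":
        raise ValueError("not an index file")
    parts = path.split("/")
    for depth, part in enumerate(parts):
        (block_len,) = BLOCK_HDR.unpack_from(index, block_off)
        pos, end = block_off + BLOCK_HDR.size, block_off + block_len
        want, found = part.encode(), None
        while pos < end:            # scan only this directory's records
            name_len, rtype, payload = REC_HDR.unpack_from(index, pos)
            pos += REC_HDR.size
            if index[pos:pos + name_len] == want:
                found = (rtype, payload)
                break
            pos += name_len
        if found is None:
            return None
        rtype, payload = found
        if depth == len(parts) - 1:
            return payload if rtype == TYPE_FILE else None
        if rtype != TYPE_DIR:
            return None
        (block_off,) = struct.unpack_from(">I", payload)
    return None


def sha(data):
    return hashlib.sha1(data).digest()


tree = {"README": sha(b"a"),
        "src": {"main.c": sha(b"b"), "lib": {"util.c": sha(b"c")}}}
index = build_index(tree)
assert lookup(index, "src/lib/util.c") == sha(b"c")
```

Each lookup reads one block per path component, so the scan is bounded by the
sizes of the directories along the path rather than by the size of the whole
repository.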

Yeah, that's pretty much what bup's current format does, minus
appending rewritten dirs at the end when files are added.  I've
thought of that, but sooner or later, the file would need to be
rewritten anyway, and then you end up with odd performance
characteristics where the file expands in random ways and then shrinks
again when you decide it's gotten too big.  And if you do try to reuse
empty blocks - which should mostly avoid the endless growth problem -
you basically just have a database, including fragmentation problems
and multi-user concerns and all.  That's what made me think that
sqlite might be a sensible choice, since it's already a database :)
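
[Editor's sketch, not part of the original message.] To make the sqlite idea
concrete, here is a minimal sketch using Python's standard sqlite3 module. The
table name, the columns, and the helper names (open_index, put, get, list_dir)
are assumptions chosen for illustration; nothing here reflects an actual git or
bup schema. The point is only that single-entry updates, free-space reuse, and
locking are delegated to the database instead of being hand-rolled.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) a hypothetical path-keyed index table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " path TEXT PRIMARY KEY,"
        " mode INTEGER NOT NULL,"
        " sha1 BLOB NOT NULL)"
    )
    return conn


def put(conn, path, mode, sha1):
    """Insert or update one entry without rewriting the rest of the file."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO entries (path, mode, sha1)"
            " VALUES (?, ?, ?)",
            (path, mode, sha1),
        )


def get(conn, path):
    """Return (mode, sha1) for `path`, or None if it is not indexed."""
    return conn.execute(
        "SELECT mode, sha1 FROM entries WHERE path = ?", (path,)
    ).fetchone()


def list_dir(conn, directory):
    """Return every indexed path under `directory`, in sorted order."""
    lo = directory.rstrip("/") + "/"
    hi = lo[:-1] + "0"  # "0" is the ASCII character right after "/"
    rows = conn.execute(
        "SELECT path FROM entries WHERE path >= ? AND path < ?"
        " ORDER BY path",
        (lo, hi),
    )
    return [row[0] for row in rows]
```

Because path is the primary key, both the single-path lookup and the
directory range query can use the key's index rather than scanning every row.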

But maybe there's some simpler way.

Have fun,

Avery
