git@vger.kernel.org mailing list mirror (one of many)
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Nicolas Pitre <nico@cam.org>
Cc: Jakub Narebski <jnareb@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Roman Shaposhnik <rvs@Sun.COM>,
	git@vger.kernel.org
Subject: Re: Achieving efficient storage of weirdly structured repos
Date: Fri, 4 Apr 2008 23:24:45 -0400
Message-ID: <20080405032445.GS10274@spearce.org>
In-Reply-To: <alpine.LFD.1.00.0804040844470.2947@xanadu.home>

Nicolas Pitre <nico@cam.org> wrote:
> On Thu, 3 Apr 2008, Jakub Narebski wrote:
> 
> > One of bigger hindrances, as I understand it, in developing pack v4
> > was the fact that it didn't offer that much of improvement in typical
> > cases for the work needed... but perhaps "your" repository would be
> > good showcase for pack v4.
> 
> The biggest hindrance for pack v4 is actually the lack of a native 
> runtime tree walking, and having both tree object formats properly and 
> optimally abstracted has not been looked at yet.
> 
> Speed is the primary goal for pack v4.  The fact that it also provides a 
> 10% pack reduction is only consequential.  But without native tree 
> walking we must recreate the legacy tree format on the fly each time a 
> tree object is loaded which dwarfs any improvements pack v4 is aiming 
> for (yes it is still a little bit faster than pack v3 nevertheless, but 
> not yet significantly enough to overcome the incompatibility costs).

Even though we don't have native tree walking, I think the right
way to do this is to put in pack v4 with "canonical tree, canonical
commit" mode, where it inflates its native tree/commit encoding
into the canonical forms, then come back later with native walking.

Canonical mode is still faster than pack v2 inflate is for these
types, so it does (slightly) boost rev-list performance.  It might
chop a solid 30% off the CPU time jgit spends in its equivalent of
revision.c, and that's without teaching jgit to use the native pack
v4 encoding directly.
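
A rough sketch of what "canonical mode" could look like for one tree
entry follows.  The canonical tree entry layout (octal mode, space,
name, NUL, 20-byte SHA-1) is the existing on-disk form; everything on
the pack v4 side is an assumption made for illustration, including the
struct v4_tree_entry layout, the name_table argument, and the function
name emit_canonical_entry.  None of these are taken from actual git or
pack v4 code.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/*
 * HYPOTHETICAL compact entry: the path component is not stored inline
 * but referenced by index into a name table shared across the pack.
 */
struct v4_tree_entry {
	unsigned int mode;       /* e.g. 0100644 */
	unsigned int name_id;    /* index into name_table */
	unsigned char sha1[20];
};

/*
 * Append one entry in canonical tree form:
 *   "<octal mode> <name>\0<20-byte sha1>"
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t emit_canonical_entry(const struct v4_tree_entry *e,
				   const char *const *name_table,
				   unsigned char *out, size_t avail)
{
	const char *name = name_table[e->name_id];
	size_t name_len = strlen(name);
	char hdr[16];
	int hdr_len = snprintf(hdr, sizeof(hdr), "%o ", e->mode);
	size_t need;

	if (hdr_len < 0 || (size_t)hdr_len >= sizeof(hdr))
		return 0;
	need = (size_t)hdr_len + name_len + 1 + 20;
	if (need > avail)
		return 0;

	memcpy(out, hdr, (size_t)hdr_len);
	memcpy(out + hdr_len, name, name_len + 1);   /* copies the NUL too */
	memcpy(out + hdr_len + name_len + 1, e->sha1, 20);
	return need;
}
```

Every entry pays for a name lookup and three copies here, which is the
per-load rebuild cost that native walking would avoid.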

Once we have it in we can experiment with the necessary abstractions
to handle the two different available encodings, and to allow
higher-level code to switch back and forth between them as objects
come from loose or pack v2, and from pack v4.  One of the things we
wanted to do was boost path limiter performance by matching on tree
name ids when walking a pack v4 native tree, but fall back to the
string based memcmp when walking a canonical tree.  That won't be
easy to design without the two different encodings being available
at the lower level in sha1_file.c.
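
The path limiter idea above might be sketched as follows.  This is
only an illustration under stated assumptions: struct tree_entry_view,
struct path_limit and entry_matches are hypothetical names, not
anything in sha1_file.c, and the sketch assumes the limiter's name has
already been looked up once in the pack's name table.

```c
#include <string.h>

/* HYPOTHETICAL: which encoding the current tree entry came from. */
enum entry_kind { ENTRY_CANONICAL, ENTRY_V4_NATIVE };

struct tree_entry_view {
	enum entry_kind kind;
	const char *name;      /* used when kind == ENTRY_CANONICAL */
	size_t name_len;
	unsigned int name_id;  /* used when kind == ENTRY_V4_NATIVE */
};

struct path_limit {
	const char *name;
	size_t name_len;
	unsigned int name_id;  /* resolved once against the pack's name table */
	int has_name_id;       /* 0 if the name does not occur in that table */
};

/* Return nonzero if the entry's path component matches the limiter. */
static int entry_matches(const struct tree_entry_view *e,
			 const struct path_limit *p)
{
	/* Native pack v4 tree: a single integer compare on name ids. */
	if (e->kind == ENTRY_V4_NATIVE)
		return p->has_name_id && e->name_id == p->name_id;

	/* Canonical tree (loose or pack v2): string based memcmp. */
	return e->name_len == p->name_len &&
	       !memcmp(e->name, p->name, p->name_len);
}
```

The caller needs to know which encoding each tree came from, which is
why both forms have to be visible below the tree walker.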

Just my rapidly declining .02 bush peso.

> Nicolas (who wishes he was still a student with plenty of hacking time)

Don't we all.  :-)

-- 
Shawn.
