From: Nicolas Pitre <nico@fluxnic.net>
To: Theodore Tso <tytso@MIT.EDU>
Cc: Jeff King <peff@peff.net>, Will Palmer <wmpalmer@gmail.com>,
Avery Pennarun <apenwarr@gmail.com>,
git@vger.kernel.org
Subject: Re: Why is "git tag --contains" so slow?
Date: Thu, 08 Jul 2010 15:06:34 -0400 (EDT)
Message-ID: <alpine.LFD.2.00.1007081443040.6020@xanadu.home>
In-Reply-To: <11D5771D-EB47-42E9-BCC3-69C8FE1999EC@MIT.EDU>
On Thu, 8 Jul 2010, Theodore Tso wrote:
>
> On Jul 7, 2010, at 1:45 PM, Jeff King wrote:
>
> > And of course it's just complex, and I tend to shy away from
> > complexity when I can. The question to me comes back to (1) above.
> > Is massive clock skew a breakage that should produce a few
> > incorrect results, or is it something we should always handle?
>
> Going back to the question that kicked off this thread, I wonder if there
> is some way that caching could be used to speed up all the cases,
> or at least the edge cases, without imposing as much latency as tracking
> the max skew? i.e., something like gitk's gitk.cache file. For bonus
> points, it could be a cache file that is used by both gitk and git tag
> --contains, git branch --contains, and git name-rev.
>
> Does that sound like a reasonable idea?

I don't think any caching would be as good as fixing the fundamental
issue.

Git is fast, sure. But it could be way faster yet in its graph
traversal. My pack v4 format is meant to overcome all the obstacles
that Git currently has to work through in order to walk its commit
graph. Once one realizes that most of the commit object headers are
SHA1 references which need not be compressed with zlib as is done now,
that the author and committer info can be factored out into a
dictionary table, and that even those SHA1 references can be substituted
with an index value into the pack index file (a bit like the OFS variant
of the delta object), meaning that even the object lookup could be
bypassed, then it becomes possible to make graph traversal an order of
magnitude cheaper in terms of computing cycles and memory touched.
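
To make the idea above concrete, here is a rough sketch in Python. It is
only an illustration under stated assumptions, not the actual pack v4
encoding: the names PackedCommit, pack_commits and contains are
hypothetical, a sorted list of object names stands in for the pack index
file, and every parent is assumed to live in the same pack.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PackedCommit:
    """Hypothetical compact commit record (not the real v4 layout)."""
    parents: Tuple[int, ...]  # positions in the sorted name table, not SHA1s
    author: int               # position in the identity dictionary
    committer: int            # position in the identity dictionary


def pack_commits(commits: Dict[str, dict]):
    """Encode {sha1: {"parents": [...], "author": str, "committer": str}}.

    Returns (names, idents, packed), where packed[i] describes names[i].
    Assumes every parent is itself a key of `commits`.
    """
    names = sorted(commits)  # stand-in for the pack index file
    position = {sha: i for i, sha in enumerate(names)}
    idents: List[str] = []
    ident_pos: Dict[str, int] = {}

    def intern(ident: str) -> int:
        # Store each author/committer string once in the dictionary table.
        if ident not in ident_pos:
            ident_pos[ident] = len(idents)
            idents.append(ident)
        return ident_pos[ident]

    packed = [
        PackedCommit(
            parents=tuple(position[p] for p in commits[sha]["parents"]),
            author=intern(commits[sha]["author"]),
            committer=intern(commits[sha]["committer"]),
        )
        for sha in names
    ]
    return names, idents, packed


def contains(packed: List[PackedCommit], tip: int, target: int) -> bool:
    """Return True if commit `target` is reachable from commit `tip`.

    The walk follows integer positions only: no SHA1 lookup, no zlib.
    """
    seen = set()
    stack = [tip]
    while stack:
        cur = stack.pop()
        if cur == target:
            return True
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(packed[cur].parents)
    return False
```

With records like these, a "--contains" style query turns into a walk
over small integers: each step is a list access instead of a hash lookup
followed by decompressing and parsing a commit object.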

The pack format v4 has been brewing in my head for... well... years now.
That is a good thing, because I've improved on the original v4 design
quite a bit over that time. And I even found some time to write more
code lately. I have the new object encoding code almost working for
trees and commits. My Git hacking time is still limited, though, so this
is progressing slowly.

All this to say that I don't think any kind of caching will be necessary
in the end, as it is possible to encode object data in a pack in a way
that ought to be about as fast to access as a separate cache would be.
So if someone is pondering working on a cache layer, I'd have an
alternate suggestion or two for that person. ;-)

Nicolas