From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Johan Herland <johan@herland.net>
Cc: Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org, trast@student.ethz.ch,
tavestbo@trolltech.com, git@drmicha.warpmail.net,
chriscool@tuxfamily.org, spearce@spearce.org
Subject: Re: [PATCHv5 00/14] git notes
Date: Tue, 8 Sep 2009 11:32:04 +0200 (CEST) [thread overview]
Message-ID: <alpine.DEB.1.00.0909081100020.4330@intel-tinevez-2-302> (raw)
In-Reply-To: <200909081054.02523.johan@herland.net>
Hi,
On Tue, 8 Sep 2009, Johan Herland wrote:
> Algorithm / Notes tree git log -n10 (x100) git log --all
> ------------------------------------------------------------
> next / no-notes 4.77s 63.84s
>
> before / no-notes 4.78s 63.90s
> before / no-fanout 56.85s 65.69s
>
> 16tree / no-notes 4.77s 64.18s
> 16tree / no-fanout 30.35s 65.39s
> 16tree / 2_38 5.57s 65.42s
> 16tree / 2_2_36 5.19s 65.76s
>
> flexible / no-notes 4.78s 63.91s
> flexible / no-fanout 30.34s 65.57s
> flexible / 2_38 5.57s 65.46s
> flexible / 2_2_36 5.18s 65.72s
> flexible / ym 5.13s 65.66s
> flexible / ym_2_38 5.08s 65.63s
> flexible / ymd 5.30s 65.45s
> flexible / ymd_2_38 5.29s 65.90s
> flexible / y_m 5.11s 65.72s
> flexible / y_m_2_38 5.08s 65.67s
> flexible / y_m_d 5.06s 65.50s
> flexible / y_m_d_2_38 5.07s 65.79s
It's good to see that the no-notes behaves roughly like baseline.
I can see that some people may think that date-based fan-out is the cat's
ass, but I have to warn that we have no idea how notes will be used, and
the date-based fan-out is rather limiting in certain respects:
- for the typical nightly-build-generated notes, this fan-out is pretty
inefficient memory-wise.
- I find the restriction to commits rather limiting.
- most of the performance difference between the date-based and the SHA-1
based fan-out looks to me as if the issue was the top-level tree.
Basically, this tree has to be read _every_ time _anybody_ wants to read
a note.
Maybe a finer-grained fan-out (finer than 16-bits) could help. After
all, if you have 16 different notes, chances are that they have 16
different first letters, but all have the same commit year. That's
where the top-level notes with a fan-out perform incredibly bad.
But I think that having a dynamic fan-out that can even put blobs into
the top-level tree (nothing prevents us from doing that, right?) would
_outperform_ the date-based one, at least with less than 1 note/commit
(and maybe even then, because the year-based fan-out results in pretty
varying entropies per fan-out depth).
The real question for me, therefore, is: what is the optimal way to
strike the balance between size of the tree objects (which we want to
be small, so that unpacking them is fast) and depth of the fan-out
(which we want to be shallow to avoid reading worst-case 39 tree objects
to get at one single note).
- related to the previous point is my gut feeling that the date-based
fan-out has nothing to do with any theoretical optimum. I am pretty
certain that the optimal fan-out strategy depends heavily on the SHA-1s
of the annotated objects (if you have 10,000 notes in 2009, but only 1
in 2008, the year-based fan-out _must_ be suboptimal) and maybe is
something like a sibling to the Fibonacci heap.
- I'd love to see performance numbers for less than 157118 notes. Don't
get me wrong, it is good to see the worst-case scenario in terms of
notes/commits ratio. But it will hardly be the common case, and I
very much would like to optimize for the common case.
So, I'd appreciate if you could do the tests with something like 500
notes, randomly spread over the commits (rationale: my original
understanding was that the notes could amend commit messages, and that
is much more likely to be done with relatively old commits that you
cannot change anymore).
Please understand that I might not have the time to participate in this
thread as much as I would like to. The next 4 days will be especially
hard.
Ciao,
Dscho
next prev parent reply other threads:[~2009-09-08 9:32 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-08 2:26 [PATCHv5 00/14] git notes Johan Herland
2009-09-08 2:26 ` [PATCHv5 01/14] Introduce commit notes Johan Herland
2009-09-08 2:26 ` [PATCHv5 02/14] Add a script to edit/inspect notes Johan Herland
2009-09-08 2:26 ` [PATCHv5 03/14] Speed up git notes lookup Johan Herland
2009-09-08 2:26 ` [PATCHv5 04/14] Add an expensive test for git-notes Johan Herland
2009-09-08 2:26 ` [PATCHv5 05/14] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
2009-09-08 2:26 ` [PATCHv5 06/14] fast-import: Add support for importing commit notes Johan Herland
2009-09-08 2:26 ` [PATCHv5 07/14] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
2009-09-08 2:26 ` [PATCHv5 08/14] Add flags to get_commit_notes() to control the format of the note string Johan Herland
2009-09-08 2:26 ` [PATCHv5 09/14] Add '%N'-format for pretty-printing commit notes Johan Herland
2009-09-08 2:26 ` [PATCHv5 10/14] Teach notes code to free its internal data structures on request Johan Herland
2009-09-08 2:26 ` [PATCHv5 11/14] Teach the notes lookup code to parse notes trees with various fanout schemes Johan Herland
2009-09-08 2:27 ` [PATCHv5 12/14] Selftests verifying semantics when loading notes trees with various fanouts Johan Herland
2009-09-08 2:27 ` [PATCHv5 13/14] Allow flexible organization of notes trees, using both commit date and SHA1 Johan Herland
2009-09-08 2:27 ` [PATCHv5 14/14] Add test cases for date-based fanouts Johan Herland
2009-09-08 3:12 ` [PATCHv5 00/14] git notes Johan Herland
2009-09-08 4:16 ` Junio C Hamano
2009-09-08 8:54 ` Johan Herland
2009-09-08 9:32 ` Johannes Schindelin [this message]
2009-09-08 12:36 ` Johan Herland
2009-09-08 15:53 ` Johannes Schindelin
2009-09-08 22:46 ` Johan Herland
2009-09-10 6:23 ` Stephen R. van den Berg
2009-09-10 9:25 ` Johan Herland
2009-09-08 20:31 ` Junio C Hamano
2009-09-08 21:10 ` Shawn O. Pearce
2009-09-08 21:36 ` Sverre Rabbelier
2009-09-08 21:39 ` Shawn O. Pearce
2009-09-08 21:57 ` Sverre Rabbelier
2009-09-08 21:40 ` Johan Herland
2009-09-12 15:50 ` Johan Herland
2009-09-12 18:11 ` Shawn O. Pearce
2009-09-12 18:35 ` Johan Herland
2009-09-10 14:00 ` Geert Bosch
2009-09-10 14:09 ` Michael J Gruber
2009-09-10 14:12 ` Geert Bosch
2009-09-12 0:11 ` Junio C Hamano
2009-09-12 15:52 ` Johan Herland
2009-09-12 16:08 ` [PATCHv6 " Johan Herland
2009-09-12 16:08 ` [PATCHv6 01/14] Introduce commit notes Johan Herland
2009-09-12 16:08 ` [PATCHv6 02/14] Add a script to edit/inspect notes Johan Herland
2009-09-12 16:08 ` [PATCHv6 03/14] Speed up git notes lookup Johan Herland
2009-09-12 16:08 ` [PATCHv6 04/14] Add an expensive test for git-notes Johan Herland
2009-09-12 16:08 ` [PATCHv6 05/14] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
2009-09-12 16:08 ` [PATCHv6 06/14] fast-import: Add support for importing commit notes Johan Herland
2009-09-12 16:08 ` [PATCHv6 07/14] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
2009-09-12 16:08 ` [PATCHv6 08/14] Add flags to get_commit_notes() to control the format of the note string Johan Herland
2009-09-12 16:08 ` [PATCHv6 09/14] Add '%N'-format for pretty-printing commit notes Johan Herland
2009-09-12 16:08 ` [PATCHv6 10/14] Teach notes code to free its internal data structures on request Johan Herland
2009-09-12 18:40 ` Junio C Hamano
2009-09-12 22:21 ` Johan Herland
2009-09-12 16:08 ` [PATCHv6 11/14] Teach the notes lookup code to parse notes trees with various fanout schemes Johan Herland
2009-09-12 16:08 ` [PATCHv6 12/14] Selftests verifying semantics when loading notes trees with various fanouts Johan Herland
2009-09-12 16:08 ` [PATCHv6 13/14] Allow flexible organization of notes trees, using both commit date and SHA1 Johan Herland
2009-09-12 18:41 ` Junio C Hamano
2009-09-12 22:33 ` Johan Herland
2009-09-12 23:37 ` Junio C Hamano
2009-09-12 16:08 ` [PATCHv6 14/14] Add test cases for various date-based fanouts Johan Herland
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.1.00.0909081100020.4330@intel-tinevez-2-302 \
--to=johannes.schindelin@gmx.de \
--cc=chriscool@tuxfamily.org \
--cc=git@drmicha.warpmail.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=johan@herland.net \
--cc=spearce@spearce.org \
--cc=tavestbo@trolltech.com \
--cc=trast@student.ethz.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).