From: Johan Herland <johan@herland.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	trast@student.ethz.ch, tavestbo@trolltech.com,
	git@drmicha.warpmail.net, chriscool@tuxfamily.org,
	spearce@spearce.org
Subject: Re: [PATCHv5 00/14] git notes
Date: Tue, 08 Sep 2009 23:40:05 +0200
Message-ID: <200909082340.05318.johan@herland.net>
In-Reply-To: <7vocplxjov.fsf@alter.siamese.dyndns.org>

On Tuesday 08 September 2009, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > On Tue, 8 Sep 2009, Johan Herland wrote:
> >> Algorithm / Notes tree   git log -n10 (x100)   git log --all
> >> ------------------------------------------------------------
> >> next / no-notes                4.77s              63.84s
> >>
> >> before / no-notes              4.78s              63.90s
> >> before / no-fanout            56.85s              65.69s
> >>
> >> 16tree / no-notes              4.77s              64.18s
> >> 16tree / no-fanout            30.35s              65.39s
> >> 16tree / 2_38                  5.57s              65.42s
> >> 16tree / 2_2_36                5.19s              65.76s
> >>
> >> flexible / no-notes            4.78s              63.91s
> >> flexible / no-fanout          30.34s              65.57s
> >> flexible / 2_38                5.57s              65.46s
> >> flexible / 2_2_36              5.18s              65.72s
> >> flexible / ym                  5.13s              65.66s
> >> flexible / ym_2_38             5.08s              65.63s
> >> flexible / ymd                 5.30s              65.45s
> >> flexible / ymd_2_38            5.29s              65.90s
> >> flexible / y_m                 5.11s              65.72s
> >> flexible / y_m_2_38            5.08s              65.67s
> >> flexible / y_m_d               5.06s              65.50s
> >> flexible / y_m_d_2_38          5.07s              65.79s
> >
> > I can see that some people may think that date-based fan-out is the
> > cat's ass,
> 
> Actually, my knee-jerk reaction was that 4.77 (next) vs 5.57 (16tree with
> 2_38) is already a good enough performance/simplicity tradeoff, and 5.57
> vs 5.08 (16tree with ym_2_38) probably does not justify the risk of worst
> case behaviour that can come from possible mismatch between the access
> pattern and the date-optimized tree layout.

Yes, 16tree / 2_38 looks like a reasonable tradeoff when you look at the
absolute numbers, but it is also worth isolating the actual cost of the
notes lookup, i.e. the time above the 4.77s no-notes baseline. Seen that
way, 16tree / 2_38 costs 0.80s, whereas flexible / ym_2_38 costs only
0.31s, i.e. less than half the cost of the former.
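
For anyone who wants to redo that subtraction for other rows, here is a
minimal Python sketch. The timings are copied from the "git log -n10 (x100)"
column of the table above; the names BASELINE, timings and lookup_cost are
only illustrative and not part of any patch in this series.

```python
# Timings in seconds, taken from the "git log -n10 (x100)" column above.
BASELINE = 4.77  # "next / no-notes": no notes tree to consult at all

timings = {
    "16tree / 2_38": 5.57,
    "flexible / ym_2_38": 5.08,
}

def lookup_cost(total, baseline=BASELINE):
    """Time attributable to notes lookup: total minus the no-notes baseline."""
    return round(total - baseline, 2)

costs = {name: lookup_cost(t) for name, t in timings.items()}

for name, cost in costs.items():
    print("%-20s %.2fs" % (name, cost))

# Ratio of the date-based layout's lookup cost to the SHA1-only layout's.
ratio = costs["flexible / ym_2_38"] / costs["16tree / 2_38"]
print("ratio: %.2f" % ratio)
```

The ratio comes out below 0.5, which is what "less than half" refers to.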

> But that only argues against supporting _only_ date-optimized layout.
> 
> Support of "flexible layout" is not that flexible as its name suggests;
> one single note tree needs to have a uniform fanout strategy.

Actually, the uniform strategy is only required at each separate level. You 
are free to vary the strategy within independent subtrees. I.e. in the case 
where you have 1 note from 2007, and 1000 notes from 2008, you are free to 
use a mix of date-based and SHA1-based structures, like this:

  y2007/1234567...
  y2008/m01/d01/2345678...
  y2008/m01/d01/3456789...
  y2008/m01/d02/45/67890...
  y2008/m01/d02/56/78901...
  y2008/m01/d02/67/89012...
  ...
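
To make the mixed layout concrete, here is a rough Python sketch of a lookup
that walks such a tree. It is not the code from the patch series: the nested
dict standing in for tree objects, the find_note() helper and the padded
example SHA1s are all assumptions made for illustration. At each level it
first looks for an entry named by the remaining SHA1 digits, then for the
next date component, then for a 2-digit SHA1 prefix.

```python
def find_note(tree, sha1, date):
    """Look up a note in a nested dict that models a notes tree.

    tree: dict mapping entry names to subtrees (dicts) or notes (strings).
    sha1: 40-digit hex name of the annotated object.
    date: (year, month, day) of the annotated commit.
    """
    year, month, day = date
    date_parts = ["y%04d" % year, "m%02d" % month, "d%02d" % day]
    node, rest = tree, sha1
    while isinstance(node, dict):
        if rest in node:
            # Entry named by all remaining SHA1 digits: this is the note.
            return node[rest]
        if date_parts and date_parts[0] in node:
            # This level fans out by the next date component.
            node = node[date_parts.pop(0)]
        elif rest[:2] in node:
            # This level fans out by the next two SHA1 digits.
            node, rest = node[rest[:2]], rest[2:]
        else:
            return None
    return node

# Hypothetical object names, padded with zeros to 40 hex digits.
A = "1234567" + "0" * 33
B = "2345678" + "0" * 33
C = "4567890" + "0" * 33

notes = {
    "y2007": {A: "the single note from 2007"},
    "y2008": {"m01": {
        "d01": {B: "note stored directly under the day"},
        "d02": {C[:2]: {C[2:]: "note under an extra SHA1 fanout level"}},
    }},
}

print(find_note(notes, A, (2007, 5, 1)))
print(find_note(notes, C, (2008, 1, 2)))
```

Note that y2007 holds its note directly while y2008/m01/d02 adds a SHA1
level; each individual level still uses a single strategy.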

> > - I find the restriction to commits rather limiting.
> 
> Yeah, we would not want to be surprised to find many people want to
> annotate non-commits with this mechanism.

We could arbitrarily set the "commit date" for non-commit objects to the
epoch, so that they can still be represented in a date-based fanout. (Of
course, the notes code should be smart enough to choose a better fanout
if the number of non-commit notes is significant.)
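
As a sketch of that idea (again in Python, and only an illustration): the
fanout_path() helper, its arguments and the y/m/d path format below are
assumptions, not something the current patches implement.

```python
from datetime import datetime, timezone

def fanout_path(obj_type, sha1, commit_time=None):
    """Return a y/m/d-style notes path for an object.

    Commits use their commit timestamp (seconds since the epoch, UTC);
    any other object type is filed under the epoch itself.
    """
    ts = commit_time if obj_type == "commit" else 0
    d = datetime.fromtimestamp(ts, tz=timezone.utc)
    return "y%04d/m%02d/d%02d/%s" % (d.year, d.month, d.day, sha1)

# A blob has no commit date, so it lands under 1970-01-01.
print(fanout_path("blob", "ab" * 20))

# A commit made at 2009-09-08 00:00:00 UTC (timestamp 1252368000).
print(fanout_path("commit", "cd" * 20, commit_time=1252368000))
```

All non-commit notes would then share one subtree, which is why a smarter
choice of fanout would be needed once there are many of them.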

> > - most of the performance difference between the date-based and the
> > SHA-1 based fan-out looks to me as if the issue was the top-level tree.
> > Basically, this tree has to be read _every_ time _anybody_ wants to
> > read a note.
> 
> A comparison between 'next' and another algorithm that opens the
>  top-level notes tree object and returns "I did not find any note"
>  without doing anything else would reveal that cost.  But when you are
>  doing "log -n10" (or "log --all"), you would read the notes top-level
>  tree once, and it is likely to be cached in the obj_hash[] (or in
>  delta_base cache) already for the remaining invocations, even if notes
>  mechanism does not do its own cache, which I think it does, no?

Yes, it does, and has since Dscho's original hash_map-based implementation.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
