git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johan Herland <johan@herland.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	trast@student.ethz.ch, tavestbo@trolltech.com,
	git@drmicha.warpmail.net, chriscool@tuxfamily.org,
	spearce@spearce.org
Subject: Re: [PATCHv5 00/14] git notes
Date: Wed, 09 Sep 2009 00:46:45 +0200	[thread overview]
Message-ID: <200909090046.45213.johan@herland.net> (raw)
In-Reply-To: <alpine.DEB.1.00.0909081741590.4330@intel-tinevez-2-302>

On Tuesday 08 September 2009, Johannes Schindelin wrote:
> On Tue, 8 Sep 2009, Johan Herland wrote:
> > On Tuesday 08 September 2009, Johannes Schindelin wrote:
> > > I can see that some people may think that date-based fan-out is the
> > > cat's ass, but I have to warn that we have no idea how notes will be
> > > used,
> >
> > I don't agree. Although we will certainly see many more use cases for
> > notes, I believe that the vast majority of them can be placed in one of
> > two categories:
> 
> My experience with Git is that having beliefs how my work is used was a
> constant source of surprise.

And you believe that a system that only allows SHA1-based fanout schemes is 
better equipped to tackle such surprises than a system that provides both 
date-based and SHA1-based fanout schemes?

> > > - I find the restriction to commits rather limiting.
> >
> > I see your point, but I don't agree until I see a compelling case for
> > annotating a non-commit.
> 
> My point is that it is too late by then, if you don't allow for a
>  flexible and still efficient scheme.

As I replied to Junio, we could use the epoch as a "commit date" for tree 
and blob objects, thus making them representable in a date-based fanout 
scheme (although if there are a signficant number of non-commit notes, the 
code should be smart enough to find a better (SHA1-based probably) fanout 
scheme for those notes).

> > > - most of the performance difference between the date-based and the
> > >   SHA-1 based fan-out looks to me as if the issue was the top-level
> > >   tree. Basically, this tree has to be read _every_ time _anybody_
> > >   wants to read a note.
> >
> > Not sure what you're trying to say here. The top-level notes tree is
> > read (as in fill_tree_descriptor()) exactly _once_. After that, it is
> > cached by the internal data structure (until free_commit_notes() or
> > end-of-process).
> 
> By that reasoning, we do not need any fan-out scheme.
> 
> Keep in mind: reading a large tree object takes a long time.  That's why
> we started fan-out.  Reading a large number of tree objects also takes a
> long time.  That's why I propagated flexible fan-out that is only read-in
> on demand.

Not sure where you're going with this. Of course we want to strike an 
optimal balance between the size of tree objects and the number of tree 
objects. Nobody is arguing about that. Both SHA1-based and (in the most 
common cases) date-based schemes can be used to achieve this balance. But 
using date-based fanout has the added advantage of providing better 
performance (both runtime- and memory-wise) when looking up notes in a 
chronological order.

> > > But I think that having a dynamic fan-out that can even put blobs
> > > into the top-level tree (nothing prevents us from doing that, right?)
> >
> > Well, the "flexible" code does add the new requirement that all entries
> > in a notes (sub)tree object must follow the same scheme, i.e. you
> > cannot have:
> >
> >   /12/34567890123456789012345678901234567890
> >   /2345/678901234567890123456789012345678901
> >
> > but you can have
> >
> >   /12/34567890123456789012345678901234567890
> >   /23/45/678901234567890123456789012345678901
> 
> Umm, why?  Is there any good technical reason?

In the date-based parts of notes tree, there are very good reasons for doing 
so: The code peeks at the first tree entry in order to determine what kind 
of date-based fanout is used in the current tree object. Subsequent entries 
(in that tree object) that do not follow the same format are 
skipped/ignored.

The SHA1-based fanout code has not changed since the last iteration, so this 
extra requirement is not absolutely necessary for the SHA1-based parts of 
the notes tree. However, the extra requirement does guarantee that commit 
notes have exactly one unique location in the notes tree, and thus relieves 
us of having to keep searching for alternative notes locations, and 
concatenate the notes found.

> > > The real question for me, therefore, is: what is the optimal way to
> > > strike the balance between size of the tree objects (which we want to
> > > be small, so that unpacking them is fast)  and depth of the fan-out
> > > (which we want to be shallow to avoid reading worst-case 39 tree
> > > objects to get at one single note).
> >
> > s/39/19/ (each fanout must use at least 2 chars of the 40-char SHA1)
> 
> That is another unnecessary restriction that could cost you dearly.  Just
> think what happens if it turns out that the optimal number of tree items
> is closer to 16 than to 255...

The code can easily be rewritten to allow for "odd" fanouts (1/39, 1/1/38, 
etc.). Feel free to submit a patch.

I was, however, naive enough to assume that when git.git decided on using 
2/38 fanout for its loose objects, then some performance-related thoughts
went into that decision. If there are indications that multiple-of-2-type 
fanouts are not optimal, we should probably reconsider.


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

  reply	other threads:[~2009-09-08 22:46 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-08  2:26 [PATCHv5 00/14] git notes Johan Herland
2009-09-08  2:26 ` [PATCHv5 01/14] Introduce commit notes Johan Herland
2009-09-08  2:26 ` [PATCHv5 02/14] Add a script to edit/inspect notes Johan Herland
2009-09-08  2:26 ` [PATCHv5 03/14] Speed up git notes lookup Johan Herland
2009-09-08  2:26 ` [PATCHv5 04/14] Add an expensive test for git-notes Johan Herland
2009-09-08  2:26 ` [PATCHv5 05/14] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
2009-09-08  2:26 ` [PATCHv5 06/14] fast-import: Add support for importing commit notes Johan Herland
2009-09-08  2:26 ` [PATCHv5 07/14] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
2009-09-08  2:26 ` [PATCHv5 08/14] Add flags to get_commit_notes() to control the format of the note string Johan Herland
2009-09-08  2:26 ` [PATCHv5 09/14] Add '%N'-format for pretty-printing commit notes Johan Herland
2009-09-08  2:26 ` [PATCHv5 10/14] Teach notes code to free its internal data structures on request Johan Herland
2009-09-08  2:26 ` [PATCHv5 11/14] Teach the notes lookup code to parse notes trees with various fanout schemes Johan Herland
2009-09-08  2:27 ` [PATCHv5 12/14] Selftests verifying semantics when loading notes trees with various fanouts Johan Herland
2009-09-08  2:27 ` [PATCHv5 13/14] Allow flexible organization of notes trees, using both commit date and SHA1 Johan Herland
2009-09-08  2:27 ` [PATCHv5 14/14] Add test cases for date-based fanouts Johan Herland
2009-09-08  3:12 ` [PATCHv5 00/14] git notes Johan Herland
2009-09-08  4:16   ` Junio C Hamano
2009-09-08  8:54     ` Johan Herland
2009-09-08  9:32       ` Johannes Schindelin
2009-09-08 12:36         ` Johan Herland
2009-09-08 15:53           ` Johannes Schindelin
2009-09-08 22:46             ` Johan Herland [this message]
2009-09-10  6:23               ` Stephen R. van den Berg
2009-09-10  9:25           ` Johan Herland
2009-09-08 20:31         ` Junio C Hamano
2009-09-08 21:10           ` Shawn O. Pearce
2009-09-08 21:36             ` Sverre Rabbelier
2009-09-08 21:39               ` Shawn O. Pearce
2009-09-08 21:57                 ` Sverre Rabbelier
2009-09-08 21:40           ` Johan Herland
2009-09-12 15:50   ` Johan Herland
2009-09-12 18:11     ` Shawn O. Pearce
2009-09-12 18:35       ` Johan Herland
2009-09-10 14:00 ` Geert Bosch
2009-09-10 14:09   ` Michael J Gruber
2009-09-10 14:12     ` Geert Bosch
2009-09-12  0:11 ` Junio C Hamano
2009-09-12 15:52   ` Johan Herland
2009-09-12 16:08     ` [PATCHv6 " Johan Herland
2009-09-12 16:08     ` [PATCHv6 01/14] Introduce commit notes Johan Herland
2009-09-12 16:08     ` [PATCHv6 02/14] Add a script to edit/inspect notes Johan Herland
2009-09-12 16:08     ` [PATCHv6 03/14] Speed up git notes lookup Johan Herland
2009-09-12 16:08     ` [PATCHv6 04/14] Add an expensive test for git-notes Johan Herland
2009-09-12 16:08     ` [PATCHv6 05/14] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
2009-09-12 16:08     ` [PATCHv6 06/14] fast-import: Add support for importing commit notes Johan Herland
2009-09-12 16:08     ` [PATCHv6 07/14] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
2009-09-12 16:08     ` [PATCHv6 08/14] Add flags to get_commit_notes() to control the format of the note string Johan Herland
2009-09-12 16:08     ` [PATCHv6 09/14] Add '%N'-format for pretty-printing commit notes Johan Herland
2009-09-12 16:08     ` [PATCHv6 10/14] Teach notes code to free its internal data structures on request Johan Herland
2009-09-12 18:40       ` Junio C Hamano
2009-09-12 22:21         ` Johan Herland
2009-09-12 16:08     ` [PATCHv6 11/14] Teach the notes lookup code to parse notes trees with various fanout schemes Johan Herland
2009-09-12 16:08     ` [PATCHv6 12/14] Selftests verifying semantics when loading notes trees with various fanouts Johan Herland
2009-09-12 16:08     ` [PATCHv6 13/14] Allow flexible organization of notes trees, using both commit date and SHA1 Johan Herland
2009-09-12 18:41       ` Junio C Hamano
2009-09-12 22:33         ` Johan Herland
2009-09-12 23:37           ` Junio C Hamano
2009-09-12 16:08     ` [PATCHv6 14/14] Add test cases for various date-based fanouts Johan Herland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200909090046.45213.johan@herland.net \
    --to=johan@herland.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=chriscool@tuxfamily.org \
    --cc=git@drmicha.warpmail.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=spearce@spearce.org \
    --cc=tavestbo@trolltech.com \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).