git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Brandon Williams <bmwill@google.com>,
	git@vger.kernel.org, git@jeffhostetler.com
Subject: Re: [PATCH] oidmap: map with OID as key
Date: Fri, 29 Sep 2017 15:26:25 -0400	[thread overview]
Message-ID: <20170929192624.4ukvpjujgiyzgibb@sigill.intra.peff.net> (raw)
In-Reply-To: <20170929120456.3bb8021de1c7aebee7bb5026@google.com>

On Fri, Sep 29, 2017 at 12:04:56PM -0700, Jonathan Tan wrote:

> > So depending how you count it, we're wasting between 28% (sha1 and no
> > extra hash) and 16% (sha256 plus reusing hashmap). That's not great, but
> > it's probably not breaking the bank.
> 
> Hmm...how did you get the 16% figure? The values, as I see it, are:
>  - 32 for the sha256 hash itself
>  - 8 for the "next" chain pointer
>  - 8 for the padded hash
>  - 8 for the "util" pointer
> 
> For an oidset, the padded hash and the "util" pointer are wasted, which is
> 16/56=0.286. (If you assume, say, 8 bytes of malloc bookkeeping overhead, then
> 16/64=0.25.)

Sorry to be unclear. I was just counting the waste for the "util"
pointer, not the extra padded hash bit (which AFAIK is already wasted in
oidset). So I computed 48 bytes without the util pointer, which means we
waste an additional 16% to add it.

Anyway, my point was mostly to say that this is a fractional percentage
of the total memory. So it falls into the category of "this might help
in tight situations" and less "this will blow up in our faces".

> In a 100-million-object monorepo, we will probably end up only operating on the
> "frontier" objects (objects that we do not have but we know about because some
> object we have reference them) at the worst. I don't have numbers but I think
> that that is at least an order of magnitude less than 100M.

Agreed.

> > So I think we may be better off going with the solution here that's
> > simpler and requires introducing less code. If it does turn out to be a
> > memory problem in the future, this is a _really_ easy thing to optimize
> > after the fact, because we have these nice abstractions.
> 
> Optimizing away the padded hash should be straightforward. Optimizing away the
> "util" pointer after we have code that uses it is (slightly?) less
> straightforward, but still doable.

I was thinking here of just oidset. It has a limited API, so swapping
out the implementation for one that does not depend on oidmap and waste
the "util" pointer would be the "easy" part.

> I still lean towards saving memory by eliminating the padded hash and
> not using the "util" pointer, because the complication is contained
> within only two files (oidmap.{c,h}), and because the constant factors
> in memory savings might end up mattering. But I don't feel strongly
> about that - I think any of the oidmaps that we have discussed is an
> improvement over what we have now.

My main concern is not so much about complexity bleeding out of the
oidmap code, but that now we have two parallel near-identical hashmap
implementations.  When one gets an optimization, bugfix, or new feature,
we have to port it over manually to the other.

-Peff

  reply	other threads:[~2017-09-29 19:26 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-27 22:19 [PATCH] oidmap: map with OID as key Jonathan Tan
2017-09-28  0:41 ` Brandon Williams
2017-09-28 17:46   ` Jonathan Tan
2017-09-28 20:05     ` Jeff King
2017-09-29 19:04       ` Jonathan Tan
2017-09-29 19:26         ` Jeff King [this message]
2017-09-29 21:43       ` Johannes Schindelin
2017-09-29 23:24         ` Jeff King
2017-09-28  3:13 ` Junio C Hamano
2017-09-28 17:38   ` Jonathan Tan
2017-09-29 22:54 ` [PATCH v2] " Jonathan Tan
2017-10-02 23:48   ` Brandon Williams
2017-10-03  6:31     ` Jeff King
2017-10-04  0:29       ` Jonathan Tan
2017-10-04  7:45         ` Jeff King
2017-10-04  8:48           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170929192624.4ukvpjujgiyzgibb@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=bmwill@google.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).