From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Brandon Williams <bmwill@google.com>,
git@vger.kernel.org, git@jeffhostetler.com
Subject: Re: [PATCH] oidmap: map with OID as key
Date: Fri, 29 Sep 2017 15:26:25 -0400 [thread overview]
Message-ID: <20170929192624.4ukvpjujgiyzgibb@sigill.intra.peff.net> (raw)
In-Reply-To: <20170929120456.3bb8021de1c7aebee7bb5026@google.com>
On Fri, Sep 29, 2017 at 12:04:56PM -0700, Jonathan Tan wrote:
> > So depending how you count it, we're wasting between 28% (sha1 and no
> > extra hash) and 16% (sha256 plus reusing hashmap). That's not great, but
> > it's probably not breaking the bank.
>
> Hmm...how did you get the 16% figure? The values, as I see it, are:
> - 32 for the sha256 hash itself
> - 8 for the "next" chain pointer
> - 8 for the padded hash
> - 8 for the "util" pointer
>
> For an oidset, the padded hash and the "util" pointer are wasted, which is
> 16/56=0.286. (If you assume, say, 8 bytes of malloc bookkeeping overhead, then
> 16/64=0.25.)
Sorry to be unclear. I was just counting the waste for the "util"
pointer, not the extra padded hash bit (which AFAIK is already wasted in
oidset). So I computed 48 bytes without the util pointer, which means we
waste an additional 16% to add it.
Anyway, my point was mostly to say that this is a fractional percentage
of the total memory. So it falls into the category of "this might help
in tight situations" and less "this will blow up in our faces".
> In a 100-million-object monorepo, we will probably end up only operating on the
> "frontier" objects (objects that we do not have but we know about because some
> object we have reference them) at the worst. I don't have numbers but I think
> that that is at least an order of magnitude less than 100M.
Agreed.
> > So I think we may be better off going with the solution here that's
> > simpler and requires introducing less code. If it does turn out to be a
> > memory problem in the future, this is a _really_ easy thing to optimize
> > after the fact, because we have these nice abstractions.
>
> Optimizing away the padded hash should be straightforward. Optimizing away the
> "util" pointer after we have code that uses it is (slightly?) less
> straightforward, but still doable.
I was thinking here of just oidset. It has a limited API, so swapping
out the implementation for one that does not depend on oidmap and waste
the "util" pointer would be the "easy" part.
> I still lean towards saving memory by eliminating the padded hash and
> not using the "util" pointer, because the complication is contained
> within only two files (oidmap.{c,h}), and because the constant factors
> in memory savings might end up mattering. But I don't feel strongly
> about that - I think any of the oidmaps that we have discussed is an
> improvement over what we have now.
My main concern is not so much about complexity bleeding out of the
oidmap code, but that now we have two parallel near-identical hashmap
implementations. When one gets an optimization, bugfix, or new feature,
we have to port it over manually to the other.
-Peff
next prev parent reply other threads:[~2017-09-29 19:26 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-27 22:19 [PATCH] oidmap: map with OID as key Jonathan Tan
2017-09-28 0:41 ` Brandon Williams
2017-09-28 17:46 ` Jonathan Tan
2017-09-28 20:05 ` Jeff King
2017-09-29 19:04 ` Jonathan Tan
2017-09-29 19:26 ` Jeff King [this message]
2017-09-29 21:43 ` Johannes Schindelin
2017-09-29 23:24 ` Jeff King
2017-09-28 3:13 ` Junio C Hamano
2017-09-28 17:38 ` Jonathan Tan
2017-09-29 22:54 ` [PATCH v2] " Jonathan Tan
2017-10-02 23:48 ` Brandon Williams
2017-10-03 6:31 ` Jeff King
2017-10-04 0:29 ` Jonathan Tan
2017-10-04 7:45 ` Jeff King
2017-10-04 8:48 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170929192624.4ukvpjujgiyzgibb@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=bmwill@google.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).