git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Brandon Williams <bmwill@google.com>,
	git@vger.kernel.org, git@jeffhostetler.com
Subject: Re: [PATCH] oidmap: map with OID as key
Date: Thu, 28 Sep 2017 16:05:57 -0400
Message-ID: <20170928200556.grysihlj7cbzocfq@sigill.intra.peff.net>
In-Reply-To: <20170928104616.be61b394b50dc5193be275be@google.com>

On Thu, Sep 28, 2017 at 10:46:16AM -0700, Jonathan Tan wrote:

> > To me it seems like a much simpler API for a map would be to just allow
> > callers to store a 'void *' as the value.
> 
> I agree that the API would be simpler.
> 
> My main motivation with this design is indeed to save memory, and not
> inconvenience the user too much (in the case where you're storing things
> larger than one pointer, you just need to remember to put the special
> struct at the beginning of your struct), but if memory is not so
> important, I agree that we can switch to the "util" design.

When I saw that you were implementing "oidset" in terms of "oidmap", I
was all ready to be crabby about this extra memory. But then I saw that
the implementation tries hard not to waste any memory. :)
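
For anyone skimming the thread, here is a rough sketch of the two designs being compared. All of the struct and function names below are hypothetical stand-ins for illustration, not the ones in the patch, and the 20-byte array just approximates a SHA-1 object ID:

```c
#include <stddef.h>

/* Hypothetical stand-in for an object ID (20 bytes for SHA-1). */
struct example_oid {
	unsigned char hash[20];
};

/*
 * Design A ("embedded"): the map entry holds only the chain pointer
 * and the key. Callers who want a value put the entry at the very
 * start of their own struct.
 */
struct embedded_entry {
	struct embedded_entry *next;
	struct example_oid oid;
};

struct my_payload {
	struct embedded_entry entry; /* must be the first member */
	int flags;
};

/* Recover the caller's struct from the entry stored in the map. */
static inline struct my_payload *payload_of(struct embedded_entry *e)
{
	/* Valid because 'entry' is the first member of my_payload. */
	return (struct my_payload *)e;
}

/*
 * Design B ("util"): every entry carries a generic value pointer,
 * which goes unused when the map is only used as a set.
 */
struct util_entry {
	struct util_entry *next;
	struct example_oid oid;
	void *util;
};
```

With design A a plain set pays nothing for the value; with design B each entry carries one extra pointer whether or not anybody uses it.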

All of which is to say I gave this some thought when I was in the "ready
to be crabby" phase, and came to the conclusion that it probably isn't
that painful. An unused pointer is 8 bytes per entry. We're already
spending 20 for the oid itself (which is likely to grow to 32
eventually), plus 8 for the chained "next" pointer. Plus potentially 8
for a padded version of the hash, if we just use a straight hashmap that
duplicates the hash field.

So depending how you count it, we're wasting between 28% (sha1 and no
extra hash) and 16% (sha256 plus reusing hashmap). That's not great, but
it's probably not breaking the bank.
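
To spell out where those two numbers come from, here is a small sketch. The function name is made up for illustration; it measures the unused pointer against the bytes an entry needs anyway, and it ignores allocator overhead and any struct padding beyond the sizes listed above:

```c
/*
 * Extra memory added by an unused field, as a percentage of the bytes
 * the entry needs regardless (oid + "next" pointer + optional copy of
 * the hash). Integer division truncates the result.
 */
static unsigned overhead_percent(unsigned oid_bytes, unsigned next_bytes,
				 unsigned hash_bytes, unsigned unused_bytes)
{
	unsigned needed = oid_bytes + next_bytes + hash_bytes;

	return unused_bytes * 100 / needed;
}
```

So overhead_percent(20, 8, 0, 8) is 8 wasted bytes on top of 28 needed, or 28%, and overhead_percent(32, 8, 8, 8) is 8 on top of 48, or 16%.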

Another way of thinking about it: large-ish (but not insane) repos have
on the order of 5-10 million objects. If we had an oidset that mentioned
every single object in the repository, that's 40-80MB wasted in the
worst case. For current uses of oidset, that's probably fine. It's
generally used only to collect ref tips (so probably two orders of
magnitude less).
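
The same back-of-the-envelope estimate as code, assuming 8-byte pointers and one unused pointer per entry (the helper name is hypothetical):

```c
/* Total bytes spent on an unused per-entry field across the whole set. */
static unsigned long long wasted_bytes(unsigned long long nr_entries,
				       unsigned unused_bytes)
{
	return nr_entries * unused_bytes;
}
```

That gives 40,000,000 bytes for 5 million entries and 80,000,000 for 10 million. A set that only holds ref tips, at two orders of magnitude fewer entries, would waste well under a megabyte.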

If you're planning on using an oidset to mark every object in a
100-million-object monorepo, we'd probably care more. But I'd venture to
say that any scheme which involves generating that hash table on the fly
is doing it wrong. At that scale we'd want to look at compact
mmap-able on-disk representations.

So I think we may be better off going with the solution here that's
simpler and requires introducing less code. If it does turn out to be a
memory problem in the future, this is a _really_ easy thing to optimize
after the fact, because we have these nice abstractions.

-Peff

Thread overview: 16+ messages
2017-09-27 22:19 [PATCH] oidmap: map with OID as key Jonathan Tan
2017-09-28  0:41 ` Brandon Williams
2017-09-28 17:46   ` Jonathan Tan
2017-09-28 20:05     ` Jeff King [this message]
2017-09-29 19:04       ` Jonathan Tan
2017-09-29 19:26         ` Jeff King
2017-09-29 21:43       ` Johannes Schindelin
2017-09-29 23:24         ` Jeff King
2017-09-28  3:13 ` Junio C Hamano
2017-09-28 17:38   ` Jonathan Tan
2017-09-29 22:54 ` [PATCH v2] " Jonathan Tan
2017-10-02 23:48   ` Brandon Williams
2017-10-03  6:31     ` Jeff King
2017-10-04  0:29       ` Jonathan Tan
2017-10-04  7:45         ` Jeff King
2017-10-04  8:48           ` Junio C Hamano
