git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Tan <jonathantanmy@google.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: git@vger.kernel.org, stolee@gmail.com
Subject: Re: [PATCH 2/2] packfile: refactor hash search with fanout table
Date: Fri, 9 Feb 2018 11:50:51 -0800
Message-ID: <20180209115051.b9356543f3f7d07f3bae213f@google.com>
In-Reply-To: <cfbde137-dbac-8796-f49f-2a543303d33a@web.de>

On Fri, 9 Feb 2018 19:03:48 +0100
René Scharfe <l.s.r@web.de> wrote:

> Going from unsigned to signed int means the patch breaks support for
> more than 2G pack entries, which was added with 326bf39677 (Use uint32_t
> for all packed object counts.) in 2007.

Ah, good catch. I'll wait to see if there are any more comments, then
send out a new version.
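
To make the range concern concrete, here is a minimal sketch (not code
from the patch; both helper names are hypothetical) showing that a count
above 2G does not fit in a signed int, and how a search midpoint can be
computed entirely in uint32_t without wrapping. It assumes a platform
where int is 32 bits wide.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: returns 1 if an object
 * count taken from a pack index can be stored in an int without loss,
 * 0 otherwise.
 */
static int count_fits_in_int(uint32_t nr_objects)
{
	return nr_objects <= (uint32_t)INT_MAX;
}

/*
 * Midpoint of the half-open range [lo, hi), computed in uint32_t.
 * Written as lo + (hi - lo) / 2 so the intermediate value cannot wrap
 * even when lo + hi would exceed UINT32_MAX.
 */
static uint32_t midpoint(uint32_t lo, uint32_t hi)
{
	return lo + (hi - lo) / 2;
}
```

Keeping every index in uint32_t preserves the full range that the
object counts already use.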

> > +int bsearch_hash(const unsigned char *sha1, const void *fanout_,
> > +		 const void *table_, size_t stride)
> > +{
> > +	const uint32_t *fanout = fanout_;
> 
> Why hide the type?  It doesn't make the function more generic.

I thought that the fanout_ parameter could come from a variety of
sources (e.g. direct mmap - void *, or mmap with some pointer arithmetic
- char *) so I just picked the generic type, void *. But now I realize
that converting an arbitrary pointer to uint32_t * could lead to
unaligned reads, which is probably not a good idea. I'll update it.
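
For illustration, one way to read a 32-bit value from a byte pointer of
unknown alignment is to assemble it byte by byte. This is only a sketch
of the alignment concern, not what the patch does (the plan above is to
take a uint32_t pointer so that the caller guarantees alignment). The
helper name read_be32 is hypothetical, and the value is assumed to be
stored in network byte order.

```c
#include <stdint.h>

/*
 * Hypothetical helper: read the 32-bit big-endian value stored at an
 * arbitrary (possibly unaligned) byte address. Assembling the value
 * one byte at a time avoids dereferencing a misaligned uint32_t
 * pointer.
 */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) |
	       (uint32_t)p[3];
}
```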

For consistency, I'll also update table_ to be unsigned char *.
(Unsigned because it is primarily interpreted as hashes, which use
"unsigned char *" in the Git code.)

> Why not use sha1_pos()?  I guess because it avoids the overhead of the
> accessor function, right?  And I wonder how much of a difference it makes.

Yes, to avoid the overhead of the accessor function. We would also need
to modify sha1_pos to take a function that we can pass userdata to (to
contain the stride).
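
A rough sketch of what such an accessor-based search with userdata could
look like follows. Every name here (hash_access_fn, struct stride_ctx,
access_strided, hash_pos_with_userdata) is hypothetical and is not the
existing sha1_pos interface; HASH_LEN assumes 20-byte SHA-1 hashes. Note
that each comparison goes through an indirect call, which is the
overhead mentioned above.

```c
#include <stddef.h>
#include <string.h>

#define HASH_LEN 20 /* assumed SHA-1 size in bytes */

/* Hypothetical accessor type: returns entry `index`, given caller context. */
typedef const unsigned char *hash_access_fn(size_t index, const void *table,
					    void *userdata);

/* Hypothetical context carrying the distance in bytes between entries. */
struct stride_ctx {
	size_t stride;
};

/* Accessor for hashes embedded at a fixed stride in a byte array. */
static const unsigned char *access_strided(size_t index, const void *table,
					   void *userdata)
{
	const struct stride_ctx *ctx = userdata;
	return (const unsigned char *)table + index * ctx->stride;
}

/*
 * Binary search through an accessor. Returns 1 and sets *pos to the
 * matching index when found; otherwise returns 0 and sets *pos to the
 * index where the hash would be inserted.
 */
static int hash_pos_with_userdata(const unsigned char *hash,
				  const void *table, size_t nr,
				  hash_access_fn *fn, void *userdata,
				  size_t *pos)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mi = lo + (hi - lo) / 2;
		int cmp = memcmp(fn(mi, table, userdata), hash, HASH_LEN);

		if (!cmp) {
			*pos = mi;
			return 1;
		}
		if (cmp < 0)
			lo = mi + 1; /* entry sorts before the key */
		else
			hi = mi;
	}
	*pos = lo;
	return 0;
}
```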

> A binary search function for embedded hashes just needs the key, a
> pointer to the first hash in the array, the stride and the number of
> elements.  It can then be used with or without a fanout table, making it
> more versatile.  Just a thought.

I specifically want to include the fanout table in the calculation here,
because it will be used by subsequent patches that also incorporate the
fanout table.
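
To show how the two designs relate, here is a minimal sketch (not the
code from this series) of a fanout-aware search layered on a plain
strided search. The function names, the out-parameter convention and
HASH_LEN are assumptions made for illustration. The sketch also assumes
a 256-entry cumulative fanout table already converted to host byte
order, where fanout[b] is the number of entries whose first byte is at
most b.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 20 /* assumed SHA-1 size in bytes */

/*
 * Search the half-open range [lo, hi) of a table whose hashes are
 * embedded `stride` bytes apart. Returns 1 and sets *result to the
 * index when found; otherwise returns 0 and sets *result to the
 * insertion point.
 */
static int bsearch_strided(const unsigned char *hash,
			   const unsigned char *table, size_t stride,
			   uint32_t lo, uint32_t hi, uint32_t *result)
{
	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;
		int cmp = memcmp(table + mi * stride, hash, HASH_LEN);

		if (!cmp) {
			*result = mi;
			return 1;
		}
		if (cmp < 0)
			lo = mi + 1; /* entry sorts before the key */
		else
			hi = mi;
	}
	*result = lo;
	return 0;
}

/*
 * Use the first byte of the hash to narrow the range via the fanout
 * table, then search only that slice.
 */
static int bsearch_with_fanout(const unsigned char *hash,
			       const uint32_t *fanout,
			       const unsigned char *table, size_t stride,
			       uint32_t *result)
{
	uint32_t hi = fanout[hash[0]];
	uint32_t lo = hash[0] ? fanout[hash[0] - 1] : 0;

	return bsearch_strided(hash, table, stride, lo, hi, result);
}
```

In this arrangement the fanout lookup is a thin wrapper, so a caller
without a fanout table could still call the strided search directly
with lo = 0 and hi = the number of entries.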

Thread overview: 13+ messages
2018-02-02 22:36 [PATCH 0/2] Refactor hash search with fanout table Jonathan Tan
2018-02-02 22:36 ` [PATCH 1/2] packfile: remove GIT_DEBUG_LOOKUP log statements Jonathan Tan
2018-02-02 22:36 ` [PATCH 2/2] packfile: refactor hash search with fanout table Jonathan Tan
2018-02-09 18:03   ` René Scharfe
2018-02-09 19:50     ` Jonathan Tan [this message]
2018-02-02 23:30 ` [PATCH 0/2] Refactor " Junio C Hamano
2018-02-03  2:09   ` Derrick Stolee
2018-02-13 18:39 ` [PATCH v2 " Jonathan Tan
2018-02-13 18:39   ` [PATCH v2 1/2] packfile: remove GIT_DEBUG_LOOKUP log statements Jonathan Tan
2018-02-13 18:39   ` [PATCH v2 2/2] packfile: refactor hash search with fanout table Jonathan Tan
2018-02-13 18:52   ` [PATCH v2 0/2] Refactor " Derrick Stolee
2018-02-13 19:57   ` Junio C Hamano
2018-02-13 20:15     ` Jonathan Tan
