git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] fast-import: replace custom hash with hashmap.c
Date: Tue, 7 Apr 2020 20:37:03 +0200	[thread overview]
Message-ID: <3fd73ee7-14e7-9acd-798e-cb841997f4d7@web.de> (raw)
In-Reply-To: <20200406183542.GA23753@coredump.intra.peff.net>

Am 06.04.20 um 20:35 schrieb Jeff King:
> On Mon, Apr 06, 2020 at 08:07:45PM +0200, René Scharfe wrote:
>
>>> Interesting. I re-ran mine just to double check, and got:
>> [...]
>> A second run today reported:
>>
>> Test                        origin/master         HEAD
>> ---------------------------------------------------------------------------
>> 9300.1: export (no-blobs)   61.58(59.93+1.40)     60.63(59.35+1.22) -1.5%
>> 9300.2: import (no-blobs)   239.64(239.00+0.63)   246.02(245.18+0.82) +2.7%
>>
>> git describe says v5.4-3890-g86fd3e9df543 in that repo.
>
> I'm on v5.6-rc7-188-g1b649e0bcae7, but I can't imagine it makes a big
> difference.
>
>> Dunno.  My PC has thermal issues and stressing it for half an hour straight
>> may cause it to throttle?
>
> Yeah. I wonder what the variance is between the 3 runs (assuming you're
> using ./run and doing 3). I.e., is one run in the first set much faster
> than the others, and we pick it as best-of-3.

I did use run with the default three laps.  Ran p9300 directly with
patch v2 applied instead now, got this:

Test                        this tree
-----------------------------------------------
9300.1: export (no-blobs)   64.80(60.95+1.47)
9300.2: import (no-blobs)   210.00(206.02+2.04)
9300.1: export (no-blobs)   62.58(60.74+1.35)
9300.2: import (no-blobs)   209.07(206.78+1.96)
9300.1: export (no-blobs)   61.33(60.17+1.03)
9300.2: import (no-blobs)   207.73(205.47+2.04)

> But we
> really do lie to container_of(). See remote.c, for example. In
> make_remote(), we call hashmap_get() with a pointer to lookup_entry,
> which is a bare "struct hashmap_entry". That should end up in
> remotes_hash_cmp(), which unconditionally computes a pointer to a
> "struct remote".

Ugh.  Any optimization level would delay that to after the keydata
check, though, I guess.

There's also function pointer casting going on (with patch_util_cmp()
and sequence_entry_cmp()), which is undefined behavior IIUC.

> Now the hashmap_entry there is at the beginning of the struct, so the
> offset is 0 and the two pointers are the same. So while the pointer's
> type is incorrect, we didn't compute an invalid pointer value. And
> traditionally the hashmap code required that it be at the front of the
> containing struct, but that was loosened recently (and container_of
> introduced) in 5efabc7ed9 (Merge branch 'ew/hashmap', 2019-10-15).

Hmm, OK.

> And grepping around, I don't see any cases where it's _not_ at the
> beginning. So perhaps this is a problem waiting to bite us.

Possibly.

Speaking of grep, I ended up with this oneliner to show all the
comparison functions used with hashmaps:

git grep -W -f <(git grep -wh hashmap_init | cut -f2 -d, | grep -v -e ' .* ' -e NULL -e '^ *$' | sed 's/ //; s/.*)//; s/^/int /' | sort -u)

But I didn't learn much from looking at them, except that most have
that unconditional container_of.

>> Stuffing the oidhash() result into a variable and using it twice with
>> hashmap_entry_init() would work as well.  This would make the reason for
>> the duplicate find_object() code obvious, while keeping struct
>> hashmap_entry opaque.
>
> I'd prefer not to use a separate variable, as that requires giving it a
> type.

Specifying the type that oidhash() returns and hashmap_entry_init()
accepts is redundant and inconvenient, but that line should be easy to
find and adapt when the function signatures are eventually changed.
Which is perhaps never going to happen anyway?

> Perhaps:
>
>   hashmap_entry_init(&e->ent, lookup_entry.hash);
>
> which is used elsewhere is OK? That still assumes there's a hash field,
> but I think hashmap has always been pretty up front about that (the real
> sin in the original is assuming that nothing else needs to be
> initialized).

That conflicts with that statement from the hashmap.h:

 * struct hashmap_entry is an opaque structure representing an entry in the
 * hash table.

Patch v2 looks good to me, though!

René

  parent reply	other threads:[~2020-04-07 18:37 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-03 12:15 [PATCH] fast-import: replace custom hash with hashmap.c Jeff King
2020-04-03 18:53 ` René Scharfe
2020-04-05 18:59   ` Jeff King
2020-04-06 18:07     ` René Scharfe
2020-04-06 18:35       ` Jeff King
2020-04-06 18:46         ` Jeff King
2020-04-07 18:37         ` René Scharfe [this message]
2020-04-06 19:49 ` [PATCH v2] " Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3fd73ee7-14e7-9acd-798e-cb841997f4d7@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).