git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "René Scharfe" <l.s.r@web.de>
Cc: git@vger.kernel.org
Subject: Re: fast-import's hash table is slow
Date: Fri, 3 Apr 2020 08:14:17 -0400	[thread overview]
Message-ID: <20200403121417.GB65799@coredump.intra.peff.net> (raw)
In-Reply-To: <31a0efbe-8ab0-045a-fcab-3211c0d641f6@web.de>

On Thu, Apr 02, 2020 at 08:40:58PM +0200, René Scharfe wrote:

> >> Storing the objects directly in the set and getting rid of new_object()
> >> could improve performance due to reduced dereferencing overhead -- or
> >> make it slower because more data has to be copied when the hashmap needs
> >> to grow.  Worth a shot.  Later.
> >
> > Yeah. I tried that, too, but it caused tons of test failures. Quite
> > possibly just a bug in my patch, which I did as quickly as possible. But
> > I think it performed about the same. Here it is for reference (though
> > you may be better off to start from scratch).
> 
> Tried that earlier, ran into failures as well.  I think the pointer
> returned from insert_object() is stored somewhere and still used after
> the next call, which could have invalidated it by a rehash.  E.g.
> insert_mark() seems to do that.

That doesn't surprise me. I didn't look very hard, but mostly just did
the minimum to see if it would work (and it didn't).

It could perhaps be overcome, and I certainly don't mind if you want to
dig further, but I'm happy enough with the hashmap solution.

> > Note the "this is OK to cast from oid to object_entry" comment is
> > leftover from the earlier attempt, but it probably _isn't_ OK. Even
> > though we don't do anything actionable on the non-oid bytes, they do get
> > passed by value, which could mean reading random bytes.
> 
> "Value" meaning pointer value when KHASH_INIT is give a pointer type,
> as was the case in the patch with that comment (unlike in the patch
> below).  So it should be OK there.

Right, I meant the comment no longer applies in the patch below, because
we're not using a pointer type anymore.

> > -	for (o = blocks; o; o = o->next_pool)
> > -		for (e = o->next_free; e-- != o->entries;)
> > +	for (iter = kh_begin(&object_table); iter != kh_end(&object_table); iter++) {
> > +		if (kh_exist(&object_table, iter)) {
> > +			e = &kh_key(&object_table, iter);
> >  			if (pack_id == e->pack_id)
> >  				*c++ = &e->idx;
> > +		}
> > +	}
> 
> The original code writes the objects in reverse order of their creation
> (LIFO), right?  Is that relevant?

Yeah, agreed this is another weakness of this approach.

> But anyway,  we need stable object locations anyway if their addresses are
> stored in other structs, as mentioned above.  Those pointers would have to
> be replaced by object_ids and pointer derefs by hashmap lookups.  Not a
> promising direction.

Agreed.

-Peff

  reply	other threads:[~2020-04-03 12:14 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-31  9:45 fast-import's hash table is slow Jeff King
2020-03-31 19:14 ` René Scharfe
2020-03-31 23:21   ` René Scharfe
2020-04-01 10:24     ` Jeff King
2020-04-02 18:40       ` René Scharfe
2020-04-03 12:14         ` Jeff King [this message]
2020-04-01 10:35   ` Jeff King
2020-04-01 11:16     ` Jeff King
2020-04-02 18:40       ` René Scharfe
2020-04-03 12:12         ` Jeff King
2020-04-03 18:53           ` René Scharfe
2020-04-03 19:01             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200403121417.GB65799@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).