Re: [PATCH 1/2] bswap.h: drop unaligned loads


From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Jeff King <peff@peff.net>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"René Scharfe" <l.s.r@web.de>,
	"Han-Wen Nienhuys" <hanwen@google.com>, git <git@vger.kernel.org>,
	"Han-Wen Nienhuys" <hanwenn@gmail.com>
Subject: Re: [PATCH 1/2] bswap.h: drop unaligned loads
Date: Fri, 25 Sep 2020 01:13:48 +0000
Message-ID: <20200925011348.GA1392312@camp.crustytoothpaste.net>
In-Reply-To: <20200924192111.GA2528225@coredump.intra.peff.net>

On 2020-09-24 at 19:21:11, Jeff King wrote:
> However, the unaligned loads were either not the useful part of that
> speedup, or perhaps compilers and processors have changed since then.
> Here are times for computing the sha1 of 4GB of random data, with and
> without -DNO_UNALIGNED_LOADS (and BLK_SHA1=1, of course). This is with
> gcc 10, -O2, and the processor is a Core i9-9880H.
> 
>   [stock]
>   Benchmark #1: t/helper/test-tool sha1 <foo.rand
>     Time (mean ± σ):      6.638 s ±  0.081 s    [User: 6.269 s, System: 0.368 s]
>     Range (min … max):    6.550 s …  6.841 s    10 runs
> 
>   [-DNO_UNALIGNED_LOADS]
>   Benchmark #1: t/helper/test-tool sha1 <foo.rand
>     Time (mean ± σ):      6.418 s ±  0.015 s    [User: 6.058 s, System: 0.360 s]
>     Range (min … max):    6.394 s …  6.447 s    10 runs
> 
> And here's the same test run on an AMD A8-7600, using gcc 8.
> 
>   [stock]
>   Benchmark #1: t/helper/test-tool sha1 <foo.rand
>     Time (mean ± σ):     11.721 s ±  0.113 s    [User: 10.761 s, System: 0.951 s]
>     Range (min … max):   11.509 s … 11.861 s    10 runs
> 
>   [-DNO_UNALIGNED_LOADS]
>   Benchmark #1: t/helper/test-tool sha1 <foo.rand
>     Time (mean ± σ):     11.744 s ±  0.066 s    [User: 10.807 s, System: 0.928 s]
>     Range (min … max):   11.637 s … 11.863 s    10 runs

I think this is a fine and desirable change, both for performance and
correctness.  It is, as usual, well explained.

> So the unaligned loads don't seem to help much, and actually make things
> worse. It's possible there are platforms where they provide more
> benefit, but:
> 
>   - the non-x86 platforms for which we use this code are old and obscure
>     (powerpc and s390).

I cannot speak for s390, since I have never owned one, but my
understanding of unaligned access is that there is typically a tiny
penalty on x86 (about a cycle) and a more significant penalty on
PowerPC, although that may have changed with newer POWER chips.  So my
gut tells me this is an improvement either way, although I no longer own
any bootable hardware of that kind to measure it for certain.

Anyway, as René found, the latest versions of GCC already use the
peephole optimizer to recognize the byte-at-a-time load pattern and
optimize it on x86, so I expect they'll do so on other architectures as
well.  Byte swapping is a pretty common operation.
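
To make the pattern concrete, here is a minimal sketch of an
alignment-safe big-endian load and store built from single-byte
accesses. The names get_be32_bytewise and put_be32_bytewise are
hypothetical and chosen for illustration; this is not the exact code in
bswap.h. Whether a given compiler collapses the shifts into a single
load plus byte swap depends on the compiler version and target.

```c
#include <stdint.h>

/*
 * Hypothetical helper: load a 32-bit big-endian value one byte at a
 * time. No access is wider than a byte, so the pointer may have any
 * alignment and the result does not depend on host endianness.
 */
static inline uint32_t get_be32_bytewise(const void *ptr)
{
	const unsigned char *p = ptr;
	return (uint32_t)p[0] << 24 |
	       (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] <<  8 |
	       (uint32_t)p[3];
}

/* Hypothetical helper: store a 32-bit value in big-endian byte order. */
static inline void put_be32_bytewise(void *ptr, uint32_t value)
{
	unsigned char *p = ptr;
	p[0] = (unsigned char)(value >> 24);
	p[1] = (unsigned char)(value >> 16);
	p[2] = (unsigned char)(value >>  8);
	p[3] = (unsigned char)value;
}
```

For example, if buf holds the bytes 00 01 02 03 04, then
get_be32_bytewise(buf + 1) yields 0x01020304 even though buf + 1 is not
4-byte aligned.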

>   - the main caller that cares about performance is block-sha1. But
>     these days it is rarely used anyway, in favor of sha1dc (which is
>     already much slower, and nobody seems to have cared that much).

I think block-sha256 uses it as well, but in any case, it's still faster
than sha1dc and people who care desperately about performance will use a
crypto library instead.
-- 
brian m. carlson: Houston, Texas, US
