From: "George Spelvin" <linux@horizon.com>
To: linux@horizon.com, nico@cam.org
Cc: art.08.09@gmail.com, bdonlan@gmail.com, git@vger.kernel.org,
johnflux@gmail.com, P@draigBrady.com,
torvalds@linux-foundation.org
Subject: Re: Linus' sha1 is much faster!
Date: 17 Aug 2009 14:54:48 -0400 [thread overview]
Message-ID: <20090817185448.30254.qmail@science.horizon.com> (raw)
In-Reply-To: <alpine.LFD.2.00.0908171228570.6044@xanadu.home>
> I don't think this trick of making source code textually different from
> another work while still intimately mimicking the same structure entitles
> you to any copyright (or non copyright) claims over that other work. I
> certainly wouldn't bet any dime for this standing up in court.
<div type="legal digression">
Actually, I would. I did a lot more than text search and replace;
I re-implemented it from FIPS 180-2 (work of U.S. government, no copyright)
and then merged in the *ideas* from the mailing list.
(And from elsewhere; the idea of a five-round macro is from Brian Gladman.)
Remember, to the extent that something is *functional*, it is not
copyrightable; copyright only covers the non-functional expressive bits.
The vast majority of that code is simply required by the standard,
or the desired calling interface.
For a large portion of the rest, remember that standard programming
conventions (e.g. brace style, macro names IN CAPS, etc.) that's also
non-copyrightable "scene a faire" material.
It's well established that paraphrasing a recipe avoids copyright;
the proportions and treatment of the ingredients is not copyrightable.
For more details, see the extensive coverage of the NEC v. Intel decision
(1989) regarding the firmware for NEC's 8086-clone V20 microprocessor.
It was found non-infringing despite non-clean-room implementation and
substantial similarities.
</div>
> It is probably much safer to simply ask the people involved to agree
> with your relicensing. And so far I don't see anyone with a stake in
> this fiercely wanting to stick to a particular license.
As for politeness, that's exactly why I did post it and solicit
objections. The purpose of the rewrite is to avoid having to make
pessimistic assumptions about people who don't respond.
I suppose I should have made that request clearer:
Is there anyone who claims copyright on anything here?
Or would just like credit?
If so, are you willing to donate it to the public domain?
I like the GPL, but some code is generally applicable enough, and unlikely
to be improved on by others enough, that I'd rather just let everyone
use the same version.
> It's not quite 100% finished, as I haven't benchmarked it against Linus's
> code yet, but it's functionally correct.
>
> It's also clean with -W -Wall -Wextra.
> Not if you try with the unaligned put_be32() as the destination pointer
> is marked const.
Oops. And the "||" in get_be32() doesn't help either.
There was another major bug I found, too.
And I'm still trying to match "linusv" performance.
> As to the actual result on ARM... Well, the assembly _looks_ much worse
> than Linus' version. It uses a stack frame of 152 bytes instead of 64
> bytes. The resulting binary is also 6868 bytes large compared to 6180
> bytes. Surprisingly, the performance is not that bad (the reason for
> the underlined "looks" above) albeit still a bit worse, like 5% slower.
> I was expecting much worse than that.
Thanks, I'll work on it.
> Also, wrt skipping the last 3 write back to the 16 word array... For
> all the (limited) attempts I've made so far to do that, it always ended
> up making things worse. I've yet to investigate why though.
Thanks. That's certainly surprising!
If anyone cares, my current code (adapted to fit sha1bench) is appended.
/*
* Secure Hash Algorith SHA-1, as published in FIPS PUB 180-2.
*
* This implementation is in the public domain. Copyright abandoned.
* You may do anything you like with it, including evil things.
*
* This is a rewrite from scratch, based on Linus Torvalds' "block-sha1"
* from the git mailing list (August, 2009). Additional optimization
* ideas cribbed from
* - Artur Skawina (x86, particularly P4, and much benchmarking)
* - Nicolas Pitre (ARM)
*
* Contributors to this code are requested to donate their changes to
* the public domain to avoid version skew. If you don't want to, be
* sure to change the copyright waiver above.
*/
#include <string.h> /* For memcpy() */
#include <arpa/inet.h> /* For ntohl() */
#include "pdsha1.h"
#define SHA_CTX pd_SHA_CTX
#define SHA_context pd_SHA_context
#define SHA1_Init pd_SHA1_Init
#define SHA1_Update pd_SHA1_Update
#define SHA1_Final pd_SHA1_Final
static void sha1_core(uint32_t iv[5], unsigned char const *p);
void SHA1_Init(struct SHA_context *c)
{
/* This is a prefix of the SHA_context structure */
static struct {
uint64_t len;
uint32_t iv[5];
} const iv = { 0,
{ 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 }
};
memcpy(c, &iv, sizeof iv);
}
void SHA1_Update(struct SHA_context *c, void const *p, size_t n)
{
size_t pos = c->len & 63; /* Offset into current input block */
size_t nblocks;
c->len += n;
/* Initial partial block (if any) */
if (pos) {
size_t space = 63 - pos;
if (n < space)
goto end;
memcpy(c->buf + pos, p, space);
sha1_core(c->iv, c->buf);
n -= space;
p = (char const *)p + space;
}
/* The large middle piece */
nblocks = n >> 6;
while (nblocks--) {
sha1_core(c->iv, p);
p += 64;
}
n &= 63;
pos = 0;
end:
/* Final partial block (may be zero size) */
memcpy(c->buf + pos, p, n);
}
/* Machine specific hacks */
#if defined(__i386__) || defined(__x86_64__) || defined (__ppc__) || \
defined(__ppc64__) || defined(__powerpc__) || defined (__powerpc64__) || \
defined(__s390__) || defined(__s390x__)
/* Unaligned access is okay */
#define GET_BE32(p) ntohl(*(uint32_t const *)(p))
#define PUT_BE32(p,v) (*(uint32_t *)(p) = htonl(v))
#else
/*
* Unaligned access is not okay; do conversion as byte fetches
* These macros are IN_CAPS, as they are not side-effect safe
* The argument p must be an "unsigned char *" or "uint8_t *"
*/
#define GET_BE32(p) \
((uint32_t)(p)[0]<<24 | (p)[1]<<16 | (p)[2]<<8 | (p)[3])
#define PUT_BE32(p,v) \
( (p)[0] = (v)>>24, (p)[1] = (v)>>16, (p)[2] = (v)>>8, (p)[3] = (v) )
#endif
void SHA1_Final(unsigned char hash[20], struct SHA_context *c)
{
size_t pos = c->len & 63;
unsigned i;
/* Append a single 1 bit */
c->buf[pos++] = 0x80;
/* Append 0 bits until 64 bits remain in a block */
if (pos > 56) {
memset(c->buf + pos, 0, 64 - pos);
sha1_core(c->iv, c->buf);
pos = 0;
}
memset(c->buf + pos, 0, 56 - pos);
/* Append total input length in bits */
((uint32_t *)c->buf)[14] = htonl((uint32_t)(c->len >> 29));
((uint32_t *)c->buf)[15] = htonl((uint32_t)c->len << 3);
/* Final hash round */
sha1_core(c->iv, c->buf);
/* Copy hash result out */
for (i = 0; i < 5; i++)
PUT_BE32(hash + 4*i, c->iv[i]);
}
/*
* Helper macros for sha1_core function. To avoid clutter, these macros
* are NOT fully parenthesized if it doesn't matter to the actual use.
*/
#if __GNUC__ && (defined(__i386__) || defined(__x86_64__))
/*
* GCC by itself only generates left rotates. Use right rotates if
* possible to be kinder to dinky implementations with iterative rotate
* instructions.
*/
#define ROT(op, x, k) \
({ uint32_t y; __asm__(op " %1,%0" : "=r" (y) : "I" (k), "0" (x)); y; })
#define ROL(x,k) ROT("roll", x, k)
#define ROR(x,k) ROT("rorl", x, k)
#else
/* Generic C equivalent */
#define ROT(x,l,r) ((x) << (l) | (x) >> (r))
#define ROL(x,k) ROT(x,k,32-(k))
#define ROR(x,k) ROT(x,32-(k),k)
#endif
/*
* An important temporary array in SHA-1 is the working array W[],
* which holds a scheduled key word per round. Only the last 16 words
* are relevant, so only use 16 words...
*/
#define W(i) w[(i) & 15]
/*
* If you have a small register set, it helps GCC to force stores to
* the w[] array to memory. Given 22 or more registers (e.g. PowerPC),
* GCC can map the entire w[] array to registers and this becomes
* counterproductive.
*
* The optimal kludge seems to differ between x86 and ARM. On the latter,
* forcing a full memory barrier is required to stop GCC from splitting
* live ranges for each round and generating a separate stack slot for
* each. (Which produces a huge stack frame and kills performance.)
*/
#if __GNUC__ && defined(__i386__) || defined(__x86_64__)
#define STORE(i, x) W(i) = x; __asm__ volatile("" : "+m" (W(x)))
#elif defined(__i386__) || defined(__x86_64__)
#define STORE(i, x) *(volatile uint32_t *)&W(i) = x
#elif __GNUC__ && defined(__arm__)
#define STORE(i, x) W(i) = x; __asm__("":::"memory")
#else
#define STORE(i, x) W(i) = x
#endif
/* The three round functions. F2 is also used as F4 */
#define F1(b,c,d) (((d ^ c) & b) ^ d) /* Bitwise b ? c : d */
#define F2(b,c,d) (d ^ c ^ b) /* Even parity */
#define F3(b,c,d) (d & c) + ((d ^ c) & b) /* Majority function */
/* The four round constants */
#define K1 0x5a827999 /* 2^30 * sqrt(2) */
#define K2 0x6ed9eba1 /* 2^30 * sqrt(3) */
#define K3 0x8f1bbcdc /* 2^30 * sqrt(5) */
#define K4 0xca62c1d6 /* 2^30 * sqrt(10) */
/* Rounds 0..15 fetch a word from the input */
#define FETCH(t,i) t = GET_BE32(p + 4*(i)); STORE(i,t)
//#define FETCH(t,i) t = W(i)
/* Rounds 16..79 mix previous words to get a new one */
#define MIX(t,i) t = W(i) ^ W(i+2) ^ W(i+8) ^ W(i+13); t = ROL(t, 1)
/* Rounds 16..76 have to store back the result */
#define CALC(t,i) MIX(t,i); STORE(i,t)
/* The basic SHA-1 round */
#if 0
#define SHA_ROUND(a,b,c,d,e,f,k,src,t,i) \
src(t,i); \
e += t + f(b,c,d) + k + ROL(a,5); \
b = ROR(b,2)
#elif 0
#define SHA_ROUND(a,b,c,d,e,f,k,src,t,i) do { \
uint32_t t; \
src(t,i); \
e += f(b,c,d); \
e += t + k; \
b = ROR(b,2); \
e += ROL(a,5); } while (0)
#else
#define SHA_ROUND(a,b,c,d,e,f,k,src,t,i) do { \
uint32_t t = ROL(a,5); \
e += f(b,c,d); \
e += k + t; \
src(t,i); \
b = ROR(b,2); \
e += t; } while(0)
#endif
/* An aligned group of 5 rounds */
#define SHA_ROUND5(f,k,src,t,i) \
SHA_ROUND(a,b,c,d,e, f,k,src,t,i); \
SHA_ROUND(e,a,b,c,d, f,k,src,t,i+1); \
SHA_ROUND(d,e,a,b,c, f,k,src,t,i+2); \
SHA_ROUND(c,d,e,a,b, f,k,src,t,i+3); \
SHA_ROUND(b,c,d,e,a, f,k,src,t,i+4)
/*
* The core SHA-1 transform.
*
* iv[5] = Current SHA-1 hash state.
* p = Pointer to 64 bytes of source data. Not necessarily aligned.
*/
static void
sha1_core(uint32_t iv[5], unsigned char const *p)
{
//uint32_t e = iv[4], d = iv[3], c = iv[2], b = iv[1], a = iv[0];
uint32_t a = iv[0], b = iv[1], c = iv[2], d = iv[3], e = iv[4];
uint32_t w[16];
#if 0
{
int i;
for (i = 0; i < 16; i++) {
STORE(i, get_be32(p + 4*i));
}
}
#endif
SHA_ROUND5(F1, K1, FETCH, t, 0);
SHA_ROUND5(F1, K1, FETCH, t, 5);
SHA_ROUND5(F1, K1, FETCH, t, 10);
SHA_ROUND(a,b,c,d,e, F1, K1, FETCH, t, 15);
SHA_ROUND(e,a,b,c,d, F1, K1, CALC, t, 16);
SHA_ROUND(d,e,a,b,c, F1, K1, CALC, t, 17);
SHA_ROUND(c,d,e,a,b, F1, K1, CALC, t, 18);
SHA_ROUND(b,c,d,e,a, F1, K1, CALC, t, 19);
SHA_ROUND5(F2, K2, CALC, t, 20);
SHA_ROUND5(F2, K2, CALC, t, 25);
SHA_ROUND5(F2, K2, CALC, t, 30);
SHA_ROUND5(F2, K2, CALC, t, 35);
SHA_ROUND5(F3, K3, CALC, t, 40);
SHA_ROUND5(F3, K3, CALC, t, 45);
SHA_ROUND5(F3, K3, CALC, t, 50);
SHA_ROUND5(F3, K3, CALC, t, 55);
SHA_ROUND5(F2, K4, CALC, t, 60);
SHA_ROUND5(F2, K4, CALC, t, 65);
SHA_ROUND5(F2, K4, CALC, t, 70);
SHA_ROUND(a,b,c,d,e, F2, K4, CALC, t, 75);
SHA_ROUND(e,a,b,c,d, F2, K4, CALC, t, 76);
/* Last 3 rounds don't need to store mixed value */
SHA_ROUND(d,e,a,b,c, F2, K4, MIX, t, 77);
SHA_ROUND(c,d,e,a,b, F2, K4, MIX, t, 78);
SHA_ROUND(b,c,d,e,a, F2, K4, MIX, t, 79);
iv[0] += a;
iv[1] += b;
iv[2] += c;
iv[3] += d;
iv[4] += e;
}
/* Compile with -DUNITTEST for self-tests */
#if UNITTEST
#include <stdio.h>
#include <stdlib.h>
/* Known answer test */
static void katest(char const *text, size_t len, unsigned char const hash[20])
{
SHA_CTX c;
unsigned char hash2[20];
int i;
SHA1_Init(&c);
SHA1_Update(&c, text, len);
SHA1_Final(hash2, &c);
for (i = 0; i < 20; i++)
if (hash[i] != hash2[i])
goto mismatch;
printf("%u-byte known answer test PASSED\n", len);
return;
mismatch:
printf("%u-byte known answer test FAILED:\n", len);
if (len < 64)
printf("Input: \"%.*s\"\n", len, text);
else
printf("Input: \"%.64s\"...\n", text);
printf("Computed:");
for (i = 0; i < 20; i++)
printf(" %02x", hash2[i]);
printf("\nExpected:");
for (i = 0; i < 20; i++)
printf(" %02x", hash[i]);
putchar('\n');
}
int
main(void)
{
/* FIPS PUB 180.1 example A.1 */
static char const text1[3] = "abc";
static unsigned char const hash1[20] = {
0xa9, 0x99, 0x3e, 0x36, 0x47, 0x06, 0x81, 0x6a, 0xba, 0x3e,
0x25, 0x71, 0x78, 0x50, 0xc2, 0x6c, 0x9c, 0xd0, 0xd8, 0x9d };
/* FIPS PUB 180.1 example A.2 */
static char const text2[56] =
"abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq";
static unsigned char const hash2[20] = {
0x84, 0x98, 0x3e, 0x44, 0x1c, 0x3b, 0xd2, 0x6e, 0xba, 0xae,
0x4a, 0xa1, 0xf9, 0x51, 0x29, 0xe5, 0xe5, 0x46, 0x70, 0xf1 };
/* FIPS PUB 180.1 example A.3 */
char *text3;
static unsigned char const hash3[20] = {
0x34, 0xaa, 0x97, 0x3c, 0xd4, 0xc4, 0xda, 0xa4, 0xf6, 0x1e,
0xeb, 0x2b, 0xdb, 0xad, 0x27, 0x31, 0x65, 0x34, 0x01, 0x6f };
katest(text1, sizeof text1, hash1);
katest(text2, sizeof text2, hash2);
text3 = malloc(1000000);
memset(text3, 'a', 1000000);
katest(text3, 1000000, hash3);
return 0;
}
#endif
next prev parent reply other threads:[~2009-08-17 18:55 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-17 7:23 Linus' sha1 is much faster! George Spelvin
2009-08-17 14:20 ` Nicolas Pitre
2009-08-17 17:06 ` Nicolas Pitre
2009-08-17 17:20 ` Paolo Bonzini
2009-08-17 18:54 ` George Spelvin [this message]
2009-08-17 19:34 ` Nicolas Pitre
2009-08-17 23:12 ` George Spelvin
-- strict thread matches above, loose matches on Subject: below --
2009-08-14 23:25 Pádraig Brady
2009-08-15 20:02 ` Bryan Donlan
2009-08-15 20:12 ` John Tapsell
2009-08-15 20:23 ` Linus Torvalds
2009-08-15 20:54 ` Linus Torvalds
2009-08-17 1:55 ` Nicolas Pitre
2009-08-26 11:39 ` Pádraig Brady
2017-04-20 21:35 ` galt
2017-04-20 21:38 ` galt
2009-08-17 8:22 ` Andreas Ericsson
2009-08-16 0:06 ` Theodore Tso
2009-08-16 19:25 ` Giuseppe Scrivano
2009-08-16 20:10 ` Linus Torvalds
2009-08-16 22:15 ` Giuseppe Scrivano
2009-08-16 22:47 ` Linus Torvalds
2009-08-17 1:53 ` Pádraig Brady
2009-08-17 10:51 ` Giuseppe Scrivano
2009-08-17 15:44 ` Steven Noonan
2009-08-17 16:22 ` Linus Torvalds
2009-08-17 21:43 ` Steven Noonan
2009-08-17 17:32 ` Giuseppe Scrivano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090817185448.30254.qmail@science.horizon.com \
--to=linux@horizon.com \
--cc=P@draigBrady.com \
--cc=art.08.09@gmail.com \
--cc=bdonlan@gmail.com \
--cc=git@vger.kernel.org \
--cc=johnflux@gmail.com \
--cc=nico@cam.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).