git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v4 00/12] Base SHA-256 implementation
@ 2018-10-25  2:39 brian m. carlson
  2018-10-25  2:39 ` [PATCH v4 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
                   ` (12 more replies)
  0 siblings, 13 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:39 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

This series provides a functional SHA-256 implementation and wires it
up, along with some housekeeping patches to make it suitable for
testing.

While I was fixing the macros, I wondered if I could make the code a bit
cleaner by using inline functions.  I tried it and found that not only
did it make the code cleaner, it made the code significantly faster
across all sizes of output, with larger gains on larger chunks (e.g.,
214 MiB/s on 16 KiB chunks vs 151 MiB/s).  I'm unsure why this effect
occurs, but I figured nobody would complain about improved performance.

Changes from v3:
* Switch to using inline functions instead of macros in many cases.
* Undefine remaining macros at the top.

Changes from v2:
* Improve commit messages to include timing and performance information.
* Improve commit messages to be less ambiguous and more friendly to a
  wider variety of English speakers.
* Prefer functions taking struct git_hash_algo in hex.c.
* Port pieces of the block-sha1 implementation over to the block-sha256
  implementation for better compatibility.
* Drop patch 13 in favor of further discussion about the best way
  forward for versioning commit graph.
* Rename the test so as to have a different number from other tests.
* Rebase on master.

Changes from v1:
* Add a hash_to_hex function mirroring sha1_to_hex, but for
  the_hash_algo.
* Strip commit message explanation about why we chose SHA-256.
* Rebase on master
* Strip leading whitespace from commit message.
* Improve commit-graph patch to cover new code added since v1.
* Be more honest about the scope of work involved in porting the SHA-256
  implementation out of libtomcrypt.
* Revert change to limit hashcmp to 20 bytes.

brian m. carlson (12):
  sha1-file: rename algorithm to "sha1"
  sha1-file: provide functions to look up hash algorithms
  hex: introduce functions to print arbitrary hashes
  cache: make hashcmp and hasheq work with larger hashes
  t: add basic tests for our SHA-1 implementation
  t: make the sha1 test-tool helper generic
  sha1-file: add a constant for hash block size
  t/helper: add a test helper to compute hash speed
  commit-graph: convert to using the_hash_algo
  Add a base implementation of SHA-256 support
  sha256: add an SHA-256 implementation using libgcrypt
  hash: add an SHA-256 implementation using OpenSSL

 Makefile                              |  22 +++
 cache.h                               |  51 ++++---
 commit-graph.c                        |  33 +++--
 hash.h                                |  41 +++++-
 hex.c                                 |  32 +++-
 sha1-file.c                           |  70 ++++++++-
 sha256/block/sha256.c                 | 201 ++++++++++++++++++++++++++
 sha256/block/sha256.h                 |  26 ++++
 sha256/gcrypt.h                       |  30 ++++
 t/helper/test-hash-speed.c            |  61 ++++++++
 t/helper/{test-sha1.c => test-hash.c} |  19 +--
 t/helper/test-sha1.c                  |  52 +------
 t/helper/test-sha256.c                |   7 +
 t/helper/test-tool.c                  |   2 +
 t/helper/test-tool.h                  |   4 +
 t/t0015-hash.sh                       |  54 +++++++
 16 files changed, 601 insertions(+), 104 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 sha256/gcrypt.h
 create mode 100644 t/helper/test-hash-speed.c
 copy t/helper/{test-sha1.c => test-hash.c} (65%)
 create mode 100644 t/helper/test-sha256.c
 create mode 100755 t/t0015-hash.sh

Range-diff against v3:
 1:  a004a4c982 <  -:  ---------- :hash-impl
 2:  cf9f7f5620 =  1:  cf9f7f5620 sha1-file: rename algorithm to "sha1"
 3:  0144deaebe =  2:  0144deaebe sha1-file: provide functions to look up hash algorithms
 4:  b74858fb03 =  3:  b74858fb03 hex: introduce functions to print arbitrary hashes
 5:  e9703017a4 =  4:  e9703017a4 cache: make hashcmp and hasheq work with larger hashes
 6:  ab85a834fd =  5:  ab85a834fd t: add basic tests for our SHA-1 implementation
 7:  962f6d8903 =  6:  962f6d8903 t: make the sha1 test-tool helper generic
 8:  53addf4d58 =  7:  53addf4d58 sha1-file: add a constant for hash block size
 9:  9ace10faa2 =  8:  9ace10faa2 t/helper: add a test helper to compute hash speed
10:  9adc56d01e =  9:  9adc56d01e commit-graph: convert to using the_hash_algo
11:  8e82cb0dfb ! 10:  f48cb1ad27 Add a base implementation of SHA-256 support
    @@ -207,6 +207,9 @@
     +#include "git-compat-util.h"
     +#include "./sha256.h"
     +
    ++#undef RND
    ++#undef BLKSIZE
    ++
     +#define BLKSIZE blk_SHA256_BLKSIZE
     +
     +void blk_SHA256_Init(blk_SHA256_CTX *ctx)
    @@ -228,14 +231,35 @@
     +	return (x >> n) | (x << (32 - n));
     +}
     +
    -+#define Ch(x,y,z)       (z ^ (x & (y ^ z)))
    -+#define Maj(x,y,z)      (((x | y) & z) | (x & y))
    -+#define S(x, n)         ror((x),(n))
    -+#define R(x, n)         ((x)>>(n))
    -+#define Sigma0(x)       (S(x, 2) ^ S(x, 13) ^ S(x, 22))
    -+#define Sigma1(x)       (S(x, 6) ^ S(x, 11) ^ S(x, 25))
    -+#define Gamma0(x)       (S(x, 7) ^ S(x, 18) ^ R(x, 3))
    -+#define Gamma1(x)       (S(x, 17) ^ S(x, 19) ^ R(x, 10))
    ++static inline uint32_t ch(uint32_t x, uint32_t y, uint32_t z)
    ++{
    ++	return (z ^ (x & (y ^ z)));
    ++}
    ++
    ++static inline uint32_t maj(uint32_t x, uint32_t y, uint32_t z)
    ++{
    ++	return (((x | y) & z) | (x & y));
    ++}
    ++
    ++static inline uint32_t sigma0(uint32_t x)
    ++{
    ++	return ror(x, 2) ^ ror(x, 13) ^ ror(x, 22);
    ++}
    ++
    ++static inline uint32_t sigma1(uint32_t x)
    ++{
    ++	return ror(x, 6) ^ ror(x, 11) ^ ror(x, 25);
    ++}
    ++
    ++static inline uint32_t gamma0(uint32_t x)
    ++{
    ++	return ror(x, 7) ^ ror(x, 18) ^ (x >> 3);
    ++}
    ++
    ++static inline uint32_t gamma1(uint32_t x)
    ++{
    ++	return ror(x, 17) ^ ror(x, 19) ^ (x >> 10);
    ++}
     +
     +static void blk_SHA256_Transform(blk_SHA256_CTX *ctx, const unsigned char *buf)
     +{
    @@ -255,12 +279,12 @@
     +
     +	/* fill W[16..63] */
     +	for (i = 16; i < 64; i++) {
    -+		W[i] = Gamma1(W[i - 2]) + W[i - 7] + Gamma0(W[i - 15]) + W[i - 16];
    ++		W[i] = gamma1(W[i - 2]) + W[i - 7] + gamma0(W[i - 15]) + W[i - 16];
     +	}
     +
     +#define RND(a,b,c,d,e,f,g,h,i,ki)                    \
    -+	t0 = h + Sigma1(e) + Ch(e, f, g) + ki + W[i];   \
    -+	t1 = Sigma0(a) + Maj(a, b, c);                  \
    ++	t0 = h + sigma1(e) + ch(e, f, g) + ki + W[i];   \
    ++	t1 = sigma0(a) + maj(a, b, c);                  \
     +	d += t0;                                        \
     +	h  = t0 + t1;
     +
    @@ -329,15 +353,6 @@
     +	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],62,0xbef9a3f7);
     +	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],63,0xc67178f2);
     +
    -+#undef RND
    -+#undef Ch
    -+#undef Maj
    -+#undef S
    -+#undef R
    -+#undef Sigma0
    -+#undef Sigma1
    -+#undef Gamma0
    -+#undef Gamma1
     +
     +	for (i = 0; i < 8; i++) {
     +		ctx->state[i] = ctx->state[i] + S[i];
12:  9e0061bd74 = 11:  fe8f2ba01c sha256: add an SHA-256 implementation using libgcrypt
13:  128d6b8150 = 12:  38142d8fc6 hash: add an SHA-256 implementation using OpenSSL

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v4 01/12] sha1-file: rename algorithm to "sha1"
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
@ 2018-10-25  2:39 ` brian m. carlson
  2018-10-25  2:39 ` [PATCH v4 02/12] sha1-file: provide functions to look up hash algorithms brian m. carlson
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:39 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

The transition plan anticipates us using a syntax such as "^{sha1}" for
disambiguation.  Since this is a syntax some people will be typing a
lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
the dash doesn't create any ambiguity; however, it does make the syntax
shorter and easier to type, especially for touch typists.  In addition,
the transition plan already uses "sha1" in this context.

Rename the name of SHA-1 implementation to "sha1".

Note that this change creates no backwards compatibility concerns, since
we haven't yet used this field in any configuration settings.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 sha1-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..91311ebb3d 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -97,7 +97,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		NULL,
 	},
 	{
-		"sha-1",
+		"sha1",
 		/* "sha1", big-endian */
 		0x73686131,
 		GIT_SHA1_RAWSZ,

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v4 02/12] sha1-file: provide functions to look up hash algorithms
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
  2018-10-25  2:39 ` [PATCH v4 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
@ 2018-10-25  2:39 ` brian m. carlson
  2018-10-25  2:39 ` [PATCH v4 03/12] hex: introduce functions to print arbitrary hashes brian m. carlson
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:39 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

There are several ways we might refer to a hash algorithm: by name, such
as in the config file; by format ID, such as in a pack; or internally,
by a pointer to the hash_algos array.  Provide functions to look up hash
algorithms based on these various forms and return the internal constant
used for them.  If conversion to another form is necessary, this
internal constant can be used to look up the proper data in the
hash_algos array.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h      | 13 +++++++++++++
 sha1-file.c | 21 +++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/hash.h b/hash.h
index 7c8238bc2e..80881eea47 100644
--- a/hash.h
+++ b/hash.h
@@ -98,4 +98,17 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
+/*
+ * Return a GIT_HASH_* constant based on the name.  Returns GIT_HASH_UNKNOWN if
+ * the name doesn't match a known algorithm.
+ */
+int hash_algo_by_name(const char *name);
+/* Identical, except based on the format ID. */
+int hash_algo_by_id(uint32_t format_id);
+/* Identical, except for a pointer to struct git_hash_algo. */
+static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+{
+	return p - hash_algos;
+}
+
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index 91311ebb3d..7e9dedc744 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -122,6 +122,27 @@ const char *empty_blob_oid_hex(void)
 	return oid_to_hex_r(buf, the_hash_algo->empty_blob);
 }
 
+int hash_algo_by_name(const char *name)
+{
+	int i;
+	if (!name)
+		return GIT_HASH_UNKNOWN;
+	for (i = 1; i < GIT_HASH_NALGOS; i++)
+		if (!strcmp(name, hash_algos[i].name))
+			return i;
+	return GIT_HASH_UNKNOWN;
+}
+
+int hash_algo_by_id(uint32_t format_id)
+{
+	int i;
+	for (i = 1; i < GIT_HASH_NALGOS; i++)
+		if (format_id == hash_algos[i].format_id)
+			return i;
+	return GIT_HASH_UNKNOWN;
+}
+
+
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v4 03/12] hex: introduce functions to print arbitrary hashes
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
  2018-10-25  2:39 ` [PATCH v4 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
  2018-10-25  2:39 ` [PATCH v4 02/12] sha1-file: provide functions to look up hash algorithms brian m. carlson
@ 2018-10-25  2:39 ` brian m. carlson
  2018-10-25  2:39 ` [PATCH v4 04/12] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:39 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer.  Add variants of these functions that can handle
an arbitrary hash algorithm, specified by constant.  Update the
documentation as well.

While we're at it, remove the "extern" declaration from this family of
functions, since it's not needed and our style now recommends against
it.

We use the variant taking the algorithm structure pointer as the
internal variant, since taking an algorithm pointer is the easiest way
to handle all of the variants in use.

Note that we maintain these functions because there are hashes which
must change based on the hash algorithm in use but are not object IDs
(such as pack checksums).

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 15 +++++++++------
 hex.c   | 32 ++++++++++++++++++++++++--------
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index 59c8a93046..51580c4b77 100644
--- a/cache.h
+++ b/cache.h
@@ -1364,9 +1364,9 @@ extern int get_oid_hex(const char *hex, struct object_id *sha1);
 extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
 
 /*
- * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
+ * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
  * and writes the NUL-terminated output to the buffer `out`, which must be at
- * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
+ * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
  * convenience.
  *
  * The non-`_r` variant returns a static buffer, but uses a ring of 4
@@ -1374,10 +1374,13 @@ extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
  *
  *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
  */
-extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
-extern char *oid_to_hex_r(char *out, const struct object_id *oid);
-extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
-extern char *oid_to_hex(const struct object_id *oid);	/* same static buffer as sha1_to_hex */
+char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *);
+char *sha1_to_hex_r(char *out, const unsigned char *sha1);
+char *oid_to_hex_r(char *out, const struct object_id *oid);
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *);	/* static buffer result! */
+char *sha1_to_hex(const unsigned char *sha1);						/* same static buffer */
+char *hash_to_hex(const unsigned char *hash);						/* same static buffer */
+char *oid_to_hex(const struct object_id *oid);						/* same static buffer */
 
 /*
  * Parse a 40-character hexadecimal object ID starting from hex, updating the
diff --git a/hex.c b/hex.c
index 10af1a29e8..d2e8bb9540 100644
--- a/hex.c
+++ b/hex.c
@@ -73,14 +73,15 @@ int parse_oid_hex(const char *hex, struct object_id *oid, const char **end)
 	return ret;
 }
 
-char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+inline char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
+					const struct git_hash_algo *algop)
 {
 	static const char hex[] = "0123456789abcdef";
 	char *buf = buffer;
 	int i;
 
-	for (i = 0; i < the_hash_algo->rawsz; i++) {
-		unsigned int val = *sha1++;
+	for (i = 0; i < algop->rawsz; i++) {
+		unsigned int val = *hash++;
 		*buf++ = hex[val >> 4];
 		*buf++ = hex[val & 0xf];
 	}
@@ -89,20 +90,35 @@ char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 	return buffer;
 }
 
-char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 {
-	return sha1_to_hex_r(buffer, oid->hash);
+	return hash_to_hex_algop_r(buffer, sha1, &hash_algos[GIT_HASH_SHA1]);
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
+char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+{
+	return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+}
+
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
 {
 	static int bufno;
 	static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
 	bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-	return sha1_to_hex_r(hexbuffer[bufno], sha1);
+	return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
+}
+
+char *sha1_to_hex(const unsigned char *sha1)
+{
+	return hash_to_hex_algop(sha1, &hash_algos[GIT_HASH_SHA1]);
+}
+
+char *hash_to_hex(const unsigned char *hash)
+{
+	return hash_to_hex_algop(hash, the_hash_algo);
 }
 
 char *oid_to_hex(const struct object_id *oid)
 {
-	return sha1_to_hex(oid->hash);
+	return hash_to_hex_algop(oid->hash, the_hash_algo);
 }

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v4 04/12] cache: make hashcmp and hasheq work with larger hashes
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
                   ` (2 preceding siblings ...)
  2018-10-25  2:39 ` [PATCH v4 03/12] hex: introduce functions to print arbitrary hashes brian m. carlson
@ 2018-10-25  2:39 ` brian m. carlson
  2018-10-25  2:39 ` [PATCH v4 05/12] t: add basic tests for our SHA-1 implementation brian m. carlson
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:39 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

In 183a638b7d ("hashcmp: assert constant hash size", 2018-08-23), we
modified hashcmp to assert that the hash size was always 20 to help it
optimize and inline calls to memcmp.  In a future series, we replaced
many calls to hashcmp and oidcmp with calls to hasheq and oideq to
improve inlining further.

However, we want to support hash algorithms other than SHA-1, namely
SHA-256.  When doing so, we must handle the case where these values are
32 bytes long as well as 20.  Adjust hashcmp to handle two cases:
20-byte matches, and maximum-size matches.  Therefore, when we include
SHA-256, we'll automatically handle it properly, while at the same time
teaching the compiler that there are only two possible options to
consider.  This will allow the compiler to write the most efficient
possible code.

Copy similar code into hasheq and perform an identical transformation.
At least with GCC 8.2.0, making hasheq defer to hashcmp when there are
two branches prevents the compiler from inlining the comparison, while
the code in this patch is inlined properly.  Add a comment to avoid an
accidental performance regression from well-intentioned refactoring.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index 51580c4b77..bab8e8964f 100644
--- a/cache.h
+++ b/cache.h
@@ -1027,16 +1027,12 @@ extern const struct object_id null_oid;
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
 	/*
-	 * This is a temporary optimization hack. By asserting the size here,
-	 * we let the compiler know that it's always going to be 20, which lets
-	 * it turn this fixed-size memcmp into a few inline instructions.
-	 *
-	 * This will need to be extended or ripped out when we learn about
-	 * hashes of different sizes.
+	 * Teach the compiler that there are only two possibilities of hash size
+	 * here, so that it can optimize for this case as much as possible.
 	 */
-	if (the_hash_algo->rawsz != 20)
-		BUG("hash size not yet supported by hashcmp");
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+		return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+	return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
@@ -1046,7 +1042,13 @@ static inline int oidcmp(const struct object_id *oid1, const struct object_id *o
 
 static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return !hashcmp(sha1, sha2);
+	/*
+	 * We write this here instead of deferring to hashcmp so that the
+	 * compiler can properly inline it and avoid calling memcmp.
+	 */
+	if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+		return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+	return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v4 05/12] t: add basic tests for our SHA-1 implementation
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
                   ` (3 preceding siblings ...)
  2018-10-25  2:39 ` [PATCH v4 04/12] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
@ 2018-10-25  2:39 ` brian m. carlson
  2018-10-25  2:39 ` [PATCH v4 06/12] t: make the sha1 test-tool helper generic brian m. carlson
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:39 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

We have in the past had some unfortunate endianness issues with some
SHA-1 implementations we ship, especially on big-endian machines.  Add
an explicit test using the test helper to catch these issues and point
them out prominently.  This test can also be used as a staging ground
for people testing additional algorithms to verify that their
implementations are working as expected.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t0015-hash.sh | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100755 t/t0015-hash.sh

diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
new file mode 100755
index 0000000000..8e763c2c3d
--- /dev/null
+++ b/t/t0015-hash.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='test basic hash implementation'
+. ./test-lib.sh
+
+
+test_expect_success 'test basic SHA-1 hash values' '
+	test-tool sha1 </dev/null >actual &&
+	grep da39a3ee5e6b4b0d3255bfef95601890afd80709 actual &&
+	printf "a" | test-tool sha1 >actual &&
+	grep 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 actual &&
+	printf "abc" | test-tool sha1 >actual &&
+	grep a9993e364706816aba3e25717850c26c9cd0d89d actual &&
+	printf "message digest" | test-tool sha1 >actual &&
+	grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
+	printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
+	grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
+	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
+		test-tool sha1 >actual &&
+	grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&
+	printf "blob 0\0" | test-tool sha1 >actual &&
+	grep e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 actual &&
+	printf "blob 3\0abc" | test-tool sha1 >actual &&
+	grep f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f actual &&
+	printf "tree 0\0" | test-tool sha1 >actual &&
+	grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
+'
+
+test_done

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v4 06/12] t: make the sha1 test-tool helper generic
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
                   ` (4 preceding siblings ...)
  2018-10-25  2:39 ` [PATCH v4 05/12] t: add basic tests for our SHA-1 implementation brian m. carlson
@ 2018-10-25  2:39 ` brian m. carlson
  2018-10-25  2:40 ` [PATCH v4 07/12] sha1-file: add a constant for hash block size brian m. carlson
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:39 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

Since we're going to have multiple hash algorithms to test, it makes
sense to share as much of the test code as possible.  Convert the sha1
helper for the test-tool to be generic and move it out into its own
module.  This will allow us to share most of this code with our NewHash
implementation.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile                              |  1 +
 t/helper/{test-sha1.c => test-hash.c} | 19 +++++-----
 t/helper/test-sha1.c                  | 52 +--------------------------
 t/helper/test-tool.h                  |  2 ++
 4 files changed, 14 insertions(+), 60 deletions(-)
 copy t/helper/{test-sha1.c => test-hash.c} (65%)

diff --git a/Makefile b/Makefile
index d18ab0fe78..81dc9ac819 100644
--- a/Makefile
+++ b/Makefile
@@ -714,6 +714,7 @@ TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
+TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
diff --git a/t/helper/test-sha1.c b/t/helper/test-hash.c
similarity index 65%
copy from t/helper/test-sha1.c
copy to t/helper/test-hash.c
index 1ba0675c75..0a31de66f3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-hash.c
@@ -1,13 +1,14 @@
 #include "test-tool.h"
 #include "cache.h"
 
-int cmd__sha1(int ac, const char **av)
+int cmd_hash_impl(int ac, const char **av, int algo)
 {
-	git_SHA_CTX ctx;
-	unsigned char sha1[20];
+	git_hash_ctx ctx;
+	unsigned char hash[GIT_MAX_HEXSZ];
 	unsigned bufsz = 8192;
 	int binary = 0;
 	char *buffer;
+	const struct git_hash_algo *algop = &hash_algos[algo];
 
 	if (ac == 2) {
 		if (!strcmp(av[1], "-b"))
@@ -26,7 +27,7 @@ int cmd__sha1(int ac, const char **av)
 			die("OOPS");
 	}
 
-	git_SHA1_Init(&ctx);
+	algop->init_fn(&ctx);
 
 	while (1) {
 		ssize_t sz, this_sz;
@@ -38,20 +39,20 @@ int cmd__sha1(int ac, const char **av)
 			if (sz == 0)
 				break;
 			if (sz < 0)
-				die_errno("test-sha1");
+				die_errno("test-hash");
 			this_sz += sz;
 			cp += sz;
 			room -= sz;
 		}
 		if (this_sz == 0)
 			break;
-		git_SHA1_Update(&ctx, buffer, this_sz);
+		algop->update_fn(&ctx, buffer, this_sz);
 	}
-	git_SHA1_Final(sha1, &ctx);
+	algop->final_fn(hash, &ctx);
 
 	if (binary)
-		fwrite(sha1, 1, 20, stdout);
+		fwrite(hash, 1, algop->rawsz, stdout);
 	else
-		puts(sha1_to_hex(sha1));
+		puts(hash_to_hex_algop(hash, algop));
 	exit(0);
 }
diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index 1ba0675c75..d860c387c3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -3,55 +3,5 @@
 
 int cmd__sha1(int ac, const char **av)
 {
-	git_SHA_CTX ctx;
-	unsigned char sha1[20];
-	unsigned bufsz = 8192;
-	int binary = 0;
-	char *buffer;
-
-	if (ac == 2) {
-		if (!strcmp(av[1], "-b"))
-			binary = 1;
-		else
-			bufsz = strtoul(av[1], NULL, 10) * 1024 * 1024;
-	}
-
-	if (!bufsz)
-		bufsz = 8192;
-
-	while ((buffer = malloc(bufsz)) == NULL) {
-		fprintf(stderr, "bufsz %u is too big, halving...\n", bufsz);
-		bufsz /= 2;
-		if (bufsz < 1024)
-			die("OOPS");
-	}
-
-	git_SHA1_Init(&ctx);
-
-	while (1) {
-		ssize_t sz, this_sz;
-		char *cp = buffer;
-		unsigned room = bufsz;
-		this_sz = 0;
-		while (room) {
-			sz = xread(0, cp, room);
-			if (sz == 0)
-				break;
-			if (sz < 0)
-				die_errno("test-sha1");
-			this_sz += sz;
-			cp += sz;
-			room -= sz;
-		}
-		if (this_sz == 0)
-			break;
-		git_SHA1_Update(&ctx, buffer, this_sz);
-	}
-	git_SHA1_Final(sha1, &ctx);
-
-	if (binary)
-		fwrite(sha1, 1, 20, stdout);
-	else
-		puts(sha1_to_hex(sha1));
-	exit(0);
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
 }
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e4890566da..29ac7b0b0d 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -50,4 +50,6 @@ int cmd__windows_named_pipe(int argc, const char **argv);
 #endif
 int cmd__write_cache(int argc, const char **argv);
 
+int cmd_hash_impl(int ac, const char **av, int algo);
+
 #endif

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v4 07/12] sha1-file: add a constant for hash block size
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
                   ` (5 preceding siblings ...)
  2018-10-25  2:39 ` [PATCH v4 06/12] t: make the sha1 test-tool helper generic brian m. carlson
@ 2018-10-25  2:40 ` brian m. carlson
  2018-10-25  2:40 ` [PATCH v4 08/12] t/helper: add a test helper to compute hash speed brian m. carlson
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:40 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

There is one place we need the hash algorithm block size: the HMAC code
for push certs.  Expose this constant in struct git_hash_algo and expose
values for SHA-1 and for the largest value of any hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h     | 4 ++++
 hash.h      | 3 +++
 sha1-file.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/cache.h b/cache.h
index bab8e8964f..9e5d1dd85a 100644
--- a/cache.h
+++ b/cache.h
@@ -45,10 +45,14 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 /* The length in bytes and in hex digits of an object name (SHA-1 value). */
 #define GIT_SHA1_RAWSZ 20
 #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+/* The block size of SHA-1. */
+#define GIT_SHA1_BLKSZ 64
 
 /* The length in byte and in hex digits of the largest possible hash value. */
 #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
 #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+/* The largest possible block size for any supported hash. */
+#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 80881eea47..1bcf7ab6fd 100644
--- a/hash.h
+++ b/hash.h
@@ -81,6 +81,9 @@ struct git_hash_algo {
 	/* The length of the hash in hex characters. */
 	size_t hexsz;
 
+	/* The block size of the hash. */
+	size_t blksz;
+
 	/* The hash initialization function. */
 	git_hash_init_fn init_fn;
 
diff --git a/sha1-file.c b/sha1-file.c
index 7e9dedc744..9bdd04ea45 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -90,6 +90,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		0x00000000,
 		0,
 		0,
+		0,
 		git_hash_unknown_init,
 		git_hash_unknown_update,
 		git_hash_unknown_final,
@@ -102,6 +103,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		0x73686131,
 		GIT_SHA1_RAWSZ,
 		GIT_SHA1_HEXSZ,
+		GIT_SHA1_BLKSZ,
 		git_hash_sha1_init,
 		git_hash_sha1_update,
 		git_hash_sha1_final,

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v4 08/12] t/helper: add a test helper to compute hash speed
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
                   ` (6 preceding siblings ...)
  2018-10-25  2:40 ` [PATCH v4 07/12] sha1-file: add a constant for hash block size brian m. carlson
@ 2018-10-25  2:40 ` brian m. carlson
  2018-10-25  2:40 ` [PATCH v4 09/12] commit-graph: convert to using the_hash_algo brian m. carlson
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:40 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

Add a utility (which is less for the testsuite and more for developers)
that can compute hash speeds for whatever hash algorithms are
implemented.  This allows developers to test their personal systems to
determine the performance characteristics of various algorithms.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile                   |  1 +
 t/helper/test-hash-speed.c | 61 ++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c       |  1 +
 t/helper/test-tool.h       |  1 +
 4 files changed, 64 insertions(+)
 create mode 100644 t/helper/test-hash-speed.c

diff --git a/Makefile b/Makefile
index 81dc9ac819..68169a7abb 100644
--- a/Makefile
+++ b/Makefile
@@ -716,6 +716,7 @@ TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
+TEST_BUILTINS_OBJS += test-hash-speed.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
 TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
diff --git a/t/helper/test-hash-speed.c b/t/helper/test-hash-speed.c
new file mode 100644
index 0000000000..432233c7f0
--- /dev/null
+++ b/t/helper/test-hash-speed.c
@@ -0,0 +1,61 @@
+#include "test-tool.h"
+#include "cache.h"
+
+#define NUM_SECONDS 3
+
+static inline void compute_hash(const struct git_hash_algo *algo, git_hash_ctx *ctx, uint8_t *final, const void *p, size_t len)
+{
+	algo->init_fn(ctx);
+	algo->update_fn(ctx, p, len);
+	algo->final_fn(final, ctx);
+}
+
+int cmd__hash_speed(int ac, const char **av)
+{
+	git_hash_ctx ctx;
+	unsigned char hash[GIT_MAX_RAWSZ];
+	clock_t initial, start, end;
+	unsigned bufsizes[] = { 64, 256, 1024, 8192, 16384 };
+	int i;
+	void *p;
+	const struct git_hash_algo *algo = NULL;
+
+	if (ac == 2) {
+		for (i = 1; i < GIT_HASH_NALGOS; i++) {
+			if (!strcmp(av[1], hash_algos[i].name)) {
+				algo = &hash_algos[i];
+				break;
+			}
+		}
+	}
+	if (!algo)
+		die("usage: test-tool hash-speed algo_name");
+
+	/* Use this as an offset to make overflow less likely. */
+	initial = clock();
+
+	printf("algo: %s\n", algo->name);
+
+	for (i = 0; i < ARRAY_SIZE(bufsizes); i++) {
+		unsigned long j, kb;
+		double kb_per_sec;
+		p = xcalloc(1, bufsizes[i]);
+		start = end = clock() - initial;
+		for (j = 0; ((end - start) / CLOCKS_PER_SEC) < NUM_SECONDS; j++) {
+			compute_hash(algo, &ctx, hash, p, bufsizes[i]);
+
+			/*
+			 * Only check elapsed time every 128 iterations to avoid
+			 * dominating the runtime with system calls.
+			 */
+			if (!(j & 127))
+				end = clock() - initial;
+		}
+		kb = j * bufsizes[i];
+		kb_per_sec = kb / (1024 * ((double)end - start) / CLOCKS_PER_SEC);
+		printf("size %u: %lu iters; %lu KiB; %0.2f KiB/s\n", bufsizes[i], j, kb, kb_per_sec);
+		free(p);
+	}
+
+	exit(0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 6b5836dc1b..e009c8186d 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -20,6 +20,7 @@ static struct test_cmd cmds[] = {
 	{ "example-decorate", cmd__example_decorate },
 	{ "genrandom", cmd__genrandom },
 	{ "hashmap", cmd__hashmap },
+	{ "hash-speed", cmd__hash_speed },
 	{ "index-version", cmd__index_version },
 	{ "json-writer", cmd__json_writer },
 	{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 29ac7b0b0d..19a7e8332a 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -16,6 +16,7 @@ int cmd__dump_untracked_cache(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
+int cmd__hash_speed(int argc, const char **argv);
 int cmd__index_version(int argc, const char **argv);
 int cmd__json_writer(int argc, const char **argv);
 int cmd__lazy_init_name_hash(int argc, const char **argv);

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v4 09/12] commit-graph: convert to using the_hash_algo
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
                   ` (7 preceding siblings ...)
  2018-10-25  2:40 ` [PATCH v4 08/12] t/helper: add a test helper to compute hash speed brian m. carlson
@ 2018-10-25  2:40 ` brian m. carlson
  2018-10-25  2:40 ` [PATCH v4 10/12] Add a base implementation of SHA-256 support brian m. carlson
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:40 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

Instead of using hard-coded constants for object sizes, use
the_hash_algo to look them up.  In addition, use a function call to look
up the object ID version and produce the correct value.  For now, we use
version 1, which means to use the default algorithm used in the rest of
the repository.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 commit-graph.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..6763d19288 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -23,16 +23,11 @@
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
 #define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
 
-#define GRAPH_DATA_WIDTH 36
+#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
 #define GRAPH_VERSION_1 0x1
 #define GRAPH_VERSION GRAPH_VERSION_1
 
-#define GRAPH_OID_VERSION_SHA1 1
-#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
-#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
-#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
-
 #define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000
 #define GRAPH_PARENT_MISSING 0x7fffffff
 #define GRAPH_EDGE_LAST_MASK 0x7fffffff
@@ -44,13 +39,18 @@
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
-			+ GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
+			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static uint8_t oid_version(void)
+{
+	return 1;
+}
+
 static struct commit_graph *alloc_commit_graph(void)
 {
 	struct commit_graph *g = xcalloc(1, sizeof(*g));
@@ -125,15 +125,15 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	}
 
 	hash_version = *(unsigned char*)(data + 5);
-	if (hash_version != GRAPH_OID_VERSION) {
+	if (hash_version != oid_version()) {
 		error(_("hash version %X does not match version %X"),
-		      hash_version, GRAPH_OID_VERSION);
+		      hash_version, oid_version());
 		goto cleanup_fail;
 	}
 
 	graph = alloc_commit_graph();
 
-	graph->hash_len = GRAPH_OID_LEN;
+	graph->hash_len = the_hash_algo->rawsz;
 	graph->num_chunks = *(unsigned char*)(data + 6);
 	graph->graph_fd = fd;
 	graph->data = graph_map;
@@ -149,7 +149,7 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 
 		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
-		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+		if (chunk_offset > graph_size - the_hash_algo->rawsz) {
 			error(_("improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
 			      (uint32_t)chunk_offset);
 			goto cleanup_fail;
@@ -764,6 +764,7 @@ void write_commit_graph(const char *obj_dir,
 	int num_extra_edges;
 	struct commit_list *parent;
 	struct progress *progress = NULL;
+	const unsigned hashsz = the_hash_algo->rawsz;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
@@ -909,7 +910,7 @@ void write_commit_graph(const char *obj_dir,
 	hashwrite_be32(f, GRAPH_SIGNATURE);
 
 	hashwrite_u8(f, GRAPH_VERSION);
-	hashwrite_u8(f, GRAPH_OID_VERSION);
+	hashwrite_u8(f, oid_version());
 	hashwrite_u8(f, num_chunks);
 	hashwrite_u8(f, 0); /* unused padding byte */
 
@@ -924,8 +925,8 @@ void write_commit_graph(const char *obj_dir,
 
 	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
 	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
-	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+	chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
+	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
 	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
 
 	for (i = 0; i <= num_chunks; i++) {
@@ -938,8 +939,8 @@ void write_commit_graph(const char *obj_dir,
 	}
 
 	write_graph_chunk_fanout(f, commits.list, commits.nr);
-	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	write_graph_chunk_oids(f, hashsz, commits.list, commits.nr);
+	write_graph_chunk_data(f, hashsz, commits.list, commits.nr);
 	write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
 	close_commit_graph(the_repository);

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v4 10/12] Add a base implementation of SHA-256 support
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
                   ` (8 preceding siblings ...)
  2018-10-25  2:40 ` [PATCH v4 09/12] commit-graph: convert to using the_hash_algo brian m. carlson
@ 2018-10-25  2:40 ` brian m. carlson
  2018-10-25  3:02   ` Carlo Arenas
  2018-10-27  9:03   ` Christian Couder
  2018-10-25  2:40 ` [PATCH v4 11/12] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
                   ` (2 subsequent siblings)
  12 siblings, 2 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:40 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

SHA-1 is weak and we need to transition to a new hash function.  For
some time, we have referred to this new function as NewHash.  Recently,
we decided to pick SHA-256 as NewHash.

Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain.  Optimize it and restructure it to meet our coding
standards.  Pull in the update and final functions from the SHA-1 block
implementation, as we know these function correctly with all compilers.
This implementation is slower than SHA-1, but more performant
implementations will be introduced in future commits.

Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.

Note that with this patch, it is still not possible to switch to using
SHA-256 in Git.  Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile               |   4 +
 cache.h                |  12 ++-
 hash.h                 |  19 +++-
 sha1-file.c            |  45 +++++++++
 sha256/block/sha256.c  | 201 +++++++++++++++++++++++++++++++++++++++++
 sha256/block/sha256.h  |  26 ++++++
 t/helper/test-sha256.c |   7 ++
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/t0015-hash.sh        |  25 +++++
 10 files changed, 337 insertions(+), 4 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 t/helper/test-sha256.c

diff --git a/Makefile b/Makefile
index 68169a7abb..e99b7712f6 100644
--- a/Makefile
+++ b/Makefile
@@ -739,6 +739,7 @@ TEST_BUILTINS_OBJS += test-run-command.o
 TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
 TEST_BUILTINS_OBJS += test-sha1.o
 TEST_BUILTINS_OBJS += test-sha1-array.o
+TEST_BUILTINS_OBJS += test-sha256.o
 TEST_BUILTINS_OBJS += test-sigchain.o
 TEST_BUILTINS_OBJS += test-strcmp-offset.o
 TEST_BUILTINS_OBJS += test-string-list.o
@@ -1633,6 +1634,9 @@ endif
 endif
 endif
 
+LIB_OBJS += sha256/block/sha256.o
+BASIC_CFLAGS += -DSHA256_BLK
+
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
 	BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)"
diff --git a/cache.h b/cache.h
index 9e5d1dd85a..48ce1565e6 100644
--- a/cache.h
+++ b/cache.h
@@ -48,11 +48,17 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 /* The block size of SHA-1. */
 #define GIT_SHA1_BLKSZ 64
 
+/* The length in bytes and in hex digits of an object name (SHA-256 value). */
+#define GIT_SHA256_RAWSZ 32
+#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
+/* The block size of SHA-256. */
+#define GIT_SHA256_BLKSZ 64
+
 /* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
 /* The largest possible block size for any supported hash. */
-#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
+#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 1bcf7ab6fd..a9bc624020 100644
--- a/hash.h
+++ b/hash.h
@@ -15,6 +15,8 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#include "sha256/block/sha256.h"
+
 #ifndef platform_SHA_CTX
 /*
  * platform's underlying implementation of SHA-1; could be OpenSSL,
@@ -34,6 +36,18 @@
 #define git_SHA1_Update		platform_SHA1_Update
 #define git_SHA1_Final		platform_SHA1_Final
 
+#ifndef platform_SHA256_CTX
+#define platform_SHA256_CTX	SHA256_CTX
+#define platform_SHA256_Init	SHA256_Init
+#define platform_SHA256_Update	SHA256_Update
+#define platform_SHA256_Final	SHA256_Final
+#endif
+
+#define git_SHA256_CTX		platform_SHA256_CTX
+#define git_SHA256_Init		platform_SHA256_Init
+#define git_SHA256_Update	platform_SHA256_Update
+#define git_SHA256_Final	platform_SHA256_Final
+
 #ifdef SHA1_MAX_BLOCK_SIZE
 #include "compat/sha1-chunked.h"
 #undef git_SHA1_Update
@@ -52,12 +66,15 @@
 #define GIT_HASH_UNKNOWN 0
 /* SHA-1 */
 #define GIT_HASH_SHA1 1
+/* SHA-256  */
+#define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
-#define GIT_HASH_NALGOS (GIT_HASH_SHA1 + 1)
+#define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
 
 /* A suitably aligned type for stack allocations of hash contexts. */
 union git_hash_ctx {
 	git_SHA_CTX sha1;
+	git_SHA256_CTX sha256;
 };
 typedef union git_hash_ctx git_hash_ctx;
 
diff --git a/sha1-file.c b/sha1-file.c
index 9bdd04ea45..c97d93a14a 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -40,10 +40,20 @@
 #define EMPTY_TREE_SHA1_BIN_LITERAL \
 	 "\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \
 	 "\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"
+#define EMPTY_TREE_SHA256_BIN_LITERAL \
+	"\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \
+	"\x04\xd4\x5d\x8d\x85\xef\xa9\xb0\x57\xb5" \
+	"\x3b\x14\xb4\xb9\xb9\x39\xdd\x74\xde\xcc" \
+	"\x53\x21"
 
 #define EMPTY_BLOB_SHA1_BIN_LITERAL \
 	"\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b" \
 	"\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91"
+#define EMPTY_BLOB_SHA256_BIN_LITERAL \
+	"\x47\x3a\x0f\x4c\x3b\xe8\xa9\x36\x81\xa2" \
+	"\x67\xe3\xb1\xe9\xa7\xdc\xda\x11\x85\x43" \
+	"\x6f\xe1\x41\xf7\x74\x91\x20\xa3\x03\x72" \
+	"\x18\x13"
 
 const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
@@ -53,6 +63,12 @@ static const struct object_id empty_tree_oid = {
 static const struct object_id empty_blob_oid = {
 	EMPTY_BLOB_SHA1_BIN_LITERAL
 };
+static const struct object_id empty_tree_oid_sha256 = {
+	EMPTY_TREE_SHA256_BIN_LITERAL
+};
+static const struct object_id empty_blob_oid_sha256 = {
+	EMPTY_BLOB_SHA256_BIN_LITERAL
+};
 
 static void git_hash_sha1_init(git_hash_ctx *ctx)
 {
@@ -69,6 +85,22 @@ static void git_hash_sha1_final(unsigned char *hash, git_hash_ctx *ctx)
 	git_SHA1_Final(hash, &ctx->sha1);
 }
 
+
+static void git_hash_sha256_init(git_hash_ctx *ctx)
+{
+	git_SHA256_Init(&ctx->sha256);
+}
+
+static void git_hash_sha256_update(git_hash_ctx *ctx, const void *data, size_t len)
+{
+	git_SHA256_Update(&ctx->sha256, data, len);
+}
+
+static void git_hash_sha256_final(unsigned char *hash, git_hash_ctx *ctx)
+{
+	git_SHA256_Final(hash, &ctx->sha256);
+}
+
 static void git_hash_unknown_init(git_hash_ctx *ctx)
 {
 	BUG("trying to init unknown hash");
@@ -110,6 +142,19 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		&empty_tree_oid,
 		&empty_blob_oid,
 	},
+	{
+		"sha256",
+		/* "s256", big-endian */
+		0x73323536,
+		GIT_SHA256_RAWSZ,
+		GIT_SHA256_HEXSZ,
+		GIT_SHA256_BLKSZ,
+		git_hash_sha256_init,
+		git_hash_sha256_update,
+		git_hash_sha256_final,
+		&empty_tree_oid_sha256,
+		&empty_blob_oid_sha256,
+	}
 };
 
 const char *empty_tree_oid_hex(void)
diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c
new file mode 100644
index 0000000000..fa9f80b350
--- /dev/null
+++ b/sha256/block/sha256.c
@@ -0,0 +1,201 @@
+#include "git-compat-util.h"
+#include "./sha256.h"
+
+#undef RND
+#undef BLKSIZE
+
+#define BLKSIZE blk_SHA256_BLKSIZE
+
+void blk_SHA256_Init(blk_SHA256_CTX *ctx)
+{
+	ctx->offset = 0;
+	ctx->size = 0;
+	ctx->state[0] = 0x6A09E667UL;
+	ctx->state[1] = 0xBB67AE85UL;
+	ctx->state[2] = 0x3C6EF372UL;
+	ctx->state[3] = 0xA54FF53AUL;
+	ctx->state[4] = 0x510E527FUL;
+	ctx->state[5] = 0x9B05688CUL;
+	ctx->state[6] = 0x1F83D9ABUL;
+	ctx->state[7] = 0x5BE0CD19UL;
+}
+
+static inline uint32_t ror(uint32_t x, unsigned n)
+{
+	return (x >> n) | (x << (32 - n));
+}
+
+static inline uint32_t ch(uint32_t x, uint32_t y, uint32_t z)
+{
+	return (z ^ (x & (y ^ z)));
+}
+
+static inline uint32_t maj(uint32_t x, uint32_t y, uint32_t z)
+{
+	return (((x | y) & z) | (x & y));
+}
+
+static inline uint32_t sigma0(uint32_t x)
+{
+	return ror(x, 2) ^ ror(x, 13) ^ ror(x, 22);
+}
+
+static inline uint32_t sigma1(uint32_t x)
+{
+	return ror(x, 6) ^ ror(x, 11) ^ ror(x, 25);
+}
+
+static inline uint32_t gamma0(uint32_t x)
+{
+	return ror(x, 7) ^ ror(x, 18) ^ (x >> 3);
+}
+
+static inline uint32_t gamma1(uint32_t x)
+{
+	return ror(x, 17) ^ ror(x, 19) ^ (x >> 10);
+}
+
+static void blk_SHA256_Transform(blk_SHA256_CTX *ctx, const unsigned char *buf)
+{
+
+	uint32_t S[8], W[64], t0, t1;
+	int i;
+
+	/* copy state into S */
+	for (i = 0; i < 8; i++) {
+		S[i] = ctx->state[i];
+	}
+
+	/* copy the state into 512-bits into W[0..15] */
+	for (i = 0; i < 16; i++, buf += sizeof(uint32_t)) {
+		W[i] = get_be32(buf);
+	}
+
+	/* fill W[16..63] */
+	for (i = 16; i < 64; i++) {
+		W[i] = gamma1(W[i - 2]) + W[i - 7] + gamma0(W[i - 15]) + W[i - 16];
+	}
+
+#define RND(a,b,c,d,e,f,g,h,i,ki)                    \
+	t0 = h + sigma1(e) + ch(e, f, g) + ki + W[i];   \
+	t1 = sigma0(a) + maj(a, b, c);                  \
+	d += t0;                                        \
+	h  = t0 + t1;
+
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],0,0x428a2f98);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],1,0x71374491);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],2,0xb5c0fbcf);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],3,0xe9b5dba5);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],4,0x3956c25b);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],5,0x59f111f1);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],6,0x923f82a4);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],7,0xab1c5ed5);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],8,0xd807aa98);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],9,0x12835b01);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],10,0x243185be);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],11,0x550c7dc3);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],12,0x72be5d74);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],13,0x80deb1fe);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],14,0x9bdc06a7);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],15,0xc19bf174);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],16,0xe49b69c1);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],17,0xefbe4786);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],18,0x0fc19dc6);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],19,0x240ca1cc);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],20,0x2de92c6f);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],21,0x4a7484aa);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],22,0x5cb0a9dc);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],23,0x76f988da);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],24,0x983e5152);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],25,0xa831c66d);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],26,0xb00327c8);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],27,0xbf597fc7);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],28,0xc6e00bf3);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],29,0xd5a79147);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],30,0x06ca6351);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],31,0x14292967);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],32,0x27b70a85);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],33,0x2e1b2138);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],34,0x4d2c6dfc);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],35,0x53380d13);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],36,0x650a7354);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],37,0x766a0abb);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],38,0x81c2c92e);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],39,0x92722c85);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],40,0xa2bfe8a1);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],41,0xa81a664b);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],42,0xc24b8b70);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],43,0xc76c51a3);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],44,0xd192e819);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],45,0xd6990624);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],46,0xf40e3585);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],47,0x106aa070);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],48,0x19a4c116);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],49,0x1e376c08);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],50,0x2748774c);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],51,0x34b0bcb5);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],52,0x391c0cb3);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],53,0x4ed8aa4a);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],54,0x5b9cca4f);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],55,0x682e6ff3);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],56,0x748f82ee);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],57,0x78a5636f);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],58,0x84c87814);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],59,0x8cc70208);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],60,0x90befffa);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],61,0xa4506ceb);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],62,0xbef9a3f7);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],63,0xc67178f2);
+
+
+	for (i = 0; i < 8; i++) {
+		ctx->state[i] = ctx->state[i] + S[i];
+	}
+}
+
+void blk_SHA256_Update(blk_SHA256_CTX *ctx, const void *data, size_t len)
+{
+	unsigned int len_buf = ctx->size & 63;
+
+	ctx->size += len;
+
+	/* Read the data into buf and process blocks as they get full */
+	if (len_buf) {
+		unsigned int left = 64 - len_buf;
+		if (len < left)
+			left = len;
+		memcpy(len_buf + ctx->buf, data, left);
+		len_buf = (len_buf + left) & 63;
+		len -= left;
+		data = ((const char *)data + left);
+		if (len_buf)
+			return;
+		blk_SHA256_Transform(ctx, ctx->buf);
+	}
+	while (len >= 64) {
+		blk_SHA256_Transform(ctx, data);
+		data = ((const char *)data + 64);
+		len -= 64;
+	}
+	if (len)
+		memcpy(ctx->buf, data, len);
+}
+
+void blk_SHA256_Final(unsigned char *digest, blk_SHA256_CTX *ctx)
+{
+	static const unsigned char pad[64] = { 0x80 };
+	unsigned int padlen[2];
+	int i;
+
+	/* Pad with a binary 1 (ie 0x80), then zeroes, then length */
+	padlen[0] = htonl((uint32_t)(ctx->size >> 29));
+	padlen[1] = htonl((uint32_t)(ctx->size << 3));
+
+	i = ctx->size & 63;
+	blk_SHA256_Update(ctx, pad, 1 + (63 & (55 - i)));
+	blk_SHA256_Update(ctx, padlen, 8);
+
+	/* copy output */
+	for (i = 0; i < 8; i++, digest += sizeof(uint32_t))
+		put_be32(digest, ctx->state[i]);
+}
diff --git a/sha256/block/sha256.h b/sha256/block/sha256.h
new file mode 100644
index 0000000000..38f02f7e6c
--- /dev/null
+++ b/sha256/block/sha256.h
@@ -0,0 +1,26 @@
+#ifndef SHA256_BLOCK_SHA256_H
+#define SHA256_BLOCK_SHA256_H
+
+#include "git-compat-util.h"
+
+#define blk_SHA256_BLKSIZE 64
+
+struct blk_SHA256_CTX {
+	uint32_t state[8];
+	uint64_t size;
+	uint32_t offset;
+	uint8_t buf[blk_SHA256_BLKSIZE];
+};
+
+typedef struct blk_SHA256_CTX blk_SHA256_CTX;
+
+void blk_SHA256_Init(blk_SHA256_CTX *ctx);
+void blk_SHA256_Update(blk_SHA256_CTX *ctx, const void *data, size_t len);
+void blk_SHA256_Final(unsigned char *digest, blk_SHA256_CTX *ctx);
+
+#define platform_SHA256_CTX blk_SHA256_CTX
+#define platform_SHA256_Init blk_SHA256_Init
+#define platform_SHA256_Update blk_SHA256_Update
+#define platform_SHA256_Final blk_SHA256_Final
+
+#endif
diff --git a/t/helper/test-sha256.c b/t/helper/test-sha256.c
new file mode 100644
index 0000000000..0ac6a99d5f
--- /dev/null
+++ b/t/helper/test-sha256.c
@@ -0,0 +1,7 @@
+#include "test-tool.h"
+#include "cache.h"
+
+int cmd__sha256(int ac, const char **av)
+{
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA256);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index e009c8186d..2a65193514 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -43,6 +43,7 @@ static struct test_cmd cmds[] = {
 	{ "scrap-cache-tree", cmd__scrap_cache_tree },
 	{ "sha1", cmd__sha1 },
 	{ "sha1-array", cmd__sha1_array },
+	{ "sha256", cmd__sha256 },
 	{ "sigchain", cmd__sigchain },
 	{ "strcmp-offset", cmd__strcmp_offset },
 	{ "string-list", cmd__string_list },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 19a7e8332a..2e66a8e47b 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -39,6 +39,7 @@ int cmd__run_command(int argc, const char **argv);
 int cmd__scrap_cache_tree(int argc, const char **argv);
 int cmd__sha1(int argc, const char **argv);
 int cmd__sha1_array(int argc, const char **argv);
+int cmd__sha256(int argc, const char **argv);
 int cmd__sigchain(int argc, const char **argv);
 int cmd__strcmp_offset(int argc, const char **argv);
 int cmd__string_list(int argc, const char **argv);
diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
index 8e763c2c3d..f8e639743f 100755
--- a/t/t0015-hash.sh
+++ b/t/t0015-hash.sh
@@ -26,4 +26,29 @@ test_expect_success 'test basic SHA-1 hash values' '
 	grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
 '
 
+test_expect_success 'test basic SHA-256 hash values' '
+	test-tool sha256 </dev/null >actual &&
+	grep e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 actual &&
+	printf "a" | test-tool sha256 >actual &&
+	grep ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb actual &&
+	printf "abc" | test-tool sha256 >actual &&
+	grep ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad actual &&
+	printf "message digest" | test-tool sha256 >actual &&
+	grep f7846f55cf23e14eebeab5b4e1550cad5b509e3348fbc4efa3a1413d393cb650 actual &&
+	printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha256 >actual &&
+	grep 71c480df93d6ae2f1efad1447c66c9525e316218cf51fc8d9ed832f2daf18b73 actual &&
+	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
+		test-tool sha256 >actual &&
+	grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 actual &&
+	perl -E "for (1..100000) { print q{abcdefghijklmnopqrstuvwxyz}; }" | \
+		test-tool sha256 >actual &&
+	grep e406ba321ca712ad35a698bf0af8d61fc4dc40eca6bdcea4697962724ccbde35 actual &&
+	printf "blob 0\0" | test-tool sha256 >actual &&
+	grep 473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813 actual &&
+	printf "blob 3\0abc" | test-tool sha256 >actual &&
+	grep c1cf6e465077930e88dc5136641d402f72a229ddd996f627d60e9639eaba35a6 actual &&
+	printf "tree 0\0" | test-tool sha256 >actual &&
+	grep 6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321 actual
+'
+
 test_done

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v4 11/12] sha256: add an SHA-256 implementation using libgcrypt
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
                   ` (9 preceding siblings ...)
  2018-10-25  2:40 ` [PATCH v4 10/12] Add a base implementation of SHA-256 support brian m. carlson
@ 2018-10-25  2:40 ` brian m. carlson
  2018-10-25  2:40 ` [PATCH v4 12/12] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
  12 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:40 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

Generally, one gets better performance out of cryptographic routines
written in assembly than C, and this is also true for SHA-256.  In
addition, most Linux distributions cannot distribute Git linked against
OpenSSL for licensing reasons.

Most systems with GnuPG will also have libgcrypt, since it is a
dependency of GnuPG.  libgcrypt is also faster than the SHA1DC
implementation for messages of a few KiB and larger.

For comparison, on a Core i7-6600U, this implementation processes 16 KiB
chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337
MiB/s.

In addition, libgcrypt is licensed under the LGPL 2.1, which is
compatible with the GPL.  Add an implementation of SHA-256 that uses
libgcrypt.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile        | 13 +++++++++++--
 hash.h          |  4 ++++
 sha256/gcrypt.h | 30 ++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 sha256/gcrypt.h

diff --git a/Makefile b/Makefile
index e99b7712f6..5a07e03100 100644
--- a/Makefile
+++ b/Makefile
@@ -179,6 +179,10 @@ all::
 # in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO
 # wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined.
 #
+# Define BLK_SHA256 to use the built-in SHA-256 routines.
+#
+# Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1634,8 +1638,13 @@ endif
 endif
 endif
 
-LIB_OBJS += sha256/block/sha256.o
-BASIC_CFLAGS += -DSHA256_BLK
+ifdef GCRYPT_SHA256
+	BASIC_CFLAGS += -DSHA256_GCRYPT
+	EXTLIBS += -lgcrypt
+else
+	LIB_OBJS += sha256/block/sha256.o
+	BASIC_CFLAGS += -DSHA256_BLK
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index a9bc624020..2ef098052d 100644
--- a/hash.h
+++ b/hash.h
@@ -15,7 +15,11 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#if defined(SHA256_GCRYPT)
+#include "sha256/gcrypt.h"
+#else
 #include "sha256/block/sha256.h"
+#endif
 
 #ifndef platform_SHA_CTX
 /*
diff --git a/sha256/gcrypt.h b/sha256/gcrypt.h
new file mode 100644
index 0000000000..09bd8bb200
--- /dev/null
+++ b/sha256/gcrypt.h
@@ -0,0 +1,30 @@
+#ifndef SHA256_GCRYPT_H
+#define SHA256_GCRYPT_H
+
+#include <gcrypt.h>
+
+#define SHA256_DIGEST_SIZE 32
+
+typedef gcry_md_hd_t gcrypt_SHA256_CTX;
+
+inline void gcrypt_SHA256_Init(gcrypt_SHA256_CTX *ctx)
+{
+	gcry_md_open(ctx, GCRY_MD_SHA256, 0);
+}
+
+inline void gcrypt_SHA256_Update(gcrypt_SHA256_CTX *ctx, const void *data, size_t len)
+{
+	gcry_md_write(*ctx, data, len);
+}
+
+inline void gcrypt_SHA256_Final(unsigned char *digest, gcrypt_SHA256_CTX *ctx)
+{
+	memcpy(digest, gcry_md_read(*ctx, GCRY_MD_SHA256), SHA256_DIGEST_SIZE);
+}
+
+#define platform_SHA256_CTX gcrypt_SHA256_CTX
+#define platform_SHA256_Init gcrypt_SHA256_Init
+#define platform_SHA256_Update gcrypt_SHA256_Update
+#define platform_SHA256_Final gcrypt_SHA256_Final
+
+#endif

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v4 12/12] hash: add an SHA-256 implementation using OpenSSL
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
                   ` (10 preceding siblings ...)
  2018-10-25  2:40 ` [PATCH v4 11/12] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
@ 2018-10-25  2:40 ` brian m. carlson
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
  12 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-10-25  2:40 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor

We already have OpenSSL routines available for SHA-1, so add routines
for SHA-256 as well.

On a Core i7-6600U, this SHA-256 implementation compares favorably to
the SHA1DC SHA-1 implementation:

SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile | 7 +++++++
 hash.h   | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/Makefile b/Makefile
index 5a07e03100..36fd3a149b 100644
--- a/Makefile
+++ b/Makefile
@@ -183,6 +183,8 @@ all::
 #
 # Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
 #
+# Define OPENSSL_SHA256 to use the SHA-256 routines in OpenSSL.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1638,6 +1640,10 @@ endif
 endif
 endif
 
+ifdef OPENSSL_SHA256
+	EXTLIBS += $(LIB_4_CRYPTO)
+	BASIC_CFLAGS += -DSHA256_OPENSSL
+else
 ifdef GCRYPT_SHA256
 	BASIC_CFLAGS += -DSHA256_GCRYPT
 	EXTLIBS += -lgcrypt
@@ -1645,6 +1651,7 @@ else
 	LIB_OBJS += sha256/block/sha256.o
 	BASIC_CFLAGS += -DSHA256_BLK
 endif
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index 2ef098052d..adde708cf2 100644
--- a/hash.h
+++ b/hash.h
@@ -17,6 +17,8 @@
 
 #if defined(SHA256_GCRYPT)
 #include "sha256/gcrypt.h"
+#elif defined(SHA256_OPENSSL)
+#include <openssl/sha.h>
 #else
 #include "sha256/block/sha256.h"
 #endif

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v4 10/12] Add a base implementation of SHA-256 support
  2018-10-25  2:40 ` [PATCH v4 10/12] Add a base implementation of SHA-256 support brian m. carlson
@ 2018-10-25  3:02   ` Carlo Arenas
  2018-10-28 15:52     ` brian m. carlson
  2018-10-27  9:03   ` Christian Couder
  1 sibling, 1 reply; 58+ messages in thread
From: Carlo Arenas @ 2018-10-25  3:02 UTC (permalink / raw)
  To: sandals; +Cc: git, stolee, avarab, pclouds, szeder.dev

On Wed, Oct 24, 2018 at 7:41 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> diff --git a/sha256/block/sha256.h b/sha256/block/sha256.h
> new file mode 100644
> index 0000000000..38f02f7e6c
> --- /dev/null
> +++ b/sha256/block/sha256.h
> @@ -0,0 +1,26 @@
> +#ifndef SHA256_BLOCK_SHA256_H
> +#define SHA256_BLOCK_SHA256_H
> +
> +#include "git-compat-util.h"

this shouldn't be needed and might be discouraged as per the
instructions in Documentation/CodingGuidelines

Carlo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v4 10/12] Add a base implementation of SHA-256 support
  2018-10-25  2:40 ` [PATCH v4 10/12] Add a base implementation of SHA-256 support brian m. carlson
  2018-10-25  3:02   ` Carlo Arenas
@ 2018-10-27  9:03   ` Christian Couder
  1 sibling, 0 replies; 58+ messages in thread
From: Christian Couder @ 2018-10-27  9:03 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Nguyen Thai Ngoc Duy, SZEDER Gábor

On Thu, Oct 25, 2018 at 4:42 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:

> +static void blk_SHA256_Transform(blk_SHA256_CTX *ctx, const unsigned char *buf)
> +{
> +
> +       uint32_t S[8], W[64], t0, t1;
> +       int i;
> +
> +       /* copy state into S */
> +       for (i = 0; i < 8; i++) {
> +               S[i] = ctx->state[i];
> +       }

Maybe remove unnecessary brackets above and below.

> +       /* copy the state into 512-bits into W[0..15] */
> +       for (i = 0; i < 16; i++, buf += sizeof(uint32_t)) {
> +               W[i] = get_be32(buf);
> +       }
> +
> +       /* fill W[16..63] */
> +       for (i = 16; i < 64; i++) {
> +               W[i] = gamma1(W[i - 2]) + W[i - 7] + gamma0(W[i - 15]) + W[i - 16];
> +       }

[...]

> +       RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],62,0xbef9a3f7);
> +       RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],63,0xc67178f2);
> +
> +

Spurious new line.

> +       for (i = 0; i < 8; i++) {
> +               ctx->state[i] = ctx->state[i] + S[i];
> +       }

Maybe remove unnecessary brackets and use "+=", like:

       for (i = 0; i < 8; i++)
               ctx->state[i] += S[i];

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v4 10/12] Add a base implementation of SHA-256 support
  2018-10-25  3:02   ` Carlo Arenas
@ 2018-10-28 15:52     ` brian m. carlson
  2018-10-29  0:39       ` Junio C Hamano
  0 siblings, 1 reply; 58+ messages in thread
From: brian m. carlson @ 2018-10-28 15:52 UTC (permalink / raw)
  To: Carlo Arenas; +Cc: git, stolee, avarab, pclouds, szeder.dev

[-- Attachment #1: Type: text/plain, Size: 954 bytes --]

On Wed, Oct 24, 2018 at 08:02:55PM -0700, Carlo Arenas wrote:
> On Wed, Oct 24, 2018 at 7:41 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > diff --git a/sha256/block/sha256.h b/sha256/block/sha256.h
> > new file mode 100644
> > index 0000000000..38f02f7e6c
> > --- /dev/null
> > +++ b/sha256/block/sha256.h
> > @@ -0,0 +1,26 @@
> > +#ifndef SHA256_BLOCK_SHA256_H
> > +#define SHA256_BLOCK_SHA256_H
> > +
> > +#include "git-compat-util.h"
> 
> this shouldn't be needed and might be discouraged as per the
> instructions in Documentation/CodingGuidelines

This may not strictly be needed, but removing it makes the header no
longer self-contained, which blows up my (and others') in-editor
linting.  I think it's okay to add this extra header here to keep it
self-contained, even if we know that it's not going to be absolutely
required.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v4 10/12] Add a base implementation of SHA-256 support
  2018-10-28 15:52     ` brian m. carlson
@ 2018-10-29  0:39       ` Junio C Hamano
  2018-10-31 22:55         ` brian m. carlson
  0 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2018-10-29  0:39 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Carlo Arenas, git, stolee, avarab, pclouds, szeder.dev

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> > +
>> > +#include "git-compat-util.h"
>> 
>> this shouldn't be needed and might be discouraged as per the
>> instructions in Documentation/CodingGuidelines
>
> This may not strictly be needed, but removing it makes the header no
> longer self-contained, which blows up my (and others') in-editor
> linting.

That sounds like bending backwards to please tools, though.  Can't
these in-editor linting learn the local rules of the codebase they
are asked to operate on?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v4 10/12] Add a base implementation of SHA-256 support
  2018-10-29  0:39       ` Junio C Hamano
@ 2018-10-31 22:55         ` brian m. carlson
  2018-11-01  5:29           ` Junio C Hamano
  0 siblings, 1 reply; 58+ messages in thread
From: brian m. carlson @ 2018-10-31 22:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Carlo Arenas, git, stolee, avarab, pclouds, szeder.dev

[-- Attachment #1: Type: text/plain, Size: 1232 bytes --]

On Mon, Oct 29, 2018 at 09:39:33AM +0900, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > This may not strictly be needed, but removing it makes the header no
> > longer self-contained, which blows up my (and others') in-editor
> > linting.
> 
> That sounds like bending backwards to please tools, though.  Can't
> these in-editor linting learn the local rules of the codebase they
> are asked to operate on?

Doing so involves writing (in my case) Vim code to configure settings
for this repo.

Since language linting tools invoke compilers and other language
runtimes, you need to specify command-line arguments to those tools, and
in general, that's not a safe thing you can read from the repository
configuration, since just viewing files should not be able to execute
arbitrary code[0].  Languages such as Perl which can execute arbitrary
code with compile checks have to be enabled explicitly with ALE (which
is what I'm using).

I need to do a reroll to address some other issues people brought up, so
I can remove this line.

[0] Pass "-o .bashrc" to the preprocessor, for example.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v4 10/12] Add a base implementation of SHA-256 support
  2018-10-31 22:55         ` brian m. carlson
@ 2018-11-01  5:29           ` Junio C Hamano
  0 siblings, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2018-11-01  5:29 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Carlo Arenas, git, stolee, avarab, pclouds, szeder.dev

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I need to do a reroll to address some other issues people brought up, so
> I can remove this line.

OK.  I actually do not feel too strongly about this one, by the way.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v5 00/12] Base SHA-256 implementation
  2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
                   ` (11 preceding siblings ...)
  2018-10-25  2:40 ` [PATCH v4 12/12] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
@ 2018-11-04 23:44 ` brian m. carlson
  2018-11-04 23:44   ` [PATCH v5 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
                     ` (13 more replies)
  12 siblings, 14 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

This series provides a functional SHA-256 implementation and wires it
up, along with some housekeeping patches to make it suitable for
testing.

Changes from v4:
* Downcase hex constants for consistency.
* Remove needless parentheses in return statement.
* Remove braces for single statement loops.
* Switch to +=.
* Add references to rationale for SHA-256.
* Remove inclusion of "git-compat-util.h" in header.

Changes from v3:
* Switch to using inline functions instead of macros in many cases.
* Undefine remaining macros at the top.

Changes from v2:
* Improve commit messages to include timing and performance information.
* Improve commit messages to be less ambiguous and more friendly to a
  wider variety of English speakers.
* Prefer functions taking struct git_hash_algo in hex.c.
* Port pieces of the block-sha1 implementation over to the block-sha256
  implementation for better compatibility.
* Drop patch 13 in favor of further discussion about the best way
  forward for versioning commit graph.
* Rename the test so as to have a different number from other tests.
* Rebase on master.

Changes from v1:
* Add a hash_to_hex function mirroring sha1_to_hex, but for
  the_hash_algo.
* Strip commit message explanation about why we chose SHA-256.
* Rebase on master
* Strip leading whitespace from commit message.
* Improve commit-graph patch to cover new code added since v1.
* Be more honest about the scope of work involved in porting the SHA-256
  implementation out of libtomcrypt.
* Revert change to limit hashcmp to 20 bytes.

brian m. carlson (12):
  sha1-file: rename algorithm to "sha1"
  sha1-file: provide functions to look up hash algorithms
  hex: introduce functions to print arbitrary hashes
  cache: make hashcmp and hasheq work with larger hashes
  t: add basic tests for our SHA-1 implementation
  t: make the sha1 test-tool helper generic
  sha1-file: add a constant for hash block size
  t/helper: add a test helper to compute hash speed
  commit-graph: convert to using the_hash_algo
  Add a base implementation of SHA-256 support
  sha256: add an SHA-256 implementation using libgcrypt
  hash: add an SHA-256 implementation using OpenSSL

 Makefile                              |  22 +++
 cache.h                               |  51 ++++---
 commit-graph.c                        |  33 ++---
 hash.h                                |  41 +++++-
 hex.c                                 |  32 +++--
 sha1-file.c                           |  70 ++++++++-
 sha256/block/sha256.c                 | 196 ++++++++++++++++++++++++++
 sha256/block/sha256.h                 |  26 ++++
 sha256/gcrypt.h                       |  30 ++++
 t/helper/test-hash-speed.c            |  61 ++++++++
 t/helper/{test-sha1.c => test-hash.c} |  19 +--
 t/helper/test-sha1.c                  |  52 +------
 t/helper/test-sha256.c                |   7 +
 t/helper/test-tool.c                  |   2 +
 t/helper/test-tool.h                  |   4 +
 t/t0015-hash.sh                       |  54 +++++++
 16 files changed, 596 insertions(+), 104 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 sha256/gcrypt.h
 create mode 100644 t/helper/test-hash-speed.c
 copy t/helper/{test-sha1.c => test-hash.c} (65%)
 create mode 100644 t/helper/test-sha256.c
 create mode 100755 t/t0015-hash.sh

Range-diff against v4:
 1:  a004a4c982 <  -:  ---------- :hash-impl
 2:  cf9f7f5620 =  1:  cf9f7f5620 sha1-file: rename algorithm to "sha1"
 3:  0144deaebe =  2:  0144deaebe sha1-file: provide functions to look up hash algorithms
 4:  b74858fb03 =  3:  b74858fb03 hex: introduce functions to print arbitrary hashes
 5:  e9703017a4 =  4:  e9703017a4 cache: make hashcmp and hasheq work with larger hashes
 6:  ab85a834fd =  5:  ab85a834fd t: add basic tests for our SHA-1 implementation
 7:  962f6d8903 =  6:  962f6d8903 t: make the sha1 test-tool helper generic
 8:  53addf4d58 =  7:  53addf4d58 sha1-file: add a constant for hash block size
 9:  9ace10faa2 =  8:  9ace10faa2 t/helper: add a test helper to compute hash speed
10:  9adc56d01e =  9:  9adc56d01e commit-graph: convert to using the_hash_algo
11:  f48cb1ad27 ! 10:  90544c504c Add a base implementation of SHA-256 support
    @@ -4,7 +4,9 @@
     
         SHA-1 is weak and we need to transition to a new hash function.  For
         some time, we have referred to this new function as NewHash.  Recently,
    -    we decided to pick SHA-256 as NewHash.
    +    we decided to pick SHA-256 as NewHash.  The reasons behind the choice of
    +    SHA-256 are outlined in the thread starting at [1] and in the commit
    +    history for the hash function transition document.
     
         Add a basic implementation of SHA-256 based off libtomcrypt, which is in
         the public domain.  Optimize it and restructure it to meet our coding
    @@ -20,6 +22,8 @@
         SHA-256 in Git.  Additional patches are needed to prepare the code to
         handle a larger hash algorithm and further test fixes are needed.
     
    +    [1] https://public-inbox.org/git/20180609224913.GC38834@genre.crustytoothpaste.net/
    +
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
      diff --git a/Makefile b/Makefile
    @@ -216,14 +220,14 @@
     +{
     +	ctx->offset = 0;
     +	ctx->size = 0;
    -+	ctx->state[0] = 0x6A09E667UL;
    -+	ctx->state[1] = 0xBB67AE85UL;
    -+	ctx->state[2] = 0x3C6EF372UL;
    -+	ctx->state[3] = 0xA54FF53AUL;
    -+	ctx->state[4] = 0x510E527FUL;
    -+	ctx->state[5] = 0x9B05688CUL;
    -+	ctx->state[6] = 0x1F83D9ABUL;
    -+	ctx->state[7] = 0x5BE0CD19UL;
    ++	ctx->state[0] = 0x6a09e667ul;
    ++	ctx->state[1] = 0xbb67ae85ul;
    ++	ctx->state[2] = 0x3c6ef372ul;
    ++	ctx->state[3] = 0xa54ff53aul;
    ++	ctx->state[4] = 0x510e527ful;
    ++	ctx->state[5] = 0x9b05688cul;
    ++	ctx->state[6] = 0x1f83d9abul;
    ++	ctx->state[7] = 0x5be0cd19ul;
     +}
     +
     +static inline uint32_t ror(uint32_t x, unsigned n)
    @@ -233,12 +237,12 @@
     +
     +static inline uint32_t ch(uint32_t x, uint32_t y, uint32_t z)
     +{
    -+	return (z ^ (x & (y ^ z)));
    ++	return z ^ (x & (y ^ z));
     +}
     +
     +static inline uint32_t maj(uint32_t x, uint32_t y, uint32_t z)
     +{
    -+	return (((x | y) & z) | (x & y));
    ++	return ((x | y) & z) | (x & y);
     +}
     +
     +static inline uint32_t sigma0(uint32_t x)
    @@ -268,19 +272,16 @@
     +	int i;
     +
     +	/* copy state into S */
    -+	for (i = 0; i < 8; i++) {
    ++	for (i = 0; i < 8; i++)
     +		S[i] = ctx->state[i];
    -+	}
     +
     +	/* copy the state into 512-bits into W[0..15] */
    -+	for (i = 0; i < 16; i++, buf += sizeof(uint32_t)) {
    ++	for (i = 0; i < 16; i++, buf += sizeof(uint32_t))
     +		W[i] = get_be32(buf);
    -+	}
     +
     +	/* fill W[16..63] */
    -+	for (i = 16; i < 64; i++) {
    ++	for (i = 16; i < 64; i++)
     +		W[i] = gamma1(W[i - 2]) + W[i - 7] + gamma0(W[i - 15]) + W[i - 16];
    -+	}
     +
     +#define RND(a,b,c,d,e,f,g,h,i,ki)                    \
     +	t0 = h + sigma1(e) + ch(e, f, g) + ki + W[i];   \
    @@ -353,10 +354,8 @@
     +	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],62,0xbef9a3f7);
     +	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],63,0xc67178f2);
     +
    -+
    -+	for (i = 0; i < 8; i++) {
    -+		ctx->state[i] = ctx->state[i] + S[i];
    -+	}
    ++	for (i = 0; i < 8; i++)
    ++		ctx->state[i] += S[i];
     +}
     +
     +void blk_SHA256_Update(blk_SHA256_CTX *ctx, const void *data, size_t len)
12:  fe8f2ba01c = 11:  467c86e878 sha256: add an SHA-256 implementation using libgcrypt
13:  38142d8fc6 = 12:  73e4bc17d0 hash: add an SHA-256 implementation using OpenSSL

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v5 01/12] sha1-file: rename algorithm to "sha1"
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
@ 2018-11-04 23:44   ` brian m. carlson
  2018-11-05  7:21     ` Ævar Arnfjörð Bjarmason
  2018-11-04 23:44   ` [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms brian m. carlson
                     ` (12 subsequent siblings)
  13 siblings, 1 reply; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

The transition plan anticipates us using a syntax such as "^{sha1}" for
disambiguation.  Since this is a syntax some people will be typing a
lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
the dash doesn't create any ambiguity; however, it does make the syntax
shorter and easier to type, especially for touch typists.  In addition,
the transition plan already uses "sha1" in this context.

Rename the name of SHA-1 implementation to "sha1".

Note that this change creates no backwards compatibility concerns, since
we haven't yet used this field in any configuration settings.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 sha1-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..91311ebb3d 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -97,7 +97,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		NULL,
 	},
 	{
-		"sha-1",
+		"sha1",
 		/* "sha1", big-endian */
 		0x73686131,
 		GIT_SHA1_RAWSZ,

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
  2018-11-04 23:44   ` [PATCH v5 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
@ 2018-11-04 23:44   ` brian m. carlson
  2018-11-13 18:42     ` Derrick Stolee
  2018-11-04 23:44   ` [PATCH v5 03/12] hex: introduce functions to print arbitrary hashes brian m. carlson
                     ` (11 subsequent siblings)
  13 siblings, 1 reply; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

There are several ways we might refer to a hash algorithm: by name, such
as in the config file; by format ID, such as in a pack; or internally,
by a pointer to the hash_algos array.  Provide functions to look up hash
algorithms based on these various forms and return the internal constant
used for them.  If conversion to another form is necessary, this
internal constant can be used to look up the proper data in the
hash_algos array.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h      | 13 +++++++++++++
 sha1-file.c | 21 +++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/hash.h b/hash.h
index 7c8238bc2e..80881eea47 100644
--- a/hash.h
+++ b/hash.h
@@ -98,4 +98,17 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
+/*
+ * Return a GIT_HASH_* constant based on the name.  Returns GIT_HASH_UNKNOWN if
+ * the name doesn't match a known algorithm.
+ */
+int hash_algo_by_name(const char *name);
+/* Identical, except based on the format ID. */
+int hash_algo_by_id(uint32_t format_id);
+/* Identical, except for a pointer to struct git_hash_algo. */
+static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+{
+	return p - hash_algos;
+}
+
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index 91311ebb3d..7e9dedc744 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -122,6 +122,27 @@ const char *empty_blob_oid_hex(void)
 	return oid_to_hex_r(buf, the_hash_algo->empty_blob);
 }
 
+int hash_algo_by_name(const char *name)
+{
+	int i;
+	if (!name)
+		return GIT_HASH_UNKNOWN;
+	for (i = 1; i < GIT_HASH_NALGOS; i++)
+		if (!strcmp(name, hash_algos[i].name))
+			return i;
+	return GIT_HASH_UNKNOWN;
+}
+
+int hash_algo_by_id(uint32_t format_id)
+{
+	int i;
+	for (i = 1; i < GIT_HASH_NALGOS; i++)
+		if (format_id == hash_algos[i].format_id)
+			return i;
+	return GIT_HASH_UNKNOWN;
+}
+
+
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 03/12] hex: introduce functions to print arbitrary hashes
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
  2018-11-04 23:44   ` [PATCH v5 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
  2018-11-04 23:44   ` [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms brian m. carlson
@ 2018-11-04 23:44   ` brian m. carlson
  2018-11-04 23:44   ` [PATCH v5 04/12] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer.  Add variants of these functions that can handle
an arbitrary hash algorithm, specified by constant.  Update the
documentation as well.

While we're at it, remove the "extern" declaration from this family of
functions, since it's not needed and our style now recommends against
it.

We use the variant taking the algorithm structure pointer as the
internal variant, since taking an algorithm pointer is the easiest way
to handle all of the variants in use.

Note that we maintain these functions because there are hashes which
must change based on the hash algorithm in use but are not object IDs
(such as pack checksums).

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 15 +++++++++------
 hex.c   | 32 ++++++++++++++++++++++++--------
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index 59c8a93046..51580c4b77 100644
--- a/cache.h
+++ b/cache.h
@@ -1364,9 +1364,9 @@ extern int get_oid_hex(const char *hex, struct object_id *sha1);
 extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
 
 /*
- * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
+ * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
  * and writes the NUL-terminated output to the buffer `out`, which must be at
- * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
+ * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
  * convenience.
  *
  * The non-`_r` variant returns a static buffer, but uses a ring of 4
@@ -1374,10 +1374,13 @@ extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
  *
  *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
  */
-extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
-extern char *oid_to_hex_r(char *out, const struct object_id *oid);
-extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
-extern char *oid_to_hex(const struct object_id *oid);	/* same static buffer as sha1_to_hex */
+char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *);
+char *sha1_to_hex_r(char *out, const unsigned char *sha1);
+char *oid_to_hex_r(char *out, const struct object_id *oid);
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *);	/* static buffer result! */
+char *sha1_to_hex(const unsigned char *sha1);						/* same static buffer */
+char *hash_to_hex(const unsigned char *hash);						/* same static buffer */
+char *oid_to_hex(const struct object_id *oid);						/* same static buffer */
 
 /*
  * Parse a 40-character hexadecimal object ID starting from hex, updating the
diff --git a/hex.c b/hex.c
index 10af1a29e8..d2e8bb9540 100644
--- a/hex.c
+++ b/hex.c
@@ -73,14 +73,15 @@ int parse_oid_hex(const char *hex, struct object_id *oid, const char **end)
 	return ret;
 }
 
-char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+inline char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
+					const struct git_hash_algo *algop)
 {
 	static const char hex[] = "0123456789abcdef";
 	char *buf = buffer;
 	int i;
 
-	for (i = 0; i < the_hash_algo->rawsz; i++) {
-		unsigned int val = *sha1++;
+	for (i = 0; i < algop->rawsz; i++) {
+		unsigned int val = *hash++;
 		*buf++ = hex[val >> 4];
 		*buf++ = hex[val & 0xf];
 	}
@@ -89,20 +90,35 @@ char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 	return buffer;
 }
 
-char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 {
-	return sha1_to_hex_r(buffer, oid->hash);
+	return hash_to_hex_algop_r(buffer, sha1, &hash_algos[GIT_HASH_SHA1]);
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
+char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+{
+	return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+}
+
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
 {
 	static int bufno;
 	static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
 	bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-	return sha1_to_hex_r(hexbuffer[bufno], sha1);
+	return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
+}
+
+char *sha1_to_hex(const unsigned char *sha1)
+{
+	return hash_to_hex_algop(sha1, &hash_algos[GIT_HASH_SHA1]);
+}
+
+char *hash_to_hex(const unsigned char *hash)
+{
+	return hash_to_hex_algop(hash, the_hash_algo);
 }
 
 char *oid_to_hex(const struct object_id *oid)
 {
-	return sha1_to_hex(oid->hash);
+	return hash_to_hex_algop(oid->hash, the_hash_algo);
 }

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 04/12] cache: make hashcmp and hasheq work with larger hashes
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
                     ` (2 preceding siblings ...)
  2018-11-04 23:44   ` [PATCH v5 03/12] hex: introduce functions to print arbitrary hashes brian m. carlson
@ 2018-11-04 23:44   ` brian m. carlson
  2018-11-04 23:44   ` [PATCH v5 05/12] t: add basic tests for our SHA-1 implementation brian m. carlson
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

In 183a638b7d ("hashcmp: assert constant hash size", 2018-08-23), we
modified hashcmp to assert that the hash size was always 20 to help it
optimize and inline calls to memcmp.  In a future series, we replaced
many calls to hashcmp and oidcmp with calls to hasheq and oideq to
improve inlining further.

However, we want to support hash algorithms other than SHA-1, namely
SHA-256.  When doing so, we must handle the case where these values are
32 bytes long as well as 20.  Adjust hashcmp to handle two cases:
20-byte matches, and maximum-size matches.  Therefore, when we include
SHA-256, we'll automatically handle it properly, while at the same time
teaching the compiler that there are only two possible options to
consider.  This will allow the compiler to write the most efficient
possible code.

Copy similar code into hasheq and perform an identical transformation.
At least with GCC 8.2.0, making hasheq defer to hashcmp when there are
two branches prevents the compiler from inlining the comparison, while
the code in this patch is inlined properly.  Add a comment to avoid an
accidental performance regression from well-intentioned refactoring.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index 51580c4b77..bab8e8964f 100644
--- a/cache.h
+++ b/cache.h
@@ -1027,16 +1027,12 @@ extern const struct object_id null_oid;
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
 	/*
-	 * This is a temporary optimization hack. By asserting the size here,
-	 * we let the compiler know that it's always going to be 20, which lets
-	 * it turn this fixed-size memcmp into a few inline instructions.
-	 *
-	 * This will need to be extended or ripped out when we learn about
-	 * hashes of different sizes.
+	 * Teach the compiler that there are only two possibilities of hash size
+	 * here, so that it can optimize for this case as much as possible.
 	 */
-	if (the_hash_algo->rawsz != 20)
-		BUG("hash size not yet supported by hashcmp");
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+		return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+	return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
@@ -1046,7 +1042,13 @@ static inline int oidcmp(const struct object_id *oid1, const struct object_id *o
 
 static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return !hashcmp(sha1, sha2);
+	/*
+	 * We write this here instead of deferring to hashcmp so that the
+	 * compiler can properly inline it and avoid calling memcmp.
+	 */
+	if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+		return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+	return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 05/12] t: add basic tests for our SHA-1 implementation
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
                     ` (3 preceding siblings ...)
  2018-11-04 23:44   ` [PATCH v5 04/12] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
@ 2018-11-04 23:44   ` brian m. carlson
  2018-11-04 23:44   ` [PATCH v5 06/12] t: make the sha1 test-tool helper generic brian m. carlson
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

We have in the past had some unfortunate endianness issues with some
SHA-1 implementations we ship, especially on big-endian machines.  Add
an explicit test using the test helper to catch these issues and point
them out prominently.  This test can also be used as a staging ground
for people testing additional algorithms to verify that their
implementations are working as expected.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t0015-hash.sh | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100755 t/t0015-hash.sh

diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
new file mode 100755
index 0000000000..8e763c2c3d
--- /dev/null
+++ b/t/t0015-hash.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='test basic hash implementation'
+. ./test-lib.sh
+
+
+test_expect_success 'test basic SHA-1 hash values' '
+	test-tool sha1 </dev/null >actual &&
+	grep da39a3ee5e6b4b0d3255bfef95601890afd80709 actual &&
+	printf "a" | test-tool sha1 >actual &&
+	grep 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 actual &&
+	printf "abc" | test-tool sha1 >actual &&
+	grep a9993e364706816aba3e25717850c26c9cd0d89d actual &&
+	printf "message digest" | test-tool sha1 >actual &&
+	grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
+	printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
+	grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
+	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
+		test-tool sha1 >actual &&
+	grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&
+	printf "blob 0\0" | test-tool sha1 >actual &&
+	grep e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 actual &&
+	printf "blob 3\0abc" | test-tool sha1 >actual &&
+	grep f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f actual &&
+	printf "tree 0\0" | test-tool sha1 >actual &&
+	grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
+'
+
+test_done

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 06/12] t: make the sha1 test-tool helper generic
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
                     ` (4 preceding siblings ...)
  2018-11-04 23:44   ` [PATCH v5 05/12] t: add basic tests for our SHA-1 implementation brian m. carlson
@ 2018-11-04 23:44   ` brian m. carlson
  2018-11-04 23:44   ` [PATCH v5 07/12] sha1-file: add a constant for hash block size brian m. carlson
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

Since we're going to have multiple hash algorithms to test, it makes
sense to share as much of the test code as possible.  Convert the sha1
helper for the test-tool to be generic and move it out into its own
module.  This will allow us to share most of this code with our NewHash
implementation.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile                              |  1 +
 t/helper/{test-sha1.c => test-hash.c} | 19 +++++-----
 t/helper/test-sha1.c                  | 52 +--------------------------
 t/helper/test-tool.h                  |  2 ++
 4 files changed, 14 insertions(+), 60 deletions(-)
 copy t/helper/{test-sha1.c => test-hash.c} (65%)

diff --git a/Makefile b/Makefile
index d18ab0fe78..81dc9ac819 100644
--- a/Makefile
+++ b/Makefile
@@ -714,6 +714,7 @@ TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
+TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
diff --git a/t/helper/test-sha1.c b/t/helper/test-hash.c
similarity index 65%
copy from t/helper/test-sha1.c
copy to t/helper/test-hash.c
index 1ba0675c75..0a31de66f3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-hash.c
@@ -1,13 +1,14 @@
 #include "test-tool.h"
 #include "cache.h"
 
-int cmd__sha1(int ac, const char **av)
+int cmd_hash_impl(int ac, const char **av, int algo)
 {
-	git_SHA_CTX ctx;
-	unsigned char sha1[20];
+	git_hash_ctx ctx;
+	unsigned char hash[GIT_MAX_HEXSZ];
 	unsigned bufsz = 8192;
 	int binary = 0;
 	char *buffer;
+	const struct git_hash_algo *algop = &hash_algos[algo];
 
 	if (ac == 2) {
 		if (!strcmp(av[1], "-b"))
@@ -26,7 +27,7 @@ int cmd__sha1(int ac, const char **av)
 			die("OOPS");
 	}
 
-	git_SHA1_Init(&ctx);
+	algop->init_fn(&ctx);
 
 	while (1) {
 		ssize_t sz, this_sz;
@@ -38,20 +39,20 @@ int cmd__sha1(int ac, const char **av)
 			if (sz == 0)
 				break;
 			if (sz < 0)
-				die_errno("test-sha1");
+				die_errno("test-hash");
 			this_sz += sz;
 			cp += sz;
 			room -= sz;
 		}
 		if (this_sz == 0)
 			break;
-		git_SHA1_Update(&ctx, buffer, this_sz);
+		algop->update_fn(&ctx, buffer, this_sz);
 	}
-	git_SHA1_Final(sha1, &ctx);
+	algop->final_fn(hash, &ctx);
 
 	if (binary)
-		fwrite(sha1, 1, 20, stdout);
+		fwrite(hash, 1, algop->rawsz, stdout);
 	else
-		puts(sha1_to_hex(sha1));
+		puts(hash_to_hex_algop(hash, algop));
 	exit(0);
 }
diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index 1ba0675c75..d860c387c3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -3,55 +3,5 @@
 
 int cmd__sha1(int ac, const char **av)
 {
-	git_SHA_CTX ctx;
-	unsigned char sha1[20];
-	unsigned bufsz = 8192;
-	int binary = 0;
-	char *buffer;
-
-	if (ac == 2) {
-		if (!strcmp(av[1], "-b"))
-			binary = 1;
-		else
-			bufsz = strtoul(av[1], NULL, 10) * 1024 * 1024;
-	}
-
-	if (!bufsz)
-		bufsz = 8192;
-
-	while ((buffer = malloc(bufsz)) == NULL) {
-		fprintf(stderr, "bufsz %u is too big, halving...\n", bufsz);
-		bufsz /= 2;
-		if (bufsz < 1024)
-			die("OOPS");
-	}
-
-	git_SHA1_Init(&ctx);
-
-	while (1) {
-		ssize_t sz, this_sz;
-		char *cp = buffer;
-		unsigned room = bufsz;
-		this_sz = 0;
-		while (room) {
-			sz = xread(0, cp, room);
-			if (sz == 0)
-				break;
-			if (sz < 0)
-				die_errno("test-sha1");
-			this_sz += sz;
-			cp += sz;
-			room -= sz;
-		}
-		if (this_sz == 0)
-			break;
-		git_SHA1_Update(&ctx, buffer, this_sz);
-	}
-	git_SHA1_Final(sha1, &ctx);
-
-	if (binary)
-		fwrite(sha1, 1, 20, stdout);
-	else
-		puts(sha1_to_hex(sha1));
-	exit(0);
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
 }
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e4890566da..29ac7b0b0d 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -50,4 +50,6 @@ int cmd__windows_named_pipe(int argc, const char **argv);
 #endif
 int cmd__write_cache(int argc, const char **argv);
 
+int cmd_hash_impl(int ac, const char **av, int algo);
+
 #endif

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 07/12] sha1-file: add a constant for hash block size
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
                     ` (5 preceding siblings ...)
  2018-11-04 23:44   ` [PATCH v5 06/12] t: make the sha1 test-tool helper generic brian m. carlson
@ 2018-11-04 23:44   ` brian m. carlson
  2018-11-04 23:44   ` [PATCH v5 08/12] t/helper: add a test helper to compute hash speed brian m. carlson
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

There is one place we need the hash algorithm block size: the HMAC code
for push certs.  Expose this constant in struct git_hash_algo and expose
values for SHA-1 and for the largest value of any hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h     | 4 ++++
 hash.h      | 3 +++
 sha1-file.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/cache.h b/cache.h
index bab8e8964f..9e5d1dd85a 100644
--- a/cache.h
+++ b/cache.h
@@ -45,10 +45,14 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 /* The length in bytes and in hex digits of an object name (SHA-1 value). */
 #define GIT_SHA1_RAWSZ 20
 #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+/* The block size of SHA-1. */
+#define GIT_SHA1_BLKSZ 64
 
 /* The length in byte and in hex digits of the largest possible hash value. */
 #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
 #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+/* The largest possible block size for any supported hash. */
+#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 80881eea47..1bcf7ab6fd 100644
--- a/hash.h
+++ b/hash.h
@@ -81,6 +81,9 @@ struct git_hash_algo {
 	/* The length of the hash in hex characters. */
 	size_t hexsz;
 
+	/* The block size of the hash. */
+	size_t blksz;
+
 	/* The hash initialization function. */
 	git_hash_init_fn init_fn;
 
diff --git a/sha1-file.c b/sha1-file.c
index 7e9dedc744..9bdd04ea45 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -90,6 +90,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		0x00000000,
 		0,
 		0,
+		0,
 		git_hash_unknown_init,
 		git_hash_unknown_update,
 		git_hash_unknown_final,
@@ -102,6 +103,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		0x73686131,
 		GIT_SHA1_RAWSZ,
 		GIT_SHA1_HEXSZ,
+		GIT_SHA1_BLKSZ,
 		git_hash_sha1_init,
 		git_hash_sha1_update,
 		git_hash_sha1_final,

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 08/12] t/helper: add a test helper to compute hash speed
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
                     ` (6 preceding siblings ...)
  2018-11-04 23:44   ` [PATCH v5 07/12] sha1-file: add a constant for hash block size brian m. carlson
@ 2018-11-04 23:44   ` brian m. carlson
  2018-11-04 23:44   ` [PATCH v5 09/12] commit-graph: convert to using the_hash_algo brian m. carlson
                     ` (5 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

Add a utility (which is less for the testsuite and more for developers)
that can compute hash speeds for whatever hash algorithms are
implemented.  This allows developers to test their personal systems to
determine the performance characteristics of various algorithms.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile                   |  1 +
 t/helper/test-hash-speed.c | 61 ++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c       |  1 +
 t/helper/test-tool.h       |  1 +
 4 files changed, 64 insertions(+)
 create mode 100644 t/helper/test-hash-speed.c

diff --git a/Makefile b/Makefile
index 81dc9ac819..68169a7abb 100644
--- a/Makefile
+++ b/Makefile
@@ -716,6 +716,7 @@ TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
+TEST_BUILTINS_OBJS += test-hash-speed.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
 TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
diff --git a/t/helper/test-hash-speed.c b/t/helper/test-hash-speed.c
new file mode 100644
index 0000000000..432233c7f0
--- /dev/null
+++ b/t/helper/test-hash-speed.c
@@ -0,0 +1,61 @@
+#include "test-tool.h"
+#include "cache.h"
+
+#define NUM_SECONDS 3
+
+static inline void compute_hash(const struct git_hash_algo *algo, git_hash_ctx *ctx, uint8_t *final, const void *p, size_t len)
+{
+	algo->init_fn(ctx);
+	algo->update_fn(ctx, p, len);
+	algo->final_fn(final, ctx);
+}
+
+int cmd__hash_speed(int ac, const char **av)
+{
+	git_hash_ctx ctx;
+	unsigned char hash[GIT_MAX_RAWSZ];
+	clock_t initial, start, end;
+	unsigned bufsizes[] = { 64, 256, 1024, 8192, 16384 };
+	int i;
+	void *p;
+	const struct git_hash_algo *algo = NULL;
+
+	if (ac == 2) {
+		for (i = 1; i < GIT_HASH_NALGOS; i++) {
+			if (!strcmp(av[1], hash_algos[i].name)) {
+				algo = &hash_algos[i];
+				break;
+			}
+		}
+	}
+	if (!algo)
+		die("usage: test-tool hash-speed algo_name");
+
+	/* Use this as an offset to make overflow less likely. */
+	initial = clock();
+
+	printf("algo: %s\n", algo->name);
+
+	for (i = 0; i < ARRAY_SIZE(bufsizes); i++) {
+		unsigned long j, kb;
+		double kb_per_sec;
+		p = xcalloc(1, bufsizes[i]);
+		start = end = clock() - initial;
+		for (j = 0; ((end - start) / CLOCKS_PER_SEC) < NUM_SECONDS; j++) {
+			compute_hash(algo, &ctx, hash, p, bufsizes[i]);
+
+			/*
+			 * Only check elapsed time every 128 iterations to avoid
+			 * dominating the runtime with system calls.
+			 */
+			if (!(j & 127))
+				end = clock() - initial;
+		}
+		kb = j * bufsizes[i];
+		kb_per_sec = kb / (1024 * ((double)end - start) / CLOCKS_PER_SEC);
+		printf("size %u: %lu iters; %lu KiB; %0.2f KiB/s\n", bufsizes[i], j, kb, kb_per_sec);
+		free(p);
+	}
+
+	exit(0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 6b5836dc1b..e009c8186d 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -20,6 +20,7 @@ static struct test_cmd cmds[] = {
 	{ "example-decorate", cmd__example_decorate },
 	{ "genrandom", cmd__genrandom },
 	{ "hashmap", cmd__hashmap },
+	{ "hash-speed", cmd__hash_speed },
 	{ "index-version", cmd__index_version },
 	{ "json-writer", cmd__json_writer },
 	{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 29ac7b0b0d..19a7e8332a 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -16,6 +16,7 @@ int cmd__dump_untracked_cache(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
+int cmd__hash_speed(int argc, const char **argv);
 int cmd__index_version(int argc, const char **argv);
 int cmd__json_writer(int argc, const char **argv);
 int cmd__lazy_init_name_hash(int argc, const char **argv);

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 09/12] commit-graph: convert to using the_hash_algo
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
                     ` (7 preceding siblings ...)
  2018-11-04 23:44   ` [PATCH v5 08/12] t/helper: add a test helper to compute hash speed brian m. carlson
@ 2018-11-04 23:44   ` brian m. carlson
  2018-11-04 23:44   ` [PATCH v5 10/12] Add a base implementation of SHA-256 support brian m. carlson
                     ` (4 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

Instead of using hard-coded constants for object sizes, use
the_hash_algo to look them up.  In addition, use a function call to look
up the object ID version and produce the correct value.  For now, we use
version 1, which means to use the default algorithm used in the rest of
the repository.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 commit-graph.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..6763d19288 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -23,16 +23,11 @@
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
 #define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
 
-#define GRAPH_DATA_WIDTH 36
+#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
 #define GRAPH_VERSION_1 0x1
 #define GRAPH_VERSION GRAPH_VERSION_1
 
-#define GRAPH_OID_VERSION_SHA1 1
-#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
-#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
-#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
-
 #define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000
 #define GRAPH_PARENT_MISSING 0x7fffffff
 #define GRAPH_EDGE_LAST_MASK 0x7fffffff
@@ -44,13 +39,18 @@
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
-			+ GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
+			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static uint8_t oid_version(void)
+{
+	return 1;
+}
+
 static struct commit_graph *alloc_commit_graph(void)
 {
 	struct commit_graph *g = xcalloc(1, sizeof(*g));
@@ -125,15 +125,15 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	}
 
 	hash_version = *(unsigned char*)(data + 5);
-	if (hash_version != GRAPH_OID_VERSION) {
+	if (hash_version != oid_version()) {
 		error(_("hash version %X does not match version %X"),
-		      hash_version, GRAPH_OID_VERSION);
+		      hash_version, oid_version());
 		goto cleanup_fail;
 	}
 
 	graph = alloc_commit_graph();
 
-	graph->hash_len = GRAPH_OID_LEN;
+	graph->hash_len = the_hash_algo->rawsz;
 	graph->num_chunks = *(unsigned char*)(data + 6);
 	graph->graph_fd = fd;
 	graph->data = graph_map;
@@ -149,7 +149,7 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 
 		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
-		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+		if (chunk_offset > graph_size - the_hash_algo->rawsz) {
 			error(_("improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
 			      (uint32_t)chunk_offset);
 			goto cleanup_fail;
@@ -764,6 +764,7 @@ void write_commit_graph(const char *obj_dir,
 	int num_extra_edges;
 	struct commit_list *parent;
 	struct progress *progress = NULL;
+	const unsigned hashsz = the_hash_algo->rawsz;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
@@ -909,7 +910,7 @@ void write_commit_graph(const char *obj_dir,
 	hashwrite_be32(f, GRAPH_SIGNATURE);
 
 	hashwrite_u8(f, GRAPH_VERSION);
-	hashwrite_u8(f, GRAPH_OID_VERSION);
+	hashwrite_u8(f, oid_version());
 	hashwrite_u8(f, num_chunks);
 	hashwrite_u8(f, 0); /* unused padding byte */
 
@@ -924,8 +925,8 @@ void write_commit_graph(const char *obj_dir,
 
 	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
 	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
-	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+	chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
+	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
 	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
 
 	for (i = 0; i <= num_chunks; i++) {
@@ -938,8 +939,8 @@ void write_commit_graph(const char *obj_dir,
 	}
 
 	write_graph_chunk_fanout(f, commits.list, commits.nr);
-	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	write_graph_chunk_oids(f, hashsz, commits.list, commits.nr);
+	write_graph_chunk_data(f, hashsz, commits.list, commits.nr);
 	write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
 	close_commit_graph(the_repository);

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 10/12] Add a base implementation of SHA-256 support
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
                     ` (8 preceding siblings ...)
  2018-11-04 23:44   ` [PATCH v5 09/12] commit-graph: convert to using the_hash_algo brian m. carlson
@ 2018-11-04 23:44   ` brian m. carlson
  2018-11-05 11:39     ` Ævar Arnfjörð Bjarmason
  2018-11-04 23:44   ` [PATCH v5 11/12] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
                     ` (3 subsequent siblings)
  13 siblings, 1 reply; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

SHA-1 is weak and we need to transition to a new hash function.  For
some time, we have referred to this new function as NewHash.  Recently,
we decided to pick SHA-256 as NewHash.  The reasons behind the choice of
SHA-256 are outlined in the thread starting at [1] and in the commit
history for the hash function transition document.

Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain.  Optimize it and restructure it to meet our coding
standards.  Pull in the update and final functions from the SHA-1 block
implementation, as we know these function correctly with all compilers.
This implementation is slower than SHA-1, but more performant
implementations will be introduced in future commits.

Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.

Note that with this patch, it is still not possible to switch to using
SHA-256 in Git.  Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.

[1] https://public-inbox.org/git/20180609224913.GC38834@genre.crustytoothpaste.net/

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile               |   4 +
 cache.h                |  12 ++-
 hash.h                 |  19 +++-
 sha1-file.c            |  45 ++++++++++
 sha256/block/sha256.c  | 196 +++++++++++++++++++++++++++++++++++++++++
 sha256/block/sha256.h  |  26 ++++++
 t/helper/test-sha256.c |   7 ++
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/t0015-hash.sh        |  25 ++++++
 10 files changed, 332 insertions(+), 4 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 t/helper/test-sha256.c

diff --git a/Makefile b/Makefile
index 68169a7abb..e99b7712f6 100644
--- a/Makefile
+++ b/Makefile
@@ -739,6 +739,7 @@ TEST_BUILTINS_OBJS += test-run-command.o
 TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
 TEST_BUILTINS_OBJS += test-sha1.o
 TEST_BUILTINS_OBJS += test-sha1-array.o
+TEST_BUILTINS_OBJS += test-sha256.o
 TEST_BUILTINS_OBJS += test-sigchain.o
 TEST_BUILTINS_OBJS += test-strcmp-offset.o
 TEST_BUILTINS_OBJS += test-string-list.o
@@ -1633,6 +1634,9 @@ endif
 endif
 endif
 
+LIB_OBJS += sha256/block/sha256.o
+BASIC_CFLAGS += -DSHA256_BLK
+
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
 	BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)"
diff --git a/cache.h b/cache.h
index 9e5d1dd85a..48ce1565e6 100644
--- a/cache.h
+++ b/cache.h
@@ -48,11 +48,17 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 /* The block size of SHA-1. */
 #define GIT_SHA1_BLKSZ 64
 
+/* The length in bytes and in hex digits of an object name (SHA-256 value). */
+#define GIT_SHA256_RAWSZ 32
+#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
+/* The block size of SHA-256. */
+#define GIT_SHA256_BLKSZ 64
+
 /* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
 /* The largest possible block size for any supported hash. */
-#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
+#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 1bcf7ab6fd..a9bc624020 100644
--- a/hash.h
+++ b/hash.h
@@ -15,6 +15,8 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#include "sha256/block/sha256.h"
+
 #ifndef platform_SHA_CTX
 /*
  * platform's underlying implementation of SHA-1; could be OpenSSL,
@@ -34,6 +36,18 @@
 #define git_SHA1_Update		platform_SHA1_Update
 #define git_SHA1_Final		platform_SHA1_Final
 
+#ifndef platform_SHA256_CTX
+#define platform_SHA256_CTX	SHA256_CTX
+#define platform_SHA256_Init	SHA256_Init
+#define platform_SHA256_Update	SHA256_Update
+#define platform_SHA256_Final	SHA256_Final
+#endif
+
+#define git_SHA256_CTX		platform_SHA256_CTX
+#define git_SHA256_Init		platform_SHA256_Init
+#define git_SHA256_Update	platform_SHA256_Update
+#define git_SHA256_Final	platform_SHA256_Final
+
 #ifdef SHA1_MAX_BLOCK_SIZE
 #include "compat/sha1-chunked.h"
 #undef git_SHA1_Update
@@ -52,12 +66,15 @@
 #define GIT_HASH_UNKNOWN 0
 /* SHA-1 */
 #define GIT_HASH_SHA1 1
+/* SHA-256  */
+#define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
-#define GIT_HASH_NALGOS (GIT_HASH_SHA1 + 1)
+#define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
 
 /* A suitably aligned type for stack allocations of hash contexts. */
 union git_hash_ctx {
 	git_SHA_CTX sha1;
+	git_SHA256_CTX sha256;
 };
 typedef union git_hash_ctx git_hash_ctx;
 
diff --git a/sha1-file.c b/sha1-file.c
index 9bdd04ea45..c97d93a14a 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -40,10 +40,20 @@
 #define EMPTY_TREE_SHA1_BIN_LITERAL \
 	 "\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \
 	 "\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"
+#define EMPTY_TREE_SHA256_BIN_LITERAL \
+	"\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \
+	"\x04\xd4\x5d\x8d\x85\xef\xa9\xb0\x57\xb5" \
+	"\x3b\x14\xb4\xb9\xb9\x39\xdd\x74\xde\xcc" \
+	"\x53\x21"
 
 #define EMPTY_BLOB_SHA1_BIN_LITERAL \
 	"\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b" \
 	"\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91"
+#define EMPTY_BLOB_SHA256_BIN_LITERAL \
+	"\x47\x3a\x0f\x4c\x3b\xe8\xa9\x36\x81\xa2" \
+	"\x67\xe3\xb1\xe9\xa7\xdc\xda\x11\x85\x43" \
+	"\x6f\xe1\x41\xf7\x74\x91\x20\xa3\x03\x72" \
+	"\x18\x13"
 
 const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
@@ -53,6 +63,12 @@ static const struct object_id empty_tree_oid = {
 static const struct object_id empty_blob_oid = {
 	EMPTY_BLOB_SHA1_BIN_LITERAL
 };
+static const struct object_id empty_tree_oid_sha256 = {
+	EMPTY_TREE_SHA256_BIN_LITERAL
+};
+static const struct object_id empty_blob_oid_sha256 = {
+	EMPTY_BLOB_SHA256_BIN_LITERAL
+};
 
 static void git_hash_sha1_init(git_hash_ctx *ctx)
 {
@@ -69,6 +85,22 @@ static void git_hash_sha1_final(unsigned char *hash, git_hash_ctx *ctx)
 	git_SHA1_Final(hash, &ctx->sha1);
 }
 
+
+static void git_hash_sha256_init(git_hash_ctx *ctx)
+{
+	git_SHA256_Init(&ctx->sha256);
+}
+
+static void git_hash_sha256_update(git_hash_ctx *ctx, const void *data, size_t len)
+{
+	git_SHA256_Update(&ctx->sha256, data, len);
+}
+
+static void git_hash_sha256_final(unsigned char *hash, git_hash_ctx *ctx)
+{
+	git_SHA256_Final(hash, &ctx->sha256);
+}
+
 static void git_hash_unknown_init(git_hash_ctx *ctx)
 {
 	BUG("trying to init unknown hash");
@@ -110,6 +142,19 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		&empty_tree_oid,
 		&empty_blob_oid,
 	},
+	{
+		"sha256",
+		/* "s256", big-endian */
+		0x73323536,
+		GIT_SHA256_RAWSZ,
+		GIT_SHA256_HEXSZ,
+		GIT_SHA256_BLKSZ,
+		git_hash_sha256_init,
+		git_hash_sha256_update,
+		git_hash_sha256_final,
+		&empty_tree_oid_sha256,
+		&empty_blob_oid_sha256,
+	}
 };
 
 const char *empty_tree_oid_hex(void)
diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c
new file mode 100644
index 0000000000..37850b4e52
--- /dev/null
+++ b/sha256/block/sha256.c
@@ -0,0 +1,196 @@
+#include "git-compat-util.h"
+#include "./sha256.h"
+
+#undef RND
+#undef BLKSIZE
+
+#define BLKSIZE blk_SHA256_BLKSIZE
+
+void blk_SHA256_Init(blk_SHA256_CTX *ctx)
+{
+	ctx->offset = 0;
+	ctx->size = 0;
+	ctx->state[0] = 0x6a09e667ul;
+	ctx->state[1] = 0xbb67ae85ul;
+	ctx->state[2] = 0x3c6ef372ul;
+	ctx->state[3] = 0xa54ff53aul;
+	ctx->state[4] = 0x510e527ful;
+	ctx->state[5] = 0x9b05688cul;
+	ctx->state[6] = 0x1f83d9abul;
+	ctx->state[7] = 0x5be0cd19ul;
+}
+
+static inline uint32_t ror(uint32_t x, unsigned n)
+{
+	return (x >> n) | (x << (32 - n));
+}
+
+static inline uint32_t ch(uint32_t x, uint32_t y, uint32_t z)
+{
+	return z ^ (x & (y ^ z));
+}
+
+static inline uint32_t maj(uint32_t x, uint32_t y, uint32_t z)
+{
+	return ((x | y) & z) | (x & y);
+}
+
+static inline uint32_t sigma0(uint32_t x)
+{
+	return ror(x, 2) ^ ror(x, 13) ^ ror(x, 22);
+}
+
+static inline uint32_t sigma1(uint32_t x)
+{
+	return ror(x, 6) ^ ror(x, 11) ^ ror(x, 25);
+}
+
+static inline uint32_t gamma0(uint32_t x)
+{
+	return ror(x, 7) ^ ror(x, 18) ^ (x >> 3);
+}
+
+static inline uint32_t gamma1(uint32_t x)
+{
+	return ror(x, 17) ^ ror(x, 19) ^ (x >> 10);
+}
+
+static void blk_SHA256_Transform(blk_SHA256_CTX *ctx, const unsigned char *buf)
+{
+
+	uint32_t S[8], W[64], t0, t1;
+	int i;
+
+	/* copy state into S */
+	for (i = 0; i < 8; i++)
+		S[i] = ctx->state[i];
+
+	/* copy the state into 512-bits into W[0..15] */
+	for (i = 0; i < 16; i++, buf += sizeof(uint32_t))
+		W[i] = get_be32(buf);
+
+	/* fill W[16..63] */
+	for (i = 16; i < 64; i++)
+		W[i] = gamma1(W[i - 2]) + W[i - 7] + gamma0(W[i - 15]) + W[i - 16];
+
+#define RND(a,b,c,d,e,f,g,h,i,ki)                    \
+	t0 = h + sigma1(e) + ch(e, f, g) + ki + W[i];   \
+	t1 = sigma0(a) + maj(a, b, c);                  \
+	d += t0;                                        \
+	h  = t0 + t1;
+
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],0,0x428a2f98);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],1,0x71374491);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],2,0xb5c0fbcf);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],3,0xe9b5dba5);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],4,0x3956c25b);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],5,0x59f111f1);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],6,0x923f82a4);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],7,0xab1c5ed5);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],8,0xd807aa98);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],9,0x12835b01);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],10,0x243185be);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],11,0x550c7dc3);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],12,0x72be5d74);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],13,0x80deb1fe);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],14,0x9bdc06a7);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],15,0xc19bf174);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],16,0xe49b69c1);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],17,0xefbe4786);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],18,0x0fc19dc6);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],19,0x240ca1cc);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],20,0x2de92c6f);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],21,0x4a7484aa);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],22,0x5cb0a9dc);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],23,0x76f988da);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],24,0x983e5152);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],25,0xa831c66d);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],26,0xb00327c8);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],27,0xbf597fc7);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],28,0xc6e00bf3);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],29,0xd5a79147);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],30,0x06ca6351);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],31,0x14292967);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],32,0x27b70a85);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],33,0x2e1b2138);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],34,0x4d2c6dfc);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],35,0x53380d13);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],36,0x650a7354);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],37,0x766a0abb);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],38,0x81c2c92e);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],39,0x92722c85);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],40,0xa2bfe8a1);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],41,0xa81a664b);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],42,0xc24b8b70);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],43,0xc76c51a3);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],44,0xd192e819);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],45,0xd6990624);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],46,0xf40e3585);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],47,0x106aa070);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],48,0x19a4c116);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],49,0x1e376c08);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],50,0x2748774c);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],51,0x34b0bcb5);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],52,0x391c0cb3);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],53,0x4ed8aa4a);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],54,0x5b9cca4f);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],55,0x682e6ff3);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],56,0x748f82ee);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],57,0x78a5636f);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],58,0x84c87814);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],59,0x8cc70208);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],60,0x90befffa);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],61,0xa4506ceb);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],62,0xbef9a3f7);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],63,0xc67178f2);
+
+	for (i = 0; i < 8; i++)
+		ctx->state[i] += S[i];
+}
+
+void blk_SHA256_Update(blk_SHA256_CTX *ctx, const void *data, size_t len)
+{
+	unsigned int len_buf = ctx->size & 63;
+
+	ctx->size += len;
+
+	/* Read the data into buf and process blocks as they get full */
+	if (len_buf) {
+		unsigned int left = 64 - len_buf;
+		if (len < left)
+			left = len;
+		memcpy(len_buf + ctx->buf, data, left);
+		len_buf = (len_buf + left) & 63;
+		len -= left;
+		data = ((const char *)data + left);
+		if (len_buf)
+			return;
+		blk_SHA256_Transform(ctx, ctx->buf);
+	}
+	while (len >= 64) {
+		blk_SHA256_Transform(ctx, data);
+		data = ((const char *)data + 64);
+		len -= 64;
+	}
+	if (len)
+		memcpy(ctx->buf, data, len);
+}
+
+void blk_SHA256_Final(unsigned char *digest, blk_SHA256_CTX *ctx)
+{
+	static const unsigned char pad[64] = { 0x80 };
+	unsigned int padlen[2];
+	int i;
+
+	/* Pad with a binary 1 (ie 0x80), then zeroes, then length */
+	padlen[0] = htonl((uint32_t)(ctx->size >> 29));
+	padlen[1] = htonl((uint32_t)(ctx->size << 3));
+
+	i = ctx->size & 63;
+	blk_SHA256_Update(ctx, pad, 1 + (63 & (55 - i)));
+	blk_SHA256_Update(ctx, padlen, 8);
+
+	/* copy output */
+	for (i = 0; i < 8; i++, digest += sizeof(uint32_t))
+		put_be32(digest, ctx->state[i]);
+}
diff --git a/sha256/block/sha256.h b/sha256/block/sha256.h
new file mode 100644
index 0000000000..38f02f7e6c
--- /dev/null
+++ b/sha256/block/sha256.h
@@ -0,0 +1,26 @@
+#ifndef SHA256_BLOCK_SHA256_H
+#define SHA256_BLOCK_SHA256_H
+
+#include "git-compat-util.h"
+
+#define blk_SHA256_BLKSIZE 64
+
+struct blk_SHA256_CTX {
+	uint32_t state[8];
+	uint64_t size;
+	uint32_t offset;
+	uint8_t buf[blk_SHA256_BLKSIZE];
+};
+
+typedef struct blk_SHA256_CTX blk_SHA256_CTX;
+
+void blk_SHA256_Init(blk_SHA256_CTX *ctx);
+void blk_SHA256_Update(blk_SHA256_CTX *ctx, const void *data, size_t len);
+void blk_SHA256_Final(unsigned char *digest, blk_SHA256_CTX *ctx);
+
+#define platform_SHA256_CTX blk_SHA256_CTX
+#define platform_SHA256_Init blk_SHA256_Init
+#define platform_SHA256_Update blk_SHA256_Update
+#define platform_SHA256_Final blk_SHA256_Final
+
+#endif
diff --git a/t/helper/test-sha256.c b/t/helper/test-sha256.c
new file mode 100644
index 0000000000..0ac6a99d5f
--- /dev/null
+++ b/t/helper/test-sha256.c
@@ -0,0 +1,7 @@
+#include "test-tool.h"
+#include "cache.h"
+
+int cmd__sha256(int ac, const char **av)
+{
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA256);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index e009c8186d..2a65193514 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -43,6 +43,7 @@ static struct test_cmd cmds[] = {
 	{ "scrap-cache-tree", cmd__scrap_cache_tree },
 	{ "sha1", cmd__sha1 },
 	{ "sha1-array", cmd__sha1_array },
+	{ "sha256", cmd__sha256 },
 	{ "sigchain", cmd__sigchain },
 	{ "strcmp-offset", cmd__strcmp_offset },
 	{ "string-list", cmd__string_list },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 19a7e8332a..2e66a8e47b 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -39,6 +39,7 @@ int cmd__run_command(int argc, const char **argv);
 int cmd__scrap_cache_tree(int argc, const char **argv);
 int cmd__sha1(int argc, const char **argv);
 int cmd__sha1_array(int argc, const char **argv);
+int cmd__sha256(int argc, const char **argv);
 int cmd__sigchain(int argc, const char **argv);
 int cmd__strcmp_offset(int argc, const char **argv);
 int cmd__string_list(int argc, const char **argv);
diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
index 8e763c2c3d..f8e639743f 100755
--- a/t/t0015-hash.sh
+++ b/t/t0015-hash.sh
@@ -26,4 +26,29 @@ test_expect_success 'test basic SHA-1 hash values' '
 	grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
 '
 
+test_expect_success 'test basic SHA-256 hash values' '
+	test-tool sha256 </dev/null >actual &&
+	grep e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 actual &&
+	printf "a" | test-tool sha256 >actual &&
+	grep ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb actual &&
+	printf "abc" | test-tool sha256 >actual &&
+	grep ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad actual &&
+	printf "message digest" | test-tool sha256 >actual &&
+	grep f7846f55cf23e14eebeab5b4e1550cad5b509e3348fbc4efa3a1413d393cb650 actual &&
+	printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha256 >actual &&
+	grep 71c480df93d6ae2f1efad1447c66c9525e316218cf51fc8d9ed832f2daf18b73 actual &&
+	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
+		test-tool sha256 >actual &&
+	grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 actual &&
+	perl -E "for (1..100000) { print q{abcdefghijklmnopqrstuvwxyz}; }" | \
+		test-tool sha256 >actual &&
+	grep e406ba321ca712ad35a698bf0af8d61fc4dc40eca6bdcea4697962724ccbde35 actual &&
+	printf "blob 0\0" | test-tool sha256 >actual &&
+	grep 473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813 actual &&
+	printf "blob 3\0abc" | test-tool sha256 >actual &&
+	grep c1cf6e465077930e88dc5136641d402f72a229ddd996f627d60e9639eaba35a6 actual &&
+	printf "tree 0\0" | test-tool sha256 >actual &&
+	grep 6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321 actual
+'
+
 test_done

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 11/12] sha256: add an SHA-256 implementation using libgcrypt
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
                     ` (9 preceding siblings ...)
  2018-11-04 23:44   ` [PATCH v5 10/12] Add a base implementation of SHA-256 support brian m. carlson
@ 2018-11-04 23:44   ` brian m. carlson
  2018-11-04 23:44   ` [PATCH v5 12/12] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

Generally, one gets better performance out of cryptographic routines
written in assembly than C, and this is also true for SHA-256.  In
addition, most Linux distributions cannot distribute Git linked against
OpenSSL for licensing reasons.

Most systems with GnuPG will also have libgcrypt, since it is a
dependency of GnuPG.  libgcrypt is also faster than the SHA1DC
implementation for messages of a few KiB and larger.

For comparison, on a Core i7-6600U, this implementation processes 16 KiB
chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337
MiB/s.

In addition, libgcrypt is licensed under the LGPL 2.1, which is
compatible with the GPL.  Add an implementation of SHA-256 that uses
libgcrypt.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile        | 13 +++++++++++--
 hash.h          |  4 ++++
 sha256/gcrypt.h | 30 ++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 sha256/gcrypt.h

diff --git a/Makefile b/Makefile
index e99b7712f6..5a07e03100 100644
--- a/Makefile
+++ b/Makefile
@@ -179,6 +179,10 @@ all::
 # in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO
 # wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined.
 #
+# Define BLK_SHA256 to use the built-in SHA-256 routines.
+#
+# Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1634,8 +1638,13 @@ endif
 endif
 endif
 
-LIB_OBJS += sha256/block/sha256.o
-BASIC_CFLAGS += -DSHA256_BLK
+ifdef GCRYPT_SHA256
+	BASIC_CFLAGS += -DSHA256_GCRYPT
+	EXTLIBS += -lgcrypt
+else
+	LIB_OBJS += sha256/block/sha256.o
+	BASIC_CFLAGS += -DSHA256_BLK
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index a9bc624020..2ef098052d 100644
--- a/hash.h
+++ b/hash.h
@@ -15,7 +15,11 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#if defined(SHA256_GCRYPT)
+#include "sha256/gcrypt.h"
+#else
 #include "sha256/block/sha256.h"
+#endif
 
 #ifndef platform_SHA_CTX
 /*
diff --git a/sha256/gcrypt.h b/sha256/gcrypt.h
new file mode 100644
index 0000000000..09bd8bb200
--- /dev/null
+++ b/sha256/gcrypt.h
@@ -0,0 +1,30 @@
+#ifndef SHA256_GCRYPT_H
+#define SHA256_GCRYPT_H
+
+#include <gcrypt.h>
+
+#define SHA256_DIGEST_SIZE 32
+
+typedef gcry_md_hd_t gcrypt_SHA256_CTX;
+
+inline void gcrypt_SHA256_Init(gcrypt_SHA256_CTX *ctx)
+{
+	gcry_md_open(ctx, GCRY_MD_SHA256, 0);
+}
+
+inline void gcrypt_SHA256_Update(gcrypt_SHA256_CTX *ctx, const void *data, size_t len)
+{
+	gcry_md_write(*ctx, data, len);
+}
+
+inline void gcrypt_SHA256_Final(unsigned char *digest, gcrypt_SHA256_CTX *ctx)
+{
+	memcpy(digest, gcry_md_read(*ctx, GCRY_MD_SHA256), SHA256_DIGEST_SIZE);
+}
+
+#define platform_SHA256_CTX gcrypt_SHA256_CTX
+#define platform_SHA256_Init gcrypt_SHA256_Init
+#define platform_SHA256_Update gcrypt_SHA256_Update
+#define platform_SHA256_Final gcrypt_SHA256_Final
+
+#endif

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 12/12] hash: add an SHA-256 implementation using OpenSSL
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
                     ` (10 preceding siblings ...)
  2018-11-04 23:44   ` [PATCH v5 11/12] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
@ 2018-11-04 23:44   ` brian m. carlson
  2018-11-05  2:45   ` [PATCH v5 00/12] Base SHA-256 implementation Junio C Hamano
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
  13 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-04 23:44 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

We already have OpenSSL routines available for SHA-1, so add routines
for SHA-256 as well.

On a Core i7-6600U, this SHA-256 implementation compares favorably to
the SHA1DC SHA-1 implementation:

SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile | 7 +++++++
 hash.h   | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/Makefile b/Makefile
index 5a07e03100..36fd3a149b 100644
--- a/Makefile
+++ b/Makefile
@@ -183,6 +183,8 @@ all::
 #
 # Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
 #
+# Define OPENSSL_SHA256 to use the SHA-256 routines in OpenSSL.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1638,6 +1640,10 @@ endif
 endif
 endif
 
+ifdef OPENSSL_SHA256
+	EXTLIBS += $(LIB_4_CRYPTO)
+	BASIC_CFLAGS += -DSHA256_OPENSSL
+else
 ifdef GCRYPT_SHA256
 	BASIC_CFLAGS += -DSHA256_GCRYPT
 	EXTLIBS += -lgcrypt
@@ -1645,6 +1651,7 @@ else
 	LIB_OBJS += sha256/block/sha256.o
 	BASIC_CFLAGS += -DSHA256_BLK
 endif
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index 2ef098052d..adde708cf2 100644
--- a/hash.h
+++ b/hash.h
@@ -17,6 +17,8 @@
 
 #if defined(SHA256_GCRYPT)
 #include "sha256/gcrypt.h"
+#elif defined(SHA256_OPENSSL)
+#include <openssl/sha.h>
 #else
 #include "sha256/block/sha256.h"
 #endif

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 00/12] Base SHA-256 implementation
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
                     ` (11 preceding siblings ...)
  2018-11-04 23:44   ` [PATCH v5 12/12] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
@ 2018-11-05  2:45   ` Junio C Hamano
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
  13 siblings, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2018-11-05  2:45 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> This series provides a functional SHA-256 implementation and wires it
> up, along with some housekeeping patches to make it suitable for
> testing.
>
> Changes from v4:
> * Downcase hex constants for consistency.
> * Remove needless parentheses in return statement.
> * Remove braces for single statement loops.
> * Switch to +=.
> * Add references to rationale for SHA-256.
> * Remove inclusion of "git-compat-util.h" in header.

You ended up not doing the last one, judging from the interdiff
below.  I think it is OK to leave the header in.

Thanks, will replace.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 01/12] sha1-file: rename algorithm to "sha1"
  2018-11-04 23:44   ` [PATCH v5 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
@ 2018-11-05  7:21     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 58+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-05  7:21 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Derrick Stolee, Duy Nguyen, SZEDER Gábor,
	Jakub Narebski, Christian Couder


On Sun, Nov 04 2018, brian m. carlson wrote:

> The transition plan anticipates us using a syntax such as "^{sha1}" for
> disambiguation.  Since this is a syntax some people will be typing a
> lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
> the dash doesn't create any ambiguity; however, it does make the syntax
> shorter and easier to type, especially for touch typists.  In addition,
> the transition plan already uses "sha1" in this context.

The comment for git_hash_algo's "name" member in hash.h says:

	/*
	 * The name of the algorithm, as appears in the config file and in
	 * messages.
	 */
	const char *name;

Whereas this commit message just refers to a doesn't-yet-exist ^{$algo}
syntax. The hash-function-transition.txt doc also uses forms like sha1
or sha256 in config, not sha-1 or sha-256.

I don't have a point I'm leading up to here, other than a question of
whether we should be doing something closer to this:

diff --git a/hash.h b/hash.h
index 7c8238bc2e..8ae51ac410 100644
--- a/hash.h
+++ b/hash.h
@@ -67,10 +67,17 @@ typedef void (*git_hash_final_fn)(unsigned char *hash, git_hash_ctx *ctx);

 struct git_hash_algo {
 	/*
-	 * The name of the algorithm, as appears in the config file and in
-	 * messages.
+	 * The short name of the algorithm (e.g. "sha1") for use in
+	 * config files (see hash-function-transition.txt) and the
+	 * ^{$name} peel syntax.
 	 */
-	const char *name;
+	const char *short_name;
+
+	/*
+	 * The long name of the algorithm (e.g. "SHA-1") for use in
+	 * messages to users.
+	 */
+	const char *long_name;

 	/* A four-byte version identifier, used in pack indices. */
 	uint32_t format_id;
diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..5ad0526155 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -86,6 +86,7 @@ static void git_hash_unknown_final(unsigned char *hash, git_hash_ctx *ctx)

 const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 	{
+		NULL,
 		NULL,
 		0x00000000,
 		0,
@@ -97,7 +98,8 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		NULL,
 	},
 	{
-		"sha-1",
+		"sha1",
+		"SHA-1",
 		/* "sha1", big-endian */
 		0x73686131,
 		GIT_SHA1_RAWSZ,

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 10/12] Add a base implementation of SHA-256 support
  2018-11-04 23:44   ` [PATCH v5 10/12] Add a base implementation of SHA-256 support brian m. carlson
@ 2018-11-05 11:39     ` Ævar Arnfjörð Bjarmason
  2018-11-07  1:30       ` brian m. carlson
  0 siblings, 1 reply; 58+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-05 11:39 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Derrick Stolee, Duy Nguyen, SZEDER Gábor,
	Jakub Narebski, Christian Couder


On Sun, Nov 04 2018, brian m. carlson wrote:

> SHA-1 is weak and we need to transition to a new hash function.  For
> some time, we have referred to this new function as NewHash.  Recently,
> we decided to pick SHA-256 as NewHash.  The reasons behind the choice of
> SHA-256 are outlined in the thread starting at [1] and in the commit
> history for the hash function transition document.

Nit: In some contradiction now to what's said in
hash-function-transition.txt, see 5988eb631a ("doc
hash-function-transition: clarify what SHAttered means", 2018-03-26).

> +	{
> +		"sha256",
> +		/* "s256", big-endian */

The existing entry/comment for sha1 is:

		"sha1",
		/* "sha1", big-endian */

So why the sha256/s256 difference in the code/comment? Wondering if I'm
missing something and we're using "s256" for something.

>  const char *empty_tree_oid_hex(void)
> diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c
> [...]

I had a question before about whether we see ourselves perma-forking
this implementation based off libtomcrypt, as I recall you said yes.

Still, I think it would be better to introduce this in at least two-four
commits where the upstream code is added as-is, then trimmed down to
size, then adapted to our coding style, and finally we add our own
utility functions.

It'll make it easier to forward-port any future upstream changes.

> +	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
> +		test-tool sha256 >actual &&
> +	grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 actual &&
> +	perl -E "for (1..100000) { print q{abcdefghijklmnopqrstuvwxyz}; }" | \
> +		test-tool sha256 >actual &&

I've been wanting to make use depend on perl >= 5.10 (previous noises
about that on-list), but for now we claim to support >=5.8, which
doesn't have the -E switch.

But most importantly you aren't even using -E features here, and this
isn't very idoimatic Perl. Instead do, respectively:

    perl -e 'print q{aaaaaaaaaa} x 100000'
    perl -e "print q{abcdefghijklmnopqrstuvwxyz} x 100000"

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 10/12] Add a base implementation of SHA-256 support
  2018-11-05 11:39     ` Ævar Arnfjörð Bjarmason
@ 2018-11-07  1:30       ` brian m. carlson
  2018-11-10 15:52         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 58+ messages in thread
From: brian m. carlson @ 2018-11-07  1:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Derrick Stolee, Duy Nguyen, SZEDER Gábor,
	Jakub Narebski, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 4078 bytes --]

On Mon, Nov 05, 2018 at 12:39:14PM +0100, Ævar Arnfjörð Bjarmason wrote:
> On Sun, Nov 04 2018, brian m. carlson wrote:
> > +	{
> > +		"sha256",
> > +		/* "s256", big-endian */
> 
> The existing entry/comment for sha1 is:
> 
> 		"sha1",
> 		/* "sha1", big-endian */
> 
> So why the sha256/s256 difference in the code/comment? Wondering if I'm
> missing something and we're using "s256" for something.

Ah, good question.  The comment refers to the format_id field which
follows this comment.  The value is the big-endian representation of
"s256" as a 32-bit value.  I picked that over "sha2" to avoid confusion
in case we add SHA-512 in the future, since that's also an SHA-2
algorithm.

Config files and command-line interfaces will use "sha1" or "sha256",
and binary formats will use those 32-bit values ("sha1" or "s256").

> >  const char *empty_tree_oid_hex(void)
> > diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c
> > [...]
> 
> I had a question before about whether we see ourselves perma-forking
> this implementation based off libtomcrypt, as I recall you said yes.

Yes.

> Still, I think it would be better to introduce this in at least two-four
> commits where the upstream code is added as-is, then trimmed down to
> size, then adapted to our coding style, and finally we add our own
> utility functions.

At this point, the only code that's actually used from libtomcrypt is
the block transform.  The upstream code is split over multiple files in
multiple directories and won't compile in our codebase without many
files and a lot of work, so I don't feel good about either including
code that doesn't compile or including large numbers of files that don't
meet our coding standards (and that may still not compile because of
platform issues).

> It'll make it easier to forward-port any future upstream changes.

I don't foresee many, if any, changes to this code.  It either
implements the specification or it doesn't, and it's objectively easy to
determine which.  There's not even an argument to port performance
improvements, since almost everyone will be using a crypto library to
provide this code because libraries perform so dramatically better.
I've tried to not make the code perform worse than it did originally,
but that's it.

Furthermore, the modified code carries a relatively small amount of
resemblance to the original, so if we did port changes forward, we'd
probably have conflicts.

It seems like you really want to include the upstream code as a separate
commit and I understand where you're coming from with wanting to have
this split out into logical commits, but due to the specific nature of
this code, I see a lot of downsides and not a lot of upsides.

> > +	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
> > +		test-tool sha256 >actual &&
> > +	grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 actual &&
> > +	perl -E "for (1..100000) { print q{abcdefghijklmnopqrstuvwxyz}; }" | \
> > +		test-tool sha256 >actual &&
> 
> I've been wanting to make use depend on perl >= 5.10 (previous noises
> about that on-list), but for now we claim to support >=5.8, which
> doesn't have the -E switch.

Good point.  I'll fix that.  After having written a lot of one-liners,
I always write -E, and this was originally a one-liner.

> But most importantly you aren't even using -E features here, and this
> isn't very idoimatic Perl. Instead do, respectively:
> 
>     perl -e 'print q{aaaaaaaaaa} x 100000'
>     perl -e "print q{abcdefghijklmnopqrstuvwxyz} x 100000"

I considered the more idiomatic version originally, but the latter could
allocate a decent amount of memory in one chunk, and I wanted to avoid
that.  I think what I'd like to do, actually, is turn on autoflush and
use a postfix for, which would be more idiomatic and could potentially
provide better testing of the chunking code.  I'll add a comment to that
effect.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 10/12] Add a base implementation of SHA-256 support
  2018-11-07  1:30       ` brian m. carlson
@ 2018-11-10 15:52         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 58+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-10 15:52 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Derrick Stolee, Duy Nguyen, SZEDER Gábor,
	Jakub Narebski, Christian Couder


On Wed, Nov 07 2018, brian m. carlson wrote:

> On Mon, Nov 05, 2018 at 12:39:14PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> On Sun, Nov 04 2018, brian m. carlson wrote:
>> > +	{
>> > +		"sha256",
>> > +		/* "s256", big-endian */
>>
>> The existing entry/comment for sha1 is:
>>
>> 		"sha1",
>> 		/* "sha1", big-endian */
>>
>> So why the sha256/s256 difference in the code/comment? Wondering if I'm
>> missing something and we're using "s256" for something.
>
> Ah, good question.  The comment refers to the format_id field which
> follows this comment.  The value is the big-endian representation of
> "s256" as a 32-bit value.  I picked that over "sha2" to avoid confusion
> in case we add SHA-512 in the future, since that's also an SHA-2
> algorithm.
>
> Config files and command-line interfaces will use "sha1" or "sha256",
> and binary formats will use those 32-bit values ("sha1" or "s256").

Okey.

>> >  const char *empty_tree_oid_hex(void)
>> > diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c
>> > [...]
>>
>> I had a question before about whether we see ourselves perma-forking
>> this implementation based off libtomcrypt, as I recall you said yes.
>
> Yes.
>
>> Still, I think it would be better to introduce this in at least two-four
>> commits where the upstream code is added as-is, then trimmed down to
>> size, then adapted to our coding style, and finally we add our own
>> utility functions.
>
> At this point, the only code that's actually used from libtomcrypt is
> the block transform.  The upstream code is split over multiple files in
> multiple directories and won't compile in our codebase without many
> files and a lot of work, so I don't feel good about either including
> code that doesn't compile or including large numbers of files that don't
> meet our coding standards (and that may still not compile because of
> platform issues).
>
>> It'll make it easier to forward-port any future upstream changes.
>
> I don't foresee many, if any, changes to this code.  It either
> implements the specification or it doesn't, and it's objectively easy to
> determine which.  There's not even an argument to port performance
> improvements, since almost everyone will be using a crypto library to
> provide this code because libraries perform so dramatically better.
> I've tried to not make the code perform worse than it did originally,
> but that's it.
>
> Furthermore, the modified code carries a relatively small amount of
> resemblance to the original, so if we did port changes forward, we'd
> probably have conflicts.
>
> It seems like you really want to include the upstream code as a separate
> commit and I understand where you're coming from with wanting to have
> this split out into logical commits, but due to the specific nature of
> this code, I see a lot of downsides and not a lot of upsides.

Yeah sorry to keep bringing this up. Your way makes sense, and I'd
forgotten the details since last time . I'll shut up about it:)

>> > +	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
>> > +		test-tool sha256 >actual &&
>> > +	grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 actual &&
>> > +	perl -E "for (1..100000) { print q{abcdefghijklmnopqrstuvwxyz}; }" | \
>> > +		test-tool sha256 >actual &&
>>
>> I've been wanting to make use depend on perl >= 5.10 (previous noises
>> about that on-list), but for now we claim to support >=5.8, which
>> doesn't have the -E switch.
>
> Good point.  I'll fix that.  After having written a lot of one-liners,
> I always write -E, and this was originally a one-liner.
>
>> But most importantly you aren't even using -E features here, and this
>> isn't very idoimatic Perl. Instead do, respectively:
>>
>>     perl -e 'print q{aaaaaaaaaa} x 100000'
>>     perl -e "print q{abcdefghijklmnopqrstuvwxyz} x 100000"
>
> I considered the more idiomatic version originally, but the latter could
> allocate a decent amount of memory in one chunk, and I wanted to avoid
> that.

~2.5MB for the latter, so trivial.

> I think what I'd like to do, actually, is turn on autoflush and
> use a postfix for, which would be more idiomatic and could potentially
> provide better testing of the chunking code.  I'll add a comment to that
> effect.

Okey. Maybe better to use syswrite() instead, or maybe print with
autoflush is more idiomatic, I don't know.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms
  2018-11-04 23:44   ` [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms brian m. carlson
@ 2018-11-13 18:42     ` Derrick Stolee
  2018-11-13 18:45       ` Duy Nguyen
  2018-11-14  0:11       ` Ramsay Jones
  0 siblings, 2 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-11-13 18:42 UTC (permalink / raw)
  To: brian m. carlson, git
  Cc: Ævar Arnfjörð Bjarmason, Duy Nguyen,
	SZEDER Gábor, Jakub Narebski, Christian Couder

On 11/4/2018 6:44 PM, brian m. carlson wrote:
> +int hash_algo_by_name(const char *name)
> +{
> +	int i;
> +	if (!name)
> +		return GIT_HASH_UNKNOWN;
> +	for (i = 1; i < GIT_HASH_NALGOS; i++)
> +		if (!strcmp(name, hash_algos[i].name))
> +			return i;
> +	return GIT_HASH_UNKNOWN;
> +}
> +

Today's test coverage report [1] shows this method is not covered in the 
test suite. Looking at 'pu', it doesn't have any callers.

Do you have a work in progress series that will use this? Could we add a 
test-tool to exercise this somehow?

Thanks,
-Stolee

[1] 
https://public-inbox.org/git/97be3e21-6a8c-9718-5161-37318f6d685f@gmail.com/

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms
  2018-11-13 18:42     ` Derrick Stolee
@ 2018-11-13 18:45       ` Duy Nguyen
  2018-11-14  1:01         ` brian m. carlson
  2018-11-14  0:11       ` Ramsay Jones
  1 sibling, 1 reply; 58+ messages in thread
From: Duy Nguyen @ 2018-11-13 18:45 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: brian m. carlson, Git Mailing List,
	Ævar Arnfjörð Bjarmason, SZEDER Gábor,
	Jakub Narebski, Christian Couder

On Tue, Nov 13, 2018 at 7:42 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/4/2018 6:44 PM, brian m. carlson wrote:
> > +int hash_algo_by_name(const char *name)
> > +{
> > +     int i;
> > +     if (!name)
> > +             return GIT_HASH_UNKNOWN;
> > +     for (i = 1; i < GIT_HASH_NALGOS; i++)
> > +             if (!strcmp(name, hash_algos[i].name))
> > +                     return i;
> > +     return GIT_HASH_UNKNOWN;
> > +}
> > +
>
> Today's test coverage report [1] shows this method is not covered in the
> test suite. Looking at 'pu', it doesn't have any callers.
>
> Do you have a work in progress series that will use this? Could we add a
> test-tool to exercise this somehow?

Going by the function name, it should be used when hash-object or
other commands learn about --object-format=<name>.
-- 
Duy

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms
  2018-11-13 18:42     ` Derrick Stolee
  2018-11-13 18:45       ` Duy Nguyen
@ 2018-11-14  0:11       ` Ramsay Jones
  2018-11-14  0:42         ` Ramsay Jones
  2018-11-14  2:11         ` brian m. carlson
  1 sibling, 2 replies; 58+ messages in thread
From: Ramsay Jones @ 2018-11-14  0:11 UTC (permalink / raw)
  To: Derrick Stolee, brian m. carlson, git
  Cc: Ævar Arnfjörð Bjarmason, Duy Nguyen,
	SZEDER Gábor, Jakub Narebski, Christian Couder



On 13/11/2018 18:42, Derrick Stolee wrote:
> On 11/4/2018 6:44 PM, brian m. carlson wrote:
>> +int hash_algo_by_name(const char *name)
>> +{
>> +    int i;
>> +    if (!name)
>> +        return GIT_HASH_UNKNOWN;
>> +    for (i = 1; i < GIT_HASH_NALGOS; i++)
>> +        if (!strcmp(name, hash_algos[i].name))
>> +            return i;
>> +    return GIT_HASH_UNKNOWN;
>> +}
>> +
> 
> Today's test coverage report [1] shows this method is not covered in the test suite. Looking at 'pu', it doesn't have any callers.
> 
> Do you have a work in progress series that will use this? Could we add a test-tool to exercise this somehow?

There are actually 4 unused external symbols resulting from Brian's
'bc/sha-256' branch. The new unused externals in 'pu' looks like:

    $ diff nsc psc
    37a38,39
    > hex.o	- hash_to_hex
    > hex.o	- hash_to_hex_algop_r
    48a51
    > parse-options.o	- optname
    71a75
    > sha1-file.o	- for_each_file_in_obj_subdir
    72a77,78
    > sha1-file.o	- hash_algo_by_id
    > sha1-file.o	- hash_algo_by_name
    $ 

The symbols from hex.o and sha1-file.o being the 4 symbols from
this branch.

I suspect that upcoming patches will make use of them. ;-)

ATB,
Ramsay Jones



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms
  2018-11-14  0:11       ` Ramsay Jones
@ 2018-11-14  0:42         ` Ramsay Jones
  2018-11-14  0:51           ` Jeff King
  2018-11-14  2:11         ` brian m. carlson
  1 sibling, 1 reply; 58+ messages in thread
From: Ramsay Jones @ 2018-11-14  0:42 UTC (permalink / raw)
  To: Derrick Stolee, brian m. carlson, git
  Cc: Ævar Arnfjörð Bjarmason, Duy Nguyen,
	SZEDER Gábor, Jakub Narebski, Christian Couder, Jeff King



On 14/11/2018 00:11, Ramsay Jones wrote:
> 
> 
> On 13/11/2018 18:42, Derrick Stolee wrote:
>> On 11/4/2018 6:44 PM, brian m. carlson wrote:
>>> +int hash_algo_by_name(const char *name)
>>> +{
>>> +    int i;
>>> +    if (!name)
>>> +        return GIT_HASH_UNKNOWN;
>>> +    for (i = 1; i < GIT_HASH_NALGOS; i++)
>>> +        if (!strcmp(name, hash_algos[i].name))
>>> +            return i;
>>> +    return GIT_HASH_UNKNOWN;
>>> +}
>>> +
>>
>> Today's test coverage report [1] shows this method is not covered in the test suite. Looking at 'pu', it doesn't have any callers.
>>
>> Do you have a work in progress series that will use this? Could we add a test-tool to exercise this somehow?
> 
> There are actually 4 unused external symbols resulting from Brian's
> 'bc/sha-256' branch. The new unused externals in 'pu' looks like:
> 
>     $ diff nsc psc
>     37a38,39
>     > hex.o	- hash_to_hex
>     > hex.o	- hash_to_hex_algop_r
>     48a51
>     > parse-options.o	- optname
>     71a75
>     > sha1-file.o	- for_each_file_in_obj_subdir
>     72a77,78
>     > sha1-file.o	- hash_algo_by_id
>     > sha1-file.o	- hash_algo_by_name
>     $ 
> 
> The symbols from hex.o and sha1-file.o being the 4 symbols from
> this branch.
> 
> I suspect that upcoming patches will make use of them. ;-)

BTW, if you were puzzling over the 3rd symbol from sha1-file.o
(which I wasn't counting in the 4 symbols above! ;-) ), then I
believe that is because Jeff's commit 3a2e08245c ("object-store: 
provide helpers for loose_objects_cache", 2018-11-12) effectively
moved the only call outside of sha1-file.c (in sha1-name.c) back
into sha1-file.c

So, for_each_file_in_obj_subdir() could now be marked 'static'.
(whether it should is a different issue).

ATB,
Ramsay Jones


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms
  2018-11-14  0:42         ` Ramsay Jones
@ 2018-11-14  0:51           ` Jeff King
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff King @ 2018-11-14  0:51 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: Derrick Stolee, brian m. carlson, git,
	Ævar Arnfjörð Bjarmason, Duy Nguyen,
	SZEDER Gábor, Jakub Narebski, Christian Couder

On Wed, Nov 14, 2018 at 12:42:45AM +0000, Ramsay Jones wrote:

> BTW, if you were puzzling over the 3rd symbol from sha1-file.o
> (which I wasn't counting in the 4 symbols above! ;-) ), then I
> believe that is because Jeff's commit 3a2e08245c ("object-store: 
> provide helpers for loose_objects_cache", 2018-11-12) effectively
> moved the only call outside of sha1-file.c (in sha1-name.c) back
> into sha1-file.c
> 
> So, for_each_file_in_obj_subdir() could now be marked 'static'.
> (whether it should is a different issue).

I think it would be reasonable to do so. Most code shouldn't have to
care about the on-disk hashing structure, so ideally it wouldn't be part
of the public interface.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms
  2018-11-13 18:45       ` Duy Nguyen
@ 2018-11-14  1:01         ` brian m. carlson
  0 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  1:01 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Derrick Stolee, Git Mailing List,
	Ævar Arnfjörð Bjarmason, SZEDER Gábor,
	Jakub Narebski, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 1093 bytes --]

On Tue, Nov 13, 2018 at 07:45:41PM +0100, Duy Nguyen wrote:
> On Tue, Nov 13, 2018 at 7:42 PM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 11/4/2018 6:44 PM, brian m. carlson wrote:
> > > +int hash_algo_by_name(const char *name)
> > > +{
> > > +     int i;
> > > +     if (!name)
> > > +             return GIT_HASH_UNKNOWN;
> > > +     for (i = 1; i < GIT_HASH_NALGOS; i++)
> > > +             if (!strcmp(name, hash_algos[i].name))
> > > +                     return i;
> > > +     return GIT_HASH_UNKNOWN;
> > > +}
> > > +
> >
> > Today's test coverage report [1] shows this method is not covered in the
> > test suite. Looking at 'pu', it doesn't have any callers.
> >
> > Do you have a work in progress series that will use this? Could we add a
> > test-tool to exercise this somehow?
> 
> Going by the function name, it should be used when hash-object or
> other commands learn about --object-format=<name>.

Correct.  I have extensions.objectFormat code that will use this.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms
  2018-11-14  0:11       ` Ramsay Jones
  2018-11-14  0:42         ` Ramsay Jones
@ 2018-11-14  2:11         ` brian m. carlson
  2018-11-14  3:53           ` Ramsay Jones
  1 sibling, 1 reply; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  2:11 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: Derrick Stolee, git, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 1660 bytes --]

On Wed, Nov 14, 2018 at 12:11:07AM +0000, Ramsay Jones wrote:
> 
> 
> On 13/11/2018 18:42, Derrick Stolee wrote:
> > On 11/4/2018 6:44 PM, brian m. carlson wrote:
> >> +int hash_algo_by_name(const char *name)
> >> +{
> >> +    int i;
> >> +    if (!name)
> >> +        return GIT_HASH_UNKNOWN;
> >> +    for (i = 1; i < GIT_HASH_NALGOS; i++)
> >> +        if (!strcmp(name, hash_algos[i].name))
> >> +            return i;
> >> +    return GIT_HASH_UNKNOWN;
> >> +}
> >> +
> > 
> > Today's test coverage report [1] shows this method is not covered in the test suite. Looking at 'pu', it doesn't have any callers.
> > 
> > Do you have a work in progress series that will use this? Could we add a test-tool to exercise this somehow?
> 
> There are actually 4 unused external symbols resulting from Brian's
> 'bc/sha-256' branch. The new unused externals in 'pu' looks like:
> 
>     $ diff nsc psc
>     37a38,39
>     > hex.o	- hash_to_hex

I have code that uses this in my object-id-part15 series.  I also have
another series coming after this one that makes heavy use of it.

>     > hex.o	- hash_to_hex_algop_r

I believe this is because it's inline, since it is indeed used just a
few lines below its definition.  I'll drop the inline, since it's meant
to be externally visible.

>     > sha1-file.o	- hash_algo_by_id

This will be used when I write pack index v3, which will be in my
object-id-part15 series.

>     > sha1-file.o	- hash_algo_by_name

This is used in my object-id-part15 series.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms
  2018-11-14  2:11         ` brian m. carlson
@ 2018-11-14  3:53           ` Ramsay Jones
  0 siblings, 0 replies; 58+ messages in thread
From: Ramsay Jones @ 2018-11-14  3:53 UTC (permalink / raw)
  To: brian m. carlson, Derrick Stolee, git,
	Ævar Arnfjörð Bjarmason, Duy Nguyen,
	SZEDER Gábor, Jakub Narebski, Christian Couder



On 14/11/2018 02:11, brian m. carlson wrote:
> On Wed, Nov 14, 2018 at 12:11:07AM +0000, Ramsay Jones wrote:
>>
>>
>> On 13/11/2018 18:42, Derrick Stolee wrote:
>>> On 11/4/2018 6:44 PM, brian m. carlson wrote:
>>>> +int hash_algo_by_name(const char *name)
>>>> +{
>>>> +    int i;
>>>> +    if (!name)
>>>> +        return GIT_HASH_UNKNOWN;
>>>> +    for (i = 1; i < GIT_HASH_NALGOS; i++)
>>>> +        if (!strcmp(name, hash_algos[i].name))
>>>> +            return i;
>>>> +    return GIT_HASH_UNKNOWN;
>>>> +}
>>>> +
>>>
>>> Today's test coverage report [1] shows this method is not covered in the test suite. Looking at 'pu', it doesn't have any callers.
>>>
>>> Do you have a work in progress series that will use this? Could we add a test-tool to exercise this somehow?
>>
>> There are actually 4 unused external symbols resulting from Brian's
>> 'bc/sha-256' branch. The new unused externals in 'pu' looks like:
>>
>>     $ diff nsc psc
>>     37a38,39
>>     > hex.o	- hash_to_hex
> 
> I have code that uses this in my object-id-part15 series.  I also have
> another series coming after this one that makes heavy use of it.
> 
>>     > hex.o	- hash_to_hex_algop_r
> 
> I believe this is because it's inline, since it is indeed used just a
> few lines below its definition.  I'll drop the inline, since it's meant
> to be externally visible.

No, this has nothing to do with the 'inline', it is simply not
called outside of hex.c (at present). If you look at the assembler
(objdump -d hex.o), you will find practically all of the function
calls in that file are inlined (even those not marked with 'inline').

[I think the external declaration in cache.h forces the compiler to
add the external definition, despite the 'inline'. If you remove the
'inline' and re-compile and disassemble again, the result is identical.]

Thanks for confirming upcoming patches will add uses for all of
these functions - I suspected that would be the case.

Thanks!

ATB,
Ramsay Jones

> 
>>     > sha1-file.o	- hash_algo_by_id
> 
> This will be used when I write pack index v3, which will be in my
> object-id-part15 series.
> 
>>     > sha1-file.o	- hash_algo_by_name
> 
> This is used in my object-id-part15 series.
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v6 00/12] Base SHA-256 implementation
  2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
                     ` (12 preceding siblings ...)
  2018-11-05  2:45   ` [PATCH v5 00/12] Base SHA-256 implementation Junio C Hamano
@ 2018-11-14  4:09   ` brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
                       ` (11 more replies)
  13 siblings, 12 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

This series provides a functional SHA-256 implementation and wires it
up, along with some housekeeping patches to make it suitable for
testing.

Changes from v5:
* Remove inclusion of "git-compat-util.h" in header.
* Remove "inline" from definition of hash_to_hex_algop_r.
* Switch perl invocations in t0015 to use -e instead of -E.
* Switch perl invocations in t0015 to use autoflush and postfix for.

Changes from v4:
* Downcase hex constants for consistency.
* Remove needless parentheses in return statement.
* Remove braces for single statement loops.
* Switch to +=.
* Add references to rationale for SHA-256.

Changes from v3:
* Switch to using inline functions instead of macros in many cases.
* Undefine remaining macros at the top.

Changes from v2:
* Improve commit messages to include timing and performance information.
* Improve commit messages to be less ambiguous and more friendly to a
  wider variety of English speakers.
* Prefer functions taking struct git_hash_algo in hex.c.
* Port pieces of the block-sha1 implementation over to the block-sha256
  implementation for better compatibility.
* Drop patch 13 in favor of further discussion about the best way
  forward for versioning commit graph.
* Rename the test so as to have a different number from other tests.
* Rebase on master.

Changes from v1:
* Add a hash_to_hex function mirroring sha1_to_hex, but for
  the_hash_algo.
* Strip commit message explanation about why we chose SHA-256.
* Rebase on master
* Strip leading whitespace from commit message.
* Improve commit-graph patch to cover new code added since v1.
* Be more honest about the scope of work involved in porting the SHA-256
  implementation out of libtomcrypt.
* Revert change to limit hashcmp to 20 bytes.

brian m. carlson (12):
  sha1-file: rename algorithm to "sha1"
  sha1-file: provide functions to look up hash algorithms
  hex: introduce functions to print arbitrary hashes
  cache: make hashcmp and hasheq work with larger hashes
  t: add basic tests for our SHA-1 implementation
  t: make the sha1 test-tool helper generic
  sha1-file: add a constant for hash block size
  t/helper: add a test helper to compute hash speed
  commit-graph: convert to using the_hash_algo
  Add a base implementation of SHA-256 support
  sha256: add an SHA-256 implementation using libgcrypt
  hash: add an SHA-256 implementation using OpenSSL

 Makefile                              |  22 +++
 cache.h                               |  51 ++++---
 commit-graph.c                        |  33 ++---
 hash.h                                |  41 +++++-
 hex.c                                 |  32 +++--
 sha1-file.c                           |  70 ++++++++-
 sha256/block/sha256.c                 | 196 ++++++++++++++++++++++++++
 sha256/block/sha256.h                 |  24 ++++
 sha256/gcrypt.h                       |  30 ++++
 t/helper/test-hash-speed.c            |  61 ++++++++
 t/helper/{test-sha1.c => test-hash.c} |  19 +--
 t/helper/test-sha1.c                  |  52 +------
 t/helper/test-sha256.c                |   7 +
 t/helper/test-tool.c                  |   2 +
 t/helper/test-tool.h                  |   4 +
 t/t0015-hash.sh                       |  55 ++++++++
 16 files changed, 595 insertions(+), 104 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 sha256/gcrypt.h
 create mode 100644 t/helper/test-hash-speed.c
 copy t/helper/{test-sha1.c => test-hash.c} (65%)
 create mode 100644 t/helper/test-sha256.c
 create mode 100755 t/t0015-hash.sh

Range-diff against v5:
 1:  a004a4c982 <  -:  ---------- :hash-impl
 2:  cf9f7f5620 =  1:  fa21b5948c sha1-file: rename algorithm to "sha1"
 3:  0144deaebe =  2:  4e146c92af sha1-file: provide functions to look up hash algorithms
 4:  b74858fb03 !  3:  10e7893242 hex: introduce functions to print arbitrary hashes
    @@ -64,8 +64,8 @@
      }
      
     -char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
    -+inline char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
    -+					const struct git_hash_algo *algop)
    ++char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
    ++			  const struct git_hash_algo *algop)
      {
      	static const char hex[] = "0123456789abcdef";
      	char *buf = buffer;
 5:  e9703017a4 =  4:  6396a0ff57 cache: make hashcmp and hasheq work with larger hashes
 6:  ab85a834fd !  5:  a51a8b5638 t: add basic tests for our SHA-1 implementation
    @@ -33,7 +33,7 @@
     +	grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
     +	printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
     +	grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
    -+	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
    ++	perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;" | \
     +		test-tool sha1 >actual &&
     +	grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&
     +	printf "blob 0\0" | test-tool sha1 >actual &&
 7:  962f6d8903 =  6:  d50b040b33 t: make the sha1 test-tool helper generic
 8:  53addf4d58 =  7:  cbe15d32ed sha1-file: add a constant for hash block size
 9:  9ace10faa2 =  8:  dbb916ff26 t/helper: add a test helper to compute hash speed
10:  9adc56d01e =  9:  77a741aa3f commit-graph: convert to using the_hash_algo
11:  90544c504c ! 10:  ff7ec3b1ef Add a base implementation of SHA-256 support
    @@ -413,8 +413,6 @@
     +#ifndef SHA256_BLOCK_SHA256_H
     +#define SHA256_BLOCK_SHA256_H
     +
    -+#include "git-compat-util.h"
    -+
     +#define blk_SHA256_BLKSIZE 64
     +
     +struct blk_SHA256_CTX {
    @@ -492,10 +490,11 @@
     +	grep f7846f55cf23e14eebeab5b4e1550cad5b509e3348fbc4efa3a1413d393cb650 actual &&
     +	printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha256 >actual &&
     +	grep 71c480df93d6ae2f1efad1447c66c9525e316218cf51fc8d9ed832f2daf18b73 actual &&
    -+	perl -E "for (1..100000) { print q{aaaaaaaaaa}; }" | \
    ++	# Try to exercise the chunking code by turning autoflush on.
    ++	perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;" | \
     +		test-tool sha256 >actual &&
     +	grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 actual &&
    -+	perl -E "for (1..100000) { print q{abcdefghijklmnopqrstuvwxyz}; }" | \
    ++	perl -e "$| = 1; print q{abcdefghijklmnopqrstuvwxyz} for 1..100000;" | \
     +		test-tool sha256 >actual &&
     +	grep e406ba321ca712ad35a698bf0af8d61fc4dc40eca6bdcea4697962724ccbde35 actual &&
     +	printf "blob 0\0" | test-tool sha256 >actual &&
12:  467c86e878 = 11:  46f443809b sha256: add an SHA-256 implementation using libgcrypt
13:  73e4bc17d0 = 12:  6d358417d9 hash: add an SHA-256 implementation using OpenSSL

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v6 01/12] sha1-file: rename algorithm to "sha1"
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
@ 2018-11-14  4:09     ` brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 02/12] sha1-file: provide functions to look up hash algorithms brian m. carlson
                       ` (10 subsequent siblings)
  11 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

The transition plan anticipates us using a syntax such as "^{sha1}" for
disambiguation.  Since this is a syntax some people will be typing a
lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
the dash doesn't create any ambiguity; however, it does make the syntax
shorter and easier to type, especially for touch typists.  In addition,
the transition plan already uses "sha1" in this context.

Rename the name of SHA-1 implementation to "sha1".

Note that this change creates no backwards compatibility concerns, since
we haven't yet used this field in any configuration settings.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 sha1-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index 2daf7d9935..3b9c042691 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -97,7 +97,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		NULL,
 	},
 	{
-		"sha-1",
+		"sha1",
 		/* "sha1", big-endian */
 		0x73686131,
 		GIT_SHA1_RAWSZ,

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v6 02/12] sha1-file: provide functions to look up hash algorithms
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
@ 2018-11-14  4:09     ` brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 03/12] hex: introduce functions to print arbitrary hashes brian m. carlson
                       ` (9 subsequent siblings)
  11 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

There are several ways we might refer to a hash algorithm: by name, such
as in the config file; by format ID, such as in a pack; or internally,
by a pointer to the hash_algos array.  Provide functions to look up hash
algorithms based on these various forms and return the internal constant
used for them.  If conversion to another form is necessary, this
internal constant can be used to look up the proper data in the
hash_algos array.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h      | 13 +++++++++++++
 sha1-file.c | 21 +++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/hash.h b/hash.h
index 7c8238bc2e..80881eea47 100644
--- a/hash.h
+++ b/hash.h
@@ -98,4 +98,17 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
+/*
+ * Return a GIT_HASH_* constant based on the name.  Returns GIT_HASH_UNKNOWN if
+ * the name doesn't match a known algorithm.
+ */
+int hash_algo_by_name(const char *name);
+/* Identical, except based on the format ID. */
+int hash_algo_by_id(uint32_t format_id);
+/* Identical, except for a pointer to struct git_hash_algo. */
+static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+{
+	return p - hash_algos;
+}
+
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index 3b9c042691..93ed8c8686 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -122,6 +122,27 @@ const char *empty_blob_oid_hex(void)
 	return oid_to_hex_r(buf, the_hash_algo->empty_blob);
 }
 
+int hash_algo_by_name(const char *name)
+{
+	int i;
+	if (!name)
+		return GIT_HASH_UNKNOWN;
+	for (i = 1; i < GIT_HASH_NALGOS; i++)
+		if (!strcmp(name, hash_algos[i].name))
+			return i;
+	return GIT_HASH_UNKNOWN;
+}
+
+int hash_algo_by_id(uint32_t format_id)
+{
+	int i;
+	for (i = 1; i < GIT_HASH_NALGOS; i++)
+		if (format_id == hash_algos[i].format_id)
+			return i;
+	return GIT_HASH_UNKNOWN;
+}
+
+
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v6 03/12] hex: introduce functions to print arbitrary hashes
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 02/12] sha1-file: provide functions to look up hash algorithms brian m. carlson
@ 2018-11-14  4:09     ` brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 04/12] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
                       ` (8 subsequent siblings)
  11 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer.  Add variants of these functions that can handle
an arbitrary hash algorithm, specified by constant.  Update the
documentation as well.

While we're at it, remove the "extern" declaration from this family of
functions, since it's not needed and our style now recommends against
it.

We use the variant taking the algorithm structure pointer as the
internal variant, since taking an algorithm pointer is the easiest way
to handle all of the variants in use.

Note that we maintain these functions because there are hashes which
must change based on the hash algorithm in use but are not object IDs
(such as pack checksums).

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 15 +++++++++------
 hex.c   | 32 ++++++++++++++++++++++++--------
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index ca36b44ee0..cb7268deea 100644
--- a/cache.h
+++ b/cache.h
@@ -1365,9 +1365,9 @@ extern int get_oid_hex(const char *hex, struct object_id *sha1);
 extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
 
 /*
- * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
+ * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
  * and writes the NUL-terminated output to the buffer `out`, which must be at
- * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
+ * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
  * convenience.
  *
  * The non-`_r` variant returns a static buffer, but uses a ring of 4
@@ -1375,10 +1375,13 @@ extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
  *
  *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
  */
-extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
-extern char *oid_to_hex_r(char *out, const struct object_id *oid);
-extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
-extern char *oid_to_hex(const struct object_id *oid);	/* same static buffer as sha1_to_hex */
+char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *);
+char *sha1_to_hex_r(char *out, const unsigned char *sha1);
+char *oid_to_hex_r(char *out, const struct object_id *oid);
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *);	/* static buffer result! */
+char *sha1_to_hex(const unsigned char *sha1);						/* same static buffer */
+char *hash_to_hex(const unsigned char *hash);						/* same static buffer */
+char *oid_to_hex(const struct object_id *oid);						/* same static buffer */
 
 /*
  * Parse a 40-character hexadecimal object ID starting from hex, updating the
diff --git a/hex.c b/hex.c
index 10af1a29e8..7850a8879d 100644
--- a/hex.c
+++ b/hex.c
@@ -73,14 +73,15 @@ int parse_oid_hex(const char *hex, struct object_id *oid, const char **end)
 	return ret;
 }
 
-char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
+			  const struct git_hash_algo *algop)
 {
 	static const char hex[] = "0123456789abcdef";
 	char *buf = buffer;
 	int i;
 
-	for (i = 0; i < the_hash_algo->rawsz; i++) {
-		unsigned int val = *sha1++;
+	for (i = 0; i < algop->rawsz; i++) {
+		unsigned int val = *hash++;
 		*buf++ = hex[val >> 4];
 		*buf++ = hex[val & 0xf];
 	}
@@ -89,20 +90,35 @@ char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 	return buffer;
 }
 
-char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 {
-	return sha1_to_hex_r(buffer, oid->hash);
+	return hash_to_hex_algop_r(buffer, sha1, &hash_algos[GIT_HASH_SHA1]);
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
+char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+{
+	return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+}
+
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
 {
 	static int bufno;
 	static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
 	bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-	return sha1_to_hex_r(hexbuffer[bufno], sha1);
+	return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
+}
+
+char *sha1_to_hex(const unsigned char *sha1)
+{
+	return hash_to_hex_algop(sha1, &hash_algos[GIT_HASH_SHA1]);
+}
+
+char *hash_to_hex(const unsigned char *hash)
+{
+	return hash_to_hex_algop(hash, the_hash_algo);
 }
 
 char *oid_to_hex(const struct object_id *oid)
 {
-	return sha1_to_hex(oid->hash);
+	return hash_to_hex_algop(oid->hash, the_hash_algo);
 }

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v6 04/12] cache: make hashcmp and hasheq work with larger hashes
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
                       ` (2 preceding siblings ...)
  2018-11-14  4:09     ` [PATCH v6 03/12] hex: introduce functions to print arbitrary hashes brian m. carlson
@ 2018-11-14  4:09     ` brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 05/12] t: add basic tests for our SHA-1 implementation brian m. carlson
                       ` (7 subsequent siblings)
  11 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

In 183a638b7d ("hashcmp: assert constant hash size", 2018-08-23), we
modified hashcmp to assert that the hash size was always 20 to help it
optimize and inline calls to memcmp.  In a future series, we replaced
many calls to hashcmp and oidcmp with calls to hasheq and oideq to
improve inlining further.

However, we want to support hash algorithms other than SHA-1, namely
SHA-256.  When doing so, we must handle the case where these values are
32 bytes long as well as 20.  Adjust hashcmp to handle two cases:
20-byte matches, and maximum-size matches.  Therefore, when we include
SHA-256, we'll automatically handle it properly, while at the same time
teaching the compiler that there are only two possible options to
consider.  This will allow the compiler to write the most efficient
possible code.

Copy similar code into hasheq and perform an identical transformation.
At least with GCC 8.2.0, making hasheq defer to hashcmp when there are
two branches prevents the compiler from inlining the comparison, while
the code in this patch is inlined properly.  Add a comment to avoid an
accidental performance regression from well-intentioned refactoring.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index cb7268deea..8607878dbc 100644
--- a/cache.h
+++ b/cache.h
@@ -1028,16 +1028,12 @@ extern const struct object_id null_oid;
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
 	/*
-	 * This is a temporary optimization hack. By asserting the size here,
-	 * we let the compiler know that it's always going to be 20, which lets
-	 * it turn this fixed-size memcmp into a few inline instructions.
-	 *
-	 * This will need to be extended or ripped out when we learn about
-	 * hashes of different sizes.
+	 * Teach the compiler that there are only two possibilities of hash size
+	 * here, so that it can optimize for this case as much as possible.
 	 */
-	if (the_hash_algo->rawsz != 20)
-		BUG("hash size not yet supported by hashcmp");
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+		return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+	return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
@@ -1047,7 +1043,13 @@ static inline int oidcmp(const struct object_id *oid1, const struct object_id *o
 
 static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return !hashcmp(sha1, sha2);
+	/*
+	 * We write this here instead of deferring to hashcmp so that the
+	 * compiler can properly inline it and avoid calling memcmp.
+	 */
+	if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+		return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+	return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v6 05/12] t: add basic tests for our SHA-1 implementation
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
                       ` (3 preceding siblings ...)
  2018-11-14  4:09     ` [PATCH v6 04/12] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
@ 2018-11-14  4:09     ` brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 06/12] t: make the sha1 test-tool helper generic brian m. carlson
                       ` (6 subsequent siblings)
  11 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

We have in the past had some unfortunate endianness issues with some
SHA-1 implementations we ship, especially on big-endian machines.  Add
an explicit test using the test helper to catch these issues and point
them out prominently.  This test can also be used as a staging ground
for people testing additional algorithms to verify that their
implementations are working as expected.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t0015-hash.sh | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100755 t/t0015-hash.sh

diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
new file mode 100755
index 0000000000..884fe5acd1
--- /dev/null
+++ b/t/t0015-hash.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='test basic hash implementation'
+. ./test-lib.sh
+
+
+test_expect_success 'test basic SHA-1 hash values' '
+	test-tool sha1 </dev/null >actual &&
+	grep da39a3ee5e6b4b0d3255bfef95601890afd80709 actual &&
+	printf "a" | test-tool sha1 >actual &&
+	grep 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 actual &&
+	printf "abc" | test-tool sha1 >actual &&
+	grep a9993e364706816aba3e25717850c26c9cd0d89d actual &&
+	printf "message digest" | test-tool sha1 >actual &&
+	grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
+	printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
+	grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
+	perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;" | \
+		test-tool sha1 >actual &&
+	grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&
+	printf "blob 0\0" | test-tool sha1 >actual &&
+	grep e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 actual &&
+	printf "blob 3\0abc" | test-tool sha1 >actual &&
+	grep f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f actual &&
+	printf "tree 0\0" | test-tool sha1 >actual &&
+	grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
+'
+
+test_done

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v6 06/12] t: make the sha1 test-tool helper generic
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
                       ` (4 preceding siblings ...)
  2018-11-14  4:09     ` [PATCH v6 05/12] t: add basic tests for our SHA-1 implementation brian m. carlson
@ 2018-11-14  4:09     ` brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 07/12] sha1-file: add a constant for hash block size brian m. carlson
                       ` (5 subsequent siblings)
  11 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

Since we're going to have multiple hash algorithms to test, it makes
sense to share as much of the test code as possible.  Convert the sha1
helper for the test-tool to be generic and move it out into its own
module.  This will allow us to share most of this code with our NewHash
implementation.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile                              |  1 +
 t/helper/{test-sha1.c => test-hash.c} | 19 +++++-----
 t/helper/test-sha1.c                  | 52 +--------------------------
 t/helper/test-tool.h                  |  2 ++
 4 files changed, 14 insertions(+), 60 deletions(-)
 copy t/helper/{test-sha1.c => test-hash.c} (65%)

diff --git a/Makefile b/Makefile
index 016fdcdb81..daf0b198ec 100644
--- a/Makefile
+++ b/Makefile
@@ -724,6 +724,7 @@ TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
+TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
diff --git a/t/helper/test-sha1.c b/t/helper/test-hash.c
similarity index 65%
copy from t/helper/test-sha1.c
copy to t/helper/test-hash.c
index 1ba0675c75..0a31de66f3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-hash.c
@@ -1,13 +1,14 @@
 #include "test-tool.h"
 #include "cache.h"
 
-int cmd__sha1(int ac, const char **av)
+int cmd_hash_impl(int ac, const char **av, int algo)
 {
-	git_SHA_CTX ctx;
-	unsigned char sha1[20];
+	git_hash_ctx ctx;
+	unsigned char hash[GIT_MAX_HEXSZ];
 	unsigned bufsz = 8192;
 	int binary = 0;
 	char *buffer;
+	const struct git_hash_algo *algop = &hash_algos[algo];
 
 	if (ac == 2) {
 		if (!strcmp(av[1], "-b"))
@@ -26,7 +27,7 @@ int cmd__sha1(int ac, const char **av)
 			die("OOPS");
 	}
 
-	git_SHA1_Init(&ctx);
+	algop->init_fn(&ctx);
 
 	while (1) {
 		ssize_t sz, this_sz;
@@ -38,20 +39,20 @@ int cmd__sha1(int ac, const char **av)
 			if (sz == 0)
 				break;
 			if (sz < 0)
-				die_errno("test-sha1");
+				die_errno("test-hash");
 			this_sz += sz;
 			cp += sz;
 			room -= sz;
 		}
 		if (this_sz == 0)
 			break;
-		git_SHA1_Update(&ctx, buffer, this_sz);
+		algop->update_fn(&ctx, buffer, this_sz);
 	}
-	git_SHA1_Final(sha1, &ctx);
+	algop->final_fn(hash, &ctx);
 
 	if (binary)
-		fwrite(sha1, 1, 20, stdout);
+		fwrite(hash, 1, algop->rawsz, stdout);
 	else
-		puts(sha1_to_hex(sha1));
+		puts(hash_to_hex_algop(hash, algop));
 	exit(0);
 }
diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index 1ba0675c75..d860c387c3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -3,55 +3,5 @@
 
 int cmd__sha1(int ac, const char **av)
 {
-	git_SHA_CTX ctx;
-	unsigned char sha1[20];
-	unsigned bufsz = 8192;
-	int binary = 0;
-	char *buffer;
-
-	if (ac == 2) {
-		if (!strcmp(av[1], "-b"))
-			binary = 1;
-		else
-			bufsz = strtoul(av[1], NULL, 10) * 1024 * 1024;
-	}
-
-	if (!bufsz)
-		bufsz = 8192;
-
-	while ((buffer = malloc(bufsz)) == NULL) {
-		fprintf(stderr, "bufsz %u is too big, halving...\n", bufsz);
-		bufsz /= 2;
-		if (bufsz < 1024)
-			die("OOPS");
-	}
-
-	git_SHA1_Init(&ctx);
-
-	while (1) {
-		ssize_t sz, this_sz;
-		char *cp = buffer;
-		unsigned room = bufsz;
-		this_sz = 0;
-		while (room) {
-			sz = xread(0, cp, room);
-			if (sz == 0)
-				break;
-			if (sz < 0)
-				die_errno("test-sha1");
-			this_sz += sz;
-			cp += sz;
-			room -= sz;
-		}
-		if (this_sz == 0)
-			break;
-		git_SHA1_Update(&ctx, buffer, this_sz);
-	}
-	git_SHA1_Final(sha1, &ctx);
-
-	if (binary)
-		fwrite(sha1, 1, 20, stdout);
-	else
-		puts(sha1_to_hex(sha1));
-	exit(0);
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
 }
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 042f12464b..b5b7712a1d 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -51,4 +51,6 @@ int cmd__windows_named_pipe(int argc, const char **argv);
 #endif
 int cmd__write_cache(int argc, const char **argv);
 
+int cmd_hash_impl(int ac, const char **av, int algo);
+
 #endif

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v6 07/12] sha1-file: add a constant for hash block size
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
                       ` (5 preceding siblings ...)
  2018-11-14  4:09     ` [PATCH v6 06/12] t: make the sha1 test-tool helper generic brian m. carlson
@ 2018-11-14  4:09     ` brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 08/12] t/helper: add a test helper to compute hash speed brian m. carlson
                       ` (4 subsequent siblings)
  11 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

There is one place we need the hash algorithm block size: the HMAC code
for push certs.  Expose this constant in struct git_hash_algo and expose
values for SHA-1 and for the largest value of any hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h     | 4 ++++
 hash.h      | 3 +++
 sha1-file.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/cache.h b/cache.h
index 8607878dbc..62b2f3a5e8 100644
--- a/cache.h
+++ b/cache.h
@@ -45,10 +45,14 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 /* The length in bytes and in hex digits of an object name (SHA-1 value). */
 #define GIT_SHA1_RAWSZ 20
 #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+/* The block size of SHA-1. */
+#define GIT_SHA1_BLKSZ 64
 
 /* The length in byte and in hex digits of the largest possible hash value. */
 #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
 #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+/* The largest possible block size for any supported hash. */
+#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 80881eea47..1bcf7ab6fd 100644
--- a/hash.h
+++ b/hash.h
@@ -81,6 +81,9 @@ struct git_hash_algo {
 	/* The length of the hash in hex characters. */
 	size_t hexsz;
 
+	/* The block size of the hash. */
+	size_t blksz;
+
 	/* The hash initialization function. */
 	git_hash_init_fn init_fn;
 
diff --git a/sha1-file.c b/sha1-file.c
index 93ed8c8686..c47349a5f8 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -90,6 +90,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		0x00000000,
 		0,
 		0,
+		0,
 		git_hash_unknown_init,
 		git_hash_unknown_update,
 		git_hash_unknown_final,
@@ -102,6 +103,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		0x73686131,
 		GIT_SHA1_RAWSZ,
 		GIT_SHA1_HEXSZ,
+		GIT_SHA1_BLKSZ,
 		git_hash_sha1_init,
 		git_hash_sha1_update,
 		git_hash_sha1_final,

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v6 08/12] t/helper: add a test helper to compute hash speed
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
                       ` (6 preceding siblings ...)
  2018-11-14  4:09     ` [PATCH v6 07/12] sha1-file: add a constant for hash block size brian m. carlson
@ 2018-11-14  4:09     ` brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 09/12] commit-graph: convert to using the_hash_algo brian m. carlson
                       ` (3 subsequent siblings)
  11 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

Add a utility (which is less for the testsuite and more for developers)
that can compute hash speeds for whatever hash algorithms are
implemented.  This allows developers to test their personal systems to
determine the performance characteristics of various algorithms.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile                   |  1 +
 t/helper/test-hash-speed.c | 61 ++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c       |  1 +
 t/helper/test-tool.h       |  1 +
 4 files changed, 64 insertions(+)
 create mode 100644 t/helper/test-hash-speed.c

diff --git a/Makefile b/Makefile
index daf0b198ec..c6f06bf50b 100644
--- a/Makefile
+++ b/Makefile
@@ -726,6 +726,7 @@ TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
+TEST_BUILTINS_OBJS += test-hash-speed.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
 TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
diff --git a/t/helper/test-hash-speed.c b/t/helper/test-hash-speed.c
new file mode 100644
index 0000000000..432233c7f0
--- /dev/null
+++ b/t/helper/test-hash-speed.c
@@ -0,0 +1,61 @@
+#include "test-tool.h"
+#include "cache.h"
+
+#define NUM_SECONDS 3
+
+static inline void compute_hash(const struct git_hash_algo *algo, git_hash_ctx *ctx, uint8_t *final, const void *p, size_t len)
+{
+	algo->init_fn(ctx);
+	algo->update_fn(ctx, p, len);
+	algo->final_fn(final, ctx);
+}
+
+int cmd__hash_speed(int ac, const char **av)
+{
+	git_hash_ctx ctx;
+	unsigned char hash[GIT_MAX_RAWSZ];
+	clock_t initial, start, end;
+	unsigned bufsizes[] = { 64, 256, 1024, 8192, 16384 };
+	int i;
+	void *p;
+	const struct git_hash_algo *algo = NULL;
+
+	if (ac == 2) {
+		for (i = 1; i < GIT_HASH_NALGOS; i++) {
+			if (!strcmp(av[1], hash_algos[i].name)) {
+				algo = &hash_algos[i];
+				break;
+			}
+		}
+	}
+	if (!algo)
+		die("usage: test-tool hash-speed algo_name");
+
+	/* Use this as an offset to make overflow less likely. */
+	initial = clock();
+
+	printf("algo: %s\n", algo->name);
+
+	for (i = 0; i < ARRAY_SIZE(bufsizes); i++) {
+		unsigned long j, kb;
+		double kb_per_sec;
+		p = xcalloc(1, bufsizes[i]);
+		start = end = clock() - initial;
+		for (j = 0; ((end - start) / CLOCKS_PER_SEC) < NUM_SECONDS; j++) {
+			compute_hash(algo, &ctx, hash, p, bufsizes[i]);
+
+			/*
+			 * Only check elapsed time every 128 iterations to avoid
+			 * dominating the runtime with system calls.
+			 */
+			if (!(j & 127))
+				end = clock() - initial;
+		}
+		kb = j * bufsizes[i];
+		kb_per_sec = kb / (1024 * ((double)end - start) / CLOCKS_PER_SEC);
+		printf("size %u: %lu iters; %lu KiB; %0.2f KiB/s\n", bufsizes[i], j, kb, kb_per_sec);
+		free(p);
+	}
+
+	exit(0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index bfb195b1a8..c222d532a2 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -20,6 +20,7 @@ static struct test_cmd cmds[] = {
 	{ "example-decorate", cmd__example_decorate },
 	{ "genrandom", cmd__genrandom },
 	{ "hashmap", cmd__hashmap },
+	{ "hash-speed", cmd__hash_speed },
 	{ "index-version", cmd__index_version },
 	{ "json-writer", cmd__json_writer },
 	{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index b5b7712a1d..70dafb7649 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -16,6 +16,7 @@ int cmd__dump_untracked_cache(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
+int cmd__hash_speed(int argc, const char **argv);
 int cmd__index_version(int argc, const char **argv);
 int cmd__json_writer(int argc, const char **argv);
 int cmd__lazy_init_name_hash(int argc, const char **argv);

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v6 09/12] commit-graph: convert to using the_hash_algo
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
                       ` (7 preceding siblings ...)
  2018-11-14  4:09     ` [PATCH v6 08/12] t/helper: add a test helper to compute hash speed brian m. carlson
@ 2018-11-14  4:09     ` brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 10/12] Add a base implementation of SHA-256 support brian m. carlson
                       ` (2 subsequent siblings)
  11 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

Instead of using hard-coded constants for object sizes, use
the_hash_algo to look them up.  In addition, use a function call to look
up the object ID version and produce the correct value.  For now, we use
version 1, which means to use the default algorithm used in the rest of
the repository.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 commit-graph.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..6763d19288 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -23,16 +23,11 @@
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
 #define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
 
-#define GRAPH_DATA_WIDTH 36
+#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
 #define GRAPH_VERSION_1 0x1
 #define GRAPH_VERSION GRAPH_VERSION_1
 
-#define GRAPH_OID_VERSION_SHA1 1
-#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
-#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
-#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
-
 #define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000
 #define GRAPH_PARENT_MISSING 0x7fffffff
 #define GRAPH_EDGE_LAST_MASK 0x7fffffff
@@ -44,13 +39,18 @@
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
-			+ GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
+			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static uint8_t oid_version(void)
+{
+	return 1;
+}
+
 static struct commit_graph *alloc_commit_graph(void)
 {
 	struct commit_graph *g = xcalloc(1, sizeof(*g));
@@ -125,15 +125,15 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	}
 
 	hash_version = *(unsigned char*)(data + 5);
-	if (hash_version != GRAPH_OID_VERSION) {
+	if (hash_version != oid_version()) {
 		error(_("hash version %X does not match version %X"),
-		      hash_version, GRAPH_OID_VERSION);
+		      hash_version, oid_version());
 		goto cleanup_fail;
 	}
 
 	graph = alloc_commit_graph();
 
-	graph->hash_len = GRAPH_OID_LEN;
+	graph->hash_len = the_hash_algo->rawsz;
 	graph->num_chunks = *(unsigned char*)(data + 6);
 	graph->graph_fd = fd;
 	graph->data = graph_map;
@@ -149,7 +149,7 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 
 		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
-		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+		if (chunk_offset > graph_size - the_hash_algo->rawsz) {
 			error(_("improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
 			      (uint32_t)chunk_offset);
 			goto cleanup_fail;
@@ -764,6 +764,7 @@ void write_commit_graph(const char *obj_dir,
 	int num_extra_edges;
 	struct commit_list *parent;
 	struct progress *progress = NULL;
+	const unsigned hashsz = the_hash_algo->rawsz;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
@@ -909,7 +910,7 @@ void write_commit_graph(const char *obj_dir,
 	hashwrite_be32(f, GRAPH_SIGNATURE);
 
 	hashwrite_u8(f, GRAPH_VERSION);
-	hashwrite_u8(f, GRAPH_OID_VERSION);
+	hashwrite_u8(f, oid_version());
 	hashwrite_u8(f, num_chunks);
 	hashwrite_u8(f, 0); /* unused padding byte */
 
@@ -924,8 +925,8 @@ void write_commit_graph(const char *obj_dir,
 
 	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
 	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
-	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+	chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
+	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
 	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
 
 	for (i = 0; i <= num_chunks; i++) {
@@ -938,8 +939,8 @@ void write_commit_graph(const char *obj_dir,
 	}
 
 	write_graph_chunk_fanout(f, commits.list, commits.nr);
-	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	write_graph_chunk_oids(f, hashsz, commits.list, commits.nr);
+	write_graph_chunk_data(f, hashsz, commits.list, commits.nr);
 	write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
 	close_commit_graph(the_repository);

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v6 10/12] Add a base implementation of SHA-256 support
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
                       ` (8 preceding siblings ...)
  2018-11-14  4:09     ` [PATCH v6 09/12] commit-graph: convert to using the_hash_algo brian m. carlson
@ 2018-11-14  4:09     ` brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 11/12] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 12/12] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
  11 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

SHA-1 is weak and we need to transition to a new hash function.  For
some time, we have referred to this new function as NewHash.  Recently,
we decided to pick SHA-256 as NewHash.  The reasons behind the choice of
SHA-256 are outlined in the thread starting at [1] and in the commit
history for the hash function transition document.

Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain.  Optimize it and restructure it to meet our coding
standards.  Pull in the update and final functions from the SHA-1 block
implementation, as we know these function correctly with all compilers.
This implementation is slower than SHA-1, but more performant
implementations will be introduced in future commits.

Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.

Note that with this patch, it is still not possible to switch to using
SHA-256 in Git.  Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.

[1] https://public-inbox.org/git/20180609224913.GC38834@genre.crustytoothpaste.net/

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile               |   4 +
 cache.h                |  12 ++-
 hash.h                 |  19 +++-
 sha1-file.c            |  45 ++++++++++
 sha256/block/sha256.c  | 196 +++++++++++++++++++++++++++++++++++++++++
 sha256/block/sha256.h  |  24 +++++
 t/helper/test-sha256.c |   7 ++
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/t0015-hash.sh        |  26 ++++++
 10 files changed, 331 insertions(+), 4 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 t/helper/test-sha256.c

diff --git a/Makefile b/Makefile
index c6f06bf50b..1302c07f5c 100644
--- a/Makefile
+++ b/Makefile
@@ -749,6 +749,7 @@ TEST_BUILTINS_OBJS += test-run-command.o
 TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
 TEST_BUILTINS_OBJS += test-sha1.o
 TEST_BUILTINS_OBJS += test-sha1-array.o
+TEST_BUILTINS_OBJS += test-sha256.o
 TEST_BUILTINS_OBJS += test-sigchain.o
 TEST_BUILTINS_OBJS += test-strcmp-offset.o
 TEST_BUILTINS_OBJS += test-string-list.o
@@ -1646,6 +1647,9 @@ endif
 endif
 endif
 
+LIB_OBJS += sha256/block/sha256.o
+BASIC_CFLAGS += -DSHA256_BLK
+
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
 	BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)"
diff --git a/cache.h b/cache.h
index 62b2f3a5e8..b0dfe42736 100644
--- a/cache.h
+++ b/cache.h
@@ -48,11 +48,17 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 /* The block size of SHA-1. */
 #define GIT_SHA1_BLKSZ 64
 
+/* The length in bytes and in hex digits of an object name (SHA-256 value). */
+#define GIT_SHA256_RAWSZ 32
+#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
+/* The block size of SHA-256. */
+#define GIT_SHA256_BLKSZ 64
+
 /* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
 /* The largest possible block size for any supported hash. */
-#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
+#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 1bcf7ab6fd..a9bc624020 100644
--- a/hash.h
+++ b/hash.h
@@ -15,6 +15,8 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#include "sha256/block/sha256.h"
+
 #ifndef platform_SHA_CTX
 /*
  * platform's underlying implementation of SHA-1; could be OpenSSL,
@@ -34,6 +36,18 @@
 #define git_SHA1_Update		platform_SHA1_Update
 #define git_SHA1_Final		platform_SHA1_Final
 
+#ifndef platform_SHA256_CTX
+#define platform_SHA256_CTX	SHA256_CTX
+#define platform_SHA256_Init	SHA256_Init
+#define platform_SHA256_Update	SHA256_Update
+#define platform_SHA256_Final	SHA256_Final
+#endif
+
+#define git_SHA256_CTX		platform_SHA256_CTX
+#define git_SHA256_Init		platform_SHA256_Init
+#define git_SHA256_Update	platform_SHA256_Update
+#define git_SHA256_Final	platform_SHA256_Final
+
 #ifdef SHA1_MAX_BLOCK_SIZE
 #include "compat/sha1-chunked.h"
 #undef git_SHA1_Update
@@ -52,12 +66,15 @@
 #define GIT_HASH_UNKNOWN 0
 /* SHA-1 */
 #define GIT_HASH_SHA1 1
+/* SHA-256  */
+#define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
-#define GIT_HASH_NALGOS (GIT_HASH_SHA1 + 1)
+#define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
 
 /* A suitably aligned type for stack allocations of hash contexts. */
 union git_hash_ctx {
 	git_SHA_CTX sha1;
+	git_SHA256_CTX sha256;
 };
 typedef union git_hash_ctx git_hash_ctx;
 
diff --git a/sha1-file.c b/sha1-file.c
index c47349a5f8..e7764822ea 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -40,10 +40,20 @@
 #define EMPTY_TREE_SHA1_BIN_LITERAL \
 	 "\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \
 	 "\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"
+#define EMPTY_TREE_SHA256_BIN_LITERAL \
+	"\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \
+	"\x04\xd4\x5d\x8d\x85\xef\xa9\xb0\x57\xb5" \
+	"\x3b\x14\xb4\xb9\xb9\x39\xdd\x74\xde\xcc" \
+	"\x53\x21"
 
 #define EMPTY_BLOB_SHA1_BIN_LITERAL \
 	"\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b" \
 	"\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91"
+#define EMPTY_BLOB_SHA256_BIN_LITERAL \
+	"\x47\x3a\x0f\x4c\x3b\xe8\xa9\x36\x81\xa2" \
+	"\x67\xe3\xb1\xe9\xa7\xdc\xda\x11\x85\x43" \
+	"\x6f\xe1\x41\xf7\x74\x91\x20\xa3\x03\x72" \
+	"\x18\x13"
 
 const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
@@ -53,6 +63,12 @@ static const struct object_id empty_tree_oid = {
 static const struct object_id empty_blob_oid = {
 	EMPTY_BLOB_SHA1_BIN_LITERAL
 };
+static const struct object_id empty_tree_oid_sha256 = {
+	EMPTY_TREE_SHA256_BIN_LITERAL
+};
+static const struct object_id empty_blob_oid_sha256 = {
+	EMPTY_BLOB_SHA256_BIN_LITERAL
+};
 
 static void git_hash_sha1_init(git_hash_ctx *ctx)
 {
@@ -69,6 +85,22 @@ static void git_hash_sha1_final(unsigned char *hash, git_hash_ctx *ctx)
 	git_SHA1_Final(hash, &ctx->sha1);
 }
 
+
+static void git_hash_sha256_init(git_hash_ctx *ctx)
+{
+	git_SHA256_Init(&ctx->sha256);
+}
+
+static void git_hash_sha256_update(git_hash_ctx *ctx, const void *data, size_t len)
+{
+	git_SHA256_Update(&ctx->sha256, data, len);
+}
+
+static void git_hash_sha256_final(unsigned char *hash, git_hash_ctx *ctx)
+{
+	git_SHA256_Final(hash, &ctx->sha256);
+}
+
 static void git_hash_unknown_init(git_hash_ctx *ctx)
 {
 	BUG("trying to init unknown hash");
@@ -110,6 +142,19 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		&empty_tree_oid,
 		&empty_blob_oid,
 	},
+	{
+		"sha256",
+		/* "s256", big-endian */
+		0x73323536,
+		GIT_SHA256_RAWSZ,
+		GIT_SHA256_HEXSZ,
+		GIT_SHA256_BLKSZ,
+		git_hash_sha256_init,
+		git_hash_sha256_update,
+		git_hash_sha256_final,
+		&empty_tree_oid_sha256,
+		&empty_blob_oid_sha256,
+	}
 };
 
 const char *empty_tree_oid_hex(void)
diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c
new file mode 100644
index 0000000000..37850b4e52
--- /dev/null
+++ b/sha256/block/sha256.c
@@ -0,0 +1,196 @@
+#include "git-compat-util.h"
+#include "./sha256.h"
+
+#undef RND
+#undef BLKSIZE
+
+#define BLKSIZE blk_SHA256_BLKSIZE
+
+void blk_SHA256_Init(blk_SHA256_CTX *ctx)
+{
+	ctx->offset = 0;
+	ctx->size = 0;
+	ctx->state[0] = 0x6a09e667ul;
+	ctx->state[1] = 0xbb67ae85ul;
+	ctx->state[2] = 0x3c6ef372ul;
+	ctx->state[3] = 0xa54ff53aul;
+	ctx->state[4] = 0x510e527ful;
+	ctx->state[5] = 0x9b05688cul;
+	ctx->state[6] = 0x1f83d9abul;
+	ctx->state[7] = 0x5be0cd19ul;
+}
+
+static inline uint32_t ror(uint32_t x, unsigned n)
+{
+	return (x >> n) | (x << (32 - n));
+}
+
+static inline uint32_t ch(uint32_t x, uint32_t y, uint32_t z)
+{
+	return z ^ (x & (y ^ z));
+}
+
+static inline uint32_t maj(uint32_t x, uint32_t y, uint32_t z)
+{
+	return ((x | y) & z) | (x & y);
+}
+
+static inline uint32_t sigma0(uint32_t x)
+{
+	return ror(x, 2) ^ ror(x, 13) ^ ror(x, 22);
+}
+
+static inline uint32_t sigma1(uint32_t x)
+{
+	return ror(x, 6) ^ ror(x, 11) ^ ror(x, 25);
+}
+
+static inline uint32_t gamma0(uint32_t x)
+{
+	return ror(x, 7) ^ ror(x, 18) ^ (x >> 3);
+}
+
+static inline uint32_t gamma1(uint32_t x)
+{
+	return ror(x, 17) ^ ror(x, 19) ^ (x >> 10);
+}
+
+static void blk_SHA256_Transform(blk_SHA256_CTX *ctx, const unsigned char *buf)
+{
+
+	uint32_t S[8], W[64], t0, t1;
+	int i;
+
+	/* copy state into S */
+	for (i = 0; i < 8; i++)
+		S[i] = ctx->state[i];
+
+	/* copy the state into 512-bits into W[0..15] */
+	for (i = 0; i < 16; i++, buf += sizeof(uint32_t))
+		W[i] = get_be32(buf);
+
+	/* fill W[16..63] */
+	for (i = 16; i < 64; i++)
+		W[i] = gamma1(W[i - 2]) + W[i - 7] + gamma0(W[i - 15]) + W[i - 16];
+
+#define RND(a,b,c,d,e,f,g,h,i,ki)                    \
+	t0 = h + sigma1(e) + ch(e, f, g) + ki + W[i];   \
+	t1 = sigma0(a) + maj(a, b, c);                  \
+	d += t0;                                        \
+	h  = t0 + t1;
+
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],0,0x428a2f98);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],1,0x71374491);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],2,0xb5c0fbcf);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],3,0xe9b5dba5);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],4,0x3956c25b);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],5,0x59f111f1);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],6,0x923f82a4);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],7,0xab1c5ed5);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],8,0xd807aa98);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],9,0x12835b01);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],10,0x243185be);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],11,0x550c7dc3);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],12,0x72be5d74);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],13,0x80deb1fe);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],14,0x9bdc06a7);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],15,0xc19bf174);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],16,0xe49b69c1);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],17,0xefbe4786);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],18,0x0fc19dc6);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],19,0x240ca1cc);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],20,0x2de92c6f);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],21,0x4a7484aa);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],22,0x5cb0a9dc);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],23,0x76f988da);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],24,0x983e5152);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],25,0xa831c66d);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],26,0xb00327c8);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],27,0xbf597fc7);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],28,0xc6e00bf3);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],29,0xd5a79147);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],30,0x06ca6351);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],31,0x14292967);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],32,0x27b70a85);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],33,0x2e1b2138);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],34,0x4d2c6dfc);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],35,0x53380d13);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],36,0x650a7354);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],37,0x766a0abb);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],38,0x81c2c92e);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],39,0x92722c85);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],40,0xa2bfe8a1);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],41,0xa81a664b);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],42,0xc24b8b70);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],43,0xc76c51a3);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],44,0xd192e819);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],45,0xd6990624);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],46,0xf40e3585);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],47,0x106aa070);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],48,0x19a4c116);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],49,0x1e376c08);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],50,0x2748774c);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],51,0x34b0bcb5);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],52,0x391c0cb3);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],53,0x4ed8aa4a);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],54,0x5b9cca4f);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],55,0x682e6ff3);
+	RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],56,0x748f82ee);
+	RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],57,0x78a5636f);
+	RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],58,0x84c87814);
+	RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],59,0x8cc70208);
+	RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],60,0x90befffa);
+	RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],61,0xa4506ceb);
+	RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],62,0xbef9a3f7);
+	RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],63,0xc67178f2);
+
+	for (i = 0; i < 8; i++)
+		ctx->state[i] += S[i];
+}
+
+void blk_SHA256_Update(blk_SHA256_CTX *ctx, const void *data, size_t len)
+{
+	unsigned int len_buf = ctx->size & 63;
+
+	ctx->size += len;
+
+	/* Read the data into buf and process blocks as they get full */
+	if (len_buf) {
+		unsigned int left = 64 - len_buf;
+		if (len < left)
+			left = len;
+		memcpy(len_buf + ctx->buf, data, left);
+		len_buf = (len_buf + left) & 63;
+		len -= left;
+		data = ((const char *)data + left);
+		if (len_buf)
+			return;
+		blk_SHA256_Transform(ctx, ctx->buf);
+	}
+	while (len >= 64) {
+		blk_SHA256_Transform(ctx, data);
+		data = ((const char *)data + 64);
+		len -= 64;
+	}
+	if (len)
+		memcpy(ctx->buf, data, len);
+}
+
+void blk_SHA256_Final(unsigned char *digest, blk_SHA256_CTX *ctx)
+{
+	static const unsigned char pad[64] = { 0x80 };
+	unsigned int padlen[2];
+	int i;
+
+	/* Pad with a binary 1 (ie 0x80), then zeroes, then length */
+	padlen[0] = htonl((uint32_t)(ctx->size >> 29));
+	padlen[1] = htonl((uint32_t)(ctx->size << 3));
+
+	i = ctx->size & 63;
+	blk_SHA256_Update(ctx, pad, 1 + (63 & (55 - i)));
+	blk_SHA256_Update(ctx, padlen, 8);
+
+	/* copy output */
+	for (i = 0; i < 8; i++, digest += sizeof(uint32_t))
+		put_be32(digest, ctx->state[i]);
+}
diff --git a/sha256/block/sha256.h b/sha256/block/sha256.h
new file mode 100644
index 0000000000..5099d6421d
--- /dev/null
+++ b/sha256/block/sha256.h
@@ -0,0 +1,24 @@
+#ifndef SHA256_BLOCK_SHA256_H
+#define SHA256_BLOCK_SHA256_H
+
+#define blk_SHA256_BLKSIZE 64
+
+struct blk_SHA256_CTX {
+	uint32_t state[8];
+	uint64_t size;
+	uint32_t offset;
+	uint8_t buf[blk_SHA256_BLKSIZE];
+};
+
+typedef struct blk_SHA256_CTX blk_SHA256_CTX;
+
+void blk_SHA256_Init(blk_SHA256_CTX *ctx);
+void blk_SHA256_Update(blk_SHA256_CTX *ctx, const void *data, size_t len);
+void blk_SHA256_Final(unsigned char *digest, blk_SHA256_CTX *ctx);
+
+#define platform_SHA256_CTX blk_SHA256_CTX
+#define platform_SHA256_Init blk_SHA256_Init
+#define platform_SHA256_Update blk_SHA256_Update
+#define platform_SHA256_Final blk_SHA256_Final
+
+#endif
diff --git a/t/helper/test-sha256.c b/t/helper/test-sha256.c
new file mode 100644
index 0000000000..0ac6a99d5f
--- /dev/null
+++ b/t/helper/test-sha256.c
@@ -0,0 +1,7 @@
+#include "test-tool.h"
+#include "cache.h"
+
+int cmd__sha256(int ac, const char **av)
+{
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA256);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index c222d532a2..5b137874e1 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -43,6 +43,7 @@ static struct test_cmd cmds[] = {
 	{ "scrap-cache-tree", cmd__scrap_cache_tree },
 	{ "sha1", cmd__sha1 },
 	{ "sha1-array", cmd__sha1_array },
+	{ "sha256", cmd__sha256 },
 	{ "sigchain", cmd__sigchain },
 	{ "strcmp-offset", cmd__strcmp_offset },
 	{ "string-list", cmd__string_list },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 70dafb7649..ca5c88edb2 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -39,6 +39,7 @@ int cmd__run_command(int argc, const char **argv);
 int cmd__scrap_cache_tree(int argc, const char **argv);
 int cmd__sha1(int argc, const char **argv);
 int cmd__sha1_array(int argc, const char **argv);
+int cmd__sha256(int argc, const char **argv);
 int cmd__sigchain(int argc, const char **argv);
 int cmd__strcmp_offset(int argc, const char **argv);
 int cmd__string_list(int argc, const char **argv);
diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
index 884fe5acd1..291e9061f3 100755
--- a/t/t0015-hash.sh
+++ b/t/t0015-hash.sh
@@ -26,4 +26,30 @@ test_expect_success 'test basic SHA-1 hash values' '
 	grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
 '
 
+test_expect_success 'test basic SHA-256 hash values' '
+	test-tool sha256 </dev/null >actual &&
+	grep e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 actual &&
+	printf "a" | test-tool sha256 >actual &&
+	grep ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb actual &&
+	printf "abc" | test-tool sha256 >actual &&
+	grep ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad actual &&
+	printf "message digest" | test-tool sha256 >actual &&
+	grep f7846f55cf23e14eebeab5b4e1550cad5b509e3348fbc4efa3a1413d393cb650 actual &&
+	printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha256 >actual &&
+	grep 71c480df93d6ae2f1efad1447c66c9525e316218cf51fc8d9ed832f2daf18b73 actual &&
+	# Try to exercise the chunking code by turning autoflush on.
+	perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;" | \
+		test-tool sha256 >actual &&
+	grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 actual &&
+	perl -e "$| = 1; print q{abcdefghijklmnopqrstuvwxyz} for 1..100000;" | \
+		test-tool sha256 >actual &&
+	grep e406ba321ca712ad35a698bf0af8d61fc4dc40eca6bdcea4697962724ccbde35 actual &&
+	printf "blob 0\0" | test-tool sha256 >actual &&
+	grep 473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813 actual &&
+	printf "blob 3\0abc" | test-tool sha256 >actual &&
+	grep c1cf6e465077930e88dc5136641d402f72a229ddd996f627d60e9639eaba35a6 actual &&
+	printf "tree 0\0" | test-tool sha256 >actual &&
+	grep 6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321 actual
+'
+
 test_done

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v6 11/12] sha256: add an SHA-256 implementation using libgcrypt
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
                       ` (9 preceding siblings ...)
  2018-11-14  4:09     ` [PATCH v6 10/12] Add a base implementation of SHA-256 support brian m. carlson
@ 2018-11-14  4:09     ` brian m. carlson
  2018-11-14  4:09     ` [PATCH v6 12/12] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
  11 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

Generally, one gets better performance out of cryptographic routines
written in assembly than C, and this is also true for SHA-256.  In
addition, most Linux distributions cannot distribute Git linked against
OpenSSL for licensing reasons.

Most systems with GnuPG will also have libgcrypt, since it is a
dependency of GnuPG.  libgcrypt is also faster than the SHA1DC
implementation for messages of a few KiB and larger.

For comparison, on a Core i7-6600U, this implementation processes 16 KiB
chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337
MiB/s.

In addition, libgcrypt is licensed under the LGPL 2.1, which is
compatible with the GPL.  Add an implementation of SHA-256 that uses
libgcrypt.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile        | 13 +++++++++++--
 hash.h          |  4 ++++
 sha256/gcrypt.h | 30 ++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 sha256/gcrypt.h

diff --git a/Makefile b/Makefile
index 1302c07f5c..6c99844aa8 100644
--- a/Makefile
+++ b/Makefile
@@ -179,6 +179,10 @@ all::
 # in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO
 # wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined.
 #
+# Define BLK_SHA256 to use the built-in SHA-256 routines.
+#
+# Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1647,8 +1651,13 @@ endif
 endif
 endif
 
-LIB_OBJS += sha256/block/sha256.o
-BASIC_CFLAGS += -DSHA256_BLK
+ifdef GCRYPT_SHA256
+	BASIC_CFLAGS += -DSHA256_GCRYPT
+	EXTLIBS += -lgcrypt
+else
+	LIB_OBJS += sha256/block/sha256.o
+	BASIC_CFLAGS += -DSHA256_BLK
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index a9bc624020..2ef098052d 100644
--- a/hash.h
+++ b/hash.h
@@ -15,7 +15,11 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#if defined(SHA256_GCRYPT)
+#include "sha256/gcrypt.h"
+#else
 #include "sha256/block/sha256.h"
+#endif
 
 #ifndef platform_SHA_CTX
 /*
diff --git a/sha256/gcrypt.h b/sha256/gcrypt.h
new file mode 100644
index 0000000000..09bd8bb200
--- /dev/null
+++ b/sha256/gcrypt.h
@@ -0,0 +1,30 @@
+#ifndef SHA256_GCRYPT_H
+#define SHA256_GCRYPT_H
+
+#include <gcrypt.h>
+
+#define SHA256_DIGEST_SIZE 32
+
+typedef gcry_md_hd_t gcrypt_SHA256_CTX;
+
+inline void gcrypt_SHA256_Init(gcrypt_SHA256_CTX *ctx)
+{
+	gcry_md_open(ctx, GCRY_MD_SHA256, 0);
+}
+
+inline void gcrypt_SHA256_Update(gcrypt_SHA256_CTX *ctx, const void *data, size_t len)
+{
+	gcry_md_write(*ctx, data, len);
+}
+
+inline void gcrypt_SHA256_Final(unsigned char *digest, gcrypt_SHA256_CTX *ctx)
+{
+	memcpy(digest, gcry_md_read(*ctx, GCRY_MD_SHA256), SHA256_DIGEST_SIZE);
+}
+
+#define platform_SHA256_CTX gcrypt_SHA256_CTX
+#define platform_SHA256_Init gcrypt_SHA256_Init
+#define platform_SHA256_Update gcrypt_SHA256_Update
+#define platform_SHA256_Final gcrypt_SHA256_Final
+
+#endif

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v6 12/12] hash: add an SHA-256 implementation using OpenSSL
  2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
                       ` (10 preceding siblings ...)
  2018-11-14  4:09     ` [PATCH v6 11/12] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
@ 2018-11-14  4:09     ` brian m. carlson
  11 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-11-14  4:09 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Duy Nguyen, SZEDER Gábor, Jakub Narebski, Christian Couder

We already have OpenSSL routines available for SHA-1, so add routines
for SHA-256 as well.

On a Core i7-6600U, this SHA-256 implementation compares favorably to
the SHA1DC SHA-1 implementation:

SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Makefile | 7 +++++++
 hash.h   | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/Makefile b/Makefile
index 6c99844aa8..6844deb92d 100644
--- a/Makefile
+++ b/Makefile
@@ -183,6 +183,8 @@ all::
 #
 # Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
 #
+# Define OPENSSL_SHA256 to use the SHA-256 routines in OpenSSL.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1651,6 +1653,10 @@ endif
 endif
 endif
 
+ifdef OPENSSL_SHA256
+	EXTLIBS += $(LIB_4_CRYPTO)
+	BASIC_CFLAGS += -DSHA256_OPENSSL
+else
 ifdef GCRYPT_SHA256
 	BASIC_CFLAGS += -DSHA256_GCRYPT
 	EXTLIBS += -lgcrypt
@@ -1658,6 +1664,7 @@ else
 	LIB_OBJS += sha256/block/sha256.o
 	BASIC_CFLAGS += -DSHA256_BLK
 endif
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
 	LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index 2ef098052d..adde708cf2 100644
--- a/hash.h
+++ b/hash.h
@@ -17,6 +17,8 @@
 
 #if defined(SHA256_GCRYPT)
 #include "sha256/gcrypt.h"
+#elif defined(SHA256_OPENSSL)
+#include <openssl/sha.h>
 #else
 #include "sha256/block/sha256.h"
 #endif

^ permalink raw reply related	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2018-11-14  4:10 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-25  2:39 [PATCH v4 00/12] Base SHA-256 implementation brian m. carlson
2018-10-25  2:39 ` [PATCH v4 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
2018-10-25  2:39 ` [PATCH v4 02/12] sha1-file: provide functions to look up hash algorithms brian m. carlson
2018-10-25  2:39 ` [PATCH v4 03/12] hex: introduce functions to print arbitrary hashes brian m. carlson
2018-10-25  2:39 ` [PATCH v4 04/12] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
2018-10-25  2:39 ` [PATCH v4 05/12] t: add basic tests for our SHA-1 implementation brian m. carlson
2018-10-25  2:39 ` [PATCH v4 06/12] t: make the sha1 test-tool helper generic brian m. carlson
2018-10-25  2:40 ` [PATCH v4 07/12] sha1-file: add a constant for hash block size brian m. carlson
2018-10-25  2:40 ` [PATCH v4 08/12] t/helper: add a test helper to compute hash speed brian m. carlson
2018-10-25  2:40 ` [PATCH v4 09/12] commit-graph: convert to using the_hash_algo brian m. carlson
2018-10-25  2:40 ` [PATCH v4 10/12] Add a base implementation of SHA-256 support brian m. carlson
2018-10-25  3:02   ` Carlo Arenas
2018-10-28 15:52     ` brian m. carlson
2018-10-29  0:39       ` Junio C Hamano
2018-10-31 22:55         ` brian m. carlson
2018-11-01  5:29           ` Junio C Hamano
2018-10-27  9:03   ` Christian Couder
2018-10-25  2:40 ` [PATCH v4 11/12] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
2018-10-25  2:40 ` [PATCH v4 12/12] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
2018-11-04 23:44 ` [PATCH v5 00/12] Base SHA-256 implementation brian m. carlson
2018-11-04 23:44   ` [PATCH v5 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
2018-11-05  7:21     ` Ævar Arnfjörð Bjarmason
2018-11-04 23:44   ` [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms brian m. carlson
2018-11-13 18:42     ` Derrick Stolee
2018-11-13 18:45       ` Duy Nguyen
2018-11-14  1:01         ` brian m. carlson
2018-11-14  0:11       ` Ramsay Jones
2018-11-14  0:42         ` Ramsay Jones
2018-11-14  0:51           ` Jeff King
2018-11-14  2:11         ` brian m. carlson
2018-11-14  3:53           ` Ramsay Jones
2018-11-04 23:44   ` [PATCH v5 03/12] hex: introduce functions to print arbitrary hashes brian m. carlson
2018-11-04 23:44   ` [PATCH v5 04/12] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
2018-11-04 23:44   ` [PATCH v5 05/12] t: add basic tests for our SHA-1 implementation brian m. carlson
2018-11-04 23:44   ` [PATCH v5 06/12] t: make the sha1 test-tool helper generic brian m. carlson
2018-11-04 23:44   ` [PATCH v5 07/12] sha1-file: add a constant for hash block size brian m. carlson
2018-11-04 23:44   ` [PATCH v5 08/12] t/helper: add a test helper to compute hash speed brian m. carlson
2018-11-04 23:44   ` [PATCH v5 09/12] commit-graph: convert to using the_hash_algo brian m. carlson
2018-11-04 23:44   ` [PATCH v5 10/12] Add a base implementation of SHA-256 support brian m. carlson
2018-11-05 11:39     ` Ævar Arnfjörð Bjarmason
2018-11-07  1:30       ` brian m. carlson
2018-11-10 15:52         ` Ævar Arnfjörð Bjarmason
2018-11-04 23:44   ` [PATCH v5 11/12] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
2018-11-04 23:44   ` [PATCH v5 12/12] hash: add an SHA-256 implementation using OpenSSL brian m. carlson
2018-11-05  2:45   ` [PATCH v5 00/12] Base SHA-256 implementation Junio C Hamano
2018-11-14  4:09   ` [PATCH v6 " brian m. carlson
2018-11-14  4:09     ` [PATCH v6 01/12] sha1-file: rename algorithm to "sha1" brian m. carlson
2018-11-14  4:09     ` [PATCH v6 02/12] sha1-file: provide functions to look up hash algorithms brian m. carlson
2018-11-14  4:09     ` [PATCH v6 03/12] hex: introduce functions to print arbitrary hashes brian m. carlson
2018-11-14  4:09     ` [PATCH v6 04/12] cache: make hashcmp and hasheq work with larger hashes brian m. carlson
2018-11-14  4:09     ` [PATCH v6 05/12] t: add basic tests for our SHA-1 implementation brian m. carlson
2018-11-14  4:09     ` [PATCH v6 06/12] t: make the sha1 test-tool helper generic brian m. carlson
2018-11-14  4:09     ` [PATCH v6 07/12] sha1-file: add a constant for hash block size brian m. carlson
2018-11-14  4:09     ` [PATCH v6 08/12] t/helper: add a test helper to compute hash speed brian m. carlson
2018-11-14  4:09     ` [PATCH v6 09/12] commit-graph: convert to using the_hash_algo brian m. carlson
2018-11-14  4:09     ` [PATCH v6 10/12] Add a base implementation of SHA-256 support brian m. carlson
2018-11-14  4:09     ` [PATCH v6 11/12] sha256: add an SHA-256 implementation using libgcrypt brian m. carlson
2018-11-14  4:09     ` [PATCH v6 12/12] hash: add an SHA-256 implementation using OpenSSL brian m. carlson

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).