git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3
@ 2020-02-22 20:17 brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 01/24] builtin/pack-objects: make hash agnostic brian m. carlson
                   ` (24 more replies)
  0 siblings, 25 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

This is a series for part 1 of 3 of a stage 4 SHA-256 implementation.
It is mostly the same as v1[0], which was RFC.  The interested reader is
referred there for the in-depth explanations.

A few interesting changes have taken place since v1.  First, I
discovered a regression in a recent series which introduced a hard-coded
constant, so patch 1 addresses this.

Second, I discovered a mistaken assumption that we were making about our
hash implementations: that copying the struct was sufficient to copy the
context.  This is not true for libgcrypt, where our context is a pointer
instead, so patch 2 addresses this with a helper function.

Finally, I've added a check to prevent non-developers from creating
SHA-256 repositories, since this series is not sufficient to implement
full SHA-256 support.  Even as a developer, creating a SHA-256
repository immediately leads to a broken state, since we don't recognize
the extension (yet) and therefore promptly refuse to operate on it in
any way.  Preventing this experience seemed prudent.

There are a few other minor changes indicated in the range-diff below.

If folks have opinions about things like option names (or really,
anything else), I'd love to hear them.

There are two more parts for this implementation and one more set of
test fixes.  They will be coming in due course.

[0] https://lore.kernel.org/git/20200113124729.3684846-1-sandals@crustytoothpaste.net/

brian m. carlson (24):
  builtin/pack-objects: make hash agnostic
  hash: implement and use a context cloning function
  hex: introduce parsing variants taking hash algorithms
  hex: add functions to parse hex object IDs in any algorithm
  repository: require a build flag to use SHA-256
  t: use hash-specific lookup tables to define test constants
  t6300: abstract away SHA-1-specific constants
  t6300: make hash algorithm independent
  t/helper/test-dump-split-index: initialize git repository
  t/helper: initialize repository if necessary
  t/helper: make repository tests hash independent
  setup: allow check_repository_format to read repository format
  builtin/init-db: allow specifying hash algorithm on command line
  builtin/init-db: add environment variable for new repo hash
  init-db: move writing repo version into a function
  worktree: allow repository version 1
  commit: use expected signature header for SHA-256
  gpg-interface: improve interface for parsing tags
  tag: store SHA-256 signatures in a header
  fast-import: permit reading multiple marks files
  fast-import: add helper function for inserting mark object entries
  fast-import: make find_marks work on any mark set
  fast-import: add a generic function to iterate over marks
  fast-import: add options for rewriting submodules

 Documentation/git-fast-import.txt |  20 +++
 Documentation/git-init.txt        |   7 +-
 Documentation/git.txt             |   6 +
 builtin/clone.c                   |   2 +-
 builtin/commit.c                  |   2 +-
 builtin/fmt-merge-msg.c           |  26 +++-
 builtin/init-db.c                 |  75 +++++++--
 builtin/mktag.c                   |  14 ++
 builtin/pack-objects.c            |   2 +-
 builtin/receive-pack.c            |   4 +-
 builtin/tag.c                     |  20 ++-
 cache.h                           |  25 ++-
 commit.c                          |  58 +++++--
 commit.h                          |   8 +
 config.mak.dev                    |   2 +
 csum-file.c                       |   2 +-
 fast-import.c                     | 246 ++++++++++++++++++++++--------
 gpg-interface.c                   |  17 ++-
 gpg-interface.h                   |   9 +-
 hash.h                            |  21 +++
 hex.c                             |  57 ++++++-
 log-tree.c                        |  14 +-
 path.c                            |   2 +-
 ref-filter.c                      |  23 ++-
 repository.c                      |   4 +
 sequencer.c                       |   2 +-
 setup.c                           |   6 +-
 sha1-file.c                       |  18 +++
 sha256/gcrypt.h                   |   6 +
 t/helper/test-dump-split-index.c  |   2 +
 t/helper/test-repository.c        |  14 +-
 t/t1450-fsck.sh                   |  24 +++
 t/t5801-remote-helpers.sh         |   4 +-
 t/t6300-for-each-ref.sh           |  61 +++++---
 t/t7004-tag.sh                    |   8 +-
 t/t7030-verify-tag.sh             |  17 +++
 t/t7510-signed-commit.sh          |  16 +-
 t/t9300-fast-import.sh            | 109 +++++++++++++
 t/test-lib.sh                     |  29 ++--
 tag.c                             |  15 +-
 worktree.c                        |  10 +-
 41 files changed, 810 insertions(+), 197 deletions(-)

Range-diff against v1:
 -:  ---------- >  1:  0037d0f950 builtin/pack-objects: make hash agnostic
 -:  ---------- >  2:  e6f632eb38 hash: implement and use a context cloning function
 1:  17e86d7e8a =  3:  205c67144b hex: introduce parsing variants taking hash algorithms
 2:  8a4e05f03f =  4:  09e3f4ca8e hex: add functions to parse hex object IDs in any algorithm
 3:  4efa071dd6 =  5:  3df541b2f0 repository: require a build flag to use SHA-256
 4:  0c26727e8c =  6:  1161b70d24 t: use hash-specific lookup tables to define test constants
 5:  3aff50578d =  7:  fd3a85f59f t6300: abstract away SHA-1-specific constants
 6:  2736da7f60 =  8:  fb93b0900b t6300: make hash algorithm independent
 7:  343cd6c9b2 =  9:  1d873fc05a t/helper/test-dump-split-index: initialize git repository
 8:  1c8e31fd41 = 10:  8e918b28d5 t/helper: initialize repository if necessary
 9:  9dade56ee5 = 11:  7ae0d820eb t/helper: make repository tests hash independent
10:  b08bec90e7 = 12:  f13b1465cf setup: allow check_repository_format to read repository format
11:  acfdc2b0c8 ! 13:  273a2c06ce builtin/init-db: allow specifying hash algorithm on command line
    @@ Commit message
         algorithm.  Ensure that if we are writing a non-SHA-1 repository that we
         set the repository version to 1 and write the objectFormat extension.
     
    +    Restrict this option to work only when ENABLE_SHA256 is set until the
    +    codebase is in a situation to fully support this.
    +
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
      ## Documentation/git-init.txt ##
    @@ builtin/init-db.c: static int create_default_files(const char *template_path,
      			exit(1);
      	}
      
    ++#ifndef ENABLE_SHA256
    ++	if (fmt->hash_algo != GIT_HASH_SHA1)
    ++		die(_("The hash algorithm %s is not supported in this build."), hash_algos[fmt->hash_algo].name);
    ++#endif
    ++
     +	if (fmt->hash_algo != GIT_HASH_SHA1)
     +		repo_version = GIT_REPO_VERSION_READ;
     +
12:  adbd01c070 = 14:  6f3c736259 builtin/init-db: add environment variable for new repo hash
13:  21cdc16e15 ! 15:  3966b560b5 init-db: move writing repo version into a function
    @@ builtin/init-db.c: static int needs_work_tree_config(const char *git_dir, const
     +	char repo_version_string[10];
     +	int repo_version = GIT_REPO_VERSION;
     +
    ++#ifndef ENABLE_SHA256
    ++	if (hash_algo != GIT_HASH_SHA1)
    ++		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
    ++#endif
    ++
     +	if (hash_algo != GIT_HASH_SHA1)
     +		repo_version = GIT_REPO_VERSION_READ;
     +
    @@ builtin/init-db.c: static int create_default_files(const char *template_path,
      			exit(1);
      	}
      
    +-#ifndef ENABLE_SHA256
    +-	if (fmt->hash_algo != GIT_HASH_SHA1)
    +-		die(_("The hash algorithm %s is not supported in this build."), hash_algos[fmt->hash_algo].name);
    +-#endif
    +-
     -	if (fmt->hash_algo != GIT_HASH_SHA1)
     -		repo_version = GIT_REPO_VERSION_READ;
     -
14:  45423efc1c = 16:  5af56f4f6a worktree: allow repository version 1
15:  7b99132acd ! 17:  fb51a683f1 commit: use expected signature header for SHA-256
    @@ commit.c: int remove_signature(struct strbuf *buf)
     
      ## sequencer.c ##
     @@ sequencer.c: static int try_to_commit(struct repository *r,
    - 	if (parse_head(r, &current_head))
      		return -1;
    + 
      	if (flags & AMEND_MSG) {
     -		const char *exclude_gpgsig[] = { "gpgsig", NULL };
     +		const char *exclude_gpgsig[] = { "gpgsig", "gpgsig-sha256", NULL };
16:  b4eecdbc28 ! 18:  3afac82c18 gpg-interface: improve interface for parsing tags
    @@ log-tree.c: static int show_one_mergetag(struct commit *commit,
     +	struct strbuf payload = STRBUF_INIT;
     +	struct strbuf signature = STRBUF_INIT;
      
    - 	hash_object_file(extra->value, extra->len, type_name(OBJ_TAG), &oid);
    - 	tag = lookup_tag(the_repository, &oid);
    + 	hash_object_file(the_hash_algo, extra->value, extra->len,
    + 			 type_name(OBJ_TAG), &oid);
     @@ log-tree.c: static int show_one_mergetag(struct commit *commit,
      			    "parent #%d, tagged '%s'\n", nth + 1, tag->tag);
      	gpg_message_offset = verify_message.len;
17:  5d9c8753b1 = 19:  5e6d1ccbb2 tag: store SHA-256 signatures in a header
18:  4a279f679d = 20:  fb15c66ba1 fast-import: permit reading multiple marks files
19:  01ef63b04d = 21:  273468b0ac fast-import: add helper function for inserting mark object entries
20:  2469dc1324 = 22:  bba3cf497d fast-import: make find_marks work on any mark set
21:  0ad1e4d3a8 = 23:  714ae09a8c fast-import: add a generic function to iterate over marks
22:  bc53f8aaec = 24:  5c50871f30 fast-import: add options for rewriting submodules

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v2 01/24] builtin/pack-objects: make hash agnostic
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-23 21:57   ` Jeff King
  2020-02-22 20:17 ` [PATCH v2 02/24] hash: implement and use a context cloning function brian m. carlson
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git; +Cc: Jeff King

Avoid hard-coding a hash size, instead preferring to use the_hash_algo.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/pack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 940fbcb7b3..fceac7b462 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -879,7 +879,7 @@ static void write_reused_pack_one(size_t pos, struct hashfile *out,
 			len = encode_in_pack_object_header(header, sizeof(header),
 							   OBJ_REF_DELTA, size);
 			hashwrite(out, header, len);
-			hashwrite(out, base_sha1, 20);
+			hashwrite(out, base_sha1, the_hash_algo->rawsz);
 			copy_pack_data(out, reuse_packfile, w_curs, cur, next - cur);
 			return;
 		}

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 02/24] hash: implement and use a context cloning function
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 01/24] builtin/pack-objects: make hash agnostic brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 03/24] hex: introduce parsing variants taking hash algorithms brian m. carlson
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

For all of our SHA-1 implementations and most of our SHA-256
implementations, the hash context we use is a real struct.  For these
implementations, it's possible to copy a hash context by making a copy
of the struct.

However, for our libgcrypt implementation, our hash context is a
pointer.  Consequently, copying it does not lead to an independent hash
context like we intended.

Fortunately, however, libgcrypt provides us with a handy function to
copy hash contexts.  Let's add a cloning function to the hash algorithm
API, and use it in the one place we need to make a hash context copy.
With this change, our libgcrypt SHA-256 implementation is fully
functional with all of our other hash implementations.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 csum-file.c     |  2 +-
 hash.h          | 21 +++++++++++++++++++++
 sha1-file.c     | 18 ++++++++++++++++++
 sha256/gcrypt.h |  6 ++++++
 4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/csum-file.c b/csum-file.c
index 53ce37f7ca..0f35fa5ee4 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -157,7 +157,7 @@ void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpo
 {
 	hashflush(f);
 	checkpoint->offset = f->total;
-	checkpoint->ctx = f->ctx;
+	the_hash_algo->clone_fn(&checkpoint->ctx, &f->ctx);
 }
 
 int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint)
diff --git a/hash.h b/hash.h
index 52a4f1a3f4..e0f3f16b06 100644
--- a/hash.h
+++ b/hash.h
@@ -16,6 +16,7 @@
 #endif
 
 #if defined(SHA256_GCRYPT)
+#define SHA256_NEEDS_CLONE_HELPER
 #include "sha256/gcrypt.h"
 #elif defined(SHA256_OPENSSL)
 #include <openssl/sha.h>
@@ -54,12 +55,28 @@
 #define git_SHA256_Update	platform_SHA256_Update
 #define git_SHA256_Final	platform_SHA256_Final
 
+#ifdef platform_SHA256_Clone
+#define git_SHA256_Clone	platform_SHA256_Clone
+#endif
+
 #ifdef SHA1_MAX_BLOCK_SIZE
 #include "compat/sha1-chunked.h"
 #undef git_SHA1_Update
 #define git_SHA1_Update		git_SHA1_Update_Chunked
 #endif
 
+static inline void git_SHA1_Clone(git_SHA_CTX *dst, const git_SHA_CTX *src)
+{
+	memcpy(dst, src, sizeof(*dst));
+}
+
+#ifndef SHA256_NEEDS_CLONE_HELPER
+static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *src)
+{
+	memcpy(dst, src, sizeof(*dst));
+}
+#endif
+
 /*
  * Note that these constants are suitable for indexing the hash_algos array and
  * comparing against each other, but are otherwise arbitrary, so they should not
@@ -85,6 +102,7 @@ union git_hash_ctx {
 typedef union git_hash_ctx git_hash_ctx;
 
 typedef void (*git_hash_init_fn)(git_hash_ctx *ctx);
+typedef void (*git_hash_clone_fn)(git_hash_ctx *dst, const git_hash_ctx *src);
 typedef void (*git_hash_update_fn)(git_hash_ctx *ctx, const void *in, size_t len);
 typedef void (*git_hash_final_fn)(unsigned char *hash, git_hash_ctx *ctx);
 
@@ -110,6 +128,9 @@ struct git_hash_algo {
 	/* The hash initialization function. */
 	git_hash_init_fn init_fn;
 
+	/* The hash context cloning function. */
+	git_hash_clone_fn clone_fn;
+
 	/* The hash update function. */
 	git_hash_update_fn update_fn;
 
diff --git a/sha1-file.c b/sha1-file.c
index d785de8a85..c36ca5a545 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -74,6 +74,11 @@ static void git_hash_sha1_init(git_hash_ctx *ctx)
 	git_SHA1_Init(&ctx->sha1);
 }
 
+static void git_hash_sha1_clone(git_hash_ctx *dst, const git_hash_ctx *src)
+{
+	git_SHA1_Clone(&dst->sha1, &src->sha1);
+}
+
 static void git_hash_sha1_update(git_hash_ctx *ctx, const void *data, size_t len)
 {
 	git_SHA1_Update(&ctx->sha1, data, len);
@@ -90,6 +95,11 @@ static void git_hash_sha256_init(git_hash_ctx *ctx)
 	git_SHA256_Init(&ctx->sha256);
 }
 
+static void git_hash_sha256_clone(git_hash_ctx *dst, const git_hash_ctx *src)
+{
+	git_SHA256_Clone(&dst->sha256, &src->sha256);
+}
+
 static void git_hash_sha256_update(git_hash_ctx *ctx, const void *data, size_t len)
 {
 	git_SHA256_Update(&ctx->sha256, data, len);
@@ -105,6 +115,11 @@ static void git_hash_unknown_init(git_hash_ctx *ctx)
 	BUG("trying to init unknown hash");
 }
 
+static void git_hash_unknown_clone(git_hash_ctx *dst, const git_hash_ctx *src)
+{
+	BUG("trying to clone unknown hash");
+}
+
 static void git_hash_unknown_update(git_hash_ctx *ctx, const void *data, size_t len)
 {
 	BUG("trying to update unknown hash");
@@ -123,6 +138,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		0,
 		0,
 		git_hash_unknown_init,
+		git_hash_unknown_clone,
 		git_hash_unknown_update,
 		git_hash_unknown_final,
 		NULL,
@@ -136,6 +152,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		GIT_SHA1_HEXSZ,
 		GIT_SHA1_BLKSZ,
 		git_hash_sha1_init,
+		git_hash_sha1_clone,
 		git_hash_sha1_update,
 		git_hash_sha1_final,
 		&empty_tree_oid,
@@ -149,6 +166,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		GIT_SHA256_HEXSZ,
 		GIT_SHA256_BLKSZ,
 		git_hash_sha256_init,
+		git_hash_sha256_clone,
 		git_hash_sha256_update,
 		git_hash_sha256_final,
 		&empty_tree_oid_sha256,
diff --git a/sha256/gcrypt.h b/sha256/gcrypt.h
index 09bd8bb200..501da5ed91 100644
--- a/sha256/gcrypt.h
+++ b/sha256/gcrypt.h
@@ -22,8 +22,14 @@ inline void gcrypt_SHA256_Final(unsigned char *digest, gcrypt_SHA256_CTX *ctx)
 	memcpy(digest, gcry_md_read(*ctx, GCRY_MD_SHA256), SHA256_DIGEST_SIZE);
 }
 
+inline void gcrypt_SHA256_Clone(gcrypt_SHA256_CTX *dst, const gcrypt_SHA256_CTX *src)
+{
+	gcry_md_copy(dst, *src);
+}
+
 #define platform_SHA256_CTX gcrypt_SHA256_CTX
 #define platform_SHA256_Init gcrypt_SHA256_Init
+#define platform_SHA256_Clone gcrypt_SHA256_Clone
 #define platform_SHA256_Update gcrypt_SHA256_Update
 #define platform_SHA256_Final gcrypt_SHA256_Final
 

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 03/24] hex: introduce parsing variants taking hash algorithms
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 01/24] builtin/pack-objects: make hash agnostic brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 02/24] hash: implement and use a context cloning function brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 04/24] hex: add functions to parse hex object IDs in any algorithm brian m. carlson
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

Introduce variants of get_oid_hex and parse_oid_hex that parse an
arbitrary hash algorithm, implementing internal functions to avoid
duplication.  These functions can be used in the transport code to parse
refs properly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h |  7 +++++++
 hex.c   | 35 +++++++++++++++++++++++++++--------
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index 37c899b53f..4c9c21a11d 100644
--- a/cache.h
+++ b/cache.h
@@ -1481,6 +1481,9 @@ int set_disambiguate_hint_config(const char *var, const char *value);
 int get_sha1_hex(const char *hex, unsigned char *sha1);
 int get_oid_hex(const char *hex, struct object_id *sha1);
 
+/* Like get_oid_hex, but for an arbitrary hash algorithm. */
+int get_oid_hex_algop(const char *hex, struct object_id *oid, const struct git_hash_algo *algop);
+
 /*
  * Read `len` pairs of hexadecimal digits from `hex` and write the
  * values to `binary` as `len` bytes. Return 0 on success, or -1 if
@@ -1516,6 +1519,10 @@ char *oid_to_hex(const struct object_id *oid);						/* same static buffer */
  */
 int parse_oid_hex(const char *hex, struct object_id *oid, const char **end);
 
+/* Like parse_oid_hex, but for an arbitrary hash algorithm. */
+int parse_oid_hex_algop(const char *hex, struct object_id *oid, const char **end,
+			const struct git_hash_algo *algo);
+
 /*
  * This reads short-hand syntax that not only evaluates to a commit
  * object name, but also can act as if the end user spelled the name
diff --git a/hex.c b/hex.c
index fd7f00c43f..10e24dc2e4 100644
--- a/hex.c
+++ b/hex.c
@@ -47,30 +47,49 @@ int hex_to_bytes(unsigned char *binary, const char *hex, size_t len)
 	return 0;
 }
 
-int get_sha1_hex(const char *hex, unsigned char *sha1)
+static int get_hash_hex_algop(const char *hex, unsigned char *hash,
+			      const struct git_hash_algo *algop)
 {
 	int i;
-	for (i = 0; i < the_hash_algo->rawsz; i++) {
+	for (i = 0; i < algop->rawsz; i++) {
 		int val = hex2chr(hex);
 		if (val < 0)
 			return -1;
-		*sha1++ = val;
+		*hash++ = val;
 		hex += 2;
 	}
 	return 0;
 }
 
+int get_sha1_hex(const char *hex, unsigned char *sha1)
+{
+	return get_hash_hex_algop(hex, sha1, the_hash_algo);
+}
+
+int get_oid_hex_algop(const char *hex, struct object_id *oid,
+		      const struct git_hash_algo *algop)
+{
+	return get_hash_hex_algop(hex, oid->hash, algop);
+}
+
 int get_oid_hex(const char *hex, struct object_id *oid)
 {
-	return get_sha1_hex(hex, oid->hash);
+	return get_oid_hex_algop(hex, oid, the_hash_algo);
+}
+
+int parse_oid_hex_algop(const char *hex, struct object_id *oid,
+			const char **end,
+			const struct git_hash_algo *algop)
+{
+	int ret = get_hash_hex_algop(hex, oid->hash, algop);
+	if (!ret)
+		*end = hex + algop->hexsz;
+	return ret;
 }
 
 int parse_oid_hex(const char *hex, struct object_id *oid, const char **end)
 {
-	int ret = get_oid_hex(hex, oid);
-	if (!ret)
-		*end = hex + the_hash_algo->hexsz;
-	return ret;
+	return parse_oid_hex_algop(hex, oid, end, the_hash_algo);
 }
 
 char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 04/24] hex: add functions to parse hex object IDs in any algorithm
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (2 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 03/24] hex: introduce parsing variants taking hash algorithms brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 05/24] repository: require a build flag to use SHA-256 brian m. carlson
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

There are some places where we need to parse a hex object ID in any
algorithm without knowing beforehand which algorithm is in use. An
example is when parsing fast-import marks.

Add a get_oid_hex_any to parse an object ID and return the algorithm it
belongs to, and additionally add parse_oid_hex_any which is the
equivalent change for parse_oid_hex. If the object is not parseable, we
return GIT_HASH_UNKNOWN.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 10 ++++++++++
 hex.c   | 22 ++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/cache.h b/cache.h
index 4c9c21a11d..158d7ccfd8 100644
--- a/cache.h
+++ b/cache.h
@@ -1523,6 +1523,16 @@ int parse_oid_hex(const char *hex, struct object_id *oid, const char **end);
 int parse_oid_hex_algop(const char *hex, struct object_id *oid, const char **end,
 			const struct git_hash_algo *algo);
 
+
+/*
+ * These functions work like get_oid_hex and parse_oid_hex, but they will parse
+ * a hex value for any algorithm. The algorithm is detected based on the length
+ * and the algorithm in use is returned. If this is not a hex object ID in any
+ * algorithm, returns GIT_HASH_UNKNOWN.
+ */
+int get_oid_hex_any(const char *hex, struct object_id *oid);
+int parse_oid_hex_any(const char *hex, struct object_id *oid, const char **end);
+
 /*
  * This reads short-hand syntax that not only evaluates to a commit
  * object name, but also can act as if the end user spelled the name
diff --git a/hex.c b/hex.c
index 10e24dc2e4..da51e64929 100644
--- a/hex.c
+++ b/hex.c
@@ -72,6 +72,20 @@ int get_oid_hex_algop(const char *hex, struct object_id *oid,
 	return get_hash_hex_algop(hex, oid->hash, algop);
 }
 
+/*
+ * NOTE: This function relies on hash algorithms being in order from shortest
+ * length to longest length.
+ */
+int get_oid_hex_any(const char *hex, struct object_id *oid)
+{
+	int i;
+	for (i = GIT_HASH_NALGOS - 1; i > 0; i--) {
+		if (!get_hash_hex_algop(hex, oid->hash, &hash_algos[i]))
+			return i;
+	}
+	return GIT_HASH_UNKNOWN;
+}
+
 int get_oid_hex(const char *hex, struct object_id *oid)
 {
 	return get_oid_hex_algop(hex, oid, the_hash_algo);
@@ -87,6 +101,14 @@ int parse_oid_hex_algop(const char *hex, struct object_id *oid,
 	return ret;
 }
 
+int parse_oid_hex_any(const char *hex, struct object_id *oid, const char **end)
+{
+	int ret = get_oid_hex_any(hex, oid);
+	if (ret)
+		*end = hex + hash_algos[ret].hexsz;
+	return ret;
+}
+
 int parse_oid_hex(const char *hex, struct object_id *oid, const char **end)
 {
 	return parse_oid_hex_algop(hex, oid, end, the_hash_algo);

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 05/24] repository: require a build flag to use SHA-256
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (3 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 04/24] hex: add functions to parse hex object IDs in any algorithm brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 06/24] t: use hash-specific lookup tables to define test constants brian m. carlson
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

At this point, SHA-256 support is experimental and some behavior may
change. To avoid surprising unsuspecting users, require a build flag,
ENABLE_SHA256, to allow use of a non-SHA-1 algorithm. Enable this flag
by default when the DEVELOPER make flag is set so that contributors can
test this case adequately.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 config.mak.dev | 2 ++
 repository.c   | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/config.mak.dev b/config.mak.dev
index bf1f3fcdee..2e19435915 100644
--- a/config.mak.dev
+++ b/config.mak.dev
@@ -17,6 +17,8 @@ DEVELOPER_CFLAGS += -Wstrict-prototypes
 DEVELOPER_CFLAGS += -Wunused
 DEVELOPER_CFLAGS += -Wvla
 
+DEVELOPER_CFLAGS += -DENABLE_SHA256
+
 ifndef COMPILER_FEATURES
 COMPILER_FEATURES := $(shell ./detect-compiler $(CC))
 endif
diff --git a/repository.c b/repository.c
index a4174ddb06..6f7f6f002b 100644
--- a/repository.c
+++ b/repository.c
@@ -89,6 +89,10 @@ void repo_set_gitdir(struct repository *repo,
 void repo_set_hash_algo(struct repository *repo, int hash_algo)
 {
 	repo->hash_algo = &hash_algos[hash_algo];
+#ifndef ENABLE_SHA256
+	if (hash_algo != GIT_HASH_SHA1)
+		die(_("The hash algorithm %s is not supported in this build."), repo->hash_algo->name);
+#endif
 }
 
 /*

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 06/24] t: use hash-specific lookup tables to define test constants
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (4 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 05/24] repository: require a build flag to use SHA-256 brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 07/24] t6300: abstract away SHA-1-specific constants brian m. carlson
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

In the future, we'll allow developers to run the testsuite with a hash
algorithm of their choice.  To make this easier, compute the fixed
constants using test_oid. Move the constant initialization down below
the point where test-lib-functions.sh is loaded so the functions are
defined.

Note that we don't provide a value for the OID_REGEX value directly
because writing a large number of instances of "[0-9a-f]" in the
oid-info files is unwieldy and there isn't a way to compute it based on
those values. Instead, compute it based on ZERO_OID.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/test-lib.sh | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 0ea1e5a05e..9fe390bd5a 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -494,21 +494,6 @@ case $(echo $GIT_TRACE |tr "[A-Z]" "[a-z]") in
 	;;
 esac
 
-# Convenience
-#
-# A regexp to match 5, 35 and 40 hexdigits
-_x05='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
-_x35="$_x05$_x05$_x05$_x05$_x05$_x05$_x05"
-_x40="$_x35$_x05"
-
-# Zero SHA-1
-_z40=0000000000000000000000000000000000000000
-
-OID_REGEX="$_x40"
-ZERO_OID=$_z40
-EMPTY_TREE=4b825dc642cb6eb9a060e54bf8d69288fbee4904
-EMPTY_BLOB=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
-
 # Line feed
 LF='
 '
@@ -1383,6 +1368,20 @@ then
 	fi
 fi
 
+# Convenience
+# A regexp to match 5, 35 and 40 hexdigits
+_x05='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
+_x35="$_x05$_x05$_x05$_x05$_x05$_x05$_x05"
+_x40="$_x35$_x05"
+
+test_oid_init
+
+ZERO_OID=$(test_oid zero)
+OID_REGEX=$(echo $ZERO_OID | sed -e 's/0/[0-9a-f]/g')
+EMPTY_TREE=$(test_oid empty_tree)
+EMPTY_BLOB=$(test_oid empty_blob)
+_z40=$ZERO_OID
+
 # Provide an implementation of the 'yes' utility; the upper bound
 # limit is there to help Windows that cannot stop this loop from
 # wasting cycles when the downstream stops reading, so do not be

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 07/24] t6300: abstract away SHA-1-specific constants
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (5 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 06/24] t: use hash-specific lookup tables to define test constants brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-24 18:01   ` Junio C Hamano
  2020-02-22 20:17 ` [PATCH v2 08/24] t6300: make hash algorithm independent brian m. carlson
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

Adjust the test so that it computes variables for object IDs instead of
using hard-coded hashes.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t6300-for-each-ref.sh | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 9c910ce746..2406b93d35 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -20,6 +20,10 @@ setdate_and_increment () {
 }
 
 test_expect_success setup '
+	test_oid_cache <<-EOF &&
+	disklen sha1:138
+	disklen sha256:154
+	EOF
 	setdate_and_increment &&
 	echo "Using $datestamp" > one &&
 	git add one &&
@@ -50,6 +54,9 @@ test_atom() {
 	"
 }
 
+hexlen=$(test_oid hexsz)
+disklen=$(test_oid disklen)
+
 test_atom head refname refs/heads/master
 test_atom head refname: refs/heads/master
 test_atom head refname:short master
@@ -82,9 +89,9 @@ test_atom head push:rstrip=-1 refs
 test_atom head push:strip=1 remotes/myfork/master
 test_atom head push:strip=-1 master
 test_atom head objecttype commit
-test_atom head objectsize 171
-test_atom head objectsize:disk 138
-test_atom head deltabase 0000000000000000000000000000000000000000
+test_atom head objectsize $((131 + hexlen))
+test_atom head objectsize:disk $disklen
+test_atom head deltabase $ZERO_OID
 test_atom head objectname $(git rev-parse refs/heads/master)
 test_atom head objectname:short $(git rev-parse --short refs/heads/master)
 test_atom head objectname:short=1 $(git rev-parse --short=1 refs/heads/master)
@@ -125,11 +132,11 @@ test_atom tag refname:short testtag
 test_atom tag upstream ''
 test_atom tag push ''
 test_atom tag objecttype tag
-test_atom tag objectsize 154
-test_atom tag objectsize:disk 138
-test_atom tag '*objectsize:disk' 138
-test_atom tag deltabase 0000000000000000000000000000000000000000
-test_atom tag '*deltabase' 0000000000000000000000000000000000000000
+test_atom tag objectsize $((114 + hexlen))
+test_atom tag objectsize:disk $disklen
+test_atom tag '*objectsize:disk' $disklen
+test_atom tag deltabase $ZERO_OID
+test_atom tag '*deltabase' $ZERO_OID
 test_atom tag objectname $(git rev-parse refs/tags/testtag)
 test_atom tag objectname:short $(git rev-parse --short refs/tags/testtag)
 test_atom head objectname:short=1 $(git rev-parse --short=1 refs/heads/master)
@@ -139,7 +146,7 @@ test_atom tag parent ''
 test_atom tag numparent ''
 test_atom tag object $(git rev-parse refs/tags/testtag^0)
 test_atom tag type 'commit'
-test_atom tag '*objectname' 'ea122842f48be4afb2d1fc6a4b96c05885ab7463'
+test_atom tag '*objectname' $(git rev-parse refs/tags/testtag^{})
 test_atom tag '*objecttype' 'commit'
 test_atom tag author ''
 test_atom tag authorname ''

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 08/24] t6300: make hash algorithm independent
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (6 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 07/24] t6300: abstract away SHA-1-specific constants brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 09/24] t/helper/test-dump-split-index: initialize git repository brian m. carlson
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

One of the git for-each-ref tests asks to sort by object ID.  However,
when sorted, the order of the refs differs between SHA-1 and SHA-256.
Sort the expected output so that the test passes.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t6300-for-each-ref.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 2406b93d35..b3c1092338 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -650,7 +650,7 @@ test_atom refs/tags/signed-long contents "subject line
 body contents
 $sig"
 
-cat >expected <<EOF
+sort >expected <<EOF
 $(git rev-parse refs/tags/bogo) <committer@example.com> refs/tags/bogo
 $(git rev-parse refs/tags/master) <committer@example.com> refs/tags/master
 EOF

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 09/24] t/helper/test-dump-split-index: initialize git repository
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (7 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 08/24] t6300: make hash algorithm independent brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 10/24] t/helper: initialize repository if necessary brian m. carlson
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

In this test helper, we read the index.  In order to have the proper
hash algorithm set up, we must call setup_git_directory.  Do so, so that
the test works when extensions.objectFormat is set.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/helper/test-dump-split-index.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/helper/test-dump-split-index.c b/t/helper/test-dump-split-index.c
index 63c689d6ee..a209880eb3 100644
--- a/t/helper/test-dump-split-index.c
+++ b/t/helper/test-dump-split-index.c
@@ -13,6 +13,8 @@ int cmd__dump_split_index(int ac, const char **av)
 	struct split_index *si;
 	int i;
 
+	setup_git_directory();
+
 	do_read_index(&the_index, av[1], 1);
 	printf("own %s\n", oid_to_hex(&the_index.oid));
 	si = the_index.split_index;

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 10/24] t/helper: initialize repository if necessary
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (8 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 09/24] t/helper/test-dump-split-index: initialize git repository brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-24 18:05   ` Junio C Hamano
  2020-02-22 20:17 ` [PATCH v2 11/24] t/helper: make repository tests hash independent brian m. carlson
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

The repository helper is used in t5318 to read commit graphs whether
we're in a repository or not. However, without a repository, we have no
way to properly initialize the hash algorithm, meaning that the file is
misread.

Fix this by calling setup_git_directory_gently, which will read the
environment variable the testsuite sets to ensure that the correct hash
algorithm is set.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/helper/test-repository.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c
index f7f8618445..ecc768e4cb 100644
--- a/t/helper/test-repository.c
+++ b/t/helper/test-repository.c
@@ -75,6 +75,10 @@ static void test_get_commit_tree_in_graph(const char *gitdir,
 
 int cmd__repository(int argc, const char **argv)
 {
+	int nongit_ok = 0;
+
+	setup_git_directory_gently(&nongit_ok);
+
 	if (argc < 2)
 		die("must have at least 2 arguments");
 	if (!strcmp(argv[1], "parse_commit_in_graph")) {

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 11/24] t/helper: make repository tests hash independent
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (9 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 10/24] t/helper: initialize repository if necessary brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 12/24] setup: allow check_repository_format to read repository format brian m. carlson
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

This test currently hard-codes the hash algorithm as SHA-1 when calling
repo_set_hash_algo so that the_hash_algo is properly initialized.
However, this does not work with SHA-256 repositories. Read the
repository value that repo_init has read into the local repository
variable and set the algorithm based on that value.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/helper/test-repository.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c
index ecc768e4cb..56f0e3c1be 100644
--- a/t/helper/test-repository.c
+++ b/t/helper/test-repository.c
@@ -19,12 +19,11 @@ static void test_parse_commit_in_graph(const char *gitdir, const char *worktree,
 
 	memset(the_repository, 0, sizeof(*the_repository));
 
-	/* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */
-	repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
-
 	if (repo_init(&r, gitdir, worktree))
 		die("Couldn't init repo");
 
+	repo_set_hash_algo(the_repository, hash_algo_by_ptr(r.hash_algo));
+
 	c = lookup_commit(&r, commit_oid);
 
 	if (!parse_commit_in_graph(&r, c))
@@ -50,12 +49,11 @@ static void test_get_commit_tree_in_graph(const char *gitdir,
 
 	memset(the_repository, 0, sizeof(*the_repository));
 
-	/* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */
-	repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
-
 	if (repo_init(&r, gitdir, worktree))
 		die("Couldn't init repo");
 
+	repo_set_hash_algo(the_repository, hash_algo_by_ptr(r.hash_algo));
+
 	c = lookup_commit(&r, commit_oid);
 
 	/*

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 12/24] setup: allow check_repository_format to read repository format
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (10 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 11/24] t/helper: make repository tests hash independent brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 13/24] builtin/init-db: allow specifying hash algorithm on command line brian m. carlson
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

In some cases, we will want to not only check the repository format, but
extract the information that we've gained.  To do so, allow
check_repository_format to take a pointer to struct repository_format.
Allow passing NULL for this argument if we're not interested in the
information, and pass NULL for all existing callers.  A future patch
will make use of this information.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/init-db.c | 2 +-
 cache.h           | 4 +++-
 path.c            | 2 +-
 setup.c           | 6 ++++--
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 944ec77fe1..b11f07064d 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -378,7 +378,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * config file, so this will not fail.  What we are catching
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
-	check_repository_format();
+	check_repository_format(NULL);
 
 	reinit = create_default_files(template_dir, original_git_dir);
 
diff --git a/cache.h b/cache.h
index 158d7ccfd8..29ee02a8d4 100644
--- a/cache.h
+++ b/cache.h
@@ -1086,8 +1086,10 @@ int verify_repository_format(const struct repository_format *format,
  * and die if it is a version we don't understand. Generally one would
  * set_git_dir() before calling this, and use it only for "are we in a valid
  * repo?".
+ *
+ * If successful and fmt is not NULL, fill fmt with data.
  */
-void check_repository_format(void);
+void check_repository_format(struct repository_format *fmt);
 
 #define MTIME_CHANGED	0x0001
 #define CTIME_CHANGED	0x0002
diff --git a/path.c b/path.c
index 88cf593007..a10b62c0c4 100644
--- a/path.c
+++ b/path.c
@@ -851,7 +851,7 @@ const char *enter_repo(const char *path, int strict)
 
 	if (is_git_directory(".")) {
 		set_git_dir(".");
-		check_repository_format();
+		check_repository_format(NULL);
 		return path;
 	}
 
diff --git a/setup.c b/setup.c
index 4ea7a0b081..af20c3d7c0 100644
--- a/setup.c
+++ b/setup.c
@@ -1253,10 +1253,12 @@ int git_config_perm(const char *var, const char *value)
 	return -(i & 0666);
 }
 
-void check_repository_format(void)
+void check_repository_format(struct repository_format *fmt)
 {
 	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
-	check_repository_format_gently(get_git_dir(), &repo_fmt, NULL);
+	if (!fmt)
+		fmt = &repo_fmt;
+	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	clear_repository_format(&repo_fmt);
 }

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 13/24] builtin/init-db: allow specifying hash algorithm on command line
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (11 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 12/24] setup: allow check_repository_format to read repository format brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-24 18:14   ` Junio C Hamano
  2020-02-22 20:17 ` [PATCH v2 14/24] builtin/init-db: add environment variable for new repo hash brian m. carlson
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

Allow the user to specify the hash algorithm on the command line by
using the --object-format option to git init.  Validate that the user is
not attempting to reinitialize a repository with a different hash
algorithm.  Ensure that if we are writing a non-SHA-1 repository that we
set the repository version to 1 and write the objectFormat extension.

Restrict this option to work only when ENABLE_SHA256 is set until the
codebase is in a situation to fully support this.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/git-init.txt |  7 ++++-
 builtin/clone.c            |  2 +-
 builtin/init-db.c          | 52 +++++++++++++++++++++++++++++++++-----
 cache.h                    |  3 ++-
 4 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-init.txt b/Documentation/git-init.txt
index 32880aafb0..adc6adfd38 100644
--- a/Documentation/git-init.txt
+++ b/Documentation/git-init.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git init' [-q | --quiet] [--bare] [--template=<template_directory>]
-	  [--separate-git-dir <git dir>]
+	  [--separate-git-dir <git dir>] [--object-format=<format]
 	  [--shared[=<permissions>]] [directory]
 
 
@@ -48,6 +48,11 @@ Only print error and warning messages; all other output will be suppressed.
 Create a bare repository. If `GIT_DIR` environment is not set, it is set to the
 current working directory.
 
+--object-format=<format>::
+
+Specify the given object format (hash algorithm) for the repository.  The valid
+values are 'sha1' and (if enabled) 'sha256'.  'sha1' is the default.
+
 --template=<template_directory>::
 
 Specify the directory from which templates will be used.  (See the "TEMPLATE
diff --git a/builtin/clone.c b/builtin/clone.c
index 4f6150c55c..961996a110 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1097,7 +1097,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index b11f07064d..d05552f0ae 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -177,7 +177,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 }
 
 static int create_default_files(const char *template_path,
-				const char *original_git_dir)
+				const char *original_git_dir,
+				const struct repository_format *fmt)
 {
 	struct stat st1;
 	struct strbuf buf = STRBUF_INIT;
@@ -187,6 +188,7 @@ static int create_default_files(const char *template_path,
 	int reinit;
 	int filemode;
 	struct strbuf err = STRBUF_INIT;
+	int repo_version = GIT_REPO_VERSION;
 
 	/* Just look for `init.templatedir` */
 	init_db_template_dir = NULL; /* re-set in case it was set before */
@@ -244,11 +246,23 @@ static int create_default_files(const char *template_path,
 			exit(1);
 	}
 
+#ifndef ENABLE_SHA256
+	if (fmt->hash_algo != GIT_HASH_SHA1)
+		die(_("The hash algorithm %s is not supported in this build."), hash_algos[fmt->hash_algo].name);
+#endif
+
+	if (fmt->hash_algo != GIT_HASH_SHA1)
+		repo_version = GIT_REPO_VERSION_READ;
+
 	/* This forces creation of new config file */
 	xsnprintf(repo_version_string, sizeof(repo_version_string),
-		  "%d", GIT_REPO_VERSION);
+		  "%d", repo_version);
 	git_config_set("core.repositoryformatversion", repo_version_string);
 
+	if (fmt->hash_algo != GIT_HASH_SHA1)
+		git_config_set("extensions.objectformat",
+			       hash_algos[fmt->hash_algo].name);
+
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
 	filemode = TEST_FILEMODE;
@@ -340,12 +354,26 @@ static void separate_git_dir(const char *git_dir, const char *git_link)
 	write_file(git_link, "gitdir: %s", git_dir);
 }
 
+static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash)
+{
+	/*
+	 * If we already have an initialized repo, don't allow the user to
+	 * specify a different algorithm, as that could cause corruption.
+	 * Otherwise, if the user has specified one on the command line, use it.
+	 */
+	if (repo_fmt->version >= 0 && hash != GIT_HASH_UNKNOWN && hash != repo_fmt->hash_algo)
+		die(_("attempt to reinitialize repository with different hash"));
+	else if (hash != GIT_HASH_UNKNOWN)
+		repo_fmt->hash_algo = hash;
+}
+
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, unsigned int flags)
+	    const char *template_dir, int hash, unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
 	char *original_git_dir = real_pathdup(git_dir, 1);
+	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
 
 	if (real_git_dir) {
 		struct stat st;
@@ -378,9 +406,11 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * config file, so this will not fail.  What we are catching
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
-	check_repository_format(NULL);
+	check_repository_format(&repo_fmt);
 
-	reinit = create_default_files(template_dir, original_git_dir);
+	validate_hash_algorithm(&repo_fmt, hash);
+
+	reinit = create_default_files(template_dir, original_git_dir, &repo_fmt);
 
 	create_object_directory();
 
@@ -482,6 +512,8 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *work_tree;
 	const char *template_dir = NULL;
 	unsigned int flags = 0;
+	const char *object_format = NULL;
+	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
 		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
 				N_("directory from which templates will be used")),
@@ -494,6 +526,8 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
+		OPT_STRING(0, "object-format", &object_format, N_("hash"),
+			   N_("specify the hash algorithm to use")),
 		OPT_END()
 	};
 
@@ -546,6 +580,12 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 		free(cwd);
 	}
 
+	if (object_format) {
+		hash_algo = hash_algo_by_name(object_format);
+		if (hash_algo == GIT_HASH_UNKNOWN)
+			die(_("unknown hash algorithm '%s'"), object_format);
+	}
+
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
 
@@ -597,5 +637,5 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
 }
diff --git a/cache.h b/cache.h
index 29ee02a8d4..7a47e023ba 100644
--- a/cache.h
+++ b/cache.h
@@ -627,7 +627,8 @@ int path_inside_repo(const char *prefix, const char *path);
 #define INIT_DB_EXIST_OK 0x0002
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, unsigned int flags);
+	    const char *template_dir, int hash_algo,
+	    unsigned int flags);
 
 void sanitize_stdfds(void);
 int daemonize(void);

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 14/24] builtin/init-db: add environment variable for new repo hash
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (12 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 13/24] builtin/init-db: allow specifying hash algorithm on command line brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 15/24] init-db: move writing repo version into a function brian m. carlson
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

For the foreseeable future, SHA-1 will be the default algorithm for Git.
However, when running the testsuite, we want to be able to test an
arbitrary algorithm. It would be quite burdensome and very untidy to
have to specify the algorithm we'd like to test every time we
initialized a new repository somewhere in the testsuite, so add an
environment variable to allow us to specify the default hash algorithm
for Git.

This has the benefit that we can set it once for the entire testsuite
and not have to think about it. In the future, users can also use it to
set the default for their repositories if they would like to do so.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/git.txt | 6 ++++++
 builtin/init-db.c     | 9 +++++++++
 2 files changed, 15 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index b0672bd806..9d6769e95a 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -493,6 +493,12 @@ double-quotes and respecting backslash escapes. E.g., the value
 	details. This variable has lower precedence than other path
 	variables such as GIT_INDEX_FILE, GIT_OBJECT_DIRECTORY...
 
+`GIT_DEFAULT_HASH_ALGORITHM`::
+	If this variable is set, the default hash algorithm for new
+	repositories will be set to this value. This value is currently
+	ignored when cloning; the setting of the remote repository
+	is used instead. The default is "sha1".
+
 Git Commits
 ~~~~~~~~~~~
 `GIT_AUTHOR_NAME`::
diff --git a/builtin/init-db.c b/builtin/init-db.c
index d05552f0ae..ab4fd682ab 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -20,6 +20,8 @@
 #define TEST_FILEMODE 1
 #endif
 
+#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH"
+
 static int init_is_bare_repository = 0;
 static int init_shared_repository = -1;
 static const char *init_db_template_dir;
@@ -356,6 +358,7 @@ static void separate_git_dir(const char *git_dir, const char *git_link)
 
 static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash)
 {
+	const char *env = getenv(GIT_DEFAULT_HASH_ENVIRONMENT);
 	/*
 	 * If we already have an initialized repo, don't allow the user to
 	 * specify a different algorithm, as that could cause corruption.
@@ -365,6 +368,12 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 		die(_("attempt to reinitialize repository with different hash"));
 	else if (hash != GIT_HASH_UNKNOWN)
 		repo_fmt->hash_algo = hash;
+	else if (env) {
+		int env_algo = hash_algo_by_name(env);
+		if (env_algo == GIT_HASH_UNKNOWN)
+			die(_("unknown hash algorithm '%s'"), env);
+		repo_fmt->hash_algo = env_algo;
+	}
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 15/24] init-db: move writing repo version into a function
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (13 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 14/24] builtin/init-db: add environment variable for new repo hash brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 16/24] worktree: allow repository version 1 brian m. carlson
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

When we perform a clone, we won't know the remote side's hash algorithm
until we've read the heads.  Consequently, we'll need to rewrite the
repository format version and hash algorithm once we know what the
remote side has.  Move the code that does this into its own function so
that we can call it from clone in the future.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/init-db.c | 42 ++++++++++++++++++++++++------------------
 cache.h           |  1 +
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index ab4fd682ab..5d96e59885 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,6 +178,29 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
+void initialize_repository_version(int hash_algo)
+{
+	char repo_version_string[10];
+	int repo_version = GIT_REPO_VERSION;
+
+#ifndef ENABLE_SHA256
+	if (hash_algo != GIT_HASH_SHA1)
+		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
+#endif
+
+	if (hash_algo != GIT_HASH_SHA1)
+		repo_version = GIT_REPO_VERSION_READ;
+
+	/* This forces creation of new config file */
+	xsnprintf(repo_version_string, sizeof(repo_version_string),
+		  "%d", repo_version);
+	git_config_set("core.repositoryformatversion", repo_version_string);
+
+	if (hash_algo != GIT_HASH_SHA1)
+		git_config_set("extensions.objectformat",
+			       hash_algos[hash_algo].name);
+}
+
 static int create_default_files(const char *template_path,
 				const char *original_git_dir,
 				const struct repository_format *fmt)
@@ -185,12 +208,10 @@ static int create_default_files(const char *template_path,
 	struct stat st1;
 	struct strbuf buf = STRBUF_INIT;
 	char *path;
-	char repo_version_string[10];
 	char junk[2];
 	int reinit;
 	int filemode;
 	struct strbuf err = STRBUF_INIT;
-	int repo_version = GIT_REPO_VERSION;
 
 	/* Just look for `init.templatedir` */
 	init_db_template_dir = NULL; /* re-set in case it was set before */
@@ -248,22 +269,7 @@ static int create_default_files(const char *template_path,
 			exit(1);
 	}
 
-#ifndef ENABLE_SHA256
-	if (fmt->hash_algo != GIT_HASH_SHA1)
-		die(_("The hash algorithm %s is not supported in this build."), hash_algos[fmt->hash_algo].name);
-#endif
-
-	if (fmt->hash_algo != GIT_HASH_SHA1)
-		repo_version = GIT_REPO_VERSION_READ;
-
-	/* This forces creation of new config file */
-	xsnprintf(repo_version_string, sizeof(repo_version_string),
-		  "%d", repo_version);
-	git_config_set("core.repositoryformatversion", repo_version_string);
-
-	if (fmt->hash_algo != GIT_HASH_SHA1)
-		git_config_set("extensions.objectformat",
-			       hash_algos[fmt->hash_algo].name);
+	initialize_repository_version(fmt->hash_algo);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
diff --git a/cache.h b/cache.h
index 7a47e023ba..0653318d2b 100644
--- a/cache.h
+++ b/cache.h
@@ -629,6 +629,7 @@ int path_inside_repo(const char *prefix, const char *path);
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
 	    unsigned int flags);
+void initialize_repository_version(int hash_algo);
 
 void sanitize_stdfds(void);
 int daemonize(void);

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 16/24] worktree: allow repository version 1
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (14 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 15/24] init-db: move writing repo version into a function brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 17/24] commit: use expected signature header for SHA-256 brian m. carlson
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

Git supports both repository versions 0 and 1.  These formats are
identical except for the presence of extensions.  When using an
extension, such as for a different hash algorithm, a check for only
version 0 causes the check to fail.  Instead, call
verify_repository_format to verify that we have an appropriate version
and no unknown extensions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 worktree.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/worktree.c b/worktree.c
index 5b4793caa3..d1d23aadb4 100644
--- a/worktree.c
+++ b/worktree.c
@@ -449,7 +449,7 @@ const struct worktree *find_shared_symref(const char *symref,
 int submodule_uses_worktrees(const char *path)
 {
 	char *submodule_gitdir;
-	struct strbuf sb = STRBUF_INIT;
+	struct strbuf sb = STRBUF_INIT, err = STRBUF_INIT;
 	DIR *dir;
 	struct dirent *d;
 	int ret = 0;
@@ -463,18 +463,16 @@ int submodule_uses_worktrees(const char *path)
 	get_common_dir_noenv(&sb, submodule_gitdir);
 	free(submodule_gitdir);
 
-	/*
-	 * The check below is only known to be good for repository format
-	 * version 0 at the time of writing this code.
-	 */
 	strbuf_addstr(&sb, "/config");
 	read_repository_format(&format, sb.buf);
-	if (format.version != 0) {
+	if (verify_repository_format(&format, &err)) {
+		strbuf_release(&err);
 		strbuf_release(&sb);
 		clear_repository_format(&format);
 		return 1;
 	}
 	clear_repository_format(&format);
+	strbuf_release(&err);
 
 	/* Replace config by worktrees. */
 	strbuf_setlen(&sb, sb.len - strlen("config"));

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 17/24] commit: use expected signature header for SHA-256
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (15 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 16/24] worktree: allow repository version 1 brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-24 18:24   ` Junio C Hamano
  2020-02-22 20:17 ` [PATCH v2 18/24] gpg-interface: improve interface for parsing tags brian m. carlson
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

The transition plan anticipates that we will allow signatures using
multiple algorithms in a single commit. In order to do so, we need to
use a different header per algorithm so that it will be obvious over
which data to compute the signature.

The transition plan specifies that we should use "gpgsig-sha256", so
wire up the commit code such that it can write and parse the current
algorithm, and it can remove the headers for any algorithm when creating
a new commit. Add tests to ensure that we write using the right header
and that git fsck doesn't reject these commits.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/commit.c         |  2 +-
 commit.c                 | 30 +++++++++++++++++++++++-------
 sequencer.c              |  2 +-
 t/t1450-fsck.sh          | 24 ++++++++++++++++++++++++
 t/t7510-signed-commit.sh | 16 +++++++++++++---
 5 files changed, 62 insertions(+), 12 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 7ba33a3bec..798d362a2e 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1659,7 +1659,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	}
 
 	if (amend) {
-		const char *exclude_gpgsig[2] = { "gpgsig", NULL };
+		const char *exclude_gpgsig[3] = { "gpgsig", "gpgsig-sha256", NULL };
 		extra = read_commit_extra_headers(current_head, exclude_gpgsig);
 	} else {
 		struct commit_extra_header **tail = &extra;
diff --git a/commit.c b/commit.c
index a6cfa41a4e..534e14f22a 100644
--- a/commit.c
+++ b/commit.c
@@ -961,14 +961,22 @@ struct commit *get_fork_point(const char *refname, struct commit *commit)
 	return ret;
 }
 
-static const char gpg_sig_header[] = "gpgsig";
-static const int gpg_sig_header_len = sizeof(gpg_sig_header) - 1;
+/*
+ * Indexed by hash algorithm identifier.
+ */
+static const char *gpg_sig_headers[] = {
+	NULL,
+	"gpgsig",
+	"gpgsig-sha256",
+};
 
 static int do_sign_commit(struct strbuf *buf, const char *keyid)
 {
 	struct strbuf sig = STRBUF_INIT;
 	int inspos, copypos;
 	const char *eoh;
+	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(the_hash_algo)];
+	int gpg_sig_header_len = strlen(gpg_sig_header);
 
 	/* find the end of the header */
 	eoh = strstr(buf->buf, "\n\n");
@@ -1010,6 +1018,8 @@ int parse_signed_commit(const struct commit *commit,
 	const char *buffer = get_commit_buffer(commit, &size);
 	int in_signature, saw_signature = -1;
 	const char *line, *tail;
+	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(the_hash_algo)];
+	int gpg_sig_header_len = strlen(gpg_sig_header);
 
 	line = buffer;
 	tail = buffer + size;
@@ -1056,11 +1066,17 @@ int remove_signature(struct strbuf *buf)
 
 		if (in_signature && line[0] == ' ')
 			sig_end = next;
-		else if (starts_with(line, gpg_sig_header) &&
-			 line[gpg_sig_header_len] == ' ') {
-			sig_start = line;
-			sig_end = next;
-			in_signature = 1;
+		else if (starts_with(line, "gpgsig")) {
+			int i;
+			for (i = 1; i < GIT_HASH_NALGOS; i++) {
+				const char *p;
+				if (skip_prefix(line, gpg_sig_headers[i], &p) &&
+				    *p == ' ') {
+					sig_start = line;
+					sig_end = next;
+					in_signature = 1;
+				}
+			}
 		} else {
 			if (*line == '\n')
 				/* dump the whole remainder of the buffer */
diff --git a/sequencer.c b/sequencer.c
index ba90a513b9..cbd920afb2 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1321,7 +1321,7 @@ static int try_to_commit(struct repository *r,
 		return -1;
 
 	if (flags & AMEND_MSG) {
-		const char *exclude_gpgsig[] = { "gpgsig", NULL };
+		const char *exclude_gpgsig[] = { "gpgsig", "gpgsig-sha256", NULL };
 		const char *out_enc = get_commit_output_encoding();
 		const char *message = logmsg_reencode(current_head, NULL,
 						      out_enc);
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 02478bc4ec..70a8307154 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -133,6 +133,30 @@ test_expect_success 'other worktree HEAD link pointing at a funny place' '
 	test_i18ngrep "worktrees/other/HEAD points to something strange" out
 '
 
+test_expect_success 'commit with multiple signatures is okay' '
+	git cat-file commit HEAD >basis &&
+	cat >sigs <<-EOF &&
+	gpgsig -----BEGIN PGP SIGNATURE-----
+	  VGhpcyBpcyBub3QgcmVhbGx5IGEgc2lnbmF0dXJlLg==
+	  -----END PGP SIGNATURE-----
+	gpgsig-sha256 -----BEGIN PGP SIGNATURE-----
+	  VGhpcyBpcyBub3QgcmVhbGx5IGEgc2lnbmF0dXJlLg==
+	  -----END PGP SIGNATURE-----
+	EOF
+	sed -e "/^committer/q" basis >okay &&
+	cat sigs >>okay &&
+	echo >>okay &&
+	sed -e "1,/^$/d" basis >>okay &&
+	cat okay &&
+	new=$(git hash-object -t commit -w --stdin <okay) &&
+	test_when_finished "remove_object $new" &&
+	git update-ref refs/heads/bogus "$new" &&
+	test_when_finished "git update-ref -d refs/heads/bogus" &&
+	git fsck 2>out &&
+	cat out &&
+	! grep "commit $new" out
+'
+
 test_expect_success 'email without @ is okay' '
 	git cat-file commit HEAD >basis &&
 	sed "s/@/AT/" basis >okay &&
diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
index 0c06d22a00..6baaa1ad91 100755
--- a/t/t7510-signed-commit.sh
+++ b/t/t7510-signed-commit.sh
@@ -6,6 +6,11 @@ GNUPGHOME_NOT_USED=$GNUPGHOME
 . "$TEST_DIRECTORY/lib-gpg.sh"
 
 test_expect_success GPG 'create signed commits' '
+	test_oid_cache <<-\EOF &&
+	header sha1:gpgsig
+	header sha256:gpgsig-sha256
+	EOF
+
 	test_when_finished "test_unconfig commit.gpgsign" &&
 
 	echo 1 >file && git add file &&
@@ -155,6 +160,11 @@ test_expect_success GPG 'verify signatures with --raw' '
 	)
 '
 
+test_expect_success GPG 'proper header is used for hash algorithm' '
+	git cat-file commit fourth-signed >output &&
+	grep "^$(test_oid header) -----BEGIN PGP SIGNATURE-----" output
+'
+
 test_expect_success GPG 'show signed commit with signature' '
 	git show -s initial >commit &&
 	git show -s --show-signature initial >show &&
@@ -162,7 +172,7 @@ test_expect_success GPG 'show signed commit with signature' '
 	git cat-file commit initial >cat &&
 	grep -v -e "gpg: " -e "Warning: " show >show.commit &&
 	grep -e "gpg: " -e "Warning: " show >show.gpg &&
-	grep -v "^ " cat | grep -v "^gpgsig " >cat.commit &&
+	grep -v "^ " cat | grep -v "^$(test_oid header) " >cat.commit &&
 	test_cmp show.commit commit &&
 	test_cmp show.gpg verify.2 &&
 	test_cmp cat.commit verify.1
@@ -299,10 +309,10 @@ test_expect_success GPG 'check config gpg.format values' '
 test_expect_success GPG 'detect fudged commit with double signature' '
 	sed -e "/gpgsig/,/END PGP/d" forged1 >double-base &&
 	sed -n -e "/gpgsig/,/END PGP/p" forged1 | \
-		sed -e "s/^gpgsig//;s/^ //" | gpg --dearmor >double-sig1.sig &&
+		sed -e "s/^$(test_oid header)//;s/^ //" | gpg --dearmor >double-sig1.sig &&
 	gpg -o double-sig2.sig -u 29472784 --detach-sign double-base &&
 	cat double-sig1.sig double-sig2.sig | gpg --enarmor >double-combined.asc &&
-	sed -e "s/^\(-.*\)ARMORED FILE/\1SIGNATURE/;1s/^/gpgsig /;2,\$s/^/ /" \
+	sed -e "s/^\(-.*\)ARMORED FILE/\1SIGNATURE/;1s/^/$(test_oid header) /;2,\$s/^/ /" \
 		double-combined.asc > double-gpgsig &&
 	sed -e "/committer/r double-gpgsig" double-base >double-commit &&
 	git hash-object -w -t commit double-commit >double-commit.commit &&

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 18/24] gpg-interface: improve interface for parsing tags
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (16 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 17/24] commit: use expected signature header for SHA-256 brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-24 18:26   ` Junio C Hamano
  2020-02-25 10:29   ` Johannes Schindelin
  2020-02-22 20:17 ` [PATCH v2 19/24] tag: store SHA-256 signatures in a header brian m. carlson
                   ` (6 subsequent siblings)
  24 siblings, 2 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

We have a function which parses a buffer with a signature at the end,
parse_signature, and this function is used for signed tags.  However,
the transition plan has SHA-256 tags using a header, which is a
materially different syntax.  The current interface is not suitable for
parsing such tags.

Adjust the parse_signature interface to store the parsed data in two
strbufs and turn the existing function into parse_signed_buffer.  The
latter is still used in places where we want to strip off the signature
in a SHA-1 tag or in places where we know we always have a signed
buffer, such as push certs.

Adjust all the callers to deal with this new interface.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/fmt-merge-msg.c | 26 ++++++++++++++++++--------
 builtin/receive-pack.c  |  4 ++--
 builtin/tag.c           | 16 ++++++++++++----
 commit.c                |  9 ++++++---
 gpg-interface.c         | 13 ++++++++++++-
 gpg-interface.h         |  9 ++++++++-
 log-tree.c              | 14 ++++++++------
 ref-filter.c            | 18 ++++++++++++++----
 tag.c                   | 15 ++++++++-------
 9 files changed, 88 insertions(+), 36 deletions(-)

diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index 05a92c59d8..29f647e2d9 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -472,6 +472,7 @@ static void fmt_tag_signature(struct strbuf *tagbuf,
 			      const char *buf,
 			      unsigned long len)
 {
+
 	const char *tag_body = strstr(buf, "\n\n");
 	if (tag_body) {
 		tag_body += 2;
@@ -492,24 +493,31 @@ static void fmt_merge_msg_sigs(struct strbuf *out)
 	for (i = 0; i < origins.nr; i++) {
 		struct object_id *oid = origins.items[i].util;
 		enum object_type type;
-		unsigned long size, len;
-		char *buf = read_object_file(oid, &type, &size);
+		unsigned long size;
+		char *buf = read_object_file(oid, &type, &size), *orig = buf;
 		struct signature_check sigc = { 0 };
+		struct strbuf payload = STRBUF_INIT;
+		struct strbuf signature = STRBUF_INIT;
 		struct strbuf sig = STRBUF_INIT;
+		size_t len = size;
 
 		if (!buf || type != OBJ_TAG)
 			goto next;
-		len = parse_signature(buf, size);
-
-		if (size == len)
-			; /* merely annotated */
-		else if (!check_signature(buf, len, buf + len, size - len,
+		if (!parse_signature(buf, size, &payload, &signature))
+			len = size; /* merely annotated */
+		else if (!check_signature(payload.buf, payload.len,
+					  signature.buf, signature.len,
 					  &sigc)) {
 			strbuf_addstr(&sig, sigc.gpg_output);
 			signature_check_clear(&sigc);
 		} else
 			strbuf_addstr(&sig, "gpg verification failed.\n");
 
+		if (payload.len) {
+			buf = payload.buf;
+			len = payload.len;
+		}
+
 		if (!tag_number++) {
 			fmt_tag_signature(&tagbuf, &sig, buf, len);
 			first_tag = i;
@@ -531,8 +539,10 @@ static void fmt_merge_msg_sigs(struct strbuf *out)
 			fmt_tag_signature(&tagbuf, &sig, buf, len);
 		}
 		strbuf_release(&sig);
+		strbuf_release(&payload);
+		strbuf_release(&signature);
 	next:
-		free(buf);
+		free(orig);
 	}
 	if (tagbuf.len) {
 		strbuf_addch(out, '\n');
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 411e0b4d99..4c814c1963 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -636,7 +636,7 @@ static void prepare_push_cert_sha1(struct child_process *proc)
 
 		memset(&sigcheck, '\0', sizeof(sigcheck));
 
-		bogs = parse_signature(push_cert.buf, push_cert.len);
+		bogs = parse_signed_buffer(push_cert.buf, push_cert.len);
 		check_signature(push_cert.buf, bogs, push_cert.buf + bogs,
 				push_cert.len - bogs, &sigcheck);
 
@@ -1568,7 +1568,7 @@ static void queue_commands_from_cert(struct command **tail,
 		die("malformed push certificate %.*s", 100, push_cert->buf);
 	else
 		boc += 2;
-	eoc = push_cert->buf + parse_signature(push_cert->buf, push_cert->len);
+	eoc = push_cert->buf + parse_signed_buffer(push_cert->buf, push_cert->len);
 
 	while (boc < eoc) {
 		const char *eol = memchr(boc, '\n', eoc - boc);
diff --git a/builtin/tag.c b/builtin/tag.c
index e0a4c25382..6b95c6a2ea 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -174,11 +174,17 @@ static void write_tag_body(int fd, const struct object_id *oid)
 {
 	unsigned long size;
 	enum object_type type;
-	char *buf, *sp;
+	char *buf, *sp, *orig;
+	struct strbuf payload = STRBUF_INIT;
+	struct strbuf signature = STRBUF_INIT;
 
-	buf = read_object_file(oid, &type, &size);
+	orig = buf = read_object_file(oid, &type, &size);
 	if (!buf)
 		return;
+	if (parse_signature(buf, size, &payload, &signature)) {
+		buf = payload.buf;
+		size = payload.len;
+	}
 	/* skip header */
 	sp = strstr(buf, "\n\n");
 
@@ -187,9 +193,11 @@ static void write_tag_body(int fd, const struct object_id *oid)
 		return;
 	}
 	sp += 2; /* skip the 2 LFs */
-	write_or_die(fd, sp, parse_signature(sp, buf + size - sp));
+	write_or_die(fd, sp, buf + size - sp);
 
-	free(buf);
+	free(orig);
+	strbuf_release(&payload);
+	strbuf_release(&signature);
 }
 
 static int build_tag_object(struct strbuf *buf, int sign, struct object_id *result)
diff --git a/commit.c b/commit.c
index 534e14f22a..d39ce5076d 100644
--- a/commit.c
+++ b/commit.c
@@ -1097,8 +1097,10 @@ static void handle_signed_tag(struct commit *parent, struct commit_extra_header
 	struct merge_remote_desc *desc;
 	struct commit_extra_header *mergetag;
 	char *buf;
-	unsigned long size, len;
+	unsigned long size;
 	enum object_type type;
+	struct strbuf payload = STRBUF_INIT;
+	struct strbuf signature = STRBUF_INIT;
 
 	desc = merge_remote_util(parent);
 	if (!desc || !desc->obj)
@@ -1106,8 +1108,7 @@ static void handle_signed_tag(struct commit *parent, struct commit_extra_header
 	buf = read_object_file(&desc->obj->oid, &type, &size);
 	if (!buf || type != OBJ_TAG)
 		goto free_return;
-	len = parse_signature(buf, size);
-	if (size == len)
+	if (!parse_signature(buf, size, &payload, &signature))
 		goto free_return;
 	/*
 	 * We could verify this signature and either omit the tag when
@@ -1126,6 +1127,8 @@ static void handle_signed_tag(struct commit *parent, struct commit_extra_header
 
 	**tail = mergetag;
 	*tail = &mergetag->next;
+	strbuf_release(&payload);
+	strbuf_release(&signature);
 	return;
 
 free_return:
diff --git a/gpg-interface.c b/gpg-interface.c
index 2d538bcd6e..b25f5c21d8 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -345,7 +345,7 @@ void print_signature_buffer(const struct signature_check *sigc, unsigned flags)
 		fputs(output, stderr);
 }
 
-size_t parse_signature(const char *buf, size_t size)
+size_t parse_signed_buffer(const char *buf, size_t size)
 {
 	size_t len = 0;
 	size_t match = size;
@@ -361,6 +361,17 @@ size_t parse_signature(const char *buf, size_t size)
 	return match;
 }
 
+int parse_signature(const char *buf, size_t size, struct strbuf *payload, struct strbuf *signature)
+{
+	size_t match = parse_signed_buffer(buf, size);
+	if (match != size) {
+		strbuf_add(payload, buf, match);
+		strbuf_add(signature, buf + match, size - match);
+		return 1;
+	}
+	return 0;
+}
+
 void set_signing_key(const char *key)
 {
 	free(configured_signing_key);
diff --git a/gpg-interface.h b/gpg-interface.h
index f4e9b4f371..80567e4894 100644
--- a/gpg-interface.h
+++ b/gpg-interface.h
@@ -37,13 +37,20 @@ struct signature_check {
 
 void signature_check_clear(struct signature_check *sigc);
 
+/*
+ * Look at a GPG signed tag object.  If such a signature exists, store it in
+ * signature and the signed content in payload.  Return 1 if a signature was
+ * found, and 0 otherwise.
+ */
+int parse_signature(const char *buf, size_t size, struct strbuf *payload, struct strbuf *signature);
+
 /*
  * Look at GPG signed content (e.g. a signed tag object), whose
  * payload is followed by a detached signature on it.  Return the
  * offset where the embedded detached signature begins, or the end of
  * the data when there is no such signature.
  */
-size_t parse_signature(const char *buf, size_t size);
+size_t parse_signed_buffer(const char *buf, size_t size);
 
 /*
  * Create a detached signature for the contents of "buffer" and append
diff --git a/log-tree.c b/log-tree.c
index cae38dcc66..75bd61a531 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -499,7 +499,9 @@ static int show_one_mergetag(struct commit *commit,
 	struct strbuf verify_message;
 	struct signature_check sigc = { 0 };
 	int status, nth;
-	size_t payload_size, gpg_message_offset;
+	size_t gpg_message_offset;
+	struct strbuf payload = STRBUF_INIT;
+	struct strbuf signature = STRBUF_INIT;
 
 	hash_object_file(the_hash_algo, extra->value, extra->len,
 			 type_name(OBJ_TAG), &oid);
@@ -523,13 +525,11 @@ static int show_one_mergetag(struct commit *commit,
 			    "parent #%d, tagged '%s'\n", nth + 1, tag->tag);
 	gpg_message_offset = verify_message.len;
 
-	payload_size = parse_signature(extra->value, extra->len);
 	status = -1;
-	if (extra->len > payload_size) {
+	if (parse_signature(extra->value, extra->len, &payload, &signature)) {
 		/* could have a good signature */
-		if (!check_signature(extra->value, payload_size,
-				     extra->value + payload_size,
-				     extra->len - payload_size, &sigc)) {
+		if (!check_signature(payload.buf, payload.len,
+				     signature.buf, signature.len, &sigc)) {
 			strbuf_addstr(&verify_message, sigc.gpg_output);
 			signature_check_clear(&sigc);
 			status = 0; /* good */
@@ -540,6 +540,8 @@ static int show_one_mergetag(struct commit *commit,
 
 	show_sig_lines(opt, status, verify_message.buf);
 	strbuf_release(&verify_message);
+	strbuf_release(&payload);
+	strbuf_release(&signature);
 	return 0;
 }
 
diff --git a/ref-filter.c b/ref-filter.c
index 6867e33648..212f1165bb 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1161,7 +1161,13 @@ static void find_subpos(const char *buf,
 			unsigned long *nonsiglen,
 			const char **sig, unsigned long *siglen)
 {
+	struct strbuf payload = STRBUF_INIT;
+	struct strbuf signature = STRBUF_INIT;
 	const char *eol;
+	const char *end = buf + strlen(buf);
+	const char *sigstart;
+
+
 	/* skip past header until we hit empty line */
 	while (*buf && *buf != '\n') {
 		eol = strchrnul(buf, '\n');
@@ -1174,13 +1180,14 @@ static void find_subpos(const char *buf,
 		buf++;
 
 	/* parse signature first; we might not even have a subject line */
-	*sig = buf + parse_signature(buf, strlen(buf));
-	*siglen = strlen(*sig);
+	parse_signature(buf, end - buf, &payload, &signature);
+	*sig = strbuf_detach(&signature, siglen);
+	sigstart = buf + parse_signed_buffer(buf, strlen(buf));
 
 	/* subject is first non-empty line */
 	*sub = buf;
 	/* subject goes to first empty line */
-	while (buf < *sig && *buf && *buf != '\n') {
+	while (buf < sigstart && *buf && *buf != '\n') {
 		eol = strchrnul(buf, '\n');
 		if (*eol)
 			eol++;
@@ -1196,7 +1203,7 @@ static void find_subpos(const char *buf,
 		buf++;
 	*body = buf;
 	*bodylen = strlen(buf);
-	*nonsiglen = *sig - buf;
+	*nonsiglen = sigstart - buf;
 }
 
 /*
@@ -1234,6 +1241,7 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 		struct used_atom *atom = &used_atom[i];
 		const char *name = atom->name;
 		struct atom_value *v = &val[i];
+
 		if (!!deref != (*name == '*'))
 			continue;
 		if (deref)
@@ -1273,6 +1281,8 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			v->s = strbuf_detach(&s, NULL);
 		} else if (atom->u.contents.option == C_BARE)
 			v->s = xstrdup(subpos);
+
+		free((void *)sigpos);
 	}
 }
 
diff --git a/tag.c b/tag.c
index 71b544467e..5d04506d10 100644
--- a/tag.c
+++ b/tag.c
@@ -13,26 +13,27 @@ const char *tag_type = "tag";
 static int run_gpg_verify(const char *buf, unsigned long size, unsigned flags)
 {
 	struct signature_check sigc;
-	size_t payload_size;
+	struct strbuf payload = STRBUF_INIT;
+	struct strbuf signature = STRBUF_INIT;
 	int ret;
 
 	memset(&sigc, 0, sizeof(sigc));
 
-	payload_size = parse_signature(buf, size);
-
-	if (size == payload_size) {
+	if (!parse_signature(buf, size, &payload, &signature)) {
 		if (flags & GPG_VERIFY_VERBOSE)
-			write_in_full(1, buf, payload_size);
+			write_in_full(1, buf, size);
 		return error("no signature found");
 	}
 
-	ret = check_signature(buf, payload_size, buf + payload_size,
-				size - payload_size, &sigc);
+	ret = check_signature(payload.buf, payload.len, signature.buf,
+				signature.len, &sigc);
 
 	if (!(flags & GPG_VERIFY_OMIT_STATUS))
 		print_signature_buffer(&sigc, flags);
 
 	signature_check_clear(&sigc);
+	strbuf_release(&payload);
+	strbuf_release(&signature);
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 19/24] tag: store SHA-256 signatures in a header
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (17 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 18/24] gpg-interface: improve interface for parsing tags brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-24 18:30   ` Junio C Hamano
  2020-02-22 20:17 ` [PATCH v2 20/24] fast-import: permit reading multiple marks files brian m. carlson
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

In the future, we'll want to allow a user to sign both the SHA-1 version
of a tag and the SHA-256 version at the same time.  Since for SHA-1 the
signature is appended to the tag message, we must use a different way to
allow multiple signature.

The transition plan envisions this using a gpgsig-sha256 header, much as
for commits.  Refactor the commit code that performs parsing of this
header and use it for tags that use SHA-256.  Check that we get tags in
the correct format depending on what algorithm we're using.

Note that currently we have no way to rewrite an object into another
hash algorithm, and therefore we don't have a way to verify the
signatures of the other hash algorithm.  Because of the way the
signatures are stored, we'll reject commits signed with both algorithms,
which is essential for security.  If we allowed both signatures despite
not being able to verify them and one signature were invalid, we'd end
up with a security problem.

There are, however, a few things to note.

In the ref-filter code, we hoist the signature parsing above the blank
line delimiting headers and body so we can find the signature when using
SHA-256.  For similar reasons, t6300 no longer emits the signature as
part of the body since it's no longer part of the body.

We mark a test for exporting signatures with remote helpers as SHA-1
only.  Since we no longer have signatures in the body (those are for
SHA-1 only), they cannot be exported this way.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/mktag.c           | 14 ++++++++++++++
 builtin/tag.c             |  4 +++-
 commit.c                  | 19 +++++++++++++++----
 commit.h                  |  8 ++++++++
 gpg-interface.c           | 16 ++++++++++------
 ref-filter.c              |  7 +++----
 t/t5801-remote-helpers.sh |  4 +++-
 t/t6300-for-each-ref.sh   | 34 +++++++++++++++++++++++++---------
 t/t7004-tag.sh            |  8 +++++++-
 t/t7030-verify-tag.sh     | 17 +++++++++++++++++
 10 files changed, 105 insertions(+), 26 deletions(-)

diff --git a/builtin/mktag.c b/builtin/mktag.c
index 4982d3a93e..ce1665c4e4 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -144,6 +144,20 @@ static int verify_tag(char *buffer, unsigned long size)
 			(uintmax_t) (tagger_line - buffer));
 	tagger_line += 6;
 
+	if (hash_algo_by_ptr(the_hash_algo) == GIT_HASH_SHA256 &&
+		!memcmp(tagger_line, "gpgsig-sha256 ", 14)) {
+		char *p = strpbrk(tagger_line + 1, "\n");
+		if (!p)
+			return error("char%"PRIuMAX": could not find end of line",
+				(uintmax_t) (tagger_line - buffer));
+		tagger_line = p + 1;
+		while (*tagger_line == ' ' && (p = strpbrk(tagger_line, "\n")))
+			tagger_line = p + 1;
+		if (!p)
+			return error("char%"PRIuMAX": could not find end of line",
+				(uintmax_t) (tagger_line - buffer));
+	}
+
 	/* Verify the blank line separating the header from the body */
 	if (*tagger_line != '\n')
 		return error("char%"PRIuMAX": trailing garbage in tag header",
diff --git a/builtin/tag.c b/builtin/tag.c
index 6b95c6a2ea..ab6b5044e6 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -128,7 +128,9 @@ static int verify_tag(const char *name, const char *ref,
 
 static int do_sign(struct strbuf *buffer)
 {
-	return sign_buffer(buffer, buffer, get_signing_key());
+	if (hash_algo_by_ptr(the_hash_algo) == GIT_HASH_SHA1)
+		return sign_buffer(buffer, buffer, get_signing_key());
+	return sign_with_header(buffer, get_signing_key());
 }
 
 static const char tag_template[] =
diff --git a/commit.c b/commit.c
index d39ce5076d..e3e4f74a36 100644
--- a/commit.c
+++ b/commit.c
@@ -970,7 +970,7 @@ static const char *gpg_sig_headers[] = {
 	"gpgsig-sha256",
 };
 
-static int do_sign_commit(struct strbuf *buf, const char *keyid)
+int sign_with_header(struct strbuf *buf, const char *keyid)
 {
 	struct strbuf sig = STRBUF_INIT;
 	int inspos, copypos;
@@ -1010,12 +1010,24 @@ static int do_sign_commit(struct strbuf *buf, const char *keyid)
 	return 0;
 }
 
+
+
 int parse_signed_commit(const struct commit *commit,
 			struct strbuf *payload, struct strbuf *signature)
 {
-
 	unsigned long size;
 	const char *buffer = get_commit_buffer(commit, &size);
+	int ret = parse_buffer_signed_by_header(buffer, size, payload, signature);
+
+	unuse_commit_buffer(commit, buffer);
+	return ret;
+}
+
+int parse_buffer_signed_by_header(const char *buffer,
+				  unsigned long size,
+				  struct strbuf *payload,
+				  struct strbuf *signature)
+{
 	int in_signature, saw_signature = -1;
 	const char *line, *tail;
 	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(the_hash_algo)];
@@ -1048,7 +1060,6 @@ int parse_signed_commit(const struct commit *commit,
 		}
 		line = next;
 	}
-	unuse_commit_buffer(commit, buffer);
 	return saw_signature;
 }
 
@@ -1490,7 +1501,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 	if (encoding_is_utf8 && !verify_utf8(&buffer))
 		fprintf(stderr, _(commit_utf8_warn));
 
-	if (sign_commit && do_sign_commit(&buffer, sign_commit)) {
+	if (sign_commit && sign_with_header(&buffer, sign_commit)) {
 		result = -1;
 		goto out;
 	}
diff --git a/commit.h b/commit.h
index 008a0fa4a0..e9898df412 100644
--- a/commit.h
+++ b/commit.h
@@ -402,4 +402,12 @@ int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void
 LAST_ARG_MUST_BE_NULL
 int run_commit_hook(int editor_is_used, const char *index_file, const char *name, ...);
 
+/* Sign a commit or tag buffer, storing the result in a header. */
+int sign_with_header(struct strbuf *buf, const char *keyid);
+/* Parse the signature out of a header. */
+int parse_buffer_signed_by_header(const char *buffer,
+				  unsigned long size,
+				  struct strbuf *payload,
+				  struct strbuf *signature);
+
 #endif /* COMMIT_H */
diff --git a/gpg-interface.c b/gpg-interface.c
index b25f5c21d8..a63766a19b 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "commit.h"
 #include "config.h"
 #include "run-command.h"
 #include "strbuf.h"
@@ -363,13 +364,16 @@ size_t parse_signed_buffer(const char *buf, size_t size)
 
 int parse_signature(const char *buf, size_t size, struct strbuf *payload, struct strbuf *signature)
 {
-	size_t match = parse_signed_buffer(buf, size);
-	if (match != size) {
-		strbuf_add(payload, buf, match);
-		strbuf_add(signature, buf + match, size - match);
-		return 1;
+	if (hash_algo_by_ptr(the_hash_algo) == GIT_HASH_SHA1) {
+		size_t match = parse_signed_buffer(buf, size);
+		if (match != size) {
+			strbuf_add(payload, buf, match);
+			strbuf_add(signature, buf + match, size - match);
+			return 1;
+		}
+		return 0;
 	}
-	return 0;
+	return parse_buffer_signed_by_header(buf, size, payload, signature);
 }
 
 void set_signing_key(const char *key)
diff --git a/ref-filter.c b/ref-filter.c
index 212f1165bb..933530a14c 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1167,6 +1167,8 @@ static void find_subpos(const char *buf,
 	const char *end = buf + strlen(buf);
 	const char *sigstart;
 
+	/* parse signature first; we might not even have a subject line */
+	parse_signature(buf, end - buf, &payload, &signature);
 
 	/* skip past header until we hit empty line */
 	while (*buf && *buf != '\n') {
@@ -1178,9 +1180,6 @@ static void find_subpos(const char *buf,
 	/* skip any empty lines */
 	while (*buf == '\n')
 		buf++;
-
-	/* parse signature first; we might not even have a subject line */
-	parse_signature(buf, end - buf, &payload, &signature);
 	*sig = strbuf_detach(&signature, siglen);
 	sigstart = buf + parse_signed_buffer(buf, strlen(buf));
 
@@ -1267,7 +1266,7 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			v->s = xmemdupz(sigpos, siglen);
 		else if (atom->u.contents.option == C_LINES) {
 			struct strbuf s = STRBUF_INIT;
-			const char *contents_end = bodylen + bodypos - siglen;
+			const char *contents_end = bodypos + nonsiglen;
 
 			/*  Size is the length of the message after removing the signature */
 			append_lines(&s, subpos, contents_end - subpos, atom->u.contents.nlines);
diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh
index 121e5c6edb..801802be9d 100755
--- a/t/t5801-remote-helpers.sh
+++ b/t/t5801-remote-helpers.sh
@@ -192,7 +192,9 @@ test_expect_success GPG 'push signed tag' '
 	test_must_fail compare_refs local signed-tag server signed-tag
 '
 
-test_expect_success GPG 'push signed tag with signed-tags capability' '
+# SHA-256 signatures are stored in the header and can't be round-tripped through
+# fast-export.
+test_expect_success GPG,SHA1 'push signed tag with signed-tags capability' '
 	(cd local &&
 	git checkout master &&
 	git tag -s -m signed-tag signed-tag-2 &&
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index b3c1092338..caeabfb293 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -54,6 +54,17 @@ test_atom() {
 	"
 }
 
+test_atom_hash () {
+	local val
+	if [ "$(test_oid algo)" = sha1 ]
+	then
+		val="$3"
+	else
+		val="$4"
+	fi
+	test_atom "$1" "$2" "$val"
+}
+
 hexlen=$(test_oid hexsz)
 disklen=$(test_oid disklen)
 
@@ -625,30 +636,35 @@ sig='-----BEGIN PGP SIGNATURE-----
 PREREQ=GPG
 test_atom refs/tags/signed-empty subject ''
 test_atom refs/tags/signed-empty contents:subject ''
-test_atom refs/tags/signed-empty body "$sig"
+test_atom_hash refs/tags/signed-empty body "$sig" ''
 test_atom refs/tags/signed-empty contents:body ''
 test_atom refs/tags/signed-empty contents:signature "$sig"
-test_atom refs/tags/signed-empty contents "$sig"
+test_atom_hash refs/tags/signed-empty contents "$sig" ''
 
 test_atom refs/tags/signed-short subject 'subject line'
 test_atom refs/tags/signed-short contents:subject 'subject line'
-test_atom refs/tags/signed-short body "$sig"
+test_atom_hash refs/tags/signed-short body "$sig" ''
 test_atom refs/tags/signed-short contents:body ''
 test_atom refs/tags/signed-short contents:signature "$sig"
-test_atom refs/tags/signed-short contents "subject line
-$sig"
+test_atom_hash refs/tags/signed-short contents "subject line
+$sig" 'subject line
+'
 
 test_atom refs/tags/signed-long subject 'subject line'
 test_atom refs/tags/signed-long contents:subject 'subject line'
-test_atom refs/tags/signed-long body "body contents
-$sig"
+test_atom_hash refs/tags/signed-long body "body contents
+$sig" 'body contents
+'
 test_atom refs/tags/signed-long contents:body 'body contents
 '
 test_atom refs/tags/signed-long contents:signature "$sig"
-test_atom refs/tags/signed-long contents "subject line
+test_atom_hash refs/tags/signed-long contents "subject line
 
 body contents
-$sig"
+$sig" 'subject line
+
+body contents
+'
 
 sort >expected <<EOF
 $(git rev-parse refs/tags/bogo) <committer@example.com> refs/tags/bogo
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 6db92bd3ba..bd74b2d7e0 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -425,7 +425,13 @@ test_expect_success \
 # creating annotated tags:
 
 get_tag_msg () {
-	git cat-file tag "$1" | sed -e "/BEGIN PGP/q"
+	if [ "$(test_oid algo)" = sha1 ]
+	then
+		git cat-file tag "$1" | sed -e "/BEGIN PGP/q"
+	else
+
+		git cat-file tag "$1" | sed -e '/^gpgsig-sha256/{s/^gpgsig-sha256 //;h;d};/^ /d;${p;x;/^$/d}'
+	fi
 }
 
 # run test_tick before committing always gives the time in that timezone
diff --git a/t/t7030-verify-tag.sh b/t/t7030-verify-tag.sh
index 5c5bc32ccb..d95a03d315 100755
--- a/t/t7030-verify-tag.sh
+++ b/t/t7030-verify-tag.sh
@@ -79,6 +79,23 @@ test_expect_success GPG 'verify and show signatures' '
 	)
 '
 
+test_expect_success GPG 'signature has expected format' '
+	for tag in initial second merge fourth-signed sixth-signed seventh-signed
+	do
+		if [ "$(test_oid algo)"	= sha1 ]
+		then
+			git cat-file tag seventh-signed >output &&
+			! grep gpgsig output &&
+			grep "^-----BEGIN PGP SIGNATURE-----" output
+		else
+			git cat-file tag seventh-signed >output &&
+			grep gpgsig-sha256 output &&
+			! grep "^-----BEGIN PGP SIGNATURE-----" output
+		fi &&
+		echo $tag OK || exit 1
+	done
+'
+
 test_expect_success GPGSM 'verify and show signatures x509' '
 	git verify-tag ninth-signed-x509 2>actual &&
 	grep "Good signature from" actual &&

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 20/24] fast-import: permit reading multiple marks files
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (18 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 19/24] tag: store SHA-256 signatures in a header brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-06-05 16:14   ` Junio C Hamano
  2020-02-22 20:17 ` [PATCH v2 21/24] fast-import: add helper function for inserting mark object entries brian m. carlson
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

In the future, we'll want to read marks files for submodules as well.
Refactor the existing code to make it possible to read multiple marks
files, each into their own marks set.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fast-import.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index b8b65a801c..b9ecd89699 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -493,9 +493,8 @@ static char *pool_strdup(const char *s)
 	return r;
 }
 
-static void insert_mark(uintmax_t idnum, struct object_entry *oe)
+static void insert_mark(struct mark_set *s, uintmax_t idnum, struct object_entry *oe)
 {
-	struct mark_set *s = marks;
 	while ((idnum >> s->shift) >= 1024) {
 		s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
 		s->shift = marks->shift + 10;
@@ -919,7 +918,7 @@ static int store_object(
 
 	e = insert_object(&oid);
 	if (mark)
-		insert_mark(mark, e);
+		insert_mark(marks, mark, e);
 	if (e->idx.offset) {
 		duplicate_count_by_type[type]++;
 		return 1;
@@ -1117,7 +1116,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 	e = insert_object(&oid);
 
 	if (mark)
-		insert_mark(mark, e);
+		insert_mark(marks, mark, e);
 
 	if (e->idx.offset) {
 		duplicate_count_by_type[OBJ_BLOB]++;
@@ -1712,16 +1711,9 @@ static void dump_marks(void)
 	}
 }
 
-static void read_marks(void)
+static void read_mark_file(struct mark_set *s, FILE *f)
 {
 	char line[512];
-	FILE *f = fopen(import_marks_file, "r");
-	if (f)
-		;
-	else if (import_marks_file_ignore_missing && errno == ENOENT)
-		goto done; /* Marks file does not exist */
-	else
-		die_errno("cannot read '%s'", import_marks_file);
 	while (fgets(line, sizeof(line), f)) {
 		uintmax_t mark;
 		char *end;
@@ -1747,8 +1739,20 @@ static void read_marks(void)
 			e->pack_id = MAX_PACK_ID;
 			e->idx.offset = 1; /* just not zero! */
 		}
-		insert_mark(mark, e);
+		insert_mark(s, mark, e);
 	}
+}
+
+static void read_marks(void)
+{
+	FILE *f = fopen(import_marks_file, "r");
+	if (f)
+		;
+	else if (import_marks_file_ignore_missing && errno == ENOENT)
+		goto done; /* Marks file does not exist */
+	else
+		die_errno("cannot read '%s'", import_marks_file);
+	read_mark_file(marks, f);
 	fclose(f);
 done:
 	import_marks_file_done = 1;
@@ -3130,7 +3134,7 @@ static void parse_alias(void)
 		die(_("Expected 'to' command, got %s"), command_buf.buf);
 	e = find_object(&b.oid);
 	assert(e);
-	insert_mark(next_mark, e);
+	insert_mark(marks, next_mark, e);
 }
 
 static char* make_fast_import_path(const char *path)

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 21/24] fast-import: add helper function for inserting mark object entries
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (19 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 20/24] fast-import: permit reading multiple marks files brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 22/24] fast-import: make find_marks work on any mark set brian m. carlson
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

Currently, everything we want to insert into a mark set is an object
entry. However, in the future, we will want to insert objects of other
types. Teach read_mark_file to take a function pointer which helps us
insert the object we want into our mark set.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fast-import.c | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index b9ecd89699..3ce4a04473 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -131,6 +131,8 @@ struct recent_command {
 	char *buf;
 };
 
+typedef void (*mark_set_inserter_t)(struct mark_set *s, struct object_id *oid, uintmax_t mark);
+
 /* Configured limits on output */
 static unsigned long max_depth = 50;
 static off_t max_packsize;
@@ -1711,14 +1713,30 @@ static void dump_marks(void)
 	}
 }
 
-static void read_mark_file(struct mark_set *s, FILE *f)
+static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark)
+{
+	struct object_entry *e;
+	e = find_object(oid);
+	if (!e) {
+		enum object_type type = oid_object_info(the_repository,
+							oid, NULL);
+		if (type < 0)
+			die("object not found: %s", oid_to_hex(oid));
+		e = insert_object(oid);
+		e->type = type;
+		e->pack_id = MAX_PACK_ID;
+		e->idx.offset = 1; /* just not zero! */
+	}
+	insert_mark(s, mark, e);
+}
+
+static void read_mark_file(struct mark_set *s, FILE *f, mark_set_inserter_t inserter)
 {
 	char line[512];
 	while (fgets(line, sizeof(line), f)) {
 		uintmax_t mark;
 		char *end;
 		struct object_id oid;
-		struct object_entry *e;
 
 		end = strchr(line, '\n');
 		if (line[0] != ':' || !end)
@@ -1728,18 +1746,7 @@ static void read_mark_file(struct mark_set *s, FILE *f)
 		if (!mark || end == line + 1
 			|| *end != ' ' || get_oid_hex(end + 1, &oid))
 			die("corrupt mark line: %s", line);
-		e = find_object(&oid);
-		if (!e) {
-			enum object_type type = oid_object_info(the_repository,
-								&oid, NULL);
-			if (type < 0)
-				die("object not found: %s", oid_to_hex(&oid));
-			e = insert_object(&oid);
-			e->type = type;
-			e->pack_id = MAX_PACK_ID;
-			e->idx.offset = 1; /* just not zero! */
-		}
-		insert_mark(s, mark, e);
+		inserter(s, &oid, mark);
 	}
 }
 
@@ -1752,7 +1759,7 @@ static void read_marks(void)
 		goto done; /* Marks file does not exist */
 	else
 		die_errno("cannot read '%s'", import_marks_file);
-	read_mark_file(marks, f);
+	read_mark_file(marks, f, insert_object_entry);
 	fclose(f);
 done:
 	import_marks_file_done = 1;

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 22/24] fast-import: make find_marks work on any mark set
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (20 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 21/24] fast-import: add helper function for inserting mark object entries brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 23/24] fast-import: add a generic function to iterate over marks brian m. carlson
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

In the future, we'll use multiple different mark sets with this
function, so make it take an argument that points to the mark set to
operate on.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fast-import.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 3ce4a04473..8aaa7f6289 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -517,10 +517,9 @@ static void insert_mark(struct mark_set *s, uintmax_t idnum, struct object_entry
 	s->data.marked[idnum] = oe;
 }
 
-static struct object_entry *find_mark(uintmax_t idnum)
+static void *find_mark(struct mark_set *s, uintmax_t idnum)
 {
 	uintmax_t orig_idnum = idnum;
-	struct mark_set *s = marks;
 	struct object_entry *oe = NULL;
 	if ((idnum >> s->shift) < 1024) {
 		while (s && s->shift) {
@@ -2225,7 +2224,7 @@ static void file_change_m(const char *p, struct branch *b)
 	}
 
 	if (*p == ':') {
-		oe = find_mark(parse_mark_ref_space(&p));
+		oe = find_mark(marks, parse_mark_ref_space(&p));
 		oidcpy(&oid, &oe->idx.oid);
 	} else if (skip_prefix(p, "inline ", &p)) {
 		inline_data = 1;
@@ -2399,7 +2398,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
 	/* Now parse the notemodify command. */
 	/* <dataref> or 'inline' */
 	if (*p == ':') {
-		oe = find_mark(parse_mark_ref_space(&p));
+		oe = find_mark(marks, parse_mark_ref_space(&p));
 		oidcpy(&oid, &oe->idx.oid);
 	} else if (skip_prefix(p, "inline ", &p)) {
 		inline_data = 1;
@@ -2420,7 +2419,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
 		oidcpy(&commit_oid, &s->oid);
 	} else if (*p == ':') {
 		uintmax_t commit_mark = parse_mark_ref_eol(p);
-		struct object_entry *commit_oe = find_mark(commit_mark);
+		struct object_entry *commit_oe = find_mark(marks, commit_mark);
 		if (commit_oe->type != OBJ_COMMIT)
 			die("Mark :%" PRIuMAX " not a commit", commit_mark);
 		oidcpy(&commit_oid, &commit_oe->idx.oid);
@@ -2524,7 +2523,7 @@ static int parse_objectish(struct branch *b, const char *objectish)
 		oidcpy(&b->branch_tree.versions[1].oid, t);
 	} else if (*objectish == ':') {
 		uintmax_t idnum = parse_mark_ref_eol(objectish);
-		struct object_entry *oe = find_mark(idnum);
+		struct object_entry *oe = find_mark(marks, idnum);
 		if (oe->type != OBJ_COMMIT)
 			die("Mark :%" PRIuMAX " not a commit", idnum);
 		if (!oideq(&b->oid, &oe->idx.oid)) {
@@ -2588,7 +2587,7 @@ static struct hash_list *parse_merge(unsigned int *count)
 			oidcpy(&n->oid, &s->oid);
 		else if (*from == ':') {
 			uintmax_t idnum = parse_mark_ref_eol(from);
-			struct object_entry *oe = find_mark(idnum);
+			struct object_entry *oe = find_mark(marks, idnum);
 			if (oe->type != OBJ_COMMIT)
 				die("Mark :%" PRIuMAX " not a commit", idnum);
 			oidcpy(&n->oid, &oe->idx.oid);
@@ -2762,7 +2761,7 @@ static void parse_new_tag(const char *arg)
 	} else if (*from == ':') {
 		struct object_entry *oe;
 		from_mark = parse_mark_ref_eol(from);
-		oe = find_mark(from_mark);
+		oe = find_mark(marks, from_mark);
 		type = oe->type;
 		oidcpy(&oid, &oe->idx.oid);
 	} else if (!get_oid(from, &oid)) {
@@ -2920,7 +2919,7 @@ static void parse_get_mark(const char *p)
 	if (*p != ':')
 		die("Not a mark: %s", p);
 
-	oe = find_mark(parse_mark_ref_eol(p));
+	oe = find_mark(marks, parse_mark_ref_eol(p));
 	if (!oe)
 		die("Unknown mark: %s", command_buf.buf);
 
@@ -2935,7 +2934,7 @@ static void parse_cat_blob(const char *p)
 
 	/* cat-blob SP <object> LF */
 	if (*p == ':') {
-		oe = find_mark(parse_mark_ref_eol(p));
+		oe = find_mark(marks, parse_mark_ref_eol(p));
 		if (!oe)
 			die("Unknown mark: %s", command_buf.buf);
 		oidcpy(&oid, &oe->idx.oid);
@@ -3010,7 +3009,7 @@ static struct object_entry *parse_treeish_dataref(const char **p)
 	struct object_entry *e;
 
 	if (**p == ':') {	/* <mark> */
-		e = find_mark(parse_mark_ref_space(p));
+		e = find_mark(marks, parse_mark_ref_space(p));
 		if (!e)
 			die("Unknown mark: %s", command_buf.buf);
 		oidcpy(&oid, &e->idx.oid);

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 23/24] fast-import: add a generic function to iterate over marks
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (21 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 22/24] fast-import: make find_marks work on any mark set brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-22 20:17 ` [PATCH v2 24/24] fast-import: add options for rewriting submodules brian m. carlson
  2020-02-24 18:34 ` [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 Junio C Hamano
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

Currently, we can iterate over marks only to dump them to a file. In the
future, we'll want to perform an arbitrary operation over the items of a
mark set. Add a function, for_each_mark, that iterates over marks in a
set and performs an arbitrary callback function for each mark. Switch
the mark dumping routine to use this function now that it's available.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fast-import.c | 50 ++++++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 8aaa7f6289..6711f71ba7 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -132,6 +132,7 @@ struct recent_command {
 };
 
 typedef void (*mark_set_inserter_t)(struct mark_set *s, struct object_id *oid, uintmax_t mark);
+typedef void (*each_mark_fn_t)(uintmax_t mark, void *obj, void *cbp);
 
 /* Configured limits on output */
 static unsigned long max_depth = 50;
@@ -232,6 +233,29 @@ static void parse_get_mark(const char *p);
 static void parse_cat_blob(const char *p);
 static void parse_ls(const char *p, struct branch *b);
 
+static void for_each_mark(struct mark_set *m, uintmax_t base, each_mark_fn_t callback, void *p)
+{
+	uintmax_t k;
+	if (m->shift) {
+		for (k = 0; k < 1024; k++) {
+			if (m->data.sets[k])
+				for_each_mark(m->data.sets[k], base + (k << m->shift), callback, p);
+		}
+	} else {
+		for (k = 0; k < 1024; k++) {
+			if (m->data.marked[k])
+				callback(base + k, m->data.marked[k], p);
+		}
+	}
+}
+
+static void dump_marks_fn(uintmax_t mark, void *object, void *cbp) {
+	struct object_entry *e = object;
+	FILE *f = cbp;
+
+	fprintf(f, ":%" PRIuMAX " %s\n", mark, oid_to_hex(&e->idx.oid));
+}
+
 static void write_branch_report(FILE *rpt, struct branch *b)
 {
 	fprintf(rpt, "%s:\n", b->name);
@@ -260,8 +284,6 @@ static void write_branch_report(FILE *rpt, struct branch *b)
 	fputc('\n', rpt);
 }
 
-static void dump_marks_helper(FILE *, uintmax_t, struct mark_set *);
-
 static void write_crash_report(const char *err)
 {
 	char *loc = git_pathdup("fast_import_crash_%"PRIuMAX, (uintmax_t) getpid());
@@ -340,7 +362,7 @@ static void write_crash_report(const char *err)
 	if (export_marks_file)
 		fprintf(rpt, "  exported to %s\n", export_marks_file);
 	else
-		dump_marks_helper(rpt, 0, marks);
+		for_each_mark(marks, 0, dump_marks_fn, rpt);
 
 	fputc('\n', rpt);
 	fputs("-------------------\n", rpt);
@@ -1655,26 +1677,6 @@ static void dump_tags(void)
 	strbuf_release(&err);
 }
 
-static void dump_marks_helper(FILE *f,
-	uintmax_t base,
-	struct mark_set *m)
-{
-	uintmax_t k;
-	if (m->shift) {
-		for (k = 0; k < 1024; k++) {
-			if (m->data.sets[k])
-				dump_marks_helper(f, base + (k << m->shift),
-					m->data.sets[k]);
-		}
-	} else {
-		for (k = 0; k < 1024; k++) {
-			if (m->data.marked[k])
-				fprintf(f, ":%" PRIuMAX " %s\n", base + k,
-					oid_to_hex(&m->data.marked[k]->idx.oid));
-		}
-	}
-}
-
 static void dump_marks(void)
 {
 	struct lock_file mark_lock = LOCK_INIT;
@@ -1704,7 +1706,7 @@ static void dump_marks(void)
 		return;
 	}
 
-	dump_marks_helper(f, 0, marks);
+	for_each_mark(marks, 0, dump_marks_fn, f);
 	if (commit_lock_file(&mark_lock)) {
 		failure |= error_errno("Unable to write file %s",
 				       export_marks_file);

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 24/24] fast-import: add options for rewriting submodules
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (22 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 23/24] fast-import: add a generic function to iterate over marks brian m. carlson
@ 2020-02-22 20:17 ` brian m. carlson
  2020-02-24 18:34 ` [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 Junio C Hamano
  24 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-22 20:17 UTC (permalink / raw)
  To: git

When converting a repository using submodules from one hash algorithm to
another, it is necessary to rewrite the submodules from the old
algorithm to the new algorithm, since only references to submodules, not
their contents, are written to the fast-export stream. Without rewriting
the submodules, fast-import fails with an "Invalid dataref" error when
encountering a submodule in another algorithm.

Add a pair of options, --rewrite-submodules-from and
--rewrite-submodules-to, that take a list of marks produced by
fast-export and fast-import, respectively, when processing the
submodule. Use these marks to map the submodule commits from the old
algorithm to the new algorithm.

We read marks into two corresponding struct mark_set objects and then
perform a mapping from the old to the new using a hash table. This lets
us reuse the same mark parsing code that is used elsewhere and allows us
to efficiently read and match marks based on their ID, since mark files
need not be sorted.

Note that because we're using a khash table for the object IDs, and this
table copies values of struct object_id instead of taking references to
them, it's necessary to zero the struct object_id values that we use to
insert and look up in the table. Otherwise, we would end up with SHA-1
values that don't match because of whatever stack garbage might be left
in the unused area.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/git-fast-import.txt |  20 ++++++
 fast-import.c                     | 112 ++++++++++++++++++++++++++++--
 t/t9300-fast-import.sh            | 109 +++++++++++++++++++++++++++++
 3 files changed, 236 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 7889f95940..77c6b3d001 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -122,6 +122,26 @@ Locations of Marks Files
 Relative and non-relative marks may be combined by interweaving
 --(no-)-relative-marks with the --(import|export)-marks= options.
 
+Submodule Rewriting
+~~~~~~~~~~~~~~~~~~~
+
+--rewrite-submodules-from=<name>:<file>::
+--rewrite-submodules-to=<name>:<file>::
+  Rewrite the object IDs for the submodule specified by <name> from the values
+	used in the from <file> to those used in the to <file>. The from marks should
+	have been created by `git fast-export`, and the to marks should have been
+	created by `git fast-import` when importing that same submodule.
++
+<name> may be any arbitrary string not containing a colon character, but the
+same value must be used with both options when specifying corresponding marks.
+Multiple submodules may be specified with different values for <name>. It is an
+error not to use these options in corresponding pairs.
++
+These options are primarily useful when converting a repository from one hash
+algorithm to another; without them, fast-import will fail if it encounters a
+submodule because it has no way of writing the object ID into the new hash
+algorithm.
+
 Performance and Compression Tuning
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/fast-import.c b/fast-import.c
index 6711f71ba7..202dda11a6 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -18,6 +18,7 @@
 #include "object-store.h"
 #include "mem-pool.h"
 #include "commit-reach.h"
+#include "khash.h"
 
 #define PACK_ID_BITS 16
 #define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)
@@ -53,6 +54,7 @@ struct object_entry_pool {
 
 struct mark_set {
 	union {
+		struct object_id *oids[1024];
 		struct object_entry *marked[1024];
 		struct mark_set *sets[1024];
 	} data;
@@ -225,6 +227,11 @@ static int allow_unsafe_features;
 /* Signal handling */
 static volatile sig_atomic_t checkpoint_requested;
 
+/* Submodule marks */
+static struct string_list sub_marks_from = STRING_LIST_INIT_DUP;
+static struct string_list sub_marks_to = STRING_LIST_INIT_DUP;
+static kh_oid_map_t *sub_oid_map;
+
 /* Where to write output of cat-blob commands */
 static int cat_blob_fd = STDOUT_FILENO;
 
@@ -1731,6 +1738,11 @@ static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintm
 	insert_mark(s, mark, e);
 }
 
+static void insert_oid_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark)
+{
+	insert_mark(s, mark, xmemdupz(oid, sizeof(*oid)));
+}
+
 static void read_mark_file(struct mark_set *s, FILE *f, mark_set_inserter_t inserter)
 {
 	char line[512];
@@ -1739,13 +1751,17 @@ static void read_mark_file(struct mark_set *s, FILE *f, mark_set_inserter_t inse
 		char *end;
 		struct object_id oid;
 
+		/* Ensure SHA-1 objects are padded with zeros. */
+		memset(oid.hash, 0, sizeof(oid.hash));
+
 		end = strchr(line, '\n');
 		if (line[0] != ':' || !end)
 			die("corrupt mark line: %s", line);
 		*end = 0;
 		mark = strtoumax(line + 1, &end, 10);
 		if (!mark || end == line + 1
-			|| *end != ' ' || get_oid_hex(end + 1, &oid))
+			|| *end != ' '
+			|| get_oid_hex_any(end + 1, &oid) == GIT_HASH_UNKNOWN)
 			die("corrupt mark line: %s", line);
 		inserter(s, &oid, mark);
 	}
@@ -2146,6 +2162,30 @@ static uintmax_t change_note_fanout(struct tree_entry *root,
 	return do_change_note_fanout(root, root, hex_oid, 0, path, 0, fanout);
 }
 
+static int parse_mapped_oid_hex(const char *hex, struct object_id *oid, const char **end)
+{
+	int algo;
+	khiter_t it;
+
+	/* Make SHA-1 object IDs have all-zero padding. */
+	memset(oid->hash, 0, sizeof(oid->hash));
+
+	algo = parse_oid_hex_any(hex, oid, end);
+	if (algo == GIT_HASH_UNKNOWN)
+		return -1;
+
+	it = kh_get_oid_map(sub_oid_map, *oid);
+	/* No such object? */
+	if (it == kh_end(sub_oid_map)) {
+		/* If we're using the same algorithm, pass it through. */
+		if (hash_algos[algo].format_id == the_hash_algo->format_id)
+			return 0;
+		return -1;
+	}
+	oidcpy(oid, kh_value(sub_oid_map, it));
+	return 0;
+}
+
 /*
  * Given a pointer into a string, parse a mark reference:
  *
@@ -2232,7 +2272,7 @@ static void file_change_m(const char *p, struct branch *b)
 		inline_data = 1;
 		oe = NULL; /* not used with inline_data, but makes gcc happy */
 	} else {
-		if (parse_oid_hex(p, &oid, &p))
+		if (parse_mapped_oid_hex(p, &oid, &p))
 			die("Invalid dataref: %s", command_buf.buf);
 		oe = find_object(&oid);
 		if (*p++ != ' ')
@@ -2406,7 +2446,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
 		inline_data = 1;
 		oe = NULL; /* not used with inline_data, but makes gcc happy */
 	} else {
-		if (parse_oid_hex(p, &oid, &p))
+		if (parse_mapped_oid_hex(p, &oid, &p))
 			die("Invalid dataref: %s", command_buf.buf);
 		oe = find_object(&oid);
 		if (*p++ != ' ')
@@ -2941,7 +2981,7 @@ static void parse_cat_blob(const char *p)
 			die("Unknown mark: %s", command_buf.buf);
 		oidcpy(&oid, &oe->idx.oid);
 	} else {
-		if (parse_oid_hex(p, &oid, &p))
+		if (parse_mapped_oid_hex(p, &oid, &p))
 			die("Invalid dataref: %s", command_buf.buf);
 		if (*p)
 			die("Garbage after SHA1: %s", command_buf.buf);
@@ -3005,6 +3045,42 @@ static struct object_entry *dereference(struct object_entry *oe,
 	return find_object(oid);
 }
 
+static void insert_mapped_mark(uintmax_t mark, void *object, void *cbp)
+{
+	struct object_id *fromoid = object;
+	struct object_id *tooid = find_mark(cbp, mark);
+	int ret;
+	khiter_t it;
+
+	it = kh_put_oid_map(sub_oid_map, *fromoid, &ret);
+	/* We've already seen this object. */
+	if (ret == 0)
+		return;
+	kh_value(sub_oid_map, it) = tooid;
+}
+
+static void build_mark_map_one(struct mark_set *from, struct mark_set *to)
+{
+	for_each_mark(from, 0, insert_mapped_mark, to);
+}
+
+static void build_mark_map(struct string_list *from, struct string_list *to)
+{
+	struct string_list_item *fromp, *top;
+
+	sub_oid_map = kh_init_oid_map();
+
+	for_each_string_list_item(fromp, from) {
+		top = string_list_lookup(to, fromp->string);
+		if (!fromp->util) {
+			die(_("Missing from marks for submodule '%s'"), fromp->string);
+		} else if (!top || !top->util) {
+			die(_("Missing to marks for submodule '%s'"), fromp->string);
+		}
+		build_mark_map_one(fromp->util, top->util);
+	}
+}
+
 static struct object_entry *parse_treeish_dataref(const char **p)
 {
 	struct object_id oid;
@@ -3016,7 +3092,7 @@ static struct object_entry *parse_treeish_dataref(const char **p)
 			die("Unknown mark: %s", command_buf.buf);
 		oidcpy(&oid, &e->idx.oid);
 	} else {	/* <sha1> */
-		if (parse_oid_hex(*p, &oid, p))
+		if (parse_mapped_oid_hex(*p, &oid, p))
 			die("Invalid dataref: %s", command_buf.buf);
 		e = find_object(&oid);
 		if (*(*p)++ != ' ')
@@ -3222,6 +3298,26 @@ static void option_export_pack_edges(const char *edges)
 	pack_edges = xfopen(edges, "a");
 }
 
+static void option_rewrite_submodules(const char *arg, struct string_list *list)
+{
+	struct mark_set *ms;
+	FILE *fp;
+	char *s = xstrdup(arg);
+	char *f = strchr(s, ':');
+	if (!f)
+		die(_("Expected format name:filename for submodule rewrite option"));
+	*f = '\0';
+	f++;
+	ms = xcalloc(1, sizeof(*ms));
+	string_list_insert(list, s)->util = ms;
+
+	fp = fopen(f, "r");
+	if (!fp)
+		die_errno("cannot read '%s'", f);
+	read_mark_file(ms, fp, insert_oid_entry);
+	fclose(fp);
+}
+
 static int parse_one_option(const char *option)
 {
 	if (skip_prefix(option, "max-pack-size=", &option)) {
@@ -3284,6 +3380,11 @@ static int parse_one_feature(const char *feature, int from_stream)
 		option_export_marks(arg);
 	} else if (!strcmp(feature, "alias")) {
 		; /* Don't die - this feature is supported */
+	} else if (skip_prefix(feature, "rewrite-submodules-to=", &arg)) {
+		option_rewrite_submodules(arg, &sub_marks_to);
+	} else if (skip_prefix(feature, "rewrite-submodules-from=", &arg)) {
+		option_rewrite_submodules(arg, &sub_marks_from);
+	} else if (skip_prefix(feature, "rewrite-submodules-from=", &arg)) {
 	} else if (!strcmp(feature, "get-mark")) {
 		; /* Don't die - this feature is supported */
 	} else if (!strcmp(feature, "cat-blob")) {
@@ -3389,6 +3490,7 @@ static void parse_argv(void)
 	seen_data_command = 1;
 	if (import_marks_file)
 		read_marks();
+	build_mark_map(&sub_marks_from, &sub_marks_to);
 }
 
 int cmd_main(int argc, const char **argv)
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index ae9950a9c2..22c6c27763 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3382,4 +3382,113 @@ test_expect_success 'X: handling encoding' '
 	git log -1 --format=%B encoding | grep $(printf "\317\200")
 '
 
+###
+### series Y (submodules and hash algorithms)
+###
+
+cat >Y-sub-input <<\Y_INPUT_END
+blob
+mark :1
+data 4
+foo
+
+reset refs/heads/master
+commit refs/heads/master
+mark :2
+author Full Name <user@company.tld> 1000000000 +0100
+committer Full Name <user@company.tld> 1000000000 +0100
+data 24
+Test submodule commit 1
+M 100644 :1 file
+
+blob
+mark :3
+data 8
+foo
+bar
+
+commit refs/heads/master
+mark :4
+author Full Name <user@company.tld> 1000000001 +0100
+committer Full Name <user@company.tld> 1000000001 +0100
+data 24
+Test submodule commit 2
+from :2
+M 100644 :3 file
+Y_INPUT_END
+
+# Note that the submodule object IDs are intentionally not translated.
+cat >Y-main-input <<\Y_INPUT_END
+blob
+mark :1
+data 4
+foo
+
+reset refs/heads/master
+commit refs/heads/master
+mark :2
+author Full Name <user@company.tld> 2000000000 +0100
+committer Full Name <user@company.tld> 2000000000 +0100
+data 14
+Test commit 1
+M 100644 :1 file
+
+blob
+mark :3
+data 73
+[submodule "sub1"]
+	path = sub1
+	url = https://void.example.com/main.git
+
+commit refs/heads/master
+mark :4
+author Full Name <user@company.tld> 2000000001 +0100
+committer Full Name <user@company.tld> 2000000001 +0100
+data 14
+Test commit 2
+from :2
+M 100644 :3 .gitmodules
+M 160000 0712c5be7cf681388e355ef47525aaf23aee1a6d sub1
+
+blob
+mark :5
+data 8
+foo
+bar
+
+commit refs/heads/master
+mark :6
+author Full Name <user@company.tld> 2000000002 +0100
+committer Full Name <user@company.tld> 2000000002 +0100
+data 14
+Test commit 3
+from :4
+M 100644 :5 file
+M 160000 ff729f5e62f72c0c3978207d9a80e5f3a65f14d7 sub1
+Y_INPUT_END
+
+cat >Y-marks <<\Y_INPUT_END
+:2 0712c5be7cf681388e355ef47525aaf23aee1a6d
+:4 ff729f5e62f72c0c3978207d9a80e5f3a65f14d7
+Y_INPUT_END
+
+test_expect_success 'Y: setup' '
+	test_oid_cache <<-EOF
+	Ymaster sha1:9afed2f9161ddf416c0a1863b8b0725b00070504
+	Ymaster sha256:c0a1010da1df187b2e287654793df01b464bd6f8e3f17fc1481a7dadf84caee3
+	EOF
+'
+
+test_expect_success 'Y: rewrite submodules' '
+	git init main1 &&
+	(
+		cd main1 &&
+		git init sub2 &&
+		git -C sub2 fast-import --export-marks=../sub2-marks <../Y-sub-input &&
+		git fast-import --rewrite-submodules-from=sub:../Y-marks \
+			--rewrite-submodules-to=sub:sub2-marks <../Y-main-input &&
+		test "$(git rev-parse master)" = "$(test_oid Ymaster)"
+	)
+'
+
 test_done

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 01/24] builtin/pack-objects: make hash agnostic
  2020-02-22 20:17 ` [PATCH v2 01/24] builtin/pack-objects: make hash agnostic brian m. carlson
@ 2020-02-23 21:57   ` Jeff King
  2020-02-24  3:01     ` Jeff King
  2020-02-24  3:42     ` brian m. carlson
  0 siblings, 2 replies; 51+ messages in thread
From: Jeff King @ 2020-02-23 21:57 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Christian Couder, git

On Sat, Feb 22, 2020 at 08:17:26PM +0000, brian m. carlson wrote:

> Avoid hard-coding a hash size, instead preferring to use the_hash_algo.
> [...]
> -			hashwrite(out, base_sha1, 20);
> +			hashwrite(out, base_sha1, the_hash_algo->rawsz);

Yeah, looks obviously correct (and I think this is new from the
pack-reuse patches of mine that Christian sent recently).

The name "base_sha1" is clearly not great either. It could be changed
trivially, but the real sin is that it comes from
nth_packed_object_sha1(). It could be replaced with
nth_packed_object_oid() at the cost of an extra hash copy, which isn't
too bad.

It would be nice to get rid of that function entirely. In most cases,
it's either free to do so (we end up copying the result into an oid
anyway) or we pay for one extra hashcpy (out of the mmap into a local
struct). But the one in pack-check.c:verify_packfile() is tricky; we
store a pointer per object into the index mmap, and we'd have to switch
to storing an oid per object. Given that the code isn't commonly run
(and other operations like _generating_ the index in the first place are
clearly OK with the extra memory hit), I think I'd be OK with that in
the name of cleanliness.

All of that is sort of orthogonal to your goals, I think, so I don't
mind at all calling it out of scope for your series. As long as we use
the_hash_algo->rawsz, these bare pointer accesses aren't wrong (it's
just that it's easy to accidentally get it wrong, as this code shows).

I think it would also be correct to cast the mmap'd bytes to a "struct
object_id" given that the struct contains the hash bytes as the first
member. I worry a little that we'd get no compiler warning of the
breakage if that assumption changes, but it also seems unlikely to do
so.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 01/24] builtin/pack-objects: make hash agnostic
  2020-02-23 21:57   ` Jeff King
@ 2020-02-24  3:01     ` Jeff King
  2020-02-24  3:42     ` brian m. carlson
  1 sibling, 0 replies; 51+ messages in thread
From: Jeff King @ 2020-02-24  3:01 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Christian Couder, git

On Sun, Feb 23, 2020 at 04:57:59PM -0500, Jeff King wrote:

> It would be nice to get rid of [nth_packed_object_sha1] entirely. In most cases,
> it's either free to do so (we end up copying the result into an oid
> anyway) or we pay for one extra hashcpy (out of the mmap into a local
> struct). But the one in pack-check.c:verify_packfile() is tricky; we
> store a pointer per object into the index mmap, and we'd have to switch
> to storing an oid per object. Given that the code isn't commonly run
> (and other operations like _generating_ the index in the first place are
> clearly OK with the extra memory hit), I think I'd be OK with that in
> the name of cleanliness.
> 
> All of that is sort of orthogonal to your goals, I think, so I don't
> mind at all calling it out of scope for your series. As long as we use
> the_hash_algo->rawsz, these bare pointer accesses aren't wrong (it's
> just that it's easy to accidentally get it wrong, as this code shows).

I looked into this a bit further. It turns out that the current code in
verify_packfile() was explicitly trying to avoid that extra memory. But
the good news is that I think we can improve it further, cleaning up the
existing type-punning and using even less memory than now.

I'll prepare a separate series to do that. I had thought the cleanup
might also make this whole 20 / the_hash_algo->rawsz thing go away, but
it doesn't (the name hashwrite() made me think we ought to be using an
oidwrite(), but of course that's silly; the "hash" here is a "file with
a hash checksum" and not "an object id hash"). So the two topics really
are independent.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 01/24] builtin/pack-objects: make hash agnostic
  2020-02-23 21:57   ` Jeff King
  2020-02-24  3:01     ` Jeff King
@ 2020-02-24  3:42     ` brian m. carlson
  2020-02-24  4:20       ` Jeff King
  1 sibling, 1 reply; 51+ messages in thread
From: brian m. carlson @ 2020-02-24  3:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Christian Couder, git

[-- Attachment #1: Type: text/plain, Size: 2223 bytes --]

On 2020-02-23 at 21:57:59, Jeff King wrote:
> On Sat, Feb 22, 2020 at 08:17:26PM +0000, brian m. carlson wrote:
> 
> > Avoid hard-coding a hash size, instead preferring to use the_hash_algo.
> > [...]
> > -			hashwrite(out, base_sha1, 20);
> > +			hashwrite(out, base_sha1, the_hash_algo->rawsz);
> 
> Yeah, looks obviously correct (and I think this is new from the
> pack-reuse patches of mine that Christian sent recently).

I believe it is, which is why I CC'd you on it.

> The name "base_sha1" is clearly not great either. It could be changed
> trivially, but the real sin is that it comes from
> nth_packed_object_sha1(). It could be replaced with
> nth_packed_object_oid() at the cost of an extra hash copy, which isn't
> too bad.

I probably should send a series getting rid of the rest of the "sha1"
places in our code, because there are a lot of them, but I just haven't
gotten around to it yet.  And yeah, as you mentioned, there are still a
lot of places using nth_packed_object_sha1.

> It would be nice to get rid of that function entirely. In most cases,
> it's either free to do so (we end up copying the result into an oid
> anyway) or we pay for one extra hashcpy (out of the mmap into a local
> struct). But the one in pack-check.c:verify_packfile() is tricky; we
> store a pointer per object into the index mmap, and we'd have to switch
> to storing an oid per object. Given that the code isn't commonly run
> (and other operations like _generating_ the index in the first place are
> clearly OK with the extra memory hit), I think I'd be OK with that in
> the name of cleanliness.

Yeah, that's why I hadn't switched it out earlier.

> I think it would also be correct to cast the mmap'd bytes to a "struct
> object_id" given that the struct contains the hash bytes as the first
> member. I worry a little that we'd get no compiler warning of the
> breakage if that assumption changes, but it also seems unlikely to do
> so.

In the future, struct object_id will get a new member (an algorithm
value), but I think it's fine to make the assumption that the hash bytes
are first.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 01/24] builtin/pack-objects: make hash agnostic
  2020-02-24  3:42     ` brian m. carlson
@ 2020-02-24  4:20       ` Jeff King
  0 siblings, 0 replies; 51+ messages in thread
From: Jeff King @ 2020-02-24  4:20 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Christian Couder, git

On Mon, Feb 24, 2020 at 03:42:59AM +0000, brian m. carlson wrote:

> > Yeah, looks obviously correct (and I think this is new from the
> > pack-reuse patches of mine that Christian sent recently).
> 
> I believe it is, which is why I CC'd you on it.

Heh, yeah. I knew you knew, but was saying so for the rest of the
audience. :)

> > I think it would also be correct to cast the mmap'd bytes to a "struct
> > object_id" given that the struct contains the hash bytes as the first
> > member. I worry a little that we'd get no compiler warning of the
> > breakage if that assumption changes, but it also seems unlikely to do
> > so.
> 
> In the future, struct object_id will get a new member (an algorithm
> value), but I think it's fine to make the assumption that the hash bytes
> are first.

Yeah, I think that would continue to work, although weirdness would
ensue if anybody ever dereferenced the algorithm member in one of the
type-punned structs. If we can avoid it entirely, I think we should (and
it was easy to remove the spot in pack-check).

Patches incoming.

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 07/24] t6300: abstract away SHA-1-specific constants
  2020-02-22 20:17 ` [PATCH v2 07/24] t6300: abstract away SHA-1-specific constants brian m. carlson
@ 2020-02-24 18:01   ` Junio C Hamano
  2020-02-24 18:12     ` Jeff King
  0 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2020-02-24 18:01 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Adjust the test so that it computes variables for object IDs instead of
> using hard-coded hashes.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  t/t6300-for-each-ref.sh | 25 ++++++++++++++++---------
>  1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index 9c910ce746..2406b93d35 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -20,6 +20,10 @@ setdate_and_increment () {
>  }
>  
>  test_expect_success setup '
> +	test_oid_cache <<-EOF &&
> +	disklen sha1:138
> +	disklen sha256:154
> +	EOF
>  	setdate_and_increment &&
>  	echo "Using $datestamp" > one &&
>  	git add one &&
> @@ -50,6 +54,9 @@ test_atom() {
>  	"
>  }
>  
> +hexlen=$(test_oid hexsz)
> +disklen=$(test_oid disklen)
> +
>  test_atom head refname refs/heads/master
>  test_atom head refname: refs/heads/master
>  test_atom head refname:short master
> @@ -82,9 +89,9 @@ test_atom head push:rstrip=-1 refs
>  test_atom head push:strip=1 remotes/myfork/master
>  test_atom head push:strip=-1 master
>  test_atom head objecttype commit
> -test_atom head objectsize 171
> -test_atom head objectsize:disk 138
> -test_atom head deltabase 0000000000000000000000000000000000000000
> +test_atom head objectsize $((131 + hexlen))

171 == 131 + 40 and that is because we are looking at the initial
commit, whose contents has a single object name (i.e. its tree).

Makes sense.

> +test_atom head objectsize:disk $disklen
> +test_atom head deltabase $ZERO_OID
>  test_atom head objectname $(git rev-parse refs/heads/master)
>  test_atom head objectname:short $(git rev-parse --short refs/heads/master)
>  test_atom head objectname:short=1 $(git rev-parse --short=1 refs/heads/master)
> @@ -125,11 +132,11 @@ test_atom tag refname:short testtag
>  test_atom tag upstream ''
>  test_atom tag push ''
>  test_atom tag objecttype tag
> -test_atom tag objectsize 154
> -test_atom tag objectsize:disk 138
> -test_atom tag '*objectsize:disk' 138
> -test_atom tag deltabase 0000000000000000000000000000000000000000
> -test_atom tag '*deltabase' 0000000000000000000000000000000000000000
> +test_atom tag objectsize $((114 + hexlen))

Likewise, 154 == 114 + 40 because an annotated tag has an object
pointer to a single object (i.e. its pointee).

Makes sense, too.

> +test_atom tag objectsize:disk $disklen
> +test_atom tag '*objectsize:disk' $disklen
> +test_atom tag deltabase $ZERO_OID
> +test_atom tag '*deltabase' $ZERO_OID
>  test_atom tag objectname $(git rev-parse refs/tags/testtag)
>  test_atom tag objectname:short $(git rev-parse --short refs/tags/testtag)
>  test_atom head objectname:short=1 $(git rev-parse --short=1 refs/heads/master)
> @@ -139,7 +146,7 @@ test_atom tag parent ''
>  test_atom tag numparent ''
>  test_atom tag object $(git rev-parse refs/tags/testtag^0)
>  test_atom tag type 'commit'
> -test_atom tag '*objectname' 'ea122842f48be4afb2d1fc6a4b96c05885ab7463'
> +test_atom tag '*objectname' $(git rev-parse refs/tags/testtag^{})
>  test_atom tag '*objecttype' 'commit'
>  test_atom tag author ''
>  test_atom tag authorname ''

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 10/24] t/helper: initialize repository if necessary
  2020-02-22 20:17 ` [PATCH v2 10/24] t/helper: initialize repository if necessary brian m. carlson
@ 2020-02-24 18:05   ` Junio C Hamano
  2020-02-25  0:05     ` brian m. carlson
  0 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2020-02-24 18:05 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> The repository helper is used in t5318 to read commit graphs whether
> we're in a repository or not. However, without a repository, we have no
> way to properly initialize the hash algorithm, meaning that the file is
> misread.
>
> Fix this by calling setup_git_directory_gently, which will read the
> environment variable the testsuite sets to ensure that the correct hash
> algorithm is set.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  t/helper/test-repository.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c
> index f7f8618445..ecc768e4cb 100644
> --- a/t/helper/test-repository.c
> +++ b/t/helper/test-repository.c
> @@ -75,6 +75,10 @@ static void test_get_commit_tree_in_graph(const char *gitdir,
>  
>  int cmd__repository(int argc, const char **argv)
>  {
> +	int nongit_ok = 0;
> +
> +	setup_git_directory_gently(&nongit_ok);

No need to initialize nongit_ok to any specific value before calling
setup_git_directory_gently() and I personally find this initialization
unhelpful to new readers, as it misleadingly hints the setup process
may be affected by the value passed in by the value in nongit_ok,
when in reality the variable is purely used as outout-only (the
first thing the function does is to clear it).

Was it necessary to work around a compiler warning or something?

>  	if (argc < 2)
>  		die("must have at least 2 arguments");
>  	if (!strcmp(argv[1], "parse_commit_in_graph")) {

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 07/24] t6300: abstract away SHA-1-specific constants
  2020-02-24 18:01   ` Junio C Hamano
@ 2020-02-24 18:12     ` Jeff King
  2020-02-24 20:41       ` Junio C Hamano
  0 siblings, 1 reply; 51+ messages in thread
From: Jeff King @ 2020-02-24 18:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: brian m. carlson, git

On Mon, Feb 24, 2020 at 10:01:08AM -0800, Junio C Hamano wrote:

> > -test_atom head objectsize 171
> > -test_atom head objectsize:disk 138
> > -test_atom head deltabase 0000000000000000000000000000000000000000
> > +test_atom head objectsize $((131 + hexlen))
> 
> 171 == 131 + 40 and that is because we are looking at the initial
> commit, whose contents has a single object name (i.e. its tree).

I wonder if it would be more readable to just pipe "cat-file" through
"wc -c", rather than hard-coding. Then there's no magic number at all.

> > +test_atom head objectsize:disk $disklen

Likewise for $disklen, if it's a loose object we could just get the
information from the filesystem. That would stop us caring about the
hash, _and_ it would make us robust to random changes in the zlib
compression.

I'm not sure if we also check packed objects. If so, you can compute the
size from the output of show-index, which gives the offsets of each
object. That's also how Git does it internally, though, so I'm not sure
if that is getting too close to just testing nothing (but IMHO the thing
we're really covering here is the format routines).

-Peff

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 13/24] builtin/init-db: allow specifying hash algorithm on command line
  2020-02-22 20:17 ` [PATCH v2 13/24] builtin/init-db: allow specifying hash algorithm on command line brian m. carlson
@ 2020-02-24 18:14   ` Junio C Hamano
  2020-02-25  0:11     ` brian m. carlson
  0 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2020-02-24 18:14 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>  'git init' [-q | --quiet] [--bare] [--template=<template_directory>]
> -	  [--separate-git-dir <git dir>]
> +	  [--separate-git-dir <git dir>] [--object-format=<format]

A missing closing ket> in <bra-ket> pair.

> +#ifndef ENABLE_SHA256
> +	if (fmt->hash_algo != GIT_HASH_SHA1)
> +		die(_("The hash algorithm %s is not supported in this build."), hash_algos[fmt->hash_algo].name);

Could you fold the overlong line here?

>  int init_db(const char *git_dir, const char *real_git_dir,
> -	    const char *template_dir, unsigned int flags)
> +	    const char *template_dir, int hash, unsigned int flags)

Perhaps rename "hash" to "hash_algo"?  I know that it is very
unlikely for a variable whose name is 'hash' to be mistaken as a raw
hash value when its type is "int" (as opposed to say char *), but
still.  I wouldn't be saying this if its type were an "enum
hash_algo" or something like that.

> +	const char *object_format = NULL;
> +	int hash_algo = GIT_HASH_UNKNOWN;

This one _is_ good.

>  	const struct option init_db_options[] = {
>  		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
>  				N_("directory from which templates will be used")),
> @@ -494,6 +526,8 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
>  		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
>  		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
>  			   N_("separate git dir from working tree")),
> +		OPT_STRING(0, "object-format", &object_format, N_("hash"),
> +			   N_("specify the hash algorithm to use")),
>  		OPT_END()
>  	};
>  
> @@ -546,6 +580,12 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
>  		free(cwd);
>  	}
>  
> +	if (object_format) {
> +		hash_algo = hash_algo_by_name(object_format);
> +		if (hash_algo == GIT_HASH_UNKNOWN)
> +			die(_("unknown hash algorithm '%s'"), object_format);
> +	}
> +
>  	if (init_shared_repository != -1)
>  		set_shared_repository(init_shared_repository);
>  
> @@ -597,5 +637,5 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
>  	UNLEAK(work_tree);
>  
>  	flags |= INIT_DB_EXIST_OK;
> -	return init_db(git_dir, real_git_dir, template_dir, flags);
> +	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
>  }
> diff --git a/cache.h b/cache.h
> index 29ee02a8d4..7a47e023ba 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -627,7 +627,8 @@ int path_inside_repo(const char *prefix, const char *path);
>  #define INIT_DB_EXIST_OK 0x0002
>  
>  int init_db(const char *git_dir, const char *real_git_dir,
> -	    const char *template_dir, unsigned int flags);
> +	    const char *template_dir, int hash_algo,

So is this one.

> +	    unsigned int flags);
>  
>  void sanitize_stdfds(void);
>  int daemonize(void);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 17/24] commit: use expected signature header for SHA-256
  2020-02-22 20:17 ` [PATCH v2 17/24] commit: use expected signature header for SHA-256 brian m. carlson
@ 2020-02-24 18:24   ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2020-02-24 18:24 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> diff --git a/builtin/commit.c b/builtin/commit.c
> index 7ba33a3bec..798d362a2e 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1659,7 +1659,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>  	}
>  
>  	if (amend) {
> -		const char *exclude_gpgsig[2] = { "gpgsig", NULL };
> +		const char *exclude_gpgsig[3] = { "gpgsig", "gpgsig-sha256", NULL };

For futureproofing, we should eventually revamp the machinery of
read_commit_extra_headers() so that we do not have to maintain this
array.  Instead, the second parameter, that is passed down to
read_commit_extra_header_lines() helper, should become a callback
function that lets the caller say "no, I do not want this header",
and we can use it to exclude "gpgsig", "gpgsig-sha256", and anything
that begins with "gpgsig-" in the future.  We might even want to
exclude "gpgsig-$algo" where "$algo" is the name of the algorithm
that this version of Git does not yet recognise while giving a
warning, or perhaps we may want to cause read_commit_extra_headers()
to yield an error on gpgsig with unknown hashalgo.  These become
easier to arrange with such a machinery update.

One question I have is if it makes sense to do so now before this
series, or we leave a "NEEDSWORK:" note and complete SHA-2 transition
with the current mechanism.

> +test_expect_success 'commit with multiple signatures is okay' '
> +	git cat-file commit HEAD >basis &&
> +	cat >sigs <<-EOF &&
> +	gpgsig -----BEGIN PGP SIGNATURE-----
> +	  VGhpcyBpcyBub3QgcmVhbGx5IGEgc2lnbmF0dXJlLg==
> +	  -----END PGP SIGNATURE-----
> +	gpgsig-sha256 -----BEGIN PGP SIGNATURE-----
> +	  VGhpcyBpcyBub3QgcmVhbGx5IGEgc2lnbmF0dXJlLg==
> +	  -----END PGP SIGNATURE-----
> +	EOF
> +	sed -e "/^committer/q" basis >okay &&
> +	cat sigs >>okay &&
> +	echo >>okay &&
> +	sed -e "1,/^$/d" basis >>okay &&
> +	cat okay &&
> +	new=$(git hash-object -t commit -w --stdin <okay) &&
> +	test_when_finished "remove_object $new" &&
> +	git update-ref refs/heads/bogus "$new" &&
> +	test_when_finished "git update-ref -d refs/heads/bogus" &&
> +	git fsck 2>out &&
> +	cat out &&
> +	! grep "commit $new" out
> +'

There seem to be a few debugging leftover cats in the above.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 18/24] gpg-interface: improve interface for parsing tags
  2020-02-22 20:17 ` [PATCH v2 18/24] gpg-interface: improve interface for parsing tags brian m. carlson
@ 2020-02-24 18:26   ` Junio C Hamano
  2020-02-25 10:29   ` Johannes Schindelin
  1 sibling, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2020-02-24 18:26 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> We have a function which parses a buffer with a signature at the end,
> parse_signature, and this function is used for signed tags.  However,
> the transition plan has SHA-256 tags using a header, which is a
> materially different syntax.  The current interface is not suitable for
> parsing such tags.
>
> Adjust the parse_signature interface to store the parsed data in two
> strbufs and turn the existing function into parse_signed_buffer.  The
> latter is still used in places where we want to strip off the signature
> in a SHA-1 tag or in places where we know we always have a signed
> buffer, such as push certs.
>
> Adjust all the callers to deal with this new interface.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  builtin/fmt-merge-msg.c | 26 ++++++++++++++++++--------
>  builtin/receive-pack.c  |  4 ++--
>  builtin/tag.c           | 16 ++++++++++++----
>  commit.c                |  9 ++++++---
>  gpg-interface.c         | 13 ++++++++++++-
>  gpg-interface.h         |  9 ++++++++-
>  log-tree.c              | 14 ++++++++------
>  ref-filter.c            | 18 ++++++++++++++----
>  tag.c                   | 15 ++++++++-------
>  9 files changed, 88 insertions(+), 36 deletions(-)
>
> diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
> index 05a92c59d8..29f647e2d9 100644
> --- a/builtin/fmt-merge-msg.c
> +++ b/builtin/fmt-merge-msg.c
> @@ -472,6 +472,7 @@ static void fmt_tag_signature(struct strbuf *tagbuf,
>  			      const char *buf,
>  			      unsigned long len)
>  {
> +
>  	const char *tag_body = strstr(buf, "\n\n");
>  	if (tag_body) {
>  		tag_body += 2;

Is this hunk a pun to rhyme with the strstr ;-)?

> diff --git a/gpg-interface.c b/gpg-interface.c
> index 2d538bcd6e..b25f5c21d8 100644
> --- a/gpg-interface.c
> +++ b/gpg-interface.c
> @@ -345,7 +345,7 @@ void print_signature_buffer(const struct signature_check *sigc, unsigned flags)
>  		fputs(output, stderr);
>  }
>  
> -size_t parse_signature(const char *buf, size_t size)
> +size_t parse_signed_buffer(const char *buf, size_t size)
>  {
>  	size_t len = 0;
>  	size_t match = size;
> @@ -361,6 +361,17 @@ size_t parse_signature(const char *buf, size_t size)
>  	return match;
>  }
>  
> +int parse_signature(const char *buf, size_t size, struct strbuf *payload, struct strbuf *signature)
> +{
> +	size_t match = parse_signed_buffer(buf, size);
> +	if (match != size) {
> +		strbuf_add(payload, buf, match);
> +		strbuf_add(signature, buf + match, size - match);
> +		return 1;
> +	}
> +	return 0;
> +}
> +

Makes sense.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 19/24] tag: store SHA-256 signatures in a header
  2020-02-22 20:17 ` [PATCH v2 19/24] tag: store SHA-256 signatures in a header brian m. carlson
@ 2020-02-24 18:30   ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2020-02-24 18:30 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> In the future, we'll want to allow a user to sign both the SHA-1 version
> of a tag and the SHA-256 version at the same time.  Since for SHA-1 the
> signature is appended to the tag message, we must use a different way to
> allow multiple signature.

"multiple signatureS"?

> The transition plan envisions this using a gpgsig-sha256 header, much as
> for commits.  Refactor the commit code that performs parsing of this
> header and use it for tags that use SHA-256.  Check that we get tags in
> the correct format depending on what algorithm we're using.

Sounds sensible.

> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index b3c1092338..caeabfb293 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -54,6 +54,17 @@ test_atom() {
>  	"
>  }
>  
> +test_atom_hash () {
> +	local val
> +	if [ "$(test_oid algo)" = sha1 ]
> +	then

        if test "$(...)" = sha1
        then

I saw there are a few other copies of these in the result of
applying the whole series.

> +	else
> +
> +		git cat-file tag "$1" | sed -e '/^gpgsig-sha256/{s/^gpgsig-sha256 //;h;d};/^ /d;${p;x;/^$/d}'

Please line-wrap an overlong line, immediately after the pipe '|' character.

> +	fi
>  }

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3
  2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (23 preceding siblings ...)
  2020-02-22 20:17 ` [PATCH v2 24/24] fast-import: add options for rewriting submodules brian m. carlson
@ 2020-02-24 18:34 ` Junio C Hamano
  24 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2020-02-24 18:34 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> This is a series for part 1 of 3 of a stage 4 SHA-256 implementation.
> It is mostly the same as v1[0], which was RFC.  The interested reader is
> referred there for the in-depth explanations.
>
> A few interesting changes have taken place since v1.  First, I
> discovered a regression in a recent series which introduced a hard-coded
> constant, so patch 1 addresses this.
>
> Second, I discovered a mistaken assumption that we were making about our
> hash implementations: that copying the struct was sufficient to copy the
> context.  This is not true for libgcrypt, where our context is a pointer
> instead, so patch 2 addresses this with a helper function.
>
> Finally, I've added a check to prevent non-developers from creating
> SHA-256 repositories, since this series is not sufficient to implement
> full SHA-256 support.  Even as a developer, creating a SHA-256
> repository immediately leads to a broken state, since we don't recognize
> the extension (yet) and therefore promptly refuse to operate on it in
> any way.  Preventing this experience seemed prudent.

I am very tempted to take 1/24 separately and queue it at the tip of
the jk/packfile-reuse-cleanup topic.

I didn't read the fast-import bits at the end of the series, but
everything before those steps made sense to me.

Thanks.  


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 07/24] t6300: abstract away SHA-1-specific constants
  2020-02-24 18:12     ` Jeff King
@ 2020-02-24 20:41       ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2020-02-24 20:41 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, git

Jeff King <peff@peff.net> writes:

> On Mon, Feb 24, 2020 at 10:01:08AM -0800, Junio C Hamano wrote:
>
>> > -test_atom head objectsize 171
>> > -test_atom head objectsize:disk 138
>> > -test_atom head deltabase 0000000000000000000000000000000000000000
>> > +test_atom head objectsize $((131 + hexlen))
>> 
>> 171 == 131 + 40 and that is because we are looking at the initial
>> commit, whose contents has a single object name (i.e. its tree).
>
> I wonder if it would be more readable to just pipe "cat-file" through
> "wc -c", rather than hard-coding. Then there's no magic number at all.

After seeing nearby tests use output from $(git rev-parse) as the
expected output, I had a similar thought.

>> > +test_atom head objectsize:disk $disklen
>
> Likewise for $disklen, if it's a loose object we could just get the
> information from the filesystem. That would stop us caring about the
> hash, _and_ it would make us robust to random changes in the zlib
> compression.
>
> I'm not sure if we also check packed objects. If so, you can compute the
> size from the output of show-index, which gives the offsets of each
> object. That's also how Git does it internally, though, so I'm not sure
> if that is getting too close to just testing nothing (but IMHO the thing
> we're really covering here is the format routines).

Somebody may find it tempting to use "cat-file --batch-check=<format>"
and at that point it would really become fuzzy what we are checking ;-)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 10/24] t/helper: initialize repository if necessary
  2020-02-24 18:05   ` Junio C Hamano
@ 2020-02-25  0:05     ` brian m. carlson
  0 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-25  0:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1213 bytes --]

On 2020-02-24 at 18:05:55, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c
> > index f7f8618445..ecc768e4cb 100644
> > --- a/t/helper/test-repository.c
> > +++ b/t/helper/test-repository.c
> > @@ -75,6 +75,10 @@ static void test_get_commit_tree_in_graph(const char *gitdir,
> >  
> >  int cmd__repository(int argc, const char **argv)
> >  {
> > +	int nongit_ok = 0;
> > +
> > +	setup_git_directory_gently(&nongit_ok);
> 
> No need to initialize nongit_ok to any specific value before calling
> setup_git_directory_gently() and I personally find this initialization
> unhelpful to new readers, as it misleadingly hints the setup process
> may be affected by the value passed in by the value in nongit_ok,
> when in reality the variable is purely used as outout-only (the
> first thing the function does is to clear it).
> 
> Was it necessary to work around a compiler warning or something?

I don't recall, but since this seems to be the only place we do that, I
assume it's not, and I'll just drop it.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 13/24] builtin/init-db: allow specifying hash algorithm on command line
  2020-02-24 18:14   ` Junio C Hamano
@ 2020-02-25  0:11     ` brian m. carlson
  0 siblings, 0 replies; 51+ messages in thread
From: brian m. carlson @ 2020-02-25  0:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1210 bytes --]

On 2020-02-24 at 18:14:33, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> >  'git init' [-q | --quiet] [--bare] [--template=<template_directory>]
> > -	  [--separate-git-dir <git dir>]
> > +	  [--separate-git-dir <git dir>] [--object-format=<format]
> 
> A missing closing ket> in <bra-ket> pair.

Good catch.

> > +#ifndef ENABLE_SHA256
> > +	if (fmt->hash_algo != GIT_HASH_SHA1)
> > +		die(_("The hash algorithm %s is not supported in this build."), hash_algos[fmt->hash_algo].name);
> 
> Could you fold the overlong line here?

Absolutely.

> >  int init_db(const char *git_dir, const char *real_git_dir,
> > -	    const char *template_dir, unsigned int flags)
> > +	    const char *template_dir, int hash, unsigned int flags)
> 
> Perhaps rename "hash" to "hash_algo"?  I know that it is very
> unlikely for a variable whose name is 'hash' to be mistaken as a raw
> hash value when its type is "int" (as opposed to say char *), but
> still.  I wouldn't be saying this if its type were an "enum
> hash_algo" or something like that.

Sure, that's reasonable.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 18/24] gpg-interface: improve interface for parsing tags
  2020-02-22 20:17 ` [PATCH v2 18/24] gpg-interface: improve interface for parsing tags brian m. carlson
  2020-02-24 18:26   ` Junio C Hamano
@ 2020-02-25 10:29   ` Johannes Schindelin
  2020-02-25 19:25     ` Junio C Hamano
  2020-02-26  2:23     ` brian m. carlson
  1 sibling, 2 replies; 51+ messages in thread
From: Johannes Schindelin @ 2020-02-25 10:29 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

Hi brian,

On Sat, 22 Feb 2020, brian m. carlson wrote:

> diff --git a/ref-filter.c b/ref-filter.c
> index 6867e33648..212f1165bb 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1161,7 +1161,13 @@ static void find_subpos(const char *buf,
>  			unsigned long *nonsiglen,
>  			const char **sig, unsigned long *siglen)
>  {
> +	struct strbuf payload = STRBUF_INIT;
> +	struct strbuf signature = STRBUF_INIT;
>  	const char *eol;
> +	const char *end = buf + strlen(buf);
> +	const char *sigstart;
> +
> +
>  	/* skip past header until we hit empty line */
>  	while (*buf && *buf != '\n') {
>  		eol = strchrnul(buf, '\n');
> @@ -1174,13 +1180,14 @@ static void find_subpos(const char *buf,
>  		buf++;
>
>  	/* parse signature first; we might not even have a subject line */
> -	*sig = buf + parse_signature(buf, strlen(buf));
> -	*siglen = strlen(*sig);
> +	parse_signature(buf, end - buf, &payload, &signature);
> +	*sig = strbuf_detach(&signature, siglen);

While I like the spirit of this patch, it makes the Windows build fail. I
put this on top of Git for Windows' `shears/pu` branch to fix it (maybe
you could adopt a variation of it?):

-- snipsnap --
Subject: [PATCH] fixup??? gpg-interface: improve interface for parsing tags

In 3f69139fa39 (gpg-interface: improve interface for parsing tags,
2020-02-22), we introduce a call to `strbuf_detach()`. The second
parameter of that function takes a pointer to the variable where the
length of the string should be stored.

However, `strbuf_detach()` uses the 21st century data type `size_t`,
while the existing code in `find_subpos()` uses the `unsigned long` data
type that the 1980/1980 so desperately want back.

Unsurprisingly, this causes problems with the Windows build, where
`sizeof(unsigned long) == 4` even in 64-bit builds (which, I feel the
need to add, is totally legitimate).

We should probably change the data type to the correct one in a
preparatory patch, before improving the interface for parsing tags.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 ref-filter.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 164ab62d15e..bab34b9ef74 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1156,10 +1156,10 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
 }

 static void find_subpos(const char *buf,
-			const char **sub, unsigned long *sublen,
-			const char **body, unsigned long *bodylen,
-			unsigned long *nonsiglen,
-			const char **sig, unsigned long *siglen)
+			const char **sub, size_t *sublen,
+			const char **body, size_t *bodylen,
+			size_t *nonsiglen,
+			const char **sig, size_t *siglen)
 {
 	struct strbuf payload = STRBUF_INIT;
 	struct strbuf signature = STRBUF_INIT;
@@ -1234,7 +1234,7 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 {
 	int i;
 	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
-	unsigned long sublen = 0, bodylen = 0, nonsiglen = 0, siglen = 0;
+	size_t sublen = 0, bodylen = 0, nonsiglen = 0, siglen = 0;

 	for (i = 0; i < used_atom_cnt; i++) {
 		struct used_atom *atom = &used_atom[i];
--
2.25.1.windows.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 18/24] gpg-interface: improve interface for parsing tags
  2020-02-25 10:29   ` Johannes Schindelin
@ 2020-02-25 19:25     ` Junio C Hamano
  2020-02-26  3:05       ` brian m. carlson
  2020-02-26  2:23     ` brian m. carlson
  1 sibling, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2020-02-25 19:25 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: brian m. carlson, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>>  	/* parse signature first; we might not even have a subject line */
>> -	*sig = buf + parse_signature(buf, strlen(buf));
>> -	*siglen = strlen(*sig);
>> +	parse_signature(buf, end - buf, &payload, &signature);
>> +	*sig = strbuf_detach(&signature, siglen);
>
> While I like the spirit of this patch, it makes the Windows build fail. I
> put this on top of Git for Windows' `shears/pu` branch to fix it (maybe
> you could adopt a variation of it?):

FWIW, it's of course not just Windows, but on anything whose size_t
!= ulong, e.g. https://travis-ci.org/git/git/jobs/654661788 (Linux32)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 18/24] gpg-interface: improve interface for parsing tags
  2020-02-25 10:29   ` Johannes Schindelin
  2020-02-25 19:25     ` Junio C Hamano
@ 2020-02-26  2:23     ` brian m. carlson
  2020-02-27 13:24       ` Johannes Schindelin
  1 sibling, 1 reply; 51+ messages in thread
From: brian m. carlson @ 2020-02-26  2:23 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1474 bytes --]

On 2020-02-25 at 10:29:26, Johannes Schindelin wrote:
> Hi brian,
> 
> On Sat, 22 Feb 2020, brian m. carlson wrote:
> 
> > diff --git a/ref-filter.c b/ref-filter.c
> > index 6867e33648..212f1165bb 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -1161,7 +1161,13 @@ static void find_subpos(const char *buf,
> >  			unsigned long *nonsiglen,
> >  			const char **sig, unsigned long *siglen)
> >  {
> > +	struct strbuf payload = STRBUF_INIT;
> > +	struct strbuf signature = STRBUF_INIT;
> >  	const char *eol;
> > +	const char *end = buf + strlen(buf);
> > +	const char *sigstart;
> > +
> > +
> >  	/* skip past header until we hit empty line */
> >  	while (*buf && *buf != '\n') {
> >  		eol = strchrnul(buf, '\n');
> > @@ -1174,13 +1180,14 @@ static void find_subpos(const char *buf,
> >  		buf++;
> >
> >  	/* parse signature first; we might not even have a subject line */
> > -	*sig = buf + parse_signature(buf, strlen(buf));
> > -	*siglen = strlen(*sig);
> > +	parse_signature(buf, end - buf, &payload, &signature);
> > +	*sig = strbuf_detach(&signature, siglen);
> 
> While I like the spirit of this patch, it makes the Windows build fail. I
> put this on top of Git for Windows' `shears/pu` branch to fix it (maybe
> you could adopt a variation of it?):

I'm happy to squash this in.  Sorry for the breakage, and thanks for
catching this.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 18/24] gpg-interface: improve interface for parsing tags
  2020-02-25 19:25     ` Junio C Hamano
@ 2020-02-26  3:05       ` brian m. carlson
  2020-02-26  3:11         ` Junio C Hamano
  0 siblings, 1 reply; 51+ messages in thread
From: brian m. carlson @ 2020-02-26  3:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

[-- Attachment #1: Type: text/plain, Size: 1037 bytes --]

On 2020-02-25 at 19:25:46, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> >>  	/* parse signature first; we might not even have a subject line */
> >> -	*sig = buf + parse_signature(buf, strlen(buf));
> >> -	*siglen = strlen(*sig);
> >> +	parse_signature(buf, end - buf, &payload, &signature);
> >> +	*sig = strbuf_detach(&signature, siglen);
> >
> > While I like the spirit of this patch, it makes the Windows build fail. I
> > put this on top of Git for Windows' `shears/pu` branch to fix it (maybe
> > you could adopt a variation of it?):
> 
> FWIW, it's of course not just Windows, but on anything whose size_t
> != ulong, e.g. https://travis-ci.org/git/git/jobs/654661788 (Linux32)

I will likely not get to a reroll before Thursday at the earliest due to
other commitments, so if it's more convenient for you to eject this from
pu and just pick up v3 when I get around to it, please feel free.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 18/24] gpg-interface: improve interface for parsing tags
  2020-02-26  3:05       ` brian m. carlson
@ 2020-02-26  3:11         ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2020-02-26  3:11 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Johannes Schindelin, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I will likely not get to a reroll before Thursday at the earliest due to
> other commitments, so if it's more convenient for you to eject this from
> pu and just pick up v3 when I get around to it, please feel free.

Thanks, but I think a minimal SQUASH??? I added while queuing to
'pu' seems to have made it pass so that would have to do for now.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 18/24] gpg-interface: improve interface for parsing tags
  2020-02-26  2:23     ` brian m. carlson
@ 2020-02-27 13:24       ` Johannes Schindelin
  2020-02-27 15:06         ` Junio C Hamano
  0 siblings, 1 reply; 51+ messages in thread
From: Johannes Schindelin @ 2020-02-27 13:24 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

Hi brian,

On Wed, 26 Feb 2020, brian m. carlson wrote:

> On 2020-02-25 at 10:29:26, Johannes Schindelin wrote:
> >
> > On Sat, 22 Feb 2020, brian m. carlson wrote:
> >
> > > diff --git a/ref-filter.c b/ref-filter.c
> > > index 6867e33648..212f1165bb 100644
> > > --- a/ref-filter.c
> > > +++ b/ref-filter.c
> > > @@ -1161,7 +1161,13 @@ static void find_subpos(const char *buf,
> > >  			unsigned long *nonsiglen,
> > >  			const char **sig, unsigned long *siglen)
> > >  {
> > > +	struct strbuf payload = STRBUF_INIT;
> > > +	struct strbuf signature = STRBUF_INIT;
> > >  	const char *eol;
> > > +	const char *end = buf + strlen(buf);
> > > +	const char *sigstart;
> > > +
> > > +
> > >  	/* skip past header until we hit empty line */
> > >  	while (*buf && *buf != '\n') {
> > >  		eol = strchrnul(buf, '\n');
> > > @@ -1174,13 +1180,14 @@ static void find_subpos(const char *buf,
> > >  		buf++;
> > >
> > >  	/* parse signature first; we might not even have a subject line */
> > > -	*sig = buf + parse_signature(buf, strlen(buf));
> > > -	*siglen = strlen(*sig);
> > > +	parse_signature(buf, end - buf, &payload, &signature);
> > > +	*sig = strbuf_detach(&signature, siglen);
> >
> > While I like the spirit of this patch, it makes the Windows build fail. I
> > put this on top of Git for Windows' `shears/pu` branch to fix it (maybe
> > you could adopt a variation of it?):
>
> I'm happy to squash this in.  Sorry for the breakage, and thanks for
> catching this.

You're welcome, but credit for catching it should go to Azure Pipelines
;-)

To be honest, I am rather happy how these CI builds help us catch things
already when they are in `pu`. _Quite_ happy.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 18/24] gpg-interface: improve interface for parsing tags
  2020-02-27 13:24       ` Johannes Schindelin
@ 2020-02-27 15:06         ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2020-02-27 15:06 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: brian m. carlson, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> You're welcome, but credit for catching it should go to Azure Pipelines
> ;-)
>
> To be honest, I am rather happy how these CI builds help us catch things
> already when they are in `pu`. _Quite_ happy.

Oh, you do not have to be honest for that---you show that all the
time ;-)

And I am happy that I can throw a new topic to 'pu' and CI will
catch issues before I even look at it seriously to consider if it is
worth keeping it.

One sad thing about Azure Pipelines thing is that, as far as I
recall, we hear about breakages it discovered only from you and
nobody else, while we hear about breakages discovered by Travis by
at least multiple people, and that is *not* because the coverage is
wider or report is more accesible and easier to read there.  It's
that our CI at Travis forced awareness of its existence on us by
simply being the only one out there for a long time.

We are bound to add issues to the tip of 'pu' (e.g. 64-vs-32,
bytesex, or GNUisms) that are easy to discover by building on
different platforms and in different configurations every once in a
while.  We should advertise more to gain awareness so that other
people can also notice breakages Travis's limited coverage may have
missed even when you are on vacation.

If you want quantitative measure of success ;-) (sorry, I couldn't
resist as it has been Perf period around here), I want to see at
least two people outside Microsoft to notice and report breakages
first found by Azure Pipelines CI in the coming 6 months.  How can
we help make that happen?

Thanks.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 20/24] fast-import: permit reading multiple marks files
  2020-02-22 20:17 ` [PATCH v2 20/24] fast-import: permit reading multiple marks files brian m. carlson
@ 2020-06-05 16:14   ` Junio C Hamano
  2020-06-05 16:26     ` Junio C Hamano
  0 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2020-06-05 16:14 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> In the future, we'll want to read marks files for submodules as well.
> Refactor the existing code to make it possible to read multiple marks
> files, each into their own marks set.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  fast-import.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)

This conversion looks a bit strange.  We are working off of 'marks'
that is file scope global, and the topmost caller passes the pointer
to that single instance down the callchain to this function (among
others) in "s", and it all looked no-op to my eye, but this actually
updates the global 'marks'.

> diff --git a/fast-import.c b/fast-import.c
> index b8b65a801c..b9ecd89699 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -493,25 +493,24 @@ static char *pool_strdup(const char *s)
>  	return r;
>  }
>  
> -static void insert_mark(uintmax_t idnum, struct object_entry *oe)
> +static void insert_mark(struct mark_set *s, uintmax_t idnum, struct object_entry *oe)
>  {
> -	struct mark_set *s = marks;

This is no longer needed as the caller gave the 'marks' to us in
's'.  Up to this line, this step is a no-op patch. 

>  	while ((idnum >> s->shift) >= 1024) {
>  		s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));

Here we overwrite 's', and we lost the pointer the caller gave us!!!

>  		s->shift = marks->shift + 10;

And we still work off of global marks here and

>  		s->data.sets[0] = marks;
>  		marks = s;

try to swap it here.  But if our caller is also using 's' that was
passed from its caller, this assignment would not be seen by our
caller.  We'd need to pass a pointer to caller's 's' to this
function and update it around here.

>  	}
>  	while (s->shift) {
>  		uintmax_t i = idnum >> s->shift;
>  		idnum -= i << s->shift;
>  		if (!s->data.sets[i]) {
>  			s->data.sets[i] = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
>  			s->data.sets[i]->shift = s->shift - 10;
>  		}
>  		s = s->data.sets[i];
>  	}
>  	if (!s->data.marked[idnum])
>  		marks_set_count++;

I am not sure if this global also needs a separate counter and I
didn't bother to check.

>  	s->data.marked[idnum] = oe;
>  }

With this patch t9300 still seems to pass.

I do not know if this is the fix for Billes Tibor's "fast-import
oom" issue <c53bb69b-682d-3b47-4ed0-5f4559e69e37@gmx.com> but the
change to this function stood out.

Thanks.


 fast-import.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 0dfa14dc8c..f39b7890ac 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -534,13 +534,15 @@ static char *pool_strdup(const char *s)
 	return r;
 }
 
-static void insert_mark(struct mark_set *s, uintmax_t idnum, struct object_entry *oe)
+static void insert_mark(struct mark_set **sp, uintmax_t idnum, struct object_entry *oe)
 {
+	struct mark_set *s = *sp;
+
 	while ((idnum >> s->shift) >= 1024) {
 		s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
-		s->shift = marks->shift + 10;
-		s->data.sets[0] = marks;
-		marks = s;
+		s->shift = (*sp)->shift + 10;
+		s->data.sets[0] = (*sp);
+		(*sp) = s;
 	}
 	while (s->shift) {
 		uintmax_t i = idnum >> s->shift;
@@ -958,7 +960,7 @@ static int store_object(
 
 	e = insert_object(&oid);
 	if (mark)
-		insert_mark(marks, mark, e);
+		insert_mark(&marks, mark, e);
 	if (e->idx.offset) {
 		duplicate_count_by_type[type]++;
 		return 1;
@@ -1156,7 +1158,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 	e = insert_object(&oid);
 
 	if (mark)
-		insert_mark(marks, mark, e);
+		insert_mark(&marks, mark, e);
 
 	if (e->idx.offset) {
 		duplicate_count_by_type[OBJ_BLOB]++;
@@ -1745,12 +1747,12 @@ static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintm
 		e->pack_id = MAX_PACK_ID;
 		e->idx.offset = 1; /* just not zero! */
 	}
-	insert_mark(s, mark, e);
+	insert_mark(&s, mark, e);
 }
 
 static void insert_oid_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark)
 {
-	insert_mark(s, mark, xmemdupz(oid, sizeof(*oid)));
+	insert_mark(&s, mark, xmemdupz(oid, sizeof(*oid)));
 }
 
 static void read_mark_file(struct mark_set *s, FILE *f, mark_set_inserter_t inserter)
@@ -3242,7 +3244,7 @@ static void parse_alias(void)
 		die(_("Expected 'to' command, got %s"), command_buf.buf);
 	e = find_object(&b.oid);
 	assert(e);
-	insert_mark(marks, next_mark, e);
+	insert_mark(&marks, next_mark, e);
 }
 
 static char* make_fast_import_path(const char *path)

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 20/24] fast-import: permit reading multiple marks files
  2020-06-05 16:14   ` Junio C Hamano
@ 2020-06-05 16:26     ` Junio C Hamano
  2020-06-05 22:39       ` brian m. carlson
  0 siblings, 1 reply; 51+ messages in thread
From: Junio C Hamano @ 2020-06-05 16:26 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> @@ -1745,12 +1747,12 @@ static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintm
>  		e->pack_id = MAX_PACK_ID;
>  		e->idx.offset = 1; /* just not zero! */
>  	}
> -	insert_mark(s, mark, e);
> +	insert_mark(&s, mark, e);
>  }
>  
>  static void insert_oid_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark)
>  {
> -	insert_mark(s, mark, xmemdupz(oid, sizeof(*oid)));
> +	insert_mark(&s, mark, xmemdupz(oid, sizeof(*oid)));
>  }
>  

Actually the above "fix" still has the same issue; we may have made
the on-stack copy of 's' that are parameters these two functions see
as callees, but the caller of these two functions would not see the
effect of the munging done deep in the callchain by insert_mark().

The read_marks() function calls read_mark_file() with "marks", but I
think you'd need to pass pointer to that global singleton and teach
mark_set_inserter_t function to take a pointer to a pointer to
struct mark_set throughout, so that the instance the top-level
caller (read_marks in this callchain) gets updated.

Here is another try.

 fast-import.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 0dfa14dc8c..ed87d6e380 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -150,7 +150,7 @@ struct recent_command {
 	char *buf;
 };
 
-typedef void (*mark_set_inserter_t)(struct mark_set *s, struct object_id *oid, uintmax_t mark);
+typedef void (*mark_set_inserter_t)(struct mark_set **s, struct object_id *oid, uintmax_t mark);
 typedef void (*each_mark_fn_t)(uintmax_t mark, void *obj, void *cbp);
 
 /* Configured limits on output */
@@ -534,13 +534,15 @@ static char *pool_strdup(const char *s)
 	return r;
 }
 
-static void insert_mark(struct mark_set *s, uintmax_t idnum, struct object_entry *oe)
+static void insert_mark(struct mark_set **sp, uintmax_t idnum, struct object_entry *oe)
 {
+	struct mark_set *s = *sp;
+
 	while ((idnum >> s->shift) >= 1024) {
 		s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
-		s->shift = marks->shift + 10;
-		s->data.sets[0] = marks;
-		marks = s;
+		s->shift = (*sp)->shift + 10;
+		s->data.sets[0] = (*sp);
+		(*sp) = s;
 	}
 	while (s->shift) {
 		uintmax_t i = idnum >> s->shift;
@@ -958,7 +960,7 @@ static int store_object(
 
 	e = insert_object(&oid);
 	if (mark)
-		insert_mark(marks, mark, e);
+		insert_mark(&marks, mark, e);
 	if (e->idx.offset) {
 		duplicate_count_by_type[type]++;
 		return 1;
@@ -1156,7 +1158,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 	e = insert_object(&oid);
 
 	if (mark)
-		insert_mark(marks, mark, e);
+		insert_mark(&marks, mark, e);
 
 	if (e->idx.offset) {
 		duplicate_count_by_type[OBJ_BLOB]++;
@@ -1731,7 +1733,7 @@ static void dump_marks(void)
 	}
 }
 
-static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark)
+static void insert_object_entry(struct mark_set **s, struct object_id *oid, uintmax_t mark)
 {
 	struct object_entry *e;
 	e = find_object(oid);
@@ -1748,12 +1750,12 @@ static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintm
 	insert_mark(s, mark, e);
 }
 
-static void insert_oid_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark)
+static void insert_oid_entry(struct mark_set **s, struct object_id *oid, uintmax_t mark)
 {
 	insert_mark(s, mark, xmemdupz(oid, sizeof(*oid)));
 }
 
-static void read_mark_file(struct mark_set *s, FILE *f, mark_set_inserter_t inserter)
+static void read_mark_file(struct mark_set **s, FILE *f, mark_set_inserter_t inserter)
 {
 	char line[512];
 	while (fgets(line, sizeof(line), f)) {
@@ -1786,7 +1788,7 @@ static void read_marks(void)
 		goto done; /* Marks file does not exist */
 	else
 		die_errno("cannot read '%s'", import_marks_file);
-	read_mark_file(marks, f, insert_object_entry);
+	read_mark_file(&marks, f, insert_object_entry);
 	fclose(f);
 done:
 	import_marks_file_done = 1;
@@ -3242,7 +3244,7 @@ static void parse_alias(void)
 		die(_("Expected 'to' command, got %s"), command_buf.buf);
 	e = find_object(&b.oid);
 	assert(e);
-	insert_mark(marks, next_mark, e);
+	insert_mark(&marks, next_mark, e);
 }
 
 static char* make_fast_import_path(const char *path)
@@ -3340,7 +3342,7 @@ static void option_rewrite_submodules(const char *arg, struct string_list *list)
 	fp = fopen(f, "r");
 	if (!fp)
 		die_errno("cannot read '%s'", f);
-	read_mark_file(ms, fp, insert_oid_entry);
+	read_mark_file(&ms, fp, insert_oid_entry);
 	fclose(fp);
 }
 

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 20/24] fast-import: permit reading multiple marks files
  2020-06-05 16:26     ` Junio C Hamano
@ 2020-06-05 22:39       ` brian m. carlson
  2020-06-05 22:53         ` Junio C Hamano
  0 siblings, 1 reply; 51+ messages in thread
From: brian m. carlson @ 2020-06-05 22:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1963 bytes --]

On 2020-06-05 at 16:26:02, Junio C Hamano wrote:
> diff --git a/fast-import.c b/fast-import.c
> index 0dfa14dc8c..ed87d6e380 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -150,7 +150,7 @@ struct recent_command {
>  	char *buf;
>  };
>  
> -typedef void (*mark_set_inserter_t)(struct mark_set *s, struct object_id *oid, uintmax_t mark);
> +typedef void (*mark_set_inserter_t)(struct mark_set **s, struct object_id *oid, uintmax_t mark);
>  typedef void (*each_mark_fn_t)(uintmax_t mark, void *obj, void *cbp);
>  
>  /* Configured limits on output */
> @@ -534,13 +534,15 @@ static char *pool_strdup(const char *s)
>  	return r;
>  }
>  
> -static void insert_mark(struct mark_set *s, uintmax_t idnum, struct object_entry *oe)
> +static void insert_mark(struct mark_set **sp, uintmax_t idnum, struct object_entry *oe)
>  {
> +	struct mark_set *s = *sp;
> +
>  	while ((idnum >> s->shift) >= 1024) {
>  		s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
> -		s->shift = marks->shift + 10;
> -		s->data.sets[0] = marks;
> -		marks = s;
> +		s->shift = (*sp)->shift + 10;
> +		s->data.sets[0] = (*sp);
> +		(*sp) = s;
>  	}

Yeah, this is much better.  I'm not sure how I missed converting that
block, but it definitely leaks heavily, and obviously we don't want to
update the actual marks variable in every case, since that defeats the
purpose of the patch.

We don't actually hit this case in any of the tests because we don't
have any tests that have 1024 marks in them.  I'm having trouble coming
up with a test even with a large number of marks because I don't think
we produce different behavior here; we just leak a lot of memory.

This does fix the reported issue, as I suspected.

Do you want to write this up into a patch with a commit message, or
should I?  If the latter, may I have your sign-off for it?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 20/24] fast-import: permit reading multiple marks files
  2020-06-05 22:39       ` brian m. carlson
@ 2020-06-05 22:53         ` Junio C Hamano
  0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2020-06-05 22:53 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> We don't actually hit this case in any of the tests because we don't
> have any tests that have 1024 marks in them.  I'm having trouble coming
> up with a test even with a large number of marks because I don't think
> we produce different behavior here; we just leak a lot of memory.
>
> This does fix the reported issue, as I suspected.
>
> Do you want to write this up into a patch with a commit message, or
> should I?  If the latter, may I have your sign-off for it?

Sure, anything I write for this project you can assume to be signed
off by me.  As I was just "fixing" what I thought was a botched
conversion, without really figuring out where it actually leads to
leaks, I'd need further work if I were to wrap it up with a sensible
log message, so thanks for volunteering ;-)

Signed-off-by: Junio C Hamano <gitster@pobox.com>

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2020-06-05 22:54 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-22 20:17 [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
2020-02-22 20:17 ` [PATCH v2 01/24] builtin/pack-objects: make hash agnostic brian m. carlson
2020-02-23 21:57   ` Jeff King
2020-02-24  3:01     ` Jeff King
2020-02-24  3:42     ` brian m. carlson
2020-02-24  4:20       ` Jeff King
2020-02-22 20:17 ` [PATCH v2 02/24] hash: implement and use a context cloning function brian m. carlson
2020-02-22 20:17 ` [PATCH v2 03/24] hex: introduce parsing variants taking hash algorithms brian m. carlson
2020-02-22 20:17 ` [PATCH v2 04/24] hex: add functions to parse hex object IDs in any algorithm brian m. carlson
2020-02-22 20:17 ` [PATCH v2 05/24] repository: require a build flag to use SHA-256 brian m. carlson
2020-02-22 20:17 ` [PATCH v2 06/24] t: use hash-specific lookup tables to define test constants brian m. carlson
2020-02-22 20:17 ` [PATCH v2 07/24] t6300: abstract away SHA-1-specific constants brian m. carlson
2020-02-24 18:01   ` Junio C Hamano
2020-02-24 18:12     ` Jeff King
2020-02-24 20:41       ` Junio C Hamano
2020-02-22 20:17 ` [PATCH v2 08/24] t6300: make hash algorithm independent brian m. carlson
2020-02-22 20:17 ` [PATCH v2 09/24] t/helper/test-dump-split-index: initialize git repository brian m. carlson
2020-02-22 20:17 ` [PATCH v2 10/24] t/helper: initialize repository if necessary brian m. carlson
2020-02-24 18:05   ` Junio C Hamano
2020-02-25  0:05     ` brian m. carlson
2020-02-22 20:17 ` [PATCH v2 11/24] t/helper: make repository tests hash independent brian m. carlson
2020-02-22 20:17 ` [PATCH v2 12/24] setup: allow check_repository_format to read repository format brian m. carlson
2020-02-22 20:17 ` [PATCH v2 13/24] builtin/init-db: allow specifying hash algorithm on command line brian m. carlson
2020-02-24 18:14   ` Junio C Hamano
2020-02-25  0:11     ` brian m. carlson
2020-02-22 20:17 ` [PATCH v2 14/24] builtin/init-db: add environment variable for new repo hash brian m. carlson
2020-02-22 20:17 ` [PATCH v2 15/24] init-db: move writing repo version into a function brian m. carlson
2020-02-22 20:17 ` [PATCH v2 16/24] worktree: allow repository version 1 brian m. carlson
2020-02-22 20:17 ` [PATCH v2 17/24] commit: use expected signature header for SHA-256 brian m. carlson
2020-02-24 18:24   ` Junio C Hamano
2020-02-22 20:17 ` [PATCH v2 18/24] gpg-interface: improve interface for parsing tags brian m. carlson
2020-02-24 18:26   ` Junio C Hamano
2020-02-25 10:29   ` Johannes Schindelin
2020-02-25 19:25     ` Junio C Hamano
2020-02-26  3:05       ` brian m. carlson
2020-02-26  3:11         ` Junio C Hamano
2020-02-26  2:23     ` brian m. carlson
2020-02-27 13:24       ` Johannes Schindelin
2020-02-27 15:06         ` Junio C Hamano
2020-02-22 20:17 ` [PATCH v2 19/24] tag: store SHA-256 signatures in a header brian m. carlson
2020-02-24 18:30   ` Junio C Hamano
2020-02-22 20:17 ` [PATCH v2 20/24] fast-import: permit reading multiple marks files brian m. carlson
2020-06-05 16:14   ` Junio C Hamano
2020-06-05 16:26     ` Junio C Hamano
2020-06-05 22:39       ` brian m. carlson
2020-06-05 22:53         ` Junio C Hamano
2020-02-22 20:17 ` [PATCH v2 21/24] fast-import: add helper function for inserting mark object entries brian m. carlson
2020-02-22 20:17 ` [PATCH v2 22/24] fast-import: make find_marks work on any mark set brian m. carlson
2020-02-22 20:17 ` [PATCH v2 23/24] fast-import: add a generic function to iterate over marks brian m. carlson
2020-02-22 20:17 ` [PATCH v2 24/24] fast-import: add options for rewriting submodules brian m. carlson
2020-02-24 18:34 ` [PATCH v2 00/24] SHA-256 stage 4 implementation, part 1/3 Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).