git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 00/15] SHA-256 / SHA-1 interop, part 1
@ 2021-04-10 15:21 brian m. carlson
  2021-04-10 15:21 ` [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm brian m. carlson
                   ` (14 more replies)
  0 siblings, 15 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC
  To: git; +Cc: Derrick Stolee

Because I know the first twenty-something weren't enough, here's another
series of SHA-256-related patches.  This is the beginning of the work to
make SHA-256 and SHA-1 repositories interoperate, which of course is
part of the transition plan.

This series introduces code to make struct object_id take a hash
algorithm member and use that for printing the object ID.  This is the
first step to handling multiple hash algorithms in the same binary,
which we'll need if we're going to process both SHA-1 and SHA-256
versions of objects.

The major contributor to the size of this series is the patch to switch
to per-algorithm null OIDs via a null_oid function.  It's certainly
possible there are conflicts with topics in flight, but those should be
easy to resolve (just change "&null_oid" to "null_oid()").
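
To illustrate the shape of that change, here is a minimal sketch of a per-algorithm null OID lookup. It is not the code from the series: the `ALGO_*` constants, the `null_oids` table, and the `current_algo` variable are hypothetical stand-ins for Git's own algorithm identifiers and repository state.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RAWSZ 32  /* size in bytes of the largest supported hash */

/* Hypothetical algorithm identifiers for this sketch. */
enum { ALGO_UNKNOWN = 0, ALGO_SHA1 = 1, ALGO_SHA256 = 2, ALGO_COUNT = 3 };

struct object_id {
	unsigned char hash[MAX_RAWSZ];
	int algo;
};

/* One all-zero object ID per algorithm, each labeled with its algorithm. */
static const struct object_id null_oids[ALGO_COUNT] = {
	[ALGO_UNKNOWN] = { .algo = ALGO_UNKNOWN },
	[ALGO_SHA1]    = { .algo = ALGO_SHA1 },
	[ALGO_SHA256]  = { .algo = ALGO_SHA256 },
};

/* Assumed stand-in for the algorithm the current repository uses. */
static int current_algo = ALGO_SHA1;

/* Callers write null_oid() where they previously wrote &null_oid. */
static const struct object_id *null_oid(void)
{
	assert(current_algo > ALGO_UNKNOWN && current_algo < ALGO_COUNT);
	return &null_oids[current_algo];
}
```

Because the result depends on the active algorithm, a single global object can no longer serve every caller, which is why the conversion touches so many files.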

In addition, there's an initial set of patches to allow hashing an
object literally with any algorithm.  Such objects cannot be written
into the object store, but their values can be printed.

This series is available as transition-interop-part-1 from the usual
places and is part of the transition-interop branch (the entirety of the
work) as well.  Be aware that the latter is very, very broken and its
use is not advised at this time.

Future series will likely include some fixes to our test suite and code
to map objects across hash algorithms.

brian m. carlson (15):
  sha1-file: allow hashing objects literally with any algorithm
  builtin/hash-object: allow literally hashing with a given algorithm
  cache: add an algo member to struct object_id
  Always use oidread to read into struct object_id
  hash: add a function to finalize object IDs
  Use the final_oid_fn to finalize hashing of object IDs
  builtin/pack-redundant: avoid casting buffers to struct object_id
  cache: compare the entire buffer for struct object_id
  hash: set and copy algo field in struct object_id
  hash: provide per-algorithm null OIDs
  builtin/show-index: set the algorithm for object IDs
  commit-graph: don't store file hashes as struct object_id
  builtin/pack-objects: avoid using struct object_id for pack hash
  hex: default to the_hash_algo on zero algorithm value
  hex: print objects using the hash algorithm member

 archive.c                                    |  6 +-
 blame.c                                      |  2 +-
 branch.c                                     |  2 +-
 builtin/checkout.c                           |  6 +-
 builtin/clone.c                              |  2 +-
 builtin/describe.c                           |  2 +-
 builtin/diff.c                               |  2 +-
 builtin/fast-export.c                        | 10 +--
 builtin/fast-import.c                        |  8 +-
 builtin/grep.c                               |  2 +-
 builtin/hash-object.c                        | 47 ++++++++----
 builtin/index-pack.c                         |  6 +-
 builtin/ls-files.c                           |  2 +-
 builtin/pack-objects.c                       | 20 ++---
 builtin/pack-redundant.c                     | 28 +++----
 builtin/rebase.c                             |  4 +-
 builtin/receive-pack.c                       |  2 +-
 builtin/show-index.c                         |  4 +-
 builtin/submodule--helper.c                  | 29 +++----
 builtin/unpack-objects.c                     |  7 +-
 builtin/worktree.c                           |  4 +-
 bulk-checkin.c                               |  2 +-
 combine-diff.c                               |  2 +-
 commit-graph.c                               | 25 ++++---
 connect.c                                    |  2 +-
 diff-lib.c                                   |  6 +-
 diff-no-index.c                              |  2 +-
 diff.c                                       |  6 +-
 dir.c                                        |  6 +-
 grep.c                                       |  2 +-
 hash.h                                       | 73 +++++++++++-------
 hex.c                                        | 20 +++--
 http-walker.c                                |  2 +-
 http.c                                       |  2 +-
 log-tree.c                                   |  2 +-
 match-trees.c                                |  2 +-
 merge-ort.c                                  | 20 ++---
 merge-recursive.c                            | 10 +--
 midx.c                                       |  2 +-
 notes-merge.c                                |  2 +-
 notes.c                                      |  9 ++-
 object-file.c                                | 79 +++++++++++++++++---
 object-store.h                               |  3 +
 parse-options-cb.c                           |  2 +-
 range-diff.c                                 |  2 +-
 read-cache.c                                 |  4 +-
 refs.c                                       |  4 +-
 refs/debug.c                                 |  2 +-
 refs/files-backend.c                         |  2 +-
 reset.c                                      |  2 +-
 sequencer.c                                  |  4 +-
 split-index.c                                |  2 +-
 submodule-config.c                           |  2 +-
 submodule.c                                  | 26 ++++---
 t/helper/test-submodule-nested-repo-config.c |  2 +-
 t/t1007-hash-object.sh                       | 10 +++
 tree-diff.c                                  |  4 +-
 tree-walk.c                                  |  2 +-
 wt-status.c                                  |  4 +-
 xdiff-interface.c                            |  2 +-
 60 files changed, 341 insertions(+), 209 deletions(-)



* [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-15  8:55   ` Denton Liu
  2021-04-16 15:04   ` Ævar Arnfjörð Bjarmason
  2021-04-10 15:21 ` [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm brian m. carlson
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC
  To: git; +Cc: Derrick Stolee

In order to perform suitable testing with multiple algorithms and
interoperability, we'll need the ability to hash an object with a given
algorithm. Introduce this capability for now only for objects which are
hashed literally by adding a function which does this and changing a
static function to accept an algorithm pointer.
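
As a rough, self-contained sketch of the two ideas involved (the object header that is hashed ahead of the content, and the rule that an object hashed with a foreign algorithm may be printed but not stored), consider the following. The names `format_header`, `check_literal_hash`, `WRITE_OBJECT`, and `repo_algo` are illustrative assumptions, and the actual digest computation is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WRITE_OBJECT 1u  /* stand-in for Git's HASH_WRITE_OBJECT flag */

struct hash_algo {
	const char *name;
	unsigned format_id;  /* unique identifier of the algorithm */
	size_t rawsz;        /* digest length in bytes */
};

static const struct hash_algo sha1_algo   = { "sha1",   0x73686131, 20 };
static const struct hash_algo sha256_algo = { "sha256", 0x73323536, 32 };

/* Assumed stand-in for the algorithm the repository itself uses. */
static const struct hash_algo *repo_algo = &sha1_algo;

/* Build the "<type> <len>\0" header; returns its length including the NUL. */
static int format_header(char *out, size_t outsz, const char *type,
			 unsigned long len)
{
	int n = snprintf(out, outsz, "%s %lu", type, len);
	assert(n >= 0 && (size_t)n < outsz);
	return n + 1;
}

/*
 * Hashing alone is allowed with any algorithm; writing is only allowed
 * when the requested algorithm matches the repository's.
 */
static int check_literal_hash(unsigned flags, const struct hash_algo *algo)
{
	if (!(flags & WRITE_OBJECT))
		return 0;
	if (algo->format_id != repo_algo->format_id)
		return -1;
	return 0;
}
```

In this sketch, a caller asking for SHA-256 in a SHA-1 repository gets a usable hash value as long as it does not also request a write.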

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 object-file.c  | 16 ++++++++++++++--
 object-store.h |  3 +++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index 624af408cd..f5847ee20f 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1957,6 +1957,15 @@ int write_object_file(const void *buf, unsigned long len, const char *type,
 int hash_object_file_literally(const void *buf, unsigned long len,
 			       const char *type, struct object_id *oid,
 			       unsigned flags)
+{
+	return hash_object_file_literally_algop(buf, len, type, oid, flags,
+						the_hash_algo);
+}
+
+int hash_object_file_literally_algop(const void *buf, unsigned long len,
+				     const char *type, struct object_id *oid,
+				     unsigned flags,
+				     const struct git_hash_algo *algo)
 {
 	char *header;
 	int hdrlen, status = 0;
@@ -1964,11 +1973,14 @@ int hash_object_file_literally(const void *buf, unsigned long len,
 	/* type string, SP, %lu of the length plus NUL must fit this */
 	hdrlen = strlen(type) + MAX_HEADER_LEN;
 	header = xmalloc(hdrlen);
-	write_object_file_prepare(the_hash_algo, buf, len, type, oid, header,
-				  &hdrlen);
+	write_object_file_prepare(algo, buf, len, type, oid, header, &hdrlen);
 
 	if (!(flags & HASH_WRITE_OBJECT))
 		goto cleanup;
+	if (algo->format_id != the_hash_algo->format_id) {
+		status = -1;
+		goto cleanup;
+	}
 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
 		goto cleanup;
 	status = write_loose_object(oid, header, hdrlen, buf, len, 0);
diff --git a/object-store.h b/object-store.h
index ec32c23dcb..f95d03a7f5 100644
--- a/object-store.h
+++ b/object-store.h
@@ -221,6 +221,9 @@ int hash_object_file_literally(const void *buf, unsigned long len,
 			       const char *type, struct object_id *oid,
 			       unsigned flags);
 
+int hash_object_file_literally_algop(const void *buf, unsigned long len,
+				     const char *type, struct object_id *oid,
+				     unsigned flags, const struct git_hash_algo *algo);
 /*
  * Add an object file to the in-memory object store, without writing it
  * to disk.


* [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
  2021-04-10 15:21 ` [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-11  8:52   ` Ævar Arnfjörð Bjarmason
                     ` (2 more replies)
  2021-04-10 15:21 ` [PATCH 03/15] cache: add an algo member to struct object_id brian m. carlson
                   ` (12 subsequent siblings)
  14 siblings, 3 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC
  To: git; +Cc: Derrick Stolee

Add an --object-format argument to git hash-object that allows hashing
an object with a given algorithm. Currently this option is limited to
use with --literally, since the index_* functions do not yet handle
multiple hash algorithms.
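
The selection logic can be sketched in isolation as below. This is a simplified illustration rather than the builtin's code: `algo_by_name` and `select_algo` are hypothetical helpers, and the algorithm table is reduced to a list of names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { HASH_UNKNOWN = 0, HASH_SHA1 = 1, HASH_SHA256 = 2, HASH_NALGOS = 3 };

static const char *const algo_names[HASH_NALGOS] = { NULL, "sha1", "sha256" };

/* Map an algorithm name to its index, or HASH_UNKNOWN if unrecognized. */
static int algo_by_name(const char *name)
{
	int i;

	if (!name)
		return HASH_UNKNOWN;
	for (i = 1; i < HASH_NALGOS; i++)
		if (!strcmp(name, algo_names[i]))
			return i;
	return HASH_UNKNOWN;
}

/*
 * Pick the algorithm to hash with. Returns NULL on success, or an error
 * message; *algo_out always starts out as the repository default.
 */
static const char *select_algo(const char *object_format, int writing,
			       int default_algo, int *algo_out)
{
	int id;

	assert(algo_out);
	*algo_out = default_algo;
	if (!object_format)
		return NULL;
	if (writing)
		return "Can't use -w with --object-format";
	id = algo_by_name(object_format);
	if (id == HASH_UNKNOWN)
		return "Unknown object format";
	*algo_out = id;
	return NULL;
}
```

Rejecting -w up front keeps objects hashed in a foreign format out of the object store, matching the restriction described in the cover letter.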

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/hash-object.c  | 47 ++++++++++++++++++++++++++++++------------
 t/t1007-hash-object.sh | 10 +++++++++
 2 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 640ef4ded5..0203cfbe9a 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -17,7 +17,8 @@
  * needs to bypass the data conversion performed by, and the type
  * limitation imposed by, index_fd() and its callees.
  */
-static int hash_literally(struct object_id *oid, int fd, const char *type, unsigned flags)
+static int hash_literally(struct object_id *oid, int fd, const char *type,
+			  unsigned flags, const struct git_hash_algo *algo)
 {
 	struct strbuf buf = STRBUF_INIT;
 	int ret;
@@ -25,42 +26,46 @@ static int hash_literally(struct object_id *oid, int fd, const char *type, unsig
 	if (strbuf_read(&buf, fd, 4096) < 0)
 		ret = -1;
 	else
-		ret = hash_object_file_literally(buf.buf, buf.len, type, oid,
-						 flags);
+		ret = hash_object_file_literally_algop(buf.buf, buf.len, type, oid,
+						       flags, algo);
 	strbuf_release(&buf);
 	return ret;
 }
 
 static void hash_fd(int fd, const char *type, const char *path, unsigned flags,
-		    int literally)
+		    int literally, const struct git_hash_algo *algo)
 {
 	struct stat st;
 	struct object_id oid;
 
+	if (!literally && algo != the_hash_algo)
+		die(_("Can't use hash algo %s except literally yet"), algo->name);
+
 	if (fstat(fd, &st) < 0 ||
 	    (literally
-	     ? hash_literally(&oid, fd, type, flags)
+	     ? hash_literally(&oid, fd, type, flags, algo)
 	     : index_fd(the_repository->index, &oid, fd, &st,
 			type_from_string(type), path, flags)))
 		die((flags & HASH_WRITE_OBJECT)
 		    ? "Unable to add %s to database"
 		    : "Unable to hash %s", path);
-	printf("%s\n", oid_to_hex(&oid));
+	printf("%s\n", hash_to_hex_algop(oid.hash, algo));
 	maybe_flush_or_die(stdout, "hash to stdout");
 }
 
 static void hash_object(const char *path, const char *type, const char *vpath,
-			unsigned flags, int literally)
+			unsigned flags, int literally,
+			const struct git_hash_algo *algo)
 {
 	int fd;
 	fd = open(path, O_RDONLY);
 	if (fd < 0)
 		die_errno("Cannot open '%s'", path);
-	hash_fd(fd, type, vpath, flags, literally);
+	hash_fd(fd, type, vpath, flags, literally, algo);
 }
 
 static void hash_stdin_paths(const char *type, int no_filters, unsigned flags,
-			     int literally)
+			     int literally, const struct git_hash_algo *algo)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct strbuf unquoted = STRBUF_INIT;
@@ -73,7 +78,7 @@ static void hash_stdin_paths(const char *type, int no_filters, unsigned flags,
 			strbuf_swap(&buf, &unquoted);
 		}
 		hash_object(buf.buf, type, no_filters ? NULL : buf.buf, flags,
-			    literally);
+			    literally, algo);
 	}
 	strbuf_release(&buf);
 	strbuf_release(&unquoted);
@@ -94,6 +99,8 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	int nongit = 0;
 	unsigned flags = HASH_FORMAT_CHECK;
 	const char *vpath = NULL;
+	const char *object_format = NULL;
+	const struct git_hash_algo *algo;
 	const struct option hash_object_options[] = {
 		OPT_STRING('t', NULL, &type, N_("type"), N_("object type")),
 		OPT_BIT('w', NULL, &flags, N_("write the object into the object database"),
@@ -103,6 +110,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 		OPT_BOOL( 0 , "no-filters", &no_filters, N_("store file as is without filters")),
 		OPT_BOOL( 0, "literally", &literally, N_("just hash any random garbage to create corrupt objects for debugging Git")),
 		OPT_STRING( 0 , "path", &vpath, N_("file"), N_("process file as it were from this path")),
+		OPT_STRING( 0 , "object-format", &object_format, N_("object-format"), N_("Use this hash algorithm")),
 		OPT_END()
 	};
 	int i;
@@ -121,6 +129,19 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
+	algo = the_hash_algo;
+	if (object_format) {
+		if (flags & HASH_WRITE_OBJECT)
+			errstr = "Can't use -w with --object-format";
+		else {
+			int id = hash_algo_by_name(object_format);
+			if (id == GIT_HASH_UNKNOWN)
+				errstr = "Unknown object format";
+			else
+				algo = &hash_algos[id];
+		}
+	}
+
 	if (stdin_paths) {
 		if (hashstdin)
 			errstr = "Can't use --stdin-paths with --stdin";
@@ -142,7 +163,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	}
 
 	if (hashstdin)
-		hash_fd(0, type, vpath, flags, literally);
+		hash_fd(0, type, vpath, flags, literally, algo);
 
 	for (i = 0 ; i < argc; i++) {
 		const char *arg = argv[i];
@@ -151,12 +172,12 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 		if (prefix)
 			arg = to_free = prefix_filename(prefix, arg);
 		hash_object(arg, type, no_filters ? NULL : vpath ? vpath : arg,
-			    flags, literally);
+			    flags, literally, algo);
 		free(to_free);
 	}
 
 	if (stdin_paths)
-		hash_stdin_paths(type, no_filters, flags, literally);
+		hash_stdin_paths(type, no_filters, flags, literally, algo);
 
 	return 0;
 }
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 64b340f227..ea4b3d2bda 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -83,6 +83,11 @@ test_expect_success 'hash a file' '
 	test "$(test_oid hello)" = $(git hash-object hello)
 '
 
+test_expect_failure 'hash a file with a given algorithm' '
+	test "$(test_oid --hash=sha1 hello)" = $(git hash-object --object-format=sha1 hello) &&
+	test "$(test_oid --hash=sha256 hello)" = $(git hash-object --object-format=sha256 hello)
+'
+
 test_blob_does_not_exist "$(test_oid hello)"
 
 test_expect_success 'hash from stdin' '
@@ -248,4 +253,9 @@ test_expect_success '--literally with extra-long type' '
 	echo example | git hash-object -t $t --literally --stdin
 '
 
+test_expect_success '--literally with --object-format' '
+	test $(test_oid --hash=sha1 hello) = $(git hash-object -t blob --literally --object-format=sha1 hello) &&
+	test $(test_oid --hash=sha256 hello) = $(git hash-object -t blob --literally --object-format=sha256 hello)
+'
+
 test_done


* [PATCH 03/15] cache: add an algo member to struct object_id
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
  2021-04-10 15:21 ` [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm brian m. carlson
  2021-04-10 15:21 ` [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-11 11:55   ` Ævar Arnfjörð Bjarmason
  2021-04-13 12:12   ` Derrick Stolee
  2021-04-10 15:21 ` [PATCH 04/15] Always use oidread to read into " brian m. carlson
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC
  To: git; +Cc: Derrick Stolee

Now that we're working with multiple hash algorithms in the same repo,
it's best if we label each object ID with its algorithm so we can
determine how to format a given object ID. Add a member called algo to
struct object_id.
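
To show why the label matters for formatting, here is a minimal sketch of a hex printer that derives the output length from the algo member. The `ALGO_*` values, `rawsz_for`, and `oid_to_hex_buf` are assumptions made for this example and are not the functions the series modifies.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RAWSZ 32
#define MAX_HEXSZ (2 * MAX_RAWSZ)

/* Hypothetical algorithm identifiers for this sketch. */
enum { ALGO_SHA1 = 1, ALGO_SHA256 = 2 };

struct object_id {
	unsigned char hash[MAX_RAWSZ];
	int algo;
};

/* Digest length in bytes for a given algorithm label. */
static size_t rawsz_for(int algo)
{
	assert(algo == ALGO_SHA1 || algo == ALGO_SHA256);
	return algo == ALGO_SHA1 ? 20 : 32;
}

/*
 * Write the hex form of oid plus a NUL into out, which must hold at
 * least MAX_HEXSZ + 1 bytes. Returns the number of hex digits written.
 */
static size_t oid_to_hex_buf(char *out, const struct object_id *oid)
{
	static const char digits[] = "0123456789abcdef";
	size_t i, n = rawsz_for(oid->algo);

	for (i = 0; i < n; i++) {
		out[2 * i]     = digits[oid->hash[i] >> 4];
		out[2 * i + 1] = digits[oid->hash[i] & 0xf];
	}
	out[2 * n] = '\0';
	return 2 * n;
}
```

Without the label, the printer would have to consult global state to decide between 40 and 64 hex digits, which stops working once both kinds of object ID exist in one process.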

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hash.h b/hash.h
index 3fb0c3d400..dafdcb3335 100644
--- a/hash.h
+++ b/hash.h
@@ -181,6 +181,7 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
+	int algo;
 };
 
 #define the_hash_algo the_repository->hash_algo


* [PATCH 04/15] Always use oidread to read into struct object_id
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
                   ` (2 preceding siblings ...)
  2021-04-10 15:21 ` [PATCH 03/15] cache: add an algo member to struct object_id brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-10 15:21 ` [PATCH 05/15] hash: add a function to finalize object IDs brian m. carlson
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC
  To: git; +Cc: Derrick Stolee

In the future, we'll want oidread to automatically set the hash
algorithm member for an object ID we read into it, so ensure we use
oidread instead of hashcpy everywhere we're copying a hash value into a
struct object_id.
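
The difference between the two styles can be sketched as follows. Note that this shows the anticipated future behavior (the reader also setting the algorithm), not what oidread does as of this patch; `hashcpy_sketch`, `oidread_sketch`, `current_rawsz`, and `current_algo` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_RAWSZ 32

/* Hypothetical algorithm identifiers for this sketch. */
enum { ALGO_SHA1 = 1, ALGO_SHA256 = 2 };

struct object_id {
	unsigned char hash[MAX_RAWSZ];
	int algo;
};

/* Assumed stand-ins for the repository's active algorithm. */
static size_t current_rawsz = 20;
static int current_algo = ALGO_SHA1;

/* hashcpy style: sees only raw bytes, so it cannot label the object ID. */
static void hashcpy_sketch(unsigned char *dst, const unsigned char *src)
{
	memcpy(dst, src, current_rawsz);
}

/* oidread style: sees the whole struct, so it can fill in algo as well. */
static void oidread_sketch(struct object_id *oid, const unsigned char *src)
{
	assert(current_rawsz <= MAX_RAWSZ);
	memcpy(oid->hash, src, current_rawsz);
	oid->algo = current_algo;
}
```

Funnelling every read through the struct-aware helper means a later change to that helper reaches all of the call sites converted here.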

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 archive.c                |  2 +-
 builtin/fast-import.c    |  4 ++--
 builtin/index-pack.c     |  4 ++--
 builtin/unpack-objects.c |  2 +-
 commit-graph.c           | 12 ++++++------
 dir.c                    |  4 ++--
 http-walker.c            |  2 +-
 match-trees.c            |  2 +-
 midx.c                   |  2 +-
 notes.c                  |  4 ++--
 read-cache.c             |  4 ++--
 split-index.c            |  2 +-
 tree-walk.c              |  2 +-
 13 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/archive.c b/archive.c
index 295615580d..6cfb9e42d6 100644
--- a/archive.c
+++ b/archive.c
@@ -203,7 +203,7 @@ static void queue_directory(const unsigned char *sha1,
 	d->mode	   = mode;
 	c->bottom  = d;
 	d->len = xsnprintf(d->path, len, "%.*s%s/", (int)base->len, base->buf, filename);
-	hashcpy(d->oid.hash, sha1);
+	oidread(&d->oid, sha1);
 }
 
 static int write_directory(struct archiver_context *c)
diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 3afa81cf9a..9d2a058a66 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -1276,8 +1276,8 @@ static void load_tree(struct tree_entry *root)
 		e->versions[0].mode = e->versions[1].mode;
 		e->name = to_atom(c, strlen(c));
 		c += e->name->str_len + 1;
-		hashcpy(e->versions[0].oid.hash, (unsigned char *)c);
-		hashcpy(e->versions[1].oid.hash, (unsigned char *)c);
+		oidread(&e->versions[0].oid, (unsigned char *)c);
+		oidread(&e->versions[1].oid, (unsigned char *)c);
 		c += the_hash_algo->rawsz;
 	}
 	free(buf);
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 15507b5cff..41e2c240b8 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -524,7 +524,7 @@ static void *unpack_raw_entry(struct object_entry *obj,
 
 	switch (obj->type) {
 	case OBJ_REF_DELTA:
-		hashcpy(ref_oid->hash, fill(the_hash_algo->rawsz));
+		oidread(ref_oid, fill(the_hash_algo->rawsz));
 		use(the_hash_algo->rawsz);
 		break;
 	case OBJ_OFS_DELTA:
@@ -1358,7 +1358,7 @@ static struct object_entry *append_obj_to_pack(struct hashfile *f,
 	obj[1].idx.offset += write_compressed(f, buf, size);
 	obj[0].idx.crc32 = crc32_end(f);
 	hashflush(f);
-	hashcpy(obj->idx.oid.hash, sha1);
+	oidread(&obj->idx.oid, sha1);
 	return obj;
 }
 
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 4a70b17f8f..a8b73ecf43 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -355,7 +355,7 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
 	struct object_id base_oid;
 
 	if (type == OBJ_REF_DELTA) {
-		hashcpy(base_oid.hash, fill(the_hash_algo->rawsz));
+		oidread(&base_oid, fill(the_hash_algo->rawsz));
 		use(the_hash_algo->rawsz);
 		delta_data = get_data(delta_size);
 		if (dry_run || !delta_data) {
diff --git a/commit-graph.c b/commit-graph.c
index f18380b922..23fef56d31 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -425,7 +425,7 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 		FREE_AND_NULL(graph->bloom_filter_settings);
 	}
 
-	hashcpy(graph->oid.hash, graph->data + graph->data_len - graph->hash_len);
+	oidread(&graph->oid, graph->data + graph->data_len - graph->hash_len);
 
 	if (verify_commit_graph_lite(graph))
 		goto free_and_return;
@@ -746,7 +746,7 @@ static void load_oid_from_graph(struct commit_graph *g,
 
 	lex_index = pos - g->num_commits_in_base;
 
-	hashcpy(oid->hash, g->chunk_oid_lookup + g->hash_len * lex_index);
+	oidread(oid, g->chunk_oid_lookup + g->hash_len * lex_index);
 }
 
 static struct commit_list **insert_parent_or_die(struct repository *r,
@@ -939,7 +939,7 @@ static struct tree *load_tree_for_commit(struct repository *r,
 	commit_data = g->chunk_commit_data +
 			GRAPH_DATA_WIDTH * (graph_pos - g->num_commits_in_base);
 
-	hashcpy(oid.hash, commit_data);
+	oidread(&oid, commit_data);
 	set_commit_tree(c, lookup_tree(r, &oid));
 
 	return c->maybe_tree;
@@ -2322,7 +2322,7 @@ int write_commit_graph(struct object_directory *odb,
 		struct commit_graph *g = ctx->r->objects->commit_graph;
 		for (i = 0; i < g->num_commits; i++) {
 			struct object_id oid;
-			hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+			oidread(&oid, g->chunk_oid_lookup + g->hash_len * i);
 			oid_array_append(&ctx->oids, &oid);
 		}
 	}
@@ -2453,7 +2453,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit;
 
-		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+		oidread(&cur_oid, g->chunk_oid_lookup + g->hash_len * i);
 
 		if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
 			graph_report(_("commit-graph has incorrect OID order: %s then %s"),
@@ -2501,7 +2501,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		timestamp_t generation;
 
 		display_progress(progress, i + 1);
-		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+		oidread(&cur_oid, g->chunk_oid_lookup + g->hash_len * i);
 
 		graph_commit = lookup_commit(r, &cur_oid);
 		odb_commit = (struct commit *)create_object(r, &cur_oid, alloc_commit_node(r));
diff --git a/dir.c b/dir.c
index 3474e67e8f..813dd7ba53 100644
--- a/dir.c
+++ b/dir.c
@@ -3344,7 +3344,7 @@ static void read_oid(size_t pos, void *cb)
 		rd->data = rd->end + 1;
 		return;
 	}
-	hashcpy(ud->exclude_oid.hash, rd->data);
+	oidread(&ud->exclude_oid, rd->data);
 	rd->data += the_hash_algo->rawsz;
 }
 
@@ -3352,7 +3352,7 @@ static void load_oid_stat(struct oid_stat *oid_stat, const unsigned char *data,
 			  const unsigned char *sha1)
 {
 	stat_data_from_disk(&oid_stat->stat, data);
-	hashcpy(oid_stat->oid.hash, sha1);
+	oidread(&oid_stat->oid, sha1);
 	oid_stat->valid = 1;
 }
 
diff --git a/http-walker.c b/http-walker.c
index 4fb1235cd4..90d8ecb57e 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -155,7 +155,7 @@ static void prefetch(struct walker *walker, unsigned char *sha1)
 
 	newreq = xmalloc(sizeof(*newreq));
 	newreq->walker = walker;
-	hashcpy(newreq->oid.hash, sha1);
+	oidread(&newreq->oid, sha1);
 	newreq->repo = data->alt;
 	newreq->state = WAITING;
 	newreq->req = NULL;
diff --git a/match-trees.c b/match-trees.c
index f6c194c1cc..df413989fa 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -226,7 +226,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 		    oid_to_hex(oid1));
 	if (*subpath) {
 		struct object_id tree_oid;
-		hashcpy(tree_oid.hash, rewrite_here);
+		oidread(&tree_oid, rewrite_here);
 		status = splice_tree(&tree_oid, subpath, oid2, &subtree);
 		if (status)
 			return status;
diff --git a/midx.c b/midx.c
index 9e86583172..21d6a05e88 100644
--- a/midx.c
+++ b/midx.c
@@ -247,7 +247,7 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
 	if (n >= m->num_objects)
 		return NULL;
 
-	hashcpy(oid->hash, m->chunk_oid_lookup + m->hash_len * n);
+	oidread(oid, m->chunk_oid_lookup + m->hash_len * n);
 	return oid;
 }
 
diff --git a/notes.c b/notes.c
index a19e4ad794..a44b25858f 100644
--- a/notes.c
+++ b/notes.c
@@ -352,7 +352,7 @@ static void add_non_note(struct notes_tree *t, char *path,
 	n->next = NULL;
 	n->path = path;
 	n->mode = mode;
-	hashcpy(n->oid.hash, sha1);
+	oidread(&n->oid, sha1);
 	t->prev_non_note = n;
 
 	if (!t->first_non_note) {
@@ -1134,7 +1134,7 @@ int remove_note(struct notes_tree *t, const unsigned char *object_sha1)
 	if (!t)
 		t = &default_notes_tree;
 	assert(t->initialized);
-	hashcpy(l.key_oid.hash, object_sha1);
+	oidread(&l.key_oid, object_sha1);
 	oidclr(&l.val_oid);
 	note_tree_remove(t, t->root, 0, &l);
 	if (is_null_oid(&l.val_oid)) /* no note was removed */
diff --git a/read-cache.c b/read-cache.c
index 5a907af2fb..2944146545 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1845,7 +1845,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 	ce->ce_flags = flags & ~CE_NAMEMASK;
 	ce->ce_namelen = len;
 	ce->index = 0;
-	hashcpy(ce->oid.hash, ondisk->data);
+	oidread(&ce->oid, ondisk->data);
 	memcpy(ce->name, name, len);
 	ce->name[len] = '\0';
 
@@ -2195,7 +2195,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	if (verify_hdr(hdr, mmap_size) < 0)
 		goto unmap;
 
-	hashcpy(istate->oid.hash, (const unsigned char *)hdr + mmap_size - the_hash_algo->rawsz);
+	oidread(&istate->oid, (const unsigned char *)hdr + mmap_size - the_hash_algo->rawsz);
 	istate->version = ntohl(hdr->hdr_version);
 	istate->cache_nr = ntohl(hdr->hdr_entries);
 	istate->cache_alloc = alloc_nr(istate->cache_nr);
diff --git a/split-index.c b/split-index.c
index 94937d21a3..4d6e52d46f 100644
--- a/split-index.c
+++ b/split-index.c
@@ -21,7 +21,7 @@ int read_link_extension(struct index_state *istate,
 	if (sz < the_hash_algo->rawsz)
 		return error("corrupt link extension (too short)");
 	si = init_split_index(istate);
-	hashcpy(si->base_oid.hash, data);
+	oidread(&si->base_oid, data);
 	data += the_hash_algo->rawsz;
 	sz -= the_hash_algo->rawsz;
 	if (!sz)
diff --git a/tree-walk.c b/tree-walk.c
index 2d6226d5f1..3a94959d64 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -49,7 +49,7 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 	desc->entry.path = path;
 	desc->entry.mode = canon_mode(mode);
 	desc->entry.pathlen = len - 1;
-	hashcpy(desc->entry.oid.hash, (const unsigned char *)path + len);
+	oidread(&desc->entry.oid, (const unsigned char *)path + len);
 
 	return 0;
 }


* [PATCH 05/15] hash: add a function to finalize object IDs
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
                   ` (3 preceding siblings ...)
  2021-04-10 15:21 ` [PATCH 04/15] Always use oidread to read into " brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-10 15:21 ` [PATCH 06/15] Use the final_oid_fn to finalize hashing of " brian m. carlson
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC
  To: git; +Cc: Derrick Stolee

To avoid the penalty of having to branch in hash comparison functions,
we'll want to always compare the full hash member in a struct object_id,
which will require that SHA-1 object IDs be zero-padded.  To do so, add
a function which finalizes a hash context, writes the result into an
object ID, and performs this padding.
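
A small sketch of why the padding is needed: if the comparison always covers the full buffer, two equal SHA-1 values only compare equal when the unused tail bytes are deterministic. The helper names `finalize_sha1_oid` and `oid_equal` are assumptions for this example, and the digest is taken as already computed.

```c
#include <assert.h>
#include <string.h>

#define SHA1_RAWSZ 20
#define MAX_RAWSZ 32

struct object_id {
	unsigned char hash[MAX_RAWSZ];
	int algo;
};

/* Copy a finished 20-byte SHA-1 digest into oid and zero the unused tail. */
static void finalize_sha1_oid(struct object_id *oid,
			      const unsigned char *digest)
{
	assert(oid && digest);
	memcpy(oid->hash, digest, SHA1_RAWSZ);
	memset(oid->hash + SHA1_RAWSZ, 0, MAX_RAWSZ - SHA1_RAWSZ);
}

/* No branch on hash length: always compare the entire buffer. */
static int oid_equal(const struct object_id *a, const struct object_id *b)
{
	return !memcmp(a->hash, b->hash, MAX_RAWSZ);
}
```

With a fixed-length memcmp the comparison needs no per-call lookup of the hash size, at the cost of comparing twelve extra bytes for SHA-1.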

Move the definition of struct object_id and the constant definitions
higher up so they are available for us to use.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h        | 50 +++++++++++++++++++++++++++-----------------------
 object-file.c | 25 +++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 23 deletions(-)

diff --git a/hash.h b/hash.h
index dafdcb3335..c8f03d8aee 100644
--- a/hash.h
+++ b/hash.h
@@ -95,6 +95,29 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
 /* Number of algorithms supported (including unknown). */
 #define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
 
+/* The length in bytes and in hex digits of an object name (SHA-1 value). */
+#define GIT_SHA1_RAWSZ 20
+#define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+/* The block size of SHA-1. */
+#define GIT_SHA1_BLKSZ 64
+
+/* The length in bytes and in hex digits of an object name (SHA-256 value). */
+#define GIT_SHA256_RAWSZ 32
+#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
+/* The block size of SHA-256. */
+#define GIT_SHA256_BLKSZ 64
+
+/* The length in byte and in hex digits of the largest possible hash value. */
+#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
+/* The largest possible block size for any supported hash. */
+#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
+
+struct object_id {
+	unsigned char hash[GIT_MAX_RAWSZ];
+	int algo;
+};
+
 /* A suitably aligned type for stack allocations of hash contexts. */
 union git_hash_ctx {
 	git_SHA_CTX sha1;
@@ -106,6 +129,7 @@ typedef void (*git_hash_init_fn)(git_hash_ctx *ctx);
 typedef void (*git_hash_clone_fn)(git_hash_ctx *dst, const git_hash_ctx *src);
 typedef void (*git_hash_update_fn)(git_hash_ctx *ctx, const void *in, size_t len);
 typedef void (*git_hash_final_fn)(unsigned char *hash, git_hash_ctx *ctx);
+typedef void (*git_hash_final_oid_fn)(struct object_id *oid, git_hash_ctx *ctx);
 
 struct git_hash_algo {
 	/*
@@ -138,6 +162,9 @@ struct git_hash_algo {
 	/* The hash finalization function. */
 	git_hash_final_fn final_fn;
 
+	/* The hash finalization function for object IDs. */
+	git_hash_final_oid_fn final_oid_fn;
+
 	/* The OID of the empty tree. */
 	const struct object_id *empty_tree;
 
@@ -161,29 +188,6 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
 	return p - hash_algos;
 }
 
-/* The length in bytes and in hex digits of an object name (SHA-1 value). */
-#define GIT_SHA1_RAWSZ 20
-#define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
-/* The block size of SHA-1. */
-#define GIT_SHA1_BLKSZ 64
-
-/* The length in bytes and in hex digits of an object name (SHA-256 value). */
-#define GIT_SHA256_RAWSZ 32
-#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
-/* The block size of SHA-256. */
-#define GIT_SHA256_BLKSZ 64
-
-/* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
-/* The largest possible block size for any supported hash. */
-#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
-
-struct object_id {
-	unsigned char hash[GIT_MAX_RAWSZ];
-	int algo;
-};
-
 #define the_hash_algo the_repository->hash_algo
 
 extern const struct object_id null_oid;
diff --git a/object-file.c b/object-file.c
index f5847ee20f..58d31452d8 100644
--- a/object-file.c
+++ b/object-file.c
@@ -89,6 +89,12 @@ static void git_hash_sha1_final(unsigned char *hash, git_hash_ctx *ctx)
 	git_SHA1_Final(hash, &ctx->sha1);
 }
 
+static void git_hash_sha1_final_oid(struct object_id *oid, git_hash_ctx *ctx)
+{
+	git_SHA1_Final(oid->hash, &ctx->sha1);
+	memset(oid->hash + GIT_SHA1_RAWSZ, 0, GIT_MAX_RAWSZ - GIT_SHA1_RAWSZ);
+}
+
 
 static void git_hash_sha256_init(git_hash_ctx *ctx)
 {
@@ -110,6 +116,16 @@ static void git_hash_sha256_final(unsigned char *hash, git_hash_ctx *ctx)
 	git_SHA256_Final(hash, &ctx->sha256);
 }
 
+static void git_hash_sha256_final_oid(struct object_id *oid, git_hash_ctx *ctx)
+{
+	git_SHA256_Final(oid->hash, &ctx->sha256);
+	/*
+	 * This currently does nothing, so the compiler should optimize it out,
+	 * but keep it in case we extend the hash size again.
+	 */
+	memset(oid->hash + GIT_SHA256_RAWSZ, 0, GIT_MAX_RAWSZ - GIT_SHA256_RAWSZ);
+}
+
 static void git_hash_unknown_init(git_hash_ctx *ctx)
 {
 	BUG("trying to init unknown hash");
@@ -130,6 +146,12 @@ static void git_hash_unknown_final(unsigned char *hash, git_hash_ctx *ctx)
 	BUG("trying to finalize unknown hash");
 }
 
+static void git_hash_unknown_final_oid(struct object_id *oid, git_hash_ctx *ctx)
+{
+	BUG("trying to finalize unknown hash");
+}
+
+
 const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 	{
 		NULL,
@@ -141,6 +163,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		git_hash_unknown_clone,
 		git_hash_unknown_update,
 		git_hash_unknown_final,
+		git_hash_unknown_final_oid,
 		NULL,
 		NULL,
 	},
@@ -155,6 +178,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		git_hash_sha1_clone,
 		git_hash_sha1_update,
 		git_hash_sha1_final,
+		git_hash_sha1_final_oid,
 		&empty_tree_oid,
 		&empty_blob_oid,
 	},
@@ -169,6 +193,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		git_hash_sha256_clone,
 		git_hash_sha256_update,
 		git_hash_sha256_final,
+		git_hash_sha256_final_oid,
 		&empty_tree_oid_sha256,
 		&empty_blob_oid_sha256,
 	}


* [PATCH 06/15] Use the final_oid_fn to finalize hashing of object IDs
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
                   ` (4 preceding siblings ...)
  2021-04-10 15:21 ` [PATCH 05/15] hash: add a function to finalize object IDs brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-10 15:21 ` [PATCH 07/15] builtin/pack-redundant: avoid casting buffers to struct object_id brian m. carlson
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

When we're hashing a value which is going to be an object ID, we want to
zero-pad that value if necessary.  To do so, use the final_oid_fn
instead of the final_fn anytime we're going to create an object ID to
ensure we perform this operation.
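
As a rough illustration (not part of the patch), the difference between
the two finalization paths can be sketched as below.  All demo_* names
and DEMO_* constants are hypothetical, and demo_fake_digest is only a
stand-in for a real digest routine; the point is that the raw variant
leaves the tail of the buffer untouched while the object ID variant
zeroes it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SHA1_RAWSZ 20	/* bytes written by the shorter hash */
#define DEMO_MAX_RAWSZ  32	/* size of the buffer inside the object ID */

/* Hypothetical stand-in for struct object_id (hash buffer only). */
struct demo_oid {
	unsigned char hash[DEMO_MAX_RAWSZ];
};

/* Stand-in for a real digest: writes exactly len bytes, nothing more. */
static void demo_fake_digest(unsigned char *out, size_t len)
{
	size_t i;

	assert(len <= DEMO_MAX_RAWSZ);
	for (i = 0; i < len; i++)
		out[i] = (unsigned char)(i + 1);
}

/* Analogue of final_fn: fills a raw buffer and leaves the tail alone. */
static void demo_final(unsigned char *hash)
{
	demo_fake_digest(hash, DEMO_SHA1_RAWSZ);
}

/* Analogue of final_oid_fn: fills the object ID and zero-pads the tail. */
static void demo_final_oid(struct demo_oid *oid)
{
	demo_fake_digest(oid->hash, DEMO_SHA1_RAWSZ);
	memset(oid->hash + DEMO_SHA1_RAWSZ, 0,
	       DEMO_MAX_RAWSZ - DEMO_SHA1_RAWSZ);
}

/* Returns 1 if every byte past the short hash length is zero. */
static int demo_tail_is_zero(const struct demo_oid *oid)
{
	size_t i;

	for (i = DEMO_SHA1_RAWSZ; i < DEMO_MAX_RAWSZ; i++)
		if (oid->hash[i])
			return 0;
	return 1;
}
```

Calling demo_final on an object ID that previously held other data
leaves whatever was in the last twelve bytes; demo_final_oid always
leaves them zero.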

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/fast-import.c    | 4 ++--
 builtin/index-pack.c     | 2 +-
 builtin/unpack-objects.c | 2 +-
 bulk-checkin.c           | 2 +-
 diff.c                   | 2 +-
 http.c                   | 2 +-
 object-file.c            | 8 ++++----
 7 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 9d2a058a66..20406f6775 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -940,7 +940,7 @@ static int store_object(
 	the_hash_algo->init_fn(&c);
 	the_hash_algo->update_fn(&c, hdr, hdrlen);
 	the_hash_algo->update_fn(&c, dat->buf, dat->len);
-	the_hash_algo->final_fn(oid.hash, &c);
+	the_hash_algo->final_oid_fn(&oid, &c);
 	if (oidout)
 		oidcpy(oidout, &oid);
 
@@ -1136,7 +1136,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 		}
 	}
 	git_deflate_end(&s);
-	the_hash_algo->final_fn(oid.hash, &c);
+	the_hash_algo->final_oid_fn(&oid, &c);
 
 	if (oidout)
 		oidcpy(oidout, &oid);
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 41e2c240b8..3fbc5d7077 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -489,7 +489,7 @@ static void *unpack_entry_data(off_t offset, unsigned long size,
 		bad_object(offset, _("inflate returned %d"), status);
 	git_inflate_end(&stream);
 	if (oid)
-		the_hash_algo->final_fn(oid->hash, &c);
+		the_hash_algo->final_oid_fn(oid, &c);
 	return buf == fixed_buf ? NULL : buf;
 }
 
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index a8b73ecf43..6ac90dc5f7 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -576,7 +576,7 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix)
 	the_hash_algo->init_fn(&ctx);
 	unpack_all();
 	the_hash_algo->update_fn(&ctx, buffer, offset);
-	the_hash_algo->final_fn(oid.hash, &ctx);
+	the_hash_algo->final_oid_fn(&oid, &ctx);
 	if (strict) {
 		write_rest();
 		if (fsck_finish(&fsck_options))
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 6f3c97cd34..127312acd1 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -238,7 +238,7 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
 			return error("cannot seek back");
 	}
-	the_hash_algo->final_fn(result_oid->hash, &ctx);
+	the_hash_algo->final_oid_fn(result_oid, &ctx);
 	if (!idx)
 		return 0;
 
diff --git a/diff.c b/diff.c
index 4acccd9d7e..97c62f47df 100644
--- a/diff.c
+++ b/diff.c
@@ -6234,7 +6234,7 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 	}
 
 	if (!stable)
-		the_hash_algo->final_fn(oid->hash, &ctx);
+		the_hash_algo->final_oid_fn(oid, &ctx);
 
 	return 0;
 }
diff --git a/http.c b/http.c
index 406410f884..c83bc33a5f 100644
--- a/http.c
+++ b/http.c
@@ -2576,7 +2576,7 @@ int finish_http_object_request(struct http_object_request *freq)
 	}
 
 	git_inflate_end(&freq->stream);
-	the_hash_algo->final_fn(freq->real_oid.hash, &freq->c);
+	the_hash_algo->final_oid_fn(&freq->real_oid, &freq->c);
 	if (freq->zret != Z_STREAM_END) {
 		unlink_or_warn(freq->tmpfile.buf);
 		return -1;
diff --git a/object-file.c b/object-file.c
index 58d31452d8..3f43c376e7 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1054,7 +1054,7 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_fn(real_oid.hash, &c);
+	r->hash_algo->final_oid_fn(&real_oid, &c);
 	close_istream(st);
 	return !oideq(oid, &real_oid) ? -1 : 0;
 }
@@ -1755,7 +1755,7 @@ static void write_object_file_prepare(const struct git_hash_algo *algo,
 	algo->init_fn(&c);
 	algo->update_fn(&c, hdr, *hdrlen);
 	algo->update_fn(&c, buf, len);
-	algo->final_fn(oid->hash, &c);
+	algo->final_oid_fn(oid, &c);
 }
 
 /*
@@ -1927,7 +1927,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	if (ret != Z_OK)
 		die(_("deflateEnd on object %s failed (%d)"), oid_to_hex(oid),
 		    ret);
-	the_hash_algo->final_fn(parano_oid.hash, &c);
+	the_hash_algo->final_oid_fn(&parano_oid, &c);
 	if (!oideq(oid, &parano_oid))
 		die(_("confused by unstable object source data for %s"),
 		    oid_to_hex(oid));
@@ -2520,7 +2520,7 @@ static int check_stream_oid(git_zstream *stream,
 		return -1;
 	}
 
-	the_hash_algo->final_fn(real_oid.hash, &c);
+	the_hash_algo->final_oid_fn(&real_oid, &c);
 	if (!oideq(expected_oid, &real_oid)) {
 		error(_("hash mismatch for %s (expected %s)"), path,
 		      oid_to_hex(expected_oid));


* [PATCH 07/15] builtin/pack-redundant: avoid casting buffers to struct object_id
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
                   ` (5 preceding siblings ...)
  2021-04-10 15:21 ` [PATCH 06/15] Use the final_oid_fn to finalize hashing of " brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-10 15:21 ` [PATCH 08/15] cache: compare the entire buffer for " brian m. carlson
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

Now that we need our instances of struct object_id to be zero padded, we
can no longer cast unsigned char buffers to be pointers to struct
object_id.  This file reads data out of the pack objects and then
inserts it directly into a linked list item which is a pointer to struct
object_id.  Instead, let's have the linked list item hold its own struct
object_id and copy the data into it.

In addition, since these are not really pointers to struct object_id,
stop passing them around as such, and call them what they really are:
pointers to unsigned char.
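
A minimal sketch of the idea, using hypothetical demo_* names rather
than the real helpers: entries packed back to back in a table are only
as long as the hash, so a pointer cast would expose the following
entry's bytes as the tail of the object ID.  Copying into an embedded
struct lets the list item own a properly padded value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SHA1_RAWSZ 20	/* length of one packed table entry */
#define DEMO_MAX_RAWSZ  32	/* size of the buffer inside the object ID */

/* Hypothetical stand-in for struct object_id (hash buffer only). */
struct demo_oid {
	unsigned char hash[DEMO_MAX_RAWSZ];
};

/* List item that owns its object ID instead of pointing into a table. */
struct demo_item {
	struct demo_item *next;
	struct demo_oid oid;
};

/* Copy rawsz bytes of hash data and zero the rest of the buffer. */
static void demo_oidread(struct demo_oid *oid, const unsigned char *raw,
			 size_t rawsz)
{
	assert(rawsz <= DEMO_MAX_RAWSZ);
	memcpy(oid->hash, raw, rawsz);
	memset(oid->hash + rawsz, 0, DEMO_MAX_RAWSZ - rawsz);
}

/* Fill an item from a raw table entry; takes bytes, not an object ID. */
static void demo_item_set(struct demo_item *item, const unsigned char *raw,
			  size_t rawsz)
{
	demo_oidread(&item->oid, raw, rawsz);
	item->next = NULL;
}
```

The cost is one copy per inserted item, in exchange for items that can
be compared over the full buffer length.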

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/pack-redundant.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index 7102996c75..8bf5c0acad 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -20,7 +20,7 @@ static int load_all_packs, verbose, alt_odb;
 
 struct llist_item {
 	struct llist_item *next;
-	const struct object_id *oid;
+	struct object_id oid;
 };
 static struct llist {
 	struct llist_item *front;
@@ -95,10 +95,10 @@ static struct llist * llist_copy(struct llist *list)
 
 static inline struct llist_item *llist_insert(struct llist *list,
 					      struct llist_item *after,
-					      const struct object_id *oid)
+					      const unsigned char *oid)
 {
 	struct llist_item *new_item = llist_item_get();
-	new_item->oid = oid;
+	oidread(&new_item->oid, oid);
 	new_item->next = NULL;
 
 	if (after != NULL) {
@@ -118,7 +118,7 @@ static inline struct llist_item *llist_insert(struct llist *list,
 }
 
 static inline struct llist_item *llist_insert_back(struct llist *list,
-						   const struct object_id *oid)
+						   const unsigned char *oid)
 {
 	return llist_insert(list, list->back, oid);
 }
@@ -130,9 +130,9 @@ static inline struct llist_item *llist_insert_sorted_unique(struct llist *list,
 
 	l = (hint == NULL) ? list->front : hint;
 	while (l) {
-		int cmp = oidcmp(l->oid, oid);
+		int cmp = oidcmp(&l->oid, oid);
 		if (cmp > 0) { /* we insert before this entry */
-			return llist_insert(list, prev, oid);
+			return llist_insert(list, prev, oid->hash);
 		}
 		if (!cmp) { /* already exists */
 			return l;
@@ -141,11 +141,11 @@ static inline struct llist_item *llist_insert_sorted_unique(struct llist *list,
 		l = l->next;
 	}
 	/* insert at the end */
-	return llist_insert_back(list, oid);
+	return llist_insert_back(list, oid->hash);
 }
 
 /* returns a pointer to an item in front of sha1 */
-static inline struct llist_item * llist_sorted_remove(struct llist *list, const struct object_id *oid, struct llist_item *hint)
+static inline struct llist_item * llist_sorted_remove(struct llist *list, const unsigned char *oid, struct llist_item *hint)
 {
 	struct llist_item *prev, *l;
 
@@ -153,7 +153,7 @@ static inline struct llist_item * llist_sorted_remove(struct llist *list, const
 	l = (hint == NULL) ? list->front : hint;
 	prev = NULL;
 	while (l) {
-		const int cmp = oidcmp(l->oid, oid);
+		const int cmp = hashcmp(l->oid.hash, oid);
 		if (cmp > 0) /* not in list, since sorted */
 			return prev;
 		if (!cmp) { /* found */
@@ -188,7 +188,7 @@ static void llist_sorted_difference_inplace(struct llist *A,
 	b = B->front;
 
 	while (b) {
-		hint = llist_sorted_remove(A, b->oid, hint);
+		hint = llist_sorted_remove(A, b->oid.hash, hint);
 		b = b->next;
 	}
 }
@@ -260,10 +260,10 @@ static void cmp_two_packs(struct pack_list *p1, struct pack_list *p2)
 		/* cmp ~ p1 - p2 */
 		if (cmp == 0) {
 			p1_hint = llist_sorted_remove(p1->unique_objects,
-					(const struct object_id *)(p1_base + p1_off),
+					p1_base + p1_off,
 					p1_hint);
 			p2_hint = llist_sorted_remove(p2->unique_objects,
-					(const struct object_id *)(p1_base + p1_off),
+					p1_base + p1_off,
 					p2_hint);
 			p1_off += p1_step;
 			p2_off += p2_step;
@@ -455,7 +455,7 @@ static void load_all_objects(void)
 		l = pl->remaining_objects->front;
 		while (l) {
 			hint = llist_insert_sorted_unique(all_objects,
-							  l->oid, hint);
+							  &l->oid, hint);
 			l = l->next;
 		}
 		pl = pl->next;
@@ -521,7 +521,7 @@ static struct pack_list * add_pack(struct packed_git *p)
 	base += 256 * 4 + ((p->index_version < 2) ? 4 : 8);
 	step = the_hash_algo->rawsz + ((p->index_version < 2) ? 4 : 0);
 	while (off < p->num_objects * step) {
-		llist_insert_back(l.remaining_objects, (const struct object_id *)(base + off));
+		llist_insert_back(l.remaining_objects, base + off);
 		off += step;
 	}
 	l.all_objects_size = l.remaining_objects->size;


* [PATCH 08/15] cache: compare the entire buffer for struct object_id
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
                   ` (6 preceding siblings ...)
  2021-04-10 15:21 ` [PATCH 07/15] builtin/pack-redundant: avoid casting buffers to struct object_id brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-11  8:17   ` Chris Torek
  2021-04-11 11:36   ` Ævar Arnfjörð Bjarmason
  2021-04-10 15:21 ` [PATCH 09/15] hash: set and copy algo field in " brian m. carlson
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

Currently, when we compare two object IDs, we have to take a branch to
determine what the hash size is supposed to be.  The compiler can
optimize well for a single length, but has trouble when there are two
possible lengths.

There is, however, an alternative: we can ensure that we always compare
the full length of the hash buffer, but in turn we must zero the
remainder of the buffer when using SHA-1; otherwise, we'll end up with
incompatible junk at the end of otherwise equivalent object IDs that
will prevent them from matching.  This is an acceptable tradeoff,
because we generally read an object ID in once, but then compare it
against others multiple times.

This latter approach has a further benefit: since we will have
annotated every location in which we load an object ID into an instance
of struct object_id, if we want to set the hash algorithm for the object
ID, we can do so at the same time.

Adopt this latter approach, since it provides us greater flexibility and
lets us read and store object IDs for multiple algorithms at once.
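
The tradeoff can be sketched as follows, with hypothetical demo_* names
standing in for the real helpers: a comparison over the full buffer has
a constant length the compiler can specialize for, but two IDs carrying
the same short hash only compare equal once their tails have been
zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SHA1_RAWSZ 20	/* length of the shorter hash */
#define DEMO_MAX_RAWSZ  32	/* size of the buffer inside the object ID */

/* Hypothetical stand-in for struct object_id (hash buffer only). */
struct demo_oid {
	unsigned char hash[DEMO_MAX_RAWSZ];
};

/* Fixed-length ordering: no branch on the active hash size. */
static int demo_oidcmp(const struct demo_oid *a, const struct demo_oid *b)
{
	return memcmp(a->hash, b->hash, DEMO_MAX_RAWSZ);
}

/* Fixed-length equality built the same way. */
static int demo_oideq(const struct demo_oid *a, const struct demo_oid *b)
{
	return !memcmp(a->hash, b->hash, DEMO_MAX_RAWSZ);
}

/* Zero everything past the first rawsz bytes. */
static void demo_oid_pad(struct demo_oid *oid, size_t rawsz)
{
	assert(rawsz <= DEMO_MAX_RAWSZ);
	memset(oid->hash + rawsz, 0, DEMO_MAX_RAWSZ - rawsz);
}
```

Two IDs whose first twenty bytes agree but whose tails differ are
unequal under demo_oideq until demo_oid_pad has been applied to both,
which is why every place that loads an ID has to pad it.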

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h        | 13 ++++++++++---
 hex.c         |  9 ++++++---
 notes.c       |  3 +++
 object-file.c |  1 +
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/hash.h b/hash.h
index c8f03d8aee..04eba5c56b 100644
--- a/hash.h
+++ b/hash.h
@@ -205,7 +205,7 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
 {
-	return hashcmp(oid1->hash, oid2->hash);
+	return memcmp(oid1->hash, oid2->hash, GIT_MAX_RAWSZ);
 }
 
 static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
@@ -221,7 +221,7 @@ static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
 
 static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)
 {
-	return hasheq(oid1->hash, oid2->hash);
+	return !memcmp(oid1->hash, oid2->hash, GIT_MAX_RAWSZ);
 }
 
 static inline int is_null_oid(const struct object_id *oid)
@@ -258,7 +258,9 @@ static inline void oidclr(struct object_id *oid)
 
 static inline void oidread(struct object_id *oid, const unsigned char *hash)
 {
-	memcpy(oid->hash, hash, the_hash_algo->rawsz);
+	size_t rawsz = the_hash_algo->rawsz;
+	memcpy(oid->hash, hash, rawsz);
+	memset(oid->hash + rawsz, 0, GIT_MAX_RAWSZ - rawsz);
 }
 
 static inline int is_empty_blob_sha1(const unsigned char *sha1)
@@ -281,6 +283,11 @@ static inline int is_empty_tree_oid(const struct object_id *oid)
 	return oideq(oid, the_hash_algo->empty_tree);
 }
 
+static inline void oid_pad_buffer(struct object_id *oid, const struct git_hash_algo *algop)
+{
+	memset(oid->hash + algop->rawsz, 0, GIT_MAX_RAWSZ - algop->rawsz);
+}
+
 const char *empty_tree_oid_hex(void);
 const char *empty_blob_oid_hex(void);
 
diff --git a/hex.c b/hex.c
index da51e64929..5fa3e71cb9 100644
--- a/hex.c
+++ b/hex.c
@@ -69,7 +69,10 @@ int get_sha1_hex(const char *hex, unsigned char *sha1)
 int get_oid_hex_algop(const char *hex, struct object_id *oid,
 		      const struct git_hash_algo *algop)
 {
-	return get_hash_hex_algop(hex, oid->hash, algop);
+	int ret = get_hash_hex_algop(hex, oid->hash, algop);
+	if (!ret)
+		oid_pad_buffer(oid, algop);
+	return ret;
 }
 
 /*
@@ -80,7 +83,7 @@ int get_oid_hex_any(const char *hex, struct object_id *oid)
 {
 	int i;
 	for (i = GIT_HASH_NALGOS - 1; i > 0; i--) {
-		if (!get_hash_hex_algop(hex, oid->hash, &hash_algos[i]))
+		if (!get_oid_hex_algop(hex, oid, &hash_algos[i]))
 			return i;
 	}
 	return GIT_HASH_UNKNOWN;
@@ -95,7 +98,7 @@ int parse_oid_hex_algop(const char *hex, struct object_id *oid,
 			const char **end,
 			const struct git_hash_algo *algop)
 {
-	int ret = get_hash_hex_algop(hex, oid->hash, algop);
+	int ret = get_oid_hex_algop(hex, oid, algop);
 	if (!ret)
 		*end = hex + algop->hexsz;
 	return ret;
diff --git a/notes.c b/notes.c
index a44b25858f..1dfe9e2b9f 100644
--- a/notes.c
+++ b/notes.c
@@ -455,6 +455,8 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 		CALLOC_ARRAY(l, 1);
 		oidcpy(&l->key_oid, &object_oid);
 		oidcpy(&l->val_oid, &entry.oid);
+		oid_pad_buffer(&l->key_oid, the_hash_algo);
+		oid_pad_buffer(&l->val_oid, the_hash_algo);
 		if (note_tree_insert(t, node, n, l, type,
 				     combine_notes_concatenate))
 			die("Failed to load %s %s into notes tree "
@@ -484,6 +486,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 				strbuf_addch(&non_note_path, '/');
 			}
 			strbuf_addstr(&non_note_path, entry.path);
+			oid_pad_buffer(&entry.oid, the_hash_algo);
 			add_non_note(t, strbuf_detach(&non_note_path, NULL),
 				     entry.mode, entry.oid.hash);
 		}
diff --git a/object-file.c b/object-file.c
index 3f43c376e7..8e338247cc 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2352,6 +2352,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 		if (namelen == the_hash_algo->hexsz - 2 &&
 		    !hex_to_bytes(oid.hash + 1, de->d_name,
 				  the_hash_algo->rawsz - 1)) {
+			oid_pad_buffer(&oid, the_hash_algo);
 			if (obj_cb) {
 				r = obj_cb(&oid, path->buf, data);
 				if (r)


* [PATCH 09/15] hash: set and copy algo field in struct object_id
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
                   ` (7 preceding siblings ...)
  2021-04-10 15:21 ` [PATCH 08/15] cache: compare the entire buffer for " brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-11 11:57   ` Ævar Arnfjörð Bjarmason
  2021-04-10 15:21 ` [PATCH 10/15] hash: provide per-algorithm null OIDs brian m. carlson
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

Now that struct object_id has an algorithm field, we should populate it.
This will allow us to handle object IDs in any supported algorithm and
distinguish between them.  Ensure that the field is always set by
storing it explicitly every time we write an object ID.  Set the field
for the empty blob and empty tree object IDs as well.
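
A small sketch of what "set and copy" means in practice.  The demo_*
names and the enum values are hypothetical, and unlike the helpers in
the diff below, which take the algorithm from the_hash_algo, this
version passes the algorithm in explicitly to stay self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_RAWSZ 32	/* size of the buffer inside the object ID */

/* Hypothetical algorithm identifiers, for illustration only. */
enum demo_algo {
	DEMO_HASH_UNKNOWN = 0,
	DEMO_HASH_SHA1 = 1,
	DEMO_HASH_SHA256 = 2
};

/* Hypothetical stand-in for struct object_id with its algo member. */
struct demo_oid {
	unsigned char hash[DEMO_MAX_RAWSZ];
	int algo;
};

/* Raw hash length for an algorithm, or 0 if it is not known. */
static size_t demo_rawsz(int algo)
{
	switch (algo) {
	case DEMO_HASH_SHA1:
		return 20;
	case DEMO_HASH_SHA256:
		return 32;
	default:
		return 0;
	}
}

/* Load raw bytes, pad the tail, and record which algorithm they use. */
static void demo_oidread(struct demo_oid *oid, const unsigned char *raw,
			 int algo)
{
	size_t rawsz = demo_rawsz(algo);

	assert(rawsz > 0);
	memcpy(oid->hash, raw, rawsz);
	memset(oid->hash + rawsz, 0, DEMO_MAX_RAWSZ - rawsz);
	oid->algo = algo;
}

/* Copy both the hash bytes and the algorithm tag. */
static void demo_oidcpy(struct demo_oid *dst, const struct demo_oid *src)
{
	memcpy(dst->hash, src->hash, DEMO_MAX_RAWSZ);
	dst->algo = src->algo;
}
```

Because every write path sets the tag, a reader of the struct never has
to guess the algorithm from context.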

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h        |  4 ++++
 object-file.c | 14 ++++++++++----
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/hash.h b/hash.h
index 04eba5c56b..3b114f053e 100644
--- a/hash.h
+++ b/hash.h
@@ -237,6 +237,7 @@ static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
 static inline void oidcpy(struct object_id *dst, const struct object_id *src)
 {
 	memcpy(dst->hash, src->hash, GIT_MAX_RAWSZ);
+	dst->algo = src->algo;
 }
 
 static inline struct object_id *oiddup(const struct object_id *src)
@@ -254,6 +255,7 @@ static inline void hashclr(unsigned char *hash)
 static inline void oidclr(struct object_id *oid)
 {
 	memset(oid->hash, 0, GIT_MAX_RAWSZ);
+	oid->algo = hash_algo_by_ptr(the_hash_algo);
 }
 
 static inline void oidread(struct object_id *oid, const unsigned char *hash)
@@ -261,6 +263,7 @@ static inline void oidread(struct object_id *oid, const unsigned char *hash)
 	size_t rawsz = the_hash_algo->rawsz;
 	memcpy(oid->hash, hash, rawsz);
 	memset(oid->hash + rawsz, 0, GIT_MAX_RAWSZ - rawsz);
+	oid->algo = hash_algo_by_ptr(the_hash_algo);
 }
 
 static inline int is_empty_blob_sha1(const unsigned char *sha1)
@@ -286,6 +289,7 @@ static inline int is_empty_tree_oid(const struct object_id *oid)
 static inline void oid_pad_buffer(struct object_id *oid, const struct git_hash_algo *algop)
 {
 	memset(oid->hash + algop->rawsz, 0, GIT_MAX_RAWSZ - algop->rawsz);
+	oid->algo = hash_algo_by_ptr(algop);
 }
 
 const char *empty_tree_oid_hex(void);
diff --git a/object-file.c b/object-file.c
index 8e338247cc..5f1fa05c4e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -57,16 +57,20 @@
 
 const struct object_id null_oid;
 static const struct object_id empty_tree_oid = {
-	EMPTY_TREE_SHA1_BIN_LITERAL
+	EMPTY_TREE_SHA1_BIN_LITERAL,
+	GIT_HASH_SHA1,
 };
 static const struct object_id empty_blob_oid = {
-	EMPTY_BLOB_SHA1_BIN_LITERAL
+	EMPTY_BLOB_SHA1_BIN_LITERAL,
+	GIT_HASH_SHA1,
 };
 static const struct object_id empty_tree_oid_sha256 = {
-	EMPTY_TREE_SHA256_BIN_LITERAL
+	EMPTY_TREE_SHA256_BIN_LITERAL,
+	GIT_HASH_SHA256,
 };
 static const struct object_id empty_blob_oid_sha256 = {
-	EMPTY_BLOB_SHA256_BIN_LITERAL
+	EMPTY_BLOB_SHA256_BIN_LITERAL,
+	GIT_HASH_SHA256,
 };
 
 static void git_hash_sha1_init(git_hash_ctx *ctx)
@@ -93,6 +97,7 @@ static void git_hash_sha1_final_oid(struct object_id *oid, git_hash_ctx *ctx)
 {
 	git_SHA1_Final(oid->hash, &ctx->sha1);
 	memset(oid->hash + GIT_SHA1_RAWSZ, 0, GIT_MAX_RAWSZ - GIT_SHA1_RAWSZ);
+	oid->algo = GIT_HASH_SHA1;
 }
 
 
@@ -124,6 +129,7 @@ static void git_hash_sha256_final_oid(struct object_id *oid, git_hash_ctx *ctx)
 	 * but keep it in case we extend the hash size again.
 	 */
 	memset(oid->hash + GIT_SHA256_RAWSZ, 0, GIT_MAX_RAWSZ - GIT_SHA256_RAWSZ);
+	oid->algo = GIT_HASH_SHA256;
 }
 
 static void git_hash_unknown_init(git_hash_ctx *ctx)


* [PATCH 10/15] hash: provide per-algorithm null OIDs
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
                   ` (8 preceding siblings ...)
  2021-04-10 15:21 ` [PATCH 09/15] hash: set and copy algo field in " brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-11 14:03   ` Junio C Hamano
  2021-04-10 15:21 ` [PATCH 11/15] builtin/show-index: set the algorithm for object IDs brian m. carlson
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

Up until recently, object IDs did not have an algorithm member, only a
hash.  Consequently, it was possible to share one null (all-zeros)
object ID among all hash algorithms.  Now that we're going to be
handling objects from multiple hash algorithms, it's important to make
sure that all object IDs have a correct algorithm field.

Introduce a per-algorithm null OID, and add it to struct git_hash_algo.
Introduce a wrapper function as well, and use it everywhere we used to
use the null_oid constant.
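
One way to picture a per-algorithm null OID is a small table indexed by
algorithm, reached through an accessor.  This is only a sketch: the
demo_* names and enum values are hypothetical, and the accessor here
takes the algorithm as an explicit parameter, whereas the wrapper in
this patch is called with no arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_RAWSZ 32	/* size of the buffer inside the object ID */

/* Hypothetical algorithm identifiers, for illustration only. */
enum demo_algo {
	DEMO_HASH_UNKNOWN = 0,
	DEMO_HASH_SHA1 = 1,
	DEMO_HASH_SHA256 = 2,
	DEMO_HASH_NALGOS = 3
};

/* Hypothetical stand-in for struct object_id with its algo member. */
struct demo_oid {
	unsigned char hash[DEMO_MAX_RAWSZ];
	int algo;
};

/* One all-zeros ID per algorithm, each tagged with its own algorithm. */
static const struct demo_oid demo_null_oids[DEMO_HASH_NALGOS] = {
	{ { 0 }, DEMO_HASH_UNKNOWN },
	{ { 0 }, DEMO_HASH_SHA1 },
	{ { 0 }, DEMO_HASH_SHA256 },
};

/* Accessor replacing a single shared constant. */
static const struct demo_oid *demo_null_oid(int algo)
{
	assert(algo > DEMO_HASH_UNKNOWN && algo < DEMO_HASH_NALGOS);
	return &demo_null_oids[algo];
}

/* A null ID is all zeros across the whole buffer, whatever its tag. */
static int demo_is_null_oid(const struct demo_oid *oid)
{
	static const unsigned char zeros[DEMO_MAX_RAWSZ];

	return !memcmp(oid->hash, zeros, DEMO_MAX_RAWSZ);
}
```

Since the accessor returns a pointer, call sites that used to write
"&null_oid" drop the address-of operator, and the few that copied the
constant by value dereference the result or use a copy helper instead.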

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 archive.c                                    |  4 ++-
 blame.c                                      |  2 +-
 branch.c                                     |  2 +-
 builtin/checkout.c                           |  6 ++--
 builtin/clone.c                              |  2 +-
 builtin/describe.c                           |  2 +-
 builtin/diff.c                               |  2 +-
 builtin/fast-export.c                        | 10 +++----
 builtin/grep.c                               |  2 +-
 builtin/ls-files.c                           |  2 +-
 builtin/rebase.c                             |  4 +--
 builtin/receive-pack.c                       |  2 +-
 builtin/submodule--helper.c                  | 29 ++++++++++----------
 builtin/unpack-objects.c                     |  3 +-
 builtin/worktree.c                           |  4 +--
 combine-diff.c                               |  2 +-
 connect.c                                    |  2 +-
 diff-lib.c                                   |  6 ++--
 diff-no-index.c                              |  2 +-
 diff.c                                       |  4 +--
 dir.c                                        |  2 +-
 grep.c                                       |  2 +-
 hash.h                                       |  7 +++--
 log-tree.c                                   |  2 +-
 merge-ort.c                                  | 20 +++++++-------
 merge-recursive.c                            | 10 +++----
 notes-merge.c                                |  2 +-
 notes.c                                      |  2 +-
 object-file.c                                | 15 +++++++++-
 parse-options-cb.c                           |  2 +-
 range-diff.c                                 |  2 +-
 refs.c                                       |  4 +--
 refs/debug.c                                 |  2 +-
 refs/files-backend.c                         |  2 +-
 reset.c                                      |  2 +-
 sequencer.c                                  |  4 +--
 submodule-config.c                           |  2 +-
 submodule.c                                  | 26 ++++++++++--------
 t/helper/test-submodule-nested-repo-config.c |  2 +-
 tree-diff.c                                  |  4 +--
 wt-status.c                                  |  4 +--
 xdiff-interface.c                            |  2 +-
 42 files changed, 117 insertions(+), 95 deletions(-)

diff --git a/archive.c b/archive.c
index 6cfb9e42d6..ff2bb54f62 100644
--- a/archive.c
+++ b/archive.c
@@ -275,9 +275,11 @@ int write_archive_entries(struct archiver_args *args,
 	int err;
 	struct strbuf path_in_archive = STRBUF_INIT;
 	struct strbuf content = STRBUF_INIT;
-	struct object_id fake_oid = null_oid;
+	struct object_id fake_oid;
 	int i;
 
+	oidcpy(&fake_oid, null_oid());
+
 	if (args->baselen > 0 && args->base[args->baselen - 1] == '/') {
 		size_t len = args->baselen;
 
diff --git a/blame.c b/blame.c
index 5018bb8fb2..206c295660 100644
--- a/blame.c
+++ b/blame.c
@@ -242,7 +242,7 @@ static struct commit *fake_working_tree_commit(struct repository *r,
 		switch (st.st_mode & S_IFMT) {
 		case S_IFREG:
 			if (opt->flags.allow_textconv &&
-			    textconv_object(r, read_from, mode, &null_oid, 0, &buf_ptr, &buf_len))
+			    textconv_object(r, read_from, mode, null_oid(), 0, &buf_ptr, &buf_len))
 				strbuf_attach(&buf, buf_ptr, buf_len, buf_len + 1);
 			else if (strbuf_read_file(&buf, read_from, st.st_size) != st.st_size)
 				die_errno("cannot open or read '%s'", read_from);
diff --git a/branch.c b/branch.c
index 9c9dae1eae..8db10f8496 100644
--- a/branch.c
+++ b/branch.c
@@ -322,7 +322,7 @@ void create_branch(struct repository *r,
 		transaction = ref_transaction_begin(&err);
 		if (!transaction ||
 		    ref_transaction_update(transaction, ref.buf,
-					   &oid, forcing ? NULL : &null_oid,
+					   &oid, forcing ? NULL : null_oid(),
 					   0, msg, &err) ||
 		    ref_transaction_commit(transaction, &err))
 			die("%s", err.buf);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 4c696ef480..8a12b92c5f 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -106,8 +106,8 @@ static int post_checkout_hook(struct commit *old_commit, struct commit *new_comm
 			      int changed)
 {
 	return run_hook_le(NULL, "post-checkout",
-			   oid_to_hex(old_commit ? &old_commit->object.oid : &null_oid),
-			   oid_to_hex(new_commit ? &new_commit->object.oid : &null_oid),
+			   oid_to_hex(old_commit ? &old_commit->object.oid : null_oid()),
+			   oid_to_hex(new_commit ? &new_commit->object.oid : null_oid()),
 			   changed ? "1" : "0", NULL);
 	/* "new_commit" can be NULL when checking out from the index before
 	   a commit exists. */
@@ -638,7 +638,7 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 	opts.src_index = &the_index;
 	opts.dst_index = &the_index;
 	init_checkout_metadata(&opts.meta, info->refname,
-			       info->commit ? &info->commit->object.oid : &null_oid,
+			       info->commit ? &info->commit->object.oid : null_oid(),
 			       NULL);
 	parse_tree(tree);
 	init_tree_desc(&tree_desc, tree->buffer, tree->size);
diff --git a/builtin/clone.c b/builtin/clone.c
index f6b0c48bed..eeb74c0217 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -820,7 +820,7 @@ static int checkout(int submodule_progress)
 	if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
 
-	err |= run_hook_le(NULL, "post-checkout", oid_to_hex(&null_oid),
+	err |= run_hook_le(NULL, "post-checkout", oid_to_hex(null_oid()),
 			   oid_to_hex(&oid), "1", NULL);
 
 	if (!err && (option_recurse_submodules.nr > 0)) {
diff --git a/builtin/describe.c b/builtin/describe.c
index 40482d8e9f..e912ba50d7 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -502,7 +502,7 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
 {
 	struct rev_info revs;
 	struct strvec args = STRVEC_INIT;
-	struct process_commit_data pcd = { null_oid, oid, dst, &revs};
+	struct process_commit_data pcd = { *null_oid(), oid, dst, &revs};
 
 	strvec_pushl(&args, "internal: The first arg is not parsed",
 		     "--objects", "--in-commit-order", "--reverse", "HEAD",
diff --git a/builtin/diff.c b/builtin/diff.c
index 617b9a4101..2d87c37a17 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -98,7 +98,7 @@ static int builtin_diff_b_f(struct rev_info *revs,
 
 	stuff_change(&revs->diffopt,
 		     blob[0]->mode, canon_mode(st.st_mode),
-		     &blob[0]->item->oid, &null_oid,
+		     &blob[0]->item->oid, null_oid(),
 		     1, 0,
 		     blob[0]->path ? blob[0]->path : path,
 		     path);
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 85a76e0ef8..3c20f164f0 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -870,7 +870,7 @@ static void handle_tag(const char *name, struct tag *tag)
 				p = rewrite_commit((struct commit *)tagged);
 				if (!p) {
 					printf("reset %s\nfrom %s\n\n",
-					       name, oid_to_hex(&null_oid));
+					       name, oid_to_hex(null_oid()));
 					free(buf);
 					return;
 				}
@@ -884,7 +884,7 @@ static void handle_tag(const char *name, struct tag *tag)
 
 	if (tagged->type == OBJ_TAG) {
 		printf("reset %s\nfrom %s\n\n",
-		       name, oid_to_hex(&null_oid));
+		       name, oid_to_hex(null_oid()));
 	}
 	skip_prefix(name, "refs/tags/", &name);
 	printf("tag %s\n", name);
@@ -1016,7 +1016,7 @@ static void handle_tags_and_duplicates(struct string_list *extras)
 				 * it.
 				 */
 				printf("reset %s\nfrom %s\n\n",
-				       name, oid_to_hex(&null_oid));
+				       name, oid_to_hex(null_oid()));
 				continue;
 			}
 
@@ -1035,7 +1035,7 @@ static void handle_tags_and_duplicates(struct string_list *extras)
 				if (!reference_excluded_commits) {
 					/* delete the ref */
 					printf("reset %s\nfrom %s\n\n",
-					       name, oid_to_hex(&null_oid));
+					       name, oid_to_hex(null_oid()));
 					continue;
 				}
 				/* set ref to commit using oid, not mark */
@@ -1146,7 +1146,7 @@ static void handle_deletes(void)
 			continue;
 
 		printf("reset %s\nfrom %s\n\n",
-				refspec->dst, oid_to_hex(&null_oid));
+				refspec->dst, oid_to_hex(null_oid()));
 	}
 }
 
diff --git a/builtin/grep.c b/builtin/grep.c
index 5de725f904..e0e326004e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -421,7 +421,7 @@ static int grep_submodule(struct grep_opt *opt,
 	struct grep_opt subopt;
 	int hit;
 
-	sub = submodule_from_path(superproject, &null_oid, path);
+	sub = submodule_from_path(superproject, null_oid(), path);
 
 	if (!is_submodule_active(superproject, path))
 		return 0;
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 60a2913a01..c589eb7f89 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -210,7 +210,7 @@ static void show_submodule(struct repository *superproject,
 {
 	struct repository subrepo;
 	const struct submodule *sub = submodule_from_path(superproject,
-							  &null_oid, path);
+							  null_oid(), path);
 
 	if (repo_submodule_init(&subrepo, superproject, sub))
 		return;
diff --git a/builtin/rebase.c b/builtin/rebase.c
index 783b526f6e..c9206b40ea 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -485,7 +485,7 @@ static const char * const builtin_rebase_interactive_usage[] = {
 int cmd_rebase__interactive(int argc, const char **argv, const char *prefix)
 {
 	struct rebase_options opts = REBASE_OPTIONS_INIT;
-	struct object_id squash_onto = null_oid;
+	struct object_id squash_onto = *null_oid();
 	enum action command = ACTION_NONE;
 	struct option options[] = {
 		OPT_NEGBIT(0, "ff", &opts.flags, N_("allow fast-forward"),
@@ -1139,7 +1139,7 @@ static int can_fast_forward(struct commit *onto, struct commit *upstream,
 
 	merge_bases = get_merge_bases(onto, head);
 	if (!merge_bases || merge_bases->next) {
-		oidcpy(merge_base, &null_oid);
+		oidcpy(merge_base, null_oid());
 		goto done;
 	}
 
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 6bc12c828a..a34742513a 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -329,7 +329,7 @@ static void write_head_info(void)
 	for_each_alternate_ref(show_one_alternate_ref, &seen);
 	oidset_clear(&seen);
 	if (!sent_capabilities)
-		show_ref("capabilities^{}", &null_oid);
+		show_ref("capabilities^{}", null_oid());
 
 	advertise_shallow_grafts(1);
 
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 9d505a6329..d55f6262e9 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -426,7 +426,8 @@ static int module_list(int argc, const char **argv, const char *prefix)
 		const struct cache_entry *ce = list.entries[i];
 
 		if (ce_stage(ce))
-			printf("%06o %s U\t", ce->ce_mode, oid_to_hex(&null_oid));
+			printf("%06o %s U\t", ce->ce_mode,
+			       oid_to_hex(null_oid()));
 		else
 			printf("%06o %s %d\t", ce->ce_mode,
 			       oid_to_hex(&ce->oid), ce_stage(ce));
@@ -466,7 +467,7 @@ static void runcommand_in_submodule_cb(const struct cache_entry *list_item,
 
 	displaypath = get_submodule_displaypath(path, info->prefix);
 
-	sub = submodule_from_path(the_repository, &null_oid, path);
+	sub = submodule_from_path(the_repository, null_oid(), path);
 
 	if (!sub)
 		die(_("No url found for submodule path '%s' in .gitmodules"),
@@ -623,7 +624,7 @@ static void init_submodule(const char *path, const char *prefix,
 
 	displaypath = get_submodule_displaypath(path, prefix);
 
-	sub = submodule_from_path(the_repository, &null_oid, path);
+	sub = submodule_from_path(the_repository, null_oid(), path);
 
 	if (!sub)
 		die(_("No url found for submodule path '%s' in .gitmodules"),
@@ -783,14 +784,14 @@ static void status_submodule(const char *path, const struct object_id *ce_oid,
 	struct strbuf buf = STRBUF_INIT;
 	const char *git_dir;
 
-	if (!submodule_from_path(the_repository, &null_oid, path))
+	if (!submodule_from_path(the_repository, null_oid(), path))
 		die(_("no submodule mapping found in .gitmodules for path '%s'"),
 		      path);
 
 	displaypath = get_submodule_displaypath(path, prefix);
 
 	if ((CE_STAGEMASK & ce_flags) >> CE_STAGESHIFT) {
-		print_status(flags, 'U', path, &null_oid, displaypath);
+		print_status(flags, 'U', path, null_oid(), displaypath);
 		goto cleanup;
 	}
 
@@ -916,7 +917,7 @@ static int module_name(int argc, const char **argv, const char *prefix)
 	if (argc != 2)
 		usage(_("git submodule--helper name <path>"));
 
-	sub = submodule_from_path(the_repository, &null_oid, argv[1]);
+	sub = submodule_from_path(the_repository, null_oid(), argv[1]);
 
 	if (!sub)
 		die(_("no submodule mapping found in .gitmodules for path '%s'"),
@@ -1040,7 +1041,7 @@ static void generate_submodule_summary(struct summary_cb *info,
 	char *errmsg = NULL;
 	int total_commits = -1;
 
-	if (!info->cached && oideq(&p->oid_dst, &null_oid)) {
+	if (!info->cached && oideq(&p->oid_dst, null_oid())) {
 		if (S_ISGITLINK(p->mod_dst)) {
 			struct ref_store *refs = get_submodule_ref_store(p->sm_path);
 			if (refs)
@@ -1177,7 +1178,7 @@ static void prepare_submodule_summary(struct summary_cb *info,
 
 		if (info->for_status && p->status != 'A' &&
 		    (sub = submodule_from_path(the_repository,
-					       &null_oid, p->sm_path))) {
+					       null_oid(), p->sm_path))) {
 			char *config_key = NULL;
 			const char *value;
 			int ignore_all = 0;
@@ -1373,7 +1374,7 @@ static void sync_submodule(const char *path, const char *prefix,
 	if (!is_submodule_active(the_repository, path))
 		return;
 
-	sub = submodule_from_path(the_repository, &null_oid, path);
+	sub = submodule_from_path(the_repository, null_oid(), path);
 
 	if (sub && sub->url) {
 		if (starts_with_dot_dot_slash(sub->url) ||
@@ -1525,7 +1526,7 @@ static void deinit_submodule(const char *path, const char *prefix,
 	struct strbuf sb_config = STRBUF_INIT;
 	char *sub_git_dir = xstrfmt("%s/.git", path);
 
-	sub = submodule_from_path(the_repository, &null_oid, path);
+	sub = submodule_from_path(the_repository, null_oid(), path);
 
 	if (!sub || !sub->name)
 		goto cleanup;
@@ -1925,7 +1926,7 @@ static void determine_submodule_update_strategy(struct repository *r,
 						const char *update,
 						struct submodule_update_strategy *out)
 {
-	const struct submodule *sub = submodule_from_path(r, &null_oid, path);
+	const struct submodule *sub = submodule_from_path(r, null_oid(), path);
 	char *key;
 	const char *val;
 
@@ -2077,7 +2078,7 @@ static int prepare_to_clone_next_submodule(const struct cache_entry *ce,
 		goto cleanup;
 	}
 
-	sub = submodule_from_path(the_repository, &null_oid, ce->name);
+	sub = submodule_from_path(the_repository, null_oid(), ce->name);
 
 	if (suc->recursive_prefix)
 		displaypath = relative_path(suc->recursive_prefix,
@@ -2395,7 +2396,7 @@ static const char *remote_submodule_branch(const char *path)
 	const char *branch = NULL;
 	char *key;
 
-	sub = submodule_from_path(the_repository, &null_oid, path);
+	sub = submodule_from_path(the_repository, null_oid(), path);
 	if (!sub)
 		return NULL;
 
@@ -2533,7 +2534,7 @@ static int ensure_core_worktree(int argc, const char **argv, const char *prefix)
 
 	path = argv[1];
 
-	sub = submodule_from_path(the_repository, &null_oid, path);
+	sub = submodule_from_path(the_repository, null_oid(), path);
 	if (!sub)
 		BUG("We could get the submodule handle before?");
 
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 6ac90dc5f7..4a9466295b 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -421,7 +421,8 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
 			 * has not been resolved yet.
 			 */
 			oidclr(&obj_list[nr].oid);
-			add_delta_to_list(nr, &null_oid, base_offset, delta_data, delta_size);
+			add_delta_to_list(nr, null_oid(), base_offset,
+					  delta_data, delta_size);
 			return;
 		}
 	}
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 8771453493..f754978e47 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -331,7 +331,7 @@ static int add_worktree(const char *path, const char *refname,
 	 */
 	strbuf_reset(&sb);
 	strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
-	write_file(sb.buf, "%s", oid_to_hex(&null_oid));
+	write_file(sb.buf, "%s", oid_to_hex(null_oid()));
 	strbuf_reset(&sb);
 	strbuf_addf(&sb, "%s/commondir", sb_repo.buf);
 	write_file(sb.buf, "../..");
@@ -394,7 +394,7 @@ static int add_worktree(const char *path, const char *refname,
 			cp.argv = NULL;
 			cp.trace2_hook_name = "post-checkout";
 			strvec_pushl(&cp.args, absolute_path(hook),
-				     oid_to_hex(&null_oid),
+				     oid_to_hex(null_oid()),
 				     oid_to_hex(&commit->object.oid),
 				     "1", NULL);
 			ret = run_command(&cp);
diff --git a/combine-diff.c b/combine-diff.c
index 06635f91bc..7d925ce9ce 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1068,7 +1068,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 						   &result_size, NULL, NULL);
 		} else if (textconv) {
 			struct diff_filespec *df = alloc_filespec(elem->path);
-			fill_filespec(df, &null_oid, 0, st.st_mode);
+			fill_filespec(df, null_oid(), 0, st.st_mode);
 			result_size = fill_textconv(opt->repo, textconv, df, &result);
 			free_filespec(df);
 		} else if (0 <= (fd = open(elem->path, O_RDONLY))) {
diff --git a/connect.c b/connect.c
index 40b5c15f81..70b13389ba 100644
--- a/connect.c
+++ b/connect.c
@@ -254,7 +254,7 @@ static int process_dummy_ref(const struct packet_reader *reader)
 		return 0;
 	name++;
 
-	return oideq(&null_oid, &oid) && !strcmp(name, "capabilities^{}");
+	return oideq(null_oid(), &oid) && !strcmp(name, "capabilities^{}");
 }
 
 static void check_no_capabilities(const char *line, int len)
diff --git a/diff-lib.c b/diff-lib.c
index e5a58c9259..c2ac9250fe 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -232,7 +232,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 				   ce_intent_to_add(ce)) {
 				newmode = ce_mode_from_stat(ce, st.st_mode);
 				diff_addremove(&revs->diffopt, '+', newmode,
-					       &null_oid, 0, ce->name, 0);
+					       null_oid(), 0, ce->name, 0);
 				continue;
 			}
 
@@ -249,7 +249,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		}
 		oldmode = ce->ce_mode;
 		old_oid = &ce->oid;
-		new_oid = changed ? &null_oid : &ce->oid;
+		new_oid = changed ? null_oid() : &ce->oid;
 		diff_change(&revs->diffopt, oldmode, newmode,
 			    old_oid, new_oid,
 			    !is_null_oid(old_oid),
@@ -307,7 +307,7 @@ static int get_stat_data(const struct index_state *istate,
 						    0, dirty_submodule);
 		if (changed) {
 			mode = ce_mode_from_stat(ce, st.st_mode);
-			oid = &null_oid;
+			oid = null_oid();
 		}
 	}
 
diff --git a/diff-no-index.c b/diff-no-index.c
index 7814eabfe0..308922e2b3 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -83,7 +83,7 @@ static struct diff_filespec *noindex_filespec(const char *name, int mode)
 	if (!name)
 		name = "/dev/null";
 	s = alloc_filespec(name);
-	fill_filespec(s, &null_oid, 0, mode);
+	fill_filespec(s, null_oid(), 0, mode);
 	if (name == file_from_standard_input)
 		populate_from_stdin(s);
 	return s;
diff --git a/diff.c b/diff.c
index 97c62f47df..7c730fe644 100644
--- a/diff.c
+++ b/diff.c
@@ -4190,7 +4190,7 @@ static struct diff_tempfile *prepare_temp_file(struct repository *r,
 				die_errno("readlink(%s)", name);
 			prep_temp_blob(r->index, name, temp, sb.buf, sb.len,
 				       (one->oid_valid ?
-					&one->oid : &null_oid),
+					&one->oid : null_oid()),
 				       (one->oid_valid ?
 					one->mode : S_IFLNK));
 			strbuf_release(&sb);
@@ -4199,7 +4199,7 @@ static struct diff_tempfile *prepare_temp_file(struct repository *r,
 			/* we can borrow from the file in the work tree */
 			temp->name = name;
 			if (!one->oid_valid)
-				oid_to_hex_r(temp->hex, &null_oid);
+				oid_to_hex_r(temp->hex, null_oid());
 			else
 				oid_to_hex_r(temp->hex, &one->oid);
 			/* Even though we may sometimes borrow the
diff --git a/dir.c b/dir.c
index 813dd7ba53..037474337f 100644
--- a/dir.c
+++ b/dir.c
@@ -3556,7 +3556,7 @@ static void connect_wt_gitdir_in_nested(const char *sub_worktree,
 			 */
 			i++;
 
-		sub = submodule_from_path(&subrepo, &null_oid, ce->name);
+		sub = submodule_from_path(&subrepo, null_oid(), ce->name);
 		if (!sub || !is_submodule_active(&subrepo, ce->name))
 			/* .gitmodules broken or inactive sub */
 			continue;
diff --git a/grep.c b/grep.c
index c5c348be55..8f91af1cb0 100644
--- a/grep.c
+++ b/grep.c
@@ -1494,7 +1494,7 @@ static int fill_textconv_grep(struct repository *r,
 		fill_filespec(df, gs->identifier, 1, 0100644);
 		break;
 	case GREP_SOURCE_FILE:
-		fill_filespec(df, &null_oid, 0, 0100644);
+		fill_filespec(df, null_oid(), 0, 0100644);
 		break;
 	default:
 		BUG("attempt to textconv something without a path?");
diff --git a/hash.h b/hash.h
index 3b114f053e..5fabf6e1ec 100644
--- a/hash.h
+++ b/hash.h
@@ -170,6 +170,9 @@ struct git_hash_algo {
 
 	/* The OID of the empty blob. */
 	const struct object_id *empty_blob;
+
+	/* The all-zeros OID. */
+	const struct object_id *null_oid;
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
@@ -190,7 +193,7 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
 
 #define the_hash_algo the_repository->hash_algo
 
-extern const struct object_id null_oid;
+const struct object_id *null_oid(void);
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
@@ -226,7 +229,7 @@ static inline int oideq(const struct object_id *oid1, const struct object_id *oi
 
 static inline int is_null_oid(const struct object_id *oid)
 {
-	return oideq(oid, &null_oid);
+	return oideq(oid, null_oid());
 }
 
 static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
diff --git a/log-tree.c b/log-tree.c
index f3178a66a9..7b823786c2 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -419,7 +419,7 @@ void log_write_email_headers(struct rev_info *opt, struct commit *commit,
 {
 	const char *extra_headers = opt->extra_headers;
 	const char *name = oid_to_hex(opt->zero_commit ?
-				      &null_oid : &commit->object.oid);
+				      null_oid() : &commit->object.oid);
 
 	*need_8bit_cte_p = 0; /* unknown */
 
diff --git a/merge-ort.c b/merge-ort.c
index 5e118a85ee..3e552cfd21 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1291,7 +1291,7 @@ static int handle_content_merge(struct merge_options *opt,
 		two_way = ((S_IFMT & o->mode) != (S_IFMT & a->mode));
 
 		merge_status = merge_3way(opt, path,
-					  two_way ? &null_oid : &o->oid,
+					  two_way ? null_oid() : &o->oid,
 					  &a->oid, &b->oid,
 					  pathnames, extra_marker_size,
 					  &result_buf);
@@ -1313,7 +1313,7 @@ static int handle_content_merge(struct merge_options *opt,
 	} else if (S_ISGITLINK(a->mode)) {
 		int two_way = ((S_IFMT & o->mode) != (S_IFMT & a->mode));
 		clean = merge_submodule(opt, pathnames[0],
-					two_way ? &null_oid : &o->oid,
+					two_way ? null_oid() : &o->oid,
 					&a->oid, &b->oid, &result->oid);
 		if (opt->priv->call_depth && two_way && !clean) {
 			result->mode = o->mode;
@@ -2123,7 +2123,7 @@ static int process_renames(struct merge_options *opt,
 			if (type_changed) {
 				/* rename vs. typechange */
 				/* Mark the original as resolved by removal */
-				memcpy(&oldinfo->stages[0].oid, &null_oid,
+				memcpy(&oldinfo->stages[0].oid, null_oid(),
 				       sizeof(oldinfo->stages[0].oid));
 				oldinfo->stages[0].mode = 0;
 				oldinfo->filemask &= 0x06;
@@ -2762,7 +2762,7 @@ static void process_entry(struct merge_options *opt,
 			if (ci->filemask & (1 << i))
 				continue;
 			ci->stages[i].mode = 0;
-			oidcpy(&ci->stages[i].oid, &null_oid);
+			oidcpy(&ci->stages[i].oid, null_oid());
 		}
 	} else if (ci->df_conflict && ci->merged.result.mode != 0) {
 		/*
@@ -2808,7 +2808,7 @@ static void process_entry(struct merge_options *opt,
 				continue;
 			/* zero out any entries related to directories */
 			new_ci->stages[i].mode = 0;
-			oidcpy(&new_ci->stages[i].oid, &null_oid);
+			oidcpy(&new_ci->stages[i].oid, null_oid());
 		}
 
 		/*
@@ -2909,11 +2909,11 @@ static void process_entry(struct merge_options *opt,
 			new_ci->merged.result.mode = ci->stages[2].mode;
 			oidcpy(&new_ci->merged.result.oid, &ci->stages[2].oid);
 			new_ci->stages[1].mode = 0;
-			oidcpy(&new_ci->stages[1].oid, &null_oid);
+			oidcpy(&new_ci->stages[1].oid, null_oid());
 			new_ci->filemask = 5;
 			if ((S_IFMT & b_mode) != (S_IFMT & o_mode)) {
 				new_ci->stages[0].mode = 0;
-				oidcpy(&new_ci->stages[0].oid, &null_oid);
+				oidcpy(&new_ci->stages[0].oid, null_oid());
 				new_ci->filemask = 4;
 			}
 
@@ -2921,11 +2921,11 @@ static void process_entry(struct merge_options *opt,
 			ci->merged.result.mode = ci->stages[1].mode;
 			oidcpy(&ci->merged.result.oid, &ci->stages[1].oid);
 			ci->stages[2].mode = 0;
-			oidcpy(&ci->stages[2].oid, &null_oid);
+			oidcpy(&ci->stages[2].oid, null_oid());
 			ci->filemask = 3;
 			if ((S_IFMT & a_mode) != (S_IFMT & o_mode)) {
 				ci->stages[0].mode = 0;
-				oidcpy(&ci->stages[0].oid, &null_oid);
+				oidcpy(&ci->stages[0].oid, null_oid());
 				ci->filemask = 2;
 			}
 
@@ -3042,7 +3042,7 @@ static void process_entry(struct merge_options *opt,
 		/* Deleted on both sides */
 		ci->merged.is_null = 1;
 		ci->merged.result.mode = 0;
-		oidcpy(&ci->merged.result.oid, &null_oid);
+		oidcpy(&ci->merged.result.oid, null_oid());
 		ci->merged.clean = !ci->path_conflict;
 	}
 
diff --git a/merge-recursive.c b/merge-recursive.c
index ed31f9496c..03f5c0769e 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -486,7 +486,7 @@ static int get_tree_entry_if_blob(struct repository *r,
 
 	ret = get_tree_entry(r, tree, path, &dfs->oid, &dfs->mode);
 	if (S_ISDIR(dfs->mode)) {
-		oidcpy(&dfs->oid, &null_oid);
+		oidcpy(&dfs->oid, null_oid());
 		dfs->mode = 0;
 	}
 	return ret;
@@ -1604,7 +1604,7 @@ static int handle_file_collision(struct merge_options *opt,
 
 	/* Store things in diff_filespecs for functions that need it */
 	null.path = (char *)collide_path;
-	oidcpy(&null.oid, &null_oid);
+	oidcpy(&null.oid, null_oid());
 	null.mode = 0;
 
 	if (merge_mode_and_contents(opt, &null, a, b, collide_path,
@@ -2789,11 +2789,11 @@ static int process_renames(struct merge_options *opt,
 			dst_other.mode = ren1->dst_entry->stages[other_stage].mode;
 			try_merge = 0;
 
-			if (oideq(&src_other.oid, &null_oid) &&
+			if (oideq(&src_other.oid, null_oid()) &&
 			    ren1->dir_rename_original_type == 'A') {
 				setup_rename_conflict_info(RENAME_VIA_DIR,
 							   opt, ren1, NULL);
-			} else if (oideq(&src_other.oid, &null_oid)) {
+			} else if (oideq(&src_other.oid, null_oid())) {
 				setup_rename_conflict_info(RENAME_DELETE,
 							   opt, ren1, NULL);
 			} else if ((dst_other.mode == ren1->pair->two->mode) &&
@@ -2812,7 +2812,7 @@ static int process_renames(struct merge_options *opt,
 						      1, /* update_cache */
 						      0  /* update_wd    */))
 					clean_merge = -1;
-			} else if (!oideq(&dst_other.oid, &null_oid)) {
+			} else if (!oideq(&dst_other.oid, null_oid())) {
 				/*
 				 * Probably not a clean merge, but it's
 				 * premature to set clean_merge to 0 here,
diff --git a/notes-merge.c b/notes-merge.c
index d2771fa3d4..53c587f750 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -600,7 +600,7 @@ int notes_merge(struct notes_merge_options *o,
 	/* Find merge bases */
 	bases = get_merge_bases(local, remote);
 	if (!bases) {
-		base_oid = &null_oid;
+		base_oid = null_oid();
 		base_tree_oid = the_hash_algo->empty_tree;
 		if (o->verbosity >= 4)
 			printf("No merge base found; doing history-less merge\n");
diff --git a/notes.c b/notes.c
index 1dfe9e2b9f..3f48a2aac3 100644
--- a/notes.c
+++ b/notes.c
@@ -1327,7 +1327,7 @@ int copy_note(struct notes_tree *t,
 	if (note)
 		return add_note(t, to_obj, note, combine_notes);
 	else if (existing_note)
-		return add_note(t, to_obj, &null_oid, combine_notes);
+		return add_note(t, to_obj, null_oid(), combine_notes);
 
 	return 0;
 }
diff --git a/object-file.c b/object-file.c
index 5f1fa05c4e..50bb5b6ca4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -55,7 +55,6 @@
 	"\x6f\xe1\x41\xf7\x74\x91\x20\xa3\x03\x72" \
 	"\x18\x13"
 
-const struct object_id null_oid;
 static const struct object_id empty_tree_oid = {
 	EMPTY_TREE_SHA1_BIN_LITERAL,
 	GIT_HASH_SHA1,
@@ -64,6 +63,9 @@ static const struct object_id empty_blob_oid = {
 	EMPTY_BLOB_SHA1_BIN_LITERAL,
 	GIT_HASH_SHA1,
 };
+const struct object_id null_oid_sha1 = {
+	{0}, GIT_HASH_SHA1,
+};
 static const struct object_id empty_tree_oid_sha256 = {
 	EMPTY_TREE_SHA256_BIN_LITERAL,
 	GIT_HASH_SHA256,
@@ -72,6 +74,9 @@ static const struct object_id empty_blob_oid_sha256 = {
 	EMPTY_BLOB_SHA256_BIN_LITERAL,
 	GIT_HASH_SHA256,
 };
+static const struct object_id null_oid_sha256 = {
+	{0}, GIT_HASH_SHA256,
+};
 
 static void git_hash_sha1_init(git_hash_ctx *ctx)
 {
@@ -172,6 +177,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		git_hash_unknown_final_oid,
 		NULL,
 		NULL,
+		NULL,
 	},
 	{
 		"sha1",
@@ -187,6 +193,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		git_hash_sha1_final_oid,
 		&empty_tree_oid,
 		&empty_blob_oid,
+		&null_oid_sha1,
 	},
 	{
 		"sha256",
@@ -202,9 +209,15 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		git_hash_sha256_final_oid,
 		&empty_tree_oid_sha256,
 		&empty_blob_oid_sha256,
+		&null_oid_sha256,
 	}
 };
 
+const struct object_id *null_oid(void)
+{
+	return the_hash_algo->null_oid;
+}
+
 const char *empty_tree_oid_hex(void)
 {
 	static char buf[GIT_MAX_HEXSZ + 1];
diff --git a/parse-options-cb.c b/parse-options-cb.c
index 4542d4d3f9..3c811e1e4a 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -140,7 +140,7 @@ int parse_opt_object_id(const struct option *opt, const char *arg, int unset)
 	struct object_id *target = opt->value;
 
 	if (unset) {
-		*target = null_oid;
+		oidcpy(target, null_oid());
 		return 0;
 	}
 	if (!arg)
diff --git a/range-diff.c b/range-diff.c
index 116fb0735c..1a4471fe4c 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -445,7 +445,7 @@ static struct diff_filespec *get_filespec(const char *name, const char *p)
 {
 	struct diff_filespec *spec = alloc_filespec(name);
 
-	fill_filespec(spec, &null_oid, 0, 0100644);
+	fill_filespec(spec, null_oid(), 0, 0100644);
 	spec->data = (char *)p;
 	spec->size = strlen(p);
 	spec->should_munmap = 0;
diff --git a/refs.c b/refs.c
index 261fd82beb..996063fdf4 100644
--- a/refs.c
+++ b/refs.c
@@ -1107,7 +1107,7 @@ int ref_transaction_create(struct ref_transaction *transaction,
 	if (!new_oid || is_null_oid(new_oid))
 		BUG("create called without valid new_oid");
 	return ref_transaction_update(transaction, refname, new_oid,
-				      &null_oid, flags, msg, err);
+				      null_oid(), flags, msg, err);
 }
 
 int ref_transaction_delete(struct ref_transaction *transaction,
@@ -1119,7 +1119,7 @@ int ref_transaction_delete(struct ref_transaction *transaction,
 	if (old_oid && is_null_oid(old_oid))
 		BUG("delete called with old_oid set to zeros");
 	return ref_transaction_update(transaction, refname,
-				      &null_oid, old_oid,
+				      null_oid(), old_oid,
 				      flags, msg, err);
 }
 
diff --git a/refs/debug.c b/refs/debug.c
index 922e64fa6a..2665f94309 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -243,7 +243,7 @@ static int debug_read_raw_ref(struct ref_store *ref_store, const char *refname,
 	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
 	int res = 0;
 
-	oidcpy(oid, &null_oid);
+	oidcpy(oid, null_oid());
 	res = drefs->refs->be->read_raw_ref(drefs->refs, refname, oid, referent,
 					    type);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 119972ee16..3f29f8c143 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1084,7 +1084,7 @@ static void prune_ref(struct files_ref_store *refs, struct ref_to_prune *r)
 	ref_transaction_add_update(
 			transaction, r->name,
 			REF_NO_DEREF | REF_HAVE_NEW | REF_HAVE_OLD | REF_IS_PRUNING,
-			&null_oid, &r->oid, NULL);
+			null_oid(), &r->oid, NULL);
 	if (ref_transaction_commit(transaction, &err))
 		goto cleanup;
 
diff --git a/reset.c b/reset.c
index 2f4fbd07c5..4bea758053 100644
--- a/reset.c
+++ b/reset.c
@@ -128,7 +128,7 @@ int reset_head(struct repository *r, struct object_id *oid, const char *action,
 	}
 	if (run_hook)
 		run_hook_le(NULL, "post-checkout",
-			    oid_to_hex(orig ? orig : &null_oid),
+			    oid_to_hex(orig ? orig : null_oid()),
 			    oid_to_hex(oid), "1", NULL);
 
 leave_reset_head:
diff --git a/sequencer.c b/sequencer.c
index fd183b5593..dbee779243 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -524,7 +524,7 @@ static int fast_forward_to(struct repository *r,
 	if (!transaction ||
 	    ref_transaction_update(transaction, "HEAD",
 				   to, unborn && !is_rebase_i(opts) ?
-				   &null_oid : from,
+				   null_oid() : from,
 				   0, sb.buf, &err) ||
 	    ref_transaction_commit(transaction, &err)) {
 		ref_transaction_free(transaction);
@@ -1131,7 +1131,7 @@ int update_head_with_reflog(const struct commit *old_head,
 	transaction = ref_transaction_begin(err);
 	if (!transaction ||
 	    ref_transaction_update(transaction, "HEAD", new_head,
-				   old_head ? &old_head->object.oid : &null_oid,
+				   old_head ? &old_head->object.oid : null_oid(),
 				   0, sb.buf, err) ||
 	    ref_transaction_commit(transaction, err)) {
 		ret = -1;
diff --git a/submodule-config.c b/submodule-config.c
index f502505566..2026120fb3 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -671,7 +671,7 @@ static int gitmodules_cb(const char *var, const char *value, void *data)
 
 	parameter.cache = repo->submodule_cache;
 	parameter.treeish_name = NULL;
-	parameter.gitmodules_oid = &null_oid;
+	parameter.gitmodules_oid = null_oid();
 	parameter.overwrite = 1;
 
 	return parse_config(var, value, &parameter);
diff --git a/submodule.c b/submodule.c
index 9767ba9893..7eeaf7d203 100644
--- a/submodule.c
+++ b/submodule.c
@@ -113,7 +113,7 @@ int update_path_in_gitmodules(const char *oldpath, const char *newpath)
 	if (is_gitmodules_unmerged(the_repository->index))
 		die(_("Cannot change unmerged .gitmodules, resolve merge conflicts first"));
 
-	submodule = submodule_from_path(the_repository, &null_oid, oldpath);
+	submodule = submodule_from_path(the_repository, null_oid(), oldpath);
 	if (!submodule || !submodule->name) {
 		warning(_("Could not find section in .gitmodules where path=%s"), oldpath);
 		return -1;
@@ -142,7 +142,7 @@ int remove_path_from_gitmodules(const char *path)
 	if (is_gitmodules_unmerged(the_repository->index))
 		die(_("Cannot change unmerged .gitmodules, resolve merge conflicts first"));
 
-	submodule = submodule_from_path(the_repository, &null_oid, path);
+	submodule = submodule_from_path(the_repository, null_oid(), path);
 	if (!submodule || !submodule->name) {
 		warning(_("Could not find section in .gitmodules where path=%s"), path);
 		return -1;
@@ -188,7 +188,8 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 					     const char *path)
 {
 	const struct submodule *submodule = submodule_from_path(the_repository,
-								&null_oid, path);
+								null_oid(),
+								path);
 	if (submodule) {
 		const char *ignore;
 		char *key;
@@ -244,7 +245,7 @@ int is_submodule_active(struct repository *repo, const char *path)
 	const struct string_list *sl;
 	const struct submodule *module;
 
-	module = submodule_from_path(repo, &null_oid, path);
+	module = submodule_from_path(repo, null_oid(), path);
 
 	/* early return if there isn't a path->module mapping */
 	if (!module)
@@ -745,7 +746,7 @@ const struct submodule *submodule_from_ce(const struct cache_entry *ce)
 	if (!should_update_submodules())
 		return NULL;
 
-	return submodule_from_path(the_repository, &null_oid, ce->name);
+	return submodule_from_path(the_repository, null_oid(), ce->name);
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
@@ -1037,7 +1038,7 @@ int find_unpushed_submodules(struct repository *r,
 		const struct submodule *submodule;
 		const char *path = NULL;
 
-		submodule = submodule_from_name(r, &null_oid, name->string);
+		submodule = submodule_from_name(r, null_oid(), name->string);
 		if (submodule)
 			path = submodule->path;
 		else
@@ -1224,7 +1225,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 		const struct submodule *submodule;
 		const char *path = NULL;
 
-		submodule = submodule_from_name(r, &null_oid, name->string);
+		submodule = submodule_from_name(r, null_oid(), name->string);
 		if (submodule)
 			path = submodule->path;
 		else
@@ -1361,7 +1362,7 @@ static struct fetch_task *fetch_task_create(struct repository *r,
 	struct fetch_task *task = xmalloc(sizeof(*task));
 	memset(task, 0, sizeof(*task));
 
-	task->sub = submodule_from_path(r, &null_oid, path);
+	task->sub = submodule_from_path(r, null_oid(), path);
 	if (!task->sub) {
 		/*
 		 * No entry in .gitmodules? Technically not a submodule,
@@ -1917,7 +1918,7 @@ int submodule_move_head(const char *path,
 	if (old_head && !is_submodule_populated_gently(path, error_code_ptr))
 		return 0;
 
-	sub = submodule_from_path(the_repository, &null_oid, path);
+	sub = submodule_from_path(the_repository, null_oid(), path);
 
 	if (!sub)
 		BUG("could not get submodule information for '%s'", path);
@@ -2076,7 +2077,7 @@ static void relocate_single_git_dir_into_superproject(const char *path)
 
 	real_old_git_dir = real_pathdup(old_git_dir, 1);
 
-	sub = submodule_from_path(the_repository, &null_oid, path);
+	sub = submodule_from_path(the_repository, null_oid(), path);
 	if (!sub)
 		die(_("could not lookup name for submodule '%s'"), path);
 
@@ -2135,7 +2136,7 @@ void absorb_git_dir_into_superproject(const char *path,
 		* superproject did not rewrite the git file links yet,
 		* fix it now.
 		*/
-		sub = submodule_from_path(the_repository, &null_oid, path);
+		sub = submodule_from_path(the_repository, null_oid(), path);
 		if (!sub)
 			die(_("could not lookup name for submodule '%s'"), path);
 		connect_work_tree_and_git_dir(path,
@@ -2283,7 +2284,8 @@ int submodule_to_gitdir(struct strbuf *buf, const char *submodule)
 		strbuf_addstr(buf, git_dir);
 	}
 	if (!is_git_directory(buf->buf)) {
-		sub = submodule_from_path(the_repository, &null_oid, submodule);
+		sub = submodule_from_path(the_repository, null_oid(),
+					  submodule);
 		if (!sub) {
 			ret = -1;
 			goto cleanup;
diff --git a/t/helper/test-submodule-nested-repo-config.c b/t/helper/test-submodule-nested-repo-config.c
index c5fd4527dc..e3f11ff5a7 100644
--- a/t/helper/test-submodule-nested-repo-config.c
+++ b/t/helper/test-submodule-nested-repo-config.c
@@ -18,7 +18,7 @@ int cmd__submodule_nested_repo_config(int argc, const char **argv)
 
 	setup_git_directory();
 
-	sub = submodule_from_path(the_repository, &null_oid, argv[1]);
+	sub = submodule_from_path(the_repository, null_oid(), argv[1]);
 	if (repo_submodule_init(&subrepo, the_repository, sub)) {
 		die_usage(argv, "Submodule not found.");
 	}
diff --git a/tree-diff.c b/tree-diff.c
index 7cebbb327e..1572615bd9 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -161,7 +161,7 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
 	memcpy(p->path + base->len, path, pathlen);
 	p->path[len] = 0;
 	p->mode = mode;
-	oidcpy(&p->oid, oid ? oid : &null_oid);
+	oidcpy(&p->oid, oid ? oid : null_oid());
 
 	return p;
 }
@@ -243,7 +243,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
 				mode_i = tp[i].entry.mode;
 			}
 			else {
-				oid_i = &null_oid;
+				oid_i = null_oid();
 				mode_i = 0;
 			}
 
diff --git a/wt-status.c b/wt-status.c
index 1aed68c43c..7603c1b198 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1687,10 +1687,10 @@ void wt_status_get_state(struct repository *r,
 	if (!sequencer_get_last_command(r, &action)) {
 		if (action == REPLAY_PICK) {
 			state->cherry_pick_in_progress = 1;
-			oidcpy(&state->cherry_pick_head_oid, &null_oid);
+			oidcpy(&state->cherry_pick_head_oid, null_oid());
 		} else {
 			state->revert_in_progress = 1;
-			oidcpy(&state->revert_head_oid, &null_oid);
+			oidcpy(&state->revert_head_oid, null_oid());
 		}
 	}
 	if (get_detached_from)
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 4d20069302..609615db2c 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -172,7 +172,7 @@ void read_mmblob(mmfile_t *ptr, const struct object_id *oid)
 	unsigned long size;
 	enum object_type type;
 
-	if (oideq(oid, &null_oid)) {
+	if (oideq(oid, null_oid())) {
 		ptr->ptr = xstrdup("");
 		ptr->size = 0;
 		return;

* [PATCH 11/15] builtin/show-index: set the algorithm for object IDs
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
                   ` (9 preceding siblings ...)
  2021-04-10 15:21 ` [PATCH 10/15] hash: provide per-algorithm null OIDs brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-10 15:21 ` [PATCH 12/15] commit-graph: don't store file hashes as struct object_id brian m. carlson
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

In most cases, when we load the hash of an object into a struct
object_id, we load it using one of the oid* or *_oid_hex functions.
However, for git show-index, we read it in directly using fread.  As a
consequence, set the algorithm explicitly so the object IDs can be used
correctly both now and in the future.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/show-index.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/show-index.c b/builtin/show-index.c
index 8106b03a6b..0e0b9fb95b 100644
--- a/builtin/show-index.c
+++ b/builtin/show-index.c
@@ -71,9 +71,11 @@ int cmd_show_index(int argc, const char **argv, const char *prefix)
 			uint32_t off;
 		} *entries;
 		ALLOC_ARRAY(entries, nr);
-		for (i = 0; i < nr; i++)
+		for (i = 0; i < nr; i++) {
 			if (fread(entries[i].oid.hash, hashsz, 1, stdin) != 1)
 				die("unable to read sha1 %u/%u", i, nr);
+			entries[i].oid.algo = hash_algo_by_ptr(the_hash_algo);
+		}
 		for (i = 0; i < nr; i++)
 			if (fread(&entries[i].crc, 4, 1, stdin) != 1)
 				die("unable to read crc %u/%u", i, nr);
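
To illustrate the idea in this patch, here is a minimal, self-contained
sketch of reading a raw hash straight from a stream and tagging it with
its algorithm.  The struct, the constants and the helper read_raw_oid()
are simplified stand-ins invented for this example, not Git's real
definitions; the zero-padding step is borrowed from patch 08 and is not
part of this patch.

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for Git's definitions (assumptions for this sketch). */
#define GIT_MAX_RAWSZ 32
enum { GIT_HASH_UNKNOWN = 0, GIT_HASH_SHA1 = 1, GIT_HASH_SHA256 = 2 };

struct object_id {
	unsigned char hash[GIT_MAX_RAWSZ];
	int algo;
};

/*
 * Hypothetical helper: read one raw hash of rawsz bytes from `in`,
 * zero the unused tail of the buffer, and record the algorithm.
 * Returns 0 on success and -1 on a short read or an oversized rawsz.
 */
static int read_raw_oid(FILE *in, struct object_id *oid, size_t rawsz, int algo)
{
	if (rawsz > GIT_MAX_RAWSZ)
		return -1;
	if (fread(oid->hash, rawsz, 1, in) != 1)
		return -1;
	memset(oid->hash + rawsz, 0, GIT_MAX_RAWSZ - rawsz);
	/* fread() filled in only the bytes, so the tag must be set by hand. */
	oid->algo = algo;
	return 0;
}
```

A caller would invoke read_raw_oid() once per index entry, passing the
raw size and identifier of the repository's hash algorithm.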

* [PATCH 12/15] commit-graph: don't store file hashes as struct object_id
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
                   ` (10 preceding siblings ...)
  2021-04-10 15:21 ` [PATCH 11/15] builtin/show-index: set the algorithm for object IDs brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-10 15:21 ` [PATCH 13/15] builtin/pack-objects: avoid using struct object_id for pack hash brian m. carlson
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

The idea behind struct object_id is that it is supposed to represent the
identifier of a standard Git object or a special pseudo-object like the
all-zeros object ID.  In this case, we have file hashes, which, while
similar, are distinct from the identifiers of objects.

Switch these code paths to use an unsigned char array.  This is both
more logically consistent and it means that we need not set the
algorithm identifier for the struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 commit-graph.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 23fef56d31..2bcb4e0f89 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1793,8 +1793,8 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	struct lock_file lk = LOCK_INIT;
 	const unsigned hashsz = the_hash_algo->rawsz;
 	struct strbuf progress_title = STRBUF_INIT;
-	struct object_id file_hash;
 	struct chunkfile *cf;
+	unsigned char file_hash[GIT_MAX_RAWSZ];
 
 	if (ctx->split) {
 		struct strbuf tmp_file = STRBUF_INIT;
@@ -1909,7 +1909,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	}
 
 	close_commit_graph(ctx->r->objects);
-	finalize_hashfile(f, file_hash.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
+	finalize_hashfile(f, file_hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	free_chunkfile(cf);
 
 	if (ctx->split) {
@@ -1945,7 +1945,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 			unlink(graph_name);
 		}
 
-		ctx->commit_graph_hash_after[ctx->num_commit_graphs_after - 1] = xstrdup(oid_to_hex(&file_hash));
+		ctx->commit_graph_hash_after[ctx->num_commit_graphs_after - 1] = xstrdup(hash_to_hex(file_hash));
 		final_graph_name = get_split_graph_filename(ctx->odb,
 					ctx->commit_graph_hash_after[ctx->num_commit_graphs_after - 1]);
 		ctx->commit_graph_filenames_after[ctx->num_commit_graphs_after - 1] = final_graph_name;
@@ -2425,7 +2425,8 @@ static void graph_report(const char *fmt, ...)
 int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 {
 	uint32_t i, cur_fanout_pos = 0;
-	struct object_id prev_oid, cur_oid, checksum;
+	struct object_id prev_oid, cur_oid;
+	unsigned char checksum[GIT_MAX_RAWSZ];
 	int generation_zero = 0;
 	struct hashfile *f;
 	int devnull;
@@ -2444,8 +2445,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 	devnull = open("/dev/null", O_WRONLY);
 	f = hashfd(devnull, NULL);
 	hashwrite(f, g->data, g->data_len - g->hash_len);
-	finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
-	if (!hasheq(checksum.hash, g->data + g->data_len - g->hash_len)) {
+	finalize_hashfile(f, checksum, CSUM_CLOSE);
+	if (!hasheq(checksum, g->data + g->data_len - g->hash_len)) {
 		graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt"));
 		verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
 	}

* [PATCH 13/15] builtin/pack-objects: avoid using struct object_id for pack hash
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
                   ` (11 preceding siblings ...)
  2021-04-10 15:21 ` [PATCH 12/15] commit-graph: don't store file hashes as struct object_id brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-10 15:21 ` [PATCH 14/15] hex: default to the_hash_algo on zero algorithm value brian m. carlson
  2021-04-10 15:21 ` [PATCH 15/15] hex: print objects using the hash algorithm member brian m. carlson
  14 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

We use struct object_id for the names of objects.  It isn't intended to
be used for other hash values that don't name objects such as the pack
hash.

Because struct object_id will soon need to have its algorithm member
set, using it in this code path would mean that we didn't set that
member, only the hash member, which would result in a crash.  For both
of these reasons, switch to using an unsigned char array of size
GIT_MAX_RAWSZ.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/pack-objects.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 525c2d8552..5b25382204 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1030,7 +1030,7 @@ static void write_pack_file(void)
 	write_order = compute_write_order();
 
 	do {
-		struct object_id oid;
+		unsigned char hash[GIT_MAX_RAWSZ];
 		char *pack_tmp_name = NULL;
 
 		if (pack_to_stdout)
@@ -1059,13 +1059,13 @@ static void write_pack_file(void)
 		 * If so, rewrite it like in fast-import
 		 */
 		if (pack_to_stdout) {
-			finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE);
+			finalize_hashfile(f, hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE);
 		} else if (nr_written == nr_remaining) {
-			finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
+			finalize_hashfile(f, hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
 		} else {
-			int fd = finalize_hashfile(f, oid.hash, 0);
-			fixup_pack_header_footer(fd, oid.hash, pack_tmp_name,
-						 nr_written, oid.hash, offset);
+			int fd = finalize_hashfile(f, hash, 0);
+			fixup_pack_header_footer(fd, hash, pack_tmp_name,
+						 nr_written, hash, offset);
 			close(fd);
 			if (write_bitmap_index) {
 				if (write_bitmap_index != WRITE_BITMAP_QUIET)
@@ -1100,17 +1100,17 @@ static void write_pack_file(void)
 			strbuf_addf(&tmpname, "%s-", base_name);
 
 			if (write_bitmap_index) {
-				bitmap_writer_set_checksum(oid.hash);
+				bitmap_writer_set_checksum(hash);
 				bitmap_writer_build_type_index(
 					&to_pack, written_list, nr_written);
 			}
 
 			finish_tmp_packfile(&tmpname, pack_tmp_name,
 					    written_list, nr_written,
-					    &pack_idx_opts, oid.hash);
+					    &pack_idx_opts, hash);
 
 			if (write_bitmap_index) {
-				strbuf_addf(&tmpname, "%s.bitmap", oid_to_hex(&oid));
+				strbuf_addf(&tmpname, "%s.bitmap", hash_to_hex(hash));
 
 				stop_progress(&progress_state);
 
@@ -1124,7 +1124,7 @@ static void write_pack_file(void)
 
 			strbuf_release(&tmpname);
 			free(pack_tmp_name);
-			puts(oid_to_hex(&oid));
+			puts(hash_to_hex(hash));
 		}
 
 		/* mark written objects as written to previous pack */

* [PATCH 14/15] hex: default to the_hash_algo on zero algorithm value
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
                   ` (12 preceding siblings ...)
  2021-04-10 15:21 ` [PATCH 13/15] builtin/pack-objects: avoid using struct object_id for pack hash brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  2021-04-10 15:21 ` [PATCH 15/15] hex: print objects using the hash algorithm member brian m. carlson
  14 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

There are numerous places in the codebase where we assume we can
initialize data by zeroing all its bytes.  However, when we do that with
a struct object_id, it leaves the structure with a zero value for the
algorithm, which is invalid.

We could forbid this pattern and require that all struct object_id
instances be initialized using oidclr, but this seems burdensome and
it's unnatural to most C programmers.  Instead, if the algorithm is
zero, assume we wanted to use the default hash algorithm instead.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hex.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hex.c b/hex.c
index 5fa3e71cb9..43597b2dbb 100644
--- a/hex.c
+++ b/hex.c
@@ -124,6 +124,13 @@ char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
 	char *buf = buffer;
 	int i;
 
+	/*
+	 * Our struct object_id has been memset to 0, so default to printing
+	 * using the default hash.
+	 */
+	if (algop == &hash_algos[0])
+		algop = the_hash_algo;
+
 	for (i = 0; i < algop->rawsz; i++) {
 		unsigned int val = *hash++;
 		*buf++ = hex[val >> 4];
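
As a rough illustration of the fallback described in this patch, the
following self-contained sketch treats table index 0 as "unknown" and
substitutes a default algorithm before formatting.  The table, the
default_algo pointer and the function names are assumptions made for
this example; they only mimic the shape of hash_algos[] and
the_hash_algo.

```c
#include <stddef.h>

/* Minimal stand-in for struct git_hash_algo (assumption for this sketch). */
struct hash_algo {
	const char *name;
	size_t rawsz;
};

static const struct hash_algo hash_algos[] = {
	{ NULL, 0 },      /* index 0: unknown, what a memset-to-zero yields */
	{ "sha1", 20 },
	{ "sha256", 32 },
};

/* Plays the role of the_hash_algo; SHA-1 is chosen arbitrarily here. */
static const struct hash_algo *default_algo = &hash_algos[1];

/* Map the "unknown" entry to the default; leave real algorithms alone. */
static const struct hash_algo *effective_algo(const struct hash_algo *algop)
{
	return algop == &hash_algos[0] ? default_algo : algop;
}

/* Write the hex form of `hash` into `buffer` (needs 2 * rawsz + 1 bytes). */
static char *hash_to_hex_sketch(char *buffer, const unsigned char *hash,
				const struct hash_algo *algop)
{
	static const char hex[] = "0123456789abcdef";
	char *buf = buffer;
	size_t i;

	algop = effective_algo(algop);
	for (i = 0; i < algop->rawsz; i++) {
		unsigned int val = hash[i];
		*buf++ = hex[val >> 4];
		*buf++ = hex[val & 0xf];
	}
	*buf = '\0';
	return buffer;
}
```

Without the fallback, the unknown entry's raw size of zero would make
the loop emit an empty string for any zero-initialized object ID.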

* [PATCH 15/15] hex: print objects using the hash algorithm member
  2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
                   ` (13 preceding siblings ...)
  2021-04-10 15:21 ` [PATCH 14/15] hex: default to the_hash_algo on zero algorithm value brian m. carlson
@ 2021-04-10 15:21 ` brian m. carlson
  14 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-10 15:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

Now that all code paths correctly set the hash algorithm member of
struct object_id, write an object's hex representation using the hash
algorithm member embedded in it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hex.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hex.c b/hex.c
index 43597b2dbb..4537e79d3c 100644
--- a/hex.c
+++ b/hex.c
@@ -143,7 +143,7 @@ char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
 
 char *oid_to_hex_r(char *buffer, const struct object_id *oid)
 {
-	return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+	return hash_to_hex_algop_r(buffer, oid->hash, &hash_algos[oid->algo]);
 }
 
 char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
@@ -161,5 +161,5 @@ char *hash_to_hex(const unsigned char *hash)
 
 char *oid_to_hex(const struct object_id *oid)
 {
-	return hash_to_hex_algop(oid->hash, the_hash_algo);
+	return hash_to_hex_algop(oid->hash, &hash_algos[oid->algo]);
 }

* Re: [PATCH 08/15] cache: compare the entire buffer for struct object_id
  2021-04-10 15:21 ` [PATCH 08/15] cache: compare the entire buffer for " brian m. carlson
@ 2021-04-11  8:17   ` Chris Torek
  2021-04-11 11:36   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 57+ messages in thread
From: Chris Torek @ 2021-04-11  8:17 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git List, Derrick Stolee

Just an observation here: comparing all 256 bits every time
would seem to have one nice bonus side effect and one
potentially bad, but vanishingly unlikely, side effect: a
160-bit null hash will now compare equal to a 256-bit null
hash (good), but a 160-bit hash extended to 256 bits
will compare equal to a 256-bit hash that just happens to
end in 96 bits of zero (bad, but I would guess, will never
actually happen).

Chris
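
Both side effects can be reproduced with a small, self-contained
sketch.  It assumes a 32-byte buffer and compares only the hash bytes,
as the quoted patch does; load_padded() and full_eq() are hypothetical
names used here for illustration.

```c
#include <string.h>

#define MAX_RAWSZ 32   /* SHA-256 raw size in bytes */
#define SHA1_RAWSZ 20  /* SHA-1 raw size in bytes */

/* Copy rawsz hash bytes into a full-width buffer and zero the remainder. */
static void load_padded(unsigned char *dst, const unsigned char *src,
			size_t rawsz)
{
	memcpy(dst, src, rawsz);
	memset(dst + rawsz, 0, MAX_RAWSZ - rawsz);
}

/* Full-width equality: no branch on the hash size. */
static int full_eq(const unsigned char *a, const unsigned char *b)
{
	return !memcmp(a, b, MAX_RAWSZ);
}
```

With these helpers, a padded all-zero SHA-1 value equals an all-zero
SHA-256 value, and a padded SHA-1 value equals any SHA-256 value that
shares its first 20 bytes and ends in 12 zero bytes.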

On Sat, Apr 10, 2021 at 8:23 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Currently, when we compare two object IDs, we have to take a branch to
> determine what the hash size is supposed to be.  The compiler can
> optimize well for a single length, but has trouble when there are two
> possible lengths.
>
> There is, however, an alternative: we can ensure that we always compare
> the full length of the hash buffer, but in turn we must zero the
> remainder of the buffer when using SHA-1; otherwise, we'll end up with
> incompatible junk at the end of otherwise equivalent object IDs that
> will prevent them from matching.  This is an acceptable tradeoff,
> because we generally read an object ID in once, but then compare it
> against others multiple times.
>
> This latter approach also has some benefits as well: since we will have
> annotated every location in which we load an object ID into an instance
> of struct object_id, if we want to set the hash algorithm for the object
> ID, we can do so at the same time.
>
> Adopt this latter approach, since it provides us greater flexibility and
> lets us read and store object IDs for multiple algorithms at once.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  hash.h        | 13 ++++++++++---
>  hex.c         |  9 ++++++---
>  notes.c       |  3 +++
>  object-file.c |  1 +
>  4 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/hash.h b/hash.h
> index c8f03d8aee..04eba5c56b 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -205,7 +205,7 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>
>  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
>  {
> -       return hashcmp(oid1->hash, oid2->hash);
> +       return memcmp(oid1->hash, oid2->hash, GIT_MAX_RAWSZ);
>  }
>
>  static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
> @@ -221,7 +221,7 @@ static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
>
>  static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)
>  {
> -       return hasheq(oid1->hash, oid2->hash);
> +       return !memcmp(oid1->hash, oid2->hash, GIT_MAX_RAWSZ);
>  }
>
>  static inline int is_null_oid(const struct object_id *oid)
> @@ -258,7 +258,9 @@ static inline void oidclr(struct object_id *oid)
>
>  static inline void oidread(struct object_id *oid, const unsigned char *hash)
>  {
> -       memcpy(oid->hash, hash, the_hash_algo->rawsz);
> +       size_t rawsz = the_hash_algo->rawsz;
> +       memcpy(oid->hash, hash, rawsz);
> +       memset(oid->hash + rawsz, 0, GIT_MAX_RAWSZ - rawsz);
>  }
>
>  static inline int is_empty_blob_sha1(const unsigned char *sha1)
> @@ -281,6 +283,11 @@ static inline int is_empty_tree_oid(const struct object_id *oid)
>         return oideq(oid, the_hash_algo->empty_tree);
>  }
>
> +static inline void oid_pad_buffer(struct object_id *oid, const struct git_hash_algo *algop)
> +{
> +       memset(oid->hash + algop->rawsz, 0, GIT_MAX_RAWSZ - algop->rawsz);
> +}
> +
>  const char *empty_tree_oid_hex(void);
>  const char *empty_blob_oid_hex(void);
>
> diff --git a/hex.c b/hex.c
> index da51e64929..5fa3e71cb9 100644
> --- a/hex.c
> +++ b/hex.c
> @@ -69,7 +69,10 @@ int get_sha1_hex(const char *hex, unsigned char *sha1)
>  int get_oid_hex_algop(const char *hex, struct object_id *oid,
>                       const struct git_hash_algo *algop)
>  {
> -       return get_hash_hex_algop(hex, oid->hash, algop);
> +       int ret = get_hash_hex_algop(hex, oid->hash, algop);
> +       if (!ret)
> +               oid_pad_buffer(oid, algop);
> +       return ret;
>  }
>
>  /*
> @@ -80,7 +83,7 @@ int get_oid_hex_any(const char *hex, struct object_id *oid)
>  {
>         int i;
>         for (i = GIT_HASH_NALGOS - 1; i > 0; i--) {
> -               if (!get_hash_hex_algop(hex, oid->hash, &hash_algos[i]))
> +               if (!get_oid_hex_algop(hex, oid, &hash_algos[i]))
>                         return i;
>         }
>         return GIT_HASH_UNKNOWN;
> @@ -95,7 +98,7 @@ int parse_oid_hex_algop(const char *hex, struct object_id *oid,
>                         const char **end,
>                         const struct git_hash_algo *algop)
>  {
> -       int ret = get_hash_hex_algop(hex, oid->hash, algop);
> +       int ret = get_oid_hex_algop(hex, oid, algop);
>         if (!ret)
>                 *end = hex + algop->hexsz;
>         return ret;
> diff --git a/notes.c b/notes.c
> index a44b25858f..1dfe9e2b9f 100644
> --- a/notes.c
> +++ b/notes.c
> @@ -455,6 +455,8 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
>                 CALLOC_ARRAY(l, 1);
>                 oidcpy(&l->key_oid, &object_oid);
>                 oidcpy(&l->val_oid, &entry.oid);
> +               oid_pad_buffer(&l->key_oid, the_hash_algo);
> +               oid_pad_buffer(&l->val_oid, the_hash_algo);
>                 if (note_tree_insert(t, node, n, l, type,
>                                      combine_notes_concatenate))
>                         die("Failed to load %s %s into notes tree "
> @@ -484,6 +486,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
>                                 strbuf_addch(&non_note_path, '/');
>                         }
>                         strbuf_addstr(&non_note_path, entry.path);
> +                       oid_pad_buffer(&entry.oid, the_hash_algo);
>                         add_non_note(t, strbuf_detach(&non_note_path, NULL),
>                                      entry.mode, entry.oid.hash);
>                 }
> diff --git a/object-file.c b/object-file.c
> index 3f43c376e7..8e338247cc 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2352,6 +2352,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
>                 if (namelen == the_hash_algo->hexsz - 2 &&
>                     !hex_to_bytes(oid.hash + 1, de->d_name,
>                                   the_hash_algo->rawsz - 1)) {
> +                       oid_pad_buffer(&oid, the_hash_algo);
>                         if (obj_cb) {
>                                 r = obj_cb(&oid, path->buf, data);
>                                 if (r)

* Re: [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm
  2021-04-10 15:21 ` [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm brian m. carlson
@ 2021-04-11  8:52   ` Ævar Arnfjörð Bjarmason
  2021-04-11 21:07     ` brian m. carlson
  2021-04-16 15:21   ` Ævar Arnfjörð Bjarmason
  2021-04-16 17:27   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-11  8:52 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee


On Sat, Apr 10 2021, brian m. carlson wrote:

> +	algo = the_hash_algo;
> +	if (object_format) {
> +		if (flags & HASH_WRITE_OBJECT)
> +			errstr = "Can't use -w with --object-format";
> +		else {
> +			int id = hash_algo_by_name(object_format);
> +			if (id == GIT_HASH_UNKNOWN)
> +				errstr = "Unknown object format";

An established pattern, but shouldn't these be N_()'d while we're at it?
At least for new strings.

> +			else
> +				algo = &hash_algos[id];
> +		}
> +	}

Style nit: if .. {} else {} not if .. else {}.

> +test_expect_success '--literally with --object-format' '
> +	test $(test_oid --hash=sha1 hello) = $(git hash-object -t blob --literally --object-format=sha1 hello) &&
> +	test $(test_oid --hash=sha256 hello) = $(git hash-object -t blob --literally --object-format=sha256 hello)
> +'

This would be more readable and easier to debug with 4x tempfiles and 2x
test_cmp.

* Re: [PATCH 08/15] cache: compare the entire buffer for struct object_id
  2021-04-10 15:21 ` [PATCH 08/15] cache: compare the entire buffer for " brian m. carlson
  2021-04-11  8:17   ` Chris Torek
@ 2021-04-11 11:36   ` Ævar Arnfjörð Bjarmason
  2021-04-11 21:05     ` brian m. carlson
  1 sibling, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-11 11:36 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee


On Sat, Apr 10 2021, brian m. carlson wrote:

> Currently, when we compare two object IDs, we have to take a branch to
> determine what the hash size is supposed to be.  The compiler can
> optimize well for a single length, but has trouble when there are two
> possible lengths.

This would benefit from some performance/perf numbers. When this code
was first changed like this in 183a638b7da (hashcmp: assert constant
hash size, 2018-08-23) we had:

      Test     v2.18.0             v2.19.0-rc0               HEAD
      ------------------------------------------------------------------------------
      0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%

Then it was later modified in 0dab7129ab1 (cache: make hashcmp and
hasheq work with larger hashes, 2018-11-14).

> @@ -205,7 +205,7 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  
>  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
>  {
> -	return hashcmp(oid1->hash, oid2->hash);
> +	return memcmp(oid1->hash, oid2->hash, GIT_MAX_RAWSZ);
>  }

hashcmp is now:

        if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
                return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
        return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);

Wouldn't it make more sense to amend it to just be a memcmp
wrapper/macro if we're going to not make this conditional on the hash
algorithm, or are there other callsites where we still want the old way
of doing it?

>  
>  static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
> @@ -221,7 +221,7 @@ static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
>  
>  static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)
>  {
> -	return hasheq(oid1->hash, oid2->hash);
> +	return !memcmp(oid1->hash, oid2->hash, GIT_MAX_RAWSZ);
>  }

Ditto hasheq vs. !memcmp:

        if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
                return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
        return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);

* Re: [PATCH 03/15] cache: add an algo member to struct object_id
  2021-04-10 15:21 ` [PATCH 03/15] cache: add an algo member to struct object_id brian m. carlson
@ 2021-04-11 11:55   ` Ævar Arnfjörð Bjarmason
  2021-04-11 21:37     ` brian m. carlson
  2021-04-13 12:12   ` Derrick Stolee
  1 sibling, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-11 11:55 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee


On Sat, Apr 10 2021, brian m. carlson wrote:

> Now that we're working with multiple hash algorithms in the same repo,
> it's best if we label each object ID with its algorithm so we can
> determine how to format a given object ID. Add a member called algo to
> struct object_id.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  hash.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hash.h b/hash.h
> index 3fb0c3d400..dafdcb3335 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -181,6 +181,7 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
>  
>  struct object_id {
>  	unsigned char hash[GIT_MAX_RAWSZ];
> +	int algo;

A question out of curiosity, since I'm not nearly as familiar with the
multi-hash support as you are:

So struct object_id is GIT_MAX_RAWSZ, not two types of structs for
GIT_SHA1_RAWSZ and GIT_SHA256_RAWSZ. That pre-dates this series because
we'd like to not deal with two types of objects everywhere for SHA-1 and
SHA-256. Makes sense.

Before this series we'd memcmp them up to their actual length, but the
last GIT_MAX_RAWSZ-GIT_SHA1_RAWSZ bytes would be uninitialized.

Now we pad them out, so the last 96 bits of every SHA-1 are 0000...
Couldn't we also tell which hash an object uses by memcmp-ing those last
N bits and seeing if they're all zeroed?

Feels a bit hackish, and we'd need to reconsider that method if we'd
ever support other same-length hashes.
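
A minimal sketch of that tail check, assuming a 32-byte zero-padded
buffer; tail_is_zero() is a hypothetical name, not an existing Git
function.

```c
#include <string.h>

#define MAX_RAWSZ 32   /* SHA-256 raw size in bytes */
#define SHA1_RAWSZ 20  /* SHA-1 raw size in bytes */

/*
 * Return 1 if the bytes past the SHA-1 length are all zero.  This is
 * only a heuristic: the all-zero ID and a SHA-256 value that happens
 * to end in 96 zero bits would also pass.
 */
static int tail_is_zero(const unsigned char *hash)
{
	static const unsigned char zeros[MAX_RAWSZ - SHA1_RAWSZ];

	return !memcmp(hash + SHA1_RAWSZ, zeros, sizeof(zeros));
}
```

The check cannot say which algorithm an all-zero ID belongs to, which
is the case an explicit algorithm member covers.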

But OTOH having these objects all padded out in memory to the same
length, while still having to carry around a "what hash algo is it"
member, yields the arguably weird hack of having a per-hash NULL_OID,
which has never been an actual object of any hash type, just a
pseudo-object.

As another aside, I had some local patches (just for playing around) to
implement SHA-256/160, i.e. a SHA-256 truncated to SHA-1 length, which
doesn't officially exist. We'd store things as full-length SHA-256
internally, but anything that formats them (including plumbing output)
would emit the truncated version(s).

The idea was to support Git/SHA-256 when combined with legacy systems
that would all need DB column changes to store hashes of a different
length.

I abandoned it as insane silliness after playing with it for about a
day, but it did reveal that much of the hash code can now assume
internal length == formatting length, which is why I'm 3 paragraphs into
this digression, i.e. maybe some of the code structure also makes having
a NULL_OID always be 256 bits when we want to format it as 160/256
painful...

* Re: [PATCH 09/15] hash: set and copy algo field in struct object_id
  2021-04-10 15:21 ` [PATCH 09/15] hash: set and copy algo field in " brian m. carlson
@ 2021-04-11 11:57   ` Ævar Arnfjörð Bjarmason
  2021-04-11 21:48     ` brian m. carlson
  0 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-11 11:57 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee


On Sat, Apr 10 2021, brian m. carlson wrote:

>  const struct object_id null_oid;
>  static const struct object_id empty_tree_oid = {
> -	EMPTY_TREE_SHA1_BIN_LITERAL
> +	EMPTY_TREE_SHA1_BIN_LITERAL,
> +	GIT_HASH_SHA1,
>  };
>  static const struct object_id empty_blob_oid = {
> -	EMPTY_BLOB_SHA1_BIN_LITERAL
> +	EMPTY_BLOB_SHA1_BIN_LITERAL,
> +	GIT_HASH_SHA1,
>  };
>  static const struct object_id empty_tree_oid_sha256 = {
> -	EMPTY_TREE_SHA256_BIN_LITERAL
> +	EMPTY_TREE_SHA256_BIN_LITERAL,
> +	GIT_HASH_SHA256,
>  };
>  static const struct object_id empty_blob_oid_sha256 = {
> -	EMPTY_BLOB_SHA256_BIN_LITERAL
> +	EMPTY_BLOB_SHA256_BIN_LITERAL,
> +	GIT_HASH_SHA256,
>  };

In this and some other patches we're continuing to add new fields to
structs without using designated initializers.

Not a new problem at all, just a note that if you re-roll I for one
would very much appreciate starting by migrating over to that. It makes
for much easier reading in subsequent patches in this series, and in
future ones.
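
For illustration, this is roughly what such an initializer looks like
with C99 designated initializers.  The struct and constants below are
simplified stand-ins, and null_oid_sha256_sketch is a hypothetical
name; only the empty-tree SHA-1 value is the real one.

```c
/* Simplified stand-ins for Git's definitions (assumptions for this sketch). */
#define GIT_MAX_RAWSZ 32
enum { GIT_HASH_UNKNOWN = 0, GIT_HASH_SHA1 = 1, GIT_HASH_SHA256 = 2 };

struct object_id {
	unsigned char hash[GIT_MAX_RAWSZ];
	int algo;
};

/* SHA-1 of the empty tree: 4b825dc642cb6eb9a060e54bf8d69288fbee4904 */
#define EMPTY_TREE_SHA1_BIN_LITERAL \
	"\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \
	"\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"

/* Each member is named, so adding or reordering fields cannot shift values. */
static const struct object_id empty_tree_oid = {
	.hash = EMPTY_TREE_SHA1_BIN_LITERAL,
	.algo = GIT_HASH_SHA1,
};

/* Members that are not named are zero-initialized, so .hash is all zeros. */
static const struct object_id null_oid_sha256_sketch = {
	.algo = GIT_HASH_SHA256,
};
```

Compared with the positional form in the quoted patch, a later patch
that adds another member would not need to touch these initializers.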

* Re: [PATCH 10/15] hash: provide per-algorithm null OIDs
  2021-04-10 15:21 ` [PATCH 10/15] hash: provide per-algorithm null OIDs brian m. carlson
@ 2021-04-11 14:03   ` Junio C Hamano
  2021-04-11 21:51     ` brian m. carlson
  0 siblings, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2021-04-11 14:03 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> diff --git a/object-file.c b/object-file.c
> index 5f1fa05c4e..50bb5b6ca4 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -55,7 +55,6 @@
>  	"\x6f\xe1\x41\xf7\x74\x91\x20\xa3\x03\x72" \
>  	"\x18\x13"
>  
> -const struct object_id null_oid;
>  static const struct object_id empty_tree_oid = {
>  	EMPTY_TREE_SHA1_BIN_LITERAL,
>  	GIT_HASH_SHA1,
> @@ -64,6 +63,9 @@ static const struct object_id empty_blob_oid = {
>  	EMPTY_BLOB_SHA1_BIN_LITERAL,
>  	GIT_HASH_SHA1,
>  };
> +const struct object_id null_oid_sha1 = {
> +	{0}, GIT_HASH_SHA1,
> +};

sparse wants this to be a file-scope static.

* Re: [PATCH 08/15] cache: compare the entire buffer for struct object_id
  2021-04-11 11:36   ` Ævar Arnfjörð Bjarmason
@ 2021-04-11 21:05     ` brian m. carlson
  0 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-11 21:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Derrick Stolee

On 2021-04-11 at 11:36:33, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Apr 10 2021, brian m. carlson wrote:
> 
> > Currently, when we compare two object IDs, we have to take a branch to
> > determine what the hash size is supposed to be.  The compiler can
> > optimize well for a single length, but has trouble when there are two
> > possible lengths.
> 
> This would benefit from some performance/perf numbers. When this code
> was first changed like this in 183a638b7da (hashcmp: assert constant
> hash size, 2018-08-23) we had:
> 
>       Test     v2.18.0             v2.19.0-rc0               HEAD
>       ------------------------------------------------------------------------------
>       0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%
> 
> Then it was later modified in 0dab7129ab1 (cache: make hashcmp and
> hasheq work with larger hashes, 2018-11-14).

I can do some perf numbers.

> > @@ -205,7 +205,7 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
> >  
> >  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
> >  {
> > -	return hashcmp(oid1->hash, oid2->hash);
> > +	return memcmp(oid1->hash, oid2->hash, GIT_MAX_RAWSZ);
> >  }
> 
> hashcmp is now:
> 
>         if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
>                 return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
>         return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
> 
> Wouldn't it make more sense to amend it to just be a memcmp
> wrapper/macro if we're going to not make this conditional on the hash
> algorithm, or are there other callsites where we still want the old way
> of doing it?

No, we can't do that.  With oidcmp, we know the buffer is large enough.
However, in some cases, the buffer in hashcmp is not large enough.  For
example, we may be at the end of a SHA-1 tree object and we'd segfault.
I did try that and I quickly found that it was totally broken.
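
To make the distinction concrete, here is a minimal sketch, not the
actual git code.  The names sketch_oid, sketch_oidcmp, sketch_hashcmp and
the SKETCH_* constants are hypothetical stand-ins, assuming a struct that
always holds the maximum hash size and raw pointers that may have only
one hash length of valid bytes behind them.

```c
#include <string.h>

/* Illustrative constants; they mirror git's sizes but are redefined here. */
#define SKETCH_SHA1_RAWSZ 20
#define SKETCH_MAX_RAWSZ  32

/* Hypothetical stand-in for struct object_id: always SKETCH_MAX_RAWSZ bytes. */
struct sketch_oid {
	unsigned char hash[SKETCH_MAX_RAWSZ];
};

/* Both operands are full-size structs, so one fixed-length memcmp is safe. */
static int sketch_oidcmp(const struct sketch_oid *a, const struct sketch_oid *b)
{
	return memcmp(a->hash, b->hash, SKETCH_MAX_RAWSZ);
}

/*
 * A raw pointer may point into a buffer with only rawsz bytes left (for
 * example the last entry of a SHA-1 tree), so the length must follow the
 * algorithm instead of always being SKETCH_MAX_RAWSZ.
 */
static int sketch_hashcmp(const unsigned char *a, const unsigned char *b,
			  size_t rawsz)
{
	return memcmp(a, b, rawsz);
}
```

Calling sketch_hashcmp() with SKETCH_MAX_RAWSZ on a 20-byte tail would
read 12 bytes past the end of the buffer, which is the failure mode
described above.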
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US


* Re: [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm
  2021-04-11  8:52   ` Ævar Arnfjörð Bjarmason
@ 2021-04-11 21:07     ` brian m. carlson
  0 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-11 21:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Derrick Stolee

On 2021-04-11 at 08:52:02, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Apr 10 2021, brian m. carlson wrote:
> 
> > +	algo = the_hash_algo;
> > +	if (object_format) {
> > +		if (flags & HASH_WRITE_OBJECT)
> > +			errstr = "Can't use -w with --object-format";
> > +		else {
> > +			int id = hash_algo_by_name(object_format);
> > +			if (id == GIT_HASH_UNKNOWN)
> > +				errstr = "Unknown object format";
> 
> An established pattern, but shouldn't these be N_()'d while we're at it?
> At least for new strings.

Sure, I can do that.

> > +			else
> > +				algo = &hash_algos[id];
> > +		}
> > +	}
> 
> Style nit: if .. {} else {} not if .. else {}.

Will fix.

> > +test_expect_success '--literally with --object-format' '
> > +	test $(test_oid --hash=sha1 hello) = $(git hash-object -t blob --literally --object-format=sha1 hello) &&
> > +	test $(test_oid --hash=sha256 hello) = $(git hash-object -t blob --literally --object-format=sha256 hello)
> > +'
> 
> This would be more readable and easier to debug with 4x tempfiles and 2x
> test_cmp.

Okay, I can go for that.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US


* Re: [PATCH 03/15] cache: add an algo member to struct object_id
  2021-04-11 11:55   ` Ævar Arnfjörð Bjarmason
@ 2021-04-11 21:37     ` brian m. carlson
  0 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-11 21:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Derrick Stolee

On 2021-04-11 at 11:55:57, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Apr 10 2021, brian m. carlson wrote:
> 
> > Now that we're working with multiple hash algorithms in the same repo,
> > it's best if we label each object ID with its algorithm so we can
> > determine how to format a given object ID. Add a member called algo to
> > struct object_id.
> >
> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > ---
> >  hash.h | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hash.h b/hash.h
> > index 3fb0c3d400..dafdcb3335 100644
> > --- a/hash.h
> > +++ b/hash.h
> > @@ -181,6 +181,7 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
> >  
> >  struct object_id {
> >  	unsigned char hash[GIT_MAX_RAWSZ];
> > +	int algo;
> 
> Curiosity since I'm not as familiar as you with the multi-hash support
> by far:
> 
> So struct object_id is GIT_MAX_RAWSZ, not two types of structs for
> GIT_SHA1_RAWSZ and GIT_SHA256_RAWSZ. That pre-dates this series because
> we'd like to not deal with two types of objects everywhere for SHA-1 and
> SHA-256. Makes sense.
> 
> Before this series we'd memcmp them up to their actual length, but the
> last GIT_MAX_RAWSZ-GIT_SHA1_RAWSZ bytes would be uninitialized.
> 
> Now we pad them out, so the last 96 bits of every SHA1 are 0000...;
> Couldn't we also tell which hash an object is by memcmp-ing those last N
> bits and see if they're all zero'd?

That makes a lot of assumptions about the security of the hash
algorithm that I don't want to make here.  If anyone can ever find a
SHA-256 hash with trailing 96 bits zero, then they can confuse it with a
SHA-1 hash.  That means that our security level goes from 128 bits to 96
bits.  It's also a nonstandard construction.

More importantly, it results in the null OID being treated as a SHA-1
OID.  Because we do print the null OID in some cases, we're going to
break a lot of output formats if we print all the rest of the OIDs with
64 characters and then the null OID with 40.  That's to say nothing of
the problems in binary formats.

The reason we pad these objects is that our hashmaps are broken if we
don't.  I don't remember all the gory details, but it was obvious to me
that if they weren't consistently equal, things were going to be broken.
That's the only reason, not theoretical purity.

> Feels a bit hackish, and we'd need to reconsider that method if we'd
> ever support other same-length hashes.

My hope is that we don't need to do this, but we do have SHA-3 to serve
as a backup for SHA-2.  If quantum computers don't progress
substantially, SHA-3-256 is definitely a viable candidate for
replacement if anything ever happens to SHA-256.

> But OTOH having these objects all padded out in memory to the same
> length, but having to carry around a "what hash algo is it" field,
> yields the arguably weird hack of having a per-hash NULL_OID, which has
> never been an actual object of any hash type, but just a pseudo-object.

Unfortunately, as I mentioned above, we need to have two null OIDs to
handle printing things out.  It's inconvenient, I agree.

> I abandoned it as insane silliness after playing with it for about a
> day, but it did reveal that much of the hash code now can assume
> internal length == formatting length, which is why I'm 3 paragraphs into
> this digression, i.e. maybe some of the code structure also makes having
> a NULL_OID always be 256-bits when we want to format it as 160/256
> painful...

We'll always format based on the algorithm in the OID.  That's the
simplest way to make things work because unfortunately we may end up
with both types of OIDs in the same code paths (as we're converting one
to the other) and otherwise our printing functions need a lot of special
handling and even more variants than they have today.
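
As a rough illustration of formatting by the algorithm stored in the OID,
consider the sketch below.  It is not the code from this series:
sketch_oid, sketch_rawsz, sketch_oid_to_hex and the SKETCH_* values are
hypothetical names, assuming only that each OID carries an algo field and
a zero-padded hash buffer.

```c
#include <stddef.h>

/* Hypothetical algorithm identifiers for this sketch only. */
enum sketch_algo { SKETCH_SHA1 = 1, SKETCH_SHA256 = 2 };

/* Hypothetical stand-in for struct object_id with an algo member. */
struct sketch_oid {
	unsigned char hash[32];
	int algo;
};

/* Number of meaningful hash bytes for the given algorithm. */
static size_t sketch_rawsz(int algo)
{
	return algo == SKETCH_SHA256 ? 32 : 20;
}

/*
 * Write the hex form of oid into out, which must hold at least
 * 2 * rawsz + 1 bytes.  The length comes from oid->algo, not from the
 * buffer contents, so an all-zero OID prints as 40 or 64 characters
 * depending on which algorithm it is labeled with.
 */
static size_t sketch_oid_to_hex(const struct sketch_oid *oid, char *out)
{
	static const char hex[] = "0123456789abcdef";
	size_t i, n = sketch_rawsz(oid->algo);

	for (i = 0; i < n; i++) {
		out[2 * i] = hex[oid->hash[i] >> 4];
		out[2 * i + 1] = hex[oid->hash[i] & 0xf];
	}
	out[2 * n] = '\0';
	return 2 * n;
}
```

With this shape, two null OIDs that differ only in their algo field
produce output of different widths, which is why a single shared null OID
is not enough.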
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US


* Re: [PATCH 09/15] hash: set and copy algo field in struct object_id
  2021-04-11 11:57   ` Ævar Arnfjörð Bjarmason
@ 2021-04-11 21:48     ` brian m. carlson
  2021-04-11 22:12       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 57+ messages in thread
From: brian m. carlson @ 2021-04-11 21:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Derrick Stolee

On 2021-04-11 at 11:57:30, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Apr 10 2021, brian m. carlson wrote:
> 
> >  const struct object_id null_oid;
> >  static const struct object_id empty_tree_oid = {
> > -	EMPTY_TREE_SHA1_BIN_LITERAL
> > +	EMPTY_TREE_SHA1_BIN_LITERAL,
> > +	GIT_HASH_SHA1,
> >  };
> >  static const struct object_id empty_blob_oid = {
> > -	EMPTY_BLOB_SHA1_BIN_LITERAL
> > +	EMPTY_BLOB_SHA1_BIN_LITERAL,
> > +	GIT_HASH_SHA1,
> >  };
> >  static const struct object_id empty_tree_oid_sha256 = {
> > -	EMPTY_TREE_SHA256_BIN_LITERAL
> > +	EMPTY_TREE_SHA256_BIN_LITERAL,
> > +	GIT_HASH_SHA256,
> >  };
> >  static const struct object_id empty_blob_oid_sha256 = {
> > -	EMPTY_BLOB_SHA256_BIN_LITERAL
> > +	EMPTY_BLOB_SHA256_BIN_LITERAL,
> > +	GIT_HASH_SHA256,
> >  };
> 
> In this and some other patches we're continuing to add new fields to
> structs without using designated initializers.
> 
> Not a new problem at all, just a note that if you re-roll I for one
> would very much appreciate starting by migrating over to that. It makes
> for much easier reading in subsequent patches in this series, and in
> future ones.

I'm happy to do that.  I thought we were not allowed to use C99 features
because only recent versions of MSVC support modern C.  I was previously
under the impression that MSVC didn't support anything but C89, but they
now support C11 and C17 in their latest release[0], much to my surprise.

If we're willing to require C99 features, then I'm happy to add those.
I'll also send a follow-up series to require C99 support, which I think
is overdue considering the standard is 22 years old.

[0] https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-support-arriving-in-msvc/
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US


* Re: [PATCH 10/15] hash: provide per-algorithm null OIDs
  2021-04-11 14:03   ` Junio C Hamano
@ 2021-04-11 21:51     ` brian m. carlson
  0 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-11 21:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Derrick Stolee

On 2021-04-11 at 14:03:05, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > diff --git a/object-file.c b/object-file.c
> > index 5f1fa05c4e..50bb5b6ca4 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -55,7 +55,6 @@
> >  	"\x6f\xe1\x41\xf7\x74\x91\x20\xa3\x03\x72" \
> >  	"\x18\x13"
> >  
> > -const struct object_id null_oid;
> >  static const struct object_id empty_tree_oid = {
> >  	EMPTY_TREE_SHA1_BIN_LITERAL,
> >  	GIT_HASH_SHA1,
> > @@ -64,6 +63,9 @@ static const struct object_id empty_blob_oid = {
> >  	EMPTY_BLOB_SHA1_BIN_LITERAL,
> >  	GIT_HASH_SHA1,
> >  };
> > +const struct object_id null_oid_sha1 = {
> > +	{0}, GIT_HASH_SHA1,
> > +};
> 
> sparse wants this to be a file-scope static.

Can do.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US


* Re: [PATCH 09/15] hash: set and copy algo field in struct object_id
  2021-04-11 21:48     ` brian m. carlson
@ 2021-04-11 22:12       ` Ævar Arnfjörð Bjarmason
  2021-04-11 23:52         ` brian m. carlson
  2021-04-12 10:53         ` [PATCH 09/15] hash: set and copy algo field in struct object_id Junio C Hamano
  0 siblings, 2 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-11 22:12 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee


On Sun, Apr 11 2021, brian m. carlson wrote:

> On 2021-04-11 at 11:57:30, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Sat, Apr 10 2021, brian m. carlson wrote:
>> 
>> >  const struct object_id null_oid;
>> >  static const struct object_id empty_tree_oid = {
>> > -	EMPTY_TREE_SHA1_BIN_LITERAL
>> > +	EMPTY_TREE_SHA1_BIN_LITERAL,
>> > +	GIT_HASH_SHA1,
>> >  };
>> >  static const struct object_id empty_blob_oid = {
>> > -	EMPTY_BLOB_SHA1_BIN_LITERAL
>> > +	EMPTY_BLOB_SHA1_BIN_LITERAL,
>> > +	GIT_HASH_SHA1,
>> >  };
>> >  static const struct object_id empty_tree_oid_sha256 = {
>> > -	EMPTY_TREE_SHA256_BIN_LITERAL
>> > +	EMPTY_TREE_SHA256_BIN_LITERAL,
>> > +	GIT_HASH_SHA256,
>> >  };
>> >  static const struct object_id empty_blob_oid_sha256 = {
>> > -	EMPTY_BLOB_SHA256_BIN_LITERAL
>> > +	EMPTY_BLOB_SHA256_BIN_LITERAL,
>> > +	GIT_HASH_SHA256,
>> >  };
>> 
>> In this and some other patches we're continuing to add new fields to
>> structs without using designated initializers.
>> 
>> Not a new problem at all, just a note that if you re-roll I for one
>> would very much appreciate starting by migrating over to that. It makes
>> for much easier reading in subsequent patches in this series, and in
>> future ones.
>
> I'm happy to do that.  I thought we were not allowed to use C99 features
> because only recent versions of MSVC support modern C.  I was previously
> under the impression that MSVC didn't support anything but C89, but they
> now support C11 and C17 in their latest release[0], much to my surprise.
>
> If we're willing to require C99 features, then I'm happy to add those.
> I'll also send a follow-up series to require C99 support, which I think
> is overdue considering the standard is 22 years old.
>
> [0] https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-support-arriving-in-msvc/

I don't think we can in general require C99, e.g. I found just the other
day that our CI's MSVC will fail on %zu (to print size_t without %lu & a
cast).

But we can use some subset of C99 features, and happily designated
initializers is one of those, see cbc0f81d96f (strbuf: use designated
initializers in STRBUF_INIT, 2017-07-10). It's been used all over the
place since then.

See e.g.: git grep -P '^\s+\.\S+ = ' -- '*.[ch]'
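
For reference, a sketch of what designated initializers could look like
for these constants.  This is an illustration only: demo_oid,
demo_null_oid() and the DEMO_HASH_* values are hypothetical, and the
selector taking an explicit algo argument is an assumption made to keep
the example self-contained.

```c
/* Hypothetical algorithm identifiers for this sketch only. */
enum { DEMO_HASH_SHA1 = 1, DEMO_HASH_SHA256 = 2 };

/* Hypothetical stand-in for struct object_id. */
struct demo_oid {
	unsigned char hash[32];
	int algo;
};

/*
 * Naming the member makes the intent visible and keeps the initializer
 * correct if members are added or reordered.  Members that are not
 * named (here: hash) are zero-initialized, as C99 requires.
 */
static const struct demo_oid demo_null_oid_sha1 = {
	.algo = DEMO_HASH_SHA1,
};
static const struct demo_oid demo_null_oid_sha256 = {
	.algo = DEMO_HASH_SHA256,
};

/* Return the all-zero OID labeled with the requested algorithm. */
static const struct demo_oid *demo_null_oid(int algo)
{
	return algo == DEMO_HASH_SHA256 ? &demo_null_oid_sha256
					: &demo_null_oid_sha1;
}
```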

* Re: [PATCH 09/15] hash: set and copy algo field in struct object_id
  2021-04-11 22:12       ` Ævar Arnfjörð Bjarmason
@ 2021-04-11 23:52         ` brian m. carlson
  2021-04-12 11:02           ` [PATCH 0/2] C99: harder dependency on variadic macros Ævar Arnfjörð Bjarmason
  2021-04-12 10:53         ` [PATCH 09/15] hash: set and copy algo field in struct object_id Junio C Hamano
  1 sibling, 1 reply; 57+ messages in thread
From: brian m. carlson @ 2021-04-11 23:52 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Derrick Stolee

On 2021-04-11 at 22:12:38, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sun, Apr 11 2021, brian m. carlson wrote:
> 
> > On 2021-04-11 at 11:57:30, Ævar Arnfjörð Bjarmason wrote:
> >> 
> >> In this and some other patches we're continuing to add new fields to
> >> structs without using designated initializers.
> >> 
> >> Not a new problem at all, just a note that if you re-roll I for one
> >> would very much appreciate starting by migrating over to that. It makes
> >> for much easier reading in subsequent patches in this series, and in
> >> future ones.
> >
> > I'm happy to do that.  I thought we were not allowed to use C99 features
> > because only recent versions of MSVC support modern C.  I was previously
> > under the impression that MSVC didn't support anything but C89, but they
> > now support C11 and C17 in their latest release[0], much to my surprise.
> >
> > If we're willing to require C99 features, then I'm happy to add those.
> > I'll also send a follow-up series to require C99 support, which I think
> > is overdue considering the standard is 22 years old.
> >
> > [0] https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-support-arriving-in-msvc/
> 
> I don't think we can in general require C99, e.g. I found just the other
> day that our CI's MSVC will fail on %zu (to print size_t without %lu & a
> cast).

That's a shame.  I think I'd like to try, though, and ask people to
upgrade MSVC to a suitable version if we're going to continue to support
it.  It's not like there aren't alternatives.  So I'm going to send out
that series anyway, I think.  That's independent of this series, though,
so I'll add the designated initializers in v2.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US


* Re: [PATCH 09/15] hash: set and copy algo field in struct object_id
  2021-04-11 22:12       ` Ævar Arnfjörð Bjarmason
  2021-04-11 23:52         ` brian m. carlson
@ 2021-04-12 10:53         ` Junio C Hamano
  2021-04-12 11:13           ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2021-04-12 10:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: brian m. carlson, git, Derrick Stolee

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> But we can use some subset of C99 features, and happily designated
> initializers is one of those, see cbc0f81d96f (strbuf: use designated
> initializers in STRBUF_INIT, 2017-07-10). It's been used all over the
> place since then.

Good advice to cite a commit that on purpose used a feature and
documented that it is allowed.

Also see Documentation/CodingGuidelines ;-)  The document should
give the authoritative blessing for features allowed to be used (add
any missing with a proposed patch).

Thanks.


* [PATCH 0/2] C99: harder dependency on variadic macros
  2021-04-11 23:52         ` brian m. carlson
@ 2021-04-12 11:02           ` Ævar Arnfjörð Bjarmason
  2021-04-12 11:02             ` [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code Ævar Arnfjörð Bjarmason
                               ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 11:02 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, brian m . carlson, Jeff King,
	Ævar Arnfjörð Bjarmason

Since [1], which has been out since v2.31.0, we've had a hard dependency
on variadic macros.

Removing the relevant always-off-unless-you-monkeypatch-the-source
code may be too aggressive for Junio's "Let's give it enough time"[2].

But I'm submitting this because of brian m. carlson's note[3] about
wanting to submit more general patches for declaring a hard dependency
on all of C99.

Whatever anyone thinks of that, this harder dependency on C99 variadic
macros would be a subset of such a change, so it makes sense to
consider it first. Let's see if anyone has an issue with this landing
before brian's suggested larger change.
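
For readers unfamiliar with the pattern, the kind of variadic macro this
is about can be sketched as follows.  This is a simplified illustration
modeled on the BUG()/BUG_fl() pairing, not code from the tree:
demo_trace() and demo_trace_fl() are hypothetical names, and writing into
a caller-supplied buffer is an assumption made to keep the example small.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Format "file:line " followed by the caller's message into out.
 * Returns the length the full message needs, or -1 if the prefix does
 * not fit or formatting fails.
 */
static int demo_trace_fl(char *out, size_t outsz, const char *file, int line,
			 const char *fmt, ...)
{
	va_list ap;
	int n, m;

	n = snprintf(out, outsz, "%s:%d ", file, line);
	if (n < 0 || (size_t)n >= outsz)
		return -1;

	va_start(ap, fmt);
	m = vsnprintf(out + n, outsz - (size_t)n, fmt, ap);
	va_end(ap);
	return m < 0 ? -1 : n + m;
}

/*
 * The format string is part of __VA_ARGS__, so a call with no extra
 * arguments still passes at least one variadic argument, which is what
 * standard C99 requires.  No compiler extension is needed.
 */
#define demo_trace(out, outsz, ...) \
	demo_trace_fl((out), (outsz), __FILE__, __LINE__, __VA_ARGS__)
```

Without variadic macros, the call site cannot supply __FILE__ and
__LINE__ implicitly, which is why the fallback code paths pass NULL and 0
instead.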

1. https://lore.kernel.org/git/YBJLgY+CWtS9TeVb@coredump.intra.peff.net/
2. https://lore.kernel.org/git/xmqq5z3hy4fq.fsf@gitster.c.googlers.com/
3. https://lore.kernel.org/git/YHOLo36MfuTj6YeD@camp.crustytoothpaste.net/

Ævar Arnfjörð Bjarmason (2):
  git-compat-util.h: clarify comment on GCC-specific code
  C99 support: remove non-HAVE_VARIADIC_MACROS code

 Documentation/CodingGuidelines |  3 ++
 banned.h                       |  5 ---
 git-compat-util.h              | 25 +++++-------
 trace.c                        | 73 ----------------------------------
 trace.h                        | 62 -----------------------------
 trace2.c                       | 39 ------------------
 trace2.h                       | 25 ------------
 usage.c                        | 10 -----
 8 files changed, 12 insertions(+), 230 deletions(-)

-- 
2.31.1.631.gb80e078001e


* [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code
  2021-04-12 11:02           ` [PATCH 0/2] C99: harder dependency on variadic macros Ævar Arnfjörð Bjarmason
@ 2021-04-12 11:02             ` Ævar Arnfjörð Bjarmason
  2021-04-13  7:57               ` Jeff King
  2021-05-21  2:06               ` Jonathan Nieder
  2021-04-12 11:02             ` [PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
  2021-04-12 12:14             ` [PATCH 0/2] C99: harder dependency on variadic macros Bagas Sanjaya
  2 siblings, 2 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 11:02 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, brian m . carlson, Jeff King,
	Ævar Arnfjörð Bjarmason

Change a comment added in e208f9cc757 (make error()'s constant return
value more visible, 2012-12-15) to note that the code doesn't only
depend on variadic macros, which have been a hard dependency since
765dc168882 (git-compat-util: always enable variadic macros,
2021-01-28), but also on GCC's handling of __VA_ARGS__. The commit
message for e208f9cc757 made this clear, but the comment it added did
not.

See also e05bed960d3 (trace: add 'file:line' to all trace output,
2014-07-12) for another comment about GNUC's handling of __VA_ARGS__.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 git-compat-util.h | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 9ddf9d7044b..540aba22a4d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -480,10 +480,15 @@ void warning_errno(const char *err, ...) __attribute__((format (printf, 1, 2)));
 
 /*
  * Let callers be aware of the constant return value; this can help
- * gcc with -Wuninitialized analysis. We restrict this trick to gcc, though,
- * because some compilers may not support variadic macros. Since we're only
- * trying to help gcc, anyway, it's OK; other compilers will fall back to
- * using the function as usual.
+ * gcc with -Wuninitialized analysis.
+ *
+ * We restrict this trick to gcc, though, because while we rely on the
+ * presence of C99 variadic macros, this code also relies on the
+ * non-standard behavior of GCC's __VA_ARGS__, allowing error() to
+ * work even if no format specifiers are passed to error().
+ *
+ * Since we're only trying to help gcc, anyway, it's OK; other
+ * compilers will fall back to using the function as usual.
  */
 #if defined(__GNUC__)
 static inline int const_error(void)
-- 
2.31.1.631.gb80e078001e


* [PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code
  2021-04-12 11:02           ` [PATCH 0/2] C99: harder dependency on variadic macros Ævar Arnfjörð Bjarmason
  2021-04-12 11:02             ` [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code Ævar Arnfjörð Bjarmason
@ 2021-04-12 11:02             ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:58               ` Junio C Hamano
  2021-05-21  2:50               ` Jonathan Nieder
  2021-04-12 12:14             ` [PATCH 0/2] C99: harder dependency on variadic macros Bagas Sanjaya
  2 siblings, 2 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 11:02 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, brian m . carlson, Jeff King,
	Ævar Arnfjörð Bjarmason

Remove code that depends on HAVE_VARIADIC_MACROS not being set. Since
765dc168882 (git-compat-util: always enable variadic macros,
2021-01-28) we've unconditionally defined it to be true, and that
change went out with v2.31.0. This should have given packagers enough
time to discover whether variadic macros were an issue.

It seems that they weren't, so let's update the coding guidelines and
remove all the fallback code for the non-HAVE_VARIADIC_MACROS case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/CodingGuidelines |  3 ++
 banned.h                       |  5 ---
 git-compat-util.h              | 12 ------
 trace.c                        | 73 ----------------------------------
 trace.h                        | 62 -----------------------------
 trace2.c                       | 39 ------------------
 trace2.h                       | 25 ------------
 usage.c                        | 10 -----
 8 files changed, 3 insertions(+), 226 deletions(-)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 45465bc0c98..7eafb1758e6 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -205,6 +205,9 @@ For C programs:
    . since mid 2017 with 512f41cf, we have been using designated
      initializers for array (e.g. "int array[10] = { [5] = 2 }").
 
+   . since early 2021 with 765dc168882, we have been using variadic
+     macros, mostly for printf-like trace and debug macros.
+
    These used to be forbidden, but we have not heard any breakage
    report, and they are assumed to be safe.
 
diff --git a/banned.h b/banned.h
index 7ab4f2e4921..6ccf46bc197 100644
--- a/banned.h
+++ b/banned.h
@@ -21,13 +21,8 @@
 
 #undef sprintf
 #undef vsprintf
-#ifdef HAVE_VARIADIC_MACROS
 #define sprintf(...) BANNED(sprintf)
 #define vsprintf(...) BANNED(vsprintf)
-#else
-#define sprintf(buf,fmt,arg) BANNED(sprintf)
-#define vsprintf(buf,fmt,arg) BANNED(vsprintf)
-#endif
 
 #undef gmtime
 #define gmtime(t) BANNED(gmtime)
diff --git a/git-compat-util.h b/git-compat-util.h
index 540aba22a4d..da7ab91335f 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1192,24 +1192,12 @@ static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size,
 #endif
 #endif
 
-/*
- * This is always defined as a first step towards making the use of variadic
- * macros unconditional. If it causes compilation problems on your platform,
- * please report it to the Git mailing list at git@vger.kernel.org.
- */
-#define HAVE_VARIADIC_MACROS 1
-
 /* usage.c: only to be used for testing BUG() implementation (see test-tool) */
 extern int BUG_exit_code;
 
-#ifdef HAVE_VARIADIC_MACROS
 __attribute__((format (printf, 3, 4))) NORETURN
 void BUG_fl(const char *file, int line, const char *fmt, ...);
 #define BUG(...) BUG_fl(__FILE__, __LINE__, __VA_ARGS__)
-#else
-__attribute__((format (printf, 1, 2))) NORETURN
-void BUG(const char *fmt, ...);
-#endif
 
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
diff --git a/trace.c b/trace.c
index f726686fd92..43173301f59 100644
--- a/trace.c
+++ b/trace.c
@@ -111,13 +111,11 @@ static int prepare_trace_line(const char *file, int line,
 	strbuf_addf(buf, "%02d:%02d:%02d.%06ld ", tm.tm_hour, tm.tm_min,
 		    tm.tm_sec, (long) tv.tv_usec);
 
-#ifdef HAVE_VARIADIC_MACROS
 	/* print file:line */
 	strbuf_addf(buf, "%s:%d ", file, line);
 	/* align trace output (column 40 catches most files names in git) */
 	while (buf->len < 40)
 		strbuf_addch(buf, ' ');
-#endif
 
 	return 1;
 }
@@ -229,74 +227,6 @@ static void trace_performance_vprintf_fl(const char *file, int line,
 	strbuf_release(&buf);
 }
 
-#ifndef HAVE_VARIADIC_MACROS
-
-void trace_printf(const char *format, ...)
-{
-	va_list ap;
-	va_start(ap, format);
-	trace_vprintf_fl(NULL, 0, &trace_default_key, format, ap);
-	va_end(ap);
-}
-
-void trace_printf_key(struct trace_key *key, const char *format, ...)
-{
-	va_list ap;
-	va_start(ap, format);
-	trace_vprintf_fl(NULL, 0, key, format, ap);
-	va_end(ap);
-}
-
-void trace_argv_printf(const char **argv, const char *format, ...)
-{
-	va_list ap;
-	va_start(ap, format);
-	trace_argv_vprintf_fl(NULL, 0, argv, format, ap);
-	va_end(ap);
-}
-
-void trace_strbuf(struct trace_key *key, const struct strbuf *data)
-{
-	trace_strbuf_fl(NULL, 0, key, data);
-}
-
-void trace_performance(uint64_t nanos, const char *format, ...)
-{
-	va_list ap;
-	va_start(ap, format);
-	trace_performance_vprintf_fl(NULL, 0, nanos, format, ap);
-	va_end(ap);
-}
-
-void trace_performance_since(uint64_t start, const char *format, ...)
-{
-	va_list ap;
-	va_start(ap, format);
-	trace_performance_vprintf_fl(NULL, 0, getnanotime() - start,
-				     format, ap);
-	va_end(ap);
-}
-
-void trace_performance_leave(const char *format, ...)
-{
-	va_list ap;
-	uint64_t since;
-
-	if (perf_indent)
-		perf_indent--;
-
-	if (!format) /* Allow callers to leave without tracing anything */
-		return;
-
-	since = perf_start_times[perf_indent];
-	va_start(ap, format);
-	trace_performance_vprintf_fl(NULL, 0, getnanotime() - since,
-				     format, ap);
-	va_end(ap);
-}
-
-#else
-
 void trace_printf_key_fl(const char *file, int line, struct trace_key *key,
 			 const char *format, ...)
 {
@@ -342,9 +272,6 @@ void trace_performance_leave_fl(const char *file, int line,
 	va_end(ap);
 }
 
-#endif /* HAVE_VARIADIC_MACROS */
-
-
 static const char *quote_crnl(const char *path)
 {
 	static struct strbuf new_path = STRBUF_INIT;
diff --git a/trace.h b/trace.h
index 0dbbad0e41c..c6b3f6ce889 100644
--- a/trace.h
+++ b/trace.h
@@ -126,66 +126,6 @@ void trace_command_performance(const char **argv);
 void trace_verbatim(struct trace_key *key, const void *buf, unsigned len);
 uint64_t trace_performance_enter(void);
 
-#ifndef HAVE_VARIADIC_MACROS
-
-/**
- * Prints a formatted message, similar to printf.
- */
-__attribute__((format (printf, 1, 2)))
-void trace_printf(const char *format, ...);
-
-__attribute__((format (printf, 2, 3)))
-void trace_printf_key(struct trace_key *key, const char *format, ...);
-
-/**
- * Prints a formatted message, followed by a quoted list of arguments.
- */
-__attribute__((format (printf, 2, 3)))
-void trace_argv_printf(const char **argv, const char *format, ...);
-
-/**
- * Prints the strbuf, without additional formatting (i.e. doesn't
- * choke on `%` or even `\0`).
- */
-void trace_strbuf(struct trace_key *key, const struct strbuf *data);
-
-/**
- * Prints elapsed time (in nanoseconds) if GIT_TRACE_PERFORMANCE is enabled.
- *
- * Example:
- * ------------
- * uint64_t t = 0;
- * for (;;) {
- * 	// ignore
- * t -= getnanotime();
- * // code section to measure
- * t += getnanotime();
- * // ignore
- * }
- * trace_performance(t, "frotz");
- * ------------
- */
-__attribute__((format (printf, 2, 3)))
-void trace_performance(uint64_t nanos, const char *format, ...);
-
-/**
- * Prints elapsed time since 'start' if GIT_TRACE_PERFORMANCE is enabled.
- *
- * Example:
- * ------------
- * uint64_t start = getnanotime();
- * // code section to measure
- * trace_performance_since(start, "foobar");
- * ------------
- */
-__attribute__((format (printf, 2, 3)))
-void trace_performance_since(uint64_t start, const char *format, ...);
-
-__attribute__((format (printf, 1, 2)))
-void trace_performance_leave(const char *format, ...);
-
-#else
-
 /*
  * Macros to add file:line - see above for C-style declarations of how these
  * should be used.
@@ -285,6 +225,4 @@ static inline int trace_pass_fl(struct trace_key *key)
 	return key->fd || !key->initialized;
 }
 
-#endif /* HAVE_VARIADIC_MACROS */
-
 #endif /* TRACE_H */
diff --git a/trace2.c b/trace2.c
index 256120c7fd5..51d0e6cbd5e 100644
--- a/trace2.c
+++ b/trace2.c
@@ -597,20 +597,6 @@ void trace2_region_enter_printf_fl(const char *file, int line,
 	va_end(ap);
 }
 
-#ifndef HAVE_VARIADIC_MACROS
-void trace2_region_enter_printf(const char *category, const char *label,
-				const struct repository *repo, const char *fmt,
-				...)
-{
-	va_list ap;
-
-	va_start(ap, fmt);
-	trace2_region_enter_printf_va_fl(NULL, 0, category, label, repo, fmt,
-					 ap);
-	va_end(ap);
-}
-#endif
-
 void trace2_region_leave_printf_va_fl(const char *file, int line,
 				      const char *category, const char *label,
 				      const struct repository *repo,
@@ -673,20 +659,6 @@ void trace2_region_leave_printf_fl(const char *file, int line,
 	va_end(ap);
 }
 
-#ifndef HAVE_VARIADIC_MACROS
-void trace2_region_leave_printf(const char *category, const char *label,
-				const struct repository *repo, const char *fmt,
-				...)
-{
-	va_list ap;
-
-	va_start(ap, fmt);
-	trace2_region_leave_printf_va_fl(NULL, 0, category, label, repo, fmt,
-					 ap);
-	va_end(ap);
-}
-#endif
-
 void trace2_data_string_fl(const char *file, int line, const char *category,
 			   const struct repository *repo, const char *key,
 			   const char *value)
@@ -782,17 +754,6 @@ void trace2_printf_fl(const char *file, int line, const char *fmt, ...)
 	va_end(ap);
 }
 
-#ifndef HAVE_VARIADIC_MACROS
-void trace2_printf(const char *fmt, ...)
-{
-	va_list ap;
-
-	va_start(ap, fmt);
-	trace2_printf_va_fl(NULL, 0, fmt, ap);
-	va_end(ap);
-}
-#endif
-
 const char *trace2_session_id(void)
 {
 	return tr2_sid_get();
diff --git a/trace2.h b/trace2.h
index ede18c2e063..5d85826f23d 100644
--- a/trace2.h
+++ b/trace2.h
@@ -362,18 +362,9 @@ void trace2_region_enter_printf_fl(const char *file, int line,
 				   const struct repository *repo,
 				   const char *fmt, ...);
 
-#ifdef HAVE_VARIADIC_MACROS
 #define trace2_region_enter_printf(category, label, repo, ...)                 \
 	trace2_region_enter_printf_fl(__FILE__, __LINE__, (category), (label), \
 				      (repo), __VA_ARGS__)
-#else
-/* clang-format off */
-__attribute__((format (region_enter_printf, 4, 5)))
-void trace2_region_enter_printf(const char *category, const char *label,
-				const struct repository *repo, const char *fmt,
-				...);
-/* clang-format on */
-#endif
 
 /**
  * Emit a 'region_leave' event for <category>.<label> with optional
@@ -407,18 +398,9 @@ void trace2_region_leave_printf_fl(const char *file, int line,
 				   const struct repository *repo,
 				   const char *fmt, ...);
 
-#ifdef HAVE_VARIADIC_MACROS
 #define trace2_region_leave_printf(category, label, repo, ...)                 \
 	trace2_region_leave_printf_fl(__FILE__, __LINE__, (category), (label), \
 				      (repo), __VA_ARGS__)
-#else
-/* clang-format off */
-__attribute__((format (region_leave_printf, 4, 5)))
-void trace2_region_leave_printf(const char *category, const char *label,
-				const struct repository *repo, const char *fmt,
-				...);
-/* clang-format on */
-#endif
 
 /**
  * Emit a key-value pair 'data' event of the form <category>.<key> = <value>.
@@ -471,14 +453,7 @@ void trace2_printf_va_fl(const char *file, int line, const char *fmt,
 
 void trace2_printf_fl(const char *file, int line, const char *fmt, ...);
 
-#ifdef HAVE_VARIADIC_MACROS
 #define trace2_printf(...) trace2_printf_fl(__FILE__, __LINE__, __VA_ARGS__)
-#else
-/* clang-format off */
-__attribute__((format (printf, 1, 2)))
-void trace2_printf(const char *fmt, ...);
-/* clang-format on */
-#endif
 
 /*
  * Optional platform-specific code to dump information about the
diff --git a/usage.c b/usage.c
index 1b206de36d6..c8022c517e2 100644
--- a/usage.c
+++ b/usage.c
@@ -290,7 +290,6 @@ static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, va_lis
 	abort();
 }
 
-#ifdef HAVE_VARIADIC_MACROS
 NORETURN void BUG_fl(const char *file, int line, const char *fmt, ...)
 {
 	va_list ap;
@@ -298,15 +297,6 @@ NORETURN void BUG_fl(const char *file, int line, const char *fmt, ...)
 	BUG_vfl(file, line, fmt, ap);
 	va_end(ap);
 }
-#else
-NORETURN void BUG(const char *fmt, ...)
-{
-	va_list ap;
-	va_start(ap, fmt);
-	BUG_vfl(NULL, 0, fmt, ap);
-	va_end(ap);
-}
-#endif
 
 #ifdef SUPPRESS_ANNOTATED_LEAKS
 void unleak_memory(const void *ptr, size_t len)
-- 
2.31.1.631.gb80e078001e


* Re: [PATCH 09/15] hash: set and copy algo field in struct object_id
  2021-04-12 10:53         ` [PATCH 09/15] hash: set and copy algo field in struct object_id Junio C Hamano
@ 2021-04-12 11:13           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 11:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: brian m. carlson, git, Derrick Stolee, Jeff King


On Mon, Apr 12 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> But we can use some subset of C99 features, and happily designated
>> initializers is one of those, see cbc0f81d96f (strbuf: use designated
>> initializers in STRBUF_INIT, 2017-07-10). It's been used all over the
>> place since then.
>
> Good advice to cite a commit that on purpose used a feature and
> documented that it is allowed.
>
> Also see Documentation/CodingGuidelines ;-)  The document should
> give the authoritative blessing for features allowed to be used (add
> any missing with a proposed patch).

Our e-mails probably crossed. My initial motivation for the just-submitted
http://lore.kernel.org/git/cover-0.2-00000000000-20210412T105422Z-avarab@gmail.com
was going to CodingGuidelines, vaguely remembering that there was
some other C99 thing that wasn't listed there, and then (re-)discovering
the recent variadic macro commit from Jeff King.

As noted there, maybe 2/2 of that is too aggressive, but in that case it
would make sense to have a v2 that just carves off the
CodingGuidelines change.

* Re: [PATCH 0/2] C99: harder dependency on variadic macros
  2021-04-12 11:02           ` [PATCH 0/2] C99: harder dependency on variadic macros Ævar Arnfjörð Bjarmason
  2021-04-12 11:02             ` [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code Ævar Arnfjörð Bjarmason
  2021-04-12 11:02             ` [PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
@ 2021-04-12 12:14             ` Bagas Sanjaya
  2021-04-12 12:41               ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 57+ messages in thread
From: Bagas Sanjaya @ 2021-04-12 12:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, brian m . carlson, Jeff King, Git Users

On 12/04/21 18.02, Ævar Arnfjörð Bjarmason wrote:
> But I'm submitting this because of brian m. carlson's note[3] about
> wanting to submit more general patches for declaring a hard dependency
> on all of C99.

I think we should bump the standard requirement to C99, right?

-- 
An old man doll... just what I always wanted! - Clara

* Re: [PATCH 0/2] C99: harder dependency on variadic macros
  2021-04-12 12:14             ` [PATCH 0/2] C99: harder dependency on variadic macros Bagas Sanjaya
@ 2021-04-12 12:41               ` Ævar Arnfjörð Bjarmason
  2021-04-12 22:57                 ` brian m. carlson
  0 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 12:41 UTC (permalink / raw)
  To: Bagas Sanjaya; +Cc: Junio C Hamano, brian m . carlson, Jeff King, Git Users


On Mon, Apr 12 2021, Bagas Sanjaya wrote:

> On 12/04/21 18.02, Ævar Arnfjörð Bjarmason wrote:
>> But I'm submitting this because of brian m. carlson's note[3] about
>> wanting to submit more general patches for declaring a hard dependency
>> on all of C99.
>
> I think we should bump the standard requirement to C99, right?

I think that's worth discussing, but isn't the topic of this more narrow
change.

As noted in
http://lore.kernel.org/git/87wnt8eai1.fsf@evledraar.gmail.com if we
simply do that some of our MSVC CI will start failing.

I don't know what other compilers we need to support that may support
our current subset of C99 features, but not the full set, or if e.g. the
CI can simply have its MSVC compiler version bumped.

* Re: [PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code
  2021-04-12 11:02             ` [PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:58               ` Junio C Hamano
  2021-04-13  8:00                 ` Jeff King
  2021-05-21  2:50               ` Jonathan Nieder
  1 sibling, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2021-04-12 17:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, brian m . carlson, Jeff King

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Remove code that depends on HAVE_VARIADIC_MACROS not being set. Since
> 765dc168882 (git-compat-util: always enable variadic macros,
> 2021-01-28) we've unconditionally defined it to be true, and that
> change went out with v2.31.0. This should have given packagers enough
> time to discover whether variadic macros were an issue.

It hasn't even been a month since we did v2.31.0.  Since it was not
even a maintenance release for security update, I have no reason to
expect packagers to be all that prompt to react.  And because we
gave them an escape hatch, they may have used it to update their
distro packages and haven't had a chance to tell us about it yet.

So, the above does not sound like a credible excuse to increase the
future work we would need to do to react to "our toolchain is not
ready yet" complaints.  At least not yet.

Please do not add patches that you know are unnecessary right now to
the pile of patches that needs to consume reviewer bandwidth.

Thanks.

* Re: [PATCH 0/2] C99: harder dependency on variadic macros
  2021-04-12 12:41               ` Ævar Arnfjörð Bjarmason
@ 2021-04-12 22:57                 ` brian m. carlson
  2021-04-12 23:19                   ` Junio C Hamano
  0 siblings, 1 reply; 57+ messages in thread
From: brian m. carlson @ 2021-04-12 22:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Bagas Sanjaya, Junio C Hamano, Jeff King, Git Users

On 2021-04-12 at 12:41:35, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Apr 12 2021, Bagas Sanjaya wrote:
> 
> > On 12/04/21 18.02, Ævar Arnfjörð Bjarmason wrote:
> >> But I'm submitting this because of brian m. carlson's note[3] about
> >> wanting to submit more general patches for declaring a hard dependency
> >> on all of C99.
> >
> > I think we should bump the standard requirement to C99, right?
> 
> I think that's worth discussing, but isn't the topic of this more narrow
> change.

I'm in favor of this more narrow change as well.

Junio's statement that packages may not have had time to update is true,
but I also just looked at a variety of packages that run on Linux,
FreeBSD, and NetBSD (and, since it's pkgsrc, Solaris), and they're all
updated.  Usually most open source OS vendors are reasonably prompt
about updating their Git versions, at least in the bleeding edge
repositories.

If this works on Windows, it will also work on Unix, because POSIX has
required C99 support since the 2001 revision, and __VA_ARGS__ is C99.
Unix systems are not the thing preventing us from enabling C99 support
(or any subset of it) in any meaningful sense.

> As noted in
> http://lore.kernel.org/git/87wnt8eai1.fsf@evledraar.gmail.com if we
> simply do that some of our MSVC CI will start failing.
> 
> I don't know what other compilers we need to support that may support
> our current subset of C99 features, but not the full set, or if e.g. the
> CI can simply have its MSVC compiler version bumped.

I'm looking at fixing our CI before I send in my series.  My series,
when it comes in, will have a green CI status because I do want to be
sure that we're providing a supported environment for MSVC wherever
that's possible.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

* Re: [PATCH 0/2] C99: harder dependency on variadic macros
  2021-04-12 22:57                 ` brian m. carlson
@ 2021-04-12 23:19                   ` Junio C Hamano
  0 siblings, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2021-04-12 23:19 UTC (permalink / raw)
  To: brian m. carlson, Randall S. Becker
  Cc: Ævar Arnfjörð Bjarmason, Bagas Sanjaya, Jeff King,
	Git Users

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> If this works on Windows, it will also work on Unix, because POSIX has
> required C99 support since the 2001 revision, and __VA_ARGS__ is C99.
> Unix systems are not the thing preventing us from enabling C99 support
> (or any subset of it) in any meaningful sense.

Yeah, among the list (semi-)regulars, the only ones likely to be
broken are the NonStop folks.  Hopefully they are aware of this
discussion?

https://lore.kernel.org/git/xmqqeeffe669.fsf@gitster.g/


* Re: [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code
  2021-04-12 11:02             ` [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code Ævar Arnfjörð Bjarmason
@ 2021-04-13  7:57               ` Jeff King
  2021-04-13 21:07                 ` Junio C Hamano
  2021-05-21  2:06               ` Jonathan Nieder
  1 sibling, 1 reply; 57+ messages in thread
From: Jeff King @ 2021-04-13  7:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, brian m . carlson

On Mon, Apr 12, 2021 at 01:02:17PM +0200, Ævar Arnfjörð Bjarmason wrote:

> diff --git a/git-compat-util.h b/git-compat-util.h
> index 9ddf9d7044b..540aba22a4d 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -480,10 +480,15 @@ void warning_errno(const char *err, ...) __attribute__((format (printf, 1, 2)));
>  
>  /*
>   * Let callers be aware of the constant return value; this can help
> - * gcc with -Wuninitialized analysis. We restrict this trick to gcc, though,
> - * because some compilers may not support variadic macros. Since we're only
> - * trying to help gcc, anyway, it's OK; other compilers will fall back to
> - * using the function as usual.
> + * gcc with -Wuninitialized analysis.
> + *
> + * We restrict this trick to gcc, though, because while we rely on the
> + * presence of C99 variadic macros, this code also relies on the
> + * non-standard behavior of GCC's __VA_ARGS__, allowing error() to
> + * work even if no format specifiers are passed to error().
> + *
> + * Since we're only trying to help gcc, anyway, it's OK; other
> + * compilers will fall back to using the function as usual.
>   */
>  #if defined(__GNUC__)

I don't mind leaving this gcc-only, since as you note that's the point
of what the code is trying to do. But wouldn't this always work because
we know there is at least one arg (the format itself)?

I.e., if we had written:

  #define error(fmt, ...) (error(fmt, __VA_ARGS__), const_error())

that would be a problem for:

  error("foo");

But because we wrote:

  #define error(...) (error(__VA_ARGS__), const_error())

then it's OK.
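
To make the difference concrete, here is a small standalone sketch. It
is not Git's actual code: report() and const_result() are made-up
stand-ins for error() and const_error(), and it assumes nothing beyond
a C99 compiler.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for error(): prints a message, returns -1. */
static int report(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	return -1;
}

/* Hypothetical stand-in for const_error(): a visibly constant result. */
static inline int const_result(void)
{
	return -1;
}

/*
 * Portable in C99: the format string is part of __VA_ARGS__, so the
 * variable argument list is never empty. A function-like macro is not
 * expanded recursively, so the inner report() is the real function.
 */
#define report(...) (report(__VA_ARGS__), const_result())

/*
 * Not portable: with a named "fmt" parameter, report_named("foo")
 * would leave "..." empty and expand to report("foo", ), which strict
 * C99 rejects. Left commented out on purpose.
 *
 * #define report_named(fmt, ...) (report(fmt, __VA_ARGS__), const_result())
 */

static int demo(void)
{
	int plain = report("foo");            /* no arguments after the format */
	int formatted = report("bar %d", 42); /* one argument after the format */

	assert(plain == -1);
	return plain + formatted;
}
```

Both calls in demo() go through the macro, and the value of each
expression is the constant from const_result(), which is the part the
compiler's flow analysis can see.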

-Peff

* Re: [PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code
  2021-04-12 17:58               ` Junio C Hamano
@ 2021-04-13  8:00                 ` Jeff King
  0 siblings, 0 replies; 57+ messages in thread
From: Jeff King @ 2021-04-13  8:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, brian m . carlson

On Mon, Apr 12, 2021 at 10:58:22AM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
> > Remove code that depends on HAVE_VARIADIC_MACROS not being set. Since
> > 765dc168882 (git-compat-util: always enable variadic macros,
> > 2021-01-28) we've unconditionally defined it to be true, and that
> > change went out with v2.31.0. This should have given packagers enough
> > time to discover whether variadic macros were an issue.
> 
> It hasn't even been a month since we did v2.31.0.  Since it was not
> even a maintenance release for security update, I have no reason to
> expect packagers to be all that prompt to react.  And because we
> gave them an escape hatch, they may have used it to update their
> distro packages and haven't had a chance to tell us about it yet.
> 
> So, the above does not sound like a credible excuse to increase the
> future work we would need to do to react to "our toolchain is not
> ready yet" complaints.  At least not yet.

Yeah, the whole idea of the change in v2.31.0 was to change as little as
possible for the weather-balloon. I agree that waiting longer to see the
results of our test makes sense, unless there is a pressing need.

If we had some new use case that was going to have to _add_ workarounds
for platforms without variadic macros, I'd be more inclined to think
about that timetable versus how much work it would be to add those
workarounds. But since there isn't one, it seems there is little cost to
waiting longer.

-Peff

* Re: [PATCH 03/15] cache: add an algo member to struct object_id
  2021-04-10 15:21 ` [PATCH 03/15] cache: add an algo member to struct object_id brian m. carlson
  2021-04-11 11:55   ` Ævar Arnfjörð Bjarmason
@ 2021-04-13 12:12   ` Derrick Stolee
  2021-04-14  1:08     ` brian m. carlson
  1 sibling, 1 reply; 57+ messages in thread
From: Derrick Stolee @ 2021-04-13 12:12 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Derrick Stolee

On 4/10/2021 11:21 AM, brian m. carlson wrote:
> Now that we're working with multiple hash algorithms in the same repo,
> it's best if we label each object ID with its algorithm so we can
> determine how to format a given object ID. Add a member called algo to
> struct object_id.
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  hash.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hash.h b/hash.h
> index 3fb0c3d400..dafdcb3335 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -181,6 +181,7 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
>  
>  struct object_id {
>  	unsigned char hash[GIT_MAX_RAWSZ];
> +	int algo;
>  };

What are the performance implications of adding this single bit
(that actually costs us 4 to 8 bytes, based on alignment)? Later
in the series you add longer hash comparisons, too. These seem
like they will affect performance for existing SHA-1 repos, and
it would be nice to know how much we are paying for this support.

I assume that we already checked what happened when GIT_MAX_RAWSZ
increased, but that seemed worth the cost so we could have SHA-256
at all. I find the justification for this interoperability mode to
be less significant, and potentially adding too much of a tax onto
both SHA-1 repos that will never upgrade, and SHA-256 repos that
upgrade all at once (or start as SHA-256).

Of course, if there truly is no serious performance implication to
this change, then I support following the transition plan and
allowing us to be flexible on timelines for interoperability. It
just seems like we need to investigate what this will cost.

Thanks,
-Stolee

* Re: [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code
  2021-04-13  7:57               ` Jeff King
@ 2021-04-13 21:07                 ` Junio C Hamano
  2021-04-14  5:21                   ` Jeff King
  0 siblings, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2021-04-13 21:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, git, brian m . carlson

Jeff King <peff@peff.net> writes:

>> + * We restrict this trick to gcc, though, because while we rely on the
>> + * presence of C99 variadic macros, this code also relies on the
>> + * non-standard behavior of GCC's __VA_ARGS__, allowing error() to
>> + * work even if no format specifiers are passed to error().

The last part of this comment is puzzling.  Do we ever call error()
without any format specifier?  There may be GCC-ism behaviour around
the __VA_ARGS__ stuff, but are we relying on that GCC-ism?

>> + * Since we're only trying to help gcc, anyway, it's OK; other
>> + * compilers will fall back to using the function as usual.
>>   */
>>  #if defined(__GNUC__)
>
> I don't mind leaving this gcc-only, since as you note that's the point
> of what the code is trying to do. But wouldn't this always work because
> we know there is at least one arg (the format itself)?
>
> I.e., if we had written:
>
>   #define error(fmt, ...) (error(fmt, __VA_ARGS__), const_error())
>
> that would be a problem for:
>
>   error("foo");
>
> But because we wrote:
>
>   #define error(...) (error(__VA_ARGS__), const_error())
>
> then it's OK.

I think so.  At least I find the new comment confusing, and I'd
prefer to see it cleaned up.

Thanks.


* Re: [PATCH 03/15] cache: add an algo member to struct object_id
  2021-04-13 12:12   ` Derrick Stolee
@ 2021-04-14  1:08     ` brian m. carlson
  2021-04-15  8:47       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 57+ messages in thread
From: brian m. carlson @ 2021-04-14  1:08 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Derrick Stolee

On 2021-04-13 at 12:12:21, Derrick Stolee wrote:
> On 4/10/2021 11:21 AM, brian m. carlson wrote:
> > Now that we're working with multiple hash algorithms in the same repo,
> > it's best if we label each object ID with its algorithm so we can
> > determine how to format a given object ID. Add a member called algo to
> > struct object_id.
> > 
> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > ---
> >  hash.h | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/hash.h b/hash.h
> > index 3fb0c3d400..dafdcb3335 100644
> > --- a/hash.h
> > +++ b/hash.h
> > @@ -181,6 +181,7 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
> >  
> >  struct object_id {
> >  	unsigned char hash[GIT_MAX_RAWSZ];
> > +	int algo;
> >  };
> 
> What are the performance implications of adding this single bit
> (that actually costs us 4 to 8 bytes, based on alignment)? Later
> in the series you add longer hash comparisons, too. These seem
> like they will affect performance for existing SHA-1 repos, and
> it would be nice to know how much we are paying for this support.

I will do some performance numbers on these patches, but it will likely
be the weekend before I can get to it.  I think this will add 4 bytes on
most platforms, since int is typically 32 bits, and the alignment
requirement would be for the most strictly aligned member, which is the
int, so a 4-byte alignment.  I don't think the alignment requirements
are especially onerous here.
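
As a rough illustration of that arithmetic, here is a sketch comparing
the two layouts. The struct and function names are made up for this
example, and MAX_RAWSZ of 32 is an assumption meant to mirror
GIT_MAX_RAWSZ with SHA-256 enabled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RAWSZ 32 /* assumed: the raw size of a SHA-256 hash */

/* Hypothetical layout before the patch: hash bytes only. */
struct oid_plain {
	unsigned char hash[MAX_RAWSZ];
};

/* Hypothetical layout after the patch: hash bytes plus the algorithm. */
struct oid_with_algo {
	unsigned char hash[MAX_RAWSZ];
	int algo;
};

/* Number of bytes the new member adds, including any padding. */
static size_t algo_overhead(void)
{
	return sizeof(struct oid_with_algo) - sizeof(struct oid_plain);
}

/* Offset of algo; no padding precedes it if this equals MAX_RAWSZ. */
static size_t algo_offset(void)
{
	return offsetof(struct oid_with_algo, algo);
}

/*
 * Holds whenever MAX_RAWSZ is a multiple of int's alignment, which is
 * the case on the usual platforms (2-, 4- or 8-byte aligned int).
 */
static void check_layout(void)
{
	assert(algo_offset() == MAX_RAWSZ);
	assert(algo_overhead() == sizeof(int));
}
```

With a 32-bit, 4-byte-aligned int this works out to 36 bytes instead
of 32, with no padding, because 32 is already a multiple of 4.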

> I assume that we already checked what happened when GIT_MAX_RAWSZ
> increased, but that seemed worth the cost so we could have SHA-256
> at all. I find the justification for this interoperability mode to
> be less significant, and potentially adding too much of a tax onto
> both SHA-1 repos that will never upgrade, and SHA-256 repos that
> upgrade all at once (or start as SHA-256).

The entire goal of the interoperability is to let people seamlessly and
transparently move from SHA-1 to SHA-256.  Currently, the only way
people can move a SHA-1 repository to a SHA-256 repository is with
fast-import and fast-export, which loses all digital signatures and tags
to blobs.  This also requires a flag day.

SHA-1 can now be attacked for USD 45,000.  That means it is within the
budget of a dedicated professional and virtually all medium or large
corporations, including even most municipal governments, to create a
SHA-1 collision.  Unfortunately, the way we deal with this is to die, so
as soon as this happens, the repository fails closed.  While an attacker
cannot make use of the collisions to spread malicious objects, because
of the way Git works, they can effectively DoS a repository, which is in
itself a security issue.  Fixing this requires major surgery.

We need the interoperability code to let people transition their
repositories away from SHA-1, even if it has some performance impact,
because without that most SHA-1 repositories will never transition.
That's what's outlined in the transition plan, and why that approach was
proposed, even though it would be nicer to avoid having to implement it
at all.

I will endeavor to make the performance impact as small as possible, of
course, and ideally there will be none.  I am sensitive to the fact that
people do run absurdly large workloads on Git, as we both know, and I do
want to support that.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

* Re: [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code
  2021-04-13 21:07                 ` Junio C Hamano
@ 2021-04-14  5:21                   ` Jeff King
  2021-04-14  6:12                     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 57+ messages in thread
From: Jeff King @ 2021-04-14  5:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, brian m . carlson

On Tue, Apr 13, 2021 at 02:07:13PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> + * We restrict this trick to gcc, though, because while we rely on the
> >> + * presence of C99 variadic macros, this code also relies on the
> >> + * non-standard behavior of GCC's __VA_ARGS__, allowing error() to
> >> + * work even if no format specifiers are passed to error().
> 
> The last part of this comment is puzzling.  Do we ever call error()
> without any format specifier?  There may be GCC-ism behaviour around
> the __VA_ARGS__ stuff, but are we relying on that GCC-ism?

I took "format specifier" to mean the "%" code within the format. E.g.:

  error("foo");

has no format specifier, and thus no arguments after the format. But
every call will have at least the format string itself.

AFAIK, portably using variadic macros means you need there to always be
at least one argument. Hence "error(fmt, ...)" is wrong (the "..." may
have no arguments) but "error(...)" is OK (you always have a format
string). I'm not sure if Ævar knows about some other portability gotcha,
or if he just didn't realize that this was written in the portable way.

-Peff

* Re: [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code
  2021-04-14  5:21                   ` Jeff King
@ 2021-04-14  6:12                     ` Ævar Arnfjörð Bjarmason
  2021-04-14  7:31                       ` Jeff King
  0 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-14  6:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, brian m . carlson


On Wed, Apr 14 2021, Jeff King wrote:

> On Tue, Apr 13, 2021 at 02:07:13PM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> >> + * We restrict this trick to gcc, though, because while we rely on the
>> >> + * presence of C99 variadic macros, this code also relies on the
>> >> + * non-standard behavior of GCC's __VA_ARGS__, allowing error() to
>> >> + * work even if no format specifiers are passed to error().
>> 
>> The last part of this comment is puzzling.  Do we ever call error()
>> without any format specifier?  There may be GCC-ism behaviour around
>> the __VA_ARGS__ stuff, but are we relying on that GCC-ism?
>
> I took "format specifier" to mean the "%" code within the format. E.g.:
>
>   error("foo");
>
> has no format specifier, and thus no arguments after the format. But
> every call will have at least the format string itself.
>
> AFAIK, portably using variadic macros means you need there to always be
> at least one argument. Hence "error(fmt, ...)" is wrong (the "..." may
> have no arguments) but "error(...)" is OK (you always have a format
> string). I'm not sure if Ævar knows about some other portability gotcha,
> or if he just didn't realize that this was written in the portable way.

No, I just read elsewhere that GCC had non-standard behavior, and didn't
look carefully at your implementation. Since it explicitly depended
on GNUC etc., I understood it to mean it was GCC-specific, not just
C99-specific.

So it can simply be changed to depend on HAVE_VARIADIC_MACROS instead?

* Re: [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code
  2021-04-14  6:12                     ` Ævar Arnfjörð Bjarmason
@ 2021-04-14  7:31                       ` Jeff King
  0 siblings, 0 replies; 57+ messages in thread
From: Jeff King @ 2021-04-14  7:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, brian m . carlson

On Wed, Apr 14, 2021 at 08:12:34AM +0200, Ævar Arnfjörð Bjarmason wrote:

> 
> On Wed, Apr 14 2021, Jeff King wrote:
> 
> > On Tue, Apr 13, 2021 at 02:07:13PM -0700, Junio C Hamano wrote:
> >
> >> Jeff King <peff@peff.net> writes:
> >> 
> >> >> + * We restrict this trick to gcc, though, because while we rely on the
> >> >> + * presence of C99 variadic macros, this code also relies on the
> >> >> + * non-standard behavior of GCC's __VA_ARGS__, allowing error() to
> >> >> + * work even if no format specifiers are passed to error().
> >> 
> >> The last part of this comment is puzzling.  Do we ever call error()
> >> without any format specifier?  There may be GCC-ism behaviour around
> >> the __VA_ARGS__ stuff, but are we relying on that GCC-ism?
> >
> > I took "format specifier" to mean the "%" code within the format. E.g.:
> >
> >   error("foo");
> >
> > has no format specifier, and thus no arguments after the format. But
> > every call will have at least the format string itself.
> >
> > AFAIK, portably using variadic macros means you need there to always be
> > at least one argument. Hence "error(fmt, ...)" is wrong (the "..." may
> > have no arguments) but "error(...)" is OK (you always have a format
> > string). I'm not sure if Ævar knows about some other portability gotcha,
> > or if he just didn't realize that this was written in the portable way.
> 
> No, I just read elsewhere that GCC had non-standard behavior, and didn't
> look carefully at your implementation. Since it explicitly depended
> on GNUC etc., I understood it to mean it was GCC-specific, not just
> C99-specific.
> 
> So it can simply be changed to depend on HAVE_VARIADIC_MACROS instead?

I think it probably could be, yes.

The original predates HAVE_VARIADIC_MACROS (which we got in 2014); the
original error() macro is from e208f9cc75 (make error()'s constant
return value more visible, 2012-12-15).

The original also used the gcc-ism with the paste operator (see the
commit message for mention of it), but that was actually dropped later
by 9798f7e5f9 (Use __VA_ARGS__ for all of error's arguments,
2013-02-08), giving us the C99-portable version we have now.

All that said, I don't see much value in converting it to use
HAVE_VARIADIC_MACROS. It is mostly there to benefit gcc's warning code.

In theory making the return value visible could help other compilers
generate better code, too. If we care about doing that, we could switch
to making it unconditional. But aside from the variadic macros, it's
kind of a weird construct and I'd be worried about generating warnings
on other compilers or static analysis systems. E.g., see the hack from
87fe5df365 (inline constant return from error() function, 2014-05-06).

-Peff

* Re: [PATCH 03/15] cache: add an algo member to struct object_id
  2021-04-14  1:08     ` brian m. carlson
@ 2021-04-15  8:47       ` Ævar Arnfjörð Bjarmason
  2021-04-15 23:51         ` brian m. carlson
  0 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-15  8:47 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Derrick Stolee, git, Derrick Stolee


On Wed, Apr 14 2021, brian m. carlson wrote:

> On 2021-04-13 at 12:12:21, Derrick Stolee wrote:
>> On 4/10/2021 11:21 AM, brian m. carlson wrote:
>> > Now that we're working with multiple hash algorithms in the same repo,
>> > it's best if we label each object ID with its algorithm so we can
>> > determine how to format a given object ID. Add a member called algo to
>> > struct object_id.
>> > 
>> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
>> > ---
>> >  hash.h | 1 +
>> >  1 file changed, 1 insertion(+)
>> > 
>> > diff --git a/hash.h b/hash.h
>> > index 3fb0c3d400..dafdcb3335 100644
>> > --- a/hash.h
>> > +++ b/hash.h
>> > @@ -181,6 +181,7 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
>> >  
>> >  struct object_id {
>> >  	unsigned char hash[GIT_MAX_RAWSZ];
>> > +	int algo;
>> >  };
>> 
>> What are the performance implications of adding this single bit
>> (that actually costs us 4 to 8 bytes, based on alignment)? Later
>> in the series you add longer hash comparisons, too. These seem
>> like they will affect performance for existing SHA-1 repos, and
>> it would be nice to know how much we are paying for this support.
>
> I will do some performance numbers on these patches, but it will likely
> be the weekend before I can get to it.  I think this will add 4 bytes on
> most platforms, since int is typically 32 bits, and the alignment
> requirement would be for the most strictly aligned member, which is the
> int, so a 4-byte alignment.  I don't think the alignment requirements
> are especially onerous here.

I think if you're doing such a perf test, one comparing SHA-1 mode with
SHA-1-length OIDs vs. SHA-256-length OIDs (the current behavior) would
be interesting as well.

Good commits to test that seem to be 5a0cc8aca79 and
13eeedb5d17. Running:

    GIT_PERF_REPEAT_COUNT=10 \
    GIT_SKIP_TESTS="p0001.[3-9] p1450.2" \
    GIT_TEST_OPTS= GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3' \
    ./run 5a0cc8aca79 13eeedb5d17 -- p0001-rev-list.sh p1450-fsck.sh

(I added a fsck --connectivity-only test)

Gives us:

    Test                               5a0cc8aca79         13eeedb5d17            
    ------------------------------------------------------------------------------
    0001.1: rev-list --all             2.46(2.22+0.14)     2.48(2.25+0.14) +0.8%  
    0001.2: rev-list --all --objects   10.79(10.22+0.14)   10.92(10.24+0.20) +1.2%
    1450.1: fsck --connectivity-only   16.61(15.42+0.34)   16.94(15.60+0.32) +2.0%

So at least on my box none of those are outside of the confidence
intervals. This was against my copy of git.git. Perhaps it matters more
under memory pressure.

>> I assume that we already checked what happened when GIT_MAX_RAWSZ
>> increased, but that seemed worth the cost so we could have SHA-256
>> at all. I find the justification for this interoperability mode to
>> be less significant, and potentially adding too much of a tax onto
>> both SHA-1 repos that will never upgrade, and SHA-256 repos that
>> upgrade all at once (or start as SHA-256).
>
> The entire goal of the interoperability is to let people seamlessly and
> transparently move from SHA-1 to SHA-256.  Currently, the only way
> people can move a SHA-1 repository to a SHA-256 repository is with
> fast-import and fast-export, which loses all digital signatures and tags
> to blobs.  This also requires a flag day.
>
> SHA-1 can now be attacked for USD 45,000.  That means it is within the
> budget of a dedicated professional and virtually all medium or large
> corporations, including even most municipal governments, to create a
> SHA-1 collision.

Is that for vanilla SHA-1, or SHA-1DC?

> Unfortunately, the way we deal with this is to die, so
> as soon as this happens, the repository fails closed.  While an attacker
> cannot make use of the collisions to spread malicious objects, because
> of the way Git works, they can effectively DoS a repository, which is in
> itself a security issue.  Fixing this requires major surgery.

Can you elaborate on this? I believe that we die on any known collision
via the SHA1-DC code, and even if we didn't have that we'd detect the
collision (see [1] for the code) and die while the object is in the
temporary quarantine.

I believe such a request is cheaper to serve than one that doesn't
upload colliding objects, we die earlier (less CPU etc.), and don't add
objects to the store.

So what's the DoS vector?

> We need the interoperability code to let people transition their
> repositories away from SHA-1, even if it has some performance impact,
> because without that most SHA-1 repositories will never transition.
> That's what's outlined in the transition plan, and why that approach was
> proposed, even though it would be nicer to avoid having to implement it
> at all.

There's no question that we need working interop.

The question at least in my mind is why that interop is happening by
annotating every object held in memory with whether they're SHA-1 or
SHA-256, as opposed to having some translation layer earlier in the
chain.

Not all our file or in-memory structures are like that, e.g. the
commit graph has a header saying "this is a bunch of SHA-1/256", and the
objects that follow are padded to that actual hash size, not the max
size we know about.

My understanding of the transition plan was that we'd e.g. have a
SHA-1<->SHA-256 mapping of objects, which we'd say use to push/pull.

But that by the time I ran say a "git commit" none of that machinery
needed to care that I was interoping with a SHA-1 repo on the other end.
It would just happily have all SHA-256 objects, create new ones, and
only by the time I needed to push them would something kick in to
re-hash them.

I *think* the answer is just "it's easier on the C-level" and "the
wastage doesn't seem to matter much", which is fair enough.

*Goes and digs in the ML archive*:

    https://lore.kernel.org/git/1399147942-165308-1-git-send-email-sandals@crustytoothpaste.net/#t
    https://lore.kernel.org/git/55016A3A.6010100@alum.mit.edu/

To answer (some) of that myself:

Digging up some of the initial discussion, that seems to be the case. At
that point there was a suggestion of:

    struct object_id {
        unsigned char hash_type;
        union {
            unsigned char sha1[GIT_SHA1_RAWSZ];
            unsigned char sha256[GIT_SHA256_RAWSZ];
        } hash;
    };

To which you replied:
    
    What I think might be more beneficial is to make the hash function
    specific to a particular repository, and therefore you could maintain
    something like this:
    
      struct object_id {
          unsigned char hash[GIT_MAX_RAWSZ];
      };

It wouldn't matter for the memory use to make it a union, but my reading
of the above is that the reason for the current object_id not-a-union
structure might not be valid now that there's a "hash_type" member, no?
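
To make the size argument concrete, here is a small sketch (not code
from git.git; the SKETCH_* macros and the struct names are made up for
illustration, and the raw sizes are assumed to be 20 bytes for SHA-1 and
32 for SHA-256). On common ABIs the union with a one-byte tag should be
no smaller than a flat array with a one-byte tag, while an int tag costs
sizeof(int) plus whatever padding the platform adds:

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed raw hash sizes: 20 bytes for SHA-1, 32 for SHA-256. */
#define SKETCH_SHA1_RAWSZ 20
#define SKETCH_SHA256_RAWSZ 32
#define SKETCH_MAX_RAWSZ SKETCH_SHA256_RAWSZ

/* The earlier suggestion: a type tag plus a union of per-hash arrays. */
struct oid_union {
	unsigned char hash_type;
	union {
		unsigned char sha1[SKETCH_SHA1_RAWSZ];
		unsigned char sha256[SKETCH_SHA256_RAWSZ];
	} hash;
};

/* A flat max-sized array with a one-byte tag. */
struct oid_flat_char {
	unsigned char hash[SKETCH_MAX_RAWSZ];
	unsigned char algo;
};

/* A flat max-sized array with an int tag, as in the patch under review. */
struct oid_flat_int {
	unsigned char hash[SKETCH_MAX_RAWSZ];
	int algo;
};

/* Print the three sizes so they can be compared on a given platform. */
static void print_oid_sizes(void)
{
	printf("union + char tag: %zu\n", sizeof(struct oid_union));
	printf("flat  + char tag: %zu\n", sizeof(struct oid_flat_char));
	printf("flat  + int tag:  %zu\n", sizeof(struct oid_flat_int));
}
```

Calling print_oid_sizes() from a test program shows what a given
compiler and ABI actually do; the exact numbers are platform-dependent.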

> I will endeavor to make the performance impact as small as possible, of
> course, and ideally there will be none.  I am sensitive to the fact that
> people do run absurdly large workloads on Git, as we both know, and I do
> want to support that.

All of the above being said, I do wonder whether, for those who worry
about hash size inflating their object store, a more sensible end-goal
wouldn't be to store abbreviated hashes.

As long as you'd re-hash/inflate the size in the case of collisions
(which would be a more expensive check at the "fetch" boundary) you
could do so safely, and the result would be less memory consumption.

But maybe it's just a non-issue :)

1. https://lore.kernel.org/git/20181113201910.11518-1-avarab@gmail.com/

* Re: [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm
  2021-04-10 15:21 ` [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm brian m. carlson
@ 2021-04-15  8:55   ` Denton Liu
  2021-04-15 23:03     ` brian m. carlson
  2021-04-16 15:04   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 57+ messages in thread
From: Denton Liu @ 2021-04-15  8:55 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee

Hi brian,

> Subject: [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm

s/sha1-file/object-file/

I can see that you've waited a while to send this series ;)

On Sat, Apr 10, 2021 at 03:21:26PM +0000, brian m. carlson wrote:
> In order to perform suitable testing with multiple algorithms and
> interoperability, we'll need the ability to hash an object with a given
> algorithm. Introduce this capability for now only for objects which are
> hashed literally by adding a function which does this and changing a
> static function to accept an algorithm pointer.
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  object-file.c  | 16 ++++++++++++++--
>  object-store.h |  3 +++
>  2 files changed, 17 insertions(+), 2 deletions(-)

* Re: [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm
  2021-04-15  8:55   ` Denton Liu
@ 2021-04-15 23:03     ` brian m. carlson
  0 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-15 23:03 UTC (permalink / raw)
  To: Denton Liu; +Cc: git, Derrick Stolee

On 2021-04-15 at 08:55:52, Denton Liu wrote:
> Hi brian,
> 
> > Subject: [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm
> 
> s/sha1-file/object-file/

Good point.  Will fix.

> I can see that you've waited a while to send this series ;)

Yes, I started writing this series after I finished the original SHA-256
series.  That had been completed for some time, but it took time to send
out all the patches, so it's been sitting in my repository for a while.

I had hoped that things would be a little simpler than they had been and
I could have been finished by now, but I still have some 150 tests to
fix, so I decided to send out some initial patches to keep things
moving.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

* Re: [PATCH 03/15] cache: add an algo member to struct object_id
  2021-04-15  8:47       ` Ævar Arnfjörð Bjarmason
@ 2021-04-15 23:51         ` brian m. carlson
  0 siblings, 0 replies; 57+ messages in thread
From: brian m. carlson @ 2021-04-15 23:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, git, Derrick Stolee

On 2021-04-15 at 08:47:00, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Apr 14 2021, brian m. carlson wrote:
> > I will do some performance numbers on these patches, but it will likely
> > be the weekend before I can get to it.  I think this will add 4 bytes on
> > most platforms, since int is typically 32 bits, and the alignment
> > requirement would be for the most strictly aligned member, which is the
> > int, so a 4-byte alignment.  I don't think the alignment requirements
> > are especially onerous here.
> 
> I think if you're doing such a perf test, one comparing SHA-1 mode with
> SHA-1-length OIDs vs. SHA-256-length OIDs (the current behavior) would
> be interesting as well.
> 
> Good commits to test that seem to be 5a0cc8aca79 and
> 13eeedb5d17. Running:
> 
>     GIT_PERF_REPEAT_COUNT=10 \
>     GIT_SKIP_TESTS="p0001.[3-9] p1450.2" \
>     GIT_TEST_OPTS= GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3' \
>     ./run 5a0cc8aca79 13eeedb5d17 -- p0001-rev-list.sh p1450-fsck.sh
> 
> (I added a fsck --connectivity-only test)
> 
> Gives us:
> 
>     Test                               5a0cc8aca79         13eeedb5d17
>     ------------------------------------------------------------------------------
>     0001.1: rev-list --all             2.46(2.22+0.14)     2.48(2.25+0.14) +0.8%
>     0001.2: rev-list --all --objects   10.79(10.22+0.14)   10.92(10.24+0.20) +1.2%
>     1450.1: fsck --connectivity-only   16.61(15.42+0.34)   16.94(15.60+0.32) +2.0%
> 
> So at least on my box none of those are outside of the confidence
> intervals. This was against my copy of git.git. Perhaps it matters more
> under memory pressure.

I do plan to take a deeper look at this this weekend and do some
performance numbers, and I think these are great examples to use.  I
just have a limited amount of time most weeknights because, among other
things, I am taking French a couple nights a week.

I talked with Stolee today about this approach and the desire for
performance, so I think we're on the same page about trying to make this
as fast as possible.  I plan to try a couple alternative solutions which
may work as well (or at least, I will make notes this time about why
they didn't work) and be less impactful.

> >> I assume that we already checked what happened when GIT_MAX_RAWSZ
> >> increased, but that seemed worth the cost so we could have SHA-256
> >> at all. I find the justification for this interoperability mode to
> >> be less significant, and potentially adding too much of a tax onto
> >> both SHA-1 repos that will never upgrade, and SHA-256 repos that
> >> upgrade all at once (or start as SHA-256).
> >
> > The entire goal of the interoperability is to let people seamlessly and
> > transparently move from SHA-1 to SHA-256.  Currently, the only way
> > people can move a SHA-1 repository to a SHA-256 repository is with
> > fast-import and fast-export, which loses all digital signatures and tags
> > to blobs.  This also requires a flag day.
> >
> > SHA-1 can now be attacked for USD 45,000.  That means it is within the
> > budget of a dedicated professional and virtually all medium or large
> > corporations, including even most municipal governments, to create a
> > SHA-1 collision.
> 
> Is that for vanilla SHA-1, or SHA-1DC?

Well, for SHA-1 in general.  SHA-1DC doesn't do anything except detect
collisions.  People can still create collisions, but we detect them.

> > Unfortunately, the way we deal with this is to die, so
> > as soon as this happens, the repository fails closed.  While an attacker
> > cannot make use of the collisions to spread malicious objects, because
> > of the way Git works, they can effectively DoS a repository, which is in
> > itself a security issue.  Fixing this requires major surgery.
> 
> Can you elaborate on this? I believe that we die on any known collision
> via the SHA1-DC code, and even if we didn't have that we'd detect the
> collision (see [1] for the code) and die while the object is in the
> temporary quarantine.
> 
> I believe such a request is cheaper to serve than one that doesn't
> upload colliding objects, we die earlier (less CPU etc.), and don't add
> objects to the store.
> 
> So what's the DoS vector?

That assumes that the server is using SHA-1DC and that the object can't
be uploaded any way except a push where its hash is checked.  Those are
true for many, but not all, hosting providers.  For example, anyone
using Git in a FIPS-validated environment can only use FIPS-validated
crypto, and I'm not aware of any SHA-1DC implementations that are.
Also, some implementations like Dulwich don't use SHA-1DC.

Once someone can get that object to a standard Git which does use
SHA-1DC, then the repository is pretty much hosed.  I probably can make
this better by just dropping the non SHA-1DC mode here and in libgit2,
to at least disincentivize poor choices in the most common
implementations.

> > We need the interoperability code to let people transition their
> > repositories away from SHA-1, even if it has some performance impact,
> > because without that most SHA-1 repositories will never transition.
> > That's what's outlined in the transition plan, and why that approach was
> > proposed, even though it would be nicer to avoid having to implement it
> > at all.
> 
> There's no question that we need working interop.
> 
> The question at least in my mind is why that interop is happening by
> annotating every object held in memory with whether they're SHA-1 or
> SHA-256, as opposed to having some translation layer earlier in the
> chain.

This is a good question.  Let me provide an example.

When we speak the remote protocol with a remote system, we'll parse out
several object IDs of the appropriate algorithm.  We will then pass those
around to various places in our transport code.  It makes it a lot
easier if we can just run every object ID through an inline mapping
function which immediately does nothing if the object ID is of the
appropriate type rather than adding additional code to keep a check of
the current algorithm that's being used in the transport code.

It also makes it a lot easier when we let people store data in SHA-256
and then print things in SHA-1 for compatibility if we can add an
oid_to_hex_output and just map every object automatically on output,
regardless of where it came from, without needing to keep track of what
algorithm it's in.  For example, think about situations where the user
may have passed in a SHA-1 object ID and we reuse that value instead of
reading a SHA-256 object from the store.

So it's not completely impossible to avoid a hash algorithm member, but
it does significantly complicate our code not to do it.
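
A minimal sketch of that idea, assuming a hypothetical in-memory mapping
table (map_oid(), struct map_entry, struct tagged_oid and the MAP_*
constants are invented for this example and are not the names used in
the actual series). The point is only that the caller never has to know
which algorithm it was handed:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical algorithm identifiers for this sketch. */
enum { MAP_SHA1 = 1, MAP_SHA256 = 2 };

/* An object ID carrying its own algorithm; unused hash bytes are zero. */
struct tagged_oid {
	unsigned char hash[32];
	int algo;
};

/* One row of a hypothetical SHA-1 <-> SHA-256 mapping table. */
struct map_entry {
	struct tagged_oid sha1;
	struct tagged_oid sha256;
};

static int tagged_oid_eq(const struct tagged_oid *a,
			 const struct tagged_oid *b)
{
	return a->algo == b->algo &&
	       !memcmp(a->hash, b->hash, sizeof(a->hash));
}

/*
 * Write the "want" form of "in" to "out". Returns 0 on success and -1
 * if the table has no mapping for this object.
 */
static int map_oid(const struct map_entry *map, size_t nr,
		   const struct tagged_oid *in, int want,
		   struct tagged_oid *out)
{
	size_t i;

	/* Fast path: already in the requested algorithm, nothing to do. */
	if (in->algo == want) {
		*out = *in;
		return 0;
	}
	for (i = 0; i < nr; i++) {
		const struct tagged_oid *from, *to;

		if (want == MAP_SHA256) {
			from = &map[i].sha1;
			to = &map[i].sha256;
		} else {
			from = &map[i].sha256;
			to = &map[i].sha1;
		}
		if (tagged_oid_eq(from, in)) {
			*out = *to;
			return 0;
		}
	}
	return -1; /* no known mapping */
}
```

A linear scan is used here only to keep the example short; a real
lookup would go through the pack index or loose object map.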

> Not all our file or in-memory structures are like that, e.g. the
> commit graph has a header saying "this is a bunch of SHA-1/256", and the
> objects that follow are padded to that actual hash size, not the max
> size we know about.
> 
> My understanding of the transition plan was that we'd e.g. have a
> SHA-1<->SHA-256 mapping of objects, which we'd say use to push/pull.

Correct.  That code exists and mostly works.  There are still a lot of
failing tests, but I have a pack index v3 that stores that data and a
loose object store.

> But that by the time I ran say a "git commit" none of that machinery
> needed to care that I was interoping with a SHA-1 repo on the other end.
> It would just happily have all SHA-256 objects, create new ones, and
> only by the time I needed to push them would something kick in to
> re-hash them.
> 
> I *think* the answer is just "it's easier on the C-level" and "the
> wastage doesn't seem to matter much", which is fair enough.

I think that's accurate.

> *Goes and digs in the ML archive*:
> 
>     https://lore.kernel.org/git/1399147942-165308-1-git-send-email-sandals@crustytoothpaste.net/#t
>     https://lore.kernel.org/git/55016A3A.6010100@alum.mit.edu/
> 
> To answer (some) of that myself:
> 
> Digging up some of the initial discussion, that seems to be the case. At
> that point there was a suggestion of:
> 
>     struct object_id {
>         unsigned char hash_type;
>         union {
>             unsigned char sha1[GIT_SHA1_RAWSZ];
>             unsigned char sha256[GIT_SHA256_RAWSZ];
>         } hash;
>     };
> 
> To which you replied:
> 
>     What I think might be more beneficial is to make the hash function
>     specific to a particular repository, and therefore you could maintain
>     something like this:
> 
>       struct object_id {
>           unsigned char hash[GIT_MAX_RAWSZ];
>       };
> 
> It wouldn't matter for the memory use to make it a union, but my reading
> of the above is that the reason for the current object_id not-a-union
> structure might not be valid now that there's a "hash_type" member, no?

Probably at that time we didn't have the formal transition plan and
didn't anticipate interoperability as a concern.  I do think that using
a single hash member instead of a union makes things easier because we
generally don't want to look up two different members in cases like
printing a hex OID.  We ultimately just want to print the right number
of bytes from that data, and the union just makes things trickier with
the compiler when we do that.
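
For illustration, with a single hash member plus an algorithm tag,
formatting reduces to picking a length and walking the same array. This
is only a sketch: fmt_oid_to_hex(), struct fmt_oid and the FMT_* names
are hypothetical and are not git.git's oid_to_hex().

```c
#include <stddef.h>

/* Hypothetical algorithm identifiers and sizes for this sketch. */
enum { FMT_SHA1 = 1, FMT_SHA256 = 2 };
#define FMT_MAX_RAWSZ 32
#define FMT_MAX_HEXSZ (2 * FMT_MAX_RAWSZ)

struct fmt_oid {
	unsigned char hash[FMT_MAX_RAWSZ];
	int algo;
};

/* Number of meaningful bytes in oid->hash for a given algorithm. */
static size_t fmt_rawsz(int algo)
{
	return algo == FMT_SHA256 ? 32 : 20;
}

/*
 * Format the object ID as lowercase hex. "buf" must hold at least
 * FMT_MAX_HEXSZ + 1 bytes. Returns buf.
 */
static char *fmt_oid_to_hex(const struct fmt_oid *oid, char *buf)
{
	static const char hex[] = "0123456789abcdef";
	size_t i, n = fmt_rawsz(oid->algo);

	/* Same array either way; only the byte count differs. */
	for (i = 0; i < n; i++) {
		buf[2 * i] = hex[oid->hash[i] >> 4];
		buf[2 * i + 1] = hex[oid->hash[i] & 0x0f];
	}
	buf[2 * n] = '\0';
	return buf;
}
```

With a union, the loop body would first have to select the right member
based on the tag before it could index into it.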
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

* Re: [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm
  2021-04-10 15:21 ` [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm brian m. carlson
  2021-04-15  8:55   ` Denton Liu
@ 2021-04-16 15:04   ` Ævar Arnfjörð Bjarmason
  2021-04-16 18:55     ` Junio C Hamano
  1 sibling, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-16 15:04 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee


On Sat, Apr 10 2021, brian m. carlson wrote:

> In order to perform suitable testing with multiple algorithms and
> interoperability, we'll need the ability to hash an object with a given
> algorithm. Introduce this capability for now only for objects which are
> hashed literally by adding a function which does this and changing a
> static function to accept an algorithm pointer.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  object-file.c  | 16 ++++++++++++++--
>  object-store.h |  3 +++
>  2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/object-file.c b/object-file.c
> index 624af408cd..f5847ee20f 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1957,6 +1957,15 @@ int write_object_file(const void *buf, unsigned long len, const char *type,
>  int hash_object_file_literally(const void *buf, unsigned long len,
>  			       const char *type, struct object_id *oid,
>  			       unsigned flags)
> +{
> +	return hash_object_file_literally_algop(buf, len, type, oid, flags,
> +						the_hash_algo);
> +}
> +
> +int hash_object_file_literally_algop(const void *buf, unsigned long len,
> +				     const char *type, struct object_id *oid,
> +				     unsigned flags,
> +				     const struct git_hash_algo *algo)
>  {
>  	char *header;
>  	int hdrlen, status = 0;
> @@ -1964,11 +1973,14 @@ int hash_object_file_literally(const void *buf, unsigned long len,
>  	/* type string, SP, %lu of the length plus NUL must fit this */
>  	hdrlen = strlen(type) + MAX_HEADER_LEN;
>  	header = xmalloc(hdrlen);
> -	write_object_file_prepare(the_hash_algo, buf, len, type, oid, header,
> -				  &hdrlen);
> +	write_object_file_prepare(algo, buf, len, type, oid, header, &hdrlen);
>  
>  	if (!(flags & HASH_WRITE_OBJECT))
>  		goto cleanup;
> +	if (algo->format_id != the_hash_algo->format_id) {
> +		status = -1;
> +		goto cleanup;
> +	}
>  	if (freshen_packed_object(oid) || freshen_loose_object(oid))
>  		goto cleanup;
>  	status = write_loose_object(oid, header, hdrlen, buf, len, 0);
> diff --git a/object-store.h b/object-store.h
> index ec32c23dcb..f95d03a7f5 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -221,6 +221,9 @@ int hash_object_file_literally(const void *buf, unsigned long len,
>  			       const char *type, struct object_id *oid,
>  			       unsigned flags);
>  
> +int hash_object_file_literally_algop(const void *buf, unsigned long len,
> +				     const char *type, struct object_id *oid,
> +				     unsigned flags, const struct git_hash_algo *algo);
>  /*
>   * Add an object file to the in-memory object store, without writing it
>   * to disk.

We only have one user of hash_object_file_literally(),
builtin/hash-object.c, so let's just change the signature of
hash_object_file_literally() instead of adding a new function. As it
stands, the series leaves the tree with no direct user of
hash_object_file_literally().

* Re: [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm
  2021-04-10 15:21 ` [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm brian m. carlson
  2021-04-11  8:52   ` Ævar Arnfjörð Bjarmason
@ 2021-04-16 15:21   ` Ævar Arnfjörð Bjarmason
  2021-04-16 17:27   ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-16 15:21 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee


On Sat, Apr 10 2021, brian m. carlson wrote:

> @@ -103,6 +110,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
>  		OPT_BOOL( 0 , "no-filters", &no_filters, N_("store file as is without filters")),
>  		OPT_BOOL( 0, "literally", &literally, N_("just hash any random garbage to create corrupt objects for debugging Git")),
>  		OPT_STRING( 0 , "path", &vpath, N_("file"), N_("process file as it were from this path")),
> +		OPT_STRING( 0 , "object-format", &object_format, N_("object-format"), N_("Use this hash algorithm")),
>  		OPT_END()

Nit: Carrying over the "0 , " vs. "0, " formatting bug from the "path"
option, due to copy/pasting.

* Re: [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm
  2021-04-10 15:21 ` [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm brian m. carlson
  2021-04-11  8:52   ` Ævar Arnfjörð Bjarmason
  2021-04-16 15:21   ` Ævar Arnfjörð Bjarmason
@ 2021-04-16 17:27   ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-16 17:27 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee


On Sat, Apr 10 2021, brian m. carlson wrote:

> Add an --object-format argument to git hash-object that allows hashing
> an object with a given algorithm. Currently this option is limited to
> use with --literally, since the index_* functions do not yet handle
> multiple hash algorithms.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  builtin/hash-object.c  | 47 ++++++++++++++++++++++++++++++------------
>  t/t1007-hash-object.sh | 10 +++++++++
>  2 files changed, 44 insertions(+), 13 deletions(-)
>
> diff --git a/builtin/hash-object.c b/builtin/hash-object.c
> index 640ef4ded5..0203cfbe9a 100644
> --- a/builtin/hash-object.c
> +++ b/builtin/hash-object.c
> @@ -17,7 +17,8 @@
>   * needs to bypass the data conversion performed by, and the type
>   * limitation imposed by, index_fd() and its callees.
>   */
> -static int hash_literally(struct object_id *oid, int fd, const char *type, unsigned flags)
> +static int hash_literally(struct object_id *oid, int fd, const char *type,
> +			  unsigned flags, const struct git_hash_algo *algo)
>  {
>  	struct strbuf buf = STRBUF_INIT;
>  	int ret;
> @@ -25,42 +26,46 @@ static int hash_literally(struct object_id *oid, int fd, const char *type, unsig
>  	if (strbuf_read(&buf, fd, 4096) < 0)
>  		ret = -1;
>  	else
> -		ret = hash_object_file_literally(buf.buf, buf.len, type, oid,
> -						 flags);
> +		ret = hash_object_file_literally_algop(buf.buf, buf.len, type, oid,
> +						       flags, algo);
>  	strbuf_release(&buf);
>  	return ret;
>  }
>  
>  static void hash_fd(int fd, const char *type, const char *path, unsigned flags,
> -		    int literally)
> +		    int literally, const struct git_hash_algo *algo)
>  {
>  	struct stat st;
>  	struct object_id oid;
>  
> +	if (!literally && algo != the_hash_algo)
> +		die(_("Can't use hash algo %s except literally yet"), algo->name);
> +
>  	if (fstat(fd, &st) < 0 ||
>  	    (literally
> -	     ? hash_literally(&oid, fd, type, flags)
> +	     ? hash_literally(&oid, fd, type, flags, algo)
>  	     : index_fd(the_repository->index, &oid, fd, &st,
>  			type_from_string(type), path, flags)))
>  		die((flags & HASH_WRITE_OBJECT)
>  		    ? "Unable to add %s to database"
>  		    : "Unable to hash %s", path);
> -	printf("%s\n", oid_to_hex(&oid));
> +	printf("%s\n", hash_to_hex_algop(oid.hash, algo));
>  	maybe_flush_or_die(stdout, "hash to stdout");
>  }
>  
>  static void hash_object(const char *path, const char *type, const char *vpath,
> -			unsigned flags, int literally)
> +			unsigned flags, int literally,
> +			const struct git_hash_algo *algo)
>  {
>  	int fd;
>  	fd = open(path, O_RDONLY);
>  	if (fd < 0)
>  		die_errno("Cannot open '%s'", path);
> -	hash_fd(fd, type, vpath, flags, literally);
> +	hash_fd(fd, type, vpath, flags, literally, algo);
>  }
>  
>  static void hash_stdin_paths(const char *type, int no_filters, unsigned flags,
> -			     int literally)
> +			     int literally, const struct git_hash_algo *algo)
>  {
>  	struct strbuf buf = STRBUF_INIT;
>  	struct strbuf unquoted = STRBUF_INIT;
> @@ -73,7 +78,7 @@ static void hash_stdin_paths(const char *type, int no_filters, unsigned flags,
>  			strbuf_swap(&buf, &unquoted);
>  		}
>  		hash_object(buf.buf, type, no_filters ? NULL : buf.buf, flags,
> -			    literally);
> +			    literally, algo);
>  	}
>  	strbuf_release(&buf);
>  	strbuf_release(&unquoted);
> @@ -94,6 +99,8 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
>  	int nongit = 0;
>  	unsigned flags = HASH_FORMAT_CHECK;
>  	const char *vpath = NULL;
> +	const char *object_format = NULL;
> +	const struct git_hash_algo *algo;
>  	const struct option hash_object_options[] = {
>  		OPT_STRING('t', NULL, &type, N_("type"), N_("object type")),
>  		OPT_BIT('w', NULL, &flags, N_("write the object into the object database"),
> @@ -103,6 +110,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
>  		OPT_BOOL( 0 , "no-filters", &no_filters, N_("store file as is without filters")),
>  		OPT_BOOL( 0, "literally", &literally, N_("just hash any random garbage to create corrupt objects for debugging Git")),
>  		OPT_STRING( 0 , "path", &vpath, N_("file"), N_("process file as it were from this path")),
> +		OPT_STRING( 0 , "object-format", &object_format, N_("object-format"), N_("Use this hash algorithm")),
>  		OPT_END()
>  	};
>  	int i;
> @@ -121,6 +129,19 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
>  
>  	git_config(git_default_config, NULL);
>  
> +	algo = the_hash_algo;
> +	if (object_format) {
> +		if (flags & HASH_WRITE_OBJECT)
> +			errstr = "Can't use -w with --object-format";
> +		else {
> +			int id = hash_algo_by_name(object_format);
> +			if (id == GIT_HASH_UNKNOWN)
> +				errstr = "Unknown object format";
> +			else
> +				algo = &hash_algos[id];
> +		}
> +	}
> +
>  	if (stdin_paths) {
>  		if (hashstdin)
>  			errstr = "Can't use --stdin-paths with --stdin";
> @@ -142,7 +163,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
>  	}
>  
>  	if (hashstdin)
> -		hash_fd(0, type, vpath, flags, literally);
> +		hash_fd(0, type, vpath, flags, literally, algo);
>  
>  	for (i = 0 ; i < argc; i++) {
>  		const char *arg = argv[i];
> @@ -151,12 +172,12 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
>  		if (prefix)
>  			arg = to_free = prefix_filename(prefix, arg);
>  		hash_object(arg, type, no_filters ? NULL : vpath ? vpath : arg,
> -			    flags, literally);
> +			    flags, literally, algo);
>  		free(to_free);
>  	}
>  
>  	if (stdin_paths)
> -		hash_stdin_paths(type, no_filters, flags, literally);
> +		hash_stdin_paths(type, no_filters, flags, literally, algo);
>  
>  	return 0;
>  }
> diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
> index 64b340f227..ea4b3d2bda 100755
> --- a/t/t1007-hash-object.sh
> +++ b/t/t1007-hash-object.sh
> @@ -83,6 +83,11 @@ test_expect_success 'hash a file' '
>  	test "$(test_oid hello)" = $(git hash-object hello)
>  '
>  
> +test_expect_failure 'hash a file with a given algorithm' '
> +	test "$(test_oid --hash=sha1 hello)" = $(git hash-object --object-format=sha1 hello) &&
> +	test "$(test_oid --hash=sha256 hello)" = $(git hash-object --object-format=sha256 hello)
> +'
> +
>  test_blob_does_not_exist "$(test_oid hello)"
>  
>  test_expect_success 'hash from stdin' '
> @@ -248,4 +253,9 @@ test_expect_success '--literally with extra-long type' '
>  	echo example | git hash-object -t $t --literally --stdin
>  '
>  
> +test_expect_success '--literally with --object-format' '
> +	test $(test_oid --hash=sha1 hello) = $(git hash-object -t blob --literally --object-format=sha1 hello) &&
> +	test $(test_oid --hash=sha256 hello) = $(git hash-object -t blob --literally --object-format=sha256 hello)
> +'
> +
>  test_done

First, the "errstr" handling here is arguably buggy, or at least hard to
read. Before this change we never clobber it if we find an issue, but
after we do. In the end it doesn't really matter, we error on one usage
error or the other, but it does make for more confusing reading.

The patch-on-top mentioned below turns that into a "goto error".
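
To illustrate the difference (a simplified sketch, not the patch-on-top
itself; struct hash_opts and both function names are invented here,
while the two messages are the ones used in builtin/hash-object.c):

```c
#include <stddef.h>

/* Hypothetical, flattened view of the parsed options. */
struct hash_opts {
	int write_object;          /* -w */
	const char *object_format; /* --object-format=<algo> */
	int stdin_paths;           /* --stdin-paths */
	int hashstdin;             /* --stdin */
};

/* Each check assigns unconditionally, so a later error hides an earlier one. */
static const char *check_clobbering(const struct hash_opts *o)
{
	const char *errstr = NULL;

	if (o->object_format && o->write_object)
		errstr = "Can't use -w with --object-format";
	if (o->stdin_paths && o->hashstdin)
		errstr = "Can't use --stdin-paths with --stdin";
	return errstr;
}

/* The first usage error found is the one reported. */
static const char *check_first_error(const struct hash_opts *o)
{
	const char *errstr = NULL;

	if (o->object_format && o->write_object) {
		errstr = "Can't use -w with --object-format";
		goto error;
	}
	if (o->stdin_paths && o->hashstdin) {
		errstr = "Can't use --stdin-paths with --stdin";
		goto error;
	}
error:
	return errstr;
}
```

When both conflicts are present, check_clobbering() reports the
--stdin-paths one and check_first_error() reports the -w one; with no
conflict both return NULL.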

Secondly, sorry about the semantic merge conflict, but while trying to
fix that issue today I started reviewing this in more depth, and wanted
to ask some questions first:

I don't get why we need to pass the algop around here at all. Why not
handle this like your previous change to e.g. show-index.c where we set
it globally?

Just doing it like that would make your TODO test pass, and we're a
one-shot command here, so I see no reason not to do that.

I have a hacky WIP series on top of this which implements that mid-way,
the diff is also at the end of this E-Mail.

For my own re-rolling of my object.c changes I settled on passing a
"type_len" down to hash_literally() et al for reasons I won't go into
here. That would conflict with your in-flight changes (in a way that's
not hard to resolve), but not conflicting at all, because we'd just use
a global repo_set_hash_algo(), would be even better.

So here's the messy patches on top:

https://github.com/avar/git/compare/3e8b16d2e70...avar-bc/hash-transition-interop-part-1

The other part of that is to just support "git --object-format=sha256"
instead of needing every command to slowly acquire an --object-format
argument.

That's currently broken in that series for a reason I wanted to ask you
about. I know *why*, but I wonder what the best solution is.

How to deal with that, i.e. whether you think making "git" take that
argument directly is the right direction, depends on data in your
wetware, not something in-tree :) I.e. what the future direction is.

I.e. it's broken because we end up calling repo_set_hash_algo() in
several places, first via common-main.c, then with my patch below in
git.c's argument handling, and then via parse_options() and in setup.c,
e.g. hash-object.c's explicit setup_git_directory() call.

As you'll see from the setup_git_directory_gently() call below I had an
attempt to monkeypatch around that by detecting zero'd out structs in
setup_git_directory_gently().
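
A reduced sketch of that "don't clobber an earlier choice" guard, using
hypothetical stand-in types rather than Git's real struct repository and
hash_algos[] table, might look like this:

```c
#include <stddef.h>

/* Hypothetical stand-ins for Git's hash algorithm table and repository. */
enum { ALGO_UNKNOWN, ALGO_SHA1, ALGO_SHA256 };

struct algo {
	const char *name;
};

static const struct algo algos[] = { { NULL }, { "sha1" }, { "sha256" } };

struct repo {
	const struct algo *hash_algo; /* NULL until someone sets it */
};

/* Unconditional setter, as used for an explicit command-line request. */
static void set_algo(struct repo *r, int id)
{
	r->hash_algo = &algos[id];
}

/* Setup-time default: applied only if no earlier caller chose an algorithm. */
static void set_algo_default(struct repo *r, int id)
{
	if (!r->hash_algo)
		set_algo(r, id);
}
```

The guard only works while "unset" is observable as NULL, which is why
it breaks down as soon as some earlier step has already filled in a
default.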

But e.g. ./t5318-commit-graph.sh is still broken because we no longer
pay attention to "git init --object-format=*" since
check_repository_format() and validate_hash_algorithm() end up not only
validating config, but also (re-)setting it.

It seems to me that we need a separation of concerns there, to have the
default in one place, and validation in another. But maybe there's some
more subtle reason for why we're (re-)setting it in that manner, and I
ran out of hacking time today and figured I'd send this E-Mail off at
the end of the day :)
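
To make the "separation of concerns" idea concrete, here is a rough
sketch under my own assumptions. None of these functions exist in Git;
format_by_name(), validate_format() and choose_format() are made-up
names, and the precedence shown (command line first, then config) is
only one possible policy.

```c
#include <string.h>

enum { FMT_UNKNOWN, FMT_SHA1, FMT_SHA256 };

/* Pure lookup: maps a name to an ID and has no side effects. */
static int format_by_name(const char *name)
{
	if (!strcmp(name, "sha1"))
		return FMT_SHA1;
	if (!strcmp(name, "sha256"))
		return FMT_SHA256;
	return FMT_UNKNOWN;
}

/*
 * Validation: only reports whether the ID read from config is usable.
 * It takes its argument by value, so it cannot (re-)set anything.
 */
static int validate_format(int from_config)
{
	return (from_config == FMT_SHA1 || from_config == FMT_SHA256) ? 0 : -1;
}

/*
 * Defaulting: the single place that decides which format is used.
 * An explicit request wins, otherwise the configured value is used.
 */
static int choose_format(int requested, int from_config)
{
	if (requested != FMT_UNKNOWN)
		return requested;
	return from_config;
}
```

A caller would validate the configured value first and only then pass it
to choose_format(), so validating can never undo an explicit request.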


diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 0203cfbe9a4..0ecf24b4793 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -17,8 +17,7 @@
  * needs to bypass the data conversion performed by, and the type
  * limitation imposed by, index_fd() and its callees.
  */
-static int hash_literally(struct object_id *oid, int fd, const char *type,
-			  unsigned flags, const struct git_hash_algo *algo)
+static int hash_literally(struct object_id *oid, int fd, const char *type, unsigned flags)
 {
 	struct strbuf buf = STRBUF_INIT;
 	int ret;
@@ -26,46 +25,42 @@ static int hash_literally(struct object_id *oid, int fd, const char *type,
 	if (strbuf_read(&buf, fd, 4096) < 0)
 		ret = -1;
 	else
-		ret = hash_object_file_literally_algop(buf.buf, buf.len, type, oid,
-						       flags, algo);
+		ret = hash_object_file_literally(buf.buf, buf.len, type, oid,
+						 flags);
 	strbuf_release(&buf);
 	return ret;
 }
 
 static void hash_fd(int fd, const char *type, const char *path, unsigned flags,
-		    int literally, const struct git_hash_algo *algo)
+		    int literally)
 {
 	struct stat st;
 	struct object_id oid;
 
-	if (!literally && algo != the_hash_algo)
-		die(_("Can't use hash algo %s except literally yet"), algo->name);
-
 	if (fstat(fd, &st) < 0 ||
 	    (literally
-	     ? hash_literally(&oid, fd, type, flags, algo)
+	     ? hash_literally(&oid, fd, type, flags)
 	     : index_fd(the_repository->index, &oid, fd, &st,
 			type_from_string(type), path, flags)))
 		die((flags & HASH_WRITE_OBJECT)
 		    ? "Unable to add %s to database"
 		    : "Unable to hash %s", path);
-	printf("%s\n", hash_to_hex_algop(oid.hash, algo));
+	printf("%s\n", oid_to_hex(&oid));
 	maybe_flush_or_die(stdout, "hash to stdout");
 }
 
 static void hash_object(const char *path, const char *type, const char *vpath,
-			unsigned flags, int literally,
-			const struct git_hash_algo *algo)
+			unsigned flags, int literally)
 {
 	int fd;
 	fd = open(path, O_RDONLY);
 	if (fd < 0)
 		die_errno("Cannot open '%s'", path);
-	hash_fd(fd, type, vpath, flags, literally, algo);
+	hash_fd(fd, type, vpath, flags, literally);
 }
 
 static void hash_stdin_paths(const char *type, int no_filters, unsigned flags,
-			     int literally, const struct git_hash_algo *algo)
+			     int literally)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct strbuf unquoted = STRBUF_INIT;
@@ -78,7 +73,7 @@ static void hash_stdin_paths(const char *type, int no_filters, unsigned flags,
 			strbuf_swap(&buf, &unquoted);
 		}
 		hash_object(buf.buf, type, no_filters ? NULL : buf.buf, flags,
-			    literally, algo);
+			    literally);
 	}
 	strbuf_release(&buf);
 	strbuf_release(&unquoted);
@@ -100,7 +95,6 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	unsigned flags = HASH_FORMAT_CHECK;
 	const char *vpath = NULL;
 	const char *object_format = NULL;
-	const struct git_hash_algo *algo;
 	const struct option hash_object_options[] = {
 		OPT_STRING('t', NULL, &type, N_("type"), N_("object type")),
 		OPT_BIT('w', NULL, &flags, N_("write the object into the object database"),
@@ -110,47 +104,38 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 		OPT_BOOL( 0 , "no-filters", &no_filters, N_("store file as is without filters")),
 		OPT_BOOL( 0, "literally", &literally, N_("just hash any random garbage to create corrupt objects for debugging Git")),
 		OPT_STRING( 0 , "path", &vpath, N_("file"), N_("process file as it were from this path")),
-		OPT_STRING( 0 , "object-format", &object_format, N_("object-format"), N_("Use this hash algorithm")),
+		OPT_OBJECT_FORMAT(0, "object-format", &object_format),
 		OPT_END()
 	};
 	int i;
 	const char *errstr = NULL;
 
-	argc = parse_options(argc, argv, prefix, hash_object_options,
-			     hash_object_usage, 0);
-
+	/* might setup the hash algorithm */
 	if (flags & HASH_WRITE_OBJECT)
 		prefix = setup_git_directory();
 	else
 		prefix = setup_git_directory_gently(&nongit);
 
+	/* maybe override the already setup hash algorithm */
+	argc = parse_options(argc, argv, prefix, hash_object_options,
+			     hash_object_usage, 0);
+
 	if (vpath && prefix)
 		vpath = xstrdup(prefix_filename(prefix, vpath));
 
 	git_config(git_default_config, NULL);
 
-	algo = the_hash_algo;
-	if (object_format) {
-		if (flags & HASH_WRITE_OBJECT)
-			errstr = "Can't use -w with --object-format";
-		else {
-			int id = hash_algo_by_name(object_format);
-			if (id == GIT_HASH_UNKNOWN)
-				errstr = "Unknown object format";
-			else
-				algo = &hash_algos[id];
-		}
-	}
 
-	if (stdin_paths) {
+	if (object_format && flags & HASH_WRITE_OBJECT) {
+		errstr = "Can't use -w with --object-format";
+	} else if (stdin_paths) {
 		if (hashstdin)
 			errstr = "Can't use --stdin-paths with --stdin";
 		else if (argc)
 			errstr = "Can't specify files with --stdin-paths";
 		else if (vpath)
 			errstr = "Can't use --stdin-paths with --path";
-	}
-	else {
+	} else {
 		if (hashstdin > 1)
 			errstr = "Multiple --stdin arguments are not supported";
 		if (vpath && no_filters)
@@ -163,7 +148,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	}
 
 	if (hashstdin)
-		hash_fd(0, type, vpath, flags, literally, algo);
+		hash_fd(0, type, vpath, flags, literally);
 
 	for (i = 0 ; i < argc; i++) {
 		const char *arg = argv[i];
@@ -172,12 +157,12 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 		if (prefix)
 			arg = to_free = prefix_filename(prefix, arg);
 		hash_object(arg, type, no_filters ? NULL : vpath ? vpath : arg,
-			    flags, literally, algo);
+			    flags, literally);
 		free(to_free);
 	}
 
 	if (stdin_paths)
-		hash_stdin_paths(type, no_filters, flags, literally, algo);
+		hash_stdin_paths(type, no_filters, flags, literally);
 
 	return 0;
 }
diff --git a/builtin/init-db.c b/builtin/init-db.c
index c19b35f1e69..7a7e63cf702 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -548,8 +548,7 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 			   N_("separate git dir from working tree")),
 		OPT_STRING('b', "initial-branch", &initial_branch, N_("name"),
 			   N_("override the name of the initial branch")),
-		OPT_STRING(0, "object-format", &object_format, N_("hash"),
-			   N_("specify the hash algorithm to use")),
+		OPT_OBJECT_FORMAT(0, "object-format", &object_format),
 		OPT_END()
 	};
 
@@ -607,11 +606,8 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 		free(cwd);
 	}
 
-	if (object_format) {
+	if (object_format)
 		hash_algo = hash_algo_by_name(object_format);
-		if (hash_algo == GIT_HASH_UNKNOWN)
-			die(_("unknown hash algorithm '%s'"), object_format);
-	}
 
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
diff --git a/builtin/show-index.c b/builtin/show-index.c
index 0e0b9fb95bc..51352c2eaeb 100644
--- a/builtin/show-index.c
+++ b/builtin/show-index.c
@@ -15,23 +15,12 @@ int cmd_show_index(int argc, const char **argv, const char *prefix)
 	unsigned int version;
 	static unsigned int top_index[256];
 	unsigned hashsz;
-	const char *hash_name = NULL;
-	int hash_algo;
 	const struct option show_index_options[] = {
-		OPT_STRING(0, "object-format", &hash_name, N_("hash-algorithm"),
-			   N_("specify the hash algorithm to use")),
+		OPT_OBJECT_FORMAT(0, "object-format", NULL),
 		OPT_END()
 	};
 
 	argc = parse_options(argc, argv, prefix, show_index_options, show_index_usage, 0);
-
-	if (hash_name) {
-		hash_algo = hash_algo_by_name(hash_name);
-		if (hash_algo == GIT_HASH_UNKNOWN)
-			die(_("Unknown hash algorithm"));
-		repo_set_hash_algo(the_repository, hash_algo);
-	}
-
 	hashsz = the_hash_algo->rawsz;
 
 	if (fread(top_index, 2 * 4, 1, stdin) != 1)
diff --git a/builtin/verify-pack.c b/builtin/verify-pack.c
index 05c52135946..6ce09d068e7 100644
--- a/builtin/verify-pack.c
+++ b/builtin/verify-pack.c
@@ -16,6 +16,8 @@ static int verify_one_pack(const char *path, unsigned int flags, const char *has
 	int stat_only = flags & VERIFY_PACK_STAT_ONLY;
 	int err;
 
+	if (hash_algo)
+		strvec_pushf(argv, "--object-format=%s", hash_algo);
 	strvec_push(argv, "index-pack");
 
 	if (stat_only)
@@ -25,9 +27,6 @@ static int verify_one_pack(const char *path, unsigned int flags, const char *has
 	else
 		strvec_push(argv, "--verify");
 
-	if (hash_algo)
-		strvec_pushf(argv, "--object-format=%s", hash_algo);
-
 	/*
 	 * In addition to "foo.pack" we accept "foo.idx" and "foo";
 	 * normalize these forms to "foo.pack" for "index-pack --verify".
@@ -71,8 +70,7 @@ int cmd_verify_pack(int argc, const char **argv, const char *prefix)
 			VERIFY_PACK_VERBOSE),
 		OPT_BIT('s', "stat-only", &flags, N_("show statistics only"),
 			VERIFY_PACK_STAT_ONLY),
-		OPT_STRING(0, "object-format", &object_format, N_("hash"),
-			   N_("specify the hash algorithm to use")),
+		OPT_OBJECT_FORMAT(0, "object-format", &object_format),
 		OPT_END()
 	};
 
diff --git a/git.c b/git.c
index 9bc077a025c..824a9ef6cd2 100644
--- a/git.c
+++ b/git.c
@@ -160,6 +160,8 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
 				trace2_cmd_name("_query_");
 				exit(0);
 			}
+		} else if (skip_prefix(cmd, "--object-format=", &cmd)) {
+			repo_set_hash_algo_arg(the_repository, cmd);
 		} else if (!strcmp(cmd, "--html-path")) {
 			puts(system_path(GIT_HTML_PATH));
 			trace2_cmd_name("_query_");
diff --git a/object-file.c b/object-file.c
index 0401d7ca4fb..c53fc14718f 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2001,15 +2001,6 @@ int write_object_file(const void *buf, unsigned long len, const char *type,
 int hash_object_file_literally(const void *buf, unsigned long len,
 			       const char *type, struct object_id *oid,
 			       unsigned flags)
-{
-	return hash_object_file_literally_algop(buf, len, type, oid, flags,
-						the_hash_algo);
-}
-
-int hash_object_file_literally_algop(const void *buf, unsigned long len,
-				     const char *type, struct object_id *oid,
-				     unsigned flags,
-				     const struct git_hash_algo *algo)
 {
 	char *header;
 	int hdrlen, status = 0;
@@ -2017,14 +2008,11 @@ int hash_object_file_literally_algop(const void *buf, unsigned long len,
 	/* type string, SP, %lu of the length plus NUL must fit this */
 	hdrlen = strlen(type) + MAX_HEADER_LEN;
 	header = xmalloc(hdrlen);
-	write_object_file_prepare(algo, buf, len, type, oid, header, &hdrlen);
+	write_object_file_prepare(the_hash_algo, buf, len, type, oid, header,
+				  &hdrlen);
 
 	if (!(flags & HASH_WRITE_OBJECT))
 		goto cleanup;
-	if (algo->format_id != the_hash_algo->format_id) {
-		status = -1;
-		goto cleanup;
-	}
 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
 		goto cleanup;
 	status = write_loose_object(oid, header, hdrlen, buf, len, 0);
diff --git a/parse-options-cb.c b/parse-options-cb.c
index 3c811e1e4a7..6308bed9675 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -293,3 +293,17 @@ int parse_opt_passthru_argv(const struct option *opt, const char *arg, int unset
 
 	return 0;
 }
+
+int parse_opt_object_format_cb(const struct option *opt, const char *arg, int unset)
+{
+	const char **value = opt->value;
+
+	BUG_ON_OPT_NEG(unset);
+
+	if (arg)
+		repo_set_hash_algo_arg(the_repository, arg);
+	if (value)
+		*value = arg;
+
+	return 0;
+}
diff --git a/parse-options.h b/parse-options.h
index a845a9d9527..086dab755ca 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -201,6 +201,10 @@ struct option {
 #define OPT_ALIAS(s, l, source_long_name) \
 	{ OPTION_ALIAS, (s), (l), (source_long_name) }
 
+#define OPT_OBJECT_FORMAT(s, l, v) \
+	{ OPTION_CALLBACK, (s), (l), (v), N_("hash"),N_("hash algorithm"), \
+	  PARSE_OPT_NONEG, parse_opt_object_format_cb }
+
 /*
  * parse_options() will filter out the processed options and leave the
  * non-option arguments in argv[]. argv0 is assumed program name and
@@ -303,6 +307,7 @@ enum parse_opt_result parse_opt_unknown_cb(struct parse_opt_ctx_t *ctx,
 					   const char *, int);
 int parse_opt_passthru(const struct option *, const char *, int);
 int parse_opt_passthru_argv(const struct option *, const char *, int);
+int parse_opt_object_format_cb(const struct option *, const char *, int);
 
 #define OPT__VERBOSE(var, h)  OPT_COUNTUP('v', "verbose", (var), (h))
 #define OPT__QUIET(var, h)    OPT_COUNTUP('q', "quiet",   (var), (h))
diff --git a/repository.c b/repository.c
index 87b355e7a65..1a7aca657db 100644
--- a/repository.c
+++ b/repository.c
@@ -91,6 +91,15 @@ void repo_set_hash_algo(struct repository *repo, int hash_algo)
 	repo->hash_algo = &hash_algos[hash_algo];
 }
 
+void repo_set_hash_algo_arg(struct repository *repo, const char *algo)
+{
+	int algo_id = hash_algo_by_name(algo);
+	if (algo_id == GIT_HASH_UNKNOWN)
+		die(_("unknown hash algorithm '%s'"), algo);
+	fprintf(stderr, "now setting %s=%d\n", algo, algo_id);
+	repo_set_hash_algo(the_repository, algo_id);
+}
+
 /*
  * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
  * Return 0 upon success and a non-zero value upon failure.
diff --git a/repository.h b/repository.h
index b385ca3c94b..56482f89f64 100644
--- a/repository.h
+++ b/repository.h
@@ -160,6 +160,7 @@ void repo_set_gitdir(struct repository *repo, const char *root,
 		     const struct set_gitdir_args *extra_args);
 void repo_set_worktree(struct repository *repo, const char *path);
 void repo_set_hash_algo(struct repository *repo, int algo);
+void repo_set_hash_algo_arg(struct repository *repo, const char *algo);
 void initialize_the_repository(void);
 int repo_init(struct repository *r, const char *gitdir, const char *worktree);
 
diff --git a/setup.c b/setup.c
index c04cd25a30d..7ad8f685b9b 100644
--- a/setup.c
+++ b/setup.c
@@ -1308,7 +1308,15 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository &&
+		    /*
+		     * If we have called initialize_the_repository()
+		     * via common-main.c let's not set things up from
+		     * the REPOSITORY_FORMAT_INIT defaults again,
+		     * otherwise we'll clobber e.g. a invocations of
+		     * "git --object-format=<HASH> some-cmd".
+		     */
+		    !the_repository->hash_algo)
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
 	}
 
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index ea4b3d2bda4..37bfb40c506 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -83,7 +83,7 @@ test_expect_success 'hash a file' '
 	test "$(test_oid hello)" = $(git hash-object hello)
 '
 
-test_expect_failure 'hash a file with a given algorithm' '
+test_expect_success 'hash a file with a given algorithm' '
 	test "$(test_oid --hash=sha1 hello)" = $(git hash-object --object-format=sha1 hello) &&
 	test "$(test_oid --hash=sha256 hello)" = $(git hash-object --object-format=sha256 hello)
 '
@@ -255,7 +255,9 @@ test_expect_success '--literally with extra-long type' '
 
 test_expect_success '--literally with --object-format' '
 	test $(test_oid --hash=sha1 hello) = $(git hash-object -t blob --literally --object-format=sha1 hello) &&
-	test $(test_oid --hash=sha256 hello) = $(git hash-object -t blob --literally --object-format=sha256 hello)
+	test $(test_oid --hash=sha256 hello) = $(git hash-object -t blob --literally --object-format=sha256 hello) &&
+	test $(test_oid --hash=sha256 hello) = $(git --object-format=sha256 hash-object -t blob --literally hello) &&
+	test $(test_oid --hash=sha256 hello) = $(git --object-format=sha1 hash-object -t blob --object-format=sha256 --literally hello)
 '
 
 test_done

* Re: [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm
  2021-04-16 15:04   ` Ævar Arnfjörð Bjarmason
@ 2021-04-16 18:55     ` Junio C Hamano
  0 siblings, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2021-04-16 18:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: brian m. carlson, git, Derrick Stolee

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> We only have one user of hash_object_file_literally(),
> builtin/hash-object.c, let's just change the signature of
> hash_object_file_literally() instead of adding a new function. This
> leaves the tree with no direct user of hash_object_file_literally().

Makes sense.

* Re: [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code
  2021-04-12 11:02             ` [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code Ævar Arnfjörð Bjarmason
  2021-04-13  7:57               ` Jeff King
@ 2021-05-21  2:06               ` Jonathan Nieder
  1 sibling, 0 replies; 57+ messages in thread
From: Jonathan Nieder @ 2021-05-21  2:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, brian m . carlson, Jeff King

Hi,

Ævar Arnfjörð Bjarmason wrote:

> Change a comment added in e208f9cc757 (make error()'s constant return
> value more visible, 2012-12-15) to note that the code doesn't only
> depend on variadic macros, which have been a hard dependency since
> 765dc168882 (git-compat-util: always enable variadic macros,
> 2021-01-28), but also on GCC's handling of __VA_ARGS__. The commit
> message for e208f9cc757 made this clear, but the comment it added did
> not.
>
> See also e05bed960d3 (trace: add 'file:line' to all trace output,
> 2014-07-12) for another comment about GNUC's handling of __VA_ARGS__.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  git-compat-util.h | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)

Oh, subtle.  In fact, I believe 9798f7e5f9 (Use __VA_ARGS__ for all of
error's arguments, 2013-02-08) got rid of the gcc-ism, so we could do
the following instead.

(See section 6.10.3.10 for a description of

	#define f(...) g(__VA_ARGS__)

in the C99 standard.)
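
For readers who want to try the idea outside of Git, here is a small
standalone sketch. report_error() is a hypothetical stand-in for
error(); the point is only that routing every argument, format string
included, through __VA_ARGS__ needs nothing beyond standard C99.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for error(): prints a message and returns -1. */
static int report_error(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	fputs("error: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
	va_end(ap);
	return -1;
}

/* Gives the compiler a visibly constant return value. */
static inline int const_error(void)
{
	return -1;
}

/*
 * All arguments, including the format string, go through __VA_ARGS__,
 * so no named variadic parameter or ", ##__VA_ARGS__" extension is used.
 * A macro is not re-expanded inside its own replacement, so the inner
 * report_error() refers to the function above.
 */
#define report_error(...) (report_error(__VA_ARGS__), const_error())
```

Both report_error("bad value %d", 42) and report_error("no arguments")
expand cleanly here, because the macro never needs to handle an empty
__VA_ARGS__.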

Thanks,
Jonathan

-- >8 --
Subject: error: use macro-based static analysis aid on non-gcc, too

In the rest of Git, we check HAVE_VARIADIC_MACROS (which is
unconditionally defined to true as a way to discover platforms that do
not support it) to guard code that requires variadic macro support.
This variadic macro is a bit older, so it uses a __GNUC__ check
instead.  Switch to use of HAVE_VARIADIC_MACROS here as well, so more
compilers can get the benefit of knowing at compile time that error()
always returns -1.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 git-compat-util.h | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index a508dbe5a3..ca22b11760 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -483,14 +483,19 @@ void warning_errno(const char *err, ...) __attribute__((format (printf, 1, 2)));
 #include <openssl/x509v3.h>
 #endif /* NO_OPENSSL */
 
+/*
+ * This is always defined as a first step towards making the use of variadic
+ * macros unconditional. If it causes compilation problems on your platform,
+ * please report it to the Git mailing list at git@vger.kernel.org.
+ */
+#define HAVE_VARIADIC_MACROS 1
+
 /*
  * Let callers be aware of the constant return value; this can help
- * gcc with -Wuninitialized analysis. We restrict this trick to gcc, though,
- * because some compilers may not support variadic macros. Since we're only
- * trying to help gcc, anyway, it's OK; other compilers will fall back to
- * using the function as usual.
+ * gcc with -Wuninitialized analysis. Compilers without support for variadic
+ * macros will fall back to using the function as usual.
  */
-#if defined(__GNUC__)
+#ifdef HAVE_VARIADIC_MACROS
 static inline int const_error(void)
 {
 	return -1;
@@ -1192,13 +1197,6 @@ static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size,
 #endif
 #endif
 
-/*
- * This is always defined as a first step towards making the use of variadic
- * macros unconditional. If it causes compilation problems on your platform,
- * please report it to the Git mailing list at git@vger.kernel.org.
- */
-#define HAVE_VARIADIC_MACROS 1
-
 /* usage.c: only to be used for testing BUG() implementation (see test-tool) */
 extern int BUG_exit_code;
 
-- 
2.31.1.818.g46aad6cb9e


* Re: [PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code
  2021-04-12 11:02             ` [PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
  2021-04-12 17:58               ` Junio C Hamano
@ 2021-05-21  2:50               ` Jonathan Nieder
  1 sibling, 0 replies; 57+ messages in thread
From: Jonathan Nieder @ 2021-05-21  2:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, brian m . carlson, Jeff King

Hi,

Ævar Arnfjörð Bjarmason wrote:

> Remove code that depends on HAVE_VARIADIC_MACROS not being set. Since
> 765dc168882 (git-compat-util: always enable variadic macros,
> 2021-01-28) we've unconditionally defined it to be true, and that
> change went out with v2.31.0. This should have given packagers enough
> time to discover whether variadic macros were an issue.
>
> It seems that they weren't, so let's update the coding guidelines and
> remove all the fallback code for the non-HAVE_VARIADIC_MACROS case.

As discussed in the rest of this thread, that's a pretty short time,
so while I would love to be able to use variadic macros
unconditionally, I think we'd need a different rationale.

That said, I want us to be ready the moment external conditions allow.
Ideally we want it to be as simple as

	git grep --name-only -e HAVE_VARIADIC_MACROS |
	xargs unifdef -m -DHAVE_VARIADIC_MACROS=1

plus removing the #define; is there anything we need to do in advance
to make that work well?  Let's see.

[...]
> --- a/trace.h
> +++ b/trace.h
> @@ -126,66 +126,6 @@ void trace_command_performance(const char **argv);
>  void trace_verbatim(struct trace_key *key, const void *buf, unsigned len);
>  uint64_t trace_performance_enter(void);
>  
> -#ifndef HAVE_VARIADIC_MACROS
> -
> -/**
> - * Prints a formatted message, similar to printf.
> - */
> -__attribute__((format (printf, 1, 2)))
> -void trace_printf(const char *format, ...);

This removes the documentation for these functions and doesn't add it
back on the #else side.  So I think we'd want the following
preparatory patch.
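
As background for why the macro side is the one worth documenting, here
is a standalone sketch of the general shape: a variadic macro that
passes file:line to a worker function. trace_format_fl() and
trace_format() are hypothetical names, and unlike the real trace code
this sketch formats into a caller-supplied buffer instead of writing to
a trace key.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Worker: formats "<file>:<line> <message>" into out and returns the
 * snprintf-style length of the prefix plus the message (or the failing
 * snprintf result if the prefix alone does not fit).
 */
static int trace_format_fl(char *out, size_t size, const char *file,
			   int line, const char *fmt, ...)
{
	va_list ap;
	int prefix, body;

	prefix = snprintf(out, size, "%s:%d ", file, line);
	if (prefix < 0 || (size_t)prefix >= size)
		return prefix;

	va_start(ap, fmt);
	body = vsnprintf(out + prefix, size - prefix, fmt, ap);
	va_end(ap);
	return body < 0 ? body : prefix + body;
}

/**
 * Formats a message, similar to snprintf, prefixed with the caller's
 * file:line. This is the variant callers see, so the API comment
 * lives here rather than on a fallback declaration.
 */
#define trace_format(out, size, ...) \
	trace_format_fl(out, size, __FILE__, __LINE__, __VA_ARGS__)
```

A fallback without variadic macros could still offer a plain function
with the same format arguments, but it would have no way to capture the
caller's __FILE__ and __LINE__.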

Thanks,
Jonathan

-- >8 --
Subject: trace: move comments to variadic macro variant of trace functions

Nowadays compilers not having variadic macros are the exception.  Move
API documentation to the declarations used in the common case.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 trace.h | 79 +++++++++++++++++++++++++++------------------------------
 1 file changed, 38 insertions(+), 41 deletions(-)

diff --git a/trace.h b/trace.h
index 0dbbad0e41..fc7eb0bc72 100644
--- a/trace.h
+++ b/trace.h
@@ -128,56 +128,22 @@ uint64_t trace_performance_enter(void);
 
 #ifndef HAVE_VARIADIC_MACROS
 
-/**
- * Prints a formatted message, similar to printf.
- */
+/* Fallback implementation that does not add file:line. */
+
 __attribute__((format (printf, 1, 2)))
 void trace_printf(const char *format, ...);
 
 __attribute__((format (printf, 2, 3)))
 void trace_printf_key(struct trace_key *key, const char *format, ...);
 
-/**
- * Prints a formatted message, followed by a quoted list of arguments.
- */
 __attribute__((format (printf, 2, 3)))
 void trace_argv_printf(const char **argv, const char *format, ...);
 
-/**
- * Prints the strbuf, without additional formatting (i.e. doesn't
- * choke on `%` or even `\0`).
- */
 void trace_strbuf(struct trace_key *key, const struct strbuf *data);
 
-/**
- * Prints elapsed time (in nanoseconds) if GIT_TRACE_PERFORMANCE is enabled.
- *
- * Example:
- * ------------
- * uint64_t t = 0;
- * for (;;) {
- * 	// ignore
- * t -= getnanotime();
- * // code section to measure
- * t += getnanotime();
- * // ignore
- * }
- * trace_performance(t, "frotz");
- * ------------
- */
 __attribute__((format (printf, 2, 3)))
 void trace_performance(uint64_t nanos, const char *format, ...);
 
-/**
- * Prints elapsed time since 'start' if GIT_TRACE_PERFORMANCE is enabled.
- *
- * Example:
- * ------------
- * uint64_t start = getnanotime();
- * // code section to measure
- * trace_performance_since(start, "foobar");
- * ------------
- */
 __attribute__((format (printf, 2, 3)))
 void trace_performance_since(uint64_t start, const char *format, ...);
 
@@ -186,11 +152,6 @@ void trace_performance_leave(const char *format, ...);
 
 #else
 
-/*
- * Macros to add file:line - see above for C-style declarations of how these
- * should be used.
- */
-
 /*
  * TRACE_CONTEXT may be set to __FUNCTION__ if the compiler supports it. The
  * default is __FILE__, as it is consistent with assert(), and static function
@@ -227,8 +188,14 @@ void trace_performance_leave(const char *format, ...);
 					    __VA_ARGS__);		    \
 	} while (0)
 
+/**
+ * Prints a formatted message, similar to printf.
+ */
 #define trace_printf(...) trace_printf_key(&trace_default_key, __VA_ARGS__)
 
+/**
+ * Prints a formatted message, followed by a quoted list of arguments.
+ */
 #define trace_argv_printf(argv, ...)					    \
 	do {								    \
 		if (trace_pass_fl(&trace_default_key))			    \
@@ -236,12 +203,32 @@ void trace_performance_leave(const char *format, ...);
 					    argv, __VA_ARGS__);		    \
 	} while (0)
 
+/**
+ * Prints the strbuf, without additional formatting (i.e. doesn't
+ * choke on `%` or even `\0`).
+ */
 #define trace_strbuf(key, data)						    \
 	do {								    \
 		if (trace_pass_fl(key))					    \
 			trace_strbuf_fl(TRACE_CONTEXT, __LINE__, key, data);\
 	} while (0)
 
+/**
+ * Prints elapsed time (in nanoseconds) if GIT_TRACE_PERFORMANCE is enabled.
+ *
+ * Example:
+ * ------------
+ * uint64_t t = 0;
+ * for (;;) {
+ * 	// ignore
+ * t -= getnanotime();
+ * // code section to measure
+ * t += getnanotime();
+ * // ignore
+ * }
+ * trace_performance(t, "frotz");
+ * ------------
+ */
 #define trace_performance(nanos, ...)					    \
 	do {								    \
 		if (trace_pass_fl(&trace_perf_key))			    \
@@ -249,6 +236,16 @@ void trace_performance_leave(const char *format, ...);
 					     __VA_ARGS__);		    \
 	} while (0)
 
+/**
+ * Prints elapsed time since 'start' if GIT_TRACE_PERFORMANCE is enabled.
+ *
+ * Example:
+ * ------------
+ * uint64_t start = getnanotime();
+ * // code section to measure
+ * trace_performance_since(start, "foobar");
+ * ------------
+ */
 #define trace_performance_since(start, ...)				    \
 	do {								    \
 		if (trace_pass_fl(&trace_perf_key))			    \
-- 
2.31.1.818.g46aad6cb9e


Thread overview: 57+ messages
2021-04-10 15:21 [PATCH 00/15] SHA-256 / SHA-1 interop, part 1 brian m. carlson
2021-04-10 15:21 ` [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm brian m. carlson
2021-04-15  8:55   ` Denton Liu
2021-04-15 23:03     ` brian m. carlson
2021-04-16 15:04   ` Ævar Arnfjörð Bjarmason
2021-04-16 18:55     ` Junio C Hamano
2021-04-10 15:21 ` [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm brian m. carlson
2021-04-11  8:52   ` Ævar Arnfjörð Bjarmason
2021-04-11 21:07     ` brian m. carlson
2021-04-16 15:21   ` Ævar Arnfjörð Bjarmason
2021-04-16 17:27   ` Ævar Arnfjörð Bjarmason
2021-04-10 15:21 ` [PATCH 03/15] cache: add an algo member to struct object_id brian m. carlson
2021-04-11 11:55   ` Ævar Arnfjörð Bjarmason
2021-04-11 21:37     ` brian m. carlson
2021-04-13 12:12   ` Derrick Stolee
2021-04-14  1:08     ` brian m. carlson
2021-04-15  8:47       ` Ævar Arnfjörð Bjarmason
2021-04-15 23:51         ` brian m. carlson
2021-04-10 15:21 ` [PATCH 04/15] Always use oidread to read into " brian m. carlson
2021-04-10 15:21 ` [PATCH 05/15] hash: add a function to finalize object IDs brian m. carlson
2021-04-10 15:21 ` [PATCH 06/15] Use the final_oid_fn to finalize hashing of " brian m. carlson
2021-04-10 15:21 ` [PATCH 07/15] builtin/pack-redundant: avoid casting buffers to struct object_id brian m. carlson
2021-04-10 15:21 ` [PATCH 08/15] cache: compare the entire buffer for " brian m. carlson
2021-04-11  8:17   ` Chris Torek
2021-04-11 11:36   ` Ævar Arnfjörð Bjarmason
2021-04-11 21:05     ` brian m. carlson
2021-04-10 15:21 ` [PATCH 09/15] hash: set and copy algo field in " brian m. carlson
2021-04-11 11:57   ` Ævar Arnfjörð Bjarmason
2021-04-11 21:48     ` brian m. carlson
2021-04-11 22:12       ` Ævar Arnfjörð Bjarmason
2021-04-11 23:52         ` brian m. carlson
2021-04-12 11:02           ` [PATCH 0/2] C99: harder dependency on variadic macros Ævar Arnfjörð Bjarmason
2021-04-12 11:02             ` [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code Ævar Arnfjörð Bjarmason
2021-04-13  7:57               ` Jeff King
2021-04-13 21:07                 ` Junio C Hamano
2021-04-14  5:21                   ` Jeff King
2021-04-14  6:12                     ` Ævar Arnfjörð Bjarmason
2021-04-14  7:31                       ` Jeff King
2021-05-21  2:06               ` Jonathan Nieder
2021-04-12 11:02             ` [PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code Ævar Arnfjörð Bjarmason
2021-04-12 17:58               ` Junio C Hamano
2021-04-13  8:00                 ` Jeff King
2021-05-21  2:50               ` Jonathan Nieder
2021-04-12 12:14             ` [PATCH 0/2] C99: harder dependency on variadic macros Bagas Sanjaya
2021-04-12 12:41               ` Ævar Arnfjörð Bjarmason
2021-04-12 22:57                 ` brian m. carlson
2021-04-12 23:19                   ` Junio C Hamano
2021-04-12 10:53         ` [PATCH 09/15] hash: set and copy algo field in struct object_id Junio C Hamano
2021-04-12 11:13           ` Ævar Arnfjörð Bjarmason
2021-04-10 15:21 ` [PATCH 10/15] hash: provide per-algorithm null OIDs brian m. carlson
2021-04-11 14:03   ` Junio C Hamano
2021-04-11 21:51     ` brian m. carlson
2021-04-10 15:21 ` [PATCH 11/15] builtin/show-index: set the algorithm for object IDs brian m. carlson
2021-04-10 15:21 ` [PATCH 12/15] commit-graph: don't store file hashes as struct object_id brian m. carlson
2021-04-10 15:21 ` [PATCH 13/15] builtin/pack-objects: avoid using struct object_id for pack hash brian m. carlson
2021-04-10 15:21 ` [PATCH 14/15] hex: default to the_hash_algo on zero algorithm value brian m. carlson
2021-04-10 15:21 ` [PATCH 15/15] hex: print objects using the hash algorithm member brian m. carlson
