[PATCH 00/31] Hash function transition part 16

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 00/31] Hash function transition part 16
@ 2019-02-12  1:22 brian m. carlson
  2019-02-12  1:22 ` [PATCH 01/31] t/lib-submodule-update: use appropriate length constant brian m. carlson
                   ` (31 more replies)
  0 siblings, 32 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

This is the sixteenth series of hash function transition patches. This
series contains various fixes, mostly focused around the pack bitmap
code, the HTTP code, the archive code, the index, and parts of our Perl
code.

This is the second to last series required for a "stage 0" Git; that is,
one that can operate only with SHA-256, but not SHA-1. Observers will
notice a focus on getting rid of sha1_to_hex and null_sha1 as well as
the normal types of transforms; the next series will remove both of
these.

This series modifies the index code such that it can work with a hash
algorithm of any length. In order to do so, the structs involved were
changed to use flex array members and not store the hash in a fixed
array member. This design was chosen over a multiple struct approach
because it ensures that we have one consistent, well-tested code path
that works for both algorithms, as well as any algorithms in the future.
Comments on the approach or arguments for other designs are welcome.

This is a rather long series, but most of it is concentrated in a few
small areas, so hopefully it's a little easier to review because of
that.

To preview the series that come after this, there is an additional
series for stage 0 Git (object-id-part17 plus part of sha256-fixes),
plus potentially several series of test fixes (test-fixes-part4 and part
of sha256-fixes). Following that, I plan to introduce, under the
DEVELOPER Makefile flag, the actual code which supports
extensions.objectFormat and makes it so that it works
(transition-stage-4).

brian m. carlson (31):
  t/lib-submodule-update: use appropriate length constant
  pack-bitmap: make bitmap header handling hash agnostic
  pack-bitmap: convert struct stored_bitmap to object_id
  pack-bitmap: replace sha1_to_hex
  pack-bitmap: switch hard-coded constants to the_hash_algo
  submodule: avoid hard-coded constants
  notes-merge: switch to use the_hash_algo
  notes: make hash size independent
  notes: replace sha1_to_hex
  object-store: rename and expand packed_git's sha1 member
  builtin/name-rev: make hash-size independent
  fast-import: make hash-size independent
  fast-import: replace sha1_to_hex
  builtin/am: make hash size independent
  builtin/pull: make hash-size independent
  http-push: convert to use the_hash_algo
  http-backend: allow 64-character hex names
  http-push: remove remaining uses of sha1_to_hex
  http-walker: replace sha1_to_hex
  http: replace hard-coded constant with the_hash_algo
  http: compute hash of downloaded objects using the_hash_algo
  http: replace sha1_to_hex
  remote-curl: make hash size independent
  archive-tar: make hash size independent
  archive: convert struct archiver_args to object_id
  refspec: make hash size independent
  builtin/difftool: use parse_oid_hex
  dir: make untracked cache extension hash size independent
  read-cache: read data in a hash-independent way
  Git.pm: make hash size independent
  gitweb: make hash size independent

 archive-tar.c               |  7 ++--
 archive-zip.c               | 10 ++---
 archive.c                   |  8 ++--
 archive.h                   |  2 +-
 builtin/am.c                |  9 +++--
 builtin/difftool.c          |  6 +--
 builtin/get-tar-commit-id.c | 11 +++++-
 builtin/name-rev.c          | 14 ++++---
 builtin/pack-redundant.c    |  2 +-
 builtin/pull.c              |  5 ++-
 dir.c                       | 28 +++++++-------
 fast-import.c               | 48 +++++++++++++-----------
 gitweb/gitweb.perl          | 63 ++++++++++++++++---------------
 http-backend.c              |  3 ++
 http-push.c                 | 29 ++++++++-------
 http-walker.c               | 18 ++++-----
 http.c                      | 33 +++++++++--------
 http.h                      |  2 +-
 merge-recursive.c           |  2 +-
 notes-merge.c               |  6 +--
 notes.c                     | 44 +++++++++++-----------
 object-store.h              |  2 +-
 pack-bitmap-write.c         |  8 ++--
 pack-bitmap.c               | 20 +++++-----
 pack-bitmap.h               |  2 +-
 packfile.c                  |  6 +--
 perl/Git.pm                 |  2 +-
 read-cache.c                | 74 +++++++++++++++----------------------
 refspec.c                   |  2 +-
 remote-curl.c               | 11 +++---
 submodule.c                 |  2 +-
 t/lib-submodule-update.sh   |  3 +-
 32 files changed, 246 insertions(+), 236 deletions(-)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 01/31] t/lib-submodule-update: use appropriate length constant
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 02/31] pack-bitmap: make bitmap header handling hash agnostic brian m. carlson
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Instead of using a specific invalid hard-coded object ID, produce one
of the appropriate length by using test_oid.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/lib-submodule-update.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 5b56b23166..1dd17fc03e 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -139,7 +139,7 @@ create_lib_submodule_repo () {
 		git revert HEAD &&
 
 		git checkout -b invalid_sub1 add_sub1 &&
-		git update-index --cacheinfo 160000 0123456789012345678901234567890123456789 sub1 &&
+		git update-index --cacheinfo 160000 $(test_oid numeric) sub1 &&
 		git commit -m "Invalid sub1 commit" &&
 		git checkout -b valid_sub1 &&
 		git revert HEAD &&
@@ -196,6 +196,7 @@ test_git_directory_exists() {
 # the submodule repo if it doesn't exist and configures the most problematic
 # settings for diff.ignoreSubmodules.
 prolog () {
+	test_oid_init &&
 	(test -d submodule_update_repo || create_lib_submodule_repo) &&
 	test_config_global diff.ignoreSubmodules all &&
 	test_config diff.ignoreSubmodules all

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 02/31] pack-bitmap: make bitmap header handling hash agnostic
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
  2019-02-12  1:22 ` [PATCH 01/31] t/lib-submodule-update: use appropriate length constant brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 03/31] pack-bitmap: convert struct stored_bitmap to object_id brian m. carlson
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Increase the checksum field in struct bitmap_disk_header to be
GIT_MAX_RAWSZ bytes in length and ensure that we hash the proper number
of bytes out when computing the bitmap checksum.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 pack-bitmap-write.c | 2 +-
 pack-bitmap.c       | 2 +-
 pack-bitmap.h       | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 5566e94abe..c82fb01fd7 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -535,7 +535,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	header.entry_count = htonl(writer.selected_nr);
 	hashcpy(header.checksum, writer.pack_checksum);
 
-	hashwrite(f, &header, sizeof(header));
+	hashwrite(f, &header, sizeof(header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz);
 	dump_bitmap(f, writer.commits);
 	dump_bitmap(f, writer.trees);
 	dump_bitmap(f, writer.blobs);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 4695aaf6b4..b53f37243c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -163,7 +163,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	}
 
 	index->entry_count = ntohl(header->entry_count);
-	index->map_pos += sizeof(*header);
+	index->map_pos += sizeof(*header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz;
 	return 0;
 }
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 8418ba8c79..344ba23af9 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -14,7 +14,7 @@ struct bitmap_disk_header {
 	uint16_t version;
 	uint16_t options;
 	uint32_t entry_count;
-	unsigned char checksum[20];
+	unsigned char checksum[GIT_MAX_RAWSZ];
 };
 
 static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 03/31] pack-bitmap: convert struct stored_bitmap to object_id
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
  2019-02-12  1:22 ` [PATCH 01/31] t/lib-submodule-update: use appropriate length constant brian m. carlson
  2019-02-12  1:22 ` [PATCH 02/31] pack-bitmap: make bitmap header handling hash agnostic brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 04/31] pack-bitmap: replace sha1_to_hex brian m. carlson
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Convert struct stored_bitmap to use struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 pack-bitmap.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index b53f37243c..c760913cea 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -18,7 +18,7 @@
  * commit.
  */
 struct stored_bitmap {
-	unsigned char sha1[20];
+	struct object_id oid;
 	struct ewah_bitmap *root;
 	struct stored_bitmap *xor;
 	int flags;
@@ -181,9 +181,9 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 	stored->root = root;
 	stored->xor = xor_with;
 	stored->flags = flags;
-	hashcpy(stored->sha1, sha1);
+	oidread(&stored->oid, sha1);
 
-	hash_pos = kh_put_sha1(index->bitmaps, stored->sha1, &ret);
+	hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret);
 
 	/* a 0 return code means the insertion succeeded with no changes,
 	 * because the SHA1 already existed on the map. this is bad, there
@@ -1080,7 +1080,7 @@ int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git,
 					    lookup_stored_bitmap(stored),
 					    rebuild)) {
 				hash_pos = kh_put_sha1(reused_bitmaps,
-						       stored->sha1,
+						       stored->oid.hash,
 						       &hash_ret);
 				kh_value(reused_bitmaps, hash_pos) =
 					bitmap_to_ewah(rebuild);

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 04/31] pack-bitmap: replace sha1_to_hex
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (2 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 03/31] pack-bitmap: convert struct stored_bitmap to object_id brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  6:37   ` Jeff King
  2019-02-12  1:22 ` [PATCH 05/31] pack-bitmap: switch hard-coded constants to the_hash_algo brian m. carlson
                   ` (27 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Replace the uses of sha1_to_hex in the pack bitmap code with hash_to_hex
to allow the use of SHA-256 as well.  Rename a few variables since they
are no longer limited to SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 pack-bitmap-write.c | 6 +++---
 pack-bitmap.c       | 8 ++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index c82fb01fd7..802ed62677 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -142,13 +142,13 @@ static inline void reset_all_seen(void)
 	seen_objects_nr = 0;
 }
 
-static uint32_t find_object_pos(const unsigned char *sha1)
+static uint32_t find_object_pos(const unsigned char *hash)
 {
-	struct object_entry *entry = packlist_find(writer.to_pack, sha1, NULL);
+	struct object_entry *entry = packlist_find(writer.to_pack, hash, NULL);
 
 	if (!entry) {
 		die("Failed to write bitmap index. Packfile doesn't have full closure "
-			"(object %s is missing)", sha1_to_hex(sha1));
+			"(object %s is missing)", hash_to_hex(hash));
 	}
 
 	return oe_in_pack_pos(writer.to_pack, entry);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index c760913cea..6d6fa68563 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -169,7 +169,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 
 static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 					  struct ewah_bitmap *root,
-					  const unsigned char *sha1,
+					  const unsigned char *hash,
 					  struct stored_bitmap *xor_with,
 					  int flags)
 {
@@ -181,7 +181,7 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 	stored->root = root;
 	stored->xor = xor_with;
 	stored->flags = flags;
-	oidread(&stored->oid, sha1);
+	oidread(&stored->oid, hash);
 
 	hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret);
 
@@ -189,7 +189,7 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 	 * because the SHA1 already existed on the map. this is bad, there
 	 * shouldn't be duplicated commits in the index */
 	if (ret == 0) {
-		error("Duplicate entry in bitmap index: %s", sha1_to_hex(sha1));
+		error("Duplicate entry in bitmap index: %s", hash_to_hex(hash));
 		return NULL;
 	}
 
@@ -805,7 +805,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 
 		fprintf(stderr, "Failed to reuse at %d (%016llx)\n",
 			reuse_objects, result->words[i]);
-		fprintf(stderr, " %s\n", sha1_to_hex(sha1));
+		fprintf(stderr, " %s\n", hash_to_hex(sha1));
 	}
 #endif
 

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 05/31] pack-bitmap: switch hard-coded constants to the_hash_algo
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (3 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 04/31] pack-bitmap: replace sha1_to_hex brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12 11:13   ` Ævar Arnfjörð Bjarmason
  2019-02-12  1:22 ` [PATCH 06/31] submodule: avoid hard-coded constants brian m. carlson
                   ` (26 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Switch two hard-coded uses of 20 to references to the_hash_algo.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 pack-bitmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 6d6fa68563..603492c237 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -138,7 +138,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 {
 	struct bitmap_disk_header *header = (void *)index->map;
 
-	if (index->map_size < sizeof(*header) + 20)
+	if (index->map_size < sizeof(*header) + the_hash_algo->rawsz)
 		return error("Corrupted bitmap index (missing header data)");
 
 	if (memcmp(header->magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)) != 0)
@@ -157,7 +157,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 				"(Git requires BITMAP_OPT_FULL_DAG)");
 
 		if (flags & BITMAP_OPT_HASH_CACHE) {
-			unsigned char *end = index->map + index->map_size - 20;
+			unsigned char *end = index->map + index->map_size - the_hash_algo->rawsz;
 			index->hashes = ((uint32_t *)end) - index->pack->num_objects;
 		}
 	}

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 06/31] submodule: avoid hard-coded constants
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (4 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 05/31] pack-bitmap: switch hard-coded constants to the_hash_algo brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 07/31] notes-merge: switch to use the_hash_algo brian m. carlson
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Instead of using hard-coded 40-based constants, express these values in
terms of the_hash_algo and GIT_MAX_HEXSZ.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 merge-recursive.c | 2 +-
 submodule.c       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 4851825aeb..b8bd5d4f8d 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1122,7 +1122,7 @@ static int find_first_merges(struct repository *repo,
 	struct commit *commit;
 	int contains_another;
 
-	char merged_revision[42];
+	char merged_revision[GIT_MAX_HEXSZ + 2];
 	const char *rev_args[] = { "rev-list", "--merges", "--ancestry-path",
 				   "--all", merged_revision, NULL };
 	struct rev_info revs;
diff --git a/submodule.c b/submodule.c
index 934ecfa294..0e2faaa41b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -990,7 +990,7 @@ static int submodule_needs_pushing(struct repository *r,
 		if (start_command(&cp))
 			die("Could not run 'git rev-list <commits> --not --remotes -n 1' command in submodule %s",
 					path);
-		if (strbuf_read(&buf, cp.out, 41))
+		if (strbuf_read(&buf, cp.out, the_hash_algo->hexsz + 1))
 			needs_pushing = 1;
 		finish_command(&cp);
 		close(cp.out);

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 07/31] notes-merge: switch to use the_hash_algo
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (5 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 06/31] submodule: avoid hard-coded constants brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 08/31] notes: make hash size independent brian m. carlson
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Switch from using GIT_SHA1_HEXSZ to GIT_MAX_HEXSZ and the_hash_algo so
that the code works with any hash algorithm.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 notes-merge.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/notes-merge.c b/notes-merge.c
index 280aa8e6c1..2fe724f1cf 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -29,14 +29,14 @@ void init_notes_merge_options(struct repository *r,
 
 static int path_to_oid(const char *path, struct object_id *oid)
 {
-	char hex_oid[GIT_SHA1_HEXSZ];
+	char hex_oid[GIT_MAX_HEXSZ];
 	int i = 0;
-	while (*path && i < GIT_SHA1_HEXSZ) {
+	while (*path && i < the_hash_algo->hexsz) {
 		if (*path != '/')
 			hex_oid[i++] = *path;
 		path++;
 	}
-	if (*path || i != GIT_SHA1_HEXSZ)
+	if (*path || i != the_hash_algo->hexsz)
 		return -1;
 	return get_oid_hex(hex_oid, oid);
 }

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 08/31] notes: make hash size independent
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (6 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 07/31] notes-merge: switch to use the_hash_algo brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:37   ` Eric Sunshine
  2019-02-12  1:22 ` [PATCH 09/31] notes: replace sha1_to_hex brian m. carlson
                   ` (23 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Switch out various uses of the GIT_SHA1_* constants with GIT_MAX_*
constants for allocations and the_hash_algo for general parsing.  Update
a comment to no longer be SHA-1 specific.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 notes.c | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/notes.c b/notes.c
index 7f7cc4d511..96647f1d2d 100644
--- a/notes.c
+++ b/notes.c
@@ -67,8 +67,9 @@ struct non_note {
 
 #define GET_NIBBLE(n, sha1) ((((sha1)[(n) >> 1]) >> ((~(n) & 0x01) << 2)) & 0x0f)
 
-#define KEY_INDEX (GIT_SHA1_RAWSZ - 1)
-#define FANOUT_PATH_SEPARATORS ((GIT_SHA1_HEXSZ / 2) - 1)
+#define KEY_INDEX (the_hash_algo->rawsz - 1)
+#define FANOUT_PATH_SEPARATORS (the_hash_algo->rawsz - 1)
+#define FANOUT_PATH_SEPARATORS_MAX ((GIT_MAX_HEXSZ / 2) - 1)
 #define SUBTREE_SHA1_PREFIXCMP(key_sha1, subtree_sha1) \
 	(memcmp(key_sha1, subtree_sha1, subtree_sha1[KEY_INDEX]))
 
@@ -198,7 +199,7 @@ static void note_tree_remove(struct notes_tree *t,
 		struct leaf_node *entry)
 {
 	struct leaf_node *l;
-	struct int_node *parent_stack[GIT_SHA1_RAWSZ];
+	struct int_node *parent_stack[GIT_MAX_RAWSZ];
 	unsigned char i, j;
 	void **p = note_tree_search(t, &tree, &n, entry->key_oid.hash);
 
@@ -394,6 +395,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 	void *buf;
 	struct tree_desc desc;
 	struct name_entry entry;
+	const unsigned hashsz = the_hash_algo->rawsz;
 
 	buf = fill_tree_descriptor(&desc, &subtree->val_oid);
 	if (!buf)
@@ -401,7 +403,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 		     oid_to_hex(&subtree->val_oid));
 
 	prefix_len = subtree->key_oid.hash[KEY_INDEX];
-	if (prefix_len >= GIT_SHA1_RAWSZ)
+	if (prefix_len >= hashsz)
 		BUG("prefix_len (%"PRIuMAX") is out of range", (uintmax_t)prefix_len);
 	if (prefix_len * 2 < n)
 		BUG("prefix_len (%"PRIuMAX") is too small", (uintmax_t)prefix_len);
@@ -411,7 +413,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 		struct leaf_node *l;
 		size_t path_len = strlen(entry.path);
 
-		if (path_len == 2 * (GIT_SHA1_RAWSZ - prefix_len)) {
+		if (path_len == 2 * (hashsz - prefix_len)) {
 			/* This is potentially the remainder of the SHA-1 */
 
 			if (!S_ISREG(entry.mode))
@@ -419,7 +421,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 				goto handle_non_note;
 
 			if (hex_to_bytes(object_oid.hash + prefix_len, entry.path,
-					 GIT_SHA1_RAWSZ - prefix_len))
+					 hashsz - prefix_len))
 				goto handle_non_note; /* entry.path is not a SHA1 */
 
 			type = PTR_TYPE_NOTE;
@@ -439,7 +441,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 			 * except for the last byte, where we write
 			 * the length:
 			 */
-			memset(object_oid.hash + len, 0, GIT_SHA1_RAWSZ - len - 1);
+			memset(object_oid.hash + len, 0, hashsz - len - 1);
 			object_oid.hash[KEY_INDEX] = (unsigned char)len;
 
 			type = PTR_TYPE_SUBTREE;
@@ -527,15 +529,15 @@ static unsigned char determine_fanout(struct int_node *tree, unsigned char n,
 	return fanout + 1;
 }
 
-/* hex SHA1 + 19 * '/' + NUL */
-#define FANOUT_PATH_MAX GIT_SHA1_HEXSZ + FANOUT_PATH_SEPARATORS + 1
+/* hex oid + one slash between each pair + NUL */
+#define FANOUT_PATH_MAX GIT_MAX_HEXSZ + FANOUT_PATH_SEPARATORS_MAX + 1
 
 static void construct_path_with_fanout(const unsigned char *sha1,
 		unsigned char fanout, char *path)
 {
 	unsigned int i = 0, j = 0;
 	const char *hex_sha1 = sha1_to_hex(sha1);
-	assert(fanout < GIT_SHA1_RAWSZ);
+	assert(fanout < the_hash_algo->rawsz);
 	while (fanout) {
 		path[i++] = hex_sha1[j++];
 		path[i++] = hex_sha1[j++];
@@ -637,10 +639,10 @@ static inline int matches_tree_write_stack(struct tree_write_stack *tws,
 
 static void write_tree_entry(struct strbuf *buf, unsigned int mode,
 		const char *path, unsigned int path_len, const
-		unsigned char *sha1)
+		unsigned char *hash)
 {
 	strbuf_addf(buf, "%o %.*s%c", mode, path_len, path, '\0');
-	strbuf_add(buf, sha1, GIT_SHA1_RAWSZ);
+	strbuf_add(buf, hash, the_hash_algo->rawsz);
 }
 
 static void tree_write_stack_init_subtree(struct tree_write_stack *tws,
@@ -652,7 +654,7 @@ static void tree_write_stack_init_subtree(struct tree_write_stack *tws,
 	n = (struct tree_write_stack *)
 		xmalloc(sizeof(struct tree_write_stack));
 	n->next = NULL;
-	strbuf_init(&n->buf, 256 * (32 + GIT_SHA1_HEXSZ)); /* assume 256 entries per tree */
+	strbuf_init(&n->buf, 256 * (32 + the_hash_algo->hexsz)); /* assume 256 entries per tree */
 	n->path[0] = n->path[1] = '\0';
 	tws->next = n;
 	tws->path[0] = path[0];
@@ -757,7 +759,7 @@ static int write_each_note(const struct object_id *object_oid,
 		note_path[note_path_len] = '\0';
 		mode = 040000;
 	}
-	assert(note_path_len <= GIT_SHA1_HEXSZ + FANOUT_PATH_SEPARATORS);
+	assert(note_path_len <= GIT_MAX_HEXSZ + FANOUT_PATH_SEPARATORS);
 
 	/* Weave non-note entries into note entries */
 	return  write_each_non_note_until(note_path, d) ||
@@ -1137,7 +1139,7 @@ int write_notes_tree(struct notes_tree *t, struct object_id *result)
 
 	/* Prepare for traversal of current notes tree */
 	root.next = NULL; /* last forward entry in list is grounded */
-	strbuf_init(&root.buf, 256 * (32 + GIT_SHA1_HEXSZ)); /* assume 256 entries */
+	strbuf_init(&root.buf, 256 * (32 + the_hash_algo->hexsz)); /* assume 256 entries */
 	root.path[0] = root.path[1] = '\0';
 	cb_data.root = &root;
 	cb_data.next_non_note = t->first_non_note;

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 09/31] notes: replace sha1_to_hex
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (7 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 08/31] notes: make hash size independent brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 10/31] object-store: rename and expand packed_git's sha1 member brian m. carlson
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Replace the uses of sha1_to_hex in this function with hash_to_hex to
allow the use of SHA-256 as well.  Rename some variables since this code
is no longer limited to SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 notes.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/notes.c b/notes.c
index 96647f1d2d..a9a3ddd6d7 100644
--- a/notes.c
+++ b/notes.c
@@ -532,19 +532,19 @@ static unsigned char determine_fanout(struct int_node *tree, unsigned char n,
 /* hex oid + one slash between each pair + NUL */
 #define FANOUT_PATH_MAX GIT_MAX_HEXSZ + FANOUT_PATH_SEPARATORS_MAX + 1
 
-static void construct_path_with_fanout(const unsigned char *sha1,
+static void construct_path_with_fanout(const unsigned char *hash,
 		unsigned char fanout, char *path)
 {
 	unsigned int i = 0, j = 0;
-	const char *hex_sha1 = sha1_to_hex(sha1);
+	const char *hex_hash = hash_to_hex(hash);
 	assert(fanout < the_hash_algo->rawsz);
 	while (fanout) {
-		path[i++] = hex_sha1[j++];
-		path[i++] = hex_sha1[j++];
+		path[i++] = hex_hash[j++];
+		path[i++] = hex_hash[j++];
 		path[i++] = '/';
 		fanout--;
 	}
-	xsnprintf(path + i, FANOUT_PATH_MAX - i, "%s", hex_sha1 + j);
+	xsnprintf(path + i, FANOUT_PATH_MAX - i, "%s", hex_hash + j);
 }
 
 static int for_each_note_helper(struct notes_tree *t, struct int_node *tree,
@@ -1167,7 +1167,7 @@ void prune_notes(struct notes_tree *t, int flags)
 
 	while (l) {
 		if (flags & NOTES_PRUNE_VERBOSE)
-			printf("%s\n", sha1_to_hex(l->sha1));
+			printf("%s\n", hash_to_hex(l->sha1));
 		if (!(flags & NOTES_PRUNE_DRYRUN))
 			remove_note(t, l->sha1);
 		l = l->next;

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 10/31] object-store: rename and expand packed_git's sha1 member
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (8 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 09/31] notes: replace sha1_to_hex brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  3:32   ` Eric Sunshine
  2019-02-12  1:22 ` [PATCH 11/31] builtin/name-rev: make hash-size independent brian m. carlson
                   ` (21 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

This member is used to represent the pack checksum of the pack in
question.  Expand this member to be GIT_MAX_RAWSZ bytes in length so it
works with longer hashes and rename it to be "hash" instead of "sha1".
This transformation was made with a change to the definition and the
following semantic patch:

@@
struct packed_git *E1;
@@
- E1->sha1
+ E1->hash

@@
struct packed_git E1;
@@
- E1.sha1
+ E1.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/pack-redundant.c |  2 +-
 fast-import.c            | 17 +++++++++--------
 http-push.c              |  3 ++-
 http-walker.c            |  2 +-
 http.c                   | 13 +++++++------
 object-store.h           |  2 +-
 packfile.c               |  6 +++---
 7 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index 11bc514566..d9af7f5414 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -689,7 +689,7 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
 	pl = red = pack_list_difference(local_packs, min);
 	while (pl) {
 		printf("%s\n%s\n",
-		       sha1_pack_index_name(pl->pack->sha1),
+		       sha1_pack_index_name(pl->pack->hash),
 		       pl->pack->pack_name);
 		pl = pl->next;
 	}
diff --git a/fast-import.c b/fast-import.c
index b7ba755c2b..7c9a10a77b 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -742,7 +742,8 @@ static const char *create_index(void)
 	if (c != last)
 		die("internal consistency error creating the index");
 
-	tmpfile = write_idx_file(NULL, idx, object_count, &pack_idx_opts, pack_data->sha1);
+	tmpfile = write_idx_file(NULL, idx, object_count, &pack_idx_opts,
+				 pack_data->hash);
 	free(idx);
 	return tmpfile;
 }
@@ -753,7 +754,7 @@ static char *keep_pack(const char *curr_index_name)
 	struct strbuf name = STRBUF_INIT;
 	int keep_fd;
 
-	odb_pack_name(&name, pack_data->sha1, "keep");
+	odb_pack_name(&name, pack_data->hash, "keep");
 	keep_fd = odb_pack_keep(name.buf);
 	if (keep_fd < 0)
 		die_errno("cannot create keep file");
@@ -761,11 +762,11 @@ static char *keep_pack(const char *curr_index_name)
 	if (close(keep_fd))
 		die_errno("failed to write keep file");
 
-	odb_pack_name(&name, pack_data->sha1, "pack");
+	odb_pack_name(&name, pack_data->hash, "pack");
 	if (finalize_object_file(pack_data->pack_name, name.buf))
 		die("cannot store pack file");
 
-	odb_pack_name(&name, pack_data->sha1, "idx");
+	odb_pack_name(&name, pack_data->hash, "idx");
 	if (finalize_object_file(curr_index_name, name.buf))
 		die("cannot store index file");
 	free((void *)curr_index_name);
@@ -779,7 +780,7 @@ static void unkeep_all_packs(void)
 
 	for (k = 0; k < pack_id; k++) {
 		struct packed_git *p = all_packs[k];
-		odb_pack_name(&name, p->sha1, "keep");
+		odb_pack_name(&name, p->hash, "keep");
 		unlink_or_warn(name.buf);
 	}
 	strbuf_release(&name);
@@ -821,9 +822,9 @@ static void end_packfile(void)
 
 		close_pack_windows(pack_data);
 		finalize_hashfile(pack_file, cur_pack_oid.hash, 0);
-		fixup_pack_header_footer(pack_data->pack_fd, pack_data->sha1,
-				    pack_data->pack_name, object_count,
-				    cur_pack_oid.hash, pack_size);
+		fixup_pack_header_footer(pack_data->pack_fd, pack_data->hash,
+					 pack_data->pack_name, object_count,
+					 cur_pack_oid.hash, pack_size);
 
 		if (object_count <= unpack_limit) {
 			if (!loosen_small_pack(pack_data)) {
diff --git a/http-push.c b/http-push.c
index b22c7caea0..b313ada515 100644
--- a/http-push.c
+++ b/http-push.c
@@ -315,7 +315,8 @@ static void start_fetch_packed(struct transfer_request *request)
 		return;
 	}
 
-	fprintf(stderr,	"Fetching pack %s\n", sha1_to_hex(target->sha1));
+	fprintf(stderr,	"Fetching pack %s\n",
+		sha1_to_hex(target->hash));
 	fprintf(stderr, " which contains %s\n", oid_to_hex(&request->obj->oid));
 
 	preq = new_http_pack_request(target, repo->url);
diff --git a/http-walker.c b/http-walker.c
index 8ae5d76c6a..8063896cf6 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -434,7 +434,7 @@ static int http_fetch_pack(struct walker *walker, struct alt_base *repo, unsigne
 
 	if (walker->get_verbosely) {
 		fprintf(stderr, "Getting pack %s\n",
-			sha1_to_hex(target->sha1));
+			sha1_to_hex(target->hash));
 		fprintf(stderr, " which contains %s\n",
 			sha1_to_hex(sha1));
 	}
diff --git a/http.c b/http.c
index a32ad36ddf..a09adc518f 100644
--- a/http.c
+++ b/http.c
@@ -2236,10 +2236,10 @@ int finish_http_pack_request(struct http_pack_request *preq)
 		return -1;
 	}
 
-	unlink(sha1_pack_index_name(p->sha1));
+	unlink(sha1_pack_index_name(p->hash));
 
-	if (finalize_object_file(preq->tmpfile.buf, sha1_pack_name(p->sha1))
-	 || finalize_object_file(tmp_idx, sha1_pack_index_name(p->sha1))) {
+	if (finalize_object_file(preq->tmpfile.buf, sha1_pack_name(p->hash))
+	 || finalize_object_file(tmp_idx, sha1_pack_index_name(p->hash))) {
 		free(tmp_idx);
 		return -1;
 	}
@@ -2262,10 +2262,10 @@ struct http_pack_request *new_http_pack_request(
 
 	end_url_with_slash(&buf, base_url);
 	strbuf_addf(&buf, "objects/pack/pack-%s.pack",
-		sha1_to_hex(target->sha1));
+		sha1_to_hex(target->hash));
 	preq->url = strbuf_detach(&buf, NULL);
 
-	strbuf_addf(&preq->tmpfile, "%s.temp", sha1_pack_name(target->sha1));
+	strbuf_addf(&preq->tmpfile, "%s.temp", sha1_pack_name(target->hash));
 	preq->packfile = fopen(preq->tmpfile.buf, "a");
 	if (!preq->packfile) {
 		error("Unable to open local file %s for pack",
@@ -2289,7 +2289,8 @@ struct http_pack_request *new_http_pack_request(
 		if (http_is_verbose)
 			fprintf(stderr,
 				"Resuming fetch of pack %s at byte %"PRIuMAX"\n",
-				sha1_to_hex(target->sha1), (uintmax_t)prev_posn);
+				sha1_to_hex(target->hash),
+				(uintmax_t)prev_posn);
 		http_opt_request_remainder(preq->slot->curl, prev_posn);
 	}
 
diff --git a/object-store.h b/object-store.h
index 14fc935bd1..56f8aea1cc 100644
--- a/object-store.h
+++ b/object-store.h
@@ -77,7 +77,7 @@ struct packed_git {
 		 freshened:1,
 		 do_not_close:1,
 		 pack_promisor:1;
-	unsigned char sha1[20];
+	unsigned char hash[GIT_MAX_RAWSZ];
 	struct revindex_entry *revindex;
 	/* something like ".git/objects/pack/xxxxx.pack" */
 	char pack_name[FLEX_ARRAY]; /* more */
diff --git a/packfile.c b/packfile.c
index 16bcb75262..e6e8861650 100644
--- a/packfile.c
+++ b/packfile.c
@@ -235,7 +235,7 @@ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path)
 	struct packed_git *p = alloc_packed_git(alloc);
 
 	memcpy(p->pack_name, path, alloc); /* includes NUL */
-	hashcpy(p->sha1, sha1);
+	hashcpy(p->hash, sha1);
 	if (check_packed_git_idx(idx_path, p)) {
 		free(p);
 		return NULL;
@@ -722,8 +722,8 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
 	p->pack_local = local;
 	p->mtime = st.st_mtime;
 	if (path_len < the_hash_algo->hexsz ||
-	    get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->sha1))
-		hashclr(p->sha1);
+	    get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->hash))
+		hashclr(p->hash);
 	return p;
 }
 

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 11/31] builtin/name-rev: make hash-size independent
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (9 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 10/31] object-store: rename and expand packed_git's sha1 member brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 12/31] fast-import: " brian m. carlson
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Use the_hash_algo when parsing instead of GIT_SHA1_HEXSZ so that this
function works with any size hash.  Rename the variable forty to
counter, as this is a better name and is independent of the hash size.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/name-rev.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index f1cb45c227..05ccf53e00 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -361,23 +361,25 @@ static char const * const name_rev_usage[] = {
 static void name_rev_line(char *p, struct name_ref_data *data)
 {
 	struct strbuf buf = STRBUF_INIT;
-	int forty = 0;
+	int counter = 0;
 	char *p_start;
+	const unsigned hexsz = the_hash_algo->hexsz;
+
 	for (p_start = p; *p; p++) {
 #define ishex(x) (isdigit((x)) || ((x) >= 'a' && (x) <= 'f'))
 		if (!ishex(*p))
-			forty = 0;
-		else if (++forty == GIT_SHA1_HEXSZ &&
+			counter = 0;
+		else if (++counter == hexsz &&
 			 !ishex(*(p+1))) {
 			struct object_id oid;
 			const char *name = NULL;
 			char c = *(p+1);
 			int p_len = p - p_start + 1;
 
-			forty = 0;
+			counter = 0;
 
 			*(p+1) = 0;
-			if (!get_oid(p - (GIT_SHA1_HEXSZ - 1), &oid)) {
+			if (!get_oid(p - (hexsz - 1), &oid)) {
 				struct object *o =
 					lookup_object(the_repository,
 						      oid.hash);
@@ -390,7 +392,7 @@ static void name_rev_line(char *p, struct name_ref_data *data)
 				continue;
 
 			if (data->name_only)
-				printf("%.*s%s", p_len - GIT_SHA1_HEXSZ, p_start, name);
+				printf("%.*s%s", p_len - hexsz, p_start, name);
 			else
 				printf("%.*s (%s)", p_len, p_start, name);
 			p_start = p + 1;

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 12/31] fast-import: make hash-size independent
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (10 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 11/31] builtin/name-rev: make hash-size independent brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  3:44   ` Eric Sunshine
  2019-02-12  1:22 ` [PATCH 13/31] fast-import: replace sha1_to_hex brian m. carlson
                   ` (19 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Replace several uses of GIT_SHA1_HEXSZ and 40-based constants with
references to the_hash_algo.  Update the note handling code here to
compute path sizes based on GIT_MAX_RAWSZ as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fast-import.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 7c9a10a77b..eba8c8c919 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1241,7 +1241,7 @@ static void load_tree(struct tree_entry *root)
 		c += e->name->str_len + 1;
 		hashcpy(e->versions[0].oid.hash, (unsigned char *)c);
 		hashcpy(e->versions[1].oid.hash, (unsigned char *)c);
-		c += GIT_SHA1_RAWSZ;
+		c += the_hash_algo->rawsz;
 	}
 	free(buf);
 }
@@ -1288,7 +1288,7 @@ static void mktree(struct tree_content *t, int v, struct strbuf *b)
 		strbuf_addf(b, "%o %s%c",
 			(unsigned int)(e->versions[v].mode & ~NO_DELTA),
 			e->name->str_dat, '\0');
-		strbuf_add(b, e->versions[v].oid.hash, GIT_SHA1_RAWSZ);
+		strbuf_add(b, e->versions[v].oid.hash, the_hash_algo->rawsz);
 	}
 }
 
@@ -2047,7 +2047,8 @@ static uintmax_t do_change_note_fanout(
 	unsigned int i, tmp_hex_oid_len, tmp_fullpath_len;
 	uintmax_t num_notes = 0;
 	struct object_id oid;
-	char realpath[60];
+	char realpath[GIT_MAX_RAWSZ * 3];
+	const unsigned hexsz = the_hash_algo->hexsz;
 
 	if (!root->tree)
 		load_tree(root);
@@ -2067,7 +2068,7 @@ static uintmax_t do_change_note_fanout(
 		 * of 2 chars.
 		 */
 		if (!e->versions[1].mode ||
-		    tmp_hex_oid_len > GIT_SHA1_HEXSZ ||
+		    tmp_hex_oid_len > hexsz ||
 		    e->name->str_len % 2)
 			continue;
 
@@ -2081,7 +2082,7 @@ static uintmax_t do_change_note_fanout(
 		tmp_fullpath_len += e->name->str_len;
 		fullpath[tmp_fullpath_len] = '\0';
 
-		if (tmp_hex_oid_len == GIT_SHA1_HEXSZ && !get_oid_hex(hex_oid, &oid)) {
+		if (tmp_hex_oid_len == hexsz && !get_oid_hex(hex_oid, &oid)) {
 			/* This is a note entry */
 			if (fanout == 0xff) {
 				/* Counting mode, no rename */
@@ -2352,7 +2353,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
 	struct object_entry *oe;
 	struct branch *s;
 	struct object_id oid, commit_oid;
-	char path[60];
+	char path[GIT_MAX_RAWSZ * 3];
 	uint16_t inline_data = 0;
 	unsigned char new_fanout;
 
@@ -2405,7 +2406,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
 		char *buf = read_object_with_reference(&commit_oid,
 						       commit_type, &size,
 						       &commit_oid);
-		if (!buf || size < 46)
+		if (!buf || size < the_hash_algo->hexsz)
 			die("Not a valid commit: %s", p);
 		free(buf);
 	} else
@@ -2456,7 +2457,7 @@ static void file_change_deleteall(struct branch *b)
 
 static void parse_from_commit(struct branch *b, char *buf, unsigned long size)
 {
-	if (!buf || size < GIT_SHA1_HEXSZ + 6)
+	if (!buf || size < the_hash_algo->hexsz + 6)
 		die("Not a valid commit: %s", oid_to_hex(&b->oid));
 	if (memcmp("tree ", buf, 5)
 		|| get_oid_hex(buf + 5, &b->branch_tree.versions[1].oid))
@@ -2555,7 +2556,7 @@ static struct hash_list *parse_merge(unsigned int *count)
 			char *buf = read_object_with_reference(&n->oid,
 							       commit_type,
 							       &size, &n->oid);
-			if (!buf || size < 46)
+			if (!buf || size < the_hash_algo->hexsz)
 				die("Not a valid commit: %s", from);
 			free(buf);
 		} else
@@ -2842,7 +2843,7 @@ static void parse_get_mark(const char *p)
 		die("Unknown mark: %s", command_buf.buf);
 
 	xsnprintf(output, sizeof(output), "%s\n", oid_to_hex(&oe->idx.oid));
-	cat_blob_write(output, GIT_SHA1_HEXSZ + 1);
+	cat_blob_write(output, the_hash_algo->hexsz + 1);
 }
 
 static void parse_cat_blob(const char *p)
@@ -2872,6 +2873,8 @@ static struct object_entry *dereference(struct object_entry *oe,
 {
 	unsigned long size;
 	char *buf = NULL;
+	const unsigned hexsz = the_hash_algo->hexsz;
+
 	if (!oe) {
 		enum object_type type = oid_object_info(the_repository, oid,
 							NULL);
@@ -2905,12 +2908,12 @@ static struct object_entry *dereference(struct object_entry *oe,
 	/* Peel one layer. */
 	switch (oe->type) {
 	case OBJ_TAG:
-		if (size < GIT_SHA1_HEXSZ + strlen("object ") ||
+		if (size < hexsz + strlen("object ") ||
 		    get_oid_hex(buf + strlen("object "), oid))
 			die("Invalid SHA1 in tag: %s", command_buf.buf);
 		break;
 	case OBJ_COMMIT:
-		if (size < GIT_SHA1_HEXSZ + strlen("tree ") ||
+		if (size < hexsz + strlen("tree ") ||
 		    get_oid_hex(buf + strlen("tree "), oid))
 			die("Invalid SHA1 in commit: %s", command_buf.buf);
 	}

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 13/31] fast-import: replace sha1_to_hex
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (11 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 12/31] fast-import: " brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 14/31] builtin/am: make hash size independent brian m. carlson
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Replace the uses of sha1_to_hex in this function with hash_to_hex to
allow the use of SHA-256 as well.  Rename a variable since it is no
longer limited to SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fast-import.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index eba8c8c919..3d6e2bfa4d 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2945,7 +2945,7 @@ static struct object_entry *parse_treeish_dataref(const char **p)
 	return e;
 }
 
-static void print_ls(int mode, const unsigned char *sha1, const char *path)
+static void print_ls(int mode, const unsigned char *hash, const char *path)
 {
 	static struct strbuf line = STRBUF_INIT;
 
@@ -2965,7 +2965,7 @@ static void print_ls(int mode, const unsigned char *sha1, const char *path)
 		/* mode SP type SP object_name TAB path LF */
 		strbuf_reset(&line);
 		strbuf_addf(&line, "%06o %s %s\t",
-				mode & ~NO_DELTA, type, sha1_to_hex(sha1));
+				mode & ~NO_DELTA, type, hash_to_hex(hash));
 		quote_c_style(path, &line, NULL, 0);
 		strbuf_addch(&line, '\n');
 	}

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 14/31] builtin/am: make hash size independent
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (12 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 13/31] fast-import: replace sha1_to_hex brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 15/31] builtin/pull: make hash-size independent brian m. carlson
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Instead of using GIT_SHA1_HEXSZ, switch to using the_hash_algo and
parse_oid_hex to parse the lines involved in rebasing notes.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/am.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 58a2aef28b..584baf1c7e 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -485,23 +485,24 @@ static int copy_notes_for_rebase(const struct am_state *state)
 
 	while (!strbuf_getline_lf(&sb, fp)) {
 		struct object_id from_obj, to_obj;
+		const char *p;
 
-		if (sb.len != GIT_SHA1_HEXSZ * 2 + 1) {
+		if (sb.len != the_hash_algo->hexsz * 2 + 1) {
 			ret = error(invalid_line, sb.buf);
 			goto finish;
 		}
 
-		if (get_oid_hex(sb.buf, &from_obj)) {
+		if (parse_oid_hex(sb.buf, &from_obj, &p)) {
 			ret = error(invalid_line, sb.buf);
 			goto finish;
 		}
 
-		if (sb.buf[GIT_SHA1_HEXSZ] != ' ') {
+		if (*p != ' ') {
 			ret = error(invalid_line, sb.buf);
 			goto finish;
 		}
 
-		if (get_oid_hex(sb.buf + GIT_SHA1_HEXSZ + 1, &to_obj)) {
+		if (get_oid_hex(p + 1, &to_obj)) {
 			ret = error(invalid_line, sb.buf);
 			goto finish;
 		}

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 15/31] builtin/pull: make hash-size independent
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (13 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 14/31] builtin/am: make hash size independent brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  3:47   ` Eric Sunshine
  2019-02-12  1:22 ` [PATCH 16/31] http-push: convert to use the_hash_algo brian m. carlson
                   ` (16 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Instead of using get_oid_hex and GIT_SHA1_HEXSZ, use parse_oid_hex to
avoid the need for a constant and simplify the code.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/pull.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/pull.c b/builtin/pull.c
index 701d1473dc..52d93c1e3b 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -365,9 +365,10 @@ static void get_merge_heads(struct oid_array *merge_heads)
 
 	fp = xfopen(filename, "r");
 	while (strbuf_getline_lf(&sb, fp) != EOF) {
-		if (get_oid_hex(sb.buf, &oid))
+		const char *p;
+		if (parse_oid_hex(sb.buf, &oid, &p))
 			continue;  /* invalid line: does not start with SHA1 */
-		if (starts_with(sb.buf + GIT_SHA1_HEXSZ, "\tnot-for-merge\t"))
+		if (starts_with(p, "\tnot-for-merge\t"))
 			continue;  /* ref is not-for-merge */
 		oid_array_append(merge_heads, &oid);
 	}

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 16/31] http-push: convert to use the_hash_algo
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (14 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 15/31] builtin/pull: make hash-size independent brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 17/31] http-backend: allow 64-character hex names brian m. carlson
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Switch the lock token code to use the_hash_algo and increase its buffers
to be allocated using GIT_MAX_* constants.  Update the parsing of object
paths to use the_hash_algo as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 http-push.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/http-push.c b/http-push.c
index b313ada515..c4e5142bc9 100644
--- a/http-push.c
+++ b/http-push.c
@@ -145,7 +145,7 @@ struct remote_lock {
 	char *url;
 	char *owner;
 	char *token;
-	char tmpfile_suffix[41];
+	char tmpfile_suffix[GIT_MAX_HEXSZ + 1];
 	time_t start_time;
 	long timeout;
 	int refreshing;
@@ -399,7 +399,7 @@ static void start_put(struct transfer_request *request)
 	request->dest = strbuf_detach(&buf, NULL);
 
 	append_remote_object_url(&buf, repo->url, hex, 0);
-	strbuf_add(&buf, request->lock->tmpfile_suffix, 41);
+	strbuf_add(&buf, request->lock->tmpfile_suffix, the_hash_algo->hexsz + 1);
 	request->url = strbuf_detach(&buf, NULL);
 
 	slot = get_active_slot();
@@ -758,8 +758,8 @@ static void handle_lockprop_ctx(struct xml_ctx *ctx, int tag_closed)
 static void handle_new_lock_ctx(struct xml_ctx *ctx, int tag_closed)
 {
 	struct remote_lock *lock = (struct remote_lock *)ctx->userData;
-	git_SHA_CTX sha_ctx;
-	unsigned char lock_token_sha1[20];
+	git_hash_ctx hash_ctx;
+	unsigned char lock_token_hash[GIT_MAX_RAWSZ];
 
 	if (tag_closed && ctx->cdata) {
 		if (!strcmp(ctx->name, DAV_ACTIVELOCK_OWNER)) {
@@ -771,12 +771,12 @@ static void handle_new_lock_ctx(struct xml_ctx *ctx, int tag_closed)
 		} else if (!strcmp(ctx->name, DAV_ACTIVELOCK_TOKEN)) {
 			lock->token = xstrdup(ctx->cdata);
 
-			git_SHA1_Init(&sha_ctx);
-			git_SHA1_Update(&sha_ctx, lock->token, strlen(lock->token));
-			git_SHA1_Final(lock_token_sha1, &sha_ctx);
+			the_hash_algo->init_fn(&hash_ctx);
+			the_hash_algo->update_fn(&hash_ctx, lock->token, strlen(lock->token));
+			the_hash_algo->final_fn(lock_token_hash, &hash_ctx);
 
 			lock->tmpfile_suffix[0] = '_';
-			memcpy(lock->tmpfile_suffix + 1, sha1_to_hex(lock_token_sha1), 40);
+			memcpy(lock->tmpfile_suffix + 1, hash_to_hex(lock_token_hash), the_hash_algo->hexsz);
 		}
 	}
 }
@@ -1018,7 +1018,7 @@ static void remote_ls(const char *path, int flags,
 /* extract hex from sharded "xx/x{38}" filename */
 static int get_oid_hex_from_objpath(const char *path, struct object_id *oid)
 {
-	if (strlen(path) != GIT_SHA1_HEXSZ + 1)
+	if (strlen(path) != the_hash_algo->hexsz + 1)
 		return -1;
 
 	if (hex_to_bytes(oid->hash, path, 1))
@@ -1026,7 +1026,7 @@ static int get_oid_hex_from_objpath(const char *path, struct object_id *oid)
 	path += 2;
 	path++; /* skip '/' */
 
-	return hex_to_bytes(oid->hash + 1, path, GIT_SHA1_RAWSZ - 1);
+	return hex_to_bytes(oid->hash + 1, path, the_hash_algo->rawsz - 1);
 }
 
 static void process_ls_object(struct remote_ls_ctx *ls)

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 17/31] http-backend: allow 64-character hex names
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (15 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 16/31] http-push: convert to use the_hash_algo brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 18/31] http-push: remove remaining uses of sha1_to_hex brian m. carlson
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

In an SHA-256-backed repository using the http-backend handler for dumb
protocol clients, it may be necessary to access the raw packs using
their full SHA-256-specified names.  Allow packs and loose objects to be
accessed using their full SHA-256-specified 64-character hex names.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 http-backend.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/http-backend.c b/http-backend.c
index 29e68e38b5..ec3144b444 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -711,8 +711,11 @@ static struct service_cmd {
 	{"GET", "/objects/info/http-alternates$", get_text_file},
 	{"GET", "/objects/info/packs$", get_info_packs},
 	{"GET", "/objects/[0-9a-f]{2}/[0-9a-f]{38}$", get_loose_object},
+	{"GET", "/objects/[0-9a-f]{2}/[0-9a-f]{62}$", get_loose_object},
 	{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.pack$", get_pack_file},
+	{"GET", "/objects/pack/pack-[0-9a-f]{64}\\.pack$", get_pack_file},
 	{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file},
+	{"GET", "/objects/pack/pack-[0-9a-f]{64}\\.idx$", get_idx_file},
 
 	{"POST", "/git-upload-pack$", service_rpc},
 	{"POST", "/git-receive-pack$", service_rpc}

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 18/31] http-push: remove remaining uses of sha1_to_hex
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (16 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 17/31] http-backend: allow 64-character hex names brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 19/31] http-walker: replace sha1_to_hex brian m. carlson
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Since sha1_to_hex is limited to SHA-1, switch all remaining uses of it
in this file to hash_to_hex or oid_to_hex.  Modify update_remote to take
a pointer to struct object_id, and since we don't modify that parameter
in the function, set it to be const as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 http-push.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/http-push.c b/http-push.c
index c4e5142bc9..f675a96316 100644
--- a/http-push.c
+++ b/http-push.c
@@ -316,7 +316,7 @@ static void start_fetch_packed(struct transfer_request *request)
 	}
 
 	fprintf(stderr,	"Fetching pack %s\n",
-		sha1_to_hex(target->hash));
+		hash_to_hex(target->hash));
 	fprintf(stderr, " which contains %s\n", oid_to_hex(&request->obj->oid));
 
 	preq = new_http_pack_request(target, repo->url);
@@ -1374,7 +1374,7 @@ static int get_delta(struct rev_info *revs, struct remote_lock *lock)
 	return count;
 }
 
-static int update_remote(unsigned char *sha1, struct remote_lock *lock)
+static int update_remote(const struct object_id *oid, struct remote_lock *lock)
 {
 	struct active_request_slot *slot;
 	struct slot_results results;
@@ -1383,7 +1383,7 @@ static int update_remote(unsigned char *sha1, struct remote_lock *lock)
 
 	dav_headers = get_dav_token_headers(lock, DAV_HEADER_IF);
 
-	strbuf_addf(&out_buffer.buf, "%s\n", sha1_to_hex(sha1));
+	strbuf_addf(&out_buffer.buf, "%s\n", oid_to_hex(oid));
 
 	slot = get_active_slot();
 	slot->results = &results;
@@ -1948,7 +1948,7 @@ int cmd_main(int argc, const char **argv)
 		run_request_queue();
 
 		/* Update the remote branch if all went well */
-		if (aborted || !update_remote(ref->new_oid.hash, ref_lock))
+		if (aborted || !update_remote(&ref->new_oid, ref_lock))
 			rc = 1;
 
 		if (!rc)

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 19/31] http-walker: replace sha1_to_hex
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (17 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 18/31] http-push: remove remaining uses of sha1_to_hex brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  3:51   ` Eric Sunshine
  2019-02-12  1:22 ` [PATCH 20/31] http: replace hard-coded constant with the_hash_algo brian m. carlson
                   ` (12 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Since sha1_to_hex is limited to SHA-1, replace the uses of it in this
file with hasH_to_hex.  Rename several variables accordingly to reflect
that they are no longer limited to SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 http-walker.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/http-walker.c b/http-walker.c
index 8063896cf6..e11670eee2 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -434,9 +434,9 @@ static int http_fetch_pack(struct walker *walker, struct alt_base *repo, unsigne
 
 	if (walker->get_verbosely) {
 		fprintf(stderr, "Getting pack %s\n",
-			sha1_to_hex(target->hash));
+			hash_to_hex(target->hash));
 		fprintf(stderr, " which contains %s\n",
-			sha1_to_hex(sha1));
+			hash_to_hex(sha1));
 	}
 
 	preq = new_http_pack_request(target, repo->base);
@@ -473,9 +473,9 @@ static void abort_object_request(struct object_request *obj_req)
 	release_object_request(obj_req);
 }
 
-static int fetch_object(struct walker *walker, unsigned char *sha1)
+static int fetch_object(struct walker *walker, unsigned char *hash)
 {
-	char *hex = sha1_to_hex(sha1);
+	char *hex = hash_to_hex(hash);
 	int ret = 0;
 	struct object_request *obj_req = NULL;
 	struct http_object_request *req;
@@ -483,7 +483,7 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
 
 	list_for_each(pos, head) {
 		obj_req = list_entry(pos, struct object_request, node);
-		if (hasheq(obj_req->oid.hash, sha1))
+		if (hasheq(obj_req->oid.hash, hash))
 			break;
 	}
 	if (obj_req == NULL)
@@ -557,20 +557,20 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
 	return ret;
 }
 
-static int fetch(struct walker *walker, unsigned char *sha1)
+static int fetch(struct walker *walker, unsigned char *hash)
 {
 	struct walker_data *data = walker->data;
 	struct alt_base *altbase = data->alt;
 
-	if (!fetch_object(walker, sha1))
+	if (!fetch_object(walker, hash))
 		return 0;
 	while (altbase) {
-		if (!http_fetch_pack(walker, altbase, sha1))
+		if (!http_fetch_pack(walker, altbase, hash))
 			return 0;
 		fetch_alternates(walker, data->alt->base);
 		altbase = altbase->next;
 	}
-	return error("Unable to find %s under %s", sha1_to_hex(sha1),
+	return error("Unable to find %s under %s", hash_to_hex(hash),
 		     data->alt->base);
 }
 

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 20/31] http: replace hard-coded constant with the_hash_algo
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (18 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 19/31] http-walker: replace sha1_to_hex brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 21/31] http: compute hash of downloaded objects using the_hash_algo brian m. carlson
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Replace a hard-coded 40 with a reference to the_hash_algo.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 http.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/http.c b/http.c
index a09adc518f..993ddc956a 100644
--- a/http.c
+++ b/http.c
@@ -2065,7 +2065,7 @@ int http_fetch_ref(const char *base, struct ref *ref)
 	url = quote_ref_url(base, ref->name);
 	if (http_get_strbuf(url, &buffer, &options) == HTTP_OK) {
 		strbuf_rtrim(&buffer);
-		if (buffer.len == 40)
+		if (buffer.len == the_hash_algo->hexsz)
 			ret = get_oid_hex(buffer.buf, &ref->old_oid);
 		else if (starts_with(buffer.buf, "ref: ")) {
 			ref->symref = xstrdup(buffer.buf + 5);

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 21/31] http: compute hash of downloaded objects using the_hash_algo
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (19 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 20/31] http: replace hard-coded constant with the_hash_algo brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 22/31] http: replace sha1_to_hex brian m. carlson
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 http.c | 10 +++++-----
 http.h |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/http.c b/http.c
index 993ddc956a..458d07fabb 100644
--- a/http.c
+++ b/http.c
@@ -2337,8 +2337,8 @@ static size_t fwrite_sha1_file(char *ptr, size_t eltsize, size_t nmemb,
 		freq->stream.next_out = expn;
 		freq->stream.avail_out = sizeof(expn);
 		freq->zret = git_inflate(&freq->stream, Z_SYNC_FLUSH);
-		git_SHA1_Update(&freq->c, expn,
-				sizeof(expn) - freq->stream.avail_out);
+		the_hash_algo->update_fn(&freq->c, expn,
+					 sizeof(expn) - freq->stream.avail_out);
 	} while (freq->stream.avail_in && freq->zret == Z_OK);
 	return size;
 }
@@ -2396,7 +2396,7 @@ struct http_object_request *new_http_object_request(const char *base_url,
 
 	git_inflate_init(&freq->stream);
 
-	git_SHA1_Init(&freq->c);
+	the_hash_algo->init_fn(&freq->c);
 
 	freq->url = get_remote_object_url(base_url, hex, 0);
 
@@ -2431,7 +2431,7 @@ struct http_object_request *new_http_object_request(const char *base_url,
 	if (prev_read == -1) {
 		memset(&freq->stream, 0, sizeof(freq->stream));
 		git_inflate_init(&freq->stream);
-		git_SHA1_Init(&freq->c);
+		the_hash_algo->init_fn(&freq->c);
 		if (prev_posn>0) {
 			prev_posn = 0;
 			lseek(freq->localfile, 0, SEEK_SET);
@@ -2502,7 +2502,7 @@ int finish_http_object_request(struct http_object_request *freq)
 	}
 
 	git_inflate_end(&freq->stream);
-	git_SHA1_Final(freq->real_oid.hash, &freq->c);
+	the_hash_algo->final_fn(freq->real_oid.hash, &freq->c);
 	if (freq->zret != Z_STREAM_END) {
 		unlink_or_warn(freq->tmpfile.buf);
 		return -1;
diff --git a/http.h b/http.h
index 4eb4e808e5..10d3cfdb80 100644
--- a/http.h
+++ b/http.h
@@ -225,7 +225,7 @@ struct http_object_request {
 	long http_code;
 	struct object_id oid;
 	struct object_id real_oid;
-	git_SHA_CTX c;
+	git_hash_ctx c;
 	git_zstream stream;
 	int zret;
 	int rename;

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 22/31] http: replace sha1_to_hex
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (20 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 21/31] http: compute hash of downloaded objects using the_hash_algo brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 23/31] remote-curl: make hash size independent brian m. carlson
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Since sha1_to_hex is limited to SHA-1, replace it with hash_to_hex.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 http.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/http.c b/http.c
index 458d07fabb..8ab07458e4 100644
--- a/http.c
+++ b/http.c
@@ -2079,19 +2079,19 @@ int http_fetch_ref(const char *base, struct ref *ref)
 }
 
 /* Helpers for fetching packs */
-static char *fetch_pack_index(unsigned char *sha1, const char *base_url)
+static char *fetch_pack_index(unsigned char *hash, const char *base_url)
 {
 	char *url, *tmp;
 	struct strbuf buf = STRBUF_INIT;
 
 	if (http_is_verbose)
-		fprintf(stderr, "Getting index for pack %s\n", sha1_to_hex(sha1));
+		fprintf(stderr, "Getting index for pack %s\n", hash_to_hex(hash));
 
 	end_url_with_slash(&buf, base_url);
-	strbuf_addf(&buf, "objects/pack/pack-%s.idx", sha1_to_hex(sha1));
+	strbuf_addf(&buf, "objects/pack/pack-%s.idx", hash_to_hex(hash));
 	url = strbuf_detach(&buf, NULL);
 
-	strbuf_addf(&buf, "%s.temp", sha1_pack_index_name(sha1));
+	strbuf_addf(&buf, "%s.temp", sha1_pack_index_name(hash));
 	tmp = strbuf_detach(&buf, NULL);
 
 	if (http_get_file(url, tmp, NULL) != HTTP_OK) {
@@ -2262,7 +2262,7 @@ struct http_pack_request *new_http_pack_request(
 
 	end_url_with_slash(&buf, base_url);
 	strbuf_addf(&buf, "objects/pack/pack-%s.pack",
-		sha1_to_hex(target->hash));
+		hash_to_hex(target->hash));
 	preq->url = strbuf_detach(&buf, NULL);
 
 	strbuf_addf(&preq->tmpfile, "%s.temp", sha1_pack_name(target->hash));
@@ -2289,7 +2289,7 @@ struct http_pack_request *new_http_pack_request(
 		if (http_is_verbose)
 			fprintf(stderr,
 				"Resuming fetch of pack %s at byte %"PRIuMAX"\n",
-				sha1_to_hex(target->hash),
+				hash_to_hex(target->hash),
 				(uintmax_t)prev_posn);
 		http_opt_request_remainder(preq->slot->curl, prev_posn);
 	}

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 23/31] remote-curl: make hash size independent
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (21 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 22/31] http: replace sha1_to_hex brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12 11:11   ` Ævar Arnfjörð Bjarmason
  2019-02-12  1:22 ` [PATCH 24/31] archive-tar: " brian m. carlson
                   ` (8 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Change one hard-coded use of the constant 40 to a reference to
the_hash_algo.  In addition, switch a use of get_oid_hex to
parse_oid_hex to avoid the need to use a constant.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 remote-curl.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index bb7421023b..8395b71bbb 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -249,7 +249,7 @@ static struct ref *parse_info_refs(struct discovery *heads)
 		if (data[i] == '\t')
 			mid = &data[i];
 		if (data[i] == '\n') {
-			if (mid - start != 40)
+			if (mid - start != the_hash_algo->hexsz)
 				die("%sinfo/refs not valid: is this a git repository?",
 				    url.buf);
 			data[i] = 0;
@@ -1013,12 +1013,13 @@ static void parse_fetch(struct strbuf *buf)
 			const char *name;
 			struct ref *ref;
 			struct object_id old_oid;
+			const char *q;
 
-			if (get_oid_hex(p, &old_oid))
+			if (parse_oid_hex(p, &old_oid, &q))
 				die("protocol error: expected sha/ref, got %s'", p);
-			if (p[GIT_SHA1_HEXSZ] == ' ')
-				name = p + GIT_SHA1_HEXSZ + 1;
-			else if (!p[GIT_SHA1_HEXSZ])
+			if (*q == ' ')
+				name = q + 1;
+			else if (!*q)
 				name = "";
 			else
 				die("protocol error: expected sha/ref, got %s'", p);

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 24/31] archive-tar: make hash size independent
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (22 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 23/31] remote-curl: make hash size independent brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  7:20   ` René Scharfe
  2019-02-12  1:22 ` [PATCH 25/31] archive: convert struct archiver_args to object_id brian m. carlson
                   ` (7 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Make the tar generation code hash size independent by using
the_hash_algo.  Make the tar parsing code compute the header value based
on the hash algorithm in use. Update a variable name and switch to
hash_to_hex.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 archive-tar.c               |  7 ++++---
 builtin/get-tar-commit-id.c | 11 +++++++++--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index 4aabd566fb..a5ba55c11e 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -326,14 +326,15 @@ static int write_tar_entry(struct archiver_args *args,
 
 static void write_global_extended_header(struct archiver_args *args)
 {
-	const unsigned char *sha1 = args->commit_sha1;
+	const unsigned char *hash = args->commit_sha1;
 	struct strbuf ext_header = STRBUF_INIT;
 	struct ustar_header header;
 	unsigned int mode;
 
-	if (sha1)
+	if (hash)
 		strbuf_append_ext_header(&ext_header, "comment",
-					 sha1_to_hex(sha1), 40);
+					 hash_to_hex(hash),
+					 the_hash_algo->hexsz);
 	if (args->time > USTAR_MAX_MTIME) {
 		strbuf_append_ext_header_uint(&ext_header, "mtime",
 					      args->time);
diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
index 2706fcfaf2..2760549e91 100644
--- a/builtin/get-tar-commit-id.c
+++ b/builtin/get-tar-commit-id.c
@@ -5,6 +5,7 @@
 #include "commit.h"
 #include "tar.h"
 #include "builtin.h"
+#include "strbuf.h"
 #include "quote.h"
 
 static const char builtin_get_tar_commit_id_usage[] =
@@ -21,6 +22,8 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix)
 	char *content = buffer + RECORDSIZE;
 	const char *comment;
 	ssize_t n;
+	char *hdrprefix;
+	int ret;
 
 	if (argc != 1)
 		usage(builtin_get_tar_commit_id_usage);
@@ -32,10 +35,14 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix)
 		die_errno("git get-tar-commit-id: EOF before reading tar header");
 	if (header->typeflag[0] != 'g')
 		return 1;
-	if (!skip_prefix(content, "52 comment=", &comment))
+
+	hdrprefix = xstrfmt("%zu comment=", the_hash_algo->hexsz + strlen(" comment=") + 2 + 1);
+	ret = skip_prefix(content, hdrprefix, &comment);
+	free(hdrprefix);
+	if (!ret)
 		return 1;
 
-	if (write_in_full(1, comment, 41) < 0)
+	if (write_in_full(1, comment, the_hash_algo->hexsz + 1) < 0)
 		die_errno("git get-tar-commit-id: write error");
 
 	return 0;

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 25/31] archive: convert struct archiver_args to object_id
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (23 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 24/31] archive-tar: " brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 26/31] refspec: make hash size independent brian m. carlson
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Change the commit_sha1 member to be called "commit_oid" and change it to
be a pointer to struct object_id.  Additionally, update two uses of
GIT_SHA1_HEXSZ to use the_hash_algo instead.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 archive-tar.c |  6 +++---
 archive-zip.c | 10 +++++-----
 archive.c     |  8 ++++----
 archive.h     |  2 +-
 4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index a5ba55c11e..3e53aac1e6 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -326,14 +326,14 @@ static int write_tar_entry(struct archiver_args *args,
 
 static void write_global_extended_header(struct archiver_args *args)
 {
-	const unsigned char *hash = args->commit_sha1;
+	const struct object_id *oid = args->commit_oid;
 	struct strbuf ext_header = STRBUF_INIT;
 	struct ustar_header header;
 	unsigned int mode;
 
-	if (hash)
+	if (oid)
 		strbuf_append_ext_header(&ext_header, "comment",
-					 hash_to_hex(hash),
+					 oid_to_hex(oid),
 					 the_hash_algo->hexsz);
 	if (args->time > USTAR_MAX_MTIME) {
 		strbuf_append_ext_header_uint(&ext_header, "mtime",
diff --git a/archive-zip.c b/archive-zip.c
index 155ee4a779..4d66b5be6e 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -577,7 +577,7 @@ static void write_zip64_trailer(void)
 	write_or_die(1, &locator64, ZIP64_DIR_TRAILER_LOCATOR_SIZE);
 }
 
-static void write_zip_trailer(const unsigned char *sha1)
+static void write_zip_trailer(const struct object_id *oid)
 {
 	struct zip_dir_trailer trailer;
 	int clamped = 0;
@@ -590,14 +590,14 @@ static void write_zip_trailer(const unsigned char *sha1)
 	copy_le16_clamp(trailer.entries, zip_dir_entries, &clamped);
 	copy_le32(trailer.size, zip_dir.len);
 	copy_le32_clamp(trailer.offset, zip_offset, &clamped);
-	copy_le16(trailer.comment_length, sha1 ? GIT_SHA1_HEXSZ : 0);
+	copy_le16(trailer.comment_length, oid ? the_hash_algo->hexsz : 0);
 
 	write_or_die(1, zip_dir.buf, zip_dir.len);
 	if (clamped)
 		write_zip64_trailer();
 	write_or_die(1, &trailer, ZIP_DIR_TRAILER_SIZE);
-	if (sha1)
-		write_or_die(1, sha1_to_hex(sha1), GIT_SHA1_HEXSZ);
+	if (oid)
+		write_or_die(1, oid_to_hex(oid), the_hash_algo->hexsz);
 }
 
 static void dos_time(timestamp_t *timestamp, int *dos_date, int *dos_time)
@@ -635,7 +635,7 @@ static int write_zip_archive(const struct archiver *ar,
 
 	err = write_archive_entries(args, write_zip_entry);
 	if (!err)
-		write_zip_trailer(args->commit_sha1);
+		write_zip_trailer(args->commit_oid);
 
 	strbuf_release(&zip_dir);
 
diff --git a/archive.c b/archive.c
index 1f98324a93..f2c78a2712 100644
--- a/archive.c
+++ b/archive.c
@@ -380,7 +380,7 @@ static void parse_treeish_arg(const char **argv,
 		int remote)
 {
 	const char *name = argv[0];
-	const unsigned char *commit_sha1;
+	const struct object_id *commit_oid;
 	time_t archive_time;
 	struct tree *tree;
 	const struct commit *commit;
@@ -402,10 +402,10 @@ static void parse_treeish_arg(const char **argv,
 
 	commit = lookup_commit_reference_gently(ar_args->repo, &oid, 1);
 	if (commit) {
-		commit_sha1 = commit->object.oid.hash;
+		commit_oid = &commit->object.oid;
 		archive_time = commit->date;
 	} else {
-		commit_sha1 = NULL;
+		commit_oid = NULL;
 		archive_time = time(NULL);
 	}
 
@@ -426,7 +426,7 @@ static void parse_treeish_arg(const char **argv,
 		tree = parse_tree_indirect(&tree_oid);
 	}
 	ar_args->tree = tree;
-	ar_args->commit_sha1 = commit_sha1;
+	ar_args->commit_oid = commit_oid;
 	ar_args->commit = commit;
 	ar_args->time = archive_time;
 }
diff --git a/archive.h b/archive.h
index 21ac010699..dd022a6b46 100644
--- a/archive.h
+++ b/archive.h
@@ -11,7 +11,7 @@ struct archiver_args {
 	const char *base;
 	size_t baselen;
 	struct tree *tree;
-	const unsigned char *commit_sha1;
+	const struct object_id *commit_oid;
 	const struct commit *commit;
 	timestamp_t time;
 	struct pathspec pathspec;

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 26/31] refspec: make hash size independent
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (24 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 25/31] archive: convert struct archiver_args to object_id brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 27/31] builtin/difftool: use parse_oid_hex brian m. carlson
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Switch a use of GIT_SHA1_HEXSZ to use the_hash_algo.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 refspec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refspec.c b/refspec.c
index f529092fd6..9a9bf21934 100644
--- a/refspec.c
+++ b/refspec.c
@@ -72,7 +72,7 @@ static int parse_refspec(struct refspec_item *item, const char *refspec, int fet
 		/* LHS */
 		if (!*item->src)
 			; /* empty is ok; it means "HEAD" */
-		else if (llen == GIT_SHA1_HEXSZ && !get_oid_hex(item->src, &unused))
+		else if (llen == the_hash_algo->hexsz && !get_oid_hex(item->src, &unused))
 			item->exact_sha1 = 1; /* ok */
 		else if (!check_refname_format(item->src, flags))
 			; /* valid looking ref is ok */

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 27/31] builtin/difftool: use parse_oid_hex
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (25 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 26/31] refspec: make hash size independent brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  8:27   ` Eric Sunshine
  2019-02-12  1:22 ` [PATCH 28/31] dir: make untracked cache extension hash size independent brian m. carlson
                   ` (4 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Instead of using get_oid_hex and adding constants to the result, use
parse_oid_hex to make this code independent of the hash size.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/difftool.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/builtin/difftool.c b/builtin/difftool.c
index a3ea60ea71..1b648226dc 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -65,14 +65,12 @@ static int parse_index_info(char *p, int *mode1, int *mode2,
 	*mode2 = (int)strtol(p + 1, &p, 8);
 	if (*p != ' ')
 		return error("expected ' ', got '%c'", *p);
-	if (get_oid_hex(++p, oid1))
+	if (parse_oid_hex(++p, oid1, (const char **)&p))
 		return error("expected object ID, got '%s'", p + 1);
-	p += GIT_SHA1_HEXSZ;
 	if (*p != ' ')
 		return error("expected ' ', got '%c'", *p);
-	if (get_oid_hex(++p, oid2))
+	if (parse_oid_hex(++p, oid2, (const char **)&p))
 		return error("expected object ID, got '%s'", p + 1);
-	p += GIT_SHA1_HEXSZ;
 	if (*p != ' ')
 		return error("expected ' ', got '%c'", *p);
 	*status = *++p;

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 28/31] dir: make untracked cache extension hash size independent
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (26 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 27/31] builtin/difftool: use parse_oid_hex brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12 11:08   ` Ævar Arnfjörð Bjarmason
  2019-02-12  1:22 ` [PATCH 29/31] read-cache: read data in a hash-independent way brian m. carlson
                   ` (3 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Instead of using a struct with a flex array member to read and write the
untracked cache extension, use a shorter, fixed-length struct and add
the name and hash data explicitly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 dir.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/dir.c b/dir.c
index b2cabadf25..7503b56918 100644
--- a/dir.c
+++ b/dir.c
@@ -2545,13 +2545,9 @@ struct ondisk_untracked_cache {
 	struct stat_data info_exclude_stat;
 	struct stat_data excludes_file_stat;
 	uint32_t dir_flags;
-	unsigned char info_exclude_sha1[20];
-	unsigned char excludes_file_sha1[20];
-	char exclude_per_dir[FLEX_ARRAY];
 };
 
 #define ouc_offset(x) offsetof(struct ondisk_untracked_cache, x)
-#define ouc_size(len) (ouc_offset(exclude_per_dir) + len + 1)
 
 struct write_data {
 	int index;	   /* number of written untracked_cache_dir */
@@ -2634,20 +2630,21 @@ void write_untracked_extension(struct strbuf *out, struct untracked_cache *untra
 	struct write_data wd;
 	unsigned char varbuf[16];
 	int varint_len;
-	size_t len = strlen(untracked->exclude_per_dir);
+	const unsigned hashsz = the_hash_algo->rawsz;
 
-	FLEX_ALLOC_MEM(ouc, exclude_per_dir, untracked->exclude_per_dir, len);
+	ouc = xcalloc(1, sizeof(*ouc));
 	stat_data_to_disk(&ouc->info_exclude_stat, &untracked->ss_info_exclude.stat);
 	stat_data_to_disk(&ouc->excludes_file_stat, &untracked->ss_excludes_file.stat);
-	hashcpy(ouc->info_exclude_sha1, untracked->ss_info_exclude.oid.hash);
-	hashcpy(ouc->excludes_file_sha1, untracked->ss_excludes_file.oid.hash);
 	ouc->dir_flags = htonl(untracked->dir_flags);
 
 	varint_len = encode_varint(untracked->ident.len, varbuf);
 	strbuf_add(out, varbuf, varint_len);
 	strbuf_addbuf(out, &untracked->ident);
 
-	strbuf_add(out, ouc, ouc_size(len));
+	strbuf_add(out, ouc, sizeof(*ouc));
+	strbuf_add(out, untracked->ss_info_exclude.oid.hash, hashsz);
+	strbuf_add(out, untracked->ss_excludes_file.oid.hash, hashsz);
+	strbuf_add(out, untracked->exclude_per_dir, strlen(untracked->exclude_per_dir) + 1);
 	FREE_AND_NULL(ouc);
 
 	if (!untracked->root) {
@@ -2834,6 +2831,9 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 	int ident_len;
 	ssize_t len;
 	const char *exclude_per_dir;
+	const unsigned hashsz = the_hash_algo->rawsz;
+	const unsigned offset = sizeof(struct ondisk_untracked_cache);
+	const unsigned exclude_per_dir_offset = offset + 2 * hashsz;
 
 	if (sz <= 1 || end[-1] != '\0')
 		return NULL;
@@ -2845,7 +2845,7 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 	ident = (const char *)next;
 	next += ident_len;
 
-	if (next + ouc_size(0) > end)
+	if (next + exclude_per_dir_offset + 1 > end)
 		return NULL;
 
 	uc = xcalloc(1, sizeof(*uc));
@@ -2853,15 +2853,15 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 	strbuf_add(&uc->ident, ident, ident_len);
 	load_oid_stat(&uc->ss_info_exclude,
 		      next + ouc_offset(info_exclude_stat),
-		      next + ouc_offset(info_exclude_sha1));
+		      next + offset);
 	load_oid_stat(&uc->ss_excludes_file,
 		      next + ouc_offset(excludes_file_stat),
-		      next + ouc_offset(excludes_file_sha1));
+		      next + offset + hashsz);
 	uc->dir_flags = get_be32(next + ouc_offset(dir_flags));
-	exclude_per_dir = (const char *)next + ouc_offset(exclude_per_dir);
+	exclude_per_dir = (const char *)next + exclude_per_dir_offset;
 	uc->exclude_per_dir = xstrdup(exclude_per_dir);
 	/* NUL after exclude_per_dir is covered by sizeof(*ouc) */
-	next += ouc_size(strlen(exclude_per_dir));
+	next += exclude_per_dir_offset + strlen(exclude_per_dir) + 1;
 	if (next >= end)
 		goto done2;
 

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 29/31] read-cache: read data in a hash-independent way
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (27 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 28/31] dir: make untracked cache extension hash size independent brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12  1:22 ` [PATCH 30/31] Git.pm: make hash size independent brian m. carlson
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Index entries are structured with a variety of fields up front, followed
by a hash and one or two flags fields.  Because the hash field is stored
in the middle of the structure, it's difficult to use one fixed-size
structure that easily allows access to the hash and flags fields.
Adjust the structure to hold the maximum amount of data that may be
needed using a member called "data" and read and write this field
independently in the various places that need to read and write the
structure.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 read-cache.c | 74 ++++++++++++++++++++--------------------------------
 1 file changed, 29 insertions(+), 45 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 0e0c93edc9..d9f12c568f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1634,39 +1634,24 @@ struct ondisk_cache_entry {
 	uint32_t uid;
 	uint32_t gid;
 	uint32_t size;
-	unsigned char sha1[20];
-	uint16_t flags;
-	char name[FLEX_ARRAY]; /* more */
-};
-
-/*
- * This struct is used when CE_EXTENDED bit is 1
- * The struct must match ondisk_cache_entry exactly from
- * ctime till flags
- */
-struct ondisk_cache_entry_extended {
-	struct cache_time ctime;
-	struct cache_time mtime;
-	uint32_t dev;
-	uint32_t ino;
-	uint32_t mode;
-	uint32_t uid;
-	uint32_t gid;
-	uint32_t size;
-	unsigned char sha1[20];
-	uint16_t flags;
-	uint16_t flags2;
-	char name[FLEX_ARRAY]; /* more */
+	/*
+	 * unsigned char hash[hashsz];
+	 * uint16_t flags;
+	 * if (flags & CE_EXTENDED)
+	 *	uint16_t flags2;
+	 */
+	unsigned char data[GIT_MAX_RAWSZ + 2 * sizeof(uint16_t)];
+	char name[FLEX_ARRAY];
 };
 
 /* These are only used for v3 or lower */
 #define align_padding_size(size, len) ((size + (len) + 8) & ~7) - (size + len)
-#define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,name) + (len) + 8) & ~7)
+#define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,data) + (len) + 8) & ~7)
 #define ondisk_cache_entry_size(len) align_flex_name(ondisk_cache_entry,len)
-#define ondisk_cache_entry_extended_size(len) align_flex_name(ondisk_cache_entry_extended,len)
-#define ondisk_ce_size(ce) (((ce)->ce_flags & CE_EXTENDED) ? \
-			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
-			    ondisk_cache_entry_size(ce_namelen(ce)))
+#define ondisk_data_size(flags, len) (the_hash_algo->rawsz + \
+				     ((flags & CE_EXTENDED) ? 2 : 1) * sizeof(uint16_t) + len)
+#define ondisk_data_size_max(len) (ondisk_data_size(CE_EXTENDED, len))
+#define ondisk_ce_size(ce) (ondisk_cache_entry_size(ondisk_data_size((ce)->ce_flags, ce_namelen(ce))))
 
 /* Allow fsck to force verification of the index checksum. */
 int verify_index_checksum;
@@ -1740,6 +1725,8 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 	struct cache_entry *ce;
 	size_t len;
 	const char *name;
+	const unsigned hashsz = the_hash_algo->rawsz;
+	const uint16_t *flagsp = (const uint16_t *)(ondisk->data + hashsz);
 	unsigned int flags;
 	size_t copy_len = 0;
 	/*
@@ -1752,22 +1739,20 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 	int expand_name_field = version == 4;
 
 	/* On-disk flags are just 16 bits */
-	flags = get_be16(&ondisk->flags);
+	flags = get_be16(flagsp);
 	len = flags & CE_NAMEMASK;
 
 	if (flags & CE_EXTENDED) {
-		struct ondisk_cache_entry_extended *ondisk2;
 		int extended_flags;
-		ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;
-		extended_flags = get_be16(&ondisk2->flags2) << 16;
+		extended_flags = get_be16(flagsp + 1) << 16;
 		/* We do not yet understand any bit out of CE_EXTENDED_FLAGS */
 		if (extended_flags & ~CE_EXTENDED_FLAGS)
 			die(_("unknown index entry format 0x%08x"), extended_flags);
 		flags |= extended_flags;
-		name = ondisk2->name;
+		name = (const char *)(flagsp + 2);
 	}
 	else
-		name = ondisk->name;
+		name = (const char *)(flagsp + 1);
 
 	if (expand_name_field) {
 		const unsigned char *cp = (const unsigned char *)name;
@@ -1806,7 +1791,9 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 	ce->ce_flags = flags & ~CE_NAMEMASK;
 	ce->ce_namelen = len;
 	ce->index = 0;
-	hashcpy(ce->oid.hash, ondisk->sha1);
+	hashcpy(ce->oid.hash, ondisk->data);
+	memcpy(ce->name, name, len);
+	ce->name[len] = '\0';
 
 	if (expand_name_field) {
 		if (copy_len)
@@ -2528,6 +2515,8 @@ static void copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk,
 				       struct cache_entry *ce)
 {
 	short flags;
+	const unsigned hashsz = the_hash_algo->rawsz;
+	uint16_t *flagsp = (uint16_t *)(ondisk->data + hashsz);
 
 	ondisk->ctime.sec = htonl(ce->ce_stat_data.sd_ctime.sec);
 	ondisk->mtime.sec = htonl(ce->ce_stat_data.sd_mtime.sec);
@@ -2539,15 +2528,13 @@ static void copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk,
 	ondisk->uid  = htonl(ce->ce_stat_data.sd_uid);
 	ondisk->gid  = htonl(ce->ce_stat_data.sd_gid);
 	ondisk->size = htonl(ce->ce_stat_data.sd_size);
-	hashcpy(ondisk->sha1, ce->oid.hash);
+	hashcpy(ondisk->data, ce->oid.hash);
 
 	flags = ce->ce_flags & ~CE_NAMEMASK;
 	flags |= (ce_namelen(ce) >= CE_NAMEMASK ? CE_NAMEMASK : ce_namelen(ce));
-	ondisk->flags = htons(flags);
+	flagsp[0] = htons(flags);
 	if (ce->ce_flags & CE_EXTENDED) {
-		struct ondisk_cache_entry_extended *ondisk2;
-		ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;
-		ondisk2->flags2 = htons((ce->ce_flags & CE_EXTENDED_FLAGS) >> 16);
+		flagsp[1] = htons((ce->ce_flags & CE_EXTENDED_FLAGS) >> 16);
 	}
 }
 
@@ -2566,10 +2553,7 @@ static int ce_write_entry(git_hash_ctx *c, int fd, struct cache_entry *ce,
 		stripped_name = 1;
 	}
 
-	if (ce->ce_flags & CE_EXTENDED)
-		size = offsetof(struct ondisk_cache_entry_extended, name);
-	else
-		size = offsetof(struct ondisk_cache_entry, name);
+	size = offsetof(struct ondisk_cache_entry,data) + ondisk_data_size(ce->ce_flags, 0);
 
 	if (!previous_name) {
 		int len = ce_namelen(ce);
@@ -2727,7 +2711,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	struct cache_entry **cache = istate->cache;
 	int entries = istate->cache_nr;
 	struct stat st;
-	struct ondisk_cache_entry_extended ondisk;
+	struct ondisk_cache_entry ondisk;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 	int drop_cache_tree = istate->drop_cache_tree;
 	off_t offset;

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 30/31] Git.pm: make hash size independent
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (28 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 29/31] read-cache: read data in a hash-independent way brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12 10:59   ` Ævar Arnfjörð Bjarmason
  2019-02-12  1:22 ` [PATCH 31/31] gitweb: " brian m. carlson
  2019-02-12 11:15 ` [PATCH 00/31] Hash function transition part 16 Ævar Arnfjörð Bjarmason
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

The cat_blob function was matching on exactly 40 hex characters.  This
won't work with SHA-256, which uses 64-character hex object IDs.  While
it should be fine to simply match any number of hex characters since the
output is space delimited, be extra safe by matching either exactly 40
or exactly 64 hex characters.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 perl/Git.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/perl/Git.pm b/perl/Git.pm
index d856930b2e..62c472e0ce 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -980,7 +980,7 @@ sub cat_blob {
 		return -1;
 	}
 
-	if ($description !~ /^[0-9a-fA-F]{40} \S+ (\d+)$/) {
+	if ($description !~ /^[0-9a-fA-F]{40}(?:[0-9a-fA-F]{24})? \S+ (\d+)$/) {
 		carp "Unexpected result returned from git cat-file";
 		return -1;
 	}

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 31/31] gitweb: make hash size independent
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (29 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 30/31] Git.pm: make hash size independent brian m. carlson
@ 2019-02-12  1:22 ` brian m. carlson
  2019-02-12 10:57   ` Ævar Arnfjörð Bjarmason
  2019-02-12 11:15 ` [PATCH 00/31] Hash function transition part 16 Ævar Arnfjörð Bjarmason
  31 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:22 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Duy Nguyen

Gitweb has several hard-coded 40 values throughout it to check for
values that are passed in or acquired from Git.  To simplify the code,
introduce a regex variable that matches either exactly 40 or exactly 64
hex characters, and use this variable anywhere we would have previously
hard-coded a 40 in a regex.

Similarly, switch the code that looks for deleted diffinfo information
to look for either 40 or 64 zeros, and update one piece of code to use
this function.  Finally, when formatting a log line, allow an
abbreviated describe output to contain up to 64 characters.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 gitweb/gitweb.perl | 63 ++++++++++++++++++++++++----------------------
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2594a4badb..aec8ca3256 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -788,6 +788,9 @@ sub check_loadavg {
 # ======================================================================
 # input validation and dispatch
 
+# A regex matching a valid object ID.
+our $oid_regex = qr/(?:[0-9a-fA-F]{40}(?:[0-9a-fA-F]{24})?)/;
+
 # input parameters can be collected from a variety of sources (presently, CGI
 # and PATH_INFO), so we define an %input_params hash that collects them all
 # together during validation: this allows subsequent uses (e.g. href()) to be
@@ -1516,7 +1519,7 @@ sub is_valid_refname {
 
 	return undef unless defined $input;
 	# textual hashes are O.K.
-	if ($input =~ m/^[0-9a-fA-F]{40}$/) {
+	if ($input =~ m/^$oid_regex$/) {
 		return 1;
 	}
 	# it must be correct pathname
@@ -2037,10 +2040,10 @@ sub format_log_line_html {
             (?<!-) # see strbuf_check_tag_ref(). Tags can't start with -
             [A-Za-z0-9.-]+
             (?!\.) # refs can't end with ".", see check_refname_format()
-            -g[0-9a-fA-F]{7,40}
+            -g[0-9a-fA-F]{7,64}
             |
             # Just a normal looking Git SHA1
-            [0-9a-fA-F]{7,40}
+            [0-9a-fA-F]{7,64}
         )
         \b
     }{
@@ -2286,7 +2289,7 @@ sub format_extended_diff_header_line {
 		         ')</span>';
 	}
 	# match <hash>
-	if ($line =~ m/^index [0-9a-fA-F]{40},[0-9a-fA-F]{40}/) {
+	if ($line =~ m/^index $oid_regex,$oid_regex/) {
 		# can match only for combined diff
 		$line = 'index ';
 		for (my $i = 0; $i < $diffinfo->{'nparents'}; $i++) {
@@ -2308,7 +2311,7 @@ sub format_extended_diff_header_line {
 			$line .= '0' x 7;
 		}
 
-	} elsif ($line =~ m/^index [0-9a-fA-F]{40}..[0-9a-fA-F]{40}/) {
+	} elsif ($line =~ m/^index $oid_regex..$oid_regex/) {
 		# can match only for ordinary diff
 		my ($from_link, $to_link);
 		if ($from->{'href'}) {
@@ -2834,7 +2837,7 @@ sub git_get_hash_by_path {
 	}
 
 	#'100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa	panic.c'
-	$line =~ m/^([0-9]+) (.+) ([0-9a-fA-F]{40})\t/;
+	$line =~ m/^([0-9]+) (.+) ($oid_regex)\t/;
 	if (defined $type && $type ne $2) {
 		# type doesn't match
 		return undef;
@@ -3333,7 +3336,7 @@ sub git_get_references {
 
 	while (my $line = <$fd>) {
 		chomp $line;
-		if ($line =~ m!^([0-9a-fA-F]{40})\srefs/($type.*)$!) {
+		if ($line =~ m!^($oid_regex)\srefs/($type.*)$!) {
 			if (defined $refs{$1}) {
 				push @{$refs{$1}}, $2;
 			} else {
@@ -3407,7 +3410,7 @@ sub parse_tag {
 	$tag{'id'} = $tag_id;
 	while (my $line = <$fd>) {
 		chomp $line;
-		if ($line =~ m/^object ([0-9a-fA-F]{40})$/) {
+		if ($line =~ m/^object ($oid_regex)$/) {
 			$tag{'object'} = $1;
 		} elsif ($line =~ m/^type (.+)$/) {
 			$tag{'type'} = $1;
@@ -3451,15 +3454,15 @@ sub parse_commit_text {
 	}
 
 	my $header = shift @commit_lines;
-	if ($header !~ m/^[0-9a-fA-F]{40}/) {
+	if ($header !~ m/^$oid_regex/) {
 		return;
 	}
 	($co{'id'}, my @parents) = split ' ', $header;
 	while (my $line = shift @commit_lines) {
 		last if $line eq "\n";
-		if ($line =~ m/^tree ([0-9a-fA-F]{40})$/) {
+		if ($line =~ m/^tree ($oid_regex)$/) {
 			$co{'tree'} = $1;
-		} elsif ((!defined $withparents) && ($line =~ m/^parent ([0-9a-fA-F]{40})$/)) {
+		} elsif ((!defined $withparents) && ($line =~ m/^parent ($oid_regex)$/)) {
 			push @parents, $1;
 		} elsif ($line =~ m/^author (.*) ([0-9]+) (.*)$/) {
 			$co{'author'} = to_utf8($1);
@@ -3591,7 +3594,7 @@ sub parse_difftree_raw_line {
 
 	# ':100644 100644 03b218260e99b78c6df0ed378e59ed9205ccc96d 3b93d5e7cc7f7dd4ebed13a5cc1a4ad976fc94d8 M	ls-files.c'
 	# ':100644 100644 7f9281985086971d3877aca27704f2aaf9c448ce bc190ebc71bbd923f2b728e505408f5e54bd073a M	rev-tree.c'
-	if ($line =~ m/^:([0-7]{6}) ([0-7]{6}) ([0-9a-fA-F]{40}) ([0-9a-fA-F]{40}) (.)([0-9]{0,3})\t(.*)$/) {
+	if ($line =~ m/^:([0-7]{6}) ([0-7]{6}) ($oid_regex) ($oid_regex) (.)([0-9]{0,3})\t(.*)$/) {
 		$res{'from_mode'} = $1;
 		$res{'to_mode'} = $2;
 		$res{'from_id'} = $3;
@@ -3606,7 +3609,7 @@ sub parse_difftree_raw_line {
 	}
 	# '::100755 100755 100755 60e79ca1b01bc8b057abe17ddab484699a7f5fdb 94067cc5f73388f33722d52ae02f44692bc07490 94067cc5f73388f33722d52ae02f44692bc07490 MR	git-gui/git-gui.sh'
 	# combined diff (for merge commit)
-	elsif ($line =~ s/^(::+)((?:[0-7]{6} )+)((?:[0-9a-fA-F]{40} )+)([a-zA-Z]+)\t(.*)$//) {
+	elsif ($line =~ s/^(::+)((?:[0-7]{6} )+)((?:$oid_regex )+)([a-zA-Z]+)\t(.*)$//) {
 		$res{'nparents'}  = length($1);
 		$res{'from_mode'} = [ split(' ', $2) ];
 		$res{'to_mode'} = pop @{$res{'from_mode'}};
@@ -3616,7 +3619,7 @@ sub parse_difftree_raw_line {
 		$res{'to_file'} = unquote($5);
 	}
 	# 'c512b523472485aef4fff9e57b229d9d243c967f'
-	elsif ($line =~ m/^([0-9a-fA-F]{40})$/) {
+	elsif ($line =~ m/^($oid_regex)$/) {
 		$res{'commit'} = $1;
 	}
 
@@ -3644,7 +3647,7 @@ sub parse_ls_tree_line {
 
 	if ($opts{'-l'}) {
 		#'100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa   16717	panic.c'
-		$line =~ m/^([0-9]+) (.+) ([0-9a-fA-F]{40}) +(-|[0-9]+)\t(.+)$/s;
+		$line =~ m/^([0-9]+) (.+) ($oid_regex) +(-|[0-9]+)\t(.+)$/s;
 
 		$res{'mode'} = $1;
 		$res{'type'} = $2;
@@ -3657,7 +3660,7 @@ sub parse_ls_tree_line {
 		}
 	} else {
 		#'100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa	panic.c'
-		$line =~ m/^([0-9]+) (.+) ([0-9a-fA-F]{40})\t(.+)$/s;
+		$line =~ m/^([0-9]+) (.+) ($oid_regex)\t(.+)$/s;
 
 		$res{'mode'} = $1;
 		$res{'type'} = $2;
@@ -4799,7 +4802,7 @@ sub fill_from_file_info {
 sub is_deleted {
 	my $diffinfo = shift;
 
-	return $diffinfo->{'to_id'} eq ('0' x 40);
+	return $diffinfo->{'to_id'} eq ('0' x 40) || $diffinfo->{'to_id'} eq ('0' x 64);
 }
 
 # does patch correspond to [previous] difftree raw line
@@ -6285,7 +6288,7 @@ sub git_search_changes {
 			              -class => "list subject"},
 			              chop_and_escape_str($co{'title'}, 50) . "<br/>");
 		} elsif (defined $set{'to_id'}) {
-			next if ($set{'to_id'} =~ m/^0{40}$/);
+			next if is_deleted(\%set);
 
 			print $cgi->a({-href => href(action=>"blob", hash_base=>$co{'id'},
 			                             hash=>$set{'to_id'}, file_name=>$set{'to_file'}),
@@ -6829,7 +6832,7 @@ sub git_blame_common {
 			# the header: <SHA-1> <src lineno> <dst lineno> [<lines in group>]
 			# no <lines in group> for subsequent lines in group of lines
 			my ($full_rev, $orig_lineno, $lineno, $group_size) =
-			   ($line =~ /^([0-9a-f]{40}) (\d+) (\d+)(?: (\d+))?$/);
+			   ($line =~ /^($oid_regex) (\d+) (\d+)(?: (\d+))?$/);
 			if (!exists $metainfo{$full_rev}) {
 				$metainfo{$full_rev} = { 'nprevious' => 0 };
 			}
@@ -6879,7 +6882,7 @@ sub git_blame_common {
 			}
 			# 'previous' <sha1 of parent commit> <filename at commit>
 			if (exists $meta->{'previous'} &&
-			    $meta->{'previous'} =~ /^([a-fA-F0-9]{40}) (.*)$/) {
+			    $meta->{'previous'} =~ /^($oid_regex) (.*)$/) {
 				$meta->{'parent'} = $1;
 				$meta->{'file_parent'} = unquote($2);
 			}
@@ -6996,7 +6999,7 @@ sub git_blob_plain {
 		} else {
 			die_error(400, "No file name defined");
 		}
-	} elsif ($hash =~ m/^[0-9a-fA-F]{40}$/) {
+	} elsif ($hash =~ m/^$oid_regex$/) {
 		# blobs defined by non-textual hash id's can be cached
 		$expires = "+1d";
 	}
@@ -7057,7 +7060,7 @@ sub git_blob {
 		} else {
 			die_error(400, "No file name defined");
 		}
-	} elsif ($hash =~ m/^[0-9a-fA-F]{40}$/) {
+	} elsif ($hash =~ m/^$oid_regex$/) {
 		# blobs defined by non-textual hash id's can be cached
 		$expires = "+1d";
 	}
@@ -7515,7 +7518,7 @@ sub git_commit {
 
 	# non-textual hash id's can be cached
 	my $expires;
-	if ($hash =~ m/^[0-9a-fA-F]{40}$/) {
+	if ($hash =~ m/^$oid_regex$/) {
 		$expires = "+1d";
 	}
 	my $refs = git_get_references();
@@ -7609,7 +7612,7 @@ sub git_object {
 		close $fd;
 
 		#'100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa	panic.c'
-		unless ($line && $line =~ m/^([0-9]+) (.+) ([0-9a-fA-F]{40})\t/) {
+		unless ($line && $line =~ m/^([0-9]+) (.+) ($oid_regex)\t/) {
 			die_error(404, "File or directory for given base does not exist");
 		}
 		$type = $2;
@@ -7649,7 +7652,7 @@ sub git_blobdiff {
 				or die_error(404, "Blob diff not found");
 
 		} elsif (defined $hash &&
-		         $hash =~ /[0-9a-fA-F]{40}/) {
+		         $hash =~ $oid_regex) {
 			# try to find filename from $hash
 
 			# read filtered raw output
@@ -7659,7 +7662,7 @@ sub git_blobdiff {
 			@difftree =
 				# ':100644 100644 03b21826... 3b93d5e7... M	ls-files.c'
 				# $hash == to_id
-				grep { /^:[0-7]{6} [0-7]{6} [0-9a-fA-F]{40} $hash/ }
+				grep { /^:[0-7]{6} [0-7]{6} $oid_regex $hash/ }
 				map { chomp; $_ } <$fd>;
 			close $fd
 				or die_error(404, "Reading git-diff-tree failed");
@@ -7682,8 +7685,8 @@ sub git_blobdiff {
 		$hash        ||= $diffinfo{'to_id'};
 
 		# non-textual hash id's can be cached
-		if ($hash_base =~ m/^[0-9a-fA-F]{40}$/ &&
-		    $hash_parent_base =~ m/^[0-9a-fA-F]{40}$/) {
+		if ($hash_base =~ m/^$oid_regex$/ &&
+		    $hash_parent_base =~ m/^$oid_regex$/) {
 			$expires = '+1d';
 		}
 
@@ -7819,7 +7822,7 @@ sub git_commitdiff {
 		    $hash_parent ne '-c' && $hash_parent ne '--cc') {
 			# commitdiff with two commits given
 			my $hash_parent_short = $hash_parent;
-			if ($hash_parent =~ m/^[0-9a-fA-F]{40}$/) {
+			if ($hash_parent =~ m/^$oid_regex$/) {
 				$hash_parent_short = substr($hash_parent, 0, 7);
 			}
 			$formats_nav .=
@@ -7928,7 +7931,7 @@ sub git_commitdiff {
 
 	# non-textual hash id's can be cached
 	my $expires;
-	if ($hash =~ m/^[0-9a-fA-F]{40}$/) {
+	if ($hash =~ m/^$oid_regex$/) {
 		$expires = "+1d";
 	}
 

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 08/31] notes: make hash size independent
  2019-02-12  1:22 ` [PATCH 08/31] notes: make hash size independent brian m. carlson
@ 2019-02-12  1:37   ` Eric Sunshine
  2019-02-12  1:42     ` brian m. carlson
  0 siblings, 1 reply; 56+ messages in thread
From: Eric Sunshine @ 2019-02-12  1:37 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git List, René Scharfe, Duy Nguyen

On Mon, Feb 11, 2019 at 8:23 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> Switch out various uses of the GIT_SHA1_* constants with GIT_MAX_*
> constants for allocations and the_hash_algo for general parsing.  Update
> a comment to no longer be SHA-1 specific.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> diff --git a/notes.c b/notes.c
> @@ -527,15 +529,15 @@ static unsigned char determine_fanout(struct int_node *tree, unsigned char n,
> -/* hex SHA1 + 19 * '/' + NUL */
> -#define FANOUT_PATH_MAX GIT_SHA1_HEXSZ + FANOUT_PATH_SEPARATORS + 1
> +/* hex oid + one slash between each pair + NUL */
> +#define FANOUT_PATH_MAX GIT_MAX_HEXSZ + FANOUT_PATH_SEPARATORS_MAX + 1

I had trouble understanding to what pair you are referring. I _think_
you mean "pair of hex digits". If so, perhaps:

    /* hex oid + '/' between each pair of hex digits + NUL */

(but not worth a re-roll).

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 08/31] notes: make hash size independent
  2019-02-12  1:37   ` Eric Sunshine
@ 2019-02-12  1:42     ` brian m. carlson
  0 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12  1:42 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, René Scharfe, Duy Nguyen

[-- Attachment #1: Type: text/plain, Size: 944 bytes --]

On Mon, Feb 11, 2019 at 08:37:49PM -0500, Eric Sunshine wrote:
> On Mon, Feb 11, 2019 at 8:23 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > diff --git a/notes.c b/notes.c
> > @@ -527,15 +529,15 @@ static unsigned char determine_fanout(struct int_node *tree, unsigned char n,
> > -/* hex SHA1 + 19 * '/' + NUL */
> > -#define FANOUT_PATH_MAX GIT_SHA1_HEXSZ + FANOUT_PATH_SEPARATORS + 1
> > +/* hex oid + one slash between each pair + NUL */
> > +#define FANOUT_PATH_MAX GIT_MAX_HEXSZ + FANOUT_PATH_SEPARATORS_MAX + 1
> 
> I had trouble understanding to what pair you are referring. I _think_
> you mean "pair of hex digits". If so, perhaps:
> 
>     /* hex oid + '/' between each pair of hex digits + NUL */
> 
> (but not worth a re-roll).

Yes, I did. I fully expect to do at least one reroll here, so I'll
update it when I do.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 10/31] object-store: rename and expand packed_git's sha1 member
  2019-02-12  1:22 ` [PATCH 10/31] object-store: rename and expand packed_git's sha1 member brian m. carlson
@ 2019-02-12  3:32   ` Eric Sunshine
  2019-02-14  3:33     ` brian m. carlson
  0 siblings, 1 reply; 56+ messages in thread
From: Eric Sunshine @ 2019-02-12  3:32 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git List, René Scharfe, Duy Nguyen

On Mon, Feb 11, 2019 at 8:23 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> This member is used to represent the pack checksum of the pack in
> question.  Expand this member to be GIT_MAX_RAWSZ bytes in length so it
> works with longer hashes and rename it to be "hash" instead of "sha1".
> [...]
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
> @@ -689,7 +689,7 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
>         while (pl) {
>                 printf("%s\n%s\n",
> -                      sha1_pack_index_name(pl->pack->sha1),
> +                      sha1_pack_index_name(pl->pack->hash),

I guess there is no oid_pack_index_name() function yet?

> diff --git a/http-walker.c b/http-walker.c
> @@ -434,7 +434,7 @@ static int http_fetch_pack(struct walker *walker, struct alt_base *repo, unsigne
>         if (walker->get_verbosely) {
>                 fprintf(stderr, "Getting pack %s\n",
> -                       sha1_to_hex(target->sha1));
> +                       sha1_to_hex(target->hash));

Should this have become oid_to_hex()?

>                 fprintf(stderr, " which contains %s\n",
>                         sha1_to_hex(sha1));

The patch isn't touching this sha1 yet, so this is okay(?).

> diff --git a/http.c b/http.c
> @@ -2262,10 +2262,10 @@ struct http_pack_request *new_http_pack_request(
>         strbuf_addf(&buf, "objects/pack/pack-%s.pack",
> -               sha1_to_hex(target->sha1));
> +               sha1_to_hex(target->hash));

oid_to_hex()?

> @@ -2289,7 +2289,8 @@ struct http_pack_request *new_http_pack_request(
>                         fprintf(stderr,
>                                 "Resuming fetch of pack %s at byte %"PRIuMAX"\n",
> -                               sha1_to_hex(target->sha1), (uintmax_t)prev_posn);
> +                               sha1_to_hex(target->hash),

oid_to_hex()?

> diff --git a/packfile.c b/packfile.c
> @@ -722,8 +722,8 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
>         if (path_len < the_hash_algo->hexsz ||
> -           get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->sha1))
> -               hashclr(p->sha1);
> +           get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->hash))
> +               hashclr(p->hash);

get_oid_hex()?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 12/31] fast-import: make hash-size independent
  2019-02-12  1:22 ` [PATCH 12/31] fast-import: " brian m. carlson
@ 2019-02-12  3:44   ` Eric Sunshine
  2019-02-12 23:36     ` brian m. carlson
  0 siblings, 1 reply; 56+ messages in thread
From: Eric Sunshine @ 2019-02-12  3:44 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git List, René Scharfe, Duy Nguyen

On Mon, Feb 11, 2019 at 8:23 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> Replace several uses of GIT_SHA1_HEXSZ and 40-based constants with
> references to the_hash_algo.  Update the note handling code here to
> compute path sizes based on GIT_MAX_RAWSZ as well.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> diff --git a/fast-import.c b/fast-import.c
> @@ -2047,7 +2047,8 @@ static uintmax_t do_change_note_fanout(
> -       char realpath[60];
> +       char realpath[GIT_MAX_RAWSZ * 3];

I wonder if the fixed multiplier deserves a comment explaining that
this is reserving enough space for a hex representation with '/'
between each digit pair plus NUL. Which leads to the next question: Is
there is GIT_MAX_HEXSZ constant? If so, this might be more clearly
represented (or not) by taking advantage of that value.

Also, there are a number of hardcoded 60's in the code earlier in this
file, such as:

    if ((max_packsize && (pack_size + 60 + len) > max_packsize)
        || (pack_size + 60 + len) < pack_size)
        cycle_packfile();

Is that just a coincidence or is it related to the 60 characters
allocated for 'realpath'?

> @@ -2405,7 +2406,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
>                 char *buf = read_object_with_reference(&commit_oid,
>                                                        commit_type, &size,
>                                                        &commit_oid);
> -               if (!buf || size < 46)
> +               if (!buf || size < the_hash_algo->hexsz)

What exactly did the 46 represent and how does it relate to 'hexsz'?
Stated differently, why didn't this become:

    the_hash_algo->hexsz + 6'

?

> @@ -2456,7 +2457,7 @@ static void file_change_deleteall(struct branch *b)
>  static void parse_from_commit(struct branch *b, char *buf, unsigned long size)
>  {
> -       if (!buf || size < GIT_SHA1_HEXSZ + 6)
> +       if (!buf || size < the_hash_algo->hexsz + 6)

...as it seems to have here...

> @@ -2555,7 +2556,7 @@ static struct hash_list *parse_merge(unsigned int *count)
> -                       if (!buf || size < 46)
> +                       if (!buf || size < the_hash_algo->hexsz)

...but not here.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 15/31] builtin/pull: make hash-size independent
  2019-02-12  1:22 ` [PATCH 15/31] builtin/pull: make hash-size independent brian m. carlson
@ 2019-02-12  3:47   ` Eric Sunshine
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Sunshine @ 2019-02-12  3:47 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git List, René Scharfe, Duy Nguyen

On Mon, Feb 11, 2019 at 8:23 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> Instead of using get_oid_hex and GIT_SHA1_HEXSZ, use parse_oid_hex to
> avoid the need for a constant and simplify the code.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> diff --git a/builtin/pull.c b/builtin/pull.c
> @@ -365,9 +365,10 @@ static void get_merge_heads(struct oid_array *merge_heads)
>         while (strbuf_getline_lf(&sb, fp) != EOF) {
> -               if (get_oid_hex(sb.buf, &oid))
> +               const char *p;
> +               if (parse_oid_hex(sb.buf, &oid, &p))
>                         continue;  /* invalid line: does not start with SHA1 */

While you're here, perhaps fix the comment so it doesn't talk about
SHA1 anymore.

> -               if (starts_with(sb.buf + GIT_SHA1_HEXSZ, "\tnot-for-merge\t"))
> +               if (starts_with(p, "\tnot-for-merge\t"))
>                         continue;  /* ref is not-for-merge */
>                 oid_array_append(merge_heads, &oid);
>         }

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 19/31] http-walker: replace sha1_to_hex
  2019-02-12  1:22 ` [PATCH 19/31] http-walker: replace sha1_to_hex brian m. carlson
@ 2019-02-12  3:51   ` Eric Sunshine
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Sunshine @ 2019-02-12  3:51 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git List, René Scharfe, Duy Nguyen

On Mon, Feb 11, 2019 at 8:23 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> Since sha1_to_hex is limited to SHA-1, replace the uses of it in this
> file with hasH_to_hex.  Rename several variables accordingly to reflect

s/hasH/hash/

> that they are no longer limited to SHA-1.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 04/31] pack-bitmap: replace sha1_to_hex
  2019-02-12  1:22 ` [PATCH 04/31] pack-bitmap: replace sha1_to_hex brian m. carlson
@ 2019-02-12  6:37   ` Jeff King
  2019-02-13  0:00     ` brian m. carlson
  0 siblings, 1 reply; 56+ messages in thread
From: Jeff King @ 2019-02-12  6:37 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, René Scharfe, Duy Nguyen

On Tue, Feb 12, 2019 at 01:22:29AM +0000, brian m. carlson wrote:

> Replace the uses of sha1_to_hex in the pack bitmap code with hash_to_hex
> to allow the use of SHA-256 as well.  Rename a few variables since they
> are no longer limited to SHA-1.

Sounds good, although...

> -static uint32_t find_object_pos(const unsigned char *sha1)
> +static uint32_t find_object_pos(const unsigned char *hash)

Isn't this really just a "struct object_id"? Why don't we want to use
that here?

I suspect it may be partially because our khash is storing raw sha1s.
But shouldn't we also be converted that to store object_ids?

In particular:

> @@ -181,7 +181,7 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
>  	stored->root = root;
>  	stored->xor = xor_with;
>  	stored->flags = flags;
> -	oidread(&stored->oid, sha1);
> +	oidread(&stored->oid, hash);
>  
>  	hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret);

This last line (which is actually from the previous patch) is going to
always truncate the stored data to 20 bytes, isn't it?

I think we need to define a kh_oid. We have the "set" version already in
oidset.[ch]; I think we need to make that more public.

-Peff

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 24/31] archive-tar: make hash size independent
  2019-02-12  1:22 ` [PATCH 24/31] archive-tar: " brian m. carlson
@ 2019-02-12  7:20   ` René Scharfe
  2019-02-12 17:33     ` René Scharfe
  0 siblings, 1 reply; 56+ messages in thread
From: René Scharfe @ 2019-02-12  7:20 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Duy Nguyen

Am 12.02.2019 um 02:22 schrieb brian m. carlson:
> diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
> index 2706fcfaf2..2760549e91 100644
> --- a/builtin/get-tar-commit-id.c
> +++ b/builtin/get-tar-commit-id.c
> @@ -5,6 +5,7 @@
>  #include "commit.h"
>  #include "tar.h"
>  #include "builtin.h"
> +#include "strbuf.h"
>  #include "quote.h"
>
>  static const char builtin_get_tar_commit_id_usage[] =
> @@ -21,6 +22,8 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix)
>  	char *content = buffer + RECORDSIZE;
>  	const char *comment;
>  	ssize_t n;
> +	char *hdrprefix;
> +	int ret;
>
>  	if (argc != 1)
>  		usage(builtin_get_tar_commit_id_usage);
> @@ -32,10 +35,14 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix)
>  		die_errno("git get-tar-commit-id: EOF before reading tar header");
>  	if (header->typeflag[0] != 'g')
>  		return 1;
> -	if (!skip_prefix(content, "52 comment=", &comment))
> +
> +	hdrprefix = xstrfmt("%zu comment=", the_hash_algo->hexsz + strlen(" comment=") + 2 + 1);
> +	ret = skip_prefix(content, hdrprefix, &comment);
> +	free(hdrprefix);
> +	if (!ret)
>  		return 1;
>
> -	if (write_in_full(1, comment, 41) < 0)
> +	if (write_in_full(1, comment, the_hash_algo->hexsz + 1) < 0)
>  		die_errno("git get-tar-commit-id: write error");
>
>  	return 0;

That command currently prints the pax comment in tar archives if it
looks like a SHA1 hash based on its length.  It should continue to do
so, and _also_ show longer hashes.  Your change makes it _only_ show
those longer hashes.

So it could check for all known valid hash lengths in turn, or accept
any payload length between 40 and the_hash_algo->hexsz, or loosen up
totally and show comments of any length.

René

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 27/31] builtin/difftool: use parse_oid_hex
  2019-02-12  1:22 ` [PATCH 27/31] builtin/difftool: use parse_oid_hex brian m. carlson
@ 2019-02-12  8:27   ` Eric Sunshine
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Sunshine @ 2019-02-12  8:27 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git List, René Scharfe, Duy Nguyen

On Mon, Feb 11, 2019 at 8:24 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> Instead of using get_oid_hex and adding constants to the result, use
> parse_oid_hex to make this code independent of the hash size.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> diff --git a/builtin/difftool.c b/builtin/difftool.c
> @@ -65,14 +65,12 @@ static int parse_index_info(char *p, int *mode1, int *mode2,
>         if (*p != ' ')
>                 return error("expected ' ', got '%c'", *p);
> -       if (get_oid_hex(++p, oid1))
> +       if (parse_oid_hex(++p, oid1, (const char **)&p))
>                 return error("expected object ID, got '%s'", p + 1);

Not a problem introduced by this patch, but is the 'p + 1' in the
error message correct? 'p' has already been incremented via '++p' in
the call to parse_oid_hex() to point at what _should_ be the start of
OID, so one would think that the error message would want to print out
whatever was found there rather than what was found one character
after the start of OID.

> -       if (get_oid_hex(++p, oid2))
> +       if (parse_oid_hex(++p, oid2, (const char **)&p))
>                 return error("expected object ID, got '%s'", p + 1);

Ditto.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 31/31] gitweb: make hash size independent
  2019-02-12  1:22 ` [PATCH 31/31] gitweb: " brian m. carlson
@ 2019-02-12 10:57   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 56+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-12 10:57 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, René Scharfe, Duy Nguyen


On Tue, Feb 12 2019, brian m. carlson wrote:

> Gitweb has several hard-coded 40 values throughout it to check for
> values that are passed in or acquired from Git.  To simplify the code,
> introduce a regex variable that matches either exactly 40 or exactly 64
> hex characters, and use this variable anywhere we would have previously
> hard-coded a 40 in a regex.
>
> Similarly, switch the code that looks for deleted diffinfo information
> to look for either 40 or 64 zeros, and update one piece of code to use
> this function.  Finally, when formatting a log line, allow an
> abbreviated describe output to contain up to 64 characters.

This might be going a bit overboard but I tried this with a variant
where...

> +# A regex matching a valid object ID.
> +our $oid_regex = qr/(?:[0-9a-fA-F]{40}(?:[0-9a-fA-F]{24})?)/;
> +

Instead of this dense regex I did:

    my $sha1_len = 40;
    my $sha256_extra_len = 24;
    my $sha256_len = $sha1_len + $sha256_extra_len;

    sub oid_nlen_regex {
    	my $len = shift;
    	my $hchr = qr/[0-9a-fA-F]/;
    	return qr/(?:(?:$hchr){$len})/
    }

    our $oid_regex;
    {
        my $x = oid_nlen_regex($sha1_len);
        my $y = oid_nlen_regex($sha256_extra_len);
        $oid_regex = qr/(?:$x(?:$y)?)/
    }

Then most of the rest of this is the same, e.g.:
> -	if ($input =~ m/^[0-9a-fA-F]{40}$/) {

But...

> @@ -2037,10 +2040,10 @@ sub format_log_line_html {
>              (?<!-) # see strbuf_check_tag_ref(). Tags can't start with -
>              [A-Za-z0-9.-]+
>              (?!\.) # refs can't end with ".", see check_refname_format()
> -            -g[0-9a-fA-F]{7,40}
> +            -g[0-9a-fA-F]{7,64}
>              |
>              # Just a normal looking Git SHA1
> -            [0-9a-fA-F]{7,40}
> +            [0-9a-fA-F]{7,64}
>          )
>          \b
>      }{

E.g. here we can do call oid_nlen_regex("7,64") to produce this blurb.

> -	if ($line =~ m/^index [0-9a-fA-F]{40},[0-9a-fA-F]{40}/) {
> +	if ($line =~ m/^index $oid_regex,$oid_regex/) {
> -	} elsif ($line =~ m/^index [0-9a-fA-F]{40}..[0-9a-fA-F]{40}/) {
> +	} elsif ($line =~ m/^index $oid_regex..$oid_regex/) {

And here, maybe nobody cares, but we now implicitly accept mixed SHA-1 &
SHA-256 input. Whereas we could have a helper on top of the above code
like:

    sub oid_nlen_prefix_infix_regex {
        my $nlen = shift;
        my $prefix = shift;
        my $infix = shift;

        my $rx = oid_nlen_regex($nlen);

        return qr/^\Q$prefix\E$rx\Q$infix\E$rx$/;
    }

And then e.g.:

    } elsif ($line =~ oid_nlen_prefix_infix_regex($sha1_len, "index ", "..") ||
             $line =~ oid_nlen_prefix_infix_regex($sha256_len, "index ", "..")) {

So only accept SHA1..SHA1 or SHA256..SHA256, not SHA1..SHA256 or
SHA256..SHA1.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 30/31] Git.pm: make hash size independent
  2019-02-12  1:22 ` [PATCH 30/31] Git.pm: make hash size independent brian m. carlson
@ 2019-02-12 10:59   ` Ævar Arnfjörð Bjarmason
  2019-02-18 19:09     ` brian m. carlson
  0 siblings, 1 reply; 56+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-12 10:59 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, René Scharfe, Duy Nguyen


On Tue, Feb 12 2019, brian m. carlson wrote:

> The cat_blob function was matching on exactly 40 hex characters.  This
> won't work with SHA-256, which uses 64-character hex object IDs.  While
> it should be fine to simply match any number of hex characters since the
> output is space delimited, be extra safe by matching either exactly 40
> or exactly 64 hex characters.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  perl/Git.pm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/perl/Git.pm b/perl/Git.pm
> index d856930b2e..62c472e0ce 100644
> --- a/perl/Git.pm
> +++ b/perl/Git.pm
> @@ -980,7 +980,7 @@ sub cat_blob {
>  		return -1;
>  	}
>
> -	if ($description !~ /^[0-9a-fA-F]{40} \S+ (\d+)$/) {
> +	if ($description !~ /^[0-9a-fA-F]{40}(?:[0-9a-fA-F]{24})? \S+ (\d+)$/) {
>  		carp "Unexpected result returned from git cat-file";
>  		return -1;
>  	}

The gitweb code doesn't load Git.pm now, but does anyone know a reason
for why we'd avoid any perl/* dependency in the gitweb code? If not the
regex here & in gitweb could be factored into e.g. a tiny Git::OID
library which would either just expose a $GIT::OID_REGEX, or something
like the sort of interface (might not be worth it) that I suggested in
my feedback to 31/31.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 28/31] dir: make untracked cache extension hash size independent
  2019-02-12  1:22 ` [PATCH 28/31] dir: make untracked cache extension hash size independent brian m. carlson
@ 2019-02-12 11:08   ` Ævar Arnfjörð Bjarmason
  2019-02-13  0:30     ` brian m. carlson
  0 siblings, 1 reply; 56+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-12 11:08 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, René Scharfe, Duy Nguyen

On Tue, Feb 12 2019, brian m. carlson wrote:

> Instead of using a struct with a flex array member to read and write the
> untracked cache extension, use a shorter, fixed-length struct and add
> the name and hash data explicitly.
> [...]
>  	struct stat_data info_exclude_stat;
>  	struct stat_data excludes_file_stat;
>  	uint32_t dir_flags;
> -	unsigned char info_exclude_sha1[20];
> -	unsigned char excludes_file_sha1[20];
> -	char exclude_per_dir[FLEX_ARRAY];
>  };

Both this & the follow-up 29/31 look scary since this is an on-disk
structure and this patch & the next one rather than implementing some
transition, just changes the structs & code we use to read & write to
use the current hash size.

So what are we going to do when the "current" size is SHA-256 and our
on-disk cache is SHA-1? It seems like with this we'd at best (I haven't
tested) throw away the SHA-1 UC data, and at worst introduce some silent
persistent read failure.

In any case that seems like something we should have tests for with an
on-disk format, i.e. write in one hash, see what happens when we read in
another, and perhaps instead of not understanding SHA-1 hashes in
SHA-256 mode we'd read the SHA-1 and consult the SHA-1<->SHA-256 lookup
table we're going to be eventually maintaining on the side?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 23/31] remote-curl: make hash size independent
  2019-02-12  1:22 ` [PATCH 23/31] remote-curl: make hash size independent brian m. carlson
@ 2019-02-12 11:11   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 56+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-12 11:11 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, René Scharfe, Duy Nguyen


On Tue, Feb 12 2019, brian m. carlson wrote:

> Change one hard-coded use of the constant 40 to a reference to
> the_hash_algo.  In addition, switch a use of get_oid_hex to
> parse_oid_hex to avoid the need to use a constant.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  remote-curl.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/remote-curl.c b/remote-curl.c
> index bb7421023b..8395b71bbb 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -249,7 +249,7 @@ static struct ref *parse_info_refs(struct discovery *heads)
>  		if (data[i] == '\t')
>  			mid = &data[i];
>  		if (data[i] == '\n') {
> -			if (mid - start != 40)
> +			if (mid - start != the_hash_algo->hexsz)
>  				die("%sinfo/refs not valid: is this a git repository?",
>  				    url.buf);
>  			data[i] = 0;

Similar to my comment on 28/31 I may be missing something, but this is
the part where we're parsing the ref advertisement from a remote
repository, which may e.g. be SHA-1, and our local "the hash algo" may
be SHA-256, no?

So isn't this a point where we need to accept both, and parse it into
some structure where we mark it as either/or SHA-1 or SHA-256, and late
or consult SHA-1<->SHA-256 lookup tables etc.?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 05/31] pack-bitmap: switch hard-coded constants to the_hash_algo
  2019-02-12  1:22 ` [PATCH 05/31] pack-bitmap: switch hard-coded constants to the_hash_algo brian m. carlson
@ 2019-02-12 11:13   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 56+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-12 11:13 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, René Scharfe, Duy Nguyen


On Tue, Feb 12 2019, brian m. carlson wrote:

> Switch two hard-coded uses of 20 to references to the_hash_algo.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  pack-bitmap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 6d6fa68563..603492c237 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -138,7 +138,7 @@ static int load_bitmap_header(struct bitmap_index *index)
>  {
>  	struct bitmap_disk_header *header = (void *)index->map;
>
> -	if (index->map_size < sizeof(*header) + 20)
> +	if (index->map_size < sizeof(*header) + the_hash_algo->rawsz)
>  		return error("Corrupted bitmap index (missing header data)");
>
>  	if (memcmp(header->magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)) != 0)
> @@ -157,7 +157,7 @@ static int load_bitmap_header(struct bitmap_index *index)
>  				"(Git requires BITMAP_OPT_FULL_DAG)");
>
>  		if (flags & BITMAP_OPT_HASH_CACHE) {
> -			unsigned char *end = index->map + index->map_size - 20;
> +			unsigned char *end = index->map + index->map_size - the_hash_algo->rawsz;
>  			index->hashes = ((uint32_t *)end) - index->pack->num_objects;
>  		}
>  	}

Similarly to my other comments upthread, what happens in a repo where we
have a SHA-1 bitmap file & a SHA-1<->SHA-256 mapping?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 00/31] Hash function transition part 16
  2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
                   ` (30 preceding siblings ...)
  2019-02-12  1:22 ` [PATCH 31/31] gitweb: " brian m. carlson
@ 2019-02-12 11:15 ` Ævar Arnfjörð Bjarmason
  31 siblings, 0 replies; 56+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-12 11:15 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, René Scharfe, Duy Nguyen


On Tue, Feb 12 2019, brian m. carlson wrote:

> This is the sixteenth series of hash function transition patches. This
> series contains various fixes, mostly focused around the pack bitmap
> code, the HTTP code, the archive code, the index, and parts of our Perl
> code.
>
> This is the second to last series required for a "stage 0" Git; that is,

I skimmed most of this, but decided to stop when I came to "am" since I
was just going to start repeating the same question I had in other
patches, i.e. for the parts of this that deal with on-disk formats how
does just e.g. search-replacing s/40/64/ interact with needing to read
existing files (bitmaps, "am" patches, untracked cache etc.) which may
be in the "old" format.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 24/31] archive-tar: make hash size independent
  2019-02-12  7:20   ` René Scharfe
@ 2019-02-12 17:33     ` René Scharfe
  2019-02-13  0:11       ` brian m. carlson
  0 siblings, 1 reply; 56+ messages in thread
From: René Scharfe @ 2019-02-12 17:33 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Duy Nguyen

Am 12.02.2019 um 08:20 schrieb René Scharfe:
> That command currently prints the pax comment in tar archives if it
> looks like a SHA1 hash based on its length.  It should continue to do
> so, and _also_ show longer hashes.  Your change makes it _only_ show
> those longer hashes.
>
> So it could check for all known valid hash lengths in turn, or accept
> any payload length between 40 and the_hash_algo->hexsz, or loosen up
> totally and show comments of any length.

How about the following patch, as a preparation?

-- >8 --
From: Rene Scharfe <l.s.r@web.de>
Subject: [PATCH] get-tar-commit-id: parse comment record

Parse pax comment records properly and get rid of magic numbers for
acceptable comment length.  This simplifies a later change to handle
longer hashes.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 builtin/get-tar-commit-id.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
index 2706fcfaf2..312e44ed05 100644
--- a/builtin/get-tar-commit-id.c
+++ b/builtin/get-tar-commit-id.c
@@ -21,6 +21,8 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix)
 	char *content = buffer + RECORDSIZE;
 	const char *comment;
 	ssize_t n;
+	long len;
+	char *end;

 	if (argc != 1)
 		usage(builtin_get_tar_commit_id_usage);
@@ -32,10 +34,17 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix)
 		die_errno("git get-tar-commit-id: EOF before reading tar header");
 	if (header->typeflag[0] != 'g')
 		return 1;
-	if (!skip_prefix(content, "52 comment=", &comment))
+
+	len = strtol(content, &end, 10);
+	if (errno == ERANGE || end == content || len < 0)
+		return 1;
+	if (!skip_prefix(end, " comment=", &comment))
+		return 1;
+	len -= comment - content;
+	if (len != GIT_SHA1_HEXSZ + 1)
 		return 1;

-	if (write_in_full(1, comment, 41) < 0)
+	if (write_in_full(1, comment, len) < 0)
 		die_errno("git get-tar-commit-id: write error");

 	return 0;
--
2.20.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 12/31] fast-import: make hash-size independent
  2019-02-12  3:44   ` Eric Sunshine
@ 2019-02-12 23:36     ` brian m. carlson
  0 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-12 23:36 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, René Scharfe, Duy Nguyen

[-- Attachment #1: Type: text/plain, Size: 2516 bytes --]

On Mon, Feb 11, 2019 at 10:44:12PM -0500, Eric Sunshine wrote:
> On Mon, Feb 11, 2019 at 8:23 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > Replace several uses of GIT_SHA1_HEXSZ and 40-based constants with
> > references to the_hash_algo.  Update the note handling code here to
> > compute path sizes based on GIT_MAX_RAWSZ as well.
> >
> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > ---
> > diff --git a/fast-import.c b/fast-import.c
> > @@ -2047,7 +2047,8 @@ static uintmax_t do_change_note_fanout(
> > -       char realpath[60];
> > +       char realpath[GIT_MAX_RAWSZ * 3];
> 
> I wonder if the fixed multiplier deserves a comment explaining that
> this is reserving enough space for a hex representation with '/'
> between each digit pair plus NUL. Which leads to the next question: Is
> there is GIT_MAX_HEXSZ constant? If so, this might be more clearly
> represented (or not) by taking advantage of that value.

There is such a constant. I'll add a comment and see if I can write it
in a way that makes it more intuitive what we're computing.

> Also, there are a number of hardcoded 60's in the code earlier in this
> file, such as:
> 
>     if ((max_packsize && (pack_size + 60 + len) > max_packsize)
>         || (pack_size + 60 + len) < pack_size)
>         cycle_packfile();
> 
> Is that just a coincidence or is it related to the 60 characters
> allocated for 'realpath'?

That's an interesting question. It looks like these are indeed related
to the hash size, but they're not related to the realpath call above.
I'll convert them as well.

> > @@ -2405,7 +2406,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
> >                 char *buf = read_object_with_reference(&commit_oid,
> >                                                        commit_type, &size,
> >                                                        &commit_oid);
> > -               if (!buf || size < 46)
> > +               if (!buf || size < the_hash_algo->hexsz)
> 
> What exactly did the 46 represent and how does it relate to 'hexsz'?
> Stated differently, why didn't this become:
> 
>     the_hash_algo->hexsz + 6'
> 
> ?

Good catch. I believe that 46 is the number of bytes in "tree %s\n",
which is the smallest possible commit (a root commit without an author
or committer).

I'll fix this misconversion.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 04/31] pack-bitmap: replace sha1_to_hex
  2019-02-12  6:37   ` Jeff King
@ 2019-02-13  0:00     ` brian m. carlson
  2019-02-14  4:41       ` Jeff King
  0 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-13  0:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Duy Nguyen

[-- Attachment #1: Type: text/plain, Size: 1974 bytes --]

On Tue, Feb 12, 2019 at 01:37:49AM -0500, Jeff King wrote:
> On Tue, Feb 12, 2019 at 01:22:29AM +0000, brian m. carlson wrote:
> > -static uint32_t find_object_pos(const unsigned char *sha1)
> > +static uint32_t find_object_pos(const unsigned char *hash)
> 
> Isn't this really just a "struct object_id"? Why don't we want to use
> that here?
> 
> I suspect it may be partially because our khash is storing raw sha1s.
> But shouldn't we also be converted that to store object_ids?

I think probably there are some more places that we could convert here.
There may have been one or two places that weren't convertible because
we ended up passing data from some sort of buffer around. I'll take
another look.

> In particular:
> 
> > @@ -181,7 +181,7 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
> >  	stored->root = root;
> >  	stored->xor = xor_with;
> >  	stored->flags = flags;
> > -	oidread(&stored->oid, sha1);
> > +	oidread(&stored->oid, hash);
> >  
> >  	hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret);
> 
> This last line (which is actually from the previous patch) is going to
> always truncate the stored data to 20 bytes, isn't it?

No, I don't think it does. The _sha1 variant stores pointers to unsigned
char, while the _oid variant stores the entire struct object_id (not
just a pointer to it). We don't care how much data the pointer points
to.

> I think we need to define a kh_oid. We have the "set" version already in
> oidset.[ch]; I think we need to make that more public.

I wrote this quite a bit before that code came in, which is probably why
I didn't do that originally. I seem to recall last time I looked at this
that there was some reason that hoisting this didn't work as I expected
due to header include order, but I'll take a look and see if I can
figure out a way to do this.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 24/31] archive-tar: make hash size independent
  2019-02-12 17:33     ` René Scharfe
@ 2019-02-13  0:11       ` brian m. carlson
  0 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-13  0:11 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Duy Nguyen

[-- Attachment #1: Type: text/plain, Size: 2488 bytes --]

On Tue, Feb 12, 2019 at 06:33:39PM +0100, René Scharfe wrote:
> Am 12.02.2019 um 08:20 schrieb René Scharfe:
> > That command currently prints the pax comment in tar archives if it
> > looks like a SHA1 hash based on its length.  It should continue to do
> > so, and _also_ show longer hashes.  Your change makes it _only_ show
> > those longer hashes.
> >
> > So it could check for all known valid hash lengths in turn, or accept
> > any payload length between 40 and the_hash_algo->hexsz, or loosen up
> > totally and show comments of any length.
> 
> How about the following patch, as a preparation?
> 
> -- >8 --
> From: Rene Scharfe <l.s.r@web.de>
> Subject: [PATCH] get-tar-commit-id: parse comment record
> 
> Parse pax comment records properly and get rid of magic numbers for
> acceptable comment length.  This simplifies a later change to handle
> longer hashes.
> 
> Signed-off-by: Rene Scharfe <l.s.r@web.de>
> ---
>  builtin/get-tar-commit-id.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c
> index 2706fcfaf2..312e44ed05 100644
> --- a/builtin/get-tar-commit-id.c
> +++ b/builtin/get-tar-commit-id.c
> @@ -21,6 +21,8 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix)
>  	char *content = buffer + RECORDSIZE;
>  	const char *comment;
>  	ssize_t n;
> +	long len;
> +	char *end;
> 
>  	if (argc != 1)
>  		usage(builtin_get_tar_commit_id_usage);
> @@ -32,10 +34,17 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix)
>  		die_errno("git get-tar-commit-id: EOF before reading tar header");
>  	if (header->typeflag[0] != 'g')
>  		return 1;
> -	if (!skip_prefix(content, "52 comment=", &comment))
> +
> +	len = strtol(content, &end, 10);
> +	if (errno == ERANGE || end == content || len < 0)
> +		return 1;
> +	if (!skip_prefix(end, " comment=", &comment))
> +		return 1;
> +	len -= comment - content;
> +	if (len != GIT_SHA1_HEXSZ + 1)
>  		return 1;

So it turns out I wrote a different patch for this in a later series,
but I like this style better. I'm going to squash my existing patches
together and then rework them such that they're based off yours.

My technique iterated over the entire header, comparing the entire
header, which is much less elegant than this solution.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 28/31] dir: make untracked cache extension hash size independent
  2019-02-12 11:08   ` Ævar Arnfjörð Bjarmason
@ 2019-02-13  0:30     ` brian m. carlson
  0 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-13  0:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, René Scharfe, Duy Nguyen

[-- Attachment #1: Type: text/plain, Size: 1609 bytes --]

On Tue, Feb 12, 2019 at 12:08:01PM +0100, Ævar Arnfjörð Bjarmason wrote:
> Both this & the follow-up 29/31 look scary since this is an on-disk
> structure and this patch & the next one rather than implementing some
> transition, just changes the structs & code we use to read & write to
> use the current hash size.
> 
> So what are we going to do when the "current" size is SHA-256 and our
> on-disk cache is SHA-1? It seems like with this we'd at best (I haven't
> tested) throw away the SHA-1 UC data, and at worst introduce some silent
> persistent read failure.

That's a good question. The current design has everything in the .git
directory other than the loose object table and the pack indices in one
algorithm. That is, the untracked cache extension, like the rest of the
index, will always be SHA-256 when the objects are stored in SHA-256.

Having mixed algorithms in the .git directory would add a lot of
complexity.

> In any case that seems like something we should have tests for with an
> on-disk format, i.e. write in one hash, see what happens when we read in
> another, and perhaps instead of not understanding SHA-1 hashes in
> SHA-256 mode we'd read the SHA-1 and consult the SHA-1<->SHA-256 lookup
> table we're going to be eventually maintaining on the side?

Correct. The current plan is that when the transition code is fully
implemented, we'll read object IDs from the user in whatever algorithms
we support, translate them into the default algorithm, and then use
them.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 10/31] object-store: rename and expand packed_git's sha1 member
  2019-02-12  3:32   ` Eric Sunshine
@ 2019-02-14  3:33     ` brian m. carlson
  0 siblings, 0 replies; 56+ messages in thread
From: brian m. carlson @ 2019-02-14  3:33 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, René Scharfe, Duy Nguyen

[-- Attachment #1: Type: text/plain, Size: 1955 bytes --]

On Mon, Feb 11, 2019 at 10:32:15PM -0500, Eric Sunshine wrote:
> On Mon, Feb 11, 2019 at 8:23 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > This member is used to represent the pack checksum of the pack in
> > question.  Expand this member to be GIT_MAX_RAWSZ bytes in length so it
> > works with longer hashes and rename it to be "hash" instead of "sha1".
> > [...]
> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > ---
> > diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
> > @@ -689,7 +689,7 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
> >         while (pl) {
> >                 printf("%s\n%s\n",
> > -                      sha1_pack_index_name(pl->pack->sha1),
> > +                      sha1_pack_index_name(pl->pack->hash),
> 
> I guess there is no oid_pack_index_name() function yet?

Correct, although see below.

> > diff --git a/http-walker.c b/http-walker.c
> > @@ -434,7 +434,7 @@ static int http_fetch_pack(struct walker *walker, struct alt_base *repo, unsigne
> >         if (walker->get_verbosely) {
> >                 fprintf(stderr, "Getting pack %s\n",
> > -                       sha1_to_hex(target->sha1));
> > +                       sha1_to_hex(target->hash));
> 
> Should this have become oid_to_hex()?

No, I think what's misleading here is that the member for both is named
"hash". This is a struct packed_git, not a struct object_id. The reason
I didn't convert this to a struct object_id is because this isn't
actually an object ID, it's a pack checksum hash.

I've tried to be diligent about separating the object IDs from other
hashes because I think it makes it easier to reason about the code, but
also because it may help us avoid tricky behavior when we have to deal
with multiple algorithms for input and storage.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 04/31] pack-bitmap: replace sha1_to_hex
  2019-02-13  0:00     ` brian m. carlson
@ 2019-02-14  4:41       ` Jeff King
  0 siblings, 0 replies; 56+ messages in thread
From: Jeff King @ 2019-02-14  4:41 UTC (permalink / raw)
  To: brian m. carlson, git, René Scharfe, Duy Nguyen

On Wed, Feb 13, 2019 at 12:00:07AM +0000, brian m. carlson wrote:

> On Tue, Feb 12, 2019 at 01:37:49AM -0500, Jeff King wrote:
> > On Tue, Feb 12, 2019 at 01:22:29AM +0000, brian m. carlson wrote:
> > > -static uint32_t find_object_pos(const unsigned char *sha1)
> > > +static uint32_t find_object_pos(const unsigned char *hash)
> > 
> > Isn't this really just a "struct object_id"? Why don't we want to use
> > that here?
> > 
> > I suspect it may be partially because our khash is storing raw sha1s.
> > But shouldn't we also be converted that to store object_ids?
> 
> I think probably there are some more places that we could convert here.
> There may have been one or two places that weren't convertible because
> we ended up passing data from some sort of buffer around. I'll take
> another look.

Thanks. I don't want to derail you too much if you have a series of
other changes on top. And moving to "hash" here is a step in the right
direction. But if we can take it all the way to object_id while we're
looking at it, I think that's preferable.

> > >  	hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret);
> > 
> > This last line (which is actually from the previous patch) is going to
> > always truncate the stored data to 20 bytes, isn't it?
> 
> No, I don't think it does. The _sha1 variant stores pointers to unsigned
> char, while the _oid variant stores the entire struct object_id (not
> just a pointer to it). We don't care how much data the pointer points
> to.

Oh, you're right. I was thinking it actually stored the 20-byte
sequences, but I was just reading it wrong. Sorry for the confusion.

> > I think we need to define a kh_oid. We have the "set" version already in
> > oidset.[ch]; I think we need to make that more public.
> 
> I wrote this quite a bit before that code came in, which is probably why
> I didn't do that originally. I seem to recall last time I looked at this
> that there was some reason that hoisting this didn't work as I expected
> due to header include order, but I'll take a look and see if I can
> figure out a way to do this.

Great, thanks!

-Peff

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 30/31] Git.pm: make hash size independent
  2019-02-12 10:59   ` Ævar Arnfjörð Bjarmason
@ 2019-02-18 19:09     ` brian m. carlson
  2019-02-18 21:00       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 56+ messages in thread
From: brian m. carlson @ 2019-02-18 19:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, René Scharfe, Duy Nguyen

[-- Attachment #1: Type: text/plain, Size: 798 bytes --]

On Tue, Feb 12, 2019 at 11:59:27AM +0100, Ævar Arnfjörð Bjarmason wrote:
> The gitweb code doesn't load Git.pm now, but does anyone know a reason
> for why we'd avoid any perl/* dependency in the gitweb code? If not the
> regex here & in gitweb could be factored into e.g. a tiny Git::OID
> library which would either just expose a $GIT::OID_REGEX, or something
> like the sort of interface (might not be worth it) that I suggested in
> my feedback to 31/31.

I think one potential issue here is that some distributors bundle the
Perl modules with git send-email and not gitweb, and they're packaged
independently. I'm not opposed to seeing a patch to do that, but it
probably belongs in its own series.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 30/31] Git.pm: make hash size independent
  2019-02-18 19:09     ` brian m. carlson
@ 2019-02-18 21:00       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 56+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-18 21:00 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, René Scharfe, Duy Nguyen


On Mon, Feb 18 2019, brian m. carlson wrote:

> On Tue, Feb 12, 2019 at 11:59:27AM +0100, Ævar Arnfjörð Bjarmason wrote:
>> The gitweb code doesn't load Git.pm now, but does anyone know a reason
>> for why we'd avoid any perl/* dependency in the gitweb code? If not the
>> regex here & in gitweb could be factored into e.g. a tiny Git::OID
>> library which would either just expose a $GIT::OID_REGEX, or something
>> like the sort of interface (might not be worth it) that I suggested in
>> my feedback to 31/31.
>
> I think one potential issue here is that some distributors bundle the
> Perl modules with git send-email and not gitweb, and they're packaged
> independently. I'm not opposed to seeing a patch to do that, but it
> probably belongs in its own series.

In packaging terms that one's easy to juggle, FWIW I'm just aware of
e.g. RedHat packaging perl-Git as its own thing depended on by git-svn,
git-email & friends, not anyone who packages it with git-send-email
specifically.

But according to gitweb/README it's also meant to be used as a
stand-alone script. So:

 * You can wget it from the git.git repo & run it
 * You can run it with any (reasonable) version of git
 * If you're missing perl modules, those are all on CPAN (just CGI.pm
   should be missing these days)

Making it depend on any module we ship would break all those things. I
don't know if anyone cares, but avoiding this very minor copy/pasting
seems like a bad reason to wake that particular dragon.

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2019-02-18 21:00 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-12  1:22 [PATCH 00/31] Hash function transition part 16 brian m. carlson
2019-02-12  1:22 ` [PATCH 01/31] t/lib-submodule-update: use appropriate length constant brian m. carlson
2019-02-12  1:22 ` [PATCH 02/31] pack-bitmap: make bitmap header handling hash agnostic brian m. carlson
2019-02-12  1:22 ` [PATCH 03/31] pack-bitmap: convert struct stored_bitmap to object_id brian m. carlson
2019-02-12  1:22 ` [PATCH 04/31] pack-bitmap: replace sha1_to_hex brian m. carlson
2019-02-12  6:37   ` Jeff King
2019-02-13  0:00     ` brian m. carlson
2019-02-14  4:41       ` Jeff King
2019-02-12  1:22 ` [PATCH 05/31] pack-bitmap: switch hard-coded constants to the_hash_algo brian m. carlson
2019-02-12 11:13   ` Ævar Arnfjörð Bjarmason
2019-02-12  1:22 ` [PATCH 06/31] submodule: avoid hard-coded constants brian m. carlson
2019-02-12  1:22 ` [PATCH 07/31] notes-merge: switch to use the_hash_algo brian m. carlson
2019-02-12  1:22 ` [PATCH 08/31] notes: make hash size independent brian m. carlson
2019-02-12  1:37   ` Eric Sunshine
2019-02-12  1:42     ` brian m. carlson
2019-02-12  1:22 ` [PATCH 09/31] notes: replace sha1_to_hex brian m. carlson
2019-02-12  1:22 ` [PATCH 10/31] object-store: rename and expand packed_git's sha1 member brian m. carlson
2019-02-12  3:32   ` Eric Sunshine
2019-02-14  3:33     ` brian m. carlson
2019-02-12  1:22 ` [PATCH 11/31] builtin/name-rev: make hash-size independent brian m. carlson
2019-02-12  1:22 ` [PATCH 12/31] fast-import: " brian m. carlson
2019-02-12  3:44   ` Eric Sunshine
2019-02-12 23:36     ` brian m. carlson
2019-02-12  1:22 ` [PATCH 13/31] fast-import: replace sha1_to_hex brian m. carlson
2019-02-12  1:22 ` [PATCH 14/31] builtin/am: make hash size independent brian m. carlson
2019-02-12  1:22 ` [PATCH 15/31] builtin/pull: make hash-size independent brian m. carlson
2019-02-12  3:47   ` Eric Sunshine
2019-02-12  1:22 ` [PATCH 16/31] http-push: convert to use the_hash_algo brian m. carlson
2019-02-12  1:22 ` [PATCH 17/31] http-backend: allow 64-character hex names brian m. carlson
2019-02-12  1:22 ` [PATCH 18/31] http-push: remove remaining uses of sha1_to_hex brian m. carlson
2019-02-12  1:22 ` [PATCH 19/31] http-walker: replace sha1_to_hex brian m. carlson
2019-02-12  3:51   ` Eric Sunshine
2019-02-12  1:22 ` [PATCH 20/31] http: replace hard-coded constant with the_hash_algo brian m. carlson
2019-02-12  1:22 ` [PATCH 21/31] http: compute hash of downloaded objects using the_hash_algo brian m. carlson
2019-02-12  1:22 ` [PATCH 22/31] http: replace sha1_to_hex brian m. carlson
2019-02-12  1:22 ` [PATCH 23/31] remote-curl: make hash size independent brian m. carlson
2019-02-12 11:11   ` Ævar Arnfjörð Bjarmason
2019-02-12  1:22 ` [PATCH 24/31] archive-tar: " brian m. carlson
2019-02-12  7:20   ` René Scharfe
2019-02-12 17:33     ` René Scharfe
2019-02-13  0:11       ` brian m. carlson
2019-02-12  1:22 ` [PATCH 25/31] archive: convert struct archiver_args to object_id brian m. carlson
2019-02-12  1:22 ` [PATCH 26/31] refspec: make hash size independent brian m. carlson
2019-02-12  1:22 ` [PATCH 27/31] builtin/difftool: use parse_oid_hex brian m. carlson
2019-02-12  8:27   ` Eric Sunshine
2019-02-12  1:22 ` [PATCH 28/31] dir: make untracked cache extension hash size independent brian m. carlson
2019-02-12 11:08   ` Ævar Arnfjörð Bjarmason
2019-02-13  0:30     ` brian m. carlson
2019-02-12  1:22 ` [PATCH 29/31] read-cache: read data in a hash-independent way brian m. carlson
2019-02-12  1:22 ` [PATCH 30/31] Git.pm: make hash size independent brian m. carlson
2019-02-12 10:59   ` Ævar Arnfjörð Bjarmason
2019-02-18 19:09     ` brian m. carlson
2019-02-18 21:00       ` Ævar Arnfjörð Bjarmason
2019-02-12  1:22 ` [PATCH 31/31] gitweb: " brian m. carlson
2019-02-12 10:57   ` Ævar Arnfjörð Bjarmason
2019-02-12 11:15 ` [PATCH 00/31] Hash function transition part 16 Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).