git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 0/6] Hash Abstraction
@ 2017-08-21  0:00 brian m. carlson
  2017-08-21  0:00 ` [RFC PATCH 1/6] vcs-svn: remove unused prototypes brian m. carlson
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: brian m. carlson @ 2017-08-21  0:00 UTC (permalink / raw)
  To: git

= Overview

This is an RFC series proposing a basic abstraction for hash functions.

As we get closer to converting the remainder of the codebase to use
struct object_id, we should think about the design we want our hash
function abstraction to take.  This series is a proposal for one idea to
start discussion.  Input on any aspect of this proposal is welcome.

This series exposes a struct git_hash_algo that contains basic
information about a given hash algorithm that distinguishes it from
other algorithms: name, lengths, implementing functions, and empty tree
and blob constants.  It also exposes an array of hash algorithms, and a
constant for indexing them.

The series also demonstrates a simple conversion using the abstraction
over empty blob and tree values.

In order to avoid conflicting with the struct repository work and with
the goal of avoiding global variables as much as possible, I've pushed
the hash algorithm into struct repository and exposed it via a #define.
This necessitiates pulling repository.h into cache.h, which I don't
think is fatal.  Doing that, in turn, necessitated some work on the
Subversion code to avoid conflicts.

It should be fine for Junio to pick up the first two patches from this
series, as they're relatively independent and valuable without the rest
of the series.  The rest should not be applied immediately, although
they do pass the testsuite.

I proposed this series now as it will inform the way we go about
converting other parts of the codebase, especially some of the pack
algorithms.  Because we share some hash computation code between pack
checksums and object hashing, we need to decide whether to expose pack
checksums as struct object_id, even though they are technically not
object IDs.  Furthermore, if we end up needing to stuff an algorithm
value into struct object_id, we'll no longer be able to directly
reference object IDs in a pack without a copy.

This series is available from the usual places as branch hash-struct,
based against master.

= Naming

The length names are similar to the current constant names
intentionally.  I've used the "hash_algo" name for both the integer
constant and the pointer to struct, although we could change the latter
to "hash_impl" or such as people like.

I chose to name the define "current_hash" and expose no other defines.
The name is relatively short since we're going to be typing it a lot.
However, if people like, we can capitalize it or expose other defines
(say, a GIT_HASH_RAWSZ or GIT_HASH_HEXSZ) instead of or in addition to
current_hash, which would make this name less interesting.

Feel free to propose alternatives to the naming of anything in this
series.

= Open Issues

I originally decided to convert hex.c as an example, but quickly found
out that this caused segfaults.  As part of setup, we call
is_git_directory, which calls validate_headref, which ends up calling
get_sha1_hex.  Obviously, we don't have a repository, so the hash
algorithm isn't set up yet.  This is an area we'll need to consider
making hash function agnostic, and we may also need to consider
inserting a hash constant integer into struct object_id if we're going
to do that.  Alternatively, we could just paper over this issue as a
special case.

Clearly we're going to want to expose some sort of lookup functionality
for hash algorithms.  We'll need to expose lookup by name (for the
.git/config file and any command-line options), but we may want other
functions as well.  What functions should those be?  Should we expose
the structure or the constant for those lookup functions?  If the
structure, we'll probably need to expose the constant in the structure
as well for easy use.

Should we avoid exposing the array of structure altogether and wrap this
in a function?

We could expose a union of hash context structures and take that as the
pointer type for the API calls.  That would probably obviate the need
for ctxsz.

We could expose hex versions of the blob constants if desired.  This
might make converting the remaining pieces of code that use them easier.

There are probably dozens of other things I haven't thought of yet as
well.

brian m. carlson (6):
  vcs-svn: remove unused prototypes
  vcs-svn: rename repo functions to "svn_repo"
  setup: expose enumerated repo info
  Add structure representing hash algorithm
  Integrate hash algorithm support with repo setup
  Switch empty tree and blob lookups to use hash abstraction

 builtin/am.c        |  2 +-
 builtin/checkout.c  |  2 +-
 builtin/diff.c      |  2 +-
 builtin/pull.c      |  2 +-
 cache.h             | 48 ++++++++++++++++++++++++++++++++++++++++++++----
 diff-lib.c          |  2 +-
 merge-recursive.c   |  2 +-
 notes-merge.c       |  2 +-
 repository.c        |  7 +++++++
 repository.h        |  5 +++++
 sequencer.c         |  6 +++---
 setup.c             | 48 +++++++++++++++++++++++++++---------------------
 sha1_file.c         | 29 +++++++++++++++++++++++++++++
 submodule.c         |  2 +-
 vcs-svn/repo_tree.c |  6 +++---
 vcs-svn/repo_tree.h | 13 +++----------
 vcs-svn/svndump.c   |  8 ++++----
 17 files changed, 133 insertions(+), 53 deletions(-)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH 1/6] vcs-svn: remove unused prototypes
  2017-08-21  0:00 [RFC PATCH 0/6] Hash Abstraction brian m. carlson
@ 2017-08-21  0:00 ` brian m. carlson
  2017-08-21  0:00 ` [RFC PATCH 2/6] vcs-svn: rename repo functions to "svn_repo" brian m. carlson
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: brian m. carlson @ 2017-08-21  0:00 UTC (permalink / raw)
  To: git

The Subversion code had prototypes for several functions which were not
ever defined or used.  These functions all had names starting with
"repo_", some of which conflict with those in repository.h.  To avoid
the conflict, remove those unused prototypes.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 vcs-svn/repo_tree.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
index 889c6a3c95..d0f5690dca 100644
--- a/vcs-svn/repo_tree.h
+++ b/vcs-svn/repo_tree.h
@@ -10,14 +10,7 @@ struct strbuf;
 
 uint32_t next_blob_mark(void);
 void repo_copy(uint32_t revision, const char *src, const char *dst);
-void repo_add(const char *path, uint32_t mode, uint32_t blob_mark);
 const char *repo_read_path(const char *path, uint32_t *mode_out);
 void repo_delete(const char *path);
-void repo_commit(uint32_t revision, const char *author,
-		const struct strbuf *log, const char *uuid, const char *url,
-		long unsigned timestamp);
-void repo_diff(uint32_t r1, uint32_t r2);
-void repo_init(void);
-void repo_reset(void);
 
 #endif

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH 2/6] vcs-svn: rename repo functions to "svn_repo"
  2017-08-21  0:00 [RFC PATCH 0/6] Hash Abstraction brian m. carlson
  2017-08-21  0:00 ` [RFC PATCH 1/6] vcs-svn: remove unused prototypes brian m. carlson
@ 2017-08-21  0:00 ` brian m. carlson
  2017-08-21  0:00 ` [RFC PATCH 3/6] setup: expose enumerated repo info brian m. carlson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: brian m. carlson @ 2017-08-21  0:00 UTC (permalink / raw)
  To: git

There were several functions in the Subversion code that started with
"repo_".  This namespace is also used by the Git struct repository code.
Rename these functions to start with "svn_repo" to avoid any future
conflicts.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 vcs-svn/repo_tree.c | 6 +++---
 vcs-svn/repo_tree.h | 6 +++---
 vcs-svn/svndump.c   | 8 ++++----
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
index 67d27f0b6c..d77cb0ada7 100644
--- a/vcs-svn/repo_tree.c
+++ b/vcs-svn/repo_tree.c
@@ -8,7 +8,7 @@
 #include "repo_tree.h"
 #include "fast_export.h"
 
-const char *repo_read_path(const char *path, uint32_t *mode_out)
+const char *svn_repo_read_path(const char *path, uint32_t *mode_out)
 {
 	int err;
 	static struct strbuf buf = STRBUF_INIT;
@@ -25,7 +25,7 @@ const char *repo_read_path(const char *path, uint32_t *mode_out)
 	return buf.buf;
 }
 
-void repo_copy(uint32_t revision, const char *src, const char *dst)
+void svn_repo_copy(uint32_t revision, const char *src, const char *dst)
 {
 	int err;
 	uint32_t mode;
@@ -42,7 +42,7 @@ void repo_copy(uint32_t revision, const char *src, const char *dst)
 	fast_export_modify(dst, mode, data.buf);
 }
 
-void repo_delete(const char *path)
+void svn_repo_delete(const char *path)
 {
 	fast_export_delete(path);
 }
diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
index d0f5690dca..555b64bbb6 100644
--- a/vcs-svn/repo_tree.h
+++ b/vcs-svn/repo_tree.h
@@ -9,8 +9,8 @@ struct strbuf;
 #define REPO_MODE_LNK 0120000
 
 uint32_t next_blob_mark(void);
-void repo_copy(uint32_t revision, const char *src, const char *dst);
-const char *repo_read_path(const char *path, uint32_t *mode_out);
-void repo_delete(const char *path);
+void svn_repo_copy(uint32_t revision, const char *src, const char *dst);
+const char *svn_repo_read_path(const char *path, uint32_t *mode_out);
+void svn_repo_delete(const char *path);
 
 #endif
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index 1846685a21..37c4a36b92 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -225,15 +225,15 @@ static void handle_node(void)
 		if (have_text || have_props || node_ctx.srcRev)
 			die("invalid dump: deletion node has "
 				"copyfrom info, text, or properties");
-		repo_delete(node_ctx.dst.buf);
+		svn_repo_delete(node_ctx.dst.buf);
 		return;
 	}
 	if (node_ctx.action == NODEACT_REPLACE) {
-		repo_delete(node_ctx.dst.buf);
+		svn_repo_delete(node_ctx.dst.buf);
 		node_ctx.action = NODEACT_ADD;
 	}
 	if (node_ctx.srcRev) {
-		repo_copy(node_ctx.srcRev, node_ctx.src.buf, node_ctx.dst.buf);
+		svn_repo_copy(node_ctx.srcRev, node_ctx.src.buf, node_ctx.dst.buf);
 		if (node_ctx.action == NODEACT_ADD)
 			node_ctx.action = NODEACT_CHANGE;
 	}
@@ -249,7 +249,7 @@ static void handle_node(void)
 		old_data = NULL;
 	} else if (node_ctx.action == NODEACT_CHANGE) {
 		uint32_t mode;
-		old_data = repo_read_path(node_ctx.dst.buf, &mode);
+		old_data = svn_repo_read_path(node_ctx.dst.buf, &mode);
 		if (mode == REPO_MODE_DIR && type != REPO_MODE_DIR)
 			die("invalid dump: cannot modify a directory into a file");
 		if (mode != REPO_MODE_DIR && type == REPO_MODE_DIR)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH 3/6] setup: expose enumerated repo info
  2017-08-21  0:00 [RFC PATCH 0/6] Hash Abstraction brian m. carlson
  2017-08-21  0:00 ` [RFC PATCH 1/6] vcs-svn: remove unused prototypes brian m. carlson
  2017-08-21  0:00 ` [RFC PATCH 2/6] vcs-svn: rename repo functions to "svn_repo" brian m. carlson
@ 2017-08-21  0:00 ` brian m. carlson
  2017-08-21  0:00 ` [RFC PATCH 4/6] Add structure representing hash algorithm brian m. carlson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: brian m. carlson @ 2017-08-21  0:00 UTC (permalink / raw)
  To: git

We enumerate several different items as part of struct
repository_format, but then actually set up those values using the
global variables we've initialized from them.  Instead, let's pass a
pointer to the structure down to the code where we enumerate these
values, so we can later on use those values directly to perform setup.

This technique makes it easier for us to determine additional items
about the repository format (such as the hash algorithm) and then use
them for setup later on, without needing to add additional global
variables.  We can't avoid using the existing global variables since
they're intricately intertwined with how things work at the moment, but
this improves things for the future.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 setup.c | 46 +++++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/setup.c b/setup.c
index 860507e1fd..115e70a4e8 100644
--- a/setup.c
+++ b/setup.c
@@ -437,16 +437,15 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 	return 0;
 }
 
-static int check_repository_format_gently(const char *gitdir, int *nongit_ok)
+static int check_repository_format_gently(const char *gitdir, struct repository_format *candidate, int *nongit_ok)
 {
 	struct strbuf sb = STRBUF_INIT;
 	struct strbuf err = STRBUF_INIT;
-	struct repository_format candidate;
 	int has_common;
 
 	has_common = get_common_dir(&sb, gitdir);
 	strbuf_addstr(&sb, "/config");
-	read_repository_format(&candidate, sb.buf);
+	read_repository_format(candidate, sb.buf);
 	strbuf_release(&sb);
 
 	/*
@@ -454,10 +453,10 @@ static int check_repository_format_gently(const char *gitdir, int *nongit_ok)
 	 * we treat a missing config as a silent "ok", even when nongit_ok
 	 * is unset.
 	 */
-	if (candidate.version < 0)
+	if (candidate->version < 0)
 		return 0;
 
-	if (verify_repository_format(&candidate, &err) < 0) {
+	if (verify_repository_format(candidate, &err) < 0) {
 		if (nongit_ok) {
 			warning("%s", err.buf);
 			strbuf_release(&err);
@@ -467,21 +466,21 @@ static int check_repository_format_gently(const char *gitdir, int *nongit_ok)
 		die("%s", err.buf);
 	}
 
-	repository_format_precious_objects = candidate.precious_objects;
-	string_list_clear(&candidate.unknown_extensions, 0);
+	repository_format_precious_objects = candidate->precious_objects;
+	string_list_clear(&candidate->unknown_extensions, 0);
 	if (!has_common) {
-		if (candidate.is_bare != -1) {
-			is_bare_repository_cfg = candidate.is_bare;
+		if (candidate->is_bare != -1) {
+			is_bare_repository_cfg = candidate->is_bare;
 			if (is_bare_repository_cfg == 1)
 				inside_work_tree = -1;
 		}
-		if (candidate.work_tree) {
+		if (candidate->work_tree) {
 			free(git_work_tree_cfg);
-			git_work_tree_cfg = candidate.work_tree;
+			git_work_tree_cfg = candidate->work_tree;
 			inside_work_tree = -1;
 		}
 	} else {
-		free(candidate.work_tree);
+		free(candidate->work_tree);
 	}
 
 	return 0;
@@ -627,6 +626,7 @@ const char *read_gitfile_gently(const char *path, int *return_error_code)
 
 static const char *setup_explicit_git_dir(const char *gitdirenv,
 					  struct strbuf *cwd,
+					  struct repository_format *repo_fmt,
 					  int *nongit_ok)
 {
 	const char *work_tree_env = getenv(GIT_WORK_TREE_ENVIRONMENT);
@@ -652,7 +652,7 @@ static const char *setup_explicit_git_dir(const char *gitdirenv,
 		die("Not a git repository: '%s'", gitdirenv);
 	}
 
-	if (check_repository_format_gently(gitdirenv, nongit_ok)) {
+	if (check_repository_format_gently(gitdirenv, repo_fmt, nongit_ok)) {
 		free(gitfile);
 		return NULL;
 	}
@@ -725,9 +725,10 @@ static const char *setup_explicit_git_dir(const char *gitdirenv,
 
 static const char *setup_discovered_git_dir(const char *gitdir,
 					    struct strbuf *cwd, int offset,
+					    struct repository_format *repo_fmt,
 					    int *nongit_ok)
 {
-	if (check_repository_format_gently(gitdir, nongit_ok))
+	if (check_repository_format_gently(gitdir, repo_fmt, nongit_ok))
 		return NULL;
 
 	/* --work-tree is set without --git-dir; use discovered one */
@@ -739,7 +740,7 @@ static const char *setup_discovered_git_dir(const char *gitdir,
 			gitdir = to_free = real_pathdup(gitdir, 1);
 		if (chdir(cwd->buf))
 			die_errno("Could not come back to cwd");
-		ret = setup_explicit_git_dir(gitdir, cwd, nongit_ok);
+		ret = setup_explicit_git_dir(gitdir, cwd, repo_fmt, nongit_ok);
 		free(to_free);
 		return ret;
 	}
@@ -771,11 +772,12 @@ static const char *setup_discovered_git_dir(const char *gitdir,
 
 /* #16.1, #17.1, #20.1, #21.1, #22.1 (see t1510) */
 static const char *setup_bare_git_dir(struct strbuf *cwd, int offset,
+				      struct repository_format *repo_fmt,
 				      int *nongit_ok)
 {
 	int root_len;
 
-	if (check_repository_format_gently(".", nongit_ok))
+	if (check_repository_format_gently(".", repo_fmt, nongit_ok))
 		return NULL;
 
 	setenv(GIT_IMPLICIT_WORK_TREE_ENVIRONMENT, "0", 1);
@@ -787,7 +789,7 @@ static const char *setup_bare_git_dir(struct strbuf *cwd, int offset,
 		gitdir = offset == cwd->len ? "." : xmemdupz(cwd->buf, offset);
 		if (chdir(cwd->buf))
 			die_errno("Could not come back to cwd");
-		return setup_explicit_git_dir(gitdir, cwd, nongit_ok);
+		return setup_explicit_git_dir(gitdir, cwd, repo_fmt, nongit_ok);
 	}
 
 	inside_git_dir = 1;
@@ -1028,6 +1030,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 	static struct strbuf cwd = STRBUF_INIT;
 	struct strbuf dir = STRBUF_INIT, gitdir = STRBUF_INIT;
 	const char *prefix, *env_prefix;
+	struct repository_format repo_fmt;
 
 	/*
 	 * We may have read an incomplete configuration before
@@ -1055,18 +1058,18 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		prefix = NULL;
 		break;
 	case GIT_DIR_EXPLICIT:
-		prefix = setup_explicit_git_dir(gitdir.buf, &cwd, nongit_ok);
+		prefix = setup_explicit_git_dir(gitdir.buf, &cwd, &repo_fmt, nongit_ok);
 		break;
 	case GIT_DIR_DISCOVERED:
 		if (dir.len < cwd.len && chdir(dir.buf))
 			die(_("Cannot change to '%s'"), dir.buf);
 		prefix = setup_discovered_git_dir(gitdir.buf, &cwd, dir.len,
-						  nongit_ok);
+						  &repo_fmt, nongit_ok);
 		break;
 	case GIT_DIR_BARE:
 		if (dir.len < cwd.len && chdir(dir.buf))
 			die(_("Cannot change to '%s'"), dir.buf);
-		prefix = setup_bare_git_dir(&cwd, dir.len, nongit_ok);
+		prefix = setup_bare_git_dir(&cwd, dir.len, &repo_fmt, nongit_ok);
 		break;
 	case GIT_DIR_HIT_CEILING:
 		prefix = setup_nongit(cwd.buf, nongit_ok);
@@ -1183,7 +1186,8 @@ int git_config_perm(const char *var, const char *value)
 
 void check_repository_format(void)
 {
-	check_repository_format_gently(get_git_dir(), NULL);
+	struct repository_format repo_fmt;
+	check_repository_format_gently(get_git_dir(), &repo_fmt, NULL);
 	startup_info->have_repository = 1;
 }
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH 4/6] Add structure representing hash algorithm
  2017-08-21  0:00 [RFC PATCH 0/6] Hash Abstraction brian m. carlson
                   ` (2 preceding siblings ...)
  2017-08-21  0:00 ` [RFC PATCH 3/6] setup: expose enumerated repo info brian m. carlson
@ 2017-08-21  0:00 ` brian m. carlson
  2017-08-21 21:08   ` Stefan Beller
  2017-08-21  0:00 ` [RFC PATCH 5/6] Integrate hash algorithm support with repo setup brian m. carlson
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: brian m. carlson @ 2017-08-21  0:00 UTC (permalink / raw)
  To: git

Since in the future we want to support an additional hash algorithm, add
a structure that represents a hash algorithm and all the data that must
go along with it.  Add a constant to allow easy enumeration of hash
algorithms.  Implement function typedefs to create an abstract API that
can be used by any hash algorithm, and wrappers for the existing SHA1
functions that conform to this API.

Don't include an entry in the hash algorithm structure for the null
object ID.  As this value is all zeros, any suitably sized all-zero
object ID can be used, and there's no need to store a given one on a
per-hash basis.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h     | 36 ++++++++++++++++++++++++++++++++++++
 sha1_file.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/cache.h b/cache.h
index 1c69d2a05a..375a7fb15e 100644
--- a/cache.h
+++ b/cache.h
@@ -76,6 +76,42 @@ struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
 };
 
+#define GIT_HASH_SHA1 0
+
+typedef void (*git_hash_init_fn)(void *ctx);
+typedef void (*git_hash_update_fn)(void *ctx, const void *in, size_t len);
+typedef void (*git_hash_final_fn)(unsigned char *hash, void *ctx);
+
+struct git_hash_algo {
+	/* The name of the algorithm, as appears in the config file. */
+	const char *name;
+
+	/* The size of a hash context (e.g. git_SHA_CTX). */
+	size_t ctxsz;
+
+	/* The length of the hash in bytes. */
+	size_t rawsz;
+
+	/* The length of the hash in hex characters. */
+	size_t hexsz;
+
+	/* The hash initialization function. */
+	git_hash_init_fn init_fn;
+
+	/* The hash update function. */
+	git_hash_update_fn update_fn;
+
+	/* The hash finalization function. */
+	git_hash_final_fn final_fn;
+
+	/* The OID of the empty tree. */
+	const struct object_id *empty_tree;
+
+	/* The OID of the empty blob. */
+	const struct object_id *empty_blob;
+};
+extern const struct git_hash_algo hash_algos[1];
+
 #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
 #define DTYPE(de)	((de)->d_type)
 #else
diff --git a/sha1_file.c b/sha1_file.c
index b60ae15f70..bd9ff02802 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -41,6 +41,35 @@ const struct object_id empty_blob_oid = {
 	EMPTY_BLOB_SHA1_BIN_LITERAL
 };
 
+static inline void git_hash_sha1_init(void *ctx)
+{
+	git_SHA1_Init((git_SHA_CTX *)ctx);
+}
+
+static inline void git_hash_sha1_update(void *ctx, const void *data, size_t len)
+{
+	git_SHA1_Update((git_SHA_CTX *)ctx, data, len);
+}
+
+static inline void git_hash_sha1_final(unsigned char *hash, void *ctx)
+{
+	git_SHA1_Final(hash, (git_SHA_CTX *)ctx);
+}
+
+const struct git_hash_algo hash_algos[1] = {
+	[GIT_HASH_SHA1] = {
+		.name = "sha-1",
+		.ctxsz = sizeof(git_SHA_CTX),
+		.rawsz = GIT_SHA1_RAWSZ,
+		.hexsz = GIT_SHA1_HEXSZ,
+		.init_fn = git_hash_sha1_init,
+		.update_fn = git_hash_sha1_update,
+		.final_fn = git_hash_sha1_final,
+		.empty_tree = &empty_tree_oid,
+		.empty_blob = &empty_blob_oid,
+	},
+};
+
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH 5/6] Integrate hash algorithm support with repo setup
  2017-08-21  0:00 [RFC PATCH 0/6] Hash Abstraction brian m. carlson
                   ` (3 preceding siblings ...)
  2017-08-21  0:00 ` [RFC PATCH 4/6] Add structure representing hash algorithm brian m. carlson
@ 2017-08-21  0:00 ` brian m. carlson
  2017-08-21 21:16   ` Stefan Beller
  2017-08-21  0:00 ` [RFC PATCH 6/6] Switch empty tree and blob lookups to use hash abstraction brian m. carlson
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: brian m. carlson @ 2017-08-21  0:00 UTC (permalink / raw)
  To: git

In future versions of Git, we plan to support an additional hash
algorithm.  Integrate the enumeration of hash algorithms with repository
setup, and store a pointer to the enumerated data in struct repository.
Of course, we currently only support SHA-1, so hard-code this value in
read_repository_format.  In the future, we'll enumerate this value from
the configuration.

Add a constant, current_hash, which points to the hash_algo structure
pointer in the repository global.  Include repository.h in cache.h since
we now need to have access to these struct and variable definitions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h      | 4 ++++
 repository.c | 7 +++++++
 repository.h | 5 +++++
 setup.c      | 2 ++
 4 files changed, 18 insertions(+)

diff --git a/cache.h b/cache.h
index 375a7fb15e..d759824803 100644
--- a/cache.h
+++ b/cache.h
@@ -13,6 +13,7 @@
 #include "hash.h"
 #include "path.h"
 #include "sha1-array.h"
+#include "repository.h"
 
 #ifndef platform_SHA_CTX
 /*
@@ -112,6 +113,8 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[1];
 
+#define current_hash the_repository->hash_algo
+
 #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
 #define DTYPE(de)	((de)->d_type)
 #else
@@ -894,6 +897,7 @@ struct repository_format {
 	int version;
 	int precious_objects;
 	int is_bare;
+	int hash_algo;
 	char *work_tree;
 	struct string_list unknown_extensions;
 };
diff --git a/repository.c b/repository.c
index 1617467568..37764f627a 100644
--- a/repository.c
+++ b/repository.c
@@ -62,6 +62,11 @@ void repo_set_gitdir(struct repository *repo, const char *path)
 	repo_setup_env(repo);
 }
 
+void repo_set_hash_algo(struct repository *repo, int hash_algo)
+{
+	repo->hash_algo = &hash_algos[hash_algo];
+}
+
 /*
  * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
  * Return 0 upon success and a non-zero value upon failure.
@@ -134,6 +139,8 @@ int repo_init(struct repository *repo, const char *gitdir, const char *worktree)
 	if (read_and_verify_repository_format(&format, repo->commondir))
 		goto error;
 
+	repo->hash_algo = &hash_algos[format.hash_algo];
+
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
diff --git a/repository.h b/repository.h
index 417787f3ef..f171172150 100644
--- a/repository.h
+++ b/repository.h
@@ -4,6 +4,7 @@
 struct config_set;
 struct index_state;
 struct submodule_cache;
+struct git_hash_algo;
 
 struct repository {
 	/* Environment */
@@ -67,6 +68,9 @@ struct repository {
 	 */
 	struct index_state *index;
 
+	/* Repository's current hash algorithm. */
+	const struct git_hash_algo *hash_algo;
+
 	/* Configurations */
 	/*
 	 * Bit used during initialization to indicate if repository state (like
@@ -86,6 +90,7 @@ extern struct repository *the_repository;
 
 extern void repo_set_gitdir(struct repository *repo, const char *path);
 extern void repo_set_worktree(struct repository *repo, const char *path);
+extern void repo_set_hash_algo(struct repository *repo, int algo);
 extern int repo_init(struct repository *repo, const char *gitdir, const char *worktree);
 extern int repo_submodule_init(struct repository *submodule,
 			       struct repository *superproject,
diff --git a/setup.c b/setup.c
index 115e70a4e8..289e24811c 100644
--- a/setup.c
+++ b/setup.c
@@ -491,6 +491,7 @@ int read_repository_format(struct repository_format *format, const char *path)
 	memset(format, 0, sizeof(*format));
 	format->version = -1;
 	format->is_bare = -1;
+	format->hash_algo = GIT_HASH_SHA1;
 	string_list_init(&format->unknown_extensions, 1);
 	git_config_from_file(check_repo_format, path, format);
 	return format->version;
@@ -1125,6 +1126,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 			repo_set_gitdir(the_repository, gitdir);
 			setup_git_env();
 		}
+		repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
 	}
 
 	strbuf_release(&dir);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH 6/6] Switch empty tree and blob lookups to use hash abstraction
  2017-08-21  0:00 [RFC PATCH 0/6] Hash Abstraction brian m. carlson
                   ` (4 preceding siblings ...)
  2017-08-21  0:00 ` [RFC PATCH 5/6] Integrate hash algorithm support with repo setup brian m. carlson
@ 2017-08-21  0:00 ` brian m. carlson
  2017-08-21  0:18 ` [RFC PATCH 0/6] Hash Abstraction Junio C Hamano
  2017-08-21 20:48 ` Stefan Beller
  7 siblings, 0 replies; 11+ messages in thread
From: brian m. carlson @ 2017-08-21  0:00 UTC (permalink / raw)
  To: git

Switch the uses of empty_tree_oid and empty_blob_oid to use the
current_hash abstraction that represents the current hash algorithm in
use.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/am.c       | 2 +-
 builtin/checkout.c | 2 +-
 builtin/diff.c     | 2 +-
 builtin/pull.c     | 2 +-
 cache.h            | 8 ++++----
 diff-lib.c         | 2 +-
 merge-recursive.c  | 2 +-
 notes-merge.c      | 2 +-
 sequencer.c        | 6 +++---
 submodule.c        | 2 +-
 10 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 40cc6d6fe8..347b30bd1f 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1435,7 +1435,7 @@ static void write_index_patch(const struct am_state *state)
 	if (!get_oid_tree("HEAD", &head))
 		tree = lookup_tree(&head);
 	else
-		tree = lookup_tree(&empty_tree_oid);
+		tree = lookup_tree(current_hash->empty_tree);
 
 	fp = xfopen(am_path(state, "patch"), "w");
 	init_revisions(&rev_info, NULL);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2d75ac66c7..efdec232b7 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -515,7 +515,7 @@ static int merge_working_tree(const struct checkout_opts *opts,
 		}
 		tree = parse_tree_indirect(old->commit ?
 					   &old->commit->object.oid :
-					   &empty_tree_oid);
+					   current_hash->empty_tree);
 		init_tree_desc(&trees[0], tree->buffer, tree->size);
 		tree = parse_tree_indirect(&new->commit->object.oid);
 		init_tree_desc(&trees[1], tree->buffer, tree->size);
diff --git a/builtin/diff.c b/builtin/diff.c
index 7cde6abbcf..4cb5d23899 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -382,7 +382,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 				add_head_to_pending(&rev);
 				if (!rev.pending.nr) {
 					struct tree *tree;
-					tree = lookup_tree(&empty_tree_oid);
+					tree = lookup_tree(current_hash->empty_tree);
 					add_pending_object(&rev, &tree->object, "HEAD");
 				}
 				break;
diff --git a/builtin/pull.c b/builtin/pull.c
index 9b86e519b1..4fa5db2fc9 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -537,7 +537,7 @@ static int pull_into_void(const struct object_id *merge_head,
 	 * index/worktree changes that the user already made on the unborn
 	 * branch.
 	 */
-	if (checkout_fast_forward(&empty_tree_oid, merge_head, 0))
+	if (checkout_fast_forward(current_hash->empty_tree, merge_head, 0))
 		return 1;
 
 	if (update_ref("initial pull", "HEAD", merge_head->hash, curr_head->hash, 0, UPDATE_REFS_DIE_ON_ERR))
diff --git a/cache.h b/cache.h
index d759824803..23d2e29fcd 100644
--- a/cache.h
+++ b/cache.h
@@ -1051,22 +1051,22 @@ extern const struct object_id empty_blob_oid;
 
 static inline int is_empty_blob_sha1(const unsigned char *sha1)
 {
-	return !hashcmp(sha1, EMPTY_BLOB_SHA1_BIN);
+	return !hashcmp(sha1, current_hash->empty_blob->hash);
 }
 
 static inline int is_empty_blob_oid(const struct object_id *oid)
 {
-	return !hashcmp(oid->hash, EMPTY_BLOB_SHA1_BIN);
+	return !oidcmp(oid, current_hash->empty_blob);
 }
 
 static inline int is_empty_tree_sha1(const unsigned char *sha1)
 {
-	return !hashcmp(sha1, EMPTY_TREE_SHA1_BIN);
+	return !hashcmp(sha1, current_hash->empty_tree->hash);
 }
 
 static inline int is_empty_tree_oid(const struct object_id *oid)
 {
-	return !hashcmp(oid->hash, EMPTY_TREE_SHA1_BIN);
+	return !oidcmp(oid, current_hash->empty_tree);
 }
 
 /* set default permissions by passing mode arguments to open(2) */
diff --git a/diff-lib.c b/diff-lib.c
index 2a52b07954..0575f4cb1b 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -216,7 +216,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			} else if (revs->diffopt.ita_invisible_in_index &&
 				   ce_intent_to_add(ce)) {
 				diff_addremove(&revs->diffopt, '+', ce->ce_mode,
-					       &empty_tree_oid, 0,
+					       current_hash->empty_tree, 0,
 					       ce->name, 0);
 				continue;
 			}
diff --git a/merge-recursive.c b/merge-recursive.c
index 1494ffdb82..981fb668f9 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2043,7 +2043,7 @@ int merge_recursive(struct merge_options *o,
 		/* if there is no common ancestor, use an empty tree */
 		struct tree *tree;
 
-		tree = lookup_tree(&empty_tree_oid);
+		tree = lookup_tree(current_hash->empty_tree);
 		merged_common_ancestors = make_virtual_commit(tree, "ancestor");
 	}
 
diff --git a/notes-merge.c b/notes-merge.c
index c12b354f10..5b1e2ef8d6 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -597,7 +597,7 @@ int notes_merge(struct notes_merge_options *o,
 	bases = get_merge_bases(local, remote);
 	if (!bases) {
 		base_oid = &null_oid;
-		base_tree_oid = &empty_tree_oid;
+		base_tree_oid = current_hash->empty_tree;
 		if (o->verbosity >= 4)
 			printf("No merge base found; doing history-less merge\n");
 	} else if (!bases->next) {
diff --git a/sequencer.c b/sequencer.c
index f5e85a398d..86961a29cb 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -345,7 +345,7 @@ static int read_oneliner(struct strbuf *buf,
 
 static struct tree *empty_tree(void)
 {
-	return lookup_tree(&empty_tree_oid);
+	return lookup_tree(current_hash->empty_tree);
 }
 
 static int error_dirty_index(struct replay_opts *opts)
@@ -703,7 +703,7 @@ static int is_original_commit_empty(struct commit *commit)
 				oid_to_hex(&parent->object.oid));
 		ptree_oid = &parent->tree->object.oid;
 	} else {
-		ptree_oid = &empty_tree_oid; /* commit is root */
+		ptree_oid = current_hash->empty_tree; /* commit is root */
 	}
 
 	return !oidcmp(ptree_oid, &commit->tree->object.oid);
@@ -956,7 +956,7 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
 	} else {
 		unborn = get_oid("HEAD", &head);
 		if (unborn)
-			oidcpy(&head, &empty_tree_oid);
+			oidcpy(&head, current_hash->empty_tree);
 		if (index_differs_from(unborn ? EMPTY_TREE_SHA1_HEX : "HEAD", 0, 0))
 			return error_dirty_index(opts);
 	}
diff --git a/submodule.c b/submodule.c
index 27de65a4a0..12ad6b1338 100644
--- a/submodule.c
+++ b/submodule.c
@@ -656,7 +656,7 @@ void show_submodule_inline_diff(FILE *f, const char *path,
 		const char *del, const char *add, const char *reset,
 		const struct diff_options *o)
 {
-	const struct object_id *old = &empty_tree_oid, *new = &empty_tree_oid;
+	const struct object_id *old = current_hash->empty_tree, *new = current_hash->empty_tree;
 	struct commit *left = NULL, *right = NULL;
 	struct commit_list *merge_bases = NULL;
 	struct strbuf submodule_dir = STRBUF_INIT;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 0/6] Hash Abstraction
  2017-08-21  0:00 [RFC PATCH 0/6] Hash Abstraction brian m. carlson
                   ` (5 preceding siblings ...)
  2017-08-21  0:00 ` [RFC PATCH 6/6] Switch empty tree and blob lookups to use hash abstraction brian m. carlson
@ 2017-08-21  0:18 ` Junio C Hamano
  2017-08-21 20:48 ` Stefan Beller
  7 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2017-08-21  0:18 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> brian m. carlson (6):
>   vcs-svn: remove unused prototypes
>   vcs-svn: rename repo functions to "svn_repo"
>   setup: expose enumerated repo info
>   Add structure representing hash algorithm
>   Integrate hash algorithm support with repo setup
>   Switch empty tree and blob lookups to use hash abstraction

The first two are about very quiescent code that can go in and sail
thru 'next' very quickly without problems, I would think.

After the third one alone, the discovered format information may not
be fully plumbed through (i.e. after setup finishes there is no way
for outside callers to peek at the repo_fmt instance that was on
stack), but it seems that this is sufficient for the purpose of this
step and it looks a very sensible first step.,

As to patch 4, I tend to agree with the analysis you had in this
cover letter--it did make me wonder if we want to have union of hash
contenxt structures, for example.  But the enumeration of virtual
functions looks about right.

I wondered if we wanted "binery size" and "hex size" separately,
because the latter will always be twice as big as the former as long
as the latter's definition is "hex", but that is a minor point.

Thanks for starting this.

>  builtin/am.c        |  2 +-
>  builtin/checkout.c  |  2 +-
>  builtin/diff.c      |  2 +-
>  builtin/pull.c      |  2 +-
>  cache.h             | 48 ++++++++++++++++++++++++++++++++++++++++++++----
>  diff-lib.c          |  2 +-
>  merge-recursive.c   |  2 +-
>  notes-merge.c       |  2 +-
>  repository.c        |  7 +++++++
>  repository.h        |  5 +++++
>  sequencer.c         |  6 +++---
>  setup.c             | 48 +++++++++++++++++++++++++++---------------------
>  sha1_file.c         | 29 +++++++++++++++++++++++++++++
>  submodule.c         |  2 +-
>  vcs-svn/repo_tree.c |  6 +++---
>  vcs-svn/repo_tree.h | 13 +++----------
>  vcs-svn/svndump.c   |  8 ++++----
>  17 files changed, 133 insertions(+), 53 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 0/6] Hash Abstraction
  2017-08-21  0:00 [RFC PATCH 0/6] Hash Abstraction brian m. carlson
                   ` (6 preceding siblings ...)
  2017-08-21  0:18 ` [RFC PATCH 0/6] Hash Abstraction Junio C Hamano
@ 2017-08-21 20:48 ` Stefan Beller
  7 siblings, 0 replies; 11+ messages in thread
From: Stefan Beller @ 2017-08-21 20:48 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git@vger.kernel.org

On Sun, Aug 20, 2017 at 5:00 PM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> = Overview
>
> This is an RFC series proposing a basic abstraction for hash functions.
>
> As we get closer to converting the remainder of the codebase to use
> struct object_id, we should think about the design we want our hash
> function abstraction to take.  This series is a proposal for one idea to
> start discussion.  Input on any aspect of this proposal is welcome.
>
> This series exposes a struct git_hash_algo that contains basic
> information about a given hash algorithm that distinguishes it from
> other algorithms: name, lengths, implementing functions, and empty tree
> and blob constants.  It also exposes an array of hash algorithms, and a
> constant for indexing them.
>
> The series also demonstrates a simple conversion using the abstraction
> over empty blob and tree values.
>
> In order to avoid conflicting with the struct repository work and with
> the goal of avoiding global variables as much as possible, I've pushed
> the hash algorithm into struct repository and exposed it via a #define.
> This necessitiates pulling repository.h into cache.h, which I don't
> think is fatal.  Doing that, in turn, necessitated some work on the
> Subversion code to avoid conflicts.
>
> It should be fine for Junio to pick up the first two patches from this
> series, as they're relatively independent and valuable without the rest
> of the series.  The rest should not be applied immediately, although
> they do pass the testsuite.
>
> I proposed this series now as it will inform the way we go about
> converting other parts of the codebase, especially some of the pack
> algorithms.  Because we share some hash computation code between pack
> checksums and object hashing, we need to decide whether to expose pack
> checksums as struct object_id, even though they are technically not
> object IDs.  Furthermore, if we end up needing to stuff an algorithm
> value into struct object_id, we'll no longer be able to directly
> reference object IDs in a pack without a copy.

Note that the checksum hash could be different from actual object ids,
so it may be easiest to introduce a "new pack format", which is just
s/sha1/<new hash>/g for any object ids, though I guess the work needed
is just the same for converting checksums to the new hash, too.


> This series is available from the usual places as branch hash-struct,
> based against master.
>
> = Naming
>
> The length names are similar to the current constant names
> intentionally.  I've used the "hash_algo" name for both the integer
> constant and the pointer to struct, although we could change the latter
> to "hash_impl" or such as people like.
>
> I chose to name the define "current_hash" and expose no other defines.
> The name is relatively short since we're going to be typing it a lot.
> However, if people like, we can capitalize it or expose other defines
> (say, a GIT_HASH_RAWSZ or GIT_HASH_HEXSZ) instead of or in addition to
> current_hash, which would make this name less interesting.
>
> Feel free to propose alternatives to the naming of anything in this
> series.
>
> = Open Issues
>
> I originally decided to convert hex.c as an example, but quickly found
> out that this caused segfaults.  As part of setup, we call
> is_git_directory, which calls validate_headref, which ends up calling
> get_sha1_hex.  Obviously, we don't have a repository, so the hash
> algorithm isn't set up yet.  This is an area we'll need to consider
> making hash function agnostic, and we may also need to consider
> inserting a hash constant integer into struct object_id if we're going
> to do that.  Alternatively, we could just paper over this issue as a
> special case.

We could introduce new files .sha1 or .sha256 in the config dir, whose
existence can be checked prior to validating the HEAD.

Maybe the HEAD validation can be non-strict and could check
if we'd have at least 'n' hex chars and at most 'm' hex chars. 'n' and 'm'
depend on the hash functions supported of that version of git. (Maybe
we'd validate later again to have the correct hash function length,
or we could also allow abbreviated hash names for HEAD)

> Clearly we're going to want to expose some sort of lookup functionality
> for hash algorithms.  We'll need to expose lookup by name (for the
> .git/config file and any command-line options), but we may want other
> functions as well.  What functions should those be?  Should we expose
> the structure or the constant for those lookup functions?  If the
> structure, we'll probably need to expose the constant in the structure
> as well for easy use.
>
> Should we avoid exposing the array of structure altogether and wrap this
> in a function?
>
> We could expose a union of hash context structures and take that as the
> pointer type for the API calls.  That would probably obviate the need
> for ctxsz.
>
> We could expose hex versions of the blob constants if desired.  This
> might make converting the remaining pieces of code that use them easier.
>
> There are probably dozens of other things I haven't thought of yet as
> well.
>
> brian m. carlson (6):
>   vcs-svn: remove unused prototypes
>   vcs-svn: rename repo functions to "svn_repo"
>   setup: expose enumerated repo info
>   Add structure representing hash algorithm
>   Integrate hash algorithm support with repo setup
>   Switch empty tree and blob lookups to use hash abstraction
>
>  builtin/am.c        |  2 +-
>  builtin/checkout.c  |  2 +-
>  builtin/diff.c      |  2 +-
>  builtin/pull.c      |  2 +-
>  cache.h             | 48 ++++++++++++++++++++++++++++++++++++++++++++----
>  diff-lib.c          |  2 +-
>  merge-recursive.c   |  2 +-
>  notes-merge.c       |  2 +-
>  repository.c        |  7 +++++++
>  repository.h        |  5 +++++
>  sequencer.c         |  6 +++---
>  setup.c             | 48 +++++++++++++++++++++++++++---------------------
>  sha1_file.c         | 29 +++++++++++++++++++++++++++++
>  submodule.c         |  2 +-
>  vcs-svn/repo_tree.c |  6 +++---
>  vcs-svn/repo_tree.h | 13 +++----------
>  vcs-svn/svndump.c   |  8 ++++----
>  17 files changed, 133 insertions(+), 53 deletions(-)
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 4/6] Add structure representing hash algorithm
  2017-08-21  0:00 ` [RFC PATCH 4/6] Add structure representing hash algorithm brian m. carlson
@ 2017-08-21 21:08   ` Stefan Beller
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Beller @ 2017-08-21 21:08 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git@vger.kernel.org

On Sun, Aug 20, 2017 at 5:00 PM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> Since in the future we want to support an additional hash algorithm, add
> a structure that represents a hash algorithm and all the data that must
> go along with it.  Add a constant to allow easy enumeration of hash
> algorithms.  Implement function typedefs to create an abstract API that
> can be used by any hash algorithm, and wrappers for the existing SHA1
> functions that conform to this API.
>
> Don't include an entry in the hash algorithm structure for the null
> object ID.  As this value is all zeros, any suitably sized all-zero
> object ID can be used, and there's no need to store a given one on a
> per-hash basis.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  cache.h     | 36 ++++++++++++++++++++++++++++++++++++
>  sha1_file.c | 29 +++++++++++++++++++++++++++++
>  2 files changed, 65 insertions(+)
>
> diff --git a/cache.h b/cache.h
> index 1c69d2a05a..375a7fb15e 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -76,6 +76,42 @@ struct object_id {
>         unsigned char hash[GIT_MAX_RAWSZ];
>  };
>
> +#define GIT_HASH_SHA1 0
> +
> +typedef void (*git_hash_init_fn)(void *ctx);
> +typedef void (*git_hash_update_fn)(void *ctx, const void *in, size_t len);
> +typedef void (*git_hash_final_fn)(unsigned char *hash, void *ctx);
> +
> +struct git_hash_algo {
> +       /* The name of the algorithm, as appears in the config file. */
> +       const char *name;

and potentially in error messages?

> +
> +       /* The size of a hash context (e.g. git_SHA_CTX). */
> +       size_t ctxsz;
> +
> +       /* The length of the hash in bytes. */
> +       size_t rawsz;

[in binary, as opposed to the next entry]

> +
> +       /* The length of the hash in hex characters. */
> +       size_t hexsz;

By having two entries for binary and hex size, we current
choice of needing twice as much choice for the human
representation (as is inherent from the binary <-> hex
conversion, one char stores 8 or 4 bit); so this is good to
have. But is it misnamed? (This abstraction only makes
sense if we ever plan to have human readable representation
not factor 2, e.g. base64 for the non-binary representation.
But then the comment is wrong!)

Maybe s/hex characters/string representation suited for humans/ ?
(Bad naming proposal, but still)

> +       /* The hash initialization function. */
> +       git_hash_init_fn init_fn;
> +
> +       /* The hash update function. */
> +       git_hash_update_fn update_fn;
> +
> +       /* The hash finalization function. */
> +       git_hash_final_fn final_fn;

I shortly wondered if we want to have just one
pointer to a struct that contains these 3 functions,
but that seems overly complex.

> +       /* The OID of the empty tree. */
> +       const struct object_id *empty_tree;
> +
> +       /* The OID of the empty blob. */
> +       const struct object_id *empty_blob;

In a more object oriented world, this would
not be as micro-optimized(?), but rather be:

    object o = new object()
    o.getHash()

or such, but we tend to access empty_{blob,tree}
quite often in the code base. So it seems like a good
idea to keep this shortcut around.


> +};
> +extern const struct git_hash_algo hash_algos[1];
> +
>  #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
>  #define DTYPE(de)      ((de)->d_type)
>  #else
> diff --git a/sha1_file.c b/sha1_file.c
> index b60ae15f70..bd9ff02802 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -41,6 +41,35 @@ const struct object_id empty_blob_oid = {
>         EMPTY_BLOB_SHA1_BIN_LITERAL
>  };
>
> +static inline void git_hash_sha1_init(void *ctx)
> +{
> +       git_SHA1_Init((git_SHA_CTX *)ctx);
> +}
> +
> +static inline void git_hash_sha1_update(void *ctx, const void *data, size_t len)
> +{
> +       git_SHA1_Update((git_SHA_CTX *)ctx, data, len);
> +}
> +
> +static inline void git_hash_sha1_final(unsigned char *hash, void *ctx)
> +{
> +       git_SHA1_Final(hash, (git_SHA_CTX *)ctx);
> +}
> +
> +const struct git_hash_algo hash_algos[1] = {
> +       [GIT_HASH_SHA1] = {
> +               .name = "sha-1",
> +               .ctxsz = sizeof(git_SHA_CTX),
> +               .rawsz = GIT_SHA1_RAWSZ,
> +               .hexsz = GIT_SHA1_HEXSZ,
> +               .init_fn = git_hash_sha1_init,
> +               .update_fn = git_hash_sha1_update,
> +               .final_fn = git_hash_sha1_final,
> +               .empty_tree = &empty_tree_oid,
> +               .empty_blob = &empty_blob_oid,
> +       },
> +};

This is the same new pattern as in 512f41cfac (clean.c: use
designated initializer, 2017-07-14)

Maybe we want to keep just one test balloon and not mix
this one in there, too?

> +
>  /*
>   * This is meant to hold a *small* number of objects that you would
>   * want read_sha1_file() to be able to return, but yet you do not want

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 5/6] Integrate hash algorithm support with repo setup
  2017-08-21  0:00 ` [RFC PATCH 5/6] Integrate hash algorithm support with repo setup brian m. carlson
@ 2017-08-21 21:16   ` Stefan Beller
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Beller @ 2017-08-21 21:16 UTC (permalink / raw)
  To: brian m. carlson, Brandon Williams; +Cc: git@vger.kernel.org

On Sun, Aug 20, 2017 at 5:00 PM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> In future versions of Git, we plan to support an additional hash
> algorithm.  Integrate the enumeration of hash algorithms with repository
> setup, and store a pointer to the enumerated data in struct repository.
> Of course, we currently only support SHA-1, so hard-code this value in
> read_repository_format.  In the future, we'll enumerate this value from
> the configuration.
>
> Add a constant, current_hash, which points to the hash_algo structure
> pointer in the repository global.  Include repository.h in cache.h since
> we now need to have access to these struct and variable definitions.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>

The new code seems to fit in nicely with the repository struct.

In another series I also included repository.h in cache.h
(not yet sent out), so I think that is a good idea.

> ---
>  cache.h      | 4 ++++
>  repository.c | 7 +++++++
>  repository.h | 5 +++++
>  setup.c      | 2 ++
>  4 files changed, 18 insertions(+)
>
> diff --git a/cache.h b/cache.h
> index 375a7fb15e..d759824803 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -13,6 +13,7 @@
>  #include "hash.h"
>  #include "path.h"
>  #include "sha1-array.h"
> +#include "repository.h"
>
>  #ifndef platform_SHA_CTX
>  /*
> @@ -112,6 +113,8 @@ struct git_hash_algo {
>  };
>  extern const struct git_hash_algo hash_algos[1];
>
> +#define current_hash the_repository->hash_algo
> +
>  #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
>  #define DTYPE(de)      ((de)->d_type)
>  #else
> @@ -894,6 +897,7 @@ struct repository_format {
>         int version;
>         int precious_objects;
>         int is_bare;
> +       int hash_algo;
>         char *work_tree;
>         struct string_list unknown_extensions;
>  };
> diff --git a/repository.c b/repository.c
> index 1617467568..37764f627a 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -62,6 +62,11 @@ void repo_set_gitdir(struct repository *repo, const char *path)
>         repo_setup_env(repo);
>  }
>
> +void repo_set_hash_algo(struct repository *repo, int hash_algo)
> +{
> +       repo->hash_algo = &hash_algos[hash_algo];
> +}
> +
>  /*
>   * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
>   * Return 0 upon success and a non-zero value upon failure.
> @@ -134,6 +139,8 @@ int repo_init(struct repository *repo, const char *gitdir, const char *worktree)
>         if (read_and_verify_repository_format(&format, repo->commondir))
>                 goto error;
>
> +       repo->hash_algo = &hash_algos[format.hash_algo];
> +
>         if (worktree)
>                 repo_set_worktree(repo, worktree);
>
> diff --git a/repository.h b/repository.h
> index 417787f3ef..f171172150 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -4,6 +4,7 @@
>  struct config_set;
>  struct index_state;
>  struct submodule_cache;
> +struct git_hash_algo;
>
>  struct repository {
>         /* Environment */
> @@ -67,6 +68,9 @@ struct repository {
>          */
>         struct index_state *index;
>
> +       /* Repository's current hash algorithm. */
> +       const struct git_hash_algo *hash_algo;
> +
>         /* Configurations */
>         /*
>          * Bit used during initialization to indicate if repository state (like
> @@ -86,6 +90,7 @@ extern struct repository *the_repository;
>
>  extern void repo_set_gitdir(struct repository *repo, const char *path);
>  extern void repo_set_worktree(struct repository *repo, const char *path);
> +extern void repo_set_hash_algo(struct repository *repo, int algo);
>  extern int repo_init(struct repository *repo, const char *gitdir, const char *worktree);
>  extern int repo_submodule_init(struct repository *submodule,
>                                struct repository *superproject,
> diff --git a/setup.c b/setup.c
> index 115e70a4e8..289e24811c 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -491,6 +491,7 @@ int read_repository_format(struct repository_format *format, const char *path)
>         memset(format, 0, sizeof(*format));
>         format->version = -1;
>         format->is_bare = -1;
> +       format->hash_algo = GIT_HASH_SHA1;
>         string_list_init(&format->unknown_extensions, 1);
>         git_config_from_file(check_repo_format, path, format);
>         return format->version;
> @@ -1125,6 +1126,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
>                         repo_set_gitdir(the_repository, gitdir);
>                         setup_git_env();
>                 }
> +               repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
>         }
>
>         strbuf_release(&dir);

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-08-21 21:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-21  0:00 [RFC PATCH 0/6] Hash Abstraction brian m. carlson
2017-08-21  0:00 ` [RFC PATCH 1/6] vcs-svn: remove unused prototypes brian m. carlson
2017-08-21  0:00 ` [RFC PATCH 2/6] vcs-svn: rename repo functions to "svn_repo" brian m. carlson
2017-08-21  0:00 ` [RFC PATCH 3/6] setup: expose enumerated repo info brian m. carlson
2017-08-21  0:00 ` [RFC PATCH 4/6] Add structure representing hash algorithm brian m. carlson
2017-08-21 21:08   ` Stefan Beller
2017-08-21  0:00 ` [RFC PATCH 5/6] Integrate hash algorithm support with repo setup brian m. carlson
2017-08-21 21:16   ` Stefan Beller
2017-08-21  0:00 ` [RFC PATCH 6/6] Switch empty tree and blob lookups to use hash abstraction brian m. carlson
2017-08-21  0:18 ` [RFC PATCH 0/6] Hash Abstraction Junio C Hamano
2017-08-21 20:48 ` Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).