[PATCH 0/6] convert hash_pos() to oid

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 0/6] convert hash_pos() to oid_pos()
@ 2021-01-28  6:12 Jeff King
  2021-01-28  6:12 ` [PATCH 1/6] commit_graft_pos(): take an oid instead of a bare hash Jeff King
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Jeff King @ 2021-01-28  6:12 UTC (permalink / raw)
  To: git

I started this series in early December, after getting annoyed that we
still have a function called sha1_pos(). But before I got a chance to
polish it, it became hash_pos(). That removed most of my annoyance,
though I do still like the type-safety that using object_id brings.

And of course there were some interesting cleanups along the way. So I
decided to rebase and send it anyway. I think one could argue that
hash_pos() is a better interface because a caller _could_ use it without
an object_id.  But in practice we do not seem to do so (the one
exception is rerere, but I ended up refactoring that away anyway).

So here it is. Possibly code churn, possibly cleanup. Opinions welcome. :)

  [1/6]: commit_graft_pos(): take an oid instead of a bare hash
  [2/6]: rerere: check dirname format while iterating rr_cache directory
  [3/6]: rerere: tighten rr-cache dirname check
  [4/6]: rerere: use strmap to store rerere directories
  [5/6]: hash_pos(): convert to oid_pos()
  [6/6]: oid_pos(): access table through const pointers

 builtin/name-rev.c  | 10 +++---
 commit-graph.c      | 30 +++++++++---------
 commit.c            | 18 +++++------
 commit.h            |  2 +-
 hash-lookup.c       | 18 +++++------
 hash-lookup.h       | 10 +++---
 oid-array.c         |  8 ++---
 pack-bitmap-write.c |  8 ++---
 rerere.c            | 75 ++++++++++++++++++++-------------------------
 shallow.c           |  2 +-
 10 files changed, 87 insertions(+), 94 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/6] commit_graft_pos(): take an oid instead of a bare hash
  2021-01-28  6:12 [PATCH 0/6] convert hash_pos() to oid_pos() Jeff King
@ 2021-01-28  6:12 ` Jeff King
  2021-01-28 12:57   ` Derrick Stolee
  2021-01-28  6:14 ` [PATCH 2/6] rerere: check dirname format while iterating rr_cache directory Jeff King
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2021-01-28  6:12 UTC (permalink / raw)
  To: git

All of our callers have an object_id, and are just dereferencing the
hash field to pass to us. Let's take the actual object_id instead. We
still access the hash to pass to hash_pos, but it's a step in the right
direction.

This makes the callers slightly simpler, but also gets rid of the
untyped pointer, as well as the now-inaccurate name "sha1".

Signed-off-by: Jeff King <peff@peff.net>
---
I think this one is an obvious cleanup that we'd want whether we go
further in the series or not.

 commit.c  | 8 ++++----
 commit.h  | 2 +-
 shallow.c | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/commit.c b/commit.c
index bab8d5ab07..fa26729ba5 100644
--- a/commit.c
+++ b/commit.c
@@ -111,17 +111,17 @@ static const unsigned char *commit_graft_sha1_access(size_t index, void *table)
 	return commit_graft_table[index]->oid.hash;
 }
 
-int commit_graft_pos(struct repository *r, const unsigned char *sha1)
+int commit_graft_pos(struct repository *r, const struct object_id *oid)
 {
-	return hash_pos(sha1, r->parsed_objects->grafts,
+	return hash_pos(oid->hash, r->parsed_objects->grafts,
 			r->parsed_objects->grafts_nr,
 			commit_graft_sha1_access);
 }
 
 int register_commit_graft(struct repository *r, struct commit_graft *graft,
 			  int ignore_dups)
 {
-	int pos = commit_graft_pos(r, graft->oid.hash);
+	int pos = commit_graft_pos(r, &graft->oid);
 
 	if (0 <= pos) {
 		if (ignore_dups)
@@ -232,7 +232,7 @@ struct commit_graft *lookup_commit_graft(struct repository *r, const struct obje
 {
 	int pos;
 	prepare_commit_graft(r);
-	pos = commit_graft_pos(r, oid->hash);
+	pos = commit_graft_pos(r, oid);
 	if (pos < 0)
 		return NULL;
 	return r->parsed_objects->grafts[pos];
diff --git a/commit.h b/commit.h
index f4e7b0158e..ecacf9ade3 100644
--- a/commit.h
+++ b/commit.h
@@ -239,7 +239,7 @@ typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
 struct commit_graft *read_graft_line(struct strbuf *line);
 /* commit_graft_pos returns an index into r->parsed_objects->grafts. */
-int commit_graft_pos(struct repository *r, const unsigned char *sha1);
+int commit_graft_pos(struct repository *r, const struct object_id *oid);
 int register_commit_graft(struct repository *r, struct commit_graft *, int);
 void prepare_commit_graft(struct repository *r);
 struct commit_graft *lookup_commit_graft(struct repository *r, const struct object_id *oid);
diff --git a/shallow.c b/shallow.c
index 91b9e1073c..9ed18eb884 100644
--- a/shallow.c
+++ b/shallow.c
@@ -41,7 +41,7 @@ int register_shallow(struct repository *r, const struct object_id *oid)
 
 int unregister_shallow(const struct object_id *oid)
 {
-	int pos = commit_graft_pos(the_repository, oid->hash);
+	int pos = commit_graft_pos(the_repository, oid);
 	if (pos < 0)
 		return -1;
 	if (pos + 1 < the_repository->parsed_objects->grafts_nr)
-- 
2.30.0.758.gfe500d6872


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/6] rerere: check dirname format while iterating rr_cache directory
  2021-01-28  6:12 [PATCH 0/6] convert hash_pos() to oid_pos() Jeff King
  2021-01-28  6:12 ` [PATCH 1/6] commit_graft_pos(): take an oid instead of a bare hash Jeff King
@ 2021-01-28  6:14 ` Jeff King
  2021-01-28  6:16 ` [PATCH 3/6] rerere: tighten rr-cache dirname check Jeff King
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2021-01-28  6:14 UTC (permalink / raw)
  To: git

In rerere_gc(), we walk over the .git/rr_cache directory and create a
struct for each entry we find. We feed any name we get from readdir() to
find_rerere_dir(), which then calls get_sha1_hex() on it (since we use
the binary hash as a lookup key). If that fails (i.e., the directory
name is not what we expected), it returns NULL. But the comment in
find_rerere_dir() says "BUG".

It _would_ be a bug for the call from new_rerere_id_hex(), the only
other code path, to fail here; it's generating the hex internally. But
the call in rerere_gc() is using it say "is this a plausible directory
name".

Let's instead have rerere_gc() do its own "is this plausible" check.
That has two benefits:

  - we can now reliably BUG() inside find_rerere_dir(), which would
    catch bugs in the other code path (and we now will never return NULL
    from the function, which makes it easier to see that a rerere_id
    struct will always have a non-NULL "collection" field).

  - it makes the use of the binary hash an implementation detail of
    find_rerere_dir(), not known by callers. That will free us up to
    change it in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
---
Obviously I have an ulterior motive here, but mostly this just seemed
to make the code clearer to me. The separation of concerns might also
help if we did something more exotic with rerere conflict hashes (which
just use the_hash_algo now, but really don't have to).

 rerere.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/rerere.c b/rerere.c
index d6928c1b5c..7b0c262ac6 100644
--- a/rerere.c
+++ b/rerere.c
@@ -146,7 +146,7 @@ static struct rerere_dir *find_rerere_dir(const char *hex)
 	int pos;
 
 	if (get_sha1_hex(hex, hash))
-		return NULL; /* BUG */
+		BUG("cannot parse rerere dir hex?");
 	pos = hash_pos(hash, rerere_dir, rerere_dir_nr, rerere_dir_hash);
 	if (pos < 0) {
 		rr_dir = xmalloc(sizeof(*rr_dir));
@@ -1178,6 +1178,13 @@ static void prune_one(struct rerere_id *id,
 		unlink_rr_item(id);
 }
 
+/* Does the basename in "path" look plausibly like an rr-cache entry? */
+static int is_rr_cache_dirname(const char *path)
+{
+	unsigned char hash[GIT_MAX_RAWSZ];
+	return !get_sha1_hex(path, hash);
+}
+
 void rerere_gc(struct repository *r, struct string_list *rr)
 {
 	struct string_list to_remove = STRING_LIST_INIT_DUP;
@@ -1205,10 +1212,11 @@ void rerere_gc(struct repository *r, struct string_list *rr)
 
 		if (is_dot_or_dotdot(e->d_name))
 			continue;
-		rr_dir = find_rerere_dir(e->d_name);
-		if (!rr_dir)
+		if (!is_rr_cache_dirname(e->d_name))
 			continue; /* or should we remove e->d_name? */
 
+		rr_dir = find_rerere_dir(e->d_name);
+
 		now_empty = 1;
 		for (id.variant = 0, id.collection = rr_dir;
 		     id.variant < id.collection->status_nr;
-- 
2.30.0.758.gfe500d6872


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/6] rerere: tighten rr-cache dirname check
  2021-01-28  6:12 [PATCH 0/6] convert hash_pos() to oid_pos() Jeff King
  2021-01-28  6:12 ` [PATCH 1/6] commit_graft_pos(): take an oid instead of a bare hash Jeff King
  2021-01-28  6:14 ` [PATCH 2/6] rerere: check dirname format while iterating rr_cache directory Jeff King
@ 2021-01-28  6:16 ` Jeff King
  2021-01-28 22:35   ` Taylor Blau
  2021-01-28  6:19 ` [PATCH 4/6] rerere: use strmap to store rerere directories Jeff King
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2021-01-28  6:16 UTC (permalink / raw)
  To: git

We check only that get_sha1_hex() doesn't complain, which means we'd
match an all-hex name with trailing cruft after it. This probably
doesn't matter much in practice, since there shouldn't be anything else
ni the rr-cache directory, but it could possibly cause us to mix up sha1
and sha256 entries (which also shouldn't be intermingled, but could be
leftovers from a repository conversion).

Note that "get_sha1_hex()" is a confusing historical name. It is
actually using the_hash_algo, so it would be sha256 in a sha256 repo.
We'll switch to using parse_oid_hex(), because that conveniently
advances our pointer. But it also gets rid of the sha1 name. Arguably
it's a little funny to use "object_id" here for something that isn't
actually naming an object, but it's unlikely to be a problem (and is
contained in a single function).

Signed-off-by: Jeff King <peff@peff.net>
---
This is mostly just an oddity I noticed while touching the code, and
I've never seen triggered in real life.

 rerere.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/rerere.c b/rerere.c
index 7b0c262ac6..e92e305f96 100644
--- a/rerere.c
+++ b/rerere.c
@@ -1181,8 +1181,9 @@ static void prune_one(struct rerere_id *id,
 /* Does the basename in "path" look plausibly like an rr-cache entry? */
 static int is_rr_cache_dirname(const char *path)
 {
-	unsigned char hash[GIT_MAX_RAWSZ];
-	return !get_sha1_hex(path, hash);
+	struct object_id oid;
+	const char *end;
+	return !parse_oid_hex(path, &oid, &end) && !*end;
 }
 
 void rerere_gc(struct repository *r, struct string_list *rr)
-- 
2.30.0.758.gfe500d6872


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/6] rerere: use strmap to store rerere directories
  2021-01-28  6:12 [PATCH 0/6] convert hash_pos() to oid_pos() Jeff King
                   ` (2 preceding siblings ...)
  2021-01-28  6:16 ` [PATCH 3/6] rerere: tighten rr-cache dirname check Jeff King
@ 2021-01-28  6:19 ` Jeff King
  2021-01-28  6:34   ` Jeff King
  2021-01-28  6:38   ` Junio C Hamano
  2021-01-28  6:19 ` [PATCH 5/6] hash_pos(): convert to oid_pos() Jeff King
  2021-01-28  6:20 ` [PATCH 6/6] oid_pos(): access table through const pointers Jeff King
  5 siblings, 2 replies; 15+ messages in thread
From: Jeff King @ 2021-01-28  6:19 UTC (permalink / raw)
  To: git

On Thu, Dec 03, 2020 at 11:28:14AM -0500, Jeff King wrote:

> We store a struct for each directory we access under .git/rr-cache. The
> structs are kept in an array sorted by the binary hash associated with
> their name (and we do lookups with a binary search).
> 
> This works OK, but there are a few small downsides:
> 
>  - the amount of code isn't huge, but it's more than we'd need using one
>    of our other stock data structures
> 
>  - the insertion into a sorted array is quadratic (though in practice
>    it's unlikely anybody has enough conflicts for this to matter)
> 
>  - it's intimately tied to the representation of an object hash. This
>    isn't a big deal, as the conflict ids we generate use the same hash,
>    but it produces a few awkward bits (e.g., we are the only user of
>    hash_pos() that is not using object_id).
> 
> Let's instead just treat the directory names as strings, and store them
> in a strmap. This is less code, and removes the use of hash_pos().
> 
> Insertion is now non-quadratic, though we probably use a bit more
> memory. Besides the hash table overhead, and storing hex bytes instead
> of a binary hash, we actually store each name twice. Other code expects
> to access the name of a rerere_dir struct from the struct itself, so we
> need a copy there. But strmap keeps its own copy of the name, as well.
> 
> Using a bare hashmap instead of strmap means we could use the name for
> both, but at the cost of extra code (e.g., our own comparison function).
> Likewise, strmap has a feature to use a pointer to the in-struct name at
> the cost of a little extra code. I didn't do either here, as simple code
> seemed more important than squeezing out a few bytes of efficiency.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
This one might be controversial, or at least considered unnecessary
churn. Because the benefits I listed above are pretty negligible, and
really my ulterior motive is getting rid of the call to hash_pos().

Still, it was nice to try out strmap a bit more, and the resulting code
is shorter. It also abstracts things more, if we were to do something
more exotic with hashes (right now I think a sha256 repo just has sha256
conflict hashes, and everything would just work. I don't have any plans
to change that, but this would be a better base if somebody wanted to do
so later).

>  rerere.c | 62 +++++++++++++++++++++-----------------------------------
>  1 file changed, 23 insertions(+), 39 deletions(-)
> 
> diff --git a/rerere.c b/rerere.c
> index e92e305f96..dee60dc6df 100644
> --- a/rerere.c
> +++ b/rerere.c
> @@ -11,6 +11,7 @@
>  #include "pathspec.h"
>  #include "object-store.h"
>  #include "hash-lookup.h"
> +#include "strmap.h"
>  
>  #define RESOLVED 0
>  #define PUNTED 1
> @@ -23,26 +24,27 @@ static int rerere_enabled = -1;
>  /* automatically update cleanly resolved paths to the index */
>  static int rerere_autoupdate;
>  
> -static int rerere_dir_nr;
> -static int rerere_dir_alloc;
> -
>  #define RR_HAS_POSTIMAGE 1
>  #define RR_HAS_PREIMAGE 2
> -static struct rerere_dir {
> -	unsigned char hash[GIT_MAX_HEXSZ];
> +struct rerere_dir {
>  	int status_alloc, status_nr;
>  	unsigned char *status;
> -} **rerere_dir;
> +	char name[FLEX_ARRAY];
> +};
> +
> +static struct strmap rerere_dirs = STRMAP_INIT;
>  
>  static void free_rerere_dirs(void)
>  {
> -	int i;
> -	for (i = 0; i < rerere_dir_nr; i++) {
> -		free(rerere_dir[i]->status);
> -		free(rerere_dir[i]);
> +	struct hashmap_iter iter;
> +	struct strmap_entry *ent;
> +
> +	strmap_for_each_entry(&rerere_dirs, &iter, ent) {
> +		struct rerere_dir *rr_dir = ent->value;
> +		free(rr_dir->status);
> +		free(rr_dir);
>  	}
> -	FREE_AND_NULL(rerere_dir);
> -	rerere_dir_nr = rerere_dir_alloc = 0;
> +	strmap_clear(&rerere_dirs, 0);
>  }
>  
>  static void free_rerere_id(struct string_list_item *item)
> @@ -52,7 +54,7 @@ static void free_rerere_id(struct string_list_item *item)
>  
>  static const char *rerere_id_hex(const struct rerere_id *id)
>  {
> -	return hash_to_hex(id->collection->hash);
> +	return id->collection->name;
>  }
>  
>  static void fit_variant(struct rerere_dir *rr_dir, int variant)
> @@ -115,7 +117,7 @@ static int is_rr_file(const char *name, const char *filename, int *variant)
>  static void scan_rerere_dir(struct rerere_dir *rr_dir)
>  {
>  	struct dirent *de;
> -	DIR *dir = opendir(git_path("rr-cache/%s", hash_to_hex(rr_dir->hash)));
> +	DIR *dir = opendir(git_path("rr-cache/%s", rr_dir->name));
>  
>  	if (!dir)
>  		return;
> @@ -133,39 +135,21 @@ static void scan_rerere_dir(struct rerere_dir *rr_dir)
>  	closedir(dir);
>  }
>  
> -static const unsigned char *rerere_dir_hash(size_t i, void *table)
> -{
> -	struct rerere_dir **rr_dir = table;
> -	return rr_dir[i]->hash;
> -}
> -
>  static struct rerere_dir *find_rerere_dir(const char *hex)
>  {
> -	unsigned char hash[GIT_MAX_RAWSZ];
>  	struct rerere_dir *rr_dir;
> -	int pos;
> -
> -	if (get_sha1_hex(hex, hash))
> -		BUG("cannot parse rerere dir hex?");
> -	pos = hash_pos(hash, rerere_dir, rerere_dir_nr, rerere_dir_hash);
> -	if (pos < 0) {
> -		rr_dir = xmalloc(sizeof(*rr_dir));
> -		hashcpy(rr_dir->hash, hash);
> +
> +	rr_dir = strmap_get(&rerere_dirs, hex);
> +	if (!rr_dir) {
> +		FLEX_ALLOC_STR(rr_dir, name, hex);
>  		rr_dir->status = NULL;
>  		rr_dir->status_nr = 0;
>  		rr_dir->status_alloc = 0;
> -		pos = -1 - pos;
> -
> -		/* Make sure the array is big enough ... */
> -		ALLOC_GROW(rerere_dir, rerere_dir_nr + 1, rerere_dir_alloc);
> -		/* ... and add it in. */
> -		rerere_dir_nr++;
> -		MOVE_ARRAY(rerere_dir + pos + 1, rerere_dir + pos,
> -			   rerere_dir_nr - pos - 1);
> -		rerere_dir[pos] = rr_dir;
> +		strmap_put(&rerere_dirs, hex, rr_dir);
> +
>  		scan_rerere_dir(rr_dir);
>  	}
> -	return rerere_dir[pos];
> +	return rr_dir;
>  }
>  
>  static int has_rerere_resolution(const struct rerere_id *id)
> -- 
> 2.30.0.758.gfe500d6872
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 5/6] hash_pos(): convert to oid_pos()
  2021-01-28  6:12 [PATCH 0/6] convert hash_pos() to oid_pos() Jeff King
                   ` (3 preceding siblings ...)
  2021-01-28  6:19 ` [PATCH 4/6] rerere: use strmap to store rerere directories Jeff King
@ 2021-01-28  6:19 ` Jeff King
  2021-01-28 22:48   ` Taylor Blau
  2021-01-28  6:20 ` [PATCH 6/6] oid_pos(): access table through const pointers Jeff King
  5 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2021-01-28  6:19 UTC (permalink / raw)
  To: git

All of our callers are actually looking up an object_id, not a bare
hash. Likewise, the arrays they are looking in are actual arrays of
object_id (not just raw bytes of hashes, as we might find in a pack
.idx; those are handled by bsearch_hash()).

Using an object_id gives us more type safety, and makes the callers
slightly shorter. It also gets rid of the word "sha1" from several
access functions, though we could obviously also rename those with
s/sha1/hash/.

Signed-off-by: Jeff King <peff@peff.net>
---
If we don't want to make this change, I think it's still worth sweeping
through the callers and changing the names of their access functions.

 builtin/name-rev.c  |  8 ++++----
 commit-graph.c      | 28 ++++++++++++++--------------
 commit.c            | 10 +++++-----
 hash-lookup.c       | 18 +++++++++---------
 hash-lookup.h       | 10 +++++-----
 oid-array.c         |  6 +++---
 pack-bitmap-write.c |  6 +++---
 7 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 3fe71a8c01..27138fdce4 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -390,10 +390,10 @@ static void name_tips(void)
 	}
 }
 
-static const unsigned char *nth_tip_table_ent(size_t ix, void *table_)
+static const struct object_id *nth_tip_table_ent(size_t ix, void *table_)
 {
 	struct tip_table_entry *table = table_;
-	return table[ix].oid.hash;
+	return &table[ix].oid;
 }
 
 static const char *get_exact_ref_match(const struct object *o)
@@ -408,8 +408,8 @@ static const char *get_exact_ref_match(const struct object *o)
 		tip_table.sorted = 1;
 	}
 
-	found = hash_pos(o->oid.hash, tip_table.table, tip_table.nr,
-			 nth_tip_table_ent);
+	found = oid_pos(&o->oid, tip_table.table, tip_table.nr,
+			nth_tip_table_ent);
 	if (0 <= found)
 		return tip_table.table[found].refname;
 	return NULL;
diff --git a/commit-graph.c b/commit-graph.c
index f3486ec18f..248f1efb73 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1012,10 +1012,10 @@ static int write_graph_chunk_oids(struct hashfile *f,
 	return 0;
 }
 
-static const unsigned char *commit_to_sha1(size_t index, void *table)
+static const struct object_id *commit_to_oid(size_t index, void *table)
 {
 	struct commit **commits = table;
-	return commits[index]->object.oid.hash;
+	return &commits[index]->object.oid;
 }
 
 static int write_graph_chunk_data(struct hashfile *f,
@@ -1043,10 +1043,10 @@ static int write_graph_chunk_data(struct hashfile *f,
 		if (!parent)
 			edge_value = GRAPH_PARENT_NONE;
 		else {
-			edge_value = hash_pos(parent->item->object.oid.hash,
-					      ctx->commits.list,
-					      ctx->commits.nr,
-					      commit_to_sha1);
+			edge_value = oid_pos(&parent->item->object.oid,
+					     ctx->commits.list,
+					     ctx->commits.nr,
+					     commit_to_oid);
 
 			if (edge_value >= 0)
 				edge_value += ctx->new_num_commits_in_base;
@@ -1074,10 +1074,10 @@ static int write_graph_chunk_data(struct hashfile *f,
 		else if (parent->next)
 			edge_value = GRAPH_EXTRA_EDGES_NEEDED | num_extra_edges;
 		else {
-			edge_value = hash_pos(parent->item->object.oid.hash,
-					      ctx->commits.list,
-					      ctx->commits.nr,
-					      commit_to_sha1);
+			edge_value = oid_pos(&parent->item->object.oid,
+					     ctx->commits.list,
+					     ctx->commits.nr,
+					     commit_to_oid);
 
 			if (edge_value >= 0)
 				edge_value += ctx->new_num_commits_in_base;
@@ -1143,10 +1143,10 @@ static int write_graph_chunk_extra_edges(struct hashfile *f,
 
 		/* Since num_parents > 2, this initializer is safe. */
 		for (parent = (*list)->parents->next; parent; parent = parent->next) {
-			int edge_value = hash_pos(parent->item->object.oid.hash,
-						  ctx->commits.list,
-						  ctx->commits.nr,
-						  commit_to_sha1);
+			int edge_value = oid_pos(&parent->item->object.oid,
+						 ctx->commits.list,
+						 ctx->commits.nr,
+						 commit_to_oid);
 
 			if (edge_value >= 0)
 				edge_value += ctx->new_num_commits_in_base;
diff --git a/commit.c b/commit.c
index fa26729ba5..39eab5b36b 100644
--- a/commit.c
+++ b/commit.c
@@ -105,17 +105,17 @@ static timestamp_t parse_commit_date(const char *buf, const char *tail)
 	return parse_timestamp(dateptr, NULL, 10);
 }
 
-static const unsigned char *commit_graft_sha1_access(size_t index, void *table)
+static const struct object_id *commit_graft_oid_access(size_t index, void *table)
 {
 	struct commit_graft **commit_graft_table = table;
-	return commit_graft_table[index]->oid.hash;
+	return &commit_graft_table[index]->oid;
 }
 
 int commit_graft_pos(struct repository *r, const struct object_id *oid)
 {
-	return hash_pos(oid->hash, r->parsed_objects->grafts,
-			r->parsed_objects->grafts_nr,
-			commit_graft_sha1_access);
+	return oid_pos(oid, r->parsed_objects->grafts,
+		       r->parsed_objects->grafts_nr,
+		       commit_graft_oid_access);
 }
 
 int register_commit_graft(struct repository *r, struct commit_graft *graft,
diff --git a/hash-lookup.c b/hash-lookup.c
index 1191856a32..d15bb34574 100644
--- a/hash-lookup.c
+++ b/hash-lookup.c
@@ -1,9 +1,9 @@
 #include "cache.h"
 #include "hash-lookup.h"
 
-static uint32_t take2(const unsigned char *hash)
+static uint32_t take2(const struct object_id *oid, size_t ofs)
 {
-	return ((hash[0] << 8) | hash[1]);
+	return ((oid->hash[ofs] << 8) | oid->hash[ofs + 1]);
 }
 
 /*
@@ -47,11 +47,11 @@ static uint32_t take2(const unsigned char *hash)
  */
 /*
  * The table should contain "nr" elements.
- * The hash of element i (between 0 and nr - 1) should be returned
+ * The oid of element i (between 0 and nr - 1) should be returned
  * by "fn(i, table)".
  */
-int hash_pos(const unsigned char *hash, void *table, size_t nr,
-	     hash_access_fn fn)
+int oid_pos(const struct object_id *oid, void *table, size_t nr,
+	    oid_access_fn fn)
 {
 	size_t hi = nr;
 	size_t lo = 0;
@@ -64,9 +64,9 @@ int hash_pos(const unsigned char *hash, void *table, size_t nr,
 		size_t lov, hiv, miv, ofs;
 
 		for (ofs = 0; ofs < the_hash_algo->rawsz - 2; ofs += 2) {
-			lov = take2(fn(0, table) + ofs);
-			hiv = take2(fn(nr - 1, table) + ofs);
-			miv = take2(hash + ofs);
+			lov = take2(fn(0, table), ofs);
+			hiv = take2(fn(nr - 1, table), ofs);
+			miv = take2(oid, ofs);
 			if (miv < lov)
 				return -1;
 			if (hiv < miv)
@@ -88,7 +88,7 @@ int hash_pos(const unsigned char *hash, void *table, size_t nr,
 
 	do {
 		int cmp;
-		cmp = hashcmp(fn(mi, table), hash);
+		cmp = oidcmp(fn(mi, table), oid);
 		if (!cmp)
 			return mi;
 		if (cmp > 0)
diff --git a/hash-lookup.h b/hash-lookup.h
index 5d476dec72..7b3ecad1f0 100644
--- a/hash-lookup.h
+++ b/hash-lookup.h
@@ -1,12 +1,12 @@
 #ifndef HASH_LOOKUP_H
 #define HASH_LOOKUP_H
 
-typedef const unsigned char *hash_access_fn(size_t index, void *table);
+typedef const struct object_id *oid_access_fn(size_t index, void *table);
 
-int hash_pos(const unsigned char *hash,
-	     void *table,
-	     size_t nr,
-	     hash_access_fn fn);
+int oid_pos(const struct object_id *oid,
+	    void *table,
+	    size_t nr,
+	    oid_access_fn fn);
 
 /*
  * Searches for hash in table, using the given fanout table to determine the
diff --git a/oid-array.c b/oid-array.c
index 889b311f22..a19235afbf 100644
--- a/oid-array.c
+++ b/oid-array.c
@@ -22,16 +22,16 @@ void oid_array_sort(struct oid_array *array)
 	array->sorted = 1;
 }
 
-static const unsigned char *sha1_access(size_t index, void *table)
+static const struct object_id *oid_access(size_t index, void *table)
 {
 	struct object_id *array = table;
-	return array[index].hash;
+	return &array[index];
 }
 
 int oid_array_lookup(struct oid_array *array, const struct object_id *oid)
 {
 	oid_array_sort(array);
-	return hash_pos(oid->hash, array->oid, array->nr, sha1_access);
+	return oid_pos(oid, array->oid, array->nr, oid_access);
 }
 
 void oid_array_clear(struct oid_array *array)
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 92460a6126..f21259dfc8 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -610,10 +610,10 @@ static inline void dump_bitmap(struct hashfile *f, struct ewah_bitmap *bitmap)
 		die("Failed to write bitmap index");
 }
 
-static const unsigned char *sha1_access(size_t pos, void *table)
+static const struct object_id *oid_access(size_t pos, void *table)
 {
 	struct pack_idx_entry **index = table;
-	return index[pos]->oid.hash;
+	return &index[pos]->oid;
 }
 
 static void write_selected_commits_v1(struct hashfile *f,
@@ -626,7 +626,7 @@ static void write_selected_commits_v1(struct hashfile *f,
 		struct bitmapped_commit *stored = &writer.selected[i];
 
 		int commit_pos =
-			hash_pos(stored->commit->object.oid.hash, index, index_nr, sha1_access);
+			oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
 
 		if (commit_pos < 0)
 			BUG("trying to write commit not in index");
-- 
2.30.0.758.gfe500d6872


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/6] oid_pos(): access table through const pointers
  2021-01-28  6:12 [PATCH 0/6] convert hash_pos() to oid_pos() Jeff King
                   ` (4 preceding siblings ...)
  2021-01-28  6:19 ` [PATCH 5/6] hash_pos(): convert to oid_pos() Jeff King
@ 2021-01-28  6:20 ` Jeff King
  5 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2021-01-28  6:20 UTC (permalink / raw)
  To: git

When we are looking up an oid in an array, we obviously don't need to
write to the array. Let's mark it as const in the function interfaces,
as well as in the local variables we use to derference the void pointer
(note a few cases use pointers-to-pointers, so we mark everything
const).

Signed-off-by: Jeff King <peff@peff.net>
---
This is just a hygiene thing. I think it's pretty unlikely for one of
these 2-line access functions to somehow forget that the table is
supposed to be const.

 builtin/name-rev.c  | 4 ++--
 commit-graph.c      | 4 ++--
 commit.c            | 4 ++--
 hash-lookup.c       | 2 +-
 hash-lookup.h       | 4 ++--
 oid-array.c         | 4 ++--
 pack-bitmap-write.c | 4 ++--
 7 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 27138fdce4..b221d30014 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -390,9 +390,9 @@ static void name_tips(void)
 	}
 }
 
-static const struct object_id *nth_tip_table_ent(size_t ix, void *table_)
+static const struct object_id *nth_tip_table_ent(size_t ix, const void *table_)
 {
-	struct tip_table_entry *table = table_;
+	const struct tip_table_entry *table = table_;
 	return &table[ix].oid;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 248f1efb73..b09fce5f57 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1012,9 +1012,9 @@ static int write_graph_chunk_oids(struct hashfile *f,
 	return 0;
 }
 
-static const struct object_id *commit_to_oid(size_t index, void *table)
+static const struct object_id *commit_to_oid(size_t index, const void *table)
 {
-	struct commit **commits = table;
+	const struct commit * const *commits = table;
 	return &commits[index]->object.oid;
 }
 
diff --git a/commit.c b/commit.c
index 39eab5b36b..fd2831dad3 100644
--- a/commit.c
+++ b/commit.c
@@ -105,9 +105,9 @@ static timestamp_t parse_commit_date(const char *buf, const char *tail)
 	return parse_timestamp(dateptr, NULL, 10);
 }
 
-static const struct object_id *commit_graft_oid_access(size_t index, void *table)
+static const struct object_id *commit_graft_oid_access(size_t index, const void *table)
 {
-	struct commit_graft **commit_graft_table = table;
+	const struct commit_graft * const *commit_graft_table = table;
 	return &commit_graft_table[index]->oid;
 }
 
diff --git a/hash-lookup.c b/hash-lookup.c
index d15bb34574..b98ed5e11e 100644
--- a/hash-lookup.c
+++ b/hash-lookup.c
@@ -50,7 +50,7 @@ static uint32_t take2(const struct object_id *oid, size_t ofs)
  * The oid of element i (between 0 and nr - 1) should be returned
  * by "fn(i, table)".
  */
-int oid_pos(const struct object_id *oid, void *table, size_t nr,
+int oid_pos(const struct object_id *oid, const void *table, size_t nr,
 	    oid_access_fn fn)
 {
 	size_t hi = nr;
diff --git a/hash-lookup.h b/hash-lookup.h
index 7b3ecad1f0..dbd71ebaf7 100644
--- a/hash-lookup.h
+++ b/hash-lookup.h
@@ -1,10 +1,10 @@
 #ifndef HASH_LOOKUP_H
 #define HASH_LOOKUP_H
 
-typedef const struct object_id *oid_access_fn(size_t index, void *table);
+typedef const struct object_id *oid_access_fn(size_t index, const void *table);
 
 int oid_pos(const struct object_id *oid,
-	    void *table,
+	    const void *table,
 	    size_t nr,
 	    oid_access_fn fn);
 
diff --git a/oid-array.c b/oid-array.c
index a19235afbf..73ba76e9e9 100644
--- a/oid-array.c
+++ b/oid-array.c
@@ -22,9 +22,9 @@ void oid_array_sort(struct oid_array *array)
 	array->sorted = 1;
 }
 
-static const struct object_id *oid_access(size_t index, void *table)
+static const struct object_id *oid_access(size_t index, const void *table)
 {
-	struct object_id *array = table;
+	const struct object_id *array = table;
 	return &array[index];
 }
 
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index f21259dfc8..88d9e696a5 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -610,9 +610,9 @@ static inline void dump_bitmap(struct hashfile *f, struct ewah_bitmap *bitmap)
 		die("Failed to write bitmap index");
 }
 
-static const struct object_id *oid_access(size_t pos, void *table)
+static const struct object_id *oid_access(size_t pos, const void *table)
 {
-	struct pack_idx_entry **index = table;
+	const struct pack_idx_entry * const *index = table;
 	return &index[pos]->oid;
 }
 
-- 
2.30.0.758.gfe500d6872

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/6] rerere: use strmap to store rerere directories
  2021-01-28  6:19 ` [PATCH 4/6] rerere: use strmap to store rerere directories Jeff King
@ 2021-01-28  6:34   ` Jeff King
  2021-01-28 19:52     ` Junio C Hamano
  2021-01-28  6:38   ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff King @ 2021-01-28  6:34 UTC (permalink / raw)
  To: git

On Thu, Jan 28, 2021 at 01:19:12AM -0500, Jeff King wrote:

> On Thu, Dec 03, 2020 at 11:28:14AM -0500, Jeff King wrote:
> 
> > We store a struct for each directory we access under .git/rr-cache. The
> > structs are kept in an array sorted by the binary hash associated with
> > their name (and we do lookups with a binary search).

Hmph. Apparently I accidentally sent this as a reply instead of sending
the original format-patch output. Oops.

Here it is minus the quoting, which should make it easier to apply. :)

-- >8 --
Subject: [PATCH] rerere: use strmap to store rerere directories

We store a struct for each directory we access under .git/rr-cache. The
structs are kept in an array sorted by the binary hash associated with
their name (and we do lookups with a binary search).

This works OK, but there are a few small downsides:

 - the amount of code isn't huge, but it's more than we'd need using one
   of our other stock data structures

 - the insertion into a sorted array is quadratic (though in practice
   it's unlikely anybody has enough conflicts for this to matter)

 - it's intimately tied to the representation of an object hash. This
   isn't a big deal, as the conflict ids we generate use the same hash,
   but it produces a few awkward bits (e.g., we are the only user of
   hash_pos() that is not using object_id).

Let's instead just treat the directory names as strings, and store them
in a strmap. This is less code, and removes the use of hash_pos().

Insertion is now non-quadratic, though we probably use a bit more
memory. Besides the hash table overhead, and storing hex bytes instead
of a binary hash, we actually store each name twice. Other code expects
to access the name of a rerere_dir struct from the struct itself, so we
need a copy there. But strmap keeps its own copy of the name, as well.

Using a bare hashmap instead of strmap means we could use the name for
both, but at the cost of extra code (e.g., our own comparison function).
Likewise, strmap has a feature to use a pointer to the in-struct name at
the cost of a little extra code. I didn't do either here, as simple code
seemed more important than squeezing out a few bytes of efficiency.

Signed-off-by: Jeff King <peff@peff.net>
---
This one might be controversial, or at least considered unnecessary
churn. Because the benefits I listed above are pretty negligible, and
really my ulterior motive is getting rid of the call to hash_pos().

Still, it was nice to try out strmap a bit more, and the resulting code
is shorter. It also abstracts things more, if we were to do something
more exotic with hashes (right now I think a sha256 repo just has sha256
conflict hashes, and everything would just work. I don't have any plans
to change that, but this would be a better base if somebody wanted to do
so later).

 rerere.c | 62 +++++++++++++++++++++-----------------------------------
 1 file changed, 23 insertions(+), 39 deletions(-)

diff --git a/rerere.c b/rerere.c
index e92e305f96..dee60dc6df 100644
--- a/rerere.c
+++ b/rerere.c
@@ -11,6 +11,7 @@
 #include "pathspec.h"
 #include "object-store.h"
 #include "hash-lookup.h"
+#include "strmap.h"
 
 #define RESOLVED 0
 #define PUNTED 1
@@ -23,26 +24,27 @@ static int rerere_enabled = -1;
 /* automatically update cleanly resolved paths to the index */
 static int rerere_autoupdate;
 
-static int rerere_dir_nr;
-static int rerere_dir_alloc;
-
 #define RR_HAS_POSTIMAGE 1
 #define RR_HAS_PREIMAGE 2
-static struct rerere_dir {
-	unsigned char hash[GIT_MAX_HEXSZ];
+struct rerere_dir {
 	int status_alloc, status_nr;
 	unsigned char *status;
-} **rerere_dir;
+	char name[FLEX_ARRAY];
+};
+
+static struct strmap rerere_dirs = STRMAP_INIT;
 
 static void free_rerere_dirs(void)
 {
-	int i;
-	for (i = 0; i < rerere_dir_nr; i++) {
-		free(rerere_dir[i]->status);
-		free(rerere_dir[i]);
+	struct hashmap_iter iter;
+	struct strmap_entry *ent;
+
+	strmap_for_each_entry(&rerere_dirs, &iter, ent) {
+		struct rerere_dir *rr_dir = ent->value;
+		free(rr_dir->status);
+		free(rr_dir);
 	}
-	FREE_AND_NULL(rerere_dir);
-	rerere_dir_nr = rerere_dir_alloc = 0;
+	strmap_clear(&rerere_dirs, 0);
 }
 
 static void free_rerere_id(struct string_list_item *item)
@@ -52,7 +54,7 @@ static void free_rerere_id(struct string_list_item *item)
 
 static const char *rerere_id_hex(const struct rerere_id *id)
 {
-	return hash_to_hex(id->collection->hash);
+	return id->collection->name;
 }
 
 static void fit_variant(struct rerere_dir *rr_dir, int variant)
@@ -115,7 +117,7 @@ static int is_rr_file(const char *name, const char *filename, int *variant)
 static void scan_rerere_dir(struct rerere_dir *rr_dir)
 {
 	struct dirent *de;
-	DIR *dir = opendir(git_path("rr-cache/%s", hash_to_hex(rr_dir->hash)));
+	DIR *dir = opendir(git_path("rr-cache/%s", rr_dir->name));
 
 	if (!dir)
 		return;
@@ -133,39 +135,21 @@ static void scan_rerere_dir(struct rerere_dir *rr_dir)
 	closedir(dir);
 }
 
-static const unsigned char *rerere_dir_hash(size_t i, void *table)
-{
-	struct rerere_dir **rr_dir = table;
-	return rr_dir[i]->hash;
-}
-
 static struct rerere_dir *find_rerere_dir(const char *hex)
 {
-	unsigned char hash[GIT_MAX_RAWSZ];
 	struct rerere_dir *rr_dir;
-	int pos;
-
-	if (get_sha1_hex(hex, hash))
-		BUG("cannot parse rerere dir hex?");
-	pos = hash_pos(hash, rerere_dir, rerere_dir_nr, rerere_dir_hash);
-	if (pos < 0) {
-		rr_dir = xmalloc(sizeof(*rr_dir));
-		hashcpy(rr_dir->hash, hash);
+
+	rr_dir = strmap_get(&rerere_dirs, hex);
+	if (!rr_dir) {
+		FLEX_ALLOC_STR(rr_dir, name, hex);
 		rr_dir->status = NULL;
 		rr_dir->status_nr = 0;
 		rr_dir->status_alloc = 0;
-		pos = -1 - pos;
-
-		/* Make sure the array is big enough ... */
-		ALLOC_GROW(rerere_dir, rerere_dir_nr + 1, rerere_dir_alloc);
-		/* ... and add it in. */
-		rerere_dir_nr++;
-		MOVE_ARRAY(rerere_dir + pos + 1, rerere_dir + pos,
-			   rerere_dir_nr - pos - 1);
-		rerere_dir[pos] = rr_dir;
+		strmap_put(&rerere_dirs, hex, rr_dir);
+
 		scan_rerere_dir(rr_dir);
 	}
-	return rerere_dir[pos];
+	return rr_dir;
 }
 
 static int has_rerere_resolution(const struct rerere_id *id)
-- 
2.30.0.758.gfe500d6872


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/6] rerere: use strmap to store rerere directories
  2021-01-28  6:19 ` [PATCH 4/6] rerere: use strmap to store rerere directories Jeff King
  2021-01-28  6:34   ` Jeff King
@ 2021-01-28  6:38   ` Junio C Hamano
  2021-01-28  6:51     ` Jeff King
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2021-01-28  6:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Thu, Dec 03, 2020 at 11:28:14AM -0500, Jeff King wrote:

Curiously, the reference of the message points at the cover letter
of this series, and there is no mention of a message from early last
December.

I think this is good.  These "conflict IDs" are not even object
names, so it would be strange to use "struct object_id" for them
(for that matter, packfile ids are not object names, either, but I
suspect we might use object_id there).

If there were a patch that can readily applied ;-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/6] rerere: use strmap to store rerere directories
  2021-01-28  6:38   ` Junio C Hamano
@ 2021-01-28  6:51     ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2021-01-28  6:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Jan 27, 2021 at 10:38:31PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Thu, Dec 03, 2020 at 11:28:14AM -0500, Jeff King wrote:
> 
> Curiously, the reference of the message points at the cover letter
> of this series, and there is no mention of a message from early last
> December.

I think that is the author-date generated by format-patch, since that's
when I originally wrote it. I generally format the messages into an
mbox, load that in mutt, and then use mutt's "bounce" feature to send
them, which rewrites the date header. In this case, though, I somehow
accidentally hit "reply" (more astounding is that I didn't notice it as
I was sending; I think I might be getting senile).

> I think this is good.  These "conflict IDs" are not even object
> names, so it would be strange to use "struct object_id" for them
> (for that matter, packfile ids are not object names, either, but I
> suspect we might use object_id there).

I suspected that, too, but when I checked I couldn't find any instance
where we do. I might not have looked hard enough, or perhaps whoever
converted the pack code had better taste than me (see the similar use of
object_id in the previous patch).

> If there were a patch that can readily applied ;-)

Yep, see my reply. :)

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/6] commit_graft_pos(): take an oid instead of a bare hash
  2021-01-28  6:12 ` [PATCH 1/6] commit_graft_pos(): take an oid instead of a bare hash Jeff King
@ 2021-01-28 12:57   ` Derrick Stolee
  0 siblings, 0 replies; 15+ messages in thread
From: Derrick Stolee @ 2021-01-28 12:57 UTC (permalink / raw)
  To: Jeff King, git

On 1/28/2021 1:12 AM, Jeff King wrote:
> All of our callers have an object_id, and are just dereferencing the
> hash field to pass to us. Let's take the actual object_id instead. We
> still access the hash to pass to hash_pos, but it's a step in the right
> direction.
> 
> This makes the callers slightly simpler, but also gets rid of the
> untyped pointer, as well as the now-inaccurate name "sha1".
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I think this one is an obvious cleanup that we'd want whether we go
> further in the series or not.

I agree. Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/6] rerere: use strmap to store rerere directories
  2021-01-28  6:34   ` Jeff King
@ 2021-01-28 19:52     ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2021-01-28 19:52 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Subject: [PATCH] rerere: use strmap to store rerere directories
>
> We store a struct for each directory we access under .git/rr-cache. The
> structs are kept in an array sorted by the binary hash associated with
> their name (and we do lookups with a binary search).
>
> This works OK, but there are a few small downsides:
>
>  - the amount of code isn't huge, but it's more than we'd need using one
>    of our other stock data structures
>
>  - the insertion into a sorted array is quadratic (though in practice
>    it's unlikely anybody has enough conflicts for this to matter)
>
>  - it's intimately tied to the representation of an object hash. This
>    isn't a big deal, as the conflict ids we generate use the same hash,
>    but it produces a few awkward bits (e.g., we are the only user of
>    hash_pos() that is not using object_id).
>
> Let's instead just treat the directory names as strings, and store them
> in a strmap. This is less code, and removes the use of hash_pos().

Nice.  I didn't realize that this part is so well isolated that
another table implementation can easily be dropped in (it was a long
time since I stared at this part of the system, back when I was
extending it to handle <preimage,postimage> pairs that share the
same conflict IDs, which was quite tricky to get right).

> This one might be controversial, or at least considered unnecessary
> churn. Because the benefits I listed above are pretty negligible, and
> really my ulterior motive is getting rid of the call to hash_pos().

No objection from me.  Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] rerere: tighten rr-cache dirname check
  2021-01-28  6:16 ` [PATCH 3/6] rerere: tighten rr-cache dirname check Jeff King
@ 2021-01-28 22:35   ` Taylor Blau
  0 siblings, 0 replies; 15+ messages in thread
From: Taylor Blau @ 2021-01-28 22:35 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Thu, Jan 28, 2021 at 01:16:50AM -0500, Jeff King wrote:
> Note that "get_sha1_hex()" is a confusing historical name. It is
> actually using the_hash_algo, so it would be sha256 in a sha256 repo.
> We'll switch to using parse_oid_hex(), because that conveniently
> advances our pointer. But it also gets rid of the sha1 name. Arguably
> it's a little funny to use "object_id" here for something that isn't
> actually naming an object, but it's unlikely to be a problem (and is
> contained in a single function).

Not that it matters, but I think the argument would be that having "oid"
in the function call probably isn't ideal, but it's closer to the ideal
than having "sha1" in the same call (which isn't always the case).

Likewise about advancing the pointer, since everything is pass-by-value,
modifying the stack local copy of path doesn't matter to callers of
is_rr_cache_dirname().

So I think that this change is an improvement over the status-quo here.
Thanks,
Taylor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/6] hash_pos(): convert to oid_pos()
  2021-01-28  6:19 ` [PATCH 5/6] hash_pos(): convert to oid_pos() Jeff King
@ 2021-01-28 22:48   ` Taylor Blau
  2021-01-29  1:18     ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Taylor Blau @ 2021-01-28 22:48 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Thu, Jan 28, 2021 at 01:19:42AM -0500, Jeff King wrote:
> All of our callers are actually looking up an object_id, not a bare
> hash. Likewise, the arrays they are looking in are actual arrays of
> object_id (not just raw bytes of hashes, as we might find in a pack
> .idx; those are handled by bsearch_hash()).
>
> Using an object_id gives us more type safety, and makes the callers
> slightly shorter. It also gets rid of the word "sha1" from several
> access functions, though we could obviously also rename those with
> s/sha1/hash/.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> If we don't want to make this change, I think it's still worth sweeping
> through the callers and changing the names of their access functions.
>
>  builtin/name-rev.c  |  8 ++++----
>  commit-graph.c      | 28 ++++++++++++++--------------
>  commit.c            | 10 +++++-----
>  hash-lookup.c       | 18 +++++++++---------
>  hash-lookup.h       | 10 +++++-----

I wondered briefly if we should rename this to oid-lookup.{c,h}, but I
think the answer is "no", since we still have bsearch_hash() in this
header.

Probably a single header with "hash" in the name is better than two
headers each containing a single function (and one containing an
additional typedef).

So, I think that what you did here is good.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/6] hash_pos(): convert to oid_pos()
  2021-01-28 22:48   ` Taylor Blau
@ 2021-01-29  1:18     ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2021-01-29  1:18 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git

On Thu, Jan 28, 2021 at 05:48:22PM -0500, Taylor Blau wrote:

> >  builtin/name-rev.c  |  8 ++++----
> >  commit-graph.c      | 28 ++++++++++++++--------------
> >  commit.c            | 10 +++++-----
> >  hash-lookup.c       | 18 +++++++++---------
> >  hash-lookup.h       | 10 +++++-----
> 
> I wondered briefly if we should rename this to oid-lookup.{c,h}, but I
> think the answer is "no", since we still have bsearch_hash() in this
> header.
> 
> Probably a single header with "hash" in the name is better than two
> headers each containing a single function (and one containing an
> additional typedef).
> 
> So, I think that what you did here is good.

Yeah. I'd be more inclined to do so if renames weren't still a minor
pain when following history, etc. Getting rid of "sha1" crosses my bar
of worth doing. The distinction between "hash" and "oid" here doesn't.
(plus as you note, we'd really have to split it, which is even more
annoying).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-01-29  1:23 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-28  6:12 [PATCH 0/6] convert hash_pos() to oid_pos() Jeff King
2021-01-28  6:12 ` [PATCH 1/6] commit_graft_pos(): take an oid instead of a bare hash Jeff King
2021-01-28 12:57   ` Derrick Stolee
2021-01-28  6:14 ` [PATCH 2/6] rerere: check dirname format while iterating rr_cache directory Jeff King
2021-01-28  6:16 ` [PATCH 3/6] rerere: tighten rr-cache dirname check Jeff King
2021-01-28 22:35   ` Taylor Blau
2021-01-28  6:19 ` [PATCH 4/6] rerere: use strmap to store rerere directories Jeff King
2021-01-28  6:34   ` Jeff King
2021-01-28 19:52     ` Junio C Hamano
2021-01-28  6:38   ` Junio C Hamano
2021-01-28  6:51     ` Jeff King
2021-01-28  6:19 ` [PATCH 5/6] hash_pos(): convert to oid_pos() Jeff King
2021-01-28 22:48   ` Taylor Blau
2021-01-29  1:18     ` Jeff King
2021-01-28  6:20 ` [PATCH 6/6] oid_pos(): access table through const pointers Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).