git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain
@ 2020-01-30 20:32 Matheus Tavares
  2020-01-30 20:32 ` [PATCH 1/7] diff: make diff_populate_filespec() honor its repo argument Matheus Tavares
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Matheus Tavares @ 2020-01-30 20:32 UTC (permalink / raw)
  To: git

The motivation for this patchset is another series I'm working on, to
make git-grep stop adding submodules' odbs to the alternates list. With
that, parse_object() will have to be called with the subdmodules' struct
repository. But it seemed that this function doesn't pass on the
received repo to some inner calls, which, instead, always use
the_repository. This series seeks to fix these inconsistencies.

Note: I also tried to replace some uses of the_hash_algo with the struct
git_hash_algo from the received repository, for consitency. (In
practice, I'm not sure if this is very useful right now, but maybe it
will be relevant for the change to SHA256?) Still, many functions in
parse_object()'s call chain call oid_to_hex(), which uses the_hash_algo.
Since changing this would require a much bigger operation, I decided to
leave it as is, for now.

Note II: Despite receiving a repo through the apply_state struct,
apply.c:apply_binary() call functions which uses the_repository
internally. Because of that, I used the_hash_algo in this function in
patch 6. Should I change it to use apply_state->repo->hash_algo
anyway?

travis build: https://travis-ci.org/matheustavares/git/builds/644022000

Matheus Tavares (7):
  diff: make diff_populate_filespec() honor its repo argument
  cache-tree: use given repo's hash_algo at verify_one()
  pack-check: use given repo's hash_algo at verify_packfile()
  streaming: allow open_istream() to handle any repo
  sha1-file: pass git_hash_algo to write_object_file_prepare()
  sha1-file: pass git_hash_algo to hash_object_file()
  sha1-file: allow check_object_signature() to handle any repo

 apply.c                  |  6 +++--
 archive-tar.c            |  6 ++---
 archive-zip.c            |  3 ++-
 builtin/fast-export.c    |  3 ++-
 builtin/index-pack.c     | 10 +++++---
 builtin/mktag.c          |  7 +++--
 builtin/pack-objects.c   |  3 ++-
 builtin/replace.c        |  3 ++-
 builtin/unpack-objects.c |  3 ++-
 cache-tree.c             | 11 +++++---
 cache.h                  |  3 ++-
 convert.c                |  2 +-
 diff.c                   |  2 +-
 diffcore-rename.c        |  4 +--
 dir.c                    |  4 +--
 log-tree.c               |  3 ++-
 object-store.h           |  5 ++--
 object.c                 |  5 ++--
 pack-check.c             | 12 ++++-----
 sha1-file.c              | 55 ++++++++++++++++++++++------------------
 streaming.c              | 28 ++++++++++----------
 streaming.h              |  4 ++-
 22 files changed, 106 insertions(+), 76 deletions(-)

-- 
2.25.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/7] diff: make diff_populate_filespec() honor its repo argument
  2020-01-30 20:32 [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Matheus Tavares
@ 2020-01-30 20:32 ` Matheus Tavares
  2020-01-30 20:32 ` [PATCH 2/7] cache-tree: use given repo's hash_algo at verify_one() Matheus Tavares
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Matheus Tavares @ 2020-01-30 20:32 UTC (permalink / raw)
  To: git; +Cc: brian m. carlson, Junio C Hamano

diff_populate_filespec() takes a struct repository argument but it
doesn't get passed down to read_object_file().

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 8e2914c031..0f333c27d6 100644
--- a/diff.c
+++ b/diff.c
@@ -4024,7 +4024,7 @@ int diff_populate_filespec(struct repository *r,
 				return 0;
 			}
 		}
-		s->data = read_object_file(&s->oid, &type, &s->size);
+		s->data = repo_read_object_file(r, &s->oid, &type, &s->size);
 		if (!s->data)
 			die("unable to read %s", oid_to_hex(&s->oid));
 		s->should_free = 1;
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/7] cache-tree: use given repo's hash_algo at verify_one()
  2020-01-30 20:32 [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Matheus Tavares
  2020-01-30 20:32 ` [PATCH 1/7] diff: make diff_populate_filespec() honor its repo argument Matheus Tavares
@ 2020-01-30 20:32 ` Matheus Tavares
  2020-01-30 20:32 ` [PATCH 3/7] pack-check: use given repo's hash_algo at verify_packfile() Matheus Tavares
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Matheus Tavares @ 2020-01-30 20:32 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Jeff King, Junio C Hamano

verify_one() takes a struct repository argument but uses the_hash_algo
internally. Replace it with the provided repo's git_hash_algo, for
consistency. For now, this is mainly a cosmetic change, as all callers
of this function currently only pass the_repository to it.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 cache-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 1bd1b23d38..8c51a15751 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -826,7 +826,7 @@ static void verify_one(struct repository *r,
 			i++;
 		}
 		strbuf_addf(&tree_buf, "%o %.*s%c", mode, entlen, name, '\0');
-		strbuf_add(&tree_buf, oid->hash, the_hash_algo->rawsz);
+		strbuf_add(&tree_buf, oid->hash, r->hash_algo->rawsz);
 	}
 	hash_object_file(tree_buf.buf, tree_buf.len, tree_type, &new_oid);
 	if (!oideq(&new_oid, &it->oid))
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/7] pack-check: use given repo's hash_algo at verify_packfile()
  2020-01-30 20:32 [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Matheus Tavares
  2020-01-30 20:32 ` [PATCH 1/7] diff: make diff_populate_filespec() honor its repo argument Matheus Tavares
  2020-01-30 20:32 ` [PATCH 2/7] cache-tree: use given repo's hash_algo at verify_one() Matheus Tavares
@ 2020-01-30 20:32 ` Matheus Tavares
  2020-01-30 20:32 ` [PATCH 4/7] streaming: allow open_istream() to handle any repo Matheus Tavares
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Matheus Tavares @ 2020-01-30 20:32 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Jeff King

At verify_packfile(), use the git_hash_algo from the provided repository
instead of the_hash_algo, for consistency. Like the previous patch, this
shouldn't bring any behavior changes, since this function is currently
only receiving the_repository.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 pack-check.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/pack-check.c b/pack-check.c
index 2cc3603189..0fb3b365c7 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -67,23 +67,23 @@ static int verify_packfile(struct repository *r,
 	if (!is_pack_valid(p))
 		return error("packfile %s cannot be accessed", p->pack_name);
 
-	the_hash_algo->init_fn(&ctx);
+	r->hash_algo->init_fn(&ctx);
 	do {
 		unsigned long remaining;
 		unsigned char *in = use_pack(p, w_curs, offset, &remaining);
 		offset += remaining;
 		if (!pack_sig_ofs)
-			pack_sig_ofs = p->pack_size - the_hash_algo->rawsz;
+			pack_sig_ofs = p->pack_size - r->hash_algo->rawsz;
 		if (offset > pack_sig_ofs)
 			remaining -= (unsigned int)(offset - pack_sig_ofs);
-		the_hash_algo->update_fn(&ctx, in, remaining);
+		r->hash_algo->update_fn(&ctx, in, remaining);
 	} while (offset < pack_sig_ofs);
-	the_hash_algo->final_fn(hash, &ctx);
+	r->hash_algo->final_fn(hash, &ctx);
 	pack_sig = use_pack(p, w_curs, pack_sig_ofs, NULL);
 	if (!hasheq(hash, pack_sig))
 		err = error("%s pack checksum mismatch",
 			    p->pack_name);
-	if (!hasheq(index_base + index_size - the_hash_algo->hexsz, pack_sig))
+	if (!hasheq(index_base + index_size - r->hash_algo->hexsz, pack_sig))
 		err = error("%s pack checksum does not match its index",
 			    p->pack_name);
 	unuse_pack(w_curs);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/7] streaming: allow open_istream() to handle any repo
  2020-01-30 20:32 [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Matheus Tavares
                   ` (2 preceding siblings ...)
  2020-01-30 20:32 ` [PATCH 3/7] pack-check: use given repo's hash_algo at verify_packfile() Matheus Tavares
@ 2020-01-30 20:32 ` Matheus Tavares
  2020-01-31 18:55   ` Junio C Hamano
  2020-01-30 20:32 ` [PATCH 5/7] sha1-file: pass git_hash_algo to write_object_file_prepare() Matheus Tavares
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Matheus Tavares @ 2020-01-30 20:32 UTC (permalink / raw)
  To: git
  Cc: Stefan Beller, Nguyễn Thái Ngọc Duy,
	brian m. carlson, Jeff King, Junio C Hamano

Some callers of open_istream() at archive-tar.c and archive-zip.c are
capable of working on arbitrary repositories but the repo struct is not
passed down to open_istream(), which uses the_repository internally. For
now, that's not a problem since the said callers are only being called
with the_repository. But to be consistent and avoid future problems,
let's allow open_istream() to receive a struct repository and use that
instead of the_repository. This parameter addition will also be used in
a future patch to make sha1-file.c:check_object_signature() be able to
work on arbitrary repos.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 archive-tar.c          |  6 +++---
 archive-zip.c          |  3 ++-
 builtin/index-pack.c   |  3 ++-
 builtin/pack-objects.c |  3 ++-
 sha1-file.c            |  2 +-
 streaming.c            | 28 +++++++++++++++-------------
 streaming.h            |  4 +++-
 7 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index e16d3f756d..5a77701a15 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -112,7 +112,7 @@ static void write_trailer(void)
  * queues up writes, so that all our write(2) calls write exactly one
  * full block; pads writes to RECORDSIZE
  */
-static int stream_blocked(const struct object_id *oid)
+static int stream_blocked(struct repository *r, const struct object_id *oid)
 {
 	struct git_istream *st;
 	enum object_type type;
@@ -120,7 +120,7 @@ static int stream_blocked(const struct object_id *oid)
 	char buf[BLOCKSIZE];
 	ssize_t readlen;
 
-	st = open_istream(oid, &type, &sz, NULL);
+	st = open_istream(r, oid, &type, &sz, NULL);
 	if (!st)
 		return error(_("cannot stream blob %s"), oid_to_hex(oid));
 	for (;;) {
@@ -324,7 +324,7 @@ static int write_tar_entry(struct archiver_args *args,
 		if (buffer)
 			write_blocked(buffer, size);
 		else
-			err = stream_blocked(oid);
+			err = stream_blocked(args->repo, oid);
 	}
 	free(buffer);
 	return err;
diff --git a/archive-zip.c b/archive-zip.c
index 11f5b1974b..e9f426298b 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -345,7 +345,8 @@ static int write_zip_entry(struct archiver_args *args,
 
 		if (S_ISREG(mode) && type == OBJ_BLOB && !args->convert &&
 		    size > big_file_threshold) {
-			stream = open_istream(oid, &type, &size, NULL);
+			stream = open_istream(args->repo, oid, &type, &size,
+					      NULL);
 			if (!stream)
 				return error(_("cannot stream blob %s"),
 					     oid_to_hex(oid));
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 60a5591039..7a08da8401 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -757,7 +757,8 @@ static int check_collison(struct object_entry *entry)
 
 	memset(&data, 0, sizeof(data));
 	data.entry = entry;
-	data.st = open_istream(&entry->idx.oid, &type, &size, NULL);
+	data.st = open_istream(the_repository, &entry->idx.oid, &type, &size,
+			       NULL);
 	if (!data.st)
 		return -1;
 	if (size != entry->size || type != entry->type)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 393c20a2d7..ef17efd94e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -303,7 +303,8 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
 	if (!usable_delta) {
 		if (oe_type(entry) == OBJ_BLOB &&
 		    oe_size_greater_than(&to_pack, entry, big_file_threshold) &&
-		    (st = open_istream(&entry->idx.oid, &type, &size, NULL)) != NULL)
+		    (st = open_istream(the_repository, &entry->idx.oid, &type,
+				       &size, NULL)) != NULL)
 			buf = NULL;
 		else {
 			buf = read_object_file(&entry->idx.oid, &type, &size);
diff --git a/sha1-file.c b/sha1-file.c
index 03ae9ae93a..13b90b1cb1 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -986,7 +986,7 @@ int check_object_signature(const struct object_id *oid, void *map,
 		return !oideq(oid, &real_oid) ? -1 : 0;
 	}
 
-	st = open_istream(oid, &obj_type, &size, NULL);
+	st = open_istream(the_repository, oid, &obj_type, &size, NULL);
 	if (!st)
 		return -1;
 
diff --git a/streaming.c b/streaming.c
index fcd6303219..800f07a52c 100644
--- a/streaming.c
+++ b/streaming.c
@@ -16,6 +16,7 @@ enum input_source {
 };
 
 typedef int (*open_istream_fn)(struct git_istream *,
+			       struct repository *,
 			       struct object_info *,
 			       const struct object_id *,
 			       enum object_type *);
@@ -29,8 +30,8 @@ struct stream_vtbl {
 
 #define open_method_decl(name) \
 	int open_istream_ ##name \
-	(struct git_istream *st, struct object_info *oi, \
-	 const struct object_id *oid, \
+	(struct git_istream *st, struct repository *r, \
+	 struct object_info *oi, const struct object_id *oid, \
 	 enum object_type *type)
 
 #define close_method_decl(name) \
@@ -108,7 +109,8 @@ ssize_t read_istream(struct git_istream *st, void *buf, size_t sz)
 	return st->vtbl->read(st, buf, sz);
 }
 
-static enum input_source istream_source(const struct object_id *oid,
+static enum input_source istream_source(struct repository *r,
+					const struct object_id *oid,
 					enum object_type *type,
 					struct object_info *oi)
 {
@@ -117,7 +119,7 @@ static enum input_source istream_source(const struct object_id *oid,
 
 	oi->typep = type;
 	oi->sizep = &size;
-	status = oid_object_info_extended(the_repository, oid, oi, 0);
+	status = oid_object_info_extended(r, oid, oi, 0);
 	if (status < 0)
 		return stream_error;
 
@@ -133,22 +135,23 @@ static enum input_source istream_source(const struct object_id *oid,
 	}
 }
 
-struct git_istream *open_istream(const struct object_id *oid,
+struct git_istream *open_istream(struct repository *r,
+				 const struct object_id *oid,
 				 enum object_type *type,
 				 unsigned long *size,
 				 struct stream_filter *filter)
 {
 	struct git_istream *st;
 	struct object_info oi = OBJECT_INFO_INIT;
-	const struct object_id *real = lookup_replace_object(the_repository, oid);
-	enum input_source src = istream_source(real, type, &oi);
+	const struct object_id *real = lookup_replace_object(r, oid);
+	enum input_source src = istream_source(r, real, type, &oi);
 
 	if (src < 0)
 		return NULL;
 
 	st = xmalloc(sizeof(*st));
-	if (open_istream_tbl[src](st, &oi, real, type)) {
-		if (open_istream_incore(st, &oi, real, type)) {
+	if (open_istream_tbl[src](st, r, &oi, real, type)) {
+		if (open_istream_incore(st, r, &oi, real, type)) {
 			free(st);
 			return NULL;
 		}
@@ -338,8 +341,7 @@ static struct stream_vtbl loose_vtbl = {
 
 static open_method_decl(loose)
 {
-	st->u.loose.mapped = map_loose_object(the_repository,
-					      oid, &st->u.loose.mapsize);
+	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
 	if ((unpack_loose_header(&st->z,
@@ -499,7 +501,7 @@ static struct stream_vtbl incore_vtbl = {
 
 static open_method_decl(incore)
 {
-	st->u.incore.buf = read_object_file_extended(the_repository, oid, type, &st->size, 0);
+	st->u.incore.buf = read_object_file_extended(r, oid, type, &st->size, 0);
 	st->u.incore.read_ptr = 0;
 	st->vtbl = &incore_vtbl;
 
@@ -520,7 +522,7 @@ int stream_blob_to_fd(int fd, const struct object_id *oid, struct stream_filter
 	ssize_t kept = 0;
 	int result = -1;
 
-	st = open_istream(oid, &type, &sz, filter);
+	st = open_istream(the_repository, oid, &type, &sz, filter);
 	if (!st) {
 		if (filter)
 			free_stream_filter(filter);
diff --git a/streaming.h b/streaming.h
index f465a3cd31..5e4e6acfd0 100644
--- a/streaming.h
+++ b/streaming.h
@@ -8,7 +8,9 @@
 /* opaque */
 struct git_istream;
 
-struct git_istream *open_istream(const struct object_id *, enum object_type *, unsigned long *, struct stream_filter *);
+struct git_istream *open_istream(struct repository *, const struct object_id *,
+				 enum object_type *, unsigned long *,
+				 struct stream_filter *);
 int close_istream(struct git_istream *);
 ssize_t read_istream(struct git_istream *, void *, size_t);
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/7] sha1-file: pass git_hash_algo to write_object_file_prepare()
  2020-01-30 20:32 [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Matheus Tavares
                   ` (3 preceding siblings ...)
  2020-01-30 20:32 ` [PATCH 4/7] streaming: allow open_istream() to handle any repo Matheus Tavares
@ 2020-01-30 20:32 ` Matheus Tavares
  2020-02-01 10:03   ` Torsten Bögershausen
  2020-01-30 20:32 ` [PATCH 6/7] sha1-file: pass git_hash_algo to hash_object_file() Matheus Tavares
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Matheus Tavares @ 2020-01-30 20:32 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Junio C Hamano, Patryk Obara, brian m. carlson,
	Eric Sunshine, Torsten Bögershausen

Allow write_object_file_prepare() to receive arbitrary 'struct
git_hash_algo's instead of always using the_hash_algo. The added
parameter will be used in the next commit to make hash_object_file() be
able to work with arbitrary git_hash_algo's, as well.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 sha1-file.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/sha1-file.c b/sha1-file.c
index 13b90b1cb1..e789bfd153 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1647,7 +1647,8 @@ void *read_object_with_reference(struct repository *r,
 	}
 }
 
-static void write_object_file_prepare(const void *buf, unsigned long len,
+static void write_object_file_prepare(const struct git_hash_algo *algo,
+				      const void *buf, unsigned long len,
 				      const char *type, struct object_id *oid,
 				      char *hdr, int *hdrlen)
 {
@@ -1657,10 +1658,10 @@ static void write_object_file_prepare(const void *buf, unsigned long len,
 	*hdrlen = xsnprintf(hdr, *hdrlen, "%s %"PRIuMAX , type, (uintmax_t)len)+1;
 
 	/* Sha1.. */
-	the_hash_algo->init_fn(&c);
-	the_hash_algo->update_fn(&c, hdr, *hdrlen);
-	the_hash_algo->update_fn(&c, buf, len);
-	the_hash_algo->final_fn(oid->hash, &c);
+	algo->init_fn(&c);
+	algo->update_fn(&c, hdr, *hdrlen);
+	algo->update_fn(&c, buf, len);
+	algo->final_fn(oid->hash, &c);
 }
 
 /*
@@ -1718,7 +1719,8 @@ int hash_object_file(const void *buf, unsigned long len, const char *type,
 {
 	char hdr[MAX_HEADER_LEN];
 	int hdrlen = sizeof(hdr);
-	write_object_file_prepare(buf, len, type, oid, hdr, &hdrlen);
+	write_object_file_prepare(the_hash_algo, buf, len, type, oid, hdr,
+				  &hdrlen);
 	return 0;
 }
 
@@ -1876,7 +1878,8 @@ int write_object_file(const void *buf, unsigned long len, const char *type,
 	/* Normally if we have it in the pack then we do not bother writing
 	 * it out into .git/objects/??/?{38} file.
 	 */
-	write_object_file_prepare(buf, len, type, oid, hdr, &hdrlen);
+	write_object_file_prepare(the_hash_algo, buf, len, type, oid, hdr,
+				  &hdrlen);
 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
 		return 0;
 	return write_loose_object(oid, hdr, hdrlen, buf, len, 0);
@@ -1892,7 +1895,8 @@ int hash_object_file_literally(const void *buf, unsigned long len,
 	/* type string, SP, %lu of the length plus NUL must fit this */
 	hdrlen = strlen(type) + MAX_HEADER_LEN;
 	header = xmalloc(hdrlen);
-	write_object_file_prepare(buf, len, type, oid, header, &hdrlen);
+	write_object_file_prepare(the_hash_algo, buf, len, type, oid, header,
+				  &hdrlen);
 
 	if (!(flags & HASH_WRITE_OBJECT))
 		goto cleanup;
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/7] sha1-file: pass git_hash_algo to hash_object_file()
  2020-01-30 20:32 [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Matheus Tavares
                   ` (4 preceding siblings ...)
  2020-01-30 20:32 ` [PATCH 5/7] sha1-file: pass git_hash_algo to write_object_file_prepare() Matheus Tavares
@ 2020-01-30 20:32 ` Matheus Tavares
  2020-01-30 20:32 ` [PATCH 7/7] sha1-file: allow check_object_signature() to handle any repo Matheus Tavares
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Matheus Tavares @ 2020-01-30 20:32 UTC (permalink / raw)
  To: git
  Cc: Stefan Beller, Nguyễn Thái Ngọc Duy,
	Junio C Hamano, brian m. carlson, Jeff King

Allow hash_object_file() to work on arbitrary repos by introducing a
git_hash_algo parameter. Change callers which have a struct repository
pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being
used internally at hash_object_file(). This functionality will be used
in the following patch to make check_object_signature() be able to work
on arbitrary repos (which, in turn, will be used to fix an
inconsistency at object.c:parse_object()).

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 apply.c                  |  6 ++++--
 builtin/index-pack.c     |  2 +-
 builtin/replace.c        |  3 ++-
 builtin/unpack-objects.c |  3 ++-
 cache-tree.c             |  9 ++++++---
 convert.c                |  2 +-
 diffcore-rename.c        |  4 ++--
 dir.c                    |  4 ++--
 log-tree.c               |  3 ++-
 object-store.h           |  5 +++--
 sha1-file.c              | 20 +++++++++++---------
 11 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/apply.c b/apply.c
index fab44322c5..bdc008fae2 100644
--- a/apply.c
+++ b/apply.c
@@ -3157,7 +3157,8 @@ static int apply_binary(struct apply_state *state,
 		 * See if the old one matches what the patch
 		 * applies to.
 		 */
-		hash_object_file(img->buf, img->len, blob_type, &oid);
+		hash_object_file(the_hash_algo, img->buf, img->len, blob_type,
+				 &oid);
 		if (strcmp(oid_to_hex(&oid), patch->old_oid_prefix))
 			return error(_("the patch applies to '%s' (%s), "
 				       "which does not match the "
@@ -3202,7 +3203,8 @@ static int apply_binary(struct apply_state *state,
 				     name);
 
 		/* verify that the result matches */
-		hash_object_file(img->buf, img->len, blob_type, &oid);
+		hash_object_file(the_hash_algo, img->buf, img->len, blob_type,
+				 &oid);
 		if (strcmp(oid_to_hex(&oid), patch->new_oid_prefix))
 			return error(_("binary patch to '%s' creates incorrect result (expecting %s, got %s)"),
 				name, patch->new_oid_prefix, oid_to_hex(&oid));
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7a08da8401..0183610a76 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -949,7 +949,7 @@ static void resolve_delta(struct object_entry *delta_obj,
 	free(delta_data);
 	if (!result->data)
 		bad_object(delta_obj->idx.offset, _("failed to apply delta"));
-	hash_object_file(result->data, result->size,
+	hash_object_file(the_hash_algo, result->data, result->size,
 			 type_name(delta_obj->real_type), &delta_obj->idx.oid);
 	sha1_object(result->data, NULL, result->size, delta_obj->real_type,
 		    &delta_obj->idx.oid);
diff --git a/builtin/replace.c b/builtin/replace.c
index bd92dc63b9..b36d17a657 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -409,7 +409,8 @@ static int check_one_mergetag(struct commit *commit,
 	struct tag *tag;
 	int i;
 
-	hash_object_file(extra->value, extra->len, type_name(OBJ_TAG), &tag_oid);
+	hash_object_file(the_hash_algo, extra->value, extra->len,
+			 type_name(OBJ_TAG), &tag_oid);
 	tag = lookup_tag(the_repository, &tag_oid);
 	if (!tag)
 		return error(_("bad mergetag in commit '%s'"), ref);
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 9100964667..dd4a75e030 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -265,7 +265,8 @@ static void write_object(unsigned nr, enum object_type type,
 	} else {
 		struct object *obj;
 		int eaten;
-		hash_object_file(buf, size, type_name(type), &obj_list[nr].oid);
+		hash_object_file(the_hash_algo, buf, size, type_name(type),
+				 &obj_list[nr].oid);
 		added_object(nr, type, buf, size);
 		obj = parse_object_buffer(the_repository, &obj_list[nr].oid,
 					  type, size, buf,
diff --git a/cache-tree.c b/cache-tree.c
index 8c51a15751..a537a806c1 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -407,13 +407,15 @@ static int update_one(struct cache_tree *it,
 
 	if (repair) {
 		struct object_id oid;
-		hash_object_file(buffer.buf, buffer.len, tree_type, &oid);
+		hash_object_file(the_hash_algo, buffer.buf, buffer.len,
+				 tree_type, &oid);
 		if (has_object_file_with_flags(&oid, OBJECT_INFO_SKIP_FETCH_OBJECT))
 			oidcpy(&it->oid, &oid);
 		else
 			to_invalidate = 1;
 	} else if (dryrun) {
-		hash_object_file(buffer.buf, buffer.len, tree_type, &it->oid);
+		hash_object_file(the_hash_algo, buffer.buf, buffer.len,
+				 tree_type, &it->oid);
 	} else if (write_object_file(buffer.buf, buffer.len, tree_type,
 				     &it->oid)) {
 		strbuf_release(&buffer);
@@ -828,7 +830,8 @@ static void verify_one(struct repository *r,
 		strbuf_addf(&tree_buf, "%o %.*s%c", mode, entlen, name, '\0');
 		strbuf_add(&tree_buf, oid->hash, r->hash_algo->rawsz);
 	}
-	hash_object_file(tree_buf.buf, tree_buf.len, tree_type, &new_oid);
+	hash_object_file(r->hash_algo, tree_buf.buf, tree_buf.len, tree_type,
+			 &new_oid);
 	if (!oideq(&new_oid, &it->oid))
 		BUG("cache-tree for path %.*s does not match. "
 		    "Expected %s got %s", len, path->buf,
diff --git a/convert.c b/convert.c
index 797e0bd0b1..33507ee0a9 100644
--- a/convert.c
+++ b/convert.c
@@ -1146,7 +1146,7 @@ static int ident_to_worktree(const char *src, size_t len,
 	/* are we "faking" in place editing ? */
 	if (src == buf->buf)
 		to_free = strbuf_detach(buf, NULL);
-	hash_object_file(src, len, "blob", &oid);
+	hash_object_file(the_hash_algo, src, len, "blob", &oid);
 
 	strbuf_grow(buf, len + cnt * (the_hash_algo->hexsz + 3));
 	for (;;) {
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 531d7adeaf..e189f407af 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -263,8 +263,8 @@ static unsigned int hash_filespec(struct repository *r,
 	if (!filespec->oid_valid) {
 		if (diff_populate_filespec(r, filespec, 0))
 			return 0;
-		hash_object_file(filespec->data, filespec->size, "blob",
-				 &filespec->oid);
+		hash_object_file(r->hash_algo, filespec->data, filespec->size,
+				 "blob", &filespec->oid);
 	}
 	return oidhash(&filespec->oid);
 }
diff --git a/dir.c b/dir.c
index 7d255227b1..512c5d7ea6 100644
--- a/dir.c
+++ b/dir.c
@@ -1002,8 +1002,8 @@ static int add_patterns(const char *fname, const char *base, int baselen,
 				oidcpy(&oid_stat->oid,
 				       &istate->cache[pos]->oid);
 			else
-				hash_object_file(buf, size, "blob",
-						 &oid_stat->oid);
+				hash_object_file(the_hash_algo, buf, size,
+						 "blob", &oid_stat->oid);
 			fill_stat_data(&oid_stat->stat, &st);
 			oid_stat->valid = 1;
 		}
diff --git a/log-tree.c b/log-tree.c
index 4e32638de8..cae38dcc66 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -501,7 +501,8 @@ static int show_one_mergetag(struct commit *commit,
 	int status, nth;
 	size_t payload_size, gpg_message_offset;
 
-	hash_object_file(extra->value, extra->len, type_name(OBJ_TAG), &oid);
+	hash_object_file(the_hash_algo, extra->value, extra->len,
+			 type_name(OBJ_TAG), &oid);
 	tag = lookup_tag(the_repository, &oid);
 	if (!tag)
 		return -1; /* error message already given */
diff --git a/object-store.h b/object-store.h
index 61b8b13e3b..cf97cbc292 100644
--- a/object-store.h
+++ b/object-store.h
@@ -198,8 +198,9 @@ static inline void *repo_read_object_file(struct repository *r,
 /* Read and unpack an object file into memory, write memory to an object file */
 int oid_object_info(struct repository *r, const struct object_id *, unsigned long *);
 
-int hash_object_file(const void *buf, unsigned long len,
-		     const char *type, struct object_id *oid);
+int hash_object_file(const struct git_hash_algo *algo, const void *buf,
+		     unsigned long len, const char *type,
+		     struct object_id *oid);
 
 int write_object_file(const void *buf, unsigned long len,
 		      const char *type, struct object_id *oid);
diff --git a/sha1-file.c b/sha1-file.c
index e789bfd153..d545e8b882 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -982,7 +982,7 @@ int check_object_signature(const struct object_id *oid, void *map,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(map, size, type, &real_oid);
+		hash_object_file(the_hash_algo, map, size, type, &real_oid);
 		return !oideq(oid, &real_oid) ? -1 : 0;
 	}
 
@@ -1543,7 +1543,7 @@ int pretend_object_file(void *buf, unsigned long len, enum object_type type,
 {
 	struct cached_object *co;
 
-	hash_object_file(buf, len, type_name(type), oid);
+	hash_object_file(the_hash_algo, buf, len, type_name(type), oid);
 	if (has_object_file(oid) || find_cached_object(oid))
 		return 0;
 	ALLOC_GROW(cached_objects, cached_object_nr + 1, cached_object_alloc);
@@ -1714,13 +1714,13 @@ static int write_buffer(int fd, const void *buf, size_t len)
 	return 0;
 }
 
-int hash_object_file(const void *buf, unsigned long len, const char *type,
+int hash_object_file(const struct git_hash_algo *algo, const void *buf,
+		     unsigned long len, const char *type,
 		     struct object_id *oid)
 {
 	char hdr[MAX_HEADER_LEN];
 	int hdrlen = sizeof(hdr);
-	write_object_file_prepare(the_hash_algo, buf, len, type, oid, hdr,
-				  &hdrlen);
+	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
 	return 0;
 }
 
@@ -2006,7 +2006,8 @@ static int index_mem(struct index_state *istate,
 	if (write_object)
 		ret = write_object_file(buf, size, type_name(type), oid);
 	else
-		ret = hash_object_file(buf, size, type_name(type), oid);
+		ret = hash_object_file(the_hash_algo, buf, size,
+				       type_name(type), oid);
 	if (re_allocated)
 		free(buf);
 	return ret;
@@ -2032,8 +2033,8 @@ static int index_stream_convert_blob(struct index_state *istate,
 		ret = write_object_file(sbuf.buf, sbuf.len, type_name(OBJ_BLOB),
 					oid);
 	else
-		ret = hash_object_file(sbuf.buf, sbuf.len, type_name(OBJ_BLOB),
-				       oid);
+		ret = hash_object_file(the_hash_algo, sbuf.buf, sbuf.len,
+				       type_name(OBJ_BLOB), oid);
 	strbuf_release(&sbuf);
 	return ret;
 }
@@ -2151,7 +2152,8 @@ int index_path(struct index_state *istate, struct object_id *oid,
 		if (strbuf_readlink(&sb, path, st->st_size))
 			return error_errno("readlink(\"%s\")", path);
 		if (!(flags & HASH_WRITE_OBJECT))
-			hash_object_file(sb.buf, sb.len, blob_type, oid);
+			hash_object_file(the_hash_algo, sb.buf, sb.len,
+					 blob_type, oid);
 		else if (write_object_file(sb.buf, sb.len, blob_type, oid))
 			rc = error(_("%s: failed to insert into database"), path);
 		strbuf_release(&sb);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 7/7] sha1-file: allow check_object_signature() to handle any repo
  2020-01-30 20:32 [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Matheus Tavares
                   ` (5 preceding siblings ...)
  2020-01-30 20:32 ` [PATCH 6/7] sha1-file: pass git_hash_algo to hash_object_file() Matheus Tavares
@ 2020-01-30 20:32 ` Matheus Tavares
  2020-01-30 21:26 ` [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Junio C Hamano
  2020-01-31  9:03 ` Jeff King
  8 siblings, 0 replies; 16+ messages in thread
From: Matheus Tavares @ 2020-01-30 20:32 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Jeff King, Stefan Beller

Some callers of check_object_signature() can work on arbitrary
repositories, but the repo does not get passed to this function.
Instead, the_repository is always used internally. To fix possible
inconsistencies, allow the function to receive a struct repository and
make those callers pass on the repo being handled.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/fast-export.c |  3 ++-
 builtin/index-pack.c  |  5 +++--
 builtin/mktag.c       |  7 +++++--
 cache.h               |  3 ++-
 object.c              |  5 +++--
 pack-check.c          |  2 +-
 sha1-file.c           | 21 +++++++++++----------
 7 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index dbec4df92b..25386b34d3 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -293,7 +293,8 @@ static void export_blob(const struct object_id *oid)
 		buf = read_object_file(oid, &type, &size);
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
-		if (check_object_signature(oid, buf, size, type_name(type)) < 0)
+		if (check_object_signature(the_repository, oid, buf, size,
+					   type_name(type)) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 0183610a76..acdda17d84 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1384,8 +1384,9 @@ static void fix_unresolved_deltas(struct hashfile *f)
 		if (!base_obj->data)
 			continue;
 
-		if (check_object_signature(&d->oid, base_obj->data,
-				base_obj->size, type_name(type)))
+		if (check_object_signature(the_repository, &d->oid,
+					   base_obj->data, base_obj->size,
+					   type_name(type)))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 		base_obj->obj = append_obj_to_pack(f, d->oid.hash,
 					base_obj->data, base_obj->size, type);
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 6fb7dc8578..4982d3a93e 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -29,8 +29,11 @@ static int verify_object(const struct object_id *oid, const char *expected_type)
 	const struct object_id *repl = lookup_replace_object(the_repository, oid);
 
 	if (buffer) {
-		if (type == type_from_string(expected_type))
-			ret = check_object_signature(repl, buffer, size, expected_type);
+		if (type == type_from_string(expected_type)) {
+			ret = check_object_signature(the_repository, repl,
+						     buffer, size,
+						     expected_type);
+		}
 		free(buffer);
 	}
 	return ret;
diff --git a/cache.h b/cache.h
index cbfaead23a..40ae160991 100644
--- a/cache.h
+++ b/cache.h
@@ -1363,7 +1363,8 @@ int git_open_cloexec(const char *name, int flags);
 int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
 int parse_loose_header(const char *hdr, unsigned long *sizep);
 
-int check_object_signature(const struct object_id *oid, void *buf, unsigned long size, const char *type);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
 
 int finalize_object_file(const char *tmpfile, const char *filename);
 
diff --git a/object.c b/object.c
index 142ef69399..81f5820fc3 100644
--- a/object.c
+++ b/object.c
@@ -262,7 +262,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -272,7 +272,8 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
-		if (check_object_signature(repl, buffer, size, type_name(type)) < 0) {
+		if (check_object_signature(r, repl, buffer, size,
+					   type_name(type)) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index 0fb3b365c7..e4ef71c673 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -144,7 +144,7 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(entries[i].oid.oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(entries[i].oid.oid, data, size, type_name(type)))
+		else if (check_object_signature(r, entries[i].oid.oid, data, size, type_name(type)))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(entries[i].oid.oid), p->pack_name);
 		else if (fn) {
diff --git a/sha1-file.c b/sha1-file.c
index d545e8b882..e2f608ac08 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -971,8 +971,8 @@ void *xmmap(void *start, size_t length,
  * With "map" == NULL, try reading the object named with "oid" using
  * the streaming interface and rehash it to do the same.
  */
-int check_object_signature(const struct object_id *oid, void *map,
-			   unsigned long size, const char *type)
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *map, unsigned long size, const char *type)
 {
 	struct object_id real_oid;
 	enum object_type obj_type;
@@ -982,11 +982,11 @@ int check_object_signature(const struct object_id *oid, void *map,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(the_hash_algo, map, size, type, &real_oid);
+		hash_object_file(r->hash_algo, map, size, type, &real_oid);
 		return !oideq(oid, &real_oid) ? -1 : 0;
 	}
 
-	st = open_istream(the_repository, oid, &obj_type, &size, NULL);
+	st = open_istream(r, oid, &obj_type, &size, NULL);
 	if (!st)
 		return -1;
 
@@ -994,8 +994,8 @@ int check_object_signature(const struct object_id *oid, void *map,
 	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX , type_name(obj_type), (uintmax_t)size) + 1;
 
 	/* Sha1.. */
-	the_hash_algo->init_fn(&c);
-	the_hash_algo->update_fn(&c, hdr, hdrlen);
+	r->hash_algo->init_fn(&c);
+	r->hash_algo->update_fn(&c, hdr, hdrlen);
 	for (;;) {
 		char buf[1024 * 16];
 		ssize_t readlen = read_istream(st, buf, sizeof(buf));
@@ -1006,9 +1006,9 @@ int check_object_signature(const struct object_id *oid, void *map,
 		}
 		if (!readlen)
 			break;
-		the_hash_algo->update_fn(&c, buf, readlen);
+		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	the_hash_algo->final_fn(real_oid.hash, &c);
+	r->hash_algo->final_fn(real_oid.hash, &c);
 	close_istream(st);
 	return !oideq(oid, &real_oid) ? -1 : 0;
 }
@@ -2454,8 +2454,9 @@ int read_loose_object(const char *path,
 			git_inflate_end(&stream);
 			goto out;
 		}
-		if (check_object_signature(expected_oid, *contents,
-					 *size, type_name(*type))) {
+		if (check_object_signature(the_repository, expected_oid,
+					   *contents, *size,
+					   type_name(*type))) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain
  2020-01-30 20:32 [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Matheus Tavares
                   ` (6 preceding siblings ...)
  2020-01-30 20:32 ` [PATCH 7/7] sha1-file: allow check_object_signature() to handle any repo Matheus Tavares
@ 2020-01-30 21:26 ` Junio C Hamano
  2020-01-30 23:46   ` brian m. carlson
  2020-01-31  9:03 ` Jeff King
  8 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2020-01-30 21:26 UTC (permalink / raw)
  To: Matheus Tavares, brian m. carlson; +Cc: git

Matheus Tavares <matheus.bernardino@usp.br> writes:

> Note: I also tried to replace some uses of the_hash_algo with the struct
> git_hash_algo from the received repository, for consitency. (In
> practice, I'm not sure if this is very useful right now, but maybe it
> will be relevant for the change to SHA256?) Still, many functions in
> parse_object()'s call chain call oid_to_hex(), which uses the_hash_algo.
> Since changing this would require a much bigger operation, I decided to
> leave it as is, for now.

Passing a repo instance throughout the callchain and redusing the
use of the_repo would be a worthwhile longer term clean-up.  

Moving off of the_hash_algo to repo->hash_algo is, too, and it is
very much relevant to the "newhash" work Brian (CCed) has been
working on.  You two do not want to step on each other's toes.
Please coordinate.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain
  2020-01-30 21:26 ` [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Junio C Hamano
@ 2020-01-30 23:46   ` brian m. carlson
  0 siblings, 0 replies; 16+ messages in thread
From: brian m. carlson @ 2020-01-30 23:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matheus Tavares, git

[-- Attachment #1: Type: text/plain, Size: 1330 bytes --]

On 2020-01-30 at 21:26:02, Junio C Hamano wrote:
> Matheus Tavares <matheus.bernardino@usp.br> writes:
> 
> > Note: I also tried to replace some uses of the_hash_algo with the struct
> > git_hash_algo from the received repository, for consitency. (In
> > practice, I'm not sure if this is very useful right now, but maybe it
> > will be relevant for the change to SHA256?) Still, many functions in
> > parse_object()'s call chain call oid_to_hex(), which uses the_hash_algo.
> > Since changing this would require a much bigger operation, I decided to
> > leave it as is, for now.
> 
> Passing a repo instance throughout the callchain and redusing the
> use of the_repo would be a worthwhile longer term clean-up.

Yes, I think this would be a nice change to see.

> Moving off of the_hash_algo to repo->hash_algo is, too, and it is
> very much relevant to the "newhash" work Brian (CCed) has been
> working on.  You two do not want to step on each other's toes.
> Please coordinate.

This series is fine from my end.  I think it provides a nice improvement
that I can take advantage of in the future, so I don't see a reason to
hold it up on my account.  The few patches I have in this area would be
easy to rebase or adjust.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain
  2020-01-30 20:32 [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Matheus Tavares
                   ` (7 preceding siblings ...)
  2020-01-30 21:26 ` [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Junio C Hamano
@ 2020-01-31  9:03 ` Jeff King
  2020-02-01  4:44   ` Matheus Tavares Bernardino
  8 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2020-01-31  9:03 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git

On Thu, Jan 30, 2020 at 05:32:16PM -0300, Matheus Tavares wrote:

> The motivation for this patchset is another series I'm working on, to
> make git-grep stop adding submodules' odbs to the alternates list. With
> that, parse_object() will have to be called with the subdmodules' struct
> repository. But it seemed that this function doesn't pass on the
> received repo to some inner calls, which, instead, always use
> the_repository. This series seeks to fix these inconsistencies.

I read over the patches and they all seem sensible. There may be spots
you missed where these functions are subtly using the_repository under
the hood (or where you used the_repository but could have used some
repository pointer tucked away in a struct). But even if that is the
case, it seems clear that all of the changes are going in the correct
direction, and we can continue to iterate on top.

> Note: I also tried to replace some uses of the_hash_algo with the struct
> git_hash_algo from the received repository, for consitency. (In
> practice, I'm not sure if this is very useful right now, but maybe it
> will be relevant for the change to SHA256?) Still, many functions in
> parse_object()'s call chain call oid_to_hex(), which uses the_hash_algo.
> Since changing this would require a much bigger operation, I decided to
> leave it as is, for now.

I'm not sure I look forward to a day where oid_to_hex() needs to take an
extra parameter, just because it will make it more annoying to use. But
we'll have to go there eventually unless we plan to forbid mixing of
repositories with different hashes within the same process.

The hash transition document says:

  To convert recorded submodule pointers, you need to have the converted
  submodule repository in place. The translation table of the submodule
  can be used to look up the new hash.

which I take to mean that the local copy of the submodule needs to match
the superproject hash (this says nothing about what the upstream owner
of the submodule wants to do; we'd be translating on the fly to the new
hash in the local repo). So using the_hash_algo would work either way.

I don't think we're particularly interested in supporting multiple
unrelated repositories within the same process. While that would be
convenient for some cases (e.g., you have a million repositories and
want to look at all of their trees without creating a million
processes), I don't think it's a goal anyone is really working towards
with this "struct repository" layer.

> Note II: Despite receiving a repo through the apply_state struct,
> apply.c:apply_binary() call functions which uses the_repository
> internally. Because of that, I used the_hash_algo in this function in
> patch 6. Should I change it to use apply_state->repo->hash_algo
> anyway?

I think this follows my "it's OK for now because we're going in the
right direction from above". I.e., we probably do eventually want to use
apply_state->repo->hash_algo, just like we probably do eventually want
all those functions to take a repository argument. But by even using
the_hash_algo, you've at least marked the spot for later fixing (the
very final step of this long process will be eradicating all of
the_hash_algo and the_repository from the bottom of the call-stack on
up).

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/7] streaming: allow open_istream() to handle any repo
  2020-01-30 20:32 ` [PATCH 4/7] streaming: allow open_istream() to handle any repo Matheus Tavares
@ 2020-01-31 18:55   ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2020-01-31 18:55 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Stefan Beller, Nguyễn Thái Ngọc Duy,
	brian m. carlson, Jeff King

Matheus Tavares <matheus.bernardino@usp.br> writes:

> Some callers of open_istream() at archive-tar.c and archive-zip.c are
> capable of working on arbitrary repositories but the repo struct is not
> passed down to open_istream(), which uses the_repository internally. For
> now, that's not a problem since the said callers are only being called
> with the_repository. But to be consistent and avoid future problems,
> let's allow open_istream() to receive a struct repository and use that
> instead of the_repository. This parameter addition will also be used in
> a future patch to make sha1-file.c:check_object_signature() be able to
> work on arbitrary repos.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---

Nicely done.  I am happy to see this change as the inventor of the
streaming interface ;-)

>  archive-tar.c          |  6 +++---
>  archive-zip.c          |  3 ++-
>  builtin/index-pack.c   |  3 ++-
>  builtin/pack-objects.c |  3 ++-
>  sha1-file.c            |  2 +-
>  streaming.c            | 28 +++++++++++++++-------------
>  streaming.h            |  4 +++-
>  7 files changed, 28 insertions(+), 21 deletions(-)
>
> diff --git a/archive-tar.c b/archive-tar.c
> index e16d3f756d..5a77701a15 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -112,7 +112,7 @@ static void write_trailer(void)
>   * queues up writes, so that all our write(2) calls write exactly one
>   * full block; pads writes to RECORDSIZE
>   */
> -static int stream_blocked(const struct object_id *oid)
> +static int stream_blocked(struct repository *r, const struct object_id *oid)
>  {
>  	struct git_istream *st;
>  	enum object_type type;
> @@ -120,7 +120,7 @@ static int stream_blocked(const struct object_id *oid)
>  	char buf[BLOCKSIZE];
>  	ssize_t readlen;
>  
> -	st = open_istream(oid, &type, &sz, NULL);
> +	st = open_istream(r, oid, &type, &sz, NULL);
>  	if (!st)
>  		return error(_("cannot stream blob %s"), oid_to_hex(oid));
>  	for (;;) {
> @@ -324,7 +324,7 @@ static int write_tar_entry(struct archiver_args *args,
>  		if (buffer)
>  			write_blocked(buffer, size);
>  		else
> -			err = stream_blocked(oid);
> +			err = stream_blocked(args->repo, oid);
>  	}
>  	free(buffer);
>  	return err;
> diff --git a/archive-zip.c b/archive-zip.c
> index 11f5b1974b..e9f426298b 100644
> --- a/archive-zip.c
> +++ b/archive-zip.c
> @@ -345,7 +345,8 @@ static int write_zip_entry(struct archiver_args *args,
>  
>  		if (S_ISREG(mode) && type == OBJ_BLOB && !args->convert &&
>  		    size > big_file_threshold) {
> -			stream = open_istream(oid, &type, &size, NULL);
> +			stream = open_istream(args->repo, oid, &type, &size,
> +					      NULL);
>  			if (!stream)
>  				return error(_("cannot stream blob %s"),
>  					     oid_to_hex(oid));
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 60a5591039..7a08da8401 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -757,7 +757,8 @@ static int check_collison(struct object_entry *entry)
>  
>  	memset(&data, 0, sizeof(data));
>  	data.entry = entry;
> -	data.st = open_istream(&entry->idx.oid, &type, &size, NULL);
> +	data.st = open_istream(the_repository, &entry->idx.oid, &type, &size,
> +			       NULL);
>  	if (!data.st)
>  		return -1;
>  	if (size != entry->size || type != entry->type)
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 393c20a2d7..ef17efd94e 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -303,7 +303,8 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
>  	if (!usable_delta) {
>  		if (oe_type(entry) == OBJ_BLOB &&
>  		    oe_size_greater_than(&to_pack, entry, big_file_threshold) &&
> -		    (st = open_istream(&entry->idx.oid, &type, &size, NULL)) != NULL)
> +		    (st = open_istream(the_repository, &entry->idx.oid, &type,
> +				       &size, NULL)) != NULL)
>  			buf = NULL;
>  		else {
>  			buf = read_object_file(&entry->idx.oid, &type, &size);
> diff --git a/sha1-file.c b/sha1-file.c
> index 03ae9ae93a..13b90b1cb1 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -986,7 +986,7 @@ int check_object_signature(const struct object_id *oid, void *map,
>  		return !oideq(oid, &real_oid) ? -1 : 0;
>  	}
>  
> -	st = open_istream(oid, &obj_type, &size, NULL);
> +	st = open_istream(the_repository, oid, &obj_type, &size, NULL);
>  	if (!st)
>  		return -1;
>  
> diff --git a/streaming.c b/streaming.c
> index fcd6303219..800f07a52c 100644
> --- a/streaming.c
> +++ b/streaming.c
> @@ -16,6 +16,7 @@ enum input_source {
>  };
>  
>  typedef int (*open_istream_fn)(struct git_istream *,
> +			       struct repository *,
>  			       struct object_info *,
>  			       const struct object_id *,
>  			       enum object_type *);
> @@ -29,8 +30,8 @@ struct stream_vtbl {
>  
>  #define open_method_decl(name) \
>  	int open_istream_ ##name \
> -	(struct git_istream *st, struct object_info *oi, \
> -	 const struct object_id *oid, \
> +	(struct git_istream *st, struct repository *r, \
> +	 struct object_info *oi, const struct object_id *oid, \
>  	 enum object_type *type)
>  
>  #define close_method_decl(name) \
> @@ -108,7 +109,8 @@ ssize_t read_istream(struct git_istream *st, void *buf, size_t sz)
>  	return st->vtbl->read(st, buf, sz);
>  }
>  
> -static enum input_source istream_source(const struct object_id *oid,
> +static enum input_source istream_source(struct repository *r,
> +					const struct object_id *oid,
>  					enum object_type *type,
>  					struct object_info *oi)
>  {
> @@ -117,7 +119,7 @@ static enum input_source istream_source(const struct object_id *oid,
>  
>  	oi->typep = type;
>  	oi->sizep = &size;
> -	status = oid_object_info_extended(the_repository, oid, oi, 0);
> +	status = oid_object_info_extended(r, oid, oi, 0);
>  	if (status < 0)
>  		return stream_error;
>  
> @@ -133,22 +135,23 @@ static enum input_source istream_source(const struct object_id *oid,
>  	}
>  }
>  
> -struct git_istream *open_istream(const struct object_id *oid,
> +struct git_istream *open_istream(struct repository *r,
> +				 const struct object_id *oid,
>  				 enum object_type *type,
>  				 unsigned long *size,
>  				 struct stream_filter *filter)
>  {
>  	struct git_istream *st;
>  	struct object_info oi = OBJECT_INFO_INIT;
> -	const struct object_id *real = lookup_replace_object(the_repository, oid);
> -	enum input_source src = istream_source(real, type, &oi);
> +	const struct object_id *real = lookup_replace_object(r, oid);
> +	enum input_source src = istream_source(r, real, type, &oi);
>  
>  	if (src < 0)
>  		return NULL;
>  
>  	st = xmalloc(sizeof(*st));
> -	if (open_istream_tbl[src](st, &oi, real, type)) {
> -		if (open_istream_incore(st, &oi, real, type)) {
> +	if (open_istream_tbl[src](st, r, &oi, real, type)) {
> +		if (open_istream_incore(st, r, &oi, real, type)) {
>  			free(st);
>  			return NULL;
>  		}
> @@ -338,8 +341,7 @@ static struct stream_vtbl loose_vtbl = {
>  
>  static open_method_decl(loose)
>  {
> -	st->u.loose.mapped = map_loose_object(the_repository,
> -					      oid, &st->u.loose.mapsize);
> +	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
>  	if (!st->u.loose.mapped)
>  		return -1;
>  	if ((unpack_loose_header(&st->z,
> @@ -499,7 +501,7 @@ static struct stream_vtbl incore_vtbl = {
>  
>  static open_method_decl(incore)
>  {
> -	st->u.incore.buf = read_object_file_extended(the_repository, oid, type, &st->size, 0);
> +	st->u.incore.buf = read_object_file_extended(r, oid, type, &st->size, 0);
>  	st->u.incore.read_ptr = 0;
>  	st->vtbl = &incore_vtbl;
>  
> @@ -520,7 +522,7 @@ int stream_blob_to_fd(int fd, const struct object_id *oid, struct stream_filter
>  	ssize_t kept = 0;
>  	int result = -1;
>  
> -	st = open_istream(oid, &type, &sz, filter);
> +	st = open_istream(the_repository, oid, &type, &sz, filter);
>  	if (!st) {
>  		if (filter)
>  			free_stream_filter(filter);
> diff --git a/streaming.h b/streaming.h
> index f465a3cd31..5e4e6acfd0 100644
> --- a/streaming.h
> +++ b/streaming.h
> @@ -8,7 +8,9 @@
>  /* opaque */
>  struct git_istream;
>  
> -struct git_istream *open_istream(const struct object_id *, enum object_type *, unsigned long *, struct stream_filter *);
> +struct git_istream *open_istream(struct repository *, const struct object_id *,
> +				 enum object_type *, unsigned long *,
> +				 struct stream_filter *);
>  int close_istream(struct git_istream *);
>  ssize_t read_istream(struct git_istream *, void *, size_t);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain
  2020-01-31  9:03 ` Jeff King
@ 2020-02-01  4:44   ` Matheus Tavares Bernardino
  2020-02-01 11:04     ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Matheus Tavares Bernardino @ 2020-02-01  4:44 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi, Peff

Thanks for the great feedback.

On Fri, Jan 31, 2020 at 6:03 AM Jeff King <peff@peff.net> wrote:
>
[...]
> The hash transition document says:
>
>   To convert recorded submodule pointers, you need to have the converted
>   submodule repository in place. The translation table of the submodule
>   can be used to look up the new hash.
>
> which I take to mean that the local copy of the submodule needs to match
> the superproject hash (this says nothing about what the upstream owner
> of the submodule wants to do; we'd be translating on the fly to the new
> hash in the local repo). So using the_hash_algo would work either way.
>
> I don't think we're particularly interested in supporting multiple
> unrelated repositories within the same process. While that would be
> convenient for some cases (e.g., you have a million repositories and
> want to look at all of their trees without creating a million
> processes), I don't think it's a goal anyone is really working towards
> with this "struct repository" layer.

Thanks for these explanations. One thing that left me thinking,
though, is about changing the_hash_algo to r->hash_algo (not only in
oid_to_hex()). I previously thought this would eventually be required
for the hash transition. But if I understood your comment correctly,
it might not be necessary since the submodule should always match the
superproject hash. And, as you mentioned, supporting multiple
unrelated repos [in a single process] is not one of the main goals, as
well. So wouldn't keep using the_hash_algo be OK?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] sha1-file: pass git_hash_algo to write_object_file_prepare()
  2020-01-30 20:32 ` [PATCH 5/7] sha1-file: pass git_hash_algo to write_object_file_prepare() Matheus Tavares
@ 2020-02-01 10:03   ` Torsten Bögershausen
  2020-02-01 11:08     ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Torsten Bögershausen @ 2020-02-01 10:03 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Jeff King, Junio C Hamano, Patryk Obara, brian m. carlson,
	Eric Sunshine

On Thu, Jan 30, 2020 at 05:32:21PM -0300, Matheus Tavares wrote:
> Allow write_object_file_prepare() to receive arbitrary 'struct
> git_hash_algo's instead of always using the_hash_algo. The added
> parameter will be used in the next commit to make hash_object_file() be
> able to work with arbitrary git_hash_algo's, as well.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  sha1-file.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/sha1-file.c b/sha1-file.c
> index 13b90b1cb1..e789bfd153 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -1647,7 +1647,8 @@ void *read_object_with_reference(struct repository *r,
>  	}
>  }
>
> -static void write_object_file_prepare(const void *buf, unsigned long len,
> +static void write_object_file_prepare(const struct git_hash_algo *algo,
> +				      const void *buf, unsigned long len,
>  				      const char *type, struct object_id *oid,
>  				      char *hdr, int *hdrlen)
>  {

One minor minor suggestion/nit, may be my own type of style only.

When adding a parameter to a function, we typically add it at the end of
the parameter list, rather than at the beginning.
The other (unwritten) convention, when dealing with "buffer pointer/len",
seams to be that buffer-ptr is the first paramter and len is is the second.

However, this is my personal note, other opinions and comments are welcome.
What do others think ?



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain
  2020-02-01  4:44   ` Matheus Tavares Bernardino
@ 2020-02-01 11:04     ` Jeff King
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2020-02-01 11:04 UTC (permalink / raw)
  To: Matheus Tavares Bernardino; +Cc: git

On Sat, Feb 01, 2020 at 01:44:04AM -0300, Matheus Tavares Bernardino wrote:

> > I don't think we're particularly interested in supporting multiple
> > unrelated repositories within the same process. While that would be
> > convenient for some cases (e.g., you have a million repositories and
> > want to look at all of their trees without creating a million
> > processes), I don't think it's a goal anyone is really working towards
> > with this "struct repository" layer.
> 
> Thanks for these explanations. One thing that left me thinking,
> though, is about changing the_hash_algo to r->hash_algo (not only in
> oid_to_hex()). I previously thought this would eventually be required
> for the hash transition. But if I understood your comment correctly,
> it might not be necessary since the submodule should always match the
> superproject hash. And, as you mentioned, supporting multiple
> unrelated repos [in a single process] is not one of the main goals, as
> well. So wouldn't keep using the_hash_algo be OK?

Yes, by the reasoning in my message, it wouldn't be a big problem. But I
literally hadn't thought about it until writing that, so I may either be
missing something, or there may be plans from other folks working on the
hash transition that contradict that.

Even if we don't care too much about it for now, though, it still seems
like a step in the right direction.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] sha1-file: pass git_hash_algo to write_object_file_prepare()
  2020-02-01 10:03   ` Torsten Bögershausen
@ 2020-02-01 11:08     ` Jeff King
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2020-02-01 11:08 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Matheus Tavares, git, Junio C Hamano, Patryk Obara,
	brian m. carlson, Eric Sunshine

On Sat, Feb 01, 2020 at 11:03:37AM +0100, Torsten Bögershausen wrote:

> > diff --git a/sha1-file.c b/sha1-file.c
> > index 13b90b1cb1..e789bfd153 100644
> > --- a/sha1-file.c
> > +++ b/sha1-file.c
> > @@ -1647,7 +1647,8 @@ void *read_object_with_reference(struct repository *r,
> >  	}
> >  }
> >
> > -static void write_object_file_prepare(const void *buf, unsigned long len,
> > +static void write_object_file_prepare(const struct git_hash_algo *algo,
> > +				      const void *buf, unsigned long len,
> >  				      const char *type, struct object_id *oid,
> >  				      char *hdr, int *hdrlen)
> >  {
> 
> One minor minor suggestion/nit, may be my own type of style only.
> 
> When adding a parameter to a function, we typically add it at the end of
> the parameter list, rather than at the beginning.
> The other (unwritten) convention, when dealing with "buffer pointer/len",
> seams to be that buffer-ptr is the first paramter and len is is the second.

I'd usually add new options at the end, too, all things being equal. But
in this case, I see:

  - some of these are input parameters and some are output (I think oid,
    hdr, and hdrlen are the latter); it makes sense to keep them grouped
    that way

  - in object-oriented functions, we'd often take the object first
    (e.g., strbuf_add). In non-object-oriented functions, we often take
    a "context" argument first that shapes how the function works (e.g.,
    all of the pp_* functions that take a pretty_print_context). I'd
    think of the algorithm a little bit as a "context" argument, though
    I admit it's mostly a gut feeling and quite subjective.

    But if you believe that line of reasoning, then it doesn't seem out
    of place as the first argument.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-02-01 11:08 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-30 20:32 [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Matheus Tavares
2020-01-30 20:32 ` [PATCH 1/7] diff: make diff_populate_filespec() honor its repo argument Matheus Tavares
2020-01-30 20:32 ` [PATCH 2/7] cache-tree: use given repo's hash_algo at verify_one() Matheus Tavares
2020-01-30 20:32 ` [PATCH 3/7] pack-check: use given repo's hash_algo at verify_packfile() Matheus Tavares
2020-01-30 20:32 ` [PATCH 4/7] streaming: allow open_istream() to handle any repo Matheus Tavares
2020-01-31 18:55   ` Junio C Hamano
2020-01-30 20:32 ` [PATCH 5/7] sha1-file: pass git_hash_algo to write_object_file_prepare() Matheus Tavares
2020-02-01 10:03   ` Torsten Bögershausen
2020-02-01 11:08     ` Jeff King
2020-01-30 20:32 ` [PATCH 6/7] sha1-file: pass git_hash_algo to hash_object_file() Matheus Tavares
2020-01-30 20:32 ` [PATCH 7/7] sha1-file: allow check_object_signature() to handle any repo Matheus Tavares
2020-01-30 21:26 ` [PATCH 0/7] fix inconsistent uses of the_repo in parse_object()'s call chain Junio C Hamano
2020-01-30 23:46   ` brian m. carlson
2020-01-31  9:03 ` Jeff King
2020-02-01  4:44   ` Matheus Tavares Bernardino
2020-02-01 11:04     ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).