git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/5] cleaning up read_object() family of functions
@ 2023-01-07 13:48 Jeff King
  2023-01-07 13:48 ` [PATCH 1/5] object-file: inline calls to read_object() Jeff King
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Jeff King @ 2023-01-07 13:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

I often get confused about the difference between:

  - read_object()
  - read_object_file()
  - read_object_file_extended()
  - repo_read_object_file()

Since Jonathan's recent cleanups from 9e59b38c88 (object-file: emit
corruption errors when detected, 2022-12-14), these are mostly thin
wrappers around each other and around oid_object_info_extended().

This series shuffles things around a little more so that we are down to
just read_object_file() and repo_read_object_file(). And the
relationship there is pretty easy (and long-term we'd eventually merge
them once everyone has a repository object).

It is a net reduction in lines, even though some of the callers end up a
little longer (because they have to stuff pointers into an object_info
struct). If that's too distasteful, the middle ground is to have a
helper like:

  void *foo(struct repository *r, const struct object_id *oid,
            enum object_type *type, unsigned long *size,
	    unsigned flags)
  {
	struct object_info oi = OBJECT_INFO_INIT;
	void *content;

	oi.typep = type;
	oi.sizep = size;
	oi.contentp = &content;

	if (oid_object_info_extended(r, oid, &oi, flags) < 0)
		return NULL;
	return content;
  }

which is basically the same as read_object(), but makes it clear that
you can pass OBJECT_INFO flags. The trouble is that I could not come up
with a name for it that was not confusing. ;) So just having most places
call oid_object_info_extended() directly seemed better. It would be nice
if that function had a shorter name, too, but I left that for another
day.
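
For illustration only, here is a self-contained sketch of that kind of
flag-taking helper. Everything in it is a stand-in and not git's real
code: the cut-down "struct object_info", the mock_object_info_extended()
function, the MOCK_FLAG_FAIL flag and the read_with_flags() name are all
hypothetical, chosen so the snippet compiles outside of git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down stand-ins for git's types; NOT the real definitions. */
enum object_type { OBJ_NONE = 0, OBJ_BLOB = 3 };

struct object_info {
	enum object_type *typep;
	unsigned long *sizep;
	void **contentp;
};
#define OBJECT_INFO_INIT { 0 }

/* Hypothetical flag that makes the mock lookup fail. */
#define MOCK_FLAG_FAIL 1u

static char mock_blob[] = "hello";

/*
 * Mock of the query function: fills in only the fields the caller
 * asked for, and returns a negative value on failure.
 */
static int mock_object_info_extended(struct object_info *oi, unsigned flags)
{
	assert(oi != NULL);
	if (flags & MOCK_FLAG_FAIL)
		return -1;
	if (oi->typep)
		*oi->typep = OBJ_BLOB;
	if (oi->sizep)
		*oi->sizep = strlen(mock_blob);
	if (oi->contentp)
		*oi->contentp = mock_blob;
	return 0;
}

/* The helper: same shape as read_object(), but flags pass straight through. */
static void *read_with_flags(enum object_type *type, unsigned long *size,
			     unsigned flags)
{
	struct object_info oi = OBJECT_INFO_INIT;
	void *content;

	oi.typep = type;
	oi.sizep = size;
	oi.contentp = &content; /* address of the local, so it can be filled */

	if (mock_object_info_extended(&oi, flags) < 0)
		return NULL;
	return content;
}
```

A caller would pass 0 for the default behavior, or OR together whatever
flags it wants, without the helper having to translate them.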

  [1/5]: object-file: inline calls to read_object()
  [2/5]: streaming: inline call to read_object_file_extended()
  [3/5]: read_object_file_extended(): drop lookup_replace option
  [4/5]: repo_read_object_file(): stop wrapping read_object_file_extended()
  [5/5]: packfile: inline custom read_object()

 object-file.c  | 52 ++++++++++++++++++--------------------------------
 object-store.h | 18 +++++------------
 packfile.c     | 26 +++++++++----------------
 streaming.c    | 11 ++++++++---
 4 files changed, 41 insertions(+), 66 deletions(-)

-Peff


* [PATCH 1/5] object-file: inline calls to read_object()
  2023-01-07 13:48 [PATCH 0/5] cleaning up read_object() family of functions Jeff King
@ 2023-01-07 13:48 ` Jeff King
  2023-01-12  9:13   ` Ævar Arnfjörð Bjarmason
  2023-01-07 13:49 ` [PATCH 2/5] streaming: inline call to read_object_file_extended() Jeff King
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2023-01-07 13:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Since read_object() is these days just a thin wrapper around
oid_object_info_extended(), and since it only has two callers, let's
just inline those calls. This has a few positive outcomes:

  - it's a net reduction in source code lines

  - even though the callers end up with a few extra lines, they're now
    more flexible and can use object_info flags directly. So no more
    need to convert die_if_corrupt between parameter/flag, and we can
    ask for lookup replacement with a flag rather than doing it
    ourselves.

  - there's one fewer function in an already crowded namespace (e.g.,
    the difference between read_object() and read_object_file() was not
    immediately obvious; now we only have one of them).

Signed-off-by: Jeff King <peff@peff.net>
---
 object-file.c  | 45 +++++++++++++++++----------------------------
 object-store.h |  2 +-
 2 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/object-file.c b/object-file.c
index 80a0cd3b35..ed1babbac2 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1671,23 +1671,6 @@ int oid_object_info(struct repository *r,
 	return type;
 }
 
-static void *read_object(struct repository *r,
-			 const struct object_id *oid, enum object_type *type,
-			 unsigned long *size,
-			 int die_if_corrupt)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-	void *content;
-	oi.typep = type;
-	oi.sizep = size;
-	oi.contentp = &content;
-
-	if (oid_object_info_extended(r, oid, &oi, die_if_corrupt
-				     ? OBJECT_INFO_DIE_IF_CORRUPT : 0) < 0)
-		return NULL;
-	return content;
-}
-
 int pretend_object_file(void *buf, unsigned long len, enum object_type type,
 			struct object_id *oid)
 {
@@ -1709,25 +1692,28 @@ int pretend_object_file(void *buf, unsigned long len, enum object_type type,
 
 /*
  * This function dies on corrupt objects; the callers who want to
- * deal with them should arrange to call read_object() and give error
- * messages themselves.
+ * deal with them should arrange to call oid_object_info_extended() and give
+ * error messages themselves.
  */
 void *read_object_file_extended(struct repository *r,
 				const struct object_id *oid,
 				enum object_type *type,
 				unsigned long *size,
 				int lookup_replace)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	unsigned flags = OBJECT_INFO_DIE_IF_CORRUPT;
 	void *data;
-	const struct object_id *repl = lookup_replace ?
-		lookup_replace_object(r, oid) : oid;
 
-	errno = 0;
-	data = read_object(r, repl, type, size, 1);
-	if (data)
-		return data;
+	oi.typep = type;
+	oi.sizep = size;
+	oi.contentp = &data;
+	if (lookup_replace)
+		flags |= OBJECT_INFO_LOOKUP_REPLACE;
+	if (oid_object_info_extended(r, oid, &oi, flags))
+	    return NULL;
 
-	return NULL;
+	return data;
 }
 
 void *read_object_with_reference(struct repository *r,
@@ -2255,15 +2241,18 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
 {
 	void *buf;
 	unsigned long len;
+	struct object_info oi = OBJECT_INFO_INIT;
 	enum object_type type;
 	char hdr[MAX_HEADER_LEN];
 	int hdrlen;
 	int ret;
 
 	if (has_loose_object(oid))
 		return 0;
-	buf = read_object(the_repository, oid, &type, &len, 0);
-	if (!buf)
+	oi.typep = &type;
+	oi.sizep = &len;
+	oi.contentp = &buf;
+	if (oid_object_info_extended(the_repository, oid, &oi, 0))
 		return error(_("cannot read object for %s"), oid_to_hex(oid));
 	hdrlen = format_object_header(hdr, sizeof(hdr), type, len);
 	ret = write_loose_object(oid, hdr, hdrlen, buf, len, mtime, 0);
diff --git a/object-store.h b/object-store.h
index 98c1d67946..f0aa03bbb9 100644
--- a/object-store.h
+++ b/object-store.h
@@ -358,7 +358,7 @@ void assert_oid_type(const struct object_id *oid, enum object_type expect);
 /*
  * Enabling the object read lock allows multiple threads to safely call the
  * following functions in parallel: repo_read_object_file(), read_object_file(),
- * read_object_file_extended(), read_object_with_reference(), read_object(),
+ * read_object_file_extended(), read_object_with_reference(),
  * oid_object_info() and oid_object_info_extended().
  *
  * obj_read_lock() and obj_read_unlock() may also be used to protect other
-- 
2.39.0.469.g9000b9c396



* [PATCH 2/5] streaming: inline call to read_object_file_extended()
  2023-01-07 13:48 [PATCH 0/5] cleaning up read_object() family of functions Jeff King
  2023-01-07 13:48 ` [PATCH 1/5] object-file: inline calls to read_object() Jeff King
@ 2023-01-07 13:49 ` Jeff King
  2023-01-07 13:50 ` [PATCH 3/5] read_object_file_extended(): drop lookup_replace option Jeff King
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2023-01-07 13:49 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

The open_istream_incore() function is the only direct user of
read_object_file_extended(), and the only caller which unsets the
lookup_replace flag. Since read_object_file_extended() is now just a
thin wrapper around oid_object_info_extended(), let's inline the call.
That will let us simplify read_object_file_extended() in the next patch.

The inlined version here is a few more lines because of the query setup,
but it's much more flexible, since we can pass (or omit) any flags we
want.

Note the updated comment in the istream struct definition. It was
already slightly wrong (we never called read_object(); it has been
read_object_file_extended() since day one), but should now be accurate.

Signed-off-by: Jeff King <peff@peff.net>
---
 streaming.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/streaming.c b/streaming.c
index 7b2f8b2b93..27841dc1d9 100644
--- a/streaming.c
+++ b/streaming.c
@@ -38,7 +38,7 @@ struct git_istream {
 
 	union {
 		struct {
-			char *buf; /* from read_object() */
+			char *buf; /* from oid_object_info_extended() */
 			unsigned long read_ptr;
 		} incore;
 
@@ -388,12 +388,17 @@ static ssize_t read_istream_incore(struct git_istream *st, char *buf, size_t sz)
 static int open_istream_incore(struct git_istream *st, struct repository *r,
 			       const struct object_id *oid, enum object_type *type)
 {
-	st->u.incore.buf = read_object_file_extended(r, oid, type, &st->size, 0);
+	struct object_info oi = OBJECT_INFO_INIT;
+
 	st->u.incore.read_ptr = 0;
 	st->close = close_istream_incore;
 	st->read = read_istream_incore;
 
-	return st->u.incore.buf ? 0 : -1;
+	oi.typep = type;
+	oi.sizep = &st->size;
+	oi.contentp = (void **)&st->u.incore.buf;
+	return oid_object_info_extended(r, oid, &oi,
+					OBJECT_INFO_DIE_IF_CORRUPT);
 }
 
 /*****************************************************************************
-- 
2.39.0.469.g9000b9c396



* [PATCH 3/5] read_object_file_extended(): drop lookup_replace option
  2023-01-07 13:48 [PATCH 0/5] cleaning up read_object() family of functions Jeff King
  2023-01-07 13:48 ` [PATCH 1/5] object-file: inline calls to read_object() Jeff King
  2023-01-07 13:49 ` [PATCH 2/5] streaming: inline call to read_object_file_extended() Jeff King
@ 2023-01-07 13:50 ` Jeff King
  2023-01-07 13:50 ` [PATCH 4/5] repo_read_object_file(): stop wrapping read_object_file_extended() Jeff King
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2023-01-07 13:50 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Our sole caller always passes in "1", so we can just drop the parameter
entirely. Anybody who doesn't want this behavior could easily call
oid_object_info_extended() themselves, as we're just a thin wrapper
around it.

Signed-off-by: Jeff King <peff@peff.net>
---
 object-file.c  | 7 ++-----
 object-store.h | 4 ++--
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/object-file.c b/object-file.c
index ed1babbac2..f472f2d6a0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1698,18 +1698,15 @@ int pretend_object_file(void *buf, unsigned long len, enum object_type type,
 void *read_object_file_extended(struct repository *r,
 				const struct object_id *oid,
 				enum object_type *type,
-				unsigned long *size,
-				int lookup_replace)
+				unsigned long *size)
 {
 	struct object_info oi = OBJECT_INFO_INIT;
-	unsigned flags = OBJECT_INFO_DIE_IF_CORRUPT;
+	unsigned flags = OBJECT_INFO_DIE_IF_CORRUPT | OBJECT_INFO_LOOKUP_REPLACE;
 	void *data;
 
 	oi.typep = type;
 	oi.sizep = size;
 	oi.contentp = &data;
-	if (lookup_replace)
-		flags |= OBJECT_INFO_LOOKUP_REPLACE;
 	if (oid_object_info_extended(r, oid, &oi, flags))
 	    return NULL;
 
diff --git a/object-store.h b/object-store.h
index f0aa03bbb9..6ccacc947b 100644
--- a/object-store.h
+++ b/object-store.h
@@ -244,13 +244,13 @@ void *map_loose_object(struct repository *r, const struct object_id *oid,
 void *read_object_file_extended(struct repository *r,
 				const struct object_id *oid,
 				enum object_type *type,
-				unsigned long *size, int lookup_replace);
+				unsigned long *size);
 static inline void *repo_read_object_file(struct repository *r,
 					  const struct object_id *oid,
 					  enum object_type *type,
 					  unsigned long *size)
 {
-	return read_object_file_extended(r, oid, type, size, 1);
+	return read_object_file_extended(r, oid, type, size);
 }
 #ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
 #define read_object_file(oid, type, size) repo_read_object_file(the_repository, oid, type, size)
-- 
2.39.0.469.g9000b9c396



* [PATCH 4/5] repo_read_object_file(): stop wrapping read_object_file_extended()
  2023-01-07 13:48 [PATCH 0/5] cleaning up read_object() family of functions Jeff King
                   ` (2 preceding siblings ...)
  2023-01-07 13:50 ` [PATCH 3/5] read_object_file_extended(): drop lookup_replace option Jeff King
@ 2023-01-07 13:50 ` Jeff King
  2023-01-07 13:50 ` [PATCH 5/5] packfile: inline custom read_object() Jeff King
  2023-01-09 15:09 ` [PATCH 0/5] cleaning up read_object() family of functions Derrick Stolee
  5 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2023-01-07 13:50 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

The only caller of read_object_file_extended() is the thin wrapper of
repo_read_object_file(). Instead of wrapping, let's just rename the
inner function and let people call it directly. This cleans up the
namespace and reduces confusion.

Signed-off-by: Jeff King <peff@peff.net>
---
 object-file.c  |  8 ++++----
 object-store.h | 18 +++++-------------
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/object-file.c b/object-file.c
index f472f2d6a0..80b08fc389 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1695,10 +1695,10 @@ int pretend_object_file(void *buf, unsigned long len, enum object_type type,
  * deal with them should arrange to call oid_object_info_extended() and give
  * error messages themselves.
  */
-void *read_object_file_extended(struct repository *r,
-				const struct object_id *oid,
-				enum object_type *type,
-				unsigned long *size)
+void *repo_read_object_file(struct repository *r,
+			    const struct object_id *oid,
+			    enum object_type *type,
+			    unsigned long *size)
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	unsigned flags = OBJECT_INFO_DIE_IF_CORRUPT | OBJECT_INFO_LOOKUP_REPLACE;
diff --git a/object-store.h b/object-store.h
index 6ccacc947b..1a713d89d7 100644
--- a/object-store.h
+++ b/object-store.h
@@ -241,17 +241,10 @@ const char *loose_object_path(struct repository *r, struct strbuf *buf,
 void *map_loose_object(struct repository *r, const struct object_id *oid,
 		       unsigned long *size);
 
-void *read_object_file_extended(struct repository *r,
-				const struct object_id *oid,
-				enum object_type *type,
-				unsigned long *size);
-static inline void *repo_read_object_file(struct repository *r,
-					  const struct object_id *oid,
-					  enum object_type *type,
-					  unsigned long *size)
-{
-	return read_object_file_extended(r, oid, type, size);
-}
+void *repo_read_object_file(struct repository *r,
+			    const struct object_id *oid,
+			    enum object_type *type,
+			    unsigned long *size);
 #ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
 #define read_object_file(oid, type, size) repo_read_object_file(the_repository, oid, type, size)
 #endif
@@ -358,8 +351,7 @@ void assert_oid_type(const struct object_id *oid, enum object_type expect);
 /*
  * Enabling the object read lock allows multiple threads to safely call the
  * following functions in parallel: repo_read_object_file(), read_object_file(),
- * read_object_file_extended(), read_object_with_reference(),
- * oid_object_info() and oid_object_info_extended().
+ * read_object_with_reference(), oid_object_info() and oid_object_info_extended().
  *
  * obj_read_lock() and obj_read_unlock() may also be used to protect other
  * section which cannot execute in parallel with object reading. Since the used
-- 
2.39.0.469.g9000b9c396



* [PATCH 5/5] packfile: inline custom read_object()
  2023-01-07 13:48 [PATCH 0/5] cleaning up read_object() family of functions Jeff King
                   ` (3 preceding siblings ...)
  2023-01-07 13:50 ` [PATCH 4/5] repo_read_object_file(): stop wrapping read_object_file_extended() Jeff King
@ 2023-01-07 13:50 ` Jeff King
  2023-01-12  9:01   ` Ævar Arnfjörð Bjarmason
  2023-01-09 15:09 ` [PATCH 0/5] cleaning up read_object() family of functions Derrick Stolee
  5 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2023-01-07 13:50 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When the pack code was split into its own file[1], it got a copy of the
static read_object() function. But there's only one caller here, so we
could just inline it. And it's worth doing so, as the name read_object()
invites comparisons to the public read_object_file(), but the two don't
behave quite the same.

[1] The move happened over several commits, but the relevant one here is
    f1d8130be0 (pack: move clear_delta_base_cache(), packed_object_info(),
    unpack_entry(), 2017-08-18).

Signed-off-by: Jeff King <peff@peff.net>
---
 packfile.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/packfile.c b/packfile.c
index c0d7dd93f4..79e21ab18e 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1650,22 +1650,6 @@ struct unpack_entry_stack_ent {
 	unsigned long size;
 };
 
-static void *read_object(struct repository *r,
-			 const struct object_id *oid,
-			 enum object_type *type,
-			 unsigned long *size)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-	void *content;
-	oi.typep = type;
-	oi.sizep = size;
-	oi.contentp = &content;
-
-	if (oid_object_info_extended(r, oid, &oi, 0) < 0)
-		return NULL;
-	return content;
-}
-
 void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 		   enum object_type *final_type, unsigned long *final_size)
 {
@@ -1798,14 +1782,22 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			uint32_t pos;
 			struct object_id base_oid;
 			if (!(offset_to_pack_pos(p, obj_offset, &pos))) {
+				struct object_info oi = OBJECT_INFO_INIT;
+
 				nth_packed_object_id(&base_oid, p,
 						     pack_pos_to_index(p, pos));
 				error("failed to read delta base object %s"
 				      " at offset %"PRIuMAX" from %s",
 				      oid_to_hex(&base_oid), (uintmax_t)obj_offset,
 				      p->pack_name);
 				mark_bad_packed_object(p, &base_oid);
-				base = read_object(r, &base_oid, &type, &base_size);
+
+				oi.typep = &type;
+				oi.sizep = &base_size;
+				oi.contentp = &base;
+				if (oid_object_info_extended(r, &base_oid, &oi, 0) < 0)
+					base = NULL;
+
 				external_base = base;
 			}
 		}
-- 
2.39.0.469.g9000b9c396


* Re: [PATCH 0/5] cleaning up read_object() family of functions
  2023-01-07 13:48 [PATCH 0/5] cleaning up read_object() family of functions Jeff King
                   ` (4 preceding siblings ...)
  2023-01-07 13:50 ` [PATCH 5/5] packfile: inline custom read_object() Jeff King
@ 2023-01-09 15:09 ` Derrick Stolee
  2023-01-11 18:26   ` Jeff King
  5 siblings, 1 reply; 20+ messages in thread
From: Derrick Stolee @ 2023-01-09 15:09 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Jonathan Tan

On 1/7/2023 8:48 AM, Jeff King wrote:
> I often get confused about the difference between:
> 
>   - read_object()
>   - read_object_file();
>   - read_object_file_extended();
>   - repo_read_object_file();
> 
> Since Jonathan's recent cleanups from 9e59b38c88 (object-file: emit
> corruption errors when detected, 2022-12-14), these are mostly thin
> wrappers around each other and around oid_object_info_extended().
> 
> This series shuffles things around a little more so that we are down to
> just read_object_file() and repo_read_object_file(). And the
> relationship there is pretty easy (and long-term we'd eventually merge
> them once everyone has a repository object).

I read the patches carefully and the translations look correct and
definitely help with this confusing mess of method names.

> It is a net reduction in lines, even though some of the callers end up a
> little longer (because they have to stuff pointers into an object_info
> struct). If that's too distasteful, the middle ground is to have a
> helper like:
> 
>   void *foo(struct repository *r, const struct object_id *oid,
>             enum object_type *type, unsigned long *size,
> 	    unsigned flags)
>   {
> 	struct object_info oi = OBJECT_INFO_INIT;
> 	void *content;
> 
> 	oi.typep = type;
> 	oi.sizep = size;
> 	oi.contentp = ret;
> 
> 	if (oid_object_info_extended(r, oid, &oi, flags) < 0)
> 		return NULL;
> 	return content;
>   }
> 
> which is basically the same as read_object(), but makes it clear that
> you can pass OBJECT_INFO flags. The trouble is that I could not come up
> with a name for it that was not confusing. ;) So just having most places
> call oid_object_info_extended() directly seemed better. It would be nice
> if that function had a shorter name, too, but I left that for another
> day.

I did think that requiring callers to create their own object_info
structs (which takes at least four lines) would be too much, but
the number of new callers is so low that I think this is a fine place
to stop.

Thanks,
-Stolee


* Re: [PATCH 0/5] cleaning up read_object() family of functions
  2023-01-09 15:09 ` [PATCH 0/5] cleaning up read_object() family of functions Derrick Stolee
@ 2023-01-11 18:26   ` Jeff King
  2023-01-11 20:17     ` Derrick Stolee
  2023-01-12  9:21     ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 20+ messages in thread
From: Jeff King @ 2023-01-11 18:26 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Jonathan Tan

On Mon, Jan 09, 2023 at 10:09:32AM -0500, Derrick Stolee wrote:

> I did think that requiring callers to create their own object_info
> structs (which takes at least four lines) would be too much, but
> the number of new callers is so low that I think this is a fine place
> to stop.

Yeah, that was my feeling. I do wonder if there's a way to make it
easier for callers of oid_object_info_extended(), but I couldn't come up
with anything that's nice enough to merit the complexity.

For example, here's an attempt to let the caller use designated
initializers to set up the query struct:

diff --git a/object-file.c b/object-file.c
index 80b08fc389..60ca75d755 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1700,13 +1700,12 @@ void *repo_read_object_file(struct repository *r,
 			    enum object_type *type,
 			    unsigned long *size)
 {
-	struct object_info oi = OBJECT_INFO_INIT;
 	unsigned flags = OBJECT_INFO_DIE_IF_CORRUPT | OBJECT_INFO_LOOKUP_REPLACE;
 	void *data;
+	struct object_info oi = OBJECT_INFO(.typep = type,
+					    .sizep = size,
+					    .contentp = &data);
 
-	oi.typep = type;
-	oi.sizep = size;
-	oi.contentp = &data;
 	if (oid_object_info_extended(r, oid, &oi, flags))
 	    return NULL;
 
diff --git a/object-store.h b/object-store.h
index 1a713d89d7..e894cee61b 100644
--- a/object-store.h
+++ b/object-store.h
@@ -418,7 +418,8 @@ struct object_info {
  * Initializer for a "struct object_info" that wants no items. You may
  * also memset() the memory to all-zeroes.
  */
-#define OBJECT_INFO_INIT { 0 }
+#define OBJECT_INFO(...) { 0, __VA_ARGS__ }
+#define OBJECT_INFO_INIT OBJECT_INFO()
 
 /* Invoke lookup_replace_object() on the given hash */
 #define OBJECT_INFO_LOOKUP_REPLACE 1

But:

  - it actually triggers a gcc warning, since OBJECT_INFO(.typep = foo)
    sets typep twice (once for the default "0", and once by name). In
    this case the "0" is superfluous, since that's the default, and we
    could just do:

      #define OBJECT_INFO(...) { __VA_ARGS__ }
      #define OBJECT_INFO_INIT OBJECT_INFO(0)

    but I was hoping to find a general technique for object
    initializers.

  - it's not really that much shorter than the existing code. The real
    benefit of "data = read_object(oid, type, size)" is the implicit
    number and names of the parameters. And the way to get that is to
    provide an extra function.

So I think we are better off with the code that is longer but totally
obvious, unless we really want to add a function wrapper for common
queries as syntactic sugar.
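
To make the first point concrete, here is a small standalone sketch of
the variant without the leading "0". The "struct query" type and the
QUERY() / QUERY_INIT names are hypothetical stand-ins for object_info
and its macros, so that this compiles on its own. As far as I know, the
warning gcc gives for the "{ 0, __VA_ARGS__ }" form is -Woverride-init,
which is part of -Wextra.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct object_info. */
struct query {
	int *typep;
	unsigned long *sizep;
	void **contentp;
};

/*
 * With "{ 0, __VA_ARGS__ }", QUERY(.typep = x) would initialize the
 * first member twice: once positionally with 0, once by name. Dropping
 * the leading 0 avoids that; members not named are still zeroed.
 */
#define QUERY(...) { __VA_ARGS__ }
#define QUERY_INIT QUERY(0)

/* Returns 1 if the query asks for nothing at all. */
static int query_wants_nothing(const struct query *q)
{
	assert(q != NULL);
	return !q->typep && !q->sizep && !q->contentp;
}
```

A call site would then read "struct query q = QUERY(.typep = &type,
.contentp = &data);", with sizep left as NULL.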

-Peff


* Re: [PATCH 0/5] cleaning up read_object() family of functions
  2023-01-11 18:26   ` Jeff King
@ 2023-01-11 20:17     ` Derrick Stolee
  2023-01-11 20:30       ` Jeff King
  2023-01-12  9:21     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 20+ messages in thread
From: Derrick Stolee @ 2023-01-11 20:17 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jonathan Tan

On 1/11/2023 1:26 PM, Jeff King wrote:
> On Mon, Jan 09, 2023 at 10:09:32AM -0500, Derrick Stolee wrote:
> 
>> I did think that requiring callers to create their own object_info
>> structs (which takes at least four lines) would be too much, but
>> the number of new callers is so low that I think this is a fine place
>> to stop.
> 
> Yeah, that was my feeling. I do wonder if there's a way to make it
> easier for callers of oid_object_info_extended(), but I couldn't come up
> with anything that's nice enough to merit the complexity.
> 
> For example, here's an attempt to let the caller use designated
> initializers to set up the query struct:

> +	struct object_info oi = OBJECT_INFO(.typep = type,
> +					    .sizep = size,
> +					    .contentp = &data);

Your macro expansion creates this format:

	struct object_info oi = {
		.typep = type,
		.sizep = size,
		.contentp = &data,
	};

And even this expansion looks a bit better than the inline
updates:

> -	oi.typep = type;
> -	oi.sizep = size;
> -	oi.contentp = &data;

So maybe that's a preferred pattern that we could establish
by replacing the existing callers. It's also such a minor
point that I wouldn't say it's a high priority to do.

Thanks,
-Stolee


* Re: [PATCH 0/5] cleaning up read_object() family of functions
  2023-01-11 20:17     ` Derrick Stolee
@ 2023-01-11 20:30       ` Jeff King
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2023-01-11 20:30 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Jonathan Tan

On Wed, Jan 11, 2023 at 03:17:58PM -0500, Derrick Stolee wrote:

> > For example, here's an attempt to let the caller use designated
> > initializers to set up the query struct:
> 
> > +	struct object_info oi = OBJECT_INFO(.typep = type,
> > +					    .sizep = size,
> > +					    .contentp = &data);
> 
> Your macro expansion creates this format:
> 
> 	struct object_info oi = {
> 		.type = type,
> 		.sizep = size,
> 		.contentp = &data,
> 	};
> 
> And even this expansion looks a bit better than the inline
> updates:

There's a subtle assumption in the expanded initializer, though, which
is that everything not specified is OK to be zero-initialized. That
works for object_info, but not for arbitrary structs (which is why we
have these INIT macros in the first place).

-Peff


* Re: [PATCH 5/5] packfile: inline custom read_object()
  2023-01-07 13:50 ` [PATCH 5/5] packfile: inline custom read_object() Jeff King
@ 2023-01-12  9:01   ` Ævar Arnfjörð Bjarmason
  2023-01-12 16:29     ` Jeff King
  0 siblings, 1 reply; 20+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-01-12  9:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jonathan Tan


On Sat, Jan 07 2023, Jeff King wrote:

> When the pack code was split into its own file[1], it got a copy of the
> static read_object() function. But there's only one caller here, so we
> could just inline it. And it's worth doing so, as the name read_object()
> invites comparisons to the public read_object_file(), but the two don't
> behave quite the same.
>
> [1] The move happened over several commits, but the relevant one here is
>     f1d8130be0 (pack: move clear_delta_base_cache(), packed_object_info(),
>     unpack_entry(), 2017-08-18).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  packfile.c | 26 +++++++++-----------------
>  1 file changed, 9 insertions(+), 17 deletions(-)
>
> diff --git a/packfile.c b/packfile.c
> index c0d7dd93f4..79e21ab18e 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1650,22 +1650,6 @@ struct unpack_entry_stack_ent {
>  	unsigned long size;
>  };
>  
> -static void *read_object(struct repository *r,
> -			 const struct object_id *oid,
> -			 enum object_type *type,
> -			 unsigned long *size)
> -{
> -	struct object_info oi = OBJECT_INFO_INIT;
> -	void *content;
> -	oi.typep = type;
> -	oi.sizep = size;
> -	oi.contentp = &content;
> -
> -	if (oid_object_info_extended(r, oid, &oi, 0) < 0)
> -		return NULL;
> -	return content;
> -}
> -
>  void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>  		   enum object_type *final_type, unsigned long *final_size)
>  {
> @@ -1798,14 +1782,22 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>  			uint32_t pos;
>  			struct object_id base_oid;
>  			if (!(offset_to_pack_pos(p, obj_offset, &pos))) {
> +				struct object_info oi = OBJECT_INFO_INIT;
> +
>  				nth_packed_object_id(&base_oid, p,
>  						     pack_pos_to_index(p, pos));
>  				error("failed to read delta base object %s"
>  				      " at offset %"PRIuMAX" from %s",
>  				      oid_to_hex(&base_oid), (uintmax_t)obj_offset,
>  				      p->pack_name);
>  				mark_bad_packed_object(p, &base_oid);
> -				base = read_object(r, &base_oid, &type, &base_size);
> +
> +				oi.typep = &type;
> +				oi.sizep = &base_size;
> +				oi.contentp = &base;
> +				if (oid_object_info_extended(r, &base_oid, &oi, 0) < 0)
> +					base = NULL;
> +
>  				external_base = base;
>  			}
>  		}

This isn't introducing a behavior difference, in fact it's diligently
bending over backwards to preserve existing behavior, but I don't think
we need to do so, and shouldn't have this "base = NULL" line.

Here we're within an "if" block where we tested that "base == NULL"
(which is why we're trying to populate it)

Before, when we had read_object(), re-assigning to "base" here was the
obvious thing to do, but now this seems like undue and incomplete
paranoia.

If oid_object_info_extended() fails, why can't we trust that it didn't touch
our "base"? And if we can't trust that, why are we trusting that it left
"type" and "base_size" untouched?

I think squashing this in would be much better:
	
	diff --git a/packfile.c b/packfile.c
	index 79e21ab18e7..f45017422a1 100644
	--- a/packfile.c
	+++ b/packfile.c
	@@ -1795,10 +1795,8 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
	 				oi.typep = &type;
	 				oi.sizep = &base_size;
	 				oi.contentp = &base;
	-				if (oid_object_info_extended(r, &base_oid, &oi, 0) < 0)
	-					base = NULL;
	-
	-				external_base = base;
	+				if (!oid_object_info_extended(r, &base_oid, &oi, 0))
	+					external_base = base;
	 			}
	 		}

Not only aren't we second-guessing that our "base" was left alone, we're
using the return value of oid_object_info_extended() to guard that
assignment to "external_base" instead (it's NULL at this point too).
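
As a standalone sketch of the two styles (nothing here is git code:
mock_lookup() is a hypothetical stand-in which, by assumption, leaves
its output pointer untouched on failure, and both fetch_*() functions
are made-up names):

```c
#include <assert.h>
#include <stddef.h>

static char payload[] = "base";

/*
 * Mock lookup. ASSUMPTION: on failure it returns a negative value and
 * does not write to *out. The real function may or may not promise that.
 */
static int mock_lookup(int should_fail, void **out)
{
	assert(out != NULL);
	if (should_fail)
		return -1;
	*out = payload;
	return 0;
}

/* Style in the patch: reset "base" on failure, then assign unconditionally. */
static void *fetch_reset_on_failure(int should_fail)
{
	void *base = NULL;
	void *external_base = NULL;

	if (mock_lookup(should_fail, &base) < 0)
		base = NULL;
	external_base = base;
	return external_base;
}

/* Suggested style: let the return value guard the assignment. */
static void *fetch_guarded(int should_fail)
{
	void *base = NULL;
	void *external_base = NULL;

	if (!mock_lookup(should_fail, &base))
		external_base = base;
	return external_base;
}
```

Under that assumption both versions give the same result on success and
on failure; the guarded one just has one fewer assignment to reason
about.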






* Re: [PATCH 1/5] object-file: inline calls to read_object()
  2023-01-07 13:48 ` [PATCH 1/5] object-file: inline calls to read_object() Jeff King
@ 2023-01-12  9:13   ` Ævar Arnfjörð Bjarmason
  2023-01-12 16:06     ` [PATCH] object-file: fix indent-with-space Jeff King
  0 siblings, 1 reply; 20+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-01-12  9:13 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jonathan Tan


On Sat, Jan 07 2023, Jeff King wrote:

> +	oi.typep = type;
> +	oi.sizep = size;
> +	oi.contentp = &data;
> +	if (lookup_replace)
> +		flags |= OBJECT_INFO_LOOKUP_REPLACE;
> +	if (oid_object_info_extended(r, oid, &oi, flags))
> +	    return NULL;

Style: This is "\t    ", but should be "\t\t".

* Re: [PATCH 0/5] cleaning up read_object() family of functions
  2023-01-11 18:26   ` Jeff King
  2023-01-11 20:17     ` Derrick Stolee
@ 2023-01-12  9:21     ` Ævar Arnfjörð Bjarmason
  2023-01-12 16:16       ` Jeff King
  1 sibling, 1 reply; 20+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-01-12  9:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, git, Jonathan Tan, René Scharfe


On Wed, Jan 11 2023, Jeff King wrote:

> On Mon, Jan 09, 2023 at 10:09:32AM -0500, Derrick Stolee wrote:
>
>> I did think that requiring callers to create their own object_info
>> structs (which takes at least four lines) would be too much, but
>> the number of new callers is so low that I think this is a fine place
>> to stop.
>
> Yeah, that was my feeling. I do wonder if there's a way to make it
> easier for callers of oid_object_info_extended(), but I couldn't come up
> with anything that's nice enough to merit the complexity.
>
> For example, here's an attempt to let the caller use designated
> initializers to set up the query struct:
>
> diff --git a/object-file.c b/object-file.c
> index 80b08fc389..60ca75d755 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1700,13 +1700,12 @@ void *repo_read_object_file(struct repository *r,
>  			    enum object_type *type,
>  			    unsigned long *size)
>  {
> -	struct object_info oi = OBJECT_INFO_INIT;
>  	unsigned flags = OBJECT_INFO_DIE_IF_CORRUPT | OBJECT_INFO_LOOKUP_REPLACE;
>  	void *data;
> +	struct object_info oi = OBJECT_INFO(.typep = type,
> +					    .sizep = size,
> +					    .contentp = &data);
>  
> -	oi.typep = type;
> -	oi.sizep = size;
> -	oi.contentp = &data;
>  	if (oid_object_info_extended(r, oid, &oi, flags))
>  	    return NULL;
>  
> diff --git a/object-store.h b/object-store.h
> index 1a713d89d7..e894cee61b 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -418,7 +418,8 @@ struct object_info {
>   * Initializer for a "struct object_info" that wants no items. You may
>   * also memset() the memory to all-zeroes.
>   */
> -#define OBJECT_INFO_INIT { 0 }
> +#define OBJECT_INFO(...) { 0, __VA_ARGS__ }
> +#define OBJECT_INFO_INIT OBJECT_INFO()
>  
>  /* Invoke lookup_replace_object() on the given hash */
>  #define OBJECT_INFO_LOOKUP_REPLACE 1
>
> But:
>
>   - it actually triggers a gcc warning, since OBJECT_INFO(.typep = foo)
>     sets typep twice (once for the default "0", and once by name). In
>     this case the "0" is superfluous, since that's the default, and we
>     could just do:
>
>       #define OBJECT_INFO(...) { __VA_ARGS__ }
>       #define OBJECT_INFO_INIT OBJECT_INFO(0)
>
>     but I was hoping to find a general technique for object
>     initializers.
>
>   - it's not really that much shorter than the existing code. The real
>     benefit of "data = read_object(oid, type, size)" is the implicit
>     number and names of the parameters. And the way to get that is to
>     provide an extra function.
>
> So I think we are better off with the code that is longer but totally
> obvious, unless we really want to add a function wrapper for common
> queries as syntactic sugar.
>
> -Peff

I agree that it's probably not worth it here, but I think you're just
tying yourself in knots in trying to define these macros in terms of
each other. This sort of thing will work if you just do:
	
	diff --git a/object-store.h b/object-store.h
	index e894cee61ba..bfcd2482dc5 100644
	--- a/object-store.h
	+++ b/object-store.h
	@@ -418,8 +418,8 @@ struct object_info {
	  * Initializer for a "struct object_info" that wants no items. You may
	  * also memset() the memory to all-zeroes.
	  */
	-#define OBJECT_INFO(...) { 0, __VA_ARGS__ }
	-#define OBJECT_INFO_INIT OBJECT_INFO()
	+#define OBJECT_INFO_INIT { 0 }
	+#define OBJECT_INFO(...) { __VA_ARGS__ }
	 
	 /* Invoke lookup_replace_object() on the given hash */
	 #define OBJECT_INFO_LOOKUP_REPLACE 1

Which is just a twist on René's suggestion from [1], i.e.:

	#define CHILD_PROCESS_INIT_EX(...) { .args = STRVEC_INIT, __VA_ARGS__ }

In that case we always need to rely on the "args" being init'd, and the
GCC warning you note is a feature, its initialization is "private", and
you should never override it.

But likewise you don't need the "0" there, if the user provides an empty
list that's their own fault, they should use OBJECT_INFO_INIT
instead.

If they do provide arguments it's an implementation detail how any
"default" arguments get init'd, if they're not clobbering any "private"
arguments we're OK.

So using an explicit "0" is the same as providing nothing in the
"*_ARGS()" case, in both cases we're just offloading that zero-init to
the language.

The only way I think you can dig yourself into a proper hole here is if
you're trying to support 0 or N args, as P99 shows that's possible, but
quite complex (and not worth it, IMO).
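
As a rough standalone sketch of the two-macro arrangement above (using a
made-up "struct demo_info" rather than the real "struct object_info",
whose layout I'm not reproducing here), the members that the caller
doesn't name are zero-initialized by the language:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for "struct object_info"; not git's real layout. */
struct demo_info {
	int *typep;
	unsigned long *sizep;
	void **contentp;
};

/* "Wants nothing": every member is zero-initialized. */
#define DEMO_INFO_INIT { 0 }
/* "Wants these": named members are set, the rest are implicitly zeroed. */
#define DEMO_INFO(...) { __VA_ARGS__ }

/* Returns 1 if the query asks for the size and for nothing else. */
static int wants_only_size(void)
{
	unsigned long size = 0;
	struct demo_info none = DEMO_INFO_INIT;
	struct demo_info oi = DEMO_INFO(.sizep = &size);

	assert(!none.typep && !none.sizep && !none.contentp);
	return oi.sizep == &size && !oi.typep && !oi.contentp;
}
```

Note that DEMO_INFO() with no arguments is deliberately not supported in
this sketch; that is what DEMO_INFO_INIT is for.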

1. https://lore.kernel.org/git/749f6adc-928a-0978-e3a1-2ede9f07def0@web.de/

* [PATCH] object-file: fix indent-with-space
  2023-01-12  9:13   ` Ævar Arnfjörð Bjarmason
@ 2023-01-12 16:06     ` Jeff King
  2023-01-12 16:08       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2023-01-12 16:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jonathan Tan, Junio C Hamano

On Thu, Jan 12, 2023 at 10:13:20AM +0100, Ævar Arnfjörð Bjarmason wrote:

> 
> On Sat, Jan 07 2023, Jeff King wrote:
> 
> > +	oi.typep = type;
> > +	oi.sizep = size;
> > +	oi.contentp = &data;
> > +	if (lookup_replace)
> > +		flags |= OBJECT_INFO_LOOKUP_REPLACE;
> > +	if (oid_object_info_extended(r, oid, &oi, flags))
> > +	    return NULL;
> 
> Style: This is "\t    ", but should be "\t\t".

Hmph, I'm not sure how I managed that. Thanks for pointing it out. The
commit is in 'next', so I think we'd want this on top (of
jk/read-object-cleanup).

-- >8 --
Subject: [PATCH] object-file: fix indent-with-space

Commit b25562e63f (object-file: inline calls to read_object(),
2023-01-07) accidentally indented a conditional block with spaces
instead of a tab.

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 80b08fc389..ce9efae994 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1708,7 +1708,7 @@ void *repo_read_object_file(struct repository *r,
 	oi.sizep = size;
 	oi.contentp = &data;
 	if (oid_object_info_extended(r, oid, &oi, flags))
-	    return NULL;
+		return NULL;
 
 	return data;
 }
-- 
2.39.0.508.g93b13bde48

* Re: [PATCH] object-file: fix indent-with-space
  2023-01-12 16:06     ` [PATCH] object-file: fix indent-with-space Jeff King
@ 2023-01-12 16:08       ` Ævar Arnfjörð Bjarmason
  2023-01-13 17:40         ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-01-12 16:08 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jonathan Tan, Junio C Hamano


On Thu, Jan 12 2023, Jeff King wrote:

> On Thu, Jan 12, 2023 at 10:13:20AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> 
>> On Sat, Jan 07 2023, Jeff King wrote:
>> 
>> > +	oi.typep = type;
>> > +	oi.sizep = size;
>> > +	oi.contentp = &data;
>> > +	if (lookup_replace)
>> > +		flags |= OBJECT_INFO_LOOKUP_REPLACE;
>> > +	if (oid_object_info_extended(r, oid, &oi, flags))
>> > +	    return NULL;
>> 
>> Style: This is "\t    ", but should be "\t\t".
>
> Hmph, I'm not sure how I managed that. Thanks for pointing it out. The
> commit is in 'next', so I think we'd want this on top (of
> jk/read-object-cleanup).

> -- >8 --
> Subject: [PATCH] object-file: fix indent-with-space
>
> Commit b25562e63f (object-file: inline calls to read_object(),
> 2023-01-07) accidentally indented a conditional block with spaces
> instead of a tab.
>
> Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  object-file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/object-file.c b/object-file.c
> index 80b08fc389..ce9efae994 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1708,7 +1708,7 @@ void *repo_read_object_file(struct repository *r,
>  	oi.sizep = size;
>  	oi.contentp = &data;
>  	if (oid_object_info_extended(r, oid, &oi, flags))
> -	    return NULL;
> +		return NULL;
>  
>  	return data;
>  }

Thanks, I didn't notice (assuming it was too soon, it being less than a
week) that it was in "next" already. This change LGTM, thanks!

* Re: [PATCH 0/5] cleaning up read_object() family of functions
  2023-01-12  9:21     ` Ævar Arnfjörð Bjarmason
@ 2023-01-12 16:16       ` Jeff King
  2023-01-12 16:22         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2023-01-12 16:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, git, Jonathan Tan, René Scharfe

On Thu, Jan 12, 2023 at 10:21:46AM +0100, Ævar Arnfjörð Bjarmason wrote:

> I agree that it's probably not worth it here, but I think you're just
> tying yourself in knots in trying to define these macros in terms of
> each other. This sort of thing will work if you just do:
> 	
> 	diff --git a/object-store.h b/object-store.h
> 	index e894cee61ba..bfcd2482dc5 100644
> 	--- a/object-store.h
> 	+++ b/object-store.h
> 	@@ -418,8 +418,8 @@ struct object_info {
> 	  * Initializer for a "struct object_info" that wants no items. You may
> 	  * also memset() the memory to all-zeroes.
> 	  */
> 	-#define OBJECT_INFO(...) { 0, __VA_ARGS__ }
> 	-#define OBJECT_INFO_INIT OBJECT_INFO()
> 	+#define OBJECT_INFO_INIT { 0 }
> 	+#define OBJECT_INFO(...) { __VA_ARGS__ }

Right, that works because the initializer is just "0", which the
compiler can do for us implicitly. I agree it works here to omit, but as
a general solution, it doesn't.

> Which is just a twist on René's suggestion from [1], i.e.:
> 
> 	#define CHILD_PROCESS_INIT_EX(...) { .args = STRVEC_INIT, __VA_ARGS__ }
>
> In that case we always need to rely on the "args" being init'd, and the
> GCC warning you note is a feature, its initialization is "private", and
> you should never override it.

Right, and it works here because you'd never want to init .args to
anything else (which I think is what you mean by "private"). But in the
general case the defaults can't set something that the caller might want
to override, because the compiler's warning doesn't know the difference
between "override" and "oops, you specified this twice".

It's mostly a non-issue because we tend to prefer 0-initialization when
possible, but I think as a general technique this is probably opening a
can of worms for little benefit.

-Peff

* Re: [PATCH 0/5] cleaning up read_object() family of functions
  2023-01-12 16:16       ` Jeff King
@ 2023-01-12 16:22         ` Ævar Arnfjörð Bjarmason
  2023-01-12 16:53           ` Jeff King
  0 siblings, 1 reply; 20+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-01-12 16:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, git, Jonathan Tan, René Scharfe


On Thu, Jan 12 2023, Jeff King wrote:

> On Thu, Jan 12, 2023 at 10:21:46AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> I agree that it's probably not worth it here, but I think you're just
>> tying yourself in knots in trying to define these macros in terms of
>> each other. This sort of thing will work if you just do:
>> 	
>> 	diff --git a/object-store.h b/object-store.h
>> 	index e894cee61ba..bfcd2482dc5 100644
>> 	--- a/object-store.h
>> 	+++ b/object-store.h
>> 	@@ -418,8 +418,8 @@ struct object_info {
>> 	  * Initializer for a "struct object_info" that wants no items. You may
>> 	  * also memset() the memory to all-zeroes.
>> 	  */
>> 	-#define OBJECT_INFO(...) { 0, __VA_ARGS__ }
>> 	-#define OBJECT_INFO_INIT OBJECT_INFO()
>> 	+#define OBJECT_INFO_INIT { 0 }
>> 	+#define OBJECT_INFO(...) { __VA_ARGS__ }
>
> Right, that works because the initializer is just "0", which the
> compiler can do for us implicitly. I agree it works here to omit, but as
> a general solution, it doesn't.
>
>> Which is just a twist on René's suggestion from [1], i.e.:
>> 
>> 	#define CHILD_PROCESS_INIT_EX(...) { .args = STRVEC_INIT, __VA_ARGS__ }
>>
>> In that case we always need to rely on the "args" being init'd, and the
>> GCC warning you note is a feature, its initialization is "private", and
>> you should never override it.
>
> Right, and it works here because you'd never want to init .args to
> anything else (which I think is what you mean by "private"). But in the
> general case the defaults can't set something that the caller might want
> to override, because the compiler's warning doesn't know the difference
> between "override" and "oops, you specified this twice".
>
> It's mostly a non-issue because we tend to prefer 0-initialization when
> possible, but I think as a general technique this is probably opening a
> can of worms for little benefit.

You're right in the general case, although I think that if we did
encounter such a use-case a perfectly good solution would be to just
suppress the GCC-specific warning with the relevant GCC-specific macro
magic, this being perfectly valid C, just something it (rightly, as it's
almost always a mistake) complains about.
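
Something like the following is what I have in mind, as a sketch only.
The struct, the DEMO_OPTS() macro and the "retries" default are all
hypothetical, and I'm assuming a compiler that understands GCC's
diagnostic pragmas and the "-Woverride-init" warning name. C99 says a
later initializer for the same member overrides an earlier one, so the
result is well-defined:

```c
#include <assert.h>

/* Hypothetical struct with one non-zero default. */
struct demo_opts {
	int retries;
	int verbose;
};

/* The default comes first; a caller-supplied ".retries" later overrides it. */
#define DEMO_OPTS(...) { .retries = 3, __VA_ARGS__ }

/* No override here, so no warning: the default of 3 is kept. */
static int default_retries(void)
{
	struct demo_opts o = DEMO_OPTS(.verbose = 1);

	assert(o.verbose == 1);
	return o.retries;
}

/* Intentional override: silence the warning for this function only. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Woverride-init"
static int overridden_retries(void)
{
	struct demo_opts o = DEMO_OPTS(.retries = 5);

	return o.retries;
}
#pragma GCC diagnostic pop
```

The obvious downside is that the suppression has to wrap every
intentional override, which is part of why I don't think it's worth it
unless we find a real use-case.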

But I can't think of a case where this would matter for us in practice.

We have members like "struct strbuf"'s "buf", which always needs to be
init'd, but never "maybe by the user", so the pattern above would work
there.

Then we have things like "strdup_strings" which we might imagine that
the user would override (with a hypothetical "struct string_list" that
took more arguments), but in those cases we could just add another init
macro, as "STRING_LIST_INIT_{DUP,NODUP}" does.

For any such member we could always just invert its boolean state, if it
came to that, couldn't we?
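
To make both of those alternatives concrete, here is a small sketch with
made-up types (loosely modelled on the "strdup_strings" idea, not the
real "struct string_list"). The first variant has one init macro per
supported default; the second inverts the flag so that plain
zero-initialization is the default we want:

```c
#include <assert.h>

/* Hypothetical list type, loosely modelled on the "strdup_strings" idea. */
struct demo_list {
	int nr;
	int strdup_strings;
};

/* One named initializer per supported default, so nothing is overridden. */
#define DEMO_LIST_INIT_NODUP { 0 }
#define DEMO_LIST_INIT_DUP { .strdup_strings = 1 }

/* Inverted variant: zero-initialization now means "do duplicate". */
struct demo_list_inverted {
	int nr;
	int no_strdup_strings;
};
#define DEMO_LIST_INVERTED(...) { __VA_ARGS__ }

static int dup_enabled(const struct demo_list *l)
{
	assert(l);
	return l->strdup_strings;
}

static int inverted_dup_enabled(const struct demo_list_inverted *l)
{
	assert(l);
	return !l->no_strdup_strings;
}
```

In neither variant does a default need to be clobbered by the caller, so
the compiler warning stays useful.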

Anyway, I agree that it's not worth pursuing this in this case.

But I think it's a neat pattern that we might find use for sooner rather
than later for something else.

I don't think it's worth the churn to change it at this point (except
maybe with a sufficiently clever coccinelle rule), but I think it's
already "worth it" in the case of the run-command API, if we were adding
that code today under current constraints (i.e. being able to use C99
macro features).

* Re: [PATCH 5/5] packfile: inline custom read_object()
  2023-01-12  9:01   ` Ævar Arnfjörð Bjarmason
@ 2023-01-12 16:29     ` Jeff King
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2023-01-12 16:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jonathan Tan

On Thu, Jan 12, 2023 at 10:01:28AM +0100, Ævar Arnfjörð Bjarmason wrote:

> > -				base = read_object(r, &base_oid, &type, &base_size);
> > +
> > +				oi.typep = &type;
> > +				oi.sizep = &base_size;
> > +				oi.contentp = &base;
> > +				if (oid_object_info_extended(r, &base_oid, &oi, 0) < 0)
> > +					base = NULL;
> > +
> >  				external_base = base;
> >  			}
> >  		}
> 
> This isn't introducing a behavior difference, in fact it's diligently
> bending over backwards to preserve existing behavior, but I don't think
> we need to do so, and shouldn't have this "base = NULL" line.
> 
> Here we're within an "if" block where we tested that "base == NULL"
> (which is why we're trying to populate it)
> 
> Before, when we had read_object(), re-assigning to "base" here was the
> obvious thing to do, but now this seems like undue and incomplete
> paranoia.

I think it's the same paranoia that was in read_object(). There it
catches the error and returns NULL, rather than the probably-NULL
"content" (though to be fair, it simply did not initialize the pointer,
so it would have had to do that to depend on it).

I agree it's probably being overly defensive. But I don't think
oid_object_info_extended() makes any promises, and it's not completely
clear to me if packed_object_info() could return a non-NULL entry here
on an error (e.g., if packed_to_object_type() fails even after we pulled
out the content).

So probably yes, we could depend on that (and if not, arguably we should
be fixing oid_object_info_extended(), because we are probably leaking a
buffer in that case). But we definitely shouldn't be doing it in the
middle of another patch.

> If oid_object_info_extended() fails, why can't we trust that it didn't touch
> our "base"? And if we can't trust that, why are we trusting that it left
> "type" and "base_size" untouched?

My assumption is that "base" gated access to "type" and "base_size". So
as long as "!base", we do not look at the other two.

> I think squashing this in would be much better:
> 	
> 	diff --git a/packfile.c b/packfile.c
> 	index 79e21ab18e7..f45017422a1 100644
> 	--- a/packfile.c
> 	+++ b/packfile.c
> 	@@ -1795,10 +1795,8 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> 	 				oi.typep = &type;
> 	 				oi.sizep = &base_size;
> 	 				oi.contentp = &base;
> 	-				if (oid_object_info_extended(r, &base_oid, &oi, 0) < 0)
> 	-					base = NULL;
> 	-
> 	-				external_base = base;
> 	+				if (!oid_object_info_extended(r, &base_oid, &oi, 0))
> 	+					external_base = base;
> 	 			}
> 	 		}
> 
> Not only aren't we second-guessing that our "base" was left alone, we're
> using the return value of oid_object_info_extended() to guard that
> assignment to "external_base" instead (it's NULL at this point too).

I don't think we need to guard the assignment (we know it will be NULL
if we saw an error). But sure, I don't mind if you want to do that
simplification, but it should be on top if at all.

-Peff

* Re: [PATCH 0/5] cleaning up read_object() family of functions
  2023-01-12 16:22         ` Ævar Arnfjörð Bjarmason
@ 2023-01-12 16:53           ` Jeff King
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2023-01-12 16:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, git, Jonathan Tan, René Scharfe

On Thu, Jan 12, 2023 at 05:22:04PM +0100, Ævar Arnfjörð Bjarmason wrote:

> We have members like "struct strbuf"'s "buf", which always needs to be
> init'd, but never "maybe by the user", so the pattern above would work
> there.

We've discussed in the past having a strbuf that points to an existing
buffer, over which it takes ownership. Or a const string that we'd leave
behind (but not free) if we needed to grow.

In those cases you'd want to pass in a buffer to the allocator. Of
course in the case of a strbuf those initializers would probably just be
totally separate from the regular slopbuf one, just because there's not
much else in a strbuf to initialize. You don't gain much from trying to
avoid repetition.

> Anyway, I agree that it's not worth pursuing this in this case.
> 
> But I think it's a neat pattern that we might find use for sooner than
> later for something else.

I remain unconvinced. ;) Mostly just that the lines saved versus the
amount of magic and thought doesn't seem reasonable. But it's something
we can keep in mind as new opportunities show up.

-Peff

* Re: [PATCH] object-file: fix indent-with-space
  2023-01-12 16:08       ` Ævar Arnfjörð Bjarmason
@ 2023-01-13 17:40         ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2023-01-13 17:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jeff King, git, Jonathan Tan

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> diff --git a/object-file.c b/object-file.c
>> index 80b08fc389..ce9efae994 100644
>> --- a/object-file.c
>> +++ b/object-file.c
>> @@ -1708,7 +1708,7 @@ void *repo_read_object_file(struct repository *r,
>>  	oi.sizep = size;
>>  	oi.contentp = &data;
>>  	if (oid_object_info_extended(r, oid, &oi, flags))
>> -	    return NULL;
>> +		return NULL;
>>  
>>  	return data;
>>  }

Thanks, both, for being extra careful.

> Thanks, I didn't notice (assuming it was too soon, it being less than a
> week) that it was in "next" already. This change LGTM, thanks!

It would be surprising if an ordinary topic went to 'master' without
spending a week in 'next', but I do aim to merge a reasonably
well-done topic down to 'next' from 'seen' in a minimum amount of
time.  Here, minimum usually means 1 wallclock day, just to catch silly
typos, if the patches are reviewed adequately on list by folks (or
possibly by me).

Thanks.

Thread overview: 20+ messages
2023-01-07 13:48 [PATCH 0/5] cleaning up read_object() family of functions Jeff King
2023-01-07 13:48 ` [PATCH 1/5] object-file: inline calls to read_object() Jeff King
2023-01-12  9:13   ` Ævar Arnfjörð Bjarmason
2023-01-12 16:06     ` [PATCH] object-file: fix indent-with-space Jeff King
2023-01-12 16:08       ` Ævar Arnfjörð Bjarmason
2023-01-13 17:40         ` Junio C Hamano
2023-01-07 13:49 ` [PATCH 2/5] streaming: inline call to read_object_file_extended() Jeff King
2023-01-07 13:50 ` [PATCH 3/5] read_object_file_extended(): drop lookup_replace option Jeff King
2023-01-07 13:50 ` [PATCH 4/5] repo_read_object_file(): stop wrapping read_object_file_extended() Jeff King
2023-01-07 13:50 ` [PATCH 5/5] packfile: inline custom read_object() Jeff King
2023-01-12  9:01   ` Ævar Arnfjörð Bjarmason
2023-01-12 16:29     ` Jeff King
2023-01-09 15:09 ` [PATCH 0/5] cleaning up read_object() family of functions Derrick Stolee
2023-01-11 18:26   ` Jeff King
2023-01-11 20:17     ` Derrick Stolee
2023-01-11 20:30       ` Jeff King
2023-01-12  9:21     ` Ævar Arnfjörð Bjarmason
2023-01-12 16:16       ` Jeff King
2023-01-12 16:22         ` Ævar Arnfjörð Bjarmason
2023-01-12 16:53           ` Jeff King
