[PATCH 0/5] [RFC] config API: return empty list, not NULL

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 0/5] [RFC] config API: return empty list, not NULL
@ 2022-09-27 14:08 Derrick Stolee via GitGitGadget
  2022-09-27 14:08 ` [PATCH 1/5] config: relax requirements on multi-value return Derrick Stolee via GitGitGadget
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-27 14:08 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

This work changes the behavior of asking for a multi-valued config key to
return an empty list instead of a NULL value. This simplifies the handling
of the result and is safer for development in the future.

This is based on v4 of my unregister series [1]

[1]
https://lore.kernel.org/git/pull.1358.v4.git.1664287021.gitgitgadget@gmail.com/

This idea came about due to a bug in the git maintenance unregister work
where the result from git_config_get_value_multi() was sent directly to
for_each_string_list_item() without checking for a NULL value first.

I'm sending this as an RFC mostly because I'm not 100% sure this shift is
worth the refactoring pain and effort. I personally think getting an empty
list is a safer choice, but I can also understand if someone has a different
opinion.

Thanks, -Stolee

Derrick Stolee (5):
  config: relax requirements on multi-value return
  *: relax git_configset_get_value_multi result
  config: add BUG() statement instead of possible segfault
  config: return an empty list, not NULL
  *: expect a non-NULL list of config values

 builtin/for-each-repo.c     |  8 --------
 builtin/gc.c                | 20 ++++++++------------
 builtin/log.c               | 10 ++++------
 builtin/pack-objects.c      |  3 ---
 builtin/repack.c            | 13 +++++--------
 builtin/submodule--helper.c |  4 ++--
 config.c                    | 15 ++++++++++-----
 config.h                    |  6 +++---
 pack-bitmap.c               |  3 ---
 submodule.c                 |  2 +-
 t/helper/test-config.c      |  4 ++--
 versioncmp.c                |  8 ++++----
 12 files changed, 39 insertions(+), 57 deletions(-)


base-commit: 73a262cdca46a45aeeda6f47ea3357aaeb937e7b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1369%2Fderrickstolee%2Fconfig-empty-list-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1369/derrickstolee/config-empty-list-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1369
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/5] config: relax requirements on multi-value return
  2022-09-27 14:08 [PATCH 0/5] [RFC] config API: return empty list, not NULL Derrick Stolee via GitGitGadget
@ 2022-09-27 14:08 ` Derrick Stolee via GitGitGadget
  2022-09-27 17:26   ` Junio C Hamano
  2022-09-27 14:08 ` [PATCH 2/5] *: relax git_configset_get_value_multi result Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-27 14:08 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The git_configset_get_value() method has an assert() statement
guaranteeing that the result from git_configset_get_value_multi() is
either NULL or has at least one element. We want to change that return
to provide an empty list instead of a NULL list, so change the earlier
'return 1' condition to care about a NULL or empty list.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 config.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/config.c b/config.c
index cbb5a3bab74..bf89afbdab0 100644
--- a/config.c
+++ b/config.c
@@ -2407,9 +2407,8 @@ int git_configset_get_value(struct config_set *cs, const char *key, const char *
 	 */
 	values = git_configset_get_value_multi(cs, key);
 
-	if (!values)
+	if (!values || !values->nr)
 		return 1;
-	assert(values->nr > 0);
 	*value = values->items[values->nr - 1].string;
 	return 0;
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/5] *: relax git_configset_get_value_multi result
  2022-09-27 14:08 [PATCH 0/5] [RFC] config API: return empty list, not NULL Derrick Stolee via GitGitGadget
  2022-09-27 14:08 ` [PATCH 1/5] config: relax requirements on multi-value return Derrick Stolee via GitGitGadget
@ 2022-09-27 14:08 ` Derrick Stolee via GitGitGadget
  2022-09-28 15:58   ` Taylor Blau
  2022-09-27 14:08 ` [PATCH 3/5] config: add BUG() statement instead of possible segfault Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-27 14:08 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

We will change the config API to return an empty list instead of a NULL
list when there are no values in a multi-valued key. Update checks that
look for NULL and have them instead check for NULL or empty lists.

There are other checks that look for NULL, but then do an operation that
would be a no-op on an empty list. These are not modified here but will
be modified after we change the behavior of the config API.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/submodule--helper.c | 10 ++++++++--
 submodule.c                 |  2 +-
 t/helper/test-config.c      |  4 ++--
 versioncmp.c                |  8 ++++----
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 0b4acb442b2..1ddc08b076c 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -531,6 +531,7 @@ static int module_init(int argc, const char **argv, const char *prefix)
 	struct init_cb info = INIT_CB_INIT;
 	struct pathspec pathspec = { 0 };
 	struct module_list list = MODULE_LIST_INIT;
+	const struct string_list *active_modules;
 	int quiet = 0;
 	struct option module_init_options[] = {
 		OPT__QUIET(&quiet, N_("suppress output for initializing a submodule")),
@@ -552,7 +553,9 @@ static int module_init(int argc, const char **argv, const char *prefix)
 	 * If there are no path args and submodule.active is set then,
 	 * by default, only initialize 'active' modules.
 	 */
-	if (!argc && git_config_get_value_multi("submodule.active"))
+	if (!argc &&
+	    (active_modules = git_config_get_value_multi("submodule.active")) &&
+	    active_modules->nr)
 		module_list_active(&list);
 
 	info.prefix = prefix;
@@ -2706,6 +2709,7 @@ static int module_update(int argc, const char **argv, const char *prefix)
 		opt.warn_if_uninitialized = 1;
 
 	if (opt.init) {
+		const struct string_list *active_modules;
 		struct module_list list = MODULE_LIST_INIT;
 		struct init_cb info = INIT_CB_INIT;
 
@@ -2720,7 +2724,9 @@ static int module_update(int argc, const char **argv, const char *prefix)
 		 * If there are no path args and submodule.active is set then,
 		 * by default, only initialize 'active' modules.
 		 */
-		if (!argc && git_config_get_value_multi("submodule.active"))
+		if (!argc &&
+		    (active_modules = git_config_get_value_multi("submodule.active")) &&
+		    active_modules->nr)
 			module_list_active(&list);
 
 		info.prefix = opt.prefix;
diff --git a/submodule.c b/submodule.c
index bf7a2c79183..06230961c80 100644
--- a/submodule.c
+++ b/submodule.c
@@ -275,7 +275,7 @@ int is_tree_submodule_active(struct repository *repo,
 
 	/* submodule.active is set */
 	sl = repo_config_get_value_multi(repo, "submodule.active");
-	if (sl) {
+	if (sl && sl->nr) {
 		struct pathspec ps;
 		struct strvec args = STRVEC_INIT;
 		const struct string_list_item *item;
diff --git a/t/helper/test-config.c b/t/helper/test-config.c
index 4ba9eb65606..62644dd71d7 100644
--- a/t/helper/test-config.c
+++ b/t/helper/test-config.c
@@ -96,7 +96,7 @@ int cmd__config(int argc, const char **argv)
 		}
 	} else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) {
 		strptr = git_config_get_value_multi(argv[2]);
-		if (strptr) {
+		if (strptr && strptr->nr) {
 			for (i = 0; i < strptr->nr; i++) {
 				v = strptr->items[i].string;
 				if (!v)
@@ -160,7 +160,7 @@ int cmd__config(int argc, const char **argv)
 			}
 		}
 		strptr = git_configset_get_value_multi(&cs, argv[2]);
-		if (strptr) {
+		if (strptr && strptr->nr) {
 			for (i = 0; i < strptr->nr; i++) {
 				v = strptr->items[i].string;
 				if (!v)
diff --git a/versioncmp.c b/versioncmp.c
index 069ee94a4d7..8772f869042 100644
--- a/versioncmp.c
+++ b/versioncmp.c
@@ -164,14 +164,14 @@ int versioncmp(const char *s1, const char *s2)
 		initialized = 1;
 		prereleases = git_config_get_value_multi("versionsort.suffix");
 		deprecated_prereleases = git_config_get_value_multi("versionsort.prereleasesuffix");
-		if (prereleases) {
-			if (deprecated_prereleases)
+		if (prereleases && prereleases->nr) {
+			if (deprecated_prereleases && deprecated_prereleases->nr)
 				warning("ignoring versionsort.prereleasesuffix because versionsort.suffix is set");
 		} else
 			prereleases = deprecated_prereleases;
 	}
-	if (prereleases && swap_prereleases(s1, s2, (const char *) p1 - s1 - 1,
-					    &diff))
+	if (prereleases && prereleases->nr &&
+	    swap_prereleases(s1, s2, (const char *) p1 - s1 - 1, &diff))
 		return diff;
 
 	state = result_type[state * 3 + (((c2 == '0') + (isdigit (c2) != 0)))];
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/5] config: add BUG() statement instead of possible segfault
  2022-09-27 14:08 [PATCH 0/5] [RFC] config API: return empty list, not NULL Derrick Stolee via GitGitGadget
  2022-09-27 14:08 ` [PATCH 1/5] config: relax requirements on multi-value return Derrick Stolee via GitGitGadget
  2022-09-27 14:08 ` [PATCH 2/5] *: relax git_configset_get_value_multi result Derrick Stolee via GitGitGadget
@ 2022-09-27 14:08 ` Derrick Stolee via GitGitGadget
  2022-09-27 16:17   ` Ævar Arnfjörð Bjarmason
  2022-09-27 14:08 ` [PATCH 4/5] config: return an empty list, not NULL Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-27 14:08 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The git_die_config() method calls git_config_get_value_multi() but
immediately navigates to its first value without checking if the result
is NULL or empty. Callers should only call git_die_config() if there is
at least one value for the given 'key', but such a mistaken use might
slip through. It would be better to show a BUG() statement than a
possible segfault.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 config.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/config.c b/config.c
index bf89afbdab0..0c41606c7d4 100644
--- a/config.c
+++ b/config.c
@@ -2833,8 +2833,13 @@ void git_die_config(const char *key, const char *err, ...)
 		va_end(params);
 	}
 	values = git_config_get_value_multi(key);
-	kv_info = values->items[values->nr - 1].util;
-	git_die_config_linenr(key, kv_info->filename, kv_info->linenr);
+
+	if (values && values->nr) {
+		kv_info = values->items[values->nr - 1].util;
+		git_die_config_linenr(key, kv_info->filename, kv_info->linenr);
+	} else {
+		BUG("expected a non-empty list of values");
+	}
 }
 
 /*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/5] config: return an empty list, not NULL
  2022-09-27 14:08 [PATCH 0/5] [RFC] config API: return empty list, not NULL Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-09-27 14:08 ` [PATCH 3/5] config: add BUG() statement instead of possible segfault Derrick Stolee via GitGitGadget
@ 2022-09-27 14:08 ` Derrick Stolee via GitGitGadget
  2022-09-27 16:21   ` Ævar Arnfjörð Bjarmason
  2022-09-27 14:08 ` [PATCH 5/5] *: expect a non-NULL list of config values Derrick Stolee via GitGitGadget
  2022-09-28  2:40 ` [PATCH 0/5] [RFC] config API: return empty list, not NULL Junio C Hamano
  5 siblings, 1 reply; 21+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-27 14:08 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

For the multi-valued config API methods, Git previously returned a NULL
list instead of an empty list. Previous changes adjusted all callers to
instead expect an empty, non-NULL list, making this a safe change.

The next change will remove the NULL checks from all callers.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 config.c | 3 ++-
 config.h | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/config.c b/config.c
index 0c41606c7d4..2d4ca1ae6dc 100644
--- a/config.c
+++ b/config.c
@@ -2415,8 +2415,9 @@ int git_configset_get_value(struct config_set *cs, const char *key, const char *
 
 const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key)
 {
+	static struct string_list empty_list = STRING_LIST_INIT_NODUP;
 	struct config_set_element *e = configset_find_element(cs, key);
-	return e ? &e->value_list : NULL;
+	return e ? &e->value_list : &empty_list;
 }
 
 int git_configset_get_string(struct config_set *cs, const char *key, char **dest)
diff --git a/config.h b/config.h
index ca994d77147..9897b97c0b9 100644
--- a/config.h
+++ b/config.h
@@ -458,7 +458,7 @@ int git_configset_add_parameters(struct config_set *cs);
 /**
  * Finds and returns the value list, sorted in order of increasing priority
  * for the configuration variable `key` and config set `cs`. When the
- * configuration variable `key` is not found, returns NULL. The caller
+ * configuration variable `key` is not found, returns an empty list. The caller
  * should not free or modify the returned pointer, as it is owned by the cache.
  */
 const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key);
@@ -543,8 +543,8 @@ int git_config_get_value(const char *key, const char **value);
 /**
  * Finds and returns the value list, sorted in order of increasing priority
  * for the configuration variable `key`. When the configuration variable
- * `key` is not found, returns NULL. The caller should not free or modify
- * the returned pointer, as it is owned by the cache.
+ * `key` is not found, returns an empty list. The caller should not free or
+ * modify the returned pointer, as it is owned by the cache.
  */
 const struct string_list *git_config_get_value_multi(const char *key);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/5] *: expect a non-NULL list of config values
  2022-09-27 14:08 [PATCH 0/5] [RFC] config API: return empty list, not NULL Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-09-27 14:08 ` [PATCH 4/5] config: return an empty list, not NULL Derrick Stolee via GitGitGadget
@ 2022-09-27 14:08 ` Derrick Stolee via GitGitGadget
  2022-09-28  2:40 ` [PATCH 0/5] [RFC] config API: return empty list, not NULL Junio C Hamano
  5 siblings, 0 replies; 21+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-27 14:08 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change altered the multi-valued config API to return an
empty list intead of a NULL list. Update all callers to rely on the
non-NULL result and instead check for an empty list.

Several callers only checked for a NULL list so they could safely call
for_each_string_list_item(), which no-ops on an empty list.

The bitmap_preferred_tips() method is a thin wrapper on the config API,
so its callers are also updated to match this expectation.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/for-each-repo.c     |  8 --------
 builtin/gc.c                | 20 ++++++++------------
 builtin/log.c               | 10 ++++------
 builtin/pack-objects.c      |  3 ---
 builtin/repack.c            | 13 +++++--------
 builtin/submodule--helper.c | 10 ++--------
 config.c                    |  2 +-
 pack-bitmap.c               |  3 ---
 t/helper/test-config.c      |  2 +-
 9 files changed, 21 insertions(+), 50 deletions(-)

diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c
index fd86e5a8619..635ea5e15fd 100644
--- a/builtin/for-each-repo.c
+++ b/builtin/for-each-repo.c
@@ -44,14 +44,6 @@ int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
 
 	values = repo_config_get_value_multi(the_repository,
 					     config_key);
-
-	/*
-	 * Do nothing on an empty list, which is equivalent to the case
-	 * where the config variable does not exist at all.
-	 */
-	if (!values)
-		return 0;
-
 	for (i = 0; !result && i < values->nr; i++)
 		result = run_command_on_repo(values->items[i].string, argc, argv);
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 7a585f0b71d..1e9ac2ac7e3 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1493,12 +1493,10 @@ static int maintenance_register(int argc, const char **argv, const char *prefix)
 		git_config_set("maintenance.strategy", "incremental");
 
 	list = git_config_get_value_multi(key);
-	if (list) {
-		for_each_string_list_item(item, list) {
-			if (!strcmp(maintpath, item->string)) {
-				found = 1;
-				break;
-			}
+	for_each_string_list_item(item, list) {
+		if (!strcmp(maintpath, item->string)) {
+			found = 1;
+			break;
 		}
 	}
 
@@ -1550,12 +1548,10 @@ static int maintenance_unregister(int argc, const char **argv, const char *prefi
 				   options);
 
 	list = git_config_get_value_multi(key);
-	if (list) {
-		for_each_string_list_item(item, list) {
-			if (!strcmp(maintpath, item->string)) {
-				found = 1;
-				break;
-			}
+	for_each_string_list_item(item, list) {
+		if (!strcmp(maintpath, item->string)) {
+			found = 1;
+			break;
 		}
 	}
 
diff --git a/builtin/log.c b/builtin/log.c
index ee19dc5d450..719ef966045 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -182,15 +182,13 @@ static void set_default_decoration_filter(struct decoration_filter *decoration_f
 	int i;
 	char *value = NULL;
 	struct string_list *include = decoration_filter->include_ref_pattern;
+	struct string_list_item *item;
 	const struct string_list *config_exclude =
 			git_config_get_value_multi("log.excludeDecoration");
 
-	if (config_exclude) {
-		struct string_list_item *item;
-		for_each_string_list_item(item, config_exclude)
-			string_list_append(decoration_filter->exclude_ref_config_pattern,
-					   item->string);
-	}
+	for_each_string_list_item(item, config_exclude)
+		string_list_append(decoration_filter->exclude_ref_config_pattern,
+				   item->string);
 
 	/*
 	 * By default, decorate_all is disabled. Enable it if
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3658c05cafc..d15e1857e67 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3982,9 +3982,6 @@ static void mark_bitmap_preferred_tips(void)
 	const struct string_list *preferred_tips;
 
 	preferred_tips = bitmap_preferred_tips(the_repository);
-	if (!preferred_tips)
-		return;
-
 	for_each_string_list_item(item, preferred_tips) {
 		for_each_ref_in(item->string, mark_bitmap_preferred_tip, NULL);
 	}
diff --git a/builtin/repack.c b/builtin/repack.c
index a5bacc77974..1655747f6e0 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -539,6 +539,7 @@ static int midx_snapshot_ref_one(const char *refname UNUSED,
 static void midx_snapshot_refs(struct tempfile *f)
 {
 	struct midx_snapshot_ref_data data;
+	struct string_list_item *item;
 	const struct string_list *preferred = bitmap_preferred_tips(the_repository);
 
 	data.f = f;
@@ -549,14 +550,10 @@ static void midx_snapshot_refs(struct tempfile *f)
 		 die(_("could not open tempfile %s for writing"),
 		     get_tempfile_path(f));
 
-	if (preferred) {
-		struct string_list_item *item;
-
-		data.preferred = 1;
-		for_each_string_list_item(item, preferred)
-			for_each_ref_in(item->string, midx_snapshot_ref_one, &data);
-		data.preferred = 0;
-	}
+	data.preferred = 1;
+	for_each_string_list_item(item, preferred)
+		for_each_ref_in(item->string, midx_snapshot_ref_one, &data);
+	data.preferred = 0;
 
 	for_each_ref(midx_snapshot_ref_one, &data);
 
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 1ddc08b076c..5a8b6120157 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -531,7 +531,6 @@ static int module_init(int argc, const char **argv, const char *prefix)
 	struct init_cb info = INIT_CB_INIT;
 	struct pathspec pathspec = { 0 };
 	struct module_list list = MODULE_LIST_INIT;
-	const struct string_list *active_modules;
 	int quiet = 0;
 	struct option module_init_options[] = {
 		OPT__QUIET(&quiet, N_("suppress output for initializing a submodule")),
@@ -553,9 +552,7 @@ static int module_init(int argc, const char **argv, const char *prefix)
 	 * If there are no path args and submodule.active is set then,
 	 * by default, only initialize 'active' modules.
 	 */
-	if (!argc &&
-	    (active_modules = git_config_get_value_multi("submodule.active")) &&
-	    active_modules->nr)
+	if (!argc && git_config_get_value_multi("submodule.active")->nr)
 		module_list_active(&list);
 
 	info.prefix = prefix;
@@ -2709,7 +2706,6 @@ static int module_update(int argc, const char **argv, const char *prefix)
 		opt.warn_if_uninitialized = 1;
 
 	if (opt.init) {
-		const struct string_list *active_modules;
 		struct module_list list = MODULE_LIST_INIT;
 		struct init_cb info = INIT_CB_INIT;
 
@@ -2724,9 +2720,7 @@ static int module_update(int argc, const char **argv, const char *prefix)
 		 * If there are no path args and submodule.active is set then,
 		 * by default, only initialize 'active' modules.
 		 */
-		if (!argc &&
-		    (active_modules = git_config_get_value_multi("submodule.active")) &&
-		    active_modules->nr)
+		if (!argc && git_config_get_value_multi("submodule.active")->nr)
 			module_list_active(&list);
 
 		info.prefix = opt.prefix;
diff --git a/config.c b/config.c
index 2d4ca1ae6dc..8bd55688352 100644
--- a/config.c
+++ b/config.c
@@ -2407,7 +2407,7 @@ int git_configset_get_value(struct config_set *cs, const char *key, const char *
 	 */
 	values = git_configset_get_value_multi(cs, key);
 
-	if (!values || !values->nr)
+	if (!values->nr)
 		return 1;
 	*value = values->items[values->nr - 1].string;
 	return 0;
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9a208abc1fd..77d02c7bbe7 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2310,9 +2310,6 @@ int bitmap_is_preferred_refname(struct repository *r, const char *refname)
 	const struct string_list *preferred_tips = bitmap_preferred_tips(r);
 	struct string_list_item *item;
 
-	if (!preferred_tips)
-		return 0;
-
 	for_each_string_list_item(item, preferred_tips) {
 		if (starts_with(refname, item->string))
 			return 1;
diff --git a/t/helper/test-config.c b/t/helper/test-config.c
index 62644dd71d7..90810946783 100644
--- a/t/helper/test-config.c
+++ b/t/helper/test-config.c
@@ -96,7 +96,7 @@ int cmd__config(int argc, const char **argv)
 		}
 	} else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) {
 		strptr = git_config_get_value_multi(argv[2]);
-		if (strptr && strptr->nr) {
+		if (strptr->nr) {
 			for (i = 0; i < strptr->nr; i++) {
 				v = strptr->items[i].string;
 				if (!v)
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/5] config: add BUG() statement instead of possible segfault
  2022-09-27 14:08 ` [PATCH 3/5] config: add BUG() statement instead of possible segfault Derrick Stolee via GitGitGadget
@ 2022-09-27 16:17   ` Ævar Arnfjörð Bjarmason
  2022-09-27 16:46     ` Derrick Stolee
  0 siblings, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-27 16:17 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee

On Tue, Sep 27 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The git_die_config() method calls git_config_get_value_multi() but
> immediately navigates to its first value without checking if the result
> is NULL or empty. Callers should only call git_die_config() if there is
> at least one value for the given 'key', but such a mistaken use might
> slip through. It would be better to show a BUG() statement than a
> possible segfault.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  config.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/config.c b/config.c
> index bf89afbdab0..0c41606c7d4 100644
> --- a/config.c
> +++ b/config.c
> @@ -2833,8 +2833,13 @@ void git_die_config(const char *key, const char *err, ...)
>  		va_end(params);
>  	}
>  	values = git_config_get_value_multi(key);
> -	kv_info = values->items[values->nr - 1].util;
> -	git_die_config_linenr(key, kv_info->filename, kv_info->linenr);
> +
> +	if (values && values->nr) {
> +		kv_info = values->items[values->nr - 1].util;
> +		git_die_config_linenr(key, kv_info->filename, kv_info->linenr);
> +	} else {
> +		BUG("expected a non-empty list of values");
> +	}
>  }
>  
>  /*

AFAIKT the intent of the current code on "master" is that this will only
get called if the likes of git_configset_get_string() returns < 0, not
if it returns > 0.

So isn't the combination of your 1/5 and this 3/5 now conflating these
two conditions? See e.g. repo_config_get_string_tmp() and when it would
call git_die_config().

I.e. isn't the whole point of git_die_config() to print an error message
about a configuration *value* that we've parsed out of the config?

If e.g. the key itself is bad we'll get a -1, but in this case it seems
we would have a BUG(), but it's not that we "expected a non-empty list
of values", but that the state of the world changed between our previous
configset invocation, no?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] config: return an empty list, not NULL
  2022-09-27 14:08 ` [PATCH 4/5] config: return an empty list, not NULL Derrick Stolee via GitGitGadget
@ 2022-09-27 16:21   ` Ævar Arnfjörð Bjarmason
  2022-09-27 16:50     ` Derrick Stolee
  0 siblings, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-27 16:21 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee


On Tue, Sep 27 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> For the multi-valued config API methods, Git previously returned a NULL
> list instead of an empty list. Previous changes adjusted all callers to
> instead expect an empty, non-NULL list, making this a safe change.
>
> The next change will remove the NULL checks from all callers.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  config.c | 3 ++-
>  config.h | 6 +++---
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/config.c b/config.c
> index 0c41606c7d4..2d4ca1ae6dc 100644
> --- a/config.c
> +++ b/config.c
> @@ -2415,8 +2415,9 @@ int git_configset_get_value(struct config_set *cs, const char *key, const char *
>  
>  const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key)
>  {
> +	static struct string_list empty_list = STRING_LIST_INIT_NODUP;
>  	struct config_set_element *e = configset_find_element(cs, key);
> -	return e ? &e->value_list : NULL;
> +	return e ? &e->value_list : &empty_list;
>  }
>  
>  int git_configset_get_string(struct config_set *cs, const char *key, char **dest)
> diff --git a/config.h b/config.h
> index ca994d77147..9897b97c0b9 100644
> --- a/config.h
> +++ b/config.h
> @@ -458,7 +458,7 @@ int git_configset_add_parameters(struct config_set *cs);
>  /**
>   * Finds and returns the value list, sorted in order of increasing priority
>   * for the configuration variable `key` and config set `cs`. When the
> - * configuration variable `key` is not found, returns NULL. The caller
> + * configuration variable `key` is not found, returns an empty list. The caller
>   * should not free or modify the returned pointer, as it is owned by the cache.
>   */
>  const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key);
> @@ -543,8 +543,8 @@ int git_config_get_value(const char *key, const char **value);
>  /**
>   * Finds and returns the value list, sorted in order of increasing priority
>   * for the configuration variable `key`. When the configuration variable
> - * `key` is not found, returns NULL. The caller should not free or modify
> - * the returned pointer, as it is owned by the cache.
> + * `key` is not found, returns an empty list. The caller should not free or
> + * modify the returned pointer, as it is owned by the cache.
>   */
>  const struct string_list *git_config_get_value_multi(const char *key);

Aside from the "DWIM API" aspect of this (which I don't mind) I think
this is really taking the low-level function in the wrong direction, and
that we should just add a new simple wrapper instead.

I.e. both the pre-image API docs & this series gloss over the fact that
we'd not just return NULL here if the config wasn't there, but also if
git_config_parse_key() failed.

So it seems to me that a better direction would be starting with
something like the WIP below (which doesn't compile the whole code, I
stopped at config.[ch] and pack-bitmap.c). I.e. the same "int" return
and "dest" pattern that most other things in the config API have.

The functionality you want in this series would then just be a wrapper
for that which wouldn't care about the difference between "bad key" and
"no such config entry".

But I really think the low-level config API should care about the
difference, granted before your change it's rather crappy, and would
return NULL for both, but I think we can do better...

diff --git a/config.c b/config.c
index cbb5a3bab74..a9536d3a9e8 100644
--- a/config.c
+++ b/config.c
@@ -2275,23 +2275,28 @@ void read_very_early_config(config_fn_t cb, void *data)
 	config_with_options(cb, data, NULL, &opts);
 }
 
-static struct config_set_element *configset_find_element(struct config_set *cs, const char *key)
+static int configset_find_element(struct config_set *cs, const char *key,
+				  struct config_set_element **dest)
 {
 	struct config_set_element k;
 	struct config_set_element *found_entry;
 	char *normalized_key;
+	int ret;
+
 	/*
 	 * `key` may come from the user, so normalize it before using it
 	 * for querying entries from the hashmap.
 	 */
-	if (git_config_parse_key(key, &normalized_key, NULL))
-		return NULL;
+	ret = git_config_parse_key(key, &normalized_key, NULL);
+	if (ret < 0)
+		return ret;
 
 	hashmap_entry_init(&k.ent, strhash(normalized_key));
 	k.key = normalized_key;
 	found_entry = hashmap_get_entry(&cs->config_hash, &k, ent, NULL);
 	free(normalized_key);
-	return found_entry;
+	*dest = found_entry;
+	return 0;
 }
 
 static int configset_add_value(struct config_set *cs, const char *key, const char *value)
@@ -2300,8 +2305,11 @@ static int configset_add_value(struct config_set *cs, const char *key, const cha
 	struct string_list_item *si;
 	struct configset_list_item *l_item;
 	struct key_value_info *kv_info = xmalloc(sizeof(*kv_info));
+	int ret;
 
-	e = configset_find_element(cs, key);
+	ret = configset_find_element(cs, key, &e);
+	if (ret < 0)
+		return ret;
 	/*
 	 * Since the keys are being fed by git_config*() callback mechanism, they
 	 * are already normalized. So simply add them without any further munging.
@@ -2400,24 +2408,34 @@ int git_configset_add_parameters(struct config_set *cs)
 int git_configset_get_value(struct config_set *cs, const char *key, const char **value)
 {
 	const struct string_list *values = NULL;
+	int ret;
+
 	/*
 	 * Follows "last one wins" semantic, i.e., if there are multiple matches for the
 	 * queried key in the files of the configset, the value returned will be the last
 	 * value in the value list for that key.
 	 */
-	values = git_configset_get_value_multi(cs, key);
+	ret = git_configset_get_value_multi(cs, key, &values);
+
+	if (ret < 0)
+		return ret;
 
-	if (!values)
-		return 1;
 	assert(values->nr > 0);
 	*value = values->items[values->nr - 1].string;
 	return 0;
 }
 
-const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key)
+int git_configset_get_value_multi(struct config_set *cs, const char *key,
+				  const struct string_list **dest)
 {
-	struct config_set_element *e = configset_find_element(cs, key);
-	return e ? &e->value_list : NULL;
+	struct config_set_element *e = NULL;
+	int ret;
+
+	ret = configset_find_element(cs, key, &e);
+	if (ret < 0)
+		return ret;
+	*dest = &e->value_list;
+	return 0;
 }
 
 int git_configset_get_string(struct config_set *cs, const char *key, char **dest)
@@ -2563,11 +2581,12 @@ int repo_config_get_value(struct repository *repo,
 	return git_configset_get_value(repo->config, key, value);
 }
 
-const struct string_list *repo_config_get_value_multi(struct repository *repo,
-						      const char *key)
+int repo_config_get_value_multi(struct repository *repo,
+				const char *key,
+				const struct string_list **dest)
 {
 	git_config_check_init(repo);
-	return git_configset_get_value_multi(repo->config, key);
+	return git_configset_get_value_multi(repo->config, key, dest);
 }
 
 int repo_config_get_string(struct repository *repo,
@@ -2684,9 +2703,9 @@ int git_config_get_value(const char *key, const char **value)
 	return repo_config_get_value(the_repository, key, value);
 }
 
-const struct string_list *git_config_get_value_multi(const char *key)
+int git_config_get_value_multi(const char *key, const struct string_list **dest)
 {
-	return repo_config_get_value_multi(the_repository, key);
+	return repo_config_get_value_multi(the_repository, key, dest);
 }
 
 int git_config_get_string(const char *key, char **dest)
@@ -2826,6 +2845,7 @@ void git_die_config(const char *key, const char *err, ...)
 	const struct string_list *values;
 	struct key_value_info *kv_info;
 	report_fn error_fn = get_error_routine();
+	int ret;
 
 	if (err) {
 		va_list params;
@@ -2833,7 +2853,9 @@ void git_die_config(const char *key, const char *err, ...)
 		error_fn(err, params);
 		va_end(params);
 	}
-	values = git_config_get_value_multi(key);
+	ret = git_config_get_value_multi(key, &values);
+	if (ret < 0)
+		BUG("...");
 	kv_info = values->items[values->nr - 1].util;
 	git_die_config_linenr(key, kv_info->filename, kv_info->linenr);
 }
diff --git a/config.h b/config.h
index ca994d77147..99b8dc1944c 100644
--- a/config.h
+++ b/config.h
@@ -461,7 +461,8 @@ int git_configset_add_parameters(struct config_set *cs);
  * configuration variable `key` is not found, returns NULL. The caller
  * should not free or modify the returned pointer, as it is owned by the cache.
  */
-const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key);
+int git_configset_get_value_multi(struct config_set *cs, const char *key,
+				  const struct string_list **dest);
 
 /**
  * Clears `config_set` structure, removes all saved variable-value pairs.
@@ -495,8 +496,9 @@ struct repository;
 void repo_config(struct repository *repo, config_fn_t fn, void *data);
 int repo_config_get_value(struct repository *repo,
 			  const char *key, const char **value);
-const struct string_list *repo_config_get_value_multi(struct repository *repo,
-						      const char *key);
+int repo_config_get_value_multi(struct repository *repo,
+				const char *key,
+				const struct string_list **dest);
 int repo_config_get_string(struct repository *repo,
 			   const char *key, char **dest);
 int repo_config_get_string_tmp(struct repository *repo,
@@ -546,7 +548,8 @@ int git_config_get_value(const char *key, const char **value);
  * `key` is not found, returns NULL. The caller should not free or modify
  * the returned pointer, as it is owned by the cache.
  */
-const struct string_list *git_config_get_value_multi(const char *key);
+int git_config_get_value_multi(const char *key,
+			       const struct string_list **dest);
 
 /**
  * Resets and invalidates the config cache.
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 440407f1be7..0d704dc76ef 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2301,7 +2301,13 @@ int bitmap_is_midx(struct bitmap_index *bitmap_git)
 
 const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
-	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
+	const struct string_list *dest;
+	int ret;
+
+	ret = repo_config_get_value_multi(r, "pack.preferbitmaptips", &dest);
+	if (ret < 0)
+		BUG("got %d from repo_config_get_value_multi()", ret);
+	return dest;
 }
 
 int bitmap_is_preferred_refname(struct repository *r, const char *refname)

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/5] config: add BUG() statement instead of possible segfault
  2022-09-27 16:17   ` Ævar Arnfjörð Bjarmason
@ 2022-09-27 16:46     ` Derrick Stolee
  2022-09-27 17:22       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 21+ messages in thread
From: Derrick Stolee @ 2022-09-27 16:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Derrick Stolee via GitGitGadget; +Cc: git

On 9/27/2022 12:17 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Sep 27 2022, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> The git_die_config() method calls git_config_get_value_multi() but
>> immediately navigates to its first value without checking if the result
>> is NULL or empty. Callers should only call git_die_config() if there is
>> at least one value for the given 'key', but such a mistaken use might
>> slip through. It would be better to show a BUG() statement than a
>> possible segfault.
 
> AFAIKT the intent of the current code on "master" is that this will only
> get called if the likes of git_configset_get_string() returns < 0, not
> if it returns > 0.
> 
> So isn't the combination of your 1/5 and this 3/5 now conflating these
> two conditions? See e.g. repo_config_get_string_tmp() and when it would
> call git_die_config().
> 
> I.e. isn't the whole point of git_die_config() to print an error message
> about a configuration *value* that we've parsed out of the config?
> 
> If e.g. the key itself is bad we'll get a -1, but in this case it seems
> we would have a BUG(), but it's not that we "expected a non-empty list
> of values", but that the state of the world changed between our previous
> configset invocation, no?

If git_die_config() was static to config.c, then I would agree with you
that its use is controlled enough to avoid that possibility. However, it
is available in config.h and its doc comment does not say anything about
"make sure 'key' properly parses at least one value".

The point is defense-in-depth to prevent a segfault if someone adds an
incorrect caller. It looks too easy to add an erroneous caller.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] config: return an empty list, not NULL
  2022-09-27 16:21   ` Ævar Arnfjörð Bjarmason
@ 2022-09-27 16:50     ` Derrick Stolee
  2022-09-27 19:18       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 21+ messages in thread
From: Derrick Stolee @ 2022-09-27 16:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Derrick Stolee via GitGitGadget; +Cc: git

On 9/27/2022 12:21 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Sep 27 2022, Derrick Stolee via GitGitGadget wrote:

>>  /**
>>   * Finds and returns the value list, sorted in order of increasing priority
>>   * for the configuration variable `key`. When the configuration variable
>> - * `key` is not found, returns NULL. The caller should not free or modify
>> - * the returned pointer, as it is owned by the cache.
>> + * `key` is not found, returns an empty list. The caller should not free or
>> + * modify the returned pointer, as it is owned by the cache.
>>   */
>>  const struct string_list *git_config_get_value_multi(const char *key);
> 
> Aside from the "DWIM API" aspect of this (which I don't mind) I think
> this is really taking the low-level function in the wrong direction, and
> that we should just add a new simple wrapper instead.
> 
> I.e. both the pre-image API docs & this series gloss over the fact that
> we'd not just return NULL here if the config wasn't there, but also if
> git_config_parse_key() failed.
> 
> So it seems to me that a better direction would be starting with
> something like the WIP below (which doesn't compile the whole code, I
> stopped at config.[ch] and pack-bitmap.c). I.e. the same "int" return
> and "dest" pattern that most other things in the config API have.

Do you have an example where a caller would benefit from this
distinction? Without such an example, I don't think it is worth
creating such a huge change for purity's sake alone.

I'm pretty happy that the diff for this series is an overall
reduction in code, while also not being too large in the interim:

 12 files changed, 39 insertions(+), 57 deletions(-)

If all callers that use the *_multi() methods would only use the
wrapper, then what is the point of doing the low-level manipulations?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/5] config: add BUG() statement instead of possible segfault
  2022-09-27 16:46     ` Derrick Stolee
@ 2022-09-27 17:22       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-27 17:22 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Derrick Stolee via GitGitGadget, git


On Tue, Sep 27 2022, Derrick Stolee wrote:

> On 9/27/2022 12:17 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Tue, Sep 27 2022, Derrick Stolee via GitGitGadget wrote:
>> 
>>> From: Derrick Stolee <derrickstolee@github.com>
>>>
>>> The git_die_config() method calls git_config_get_value_multi() but
>>> immediately navigates to its first value without checking if the result
>>> is NULL or empty. Callers should only call git_die_config() if there is
>>> at least one value for the given 'key', but such a mistaken use might
>>> slip through. It would be better to show a BUG() statement than a
>>> possible segfault.
>  
>> AFAIKT the intent of the current code on "master" is that this will only
>> get called if the likes of git_configset_get_string() returns < 0, not
>> if it returns > 0.
>> 
>> So isn't the combination of your 1/5 and this 3/5 now conflating these
>> two conditions? See e.g. repo_config_get_string_tmp() and when it would
>> call git_die_config().
>> 
>> I.e. isn't the whole point of git_die_config() to print an error message
>> about a configuration *value* that we've parsed out of the config?
>> 
>> If e.g. the key itself is bad we'll get a -1, but in this case it seems
>> we would have a BUG(), but it's not that we "expected a non-empty list
>> of values", but that the state of the world changed between our previous
>> configset invocation, no?
>
> If git_die_config() was static to config.c, then I would agree with you
> that its use is controlled enough to avoid that possibility. However, it
> is available in config.h and its doc comment does not say anything about
> "make sure 'key' properly parses at least one value".

It does, it's part of the configset API, which any non-trivial user
understands adhares to the semantics of the configset cache.

E.g. the fast-import.c caller is doing, basically:

	if (!git_config_get_int("pack.indexversion", &indexversion_value))
		/* ...if we're not happy with indexversion_value ...*/
		git_die_config("pack.indexversion", ...);

Anyway, I'm all for adding some extra paranoia with a BUG(), it was just
unclear to me if this was some segfault we were expecting to perhaps
have now, or with API use after this series.

AFAICT it's really neither of those, but just some side-paranoia added
while-at-it for a *different* use-case than the primary goal of this
series. I.e. one where you're looking up a key you don't know exists,
whereas git_die_config() is for a key we *know* exists, but don't like
its value.

I think it's better if we just clarify that it's supposed to be used by
that in the API docs, and if you insist on the BUG then the:

	BUG("expected a non-empty list of values");

Should really get to the point directly, and say something like:

	BUG("called with non-existing key '%s'? "
            "Call git_die_config(...) only with keys whose value you don't like", key);



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] config: relax requirements on multi-value return
  2022-09-27 14:08 ` [PATCH 1/5] config: relax requirements on multi-value return Derrick Stolee via GitGitGadget
@ 2022-09-27 17:26   ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2022-09-27 17:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The git_configset_get_value() method has an assert() statement
> guaranteeing that the result from git_configset_get_value_multi() is
> either NULL or has at least one element. We want to change that return
> to provide an empty list instead of a NULL list, so change the earlier
> 'return 1' condition to care about a NULL or empty list.

If we are allowing to return an empty string_list from the function
in later steps, the assert() that insists the list has at least 1
item on it will get in the way.  OK.

> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  config.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/config.c b/config.c
> index cbb5a3bab74..bf89afbdab0 100644
> --- a/config.c
> +++ b/config.c
> @@ -2407,9 +2407,8 @@ int git_configset_get_value(struct config_set *cs, const char *key, const char *
>  	 */
>  	values = git_configset_get_value_multi(cs, key);
>  
> -	if (!values)
> +	if (!values || !values->nr)
>  		return 1;
> -	assert(values->nr > 0);
>  	*value = values->items[values->nr - 1].string;

But we need to guard against an empty list, obviously.  OK.

>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] config: return an empty list, not NULL
  2022-09-27 16:50     ` Derrick Stolee
@ 2022-09-27 19:18       ` Ævar Arnfjörð Bjarmason
  2022-09-28 13:46         ` Derrick Stolee
  0 siblings, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-27 19:18 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Derrick Stolee via GitGitGadget, git


On Tue, Sep 27 2022, Derrick Stolee wrote:

> On 9/27/2022 12:21 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Tue, Sep 27 2022, Derrick Stolee via GitGitGadget wrote:
>
>>>  /**
>>>   * Finds and returns the value list, sorted in order of increasing priority
>>>   * for the configuration variable `key`. When the configuration variable
>>> - * `key` is not found, returns NULL. The caller should not free or modify
>>> - * the returned pointer, as it is owned by the cache.
>>> + * `key` is not found, returns an empty list. The caller should not free or
>>> + * modify the returned pointer, as it is owned by the cache.
>>>   */
>>>  const struct string_list *git_config_get_value_multi(const char *key);
>> 
>> Aside from the "DWIM API" aspect of this (which I don't mind) I think
>> this is really taking the low-level function in the wrong direction, and
>> that we should just add a new simple wrapper instead.
>> 
>> I.e. both the pre-image API docs & this series gloss over the fact that
>> we'd not just return NULL here if the config wasn't there, but also if
>> git_config_parse_key() failed.
>> 
>> So it seems to me that a better direction would be starting with
>> something like the WIP below (which doesn't compile the whole code, I
>> stopped at config.[ch] and pack-bitmap.c). I.e. the same "int" return
>> and "dest" pattern that most other things in the config API have.
>
> Do you have an example where a caller would benefit from this
> distinction? Without such an example, I don't think it is worth
> creating such a huge change for purity's sake alone.

Not initially, I started poking at this because the CL/series/commits
says that we don't care about the case of non-existing keys, without
being clear as to why we want to conflate that with other errors we
might get from this API.

But after some digging I found:

	$ for k in a a.b. "'x.y"; do ./git for-each-repo --config=$k;  echo $?; done
	error: key does not contain a section: a
	0
	error: key does not contain variable name: a.b.
	0
	error: invalid key: 'x.y
	0
	
I.e. the repo_config_get_value_multi() you added in for-each-repo
doesn't distinguish between bad keys and non-existing keys, and returns
0 even though it printed an "error".

> I'm pretty happy that the diff for this series is an overall
> reduction in code, while also not being too large in the interim:
>
>  12 files changed, 39 insertions(+), 57 deletions(-)
>
> If all callers that use the *_multi() methods would only use the
> wrapper, then what is the point of doing the low-level manipulations?

I hacked up something that's at least RFC-quality based on this
approach, but CI is running etc., so not submitting it
now:

	https://github.com/git/git/compare/master...avar:git:avar/have-git_configset_get_value-use-dest-and-int-pattern

I think the resulting diff is more idiomatic API use, i.e. you ended up
with:

	        /* submodule.active is set */
	        sl = repo_config_get_value_multi(repo, "submodule.active");
	-       if (sl) {
	+       if (sl && sl->nr) {

But I ended up doing:

	        /* submodule.active is set */
	-       sl = repo_config_get_value_multi(repo, "submodule.active");
	-       if (sl) {
	+       if (!repo_config_get_const_value_multi(repo, "submodule.active", &sl)) {

Note the "const" in the function name, i.e. there's wrappers that handle
the case where we have a hardcoded key name, in which case we can BUG()
out if we'd return < 0, so all we have left is just "does key exist".

In any case, I'm all for having some simple wrapper for the common cases.

But I didn't find a single case where we actually needed this "never
give me a non-NULL list" behavior, it could just be generalized to
"let's have the API tell us if the key exist".

If you use the non-"const" API you can distinguish the err < 0 case, so
for-each-repo can now error out appropriately:

	$ ./git for-each-repo --config=a; echo $?
	error: key does not contain a section: a
	fatal: got bad config --config=a
	
	usage: git for-each-repo --config=<config> <command-args>
	
	    --config <config>     config key storing a list of repository paths
	
	129

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] [RFC] config API: return empty list, not NULL
  2022-09-27 14:08 [PATCH 0/5] [RFC] config API: return empty list, not NULL Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-09-27 14:08 ` [PATCH 5/5] *: expect a non-NULL list of config values Derrick Stolee via GitGitGadget
@ 2022-09-28  2:40 ` Junio C Hamano
  2022-09-28 18:38   ` Derrick Stolee
  5 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2022-09-28  2:40 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This work changes the behavior of asking for a multi-valued config key to
> return an empty list instead of a NULL value. This simplifies the handling
> of the result and is safer for development in the future.
>
> This is based on v4 of my unregister series [1]
>
> [1]
> https://lore.kernel.org/git/pull.1358.v4.git.1664287021.gitgitgadget@gmail.com/
>
> This idea came about due to a bug in the git maintenance unregister work
> where the result from git_config_get_value_multi() was sent directly to
> for_each_string_list_item() without checking for a NULL value first.
>
> I'm sending this as an RFC mostly because I'm not 100% sure this shift is
> worth the refactoring pain and effort. I personally think getting an empty
> list is a safer choice, but I can also understand if someone has a different
> opinion.

Thanks.

I actually am in favor of the idea that a NULL can be passed around
to signal the lack of a string_list (or the lack of a instance of
any "collection" type), and the current code is structured as such,
and it gives us extra flexibility.  Of course, we need to see if
that extra flexibility is worth it.

With a colleciton col, "if (col && col->nr)" checks if we have
something to work on.  But a code like this (which is a longhand for
the for_each_string_list_item() issue we just reencountered):

    col = git_get_some_collection(...);
    if (!col)
	return; /* no collection */
    if (!col->nr)
	git_add_to_some_collection(col, the default item);
    for (i = 0; i < col->nr; i++)
	do things on col.stuff[i];

can react differently to cases where we have an empty collection
and where we do not have any collection to begin with.  

The other side of the coin is that it would make it harder to treat
the lack of collection itself and the collection being empty the
same way.  The above code might need to become

    col = git_get_some_collection(...);
    if (!col)
	col = git_get_empty_collection();
    if (!col->nr)
	git_add_to_some_collection(col, the default item);
    for (i = 0; i < col->nr; i++)
	do things on col.stuff[i];

but if the "get the collection" thing returns an empty collection
when there is actually no collection, we can lose two lines from
there.

Between these two positions, I am in favor of the flexibility that
we can use NULL to signal the lack of collection, not a presence of
a collection with zero items, as it feels conceptually cleaner.

Counting the hunks in [2/5] and [5/5], it seems that "when no
variable with given key is defined, we return an empty set" makes us
to have more code in 7 places in [PATCH 2/5], while allowing us to
lose code in 10 places in [PATCH 5/5].

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] config: return an empty list, not NULL
  2022-09-27 19:18       ` Ævar Arnfjörð Bjarmason
@ 2022-09-28 13:46         ` Derrick Stolee
  2022-09-28 14:37           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 21+ messages in thread
From: Derrick Stolee @ 2022-09-28 13:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git

On 9/27/22 3:18 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Sep 27 2022, Derrick Stolee wrote:
> 
>> On 9/27/2022 12:21 PM, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Tue, Sep 27 2022, Derrick Stolee via GitGitGadget wrote:
>>
>>>>  /**
>>>>   * Finds and returns the value list, sorted in order of increasing priority
>>>>   * for the configuration variable `key`. When the configuration variable
>>>> - * `key` is not found, returns NULL. The caller should not free or modify
>>>> - * the returned pointer, as it is owned by the cache.
>>>> + * `key` is not found, returns an empty list. The caller should not free or
>>>> + * modify the returned pointer, as it is owned by the cache.
>>>>   */
>>>>  const struct string_list *git_config_get_value_multi(const char *key);
>>>
>>> Aside from the "DWIM API" aspect of this (which I don't mind) I think
>>> this is really taking the low-level function in the wrong direction, and
>>> that we should just add a new simple wrapper instead.
>>>
>>> I.e. both the pre-image API docs & this series gloss over the fact that
>>> we'd not just return NULL here if the config wasn't there, but also if
>>> git_config_parse_key() failed.
>>>
>>> So it seems to me that a better direction would be starting with
>>> something like the WIP below (which doesn't compile the whole code, I
>>> stopped at config.[ch] and pack-bitmap.c). I.e. the same "int" return
>>> and "dest" pattern that most other things in the config API have.
>>
>> Do you have an example where a caller would benefit from this
>> distinction? Without such an example, I don't think it is worth
>> creating such a huge change for purity's sake alone.
> 
> Not initially, I started poking at this because the CL/series/commits
> says that we don't care about the case of non-existing keys, without
> being clear as to why we want to conflate that with other errors we
> might get from this API.
> 
> But after some digging I found:
> 
> 	$ for k in a a.b. "'x.y"; do ./git for-each-repo --config=$k;  echo $?; done
> 	error: key does not contain a section: a
> 	0
> 	error: key does not contain variable name: a.b.
> 	0
> 	error: invalid key: 'x.y
> 	0
> 	
> I.e. the repo_config_get_value_multi() you added in for-each-repo
> doesn't distinguish between bad keys and non-existing keys, and returns
> 0 even though it printed an "error".

I can understand wanting to inform the user that they provided an
invalid key using a nonzero exit code. I can also understand that
the command does what is asked: it did nothing because the given
key has no values (because it can't). I think the use of an "error"
message balances things towards wanting a nonzero exit code.

>> I'm pretty happy that the diff for this series is an overall
>> reduction in code, while also not being too large in the interim:
>>
>>  12 files changed, 39 insertions(+), 57 deletions(-)
>>
>> If all callers that use the *_multi() methods would only use the
>> wrapper, then what is the point of doing the low-level manipulations?
> 
> I hacked up something that's at least RFC-quality based on this
> approach, but CI is running etc., so not submitting it
> now:
> 
> 	https://github.com/git/git/compare/master...avar:git:avar/have-git_configset_get_value-use-dest-and-int-pattern
> 
> I think the resulting diff is more idiomatic API use, i.e. you ended up
> with:
> 
> 	        /* submodule.active is set */
> 	        sl = repo_config_get_value_multi(repo, "submodule.active");
> 	-       if (sl) {
> 	+       if (sl && sl->nr) {

You're right that I forgot to change this one to "if (sl->nr)"
in patch 5.

> But I ended up doing:
> 
> 	        /* submodule.active is set */
> 	-       sl = repo_config_get_value_multi(repo, "submodule.active");
> 	-       if (sl) {
> 	+       if (!repo_config_get_const_value_multi(repo, "submodule.active", &sl)) {
> 
> Note the "const" in the function name, i.e. there's wrappers that handle
> the case where we have a hardcoded key name, in which case we can BUG()
> out if we'd return < 0, so all we have left is just "does key exist".

The problem here is that the block actually cares that the list is non-empty
and should not run if the list is empty. In that case, you would need to add
"&& sl->nr" to the condition.

I'm of course assuming that an empty list is different from an error. In
your for-each-repo example, we would not want to return a non-zero exit
code on an empty list, only on a bad key (or other I/O problem).

If we return a negative value on an error and the number of matches on
success, then this change could instead be "if (repo_config....() > 0)".

> In any case, I'm all for having some simple wrapper for the common cases
A simple wrapper would be nice, and be exactly the method as it is
updated in this series. The error-result version could be adopted when
there is reason to do so.

> But I didn't find a single case where we actually needed this "never
> give me a non-NULL list" behavior, it could just be generalized to
> "let's have the API tell us if the key exist".

Most cases want to feed the result into the for_each_string_list_item()
macro. Based on the changes in patch 5, I think the empty list is a
better pattern and leads to prettier code in almost all cases.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] config: return an empty list, not NULL
  2022-09-28 13:46         ` Derrick Stolee
@ 2022-09-28 14:37           ` Ævar Arnfjörð Bjarmason
  2022-09-28 18:10             ` Derrick Stolee
  0 siblings, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-28 14:37 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Derrick Stolee via GitGitGadget, git


On Wed, Sep 28 2022, Derrick Stolee wrote:

> On 9/27/22 3:18 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Tue, Sep 27 2022, Derrick Stolee wrote:
>> 
>>> On 9/27/2022 12:21 PM, Ævar Arnfjörð Bjarmason wrote:
>>>>
>>>> On Tue, Sep 27 2022, Derrick Stolee via GitGitGadget wrote:
>>>
>>>>>  /**
>>>>>   * Finds and returns the value list, sorted in order of increasing priority
>>>>>   * for the configuration variable `key`. When the configuration variable
>>>>> - * `key` is not found, returns NULL. The caller should not free or modify
>>>>> - * the returned pointer, as it is owned by the cache.
>>>>> + * `key` is not found, returns an empty list. The caller should not free or
>>>>> + * modify the returned pointer, as it is owned by the cache.
>>>>>   */
>>>>>  const struct string_list *git_config_get_value_multi(const char *key);
>>>>
>>>> Aside from the "DWIM API" aspect of this (which I don't mind) I think
>>>> this is really taking the low-level function in the wrong direction, and
>>>> that we should just add a new simple wrapper instead.
>>>>
>>>> I.e. both the pre-image API docs & this series gloss over the fact that
>>>> we'd not just return NULL here if the config wasn't there, but also if
>>>> git_config_parse_key() failed.
>>>>
>>>> So it seems to me that a better direction would be starting with
>>>> something like the WIP below (which doesn't compile the whole code, I
>>>> stopped at config.[ch] and pack-bitmap.c). I.e. the same "int" return
>>>> and "dest" pattern that most other things in the config API have.
>>>
>>> Do you have an example where a caller would benefit from this
>>> distinction? Without such an example, I don't think it is worth
>>> creating such a huge change for purity's sake alone.
>> 
>> Not initially, I started poking at this because the CL/series/commits
>> says that we don't care about the case of non-existing keys, without
>> being clear as to why we want to conflate that with other errors we
>> might get from this API.
>> 
>> But after some digging I found:
>> 
>> 	$ for k in a a.b. "'x.y"; do ./git for-each-repo --config=$k;  echo $?; done
>> 	error: key does not contain a section: a
>> 	0
>> 	error: key does not contain variable name: a.b.
>> 	0
>> 	error: invalid key: 'x.y
>> 	0
>> 	
>> I.e. the repo_config_get_value_multi() you added in for-each-repo
>> doesn't distinguish between bad keys and non-existing keys, and returns
>> 0 even though it printed an "error".
>
> I can understand wanting to inform the user that they provided an
> invalid key using a nonzero exit code. I can also understand that
> the command does what is asked: it did nothing because the given
> key has no values (because it can't). I think the use of an "error"
> message balances things towards wanting a nonzero exit code.

Right, to be clear I think 6c62f015520 (for-each-repo: do nothing on
empty config, 2021-01-08) is sensible, i.e. we want to return 0 on a
non-existing key.

We just shouldn't conflate that with e.g. these parse errors, which the
API squashing the underlying negative return values and the "NULL list"
imposes on the user.

>>> I'm pretty happy that the diff for this series is an overall
>>> reduction in code, while also not being too large in the interim:
>>>
>>>  12 files changed, 39 insertions(+), 57 deletions(-)
>>>
>>> If all callers that use the *_multi() methods would only use the
>>> wrapper, then what is the point of doing the low-level manipulations?
>> 
>> I hacked up something that's at least RFC-quality based on this
>> approach, but CI is running etc., so not submitting it
>> now:
>> 
>> 	https://github.com/git/git/compare/master...avar:git:avar/have-git_configset_get_value-use-dest-and-int-pattern
>> 
>> I think the resulting diff is more idiomatic API use, i.e. you ended up
>> with:
>> 
>> 	        /* submodule.active is set */
>> 	        sl = repo_config_get_value_multi(repo, "submodule.active");
>> 	-       if (sl) {
>> 	+       if (sl && sl->nr) {
>
> You're right that I forgot to change this one to "if (sl->nr)"
> in patch 5.

If I am I didn't mean to point that out, I ws just pointing out the
end-API use. I.e. int return value v.s. the "populate dest" pattern, but
yes, in your end-state you'd drop the "sl &&" part.

>> But I ended up doing:
>> 
>> 	        /* submodule.active is set */
>> 	-       sl = repo_config_get_value_multi(repo, "submodule.active");
>> 	-       if (sl) {
>> 	+       if (!repo_config_get_const_value_multi(repo, "submodule.active", &sl)) {
>> 
>> Note the "const" in the function name, i.e. there's wrappers that handle
>> the case where we have a hardcoded key name, in which case we can BUG()
>> out if we'd return < 0, so all we have left is just "does key exist".
>
> The problem here is that the block actually cares that the list is non-empty
> and should not run if the list is empty. In that case, you would need to add
> "&& sl->nr" to the condition.
>
> I'm of course assuming that an empty list is different from an error. In
> your for-each-repo example, we would not want to return a non-zero exit
> code on an empty list, only on a bad key (or other I/O problem).
>
> If we return a negative value on an error and the number of matches on
> success, then this change could instead be "if (repo_config....() > 0)".

Hrm, I think you're confusing the worldview your series here is
advocating for, and what I'm suggesting as an alternative.

There isn't any way on "master" to have "an empty list", that's a
worldview you're proposing. In particular your 1/5 here removes:

	assert(values->nr > 0);

More generally the config format has no notion of "an empty list", if
you have a valid key-value pair at all you have a list of ".nr >= 1".

The "empty list" is a construct you're introducing in this series,
because you wanted the idiom of passing things to
for_each_string_list_item.

I'm advocating for not going that route, and instead make the *_multi()
method like the rest of the config API. I.e. to use the "return int,
populate dest" pattern.

It's fine if we disagree, but I get the sense that it's not clear what
we're disagreeing *on* :)

>> In any case, I'm all for having some simple wrapper for the common cases
> A simple wrapper would be nice, and be exactly the method as it is
> updated in this series. The error-result version could be adopted when
> there is reason to do so.

Well, no :) We ended up with two different "simple wrapper[s]", mine
doesn't have this notion of a "struct string_list *list" with .nr == 0.

>> But I didn't find a single case where we actually needed this "never
>> give me a non-NULL list" behavior, it could just be generalized to
>> "let's have the API tell us if the key exist".
>
> Most cases want to feed the result into the for_each_string_list_item()
> macro. Based on the changes in patch 5, I think the empty list is a
> better pattern and leads to prettier code in almost all cases.

I updated the WIP RFC series I linked to upthread a bit since my initial
reply (the link is still good, I force-pushed), I then rebased your
series here on "master", below is a diff of some select files.

The overall diff is much bigger obviously (API changes and all), but the
below demonstrates some of the API changes (yours is "-", mine is
"+"). I've commented inline on some of it:

	diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c
	index 635ea5e15fd..16e9a76d04a 100644
	--- a/builtin/for-each-repo.c
	+++ b/builtin/for-each-repo.c
	@@ -29,6 +29,7 @@ int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
	 	static const char *config_key = NULL;
	 	int i, result = 0;
	 	const struct string_list *values;
	+	int err;
	 
	 	const struct option options[] = {
	 		OPT_STRING(0, "config", &config_key, N_("config"),
	@@ -42,8 +43,13 @@ int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
	 	if (!config_key)
	 		die(_("missing --config=<config>"));
	 
	-	values = repo_config_get_value_multi(the_repository,
	-					     config_key);
	+	err = repo_config_get_value_multi(the_repository, config_key, &values);
	+	if (err < 0)
	+		usage_msg_optf(_("got bad config --config=%s"),
	+			       for_each_repo_usage, options, config_key);
	+	else if (err)
	+		return 0;
	+
	 	for (i = 0; !result && i < values->nr; i++)
	 		result = run_command_on_repo(values->items[i].string, argc, argv);

Here we're relying an error to the user that we couldn't before, because
repo_config_get_value_multi() would return "NULL" for both "key is bad"
and "key doesn't exist". There's a corresponding test modification
below.

	diff --git a/builtin/gc.c b/builtin/gc.c
	index 1e9ac2ac7e3..94b77a88a99 100644
	--- a/builtin/gc.c
	+++ b/builtin/gc.c
	@@ -1472,9 +1472,7 @@ static int maintenance_register(int argc, const char **argv, const char *prefix)
	 	};
	 	int found = 0;
	 	const char *key = "maintenance.repo";
	-	char *config_value;
	 	char *maintpath = get_maintpath();
	-	struct string_list_item *item;
	 	const struct string_list *list;
	 
	 	argc = parse_options(argc, argv, prefix, options,
	@@ -1487,18 +1485,11 @@ static int maintenance_register(int argc, const char **argv, const char *prefix)
	 	git_config_set("maintenance.auto", "false");
	 
	 	/* Set maintenance strategy, if unset */
	-	if (!git_config_get_string("maintenance.strategy", &config_value))
	-		free(config_value);
	-	else
	+	if (git_config_lookup_value("maintenance.strategy"))
	 		git_config_set("maintenance.strategy", "incremental");

In looking at this I thought we were way overdue for a "does this key
exist?" helper, this and a few other API users use it.
	 
	-	list = git_config_get_value_multi(key);
	-	for_each_string_list_item(item, list) {
	-		if (!strcmp(maintpath, item->string)) {
	-			found = 1;
	-			break;
	-		}
	-	}
	+	if (!git_config_get_const_value_multi(key, &list))
	+		found = unsorted_string_list_has_string(list, maintpath);

So, it turns out that the initial reason you wanted the "pass NULL to
for_each_string_list_item" is actually something we can do with
unsorted_string_list_has_string(), which implements the same loop.

The difference here is *the* API difference we're discussing. I.e. we'll
never get a NULL "list", we'll instead always get a non-NULL list with
>= 1 item if we can get this key at all.

The "const value" helper is a wrapper that handles the "err < 0" case. I
cases where we hardcode the key it's a BUG() if we get "err < 0". The
wrapper is just:

	int err = git_configset_get_value_multi(cs, key, dest);
        if (err < 0)
		BUG("failed to parse constant key '%s'!", key);
	return err;

	[...]
	@@ -1547,13 +1537,8 @@ static int maintenance_unregister(int argc, const char **argv, const char *prefi
	 		usage_with_options(builtin_maintenance_unregister_usage,
	 				   options);
	 
	-	list = git_config_get_value_multi(key);
	-	for_each_string_list_item(item, list) {
	-		if (!strcmp(maintpath, item->string)) {
	-			found = 1;
	-			break;
	-		}
	-	}
	+	if (!git_config_get_const_value_multi(key, &list))
	+		found = unsorted_string_list_has_string(list, maintpath);

Ditto the same git_config_get_const_value_multi() &
unsorted_string_list_has_string() pattern.
	 
	 	if (found) {
	 		int rc;
	diff --git a/builtin/log.c b/builtin/log.c
	index 719ef966045..bdb87f6c42b 100644
	--- a/builtin/log.c
	+++ b/builtin/log.c
	@@ -182,13 +182,15 @@ static void set_default_decoration_filter(struct decoration_filter *decoration_f
	 	int i;
	 	char *value = NULL;
	 	struct string_list *include = decoration_filter->include_ref_pattern;
	-	struct string_list_item *item;
	-	const struct string_list *config_exclude =
	-			git_config_get_value_multi("log.excludeDecoration");
	+	const struct string_list *config_exclude;
	 
	-	for_each_string_list_item(item, config_exclude)
	-		string_list_append(decoration_filter->exclude_ref_config_pattern,
	-				   item->string);
	+	if (!git_config_get_const_value_multi("log.excludeDecoration",
	+					      &config_exclude)) {
	+		struct string_list_item *item;
	+		for_each_string_list_item(item, config_exclude)
	+			string_list_append(decoration_filter->exclude_ref_config_pattern,
	+					   item->string);
	+	}

Here's a case where we need to use for_each_string_list_item(), I think
it's nice how we can now scope the "item" variable.

	 	/*
	 	 * By default, decorate_all is disabled. Enable it if
	diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
	index 5a8b6120157..b758255f816 100644
	--- a/builtin/submodule--helper.c
	+++ b/builtin/submodule--helper.c
	@@ -552,7 +552,7 @@ static int module_init(int argc, const char **argv, const char *prefix)
	 	 * If there are no path args and submodule.active is set then,
	 	 * by default, only initialize 'active' modules.
	 	 */
	-	if (!argc && git_config_get_value_multi("submodule.active")->nr)
	+	if (!argc && !git_config_lookup_value("submodule.active"))
	 		module_list_active(&list);

You changed these in your 2/5, but they really just wanted the new "does
this key exist?" API. No need to construct the string_list just to throw
it away...
	 
	 	info.prefix = prefix;
	@@ -2720,7 +2720,7 @@ static int module_update(int argc, const char **argv, const char *prefix)
	 		 * If there are no path args and submodule.active is set then,
	 		 * by default, only initialize 'active' modules.
	 		 */
	-		if (!argc && git_config_get_value_multi("submodule.active")->nr)
	+		if (!argc && !git_config_lookup_value("submodule.active"))
	 			module_list_active(&list);

Ditto.
	 
	 		info.prefix = opt.prefix;
	@@ -3164,7 +3164,6 @@ static int config_submodule_in_gitmodules(const char *name, const char *var, con
	 static void configure_added_submodule(struct add_data *add_data)
	 {
	 	char *key;
	-	const char *val;
	 	struct child_process add_submod = CHILD_PROCESS_INIT;
	 	struct child_process add_gitmodules = CHILD_PROCESS_INIT;
	 
	@@ -3209,7 +3208,7 @@ static void configure_added_submodule(struct add_data *add_data)
	 	 * is_submodule_active(), since that function needs to find
	 	 * out the value of "submodule.active" again anyway.
	 	 */
	-	if (!git_config_get_string_tmp("submodule.active", &val)) {
	+	if (!git_config_lookup_value("submodule.active")) {
	 		/*
	 		 * If the submodule being added isn't already covered by the
	 		 * current configured pathspec, set the submodule's active flag

Ditto.

	diff --git a/submodule.c b/submodule.c
	index 06230961c80..4474cf9ed2d 100644
	--- a/submodule.c
	+++ b/submodule.c
	@@ -274,8 +274,7 @@ int is_tree_submodule_active(struct repository *repo,
	 	free(key);
	 
	 	/* submodule.active is set */
	-	sl = repo_config_get_value_multi(repo, "submodule.active");
	-	if (sl && sl->nr) {
	+	if (!repo_config_get_const_value_multi(repo, "submodule.active", &sl)) {
	 		struct pathspec ps;
	 		struct strvec args = STRVEC_INIT;
	 		const struct string_list_item *item;

Another "*the* API difference we're discussing". I.e. sure, your end
state would be "if (sl->nr)", but if we make it return "int"...

	diff --git a/t/helper/test-config.c b/t/helper/test-config.c
	index 90810946783..432ad047537 100644
	--- a/t/helper/test-config.c
	+++ b/t/helper/test-config.c
	@@ -95,8 +95,7 @@ int cmd__config(int argc, const char **argv)
	 			goto exit1;
	 		}
	 	} else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) {
	-		strptr = git_config_get_value_multi(argv[2]);
	-		if (strptr->nr) {
	+		if (!git_config_get_const_value_multi(argv[2], &strptr)) {
	 			for (i = 0; i < strptr->nr; i++) {
	 				v = strptr->items[i].string;
	 				if (!v)

Ditto, (this one converts away from your preferred API use).

	@@ -159,8 +158,7 @@ int cmd__config(int argc, const char **argv)
	 				goto exit2;
	 			}
	 		}
	-		strptr = git_configset_get_value_multi(&cs, argv[2]);
	-		if (strptr && strptr->nr) {
	+		if (!git_configset_get_const_value_multi(&cs, argv[2], &strptr)) {
	 			for (i = 0; i < strptr->nr; i++) {
	 				v = strptr->items[i].string;
	 				if (!v)

Ditto, sans that you'd presumably want s/strptr && // here.

	diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh
	index 4675e852517..115221c9ca5 100755
	--- a/t/t0068-for-each-repo.sh
	+++ b/t/t0068-for-each-repo.sh
	@@ -33,4 +33,10 @@ test_expect_success 'do nothing on empty config' '
	 	git for-each-repo --config=bogus.config -- help --no-such-option
	 '
	 
	+test_expect_success 'error on bad config keys' '
	+	test_expect_code 129 git for-each-repo --config=a &&
	+	test_expect_code 129 git for-each-repo --config=a.b. &&
	+	test_expect_code 129 git for-each-repo --config="'\''.b"
	+'
	+
	 test_done

A test showing behavior change we can implement now that we don't sweep
the "err < 0" under the rug.

That branch also grew to have some other changes we may or may not want,
one thing was to convert the various *_get_*() functionts that now
normalize the non-zero return value with e.g.:

	 int git_configset_get_int(struct config_set *cs, const char *key, int *dest)
	 {
	        const char *value;
	-       if (!git_configset_get_value(cs, key, &value)) {
	-               *dest = git_config_int(key, value);
	-               return 0;
	-       } else
	-               return 1;
	+       int err;
	+
	+       if ((err = git_configset_get_value(cs, key, &value)))
	+               return err;
	+       *dest = git_config_int(key, value);
	+       return 0;
	 }

No caller currently cares about it, but I think it makes sense generally
not to throw away errors if we can (whether that part is worth the churn
is another topic).

Anyway, the reason I started looking at this RFC to begin with was
because this *_multi() part of the config API has often seemed odd to
me, i.e. I wondered why we couldn't just have it use the "return int,
populate dest" pattern. I'd just never tried to see if I could get that
to work.

It's a bit of one-off churn to get to this point, but I think the end
result of having all the API functions act the same way to signal key
existence v.s. validity is worth it.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] *: relax git_configset_get_value_multi result
  2022-09-27 14:08 ` [PATCH 2/5] *: relax git_configset_get_value_multi result Derrick Stolee via GitGitGadget
@ 2022-09-28 15:58   ` Taylor Blau
  0 siblings, 0 replies; 21+ messages in thread
From: Taylor Blau @ 2022-09-28 15:58 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee

On Tue, Sep 27, 2022 at 02:08:28PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>

Should the patch message be:

  *: relax git_config_get_value_multi() result

instead of referring to git_configset_get_value_multi?

> ---
>  builtin/submodule--helper.c | 10 ++++++++--
>  submodule.c                 |  2 +-
>  t/helper/test-config.c      |  4 ++--
>  versioncmp.c                |  8 ++++----
>  4 files changed, 15 insertions(+), 9 deletions(-)

All looks good here. I did a git grep for git_config_get_value_multi()
to make sure all of those spots were covered here. No comments from me,
other than a little thinking aloud below on a couple of the conversions.

> @@ -552,7 +553,9 @@ static int module_init(int argc, const char **argv, const char *prefix)
>  	 * If there are no path args and submodule.active is set then,
>  	 * by default, only initialize 'active' modules.
>  	 */
> -	if (!argc && git_config_get_value_multi("submodule.active"))
> +	if (!argc &&
> +	    (active_modules = git_config_get_value_multi("submodule.active")) &&
> +	    active_modules->nr)

Yuck ;-). I was going to suggest that the addition of a helper function
something like:

    static int any_configured(const char *key)
    {
      const struct string_list *vals = git_configset_get_value_multi(key);
      return vals && vals->nr;
    }

would be worthwhile for this and the below conversion, but the change at
the end of this series makes it a moot point. So the temporary eye-sore
is just fine.

> diff --git a/t/helper/test-config.c b/t/helper/test-config.c
> index 4ba9eb65606..62644dd71d7 100644
> --- a/t/helper/test-config.c
> +++ b/t/helper/test-config.c
> @@ -96,7 +96,7 @@ int cmd__config(int argc, const char **argv)
>  		}
>  	} else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) {
>  		strptr = git_config_get_value_multi(argv[2]);
> -		if (strptr) {
> +		if (strptr && strptr->nr) {
>  			for (i = 0; i < strptr->nr; i++) {
>  				v = strptr->items[i].string;
>  				if (!v)

Good catch. I was thinking that this whole "if" statement could have
been dropped, but the goto that is hidden by the context meant that this
*would* have been a behavior change otherwise. So the conversion here is
appropriate.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] config: return an empty list, not NULL
  2022-09-28 14:37           ` Ævar Arnfjörð Bjarmason
@ 2022-09-28 18:10             ` Derrick Stolee
  2022-09-28 19:33               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 21+ messages in thread
From: Derrick Stolee @ 2022-09-28 18:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git

On 9/28/2022 10:37 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Sep 28 2022, Derrick Stolee wrote:

>> If we return a negative value on an error and the number of matches on
>> success, then this change could instead be "if (repo_config....() > 0)".
> 
> Hrm, I think you're confusing the worldview your series here is
> advocating for, and what I'm suggesting as an alternative.
> 
> There isn't any way on "master" to have "an empty list", that's a
> worldview you're proposing. In particular your 1/5 here removes:
> 
> 	assert(values->nr > 0);

Yes, I think the lack of a key should be the same to an empty list
because it is logically equivalent and an empty list is safer to use
than a possibly-NULL list. That's what started this whole investigation.

By no longer returning NULL, we can eliminate an entire class of bugs
from happening.

> More generally the config format has no notion of "an empty list", if
> you have a valid key-value pair at all you have a list of ".nr >= 1".

The critical part is that you have a return code that has three modes:

 1. Negative: some kind of parsing error happened, such as a malformed
    'key' parameter.

 2. Zero: The 'key' was found with multiple values.

 3. Positive: The 'key' was not found. (Are there other innocuous
    errors that fit in this case?)

This "positive means not found" is very non-obvious.

Even with your goal of exposing those parsing errors when the 'key' is
user-supplied, I still think it would be better to provide a non-NULL
empty string_list and only care about these return values:

 1. Negative: some kind of parsing error happened.

 2. Nonnegative: parsing succeeded. The string_list has all found values
    (might be empty) and the return value is equal to the string_list's
    'nr' member.

In these cases, I see two modes of use.

First, check for an error and exit early (empty list no-ops naturally):

	if (git_config_get_const_value_multi(key, &list) < 0)
		return -1;

	for_each_string_list_item(item, list) {
		...
	}

Second, ignore errors. Care about non-empty list:

	if (git_config_get_const_value_multi("known.key", &list) > 0) {
		...
	}

But this is just a matter of taste at this point, and I'm getting the
feeling that my taste for reducing the chances of NULL references is
not very popular.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] [RFC] config API: return empty list, not NULL
  2022-09-28  2:40 ` [PATCH 0/5] [RFC] config API: return empty list, not NULL Junio C Hamano
@ 2022-09-28 18:38   ` Derrick Stolee
  2022-09-28 19:27     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 21+ messages in thread
From: Derrick Stolee @ 2022-09-28 18:38 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git

On 9/27/2022 10:40 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> This work changes the behavior of asking for a multi-valued config key to
>> return an empty list instead of a NULL value. This simplifies the handling
>> of the result and is safer for development in the future.
>>
>> This is based on v4 of my unregister series [1]
>>
>> [1]
>> https://lore.kernel.org/git/pull.1358.v4.git.1664287021.gitgitgadget@gmail.com/
>>
>> This idea came about due to a bug in the git maintenance unregister work
>> where the result from git_config_get_value_multi() was sent directly to
>> for_each_string_list_item() without checking for a NULL value first.
>>
>> I'm sending this as an RFC mostly because I'm not 100% sure this shift is
>> worth the refactoring pain and effort. I personally think getting an empty
>> list is a safer choice, but I can also understand if someone has a different
>> opinion.
> 
> Thanks.
> 
> I actually am in favor of the idea that a NULL can be passed around
> to signal the lack of a string_list (or the lack of a instance of
> any "collection" type), and the current code is structured as such,
> and it gives us extra flexibility.  Of course, we need to see if
> that extra flexibility is worth it.
> 
> With a colleciton col, "if (col && col->nr)" checks if we have
> something to work on.  But a code like this (which is a longhand for
> the for_each_string_list_item() issue we just reencountered):
> 
>     col = git_get_some_collection(...);
>     if (!col)
> 	return; /* no collection */
>     if (!col->nr)
> 	git_add_to_some_collection(col, the default item);
>     for (i = 0; i < col->nr; i++)
> 	do things on col.stuff[i];
> 
> can react differently to cases where we have an empty collection
> and where we do not have any collection to begin with.  
> 
> The other side of the coin is that it would make it harder to treat
> the lack of collection itself and the collection being empty the
> same way.  The above code might need to become
> 
>     col = git_get_some_collection(...);
>     if (!col)
> 	col = git_get_empty_collection();
>     if (!col->nr)
> 	git_add_to_some_collection(col, the default item);
>     for (i = 0; i < col->nr; i++)
> 	do things on col.stuff[i];
> 
> but if the "get the collection" thing returns an empty collection
> when there is actually no collection, we can lose two lines from
> there.

I'm all for conveying more information when possible, but how can
the config API provide a distinction between an empty list and a
NULL list? The only thing I can think about is a case where the
empty value clears the list and no new values are added, such as

	[bogus "key"]
		item = one
		item = two
		item =

With this, the key exists in the config file, but the multi-valued
list is empty. Is that an important distinction? I don't think so.

> Between these two positions, I am in favor of the flexibility that
> we can use NULL to signal the lack of collection, not a presence of
> a collection with zero items, as it feels conceptually cleaner.
> 
> Counting the hunks in [2/5] and [5/5], it seems that "when no
> variable with given key is defined, we return an empty set" makes us
> to have more code in 7 places in [PATCH 2/5], while allowing us to
> lose code in 10 places in [PATCH 5/5].

Outside of the "if (sl && sl->nr) {" that I forgot to convert into
"if (sl->nr) {" in patch 5, all of those 7 places that have "more code"
end up with only that extra "->nr". The code looks uglier temporarily
in patch 2 to create the compatibility mode so patch 4 can change only
the config API without test failures.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5] [RFC] config API: return empty list, not NULL
  2022-09-28 18:38   ` Derrick Stolee
@ 2022-09-28 19:27     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-28 19:27 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git


On Wed, Sep 28 2022, Derrick Stolee wrote:

> On 9/27/2022 10:40 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> 
>>> This work changes the behavior of asking for a multi-valued config key to
>>> return an empty list instead of a NULL value. This simplifies the handling
>>> of the result and is safer for development in the future.
>>>
>>> This is based on v4 of my unregister series [1]
>>>
>>> [1]
>>> https://lore.kernel.org/git/pull.1358.v4.git.1664287021.gitgitgadget@gmail.com/
>>>
>>> This idea came about due to a bug in the git maintenance unregister work
>>> where the result from git_config_get_value_multi() was sent directly to
>>> for_each_string_list_item() without checking for a NULL value first.
>>>
>>> I'm sending this as an RFC mostly because I'm not 100% sure this shift is
>>> worth the refactoring pain and effort. I personally think getting an empty
>>> list is a safer choice, but I can also understand if someone has a different
>>> opinion.
>> 
>> Thanks.
>> 
>> I actually am in favor of the idea that a NULL can be passed around
>> to signal the lack of a string_list (or the lack of a instance of
>> any "collection" type), and the current code is structured as such,
>> and it gives us extra flexibility.  Of course, we need to see if
>> that extra flexibility is worth it.
>> 
>> With a colleciton col, "if (col && col->nr)" checks if we have
>> something to work on.  But a code like this (which is a longhand for
>> the for_each_string_list_item() issue we just reencountered):
>> 
>>     col = git_get_some_collection(...);
>>     if (!col)
>> 	return; /* no collection */
>>     if (!col->nr)
>> 	git_add_to_some_collection(col, the default item);
>>     for (i = 0; i < col->nr; i++)
>> 	do things on col.stuff[i];
>> 
>> can react differently to cases where we have an empty collection
>> and where we do not have any collection to begin with.  
>> 
>> The other side of the coin is that it would make it harder to treat
>> the lack of collection itself and the collection being empty the
>> same way.  The above code might need to become
>> 
>>     col = git_get_some_collection(...);
>>     if (!col)
>> 	col = git_get_empty_collection();
>>     if (!col->nr)
>> 	git_add_to_some_collection(col, the default item);
>>     for (i = 0; i < col->nr; i++)
>> 	do things on col.stuff[i];
>> 
>> but if the "get the collection" thing returns an empty collection
>> when there is actually no collection, we can lose two lines from
>> there.
>
> I'm all for conveying more information when possible, but how can
> the config API provide a distinction between an empty list and a
> NULL list? The only thing I can think about is a case where the
> empty value clears the list and no new values are added, such as
>
> 	[bogus "key"]
> 		item = one
> 		item = two
> 		item =
>
> With this, the key exists in the config file, but the multi-valued
> list is empty.

It's not empty, that's a list with three items: ["one", "two", ""], the
last one is just the empty string.

We then have some stateful parsing logic for individual keys that
decides to apply the business logic that an empty value clears the list,
but none of that's implemented as part of the config API itself.

I think we might want to create some helper function for such a
special-cased "multi" list, but...

> Is that an important distinction? I don't think so.

...even then no, I don't think there should be a distinction. It
shouldn't be an "empty list" in the "list.nr == 0" sense if the config
API understood that sort of construct natively, it should just act as
though the key wasn't specified, surely...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] config: return an empty list, not NULL
  2022-09-28 18:10             ` Derrick Stolee
@ 2022-09-28 19:33               ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-28 19:33 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Derrick Stolee via GitGitGadget, git

On Wed, Sep 28 2022, Derrick Stolee wrote:

> On 9/28/2022 10:37 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Wed, Sep 28 2022, Derrick Stolee wrote:
>
>>> If we return a negative value on an error and the number of matches on
>>> success, then this change could instead be "if (repo_config....() > 0)".
>> 
>> Hrm, I think you're confusing the worldview your series here is
>> advocating for, and what I'm suggesting as an alternative.
>> 
>> There isn't any way on "master" to have "an empty list", that's a
>> worldview you're proposing. In particular your 1/5 here removes:
>> 
>> 	assert(values->nr > 0);
>
> Yes, I think the lack of a key should be the same to an empty list
> because it is logically equivalent and an empty list is safer to use
> than a possibly-NULL list. That's what started this whole investigation.
>
> By no longer returning NULL, we can eliminate an entire class of bugs
> from happening.

This saga was started because this code had a segfault:

	const struct string_list *list = git_config_get_value_multi(key);
	for_each_string_list_item(item, list) { ... found = 1 ... }

In general I completely agree with you that APIs should be safe by
default, especially when it comes to NULL. E.g. it's *very handy* that
the "buf" member of "struct strbuf" is never NULL.

But I just don't see it in this case, i.e. the alternative once we make
*_multi() act like the rest of the config API is:

	if (!git_config_get_const_value_multi(key, &list))
		for_each_string_list_item(item, list) { ... found = 1 ... }

Or, well, in my version:

	if (!git_config_get_const_value_multi(key, &list))
		found = unsorted_string_list_has_string(list, maintpath);

But that unsorted_string_list_has_string() pattern could also be used in
the former.

I've found this "if I have this data, then" behavior of the config API
to be very handy & safe to use, we just haven't used it for this one API
entry point.

>> More generally the config format has no notion of "an empty list", if
>> you have a valid key-value pair at all you have a list of ".nr >= 1".
>
> The critical part is that you have a return code that has three modes:
>
>  1. Negative: some kind of parsing error happened, such as a malformed
>     'key' parameter.

Critically not "some kind of parsing error", but "malformed key error",
we've already parsed the config itself at this point.

>  2. Zero: The 'key' was found with multiple values.

Well, found at all, maybe it has just one value.

>  3. Positive: The 'key' was not found. (Are there other innocuous
>     errors that fit in this case?)

It's just "not found", nothing else (there's no "return 2;" etc).

But in terms of simple API use if your key is a constant string
(i.e. the ones where we hardcode the key in the C source) we just need
to handle cases #2 and #3, hence why I added that _*const_value_multi()
wrapper.

Of course it's possible to get an error on
git_config_get_const_value_multi("mal.formed.key.", but we'd BUG() out
on that in development, and once we test it once we know the hardcoded
key is valid. So as an API it makes sense not to worry about it, but
just BUG() out.

You only need to handle this "malformed key" case on runtime in
git-config(1) itself, and e.g. "git for-each-repo --config=<key>", where
we get the key from the user

> This "positive means not found" is very non-obvious.

Is it? I find it quite idiomatic in C, but yeah, it would be odd in some
other contexts and/or languages.

In this case though every other function in the config API works that
way, i.e.:

	git grep 'if \(!.*config_get_'

So whatever our desires if it were a clean-room API (where a key
existing might return 1), surely consistency would win out at this
point.

> Even with your goal of exposing those parsing errors when the 'key' is
> user-supplied, I still think it would be better to provide a non-NULL
> empty string_list and only care about these return values:
>
>  1. Negative: some kind of parsing error happened.
>
>  2. Nonnegative: parsing succeeded. The string_list has all found values
>     (might be empty) and the return value is equal to the string_list's
>     'nr' member.

Can you show me how this sort of thing would look in your proposal:

	int bool;
        const struct string_list *list;
        int found = 0;

	if (!git_config_get_const_value_multi("a.list", &list))
		found = unsorted_string_list_has_string(list, "needle");
	if (!git_config_get_bool("a.bool", &bool))
		if (bool) ...;

AFAICT that would be:

	int bool;
        const struct string_list *list = git_config_get_value_multi("a.list");
        int found = unsorted_string_list_has_string(list, "needle");

	if (!git_config_get_bool("a.bool", &bool))
		if (bool) ...;

I find it unintuitive to make it act differently just because it's a
collection, so we can get away with basically stuffing the "existed?" in
the value itself (as .nr == 0).

> In these cases, I see two modes of use.
>
> First, check for an error and exit early (empty list no-ops naturally):
>
> 	if (git_config_get_const_value_multi(key, &list) < 0)
> 		return -1;

No, the _*const_value_multi() never returns < 0, that's a BUG(). So we
don't worry about that for the common case.

At the tip of my branch for this gc, log, versioncmp etc. all use that.

Which means that this right after:

> 	for_each_string_list_item(item, list) {
> 		...
> 	}

Would be an error, we'd have a NULL list if we got a return value of 1,
but the common case is:

	if (!git_config_get_const_value_multi(key, &list) {
		struct string_list_item *item;

		for_each_string_list_item(item, list) ...
	}

> Second, ignore errors. Care about non-empty list:
>
> 	if (git_config_get_const_value_multi("known.key", &list) > 0) {
> 		...
> 	}

Since that's the *_const_value_multi() not > 0, just "!fn()" will do,
since it's 0 or 1.

> But this is just a matter of taste at this point, and I'm getting the
> feeling that my taste for reducing the chances of NULL references is
> not very popular.

It's very popular with me :)

I just don't see how it's needed in *this case*, since we can entirely
sidestep the need for a dangerous NULL value by making it act like the
rest of the API.

 think the reason the bug in [1] happened is exactly because
git_config_get_value_multi(key) currently acts in a way where you need
to explicitly remember to handle the NULL, *and* if you forget to do so
it's easy to have it work.

I.e. in your (initial buggy) version:

	const struct string_list *list = git_config_get_value_multi(key);

	for_each_string_list_item(item, list) { /* boom, "list" can be null */

The problem being that you can easily write that idiomatically, have it
work in development (because you test with that in your config), and
then it blows up if you don't have that config.

But that's exactly why I think the rest of the config API is well
designed, because as soon as you need to write that as:

	const struct string_list *list;

	if (!git_config_get_const_value_multi(key, &list))
		for_each_string_list_item(item, list) { /* "list" can't be null /*

You're forced to structure your code in such a way that you can't really
forget the implicit NULL check, because to make use of the API at all
you need an "if I have the data, then..." block.

1. https://lore.kernel.org/git/8a8bffaec89e55da0c5bcac2f04331e0d4e69790.1664218087.git.gitgitgadget@gmail.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-09-28 20:08 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-27 14:08 [PATCH 0/5] [RFC] config API: return empty list, not NULL Derrick Stolee via GitGitGadget
2022-09-27 14:08 ` [PATCH 1/5] config: relax requirements on multi-value return Derrick Stolee via GitGitGadget
2022-09-27 17:26   ` Junio C Hamano
2022-09-27 14:08 ` [PATCH 2/5] *: relax git_configset_get_value_multi result Derrick Stolee via GitGitGadget
2022-09-28 15:58   ` Taylor Blau
2022-09-27 14:08 ` [PATCH 3/5] config: add BUG() statement instead of possible segfault Derrick Stolee via GitGitGadget
2022-09-27 16:17   ` Ævar Arnfjörð Bjarmason
2022-09-27 16:46     ` Derrick Stolee
2022-09-27 17:22       ` Ævar Arnfjörð Bjarmason
2022-09-27 14:08 ` [PATCH 4/5] config: return an empty list, not NULL Derrick Stolee via GitGitGadget
2022-09-27 16:21   ` Ævar Arnfjörð Bjarmason
2022-09-27 16:50     ` Derrick Stolee
2022-09-27 19:18       ` Ævar Arnfjörð Bjarmason
2022-09-28 13:46         ` Derrick Stolee
2022-09-28 14:37           ` Ævar Arnfjörð Bjarmason
2022-09-28 18:10             ` Derrick Stolee
2022-09-28 19:33               ` Ævar Arnfjörð Bjarmason
2022-09-27 14:08 ` [PATCH 5/5] *: expect a non-NULL list of config values Derrick Stolee via GitGitGadget
2022-09-28  2:40 ` [PATCH 0/5] [RFC] config API: return empty list, not NULL Junio C Hamano
2022-09-28 18:38   ` Derrick Stolee
2022-09-28 19:27     ` Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).