git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug)
@ 2018-03-29 15:18 Johannes Schindelin
  2018-03-29 15:18 ` [PATCH 1/9] git_config_set: fix off-by-two Johannes Schindelin
                   ` (12 more replies)
  0 siblings, 13 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-29 15:18 UTC
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

This patch series started out as a single patch trying to figure out what it
takes to fix that annoying bug that has been reported several times over the
years, where `git config --unset` would leave empty sections behind, and `git
config --add` would not reuse them.

Little did I know that this would turn not only into a full patch to fix this
issue, but into a full-blown series of nine patches.

The first patch is somewhat of a "while at it" bug fix that I first thought
would be a lot more critical than it actually is: It really only affects config
files that start with a section followed immediately (i.e. without a newline)
by a one-letter boolean setting (i.e. without a `= <value>` part). So while it
is a real bug fix, I doubt anybody ever got bitten by it.

Nonetheless, I would be comfortable with this patch going into v2.17.0, even at
this late stage. The final verdict is Junio's, of course.

The next swath of patches adds some tests and adjusts one test about which I
already complained at length yesterday, so I will spare you the same ordeal
today. These fixes are pretty straightforward, and I always try to keep my
added tests as concise as possible, so please tell me if you find a way to make
them smaller (without giving up readability and debuggability).

Finally, the interesting part, where I do two things, essentially (with
preparatory steps for each thing):

1. I add the ability for `git config --unset/--unset-all` to detect that it
   can remove a section that has just become empty (see below for some more
   discussion of what I consider "become empty"), and

2. I add the ability for `git config [--add] key value` to re-use empty
   sections.

Note that the --unset/--unset-all part is the hairy one, and I would love it if
people could concentrate on wrapping their heads around that function, and
obviously tell me how I can change it to make it more readable (or even point
out incorrect behavior).

Now, to the really important part: why does this patch series not conflict with
my very early statements that we cannot simply remove empty sections because we
may end up with stale comments?

Well, the patch in question takes pains to determine whether there are any
comments surrounding, or included in, the section. If any are found: previous
behavior. Under the assumption that the user edited the file, we keep it as
intact as possible (see below for some argument against this). If no comments
are found, and let's face it, this is probably *the* common case, as few people
edit their config files by hand these days (neither should they because it is
too easy to end up with an unparseable one), the now-empty section *is*
removed.

So what is the argument against this extra care to detect comments? Well, if
you have something like this:

	[section]
		; Here we comment about the variable called snarf
		snarf = froop

and we run `git config --unset section.snarf`, we end up with this config:

	[section]
		; Here we comment about the variable called snarf

which obviously does not make sense. However, that has been established
behavior for quite a few years, and I am not even trying to think of a way to
solve it.


Johannes Schindelin (9):
  git_config_set: fix off-by-two
  t1300: rename it to reflect that `repo-config` was deprecated
  t1300: avoid relying on a bug
  t1300: remove unreasonable expectation from TODO
  t1300: `--unset-all` can leave an empty section behind (bug)
  git_config_set: simplify the way the section name is remembered
  git config --unset: remove empty sections (in normal situations)
  git_config_set: use do_config_from_file() directly
  git_config_set: reuse empty sections

 config.c                                    | 234 +++++++++++++++++++++++++---
 t/{t1300-repo-config.sh => t1300-config.sh} |  36 ++++-
 2 files changed, 250 insertions(+), 20 deletions(-)
 rename t/{t1300-repo-config.sh => t1300-config.sh} (98%)


base-commit: 03df4959472e7d4b5117bb72ac86e1e2bcf21723
Published-As: https://github.com/dscho/git/releases/tag/empty-config-section-v1
Fetch-It-Via: git fetch https://github.com/dscho/git empty-config-section-v1
-- 
2.16.2.windows.1.26.g2cc3565eb4b


^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH 1/9] git_config_set: fix off-by-two
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
@ 2018-03-29 15:18 ` Johannes Schindelin
  2018-03-29 18:15   ` Stefan Beller
  2018-03-29 15:18 ` [PATCH 2/9] t1300: rename it to reflect that `repo-config` was deprecated Johannes Schindelin
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-29 15:18 UTC
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Currently, we are slightly overzealous when removing an entry from a
config file of this form:

	[abc]a
	[xyz]
		key = value

When calling `git config --unset abc.a` on this file, it leaves this
(invalid) config behind:

	[
	[xyz]
		key = value

The reason is that we try to search for the beginning of the line (or
for the end of the preceding section header on the same line) that
defines abc.a, but as an optimization, we subtract 2 from the offset
pointing just after the definition before we call
find_beginning_of_line(). That function, however, *also* performs that
optimization and promptly fails to find the section header correctly.
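
To illustrate the double adjustment, here is a minimal sketch. It assumes a
simplified stand-in for find_beginning_of_line() (the name
`sketch_find_beginning_of_line` is hypothetical, and continuation lines are
not handled); it is not the code in config.c, only a model of the backwards
scan described above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for find_beginning_of_line(): walk
 * backwards from just before the entry's trailing newline towards the
 * start of the line, remembering a ']' that closes a section header on
 * the same line. Continuation lines are deliberately not handled.
 */
static size_t sketch_find_beginning_of_line(const char *contents, size_t size,
					    size_t end_offset,
					    int *found_bracket)
{
	size_t equal_offset = size, bracket_offset = size;
	long offset;

	assert(contents && found_bracket);
	assert(end_offset >= 2 && end_offset <= size);

	/* The callee itself already skips the two characters before end_offset. */
	for (offset = (long)end_offset - 2;
	     offset > 0 && contents[offset] != '\n'; offset--) {
		if (contents[offset] == '=')
			equal_offset = (size_t)offset;
		else if (contents[offset] == ']')
			bracket_offset = (size_t)offset;
	}

	if (bracket_offset < equal_offset) {
		/* The entry shares its line with a section header. */
		*found_bracket = 1;
		return bracket_offset + 1;
	}
	return (size_t)offset + 1;
}
```

For the file shown above, the entry `a` ends at offset 7. Passing 7 lets the
scan see the `]` and return 5, so `[abc]` is kept. Passing 7-2 starts the scan
to the left of the `]`, the header is missed, and the function returns 1,
which is how the lone `[` ends up in the output.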

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config.c b/config.c
index b0c20e6cb8a..5cc049aaef0 100644
--- a/config.c
+++ b/config.c
@@ -2632,7 +2632,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			} else
 				copy_end = find_beginning_of_line(
 					contents, contents_sz,
-					store.offset[i]-2, &new_line);
+					store.offset[i], &new_line);
 
 			if (copy_end > 0 && contents[copy_end-1] != '\n')
 				new_line = 1;
-- 
2.16.2.windows.1.26.g2cc3565eb4b




* [PATCH 2/9] t1300: rename it to reflect that `repo-config` was deprecated
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
  2018-03-29 15:18 ` [PATCH 1/9] git_config_set: fix off-by-two Johannes Schindelin
@ 2018-03-29 15:18 ` Johannes Schindelin
  2018-03-29 19:42   ` Jeff King
  2018-03-29 15:18 ` [PATCH 3/9] t1300: avoid relying on a bug Johannes Schindelin
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-29 15:18 UTC
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/{t1300-repo-config.sh => t1300-config.sh} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename t/{t1300-repo-config.sh => t1300-config.sh} (100%)

diff --git a/t/t1300-repo-config.sh b/t/t1300-config.sh
similarity index 100%
rename from t/t1300-repo-config.sh
rename to t/t1300-config.sh
-- 
2.16.2.windows.1.26.g2cc3565eb4b




* [PATCH 3/9] t1300: avoid relying on a bug
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
  2018-03-29 15:18 ` [PATCH 1/9] git_config_set: fix off-by-two Johannes Schindelin
  2018-03-29 15:18 ` [PATCH 2/9] t1300: rename it to reflect that `repo-config` was deprecated Johannes Schindelin
@ 2018-03-29 15:18 ` Johannes Schindelin
  2018-03-29 19:43   ` Jeff King
  2018-03-29 15:18 ` [PATCH 4/9] t1300: remove unreasonable expectation from TODO Johannes Schindelin
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-29 15:18 UTC
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

The test case 'unset with cont. lines' relied on a bug that is about to
be fixed: it tests *explicitly* that removing the last entry from a
config section leaves an *empty* section behind.

Let's fix this test case not to rely on that behavior, simply by
preventing the section from becoming empty.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1300-config.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 4f8e6f5fde3..1ece7bad05f 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -108,6 +108,7 @@ bar = foo
 [beta]
 baz = multiple \
 lines
+foo = bar
 EOF
 
 test_expect_success 'unset with cont. lines' '
@@ -118,6 +119,7 @@ cat > expect <<\EOF
 [alpha]
 bar = foo
 [beta]
+foo = bar
 EOF
 
 test_expect_success 'unset with cont. lines is correct' 'test_cmp expect .git/config'
-- 
2.16.2.windows.1.26.g2cc3565eb4b




* [PATCH 4/9] t1300: remove unreasonable expectation from TODO
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
                   ` (2 preceding siblings ...)
  2018-03-29 15:18 ` [PATCH 3/9] t1300: avoid relying on a bug Johannes Schindelin
@ 2018-03-29 15:18 ` Johannes Schindelin
  2018-03-29 19:52   ` Jeff King
  2018-03-29 15:18 ` [PATCH 5/9] t1300: `--unset-all` can leave an empty section behind (bug) Johannes Schindelin
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-29 15:18 UTC
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

In https://public-inbox.org/git/7vvc8alzat.fsf@alter.siamese.dyndns.org/
a reasonable patch was made quite a bit less so by changing a test case
demonstrating a bug to a test case that demonstrates that we ask for too
much: the test case 'unsetting the last key in a section removes header'
now expects a future bug fix to be able to determine whether a free-form
comment above a section header refers to said section or not.

Rather than shooting for the stars (and not even getting off the
ground), let's start shooting for something obtainable and be reasonably
confident that we *can* get it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1300-config.sh | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 1ece7bad05f..3ad3df0c83e 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1413,7 +1413,7 @@ test_expect_success 'urlmatch with wildcard' '
 '
 
 # good section hygiene
-test_expect_failure 'unsetting the last key in a section removes header' '
+test_expect_failure '--unset last key removes section (except if commented)' '
 	cat >.git/config <<-\EOF &&
 	# some generic comment on the configuration file itself
 	# a comment specific to this "section" section.
@@ -1427,6 +1427,25 @@ test_expect_failure 'unsetting the last key in a section removes header' '
 
 	cat >expect <<-\EOF &&
 	# some generic comment on the configuration file itself
+	# a comment specific to this "section" section.
+	[section]
+	# some intervening lines
+	# that should also be dropped
+
+	# please be careful when you update the above variable
+	EOF
+
+	git config --unset section.key &&
+	test_cmp expect .git/config &&
+
+	cat >.git/config <<-\EOF &&
+	[section]
+	key = value
+	[next-section]
+	EOF
+
+	cat >expect <<-\EOF &&
+	[next-section]
 	EOF
 
 	git config --unset section.key &&
-- 
2.16.2.windows.1.26.g2cc3565eb4b




* [PATCH 5/9] t1300: `--unset-all` can leave an empty section behind (bug)
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
                   ` (3 preceding siblings ...)
  2018-03-29 15:18 ` [PATCH 4/9] t1300: remove unreasonable expectation from TODO Johannes Schindelin
@ 2018-03-29 15:18 ` Johannes Schindelin
  2018-03-29 19:54   ` Jeff King
  2018-03-29 15:18 ` [PATCH 6/9] git_config_set: simplify the way the section name is remembered Johannes Schindelin
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-29 15:18 UTC
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

We already have a test demonstrating that removing the last entry from a
config section fails to remove the section header of the now-empty
section.

The same can happen, of course, if we remove the last entries in one fell
swoop. This is *also* a bug, and should be fixed at the same time.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1300-config.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 3ad3df0c83e..ff79a213567 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1452,6 +1452,17 @@ test_expect_failure '--unset last key removes section (except if commented)' '
 	test_cmp expect .git/config
 '
 
+test_expect_failure '--unset-all removes section if empty & uncommented' '
+	cat >.git/config <<-\EOF &&
+	[section]
+	key = value1
+	key = value2
+	EOF
+
+	git config --unset-all section.key &&
+	test_line_count = 0 .git/config
+'
+
 test_expect_failure 'adding a key into an empty section reuses header' '
 	cat >.git/config <<-\EOF &&
 	[section]
-- 
2.16.2.windows.1.26.g2cc3565eb4b




* [PATCH 6/9] git_config_set: simplify the way the section name is remembered
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
                   ` (4 preceding siblings ...)
  2018-03-29 15:18 ` [PATCH 5/9] t1300: `--unset-all` can leave an empty section behind (bug) Johannes Schindelin
@ 2018-03-29 15:18 ` Johannes Schindelin
  2018-03-29 15:19 ` [PATCH 7/9] git config --unset: remove empty sections (in normal situations) Johannes Schindelin
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-29 15:18 UTC
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

This not only reduces the number of lines, but also opens the door for
reusing the section name later (which the upcoming patch to remove
now-empty sections will do).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/config.c b/config.c
index 5cc049aaef0..d35dffa50de 100644
--- a/config.c
+++ b/config.c
@@ -2486,12 +2486,14 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 	struct lock_file lock = LOCK_INIT;
 	char *filename_buf = NULL;
 	char *contents = NULL;
+	char *section_name = NULL;
 	size_t contents_sz;
 
 	/* parse-key returns negative; flip the sign to feed exit(3) */
-	ret = 0 - git_config_parse_key(key, &store.key, &store.baselen);
+	ret = 0 - git_config_parse_key(key, &section_name, &store.baselen);
 	if (ret)
 		goto out_free;
+	store.key = section_name;
 
 	store.multi_replace = multi_replace;
 
@@ -2505,7 +2507,6 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 	fd = hold_lock_file_for_update(&lock, config_filename, 0);
 	if (fd < 0) {
 		error_errno("could not lock config file %s", config_filename);
-		free(store.key);
 		ret = CONFIG_NO_LOCK;
 		goto out_free;
 	}
@@ -2515,8 +2516,6 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 	 */
 	in_fd = open(config_filename, O_RDONLY);
 	if ( in_fd < 0 ) {
-		free(store.key);
-
 		if ( ENOENT != errno ) {
 			error_errno("opening %s", config_filename);
 			ret = CONFIG_INVALID_FILE; /* same as "invalid config file" */
@@ -2571,7 +2570,6 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		 */
 		if (git_config_from_file(store_aux, config_filename, NULL)) {
 			error("invalid config file %s", config_filename);
-			free(store.key);
 			if (store.value_regex != NULL &&
 			    store.value_regex != CONFIG_REGEX_NONE) {
 				regfree(store.value_regex);
@@ -2581,7 +2579,6 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			goto out_free;
 		}
 
-		free(store.key);
 		if (store.value_regex != NULL &&
 		    store.value_regex != CONFIG_REGEX_NONE) {
 			regfree(store.value_regex);
@@ -2682,6 +2679,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 
 out_free:
 	rollback_lock_file(&lock);
+	free(section_name);
 	free(filename_buf);
 	if (contents)
 		munmap(contents, contents_sz);
-- 
2.16.2.windows.1.26.g2cc3565eb4b




* [PATCH 7/9] git config --unset: remove empty sections (in normal situations)
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
                   ` (5 preceding siblings ...)
  2018-03-29 15:18 ` [PATCH 6/9] git_config_set: simplify the way the section name is remembered Johannes Schindelin
@ 2018-03-29 15:19 ` Johannes Schindelin
  2018-03-29 21:32   ` Jeff King
  2018-03-29 15:19 ` [PATCH 8/9] git_config_set: use do_config_from_file() directly Johannes Schindelin
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-29 15:19 UTC
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

The original reasoning for not removing section headers upon removal of
the last entry went like this: the user could have added comments about
the section, or about the entries therein, and if there were other
comments there, we would not know whether we should remove them.

In particular, a concocted example was presented that looked like this
(and was added to t1300):

	# some generic comment on the configuration file itself
	# a comment specific to this "section" section.
	[section]
	# some intervening lines
	# that should also be dropped

	key = value
	# please be careful when you update the above variable

The ideal thing for `git config --unset section.key` in this case would
be to leave only the first line behind, because all the other comments
are now obsolete.

However, this is unfeasible, short of adding a complete Natural Language
Processing module to Git, which seems not only a lot of work, but a
totally unreasonable feature (for little benefit to most users).

Now, the real kicker about this problem is: most users do not edit their
config files at all! In their use case, the config looks like this
instead:

	[section]
		key = value

... and it is totally obvious what should happen if the entry is
removed.

Let's generalize this observation to this conservative strategy: if we
are removing the last entry from a section, and there are no comments
inside that section nor surrounding it, then remove the entire section.
Otherwise behave as before: leave the now-empty section (including those
comments, even the one about the now-deleted entry).

We have to be careful, though, to handle also the case where there are
*multiple* entries that are removed: any subset of them might be the last
entries of their respective sections.
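
The comment check at the heart of this strategy can be sketched as a
line-based scan. This is a minimal sketch under stated assumptions, not the
code in this patch: the helper name `sketch_has_comment_line` is hypothetical,
and it only recognizes lines whose first non-blank character is ';' or '#'
(a comment trailing a regular line is not detected).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if any line in buf[0..len) has ';' or
 * '#' as its first non-blank character, 0 otherwise. Trailing comments
 * on regular lines are not detected.
 */
static int sketch_has_comment_line(const char *buf, size_t len)
{
	size_t i = 0;

	assert(buf);

	while (i < len) {
		/* Skip the leading blanks of the current line. */
		while (i < len && (buf[i] == ' ' || buf[i] == '\t'))
			i++;
		if (i < len && (buf[i] == ';' || buf[i] == '#'))
			return 1;
		/* Advance past the end of this line. */
		while (i < len && buf[i] != '\n')
			i++;
		if (i < len)
			i++;
	}
	return 0;
}
```

Under this model, a caller would consider a now-empty section removable only
if the region surrounding it yields 0; any comment line means "behave as
before".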

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c          | 181 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 t/t1300-config.sh |   4 +-
 2 files changed, 182 insertions(+), 3 deletions(-)

diff --git a/config.c b/config.c
index d35dffa50de..503aef4b318 100644
--- a/config.c
+++ b/config.c
@@ -2429,6 +2429,177 @@ static ssize_t find_beginning_of_line(const char *contents, size_t size,
 	return offset;
 }
 
+/*
+ * This function determines whether the offset is in a line that starts with a
+ * comment character.
+ *
+ * Note: it does *not* report when a regular line (section header, config
+ * setting) *ends* in a comment.
+ */
+static int is_in_comment_line(const char *contents, size_t offset)
+{
+	int comment = 0;
+
+	while (offset > 0)
+		switch (contents[--offset]) {
+		case ';':
+		case '#':
+			comment = 1;
+			break;
+		case '\n':
+			return comment; /* reached the start of this line */
+		case ' ':
+		case '\t':
+			continue;
+		default:
+			comment = 0;
+		}
+
+	return comment;
+}
+
+/*
+ * If we are about to unset the last key(s) in a section, and if there are
+ * no comments surrounding (or included in) the section, we will want to
+ * extend begin/end to remove the entire section.
+ *
+ * Note: the parameter `i_ptr` points to the index into the store.offset
+ * array, reflecting the end offset of the respective entry to be deleted.
+ * This index may be incremented if a section has more than one entry (which
+ * all are to be removed).
+ */
+static void maybe_remove_section(const char *contents, size_t size,
+				 const char *section_name,
+				 size_t section_name_len,
+				 size_t *begin, int *i_ptr, int *new_line)
+{
+	size_t begin2, end2;
+	int seen_section = 0, dummy, i = *i_ptr;
+
+	/*
+	 * First, make sure that this is the last key in the section, and that
+	 * there are no comments that are possibly about the current section.
+	 */
+next_entry:
+	for (end2 = store.offset[i]; end2 < size; end2++) {
+		switch (contents[end2]) {
+		case ' ':
+		case '\t':
+		case '\n':
+			continue;
+		case '\r':
+			if (++end2 < size && contents[end2] == '\n')
+				continue;
+			break;
+		case '[':
+			/* If the section name is repeated, continue */
+			if (end2 + 1 + section_name_len < size &&
+			    contents[end2 + section_name_len] == ']' &&
+			    !memcmp(contents + end2 + 1, section_name,
+				    section_name_len)) {
+				end2 += section_name_len;
+				continue;
+			}
+			goto look_before;
+		case ';':
+		case '#':
+			/* There is a comment, cannot remove this section */
+			return;
+		default:
+			/* There are other keys in that section */
+			break;
+		}
+
+		/*
+		 * Uh oh... we found something else in this section. But do
+		 * we want to remove this, too?
+		 */
+		if (++i >= store.seen)
+			return;
+
+		begin2 = find_beginning_of_line(contents, size, store.offset[i],
+						&dummy);
+		if (begin2 > end2)
+			return;
+
+		/* Looks like we want to remove the next one, too... */
+		goto next_entry;
+	}
+
+look_before:
+	/*
+	 * Now, ensure that this is the first key, and that there are no
+	 * comments before the entry nor before the section header.
+	 */
+	for (begin2 = *begin; begin2 > 0; )
+		switch (contents[begin2 - 1]) {
+		case ' ':
+		case '\t':
+			begin2--;
+			continue;
+		case '\n':
+			if (--begin2 > 0 && contents[begin2 - 1] == '\r')
+				begin2--;
+			continue;
+		case ']':
+			if (begin2 > section_name_len + 1 &&
+			    contents[begin2 - section_name_len - 2] == '[' &&
+			    !memcmp(contents + begin2 - section_name_len - 1,
+				    section_name, section_name_len)) {
+				begin2 -= section_name_len + 2;
+				seen_section = 1;
+				continue;
+			}
+
+			/*
+			 * It looks like a section header, but it could be a
+			 * comment instead...
+			 */
+			if (is_in_comment_line(contents, begin2))
+				return;
+
+			/*
+			 * We encountered the previous section header: This
+			 * really was the only entry, so remove the entire
+			 * section.
+			 */
+			if (contents[begin2] != '\n') {
+				begin2--;
+				*new_line = 1;
+			}
+
+			store.offset[i] = end2;
+			*begin = begin2;
+			*i_ptr = i;
+			return;
+		default:
+			/*
+			 * Any other character means it is either a comment or
+			 * a config setting; if it is a comment, we do not want
+			 * to remove this section. If it is a config setting,
+			 * we only want to remove this section if this is
+			 * already the next section.
+			 */
+			if (seen_section &&
+			    !is_in_comment_line(contents, begin2)) {
+				if (contents[begin2] != '\n') {
+					begin2--;
+					*new_line = 1;
+				}
+
+				store.offset[i] = end2;
+				*begin = begin2;
+				*i_ptr = i;
+			}
+			return;
+		}
+
+	/* This section extends to the beginning of the file. */
+	store.offset[i] = end2;
+	*begin = begin2;
+	*i_ptr = i;
+}
+
 int git_config_set_in_file_gently(const char *config_filename,
 				  const char *key, const char *value)
 {
@@ -2626,10 +2797,18 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 				store.offset[i] = copy_end = contents_sz;
 			} else if (store.state != KEY_SEEN) {
 				copy_end = store.offset[i];
-			} else
+			} else {
 				copy_end = find_beginning_of_line(
 					contents, contents_sz,
 					store.offset[i], &new_line);
+				if (!value)
+					maybe_remove_section(contents,
+							     contents_sz,
+							     section_name,
+							     store.baselen,
+							     &copy_end, &i,
+							     &new_line);
+			}
 
 			if (copy_end > 0 && contents[copy_end-1] != '\n')
 				new_line = 1;
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index ff79a213567..ecbcc9cf3d0 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1413,7 +1413,7 @@ test_expect_success 'urlmatch with wildcard' '
 '
 
 # good section hygiene
-test_expect_failure '--unset last key removes section (except if commented)' '
+test_expect_success '--unset last key removes section (except if commented)' '
 	cat >.git/config <<-\EOF &&
 	# some generic comment on the configuration file itself
 	# a comment specific to this "section" section.
@@ -1452,7 +1452,7 @@ test_expect_failure '--unset last key removes section (except if commented)' '
 	test_cmp expect .git/config
 '
 
-test_expect_failure '--unset-all removes section if empty & uncommented' '
+test_expect_success '--unset-all removes section if empty & uncommented' '
 	cat >.git/config <<-\EOF &&
 	[section]
 	key = value1
-- 
2.16.2.windows.1.26.g2cc3565eb4b




* [PATCH 8/9] git_config_set: use do_config_from_file() directly
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
                   ` (6 preceding siblings ...)
  2018-03-29 15:19 ` [PATCH 7/9] git config --unset: remove empty sections (in normal situations) Johannes Schindelin
@ 2018-03-29 15:19 ` Johannes Schindelin
  2018-03-29 21:38   ` Jeff King
  2018-03-29 15:19 ` [PATCH 9/9] git_config_set: reuse empty sections Johannes Schindelin
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-29 15:19 UTC
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Technically, it is the git_config_set_multivar_in_file_gently()
function that we modify here (but the oneline would get too long if we
were that precise).

This change prepares the git_config_set machinery to allow reusing empty
sections, by using the file-local function do_config_from_file()
directly (whose signature can then be changed without any effect outside
of config.c).

An incidental benefit is that we avoid a level of indirection, and we
also avoid calling flockfile()/funlockfile() when we already know that
we are not operating on stdin/stdout here.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/config.c b/config.c
index 503aef4b318..eb1e0d335fc 100644
--- a/config.c
+++ b/config.c
@@ -2706,6 +2706,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		struct stat st;
 		size_t copy_begin, copy_end;
 		int i, new_line = 0;
+		FILE *f;
 
 		if (value_regex == NULL)
 			store.value_regex = NULL;
@@ -2739,7 +2740,10 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		 * As a side effect, we make sure to transform only a valid
 		 * existing config file.
 		 */
-		if (git_config_from_file(store_aux, config_filename, NULL)) {
+		f = fopen_or_warn(config_filename, "r");
+		if (!f || do_config_from_file(store_aux, CONFIG_ORIGIN_FILE,
+					      config_filename, config_filename,
+					      f, NULL)) {
 			error("invalid config file %s", config_filename);
 			if (store.value_regex != NULL &&
 			    store.value_regex != CONFIG_REGEX_NONE) {
@@ -2747,8 +2751,11 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 				free(store.value_regex);
 			}
 			ret = CONFIG_INVALID_FILE;
+			if (f)
+				fclose(f);
 			goto out_free;
-		}
+		} else
+			fclose(f);
 
 		if (store.value_regex != NULL &&
 		    store.value_regex != CONFIG_REGEX_NONE) {
-- 
2.16.2.windows.1.26.g2cc3565eb4b




* [PATCH 9/9] git_config_set: reuse empty sections
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
                   ` (7 preceding siblings ...)
  2018-03-29 15:19 ` [PATCH 8/9] git_config_set: use do_config_from_file() directly Johannes Schindelin
@ 2018-03-29 15:19 ` Johannes Schindelin
  2018-03-29 21:50   ` Jeff King
  2018-03-29 17:58 ` [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Stefan Beller
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-29 15:19 UTC
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

It can happen quite easily that the last setting in a config section is
removed, and to avoid confusion when there are comments in the config
about that section, we keep a lone section header, i.e. an empty
section.

The code to add new entries in the config tries to be cute by reusing
the parsing code that is used to retrieve config settings, but that
poses the problem that the latter use case does *not* care about empty
sections, therefore even the former use case won't see them.

Fix this by introducing a mode where the parser reports also empty
sections (with a trailing '.' as tell-tale), and then using that when
adding new config entries.
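
A callback could recognize such a report roughly as follows. This is only a
sketch, and both helper names are hypothetical. It assumes the header is
reported as the section name followed by '.', e.g. "section.", and that
`baselen` is the length of the section part of the key being set, as
store.baselen is used elsewhere in this series.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: a reported name that ends in '.' is taken to be
 * a bare section header rather than a regular "section.key" setting.
 */
static int sketch_is_section_report(const char *var)
{
	size_t len;

	assert(var);
	len = strlen(var);
	return len > 0 && var[len - 1] == '.';
}

/*
 * Hypothetical helper: does the reported header belong to the section
 * of `key`? `baselen` is the length of the section part of `key`, so
 * key[baselen] is the dot separating section and variable name.
 */
static int sketch_report_matches_key(const char *var, const char *key,
				     size_t baselen)
{
	assert(var && key && baselen < strlen(key));
	return sketch_is_section_report(var) &&
	       strlen(var) == baselen + 1 &&
	       !strncmp(var, key, baselen + 1);
}
```

With key "section.key" and baselen 7, the report "section." matches, while
"other." and the regular setting "section.key" do not.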

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c          | 32 +++++++++++++++++++++++---------
 t/t1300-config.sh |  2 +-
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/config.c b/config.c
index eb1e0d335fc..b04c40f76bc 100644
--- a/config.c
+++ b/config.c
@@ -653,13 +653,15 @@ static int get_base_var(struct strbuf *name)
 	}
 }
 
-static int git_parse_source(config_fn_t fn, void *data)
+static int git_parse_source(config_fn_t fn, void *data,
+			    int include_section_headers)
 {
 	int comment = 0;
 	int baselen = 0;
 	struct strbuf *var = &cf->var;
 	int error_return = 0;
 	char *error_msg = NULL;
+	int saw_section_header = 0;
 
 	/* U+FEFF Byte Order Mark in UTF8 */
 	const char *bomptr = utf8_bom;
@@ -685,6 +687,16 @@ static int git_parse_source(config_fn_t fn, void *data)
 			if (cf->eof)
 				return 0;
 			comment = 0;
+			if (saw_section_header) {
+				if (include_section_headers) {
+					cf->linenr--;
+					error_return = fn(var->buf, NULL, data);
+					if (error_return < 0)
+						break;
+					cf->linenr++;
+				}
+				saw_section_header = 0;
+			}
 			continue;
 		}
 		if (comment || isspace(c))
@@ -700,6 +712,7 @@ static int git_parse_source(config_fn_t fn, void *data)
 				break;
 			strbuf_addch(var, '.');
 			baselen = var->len;
+			saw_section_header = 1;
 			continue;
 		}
 		if (!isalpha(c))
@@ -1398,7 +1411,8 @@ int git_default_config(const char *var, const char *value, void *dummy)
  * fgetc, ungetc, ftell of top need to be initialized before calling
  * this function.
  */
-static int do_config_from(struct config_source *top, config_fn_t fn, void *data)
+static int do_config_from(struct config_source *top, config_fn_t fn, void *data,
+			  int include_section_headers)
 {
 	int ret;
 
@@ -1410,7 +1424,7 @@ static int do_config_from(struct config_source *top, config_fn_t fn, void *data)
 	strbuf_init(&top->var, 1024);
 	cf = top;
 
-	ret = git_parse_source(fn, data);
+	ret = git_parse_source(fn, data, include_section_headers);
 
 	/* pop config-file parsing state stack */
 	strbuf_release(&top->value);
@@ -1423,7 +1437,7 @@ static int do_config_from(struct config_source *top, config_fn_t fn, void *data)
 static int do_config_from_file(config_fn_t fn,
 		const enum config_origin_type origin_type,
 		const char *name, const char *path, FILE *f,
-		void *data)
+		void *data, int include_section_headers)
 {
 	struct config_source top;
 
@@ -1436,12 +1450,12 @@ static int do_config_from_file(config_fn_t fn,
 	top.do_ungetc = config_file_ungetc;
 	top.do_ftell = config_file_ftell;
 
-	return do_config_from(&top, fn, data);
+	return do_config_from(&top, fn, data, include_section_headers);
 }
 
 static int git_config_from_stdin(config_fn_t fn, void *data)
 {
-	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin, data);
+	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin, data, 0);
 }
 
 int git_config_from_file(config_fn_t fn, const char *filename, void *data)
@@ -1452,7 +1466,7 @@ int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 	f = fopen_or_warn(filename, "r");
 	if (f) {
 		flockfile(f);
-		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, filename, f, data);
+		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, filename, f, data, 0);
 		funlockfile(f);
 		fclose(f);
 	}
@@ -1475,7 +1489,7 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
 	top.do_ungetc = config_buf_ungetc;
 	top.do_ftell = config_buf_ftell;
 
-	return do_config_from(&top, fn, data);
+	return do_config_from(&top, fn, data, 0);
 }
 
 int git_config_from_blob_oid(config_fn_t fn,
@@ -2743,7 +2757,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		f = fopen_or_warn(config_filename, "r");
 		if (!f || do_config_from_file(store_aux, CONFIG_ORIGIN_FILE,
 					      config_filename, config_filename,
-					      f, NULL)) {
+					      f, NULL, 1)) {
 			error("invalid config file %s", config_filename);
 			if (store.value_regex != NULL &&
 			    store.value_regex != CONFIG_REGEX_NONE) {
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index ecbcc9cf3d0..867397ae930 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1463,7 +1463,7 @@ test_expect_success '--unset-all removes section if empty & uncommented' '
 	test_line_count = 0 .git/config
 '
 
-test_expect_failure 'adding a key into an empty section reuses header' '
+test_expect_success 'adding a key into an empty section reuses header' '
 	cat >.git/config <<-\EOF &&
 	[section]
 	EOF
-- 
2.16.2.windows.1.26.g2cc3565eb4b

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug)
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
                   ` (8 preceding siblings ...)
  2018-03-29 15:19 ` [PATCH 9/9] git_config_set: reuse empty sections Johannes Schindelin
@ 2018-03-29 17:58 ` Stefan Beller
  2018-03-30 12:14   ` Johannes Schindelin
  2018-03-29 19:39 ` Jeff King
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 103+ messages in thread
From: Stefan Beller @ 2018-03-29 17:58 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Jason Frey,
	Philip Oakley

On Thu, Mar 29, 2018 at 8:18 AM, Johannes Schindelin
<johannes.schindelin@gmx.de> wrote:

> So what is the argument against this extra care to detect comments? Well, if
> you have something like this:
>
>         [section]
>                 ; Here we comment about the variable called snarf
>                 snarf = froop
>
> and we run `git config --unset section.snarf`, we end up with this config:
>
>         [section]
>                 ; Here we comment about the variable called snarf
>
> which obviously does not make sense. However, that is already established
> behavior for quite a few years, and I do not even try to think of a way how
> this could be solved.

By commenting out the key/value pair instead of deleting it.
It's called --unset, not --delete ;)

Now onto reviewing the patches.

Stefan

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-03-29 15:18 ` [PATCH 1/9] git_config_set: fix off-by-two Johannes Schindelin
@ 2018-03-29 18:15   ` Stefan Beller
  2018-03-29 19:41     ` Jeff King
  0 siblings, 1 reply; 103+ messages in thread
From: Stefan Beller @ 2018-03-29 18:15 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Jason Frey,
	Philip Oakley

On Thu, Mar 29, 2018 at 8:18 AM, Johannes Schindelin
<johannes.schindelin@gmx.de> wrote:
> Currently, we are slightly overzealous when removing an entry from a
> config file of this form:
>
>         [abc]a
>         [xyz]
>                 key = value
>
> When calling `git config --unset abc.a` on this file, it leaves this
> (invalid) config behind:
>
>         [
>         [xyz]
>                 key = value
>
> The reason is that we try to search for the beginning of the line (or
> for the end of the preceding section header on the same line) that
> defines abc.a, but as an optimization, we subtract 2 from the offset
> pointing just after the definition before we call
> find_beginning_of_line(). That function, however, *also* performs that
> optimization and promptly fails to find the section header correctly.

This commit message would be more convincing if we had it in test form.

    [abc]a

is not written by Git, but would be written from an outside tool or person
and we barely cope with it?

Thanks,
Stefan

>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  config.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/config.c b/config.c
> index b0c20e6cb8a..5cc049aaef0 100644
> --- a/config.c
> +++ b/config.c
> @@ -2632,7 +2632,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
>                         } else
>                                 copy_end = find_beginning_of_line(
>                                         contents, contents_sz,
> -                                       store.offset[i]-2, &new_line);
> +                                       store.offset[i], &new_line);
>
>                         if (copy_end > 0 && contents[copy_end-1] != '\n')
>                                 new_line = 1;
> --
> 2.16.2.windows.1.26.g2cc3565eb4b
>
>

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug)
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
                   ` (9 preceding siblings ...)
  2018-03-29 17:58 ` [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Stefan Beller
@ 2018-03-29 19:39 ` Jeff King
  2018-03-30 12:35   ` Johannes Schindelin
  2018-03-30 14:17 ` Ævar Arnfjörð Bjarmason
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
  12 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-03-29 19:39 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Thu, Mar 29, 2018 at 05:18:30PM +0200, Johannes Schindelin wrote:

> Little did I know that this would turn not only into a full patch to fix this
> issue, but into a full-blown series of nine patches.

It's amazing how often that happens. :)

> The first patch is somewhat of a "while at it" bug fix that I first thought
> would be a lot more critical than it actually is: It really only affects config
> files that start with a section followed immediately (i.e. without a newline)
> by a one-letter boolean setting (i.e. without a `= <value>` part). So while it
> is a real bug fix, I doubt anybody ever got bitten by it.

That makes me wonder if somebody could craft a malicious config to do
something bad. But I don't think so. Config is trusted already, and it
looks like this bug is both hard to trigger and doesn't result in any
kind of memory funniness, just a bogus output.

> Now, to the really important part: why does this patch series not conflict with
> my very early statements that we cannot simply remove empty sections because we
> may end up with stale comments?
> 
> Well, the patch in question takes pains to determine *iff* there are any
> comments surrounding, or included in, the section. If any are found: previous
> behavior. Under the assumption that the user edited the file, we keep it as
> intact as possible (see below for some argument against this). If no comments
> are found, and let's face it, this is probably *the* common case, as few people
> edit their config files by hand these days (neither should they because it is
> too easy to end up with an unparseable one), the now-empty section *is*
> removed.

I'm not against people editing their config files by hand. But I think
what you propose here makes a lot of sense, because it works as long as
you don't intermingle hand- and auto-editing in the same section (and it
even works if you do intermingle, as long as you don't use comments,
which are probably even more rare).

So it seems like quite a sensible compromise, and I think should make
most people happy.

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-03-29 18:15   ` Stefan Beller
@ 2018-03-29 19:41     ` Jeff King
  2018-03-30 12:32       ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-03-29 19:41 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Johannes Schindelin, git, Junio C Hamano, Thomas Rast,
	Phil Haack, Ævar Arnfjörð Bjarmason, Jason Frey,
	Philip Oakley

On Thu, Mar 29, 2018 at 11:15:33AM -0700, Stefan Beller wrote:

> > When calling `git config --unset abc.a` on this file, it leaves this
> > (invalid) config behind:
> >
> >         [
> >         [xyz]
> >                 key = value
> >
> > The reason is that we try to search for the beginning of the line (or
> > for the end of the preceding section header on the same line) that
> > defines abc.a, but as an optimization, we subtract 2 from the offset
> > pointing just after the definition before we call
> > find_beginning_of_line(). That function, however, *also* performs that
> > optimization and promptly fails to find the section header correctly.
> 
> This commit message would be more convincing if we had it in test form.

I agree a test might be nice. But I don't find the commit message
unconvincing at all. It explains pretty clearly why the bug occurs, and
you can verify it by looking at find_beginning_of_line.

>     [abc]a
> 
> is not written by Git, but would be written from an outside tool or person
> and we barely cope with it?

Yes, I don't think git would ever write onto the same line. But clearly
we should handle anything that's syntactically valid.

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/9] t1300: rename it to reflect that `repo-config` was deprecated
  2018-03-29 15:18 ` [PATCH 2/9] t1300: rename it to reflect that `repo-config` was deprecated Johannes Schindelin
@ 2018-03-29 19:42   ` Jeff King
  2018-03-30 12:37     ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-03-29 19:42 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Thu, Mar 29, 2018 at 05:18:40PM +0200, Johannes Schindelin wrote:

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  t/{t1300-repo-config.sh => t1300-config.sh} | 0
>  1 file changed, 0 insertions(+), 0 deletions(-)
>  rename t/{t1300-repo-config.sh => t1300-config.sh} (100%)

This has only been bugging me for oh, about 10 years. Thanks.

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 3/9] t1300: avoid relying on a bug
  2018-03-29 15:18 ` [PATCH 3/9] t1300: avoid relying on a bug Johannes Schindelin
@ 2018-03-29 19:43   ` Jeff King
  2018-03-30 12:38     ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-03-29 19:43 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Thu, Mar 29, 2018 at 05:18:45PM +0200, Johannes Schindelin wrote:

> The test case 'unset with cont. lines' relied on a bug that is about to
> be fixed: it tests *explicitly* that removing the last entry from a
> config section leaves an *empty* section behind.
> 
> Let's fix this test case not to rely on that behavior, simply by
> preventing the section from becoming empty.

Seems like a good solution. I don't think we care in particular about
testing a multi-line value at the end of the file.

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 4/9] t1300: remove unreasonable expectation from TODO
  2018-03-29 15:18 ` [PATCH 4/9] t1300: remove unreasonable expectation from TODO Johannes Schindelin
@ 2018-03-29 19:52   ` Jeff King
  2018-03-29 20:45     ` Junio C Hamano
  2018-03-30 12:42     ` Johannes Schindelin
  0 siblings, 2 replies; 103+ messages in thread
From: Jeff King @ 2018-03-29 19:52 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Thu, Mar 29, 2018 at 05:18:50PM +0200, Johannes Schindelin wrote:

> In https://public-inbox.org/git/7vvc8alzat.fsf@alter.siamese.dyndns.org/
> a reasonable patch was made quite a bit less so by changing a test case
> demonstrating a bug to a test case that demonstrates that we ask for too
> much: the test case 'unsetting the last key in a section removes header'
> now expects a future bug fix to be able to determine whether a free-form
> comment above a section header refers to said section or not.
> 
> Rather than shooting for the stars (and not even getting off the
> ground), let's start shooting for something obtainable and be reasonably
> confident that we *can* get it.

As I said before, I'm fine with turning this test into something more
realistic.

An obvious question is whether we should preserve the original
unrealistic parts by splitting it: the realistic parts into one
expect_failure (that we'd switch to expect_success by the end of this
series), and then an unrealistic one to serve as a documentation of the
ideal, with a comment explaining why it's unrealistic.

I doubt the "unrealistic" half would be serving much purpose though, so
I'm OK to see it get eliminated here.

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 5/9] t1300: `--unset-all` can leave an empty section behind (bug)
  2018-03-29 15:18 ` [PATCH 5/9] t1300: `--unset-all` can leave an empty section behind (bug) Johannes Schindelin
@ 2018-03-29 19:54   ` Jeff King
  0 siblings, 0 replies; 103+ messages in thread
From: Jeff King @ 2018-03-29 19:54 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Thu, Mar 29, 2018 at 05:18:53PM +0200, Johannes Schindelin wrote:

> We already have a test demonstrating that removing the last entry from a
> config section fails to remove the section header of the now-empty
> section.
> 
> The same can happen, of course, if we remove the last entries in one fell
> swoop. This is *also* a bug, and should be fixed at the same time.

Yep, makes sense, and the diff is obviously correct.

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 4/9] t1300: remove unreasonable expectation from TODO
  2018-03-29 19:52   ` Jeff King
@ 2018-03-29 20:45     ` Junio C Hamano
  2018-03-30 12:42     ` Johannes Schindelin
  1 sibling, 0 replies; 103+ messages in thread
From: Junio C Hamano @ 2018-03-29 20:45 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin, git, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Jeff King <peff@peff.net> writes:

> An obvious question is whether we should preserve the original
> unrealistic parts by splitting it: the realistic parts into one
> expect_failure (that we'd switch to expect_success by the end of this
> series), and then an unrealistic one to serve as a documentation of the
> ideal, with a comment explaining why it's unrealistic.
>
> I doubt the "unrealistic" half would be serving much purpose though, so
> I'm OK to see it get eliminated here.

Likewise.  The series looks good so far.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 7/9] git config --unset: remove empty sections (in normal situations)
  2018-03-29 15:19 ` [PATCH 7/9] git config --unset: remove empty sections (in normal situations) Johannes Schindelin
@ 2018-03-29 21:32   ` Jeff King
  2018-03-30 13:00     ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-03-29 21:32 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Thu, Mar 29, 2018 at 05:19:00PM +0200, Johannes Schindelin wrote:

> Let's generalize this observation to this conservative strategy: if we
> are removing the last entry from a section, and there are no comments
> inside that section nor surrounding it, then remove the entire section.
> Otherwise behave as before: leave the now-empty section (including those
> comments, even the one about the now-deleted entry).

Yep, as I said earlier, this makes a ton of sense to me.

> +/*
> + * This function determines whether the offset is in a line that starts with a
> + * comment character.
> + *
> + * Note: it does *not* report when a regular line (section header, config
> + * setting) *ends* in a comment.
> + */
> +static int is_in_comment_line(const char *contents, size_t offset)
> +{
> +	int comment = 0;
> +
> +	while (offset > 0)
> +		switch (contents[--offset]) {
> +		case ';':
> +		case '#':
> +			comment = 1;
> +			break;
> +		case '\n':
> +			break;
> +		case ' ':
> +		case '\t':
> +			continue;
> +		default:
> +			comment = 0;
> +		}
> +
> +	return comment;
> +}

This doesn't pay any attention to quoting, so I wondered if it would get
fooled by a line like:

  key = "this content has a # comment in it"

or even:

  [section "this section has a # comment in it"]

but those don't count because the line doesn't _start_ with the comment
character. Could we design one that does? This isn't valid:

  [section]
  key = multiline \
    # with comment

But I think this is:

  [section]
  key = "multiline \
    # with comment"

So let's see if we can fool it:

-- >8 --

cat >file <<-\EOF
[one]
key = "multiline \
  # with comment"
[two]
key = true
EOF

# should produce "multiline   # with comment"
./git config --file=file one.key

# this should ideally remove the section
./git config --file=file --unset two.key
cat file

-- 8< --

That seems to work as expected. I'm not 100% sure why, though, since I
thought we'd hit the "seen_section && !is_in_comment_line" bit of the
look_before loop. Running it through gdb, I'm not convinced that
is_in_comment_line is working correctly, though. Shouldn't it stop when
it sees the newline, and return "comment"? There's a "break" there, but
it doesn't break out of the loop due to the switch statement.

So we'll _always_ walk back to the beginning of file. So I suspect your
test passes because it does:

  # this is the start of the file
  [section]
  key = true

but:

  [anotherSection]
  key = true
  # a comment not at the start
  [section]
  key = true

does the wrong thing, and removes [section]. If we fix that bug like
this:

diff --git a/config.c b/config.c
index b04c40f76b..3b2c7e9387 100644
--- a/config.c
+++ b/config.c
@@ -2461,7 +2461,7 @@ static int is_in_comment_line(const char *contents, size_t offset)
 			comment = 1;
 			break;
 		case '\n':
-			break;
+			return comment;
 		case ' ':
 		case '\t':
 			continue;

then it keeps "[section]" correctly. But now if we go back to our funny
multiline example, it does the wrong thing (it keeps [two], even though
that's not _really_ a comment).

To be honest, I could live with that as an open bug. It's a pretty
ridiculous situation, and the worst case is that we err on the side of
caution and don't remove the section. And I think it would be hard to
fix. We could look for the continuation backslash when we find the
newline, but that gets fooled by:

  # a comment \
  # with a pointless backslash

You can't just notice the quote and say "oh, I'm in a quoted section"
because that gets fooled by:

  # a pointless "quote

To know whether that quote is valid or not, you have to find the other
quote. But doing that backwards is hard (if not impossible).
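
To make the `break`-inside-`switch` pitfall concrete, here is a minimal
standalone sketch. It is not the code from the patch: the `_buggy` and
`_fixed` suffixes, the two sample buffers and the `offset_of()` helper
are made up for illustration, and only the loop bodies follow the
snippets quoted above.

```c
#include <stddef.h>
#include <string.h>

/* Sample config with a comment that is not at the start of the file. */
static const char mid_file_comment[] =
	"[anotherSection]\n"
	"key = true\n"
	"# a comment not at the start\n"
	"[section]\n"
	"key = true\n";

/* Sample config whose quoted multi-line value looks like a comment line. */
static const char multiline_value[] =
	"[one]\n"
	"key = \"multiline \\\n"
	"  # with comment\"\n"
	"[two]\n"
	"key = true\n";

/* Hypothetical helper: offset of the first match (needle must occur). */
static size_t offset_of(const char *haystack, const char *needle)
{
	return (size_t)(strstr(haystack, needle) - haystack);
}

/* As quoted: `break` only leaves the switch, so '\n' never ends the scan. */
static int is_in_comment_line_buggy(const char *contents, size_t offset)
{
	int comment = 0;

	while (offset > 0)
		switch (contents[--offset]) {
		case ';':
		case '#':
			comment = 1;
			break;
		case '\n':
			break;
		case ' ':
		case '\t':
			continue;
		default:
			comment = 0;
		}

	return comment;
}

/* With the one-line fix: stop at the newline that precedes the line. */
static int is_in_comment_line_fixed(const char *contents, size_t offset)
{
	int comment = 0;

	while (offset > 0)
		switch (contents[--offset]) {
		case ';':
		case '#':
			comment = 1;
			break;
		case '\n':
			return comment;
		case ' ':
		case '\t':
			continue;
		default:
			comment = 0;
		}

	return comment;
}
```

Called with the offset of the newline just before `[section]` in the
first buffer, the buggy variant scans all the way back to
`[anotherSection]` and reports no comment, while the fixed variant stops
at the preceding newline and reports the comment line. On the second
buffer, the fixed variant also reports a comment for the line before
`[two]`, which is the over-cautious case described above.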

> +static void maybe_remove_section(const char *contents, size_t size,
> +				 const char *section_name,
> +				 size_t section_name_len,
> +				 size_t *begin, int *i_ptr, int *new_line)
> +{
> +	size_t begin2, end2;
> +	int seen_section = 0, dummy, i = *i_ptr;
> +
> +	/*
> +	 * First, make sure that this is the last key in the section, and that
> +	 * there are no comments that are possibly about the current section.
> +	 */
> +next_entry:
> +	for (end2 = store.offset[i]; end2 < size; end2++) {
> +		switch (contents[end2]) {
> +		case ' ':
> +		case '\t':
> +		case '\n':
> +			continue;
> +		case '\r':
> +			if (++end2 < size && contents[end2] == '\n')
> +				continue;
> +			break;
> +		case '[':
> +			/* If the section name is repeated, continue */
> +			if (end2 + 1 + section_name_len < size &&
> +			    contents[end2 + section_name_len] == ']' &&
> +			    !memcmp(contents + end2 + 1, section_name,
> +				    section_name_len)) {
> +				end2 += section_name_len;
> +				continue;
> +			}
> +			goto look_before;
> +		case ';':
> +		case '#':
> +			/* There is a comment, cannot remove this section */
> +			return;
> +		default:
> +			/* There are other keys in that section */
> +			break;
> +		}

OK, this all makes sense. We're scanning forward to find the next '[',
without finding any keys or comments. We don't have to worry about
quoting because we'd quit as soon as we see a key anyway. I like the
special-case for finding our same section name, since that would help
clean up cruft from existing versions of Git.

It looks like there may be an off-by-one, though. Should it be checking:

  contents[end2 + 1 + section_name_len] == ']'

to skip over the opening '['? In a simple example:

  [foo]
  bar = true
  [foo]

we don't seem to remove the second section header. It works with the
patch below:

diff --git a/config.c b/config.c
index b04c40f76b..48dcb52840 100644
--- a/config.c
+++ b/config.c
@@ -2508,10 +2508,10 @@ static void maybe_remove_section(const char *contents, size_t size,
 		case '[':
 			/* If the section name is repeated, continue */
 			if (end2 + 1 + section_name_len < size &&
-			    contents[end2 + section_name_len] == ']' &&
+			    contents[end2 + 1 + section_name_len] == ']' &&
 			    !memcmp(contents + end2 + 1, section_name,
 				    section_name_len)) {
-				end2 += section_name_len;
+				end2 += section_name_len + 1;
 				continue;
 			}
 			goto look_before;

Unfortunately I think this whole thing breaks down with subsections. If
we try this:

  [foo "subsection"]
  bar = true
  [foo "subsection"]

then the section_name variable contains "foo.subsection", which we can't
textually match. And we end up failing to remove either section (the
latter one because of this loop, and the former because of the same
problem in the look_before loop).
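
One possible shape for a helper that matches a header against a dotted
name is sketched below. Everything in it is an assumption for
illustration: the name `match_section_header()` is hypothetical, it only
handles the `[section]` and `[section "subsection"]` forms, and it
compares the section part case-insensitively and the subsection part
case-sensitively, with a backslash escaping the next character inside
the quotes.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if `line` starts with a header naming `name`
 * ("section" or "section.subsection"), return the header's length in
 * bytes (including both brackets); otherwise return 0.
 */
static size_t match_section_header(const char *line, const char *name)
{
	const char *p = line;
	const char *dot = strchr(name, '.');
	size_t section_len = dot ? (size_t)(dot - name) : strlen(name);
	const char *sub = dot ? dot + 1 : NULL;
	size_t i;

	if (*p++ != '[')
		return 0;

	/* The section part compares case-insensitively. */
	for (i = 0; i < section_len; i++, p++)
		if (tolower((unsigned char)*p) !=
		    tolower((unsigned char)name[i]))
			return 0;

	if (!sub)
		return *p == ']' ? (size_t)(p - line) + 1 : 0;

	/* At least one blank, then the quoted subsection. */
	if (*p != ' ' && *p != '\t')
		return 0;
	while (*p == ' ' || *p == '\t')
		p++;
	if (*p++ != '"')
		return 0;

	/* The subsection part is case-sensitive; '\' escapes one char. */
	for (; *sub; sub++, p++) {
		if (*p == '"')
			return 0; /* subsection in the file ended early */
		if (*p == '\\')
			p++;
		if (*p != *sub)
			return 0;
	}

	if (*p++ != '"')
		return 0;
	if (*p != ']')
		return 0;
	return (size_t)(p - line) + 1;
}
```

Because this scans forward from the opening bracket, it sidesteps the
textual `memcmp()` against "foo.subsection", but it does not by itself
help with the backwards scan in the look_before loop.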

> +		/*
> +		 * Uh oh... we found something else in this section. But do
> +		 * we want to remove this, too?
> +		 */
> +		if (++i >= store.seen)
> +			return;
> +
> +		begin2 = find_beginning_of_line(contents, size, store.offset[i],
> +						&dummy);
> +		if (begin2 > end2)
> +			return;
> +
> +		/* Looks like we want to remove the next one, too... */
> +		goto next_entry;
> +	}

OK, makes sense.

> +look_before:
> +	/*
> +	 * Now, ensure that this is the first key, and that there are no
> +	 * comments before the entry nor before the section header.
> +	 */
> +	for (begin2 = *begin; begin2 > 0; )
> +		switch (contents[begin2 - 1]) {
> +		case ' ':
> +		case '\t':
> +			begin2--;
> +			continue;
> +		case '\n':
> +			if (--begin2 > 0 && contents[begin2 - 1] == '\r')
> +				begin2--;
> +			continue;
> +		case ']':
> +			if (begin2 > section_name_len + 1 &&
> +			    contents[begin2 - section_name_len - 2] == '[' &&
> +			    !memcmp(contents + begin2 - section_name_len - 1,
> +				    section_name, section_name_len)) {
> +				begin2 -= section_name_len + 2;
> +				seen_section = 1;
> +				continue;
> +			}

OK, this is the backwards mirror image of the earlier part. Which makes
sense. And this handles the reverse case for the doubled section name:

  [foo]
  [foo]
  bar = true

because we'd hit this section-name check twice, and just set
"seen_section = 1" both times. So that works (modulo the subsection
parsing thing).

As far as quoting goes, we're coming from the back of each line now.
And I don't think we strictly require double-quotes around string
values. So imagine this:

  [one]
  foo = this has [brackets]
  bar = this does not

When deleting one.bar, we'd erroneously think that closing bracket is
the prior section header. I _think_ it behaves correctly, though,
because we then say "well, delete everything back to that bracket
character". Which happens to be the correct thing to do anyway.

But let's get more devious. What about this:

  [one]
  foo = fake section [one]
  bar = whatever

If I unset one.bar with your patch, I end up with the truncated:

  [one]
  foo = fake sectio

Yikes. This is obviously a ridiculous example, but the failure case is
pretty nasty.

Again, the tricky thing here is that we're parsing backwards. We don't
know what's syntactically relevant and what isn't.
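
The false positive can be reproduced in isolation. The sketch below
mirrors only the `case ']'` condition from the quoted loop; the function
name `header_ends_at()`, the `end_of_match()` helper and the sample
buffer are hypothetical and exist just for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* The "devious" example from above. */
static const char fake_header_config[] =
	"[one]\n"
	"foo = fake section [one]\n"
	"bar = whatever\n";

/* Hypothetical helper: offset just past the first match of needle. */
static size_t end_of_match(const char *haystack, const char *needle)
{
	return (size_t)(strstr(haystack, needle) - haystack) + strlen(needle);
}

/*
 * Same test as the `case ']'` branch: `pos` is the offset just past a
 * ']', and we look backwards for "[<name>" immediately before it.
 */
static int header_ends_at(const char *contents, size_t pos, const char *name)
{
	size_t len = strlen(name);

	return pos >= len + 2 &&
	       contents[pos - 1] == ']' &&
	       contents[pos - len - 2] == '[' &&
	       !memcmp(contents + pos - len - 1, name, len);
}
```

The check succeeds both at the end of the real `[one]` header and at the
end of the `foo` line, because nothing in a purely textual backwards
comparison distinguishes a header from a value that happens to end in
`[one]`.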

> +
> +			/*
> +			 * It looks like a section header, but it could be a
> +			 * comment instead...
> +			 */
> +			if (is_in_comment_line(contents, begin2))
> +				return;

This would get fooled if we allowed line continuation in subsection
names, like:

  [one "subsection\
     # with newline"]
  key = true

but it looks like our parser doesn't allow that (aside from it being
slightly insane, of course). Good.

> +			/*
> +			 * We encountered the previous section header: This
> +			 * really was the only entry, so remove the entire
> +			 * section.
> +			 */
> +			if (contents[begin2] != '\n') {
> +				begin2--;
> +				*new_line = 1;
> +			}
> +
> +			store.offset[i] = end2;
> +			*begin = begin2;
> +			*i_ptr = i;
> +			return;

OK, makes sense.

> +		default:
> +			/*
> +			 * Any other character means it is either a comment or
> +			 * a config setting; if it is a comment, we do not want
> +			 * to remove this section. If it is a config setting,
> +			 * we only want to remove this section if this is
> +			 * already the next section.
> +			 */
> +			if (seen_section &&
> +			    !is_in_comment_line(contents, begin2)) {
> +				if (contents[begin2] != '\n') {
> +					begin2--;
> +					*new_line = 1;
> +				}
> +
> +				store.offset[i] = end2;
> +				*begin = begin2;
> +				*i_ptr = i;
> +			}
> +			return;
> +		}

Here's where we get fooled by is_in_comment_line() that I showed at the
beginning. We don't have to worry about other quoting, because any key
(quoted or not) would cause us to abort, since it's in the section.

> +	/* This section extends to the beginning of the file. */
> +	store.offset[i] = end2;
> +	*begin = begin2;
> +	*i_ptr = i;
> +}

Right, makes sense.


Ok, phew. That was a tough read. So here's what I see:

  1. Minor bug in is_in_comment_line(), patch above.

  2. Minor bug in matching section names, patch above.

  3. Matching subsection names doesn't work. I think this should be
     fixable with a helper function which can match '[one "two"]' when
     given "one.two".

  4. Backwards parsing causes is_in_comment_line to trigger more than it
     should. I can live with that because the trigger is arcane, and the
     error behavior is pretty harmless.

  5. Backwards parsing can find a bogus section. Also arcane, but the
     error behavior is pretty scary.

(4) and (5) are the ones that I don't see a way to fix, given the
current way in which we do the config-writing (i.e., running it through
the regular read-parser and then trying to "patch up" the found
locations). I think that's also what's contributing to the code being
hard to read, since you end up doing quite a lot of manual re-parsing.

I think the sane way to do this would be to parse the whole thing into
a tree (that includes things like comments and whitespace), and then we
could much more easily manipulate that tree, without dealing with the
parsing (forwards _and_ backwards). But that's a pretty big change from
the current code.

It also potentially means duplicating the parsing logic, unless we teach
the regular reader to do the tree-parse, and then pick out the config
from that. That's likely much slower than the existing parser (since
we'd allocate a bunch of tree nodes instead of just dumping strings to
the callbacks). But these days we cache the parsed config anyway, so I'm
not sure if a slight slowdown would actually matter that much.

I guess the holy grail would be a parser which reports _all_ syntactic
events (section names, keys, comments, whitespace, etc) as a stream
without storing anything. And then the normal reader could just discard
the non-key events, and the writer here could build the tree from those
events.
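
A very rough sketch of what such an event stream could look like is
below. It is deliberately simplified and is not a proposal for the real
parser: it classifies whole lines only, so it ignores continuation
lines, quoting, trailing comments and entries that share a line with a
header (such as `[abc]a`). All names (`enum event_type`,
`parse_events()`, `count_events()`) are made up for this example.

```c
#include <stddef.h>

enum event_type { EV_WHITESPACE, EV_COMMENT, EV_SECTION, EV_ENTRY };

typedef void (*event_fn)(enum event_type type, size_t begin, size_t end,
			 void *data);

/* Report every line of buf[0..len) as one event covering [begin, end). */
static void parse_events(const char *buf, size_t len, event_fn fn, void *data)
{
	size_t pos = 0;

	while (pos < len) {
		size_t begin = pos, first;
		enum event_type type;

		/* Find the end of the line, including its newline if any. */
		while (pos < len && buf[pos] != '\n')
			pos++;
		if (pos < len)
			pos++;

		/* Skip leading blanks to find the first significant byte. */
		first = begin;
		while (first < pos && (buf[first] == ' ' || buf[first] == '\t'))
			first++;

		if (first == pos || buf[first] == '\n' || buf[first] == '\r')
			type = EV_WHITESPACE;
		else if (buf[first] == '#' || buf[first] == ';')
			type = EV_COMMENT;
		else if (buf[first] == '[')
			type = EV_SECTION;
		else
			type = EV_ENTRY;

		fn(type, begin, pos, data);
	}
}

struct event_counter {
	enum event_type want;
	int count;
};

/* Consumer that counts one kind of event and discards the rest. */
static void count_one_type(enum event_type type, size_t begin, size_t end,
			   void *data)
{
	struct event_counter *counter = data;

	(void)begin;
	(void)end;
	if (type == counter->want)
		counter->count++;
}

/* Count the events of type `want` in a NUL-terminated buffer. */
static int count_events(const char *buf, enum event_type want)
{
	struct event_counter counter = { want, 0 };
	size_t len = 0;

	while (buf[len])
		len++;
	parse_events(buf, len, count_one_type, &counter);
	return counter.count;
}
```

A reader would register a callback that acts only on `EV_ENTRY`, much
like `count_one_type()` ignores the events it was not asked for, while a
writer could record every event with its byte range and rebuild the file
from that.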

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 8/9] git_config_set: use do_config_from_file() directly
  2018-03-29 15:19 ` [PATCH 8/9] git_config_set: use do_config_from_file() directly Johannes Schindelin
@ 2018-03-29 21:38   ` Jeff King
  2018-03-30 13:02     ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-03-29 21:38 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Thu, Mar 29, 2018 at 05:19:04PM +0200, Johannes Schindelin wrote:

> Technically, it is the git_config_set_multivar_in_file_gently()
> function that we modify here (but the oneline would get too long if we
> were that precise).
> 
> This change prepares the git_config_set machinery to allow reusing empty
> sections, by using the file-local function do_config_from_file()
> directly (whose signature can then be changed without any effect outside
> of config.c).
> 
> An incidental benefit is that we avoid a level of indirection, and we
> also avoid calling flockfile()/funlockfile() when we already know that
> we are not operating on stdin/stdout here.

I'm not sure I understand that last paragraph. What does flockfile() have
to do with stdin/stdout?

The point of those calls is that we're locking the FILE handle, so that
it's safe for the lower-level config code to run getc_unlocked(), which
is faster.

So without those, we're calling getc_unlocked() without holding the
lock. I think it probably works in practice because we know that we're
single-threaded, but it seems a bit sketchy.

-Peff

* Re: [PATCH 9/9] git_config_set: reuse empty sections
  2018-03-29 15:19 ` [PATCH 9/9] git_config_set: reuse empty sections Johannes Schindelin
@ 2018-03-29 21:50   ` Jeff King
  2018-03-30 13:15     ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-03-29 21:50 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Thu, Mar 29, 2018 at 05:19:09PM +0200, Johannes Schindelin wrote:

> It can happen quite easily that the last setting in a config section is
> removed, and to avoid confusion when there are comments in the config
> about that section, we keep a lone section header, i.e. an empty
> section.
> 
> The code to add new entries in the config tries to be cute by reusing
> the parsing code that is used to retrieve config settings, but that
> poses the problem that the latter use case does *not* care about empty
> sections, therefore even the former use case won't see them.
> 
> Fix this by introducing a mode where the parser reports also empty
> sections (with a trailing '.' as tell-tale), and then using that when
> adding new config entries.

Heh, so it seems we are partway to the "event-stream" suggestion I made
earlier. I agree this is the right way to approach this problem.

I wondered if we allow keys to end in ".", but it seems that we don't.
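
Since a key cannot end in ".", a callback can tell the two kinds of
report apart by looking at the last character of the name. A rough
sketch, assuming only the usual (name, value, data) callback shape;
`is_section_header()`, `struct counts` and `count_cb()` are made-up
names for illustration:

```c
#include <string.h>

/* A name reported with a trailing '.' denotes a bare section header. */
static int is_section_header(const char *name)
{
	size_t len = strlen(name);
	return len > 0 && name[len - 1] == '.';
}

struct counts {
	int headers;
	int entries;
};

/* Same shape as a config callback: (name, value, caller data). */
static int count_cb(const char *name, const char *value, void *data)
{
	struct counts *c = data;

	(void)value;
	if (is_section_header(name))
		c->headers++;	/* only reported in the new mode */
	else
		c->entries++;
	return 0;
}
```

Existing callbacks never see such names unless the new mode is
requested, so they would not need to learn about the convention.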

> diff --git a/config.c b/config.c
> index eb1e0d335fc..b04c40f76bc 100644
> --- a/config.c
> +++ b/config.c
> @@ -653,13 +653,15 @@ static int get_base_var(struct strbuf *name)
>  	}
>  }
>  
> -static int git_parse_source(config_fn_t fn, void *data)
> +static int git_parse_source(config_fn_t fn, void *data,
> +			    int include_section_headers)

We already have a "struct config_options", but we do a terrible job of
passing it around (since it only impacts the include stuff right now,
and that all gets handled at a very outer level).

Rather than plumb this one int through everywhere, should we add it to
that struct and plumb the struct through?

-Peff

* Re: [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug)
  2018-03-29 17:58 ` [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Stefan Beller
@ 2018-03-30 12:14   ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 12:14 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Jason Frey,
	Philip Oakley

Hi Stefan,

On Thu, 29 Mar 2018, Stefan Beller wrote:

> On Thu, Mar 29, 2018 at 8:18 AM, Johannes Schindelin
> <johannes.schindelin@gmx.de> wrote:
> 
> > So what is the argument against this extra care to detect comments? Well, if
> > you have something like this:
> >
> >         [section]
> >                 ; Here we comment about the variable called snarf
> >                 snarf = froop
> >
> > and we run `git config --unset section.snarf`, we end up with this config:
> >
> >         [section]
> >                 ; Here we comment about the variable called snarf
> >
> > which obviously does not make sense. However, that is already established
> > behavior for quite a few years, and I do not even try to think of a way how
> > this could be solved.
> 
> By commenting out the key/value pair instead of deleting it.
> It's called --unset, not --delete ;)

That would open the door to new bug reports when a user starts with this
concocted config:

	[section]
		# This is a comment about the `key` setting
		key = value

and then does this:

	git config --unset section.key
	git config section.key value
	git config --unset section.key
	git config section.key value
	git config --unset section.key
	git config section.key value

and then ends up with a config like this:

	[section]
		# This is a comment about the `key` setting
		;key = value
		;key = value
		;key = value
		key = value

And note that the comment might be about `value` instead, so reusing a
commented-out `key` setting won't fly, either.

I *did* give this problem a couple of minutes of thought before writing my
assessment that is quoted above ;-)

Ciao,
Dscho

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-03-29 19:41     ` Jeff King
@ 2018-03-30 12:32       ` Johannes Schindelin
  2018-03-30 14:15         ` Ævar Arnfjörð Bjarmason
                           ` (2 more replies)
  0 siblings, 3 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 12:32 UTC (permalink / raw)
  To: Jeff King
  Cc: Stefan Beller, git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Jason Frey,
	Philip Oakley

Hi,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 11:15:33AM -0700, Stefan Beller wrote:
> 
> > > When calling `git config --unset abc.a` on this file, it leaves this
> > > (invalid) config behind:
> > >
> > >         [
> > >         [xyz]
> > >                 key = value
> > >
> > > The reason is that we try to search for the beginning of the line (or
> > > for the end of the preceding section header on the same line) that
> > > defines abc.a, but as an optimization, we subtract 2 from the offset
> > > pointing just after the definition before we call
> > > find_beginning_of_line(). That function, however, *also* performs that
> > > optimization and promptly fails to find the section header correctly.
> > 
> > This commit message would be more convincing if we had it in test form.
> 
> I agree a test might be nice. But I don't find the commit message
> unconvincing at all. It explains pretty clearly why the bug occurs, and
> you can verify it by looking at find_beginning_of_line.
> 
> >     [abc]a
> > 
> > is not written by Git, but would be written from an outside tool or person
> > and we barely cope with it?
> 
> Yes, I don't think git would ever write onto the same line. But clearly
> we should handle anything that's syntactically valid.

I was tempted to add the test case, because it is easy to test it.

But I then decided *not* to add it. Why? Testing is a balance between "can
do" and "need to do".

Can you imagine that I did *not* run the entire test suite before
submitting this patch series, because it takes an incredible *90 minutes*
to run *on a fast Windows machine*?

Seriously, this is hurting me. I do not complain about this due to some
mental illness forcing me to do it. I complain about this so often
*because it slows me down*, you gentle people. And you don't seem to care,
at least the test suite gets noticeably worse by the month. I frankly do
not know what to do about this, as you keep adding and adding and it gets
less and less feasible for me to run the full test suite. I seem to be
totally unable to get through to you with the message that this is a real
problem with a real need to get fixed.

So with this in mind, I do not want to add a test case for a concocted
example that won't affect anybody except users who *want* to trigger this
bug.

I hope you agree,
Dscho

P.S.: Of course I ran the entire test suite. Not on Windows, but in a
Linux VM, because Linux is what Git is fine-tuned for, most obviously so.
An alien digging up ancient Earth history in the far future might be
tempted to assume that Git was developed to develop Linux which was
developed to develop Git, and then ask herself why humans bothered at all.

I actually ran the entire test suite on Linux on every single patch, via
`git rebase -x "make -j15 DEVELOPER=1 test" @{u}`, as I usually do before
submitting a patch series.

And it *did* find an obscure bug in an earlier iteration, where
t5512-ls-remote.sh demonstrated that looking at only one entry at a time
is not enough: `git config --unset-all uploadpack.hiderefs` *also* needs
to remove the now-empty section, because we might end up with the empty
sections in the wrong order, and the order of [transfer] and [uploadpack]
*matters* if the transfer.hiderefs setting is negative and the
uploadpack.hiderefs setting is positive, as is the case in 'overrides work
between mixed transfer/upload-pack hideRefs'. (Side-note: this looks like
a pretty obvious design bug to me, as there is *no tooling* to switch
around the order of these settings. Even worse: if somebody gets
instructions to add those settings, and there is already a [transfer]
section in the config: you're out of luck! You will have to *know* that
the order matters, *and add a second [transfer] section manually*!)

* Re: [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug)
  2018-03-29 19:39 ` Jeff King
@ 2018-03-30 12:35   ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 12:35 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:18:30PM +0200, Johannes Schindelin wrote:
> 
> > The first patch is somewhat of a "while at it" bug fix that I first
> > thought would be a lot more critical than it actually is: It really
> > only affects config files that start with a section followed
> > immediately (i.e. without a newline) by a one-letter boolean setting
> > (i.e. without a `= <value>` part). So while it is a real bug fix, I
> > doubt anybody ever got bitten by it.
> 
> That makes me wonder if somebody could craft a malicious config to do
> something bad.

I thought about that, and could not think of anything other than social
engineering vectors. Even in that case, the error message is instructive
enough that the user should be able to fix the config without consulting
StackOverflow.

> > Now, to the really important part: why does this patch series not
> > conflict with my very early statements that we cannot simply remove
> > empty sections because we may end up with stale comments?
> > 
> > Well, the patch in question takes pains to determine *iff* there are
> > any comments surrounding, or included in, the section. If any are
> > found: previous behavior. Under the assumption that the user edited
> > the file, we keep it as intact as possible (see below for some
> > argument against this). If no comments are found, and let's face it,
> > this is probably *the* common case, as few people edit their config
> > files by hand these days (neither should they because it is too easy
> > to end up with an unparseable one), the now-empty section *is*
> > removed.
> 
> I'm not against people editing their config files by hand. But I think
> what you propose here makes a lot of sense, because it works as long as
> you don't intermingle hand- and auto-editing in the same section (and it
> even works if you do intermingle, as long as you don't use comments,
> which are probably even more rare).
> 
> So it seems like quite a sensible compromise, and I think should make
> most people happy.

Thanks for confirming my line of thinking,
Dscho

* Re: [PATCH 2/9] t1300: rename it to reflect that `repo-config` was deprecated
  2018-03-29 19:42   ` Jeff King
@ 2018-03-30 12:37     ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 12:37 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:18:40PM +0200, Johannes Schindelin wrote:
> 
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  t/{t1300-repo-config.sh => t1300-config.sh} | 0
> >  1 file changed, 0 insertions(+), 0 deletions(-)
> >  rename t/{t1300-repo-config.sh => t1300-config.sh} (100%)
> 
> This has only been bugging me for oh, about 10 years.

Yep.

We should have done that right after moving the builtins' code to
builtin/.

Which reminds me that we *still* do not have a lib/ where all the source
code for libgit.a lives. And then maybe standalone/ for the source code of
the non-builtin tools. And... this would make for a fine micro-project
next year, I guess. Or in ten.

Ciao,
Dscho

* Re: [PATCH 3/9] t1300: avoid relying on a bug
  2018-03-29 19:43   ` Jeff King
@ 2018-03-30 12:38     ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 12:38 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:18:45PM +0200, Johannes Schindelin wrote:
> 
> > The test case 'unset with cont. lines' relied on a bug that is about to
> > be fixed: it tests *explicitly* that removing the last entry from a
> > config section leaves an *empty* section behind.
> > 
> > Let's fix this test case not to rely on that behavior, simply by
> > preventing the section from becoming empty.
> 
> Seems like a good solution. I don't think we care in particular about
> testing a multi-line value at the end of the file.

... and if we did, we should have documented that.

Ciao,
Dscho

* Re: [PATCH 4/9] t1300: remove unreasonable expectation from TODO
  2018-03-29 19:52   ` Jeff King
  2018-03-29 20:45     ` Junio C Hamano
@ 2018-03-30 12:42     ` Johannes Schindelin
  1 sibling, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 12:42 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:18:50PM +0200, Johannes Schindelin wrote:
> 
> > In https://public-inbox.org/git/7vvc8alzat.fsf@alter.siamese.dyndns.org/
> > a reasonable patch was made quite a bit less so by changing a test case
> > demonstrating a bug to a test case that demonstrates that we ask for too
> > much: the test case 'unsetting the last key in a section removes header'
> > now expects a future bug fix to be able to determine whether a free-form
> > comment above a section header refers to said section or not.
> > 
> > Rather than shooting for the stars (and not even getting off the
> > ground), let's start shooting for something obtainable and be reasonably
> > confident that we *can* get it.
> 
> As I said before, I'm fine with turning this test into something more
> realistic.

Good.

Of course, I worked hard to come up with a patch series, i.e. I put in
some effort to placate anybody who would be offended by my accompanying
rant.

> An obvious question is whether we should preserve the original
> unrealistic parts by splitting it: the realistic parts into one
> expect_failure (that we'd switch to expect_success by the end of this
> series), and then an unrealistic one to serve as a documentation of the
> ideal, with a comment explaining why it's unrealistic.

As stated before, I think it would be a mistake to mark up this
unrealistic example with `test_expect_failure`. We do, after all, suggest
occasionally to grep for that when somebody asks what they could work on.
And you do not want to set somebody like that up for failure by pointing
them to such a "bug".

However, I did keep the example to demonstrate the expectation that
sections with surrounding comments are kept. That was very much intended.

And the reason I did not change the unrealistic example? So that it is
easier to review in our patch-based review process, where I try to avoid
hunks that might distract from the intent of the change.

Ciao,
Dscho

* Re: [PATCH 7/9] git config --unset: remove empty sections (in normal situations)
  2018-03-29 21:32   ` Jeff King
@ 2018-03-30 13:00     ` Johannes Schindelin
  2018-03-30 13:09       ` Jeff King
  0 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 13:00 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:19:00PM +0200, Johannes Schindelin wrote:
> 
> > Let's generalize this observation to this conservative strategy: if we
> > are removing the last entry from a section, and there are no comments
> > inside that section nor surrounding it, then remove the entire section.
> > Otherwise behave as before: leave the now-empty section (including those
> > comments, even the one about the now-deleted entry).
> 
> Yep, as I said earlier, this makes a ton of sense to me.
> 
> [... thorough review ...]

Thank you for taking the time (and figuring out my off-by-ones, am I not
the king of those?).

Your in-depth analysis of the backtracking approach also makes sense, in
particular the awful bug that looks very, very similar to what 1/9 fixes
elsewhere.

I'll take some time to go over your comments in detail, but there is one
suggestion that I think I'll want to pursue first:

> I guess the holy grail would be a parser which reports _all_ syntactic
> events (section names, keys, comments, whitespace, etc) as a stream
> without storing anything. And then the normal reader could just discard
> the non-key events, and the writer here could build the tree from those
> events.

I already changed the do_config_from_file()/do_config_from() code path to
allow for handing back section headers. And I *think* that approach should
be easily extended to allow for an optional callback for these syntactic
events (and we do not need more than that, as the parsed "tree" really is
a list: there is nothing nested about ini files, so we really only have a
linear list of blocks (event type, offset range)).
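
Such a linear list might look like the following sketch. All names
(`struct config_block`, `section_is_removable()`) are hypothetical, and
"surrounding" is interpreted here as "a comment directly above the
header, ignoring whitespace", which is an assumption rather than what
the patch series implements:

```c
#include <stddef.h>

enum block_type {
	BLOCK_WHITESPACE,
	BLOCK_COMMENT,
	BLOCK_SECTION,
	BLOCK_ENTRY
};

struct config_block {
	enum block_type type;
	size_t begin, end;	/* byte range [begin, end) in the file */
};

/*
 * Return 1 if the section whose header is blocks[hdr] contains no entry
 * other than blocks[entry] and has no comment inside it or directly
 * above its header, i.e. removing that entry may drop the header too.
 */
static int section_is_removable(const struct config_block *blocks, size_t nr,
				size_t hdr, size_t entry)
{
	size_t i;

	if (hdr >= nr || blocks[hdr].type != BLOCK_SECTION)
		return 0;

	/* Look backwards past whitespace for a comment above the header. */
	for (i = hdr; i > 0 && blocks[i - 1].type == BLOCK_WHITESPACE; i--)
		; /* skip */
	if (i > 0 && blocks[i - 1].type == BLOCK_COMMENT)
		return 0;

	/* Scan the section body up to the next header or end of file. */
	for (i = hdr + 1; i < nr && blocks[i].type != BLOCK_SECTION; i++) {
		if (blocks[i].type == BLOCK_COMMENT)
			return 0;
		if (blocks[i].type == BLOCK_ENTRY && i != entry)
			return 0;
	}
	return 1;
}
```

With the list built in one forward pass, this check needs no backwards
parsing of the file contents at all.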

I'll think about this a little bit, and hopefully come back with v2 in a
while that uses that approach.

Thank you so much for that suggestion,
Dscho

* Re: [PATCH 8/9] git_config_set: use do_config_from_file() directly
  2018-03-29 21:38   ` Jeff King
@ 2018-03-30 13:02     ` Johannes Schindelin
  2018-03-30 13:14       ` Jeff King
  0 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 13:02 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:19:04PM +0200, Johannes Schindelin wrote:
> 
> > Technically, it is the git_config_set_multivar_in_file_gently()
> > function that we modify here (but the oneline would get too long if we
> > were that precise).
> > 
> > This change prepares the git_config_set machinery to allow reusing empty
> > sections, by using the file-local function do_config_from_file()
> > directly (whose signature can then be changed without any effect outside
> > of config.c).
> > 
> > An incidental benefit is that we avoid a level of indirection, and we
> > also avoid calling flockfile()/funlockfile() when we already know that
> > we are not operating on stdin/stdout here.
> 
> I'm not sure I understand that last paragraph. What does flockfile() have
> to do with stdin/stdout?
> 
> The point of those calls is that we're locking the FILE handle, so that
> it's safe for the lower-level config code to run getc_unlocked(), which
> is faster.
> 
> So without those, we're calling getc_unlocked() without holding the
> lock. I think it probably works in practice because we know that we're
> single-threaded, but it seems a bit sketchy.

Oops. I misunderstood the purpose of flockfile(), then. I thought it was
only about multiple users of stdin/stdout.

Will have a look whether flockfile()/funlockfile() can be moved into
do_config_from_file() instead.

Ciao,
Dscho

* Re: [PATCH 7/9] git config --unset: remove empty sections (in normal situations)
  2018-03-30 13:00     ` Johannes Schindelin
@ 2018-03-30 13:09       ` Jeff King
  0 siblings, 0 replies; 103+ messages in thread
From: Jeff King @ 2018-03-30 13:09 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Fri, Mar 30, 2018 at 03:00:06PM +0200, Johannes Schindelin wrote:

> > I guess the holy grail would be a parser which reports _all_ syntactic
> > events (section names, keys, comments, whitespace, etc) as a stream
> > without storing anything. And then the normal reader could just discard
> > the non-key events, and the writer here could build the tree from those
> > events.
> 
> I already changed the do_config_from_file()/do_config_from() code path to
> allow for handing back section headers. And I *think* that approach should
> be easily extended to allow for an optional callback for these syntactic
> events (and we do not need more than that, as the parsed "tree" really is
> a list: there is nothing nested about ini files, so we really only have a
> linear list of blocks (event type, offset range)).

True. I was thinking we'd want sections with keys, whitespace, and
comments under them. But even that does not really make sense. As this
patch series shows, comments do not "belong" to a section, and the file
really needs to be considered as a stream.

So yeah, if we can parse it into a sequence of events in one
forward-pass and then manipulate that sequence, I think it should be
sufficient (and _way_ more readable than the current code, even before
the bits you are trying to fix here).

> I'll think about this a little bit, and hopefully come back with v2 in a
> while that uses that approach.
> 
> Thank you so much for that suggestion,

Great. Thanks for working on this.

-Peff

* Re: [PATCH 8/9] git_config_set: use do_config_from_file() directly
  2018-03-30 13:02     ` Johannes Schindelin
@ 2018-03-30 13:14       ` Jeff King
  2018-03-30 14:01         ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-03-30 13:14 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Fri, Mar 30, 2018 at 03:02:00PM +0200, Johannes Schindelin wrote:

> > I'm not sure I understand that last paragraph. What does flockfile() have
> > to do with stdin/stdout?
> > 
> > The point of those calls is that we're locking the FILE handle, so that
> > it's safe for the lower-level config code to run getc_unlocked(), which
> > is faster.
> > 
> > So without those, we're calling getc_unlocked() without holding the
> > lock. I think it probably works in practice because we know that we're
> > single-threaded, but it seems a bit sketchy.
> 
> Oops. I misunderstood the purpose of flockfile(), then. I thought it was
> only about multiple users of stdin/stdout.
> 
> Will have a look whether flockfile()/funlockfile() can be moved into
> do_config_from_file() instead.

In a sense stdin/stdout are much more susceptible to this because
they're global variables, and any thread may touch them. For the config
code, we open our own handle that we don't expose elsewhere. So probably
it would be fine just to use the unlocked variants even without locking.

But IMHO it's good practice to always flockfile() before using the
unlocked variants. My reading of POSIX is that it's OK to use the
unlocked variants without holding the lock (if you know there won't be
contention), but if it's not hard to err on the side of safety, I'd
prefer it.
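
The pattern in question, as a generic sketch using only the POSIX stdio
calls (`count_lines()` is a made-up helper, not something in config.c):

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for flockfile() and friends */
#endif
#include <stdio.h>

/*
 * Count the newlines in an already-open stream. The lock is taken once
 * around the whole loop, so each getc_unlocked() call skips the
 * per-character locking that plain getc() would do.
 */
static long count_lines(FILE *f)
{
	long lines = 0;
	int c;

	flockfile(f);
	while ((c = getc_unlocked(f)) != EOF)
		if (c == '\n')
			lines++;
	funlockfile(f);
	return lines;
}
```

Holding the lock for the whole read keeps the fast path while staying
safe even if another thread ever gets hold of the handle.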

-Peff

* Re: [PATCH 9/9] git_config_set: reuse empty sections
  2018-03-29 21:50   ` Jeff King
@ 2018-03-30 13:15     ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 13:15 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Thu, 29 Mar 2018, Jeff King wrote:

> On Thu, Mar 29, 2018 at 05:19:09PM +0200, Johannes Schindelin wrote:
> 
> > It can happen quite easily that the last setting in a config section is
> > removed, and to avoid confusion when there are comments in the config
> > about that section, we keep a lone section header, i.e. an empty
> > section.
> > 
> > The code to add new entries in the config tries to be cute by reusing
> > the parsing code that is used to retrieve config settings, but that
> > poses the problem that the latter use case does *not* care about empty
> > sections, therefore even the former use case won't see them.
> > 
> > Fix this by introducing a mode where the parser reports also empty
> > sections (with a trailing '.' as tell-tale), and then using that when
> > adding new config entries.
> 
> Heh, so it seems we are partway to the "event-stream" suggestion I made
> earlier. I agree this is the right way to approach this problem.
> 
> I wondered if we allow keys to end in ".", but it seems that we don't.
> 
> > diff --git a/config.c b/config.c
> > index eb1e0d335fc..b04c40f76bc 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -653,13 +653,15 @@ static int get_base_var(struct strbuf *name)
> >  	}
> >  }
> >  
> > -static int git_parse_source(config_fn_t fn, void *data)
> > +static int git_parse_source(config_fn_t fn, void *data,
> > +			    int include_section_headers)
> 
> We already have a "struct config_options", but we do a terrible job of
> passing it around (since it only impacts the include stuff right now,
> and that all gets handled at a very outer level).
> 
> Rather than plumb this one int through everywhere, should we add it to
> that struct and plumb the struct through?

Yesss!

Again, thank you so much for this really valuable review. This is even
better than what I hoped for.

Ciao,
Dscho

* Re: [PATCH 8/9] git_config_set: use do_config_from_file() directly
  2018-03-30 13:14       ` Jeff King
@ 2018-03-30 14:01         ` Johannes Schindelin
  2018-03-30 14:08           ` Jeff King
  0 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 14:01 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Fri, 30 Mar 2018, Jeff King wrote:

> On Fri, Mar 30, 2018 at 03:02:00PM +0200, Johannes Schindelin wrote:
> 
> > > I'm not sure I understand that last paragraph. What does flockfile() have
> > > to do with stdin/stdout?
> > > 
> > > The point of those calls is that we're locking the FILE handle, so that
> > > it's safe for the lower-level config code to run getc_unlocked(), which
> > > is faster.
> > > 
> > > So without those, we're calling getc_unlocked() without holding the
> > > lock. I think it probably works in practice because we know that we're
> > > single-threaded, but it seems a bit sketchy.
> > 
> > Oops. I misunderstood the purpose of flockfile(), then. I thought it was
> > only about multiple users of stdin/stdout.
> > 
> > Will have a look whether flockfile()/funlockfile() can be moved into
> > do_config_from_file() instead.
> 
> In a sense stdin/stdout are much more susceptible to this because
> they're global variables, and any thread may touch them. For the config
> code, we open our own handle that we don't expose elsewhere. So probably
> it would be fine just to use the unlocked variants even without locking.
> 
> But IMHO it's good practice to always flockfile() before using the
> unlocked variants. My reading of POSIX is that it's OK to use the
> unlocked variants without holding the lock (if you know there won't be
> contention), but if it's not hard to err on the side of safety, I'd
> prefer it.

You know what is *really* funny?

-- snip --
static int git_config_from_stdin(config_fn_t fn, void *data)
{
        return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin, data, 0);
}

int git_config_from_file(config_fn_t fn, const char *filename, void *data)
{
        int ret = -1;
        FILE *f;

        f = fopen_or_warn(filename, "r");
        if (f) {
                flockfile(f);
                ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, filename, f, data, 0);
                funlockfile(f);
                fclose(f);
        }
        return ret;
}
-- snap --

So the _stdin variant *goes out of its way not to flockfile()*...

But I guess all this will become moot when I start handing down the config
options. It does mean that I have to change the signatures in header
files, oh well ;-)

But then I can drop this here patch and we can stop musing about
flockfile()  ;-)

Ciao,
Dscho

* Re: [PATCH 8/9] git_config_set: use do_config_from_file() directly
  2018-03-30 14:01         ` Johannes Schindelin
@ 2018-03-30 14:08           ` Jeff King
  2018-03-30 19:04             ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-03-30 14:08 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Fri, Mar 30, 2018 at 04:01:56PM +0200, Johannes Schindelin wrote:

> You know what is *really* funny?
> 
> -- snip --
> static int git_config_from_stdin(config_fn_t fn, void *data)
> {
>         return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin, data, 0);
> }
> 
> int git_config_from_file(config_fn_t fn, const char *filename, void *data)
> {
>         int ret = -1;
>         FILE *f;
> 
>         f = fopen_or_warn(filename, "r");
>         if (f) {
>                 flockfile(f);
>                 ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, filename, f, data, 0);
>                 funlockfile(f);
>                 fclose(f);
>         }
>         return ret;
> }
> -- snap --
> 
> So the _stdin variant *goes out of its way not to flockfile()*...

*facepalm* That's probably my fault, since git_config_from_stdin()
existed already when I did the flockfile stuff.

Probably the flockfile should go into do_config_from_file(), where we
specify to use the unlocked variants.

> But I guess all this will become moot when I start handing down the config
> options. It does mean that I have to change the signatures in header
> files, oh well ;-)
> 
> But then I can drop this here patch and we can stop musing about
> flockfile()  ;-)

Yeah, I'll wait to see how your refactor turns out.

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-03-30 12:32       ` Johannes Schindelin
@ 2018-03-30 14:15         ` Ævar Arnfjörð Bjarmason
  2018-03-30 16:24           ` Junio C Hamano
  2018-03-30 16:36         ` Duy Nguyen
  2018-03-30 18:45         ` A potential approach to making tests faster on Windows Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 103+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-30 14:15 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff King, Stefan Beller, git, Junio C Hamano, Thomas Rast,
	Phil Haack, Jason Frey, Philip Oakley


On Fri, Mar 30 2018, Johannes Schindelin wrote:

> On Thu, 29 Mar 2018, Jeff King wrote:
>
>> On Thu, Mar 29, 2018 at 11:15:33AM -0700, Stefan Beller wrote:
>>
>> > > When calling `git config --unset abc.a` on this file, it leaves this
>> > > (invalid) config behind:
>> > >
>> > >         [
>> > >         [xyz]
>> > >                 key = value
>> > >
>> > > The reason is that we try to search for the beginning of the line (or
>> > > for the end of the preceding section header on the same line) that
>> > > defines abc.a, but as an optimization, we subtract 2 from the offset
>> > > pointing just after the definition before we call
>> > > find_beginning_of_line(). That function, however, *also* performs that
>> > > optimization and promptly fails to find the section header correctly.
>> >
>> > This commit message would be more convincing if we had it in test form.
>>
>> I agree a test might be nice. But I don't find the commit message
>> unconvincing at all. It explains pretty clearly why the bug occurs, and
>> you can verify it by looking at find_beginning_of_line.
>>
>> >     [abc]a
>> >
>> > is not written by Git, but would be written from an outside tool or person
>> > and we barely cope with it?
>>
>> Yes, I don't think git would ever write onto the same line. But clearly
>> we should handle anything that's syntactically valid.
>
> I was tempted to add the test case, because it is easy to test it.
>
> But I then decided *not* to add it. Why? Testing is a balance between "can
> do" and "need to do".
>
> Can you imagine that I did *not* run the entire test suite before
> submitting this patch series, because it takes an incredible *90 minutes*
> to run *on a fast Windows machine*?

I think if it's worth fixing, it's worth testing for: a future change to
the config code could easily introduce a regression for this, and
particularly in this type of code, obscure edge cases like this can point
to bugs elsewhere.
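
For illustration only, here is a minimal sketch of what such a regression
check could look like as plain shell, outside of the test framework. The
helper name check_unset_same_line is made up, and the input file is an
assumption: the `[abc]a` line discussed above followed by the `[xyz]` section
from the quoted output. It only checks that the file still parses and that
xyz.key survives the --unset:

```shell
# Hypothetical helper, not part of git's test suite.
# Returns 0 if xyz.key is still readable after unsetting abc.a.
check_unset_same_line () {
	cfg=$(mktemp) || return 2
	# Assumed input: a section header with a one-letter boolean
	# on the same line, followed by an unrelated section.
	printf '[abc]a\n[xyz]\n\tkey = value\n' >"$cfg"
	if git config --file "$cfg" --unset abc.a &&
	   value=$(git config --file "$cfg" --get xyz.key) &&
	   test "$value" = value
	then
		ret=0
	else
		# Either the unset failed or it left an unparseable file behind.
		ret=1
	fi
	rm -f "$cfg"
	return $ret
}
```

With a git that leaves the stray `[` behind, the second `git config` call
fails on the invalid file, so the helper returns non-zero.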

We have the EXPENSIVE_ON_WINDOWS prerequisite already in master from an
earlier series of mine, maybe we could use that here, or add some other
prereq like OVERLY_EXHAUSTIVE which by default could depend on
EXPENSIVE_ON_WINDOWS, i.e. we'd have a set of overly pedantic tests that
we skip on Windows by default, as there's no reason to suspect they're
platform-dependent, but we'd like to know if they regress.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug)
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
                   ` (10 preceding siblings ...)
  2018-03-29 19:39 ` Jeff King
@ 2018-03-30 14:17 ` Ævar Arnfjörð Bjarmason
  2018-03-30 18:46   ` Johannes Schindelin
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
  12 siblings, 1 reply; 103+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-30 14:17 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Stefan Beller, Jason Frey, Philip Oakley


On Thu, Mar 29 2018, Johannes Schindelin wrote:

> Nonetheless, I would be comfortable with this patch going into v2.17.0, even at
> this late stage. The final verdict is Junio's, of course.

Thanks a lot for working on this. I'm keen to stress test this, but
won't have time in the next few days, and in any case think that the
parts that change functionality should wait until after 2.17 (but
e.g. the test renaming would be fine for a cherry-pick).

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-03-30 14:15         ` Ævar Arnfjörð Bjarmason
@ 2018-03-30 16:24           ` Junio C Hamano
  2018-03-30 18:44             ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2018-03-30 16:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, Jeff King, Stefan Beller, git, Thomas Rast,
	Phil Haack, Jason Frey, Philip Oakley

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I think if it's worth fixing, it's worth testing for: a future change to
> the config code could easily introduce a regression for this, and
> particularly in this type of code, obscure edge cases like this can point
> to bugs elsewhere.

Yup.  "The port to my favourite platform is too slow, and everybody
should learn to live with thin test coverage" would not be a good
strategy in the longer run.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-03-30 12:32       ` Johannes Schindelin
  2018-03-30 14:15         ` Ævar Arnfjörð Bjarmason
@ 2018-03-30 16:36         ` Duy Nguyen
  2018-03-30 18:53           ` Johannes Schindelin
  2018-03-30 18:45         ` A potential approach to making tests faster on Windows Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 103+ messages in thread
From: Duy Nguyen @ 2018-03-30 16:36 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff King, Stefan Beller, git, Junio C Hamano, Thomas Rast,
	Phil Haack, Ævar Arnfjörð Bjarmason, Jason Frey,
	Philip Oakley

On Fri, Mar 30, 2018 at 2:32 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Thu, 29 Mar 2018, Jeff King wrote:
>
>> On Thu, Mar 29, 2018 at 11:15:33AM -0700, Stefan Beller wrote:
>>
>> > > When calling `git config --unset abc.a` on this file, it leaves this
>> > > (invalid) config behind:
>> > >
>> > >         [
>> > >         [xyz]
>> > >                 key = value
>> > >
>> > > The reason is that we try to search for the beginning of the line (or
>> > > for the end of the preceding section header on the same line) that
>> > > defines abc.a, but as an optimization, we subtract 2 from the offset
>> > > pointing just after the definition before we call
>> > > find_beginning_of_line(). That function, however, *also* performs that
>> > > optimization and promptly fails to find the section header correctly.
>> >
>> > This commit message would be more convincing if we had it in test form.
>>
>> I agree a test might be nice. But I don't find the commit message
>> unconvincing at all. It explains pretty clearly why the bug occurs, and
>> you can verify it by looking at find_beginning_of_line.
>>
>> >     [abc]a
>> >
>> > is not written by Git, but would be written from an outside tool or person
>> > and we barely cope with it?
>>
>> Yes, I don't think git would ever write onto the same line. But clearly
>> we should handle anything that's syntactically valid.
>
> I was tempted to add the test case, because it is easy to test it.
>
> But I then decided *not* to add it. Why? Testing is a balance between "can
> do" and "need to do".
>
> Can you imagine that I did *not* run the entire test suite before
> submitting this patch series, because it takes an incredible *90 minutes*
> to run *on a fast Windows machine*?

What's wrong with firing up a new worktree, running the test suite there,
and going back to do something else so you won't waste time just waiting
for test results before you submit? Sure there is a mental overhead for
switching tasks, but at 90 minutes, I think it's worth doing.
-- 
Duy

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-03-30 16:24           ` Junio C Hamano
@ 2018-03-30 18:44             ` Johannes Schindelin
  2018-03-30 19:00               ` Junio C Hamano
  0 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 18:44 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Jeff King, Stefan Beller,
	git, Thomas Rast, Phil Haack, Jason Frey, Philip Oakley

Hi Junio,

On Fri, 30 Mar 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> 
> > I think if it's worth fixing, it's worth testing for: a future change to
> > the config code could easily introduce a regression for this, and
> > particularly in this type of code, obscure edge cases like this can point
> > to bugs elsewhere.
> 
> Yup.  "The port to my favourite platform is too slow, and everybody
> should learn to live with thin test coverage" would not be a good
> strategy in the longer run.

What would be a *really* good strategy is: "Oh, there is a problem! Let's
acknowledge it and try to come up with a solution rather than a
work-around".

EXPENSIVE_ON_WINDOWS is a symptom. Not a solution.

And you are actively hurting my ability to contribute, I hope you are
aware of that.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* A potential approach to making tests faster on Windows
  2018-03-30 12:32       ` Johannes Schindelin
  2018-03-30 14:15         ` Ævar Arnfjörð Bjarmason
  2018-03-30 16:36         ` Duy Nguyen
@ 2018-03-30 18:45         ` Ævar Arnfjörð Bjarmason
  2018-03-30 18:58           ` Junio C Hamano
                             ` (2 more replies)
  2 siblings, 3 replies; 103+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-30 18:45 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff King, Stefan Beller, git, Junio C Hamano, Thomas Rast,
	Phil Haack, Jason Frey, Philip Oakley, Duy Nguyen


On Fri, Mar 30 2018, Johannes Schindelin wrote [expressing frustrations
about Windows test suite slowness]:

I've wondered for a while whether it wouldn't be a viable approach to
make something like an interpreter for our test suite to get around this
problem, i.e. much of it's very repetitive and just using a few shell
functions we've defined, what if we had C equivalents of those?

Duy had a WIP patch set a while ago to add C test suite support, but I
thought what if we turn that inside-out, and instead have a shell
interpreter that knows about the likes of test_cmp, and executes them
directly?

Here's proof of concept as a patch to the dash shell:

    u dash (debian/master=) $ git diff
    diff --git a/src/builtins.def.in b/src/builtins.def.in
    index 4441fe4..b214a17 100644
    --- a/src/builtins.def.in
    +++ b/src/builtins.def.in
    @@ -92,3 +92,4 @@ ulimitcmd     ulimit
     #endif
     testcmd                test [
     killcmd                -u kill
    +testcmpcmd     test_cmp
    diff --git a/src/jobs.c b/src/jobs.c
    index c2c2332..905563f 100644
    --- a/src/jobs.c
    +++ b/src/jobs.c
    @@ -1502,3 +1502,13 @@ getstatus(struct job *job) {
                    jobno(job), job->nprocs, status, retval));
            return retval;
     }
    +
    +#include <stdio.h>
    +int
    +testcmpcmd(argc, argv)
    +       int argc;
    +       char **argv;
    +{
    +       fprintf(stderr, "Got %d arguments\n", argc);
    +       return 0;
    +}

I just added that to jobs.c because it was easiest, then test_cmp
becomes a builtin:

    u dash (debian/master=) $ src/dash -c 'type test_cmp'
    test_cmp is a shell builtin
    u dash (debian/master=) $ src/dash -c 'echo foo && test_cmp 1 2 3'
    foo
    Got 4 arguments

I.e. it's really easy to add new built in commands to the dash shell
(and probably other shells, but dash is really small & fast).

We could carry some patch like that to dash, and also patch it so
test-lib.sh could know that that was our own custom shell, and we'd then
skip defining functions like test_cmp, and instead use that new builtin.
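
A rough sketch of how that detection could work, assuming the patched shell
reports the command the way the `type test_cmp` output above does. The helper
name has_builtin is made up, and the fallback below is a simplified stand-in,
not the real test_cmp from test-lib-functions.sh:

```shell
# Hypothetical helper: succeed if $1 is provided by the shell itself.
has_builtin () {
	case "$(type "$1" 2>/dev/null)" in
	*"shell builtin"*)
		return 0 ;;
	*)
		return 1 ;;
	esac
}

if ! has_builtin test_cmp
then
	# Stock shell: fall back to a function-based definition
	# (simplified stand-in for illustration).
	test_cmp () {
		diff -u "$@"
	}
fi
```

The check relies on the wording of `type`, which is not identical across
shells, so a real implementation would probably want a more robust marker
for "this is our own custom shell".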

Similarly, it could then be linked to our own binaries, and the
test-tool would be a builtin that would appropriately dispatch, and we
could even eventually make "git" a shell builtin.

I don't have time or interest to work on this now, but thought it was
interesting to share. This assumes that something in shellscript like:

    while echo foo; do echo bar; done

Is no slower on Windows than *nix, since it's purely using built-ins, as
opposed to something that would shell out.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug)
  2018-03-30 14:17 ` Ævar Arnfjörð Bjarmason
@ 2018-03-30 18:46   ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 18:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Stefan Beller, Jason Frey, Philip Oakley

Hi Ævar,

On Fri, 30 Mar 2018, Ævar Arnfjörð Bjarmason wrote:

> 
> On Thu, Mar 29 2018, Johannes Schindelin wrote:
> 
> > Nonetheless, I would be comfortable with this patch going into
> > v2.17.0, even at this late stage. The final verdict is Junio's, of
> > course.
> 
> Thanks a lot for working on this. I'm keen to stress test this, but
> won't have time in the next few days, and in any case think that the
> parts that change functionality should wait until after 2.17 (but e.g.
> the test renaming would be fine for a cherry-pick).

Obviously this was never meant to get into v2.17.0 (apart maybe from 1/9,
which however is so contested over that addition of the test case under
the assumption that anybody but me would dare to touch those parts of the
code).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-03-30 16:36         ` Duy Nguyen
@ 2018-03-30 18:53           ` Johannes Schindelin
  2018-03-30 19:16             ` Duy Nguyen
  0 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 18:53 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jeff King, Stefan Beller, git, Junio C Hamano, Thomas Rast,
	Phil Haack, Ævar Arnfjörð Bjarmason, Jason Frey,
	Philip Oakley

Hi Duy,

On Fri, 30 Mar 2018, Duy Nguyen wrote:

> On Fri, Mar 30, 2018 at 2:32 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Thu, 29 Mar 2018, Jeff King wrote:
> >
> >> On Thu, Mar 29, 2018 at 11:15:33AM -0700, Stefan Beller wrote:
> >>
> >> > > When calling `git config --unset abc.a` on this file, it leaves this
> >> > > (invalid) config behind:
> >> > >
> >> > >         [
> >> > >         [xyz]
> >> > >                 key = value
> >> > >
> >> > > The reason is that we try to search for the beginning of the line (or
> >> > > for the end of the preceding section header on the same line) that
> >> > > defines abc.a, but as an optimization, we subtract 2 from the offset
> >> > > pointing just after the definition before we call
> >> > > find_beginning_of_line(). That function, however, *also* performs that
> >> > > optimization and promptly fails to find the section header correctly.
> >> >
> >> > This commit message would be more convincing if we had it in test form.
> >>
> >> I agree a test might be nice. But I don't find the commit message
> >> unconvincing at all. It explains pretty clearly why the bug occurs, and
> >> you can verify it by looking at find_beginning_of_line.
> >>
> >> >     [abc]a
> >> >
> >> > is not written by Git, but would be written from an outside tool or person
> >> > and we barely cope with it?
> >>
> >> Yes, I don't think git would ever write onto the same line. But clearly
> >> we should handle anything that's syntactically valid.
> >
> > I was tempted to add the test case, because it is easy to test it.
> >
> > But I then decided *not* to add it. Why? Testing is a balance between "can
> > do" and "need to do".
> >
> > Can you imagine that I did *not* run the entire test suite before
> > submitting this patch series, because it takes an incredible *90 minutes*
> > to run *on a fast Windows machine*?
> 
> What's wrong with firing up a new worktree, running the test suite there,
> and going back to do something else so you won't waste time just waiting
> for test results before you submit? Sure there is a mental overhead for
> switching tasks, but at 90 minutes, I think it's worth doing.

Of course it is worth doing. That's why I often test the end result on
Windows (waiting those 90 minutes, but I do not fire up a new worktree, I
use my cloud privilege and let Azure/Visual Studio Team Services do the
work for me, without slowing down my laptop).

What I would love to do, however, would be to test all intermediate
patches, too, as that often shows a problem with my frequent reorderings
via interactive rebases. And 90 minutes times 9 is... 13 hours and 30
minutes. That's a really long time.
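
To spell out that arithmetic, here is a small sketch; the function name
suite_cost is made up for illustration:

```shell
# Hypothetical helper: cost of running a suite that takes $1 minutes
# once for each of $2 patches, printed as hours and minutes.
suite_cost () {
	total=$(( $1 * $2 ))
	echo "$(( total / 60 ))h $(( total % 60 ))m"
}

suite_cost 90 9
```

For 90 minutes and 9 patches this prints "13h 30m". The per-commit runs
themselves could be driven by something like `git rebase --exec`, which runs
a given command after each replayed commit.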

I think the best course of action would be to incrementally do away with
the shell scripted test framework, in the way you outlined earlier this
year. This would *also* buy us a wealth of other benefits, such as better
control over the parallelization, resource usage, etc.

It would also finally make it easier to introduce something like "smart
testing" where code coverage could be computed (this works only for C
code, of course, not for the many scripted parts of core Git), and a diff
could be inspected to discover which tests *really* need to be run,
skipping the tests that would only touch unchanged code.
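
A very rough sketch of the selection step, under heavy assumptions: the
coverage data has already been boiled down to a text file with one
"test-script source-file" pair per line, and the changed paths are listed
one per line in a second file. Neither format exists in Git today, and
select_tests is a made-up name:

```shell
# Hypothetical helper: print every test script whose recorded coverage
# touches at least one changed source file.
#   $1: coverage map, one "test-script source-file" pair per line
#   $2: list of changed paths, one per line
select_tests () {
	map=$1
	changed=$2
	while read -r script src
	do
		# -x: whole-line match, -F: fixed string, so "config.c"
		# does not accidentally match "builtin/config.c".
		if grep -qxF -e "$src" "$changed"
		then
			echo "$script"
		fi
	done <"$map" | sort -u
}
```

The list of changed paths could come from something like
`git diff --name-only`; producing the coverage map is the hard part and is
not shown here.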

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: A potential approach to making tests faster on Windows
  2018-03-30 18:45         ` A potential approach to making tests faster on Windows Ævar Arnfjörð Bjarmason
@ 2018-03-30 18:58           ` Junio C Hamano
  2018-03-30 19:16           ` Jeff King
  2018-04-03 11:43           ` Johannes Schindelin
  2 siblings, 0 replies; 103+ messages in thread
From: Junio C Hamano @ 2018-03-30 18:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, Jeff King, Stefan Beller, git, Thomas Rast,
	Phil Haack, Jason Frey, Philip Oakley, Duy Nguyen

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Fri, Mar 30 2018, Johannes Schindelin wrote [expressing frustrations
> about Windows test suite slowness]:
>
> I've wondered for a while whether it wouldn't be a viable approach to
> make something like an interpreter for our test suite to get around this
> problem, i.e. much of it's very repetitive and just using a few shell
> functions we've defined, what if we had C equivalents of those?
> ...
>
> I don't have time or interest to work on this now, but thought it was
> interesting to share. This assumes that something in shellscript like:
>
>     while echo foo; do echo bar; done
>
> Is no slower on Windows than *nix, since it's purely using built-ins, as
> opposed to something that would shell out.

That's interesting; a constructive attempt to find a usable solution
is certainly appreciated.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-03-30 18:44             ` Johannes Schindelin
@ 2018-03-30 19:00               ` Junio C Hamano
  2018-04-03  9:31                 ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Junio C Hamano @ 2018-03-30 19:00 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Ævar Arnfjörð Bjarmason, Jeff King, Stefan Beller,
	git, Thomas Rast, Phil Haack, Jason Frey, Philip Oakley

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> What would be a *really* good strategy is: "Oh, there is a problem! Let's
> acknowledge it and try to come up with a solution rather than a
> work-around".
>
> EXPENSIVE_ON_WINDOWS is a symptom. Not a solution.

Yes, it is a workaround.  Making shell faster on Windows would of
course be one possible solution to make t/t*.sh scripts go faster
;-)  Or update parts of t/t*.sh so that the equivalent test coverage
can be kept while making them run faster on Windows.





^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 8/9] git_config_set: use do_config_from_file() directly
  2018-03-30 14:08           ` Jeff King
@ 2018-03-30 19:04             ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-03-30 19:04 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Fri, 30 Mar 2018, Jeff King wrote:

> On Fri, Mar 30, 2018 at 04:01:56PM +0200, Johannes Schindelin wrote:
> 
> > You know what is *really* funny?
> > 
> > -- snip --
> > static int git_config_from_stdin(config_fn_t fn, void *data)
> > {
> >         return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin, data, 0);
> > }
> > 
> > int git_config_from_file(config_fn_t fn, const char *filename, void *data)
> > {
> >         int ret = -1;
> >         FILE *f;
> > 
> >         f = fopen_or_warn(filename, "r");
> >         if (f) {
> >                 flockfile(f);
> >                 ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, filename, f, data, 0);
> >                 funlockfile(f);
> >                 fclose(f);
> >         }
> >         return ret;
> > }
> > -- snap --
> > 
> > So the _stdin variant *goes out of its way not to flockfile()*...
> 
> *facepalm* That's probably my fault, since git_config_from_stdin()
> existed already when I did the flockfile stuff.
> 
> Probably the flockfile should go into do_config_from_file(), where we
> specify to use the unlocked variants.

Ah, that makes sense now! I am glad I could also help ;-)

> > But I guess all this will become moot when I start handing down the config
> > options. It does mean that I have to change the signatures in header
> > files, oh well ;-)
> > 
> > But then I can drop this here patch and we can stop musing about
> > flockfile()  ;-)
> 
> Yeah, I'll wait to see how your refactor turns out.

I don't think I'll touch too much in that part of the code. My changes
should not cause merge conflicts with a patch moving the
flockfile()/funlockfile() calls to do_config_from_file().

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-03-30 18:53           ` Johannes Schindelin
@ 2018-03-30 19:16             ` Duy Nguyen
  0 siblings, 0 replies; 103+ messages in thread
From: Duy Nguyen @ 2018-03-30 19:16 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff King, Stefan Beller, git, Junio C Hamano, Thomas Rast,
	Phil Haack, Ævar Arnfjörð Bjarmason, Jason Frey,
	Philip Oakley

On Fri, Mar 30, 2018 at 8:53 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> I think the best course of action would be to incrementally do away with
> the shell scripted test framework, in the way you outlined earlier this
> year. This would *also* buy us a wealth of other benefits, such as better
> control over the parallelization, resource usage, etc.

If you have not noticed, I'm a bit busy with all sorts of stuff and
probably won't continue that work. And since it affects you the most,
you probably have the best motive to tackle it ;-) I don't think
complaining about slow test suite helps. And avoiding adding more
tests because of that definitely does not help.

> It would also finally make it easier to introduce something like "smart
> testing" where code coverage could be computed (this works only for C
> code, of course, not for the many scripted parts of core Git), and a diff
> could be inspected to discover which tests *really* need to be run,
> skipping the tests that would only touch unchanged code.
-- 
Duy

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: A potential approach to making tests faster on Windows
  2018-03-30 18:45         ` A potential approach to making tests faster on Windows Ævar Arnfjörð Bjarmason
  2018-03-30 18:58           ` Junio C Hamano
@ 2018-03-30 19:16           ` Jeff King
  2018-04-03  9:49             ` Johannes Schindelin
  2018-04-03 11:43           ` Johannes Schindelin
  2 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-03-30 19:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, Stefan Beller, git, Junio C Hamano,
	Thomas Rast, Phil Haack, Jason Frey, Philip Oakley, Duy Nguyen

On Fri, Mar 30, 2018 at 08:45:45PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I've wondered for a while whether it wouldn't be a viable approach to
> make something like an interpreter for our test suite to get around this
> problem, i.e. much of it's very repetitive and just using a few shell
> functions we've defined, what if we had C equivalents of those?

I've had a similar thought, though I wonder how far we could get with
just shell. I even tried it out with test_cmp:

  https://public-inbox.org/git/20161020215647.5no7effvutwep2xt@sigill.intra.peff.net/

But Johannes Sixt pointed out that they already do this (see
mingw_test_cmp in test-lib-functions).
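
To make the "just shell" idea concrete, here is a minimal sketch of a
comparison that uses only builtins (no fork, no exec). It is not the actual
mingw_test_cmp; the name builtin_cmp is made up, and it assumes text files
without NUL bytes:

```shell
# Hypothetical helper: compare two text files line by line using only
# shell builtins. Returns 0 if identical, 1 if different, 2 on error.
builtin_cmp () {
	test -r "$1" && test -r "$2" || return 2
	# Open both files on spare descriptors.
	exec 3<"$1" 4<"$2"
	status=0
	while :
	do
		ra=0
		rb=0
		# read fails at EOF but may still deliver a final
		# line that lacks a trailing newline.
		IFS= read -r a <&3 || ra=$?
		IFS= read -r b <&4 || rb=$?
		# Differing content, or one file ended before the other.
		if test "x$a" != "x$b" || test "$ra" -ne "$rb"
		then
			status=1
			break
		fi
		# Both files reached EOF together: identical.
		if test "$ra" -ne 0
		then
			break
		fi
	done
	exec 3<&- 4<&-
	return $status
}
```

Unlike a real test_cmp, this only reports whether the files differ; it does
not print a diff on mismatch.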

I also tried to explore a few numbers about process invocations to see
if running shell commands is the problem:

  https://public-inbox.org/git/20161020123111.qnbsainul2g54z4z@sigill.intra.peff.net/

There was some discussion there about whether the problem is programs
being exec'd, or if it's forks due to subshells. And if it is programs
being exec'd, whether it's shell programs or if it is simply that we
exec Git a huge number of times.
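
One way to tell those cases apart, sketched under the assumption that the
shell actually forks for `( ... )` (some shells optimize that away) and that
`cat` is an external program. The function names are made up; each one runs
$1 iterations and differs only in what a single iteration costs:

```shell
# No new process per iteration: builtins only.
loop_builtin () {
	i=0
	while test "$i" -lt "$1"
	do
		:
		i=$((i + 1))
	done
}

# One subshell (fork, no exec) per iteration.
loop_subshell () {
	i=0
	while test "$i" -lt "$1"
	do
		( : )
		i=$((i + 1))
	done
}

# One external program (fork plus exec) per iteration.
loop_exec () {
	i=0
	while test "$i" -lt "$1"
	do
		cat </dev/null
		i=$((i + 1))
	done
}
```

Timing each function with the same iteration count on both platforms would
show whether the cost sits in the loop itself, in the fork, or in the exec.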

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-03-30 19:00               ` Junio C Hamano
@ 2018-04-03  9:31                 ` Johannes Schindelin
  2018-04-03 15:29                   ` Duy Nguyen
  2018-04-08 23:12                   ` Junio C Hamano
  0 siblings, 2 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03  9:31 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Jeff King, Stefan Beller,
	git, Thomas Rast, Phil Haack, Jason Frey, Philip Oakley

Hi Junio,

On Fri, 30 Mar 2018, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > What would be a *really* good strategy is: "Oh, there is a problem! Let's
> > acknowledge it and try to come up with a solution rather than a
> > work-around".
> >
> > EXPENSIVE_ON_WINDOWS is a symptom. Not a solution.
> 
> Yes, it is a workaround.  Making shell faster on Windows would of
> course be one possible solution to make t/t*.sh scripts go faster
> ;-)  Or update parts of t/t*.sh so that the equivalent test coverage
> can be kept while making them run faster on Windows.

What makes you think that I did not try my hardest for around 812 hours in
total so far to make the shell faster?

Ciao,
Dscho

P.S.: I do not have the actual number of hours I spent on both MSYS2's
runtime and BusyBox and Git to find *some* way to make it faster, as my
time-keeping is organized in a different way that makes it hard to query
the overall number. But I can state with confidence that it is easily in
the 200-300 hour range, if not beyond that.

It is very frustrating to spend that much time with only little gains here
and there (and BusyBox-w32 is simply not robust enough yet, apart from
also not showing a significant improvement in performance). Please do not
make this experience even more frustrating. Thanks.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: A potential approach to making tests faster on Windows
  2018-03-30 19:16           ` Jeff King
@ 2018-04-03  9:49             ` Johannes Schindelin
  2018-04-03 11:28               ` Ævar Arnfjörð Bjarmason
  2018-04-03 21:36               ` Eric Sunshine
  0 siblings, 2 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03  9:49 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Stefan Beller, git,
	Junio C Hamano, Thomas Rast, Phil Haack, Jason Frey,
	Philip Oakley, Duy Nguyen

Hi Peff,

On Fri, 30 Mar 2018, Jeff King wrote:

> On Fri, Mar 30, 2018 at 08:45:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > I've wondered for a while whether it wouldn't be a viable approach to
> > make something like an interpreter for our test suite to get around
> > this problem, i.e. much of it's very repetitive and just using a few
> > shell functions we've defined, what if we had C equivalents of those?
> 
> I've had a similar thought, though I wonder how far we could get with
> just shell. I even tried it out with test_cmp:
> 
>   https://public-inbox.org/git/20161020215647.5no7effvutwep2xt@sigill.intra.peff.net/
> 
> But Johannes Sixt pointed out that they already do this (see
> mingw_test_cmp in test-lib-functions).

Right.

Additionally, I noticed that that simple loop in shell is *also* very slow on
Windows (at least in the MSYS2 Bash we use in Git for Windows).

Under the assumption that it is the Bash with the loop that uses too much
POSIX emulation to make it fast, I re-implemented mingw_test_cmp in pure
C:
https://github.com/git-for-windows/git/commit/8a96ef63a0083ba02305dfeef6ff92c31b4fd7c3

Unfortunately, it did not produce any noticeable speed improvement, so I
did not even finish the conversion (when the cmp fails, it does not show
you any helpful diff yet).

> I also tried to explore a few numbers about process invocations to see
> if running shell commands is the problem:
> 
>   https://public-inbox.org/git/20161020123111.qnbsainul2g54z4z@sigill.intra.peff.net/

This mail was still in my inbox, in want of me saying something about
this.

My main evidence that shell scripts on macOS are slower than on Linux was
the difference of the improvement incurred by moving more things from
git-rebase--interactive.sh into sequencer.c: Linux saw an improvement only
of about 3x, while macOS saw an improvement of 4x, IIRC. If I don't
remember the absolute numbers correctly, at least I vividly remember the
qualitative difference: It was noticeable.

> There was some discussion there about whether the problem is programs
> being exec'd, or if it's forks due to subshells. And if it is programs
> being exec'd, whether it's shell programs or if it is simply that we
> exec Git a huge number of times.

One large problem there is that it is really hard to analyze performance
over such a heterogeneous code base: part C, part Perl, part Unix shell
(and of course, when you say Unix shell, you imply dozens of separate
tools that *also* need to be performance-profiled). I have very good
profiling tools for C, I saw some built-in performance profiling for Perl,
but there is no good performance profiling for Unix shell scripting: I
doubt that the inventors of shell scripting had speed-critical production
code in mind when they came up with the idea.

I did invest dozens of hours earlier this year trying to obtain debug
symbols in .pdb format (ready for Visual Studio's really envy-inducing
performance profiler) also for the MSYS2 runtime and Bash, so that I could
analyze what makes things so awfully slow in Git's test suite.

The only problem is that I also have to do other things in my day-job, so
that project waits patiently until I have some time to come back to it.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: A potential approach to making tests faster on Windows
  2018-04-03  9:49             ` Johannes Schindelin
@ 2018-04-03 11:28               ` Ævar Arnfjörð Bjarmason
  2018-04-03 15:55                 ` Johannes Schindelin
  2018-04-03 21:36               ` Eric Sunshine
  1 sibling, 1 reply; 103+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-04-03 11:28 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff King, Stefan Beller, git, Junio C Hamano, Thomas Rast,
	Phil Haack, Jason Frey, Philip Oakley, Duy Nguyen


On Tue, Apr 03 2018, Johannes Schindelin wrote:

> Hi Peff,
>
> On Fri, 30 Mar 2018, Jeff King wrote:
>
>> On Fri, Mar 30, 2018 at 08:45:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > I've wondered for a while whether it wouldn't be a viable approach to
>> > make something like an interpreter for our test suite to get around
>> > this problem, i.e. much of it's very repetitive and just using a few
>> > shell functions we've defined, what if we had C equivalents of those?
>>
>> I've had a similar thought, though I wonder how far we could get with
>> just shell. I even tried it out with test_cmp:
>>
>>   https://public-inbox.org/git/20161020215647.5no7effvutwep2xt@sigill.intra.peff.net/
>>
>> But Johannes Sixt pointed out that they already do this (see
>> mingw_test_cmp in test-lib-functions).
>
> Right.
>
> Additionally, I noticed that that simple loop in shell is *also* very slow on
> Windows (at least in the MSYS2 Bash we use in Git for Windows).
>
> Under the assumption that it is the Bash with the loop that uses too much
> POSIX emulation to make it fast, I re-implemented mingw_test_cmp in pure
> C:
> https://github.com/git-for-windows/git/commit/8a96ef63a0083ba02305dfeef6ff92c31b4fd7c3
>
> Unfortunately, it did not produce any noticeable speed improvement, so I
> did not even finish the conversion (when the cmp fails, it does not show
> you any helpful diff yet).

I don't know the details of Windows, but it sounds like you're trying to
performance test two things that are going to suck for different
reasons.

On one hand the pure-*.sh comparison would be slower than just diff on
*nix, because it's not C, so you'll get that slowness, but gain in not
having to fork another process.

On the other hand the C implementation is going to be really fast, but
it's going to take you a long time to get it started on Windows.

Which is why I think it would be really interesting to see the third
approach I suggested, i.e. hack the shell to make the test_cmp a builtin
and test that. Then you won't fork, but will get the advantage of your
fast C codepath.
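
For illustration, the C side of such a builtin could be as small as the
sketch below. It is not the Git for Windows commit linked above; the name
`streams_identical` is made up for this example, and unlike the real
test_cmp it does not print a diff on mismatch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Git or dash): returns 1 if both
 * streams yield exactly the same bytes up to EOF, 0 otherwise.
 */
static int streams_identical(FILE *a, FILE *b)
{
	int ca, cb;

	assert(a && b);
	do {
		/* getc() is buffered, so this does not issue one syscall per byte */
		ca = getc(a);
		cb = getc(b);
		if (ca != cb)
			return 0;
	} while (ca != EOF);
	return 1;
}
```

Wired up as a shell builtin, the comparison itself would then run
in-process, without a fork.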

Also, even if test_cmp is much faster, Peff's results over at
https://public-inbox.org/git/20161020123111.qnbsainul2g54z4z@sigill.intra.peff.net/
suggest that you may not notice anyway. Aside from the points raised
there about the bin wrappers it seems the easiest wins are having a
builtin version of "rm" and "cat".

Are you able to compile dash on Windows with some modification of the
patch I sent upthread? If not it doesn't seem too hard to do the same
trick for bash, see:

    git grep '\balias\b' -- builtins

Once you have bash.git checked out. I.e. you add a bit of Makefile
boilerplate and you should be able to get a new builtin.

>> I also tried to explore a few numbers about process invocations to see
>> if running shell commands is the problem:
>>
>>   https://public-inbox.org/git/20161020123111.qnbsainul2g54z4z@sigill.intra.peff.net/
>
> This mail was still in my inbox, in want of me saying something about
> this.
>
> My main evidence that shell scripts on macOS are slower than on Linux was
> the difference of the improvement incurred by moving more things from
> git-rebase--interactive.sh into sequencer.c: Linux saw an improvement only
> of about 3x, while macOS saw an improvement of 4x, IIRC. If I don't
> remember the absolute numbers correctly, at least I vividly remember the
> qualitative difference: It was noticeable.
>
>> There was some discussion there about whether the problem is programs
>> being exec'd, or if it's forks due to subshells. And if it is programs
>> being exec'd, whether it's shell programs or if it is simply that we
>> exec Git a huge number of times.
>
> One large problem there is that it is really hard to analyze performance
> over such a heterogeneous code base: part C, part Perl, part Unix shell
> (and of course, when you say Unix shell, you imply dozens of separate
> tools that *also* need to be performance-profiled). I have very good
> profiling tools for C, I saw some built-in performance profiling for Perl,
> but there is no good performance profiling for Unix shell scripting: I
> doubt that the inventors of shell scripting had speed-critical production
> code in mind when they came up with the idea.
>
> I did invest dozens of hours earlier this year trying to obtain debug
> symbols in .pdb format (ready for Visual Studio's really envy-inducing
> performance profiler) also for the MSYS2 runtime and Bash, so that I could
> analyze what makes things so awfully slow in Git's test suite.
>
> The only problem is that I also have to do other things in my day-job, so
> that project waits patiently until I have some time to come back to it.
>
> Ciao,
> Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: A potential approach to making tests faster on Windows
  2018-03-30 18:45         ` A potential approach to making tests faster on Windows Ævar Arnfjörð Bjarmason
  2018-03-30 18:58           ` Junio C Hamano
  2018-03-30 19:16           ` Jeff King
@ 2018-04-03 11:43           ` Johannes Schindelin
  2018-04-03 13:27             ` Jeff King
  2 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 11:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Stefan Beller, git, Junio C Hamano, Thomas Rast,
	Phil Haack, Jason Frey, Philip Oakley, Duy Nguyen

[-- Attachment #1: Type: text/plain, Size: 4397 bytes --]

Hi Ævar,

On Fri, 30 Mar 2018, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Mar 30 2018, Johannes Schindelin wrote [expressing frustrations
> about Windows test suite slowness]:

To be precise (and I think it is important to be precise here): it is not
the Windows test suite about which I talked, it is Git's test suite, as
run on Windows. It might sound like a small difference, but it is not: the
fault really lies with Git because it wants to be portable software.

> I've wondered for a while whether it wouldn't be a viable approach to
> make something like an interpreter for our test suite to get around this
> problem, i.e. much of it's very repetitive and just using a few shell
> functions we've defined, what if we had C equivalents of those?

There has even been an attempt to do this by Linus Torvalds himself:

https://public-inbox.org/git/Pine.LNX.4.64.0602232229340.3771@g5.osdl.org/

It has not really gone anywhere...

To be honest, I had a different idea (because I do not really want to
maintain yet another piece of software): BusyBox. The source code is clean
enough, and it should, in theory, allow us to go really fast.

> Duy had a WIP patch set a while ago to add C test suite support, but I
> thought what if we turn that inside-out, and instead have a shell
> interpreter that knows about the likes of test_cmp, and executes them
> directly?

The problem, of course, is: if you add Git-test-suite-specific stuff to
any Unix shell, you are going to have to maintain this fork, and all of a
sudden it has become a lot harder to develop Git, and to port it.

Quite frankly, I would rather go with Duy's original approach, or a
variation thereof, as snuck into the wildmatch discussion here:

	https://public-inbox.org/git/20180110090724.GA2893@ash/

> Here's proof of concept as a patch to the dash shell:
> 
>     u dash (debian/master=) $ git diff
>     diff --git a/src/builtins.def.in b/src/builtins.def.in
>     index 4441fe4..b214a17 100644
>     --- a/src/builtins.def.in
>     +++ b/src/builtins.def.in
>     @@ -92,3 +92,4 @@ ulimitcmd     ulimit
>      #endif
>      testcmd                test [
>      killcmd                -u kill
>     +testcmpcmd     test_cmp
>     diff --git a/src/jobs.c b/src/jobs.c
>     index c2c2332..905563f 100644
>     --- a/src/jobs.c
>     +++ b/src/jobs.c
>     @@ -1502,3 +1502,12 @@ getstatus(struct job *job) {
>                     jobno(job), job->nprocs, status, retval));
>             return retval;
>      }
>     +
>     +#include <stdio.h>
>     +int
>     +testcmpcmd(argc, argv)
>     +       int argc;
>     +       char **argv;
>     +{
>     +       fprintf(stderr, "Got %d arguments\n", argc);
>     +}
> 
> I just added that to jobs.c because it was easiest, then test_cmp
> becomes a builtin:
> 
>     u dash (debian/master=) $ src/dash -c 'type test_cmp'
>     test_cmp is a shell builtin
>     u dash (debian/master=) $ src/dash -c 'echo foo && test_cmp 1 2 3'
>     foo
>     Got 4 arguments
> 
> I.e. it's really easy to add new built in commands to the dash shell
> (and probably other shells, but dash is really small & fast).
> 
> We could carry some patch like that to dash, and also patch it so
> test-lib.sh could know that that was our own custom shell, and we'd then
> skip defining functions like test_cmp, and instead use that new builtin.

Or even use the output of `type test_cmp` as a tell-tale.

> Similarly, it could then be linked to our own binaries, and the
> test-tool would be a builtin that would appropriately dispatch, and we
> could even eventually make "git" a shell builtin.
> 
> I don't have time or interest to work on this now, but thought it was
> interesting to share. This assumes that something in shellscript like:
> 
>     while echo foo; do echo bar; done
> 
> Is no slower on Windows than *nix, since it's purely using built-ins, as
> opposed to something that would shell out.

It is still interpreting stuff. And it still goes through the POSIX
emulation layer.

I did see reports on the Git for Windows bug tracker that gave me the
impression that such loops in Unix shell scripts may not, in fact, be as
performant in MSYS2's Bash as you would like to believe:

https://github.com/git-for-windows/git/issues/1533#issuecomment-372025449

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: A potential approach to making tests faster on Windows
  2018-04-03 11:43           ` Johannes Schindelin
@ 2018-04-03 13:27             ` Jeff King
  2018-04-03 16:00               ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-04-03 13:27 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Ævar Arnfjörð Bjarmason, Stefan Beller, git,
	Junio C Hamano, Thomas Rast, Phil Haack, Jason Frey,
	Philip Oakley, Duy Nguyen

On Tue, Apr 03, 2018 at 01:43:10PM +0200, Johannes Schindelin wrote:

> > I don't have time or interest to work on this now, but thought it was
> > interesting to share. This assumes that something in shellscript like:
> > 
> >     while echo foo; do echo bar; done
> > 
> > Is no slower on Windows than *nix, since it's purely using built-ins, as
> > opposed to something that would shell out.
> 
> It is still interpreting stuff. And it still goes through the POSIX
> emulation layer.
> 
> I did see reports on the Git for Windows bug tracker that gave me the
> impression that such loops in Unix shell scripts may not, in fact, be as
> performant in MSYS2's Bash as you would like to believe:
> 
> https://github.com/git-for-windows/git/issues/1533#issuecomment-372025449

The main problem with `read` loops in shell is that the shell makes one
read() syscall per character. It has to, because doing otherwise is
user-visible in cases where the descriptor may get passed to a different
process.

There's unfortunately no portable way to say "please just read this
quickly, I promise nobody else is going to read the descriptor". And nor
do I know of any shell which is smart enough to know that it's going to
consume to EOF anyway (as you would for something like "cmd | while
read").

If you know you have bash, you can use "-N" to get a more efficient
read:

  $ echo foo | strace -e read bash -c 'read foo'
  [...]
  read(0, "f", 1)                         = 1
  read(0, "o", 1)                         = 1
  read(0, "o", 1)                         = 1
  read(0, "\n", 1)                        = 1

  $ echo foo | strace -e read bash -c 'read -N 10 foo'
  [...]
  read(0, "foo\n", 10)                    = 4
  read(0, "", 6)                          = 0

but then you have another problem: how to split the resulting buffer
into lines in shell. ;)

But if we're at the point of creating custom C builtins for
busybox/dash/etc, you should be able to create a primitive for "read
this using buffered stdio, other processes be damned, and return one
line at a time".
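
A minimal sketch of what such a primitive might look like, assuming plain
buffered stdio is acceptable. The name `read_line_buffered` is hypothetical,
and the sketch silently splits lines longer than the caller's buffer:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical primitive: read one line from `in` using buffered stdio
 * and strip the trailing newline. Returns 1 if a line was read, 0 at EOF.
 * Lines longer than size - 1 bytes are returned in several pieces.
 */
static int read_line_buffered(FILE *in, char *buf, size_t size)
{
	size_t len;

	assert(in && buf && size > 1);
	if (!fgets(buf, (int)size, in))
		return 0;
	len = strlen(buf);
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';
	return 1;
}
```

Because stdio reads ahead into its own buffer, any other process sharing
the descriptor would lose that data, which is exactly the trade-off
described above.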

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-04-03  9:31                 ` Johannes Schindelin
@ 2018-04-03 15:29                   ` Duy Nguyen
  2018-04-03 15:47                     ` Johannes Schindelin
  2018-04-08 23:12                   ` Junio C Hamano
  1 sibling, 1 reply; 103+ messages in thread
From: Duy Nguyen @ 2018-04-03 15:29 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Jeff King, Stefan Beller, git, Thomas Rast, Phil Haack,
	Jason Frey, Philip Oakley

On Tue, Apr 3, 2018 at 11:31 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> It is very frustrating to spend that much time with only little gains here
> and there (and BusyBox-w32 is simply not robust enough yet, apart from
> also not showing a significant improvement in performance).

You still use busybox-w32? It's amazing that people still use it after
the linux subsystem comes. busybox has a lot of commands built in
(i.e. no new processes) and unless rmyorston did something more, the
"fork" in ash shell should be as cheap as it could be: it simply
serializes data and sends to the new process.

If performance does not improve, I guess the process creation cost
dominates. There's not much we could do except moving away from the
zillion processes test framework: either something C-based or another
scripting language (ok I don't want to bring this up again)
-- 
Duy

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-04-03 15:29                   ` Duy Nguyen
@ 2018-04-03 15:47                     ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 15:47 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Jeff King, Stefan Beller, git, Thomas Rast, Phil Haack,
	Jason Frey, Philip Oakley

Hi Duy,

On Tue, 3 Apr 2018, Duy Nguyen wrote:

> On Tue, Apr 3, 2018 at 11:31 AM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > It is very frustrating to spend that much time with only little gains
> > here and there (and BusyBox-w32 is simply not robust enough yet, apart
> > from also not showing a significant improvement in performance).
> 
> You still use busybox-w32?

Yes.

> It's amazing that people still use it after the linux subsystem comes.

I use WSL myself. But you need to realize that WSL is only available on
Windows 10 (many users still use Windows 7), and it is a little tricky to
get to work in Docker containers, I heard, so I did not even try.

Also, many Windows users are unfamiliar with Linux, and forcing them to
learn and install a Linux distribution on their machine when all they want
is to use Git is a bit... much.

> busybox has a lot of commands built in (i.e. no new processes) and
> unless rmyorston did something more, the "fork" in ash shell should be
> as cheap as it could be: it simply serializes data and sends to the new
> process.

Yes, I had the pleasure of reading that code.

It might surprise you, but I had to come up with quite a few patches to
make the test suite pass. And it does not really pass, as I randomly get
hangs...

> If performance does not improve, I guess the process creation cost
> dominates. There's not much we could do except moving away from the
> zillion processes test framework: either something C-based or another
> scripting language (ok I don't want to bring this up again)

There is no need to guess. I now have .pdb files, and once I have a good
example of a shell script construct that is particularly slow, and once I
find some time to work on it, I will dig into the bottlenecks.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: A potential approach to making tests faster on Windows
  2018-04-03 11:28               ` Ævar Arnfjörð Bjarmason
@ 2018-04-03 15:55                 ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 15:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Stefan Beller, git, Junio C Hamano, Thomas Rast,
	Phil Haack, Jason Frey, Philip Oakley, Duy Nguyen

[-- Attachment #1: Type: text/plain, Size: 1376 bytes --]

Hi Ævar,

On Tue, 3 Apr 2018, Ævar Arnfjörð Bjarmason wrote:

> [...] I think it would be really interesting to see the third
> approach I suggested, i.e. hack the shell to make the test_cmp a builtin
> and test that. Then you won't fork, but will get the advantage of your
> fast C codepath.

That should be roughly equivalent to running in BusyBox-w32's ash.
BusyBox-w32 is a pure-Win32 version of BusyBox (i.e. it does not use any
POSIX emulation layer, neither Cygwin nor MSYS2).

I did not notice any Earth-shaking performance improvement when running a
test with BusyBox-w32's ash. It was a couple of percent, maybe even 20%
faster, but nowhere near the orders of magnitude I had been expecting.

> Also, even if test_cmp is much faster, Peff's results over at
> https://public-inbox.org/git/20161020123111.qnbsainul2g54z4z@sigill.intra.peff.net/
> suggest that you may not notice anyway. Aside from the points raised
> there about the bin wrappers it seems the easiest wins are having a
> builtin version of "rm" and "cat".

In BusyBox-w32, `rm` and `cat` *are* built-ins.

> Are you able to compile dash on Windows with some modification of the
> patch I sent upthread?

In theory, yes. In practice, I lack the time (and I do not expect this to
have any performance benefit over using BusyBox-w32 to run the test suite).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: A potential approach to making tests faster on Windows
  2018-04-03 13:27             ` Jeff King
@ 2018-04-03 16:00               ` Johannes Schindelin
  2018-04-06 21:40                 ` Jeff King
  0 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:00 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Stefan Beller, git,
	Junio C Hamano, Thomas Rast, Phil Haack, Jason Frey,
	Philip Oakley, Duy Nguyen

Hi Peff,

On Tue, 3 Apr 2018, Jeff King wrote:

> On Tue, Apr 03, 2018 at 01:43:10PM +0200, Johannes Schindelin wrote:
> 
> > > I don't have time or interest to work on this now, but thought it was
> > > interesting to share. This assumes that something in shellscript like:
> > > 
> > >     while echo foo; do echo bar; done
> > > 
> > > Is no slower on Windows than *nix, since it's purely using built-ins, as
> > > opposed to something that would shell out.
> > 
> > It is still interpreting stuff. And it still goes through the POSIX
> > emulation layer.
> > 
> > I did see reports on the Git for Windows bug tracker that gave me the
> > impression that such loops in Unix shell scripts may not, in fact, be as
> > performant in MSYS2's Bash as you would like to believe:
> > 
> > https://github.com/git-for-windows/git/issues/1533#issuecomment-372025449
> 
> The main problem with `read` loops in shell is that the shell makes one
> read() syscall per character. It has to, because doing otherwise is
> user-visible in cases where the descriptor may get passed to a different
> process.

Thank you for the explanation. Makes tons of sense now.

> There's unfortunately no portable way to say "please just read this
> quickly, I promise nobody else is going to read the descriptor". And nor
> do I know of any shell which is smart enough to know that it's going to
> consume to EOF anyway (as you would for something like "cmd | while
> read").
> 
> If you know you have bash, you can use "-N" to get a more efficient
> read:
> 
>   $ echo foo | strace -e read bash -c 'read foo'
>   [...]
>   read(0, "f", 1)                         = 1
>   read(0, "o", 1)                         = 1
>   read(0, "o", 1)                         = 1
>   read(0, "\n", 1)                        = 1
> 
>   $ echo foo | strace -e read bash -c 'read -N 10 foo'
>   [...]
>   read(0, "foo\n", 10)                    = 4
>   read(0, "", 6)                          = 0
> 
> but then you have another problem: how to split the resulting buffer
> into lines in shell. ;)

True.

> But if we're at the point of creating custom C builtins for
> busybox/dash/etc, you should be able to create a primitive for "read
> this using buffered stdio, other processes be damned, and return one
> line at a time".

Well, you know, I do not think that papering over the root cause will make
anything better. And the root cause is that we use a test framework
written in Unix shell.

I will have to set aside some time to dig into the bottlenecks there and
figure out what parts I can safely convert into "test builtins", i.e. into
the test-tool Duy introduced, to avoid having shell scripts do the
heavy-lifting.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH v2 00/15] Assorted fixes for `git config` (including the "empty sections" bug)
  2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
                   ` (11 preceding siblings ...)
  2018-03-30 14:17 ` Ævar Arnfjörð Bjarmason
@ 2018-04-03 16:27 ` Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 01/15] git_config_set: fix off-by-two Johannes Schindelin
                     ` (16 more replies)
  12 siblings, 17 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

This patch series originally only tried to help fix that annoying bug that
has been reported several times over the years, where `git config --unset`
would leave empty sections behind, and `git config --add` would not reuse them.

The first patch is somewhat of a "while at it" bug fix that I first thought
would be a lot more critical than it actually is: It really only affects config
files that start with a section followed immediately (i.e. without a newline)
by a one-letter boolean setting (i.e. without a `= <value>` part). So while it
is a real bug fix, I doubt anybody ever got bitten by it.

The next swath of patches add and fix some tests, while also fixing the bug
where --replace-all would sometimes insert extra line breaks.

These fixes are pretty straight-forward, and I always try to keep my added
tests as concise as possible, so please tell me if you find a way to make them
smaller (without giving up readability and debuggability).

Then, I introduce a couple of building blocks: a "config parser event stream",
i.e. an optional callback that can be used to report events such as "comment",
"white-space", etc. together with the corresponding extents in the config file.
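
To give an idea of how a consumer of that event stream might look, here is
a small, self-contained sketch. The event names and the argument order
(type, begin offset, end offset, callback data) follow the interdiff below;
the enum's numeric order, `struct event_stats` and `count_comments` are
assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Event names as used in the interdiff; their order here is an assumption */
enum config_event_t {
	CONFIG_EVENT_SECTION,
	CONFIG_EVENT_ENTRY,
	CONFIG_EVENT_WHITESPACE,
	CONFIG_EVENT_COMMENT,
	CONFIG_EVENT_EOF,
	CONFIG_EVENT_ERROR
};

/* Hypothetical callback data: what this consumer wants to learn */
struct event_stats {
	size_t comments;	/* number of comment events seen */
	size_t last_end;	/* end offset of the most recent event */
};

/* Called once per event with its extent [begin, end) in the config file */
static int count_comments(enum config_event_t type,
			  size_t begin, size_t end, void *data)
{
	struct event_stats *stats = data;

	assert(stats && begin <= end);
	if (type == CONFIG_EVENT_COMMENT)
		stats->comments++;
	stats->last_end = end;
	return 0;	/* a negative value would abort the parse */
}
```

The extents are what make it possible to rewrite only part of a config
file later on, without re-parsing it by hand.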

Finally, the interesting part, where I do two things, essentially (with
preparatory steps for each thing):

1. I add the ability for `git config --unset/--unset-all` to detect that it
   can remove a section that has just become empty (see below for some more
   discussion of what I consider "become empty"), and

2. I add the ability for `git config [--add] key value` to re-use empty
   sections.

I am very, very grateful for the time Peff spent on reviewing the previous
iteration, and hope that he realizes just how much the elegance of the
event-stream-based version is due to his excellent review.

To reiterate: why does this patch series not conflict with my very early
statements that we cannot simply remove empty sections because we may end up
with stale comments?

Well, the patch in question takes pains to determine whether there are any
comments surrounding, or included in, the section. If any are found, the
previous behavior is kept: under the assumption that the user edited the file,
we keep it as intact as possible (see below for some argument against this).
If no comments are found, the now-empty section *is* removed. And let's face
it, this is probably *the* common case, as few people edit their config files
by hand these days (neither should they, because it is too easy to end up with
an unparseable one).
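
The rule can be illustrated with a small sketch that works on a recorded
list of event types. This is only one possible reading of "surrounding, or
included in", not the code in the patches; `section_is_removable` is a
made-up name, and the enum merely mirrors the event names of the interdiff.

```c
#include <assert.h>
#include <stddef.h>

/* Event names as used in the interdiff; their order here is an assumption */
enum config_event_t {
	CONFIG_EVENT_SECTION,
	CONFIG_EVENT_ENTRY,
	CONFIG_EVENT_WHITESPACE,
	CONFIG_EVENT_COMMENT,
	CONFIG_EVENT_EOF,
	CONFIG_EVENT_ERROR
};

/*
 * Hypothetical check: may the section whose header is types[section] be
 * dropped? Returns 1 only if it contains nothing but white-space and no
 * comment directly precedes its header.
 */
static int section_is_removable(const enum config_event_t *types,
				size_t nr, size_t section)
{
	size_t i;

	assert(types && section < nr &&
	       types[section] == CONFIG_EVENT_SECTION);

	/* A comment right before the header counts as "surrounding" */
	for (i = section; i > 0; i--) {
		if (types[i - 1] == CONFIG_EVENT_COMMENT)
			return 0;
		if (types[i - 1] != CONFIG_EVENT_WHITESPACE)
			break;
	}

	/* Anything but white-space before the next header or EOF blocks removal */
	for (i = section + 1; i < nr; i++) {
		if (types[i] == CONFIG_EVENT_SECTION ||
		    types[i] == CONFIG_EVENT_EOF)
			break;
		if (types[i] != CONFIG_EVENT_WHITESPACE)
			return 0;
	}
	return 1;
}
```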

So what is the argument against this extra care to detect comments? Well, if
you have something like this:

	[section]
		; Here we comment about the variable called snarf
		snarf = froop

and we run `git config --unset section.snarf`, we end up with this config:

	[section]
		; Here we comment about the variable called snarf

which obviously does not make sense. However, that has been established
behavior for quite a few years, and I do not even try to think of a way to
solve this.

Changes since v1:

- a new feature was introduced where the config parser can be asked to report
  all "events" (like, section header or comment) via a callback function.

- the patches to reuse empty sections, and to remove sections that just became
  empty (without any surrounding comments) were rewritten to make use of that
  config parser event stream (incidentally fixing a couple of problems with
  the backtracking version which were pointed out by Peff).

- to make those changes easier to review, they have been split up into several
  tiny logical steps: the file-local `store` was replaced with callback data,
  some fields were renamed for consistency, the state machine when parsing the
  config was replaced by easier-to-understand flags, etc.

- while poring over the code, I managed to find another obscure bug: under
  certain circumstances, --replace-all could produce extra new-lines. This is
  now fixed as part of the preparatory patches.


Johannes Schindelin (15):
  git_config_set: fix off-by-two
  t1300: rename it to reflect that `repo-config` was deprecated
  t1300: demonstrate that --replace-all can "invent" newlines
  config --replace-all: avoid extra line breaks
  t1300: avoid relying on a bug
  t1300: remove unreasonable expectation from TODO
  t1300: `--unset-all` can leave an empty section behind (bug)
  config: introduce an optional event stream while parsing
  config: avoid using the global variable `store`
  config_set_store: rename some fields for consistency
  git_config_set: do not use a state machine
  git_config_set: make use of the config parser's event stream
  git config --unset: remove empty sections (in the common case)
  git_config_set: reuse empty sections
  TODOs

 config.c                                    | 449 ++++++++++++++++++++--------
 config.h                                    |  25 ++
 t/{t1300-repo-config.sh => t1300-config.sh} |  57 +++-
 3 files changed, 396 insertions(+), 135 deletions(-)
 rename t/{t1300-repo-config.sh => t1300-config.sh} (97%)


base-commit: 468165c1d8a442994a825f3684528361727cd8c0
Published-As: https://github.com/dscho/git/releases/tag/empty-config-section-v2
Fetch-It-Via: git fetch https://github.com/dscho/git empty-config-section-v2

Interdiff vs v1 (sorry for the size, it is essentially a rewrite of more
than half of the previous iteration):
 diff --git a/config.c b/config.c
 index b04c40f76bc..ee7ea24123d 100644
 --- a/config.c
 +++ b/config.c
 @@ -653,21 +653,65 @@ static int get_base_var(struct strbuf *name)
  	}
  }
  
 +struct parse_event_data {
 +	enum config_event_t previous_type;
 +	size_t previous_offset;
 +	const struct config_options *opts;
 +};
 +
 +static inline int do_event(enum config_event_t type,
 +			   struct parse_event_data *data)
 +{
 +	size_t offset;
 +
 +	if (!data->opts || !data->opts->event_fn)
 +		return 0;
 +
 +	if (type == CONFIG_EVENT_WHITESPACE &&
 +	    data->previous_type == type)
 +		return 0;
 +
 +	offset = cf->do_ftell(cf);
 +	/*
 +	 * At EOF, the parser always "inserts" an extra '\n', therefore
 +	 * the end offset of the event is the current file position, otherwise
 +	 * we will already have advanced to the next event.
 +	 */
 +	if (type != CONFIG_EVENT_EOF)
 +		offset--;
 +
 +	if (data->previous_type != CONFIG_EVENT_EOF &&
 +	    data->opts->event_fn(data->previous_type, data->previous_offset,
 +				 offset, data->opts->event_fn_data) < 0)
 +		return -1;
 +
 +	data->previous_type = type;
 +	data->previous_offset = offset;
 +
 +	return 0;
 +}
 +
  static int git_parse_source(config_fn_t fn, void *data,
 -			    int include_section_headers)
 +			    const struct config_options *opts)
  {
  	int comment = 0;
  	int baselen = 0;
  	struct strbuf *var = &cf->var;
  	int error_return = 0;
  	char *error_msg = NULL;
 -	int saw_section_header = 0;
  
  	/* U+FEFF Byte Order Mark in UTF8 */
  	const char *bomptr = utf8_bom;
  
 +	/* For the parser event callback */
 +	struct parse_event_data event_data = {
 +		CONFIG_EVENT_EOF, 0, opts
 +	};
 +
  	for (;;) {
 -		int c = get_next_char();
 +		int c;
 +
 +		c = get_next_char();
  		if (bomptr && *bomptr) {
  			/* We are at the file beginning; skip UTF8-encoded BOM
  			 * if present. Sane editors won't put this in on their
 @@ -684,39 +728,47 @@ static int git_parse_source(config_fn_t fn, void *data,
  			}
  		}
  		if (c == '\n') {
 -			if (cf->eof)
 +			if (cf->eof) {
 +				if (do_event(CONFIG_EVENT_EOF, &event_data) < 0)
 +					return -1;
  				return 0;
 -			comment = 0;
 -			if (saw_section_header) {
 -				if (include_section_headers) {
 -					cf->linenr--;
 -					error_return = fn(var->buf, NULL, data);
 -					if (error_return < 0)
 -						break;
 -					cf->linenr++;
 -				}
 -				saw_section_header = 0;
  			}
 +			if (do_event(CONFIG_EVENT_WHITESPACE, &event_data) < 0)
 +				return -1;
 +			comment = 0;
  			continue;
  		}
 -		if (comment || isspace(c))
 +		if (comment)
  			continue;
 +		if (isspace(c)) {
 +			if (do_event(CONFIG_EVENT_WHITESPACE, &event_data) < 0)
 +					return -1;
 +			continue;
 +		}
  		if (c == '#' || c == ';') {
 +			if (do_event(CONFIG_EVENT_COMMENT, &event_data) < 0)
 +					return -1;
  			comment = 1;
  			continue;
  		}
  		if (c == '[') {
 +			if (do_event(CONFIG_EVENT_SECTION, &event_data) < 0)
 +					return -1;
 +
  			/* Reset prior to determining a new stem */
  			strbuf_reset(var);
  			if (get_base_var(var) < 0 || var->len < 1)
  				break;
  			strbuf_addch(var, '.');
  			baselen = var->len;
 -			saw_section_header = 1;
  			continue;
  		}
  		if (!isalpha(c))
  			break;
 +
 +		if (do_event(CONFIG_EVENT_ENTRY, &event_data) < 0)
 +			return -1;
 +
  		/*
  		 * Truncate the var name back to the section header
  		 * stem prior to grabbing the suffix part of the name
 @@ -728,6 +780,9 @@ static int git_parse_source(config_fn_t fn, void *data,
  			break;
  	}
  
 +	if (do_event(CONFIG_EVENT_ERROR, &event_data) < 0)
 +		return -1;
 +
  	switch (cf->origin_type) {
  	case CONFIG_ORIGIN_BLOB:
  		error_msg = xstrfmt(_("bad config line %d in blob %s"),
 @@ -1412,7 +1467,7 @@ int git_default_config(const char *var, const char *value, void *dummy)
   * this function.
   */
  static int do_config_from(struct config_source *top, config_fn_t fn, void *data,
 -			  int include_section_headers)
 +			  const struct config_options *opts)
  {
  	int ret;
  
 @@ -1424,7 +1479,7 @@ static int do_config_from(struct config_source *top, config_fn_t fn, void *data,
  	strbuf_init(&top->var, 1024);
  	cf = top;
  
 -	ret = git_parse_source(fn, data, include_section_headers);
 +	ret = git_parse_source(fn, data, opts);
  
  	/* pop config-file parsing state stack */
  	strbuf_release(&top->value);
 @@ -1437,7 +1492,7 @@ static int do_config_from(struct config_source *top, config_fn_t fn, void *data,
  static int do_config_from_file(config_fn_t fn,
  		const enum config_origin_type origin_type,
  		const char *name, const char *path, FILE *f,
 -		void *data, int include_section_headers)
 +		void *data, const struct config_options *opts)
  {
  	struct config_source top;
  
 @@ -1450,15 +1505,18 @@ static int do_config_from_file(config_fn_t fn,
  	top.do_ungetc = config_file_ungetc;
  	top.do_ftell = config_file_ftell;
  
 -	return do_config_from(&top, fn, data, include_section_headers);
 +	return do_config_from(&top, fn, data, opts);
  }
  
  static int git_config_from_stdin(config_fn_t fn, void *data)
  {
 -	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin, data, 0);
 +	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
 +				   data, NULL);
  }
  
 -int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 +int git_config_from_file_with_options(config_fn_t fn, const char *filename,
 +				      void *data,
 +				      const struct config_options *opts)
  {
  	int ret = -1;
  	FILE *f;
 @@ -1466,13 +1524,19 @@ int git_config_from_file(config_fn_t fn, const char *filename, void *data)
  	f = fopen_or_warn(filename, "r");
  	if (f) {
  		flockfile(f);
 -		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, filename, f, data, 0);
 +		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename,
 +					  filename, f, data, opts);
  		funlockfile(f);
  		fclose(f);
  	}
  	return ret;
  }
  
 +int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 +{
 +	return git_config_from_file_with_options(fn, filename, data, NULL);
 +}
 +
  int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_type,
  			const char *name, const char *buf, size_t len, void *data)
  {
 @@ -1489,7 +1553,7 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
  	top.do_ungetc = config_buf_ungetc;
  	top.do_ftell = config_buf_ftell;
  
 -	return do_config_from(&top, fn, data, 0);
 +	return do_config_from(&top, fn, data, NULL);
  }
  
  int git_config_from_blob_oid(config_fn_t fn,
 @@ -2233,96 +2297,98 @@ void git_die_config(const char *key, const char *err, ...)
   * Find all the stuff for git_config_set() below.
   */
  
 -static struct {
 +struct config_set_store {
  	int baselen;
  	char *key;
  	int do_not_match;
  	regex_t *value_regex;
  	int multi_replace;
 -	size_t *offset;
 -	unsigned int offset_alloc;
 -	enum { START, SECTION_SEEN, SECTION_END_SEEN, KEY_SEEN } state;
 -	unsigned int seen;
 -} store;
 +	struct {
 +		size_t begin, end;
 +		enum config_event_t type;
 +		int is_keys_section;
 +	} *parsed;
 +	unsigned int parsed_nr, parsed_alloc, *seen, seen_nr, seen_alloc;
 +	unsigned int key_seen:1, section_seen:1, is_keys_section:1;
 +};
  
 -static int matches(const char *key, const char *value)
 +static int matches(const char *key, const char *value,
 +		   const struct config_set_store *store)
  {
 -	if (strcmp(key, store.key))
 +	if (strcmp(key, store->key))
  		return 0; /* not ours */
 -	if (!store.value_regex)
 +	if (!store->value_regex)
  		return 1; /* always matches */
 -	if (store.value_regex == CONFIG_REGEX_NONE)
 +	if (store->value_regex == CONFIG_REGEX_NONE)
  		return 0; /* never matches */
  
 -	return store.do_not_match ^
 -		(value && !regexec(store.value_regex, value, 0, NULL, 0));
 +	return store->do_not_match ^
 +		(value && !regexec(store->value_regex, value, 0, NULL, 0));
 +}
 +
 +static int store_aux_event(enum config_event_t type,
 +			   size_t begin, size_t end, void *data)
 +{
 +	struct config_set_store *store = data;
 +
 +	ALLOC_GROW(store->parsed, store->parsed_nr + 1, store->parsed_alloc);
 +	store->parsed[store->parsed_nr].begin = begin;
 +	store->parsed[store->parsed_nr].end = end;
 +	store->parsed[store->parsed_nr].type = type;
 +
 +	if (type == CONFIG_EVENT_SECTION) {
 +		if (cf->var.len < 2 || cf->var.buf[cf->var.len - 1] != '.')
 +			BUG("Invalid section name '%s'", cf->var.buf);
 +
 +		/* Is this the section we were looking for? */
 +		store->is_keys_section =
 +			store->parsed[store->parsed_nr].is_keys_section =
 +			cf->var.len - 1 == store->baselen &&
 +			!strncasecmp(cf->var.buf, store->key, store->baselen);
 +		if (store->is_keys_section) {
 +			store->section_seen = 1;
 +			ALLOC_GROW(store->seen, store->seen_nr + 1,
 +				   store->seen_alloc);
 +			store->seen[store->seen_nr] = store->parsed_nr;
 +		}
 +	}
 +
 +	store->parsed_nr++;
 +
 +	return 0;
  }
  
  static int store_aux(const char *key, const char *value, void *cb)
  {
 -	const char *ep;
 -	size_t section_len;
 +	struct config_set_store *store = cb;
  
 -	switch (store.state) {
 -	case KEY_SEEN:
 -		if (matches(key, value)) {
 -			if (store.seen == 1 && store.multi_replace == 0) {
 +	if (store->key_seen) {
 +		if (matches(key, value, store)) {
 +			if (store->seen_nr == 1 && store->multi_replace == 0) {
  				warning(_("%s has multiple values"), key);
  			}
  
 -			ALLOC_GROW(store.offset, store.seen + 1,
 -				   store.offset_alloc);
 +			ALLOC_GROW(store->seen, store->seen_nr + 1,
 +				   store->seen_alloc);
  
 -			store.offset[store.seen] = cf->do_ftell(cf);
 -			store.seen++;
 +			store->seen[store->seen_nr] = store->parsed_nr;
 +			store->seen_nr++;
  		}
 -		break;
 -	case SECTION_SEEN:
 +	} else if (store->is_keys_section) {
  		/*
 -		 * What we are looking for is in store.key (both
 -		 * section and var), and its section part is baselen
 -		 * long.  We found key (again, both section and var).
 -		 * We would want to know if this key is in the same
 -		 * section as what we are looking for.  We already
 -		 * know we are in the same section as what should
 -		 * hold store.key.
 +		 * Do not increment matches yet: this may not be a match, but we
 +		 * are in the desired section.
  		 */
 -		ep = strrchr(key, '.');
 -		section_len = ep - key;
 -
 -		if ((section_len != store.baselen) ||
 -		    memcmp(key, store.key, section_len+1)) {
 -			store.state = SECTION_END_SEEN;
 -			break;
 -		}
 +		ALLOC_GROW(store->seen, store->seen_nr + 1, store->seen_alloc);
 +		store->seen[store->seen_nr] = store->parsed_nr;
 +		store->section_seen = 1;
  
 -		/*
 -		 * Do not increment matches: this is no match, but we
 -		 * just made sure we are in the desired section.
 -		 */
 -		ALLOC_GROW(store.offset, store.seen + 1,
 -			   store.offset_alloc);
 -		store.offset[store.seen] = cf->do_ftell(cf);
 -		/* fallthru */
 -	case SECTION_END_SEEN:
 -	case START:
 -		if (matches(key, value)) {
 -			ALLOC_GROW(store.offset, store.seen + 1,
 -				   store.offset_alloc);
 -			store.offset[store.seen] = cf->do_ftell(cf);
 -			store.state = KEY_SEEN;
 -			store.seen++;
 -		} else {
 -			if (strrchr(key, '.') - key == store.baselen &&
 -			      !strncmp(key, store.key, store.baselen)) {
 -					store.state = SECTION_SEEN;
 -					ALLOC_GROW(store.offset,
 -						   store.seen + 1,
 -						   store.offset_alloc);
 -					store.offset[store.seen] = cf->do_ftell(cf);
 -			}
 +		if (matches(key, value, store)) {
 +			store->seen_nr++;
 +			store->key_seen = 1;
  		}
  	}
 +
  	return 0;
  }
  
 @@ -2334,31 +2400,33 @@ static int write_error(const char *filename)
  	return 4;
  }
  
 -static struct strbuf store_create_section(const char *key)
 +static struct strbuf store_create_section(const char *key,
 +					  const struct config_set_store *store)
  {
  	const char *dot;
  	int i;
  	struct strbuf sb = STRBUF_INIT;
  
 -	dot = memchr(key, '.', store.baselen);
 +	dot = memchr(key, '.', store->baselen);
  	if (dot) {
  		strbuf_addf(&sb, "[%.*s \"", (int)(dot - key), key);
 -		for (i = dot - key + 1; i < store.baselen; i++) {
 +		for (i = dot - key + 1; i < store->baselen; i++) {
  			if (key[i] == '"' || key[i] == '\\')
  				strbuf_addch(&sb, '\\');
  			strbuf_addch(&sb, key[i]);
  		}
  		strbuf_addstr(&sb, "\"]\n");
  	} else {
 -		strbuf_addf(&sb, "[%.*s]\n", store.baselen, key);
 +		strbuf_addf(&sb, "[%.*s]\n", store->baselen, key);
  	}
  
  	return sb;
  }
  
 -static ssize_t write_section(int fd, const char *key)
 +static ssize_t write_section(int fd, const char *key,
 +			     const struct config_set_store *store)
  {
 -	struct strbuf sb = store_create_section(key);
 +	struct strbuf sb = store_create_section(key, store);
  	ssize_t ret;
  
  	ret = write_in_full(fd, sb.buf, sb.len);
 @@ -2367,11 +2435,12 @@ static ssize_t write_section(int fd, const char *key)
  	return ret;
  }
  
 -static ssize_t write_pair(int fd, const char *key, const char *value)
 +static ssize_t write_pair(int fd, const char *key, const char *value,
 +			  const struct config_set_store *store)
  {
  	int i;
  	ssize_t ret;
 -	int length = strlen(key + store.baselen + 1);
 +	int length = strlen(key + store->baselen + 1);
  	const char *quote = "";
  	struct strbuf sb = STRBUF_INIT;
  
 @@ -2391,7 +2460,7 @@ static ssize_t write_pair(int fd, const char *key, const char *value)
  		quote = "\"";
  
  	strbuf_addf(&sb, "\t%.*s = %s",
 -		    length, key + store.baselen + 1, quote);
 +		    length, key + store->baselen + 1, quote);
  
  	for (i = 0; value[i]; i++)
  		switch (value[i]) {
 @@ -2417,201 +2486,85 @@ static ssize_t write_pair(int fd, const char *key, const char *value)
  	return ret;
  }
  
 -static ssize_t find_beginning_of_line(const char *contents, size_t size,
 -	size_t offset_, int *found_bracket)
 -{
 -	size_t equal_offset = size, bracket_offset = size;
 -	ssize_t offset;
 -
 -contline:
 -	for (offset = offset_-2; offset > 0
 -			&& contents[offset] != '\n'; offset--)
 -		switch (contents[offset]) {
 -			case '=': equal_offset = offset; break;
 -			case ']': bracket_offset = offset; break;
 -		}
 -	if (offset > 0 && contents[offset-1] == '\\') {
 -		offset_ = offset;
 -		goto contline;
 -	}
 -	if (bracket_offset < equal_offset) {
 -		*found_bracket = 1;
 -		offset = bracket_offset+1;
 -	} else
 -		offset++;
 -
 -	return offset;
 -}
 -
 -/*
 - * This function determines whether the offset is in a line that starts with a
 - * comment character.
 - *
 - * Note: it does *not* report when a regular line (section header, config
 - * setting) *ends* in a comment.
 - */
 -static int is_in_comment_line(const char *contents, size_t offset)
 -{
 -	int comment = 0;
 -
 -	while (offset > 0)
 -		switch (contents[--offset]) {
 -		case ';':
 -		case '#':
 -			comment = 1;
 -			break;
 -		case '\n':
 -			break;
 -		case ' ':
 -		case '\t':
 -			continue;
 -		default:
 -			comment = 0;
 -		}
 -
 -	return comment;
 -}
 -
  /*
   * If we are about to unset the last key(s) in a section, and if there are
   * no comments surrounding (or included in) the section, we will want to
   * extend begin/end to remove the entire section.
   *
 - * Note: the parameter `i_ptr` points to the index into the store.offset
 - * array, reflecting the end offset of the respective entry to be deleted.
 - * This index may be incremented if a section has more than one entry (which
 - * all are to be removed).
 + * Note: the parameter `seen_ptr` points to the index into the store.seen
 + * array. This index may be incremented if a section has more than one
 + * entry (all of which are to be removed).
   */
 -static void maybe_remove_section(const char *contents, size_t size,
 -				 const char *section_name,
 -				 size_t section_name_len,
 -				 size_t *begin, int *i_ptr, int *new_line)
 +static void maybe_remove_section(struct config_set_store *store,
 +				 const char *contents,
 +				 size_t *begin_offset, size_t *end_offset,
 +				 int *seen_ptr)
  {
 -	size_t begin2, end2;
 -	int seen_section = 0, dummy, i = *i_ptr;
 +	size_t begin;
 +	int i, seen, section_seen = 0;
  
  	/*
 -	 * First, make sure that this is the last key in the section, and that
 -	 * there are no comments that are possibly about the current section.
 +	 * First, ensure that this is the first key, and that there are no
 +	 * comments before the entry nor before the section header.
  	 */
 -next_entry:
 -	for (end2 = store.offset[i]; end2 < size; end2++) {
 -		switch (contents[end2]) {
 -		case ' ':
 -		case '\t':
 -		case '\n':
 -			continue;
 -		case '\r':
 -			if (++end2 < size && contents[end2] == '\n')
 -				continue;
 -			break;
 -		case '[':
 -			/* If the section name is repeated, continue */
 -			if (end2 + 1 + section_name_len < size &&
 -			    contents[end2 + section_name_len] == ']' &&
 -			    !memcmp(contents + end2 + 1, section_name,
 -				    section_name_len)) {
 -				end2 += section_name_len;
 -				continue;
 -			}
 -			goto look_before;
 -		case ';':
 -		case '#':
 -			/* There is a comment, cannot remove this section */
 +	seen = *seen_ptr;
 +	for (i = store->seen[seen]; i > 0; i--) {
 +		enum config_event_t type = store->parsed[i - 1].type;
 +
 +		if (type == CONFIG_EVENT_COMMENT)
 +			/* There is a comment before this entry or section */
  			return;
 -		default:
 -			/* There are other keys in that section */
 +		if (type == CONFIG_EVENT_ENTRY) {
 +			if (!section_seen)
 +				/* This is not the section's first entry. */
 +				return;
 +			/* We encountered no comment before the section. */
  			break;
  		}
 -
 -		/*
 -		 * Uh oh... we found something else in this section. But do
 -		 * we want to remove this, too?
 -		 */
 -		if (++i >= store.seen)
 -			return;
 -
 -		begin2 = find_beginning_of_line(contents, size, store.offset[i],
 -						&dummy);
 -		if (begin2 > end2)
 -			return;
 -
 -		/* Looks like we want to remove the next one, too... */
 -		goto next_entry;
 +		if (type == CONFIG_EVENT_SECTION) {
 +			if (!store->parsed[i - 1].is_keys_section)
 +				break;
 +			section_seen = 1;
 +		}
  	}
 +	begin = store->parsed[i].begin;
  
 -look_before:
  	/*
 -	 * Now, ensure that this is the first key, and that there are no
 -	 * comments before the entry nor before the section header.
 +	 * Next, make sure that we are removing the last key(s) in the section,
 +	 * and that there are no comments that are possibly about the current
 +	 * section.
  	 */
 -	for (begin2 = *begin; begin2 > 0; )
 -		switch (contents[begin2 - 1]) {
 -		case ' ':
 -		case '\t':
 -			begin2--;
 -			continue;
 -		case '\n':
 -			if (--begin2 > 0 && contents[begin2 - 1] == '\r')
 -				begin2--;
 -			continue;
 -		case ']':
 -			if (begin2 > section_name_len + 1 &&
 -			    contents[begin2 - section_name_len - 2] == '[' &&
 -			    !memcmp(contents + begin2 - section_name_len - 1,
 -				    section_name, section_name_len)) {
 -				begin2 -= section_name_len + 2;
 -				seen_section = 1;
 -				continue;
 -			}
 -
 -			/*
 -			 * It looks like a section header, but it could be a
 -			 * comment instead...
 -			 */
 -			if (is_in_comment_line(contents, begin2))
 -				return;
 -
 -			/*
 -			 * We encountered the previous section header: This
 -			 * really was the only entry, so remove the entire
 -			 * section.
 -			 */
 -			if (contents[begin2] != '\n') {
 -				begin2--;
 -				*new_line = 1;
 -			}
 +	for (i = store->seen[seen] + 1; i < store->parsed_nr; i++) {
 +		enum config_event_t type = store->parsed[i].type;
  
 -			store.offset[i] = end2;
 -			*begin = begin2;
 -			*i_ptr = i;
 +		if (type == CONFIG_EVENT_COMMENT)
  			return;
 -		default:
 -			/*
 -			 * Any other character means it is either a comment or
 -			 * a config setting; if it is a comment, we do not want
 -			 * to remove this section. If it is a config setting,
 -			 * we only want to remove this section if this is
 -			 * already the next section.
 -			 */
 -			if (seen_section &&
 -			    !is_in_comment_line(contents, begin2)) {
 -				if (contents[begin2] != '\n') {
 -					begin2--;
 -					*new_line = 1;
 -				}
 -
 -				store.offset[i] = end2;
 -				*begin = begin2;
 -				*i_ptr = i;
 -			}
 +		if (type == CONFIG_EVENT_SECTION) {
 +			if (store->parsed[i].is_keys_section)
 +				continue;
 +			break;
 +		}
 +		if (type == CONFIG_EVENT_ENTRY) {
 +			if (++seen < store->seen_nr &&
 +			    i == store->seen[seen])
 +				/* We want to remove this entry, too */
 +				continue;
 +			/* There is another entry in this section. */
  			return;
  		}
 +	}
  
 -	/* This section extends to the beginning of the file. */
 -	store.offset[i] = end2;
 -	*begin = begin2;
 -	*i_ptr = i;
 +	/*
 +	 * We are really removing the last entry/entries from this section, and
 +	 * there are no enclosed or surrounding comments. Remove the entire,
 +	 * now-empty section.
 +	 */
 +	*seen_ptr = seen;
 +	*begin_offset = begin;
 +	if (i < store->parsed_nr)
 +		*end_offset = store->parsed[i].begin;
 +	else
 +		*end_offset = store->parsed[store->parsed_nr - 1].end;
  }
  
  int git_config_set_in_file_gently(const char *config_filename,
 @@ -2671,14 +2624,15 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
  	struct lock_file lock = LOCK_INIT;
  	char *filename_buf = NULL;
  	char *contents = NULL;
 -	char *section_name = NULL;
  	size_t contents_sz;
 +	struct config_set_store store;
 +
 +	memset(&store, 0, sizeof(store));
  
  	/* parse-key returns negative; flip the sign to feed exit(3) */
 -	ret = 0 - git_config_parse_key(key, &section_name, &store.baselen);
 +	ret = 0 - git_config_parse_key(key, &store.key, &store.baselen);
  	if (ret)
  		goto out_free;
 -	store.key = section_name;
  
  	store.multi_replace = multi_replace;
  
 @@ -2692,6 +2646,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
  	fd = hold_lock_file_for_update(&lock, config_filename, 0);
  	if (fd < 0) {
  		error_errno("could not lock config file %s", config_filename);
 +		free(store.key);
  		ret = CONFIG_NO_LOCK;
  		goto out_free;
  	}
 @@ -2701,6 +2656,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
  	 */
  	in_fd = open(config_filename, O_RDONLY);
  	if ( in_fd < 0 ) {
 +		free(store.key);
 +
  		if ( ENOENT != errno ) {
  			error_errno("opening %s", config_filename);
  			ret = CONFIG_INVALID_FILE; /* same as "invalid config file" */
 @@ -2713,14 +2670,14 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
  		}
  
  		store.key = (char *)key;
 -		if (write_section(fd, key) < 0 ||
 -		    write_pair(fd, key, value) < 0)
 +		if (write_section(fd, key, &store) < 0 ||
 +		    write_pair(fd, key, value, &store) < 0)
  			goto write_err_out;
  	} else {
  		struct stat st;
  		size_t copy_begin, copy_end;
  		int i, new_line = 0;
 -		FILE *f;
 +		struct config_options opts;
  
  		if (value_regex == NULL)
  			store.value_regex = NULL;
 @@ -2743,34 +2700,36 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
  			}
  		}
  
 -		ALLOC_GROW(store.offset, 1, store.offset_alloc);
 -		store.offset[0] = 0;
 -		store.state = START;
 -		store.seen = 0;
 +		ALLOC_GROW(store.parsed, 1, store.parsed_alloc);
 +		store.parsed[0].end = 0;
 +
 +		memset(&opts, 0, sizeof(opts));
 +		opts.event_fn = store_aux_event;
 +		opts.event_fn_data = &store;
  
  		/*
 -		 * After this, store.offset will contain the *end* offset
 -		 * of the last match, or remain at 0 if no match was found.
 +		 * After this, store.parsed will contain offsets of all the
 +		 * parsed elements, and store.seen will contain a list of
 +		 * matches, as indices into store.parsed.
 +		 *
  		 * As a side effect, we make sure to transform only a valid
  		 * existing config file.
  		 */
 -		f = fopen_or_warn(config_filename, "r");
 -		if (!f || do_config_from_file(store_aux, CONFIG_ORIGIN_FILE,
 -					      config_filename, config_filename,
 -					      f, NULL, 1)) {
 +		if (git_config_from_file_with_options(store_aux,
 +						      config_filename,
 +						      &store, &opts)) {
  			error("invalid config file %s", config_filename);
 +			free(store.key);
  			if (store.value_regex != NULL &&
  			    store.value_regex != CONFIG_REGEX_NONE) {
  				regfree(store.value_regex);
  				free(store.value_regex);
  			}
  			ret = CONFIG_INVALID_FILE;
 -			if (f)
 -				fclose(f);
  			goto out_free;
 -		} else
 -			fclose(f);
 +		}
  
 +		free(store.key);
  		if (store.value_regex != NULL &&
  		    store.value_regex != CONFIG_REGEX_NONE) {
  			regfree(store.value_regex);
 @@ -2778,8 +2737,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
  		}
  
  		/* if nothing to unset, or too many matches, error out */
 -		if ((store.seen == 0 && value == NULL) ||
 -				(store.seen > 1 && multi_replace == 0)) {
 +		if ((store.seen_nr == 0 && value == NULL) ||
 +		    (store.seen_nr > 1 && multi_replace == 0)) {
  			ret = CONFIG_NOTHING_SET;
  			goto out_free;
  		}
 @@ -2810,25 +2769,48 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
  			goto out_free;
  		}
  
 -		if (store.seen == 0)
 -			store.seen = 1;
 +		if (store.seen_nr == 0) {
 +			if (!store.seen_alloc) {
 +				/* Did not see key nor section */
 +				ALLOC_GROW(store.seen, 1, store.seen_alloc);
 +				store.seen[0] = store.parsed_nr
 +					- !!store.parsed_nr;
 +			}
 +			store.seen_nr = 1;
 +		}
  
 -		for (i = 0, copy_begin = 0; i < store.seen; i++) {
 -			if (store.offset[i] == 0) {
 -				store.offset[i] = copy_end = contents_sz;
 -			} else if (store.state != KEY_SEEN) {
 -				copy_end = store.offset[i];
 +		for (i = 0, copy_begin = 0; i < store.seen_nr; i++) {
 +			size_t replace_end;
 +			int j = store.seen[i];
 +
 +			new_line = 0;
 +			if (!store.key_seen) {
 +				copy_end = store.parsed[j].end;
 +				/* include '\n' when copying section header */
 +				if (copy_end > 0 && copy_end < contents_sz &&
 +				    contents[copy_end - 1] != '\n' &&
 +				    contents[copy_end] == '\n')
 +					copy_end++;
 +				replace_end = copy_end;
  			} else {
 -				copy_end = find_beginning_of_line(
 -					contents, contents_sz,
 -					store.offset[i], &new_line);
 +				replace_end = store.parsed[j].end;
 +				copy_end = store.parsed[j].begin;
  				if (!value)
 -					maybe_remove_section(contents,
 -							     contents_sz,
 -							     section_name,
 -							     store.baselen,
 -							     &copy_end, &i,
 -							     &new_line);
 +					maybe_remove_section(&store, contents,
 +							     &copy_end,
 +							     &replace_end, &i);
 +				/*
 +				 * Swallow preceding white-space on the same
 +				 * line.
 +				 */
 +				while (copy_end > 0) {
 +					char c = contents[copy_end - 1];
 +
 +					if (isspace(c) && c != '\n')
 +						copy_end--;
 +					else
 +						break;
 +				}
  			}
  
  			if (copy_end > 0 && contents[copy_end-1] != '\n')
 @@ -2843,16 +2825,16 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
  				    write_str_in_full(fd, "\n") < 0)
  					goto write_err_out;
  			}
 -			copy_begin = store.offset[i];
 +			copy_begin = replace_end;
  		}
  
  		/* write the pair (value == NULL means unset) */
  		if (value != NULL) {
 -			if (store.state == START) {
 -				if (write_section(fd, key) < 0)
 +			if (!store.section_seen) {
 +				if (write_section(fd, key, &store) < 0)
  					goto write_err_out;
  			}
 -			if (write_pair(fd, key, value) < 0)
 +			if (write_pair(fd, key, value, &store) < 0)
  				goto write_err_out;
  		}
  
 @@ -2879,7 +2861,6 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
  
  out_free:
  	rollback_lock_file(&lock);
 -	free(section_name);
  	free(filename_buf);
  	if (contents)
  		munmap(contents, contents_sz);
 @@ -2977,7 +2958,8 @@ static int section_name_is_ok(const char *name)
  
  /* if new_name == NULL, the section is removed instead */
  static int git_config_copy_or_rename_section_in_file(const char *config_filename,
 -				      const char *old_name, const char *new_name, int copy)
 +				      const char *old_name,
 +				      const char *new_name, int copy)
  {
  	int ret = 0, remove = 0;
  	char *filename_buf = NULL;
 @@ -2987,6 +2969,9 @@ static int git_config_copy_or_rename_section_in_file(const char *config_filename
  	FILE *config_file = NULL;
  	struct stat st;
  	struct strbuf copystr = STRBUF_INIT;
 +	struct config_set_store store;
 +
 +	memset(&store, 0, sizeof(store));
  
  	if (new_name && !section_name_is_ok(new_name)) {
  		ret = error("invalid section name: %s", new_name);
 @@ -3056,7 +3041,7 @@ static int git_config_copy_or_rename_section_in_file(const char *config_filename
  				}
  				store.baselen = strlen(new_name);
  				if (!copy) {
 -					if (write_section(out_fd, new_name) < 0) {
 +					if (write_section(out_fd, new_name, &store) < 0) {
  						ret = write_error(get_lock_file_path(&lock));
  						goto out;
  					}
 @@ -3077,7 +3062,7 @@ static int git_config_copy_or_rename_section_in_file(const char *config_filename
  						output[0] = '\t';
  					}
  				} else {
 -					copystr = store_create_section(new_name);
 +					copystr = store_create_section(new_name, &store);
  				}
  			}
  			remove = 0;
 diff --git a/config.h b/config.h
 index ef70a9cac1e..5a2394daae2 100644
 --- a/config.h
 +++ b/config.h
 @@ -28,15 +28,40 @@ enum config_origin_type {
  	CONFIG_ORIGIN_CMDLINE
  };
  
 +enum config_event_t {
 +	CONFIG_EVENT_SECTION,
 +	CONFIG_EVENT_ENTRY,
 +	CONFIG_EVENT_WHITESPACE,
 +	CONFIG_EVENT_COMMENT,
 +	CONFIG_EVENT_EOF,
 +	CONFIG_EVENT_ERROR
 +};
 +
 +/*
 + * The parser event function (if not NULL) is called with the event type and
 + * the begin/end offsets of the parsed elements.
 + *
 + * Note: for CONFIG_EVENT_ENTRY (i.e. config variables), the trailing newline
 + * character is considered part of the element.
 + */
 +typedef int (*config_parser_event_fn_t)(enum config_event_t type,
 +					size_t begin_offset, size_t end_offset,
 +					void *event_fn_data);
 +
  struct config_options {
  	unsigned int respect_includes : 1;
  	const char *commondir;
  	const char *git_dir;
 +	config_parser_event_fn_t event_fn;
 +	void *event_fn_data;
  };
  
  typedef int (*config_fn_t)(const char *, const char *, void *);
  extern int git_default_config(const char *, const char *, void *);
  extern int git_config_from_file(config_fn_t fn, const char *, void *);
 +extern int git_config_from_file_with_options(config_fn_t fn, const char *,
 +					     void *,
 +					     const struct config_options *);
  extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
  					const char *name, const char *buf, size_t len, void *data);
  extern int git_config_from_blob_oid(config_fn_t fn, const char *name,
 diff --git a/t/t1300-config.sh b/t/t1300-config.sh
 index 867397ae930..6d0e13020d1 100755
 --- a/t/t1300-config.sh
 +++ b/t/t1300-config.sh
 @@ -1643,4 +1643,25 @@ test_expect_success '--local requires a repo' '
  	test_expect_code 128 nongit git config --local foo.bar
  '
  
 +test_expect_success '--replace-all does not invent newlines' '
 +	q_to_tab >.git/config <<-\EOF &&
 +	[abc]key
 +	QkeepSection
 +	[xyz]
 +	Qkey = 1
 +	[abc]
 +	Qkey = a
 +	EOF
 +	q_to_tab >expect <<-\EOF &&
 +	[abc]
 +	QkeepSection
 +	[xyz]
 +	Qkey = 1
 +	[abc]
 +	Qkey = b
 +	EOF
 +	git config --replace-all abc.key b &&
 +	test_cmp .git/config expect
 +'
 +
  test_done
-- 
2.16.2.windows.1.26.g2cc3565eb4b


^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH v2 01/15] git_config_set: fix off-by-two
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 02/15] t1300: rename it to reflect that `repo-config` was deprecated Johannes Schindelin
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Currently, we are slightly overzealous when removing an entry from a
config file of this form:

	[abc]a
	[xyz]
		key = value

When calling `git config --unset abc.a` on this file, it leaves this
(invalid) config behind:

	[
	[xyz]
		key = value

The reason is that we try to search for the beginning of the line (or
for the end of the preceding section header on the same line) that
defines abc.a, but as an optimization, we subtract 2 from the offset
pointing just after the definition before we call
find_beginning_of_line(). That function, however, *also* performs that
optimization and promptly fails to find the section header correctly.
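
To make the double subtraction concrete, here is a minimal sketch in C. It
copies the logic of find_beginning_of_line() as it stands at this point in
the series into a standalone function. The wrappers copy_end_buggy() and
copy_end_fixed() are hypothetical, and so is the assumption that the end
offset recorded for abc.a is 7 (just past the newline that follows `a`);
neither is part of config.c.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Sample file from the commit message above. */
static const char sample[] = "[abc]a\n[xyz]\n\tkey = value\n";

/* Same logic as find_beginning_of_line() in config.c at this point. */
static ssize_t find_beginning_of_line(const char *contents, size_t size,
				      size_t offset_, int *found_bracket)
{
	size_t equal_offset = size, bracket_offset = size;
	ssize_t offset;

contline:
	/* Note the "- 2": the function already steps back by two. */
	for (offset = offset_ - 2; offset > 0 && contents[offset] != '\n';
	     offset--)
		switch (contents[offset]) {
		case '=': equal_offset = offset; break;
		case ']': bracket_offset = offset; break;
		}
	if (offset > 0 && contents[offset - 1] == '\\') {
		offset_ = offset;
		goto contline;
	}
	if (bracket_offset < equal_offset) {
		*found_bracket = 1;
		offset = bracket_offset + 1;
	} else
		offset++;

	return offset;
}

/* Hypothetical wrapper: the caller subtracts 2 as well (the bug). */
static ssize_t copy_end_buggy(size_t entry_end)
{
	int new_line = 0;

	return find_beginning_of_line(sample, strlen(sample),
				      entry_end - 2, &new_line);
}

/* Hypothetical wrapper: the caller passes the offset unchanged (the fix). */
static ssize_t copy_end_fixed(size_t entry_end)
{
	int new_line = 0;

	return find_beginning_of_line(sample, strlen(sample),
				      entry_end, &new_line);
}
```

Under these assumptions, copy_end_buggy(7) starts scanning at the `c` of
`[abc]`, never sees the closing bracket, and returns 1, so only `[` is kept.
copy_end_fixed(7) starts at the `a` after the bracket and returns 5, so all
of `[abc]` is kept.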

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config.c b/config.c
index b0c20e6cb8a..5cc049aaef0 100644
--- a/config.c
+++ b/config.c
@@ -2632,7 +2632,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			} else
 				copy_end = find_beginning_of_line(
 					contents, contents_sz,
-					store.offset[i]-2, &new_line);
+					store.offset[i], &new_line);
 
 			if (copy_end > 0 && contents[copy_end-1] != '\n')
 				new_line = 1;
-- 
2.16.2.windows.1.26.g2cc3565eb4b



^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH v2 02/15] t1300: rename it to reflect that `repo-config` was deprecated
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 01/15] git_config_set: fix off-by-two Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 03/15] t1300: demonstrate that --replace-all can "invent" newlines Johannes Schindelin
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/{t1300-repo-config.sh => t1300-config.sh} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename t/{t1300-repo-config.sh => t1300-config.sh} (100%)

diff --git a/t/t1300-repo-config.sh b/t/t1300-config.sh
similarity index 100%
rename from t/t1300-repo-config.sh
rename to t/t1300-config.sh
-- 
2.16.2.windows.1.26.g2cc3565eb4b



^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH v2 03/15] t1300: demonstrate that --replace-all can "invent" newlines
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 01/15] git_config_set: fix off-by-two Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 02/15] t1300: rename it to reflect that `repo-config` was deprecated Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 04/15] config --replace-all: avoid extra line breaks Johannes Schindelin
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1300-config.sh | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 4f8e6f5fde3..cc417687e8d 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1611,4 +1611,25 @@ test_expect_success '--local requires a repo' '
 	test_expect_code 128 nongit git config --local foo.bar
 '
 
+test_expect_failure '--replace-all does not invent newlines' '
+	q_to_tab >.git/config <<-\EOF &&
+	[abc]key
+	QkeepSection
+	[xyz]
+	Qkey = 1
+	[abc]
+	Qkey = a
+	EOF
+	q_to_tab >expect <<-\EOF &&
+	[abc]
+	QkeepSection
+	[xyz]
+	Qkey = 1
+	[abc]
+	Qkey = b
+	EOF
+	git config --replace-all abc.key b &&
+	test_cmp .git/config expect
+'
+
 test_done
-- 
2.16.2.windows.1.26.g2cc3565eb4b



^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH v2 04/15] config --replace-all: avoid extra line breaks
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (2 preceding siblings ...)
  2018-04-03 16:28   ` [PATCH v2 03/15] t1300: demonstrate that --replace-all can "invent" newlines Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 05/15] t1300: avoid relying on a bug Johannes Schindelin
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

When replacing multiple config entries at once, we did not reset the
flag that indicates whether we need to insert a newline before the new
entry. As a consequence, an extra newline was inserted under certain
circumstances.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c          | 1 +
 t/t1300-config.sh | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/config.c b/config.c
index 5cc049aaef0..f10f8c6f52f 100644
--- a/config.c
+++ b/config.c
@@ -2625,6 +2625,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			store.seen = 1;
 
 		for (i = 0, copy_begin = 0; i < store.seen; i++) {
+			new_line = 0;
 			if (store.offset[i] == 0) {
 				store.offset[i] = copy_end = contents_sz;
 			} else if (store.state != KEY_SEEN) {
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index cc417687e8d..aed12be492f 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1611,7 +1611,7 @@ test_expect_success '--local requires a repo' '
 	test_expect_code 128 nongit git config --local foo.bar
 '
 
-test_expect_failure '--replace-all does not invent newlines' '
+test_expect_success '--replace-all does not invent newlines' '
 	q_to_tab >.git/config <<-\EOF &&
 	[abc]key
 	QkeepSection
-- 
2.16.2.windows.1.26.g2cc3565eb4b
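
The flag handling fixed by the patch above can be modelled in a few lines. This is only a toy sketch, not the code in config.c: the function `count_inserted_newlines()` and its `reset_flag` parameter are hypothetical names, and the real loop writes to a lock file instead of counting. It assumes that each chunk is copied up to an end offset and that a newline is added whenever the flag is set.

```c
#include <stddef.h>

/*
 * Toy model (hypothetical, not git code): walk over chunk end offsets
 * and count how often a '\n' would be inserted after a copied chunk.
 * With reset_flag == 0 the flag leaks from one iteration to the next,
 * which is the behavior the patch removes.
 */
static int count_inserted_newlines(const char *buf, const size_t *ends,
				   size_t nr, int reset_flag)
{
	int new_line = 0, inserted = 0;
	size_t i, begin = 0;

	for (i = 0; i < nr; i++) {
		if (reset_flag)
			new_line = 0;
		/* A chunk that does not end in '\n' needs one appended. */
		if (ends[i] > 0 && buf[ends[i] - 1] != '\n')
			new_line = 1;
		if (ends[i] > begin && new_line)
			inserted++;
		begin = ends[i];
	}
	return inserted;
}
```

In this model, a first chunk that ends mid-line (as with `[abc]key` in the test) sets the flag, and without the per-iteration reset every later chunk gets a newline as well, even if it already ends in one.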




* [PATCH v2 05/15] t1300: avoid relying on a bug
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (3 preceding siblings ...)
  2018-04-03 16:28   ` [PATCH v2 04/15] config --replace-all: avoid extra line breaks Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 06/15] t1300: remove unreasonable expectation from TODO Johannes Schindelin
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

The test case 'unset with cont. lines' relied on a bug that is about to
be fixed: it tests *explicitly* that removing the last entry from a
config section leaves an *empty* section behind.

Let's fix this test case not to rely on that behavior, simply by
preventing the section from becoming empty.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1300-config.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index aed12be492f..7c0ee208dea 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -108,6 +108,7 @@ bar = foo
 [beta]
 baz = multiple \
 lines
+foo = bar
 EOF
 
 test_expect_success 'unset with cont. lines' '
@@ -118,6 +119,7 @@ cat > expect <<\EOF
 [alpha]
 bar = foo
 [beta]
+foo = bar
 EOF
 
 test_expect_success 'unset with cont. lines is correct' 'test_cmp expect .git/config'
-- 
2.16.2.windows.1.26.g2cc3565eb4b




* [PATCH v2 06/15] t1300: remove unreasonable expectation from TODO
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (4 preceding siblings ...)
  2018-04-03 16:28   ` [PATCH v2 05/15] t1300: avoid relying on a bug Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 07/15] t1300: `--unset-all` can leave an empty section behind (bug) Johannes Schindelin
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

In https://public-inbox.org/git/7vvc8alzat.fsf@alter.siamese.dyndns.org/
a reasonable patch was made quite a bit less so by changing a test case
demonstrating a bug to a test case that demonstrates that we ask for too
much: the test case 'unsetting the last key in a section removes header'
now expects a future bug fix to be able to determine whether a free-form
comment above a section header refers to said section or not.

Rather than shooting for the stars (and not even getting off the
ground), let's start shooting for something obtainable and be reasonably
confident that we *can* get it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1300-config.sh | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 7c0ee208dea..187fc5b195f 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1413,7 +1413,7 @@ test_expect_success 'urlmatch with wildcard' '
 '
 
 # good section hygiene
-test_expect_failure 'unsetting the last key in a section removes header' '
+test_expect_failure '--unset last key removes section (except if commented)' '
 	cat >.git/config <<-\EOF &&
 	# some generic comment on the configuration file itself
 	# a comment specific to this "section" section.
@@ -1427,6 +1427,25 @@ test_expect_failure 'unsetting the last key in a section removes header' '
 
 	cat >expect <<-\EOF &&
 	# some generic comment on the configuration file itself
+	# a comment specific to this "section" section.
+	[section]
+	# some intervening lines
+	# that should also be dropped
+
+	# please be careful when you update the above variable
+	EOF
+
+	git config --unset section.key &&
+	test_cmp expect .git/config &&
+
+	cat >.git/config <<-\EOF &&
+	[section]
+	key = value
+	[next-section]
+	EOF
+
+	cat >expect <<-\EOF &&
+	[next-section]
 	EOF
 
 	git config --unset section.key &&
-- 
2.16.2.windows.1.26.g2cc3565eb4b




* [PATCH v2 07/15] t1300: `--unset-all` can leave an empty section behind (bug)
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (5 preceding siblings ...)
  2018-04-03 16:28   ` [PATCH v2 06/15] t1300: remove unreasonable expectation from TODO Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 08/15] config: introduce an optional event stream while parsing Johannes Schindelin
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

We already have a test demonstrating that removing the last entry from a
config section fails to remove the section header of the now-empty
section.

The same can happen, of course, if we remove the last entries in one fell
swoop. This is *also* a bug, and should be fixed at the same time.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1300-config.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 187fc5b195f..10b9bf4b088 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1452,6 +1452,17 @@ test_expect_failure '--unset last key removes section (except if commented)' '
 	test_cmp expect .git/config
 '
 
+test_expect_failure '--unset-all removes section if empty & uncommented' '
+	cat >.git/config <<-\EOF &&
+	[section]
+	key = value1
+	key = value2
+	EOF
+
+	git config --unset-all section.key &&
+	test_line_count = 0 .git/config
+'
+
 test_expect_failure 'adding a key into an empty section reuses header' '
 	cat >.git/config <<-\EOF &&
 	[section]
-- 
2.16.2.windows.1.26.g2cc3565eb4b




* [PATCH v2 08/15] config: introduce an optional event stream while parsing
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (6 preceding siblings ...)
  2018-04-03 16:28   ` [PATCH v2 07/15] t1300: `--unset-all` can leave an empty section behind (bug) Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-06 21:22     ` Jeff King
  2018-04-03 16:28   ` [PATCH v2 09/15] config: avoid using the global variable `store` Johannes Schindelin
                     ` (8 subsequent siblings)
  16 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

This extends our config parser so that it can optionally produce an event
stream via callback function, where it reports e.g. when a comment was
parsed, or a section header, etc.

This event stream will subsequently be used to better handle the scenarios
where removing config entries would leave sections empty, or where a new
entry could be added to an already-existing, empty section.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 config.h |  25 ++++++++++++++++
 2 files changed, 115 insertions(+), 12 deletions(-)

diff --git a/config.c b/config.c
index f10f8c6f52f..4cd745f6628 100644
--- a/config.c
+++ b/config.c
@@ -653,7 +653,46 @@ static int get_base_var(struct strbuf *name)
 	}
 }
 
-static int git_parse_source(config_fn_t fn, void *data)
+struct parse_event_data {
+	enum config_event_t previous_type;
+	size_t previous_offset;
+	const struct config_options *opts;
+};
+
+static inline int do_event(enum config_event_t type,
+			   struct parse_event_data *data)
+{
+	size_t offset;
+
+	if (!data->opts || !data->opts->event_fn)
+		return 0;
+
+	if (type == CONFIG_EVENT_WHITESPACE &&
+	    data->previous_type == type)
+		return 0;
+
+	offset = cf->do_ftell(cf);
+	/*
+	 * At EOF, the parser always "inserts" an extra '\n', therefore
+	 * the end offset of the event is the current file position, otherwise
+	 * we will already have advanced to the next event.
+	 */
+	if (type != CONFIG_EVENT_EOF)
+		offset--;
+
+	if (data->previous_type != CONFIG_EVENT_EOF &&
+	    data->opts->event_fn(data->previous_type, data->previous_offset,
+				 offset, data->opts->event_fn_data) < 0)
+		return -1;
+
+	data->previous_type = type;
+	data->previous_offset = offset;
+
+	return 0;
+}
+
+static int git_parse_source(config_fn_t fn, void *data,
+			    const struct config_options *opts)
 {
 	int comment = 0;
 	int baselen = 0;
@@ -664,8 +703,15 @@ static int git_parse_source(config_fn_t fn, void *data)
 	/* U+FEFF Byte Order Mark in UTF8 */
 	const char *bomptr = utf8_bom;
 
+	/* For the parser event callback */
+	struct parse_event_data event_data = {
+		CONFIG_EVENT_EOF, 0, opts
+	};
+
 	for (;;) {
-		int c = get_next_char();
+		int c;
+
+		c = get_next_char();
 		if (bomptr && *bomptr) {
 			/* We are at the file beginning; skip UTF8-encoded BOM
 			 * if present. Sane editors won't put this in on their
@@ -682,18 +728,33 @@ static int git_parse_source(config_fn_t fn, void *data)
 			}
 		}
 		if (c == '\n') {
-			if (cf->eof)
+			if (cf->eof) {
+				if (do_event(CONFIG_EVENT_EOF, &event_data) < 0)
+					return -1;
 				return 0;
+			}
+			if (do_event(CONFIG_EVENT_WHITESPACE, &event_data) < 0)
+				return -1;
 			comment = 0;
 			continue;
 		}
-		if (comment || isspace(c))
+		if (comment)
 			continue;
+		if (isspace(c)) {
+			if (do_event(CONFIG_EVENT_WHITESPACE, &event_data) < 0)
+					return -1;
+			continue;
+		}
 		if (c == '#' || c == ';') {
+			if (do_event(CONFIG_EVENT_COMMENT, &event_data) < 0)
+					return -1;
 			comment = 1;
 			continue;
 		}
 		if (c == '[') {
+			if (do_event(CONFIG_EVENT_SECTION, &event_data) < 0)
+					return -1;
+
 			/* Reset prior to determining a new stem */
 			strbuf_reset(var);
 			if (get_base_var(var) < 0 || var->len < 1)
@@ -704,6 +765,10 @@ static int git_parse_source(config_fn_t fn, void *data)
 		}
 		if (!isalpha(c))
 			break;
+
+		if (do_event(CONFIG_EVENT_ENTRY, &event_data) < 0)
+			return -1;
+
 		/*
 		 * Truncate the var name back to the section header
 		 * stem prior to grabbing the suffix part of the name
@@ -715,6 +780,9 @@ static int git_parse_source(config_fn_t fn, void *data)
 			break;
 	}
 
+	if (do_event(CONFIG_EVENT_ERROR, &event_data) < 0)
+		return -1;
+
 	switch (cf->origin_type) {
 	case CONFIG_ORIGIN_BLOB:
 		error_msg = xstrfmt(_("bad config line %d in blob %s"),
@@ -1398,7 +1466,8 @@ int git_default_config(const char *var, const char *value, void *dummy)
  * fgetc, ungetc, ftell of top need to be initialized before calling
  * this function.
  */
-static int do_config_from(struct config_source *top, config_fn_t fn, void *data)
+static int do_config_from(struct config_source *top, config_fn_t fn, void *data,
+			  const struct config_options *opts)
 {
 	int ret;
 
@@ -1410,7 +1479,7 @@ static int do_config_from(struct config_source *top, config_fn_t fn, void *data)
 	strbuf_init(&top->var, 1024);
 	cf = top;
 
-	ret = git_parse_source(fn, data);
+	ret = git_parse_source(fn, data, opts);
 
 	/* pop config-file parsing state stack */
 	strbuf_release(&top->value);
@@ -1423,7 +1492,7 @@ static int do_config_from(struct config_source *top, config_fn_t fn, void *data)
 static int do_config_from_file(config_fn_t fn,
 		const enum config_origin_type origin_type,
 		const char *name, const char *path, FILE *f,
-		void *data)
+		void *data, const struct config_options *opts)
 {
 	struct config_source top;
 
@@ -1436,15 +1505,18 @@ static int do_config_from_file(config_fn_t fn,
 	top.do_ungetc = config_file_ungetc;
 	top.do_ftell = config_file_ftell;
 
-	return do_config_from(&top, fn, data);
+	return do_config_from(&top, fn, data, opts);
 }
 
 static int git_config_from_stdin(config_fn_t fn, void *data)
 {
-	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin, data);
+	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
+				   data, NULL);
 }
 
-int git_config_from_file(config_fn_t fn, const char *filename, void *data)
+int git_config_from_file_with_options(config_fn_t fn, const char *filename,
+				      void *data,
+				      const struct config_options *opts)
 {
 	int ret = -1;
 	FILE *f;
@@ -1452,13 +1524,19 @@ int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 	f = fopen_or_warn(filename, "r");
 	if (f) {
 		flockfile(f);
-		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, filename, f, data);
+		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename,
+					  filename, f, data, opts);
 		funlockfile(f);
 		fclose(f);
 	}
 	return ret;
 }
 
+int git_config_from_file(config_fn_t fn, const char *filename, void *data)
+{
+	return git_config_from_file_with_options(fn, filename, data, NULL);
+}
+
 int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_type,
 			const char *name, const char *buf, size_t len, void *data)
 {
@@ -1475,7 +1553,7 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
 	top.do_ungetc = config_buf_ungetc;
 	top.do_ftell = config_buf_ftell;
 
-	return do_config_from(&top, fn, data);
+	return do_config_from(&top, fn, data, NULL);
 }
 
 int git_config_from_blob_oid(config_fn_t fn,
diff --git a/config.h b/config.h
index ef70a9cac1e..5a2394daae2 100644
--- a/config.h
+++ b/config.h
@@ -28,15 +28,40 @@ enum config_origin_type {
 	CONFIG_ORIGIN_CMDLINE
 };
 
+enum config_event_t {
+	CONFIG_EVENT_SECTION,
+	CONFIG_EVENT_ENTRY,
+	CONFIG_EVENT_WHITESPACE,
+	CONFIG_EVENT_COMMENT,
+	CONFIG_EVENT_EOF,
+	CONFIG_EVENT_ERROR
+};
+
+/*
+ * The parser event function (if not NULL) is called with the event type and
+ * the begin/end offsets of the parsed elements.
+ *
+ * Note: for CONFIG_EVENT_ENTRY (i.e. config variables), the trailing newline
+ * character is considered part of the element.
+ */
+typedef int (*config_parser_event_fn_t)(enum config_event_t type,
+					size_t begin_offset, size_t end_offset,
+					void *event_fn_data);
+
 struct config_options {
 	unsigned int respect_includes : 1;
 	const char *commondir;
 	const char *git_dir;
+	config_parser_event_fn_t event_fn;
+	void *event_fn_data;
 };
 
 typedef int (*config_fn_t)(const char *, const char *, void *);
 extern int git_default_config(const char *, const char *, void *);
 extern int git_config_from_file(config_fn_t fn, const char *, void *);
+extern int git_config_from_file_with_options(config_fn_t fn, const char *,
+					     void *,
+					     const struct config_options *);
 extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
 					const char *name, const char *buf, size_t len, void *data);
 extern int git_config_from_blob_oid(config_fn_t fn, const char *name,
-- 
2.16.2.windows.1.26.g2cc3565eb4b
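
The deferred reporting in `do_event()` above may be easier to follow in isolation. The sketch below is a simplified stand-in, not the patch itself: the enum and the callback typedef are copied from the config.h hunk, while `toy_event()`, `toy_collect()`, `struct toy_event_data` and `struct toy_event_stats` are hypothetical names. It also assumes that the caller passes the start offset explicitly, where the patch derives it from `cf->do_ftell(cf)`.

```c
#include <stddef.h>

/* Copied from the config.h hunk above so that the sketch compiles alone. */
enum config_event_t {
	CONFIG_EVENT_SECTION,
	CONFIG_EVENT_ENTRY,
	CONFIG_EVENT_WHITESPACE,
	CONFIG_EVENT_COMMENT,
	CONFIG_EVENT_EOF,
	CONFIG_EVENT_ERROR
};

typedef int (*config_parser_event_fn_t)(enum config_event_t type,
					size_t begin_offset, size_t end_offset,
					void *event_fn_data);

/* Hypothetical consumer: counts events and the bytes they cover. */
struct toy_event_stats {
	unsigned int count[CONFIG_EVENT_ERROR + 1];
	size_t bytes;
};

static int toy_collect(enum config_event_t type, size_t begin, size_t end,
		       void *data)
{
	struct toy_event_stats *stats = data;

	if (end < begin)
		return -1; /* a negative return value aborts the parse */
	stats->count[type]++;
	stats->bytes += end - begin;
	return 0;
}

/* Hypothetical stand-in for struct parse_event_data. */
struct toy_event_data {
	enum config_event_t previous_type;
	size_t previous_offset;
	config_parser_event_fn_t event_fn;
	void *event_fn_data;
};

/*
 * Toy analogue of do_event(): `offset` is where the new event starts,
 * which is also where the previous event ends. An event is therefore
 * reported only once its successor has been seen.
 */
static int toy_event(enum config_event_t type, size_t offset,
		     struct toy_event_data *data)
{
	/* Runs of whitespace are reported as a single event. */
	if (type == CONFIG_EVENT_WHITESPACE && data->previous_type == type)
		return 0;

	/* CONFIG_EVENT_EOF doubles as "nothing pending yet". */
	if (data->previous_type != CONFIG_EVENT_EOF &&
	    data->event_fn(data->previous_type, data->previous_offset,
			   offset, data->event_fn_data) < 0)
		return -1;

	data->previous_type = type;
	data->previous_offset = offset;
	return 0;
}
```

With the patch applied, a real consumer would presumably set `event_fn` and `event_fn_data` in a `struct config_options` and pass that to `git_config_from_file_with_options()`. Note that in this model the EOF event itself is never handed to the callback; it only flushes the pending event.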




* [PATCH v2 09/15] config: avoid using the global variable `store`
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (7 preceding siblings ...)
  2018-04-03 16:28   ` [PATCH v2 08/15] config: introduce an optional event stream while parsing Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-06 21:23     ` Jeff King
  2018-04-03 16:28   ` [PATCH v2 10/15] config_set_store: rename some fields for consistency Johannes Schindelin
                     ` (7 subsequent siblings)
  16 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

The config code that sets/unsets variables or removes/renames sections is
much easier to reason about when it does not rely on a global (or
file-local) variable.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c | 119 +++++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 66 insertions(+), 53 deletions(-)

diff --git a/config.c b/config.c
index 4cd745f6628..90ae71cb905 100644
--- a/config.c
+++ b/config.c
@@ -2297,7 +2297,7 @@ void git_die_config(const char *key, const char *err, ...)
  * Find all the stuff for git_config_set() below.
  */
 
-static struct {
+struct config_set_store {
 	int baselen;
 	char *key;
 	int do_not_match;
@@ -2307,56 +2307,58 @@ static struct {
 	unsigned int offset_alloc;
 	enum { START, SECTION_SEEN, SECTION_END_SEEN, KEY_SEEN } state;
 	unsigned int seen;
-} store;
+};
 
-static int matches(const char *key, const char *value)
+static int matches(const char *key, const char *value,
+		   const struct config_set_store *store)
 {
-	if (strcmp(key, store.key))
+	if (strcmp(key, store->key))
 		return 0; /* not ours */
-	if (!store.value_regex)
+	if (!store->value_regex)
 		return 1; /* always matches */
-	if (store.value_regex == CONFIG_REGEX_NONE)
+	if (store->value_regex == CONFIG_REGEX_NONE)
 		return 0; /* never matches */
 
-	return store.do_not_match ^
-		(value && !regexec(store.value_regex, value, 0, NULL, 0));
+	return store->do_not_match ^
+		(value && !regexec(store->value_regex, value, 0, NULL, 0));
 }
 
 static int store_aux(const char *key, const char *value, void *cb)
 {
 	const char *ep;
 	size_t section_len;
+	struct config_set_store *store = cb;
 
-	switch (store.state) {
+	switch (store->state) {
 	case KEY_SEEN:
-		if (matches(key, value)) {
-			if (store.seen == 1 && store.multi_replace == 0) {
+		if (matches(key, value, store)) {
+			if (store->seen == 1 && store->multi_replace == 0) {
 				warning(_("%s has multiple values"), key);
 			}
 
-			ALLOC_GROW(store.offset, store.seen + 1,
-				   store.offset_alloc);
+			ALLOC_GROW(store->offset, store->seen + 1,
+				   store->offset_alloc);
 
-			store.offset[store.seen] = cf->do_ftell(cf);
-			store.seen++;
+			store->offset[store->seen] = cf->do_ftell(cf);
+			store->seen++;
 		}
 		break;
 	case SECTION_SEEN:
 		/*
-		 * What we are looking for is in store.key (both
+		 * What we are looking for is in store->key (both
 		 * section and var), and its section part is baselen
 		 * long.  We found key (again, both section and var).
 		 * We would want to know if this key is in the same
 		 * section as what we are looking for.  We already
 		 * know we are in the same section as what should
-		 * hold store.key.
+		 * hold store->key.
 		 */
 		ep = strrchr(key, '.');
 		section_len = ep - key;
 
-		if ((section_len != store.baselen) ||
-		    memcmp(key, store.key, section_len+1)) {
-			store.state = SECTION_END_SEEN;
+		if ((section_len != store->baselen) ||
+		    memcmp(key, store->key, section_len+1)) {
+			store->state = SECTION_END_SEEN;
 			break;
 		}
 
@@ -2364,26 +2366,27 @@ static int store_aux(const char *key, const char *value, void *cb)
 		 * Do not increment matches: this is no match, but we
 		 * just made sure we are in the desired section.
 		 */
-		ALLOC_GROW(store.offset, store.seen + 1,
-			   store.offset_alloc);
-		store.offset[store.seen] = cf->do_ftell(cf);
+		ALLOC_GROW(store->offset, store->seen + 1,
+			   store->offset_alloc);
+		store->offset[store->seen] = cf->do_ftell(cf);
 		/* fallthru */
 	case SECTION_END_SEEN:
 	case START:
-		if (matches(key, value)) {
-			ALLOC_GROW(store.offset, store.seen + 1,
-				   store.offset_alloc);
-			store.offset[store.seen] = cf->do_ftell(cf);
-			store.state = KEY_SEEN;
-			store.seen++;
+		if (matches(key, value, store)) {
+			ALLOC_GROW(store->offset, store->seen + 1,
+				   store->offset_alloc);
+			store->offset[store->seen] = cf->do_ftell(cf);
+			store->state = KEY_SEEN;
+			store->seen++;
 		} else {
-			if (strrchr(key, '.') - key == store.baselen &&
-			      !strncmp(key, store.key, store.baselen)) {
-					store.state = SECTION_SEEN;
-					ALLOC_GROW(store.offset,
-						   store.seen + 1,
-						   store.offset_alloc);
-					store.offset[store.seen] = cf->do_ftell(cf);
+			if (strrchr(key, '.') - key == store->baselen &&
+			      !strncmp(key, store->key, store->baselen)) {
+					store->state = SECTION_SEEN;
+					ALLOC_GROW(store->offset,
+						   store->seen + 1,
+						   store->offset_alloc);
+					store->offset[store->seen] =
+						cf->do_ftell(cf);
 			}
 		}
 	}
@@ -2398,31 +2401,33 @@ static int write_error(const char *filename)
 	return 4;
 }
 
-static struct strbuf store_create_section(const char *key)
+static struct strbuf store_create_section(const char *key,
+					  const struct config_set_store *store)
 {
 	const char *dot;
 	int i;
 	struct strbuf sb = STRBUF_INIT;
 
-	dot = memchr(key, '.', store.baselen);
+	dot = memchr(key, '.', store->baselen);
 	if (dot) {
 		strbuf_addf(&sb, "[%.*s \"", (int)(dot - key), key);
-		for (i = dot - key + 1; i < store.baselen; i++) {
+		for (i = dot - key + 1; i < store->baselen; i++) {
 			if (key[i] == '"' || key[i] == '\\')
 				strbuf_addch(&sb, '\\');
 			strbuf_addch(&sb, key[i]);
 		}
 		strbuf_addstr(&sb, "\"]\n");
 	} else {
-		strbuf_addf(&sb, "[%.*s]\n", store.baselen, key);
+		strbuf_addf(&sb, "[%.*s]\n", store->baselen, key);
 	}
 
 	return sb;
 }
 
-static ssize_t write_section(int fd, const char *key)
+static ssize_t write_section(int fd, const char *key,
+			     const struct config_set_store *store)
 {
-	struct strbuf sb = store_create_section(key);
+	struct strbuf sb = store_create_section(key, store);
 	ssize_t ret;
 
 	ret = write_in_full(fd, sb.buf, sb.len);
@@ -2431,11 +2436,12 @@ static ssize_t write_section(int fd, const char *key)
 	return ret;
 }
 
-static ssize_t write_pair(int fd, const char *key, const char *value)
+static ssize_t write_pair(int fd, const char *key, const char *value,
+			  const struct config_set_store *store)
 {
 	int i;
 	ssize_t ret;
-	int length = strlen(key + store.baselen + 1);
+	int length = strlen(key + store->baselen + 1);
 	const char *quote = "";
 	struct strbuf sb = STRBUF_INIT;
 
@@ -2455,7 +2461,7 @@ static ssize_t write_pair(int fd, const char *key, const char *value)
 		quote = "\"";
 
 	strbuf_addf(&sb, "\t%.*s = %s",
-		    length, key + store.baselen + 1, quote);
+		    length, key + store->baselen + 1, quote);
 
 	for (i = 0; value[i]; i++)
 		switch (value[i]) {
@@ -2565,6 +2571,9 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 	char *filename_buf = NULL;
 	char *contents = NULL;
 	size_t contents_sz;
+	struct config_set_store store;
+
+	memset(&store, 0, sizeof(store));
 
 	/* parse-key returns negative; flip the sign to feed exit(3) */
 	ret = 0 - git_config_parse_key(key, &store.key, &store.baselen);
@@ -2607,8 +2616,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		}
 
 		store.key = (char *)key;
-		if (write_section(fd, key) < 0 ||
-		    write_pair(fd, key, value) < 0)
+		if (write_section(fd, key, &store) < 0 ||
+		    write_pair(fd, key, value, &store) < 0)
 			goto write_err_out;
 	} else {
 		struct stat st;
@@ -2647,7 +2656,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		 * As a side effect, we make sure to transform only a valid
 		 * existing config file.
 		 */
-		if (git_config_from_file(store_aux, config_filename, NULL)) {
+		if (git_config_from_file(store_aux, config_filename, &store)) {
 			error("invalid config file %s", config_filename);
 			free(store.key);
 			if (store.value_regex != NULL &&
@@ -2731,10 +2740,10 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		/* write the pair (value == NULL means unset) */
 		if (value != NULL) {
 			if (store.state == START) {
-				if (write_section(fd, key) < 0)
+				if (write_section(fd, key, &store) < 0)
 					goto write_err_out;
 			}
-			if (write_pair(fd, key, value) < 0)
+			if (write_pair(fd, key, value, &store) < 0)
 				goto write_err_out;
 		}
 
@@ -2858,7 +2867,8 @@ static int section_name_is_ok(const char *name)
 
 /* if new_name == NULL, the section is removed instead */
 static int git_config_copy_or_rename_section_in_file(const char *config_filename,
-				      const char *old_name, const char *new_name, int copy)
+				      const char *old_name,
+				      const char *new_name, int copy)
 {
 	int ret = 0, remove = 0;
 	char *filename_buf = NULL;
@@ -2868,6 +2878,9 @@ static int git_config_copy_or_rename_section_in_file(const char *config_filename
 	FILE *config_file = NULL;
 	struct stat st;
 	struct strbuf copystr = STRBUF_INIT;
+	struct config_set_store store;
+
+	memset(&store, 0, sizeof(store));
 
 	if (new_name && !section_name_is_ok(new_name)) {
 		ret = error("invalid section name: %s", new_name);
@@ -2937,7 +2950,7 @@ static int git_config_copy_or_rename_section_in_file(const char *config_filename
 				}
 				store.baselen = strlen(new_name);
 				if (!copy) {
-					if (write_section(out_fd, new_name) < 0) {
+					if (write_section(out_fd, new_name, &store) < 0) {
 						ret = write_error(get_lock_file_path(&lock));
 						goto out;
 					}
@@ -2958,7 +2971,7 @@ static int git_config_copy_or_rename_section_in_file(const char *config_filename
 						output[0] = '\t';
 					}
 				} else {
-					copystr = store_create_section(new_name);
+					copystr = store_create_section(new_name, &store);
 				}
 			}
 			remove = 0;
-- 
2.16.2.windows.1.26.g2cc3565eb4b
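
The benefit of threading the state through the callback's `void *` argument, as the patch above does for `store_aux()`, can be shown with a much smaller example. This is a sketch under assumptions: only the `config_fn_t` signature is taken from config.h, whereas `struct toy_store`, `struct toy_pair`, `toy_store_aux()` and `toy_for_each()` are hypothetical and merely stand in for the real parser and store.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the config_fn_t typedef in config.h. */
typedef int (*config_fn_t)(const char *, const char *, void *);

/* Hypothetical, heavily reduced stand-in for struct config_set_store. */
struct toy_store {
	const char *key;
	unsigned int seen;
};

/* All state arrives through `cb`; no global variable is consulted. */
static int toy_store_aux(const char *key, const char *value, void *cb)
{
	struct toy_store *store = cb;

	(void)value; /* unused in this sketch */
	if (!strcmp(key, store->key))
		store->seen++;
	return 0;
}

struct toy_pair {
	const char *key, *value;
};

/* Hypothetical driver that plays the role of the config parser. */
static int toy_for_each(const struct toy_pair *pairs, size_t nr,
			config_fn_t fn, void *data)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (fn(pairs[i].key, pairs[i].value, data) < 0)
			return -1;
	return 0;
}
```

Because each caller owns its own store, two lookups can run over the same input without disturbing each other, which a single file-local `store` could not guarantee.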




* [PATCH v2 10/15] config_set_store: rename some fields for consistency
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (8 preceding siblings ...)
  2018-04-03 16:28   ` [PATCH v2 09/15] config: avoid using the global variable `store` Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 11/15] git_config_set: do not use a state machine Johannes Schindelin
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

The `seen` field is the actual length of the `offset` array, and the
`offset_alloc` field records what was allocated (to avoid resizing
wherever `seen` has to be incremented).

Elsewhere, we use the convention `name` for the array, where `name` is
descriptive enough to guess its purpose, `name_nr` for the actual length
and `name_alloc` to record the maximum length without needing to resize.

Let's make the names of the fields in question consistent with that
convention.

This will also help with the next steps where we will let the
git_config_set() machinery use the config event stream that we just
introduced.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c | 63 +++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/config.c b/config.c
index 90ae71cb905..b73b48b5650 100644
--- a/config.c
+++ b/config.c
@@ -2303,10 +2303,9 @@ struct config_set_store {
 	int do_not_match;
 	regex_t *value_regex;
 	int multi_replace;
-	size_t *offset;
-	unsigned int offset_alloc;
+	size_t *seen;
+	unsigned int seen_nr, seen_alloc;
 	enum { START, SECTION_SEEN, SECTION_END_SEEN, KEY_SEEN } state;
-	unsigned int seen;
 };
 
 static int matches(const char *key, const char *value,
@@ -2332,15 +2331,15 @@ static int store_aux(const char *key, const char *value, void *cb)
 	switch (store->state) {
 	case KEY_SEEN:
 		if (matches(key, value, store)) {
-			if (store->seen == 1 && store->multi_replace == 0) {
+			if (store->seen_nr == 1 && store->multi_replace == 0) {
 				warning(_("%s has multiple values"), key);
 			}
 
-			ALLOC_GROW(store->offset, store->seen + 1,
-				   store->offset_alloc);
+			ALLOC_GROW(store->seen, store->seen_nr + 1,
+				   store->seen_alloc);
 
-			store->offset[store->seen] = cf->do_ftell(cf);
-			store->seen++;
+			store->seen[store->seen_nr] = cf->do_ftell(cf);
+			store->seen_nr++;
 		}
 		break;
 	case SECTION_SEEN:
@@ -2366,26 +2365,26 @@ static int store_aux(const char *key, const char *value, void *cb)
 		 * Do not increment matches: this is no match, but we
 		 * just made sure we are in the desired section.
 		 */
-		ALLOC_GROW(store->offset, store->seen + 1,
-			   store->offset_alloc);
-		store->offset[store->seen] = cf->do_ftell(cf);
+		ALLOC_GROW(store->seen, store->seen_nr + 1,
+			   store->seen_alloc);
+		store->seen[store->seen_nr] = cf->do_ftell(cf);
 		/* fallthru */
 	case SECTION_END_SEEN:
 	case START:
 		if (matches(key, value, store)) {
-			ALLOC_GROW(store->offset, store->seen + 1,
-				   store->offset_alloc);
-			store->offset[store->seen] = cf->do_ftell(cf);
+			ALLOC_GROW(store->seen, store->seen_nr + 1,
+				   store->seen_alloc);
+			store->seen[store->seen_nr] = cf->do_ftell(cf);
 			store->state = KEY_SEEN;
-			store->seen++;
+			store->seen_nr++;
 		} else {
 			if (strrchr(key, '.') - key == store->baselen &&
 			      !strncmp(key, store->key, store->baselen)) {
 					store->state = SECTION_SEEN;
-					ALLOC_GROW(store->offset,
-						   store->seen + 1,
-						   store->offset_alloc);
-					store->offset[store->seen] =
+					ALLOC_GROW(store->seen,
+						   store->seen_nr + 1,
+						   store->seen_alloc);
+					store->seen[store->seen_nr] =
 						cf->do_ftell(cf);
 			}
 		}
@@ -2645,10 +2644,10 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			}
 		}
 
-		ALLOC_GROW(store.offset, 1, store.offset_alloc);
-		store.offset[0] = 0;
+		ALLOC_GROW(store.seen, 1, store.seen_alloc);
+		store.seen[0] = 0;
 		store.state = START;
-		store.seen = 0;
+		store.seen_nr = 0;
 
 		/*
 		 * After this, store.offset will contain the *end* offset
@@ -2676,8 +2675,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		}
 
 		/* if nothing to unset, or too many matches, error out */
-		if ((store.seen == 0 && value == NULL) ||
-				(store.seen > 1 && multi_replace == 0)) {
+		if ((store.seen_nr == 0 && value == NULL) ||
+		    (store.seen_nr > 1 && multi_replace == 0)) {
 			ret = CONFIG_NOTHING_SET;
 			goto out_free;
 		}
@@ -2708,19 +2707,19 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			goto out_free;
 		}
 
-		if (store.seen == 0)
-			store.seen = 1;
+		if (store.seen_nr == 0)
+			store.seen_nr = 1;
 
-		for (i = 0, copy_begin = 0; i < store.seen; i++) {
+		for (i = 0, copy_begin = 0; i < store.seen_nr; i++) {
 			new_line = 0;
-			if (store.offset[i] == 0) {
-				store.offset[i] = copy_end = contents_sz;
+			if (store.seen[i] == 0) {
+				store.seen[i] = copy_end = contents_sz;
 			} else if (store.state != KEY_SEEN) {
-				copy_end = store.offset[i];
+				copy_end = store.seen[i];
 			} else
 				copy_end = find_beginning_of_line(
 					contents, contents_sz,
-					store.offset[i], &new_line);
+					store.seen[i], &new_line);
 
 			if (copy_end > 0 && contents[copy_end-1] != '\n')
 				new_line = 1;
@@ -2734,7 +2733,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 				    write_str_in_full(fd, "\n") < 0)
 					goto write_err_out;
 			}
-			copy_begin = store.offset[i];
+			copy_begin = store.seen[i];
 		}
 
 		/* write the pair (value == NULL means unset) */
-- 
2.16.2.windows.1.26.g2cc3565eb4b



^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH v2 11/15] git_config_set: do not use a state machine
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (9 preceding siblings ...)
  2018-04-03 16:28   ` [PATCH v2 10/15] config_set_store: rename some fields for consistency Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-06 21:28     ` Jeff King
  2018-04-03 16:28   ` [PATCH v2 12/15] git_config_set: make use of the config parser's event stream Johannes Schindelin
                     ` (5 subsequent siblings)
  16 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

While a neat theoretical construct, state machines are hard to read. In
this instance, it does not even make a whole lot of sense because we are
more interested in flags, anyway: has the section been seen? Has the key
been seen? Does the current section match the key we are looking for?

Besides, the state `SECTION_SEEN` was named in a misleading way: it did
not indicate that we saw the section matching the key we are looking
for, but it instead indicated that we are *currently* in that section.

Let's just replace the state machine logic by clear and obvious flags.

This will also make it easier to review the upcoming patches to use the
newly-introduced `event_fn` callback of the config parser.
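
To illustrate the idea, here is a minimal, self-contained sketch of
tracking the three questions as independent flags. It is not the code in
this patch: `struct scan_flags` and `scan_key()` are hypothetical names
made up for the illustration, and the matching is simplified to exact,
case-sensitive comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the flag fields of the store. */
struct scan_flags {
	unsigned int key_seen:1, section_seen:1, is_keys_section:1;
};

/*
 * Hypothetical helper: update the flags after looking at one key of the
 * form "section.name", where `wanted` is the key we are looking for.
 */
static void scan_key(struct scan_flags *f, const char *wanted,
		     const char *key)
{
	const char *wanted_dot = strrchr(wanted, '.');
	const char *key_dot = strrchr(key, '.');
	size_t baselen;
	int same_section;

	assert(wanted_dot); /* the wanted key must contain a section part */
	baselen = wanted_dot - wanted;

	same_section = key_dot &&
		(size_t)(key_dot - key) == baselen &&
		!strncmp(key, wanted, baselen);

	/* "Are we *currently* in the section of the wanted key?" */
	f->is_keys_section = same_section;
	/* "Has that section been seen at all?" (sticky) */
	if (same_section)
		f->section_seen = 1;
	/* "Has the key itself been seen?" (sticky) */
	if (!strcmp(key, wanted))
		f->key_seen = 1;
}
```

Each flag answers exactly one question, so a reader does not have to work
out which combination of facts a state such as `SECTION_SEEN` stands for.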

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c | 59 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/config.c b/config.c
index b73b48b5650..84e8f7ffeb8 100644
--- a/config.c
+++ b/config.c
@@ -2305,7 +2305,7 @@ struct config_set_store {
 	int multi_replace;
 	size_t *seen;
 	unsigned int seen_nr, seen_alloc;
-	enum { START, SECTION_SEEN, SECTION_END_SEEN, KEY_SEEN } state;
+	unsigned int key_seen:1, section_seen:1, is_keys_section:1;
 };
 
 static int matches(const char *key, const char *value,
@@ -2328,8 +2328,7 @@ static int store_aux(const char *key, const char *value, void *cb)
 	size_t section_len;
 	struct config_set_store *store = cb;
 
-	switch (store->state) {
-	case KEY_SEEN:
+	if (store->key_seen) {
 		if (matches(key, value, store)) {
 			if (store->seen_nr == 1 && store->multi_replace == 0) {
 				warning(_("%s has multiple values"), key);
@@ -2341,8 +2340,8 @@ static int store_aux(const char *key, const char *value, void *cb)
 			store->seen[store->seen_nr] = cf->do_ftell(cf);
 			store->seen_nr++;
 		}
-		break;
-	case SECTION_SEEN:
+		return 0;
+	} else if (store->is_keys_section) {
 		/*
 		 * What we are looking for is in store->key (both
 		 * section and var), and its section part is baselen
@@ -2357,10 +2356,9 @@ static int store_aux(const char *key, const char *value, void *cb)
 
 		if ((section_len != store->baselen) ||
 		    memcmp(key, store->key, section_len+1)) {
-			store->state = SECTION_END_SEEN;
-			break;
+			store->is_keys_section = 0;
+			return 0;
 		}
-
 		/*
 		 * Do not increment matches: this is no match, but we
 		 * just made sure we are in the desired section.
@@ -2368,27 +2366,29 @@ static int store_aux(const char *key, const char *value, void *cb)
 		ALLOC_GROW(store->seen, store->seen_nr + 1,
 			   store->seen_alloc);
 		store->seen[store->seen_nr] = cf->do_ftell(cf);
-		/* fallthru */
-	case SECTION_END_SEEN:
-	case START:
-		if (matches(key, value, store)) {
-			ALLOC_GROW(store->seen, store->seen_nr + 1,
-				   store->seen_alloc);
-			store->seen[store->seen_nr] = cf->do_ftell(cf);
-			store->state = KEY_SEEN;
-			store->seen_nr++;
-		} else {
-			if (strrchr(key, '.') - key == store->baselen &&
-			      !strncmp(key, store->key, store->baselen)) {
-					store->state = SECTION_SEEN;
-					ALLOC_GROW(store->seen,
-						   store->seen_nr + 1,
-						   store->seen_alloc);
-					store->seen[store->seen_nr] =
-						cf->do_ftell(cf);
-			}
+	}
+
+	if (matches(key, value, store)) {
+		ALLOC_GROW(store->seen, store->seen_nr + 1,
+			   store->seen_alloc);
+		store->seen[store->seen_nr] = cf->do_ftell(cf);
+		store->seen_nr++;
+		store->key_seen = 1;
+		store->section_seen = 1;
+		store->is_keys_section = 1;
+	} else {
+		if (strrchr(key, '.') - key == store->baselen &&
+		      !strncmp(key, store->key, store->baselen)) {
+				store->section_seen = 1;
+				store->is_keys_section = 1;
+				ALLOC_GROW(store->seen,
+					   store->seen_nr + 1,
+					   store->seen_alloc);
+				store->seen[store->seen_nr] =
+					cf->do_ftell(cf);
 		}
 	}
+
 	return 0;
 }
 
@@ -2646,7 +2646,6 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 
 		ALLOC_GROW(store.seen, 1, store.seen_alloc);
 		store.seen[0] = 0;
-		store.state = START;
 		store.seen_nr = 0;
 
 		/*
@@ -2714,7 +2713,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			new_line = 0;
 			if (store.seen[i] == 0) {
 				store.seen[i] = copy_end = contents_sz;
-			} else if (store.state != KEY_SEEN) {
+			} else if (!store.key_seen) {
 				copy_end = store.seen[i];
 			} else
 				copy_end = find_beginning_of_line(
@@ -2738,7 +2737,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 
 		/* write the pair (value == NULL means unset) */
 		if (value != NULL) {
-			if (store.state == START) {
+			if (!store.section_seen) {
 				if (write_section(fd, key, &store) < 0)
 					goto write_err_out;
 			}
-- 
2.16.2.windows.1.26.g2cc3565eb4b



^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH v2 12/15] git_config_set: make use of the config parser's event stream
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (10 preceding siblings ...)
  2018-04-03 16:28   ` [PATCH v2 11/15] git_config_set: do not use a state machine Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-03 16:28   ` [PATCH v2 13/15] git config --unset: remove empty sections (in the common case) Johannes Schindelin
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

In the recent commit with the title "config: introduce an optional event
stream while parsing", we introduced an optional callback to keep track
of the config parser's events "comment", "white-space", "section header"
and "entry".

One motivation for this feature was to make use of it in the code that
edits the config. And this commit makes it so.

Note: this patch changes the meaning of the `seen` array that records
whether we saw the config entry that is to be edited: previously, it
contained the end offset of the found entry. Now, we introduce a new
array `parsed` that keeps a record of *all* config parser events (with
begin/end offsets), and the items in the `seen` array now point into the
`parsed` array.

There are two reasons why we do it this way:

1. To keep the implementation simple, the config parser's event stream
   reports the event only after the config callback was called, so we
   would not receive the begin offset otherwise.

2. In the following patches, we will re-use the `parsed` array to fix two
   long-standing bugs related to empty sections.

Note that this also makes the code more robust with respect to finding the
begin offset of the part(s) of the config file to be edited, as we no
longer back-track to find the beginning of the line.
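
As a rough illustration of why recorded begin/end offsets make the
back-tracking unnecessary, consider the following self-contained sketch.
It is not the code in this patch: `struct parsed_event`, the `EV_*`
constants and `copy_without_seen()` are hypothetical names, and the sketch
assumes that the `seen` indices are sorted and refer to non-overlapping
events.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event types, loosely modeled on the parser's events. */
enum event_type { EV_SECTION, EV_ENTRY, EV_COMMENT, EV_WHITESPACE };

struct parsed_event {
	size_t begin, end;	/* byte offsets into the file contents */
	enum event_type type;
};

/*
 * Copy `contents` into `out`, skipping every event whose index is listed
 * in `seen`.  `out` must have room for size + 1 bytes.  Returns the
 * number of bytes written (excluding the trailing NUL).
 */
static size_t copy_without_seen(const char *contents, size_t size,
				const struct parsed_event *parsed,
				const unsigned int *seen,
				unsigned int seen_nr, char *out)
{
	size_t copy_begin = 0, len = 0;
	unsigned int i;

	for (i = 0; i < seen_nr; i++) {
		const struct parsed_event *ev = &parsed[seen[i]];

		assert(copy_begin <= ev->begin && ev->begin <= ev->end &&
		       ev->end <= size);
		/* Copy everything up to the beginning of the event... */
		memcpy(out + len, contents + copy_begin,
		       ev->begin - copy_begin);
		len += ev->begin - copy_begin;
		/* ...and continue right after its end. */
		copy_begin = ev->end;
	}
	memcpy(out + len, contents + copy_begin, size - copy_begin);
	len += size - copy_begin;
	out[len] = '\0';
	return len;
}
```

Because both ends of each event are known, the copy loop never has to
scan the contents backwards to guess where a line started.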

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c | 170 ++++++++++++++++++++++++++++++---------------------------------
 1 file changed, 81 insertions(+), 89 deletions(-)

diff --git a/config.c b/config.c
index 84e8f7ffeb8..345b1d2f140 100644
--- a/config.c
+++ b/config.c
@@ -2303,8 +2303,11 @@ struct config_set_store {
 	int do_not_match;
 	regex_t *value_regex;
 	int multi_replace;
-	size_t *seen;
-	unsigned int seen_nr, seen_alloc;
+	struct {
+		size_t begin, end;
+		enum config_event_t type;
+	} *parsed;
+	unsigned int parsed_nr, parsed_alloc, *seen, seen_nr, seen_alloc;
 	unsigned int key_seen:1, section_seen:1, is_keys_section:1;
 };
 
@@ -2322,10 +2325,31 @@ static int matches(const char *key, const char *value,
 		(value && !regexec(store->value_regex, value, 0, NULL, 0));
 }
 
+static int store_aux_event(enum config_event_t type,
+			   size_t begin, size_t end, void *data)
+{
+	struct config_set_store *store = data;
+
+	ALLOC_GROW(store->parsed, store->parsed_nr + 1, store->parsed_alloc);
+	store->parsed[store->parsed_nr].begin = begin;
+	store->parsed[store->parsed_nr].end = end;
+	store->parsed[store->parsed_nr].type = type;
+	store->parsed_nr++;
+
+	if (type == CONFIG_EVENT_SECTION) {
+		if (cf->var.len < 2 || cf->var.buf[cf->var.len - 1] != '.')
+			BUG("Invalid section name '%s'", cf->var.buf);
+
+		/* Is this the section we were looking for? */
+		store->is_keys_section = cf->var.len - 1 == store->baselen &&
+			!strncasecmp(cf->var.buf, store->key, store->baselen);
+	}
+
+	return 0;
+}
+
 static int store_aux(const char *key, const char *value, void *cb)
 {
-	const char *ep;
-	size_t section_len;
 	struct config_set_store *store = cb;
 
 	if (store->key_seen) {
@@ -2337,55 +2361,21 @@ static int store_aux(const char *key, const char *value, void *cb)
 			ALLOC_GROW(store->seen, store->seen_nr + 1,
 				   store->seen_alloc);
 
-			store->seen[store->seen_nr] = cf->do_ftell(cf);
+			store->seen[store->seen_nr] = store->parsed_nr;
 			store->seen_nr++;
 		}
-		return 0;
 	} else if (store->is_keys_section) {
 		/*
-		 * What we are looking for is in store->key (both
-		 * section and var), and its section part is baselen
-		 * long.  We found key (again, both section and var).
-		 * We would want to know if this key is in the same
-		 * section as what we are looking for.  We already
-		 * know we are in the same section as what should
-		 * hold store->key.
+		 * Do not increment matches yet: this may not be a match, but we
+		 * are in the desired section.
 		 */
-		ep = strrchr(key, '.');
-		section_len = ep - key;
-
-		if ((section_len != store->baselen) ||
-		    memcmp(key, store->key, section_len+1)) {
-			store->is_keys_section = 0;
-			return 0;
-		}
-		/*
-		 * Do not increment matches: this is no match, but we
-		 * just made sure we are in the desired section.
-		 */
-		ALLOC_GROW(store->seen, store->seen_nr + 1,
-			   store->seen_alloc);
-		store->seen[store->seen_nr] = cf->do_ftell(cf);
-	}
-
-	if (matches(key, value, store)) {
-		ALLOC_GROW(store->seen, store->seen_nr + 1,
-			   store->seen_alloc);
-		store->seen[store->seen_nr] = cf->do_ftell(cf);
-		store->seen_nr++;
-		store->key_seen = 1;
+		ALLOC_GROW(store->seen, store->seen_nr + 1, store->seen_alloc);
+		store->seen[store->seen_nr] = store->parsed_nr;
 		store->section_seen = 1;
-		store->is_keys_section = 1;
-	} else {
-		if (strrchr(key, '.') - key == store->baselen &&
-		      !strncmp(key, store->key, store->baselen)) {
-				store->section_seen = 1;
-				store->is_keys_section = 1;
-				ALLOC_GROW(store->seen,
-					   store->seen_nr + 1,
-					   store->seen_alloc);
-				store->seen[store->seen_nr] =
-					cf->do_ftell(cf);
+
+		if (matches(key, value, store)) {
+			store->seen_nr++;
+			store->key_seen = 1;
 		}
 	}
 
@@ -2486,32 +2476,6 @@ static ssize_t write_pair(int fd, const char *key, const char *value,
 	return ret;
 }
 
-static ssize_t find_beginning_of_line(const char *contents, size_t size,
-	size_t offset_, int *found_bracket)
-{
-	size_t equal_offset = size, bracket_offset = size;
-	ssize_t offset;
-
-contline:
-	for (offset = offset_-2; offset > 0
-			&& contents[offset] != '\n'; offset--)
-		switch (contents[offset]) {
-			case '=': equal_offset = offset; break;
-			case ']': bracket_offset = offset; break;
-		}
-	if (offset > 0 && contents[offset-1] == '\\') {
-		offset_ = offset;
-		goto contline;
-	}
-	if (bracket_offset < equal_offset) {
-		*found_bracket = 1;
-		offset = bracket_offset+1;
-	} else
-		offset++;
-
-	return offset;
-}
-
 int git_config_set_in_file_gently(const char *config_filename,
 				  const char *key, const char *value)
 {
@@ -2622,6 +2586,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		struct stat st;
 		size_t copy_begin, copy_end;
 		int i, new_line = 0;
+		struct config_options opts;
 
 		if (value_regex == NULL)
 			store.value_regex = NULL;
@@ -2644,17 +2609,24 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			}
 		}
 
-		ALLOC_GROW(store.seen, 1, store.seen_alloc);
-		store.seen[0] = 0;
-		store.seen_nr = 0;
+		ALLOC_GROW(store.parsed, 1, store.parsed_alloc);
+		store.parsed[0].end = 0;
+
+		memset(&opts, 0, sizeof(opts));
+		opts.event_fn = store_aux_event;
+		opts.event_fn_data = &store;
 
 		/*
-		 * After this, store.offset will contain the *end* offset
-		 * of the last match, or remain at 0 if no match was found.
+		 * After this, store.parsed will contain offsets of all the
+		 * parsed elements, and store.seen will contain a list of
+		 * matches, as indices into store.parsed.
+		 *
 		 * As a side effect, we make sure to transform only a valid
 		 * existing config file.
 		 */
-		if (git_config_from_file(store_aux, config_filename, &store)) {
+		if (git_config_from_file_with_options(store_aux,
+						      config_filename,
+						      &store, &opts)) {
 			error("invalid config file %s", config_filename);
 			free(store.key);
 			if (store.value_regex != NULL &&
@@ -2706,19 +2678,39 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			goto out_free;
 		}
 
-		if (store.seen_nr == 0)
+		if (store.seen_nr == 0) {
+			if (!store.seen_alloc) {
+				/* Did not see key nor section */
+				ALLOC_GROW(store.seen, 1, store.seen_alloc);
+				store.seen[0] = store.parsed_nr
+					- !!store.parsed_nr;
+			}
 			store.seen_nr = 1;
+		}
 
 		for (i = 0, copy_begin = 0; i < store.seen_nr; i++) {
+			size_t replace_end;
+			int j = store.seen[i];
+
 			new_line = 0;
-			if (store.seen[i] == 0) {
-				store.seen[i] = copy_end = contents_sz;
-			} else if (!store.key_seen) {
-				copy_end = store.seen[i];
-			} else
-				copy_end = find_beginning_of_line(
-					contents, contents_sz,
-					store.seen[i], &new_line);
+			if (!store.key_seen) {
+				replace_end = copy_end = store.parsed[j].end;
+			} else {
+				replace_end = store.parsed[j].end;
+				copy_end = store.parsed[j].begin;
+				/*
+				 * Swallow preceding white-space on the same
+				 * line.
+				 */
+				while (copy_end > 0 ) {
+					char c = contents[copy_end - 1];
+
+					if (isspace(c) && c != '\n')
+						copy_end--;
+					else
+						break;
+				}
+			}
 
 			if (copy_end > 0 && contents[copy_end-1] != '\n')
 				new_line = 1;
@@ -2732,7 +2724,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 				    write_str_in_full(fd, "\n") < 0)
 					goto write_err_out;
 			}
-			copy_begin = store.seen[i];
+			copy_begin = replace_end;
 		}
 
 		/* write the pair (value == NULL means unset) */
-- 
2.16.2.windows.1.26.g2cc3565eb4b



^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH v2 13/15] git config --unset: remove empty sections (in the common case)
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (11 preceding siblings ...)
  2018-04-03 16:28   ` [PATCH v2 12/15] git_config_set: make use of the config parser's event stream Johannes Schindelin
@ 2018-04-03 16:28   ` Johannes Schindelin
  2018-04-03 16:29   ` [PATCH v2 14/15] git_config_set: reuse empty sections Johannes Schindelin
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

The original reasoning for not removing section headers upon removal of
the last entry went like this: the user could have added comments about
the section, or about the entries therein, and if there were other
comments there, we would not know whether we should remove them.

In particular, a concocted example was presented that looked like this
(and was added to t1300):

	# some generic comment on the configuration file itself
	# a comment specific to this "section" section.
	[section]
	# some intervening lines
	# that should also be dropped

	key = value
	# please be careful when you update the above variable

The ideal thing for `git config --unset section.key` in this case would
be to leave only the first line behind, because all the other comments
are now obsolete.

However, this is unfeasible, short of adding a complete Natural Language
Processing module to Git, which seems not only a lot of work, but a
totally unreasonable feature (for little benefit to most users).

Now, the real kicker about this problem is: most users do not edit their
config files at all! In their use case, the config looks like this
instead:

	[section]
		key = value

... and it is totally obvious what should happen if the entry is
removed: the entire section should vanish.

Let's generalize this observation to this conservative strategy: if we
are removing the last entry from a section, and there are no comments
inside that section nor surrounding it, then remove the entire section.
Otherwise behave as before: leave the now-empty section (including those
comments, even ones about the now-deleted entry).

We have to be extra careful to handle the case where more than one entry
is removed: any subset of them might be the last entries of their
respective sections (and if there are no comments in or around that
section, the section should be removed, too).
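
The conservative rule can be sketched roughly as follows. This is a
simplified, self-contained illustration rather than the code in this
patch: `section_is_removable()` and the `EV_*` constants are hypothetical
names, the caller is assumed to have marked the entries to be removed in
a `removed` array, and repeated headers of the same section are not
handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types, loosely modeled on the parser's events. */
enum ev { EV_SECTION, EV_ENTRY, EV_COMMENT, EV_WHITESPACE };

/*
 * Return 1 if the section whose header is the event at index `header`
 * can be dropped entirely: no comment directly before the header, no
 * comment inside the section, and every entry in it marked as removed.
 */
static int section_is_removable(const enum ev *types, const int *removed,
				size_t nr, size_t header)
{
	size_t i;

	assert(header < nr && types[header] == EV_SECTION);

	/* Look backwards: a comment before the header may be about it. */
	for (i = header; i > 0; i--) {
		if (types[i - 1] == EV_COMMENT)
			return 0;
		if (types[i - 1] != EV_WHITESPACE)
			break; /* previous entry or section header */
	}

	/* Look forwards until the next section header (or the end). */
	for (i = header + 1; i < nr; i++) {
		if (types[i] == EV_COMMENT)
			return 0;
		if (types[i] == EV_SECTION)
			break;
		if (types[i] == EV_ENTRY && !removed[i])
			return 0; /* an entry survives; keep the section */
	}
	return 1;
}
```

Whenever a comment is found, the sketch errs on the side of keeping the
section, which mirrors the "otherwise behave as before" part of the
strategy.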

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c          | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 t/t1300-config.sh |  4 +--
 2 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/config.c b/config.c
index 345b1d2f140..271e9605ec1 100644
--- a/config.c
+++ b/config.c
@@ -2306,6 +2306,7 @@ struct config_set_store {
 	struct {
 		size_t begin, end;
 		enum config_event_t type;
+		int is_keys_section;
 	} *parsed;
 	unsigned int parsed_nr, parsed_alloc, *seen, seen_nr, seen_alloc;
 	unsigned int key_seen:1, section_seen:1, is_keys_section:1;
@@ -2334,17 +2335,20 @@ static int store_aux_event(enum config_event_t type,
 	store->parsed[store->parsed_nr].begin = begin;
 	store->parsed[store->parsed_nr].end = end;
 	store->parsed[store->parsed_nr].type = type;
-	store->parsed_nr++;
 
 	if (type == CONFIG_EVENT_SECTION) {
 		if (cf->var.len < 2 || cf->var.buf[cf->var.len - 1] != '.')
 			BUG("Invalid section name '%s'", cf->var.buf);
 
 		/* Is this the section we were looking for? */
-		store->is_keys_section = cf->var.len - 1 == store->baselen &&
+		store->is_keys_section =
+			store->parsed[store->parsed_nr].is_keys_section =
+			cf->var.len - 1 == store->baselen &&
 			!strncasecmp(cf->var.buf, store->key, store->baselen);
 	}
 
+	store->parsed_nr++;
+
 	return 0;
 }
 
@@ -2476,6 +2480,87 @@ static ssize_t write_pair(int fd, const char *key, const char *value,
 	return ret;
 }
 
+/*
+ * If we are about to unset the last key(s) in a section, and if there are
+ * no comments surrounding (or included in) the section, we will want to
+ * extend begin/end to remove the entire section.
+ *
+ * Note: the parameter `seen_ptr` points to the index into the store.seen
+ * array. This index may be incremented if a section has more than one
+ * entry (which all are to be removed).
+ */
+static void maybe_remove_section(struct config_set_store *store,
+				 const char *contents,
+				 size_t *begin_offset, size_t *end_offset,
+				 int *seen_ptr)
+{
+	size_t begin;
+	int i, seen, section_seen = 0;
+
+	/*
+	 * First, ensure that this is the first key, and that there are no
+	 * comments before the entry nor before the section header.
+	 */
+	seen = *seen_ptr;
+	for (i = store->seen[seen]; i > 0; i--) {
+		enum config_event_t type = store->parsed[i - 1].type;
+
+		if (type == CONFIG_EVENT_COMMENT)
+			/* There is a comment before this entry or section */
+			return;
+		if (type == CONFIG_EVENT_ENTRY) {
+			if (!section_seen)
+				/* This is not the section's first entry. */
+				return;
+			/* We encountered no comment before the section. */
+			break;
+		}
+		if (type == CONFIG_EVENT_SECTION) {
+			if (!store->parsed[i - 1].is_keys_section)
+				break;
+			section_seen = 1;
+		}
+	}
+	begin = store->parsed[i].begin;
+
+	/*
+	 * Next, make sure that we are removing the last key(s) in the section,
+	 * and that there are no comments that are possibly about the current
+	 * section.
+	 */
+	for (i = store->seen[seen] + 1; i < store->parsed_nr; i++) {
+		enum config_event_t type = store->parsed[i].type;
+
+		if (type == CONFIG_EVENT_COMMENT)
+			return;
+		if (type == CONFIG_EVENT_SECTION) {
+			if (store->parsed[i].is_keys_section)
+				continue;
+			break;
+		}
+		if (type == CONFIG_EVENT_ENTRY) {
+			if (++seen < store->seen_nr &&
+			    i == store->seen[seen])
+				/* We want to remove this entry, too */
+				continue;
+			/* There is another entry in this section. */
+			return;
+		}
+	}
+
+	/*
+	 * We are really removing the last entry/entries from this section, and
+	 * there are no enclosed or surrounding comments. Remove the entire,
+	 * now-empty section.
+	 */
+	*seen_ptr = seen;
+	*begin_offset = begin;
+	if (i < store->parsed_nr)
+		*end_offset = store->parsed[i].begin;
+	else
+		*end_offset = store->parsed[store->parsed_nr - 1].end;
+}
+
 int git_config_set_in_file_gently(const char *config_filename,
 				  const char *key, const char *value)
 {
@@ -2698,6 +2783,10 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			} else {
 				replace_end = store.parsed[j].end;
 				copy_end = store.parsed[j].begin;
+				if (!value)
+					maybe_remove_section(&store, contents,
+							     &copy_end,
+							     &replace_end, &i);
 				/*
 				 * Swallow preceding white-space on the same
 				 * line.
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 10b9bf4b088..6d34513eedd 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1413,7 +1413,7 @@ test_expect_success 'urlmatch with wildcard' '
 '
 
 # good section hygiene
-test_expect_failure '--unset last key removes section (except if commented)' '
+test_expect_success '--unset last key removes section (except if commented)' '
 	cat >.git/config <<-\EOF &&
 	# some generic comment on the configuration file itself
 	# a comment specific to this "section" section.
@@ -1452,7 +1452,7 @@ test_expect_failure '--unset last key removes section (except if commented)' '
 	test_cmp expect .git/config
 '
 
-test_expect_failure '--unset-all removes section if empty & uncommented' '
+test_expect_success '--unset-all removes section if empty & uncommented' '
 	cat >.git/config <<-\EOF &&
 	[section]
 	key = value1
-- 
2.16.2.windows.1.26.g2cc3565eb4b



^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH v2 14/15] git_config_set: reuse empty sections
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (12 preceding siblings ...)
  2018-04-03 16:28   ` [PATCH v2 13/15] git config --unset: remove empty sections (in the common case) Johannes Schindelin
@ 2018-04-03 16:29   ` Johannes Schindelin
  2018-04-03 16:30   ` [PATCH v2 00/15] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:29 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

It can happen quite easily that the last setting in a config section is
removed, and to avoid confusion when there are comments in the config
about that section, we keep a lone section header, i.e. an empty
section.

Now that we use the `event_fn` callback, it is easy to add support for
re-using empty sections, so let's do that.

Note: t5512-ls-remote requires that this change is applied *after* the
patch "git config --unset: remove empty sections (in the common case)":
without that patch, there would be empty `transfer` and `uploadpack`
sections ready for reuse, but in the *wrong* order (and consequently,
t5512's "overrides work between mixed transfer/upload-pack hideRefs"
would fail).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c          | 14 +++++++++++++-
 t/t1300-config.sh |  2 +-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/config.c b/config.c
index 271e9605ec1..ee7ea24123d 100644
--- a/config.c
+++ b/config.c
@@ -2345,6 +2345,12 @@ static int store_aux_event(enum config_event_t type,
 			store->parsed[store->parsed_nr].is_keys_section =
 			cf->var.len - 1 == store->baselen &&
 			!strncasecmp(cf->var.buf, store->key, store->baselen);
+		if (store->is_keys_section) {
+			store->section_seen = 1;
+			ALLOC_GROW(store->seen, store->seen_nr + 1,
+				   store->seen_alloc);
+			store->seen[store->seen_nr] = store->parsed_nr;
+		}
 	}
 
 	store->parsed_nr++;
@@ -2779,7 +2785,13 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 
 			new_line = 0;
 			if (!store.key_seen) {
-				replace_end = copy_end = store.parsed[j].end;
+				copy_end = store.parsed[j].end;
+				/* include '\n' when copying section header */
+				if (copy_end > 0 && copy_end < contents_sz &&
+				    contents[copy_end - 1] != '\n' &&
+				    contents[copy_end] == '\n')
+					copy_end++;
+				replace_end = copy_end;
 			} else {
 				replace_end = store.parsed[j].end;
 				copy_end = store.parsed[j].begin;
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 6d34513eedd..6d0e13020d1 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1463,7 +1463,7 @@ test_expect_success '--unset-all removes section if empty & uncommented' '
 	test_line_count = 0 .git/config
 '
 
-test_expect_failure 'adding a key into an empty section reuses header' '
+test_expect_success 'adding a key into an empty section reuses header' '
 	cat >.git/config <<-\EOF &&
 	[section]
 	EOF
-- 
2.16.2.windows.1.26.g2cc3565eb4b

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 00/15] Assorted fixes for `git config` (including the "empty sections" bug)
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (13 preceding siblings ...)
  2018-04-03 16:29   ` [PATCH v2 14/15] git_config_set: reuse empty sections Johannes Schindelin
@ 2018-04-03 16:30   ` Johannes Schindelin
  2018-04-06 21:33   ` Jeff King
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
  16 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-03 16:30 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi team,

On Tue, 3 Apr 2018, Johannes Schindelin wrote:

> Johannes Schindelin (15):
>   git_config_set: fix off-by-two
>   t1300: rename it to reflect that `repo-config` was deprecated
>   t1300: demonstrate that --replace-all can "invent" newlines
>   config --replace-all: avoid extra line breaks
>   t1300: avoid relying on a bug
>   t1300: remove unreasonable expectation from TODO
>   t1300: `--unset-all` can leave an empty section behind (bug)
>   config: introduce an optional event stream while parsing
>   config: avoid using the global variable `store`
>   config_set_store: rename some fields for consistency
>   git_config_set: do not use a state machine
>   git_config_set: make use of the config parser's event stream
>   git config --unset: remove empty sections (in the common case)
>   git_config_set: reuse empty sections
>   TODOs

Please note that the `TODOs` commit is a left-over of my internal
book-keeping, and its diff is actually empty. Hence `format-patch` does
not even generate a mail for it, so there is no [PATCH v2 15/15].

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: A potential approach to making tests faster on Windows
  2018-04-03  9:49             ` Johannes Schindelin
  2018-04-03 11:28               ` Ævar Arnfjörð Bjarmason
@ 2018-04-03 21:36               ` Eric Sunshine
  1 sibling, 0 replies; 103+ messages in thread
From: Eric Sunshine @ 2018-04-03 21:36 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Stefan Beller,
	git, Junio C Hamano, Thomas Rast, Phil Haack, Jason Frey,
	Philip Oakley, Duy Nguyen

On Tue, Apr 3, 2018 at 5:49 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> My main evidence that shell scripts on macOS are slower than on Linux was
> the difference of the improvement incurred by moving more things from
> git-rebase--interactive.sh into sequencer.c: Linux saw an improvement only
> of about 3x, while macOS saw an improvement of 4x, IIRC. If I don't
> remember the absolute numbers correctly, at least I vividly remember the
> qualitative difference: It was noticeable.

MacOS is _slow_, much, much slower than, say, Linux.

Several years ago, when I had this machine configured for multi-boot,
I ran MacOS and Linux on bare metal. Back then, using ram disk for the
"trash" directories, and disabling Spotlight indexing on MacOS to
avoid it eating CPU and causing I/O contention, the Git test suite
would run to completion on Linux in slightly over 1 minute. On MacOS,
it would take over 10 minutes; 10 times slower.

These days, the Git test suite takes 15 minutes to run on the same
hardware (with same conditions: ram disk and Spotlight disabled),
which is painfully slow, thus I rarely do it. Unfortunately, I don't
have Linux installed on bare metal anymore, so I can't make a proper
comparison, but I do run Linux in a virtual machine under MacOS and,
even though it's running within a virtualized environment, Linux is
still much faster than MacOS, taking 4:25 (slow, but not to the point
of outright pain).

That the test suite runs so much faster on Linux (bare metal or
virtualized) than MacOS on this machine, I have attributed (or
understood as being due) to poor HFS+ filesystem performance. It's
even worse when Spotlight interferes. Presumably, the new, recently
released, Mac filesystem has improved performance, but it's restricted
to SSDs, whereas this machine has a physical drive, thus I can't test
it.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/15] config: introduce an optional event stream while parsing
  2018-04-03 16:28   ` [PATCH v2 08/15] config: introduce an optional event stream while parsing Johannes Schindelin
@ 2018-04-06 21:22     ` Jeff King
  2018-04-09  7:35       ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-04-06 21:22 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Tue, Apr 03, 2018 at 06:28:29PM +0200, Johannes Schindelin wrote:

> This extends our config parser so that it can optionally produce an event
> stream via callback function, where it reports e.g. when a comment was
> parsed, or a section header, etc.
> 
> This parser will be used subsequently to handle the scenarios better where
> removing config entries would make sections empty, or where a new entry
> could be added to an already-existing, empty section.

Nice, it looks like this didn't end up being too bad to go in this
direction. It seems like this is an optional "also emit the events here"
function you can set. I think in the long run we could actually just
always emit the events to this function. And then we could wrap that to
provide an interface that matches the existing callbacks (just an
event-stream callback that sees EVENT_ENTRY and calls the sub-callback).
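
A rough sketch of what such a wrapper could look like, assuming
(hypothetically) that an entry event carried the key and value with it.
All `sketch_*` names and the `SKETCH_EVENT_*` constants are made up for
illustration and are not part of the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types, loosely modeled on the ones in the patch. */
enum sketch_event_type {
	SKETCH_EVENT_SECTION,
	SKETCH_EVENT_ENTRY,
	SKETCH_EVENT_COMMENT
};

/* Assumption: an entry event knows its key and value. */
struct sketch_event {
	enum sketch_event_type type;
	const char *key;   /* only meaningful for SKETCH_EVENT_ENTRY */
	const char *value; /* only meaningful for SKETCH_EVENT_ENTRY */
};

/* Same shape as the existing per-entry config callbacks. */
typedef int (*sketch_config_fn)(const char *key, const char *value, void *cb);

struct sketch_wrapper {
	sketch_config_fn fn; /* the "sub-callback" */
	void *cb;            /* its callback data */
};

/* Event-stream callback that forwards entries and ignores everything else. */
static int sketch_wrap_event(const struct sketch_event *ev, void *data)
{
	struct sketch_wrapper *w = data;

	assert(w && w->fn);
	if (ev->type != SKETCH_EVENT_ENTRY)
		return 0;
	return w->fn(ev->key, ev->value, w->cb);
}

/* Example sub-callback: counts the entries it is handed. */
static int sketch_count_entries(const char *key, const char *value, void *cb)
{
	(void)key;
	(void)value;
	++*(int *)cb;
	return 0;
}
```

Feeding a section, an entry and a comment event through
sketch_wrap_event() would invoke the sub-callback exactly once, for the
entry.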

But that might end up quite a pain, since we have a zillion entry points
into the config parser, making wrapping tough. So I'm perfectly happy to
stop here for now.

> +static inline int do_event(enum config_event_t type,
> +			   struct parse_event_data *data)

I'm not sure if "inline" here is a good idea, as it seems to get called
quite a few times. If we're trying to make things fast, bloating the
instruction cache may have the opposite effect.

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 09/15] config: avoid using the global variable `store`
  2018-04-03 16:28   ` [PATCH v2 09/15] config: avoid using the global variable `store` Johannes Schindelin
@ 2018-04-06 21:23     ` Jeff King
  2018-04-09  7:36       ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-04-06 21:23 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Tue, Apr 03, 2018 at 06:28:34PM +0200, Johannes Schindelin wrote:

> The config code that sets/unsets variables or removes/renames sections
> is much easier to reason about when it does not rely on a global (or
> file-local) variable.

Agreed.

> -static struct {
> +struct config_set_store {

This made me think of the existing "configset", which is quite a
different thing. Maybe just "config_store_data" or something would clash
less.

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/15] git_config_set: do not use a state machine
  2018-04-03 16:28   ` [PATCH v2 11/15] git_config_set: do not use a state machine Johannes Schindelin
@ 2018-04-06 21:28     ` Jeff King
  2018-04-09  7:50       ` Johannes Schindelin
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-04-06 21:28 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Tue, Apr 03, 2018 at 06:28:42PM +0200, Johannes Schindelin wrote:

> While a neat theoretical construct, state machines are hard to read. In
> this instance, it does not even make a whole lot of sense because we are
> more interested in flags, anyway: has the section been seen? Has the key
> been seen? Does the current section match the key we are looking for?
> 
> Besides, the state `SECTION_SEEN` was named in a misleading way: it did
> not indicate that we saw the section matching the key we are looking
> for, but it instead indicated that we are *currently* in that section.
> 
> Let's just replace the state machine logic by clear and obvious flags.
> 
> This will also make it easier to review the upcoming patches to use the
> newly-introduced `event_fn` callback of the config parser.

I think this is probably a good direction. But one thing state machines
can help with is keeping the state to a manageable size. With 3 bits of
flags, we now have 8 possible states, up from the previous 4.

Clearly some of those are nonsensical (can you be in key_seen without
section_seen? I'd think not), but it's up to the code to interpret and
reset those manually.
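
To make the counting concrete, here is a small sketch. Only `key_seen`
and `section_seen` are names from this discussion; the third flag name
and all `sketch_*` identifiers are hypothetical. It enumerates the eight
flag combinations and counts those that respect the one invariant
mentioned above:

```c
#include <assert.h>

/* Three independent flags; in_matching_section is a made-up name. */
struct sketch_flags {
	unsigned section_seen : 1;
	unsigned key_seen : 1;
	unsigned in_matching_section : 1;
};

/* The one invariant named above: key_seen implies section_seen. */
static int sketch_is_sensible(struct sketch_flags f)
{
	return !f.key_seen || f.section_seen;
}

/* With plain flags, a guard like this has to be called by hand. */
static void sketch_check_invariant(struct sketch_flags f)
{
	assert(sketch_is_sensible(f));
}

/* Enumerate all 2^3 combinations and count the sensible ones. */
static int sketch_count_sensible(void)
{
	int bits, count = 0;

	for (bits = 0; bits < 8; bits++) {
		struct sketch_flags f;

		f.section_seen = bits & 1;
		f.key_seen = (bits >> 1) & 1;
		f.in_matching_section = (bits >> 2) & 1;
		count += sketch_is_sensible(f);
	}
	return count;
}
```

Under that single invariant, two of the eight combinations (key_seen
set while section_seen is clear) are the nonsensical ones, and nothing
in the type prevents the code from reaching them.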

I'll defer to your judgement, though, on this making things for the
future patches more readable. You have spent a lot more time poking at it
than I have.

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 00/15] Assorted fixes for `git config` (including the "empty sections" bug)
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (14 preceding siblings ...)
  2018-04-03 16:30   ` [PATCH v2 00/15] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
@ 2018-04-06 21:33   ` Jeff King
  2018-04-09  8:19     ` Johannes Schindelin
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
  16 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-04-06 21:33 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Tue, Apr 03, 2018 at 06:27:55PM +0200, Johannes Schindelin wrote:

> I am very, very grateful for the time Peff spent on reviewing the previous
> iteration, and hope that he realizes just how much the elegance of the
> event-stream-based version is due to his excellent review.

Unfortunately I ran out of time this week to give this version an
equally careful review, and I'm about to go on vacation for a few weeks.

I did give a cursory look over it, and the new maybe_remove_section() is
much more pleasant. So aside from a few minor nits I pointed out, this
generally looks good.

One thing I'd like to have seen is a few more tests covering exotic
cases that I turned up in my earlier review. Some of the weird multiline
cases I care less about, but we should probably cover at least:

  1. Comment behavior when removing a section that isn't at the
     beginning of the file.

  2. Removing the final key from a section with a subsection.

Those should both be natural fallouts of the new method, but it would be
good to have test coverage.

Thanks for reworking this, and if it's still not merged when I get back,
I promise to review it more carefully then. :)

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: A potential approach to making tests faster on Windows
  2018-04-03 16:00               ` Johannes Schindelin
@ 2018-04-06 21:40                 ` Jeff King
  2018-04-06 21:57                   ` Stefan Beller
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-04-06 21:40 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Ævar Arnfjörð Bjarmason, Stefan Beller, git,
	Junio C Hamano, Thomas Rast, Phil Haack, Jason Frey,
	Philip Oakley, Duy Nguyen

On Tue, Apr 03, 2018 at 06:00:05PM +0200, Johannes Schindelin wrote:

> > But if we're at the point of creating custom C builtins for
> > busybox/dash/etc, you should be able to create a primitive for "read
> > this using buffered stdio, other processes be damned, and return one
> > line at a time".
> 
> Well, you know, I do not think that papering over the root cause will make
> anything better. And the root cause is that we use a test framework
> written in Unix shell.

I'm not entirely convinced of this. My earlier numbers show that we
spend a lot of time actually running Git. But that's not because we're
written in shell, but because the stable interface to Git is running
individual processes.

So we can unit-test wildmatch or similar in a single C program, but I
think we inherently need to run "git init" a lot of times.

Now I think there's reason to doubt some of my numbers. I was counting
exec's, and non-exec forks due to subshells, etc, may be important. So I
claim only that I remain unconvinced that we are certain of the root
cause.

At any rate, I would be happy to see more study into this. If we can
create a measurable speedup for an existing script, that might give us a
blueprint for speeding up the whole suite.

-Peff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: A potential approach to making tests faster on Windows
  2018-04-06 21:40                 ` Jeff King
@ 2018-04-06 21:57                   ` Stefan Beller
  0 siblings, 0 replies; 103+ messages in thread
From: Stefan Beller @ 2018-04-06 21:57 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin, Ævar Arnfjörð Bjarmason, git,
	Junio C Hamano, Thomas Rast, Phil Haack, Jason Frey,
	Philip Oakley, Duy Nguyen

On Fri, Apr 6, 2018 at 2:40 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Apr 03, 2018 at 06:00:05PM +0200, Johannes Schindelin wrote:
>
>> > But if we're at the point of creating custom C builtins for
>> > busybox/dash/etc, you should be able to create a primitive for "read
>> > this using buffered stdio, other processes be damned, and return one
>> > line at a time".
>>
>> Well, you know, I do not think that papering over the root cause will make
>> anything better. And the root cause is that we use a test framework
>> written in Unix shell.
>
> I'm not entirely convinced of this. My earlier numbers show that we
> spend a lot of time actually running Git. But that's not because we're
> written in shell, but because the stable interface to Git is running
> individual processes.
>
> So we can unit-test wildmatch or similar in a single C program, but I
> think we inherently need to run "git init" a lot of times.
>
> Now I think there's reason to doubt some of my numbers. I was counting
> exec's, and non-exec forks due to subshells, etc, may be important. So I
> claim only that I remain unconvinced that we are certain of the root
> cause.
>
> At any rate, I would be happy to see more study into this. If we can
> create a measurable speedup for an existing script, that might give us a
> blueprint for speeding up the whole suite.

The setup of each test is finicky, as we'd do different setups for each test
as we'd test different things. I once wondered if we'd want to have a
"ready made" directory that contains repositories in various states
that we can copy for each test and only need minimal adjustments
instead of writing the setup from scratch in each script.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/9] git_config_set: fix off-by-two
  2018-04-03  9:31                 ` Johannes Schindelin
  2018-04-03 15:29                   ` Duy Nguyen
@ 2018-04-08 23:12                   ` Junio C Hamano
  1 sibling, 0 replies; 103+ messages in thread
From: Junio C Hamano @ 2018-04-08 23:12 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Ævar Arnfjörð Bjarmason, Jeff King, Stefan Beller,
	git, Thomas Rast, Phil Haack, Jason Frey, Philip Oakley

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> Yes, it is a workaround.  Making shell faster on windows would of
>> course be one possible solution to make t/t*.sh scripts go faster
>> ;-)  Or update parts of t/t*.sh so that the equivalent test coverage
>> can be kept while making them go faster on Windows.
>
> What makes you think that I did not try my hardest for around 812 hours in
> total so far to make the shell faster?

Nowhere in these four lines did I ever say that I think you did not
work hard to solve the performance issues you have.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/15] config: introduce an optional event stream while parsing
  2018-04-06 21:22     ` Jeff King
@ 2018-04-09  7:35       ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  7:35 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Fri, 6 Apr 2018, Jeff King wrote:

> On Tue, Apr 03, 2018 at 06:28:29PM +0200, Johannes Schindelin wrote:
> 
> > This extends our config parser so that it can optionally produce an
> > event stream via callback function, where it reports e.g. when a
> > comment was parsed, or a section header, etc.
> > 
> > This parser will be used subsequently to handle the scenarios better
> > where removing config entries would make sections empty, or where a
> > new entry could be added to an already-existing, empty section.
> 
> Nice, it looks like this didn't end up being too bad to go in this
> direction. It seems like this is an optional "also emit the events here"
> function you can set.

Yes.

> I think in the long run we could actually just always emit the events to
> this function. And then we could wrap that to provide an interface that
> matches the existing callbacks (just an event-stream callback that sees
> EVENT_ENTRY and calls the sub-callback).

Well, not precisely. The event stream was implemented in a minimal
fashion, in particular *not* emitting enough information in the event
stream for that. To keep things as little intrusive as possible, the
CONFIG_EVENT_ENTRY event is only emitted *after* the config_fn is called,
and at that point we do not even know the key and the value any more.

I fear that it would make the code quite a bit more complicated to change
it in the way you suggested.

Side note: a slightly ugly aspect of my patch series is that the
CONFIG_EVENT_SECTION event *also* does not provide the interesting
information (in this case, the section name), but that it has to be
inferred from the cf->var field (which is file-local to config.c, and
which has been set to the section name followed by a single '.' at that
point). Again, this keeps the diff simpler to review, and that's why I did
it that way.

> But that might end up quite a pain, since we have a zillion entry points
> into the config parser, making wrapping tough. So I'm perfectly happy to
> stop here for now.

Right.

> > +static inline int do_event(enum config_event_t type,
> > +			   struct parse_event_data *data)
> 
> I'm not sure if "inline" here is a good idea, as it seems to get called
> quite a few times. If we're trying to make things fast, bloating the
> instruction cache may have the opposite effect.

Good point.

The reason I declared this as inline function was that I test whether
either data->opts or data->opts->event_fn are NULL, and whether we are
continuing to look at whitespace, for early returns from that function.
Which I wanted to avoid doing in a hot function (I'd rather skip calling
the function if it is pointless to call it).

However, the config code is hardly performance-critical, as we do not
expect to parse hundreds of kilobytes, right? So that "inline" was a
premature optimization.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 09/15] config: avoid using the global variable `store`
  2018-04-06 21:23     ` Jeff King
@ 2018-04-09  7:36       ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  7:36 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Fri, 6 Apr 2018, Jeff King wrote:

> On Tue, Apr 03, 2018 at 06:28:34PM +0200, Johannes Schindelin wrote:
> 
> > -static struct {
> > +struct config_set_store {
> 
> This made me think of the existing "configset", which is quite a
> different thing. Maybe just "config_store_data" or something would clash
> less.

Sure,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/15] git_config_set: do not use a state machine
  2018-04-06 21:28     ` Jeff King
@ 2018-04-09  7:50       ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  7:50 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Fri, 6 Apr 2018, Jeff King wrote:

> On Tue, Apr 03, 2018 at 06:28:42PM +0200, Johannes Schindelin wrote:
> 
> > While a neat theoretical construct, state machines are hard to read. In
> > this instance, it does not even make a whole lot of sense because we are
> > more interested in flags, anyway: has the section been seen? Has the key
> > been seen? Does the current section match the key we are looking for?
> > 
> > Besides, the state `SECTION_SEEN` was named in a misleading way: it did
> > not indicate that we saw the section matching the key we are looking
> > for, but it instead indicated that we are *currently* in that section.
> > 
> > Let's just replace the state machine logic by clear and obvious flags.
> > 
> > This will also make it easier to review the upcoming patches to use the
> > newly-introduced `event_fn` callback of the config parser.
> 
> I think this is probably a good direction. But one thing state machines
> can help with is keeping the state to a manageable size. With 3 bits of
> flags, we now have 8 possible states, up from the previous 4.
> 
> Clearly some of those are nonsensical (can you be in key_seen without
> section_seen? I'd think not), but it's up to the code to interpret and
> reset those manually.

That is true. On the other hand, it is easy to miss incorrect state
transitions in state machines (or to miss unused states).

> I'll defer to your judgement, though, on this making things for the
> future patches more readable. You have spent a lot more time poking at it
> than I have.

The original reason to get rid of the state machine was: I did not need
the states any more in the end. Since the section name is set via the
event stream we now know in the config_fn whether we are in the correct
section or not.

I also liked the fact that it was much easier to reason about correct
code: "Did I catch all the states that apply here?" is a hairier question
than "Is this flag true?"

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 00/15] Assorted fixes for `git config` (including the "empty sections" bug)
  2018-04-06 21:33   ` Jeff King
@ 2018-04-09  8:19     ` Johannes Schindelin
  0 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:19 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Hi Peff,

On Fri, 6 Apr 2018, Jeff King wrote:

> On Tue, Apr 03, 2018 at 06:27:55PM +0200, Johannes Schindelin wrote:
> 
> > I am very, very grateful for the time Peff spent on reviewing the
> > previous iteration, and hope that he realizes just how much the
> > elegance of the event-stream-based version is due to his excellent
> > review.
> 
> Unfortunately I ran out of time this week to give this version an
> equally careful review, and I'm about to go on vacation for a few weeks.

No worries, and thank you for your review. I know I am adding more stuff
to review these days than I review other stuff, but I promise that I will
try to get more reviews in once I am done with this patch series (and with
the --rebase-merges one).

> I did give a cursory look over it, and the new maybe_remove_section() is
> much more pleasant. So aside from a few minor nits I pointed out, this
> generally looks good.

Thanks!

> One thing I'd like to have seen is a few more tests covering exotic
> cases that I turned up in my earlier review. Some of the weird multiline
> cases I care less about, but we should probably cover at least:
> 
>   1. Comment behavior when removing a section that isn't at the
>      beginning of the file.
> 
>   2. Removing the final key from a section with a subsection.
> 
> Those should both be natural fallouts of the new method, but it would be
> good to have test coverage.

I added this, in a new commit I call "t1300: add a few more hairy examples
of sections becoming empty".

> Thanks for reworking this, and if it's still not merged when I get back,
> I promise to review it more carefully then. :)

:-)

Have a good vacation!
Dscho

^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH v3 00/15] Assorted fixes for `git config` (including the "empty sections" bug)
  2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
                     ` (15 preceding siblings ...)
  2018-04-06 21:33   ` Jeff King
@ 2018-04-09  8:31   ` Johannes Schindelin
  2018-04-09  8:31     ` [PATCH v3 01/15] git_config_set: fix off-by-two Johannes Schindelin
                       ` (14 more replies)
  16 siblings, 15 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:31 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

This patch series originally only tried to help fix that annoying bug that
has been reported several times over the years, where `git config --unset`
would leave empty sections behind, and `git config --add` would not reuse them.

The first patch is somewhat of a "while at it" bug fix that I first thought
would be a lot more critical than it actually is: It really only affects config
files that start with a section followed immediately (i.e. without a newline)
by a one-letter boolean setting (i.e. without a `= <value>` part). So while it
is a real bug fix, I doubt anybody ever got bitten by it.

The next swath of patches add and fix some tests, while also fixing the bug
where --replace-all would sometimes insert extra line breaks.

Then, I introduce a couple of building blocks: a "config parser event stream",
i.e. an optional callback that can be used to report events such as "comment",
"white-space", etc., together with the corresponding extents in the config file.

Finally, the interesting part, where I do two things, essentially (with
preparatory steps for each thing):

1. I add the ability for `git config --unset/--unset-all` to detect that it
   can remove a section that has just become empty (see below for some more
   discussion of what I consider "become empty"), and

2. I add the ability for `git config [--add] key value` to re-use empty
   sections.

To reiterate: why does this patch series not conflict with my very early
statements that we cannot simply remove empty sections because we may end up
with stale comments?

Well, the patch in question takes pains to determine *whether* there are any
comments surrounding, or included in, the section. If any are found: previous
behavior. Under the assumption that the user edited the file, we keep it as
intact as possible (see below for some argument against this). If no comments
are found, and let's face it, this is probably *the* common case, as few people
edit their config files by hand these days (neither should they because it is
too easy to end up with an unparseable one), the now-empty section *is*
removed.

So what is the argument against this extra care to detect comments? Well, if
you have something like this:

	[section]
		; Here we comment about the variable called snarf
		snarf = froop

and we run `git config --unset section.snarf`, we end up with this config:

	[section]
		; Here we comment about the variable called snarf

which obviously does not make sense. However, that has been established
behavior for quite a few years, and I do not even try to think of a way this
could be solved.
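
The comment check described above can be sketched roughly as follows. This is
a deliberately simplified illustration operating on a bare list of event types,
not the implementation in the patch: it ignores repeated sections of the same
name and the `--unset-all` case, and every `sketch_*`/`SK_*` name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types as a parser might report them, in file order. */
enum sketch_type { SK_SECTION, SK_ENTRY, SK_COMMENT, SK_WHITESPACE };

/*
 * Returns 1 if removing the entry at index `entry` would leave a section
 * that can be dropped as well: no comment before the entry or directly
 * before its section header, and nothing but whitespace after the entry
 * until the next section header (or the end of the file).
 */
static int sketch_can_remove_section(const enum sketch_type *ev, size_t nr,
				     size_t entry)
{
	size_t i;
	int section_seen = 0;

	assert(entry < nr && ev[entry] == SK_ENTRY);

	/* Walk backwards, past our own section header. */
	for (i = entry; i > 0; i--) {
		enum sketch_type t = ev[i - 1];

		if (t == SK_COMMENT)
			return 0; /* a comment that may be about this section */
		if (t == SK_ENTRY) {
			if (!section_seen)
				return 0; /* not the section's first entry */
			break; /* reached the previous section's content */
		}
		if (t == SK_SECTION) {
			if (section_seen)
				break; /* reached the previous header */
			section_seen = 1;
		}
	}
	if (!section_seen)
		return 0;

	/* Walk forwards: only whitespace may follow in this section. */
	for (i = entry + 1; i < nr; i++) {
		if (ev[i] == SK_SECTION)
			break;
		if (ev[i] != SK_WHITESPACE)
			return 0;
	}
	return 1;
}
```

For the `[section]` example above (header, comment, entry), the backwards walk
hits the comment and the section is kept; for a header followed directly by its
only entry, the section is reported as removable.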

Changes since v2:

- removed the `inline` attribute from the `do_event()` function.

- renamed `struct config_set_store` to `struct config_store_data`, to make its
  role more obvious.

- a whole slew of concocted test cases were added to the test to verify that
  a section that becomes empty is removed, based on Peff's analysis at
  https://public-inbox.org/git/20180329213229.GG2939@sigill.intra.peff.net/


Johannes Schindelin (15):
  git_config_set: fix off-by-two
  t1300: rename it to reflect that `repo-config` was deprecated
  t1300: demonstrate that --replace-all can "invent" newlines
  config --replace-all: avoid extra line breaks
  t1300: avoid relying on a bug
  t1300: remove unreasonable expectation from TODO
  t1300: add a few more hairy examples of sections becoming empty
  t1300: `--unset-all` can leave an empty section behind (bug)
  config: introduce an optional event stream while parsing
  config: avoid using the global variable `store`
  config_set_store: rename some fields for consistency
  git_config_set: do not use a state machine
  git_config_set: make use of the config parser's event stream
  git config --unset: remove empty sections (in the common case)
  git_config_set: reuse empty sections

 config.c                                    | 448 ++++++++++++++------
 config.h                                    |  25 ++
 t/{t1300-repo-config.sh => t1300-config.sh} | 102 ++++-
 3 files changed, 439 insertions(+), 136 deletions(-)
 rename t/{t1300-repo-config.sh => t1300-config.sh} (95%)


base-commit: 468165c1d8a442994a825f3684528361727cd8c0
Published-As: https://github.com/dscho/git/releases/tag/empty-config-section-v3
Fetch-It-Via: git fetch https://github.com/dscho/git empty-config-section-v3

Interdiff vs v2:
 diff --git a/config.c b/config.c
 index ee7ea24123d..6155d0651bd 100644
 --- a/config.c
 +++ b/config.c
 @@ -659,8 +659,7 @@ struct parse_event_data {
  	const struct config_options *opts;
  };
  
 -static inline int do_event(enum config_event_t type,
 -			   struct parse_event_data *data)
 +static int do_event(enum config_event_t type, struct parse_event_data *data)
  {
  	size_t offset;
  
 @@ -2297,7 +2296,7 @@ void git_die_config(const char *key, const char *err, ...)
   * Find all the stuff for git_config_set() below.
   */
  
 -struct config_set_store {
 +struct config_store_data {
  	int baselen;
  	char *key;
  	int do_not_match;
 @@ -2313,7 +2312,7 @@ struct config_set_store {
  };
  
  static int matches(const char *key, const char *value,
 -		   const struct config_set_store *store)
 +		   const struct config_store_data *store)
  {
  	if (strcmp(key, store->key))
  		return 0; /* not ours */
 @@ -2329,7 +2328,7 @@ static int matches(const char *key, const char *value,
  static int store_aux_event(enum config_event_t type,
  			   size_t begin, size_t end, void *data)
  {
 -	struct config_set_store *store = data;
 +	struct config_store_data *store = data;
  
  	ALLOC_GROW(store->parsed, store->parsed_nr + 1, store->parsed_alloc);
  	store->parsed[store->parsed_nr].begin = begin;
 @@ -2360,7 +2359,7 @@ static int store_aux_event(enum config_event_t type,
  
  static int store_aux(const char *key, const char *value, void *cb)
  {
 -	struct config_set_store *store = cb;
 +	struct config_store_data *store = cb;
  
  	if (store->key_seen) {
  		if (matches(key, value, store)) {
 @@ -2401,7 +2400,7 @@ static int write_error(const char *filename)
  }
  
  static struct strbuf store_create_section(const char *key,
 -					  const struct config_set_store *store)
 +					  const struct config_store_data *store)
  {
  	const char *dot;
  	int i;
 @@ -2424,7 +2423,7 @@ static struct strbuf store_create_section(const char *key,
  }
  
  static ssize_t write_section(int fd, const char *key,
 -			     const struct config_set_store *store)
 +			     const struct config_store_data *store)
  {
  	struct strbuf sb = store_create_section(key, store);
  	ssize_t ret;
 @@ -2436,7 +2435,7 @@ static ssize_t write_section(int fd, const char *key,
  }
  
  static ssize_t write_pair(int fd, const char *key, const char *value,
 -			  const struct config_set_store *store)
 +			  const struct config_store_data *store)
  {
  	int i;
  	ssize_t ret;
 @@ -2495,7 +2494,7 @@ static ssize_t write_pair(int fd, const char *key, const char *value,
   * array.  * This index may be incremented if a section has more than one
   * entry (which all are to be removed).
   */
 -static void maybe_remove_section(struct config_set_store *store,
 +static void maybe_remove_section(struct config_store_data *store,
  				 const char *contents,
  				 size_t *begin_offset, size_t *end_offset,
  				 int *seen_ptr)
 @@ -2625,7 +2624,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
  	char *filename_buf = NULL;
  	char *contents = NULL;
  	size_t contents_sz;
 -	struct config_set_store store;
 +	struct config_store_data store;
  
  	memset(&store, 0, sizeof(store));
  
 @@ -2969,7 +2968,7 @@ static int git_config_copy_or_rename_section_in_file(const char *config_filename
  	FILE *config_file = NULL;
  	struct stat st;
  	struct strbuf copystr = STRBUF_INIT;
 -	struct config_set_store store;
 +	struct config_store_data store;
  
  	memset(&store, 0, sizeof(store));
  
 diff --git a/t/t1300-config.sh b/t/t1300-config.sh
 index 6d0e13020d1..eef0bbe4f9f 100755
 --- a/t/t1300-config.sh
 +++ b/t/t1300-config.sh
 @@ -1449,7 +1449,50 @@ test_expect_success '--unset last key removes section (except if commented)' '
  	EOF
  
  	git config --unset section.key &&
 -	test_cmp expect .git/config
 +	test_cmp expect .git/config &&
 +
 +	q_to_tab >.git/config <<-\EOF &&
 +	[one]
 +	Qkey = "multiline \
 +	QQ# with comment"
 +	[two]
 +	key = true
 +	EOF
 +	git config --unset two.key &&
 +	! grep two .git/config &&
 +
 +	q_to_tab >.git/config <<-\EOF &&
 +	[one]
 +	Qkey = "multiline \
 +	QQ# with comment"
 +	[one]
 +	key = true
 +	EOF
 +	git config --unset-all one.key &&
 +	test_line_count = 0 .git/config &&
 +
 +	q_to_tab >.git/config <<-\EOF &&
 +	[one]
 +	Qkey = true
 +	Q# a comment not at the start
 +	[two]
 +	Qkey = true
 +	EOF
 +	git config --unset two.key &&
 +	grep two .git/config &&
 +
 +	q_to_tab >.git/config <<-\EOF &&
 +	[one]
 +	Qkey = not [two "subsection"]
 +	[two "subsection"]
 +	[two "subsection"]
 +	Qkey = true
 +	[TWO "subsection"]
 +	[one]
 +	EOF
 +	git config --unset two.subsection.key &&
 +	test "not [two subsection]" = "$(git config one.key)" &&
 +	test_line_count = 3 .git/config
  '
  
  test_expect_success '--unset-all removes section if empty & uncommented' '
-- 
2.17.0.windows.1.4.g7e4058d72e3


^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH v3 01/15] git_config_set: fix off-by-two
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
@ 2018-04-09  8:31     ` Johannes Schindelin
  2018-04-09  8:31     ` [PATCH v3 02/15] t1300: rename it to reflect that `repo-config` was deprecated Johannes Schindelin
                       ` (13 subsequent siblings)
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:31 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Currently, we are slightly overzealous when removing an entry from a
config file of this form:

	[abc]a
	[xyz]
		key = value

When calling `git config --unset abc.a` on this file, it leaves this
(invalid) config behind:

	[
	[xyz]
		key = value

The reason is that we try to search for the beginning of the line (or
for the end of the preceding section header on the same line) that
defines abc.a, but as an optimization, we subtract 2 from the offset
pointing just after the definition before we call
find_beginning_of_line(). That function, however, *also* performs that
optimization and promptly fails to find the section header correctly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
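The failure mode described above can be reproduced with a small model
of the backwards scan. This is only a sketch under assumptions: it is
not the code in config.c, the names `scan_back` and `kept_prefix` are
made up for illustration, and continuation lines are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified, hypothetical model of the backwards scan; it is not the
 * code in config.c and ignores continuation lines. `end` is the offset
 * just after the entry and is assumed to be at least 2.
 */
static long scan_back(const char *buf, size_t size, size_t end,
		      int *found_bracket)
{
	size_t equal_at = size, bracket_at = size;
	long pos;

	/* the scan itself already starts two bytes before `end` */
	for (pos = (long)end - 2; pos > 0 && buf[pos] != '\n'; pos--) {
		if (buf[pos] == '=')
			equal_at = (size_t)pos;
		else if (buf[pos] == ']')
			bracket_at = (size_t)pos;
	}
	if (bracket_at < equal_at) {
		/* the entry shares its line with a section header */
		*found_bracket = 1;
		return (long)bracket_at + 1;
	}
	return pos + 1;
}

/* Returns how many bytes of `buf` would be kept in front of the entry. */
static long kept_prefix(const char *buf, size_t end, int caller_subtracts_two)
{
	int found_bracket = 0;

	if (caller_subtracts_two)
		end -= 2;	/* models the pre-patch call site */
	return scan_back(buf, strlen(buf), end, &found_bracket);
}
```

For the example file, the `abc.a` entry ends at offset 7. With the
extra subtraction the scan starts at offset 3 and never visits the `]`
at offset 4, so only the leading `[` is kept. Without it, the scan
starts at offset 5, sees the `]`, and `[abc]` is kept.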
 config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config.c b/config.c
index b0c20e6cb8a..5cc049aaef0 100644
--- a/config.c
+++ b/config.c
@@ -2632,7 +2632,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			} else
 				copy_end = find_beginning_of_line(
 					contents, contents_sz,
-					store.offset[i]-2, &new_line);
+					store.offset[i], &new_line);
 
 			if (copy_end > 0 && contents[copy_end-1] != '\n')
 				new_line = 1;
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 02/15] t1300: rename it to reflect that `repo-config` was deprecated
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
  2018-04-09  8:31     ` [PATCH v3 01/15] git_config_set: fix off-by-two Johannes Schindelin
@ 2018-04-09  8:31     ` Johannes Schindelin
  2018-04-09  8:31     ` [PATCH v3 03/15] t1300: demonstrate that --replace-all can "invent" newlines Johannes Schindelin
                       ` (12 subsequent siblings)
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:31 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/{t1300-repo-config.sh => t1300-config.sh} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename t/{t1300-repo-config.sh => t1300-config.sh} (100%)

diff --git a/t/t1300-repo-config.sh b/t/t1300-config.sh
similarity index 100%
rename from t/t1300-repo-config.sh
rename to t/t1300-config.sh
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 03/15] t1300: demonstrate that --replace-all can "invent" newlines
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
  2018-04-09  8:31     ` [PATCH v3 01/15] git_config_set: fix off-by-two Johannes Schindelin
  2018-04-09  8:31     ` [PATCH v3 02/15] t1300: rename it to reflect that `repo-config` was deprecated Johannes Schindelin
@ 2018-04-09  8:31     ` Johannes Schindelin
  2018-04-09  8:31     ` [PATCH v3 04/15] config --replace-all: avoid extra line breaks Johannes Schindelin
                       ` (11 subsequent siblings)
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:31 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1300-config.sh | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 4f8e6f5fde3..cc417687e8d 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1611,4 +1611,25 @@ test_expect_success '--local requires a repo' '
 	test_expect_code 128 nongit git config --local foo.bar
 '
 
+test_expect_failure '--replace-all does not invent newlines' '
+	q_to_tab >.git/config <<-\EOF &&
+	[abc]key
+	QkeepSection
+	[xyz]
+	Qkey = 1
+	[abc]
+	Qkey = a
+	EOF
+	q_to_tab >expect <<-\EOF &&
+	[abc]
+	QkeepSection
+	[xyz]
+	Qkey = 1
+	[abc]
+	Qkey = b
+	EOF
+	git config --replace-all abc.key b &&
+	test_cmp .git/config expect
+'
+
 test_done
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 04/15] config --replace-all: avoid extra line breaks
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
                       ` (2 preceding siblings ...)
  2018-04-09  8:31     ` [PATCH v3 03/15] t1300: demonstrate that --replace-all can "invent" newlines Johannes Schindelin
@ 2018-04-09  8:31     ` Johannes Schindelin
  2018-04-09  8:31     ` [PATCH v3 05/15] t1300: avoid relying on a bug Johannes Schindelin
                       ` (10 subsequent siblings)
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:31 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

When replacing multiple config entries at once, we did not reset the
flag that indicates whether we need to insert a newline before the new
entry. As a consequence, an extra newline was inserted under certain
circumstances.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
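The effect of the stale flag can be illustrated with a simplified
model of the copy loop. This is a sketch, not the code in config.c:
`count_line_breaks` and `sample_needs_newline` are hypothetical names,
and the per-entry "needs a line break" decision is reduced to a
precomputed array.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical input: the first replaced entry needs a line break in
 * front of what is written next (think "[abc]key"), the second one
 * does not (it sits on a line of its own).
 */
static const int sample_needs_newline[] = { 1, 0 };

/*
 * Simplified model of the loop over all replaced entries. Returns how
 * many line breaks would be written in total.
 */
static size_t count_line_breaks(const int *needs_newline, size_t nr,
				int reset_each_round)
{
	size_t i, emitted = 0;
	int new_line = 0;

	for (i = 0; i < nr; i++) {
		if (reset_each_round)
			new_line = 0;	/* models the one-line fix */
		if (needs_newline[i])
			new_line = 1;	/* the flag is only ever set here */
		if (new_line)
			emitted++;
	}
	return emitted;
}
```

Without the reset, the flag set for the first entry leaks into the
second round and a second line break is written. With the reset, only
the first entry gets one.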
 config.c          | 1 +
 t/t1300-config.sh | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/config.c b/config.c
index 5cc049aaef0..f10f8c6f52f 100644
--- a/config.c
+++ b/config.c
@@ -2625,6 +2625,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			store.seen = 1;
 
 		for (i = 0, copy_begin = 0; i < store.seen; i++) {
+			new_line = 0;
 			if (store.offset[i] == 0) {
 				store.offset[i] = copy_end = contents_sz;
 			} else if (store.state != KEY_SEEN) {
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index cc417687e8d..aed12be492f 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1611,7 +1611,7 @@ test_expect_success '--local requires a repo' '
 	test_expect_code 128 nongit git config --local foo.bar
 '
 
-test_expect_failure '--replace-all does not invent newlines' '
+test_expect_success '--replace-all does not invent newlines' '
 	q_to_tab >.git/config <<-\EOF &&
 	[abc]key
 	QkeepSection
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 05/15] t1300: avoid relying on a bug
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
                       ` (3 preceding siblings ...)
  2018-04-09  8:31     ` [PATCH v3 04/15] config --replace-all: avoid extra line breaks Johannes Schindelin
@ 2018-04-09  8:31     ` Johannes Schindelin
  2018-04-09  8:31     ` [PATCH v3 06/15] t1300: remove unreasonable expectation from TODO Johannes Schindelin
                       ` (9 subsequent siblings)
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:31 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

The test case 'unset with cont. lines' relied on a bug that is about to
be fixed: it tests *explicitly* that removing the last entry from a
config section leaves an *empty* section behind.

Let's fix this test case not to rely on that behavior, simply by
preventing the section from becoming empty.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1300-config.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index aed12be492f..7c0ee208dea 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -108,6 +108,7 @@ bar = foo
 [beta]
 baz = multiple \
 lines
+foo = bar
 EOF
 
 test_expect_success 'unset with cont. lines' '
@@ -118,6 +119,7 @@ cat > expect <<\EOF
 [alpha]
 bar = foo
 [beta]
+foo = bar
 EOF
 
 test_expect_success 'unset with cont. lines is correct' 'test_cmp expect .git/config'
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 06/15] t1300: remove unreasonable expectation from TODO
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
                       ` (4 preceding siblings ...)
  2018-04-09  8:31     ` [PATCH v3 05/15] t1300: avoid relying on a bug Johannes Schindelin
@ 2018-04-09  8:31     ` Johannes Schindelin
  2018-04-09  8:31     ` [PATCH v3 07/15] t1300: add a few more hairy examples of sections becoming empty Johannes Schindelin
                       ` (8 subsequent siblings)
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:31 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

In https://public-inbox.org/git/7vvc8alzat.fsf@alter.siamese.dyndns.org/
a reasonable patch was made quite a bit less so by changing a test case
demonstrating a bug to a test case that demonstrates that we ask for too
much: the test case 'unsetting the last key in a section removes header'
now expects a future bug fix to be able to determine whether a free-form
comment above a section header refers to said section or not.

Rather than shooting for the stars (and not even getting off the
ground), let's start shooting for something obtainable and be reasonably
confident that we *can* get it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1300-config.sh | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 7c0ee208dea..187fc5b195f 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1413,7 +1413,7 @@ test_expect_success 'urlmatch with wildcard' '
 '
 
 # good section hygiene
-test_expect_failure 'unsetting the last key in a section removes header' '
+test_expect_failure '--unset last key removes section (except if commented)' '
 	cat >.git/config <<-\EOF &&
 	# some generic comment on the configuration file itself
 	# a comment specific to this "section" section.
@@ -1427,6 +1427,25 @@ test_expect_failure 'unsetting the last key in a section removes header' '
 
 	cat >expect <<-\EOF &&
 	# some generic comment on the configuration file itself
+	# a comment specific to this "section" section.
+	[section]
+	# some intervening lines
+	# that should also be dropped
+
+	# please be careful when you update the above variable
+	EOF
+
+	git config --unset section.key &&
+	test_cmp expect .git/config &&
+
+	cat >.git/config <<-\EOF &&
+	[section]
+	key = value
+	[next-section]
+	EOF
+
+	cat >expect <<-\EOF &&
+	[next-section]
 	EOF
 
 	git config --unset section.key &&
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 07/15] t1300: add a few more hairy examples of sections becoming empty
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
                       ` (5 preceding siblings ...)
  2018-04-09  8:31     ` [PATCH v3 06/15] t1300: remove unreasonable expectation from TODO Johannes Schindelin
@ 2018-04-09  8:31     ` Johannes Schindelin
  2018-04-09  8:32     ` [PATCH v3 08/15] t1300: `--unset-all` can leave an empty section behind (bug) Johannes Schindelin
                       ` (7 subsequent siblings)
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:31 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

During the review of the first iteration of the patch series to remove
sections that become empty upon --unset or --unset-all, Jeff King
identified a couple of problematic cases with the backtracking approach
that was still used then to "look backwards for the section header":
https://public-inbox.org/git/20180329213229.GG2939@sigill.intra.peff.net/

This patch adds a couple of concocted examples designed to fool a
backtracking parser.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1300-config.sh | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 187fc5b195f..bc30cfb3468 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1449,7 +1449,50 @@ test_expect_failure '--unset last key removes section (except if commented)' '
 	EOF
 
 	git config --unset section.key &&
-	test_cmp expect .git/config
+	test_cmp expect .git/config &&
+
+	q_to_tab >.git/config <<-\EOF &&
+	[one]
+	Qkey = "multiline \
+	QQ# with comment"
+	[two]
+	key = true
+	EOF
+	git config --unset two.key &&
+	! grep two .git/config &&
+
+	q_to_tab >.git/config <<-\EOF &&
+	[one]
+	Qkey = "multiline \
+	QQ# with comment"
+	[one]
+	key = true
+	EOF
+	git config --unset-all one.key &&
+	test_line_count = 0 .git/config &&
+
+	q_to_tab >.git/config <<-\EOF &&
+	[one]
+	Qkey = true
+	Q# a comment not at the start
+	[two]
+	Qkey = true
+	EOF
+	git config --unset two.key &&
+	grep two .git/config &&
+
+	q_to_tab >.git/config <<-\EOF &&
+	[one]
+	Qkey = not [two "subsection"]
+	[two "subsection"]
+	[two "subsection"]
+	Qkey = true
+	[TWO "subsection"]
+	[one]
+	EOF
+	git config --unset two.subsection.key &&
+	test "not [two subsection]" = "$(git config one.key)" &&
+	test_line_count = 3 .git/config
 '
 
 test_expect_failure 'adding a key into an empty section reuses header' '
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 08/15] t1300: `--unset-all` can leave an empty section behind (bug)
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
                       ` (6 preceding siblings ...)
  2018-04-09  8:31     ` [PATCH v3 07/15] t1300: add a few more hairy examples of sections becoming empty Johannes Schindelin
@ 2018-04-09  8:32     ` Johannes Schindelin
  2018-04-09  8:32     ` [PATCH v3 09/15] config: introduce an optional event stream while parsing Johannes Schindelin
                       ` (6 subsequent siblings)
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

We already have a test demonstrating that removing the last entry from a
config section fails to remove the section header of the now-empty
section.

The same can happen, of course, if we remove the last entries in one fell
swoop. This is *also* a bug, and should be fixed at the same time.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1300-config.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index bc30cfb3468..9d23a8ca972 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1495,6 +1495,17 @@ test_expect_failure '--unset last key removes section (except if commented)' '
 	test_line_count = 3 .git/config
 '
 
+test_expect_failure '--unset-all removes section if empty & uncommented' '
+	cat >.git/config <<-\EOF &&
+	[section]
+	key = value1
+	key = value2
+	EOF
+
+	git config --unset-all section.key &&
+	test_line_count = 0 .git/config
+'
+
 test_expect_failure 'adding a key into an empty section reuses header' '
 	cat >.git/config <<-\EOF &&
 	[section]
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 09/15] config: introduce an optional event stream while parsing
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
                       ` (7 preceding siblings ...)
  2018-04-09  8:32     ` [PATCH v3 08/15] t1300: `--unset-all` can leave an empty section behind (bug) Johannes Schindelin
@ 2018-04-09  8:32     ` Johannes Schindelin
  2018-04-09  8:32     ` [PATCH v3 10/15] config: avoid using the global variable `store` Johannes Schindelin
                       ` (5 subsequent siblings)
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

This extends our config parser so that it can optionally produce an event
stream via callback function, where it reports e.g. when a comment was
parsed, or a section header, etc.

The extended parser will be used subsequently to better handle the
scenarios where removing config entries would leave sections empty, or
where a new entry could be added to an already-existing, empty section.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
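A consumer of the event stream might look roughly like the sketch
below. The enum and the callback typedef are copied from the config.h
hunk of this patch so that the example compiles on its own.
`struct event_tally`, `tally_event` and `feed_sample_events` are
hypothetical names, and the offsets fed in are made up rather than
produced by the real parser.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the config.h hunk of this patch. */
enum config_event_t {
	CONFIG_EVENT_SECTION,
	CONFIG_EVENT_ENTRY,
	CONFIG_EVENT_WHITESPACE,
	CONFIG_EVENT_COMMENT,
	CONFIG_EVENT_EOF,
	CONFIG_EVENT_ERROR
};

typedef int (*config_parser_event_fn_t)(enum config_event_t type,
					size_t begin_offset, size_t end_offset,
					void *event_fn_data);

/* Hypothetical consumer state; not part of the patch. */
struct event_tally {
	size_t sections, entries, comments;
	size_t bytes_covered;
};

/* Matches config_parser_event_fn_t; a negative return aborts parsing. */
static int tally_event(enum config_event_t type, size_t begin_offset,
		       size_t end_offset, void *event_fn_data)
{
	struct event_tally *tally = event_fn_data;

	if (end_offset < begin_offset)
		return -1;
	switch (type) {
	case CONFIG_EVENT_SECTION:
		tally->sections++;
		break;
	case CONFIG_EVENT_ENTRY:
		tally->entries++;
		break;
	case CONFIG_EVENT_COMMENT:
		tally->comments++;
		break;
	default:
		break;
	}
	tally->bytes_covered += end_offset - begin_offset;
	return 0;
}

/* Hand-written events with made-up offsets, standing in for the parser. */
static int feed_sample_events(config_parser_event_fn_t fn, void *data)
{
	if (fn(CONFIG_EVENT_SECTION, 0, 10, data) < 0 ||
	    fn(CONFIG_EVENT_ENTRY, 10, 22, data) < 0 ||
	    fn(CONFIG_EVENT_COMMENT, 22, 40, data) < 0)
		return -1;
	return 0;
}
```

In git itself, such a callback would be hooked up through the
`event_fn` and `event_fn_data` members of `struct config_options` and
handed to git_config_from_file_with_options(), both of which are
introduced by this patch.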
 config.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 config.h |  25 ++++++++++++++
 2 files changed, 114 insertions(+), 12 deletions(-)

diff --git a/config.c b/config.c
index f10f8c6f52f..03d8e7709fe 100644
--- a/config.c
+++ b/config.c
@@ -653,7 +653,45 @@ static int get_base_var(struct strbuf *name)
 	}
 }
 
-static int git_parse_source(config_fn_t fn, void *data)
+struct parse_event_data {
+	enum config_event_t previous_type;
+	size_t previous_offset;
+	const struct config_options *opts;
+};
+
+static int do_event(enum config_event_t type, struct parse_event_data *data)
+{
+	size_t offset;
+
+	if (!data->opts || !data->opts->event_fn)
+		return 0;
+
+	if (type == CONFIG_EVENT_WHITESPACE &&
+	    data->previous_type == type)
+		return 0;
+
+	offset = cf->do_ftell(cf);
+	/*
+	 * At EOF, the parser always "inserts" an extra '\n', therefore
+	 * the end offset of the event is the current file position, otherwise
+	 * we will already have advanced to the next event.
+	 */
+	if (type != CONFIG_EVENT_EOF)
+		offset--;
+
+	if (data->previous_type != CONFIG_EVENT_EOF &&
+	    data->opts->event_fn(data->previous_type, data->previous_offset,
+				 offset, data->opts->event_fn_data) < 0)
+		return -1;
+
+	data->previous_type = type;
+	data->previous_offset = offset;
+
+	return 0;
+}
+
+static int git_parse_source(config_fn_t fn, void *data,
+			    const struct config_options *opts)
 {
 	int comment = 0;
 	int baselen = 0;
@@ -664,8 +702,15 @@ static int git_parse_source(config_fn_t fn, void *data)
 	/* U+FEFF Byte Order Mark in UTF8 */
 	const char *bomptr = utf8_bom;
 
+	/* For the parser event callback */
+	struct parse_event_data event_data = {
+		CONFIG_EVENT_EOF, 0, opts
+	};
+
 	for (;;) {
-		int c = get_next_char();
+		int c;
+
+		c = get_next_char();
 		if (bomptr && *bomptr) {
 			/* We are at the file beginning; skip UTF8-encoded BOM
 			 * if present. Sane editors won't put this in on their
@@ -682,18 +727,33 @@ static int git_parse_source(config_fn_t fn, void *data)
 			}
 		}
 		if (c == '\n') {
-			if (cf->eof)
+			if (cf->eof) {
+				if (do_event(CONFIG_EVENT_EOF, &event_data) < 0)
+					return -1;
 				return 0;
+			}
+			if (do_event(CONFIG_EVENT_WHITESPACE, &event_data) < 0)
+				return -1;
 			comment = 0;
 			continue;
 		}
-		if (comment || isspace(c))
+		if (comment)
 			continue;
+		if (isspace(c)) {
+			if (do_event(CONFIG_EVENT_WHITESPACE, &event_data) < 0)
+					return -1;
+			continue;
+		}
 		if (c == '#' || c == ';') {
+			if (do_event(CONFIG_EVENT_COMMENT, &event_data) < 0)
+					return -1;
 			comment = 1;
 			continue;
 		}
 		if (c == '[') {
+			if (do_event(CONFIG_EVENT_SECTION, &event_data) < 0)
+					return -1;
+
 			/* Reset prior to determining a new stem */
 			strbuf_reset(var);
 			if (get_base_var(var) < 0 || var->len < 1)
@@ -704,6 +764,10 @@ static int git_parse_source(config_fn_t fn, void *data)
 		}
 		if (!isalpha(c))
 			break;
+
+		if (do_event(CONFIG_EVENT_ENTRY, &event_data) < 0)
+			return -1;
+
 		/*
 		 * Truncate the var name back to the section header
 		 * stem prior to grabbing the suffix part of the name
@@ -715,6 +779,9 @@ static int git_parse_source(config_fn_t fn, void *data)
 			break;
 	}
 
+	if (do_event(CONFIG_EVENT_ERROR, &event_data) < 0)
+		return -1;
+
 	switch (cf->origin_type) {
 	case CONFIG_ORIGIN_BLOB:
 		error_msg = xstrfmt(_("bad config line %d in blob %s"),
@@ -1398,7 +1465,8 @@ int git_default_config(const char *var, const char *value, void *dummy)
  * fgetc, ungetc, ftell of top need to be initialized before calling
  * this function.
  */
-static int do_config_from(struct config_source *top, config_fn_t fn, void *data)
+static int do_config_from(struct config_source *top, config_fn_t fn, void *data,
+			  const struct config_options *opts)
 {
 	int ret;
 
@@ -1410,7 +1478,7 @@ static int do_config_from(struct config_source *top, config_fn_t fn, void *data)
 	strbuf_init(&top->var, 1024);
 	cf = top;
 
-	ret = git_parse_source(fn, data);
+	ret = git_parse_source(fn, data, opts);
 
 	/* pop config-file parsing state stack */
 	strbuf_release(&top->value);
@@ -1423,7 +1491,7 @@ static int do_config_from(struct config_source *top, config_fn_t fn, void *data)
 static int do_config_from_file(config_fn_t fn,
 		const enum config_origin_type origin_type,
 		const char *name, const char *path, FILE *f,
-		void *data)
+		void *data, const struct config_options *opts)
 {
 	struct config_source top;
 
@@ -1436,15 +1504,18 @@ static int do_config_from_file(config_fn_t fn,
 	top.do_ungetc = config_file_ungetc;
 	top.do_ftell = config_file_ftell;
 
-	return do_config_from(&top, fn, data);
+	return do_config_from(&top, fn, data, opts);
 }
 
 static int git_config_from_stdin(config_fn_t fn, void *data)
 {
-	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin, data);
+	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
+				   data, NULL);
 }
 
-int git_config_from_file(config_fn_t fn, const char *filename, void *data)
+int git_config_from_file_with_options(config_fn_t fn, const char *filename,
+				      void *data,
+				      const struct config_options *opts)
 {
 	int ret = -1;
 	FILE *f;
@@ -1452,13 +1523,19 @@ int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 	f = fopen_or_warn(filename, "r");
 	if (f) {
 		flockfile(f);
-		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, filename, f, data);
+		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename,
+					  filename, f, data, opts);
 		funlockfile(f);
 		fclose(f);
 	}
 	return ret;
 }
 
+int git_config_from_file(config_fn_t fn, const char *filename, void *data)
+{
+	return git_config_from_file_with_options(fn, filename, data, NULL);
+}
+
 int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_type,
 			const char *name, const char *buf, size_t len, void *data)
 {
@@ -1475,7 +1552,7 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
 	top.do_ungetc = config_buf_ungetc;
 	top.do_ftell = config_buf_ftell;
 
-	return do_config_from(&top, fn, data);
+	return do_config_from(&top, fn, data, NULL);
 }
 
 int git_config_from_blob_oid(config_fn_t fn,
diff --git a/config.h b/config.h
index ef70a9cac1e..5a2394daae2 100644
--- a/config.h
+++ b/config.h
@@ -28,15 +28,40 @@ enum config_origin_type {
 	CONFIG_ORIGIN_CMDLINE
 };
 
+enum config_event_t {
+	CONFIG_EVENT_SECTION,
+	CONFIG_EVENT_ENTRY,
+	CONFIG_EVENT_WHITESPACE,
+	CONFIG_EVENT_COMMENT,
+	CONFIG_EVENT_EOF,
+	CONFIG_EVENT_ERROR
+};
+
+/*
+ * The parser event function (if not NULL) is called with the event type and
+ * the begin/end offsets of the parsed elements.
+ *
+ * Note: for CONFIG_EVENT_ENTRY (i.e. config variables), the trailing newline
+ * character is considered part of the element.
+ */
+typedef int (*config_parser_event_fn_t)(enum config_event_t type,
+					size_t begin_offset, size_t end_offset,
+					void *event_fn_data);
+
 struct config_options {
 	unsigned int respect_includes : 1;
 	const char *commondir;
 	const char *git_dir;
+	config_parser_event_fn_t event_fn;
+	void *event_fn_data;
 };
 
 typedef int (*config_fn_t)(const char *, const char *, void *);
 extern int git_default_config(const char *, const char *, void *);
 extern int git_config_from_file(config_fn_t fn, const char *, void *);
+extern int git_config_from_file_with_options(config_fn_t fn, const char *,
+					     void *,
+					     const struct config_options *);
 extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
 					const char *name, const char *buf, size_t len, void *data);
 extern int git_config_from_blob_oid(config_fn_t fn, const char *name,
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 10/15] config: avoid using the global variable `store`
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
                       ` (8 preceding siblings ...)
  2018-04-09  8:32     ` [PATCH v3 09/15] config: introduce an optional event stream while parsing Johannes Schindelin
@ 2018-04-09  8:32     ` Johannes Schindelin
  2018-04-09  8:32     ` [PATCH v3 11/15] config_set_store: rename some fields for consistency Johannes Schindelin
                       ` (4 subsequent siblings)
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

The config code that sets/unsets variables or removes/renames sections
is much easier to reason about when it does not rely on a global (or
file-local) variable.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
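The general pattern applied here, passing per-call state through the
callback's `void *` parameter instead of a file-local variable, can be
sketched as follows. The `config_fn_t` typedef mirrors the one in
config.h. `struct key_counter`, `count_key` and `for_each_sample_pair`
are hypothetical and only stand in for the real store/parser code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as config_fn_t in config.h. */
typedef int (*config_fn_t)(const char *, const char *, void *);

/* Hypothetical per-call state, replacing what used to be a global. */
struct key_counter {
	const char *key;
	unsigned int seen;
};

static int count_key(const char *key, const char *value, void *cb)
{
	struct key_counter *counter = cb;

	(void)value;	/* only the key matters in this sketch */
	if (!strcmp(key, counter->key))
		counter->seen++;
	return 0;
}

/* Hypothetical stand-in for the parser: reports three fixed pairs. */
static int for_each_sample_pair(config_fn_t fn, void *data)
{
	static const char *pairs[][2] = {
		{ "section.key", "value1" },
		{ "other.key", "true" },
		{ "section.key", "value2" },
	};
	size_t i;

	for (i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++)
		if (fn(pairs[i][0], pairs[i][1], data) < 0)
			return -1;
	return 0;
}
```

Because each caller owns its own `struct key_counter`, two independent
passes cannot clobber each other's state, which is what a shared
file-local variable cannot guarantee.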
 config.c | 119 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 66 insertions(+), 53 deletions(-)

diff --git a/config.c b/config.c
index 03d8e7709fe..0c0a965267d 100644
--- a/config.c
+++ b/config.c
@@ -2296,7 +2296,7 @@ void git_die_config(const char *key, const char *err, ...)
  * Find all the stuff for git_config_set() below.
  */
 
-static struct {
+struct config_store_data {
 	int baselen;
 	char *key;
 	int do_not_match;
@@ -2306,56 +2306,58 @@ static struct {
 	unsigned int offset_alloc;
 	enum { START, SECTION_SEEN, SECTION_END_SEEN, KEY_SEEN } state;
 	unsigned int seen;
-} store;
+};
 
-static int matches(const char *key, const char *value)
+static int matches(const char *key, const char *value,
+		   const struct config_store_data *store)
 {
-	if (strcmp(key, store.key))
+	if (strcmp(key, store->key))
 		return 0; /* not ours */
-	if (!store.value_regex)
+	if (!store->value_regex)
 		return 1; /* always matches */
-	if (store.value_regex == CONFIG_REGEX_NONE)
+	if (store->value_regex == CONFIG_REGEX_NONE)
 		return 0; /* never matches */
 
-	return store.do_not_match ^
-		(value && !regexec(store.value_regex, value, 0, NULL, 0));
+	return store->do_not_match ^
+		(value && !regexec(store->value_regex, value, 0, NULL, 0));
 }
 
 static int store_aux(const char *key, const char *value, void *cb)
 {
 	const char *ep;
 	size_t section_len;
+	struct config_store_data *store = cb;
 
-	switch (store.state) {
+	switch (store->state) {
 	case KEY_SEEN:
-		if (matches(key, value)) {
-			if (store.seen == 1 && store.multi_replace == 0) {
+		if (matches(key, value, store)) {
+			if (store->seen == 1 && store->multi_replace == 0) {
 				warning(_("%s has multiple values"), key);
 			}
 
-			ALLOC_GROW(store.offset, store.seen + 1,
-				   store.offset_alloc);
+			ALLOC_GROW(store->offset, store->seen + 1,
+				   store->offset_alloc);
 
-			store.offset[store.seen] = cf->do_ftell(cf);
-			store.seen++;
+			store->offset[store->seen] = cf->do_ftell(cf);
+			store->seen++;
 		}
 		break;
 	case SECTION_SEEN:
 		/*
-		 * What we are looking for is in store.key (both
+		 * What we are looking for is in store->key (both
 		 * section and var), and its section part is baselen
 		 * long.  We found key (again, both section and var).
 		 * We would want to know if this key is in the same
 		 * section as what we are looking for.  We already
 		 * know we are in the same section as what should
-		 * hold store.key.
+		 * hold store->key.
 		 */
 		ep = strrchr(key, '.');
 		section_len = ep - key;
 
-		if ((section_len != store.baselen) ||
-		    memcmp(key, store.key, section_len+1)) {
-			store.state = SECTION_END_SEEN;
+		if ((section_len != store->baselen) ||
+		    memcmp(key, store->key, section_len+1)) {
+			store->state = SECTION_END_SEEN;
 			break;
 		}
 
@@ -2363,26 +2365,27 @@ static int store_aux(const char *key, const char *value, void *cb)
 		 * Do not increment matches: this is no match, but we
 		 * just made sure we are in the desired section.
 		 */
-		ALLOC_GROW(store.offset, store.seen + 1,
-			   store.offset_alloc);
-		store.offset[store.seen] = cf->do_ftell(cf);
+		ALLOC_GROW(store->offset, store->seen + 1,
+			   store->offset_alloc);
+		store->offset[store->seen] = cf->do_ftell(cf);
 		/* fallthru */
 	case SECTION_END_SEEN:
 	case START:
-		if (matches(key, value)) {
-			ALLOC_GROW(store.offset, store.seen + 1,
-				   store.offset_alloc);
-			store.offset[store.seen] = cf->do_ftell(cf);
-			store.state = KEY_SEEN;
-			store.seen++;
+		if (matches(key, value, store)) {
+			ALLOC_GROW(store->offset, store->seen + 1,
+				   store->offset_alloc);
+			store->offset[store->seen] = cf->do_ftell(cf);
+			store->state = KEY_SEEN;
+			store->seen++;
 		} else {
-			if (strrchr(key, '.') - key == store.baselen &&
-			      !strncmp(key, store.key, store.baselen)) {
-					store.state = SECTION_SEEN;
-					ALLOC_GROW(store.offset,
-						   store.seen + 1,
-						   store.offset_alloc);
-					store.offset[store.seen] = cf->do_ftell(cf);
+			if (strrchr(key, '.') - key == store->baselen &&
+			      !strncmp(key, store->key, store->baselen)) {
+					store->state = SECTION_SEEN;
+					ALLOC_GROW(store->offset,
+						   store->seen + 1,
+						   store->offset_alloc);
+					store->offset[store->seen] =
+						cf->do_ftell(cf);
 			}
 		}
 	}
@@ -2397,31 +2400,33 @@ static int write_error(const char *filename)
 	return 4;
 }
 
-static struct strbuf store_create_section(const char *key)
+static struct strbuf store_create_section(const char *key,
+					  const struct config_store_data *store)
 {
 	const char *dot;
 	int i;
 	struct strbuf sb = STRBUF_INIT;
 
-	dot = memchr(key, '.', store.baselen);
+	dot = memchr(key, '.', store->baselen);
 	if (dot) {
 		strbuf_addf(&sb, "[%.*s \"", (int)(dot - key), key);
-		for (i = dot - key + 1; i < store.baselen; i++) {
+		for (i = dot - key + 1; i < store->baselen; i++) {
 			if (key[i] == '"' || key[i] == '\\')
 				strbuf_addch(&sb, '\\');
 			strbuf_addch(&sb, key[i]);
 		}
 		strbuf_addstr(&sb, "\"]\n");
 	} else {
-		strbuf_addf(&sb, "[%.*s]\n", store.baselen, key);
+		strbuf_addf(&sb, "[%.*s]\n", store->baselen, key);
 	}
 
 	return sb;
 }
 
-static ssize_t write_section(int fd, const char *key)
+static ssize_t write_section(int fd, const char *key,
+			     const struct config_store_data *store)
 {
-	struct strbuf sb = store_create_section(key);
+	struct strbuf sb = store_create_section(key, store);
 	ssize_t ret;
 
 	ret = write_in_full(fd, sb.buf, sb.len);
@@ -2430,11 +2435,12 @@ static ssize_t write_section(int fd, const char *key)
 	return ret;
 }
 
-static ssize_t write_pair(int fd, const char *key, const char *value)
+static ssize_t write_pair(int fd, const char *key, const char *value,
+			  const struct config_store_data *store)
 {
 	int i;
 	ssize_t ret;
-	int length = strlen(key + store.baselen + 1);
+	int length = strlen(key + store->baselen + 1);
 	const char *quote = "";
 	struct strbuf sb = STRBUF_INIT;
 
@@ -2454,7 +2460,7 @@ static ssize_t write_pair(int fd, const char *key, const char *value)
 		quote = "\"";
 
 	strbuf_addf(&sb, "\t%.*s = %s",
-		    length, key + store.baselen + 1, quote);
+		    length, key + store->baselen + 1, quote);
 
 	for (i = 0; value[i]; i++)
 		switch (value[i]) {
@@ -2564,6 +2570,9 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 	char *filename_buf = NULL;
 	char *contents = NULL;
 	size_t contents_sz;
+	struct config_store_data store;
+
+	memset(&store, 0, sizeof(store));
 
 	/* parse-key returns negative; flip the sign to feed exit(3) */
 	ret = 0 - git_config_parse_key(key, &store.key, &store.baselen);
@@ -2606,8 +2615,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		}
 
 		store.key = (char *)key;
-		if (write_section(fd, key) < 0 ||
-		    write_pair(fd, key, value) < 0)
+		if (write_section(fd, key, &store) < 0 ||
+		    write_pair(fd, key, value, &store) < 0)
 			goto write_err_out;
 	} else {
 		struct stat st;
@@ -2646,7 +2655,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		 * As a side effect, we make sure to transform only a valid
 		 * existing config file.
 		 */
-		if (git_config_from_file(store_aux, config_filename, NULL)) {
+		if (git_config_from_file(store_aux, config_filename, &store)) {
 			error("invalid config file %s", config_filename);
 			free(store.key);
 			if (store.value_regex != NULL &&
@@ -2730,10 +2739,10 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		/* write the pair (value == NULL means unset) */
 		if (value != NULL) {
 			if (store.state == START) {
-				if (write_section(fd, key) < 0)
+				if (write_section(fd, key, &store) < 0)
 					goto write_err_out;
 			}
-			if (write_pair(fd, key, value) < 0)
+			if (write_pair(fd, key, value, &store) < 0)
 				goto write_err_out;
 		}
 
@@ -2857,7 +2866,8 @@ static int section_name_is_ok(const char *name)
 
 /* if new_name == NULL, the section is removed instead */
 static int git_config_copy_or_rename_section_in_file(const char *config_filename,
-				      const char *old_name, const char *new_name, int copy)
+				      const char *old_name,
+				      const char *new_name, int copy)
 {
 	int ret = 0, remove = 0;
 	char *filename_buf = NULL;
@@ -2867,6 +2877,9 @@ static int git_config_copy_or_rename_section_in_file(const char *config_filename
 	FILE *config_file = NULL;
 	struct stat st;
 	struct strbuf copystr = STRBUF_INIT;
+	struct config_store_data store;
+
+	memset(&store, 0, sizeof(store));
 
 	if (new_name && !section_name_is_ok(new_name)) {
 		ret = error("invalid section name: %s", new_name);
@@ -2936,7 +2949,7 @@ static int git_config_copy_or_rename_section_in_file(const char *config_filename
 				}
 				store.baselen = strlen(new_name);
 				if (!copy) {
-					if (write_section(out_fd, new_name) < 0) {
+					if (write_section(out_fd, new_name, &store) < 0) {
 						ret = write_error(get_lock_file_path(&lock));
 						goto out;
 					}
@@ -2957,7 +2970,7 @@ static int git_config_copy_or_rename_section_in_file(const char *config_filename
 						output[0] = '\t';
 					}
 				} else {
-					copystr = store_create_section(new_name);
+					copystr = store_create_section(new_name, &store);
 				}
 			}
 			remove = 0;
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 11/15] config_set_store: rename some fields for consistency
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
                       ` (9 preceding siblings ...)
  2018-04-09  8:32     ` [PATCH v3 10/15] config: avoid using the global variable `store` Johannes Schindelin
@ 2018-04-09  8:32     ` Johannes Schindelin
  2018-04-09  8:32     ` [PATCH v3 12/15] git_config_set: do not use a state machine Johannes Schindelin
                       ` (3 subsequent siblings)
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

The `seen` field is the actual length of the `offset` array, and the
`offset_alloc` field records what was allocated (to avoid resizing
wherever `seen` has to be incremented).

Elsewhere, we use the convention `name` for the array, where `name` is
descriptive enough to guess its purpose, `name_nr` for the actual length
and `name_alloc` to record the maximum length without needing to resize.

Let's make the names of the fields in question consistent with that
convention.
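
To illustrate the `name`/`name_nr`/`name_alloc` convention outside of
config.c, here is a minimal, self-contained sketch. The struct
`offsets` and the helper `offsets_append()` are made up for this
example, and the growth step is spelled out by hand instead of using
Git's ALLOC_GROW() macro, so please read it as an approximation only:

```c
#include <stdlib.h>

/* Hypothetical struct following the name/name_nr/name_alloc convention. */
struct offsets {
	size_t *seen;            /* the array itself */
	unsigned int seen_nr;    /* number of entries in use */
	unsigned int seen_alloc; /* number of entries allocated */
};

/*
 * Append one offset, growing the array only when it is full.
 * Returns 0 on success, -1 if the allocation failed.
 * (Hand-rolled stand-in for ALLOC_GROW(); not the real macro.)
 */
static int offsets_append(struct offsets *o, size_t offset)
{
	if (o->seen_nr + 1 > o->seen_alloc) {
		unsigned int new_alloc = (o->seen_alloc + 16) * 3 / 2;
		size_t *tmp = realloc(o->seen, new_alloc * sizeof(*tmp));

		if (!tmp)
			return -1;
		o->seen = tmp;
		o->seen_alloc = new_alloc;
	}
	o->seen[o->seen_nr++] = offset;
	return 0;
}
```

The point of keeping `_nr` and `_alloc` apart is that most appends
touch only `seen_nr`; the reallocation happens rarely.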

This will also help with the next steps where we will let the
git_config_set() machinery use the config event stream that we just
introduced.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c | 63 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/config.c b/config.c
index 0c0a965267d..2341620c11a 100644
--- a/config.c
+++ b/config.c
@@ -2302,10 +2302,9 @@ struct config_store_data {
 	int do_not_match;
 	regex_t *value_regex;
 	int multi_replace;
-	size_t *offset;
-	unsigned int offset_alloc;
+	size_t *seen;
+	unsigned int seen_nr, seen_alloc;
 	enum { START, SECTION_SEEN, SECTION_END_SEEN, KEY_SEEN } state;
-	unsigned int seen;
 };
 
 static int matches(const char *key, const char *value,
@@ -2331,15 +2330,15 @@ static int store_aux(const char *key, const char *value, void *cb)
 	switch (store->state) {
 	case KEY_SEEN:
 		if (matches(key, value, store)) {
-			if (store->seen == 1 && store->multi_replace == 0) {
+			if (store->seen_nr == 1 && store->multi_replace == 0) {
 				warning(_("%s has multiple values"), key);
 			}
 
-			ALLOC_GROW(store->offset, store->seen + 1,
-				   store->offset_alloc);
+			ALLOC_GROW(store->seen, store->seen_nr + 1,
+				   store->seen_alloc);
 
-			store->offset[store->seen] = cf->do_ftell(cf);
-			store->seen++;
+			store->seen[store->seen_nr] = cf->do_ftell(cf);
+			store->seen_nr++;
 		}
 		break;
 	case SECTION_SEEN:
@@ -2365,26 +2364,26 @@ static int store_aux(const char *key, const char *value, void *cb)
 		 * Do not increment matches: this is no match, but we
 		 * just made sure we are in the desired section.
 		 */
-		ALLOC_GROW(store->offset, store->seen + 1,
-			   store->offset_alloc);
-		store->offset[store->seen] = cf->do_ftell(cf);
+		ALLOC_GROW(store->seen, store->seen_nr + 1,
+			   store->seen_alloc);
+		store->seen[store->seen_nr] = cf->do_ftell(cf);
 		/* fallthru */
 	case SECTION_END_SEEN:
 	case START:
 		if (matches(key, value, store)) {
-			ALLOC_GROW(store->offset, store->seen + 1,
-				   store->offset_alloc);
-			store->offset[store->seen] = cf->do_ftell(cf);
+			ALLOC_GROW(store->seen, store->seen_nr + 1,
+				   store->seen_alloc);
+			store->seen[store->seen_nr] = cf->do_ftell(cf);
 			store->state = KEY_SEEN;
-			store->seen++;
+			store->seen_nr++;
 		} else {
 			if (strrchr(key, '.') - key == store->baselen &&
 			      !strncmp(key, store->key, store->baselen)) {
 					store->state = SECTION_SEEN;
-					ALLOC_GROW(store->offset,
-						   store->seen + 1,
-						   store->offset_alloc);
-					store->offset[store->seen] =
+					ALLOC_GROW(store->seen,
+						   store->seen_nr + 1,
+						   store->seen_alloc);
+					store->seen[store->seen_nr] =
 						cf->do_ftell(cf);
 			}
 		}
@@ -2644,10 +2643,10 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			}
 		}
 
-		ALLOC_GROW(store.offset, 1, store.offset_alloc);
-		store.offset[0] = 0;
+		ALLOC_GROW(store.seen, 1, store.seen_alloc);
+		store.seen[0] = 0;
 		store.state = START;
-		store.seen = 0;
+		store.seen_nr = 0;
 
 		/*
 		 * After this, store.offset will contain the *end* offset
@@ -2675,8 +2674,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		}
 
 		/* if nothing to unset, or too many matches, error out */
-		if ((store.seen == 0 && value == NULL) ||
-				(store.seen > 1 && multi_replace == 0)) {
+		if ((store.seen_nr == 0 && value == NULL) ||
+		    (store.seen_nr > 1 && multi_replace == 0)) {
 			ret = CONFIG_NOTHING_SET;
 			goto out_free;
 		}
@@ -2707,19 +2706,19 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			goto out_free;
 		}
 
-		if (store.seen == 0)
-			store.seen = 1;
+		if (store.seen_nr == 0)
+			store.seen_nr = 1;
 
-		for (i = 0, copy_begin = 0; i < store.seen; i++) {
+		for (i = 0, copy_begin = 0; i < store.seen_nr; i++) {
 			new_line = 0;
-			if (store.offset[i] == 0) {
-				store.offset[i] = copy_end = contents_sz;
+			if (store.seen[i] == 0) {
+				store.seen[i] = copy_end = contents_sz;
 			} else if (store.state != KEY_SEEN) {
-				copy_end = store.offset[i];
+				copy_end = store.seen[i];
 			} else
 				copy_end = find_beginning_of_line(
 					contents, contents_sz,
-					store.offset[i], &new_line);
+					store.seen[i], &new_line);
 
 			if (copy_end > 0 && contents[copy_end-1] != '\n')
 				new_line = 1;
@@ -2733,7 +2732,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 				    write_str_in_full(fd, "\n") < 0)
 					goto write_err_out;
 			}
-			copy_begin = store.offset[i];
+			copy_begin = store.seen[i];
 		}
 
 		/* write the pair (value == NULL means unset) */
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 12/15] git_config_set: do not use a state machine
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
                       ` (10 preceding siblings ...)
  2018-04-09  8:32     ` [PATCH v3 11/15] config_set_store: rename some fields for consistency Johannes Schindelin
@ 2018-04-09  8:32     ` Johannes Schindelin
  2018-04-09  8:32     ` [PATCH v3 13/15] git_config_set: make use of the config parser's event stream Johannes Schindelin
                       ` (2 subsequent siblings)
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

While a neat theoretical construct, state machines are hard to read. In
this instance, it does not even make a whole lot of sense because we are
more interested in flags, anyway: has the section been seen? Has the key
been seen? Does the current section match the key we are looking for?

Besides, the state `SECTION_SEEN` was named in a misleading way: it did
not indicate that we saw the section matching the key we are looking
for, but it instead indicated that we are *currently* in that section.

Let's just replace the state machine logic by clear and obvious flags.
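
As a rough illustration of why independent flags read more easily than
a single state variable, consider this stand-alone sketch. The struct
`lookup` and the function `lookup_feed()` are invented for the example
(they take a pre-split section and key, which the real callback does
not), so this is not the code in config.c:

```c
#include <string.h>

/* Hypothetical, simplified stand-in for the flags in config_store_data. */
struct lookup {
	const char *section; /* section we are looking for */
	const char *key;     /* key we are looking for */
	unsigned int key_seen:1, section_seen:1, is_keys_section:1;
};

/*
 * Feed one parsed entry. Each flag answers exactly one question:
 * - is_keys_section: are we *currently* in the wanted section?
 * - section_seen:    did we ever see the wanted section?
 * - key_seen:        did we ever see the wanted key?
 */
static void lookup_feed(struct lookup *l, const char *section,
			const char *key)
{
	l->is_keys_section = !strcmp(section, l->section);
	if (l->is_keys_section) {
		l->section_seen = 1;
		if (!strcmp(key, l->key))
			l->key_seen = 1;
	}
}
```

Note how "currently in the section" and "saw the section at some
point" can be true or false independently, which a single enum value
cannot express.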

This will also make it easier to review the upcoming patches to use the
newly-introduced `event_fn` callback of the config parser.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c | 59 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/config.c b/config.c
index 2341620c11a..3f1cbfa181e 100644
--- a/config.c
+++ b/config.c
@@ -2304,7 +2304,7 @@ struct config_store_data {
 	int multi_replace;
 	size_t *seen;
 	unsigned int seen_nr, seen_alloc;
-	enum { START, SECTION_SEEN, SECTION_END_SEEN, KEY_SEEN } state;
+	unsigned int key_seen:1, section_seen:1, is_keys_section:1;
 };
 
 static int matches(const char *key, const char *value,
@@ -2327,8 +2327,7 @@ static int store_aux(const char *key, const char *value, void *cb)
 	size_t section_len;
 	struct config_store_data *store = cb;
 
-	switch (store->state) {
-	case KEY_SEEN:
+	if (store->key_seen) {
 		if (matches(key, value, store)) {
 			if (store->seen_nr == 1 && store->multi_replace == 0) {
 				warning(_("%s has multiple values"), key);
@@ -2340,8 +2339,8 @@ static int store_aux(const char *key, const char *value, void *cb)
 			store->seen[store->seen_nr] = cf->do_ftell(cf);
 			store->seen_nr++;
 		}
-		break;
-	case SECTION_SEEN:
+		return 0;
+	} else if (store->is_keys_section) {
 		/*
 		 * What we are looking for is in store->key (both
 		 * section and var), and its section part is baselen
@@ -2356,10 +2355,9 @@ static int store_aux(const char *key, const char *value, void *cb)
 
 		if ((section_len != store->baselen) ||
 		    memcmp(key, store->key, section_len+1)) {
-			store->state = SECTION_END_SEEN;
-			break;
+			store->is_keys_section = 0;
+			return 0;
 		}
-
 		/*
 		 * Do not increment matches: this is no match, but we
 		 * just made sure we are in the desired section.
@@ -2367,27 +2365,29 @@ static int store_aux(const char *key, const char *value, void *cb)
 		ALLOC_GROW(store->seen, store->seen_nr + 1,
 			   store->seen_alloc);
 		store->seen[store->seen_nr] = cf->do_ftell(cf);
-		/* fallthru */
-	case SECTION_END_SEEN:
-	case START:
-		if (matches(key, value, store)) {
-			ALLOC_GROW(store->seen, store->seen_nr + 1,
-				   store->seen_alloc);
-			store->seen[store->seen_nr] = cf->do_ftell(cf);
-			store->state = KEY_SEEN;
-			store->seen_nr++;
-		} else {
-			if (strrchr(key, '.') - key == store->baselen &&
-			      !strncmp(key, store->key, store->baselen)) {
-					store->state = SECTION_SEEN;
-					ALLOC_GROW(store->seen,
-						   store->seen_nr + 1,
-						   store->seen_alloc);
-					store->seen[store->seen_nr] =
-						cf->do_ftell(cf);
-			}
+	}
+
+	if (matches(key, value, store)) {
+		ALLOC_GROW(store->seen, store->seen_nr + 1,
+			   store->seen_alloc);
+		store->seen[store->seen_nr] = cf->do_ftell(cf);
+		store->seen_nr++;
+		store->key_seen = 1;
+		store->section_seen = 1;
+		store->is_keys_section = 1;
+	} else {
+		if (strrchr(key, '.') - key == store->baselen &&
+		      !strncmp(key, store->key, store->baselen)) {
+				store->section_seen = 1;
+				store->is_keys_section = 1;
+				ALLOC_GROW(store->seen,
+					   store->seen_nr + 1,
+					   store->seen_alloc);
+				store->seen[store->seen_nr] =
+					cf->do_ftell(cf);
 		}
 	}
+
 	return 0;
 }
 
@@ -2645,7 +2645,6 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 
 		ALLOC_GROW(store.seen, 1, store.seen_alloc);
 		store.seen[0] = 0;
-		store.state = START;
 		store.seen_nr = 0;
 
 		/*
@@ -2713,7 +2712,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			new_line = 0;
 			if (store.seen[i] == 0) {
 				store.seen[i] = copy_end = contents_sz;
-			} else if (store.state != KEY_SEEN) {
+			} else if (!store.key_seen) {
 				copy_end = store.seen[i];
 			} else
 				copy_end = find_beginning_of_line(
@@ -2737,7 +2736,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 
 		/* write the pair (value == NULL means unset) */
 		if (value != NULL) {
-			if (store.state == START) {
+			if (!store.section_seen) {
 				if (write_section(fd, key, &store) < 0)
 					goto write_err_out;
 			}
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 13/15] git_config_set: make use of the config parser's event stream
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
                       ` (11 preceding siblings ...)
  2018-04-09  8:32     ` [PATCH v3 12/15] git_config_set: do not use a state machine Johannes Schindelin
@ 2018-04-09  8:32     ` Johannes Schindelin
  2018-05-08 13:42       ` Jeff King
  2018-04-09  8:32     ` [PATCH v3 14/15] git config --unset: remove empty sections (in the common case) Johannes Schindelin
  2018-04-09  8:32     ` [PATCH v3 15/15] git_config_set: reuse empty sections Johannes Schindelin
  14 siblings, 1 reply; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

In the recent commit with the title "config: introduce an optional event
stream while parsing", we introduced an optional callback to keep track
of the config parser's events "comment", "white-space", "section header"
and "entry".

One motivation for this feature was to make use of it in the code that
edits the config. And this commit makes it so.

Note: this patch changes the meaning of the `seen` array that records
whether we saw the config entry that is to be edited: previously, it
contained the end offset of the found entry. Now, we introduce a new
array `parsed` that keeps a record of *all* config parser events (with
begin/end offsets), and the items in the `seen` array now point into the
`parsed` array.

There are two reasons why we do it this way:

1. To keep the implementation simple, the config parser's event stream
   reports the event only after the config callback was called, so we
   would not receive the begin offset otherwise.

2. In the following patches, we will re-use the `parsed` array to fix two
   long-standing bugs related to empty sections.

Note that this also makes the code more robust with respect to finding the
begin offset of the part(s) of the config file to be edited, as we no
longer back-track to find the beginning of the line.
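
The indirection (the `seen` items being indices into `parsed`) can be
sketched in isolation like this. The types and the helper
`copy_without_seen()` are hypothetical and heavily simplified: the
sketch only drops entries, assumes `seen` is sorted in ascending order,
and expects the caller to provide a large enough output buffer:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event types, loosely modeled on the parser's events. */
enum event_type { EV_COMMENT, EV_WHITESPACE, EV_SECTION, EV_ENTRY };

struct parsed_event {
	size_t begin, end; /* byte range [begin, end) in the file */
	enum event_type type;
};

/*
 * Copy `contents` to `out`, skipping the byte ranges of the events
 * listed in `seen` (indices into `parsed`, ascending). White-space
 * preceding a skipped entry on the same line is swallowed, too.
 * Returns the number of bytes written.
 */
static size_t copy_without_seen(const char *contents, size_t len,
				const struct parsed_event *parsed,
				const unsigned int *seen,
				unsigned int seen_nr, char *out)
{
	size_t copy_begin = 0, written = 0;
	unsigned int i;

	for (i = 0; i < seen_nr; i++) {
		const struct parsed_event *ev = &parsed[seen[i]];
		size_t copy_end = ev->begin;

		while (copy_end > copy_begin) {
			char c = contents[copy_end - 1];

			if (!isspace((unsigned char)c) || c == '\n')
				break;
			copy_end--;
		}
		memcpy(out + written, contents + copy_begin,
		       copy_end - copy_begin);
		written += copy_end - copy_begin;
		copy_begin = ev->end;
	}
	memcpy(out + written, contents + copy_begin, len - copy_begin);
	return written + (len - copy_begin);
}
```

Because every event carries its own begin offset, there is nothing to
guess by scanning backwards through the file contents.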

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c | 170 ++++++++++++++++++++++++++-----------------------------
 1 file changed, 81 insertions(+), 89 deletions(-)

diff --git a/config.c b/config.c
index 3f1cbfa181e..72d71fc9a4e 100644
--- a/config.c
+++ b/config.c
@@ -2302,8 +2302,11 @@ struct config_store_data {
 	int do_not_match;
 	regex_t *value_regex;
 	int multi_replace;
-	size_t *seen;
-	unsigned int seen_nr, seen_alloc;
+	struct {
+		size_t begin, end;
+		enum config_event_t type;
+	} *parsed;
+	unsigned int parsed_nr, parsed_alloc, *seen, seen_nr, seen_alloc;
 	unsigned int key_seen:1, section_seen:1, is_keys_section:1;
 };
 
@@ -2321,10 +2324,31 @@ static int matches(const char *key, const char *value,
 		(value && !regexec(store->value_regex, value, 0, NULL, 0));
 }
 
+static int store_aux_event(enum config_event_t type,
+			   size_t begin, size_t end, void *data)
+{
+	struct config_store_data *store = data;
+
+	ALLOC_GROW(store->parsed, store->parsed_nr + 1, store->parsed_alloc);
+	store->parsed[store->parsed_nr].begin = begin;
+	store->parsed[store->parsed_nr].end = end;
+	store->parsed[store->parsed_nr].type = type;
+	store->parsed_nr++;
+
+	if (type == CONFIG_EVENT_SECTION) {
+		if (cf->var.len < 2 || cf->var.buf[cf->var.len - 1] != '.')
+			BUG("Invalid section name '%s'", cf->var.buf);
+
+		/* Is this the section we were looking for? */
+		store->is_keys_section = cf->var.len - 1 == store->baselen &&
+			!strncasecmp(cf->var.buf, store->key, store->baselen);
+	}
+
+	return 0;
+}
+
 static int store_aux(const char *key, const char *value, void *cb)
 {
-	const char *ep;
-	size_t section_len;
 	struct config_store_data *store = cb;
 
 	if (store->key_seen) {
@@ -2336,55 +2360,21 @@ static int store_aux(const char *key, const char *value, void *cb)
 			ALLOC_GROW(store->seen, store->seen_nr + 1,
 				   store->seen_alloc);
 
-			store->seen[store->seen_nr] = cf->do_ftell(cf);
+			store->seen[store->seen_nr] = store->parsed_nr;
 			store->seen_nr++;
 		}
-		return 0;
 	} else if (store->is_keys_section) {
 		/*
-		 * What we are looking for is in store->key (both
-		 * section and var), and its section part is baselen
-		 * long.  We found key (again, both section and var).
-		 * We would want to know if this key is in the same
-		 * section as what we are looking for.  We already
-		 * know we are in the same section as what should
-		 * hold store->key.
+		 * Do not increment matches yet: this may not be a match, but we
+		 * are in the desired section.
 		 */
-		ep = strrchr(key, '.');
-		section_len = ep - key;
-
-		if ((section_len != store->baselen) ||
-		    memcmp(key, store->key, section_len+1)) {
-			store->is_keys_section = 0;
-			return 0;
-		}
-		/*
-		 * Do not increment matches: this is no match, but we
-		 * just made sure we are in the desired section.
-		 */
-		ALLOC_GROW(store->seen, store->seen_nr + 1,
-			   store->seen_alloc);
-		store->seen[store->seen_nr] = cf->do_ftell(cf);
-	}
-
-	if (matches(key, value, store)) {
-		ALLOC_GROW(store->seen, store->seen_nr + 1,
-			   store->seen_alloc);
-		store->seen[store->seen_nr] = cf->do_ftell(cf);
-		store->seen_nr++;
-		store->key_seen = 1;
+		ALLOC_GROW(store->seen, store->seen_nr + 1, store->seen_alloc);
+		store->seen[store->seen_nr] = store->parsed_nr;
 		store->section_seen = 1;
-		store->is_keys_section = 1;
-	} else {
-		if (strrchr(key, '.') - key == store->baselen &&
-		      !strncmp(key, store->key, store->baselen)) {
-				store->section_seen = 1;
-				store->is_keys_section = 1;
-				ALLOC_GROW(store->seen,
-					   store->seen_nr + 1,
-					   store->seen_alloc);
-				store->seen[store->seen_nr] =
-					cf->do_ftell(cf);
+
+		if (matches(key, value, store)) {
+			store->seen_nr++;
+			store->key_seen = 1;
 		}
 	}
 
@@ -2485,32 +2475,6 @@ static ssize_t write_pair(int fd, const char *key, const char *value,
 	return ret;
 }
 
-static ssize_t find_beginning_of_line(const char *contents, size_t size,
-	size_t offset_, int *found_bracket)
-{
-	size_t equal_offset = size, bracket_offset = size;
-	ssize_t offset;
-
-contline:
-	for (offset = offset_-2; offset > 0
-			&& contents[offset] != '\n'; offset--)
-		switch (contents[offset]) {
-			case '=': equal_offset = offset; break;
-			case ']': bracket_offset = offset; break;
-		}
-	if (offset > 0 && contents[offset-1] == '\\') {
-		offset_ = offset;
-		goto contline;
-	}
-	if (bracket_offset < equal_offset) {
-		*found_bracket = 1;
-		offset = bracket_offset+1;
-	} else
-		offset++;
-
-	return offset;
-}
-
 int git_config_set_in_file_gently(const char *config_filename,
 				  const char *key, const char *value)
 {
@@ -2621,6 +2585,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		struct stat st;
 		size_t copy_begin, copy_end;
 		int i, new_line = 0;
+		struct config_options opts;
 
 		if (value_regex == NULL)
 			store.value_regex = NULL;
@@ -2643,17 +2608,24 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			}
 		}
 
-		ALLOC_GROW(store.seen, 1, store.seen_alloc);
-		store.seen[0] = 0;
-		store.seen_nr = 0;
+		ALLOC_GROW(store.parsed, 1, store.parsed_alloc);
+		store.parsed[0].end = 0;
+
+		memset(&opts, 0, sizeof(opts));
+		opts.event_fn = store_aux_event;
+		opts.event_fn_data = &store;
 
 		/*
-		 * After this, store.offset will contain the *end* offset
-		 * of the last match, or remain at 0 if no match was found.
+		 * After this, store.parsed will contain offsets of all the
+		 * parsed elements, and store.seen will contain a list of
+		 * matches, as indices into store.parsed.
+		 *
 		 * As a side effect, we make sure to transform only a valid
 		 * existing config file.
 		 */
-		if (git_config_from_file(store_aux, config_filename, &store)) {
+		if (git_config_from_file_with_options(store_aux,
+						      config_filename,
+						      &store, &opts)) {
 			error("invalid config file %s", config_filename);
 			free(store.key);
 			if (store.value_regex != NULL &&
@@ -2705,19 +2677,39 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			goto out_free;
 		}
 
-		if (store.seen_nr == 0)
+		if (store.seen_nr == 0) {
+			if (!store.seen_alloc) {
+				/* Did not see key nor section */
+				ALLOC_GROW(store.seen, 1, store.seen_alloc);
+				store.seen[0] = store.parsed_nr
+					- !!store.parsed_nr;
+			}
 			store.seen_nr = 1;
+		}
 
 		for (i = 0, copy_begin = 0; i < store.seen_nr; i++) {
+			size_t replace_end;
+			int j = store.seen[i];
+
 			new_line = 0;
-			if (store.seen[i] == 0) {
-				store.seen[i] = copy_end = contents_sz;
-			} else if (!store.key_seen) {
-				copy_end = store.seen[i];
-			} else
-				copy_end = find_beginning_of_line(
-					contents, contents_sz,
-					store.seen[i], &new_line);
+			if (!store.key_seen) {
+				replace_end = copy_end = store.parsed[j].end;
+			} else {
+				replace_end = store.parsed[j].end;
+				copy_end = store.parsed[j].begin;
+				/*
+				 * Swallow preceding white-space on the same
+				 * line.
+				 */
+				while (copy_end > 0) {
+					char c = contents[copy_end - 1];
+
+					if (isspace(c) && c != '\n')
+						copy_end--;
+					else
+						break;
+				}
+			}
 
 			if (copy_end > 0 && contents[copy_end-1] != '\n')
 				new_line = 1;
@@ -2731,7 +2723,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 				    write_str_in_full(fd, "\n") < 0)
 					goto write_err_out;
 			}
-			copy_begin = store.seen[i];
+			copy_begin = replace_end;
 		}
 
 		/* write the pair (value == NULL means unset) */
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 14/15] git config --unset: remove empty sections (in the common case)
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
                       ` (12 preceding siblings ...)
  2018-04-09  8:32     ` [PATCH v3 13/15] git_config_set: make use of the config parser's event stream Johannes Schindelin
@ 2018-04-09  8:32     ` Johannes Schindelin
  2018-04-09  8:32     ` [PATCH v3 15/15] git_config_set: reuse empty sections Johannes Schindelin
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

The original reasoning for not removing section headers upon removal of
the last entry went like this: the user could have added comments about
the section, or about the entries therein, and if there were other
comments there, we would not know whether we should remove them.

In particular, a concocted example was presented that looked like this
(and was added to t1300):

	# some generic comment on the configuration file itself
	# a comment specific to this "section" section.
	[section]
	# some intervening lines
	# that should also be dropped

	key = value
	# please be careful when you update the above variable

The ideal thing for `git config --unset section.key` in this case would
be to leave only the first line behind, because all the other comments
are now obsolete.

However, this is unfeasible, short of adding a complete Natural Language
Processing module to Git, which seems not only a lot of work, but a
totally unreasonable feature (for little benefit to most users).

Now, the real kicker about this problem is: most users do not edit their
config files at all! In their use case, the config looks like this
instead:

	[section]
		key = value

... and it is totally obvious what should happen if the entry is
removed: the entire section should vanish.

Let's generalize this observation to this conservative strategy: if we
are removing the last entry from a section, and there are no comments
inside that section nor surrounding it, then remove the entire section.
Otherwise behave as before: leave the now-empty section (including those
comments, even ones about the now-deleted entry).

We have to be extra careful to handle the case where more than one entry
is removed: any subset of them might be the last entries of their
respective sections (and if there are no comments in or around that
section, the section should be removed, too).
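
The gist of that conservative check, reduced to the single-entry case,
might look like the following sketch. The names are hypothetical, and
unlike the patch below it handles neither multiple entries to be
removed nor repeated headers of the same section:

```c
/* Hypothetical event types, loosely modeled on the parser's events. */
enum event_type { EV_COMMENT, EV_WHITESPACE, EV_SECTION, EV_ENTRY };

struct parsed_event {
	enum event_type type;
};

/*
 * Return 1 if removing the entry at index `idx` would leave its
 * section empty *and* there are no comments in or around that
 * section; return 0 if the section header has to stay.
 */
static int can_remove_section(const struct parsed_event *ev,
			      unsigned int nr, unsigned int idx)
{
	unsigned int i;
	int header_seen = 0;

	/* Backwards: must be the first entry, with no comments nearby. */
	for (i = idx; i > 0; i--) {
		enum event_type type = ev[i - 1].type;

		if (type == EV_COMMENT)
			return 0;
		if (type == EV_ENTRY) {
			if (!header_seen)
				return 0; /* not the section's first entry */
			break; /* entry of the previous section */
		}
		if (type == EV_SECTION) {
			if (header_seen)
				break; /* header of the previous section */
			header_seen = 1;
		}
	}

	/* Forwards: must be the last entry, with no comments nearby. */
	for (i = idx + 1; i < nr; i++) {
		enum event_type type = ev[i].type;

		if (type == EV_COMMENT || type == EV_ENTRY)
			return 0;
		if (type == EV_SECTION)
			break;
	}
	return 1;
}
```

Any comment found while walking in either direction makes the function
bail out, which is the "otherwise behave as before" part.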

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c          | 93 ++++++++++++++++++++++++++++++++++++++++++++++-
 t/t1300-config.sh |  4 +-
 2 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/config.c b/config.c
index 72d71fc9a4e..2c7a10acdaa 100644
--- a/config.c
+++ b/config.c
@@ -2305,6 +2305,7 @@ struct config_store_data {
 	struct {
 		size_t begin, end;
 		enum config_event_t type;
+		int is_keys_section;
 	} *parsed;
 	unsigned int parsed_nr, parsed_alloc, *seen, seen_nr, seen_alloc;
 	unsigned int key_seen:1, section_seen:1, is_keys_section:1;
@@ -2333,17 +2334,20 @@ static int store_aux_event(enum config_event_t type,
 	store->parsed[store->parsed_nr].begin = begin;
 	store->parsed[store->parsed_nr].end = end;
 	store->parsed[store->parsed_nr].type = type;
-	store->parsed_nr++;
 
 	if (type == CONFIG_EVENT_SECTION) {
 		if (cf->var.len < 2 || cf->var.buf[cf->var.len - 1] != '.')
 			BUG("Invalid section name '%s'", cf->var.buf);
 
 		/* Is this the section we were looking for? */
-		store->is_keys_section = cf->var.len - 1 == store->baselen &&
+		store->is_keys_section =
+			store->parsed[store->parsed_nr].is_keys_section =
+			cf->var.len - 1 == store->baselen &&
 			!strncasecmp(cf->var.buf, store->key, store->baselen);
 	}
 
+	store->parsed_nr++;
+
 	return 0;
 }
 
@@ -2475,6 +2479,87 @@ static ssize_t write_pair(int fd, const char *key, const char *value,
 	return ret;
 }
 
+/*
+ * If we are about to unset the last key(s) in a section, and if there are
+ * no comments surrounding (or included in) the section, we will want to
+ * extend begin/end to remove the entire section.
+ *
+ * Note: the parameter `seen_ptr` points to the index into the store.seen
+ * array. This index may be incremented if a section has more than one
+ * entry (which all are to be removed).
+ */
+static void maybe_remove_section(struct config_store_data *store,
+				 const char *contents,
+				 size_t *begin_offset, size_t *end_offset,
+				 int *seen_ptr)
+{
+	size_t begin;
+	int i, seen, section_seen = 0;
+
+	/*
+	 * First, ensure that this is the first key, and that there are no
+	 * comments before the entry nor before the section header.
+	 */
+	seen = *seen_ptr;
+	for (i = store->seen[seen]; i > 0; i--) {
+		enum config_event_t type = store->parsed[i - 1].type;
+
+		if (type == CONFIG_EVENT_COMMENT)
+			/* There is a comment before this entry or section */
+			return;
+		if (type == CONFIG_EVENT_ENTRY) {
+			if (!section_seen)
+				/* This is not the section's first entry. */
+				return;
+			/* We encountered no comment before the section. */
+			break;
+		}
+		if (type == CONFIG_EVENT_SECTION) {
+			if (!store->parsed[i - 1].is_keys_section)
+				break;
+			section_seen = 1;
+		}
+	}
+	begin = store->parsed[i].begin;
+
+	/*
+	 * Next, make sure that we are removing the last key(s) in the section,
+	 * and that there are no comments that are possibly about the current
+	 * section.
+	 */
+	for (i = store->seen[seen] + 1; i < store->parsed_nr; i++) {
+		enum config_event_t type = store->parsed[i].type;
+
+		if (type == CONFIG_EVENT_COMMENT)
+			return;
+		if (type == CONFIG_EVENT_SECTION) {
+			if (store->parsed[i].is_keys_section)
+				continue;
+			break;
+		}
+		if (type == CONFIG_EVENT_ENTRY) {
+			if (++seen < store->seen_nr &&
+			    i == store->seen[seen])
+				/* We want to remove this entry, too */
+				continue;
+			/* There is another entry in this section. */
+			return;
+		}
+	}
+
+	/*
+	 * We are really removing the last entry/entries from this section, and
+	 * there are no enclosed or surrounding comments. Remove the entire,
+	 * now-empty section.
+	 */
+	*seen_ptr = seen;
+	*begin_offset = begin;
+	if (i < store->parsed_nr)
+		*end_offset = store->parsed[i].begin;
+	else
+		*end_offset = store->parsed[store->parsed_nr - 1].end;
+}
+
 int git_config_set_in_file_gently(const char *config_filename,
 				  const char *key, const char *value)
 {
@@ -2697,6 +2782,10 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 			} else {
 				replace_end = store.parsed[j].end;
 				copy_end = store.parsed[j].begin;
+				if (!value)
+					maybe_remove_section(&store, contents,
+							     &copy_end,
+							     &replace_end, &i);
 				/*
 				 * Swallow preceding white-space on the same
 				 * line.
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 9d23a8ca972..d973fd53398 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1413,7 +1413,7 @@ test_expect_success 'urlmatch with wildcard' '
 '
 
 # good section hygiene
-test_expect_failure '--unset last key removes section (except if commented)' '
+test_expect_success '--unset last key removes section (except if commented)' '
 	cat >.git/config <<-\EOF &&
 	# some generic comment on the configuration file itself
 	# a comment specific to this "section" section.
@@ -1495,7 +1495,7 @@ test_expect_failure '--unset last key removes section (except if commented)' '
 	test_line_count = 3 .git/config
 '
 
-test_expect_failure '--unset-all removes section if empty & uncommented' '
+test_expect_success '--unset-all removes section if empty & uncommented' '
 	cat >.git/config <<-\EOF &&
 	[section]
 	key = value1
-- 
2.17.0.windows.1.4.g7e4058d72e3




* [PATCH v3 15/15] git_config_set: reuse empty sections
  2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
                       ` (13 preceding siblings ...)
  2018-04-09  8:32     ` [PATCH v3 14/15] git config --unset: remove empty sections (in the common case) Johannes Schindelin
@ 2018-04-09  8:32     ` Johannes Schindelin
  14 siblings, 0 replies; 103+ messages in thread
From: Johannes Schindelin @ 2018-04-09  8:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Thomas Rast, Phil Haack, Jeff King,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

It can happen quite easily that the last setting in a config section is
removed, and to avoid confusion when there are comments in the config
about that section, we keep a lone section header, i.e. an empty
section.

Now that we use the `event_fn` callback, it is easy to add support for
re-using empty sections, so let's do that.

Note: t5512-ls-remote requires that this change is applied *after* the
patch "git config --unset: remove empty sections (in the common case)":
without that patch, there would be empty `transfer` and `uploadpack`
sections ready for reuse, but in the *wrong* order (and consequently,
t5512's "overrides work between mixed transfer/upload-pack hideRefs"
would fail).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c          | 14 +++++++++++++-
 t/t1300-config.sh |  2 +-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/config.c b/config.c
index 2c7a10acdaa..6155d0651bd 100644
--- a/config.c
+++ b/config.c
@@ -2344,6 +2344,12 @@ static int store_aux_event(enum config_event_t type,
 			store->parsed[store->parsed_nr].is_keys_section =
 			cf->var.len - 1 == store->baselen &&
 			!strncasecmp(cf->var.buf, store->key, store->baselen);
+		if (store->is_keys_section) {
+			store->section_seen = 1;
+			ALLOC_GROW(store->seen, store->seen_nr + 1,
+				   store->seen_alloc);
+			store->seen[store->seen_nr] = store->parsed_nr;
+		}
 	}
 
 	store->parsed_nr++;
@@ -2778,7 +2784,13 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 
 			new_line = 0;
 			if (!store.key_seen) {
-				replace_end = copy_end = store.parsed[j].end;
+				copy_end = store.parsed[j].end;
+				/* include '\n' when copying section header */
+				if (copy_end > 0 && copy_end < contents_sz &&
+				    contents[copy_end - 1] != '\n' &&
+				    contents[copy_end] == '\n')
+					copy_end++;
+				replace_end = copy_end;
 			} else {
 				replace_end = store.parsed[j].end;
 				copy_end = store.parsed[j].begin;
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index d973fd53398..eef0bbe4f9f 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1506,7 +1506,7 @@ test_expect_success '--unset-all removes section if empty & uncommented' '
 	test_line_count = 0 .git/config
 '
 
-test_expect_failure 'adding a key into an empty section reuses header' '
+test_expect_success 'adding a key into an empty section reuses header' '
 	cat >.git/config <<-\EOF &&
 	[section]
 	EOF
-- 
2.17.0.windows.1.4.g7e4058d72e3
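
The newline handling added to git_config_set_multivar_in_file_gently() in the hunk above can be tried out on its own. This is a minimal sketch, not the patch itself: `include_header_newline()` is a hypothetical helper that wraps the same condition, with `contents`, `contents_sz` and `copy_end` passed in as plain arguments.

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring the condition in the hunk above: if
 * copy_end points just past a section header whose trailing newline has
 * not been included yet, advance it by one so the newline is copied too.
 */
size_t include_header_newline(const char *contents, size_t contents_sz,
			      size_t copy_end)
{
	if (copy_end > 0 && copy_end < contents_sz &&
	    contents[copy_end - 1] != '\n' &&
	    contents[copy_end] == '\n')
		copy_end++;
	return copy_end;
}
```

For the file contents "[section]\n" (10 bytes), an offset of 9 is advanced to 10, whereas an offset that is already past the newline, or a header at the very end of a file without a trailing newline, is left unchanged.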


* Re: [PATCH v3 13/15] git_config_set: make use of the config parser's event stream
  2018-04-09  8:32     ` [PATCH v3 13/15] git_config_set: make use of the config parser's event stream Johannes Schindelin
@ 2018-05-08 13:42       ` Jeff King
  2018-05-08 14:00         ` Jeff King
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff King @ 2018-05-08 13:42 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Mon, Apr 09, 2018 at 10:32:20AM +0200, Johannes Schindelin wrote:

> +static int store_aux_event(enum config_event_t type,
> +			   size_t begin, size_t end, void *data)
> +{
> +	struct config_store_data *store = data;
> +
> +	ALLOC_GROW(store->parsed, store->parsed_nr + 1, store->parsed_alloc);
> +	store->parsed[store->parsed_nr].begin = begin;
> +	store->parsed[store->parsed_nr].end = end;
> +	store->parsed[store->parsed_nr].type = type;
> +	store->parsed_nr++;
> +
> +	if (type == CONFIG_EVENT_SECTION) {
> +		if (cf->var.len < 2 || cf->var.buf[cf->var.len - 1] != '.')
> +			BUG("Invalid section name '%s'", cf->var.buf);

I triggered this BUG today while playing around. Here's a minimal
reproduction:

  echo '[broken' >config
  git config --file=config a.b c

I'm not sure if it should simply be a die() and not a BUG(), since
it depends on the input. Or if it is a BUG and we expected an earlier
part of the code (like the event generator) to catch this broken case
before we get to this function.

-Peff
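
For illustration, the first option raised above (treating the condition as bad input rather than as an internal bug) might look roughly like the sketch below. It is not taken from the series: `check_section_name()` is a hypothetical function that takes the name and its length as arguments instead of reading `cf->var`, and it only reports the problem through its return value so that a caller could print an error and clean up.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the quoted check. A valid name here has at
 * least two bytes and ends in '.', as the quoted condition requires.
 * Returns 0 for a valid name and -1 otherwise, instead of aborting.
 */
int check_section_name(const char *name, size_t len)
{
	if (len < 2 || name[len - 1] != '.')
		return -1;
	return 0;
}
```

A caller would propagate the -1 upwards, much like other parse errors, rather than terminate the process at this point.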


* Re: [PATCH v3 13/15] git_config_set: make use of the config parser's event stream
  2018-05-08 13:42       ` Jeff King
@ 2018-05-08 14:00         ` Jeff King
  0 siblings, 0 replies; 103+ messages in thread
From: Jeff King @ 2018-05-08 14:00 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Thomas Rast, Phil Haack,
	Ævar Arnfjörð Bjarmason, Stefan Beller,
	Jason Frey, Philip Oakley

On Tue, May 08, 2018 at 09:42:48AM -0400, Jeff King wrote:

> On Mon, Apr 09, 2018 at 10:32:20AM +0200, Johannes Schindelin wrote:
> 
> > +static int store_aux_event(enum config_event_t type,
> > +			   size_t begin, size_t end, void *data)
> > +{
> > +	struct config_store_data *store = data;
> > +
> > +	ALLOC_GROW(store->parsed, store->parsed_nr + 1, store->parsed_alloc);
> > +	store->parsed[store->parsed_nr].begin = begin;
> > +	store->parsed[store->parsed_nr].end = end;
> > +	store->parsed[store->parsed_nr].type = type;
> > +	store->parsed_nr++;
> > +
> > +	if (type == CONFIG_EVENT_SECTION) {
> > +		if (cf->var.len < 2 || cf->var.buf[cf->var.len - 1] != '.')
> > +			BUG("Invalid section name '%s'", cf->var.buf);
> 
> I triggered this BUG today while playing around. Here's a minimal
> reproduction:
> 
>   echo '[broken' >config
>   git config --file=config a.b c
> 
> I'm not sure if it should simply be a die() and not a BUG(), since
> it depends on the input. Or if it is a BUG and we expected an earlier
> part of the code (like the event generator) to catch this broken case
> before we get to this function.

By the way, one side effect of BUG() here is that we call abort(), which
means that our atexit handlers don't run. And a crufty "config.lock"
file is left that prevents running the command again.

In our discussion elsewhere of having BUG() just call exit(), I'm not
sure if we'd want it to skip those cleanups or not (it's helpful to
not run them if you're trying to debug, but otherwise is annoying).

-Peff
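
The difference between the two ways of terminating can be shown outside of Git. In standard C, exit() runs the handlers registered with atexit(), while abort() raises SIGABRT and skips them. The sketch below assumes a POSIX system; the lock path and the function names are made up for the demonstration and have nothing to do with Git's actual lockfile code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock path for this demo only; not Git's real lock file. */
static const char *demo_lock = "/tmp/demo-config.lock";

static void remove_demo_lock(void)
{
	unlink(demo_lock);
}

/*
 * In a child process, create the lock file, register a cleanup handler,
 * and terminate via exit() or abort(). Returns 1 if the lock file is
 * still present afterwards, 0 if it was cleaned up, -1 on setup failure.
 */
int lock_left_behind(int use_abort)
{
	pid_t pid;
	int status;
	FILE *fp;

	pid = fork();
	if (pid < 0)
		return -1;
	if (!pid) {
		fp = fopen(demo_lock, "w");
		if (!fp)
			_exit(127);
		fclose(fp);
		atexit(remove_demo_lock);
		if (use_abort)
			abort();	/* SIGABRT: atexit handlers are skipped */
		exit(0);		/* normal exit: atexit handlers run */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFEXITED(status) && WEXITSTATUS(status) == 127)
		return -1;
	if (access(demo_lock, F_OK) == 0) {
		unlink(demo_lock);	/* tidy up for the next call */
		return 1;
	}
	return 0;
}
```

Calling lock_left_behind(0) is expected to report that the file was cleaned up, while lock_left_behind(1) is expected to report a leftover file, which corresponds to the stale "config.lock" described above.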

Thread overview: 103+ messages
2018-03-29 15:18 [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
2018-03-29 15:18 ` [PATCH 1/9] git_config_set: fix off-by-two Johannes Schindelin
2018-03-29 18:15   ` Stefan Beller
2018-03-29 19:41     ` Jeff King
2018-03-30 12:32       ` Johannes Schindelin
2018-03-30 14:15         ` Ævar Arnfjörð Bjarmason
2018-03-30 16:24           ` Junio C Hamano
2018-03-30 18:44             ` Johannes Schindelin
2018-03-30 19:00               ` Junio C Hamano
2018-04-03  9:31                 ` Johannes Schindelin
2018-04-03 15:29                   ` Duy Nguyen
2018-04-03 15:47                     ` Johannes Schindelin
2018-04-08 23:12                   ` Junio C Hamano
2018-03-30 16:36         ` Duy Nguyen
2018-03-30 18:53           ` Johannes Schindelin
2018-03-30 19:16             ` Duy Nguyen
2018-03-30 18:45         ` A potential approach to making tests faster on Windows Ævar Arnfjörð Bjarmason
2018-03-30 18:58           ` Junio C Hamano
2018-03-30 19:16           ` Jeff King
2018-04-03  9:49             ` Johannes Schindelin
2018-04-03 11:28               ` Ævar Arnfjörð Bjarmason
2018-04-03 15:55                 ` Johannes Schindelin
2018-04-03 21:36               ` Eric Sunshine
2018-04-03 11:43           ` Johannes Schindelin
2018-04-03 13:27             ` Jeff King
2018-04-03 16:00               ` Johannes Schindelin
2018-04-06 21:40                 ` Jeff King
2018-04-06 21:57                   ` Stefan Beller
2018-03-29 15:18 ` [PATCH 2/9] t1300: rename it to reflect that `repo-config` was deprecated Johannes Schindelin
2018-03-29 19:42   ` Jeff King
2018-03-30 12:37     ` Johannes Schindelin
2018-03-29 15:18 ` [PATCH 3/9] t1300: avoid relying on a bug Johannes Schindelin
2018-03-29 19:43   ` Jeff King
2018-03-30 12:38     ` Johannes Schindelin
2018-03-29 15:18 ` [PATCH 4/9] t1300: remove unreasonable expectation from TODO Johannes Schindelin
2018-03-29 19:52   ` Jeff King
2018-03-29 20:45     ` Junio C Hamano
2018-03-30 12:42     ` Johannes Schindelin
2018-03-29 15:18 ` [PATCH 5/9] t1300: `--unset-all` can leave an empty section behind (bug) Johannes Schindelin
2018-03-29 19:54   ` Jeff King
2018-03-29 15:18 ` [PATCH 6/9] git_config_set: simplify the way the section name is remembered Johannes Schindelin
2018-03-29 15:19 ` [PATCH 7/9] git config --unset: remove empty sections (in normal situations) Johannes Schindelin
2018-03-29 21:32   ` Jeff King
2018-03-30 13:00     ` Johannes Schindelin
2018-03-30 13:09       ` Jeff King
2018-03-29 15:19 ` [PATCH 8/9] git_config_set: use do_config_from_file() directly Johannes Schindelin
2018-03-29 21:38   ` Jeff King
2018-03-30 13:02     ` Johannes Schindelin
2018-03-30 13:14       ` Jeff King
2018-03-30 14:01         ` Johannes Schindelin
2018-03-30 14:08           ` Jeff King
2018-03-30 19:04             ` Johannes Schindelin
2018-03-29 15:19 ` [PATCH 9/9] git_config_set: reuse empty sections Johannes Schindelin
2018-03-29 21:50   ` Jeff King
2018-03-30 13:15     ` Johannes Schindelin
2018-03-29 17:58 ` [PATCH 0/9] Assorted fixes for `git config` (including the "empty sections" bug) Stefan Beller
2018-03-30 12:14   ` Johannes Schindelin
2018-03-29 19:39 ` Jeff King
2018-03-30 12:35   ` Johannes Schindelin
2018-03-30 14:17 ` Ævar Arnfjörð Bjarmason
2018-03-30 18:46   ` Johannes Schindelin
2018-04-03 16:27 ` [PATCH v2 00/15] " Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 01/15] git_config_set: fix off-by-two Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 02/15] t1300: rename it to reflect that `repo-config` was deprecated Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 03/15] t1300: demonstrate that --replace-all can "invent" newlines Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 04/15] config --replace-all: avoid extra line breaks Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 05/15] t1300: avoid relying on a bug Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 06/15] t1300: remove unreasonable expectation from TODO Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 07/15] t1300: `--unset-all` can leave an empty section behind (bug) Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 08/15] config: introduce an optional event stream while parsing Johannes Schindelin
2018-04-06 21:22     ` Jeff King
2018-04-09  7:35       ` Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 09/15] config: avoid using the global variable `store` Johannes Schindelin
2018-04-06 21:23     ` Jeff King
2018-04-09  7:36       ` Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 10/15] config_set_store: rename some fields for consistency Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 11/15] git_config_set: do not use a state machine Johannes Schindelin
2018-04-06 21:28     ` Jeff King
2018-04-09  7:50       ` Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 12/15] git_config_set: make use of the config parser's event stream Johannes Schindelin
2018-04-03 16:28   ` [PATCH v2 13/15] git config --unset: remove empty sections (in the common case) Johannes Schindelin
2018-04-03 16:29   ` [PATCH v2 14/15] git_config_set: reuse empty sections Johannes Schindelin
2018-04-03 16:30   ` [PATCH v2 00/15] Assorted fixes for `git config` (including the "empty sections" bug) Johannes Schindelin
2018-04-06 21:33   ` Jeff King
2018-04-09  8:19     ` Johannes Schindelin
2018-04-09  8:31   ` [PATCH v3 " Johannes Schindelin
2018-04-09  8:31     ` [PATCH v3 01/15] git_config_set: fix off-by-two Johannes Schindelin
2018-04-09  8:31     ` [PATCH v3 02/15] t1300: rename it to reflect that `repo-config` was deprecated Johannes Schindelin
2018-04-09  8:31     ` [PATCH v3 03/15] t1300: demonstrate that --replace-all can "invent" newlines Johannes Schindelin
2018-04-09  8:31     ` [PATCH v3 04/15] config --replace-all: avoid extra line breaks Johannes Schindelin
2018-04-09  8:31     ` [PATCH v3 05/15] t1300: avoid relying on a bug Johannes Schindelin
2018-04-09  8:31     ` [PATCH v3 06/15] t1300: remove unreasonable expectation from TODO Johannes Schindelin
2018-04-09  8:31     ` [PATCH v3 07/15] t1300: add a few more hairy examples of sections becoming empty Johannes Schindelin
2018-04-09  8:32     ` [PATCH v3 08/15] t1300: `--unset-all` can leave an empty section behind (bug) Johannes Schindelin
2018-04-09  8:32     ` [PATCH v3 09/15] config: introduce an optional event stream while parsing Johannes Schindelin
2018-04-09  8:32     ` [PATCH v3 10/15] config: avoid using the global variable `store` Johannes Schindelin
2018-04-09  8:32     ` [PATCH v3 11/15] config_set_store: rename some fields for consistency Johannes Schindelin
2018-04-09  8:32     ` [PATCH v3 12/15] git_config_set: do not use a state machine Johannes Schindelin
2018-04-09  8:32     ` [PATCH v3 13/15] git_config_set: make use of the config parser's event stream Johannes Schindelin
2018-05-08 13:42       ` Jeff King
2018-05-08 14:00         ` Jeff King
2018-04-09  8:32     ` [PATCH v3 14/15] git config --unset: remove empty sections (in the common case) Johannes Schindelin
2018-04-09  8:32     ` [PATCH v3 15/15] git_config_set: reuse empty sections Johannes Schindelin
